Hi Shannon, [1] is a good starting point for learning about coders in the Java SDK.

[1]
https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety

Rui

On Thu, Jul 11, 2019 at 3:08 PM Shannon Duncan <joseph.dun...@liveramp.com>
wrote:

> Was able to get it to use ArrayList by doing List<List<Integer>> result =
> new ArrayList<List<Integer>>();
>
> Then storing my keys in a separate array that I'll pass in as a side input
> to key for the list of lists.
>
> Thanks for the help. Let me know more in the future about how coders work
> and are instantiated, and I'd love to help contribute by adding some new coders.
>
> - Shannon
>
> On Thu, Jul 11, 2019 at 4:59 PM Shannon Duncan <joseph.dun...@liveramp.com>
> wrote:
>
>> Will do. Thanks. A new coder for deterministic Maps would be great in the
>> future. Thank you!
>>
>> On Thu, Jul 11, 2019 at 4:58 PM Rui Wang <ruw...@google.com> wrote:
>>
>>> I think Mike is referring to ListCoder
>>> <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/ListCoder.java>,
>>> which is deterministic if its element coder is deterministic. Maybe you can
>>> search the repo for examples of ListCoder?
>>>
>>>
>>> -Rui
>>>
>>> On Thu, Jul 11, 2019 at 2:55 PM Shannon Duncan <
>>> joseph.dun...@liveramp.com> wrote:
>>>
>>>> So ArrayList doesn't work either, so just a standard List?
>>>>
>>>> On Thu, Jul 11, 2019 at 4:53 PM Rui Wang <ruw...@google.com> wrote:
>>>>
>>>>> Shannon, I agree with Mike that a List is a good workaround if the
>>>>> elements within the list are deterministic and you are eager to get your
>>>>> new pipeline working.
>>>>>
>>>>>
>>>>> I'll send some pointers on adding a new coder later.
>>>>>
>>>>>
>>>>> -Rui
>>>>>
>>>>> On Thu, Jul 11, 2019 at 2:45 PM Shannon Duncan <
>>>>> joseph.dun...@liveramp.com> wrote:
>>>>>
>>>>>> I just started learning Java today in an attempt to convert our Python
>>>>>> pipelines to Java and take advantage of key features that Java has. I have
>>>>>> no idea how I would create a new coder and register it so that Beam
>>>>>> recognizes it.
>>>>>>
>>>>>> If you can point me in the right direction of where it hooks together,
>>>>>> I might be able to figure that out. I can duplicate MapCoder and try to
>>>>>> make changes, but how will Beam know to pick up that coder for a
>>>>>> GroupByKey?
>>>>>>
>>>>>> Thanks!
>>>>>> Shannon
>>>>>>
>>>>>> On Thu, Jul 11, 2019 at 4:42 PM Rui Wang <ruw...@google.com> wrote:
>>>>>>
>>>>>>> It should be fairly straightforward to create a SortedMapCoder for
>>>>>>> TreeMap. Just add checks on map instances and then change
>>>>>>> verifyDeterministic [1].
>>>>>>>
>>>>>>> If this is a common need, we could submit it to the Beam repo.
>>>>>>>
>>>>>>> [1]:
>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java#L146
>>>>>>>
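The idea behind a coder for sorted maps can be sketched without Beam at all: if entries are always written in sorted key order, two equal maps produce identical bytes. The sketch below is only an illustration under stated assumptions. The class and method names (SortedMapBytes, encode, decode) are hypothetical, the key and value types are fixed to String and Integer, and the maps are assumed to use natural key ordering. A real Beam coder would instead implement Beam's Coder API and delegate to key and value coders, which is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical, Beam-independent sketch of deterministic map encoding.
class SortedMapBytes {
    // Writes the entry count, then each entry in the map's sorted iteration order.
    static byte[] encode(SortedMap<String, Integer> map) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(map.size());
            for (Map.Entry<String, Integer> e : map.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue());
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the entries back into a TreeMap rather than a HashMap.
    static SortedMap<String, Integer> decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int size = in.readInt();
            SortedMap<String, Integer> map = new TreeMap<>();
            for (int i = 0; i < size; i++) {
                String key = in.readUTF();
                map.put(key, in.readInt());
            }
            return map;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> first = new TreeMap<>();
        first.put("b", 2);
        first.put("a", 1);
        SortedMap<String, Integer> second = new TreeMap<>();
        second.put("a", 1);
        second.put("b", 2);
        // Insertion order differs, but the encoded bytes should match.
        System.out.println(Arrays.equals(encode(first), encode(second)));
        System.out.println(decode(encode(first)).equals(first));
    }
}
```

The point of the sketch is the ordering guarantee: the bytes depend only on the map's contents, not on how it was built.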
>>>>>>> On Thu, Jul 11, 2019 at 2:26 PM Mike Pedersen <m...@mikepedersen.dk>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> There isn't a coder for deterministic maps in Beam, so even if your
>>>>>>>> data structure is deterministic, Beam will assume the serialized bytes
>>>>>>>> aren't deterministic.
>>>>>>>>
>>>>>>>> You could make one using the MapCoder as a guide:
>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java
>>>>>>>> Just change it so that the exception in verifyDeterministic is
>>>>>>>> removed and, when decoding, it instantiates a TreeMap or similar instead
>>>>>>>> of a HashMap.
>>>>>>>>
>>>>>>>> Alternatively, you could just represent your key as a sorted list
>>>>>>>> of KV pairs. Lookups could be done using binary search if necessary.
>>>>>>>>
>>>>>>>> Mike
>>>>>>>>
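The sorted-list alternative above can be sketched in plain Java as follows. This is a minimal illustration, not Beam code: SortedPairs, fromMap, and lookup are hypothetical names, and the String/Integer types are assumptions. In a pipeline one would presumably use Beam's KV type and a list coder instead of Map.Entry.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: a map-like key stored as a list of pairs sorted by key.
class SortedPairs {
    // Copies the entries into a list sorted by key, so equal maps yield equal lists.
    static List<Map.Entry<String, Integer>> fromMap(Map<String, Integer> map) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            pairs.add(new SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        pairs.sort(Map.Entry.comparingByKey());
        return pairs;
    }

    // Binary search by key over the sorted list; returns null when the key is absent.
    static Integer lookup(List<Map.Entry<String, Integer>> pairs, String key) {
        int lo = 0;
        int hi = pairs.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = pairs.get(mid).getKey().compareTo(key);
            if (cmp == 0) {
                return pairs.get(mid).getValue();
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("pear", 3);
        counts.put("apple", 1);
        counts.put("fig", 2);
        List<Map.Entry<String, Integer>> pairs = fromMap(counts);
        System.out.println(pairs);
        System.out.println(lookup(pairs, "fig"));
    }
}
```

Lookups cost O(log n) instead of a hash map's expected O(1), which is usually acceptable for small keys.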
>>>>>>>> On Thu, Jul 11, 2019 at 10:41 PM Shannon Duncan <
>>>>>>>> joseph.dun...@liveramp.com> wrote:
>>>>>>>>
>>>>>>>>> So I'm working on essentially doing a word-count on a complex data
>>>>>>>>> structure.
>>>>>>>>>
>>>>>>>>> I tried just using a HashMap as the structure, but that didn't
>>>>>>>>> work because it is non-deterministic.
>>>>>>>>>
>>>>>>>>> However, when given a LinkedHashMap or TreeMap, which is
>>>>>>>>> deterministic, the SDK still complains that it's non-deterministic when
>>>>>>>>> I try to use it as a key for GroupByKey.
>>>>>>>>>
>>>>>>>>> What would be an appropriate Map-style data structure that would
>>>>>>>>> be deterministic enough for Apache Beam to accept it as a key?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Shannon
>>>>>>>>>
>>>>>>>>
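One detail in the original question is worth illustrating: a LinkedHashMap iterates in insertion order, so two maps that compare equal can still list their entries differently, while a TreeMap with natural ordering always iterates in sorted key order. The sketch below uses only the JDK. MapOrderDemo, fill, and keyOrder are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo of iteration order for equal maps.
class MapOrderDemo {
    // Inserts the keys in the given order, using each key's length as its value.
    static Map<String, Integer> fill(Map<String, Integer> target, String... keys) {
        for (String key : keys) {
            target.put(key, key.length());
        }
        return target;
    }

    // Returns the keys in iteration order, the order an entry-by-entry encoder would write them.
    static List<String> keyOrder(Map<String, Integer> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> linkedAb = fill(new LinkedHashMap<>(), "a", "bb");
        Map<String, Integer> linkedBa = fill(new LinkedHashMap<>(), "bb", "a");
        Map<String, Integer> treeAb = fill(new TreeMap<>(), "a", "bb");
        Map<String, Integer> treeBa = fill(new TreeMap<>(), "bb", "a");

        // Both pairs are equal as maps; only the TreeMap pair shares an iteration order.
        System.out.println(linkedAb.equals(linkedBa));
        System.out.println(keyOrder(linkedAb).equals(keyOrder(linkedBa)));
        System.out.println(keyOrder(treeAb).equals(keyOrder(treeBa)));
    }
}
```

So of the two candidates mentioned, only a sorted map gives a byte order that depends purely on the contents.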
