I have a working TreeMapCoder now. I got it all set up, and
GroupByKey is accepting it.

Thanks for all the help. I need to read up more on the contributing
guidelines, then I'll open a PR to add the coder to the SDK. I'm also willing
to write coders for things such as ArrayList if people want them.
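
For anyone who finds this thread later, here is a minimal, stdlib-only Java
sketch of why iteration order matters for a deterministic encoding. It does
not use the Beam Coder API: the class name DeterministicMapEncoding and the
byte layout are made up for illustration and are not necessarily what
MapCoder or a TreeMapCoder writes. It only shows that two equal maps can
serialize to different bytes unless the entries are written in sorted order.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of the Beam SDK.
class DeterministicMapEncoding {

  // Writes the entry count, then each key and its list of ints,
  // in whatever order the map iterates its entries.
  static byte[] encode(Map<String, List<Integer>> map) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(map.size());
      for (Map.Entry<String, List<Integer>> entry : map.entrySet()) {
        out.writeUTF(entry.getKey());
        out.writeInt(entry.getValue().size());
        for (int v : entry.getValue()) {
          out.writeInt(v);
        }
      }
    } catch (IOException e) {
      // Not expected for an in-memory stream.
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    // Same entries, different insertion order.
    Map<String, List<Integer>> ab = new LinkedHashMap<>();
    ab.put("a", Arrays.asList(1, 2));
    ab.put("b", Arrays.asList(3));

    Map<String, List<Integer>> ba = new LinkedHashMap<>();
    ba.put("b", Arrays.asList(3));
    ba.put("a", Arrays.asList(1, 2));

    // The maps are equal, but their insertion-ordered encodings differ.
    System.out.println(ab.equals(ba));                          // true
    System.out.println(Arrays.equals(encode(ab), encode(ba)));  // false

    // Copying into a TreeMap sorts by key, so the bytes match.
    System.out.println(
        Arrays.equals(encode(new TreeMap<>(ab)), encode(new TreeMap<>(ba))));  // true
  }
}
```

This is also why a LinkedHashMap key is risky for grouping: equal maps built
in a different insertion order would not produce the same bytes.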

On Fri, Jul 12, 2019 at 9:31 AM Shannon Duncan <joseph.dun...@liveramp.com>
wrote:

> Aha, makes sense. Thanks!
>
> On Fri, Jul 12, 2019 at 9:26 AM Lukasz Cwik <lc...@google.com> wrote:
>
>> TreeMapCoder.of(StringUtf8Coder.of(), ListCoder.of(VarIntCoder.of()));
>>
>> On Fri, Jul 12, 2019 at 10:22 AM Shannon Duncan <
>> joseph.dun...@liveramp.com> wrote:
>>
>>> So I have my custom coder created for TreeMap and I'm ready to set it...
>>>
>>> So my Type is "TreeMap<String, ArrayList<Integer>>"
>>>
>>> What do I put for ".setCoder(TreeMapCoder.of(???, ???))"
>>>
>>> On Thu, Jul 11, 2019 at 8:21 PM Rui Wang <ruw...@google.com> wrote:
>>>
>>>> Hi Shannon, [1] is a good starting point on coders in the Java SDK.
>>>>
>>>>
>>>> [1]
>>>> https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety
>>>>
>>>> Rui
>>>>
>>>> On Thu, Jul 11, 2019 at 3:08 PM Shannon Duncan <
>>>> joseph.dun...@liveramp.com> wrote:
>>>>
>>>>> Was able to get it to use ArrayList by doing List<List<Integer>>
>>>>> result = new ArrayList<List<Integer>>();
>>>>>
>>>>> Then storing my keys in a separate array that I'll pass in as a side
>>>>> input to key for the list of lists.
>>>>>
>>>>> Thanks for the help, lemme know more in the future about how coders
>>>>> work and instantiate and I'd love to help contribute by adding some new
>>>>> coders.
>>>>>
>>>>> - Shannon
>>>>>
>>>>> On Thu, Jul 11, 2019 at 4:59 PM Shannon Duncan <
>>>>> joseph.dun...@liveramp.com> wrote:
>>>>>
>>>>>> Will do. Thanks. A new coder for deterministic Maps would be great in
>>>>>> the future. Thank you!
>>>>>>
>>>>>> On Thu, Jul 11, 2019 at 4:58 PM Rui Wang <ruw...@google.com> wrote:
>>>>>>
>>>>>>> I think Mike is referring to ListCoder
>>>>>>> <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/ListCoder.java>,
>>>>>>> which is deterministic if its element coder is deterministic. Maybe you
>>>>>>> can search the repo for examples of ListCoder?
>>>>>>>
>>>>>>>
>>>>>>> -Rui
>>>>>>>
>>>>>>> On Thu, Jul 11, 2019 at 2:55 PM Shannon Duncan <
>>>>>>> joseph.dun...@liveramp.com> wrote:
>>>>>>>
>>>>>>>> So ArrayList doesn't work either, so just a standard List?
>>>>>>>>
>>>>>>>> On Thu, Jul 11, 2019 at 4:53 PM Rui Wang <ruw...@google.com> wrote:
>>>>>>>>
>>>>>>>>> Shannon, I agree with Mike that a List is a good workaround if the
>>>>>>>>> elements within the list are deterministic and you are eager to get
>>>>>>>>> your new pipeline working.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'll send some pointers on adding a new coder later.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Rui
>>>>>>>>>
>>>>>>>>> On Thu, Jul 11, 2019 at 2:45 PM Shannon Duncan <
>>>>>>>>> joseph.dun...@liveramp.com> wrote:
>>>>>>>>>
>>>>>>>>>> I just started learning Java today to attempt to convert our
>>>>>>>>>> Python pipelines to Java to take advantage of key features that Java
>>>>>>>>>> has. I have no idea how I would create a new coder and include it
>>>>>>>>>> for Beam to recognize.
>>>>>>>>>>
>>>>>>>>>> If you can point me in the right direction of where it hooks
>>>>>>>>>> together I might be able to figure that out. I can duplicate
>>>>>>>>>> MapCoder and try to make changes, but how will Beam know to pick up
>>>>>>>>>> that coder for a GroupByKey?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>> Shannon
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 11, 2019 at 4:42 PM Rui Wang <ruw...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> It should be straightforward to create a SortedMapCoder for
>>>>>>>>>>> TreeMap: add checks on map instances and then change
>>>>>>>>>>> verifyDeterministic [1].
>>>>>>>>>>>
>>>>>>>>>>> If this is a common need we could just submit it to the Beam repo.
>>>>>>>>>>>
>>>>>>>>>>> [1]:
>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java#L146
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jul 11, 2019 at 2:26 PM Mike Pedersen <
>>>>>>>>>>> m...@mikepedersen.dk> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> There isn't a coder for deterministic maps in Beam, so even if
>>>>>>>>>>>> your data structure is deterministic, Beam will assume the
>>>>>>>>>>>> serialized bytes aren't deterministic.
>>>>>>>>>>>>
>>>>>>>>>>>> You could make one using the MapCoder as a guide:
>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java
>>>>>>>>>>>> Just change it such that the exception in verifyDeterministic
>>>>>>>>>>>> is removed and, when decoding, it instantiates a TreeMap or
>>>>>>>>>>>> similar instead of a HashMap.
>>>>>>>>>>>>
>>>>>>>>>>>> Alternatively, you could just represent your key as a sorted
>>>>>>>>>>>> list of KV pairs. Lookups could be done using binary search if
>>>>>>>>>>>> necessary.
>>>>>>>>>>>>
>>>>>>>>>>>> Mike
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jul 11, 2019 at 10:41 PM Shannon Duncan <
>>>>>>>>>>>> joseph.dun...@liveramp.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> So I'm working on essentially doing a word-count on a complex
>>>>>>>>>>>>> data structure.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I tried just using a HashMap as the structure, but that didn't
>>>>>>>>>>>>> work because it is non-deterministic.
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, when given a LinkedHashMap or TreeMap, which is
>>>>>>>>>>>>> deterministic, the SDK complains that it's non-deterministic
>>>>>>>>>>>>> when trying to use it as a key for GroupByKey.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What would be an appropriate Map-style data structure that
>>>>>>>>>>>>> would be deterministic enough for Apache Beam to accept it as a
>>>>>>>>>>>>> key?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Shannon
>>>>>>>>>>>>>
>>>>>>>>>>>>
