TreeMapCoder.of(StringUtf8Coder.of(), ListCoder.of(VarIntCoder.of()));
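
To spell out the two arguments: the first is the coder for the String keys and the second is the coder for each list value. Note that ListCoder.of(VarIntCoder.of()) is a Coder<List<Integer>>, so the type probably needs to be declared as TreeMap<String, List<Integer>> rather than TreeMap<String, ArrayList<Integer>>, depending on how the generics of your custom TreeMapCoder are written.

As a rough, Beam-free sketch of why a coder over a TreeMap can be deterministic: writing the entries in the map's sorted key order yields the same bytes no matter how the map was populated. SortedMapEncoding and its methods are made-up names for illustration. They are not Beam API and not the TreeMapCoder from this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Beam-free illustration; not the TreeMapCoder discussed in this thread.
public class SortedMapEncoding {

  // Writes the entry count, then every entry in TreeMap's sorted key order.
  static byte[] encode(TreeMap<String, List<Integer>> map) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeInt(map.size());
      for (Map.Entry<String, List<Integer>> entry : map.entrySet()) {
        out.writeUTF(entry.getKey());           // stands in for the key coder
        out.writeInt(entry.getValue().size());  // stands in for the value coder:
        for (int element : entry.getValue()) {  // a length prefix, then each element
          out.writeInt(element);
        }
      }
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Reads the entries back and rebuilds a TreeMap (not a HashMap).
  static TreeMap<String, List<Integer>> decode(byte[] encoded) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
      TreeMap<String, List<Integer>> map = new TreeMap<>();
      int entries = in.readInt();
      for (int i = 0; i < entries; i++) {
        String key = in.readUTF();
        int length = in.readInt();
        List<Integer> values = new ArrayList<>(length);
        for (int j = 0; j < length; j++) {
          values.add(in.readInt());
        }
        map.put(key, values);
      }
      return map;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Same entries, different insertion order.
    TreeMap<String, List<Integer>> first = new TreeMap<>();
    first.put("b", Arrays.asList(3, 4));
    first.put("a", Arrays.asList(1, 2));

    TreeMap<String, List<Integer>> second = new TreeMap<>();
    second.put("a", Arrays.asList(1, 2));
    second.put("b", Arrays.asList(3, 4));

    System.out.println(Arrays.equals(encode(first), encode(second)));
    System.out.println(decode(encode(first)).equals(first));
  }
}
```

In main, both maps hold the same entries inserted in a different order; the first line printed compares their encodings and the second checks the round trip.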

On Fri, Jul 12, 2019 at 10:22 AM Shannon Duncan <joseph.dun...@liveramp.com>
wrote:

> So I have my custom coder created for TreeMap and I'm ready to set it...
>
> So my Type is "TreeMap<String, ArrayList<Integer>>"
>
> What do I put for ".setCoder(TreeMapCoder.of(???, ???))"
>
> On Thu, Jul 11, 2019 at 8:21 PM Rui Wang <ruw...@google.com> wrote:
>
>> Hi Shannon, [1] is a good place to start on coders in the Java SDK.
>>
>>
>> [1]
>> https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety
>>
>> Rui
>>
>> On Thu, Jul 11, 2019 at 3:08 PM Shannon Duncan <
>> joseph.dun...@liveramp.com> wrote:
>>
>>> Was able to get it to use ArrayList by doing List<List<Integer>> result
>>> = new ArrayList<List<Integer>>();
>>>
>>> Then storing my keys in a separate array that I'll pass in as a side
>>> input to key for the list of lists.
>>>
>>> Thanks for the help, lemme know more in the future about how coders work
>>> and instantiate and I'd love to help contribute by adding some new coders.
>>>
>>> - Shannon
>>>
>>> On Thu, Jul 11, 2019 at 4:59 PM Shannon Duncan <
>>> joseph.dun...@liveramp.com> wrote:
>>>
>>>> Will do. Thanks. A new coder for deterministic Maps would be great in
>>>> the future. Thank you!
>>>>
>>>> On Thu, Jul 11, 2019 at 4:58 PM Rui Wang <ruw...@google.com> wrote:
>>>>
>>>>> I think Mike refers to ListCoder
>>>>> <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/ListCoder.java>,
>>>>> which is deterministic if its element coder is deterministic. Maybe you
>>>>> can search the repo for examples of ListCoder?
>>>>>
>>>>>
>>>>> -Rui
>>>>>
>>>>> On Thu, Jul 11, 2019 at 2:55 PM Shannon Duncan <
>>>>> joseph.dun...@liveramp.com> wrote:
>>>>>
>>>>>> So ArrayList doesn't work either, so just a standard List?
>>>>>>
>>>>>> On Thu, Jul 11, 2019 at 4:53 PM Rui Wang <ruw...@google.com> wrote:
>>>>>>
>>>>>>> Shannon, I agree with Mike that List is a good workaround if the
>>>>>>> elements within the list are deterministic and you are eager to get
>>>>>>> your new pipeline working.
>>>>>>>
>>>>>>>
>>>>>>> Let me send some pointers on adding a new coder later.
>>>>>>>
>>>>>>>
>>>>>>> -Rui
>>>>>>>
>>>>>>> On Thu, Jul 11, 2019 at 2:45 PM Shannon Duncan <
>>>>>>> joseph.dun...@liveramp.com> wrote:
>>>>>>>
>>>>>>>> I just started learning Java today to attempt to convert our Python
>>>>>>>> pipelines to Java to take advantage of key features that Java has. I
>>>>>>>> have no idea how I would create a new coder and include it for Beam
>>>>>>>> to recognize.
>>>>>>>>
>>>>>>>> If you can point me in the right direction of where it hooks
>>>>>>>> together I might be able to figure that out. I can duplicate MapCoder
>>>>>>>> and try to make changes, but how will Beam know to pick up that coder
>>>>>>>> for a GroupByKey?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Shannon
>>>>>>>>
>>>>>>>> On Thu, Jul 11, 2019 at 4:42 PM Rui Wang <ruw...@google.com> wrote:
>>>>>>>>
>>>>>>>>> It should be straightforward to create a SortedMapCoder for
>>>>>>>>> TreeMap: add checks on map instances and then change
>>>>>>>>> verifyDeterministic [1].
>>>>>>>>>
>>>>>>>>> If this is a common need, we could submit it to the Beam repo.
>>>>>>>>>
>>>>>>>>> [1]:
>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java#L146
>>>>>>>>>
>>>>>>>>> On Thu, Jul 11, 2019 at 2:26 PM Mike Pedersen <
>>>>>>>>> m...@mikepedersen.dk> wrote:
>>>>>>>>>
>>>>>>>>>> There isn't a coder for deterministic maps in Beam, so even if
>>>>>>>>>> your data structure is deterministic, Beam will assume the serialized
>>>>>>>>>> bytes aren't deterministic.
>>>>>>>>>>
>>>>>>>>>> You could make one using the MapCoder as a guide:
>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java
>>>>>>>>>> Just change it such that the exception in verifyDeterministic is
>>>>>>>>>> removed and when decoding it instantiates a TreeMap or such instead
>>>>>>>>>> of a HashMap.
>>>>>>>>>>
>>>>>>>>>> Alternatively, you could just represent your key as a sorted list
>>>>>>>>>> of KV pairs. Lookups could be done using binary search if necessary.
>>>>>>>>>>
>>>>>>>>>> Mike
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 11, 2019 at 10:41 PM Shannon Duncan <
>>>>>>>>>> joseph.dun...@liveramp.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> So I'm working on essentially doing a word-count on a complex
>>>>>>>>>>> data structure.
>>>>>>>>>>>
>>>>>>>>>>> I tried just using a HashMap as the Structure, but that didn't
>>>>>>>>>>> work because it is non-deterministic.
>>>>>>>>>>>
>>>>>>>>>>> However, when given a LinkedHashMap or TreeMap, which is
>>>>>>>>>>> deterministic, the SDK complains that it's non-deterministic when
>>>>>>>>>>> trying to use it as a key for GroupByKey.
>>>>>>>>>>>
>>>>>>>>>>> What would be an appropriate Map style data structure that would
>>>>>>>>>>> be deterministic enough for Apache Beam to accept it as a key?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Shannon
>>>>>>>>>>>
>>>>>>>>>>
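
For the alternative Mike mentions above (representing the key as a sorted list of KV pairs, with binary-search lookups), here is a minimal plain-Java sketch. SortedPairs and its methods are hypothetical names, and Map.Entry stands in for Beam's KV so that the example has no Beam dependency. It only shows the data-structure idea, not how such a key would be wired into a pipeline.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; Map.Entry stands in for Beam's KV to keep this Beam-free.
public class SortedPairs {
  private static final Comparator<Map.Entry<String, List<Integer>>> BY_KEY =
      Map.Entry.comparingByKey();

  // Copies the map's entries into a list sorted by key, so the resulting order
  // no longer depends on the source map's iteration order.
  static List<Map.Entry<String, List<Integer>>> fromMap(Map<String, List<Integer>> map) {
    List<Map.Entry<String, List<Integer>>> pairs = new ArrayList<>();
    for (Map.Entry<String, List<Integer>> entry : map.entrySet()) {
      pairs.add(new SimpleImmutableEntry<>(entry.getKey(), entry.getValue()));
    }
    pairs.sort(BY_KEY);
    return pairs;
  }

  // Binary search by key; returns null when the key is absent.
  static List<Integer> lookup(List<Map.Entry<String, List<Integer>>> pairs, String key) {
    Map.Entry<String, List<Integer>> probe = new SimpleImmutableEntry<>(key, null);
    int index = Collections.binarySearch(pairs, probe, BY_KEY);
    return index >= 0 ? pairs.get(index).getValue() : null;
  }

  public static void main(String[] args) {
    Map<String, List<Integer>> source = new HashMap<>();
    source.put("b", Arrays.asList(3, 4));
    source.put("a", Arrays.asList(1, 2));

    List<Map.Entry<String, List<Integer>>> pairs = fromMap(source);
    System.out.println(pairs);
    System.out.println(lookup(pairs, "b"));
    System.out.println(lookup(pairs, "missing"));
  }
}
```

Because the list is sorted once up front, two keys with the same contents always end up in the same order, and a list of this shape is the kind of value the List workaround in this thread relies on.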
