I think Mike is referring to ListCoder
<https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/ListCoder.java>,
which is deterministic as long as its element coder is deterministic. Maybe
you can search the repo for examples of ListCoder?
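
To make Mike's sorted-list suggestion concrete, here is a rough sketch in
plain Java, with no Beam classes involved. SortedEntryKey and its methods are
hypothetical names made up for illustration. It assumes String keys and Long
counts, which may not match your data. In a pipeline the resulting list would
presumably be encoded with ListCoder over a KV-style element coder, but that
wiring is an assumption and is not shown here.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Beam: represents a map as a list of
// entries sorted by key, so that equal maps always produce equal lists.
final class SortedEntryKey {
  private static final Comparator<Map.Entry<String, Long>> BY_KEY =
      Map.Entry.comparingByKey();

  // Copies the map into a new list of immutable entries, sorted by key.
  static List<Map.Entry<String, Long>> toSortedEntries(Map<String, Long> map) {
    List<Map.Entry<String, Long>> entries = new ArrayList<>();
    for (Map.Entry<String, Long> e : map.entrySet()) {
      entries.add(new SimpleImmutableEntry<>(e.getKey(), e.getValue()));
    }
    entries.sort(BY_KEY);
    return entries;
  }

  // Looks up a key by binary search; returns null when the key is absent.
  static Long lookup(List<Map.Entry<String, Long>> sorted, String key) {
    // The probe's value is ignored because BY_KEY compares keys only.
    Map.Entry<String, Long> probe = new SimpleImmutableEntry<>(key, 0L);
    int index = Collections.binarySearch(sorted, probe, BY_KEY);
    return index >= 0 ? sorted.get(index).getValue() : null;
  }

  public static void main(String[] args) {
    // Two equal maps built in different insertion orders.
    Map<String, Long> first = new LinkedHashMap<>();
    first.put("b", 2L);
    first.put("a", 1L);
    Map<String, Long> second = new LinkedHashMap<>();
    second.put("a", 1L);
    second.put("b", 2L);

    List<Map.Entry<String, Long>> k1 = toSortedEntries(first);
    List<Map.Entry<String, Long>> k2 = toSortedEntries(second);
    System.out.println(k1.equals(k2));
    System.out.println(lookup(k1, "b"));
  }
}
```

The point of sorting is that the element order no longer depends on how the
map was built, which is the property a grouping key needs.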


-Rui

On Thu, Jul 11, 2019 at 2:55 PM Shannon Duncan <joseph.dun...@liveramp.com>
wrote:

> So ArrayList doesn't work either, so just a standard List?
>
> On Thu, Jul 11, 2019 at 4:53 PM Rui Wang <ruw...@google.com> wrote:
>
>> Shannon, I agree with Mike that a List is a good workaround if the
>> elements within the list are deterministic and you are eager to get your
>> new pipeline working.
>>
>>
>> I will send some pointers on adding a new coder later.
>>
>>
>> -Rui
>>
>> On Thu, Jul 11, 2019 at 2:45 PM Shannon Duncan <
>> joseph.dun...@liveramp.com> wrote:
>>
>>> I just started learning Java today to attempt to convert our Python
>>> pipelines to Java to take advantage of key features that Java has. I have
>>> no idea how I would create a new coder and register it so that Beam
>>> recognizes it.
>>>
>>> If you can point me in the right direction of where it hooks together I
>>> might be able to figure that out. I can duplicate MapCoder and try to make
>>> changes, but how will Beam know to pick up that coder for a GroupByKey?
>>>
>>> Thanks!
>>> Shannon
>>>
>>> On Thu, Jul 11, 2019 at 4:42 PM Rui Wang <ruw...@google.com> wrote:
>>>
>>>> It should be straightforward to create a SortedMapCoder for
>>>> TreeMap. Just add checks on the map instances and then change
>>>> verifyDeterministic [1].
>>>>
>>>> If this is a common need we could submit it to the Beam repo.
>>>>
>>>> [1]:
>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java#L146
>>>>
>>>> On Thu, Jul 11, 2019 at 2:26 PM Mike Pedersen <m...@mikepedersen.dk>
>>>> wrote:
>>>>
>>>>> There isn't a coder for deterministic maps in Beam, so even if your
>>>>> datastructure is deterministic, Beam will assume the serialized bytes
>>>>> aren't deterministic.
>>>>>
>>>>> You could make one using the MapCoder as a guide:
>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java
>>>>> Just change it so that the exception in verifyDeterministic is
>>>>> removed and so that decoding instantiates a TreeMap or similar instead
>>>>> of a HashMap.
>>>>>
>>>>> Alternatively, you could just represent your key as a sorted list of
>>>>> KV pairs. Lookups could be done using binary search if necessary.
>>>>>
>>>>> Mike
>>>>>
>>>>> On Thu, Jul 11, 2019 at 10:41 PM Shannon Duncan <
>>>>> joseph.dun...@liveramp.com> wrote:
>>>>>
>>>>>> So I'm working on essentially doing a word-count on a complex data
>>>>>> structure.
>>>>>>
>>>>>> I tried just using a HashMap as the structure, but that didn't work
>>>>>> because it is non-deterministic.
>>>>>>
>>>>>> However, when given a LinkedHashMap or TreeMap, which is deterministic,
>>>>>> the SDK complains that it's non-deterministic when I try to use it as
>>>>>> a key for GroupByKey.
>>>>>>
>>>>>> What would be an appropriate Map style data structure that would be
>>>>>> deterministic enough for Apache Beam to accept it as a key?
>>>>>>
>>>>>> Thanks,
>>>>>> Shannon
>>>>>>
>>>>>
