Shannon, I agree with Mike that a List is a good workaround if the elements within your list are deterministic and you are eager to get your new pipeline working.
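
A minimal sketch of the sorted-list workaround, in plain Java with no Beam dependency. The class name SortedEntryKey and the String/Integer types are assumptions for illustration only. In a real pipeline the entries would presumably be Beam KV objects encoded with something like ListCoder.of(KvCoder.of(StringUtf8Coder.of(), VarIntCoder.of())), which is deterministic as long as the element coders are.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: represents a map-shaped key as a list of entries sorted by key.
class SortedEntryKey {

    // Copy the map into a key-sorted list so that maps with equal contents
    // always yield the same entries in the same order.
    static List<Map.Entry<String, Integer>> toSortedEntries(Map<String, Integer> map) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>();
        for (Map.Entry<String, Integer> e : new TreeMap<>(map).entrySet()) {
            entries.add(new SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        return entries;
    }

    // Binary search over the sorted entries; returns null when the key is absent.
    static Integer lookup(List<Map.Entry<String, Integer>> entries, String key) {
        int lo = 0;
        int hi = entries.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = entries.get(mid).getKey().compareTo(key);
            if (cmp == 0) {
                return entries.get(mid).getValue();
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return null;
    }
}
```

Sorting once when the key is built keeps the encoded form independent of insertion order, and lookups stay O(log n).
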
I'll send some pointers on adding a new coder later.

-Rui

On Thu, Jul 11, 2019 at 2:45 PM Shannon Duncan <joseph.dun...@liveramp.com> wrote:

> I just started learning Java today to attempt to convert our Python
> pipelines to Java to take advantage of key features that Java has. I have
> no idea how I would create a new coder and register it so that Beam
> recognizes it.
>
> If you can point me in the right direction of where it hooks together I
> might be able to figure that out. I can duplicate MapCoder and try to make
> changes, but how will Beam know to pick up that coder for a GroupByKey?
>
> Thanks!
> Shannon
>
> On Thu, Jul 11, 2019 at 4:42 PM Rui Wang <ruw...@google.com> wrote:
>
>> It should be straightforward to create a SortedMapCoder for TreeMap.
>> Just add checks on map instances and then change verifyDeterministic.
>>
>> If this is a common need we could just submit it to the Beam repo.
>>
>> [1]:
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java#L146
>>
>> On Thu, Jul 11, 2019 at 2:26 PM Mike Pedersen <m...@mikepedersen.dk>
>> wrote:
>>
>>> There isn't a coder for deterministic maps in Beam, so even if your
>>> data structure is deterministic, Beam will assume the serialized bytes
>>> aren't deterministic.
>>>
>>> You could make one using the MapCoder as a guide:
>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java
>>> Just change it such that the exception in verifyDeterministic is removed
>>> and when decoding it instantiates a TreeMap or such instead of a HashMap.
>>>
>>> Alternatively, you could just represent your key as a sorted list of KV
>>> pairs. Lookups could be done using binary search if necessary.
>>>
>>> Mike
>>>
>>> On Thu, Jul 11, 2019 at 22:41, Shannon Duncan <
>>> joseph.dun...@liveramp.com> wrote:
>>>
>>>> So I'm working on essentially doing a word-count on a complex data
>>>> structure.
>>>>
>>>> I tried just using a HashMap as the structure, but that didn't work
>>>> because it is non-deterministic.
>>>>
>>>> However, when given a LinkedHashMap or TreeMap, which is deterministic,
>>>> the SDK complains that it's non-deterministic when trying to use it as a
>>>> key for GroupByKey.
>>>>
>>>> What would be an appropriate Map-style data structure that would be
>>>> deterministic enough for Apache Beam to accept it as a key?
>>>>
>>>> Thanks,
>>>> Shannon
>>>>
>>>
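
To illustrate why a sorted-map coder can be deterministic where a HashMap-based one cannot, here is a rough sketch in plain Java. It does not use Beam's Coder API. The class SortedMapEncoding, its byte layout, and the String/Integer types are assumptions made for the example. An actual Beam coder would presumably follow MapCoder as suggested in the thread and adjust verifyDeterministic accordingly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a sorted-map coder; values must be non-null.
class SortedMapEncoding {

    // Write the entry count, then each key/value pair in sorted key order.
    // Equal maps therefore produce identical bytes regardless of how they were built.
    static byte[] encode(Map<String, Integer> map) {
        TreeMap<String, Integer> sorted = new TreeMap<>(map);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(sorted.size());
            for (Map.Entry<String, Integer> e : sorted.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decode into a TreeMap (not a HashMap) so iteration order stays sorted.
    static TreeMap<String, Integer> decode(byte[] data) {
        TreeMap<String, Integer> result = new TreeMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int size = in.readInt();
            for (int i = 0; i < size; i++) {
                String key = in.readUTF();
                result.put(key, in.readInt());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

The point that matters for GroupByKey is that two equal keys must encode to the same bytes, which the sorted iteration order provides here.
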