Hi Shannon, [1] is a good starting point for learning about coders in the Java SDK.
[1] https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety

Rui

On Thu, Jul 11, 2019 at 3:08 PM Shannon Duncan <joseph.dun...@liveramp.com> wrote:

> Was able to get it to use ArrayList by doing List<List<Integer>> result =
> new ArrayList<List<Integer>>();
>
> Then storing my keys in a separate array that I'll pass in as a side input
> to key for the list of lists.
>
> Thanks for the help, lemme know more in the future about how coders work
> and instantiate and I'd love to help contribute by adding some new coders.
>
> - Shannon
>
> On Thu, Jul 11, 2019 at 4:59 PM Shannon Duncan <joseph.dun...@liveramp.com>
> wrote:
>
>> Will do. Thanks. A new coder for deterministic Maps would be great in the
>> future. Thank you!
>>
>> On Thu, Jul 11, 2019 at 4:58 PM Rui Wang <ruw...@google.com> wrote:
>>
>>> I think Mike refers to ListCoder
>>> <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/ListCoder.java>
>>> which is deterministic if its element coder is deterministic as well.
>>> Maybe you can search the repo for examples of ListCoder?
>>>
>>> -Rui
>>>
>>> On Thu, Jul 11, 2019 at 2:55 PM Shannon Duncan <
>>> joseph.dun...@liveramp.com> wrote:
>>>
>>>> So ArrayList doesn't work either, so just a standard List?
>>>>
>>>> On Thu, Jul 11, 2019 at 4:53 PM Rui Wang <ruw...@google.com> wrote:
>>>>
>>>>> Shannon, I agree with Mike that List is a good workaround if the
>>>>> elements within your list are deterministic and you are eager to get
>>>>> your new pipeline working.
>>>>>
>>>>> Let me send back some pointers on adding a new coder later.
>>>>>
>>>>> -Rui
>>>>>
>>>>> On Thu, Jul 11, 2019 at 2:45 PM Shannon Duncan <
>>>>> joseph.dun...@liveramp.com> wrote:
>>>>>
>>>>>> I just started learning Java today to attempt to convert our python
>>>>>> pipelines to Java to take advantage of key features that Java has.
>>>>>> I have no idea how I would create a new coder and include it for Beam
>>>>>> to recognize.
>>>>>>
>>>>>> If you can point me in the right direction of where it hooks together
>>>>>> I might be able to figure that out. I can duplicate MapCoder and try to
>>>>>> make changes, but how will Beam know to pick up that coder for a
>>>>>> GroupByKey?
>>>>>>
>>>>>> Thanks!
>>>>>> Shannon
>>>>>>
>>>>>> On Thu, Jul 11, 2019 at 4:42 PM Rui Wang <ruw...@google.com> wrote:
>>>>>>
>>>>>>> It could be just straightforward to create a SortedMapCoder for
>>>>>>> TreeMap. Just add checks on map instances and then change
>>>>>>> verifyDeterministic.
>>>>>>>
>>>>>>> If this is a common need we could just submit it into the Beam repo.
>>>>>>>
>>>>>>> [1]:
>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java#L146
>>>>>>>
>>>>>>> On Thu, Jul 11, 2019 at 2:26 PM Mike Pedersen <m...@mikepedersen.dk>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> There isn't a coder for deterministic maps in Beam, so even if your
>>>>>>>> data structure is deterministic, Beam will assume the serialized bytes
>>>>>>>> aren't deterministic.
>>>>>>>>
>>>>>>>> You could make one using the MapCoder as a guide:
>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java
>>>>>>>> Just change it such that the exception in verifyDeterministic is
>>>>>>>> removed and when decoding it instantiates a TreeMap or such instead of
>>>>>>>> a HashMap.
>>>>>>>>
>>>>>>>> Alternatively, you could just represent your key as a sorted list
>>>>>>>> of KV pairs. Lookups could be done using binary search if necessary.
>>>>>>>>
>>>>>>>> Mike
>>>>>>>>
>>>>>>>> On Thu, Jul 11, 2019 at 10:41 PM Shannon Duncan <
>>>>>>>> joseph.dun...@liveramp.com> wrote:
>>>>>>>>
>>>>>>>>> So I'm working on essentially doing a word-count on a complex data
>>>>>>>>> structure.
>>>>>>>>>
>>>>>>>>> I tried just using a HashMap as the structure, but that didn't
>>>>>>>>> work because it is non-deterministic.
>>>>>>>>>
>>>>>>>>> However, when given a LinkedHashMap or TreeMap, which is
>>>>>>>>> deterministic, the SDK complains that it's non-deterministic when
>>>>>>>>> trying to use it as a key for GroupByKey.
>>>>>>>>>
>>>>>>>>> What would be an appropriate Map-style data structure that would
>>>>>>>>> be deterministic enough for Apache Beam to accept it as a key?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Shannon
>>>>>>>>>
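
For anyone following along, Mike's suggestion in the thread (represent the key as a sorted list of KV pairs, with binary search for lookups) could look roughly like the sketch below. It uses only the JDK, so Map.Entry stands in for Beam's KV, and the names SortedKvKey, toSortedPairs and lookup are made up for illustration. In an actual pipeline the resulting list would presumably be encoded with ListCoder over a deterministic element coder, as Rui mentions, but that part is an assumption and is not shown here.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Beam: turns a map into a key-sorted list of pairs.
class SortedKvKey {

    // Copy the entries and sort them by key, so that equal maps always yield
    // the same element order no matter how the map was populated.
    public static List<Map.Entry<String, Integer>> toSortedPairs(Map<String, Integer> map) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            pairs.add(new SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        pairs.sort(Map.Entry.comparingByKey());
        return pairs;
    }

    // Binary search over the sorted pairs; returns null when the key is absent.
    public static Integer lookup(List<Map.Entry<String, Integer>> pairs, String key) {
        int lo = 0;
        int hi = pairs.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = pairs.get(mid).getKey().compareTo(key);
            if (cmp == 0) {
                return pairs.get(mid).getValue();
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Two maps with the same contents but different insertion order.
        Map<String, Integer> a = new HashMap<>();
        a.put("x", 1);
        a.put("y", 2);
        Map<String, Integer> b = new HashMap<>();
        b.put("y", 2);
        b.put("x", 1);

        // Both maps reduce to the same ordered list of pairs.
        System.out.println(toSortedPairs(a).equals(toSortedPairs(b)));
        System.out.println(lookup(toSortedPairs(a), "y"));
    }
}
```

The point of the sort is that the element order, and therefore any element-by-element encoding of the list, no longer depends on how the original map was built. That is the property GroupByKey needs from its key.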