TreeMapCoder.of(StringUtf8Coder.of(), ListCoder.of(VarIntCoder.of()));

On Fri, Jul 12, 2019 at 10:22 AM Shannon Duncan <joseph.dun...@liveramp.com> wrote:
> So I have my custom coder created for TreeMap and I'm ready to set it...
>
> So my type is "TreeMap<String, ArrayList<Integer>>"
>
> What do I put for ".setCoder(TreeMapCoder.of(???, ???))"
>
> On Thu, Jul 11, 2019 at 8:21 PM Rui Wang <ruw...@google.com> wrote:
>
>> Hi Shannon, [1] will be a good start on coders in the Java SDK.
>>
>> [1] https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety
>>
>> Rui
>>
>> On Thu, Jul 11, 2019 at 3:08 PM Shannon Duncan <joseph.dun...@liveramp.com> wrote:
>>
>>> Was able to get it to use ArrayList by doing
>>> List<List<Integer>> result = new ArrayList<List<Integer>>();
>>>
>>> Then storing my keys in a separate array that I'll pass in as a side input to key for the list of lists.
>>>
>>> Thanks for the help. Let me know more in the future about how coders work and are instantiated, and I'd love to help contribute by adding some new coders.
>>>
>>> - Shannon
>>>
>>> On Thu, Jul 11, 2019 at 4:59 PM Shannon Duncan <joseph.dun...@liveramp.com> wrote:
>>>
>>>> Will do. Thanks. A new coder for deterministic Maps would be great in the future. Thank you!
>>>>
>>>> On Thu, Jul 11, 2019 at 4:58 PM Rui Wang <ruw...@google.com> wrote:
>>>>
>>>>> I think Mike refers to ListCoder
>>>>> <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/ListCoder.java>
>>>>> which is deterministic if its element coder is as well. Maybe you can search the repo for examples of ListCoder?
>>>>>
>>>>> -Rui
>>>>>
>>>>> On Thu, Jul 11, 2019 at 2:55 PM Shannon Duncan <joseph.dun...@liveramp.com> wrote:
>>>>>
>>>>>> So ArrayList doesn't work either, so just a standard List?
>>>>>>
>>>>>> On Thu, Jul 11, 2019 at 4:53 PM Rui Wang <ruw...@google.com> wrote:
>>>>>>
>>>>>>> Shannon, I agree with Mike that List is a good workaround if the elements within your list are deterministic and you are eager to get your new pipeline working.
>>>>>>>
>>>>>>> Let me send back some pointers on adding a new coder later.
>>>>>>>
>>>>>>> -Rui
>>>>>>>
>>>>>>> On Thu, Jul 11, 2019 at 2:45 PM Shannon Duncan <joseph.dun...@liveramp.com> wrote:
>>>>>>>
>>>>>>>> I just started learning Java today to attempt to convert our Python pipelines to Java to take advantage of key features that Java has. I have no idea how I would create a new coder and include it for Beam to recognize.
>>>>>>>>
>>>>>>>> If you can point me in the right direction of where it hooks together I might be able to figure that out. I can duplicate MapCoder and try to make changes, but how will Beam know to pick up that coder for a GroupByKey?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Shannon
>>>>>>>>
>>>>>>>> On Thu, Jul 11, 2019 at 4:42 PM Rui Wang <ruw...@google.com> wrote:
>>>>>>>>
>>>>>>>>> It could be straightforward to create a SortedMapCoder for TreeMap. Just add checks on map instances and then change verifyDeterministic.
>>>>>>>>>
>>>>>>>>> If this is a common need we could just submit it to the Beam repo.
>>>>>>>>>
>>>>>>>>> [1]: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java#L146
>>>>>>>>>
>>>>>>>>> On Thu, Jul 11, 2019 at 2:26 PM Mike Pedersen <m...@mikepedersen.dk> wrote:
>>>>>>>>>
>>>>>>>>>> There isn't a coder for deterministic maps in Beam, so even if your data structure is deterministic, Beam will assume the serialized bytes aren't deterministic.
>>>>>>>>>>
>>>>>>>>>> You could make one using the MapCoder as a guide:
>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java
>>>>>>>>>> Just change it such that the exception in verifyDeterministic is removed and, when decoding, it instantiates a TreeMap or such instead of a HashMap.
>>>>>>>>>>
>>>>>>>>>> Alternatively, you could just represent your key as a sorted list of KV pairs. Lookups could be done using binary search if necessary.
>>>>>>>>>>
>>>>>>>>>> Mike
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 11, 2019 at 22:41, Shannon Duncan <joseph.dun...@liveramp.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> So I'm working on essentially doing a word count on a complex data structure.
>>>>>>>>>>>
>>>>>>>>>>> I tried just using a HashMap as the structure, but that didn't work because it is non-deterministic.
>>>>>>>>>>>
>>>>>>>>>>> However, when given a LinkedHashMap or TreeMap, which is deterministic, the SDK complains that it's non-deterministic when trying to use it as a key for GroupByKey.
>>>>>>>>>>>
>>>>>>>>>>> What would be an appropriate Map-style data structure that would be deterministic enough for Apache Beam to accept it as a key?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Shannon
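
For anyone finding this thread later, here is a rough sketch of the idea Mike and Rui describe: if entries are always written in sorted key order, two equal maps serialize to the same bytes no matter what order they were built in. This uses only the JDK, not Beam. The class name SortedMapBytes and the byte layout are made up for illustration, and it assumes the map uses natural String ordering with no null values. It is not the TreeMapCoder from the thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical helper (not part of Beam): encodes a sorted map in key order. */
final class SortedMapBytes {
    private SortedMapBytes() {}

    /** Writes the entry count, then each key and its integer list, in the map's iteration order. */
    static byte[] encode(SortedMap<String, List<Integer>> map) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(map.size());
            // A SortedMap iterates in key order, so the byte layout does not
            // depend on the order in which entries were inserted.
            for (Map.Entry<String, List<Integer>> entry : map.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeInt(entry.getValue().size());
                for (int value : entry.getValue()) {
                    out.writeInt(value);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads the layout written by encode and rebuilds a TreeMap rather than a HashMap. */
    static TreeMap<String, List<Integer>> decode(byte[] bytes) {
        TreeMap<String, List<Integer>> map = new TreeMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int entries = in.readInt();
            for (int i = 0; i < entries; i++) {
                String key = in.readUTF();
                int size = in.readInt();
                List<Integer> values = new ArrayList<>(size);
                for (int j = 0; j < size; j++) {
                    values.add(in.readInt());
                }
                map.put(key, values);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return map;
    }
}
```

In a real Beam coder the same loop would presumably sit in the coder's encode and decode methods, delegating to the key and value coders instead of DataOutputStream, with verifyDeterministic checking those component coders rather than throwing unconditionally. Treat that mapping as an assumption to check against MapCoder, not as a description of Beam's code.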