[Java] TextIO not reading file as expected

2019-07-11 Thread Shannon Duncan
I have a tab-delimited file in which every line is a record. However, when I read this file using TextIO.read().from(filename) and pass the results to a ParDo, the elements are random chunks of the records. I expected each element to be the entire line of text which then

SDK support status clarification

2019-07-11 Thread Neville Li
Hi all, more specifically Googlers here, I want to clarify the Beam SDK support status w.r.t. Dataflow runner here: https://cloud.google.com/dataflow/docs/support/sdk-version-support-status When a Beam SDK is deprecated, what does it mean for users running it on Dataflow? The page mentions that

Re: SDK support status clarification

2019-07-11 Thread Kenneth Knowles
That page is a better authority than this list. It has all the public information and is up to date. What you may be most interested in is the orange box describing the decommissioning you mention: "The new end date has yet to be finalized but is expected to happen in 2019. When decommissioning ha

[Java] Using a complex datastructure as Key for KV

2019-07-11 Thread Shannon Duncan
So I'm working on essentially doing a word count on a complex data structure. I tried just using a HashMap as the structure, but that didn't work because it is non-deterministic. However, when given a LinkedHashMap or TreeMap, which is deterministic, the SDK complains that it's non-deterministic whe

Re: [Java] Using a complex datastructure as Key for KV

2019-07-11 Thread Mike Pedersen
There isn't a coder for deterministic maps in Beam, so even if your datastructure is deterministic, Beam will assume the serialized bytes aren't deterministic. You could make one using the MapCoder as a guide: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/
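To illustrate why byte-level determinism matters for a key, here is a minimal plain-Java sketch. It does not use Beam and is not Beam's MapCoder; the `encode` helper and the class name are hypothetical, and the byte layout is only an assumption chosen for illustration. It shows that two equal maps can serialize to different bytes when entries are written in iteration order, while sorting the entries first (here via TreeMap) gives identical bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

final class DeterministicEncodingSketch {

  // Hypothetical helper (not a Beam API): writes the entry count, then each
  // entry in whatever order the map iterates.
  static byte[] encode(Map<String, Integer> map) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(map.size());
      for (Map.Entry<String, Integer> entry : map.entrySet()) {
        out.writeUTF(entry.getKey());
        out.writeInt(entry.getValue());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    // Same contents, different insertion order.
    Map<String, Integer> first = new LinkedHashMap<>();
    first.put("a", 1);
    first.put("b", 2);
    Map<String, Integer> second = new LinkedHashMap<>();
    second.put("b", 2);
    second.put("a", 1);

    // The maps are equal as maps.
    System.out.println(first.equals(second)); // true

    // Encoding in iteration order yields different bytes for equal maps.
    System.out.println(Arrays.equals(encode(first), encode(second))); // false

    // Copying into a TreeMap sorts by key, so the bytes match.
    System.out.println(
        Arrays.equals(encode(new TreeMap<>(first)), encode(new TreeMap<>(second)))); // true
  }
}
```

A grouping step that compares encoded keys would treat `first` and `second` as different keys in the unsorted case, which is the kind of problem a deterministic-coder check is meant to prevent.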

Re: [Java] Using a complex datastructure as Key for KV

2019-07-11 Thread Rui Wang
It could be fairly straightforward to create a SortedMapCoder for TreeMap: just add checks on map instances and then change verifyDeterministic. If this is a common need we could submit it to the Beam repo. [1]: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/b

Re: [Java] Using a complex datastructure as Key for KV

2019-07-11 Thread Shannon Duncan
I just started learning Java today to attempt to convert our Python pipelines to Java to take advantage of key features that Java has. I have no idea how I would create a new coder and include it so that Beam recognizes it. If you can point me in the right direction of where it hooks together I migh

Re: [Java] Using a complex datastructure as Key for KV

2019-07-11 Thread Rui Wang
Shannon, I agree with Mike that a List is a good workaround if the elements within the list are deterministic and you are eager to get your new pipeline working. Let me send back some pointers on adding a new coder later. -Rui

Re: [Java] Using a complex datastructure as Key for KV

2019-07-11 Thread Shannon Duncan
So ArrayList doesn't work either, so just a standard List?

Re: [Java] Using a complex datastructure as Key for KV

2019-07-11 Thread Rui Wang
I think Mike refers to ListCoder, which is deterministic if its elements are as well. Maybe you can search the repo for examples of ListCoder? -Rui
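As a rough illustration of the list workaround, the plain-Java sketch below flattens a map into a list whose order depends only on the map's contents. The class and the `toSortedList` helper are hypothetical names, and String keys and values are an assumption, since the thread does not say what the real structure holds. In a pipeline, such a `List<String>` could presumably be paired with `ListCoder.of(StringUtf8Coder.of())`, but that wiring is not shown in the thread and is left out here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MapAsListKeySketch {

  // Hypothetical helper: flattens a map to [k1, v1, k2, v2, ...] with the
  // entries sorted by key, so equal maps always give equal lists.
  static List<String> toSortedList(Map<String, String> map) {
    List<String> flat = new ArrayList<>();
    for (Map.Entry<String, String> entry : new TreeMap<>(map).entrySet()) {
      flat.add(entry.getKey());
      flat.add(entry.getValue());
    }
    return flat;
  }

  public static void main(String[] args) {
    Map<String, String> record = new HashMap<>();
    record.put("b", "2");
    record.put("a", "1");

    // The result does not depend on HashMap iteration order.
    System.out.println(toSortedList(record)); // [a, 1, b, 2]
  }
}
```

The sort by key is what makes the list usable as a key: without it, two equal maps could still produce lists in different orders.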

Re: [Java] Using a complex datastructure as Key for KV

2019-07-11 Thread Shannon Duncan
Will do. Thanks. A new coder for deterministic Maps would be great in the future. Thank you!

Re: [Java] Using a complex datastructure as Key for KV

2019-07-11 Thread Shannon Duncan
Was able to get it to use ArrayList by doing List<List<...>> result = new ArrayList<List<...>>(); Then storing my keys in a separate array that I'll pass in as a side input to key for the list of lists. Thanks for the help, lemme know more in the future about how coders work and instantiate and I'd love to help cont

Re: [Java] Using a complex datastructure as Key for KV

2019-07-11 Thread Rui Wang
Hi Shannon, [1] will be a good starting point on coders in the Java SDK. [1] https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety Rui

Re: [Java] TextIO not reading file as expected

2019-07-11 Thread Kenneth Knowles
Doesn't sound good. TextIO has been around a long time so I'm surprised. Would you mind creating a ticket in Jira (https://issues.apache.org/jira/projects/BEAM/) and posting some technical details, like input/output/code snippets? Kenn