[python SDK] Returning Pub/Sub message_id and timestamp

2019-07-12 Thread Matthew Darwin
Good morning, I'm very new to Beam and pretty new to Python, so please first accept my apologies for any obvious misconceptions/mistakes in the following. I am currently trying to develop a sample pipeline in Python to pull messages from Pub/Sub and then write them to either files in cloud stor

Re: [Java] TextIO not reading file as expected

2019-07-12 Thread Shannon Duncan
So as it turns out, it was an STDOUT issue with my logging and not a problem with the data read. Beam operated just fine, but the way I was debugging was causing the glitches. Beam is operating as expected now. On Thu, Jul 11, 2019 at 10:28 PM Kenneth Knowles wrote: > Doesn't sound good. TextIO has been around

Re: [Java] Using a complex datastructure as Key for KV

2019-07-12 Thread Shannon Duncan
So I have my custom coder created for TreeMap and I'm ready to set it... So my type is "TreeMap<String, List<Integer>>". What do I put for ".setCoder(TreeMapCoder.of(???, ???))"? On Thu, Jul 11, 2019 at 8:21 PM Rui Wang wrote: > Hi Shannon, [1] will be a good start on coder in Java SDK. > > > [1] > https://beam.apac

Re: [Java] Using a complex datastructure as Key for KV

2019-07-12 Thread Lukasz Cwik
TreeMapCoder.of(StringUtf8Coder.of(), ListCoder.of(VarIntCoder.of())); On Fri, Jul 12, 2019 at 10:22 AM Shannon Duncan wrote: > So I have my custom coder created for TreeMap and I'm ready to set it... > > So my Type is "TreeMap>" > > What do I put for ".setCoder(TreeMapCoder.of(???, ???))" > > O
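The composition pattern behind `TreeMapCoder.of(StringUtf8Coder.of(), ListCoder.of(VarIntCoder.of()))` can be sketched in Python: a container coder takes the coders of its component types as arguments, so the nesting of coders mirrors the nesting of the type `TreeMap<String, List<Integer>>`. All class names below are illustrative stand-ins, not Beam APIs (the int coder here is fixed-width, unlike Beam's VarIntCoder):

```python
import struct

class IntCoder:
    """Illustrative fixed-width int coder (not Beam's VarIntCoder)."""
    def encode(self, n):
        return struct.pack(">q", n)
    def decode(self, buf):
        return struct.unpack(">q", buf[:8])[0], buf[8:]

class StrCoder:
    """Length-prefixed UTF-8 string coder."""
    def encode(self, s):
        b = s.encode("utf-8")
        return struct.pack(">i", len(b)) + b
    def decode(self, buf):
        n = struct.unpack(">i", buf[:4])[0]
        return buf[4:4 + n].decode("utf-8"), buf[4 + n:]

class ListCoder:
    """Encodes a length prefix, then each element with the element coder."""
    def __init__(self, elem_coder):
        self.elem = elem_coder
    def encode(self, xs):
        out = struct.pack(">i", len(xs))
        for x in xs:
            out += self.elem.encode(x)
        return out
    def decode(self, buf):
        n, buf = struct.unpack(">i", buf[:4])[0], buf[4:]
        xs = []
        for _ in range(n):
            x, buf = self.elem.decode(buf)
            xs.append(x)
        return xs, buf

class MapCoder:
    """Like TreeMapCoder.of(keyCoder, valueCoder): built from the coders
    of its key and value types."""
    def __init__(self, kc, vc):
        self.kc, self.vc = kc, vc
    def encode(self, d):
        out = struct.pack(">i", len(d))
        for k in sorted(d):  # deterministic order, like a TreeMap
            out += self.kc.encode(k) + self.vc.encode(d[k])
        return out
    def decode(self, buf):
        n, buf = struct.unpack(">i", buf[:4])[0], buf[4:]
        d = {}
        for _ in range(n):
            k, buf = self.kc.decode(buf)
            v, buf = self.vc.decode(buf)
            d[k] = v
        return d, buf

# Nested coders mirror the nested type TreeMap<String, List<Integer>>.
coder = MapCoder(StrCoder(), ListCoder(IntCoder()))
data = {"a": [1, 2], "b": [3]}
decoded, rest = coder.decode(coder.encode(data))
```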

Re: [Java] Using a complex datastructure as Key for KV

2019-07-12 Thread Shannon Duncan
Aha, makes sense. Thanks! On Fri, Jul 12, 2019 at 9:26 AM Lukasz Cwik wrote: > TreeMapCoder.of(StringUtf8Coder.of(), ListCoder.of(VarIntCoder.of())); > > On Fri, Jul 12, 2019 at 10:22 AM Shannon Duncan < > joseph.dun...@liveramp.com> wrote: > >> So I have my custom coder created for TreeMap and

Re: [Java] Using a complex datastructure as Key for KV

2019-07-12 Thread Shannon Duncan
I have a working TreeMapCoder now. Got it all setup and done, and the GroupByKey is accepting it. Thanks for all the help. I need to read up more on contributing guidelines then I'll PR the coder into the SDK. Also willing to write coders for things such as ArrayList etc if people want them. On F

Re: [Java] Using a complex datastructure as Key for KV

2019-07-12 Thread Lukasz Cwik
Additional coders would be useful. Note that we usually don't have coders for specific collection types like ArrayList but prefer to have coders for their general counterparts like List, Map, and Iterable. There has been discussion in the past to make the MapCoder a deterministic coder when a cod
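The determinism point raised here can be seen with a toy example: for a value to serve as a GroupByKey key, equal values must encode to identical bytes, and map encodings that follow insertion order break this. JSON is used below purely as an illustration, not as how Beam encodes anything:

```python
import json

a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}

# Insertion-order encoding: equal maps can produce different bytes,
# so a runner could fail to group them under one key.
enc_a = json.dumps(a)
enc_b = json.dumps(b)
assert a == b and enc_a != enc_b

# Canonical key order makes the encoding deterministic.
det_a = json.dumps(a, sort_keys=True)
det_b = json.dumps(b, sort_keys=True)
```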

Re: [Java] Using a complex datastructure as Key for KV

2019-07-12 Thread Shannon Duncan
I tried to pass ArrayList in and it wouldn't generalize it to List. It required me to convert my ArrayLists to Lists. On Fri, Jul 12, 2019 at 10:20 AM Lukasz Cwik wrote: > Additional coders would be useful. Note that we usually don't have coders > for specific collection types like ArrayList bu

Re: [python SDK] Returning Pub/Sub message_id and timestamp

2019-07-12 Thread Valentyn Tymofieiev
Hi Matthew, welcome to Beam! Looking at the Python PubSub IO API, you should be able to access the id and timestamp by setting `with_attributes=True` when using the `ReadFromPubSub` PTransform; see [1,2]. [1] https://github.com/apache/beam/blob/0fce2b88660f52dae638697e1472aa108c982ae6/sdks/python/apache_bea
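A minimal sketch of the approach described above: with `with_attributes=True`, `ReadFromPubSub` emits message objects carrying the payload plus an attributes map, and the extraction would live inside a DoFn or Map. `FakePubsubMessage` and the attribute names here are illustrative stand-ins (assumptions), not Beam API, so the logic runs without a live subscription:

```python
class FakePubsubMessage:
    """Hypothetical stand-in for a Pub/Sub message with attributes."""
    def __init__(self, data, attributes):
        self.data = data              # payload bytes
        self.attributes = attributes  # dict of string attributes

def extract_id_and_timestamp(msg):
    # Pull the server-assigned fields out of the attributes map; the exact
    # attribute keys depend on how the subscription/IO is configured.
    return {
        "payload": msg.data,
        "message_id": msg.attributes.get("message_id"),
        "publish_time": msg.attributes.get("publish_time"),
    }

msg = FakePubsubMessage(
    b"hello", {"message_id": "123", "publish_time": "2019-07-12T00:00:00Z"}
)
record = extract_id_and_timestamp(msg)
```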

[python] ReadFromPubSub broken in Flink

2019-07-12 Thread Chad Dombrova
Hi all, This error came as a bit of a surprise. Here’s a snippet of the traceback (full traceback below). File "apache_beam/runners/common.py", line 751, in apache_beam.runners.common.DoFnRunner.process return self.do_fn_invoker.invoke_process(windowed_value) File "apache_beam/runners/com

Re: [Python] Read Hadoop Sequence File?

2019-07-12 Thread Shannon Duncan
Awesome. I got it working for a single file, but for a structure of /part-0001/index, /part-0001/data, /part-0002/index, /part-0002/data, I tried to do /part-* and /part-*/data. It does not find the multipart files. However, if I just do /part-0001/data it will find it and read it. Any ideas why?

Re: [Python] Read Hadoop Sequence File?

2019-07-12 Thread Shannon Duncan
Clarification on my previous message: this only happens on the local file system, where it is unable to match a pattern string. Via a `gs://` path it is able to do multiple-file matching. On Fri, Jul 12, 2019 at 1:36 PM Shannon Duncan wrote: > Awesome. I got it working for a single file, but for a structure
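For reference, the directory layout from this thread matched with Python's stdlib `glob`. Beam's FileSystems matchers implement their own pattern rules, so this only illustrates what the `part-*/data` pattern is expected to select; it does not reproduce the Beam local-filesystem behavior being reported:

```python
import glob
import os
import tempfile

# Recreate the multipart layout described in the thread.
root = tempfile.mkdtemp()
for part in ("part-0001", "part-0002"):
    os.makedirs(os.path.join(root, part))
    for name in ("index", "data"):
        open(os.path.join(root, part, name), "w").close()

# With stdlib globbing, "part-*/data" selects the data file in each part dir.
matches = sorted(glob.glob(os.path.join(root, "part-*", "data")))
```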

Re: [Java] TextIO not reading file as expected

2019-07-12 Thread Kenneth Knowles
Glad to hear it :-) On Fri, Jul 12, 2019 at 6:33 AM Shannon Duncan wrote: > So as it turns out, it was an STDOUT issue for my logging and not a data > read in. Beam operated just fine but the way I was debugging was causing > the glitches. > > Beam is operating as expected now. > > On Thu, Jul 1