Reading BigQuery table containing a repeated field into POJOs

2020-04-17 Thread Joshua Bassett
Hi there, I'm trying to read rows from a BigQuery table that contains a repeated field into POJOs. Unfortunately, I'm running into issues and I can't figure it out. I have something like this: @DefaultSchema(JavaFieldSchema.class) class Article implements Serializable { public Long id; publi
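
The message is cut off above, but a minimal sketch of such a schema-annotated POJO might look like the following; the title and tags fields are assumptions for illustration, with the repeated BigQuery column mapped to a Java List:

    import java.io.Serializable;
    import java.util.List;
    import org.apache.beam.sdk.schemas.JavaFieldSchema;
    import org.apache.beam.sdk.schemas.annotations.DefaultSchema;

    // Hypothetical shape of the POJO described above. A BigQuery REPEATED column
    // maps to an array type in the Beam schema, so the matching POJO field is a List.
    @DefaultSchema(JavaFieldSchema.class)
    class Article implements Serializable {
      public Long id;
      public String title;      // assumed NULLABLE STRING column
      public List<String> tags; // assumed REPEATED STRING column
    }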

Re: Reading BigQuery table containing a repeated field into POJOs

2020-04-17 Thread Chamikara Jayalath
Do you have the full stack trace? Also, does readTableRows() work for you (without using schemas)? On Fri, Apr 17, 2020 at 3:44 AM Joshua Bassett wrote: > Hi there > > I'm trying to read rows from a BigQuery table that contains a repeated > field into POJOs. Unfortunately, I'm running into iss
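
For reference, a minimal sketch of the plain readTableRows() path suggested here, assuming an existing Pipeline p and a placeholder table spec; with this read, a repeated field simply arrives as a List value inside each TableRow:

    import com.google.api.services.bigquery.model.TableRow;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.values.PCollection;

    // Plain TableRow read, with no schema inference involved.
    PCollection<TableRow> rows =
        p.apply("ReadArticles",
            BigQueryIO.readTableRows().from("my-project:my_dataset.articles"));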

Re: Copying tar.gz libraries to apache-beam workers

2020-04-17 Thread Luke Cwik
When you said you checked '/usr/local/', did you check inside the docker container or on the VM itself? Have you tried adding an echo command before and after your script runs and looked for them in the stackdriver logs? It should help you locate any errors that might have happened when executing

Distributed Tracing in Apache Beam

2020-04-17 Thread Rion Williams
Hi all, I'm reaching out today to inquire if Apache Beam has any support or mechanisms to support some type of distributed tracing similar to something like Jaeger (https://www.jaegertracing.io/). Jaeger itself has a Java SDK, however due to the nature of Beam working with transforms that yield

Re: Copying tar.gz libraries to apache-beam workers

2020-04-17 Thread OrielResearch Eila Arich-Landkof
See inline — Eila www.orielesearch.com https://www.meetup.com/Deep-Learning-In-Production Sent from my iPhone > On Apr 17, 2020, at 11:32 AM, Luke Cwik wrote: > >  > When you said you checked '/usr/local/', did you check inside the docker > container or on the VM itself? Yes. Checked as

Re: Copying tar.gz libraries to apache-beam workers

2020-04-17 Thread Luke Cwik
On Dataflow you should be able to use /opt/userowned On Fri, Apr 17, 2020 at 9:01 AM OrielResearch Eila Arich-Landkof < e...@orielresearch.org> wrote: > See inline > > > — > Eila > www.orielesearch.com > https://www.meetu > p.co

Re: Distributed Tracing in Apache Beam

2020-04-17 Thread Alex Van Boxel
Can you explain a bit more about what you want to achieve here? Do you want to trace how your elements go through the pipeline, or do you want to see how every ParDo interacts with external systems? On Fri, Apr 17, 2020, 17:38 Rion Williams wrote: > Hi all, > > I'm reaching out today to inquire if Apach

Re: Distributed Tracing in Apache Beam

2020-04-17 Thread Rion Williams
Hi Alex, As mentioned before, I'm in the process of migrating a pipeline of several Kafka Streams applications over to Apache Beam and I'm hoping to leverage the tracing infrastructure that I had established using Jaeger whenever I can, but specifically to trace an element as it flows through a

Re: Distributed Tracing in Apache Beam

2020-04-17 Thread Alexey Romanenko
Not sure if it will help, but KafkaIO allows you to keep all of the record metadata while reading (using KafkaRecord) and writing (using ProducerRecord). So you can keep your tracing id in the record headers, as you did with Kafka Streams. > On 17 Apr 2020, at 18:58, Rion Williams wrote: > > Hi Alex,
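
A rough sketch of that header-based approach, assuming an existing Pipeline p, placeholder broker and topic names, and a hypothetical "trace-id" header key (the Kafka client in use must support record headers):

    import java.nio.charset.StandardCharsets;
    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.beam.sdk.io.kafka.KafkaRecord;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.kafka.common.header.Header;
    import org.apache.kafka.common.serialization.StringDeserializer;

    // Reading KafkaRecord (rather than dropping metadata) keeps the original
    // Kafka headers attached to each element.
    PCollection<KafkaRecord<String, String>> records =
        p.apply(KafkaIO.<String, String>read()
            .withBootstrapServers("broker:9092") // placeholder
            .withTopic("events")                 // placeholder
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class));

    records.apply(ParDo.of(new DoFn<KafkaRecord<String, String>, String>() {
      @ProcessElement
      public void process(ProcessContext c) {
        Header header = c.element().getHeaders().lastHeader("trace-id"); // hypothetical key
        String traceId = header == null
            ? null
            : new String(header.value(), StandardCharsets.UTF_8);
        // ... rehydrate the tracer / span context from traceId here ...
        c.output(c.element().getKV().getValue());
      }
    }));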

Re: Distributed Tracing in Apache Beam

2020-04-17 Thread Rion Williams
Hi Alexey, So this is currently the approach that I'm taking: basically creating a wrapper Traceable class that will contain all of my record information as well as the data necessary to update the traces for that record. It requires an extra step and will likely mean persisting something along
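
The message is truncated, but a wrapper of the kind described here might look roughly like the sketch below; the name and fields are assumptions, not the author's actual class. In a real pipeline the wrapper would also need a Coder (SerializableCoder works when the payload is Serializable), which is part of the extra friction discussed in the replies.

    import java.io.Serializable;
    import java.util.Map;

    // Hypothetical wrapper: carries the payload together with the propagated trace
    // context (for example, Kafka headers copied into a map) so that downstream
    // transforms can attach their spans to the same trace.
    class Traceable<T> implements Serializable {
      public final T value;
      public final Map<String, String> traceContext;

      Traceable(T value, Map<String, String> traceContext) {
        this.value = value;
        this.traceContext = traceContext;
      }
    }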

Re: Distributed Tracing in Apache Beam

2020-04-17 Thread Alexey Romanenko
Hi Rion, In general, yes, it sounds reasonable to me. I just do not see why you need to have an extra Traceable wrapper. Do you need to keep some temporary information there that you don’t want to store in Kafka record headers? PS: Now I have started to think that we probably have to change an interfa

Re: Distributed Tracing in Apache Beam

2020-04-17 Thread Rion Williams
Hi Alexey, I think you’re right about the wrapper; it’s likely unnecessary, as I think I’d have enough information in the headers to rehydrate my “tracer” that communicates the traces/spans to Jaeger as needed. I’d love to not have to touch those or muddy the waters with a wrapper class, additi

Re: Reading BigQuery table containing a repeated field into POJOs

2020-04-17 Thread Joshua Bassett
org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish (DirectRunner.java:348) at org.apache.beam.runners.direct.DirectRunner$Di

how to add schema to TableRow

2020-04-17 Thread Aniruddh Sharma
I want to do some SQL transforms on the following collection: PCollection allData = p.apply(tab, BigQueryIO.readTableRowsWithSchema() .from(optionsFromDB.get("stdBQTable")) .withMethod(BigQueryIO.TypedRead.Method.D
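
The code in the message is cut off, but the general pattern for running Beam SQL over a schema-aware BigQuery read looks roughly like this; the table spec, read method, and query are placeholders, and SqlTransform comes from the beam-sdks-java-extensions-sql module:

    import com.google.api.services.bigquery.model.TableRow;
    import org.apache.beam.sdk.extensions.sql.SqlTransform;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.schemas.transforms.Convert;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.Row;

    // readTableRowsWithSchema() attaches a Beam schema to the TableRows, which is
    // what allows schema transforms such as Convert and SqlTransform to run on them.
    PCollection<TableRow> allData =
        p.apply(BigQueryIO.readTableRowsWithSchema()
            .from("my-project:my_dataset.my_table")                // placeholder
            .withMethod(BigQueryIO.TypedRead.Method.DIRECT_READ)); // assumed read method

    PCollection<Row> result =
        allData
            .apply(Convert.toRows())
            .apply(SqlTransform.query(
                "SELECT id, COUNT(*) AS n FROM PCOLLECTION GROUP BY id")); // placeholder query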

Re: Copying tar.gz libraries to apache-beam workers

2020-04-17 Thread OrielResearch Eila Arich-Landkof
Thank you. I was able to tar my libraries at the /opt/userowned folder. I am using setup.py from this url (recommended in the apache-beam documentation): https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/juliaset/setup.py Last, I want to install anaconda. I have a l