Re: Is a window function necessary when using a KafkaIO source and a BigQueryIO sink?

2019-01-29 Thread Kaymak, Tobias
I am feeling a bit stupid, but I haven't had time to try out the different possibilities to model the Kafka -> Partitioned-Table-in-BQ pipeline in Beam, until now. I am using the snapshort 2.10 version at the moment and your comment was on point: After rewriting the pipeline (which limits it to dea

Re: Is a window function necessary when using a KafkaIO source and a BigQueryIO sink?

2019-01-29 Thread Reza Rokni
Thanx Tobi very interesting ! Sorry for the long gaps in email, I am based out of Singapore :-) I wonder if this is coming from the use of .to( partitionedTableDynamicDestinations), which I am guessing accesses the Intervalwindow. With the use of the Time column based partitioning in my pipeline

Re: kafkaIO Consumer Rebalance with Spark Runner

2019-01-29 Thread Alexey Romanenko
Rick, I think “spark.streaming.kafka.maxRatePerPartition” won’t work for you since, afaik, it’s a configuration option of Spark Kafka reader and Beam KafkaIO doesn’t use it (since it has own consumer implementation). In the same time, if you want to set an option for Beam KafkaIO consumer confi

Re: Spark: No TransformEvaluator registered

2019-01-29 Thread Matt Casters
That was it Juan Carlos. Can't thank you enough. I now have generic Kettle transformations running on the Direct Runner, Dataflow and Spark. Cheers, Matt --- Matt Casters attcast...@gmail.com> Senior Solution Architect, Kettle Project Founder Op di 29 jan. 2019 om 18:19 schreef Matt Casters :

Re: Spark: No TransformEvaluator registered

2019-01-29 Thread Matt Casters
No you're right. I got so focused on getting org.apache.hadoop.fs.FileSystem in order that I forgot about the other files. Doh! Thanks for the tip! --- Matt Casters attcast...@gmail.com> Senior Solution Architect, Kettle Project Founder Op di 29 jan. 2019 om 16:41 schreef Juan Carlos Garcia :

Re: Spark: No TransformEvaluator registered

2019-01-29 Thread Juan Carlos Garcia
Hi Matt, Are the META-INF/services files merged correctly on the fat-jar? On Tue, Jan 29, 2019 at 2:33 PM Matt Casters wrote: > Hi Beam! > > After I have my pipeline created and the execution started on the Spark > master I get this strange nugget: > > java.lang.IllegalStateException: No Tran

Re: No checkpoints for KafkaIO to BigQueryIO pipeline on Flink runner?

2019-01-29 Thread Maximilian Michels
Hi Tobias, It is normal to see "No restore state for UnbounedSourceWrapper" when not restoring from a checkpoint/savepoint. Just checking. You mentioned you set the checkpoint interval via: --checkpointingInterval=30 That means you have to wait 5 minutes until the first checkpoint will

Spark: No TransformEvaluator registered

2019-01-29 Thread Matt Casters
Hi Beam! After I have my pipeline created and the execution started on the Spark master I get this strange nugget: java.lang.IllegalStateException: No TransformEvaluator registered for BOUNDED transform Read(CompressedSource) I'm reading from a HDFS file (hdfs:///input/customers-noheader-1M.txt)

Re: Is a window function necessary when using a KafkaIO source and a BigQueryIO sink?

2019-01-29 Thread Kaymak, Tobias
I am using FILE_LOADS and no timestamp column, but partitioned tables. In this case I have seen the following error, when I comment the windowing out (direct runner): Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.ClassCastException: org.apache.beam.s

Re: Spark progress feedback

2019-01-29 Thread Matt Casters
OK folks, I figured it out. For the other people desperately clutching to years-old google results in the hope to find any hint... The Spark requirement to work with a fat jar caused a collision in the packaging on file: META-INF/services/org.apache.hadoop.fs.FileSystem This in turn erased cert

Re: No checkpoints for KafkaIO to BigQueryIO pipeline on Flink runner?

2019-01-29 Thread Kaymak, Tobias
If I have a pipeline running and I restart the taskmanager on which it's executing the log shows - I find the "No restore state for UnbounedSourceWrapper." interesting, as it seems to indicate that the pipeline never stored a state in the first place? Starting taskexecutor as a console application

Re: Is a window function necessary when using a KafkaIO source and a BigQueryIO sink?

2019-01-29 Thread Reza Rokni
Also my BQ table was partitioned based on a TIMESTAMP column, rather than being ingestion time based partitioning. Cheers Reza On Tue, 29 Jan 2019 at 15:44, Reza Rokni wrote: > Hya Tobi, > > When you mention failed do you mean you get an error on running the > pipeline or there is a incorrect d