Re: How Beam Pipeline Handle late events

2023-04-19 Thread Pavel Solomin
Hello Lydian, Do you always observe data loss? Or - maybe, it happens only when you restart your pipeline from a Flink savepoint? If you lose data only between restarts - is you issue similar to https://github.com/apache/beam/issues/26041 ? Best Regards, Pavel Solomin Tel: +351 962 950 692 | Sky

Re: Is there any way to set the parallelism of operators like group by, join?

2023-04-19 Thread Nimalan Mahendran
Same need here, using Flink runner. We are processing a pcollection (extracting features per element) then combining these into groups of features and running the next operator on those groups. Each group contains ~50 elements, so the parallelism of the operator upstream of the groupby should be h

Q: Apache Beam IOElasticsearchIO.read() method (Java), which expects a PBegin input and a means to handle a collection of queries

2023-04-19 Thread Murphy, Sean P. via user
I'm running into an issue using the ElasticsearchIO.read() to handle more than one instance of a query. My queries are being dynamically built as a PCollection based on an incoming group of values. I'm trying to see how to load the .withQuery() parameter which could provide this capability or an

Re: Is there any way to set the parallelism of operators like group by, join?

2023-04-19 Thread Ning Kang via user
Hi, There is a flink pipeline option `parallelism` that you can set: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/options/pipeline_options.py#L1504-L1510 . This parallelism is applied to each step (there is no API to configure a different value for each step). So if you have

Re: Q: Apache Beam IOElasticsearchIO.read() method (Java), which expects a PBegin input and a means to handle a collection of queries

2023-04-19 Thread Shahar Frank
Hi Sean, I'm not an expert but I think the .withQuery() functions takes part of the build stage rather than the runtime stage. This means that the way ElasticsearchIO was built is so that while the pipeline is being built you could set the query but it is not possible during runtime which mean you

Can I batch data when i use JDBC write operation?

2023-04-19 Thread Juan Romero
Hi community. On this occasion I have a doubt regarding how to read a stream from kafka and write batches of data with the jdbc connector. The idea is to override a specific row if the current row we want to insert into has the same id and the load_date_time is greater. The conceptual pipeline loo

Re: Q: Apache Beam IOElasticsearchIO.read() method (Java), which expects a PBegin input and a means to handle a collection of queries

2023-04-19 Thread Evan Galpin
Yes unfortunately the ES IO connector is not built in a way that can work by taking inputs from a PCollection to issue Reads. The most scalable way to support this is to revisit the implementation of Elasticsearch Read transform and instead implement it as a SplittableDoFn. On Wed, Apr 19, 2023 at

Re: How Beam Pipeline Handle late events

2023-04-19 Thread Trevor Burke
Hi Pavel, Thanks for the reply. No, the event losses are not consistent. While we've been running our pipelines in parallel (Beam vs Spark) we are seeing some days with no event loss and some days with some, but it's always less than 0.05% On Wed, Apr 19, 2023 at 8:07 AM Pavel Solomin wrote:

Re: How Beam Pipeline Handle late events

2023-04-19 Thread Pavel Solomin
Thank you Just to confirm: how did you configure Kafka offset commits? Did you have this flag enabled? https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kafka/KafkaIO.Read.html#isCommitOffsetsInFinalizeEnabled-- On Thursday, 20 April 2023, Trevor Burke wrote: > Hi Pavel,

Re: How Beam Pipeline Handle late events

2023-04-19 Thread Lydian
Yes, we did enabled this in our pipeline. On Wed, Apr 19, 2023 at 5:00 PM Pavel Solomin wrote: > Thank you > > Just to confirm: how did you configure Kafka offset commits? Did you have > this flag enabled? > > > > https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kafka/Kafk

Re: [Question] 2.47.0 Release

2023-04-19 Thread Ahmet Altay via user
Hi! On Sat, Mar 18, 2023 at 11:53 AM Ayoyinka Obisesan < ayoyinkaobise...@gmail.com> wrote: > cc-ing: user@beam.apache.org > > Please see the questions in the above email. > > Kind regards, > Ayoyinka. > > > On Fri, Mar 17, 2023 at 11:40 AM Ayoyinka Obisesan < > ayoyinkaobise...@gmail.com> wrote: