Hello Lydian,
Do you always observe data loss? Or does it happen only when you
restart your pipeline from a Flink savepoint? If you lose data only between
restarts, is your issue similar to
https://github.com/apache/beam/issues/26041 ?
Best Regards,
Pavel Solomin
Same need here, using the Flink runner. We are processing a PCollection
(extracting features per element), then combining these into groups of
features and running the next operator on those groups.
Each group contains ~50 elements, so the parallelism of the operator
upstream of the groupby should be h
I'm running into an issue using ElasticsearchIO.read() to handle more than
one instance of a query. My queries are being built dynamically as a
PCollection based on an incoming group of values. I'm trying to see how to
set the .withQuery() parameter in a way that could provide this capability or an
Hi,
There is a Flink pipeline option `parallelism` that you can set:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/options/pipeline_options.py#L1504-L1510
This parallelism is applied to each step (there is no API to configure a
different value for each step). So if you have
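
[Editor's note] For reference, a minimal sketch of setting this option. The
link above is the Python SDK option; the sketch below uses the Java SDK's
FlinkPipelineOptions equivalent, which behaves the same way (one value for
the whole pipeline, no per-step override):

    import org.apache.beam.runners.flink.FlinkPipelineOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class ParallelismExample {
      public static void main(String[] args) {
        FlinkPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
        // Applied uniformly to every step; there is no per-step override.
        options.setParallelism(4);
        Pipeline pipeline = Pipeline.create(options);
        // ... build and run the pipeline ...
      }
    }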
Hi Sean,
I'm not an expert, but I think .withQuery() belongs to the
pipeline-construction stage rather than the runtime stage.
This means that ElasticsearchIO was built so that you can set the query
while the pipeline is being constructed, but it is not possible
at runtime, which means you
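
[Editor's note] To illustrate, a minimal sketch (Java SDK; the host and
index name are assumptions, and the exact ConnectionConfiguration.create()
signature varies across Beam versions). The query is a fixed string baked
in when the transform is constructed; it cannot vary per incoming element
once the pipeline is running:

    import org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO;
    import org.apache.beam.sdk.values.PCollection;

    // Query is set at pipeline-construction time, not at runtime.
    PCollection<String> docs =
        pipeline.apply(
            ElasticsearchIO.read()
                .withConnectionConfiguration(
                    ElasticsearchIO.ConnectionConfiguration.create(
                        new String[] {"http://localhost:9200"}, "my-index"))
                .withQuery("{\"query\": {\"match_all\": {}}}"));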
Hi community.
On this occasion I have a question about how to read a stream from Kafka
and write batches of data with the JDBC connector. The idea is to overwrite
a specific row if the row we want to insert has the same id
and a greater load_date_time. The conceptual pipeline loo
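
[Editor's note] A minimal sketch of the write side, assuming PostgreSQL, a
table named events, and a hypothetical EventRow class with getId(),
getValue() and getLoadDateTime() accessors. The ON CONFLICT clause only
overwrites a row when the incoming load_date_time is greater:

    import org.apache.beam.sdk.io.jdbc.JdbcIO;

    rows.apply(
        JdbcIO.<EventRow>write()
            .withDataSourceConfiguration(
                JdbcIO.DataSourceConfiguration.create(
                    "org.postgresql.Driver", "jdbc:postgresql://localhost:5432/db"))
            .withStatement(
                "INSERT INTO events (id, value, load_date_time) VALUES (?, ?, ?) "
                    + "ON CONFLICT (id) DO UPDATE SET "
                    + "value = EXCLUDED.value, load_date_time = EXCLUDED.load_date_time "
                    + "WHERE events.load_date_time < EXCLUDED.load_date_time")
            .withPreparedStatementSetter(
                (element, stmt) -> {
                  stmt.setLong(1, element.getId());
                  stmt.setString(2, element.getValue());
                  stmt.setTimestamp(3, element.getLoadDateTime());
                }));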
Yes, unfortunately the ES IO connector is not built in a way that can work
by taking inputs from a PCollection to issue reads. The most scalable way
to support this would be to revisit the implementation of the Elasticsearch
read transform and implement it as a SplittableDoFn.
On Wed, Apr 19, 2023 at
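
[Editor's note] Until such a rewrite exists, a common workaround is a plain
DoFn that issues one search per incoming query via the low-level
Elasticsearch REST client. A minimal sketch, with the host and index as
assumptions; note that, unlike a connector, this does not split or
parallelize large result sets:

    import java.io.IOException;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.http.HttpHost;
    import org.apache.http.util.EntityUtils;
    import org.elasticsearch.client.Request;
    import org.elasticsearch.client.Response;
    import org.elasticsearch.client.RestClient;

    // Takes query JSON strings as input, emits raw search responses.
    class SearchPerQueryFn extends DoFn<String, String> {
      private transient RestClient client;

      @Setup
      public void setup() {
        client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build();
      }

      @ProcessElement
      public void process(@Element String query, OutputReceiver<String> out)
          throws IOException {
        Request request = new Request("GET", "/my-index/_search");
        request.setJsonEntity(query);
        Response response = client.performRequest(request);
        out.output(EntityUtils.toString(response.getEntity()));
      }

      @Teardown
      public void teardown() throws IOException {
        if (client != null) {
          client.close();
        }
      }
    }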
Hi Pavel,
Thanks for the reply.
No, the event losses are not consistent. While we've been running our
pipelines in parallel (Beam vs. Spark), we are seeing some days with no
event loss and some days with some, but it's always less than 0.05%.
On Wed, Apr 19, 2023 at 8:07 AM Pavel Solomin wrote:
Thank you
Just to confirm: how did you configure Kafka offset commits? Did you have
this flag enabled?
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kafka/KafkaIO.Read.html#isCommitOffsetsInFinalizeEnabled--
On Thursday, 20 April 2023, Trevor Burke wrote:
> Hi Pavel,
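
[Editor's note] For context, the flag in question is set on the read
transform. A minimal sketch (topic, servers, and deserializers are
assumptions):

    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.kafka.common.serialization.LongDeserializer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    pipeline.apply(
        KafkaIO.<Long, String>read()
            .withBootstrapServers("broker:9092")
            .withTopic("events")
            .withKeyDeserializer(LongDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            // Commit offsets when the checkpoint that finalizes the output
            // completes, instead of relying on the consumer's auto-commit.
            .commitOffsetsInFinalize());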
Yes, we did enable this in our pipeline.
On Wed, Apr 19, 2023 at 5:00 PM Pavel Solomin wrote:
> Thank you
>
> Just to confirm: how did you configure Kafka offset commits? Did you have
> this flag enabled?
>
>
>
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kafka/Kafk
Hi!
On Sat, Mar 18, 2023 at 11:53 AM Ayoyinka Obisesan <
ayoyinkaobise...@gmail.com> wrote:
> cc-ing: user@beam.apache.org
>
> Please see the questions in the above email.
>
> Kind regards,
> Ayoyinka.
>
>
> On Fri, Mar 17, 2023 at 11:40 AM Ayoyinka Obisesan <
> ayoyinkaobise...@gmail.com> wrote: