Re: Can we use KafkaIO SplittableDoFn ?

2022-07-20 Thread John Casey via user
This is being temporarily disabled due to some substantial issues we've discovered with the SDF implementation. Ideally this is temporary, and will be resolved quickly. If the SDF issues do not affect you, you can avoid upgrading Beam versions temporarily. On Wed, Jul 20, 2022 at 9:00 AM Jean Wiss

Re: [troubleshooting] KafkaIO#write gets stuck "since the associated topicId changed from null to "

2022-09-13 Thread John Casey via user
Hi Evan, I haven't seen this before. Can you share your Kafka write configuration, and any other stack traces that could be relevant? John On Tue, Sep 13, 2022 at 10:23 AM Evan Galpin wrote: > Hi all, > > I've recently started using the KafkaIO connector as a sink, and am new to > Kafka in gen

Re: [troubleshooting] KafkaIO#write gets stuck "since the associated topicId changed from null to "

2022-09-13 Thread John Casey via user
Googling that error message returned https://stackoverflow.com/questions/71077394/kafka-producer-resetting-the-last-seen-epoch-of-partition-resulting-in-timeout and https://github.com/a0x8o/kafka/blob/master/clients/src/main/java/org/apache/kafka/clients/Metadata.java#L402 Which suggests that ther

Re: [troubleshooting] KafkaIO#write gets stuck "since the associated topicId changed from null to "

2022-09-13 Thread John Casey via user
o be supported by KafkaIO today? > > Thanks, > Evan > > On Tue, Sep 13, 2022 at 11:07 AM John Casey via user > wrote: > >> Googling that error message returned >> https://stackoverflow.com/questions/71077394/kafka-producer-resetting-the-last-seen-epoch-of-partition

Re: AvroIO.to(DynamicAvroDestinations) deprecated?

2022-09-13 Thread John Casey via user
Hi Steve, I've asked around, and it looks like this confusing state is due to a migration that isn't complete (and likely won't be until Beam 3.0). Here is the doc that explains some of the history: https://docs.google.com/document/d/1zcF4ZGtq8pxzLZxgD_JMWAouSszIf9LnFANWHKBsZlg/edit And a PR that

Re: AvroIO.to(DynamicAvroDestinations) deprecated?

2022-09-13 Thread John Casey via user
That would be great, thanks! On Tue, Sep 13, 2022 at 3:00 PM Steve Niemitz wrote: > Ah this is super useful context, thank you! I can submit a couple PRs to > get AvroIO.sink up to parity if that's the way forward. > > On Tue, Sep 13, 2022 at 2:53 PM John Casey via user >

Re: SSL issue: Kafka Avro write with Schema Registry (GCP)

2022-11-01 Thread John Casey via user
This issue indicates that your certificates don't have a path between the worker and the schema registry. You need to make sure your certs know that certification path, though I don't actually know how to ensure that. On Mon, Oct 31, 2022 at 6:13 PM Ahmet Altay wrote: > (moving this to the user

Re: SSL issue: Kafka Avro write with Schema Registry (GCP)

2022-11-02 Thread John Casey via user
Unfortunately, I don't have a working sample for this, apologies. On Tue, Nov 1, 2022 at 1:00 PM MURUGASUNDARAM ANNAMALAI < murugasundaram.annama...@davita.com> wrote: > Yes, Truststore is a JKS file and we are not sure how to inject path based > files in Google dataflow as how dataflow can identi

Re: [KafkaIO] Use of sinkGroupId with Exactly Once Semantics

2022-11-11 Thread John Casey via user
I haven't done this experimentally before, so take this with a grain of salt, but... Kafka Group Ids are essentially used to track where a logical (aka application level, not thread/machine level) producer / consumer is at. As such, I think it would be fine to use just one group id, even when writ

Re: major reduction is performance when using schema registry - KafkaIO

2023-04-12 Thread John Casey via user
My initial guess is that there are queries being made in order to retrieve the schemas, which would impact performance, especially if those queries aren't cached with Beam splitting in mind. I'm looking to improve our interaction with Kafka schemas in the next couple of quarters, so I'll keep this

Re: Doubts about kafka connector

2023-04-13 Thread John Casey via user
Hi Juan, Under normal usage, Kafka will maintain ordering within a partition without any extra work by you. For 2, you can use .commitOffsetsInFinalize to only commit back to the source topic once the pipeline has persisted the message, at which point it may not be fully processed, but it is guar
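
As a rough illustration of the per-partition ordering point above, here is a minimal sketch in plain Java (no Kafka or Beam dependency). It assumes that every update for a given row is sent with the same message key; the class name, the row keys, and the hashCode-based partition choice are hypothetical stand-ins (Kafka's own default partitioner uses a different hash), so treat this as a model of the idea rather than Kafka's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionOrderSketch {
    // Hypothetical key-based partitioner: the same key always maps to the
    // same partition. Kafka's real default partitioner uses a different hash.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Appends each {key, value} update, in arrival order, to the log of the
    // partition selected by its key.
    static Map<Integer, List<String>> route(List<String[]> updates, int numPartitions) {
        Map<Integer, List<String>> partitions = new HashMap<>();
        for (String[] update : updates) {
            int partition = partitionFor(update[0], numPartitions);
            partitions.computeIfAbsent(partition, k -> new ArrayList<>())
                .add(update[0] + "=" + update[1]);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Interleaved updates for two hypothetical rows.
        List<String[]> updates = List.of(
            new String[] {"row-1", "v1"},
            new String[] {"row-2", "v1"},
            new String[] {"row-1", "v2"},
            new String[] {"row-1", "v3"});
        Map<Integer, List<String>> partitions = route(updates, 4);
        partitions.forEach((p, log) -> System.out.println("partition " + p + ": " + log));
    }
}
```

Because all updates for one key land in a single partition log, their relative order is kept there; updates for different keys may interleave or live in different partitions.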

Re: Doubts about kafka connector

2023-04-13 Thread John Casey via user
in this case the producer will push the row updates between > the different partitions and we can potentially lose the original message > order, because we are pushing the different updates referring to a specific > row across multiple partitions. > > Let me know if you can understand m

Dataloss Bug in BigQuery IO Storage Write when used in Batch

2023-05-03 Thread John Casey via user
Hi All, Per https://github.com/apache/beam/issues/26521 and https://github.com/apache/beam/issues/26520, there is an issue in Beam versions 2.33 - 2.47 where data can be lost when using the Storage Write API in Batch. This issue is much more likely to occur in versions 2.44-2.47. The bugs themsel
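
For readers who want to check a pipeline against the version ranges quoted above, here is a small hedged sketch in plain Java. The class name, enum, and method are hypothetical helpers, not part of Beam; the ranges (2.33 to 2.47 affected, 2.44 to 2.47 more likely) are taken from the message, and the check only matters for pipelines that use the Storage Write API in batch.

```java
public class StorageWriteVersionCheck {
    // Hypothetical risk labels for the ranges quoted in the announcement.
    enum Risk { NOT_AFFECTED, AFFECTED, MORE_LIKELY_AFFECTED }

    // Classifies a Beam version string such as "2.46.0" by its major.minor part.
    static Risk classify(String beamVersion) {
        String[] parts = beamVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        if (major != 2 || minor < 33 || minor > 47) {
            return Risk.NOT_AFFECTED;
        }
        return minor >= 44 ? Risk.MORE_LIKELY_AFFECTED : Risk.AFFECTED;
    }

    public static void main(String[] args) {
        for (String version : new String[] {"2.32.0", "2.40.0", "2.46.0", "2.48.0"}) {
            System.out.println(version + " -> " + classify(version));
        }
    }
}
```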

How to Write a Beam IO

2023-06-13 Thread John Casey via user
By request, I'm resharing my slides and doc on how to write a Beam IO: https://docs.google.com/document/d/1-WxZTNu9RrLhh5O7Dl5PbnKqz3e5gm1x3gDBBhszVF8/edit?usp=sharing https://docs.google.com/presentation/d/14PjBNFoCOFOROiQCdR3hkbg1fkDGSz-0Aer4L88P8Uk/edit?usp=sharing Please feel free to comment

Re: Create IO connector for HTTP or ParDO

2023-06-28 Thread John Casey via user
I have a doc ( https://docs.google.com/document/d/1-WxZTNu9RrLhh5O7Dl5PbnKqz3e5gm1x3gDBBhszVF8/edit#heading=h.n02teqc95avo) on writing an IO in Beam. Some of it is specific to using SplittableDoFn, but most of it is applicable to doing IO at scale in Beam in general. I hope this helps. John On Mo

Re: [question]Best practices for branching pipeline.

2023-08-09 Thread John Casey via user
Depending on the specifics of your processing, it may be simpler to just do both transforms within a single ParDo, i.e.:

    pipeline.apply(kafka.read())
        .apply(ParDo.of(new UserTransform()));

    public static class UserTransform extends DoFn{
      @ProcessElement
      public void processElement(@Element KafkaRec

Re: [question]Best practices for branching pipeline.

2023-08-10 Thread John Casey via user
On Wed, Aug 9, 2023 at 12:16 PM John Casey via user > wrote: > >> Depending on the specifics of your processing, it may be simpler to just >> do both transforms within a single pardo. >> >> i.e. >> >> pipeline.apply(kafka.read()) >> .appl

Re: Questions about writing to BigQuery using storage api

2023-12-05 Thread John Casey via user
Hi, Could you add some more detail? Which beam version are you using? On Tue, Dec 5, 2023 at 1:52 PM hsy...@gmail.com wrote: > Any one has experience in writing to BQ using storage api > > I tried to use it because according to the document it is more efficient > but I got error below > > 2023

Re: Streaming management exception in the sink target.

2023-12-06 Thread John Casey via user
I'm currently implementing improvements on Kafka, File, Spanner, and Bigtable IOs. I'm planning on tackling PubSub and BQ next year. All of this is still in progress though, so there aren't easy workarounds for the moment. On Tue, Dec 5, 2023 at 5:56 PM Robert Bradshaw wrote: > Currently error

Re: Streaming management exception in the sink target.

2023-12-06 Thread John Casey via user
ink and put it in another target (eg: kafka error topic where we can > verify which messages failed to be delivered to the target)? > > > On Wed, Dec 6, 2023 at 10:40, John Casey via user > wrote: > >> I'm currently implementing improvements on Kafka, File, Spanner,

Re: Questions about writing to BigQuery using storage api

2023-12-06 Thread John Casey via user
orInfo() > .withMethod(Write.Method.STORAGE_WRITE_API) > .withTriggeringFrequency(Duration.standardSeconds(10)) > .withAutoSharding().optimizedWrites() > .withFailedInsertRetryPolicy(retryTransientErrors()); > > > On Tue, Dec 5, 2023 at 11:20 AM John Casey via user > wrote: >

Re: Streaming management exception in the sink target.

2023-12-06 Thread John Casey via user
not possible to get the message that can't reach the target >>> sink and put it in another target (eg: kafka error topic where we can >>> verify which messages failed to be delivered to the target)? >>> >>> >>> On Wed, Dec 6, 2023 at 10:40, Jo

Re: Streaming management exception in the sink target.

2023-12-06 Thread John Casey via user
AM Juan Romero wrote: >>>> >>>>> But, is it not possible to get the message that can't reach the >>>>> target sink and put it in another target (eg: kafka error topic where we >>>>> can verify which messages failed to be delivered to

Re: Questions about writing to BigQuery using storage api

2023-12-06 Thread John Casey via user
3 John Casey via user > wrote: > >> Well, that is odd. It looks like the underlying client is closed, which >> is unexpected. >> >> Do you see any retries in your pipeline? Also, what runner are you using? >> >> @Ahmed Abualsaud this might be interesting >>

Re: [Question] Timeout expired while fetching topic metadata

2024-11-20 Thread John Casey via user
Hi Utkarsh, I can't say for certain without more details, but timeouts against Kafka usually represent auth issues. By and large, when a user tries to connect but doesn't have permission, Kafka times out the connection instead of sending back a "not authenticated/authorized" message. This can al
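
If a timeout like this does turn out to be an auth problem, the first thing to verify is usually the client security settings. Below is a hedged sketch in plain Java that assembles such settings into a map. It assumes, purely for illustration, a cluster that uses SASL/PLAIN over TLS; the class name, method, and credentials are hypothetical placeholders, and the right mechanism depends on how the cluster is actually configured. The keys themselves (security.protocol, sasl.mechanism, sasl.jaas.config) are standard Kafka client settings.

```java
import java.util.HashMap;
import java.util.Map;

public class KafkaAuthConfigSketch {
    // Builds client settings for an assumed SASL/PLAIN-over-TLS cluster.
    // Username and password are placeholders supplied by the caller.
    static Map<String, Object> saslSslConfig(String username, String password) {
        Map<String, Object> config = new HashMap<>();
        config.put("security.protocol", "SASL_SSL");
        config.put("sasl.mechanism", "PLAIN");
        config.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"" + username + "\" password=\"" + password + "\";");
        return config;
    }

    public static void main(String[] args) {
        Map<String, Object> config = saslSslConfig("example-user", "example-password");
        // Print only the setting names so that credentials are not logged.
        config.keySet().forEach(System.out::println);
    }
}
```

With KafkaIO, a map like this can typically be passed to the reader through withConsumerConfigUpdates, so a setting that is missing or wrong here is a reasonable first suspect when metadata fetches time out.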