This is being temporarily disabled due to some substantial issues we've
discovered with the SDF implementation.
Ideally this is temporary and will be resolved quickly. If the SDF issues
do not affect you, you can hold off on upgrading Beam versions for now.
On Wed, Jul 20, 2022 at 9:00 AM Jean Wiss
Hi Evan,
I haven't seen this before. Can you share your Kafka write configuration,
and any other stack traces that could be relevant?
John
On Tue, Sep 13, 2022 at 10:23 AM Evan Galpin wrote:
> Hi all,
>
> I've recently started using the KafkaIO connector as a sink, and am new to
> Kafka in gen
Googling that error message returned
https://stackoverflow.com/questions/71077394/kafka-producer-resetting-the-last-seen-epoch-of-partition-resulting-in-timeout
and
https://github.com/a0x8o/kafka/blob/master/clients/src/main/java/org/apache/kafka/clients/Metadata.java#L402
Which suggests that ther
o be supported by KafkaIO today?
>
> Thanks,
> Evan
>
> On Tue, Sep 13, 2022 at 11:07 AM John Casey via user
> wrote:
>
>> Googling that error message returned
>> https://stackoverflow.com/questions/71077394/kafka-producer-resetting-the-last-seen-epoch-of-partition
Hi Steve,
I've asked around, and it looks like this confusing state is due to a
migration that isn't complete (and likely won't be until Beam 3.0).
Here is the doc that explains some of the history:
https://docs.google.com/document/d/1zcF4ZGtq8pxzLZxgD_JMWAouSszIf9LnFANWHKBsZlg/edit
And a PR that
That would be great, thanks!
On Tue, Sep 13, 2022 at 3:00 PM Steve Niemitz wrote:
> Ah this is super useful context, thank you! I can submit a couple PRs to
> get AvroIO.sink up to parity if that's the way forward.
>
> On Tue, Sep 13, 2022 at 2:53 PM John Casey via user
>
This issue indicates that there is no valid certification path between the
worker and the schema registry, i.e. the worker's truststore doesn't trust
the certificate chain the registry presents. You need to make sure your
certs establish that certification path, though I don't actually know how to
ensure that.
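For what it's worth, a minimal sketch of the client-side properties involved might look like the following. Every path, hostname, and password here is a placeholder, and the `schema.registry.ssl.*` keys are the ones Confluent's serializers accept when configured this way; check your client's docs for the exact names.

```java
import java.util.Properties;

public class TruststoreConfigExample {
    public static Properties schemaRegistryClientProps() {
        Properties props = new Properties();
        // Hypothetical values: the truststore must contain the CA chain that
        // signed the schema registry's certificate, so the JVM can build a
        // certification path from the server's cert to a trusted root.
        props.put("schema.registry.url", "https://schema-registry.example.com:8081");
        props.put("schema.registry.ssl.truststore.location", "/path/to/client.truststore.jks");
        props.put("schema.registry.ssl.truststore.password", "changeit");
        return props;
    }

    public static void main(String[] args) {
        Properties props = schemaRegistryClientProps();
        System.out.println(props.getProperty("schema.registry.ssl.truststore.location"));
    }
}
```

On Dataflow, the truststore file itself still has to be reachable from the worker (e.g. staged onto the worker image or fetched at startup), which is the part the thread below is asking about.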
On Mon, Oct 31, 2022 at 6:13 PM Ahmet Altay wrote:
> (moving this to the user
Unfortunately, I don't have a working sample for this, apologies
On Tue, Nov 1, 2022 at 1:00 PM MURUGASUNDARAM ANNAMALAI <
murugasundaram.annama...@davita.com> wrote:
> Yes, the truststore is a JKS file and we are not sure how to inject
> path-based files in Google Dataflow, as how Dataflow can identi
I haven't done this experimentally before, so take this with a grain of
salt, but...
Kafka group IDs are essentially used to track where a logical (i.e.
application-level, not thread/machine-level) producer / consumer is at. As
such, I think it would be fine to use just one group id, even when writ
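As a concrete (hypothetical) illustration of the point above: the group id is plain client configuration, and every worker that supplies the same value is treated as one logical consumer, with Kafka splitting the topic's partitions among them against one shared set of committed offsets. The group name below is made up.

```java
import java.util.Properties;

public class GroupIdExample {
    // One logical consumer group for the whole application: every worker
    // that uses this group.id shares partition assignment and committed
    // offsets, regardless of how many threads or machines are involved.
    static final String APP_GROUP_ID = "my-beam-pipeline"; // hypothetical name

    public static Properties consumerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", APP_GROUP_ID);
        return props;
    }
}
```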
My initial guess is that there are queries being made in order to
retrieve the schemas, which would impact performance, especially if those
queries aren't cached with Beam splitting in mind.
I'm looking to improve our interaction with Kafka schemas in the next
couple of quarters, so I'll keep this
Hi Juan,
Under normal usage, Kafka will maintain ordering within a partition without
any extra work on your part.
For (2), you can use .commitOffsetsInFinalize() to commit offsets back to
the source topic only once the pipeline has persisted the message, at which
point it may not be fully processed, but it is guar
in this case the producer will push the row updates between
> the different partitions and we can potentially lose the original message
> order, because we are pushing the different updates referring to a specific
> row across multiple partitions.
>
> Let me know if you can understand m
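A sketch of why keying every update for a row with the same key keeps its order: Kafka's default partitioner is deterministic per key, so same key means same partition, and within a partition order is preserved. The code below only mirrors the spirit of that partitioner (the real one hashes with murmur2, not hashCode).

```java
public class RowKeyPartitioning {
    // Deterministic key-based routing: records with the same non-null key
    // always land in the same partition, so all updates for one row stay
    // ordered relative to each other.
    public static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int p1 = partitionFor("row-42", 6);
        int p2 = partitionFor("row-42", 6);
        // Same row id -> same partition, every time.
        System.out.println(p1 == p2);
    }
}
```

The flip side, as described in the quoted message, is that letting the producer spread a row's updates across partitions (e.g. with null keys and round-robin assignment) gives up that ordering guarantee.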
Hi All,
Per https://github.com/apache/beam/issues/26521 and
https://github.com/apache/beam/issues/26520, there is an issue in Beam
versions 2.33 - 2.47 where data can be lost when using the Storage Write
API in Batch. This issue is much more likely to occur in versions 2.44-2.47.
The bugs themsel
By request, I'm resharing my slides and doc on how to write a beam IO
https://docs.google.com/document/d/1-WxZTNu9RrLhh5O7Dl5PbnKqz3e5gm1x3gDBBhszVF8/edit?usp=sharing
https://docs.google.com/presentation/d/14PjBNFoCOFOROiQCdR3hkbg1fkDGSz-0Aer4L88P8Uk/edit?usp=sharing
Please feel free to comment
I have a doc (
https://docs.google.com/document/d/1-WxZTNu9RrLhh5O7Dl5PbnKqz3e5gm1x3gDBBhszVF8/edit#heading=h.n02teqc95avo)
on writing an IO in Beam.
Some of it is specific to using SplittableDoFn, but most of it is
applicable to doing IO at scale in Beam in general.
I hope this helps
John
On Mo
Depending on the specifics of your processing, it may be simpler to just do
both transforms within a single ParDo.
i.e.
pipeline.apply(KafkaIO.read())
.apply(ParDo.of(new UserTransform()));
public static class UserTransform extends DoFn<KafkaRecord<String, String>, String> {
@ProcessElement
public void processElement(@Element KafkaRecord<String, String> record, OutputReceiver<String> out) {
// run both transforms on the element here, then out.output(...)
}
}
On Wed, Aug 9, 2023 at 12:16 PM John Casey via user
> wrote:
>
>> Depending on the specifics of your processing, it may be simpler to just
>> do both transforms within a single pardo.
>>
>> i.e.
>>
>> pipeline.apply(kafka.read())
>> .appl
Hi,
Could you add some more detail? Which Beam version are you using?
On Tue, Dec 5, 2023 at 1:52 PM hsy...@gmail.com wrote:
> Any one has experience in writing to BQ using storage api
>
> I tried to use it because according to the document it is more efficient
> but I got error below
>
> 2023
I'm currently implementing improvements on Kafka, File, Spanner, and
Bigtable IOs.
I'm planning on tackling PubSub and BQ next year.
All of this is still in progress though, so there aren't easy workarounds
for the moment.
On Tue, Dec 5, 2023 at 5:56 PM Robert Bradshaw wrote:
> Currently error
ink and put it in another target (eg: kafka error topic where we can
> verify which messages failed to be delivered to the target)?
>
>
> El mié, 6 dic 2023 a las 10:40, John Casey via user ()
> escribió:
>
>> I'm currently implementing improvements on Kafka, File, Spanner,
orInfo()
> .withMethod(Write.Method.STORAGE_WRITE_API)
> .withTriggeringFrequency(Duration.standardSeconds(10))
> .withAutoSharding().optimizedWrites()
> .withFailedInsertRetryPolicy(retryTransientErrors());
>
>
> On Tue, Dec 5, 2023 at 11:20 AM John Casey via user
> wrote:
>
not possible to get the message that can't reach the target
>>> sink and put it in another target (eg: kafka error topic where we can
>>> verify which messages failed to be delivered to the target)?
>>>
>>>
>>> El mié, 6 dic 2023 a las 10:40, Jo
AM Juan Romero wrote:
>>>>
>>>>> But , is it not possible to get the message that can't reach the
>>>>> target sink and put it in another target (eg: kafka error topic where we
>>>>> can verify which messages failed to be delivered to
3 John Casey via user
> wrote:
>
>> Well, that is odd. It looks like the underlying client is closed, which
>> is unexpected.
>>
>> Do you see any retries in your pipeline? Also, what runner are you using?
>>
>> @Ahmed Abualsaud this might be interesting
>
Hi Utkarsh,
I can't say for certain without more details, but timeouts against Kafka
usually represent auth issues.
By and large, when a user tries to connect but doesn't have permission,
Kafka times out the connection instead of sending back a "not
authenticated/authorized" message.
This can al
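For reference, a sketch of what SASL client auth configuration typically looks like. The credentials are placeholders and the PLAIN mechanism is just one option; your cluster may require SCRAM, OAUTHBEARER, or something else, and a mismatch here is exactly the kind of misconfiguration that often surfaces as a timeout rather than an explicit auth error.

```java
import java.util.Properties;

public class KafkaAuthExample {
    // Hypothetical values throughout. When auth is misconfigured, many
    // Kafka setups time out the connection instead of returning a clear
    // "not authenticated/authorized" message.
    public static Properties saslProps(String user, String password) {
        Properties props = new Properties();
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"" + user + "\" password=\"" + password + "\";");
        return props;
    }
}
```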