Mickey Donaghy wrote:
2024-08-30 08:35:10.070 INFO 5000 --- [...tainer#33-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-kafka-group-9, groupId=kafka-group] Join group failed with org.apache.kafka.common.errors.RebalanceInProgressException: The group is rebalancing, so a rejoin is needed.
Hi,
I'm trying to do a join between a stream and a KTable with a grace period,
which requires a versioned table. However, even if I specify a serializer
everywhere, it seems this doesn't quite make it through to
the RocksDBTimeOrderedKeyValueBuffer and I get an error when
building/starting the topology.
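For anyone following along, the setup being described looks roughly like this (an untested sketch; grace-period stream-table joins need Kafka 3.6+, and all topic names, types and serdes here are placeholders):

StreamsBuilder builder = new StreamsBuilder();

// the table must be versioned for the grace period to work
KTable<String, Customer> customers = builder.table(
    "customers",
    Consumed.with(Serdes.String(), customerSerde),
    Materialized.as(
        Stores.persistentVersionedKeyValueStore("customers-versioned",
                                                Duration.ofHours(1))));

KStream<String, Order> orders = builder.stream(
    "orders", Consumed.with(Serdes.String(), orderSerde));

// the serdes for the join buffer are taken from the Joined parameters
orders.join(
    customers,
    (order, customer) -> enrich(order, customer),
    Joined.with(Serdes.String(), orderSerde, customerSerde)
          .withGracePeriod(Duration.ofMinutes(5)));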
To subscribe, please follow the instructions from the webpage
https://kafka.apache.org/contact
-Matthias
On 2/23/24 1:15 AM, kashi mori wrote:
Hi, please add my email to the mailing list
Hi Apache Kafka Community

Following the guide
<https://cwiki.apache.org/confluence/display/KAFKA/Reporting+Issues+in+Apache+Kafka>
to join the kafka developer community to contribute to kafka.

Kafka Jira ID: ericzhifengchen

Who am I: I'm a Software Engineer at Uber Technologies and work on kafka and
Kafka eco-system services.

Thanks,
Zhifeng
; >
> > KTable customers = streamBuilder.table(customerTopic,
> > Consumed.with());
> > KTable comm = streamBuilder.table(commTopic,
> > Consumed.with());
> > KTable labels = streamBuilder.table(labelTopic,
> > Consumed.with());
> > // Re-Keying
> > KTable groupedCom
Streams? Note that customers do not necessarily have any communication or
label at all, thus non-key joins are out of the game as far as I understand.

Our initial (naive) solution was to re-key the dependent KTables. This
requires a lot of steps and intermediate State Stores, especially when
considering the stated example is heavily simplified:

KTable customers = streamBuilder.table(customerTopic, Consumed.with());
KTable communications = streamBuilder.table(commTopic, Consumed.with());
KTable labels = streamBuilder.table(labelTopic, Consumed.with());
// Re-Keying
KTable groupedComm = communications.groupBy(...).aggregate(...);
KTable groupedLabels = labels.groupBy(...).aggregate(...);
// Join
KTable aggregated = customers
    .leftJoin(groupedComm, ...)
    .leftJoin(groupedLabels, ...);
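For readers trying to follow the elided steps, the re-keying of one dependent table could look roughly like this (hypothetical types, getters and serdes, untested):

KTable<String, Communication> communications = streamBuilder.table(
    commTopic, Consumed.with(Serdes.String(), commSerde));

// re-key by customer id and collect all communications per customer
KTable<String, List<Communication>> groupedComm = communications
    .groupBy(
        (commId, comm) -> KeyValue.pair(comm.getCustomerId(), comm),
        Grouped.with(Serdes.String(), commSerde))
    .aggregate(
        () -> new ArrayList<>(),
        (customerId, comm, agg) -> { agg.add(comm); return agg; },    // adder
        (customerId, comm, agg) -> { agg.remove(comm); return agg; }, // subtractor
        Materialized.with(Serdes.String(), commListSerde));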
fka.apache.org>
mailto:us...@kafka.apache.org>>
Betreff: Table updates are not consistent when doing a join with a Stream
Hello Folks,
We are having an issue with a Kafka Streams Java application.
We have a KStream and a KTable which are joined using a Left Join. The entries
in the KTable
2023 22:57
> An: users@kafka.apache.org<mailto:users@kafka.apache.org>
> mailto:us...@kafka.apache.org>>
> Betreff: Table updates are not consistent when doing a join with a Stream
>
> Hello Folks,
>
> We are having an issue with a Kafka Streams Java application.
I hope you find a solution to your problem.
Best,
Claudia
From: Mauricio Lopez
Sent: Thursday, 17 August 2023 22:57
To: users@kafka.apache.org
Subject: Table updates are not consistent when doing a join with a Stream
Hello Folks,
We are having an issue with a Kafka Streams Java application.
We have a KStream and a KTable which are joined using a Left Join. The entries
in the KTable are constantly updated by the new information that comes from the
KStream. Each KStream message is adding entries to an array
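For context, the shape of such a topology is roughly this (hypothetical names and types, untested):

KTable<String, Aggregate> table = builder.table("aggregates");

builder.<String, Event>stream("events")
    .leftJoin(table, (event, agg) -> agg == null
        ? Aggregate.of(event)           // first event for this key
        : agg.addEntry(event))          // append the event to the array
    .to("aggregates");                  // write back; this updates the KTable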
Hello,
I spotted a configuration issue related to Kafka Connect
(https://github.com/apache/kafka/pull/13398).
I think this issue has been opened at
https://issues.apache.org/jira/browse/KAFKA-6891. I am interested in
resolving this issue; I hope someone can let me join the contribution list.
My JIRA id is garyparrottt.
Many thanks in advance!
Good morning,
I work with Kafka and a friend has recommended I join your community. Could
I be added to your mailing list please?
Kind regards,
Tom
Hi,
You should be good to go now.
Cheers,
Chris
On Wed, Nov 9, 2022 at 12:52 AM hzhkafka wrote:
> Jira id: hzh0425@apache
>
Hi,
I'm trying to implement the event sourcing pattern with Kafka Streams. I have
two topics: commands and events.
Commands are modelled as a KStream. To get the current state for a given entity,
I create a KTable by aggregating (replaying) all the events.
Then I join the commands KStream with that state KTable.
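In code, that pattern is roughly (hypothetical types and helper methods, untested):

// replay all events per entity into a state table
KTable<String, EntityState> state = builder
    .<String, Event>stream("events")
    .groupByKey()
    .aggregate(EntityState::empty,
               (entityId, event, agg) -> agg.apply(event));

// join each command against the current state and emit new events
builder.<String, Command>stream("commands")
    .join(state, (command, current) -> handle(command, current))
    .to("events");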
... spurious messages in kafka streams 3.0.0?

thanks
- Miguel

On Mon, Dec 6, 2021 at 2:49 PM Matthias J. Sax wrote:

> It's fixed in upcoming 3.1 release.
>
> Cf https://issues.apache.org/jira/browse/KAFKA-10847
I had heard when doing a join, the timestamp of the generated
message is taken from the message triggering the join or the biggest
timestamp of the two.
In older versions it was the timestamp of the record that triggered the
join. Since 2.3, it is the maximum of both (cf
https
It's fixed in upcoming 3.1 release.
Cf https://issues.apache.org/jira/browse/KAFKA-10847
A stream-(global)table join has different semantics, so I am not sure if
it would help.
One workaround would be to apply a stateful `flatTransformValues()`
after the join to "buffer" a
mestamp+Synchronization>,
KIP-353
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Synchronization>
for more information.
Thank you.
Luke
On Sat, Dec 4, 2021 at 2:10 AM Miguel González
wrote:
> Hello
>
> So I've been using
Hello
So I've been using a Streams app to join two input topics... the messages
have a certain order... but I have seen the messages on the output topic
arriving with a different ordering. Even before the join, messages going
through a map/flatmap operation are processed with a different ordering.
Example: S
Hi Miguel,
> Is there a way to force the behavior I need, meaning... using left join
and
a JoinWindows output only one message (A,B) or (A, null)
I think you can try to achieve it by using *KStream-GlobalKTable left join*,
where the GlobalKTable reads all records from the right topic.
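A minimal sketch of that suggestion (topic names and types are made up, untested):

GlobalKTable<String, BalanceEvent> balances =
    builder.globalTable("balance-topic");

KStream<String, TransactionEvent> transactions =
    builder.stream("transaction-topic");

// every transaction produces exactly one output: (A,B) or (A,null)
transactions.leftJoin(
    balances,
    (txId, tx) -> txId,                        // map to the global table's key
    (tx, balance) -> buildMessage(balance, tx));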
valueJoiner =
    (transactionEvent, balanceEvent) -> buildMessage(balanceEvent, transactionEvent);

transactions
    // TODO: change to leftJoin
    .join(beWithTransaction, valueJoiner, joinWindows, joinParams)

It's pretty simple, but for my use case I need to process in some way the
messages
Hi,
* "... the KTable update won't be trigger on the join on the Transformer."
You are right, a KStream-KTable join doesn't get triggered on KTable updates:
https://docs.confluent.io/platform/current/streams/developer-guide/dsl-api.html#kstream-ktable-join
Only input records for the stream side trigger the join.
Hi,
I didn't reach any people on StackOverflow, so I try here :
https://stackoverflow.com/questions/67694907/kafka-stream-custom-join-using-state-store
I'm really stuck on that part and I have the feeling only the Processor API
can help me.
But since the stream is really complex, I wo
To: Users
Subject: Kafka Connect Dist. Worker Does not join group
Hello,
Has anyone experienced this scenario before when having a distributed connect
cluster:
[kafka-coordinator-heartbeat-thread | connect-cluster] INFO
org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Worker
clientId=connect-1, groupId=connect-cluster] Member
connect-1-50ee
Done.
On 12/9/20 7:16 PM, 642826683 wrote:
> Jira ID: lihongyi
>
> thanks
Hi,
Could you please add me to the JIRA Assignee list. I would like to start
contributing.
Jira username : ymxz
Full Name: songyingshuan
Apologies in case I've sent this request to the wrong mailing list.
Thanks,
Yingshuan Song
...avoid the repartition. If they are, then the repartition topics are indeed
necessary. Streams should ensure that the repartition topics get the same
number of partitions as the topic you're joining with.

As you mentioned, I can only speculate without seeing the code. I think my
next step would be to find a specific null join output that you think should
have been non-null and trace the key back through the topology. You should be
able to locate the key in each of the topics and state stores to verify that
it's in the right partition all the way through.

You could also experiment with the t
We'd like to add a left join with a KTable (data is coming from a DB via kafka
connect jdbc source).
So we end up with a topology like this:
Event Source (some transformations and joins) - leftJoin(A:
KTable) - leftJoin(B: KTable) - sinks
The new leftJoin is the one joining A.
The
I am not sure if I understand the question correctly, but a 1:n join in
Kafka Streams does not miss any data.
As explained in a previous answer, you can consider the join eventually
consistent, and if you stop sending new data to the input topic, the
"final" join result will be exact.
Als
I am not sure if I fully understand what your question is?
Are you talking about stream-table or table-table join?
For (1), why do you `merge()`? The merge operator is defined on
KStreams, not KTable, and a merge is also not a join.
-Matthias
On 7/15/20 3:27 AM, Dumitru-Nicolae Marasoui wrote
The answer is many-fold:
(1) There is only a final result if the input topics don't get new data.
For this case, the result is deterministic. You can consider it
eventually consistent.
The non-determinism really only applies to "intermediate results".
Both input-topic partitions
Hello kafka community,
Hi, in KTable-KTable Join document from an older version, the cwiki
mentions:
“Pay attention, that the KTable lookup is done on the current KTable state,
and thus, out-of-order records can yield non-deterministic result.
Furthermore, in practice Kafka Streams does not
Hello kafka community,
Putting this in black and white to be more visible.
This is a thought on making join clearer to me and less prone to
concurrency issues that would be risky without knowing the underlying
implementation of join.
Awaiting your feedback,
Thanks,
1. kafka streams 1:
map topic1 in: key
Hello kafka community,
Refining the step 2 and some questions:
- is the indeterminism of the ktable join a real problem?
- how is the ktable join implemented?
- do you think the solution outlined is a step in the right direction?
- does the ktable join implement such a strategy in a future version
In
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Join+Semantics#KafkaStreamsJoinSemantics-KTable-KTableJoin.1
it is mentioned that "Pay attention, that the KTable lookup is done on the
*current* KTable state, and thus, out-of-order records can yield
non-deterministic r
Hello kafka community,
As I understand it, a kafka-streams join that involves a kTable: “the
KTable lookup is done on the current KTable state, and thus, out-of-order
records can yield non-deterministic result” [1]
Does the solution below involving an intermediate topic sound right to you?
1
Hello kafka community,
In a ktable-ktable join, assuming that kt1 has k1 and v1 that contains k2
and kt2 has k2 and v2,
is it possible that the (k1, v1, k2, v2) pair is never emitted?
I am trying to understand how it works and if any race condition would be
possible.
If race conditions would not
I think it is appropriate to support multi-way joins of KTables
(including outer, left), but it is not necessarily an extension of the
"co-group" syntax. We can start this discussion as a separate KIP.
Guozhang
Hi Guozhang
Thank you very much for your response.
Since KStream does not have an aggregate method, I guess you meant:
"builder.stream("customer").groupByKey()", which can then be used in the
CoGroup, and there I'd probably need an aggregation like this:
(Long key, Customer value, Customer agg) ->
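With the 2.5+ API, the co-group part could look roughly like this (hypothetical types and methods, untested):

KGroupedStream<Long, Customer> customers =
    builder.<Long, Customer>stream("customer").groupByKey();
KGroupedStream<Long, Order> orders =
    builder.<Long, Order>stream("order").groupByKey();

KTable<Long, CustomerView> view = customers
    .cogroup((Long key, Customer customer, CustomerView agg) -> agg.withCustomer(customer))
    .cogroup(orders, (key, order, agg) -> agg.withOrder(order))
    .aggregate(CustomerView::new);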
Hello Murilo,
Thanks for your interest in KIP-150.
As we discussed in the KIP, the scope of this co-group is for stream
co-aggregation. For your case, the first joining table is not from the
aggregation but is a source table itself, in this case it cannot be
included in the co-group of KIP-150.
Hi
I'm really excited about the new release for KafkaStreams.
I've been watching the new CoGroup feature, and now that this is out, I'm
trying to play around with it.
I wonder what would be the best way to do a
KTable.leftJoin(otherTable).leftJoin(yetAnotherTable)...
Taking the Customer example in
So, what is the proper approach here? Just keep it as-is and ignore the
warnings?

On Sat, Oct 5, 2019 at 10:23 AM Trey Hutcheson wrote:
Thank you for your response Sophie.
What confuses me is that I have already implemented this same pattern (join
to table, mutate entity, write back to table) in two other streams
applications without any issue whatsoever. After thinking about it, I think
the difference here is the input data. In
1. Could you revert the 0ms commit interval to default? It will not help with
the situation as you will try to commit on every poll().
2. I couldn't know how you actually write your code, but you could try
something really simple such as a print statement within the join operation to
see if your application is actually taking incoming traffic. If you have
metrics exported, that would also be useful.
Boyang
On Fri, Oct 4, 2019 at 6:42 AM Trey Hutcheson
wrote:
> This is my third kafka streams application and I'd thought I
> This is my third kafka streams application and I'd thought I
This is my third kafka streams application and I'd thought I had gotten to
know the warts and how to use it correctly. But I'm beating my head against
something that I just cannot explain. Values written to a table, when later
read back in a join operation, are stale.
Assume the
This should be addressed via
https://cwiki.apache.org/confluence/display/KAFKA/KIP-479%3A+Add+Materialized+to+Join
-Matthias
On 9/5/19 8:05 AM, Dmitry Minkovsky wrote:
> Whoops! Just after sending this email I saw
> https://issues.apache.org/jira/browse/KAFKA-4729.
>
> On Thu, Sep 5
Hello!
When I saw KAFKA-4730/KIP-428 ("Streams does not have an in-memory windowed
store") were included in 2.3.0, I thought maybe it would be possible to
configure an in-memory stream-stream join using KStream#join. But I am not
seeing this in the API. My use case is joining streams of short-lived events
that service a request/response flow. The join doesn't have to be durable at
all.
Dmitry
> The two streams were read in separately:
> instead of together:
If you want to join two streams, reading both topic separately sound
correct.
> There are twenty partitions per topic. It seems as if it is not reading
> equally fast from all topic partitions.
This should no
> partitions and topics as the current event time? If so, if you read faster
> from one partition than from the other when processing old data, what can you
> do to make sure this does not get discarded besides putting a very high grace
> period?
>
> My join window is one second, grace
from the other when processing old data, what can you do to make sure this
does not get discarded besides putting a very high grace period?
My join window is one second, grace period 50ms, retention time is default.
I use the timestamp inside the observations. But I have tried with the default
Kafka log append time.
If it's log append time, then the broker sets the timestamp. Do you use
the embedded record timestamp for the join (default)? Or do you have an
embedded timestamp in the value and use a custom `TimestampExtractor`?
How large is your join-window, what is your grace period a
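For reference, extracting an embedded timestamp from the value would look roughly like this (Observation and its getter are placeholders, untested):

public class ObservationTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(ConsumerRecord<Object, Object> record, long partitionTime) {
        Object value = record.value();
        if (value instanceof Observation) {
            return ((Observation) value).getTimestampMs(); // embedded event time
        }
        return record.timestamp(); // fall back to the record timestamp
    }
}

// registered via:
// props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
//           ObservationTimestampExtractor.class);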
I verified keys and timestamps and they match. If I start the publisher and
processor at the same time, the join has entirely correct output with 6000
messages coming in and 3000 coming out.
Putting the grace period to a higher value has no effect.
When is the watermark for the grace period
900 000 messages already on each topic). The
job parses the data and joins the two streams. On the intermediate topics, I
see that all the earlier 2x90 events are flowing through until the join.
However, only 250 000 are outputted from the join, instead of 900 000.
After processing the burst, the code works perfectly on the new
> > );
> > >
> > > KStream stream_topic1_topic2 = topic1KStream.join(
> > > topic2KTable,
> > > (topic2Id, topic1Obj) -> topic1.get("id").toString(),
> > > (topic1Obj,
> .withValueSerde(genericRecordSerde)
> > );
> >
> > KStream stream_topic1_topic2 = topic1KStream.join(
> > topic2KTable,
> > (topic2Id, topic1Obj) -> topic1.get("id").toString(),
> > (topic1Obj, t
.withValueSerde(genericRecordSerde)
> );
>
> KStream stream_topic1_topic2 = topic1KStream.join(
> topic2KTable,
> (topic2Id, topic1Obj) -> topic1.get("id").toString(),
> (topic1Obj, topic2Obj) -
Dear all,
Kafka consumer has exposed the join rate metric from
consumer-coordinator-metrics. I'm wondering if there is a possibility to
expose it on the broker side? As the descriptions say, join group rebalance is
a signal of unhealthy consumption, and it's hard to get this from the client
One approach would be to run it locally and just see which topics get created.

Does that help?
-John

On Tue, Sep 4, 2018 at 10:07 PM Meeiling.Bradley <
meeiling.brad...@target.com> wrote:
Would I be able to run a Kafka Streams application containing a
KStream-KStream join that consumes from and produces to a Kafka cluster that
has a formal process for provisioning topic creation? Behind the scenes,
topics need to be created for the join operation. These topics are
Hi John,
Sorry for the confusion! I just noticed that we failed to document the
merge operator. I've created
https://issues.apache.org/jira/browse/KAFKA-7269 to fix it.
But in the mean time,
* merge: interleave the records from two streams to produce one collated
stream
* join: compute
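To illustrate the difference with the current API (untested sketch, topic names made up):

KStream<String, String> a = builder.stream("topic-a");
KStream<String, String> b = builder.stream("topic-b");

// merge: no correlation; records from both streams pass through unchanged
KStream<String, String> merged = a.merge(b);

// join: records with the same key within the window are combined into one
KStream<String, String> joined = a.join(
    b,
    (va, vb) -> va + "+" + vb,
    JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5)));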
Hi All,
I am a little confused on the difference between the
KStreamBuilder merge() function and doing a KStream-to-KStream Join operation.
I understand the difference between Inner, Left and Outer joins, but I don't
understand exactly what the difference is between the two.
Hello,
I work with version 0.11 of Kafka, Ubuntu 14.04.5 and Java 8. I have a
bug when I want to use an asymmetric time window in the KStream-KStream join.
More detail on this page:
https://stackoverflow.com/questions/50412784/incorrect-result-kstream-kstream-join-with-asymmetric-time
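For readers, an asymmetric window is declared roughly like this with the current API (in 0.11 the equivalent was JoinWindows.of(...).before(...)/.after(...)); an untested sketch:

// match right-side records from 10 seconds before the left record's
// timestamp up to the left record's timestamp itself
JoinWindows window = JoinWindows
    .ofTimeDifferenceWithNoGrace(Duration.ofSeconds(10))
    .after(Duration.ZERO);

left.join(right, joiner, window);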
Question cross-posted at SO:
https://stackoverflow.com/questions/50492491/customize-window-store-implementation-in-kstream-kstream-join
I did put an answer there.
-Matthias
On 5/23/18 8:56 AM, Edmondo Porcu wrote:
> We need to perform a Kstream - Kstream join with a very large window, where
We need to perform a KStream-KStream join with a very large window, where
a tick on the left would trigger a join only with the most recent record on
the right, and vice versa.
This is not how the default window works, since the WindowStoreIterator
returned by window.fetch inside the
...as an updated record (cf. {@link KStream} vs {@code KTable}).

Not to belabor the point, but I wouldn't want you to focus too much on
getting rid of the "toStream" in favor of the same methods on KTable,
as I think that would have the exact same semantics.

It's entirely possible that some additional tuning on the join could reduce
the duplicates you're seeing. For example, what are your current settings
for commit interval and dedup cache size?

In any case, though, Kafka Stream
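For reference, those two settings are configured like this (the values are examples only):

Properties props = new Properties();
// larger commit interval -> fewer flushes of the dedup cache
props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 30_000);
// larger cache -> more updates per key are collapsed before forwarding
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 10 * 1024 * 1024L);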
Hi John,
yes, the answer is very helpful and your understanding of the data flow is
correct. Deduplication is not the issue, though, because there will not be
any duplicates inserted into the flow. Rather, the duplicates are generated
from unique records after the join between claim and
a list for each "claimNumber" of all the claim/payment pairs. Is
that right?
It's not in your example, but the grouped stream or table for "claims"
(claimStrGrouped) and "payments" (paymentGrouped) must be keyed with the
same key, right? In that case, the result of
Hi,
a while ago I hit KAFKA-4609 when running a simple pipeline. I have two
KTable joins followed by a group-by and aggregate on a KStream, and one
additional join. Now this KTable/KTable join followed by group-by and
aggregate generates duplicates.
I wonder if a possible workaround would be
You're right Matthias, I will take your suggestion of a KTable-KTable join 😊
I suppose that means I don't need windowing?
-Adrien
From: Matthias J. Sax
Sent: Monday, 9 April 2018 20:15:17
To: users@kafka.apache.org
Subject: Re: join 2 topic stre
From what you describe I infer (but it might be a wild guess), that you
are actually trying to do KTable-KTable join?
It's all about the semantics of your input data... Note, that a Kafka
topic does not have semantics per-se; you apply semantics when you read
a topic either as stream or
...add the timestamp to the key?
For a single stock join with multiple dividends, I didn't think about it before
... is it possible?
For a join depending on timestamps, why not; is it possible with windowing?
Thank you Matthias
Adrien
From: Matthias J. Sax
Sent: Sunday
The following blog post explains how the different joins work:
https://www.confluent.io/blog/crossing-streams-joins-apache-kafka/

It's hard to give a general answer -- it depends on the context of your
application. Are keys unique? Do you want to get exactly one result, or
should a single stock join with multiple dividends? Do you want Stock
and Dividend to join depending on their timestamps?
-Matthias
On 4/8/18 1:34 PM, adrien ruffie wrote:
> Hello all,
>
> I have 2 topics streamed by KStream and one KStream Dividend>
>
> I want to merge both object's in
Hello all,
I have 2 topics streamed by KStream<Stock> and one KStream<Dividend>.
I want to merge both objects' information (Stock & Dividend) and send it to
another topic, with for example
The key of the two topics is the same. I need to use leftJoin, merge,
KTable, ...
What is the best solution? What do you
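A windowed stream-stream join for this could look roughly like this (hypothetical types and topic names, untested):

KStream<String, Stock> stocks = builder.stream("stocks");
KStream<String, Dividend> dividends = builder.stream("dividends");

// combine a stock and a dividend that share the same key within the window
stocks.join(
        dividends,
        (stock, dividend) -> new StockWithDividend(stock, dividend),
        JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofHours(1)))
      .to("stocks-with-dividends");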
What you're after is to "only trigger the join with records coming from
stream A, but not from stream B". So stream B never triggers the join but is
only used to bookkeep the received records.
With the DSL stream-stream join, this is not supported, since for all inner /
left / outer joins both streams' record
Got it, thank you Damian
On 16 November 2017 at 18:55, Damian Guy wrote:
Hi,
You don't need to set the serde until you do another operation that
requires serialization, i.e., if you followed the join with a `to()`,
`groupBy()` etc, you would pass in the serde to that operation.
Thanks,
Damian
On Thu, 16 Nov 2017 at 10:53 sy.pan wrote:
> Hi, all:
>
&
Hi, all:
Recently I have read the kafka streams join document
(https://docs.confluent.io/current/streams/developer-guide.html#kafka-streams-dsl).
The sample code is pasted below:
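With the current API, the serdes for the join and the follow-up operation are passed roughly like this (hypothetical serdes and helper, untested):

KStream<String, Merged> joined = left.join(
    right,
    (l, r) -> merge(l, r),
    JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5)),
    StreamJoined.with(Serdes.String(), leftSerde, rightSerde));

// the result serde is only needed on the next serializing operation, e.g. to():
joined.to("output", Produced.with(Serdes.String(), mergedSerde));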