Related to the log compaction answer ("it will be log compacted on the key over time"): how do we control when log compaction happens? And in the log compaction implementation, is the structure used to map a given key to its newest value kept in memory or on disk?
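For context, as far as I can tell from the broker docs: there is no direct
"compact at time T" setting. The log cleaner only considers closed (rolled)
segments, and only once the log's dirty ratio crosses a threshold, so the
timing is steered indirectly through configs like the sketch below
(per-topic overrides with illustrative values, not recommendations; in the
0.10-era tooling they can be applied via kafka-topics.sh --alter --config).
As for the second question, my understanding is that the cleaner builds its
key-to-latest-offset map in memory (sized by log.cleaner.dedupe.buffer.size)
while the segment data itself stays on disk.

cleanup.policy=compact
segment.ms=600000              # roll segments every 10 min; only rolled segments are cleanable
min.cleanable.dirty.ratio=0.1  # clean once 10% of the log head is uncompacted
delete.retention.ms=86400000   # how long tombstones survive after compaction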
On Tue, Apr 19, 2016 at 8:58 AM, Guillermo Lammers Corral <
guillermo.lammers.cor...@tecsisa.com> wrote:

> Hello,
>
> Thanks again for your reply :)
>
> 1) In my example, when I send a record from the outer table and there is
> no matching record from the inner table, I receive data on the output
> topic, and vice versa. I am trying it with the topics empty at the first
> execution. How is that possible?
>
> Why don't KTable joins support windowing strategies? I think I need them
> for this use case, what do you think?
>
> 2) What does that mean? Although the log may not yet be compacted, there
> should be no problem reading from it and executing a new stream process,
> right? (like new joins, counts...).
>
> Thanks!!
>
> 2016-04-15 17:37 GMT+02:00 Guozhang Wang <wangg...@gmail.com>:
>
> > 1) There are three types of KTable-KTable joins; they follow the same
> > semantics as SQL joins:
> >
> > KTable.join(KTable): when there is no matching record in the inner
> > table upon receiving a new record from the outer table, no output; and
> > vice versa.
> > KTable.leftJoin(KTable): when there is no matching record in the inner
> > table upon receiving a new record from the outer table, output
> > (a, null); in the other direction, no output.
> > KTable.outerJoin(KTable): when there is no matching record in the
> > inner / outer table upon receiving a new record from the outer / inner
> > table, output (a, null) or (null, b).
> >
> > 2) The result topic is also a changelog topic; although it will be log
> > compacted on the key over time, if you consume immediately the log may
> > not yet be compacted.
> >
> > Guozhang
> >
> > On Fri, Apr 15, 2016 at 2:11 AM, Guillermo Lammers Corral <
> > guillermo.lammers.cor...@tecsisa.com> wrote:
> >
> > > Hi Guozhang,
> > >
> > > Thank you very much for your reply and sorry for the generic
> > > question; I'll try to explain with some pseudocode.
> > >
> > > I have two KTables with a join:
> > >
> > > ktable1: KTable[String, String] = builder.table("topic1")
> > > ktable2: KTable[String, String] = builder.table("topic2")
> > >
> > > result: KTable[String, ResultUnion] =
> > >   ktable1.join(ktable2, (data1, data2) => new ResultUnion(data1, data2))
> > >
> > > I send the result to a topic with result.to("resultTopic").
> > >
> > > My questions are related to the following scenario:
> > >
> > > - The streaming app is up & running without data in the topics
> > >
> > > - I send data to "topic2", for example a key/value pair like
> > > ("uniqueKey1", "hello")
> > >
> > > - I see null values in topic "resultTopic", i.e. ("uniqueKey1", null)
> > >
> > > - If I send data to "topic1", for example a key/value pair like
> > > ("uniqueKey1", "world"), then I see this value in topic "resultTopic":
> > > ("uniqueKey1", ResultUnion("hello", "world"))
> > >
> > > Q: If we send data for one of the KTables that does not have the
> > > corresponding data by key in the other one, is obtaining null values
> > > in the final result topic the expected behavior?
> > >
> > > My next step would be to use Kafka Connect to persist the result data
> > > in C* (Cassandra); I have not read the Connector docs yet... Is this
> > > the way to do it? (I mean, prepare the data in the topic.)
> > >
> > > Q: On the other hand, just to try, I have a KTable that reads
> > > messages from "resultTopic" and prints them. If the stream is a
> > > KTable, I am wondering why it is getting all the values from the
> > > topic, even those with the same key?
> > >
> > > Thanks in advance! Great job answering the community!
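For anyone reproducing the scenario quoted above, a minimal self-contained
sketch of the join, written in Scala against the 0.10-era Java Streams API.
The topic names follow the thread; the application id, bootstrap address,
and the String-encoded ResultUnion value are placeholders (the string
encoding just avoids writing a custom serde):

import java.util.Properties

import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.kstream.{KStreamBuilder, KTable, ValueJoiner}
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}

object KTableJoinSketch extends App {
  val props = new Properties()
  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "ktable-join-sketch") // placeholder
  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")  // placeholder
  // Note: Kafka 0.10.0 additionally required StreamsConfig.ZOOKEEPER_CONNECT_CONFIG.

  val builder = new KStreamBuilder()

  // Reading a topic as a KTable keeps only the latest value per key.
  val ktable1: KTable[String, String] = builder.table(Serdes.String(), Serdes.String(), "topic1")
  val ktable2: KTable[String, String] = builder.table(Serdes.String(), Serdes.String(), "topic2")

  // Inner join: a record with no match on the other side yields no output;
  // leftJoin/outerJoin would emit (a, null) / (null, b) instead.
  val result: KTable[String, String] = ktable1.join(ktable2,
    new ValueJoiner[String, String, String] {
      override def apply(data1: String, data2: String): String =
        s"ResultUnion($data1, $data2)"
    })

  result.to(Serdes.String(), Serdes.String(), "resultTopic")

  new KafkaStreams(builder, props).start()
}

Per the semantics described above, sending ("uniqueKey1", "hello") to
topic2 alone should produce no output with the inner join; the record
("uniqueKey1", "ResultUnion(hello, world)") should appear only once topic1
also has a value for that key.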
> > > 2016-04-14 20:00 GMT+02:00 Guozhang Wang <wangg...@gmail.com>:
> > >
> > > > Hi Guillermo,
> > > >
> > > > 1) Yes, in your case the streams are really "changelog" streams,
> > > > hence you should create them as KTables and do a KTable-KTable join.
> > > >
> > > > 2) Could you elaborate on "achieving this"? What behavior do you
> > > > require in the application logic?
> > > >
> > > > Guozhang
> > > >
> > > > On Thu, Apr 14, 2016 at 1:30 AM, Guillermo Lammers Corral <
> > > > guillermo.lammers.cor...@tecsisa.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am a newbie to Kafka Streams and I am using it to try to solve
> > > > > a particular use case. Let me explain.
> > > > >
> > > > > I have two sources of data, both like this:
> > > > >
> > > > > Key (string)
> > > > > DateTime (hourly granularity)
> > > > > Value
> > > > >
> > > > > I need to join the two sources by key and date (hour of day) to
> > > > > obtain:
> > > > >
> > > > > Key (string)
> > > > > DateTime (hourly granularity)
> > > > > ValueSource1
> > > > > ValueSource2
> > > > >
> > > > > I think that first I'd need to push the messages into Kafka
> > > > > topics with the date as part of the key, because I'll group by
> > > > > key taking the date into account. So maybe the key must be a new
> > > > > string like key_timestamp. But, of course, that is not the main
> > > > > problem, just an additional explanation.
> > > > >
> > > > > OK, so the data are in topics, here we go!
> > > > >
> > > > > - Multiple records are allowed per key, but only the latest value
> > > > > for a record key will be considered. I should use two KTables
> > > > > with some join strategy, right?
> > > > >
> > > > > - Data from both sources could arrive at any time. What can I do
> > > > > to achieve this?
> > > > >
> > > > > Thanks in advance.
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > >
> >
> >
> > --
> > -- Guozhang
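As a footnote to the composite-key idea in the original question, a minimal
sketch of folding the hourly bucket into the record key, assuming a UTC
string encoding (the HourlyKey name and the key format are mine, not from
the thread). With both input topics keyed this way, the KTable-KTable join
upthread lines records up by key and hour:

import java.time.format.DateTimeFormatter
import java.time.{ZoneOffset, ZonedDateTime}

object HourlyKey {
  // Hourly granularity: truncate the timestamp to the hour, in UTC.
  private val hourFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd-HH")

  def apply(key: String, ts: ZonedDateTime): String =
    s"${key}_${ts.withZoneSameInstant(ZoneOffset.UTC).format(hourFormat)}"
}

// Example:
// HourlyKey("sensor42", ZonedDateTime.parse("2016-04-14T09:30:00Z"))
//   == "sensor42_2016-04-14-09"

Producers for both sources would apply the same function before writing, so
the latest value per (key, hour) wins on each side, matching the "only the
latest value for a record key will be considered" requirement above.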