Related to the log compaction answer ("it will be log compacted on the key over time"): how do we control when log compaction happens? And in the log compaction implementation, is the structure used to map a given key to its newest value kept in memory or on disk?
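For context, as far as I can tell from the broker docs: there is no direct
"compact at time T" setting. The log cleaner only considers closed (rolled)
segments, and only once the log's dirty ratio crosses a threshold, so the
timing is steered indirectly through configs like the sketch below
(per-topic overrides with illustrative values, not recommendations; in the
0.10-era tooling they can be applied via kafka-topics.sh --alter --config).
As for the second question, my understanding is that the cleaner builds its
key-to-latest-offset map in memory (sized by log.cleaner.dedupe.buffer.size)
while the segment data itself stays on disk.

cleanup.policy=compact
segment.ms=600000              # roll segments every 10 min; only rolled segments are cleanable
min.cleanable.dirty.ratio=0.1  # clean once 10% of the log head is uncompacted
delete.retention.ms=86400000   # how long tombstones survive after compaction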
On Tue, Apr 19, 2016 at 8:58 AM, Guillermo Lammers Corral <
guillermo.lammers.cor...@tecsisa.com> wrote:

> Hello,
>
> Thanks again for your reply :)
>
> 1) In my example, when I send a record from the outer table and there is
> no matching record from the inner table, I receive data on the output
> topic, and vice versa. I am trying it with the topics empty at the first
> execution. How is that possible?
>
> Why don't KTable joins support windowing strategies? I think I need them
> for this use case, what do you think?
>
> 2) What does that mean? Although the log may not yet be compacted, there
> should be no problem reading from it and executing a new stream process,
> right? (like new joins, counts...).
>
> Thanks!!
>
> 2016-04-15 17:37 GMT+02:00 Guozhang Wang <wangg...@gmail.com>:
>
> > 1) There are three types of KTable-KTable joins; they follow the same
> > semantics as SQL joins:
> >
> > KTable.join(KTable): when there is no matching record in the inner
> > table upon receiving a new record from the outer table, no output; and
> > vice versa.
> > KTable.leftJoin(KTable): when there is no matching record in the inner
> > table upon receiving a new record from the outer table, output
> > (a, null); in the other direction, no output.
> > KTable.outerJoin(KTable): when there is no matching record in the
> > inner / outer table upon receiving a new record from the outer / inner
> > table, output (a, null) or (null, b).
> >
> > 2) The result topic is also a changelog topic; although it will be log
> > compacted on the key over time, if you consume immediately the log may
> > not yet be compacted.
> >
> > Guozhang
> >
> > On Fri, Apr 15, 2016 at 2:11 AM, Guillermo Lammers Corral <
> > guillermo.lammers.cor...@tecsisa.com> wrote:
> >
> > > Hi Guozhang,
> > >
> > > Thank you very much for your reply and sorry for the generic
> > > question; I'll try to explain with some pseudocode.
> > >
> > > I have two KTables with a join:
> > >
> > > ktable1: KTable[String, String] = builder.table("topic1")
> > > ktable2: KTable[String, String] = builder.table("topic2")
> > >
> > > result: KTable[String, ResultUnion] =
> > >   ktable1.join(ktable2, (data1, data2) => new ResultUnion(data1, data2))
> > >
> > > I send the result to a topic with result.to("resultTopic").
> > >
> > > My questions are related to the following scenario:
> > >
> > > - The streaming app is up & running without data in the topics
> > >
> > > - I send data to "topic2", for example a key/value pair like
> > > ("uniqueKey1", "hello")
> > >
> > > - I see null values in topic "resultTopic", i.e. ("uniqueKey1", null)
> > >
> > > - If I send data to "topic1", for example a key/value pair like
> > > ("uniqueKey1", "world"), then I see this value in topic "resultTopic":
> > > ("uniqueKey1", ResultUnion("hello", "world"))
> > >
> > > Q: If we send data for one of the KTables that does not have the
> > > corresponding data by key in the other one, is obtaining null values
> > > in the final result topic the expected behavior?
> > >
> > > My next step would be to use Kafka Connect to persist the result data
> > > in C* (Cassandra); I have not read the Connector docs yet... Is this
> > > the way to do it? (I mean, prepare the data in the topic.)
> > >
> > > Q: On the other hand, just to try, I have a KTable that reads
> > > messages from "resultTopic" and prints them. If the stream is a
> > > KTable, I am wondering why it is getting all the values from the
> > > topic, even those with the same key?
> > >
> > > Thanks in advance! Great job answering the community!
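For anyone reproducing the scenario quoted above, a minimal self-contained
sketch of the join, written in Scala against the 0.10-era Java Streams API.
The topic names follow the thread; the application id, bootstrap address,
and the String-encoded ResultUnion value are placeholders (the string
encoding just avoids writing a custom serde):

import java.util.Properties

import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.kstream.{KStreamBuilder, KTable, ValueJoiner}
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}

object KTableJoinSketch extends App {
  val props = new Properties()
  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "ktable-join-sketch") // placeholder
  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")  // placeholder
  // Note: Kafka 0.10.0 additionally required StreamsConfig.ZOOKEEPER_CONNECT_CONFIG.

  val builder = new KStreamBuilder()

  // Reading a topic as a KTable keeps only the latest value per key.
  val ktable1: KTable[String, String] = builder.table(Serdes.String(), Serdes.String(), "topic1")
  val ktable2: KTable[String, String] = builder.table(Serdes.String(), Serdes.String(), "topic2")

  // Inner join: a record with no match on the other side yields no output;
  // leftJoin/outerJoin would emit (a, null) / (null, b) instead.
  val result: KTable[String, String] = ktable1.join(ktable2,
    new ValueJoiner[String, String, String] {
      override def apply(data1: String, data2: String): String =
        s"ResultUnion($data1, $data2)"
    })

  result.to(Serdes.String(), Serdes.String(), "resultTopic")

  new KafkaStreams(builder, props).start()
}

Per the semantics described above, sending ("uniqueKey1", "hello") to
topic2 alone should produce no output with the inner join; the record
("uniqueKey1", "ResultUnion(hello, world)") should appear only once topic1
also has a value for that key.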
> > > 2016-04-14 20:00 GMT+02:00 Guozhang Wang <wangg...@gmail.com>:
> > >
> > > > Hi Guillermo,
> > > >
> > > > 1) Yes, in your case the streams are really "changelog" streams,
> > > > hence you should create them as KTables and do a KTable-KTable join.
> > > >
> > > > 2) Could you elaborate on "achieving this"? What behavior do you
> > > > require in the application logic?
> > > >
> > > > Guozhang
> > > >
> > > > On Thu, Apr 14, 2016 at 1:30 AM, Guillermo Lammers Corral <
> > > > guillermo.lammers.cor...@tecsisa.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am a newbie to Kafka Streams and I am using it to try to solve
> > > > > a particular use case. Let me explain.
> > > > >
> > > > > I have two sources of data, both like this:
> > > > >
> > > > > Key (string)
> > > > > DateTime (hourly granularity)
> > > > > Value
> > > > >
> > > > > I need to join the two sources by key and date (hour of day) to
> > > > > obtain:
> > > > >
> > > > > Key (string)
> > > > > DateTime (hourly granularity)
> > > > > ValueSource1
> > > > > ValueSource2
> > > > >
> > > > > I think that first I'd need to push the messages into Kafka
> > > > > topics with the date as part of the key, because I'll group by
> > > > > key taking the date into account. So maybe the key must be a new
> > > > > string like key_timestamp. But, of course, that is not the main
> > > > > problem, just an additional explanation.
> > > > >
> > > > > OK, so the data are in topics, here we go!
> > > > >
> > > > > - Multiple records are allowed per key, but only the latest value
> > > > > for a record key will be considered. I should use two KTables
> > > > > with some join strategy, right?
> > > > >
> > > > > - Data from both sources could arrive at any time. What can I do
> > > > > to achieve this?
> > > > >
> > > > > Thanks in advance.
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > >
> >
> >
> > --
> > -- Guozhang
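As a footnote to the composite-key idea in the original question, a minimal
sketch of folding the hourly bucket into the record key, assuming a UTC
string encoding (the HourlyKey name and the key format are mine, not from
the thread). With both input topics keyed this way, the KTable-KTable join
upthread lines records up by key and hour:

import java.time.format.DateTimeFormatter
import java.time.{ZoneOffset, ZonedDateTime}

object HourlyKey {
  // Hourly granularity: truncate the timestamp to the hour, in UTC.
  private val hourFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd-HH")

  def apply(key: String, ts: ZonedDateTime): String =
    s"${key}_${ts.withZoneSameInstant(ZoneOffset.UTC).format(hourFormat)}"
}

// Example:
// HourlyKey("sensor42", ZonedDateTime.parse("2016-04-14T09:30:00Z"))
//   == "sensor42_2016-04-14-09"

Producers for both sources would apply the same function before writing, so
the latest value per (key, hour) wins on each side, matching the "only the
latest value for a record key will be considered" requirement above.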