Hi,

there was a similar discussion on the list already, "Kafka stream join scenario":
http://search-hadoop.com/m/uyzND1WsAGW1vB5O91&subj=Kafka+stream+join+scenarios

Long story short: there is no explicit support or guarantee. As Jay
mentioned, there is some best-effort alignment. However, the main issue
is the question of what it means to load a KTable *completely*. Since a
KTable consumes a changelog stream, there is no defined end: a KTable is
an always (infinitely) updating "dynamic" table.

You might be able to build a custom solution for it, though (see the
email thread I linked above).

Hope this helps.

-Matthias

On 06/29/2016 04:26 AM, Gwen Shapira wrote:
> Upgrade :)
>
> On Tue, Jun 28, 2016 at 6:49 PM, Rohit Valsakumar <rvalsaku...@tivo.com>
> wrote:
>> Hi Jay,
>>
>> Thanks for the reply.
>>
>> Unfortunately, in our case, due to legacy reasons we are using
>> WallclockTimestampExtractor in the application for all the streams, and
>> the existing messages in the stream probably won't have timestamps as
>> they are being produced by legacy clients. So the events are being
>> ingested with processing times, and it may not be possible to
>> synchronize based on the message timestamps. What do you recommend for
>> this scenario?
>>
>> Rohit
>>
>> On 6/28/16, 5:18 PM, "Jay Kreps" <j...@confluent.io> wrote:
>>
>>> I think you may get this for free as Kafka Streams attempts to align
>>> consumption across different topics/partitions by the timestamp in the
>>> messages. So in a case where you are starting a job fresh and it has a
>>> database changelog to consume and an event stream to consume, it will
>>> attempt to keep the KTable at the "time" the event stream is at. This
>>> is only a heuristic, of course, since messages are not necessarily
>>> strongly ordered by time. I think this is likely mostly the same but
>>> slightly better than the bootstrap usage in Samza but also covers
>>> other cases of alignment.
>>>
>>> If you want more control you can override the timestamp extractor that
>>> associates time and hence priority for the streams:
>>> https://kafka.apache.org/0100/javadoc/org/apache/kafka/streams/processor/TimestampExtractor.html
>>>
>>> -Jay
>>>
>>> On Tue, Jun 28, 2016 at 2:49 PM, Rohit Valsakumar <rvalsaku...@tivo.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Is there a way to consume all the contents of a Kafka topic into a
>>>> KTable before doing a left join with another KStream?
>>>>
>>>> I am looking at something that simulates a bootstrap topic in a Samza
>>>> job.
>>>>
>>>> Thanks,
>>>> Rohit Valsakumar
>>>>
>>>> ________________________________
>>>>
>>>> This email and any attachments may contain confidential and privileged
>>>> material for the sole use of the intended recipient. Any review,
>>>> copying, or distribution of this email (or any attachments) by others
>>>> is prohibited. If you are not the intended recipient, please contact
>>>> the sender immediately and permanently delete this email and any
>>>> attachments. No employee or agent of TiVo Inc. is authorized to
>>>> conclude any binding agreement on behalf of TiVo Inc. by email.
>>>> Binding agreements with TiVo Inc. may only be made by a signed
>>>> written agreement.
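
As a rough illustration of the best-effort alignment Jay describes in the quoted thread, here is a toy sketch in plain Java. It is not Kafka Streams code and does not claim to show how the library is implemented; the class AlignmentSketch, the Rec type, and the run method are hypothetical names made up for this example. The sketch always consumes next from whichever input has the smaller head timestamp, so table updates with earlier timestamps are applied before later events are left-joined.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AlignmentSketch {

    // Hypothetical record type for this sketch; not a Kafka class.
    static final class Rec {
        final String key;
        final String value;
        final long timestamp;

        Rec(String key, String value, long timestamp) {
            this.key = key;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Left-joins each event against a table built from the changelog.
    // The next record is taken from whichever input has the smaller
    // head timestamp (ties go to the changelog).
    static List<String> run(List<Rec> changelog, List<Rec> events) {
        Deque<Rec> tableIn = new ArrayDeque<>(changelog);
        Deque<Rec> streamIn = new ArrayDeque<>(events);
        Map<String, String> table = new HashMap<>();
        List<String> out = new ArrayList<>();

        while (!streamIn.isEmpty()) {
            Rec t = tableIn.peek();   // null once the changelog is drained
            Rec s = streamIn.peek();
            if (t != null && t.timestamp <= s.timestamp) {
                // Table update is "older" than the next event: apply it first.
                tableIn.poll();
                table.put(t.key, t.value);
            } else {
                // Emit the left-join result; the table side may be null.
                streamIn.poll();
                out.add(s.key + ":" + s.value + "+" + table.get(s.key));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> changelog = Arrays.asList(
            new Rec("u1", "gold", 1L),
            new Rec("u2", "silver", 2L),
            new Rec("u1", "platinum", 10L));
        List<Rec> events = Arrays.asList(
            new Rec("u1", "click", 5L),
            new Rec("u2", "view", 6L),
            new Rec("u1", "click", 12L));
        System.out.println(run(changelog, events));
    }
}
```

In this toy model, events whose timestamps are later than the table updates see the loaded values. An event stamped earlier than every changelog record (for example at timestamp 0) is joined against an empty table and gets null on the table side. When timestamps reflect processing time rather than event time, they carry no such ordering information, which is the concern Rohit raises.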