[ https://issues.apache.org/jira/browse/KAFKA-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15833762#comment-15833762 ]
Guozhang Wang commented on KAFKA-4113: -------------------------------------- Hello [~mitch-seymour], sorry for the late reply. In a recent change we add overloaded functions in {{builder.table/stream}} calls so that users can specify different "reset policies" per stream. For example, you can do the following: {code} myTable = builder.table(EARLEST, "topic1", ..); myStream = builder.stream(LATEST, "topic2", ..); myStream.join(myTable)... {code} If there is no committed offsets (i.e. the application has not executed before), then different streams will use their own reset policies; if it is not specified then the global config value in {{ConsumerConfig}} is used. Does this solve your use case? > Allow KTable bootstrap > ---------------------- > > Key: KAFKA-4113 > URL: https://issues.apache.org/jira/browse/KAFKA-4113 > Project: Kafka > Issue Type: Sub-task > Components: streams > Reporter: Matthias J. Sax > Assignee: Guozhang Wang > > On the mailing list, there are multiple request about the possibility to > "fully populate" a KTable before actual stream processing start. > Even if it is somewhat difficult to define, when the initial populating phase > should end, there are multiple possibilities: > The main idea is, that there is a rarely updated topic that contains the > data. Only after this topic got read completely and the KTable is ready, the > application should start processing. This would indicate, that on startup, > the current partition sizes must be fetched and stored, and after KTable got > populated up to those offsets, stream processing can start. > Other discussed ideas are: > 1) an initial fixed time period for populating > (it might be hard for a user to estimate the correct value) > 2) an "idle" period, ie, if no update to a KTable for a certain time is > done, we consider it as populated > 3) a timestamp cut off point, ie, all records with an older timestamp > belong to the initial populating phase > The API change is not decided yet, and the API desing is part of this JIRA. > One suggestion (for option (4)) was: > {noformat} > KTable table = builder.table("topic", 1000); // populate the table without > reading any other topics until see one record with timestamp 1000. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)