Re: "End of Batch" event

2017-02-01 Thread Matthias J. Sax
But you can't delete them from the local store like this... you need to process tombstones to get them deleted from there. The idea of the design is to compute those tombstones and inject them into the source topics. -Matthias
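The tombstone semantics Matthias refers to can be illustrated with a minimal, broker-free sketch: a plain Map stands in for the KTable's local store, and a null value plays the role of a tombstone (class, key, and value names are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class TombstoneSemantics {
    // Apply an update the way a KTable materializes its topic:
    // the latest value per key wins, and a null value (a tombstone)
    // deletes the key from the local store.
    static void apply(Map<String, String> store, String key, String value) {
        if (value == null) {
            store.remove(key);
        } else {
            store.put(key, value);
        }
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        apply(store, "A", "item-A");
        apply(store, "C", "item-C");
        apply(store, "C", null); // tombstone injected for key C
        System.out.println(store.containsKey("C")); // prints false
        System.out.println(store.containsKey("A")); // prints true
    }
}
```

This is why simply ceasing to send a key is not enough: without an explicit null-valued record for that key, the materialized store keeps the last value forever.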

Re: "End of Batch" event

2017-02-01 Thread Gwen Shapira
I'm wondering why it has to be so complex... Kafka can be configured to delete items older than 24h in a topic. So if you want to get rid of records that did not arrive in the last 24h, just configure the topic accordingly?
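Gwen's suggestion maps to a per-topic retention setting. A sketch with the stock Kafka CLI (the topic name `inventory` and the broker address are placeholders; very old broker versions use `--zookeeper` instead of `--bootstrap-server`):

```shell
# Keep records in the 'inventory' topic for at most 24 hours
# (86400000 ms); the broker removes older log segments.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name inventory \
  --add-config retention.ms=86400000
```

One caveat: retention operates on whole log segments, so deletion is not exact to the millisecond, and it expires records from the topic only, which is exactly why Matthias points out that the local store still needs tombstones.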

Re: "End of Batch" event

2017-02-01 Thread Matthias J. Sax
Understood now. It's a tricky problem you have, and the only solution I can come up with is quite complex -- maybe anybody else has a better idea? Honestly, I am not sure if this will work: for my proposal, the source ID must be part of the key of your records to distinguish records from different

Re: "End of Batch" event

2017-01-31 Thread Eric Dain
Sorry for the confusion, I stopped the example before processing the file from S2. So on day 2, if we get S2=[D, E, Z], we will have to remove F and add Z; K = [A, B, D, E, Z]. To elaborate more, A, B and C belong to S1 (items have a field to state their source). Processing files from S1 should never delete
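The per-source delete logic Eric describes boils down to a set difference: any key that the previous snapshot of a source contained but the new one does not must be tombstoned. A minimal sketch of that computation (class and method names are illustrative, not from any Kafka API):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class SnapshotDiff {
    // Keys present in the previous snapshot but missing from the
    // current one are exactly the ones that need tombstones.
    static Set<String> keysToDelete(Set<String> previous, Set<String> current) {
        Set<String> gone = new TreeSet<>(previous); // TreeSet for stable output order
        gone.removeAll(current);
        return gone;
    }

    public static void main(String[] args) {
        // Day 1: S2 = [D, E, F]; Day 2: S2 = [D, E, Z]
        Set<String> day1 = new HashSet<>(Arrays.asList("D", "E", "F"));
        Set<String> day2 = new HashSet<>(Arrays.asList("D", "E", "Z"));
        System.out.println(keysToDelete(day1, day2)); // prints [F]
    }
}
```

Because the diff is computed per source, processing an S2 snapshot only ever deletes S2's keys, which matches Eric's requirement that files from one source never delete another source's items.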

Re: "End of Batch" event

2017-01-31 Thread Matthias J. Sax
Thanks for the update. What is not clear to me: why do you only need to remove C, but not D, E, F too, as source2 does not deliver any data on day 2? Furthermore, IQ is designed to be used outside of your Streams code, and thus, you should not use it in a SourceTask (not sure if this would even be possible

Re: "End of Batch" event

2017-01-31 Thread Eric Dain
Sorry for not being clear. Let me explain by example. Let's say I have two sources S1 and S2. The application that I need to write will load the files from these sources every 24 hours. The results will be a KTable K. For day 1: S1=[A, B, C] => the result K = [A, B, C]; S2=[D, E, F] => K will be [A, B, C, D, E, F]

Re: "End of Batch" event

2017-01-31 Thread Matthias J. Sax
I am not sure if I understand the complete scenario yet. > I need to delete all items from that source that > doesn't exist in the latest CSV file. Cannot follow here. I thought your CSV files provide the data you want to process. But it seems you also have a second source? How does your Streams

Re: "End of Batch" event

2017-01-30 Thread Eric Dain
Thanks Matthias for your reply. I'm not trying to stop the application. I'm importing inventory from CSV files coming from 3rd-party sources. The CSVs are snapshots of each source's inventory. I need to delete all items from that source that don't exist in the latest CSV file. I was thinking of

Re: "End of Batch" event

2017-01-29 Thread Matthias J. Sax
Hi, currently, a Kafka Streams application is designed to "run forever" and there is no notion of "End of Batch" -- we have plans to add this though... (cf. https://cwiki.apache.org/confluence/display/KAFKA/KIP-95%3A+Incremental+Batch+Processing+for+Kafka+Streams) Thus, right now you need to stop