Re: Re-processing a la Kappa/Liquid

2015-02-23 Thread Jay Kreps
[…]d you always have a key and therefore generate one? > > Best regards, Tom > From: Felix GV > To: "dev@samza.apache.org" > Sent: Monday, February 23, 2015 2:15 PM > Subject: RE: Re-processing a la Kappa/Liquid > > A recently-compacted topic is pretty similar to a s[…]

Re: Re-processing a la Kappa/Liquid

2015-02-23 Thread Thomas Bernhardt
Assume that there is data that doesn't have a key. How would you handle that? Would you always have a key and therefore generate one? Best regards, Tom From: Felix GV To: "dev@samza.apache.org" Sent: Monday, February 23, 2015 2:15 PM Subject: RE: Re-processing a la Kap[…]

Re: Re-processing a la Kappa/Liquid

2015-02-23 Thread Roger Hoover
[…]v > > > From: Roger Hoover [roger.hoo...@gmail.com] > Sent: Monday, February 23, 2015 10:33 AM > To: dev@samza.apache.org > Subject: Re: Re-processing a la Kappa/Liquid > > Thanks, Julian. > > I didn't see any menti[…]

Re: Re-processing a la Kappa/Liquid

2015-02-23 Thread Roger Hoover
Ah, right. To save historical snapshots, one could periodically read the whole compacted topic and save it somewhere. Sent from my iPhone > On Feb 23, 2015, at 11:11 AM, Jay Kreps wrote: > > Basically log compaction == snapshot in a logical format. You can optimize > a tiny bit more, of cours[…]
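The idea in Roger's message — read a compacted topic end to end and persist the result as a point-in-time snapshot — can be sketched in a few lines. This is a toy simulation, not real Kafka consumer code: the changelog is a plain list of (key, value) pairs, `None` stands in for a Kafka tombstone, and `snapshot_compacted_log` and the file path are hypothetical names for illustration.

```python
import json
import tempfile
from pathlib import Path

def snapshot_compacted_log(records, out_path):
    """Reduce a changelog to its latest value per key and persist
    the result as a point-in-time snapshot (the 'save it somewhere'
    step from the message above)."""
    state = {}
    for key, value in records:
        if value is None:          # tombstone: the key was deleted
            state.pop(key, None)
        else:
            state[key] = value     # later values overwrite earlier ones
    Path(out_path).write_text(json.dumps(state, sort_keys=True))
    return state

# Toy changelog: key "a" is overwritten, key "b" is later deleted.
log = [("a", 1), ("b", 2), ("a", 3), ("b", None)]
snap_path = str(Path(tempfile.gettempdir()) / "snapshot.json")
snap = snapshot_compacted_log(log, snap_path)
```

In a real deployment the `records` iterable would be a Kafka consumer reading the compacted topic from the earliest offset, and the snapshot file would land in blob storage rather than a temp directory.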

RE: Re-processing a la Kappa/Liquid

2015-02-23 Thread Felix GV
[…]a la Kappa/Liquid Thanks, Julian. I didn't see any mention of checkpoints in the Kappa or Liquid information I've read, but it does seem like a very useful optimization to make re-processing and failure recovery much faster. Databus supports snapshots, I believe, so that DB replica[…]

Re: Re-processing a la Kappa/Liquid

2015-02-23 Thread Jay Kreps
Basically log compaction == snapshot in a logical format. You can optimize a tiny bit more, of course, if you store the data files themselves for whatever store but that is going to be very storage engine specific. -Jay On Mon, Feb 23, 2015 at 10:33 AM, Roger Hoover wrote: > Thanks, Julian. > > […]

Re: Re-processing a la Kappa/Liquid

2015-02-23 Thread Roger Hoover
Thanks, Julian. I didn't see any mention of checkpoints in the Kappa or Liquid information I've read, but it does seem like a very useful optimization to make re-processing and failure recovery much faster. Databus supports snapshots, I believe, so that DB replicas can be initialized in a practical[…]

Re: Re-processing a la Kappa/Liquid

2015-02-22 Thread Julian Hyde
Can I quibble with semantics? This problem seems to be more naturally a stream-to-stream join, not a stream-to-table join. It seems unreasonable to expect the system to be able to give you the state of a table at a given moment in the past, but it is reasonable to ask for the stream up to that po[…]
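Julian's point — you can't ask a table for its past state, but you can replay the stream up to a past point — can be made concrete with a small sketch. This is an illustrative toy, with an assumed changelog shape of (timestamp, key, value) tuples and a hypothetical function name; it is not an API from Samza or Kafka.

```python
def table_as_of(changelog, t):
    """Rebuild the table's state as of time t by replaying the
    change stream up to and including t. Because the stream is
    retained, any past version of the table is recoverable."""
    state = {}
    for ts, key, value in changelog:
        if ts > t:
            break                  # changelog is assumed time-ordered
        if value is None:          # tombstone: delete the key
            state.pop(key, None)
        else:
            state[key] = value
    return state

events = [(1, "x", "v1"), (5, "x", "v2"), (9, "y", "v1")]
old = table_as_of(events, 4)   # state before "x" was overwritten
new = table_as_of(events, 9)   # state after all three events
```

This is exactly why the thread keeps circling back to retention: once compaction deletes the old version of "x", `table_as_of(events, 4)` is no longer answerable.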

Re: Re-processing a la Kappa/Liquid

2015-02-21 Thread Roger Hoover
Thanks, Jay. This is one of the really nice advantages of local state in my mind. Full retention would work but eventually run out of space, right? Ideally, Kafka would guarantee to keep dirty keys for a configurable amount of time as Chris suggested. Sent from my iPhone > On Feb 21, 2015, […]

Re: Re-processing a la Kappa/Liquid

2015-02-21 Thread Jay Kreps
Gotcha. Yes, if you want to be able to join to past versions you definitely can't turn on compaction, as the whole goal of that feature is to delete past versions. But wouldn't it work to use full retention if you want that (and use the MessageChooser interface during reprocessing if you want tight c[…]
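Samza's MessageChooser is a real (Java) interface that lets a job decide which input stream's next message to process. A toy Python analogue of the idea Jay is gesturing at — deliver table/bootstrap messages before event-stream messages so the join sees a caught-up table — might look like this. The class name, the envelope shape (a (stream, message) tuple), and the buffering strategy are all assumptions for illustration, not the actual Samza implementation.

```python
class BootstrapFirstChooser:
    """Toy analogue of Samza's MessageChooser: when messages from
    several streams are available, always hand back bootstrap-stream
    (table) messages before event-stream messages."""

    def __init__(self, bootstrap_streams):
        self.bootstrap = set(bootstrap_streams)
        self.buffered = []            # envelopes waiting to be chosen

    def update(self, envelope):
        """Called when a new (stream_name, message) becomes available."""
        self.buffered.append(envelope)

    def choose(self):
        """Return the next envelope to process, or None if empty."""
        if not self.buffered:
            return None
        for i, (stream, _) in enumerate(self.buffered):
            if stream in self.bootstrap:
                return self.buffered.pop(i)
        return self.buffered.pop(0)   # no bootstrap messages pending

chooser = BootstrapFirstChooser({"users-table"})
chooser.update(("clicks", "event-1"))
chooser.update(("users-table", "user-row-1"))
first = chooser.choose()   # the table row, despite arriving second
second = chooser.choose()
```

In the real interface the same update/choose contract applies, which is why it gives the "tight control" over interleaving during reprocessing that Jay mentions.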

Re: Re-processing a la Kappa/Liquid

2015-02-20 Thread Chris Riccomini
I think the nuance in Roger's example is that the stream that's being rewound is an event stream, not a primary data stream. As such, going back to the earliest offset might only bring you back a week. If you want a consistent view of that time, you'd want your table join to be the view as of a week[…]

Re: Re-processing a la Kappa/Liquid

2015-02-20 Thread Roger Hoover
Jay, Sorry, I didn't explain it very well. I'm talking about a stream-table join where the table comes from a compacted topic that is used to populate a local data store. As the stream events are processed, they are joined with dimension data from the local store. If you want to kick off anothe[…]
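The setup Roger describes — bootstrap a local store from a compacted topic, then enrich stream events against it — can be sketched as a toy in-memory version. The field names (`user_id`, `user_name`), the dict-based store, and both function names are hypothetical; a real Samza job would use a RocksDB-backed store and the task API.

```python
def bootstrap_store(compacted_topic):
    """Populate a local key-value store by replaying a compacted
    changelog topic (later values win, as with log compaction)."""
    store = {}
    for key, value in compacted_topic:
        store[key] = value
    return store

def stream_table_join(events, store):
    """Enrich each stream event with dimension data looked up
    in the local store (a stream-table join)."""
    for event in events:
        dimension = store.get(event["user_id"])   # None if no match
        yield {**event, "user_name": dimension}

store = bootstrap_store([("u1", "alice"), ("u2", "bob"), ("u1", "alice2")])
joined = list(stream_table_join([{"user_id": "u1", "action": "click"}], store))
```

Roger's problem is visible in this sketch: the store only ever holds the latest dimension values, so replaying last week's events joins them against today's table unless older versions are retained somewhere.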

Re: Re-processing a la Kappa/Liquid

2015-02-20 Thread Jay Kreps
Hey Roger, I'm not sure if I understand the case you are describing. As Chris says, we don't yet give you fine-grained control over when history starts to disappear (though we designed with the intention of making that configurable later). However, I'm not sure if you need that for the case you de[…]

Re: Re-processing a la Kappa/Liquid

2015-02-20 Thread Roger Hoover
Chris, Thank you for this really great response. I don't need this right now but wanted to understand the limitations. If Kafka could guarantee to keep the last N hours dirty, that would provide a foundation to build on. Thanks, Roger On Fri, Feb 20, 2015 at 8:16 AM, Chris Riccomini wrote: > […]

Re: Re-processing a la Kappa/Liquid

2015-02-20 Thread Chris Riccomini
Hey Roger, I believe your description is correct. Kafka has a "dirty ratio" concept for its log-compacted topics. Once the dirty (uncleaned) portion of the log passes the "dirty ratio" (e.g. 50% of the log hasn't been cleaned, based on bytes on disk), the log cleaner kicks in. Once this happens, you c[…]
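Chris's "dirty ratio" check can be illustrated with a couple of lines. Kafka's actual config for this is `min.cleanable.dirty.ratio` (default 0.5); the function names below and the byte counts in the example are made up for illustration.

```python
def dirty_ratio(clean_bytes, dirty_bytes):
    """Fraction of the log (by bytes on disk) that has not yet
    been compacted by the log cleaner."""
    total = clean_bytes + dirty_bytes
    return dirty_bytes / total if total else 0.0

def cleaner_should_run(clean_bytes, dirty_bytes, min_ratio=0.5):
    """Sketch of the min.cleanable.dirty.ratio check: the log
    cleaner kicks in once the dirty portion crosses the threshold."""
    return dirty_ratio(clean_bytes, dirty_bytes) >= min_ratio

# 60 MB dirty out of 100 MB total: ratio 0.6, so the cleaner runs.
run_a = cleaner_should_run(40, 60)
# 20 MB dirty out of 100 MB total: ratio 0.2, below the threshold.
run_b = cleaner_should_run(80, 20)
```

This is also why Roger's "keep the last N hours dirty" request is a separate guarantee: the ratio is byte-based, not time-based, so a low-traffic topic can cross the threshold quickly or slowly regardless of wall-clock age.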