Assume that there is data that doesn't have a key, how would you handle that?
Would you always have a key and therefore generate one?
Best regards, Tom
From: Felix GV
To: "dev@samza.apache.org"
Sent: Monday, February 23, 2015 2:15 PM
Subject: RE: Re-processing a la Kappa/Liquid
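One way to handle keyless data, sketched below (my illustration, not something proposed in the thread): derive a deterministic synthetic key from the message payload itself, e.g. a content hash. Identical payloads then compact down to a single record; note that if every payload is unique, compaction degenerates into full retention.

```python
import hashlib
import json

def synthetic_key(payload: dict) -> str:
    """Hypothetical helper: derive a stable key from a keyless payload.

    Hashing the canonical (sorted-keys) JSON form gives the same key for
    the same logical payload regardless of field ordering.
    """
    canonical = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

# Identical payloads always map to the same key, even with reordered fields...
k1 = synthetic_key({"user": "alice", "action": "click"})
k2 = synthetic_key({"action": "click", "user": "alice"})
# ...while distinct payloads get distinct keys.
k3 = synthetic_key({"user": "bob", "action": "click"})
```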
Ah, right. To save historical snapshots, one could periodically read the whole
compacted topic and save it somewhere.
Sent from my iPhone
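Roger's periodic-snapshot idea can be sketched in a few lines (illustrative only; `read_topic` stands in for a real consumer reading a compacted topic from the earliest offset to the current end):

```python
import json

def read_topic():
    # Hypothetical contents of a compacted topic: (key, value) records,
    # oldest first. Compaction retains at least the latest value per key.
    return [("a", 1), ("b", 2), ("a", 3), ("c", 4)]

def snapshot(records):
    """Fold the whole log into a point-in-time table: last value per key wins."""
    state = {}
    for key, value in records:
        state[key] = value
    return state

snap = snapshot(read_topic())
# Serialize and save "somewhere", e.g. object storage, tagged with a timestamp.
serialized = json.dumps(snap, sort_keys=True)
```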
> On Feb 23, 2015, at 11:11 AM, Jay Kreps wrote:
Basically log compaction == snapshot in a logical format. You can optimize
a tiny bit more, of course, if you store the data files themselves for
whatever store but that is going to be very storage engine specific.
-Jay
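Jay's point that log compaction is a snapshot in a logical format can be demonstrated with a toy model (my sketch, not Kafka's actual cleaner implementation): compacting a log keeps only the newest record per key, and replaying the compacted log materializes the same table as replaying the full log.

```python
def compact(log):
    """Simulate log compaction: keep only the newest record per key,
    preserving the relative order of the surviving records."""
    last_index = {key: i for i, (key, _) in enumerate(log)}
    return [rec for i, rec in enumerate(log) if last_index[rec[0]] == i]

def replay(log):
    """Materialize a table by replaying a log: last write per key wins."""
    state = {}
    for key, value in log:
        state[key] = value
    return state

full_log = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("c", 5)]
compacted = compact(full_log)
# Replaying either log yields the same final state -- the compacted log
# is a snapshot expressed in the log's own format.
```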
On Mon, Feb 23, 2015 at 10:33 AM, Roger Hoover wrote:
Thanks, Julian.
I didn't see any mention of checkpoints in the Kappa or Liquid material I've
read, but it does seem like a very useful optimization to make re-processing
and failure recovery much faster. Databus supports snapshots, I believe,
so that DB replicas can be initialized in a practical
Can I quibble with semantics?
This problem seems to be more naturally a stream-to-stream join, not a
stream-to-table join. It seems unreasonable to expect the system to be able to
give you the state of a table at a given moment in the past, but it is
reasonable to ask for the stream up to that point.
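The distinction can be made concrete (my sketch): you can rebuild table state "as of" a past point by replaying the uncompacted stream up to that offset, which is exactly what a compacted topic cannot give you, since it only retains the latest value per key.

```python
def state_as_of(log, offset):
    """Rebuild table state as of a given offset by replaying the
    uncompacted stream up to (and including) that offset."""
    state = {}
    for key, value in log[: offset + 1]:
        state[key] = value
    return state

# A stream of updates, oldest first.
log = [("price", 10), ("qty", 2), ("price", 12)]
then = state_as_of(log, 1)            # state before the price update
now = state_as_of(log, len(log) - 1)  # current state
```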
Thanks, Jay. This is one of the really nice advantages of local state in my
mind. Full retention would work but eventually run out of space, right?
Ideally, Kafka would guarantee to keep dirty keys for a configurable amount of
time as Chris suggested.
Sent from my iPhone
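For reference, later Kafka releases (0.10.1+, via KIP-58) added a per-topic setting that provides roughly the guarantee Roger asks for: records newer than a configured lag are never compacted. A topic-config sketch (setting names from those later releases; check your Kafka version):

```properties
# Enable log compaction for the topic
cleanup.policy=compact
# Records are guaranteed to stay uncompacted ("dirty") for at least
# this long -- 24 hours here
min.compaction.lag.ms=86400000
```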
Gotcha. Yes if you want to be able to join to past versions you definitely
can't turn on compaction as the whole goal of that feature is to delete
past versions. But wouldn't it work to use full retention if you want that
(and use the MessageChooser interface during reprocessing if you want tight
c
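Samza's MessageChooser interface lets a job decide which buffered input envelope to process next when several streams have messages available. A language-agnostic sketch of what such a policy can achieve during reprocessing (my illustration, not the actual Samza Java API): pick the message with the smallest timestamp so the replayed interleaving is roughly time-ordered.

```python
import heapq

def merge_by_timestamp(*streams):
    """Merge several time-sorted streams into one roughly time-ordered
    interleaving. Each stream is an iterable of (timestamp, payload),
    already sorted by timestamp -- analogous to replaying topics from
    their earliest offsets.
    """
    return [payload for _, payload in heapq.merge(*streams)]

# Hypothetical inputs: table updates and events, each sorted by time.
table_updates = [(1, "user:alice"), (5, "user:bob")]
events = [(2, "click:alice"), (6, "click:bob")]
ordered = merge_by_timestamp(table_updates, events)
```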
I think the nuance in Roger's example is that the stream that's being
rewound is an event stream not a primary data stream. As such, going back
to the earliest offer might only bring you back a week. If you want a
consistent view of that time, you'd want your table join to be the view as
of a week ago.
Jay,
Sorry, I didn't explain it very well. I'm talking about a stream-table
join where the table comes from a compacted topic that is used to populate
a local data store. As the stream events are processed, they are joined
with dimension data from the local store.
If you want to kick off anothe
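The stream-table join Roger describes can be sketched as follows (my illustration with hypothetical field names, not Samza's KeyValueStore API): a local store is bootstrapped from a compacted changelog topic, then each stream event is enriched with dimension data looked up in that store.

```python
def bootstrap_store(changelog):
    """Populate the local store by replaying the compacted changelog:
    the last value per key wins."""
    store = {}
    for key, value in changelog:
        store[key] = value
    return store

def join(events, store):
    """Enrich each stream event with dimension data from the local store."""
    for event in events:
        dim = store.get(event["user_id"], {})
        yield {**event, **dim}

# Hypothetical changelog: u1's record was later updated.
changelog = [("u1", {"country": "US"}),
             ("u2", {"country": "DE"}),
             ("u1", {"country": "CA"})]
store = bootstrap_store(changelog)
enriched = list(join([{"user_id": "u1", "action": "click"}], store))
```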
Hey Roger,
I'm not sure if I understand the case you are describing.
As Chris says, we don't yet give you fine-grained control over when history
starts to disappear (though we designed with the intention of making that
configurable later). However, I'm not sure if you need that for the case you
describe.
Chris,
Thank you for this really great response. I don't need this right now but
wanted to understand the limitations. If Kafka could guarantee to keep the
last N hours dirty, that would provide foundation to build on.
Thanks,
Roger
On Fri, Feb 20, 2015 at 8:16 AM, Chris Riccomini wrote:
Hey Roger,
I believe your description is correct. Kafka has a "dirty ratio" concept
for its log-compacted topics. Once the dirty (unclean) portion of the log
passes the "dirty ratio" (e.g. 50% of the log hasn't been cleaned, based on
bytes on disk), the log cleaner kicks in. Once this happens, you c
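Chris's dirty-ratio trigger can be modeled in a few lines (a simplified sketch; the broker's actual accounting is more involved, and the topic-level setting is `min.cleanable.dirty.ratio`, default 0.5):

```python
def cleaner_should_run(clean_bytes, dirty_bytes,
                       min_cleanable_dirty_ratio=0.5):
    """Simplified model of Kafka's dirty-ratio trigger: the log cleaner
    kicks in once the uncleaned (dirty) portion of the log reaches the
    configured fraction of total log bytes."""
    total = clean_bytes + dirty_bytes
    return total > 0 and dirty_bytes / total >= min_cleanable_dirty_ratio

# At 50% dirty the cleaner kicks in; at 10% it does not.
```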
Chris + Samza Devs,
I was wondering whether Samza could support re-processing as described by
the Kappa architecture or Liquid (
http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper25u.pdf).
It seems that a changelog is not sufficient to be able to restore state
backward in time. Kafka compaction