Re: How to read CDC from Cassandra?

Josh McKenzie Fri, 17 Feb 2017 05:41:49 -0800

1) What exactly is written to the commit log? Is it just the ID or the whole
object?
It's a raw commit log, so the entire serialized mutation.


2) If it's just the IDs of the inserted/modified rows, then is the client
expected to read the whole object from the ID?
See 1.

3) If it's the entire payload, how does the client deserialize the payload to
the full row?
See CommitLogReader.java and CommitLogReadHandler.java.

4) What about partial updates? Some clients cannot work on partial updates and
will need to read the full object. Any recommendations for those?
For full rows to be in the CDC log, we'd have to read-before-write on every
mutation and throw away most of the benefits of the append-only commitlog,
so we don't. The log only holds what each mutation wrote; you'd need to read
the full row from the DB for that.

5) What is the best way to try out the whole flow? Is it the following:
 - a) Set up cassandra.yaml for cdc and create tables with cdc=true
 - b) Write some data to the table and see the files being generated in the
cdc_raw_directory
 - c) Launch an agent similar to CASSANDRA-11575. Consume and delete the cdc
files?
yes
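
For step c), the consume-and-delete loop might look roughly like the sketch
below. This is not the CASSANDRA-11575 agent; it is a minimal illustration
that assumes segments show up as *.log files in the cdc_raw directory. The
names CdcRawDrainer, drainOnce and segmentHandler are made up for this
example, and the handler is only a placeholder for whatever actually parses
the segment (for example, something built on CommitLogReader and a
CommitLogReadHandler implementation).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Cassandra.
final class CdcRawDrainer {

    /**
     * Processes every *.log file currently in cdcRawDir in file-name order and
     * deletes each one only after the handler returns without throwing.
     * Returns the names of the files consumed in this pass.
     */
    static List<String> drainOnce(Path cdcRawDir, Consumer<Path> segmentHandler)
            throws IOException {
        List<Path> segments;
        try (Stream<Path> files = Files.list(cdcRawDir)) {
            segments = files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".log"))
                    .sorted()
                    .collect(Collectors.toList());
        }

        List<String> consumed = new ArrayList<>();
        for (Path segment : segments) {
            // Placeholder hook: a real agent would deserialize the mutations here.
            segmentHandler.accept(segment);
            // Cassandra never deletes from cdc_raw, so the consumer has to.
            Files.delete(segment);
            consumed.add(segment.getFileName().toString());
        }
        return consumed;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: CdcRawDrainer <cdc_raw_directory>");
            return;
        }
        List<String> done =
                drainOnce(Paths.get(args[0]), p -> System.out.println("processing " + p));
        System.out.println("consumed " + done.size() + " segment(s)");
    }
}
```

If the handler throws, the file is left in place, so the next pass retries
it. A long-running daemon would call drainOnce on a timer and would likely
need its own checkpointing on top of this.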

On Thu, Feb 16, 2017 at 7:10 PM, S G <sg.online.em...@gmail.com> wrote:

> Hey Jay,
>
> Thanks for the pointer.
> I have spent quite some time trying to understand this, and even went
> through a good deal of
> https://github.com/apache/cassandra/commit/e31e216234c6b57a531cae607e0355666007deb2,
> but I am not able to understand how this whole thing works.
>
>
> *Can someone please correct my understanding till now (stated below)?*
>
> 1) Cassandra would only write to the CDC log, and never delete from it.
> 2) Cleaning up consumed logfiles would be the client daemon's
> responsibility.
> 3) Daemons should be able to checkpoint their work, and resume from where
> they left off.
>    This means they would have to leave some file artifact in the CDC log's
> directory.
> 4) Upon flush, CommitLogSegments containing data for CDC-enabled tables are
> moved to the data/cdc_raw directory until removed by the user
>
>
> *Questions:*
>
> 1) What exactly is written to the commit log? Is it just the ID or the
> whole object?
> 2) If it's just the IDs of the inserted/modified rows, then is the client
> expected to read the whole object from the ID?
> 3) If it's the entire payload, how does the client deserialize the payload
> to the full row?
> 4) What about partial updates? Some clients cannot work on partial updates
> and will need to read the full object. Any recommendations for those?
> 5) What is the best way to try out the whole flow? Is it the following:
>  - a) Set up cassandra.yaml for cdc and create tables with cdc=true
>  - b) Write some data to the table and see the files being generated in the
> cdc_raw_directory
>  - c) Launch an agent similar to CASSANDRA-11575. Consume and delete the
> cdc files?
>
> Thanks for your help,
> SG
>
>
>
> On Wed, Feb 15, 2017 at 3:19 PM, Jay Zhuang <jay.zhu...@yahoo.com.invalid>
> wrote:
>
> > I tried this CASSANDRA-11575 for 3.8. Works great.
> >
> > Thanks,
> > Jay
> >
> >
> > On 2/15/17 3:08 PM, S G wrote:
> >
> >> Hi,
> >>
> >> I have gone through several resources mentioned in
> >> http://cassandra.apache.org/doc/latest/operating/cdc.html
> >>
> >> The only thing mentioned about reading the CDC is that it is fairly
> >> straightforward with a link to
> >> https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L132-L140
> >>
> >> This is way too high level.
> >>
> >> Can someone please explain or provide me the code to read CDC data after
> >> enabling this feature in Cassandra?
> >>
> >>
> >> Thanks
> >>
> >> SG
> >>
> >>
>
