I wrote an ETL tool for Cassandra that scans the binary commit log of
each node, extracts which keys have received inserts, filters them by
the column timestamp to select only the mutations from the last X
minutes, and then issues a multiget to Cassandra to fetch the freshest
version of the rows.
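The timestamp-filtering step described above can be sketched roughly as follows. This is only an outline under assumptions: the `recent_keys` helper and the `(key, write_time)` pair format are hypothetical, the actual commit-log decoding and the final multiget against Cassandra are omitted.

```python
from datetime import datetime, timedelta, timezone

def recent_keys(mutations, minutes):
    """Return the distinct keys with at least one write in the last
    `minutes` minutes. `mutations` is an iterable of (key, write_time)
    pairs, where write_time is a timezone-aware datetime derived from
    the column timestamp."""
    cutoff = datetime.now(timezone.utc) - timedelta(minutes=minutes)
    return {key for key, write_time in mutations if write_time >= cutoff}

# Example: two fresh writes and one stale one.
now = datetime.now(timezone.utc)
muts = [
    ("user:1", now - timedelta(minutes=2)),
    ("user:2", now - timedelta(minutes=90)),
    ("user:1", now - timedelta(minutes=1)),
]
print(sorted(recent_keys(muts, 15)))  # ['user:1']
```

The resulting key set would then be handed to the driver as one multiget (e.g. a `SELECT ... WHERE key IN (...)`) to pick up the freshest column values.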
Hello Chin,
you can extract the delta using a Pig script and save it in another CF
in Cassandra. Using Pentaho Kettle you can then load the data from that
CF into the RDBMS. Pentaho Kettle is an open-source project. You can
automate the whole process with Azkaban or Oozie.
Kafka is also an alternative.
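The CF-to-RDBMS load step might look like this in miniature. This is a hedged sketch, not a Kettle job: plain dicts stand in for rows read from the hypothetical "delta" CF, and an in-memory SQLite database stands in for the target RDBMS.

```python
import sqlite3

# Rows as they might come back from the "delta" column family;
# in a real job these would be read via the Cassandra driver.
delta_rows = [
    {"key": "user:1", "name": "Alice", "ts": 1700000000},
    {"key": "user:2", "name": "Bob",   "ts": 1700000060},
]

conn = sqlite3.connect(":memory:")  # stand-in for the target RDBMS
conn.execute("CREATE TABLE users (key TEXT PRIMARY KEY, name TEXT, ts INTEGER)")
# INSERT OR REPLACE keeps the load idempotent if the same delta
# is replayed, mirroring Cassandra's last-write-wins semantics.
conn.executemany(
    "INSERT OR REPLACE INTO users (key, name, ts) VALUES (:key, :name, :ts)",
    delta_rows,
)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 2
```

Kettle (or any scheduler-driven script like this) would run the same upsert on each Azkaban/Oozie trigger.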
Why would you use Cassandra as the primary store for logging
information? Have you considered Kafka?
You could, of course, then fan out the logs to both Cassandra (on a
near-real-time basis) and, on a daily basis if you wish, extract the
"deltas" from Kafka into an RDBMS, with no Pig/Hive etc.
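The fan-out plus daily-delta idea can be illustrated with a minimal sketch. Assumptions are loud here: `publish` and `daily_deltas` are hypothetical names, and plain lists stand in for a Cassandra session and a Kafka topic.

```python
# Hypothetical fan-out: each log event goes to every registered sink.
# In production the sinks would wrap a Cassandra writer and a Kafka
# producer; plain lists stand in for both here.
cassandra_sink, kafka_sink = [], []

def publish(event, sinks):
    for sink in sinks:
        sink.append(event)

for e in [{"msg": "login", "day": 1}, {"msg": "logout", "day": 2}]:
    publish(e, [cassandra_sink, kafka_sink])

# Daily batch: pull one day's "deltas" out of the Kafka buffer
# and hand them to the RDBMS loader.
def daily_deltas(buffer, day):
    return [e for e in buffer if e["day"] == day]

print(daily_deltas(kafka_sink, 2))  # [{'msg': 'logout', 'day': 2}]
```

The appeal of this shape is that the near-real-time path (Cassandra) and the batch path (Kafka to RDBMS) share one write, so no Pig/Hive extraction from Cassandra is needed.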