Why would you use Cassandra as the primary store for logging information? Have
you considered Kafka?

You could then, of course, fan out the logs: write them to Cassandra in near
real time and, on a daily basis (if you wish), extract the "deltas" from
Kafka into an RDBMS, with no Pig, Hive, etc.
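
To make the idea concrete, here is a minimal sketch in Python. It is not the
Kafka client API: an in-memory list stands in for a topic partition, and the
names LogPartition and OffsetConsumer are hypothetical. The point it
illustrates is that each consumer keeps its own offset, so the daily job reads
only what was appended since its last run and never rescans old data.

```python
# Sketch only: an in-memory list stands in for a Kafka topic partition,
# and plain Python lists stand in for the Cassandra and RDBMS sinks.


class LogPartition:
    """Append-only log; a record's offset is its position in the list."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the record just written

    def read_from(self, offset):
        """Return (next_offset, records) for everything at or after offset."""
        batch = self._records[offset:]
        return offset + len(batch), batch


class OffsetConsumer:
    """Hypothetical consumer that remembers how far it has read."""

    def __init__(self, partition, sink):
        self.partition = partition
        self.sink = sink
        self.committed = 0  # next offset this consumer has not yet seen

    def poll(self):
        """Copy unseen records to the sink and advance the offset."""
        next_offset, batch = self.partition.read_from(self.committed)
        self.sink.extend(batch)
        self.committed = next_offset
        return len(batch)


if __name__ == "__main__":
    log = LogPartition()
    cassandra_rows, rdbms_rows = [], []
    realtime = OffsetConsumer(log, cassandra_rows)  # polls continuously
    daily = OffsetConsumer(log, rdbms_rows)         # polls once a day

    for i in range(3):
        log.append({"id": i})
        realtime.poll()
    daily.poll()   # first daily run loads the three records written so far

    log.append({"id": 3})
    realtime.poll()
    daily.poll()   # second daily run loads only the one new record
```

With a real broker the offset would be committed to Kafka (or stored next to
the RDBMS load) rather than held in memory, but the delta logic is the same.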


Regards
Milind





On Thu, Dec 13, 2012 at 7:19 PM, cko2...@gmail.com <cko2...@gmail.com> wrote:

> We will use Cassandra as logging storage in one of our web applications.
> The application only inserts rows into Cassandra and never updates or
> deletes any rows. The CF is expected to grow by about 0.5 million rows per day.
>
> We need to transfer the data in Cassandra to another relational database
> daily. Due to the large size of the CF, instead of truncating the
> relational table and reloading all rows into it each time, we plan to run a
> job to select the "delta" rows since the last run and insert them into the
> relational database.
>
> We know we can use Java, Pig or Hive to extract the delta rows to a flat
> file and load the data into the target relational table. We are
> particularly interested in a process that can extract delta rows without
> scanning the entire CF.
>
> Has anyone used any other ETL tools to do this kind of delta extraction
> from Cassandra? We appreciate any comments and experience.
>
> Thanks,
> Chin
>
