Why would you use Cassandra as the primary store for logging information? Have you considered Kafka?

You could, of course, then fan the logs out from Kafka to both destinations: to Cassandra on a near-real-time basis, and (if you wish) to an RDBMS by extracting the daily "deltas" from Kafka, with no Pig/Hive etc. needed.

Regards,
Milind

On Thu, Dec 13, 2012 at 7:19 PM, cko2...@gmail.com <cko2...@gmail.com> wrote:
> We will use Cassandra as logging storage in one of our web applications.
> The application only inserts rows into Cassandra; it never updates or
> deletes any rows. The CF is expected to grow by about 0.5 million rows
> per day.
>
> We need to transfer the data in Cassandra to another relational database
> daily. Due to the large size of the CF, instead of truncating the
> relational table and reloading all rows into it each time, we plan to run
> a job to select the "delta" rows since the last run and insert them into
> the relational database.
>
> We know we can use Java, Pig or Hive to extract the delta rows to a flat
> file and load the data into the target relational table. We are
> particularly interested in a process that can extract delta rows without
> scanning the entire CF.
>
> Has anyone used any other ETL tools to do this kind of delta extraction
> from Cassandra? We appreciate any comments and experience.
>
> Thanks,
> Chin
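The offset-based delta extraction suggested in the reply can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not a Kafka client: `AppendOnlyLog` is a hypothetical in-memory stand-in for a single Kafka partition, an in-memory `sqlite3` database stands in for the target RDBMS, and the table names `log_rows` and `etl_checkpoint` and the function `run_delta_load` are made up for the example. The point is that each daily run reads only records after a stored high-water mark (the next offset to read), so it never rescans old data. A Kafka consumer's committed offset plays a similar role.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class AppendOnlyLog:
    """Hypothetical stand-in for one Kafka partition: records are only
    appended, and each gets a monotonically increasing offset."""
    records: list = field(default_factory=list)

    def append(self, record: str) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def read_from(self, offset: int) -> list:
        # Return (offset, record) pairs from `offset` onward, touching only the new records.
        return [(i, self.records[i]) for i in range(offset, len(self.records))]


def create_target_db() -> sqlite3.Connection:
    """In-memory SQLite database standing in for the relational target."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE log_rows (log_offset INTEGER PRIMARY KEY, message TEXT)")
    db.execute("CREATE TABLE etl_checkpoint (next_offset INTEGER NOT NULL)")
    db.execute("INSERT INTO etl_checkpoint VALUES (0)")
    db.commit()
    return db


def run_delta_load(log: AppendOnlyLog, db: sqlite3.Connection) -> int:
    """Copy only the records appended since the last run into the target
    table, then advance the stored high-water mark."""
    start = db.execute("SELECT next_offset FROM etl_checkpoint").fetchone()[0]
    delta = log.read_from(start)
    with db:  # one transaction: the rows and the new checkpoint commit together
        db.executemany(
            "INSERT INTO log_rows (log_offset, message) VALUES (?, ?)", delta
        )
        db.execute("UPDATE etl_checkpoint SET next_offset = ?", (start + len(delta),))
    return len(delta)


if __name__ == "__main__":
    log = AppendOnlyLog()
    db = create_target_db()
    for msg in ["login alice", "view /home"]:
        log.append(msg)
    print(run_delta_load(log, db))  # day 1: prints 2
    log.append("logout alice")
    print(run_delta_load(log, db))  # day 2: prints 1, only the new record
```

Writing the checkpoint in the same transaction as the copied rows means a failed run leaves both unchanged, so the next run neither skips nor duplicates records.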