Hi,

I have an existing system in which I define simple data flows (using a
simple custom data-flow language) and configure jobs to run against stored
data. I use Quartz to schedule and run these jobs, and the data lives in
various data stores (mainly Cassandra, but some data exists in an RDBMS such
as MySQL as well).

Thinking about scalability, and given the existing support for standard
data-flow languages in the form of Pig and HiveQL, I plan to move my system
to Hadoop.

I've seen some efforts to integrate Cassandra and Hadoop. I've been reading
up, and I am still contemplating how to make this change.

It would be great to hear the recommended approach for doing this on Hadoop,
integrating Cassandra and other RDBMSs. For example, a sample task that
already runs on the system is: "once every hour, read rows from column
family X, aggregate the data in columns A, B, and C, write the results back
to column family Y, and record the details of the last aggregated row in a
MySQL table".
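To make the task concrete, here is a rough sketch of the aggregation logic in Python. Plain dicts and a list stand in for the Cassandra column families and the MySQL audit table, and all names (cf_x, cf_y, audit_log, aggregate_hourly) are hypothetical; the real job would of course go through the Cassandra/Hadoop input formats and a JDBC write instead.

```python
# Sketch of the hourly job: aggregate columns A, B and C of each row in
# column family X, write the result to column family Y, and record the
# last aggregated row key in a MySQL audit table. Dicts/lists are
# stand-ins for the real stores; all names here are hypothetical.

def aggregate_hourly(cf_x, cf_y, audit_log):
    """Sum columns A, B and C per row of cf_x into cf_y, then append
    the key of the last aggregated row to audit_log."""
    last_key = None
    for row_key in sorted(cf_x):
        columns = cf_x[row_key]
        cf_y[row_key] = {"aggregate": columns["A"] + columns["B"] + columns["C"]}
        last_key = row_key
    if last_key is not None:
        audit_log.append({"last_row": last_key})
    return last_key

# Example run with two rows in column family X.
cf_x = {
    "r1": {"A": 1, "B": 2, "C": 3},
    "r2": {"A": 10, "B": 20, "C": 30},
}
cf_y = {}
audit_log = []
aggregate_hourly(cf_x, cf_y, audit_log)
```

On Hadoop, the per-row summation would map naturally onto a Pig FOREACH or a map task, with the MySQL audit write happening once at the end of the job rather than per row.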

Thanks in advance.

-- 
Regards,

Tharindu
