Hi,

I have a running system in which I define a simple data flow (using a simple custom data flow language) and configure jobs to run against stored data. I use Quartz to schedule and run these jobs, and the data lives in various data stores (mainly Cassandra, but some of it is in an RDBMS such as MySQL).
With scalability in mind, and given the existing support for standard data flow languages such as Pig and HiveQL, I plan to move my system to Hadoop. I've seen some efforts on integrating Cassandra with Hadoop. I've been reading up on them and am still contemplating how to make this change. It would be great to hear the recommended approach for doing this on Hadoop, integrated with Cassandra and an RDBMS.

For example, a sample task that already runs on the system is: "Once every hour, get rows from column family X, aggregate the data in columns A, B and C, write the result back to column family Y, and enter the details of the last aggregated row into a table in MySQL."

Thanks in advance.

--
Regards,
Tharindu