Hi everyone,

I currently have a column family, InputCf, in production, which holds one data input per row. Every time I receive new data from the web, I insert a row into this CF. Besides that, I have another CF, InputCfIndex, in which the row key is the date (yyyyMMdd); for each input I insert a column whose name is the InputCf row id, with no value.

At the end of the day, I look up all the rows inserted into InputCf that day and process them. Reading the ids from InputCfIndex is fast, but reading from InputCf uses a lot of IO, because I cannot know on which machine in the cluster each row will be. When I query Cassandra for all the rows inserted today in InputCf, network IO utilization goes to 100%, with almost no CPU or memory consumption.

I was wondering whether there is a way of querying a lot of messages at a time, but multi_get orchestration happens in the client, and since the data is distributed across the cluster, I am not sure it would help.

So here is my question: any ideas on how to change my model so that I can query several inputs at a time while consuming less network IO? I am guessing there must be a way of optimizing it...
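
To make the current model concrete, here is a rough sketch in Python. It is only an illustration: plain dicts stand in for the two column families, no Cassandra client is involved, and the function names (insert_input, read_day) and the "payload" column are made up for the example. It shows the access pattern I described: one read on the index row, then one InputCf row read per id.

```python
from collections import defaultdict
from datetime import date

# Plain dicts stand in for the two column families (illustration only,
# not a Cassandra client).
input_cf = {}                       # row key: input id -> columns of that input
input_cf_index = defaultdict(dict)  # row key: yyyyMMdd -> {input id: empty value}


def insert_input(input_id, payload, day):
    """Write one input row and register its id under the day's index row."""
    input_cf[input_id] = {"payload": payload}  # "payload" is a hypothetical column
    input_cf_index[day.strftime("%Y%m%d")][input_id] = b""  # column with no value


def read_day(day):
    """End-of-day job: one index row read, then one InputCf row read per id."""
    ids = list(input_cf_index.get(day.strftime("%Y%m%d"), {}))
    return [input_cf[i] for i in ids]
```

In the real cluster each of those per-id reads can land on a different node, which is where I suspect the network IO is going.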
Best regards, -- Marcelo Elias Del Valle http://mvalle.com - @mvallebr