io bound model

Marcelo Elias Del Valle Tue, 26 Nov 2013 05:43:20 -0800

Hi everyone,

    I currently have a column family, InputCf, in production, which holds one
data input per row. Every time I receive new data from the web, I insert a
row in this CF. Besides that, I have another CF, InputCfIndex, in which the
year/month/day is my row id (yyyyMMdd), and I insert the id of each InputCf
row as a column name, with no value.
    At the end of the day, I read all the rows inserted that day in InputCf
and process them. Reading the ids from InputCfIndex is fast, but reading from
InputCf uses a lot of IO, because I cannot know on which machine in the
cluster the data will be. When I query Cassandra for all the rows inserted
today in InputCf, it takes 100% of network IO utilization and almost no
CPU or memory.
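    To make the access pattern concrete, here is a rough in-memory sketch
in plain Python. It is not real Cassandra client code: the node count, the
md5-based placement and the helper names (node_for, insert_input, read_day)
are assumptions for illustration only. It shows that the day's index is a
single row, while the data reads fan out as one lookup per input id across
the cluster.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 4  # assumed cluster size, for illustration only


def node_for(row_key: str) -> int:
    """Toy stand-in for the partitioner: hash a row key onto a node."""
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES


# InputCf: one row per input, keyed by the input id.
input_cf = {}
# InputCfIndex: one row per day (yyyyMMdd); column names are InputCf ids.
input_cf_index = defaultdict(set)


def insert_input(day: str, input_id: str, payload: str) -> None:
    """Write the data row and add its id to that day's index row."""
    input_cf[input_id] = payload
    input_cf_index[day].add(input_id)


def read_day(day: str):
    """Read the day's index row, then fetch every InputCf row by id."""
    ids = sorted(input_cf_index[day])  # one wide row, stored under one key
    requests_per_node = defaultdict(int)
    rows = []
    for input_id in ids:
        # Each id hashes independently, so lookups scatter over the nodes.
        requests_per_node[node_for(input_id)] += 1
        rows.append(input_cf[input_id])
    return rows, dict(requests_per_node)


for i in range(1000):
    insert_input("20131126", f"input-{i}", f"payload-{i}")

rows, per_node = read_day("20131126")
print(len(rows), "rows read;", "lookups per node:", per_node)
```

In this toy model the index read touches a single row key, while the data
reads are spread over the nodes with one lookup per input.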
    I was wondering if there is a way of querying a lot of messages at a
time, but multi_get orchestration happens in the client and, as the data is
distributed across the cluster, I am not sure it would help.
    So here is my question: any ideas on how to change my model so that I
can query several inputs at a time while consuming less network IO? I am
guessing there must be a way of optimizing it...


Best regards,
-- 
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr
