I am guessing you already asked if they could give you three 100MB files instead, so you could parallelize the operation? Or maybe your task doesn't lend itself well to that.
Dean

On Tue, Jul 24, 2012 at 10:01 AM, Pushpalanka Jayawardhana <pushpalankaj...@gmail.com> wrote:
> Hi all,
>
> I am dealing with a scenario where I receive a .csv file at 10-minute
> intervals, each averaging 300MB. I need to update a Cassandra cluster
> according to the data received in the .csv file, after some processing.
>
> The current approach keeps a HashMap in memory, updating it from the
> processed .csv files with the data to be written (mostly counter
> updates). Then, periodically (say at 2s intervals), the values in the
> HashMap are read one by one and written to Cassandra.
>
> I have tried generating SSTables and loading the data in batches via
> sstableloader, but it is a lot slower than required; I need near
> real-time results.
>
> Are there any hints on what I can try? Is there any way to do
> something like directly updating values in a memtable (instead of
> using a HashMap) and sending them to Cassandra, rather than loading
> via SSTables?
>
> --
> Pushpalanka Jayawardhana
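
For reference, a minimal sketch of the buffering approach described in the quoted message: accumulate counter deltas in a map and flush them periodically as CQL counter updates. It assumes a hypothetical counter table event_counts (text key event_key, counter column hits) and the DataStax Java driver; the table, column, and class names are illustrative and not from the original thread.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

// Buffers counter increments in memory and flushes them to Cassandra
// every few seconds, mirroring the HashMap approach described above.
public class CounterFlusher {
    private final Map<String, Long> pending = new ConcurrentHashMap<>();
    private final Session session;
    private final PreparedStatement update;

    public CounterFlusher(String contactPoint, String keyspace) {
        Cluster cluster = Cluster.builder().addContactPoint(contactPoint).build();
        this.session = cluster.connect(keyspace);
        // Counter columns are updated with "SET col = col + ?" in CQL.
        // Table and column names here are placeholders.
        this.update = session.prepare(
            "UPDATE event_counts SET hits = hits + ? WHERE event_key = ?");
    }

    // Called by the CSV-processing code for each parsed row.
    public void increment(String key, long delta) {
        pending.merge(key, delta, Long::sum);
    }

    // Called periodically (e.g. every 2 s) to push accumulated deltas.
    public void flush() {
        for (Map.Entry<String, Long> e : pending.entrySet()) {
            // remove() returns the latest accumulated delta atomically,
            // so increments arriving afterwards start a fresh entry.
            Long delta = pending.remove(e.getKey());
            if (delta != null) {
                session.execute(update.bind(delta, e.getKey()));
            }
        }
    }
}

The map absorbs repeated increments to the same key between flushes, so each flush issues at most one counter update per key rather than one write per CSV row.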