Aaron, thanks a lot for your response! It gave us many ideas for future refactorings.

While trying to monitor Cassandra response times on all 3 servers (online, offline, and Cassandra itself), I noticed that the system time was different on all 3. After I ran ntpdate on all of them, the problem was gone! Changes saved to Cassandra by offline are now immediately visible to online.

Unfortunately, I cannot explain why the system time on the client machine matters, but I really hope that I have found the root cause of the problem, and that it is not just a coincidence that performance improved after I synced the clocks.

Best,
Vitaly Sourikov

On Wed, Jan 9, 2013 at 4:24 AM, aaron morton <aa...@thelastpickle.com> wrote:

> > EC2 m1.large node
>
> You will have a much happier time if you use an m1.xlarge.
>
> > We set MAX_HEAP_SIZE="6G" and HEAP_NEWSIZE="400M"
>
> That's a pretty low new heap size.
>
> > checks for new entries (in "Entries" CF, with indexed column status=1),
> > processes them, and sets the status to 2, when done
>
> This is not the best data model.
> You may be better off having one CF for the unprocessed entries and one
> for the processed. Or, if you really need a queue, use something like Kafka.
>
> > I will appreciate any advice on how to speed the writes up,
>
> Writes are instantly available for reading.
> The first thing I would do is see where the delay is. Use nodetool
> cfstats to see the local write latency, or track the write latency from the
> client perspective.
>
> If you are looking for near real time / continuous computation style
> processing, take a look at http://storm-project.net/ and register for this
> talk from Brian O'Neill, one of my fellow DataStax MVPs:
> http://learn.datastax.com/WebinarCEPDistributedProcessingonCassandrawithStorm_Registration.html
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 9/01/2013, at 5:48 AM, Vitaly Sourikov <vitaly.souri...@gmail.com> wrote:
>
> > Hi,
> > we are currently at an early stage of our project and have only one
> > Cassandra 1.1.7 node hosted on an EC2 m1.large node, where the data is
> > written to the ephemeral disk, and /var/lib/cassandra/data is just a soft
> > link to it. Commit logs and caches are still on /var/lib/cassandra/. We set
> > MAX_HEAP_SIZE="6G" and HEAP_NEWSIZE="400M"
> >
> > On the client side, we use Astyanax 1.56.18 to access the data. We have a
> > processing server that writes to Cassandra, and an online server that reads
> > from it. The former wakes up every 0.5-5 sec., checks for new entries (in
> > "Entries" CF, with indexed column status=1), processes them, and sets the
> > status to 2, when done. The online server checks once a second whether an
> > entry that should be processed has reached status 2 and sends it to its
> > client side for display. Processing takes 5-10 seconds and updates various
> > columns in the "Entries" CF a few times along the way. One of these columns
> > may contain ~12KB of textual data; the others are just short strings or
> > numbers.
> >
> > Now, our problem is that it takes 20-40 seconds before the online server
> > actually sees the change, and that is way too long: this process is
> > supposed to be nearly real-time. Moreover, in cqlsh, if I perform a similar
> > update, it is immediately seen in the following select results, but the
> > updates from the back-end server do not appear there for 20-40 seconds
> > either.
> >
> > I tried switching the row cache on and off, both for that table and in the
> > yaml. I tried commitlog_sync: batch with
> > commitlog_sync_batch_window_in_ms: 50. Nothing helped.
> >
> > I will appreciate any advice on how to speed the writes up, or at least an
> > explanation of why this happens.
> >
> > thanks,
> > Vitaly
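
A note on why client clocks can matter here. This is one plausible explanation, not something confirmed in this thread. Cassandra resolves conflicting writes to the same column by timestamp, and the highest timestamp wins. Thrift-based clients such as Astyanax stamp each write with the client machine's clock. If the online server's clock runs ahead of the offline server's, an update from offline can carry an older timestamp than the value it is meant to replace, and stays invisible until the offline clock catches up. The Python sketch below only simulates that rule; the names `Cell`, `reconcile`, and `write`, the key, and the 30-second skew are all hypothetical, and tie-breaking on equal timestamps is simplified.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Cell:
    value: str
    timestamp_us: int  # client-supplied write timestamp, in microseconds


def reconcile(existing: Optional[Cell], incoming: Cell) -> Cell:
    """Last-write-wins: keep whichever cell has the higher timestamp.

    Simplification: on equal timestamps the existing cell is kept.
    """
    if existing is None or incoming.timestamp_us > existing.timestamp_us:
        return incoming
    return existing


def write(store: Dict[str, Cell], key: str, value: str, client_clock_us: int) -> None:
    """Apply a write stamped with the writing client's own clock."""
    store[key] = reconcile(store.get(key), Cell(value, client_clock_us))


store: Dict[str, Cell] = {}
true_time_us = 1_357_700_000_000_000  # hypothetical wall-clock instant
online_skew_us = 30_000_000           # hypothetical: online clock runs 30 s fast

# Online creates the entry with status=1, stamped by its fast clock.
write(store, "entry:status", "1", true_time_us + online_skew_us)

# Offline finishes 8 s later and sets status=2 using an accurate clock.
write(store, "entry:status", "2", true_time_us + 8_000_000)
print(store["entry:status"].value)  # prints 1: the older-stamped update loses

# Only a write stamped after the skewed timestamp becomes visible.
write(store, "entry:status", "2", true_time_us + 35_000_000)
print(store["entry:status"].value)  # prints 2
```

In this simulation the update only becomes visible once the writer's clock passes the skewed timestamp, so the delay is roughly the size of the skew. That would be consistent with a 20-40 second delay that disappears after running ntpdate.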