Hi,

The timestamp is generated on the client side, so if you have two clients that set the timestamp from their system time, you can run into trouble. I don't know how Astyanax does it, and I am not sure whether it would cause trouble when reading data. Could it be that the processing server actually saw the information, but tried to update with a lower timestamp, so the update kept failing until the 40 seconds had passed?

From http://wiki.apache.org/cassandra/DataModel:

"All values are supplied by the client, including the 'timestamp'. This means that clocks on the clients should be synchronized (in the Cassandra server environment is useful also), as these timestamps are used for conflict resolution. In many cases the 'timestamp' is not used in client applications, and it becomes convenient to think of a column as a name/value pair. For the remainder of this document, 'timestamps' will be elided for readability. It is also worth noting the name and value are binary values, although in many applications they are UTF8 serialized strings."

.vegard

----- Original Message -----
From: user@cassandra.apache.org
To:
Cc:
Sent: Wed, 9 Jan 2013 15:56:08 +0200
Subject: Re: How long does it take for a write to actually happen?
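To make the conflict-resolution point concrete, here is a minimal Python sketch (not Cassandra's actual code) of last-write-wins reconciliation on client-supplied timestamps. With a skewed clock, a later write can carry a lower timestamp and silently lose:

```python
def reconcile(stored, incoming):
    """Last-write-wins: keep whichever (value, timestamp) pair
    carries the higher client-supplied timestamp."""
    return incoming if incoming[1] > stored[1] else stored

# An entry is inserted with status=1 by a machine whose clock reads t=1000.
column = ("status=1", 1000)

# The processing server finishes its work and writes status=2, but its
# clock is ~40 s behind, so the update carries t=965 and is discarded.
column = reconcile(column, ("status=2", 965))
print(column)  # still ("status=1", 1000) until the slow clock passes t=1000
```

This matches the symptom in the thread: the write "happens" on the server, but the column keeps the old value until the slow writer's clock catches up to the stored timestamp.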
Aaron, thanks a lot for your response! It gave us many ideas for future refactorings.

Meanwhile, while trying to monitor Cassandra response times on all 3 servers (online, offline, and Cassandra itself), I noticed that the system time was different on all 3. After I ran ntpdate on all of them, the problem was gone! The changes saved in Cassandra by the offline server are immediately visible to the online one. Unfortunately, I cannot explain why the system time on the client machine matters, but I really hope that I have found the root cause of the problem, and that it is not just a coincidence that performance improved after I synchronized the clocks.

Best,
Vitaly Sourikov

On Wed, Jan 9, 2013 at 4:24 AM, aaron morton wrote:

> EC2 m1.large node
You will have a much happier time if you use an m1.xlarge.

> We set MAX_HEAP_SIZE="6G" and HEAP_NEWSIZE="400M"
That's a pretty low new heap size.

> checks for new entries (in "Entries" CF, with indexed column status=1), processes them, and sets the status to 2, when done
This is not the best data model. You may be better off having one CF for the unprocessed entries and one for the processed ones. Or, if you really need a queue, use something like Kafka.

> I will appreciate any advice on how to speed the writes up,
Writes are instantly available for reading. The first thing I would do is see where the delay is. Use nodetool cfstats to see the local write latency, or track the write latency from the client perspective.
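The suggestion to track the delay from the client perspective can be sketched with a small timing helper. The names here are hypothetical: in the real setup the `write`/`read` callables would wrap the Astyanax calls, and a toy in-memory dict stands in for the cluster:

```python
import time

def visibility_delay(write, read, expected, timeout=60.0, poll=0.05):
    """Time how long after write() until read() returns `expected`."""
    start = time.monotonic()
    write()
    while time.monotonic() - start < timeout:
        if read() == expected:
            return time.monotonic() - start
        time.sleep(poll)
    raise TimeoutError("value never became visible")

# Toy stand-in for the real store, just to show the shape of the call:
store = {}
delay = visibility_delay(
    lambda: store.update(status=2),   # would be the processing server's write
    lambda: store.get("status"),      # would be the online server's read
    expected=2,
)
print(f"visible after {delay * 1000:.1f} ms")
```

Running this against the real cluster (one process writing, another reading) would show whether the 20-40 s gap sits in Cassandra or in the application's own polling.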
If you are looking for near real time / continuous computation style processing, take a look at http://storm-project.net/ [2] and register for this talk from Brian O'Neill, one of my fellow DataStax MVPs: http://learn.datastax.com/WebinarCEPDistributedProcessingonCassandrawithStorm_Registration.html [3]

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com [4]

On 9/01/2013, at 5:48 AM, Vitaly Sourikov wrote:

Hi,
we are currently at an early stage of our project and have only one Cassandra 1.1.7 node, hosted on an EC2 m1.large instance, where the data is written to the ephemeral disk and /var/lib/cassandra/data is just a soft link to it. Commit logs and caches are still on /var/lib/cassandra/. We set MAX_HEAP_SIZE="6G" and HEAP_NEWSIZE="400M". On the client side, we use Astyanax 1.56.18 to access the data.

We have a processing server that writes to Cassandra, and an online server that reads from it. The former wakes up every 0.5-5 sec., checks for new entries (in the "Entries" CF, with indexed column status=1), processes them, and sets the status to 2 when done. The online server checks once a second whether an entry that should be processed got status 2, and sends it to its client side for display. Processing takes 5-10 seconds and updates various columns in the "Entries" CF a few times along the way. One of these columns may contain ~12KB of textual data; the others are just short strings or numbers.

Now, our problem is that it takes 20-40 seconds before the online server actually sees the change, and that is way too long: this process is supposed to be nearly real-time. Moreover, in cqlsh, if I perform a similar update, it is immediately visible in the following select results, but the updates from the back-end server still do not appear for 20-40 seconds. I tried switching the row caches for that table and in the yaml on and off. I tried commitlog_sync: batch with commitlog_sync_batch_window_in_ms: 50. Nothing helped.
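For reference, the polling design described above boils down to something like the following sketch. The names (`fetch_unprocessed`, `process`, `mark_done`) are hypothetical stand-ins for the real Astyanax calls against the "Entries" CF, and a plain dict stands in for the column family:

```python
def drain_once(fetch_unprocessed, process, mark_done):
    """One polling pass: handle every entry currently at status=1."""
    handled = 0
    for entry in fetch_unprocessed():
        process(entry)      # 5-10 s of real work in the actual system
        mark_done(entry)    # sets status=2, which the online server polls for
        handled += 1
    return handled

# Toy in-memory "Entries" CF, just to show the shape:
entries = {"e1": {"status": 1}, "e2": {"status": 2}}
n = drain_once(
    fetch_unprocessed=lambda: [k for k, v in entries.items() if v["status"] == 1],
    process=lambda k: None,
    mark_done=lambda k: entries[k].update(status=2),
)
print(n, entries)  # 1 entry handled; both rows now at status=2
```

The real worker would loop over `drain_once` with a 0.5-5 s sleep between passes. This status-column scan is exactly the pattern Aaron flags as a poor fit for Cassandra's data model, hence the suggestion of separate CFs or a dedicated queue such as Kafka.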
I will appreciate any advice on how to speed the writes up, or at least an explanation of why this happens.

thanks,
Vitaly

Links:
------
[1] mailto:aa...@thelastpickle.com
[2] http://storm-project.net/
[3] http://learn.datastax.com/WebinarCEPDistributedProcessingonCassandrawithStorm_Registration.html
[4] http://www.thelastpickle.com
[5] mailto:vitaly.souri...@gmail.com