Aaron, thanks a lot for your response! It gave us many ideas for future
refactorings.

Meanwhile, while trying to monitor Cassandra response times on all 3
servers (online, offline and Cassandra itself), I noticed that the
system time was different on each of them. After I ran ntpdate on all of
them, the problem was gone! Changes saved to Cassandra by the offline server
are now immediately visible to the online server.

Unfortunately, I cannot explain why system time on the client machine
matters, but I really hope that I have found the root cause of the problem
and that it is not just a coincidence that performance improved after I
synced the clocks.
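
My only guess so far, and it is an unverified assumption rather than a
diagnosis: Cassandra keeps, for each column, the value with the highest write
timestamp, and Thrift-based clients such as Astyanax supply that timestamp
from their own clock. If the offline server's clock was behind the clock of
whatever machine wrote the previous value, its updates could have lost to the
older value until enough time had passed. The toy Python model below only
illustrates that idea. Cell, reconcile and write are hypothetical names, the
30-second skew is made up, and none of this is Cassandra or Astyanax code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Cell:
    """One column value plus the timestamp the writer attached to it."""
    value: str
    timestamp_us: int  # microseconds, taken from the WRITER's clock


def reconcile(current: Optional[Cell], incoming: Cell) -> Cell:
    """Last-write-wins: keep the cell with the higher timestamp.

    Simplification: on equal timestamps this sketch keeps the current cell.
    """
    if current is None or incoming.timestamp_us > current.timestamp_us:
        return incoming
    return current


def write(store: Dict[str, Cell], key: str, value: str,
          true_time_us: int, clock_offset_us: int) -> None:
    """Simulate a client write whose clock is off by clock_offset_us."""
    cell = Cell(value, true_time_us + clock_offset_us)
    store[key] = reconcile(store.get(key), cell)


if __name__ == "__main__":
    store: Dict[str, Cell] = {}
    t0 = 1_357_700_000_000_000  # arbitrary starting instant, in microseconds

    # A writer with a correct clock sets status=1.
    write(store, "status", "1", t0, 0)

    # Five seconds later, a writer whose clock is 30 s behind sets status=2.
    # Its timestamp is older than the stored one, so the update is shadowed.
    write(store, "status", "2", t0 + 5_000_000, -30_000_000)
    print(store["status"].value)  # prints 1

    # Once the clocks agree, the same update wins.
    write(store, "status", "2", t0 + 6_000_000, 0)
    print(store["status"].value)  # prints 2
```

If this is what was happening, it would also fit the fact that the delay
disappeared as soon as ntpdate brought the clocks together.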

Best,
Vitaly Sourikov

On Wed, Jan 9, 2013 at 4:24 AM, aaron morton <aa...@thelastpickle.com> wrote:

> > EC2 m1.large node
>
> You will have a much happier time if you use a m1.xlarge.
>
> > We set MAX_HEAP_SIZE="6G" and HEAP_NEWSIZE="400M"
>
> That's a pretty low new heap size.
>
> > checks for new entries (in "Entries" CF, with indexed column status=1),
> > processes them, and sets the status to 2, when done
>
> This is not the best data model.
> You may be better off having one CF for unprocessed entries and one for
> processed entries. Or, if you really need a queue, use something like Kafka.
>
> > I will appreciate any advice on how to speed the writes up,
>
> Writes are instantly available for reading.
> The first thing I would do is see where the delay is. Use the nodetool
> cfstats to see the local write latency, or track the write latency from the
> client perspective.
>
> If you are looking for near real time / continuous computation style
> processing, take a look at http://storm-project.net/ and register for this
> talk from Brian O'Neill, one of my fellow DataStax MVPs:
> http://learn.datastax.com/WebinarCEPDistributedProcessingonCassandrawithStorm_Registration.html
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 9/01/2013, at 5:48 AM, Vitaly Sourikov <vitaly.souri...@gmail.com>
> wrote:
>
> Hi,
> we are currently at an early stage of our project and have only one
> Cassandra 1.1.7 node hosted on an EC2 m1.large node, where the data is
> written to the ephemeral disk, and /var/lib/cassandra/data is just a soft
> link to it. Commit logs and caches are still on /var/lib/cassandra/. We set
> MAX_HEAP_SIZE="6G" and HEAP_NEWSIZE="400M".
>
> On the client side, we use Astyanax 1.56.18 to access the data. We have a
> processing server that writes to Cassandra, and an online server that reads
> from it. The former wakes up every 0.5-5 sec., checks for new entries (in
> "Entries" CF, with indexed column status=1), processes them, and sets the
> status to 2, when done. The online server checks once a second whether an
> entry that should be processed has reached status 2 and, if so, sends it to
> its client side for display. Processing takes 5-10 seconds and updates
> various columns in the "Entries" CF a few times along the way. One of these
> columns may contain ~12KB of textual data, others are just short strings or
> numbers.
>
> Now, our problem is that it takes 20-40 seconds before the online server
> actually sees the change, which is far too long; this process is supposed
> to be nearly real-time. Moreover, if I perform a similar update in cqlsh,
> it is immediately seen in the following select results, but the updates
> from the back-end server do not appear there for 20-40 seconds either.
>
> I tried switching the row cache on and off, both for that table and in the
> yaml. I tried commitlog_sync: batch with
> commitlog_sync_batch_window_in_ms: 50. Nothing helped.
>
> I will appreciate any advice on how to speed the writes up, or at least an
> explanation why this happens.
>
> thanks,
> Vitaly
>
>
>
