That sounds interesting, but I'm not sure exactly what you mean. My key is
like this: "((f1, f2, day), timeuuid)", and f1/f2 are roughly
well-distributed. So my inserts are pretty evenly distributed across
about 22k combinations of f1+f2 each day.
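To make it concrete, the table looks more or less like this (a sketch only: the column and keyspace names here are placeholders, and the actual data columns are omitted):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class SchemaSketch {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();
            // Composite partition key (f1, f2, day) plus a timeuuid clustering column,
            // so each day's writes spread across roughly 22k partitions.
            // Assumes a keyspace named "demo" already exists.
            session.execute(
                "CREATE TABLE demo.events (" +
                "  f1 text, f2 text, day text, ts timeuuid, payload text," +
                "  PRIMARY KEY ((f1, f2, day), ts))");
            cluster.shutdown();
        }
    }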
Are you saying that you get better performance by
What's the disk setup like on these systems? You have some pending tasks in
MemtablePostFlusher and FlushWriter which may mean there is contention on
flushing discarded segments from the commit log.
On Wed, Aug 21, 2013 at 5:14 PM, Keith Freeman <8fo...@gmail.com> wrote:
> Ok, I tried batching 5
The only thing I can think to suggest at this point is upping that batch
size - say to 500 and see what happens.
Do you have any monitoring on this cluster? If not, what do you see as the
output of 'nodetool tpstats' while you run this test?
On Wed, Aug 21, 2013 at 1:40 PM, Keith Freeman <8fo...
Building the giant batch string wasn't as bad as I thought, and at first
I had great(!) results (using "unlogged" batches): 2500 rows/sec
(batches of 100 in 48 threads) ran very smoothly, and the load on the
cassandra server nodes averaged about 1.0 or less continuously.
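For reference, the giant string I'm building is essentially this (a simplified sketch with placeholder keyspace/table/column names; the real code handles escaping and the extra columns):

    import java.util.List;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.utils.UUIDs;

    public class UnloggedBatchString {
        // Builds one big unlogged CQL batch out of N rows (values assumed to contain no quotes).
        static String buildBatch(List<String[]> rows) {
            StringBuilder cql = new StringBuilder("BEGIN UNLOGGED BATCH\n");
            for (String[] r : rows) {
                cql.append("INSERT INTO demo.events (f1, f2, day, ts, payload) VALUES ('")
                   .append(r[0]).append("', '")
                   .append(r[1]).append("', '")
                   .append(r[2]).append("', ")
                   .append(UUIDs.timeBased()).append(", '")
                   .append(r[3]).append("');\n");
            }
            return cql.append("APPLY BATCH;").toString();
        }

        static void send(Session session, List<String[]> rows) {
            session.execute(buildBatch(rows)); // one round trip for the whole batch
        }
    }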
But then I upped it to
Thrift will allow for larger, free-form batch construction. The increase
comes from doing a lot more work in the same payload message. Otherwise,
CQL is more efficient.
If you do build those giant strings, yes, you should see a performance
improvement.
On Tue, Aug 20, 2013 at 8:03 PM, Keith Freeman <8
Thanks. Can you tell me why using thrift would improve performance?
Also, if I do try to build those giant strings for a prepared batch
statement, should I expect another performance improvement?
On 08/20/2013 05:06 PM, Nate McCall wrote:
Ugh - sorry, I knew Sylvain and Michaël had wor
Ugh - sorry, I knew Sylvain and Michaël had worked on this recently but it
is only in 2.0 - I could have sworn it got marked for inclusion back into
1.2 but I was wrong:
https://issues.apache.org/jira/browse/CASSANDRA-4693
This is indeed an issue if you don't know the column count beforehand (or
So I tried inserting prepared statements separately (no batch), and my
server nodes load definitely dropped significantly. Throughput from my
client improved a bit, but only a few %. I was able to *almost* get
5000 rows/sec (sort of) by also reducing the rows/insert-thread to 20-50
and elimin
AFAIK, batch prepared statements were added just recently:
https://issues.apache.org/jira/browse/CASSANDRA-4693 and many client
libraries do not support it yet. (And I believe that the problem is
related to batch operations).
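Once a driver does expose it, usage should look roughly like this (a sketch based on the newer DataStax Java driver's BatchStatement, which the 1.0 driver doesn't have; table and column names are made up):

    import com.datastax.driver.core.BatchStatement;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.utils.UUIDs;

    public class PreparedBatchSketch {
        static void insertBatch(Session session, String[][] rows) {
            // In real code, prepare once at startup rather than per batch.
            PreparedStatement ps = session.prepare(
                "INSERT INTO demo.events (f1, f2, day, ts, payload) VALUES (?, ?, ?, ?, ?)");
            // UNLOGGED skips the batch log, since these rows span many partitions anyway.
            BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
            for (String[] r : rows) {
                batch.add(ps.bind(r[0], r[1], r[2], UUIDs.timeBased(), r[3]));
            }
            session.execute(batch);
        }
    }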
On Tue, Aug 20, 2013 at 4:43 PM, Nate McCall wrote:
> Thanks for
Thanks for putting this up - sorry I missed your post the other week. I
would be real curious as to your results if you added a prepared statement
for those inserts.
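Something along these lines is what I mean (just a sketch; the table and column names are guesses, swap in your own):

    import com.datastax.driver.core.BoundStatement;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.utils.UUIDs;

    public class PreparedInsertSketch {
        // Prepare once (the server parses and validates the statement a single time),
        // then bind and execute per row so only the values travel over the wire.
        static void writeRows(Session session, String[][] rows) {
            PreparedStatement insert = session.prepare(
                "INSERT INTO demo.events (f1, f2, day, ts, payload) VALUES (?, ?, ?, ?, ?)");
            for (String[] r : rows) {
                BoundStatement bound = insert.bind(r[0], r[1], r[2], UUIDs.timeBased(), r[3]);
                session.execute(bound);
            }
        }
    }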
On Tue, Aug 20, 2013 at 9:14 AM, Przemek Maciolek wrote:
> I had similar issues (sent a note on the list few weeks ago but nobody
I had similar issues (I sent a note to the list a few weeks ago but nobody
responded). I think there's a serious bottleneck with using wide rows and
composite keys. I made a trivial benchmark, which you can check here:
http://pastebin.com/qAcRcqbF - it's written in cql-rb, but I ran the test
using astyana
John makes a good point re: prepared statements (I'd increase batch sizes
again once you did this as well - separate, incremental runs of course so
you can gauge the effect of each). That should take out some of the
processing overhead of statement validation in the server (some - that load
spike st
Ok, I'll try prepared statements. But while sending my statements
async might speed up my client, it wouldn't improve throughput on the
cassandra nodes, would it? They're running at pretty high loads and only
about 10% idle, so my concern is that they can't handle the data any
faster, so some
I'd suggest using prepared statements that you initialize at application
startup and switching to use Session.executeAsync coupled with Google
Guava Futures API to get better throughput on the client side.
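Roughly like this (a sketch only: the prepared statement and column names are placeholders, and you'd want real error handling instead of printing stack traces):

    import java.util.ArrayList;
    import java.util.List;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.utils.UUIDs;
    import com.google.common.util.concurrent.FutureCallback;
    import com.google.common.util.concurrent.Futures;

    public class AsyncInsertSketch {
        // Fires off many inserts without blocking per row; a callback reports failures.
        static void writeAsync(Session session, PreparedStatement insert, List<String[]> rows) {
            List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
            for (String[] r : rows) {
                ResultSetFuture f = session.executeAsync(
                        insert.bind(r[0], r[1], r[2], UUIDs.timeBased(), r[3]));
                Futures.addCallback(f, new FutureCallback<ResultSet>() {
                    public void onSuccess(ResultSet rs) { /* row acknowledged */ }
                    public void onFailure(Throwable t) { t.printStackTrace(); }
                });
                futures.add(f);
            }
            // Block here only if you need to know everything finished before moving on.
            Futures.getUnchecked(Futures.allAsList(futures));
        }
    }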
On Mon, Aug 19, 2013 at 10:14 PM, Keith Freeman <8fo...@gmail.com> wrote:
> Sure, I've t
Sure, I've tried different numbers for batches and threads, but
generally I'm running 10-30 threads at a time on the client, each
sending a batch of 100 insert statements in every call, using the
QueryBuilder.batch() API from the latest datastax java driver, then
calling the Session.execute() f
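In outline, each client thread does roughly this per call (simplified, with placeholder keyspace/table/column names; the real code differs):

    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.querybuilder.Batch;
    import com.datastax.driver.core.querybuilder.QueryBuilder;
    import com.datastax.driver.core.utils.UUIDs;

    public class QueryBuilderBatchSketch {
        // Roughly one worker-thread call: ~100 inserts per batch, one execute per batch.
        static void sendBatch(Session session, String[][] rows) {
            Batch batch = QueryBuilder.batch();
            for (String[] r : rows) { // rows.length is about 100
                batch.add(QueryBuilder.insertInto("demo", "events")
                        .value("f1", r[0])
                        .value("f2", r[1])
                        .value("day", r[2])
                        .value("ts", UUIDs.timeBased())
                        .value("payload", r[3]));
            }
            session.execute(batch);
        }
    }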
How big are the batch sizes? In other words, how many rows are you sending
per insert operation?
Other than the above, not much else to suggest without seeing some example
code (on pastebin, gist or similar, ideally).
On Mon, Aug 19, 2013 at 5:49 PM, Keith Freeman <8fo...@gmail.com> wrote:
> I'v
I've got a 3-node cassandra cluster (16G/4-core VMs, ESXi v5, on 2.5GHz
machines not shared with any other VMs). I'm inserting time-series data
into a single column-family using "wide rows" (timeuuids) and have a
3-part partition key so my primary key is something like ((a, b, day),
in-time-uuid