That's showing a client-side socket timeout. By default, the timeout for pycassa connections is fairly low: 0.5 seconds. With the default batch insert size of 100 rows, you're probably hitting this timeout occasionally. For the highest write throughput, I suggest lowering the batch size and spreading the inserts across multiple threads; if you don't care that much about throughput, you could instead just increase the timeout on the ConnectionPool.
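For example, something like this rough sketch (the keyspace, column family, and server address are placeholders; substitute whatever your script already uses):

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    # Raise the client-side socket timeout from the 0.5 second default
    pool = ConnectionPool('MyKeyspace', server_list=['server_b:9160'],
                          timeout=10.0)
    cf = ColumnFamily(pool, 'MyCF')

    # Alternatively (or additionally), use smaller batches so each
    # batch_mutate request finishes well within the timeout; queue_size
    # is how many inserts are buffered before a request is sent
    with cf.batch(queue_size=25) as b:
        for key, val in dct.items():
            b.insert(key, {'value': val})

For the multithreaded approach, run a loop like that in each of several threads, each with its own batch; a single ConnectionPool can be shared safely across threads.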
P.S.: There is a pycassa-specific mailing list:
https://groups.google.com/forum/?fromgroups#!forum/pycassa-discuss

On Thu, Sep 20, 2012 at 5:14 AM, Yan Chunlu <springri...@gmail.com> wrote:

> Forgot to mention: the RPC configuration in cassandra.yaml is:
>
>     rpc_timeout_in_ms: 20000
>
> The Cassandra version on the production server is 1.1.3; the Cassandra
> version I am using on my MacBook is 1.0.10.
>
> On Thu, Sep 20, 2012 at 6:07 PM, Yan Chunlu <springri...@gmail.com> wrote:
>
>> I am testing the performance of a single Cassandra node on a production
>> server. I wrote a script to insert 1 million items into Cassandra. The
>> data is generated like this:
>>
>>     prefix = "benchmark_"
>>     dct = {}
>>     for i in range(0, 1000000):
>>         key = "%s%d" % (prefix, i)
>>         dct[key] = "abc" * 200
>>
>> and the inserting code is like this:
>>
>>     with cf.batch(write_consistency_level=CL_ONE) as b:
>>         for key, val in dct.items():
>>             b.insert('%s%s' % (prefix, key),
>>                      {'value': pickle.dumps(val)},
>>                      ttl=None)
>>
>> Sometimes I get a timeout error while it is executing (details here:
>> https://gist.github.com/3754965), and sometimes it runs okay.
>>
>> The script and Cassandra run smoothly on my MacBook (tried many times).
>> The configuration of my Mac is a 2.4 GHz Intel Core 2 Duo with 8 GB of
>> memory, though it has an SSD disk.
>>
>> I really have no idea why this happens.
>>
>> The reason I am doing this test is that on another production server, my
>> three-node cluster also gives the pycassa client "timeout" errors, which
>> makes the system unstable, but I am not sure what the problem is. Is it a
>> bug in the Python library? Thanks for any further help!
>>
>> The test script is running on server A and Cassandra is running on
>> server B. The CPU of B is an Intel(R) Xeon(R) CPU X3470 @ 2.93GHz
>> (quad-core).
>>
>> The system stats on B look normal:
>>
>> vmstat 2
>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>  r  b    swpd    free   buff    cache   si  so  bi    bo    in    cs us sy id wa
>>  1  0 3643716  134876 191720  2352624    1   1   1    44     0     0 22  3 74  0
>>  1  0 3643716  132016 191728  2355180    0   0   0   288  4701 16764  9  4 87  0
>>  0  0 3643716  129700 191736  2357996    0   0   0  5772  3775 17139  9  4 87  0
>>  0  0 3643716  127468 191744  2360420   32   0  32   404  4490 17487 11  3 85  0
>>
>> iostat -x 2
>> Device: rrqm/s wrqm/s   r/s   w/s rkB/s  wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
>> sda       0.00 230.00  1.00 15.00  6.00 980.00   123.25     0.03  2.00    8.00    1.60  1.12  1.80
>> sdb       0.00   0.00  0.00  0.00  0.00   0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
>>
>> avg-cpu:  %user %nice %system %iowait %steal %idle
>>           11.52  1.21    1.99    0.48   0.00 84.80
>>
>> Device: rrqm/s wrqm/s   r/s   w/s rkB/s  wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
>> sda       7.00 184.00 12.50 12.00 78.00 784.00    70.37     0.11  4.65    8.32    0.83  1.88  4.60
>> sdb       0.00   0.00  0.00  0.00  0.00   0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
>>
>> free -t
>>                    total     used     free  shared  buffers   cached
>> Mem:            16467952 16378592    89360       0   152032  2452216
>> -/+ buffers/cache:       13774344  2693608
>> Swap:            7287436  3643716  3643720
>> Total:          23755388 20022308  3733080
>>
>> uptime
>> 04:52:57 up 422 days, 19:59,  1 user,  load average: 2.71, 2.09, 1.48

--
Tyler Hobbs
DataStax <http://datastax.com/>