Set the log level to TRACE and see if the OutboundTcpConnection is going bad; that would explain the message never arriving.
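A minimal sketch of that change, assuming the stock log4j.properties shipped with 0.6.x (the file name and class names are assumptions; adjust to your packaging). Scoping TRACE to the inter-node connection class keeps the extra log volume manageable:

    # Leave the root logger at its current level; turn up only the
    # inter-node connection class (names assume Cassandra 0.6.x).
    log4j.logger.org.apache.cassandra.net.OutboundTcpConnection=TRACE
    # Or widen to the whole messaging package to see both directions:
    # log4j.logger.org.apache.cassandra.net=TRACE

The node needs a restart to pick up the change unless log4j is configured to watch the file.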
On Fri, Jun 18, 2010 at 10:39 AM, AJ Slater <a...@zuno.com> wrote:
> To summarize:
>
> If a request for a column comes in *after a period of several hours
> with no requests*, then the node servicing the request hangs while
> looking for its peer rather than servicing the request like it should.
> It then throws either a TimedOutException or a (wrong)
> NotFoundException.
>
> And it doesn't appear to actually send the message it says it does to
> its peer. Or at least its peer doesn't report the request being
> received.
>
> And then the situation magically clears up after approximately 2 minutes.
>
> However, if the idle period never occurs, then the problem does not
> manifest. If I run a cron job with wget against my server every
> minute, I do not see the problem.
>
> I'll be looking at some tcpdump logs to see if I can suss out what's
> really happening, and perhaps file this as a bug. The several hours
> between reproducible events makes this whole thing aggravating for
> detection, debugging and, I'll assume, fixing, if it is indeed a
> Cassandra problem.
>
> It was suggested on IRC that it may be my network. But gossip is
> continually sending heartbeats, and nodetool and the logs show the
> nodes as up and available. If my network were flaking out I'd think it
> would be dropping heartbeats and I'd see that.
>
> AJ
>
> On Thu, Jun 17, 2010 at 2:26 PM, AJ Slater <a...@zuno.com> wrote:
>> These are physical machines.
>>
>> storage-conf.xml.fs03 is here:
>>
>> http://pastebin.com/weL41NB1
>>
>> Diffs from that for the other two storage-confs are inline here:
>>
>> a...@worm:../Z3/cassandra/conf/dev$ diff storage-conf.xml.lpc03 storage-conf.xml.fs01
>> 185c185
>> < <InitialToken>0</InitialToken>
>> ---
>>> <InitialToken>71603818521973537678586548668074777838</InitialToken>
>> 229c229
>> < <ListenAddress>10.33.2.70</ListenAddress>
>> ---
>>> <ListenAddress>10.33.3.10</ListenAddress>
>> 241c241
>> < <ThriftAddress>10.33.2.70</ThriftAddress>
>> ---
>>> <ThriftAddress>10.33.3.10</ThriftAddress>
>> 341c341
>> < <ConcurrentReads>16</ConcurrentReads>
>> ---
>>> <ConcurrentReads>4</ConcurrentReads>
>>
>> a...@worm:../Z3/cassandra/conf/dev$ diff storage-conf.xml.lpc03 storage-conf.xml.fs02
>> 185c185
>> < <InitialToken>0</InitialToken>
>> ---
>>> <InitialToken>120215585224964746744782921158327379306</InitialToken>
>> 206d205
>> < <Seed>10.33.3.20</Seed>
>> 229c228
>> < <ListenAddress>10.33.2.70</ListenAddress>
>> ---
>>> <ListenAddress>10.33.3.20</ListenAddress>
>> 241c240
>> < <ThriftAddress>10.33.2.70</ThriftAddress>
>> ---
>>> <ThriftAddress>10.33.3.20</ThriftAddress>
>> 341c340
>> < <ConcurrentReads>16</ConcurrentReads>
>> ---
>>> <ConcurrentReads>4</ConcurrentReads>
>>
>> Thank you for your attention,
>>
>> AJ
>>
>>
>> On Thu, Jun 17, 2010 at 2:09 PM, Benjamin Black <b...@b3k.us> wrote:
>>> Are these physical machines or virtuals? Did you post your
>>> cassandra.in.sh and storage-conf.xml someplace?
>>>
>>> On Thu, Jun 17, 2010 at 10:31 AM, AJ Slater <a...@zuno.com> wrote:
>>>> Total data size in the entire cluster is about twenty 12k images, with
>>>> no other load on the system. I just ask for one column and I get these
>>>> timeouts. Performing multiple gets on the columns leads to multiple
>>>> timeouts for a period of a few seconds or minutes, and then the
>>>> situation magically resolves itself and response times are down to
>>>> single-digit milliseconds for a column get.
>>>>
>>>> On Thu, Jun 17, 2010 at 10:24 AM, AJ Slater <a...@zuno.com> wrote:
>>>>> Cassandra 0.6.2 from the Apache Debian source.
>>>>> Ubuntu Jaunty. Sun Java 6 JVM.
>>>>>
>>>>> All nodes in separate racks at 365 Main.
>>>>>
>>>>> On Thu, Jun 17, 2010 at 10:12 AM, AJ Slater <a...@zuno.com> wrote:
>>>>>> I'm seeing 10s timeouts on reads a few times a day. It's hard to
>>>>>> reproduce consistently, but it seems to happen most often after it's
>>>>>> been a long time between reads. After presenting itself for a couple
>>>>>> of minutes, the problem then goes away.
>>>>>>
>>>>>> I've got a three-node cluster with replication factor 2, reading at
>>>>>> consistency level ONE. The columns being read are around 12k each. The
>>>>>> nodes are 8GB multicore boxes with the JVM limits between 4GB and 6GB.
>>>>>>
>>>>>> Here's an application log from early this morning when a developer in
>>>>>> Belgrade accessed the system:
>>>>>>
>>>>>> Jun 17 03:54:17 lpc03 pinhole[5736]: MainThread:pinhole.py:61 | Requested image_id: 5827067133c3d670071c17d9144f0b49
>>>>>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:pinhole.py:76 | TimedOutException for Image 5827067133c3d670071c17d9144f0b49
>>>>>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image Get took 10005.388975 ms
>>>>>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:pinhole.py:61 | Requested image_id: af8caf3b76ce97d13812ddf795104a5c
>>>>>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image Get took 3.658056 ms
>>>>>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image Transform took 0.978947 ms
>>>>>>
>>>>>> That's a timeout and then a successful get of another column.
>>>>>>
>>>>>> Here's the Cassandra log for 10.33.2.70:
>>>>>>
>>>>>> DEBUG 03:54:17,070 get_slice
>>>>>> DEBUG 03:54:17,071 weakreadremote reading SliceFromReadCommand(table='jolitics.com', key='5827067133c3d670071c17d9144f0b49', column_parent='QueryPath(columnFamilyName='Images', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=100)
>>>>>> DEBUG 03:54:17,071 weakreadremote reading SliceFromReadCommand(table='jolitics.com', key='5827067133c3d670071c17d9144f0b49', column_parent='QueryPath(columnFamilyName='Images', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=100) from 45138@/10.33.3.10
>>>>>> DEBUG 03:54:27,077 get_slice
>>>>>> DEBUG 03:54:27,078 weakreadlocal reading SliceFromReadCommand(table='jolitics.com', key='af8caf3b76ce97d13812ddf795104a5c', column_parent='QueryPath(columnFamilyName='Images', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=100)
>>>>>> DEBUG 03:54:27,079 collecting body:false:1...@1275951327610885
>>>>>> DEBUG 03:54:27,080 collecting body:false:1...@1275951327610885
>>>>>> DEBUG 03:54:27,080 Reading consistency digest for af8caf3b76ce97d13812ddf795104a5c from 45...@[/10.33.2.70, /10.33.3.10]
>>>>>> DEBUG 03:54:50,779 Disseminating load info ...
>>>>>>
>>>>>> It looks like it asks for key='5827067133c3d670071c17d9144f0b49' from
>>>>>> the local host and also queries 10.33.3.10 for the first one, and then
>>>>>> for 'af8caf3b76ce97d13812ddf795104a5c' it only queries the local host
>>>>>> and then returns appropriately.
>>>>>>
>>>>>> Here's the log for 10.33.3.10 around that time:
>>>>>>
>>>>>> DEBUG 03:54:19,645 Disseminating load info ...
>>>>>> DEBUG 03:55:19,645 Disseminating load info ...
>>>>>> DEBUG 03:56:19,646 Disseminating load info ...
>>>>>> DEBUG 03:57:19,645 Disseminating load info ...
>>>>>> DEBUG 03:58:19,645 Disseminating load info ...
>>>>>> DEBUG 03:59:19,646 Disseminating load info ...
>>>>>> DEBUG 04:00:18,635 GC for ParNew: 4 ms, 21443128 reclaimed leaving 55875144 used; max is 6580535296
>>>>>>
>>>>>> No record of communication from 10.33.2.70.
>>>>>>
>>>>>> Does this ring any bells for anyone? I can of course attach
>>>>>> storage-confs for all nodes if that sounds useful, and I'll be on
>>>>>> #cassandra as ajslater.
>>>>>>
>>>>>> Much thanks for taking a look and any suggestions. We fear we'll have
>>>>>> to abandon Cassandra if this bug cannot be resolved.
>>>>>>
>>>>>> AJ
>>>>>>
>>>>>
>>>>
>>>
>>
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
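For the tcpdump pass AJ mentions above, restricting the capture to inter-node traffic between the coordinator and its peer keeps the trace small. A rough sketch, assuming the default StoragePort of 7000 and an interface name of eth0 (both are assumptions; check storage-conf.xml and the actual NIC):

    # on 10.33.2.70: watch traffic to/from the peer on the storage port
    sudo tcpdump -i eth0 -n -tttt 'host 10.33.3.10 and port 7000' -w internode.pcap

The every-minute wget that masks the problem would look something like this as a crontab entry (the URL is a placeholder for whatever pinhole endpoint triggers a column read):

    * * * * * wget -q -O /dev/null http://lpc03.example.com/image/some-image-id

Comparing a capture taken across an idle gap with one taken while the keep-alive is running should show whether the connection to 10.33.3.10 is being torn down or silently dropped during the quiet period.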
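On the application side, the 10005 ms "Image Get" above is the node's RPC timeout expiring (presumably the RpcTimeoutInMillis setting, 10000 ms by default), and pinhole.py appears to surface the TimedOutException directly. Below is a minimal retry sketch against the 0.6 Thrift API; it assumes the generated Python bindings (cassandra.Cassandra / cassandra.ttypes), the default unframed transport on port 9160, and the keyspace/column family names visible in the logs. The connect/get_image_columns helpers and the retry policy are illustrative, not pinhole's actual code:

    import time

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from cassandra import Cassandra
    from cassandra.ttypes import (ColumnParent, ConsistencyLevel,
                                  SlicePredicate, SliceRange,
                                  TimedOutException)

    KEYSPACE = 'jolitics.com'   # from the logs above
    IMAGES_CF = 'Images'

    def connect(host, port=9160, timeout_ms=12000):
        # client-side socket timeout a bit above the server's 10s rpc timeout
        socket = TSocket.TSocket(host, port)
        socket.setTimeout(timeout_ms)
        transport = TTransport.TBufferedTransport(socket)
        transport.open()
        return Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))

    def get_image_columns(client, image_id, retries=2, backoff_s=0.5):
        # get_slice at ConsistencyLevel.ONE, retrying on TimedOutException
        parent = ColumnParent(column_family=IMAGES_CF)
        predicate = SlicePredicate(slice_range=SliceRange(
            start='', finish='', reversed=False, count=100))
        for attempt in range(retries + 1):
            try:
                return client.get_slice(KEYSPACE, image_id, parent,
                                        predicate, ConsistencyLevel.ONE)
            except TimedOutException:
                if attempt == retries:
                    raise               # let the caller log the failure
                time.sleep(backoff_s * (attempt + 1))

    # example:
    #   client = connect('10.33.2.70')
    #   columns = get_image_columns(client, '5827067133c3d670071c17d9144f0b49')

Retrying only papers over the stall, of course; it doesn't explain why the forwarded read never shows up on 10.33.3.10.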