Logging at TRACE reveals the problem right away, even with no client requests coming in to the servers. It persisted over the course of half an hour:
10.33.2.70 lpc03
10.33.3.10 fs01
10.33.3.20 fs02

a...@lpc03:~$ grep unable /var/log/cassandra/output.log
TRACE 14:07:52,104 unable to connect to /10.33.3.10
...
TRACE 14:42:00,008 unable to connect to /10.33.3.20
...
TRACE 14:42:06,751 unable to connect to /10.33.3.20

Note that lpc03 has trouble talking to fs01 and fs02. After seeing this I
started logging TRACE on fs01 and fs02. During the six seconds before I
restarted fs02:

a...@fs01:~/logs$ grep unable /var/log/cassandra/output.log
Bad configuration; unable to start server
TRACE 14:42:00,865 unable to connect to /10.33.3.20
...
TRACE 14:42:06,730 unable to connect to /10.33.3.20

Restarted fs02 and there are no issues in any of the logs.

a...@fs02:~$ grep unable /var/log/cassandra/output.log
a...@fs02:~$

The unfiltered log messages all look more like:

TRACE 14:42:06,248 unable to connect to /10.33.3.20
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
        at java.net.Socket.connect(Socket.java:529)
        at java.net.Socket.connect(Socket.java:478)
        at java.net.Socket.<init>(Socket.java:375)
        at java.net.Socket.<init>(Socket.java:276)
        at org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:149)
        at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:85)

On Sat, Jun 19, 2010 at 2:19 PM, AJ Slater <a...@zuno.com> wrote:
> I shall do just that. I did a bunch of tests this morning and the
> situation appears to be this:
>
> I have three nodes A, B and C, with RF=2. I understand now why this
> issue wasn't apparent with RF=3.
>
> If there are regular intranode column requests going on (e.g. I set up
> a pinger to get remote columns), the cluster functions normally.
> However, if no intranode column requests happen for a few hours (3
> hours is the minimum I've seen, but sometimes it takes longer), things
> go wrong. Using node A as the point of contact from the client, all
> columns that live on A are returned in a timely fashion, but for
> columns that only live on B & C, the retrieval times out, with this in
> the log:
>
> INFO 13:13:28,345 error writing to /10.33.3.20
>
> No requests for replicas or consistency checks are seen in the logs of
> B & C at this time. Using 'nodetool ring' from each of the three nodes
> shows all nodes as Up. Telnet from A to B on port 7000 connects. At
> first glance the tcpdump logs suggest that gossip communication,
> perhaps heartbeats, is proceeding normally, but I haven't really
> analyzed them.
>
> Fifteen minutes later, the cluster decided to behave normally again.
> Everyone talks to each other like buddies and delivers columns fast and
> regularly.
>
> This is really looking like a Cassandra bug. I'll report back with my
> TRACE log later and I expect I'll be opening a ticket. The confidence
> level of my employer in my Cassandra solution to their petabyte data
> storage project is... uh... well... it could be better.
>
> AJ
>
>
> On Fri, Jun 18, 2010 at 8:16 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> Set log level to TRACE and see if the OutboundTcpConnection is going
>> bad. That would explain the message never arriving.
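
A minimal sketch of the kind of conf/log4j.properties line that turns on
TRACE for just the connection class, assuming the stock Cassandra 0.6 log4j
setup (the logger name is simply the class from the stack trace above):

    # leave the root logger at INFO; only the internode connection class gets noisier
    log4j.logger.org.apache.cassandra.net.OutboundTcpConnection=TRACE

Setting the root logger to TRACE works too, but produces far more output.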
>>
>> On Fri, Jun 18, 2010 at 10:39 AM, AJ Slater <a...@zuno.com> wrote:
>>> To summarize:
>>>
>>> If a request for a column comes in *after a period of several hours
>>> with no requests*, then the node servicing the request hangs while
>>> looking for its peer rather than servicing the request like it should.
>>> It then throws either a TimedOutException or a (wrong)
>>> NotFoundException.
>>>
>>> And it doesn't appear to actually send the message it says it does to
>>> its peer. Or at least its peer doesn't report the request being
>>> received.
>>>
>>> And then the situation magically clears up after approximately 2 minutes.
>>>
>>> However, if the idle period never occurs, then the problem does not
>>> manifest. If I run a cron job with wget against my server every
>>> minute, I do not see the problem.
>>>
>>> I'll be looking at some tcpdump logs to see if I can suss out what's
>>> really happening, and perhaps file this as a bug. The several hours
>>> between reproducible events makes this whole thing aggravating for
>>> detection, debugging and, I'll assume, fixing, if it is indeed a
>>> Cassandra problem.
>>>
>>> It was suggested on IRC that it may be my network. But gossip is
>>> continually sending heartbeats, and nodetool and the logs show the
>>> nodes as up and available. If my network was flaking out I'd think it
>>> would be dropping heartbeats and I'd see that.
>>>
>>> AJ
>>>
>>> On Thu, Jun 17, 2010 at 2:26 PM, AJ Slater <a...@zuno.com> wrote:
>>>> These are physical machines.
>>>>
>>>> storage-conf.xml.fs03 is here:
>>>>
>>>> http://pastebin.com/weL41NB1
>>>>
>>>> Diffs from that for the other two storage-confs are inline here:
>>>>
>>>> a...@worm:../Z3/cassandra/conf/dev$ diff storage-conf.xml.lpc03 storage-conf.xml.fs01
>>>> 185c185
>>>> < <InitialToken>0</InitialToken>
>>>> ---
>>>>> <InitialToken>71603818521973537678586548668074777838</InitialToken>
>>>> 229c229
>>>> < <ListenAddress>10.33.2.70</ListenAddress>
>>>> ---
>>>>> <ListenAddress>10.33.3.10</ListenAddress>
>>>> 241c241
>>>> < <ThriftAddress>10.33.2.70</ThriftAddress>
>>>> ---
>>>>> <ThriftAddress>10.33.3.10</ThriftAddress>
>>>> 341c341
>>>> < <ConcurrentReads>16</ConcurrentReads>
>>>> ---
>>>>> <ConcurrentReads>4</ConcurrentReads>
>>>>
>>>>
>>>> a...@worm:../Z3/cassandra/conf/dev$ diff storage-conf.xml.lpc03 storage-conf.xml.fs02
>>>> 185c185
>>>> < <InitialToken>0</InitialToken>
>>>> ---
>>>>> <InitialToken>120215585224964746744782921158327379306</InitialToken>
>>>> 206d205
>>>> < <Seed>10.33.3.20</Seed>
>>>> 229c228
>>>> < <ListenAddress>10.33.2.70</ListenAddress>
>>>> ---
>>>>> <ListenAddress>10.33.3.20</ListenAddress>
>>>> 241c240
>>>> < <ThriftAddress>10.33.2.70</ThriftAddress>
>>>> ---
>>>>> <ThriftAddress>10.33.3.20</ThriftAddress>
>>>> 341c340
>>>> < <ConcurrentReads>16</ConcurrentReads>
>>>> ---
>>>>> <ConcurrentReads>4</ConcurrentReads>
>>>>
>>>>
>>>> Thank you for your attention,
>>>>
>>>> AJ
>>>>
>>>>
>>>> On Thu, Jun 17, 2010 at 2:09 PM, Benjamin Black <b...@b3k.us> wrote:
>>>>> Are these physical machines or virtuals? Did you post your
>>>>> cassandra.in.sh and storage-conf.xml someplace?
>>>>>
>>>>> On Thu, Jun 17, 2010 at 10:31 AM, AJ Slater <a...@zuno.com> wrote:
>>>>>> Total data size in the entire cluster is about twenty 12k images, with
>>>>>> no other load on the system. I just ask for one column and I get these
>>>>>> timeouts.
>>>>>> Performing multiple gets on the columns leads to multiple
>>>>>> timeouts for a period of a few seconds or minutes, and then the
>>>>>> situation magically resolves itself and response times are down to
>>>>>> single-digit milliseconds for a column get.
>>>>>>
>>>>>> On Thu, Jun 17, 2010 at 10:24 AM, AJ Slater <a...@zuno.com> wrote:
>>>>>>> Cassandra 0.6.2 from the Apache Debian source.
>>>>>>> Ubuntu Jaunty. Sun Java 6 JVM.
>>>>>>>
>>>>>>> All nodes in separate racks at 365 Main.
>>>>>>>
>>>>>>> On Thu, Jun 17, 2010 at 10:12 AM, AJ Slater <a...@zuno.com> wrote:
>>>>>>>> I'm seeing 10-second timeouts on reads a few times a day. It's hard
>>>>>>>> to reproduce consistently, but it seems to happen most often after
>>>>>>>> it's been a long time between reads. After presenting itself for a
>>>>>>>> couple of minutes the problem then goes away.
>>>>>>>>
>>>>>>>> I've got a three-node cluster with replication factor 2, reading at
>>>>>>>> consistency level ONE. The columns being read are around 12k each. The
>>>>>>>> nodes are 8GB multicore boxes with the JVM limits between 4GB and 6GB.
>>>>>>>>
>>>>>>>> Here's an application log from early this morning when a developer in
>>>>>>>> Belgrade accessed the system:
>>>>>>>>
>>>>>>>> Jun 17 03:54:17 lpc03 pinhole[5736]: MainThread:pinhole.py:61 | Requested image_id: 5827067133c3d670071c17d9144f0b49
>>>>>>>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:pinhole.py:76 | TimedOutException for Image 5827067133c3d670071c17d9144f0b49
>>>>>>>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image Get took 10005.388975 ms
>>>>>>>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:pinhole.py:61 | Requested image_id: af8caf3b76ce97d13812ddf795104a5c
>>>>>>>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image Get took 3.658056 ms
>>>>>>>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image Transform took 0.978947 ms
>>>>>>>>
>>>>>>>> That's a timeout and then a successful get of another column.
>>>>>>>>
>>>>>>>> Here's the Cassandra log for 10.33.2.70:
>>>>>>>>
>>>>>>>> DEBUG 03:54:17,070 get_slice
>>>>>>>> DEBUG 03:54:17,071 weakreadremote reading SliceFromReadCommand(table='jolitics.com', key='5827067133c3d670071c17d9144f0b49', column_parent='QueryPath(columnFamilyName='Images', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=100)
>>>>>>>> DEBUG 03:54:17,071 weakreadremote reading SliceFromReadCommand(table='jolitics.com', key='5827067133c3d670071c17d9144f0b49', column_parent='QueryPath(columnFamilyName='Images', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=100) from 45138@/10.33.3.10
>>>>>>>> DEBUG 03:54:27,077 get_slice
>>>>>>>> DEBUG 03:54:27,078 weakreadlocal reading SliceFromReadCommand(table='jolitics.com', key='af8caf3b76ce97d13812ddf795104a5c', column_parent='QueryPath(columnFamilyName='Images', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=100)
>>>>>>>> DEBUG 03:54:27,079 collecting body:false:1...@1275951327610885
>>>>>>>> DEBUG 03:54:27,080 collecting body:false:1...@1275951327610885
>>>>>>>> DEBUG 03:54:27,080 Reading consistency digest for af8caf3b76ce97d13812ddf795104a5c from 45...@[/10.33.2.70, /10.33.3.10]
>>>>>>>> DEBUG 03:54:50,779 Disseminating load info ...
>>>>>>>>
>>>>>>>> It looks like it asks for key='5827067133c3d670071c17d9144f0b49' from
>>>>>>>> the local host and also queries 10.33.3.10 for the first one, and then
>>>>>>>> for 'af8caf3b76ce97d13812ddf795104a5c' it only queries the local host
>>>>>>>> and then returns appropriately.
>>>>>>>>
>>>>>>>> Here's the log for 10.33.3.10 around that time:
>>>>>>>>
>>>>>>>> DEBUG 03:54:19,645 Disseminating load info ...
>>>>>>>> DEBUG 03:55:19,645 Disseminating load info ...
>>>>>>>> DEBUG 03:56:19,646 Disseminating load info ...
>>>>>>>> DEBUG 03:57:19,645 Disseminating load info ...
>>>>>>>> DEBUG 03:58:19,645 Disseminating load info ...
>>>>>>>> DEBUG 03:59:19,646 Disseminating load info ...
>>>>>>>> DEBUG 04:00:18,635 GC for ParNew: 4 ms, 21443128 reclaimed leaving 55875144 used; max is 6580535296
>>>>>>>>
>>>>>>>> No record of communication from 10.33.2.70.
>>>>>>>>
>>>>>>>> Does this ring any bells for anyone? I can of course attach
>>>>>>>> storage-confs for all nodes if that sounds useful, and I'll be on
>>>>>>>> #cassandra as ajslater.
>>>>>>>>
>>>>>>>> Much thanks for taking a look and any suggestions. We fear we'll have
>>>>>>>> to abandon Cassandra if this bug cannot be resolved.
>>>>>>>>
>>>>>>>> AJ
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>>
>
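
The "pinger" workaround mentioned above is nothing fancy: any client that
reads a remote column on a schedule (the wget cron job described earlier
does the same thing through the web app) keeps the internode connections
exercised. A minimal sketch, assuming the Python Thrift bindings generated
from Cassandra 0.6's cassandra.thrift are on the path; the keyspace and
column family match the logs above, while the host, key and interval are
placeholders to adapt:

#!/usr/bin/env python
# Keepalive pinger (sketch): periodically fetch one column so the cluster
# never sits idle. Host, key and interval below are placeholders.
import time

from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from cassandra import Cassandra          # Thrift bindings generated from Cassandra 0.6
from cassandra.ttypes import (ColumnParent, ConsistencyLevel,
                              SlicePredicate, SliceRange)

HOST = '10.33.2.70'       # the node the client normally talks to
KEYSPACE = 'jolitics.com'
CF = 'Images'
KEY = 'some-image-id'     # hypothetical: pick a key whose replicas live on the other nodes
INTERVAL = 60             # seconds, same cadence as the wget cron job

def ping_once():
    socket = TSocket.TSocket(HOST, 9160)
    transport = TTransport.TBufferedTransport(socket)
    client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()
    try:
        # Read one column at ConsistencyLevel.ONE, just like the real application.
        predicate = SlicePredicate(slice_range=SliceRange(start='', finish='',
                                                          reversed=False, count=1))
        client.get_slice(KEYSPACE, KEY, ColumnParent(column_family=CF),
                         predicate, ConsistencyLevel.ONE)
    finally:
        transport.close()

if __name__ == '__main__':
    while True:
        try:
            ping_once()
        except Exception as e:   # keepalive only: log the error and keep going
            print('pinger error: %s' % e)
        time.sleep(INTERVAL)

This is a stopgap to keep the cluster responsive while the underlying idle
connection problem is tracked down, not a fix for it.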