This is definitely not a Cassandra bug; something external is causing those connection failures.
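
The stack trace below is a plain "Connection refused" from the OS on the storage port, which means nothing was accepting connections on 7000 on the target at that moment, or something in between (a firewall, for example) was sending resets. One way to pin that down while the errors are happening is to probe the storage port from the affected node and see exactly how the attempt fails. A rough sketch follows; the hosts and the port are taken from the logs below, and this is only a connectivity check, not a Cassandra client:

#!/usr/bin/env python
"""Probe the Cassandra storage port on each peer and report how the
connection attempt fails. "refused" means nothing is listening on that
port (or a firewall is actively rejecting); a timeout points at packet
loss or filtering on the path. Hosts/port taken from this thread."""
import socket

PEERS = ["10.33.2.70", "10.33.3.10", "10.33.3.20"]
STORAGE_PORT = 7000

for host in PEERS:
    try:
        s = socket.create_connection((host, STORAGE_PORT), timeout=5)
        s.close()
        print("%s:%d connect OK" % (host, STORAGE_PORT))
    except socket.timeout:
        print("%s:%d TIMEOUT (filtered or lossy path?)" % (host, STORAGE_PORT))
    except socket.error as e:
        print("%s:%d FAILED: %s" % (host, STORAGE_PORT, e))

If lpc03 gets refused while a telnet to 7000 on the box itself succeeds, look at iptables/conntrack or whatever else sits between the racks. A sketch of the keepalive "pinger" read that AJ mentions below is at the end of this message.
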
On Sat, Jun 19, 2010 at 3:12 PM, AJ Slater <a...@zuno.com> wrote:
> Logging with TRACE reveals immediate problems even with no client requests coming in to the servers. The problem was immediate and persisted over the course of half an hour:
>
> 10.33.2.70 lpc03
> 10.33.3.10 fs01
> 10.33.3.20 fs02
>
> a...@lpc03:~$ grep unable /var/log/cassandra/output.log
> TRACE 14:07:52,104 unable to connect to /10.33.3.10
> ...
> TRACE 14:42:00,008 unable to connect to /10.33.3.20
> ...
> TRACE 14:42:06,751 unable to connect to /10.33.3.20
>
> Note that lpc03 has trouble talking to fs01 and fs02. But after seeing this I started logging TRACE on fs01 and fs02.
>
> During the six seconds before I restarted fs02:
>
> a...@fs01:~/logs$ grep unable /var/log/cassandra/output.log | grep unable
> Bad configuration; unable to start server
> TRACE 14:42:00,865 unable to connect to /10.33.3.20
> ...
> TRACE 14:42:06,730 unable to connect to /10.33.3.20
>
> Restarted fs02 and no issues in any of the logs.
>
> a...@fs02:~$ grep unable /var/log/cassandra/output.log
> a...@fs02:~$
>
> The unfiltered log messages all look more like:
>
> TRACE 14:42:06,248 unable to connect to /10.33.3.20
> java.net.ConnectException: Connection refused
>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>         at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>         at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>         at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>         at java.net.Socket.connect(Socket.java:529)
>         at java.net.Socket.connect(Socket.java:478)
>         at java.net.Socket.<init>(Socket.java:375)
>         at java.net.Socket.<init>(Socket.java:276)
>         at org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:149)
>         at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:85)
>
> On Sat, Jun 19, 2010 at 2:19 PM, AJ Slater <a...@zuno.com> wrote:
>> I shall do just that. I did a bunch of tests this morning and the situation appears to be this:
>>
>> I have three nodes A, B and C, with RF=2. I understand now why this issue wasn't apparent with RF=3.
>>
>> If there are regular intranode column requests going on (e.g. I set up a pinger to get remote columns), the cluster functions normally. However, if no intranode column requests happen for a few hours (3 hours is the minimum I've seen, but sometimes it takes longer), things go wrong. Using node A as the point of contact from the client, all columns that live on A are returned in a timely fashion, but for columns that only live on B & C, the retrieval times out, with this in the log:
>>
>> INFO 13:13:28,345 error writing to /10.33.3.20
>>
>> No request for replicas, or consistency checks, are seen in the logs of B & C at this time. Using 'nodetool ring' from each of the three nodes shows all nodes as Up. Telnet from A to B on port 7000 connects. Tcpdump logs look, at first glance, like gossip communication, perhaps heartbeats, is proceeding normally, but I haven't really analyzed them.
>>
>> Fifteen minutes later, the cluster decided to behave normally again. Everyone talks to each other like buddies and delivers columns fast and regularly.
>>
>> This is really looking like a Cassandra bug. I'll report back with my TRACE log later and I expect I'll be opening a ticket.
>> The confidence level of my employer in my Cassandra solution to their petabyte data storage project is... uh... well... it could be better.
>>
>> AJ
>>
>> On Fri, Jun 18, 2010 at 8:16 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>> Set log level to TRACE and see if the OutboundTcpConnection is going bad. That would explain the message never arriving.
>>>
>>> On Fri, Jun 18, 2010 at 10:39 AM, AJ Slater <a...@zuno.com> wrote:
>>>> To summarize:
>>>>
>>>> If a request for a column comes in *after a period of several hours with no requests*, then the node servicing the request hangs while looking for its peer rather than servicing the request like it should. It then throws either a TimedOutException or a (wrong) NotFoundException.
>>>>
>>>> And it doesn't appear to actually send the message it says it does to its peer. Or at least its peer doesn't report the request being received.
>>>>
>>>> And then the situation magically clears up after approximately 2 minutes.
>>>>
>>>> However, if the idle period never occurs, then the problem does not manifest. If I run a cron job with wget against my server every minute, I do not see the problem.
>>>>
>>>> I'll be looking at some tcpdump logs to see if I can suss out what's really happening, and perhaps file this as a bug. The several hours between reproducible events makes this whole thing aggravating for detection, debugging and, I'll assume, fixing, if it is indeed a Cassandra problem.
>>>>
>>>> It was suggested on IRC that it may be my network. But gossip is continually sending heartbeats, and nodetool and the logs show the nodes as up and available. If my network were flaking out I'd think it would be dropping heartbeats and I'd see that.
>>>>
>>>> AJ
>>>>
>>>> On Thu, Jun 17, 2010 at 2:26 PM, AJ Slater <a...@zuno.com> wrote:
>>>>> These are physical machines.
>>>>>
>>>>> storage-conf.xml.fs03 is here:
>>>>>
>>>>> http://pastebin.com/weL41NB1
>>>>>
>>>>> Diffs from that for the other two storage-confs are inline here:
>>>>>
>>>>> a...@worm:../Z3/cassandra/conf/dev$ diff storage-conf.xml.lpc03 storage-conf.xml.fs01
>>>>> 185c185
>>>>>
>>>>>> <InitialToken>71603818521973537678586548668074777838</InitialToken>
>>>>> 229c229
>>>>> < <ListenAddress>10.33.2.70</ListenAddress>
>>>>> ---
>>>>>> <ListenAddress>10.33.3.10</ListenAddress>
>>>>> 241c241
>>>>> < <ThriftAddress>10.33.2.70</ThriftAddress>
>>>>> ---
>>>>>> <ThriftAddress>10.33.3.10</ThriftAddress>
>>>>> 341c341
>>>>> < <ConcurrentReads>16</ConcurrentReads>
>>>>> ---
>>>>>> <ConcurrentReads>4</ConcurrentReads>
>>>>>
>>>>> a...@worm:../Z3/cassandra/conf/dev$ diff storage-conf.xml.lpc03 storage-conf.xml.fs02
>>>>> 185c185
>>>>> < <InitialToken>0</InitialToken>
>>>>> ---
>>>>>> <InitialToken>120215585224964746744782921158327379306</InitialToken>
>>>>> 206d205
>>>>> < <Seed>10.33.3.20</Seed>
>>>>> 229c228
>>>>> < <ListenAddress>10.33.2.70</ListenAddress>
>>>>> ---
>>>>>> <ListenAddress>10.33.3.20</ListenAddress>
>>>>> 241c240
>>>>> < <ThriftAddress>10.33.2.70</ThriftAddress>
>>>>> ---
>>>>>> <ThriftAddress>10.33.3.20</ThriftAddress>
>>>>> 341c340
>>>>> < <ConcurrentReads>16</ConcurrentReads>
>>>>> ---
>>>>>> <ConcurrentReads>4</ConcurrentReads>
>>>>>
>>>>> Thank you for your attention,
>>>>>
>>>>> AJ
>>>>>
>>>>> On Thu, Jun 17, 2010 at 2:09 PM, Benjamin Black <b...@b3k.us> wrote:
>>>>>> Are these physical machines or virtuals?
>>>>>> Did you post your cassandra.in.sh and storage-conf.xml someplace?
>>>>>>
>>>>>> On Thu, Jun 17, 2010 at 10:31 AM, AJ Slater <a...@zuno.com> wrote:
>>>>>>> Total data size in the entire cluster is about twenty 12k images, with no other load on the system. I just ask for one column and I get these timeouts. Performing multiple gets on the columns leads to multiple timeouts for a period of a few seconds or minutes, and then the situation magically resolves itself and response times are down to single-digit milliseconds for a column get.
>>>>>>>
>>>>>>> On Thu, Jun 17, 2010 at 10:24 AM, AJ Slater <a...@zuno.com> wrote:
>>>>>>>> Cassandra 0.6.2 from the apache debian source. Ubuntu Jaunty. Sun Java6 JVM.
>>>>>>>>
>>>>>>>> All nodes in separate racks at 365 Main.
>>>>>>>>
>>>>>>>> On Thu, Jun 17, 2010 at 10:12 AM, AJ Slater <a...@zuno.com> wrote:
>>>>>>>>> I'm seeing 10s timeouts on reads a few times a day. It's hard to reproduce consistently, but it seems to happen most often after it's been a long time between reads. After presenting itself for a couple of minutes, the problem then goes away.
>>>>>>>>>
>>>>>>>>> I've got a three node cluster with replication factor 2, reading at consistency level ONE. The columns being read are around 12k each. The nodes are 8GB multicore boxes with the JVM limits between 4GB and 6GB.
>>>>>>>>>
>>>>>>>>> Here's an application log from early this morning when a developer in Belgrade accessed the system:
>>>>>>>>>
>>>>>>>>> Jun 17 03:54:17 lpc03 pinhole[5736]: MainThread:pinhole.py:61 | Requested image_id: 5827067133c3d670071c17d9144f0b49
>>>>>>>>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:pinhole.py:76 | TimedOutException for Image 5827067133c3d670071c17d9144f0b49
>>>>>>>>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image Get took 10005.388975 ms
>>>>>>>>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:pinhole.py:61 | Requested image_id: af8caf3b76ce97d13812ddf795104a5c
>>>>>>>>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image Get took 3.658056 ms
>>>>>>>>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image Transform took 0.978947 ms
>>>>>>>>>
>>>>>>>>> That's a Timeout and then a successful get of another column.
>>>>>>>>>
>>>>>>>>> Here's the cassandra log for 10.33.2.70:
>>>>>>>>>
>>>>>>>>> DEBUG 03:54:17,070 get_slice
>>>>>>>>> DEBUG 03:54:17,071 weakreadremote reading SliceFromReadCommand(table='jolitics.com', key='5827067133c3d670071c17d9144f0b49', column_parent='QueryPath(columnFamilyName='Images', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=100)
>>>>>>>>> DEBUG 03:54:17,071 weakreadremote reading SliceFromReadCommand(table='jolitics.com', key='5827067133c3d670071c17d9144f0b49', column_parent='QueryPath(columnFamilyName='Images', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=100) from 45138@/10.33.3.10
>>>>>>>>> DEBUG 03:54:27,077 get_slice
>>>>>>>>> DEBUG 03:54:27,078 weakreadlocal reading SliceFromReadCommand(table='jolitics.com', key='af8caf3b76ce97d13812ddf795104a5c', column_parent='QueryPath(columnFamilyName='Images', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=100)
>>>>>>>>> DEBUG 03:54:27,079 collecting body:false:1...@1275951327610885
>>>>>>>>> DEBUG 03:54:27,080 collecting body:false:1...@1275951327610885
>>>>>>>>> DEBUG 03:54:27,080 Reading consistency digest for af8caf3b76ce97d13812ddf795104a5c from 45...@[/10.33.2.70, /10.33.3.10]
>>>>>>>>> DEBUG 03:54:50,779 Disseminating load info ...
>>>>>>>>>
>>>>>>>>> It looks like it asks for key='5827067133c3d670071c17d9144f0b49' from the local host and also queries 10.33.3.10 for the first one, and then for 'af8caf3b76ce97d13812ddf795104a5c' it only queries the local host and then returns appropriately.
>>>>>>>>>
>>>>>>>>> Here's the log for 10.33.3.10 around that time:
>>>>>>>>>
>>>>>>>>> DEBUG 03:54:19,645 Disseminating load info ...
>>>>>>>>> DEBUG 03:55:19,645 Disseminating load info ...
>>>>>>>>> DEBUG 03:56:19,646 Disseminating load info ...
>>>>>>>>> DEBUG 03:57:19,645 Disseminating load info ...
>>>>>>>>> DEBUG 03:58:19,645 Disseminating load info ...
>>>>>>>>> DEBUG 03:59:19,646 Disseminating load info ...
>>>>>>>>> DEBUG 04:00:18,635 GC for ParNew: 4 ms, 21443128 reclaimed leaving 55875144 used; max is 6580535296
>>>>>>>>>
>>>>>>>>> No record of communication from 10.33.2.70.
>>>>>>>>>
>>>>>>>>> Does this ring any bells for anyone? I can of course attach storage-confs for all nodes if that sounds useful, and I'll be on #cassandra as ajslater.
>>>>>>>>>
>>>>>>>>> Much thanks for taking a look and any suggestions. We fear we'll have to abandon Cassandra if this bug cannot be resolved.
>>>>>>>>>
>>>>>>>>> AJ
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of Riptano, the source for professional Cassandra support
>>> http://riptano.com

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
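
For reference, here is a minimal sketch of the keepalive "pinger" AJ describes above: a periodic read at consistency level ONE that keeps the inter-node connections from sitting idle for hours. This assumes a pycassa-style client (the ConnectionPool/ColumnFamily API of newer pycassa; the 0.6-era pycassa API differs), the default Thrift port 9160, and the keyspace ('jolitics.com'), column family ('Images'), and key taken from the logs earlier in the thread. It is a workaround sketch, not a fix for whatever is dropping the connections.

#!/usr/bin/env python
"""Keepalive 'pinger': periodically read one column at CL.ONE so the
inter-node connections never sit idle for hours. Workaround sketch only.
Assumes a newer pycassa client; keyspace, column family, and key are the
ones that appear in this thread's logs."""
import time
import pycassa

# Connect through one node; 9160 is the default Thrift port (assumption).
pool = pycassa.ConnectionPool('jolitics.com', ['10.33.2.70:9160'])
images = pycassa.ColumnFamily(pool, 'Images')
KEY = '5827067133c3d670071c17d9144f0b49'  # pick a key that lives on a remote replica

while True:
    try:
        # A CL.ONE read that has to touch a remote replica keeps the
        # OutboundTcpConnection between nodes exercised.
        images.get(KEY, column_count=1,
                   read_consistency_level=pycassa.ConsistencyLevel.ONE)
    except Exception as exc:
        print('keepalive read failed: %s' % exc)
    time.sleep(60)

A cron job running wget against the application every minute, as AJ describes, accomplishes the same thing, as long as the request it triggers reads a column that lives on a remote replica.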