I forgot to say that we use version 1.2.2 (we'll update soon, but I didn't see any change related to this in CHANGES.txt).
-- 
Cyril SCETBON

On 27 Jan 2014, at 12:01, Cyril Scetbon <cyril.scet...@free.fr> wrote:

> Hi,
>
> When a node has crashed for system reasons, it takes more than an hour to
> come back into the ring. During this time, no other node sees it:
>
> Datacenter: b1
> ==============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load       Tokens  Owns   Host ID                               Rack
> DN  XXXXXXXXXX  ?          256     3.8%   7b3d0ac4-bdf6-4e09-8a11-9794b1481c95  b05
> DN  XXXXXXXXXX  ?          256     3.1%   3a1172df-0260-4398-a008-05dc77e9f763  c03
> DN  XXXXXXXXXX  ?          256     3.7%   9e3cfd48-5697-4150-898e-b176d0eed4a0  b05
> DN  XXXXXXXXXX  ?          256     3.7%   347df11c-0d83-429c-a7a0-8d20c21a075a  c09
> DN  XXXXXXXXXX  ?          256     3.8%   d4083488-c614-4786-851b-e50a407d61a9  c03
> DN  XXXXXXXXXX  ?          256     3.7%   5a50d537-08fb-48cb-b8a0-829acb05b72e  b08
> DN  XXXXXXXXXX  ?          256     3.6%   a309c0da-aee8-4fed-aa9c-16ae103e42d3  c09
> DN  XXXXXXXXXX  ?          256     3.5%   41ff6e09-fb84-46f5-9efd-33f6ade49d7f  b08
> DN  XXXXXXXXXX  ?          256     3.2%   ad3ba9a2-5fe4-4208-b5ae-4f1a40942bb9  b08
> DN  XXXXXXXXXX  ?          256     3.4%   40140f99-e1b0-4fe0-93d2-cafdde05151f  c09
> DN  XXXXXXXXXX  ?          256     3.4%   f0c37b06-a335-49ab-819f-603945507ee9  b05
> DN  XXXXXXXXXX  ?          256     3.4%   ef1df7f6-5ae9-4ebf-bb14-e1373fc451ea  c03
> Datacenter: s1
> ==============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load       Tokens  Owns   Host ID                               Rack
> DN  XXXXXXXXXX  ?          256     3.5%   bbfbc2bb-dbee-4221-804d-9cc0760cc440  k09
> UN  XXXXXXXXXX  113.13 GB  256     3.5%   f6e41cf7-fffa-4a24-bc2d-0325051afd8f  h05
> DN  XXXXXXXXXX  ?          256     3.8%   3f66cbb2-427b-4bb0-8521-9789ce4358fa  h05
> DN  XXXXXXXXXX  ?          256     3.5%   c4763e28-48cf-4581-b576-6c0b06924ec6  h05
> DN  XXXXXXXXXX  ?          256     3.2%   8edb1155-990a-4946-a251-bb4cb4c59552  b05
> DN  XXXXXXXXXX  ?          256     4.2%   695adecd-4d49-412b-94db-cf695e3b5298  h05
> DN  XXXXXXXXXX  ?          256     3.8%   60b9b784-25ec-4a5a-ac76-d1c19bb2be72  c05
> DN  XXXXXXXXXX  ?          256     3.8%   3cf22978-c8d9-474e-8f6d-dbbcb1d7784e  b05
> DN  XXXXXXXXXX  ?          256     3.5%   4cfb5924-ea62-465b-8e39-b0b77a809422  k09
> DN  XXXXXXXXXX  ?          256     3.3%   08d99c7c-fb6b-4731-8408-27afb6aa79e5  k09
> DN  XXXXXXXXXX  ?          256     3.1%   1fb09426-3191-46f5-ab54-c2b6e980fcfe  k09
> DN  XXXXXXXXXX  ?          256     3.5%   79f64055-2681-43d2-a8e3-375ca9d6b771  c05
> DN  XXXXXXXXXX  ?          256     3.7%   88a8c59e-4dc9-47b2-b7d7-bb422199fa76  b05
> DN  XXXXXXXXXX  ?          256     3.7%   1d6ef3e5-76bc-4cac-9151-bbfd5b5e7e0e  c05
> DN  XXXXXXXXXX  ?          256     3.4%   79cf98d7-3bfe-4a94-97bd-95837dbe7623  c05
> DN  XXXXXXXXXX  ?          256     4.1%   541cd94b-1f94-47a4-83d3-66ed3ffe222d  b05
>
> There is nothing noticeable in the logs, even in debug mode:
>
> INFO [main] 2014-01-27 10:00:21,706 TServerCustomFactory.java (line 47) Using synchronous/threadpool thrift server on 0.0.0.0 : 9160
> INFO [Thread-8] 2014-01-27 10:00:21,707 ThriftServer.java (line 110) Listening for thrift clients...
> WARN [NonPeriodicTasks:1] 2014-01-27 10:00:31,765 Password4LevelAuthenticator.java (line 205) PasswordAuthenticator skipped default user setup: some nodes were not ready
> WARN [NonPeriodicTasks:1] 2014-01-27 10:00:31,794 Auth.java (line 207) Skipped default superuser setup: some nodes were not ready
>
> The top threads are RMI threads:
>
> <Screen Shot 2014-01-27 at 11.23.29.png>
>
> More than one hour later, we see:
>
> DEBUG [Thread-3964] 2014-01-27 11:24:18,856 IncomingTcpConnection.java (line 75) Connection version 6 from /XXXXXXXXXX
> DEBUG [Thread-3964] 2014-01-27 11:24:18,857 IncomingTcpConnection.java (line 112) Upgrading incoming connection to be compressed
> DEBUG [Thread-3964] 2014-01-27 11:24:18,857 IncomingTcpConnection.java (line 120) Max version for /XXXXXXXXXX is 6
> DEBUG [Thread-3964] 2014-01-27 11:24:18,857 MessagingService.java (line 805) Setting version 6 for /XXXXXXXXXX
> DEBUG [Thread-3964] 2014-01-27 11:24:18,858 IncomingTcpConnection.java (line 129) set version for /XXXXXXXXXX to 6
> DEBUG [Thread-3964] 2014-01-27 11:24:18,862 MessagingService.java (line 812) Reseting version for /XXXXXXXXXX
> DEBUG [Thread-3965] 2014-01-27 11:24:18,867 IncomingTcpConnection.java (line 75) Connection version 6 from /XXXXXXXXXX
> DEBUG [Thread-3965] 2014-01-27 11:24:18,867 IncomingTcpConnection.java (line 112) Upgrading incoming connection to be compressed
> DEBUG [Thread-3965] 2014-01-27 11:24:18,869 IncomingTcpConnection.java (line 120) Max version for /XXXXXXXXXX is 6
> DEBUG [Thread-3965] 2014-01-27 11:24:18,869 MessagingService.java (line 805) Setting version 6 for /XXXXXXXXXX
> DEBUG [Thread-3965] 2014-01-27 11:24:18,869 IncomingTcpConnection.java (line 129) set version for /XXXXXXXXXX to 6
> DEBUG [GossipStage:1] 2014-01-27 11:24:18,876 Gossiper.java (line 722) Clearing interval times for /XXXXXXXXXX due to generation change
> DEBUG [GossipStage:1] 2014-01-27 11:24:18,878 Gossiper.java (line 722) Clearing interval times for /XXXXXXXXXX due to generation change
> DEBUG [GossipStage:1] 2014-01-27 11:24:18,878 Gossiper.java (line 722) Clearing interval times for /XXXXXXXXXX due to generation change
>
> We hit this issue only when the system crashes.
>
> Any idea of a possible cause, or is this a known behaviour?
> -- 
> Cyril SCETBON