I am having the same issue in 1.0.7 with leveled compaction. Repair seems to be flaky: it either completes relatively fast (7 minutes in a TEST environment) or gets stuck waiting to receive a merkle tree from a peer that is already sending it the merkle tree.
The only solution is to restart Cassandra, but that's not good.

On Thu, Apr 26, 2012 at 2:12 PM, <user-h...@cassandra.apache.org> wrote:

> user Digest of: get.23021
>
> Topics (messages 23021 through 23021)
>
> repair waiting for something
>     23021 by: Igor
>
> ----------------------------------------------------------------------
>
> Hi,
>
> 10 nodes cassandra 1.0.3, several DC. weekly nodetool repair stuck for
> unusual long time for node 10.254.237.2.
>
> output log on this node:
> INFO 11:19:42,045 Starting repair command #1, repairing 5 ranges.
> INFO 11:19:42,053 [repair #040aae00-28a1-11e1-0000-e378018944ff] new session: will sync localhost/10.254.237.2, /10.254.221.2, /10.253.2.2, /10.254.217.2, /10.254.94.2 on range (85070591730234615865843651857942052864,85070591730234615865843651857942052865] for meter.[eventschema, schema, ids, transaction]
> INFO 11:19:42,055 [repair #040aae00-28a1-11e1-0000-e378018944ff] requests for merkle tree sent for eventschema (to [/10.253.2.2, /10.254.221.2, localhost/10.254.237.2, /10.254.217.2, /10.254.94.2])
> INFO 11:19:42,063 Enqueuing flush of Memtable-eventschema@1509399856(18748/23435 serialized/live bytes, 4 ops)
> INFO 11:19:42,063 Writing Memtable-eventschema@1509399856(18748/23435 serialized/live bytes, 4 ops)
> INFO 11:19:42,072 Completed flushing /spool1/cassandra/data/meter/eventschema-hb-40-Data.db (4745 bytes)
> INFO 11:19:42,073 Discarding obsolete commit log:CommitLogSegment(/var/lib/cassandra/commitlog/CommitLog-1324019623060.log)
> INFO 11:19:42,076 [repair #040aae00-28a1-11e1-0000-e378018944ff] Received merkle tree for eventschema from localhost/10.254.237.2
> INFO 11:19:42,102 [repair #040aae00-28a1-11e1-0000-e378018944ff] Received merkle tree for eventschema from /10.254.221.2
> INFO 11:19:42,128 [repair #040aae00-28a1-11e1-0000-e378018944ff] Received merkle tree for eventschema from /10.254.217.2
> INFO 11:19:42,228 [repair #040aae00-28a1-11e1-0000-e378018944ff] Received merkle tree for eventschema from /10.253.2.2
>
> And nothing after that for long time. So node sent request for trees to
> other nodes and received all but from the 10.254.94.2
>
> On that 10.254.94.2 node:
> INFO 11:19:42,083 [repair #040aae00-28a1-11e1-0000-e378018944ff] Sending completed merkle tree to /10.254.237.2 for (meter,eventschema)
>
> So merkle tree were lost somewhere. Will this waiting break somehow or I
> need to restart node?
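
In case it helps anyone hitting the same hang, here is a rough sketch of a script that reads the coordinator's log and lists which peers a repair session is still waiting on for a merkle tree. The two regular expressions and the function names are my own and are based only on the log lines quoted above (1.0.x); they are assumptions, not a documented log format, and may need adjusting for other versions.

```python
import re

# Both patterns are assumptions derived from the log excerpt quoted in this thread.
SENT = re.compile(
    r"\[repair #(?P<sid>[0-9a-f-]+)\] requests for merkle tree sent for "
    r"(?P<cf>\w+) \(to \[(?P<peers>[^\]]*)\]\)"
)
RECV = re.compile(
    r"\[repair #(?P<sid>[0-9a-f-]+)\] Received merkle tree for "
    r"(?P<cf>\w+) from (?P<peer>\S+)"
)


def _ip(endpoint):
    # "localhost/10.254.237.2" and "/10.254.221.2" both reduce to the address part.
    return endpoint.strip().split("/")[-1]


def pending_merkle_trees(lines):
    """Return {(session_id, column_family): set of peer addresses still awaited}."""
    pending = {}
    for line in lines:
        m = SENT.search(line)
        if m:
            # A request was sent: every listed peer is awaited until it answers.
            key = (m["sid"], m["cf"])
            pending[key] = {_ip(p) for p in m["peers"].split(",") if p.strip()}
            continue
        m = RECV.search(line)
        if m:
            # A tree arrived: stop waiting on that peer for this session and CF.
            pending.get((m["sid"], m["cf"]), set()).discard(_ip(m["peer"]))
    # Keep only sessions that still have at least one peer outstanding.
    return {key: peers for key, peers in pending.items() if peers}
```

Fed the coordinator log quoted above (one log entry per line), this should report 10.254.94.2 as the only peer still awaited for eventschema, which matches what Igor observed. It only tells you where the session is stuck; it does not unstick it.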