Monitor the repair using nodetool compactionstats to see the Merkle trees
being created, and nodetool netstats to see the data streaming. Also look
in the logs for messages from AntiEntropyService.java; they will tell you
how long the node waited for each replica to get back to it.
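If it helps, here is a rough way to watch both while the repair runs. This
is only a sketch - the loop runs on the node itself, and the log path
assumes a default packaged install, so adjust both to your environment:

    while true; do
        date
        nodetool compactionstats   # validation compactions = merkle tree builds
        nodetool netstats          # streams moving data between replicas
        sleep 30
    done

    # afterwards, pull the per-range repair messages out of the log:
    grep AntiEntropyService /var/log/cassandra/system.log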
Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 4/04/2013, at 5:42 PM, Ondřej Černoš <cern...@gmail.com> wrote:

> Hi,
>
> Most of it has been resolved - the "failed to uncompress" error was
> really a bug in Cassandra (see
> https://issues.apache.org/jira/browse/CASSANDRA-5391), and the
> difference in load reporting is a change between 1.2.1 (which reports
> 100% for the 3 replicas / 3 nodes / 2 DCs setup I have) and 1.2.3,
> which reports the fraction. Is this correct?
>
> Anyway, nodetool repair still takes ages to finish, considering only
> megabytes of unchanging data are involved in my test:
>
> [root@host:/etc/puppet] nodetool repair ks
> [2013-04-04 13:26:46,618] Starting repair command #1, repairing 1536 ranges for keyspace ks
> [2013-04-04 13:47:17,007] Repair session 88ebc700-9d1a-11e2-a0a1-05b94e1385c7 for range (-2270395505556181001,-2268004533044804266] finished
> ...
> [2013-04-04 13:47:17,063] Repair session 65d31180-9d1d-11e2-a0a1-05b94e1385c7 for range (1069254279177813908,1070290707448386360] finished
> [2013-04-04 13:47:17,063] Repair command #1 finished
>
> This is the status before the repair (by the way, after the datacenter
> was bootstrapped from the remote one):
>
> [root@host:/etc/puppet] nodetool status
> Datacenter: us-east
> ===================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> -- Address          Load     Tokens  Owns   Host ID                               Rack
> UN xxx.xxx.xxx.xxx  5.74 MB  256     17.1%  06ff8328-32a3-4196-a31f-1e0f608d0638  1d
> UN xxx.xxx.xxx.xxx  5.73 MB  256     15.3%  7a96bf16-e268-433a-9912-a0cf1668184e  1d
> UN xxx.xxx.xxx.xxx  5.72 MB  256     17.5%  67a68a2a-12a8-459d-9d18-221426646e84  1d
> Datacenter: na-dev
> ==================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> -- Address          Load     Tokens  Owns   Host ID                               Rack
> UN xxx.xxx.xxx.xxx  5.74 MB  256     16.4%  eb86aaae-ef0d-40aa-9b74-2b9704c77c0a  cmp02
> UN xxx.xxx.xxx.xxx  5.74 MB  256     17.0%  cd24af74-7f6a-4eaa-814f-62474b4e4df1  cmp01
> UN xxx.xxx.xxx.xxx  5.74 MB  256     16.7%  1a55cfd4-bb30-4250-b868-a9ae13d81ae1  cmp05
>
> Why does it take 20 minutes to finish? Fortunately the large number of
> compactions I reported in the previous email was not triggered this
> time.
>
> And is there documentation where I could find the exact semantics of
> repair when vnodes are used (and what -pr means in such a setup), and
> when it is run in a multiple datacenter setup? I still don't quite
> get it.
>
> regards,
> Ondřej Černoš
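Regarding the -pr question: with vnodes each node has num_tokens (here
256) primary ranges, and repair -pr repairs only those, so it has to be
run on every node in every DC to cover the whole ring. A plain nodetool
repair repairs every range the node is a replica for, and with RF 3 in
each 3-node DC every node replicates everything - which, if I read your
setup right, is why the log shows all 6 x 256 = 1536 ranges. A minimal
sketch of the usual rotation, host names being placeholders:

    for h in dc1-n1 dc1-n2 dc1-n3 dc2-n1 dc2-n2 dc2-n3; do
        ssh "$h" nodetool repair -pr ks
    done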
> On Thu, Mar 28, 2013 at 3:30 AM, aaron morton <aa...@thelastpickle.com> wrote:
>> During one of my tests - see this thread in this mailing list:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/java-io-IOException-FAILED-TO-UNCOMPRESS-5-exception-when-running-nodetool-rebuild-td7586494.html
>>
>> That thread has been updated, check the bug Ondřej created.
>>
>> How will this perform in production with much bigger data if repair
>> takes 25 minutes on 7MB and 11k compactions were triggered by the
>> repair run?
>>
>> Seems a little odd.
>> See what happens the next time you run repair.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 27/03/2013, at 2:36 AM, Ondřej Černoš <cern...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I have 2 DCs, 3 nodes each, RF:3, and I use LOCAL_QUORUM for both
>> reads and writes.
>>
>> I am currently testing various operational qualities of the setup.
>>
>> During one of my tests - see this thread in this mailing list:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/java-io-IOException-FAILED-TO-UNCOMPRESS-5-exception-when-running-nodetool-rebuild-td7586494.html
>> - I ran into this situation:
>>
>> - all nodes have all the data and agree on it:
>>
>> [user@host1-dc1:~] nodetool status
>>
>> Datacenter: na-prod
>> ===================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> -- Address          Load     Tokens  Owns (effective)  Host ID                               Rack
>> UN XXX.XXX.XXX.XXX  7.74 MB  256     100.0%            0b1f1d79-52af-4d1b-a86d-bf4b65a05c49  cmp17
>> UN XXX.XXX.XXX.XXX  7.74 MB  256     100.0%            039f206e-da22-44b5-83bd-2513f96ddeac  cmp10
>> UN XXX.XXX.XXX.XXX  7.72 MB  256     100.0%            007097e9-17e6-43f7-8dfc-37b082a784c4  cmp11
>> Datacenter: us-east
>> ===================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> -- Address          Load     Tokens  Owns (effective)  Host ID                               Rack
>> UN XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            a336efae-8d9c-4562-8e2a-b766b479ecb4  1d
>> UN XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            ab1bbf0a-8ddc-4a12-a925-b119bd2de98e  1d
>> UN XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            f53fd294-16cc-497e-9613-347f07ac3850  1d
>>
>> - only one node disagrees:
>>
>> [user@host1-dc2:~] nodetool status
>> Datacenter: us-east
>> ===================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> -- Address          Load     Tokens  Owns   Host ID                               Rack
>> UN XXX.XXX.XXX.XXX  7.73 MB  256     17.6%  a336efae-8d9c-4562-8e2a-b766b479ecb4  1d
>> UN XXX.XXX.XXX.XXX  7.75 MB  256     16.4%  ab1bbf0a-8ddc-4a12-a925-b119bd2de98e  1d
>> UN XXX.XXX.XXX.XXX  7.73 MB  256     15.7%  f53fd294-16cc-497e-9613-347f07ac3850  1d
>> Datacenter: na-prod
>> ===================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> -- Address          Load     Tokens  Owns   Host ID                               Rack
>> UN XXX.XXX.XXX.XXX  7.74 MB  256     16.9%  0b1f1d79-52af-4d1b-a86d-bf4b65a05c49  cmp17
>> UN XXX.XXX.XXX.XXX  7.72 MB  256     17.1%  007097e9-17e6-43f7-8dfc-37b082a784c4  cmp11
>> UN XXX.XXX.XXX.XXX  7.73 MB  256     16.3%  039f206e-da22-44b5-83bd-2513f96ddeac  cmp10
>>
>> I tried rebuilding the node from scratch and repairing it, with no
>> result - it still shows the same owns stats.
>>
>> The cluster is built from Cassandra 1.2.3 and uses vnodes.
>>
>> On a related note: the data size, as you can see, is really small.
>> The cluster was created by setting up the us-east datacenter,
>> populating it with the dataset, then building the na-prod datacenter
>> and running nodetool rebuild us-east. When I ran nodetool repair, it
>> took 25 minutes to finish on this small dataset. Is this ok?
>>
>> One other thing I noticed is the number of compactions on the system
>> keyspace:
>>
>> /.../system/schema_columnfamilies/system-schema_columnfamilies-ib-11694-TOC.txt
>> /.../system/schema_columnfamilies/system-schema_columnfamilies-ib-11693-Statistics.db
>>
>> This is just after running the repair. Is this ok, considering the
>> dataset is 7MB and during the repair no operations were running
>> against the database - no reads, no writes, nothing?
>> How will this perform in production with much bigger data if repair
>> takes 25 minutes on 7MB and 11k compactions were triggered by the
>> repair run?
>>
>> regards,
>>
>> Ondrej Cernos
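For what it's worth, one quick way to gauge the compaction churn between
repair runs is to count the live SSTables in the system keyspace before
and after. A sketch only - /var/lib/cassandra/data is the default data
directory, adjust it to your install:

    ls /var/lib/cassandra/data/system/schema_columnfamilies/ | grep -c 'Data.db'

The generation number in the file names above (ib-11694) tells a similar
story: roughly that many SSTables have been written for that column
family over the life of the node.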