Hi,

After upgrading to vnodes, I created and enabled the shuffle operation as suggested. After it had run for a couple of hours, I had to disable it because the nodes were not keeping up with compactions. I repeated this enable/disable cycle three times.

I have 5 nodes, and each of them held ~35GB. After the shuffle operations described above, some nodes are now approaching ~170GB. In the log files I can see the same files transferred 2-4 times to the same host within the same shuffle session. Worst of all, after all of this only 20 of the 1280 vnodes have been transferred. If it continues at the same speed, the shuffle will take a month or two to complete.

I have a few questions to help me better understand shuffle:

1. Does disabling and re-enabling shuffle restart the shuffle process from
   scratch, or does it resume from the last point?

2. Will vnode reallocation speed up as the shuffle proceeds, or will it
   stay at the same rate?

3. Why do I see multiple transfers of the same file to the same host? E.g.:

   INFO [Streaming to /10.0.1.8:6] 2013-04-07 14:27:10,038
   StreamReplyVerbHandler.java (line 44) Successfully sent
   /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db
   to /10.0.1.8
   INFO [Streaming to /10.0.1.8:7] 2013-04-07 16:27:07,427
   StreamReplyVerbHandler.java (line 44) Successfully sent
   /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db
   to /10.0.1.8

4. When I enable or disable shuffle, I receive warning messages like the
   ones below. Do I need to worry about them?

   cassandra-shuffle -h localhost disable
   Failed to enable shuffling on 10.0.1.1!
   Failed to enable shuffling on 10.0.1.3!

I couldn't find much documentation on shuffle; I have only read through JIRA and the original proposal by Eric.

BR,
Rustam.
