Hi,
After upgrading to vnodes, I created and enabled the shuffle operation as
suggested. After it had run for a couple of hours I had to disable it
because the nodes were not keeping up with compactions. I repeated this
process (enable/disable) three times.
I have 5 nodes, and each of them had ~35GB. After the shuffle operations
described above, some nodes are now reaching ~170GB. In the log files I
can see the same files transferred 2-4 times to the same host within the
same shuffle session. Worst of all, after all of this only 20 out of 1280
vnodes had been transferred. If it continues at the same speed, it will
take about a month or two to complete the shuffle.
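As a rough check of that estimate, here is a minimal linear extrapolation in Python. It assumes the transfer rate stays constant, that each of the 5 nodes holds 256 vnodes (num_tokens: 256, which gives the 1280 total), and about 6 hours of active shuffling in total (three runs of roughly two hours each). The hour figure is my approximation, not a measured value.

```python
def estimate_remaining_hours(moved: int, total: int, active_hours: float) -> float:
    """Linear extrapolation: assumes the vnode transfer rate stays constant."""
    if moved <= 0:
        raise ValueError("need at least one transferred vnode to extrapolate")
    # remaining / (moved / active_hours), rearranged to avoid float rounding
    return (total - moved) * active_hours / moved


# Assumption: 5 nodes x 256 vnodes each (num_tokens: 256) = 1280 vnodes,
# and roughly 6 hours of active shuffling (3 runs of ~2 hours each).
total_vnodes = 5 * 256
remaining = estimate_remaining_hours(moved=20, total=total_vnodes, active_hours=6.0)
print(f"{remaining:.0f} hours (~{remaining / 24:.1f} days) of continuous shuffling")
```

With these assumed figures it extrapolates to roughly 378 hours (about 16 days) of active shuffling. Wall-clock time would be considerably longer if shuffle has to be paused regularly for compactions to catch up, which is why the real-world figure could stretch to a month or more.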
I have a few questions to help me better understand shuffle:
1. Does disabling and re-enabling shuffle start the shuffle process from
scratch, or does it resume from the last point?
2. Will vnode reallocation speed up as the shuffle proceeds, or will it
remain the same?
3. Why do I see multiple transfers of the same file to the same host? For example:
INFO [Streaming to /10.0.1.8:6] 2013-04-07 14:27:10,038 StreamReplyVerbHandler.java (line 44) Successfully sent /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db to /10.0.1.8
INFO [Streaming to /10.0.1.8:7] 2013-04-07 16:27:07,427 StreamReplyVerbHandler.java (line 44) Successfully sent /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db to /10.0.1.8
4. When I enable or disable shuffle, I receive warning messages such as
the ones below. Do I need to worry about them?
cassandra-shuffle -h localhost disable
Failed to enable shuffling on 10.0.1.1!
Failed to enable shuffling on 10.0.1.3!
I couldn't find much documentation on shuffle; I have only read through
JIRA and the original proposal by Eric.
BR,
Rustam.