I am not familiar with shuffle, but if you attempt a shuffle and it fails, it would be a good idea to let compaction die down, or even trigger a major compaction on the nodes where the size grew. The reason is that once the data files are on disk, Cassandra does not know they are duplicates. So if you do a move or shuffle again, Cassandra will try to move all that duplicated data again. In other words, if a failed operation grows the size of your data, deal with that first before retrying the same operation.
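As a rough sketch of that cleanup step, the Python below runs `nodetool -h <host> compact` on one node at a time. The host list is a placeholder built from the addresses in your mail, and the `dry_run` flag and function names are my own, so treat it as an illustration rather than a tested procedure.

```python
import subprocess

# Placeholder host list (assumption): replace with the nodes whose data grew.
NODES = ["10.0.1.1", "10.0.1.3", "10.0.1.8"]


def compact_command(host):
    """Build the nodetool invocation for a major compaction on one node."""
    return ["nodetool", "-h", host, "compact"]


def compact_all(hosts, dry_run=True):
    """Go through the nodes one at a time.

    With dry_run=True the commands are only printed, which is a safe way
    to check the host list before touching the cluster.
    """
    commands = [compact_command(host) for host in hosts]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Waits for nodetool to exit; raises if it returns non-zero.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    compact_all(NODES, dry_run=True)
```

Running the nodes sequentially rather than all at once is a deliberate choice here, since a major compaction is heavy on disk I/O and needs free disk space while it runs.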
For now your best bet is to run a major compaction on each node and get the data sizes small again.

On Sun, Apr 7, 2013 at 8:43 AM, Rustam Aliyev <rustam.li...@code.az> wrote:

> Hi,
>
> After upgrading to the vnodes I created and enabled shuffle operation as
> suggested. After running for a couple of hours I had to disable it because
> nodes were not catching up with compactions. I repeated this process 3
> times (enable/disable).
>
> I have 5 nodes and each of them had ~35GB. After shuffle operations
> described above some nodes are now reaching ~170GB. In the log files I can
> see same files transferred 2-4 times to the same host within the same
> shuffle session. Worst of all, after all of these I had only 20 vnodes
> transferred out of 1280. So if it will continue at the same speed it will
> take about a month or two to complete shuffle.
>
> I had few question to better understand shuffle:
>
> 1. Does disabling and re-enabling shuffle starts shuffle process from
>    scratch or it resumes from the last point?
>
> 2. Will vnode reallocations speedup as shuffle proceeds or it will
>    remain the same?
>
> 3. Why I see multiple transfers of the same file to the same host?
>    e.g.:
>
>    INFO [Streaming to /10.0.1.8:6] 2013-04-07 14:27:10,038
>    StreamReplyVerbHandler.java (line 44) Successfully sent
>    /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db
>    to /10.0.1.8
>    INFO [Streaming to /10.0.1.8:7] 2013-04-07 16:27:07,427
>    StreamReplyVerbHandler.java (line 44) Successfully sent
>    /u01/cassandra/data/
>    Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db to /10.0.1.8
>
> 4. When I enable/disable shuffle I receive warning message such as
>    below. Do I need to worry about it?
>
>    cassandra-shuffle -h localhost disable
>    Failed to enable shuffling on 10.0.1.1!
>    Failed to enable shuffling on 10.0.1.3!
>
> I couldn't find many docs on shuffle, only read through JIRA and original
> proposal by Eric.
>
> BR,
> Rustam.