For what it is worth, I finally wrote a blog post about this --> http://thelastpickle.com/blog/2016/02/25/removing-a-disk-mapping-from-cassandra.html
If you are not done yet, every step is detailed in there.

C*heers,
-----------------------
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-02-19 10:04 GMT+01:00 Alain RODRIGUEZ <arodr...@gmail.com>:

>> Alain, thanks for sharing! I'm confused why you do so many repetitive
>> rsyncs. Just being cautious or is there another reason? Also, why do you
>> have --delete-before when you're copying data to a temp (assumed empty)
>> directory?
>
>> Since they are immutable I do a first sync while everything is up and
>> running to the new location, which runs really long. Meanwhile new ones
>> are created and I sync them again online, with far fewer files to copy
>> now. After that I shut down the node and my last rsync has to copy only a
>> few files, which is quite fast, so the downtime for that node is within
>> minutes.
>
> Jan's guess is right, except for the "immutable" thing. Compaction can
> make big files go away, replaced by bigger ones you'll have to stream
> again.
>
> Here is a detailed explanation of why I did it this way.
>
> More precisely, let's say we have 10 files of 100 GB on the disk to
> remove (let's call it 'old-dir').
>
> I run a first rsync to an empty folder indeed (let's call it 'tmp-dir'),
> on the disk that will remain after the operation. Let's say this takes
> about 10 hours. This can be run on all nodes in parallel, though.
>
> So I now have 10 files of 100 GB in tmp-dir. But meanwhile one compaction
> was triggered and I now have 6 files of 100 GB and 1 of 350 GB in
> old-dir.
>
> At this point I disable compaction and stop the running ones.
>
> My second rsync has to remove from tmp-dir the 4 files that were
> compacted away, and that's why I use '--delete-before'. As tmp-dir needs
> to mirror old-dir, this is fine. This new operation takes 3.5 hours, also
> runnable in parallel. (Keep in mind C* won't compact anything for those
> 3.5 hours; that's why I did not stop compaction before the first rsync.
> In my case the dataset was 2 TB.)
>
> At this point I have 950 GB in tmp-dir, but meanwhile clients continued
> to write to the disk, let's say 50 GB more.
>
> The 3rd rsync will take 0.5 hour: no compaction ran, so I just have to
> add the diff to tmp-dir. Still runnable in parallel.
>
> Then the script stops the node, so it should be run sequentially, and it
> performs 2 more rsyncs. The first one takes the diff between the end of
> the 3rd rsync and the moment you stop the node; it should be a few
> seconds, minutes maybe, depending on how fast you run the script after
> the 3rd rsync ended. The second rsync in the script is a 'useless' one. I
> just like to control things: I run it and expect it to report no diff. It
> is just a way to stop the script if for some reason data is still being
> appended to old-dir.
>
> Then I just move all the files from tmp-dir to new-dir (the proper data
> dir remaining after the operation). This is an instant operation, as the
> files are not really moved: they are already on that disk, so it is only
> a rename at the filesystem level.
>
> I finally unmount and rm -rf old-dir.
>
> So the full op takes 10 h + 3.5 h + 0.5 h + (number of nodes * 0.1 h),
> and each node is down for about 5-10 min.
>
> VS
>
> The straightforward way (stop node, move, start node): 10 h * number of
> nodes, as this needs to be sequential. Plus each node is down for 10
> hours, so you have to repair them, as that is longer than the hinted
> handoff window...
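
For illustration, here is a minimal bash sketch of the per-node sequence
described above. It is not the script from the repository linked later in
the thread; the paths, mount points, and service command are assumptions,
and the timings in the comments just echo the example numbers:

    #!/bin/bash
    # Illustrative only. Adjust paths to your own layout.
    OLD_DIR=/mnt/disk-to-remove/cassandra/data   # 'old-dir': data dir on the disk being removed
    TMP_DIR=/mnt/disk-to-keep/cassandra/tmp      # 'tmp-dir': empty dir on the disk that stays
    NEW_DIR=/mnt/disk-to-keep/cassandra/data     # 'new-dir': data dir that remains afterwards

    mkdir -p "$TMP_DIR"

    # 1st rsync, node up, runnable on all nodes in parallel (~10 h in the example).
    rsync -a "$OLD_DIR/" "$TMP_DIR/"

    # Prevent new automatic compactions and stop the ones currently running.
    nodetool disableautocompaction
    nodetool stop COMPACTION

    # 2nd and 3rd rsyncs, node still up; --delete-before drops sstables that
    # compaction removed meanwhile, so tmp-dir keeps mirroring old-dir.
    rsync -a --delete-before "$OLD_DIR/" "$TMP_DIR/"    # ~3.5 h
    rsync -a --delete-before "$OLD_DIR/" "$TMP_DIR/"    # ~0.5 h, catches new writes

    # Sequential part: the node is down only for these few steps.
    sudo service cassandra stop                         # service name is an assumption

    rsync -a --delete-before "$OLD_DIR/" "$TMP_DIR/"    # last small diff
    rsync -ai --delete-before "$OLD_DIR/" "$TMP_DIR/"   # control run: should print nothing

    # Move the synced files under new-dir; mv within one filesystem is a
    # rename, so this is effectively instant.
    (cd "$TMP_DIR" && find . -type f | while read -r f; do
        mkdir -p "$NEW_DIR/$(dirname "$f")"
        mv "$f" "$NEW_DIR/$f"
    done)

    # Remove old-dir from data_file_directories in cassandra.yaml, then:
    sudo service cassandra start

    # Finally retire the old disk and its now-empty mount point.
    sudo umount /mnt/disk-to-remove
    sudo rm -rf /mnt/disk-to-remove

The design point is that everything up to the node stop can be rerun safely
and in parallel across nodes, so only the last two rsyncs and the move have
to fit inside the short downtime window.
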
> Branton, I did not go through your process, but I guess you will be able
> to review it by yourself after reading the above (typically, repair is
> not needed if you use the strategy I describe above, as the node is down
> for only 5-10 minutes). Also, I am not sure how "rsync -azvuiP
> /var/data/cassandra/data2/ /var/data/cassandra/data/" will behave; my
> guess is that it is going to do a copy, so this might be very long. My
> script performs an instant move, and as the next command is
> 'rm -Rf /var/data/cassandra/data2' I see no reason to copy rather than
> move the files.
>
> Your solution would probably work, but with big constraints from an
> operational point of view (very long operation + repair needed).
>
> Hope this long email will be useful; maybe I should blog about this. Let
> me know if the process above makes sense or if some things might be
> improved.
>
> C*heers,
> -----------------
> Alain Rodriguez
> France
>
> The Last Pickle
> http://www.thelastpickle.com
>
> 2016-02-19 7:19 GMT+01:00 Branton Davis <branton.da...@spanning.com>:
>
>> Jan, thanks! That makes perfect sense to run a second time before
>> stopping cassandra. I'll add that in when I do the production cluster.
>>
>> On Fri, Feb 19, 2016 at 12:16 AM, Jan Kesten <j.kes...@enercast.de>
>> wrote:
>>
>>> Hi Branton,
>>>
>>> Two cents from me - I didn't look through the script, but for the
>>> rsyncs I do pretty much the same when moving them. Since they are
>>> immutable I do a first sync while everything is up and running to the
>>> new location, which runs really long. Meanwhile new ones are created
>>> and I sync them again online, with far fewer files to copy now. After
>>> that I shut down the node and my last rsync has to copy only a few
>>> files, which is quite fast, so the downtime for that node is within
>>> minutes.
>>>
>>> Jan
>>>
>>> Sent from my iPhone
>>>
>>> On 18.02.2016 at 22:12, Branton Davis <branton.da...@spanning.com>
>>> wrote:
>>>
>>> Alain, thanks for sharing! I'm confused why you do so many repetitive
>>> rsyncs. Just being cautious or is there another reason? Also, why do
>>> you have --delete-before when you're copying data to a temp (assumed
>>> empty) directory?
>>>
>>> On Thu, Feb 18, 2016 at 4:12 AM, Alain RODRIGUEZ <arodr...@gmail.com>
>>> wrote:
>>>
>>>> I did the process a few weeks ago and ended up writing a runbook and
>>>> a script. I have anonymised it and shared it, FWIW:
>>>>
>>>> https://github.com/arodrime/cassandra-tools/tree/master/remove_disk
>>>>
>>>> It is basic bash. I tried to have the shortest downtime possible,
>>>> which makes it a bit more complex, but it allows you to do a lot in
>>>> parallel and only do a fast operation sequentially, reducing the
>>>> overall operation time.
>>>>
>>>> This worked fine for me, yet I might have made some errors while
>>>> making it configurable through variables. Be sure to be around if you
>>>> decide to run this. Also, I automated this further by using knife
>>>> (Chef), as I hate to repeat ops; this is something you might want to
>>>> consider.
>>>>
>>>> Hope this is useful,
>>>>
>>>> C*heers,
>>>> -----------------
>>>> Alain Rodriguez
>>>> France
>>>>
>>>> The Last Pickle
>>>> http://www.thelastpickle.com
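
As a rough idea of what the knife (Chef) automation mentioned above could
look like - the role name, host names, and script paths are placeholders,
not taken from the repository:

    # Fan the long online phase out to every Cassandra node at once;
    # knife ssh runs the command on all nodes matching the Chef search query.
    knife ssh 'role:cassandra' 'sudo bash /opt/runbooks/remove_disk_online.sh'

    # The short offline phase (stop node, final rsyncs, move, restart) is
    # then run one node at a time, so only a single replica is ever down.
    for node in cass01 cass02 cass03; do
        ssh "$node" 'sudo bash /opt/runbooks/remove_disk_offline.sh'
    done
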
>>>> 2016-02-18 8:28 GMT+01:00 Anishek Agarwal <anis...@gmail.com>:
>>>>
>>>>> Hey Branton,
>>>>>
>>>>> Please do let us know if you face any problems doing this.
>>>>>
>>>>> Thanks
>>>>> anishek
>>>>>
>>>>> On Thu, Feb 18, 2016 at 3:33 AM, Branton Davis
>>>>> <branton.da...@spanning.com> wrote:
>>>>>
>>>>>> We're about to do the same thing. It shouldn't be necessary to shut
>>>>>> down the entire cluster, right?
>>>>>>
>>>>>> On Wed, Feb 17, 2016 at 12:45 PM, Robert Coli <rc...@eventbrite.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Tue, Feb 16, 2016 at 11:29 PM, Anishek Agarwal
>>>>>>> <anis...@gmail.com> wrote:
>>>>>>>
>>>>>>>> To accomplish this, can I just copy the data from disk1 to disk2
>>>>>>>> within the relevant cassandra home location folders, change the
>>>>>>>> cassandra.yaml configuration and restart the node? Before starting
>>>>>>>> I will shut down the cluster.
>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>> =Rob
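
To make the stop / copy / reconfigure / restart variant that Rob confirms
above concrete, here is a minimal per-node sketch. The paths and the
service command are assumptions; data_file_directories is the actual
cassandra.yaml setting involved:

    # One node at a time: stop Cassandra, copy the data off the disk being
    # retired, drop that disk from the config, restart.
    sudo service cassandra stop                      # service name is an assumption

    sudo rsync -a /data/disk1/cassandra/data/ /data/disk2/cassandra/data/

    # In cassandra.yaml, remove the retired disk from data_file_directories:
    #   data_file_directories:
    #       - /data/disk2/cassandra/data
    # (it previously also listed /data/disk1/cassandra/data)

    sudo service cassandra start

As discussed earlier in the thread, if a node stays down longer than the
hinted handoff window while the copy runs, it should be repaired
afterwards, which is the main operational cost of this simpler approach.
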