Syncing the data back to the local disks is taking longer than I expected. The system is also rebuilding the two RAID6 arrays, which I forgot to account for, and that is making it slower than I expected. At this rate it might take a few days to copy all of the data back. Hopefully the I/O rate will improve and the sync will speed up once the RAID6 arrays have finished rebuilding. The arrays are currently at 55% and 47%, and we've transferred over 993G of the 8.8T of data to the local disks.
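For anyone who wants to redo the "few days" estimate themselves, here is a minimal sketch of the arithmetic. It assumes a constant copy rate, which is probably pessimistic since the rate should improve once the arrays finish rebuilding. The function name and the elapsed time are hypothetical placeholders, not measurements from the system; only the 993G and 8.8T figures come from the update above.

```python
def sync_progress(copied_g, total_g, elapsed_hours):
    """Return (fraction_done, hours_remaining), assuming a constant copy rate."""
    if copied_g <= 0 or total_g <= 0 or elapsed_hours <= 0:
        raise ValueError("all inputs must be positive")
    fraction_done = copied_g / total_g
    rate_g_per_hour = copied_g / elapsed_hours
    hours_remaining = (total_g - copied_g) / rate_g_per_hour
    return fraction_done, hours_remaining


copied_g = 993.0          # from the update above
total_g = 8.8 * 1024      # 8.8T expressed in G, assuming binary units
elapsed_hours = 12.0      # ASSUMPTION: placeholder, the real elapsed time is not stated

fraction_done, hours_remaining = sync_progress(copied_g, total_g, elapsed_hours)
print(f"{fraction_done:.1%} copied, about {hours_remaining / 24:.1f} days left at this rate")
```

Swap in the real elapsed time to get a meaningful number; the result is only as good as the constant-rate assumption.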
I will send another update once I'm ready to switch the system back over.

Thanks-

On Mon, Jun 18, 2018 at 3:49 PM, Lance Albertson <la...@osuosl.org> wrote:

> I just wanted to send you all an update on where we're at in the process.
>
> As of right now, ftp-osl is back online and serving its content from the
> Ceph volume. I've gone ahead and kicked off a few manual syncs to catch
> everything up; however, if you're using us as a master I recommend you kick
> off an update job right now. I'm also currently copying the content to the
> local disks, which I expect to run through tomorrow sometime.
>
> The rebuild took a little bit longer than originally planned due to some
> issues I ran into building the new RAID array. My original plan didn't work
> so I had to go with plan B, which took a little longer. Plan B resulted in
> creating two separate RAID6 arrays, which means I lost about 2T in capacity
> from my original plan.
>
> I'm keeping ftp-osl out of the public rotation for now since its I/O
> throughput isn't likely as good as before since it's serving the content
> via Ceph.
>
> I'll send another update tomorrow when I'm ready to switch back over to
> local storage. Please let me know if you notice any issues.
>
> Thanks-
>
> On Thu, Jun 14, 2018 at 3:52 PM, Lance Albertson <la...@osuosl.org> wrote:
>
>> I had a few questions regarding this outage that I wanted to clarify for
>> everyone.
>>
>> 1. There should be no outage during the 5.5 hour outage window for
>> anything pointed to ftp.osuosl.org (unless your DNS is directly pointing
>> at ftp-osl.osuosl.org)
>> 2. During the 18-24hr sync from Ceph to local storage, ftp-osl should
>> have normal read/write operations. There might be a little bit of an I/O
>> performance hit during that window but it's hard to tell. There will,
>> however, be a short (likely 5 min) outage to reads/writes on ftp-osl when
>> I do the final switch back to local storage.
>>
>> On Thu, Jun 14, 2018 at 10:00 AM, Lance Albertson <la...@osuosl.org>
>> wrote:
>>
>>> Service(s) affected: ftp.osuosl.org
>>>
>>> During the outage, the master syncing node for our FTP cluster (ftp-osl)
>>> will be offline, which means any updates to our software mirrors will be
>>> delayed.
>>>
>>> Outage Window:
>>> Start: Mon, Jun 18 9:30AM PDT (Mon Jun 18 1630 UTC)
>>> End: Mon, Jun 18 3:00PM PDT (Mon Jun 18 2200 UTC)
>>>
>>> Reason for outage:
>>>
>>> Our FTP cluster is starting to run low on disk space and we will be
>>> adding additional hard drives to the system. Our system currently has
>>> 9.375T of disk space and we're planning on upgrading it to 18.75T (this
>>> takes into account the RAID6 configuration).
>>>
>>> Unfortunately, due to the nature of how the disk arrays are
>>> configured, we will not be able to grow the RAID array without a complete
>>> rebuild. This means we're going to have to re-copy all 8.8TB of data off of
>>> the machine and back onto it. Since this task is rather large and time
>>> consuming, we've come up with a better alternative so that we don't have our
>>> master FTP server offline for very long.
>>>
>>> We have just recently built a new Ceph cluster for some new storage
>>> needs at the OSL and we are going to temporarily use this cluster to serve
>>> the ftp-osl content. I've already copied the content onto a new volume and
>>> have tested it enough to feel it can handle the load. This should make the
>>> transition plan much easier and quicker than initially planned. This server
>>> is already out of DNS rotation and we are planning on keeping it out of
>>> rotation until this process is complete to reduce the I/O load.
>>>
>>> So here's the plan thus far starting on Monday:
>>>
>>> 1. Stop all services on the system and do one final rsync to the
>>> Ceph volume
>>> 2. Reboot the machine, destroy the current RAID and create a new
>>> one with the new disks
>>> 3. Reinstall the OS
>>> 4. Bootstrap machine without FTP components initially, set up Ceph volume
>>> 5. Deploy FTP components after Ceph volume is set up and ready to go
>>> 6. Ensure inter FTP node syncing is working using the Ceph volume
>>> 7. Sync data from Ceph volume back over to local disks (I'm guessing
>>> this will take 18-24 hours)
>>> 8. Once sync is complete, shut down all services and switch the mount
>>> point over to the local disks
>>> 9. Profit!
>>>
>>> I would like to thank IBM for donating the hard drives needed for this
>>> upgrade.
>>>
>>> We will plan on doing the storage upgrades on our two other nodes
>>> (ftp-nyc & ftp-chi) soon; however, we won't be using the Ceph cluster for
>>> this since they are remote. The current plan is to take one machine out for
>>> several days and sync the data back between the nodes. I will send another
>>> outage announcement for those two nodes once we're ready for that. We still
>>> need to ship the drives to the locations and work with the local data
>>> centers to get them installed.
>>>
>>> Projects affected: Any project using our FTP cluster as a master syncing
>>> point
>>>
>>

--
Lance Albertson
Director
Oregon State University | Open Source Lab
_______________________________________________
Hosting mailing list
host...@osuosl.org
https://lists.osuosl.org/mailman/listinfo/hosting