The move for openpower6 has hit a hardware issue, and I need to contact IBM support to try to resolve it. I don't have an ETA at the moment for when this server will come back online.
On Fri, Mar 9, 2018 at 3:18 PM, Lance Albertson <la...@osuosl.org> wrote:
> The move for openpower5 has been completed. Please let me know if any
> VMs are still unreachable.
>
> On Fri, Mar 9, 2018 at 10:03 AM, Lance Albertson <la...@osuosl.org> wrote:
>> The move for openpower3 has been completed. Please let me know if any
>> VMs are still unreachable.
>>
>> On Thu, Mar 8, 2018 at 5:25 PM, Lance Albertson <la...@osuosl.org> wrote:
>>> The move for openpower2 has been completed. Sorry it took a little
>>> longer than planned. Please let me know if any VMs are still
>>> unreachable.
>>>
>>> On Thu, Mar 8, 2018 at 11:15 AM, Lance Albertson <la...@osuosl.org> wrote:
>>>> The move for openpower1 has been completed, and all VMs that were on
>>>> that hypervisor should be booting up or already back online. Please
>>>> let us know if you have an issue with one of your VMs. We'll be
>>>> moving openpower2 later this afternoon as planned.
>>>>
>>>> Thanks-
>>>>
>>>> On Tue, Mar 6, 2018 at 2:36 PM, Lance Albertson <la...@osuosl.org> wrote:
>>>>> Service(s) affected:
>>>>>
>>>>> VMs hosted on the OpenPOWER OpenStack cluster will be offline for
>>>>> approximately 2-4 hours during the move of the hypervisor hosting
>>>>> them. In addition, any VMs which have block storage attached from
>>>>> the affected nodes will have an outage.
>>>>>
>>>>> For a list of affected VMs per hypervisor node, please see the
>>>>> following spreadsheet, which includes the UUID for each instance as
>>>>> it stands today. You can see which UUID your VM has by looking at
>>>>> the /run/cloud-init/.instance-id file on your VM [see the example
>>>>> below this thread]. In addition, if you're using a block storage
>>>>> (cinder) volume, I have a sheet which shows the mappings by UUID to
>>>>> the host.
>>>>>
>>>>> OpenStack Cluster Server Moves
>>>>> <https://docs.google.com/a/osuosl.org/spreadsheets/d/15D3VE13chSn0jmGWpf5wsPsin6ex0B3I6FTwS74T5uY/edit?usp=drive_web>
>>>>>
>>>>> Outage Windows:
>>>>>
>>>>> openpower1
>>>>> Start: Thu, Mar 8, 9:00AM PST (Thu Mar 8 1700 UTC)
>>>>> End: Thu, Mar 8, 11:00AM PST (Thu Mar 8 1900 UTC)
>>>>>
>>>>> openpower2
>>>>> Start: Thu, Mar 8, 3:00PM PST (Thu Mar 8 2300 UTC)
>>>>> End: Thu, Mar 8, 5:00PM PST (Fri Mar 9 0100 UTC)
>>>>>
>>>>> openpower3
>>>>> Start: Fri, Mar 9, 8:30AM PST (Fri Mar 9 1630 UTC)
>>>>> End: Fri, Mar 9, 10:30AM PST (Fri Mar 9 1830 UTC)
>>>>>
>>>>> openpower5
>>>>> Start: Fri, Mar 9, 1:00PM PST (Fri Mar 9 2100 UTC)
>>>>> End: Fri, Mar 9, 3:00PM PST (Fri Mar 9 2300 UTC)
>>>>>
>>>>> openpower6 (note DST change for us)
>>>>> Start: Mon, Mar 12, 1:00PM PDT (Mon Mar 12 2000 UTC)
>>>>> End: Mon, Mar 12, 3:00PM PDT (Mon Mar 12 2200 UTC)
>>>>>
>>>>> Reason for outage:
>>>>>
>>>>> We are in the process of migrating the storage backend of the
>>>>> cluster from local storage to Ceph. The migration to Ceph should
>>>>> improve I/O bandwidth and capacity, and also give us more
>>>>> flexibility for server maintenance since we can live-migrate VMs.
>>>>> Thanks to a donation from IBM, we have a new five-node Ceph cluster
>>>>> with 292TB of capacity, including SSDs for journal caching. In
>>>>> addition, thanks to several donations from Mellanox, we're going to
>>>>> upgrade the networking layer from 1Gbps to 40Gbps for use with
>>>>> Ceph. Since we're already incurring an outage for the server moves,
>>>>> we wanted to take care of a few other items at the same time to
>>>>> reduce additional outage windows.
>>>>> The first phase of this migration includes the following (which
>>>>> this outage covers):
>>>>>
>>>>> 1. Moving each compute server to a different rack, closer to a
>>>>> Mellanox 40G switch
>>>>> 2. Installing and configuring a Mellanox 40G NIC
>>>>> 3. Upgrading the system firmware (which includes Meltdown/Spectre
>>>>> fixes)
>>>>> 4. Switching to a 4.14 mainline kernel on the host to provide
>>>>> better feature support on ppc64le (this also provides fixes for
>>>>> Meltdown/Spectre)
>>>>>
>>>>> We have five compute nodes, and we're planning on doing two server
>>>>> moves a day starting on Thursday of this week. We're going to need
>>>>> to bring the nodes up and down several times, so we'll be disabling
>>>>> the OpenStack services on those nodes until the process is
>>>>> complete.
>>>>>
>>>>> The second phase of the migration will happen in a few weeks and
>>>>> should only have per-VM impacts while we migrate them over to the
>>>>> new Ceph cluster. I'll send a separate announcement once we're
>>>>> ready for that.
>>>>>
>>>>> If you have any questions or concerns, please let me know directly
>>>>> via email or IRC.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> --
>>>>> Lance Albertson
>>>>> Director
>>>>> Oregon State University | Open Source Lab

--
Lance Albertson
Director
Oregon State University | Open Source Lab
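Example referenced above: a minimal, illustrative sketch of matching your VM against the "OpenStack Cluster Server Moves" spreadsheet from inside a cloud-init-provisioned Linux guest. The /run/cloud-init/.instance-id path comes from the announcement itself; everything else here is an assumption for illustration, not an official procedure.

    #!/usr/bin/env python3
    # Print this VM's instance UUID so it can be matched against the
    # spreadsheet. Assumes cloud-init has written the file named in the
    # announcement above.
    from pathlib import Path

    print(Path("/run/cloud-init/.instance-id").read_text().strip())

If you also have cinder volumes attached, one way to find their UUIDs from inside the guest, assuming a virtio-blk disk bus (where udev names each disk with a truncated volume UUID), is:

    # List attached virtio disks; on virtio-blk guests the suffix after
    # "virtio-" is the first 20 characters of the cinder volume UUID.
    # Illustrative sketch only; adjust for your guest's disk bus.
    import os

    for name in sorted(os.listdir("/dev/disk/by-id")):
        if name.startswith("virtio-"):
            print(name)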
_______________________________________________
openpower mailing list
openpo...@osuosl.org
https://lists.osuosl.org/mailman/listinfo/openpower