Re: [ceph-users] RHEL 7.1 ceph-disk failures creating OSD
Hi Bruce, I think the problem comes from using /dev/disk/by-id/wwn-0x53959bd02f56 instead of /dev/sdw for the data disk, because ceph-disk has device name parsing logic that works with /dev/XXX. Could you run the ceph-disk prepare command again with --verbose to confirm? If that's the case there should be an error instead of what appears to be something that only does part of the work. Cheers On 26/06/2015 18:56, Bruce McFarland wrote: > Loic, > Thank you very much for the partprobe workaround. I rebuilt the cluster using > 0.94.2. > > I've created partitions on the journal SSDs with parted and then use > ceph-disk prepare as below. I'm not seeing all of the disks with the tmp > mounts when I check 'mount', but I also don't see any of the expected > mount points at /var/lib/ceph/osd. I see the following output from prepare. > When I attempt to 'activate' it errors out saying the devices don't exist. > > ceph-disk prepare --cluster ceph --cluster-uuid > b2c2e866-ab61-4f80-b116-20fa2ea2ca94 --fs-type xfs --zap-disk > /dev/disk/by-id/wwn-0x53959bd02f56 > /dev/disk/by-id/wwn-0x500080d91010024b-part1 > Caution: invalid backup GPT header, but valid main header; regenerating > backup header from main header. > > > Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk > verification and recovery are STRONGLY recommended. > > GPT data structures destroyed! You may now partition the disk using fdisk or > other utilities. > Creating new GPT entries. > The operation has completed successfully. > partx: specified range <1:0> does not make sense > WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same > device as the osd data > WARNING:ceph-disk:Journal /dev/disk/by-id/wwn-0x500080d91010024b-part1 was > not prepared with ceph-disk. Symlinking directly. > The operation has completed successfully. > partx: /dev/disk/by-id/wwn-0x53959bd02f56: error adding partition 1 > meta-data=/dev/sdw1 isize=2048 agcount=4, agsize=244188597 blks > = sectsz=512 attr=2, projid32bit=1 > = crc=0 finobt=0 > data = bsize=4096 blocks=976754385, imaxpct=5 > = sunit=0 swidth=0 blks > naming =version 2 bsize=4096 ascii-ci=0 ftype=0 > log =internal log bsize=4096 blocks=476930, version=2 > = sectsz=512 sunit=0 blks, lazy-count=1 > realtime =none extsz=4096 blocks=0, rtextents=0 > The operation has completed successfully. > partx: /dev/disk/by-id/wwn-0x53959bd02f56: error adding partition 1 > > > [root@ceph0 ceph]# ceph -v > ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3) > [root@ceph0 ceph]# rpm -qa | grep ceph > ceph-radosgw-0.94.2-0.el7.x86_64 > libcephfs1-0.94.2-0.el7.x86_64 > ceph-common-0.94.2-0.el7.x86_64 > python-cephfs-0.94.2-0.el7.x86_64 > ceph-0.94.2-0.el7.x86_64 > [root@ceph0 ceph]# > > > >> -Original Message- >> From: Loic Dachary [mailto:l...@dachary.org] >> Sent: Friday, June 26, 2015 3:29 PM >> To: Bruce McFarland; ceph-users@lists.ceph.com >> Subject: Re: [ceph-users] RHEL 7.1 ceph-disk failures creating OSD >> >> Hi, >> >> Prior to firefly v0.80.8 ceph-disk zap did not call partprobe and that was >> causing the kind of problems you're experiencing. It was fixed by >> https://github.com/ceph/ceph/commit/e70a81464b906b9a304c29f474e672 >> 6762b63a7c and is described in more detail at >> http://tracker.ceph.com/issues/9665. Rebooting the machine ensures the >> partition table is up to date and that's what you probably want to do after >> that kind of failure. 
You can however avoid the failure by running: >> >> * ceph-disk zap >> * partprobe >> * ceph-disk prepare >> >> Cheers >> >> P.S. The "partx: /dev/disk/by-id/wwn-0x53959ba80a4e: error adding >> partition 1" can be ignored, it does not actually matter. A message was >> added later to avoid confusion with a real error. >> . >> On 26/06/2015 17:09, Bruce McFarland wrote: >>> I have moved storage nodes to RHEL 7.1 and used the basic server install. I >> installed ceph-deploy and used the ceph.repo/epel.repo for installation of >> ceph 0.80.7. I have tried ceph-disk with issuing "zap" on the same command >> line as "prepare" and on a separate command line immediately before the >> ceph-disk prepare. I consistently run into the partition errors and am unable >> to create OSDs on RHEL 7.1. >>> >>> >>> >>> ceph-disk prepare --cluster ceph --cluster-uuid 373a09f7-2070-4d20-8504- >> c8653fb6db80 --fs-type xfs --zap-disk /dev/disk/by-id/wwn- >> 0x53959ba80a4e /dev/disk/by-id/wwn-0x500080d9101001d6-part1 >>> >>> Caution: invalid backup GPT header, but valid main header; regenerating
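For reference, a minimal sketch of the zap / partprobe / prepare sequence described above, reusing the cluster UUID and journal partition from Bruce's command and assuming the data disk is /dev/sdw as shown in the mkfs output; device names and flag placement are illustrative, so check ceph-disk --help on your version:

    # Wipe the GPT/MBR on the data disk, then force the kernel to re-read
    # the partition table before preparing the OSD.
    ceph-disk zap /dev/sdw
    partprobe /dev/sdw
    # Use the plain /dev/XXX name rather than the /dev/disk/by-id/... path,
    # and run with --verbose so ceph-disk logs how it parses the device name.
    ceph-disk --verbose prepare --cluster ceph \
        --cluster-uuid b2c2e866-ab61-4f80-b116-20fa2ea2ca94 \
        --fs-type xfs /dev/sdw /dev/disk/by-id/wwn-0x500080d91010024b-part1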
Re: [ceph-users] Trying to understand Cache Pool behavior
Hi Reid, Yes they will, but if the object which the user is writing to (disk block if using RBD, which then maps to an object) has never been written to before, it won't have to promote the object from the base pool before being able to write it. However, as you write each object, once the cache pool is full, another object will be demoted down to the base tier. As long as you don't mind slow performance, using the cache tier should be OK. Otherwise wait until the next release, as there will be several improvements. > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Reid Kelley > Sent: 27 June 2015 00:04 > To: ceph-users@lists.ceph.com > Subject: [ceph-users] Trying to understand Cache Pool behavior > > Have been reading the docs and trying to wrap my head around the idea of a > "write miss" with a cache tier in write-back mode. > > My use case is a large media archive, with write activity on file ingest > (previews and thumbs generated) followed by very cold, limited read > access. Seems to fit the cache model. > > What I am confused with is the write-miss. Would a user uploading a new file > ever experience a write-miss? > > Thanks, > Reid > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
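For anyone who wants to experiment with a cache tier despite the caveats above, a minimal sketch of a writeback tier and the knobs that drive promotion, flushing and eviction; the pool names (media-archive, media-cache) and the sizes are made up for illustration:

    # Attach a faster pool as a writeback cache in front of the base pool.
    ceph osd tier add media-archive media-cache
    ceph osd tier cache-mode media-cache writeback
    ceph osd tier set-overlay media-archive media-cache
    # The cache pool needs a hit set so object usage can be tracked.
    ceph osd pool set media-cache hit_set_type bloom
    ceph osd pool set media-cache hit_set_count 1
    ceph osd pool set media-cache hit_set_period 3600
    # Bound the cache and control when dirty objects are flushed and clean
    # objects are evicted (demoted) to the base tier.
    ceph osd pool set media-cache target_max_bytes 1099511627776
    ceph osd pool set media-cache cache_target_dirty_ratio 0.4
    ceph osd pool set media-cache cache_target_full_ratio 0.8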
Re: [ceph-users] kernel 3.18 io bottlenecks?
Dear Ilya, Am 25.06.2015 um 14:07 schrieb Ilya Dryomov: On Wed, Jun 24, 2015 at 10:29 PM, Stefan Priebe wrote: Am 24.06.2015 um 19:53 schrieb Ilya Dryomov: On Wed, Jun 24, 2015 at 8:38 PM, Stefan Priebe wrote: Am 24.06.2015 um 16:55 schrieb Nick Fisk: That kernel probably has the bug where tcp_nodelay is not enabled. That is fixed in Kernel 4.0+, however also in 4.0 blk-mq was introduced which brings two other limitations:- blk-mq is terribly slow. That's correct. Is that a general sentiment or your experience with rbd? If the latter, can you describe your workload and provide some before and after blk-mq numbers? We'd be very interested in identifying and fixing any performance regressions you might have on blk-mq rbd. Oh, I'm sorry. I accidentally compiled blk-mq into the kernel when 3.18.1 came out and was wondering why the I/O waits on my Ceph OSDs were doubled or even tripled. After reverting back to cfq everything was fine again. I didn't dig deeper into it as I thought blk-mq was experimental in 3.18. That doesn't make sense - rbd was switched to blk-mq in 4.0. Or did you try to apply the patch from the mailing list to 3.18? I'm talking about the ceph-osd process / side, not about the rbd client side. If you're willing to assist I can give it a try - but I need the patches you mention first (git commit ids?). No commit ids as the patches are not upstream yet. I have everything gathered in the testing+blk-mq-plug branch of ceph-client.git: https://github.com/ceph/ceph-client/tree/testing%2Bblk-mq-plug A deb (ubuntu, debian, etc): http://gitbuilder.ceph.com/kernel-deb-precise-x86_64-basic/ref/testing_blk-mq-plug/linux-image.deb An rpm (fedora, centos, rhel): http://gitbuilder.ceph.com/kernel-rpm-centos7-x86_64-basic/ref/testing_blk-mq-plug/kernel.x86_64.rpm These are built with slightly stripped-down distro configs, so they should boot most boxes. Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
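For anyone who wants to try the test kernel, a rough sketch for a Debian/Ubuntu client using the deb URL above; the image name rbd/testimg and the fio job are placeholders, and the fio run destroys data on that test image:

    # Install the test kernel and reboot into it.
    wget http://gitbuilder.ceph.com/kernel-deb-precise-x86_64-basic/ref/testing_blk-mq-plug/linux-image.deb
    sudo dpkg -i linux-image.deb
    sudo reboot
    # After mapping an image, check the queueing model of the rbd device;
    # a blk-mq device reports "none" instead of an elevator list.
    sudo rbd map rbd/testimg
    cat /sys/block/rbd0/queue/scheduler
    # Collect before/after numbers with an identical workload, e.g.:
    sudo fio --name=randwrite --filename=/dev/rbd0 --rw=randwrite --bs=4k \
        --iodepth=32 --ioengine=libaio --direct=1 --runtime=60 --time_based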
Re: [ceph-users] Trying to understand Cache Pool behavior
Sounds good, thanks for the info and will wait and test with next releases. > On Jun 27, 2015, at 9:24 AM, Nick Fisk wrote: > > Hi Reid, > > Yes they will, but if the object which the user is writing to (Disk Block if > using RBD, which then maps to an object) has never been written to before, > it won't have to promote the object from the base pool before being able to > write it. > > However as you write each object, once the cache pool is full, another > object will be demoted down to the base tier. > > As long as you don't mind slow performance, using the cache tier should be > ok. Otherwise wait until the next release as there will be several > improvements. > >> -Original Message- >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of >> Reid Kelley >> Sent: 27 June 2015 00:04 >> To: ceph-users@lists.ceph.com >> Subject: [ceph-users] Trying to understand Cache Pool behavior >> >> Have been reading the docs and trying to wrap my head around the idea of a >> "write miss" with a cache tier in write-back mode. >> >> My use case is a large media archive, with write activity on file ingest >> (previews and thumbs generated) followed by very cold limited ready >> access. Seems to fit the cache model. >> >> What I am confused with is the write-miss. Would a user uploading a new > file >> every experience a write-miss? >> >> Thanks, >> Reid >> >> >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] kernel 3.18 io bottlenecks?
On Sat, Jun 27, 2015 at 6:20 PM, Stefan Priebe wrote: > Dear Ilya, > > Am 25.06.2015 um 14:07 schrieb Ilya Dryomov: >> >> On Wed, Jun 24, 2015 at 10:29 PM, Stefan Priebe >> wrote: >>> >>> >>> Am 24.06.2015 um 19:53 schrieb Ilya Dryomov: On Wed, Jun 24, 2015 at 8:38 PM, Stefan Priebe wrote: > > > > Am 24.06.2015 um 16:55 schrieb Nick Fisk: >> >> >> >> That kernel probably has the bug where tcp_nodelay is not enabled. >> That >> is fixed in Kernel 4.0+, however also in 4.0 blk-mq was introduced >> which >> brings two other limitations:- > > > > > blk-mq is terrible slow. That's correct. Is that a general sentiment or your experience with rbd? If the latter, can you describe your workload and provide some before and after blk-mq numbers? We'd be very interested in identifying and fixing any performance regressions you might have on blk-mq rbd. >>> >>> >>> >>> oh i'm sorry. I accidently compiled blk-mq into the kernel when 3.18.1 >>> came >>> out and i was wondering why my I/O waits on my ceph osds where doubled or >>> even tripled. After reverting back to cfq everything was fine again. I >>> didn't digged deeper into it as i thought blk-mq is experimental in 3.18. >> >> >> That doesn't make sense - rbd was switched to blk-mq in 4.0. Or did >> you try to apply the patch from the mailing list to 3.18? > > > I'm talking about the ceph-osd process / side not about rbd client side. Ah, sorry - Nick was clearly talking about the kernel client and I replied to his mail. The kernel you run your OSDs on shouldn't matter much, as long as it's not something ancient (except when you need to work around a particular filesystem bug), so I just assumed you and German were talking about the kernel client. Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Redundant networks in Ceph
The current network design in Ceph (http://ceph.com/docs/master/rados/configuration/network-config-ref) uses nonredundant networks for both cluster and public communication. Ideally, in a high-load environment these will be 10 or 40+ GbE networks. For cost reasons, most such installations will use the same switch hardware and separate Ceph traffic using VLANs. Networking is complex, and situations are possible where switches and routers drop traffic. We ran into one of those at one of our sites - connections to hosts stay up (so bonding NICs does not help), yet OSD communication gets disrupted, client IO hangs and failures cascade to client applications. My understanding is that if OSDs cannot connect for some time over the cluster network, that IO will hang and time out. The document states: "If you specify more than one IP address and subnet mask for either the public or the cluster network, the subnets within the network must be capable of routing to each other." Which in the real world means a complicated Layer 3 routing setup and is not practical in many configurations. What if there was an option for "cluster 2" and "public 2" networks, to which OSDs and MONs would go either in active/backup or active/active mode (cluster 1 and cluster 2 exist separately and do not route to each other)? The difference between this setup and bonding is that here the decision to fail over and try the other network is at the OSD/MON level, and it brings resilience to faults within the switch core, which are really only detectable at the application layer. Am I missing an already existing feature? Please advise. Best regards, Alex Gorbachev Intelligent Systems Services Inc. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
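For reference, what ceph.conf supports today is only the comma-separated form from the cited document, which still assumes the subnets can route to each other; a sketch with example addresses:

    [global]
        # Multiple subnets may be listed, but per the network-config-ref
        # document they must be capable of routing to each other.
        public network  = 10.10.1.0/24, 10.10.2.0/24
        cluster network = 10.20.1.0/24, 10.20.2.0/24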
Re: [ceph-users] Redundant networks in Ceph
Hi Alex, I think the answer is that you do one of two things. You either design your network so that it is fault tolerant in every way, so that a network interruption is not possible. Or go with non-redundant networking, but design your CRUSH map around the failure domains of the network. I'm interested in your example of where OSDs were unable to communicate. What happened? Would it be possible to redesign the network to stop this happening? Nick > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Alex Gorbachev > Sent: 27 June 2015 19:02 > To: ceph-users@lists.ceph.com > Subject: [ceph-users] Redundant networks in Ceph > > The current network design in Ceph > (http://ceph.com/docs/master/rados/configuration/network-config-ref) > uses nonredundant networks for both cluster and public communication. > Ideally, in a high load environment these will be 10 or 40+ GbE networks. For > cost reasons, most such installation will use the same switch hardware and > separate Ceph traffic using VLANs. > > Networking in complex, and situations are possible when switches and > routers drop traffic. We ran into one of those at one of our sites - > connections to hosts stay up (so bonding NICs does not help), yet OSD > communication gets disrupted, client IO hangs and failures cascade to client > applications. > > My understanding is that if OSDs cannot connect for some time over the > cluster network, that IO will hang and time out. The document states " > > If you specify more than one IP address and subnet mask for either the > public or the cluster network, the subnets within the network must be > capable of routing to each other." > > Which in real world means complicated Layer 3 setup for routing and is not > practical in many configurations. > > What if there was an option for "cluster 2" and "public 2" networks, to which > OSDs and MONs would go either in active/backup or active/active mode > (cluster 1 and cluster 2 exist separately do not route to each other)? > > The difference between this setup and bonding is that here decision to fail > over and try the other network is at OSD/MON level, and it bring resilience to > faults within the switch core, which is really only detectable at application > layer. > > Am I missing an already existing feature? Please advise. > > Best regards, > Alex Gorbachev > Intelligent Systems Services Inc. > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
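As a rough illustration of the second option (aligning CRUSH with the network failure domains), something along these lines keeps replicas in separate racks so a single rack or switch outage cannot take out every copy; the bucket, host and pool names are hypothetical:

    # Create rack buckets and move the OSD hosts into them.
    ceph osd crush add-bucket rack-a rack
    ceph osd crush add-bucket rack-b rack
    ceph osd crush move rack-a root=default
    ceph osd crush move rack-b root=default
    ceph osd crush move node1 rack=rack-a
    ceph osd crush move node2 rack=rack-b
    # Replicate across racks instead of across hosts, then point a pool at
    # the new rule (look up its id with 'ceph osd crush rule dump').
    ceph osd crush rule create-simple replicate-by-rack default rack
    ceph osd pool set rbd crush_ruleset 1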
[ceph-users] Ubuntu -Juno Openstack - Ceph integrated - Istalling ubuntu server instance
Hello everyone, I created a bootable volume on OpenStack and am trying to boot an ubuntu-server 14.04 image. I am able to get to the initial setup screen of Ubuntu, but after it asks for the timezone and location I am not able to proceed further, as it says it is not able to access the CD-ROM drive. The backend storage is connected to Ceph. Does anyone have any solutions or workarounds for this issue?

cinder list
| ID | Status | Display Name | Size | Volume Type | Bootable | Attached to |
| 002762cc-2e4b-417d-9d33-c90d6e87e758 | in-use | my-boot-vol | 10 | None | true | 78b80f78-07e9-46ed-8b0c-0e807a4c0805 |
| 94164dcb-7e7a-4d87-89b7-f535edf299b6 | available | cinder-ceph-vol1 | 10 | None | false | |
| c1f39f7d-82ef-48cd-884a-86431d251e43 | available | ubuntu-boot-vol2 | 40 | None | true | |
| ef346116-8ed8-48d0-9401-78b5c120a4ef | available | cinder-ceph-vol2 | 100 | None | false | |
| f67f79d0-cafb-4575-b88a-765fc631aa42 | in-use | ubuntu-boot-vol | 10 | None | true | c1289143-9bed-4d21-a12d-18c22b01163e |

nova list
| ID | Name | Status | Task State | Power State | Networks |
| 78b80f78-07e9-46ed-8b0c-0e807a4c0805 | dsl-linux | SHUTOFF | - | Shutdown | ext-net=10.11.12.158 |
| b6738a95-9bf6-441f-9cd5-e0665111c040 | testvm-1 | ACTIVE | - | Running | ext-net=10.11.12.156 |
| 01dd21c3-c84d-4572-93ef-c3904a6f877e | ubuntu-server-14 | ACTIVE | - | Running | ext-net=10.11.12.160 |

Regards Teclus Dsouza ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Redundant networks in Ceph
Hi Nick, Thank you for writing back: > I think the answer is you do 1 of 2 things. You either design your network > so that it is fault tolerant in every way so that network interruption is > not possible. Or go with non-redundant networking, but design your crush map > around the failure domains of the network. We'll redesign the network shortly - the general problem is that I am finding it possible, even in well-designed redundant networks, for packet loss to occur for various reasons (maintenance, cables, protocol issues, etc.). So while there is not an interruption (defined as 100% service loss), there may be occasional packet loss issues and high latency situations, even when the backbone is very fast. The CRUSH map idea sounds interesting. But there are still concerns, such as massive East-West data relocations (between racks in a leaf-spine architecture such as https://community.mellanox.com/docs/DOC-1475), should there be an outage in the spine. Plus such issues are enormously hard to troubleshoot. > I'm interested in your example of where OSD's where unable to communicate. > What happened? Would it possible to redesign the network to stop this > happening? Our SuperCore design uses Ceph OSD nodes to provide storage to LIO Target iSCSI nodes, which then deliver it to ESXi hosts. LIO is sensitive to hangs, and often we see an RBD hang translate into an iSCSI timeout, which causes ESXi to abort connections, hang and crash applications. This only happens at one site, where it is likely there is a switch issue somewhere. These issues are sporadic and come and go as storms - so far all Ceph analysis has pointed to network disruptions, from which the RBD client is unable to recover. The network vendor still cannot find anything wrong. We'll replace the whole network, but I was thinking, having seen such issues at a few other sites, whether a "B-bus" for networking would be a good design for OSDs. This approach is commonly used in traditional SANs, where the "A bus" and "B bus" are not connected, so they cannot possibly cross-contaminate in any way. Another reference is multipathing, where IO can be sent via redundant paths - most storage vendors recommend using application (higher) level multipathing (aka MPIO) vs. network redundancy (such as bonding). We find this to be a valid recommendation, as clients run into fewer issues. Somewhat related is http://serverfault.com/questions/510882/why-mpio-instead-of-802-3ad-team-for-iscsi - to quote, "MPIO detects and handles path failures, whereas 802.3ad can only compensate for a link failure". I see OSD connections as paths, rather than links, as these are higher-level object storage exchanges. Thank you, Alex > > Nick > >> -Original Message- >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of >> Alex Gorbachev >> Sent: 27 June 2015 19:02 >> To: ceph-users@lists.ceph.com >> Subject: [ceph-users] Redundant networks in Ceph >> >> The current network design in Ceph >> (http://ceph.com/docs/master/rados/configuration/network-config-ref) >> uses nonredundant networks for both cluster and public communication. >> Ideally, in a high load environment these will be 10 or 40+ GbE networks. > For >> cost reasons, most such installation will use the same switch hardware and >> separate Ceph traffic using VLANs. >> >> Networking in complex, and situations are possible when switches and >> routers drop traffic. 
We ran into one of those at one of our sites - >> connections to hosts stay up (so bonding NICs does not help), yet OSD >> communication gets disrupted, client IO hangs and failures cascade to > client >> applications. >> >> My understanding is that if OSDs cannot connect for some time over the >> cluster network, that IO will hang and time out. The document states " >> >> If you specify more than one IP address and subnet mask for either the >> public or the cluster network, the subnets within the network must be >> capable of routing to each other." >> >> Which in real world means complicated Layer 3 setup for routing and is not >> practical in many configurations. >> >> What if there was an option for "cluster 2" and "public 2" networks, to > which >> OSDs and MONs would go either in active/backup or active/active mode >> (cluster 1 and cluster 2 exist separately do not route to each other)? >> >> The difference between this setup and bonding is that here decision to > fail >> over and try the other network is at OSD/MON level, and it bring > resilience to >> faults within the switch core, which is really only detectable at > application >> layer. >> >> Am I missing an already existing feature? Please advise. >> >> Best regards, >> Alex Gorbachev >> Intelligent Systems Services Inc. >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > ___ ceph-users mailing l
Re: [ceph-users] Redundant networks in Ceph
> -Original Message- > From: Alex Gorbachev [mailto:a...@iss-integration.com] > Sent: 27 June 2015 21:55 > To: Nick Fisk > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Redundant networks in Ceph > > Hi Nick, > > Thank you fro writing back: > > > I think the answer is you do 1 of 2 things. You either design your > > network so that it is fault tolerant in every way so that network > > interruption is not possible. Or go with non-redundant networking, but > > design your crush map around the failure domains of the network. > > We'll redesign the network shortly - the problem is in general that I am > finding it is possible, in even well designed redundant networks, to have > packet loss occur for various reasons (maintenance, cables, protocol issues > etc.). So while there is not an interruption (defined as 100% service loss), > there may be occasional packet loss issues and high latency situations, even > when the backbone is very fast. I know what you mean, no matter how hard you try something unexpected always happens. That said I think OSD timeouts should be higher than HSRP and spanning tree convergence times, so I think it should survive most incidents that I can think of. > > The CRUSH map idea sounds interesting. But there are still concerns, such as > massive data relocations East-West (between racks in a leaf-spine > architecture such as > https://community.mellanox.com/docs/DOC-1475 , should there be an > outage in the spine. Plus such issues are enormously hard to troubleshoot. You can set the maximum crush grouping that will allow OSD's to be marked out. You can use this to stop unwanted data movement from occurring during outages. > > > I'm interested in your example of where OSD's where unable to > communicate. > > What happened? Would it possible to redesign the network to stop this > > happening? > > Our SuperCore design uses Ceph OSD nodes to provide storage to LIO Target > iSCSI nodes, which then deliver it to ESXi hosts. LIO is sensitive to hangs, > and > often we see an RBD hang translate into iSCSI timeout, which causes ESXi to > abort connections, hang and crash applications. This only happens at one > site, where it is likely there is a switch issue somewhere. These issues are > sporadic and come and go as storms - so far all Ceph analysis pointed to > network disruptions, from which the RBD client is unable to recover. The > network vendor still cannot find anything wrong. Ah, yeah, been there with LIO and esxi and gave up on it. I found any pause longer than around 10 seconds would send both of them into a death spiral. I know you currently only see it due to some networking blip, but you will most likely also see it when disks fail...etc For me I couldn't have all my Datastores going down every time something blipped or got a little slow. There are discussions ongoing about it on the Target mailing list and Mike Christie from Redhat is looking into the problem, so hopefully it will get sorted at some point. For what it's worth, both SCST and TGT seem to be immune from this. > > We'll replace the whole network, but I was thinking, having seen such issues > at a few other sites, if a "B-bus" for networking would be a good design for > OSDs. This approach is commonly used in traditional SANs, where the "A > bus" and "B bus" are not connected,so they cannot possibly cross > contaminate in any way. Probably implementing something like multipathTCP would be the best bet to mirror the traditional dual fabric SAN design. 
> > Another reference is multipathing, where IO can be send via redundant > paths - most storage vendors recommend using application (higher) level > multipathing (aka MPIO) vs. network redundancy (such as bonding). We find > this to be a valid recommendation as clients run into issues less. Somewhat > related to http://serverfault.com/questions/510882/why-mpio-instead-of- > 802-3ad-team-for-iscsi > to quote - "MPIO detects and handles path failures, whereas 802.3ad can > only compensate for a link failure". > > I see OSD connections as paths, rather than links, as these are higher level > object storage exchanges. > > Thank you, > Alex > > > > > Nick > > > >> -Original Message- > >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf > >> Of Alex Gorbachev > >> Sent: 27 June 2015 19:02 > >> To: ceph-users@lists.ceph.com > >> Subject: [ceph-users] Redundant networks in Ceph > >> > >> The current network design in Ceph > >> (http://ceph.com/docs/master/rados/configuration/network-config-ref) > >> uses nonredundant networks for both cluster and public communication. > >> Ideally, in a high load environment these will be 10 or 40+ GbE networks. > > For > >> cost reasons, most such installation will use the same switch > >> hardware and separate Ceph traffic using VLANs. > >> > >> Networking in complex, and situations are possible when switches and > >> routers drop traffic. We ran
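The "maximum crush grouping" Nick mentions is presumably the mon osd down out subtree limit setting; a sketch of how that might look in ceph.conf, with illustrative values:

    [mon]
        # Do not automatically mark OSDs out when a whole subtree of this
        # size (e.g. a full rack behind a failed switch) goes down at once,
        # so a network outage does not trigger mass data movement.
        mon osd down out subtree limit = rack
        # Grace period before a down OSD is marked out at all.
        mon osd down out interval = 600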
[ceph-users] SSL Certificate failure when attaching volume to VM
Dear Ceph Community, We are trying to integrate Ceph with OpenStack, and facing certificate issues when attaching a Cinder volume to a Nova vm. We have the environment variable OS_CACERT set to the correct certificate address, which is read to set cacert. The certificate is verified successfully in creating images, volumes, and vms. However when the compute vm tries to communicate with the controller, the certificate fails to verify. Is there a configuration variable that must be set for the certificate to verify correctly? Any advice is much appreciated. 2015-06-26 23:15:41.526 1437 TRACE oslo.messaging.rpc.dispatcher 2015-06-26 23:15:41.528 1437 ERROR oslo.messaging._drivers.common [-] Returning exception Unable to establish connection: [Errno 1] _ssl.c:492: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed to caller 2015-06-26 23:15:41.528 1437 ERROR oslo.messaging._drivers.common [-] ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py", line 133, in _dispatch_and_reply\nincoming.message))\n', ' File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py", line 176, in _dispatch\nreturn self._do_dispatch(endpoint, method, ctxt, args)\n', ' File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py", line 122, in _do_dispatch\nresult = getattr(endpoint, method)(ctxt, **new_args)\n', ' File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 393, in decorated_function\nreturn function(self, context, *args, **kwargs)\n', ' File "/usr/lib/python2.6/site-packages/nova/exception.py", line 88, in wrapped\npayload)\n', ' File "/usr/lib/python2.6/site-packages/nova/openstack/common/excutils.py", line 68, in __exit__\nsix.reraise(self.type_, self.value, self.tb)\n', ' File "/usr/lib/python2.6/site-packages/nova/exception.py", line 71, in wrapped\n return f(self, context, *args, **kw)\n', ' File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 274, in decorated_function\npass\n', ' File "/usr/lib/python2.6/site-packages/nova/openstack/common/excutils.py", line 68, in __exit__\nsix.reraise(self.type_, self.value, self.tb)\n', ' File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 260, in decorated_function\nreturn function(self, context, *args, **kwargs)\n', ' File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 303, in decorated_function\ne, sys.exc_info())\n', ' File "/usr/lib/python2.6/site-packages/nova/openstack/common/excutils.py", line 68, in __exit__\nsix.reraise(self.type_, self.value, self.tb)\n', ' File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 290, in decorated_function\nreturn function(self, context, *args, **kwargs)\n', ' File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 4167, in attach_volume\nbdm.destroy(context)\n', ' File "/usr/lib/python2.6/site-packages/nova/openstack/common/excutils.py", line 68, in __exit__\nsix.reraise(self.type_, self.value, self.tb)\n', ' File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 4164, in attach_volume\nreturn self._attach_volume(context, instance, driver_bdm)\n', ' File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 4185, in _attach_volume\nself.volume_api.unreserve_volume(context, bdm.volume_id)\n', ' File "/usr/lib/python2.6/site-packages/nova/volume/cinder.py", line 173, in wrapper\nres = method(self, ctx, volume_id, *args, **kwargs)\n', ' File 
"/usr/lib/python2.6/site-packages/nova/volume/cinder.py", line 249, in unreserve_volume\ncinderclient(context).volumes.unreserve(volume_id)\n', ' File "/usr/lib/python2.6/site-packages/cinderclient/v1/volumes.py", line 293, in unreserve\nreturn self._action(\'os-unreserve\', volume)\n', ' File "/usr/lib/python2.6/site-packages/cinderclient/v1/volumes.py", line 250, in _action\nreturn self.api.client.post(url, body=body)\n', ' File "/usr/lib/python2.6/site-packages/cinderclient/client.py", line 223, in post\n return self._cs_request(url, \'POST\', **kwargs)\n', ' File "/usr/lib/python2.6/site-packages/cinderclient/client.py", line 212, in _cs_request\nraise exceptions.ConnectionError(msg)\n', 'ConnectionError: Unable to establish connection: [Errno 1] _ssl.c:492: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed\n'] Sincerely, Johanni B. Thunstrom ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com