Re: [ceph-users] [EXTERNAL] Ceph performance is too good (impossible..)...

2016-12-12 Thread Will . Boege
My understanding is that when using direct=1 on a raw block device, FIO (i.e. you) will have to handle all the sector alignment, or the request will get buffered to perform the alignment. Try adding the --blockalign=512b option to your jobs, or better yet just use the native FIO RBD engine. Someth
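
For reference, a minimal fio job file using the native RBD engine might look something like this sketch; the pool, image, and client names below are placeholders, not taken from the thread:

    [global]
    ; pool, image, and cephx client names are placeholders
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=fiotest
    direct=1
    bs=4k
    rw=randwrite

    [rbd-test]
    iodepth=32
    runtime=60
    time_based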

Re: [ceph-users] [EXTERNAL] Re: 2x replication: A BIG warning

2016-12-07 Thread Will . Boege
Thanks for the explanation. I guess this case you outlined explains why the Ceph developers chose to make this a ‘safe’ default. Two OSDs are transiently down and the third fails hard. The PGs on the third OSD with no remaining replicas are marked unfound. You bring up OSDs 1 and 2 and these PGs will remai
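
If PGs do end up stuck unfound after a failure like this, the usual (lossy) way out is to revert or discard the unfound objects. A sketch, with the PG id as a placeholder; both options lose whatever writes only lived on the failed OSD:

    ceph health detail                     # lists the PGs with unfound objects
    ceph pg 2.5 list_missing               # inspect the unfound objects (placeholder PG id)
    ceph pg 2.5 mark_unfound_lost revert   # or 'delete' - last resort, data is lost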

Re: [ceph-users] [EXTERNAL] Re: 2x replication: A BIG warning

2016-12-07 Thread Will . Boege
Hi Wido, Just curious - how does blocking IO to the final replica provide protection from data loss? I’ve never really understood why this is a Ceph best practice. In my head all 3 replicas would be on devices that have roughly the same odds of physically failing or getting logically corrupted in a

Re: [ceph-users] [EXTERNAL] Re: ceph in an OSPF environment

2016-11-23 Thread Will . Boege
Check your MTU. I think OSPF has issues when fragmenting. Try setting your interface MTU to something obnoxiously small to ensure that anything upstream isn't fragmenting - say 1200. If it works, try a saner value like 1496, which accounts for any VLAN headers. If you're running in a spine/leaf
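
A quick way to check for path-MTU trouble is to ping with the don't-fragment bit set; the interface name and peer below are placeholders, and packet size is payload plus 28 bytes of IP/ICMP headers:

    ip link set dev eth0 mtu 1200    # eth0 is a placeholder interface
    ping -M do -s 1172 peer-host     # 1172+28=1200, should pass at MTU 1200
    ping -M do -s 1468 peer-host     # 1468+28=1496, tests the saner value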

Re: [ceph-users] [EXTERNAL] Re: osd set noin ignored for old OSD ids

2016-11-23 Thread Will . Boege
From my experience noin doesn't stop new OSDs from being marked in. noin only works on OSDs already in the CRUSH map. To accomplish the behavior you want I've injected "mon osd auto mark new in = false" into MONs. This also seems to set their OSD weight to 0 when they are created. > On Nov
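
For anyone wanting to reproduce this, a sketch of both ways to apply that setting: persistently in ceph.conf on the monitor hosts, or injected at runtime (injected values do not survive a restart):

    # ceph.conf on the MON hosts
    [mon]
    mon osd auto mark new in = false

    # or inject into the running MONs
    ceph tell mon.* injectargs '--mon-osd-auto-mark-new-in=false'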

Re: [ceph-users] [EXTERNAL] Big problems encoutered during upgrade from hammer 0.94.5 to jewel 10.2.3

2016-11-13 Thread Will . Boege
Hi Vincent, When I did a similar upgrade I found that having mixed-version OSDs caused issues much like yours. My advice is to power through the upgrade as fast as possible. Pretty sure this is related to an issue/bug discussed here previously around excessive load on the monitors in mix

Re: [ceph-users] [EXTERNAL] Re: pg stuck with unfound objects on non exsisting osd's

2016-11-01 Thread Will . Boege
Start with a rolling restart of just the OSDs, one system at a time, checking the status after each restart. On Nov 1, 2016, at 6:20 PM, Ronny Aasen <ronny+ceph-us...@aasen.cx> wrote: Thanks for the suggestion. Is a rolling reboot sufficient, or must all OSDs be down at the same time?
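
A rolling restart along those lines might look like the sketch below (assumes systemd-era packaging; hostnames are placeholders):

    ceph osd set noout                 # keep CRUSH from rebalancing during restarts
    for host in node1 node2 node3; do
        ssh $host sudo systemctl restart ceph-osd.target
        sleep 60                       # let peering settle
        ceph -s                        # verify health before the next host
    done
    ceph osd unset noout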

Re: [ceph-users] [EXTERNAL] Re: Instance filesystem corrupt

2016-10-26 Thread Will . Boege
Strangely enough, I’m also seeing similar user issues – a suspiciously high volume of corrupt instance boot disks. At this point I’m attributing it to the fact that our Ceph cluster is patched 9 months ahead of our Red Hat OSP Kilo environment. However that’s a total guess at this point… From:

Re: [ceph-users] [EXTERNAL] Instance filesystem corrupt

2016-10-25 Thread Will . Boege
Just out of curiosity, did you recently upgrade to Jewel? From: ceph-users on behalf of "keynes_...@wistron.com" Date: Tuesday, October 25, 2016 at 10:52 PM To: "ceph-users@lists.ceph.com" Subject: [EXTERNAL] [ceph-users] Instance filesystem corrupt We are using OpenStack + Ceph. Recently we

Re: [ceph-users] [EXTERNAL] Benchmarks using fio tool gets stuck

2016-10-05 Thread Will . Boege
Because you do not have segregated networks, the cluster traffic is most likely drowning out the FIO user traffic. This is especially exacerbated by the fact that there is only a 1Gb link between the cluster nodes. If you are planning on using this cluster for anything other than testing, you’ll
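
For reference, segregating the traffic is a ceph.conf change rolled out to every node; the subnets below are placeholders:

    [global]
    # client and MON traffic (placeholder subnet)
    public network = 192.168.1.0/24
    # replication and recovery traffic (placeholder subnet)
    cluster network = 192.168.2.0/24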

Re: [ceph-users] [EXTERNAL] Benchmarks using fio tool gets stuck

2016-10-05 Thread Will . Boege
What does your network setup look like? Do you have a separate cluster network? Can you explain how you are performing the FIO test? Are you mounting a volume through krbd and testing that from a different server? On Oct 5, 2016, at 3:11 AM, Mario Rodríguez Molins <mariorodrig...@tuenti.
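
For context, a krbd-style test of the kind being asked about maps an image on a client host and points fio at the block device; the pool and image names here are placeholders:

    rbd create rbd/fiotest --size 10240   # 10 GiB image (size is in MB here)
    rbd map rbd/fiotest                   # returns a device, e.g. /dev/rbd0
    fio --name=krbd-test --filename=/dev/rbd0 --ioengine=libaio \
        --direct=1 --bs=4k --rw=randwrite --iodepth=32 --runtime=60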

Re: [ceph-users] [EXTERNAL] Upgrading 0.94.6 -> 0.94.9 saturating mon node networking

2016-09-22 Thread Will . Boege
Just went through this upgrading a ~400 OSD cluster. I was in the EXACT spot you were in. The faster you can get all OSDs to the same version as the MONs the better. We decided to power forward and the performance got better with every OSD node we patched. I also discovered your Le

Re: [ceph-users] [EXTERNAL] Re: jewel blocked requests

2016-09-19 Thread Will . Boege
Sorry, make that 'ceph tell osd.* version'. > On Sep 19, 2016, at 2:55 PM, WRIGHT, JON R (JON R) wrote: When you say client, we're actually doing everything through OpenStack VMs and cinder block devices. librbd and librados are: /usr/lib/librbd.so.1.0.0 /usr/lib/librados.

Re: [ceph-users] [EXTERNAL] Re: jewel blocked requests

2016-09-19 Thread Will . Boege
Do you still have OSDs that aren't upgraded? What does a 'ceph tell osd.* show'? > On Sep 19, 2016, at 2:55 PM, WRIGHT, JON R (JON R) wrote: When you say client, we're actually doing everything through OpenStack VMs and cinder block devices. librbd and librados are: /usr/li

Re: [ceph-users] [EXTERNAL] Re: Increase PG number

2016-09-18 Thread Will . Boege
How many PGs do you have - and how many are you increasing it to? Increasing PG counts can be disruptive if you are increasing by a large proportion of the initial count, because of all the PG peering involved. If you are doubling the amount of PGs it might be good to do it in stages to minimize p
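
Staged increases look something like the sketch below; the pool name and counts are illustrative only, and pgp_num has to follow pg_num before data actually moves:

    ceph osd pool get mypool pg_num        # check the current count (placeholder pool)
    ceph osd pool set mypool pg_num 1024   # step up in increments rather than all at once
    ceph osd pool set mypool pgp_num 1024  # then allow remapping to start
    # watch 'ceph -s' until health settles, then repeat for the next step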

[ceph-users] Keystone RADOSGW ACLs

2015-10-19 Thread Will . Boege
I'm working with some teams who would like not only to create ACLs within RADOSGW at the tenant level, but also to tailor ACLs to individual users within that tenant. After trial and error, I can only seem to get ACLs to stick at the tenant level using the Keystone tenant UUID. Is this expected beh
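
For context, a tenant-level grant of the kind described can be made from any S3 client; a sketch using s3cmd, with the bucket name and Keystone tenant UUID as placeholders:

    s3cmd setacl s3://mybucket --acl-grant=read:5f8a...   # placeholder tenant UUID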

Re: [ceph-users] Is there a way to configure a cluster_network for a running cluster?

2015-08-17 Thread Will . Boege
Bah, what Waldo said. Forgot the MONs don’t use the cluster net. Do what he said and you’ll be fine. On 8/17/15, 8:41 PM, "Will.Boege" wrote: >Thinking this through, pretty sure you would need to take your cluster offline to do this. I can’t think of a scenario where you could reliably keep quo

Re: [ceph-users] Is there a way to configure a cluster_network for a running cluster?

2015-08-17 Thread Will . Boege
Thinking this through, pretty sure you would need to take your cluster offline to do this. I can’t think of a scenario where you could reliably keep quorum as you swap your monitors to use the cluster network. On 8/10/15, 8:59 AM, "Daniel Marks" wrote: >Hi all, we just found out that our cep

Re: [ceph-users] slow requests going up and down

2015-07-14 Thread Will . Boege
In my experience I have seen something like this happen twice. The first time there were unclean PGs because Ceph was down to one replica of a PG; Ceph blocks IO to a PG when the number of replicas falls below the ‘min_size’ parameter. That will manifest as blocked ops. Second
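
To inspect or change the threshold being described, per pool (the pool name is a placeholder, and lowering min_size to 1 trades away exactly the protection discussed in this thread):

    ceph osd pool get rbd min_size    # show the current threshold
    ceph osd pool set rbd min_size 1  # unblocks IO on a single replica - risky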

Re: [ceph-users] slow requests going up and down

2015-07-13 Thread Will . Boege
Does 'ceph health detail' show anything about stale or unclean PGs, or are you just getting the blocked ops messages? On 7/13/15, 5:38 PM, "Deneau, Tom" wrote: >I have a cluster where over the weekend something happened, and successive calls to ceph health detail show things like below. What