Test complete. Civet still shows the same problem:
https://gist.github.com/spjmurray/88203f564389294b3774
"/admin/user?uid=admin" is fine
"/admin/user?quota&uid=admin"a-type=user" is not so good. Upgrade to
0.94.2 didn't solve the problem nor 9.0.2. Unless anyone knows anything
more I'll go a
Hi all,
I've read in the documentation that OSDs use around 512MB on a healthy
cluster (http://ceph.com/docs/master/start/hardware-recommendations/#ram).
Now, our OSDs are all using around 2GB of RAM while the cluster
is healthy.
PID USER PR NI VIRT RES SHR S %CPU %MEM
On Fri, Jul 17, 2015 at 1:13 PM, Kenneth Waegeman wrote:
> Hi all,
>
> I've read in the documentation that OSDs use around 512MB on a healthy
> cluster (http://ceph.com/docs/master/start/hardware-recommendations/#ram).
> Now, our OSDs are all using around 2GB of RAM while the cluster is
> healthy.
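For comparison on your own nodes, two quick ways to see per-OSD memory use (the heap command is only available when the OSDs are built against tcmalloc):

  ps -C ceph-osd -o pid,rss,vsz,args   # resident/virtual size per OSD process
  ceph tell osd.0 heap stats           # allocator view for one OSD (tcmalloc builds only)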
Hi Greg + list,
Sorry to reply to this old'ish thread, but today one of these PGs bit
us in the ass.
Running hammer 0.94.2, we are deleting pool 36, and OSDs 30, 171,
and 69 all crash when trying to delete pg 36.10d. They all crash with
"ENOTEMPTY suggests garbage data in osd data dir"
(ful
I think you'll need to use the ceph-objectstore-tool to remove the
PG/data consistently, but I've not done this — David or Sam will need
to chime in.
-Greg
On Fri, Jul 17, 2015 at 2:15 PM, Dan van der Ster wrote:
> Hi Greg + list,
>
> Sorry to reply to this old'ish thread, but today one of these
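For reference, a rough sketch of the ceph-objectstore-tool invocation Greg is hinting at (default paths and osd.30 are assumptions; only run it with the OSD stopped, and treat it as last-resort surgery):

  service ceph stop osd.30
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-30 \
      --journal-path /var/lib/ceph/osd/ceph-30/journal \
      --op remove --pgid 36.10d
  service ceph start osd.30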
Thanks for the quick reply.
We /could/ just wipe these OSDs and start from scratch (the only other
pools were 4+2 ec and recovery already brought us to 100%
active+clean).
But it'd be good to understand and prevent this kind of crash...
Cheers, Dan
On Fri, Jul 17, 2015 at 3:18 PM, Gregory Farnum wrote:
A bit of progress: rm'ing everything from inside current/36.10d_head/
actually let the OSD start and continue deleting other PGs.
Cheers, Dan
On Fri, Jul 17, 2015 at 3:26 PM, Dan van der Ster wrote:
> Thanks for the quick reply.
>
> We /could/ just wipe these OSDs and start from scratch (the onl
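In other words, roughly (a sketch of the workaround described above, with osd.30 and the default mount point as assumptions):

  service ceph stop osd.30
  rm -rf /var/lib/ceph/osd/ceph-30/current/36.10d_head/*
  service ceph start osd.30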
This is the same cluster I posted about back in April. Since then,
the situation has gotten significantly worse.
Here is what iostat looks like for the one active RBD image on this cluster:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
avgrq-sz avgqu-sz await r_await w_await
On 07/17/2015 08:38 AM, J David wrote:
This is the same cluster I posted about back in April. Since then,
the situation has gotten significantly worse.
Here is what iostat looks like for the one active RBD image on this cluster:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
What does "ceph status" say? I had a problem with similar symptoms some
months ago that was accompanied by OSDs getting marked out for no apparent
reason and the cluster going into a HEALTH_WARN state intermittently.
Ultimately the root of the problem ended up being a faulty NIC. Once I took
that o
On Fri, Jul 17, 2015 at 10:21 AM, Mark Nelson wrote:
> rados -p 30 bench write
>
> just to see how it handles 4MB object writes.
Here's that, from the VM host:
Total time run: 52.062639
Total writes made: 66
Write size: 4194304
Bandwidth (MB/sec): 5.071
Stddev Ban
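For anyone who wants to repeat the test, the benchmark being discussed is of roughly this form (pool name and runtime are placeholders):

  # roughly what is being run above: 30 seconds of 4MB object writes
  rados bench -p <pool> 30 write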
Hi all,
I'm trying to rebuild ceph deb packages using 'dpkg-buildpackage -nc'.
Without '-nc' the compilation works fine but obviously takes a long time.
When I add the '-nc' option, I end up with following issues:
> ..
> ./check_version ./.git_version
> ./.git_version is up to date.
> CXXL
On Fri, Jul 17, 2015 at 10:47 AM, Quentin Hartman wrote:
> What does "ceph status" say?
Usually it says everything is cool. However just now it gave this:
cluster e9c32e63-f3eb-4c25-b172-4815ed566ec7
health HEALTH_WARN 2 requests are blocked > 32 sec
monmap e3: 3 mons at
{f16=192.
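To narrow down where the blocked requests are sitting, something like the following usually helps (osd.12 is just an example id):

  ceph health detail                       # lists the OSDs with blocked requests
  ceph daemon osd.12 dump_ops_in_flight    # on that OSD's host: show the stuck ops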
That looks a lot like what I was seeing initially. The OSDs getting marked
out was relatively rare and it took a bit before I saw it. I ended up
digging into the logs on the OSDs themselves to discover that they were
getting marked out. The messages were like "So-and-so incorrectly marked us
out" I
On Fri, Jul 17, 2015 at 11:15 AM, Quentin Hartman wrote:
> That looks a lot like what I was seeing initially. The OSDs getting marked
> out was relatively rare and it took a bit before I saw it.
Our problem occurs "most of the time" and does not appear confined to a
specific ceph cluster node or OSD:
David - I'm new to Ceph myself, so can't point out any smoking guns - but
your problem "feels" like a network issue. I suggest you check all of
your OSD/Mon/Client network interfaces. Check for errors, check that
they are negotiating the same link speed/type with your switches (if you
have LLD
Disclaimer: I'm relatively new to ceph, and haven't moved into
production with it.
Did you run your bench for 30 seconds?
For reference, my bench from a VM bridged to a 10Gig card with 90x4TB
at 30 seconds is:
Total time run: 30.766596
Total writes made: 1979
Write size:
I would say use the admin socket to find out which part is causing most of
the latency; also, don't rule out disk anomalies.
Thanks & Regards
Somnath
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of J David
Sent: Friday, July 17, 2015 8:07 AM
To: Quent
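A sketch of the admin-socket digging Somnath is referring to (default socket path; osd.12 is just an example):

  ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok perf dump          # per-stage latency counters
  ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok dump_historic_ops  # slowest recent ops with per-step timings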
On 07/17/2015 09:55 AM, J David wrote:
On Fri, Jul 17, 2015 at 10:21 AM, Mark Nelson wrote:
rados -p 30 bench write
just to see how it handles 4MB object writes.
Here's that, from the VM host:
Total time run: 52.062639
Total writes made: 66
Write size: 4194304
On 7/16/15, 9:51 PM, "ceph-users on behalf of Goncalo Borges" wrote:
>Once I substituted the fqdn by simply the hostname (without the domain)
>it worked.
Goncalo,
I ran into the same problems too - and ended up bailing on the
"ceph-deploy" tools and manually building my clusters ... eventual
Thanks for your answers,
we will also experiment with osd recovery max active / threads and
will come back to you
Regards,
Kostis
On 16 July 2015 at 12:29, Jan Schermer wrote:
> For me setting recovery_delay_start helps during the OSD bootup _sometimes_,
> but it clearly does something differen
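For reference, those settings can be changed at runtime with injectargs; the values below are only examples, not recommendations:

  ceph tell osd.* injectargs \
      '--osd-recovery-max-active 1 --osd-recovery-threads 1 --osd-recovery-delay-start 10'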
Also, by running ceph osd perf, I see that fs_apply_latency is larger
than fs_commit_latency. Shouldn't that be the opposite? Apply latency
is AFAIK the time that it takes to apply updates to the file system
in page cache. Commitcycle latency is the time it takes to flush cache
on disks, right?
Yes, you will need to change 'osd' to 'host' in your CRUSH rule, as you
thought, so that copies are separated between hosts. You will keep running
into the problems you see until that is changed. It will cause data movement.
-
Robert LeBlanc
PGP Fingerprint 79A2 9CA4
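Concretely, the change Robert describes is the chooseleaf type in the CRUSH rule; a rough sketch of the edit cycle (file names are arbitrary):

  ceph osd getcrushmap -o cm.bin
  crushtool -d cm.bin -o cm.txt
  # in cm.txt, change:  step chooseleaf firstn 0 type osd
  #               to:   step chooseleaf firstn 0 type host
  crushtool -c cm.txt -o cm.new
  ceph osd setcrushmap -i cm.new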
On Fri, Jul 17, 2015 at 12:19 PM, Mark Nelson wrote:
> Maybe try some iperf tests between the different OSD nodes in your
> cluster and also the client to the OSDs.
This proved to be an excellent suggestion. One of these is not like the others:
f16 inbound: 6Gbps
f16 outbound: 6Gbps
f17 inbound
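(The tests were of the usual iperf form, roughly:)

  iperf -s            # on the node under test
  iperf -c f18 -t 10  # from each of the other nodes; swap roles to test the other direction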
Other than those errors, do you find RBDs will not be unmapped on
system restart/shutdown on a machine using systemd, leaving the system
hanging, with no network connections, while trying to unmap the RBDs?
That's been my experience thus far, so I wrote an (overly simple)
systemd file to handle this on a pe
Yes, the RBDs are not remapped at system boot time. I haven't run into a VM or
system hang because of this, since I ran into it as part of investigating the
use of RHEL 7.1 as a client distro. Yes, remapping the RBDs in a startup script
worked around the issue.
> -Original Message-
> From: Stev
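A minimal sketch of the kind of unit file mentioned above (unit name, ordering, and the naive unmap loop are all assumptions, not a tested recipe):

  # /etc/systemd/system/rbd-unmap.service
  [Unit]
  Description=Unmap RBD devices before the network goes away
  After=network-online.target

  [Service]
  Type=oneshot
  RemainAfterExit=yes
  ExecStart=/bin/true
  # at shutdown, ExecStop runs before network-online.target is torn down
  ExecStop=/bin/sh -c 'for d in /dev/rbd[0-9]*; do [ -b "$d" ] && rbd unmap "$d"; done'

  [Install]
  WantedBy=multi-user.target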
Glad we were able to point you in the right direction! I would suspect a
borderline cable at this point. Did you happen to notice if the interface
had negotiated down to some dumb speed? If it had, I've seen cases where a
dodgy cable has caused an intermittent problem that causes it to negotiate
th
On Fri, 17 Jul 2015, J David wrote:
f16 inbound: 6Gbps
f16 outbound: 6Gbps
f17 inbound: 6Gbps
f17 outbound: 6Gbps
f18 inbound: 6Gbps
f18 outbound: 1.2Mbps
Unless the network was very busy when you did this, I think that 6 Gb/s
may not be very good either. Usually iperf will give you much more
On 07/15/2015 11:48 AM, Shane Gibson wrote:
Somnath - thanks for the reply ...
:-) Haven't tried anything yet - just starting to gather
info/input/direction for this solution.
Looking at the S3 API info [2] - there is no mention of support for the
"S3a" API extensions - namely "rename" suppor
May I suggest also checking the error counters on your network switch?
Check speed and duplex. Is bonding in use? Is flow control on? Can you
swap the network cable? Can you swap a NIC with another node and does the
problem follow?
Hth, Alex
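On the host side, a couple of quick checks along the same lines (interface name is an assumption):

  ethtool eth0            # negotiated speed and duplex
  ip -s link show eth0    # RX/TX error and drop counters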
On Friday, July 17, 2015, Steve Thompson wrote:
>