We don't have thousands, but these RBDs are in a pool backed by ~600ish. I can see the fd count is up well past 10k, closer to 15k, when I use a decent number of RBDs (e.g. 16 or 32), and it seems to increase the larger the file I write. Procs are almost 30k when writing a 50GB file across that
numb
On 09/04/2015 02:31 AM, Wang, Warren wrote:
> In the minority on this one. We have a number of the big SM 72-drive units w/
> 40GbE. Definitely not as fast as even the 36-drive units, but it isn't awful
> for our average mixed workload. We can exceed all available performance with
> some worklo
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Gurvinder Singh
> Sent: 04 September 2015 08:57
> To: Wang, Warren ; Mark Nelson
> ; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] high density machines
>
> On 09/04/2015 02:31 AM, Wan
Hi!
Just one simple question: how can we see when a deep-scrub of an OSD completes, if we execute the 'ceph osd deep-scrub ' command?
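For example, is watching the cluster log, or a PG's last deep-scrub stamp, the intended way to check? Something like:
ceph -w | grep deep-scrub                          # 'deep-scrub starts' / 'deep-scrub ok' lines per PG
ceph pg <pgid> query | grep -i deep_scrub_stamp    # last_deep_scrub_stamp for a single PG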
Megov Igor
CIO, Yuterra
Good day everyone!
I'm having a problem using aws-java-sdk to connect to Ceph using radosgw. I
am reading a " NOTICE: failed to parse date for auth header" message in the
logs. HTTP_DATE is "Fri, 04 Sep 2015 09:25:33 +00:00", which is I think a
valid rfc 1123 date...
Here's a link to the related
Hi,
i've configured a CephFS and mouted this in fstab
ceph1:6789,ceph2:6789,ceph3:6789:/ /cephfs ceph
name=admin,secret=AQDVOOhVxEI7IBAAM+4el6WYbCwKvFxmW7ygcA==,noatime 0 2
Does this mean:
1. The Ceph client can write data to all three servers at the same time?
2. The client accesses the second
Hi all,
Not sure where this bug actually belongs - OpenStack or Ceph - but I'm writing here in the humble hope that someone else has faced this issue too.
I configured a test OpenStack instance with Glance images stored in Ceph 0.94.3. Nova has local storage.
But when I'm trying to launch an instance from a large
Cephfs can use fscache. I am testing it at the moment.
Some lines from my deployment process:
sudo apt-get install linux-generic-lts-utopic cachefilesd
sudo reboot
sudo mkdir /mnt/cephfs
sudo mkdir /mnt/ceph_cache
sudo mkfs -t xfs /dev/md3 # A 100gb local raid partition
sudo bash -c "echo /dev/md
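The rest, roughly (a sketch; the config file locations below are the Ubuntu defaults and may differ on other distros):
sudo mount /dev/md3 /mnt/ceph_cache
sudo sed -i 's|^dir .*|dir /mnt/ceph_cache|' /etc/cachefilesd.conf   # point cachefilesd at the cache partition
sudo sed -i 's|^#RUN=yes|RUN=yes|' /etc/default/cachefilesd          # enable the daemon
sudo service cachefilesd start
sudo mount -t ceph ceph1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret,fsc   # 'fsc' enables fscache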
I've just made the same change (4 and 40 for now) on my cluster, which is a similar size to yours. I didn't see any merging happening, although most of the directories I looked at had more files in them than the new merge threshold, so I guess this is to be expected.
I'm currently splitting my PG's f
Actually, just thinking about this some more, shouldn't the PGs-per-OSD "golden rule" also depend on the size of the OSD? If this directory splitting is a big deal, then an 8TB OSD is going to need a lot more PGs than, say, a 1TB OSD.
Any thoughts Mark?
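For anyone following the thread, these are the two settings in question, as a rough ceph.conf sketch (I'm assuming here that 40 is the merge threshold and 4 the split multiple):
[osd]
filestore merge threshold = 40
filestore split multiple = 4
# a subdirectory splits at roughly filestore_split_multiple * abs(filestore_merge_threshold) * 16 objects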
> -----Original Message-----
> From: ceph-u
English version :
Hello everyone,
Recently we increased the number of PGs in a pool. We had a big performance problem because the whole CEPH cluster dropped to 0 IOPS while production workloads were running on top of it.
So we did this:
ceph tell osd.* injectargs '--osd_max_backfills 1'
ceph tell osd.* injec
Sadly, this is one of those things that people find out after running their
first production Ceph cluster. Never run with the defaults. I know it's been
recently reduced to 3 and 1 or 1 and 3, I forget, but I would advocate 1 and 1.
Even that will cause a tremendous amount of traffic with any re
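For reference, the two knobs being discussed (example values only; they can be injected at runtime and also set in ceph.conf):
ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
# persistent equivalent:
[osd]
osd max backfills = 1
osd recovery max active = 1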
Just to take away a possible issue from infra (LBs etc).
Did you try to download the image on the compute node? Something like rbd
export?
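e.g. something along these lines (the pool and image names depend on your Glance configuration):
rbd -p images export <glance-image-uuid> /tmp/image-test.raw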
> On 04 Sep 2015, at 11:56, Vasiliy Angapov wrote:
>
> Hi all,
>
> Not sure where this bug actually belongs - OpenStack or Ceph -
> but I'm writing here
Didn't you run out of space? Happened to me when a customer tried to create a
1TB image...
Z.
> On 04 Sep 2015, at 15:15, Sebastien Han wrote:
>
> Just to take away a possible issue from infra (LBs etc).
> Did you try to download the image on the compute node? Something like rbd
> export?
>
Which Kernel are you running on?
These days, the theoretical limit is 65536 AFAIK.
Ilya would know the kernel needed for that.
> On 03 Sep 2015, at 15:05, Jeff Epstein wrote:
>
> Hello,
>
> In response to an rbd map command, we are getting a "Device or resource busy".
>
> $ rbd -p platform ma
There's a lot of factors that play into all of this. The more PGs you
have, the more total objects you can store before you hit the
thresholds. More PGs also means slightly better random distribution
across OSDs (Not really affected by the size of the OSD assuming all
OSDs are uniform). You
On Fri, Sep 4, 2015 at 4:30 PM, Sebastien Han wrote:
> Which Kernel are you running on?
> These days, the theoretical limit is 65536 AFAIK.
>
> Ilya would know the kernel needed for that.
3.14 or later, and, if you are loading your kernel modules by hand or
have your distro load them for you duri
On Fri, Sep 4, 2015 at 4:44 PM, Ilya Dryomov wrote:
> On Fri, Sep 4, 2015 at 4:30 PM, Sebastien Han wrote:
>> Which Kernel are you running on?
>> These days, the theoretical limit is 65536 AFAIK.
>>
>> Ilya would know the kernel needed for that.
>
> 3.14 or later, and, if you are loading your ker
> I have a coredump with the size of 1200M compressed .
>
> Where shall i put the dump ?
>
I believe you can use the ceph-post-file utility [1] to upload the core and
your current package list to ceph.com.
Jason
[1] http://ceph.com/docs/master/man/8/ceph-post-file/
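e.g. (usage per the man page above; the description and file path are just examples):
ceph-post-file -d 'rbd crash core dump plus package list' /path/to/core.xz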
Any particular reason why you have the image mounted via the kernel client
while performing a benchmark? Not to say this is the reason for the crash, but
strange since 'rbd bench-write' will not test the kernel IO speed since it uses
the user-mode library. Are you able to test bench-write with
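For example, something like (the image name is a placeholder and sizes are in bytes):
rbd -p platform bench-write testimage --io-size 4096 --io-threads 16 --io-total 1073741824 --io-pattern rand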
Mark could you please elaborate on this?
"use larger directory splitting thresholds to at least balance that part of the
equation out"
Thanks
Jan
> On 04 Sep 2015, at 15:31, Mark Nelson wrote:
>
> There's a lot of factors that play into all of this. The more PGs you have,
> the more total ob
Hi cephers,
I've the following scheme:
7x OSD servers with:
4x 800GB SSD Intel DC S3510 (OSD-SSD)
3x 120GB SSD Intel DC S3500 (Journals)
5x 3TB SAS disks (OSD-SAS)
The OSD servers are located on two separate Racks with two power circuits
each.
I would like to know what is the
On Fri, Sep 4, 2015 at 12:42 PM, Ramon Marco Navarro
wrote:
> Good day everyone!
>
> I'm having a problem using aws-java-sdk to connect to Ceph using radosgw. I
> am reading a " NOTICE: failed to parse date for auth header" message in the
> logs. HTTP_DATE is "Fri, 04 Sep 2015 09:25:33 +00:00", wh
Hi German,
Are the power feeds completely separate (i.e. 4 feeds in total), or does each rack just have both feeds? If it’s the latter, I don’t see any benefit from including this in the crushmap and would just create a “rack” bucket. Also, assuming your servers have dual PSUs, this also changes th
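Roughly something like this (bucket and host names are placeholders):
ceph osd crush add-bucket rack1 rack
ceph osd crush add-bucket rack2 rack
ceph osd crush move rack1 root=default
ceph osd crush move rack2 root=default
ceph osd crush move osd-node-01 rack=rack1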
Thanks a lot Nick. Regarding the power feeds, we only have two circuits for all the racks, so I'll create the "rack" bucket in the CRUSH map and separate the OSD servers across the rack buckets. Then, regarding the SSD pools, I've installed the hammer version and am wondering whether to upgrade to Infernalis v9.0.3
and
I wouldn't advise upgrading yet if this cluster is going into production. I think several people got bitten last time round when they upgraded to pre-hammer releases.
Here is a good example of how to create separate roots for SSDs and HDDs:
http://ceph.com/docs/master/rados/operations/crush-map/#placi
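The general shape of what that page describes, as a trimmed sketch (bucket names, IDs and weights below are examples only):
# excerpt from a decompiled crushmap
root ssd {
    id -20
    alg straw
    hash 0
    item node01-ssd weight 4.000
}
root sas {
    id -21
    alg straw
    hash 0
    item node01-sas weight 15.000
}
rule ssd {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take ssd
    step chooseleaf firstn 0 type host
    step emit
}
# then point the SSD pool at the rule: ceph osd pool set <ssd-pool> crush_ruleset 1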
On Thu, Sep 3, 2015 at 3:20 AM, Nicholas A. Bellinger
wrote:
> (RESENDING)
>
> On Wed, 2015-09-02 at 21:14 -0400, Alex Gorbachev wrote:
>> We have experienced a repeatable issue when performing the following:
>>
>> Ceph backend with no issues, we can repeat any time at will in lab and
>> production
Hi Quentin and Andrija,
Thanks so much for reporting the problems with Samsung.
Would it be possible to get to know your system configuration? What kind of workload are you running? You use the Samsung SSDs as separate journaling disks, right?
Thanks so much.
James
From: ceph-users [mailt
Trying to do a prepare on a osd with btrfs, and getting this error:
[cibosd04][INFO ] Running command: sudo ceph-disk -v prepare --cluster
ceph --fs-type btrfs -- /dev/sdc
[cibosd04][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd
--cluster=ceph --show-config-value=fsid
[cibosd04][WARNI
Chris,
I see that you have stack traces that indicate some OSDs are running
v0.94.2 (osd.23) and some running v0.94.3 (osd.30). They should be
running the same release except briefly while upgrading. I see some
snapshot/cache tiering fixes went into 0.94.3. So an OSD running
v0.94.2 when
Hi James,
I had 3 CEPH nodes as follows: 12 OSDs (HDD) and 2 SSDs per node (6 journal partitions on each SSD) - the SSDs just vanished with no warning, no smartctl errors, nothing... so 2 SSDs in each of 3 servers vanished within 2-3 weeks, after 3-4 months of being in production (VMs/KVM/CloudStack)
Mine were als
Any ideas?
ceph@cephdeploy01:~/ceph-ib$ ceph-deploy osd prepare --fs-type btrfs
cibosd04:sdc
[ceph_deploy.conf][DEBUG ] found configuration file at:
/home/ceph/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.28): /usr/bin/ceph-deploy osd
prepare --fs-type btrfs cibosd04:sdc
[ceph_deploy.cl
Yeah, I'm not seeing stuff being moved at all. Perhaps we should file a ticket to request a way to tell an OSD to rebalance its directory structure.
On Fri, Sep 4, 2015 at 5:08 AM, Nick Fisk wrote:
> I've just made the same change ( 4 and 40 for now) on my cluster which is a
> similar size to yo
Mine are also mostly 850 Pros. I have a few 840s, and a few 850 EVOs in
there just because I couldn't find 14 pros at the time we were ordering
hardware. I have 14 nodes, each with a single 128 or 120GB SSD that serves
as the boot drive and the journal for 3 OSDs. And similarly, mine just
started
Hi Andrija,
Thanks for your prompt response. Would it be possible to get to know your hardware configuration, including your server information? Secondly, is there any way to reproduce your workload with fio-rbd, rbd bench or rados bench?
“so 2 SSDs in 3 servers vanished in...2-3 week
Quentin,
try fio or dd with the O_DIRECT and D_SYNC flags, and you will see less than 1MB/s - that is common for most "home" drives - check the post below to understand.
We removed all Samsung 850 Pro 256GB drives from our new CEPH installation and replaced them with Intel S3500 (18.000 (4Kb) IOPS constant wri
Hi James,
yes, CEPH with CloudStack. All 6 SSDs (2 SSDs in each of 3 nodes) vanished in 2-3 weeks total time, and yes, brand new Samsung 850 Pro 128GB - I also checked the wear_level attribute via smartctl prior to all the drives dying - no indication wear_level is low or anything... also all other parameters
Yeah, we've ordered some S3700's since we can't afford to have these sorts
of failures and haven't been able to find any of the DC-rated Samsung
drives anywhere.
fwiw, we didn't have any performance problems with the samsungs, it's
exclusively this sudden failure that's making us look elsewhere.
Andrija,
In your email thread, "18.000 (4Kb) IOPS constant write speed" stands for 18K IOPS with a 4k block size, right? However, you could only achieve 200 IOPS with the Samsung 850 Pro, right?
Theoretically, the Samsung 850 Pro can get up to 100,000 IOPS with 4k random reads under certain workloads. It is a l
James,
there are simple fio tests or even a dd test on Linux which you can run to see how well an SSD will perform as a Ceph journal device (Ceph writes to the journal SSDs with the O_DIRECT and D_SYNC flags) - the Samsung 850 performs extremely badly here, as do many, many other vendors' drives (D_SYNC kills performance for them..
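The tests I mean are along these lines (careful: pointing them at a raw device destroys data, so use a spare disk):
dd if=/dev/zero of=/dev/sdX bs=4k count=100000 oflag=direct,dsync
fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test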
Oh, I forgot to mention, these drives have been in service for about 9
months.
If it's useful / interesting at all, here is the smartctl -a output from
one of the 840's I installed about the same time as the ones that failed
recently, but it has not yet failed:
smartctl 6.2 2013-07-26 r3841 [x86_
Hi Andrija,
Your feedback is greatly appreciated.
Regards,
James
From: Andrija Panic [mailto:andrija.pa...@gmail.com]
Sent: Friday, September 04, 2015 12:39 PM
To: James (Fei) Liu-SSI
Cc: Quentin Hartman; ceph-users
Subject: Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel
s37
I just went through and ran this on all my currently running SSDs:
echo "$(smartctl -a /dev/sda | grep Total_LBAs_Written | awk '{ print $NF
}') * 512 /1025/1024/1024/1024" | bc
which is showing about 32TB written on the oldest nodes, about 20 on the
newer ones, and 1 on the first one I've RMA'd
Hi Everyone,
We have a Ceph pool that is entirely made up of Intel S3700/S3710
enterprise SSD's.
We are seeing some significant I/O delays on the disks causing a “SCSI Task
Abort” from the OS. This seems to be triggered by the drive receiving a
“Synchronize cache command”.
My current thinking is
Hi,
I’m trying to add a second monitor using ‘ceph-deploy mon new ’. However, the log file shows the following error:
2015-09-04 16:13:54.863479 7f4cbc3f7700 0 cephx: verify_reply couldn't decrypt
with error: error decoding block for decryption
2015-09-04 16:13:54.863491 7f4cbc3f7700 0 -- :6789
>> We are seeing some significant I/O delays on the disks causing a “SCSI Task
>> Abort” from the OS. This seems to be triggered by the drive receiving a
>> “Synchronize cache command”.
>>
>>
How exactly do you know this is the cause? This is usually just an effect of
something going wrong a
Hi!
Have worked with Intel DC S3700 200GB. Due to budget restrictions, one SSD hosts a system volume and the OSD journals (1:12). 6 nodes, 120TB raw space. The cluster serves as RBD storage for ~100 VMs.
Not a single failure in a year - all devices are healthy.
The remaining resource (per SMART) is ~92%.
Hi Jan,
Thanks for your response.
> How exactly do you know this is the cause? This is usually just an effect
> of something going wrong and part of the error recovery process. Preceding
> this event should be the real error/root cause...
We have been working with LSI/Avago to resolve this. We ge
IIRC, it only triggers the move (merge or split) when that folder is hit by a
request, so most likely it happens gradually.
Another thing that might be helpful (and that we have had good experience with) is that we do the folder splitting at pool creation time, so that we avoid the performance impac
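Concretely, that means passing an expected object count when creating the pool so the directory tree is pre-split up front - a rough example (check 'ceph osd pool create' usage on your release for the exact argument order):
ceph osd pool create mypool 4096 4096 replicated replicated_ruleset 500000000   # last argument = expected_num_objects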
> IIRC, it only triggers the move (merge or split) when that folder is hit by a
> request, so most likely it happens gradually.
Do you know what causes this?
I would like to be clearer on what "gradually" means here.
Shinobu
- Original Message -
From: "GuangYang"
To: "Ben Hines" , "Nick Fisk"
Cc: "ce
> Date: Fri, 4 Sep 2015 20:31:59 -0400
> From: ski...@redhat.com
> To: yguan...@outlook.com
> CC: bhi...@gmail.com; n...@fisk.me.uk; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Ceph performance, empty vs part full
>
>> IIRC, it only triggers the mo
Very nice.
You're my hero!
Shinobu
- Original Message -
From: "GuangYang"
To: "Shinobu Kinjo"
Cc: "Ben Hines" , "Nick Fisk" , "ceph-users"
Sent: Saturday, September 5, 2015 9:40:06 AM
Subject: RE: [ceph-users] Ceph performance, empty vs part full
Hello,
On Fri, 4 Sep 2015 22:37:06 + Межов Игорь Александрович wrote:
> Hi!
>
>
> Have worked with Intel DC S3700 200Gb. Due to budget restrictions, one
>
> ssd hosts a system volume and 1:12 OSD journals. 6 nodes, 120Tb raw
> space.
>
Meaning you're limited to 360MB/s writes per node at
Hello,
On Fri, 4 Sep 2015 12:30:12 -0300 German Anders wrote:
> Hi cephers,
>
>I've the following scheme:
>
> 7x OSD servers with:
>
Is this a new cluster, total initial deployment?
What else are these nodes made of, CPU/RAM/network?
While uniform nodes have some appeal (interchangeabilit
Thanks for the response!
The free space on /var/lib/nova/instances is very large on every compute host.
Glance image-download works as expected.
2015-09-04 21:27 GMT+08:00 Jan Schermer :
> Didn't you run out of space? Happened to me when a customer tried to create a
> 1TB image...
>
> Z.
>
>> On 04