Re: [ceph-users] Intel 520/530 SSD for ceph
On Mon, Nov 18, 2013 at 02:38:42PM +0100, Stefan Priebe - Profihost AG wrote:
> Hi guys,
>
> in the past we've used Intel 520 SSDs for the ceph journal - this worked
> great and our experience was good.
>
> Now they have started to replace the 520 series with their new 530.
>
> When we switched, we were surprised by the ugly performance, and I needed
> some days to reproduce it.
>
> With O_DIRECT alone both work fine, and the Intel SSD 530 is even faster
> than the 520.
>
> With O_DSYNC... see the results:
>
> ~# dd if=randfile.gz of=/dev/sda bs=350k count=10000 oflag=direct,dsync
> 3584000000 bytes (3,6 GB) copied, 22,287 s, 161 MB/s
>
> ~# dd if=randfile.gz of=/dev/sdb bs=350k count=10000 oflag=direct,dsync
> 3584000000 bytes (3,6 GB) copied, 136,505 s, 26,3 MB/s
>
> I used a block size of 350k because my graphs show that this is the
> average workload we have on the journal. But I also tried fio, bigger
> block sizes, ... it stays the same.
>
> Does anybody have an idea? Without dsync both devices have around the
> same performance of 260 MB/s.
>
> Greets,
> Stefan

You may actually be doing O_SYNC - recent kernels implement O_DSYNC, but
glibc maps O_DSYNC onto O_SYNC. Since you're writing to the block device,
though, this won't matter much.

I believe the effect of O_DIRECT by itself is just to bypass the buffer
cache, which is not going to make much difference for your dd case. (It
will mainly affect other applications that are also using the buffer
cache...)

O_SYNC should cause each write to block until a response is received from
the disk. Without O_SYNC, the writes just queue operations and return -
potentially very fast. Your dd is probably writing enough data that the
system throttles it once it runs out of disk buffers and has to wait for
some previous data to reach the drive, but the delay for any individual
block is not likely to matter. With O_SYNC, you are measuring the delay
for each block directly, and you have completely removed the disk's
ability to do any sort of parallelism. [It's also conceivable the kernel
is sending some form of write barrier flag to the drive, which would slow
it down further, but at a quick glance I can't find any kernel logic that
does this.]

It sounds like the Intel 530 has a much larger per-block write latency,
but can make up for it by performing more overlapped operations. You
might be able to vary this behavior by experimenting with sdparm,
smartctl or other tools, or possibly with different microcode in the
drive.

					-Marcus Watts
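One quick way to see the latency-versus-parallelism difference is to compare
synchronous queue-depth-1 writes against overlapped asynchronous writes with
fio. A rough sketch, not a definitive benchmark - it assumes the 530 is
/dev/sdb and that the device holds nothing you care about, since fio will
overwrite it:

	# One write in flight, each waiting for the drive to acknowledge it:
	# this measures per-write latency, like the dd oflag=direct,dsync test.
	fio --name=sync-qd1 --filename=/dev/sdb --rw=write --bs=350k \
	    --direct=1 --sync=1 --ioengine=psync --runtime=30 --time_based

	# Sixteen writes in flight: this lets the drive overlap operations.
	fio --name=async-qd16 --filename=/dev/sdb --rw=write --bs=350k \
	    --direct=1 --ioengine=libaio --iodepth=16 --runtime=30 --time_based

If the 530 keeps up with the 520 in the second run but falls far behind in
the first, that supports the theory that its per-write latency, rather than
its raw throughput, is what the journal workload is exposing.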
Re: [ceph-users] Intel 520/530 SSD for ceph
On Tue, Nov 19, 2013 at 09:02:41AM +0100, Stefan Priebe wrote:
...
> > You might be able to vary this behavior by experimenting with sdparm,
> > smartctl or other tools, or possibly with different microcode in the drive.
> Which values or which settings do you think of?
...

Off-hand, I don't know. Probably the first thing would be to compare the
configuration of your 520 and 530; anything that differs is certainly
worth investigating.

This should display all pages:
	sdparm --all --long /dev/sdX

The 520 only appears to have 3 pages, which can be fetched directly with:
	sdparm --page=ca --long /dev/sdX
	sdparm --page=co --long /dev/sdX
	sdparm --page=rw --long /dev/sdX

The sample machine I'm looking at has an Intel 520, and on ours most
options show as 0, except for:
	AWRE    1  [cha: n, def:  1]  Automatic write reallocation enabled
	WCE     1  [cha: y, def:  1]  Write cache enable
	DRA     1  [cha: n, def:  1]  Disable read ahead
	GLTSD   1  [cha: n, def:  1]  Global logging target save disable
	BTP    -1  [cha: n, def: -1]  Busy timeout period (100us)
	ESTCT  30  [cha: n, def: 30]  Extended self test completion time (sec)
Perhaps that's an interesting data point to compare with yours.

Figuring out whether you have up-to-date Intel firmware appears to require
burning and running an ISO image from
https://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=18455

The output of "sdparm --page= --long /dev/sdc" shows the Intel firmware,
but this labels it better:
	smartctl -i /dev/sdc
Our 520 has firmware "400i" loaded.

					-Marcus Watts
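To make that comparison quick, the three pages can be diffed between the two
drives in one go. A minimal sketch, assuming the 520 is /dev/sda and the 530
is /dev/sdb (adjust the device names), and that you're running bash, which is
needed for the <(...) process substitution:

	for page in ca co rw; do
	    echo "=== page $page ==="
	    # diff prints only the fields where the two drives disagree
	    diff <(sdparm --page=$page --long /dev/sda) \
	         <(sdparm --page=$page --long /dev/sdb)
	done

Anything that shows up in the diff - the write-cache and read-ahead bits on
the caching (ca) page would be the obvious first suspects - is a candidate
for experimentation.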
Re: [ceph-users] running as non-root
On Sat, Dec 06, 2014 at 08:44:41PM +0000, Paulo Almeida wrote:
...
> You can also register uids with Debian. Quoting from the Policy
> Manual[1]:
>
>     The UID and GID numbers are divided into classes as follows:
>
>     0-99:
>
>         Globally allocated by the Debian project, the same on every
...

I think you will find it hard to get one of those globally allocated
numbers. You would probably have to argue that your id will show up on
almost all systems anyway - same as "bin", "root", "daemon", etc.

[Another problem with a single registered number: you probably want the
*same* number on Red Hat, Ubuntu, SUSE, etc. - surely you want to allow
people to have mixed Debian/Red Hat/Ubuntu/SUSE setups...]

The same policy page
(https://www.debian.org/doc/debian-policy/ch-opersys.html) goes on to say:

    100-999:

        Dynamically allocated system users and groups. Packages which
        need a user or group, but can have this user or group allocated
        dynamically and differently on each system, should use "adduser
        --system" to create the group and/or user. adduser will check
        for the existence of the user or group, and if necessary choose
        an unused id based on the ranges specified in adduser.conf.

I think this is what you really want to be using instead, and it's easy
to find good examples of how to do it in many other existing Debian
packages. It does mean the local "ceph" uid might be different on
different machines. I don't think that number is actually visible
outside of the local machine, so it shouldn't cause any real problems.

Well, except for ceph.conf. Probably the easiest fix for ceph.conf would
be to accept a username instead of a uid. The name is what you really
care about most - it means things you do with ssh between machines will
be coordinated.

A more complicated fix might be to split ceph.conf into a generic piece
that can be shared across all servers and clients, plus per-cluster and
per-machine pieces that hold anything the cluster needs to keep in
common and anything that doesn't need to be known outside of that
machine. uid numbers clearly fall into the last category. The main
software change needed for this is to allow an "/etc/ceph.d/" directory
whose contents get merged into the configuration at runtime.

					-Marcus Watts
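For reference, a minimal sketch of how a package maintainer script could
allocate such a user with "adduser --system"; the home directory and the
exact option choices here are assumptions for illustration, not necessarily
what the ceph packaging does:

	# Create a system group and user "ceph" if they don't exist yet.
	# adduser picks an unused id from the system range in adduser.conf,
	# so the number may differ per host, but the name is the same everywhere.
	if ! getent passwd ceph >/dev/null; then
	    adduser --system --group --home /var/lib/ceph --no-create-home ceph
	fi

Since only the name is guaranteed to match across machines, anything shared
between hosts (ceph.conf, ssh-driven deployment, chown in init scripts)
should refer to the user as "ceph" rather than by number.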
Re: [ceph-users] xfsprogs missing in rhel6 repository
On Fri, Dec 12, 2014 at 04:57:29PM +0000, Lukac, Erik wrote:
> Hi Guys,
>
> xfsprogs is missing in http://ceph.com/rpm-giant/rhel6/x86_64/
> Unfortunately it is not available in standard RHEL.
> Could you please add it as in firefly AND update repodata?
>
> Thanks in advance
>
> Erik

Um. Maybe I'm missing the point here, but if you want to run Red Hat and
enjoy all the license goodness thereof, shouldn't you be buying their
"Scalable File System Add-On" product so you can get the official
xfsprogs? Sure, it costs money - but you're running Red Hat precisely so
you can do that, right? [Granted, I don't really understand Red Hat's
pricing strategy here...]

And if you don't care about that, wouldn't it make more sense to just
run CentOS, where xfsprogs is simply part of the standard distribution?

					-Marcus Watts