Re: [ceph-users] Intel 520/530 SSD for ceph

2013-11-18 Thread mdw
On Mon, Nov 18, 2013 at 02:38:42PM +0100, Stefan Priebe - Profihost AG wrote:
> Hi guys,
> 
> in the past we've used intel 520 ssds for ceph journal - this worked
> great and our experience was good.
> 
> Now they started to replace the 520 series with their new 530.
> 
> When we did, we were surprised by the ugly performance, and it took me
> some days to reproduce it.
> 
> With O_DIRECT alone, both work fine, and the Intel SSD 530 is even faster
> than the 520.
> 
> With O_DSYNC... see the results:
> 
> ~# dd if=randfile.gz of=/dev/sda bs=350k count=10000 oflag=direct,dsync
> 3584000000 bytes (3,6 GB) copied, 22,287 s, 161 MB/s
> 
> ~# dd if=randfile.gz of=/dev/sdb bs=350k count=10000 oflag=direct,dsync
> 3584000000 bytes (3,6 GB) copied, 136,505 s, 26,3 MB/s
> 
> I used a blocksize of 350k, as my graphs show that this is the
> average workload we have on the journal. But I also tried using fio,
> bigger blocksizes, ... it stays the same.
> 
> Does anybody have an idea? Without dsync both devices have around the
> same performance of 260MB/s.
> 
> Greets,
> Stefan
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

You may actually be doing O_SYNC - recent kernels implement O_DSYNC,
but glibc maps O_DSYNC into O_SYNC.  But since you're writing to the
block device this won't matter much.
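
If you want to check which flag actually reaches the kernel, strace will
show the open() flags dd ends up using (a quick sketch; /dev/sdX is a
placeholder, and writing to it is destructive, so point it at a scratch
device):

# look for O_SYNC vs O_DSYNC in the open() flags dd passes
strace -e trace=open dd if=/dev/zero of=/dev/sdX bs=350k count=10 oflag=direct,dsync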

I believe the effect of O_DIRECT by itself is just to bypass the buffer
cache, which is not going to make much difference for your dd case.
(It will mainly affect other applications that are also using the
buffer cache...)

O_SYNC should be causing the writes to block until a response
is received from the disk.  Without O_SYNC, the writes will
just queue operations and return - potentially very fast.
Your dd is probably writing enough data that there is some
throttling by the system as it runs out of disk buffers and
has to wait for some previous data to be written to the drive,
but the delay for any individual block is not likely to matter.
With O_SYNC, you are measuring the delay for each block directly,
and you have absolutely removed the ability for the disk to
perform any sort of parallelism.
[It's also conceivable the kernel is sending some form of write
barrier flag to the drive, which will slow it down further,
but I can't find any kernel logic that does this at a quick glance.]
It sounds like the Intel 530 has a much larger per-block write latency,
but can make up for it by performing more overlapped operations.
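
If you want to separate the per-write latency from the drive's ability
to overlap work, a couple of fio runs along these lines may help (a
rough sketch, not a tuned benchmark; /dev/sdX is a placeholder and will
be overwritten):

# queue depth 1, O_SYNC - roughly what dd oflag=direct,dsync measures
fio --name=qd1 --filename=/dev/sdX --rw=write --bs=350k \
    --ioengine=sync --direct=1 --sync=1 --runtime=60 --time_based

# several writers in parallel - shows whether the drive can overlap synced writes
fio --name=par8 --filename=/dev/sdX --rw=write --bs=350k \
    --ioengine=sync --direct=1 --sync=1 --numjobs=8 --group_reporting \
    --runtime=60 --time_based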

You might be able to vary this behavior by experimenting with sdparm,
smartctl or other tools, or possibly with different microcode in the drive.

-Marcus Watts
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intel 520/530 SSD for ceph

2013-11-20 Thread mdw
On Tue, Nov 19, 2013 at 09:02:41AM +0100, Stefan Priebe wrote:
...
> >You might be able to vary this behavior by experimenting with sdparm,
> >smartctl or other tools, or possibly with different microcode in the drive.
> Which values or which settings do you think of?
...

Off-hand, I don't know.  Probably the first thing would be
to compare the configuration of your 520 & 530; anything that's
different is certainly worth investigating.

This should display all pages:
sdparm --all --long /dev/sdX
The 520 only appears to have 3 pages, which can be fetched directly with:
sdparm --page=ca --long /dev/sdX
sdparm --page=co --long /dev/sdX
sdparm --page=rw --long /dev/sdX
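
To spot differences quickly, you can also diff the two drives' mode
pages directly (a small sketch, assuming bash; substitute the actual
device names of your 520 and 530):

# compare all mode pages of the two drives side by side
diff <(sdparm --all --long /dev/sdX) <(sdparm --all --long /dev/sdY)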

The sample machine I'm looking at has an Intel 520, and on ours,
most options show as 0 except for:
  AWRE    1  [cha: n, def:  1]  Automatic write reallocation enabled
  WCE     1  [cha: y, def:  1]  Write cache enable
  DRA     1  [cha: n, def:  1]  Disable read ahead
  GLTSD   1  [cha: n, def:  1]  Global logging target save disable
  BTP    -1  [cha: n, def: -1]  Busy timeout period (100us)
  ESTCT  30  [cha: n, def: 30]  Extended self test completion time (sec)
Perhaps that's an interesting data point to compare with yours.

Figuring out if you have up-to-date intel firmware appears to require
burning and running an iso image from
https://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=18455

The results of sdparm --page= --long /dev/sdc
show the intel firmware, but this labels it better:
smartctl -i /dev/sdc
Our 520 has firmware "400i" loaded.

-Marcus Watts
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] running as non-root

2014-12-06 Thread mdw
On Sat, Dec 06, 2014 at 08:44:41PM +, Paulo Almeida wrote:
...
> You can also register uids with Debian. Quoting from the Policy
> Manual[1]:
> 
> The UID and GID numbers are divided into classes as follows:
> 
> 0-99:
> 
> Globally allocated by the Debian project, the same on every
...

I think you will find it hard to get one of those 99 numbers.  Probably
you would have to argue that your id will show up on almost all systems
anyways - same as "bin", "root", "daemon", etc.

[   Another problem with one number: you probably want the
*same* number on redhat, ubuntu, suse, etc. -- surely
you want to allow people to have mixed debian/redhat/ubuntu/suse
setups. . . ]

The same policy page (https://www.debian.org/doc/debian-policy/ch-opersys.html)
goes on to say:

100-999:

Dynamically allocated system users and groups. Packages which need a
user or group, but can have this user or group allocated dynamically
and differently on each system, should use adduser --system to create
the group and/or user. adduser will check for the existence of the
user or group, and if necessary choose an unused id based on the
ranges specified in adduser.conf.

I think this is what you really want to be using instead, and it's easy
to find good examples of how to do this in many other existing debian packages.
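
For example, a package postinst typically does something along these
lines (a rough sketch of the usual Debian pattern, not what any ceph
package actually ships; the home directory is illustrative):

# create a dynamically-allocated system user and group for the daemon
if ! getent passwd ceph >/dev/null; then
    adduser --system --quiet --group --home /var/lib/ceph ceph
fi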

It does mean the per-machine local "ceph" user might be different
on different machines.  I don't think that this number is actually visible
outside of the local machine, so it shouldn't cause any real problems.
Well, except for ceph.conf.

Probably the easiest fix for "ceph.conf" would be to accept a username instead
of a uid.  The name is what you really care the most about - it means things
you do with ssh between machines will be coordinated.

A more complicated fix might be to separate out "ceph.conf" into a generic
piece that can be shared across all servers and clients, and per-cluster
and per-machine pieces that can be used to contain anything the cluster
needs to keep in common, and anything that doesn't need to be known outside
of that machine.  uid#'s clearly fall into the last category.  The main
software thing that needs to happen for this is to allow for a "/etc/ceph.d/"
directory to contain configuration that gets merged at runtime.
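
A minimal sketch of how such a merge might look, assuming a purely
hypothetical /etc/ceph.d/ layout with shared, per-cluster and
per-machine fragments:

# assemble the runtime config from shared, cluster and machine pieces
cat /etc/ceph.d/00-common.conf \
    /etc/ceph.d/10-cluster.conf \
    /etc/ceph.d/90-local.conf > /run/ceph/ceph.conf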

-Marcus Watts
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] xfsprogs missing in rhel6 repository

2014-12-12 Thread mdw
On Fri, Dec 12, 2014 at 04:57:29PM +, Lukac, Erik wrote:
> Hi Guys,
> 
> xfsprogs is missing in http://ceph.com/rpm-giant/rhel6/x86_64/
> Unfortunately it is not available in standard rhel.
> Could you please add it as in firefly AND update repodata?
> 
> Thanks in advance
> 
> Erik

Um.  Maybe I'm missing the point here, but if you want to run
redhat and enjoy all the license goodness thereof, shouldn't you
be buying their "scalable filesystem add-on" product so you can
get the official "xfsprogs"?  Sure it costs money - but you are running
redhat precisely to do that, right?
[ Granted, I don't really understand redhat's pricing strategy here... ]

And if you don't care about that, wouldn't it then make
more sense to just run centos, where xfsprogs is just part
of the standard everything?

-Marcus Watts
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com