Re: [ceph-users] CephFS - Problems with the reported used space

2015-08-07 Thread Goncalo Borges
Hi All... I am still fighting with this issue. It may be something which is not properly implemented, and if that is the case, that is fine. I am still trying to understand what is the real space occupied by files in a /cephfs filesystem, reported for example by a df. Maybe I did not expla

Re: [ceph-users] CephFS - Problems with the reported used space

2015-08-07 Thread Yan, Zheng
On Fri, Aug 7, 2015 at 3:41 PM, Goncalo Borges wrote: > Hi All... > > I am still fighting with this issue. It may be something which is not > properly implemented, and if that is the case, that is fine. > > I am still trying to understand what is the real space occupied by files in > a /cephfs fil
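
For context, the two views being compared are easiest to see side by side; a minimal check, assuming the filesystem is mounted at /cephfs (mount point is an example):

    ceph df          # raw cluster usage: counts every replica on every OSD
    df -h /cephfs    # what the CephFS client reports for the mount

Replication (ceph df counts every copy), sparse files and object overhead all make these numbers diverge from a plain du over the files.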

Re: [ceph-users] Slow requests during ceph osd boot

2015-08-07 Thread Jan Schermer
You'd need to disable the udev rule as well as the initscript (probably somewhere in /lib/udev/) What I do when I'm restarting the server is: chmod -x /usr/bin/ceph-osd Jan > On 07 Aug 2015, at 05:11, Nathan O'Sullivan wrote: > > I'm seeing the same sort of issue. > > Any suggestions on how
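
For anyone needing the full version of this workaround, a sketch (the udev rule path is an assumption for a Hammer-era package layout; check your distribution):

    # stop udev from auto-activating OSDs on hotplug events
    mv /lib/udev/rules.d/95-ceph-osd.rules /root/95-ceph-osd.rules.disabled
    udevadm control --reload-rules

    # Jan's quick hack: make the binary non-executable for the duration
    chmod -x /usr/bin/ceph-osd
    # ... reboot / maintenance ...
    chmod +x /usr/bin/ceph-osd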

Re: [ceph-users] Direct IO tests on RBD device vary significantly

2015-08-07 Thread Jan Schermer
You're not really testing only an RBD device there - you're testing 1) the O_DIRECT implementation in the kernel version you have (they differ) - try different kernels in the guest, 2) the cache implementation in qemu (and possibly the virtio block driver) - if it's enabled, disable it for this test completely
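
To take the guest page cache and qemu cache out of the picture, a test along these lines helps (device name and sizes are examples; the dd variant is destructive to the device):

    # raw O_DIRECT writes straight to the block device
    dd if=/dev/zero of=/dev/vdb bs=4M count=256 oflag=direct

    # or with fio, bypassing the page cache
    fio --name=rbdtest --filename=/dev/vdb --rw=write --bs=4M \
        --direct=1 --ioengine=libaio --iodepth=16 --runtime=60 --time_based

Combined with cache=none on the virtio disk, this narrows the comparison down to the guest kernel's O_DIRECT path and the RBD backend itself.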

Re: [ceph-users] Warning regarding LTTng while checking status or restarting service

2015-08-07 Thread Jan Schermer
Well, you could explicitly export HOME=/root then, that should make it go away. I think it's normally only present in a login shell. Jan > On 06 Aug 2015, at 17:51, Josh Durgin wrote: > > On 08/06/2015 03:10 AM, Daleep Bais wrote: >> Hi, >> >> Whenever I restart or check the logs for OSD, MON,

[ceph-users] inconsistent pgs

2015-08-07 Thread Константин Сахинов
Hi! I have a large number of inconsistent pgs (229 of 656), and it's increasing every hour. I'm using ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3). For example, pg 3.d8: # ceph health detail | grep 3.d8 pg 3.d8 is active+clean+scrubbing+deep+inconsistent, acting [1,7] # grep 3.d8
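
For reference, the usual way to dig into a single inconsistent PG looks roughly like this, using pg 3.d8 from the report above (log path is the default one; run repair only after the underlying disks have been checked, since it copies the primary's version of the objects):

    ceph health detail | grep inconsistent
    ceph pg 3.d8 query                        # which OSDs hold it, last scrub state
    grep 3.d8 /var/log/ceph/ceph-osd.1.log    # scrub errors on the acting primary
    ceph pg repair 3.d8                       # last step, once the cause is understood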

[ceph-users] Re: inconsistent pgs

2015-08-07 Thread Межов Игорь Александрович
Hi! Do you have any disk errors in dmesg output? In our practice, every time the deep scrub found an inconsistent PG, we also found a disk error that was the reason. Sometimes it was media errors (bad sectors), one time a bad SATA cable, and we also had some raid/hba firmware issues. But in all case

Re: [ceph-users] inconsistent pgs

2015-08-07 Thread Константин Сахинов
No, dmesg is clean on both hosts of osd.1 (block0) and osd.7 (block2). There are only boot-time messages (listings below). If there are cable or SATA-controller issues, will they be shown in /var/log/dmesg? block0 dmesg: [5.296302] XFS (sdb1): Mounting V4 Filesystem [5.316487] XFS (sda1): Mo

[ceph-users] Different filesystems on OSD hosts at the same cluster

2015-08-07 Thread Межов Игорь Александрович
Hi! We do some performance tests on our small Hammer install: - Debian Jessie; - Ceph Hammer 0.94.2 self-built from sources (tcmalloc) - 1xE5-2670 + 128Gb RAM - 2 nodes shared with mons, system and mon DB are on separate SAS mirror; - 16 OSD on each node, SAS 10k; - 2 Intel DC S3700 200Gb SS

[ceph-users] Re: inconsistent pgs

2015-08-07 Thread Межов Игорь Александрович
Hi! >If there are cable or SATA-controller issues, will they be shown in >/var/log/dmesg? If they lead to read errors, they will be logged in dmesg. If they only cause SATA command retransmissions, they may not be logged in dmesg, but they should show up in the SMART attributes. And anyway, we face only som
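
A quick way to check for exactly that kind of silent retransmission is to read the SMART attributes directly (device name is an example):

    smartctl -a /dev/sda | egrep -i 'reallocated|pending|uncorrect|crc'
    # a rising UDMA_CRC_Error_Count usually points at cabling,
    # reallocated/pending sectors at the media itself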

Re: [ceph-users] inconsistent pgs

2015-08-07 Thread Константин Сахинов
"you use XFS on your OSDs?" This OSD was formatted in BTRFS as a whole block device /dev/sdc (with no partition table). Then I moved from BTRFS to XFS /dev/sdc1 (with partition table), because BTRFS was v-v-very slow. Maybe partprober sees some old signatures from first sectors of that disk... By

[ceph-users] Re: inconsistent pgs

2015-08-07 Thread Межов Игорь Александрович
Hi! When did the inconsistent PGs start to appear? Maybe after some event? A hang, node reboot, or after reconfiguration or changing parameters? Can you say what triggers such behaviour? And, BTW, what system/kernel do you use? Megov Igor CIO, Yuterra ___ ceph-u

Re: [ceph-users] inconsistent pgs

2015-08-07 Thread Константин Сахинов
It's hard to say now. I changed my 6 OSDs one-by-one from btrfs to xfs. During the repair process I added 2 more OSDs. I changed the crush map from a root-host-osd to a root-*chassis*-host-osd structure... There was SSD cache tiering set up when the first inconsistency showed up. Then I removed tiering to confirm t

Re: [ceph-users] inconsistent pgs

2015-08-07 Thread Константин Сахинов
When I changed the crush map from root-host-osd to root-chassis-host-osd, did I have to change the default ruleset? I didn't change it. It looks like this:

    rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
    }
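
Since the rule above still says 'chooseleaf ... type host', adding a chassis level to the hierarchy only changes placement if the rule is updated to match. A hedged sketch of that edit (expect data movement when the new map is injected):

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # in crushmap.txt, inside rule replicated_ruleset, change:
    #   step chooseleaf firstn 0 type host
    # to:
    #   step chooseleaf firstn 0 type chassis
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new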

[ceph-users] OSD are not seen as down when i stop node

2015-08-07 Thread Thomas Bernard
Hi, I recently added 5 nodes to my ceph cluster; each node stores 16 OSDs. My old nodes were on the Firefly release and I upgraded my cluster to Hammer. My problem is that when I stop a node or an OSD (with or without setting noout first) the OSDs are not seen as down. All OSDs are still up and I have manifold block

Re: [ceph-users] Different filesystems on OSD hosts at the same cluster

2015-08-07 Thread Udo Lembke
Hi, some time ago I switched all OSDs from XFS to ext4 (step by step). I had no issues during the mixed osd-format period (the process takes some weeks). And yes, for me ext4 also performs better (esp. the latencies). Udo On 07.08.2015 13:31, Межов Игорь Александрович wrote: > Hi! > > We do some perform

Re: [ceph-users] Different filesystems on OSD hosts at the same cluster

2015-08-07 Thread Burkhard Linke
Hi, On 08/07/2015 04:04 PM, Udo Lembke wrote: Hi, some time ago I switched all OSDs from XFS to ext4 (step by step). I had no issues during the mixed osd-format period (the process takes some weeks). And yes, for me ext4 also performs better (esp. the latencies). Just out of curiosity: Do you use a ext

Re: [ceph-users] Different filesystems on OSD hosts at the same cluster

2015-08-07 Thread Udo Lembke
Hi, I use the ext4-parameters like Christian Balzer wrote in one posting: osd mount options ext4 = "user_xattr,rw,noatime,nodiratime" osd_mkfs_options_ext4 = -J size=1024 -E lazy_itable_init=0,lazy_journal_init=0 The osd-journals are on SSD partitions (without a filesystem). IMHO ext4 doesn't support
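
For completeness, those settings live in the [osd] section of ceph.conf, roughly like this (values copied from the post above; treat them as a starting point, not a recommendation):

    [osd]
    osd mount options ext4 = "user_xattr,rw,noatime,nodiratime"
    osd mkfs options ext4 = -J size=1024 -E lazy_itable_init=0,lazy_journal_init=0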

Re: [ceph-users] Different filesystems on OSD hosts at the same cluster

2015-08-07 Thread Burkhard Linke
Hi, On 08/07/2015 04:30 PM, Udo Lembke wrote: Hi, I use the ext4-parameters like Christian Balzer wrote in one posting: osd mount options ext4 = "user_xattr,rw,noatime,nodiratime" osd_mkfs_options_ext4 = -J size=1024 -E lazy_itable_init=0,lazy_journal_init=0 Thx for the details. The osd-journ

Re: [ceph-users] Different filesystems on OSD hosts at the same cluster

2015-08-07 Thread Jan Schermer
ext4 does support external journal, and it is _FAST_. Btw I'm not sure noatime is the right option nowadays, for two reasons: 1) the default is "relatime", which has minimal impact on performance; 2) AFAIK some ceph features actually use atime (cache tiering, was it?) or at least so I gathered from som
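
For the record, an ext4 external journal (unrelated to the Ceph OSD journal) is set up roughly like this; device names are placeholders and the filesystem must be unmounted for the tune2fs variant:

    mke2fs -O journal_dev /dev/sdk1           # dedicate an SSD partition as a journal device
    mkfs.ext4 -J device=/dev/sdk1 /dev/sdb1   # new data filesystem pointing at it

    # or retrofit an existing filesystem:
    tune2fs -O ^has_journal /dev/sdb1
    tune2fs -j -J device=/dev/sdk1 /dev/sdb1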

[ceph-users] Re: Different filesystems on OSD hosts at the same cluster

2015-08-07 Thread Межов Игорь Александрович
Hi! >No, I was indeed talking about the ext4 journals, e.g. described here: ... >but the problem with the persistent device names is keeping me from trying it. So you assume a 3-way setup in Ceph: first drive for filesystem data, second drive for the filesystem journal and third drive for the ceph journal?

Re: [ceph-users] Re: Different filesystems on OSD hosts at the same cluster

2015-08-07 Thread Jan Schermer
An interesting benchmark would be to compare "Ceph SSD journal" + "ext4 on spinner" versus "Ceph without journal" + "ext4 on spinner with external SSD journal". I won't be surprised if the second outperformed the first - you are actually making the whole setup much simpler and Ceph is mostly CPU

Re: [ceph-users] Re: Different filesystems on OSD hosts at the same cluster

2015-08-07 Thread Udo Lembke
Hi, I also think it's much too complicated and the effort is out of all proportion; as Megov already wrote, the osd-journal on SSD handles the speed. But for persistent device names you can easily use partlabels and select the disk with something like /dev/disk/by-partlabel/ext4-journal-15 I d
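
Setting such a label on a GPT partition is a one-liner with parted, after which the stable path shows up under /dev/disk/by-partlabel (device and label names are examples, following Udo's scheme):

    parted /dev/sdk name 1 ext4-journal-15
    ls -l /dev/disk/by-partlabel/ext4-journal-15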

[ceph-users] Re: inconsistent pgs

2015-08-07 Thread Межов Игорь Александрович
Hi! I'm sorry, but I don't know how to help you. We moved OSDs from XFS to EXT4 on our test cluster (Hammer 0.94.2), removing OSDs one-by-one and re-adding them after reformatting to EXT4. This process is the usual one for Ceph (Add/Remove OSDs in the documentation) and took place without any data loss. We

[ceph-users] Re: Re: Different filesystems on OSD hosts at the same cluster

2015-08-07 Thread Межов Игорь Александрович
Hi! >An interesting benchmark would be to compare "Ceph SSD journal" + "ext4 on spinner" versus "Ceph without journal" + "ext4 on spinner with external SSD journal". >I won't be surprised if the second outperformed the first - you are actually making the whole setup much simpler and Ceph

Re: [ceph-users] Ceph 0.94 (and lower) performance on >1 hosts ??

2015-08-07 Thread SCHAER Frederic
From: Jake Young [mailto:jak3...@gmail.com] Sent: Wednesday, 29 July 2015 17:13 To: SCHAER Frederic Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Ceph 0.94 (and lower) performance on >1 hosts ?? On Tue, Jul 28, 2015 at 11:48 AM, SCHAER Frederic <frederic.sch...@cea.fr> wrote

[ceph-users] Is there a limit for object size in CephFS?

2015-08-07 Thread Hadi Montakhabi
Hello Cephers, I am benchmarking CephFS. In one of my experiments, I change the object size. I start from 64kb. Every time I do reads and writes with different block sizes. When I increase the object size to 64MB and the block size to 64MB, CephFS crashes (shown in the chart below). What I mean
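
For reference, the object size CephFS uses for new files can be set per directory through the layout xattrs, which is presumably how such an experiment is driven (mount point is an example; 67108864 = 64 MB):

    setfattr -n ceph.dir.layout.object_size -v 67108864 /mnt/cephfs/testdir
    getfattr -n ceph.file.layout /mnt/cephfs/testdir/somefile   # verify on a file created afterwards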

Re: [ceph-users] Re: inconsistent pgs

2015-08-07 Thread Jan Schermer
Did you copy the OSD objects between btrfs->xfs or did you remove the btrfs OSD and add a new XFS OSD? Jan > On 07 Aug 2015, at 17:06, Межов Игорь Александрович wrote: > > Hi! > > I'm sorry, but I don't know how to help you. We moved OSDs from XFS to EXT4 on our test cluster (Hammer 0.94

[ceph-users] OSD crashes when starting

2015-08-07 Thread Gerd Jakobovitsch
Dear all, I ran into an unrecoverable crash at one specific OSD, every time I try to restart it. It happened first at firefly 0.80.8; I updated to 0.80.10, but it continued to happen. Due to this failure, I have several PGs down+peering that won't recover even after marking the OSD out. Could som

Re: [ceph-users] Re: inconsistent pgs

2015-08-07 Thread Константин Сахинов
I removed the btrfs OSD as written in the docs, reformatted it to xfs, and then added it as a new OSD. Fri, 7 Aug 2015 at 19:00, Jan Schermer : > Did you copy the OSD objects between btrfs->xfs or did you remove the > btrfs OSD and add a new XFS OSD? > > Jan > > On 07 Aug 2015, at 17:06, Межов Игорь Алексан

[ceph-users] Flapping OSD's when scrubbing

2015-08-07 Thread Tuomas Juntunen
Hi We are experiencing an annoying problem where scrubs make OSDs flap down and cause the Ceph cluster to be unusable for a couple of minutes. Our cluster consists of three nodes connected with 40gbit infiniband using IPoIB, with 2x 6-core X5670 CPUs and 64GB of memory. Each node has 6 SSDs fo

Re: [ceph-users] Flapping OSD's when scrubbing

2015-08-07 Thread Quentin Hartman
That kind of behavior is usually caused by the OSDs getting busy enough that they aren't answering heartbeats in a timely fashion. It can also happen if you have any network flakiness and heartbeats are getting lost because of that. I think (I'm not positive though) that increasing your heartbeat
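
The knobs being referred to can be tried at runtime before persisting them; a hedged example (values are illustrative, the default heartbeat grace is 20 seconds):

    ceph tell osd.* injectargs '--osd_heartbeat_grace 40'

    # and in ceph.conf, [osd] section, to persist it and keep scrub load down:
    #   osd heartbeat grace = 40
    #   osd max scrubs = 1
    #   osd scrub load threshold = 0.5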

Re: [ceph-users] Flapping OSD's when scrubbing

2015-08-07 Thread Tuomas Juntunen
Thanks We'll play with the values a bit and see what happens. Br, Tuomas From: Quentin Hartman [mailto:qhart...@direwolfdigital.com] Sent: 7 August 2015 20:32 To: Tuomas Juntunen Cc: ceph-users Subject: Re: [ceph-users] Flapping OSD's when scrubbing That kind of behavior is usu

Re: [ceph-users] Flapping OSD's when scrubbing

2015-08-07 Thread Константин Сахинов
Hi! One time I faced such behavior on my home cluster. At the time my OSDs went down I noticed that the node was using swap despite having sufficient memory. Tuning /proc/sys/vm/swappiness to 0 helped to solve the problem. Fri, 7 Aug 2015 at 20:41, Tuomas Juntunen : > Thanks > > > > We'll play with the va
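
For the archives, that change is just a sysctl (the drop-in file name is an example):

    sysctl -w vm.swappiness=0
    echo 'vm.swappiness = 0' >> /etc/sysctl.d/90-ceph.conf   # make it persistent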

Re: [ceph-users] Different filesystems on OSD hosts at the same cluster

2015-08-07 Thread Udo Lembke
Hi Jan, thanks for the hint. I changed the mount-option from noatime to relatime and will remount all OSDs during weekend. Udo On 07.08.2015 16:37, Jan Schermer wrote: > ext4 does support external journal, and it is _FAST_ > > btw I'm not sure noatime is the right option nowadays for two reasons

[ceph-users] btrfs w/ centos 7.1

2015-08-07 Thread Ben Hines
Howdy, The Ceph docs still say btrfs is 'experimental' in one section, but say it's the long term ideal for ceph in the later section. Is this still accurate with Hammer? Is it mature enough on centos 7.1 for production use? (kernel is 3.10.0-229.7.2.el7.x86_64 ) thanks- -Ben _

Re: [ceph-users] btrfs w/ centos 7.1

2015-08-07 Thread Константин Сахинов
I've tested it on my home cluster: 8 OSDs (4 nodes with 2x4TB OSDs each, Celeron J1900 and 8GB RAM) + 4 cache-tier OSDs (2 nodes with 2x250GB SSD OSDs each, Atom D2500 and 4GB RAM). The HDD OSDs worked v-v-very slowly. And the SSD OSDs sometimes stopped working because btrfs couldn't rebalance quickly enough and ov

Re: [ceph-users] btrfs w/ centos 7.1

2015-08-07 Thread Quentin Hartman
I would say probably not. btrfs (or, "worse FS" as we call it around my office) still does weird stuff from time to time, especially in low-memory conditions. This is based on testing we did on Ubuntu 14.04, running kernel 3.16.something. I long for the day that btrfs realizes its promise, but I

Re: [ceph-users] btrfs w/ centos 7.1

2015-08-07 Thread Jan Schermer
The answer to this, as well as life, universe and everything, is simple: ZFS. :) > On 07 Aug 2015, at 22:24, Quentin Hartman > wrote: > > I would say probably not. btrfs (or, "worse FS" as we call it around my > office) still does weird stuff from time to time, especially in low-memory > con

Re: [ceph-users] Flapping OSD's when scrubbing

2015-08-07 Thread Tuomas Juntunen
Hi Thanks, we were able to resolve the problem by disabling swap completely; no need for it anyway. Also, memory was fragmenting since all memory was used for caching. Running “perf top” we saw that freeing blocks of memory took all the CPU power. Samples: 4M of event 'cycles', Event

[ceph-users] optimizing non-ssd journals

2015-08-07 Thread Ben Hines
Our cluster is primarily used for RGW, but we would like to use it for RBD eventually... We don't have SSDs under our journals (for a while yet) and we're still updating our cluster to 10GbE. I do see some pretty high commit and apply latencies in 'osd perf', often 100-500 ms, which figure is a result of t
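
Those latency figures come from the commands below; watching them per OSD while RGW is under load helps separate a few slow disks from a cluster-wide journal bottleneck (socket path is the default, adjust the OSD id):

    ceph osd perf                                                 # commit/apply latency per OSD, in ms
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump   # much more detail for a single OSD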

Re: [ceph-users] btrfs w/ centos 7.1

2015-08-07 Thread Lionel Bouton
On 07/08/2015 22:05, Ben Hines wrote: > Howdy, > > The Ceph docs still say btrfs is 'experimental' in one section, but > say it's the long term ideal for ceph in the later section. Is this > still accurate with Hammer? Is it mature enough on centos 7.1 for > production use? > > (kernel is 3.10.

Re: [ceph-users] Flapping OSD's when scrubbing

2015-08-07 Thread Somnath Roy
Yes, if you dig through older mails, you will see I reported that as an Ubuntu kernel bug (not sure about other Linux flavors).. vm.min_free_kbytes is the way to work around that.. From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Tuomas Juntunen Sent: Friday, August 07, 20
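
That workaround is also just a sysctl; a hedged example (the value is illustrative - it reserves memory so the kernel doesn't stall freeing page cache under pressure):

    sysctl -w vm.min_free_kbytes=1048576                             # reserve ~1 GB, value is an example
    echo 'vm.min_free_kbytes = 1048576' >> /etc/sysctl.d/90-ceph.conf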

Re: [ceph-users] btrfs w/ centos 7.1

2015-08-07 Thread Shinobu Kinjo
Hello, Ceph is not the problem. The problem is that btrfs is still not production-ready. There are many testing lines in the source code. But it's really up to you which filesystem you use. Each filesystem has unique features, so you have to consider them to get the best performance from one of them. Meaning that th

Re: [ceph-users] btrfs w/ centos 7.1

2015-08-07 Thread Ross Annetts
Hi Ben, Red Hat (which CentOS is based on) has included btrfs in RHEL7 as a Technology Preview. This basically means that they are happy for you to use it at your own risk. I have spoken with an engineer there and they said they would basically try to support any related issues using it that