Hi All...
I am still fighting with this issue. It may be something which is not
properly implemented, and if that is the case, that is fine.
I am still trying to understand the real space occupied by files
in a /cephfs filesystem, as reported for example by df.
Maybe I did not expla
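For reference (not from the original mail), comparing the cluster-side and client-side views usually starts with something like this; the mount point is an assumption:

# raw and per-pool usage as the cluster accounts for it
ceph df
# what the CephFS client reports for the mount
df -h /cephfs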
On Fri, Aug 7, 2015 at 3:41 PM, Goncalo Borges
wrote:
> Hi All...
>
> I am still fighting with this issue. It may be something which is not
> properly implemented, and if that is the case, that is fine.
>
> I am still trying to understand what is the real space occupied by files in
> a /cephfs fil
You'd need to disable the udev rule as well as the initscript (probably
somewhere in /lib/udev/)
What I do when I'm restarting the server is:
chmod -x /usr/bin/ceph-osd
Jan
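A sketch of the same idea (not from the original mail; the rule file location is an assumption and depends on the packaging):

# find the ceph udev rules shipped by the package
ls /lib/udev/rules.d/ | grep -i ceph
# temporarily make the OSD binary non-executable so nothing can start it
chmod -x /usr/bin/ceph-osd
# ...reboot / do the maintenance...
# re-enable afterwards
chmod +x /usr/bin/ceph-osd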
> On 07 Aug 2015, at 05:11, Nathan O'Sullivan wrote:
>
> I'm seeing the same sort of issue.
>
> Any suggestions on how
You're not really testing only an RBD device there - you're testing:
1) the O_DIRECT implementation in the kernel version you have (they differ)
- try different kernels in the guest
2) the cache implementation in qemu (and possibly the virtio block driver) - if it's enabled
- disable it completely for this test
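A rough way to test that from inside the guest (not from the original mail; the device name is an assumption, and the test overwrites it):

# O_DIRECT write test straight against the attached RBD-backed disk
# WARNING: destroys data on /dev/vdb
fio --name=odirect-test --filename=/dev/vdb --rw=write --bs=4k \
    --ioengine=libaio --direct=1 --iodepth=32 --runtime=60 --time_based
# and attach the disk with cache=none in qemu/libvirt while testing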
Well, you could explicitly export HOME=/root then; that should make it go away.
I think it's normally only present in a login shell.
Jan
> On 06 Aug 2015, at 17:51, Josh Durgin wrote:
>
> On 08/06/2015 03:10 AM, Daleep Bais wrote:
>> Hi,
>>
>> Whenever I restart or check the logs for OSD, MON,
Hi!
I have a large number of inconsistent PGs, 229 of 656, and it's increasing
every hour.
I'm using ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3).
For example, pg 3.d8:
# ceph health detail | grep 3.d8
pg 3.d8 is active+clean+scrubbing+deep+inconsistent, acting [1,7]
# grep 3.d8
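For reference (not from the original mail), the usual way to dig into a single inconsistent PG looks roughly like this; the OSD id and log path are assumptions:

# re-run the deep scrub and look for the ERR lines on the primary
ceph pg deep-scrub 3.d8
grep ERR /var/log/ceph/ceph-osd.1.log
# once the bad copy is identified, let the primary repair it
ceph pg repair 3.d8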
Hi!
Do you have any disk errors in the dmesg output? In our experience, every time a deep
scrub found an inconsistent PG, we also found a disk error that was the reason.
Sometimes it was media errors (bad sectors), once a bad SATA cable, and
we also had some RAID/HBA firmware issues. But in all case
No, dmesg is clean on both hosts of osd.1 (block0) and osd.7 (block2).
There are only boot time messages (listings below).
If there are cable or SATA-controller issues, will they be shown in
/var/log/dmesg?
block0 dmesg:
[5.296302] XFS (sdb1): Mounting V4 Filesystem
[5.316487] XFS (sda1): Mounting V4 Filesystem
Hi!
We are running some performance tests on our small Hammer install:
- Debian Jessie;
- Ceph Hammer 0.94.2 self-built from sources (tcmalloc)
- 1xE5-2670 + 128Gb RAM
- 2 nodes shared with mons, system and mon DB are on separate SAS mirror;
- 16 OSD on each node, SAS 10k;
- 2 Intel DC S3700 200Gb SS
Hi!
>If there is cable or SATA-controller issues, will it be shown in
>/var/log/dmesg?
If it leads to read errors, it will be logged in dmesg. If it causes only
SATA command retransmissions, it may not be logged in dmesg,
but it should show up in the SMART attributes.
And anyway, we face only som
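A quick way to check that from the OS side (a sketch; device names are assumptions):

# overall health plus attributes such as Reallocated_Sector_Ct and UDMA_CRC_Error_Count
smartctl -a /dev/sda
# the drive's own error log
smartctl -l error /dev/sda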
"you use XFS on your OSDs?"
This OSD was formatted in BTRFS as a whole block device /dev/sdc (with no
partition table). Then I moved from BTRFS to XFS /dev/sdc1 (with partition
table), because BTRFS was v-v-very slow. Maybe partprober sees some old
signatures from first sectors of that disk...
By
Hi!
When did the inconsistent PGs start to appear? Maybe after some event -
a hang, a node reboot, a reconfiguration or changed parameters?
Can you say what triggers such behaviour? And, BTW, what system/kernel
do you use?
Megov Igor
CIO, Yuterra
It's hard to say now. I changed my 6 OSDs one by one from btrfs to xfs.
During the repair process I added 2 more OSDs. I changed the crush map from
a root-host-osd to a root-*chassis*-host-osd structure... SSD cache
tiering was set up when the first inconsistency showed up. Then I removed the tiering to
confirm t
When I changed the crush map from root-host-osd to root-chassis-host-osd, did I have
to change the default ruleset? I didn't change it. It looks like this:
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
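For reference (not from the original mail): the rule above still separates replicas by host, so adding the chassis level to the hierarchy does not by itself require a rule change. Only if replicas should be spread across chassis would the chooseleaf step change, roughly to:

step chooseleaf firstn 0 type chassis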
п
Hi,
I recently added 5 nodes to my ceph cluster; each node stores 16 OSDs.
My old nodes were on the firefly release and I upgraded my cluster to hammer.
My problem is that when I stop a node or an OSD (with or without setting noout
first), the OSDs are not seen as down. All OSDs are still up and I have
manifold block
Hi,
some time ago I switched all OSDs from XFS to ext4 (step by step).
I had no issues during the mixed osd-format period (the process took some weeks).
And yes, for me ext4 also performs better (esp. the latencies).
Udo
On 07.08.2015 13:31, Межов Игорь Александрович wrote:
> Hi!
>
> We do some perform
Hi,
On 08/07/2015 04:04 PM, Udo Lembke wrote:
> Hi,
> some time ago I switched all OSDs from XFS to ext4 (step by step).
> I had no issues during mixed osd-format (the process takes some weeks).
> And yes, for me ext4 performs also better (esp. the latencies).
Just out of curiosity:
Do you use a ext
Hi,
I use the ext4 parameters as Christian Balzer wrote in one posting:
osd mount options ext4 = "user_xattr,rw,noatime,nodiratime"
osd_mkfs_options_ext4 = -J size=1024 -E lazy_itable_init=0,lazy_journal_init=0
The osd journals are on SSD partitions (without a filesystem). IMHO ext4 doesn't
support
Hi,
On 08/07/2015 04:30 PM, Udo Lembke wrote:
> Hi,
> I use the ext4-parameters like Christian Balzer wrote in one posting:
> osd mount options ext4 = "user_xattr,rw,noatime,nodiratime"
> osd_mkfs_options_ext4 = -J size=1024 -E lazy_itable_init=0,lazy_journal_init=0
Thx for the details.
> The osd-journ
ext4 does support an external journal, and it is _FAST_.
btw, I'm not sure noatime is the right option nowadays, for two reasons:
1) the default is "relatime" which has minimal impact on performance
2) AFAIK some ceph features actually use atime (cache tiering was it?) or at
least so I gathered from som
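For completeness (not from the original mail), creating an ext4 filesystem with an external journal goes roughly like this; device names are assumptions:

# turn a small SSD partition into a dedicated ext4 journal device
mke2fs -O journal_dev /dev/sdX1
# create the data filesystem on the spinner, pointing it at that journal
mkfs.ext4 -J device=/dev/sdX1 /dev/sdY1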
Hi!
>No, I was indeed talking about the ext4 journals, e.g. described here:
...
>but the problem with the persistent device names is keeping me from trying it.
So you assume a 3-way setup in Ceph: first drive for filesystem data, second
drive for the filesystem journal, and third drive for the Ceph journal?
An interesting benchmark would be to compare "Ceph SSD journal" + "ext4 on
spinner" versus "Ceph without journal" + "ext4 on spinner with external SSD
journal".
I won't be surprised if the second outperforms the first - you are actually
making the whole setup much simpler and Ceph is mostly CPU
Hi,
I also think it's much too complicated and the effort is out of proportion; as
Megov already wrote, the osd journal
on SSD handles the speed.
But for the persistent device names you can easily use a partlabel and select the
disk with something like
/dev/disk/by-partlabel/ext4-journal-15
I d
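Setting such a label on a GPT partition is quick (a sketch, not from the original mail; partition number, label and device are assumptions):

# name GPT partition 15 so udev exposes it under a stable path
sgdisk --change-name=15:ext4-journal-15 /dev/sdX
ls -l /dev/disk/by-partlabel/ext4-journal-15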
Hi!
I'm sorry, but I don't know how to help you. We moved OSDs from XFS to EXT4 on
our test
cluster (Hammer 0.94.2), removing OSDs one by one and re-adding them after
reformatting
to EXT4. This is the usual process for Ceph (Add/Remove OSDs in the documentation) and
took
place without any data loss. We
Hi!
>An interesting benchmark would be to compare "Ceph SSD journal" + "ext4 on
>spinner" >versus "Ceph without journal" + "ext4 on spinner with external SSD
>journal".
>I won't be surprised if the second outperformed the first - you are actually
>making
>the whole setup much simpler and Ceph
From: Jake Young [mailto:jak3...@gmail.com]
Sent: Wednesday, 29 July 2015 17:13
To: SCHAER Frederic
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph 0.94 (and lower) performance on >1 hosts ??
On Tue, Jul 28, 2015 at 11:48 AM, SCHAER Frederic
<frederic.sch...@cea.fr> wrote
Hello Cephers,
I am benchmarking CephFS. In one of my experiments, I change the object
size.
I start from 64KB. Each time I do reads and writes with different block sizes.
When I increase the object size to 64MB and the block size to
64MB, CephFS crashes (shown in the chart below). What I mean
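For reference (not from the original mail), the object size for newly created files can be set per directory via the layout xattrs, roughly like this; the path is an assumption:

# 64MB objects for new files created under this directory
setfattr -n ceph.dir.layout.object_size -v 67108864 /mnt/cephfs/testdir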
Did you copy the OSD objects between btrfs->xfs or did you remove the btrfs OSD
and add a new XFS OSD?
Jan
> On 07 Aug 2015, at 17:06, Межов Игорь Александрович wrote:
>
> Hi!
>
> I'm sorry, but I dont know, how to help you. We move OSDs from XFS to EXT4 on
> our test
> cluster (Hammer 0.94
Dear all,
I keep hitting an unrecoverable crash on one specific OSD every time I try to
restart it. It happened first on firefly 0.80.8; I updated to 0.80.10,
but it continued to happen.
Due to this failure, I have several PGs down+peering that won't recover
even after marking the OSD out.
Could som
I removed the btrfs OSD as written in the docs, reformatted it to xfs, and then
added it as a new OSD.
Fri, 7 Aug 2015 at 19:00, Jan Schermer:
> Did you copy the OSD objects between btrfs->xfs or did you remove the
> btrfs OSD and add a new XFS OSD?
>
> Jan
>
> On 07 Aug 2015, at 17:06, Межов Игорь Алексан
Hi
We are experiencing an annoying problem where scrubs make OSDs flap down
and cause the Ceph cluster to be unusable for a couple of minutes.
Our cluster consists of three nodes connected with 40Gbit InfiniBand using
IPoIB, with 2x 6-core X5670 CPUs and 64GB of memory.
Each node has 6 SSDs fo
That kind of behavior is usually caused by the OSDs getting busy enough
that they aren't answering heartbeats in a timely fashion. It can also
happen if you have any network flakiness and heartbeats are getting lost
because of that.
I think (I'm not positive though) that increasing your heartbeat
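If the setting meant here is the heartbeat grace period, the knobs live in ceph.conf on the OSD nodes; the values below are only illustrative, not a recommendation:

[osd]
# how long peers wait for a heartbeat reply before reporting the OSD down (default 20s)
osd heartbeat grace = 60
# how often OSDs ping their peers (default 6s)
osd heartbeat interval = 6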
Thanks
We play with the values a bit and see what happens.
Br,
Tuomas
From: Quentin Hartman [mailto:qhart...@direwolfdigital.com]
Sent: 7 August 2015 20:32
To: Tuomas Juntunen
Cc: ceph-users
Subject: Re: [ceph-users] Flapping OSD's when scrubbing
That kind of behavior is usu
Hi!
I once faced such behaviour on my home cluster. At the time my OSDs went
down, I noticed that the node was using swap despite having sufficient memory. Tuning
/proc/sys/vm/swappiness to 0 helped to solve the problem.
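In case it helps, that tuning is just (a sketch; on some distributions the persistent setting goes in a file under /etc/sysctl.d/ instead):

# apply immediately
sysctl -w vm.swappiness=0
# make it persistent across reboots
echo 'vm.swappiness = 0' >> /etc/sysctl.conf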
Fri, 7 Aug 2015 at 20:41, Tuomas Juntunen:
> Thanks
>
>
>
> We play with the va
Hi Jan,
thanks for the hint.
I changed the mount option from noatime to relatime and will remount all
OSDs during the weekend.
Udo
On 07.08.2015 16:37, Jan Schermer wrote:
> ext4 does support external journal, and it is _FAST_
>
> btw I'm not sure noatime is the right option nowadays for two reasons
Howdy,
The Ceph docs still say btrfs is 'experimental' in one section, but
say it's the long-term ideal for Ceph in a later section. Is this
still accurate with Hammer? Is it mature enough on CentOS 7.1 for
production use?
(kernel is 3.10.0-229.7.2.el7.x86_64 )
thanks-
-Ben
I've tested it on my home cluster: 8 OSDs (4 nodes with 2x4TB OSDs each,
Celeron J1900 and 8GB RAM) + 4 cache-tier OSDs (2 nodes with 2x250GB SSD OSDs each,
Atom D2500 and 4GB RAM).
The HDD OSDs worked v-v-very slowly. And the SSD OSDs sometimes stopped working
because btrfs couldn't rebalance quickly enough and ov
I would say probably not. btrfs (or, "worse FS" as we call it around my
office) still does weird stuff from time to time, especially in low-memory
conditions. This is based on testing we did on Ubuntu 14.04, running kernel
3.16.something.
I long for the day that btrfs realizes its promise, but I
The answer to this, as well as life, universe and everything, is simple:
ZFS.
:)
> On 07 Aug 2015, at 22:24, Quentin Hartman
> wrote:
>
> I would say probably not. btrfs (or, "worse FS" as we call it around my
> office) still does weird stuff from time to time, especially in low-memory
> con
Hi
Thanks, we were able to resolve the problem by disabling swap completely; no
need for it anyway.
Also, memory was fragmenting since all memory was used for caching.
Running "perf top" we saw that freeing blocks of memory took all the CPU power:
Samples: 4M of event 'cycles', Event
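For reference (not from the original mail), disabling swap completely is roughly:

# turn off all swap devices now
swapoff -a
# comment out the swap lines so it stays off after reboot (check /etc/fstab first)
sed -i.bak '/\sswap\s/s/^/#/' /etc/fstab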
Our cluster is primarily used for RGW, but we would like to use it for RBD
eventually...
We don't have SSDs for our journals (for a while yet) and we're still
updating our cluster to 10GbE.
I do see some pretty high commit and apply latencies in 'osd perf',
often 100-500 ms, which figure is a result of t
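For reference, those figures come from the per-OSD latency dump (a minimal example):

# fs_commit_latency and fs_apply_latency per OSD, in milliseconds
ceph osd perf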
On 07/08/2015 22:05, Ben Hines wrote:
> Howdy,
>
> The Ceph docs still say btrfs is 'experimental' in one section, but
> say it's the long term ideal for ceph in the later section. Is this
> still accurate with Hammer? Is it mature enough on centos 7.1 for
> production use?
>
> (kernel is 3.10.
Yes, if you dig through older mails, you will see I reported that as an Ubuntu
kernel bug (not sure about other Linux flavors)... vm.min_free_kbytes is the
way to work around that.
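A hedged example of that workaround (the value is an assumption; size it to the node's RAM):

# keep more memory free so allocations under pressure don't stall
sysctl -w vm.min_free_kbytes=524288
echo 'vm.min_free_kbytes = 524288' >> /etc/sysctl.conf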
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Tuomas
Juntunen
Sent: Friday, August 07, 20
Hello,
Ceph is not the problem. The problem is that btrfs is still not production-ready.
There are many testing lines in the source code.
But it's really up to you which filesystem you use.
Each filesystem has unique features, so you have to consider
them to get the best performance from one of them.
Meaning that th
Hi Ben,
RedHat (which CentOS is based on) has included btrfs in RHEL7 as a
Technology Preview. This basically means that they are happy for you to
use it at your own risk. I have spoken with an engineer there and they
said they would basically try to support any related issues using it
that