[ceph-users] Re: CephFS: What is the maximum number of files per directory

2020-06-24 Thread Darren Soothill
So here is the answer from the docs

https://docs.ceph.com/docs/master/cephfs/app-best-practices/

There isn't a physical limit, but you may well hit operational issues with some 
tools.

Best practice would be to create some sort of directory tree and not just dump 
all the files you have into a single directory.
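
For illustration only, a minimal sketch (paths and the two-character md5 prefix are arbitrary choices, not from this thread) of fanning files out into subdirectories instead of one flat directory:

```
# Sketch: spread files across up to 256 subdirectories keyed on an md5 prefix.
# /staging and /cephfs/data are placeholder paths.
for f in /staging/*; do
    sub=$(basename "$f" | md5sum | cut -c1-2)
    mkdir -p "/cephfs/data/$sub"
    mv "$f" "/cephfs/data/$sub/"
done
```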

Darren



Sent from my iPhone

On 24 Jun 2020, at 07:57, Martin Palma  wrote:

Hi, What is the maximum number of files per directory? I couldn't find
the answer in the docs.

Best,
Martin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to remove one of two filesystems

2020-06-24 Thread Frank Schilder
Hi Francois,

I have seen reports of poor performance from Nautilus onwards and you might be 
hit by this. This might require a ticket. There is a hypothesis that a 
regression occurred that affects the cluster's ability to run background 
operations properly.

What you observe should not happen and I didn't see any of this on mimic when 
removing a 120TB file system.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Francois Legrand 
Sent: 24 June 2020 00:25:03
To: Patrick Donnelly; Frank Schilder
Cc: ceph-users
Subject: Re: [ceph-users] Re: How to remove one of two filesystems

Thanks a lot. It works.
I could delete the filesystem and remove the pools (data and metadata).
But now I am facing another problem: the removal of the pools seems to take an 
incredibly long time to free the space (the pool I deleted was about 
100TB, and in 36h I got back only 10TB). In the meantime, the cluster is 
extremely slow (an rbd export takes ~30 min for a 9 GB image and writing 10MB in 
cephfs takes half a minute!), which makes the cluster almost unusable.
It seems that the removal of deleted PGs is done by deep-scrubs, according to 
https://medium.com/opsops/a-very-slow-pool-removal-7089e4ac8301
But I couldn't find a way to speed up the process or to get the cluster back to 
a decent reactivity.
Do you have a suggestion?
F.


On 22/06/2020 at 16:40, Patrick Donnelly wrote:

On Mon, Jun 22, 2020 at 7:29 AM Frank Schilder wrote:



Use

ceph fs set <fs_name> down true

After this, all MDSes of fs <fs_name> will become standbys. Now you can cleanly 
remove everything.

Wait for the fs to be shown as down in ceph status; the command above is 
non-blocking, but the shutdown takes a long time. Try to disconnect all clients 
first.



If you're planning to delete the file system, it is faster to just do:

ceph fs fail <fs_name>

which will remove all the MDS daemons and mark the file system as not joinable.
See also: 
https://docs.ceph.com/docs/master/cephfs/administration/#taking-the-cluster-down-rapidly-for-deletion-or-disaster-recovery
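
Put together, the removal sequence discussed here looks roughly like the following sketch (fs and pool names are placeholders, not taken from this thread):

```
ceph fs fail <fs_name>                          # stop all MDS daemons, mark the fs not joinable
ceph fs rm <fs_name> --yes-i-really-mean-it     # remove the file system
ceph osd pool rm <data_pool> <data_pool> --yes-i-really-really-mean-it
ceph osd pool rm <metadata_pool> <metadata_pool> --yes-i-really-really-mean-it
```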



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Stefan Priebe - Profihost AG
Hi Ben,

Yes, we have the same issues and switched to Seagate for those reasons.

You can fix at least a big part of it by disabling the write cache of
those drives - generally speaking, it seems the Toshiba firmware is broken.

I was not able to find a newer one.

Greets,
Stefan

On 24.06.20 at 09:43, Benoît Knecht wrote:
> Hi,
> 
> We have a Nautilus (14.2.9) Ceph cluster with two types of HDDs:
> 
> - TOSHIBA MG07ACA14TE   [1]
> - HGST HUH721212ALE604  [2]
> 
> They're all bluestore OSDs with no separate DB+WAL and part of the same pool.
> 
> We noticed that while the HGST OSDs have a commit latency of about 15ms, the 
> Toshiba OSDs hover around 150ms (these values come from the 
> `ceph_osd_commit_latency_ms` metric in Prometheus).
> 
> On paper, it seems like those drives have very similar specs, so it's not 
> clear to me why we're seeing such a large difference when it comes to commit 
> latency.
> 
> Has anyone had any experience with those Toshiba drives? Or looking at the 
> specs, do you spot anything suspicious?
> 
> And if you're running a Ceph cluster with various disk brands/models, have 
> you ever noticed some of them standing out when looking at 
> `ceph_osd_commit_latency_ms`?
> 
> Thanks in advance for your feedback.
> 
> Cheers,
> 
> --
> Ben
> 
> [1]: 
> https://toshiba.semicon-storage.com/content/dam/toshiba-ss/asia-pacific/docs/product/storage/product-manual/eHDD-MG07ACA-Product-Manual.pdf
> [2]: 
> https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-dc-hc500-series/data-sheet-ultrastar-dc-hc520.pdf
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Benoît Knecht
Hi,

We have a Nautilus (14.2.9) Ceph cluster with two types of HDDs:

- TOSHIBA MG07ACA14TE   [1]
- HGST HUH721212ALE604  [2]

They're all bluestore OSDs with no separate DB+WAL and part of the same pool.

We noticed that while the HGST OSDs have a commit latency of about 15ms, the 
Toshiba OSDs hover around 150ms (these values come from the 
`ceph_osd_commit_latency_ms` metric in Prometheus).

On paper, it seems like those drives have very similar specs, so it's not clear 
to me why we're seeing such a large difference when it comes to commit latency.

Has anyone had any experience with those Toshiba drives? Or looking at the 
specs, do you spot anything suspicious?

And if you're running a Ceph cluster with various disk brands/models, have you 
ever noticed some of them standing out when looking at 
`ceph_osd_commit_latency_ms`?

Thanks in advance for your feedback.

Cheers,

--
Ben

[1]: 
https://toshiba.semicon-storage.com/content/dam/toshiba-ss/asia-pacific/docs/product/storage/product-manual/eHDD-MG07ACA-Product-Manual.pdf
[2]: 
https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-dc-hc500-series/data-sheet-ultrastar-dc-hc520.pdf
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Removing pool in nautilus is incredibly slow

2020-06-24 Thread Francois Legrand

Hello,
I am running ceph nautilus 14.2.8
I had to remove 2 pools (old cephfs data and metadata pool with 1024 pgs).
The removal of the pools seems to take an incredibly long time to free the 
space (the data pool I deleted was more than 100 TB and in 36h I got 
back only 10TB). In the meantime, the cluster is extremely slow (an rbd 
export takes ~1h30 for a 32 GB image and writing 10MB in cephfs 
takes half a minute!), which makes the cluster almost unusable.
It seems that the removal of deleted PGs is done by deep-scrubs, according 
to https://medium.com/opsops/a-very-slow-pool-removal-7089e4ac8301
It has also been reported that this could be a regression in Nautilus: 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/W4M5XQRDBLXFGJGDYZALG6TQ4QBVGGAJ/#W4M5XQRDBLXFGJGDYZALG6TQ4QBVGGAJ



But I couldn't find a fix or a way to speed up (or slow down) the process 
and get the cluster back to a decent reactivity.

Is there a way ?
Thanks
F.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to ceph-volume on remote hosts?

2020-06-24 Thread Sebastian Wagner


On 24.06.20 at 05:15, steven prothero wrote:
> Hello,
> 
> I am new to CEPH and on a few test servers attempting to setup and
> learn a test ceph system.
> 
> I started off the install with the "Cephadm" option and it uses podman
> containers.
> Followed steps here:
> https://docs.ceph.com/docs/master/cephadm/install/
> 
> I ran the bootstrap, added remote hosts, added monitors and everything
> is looking good.
> 
> Now I would like to add OSDs...
> 
> On the bootstrapped server i did a :
> 
> ceph-volume lvm prepare   --data /dev/sda6
>and then the "activate" and "ceph orch daemon add osd (etc)" to add
> it and it works...
> 
> But now I am ready to add OSDs on the remote nodes. I am not able to
> find documentation or examples on how to do :
> 
>   ceph-volume lvm prepare & activate steps on the remote hosts.
> 
> How do we prepare & activate the remote hosts disks?

ceph orch apply osd

as described in
https://docs.ceph.com/docs/master/cephadm/install/#deploy-osds

should do the trick. In case it doesn't, what's the output of

ceph orch device ls

?
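
For example (host and device names below are placeholders), once the devices show up as available you could either let cephadm consume everything or add devices explicitly:

```
# consume every available, unused device on all managed hosts
ceph orch apply osd --all-available-devices

# or add one device on one remote host
ceph orch daemon add osd node2:/dev/sdb
```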

> 
> Thank you very much for your input,
> 
> Cheers
> Steve
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 

-- 
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg). Geschäftsführer: Felix Imendörffer



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Mark Nelson
This isn't the first time I've seen drive cache cause problematic 
latency issues, and not always from the same manufacturer. Unfortunately 
it seems like you really have to test the drives you want to use before 
deploying them to make sure you don't run into issues.



Mark


On 6/24/20 6:36 AM, Stefan Priebe - Profihost AG wrote:

HI Ben,

yes we have the same issues and switched to seagate for those reasons.

you can fix at least a big part of it by disabling the write cache of
those drives - generally speaking it seems the toshiba firmware is broken.

I was not able to find a newer one.

Greets,
Stefan

On 24.06.20 at 09:43, Benoît Knecht wrote:

Hi,

We have a Nautilus (14.2.9) Ceph cluster with two types of HDDs:

- TOSHIBA MG07ACA14TE   [1]
- HGST HUH721212ALE604  [2]

They're all bluestore OSDs with no separate DB+WAL and part of the same pool.

We noticed that while the HGST OSDs have a commit latency of about 15ms, the 
Toshiba OSDs hover around 150ms (these values come from the 
`ceph_osd_commit_latency_ms` metric in Prometheus).

On paper, it seems like those drives have very similar specs, so it's not clear 
to me why we're seeing such a large difference when it comes to commit latency.

Has anyone had any experience with those Toshiba drives? Or looking at the 
specs, do you spot anything suspicious?

And if you're running a Ceph cluster with various disk brands/models, have you 
ever noticed some of them standing out when looking at 
`ceph_osd_commit_latency_ms`?

Thanks in advance for your feedback.

Cheers,

--
Ben

[1]: 
https://toshiba.semicon-storage.com/content/dam/toshiba-ss/asia-pacific/docs/product/storage/product-manual/eHDD-MG07ACA-Product-Manual.pdf
[2]: 
https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-dc-hc500-series/data-sheet-ultrastar-dc-hc520.pdf
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Igor Fedotov

Benoit, wondering what are the write cache settings in your case?

And do you see any difference after disabling it if any?


Thanks,

Igor


On 6/24/2020 3:16 PM, Mark Nelson wrote:
This isn't the first time I've seen drive cache cause problematic 
latency issues, and not always from the same manufacturer. 
Unfortunately it seems like you really have to test the drives you 
want to use before deploying them them to make sure you don't run into 
issues.



Mark


On 6/24/20 6:36 AM, Stefan Priebe - Profihost AG wrote:

HI Ben,

yes we have the same issues and switched to seagate for those reasons.

you can fix at least a big part of it by disabling the write cache of
those drives - generally speaking it seems the toshiba firmware is 
broken.


I was not able to find a newer one.

Greets,
Stefan

On 24.06.20 at 09:43, Benoît Knecht wrote:

Hi,

We have a Nautilus (14.2.9) Ceph cluster with two types of HDDs:

- TOSHIBA MG07ACA14TE   [1]
- HGST HUH721212ALE604  [2]

They're all bluestore OSDs with no separate DB+WAL and part of the 
same pool.


We noticed that while the HGST OSDs have a commit latency of about 
15ms, the Toshiba OSDs hover around 150ms (these values come from 
the `ceph_osd_commit_latency_ms` metric in Prometheus).


On paper, it seems like those drives have very similar specs, so 
it's not clear to me why we're seeing such a large difference when 
it comes to commit latency.


Has anyone had any experience with those Toshiba drives? Or looking 
at the specs, do you spot anything suspicious?


And if you're running a Ceph cluster with various disk 
brands/models, have you ever noticed some of them standing out when 
looking at `ceph_osd_commit_latency_ms`?


Thanks in advance for your feedback.

Cheers,

--
Ben

[1]: 
https://toshiba.semicon-storage.com/content/dam/toshiba-ss/asia-pacific/docs/product/storage/product-manual/eHDD-MG07ACA-Product-Manual.pdf
[2]: 
https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-dc-hc500-series/data-sheet-ultrastar-dc-hc520.pdf

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to remove one of two filesystems

2020-06-24 Thread Frank Schilder
Here is a thread that seems most relevant: 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/W4M5XQRDBLXFGJGDYZALG6TQ4QBVGGAJ/#W4M5XQRDBLXFGJGDYZALG6TQ4QBVGGAJ

I do not see this issue on mimic, but it seems to be a problem from nautilus 
onwards.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: 24 June 2020 09:34:24
To: Patrick Donnelly; f...@lpnhe.in2p3.fr
Cc: ceph-users
Subject: [ceph-users] Re: How to remove one of two filesystems

Hi Francois,

I have seen reports of poor performance from Nautilus onwards and you might be 
hit by this. This might require a ticket. There is a hypothesis that a 
regression occurred that affects the cluster's ability to run background 
operations properly.

What you observe should not happen and I didn't see any of this on mimic when 
removing a 120TB file system.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Francois Legrand 
Sent: 24 June 2020 00:25:03
To: Patrick Donnelly; Frank Schilder
Cc: ceph-users
Subject: Re: [ceph-users] Re: How to remove one of two filesystems

Thanks a lot. It works.
I could delete the filesystem and remove the pools (data and metadata).
But now I am facing another problem which is that the removal of the pools 
seems to take a incredible time to free the space (the pool I deleted was about 
100TB and in 36h I got back only 10TB). In the meantime, the cluster is 
extremely slow (a rbd extract takes ~30 mn for a 9 GB image and writing 10MB in 
cephfs takes half a minute !!) which makes the cluster almost unusable.
It seems that the removal of deleted pg is done by deep-scrubs according to  
https://medium.com/opsops/a-very-slow-pool-removal-7089e4ac8301
But I couldn't find a way to speedup the process or to get back the cluster to 
a decent reactivity ?
Do you have a suggestion ?
F.


On 22/06/2020 at 16:40, Patrick Donnelly wrote:

On Mon, Jun 22, 2020 at 7:29 AM Frank Schilder 
 wrote:



Use

ceph fs set  down true

after this all mdses of fs fs_name will become standbys. Now you can cleanly 
remove everything.

Wait for the fs to be shown as down in ceph status, the command above is 
non-blocking but the shutdown takes a long time. Try to disconnect all clients 
first.



If you're planning to delete the file system, it is faster to just do:

ceph fs fail 

which will remove all the MDS and mark the cluster as not joinable.
See also: 
https://docs.ceph.com/docs/master/cephfs/administration/#taking-the-cluster-down-rapidly-for-deletion-or-disaster-recovery



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Igor Fedotov

Benoit, thanks for the update.

for the sake of completeness one more experiment please if possible:

turn off write cache for HGST drives and measure commit latency once again.


Kind regards,

Igor

On 6/24/2020 3:53 PM, Benoît Knecht wrote:

Thank you all for your answers, this was really helpful!

Stefan Priebe wrote:

yes we have the same issues and switched to seagate for those reasons.
you can fix at least a big part of it by disabling the write cache of
those drives - generally speaking it seems the toshiba firmware is
broken.
I was not able to find a newer one.

Good to know that we're not alone :) I also looked for a newer firmware, to no
avail.

Igor Fedotov wrote:

Benoit, wondering what are the write cache settings in your case?

And do you see any difference after disabling it if any?

Write cache is enabled on all our OSDs (including the HGST drives that don't
have a latency issue).

To see if disabling write cache on the Toshiba drives would help, I turned it
off on all 12 drives in one of our OSD nodes:

```
for disk in /dev/sd{a..l}; do hdparm -W0 $disk; done
```

and left it on in the remaining nodes. I used `rados bench write` to create
some load on the cluster, and looked at

```
avg by (hostname) (ceph_osd_commit_latency_ms * on (ceph_daemon) group_left 
(hostname) ceph_osd_metadata)
```

in Prometheus. The hosts with write cache _enabled_ had a commit latency around
145ms, while the host with write cache _disabled_ had a commit latency around
25ms. So it definitely helps!

Mark Nelson wrote:

This isn't the first time I've seen drive cache cause problematic
latency issues, and not always from the same manufacturer.
Unfortunately it seems like you really have to test the drives you
want to use before deploying them them to make sure you don't run into
issues.

That's very true! Data sheets and even public benchmarks can be quite
deceiving, and two hard drives that seem to have similar performance profiles
can perform very differently within a Ceph cluster. Lesson learned.

Cheers,

--
Ben

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to ceph-volume on remote hosts?

2020-06-24 Thread Simon Sutter
Hello,

If you do it like Sebastian told you, you will automatically deploy OSDs.
For a beginner I would recommend doing it semi-automated, so you know a bit 
more about what's going on.

So first run "ceph orch device ls", which should print every disk on all 
nodes.
Then I recommend zapping the devices you want to use first, even if it says you can 
use them:
ceph orch device zap {node} /dev/sd? --force

And then you can add devices one by one:
ceph orch daemon add osd {node}:/dev/sd?

Just to top it off.

Regards,
Simon

-----Original Message-----
From: Sebastian Wagner [mailto:swag...@suse.com] 
Sent: Wednesday, 24 June 2020 14:06
To: steven prothero ; ceph-users@ceph.io
Subject: [ceph-users] Re: How to ceph-volume on remote hosts?



On 24.06.20 at 05:15, steven prothero wrote:
> Hello,
> 
> I am new to CEPH and on a few test servers attempting to setup and 
> learn a test ceph system.
> 
> I started off the install with the "Cephadm" option and it uses podman 
> containers.
> Followed steps here:
> https://docs.ceph.com/docs/master/cephadm/install/
> 
> I ran the bootstrap, added remote hosts, added monitors and everything 
> is looking good.
> 
> Now I would like to add OSDs...
> 
> On the bootstrapped server i did a :
> 
> ceph-volume lvm prepare   --data /dev/sda6
>and then the "activate" and "ceph orch daemon add osd (etc)" to add 
> it and it works...
> 
> But now I am ready to add OSDs on the remote nodes. I am not able to 
> find documentation or examples on how to do :
> 
>   ceph-volume lvm prepare & activate steps on the remote hosts.
> 
> How do we prepare & activate the remote hosts disks?

ceph orch apply osd

as described in
https://docs.ceph.com/docs/master/cephadm/install/#deploy-osds

should do the trick. In case it doesn't, what's the output of

ceph orch device ls

?

> 
> Thank you very much for your input,
> 
> Cheers
> Steve
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
> email to ceph-users-le...@ceph.io
> 

--
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany 
(HRB 36809, AG Nürnberg). Geschäftsführer: Felix Imendörffer

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Frank R
FYI, there is an interesting note on disabling the write cache here:

https://yourcmc.ru/wiki/index.php?title=Ceph_performance&mobileaction=toggle_view_desktop#Drive_cache_is_slowing_you_down

On Wed, Jun 24, 2020 at 9:45 AM Benoît Knecht  wrote:
>
> Hi Igor,
>
> Igor Fedotov wrote:
> > for the sake of completeness one more experiment please if possible:
> >
> > turn off write cache for HGST drives and measure commit latency once again.
>
> I just did the same experiment with HGST drives, and disabling the write cache
> on those drives brought the latency down from about 7.5ms to about 4ms.
>
> So it seems disabling the write cache across the board would be advisable in
> our case. Is it recommended in general, or specifically when the DB+WAL is on
> the same hard drive?
>
> Stefan, Mark, are you disabling the write cache on your HDDs by default?
>
> Cheers,
>
> --
> Ben
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Feedback of the used configuration

2020-06-24 Thread Simon Sutter
Hello,

After two months of the "ceph trial and error game", I finally managed to get an 
Octopus cluster up and running.
The unconventional thing about it is that it's just for hot backups, no virtual 
machines on there.
All the nodes are without any caching SSDs, just plain HDDs.
At the moment there are eight of them with a total of 50TB. We are planning to 
go up to 25 nodes and bigger disks, so we end up at 300TB-400TB.

I decided to go with CephFS, because I don't have any experience with things like 
S3, and I need to read the same file system from more than one client.

I made one CephFS with a replicated pool.
On top of that, I added erasure-coded pools to save some storage.
To add those pools, I used the setfattr command like this:
setfattr -n ceph.dir.layout.pool -v ec_data_server1 /cephfs/nfs/server1
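
For reference, the rough sequence this implies is sketched below (the fs name "cephfs" and the PG count are assumptions, not taken from this mail):

```
ceph osd pool create ec_data_server1 64 64 erasure          # PG counts are only an example
ceph osd pool set ec_data_server1 allow_ec_overwrites true  # needed for EC data pools on CephFS
ceph fs add_data_pool cephfs ec_data_server1
setfattr -n ceph.dir.layout.pool -v ec_data_server1 /cephfs/nfs/server1
```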

Some of our servers cannot use CephFS (old kernels, special OSes), so I have to 
use NFS.
This is set up with the included NFS Ganesha.
The /cephfs/nfs folder is exported, and clients can mount folders below this.

There are two final questions:

-  Was it right to go with the way of "mounting" pools with setfattr, 
or should I have used multiple CephFS file systems?

First I was thinking about using multiple CephFS file systems, but there are warnings 
everywhere. The deeper I got in, the more it seems I would have been fine with 
multiple file systems.

-  Is there an easier way that I don't know about?

I still don't know much about REST, S3, RBD, etc., so there may be a better way.

Other remarks are welcome.

Thanks in advance,
Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Paul Emmerich
Has anyone ever encountered a drive with a write cache that actually
*helped*?
I haven't.

As in: would it be a good idea for the OSD to just disable the write cache
on startup? Worst case it doesn't do anything, best case it improves
latency.

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Wed, Jun 24, 2020 at 3:49 PM Frank R  wrote:

> fyi, there is an interesting note on disabling the write cache here:
>
>
> https://yourcmc.ru/wiki/index.php?title=Ceph_performance&mobileaction=toggle_view_desktop#Drive_cache_is_slowing_you_down
>
> On Wed, Jun 24, 2020 at 9:45 AM Benoît Knecht 
> wrote:
> >
> > Hi Igor,
> >
> > Igor Fedotov wrote:
> > > for the sake of completeness one more experiment please if possible:
> > >
> > > turn off write cache for HGST drives and measure commit latency once
> again.
> >
> > I just did the same experiment with HGST drives, and disabling the write
> cache
> > on those drives brought the latency down from about 7.5ms to about 4ms.
> >
> > So it seems disabling the write cache across the board would be
> advisable in
> > our case. Is it recommended in general, or specifically when the DB+WAL
> is on
> > the same hard drive?
> >
> > Stefan, Mark, are you disabling the write cache on your HDDs by default?
> >
> > Cheers,
> >
> > --
> > Ben
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Feedback of the used configuration

2020-06-24 Thread Paul Emmerich
Have a look at cephfs subvolumes:
https://docs.ceph.com/docs/master/cephfs/fs-volumes/#fs-subvolumes

They are internally just directories with quota/pool placement
layout/namespace, plus some mgr magic to make it easier than doing all that
by hand.
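
A minimal example of that route (volume, group, subvolume and pool names below are made up):

```
ceph fs subvolumegroup create cephfs backups
ceph fs subvolume create cephfs server1 --group_name backups --pool_layout ec_data_server1
ceph fs subvolume getpath cephfs server1 --group_name backups    # path to export via NFS
```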

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Wed, Jun 24, 2020 at 4:38 PM Simon Sutter  wrote:

> Hello,
>
> After two months of the "ceph try and error game", I finally managed to
> get an Octopuss cluster up and running.
> The unconventional thing about it is, it's just for hot backups, no
> virtual machines on there.
> All the  nodes are without any caching ssd's, just plain hdd's.
> At the moment there are eight of them with a total of 50TB. We are
> planning to go up to 25 and bigger disks so we end on 300TB-400TB
>
> I decided to go with cephfs, because I don't have any experience in things
> like S3 and I need to read the same file system from more than one client.
>
> I made one cephfs with a replicated pool.
> On there I added erasure-coded pools to save some Storage.
> To add those pools, I did it with the setfattr command like this:
> setfattr -n ceph.dir.layout.pool -v ec_data_server1 /cephfs/nfs/server1
>
> Some of our servers cannot use cephfs (old kernels, special OS's) so I
> have to use nfs.
> This is set up with the included ganesha-nfs.
> Exported is the /cephfs/nfs folder and clients can mount folders below
> this.
>
> There are two final questions:
>
> -  Was it right to go with the way of "mounting" pools with
> setfattr, or should I have used multiple cephfs?
>
> First I was thinking about using multiple cephfs but there are warnings
> everywhere. The deeper I got in, the more it seems I would have been fine
> with multiple cephfs.
>
> -  Is there a way I don't know, but it would be easier?
>
> I still don't know much about Rest, S3, RBD etc... so there may be a
> better way
>
> Other remarks are desired.
>
> Thanks in advance,
> Simon
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Paul Emmerich
Well, what I was saying was "does it hurt to unconditionally run hdparm -W
0 on all disks?"

Which disk would suffer from this? I haven't seen any disk where this would
be a bad idea.


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Wed, Jun 24, 2020 at 5:35 PM Frank Schilder  wrote:

> Yes, non-volatile write cache helps as described in the wiki. When you
> disable write cache with hdparm, it actually only disables volatile write
> cache. That's why SSDs with power loss protection are recommended for ceph.
>
> A SAS/SATA SSD without any write cache will perform poorly no matter what.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Paul Emmerich 
> Sent: 24 June 2020 17:30:51
> To: Frank R
> Cc: Benoît Knecht; s.pri...@profihost.ag; ceph-users@ceph.io
> Subject: [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba
> MG07ACA14TE HDDs
>
> Has anyone ever encountered a drive with a write cache that actually
> *helped*?
> I haven't.
>
> As in: would it be a good idea for the OSD to just disable the write cache
> on startup? Worst case it doesn't do anything, best case it improves
> latency.
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
>
> On Wed, Jun 24, 2020 at 3:49 PM Frank R  wrote:
>
> > fyi, there is an interesting note on disabling the write cache here:
> >
> >
> >
> https://yourcmc.ru/wiki/index.php?title=Ceph_performance&mobileaction=toggle_view_desktop#Drive_cache_is_slowing_you_down
> >
> > On Wed, Jun 24, 2020 at 9:45 AM Benoît Knecht 
> > wrote:
> > >
> > > Hi Igor,
> > >
> > > Igor Fedotov wrote:
> > > > for the sake of completeness one more experiment please if possible:
> > > >
> > > > turn off write cache for HGST drives and measure commit latency once
> > again.
> > >
> > > I just did the same experiment with HGST drives, and disabling the
> write
> > cache
> > > on those drives brought the latency down from about 7.5ms to about 4ms.
> > >
> > > So it seems disabling the write cache across the board would be
> > advisable in
> > > our case. Is it recommended in general, or specifically when the DB+WAL
> > is on
> > > the same hard drive?
> > >
> > > Stefan, Mark, are you disabling the write cache on your HDDs by
> default?
> > >
> > > Cheers,
> > >
> > > --
> > > Ben
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Marc Roos



> I run the corresponding smartctl command on every drive just before 
OSD daemon start. 

How/where did you do this?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Reed Dier
Just throwing my hat in here with a small bit of anecdotal experience.

In the early days of experimenting with Ceph, I had 24x 8TB disks, all behind 
RAID controllers as R0 VDs with no BBU (so controller cache is WT, default 
value), and pdcache (disk write cache) enabled (default value).

We had a lightning strike at our previous data center that killed power, and we 
ended up losing the entire Ceph pool (not prod), due mostly to the 
pdcache setting.

We then did an exhaustive failure test following that, further isolating the 
pdcache as the culprit, and not the controller's write cache. The controllers 
now have BBUs to further prevent issues, but WB cache with the BBU did not 
yield issues, only pdcache.

So, all of this to say, in my experience, the on-disk write cache was a huge 
liability for losing writes.
This was also in the filestore days, and most of our issues were with XFS, but 
the point remains.

Write cache can be a consistency killer, and I recommend disabling where 
possible.
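
For what it's worth, on LSI/Broadcom controllers the per-VD disk cache can usually be toggled with storcli; something along these lines (the controller index is a placeholder, and exact syntax may vary by storcli version):

```
storcli /c0/vall show all | grep -i pdcache    # check the current disk-cache policy
storcli /c0/vall set pdcache=off               # turn off the on-disk write cache for all VDs
```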

Reed

> On Jun 24, 2020, at 10:30 AM, Paul Emmerich  wrote:
> 
> Has anyone ever encountered a drive with a write cache that actually
> *helped*?
> I haven't.
> 
> As in: would it be a good idea for the OSD to just disable the write cache
> on startup? Worst case it doesn't do anything, best case it improves
> latency.
> 
> Paul
> 
> -- 
> Paul Emmerich
> 
> Looking for help with your Ceph cluster? Contact us at https://croit.io
> 
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
> 
> 
> On Wed, Jun 24, 2020 at 3:49 PM Frank R  wrote:
> 
>> fyi, there is an interesting note on disabling the write cache here:
>> 
>> 
>> https://yourcmc.ru/wiki/index.php?title=Ceph_performance&mobileaction=toggle_view_desktop#Drive_cache_is_slowing_you_down
>> 
>> On Wed, Jun 24, 2020 at 9:45 AM Benoît Knecht 
>> wrote:
>>> 
>>> Hi Igor,
>>> 
>>> Igor Fedotov wrote:
 for the sake of completeness one more experiment please if possible:
 
 turn off write cache for HGST drives and measure commit latency once
>> again.
>>> 
>>> I just did the same experiment with HGST drives, and disabling the write
>> cache
>>> on those drives brought the latency down from about 7.5ms to about 4ms.
>>> 
>>> So it seems disabling the write cache across the board would be
>> advisable in
>>> our case. Is it recommended in general, or specifically when the DB+WAL
>> is on
>>> the same hard drive?
>>> 
>>> Stefan, Mark, are you disabling the write cache on your HDDs by default?
>>> 
>>> Cheers,
>>> 
>>> --
>>> Ben
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Marc Roos



> Sorry for the spam, but I need to add this disclaimer:

> Although it is documented as safe to disable volatile write cache on a 
disk in use, I would
> probably not do it. The required cache flush might be erroneous in the 
firmware.

I can remember reading this before. I was hoping you maybe had some 
setup with systemd scripts or maybe udev.



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Benoît Knecht
Thank you all for your answers, this was really helpful!

Stefan Priebe wrote:
> yes we have the same issues and switched to seagate for those reasons.
> you can fix at least a big part of it by disabling the write cache of
> those drives - generally speaking it seems the toshiba firmware is
> broken.
> I was not able to find a newer one.

Good to know that we're not alone :) I also looked for a newer firmware, to no
avail.

Igor Fedotov wrote:
> Benoit, wondering what are the write cache settings in your case?
>
> And do you see any difference after disabling it if any?

Write cache is enabled on all our OSDs (including the HGST drives that don't
have a latency issue).

To see if disabling write cache on the Toshiba drives would help, I turned it
off on all 12 drives in one of our OSD nodes:

```
for disk in /dev/sd{a..l}; do hdparm -W0 $disk; done
```

and left it on in the remaining nodes. I used `rados bench write` to create
some load on the cluster, and looked at

```
avg by (hostname) (ceph_osd_commit_latency_ms * on (ceph_daemon) group_left 
(hostname) ceph_osd_metadata)
```

in Prometheus. The hosts with write cache _enabled_ had a commit latency around
145ms, while the host with write cache _disabled_ had a commit latency around
25ms. So it definitely helps!

Mark Nelson wrote:
> This isn't the first time I've seen drive cache cause problematic
> latency issues, and not always from the same manufacturer.
> Unfortunately it seems like you really have to test the drives you
> want to use before deploying them them to make sure you don't run into
> issues.

That's very true! Data sheets and even public benchmarks can be quite
deceiving, and two hard drives that seem to have similar performance profiles
can perform very differently within a Ceph cluster. Lesson learned.

Cheers,

--
Ben
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread DHilsbos
All;

This conversation has been fascinating.

I'm throwing my hat in the ring, though I know almost nothing about systemd...

Completely non-portable, but...
Couldn't you write a script to issue the necessary commands to the desired 
drives, then create a systemd unit that calls it before OSD initialization?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Frank Schilder [mailto:fr...@dtu.dk] 
Sent: Wednesday, June 24, 2020 9:15 AM
To: Marc Roos; paul.emmerich
Cc: bknecht; ceph-users; s.priebe
Subject: [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba 
MG07ACA14TE HDDs

> I can remember reading this before. I was hoping you maybe had some
> setup with systemd scripts or maybe udev.

Yeah, doing this on boot up would be ideal. I was looking really hard into 
tuned and other services that claimed can do it, but required plugins or other 
stuff did/does not exist and documentation is close to non-existent.

After spending a couple of days I gave up and went with the simple 
script-command version.

If you come across something that allows easy configuration of this at 
boot-time, please let me know.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Marc Roos 
Sent: 24 June 2020 18:08:49
To: Frank Schilder; paul.emmerich
Cc: bknecht; ceph-users; s.priebe
Subject: RE: [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba 
MG07ACA14TE HDDs

> Sorry for the spam, but I need to add this disclaimer:

> Although it is documented as safe to disable volatile write cache on a
disk in use, I would
> probably not do it. The required cache flush might be erroneous in the
firmware.

I can remember reading this before. I was hoping you maybe had some
setup with systemd scripts or maybe udev.



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Frank Ritchie
a udev rule may be easier
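
For example, an untested sketch (the rule file name and matching are illustrative; this only covers rotational sdX devices):

```
cat > /etc/udev/rules.d/99-hdd-wcache.rules <<'EOF'
# disable the volatile write cache on rotational sdX disks as they appear
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", RUN+="/usr/sbin/hdparm -W 0 /dev/%k"
EOF
udevadm control --reload-rules
```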

On Wed, Jun 24, 2020 at 1:17 PM  wrote:
>
> All;
>
> This conversation has been fascinating.
>
> I'm throwing my hat in the ring, though I know almost nothing about systemd...
>
> Completely non-portable, but...
> Couldn't you write a script to issue the necessary commands to the desired 
> drives, then create a system unit that calls it before OSD initialization?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
>
> -Original Message-
> From: Frank Schilder [mailto:fr...@dtu.dk]
> Sent: Wednesday, June 24, 2020 9:15 AM
> To: Marc Roos; paul.emmerich
> Cc: bknecht; ceph-users; s.priebe
> Subject: [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba 
> MG07ACA14TE HDDs
>
> > I can remember reading this before. I was hoping you maybe had some
> > setup with systemd scripts or maybe udev.
>
> Yeah, doing this on boot up would be ideal. I was looking really hard into 
> tuned and other services that claimed can do it, but required plugins or 
> other stuff did/does not exist and documentation is close to non-existent.
>
> After spending a couple of days I gave up and went with the simple 
> script-command version.
>
> If you come across something that allows easy configuration of this at 
> boot-time, please let me know.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Marc Roos 
> Sent: 24 June 2020 18:08:49
> To: Frank Schilder; paul.emmerich
> Cc: bknecht; ceph-users; s.priebe
> Subject: RE: [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba 
> MG07ACA14TE HDDs
>
> > Sorry for the spam, but I need to add this disclaimer:
>
> > Although it is documented as safe to disable volatile write cache on a
> disk in use, I would
> > probably not do it. The required cache flush might be erroneous in the
> firmware.
>
> I can remember reading this before. I was hoping you maybe had some
> setup with systemd scripts or maybe udev.
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Anthony D'Atri


>> I can remember reading this before. I was hoping you maybe had some
>> setup with systemd scripts or maybe udev.
> 
> Yeah, doing this on boot up would be ideal. I was looking really hard into 
> tuned and other services that claimed can do it, but required plugins or 
> other stuff did/does not exist and documentation is close to non-existent.
> 
> After spending a couple of days I gave up and went with the simple 
> script-command version.
> 
> If you come across something that allows easy configuration of this at 
> boot-time, please let me know.

FWIW, doing this at boot-time only doesn’t address drives that are 
added/replaced without a reboot.  One could simply do so for them manually 
before deployment, but I thought I should mention it.

> I'm throwing my hat in the ring, though I know almost nothing about systemd…

In certain ways you’re fortunate ;)

> Couldn't you write a script to issue the necessary commands to the desired 
> drives, then create a system unit that calls it before OSD initialization?

The systemd unit file, I think, accepts an ExecStartPre entry for a command to 
run before firing up the daemon, e.g. /var/lib/ceph/ceph-osd-prestart.sh
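
As a sketch (the drop-in file name and the helper script are hypothetical), a systemd drop-in along these lines would run such a script before each OSD starts:

```
mkdir -p /etc/systemd/system/ceph-osd@.service.d
cat > /etc/systemd/system/ceph-osd@.service.d/wcache.conf <<'EOF'
[Service]
# the "-" prefix lets the OSD start even if the cache-disabling script fails
ExecStartPre=-/usr/local/bin/disable-wcache.sh
EOF
systemctl daemon-reload
```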

> a udev rule may be easier

I’m going to pretend that you did *not* just drop the U-bomb in polite company 
;)



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Benoît Knecht
Hi Igor,

Igor Fedotov wrote:
> for the sake of completeness one more experiment please if possible:
>
> turn off write cache for HGST drives and measure commit latency once again.

I just did the same experiment with HGST drives, and disabling the write cache
on those drives brought the latency down from about 7.5ms to about 4ms.

So it seems disabling the write cache across the board would be advisable in
our case. Is it recommended in general, or specifically when the DB+WAL is on
the same hard drive?

Stefan, Mark, are you disabling the write cache on your HDDs by default?

Cheers,

--
Ben
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: NFS Ganesha 2.7 in Xenial not available

2020-06-24 Thread Victoria Martinez de la Cruz
Thanks Ramana and David.

So we are using the Shaman search API to get the latest build for
ceph_nautilus flavor of NFS Ganesha, and that's how we get to the mentioned
build. We are doing this since it's part of our CI and it's better for
automation.

Should we use different repos?

Thanks,

V

On Wed, Jun 24, 2020 at 3:33 PM Victoria Martinez de la Cruz <
v...@redhat.com> wrote:

> Thanks Ramana and David.
>
> So we are using the Shaman search API to get the latest build for
> ceph_nautilus flavor of NFS Ganesha, and that's how we get to the mentioned
> build. We are doing this since it's part of our CI and it's better for
> automation.
>
> Should we use different repos?
>
> Thanks,
>
> V
>
> On Tue, Jun 23, 2020 at 2:42 PM David Galloway 
> wrote:
>
>>
>>
>> On 6/23/20 1:21 PM, Ramana Venkatesh Raja wrote:
>> > On Tue, Jun 23, 2020 at 6:59 PM Victoria Martinez de la Cruz
>> >  wrote:
>> >>
>> >> Hi folks,
>> >>
>> >> I'm hitting issues with the nfs-ganesha-stable packages [0], the repo
>> url
>> >> [1] is broken. Is there a known issue for this?
>> >>
>> >
>> > The missing packages in chacra could be due to the recent mishap in
>> > the sepia long running cluster,
>> >
>> https://lists.ceph.io/hyperkitty/list/d...@ceph.io/thread/YQMAHTB7MUHL25QP7V5ZUJQSTOGY4GHX/
>>
>> Hi Victoria,
>>
>> Ramana is correct.  Do you need 2.7.4 specifically?  If not, signed
>> nfs-ganesha packages can also be found here:
>> http://download.ceph.com/nfs-ganesha/
>>
>> >
>> >> Thanks,
>> >>
>> >> Victoria
>> >>
>> >> [0]
>> >>
>> https://shaman.ceph.com/repos/nfs-ganesha-stable/V2.7-stable/1a1fb71cdb811c1bac68f269dfbd5fed69c0913f/ceph_nautilus/128925/
>> >> [1]
>> >>
>> https://chacra.ceph.com/r/nfs-ganesha-stable/V2.7-stable/1a1fb71cdb811c1bac68f269dfbd5fed69c0913f/ubuntu/xenial/flavors/ceph_nautilus/
>> >> ___
>> >> ceph-users mailing list -- ceph-users@ceph.io
>> >> To unsubscribe send an email to ceph-users-le...@ceph.io
>> >>
>> >
>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS: What is the maximum number of files per directory

2020-06-24 Thread pchoi
Just an anecdotal answer from me...

You want as few files as possible. I wouldn't go beyond a few hundred files in 
a dir.
I'm seeing ~1s for each 1,000 files when I "ls".

But this is in a pretty idle directory. When there were files actively being 
written to those dirs and being read, just doing "ls" on the directories was 
very, very slow (on the order of minutes).

I have a single MDS setup with cephfs metadata on SSDs. MDS cache at 20GB, 6 
million inodes, approaching 10k reqs/s.

$ for i in `ls --color=none | head -50 | tail -10`; do echo; echo -n "file 
count in dir: "; time ls $i | wc -l; done

file count in dir: 4354

real    0m4.129s
user    0m0.029s
sys     0m0.179s

file count in dir: 3064

real    0m2.847s
user    0m0.027s
sys     0m0.127s

file count in dir: 1770

real    0m1.658s
user    0m0.026s
sys     0m0.075s
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Frank Schilder
Yes, non-volatile write cache helps as described in the wiki. When you disable 
write cache with hdparm, it actually only disables volatile write cache. That's 
why SSDs with power loss protection are recommended for ceph.

A SAS/SATA SSD without any write cache will perform poorly no matter what.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Paul Emmerich 
Sent: 24 June 2020 17:30:51
To: Frank R
Cc: Benoît Knecht; s.pri...@profihost.ag; ceph-users@ceph.io
Subject: [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba 
MG07ACA14TE HDDs

Has anyone ever encountered a drive with a write cache that actually
*helped*?
I haven't.

As in: would it be a good idea for the OSD to just disable the write cache
on startup? Worst case it doesn't do anything, best case it improves
latency.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Wed, Jun 24, 2020 at 3:49 PM Frank R  wrote:

> fyi, there is an interesting note on disabling the write cache here:
>
>
> https://yourcmc.ru/wiki/index.php?title=Ceph_performance&mobileaction=toggle_view_desktop#Drive_cache_is_slowing_you_down
>
> On Wed, Jun 24, 2020 at 9:45 AM Benoît Knecht 
> wrote:
> >
> > Hi Igor,
> >
> > Igor Fedotov wrote:
> > > for the sake of completeness one more experiment please if possible:
> > >
> > > turn off write cache for HGST drives and measure commit latency once
> again.
> >
> > I just did the same experiment with HGST drives, and disabling the write
> cache
> > on those drives brought the latency down from about 7.5ms to about 4ms.
> >
> > So it seems disabling the write cache across the board would be
> advisable in
> > our case. Is it recommended in general, or specifically when the DB+WAL
> is on
> > the same hard drive?
> >
> > Stefan, Mark, are you disabling the write cache on your HDDs by default?
> >
> > Cheers,
> >
> > --
> > Ben
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Frank Schilder
Ah, OK, misunderstood the question.

In my experience, no. I run the corresponding smartctl command on every drive 
just before OSD daemon start. I use smartctl because it applies to SAS and SATA 
drives with the same command (otherwise, you need to select between hdparm and 
sdparm). All SAS drives I got came with write cache disabled by default, 
however.
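
Concretely, something like this (the device name is a placeholder):

```
smartctl -s wcache,off /dev/sdX    # disable the volatile write cache (SATA and SAS)
smartctl -g wcache /dev/sdX        # verify the current setting
```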

I think the blog post gives a very good explanation why disabling volatile 
write cache on any drive is either beneficial or has no effect and, therefore, 
is always safe (recommended). At least I read it this way and I have no 
contradicting evidence.

To get back to the last part of your question, I think if the OSD daemon just 
did it by default, a lot of people would have a better life.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Paul Emmerich 
Sent: 24 June 2020 17:39:16
To: Frank Schilder
Cc: Frank R; Benoît Knecht; s.pri...@profihost.ag; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba 
MG07ACA14TE HDDs

Well, what I was saying was "does it hurt to unconditionally run hdparm -W 0 on 
all disks?"

Which disk would suffer from this? I haven't seen any disk where this would be 
a bad idea


Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Wed, Jun 24, 2020 at 5:35 PM Frank Schilder <fr...@dtu.dk> wrote:
Yes, non-volatile write cache helps as described in the wiki. When you disable 
write cache with hdparm, it actually only disables volatile write cache. That's 
why SSDs with power loss protection are recommended for ceph.

A SAS/SATA SSD without any write cache will perform poorly no matter what.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Paul Emmerich <paul.emmer...@croit.io>
Sent: 24 June 2020 17:30:51
To: Frank R
Cc: Benoît Knecht; s.pri...@profihost.ag; ceph-users@ceph.io
Subject: [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba 
MG07ACA14TE HDDs

Has anyone ever encountered a drive with a write cache that actually
*helped*?
I haven't.

As in: would it be a good idea for the OSD to just disable the write cache
on startup? Worst case it doesn't do anything, best case it improves
latency.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Wed, Jun 24, 2020 at 3:49 PM Frank R <frankaritc...@gmail.com> wrote:

> fyi, there is an interesting note on disabling the write cache here:
>
>
> https://yourcmc.ru/wiki/index.php?title=Ceph_performance&mobileaction=toggle_view_desktop#Drive_cache_is_slowing_you_down
>
> On Wed, Jun 24, 2020 at 9:45 AM Benoît Knecht <bkne...@protonmail.ch> wrote:
> >
> > Hi Igor,
> >
> > Igor Fedotov wrote:
> > > for the sake of completeness one more experiment please if possible:
> > >
> > > turn off write cache for HGST drives and measure commit latency once
> again.
> >
> > I just did the same experiment with HGST drives, and disabling the write
> cache
> > on those drives brought the latency down from about 7.5ms to about 4ms.
> >
> > So it seems disabling the write cache across the board would be
> advisable in
> > our case. Is it recommended in general, or specifically when the DB+WAL
> is on
> > the same hard drive?
> >
> > Stefan, Mark, are you disabling the write cache on your HDDs by default?
> >
> > Cheers,
> >
> > --
> > Ben
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to 
> > ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to 
> ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Frank Schilder
I use the stable ceph/daemon containers and introduced my own startup script 
for the container entrypoint. On the action "disk activate", it does a smartctl 
on the device argument before executing entrypoint.sh.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Marc Roos 
Sent: 24 June 2020 17:55:35
To: Frank Schilder; paul.emmerich
Cc: bknecht; ceph-users; s.priebe
Subject: RE: [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba 
MG07ACA14TE HDDs

> I run the corresponding smartctl command on every drive just before
OSD daemon start.

How/where did you do this?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Frank Schilder
Sorry for the spam, but I need to add this disclaimer:

Although it is documented as safe to disable volatile write cache on a disk in 
use, I would probably not do it. The required cache flush might be erroneous in 
the firmware.

Therefore, the method I use will not necessarily apply to OSD set-ups with 
WAL/DB partitions, multiple OSDs per disk and other set-ups where several 
daemons share the same drive. Here, more logic seems warranted. This also means 
that OSD daemons can probably not just do it without checking if a drive is 
currently in use or not.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: 24 June 2020 18:00:19
To: Marc Roos; paul.emmerich
Cc: bknecht; ceph-users; s.priebe
Subject: [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba 
MG07ACA14TE HDDs

I use the stable ceph/daemon containers and introduced my own startup script 
for the container entrypoint. On the action "disk activate", it does a smartctl 
on the device argument before executing entrypoint.sh.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Marc Roos 
Sent: 24 June 2020 17:55:35
To: Frank Schilder; paul.emmerich
Cc: bknecht; ceph-users; s.priebe
Subject: RE: [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba 
MG07ACA14TE HDDs

> I run the corresponding smartctl command on every drive just before
OSD daemon start.

How/where did you do this?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Frank Schilder
> I can remember reading this before. I was hoping you maybe had some
> setup with systemd scripts or maybe udev.

Yeah, doing this at boot time would be ideal. I looked really hard into tuned 
and other services that claim to be able to do it, but the required plugins or 
other pieces did/do not exist and the documentation is close to non-existent.

After spending a couple of days I gave up and went with the simple 
script-command version.

If you come across something that allows easy configuration of this at 
boot-time, please let me know.
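
If someone wants to experiment, an untested sketch of a udev-based hook could
look like the following; the rule file name, the device match and the smartctl
path are assumptions you would need to adapt, and the usual caveats about
disabling the cache on drives that are already in use apply:

# Hypothetical: install a udev rule that disables the volatile write cache
# on rotational disks as they are detected; narrow the match to OSD drives.
cat > /etc/udev/rules.d/99-hdd-write-cache.rules <<'EOF'
ACTION=="add", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", RUN+="/usr/sbin/smartctl -s wcache,off /dev/%k"
EOF
udevadm control --reload-rules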

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Marc Roos 
Sent: 24 June 2020 18:08:49
To: Frank Schilder; paul.emmerich
Cc: bknecht; ceph-users; s.priebe
Subject: RE: [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba 
MG07ACA14TE HDDs

> Sorry for the spam, but I need to add this disclaimer:

> Although it is documented as safe to disable volatile write cache on a
disk in use, I would
> probably not do it. The required cache flush might be erroneous in the
firmware.

I can remember reading this before. I was hoping you maybe had some
setup with systemd scripts or maybe udev.



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW listing slower on nominally faster setup

2020-06-24 Thread jgoetz
We have a cluster, running Octopus 15.2.2, with the same exact issue described 
originally.

Confirmed, setting debug_rgw logs to "20/1" fixed the issue for us as well. 
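
For anyone else applying the workaround, a sketch of how it can be set (the
gateway name is a placeholder; adjust to your daemon and admin socket names):

# Persistently, in ceph.conf on the RGW host:
#   [client.rgw.<gateway-name>]
#       debug_rgw = 20/1
# Or at runtime via the daemon's admin socket:
ceph daemon client.rgw.<gateway-name> config set debug_rgw 20/1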

What information would be needed to begin a preliminary bug report? As with 
others, I haven't found a way to easily replicate this issue. I can confirm 
that the issue was not occasional for us either; it affected every bucket, 
always, just as Stefan saw.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Bench on specific OSD

2020-06-24 Thread Seena Fallah
Hi all.

Is there any way to completely health-check one OSD host or instance?
For example, running rados bench against just that OSD, or doing some checks
of the disk and the front and back networks?

Thanks.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Anthony D'Atri
The benefit of disabling on-drive cache may be at least partly dependent on the 
HBA; I’ve done testing of one specific drive model and found no difference, 
whereas someone else reported a measurable difference for the same model.

> Good to know that we're not alone :) I also looked for a newer firmware, to 
> no avail.

Dell sometimes publishes firmware blobs for drives that they resell, though 
those seem to have customized inquiry strings baked in, and their firmware 
won’t apply to “generic” drives without questionable hackery with a hex editor. 
 

My experience with Toshiba has been that the only way to get firmware blobs for 
generic drives is to persuade Toshiba themselves to give it to you, be it 
through a rep or the CSO.

> 
> Mark Nelson wrote:
>> This isn't the first time I've seen drive cache cause problematic
>> latency issues, and not always from the same manufacturer.
>> Unfortunately it seems like you really have to test the drives you
>> want to use before deploying them to make sure you don't run into
>> issues.
> 
> That's very true! Data sheets and even public benchmarks can be quite
> deceiving, and two hard drives that seem to have similar performance profiles
> can perform very differently within a Ceph cluster. Lesson learned.

Benchmarks often are in a context rather removed from what anyone would deploy 
in production.

Notably I’ve had at least two experiences with drives that passed chassis 
vendor and in-house initial qualification.

The first was an HDD.  We had a mix of drives from vendor A and vendor B.  
Found that Vendor B’s drives were throwing read errors at 30x the rate of 
Vendor A’s.  After persisting for months through the layers I was finally able 
to send drives to the vendor’s engineers, who found at least one design flaw 
that was tickled by the op pattern of a Filestore (XFS) OSD with colo journal.  
Firmware was not able to substantially fix the problem, so they all had to be 
replaced with Vendor A.  Today BlueStore probably would not trigger the same 
design flaw.


The second was an SSD that was marketed as “enterprise” but whose internal 
housekeeping only ran properly if the drive was allowed long idle times.  In 
that case I was eventually able to work with the vendor on a firmware fix.  
BlueStore seemed to correlate with the behavior, as did a particular serial 
number range.  This was one that didn’t manifest until drives had been in 
production for at least 90 days and the workload had increased.


Moral of the story is to stress-test every model of drive if you care about 
data durability, availability, and performance.  Throw increasingly busy 
workloads and queue depths against the drives; performance of some will hit an 
abrupt cliff at a certain point.
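
If it helps, a sketch of the sort of sweep I mean (purely illustrative; 
/dev/sdX is a placeholder, and this writes directly to the device and destroys 
its data, so scratch drives only):

# Sweep queue depths with small synchronous random writes and watch for
# the point where latency jumps or IOPS stop scaling.
for qd in 1 2 4 8 16 32; do
    fio --name=qd$qd --filename=/dev/sdX --rw=randwrite --bs=4k --direct=1 \
        --fsync=1 --ioengine=libaio --iodepth=$qd --runtime=60 --time_based \
        --group_reporting
done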



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Lindsay Mathieson

On 25/06/2020 3:17 am, dhils...@performair.com wrote:

Completely non-portable, but...
Couldn't you write a script to issue the necessary commands to the desired 
drives, then create a system unit that calls it before OSD initialization?


Couldn't we just set (uncomment)

write_cache = off

in /etc/hdparm.conf?
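
That is, either uncomment the global line, or add a per-drive stanza such as 
this hypothetical one (Debian/Ubuntu layout, applied by hdparm's boot/udev 
hook; the device path is a placeholder):

/dev/sdb {
    write_cache = off
}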

--
Lindsay

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread vitalif

Hi, https://yourcmc.ru/wiki/Ceph_performance author here %)

Disabling the write cache is REALLY bad for SSDs without capacitors 
[consumer SSDs], and it's also bad for HDDs whose firmware doesn't have 
this bug-o-feature. The bug is really common though. I have no idea 
where it comes from, but it's really common. When you "disable" the 
write cache you actually "enable" the non-volatile write cache on those 
drives. Seagate EXOS drives also behave like that... It seems most EXOS 
drives have an SSD cache even though it's not mentioned in specs. And it 
gets enabled when you do hdparm -W 0. In theory hdparm -W 0 may hurt 
linear write performance even on those HDDs, though.
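
Either way, it's worth checking what the drive actually reports before and 
after (device is a placeholder):

hdparm -W /dev/sdX            # ATA: prints "write-caching = 1 (on)" or 0 (off)
smartctl -g wcache /dev/sdX   # works for SATA and SAS
hdparm -W 0 /dev/sdX          # disable; smartctl -s wcache,off /dev/sdX also works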


> Well, what I was saying was "does it hurt to unconditionally run hdparm -W
> 0 on all disks?"
>
> Which disk would suffer from this? I haven't seen any disk where this would
> be a bad idea
>
> Paul

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread Marc Roos
 
I did a quick test with wcache off [1], and have the impression that a 
simple 2-minute rados bench performed a bit worse on my slow HDDs.

[1]
IFS=$'\n' && for line in $(mount | grep 'osd/ceph' | awk '{print $1" "$3}' \
        | sed -e 's/1 / /' -e 's#/var/lib/ceph/osd/ceph-##'); do
    # each $line is "<whole-disk device> <osd id>" (the sed assumes partition 1)
    IFS=' ' arr=($line)
    service ceph-osd@${arr[1]} stop && smartctl -s wcache,off ${arr[0]} \
        && service ceph-osd@${arr[1]} start
done
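
For reference, a 2-minute run of that kind can be reproduced with something 
like the following (pool name and thread count are placeholders):

rados bench -p testbench 120 write -t 16 --no-cleanup
rados -p testbench cleanup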


-Original Message-
To: Paul Emmerich
Cc: Benoît Knecht; s.pri...@profihost.ag; ceph-users@ceph.io
Subject: [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba 
MG07ACA14TE HDDs

Hi, https://yourcmc.ru/wiki/Ceph_performance author here %)

Disabling write cache is REALLY bad for SSDs without capacitors 
[consumer SSDs], also it's bad for HDDs with firmwares that don't have 
this bug-o-feature. The bug is really common though. I have no idea 
where it comes from, but it's really common. When you "disable" the 
write cache you actually "enable" the non-volatile write cache on those 
drives. Seagate EXOS drives also behave like that... It seems most EXOS 
drives have an SSD cache even though it's not mentioned in specs. And it 
gets enabled when you do hdparm -W 0. In theory hdparm -W 0 may hurt 
linear write performance even on those HDDs, though.

> Well, what I was saying was "does it hurt to unconditionally run 
> hdparm -W 0 on all disks?"
> 
> Which disk would suffer from this? I haven't seen any disk where this 
> would be a bad idea
> 
> Paul
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io