[ceph-users] Re: Strange performance drop and low oss performance

2020-02-06 Thread Janne Johansson
>
>
> For the object gateway, the performance number was obtained with `swift-bench -t 64`, which
> uses 64 threads concurrently. Will the radosgw and HTTP overhead really be so
> significant (94.5MB/s down to 26MB/s for cluster1) when multiple threads are
> used? Thanks in advance!
>
>
Can't say what it "must" be, but if I log in to one of my rgws (we have
several, load-balanced) and run Ceph benchmarks against spinning-drive pools
(i.e. talking to Ceph directly), I get something like 200 MB/s. If I run a
write test on the same host, but talking S3-over-HTTP against itself, I get
something like 100 MB/s. So the overhead in my case seems to be 100% (or 50%,
however you calculate it).

You know there has to be some kind of penalty for doing protocol
translations, if for nothing else then because the object store client
checksums the data and asks the rgw to store it, the rgw checksums the
part(s) and asks Ceph to store them, Ceph sends an ack, and the rgw sends an
ack with the checksum back to the client, which compares it before moving on
to the next part.
This will always be slower than plain writes straight to Ceph (the two
innermost ops), and can in part be offset by using large IO sizes, parallel
streams, multiple rgw backends and so on.
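
If you want to compare the two layers on your own cluster, a rough sketch
(the pool name "bench_test" and the bucket "bench-bucket" are just examples,
and the second part assumes a configured S3 client such as s3cmd):

  # talking ceph directly: raw RADOS writes, 64 threads, 4 MB objects
  rados -p bench_test bench 60 write -t 64 -b 4194304
  rados -p bench_test cleanup

  # talking s3-over-http against the local rgw with the same concurrency,
  # e.g. swift-bench -t 64, or uploading a large test file through s3cmd:
  dd if=/dev/zero of=/tmp/testfile bs=4M count=256
  s3cmd put /tmp/testfile s3://bench-bucket/

The gap between those two numbers is roughly the rgw/http overhead discussed
above.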

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: osd_memory_target ignored

2020-02-06 Thread Frank Schilder
Dear Stefan,

thanks for your help. I opened these:

https://tracker.ceph.com/issues/44010
https://tracker.ceph.com/issues/44011

Best regards,

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Stefan Kooman 
Sent: 05 February 2020 10:29
To: Frank Schilder
Cc: ceph-users
Subject: Re: [ceph-users] Re: osd_memory_target ignored

Quoting Frank Schilder (fr...@dtu.dk):
> Dear Stefan,
>
> is it possible that there is a mistake in the documentation or a bug? Out of 
> curiosity, I restarted one of these OSDs and the memory usage starts going up:
>
> ceph  881203 15.4  4.0 6201580 5344764 ? Sl   09:18   6:38 
> /usr/bin/ceph-osd --cluster ceph -f -i 243 --setuser ceph --setgroup disk
>
> The documentation of osd_memory_target says "Can update at runtime: true", 
> but it seems that a restart is required to activate the setting, so it can 
> *not* be updated at runtime (where "at runtime" means taking effect without a restart).

Ah, that might well be. If the documentation states it can be updated at
runtime, then it's a bug (in either the code or the documentation).
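A quick way to check what a running daemon actually uses, versus what the
config database says, is (a sketch; osd.243 is taken from your ps output,
and the "ceph daemon" commands must run on the host where that OSD lives):

  # value stored in the cluster configuration database
  ceph config get osd.243 osd_memory_target
  # value the running daemon actually sees (via the admin socket)
  ceph daemon osd.243 config get osd_memory_target
  # memory pool usage of that OSD, to watch the cache shrink/grow
  ceph daemon osd.243 dump_mempools

If the two values differ after a "ceph config set", the option is indeed not
applied at runtime and either the code or the documentation needs fixing.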
>
>
> In addition to that, I would like to have different default memory
> targets set for different device classes. Unfortunately, there seem
> to be no per-device-class osd_memory_target_[device class] default options.
> Is there a good way to set different targets while avoiding bloating
> "ceph config dump" unnecessarily?

I'm afraid not. You might want to file a tracker issue with an
enhancement request.

Gr. Stefan

--
| BIT BV  https://www.bit.nl/Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Write i/o in CephFS metadata pool

2020-02-06 Thread Samy Ascha


> On 4 Feb 2020, at 16:14, Samy Ascha  wrote:
> 
> 
> 
>> On 2 Feb 2020, at 12:45, Patrick Donnelly  wrote:
>> 
>> On Wed, Jan 29, 2020 at 1:25 AM Samy Ascha  wrote:
>>> 
>>> Hi!
>>> 
>>> I've been running CephFS for a while now and ever since setting it up, I've 
>>> seen unexpectedly large write i/o on the CephFS metadata pool.
>>> 
>>> The filesystem is otherwise stable and I'm seeing no usage issues.
>>> 
>>> I'm in a read-intensive environment, from the clients' perspective and 
>>> throughput for the metadata pool is consistently larger than that of the 
>>> data pool.
>>> 
>>> For example:
>>> 
>>> # ceph osd pool stats
>>> pool cephfs_data id 1
>>> client io 7.6 MiB/s rd, 19 KiB/s wr, 404 op/s rd, 1 op/s wr
>>> 
>>> pool cephfs_metadata id 2
>>> client io 338 KiB/s rd, 43 MiB/s wr, 84 op/s rd, 26 op/s wr
>>> 
>>> I realise, of course, that this is a momentary display of statistics, but I 
>>> see this unbalanced r/w activity consistently when monitoring it live.
>>> 
>>> I would like some insight into what may be causing this large imbalance in 
>>> r/w, especially since I'm in a read-intensive (web hosting) environment.
>> 
>> The MDS is still writing its journal and updating the "open file
>> table". The MDS needs to record certain information about the state of
>> its cache and the state issued to clients. Even if the clients aren't
>> changing anything. (This is workload dependent but will be most
>> obvious when clients are opening files _not_ in cache already.)
>> 
>> -- 
>> Patrick Donnelly, Ph.D.
>> He / Him / His
>> Senior Software Engineer
>> Red Hat Sunnyvale, CA
>> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>> 
> 
> Hi Patrick,
> 
> Thanks for this extra information.
> 
> I should be able to confirm this by checking network traffic flowing from the 
> MDSes to the OSDs, and compare it to what's coming in from the CephFS clients.
> 
> I'll report back when I have more information on that. I'm a little caught up 
> in other stuff right now, but I wanted to just acknowledge your message.
> 
> Samy
> 

Hi!

I've confirmed that the write IO to the metadata pool is coming from the 
active MDSes.

I'm experiencing very poor write performance on clients and I would like to see 
if there's anything I can do to optimise the performance.

Right now, I'm specifically focussing on speeding up this use case:

In CephFS mounted dir:

$ time unzip -q wordpress-seo.12.9.1.zip 

real    0m47.596s
user    0m0.218s
sys     0m0.157s

On RBD mount:

$ time unzip -q wordpress-seo.12.9.1.zip 

real    0m0.176s
user    0m0.131s
sys     0m0.045s

The difference is just too big. I'm having real trouble finding a good 
reference to check my setup for bad configuration etc.

I have network bandwidth, RAM and CPU to spare, but I'm unsure how to put it 
to work to help my case.

Thanks a lot,

Samy
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Write i/o in CephFS metadata pool

2020-02-06 Thread Stefan Kooman
> Hi!
> 
> I've confirmed that the write IO to the metadata pool is coming from the 
> active MDSes.
> 
> I'm experiencing very poor write performance on clients and I would like to 
> see if there's anything I can do to optimise the performance.
> 
> Right now, I'm specifically focussing on speeding up this use case:
> 
> In CephFS mounted dir:
> 
> $ time unzip -q wordpress-seo.12.9.1.zip 
> 
> real  0m47.596s
> user  0m0.218s
> sys   0m0.157s
> 
> On RBD mount:
> 
> $ time unzip -q wordpress-seo.12.9.1.zip 
> 
> real  0m0.176s
> user  0m0.131s
> sys   0m0.045s
> 
> The difference is just too big. I'm having real trouble finding a good 
> reference to check my setup for bad configuration etc.
> 
> I have network bandwidth, RAM and CPU to spare, but I'm unsure how to put 
> it to work to help my case.

Are there a lot of directories to be created from that zip file? I think
it boils down to the directory operations, which need to be performed
synchronously. See
https://fosdem.org/2020/schedule/event/sds_ceph_async_directory_ops/
https://fosdem.org/2020/schedule/event/sds_ceph_async_directory_ops/attachments/slides/3962/export/events/attachments/sds_ceph_async_directory_ops/slides/3962/async_dirops_cephfs.pdf
https://video.fosdem.org/2020/H.1308/sds_ceph_async_directory_ops.webm
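
A quick way to see how many of those synchronous operations the unzip
generates (a sketch; uses the zipinfo tool from the unzip package, and
mds.<name> stands for your active MDS):

  # number of entries (files and directories) in the archive
  zipinfo -1 wordpress-seo.12.9.1.zip | wc -l
  # MDS request/reply latency counters, sampled while the unzip runs
  ceph daemon mds.<name> perf dump | grep -A 3 reply_latency

Every create/mkdir is a synchronous round trip to the MDS, so a few thousand
small files at a few milliseconds each already add up to tens of seconds.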

Gr. Stefan

-- 
| BIT BV  https://www.bit.nl/Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Stuck with an unavailable iscsi gateway

2020-02-06 Thread jcharles
Hello

I can't find a way to resolve my problem.
I lost an iSCSI gateway in a pool of 4 gateways; there are 3 left. I can't delete 
the lost gateway from the host and I can't change the owner of the resources owned 
by the lost gateway.

As a result, I have resources which are inaccessible from clients and I can't 
reconfigure them because of the lost gateway.
Please tell me there is a way to remove a lost gateway and that I won't be 
stuck forever.

If I do 
  delete compute04.adm.local

it answers 
   Failed : Gateway deletion failed, gateway(s) 
unavailable:compute04.adm.local(UNKNOWN state)

I saw a reference to my problem in the thread "Error in add new ISCSI gateway", 
but unfortunately no answer seems to be available.


Thanks for any help
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Understanding Bluestore performance characteristics

2020-02-06 Thread vitalif

Hi Stefan,


Do you mean more info than:


Yes, there's more... I don't remember exactly, but I think some information 
ends up in the OSD perf counters and some is dumped into the OSD log; maybe 
there's even a 'ceph daemon' command to trigger it...


There are 4 options that enable various parts of it:

#rocksdb_perf = true
#rocksdb_collect_compaction_stats = true
#rocksdb_collect_extended_stats = true
#rocksdb_collect_memory_stats = true
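
If I remember right, something like this should turn them on and expose the
counters (a sketch; osd.0 is just an example, and some of the options may
only take effect after an OSD restart):

  ceph config set osd rocksdb_perf true
  ceph config set osd rocksdb_collect_compaction_stats true
  # then look at the rocksdb section of that OSD's perf counters
  ceph daemon osd.0 perf dump rocksdb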
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Need info about ceph bluestore autorepair

2020-02-06 Thread Mario Giammarco
Hello,
if I have a pool with replica 3, what happens when one replica is corrupted?
I suppose Ceph detects the bad replica using checksums and replaces it with a
good one.
And if I have a pool with replica 2, what happens?
Thanks,
Mario
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Stuck with an unavailable iscsi gateway

2020-02-06 Thread Jason Dillaman
Originally, the idea of a gateway just permanently disappearing
out-of-the-blue was never a concern. However, since this seems to be a
recurring issue, the latest version of ceph-iscsi includes support for
force-deleting a permanently dead iSCSI gateway [1]. I don't think
that fix is in an official release yet, but it's available as a dev
build here [2].

On Thu, Feb 6, 2020 at 6:45 AM  wrote:
>
> Hello
>
> I can't find a way to resolve my problem.
> I lost an iSCSI gateway in a pool of 4 gateways; there are 3 left. I can't 
> delete the lost gateway from the host and I can't change the owner of the 
> resources owned by the lost gateway.
>
> As a result, I have resources which are inaccessible from clients and I can't 
> reconfigure them because of the lost gateway.
> Please tell me there is a way to remove a lost gateway and that I won't be 
> stuck forever.
>
> If I do
>   delete compute04.adm.local
>
> it answers
>Failed : Gateway deletion failed, gateway(s) 
> unavailable:compute04.adm.local(UNKNOWN state)
>
> I saw a reference to my problem in the thread "Error in add new ISCSI 
> gateway", but unfortunately no answer seems to be available.
>
>
> Thanks for any help
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

[1] https://github.com/ceph/ceph-iscsi/pull/156
[2] 
https://2.chacra.ceph.com/r/ceph-iscsi/master/945fc555a0434cd0b9f5dbcb0ebaadcde8989d0a/centos/7/flavors/default/

-- 
Jason
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Need info about ceph bluestore autorepair

2020-02-06 Thread Janne Johansson
Den tors 6 feb. 2020 kl 15:06 skrev Mario Giammarco :

> Hello,
> if I have a pool with replica 3 what happens when one replica is corrupted?
>

The PG on which this happens will turn from active+clean to
active+inconsistent.


> I suppose Ceph detects the bad replica using checksums and replaces it with a
> good one.
>

There is a "osd fix on error = true/false" setting (whose name I can't
remember right
off the bat now) which controls this. If false, you need to "ceph pg
repair" it, then
it happens as you describe.
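
If it is the setting I am thinking of, it is the scrub auto-repair option;
something along these lines (do verify the exact name against your release
before relying on it):

  # let (deep-)scrub repair inconsistencies it finds on its own
  ceph config set osd osd_scrub_auto_repair true
  # or repair a specific inconsistent PG by hand
  ceph pg repair <pgid>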


> If I have a pool with replica 2 what happens?
>

Same.

Except that with repl=2 you run a higher chance of surprises* on the
remaining replica while the first one is bad, until it gets repaired.

*) ie, data loss, tears and less sleep for ceph admins

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RBD cephx read-only key

2020-02-06 Thread Andras Pataki
I'm trying to set up a cephx key to mount RBD images read-only.  I have 
the following two keys:


[client.rbd]
    key = xxx
    caps mgr = "profile rbd"
    caps mon = "profile rbd"
    caps osd = "profile rbd pool=rbd_vm"

[client.rbd-ro]
    key = xxx
    caps mgr = "profile rbd-read-only"
    caps mon = "profile rbd"
    caps osd = "profile rbd-read-only pool=rbd_vm"

The following works:

# rbd map --pool rbd_vm andras_test --name client.rbd
/dev/rbd0

and so does this:

# rbd map --pool rbd_vm andras_test --name client.rbd --read-only
/dev/rbd0

but using the rbd-ro key doesn't work:

# rbd map --pool rbd_vm andras_test --name client.rbd-ro --read-only
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (1) Operation not permitted

the logs only have the following:

[1281776.788709] libceph: mon4 10.128.150.14:6789 session established
[1281776.801747] libceph: client88900164 fsid 
d7b33135-0940-4e48-8aa6-1d2026597c2f


The back end is Mimic 13.2.8; the kernel is the CentOS kernel 
3.10.0-957.27.2.el7.x86_64.


Any ideas what I'm doing wrong here?

Andras

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD cephx read-only key

2020-02-06 Thread Jason Dillaman
On Thu, Feb 6, 2020 at 11:20 AM Andras Pataki
 wrote:
>
> I'm trying to set up a cephx key to mount RBD images read-only.  I have
> the following two keys:
>
> [client.rbd]
>  key = xxx
>  caps mgr = "profile rbd"
>  caps mon = "profile rbd"
>  caps osd = "profile rbd pool=rbd_vm"
>
> [client.rbd-ro]
>  key = xxx
>  caps mgr = "profile rbd-read-only"
>  caps mon = "profile rbd"
>  caps osd = "profile rbd-read-only pool=rbd_vm"
>
> The following works:
>
> # rbd map --pool rbd_vm andras_test --name client.rbd
> /dev/rbd0
>
> and so does this:
>
> # rbd map --pool rbd_vm andras_test --name client.rbd --read-only
> /dev/rbd0
>
> but using the rbd-ro key doesn't work:
>
> # rbd map --pool rbd_vm andras_test --name client.rbd-ro --read-only
> rbd: sysfs write failed
> In some cases useful info is found in syslog - try "dmesg | tail".
> rbd: map failed: (1) Operation not permitted
>
> the logs only have the following:
>
> [1281776.788709] libceph: mon4 10.128.150.14:6789 session established
> [1281776.801747] libceph: client88900164 fsid
> d7b33135-0940-4e48-8aa6-1d2026597c2f
>
> The back end is Mimic 13.2.8; the kernel is the CentOS kernel
> 3.10.0-957.27.2.el7.x86_64.
>
> Any ideas what I'm doing wrong here?

You need kernel v5.5 or later to map an RBD image via krbd using
read-only caps [1]. Prior to this patch, krbd would be in a
quasi-read-only state internally.
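
If upgrading the kernel is not an option, mapping through rbd-nbd (librbd in
userspace) may be a workaround; a sketch, assuming the rbd-nbd package is
installed — I have not verified this against the read-only profile
specifically:

  rbd-nbd map --read-only rbd_vm/andras_test --id rbd-ro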

> Andras
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

[1] https://tracker.ceph.com/issues/42667

-- 
Jason
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD cephx read-only key

2020-02-06 Thread Andras Pataki

Ah, that makes sense.  Thanks for the quick reply!

Andras


On 2/6/20 11:24 AM, Jason Dillaman wrote:

On Thu, Feb 6, 2020 at 11:20 AM Andras Pataki
 wrote:

I'm trying to set up a cephx key to mount RBD images read-only.  I have
the following two keys:

[client.rbd]
  key = xxx
  caps mgr = "profile rbd"
  caps mon = "profile rbd"
  caps osd = "profile rbd pool=rbd_vm"

[client.rbd-ro]
  key = xxx
  caps mgr = "profile rbd-read-only"
  caps mon = "profile rbd"
  caps osd = "profile rbd-read-only pool=rbd_vm"

The following works:

# rbd map --pool rbd_vm andras_test --name client.rbd
/dev/rbd0

and so does this:

# rbd map --pool rbd_vm andras_test --name client.rbd --read-only
/dev/rbd0

but using the rbd-ro key doesn't work:

# rbd map --pool rbd_vm andras_test --name client.rbd-ro --read-only
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (1) Operation not permitted

the logs only have the following:

[1281776.788709] libceph: mon4 10.128.150.14:6789 session established
[1281776.801747] libceph: client88900164 fsid
d7b33135-0940-4e48-8aa6-1d2026597c2f

The back end is Mimic 13.2.8; the kernel is the CentOS kernel
3.10.0-957.27.2.el7.x86_64.

Any ideas what I'm doing wrong here?

You need kernel v5.5 or later to map an RBD image via krbd using
read-only caps [1]. Prior to this patch, krbd would be in a
quasi-read-only state internally.


Andras

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

[1] https://tracker.ceph.com/issues/42667


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Different memory usage on OSD nodes after update to Nautilus

2020-02-06 Thread Massimo Sgaravatto
Dear all

In the mid of January I updated my ceph cluster from Luminous to Nautilus.

Attached you can see the memory metrics collected on one OSD node (I see
the very same behavior on all OSD hosts) graphed via Ganglia
This is Centos 7 node, with 64 GB of RAM, hosting 10 OSDs.

So before the update there were about 20 GB of FreeMem.
Now FreeMem is basically 0, but I see 20 GB of Buffers.

I guess this triggered some swapping, probably because I forgot to
set vm.swappiness to 0 (it was set to 60, the default value).

I was wondering if this is the expected behavior.

PS: Actually, besides updating Ceph, I also updated all the other packages
(yum update), so I am not sure that this different memory usage is because
of the Ceph update.
For the record, in this update the kernel was updated from 3.10.0-1062.1.2
to 3.10.0-1062.9.1.

Thanks, Massimo
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Different memory usage on OSD nodes after update to Nautilus

2020-02-06 Thread Massimo Sgaravatto
Thanks for your feedback

The Ganglia graphs are available here:

https://cernbox.cern.ch/index.php/s/0xBDVwNkRqcoGdF

Replying to the other questions:

- Free Memory in ganglia is derived from "MemFree" in /proc/meminfo
- Memory Buffers in ganglia is derived from "Buffers" in /proc/meminfo
- On this host, the OSDs are 6TB. On other hosts we have 10TB OSDs
- "osd memory target" is set to ~ 4.5 GB (actually, while debugging this
issue, I have just lowered the value to 3.2 GB)
- "ceph tell osd.x heap stats" basically always reports 0 (or a very low
value) for "Bytes in page heap freelist" and a heap release doesn't change
the memory usage
- I can agree that swap is antiquated. But so far it was simply not used and
didn't cause any problems. At any rate I am now going to remove the swap or
set the swappiness to a minimal value (see the sketch below).
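
Something like this on each OSD node (a sketch; swappiness 1 per Anthony's
suggestion, and osd.0 is just an example id):

  # stop swapping now and keep the setting across reboots
  sysctl -w vm.swappiness=1
  echo 'vm.swappiness = 1' > /etc/sysctl.d/99-ceph-swappiness.conf
  # check for unreleased heap memory in an OSD and give it back to the OS
  ceph tell osd.0 heap stats
  ceph tell osd.0 heap release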

Thanks again !

Cheers, Massimo




On Thu, Feb 6, 2020 at 6:28 PM Anthony D'Atri  wrote:

>  Attachments are usually filtered by mailing lists.  Yours did not come
> through.  A URL to Skitch or some other hosting works better.
>
> Your kernel version sounds like RHEL / CentOS?  I can say that memory
> accounting definitely did change between upstream 3.19 and 4.9
>
>
> osd04-cephstorage1-gsc:~ # head /proc/meminfo
> MemTotal:   197524684 kB
> MemFree:80388504 kB
> MemAvailable:   86055708 kB
> Buffers:  633768 kB
> Cached:  4705408 kB
> SwapCached:0 kB
>
> Specifically, node_memory_Active as reported by node_exporter changes
> dramatically, and MemAvailable is the more meaningful metric.  What is your
> “FreeMem” metric actually derived from?
>
> 64GB for 10 OSDs might be on the light side, how large are those OSDs?
>
> For sure swap is antiquated.  If your systems have any swap provisioned at
> all, you’re doing it wrong.  I’ve had good results setting vm.swappiness to 1.
>
> Do `ceph daemon osd.xx heap stats`, see if your OSD processes have much
> unused memory that has not been released to the OS.  If they do, “heap
> release” can be useful.
>
>
>
> > On Feb 6, 2020, at 9:08 AM, Massimo Sgaravatto <
> massimo.sgarava...@gmail.com> wrote:
> >
> > Dear all
> >
> > In the mid of January I updated my ceph cluster from Luminous to
> Nautilus.
> >
> > Attached you can see the memory metrics collected on one OSD node (I see
> > the very same behavior on all OSD hosts) graphed via Ganglia
> > This is Centos 7 node, with 64 GB of RAM, hosting 10 OSDs.
> >
> > So before the update there were about 20 GB of FreeMem.
> > Now FreeMem is basically 0, but I see 20 GB of Buffers.
> >
> > I guess this triggered some swapping, probably because I forgot to
> > set vm.swappiness to 0 (it was set to 60, the default value).
> >
> > I was wondering if this is the expected behavior.
> >
> > PS: Actually, besides updating Ceph, I also updated all the other packages
> > (yum update), so I am not sure that this different memory usage is because
> > of the Ceph update.
> > For the record, in this update the kernel was updated from 3.10.0-1062.1.2
> > to 3.10.0-1062.9.1.
> >
> > Thanks, Massimo
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ubuntu 18.04.4 Ceph 12.2.12

2020-02-06 Thread Dan Hill
For the Ubuntu 18.04 LTS, the latest ceph package is
12.2.12-0ubuntu0.18.04.4 and can be found in the bionic-updates pocket [0].
There is an active SRU (stable release update) to move to the new 12.2.13
point release. You can follow its progress on launchpad [1].

I should note that Ubuntu 18.04 LTS also supports the Mimic and
Nautilus releases through the Ubuntu Cloud Archive PPAs. You can find
details on which LTS supports which Ceph releases here [2].

Please open a launchpad bug if you are having problems installing from
Ubuntu sourced packaging.

[0] https://packages.ubuntu.com/bionic-updates/ceph
[1] https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1861793
[2] https://ubuntu.com/ceph


On Mon, Feb 3, 2020 at 4:07 PM Atherion  wrote:

> So now that 12.2.13 has been released, I will have a mixed environment
> if I use the Ubuntu 18.04 repo's 12.2.12.
>
> I also found there is a docker container,
> https://hub.docker.com/r/ceph/daemon - I could potentially just use the
> container to run the version I need. I'm wondering if anyone has done this in
> production?
>
> Managing the Ubuntu repos for Ceph has not been easy, to say the least :(
> I found this ticket, but it looks dead: https://tracker.ceph.com/issues/24326
>
> ‐‐‐ Original Message ‐‐‐
> On Friday, January 24, 2020 1:12 PM, Anthony D'Atri 
> wrote:
>
> > I applied those packages for the same reason on a staging cluster and so
> far so good.
> >
> >> On Jan 24, 2020, at 9:15 AM, Atherion  wrote:
> >
> >> 
> >> Hi Ceph Community.
> >> We currently have a luminous cluster running and some machines still on
> Ubuntu 14.04
> >> We are looking to upgrade these machines to 18.04 but the only upgrade
> path for luminous with the ceph repo is through 16.04.
> >> It is doable to get to Mimic, but then we have to upgrade all those
> machines to 16.04 and upgrade again to 18.04 once we are on Mimic; it is
> becoming a huge time sink.
> >>
> >> I did notice in the Ubuntu repos they have added 12.2.12 in 18.04.4
> release. Is this a reliable build we can use?
> >>
> https://ubuntu.pkgs.org/18.04/ubuntu-proposed-main-amd64/ceph_12.2.12-0ubuntu0.18.04.4_amd64.deb.html
> >> If so then we can go straight to 18.04.4 and not waste so much time.
> >>
> >> Best
> >>
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mds lost very frequently

2020-02-06 Thread Stefan Kooman
Hi,

After setting:

ceph config set mds mds_recall_max_caps 1

(5000 before change)

and 

ceph config set mds mds_recall_max_decay_rate 1.0

(2.5 before change)

And the:

ceph tell 'mds.*' injectargs '--mds_recall_max_caps 1'
ceph tell 'mds.*' injectargs '--mds_recall_max_decay_rate 1.0'

our up:active MDS stopped responding and the standby-replay stepped in
... and hit an assert (same as in this thread):

2020-02-06 16:42:16.712 7ff76a528700  1 heartbeat_map reset_timeout 'MDSRank' 
had timed out after 15
2020-02-06 16:42:17.616 7ff76ff1b700  0 mds.beacon.mds2  MDS is no longer laggy
2020-02-06 16:42:20.348 7ff76d716700 -1 /build/ceph-13.2.8/src/mds/Locker.cc: 
In function 'void Locker::file_recover(ScatterLock*)' thread 7ff76d716700 time 
2020-02-06 16:42:20.351124
/build/ceph-13.2.8/src/mds/Locker.cc: 5307: FAILED assert(lock->get_state() == 
LOCK_PRE_SCAN)

 ceph version 13.2.8 (5579a94fafbc1f9cc913a0f5d362953a5d9c3ae0) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x14e) [0x7ff7759939de]
 2: (()+0x287b67) [0x7ff775993b67]
 3: (()+0x28a9ea) [0x5585eb2b79ea]
 4: (MDCache::start_files_to_recover()+0xbb) [0x5585eb1f897b]
 5: (MDSRank::active_start()+0x135) [0x5585eb146be5]
 6: (MDSRankDispatcher::handle_mds_map(MMDSMap*, MDSMap*)+0x4e5) 
[0x5585eb151ea5]
 7: (MDSDaemon::handle_mds_map(MMDSMap*)+0xca8) [0x5585eb134608]
 8: (MDSDaemon::handle_core_message(Message*)+0x6c) [0x5585eb138bbc]
 9: (MDSDaemon::ms_dispatch(Message*)+0xbb) [0x5585eb13929b]
 10: (DispatchQueue::entry()+0xb92) [0x7ff775a56e52]
 11: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ff775af3e2d]
 12: (()+0x76db) [0x7ff7752846db]
 13: (clone()+0x3f) [0x7ff77446a88f]

2020-02-06 16:42:20.348 7ff76d716700 -1 *** Caught signal (Aborted) **
 in thread 7ff76d716700 thread_name:ms_dispatch

 ceph version 13.2.8 (5579a94fafbc1f9cc913a0f5d362953a5d9c3ae0) mimic (stable)
 1: (()+0x12890) [0x7ff77528f890]
 2: (gsignal()+0xc7) [0x7ff774387e97]
 3: (abort()+0x141) [0x7ff774389801]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x256) [0x7ff775993ae6]
 5: (()+0x287b67) [0x7ff775993b67]
 6: (()+0x28a9ea) [0x5585eb2b79ea]
 7: (MDCache::start_files_to_recover()+0xbb) [0x5585eb1f897b]
 8: (MDSRank::active_start()+0x135) [0x5585eb146be5]
 9: (MDSRankDispatcher::handle_mds_map(MMDSMap*, MDSMap*)+0x4e5) 
[0x5585eb151ea5]
 10: (MDSDaemon::handle_mds_map(MMDSMap*)+0xca8) [0x5585eb134608]
 11: (MDSDaemon::handle_core_message(Message*)+0x6c) [0x5585eb138bbc]
 12: (MDSDaemon::ms_dispatch(Message*)+0xbb) [0x5585eb13929b]
 13: (DispatchQueue::entry()+0xb92) [0x7ff775a56e52]
 14: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ff775af3e2d]
 15: (()+0x76db) [0x7ff7752846db]
 16: (clone()+0x3f) [0x7ff77446a88f]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.



Quoting Yan, Zheng (uker...@gmail.com):

> Please try below patch if you can compile ceph from source.  If you
> can't compile ceph or the issue still happens, please set  debug_mds =
> 10 for standby mds (change debug_mds to 0 after mds becomes active).
> 
> Regards
> Yan, Zheng
> 
> diff --git a/src/mds/MDSRank.cc b/src/mds/MDSRank.cc
> index 1e8b024b8a..d1150578f1 100644
> --- a/src/mds/MDSRank.cc
> +++ b/src/mds/MDSRank.cc
> @@ -1454,8 +1454,8 @@ void MDSRank::rejoin_done()
>  void MDSRank::clientreplay_start()
>  {
>dout(1) << "clientreplay_start" << dendl;
> -  finish_contexts(g_ceph_context, waiting_for_replay);  // kick waiters
>mdcache->start_files_to_recover();
> +  finish_contexts(g_ceph_context, waiting_for_replay);  // kick waiters
>queue_one_replay();
>  }
> 
> @@ -1487,8 +1487,8 @@ void MDSRank::active_start()
> 
>mdcache->clean_open_file_lists();
>mdcache->export_remaining_imported_caps();
> -  finish_contexts(g_ceph_context, waiting_for_replay);  // kick waiters
>mdcache->start_files_to_recover();
> +  finish_contexts(g_ceph_context, waiting_for_replay);  // kick waiters
> 
>mdcache->reissue_all_caps();
>mdcache->activate_stray_manager();

AFAICT this patch has never been tested and never committed. Do you still think
this might fix the issue? Any hints on how we might reproduce it, i.e. fail the
active MDS and hit this specific recovery scenario?

We will happily apply this patch and do testing to check whether it really fixes
the issue.

Gr. Stefan

P.S. For my understanding: the MDS should never stop responding just because
these parameters are set, right?



-- 
| BIT BV  https://www.bit.nl/Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: slow using ISCSI - Help-me

2020-02-06 Thread Mike Christie
On 02/05/2020 07:03 AM, Gesiel Galvão Bernardes wrote:
> On Sun, 2 Feb 2020 at 00:37, Gesiel Galvão Bernardes
> <gesiel.bernar...@gmail.com> wrote:
> 
> Hi,
> 
> Just now it was possible to continue with this. Below is the information
> required. Thanks in advance!


Hey, sorry for the late reply. I just got back from PTO.

> 
> esxcli storage nmp device list -d naa.6001405ba48e0b99e4c418ca13506c8e
> naa.6001405ba48e0b99e4c418ca13506c8e
>Device Display Name: LIO-ORG iSCSI Disk
> (naa.6001405ba48e0b99e4c418ca13506c8e)
>Storage Array Type: VMW_SATP_ALUA
>Storage Array Type Device Config: {implicit_support=on;
> explicit_support=off; explicit_allow=on; alua_followover=on;
> action_OnRetryErrors=on; {TPG_id=1,TPG_state=ANO}}
>Path Selection Policy: VMW_PSP_MRU
>Path Selection Policy Device Config: Current Path=vmhba68:C0:T0:L0
>Path Selection Policy Device Custom Config:
>Working Paths: vmhba68:C0:T0:L0
>Is USB: false



> Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x2 0x4 0xa. Act:FAILOVER


Are you sure you are using tcmu-runner 1.4? Is that the actual daemon
version that is running? Did you by any chance install the 1.4 rpm, but you/it
did not restart the daemon? The error code above is returned by 1.3 and
earlier.

You are probably hitting a combo of 2 issues.

We had only listed ESX 6.5 in the docs you probably saw, and in 6.7 the
value of action_OnRetryErrors defaulted to on instead of off. You should
set this back to off.

You should also upgrade to the current version of tcmu-runner, 1.5.x. It
should fix the issue you are hitting, so that non-IO commands like INQUIRY,
RTPG, etc. are still executed while failing over/back, and you would not hit
the problem where path initialization and path-testing IO is failed, causing
the path to be marked as failed.


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Benefits of high RAM on a metadata server?

2020-02-06 Thread Matt Larson
Hi, we are planning out a Ceph storage cluster and are choosing
between 64GB, 128GB, or even 256GB of RAM on the metadata servers. We are
considering having 2 metadata servers overall.

Does going to high levels of RAM possibly yield any performance
benefits? Is there a size beyond which there are just diminishing
returns vs cost?

The expected use case would be for a cluster where there might be
10-20 concurrent users working on individual datasets of 5TB in size.
I expect there would be lots of reads of the 5TB datasets matched with
the creation of hundreds to thousands of smaller files during
processing of the images.

Thanks!
-Matt

-- 
Matt Larson, PhD
Madison, WI  53705 U.S.A.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Benefits of high RAM on a metadata server?

2020-02-06 Thread Bogdan Adrian Velica
Hi,
I am running 3 MDS servers (1 active and 2 standby, and I recommend that),
each with 128 GB of RAM (the clients are running ML analysis), and I have
about 20 million inodes loaded in RAM. It's working fine except for some
warnings: "client X is failing to respond to cache pressure."
Besides that there are no complaints, but I think you would need the 256GB
of RAM, especially if the datasets will increase...  just my 2 cents.

Will you have SSDs?



On Fri, Feb 7, 2020 at 12:02 AM Matt Larson  wrote:

> Hi, we are planning out a Ceph storage cluster and were choosing
> between 64GB, 128GB, or even 256GB on metadata servers. We are
> considering having 2 metadata servers overall.
>
> Does going to high levels of RAM possibly yield any performance
> benefits? Is there a size beyond which there are just diminishing
> returns vs cost?
>
> The expected use case would be for a cluster where there might be
> 10-20 concurrent users working on individual datasets of 5TB in size.
> I expect there would be lots of reads of the 5TB datasets matched with
> the creation of hundreds to thousands of smaller files during
> processing of the images.
>
> Thanks!
> -Matt
>
> --
> Matt Larson, PhD
> Madison, WI  53705 U.S.A.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Benefits of high RAM on a metadata server?

2020-02-06 Thread Wido den Hollander



On 2/6/20 11:01 PM, Matt Larson wrote:
> Hi, we are planning out a Ceph storage cluster and were choosing
> between 64GB, 128GB, or even 256GB on metadata servers. We are
> considering having 2 metadata servers overall.
> 
> Does going to high levels of RAM possibly yield any performance
> benefits? Is there a size beyond which there are just diminishing
> returns vs cost?
> 

The MDS will try to cache as many inodes as you allow it to.

So it's neither the number of users nor the total number of bytes that
matters; it's the number of inodes, that is: files and directories.

The more you have of those, the more memory it requires.

A lot of small files? A lot of memory!
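
The knob that actually controls how much of that RAM the MDS uses for caching
is mds_cache_memory_limit; the rest of the machine's memory is just headroom.
A sketch (the value is an example for a 128 GB box, and mds.<name> stands for
your active MDS):

  # let the MDS cache grow to ~96 GiB
  ceph config set mds mds_cache_memory_limit 103079215104
  # see how many inodes/caps that translates into
  ceph daemon mds.<name> cache status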

Wido

> The expected use case would be for a cluster where there might be
> 10-20 concurrent users working on individual datasets of 5TB in size.
> I expect there would be lots of reads of the 5TB datasets matched with
> the creation of hundreds to thousands of smaller files during
> processing of the images.
> 
> Thanks!
> -Matt
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Benefits of high RAM on a metadata server?

2020-02-06 Thread Matt Larson
Hi Bogdan,

Are the "client failing to respond" messages indicating that you
actually exceed the 128 GB ram on your MDS hosts?

The MDS servers are not planned to have SSD drives. The storage
servers would have HDDs and 1 NVMe SSD drive that could hold the metadata
volumes.


On Thu, Feb 6, 2020 at 4:11 PM Bogdan Adrian Velica  wrote:
>
> Hi,
> I am running 3 MDS servers (1 active and 2 standby, and I recommend that), 
> each with 128 GB of RAM (the clients are running ML analysis), and I have about 
> 20 million inodes loaded in RAM. It's working fine except for some warnings: 
> "client X is failing to respond to cache pressure."
> Besides that there are no complaints, but I think you would need the 256GB of 
> RAM, especially if the datasets will increase...  just my 2 cents.
>
> Will you have SSD ?
>
>
>
> On Fri, Feb 7, 2020 at 12:02 AM Matt Larson  wrote:
>>
>> Hi, we are planning out a Ceph storage cluster and were choosing
>> between 64GB, 128GB, or even 256GB on metadata servers. We are
>> considering having 2 metadata servers overall.
>>
>> Does going to high levels of RAM possibly yield any performance
>> benefits? Is there a size beyond which there are just diminishing
>> returns vs cost?
>>
>> The expected use case would be for a cluster where there might be
>> 10-20 concurrent users working on individual datasets of 5TB in size.
>> I expect there would be lots of reads of the 5TB datasets matched with
>> the creation of hundreds to thousands of smaller files during
>> processing of the images.
>>
>> Thanks!
>> -Matt
>>
>> --
>> Matt Larson, PhD
>> Madison, WI  53705 U.S.A.
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Matt Larson, PhD
Madison, WI  53705 U.S.A.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io