[ceph-users] Building a Ceph cluster with Ubuntu 18.04 and NVMe SSDs

2020-03-24 Thread Georg Schönberger

Hi Ceph users!

We are currently configuring our new production Ceph cluster and I have 
some questions regarding Ubuntu and NVMe SSDs.


Basic setup:
- Ubuntu 18.04 with HWE Kernel 5.3
- Deployment via ceph-ansible (Ceph stable "Nautilus")
- 5x Nodes with AMD EPYC 7402P CPUs
- 25Gbit/s NICs and switches for Ceph private and public network
- 4x Intel P4510 2TB NVMe SSDs (all flash) per Node

My questions:
1. Should we deploy more than one OSD per NVMe SSD? (as P4510's 
performance can sustain e.g. 2 OSDs)

2. Does anyone know NVMe specific Linux settings we should enable?
3. Can we use io_uring, if yes how can we enable it? Is it enough to set 
bluestore_iouring=true?


What I know so far:
Ad 1: My opinion is to use at least 2 OSDs per NVMe SSD, as the Intel 
P4510 is fast enough to serve parallel requests.
Please make sure to use the latest firmware version VDV10170 -> with 
version VDV10131 we had massive stalls on the Ceph side!
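If you go that route, ceph-ansible exposes the `osds_per_device` variable, and ceph-volume can do the split directly. A sketch (device paths are examples; `--report` makes it a dry run first):

```shell
# ceph-ansible: set "osds_per_device: 2" in group_vars/osds.yml.
# Manual equivalent with ceph-volume -- dry run with --report, then run for real:
ceph-volume lvm batch --report --osds-per-device 2 /dev/nvme0n1 /dev/nvme1n1
```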


Ad 2: I have already enabled NVMe polling queues, Ubuntu has disabled 
them by default:
Added nvme.poll_queues=1 to /etc/default/grub, then checked 
/sys/block/nvme1n1/queue/io_poll
Cf. 
https://lore.kernel.org/linux-block/20190318222133.GA24176@localhost.localdomain/
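For completeness, the steps above as a sketch (standard Ubuntu paths; device names are examples):

```shell
# /etc/default/grub -- append the module parameter, then apply and reboot:
#   GRUB_CMDLINE_LINUX_DEFAULT="... nvme.poll_queues=1"
sudo update-grub && sudo reboot

# After reboot, confirm polling is enabled for each namespace:
for dev in /sys/block/nvme*n1; do
    [ -e "$dev" ] || continue            # skip if no NVMe devices present
    echo "$dev: io_poll=$(cat "$dev/queue/io_poll")"
done
```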


Ad 3: This commit states it should be possible to use io_uring:
https://github.com/ceph/ceph/pull/27392
The PR also shows how to set bluestore_iouring=true, but it's not 
clear whether any more setup is required, such as liburing:

https://github.com/axboe/liburing
A presentation from Christoph Hellwig shows the advantages:
https://www.snia.org/sites/default/files/SDC/2019/presentations/NVMe/Hellwig_Christoph_Linux_NVMe_and_Block_Layer_Status_Update.pdf
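One note on the option name: in the merged code the setting appears to live at the block-device layer rather than under a bluestore_ prefix. A sketch, to be verified against your build (I have not tested this; the kernel and build requirements are my assumptions):

```
# ceph.conf, [osd] section -- likely needs kernel >= 5.1 and Ceph built
# with liburing support; verify the name with `ceph daemon osd.N config show`:
bdev_ioring = true
```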

Any help and inputs would be appreciated,
THX - Georg
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Questions on Ceph cluster without OS disks

2020-03-24 Thread Anthony D'Atri
I suspect Ceph is configured in their case to send all logs off-node to a 
central syslog server, ELK, etc.

With Jewel this seemed to result in daemons crashing, but probably it’s since 
been fixed (I haven’t tried).



> that is much less than I experienced of allocated disk space in case
> something is wrong with the cluster.
> I have defined at least 10GB and there were situations (in the past)
> when this space was quickly allocated by
> syslog
> user.log
> messages
> daemon.log


[ceph-users] Re: Questions on Ceph cluster without OS disks

2020-03-24 Thread Thomas Schneider
Hello Martin,

I suspect you're using a central syslog server.
Can you share which central syslog server you use?
Is this central server running on the Ceph cluster, too?

Regards
Thomas

On 23.03.2020 at 09:39, Martin Verges wrote:
> Hello Thomas,
>
> by default we allocate 1GB per Host on the Management Node, nothing on
> the PXE booted server.
>
> This value can be changed in the management container config file
> (/config/config.yml):
> > ...
> > logFilesPerServerGB: 1
> > ...
> After changing the config, you need to restart the mgmt container.
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io 
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
>
> Am Mo., 23. März 2020 um 09:30 Uhr schrieb Thomas Schneider
> <74cmo...@gmail.com >:
>
> Hello Martin,
>
> how much disk space do you reserve for log in the PXE setup?
>
> Regards
> Thomas
>
> Am 22.03.2020 um 20:50 schrieb Martin Verges:
> > Hello Samuel,
> >
> > we from croit.io  don't use NFS to boot up
> Servers. We copy the OS directly
> > into the RAM (approximately 0.5-1GB). Think of it like a
> container, you
> > start it and throw it away when you no longer need it.
> > This way we can save the slots of OS harddisks to add more
> storage per node
> > and reduce overall costs as 1GB ram is cheaper than an OS disk
> and consumes
> > less power.
> >
> > If our management node is down, nothing will happen to the
> cluster. No
> > impact, no downtime. However, you do need the mgmt node to boot
> up the
> > cluster. So after a very rare total power outage, your first
> system would
> > be the mgmt node and then the cluster itself. But again, if you
> configure
> > your systems correctly, no manual work is required to recover from
> that. For
> > everything else, it is possible (but definitely not needed) to
> deploy our
> > mgmt node in active/passive HA.
> >
> > We have multiple hundred installations worldwide in production
> > environments. Our strong PXE knowledge comes from more than 20
> years of
> > datacenter hosting experience and it never ever failed us in the
> last >10
> > years.
> >
> > The main benefits out of that:
> >  - Immutable OS freshly booted: Every host has exactly the same
> version,
> > same library, kernel, Ceph versions,...
> >  - OS is heavily tested by us: Every croit deployment has
> exactly the same
> > image. We can find errors much faster and hit much fewer errors.
> >  - Easy Update: Updating OS, Ceph or anything else is just a
> node reboot.
> > No cluster downtime, No service Impact, full automatic handling
> by our mgmt
> > Software.
> >  - No need to install OS: No maintenance costs, no labor
> required, no other
> > OS management required.
> >  - Centralized Logs/Stats: As it is booted in memory, all logs and
> > statistics are collected on a central place for easy access.
> >  - Easy to scale: It doesn't matter if you boot 3 or 300
> nodes, all
> > boot the exact same image in a few seconds.
> >  .. lots more
> >
> > Please do not hesitate to contact us directly. We always try to
> offer an
> > excellent service and are strongly customer oriented.
> >
> > --
> > Martin Verges
> > Managing director
> >
> > Mobile: +49 174 9335695
> > E-Mail: martin.ver...@croit.io 
> > Chat: https://t.me/MartinVerges
> >
> > croit GmbH, Freseniusstr. 31h, 81247 Munich
> > CEO: Martin Verges - VAT-ID: DE310638492
> > Com. register: Amtsgericht Munich HRB 231263
> >
> > Web: https://croit.io
> > YouTube: https://goo.gl/PGE1Bx
> >
> >
> > Am Sa., 21. März 2020 um 13:53 Uhr schrieb huxia...@horebdata.cn
>  <
> > huxia...@horebdata.cn >:
> >
> >> Hello, Martin,
> >>
> >> I notice that Croit advocate the use of ceph cluster without OS
> disks, but
> >> with PXE boot.
> >>
> >> Do you use a NFS server to serve the root file system for each
> node? such
> >> as hosting configuration files, user and password, log files,
> etc. My
> >> question is, will the NFS server be a single point of failure?
> If the NFS
> >> server goes down, the network experience any outage, ceph nodes
> may not be
> >> able to write to the local file systems, possibly leading to
> service outage.
> >>
> >> How do you deal with the above potential issues in prod

[ceph-users] rbd-mirror -> how far behind_master am i time wise?

2020-03-24 Thread Ml Ml
Hello List,

I use rbd-mirror and asynchronously mirror to my backup cluster.
My backup cluster only has "spinning rust" and won't always be able to
perform like the live cluster.

That is fine for me, as long as it's not more than 12h behind.

vm-194-disk-1:
  global_id:   7a95730f-451c-4973-8038-2a59e29ac5ad
  state:   up+replaying
  description: replaying, master_position=[object_number=1046,
tag_tid=4, entry_tid=936210], mirror_position=[object_number=911,
tag_tid=4, entry_tid=815131], entries_behind_master=121079
  last_update: 2020-03-24 08:43:43

I learned that entries_behind_master counts single transactions. But
what I am really interested in is: how far am I behind time-wise?
Is there a way to tell this?
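As far as I know the status output carries no timestamp, but you can estimate: sample entries_behind_master twice and divide by the replay rate. A sketch using the status text above (field name taken from that output):

```shell
# Extract entries_behind_master from one status sample (text from above):
status='mirror_position=[object_number=911, tag_tid=4, entry_tid=815131], entries_behind_master=121079'
behind=$(printf '%s' "$status" | grep -oE 'entries_behind_master=[0-9]+' | cut -d= -f2)
echo "$behind entries behind"

# With two samples A and B taken INTERVAL seconds apart while replaying:
#   rate = (A - B) / INTERVAL     # entries replayed per second
#   lag ~= B / rate               # rough seconds behind, if the rate is steady
```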

Thanks,
Michael


[ceph-users] Re: Cephfs mount error 1 = Operation not permitted

2020-03-24 Thread Eugen Block
I suppose the correct syntax is that anything after "client." is the  
name? So:


ceph fs authorize cephfs client.bob / r / rw

Would authorize a client named bob?


Yes, exactly:

admin:~ # ceph fs authorize cephfs client.bob / r / rw
[client.bob]
key = AQAyw3leAv9tKxAA+wtNEa40yK6svPE/VPlqdA==

admin:~ # mount -t ceph mon1:/ /mnt/ -o  
name=bob,secret=AQAyw3leAv9tKxAA+wtNEa40yK6svPE/VPlqdA==

admin:~ # touch /mnt/file
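To spell out the pattern with a secret file, as in the original mount attempt (names and paths are examples):

```shell
ceph fs authorize cephfs client.bob / rw          # creates auth entity "client.bob"
ceph auth get-key client.bob > /root/bob.secret   # bare key, no [client.bob] header
mount -t ceph mon1:6789:/ /mnt -o name=bob,secretfile=/root/bob.secret
```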


Quoting "Dungan, Scott A.":

That was it! I am not sure how I got confused with the client name  
syntax. When I issued the command to create a client key, I used:


ceph fs authorize cephfs client.1 / r / rw

I assumed from the syntax that my client name is "client.1"

I suppose the correct syntax is that anything after "client." is the  
name? So:


ceph fs authorize cephfs client.bob / r / rw

Would authorize a client named bob?

-Scott

From: Eugen Block 
Sent: Monday, March 23, 2020 11:30 AM
To: Dungan, Scott A. 
Cc: Yan, Zheng ; ceph-users@ceph.io 
Subject: Re: [ceph-users] Re: Cephfs mount error 1 = Operation not permitted

Wait, your client name is just "1"? In that case you need to specify
that in your mount command:

mount ... -o name=1,secret=...

It has to match your ceph auth settings, where "client" is only a
prefix and is followed by the client's name:

[client.1]


Quoting "Dungan, Scott A.":


Tried that:

[client.1]
key = ***
caps mds = "allow rw path=/"
caps mon = "allow r"
caps osd = "allow rw tag cephfs pool=meta_data, allow rw pool=data"

No change.



From: Yan, Zheng 
Sent: Sunday, March 22, 2020 9:28 PM
To: Dungan, Scott A. 
Cc: Eugen Block ; ceph-users@ceph.io 
Subject: Re: [ceph-users] Re: Cephfs mount error 1 = Operation not permitted

On Sun, Mar 22, 2020 at 8:21 AM Dungan, Scott A.  
 wrote:


Eugen, thanks for the tips.

I tried appending the key directly in the mount command
(secret=) and that produced the same error.

I took a look at the thread you suggested and ran the commands
that Paul at Croit suggested, even though the Ceph dashboard
showed "cephfs" as already set as the application on both my data
and metadata pools:

[root@ceph-n4 ~]# ceph osd pool application set data cephfs data cephfs
set application 'cephfs' key 'data' to 'cephfs' on pool 'data'
[root@ceph-n4 ~]# ceph osd pool application set meta_data cephfs
metadata cephfs
set application 'cephfs' key 'metadata' to 'cephfs' on pool 'meta_data'

No change. I get the "mount error 1 = Operation not permitted"
error the same as before.

I also tried manually editing the caps osd pool tags for my
client.1, to allow rw to both the data pool as well as the metadata
pool, as suggested further in the thread:

[client.1]
key = ***
caps mds = "allow rw path=all"



try replacing this with  "allow rw path=/"


caps mon = "allow r"
caps osd = "allow rw tag cephfs pool=meta_data, allow rw pool=data"

No change.


From: Eugen Block 
Sent: Saturday, March 21, 2020 1:16 PM
To: ceph-users@ceph.io 
Subject: [ceph-users] Re: Cephfs mount error 1 = Operation not permitted

I just remembered there was a thread [1] about that a couple of weeks
ago. Seems like you need to add the capabilities to the client.

[1]
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/23FDDSYBCDVMYGCUTALACPFAJYITLOHJ/#I6LJR72AJGOCGINVOVEVSCKRIWV5TTZ2


Quoting Eugen Block:

> Hi,
>
> have you tried to mount with the secret only instead of a secret file?
>
> mount -t ceph ceph-n4:6789:/ /ceph -o name=client.1,secret=
>
> If that works your secret file is not right. If not you should check
> if the client actually has access to the cephfs pools ('ceph auth
> list').
>
>
>
> Quoting "Dungan, Scott A.":
>
>> I am still very new to ceph and I have just set up my first small
>> test cluster. I have Cephfs enabled (named cephfs) and everything
>> is good in the dashboard. I added an authorized user key for cephfs
>> with:
>>
>> ceph fs authorize cephfs client.1 / r / rw
>>
>> I then copied the key to a file with:
>>
>> ceph auth get-key client.1 > /tmp/client.1.secret
>>
>> Copied the file over to the client and then attempted to mount with the
>> kernel driver:
>>
>> mount -t ceph ceph-n4:6789:/ /ceph -o
>> name=client.1,secretfile=/root/client.1.secret
>> mount error 1 = Operation not permitted
>>
>> I looked in the logs on the mds (which is also the mgr and mon for
>> the cluster) and I don't see any events logged for this. I also
>> tried the mount command with verbose and I didn't get any further
>> detail. Any tips would be most appreciated.
>>
>> --
>>
>> Scott Dungan
>> California Institute of Technology
>> Office: (626) 395-3170
>> sdun...@caltech.edu
>>

[ceph-users] Re: Questions on Ceph cluster without OS disks

2020-03-24 Thread Martin Verges
Hello Thomas,

we export the logs using systemd-journal-remote / -upload. Long-term
retention can be done by configuring an external syslog / ELK / .. using our
config file.
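For reference, the plain-systemd version of that pipeline looks roughly like this (host name is an example; 19532 is the systemd-journal-remote default port):

```
# On each PXE-booted node: /etc/systemd/journal-upload.conf
[Upload]
URL=http://mgmt.example.com:19532

# On the management node, receive the uploaded journals:
systemctl enable --now systemd-journal-remote.socket
```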

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


On Tue., 24 March 2020 at 08:47, Thomas Schneider <
74cmo...@gmail.com>:

> Hello Martin,
>
> I suspect you're using a central syslog server.
> Can you share information which central syslog server you use?
> Is this central server running on ceph cluster, too?
>
> Regards
> Thomas
>
> Am 23.03.2020 um 09:39 schrieb Martin Verges:
>
> Hello Thomas,
>
> by default we allocate 1GB per Host on the Management Node, nothing on the
> PXE booted server.
>
> This value can be changed in the management container config file
> (/config/config.yml):
> > ...
> > logFilesPerServerGB: 1
> > ...
> After changing the config, you need to restart the mgmt container.
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
>
> Am Mo., 23. März 2020 um 09:30 Uhr schrieb Thomas Schneider <
> 74cmo...@gmail.com>:
>
>> Hello Martin,
>>
>> how much disk space do you reserve for log in the PXE setup?
>>
>> Regards
>> Thomas
>>
>> Am 22.03.2020 um 20:50 schrieb Martin Verges:
>> > Hello Samuel,
>> >
>> > we from croit.io don't use NFS to boot up Servers. We copy the OS
>> directly
>> > into the RAM (approximately 0.5-1GB). Think of it like a container, you
>> > start it and throw it away when you no longer need it.
>> > This way we can save the slots of OS harddisks to add more storage per
>> node
>> > and reduce overall costs as 1GB ram is cheaper than an OS disk and
>> consumes
>> > less power.
>> >
>> > If our management node is down, nothing will happen to the cluster. No
>> > impact, no downtime. However, you do need the mgmt node to boot up the
>> > cluster. So after a very rare total power outage, your first system
>> would
>> > be the mgmt node and then the cluster itself. But again, if you
>> configure
>> > your systems correctly, no manual work is required to recover from that.
>> For
>> > everything else, it is possible (but definitely not needed) to deploy
>> our
>> > mgmt node in active/passive HA.
>> >
>> > We have multiple hundred installations worldwide in production
>> > environments. Our strong PXE knowledge comes from more than 20 years of
>> > datacenter hosting experience and it never ever failed us in the last
>> >10
>> > years.
>> >
>> > The main benefits out of that:
>> >  - Immutable OS freshly booted: Every host has exactly the same version,
>> > same library, kernel, Ceph versions,...
>> >  - OS is heavily tested by us: Every croit deployment has exactly the
>> same
>> > image. We can find errors much faster and hit much fewer errors.
>> >  - Easy Update: Updating OS, Ceph or anything else is just a node
>> reboot.
>> > No cluster downtime, No service Impact, full automatic handling by our
>> mgmt
>> > Software.
>> >  - No need to install OS: No maintenance costs, no labor required, no
>> other
>> > OS management required.
>> >  - Centralized Logs/Stats: As it is booted in memory, all logs and
>> > statistics are collected on a central place for easy access.
>> >  - Easy to scale: It doesn't matter if you boot 3 or 300 nodes, all
>> > boot the exact same image in a few seconds.
>> >  .. lots more
>> >
>> > Please do not hesitate to contact us directly. We always try to offer an
>> > excellent service and are strongly customer oriented.
>> >
>> > --
>> > Martin Verges
>> > Managing director
>> >
>> > Mobile: +49 174 9335695
>> > E-Mail: martin.ver...@croit.io
>> > Chat: https://t.me/MartinVerges
>> >
>> > croit GmbH, Freseniusstr. 31h, 81247 Munich
>> > CEO: Martin Verges - VAT-ID: DE310638492
>> > Com. register: Amtsgericht Munich HRB 231263
>> >
>> > Web: https://croit.io
>> > YouTube: https://goo.gl/PGE1Bx
>> >
>> >
>> > Am Sa., 21. März 2020 um 13:53 Uhr schrieb huxia...@horebdata.cn <
>> > huxia...@horebdata.cn>:
>> >
>> >> Hello, Martin,
>> >>
>> >> I notice that Croit advocate the use of ceph cluster without OS disks,
>> but
>> >> with PXE boot.
>> >>
>> >> Do you use a NFS server to serve the root file system for each node?
>> such
>> >> as hosting configuration files, user and password, log files, etc. My
>> >> question is, will the NFS server be a single point of failure? If the
>> NFS
>> >> server goes down, the network experience any outage, ceph nodes may
>> not be
>> >> able to write to the local file systems, possibly leading to serv

[ceph-users] Re: Questions on Ceph cluster without OS disks

2020-03-24 Thread Marc Roos

The default rsyslog in centos has been able to do remote logging for 
many years. 


-Original Message-
Cc: ceph-users
Subject: [ceph-users] Re: Questions on Ceph cluster without OS disks

Hello Martin,

I suspect you're using a central syslog server.
Can you share information which central syslog server you use?
Is this central server running on ceph cluster, too?

Regards
Thomas

Am 23.03.2020 um 09:39 schrieb Martin Verges:
> Hello Thomas,
>
> by default we allocate 1GB per Host on the Management Node, nothing on 

> the PXE booted server.
>
> This value can be changed in the management container config file
> (/config/config.yml):
> > ...
> > logFilesPerServerGB: 1
> > ...
> After changing the config, you need to restart the mgmt container.
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io 
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492 Com. register: Amtsgericht 
> Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
>
> Am Mo., 23. März 2020 um 09:30 Uhr schrieb Thomas Schneider 
> <74cmo...@gmail.com >:
>
> Hello Martin,
>
> how much disk space do you reserve for log in the PXE setup?
>
> Regards
> Thomas
>
> Am 22.03.2020 um 20:50 schrieb Martin Verges:
> > Hello Samuel,
> >
> > we from croit.io  don't use NFS to boot up
> Servers. We copy the OS directly
> > into the RAM (approximately 0.5-1GB). Think of it like a
> container, you
> > start it and throw it away when you no longer need it.
> > This way we can save the slots of OS harddisks to add more
> storage per node
> > and reduce overall costs as 1GB ram is cheaper than an OS disk
> and consumes
> > less power.
> >
> > If our management node is down, nothing will happen to the
> cluster. No
> > impact, no downtime. However, you do need the mgmt node to boot
> up the
> > cluster. So after a very rare total power outage, your first
> system would
> > be the mgmt node and then the cluster itself. But again, if you
> configure
> > your systems correctly, no manual work is required to recover from
> that. For
> > everything else, it is possible (but definitely not needed) to
> deploy our
> > mgmt node in active/passive HA.
> >
> > We have multiple hundred installations worldwide in production
> > environments. Our strong PXE knowledge comes from more than 20
> years of
> > datacenter hosting experience and it never ever failed us in the
> last >10
> > years.
> >
> > The main benefits out of that:
> >  - Immutable OS freshly booted: Every host has exactly the same
> version,
> > same library, kernel, Ceph versions,...
> >  - OS is heavily tested by us: Every croit deployment has
> exactly the same
> > image. We can find errors much faster and hit much fewer errors.
> >  - Easy Update: Updating OS, Ceph or anything else is just a
> node reboot.
> > No cluster downtime, No service Impact, full automatic handling
> by our mgmt
> > Software.
> >  - No need to install OS: No maintenance costs, no labor
> required, no other
> > OS management required.
> >  - Centralized Logs/Stats: As it is booted in memory, all logs 
and
> > statistics are collected on a central place for easy access.
> >  - Easy to scale: It doesn't matter if you boot 3 or 300
> nodes, all
> > boot the exact same image in a few seconds.
> >  .. lots more
> >
> > Please do not hesitate to contact us directly. We always try to
> offer an
> > excellent service and are strongly customer oriented.
> >
> > --
> > Martin Verges
> > Managing director
> >
> > Mobile: +49 174 9335695
> > E-Mail: martin.ver...@croit.io 
> > Chat: https://t.me/MartinVerges
> >
> > croit GmbH, Freseniusstr. 31h, 81247 Munich
> > CEO: Martin Verges - VAT-ID: DE310638492
> > Com. register: Amtsgericht Munich HRB 231263
> >
> > Web: https://croit.io
> > YouTube: https://goo.gl/PGE1Bx
> >
> >
> > Am Sa., 21. März 2020 um 13:53 Uhr schrieb huxia...@horebdata.cn
>  <
> > huxia...@horebdata.cn >:
> >
> >> Hello, Martin,
> >>
> >> I notice that Croit advocate the use of ceph cluster without OS
> disks, but
> >> with PXE boot.
> >>
> >> Do you use a NFS server to serve the root file system for each
> node? such
> >> as hosting configuration files, user and password, log files,
> etc. My
> >> question is, will the NFS server be a single point of failure?
> If the NFS
> >> server goes down, the network experience an

[ceph-users] Re: MGRs failing once per day and generally slow response times

2020-03-24 Thread Janek Bevendorff
For anybody finding this thread via Google or something, here's a link
to a (so far unresolved) bug report: https://tracker.ceph.com/issues/39264


On 19/03/2020 17:37, Janek Bevendorff wrote:
> Sorry for nagging, but is there a solution to this? Routinely restarting
> my MGRs every few hours isn't how I want to spend my time (although I
> guess I could schedule a cron job for that).
>
>
> On 16/03/2020 09:35, Janek Bevendorff wrote:
>> Over the weekend, all five MGRs failed, which means we have no more
>> Prometheus monitoring data. We are obviously monitoring the MGR status
>> as well, so we can detect the failure, but it's still a pretty serious
>> issue. Any ideas as to why this might happen?
>>
>>
>> On 13/03/2020 16:56, Janek Bevendorff wrote:
>>> Indeed. I just had another MGR go bye-bye. I don't think host clock
>>> skew is the problem.
>>>
>>>
>>> On 13/03/2020 15:29, Anthony D'Atri wrote:
 Chrony does converge faster, but I doubt this will solve your
 problem if you don’t have quality peers. Or if it’s not really a
 time problem.

> On Mar 13, 2020, at 6:44 AM, Janek Bevendorff
>  wrote:
>
> I replaced ntpd with chronyd and will let you know if it changes
> anything. Thanks.
>
>
>> On 13/03/2020 06:25, Konstantin Shalygin wrote:
>>> On 3/13/20 12:57 AM, Janek Bevendorff wrote:
>>> NTPd is running, all the nodes have the same time to the second.
>>> I don't think that is the problem.
>> As always in such cases - try to switch your ntpd to default EL7
>> daemon - chronyd.
>>
>>
>>
>> k


[ceph-users] Re: Newbie to Ceph jacked up his monitor

2020-03-24 Thread Eneko Lacunza

Hi Jarett,

On 23/3/20 at 3:52, Jarett DeAngelis wrote:

So, I thought I’d post with what I learned re: what to do with this problem.

This system is a 3-node Proxmox cluster, and each node had:

1 x 1TB NVMe
2 x 512GB HDD

I had maybe 100GB of data in this system total. Then I added:

2 x 256GB SSD
1 x 1TB HDD

To each system, and let it start rebalancing. When it started the management 
interface showed the storage as being out of order in various ways, but it was 
clear that Ceph was rebalancing PGs across the 3 nodes and the “broken” part of 
the graphic display was shrinking as it spread data across the added OSDs.

In the process, however, the monitors racked up ENORMOUS amounts of files. On 
one machine, the boot drive only has 64GB of space total so the partition where 
/var/lib/ceph/somethingsomething.db lived was only 27GB. This filled up very, 
very fast, and eventually killed the monitor on that node. I figured out you 
can `ceph-monstore-tool compact` or `ceph-kvstore-tool rocksdb /path compact` 
to get the system to truncate the files in there, but even when I scheduled 
those jobs to run on each monitor every minute the amount of space being taken 
up by those rocksdb files grew and grew until they threatened to kill the 
monitors on the nodes with larger amounts of space too. Other, dumber measures 
I tried taking to give the system more space for these files ended up screwing 
up my Proxmox system, so now I have to reinstall.

What can be done about this problem so that I don’t have this issue when I try 
to implement again?

This is a bit light on detail to understand well what happened. I'll 
suppose you added the new disks all at once, without waiting for the 
rebalancing to finish?


I'd suggest:

- Use same size disks. If you add new disks, don't add smaller disks.
- Add one disk (OSD) at a time.

We usually use 20GB root partitions for monitors, never had any problem 
with disk size (small clusters like yours).
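For the monitor store growth Jarett describes, the offline compaction he mentions can be scripted like this (paths assume the default mon id equal to the short hostname; stop the mon before compacting):

```shell
# Offline compaction of one mon's RocksDB store:
systemctl stop ceph-mon@$(hostname -s)
ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-$(hostname -s)/store.db compact
systemctl start ceph-mon@$(hostname -s)

# Or have mons compact their store on every start:
ceph config set mon mon_compact_on_start true
```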


Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarragako bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es


[ceph-users] Re: RGW failing to create bucket

2020-03-24 Thread Abhinav Singh
anyone?

On Mon, 23 Mar 2020, 23:39 Abhinav Singh, 
wrote:

> please someone help me
>
> On Mon, 23 Mar 2020, 19:44 Abhinav Singh, 
> wrote:
>
>>
>>
>> -- Forwarded message -
>> From: Abhinav Singh 
>> Date: Mon, Mar 23, 2020 at 7:43 PM
>> Subject: RGW failing to create bucket
>> To: 
>>
>>
>> ceph : octopus
>> JaegerTracing : master
>> ubuntu : 18.04
>>
>> When I implement Jaeger tracing it is unable to create a bucket.
>> (I am using Swift to perform the testing.)
>> /src/librados/IoCtxImpl.cc
>>
>> ```
>> void librados::IoCtxImpl::queue_aio_write(AioCompletionImpl *c)
>> {
>> std::cout << "yes" << std::endl;
>> JTracer tracer;
>> tracer.initTracer("Writing Started",
>> "/home/abhinav/Desktop/GSOC/deepika/ceph/src/librados/tracerConfig.yaml"
>> );
>> Span span=tracer.newSpan("writing started");
>> span->Finish();
>> try{
>> auto yaml = YAML::LoadFile("tracerConfig.yaml");
>> }catch(const YAML::ParserException& pe){
>> // ldout(client->cct, 20) << pe.what() << dendl;
>> std::cout << pe.what() << std::endl;
>> ofstream f;
>> f.open("/home/abhinav/Desktop/err.txt");
>> f << pe.what();
>> f.close();
>> }
>> // auto config = jaegertracing::Config::parse(yaml);
>> // auto tracer=jaegertracing::Tracer::make(
>> // "Writing",
>> // config,
>> // jaegertracing::logging::consoleLogger()
>> // );
>> // opentracing::Tracer::InitGlobal(
>> // static_pointer_cast<opentracing::Tracer>(tracer)
>> // );
>> // auto span = opentracing::Tracer::Global()->StartSpan("Span1");
>> get();
>> ofstream file;
>> file.open("/home/abhinav/Desktop/write.txt", std::ios::out | std::ios::app);
>> file<<"Writing /src/librados/IoCtxImpl.cc 310.\n";
>> file.close();
>> std::scoped_lock l{aio_write_list_lock};
>> ceph_assert(c->io == this);
>> c->aio_write_seq = ++aio_write_seq;
>> ldout(client->cct, 20) << "queue_aio_write " << this << " completion "
>> << c
>> << " write_seq " << aio_write_seq << dendl;
>> aio_write_list.push_back(&c->aio_write_list_item);
>> // opentracing::Tracer::Global()->Close();
>> }
>> ```
>>  /include/tracer.h
>> ```
>> typedef std::unique_ptr<opentracing::Span> Span;
>>
>> class JTracer{
>> public:
>> JTracer(){}
>> ~JTracer(){
>> opentracing::Tracer::Global()->Close();
>> }
>> void static inline loadYamlConfigFile(const char* path){
>> return;
>> }
>> void initTracer(const char* tracerName,const char* filePath){
>> auto yaml = YAML::LoadFile(filePath);
>> auto configuration = jaegertracing::Config::parse(yaml);
>> auto tracer = jaegertracing::Tracer::make(
>> tracerName,
>> configuration,
>> jaegertracing::logging::consoleLogger());
>> opentracing::Tracer::InitGlobal(
>> std::static_pointer_cast<opentracing::Tracer>(tracer));
>> Span s=opentracing::Tracer::Global()->StartSpan("Testing");
>> s->Finish();
>> }
>> Span newSpan(const char* spanName){
>> Span span=opentracing::Tracer::Global()->StartSpan(spanName);
>> return std::move(span);
>> }
>> Span childSpan(const char* spanName,const Span& parentSpan){
>> Span span = opentracing::Tracer::Global()->StartSpan(spanName, {
>> opentracing::ChildOf(&parentSpan->context())});
>> return std::move(span);
>> }
>> Span followUpSpan(const char *spanName, const Span& parentSpan){
>> Span span = opentracing::Tracer::Global()->StartSpan(spanName, {
>> opentracing::FollowsFrom(&parentSpan->context())});
>> return std::move(span);
>> }
>> };
>> ```
>>
>> Output when trying to create new container
>>
>> ```
>> errno 111 connection refused
>> ```
>> But when I remove the tracer part in IoCtxImpl.cc it is working fine.
>>
>> I am new to Ceph and don't know what information to share to correctly
>> track down the problem; if any extra information is needed I will share it
>> immediately.
>>
>> Been stuck into this issue for one week.
>> Please someone help me!
>>
>> Thank you.
>>
>


[ceph-users] Re: RGW failing to create bucket

2020-03-24 Thread Casey Bodley
On Tue, Mar 24, 2020 at 6:14 AM Abhinav Singh 
wrote:

> anyone?
>
> On Mon, 23 Mar 2020, 23:39 Abhinav Singh, 
> wrote:
>
> > please someone help me
> >
> > On Mon, 23 Mar 2020, 19:44 Abhinav Singh, 
> > wrote:
> >
> >>
> >>
> >> -- Forwarded message -
> >> From: Abhinav Singh 
> >> Date: Mon, Mar 23, 2020 at 7:43 PM
> >> Subject: RGW failing to create bucket
> >> To: 
> >>
> >>
> >> ceph : octopus
> >> JaegerTracing : master
> >> ubuntu : 18.04
> >>
> >> When I implementing jaeger tracing it is unable to create a bucket.
> >> (I m using swif to perform testing.)
> >> /src/librados/IoCtxImpl.cc
> >>
> >> ```
> >> void librados::IoCtxImpl::queue_aio_write(AioCompletionImpl *c)
> >> {
> >>   std::cout << "yes" << std::endl;
> >>   JTracer tracer;
> >>   tracer.initTracer("Writing Started",
> >>     "/home/abhinav/Desktop/GSOC/deepika/ceph/src/librados/tracerConfig.yaml");
> >>   Span span = tracer.newSpan("writing started");
> >>   span->Finish();
> >>   try {
> >>     auto yaml = YAML::LoadFile("tracerConfig.yaml");
> >>   } catch (const YAML::ParserException& pe) {
> >>     // ldout(client->cct, 20) << pe.what() << dendl;
> >>     std::cout << pe.what() << std::endl;
> >>     ofstream f;
> >>     f.open("/home/abhinav/Desktop/err.txt");
> >>     f << pe.what();
> >>     f.close();
> >>   }
> >>   // auto config = jaegertracing::Config::parse(yaml);
> >>   // auto tracer = jaegertracing::Tracer::make(
> >>   //     "Writing",
> >>   //     config,
> >>   //     jaegertracing::logging::consoleLogger());
> >>   // opentracing::Tracer::InitGlobal(
> >>   //     static_pointer_cast<opentracing::Tracer>(tracer));
> >>   // auto span = opentracing::Tracer::Global()->StartSpan("Span1");
> >>   get();
> >>   ofstream file;
> >>   file.open("/home/abhinav/Desktop/write.txt", std::ios::out | std::ios::app);
> >>   file << "Writing /src/librados/IoCtxImpl.cc 310.\n";
> >>   file.close();
> >>   std::scoped_lock l{aio_write_list_lock};
> >>   ceph_assert(c->io == this);
> >>   c->aio_write_seq = ++aio_write_seq;
> >>   ldout(client->cct, 20) << "queue_aio_write " << this << " completion " << c
> >>                          << " write_seq " << aio_write_seq << dendl;
> >>   aio_write_list.push_back(&c->aio_write_list_item);
> >>   // opentracing::Tracer::Global()->Close();
> >> }
> >> ```
> >>  /include/tracer.h
> >> ```
> >> typedef std::unique_ptr<opentracing::Span> Span;
> >>
> >> class JTracer {
> >> public:
> >>   JTracer() {}
> >>   ~JTracer() {
> >>     opentracing::Tracer::Global()->Close();
> >>   }
> >>   static inline void loadYamlConfigFile(const char* path) {
> >>     return;
> >>   }
> >>   void initTracer(const char* tracerName, const char* filePath) {
> >>     auto yaml = YAML::LoadFile(filePath);
> >>     auto configuration = jaegertracing::Config::parse(yaml);
> >>     auto tracer = jaegertracing::Tracer::make(
> >>         tracerName,
> >>         configuration,
> >>         jaegertracing::logging::consoleLogger());
> >>     opentracing::Tracer::InitGlobal(
> >>         std::static_pointer_cast<opentracing::Tracer>(tracer));
> >>     Span s = opentracing::Tracer::Global()->StartSpan("Testing");
> >>     s->Finish();
> >>   }
> >>   Span newSpan(const char* spanName) {
> >>     Span span = opentracing::Tracer::Global()->StartSpan(spanName);
> >>     return span;
> >>   }
> >>   Span childSpan(const char* spanName, const Span& parentSpan) {
> >>     Span span = opentracing::Tracer::Global()->StartSpan(spanName,
> >>         {opentracing::ChildOf(&parentSpan->context())});
> >>     return span;
> >>   }
> >>   Span followUpSpan(const char* spanName, const Span& parentSpan) {
> >>     Span span = opentracing::Tracer::Global()->StartSpan(spanName,
> >>         {opentracing::FollowsFrom(&parentSpan->context())});
> >>     return span;
> >>   }
> >> };
> >> ```
> >>
> >> Output when trying to create new container
> >>
> >> ```
> >> errno 111 connection refused
> >> ```
>

Connection refused probably means that radosgw crashed or isn't running. If
it crashed, you might find out why by looking at its log file.
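For context, errno 111 is ECONNREFUSED: nothing was accepting connections at the address the client dialed, which fits a crashed or stopped radosgw. A minimal Python illustration (the localhost port here is an arbitrary example assumed to have no listener):

```python
import errno
import socket

def dial(host: str, port: int, timeout: float = 1.0) -> int:
    """Try to connect; return 0 on success or the errno on failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 0
    except OSError as exc:
        return exc.errno

# Dialing a local port with no listener yields ECONNREFUSED (111 on Linux),
# the same "errno 111 connection refused" seen when radosgw is down.
result = dial("127.0.0.1", 1)
print(result, errno.errorcode.get(result))
```

The same check against the radosgw endpoint is a quick way to confirm whether the daemon is listening at all before digging into its logs.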

>> But when I remove the tracer part in IoCtxImpl.cc it is working fine.
> >>
> >> I'm new to Ceph and don't know what information to share to correctly
> >> track down the problem; if any extra information is needed, I will share
> >> it instantly.
> >>
> >> Been stuck into this issue for one week.
> >> Please someone help me!
> >>
> >> Thank you.
> >>
> >


[ceph-users] Re: multi-node NFS Ganesha + libcephfs caching

2020-03-24 Thread Daniel Gryniewicz




On 3/23/20 4:31 PM, Maged Mokhtar wrote:


On 23/03/2020 20:50, Jeff Layton wrote:

On Mon, 2020-03-23 at 15:49 +0200, Maged Mokhtar wrote:

Hello all,

For multi-node NFS Ganesha over CephFS, is it OK to leave libcephfs 
write caching on, or should it be configured off for failover ?



You can do libcephfs write caching, as the caps would need to be
recalled for any competing access. What you really want to avoid is any
sort of caching at the ganesha daemon layer.


Hi Jeff,

Thanks for your reply. I meant caching by libcephfs as used within the 
ganesha Ceph FSAL plugin; I am not sure from your reply whether this is 
what you refer to as the "ganesha daemon layer" (or does the latter mean 
the internal mdcache in ganesha?). I would really appreciate it if you 
could clarify this point.


Caching in libcephfs is fine, it's caching above the FSAL layer that you 
should avoid.




I really have doubts that it is safe to leave write caching in the 
plugin and have safe failover, yet I see comments in the conf file such as:

# The libcephfs client will aggressively cache information while it
# can, so there is little benefit to ganesha actively caching the same
# objects.

Or is it up to the NFS client to issue cache syncs and re-submit writes 
if it detects failover ?


Correct.  During failover, NFS will go into its grace period, which 
blocks new state and allows the NFS clients to re-acquire their state 
(opens, locks, delegations, etc.).  This includes re-sending any 
non-committed writes (commits cause the data to be saved to the 
cluster, not just the libcephfs cache).  Once this is all done, normal 
operation proceeds.  It should be safe, even with caching in libcephfs.


Daniel


[ceph-users] v15.2.0 Octopus released

2020-03-24 Thread Abhishek Lekshmanan

We're happy to announce the first stable release of Octopus v15.2.0.
There are a lot of changes and new features added, we advise everyone to
read the release notes carefully, and in particular the upgrade notes,
before upgrading. Please refer to the official blog entry
https://ceph.io/releases/v15-2-0-octopus-released/ for a detailed
version with links & changelog.

This release wouldn't have been possible without the support of the
community, this release saw contributions from over 330 developers & 80
organizations, and we thank everyone for making this release happen.

Major Changes from Nautilus
---
General
~~~
* A new deployment tool called **cephadm** has been introduced that
  integrates Ceph daemon deployment and management via containers
  into the orchestration layer. 
* Health alerts can now be muted, either temporarily or permanently.
* Health alerts are now raised for recent Ceph daemon crashes.
* A simple 'alerts' module has been introduced to send email
  health alerts for clusters deployed without the benefit of an
  existing external monitoring infrastructure.
* Packages are built for the following distributions:
  - CentOS 8
  - CentOS 7 (partial--see below)
  - Ubuntu 18.04 (Bionic)
  - Debian Buster
  - Container images (based on CentOS 8)

  Note that the dashboard, prometheus, and restful manager modules
  will not work on the CentOS 7 build due to Python 3 module
  dependencies that are missing in CentOS 7.

  Besides this, packages built by the community will also be available for
  the following distros:
  - Fedora (33/rawhide)
  - openSUSE (15.2, Tumbleweed)

Dashboard
~
The mgr-dashboard has gained a lot of new features and functionality:

* UI Enhancements
  - New vertical navigation bar
  - New unified sidebar: better background task and events notification
  - Shows all progress mgr module notifications
  - Multi-select on tables to perform bulk operations

* Dashboard user account security enhancements
  - Disabling/enabling existing user accounts
  - Clone an existing user role
  - Users can change their own password
  - Configurable password policies: Minimum password complexity/length
requirements
  - Configurable password expiration
  - Change password after first login

New and enhanced management of Ceph features/services:

* OSD/device management
  - List all disks associated with an OSD
  - Add support for blinking enclosure LEDs via the orchestrator
  - List all hosts known by the orchestrator
  - List all disks and their properties attached to a node
  - Display disk health information (health prediction and SMART data)
  - Deploy new OSDs on new disks/hosts
  - Display and allow sorting by an OSD's default device class in the OSD
table
  - Explicitly set/change the device class of an OSD, display and sort OSDs by
device class

* Pool management
  - Viewing and setting pool quotas
  - Define and change per-pool PG autoscaling mode

* RGW management enhancements
  - Enable bucket versioning
  - Enable MFA support
  - Select placement target on bucket creation

* CephFS management enhancements
  - CephFS client eviction
  - CephFS snapshot management
  - CephFS quota management
  - Browse CephFS directory

* iSCSI management enhancements
  - Show iSCSI GW status on landing page
  - Prevent deletion of IQNs with open sessions
  - Display iSCSI "logged in" info

* Prometheus alert management
  - List configured Prometheus alerts

RADOS
~  
* Objects can now be brought in sync during recovery by copying only
  the modified portion of the object, reducing tail latencies during
  recovery.
* Ceph will allow recovery below *min_size* for Erasure coded pools,
  wherever possible.
* The PG autoscaler feature introduced in Nautilus is enabled for
  new pools by default, allowing new clusters to autotune *pg num*
  without any user intervention.  The default values for new pools
  and RGW/CephFS metadata pools have also been adjusted to perform
  well for most users.
* BlueStore has received several improvements and performance
  updates, including improved accounting for "omap" (key/value)
  object data by pool, improved cache memory management, and a
  reduced allocation unit size for SSD devices.  (Note that by
  default, the first time each OSD starts after upgrading to octopus
  it will trigger a conversion that may take from a few minutes to a
  few hours, depending on the amount of stored "omap" data.)
* Snapshot trimming metadata is now managed in a more efficient and
  scalable fashion.

RBD block storage
~  
* Mirroring now supports a new snapshot-based mode that no longer requires
  the journaling feature and its related impacts in exchange for the loss
  of point-in-time consistency (it remains crash consistent).
* Clone operations now preserve the sparseness of the underlying RBD image.
* The trash feature has been improved to (optionally) automatically
  move old parent images to the trash when the



[ceph-users] Re: multi-node NFS Ganesha + libcephfs caching

2020-03-24 Thread Maged Mokhtar


On 24/03/2020 13:35, Daniel Gryniewicz wrote:



On 3/23/20 4:31 PM, Maged Mokhtar wrote:


On 23/03/2020 20:50, Jeff Layton wrote:

On Mon, 2020-03-23 at 15:49 +0200, Maged Mokhtar wrote:

Hello all,

For multi-node NFS Ganesha over CephFS, is it OK to leave libcephfs 
write caching on, or should it be configured off for failover ?



You can do libcephfs write caching, as the caps would need to be
recalled for any competing access. What you really want to avoid is any
sort of caching at the ganesha daemon layer.


Hi Jeff,

Thanks for your reply. I meant caching by libcepfs used within the 
ganesha ceph fsal plugin, which i am not sure from your reply if this 
is what you refer to as ganesha daemon layer (or does the later mean 
the internal mdcache in ganesha). I really appreciate if you can 
clarify this point.


Caching in libcephfs is fine, it's caching above the FSAL layer that 
you should avoid.




I really have doubts that it is safe to leave write caching in the 
plugin and have safe failover, yet i see comments in the conf file 
such as:

# The libcephfs client will aggressively cache information while it
# can, so there is little benefit to ganesha actively caching the same
# objects.

Or is it up to the NFS client to issue cache syncs and re-submit 
writes if it detects failover ?


Correct.  During failover, NFS will go into it's Grace period, which 
blocks new state,  and allow the NFS clients to re-acquire the state 
(opens, locks, delegations, etc.).  This includes re-sending any 
non-committed writes (commits will cause the data to be saved to the 
cluster, not just the libcephfs cache).  Once this is all done, normal 
operation proceeds.  It should be safe, even with caching in libcephfs.


Daniel

Thanks Daniel for the clarification, so it is the responsibility of the 
client to re-send writes. Two questions so I can understand this better:


- If this is handled at the client, why is it OK on the gateway to cache 
at the FSAL layer but not above?


- At what level/layer on the client does this get handled: the NFS client 
layer (which will detect failover), the filesystem layer, the page cache...?


Thanks for your patience :)   /Maged



[ceph-users] Re: multi-node NFS Ganesha + libcephfs caching

2020-03-24 Thread Daniel Gryniewicz



On 3/24/20 8:19 AM, Maged Mokhtar wrote:


On 24/03/2020 13:35, Daniel Gryniewicz wrote:



On 3/23/20 4:31 PM, Maged Mokhtar wrote:


On 23/03/2020 20:50, Jeff Layton wrote:

On Mon, 2020-03-23 at 15:49 +0200, Maged Mokhtar wrote:

Hello all,

For multi-node NFS Ganesha over CephFS, is it OK to leave libcephfs 
write caching on, or should it be configured off for failover ?



You can do libcephfs write caching, as the caps would need to be
recalled for any competing access. What you really want to avoid is any
sort of caching at the ganesha daemon layer.


Hi Jeff,

Thanks for your reply. I meant caching by libcepfs used within the 
ganesha ceph fsal plugin, which i am not sure from your reply if this 
is what you refer to as ganesha daemon layer (or does the later mean 
the internal mdcache in ganesha). I really appreciate if you can 
clarify this point.


Caching in libcephfs is fine, it's caching above the FSAL layer that 
you should avoid.




I really have doubts that it is safe to leave write caching in the 
plugin and have safe failover, yet i see comments in the conf file 
such as:

# The libcephfs client will aggressively cache information while it
# can, so there is little benefit to ganesha actively caching the same
# objects.

Or is it up to the NFS client to issue cache syncs and re-submit 
writes if it detects failover ?


Correct.  During failover, NFS will go into it's Grace period, which 
blocks new state,  and allow the NFS clients to re-acquire the state 
(opens, locks, delegations, etc.).  This includes re-sending any 
non-committed writes (commits will cause the data to be saved to the 
cluster, not just the libcephfs cache).  Once this is all done, normal 
operation proceeds.  It should be safe, even with caching in libcephfs.


Daniel

Thanks Daniel for the clarification..so it is the responsibility of the 
client tor re-send writes...2 questions so i can understand this better:


-If this is handled at the client..why on the gateway it is ok to cache 
at the FSAL layer but not above ?


In principle, it's fine above.  However, that requires a level of 
coordination that's not there right now.  The libcephfs cache is 
integrated with the CAPs system, and knows when it can cache and when it 
needs to flush.  There's work to do to get that up to the higher layers.
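For reference, the comment quoted earlier in this thread comes from the sample Ceph export configuration shipped with Ganesha, which turns Ganesha-level caching down for exactly this reason. A hedged sketch of the relevant fragment follows; directive names are as in recent Ganesha releases and the export values (ID, paths) are placeholders, so verify against the ceph.conf sample shipped with your Ganesha version:

```
# Let libcephfs (which is integrated with CephFS caps) do the caching;
# size Ganesha's own dirent cache down to effectively nothing.
MDCACHE {
    Dir_Chunk = 0;
}

EXPORT {
    Export_ID = 100;       # placeholder
    Path = /;              # placeholder
    Pseudo = /cephfs;      # placeholder
    FSAL {
        Name = CEPH;
    }
}
```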




-At what level/layer on the client does this get handled: NFS client 
layer (which will detect failover), filesystem layer, page cache...?


The NFS client layer, interacting with the VFS/page cache.  (NFS is the 
filesystem in this case, so technically the filesystem layer.)


Daniel


[ceph-users] Re: rbd-mirror -> how far behind_master am i time wise?

2020-03-24 Thread Jason Dillaman
On Tue, Mar 24, 2020 at 3:50 AM Ml Ml  wrote:
>
> Hello List,
>
> I use rbd-mirror and I asynchronously mirror to my backup cluster.
> My backup cluster only has "spinning rust" and won't be able to always
> perform like the live cluster.
>
> That is fine for me, as long as it's not further behind than 12 hours.
>
> vm-194-disk-1:
>   global_id:   7a95730f-451c-4973-8038-2a59e29ac5ad
>   state:   up+replaying
>   description: replaying, master_position=[object_number=1046,
> tag_tid=4, entry_tid=936210], mirror_position=[object_number=911,
> tag_tid=4, entry_tid=815131], entries_behind_master=121079
>   last_update: 2020-03-24 08:43:43
>
> I learned that the entries_behind_master are single transactions. But
> what I am really interested in is: how far am I behind, time-wise?
> Is there a way to tell this?

Unfortunately, there is no current way to tell. However, this is being
actively worked on since it has been heavily requested lately.
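Until that lands, a rough client-side estimate is possible by sampling entries_behind_master twice and extrapolating from the drain rate. A hedged Python sketch (the parsing pattern assumes the status text format shown above; entries are journal entries rather than seconds, so this is only a heuristic):

```python
import re

def entries_behind(description: str) -> int:
    """Extract entries_behind_master from an 'rbd mirror image status'
    description string, e.g. '... entries_behind_master=121079'."""
    match = re.search(r"entries_behind_master=(\d+)", description)
    return int(match.group(1)) if match else 0

def estimated_lag_seconds(behind_then: int, behind_now: int, dt: float):
    """If the backlog drained (behind_then - behind_now) entries in dt
    seconds, extrapolate how long the remaining backlog will take."""
    drained = behind_then - behind_now
    if drained <= 0:
        return None  # backlog is flat or growing; no estimate possible
    return behind_now * dt / drained

status = ("replaying, master_position=[object_number=1046, tag_tid=4, "
          "entry_tid=936210], mirror_position=[object_number=911, "
          "tag_tid=4, entry_tid=815131], entries_behind_master=121079")
print(entries_behind(status))  # 121079
```

Polling the status once a minute and feeding consecutive samples into estimated_lag_seconds gives a coarse "time behind" figure, good enough to alert when it exceeds a 12-hour budget.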

> Thanks,
> Michael

--
Jason


[ceph-users] Re: multi-node NFS Ganesha + libcephfs caching

2020-03-24 Thread Maged Mokhtar


On 24/03/2020 15:14, Daniel Gryniewicz wrote:



On 3/24/20 8:19 AM, Maged Mokhtar wrote:


On 24/03/2020 13:35, Daniel Gryniewicz wrote:



On 3/23/20 4:31 PM, Maged Mokhtar wrote:


On 23/03/2020 20:50, Jeff Layton wrote:

On Mon, 2020-03-23 at 15:49 +0200, Maged Mokhtar wrote:

Hello all,

For multi-node NFS Ganesha over CephFS, is it OK to leave 
libcephfs write caching on, or should it be configured off for 
failover ?



You can do libcephfs write caching, as the caps would need to be
recalled for any competing access. What you really want to avoid 
is any

sort of caching at the ganesha daemon layer.


Hi Jeff,

Thanks for your reply. I meant caching by libcepfs used within the 
ganesha ceph fsal plugin, which i am not sure from your reply if 
this is what you refer to as ganesha daemon layer (or does the 
later mean the internal mdcache in ganesha). I really appreciate if 
you can clarify this point.


Caching in libcephfs is fine, it's caching above the FSAL layer that 
you should avoid.




I really have doubts that it is safe to leave write caching in the 
plugin and have safe failover, yet i see comments in the conf file 
such as:

# The libcephfs client will aggressively cache information while it
# can, so there is little benefit to ganesha actively caching the same
# objects.

Or is it up to the NFS client to issue cache syncs and re-submit 
writes if it detects failover ?


Correct.  During failover, NFS will go into it's Grace period, which 
blocks new state,  and allow the NFS clients to re-acquire the state 
(opens, locks, delegations, etc.).  This includes re-sending any 
non-committed writes (commits will cause the data to be saved to the 
cluster, not just the libcephfs cache).  Once this is all done, 
normal operation proceeds.  It should be safe, even with caching in 
libcephfs.


Daniel

Thanks Daniel for the clarification..so it is the responsibility of 
the client tor re-send writes...2 questions so i can understand this 
better:


-If this is handled at the client..why on the gateway it is ok to 
cache at the FSAL layer but not above ?


In principle, it's fine above.  However, that requires a level of 
coordination that's not there right now.  The libcephfs cache is 
integrated with the CAPs system, and knows when it can cache and when 
it needs to flush.  There's work to do to get that up to the higher 
layers.




-At what level/layer on the client does this get handled: NFS client 
layer (which will detect failover), filesystem layer, page cache...?


The NFS client layer, interacting with the VFS/page cache.  (NFS is 
the filesystem in this case, so technically the filesystem layer.)


Daniel



Thank you so much for the clarification..

Maged


[ceph-users] Space leak in Bluestore

2020-03-24 Thread vitalif

Hi.

I'm experiencing some kind of a space leak in Bluestore. I use EC, 
compression and snapshots. First I thought that the leak was caused by 
"virtual clones" (issue #38184). However, then I got rid of most of the 
snapshots, but continued to experience the problem.


I suspected something when I added a new disk to the cluster and free 
space in the cluster didn't increase (!).


So to track down the issue I moved one PG (34.1a) using upmaps from 
osd11,6,0 to osd6,0,7 and then back to osd11,6,0.


It ate +59 GB after the first move and +51 GB after the second. As I 
understand it, this proves that it's not #38184: devirtualization of 
virtual clones couldn't eat additional space after the SECOND rebalance 
of the same PG.


The PG has ~39000 objects, it is EC 2+1 and the compression is enabled. 
Compression ratio is about ~2.7 in my setup, so the PG should use ~90 GB 
raw space.


Before and after moving the PG I stopped osd0, mounted it with 
ceph-objectstore-tool with debug bluestore = 20/20 and opened the 
34.1a***/all directory. It seems to dump all object extents into the log 
in that case. So now I have two logs with all allocated extents for osd0 
(I hope all extents are there). I parsed both logs and added all 
compressed blob sizes together ("get_ref Blob ... 0x2 -> 0x... 
compressed"). But they add up to ~39 GB before first rebalance 
(34.1as2), ~22 GB after it (34.1as1) and ~41 GB again after the second 
move (34.1as2) which doesn't indicate a leak.
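For anyone trying to reproduce the accounting, the summing step can be sketched like this. The regex is a guess at the debug-log shape described above ("get_ref Blob ... 0x... -> 0x... compressed"); adjust it to the exact lines your OSD emits:

```python
import re

# Hypothetical line shape, e.g.:
#   ... get_ref Blob(...) 0x20000 -> 0x8000 compressed ...
BLOB_RE = re.compile(
    r"get_ref Blob.*?0x([0-9a-f]+)\s*->\s*0x([0-9a-f]+)\s+compressed")

def sum_compressed_blobs(lines):
    """Sum logical and stored (compressed) sizes over matching log lines."""
    logical = stored = 0
    for line in lines:
        m = BLOB_RE.search(line)
        if m:
            logical += int(m.group(1), 16)
            stored += int(m.group(2), 16)
    return logical, stored

sample = [
    "get_ref Blob(0x55..) 0x20000 -> 0x8000 compressed (ratio 4.0)",
    "get_ref Blob(0x56..) 0x10000 -> 0x4000 compressed",
    "unrelated line",
]
print(sum_compressed_blobs(sample))  # (196608, 49152)
```

Comparing the two sums before and after a rebalance, as done above, separates "blobs really grew" from "raw usage grew without the blobs growing", which is the signature of a leak.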


But the raw space usage still exceeds initial by a lot. So it's clear 
that there's a leak somewhere.


What additional details can I provide for you to identify the bug?

I posted the same message in the issue tracker, 
https://tracker.ceph.com/issues/44731


--
Vitaliy Filippov


[ceph-users] Re: Space leak in Bluestore

2020-03-24 Thread Steven Pine
Hi Vitaliy,

You may be coming across the EC space amplification issue,
https://tracker.ceph.com/issues/44213

I am not aware of any recent updates to resolve this issue.

Sincerely,

On Tue, Mar 24, 2020 at 12:53 PM  wrote:

> Hi.
>
> I'm experiencing some kind of a space leak in Bluestore. I use EC,
> compression and snapshots. First I thought that the leak was caused by
> "virtual clones" (issue #38184). However, then I got rid of most of the
> snapshots, but continued to experience the problem.
>
> I suspected something when I added a new disk to the cluster and free
> space in the cluster didn't increase (!).
>
> So to track down the issue I moved one PG (34.1a) using upmaps from
> osd11,6,0 to osd6,0,7 and then back to osd11,6,0.
>
> It ate +59 GB after the first move and +51 GB after the second. As I
> understand this proves that it's not #38184. Devirtualizaton of virtual
> clones couldn't eat additional space after SECOND rebalance of the same
> PG.
>
> The PG has ~39000 objects, it is EC 2+1 and the compression is enabled.
> Compression ratio is about ~2.7 in my setup, so the PG should use ~90 GB
> raw space.
>
> Before and after moving the PG I stopped osd0, mounted it with
> ceph-objectstore-tool with debug bluestore = 20/20 and opened the
> 34.1a***/all directory. It seems to dump all object extents into the log
> in that case. So now I have two logs with all allocated extents for osd0
> (I hope all extents are there). I parsed both logs and added all
> compressed blob sizes together ("get_ref Blob ... 0x2 -> 0x...
> compressed"). But they add up to ~39 GB before first rebalance
> (34.1as2), ~22 GB after it (34.1as1) and ~41 GB again after the second
> move (34.1as2) which doesn't indicate a leak.
>
> But the raw space usage still exceeds initial by a lot. So it's clear
> that there's a leak somewhere.
>
> What additional details can I provide for you to identify the bug?
>
> I posted the same message in the issue tracker,
> https://tracker.ceph.com/issues/44731
>
> --
> Vitaliy Filippov
>


-- 
Steven Pine
webair.com
*P*  516.938.4100 x
*E * steven.p...@webair.com


   



[ceph-users] Re: multi-node NFS Ganesha + libcephfs caching

2020-03-24 Thread Maged Mokhtar


On 24/03/2020 16:48, Maged Mokhtar wrote:


On 24/03/2020 15:14, Daniel Gryniewicz wrote:



On 3/24/20 8:19 AM, Maged Mokhtar wrote:


On 24/03/2020 13:35, Daniel Gryniewicz wrote:



On 3/23/20 4:31 PM, Maged Mokhtar wrote:


On 23/03/2020 20:50, Jeff Layton wrote:

On Mon, 2020-03-23 at 15:49 +0200, Maged Mokhtar wrote:

Hello all,

For multi-node NFS Ganesha over CephFS, is it OK to leave 
libcephfs write caching on, or should it be configured off for 
failover ?



You can do libcephfs write caching, as the caps would need to be
recalled for any competing access. What you really want to avoid 
is any

sort of caching at the ganesha daemon layer.


Hi Jeff,

Thanks for your reply. I meant caching by libcepfs used within the 
ganesha ceph fsal plugin, which i am not sure from your reply if 
this is what you refer to as ganesha daemon layer (or does the 
later mean the internal mdcache in ganesha). I really appreciate 
if you can clarify this point.


Caching in libcephfs is fine, it's caching above the FSAL layer 
that you should avoid.




I really have doubts that it is safe to leave write caching in the 
plugin and have safe failover, yet i see comments in the conf file 
such as:

# The libcephfs client will aggressively cache information while it
# can, so there is little benefit to ganesha actively caching the 
same

# objects.

Or is it up to the NFS client to issue cache syncs and re-submit 
writes if it detects failover ?


Correct.  During failover, NFS will go into it's Grace period, 
which blocks new state,  and allow the NFS clients to re-acquire 
the state (opens, locks, delegations, etc.). This includes 
re-sending any non-committed writes (commits will cause the data to 
be saved to the cluster, not just the libcephfs cache).  Once this 
is all done, normal operation proceeds.  It should be safe, even 
with caching in libcephfs.


Daniel

Thanks Daniel for the clarification..so it is the responsibility of 
the client tor re-send writes...2 questions so i can understand this 
better:


-If this is handled at the client..why on the gateway it is ok to 
cache at the FSAL layer but not above ?


In principle, it's fine above.  However, that requires a level of 
coordination that's not there right now.  The libcephfs cache is 
integrated with the CAPs system, and knows when it can cache and when 
it needs to flush.  There's work to do to get that up to the higher 
layers.




-At what level/layer on the client does this get handled: NFS client 
layer (which will detect failover), filesystem layer, page cache...?


The NFS client layer, interacting with the VFS/page cache.  (NFS is 
the filesystem in this case, so technically the filesystem layer.)


Daniel



Thank you so much for the clarification..

Maged


One more thing: for non-Linux clients, specifically VMware, their NFS 
client may not behave the same, correct? In the iSCSI domain, VMware 
does not have any kind of buffer/page cache, probably to support 
failover among ESXi nodes. Should I test this, or am I on the wrong 
track? /Maged




[ceph-users] Re: Space leak in Bluestore

2020-03-24 Thread vitalif

Hi Steve,

Thanks, it's an interesting discussion; however, I don't think it's the 
same problem, because in my case Bluestore eats additional space during 
rebalance. And it doesn't seem that Ceph does small overwrites during 
rebalance. As I understand it, it does the opposite: it reads and 
writes the whole object... Also, I have bluestore_min_alloc_size set to 
4K from the beginning, and Igor says that works around that bug... 
bug-o-feature. :D



Hi Vitaliy,

You may be coming across the EC space amplification issue,
https://tracker.ceph.com/issues/44213

I am not aware of any recent updates to resolve this issue.

Sincerely,

On Tue, Mar 24, 2020 at 12:53 PM  wrote:


Hi.

I'm experiencing some kind of a space leak in Bluestore. I use EC,
compression and snapshots. First I thought that the leak was caused
by
"virtual clones" (issue #38184). However, then I got rid of most of
the
snapshots, but continued to experience the problem.

I suspected something when I added a new disk to the cluster and
free
space in the cluster didn't increase (!).

So to track down the issue I moved one PG (34.1a) using upmaps from
osd11,6,0 to osd6,0,7 and then back to osd11,6,0.

It ate +59 GB after the first move and +51 GB after the second. As I

understand this proves that it's not #38184. Devirtualizaton of
virtual
clones couldn't eat additional space after SECOND rebalance of the
same
PG.

The PG has ~39000 objects, it is EC 2+1 and the compression is
enabled.
Compression ratio is about ~2.7 in my setup, so the PG should use
~90 GB
raw space.

Before and after moving the PG I stopped osd0, mounted it with
ceph-objectstore-tool with debug bluestore = 20/20 and opened the
34.1a***/all directory. It seems to dump all object extents into the
log
in that case. So now I have two logs with all allocated extents for
osd0
(I hope all extents are there). I parsed both logs and added all
compressed blob sizes together ("get_ref Blob ... 0x2 -> 0x...
compressed"). But they add up to ~39 GB before first rebalance
(34.1as2), ~22 GB after it (34.1as1) and ~41 GB again after the
second
move (34.1as2) which doesn't indicate a leak.

But the raw space usage still exceeds initial by a lot. So it's
clear
that there's a leak somewhere.

What additional details can I provide for you to identify the bug?

I posted the same message in the issue tracker,
https://tracker.ceph.com/issues/44731

--
Vitaliy Filippov


--

Steven Pine

webair.com [1]

P  516.938.4100 x

 E  steven.p...@webair.com

   [2]  [3]



Links:
--
[1] http://webair.com
[2] https://www.facebook.com/WebairInc/
[3] https://www.linkedin.com/company/webair



[ceph-users] Re: Space leak in Bluestore

2020-03-24 Thread Mark Nelson
FWIW, Igor has been doing some great work on improving performance with 
the 4K min_alloc size.  He gave a presentation at a recent weekly 
performance meeting on it, and it's looking really good.  On HDDs I think 
he was seeing up to 2X faster 8K-128K random writes, at the expense of up 
to a 20% sequential-read hit when there is fragmentation, all while 
retaining the space-saving benefits of the 4K min_alloc size.  I believe 
in a future point release we should be able to make 4K the default 
for both HDD and flash, as it's already arguably faster on NVMe at this 
point.
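For anyone wanting to experiment before that change lands, the allocation unit is controlled per device class and is baked in at OSD creation (mkfs) time, so it only affects OSDs deployed after the setting is changed. A hedged ceph.conf sketch (option names as in Nautilus/Octopus; existing OSDs keep their current value and would need to be redeployed):

```
[osd]
# Applied at OSD mkfs time only.
bluestore_min_alloc_size_hdd = 4096
bluestore_min_alloc_size_ssd = 4096
```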



Mark

On 3/24/20 12:03 PM, Steven Pine wrote:

Hi Vitaliy,

You may be coming across the EC space amplification issue,
https://tracker.ceph.com/issues/44213

I am not aware of any recent updates to resolve this issue.

Sincerely,

On Tue, Mar 24, 2020 at 12:53 PM  wrote:


Hi.

I'm experiencing some kind of a space leak in Bluestore. I use EC,
compression and snapshots. First I thought that the leak was caused by
"virtual clones" (issue #38184). However, then I got rid of most of the
snapshots, but continued to experience the problem.

I suspected something when I added a new disk to the cluster and free
space in the cluster didn't increase (!).

So to track down the issue I moved one PG (34.1a) using upmaps from
osd11,6,0 to osd6,0,7 and then back to osd11,6,0.

It ate +59 GB after the first move and +51 GB after the second. As I
understand it, this proves that it's not #38184: devirtualization of
virtual clones couldn't eat additional space after the SECOND rebalance
of the same PG.

The PG has ~39000 objects, it is EC 2+1, and compression is enabled.
The compression ratio is ~2.7 in my setup, so the PG should use ~90 GB
of raw space.
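As a back-of-the-envelope check of those numbers (a sketch only: it assumes raw usage is simply the compressed size times the EC overhead factor (k+m)/k, ignoring min_alloc_size rounding and other BlueStore per-object overheads):

```python
# Rough raw-space estimate for an EC k+m pool with compression enabled.
# Ignores allocation-unit rounding, omap, and other per-object overhead.

def raw_usage(logical_bytes, k, m, compression_ratio):
    """Compress the logical data first, then add EC parity overhead."""
    compressed = logical_bytes / compression_ratio
    return compressed * (k + m) / k

GiB = 1024 ** 3
# With EC 2+1 (1.5x overhead) and a ~2.7x compression ratio, roughly
# 162 GiB of logical data would occupy ~90 GiB raw, the figure quoted above.
print(round(raw_usage(162 * GiB, k=2, m=1, compression_ratio=2.7) / GiB, 1))
```

The 1.5x factor is just (k+m)/k for k=2, m=1.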

Before and after moving the PG I stopped osd0, mounted it with
ceph-objectstore-tool with debug bluestore = 20/20 and opened the
34.1a***/all directory. It seems to dump all object extents into the log
in that case. So now I have two logs with all allocated extents for osd0
(I hope all extents are there). I parsed both logs and added all
compressed blob sizes together ("get_ref Blob ... 0x2 -> 0x...
compressed"). But they add up to ~39 GB before first rebalance
(34.1as2), ~22 GB after it (34.1as1) and ~41 GB again after the second
move (34.1as2) which doesn't indicate a leak.

But the raw space usage still exceeds initial by a lot. So it's clear
that there's a leak somewhere.
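That parsing pass can be sketched roughly like this (an illustration only: the exact format of the "get_ref Blob ... compressed" lines, and which hex field is the compressed length, are assumptions based on the snippet quoted above):

```python
import re

# Hypothetical: sum the compressed blob lengths that a BlueStore debug log
# ("debug bluestore = 20/20") prints on "get_ref Blob ... compressed" lines.
# Assumption: the last hex value before "compressed" is the on-disk length;
# adjust the pattern for your exact log format.
BLOB_RE = re.compile(r"get_ref Blob .* 0x([0-9a-f]+)\s+compressed")

def sum_compressed_bytes(lines):
    total = 0
    for line in lines:
        m = BLOB_RE.search(line)
        if m:
            total += int(m.group(1), 16)
    return total

sample = [
    "get_ref Blob 42 0x20000 -> 0x8000 compressed",
    "get_ref Blob 43 0x20000 -> 0xc000 compressed",
    "some unrelated log line",
]
print(sum_compressed_bytes(sample))  # 0x8000 + 0xc000 = 81920 bytes
```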

What additional details can I provide for you to identify the bug?

I posted the same message in the issue tracker,
https://tracker.ceph.com/issues/44731

--
Vitaliy Filippov


[ceph-users] Re: multi-node NFS Ganesha + libcephfs caching

2020-03-24 Thread Daniel Gryniewicz



On 3/24/20 1:16 PM, Maged Mokhtar wrote:


On 24/03/2020 16:48, Maged Mokhtar wrote:


On 24/03/2020 15:14, Daniel Gryniewicz wrote:



On 3/24/20 8:19 AM, Maged Mokhtar wrote:


On 24/03/2020 13:35, Daniel Gryniewicz wrote:



On 3/23/20 4:31 PM, Maged Mokhtar wrote:


On 23/03/2020 20:50, Jeff Layton wrote:

On Mon, 2020-03-23 at 15:49 +0200, Maged Mokhtar wrote:

Hello all,

For multi-node NFS Ganesha over CephFS, is it OK to leave 
libcephfs write caching on, or should it be configured off for 
failover ?



You can do libcephfs write caching, as the caps would need to be
recalled for any competing access. What you really want to avoid is any
sort of caching at the ganesha daemon layer.


Hi Jeff,

Thanks for your reply. I meant caching by libcephfs used within the 
ganesha Ceph FSAL plugin; I am not sure from your reply whether this 
is what you refer to as the ganesha daemon layer (or does the latter 
mean the internal mdcache in ganesha?). I would really appreciate it 
if you could clarify this point.


Caching in libcephfs is fine, it's caching above the FSAL layer 
that you should avoid.




I really have doubts that it is safe to leave write caching on in the 
plugin and still have safe failover, yet I see comments in the conf 
file such as:

# The libcephfs client will aggressively cache information while it
# can, so there is little benefit to ganesha actively caching the
# same objects.

Or is it up to the NFS client to issue cache syncs and re-submit 
writes if it detects failover ?


Correct.  During failover, NFS will go into its grace period, which 
blocks new state, and allows the NFS clients to re-acquire their 
state (opens, locks, delegations, etc.). This includes re-sending any 
non-committed writes (commits cause the data to be saved to the 
cluster, not just to the libcephfs cache).  Once this is all done, 
normal operation proceeds.  It should be safe, even with caching in 
libcephfs.


Daniel
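The replay mechanism Daniel describes can be illustrated with a toy model (purely illustrative; the real NFSv4 grace period, state reclaim, and write verifiers are considerably more involved):

```python
# Toy model of NFS unstable writes + COMMIT across a server failover.
# The client keeps a copy of every uncommitted write and replays it to
# the new server during the grace period; COMMIT flushes to stable store.

class ToyNfsClient:
    def __init__(self):
        self.uncommitted = []          # writes not yet COMMITted

    def write_unstable(self, data, server):
        self.uncommitted.append(data)  # keep a copy until COMMIT succeeds
        server.receive(data)

    def commit(self, server):
        server.flush()                 # server syncs its cache to stable store
        self.uncommitted.clear()       # now safe to drop local copies

    def on_failover(self, new_server):
        for data in self.uncommitted:  # replay during the grace period
            new_server.receive(data)

class ToyNfsServer:
    def __init__(self):
        self.cache, self.stable = [], []
    def receive(self, data):
        self.cache.append(data)
    def flush(self):
        self.stable.extend(self.cache)
        self.cache.clear()

client, a, b = ToyNfsClient(), ToyNfsServer(), ToyNfsServer()
client.write_unstable("block1", a)
# Server 'a' crashes before COMMIT: its cache is lost, but the client
# still holds "block1" and resends it to the new server during grace.
client.on_failover(b)
client.commit(b)
print(b.stable)  # ['block1']
```

This is why a client with no write cache must use only stable writes, as Daniel notes below for VMWare: without the local copy there is nothing to replay.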

Thanks Daniel for the clarification..so it is the responsibility of 
the client tor re-send writes...2 questions so i can understand this 
better:


-If this is handled at the client..why on the gateway it is ok to 
cache at the FSAL layer but not above ?


In principle, it's fine above.  However, that requires a level of 
coordination that's not there right now.  The libcephfs cache is 
integrated with the CAPs system, and knows when it can cache and when 
it needs to flush.  There's work to do to get that up to the higher 
layers.




-At what level/layer on the client does this get handled: NFS client 
layer (which will detect failover), filesystem layer, page cache...?


The NFS client layer, interacting with the VFS/page cache.  (NFS is 
the filesystem in this case, so technically the filesystem layer.)


Daniel



Thank you so much for the clarification..

Maged


One more thing: for non-Linux clients, specifically VMWare, their NFS 
client may not behave the same, correct?  In the iSCSI domain, VMWare 
does not use any kind of buffer/page cache, which is probably to 
support failover among ESXi nodes. Should I test this, or am I on the 
wrong track? /Maged





This behavior is a requirement of the spec.  All compliant NFS 
implementations behave this way.  If you don't have a client side cache, 
then you have to do only stable writes (each write is sync'd to the 
backing store).  This is slower, but it's safe.  If VMWare doesn't do 
this, then they *will* lose data if the server ever crashes, and it will 
be their exclusive fault.


Daniel


[ceph-users] Re: v15.2.0 Octopus released

2020-03-24 Thread Mazzystr
I'm trying to install on a fresh CentOS 8 host and get the following
error:

# yum install ceph
..
Error:
 Problem: package ceph-2:15.2.0-0.el8.x86_64 requires ceph-osd =
2:15.2.0-0.el8, but none of the providers can be installed
  - conflicting requests
  - nothing provides libleveldb.so.1()(64bit) needed by
ceph-osd-2:15.2.0-0.el8.x86_64
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to
use not only best candidate packages)


Does anyone have a recommendation on where to acquire a trusted pkg that
provides libleveldb.so.1?

Thanks!
/C

On Tue, Mar 24, 2020 at 4:42 AM Abhishek Lekshmanan 
wrote:

>
> We're happy to announce the first stable release of Octopus v15.2.0.
> There are a lot of changes and new features added, we advise everyone to
> read the release notes carefully, and in particular the upgrade notes,
> before upgrading. Please refer to the official blog entry
> https://ceph.io/releases/v15-2-0-octopus-released/ for a detailed
> version with links & changelog.
>
> This release wouldn't have been possible without the support of the
> community, this release saw contributions from over 330 developers & 80
> organizations, and we thank everyone for making this release happen.
>
> Major Changes from Nautilus
> ---
> General
> ~~~
> * A new deployment tool called **cephadm** has been introduced that
>   integrates Ceph daemon deployment and management via containers
>   into the orchestration layer.
> * Health alerts can now be muted, either temporarily or permanently.
> * Health alerts are now raised for recent Ceph daemons crashes.
> * A simple 'alerts' module has been introduced to send email
>   health alerts for clusters deployed without the benefit of an
>   existing external monitoring infrastructure.
> * Packages are built for the following distributions:
>   - CentOS 8
>   - CentOS 7 (partial--see below)
>   - Ubuntu 18.04 (Bionic)
>   - Debian Buster
>   - Container images (based on CentOS 8)
>
>   Note that the dashboard, prometheus, and restful manager modules
>   will not work on the CentOS 7 build due to Python 3 module
>   dependencies that are missing in CentOS 7.
>
>   Besides this, packages built by the community will also be available for the
>   following distros:
>   - Fedora (33/rawhide)
>   - openSUSE (15.2, Tumbleweed)
>
> Dashboard
> ~
> The mgr-dashboard has gained a lot of new features and functionality:
>
> * UI Enhancements
>   - New vertical navigation bar
>   - New unified sidebar: better background task and events notification
>   - Shows all progress mgr module notifications
>   - Multi-select on tables to perform bulk operations
>
> * Dashboard user account security enhancements
>   - Disabling/enabling existing user accounts
>   - Clone an existing user role
>   - Users can change their own password
>   - Configurable password policies: Minimum password complexity/length
> requirements
>   - Configurable password expiration
>   - Change password after first login
>
> New and enhanced management of Ceph features/services:
>
> * OSD/device management
>   - List all disks associated with an OSD
>   - Add support for blinking enclosure LEDs via the orchestrator
>   - List all hosts known by the orchestrator
>   - List all disks and their properties attached to a node
>   - Display disk health information (health prediction and SMART data)
>   - Deploy new OSDs on new disks/hosts
>   - Display and allow sorting by an OSD's default device class in the OSD
> table
>   - Explicitly set/change the device class of an OSD, display and sort
> OSDs by
> device class
>
> * Pool management
>   - Viewing and setting pool quotas
>   - Define and change per-pool PG autoscaling mode
>
> * RGW management enhancements
>   - Enable bucket versioning
>   - Enable MFA support
>   - Select placement target on bucket creation
>
> * CephFS management enhancements
>   - CephFS client eviction
>   - CephFS snapshot management
>   - CephFS quota management
>   - Browse CephFS directory
>
> * iSCSI management enhancements
>   - Show iSCSI GW status on landing page
>   - Prevent deletion of IQNs with open sessions
>   - Display iSCSI "logged in" info
>
> * Prometheus alert management
>   - List configured Prometheus alerts
>
> RADOS
> ~
> * Objects can now be brought in sync during recovery by copying only
>   the modified portion of the object, reducing tail latencies during
>   recovery.
> * Ceph will allow recovery below *min_size* for Erasure coded pools,
>   wherever possible.
> * The PG autoscaler feature introduced in Nautilus is enabled for
>   new pools by default, allowing new clusters to autotune *pg num*
>   without any user intervention.  The default values for new pools
>   and RGW/CephFS metadata pools have also been adjusted to perform
>   well for most users.
> * BlueStore has received several improvements and performance
>   updates, including improved accounting for "omap" (key/value)
>   obje

[ceph-users] Re: v15.2.0 Octopus released

2020-03-24 Thread konstantin . ilyasov
Is it poosible to provide instructions about upgrading from CentOs7+ ceph 
14.2.8 to CentOs8+ceph 15.2.0 ?


[ceph-users] Re: v15.2.0 Octopus released

2020-03-24 Thread Bryan Stillwell
Great work!  Thanks to everyone involved!

One minor thing I've noticed so far with the Ubuntu Bionic build is that 
it's reporting the release as an RC instead of 'stable':

$ ceph versions | grep octopus
"ceph version 15.2.0 (dc6a0b5c3cbf6a5e1d6d4f20b5ad466d76b96247) octopus (rc)": 1
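For anyone who wants to spot this programmatically: `ceph versions` can emit JSON whose keys are version strings, so a small filter could flag non-stable builds (a sketch; the exact JSON shape used here is an assumption based on the grep output above):

```python
import re

# Flag version strings that report a release channel other than "stable",
# e.g. "... octopus (rc)". Assumed input shape: {"overall": {verstring: count}}.
def nonstable_versions(versions_json):
    flagged = {}
    for verstring, count in versions_json.get("overall", {}).items():
        m = re.search(r"\((\w+)\)\s*$", verstring)
        if m and m.group(1) != "stable":
            flagged[verstring] = count
    return flagged

sample = {"overall": {
    "ceph version 15.2.0 (dc6a0b5c3cbf6a5e1d6d4f20b5ad466d76b96247) octopus (rc)": 1,
}}
print(nonstable_versions(sample))
```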

Bryan

> On Mar 24, 2020, at 5:38 AM, Abhishek Lekshmanan  wrote:
> [...]

[ceph-users] Re: v15.2.0 Octopus released

2020-03-24 Thread Sage Weil
On Tue, 24 Mar 2020, konstantin.ilya...@mediascope.net wrote:
> Is it poosible to provide instructions about upgrading from CentOs7+ 
> ceph 14.2.8 to CentOs8+ceph 15.2.0 ?

You have ~2 options:

- First, upgrade Ceph packages to 15.2.0.  Note that your dashboard will 
break temporarily.  Then, upgrade each host to CentOS 8.  Your dashboard 
should "un-break" when the el8 ceph packages are installed.

- Combine the Ceph upgrade with a transition to cephadm based on 
these directions:

https://docs.ceph.com/docs/octopus/cephadm/adoption/

After the transition, you can either stick with el7 indefinitely (cephadm 
doesn't care too much about the host OS) or upgrade the host to centos8.

- First upgrade each host to CentOS8, then upgrade Ceph.  This will 
eventually be possible, but at the moment we don't have el8 packages built 
for nautilus.  :/

s


[ceph-users] Re: v15.2.0 Octopus released

2020-03-24 Thread Mazzystr
epel has leveldb for el7 but not el8... A Fedora 30 pkg might work; it
resolves the RPM dependency at the very least.  Octopus also requires the
el8 packages python3-cherrypy, python3.7dist(six), and python(abi), and
these require specific versions of libstdc++.

This is pretty much a brick wall for Ceph on el8, since Nautilus never
got an el8 build.

I got me all excited for nothing... :/


On Tue, Mar 24, 2020 at 2:32 PM Mazzystr  wrote:

> [...]

[ceph-users] Re: Ceph fully crash and we unable to recovery

2020-03-24 Thread Parker Lau
Hello Sir/Madam,

We are facing a serious problem with our Proxmox + Ceph cluster. I have 
already submitted a ticket to Proxmox, but they said the only option is 
trying to recover the mon DB. We would like to know if you have any 
suggestions for our situation.

So far the only option that I see would be trying to recover the mon DB 
from an OSD. But this action is usually a last resort, and since I don't 
know the outcome, the cluster could very well end up dead, with all data 
lost.
https://docs.ceph.com/docs/luminous/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds

---

> > I would like to set the nodown on the cluster to see if the OSDs are kept 
> > in the cluster.
> > The OSDs are joining the cluster but are set as down shortly after.
> >
> ok . Please go ahead.
Sadly this didn't have any effect either.

But I think I found a clue to what might be going on.
# ceph-osd.0.log
2020-03-24 21:22:06.462100 7fb33aab0e00 10 osd.0 0 read_superblock sb(e8e81549-91e5-4370-b091-9500f406a2b2 osd.0 0bb2b9bb-9a70-4d6f-8d4e-3fc5049d63d6 e14334 [13578,14334] lci=[0,14334])

# ceph-mon.cccs01.log
2020-03-24 21:26:48.038345 7f7ef791a700 10 mon.cccs01@0(leader).osd e14299 e14299: 48 total, 13 up, 35 in
2020-03-24 21:26:48.038351 7f7ef791a700  5 mon.cccs01@0(leader).osd e14299 can_mark_out current in_ratio 0.729167 < min 0.75, will not mark osds out
2020-03-24 21:26:48.038360 7f7ef791a700 10 mon.cccs01@0(leader).osd e14299 tick NOOUT flag set, not checking down osds
2020-03-24 21:26:48.038364 7f7ef791a700 10 mon.cccs01@0(leader).osd e14299  min_last_epoch_clean 0
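That can_mark_out line is the monitor's in-ratio guard: with only 35 of 48 OSDs still in, the ratio falls below the 0.75 minimum shown in the log (the default mon_osd_min_in_ratio), so the monitors refuse to mark further OSDs out. The arithmetic:

```python
# Why the monitor logs "can_mark_out current in_ratio 0.729167 < min 0.75":
# with 35 of 48 OSDs "in", marking more OSDs out is refused.
def can_mark_out(num_in, num_total, min_in_ratio=0.75):
    return num_in / num_total >= min_in_ratio

ratio = 35 / 48
print(f"{ratio:.6f}")        # 0.729167
print(can_mark_out(35, 48))  # False
```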

# ceph-mon.cccs06.log
2020-03-22 22:26:57.056939 7f3c1993a700  1 mon.cccs06@5(peon).osd e14333 e14333: 48 total, 48 up, 48 in
2020-03-22 22:27:04.113054 7f3c1993a700  0 mon.cccs06@5(peon) e31 handle_command mon_command({"prefix":"df","format":"json"} v 0) v1
2020-03-22 22:27:04.113086 7f3c1993a700  0 log_channel(audit) log [DBG] : from='client.? 10.1.14.8:0/4265796352' entity='client.admin' cmd=[{"prefix":"df","format":"json"}]: dispatch
2020-03-22 22:27:09.752027 7f3c1993a700  1 mon.cccs06@5(peon).osd e14334 e14334: 48 total, 48 up, 48 in
...
2020-03-23 10:42:51.891722 7ff1d9079700  0 mon.cccs06@2(synchronizing).osd e14269 crush map has features 288514051259236352, adjusting msgr requires
2020-03-23 10:42:51.891729 7ff1d9079700  0 mon.cccs06@2(synchronizing).osd e14269 crush map has features 288514051259236352, adjusting msgr requires
2020-03-23 10:42:51.891730 7ff1d9079700  0 mon.cccs06@2(synchronizing).osd e14269 crush map has features 1009089991638532096, adjusting msgr requires
2020-03-23 10:42:51.891732 7ff1d9079700  0 mon.cccs06@2(synchronizing).osd e14269 crush map has features 288514051259236352, adjusting msgr requires

It seems that the OSDs have an epoch of e14334, but the MONs seem to have e14269 
for the OSDs. I could only find e14334 on ceph-mon.cccs06.
The cccs06 has been the last MON standing (the node also reset). But when the 
cluster came back the MONs with the old epoch came up first and joined.

The syslog showed that these were the last log entries written. So the nodes 
reset shortly after. This would fit to cccs06 being the last MON alive.
# cccs01
Mar 22 22:16:09 cccs01 pmxcfs[2502]: [dcdb] notice: leader is 1/2502
Mar 22 22:16:09 cccs01 pmxcfs[2502]: [dcdb] notice: synced members: 1/2502, 
5/2219

# cccs02
Mar 22 22:15:57 cccs02 pmxcfs[2514]: [dcdb] notice: we (3/2514) left the 
process group
Mar 22 22:15:57 cccs02 pmxcfs[2514]: [dcdb] crit: leaving CPG group

# cccs06
Mar 22 22:31:16 cccs06 pmxcfs[2662]: [status] no
Mar 22 22:34:01 cccs06 systemd-modules-load[773]: Inserted module 'iscsi_tcp'
Mar 22 22:34:01 cccs06 systemd-modules-load[773]: Inserted module 'ib_iser'

There must have been some issue prior to the reset as I found those error 
messages. They could explain why no older epoch was written anymore.
# cccs01
2020-03-22 22:22:15.661060 7fc09c85c100 -1 rocksdb: IO error: 
/var/lib/ceph/mon/ceph-cccs01/store.db/LOCK: Permission denied
2020-03-22 22:22:15.661067 7fc09c85c100 -1 error opening mon data directory at 
'/var/lib/ceph/mon/ceph-cccs01': (22) Invalid argument

# cccs02
2020-03-22 22:31:11.209524 7fd034786100 -1 rocksdb: IO error: 
/var/lib/ceph/mon/ceph-cccs02/store.db/LOCK: Permission denied
2020-03-22 22:31:11.209541 7fd034786100 -1 error opening mon data directory at 
'/var/lib/ceph/mon/ceph-cccs02': (22) Invalid argument

# cccs06
no such entries.

From my point of view, the last question is now: how do we get the epoch from 
the OSDs into the MON DB?
I have no answer to this yet.


Best Regards,
Parker Lau
ReadyS