[ceph-users] Re: Question about per MDS journals

2021-02-25 Thread John Spray
osdc/Journaler is for RBD, client/Journaler is for CephFS.


On Thu, Feb 25, 2021 at 8:26 AM 조규진  wrote:
>
> Hi, John.
>
> Thanks for your kind reply!
>
> While I'm checking the code you recommended and other .cc files 
> about the journal, I found that there are two Journaler classes.
> One is at "src/osdc/Journaler.h" and the other one is at 
> "src/journal/Journaler.h".
> If you don't mind, could you tell me which one is for the MDS journal, and 
> what the differences between them are?
>
> Thanks.
> kyujin
>
> On Thu, Feb 25, 2021 at 1:15 AM, John Spray wrote:
>>
>> On Wed, Feb 24, 2021 at 9:10 AM 조규진  wrote:
>> >
>> > Hi.
>> >
>> > I'm a newbie to CephFS and I have some questions about how per-MDS journals
>> > work.
>> > In Sage's paper (OSDI '06), I read that each MDS has its own journal and
>> > lazily flushes metadata modifications to the OSD cluster.
>> > What I'm wondering is that some directory operations like rename touch
>> > multiple pieces of metadata and may involve two or more MDSs and their journals,
>> > so I think this needs some mechanism to construct a transaction that works
>> > across multiple journals, like a distributed transaction mechanism.
>> >
>> > Could anybody explain how per-MDS journals work in such directory
>> > operations, or recommend some references about it?
>>
>> Your intuition is correct: these transactions span multiple MDS journals.
>>
>> The code for this stuff is somewhat long, in src/mds/Server.cc, but
>> here are a couple of pointers if you're interested in untangling it:
>> - Server::handle_client_rename is the entry point
>> - The MDS which handles the client request sends MMDSPeerRequest
>> messages to peers in rename_prepare_witness, and waits for
>> acknowledgements before writing EUpdate events to its journal
>> - The peer(s) write EPeerUpdate(OP_PREPARE) events to their journals
>> during prepare, and EPeerUpdate(OP_COMMIT) after the first MDS has
>> completed.
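A quick way to follow these pointers in a source checkout (the grep targets are just the symbols named above; exact locations vary by release):

    grep -nE "handle_client_rename|rename_prepare_witness" src/mds/Server.cc
    grep -rn "EPeerUpdate" src/mds/ | head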
>>
>> John
>>
>>
>>
>> >
>> > Thanks.
>> > kyujin.
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph slow at 80% full, mds nodes lots of unused memory

2021-02-25 Thread Simon Oosthoek
On 24/02/2021 22:28, Patrick Donnelly wrote:
> Hello Simon,
> 
> On Wed, Feb 24, 2021 at 7:43 AM Simon Oosthoek  
> wrote:
>>
>> On 24/02/2021 12:40, Simon Oosthoek wrote:
>>> Hi
>>>
>>> we've been running our Ceph cluster for nearly 2 years now (Nautilus)
>>> and recently, due to a temporary situation the cluster is at 80% full.
>>>
>>> We are only using CephFS on the cluster.
>>>
>>> Normally, I realize we should be adding OSD nodes, but this is a
>>> temporary situation, and I expect the cluster to go to <60% full quite soon.
>>>
>>> Anyway, we are noticing some really problematic slowdowns. There are
>>> some things that could be related but we are unsure...
>>>
>>> - Our 2 MDS nodes (1 active, 1 standby) are configured with 128GB RAM,
>>> but are not using more than 2GB, this looks either very inefficient, or
>>> wrong ;-)
>>
>> After looking at our monitoring history, it seems the mds cache is
>> actually used more fully, but most of our servers are getting a weekly
>> reboot by default. This clears the mds cache obviously. I wonder if
>> that's a smart idea for an MDS node...? ;-)
> 
> No, it's not. Can you also check that you do not have mds_cache_size
> configured, perhaps in the local ceph.conf on the MDS?
> 

Hi Patrick,

I've already changed the reboot period to 1 month.

The mds_cache_size is not configured locally in the /etc/ceph/ceph.conf
file, so I guess it's just the weekly reboot that cleared the memory of
cache data...
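For reference, one way to double-check both settings (the commands are standard, but swap in your own MDS name for cephmds2):

    # is the deprecated mds_cache_size set anywhere in the config database?
    ceph config get mds mds_cache_size
    # what memory limit is the running MDS actually using?
    ceph daemon mds.cephmds2 config get mds_cache_memory_limit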

I'm starting to think that a full ceph cluster could probably be the
only cause of performance problems. Though I don't know why that would be.

Cheers

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph slow at 80% full, mds nodes lots of unused memory

2021-02-25 Thread Dylan McCulloch
Simon Oosthoek wrote:
> On 24/02/2021 22:28, Patrick Donnelly wrote:
> >   Hello Simon,
> >
> >  On Wed, Feb 24, 2021 at 7:43 AM Simon Oosthoek 
> >  wrote:
> >
> >  On 24/02/2021 12:40, Simon Oosthoek wrote:
> >   Hi
> >
> >  we've been running our Ceph cluster for nearly 2 years now (Nautilus)
> >  and recently, due to a temporary situation the cluster is at 80% full.
> >
> >  We are only using CephFS on the cluster.
> >
> >  Normally, I realize we should be adding OSD nodes, but this is a
> >  temporary situation, and I expect the cluster to go to <60% full quite 
> > soon.
> >
> >  Anyway, we are noticing some really problematic slowdowns. There are
> >  some things that could be related but we are unsure...
> >
> >  - Our 2 MDS nodes (1 active, 1 standby) are configured with 128GB RAM,
> >  but are not using more than 2GB, this looks either very inefficient, or
> >  wrong ;-)
> >  After looking at our monitoring history, it seems the mds cache is
> >  actually used more fully, but most of our servers are getting a weekly
> >  reboot by default. This clears the mds cache obviously. I wonder if
> >  that's a smart idea for an MDS node...? ;-)
> >  No, it's not. Can you also check that you do not have mds_cache_size
> >  configured, perhaps on the MDS local ceph.conf?
> >
> Hi Patrick,
>
> I've already changed the reboot period to 1 month.
>
> The mds_cache_size is not configured locally in the /etc/ceph/ceph.conf
> file, so I guess it's just the weekly reboot that cleared the memory of
> cache data...
>
> I'm starting to think that a full ceph cluster could probably be the
> only cause of performance problems. Though I don't know why that would be.

Did the performance issue only arise when OSDs in the cluster reached 80% 
usage? What is your osd nearfull_ratio?
$ ceph osd dump | grep ratio
Is the cluster in HEALTH_WARN with nearfull OSDs?
We noticed recently when one of our clusters had nearfull OSDs that cephfs 
client performance was heavily impacted.
Our cluster is nautilus 14.2.15. Clients are kernel 4.19.154.
We determined that it was most likely due to the ceph client forcing sync file 
writes when the nearfull flag is present.
https://github.com/ceph/ceph-client/commit/7614209736fbc4927584d4387faade4f31444fce
Increasing and decreasing the nearfull ratio confirmed that performance was 
only impacted while the nearfull flag was present.
Not sure if that's relevant for your case.
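For anyone checking this on their own cluster, something like the following shows whether the flag is involved (the 0.87 value is only an example, and raising the ratio is a temporary workaround, not a fix):

    ceph health detail | grep -i nearfull
    ceph osd dump | grep ratio
    # temporarily raise the threshold and watch whether performance recovers
    ceph osd set-nearfull-ratio 0.87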

>
> Cheers
>
> /Simon


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Suspicious newsletter] Re: List number of buckets owned per user

2021-02-25 Thread Szabo, Istvan (Agoda)
Maybe this one?

radosgw-admin bucket list --uid=

And then count with standard Linux tools (grep, wc -l).
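Roughly like this, for example (assumes jq is installed; the user-listing command can differ slightly between releases):

    for u in $(radosgw-admin metadata list user | jq -r '.[]'); do
        echo "$u $(radosgw-admin bucket list --uid="$u" | jq 'length')"
    done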

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

-Original Message-
From: Konstantin Shalygin 
Sent: Thursday, February 25, 2021 12:49 PM
To: Marcelo 
Cc: ceph-users@ceph.io
Subject: [Suspicious newsletter] [ceph-users] Re: List number of buckets owned 
per user


Or you can derive the per-user bucket usage another way: consult the code of 
radosgw_usage_exporter [1]; maybe it is enough to just start the exporter and work 
with the data in Grafana.


Cheers,
k

[1] https://github.com/blemmenes/radosgw_usage_exporter

> On 24 Feb 2021, at 16:08, Marcelo  wrote:
>
> I'm trying to list the number of buckets that users have for
> monitoring purposes, but I need to list and count the number of
> buckets per user. Is it possible to get this information somewhere else?

___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph slow at 80% full, mds nodes lots of unused memory

2021-02-25 Thread Simon Oosthoek
On 25/02/2021 11:19, Dylan McCulloch wrote:
> Simon Oosthoek wrote:
>> On 24/02/2021 22:28, Patrick Donnelly wrote:
>> >   Hello Simon,
>> >  
>> >  On Wed, Feb 24, 2021 at 7:43 AM Simon Oosthoek
>  wrote:
>> >  
>> >  On 24/02/2021 12:40, Simon Oosthoek wrote:
>> >   Hi
>> >
>> >  we've been running our Ceph cluster for nearly 2 years now (Nautilus)
>> >  and recently, due to a temporary situation the cluster is at 80% full.
>> >
>> >  We are only using CephFS on the cluster.
>> >
>> >  Normally, I realize we should be adding OSD nodes, but this is a
>> >  temporary situation, and I expect the cluster to go to <60% full
> quite soon.
>> >
>> >  Anyway, we are noticing some really problematic slowdowns. There are
>> >  some things that could be related but we are unsure...
>> >
>> >  - Our 2 MDS nodes (1 active, 1 standby) are configured with 128GB RAM,
>> >  but are not using more than 2GB, this looks either very inefficient, or
>> >  wrong ;-)
>> >  After looking at our monitoring history, it seems the mds cache is
>> >  actually used more fully, but most of our servers are getting a weekly
>> >  reboot by default. This clears the mds cache obviously. I wonder if
>> >  that's a smart idea for an MDS node...? ;-)  
>> >  No, it's not. Can you also check that you do not have mds_cache_size
>> >  configured, perhaps on the MDS local ceph.conf?
>> >  
>> Hi Patrick,
>>
>> I've already changed the reboot period to 1 month.
>>
>> The mds_cache_size is not configured locally in the /etc/ceph/ceph.conf
>> file, so I guess it's just the weekly reboot that cleared the memory of
>> cache data...
>>
>> I'm starting to think that a full ceph cluster could probably be the
>> only cause of performance problems. Though I don't know why that would be.
> 
> Did the performance issue only arise when OSDs in the cluster reached
> 80% usage? What is your osd nearfull_ratio?
> $ ceph osd dump | grep ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85


> Is the cluster in HEALTH_WARN with nearfull OSDs?

]# ceph -s
  cluster:
id: b489547c-ba50-4745-a914-23eb78e0e5dc
health: HEALTH_WARN
2 pgs not deep-scrubbed in time
957 pgs not scrubbed in time

  services:
mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 7d)
mgr: cephmon3(active, since 2M), standbys: cephmon1, cephmon2
mds: cephfs:1 {0=cephmds2=up:active} 1 up:standby
osd: 168 osds: 168 up (since 11w), 168 in (since 9M); 43 remapped pgs

  task status:
scrub status:
mds.cephmds2: idle

  data:
pools:   10 pools, 5280 pgs
objects: 587.71M objects, 804 TiB
usage:   1.4 PiB used, 396 TiB / 1.8 PiB avail
pgs: 9634168/5101965463 objects misplaced (0.189%)
 5232 active+clean
 29   active+remapped+backfill_wait
 14   active+remapped+backfilling
 5active+clean+scrubbing+deep+repair

  io:
client:   136 MiB/s rd, 600 MiB/s wr, 386 op/s rd, 359 op/s wr
recovery: 328 MiB/s, 169 objects/s

> We noticed recently when one of our clusters had nearfull OSDs that
> cephfs client performance was heavily impacted.
> Our cluster is nautilus 14.2.15. Clients are kernel 4.19.154.
> We determined that it was most likely due to the ceph client forcing
> sync file writes when nearfull flag is present.
> https://github.com/ceph/ceph-client/commit/7614209736fbc4927584d4387faade4f31444fce
> Increasing and decreasing the nearfull ratio confirmed that performance
> was only impacted while the nearfull flag was present.
> Not sure if that's relevant for your case.

I think this could be very similar in our cluster, thanks for sharing
your insights!

Cheers

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Erasure coded calculation

2021-02-25 Thread Simon Sutter
Hello everyone!

I'm trying to calculate the theoretical usable storage of a ceph cluster with 
erasure coded pools.

I have 8 nodes and the profile for all data pools will be k=6 m=2.
If every node has 6 x 1TB wouldn't the calculation be like this:
RAW capacity: 8Nodes x 6Disks x 1TB = 48TB
Loss to m=2: 48TB / 8Nodes x 2m = 12TB
EC capacity: 48TB - 12TB = 36TB

At the moment I have one cluster with 8 nodes and different disks than the 
sample (but every node has the same amount of disks and the same sized disks).
The output of ceph df detail is:
--- RAW STORAGE ---
CLASS  SIZE AVAILUSED RAW USED  %RAW USED
hdd109 TiB  103 TiB  5.8 TiB   5.9 TiB   5.41
TOTAL  109 TiB  103 TiB  5.8 TiB   5.9 TiB   5.41

--- POOLS ---
POOL   ID  PGS  STORED   OBJECTS  %USED  MAX AVAIL
device_health_metrics   11   51 MiB   48  0 30 TiB
rep_data_fs 2   32   14 KiB3.41k  0 30 TiB
rep_meta_fs 3   32  227 MiB1.72k  0 30 TiB
ec_bkp14   32  4.2 TiB1.10M   6.11 67 TiB

So ec_bkp1 uses 4.2TiB and there are 67TiB of free usable storage.
This means the total EC usable storage would be 71.2TiB.
But calculating from the 109TiB RAW storage, shouldn't it be 81.75TiB?
Is the ~10TiB difference just some overhead (that would be a lot of overhead) or is 
the calculation not correct?

And what if I want to expand the cluster in the first sample above by three 
nodes with 6 x 2TB, i.e. not the same sized disks as the others?
Will the calculation with the same EC profile still work the same way?
RAW capacity: 8Nodes x 6Disks x 1TB + 3Nodes x 6Disks x 2TB = 84TB
Loss to m=2: 84TB / 11Nodes x 2m = 15.27TB
EC capacity: 84TB - 15.27TB = 68.72TB


Thanks in advance,
Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Newbie Requesting Help - Please, This Is Driving Me Mad/Crazy!

2021-02-25 Thread Frank Schilder
> The line in the steps I did is:

OK, missed that one. Should be the right location.

The logs I'm referring to are the ordinary stdout/stderr of the osd process. 
Just start the daemon in the foreground by hand if this output is not available 
otherwise.

Simplest form is:

/usr/bin/ceph-osd -f -i 

Make sure no daemon is running at the time.
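For example, for osd.0 (the id is just a placeholder, use your own):

    systemctl stop ceph-osd@0
    sudo -u ceph /usr/bin/ceph-osd -f -i 0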

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: duluxoz 
Sent: 25 February 2021 03:53:10
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Newbie Requesting Help - Please, This Is Driving Me 
Mad/Crazy!

Yes, the OSD Key is in the correct folder (or, at least, I think it is).
The line in the steps I did is:

 sudo -u ceph ceph auth get-or-create osd.0 osd 'allow *' mon 'allow 
profile osd' mgr 'allow profile osd' -o /var/lib/ceph/osd/ceph-0/keyring

This places the osd-0 key in the file 'keyring' in the
'/var/lib/ceph/osd/ceph-0' folder.

Now, I *assume* (ie made an @ss out of... well, me) that this is the
correct location for that key (based on my understanding of the Ceph
Doco), but obviously, I could be wrong.

And as far as the start-up and shutdown-log is concerned: there ain't
none - or at least, I can't find them (unless you mean the 'systemctl
start' log, etc?)

Any other ideas :-)

Cheers

Dulux-Oz

On 25/02/2021 07:04, Frank Schilder wrote:
> I'm not running Octopus and I don't use the hard-core bare-metal deployment 
> method. I use ceph-volume and things work smoothly. Hence, my input might be 
> useless.
>
> Now looking at your text, you should always include the start-up and 
> shut-down log of the OSD. As a wild guess, did you copy the OSD auth key to 
> the required directory? It's somewhere in the instructions and I can't seem to 
> find the copy command in your description.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: matt...@peregrineit.net 
> Sent: 23 February 2021 06:09:52
> To: ceph-users@ceph.io
> Subject: [ceph-users] Newbie Requesting Help - Please, This Is Driving Me 
> Mad/Crazy!
>
> Hi Everyone,
>
> Let me apologise upfront:
>
>  If this isn't the correct List to post to
>  If this has been answered already (& I've missed it in my searching)
>  If this has ended up double posted
>  If I've in any way given (or about to give) offence to anyone
>
> I really need some help.
>
> I'm trying to get a simple single host Pilot/Test Cluster up and running. I'm 
> using CentOS 8 (fully updated), and Ceph-Octopus (latest version from the 
> Ceph Repo). I have both ceph-mon and ceph-mgr working/running (although 
> ceph-mgr keeps stopping/crashing after about 1-3 hours or so - but that's 
> another issue), and my first osd (and only osd at this point) *appears* to be 
> working, but when I issue the command 'systemctl start ceph-osd@0' the 
> ceph-osd daemon won't spin up and thus when I issue 'ceph -s' the result says 
> the 'osd: 1 osds: 0 up, 0 in'.
>
> I've gone through the relevant logs but I can't seem to find the issue.
>
> I'm doing this as a Manual Install because I want to actually *learn* what's 
> going on during the install/etc. I know I can use cephadmin (in a production 
> environment), but as I said, I'm trying to learn how everything "fits 
> together".
>
> I've read and re-read the official Ceph Documentation and followed the 
> following steps/commands to get Ceph installed and running:
>
> Ran the following commands:
>  su -
>  useradd -d /home/ceph -m ceph -p 
>  mkdir /home/ceph/.ssh
>
> Added a public SSH Key to /home/ceph/.ssh/authorized_keys.
>
> Ran the following commands:
>  chmod 600 /home/ceph/.ssh/*
>  chown ceph:ceph -R /home/ceph/.ssh
>
> Added the ceph.repo details to /etc/yum.repos.d/ceph.repo (as per the Ceph 
> Documentation).
>
> Ran the following command:
>  dnf -y install qemu-kvm qemu-guest-agent libvirt gdisk ceph
>
> Created the /etc/ceph/ceph.conf file (see listing below).
>
> Ran the following commands:
>  ceph-authtool --create-keyring /etc/ceph/ceph.mon.keyring --gen-key 
> -n mon. --cap mon 'allow *'
>  ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring 
> --gen-key -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 
> 'allow *' --cap mgr 'allow *'
>  ceph-authtool --create-keyring /var/lib/ceph/bootstrap-osd/keyring 
> --gen-key -n client.bootstrap-osd --cap mon 'profile bootstrap-osd' --cap mgr 
> 'allow r'
>  ceph-authtool /etc/ceph/ceph.mon.keyring --import-keyring 
> /etc/ceph/ceph.client.admin.keyring
>  ceph-authtool /etc/ceph/ceph.mon.keyring --import-keyring 
> /var/lib/ceph/bootstrap-osd/keyring
>  chown -R ceph:ceph /etc/ceph/
>  chown -R ceph:ceph /var/lib/ceph/
>  monmaptool --create --add ceph01 192.168.0.10 --fsid 
> 98e84f97-031f-4958-bd54-22305f

[ceph-users] Re: Newbie Requesting Help - Please, This Is Driving Me Mad/Crazy!

2021-02-25 Thread Frank Schilder
I think it is this here: 
https://docs.ceph.com/en/latest/install/manual-deployment/#long-form . As far 
as I can tell, this is only for educational purposes and not intended for real 
deployment. It is not a manual deployment for lvm OSDs though, so yeah, this 
should be updated.

I think the intention of this section is exactly what you write, it is supposed 
to explain what the high-level tool is doing under the hood. Unfortunately, it 
seems to be ceph-disk and not ceph-volume.

For manual deployment, I myself use and would recommend the ceph-volume 
bluestore method just one section above 
https://docs.ceph.com/en/latest/install/manual-deployment/#bluestore with the 
additional simplification of using the "lvm batch" sub-command.
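For example (device names are placeholders; --report previews what would be created without touching the disks):

    ceph-volume lvm batch --bluestore --report /dev/sdb /dev/sdc
    ceph-volume lvm batch --bluestore /dev/sdb /dev/sdc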

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Burkhard Linke 
Sent: 25 February 2021 08:41:14
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Newbie Requesting Help - Please, This Is Driving Me 
Mad/Crazy!

Hi,


your whole OSD deployment is wrong. Ceph has not used a plain filesystem
for OSD data for at least two major releases, and the existing FileStore
backend is deprecated. Dunno where you got those steps from...


Just use ceph-volume, preferably the lvm-based deployment. If you
really want to understand how Ceph works, use a default deployment
method and then investigate what this method actually does (and why it
does exactly that...).


Regards,

Burkhard

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Erasure coded calculation

2021-02-25 Thread Szabo, Istvan (Agoda)
Yes, 6/8 of 109 is 81.75; the rest is probably some BlueStore overhead, I guess.
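In other words, the rule of thumb is usable capacity roughly raw * k / (k + m), e.g.:

    raw=109; k=6; m=2
    echo "scale=2; $raw * $k / ($k + $m)" | bc    # 81.75 (TiB), before overhead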

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

-Original Message-
From: Simon Sutter 
Sent: Thursday, February 25, 2021 5:55 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Erasure coded calculation


Hello everyone!

I'm trying to calculate the theoretical usable storage of a ceph cluster with 
erasure coded pools.

I have 8 nodes and the profile for all data pools will be k=6 m=2.
If every node has 6 x 1TB wouldn't the calculation be like this:
RAW capacity: 8Nodes x 6Disks x 1TB = 48TB Loss to m=2: 48TB / 8Nodes x 2m = 
12TB EC capacity: 48TB - 12TB = 36TB

At the moment I have one cluster with 8 nodes and different disks than the 
sample (but every node has the same amount of disks and the same sized disks).
The output of ceph df detail is:
--- RAW STORAGE ---
CLASS  SIZE AVAILUSED RAW USED  %RAW USED
hdd109 TiB  103 TiB  5.8 TiB   5.9 TiB   5.41
TOTAL  109 TiB  103 TiB  5.8 TiB   5.9 TiB   5.41

--- POOLS ---
POOL   ID  PGS  STORED   OBJECTS  %USED  MAX AVAIL
device_health_metrics   11   51 MiB   48  0 30 TiB
rep_data_fs 2   32   14 KiB3.41k  0 30 TiB
rep_meta_fs 3   32  227 MiB1.72k  0 30 TiB
ec_bkp14   32  4.2 TiB1.10M   6.11 67 TiB

> So ec_bkp1 uses 4.2TiB and there are 67TiB of free usable storage.
This means total EC usable storage would be 71.2TiB.
But calculating with the 109TiB RAW storage, shouldn't it be  81.75?
Are the 10TiB just some overhead (that would be much overhead) or is the 
calculation not correct?

And what If I want to expand the cluster in the first sample above by three 
nodes with 6 x 2TB, which means not the same sized disks as the others.
Will the calculation with the same EC profile still be the same?
RAW capacity: 8Nodes x 6Disks x 1TB + 3Nodes x 6Disks x 2TB = 84TB Loss to m=2: 
84TB / 11Nodes x 2m = 15.27TB EC capacity: 84TB - 15.27TB = 68.72TB


Thanks in advance,
Simon
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: February 2021 Tech Talk and Code Walk-through

2021-02-25 Thread Mike Perez
Just a reminder, Sage is live now, giving an update on the Pacific
Release. This will be recorded and posted later to the Ceph Youtube
channel.

https://ceph.io/ceph-tech-talks/

On Tue, Feb 16, 2021 at 11:14 PM Mike Perez  wrote:
>
> Hi everyone!
>
> I'm excited to announce two talks we have on the schedule for February 2021:
>
> Jason Dillaman will be giving part 2 to the librbd code walk-through.
>
> The stream starts on February 23rd at 18:00 UTC / 19:00 CET / 1:00 PM
> EST / 10:00 AM PST
>
> https://tracker.ceph.com/projects/ceph/wiki/Code_Walkthroughs
>
> Part 1: https://www.youtube.com/watch?v=L0x61HpREy4
>
> --
>
> What's New in the Pacific Release
>
> Hear Sage Weil give a live update on the development of the Pacific Release.
>
> The stream starts on February 25th at 17:00 UTC / 18:00 CET / 12 PM
> EST / 9 AM PST.
>
> https://ceph.io/ceph-tech-talks/
>
> All live streams will be recorded and
>
> --
> Mike Perez
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW: Multiple Site does not sync old data

2021-02-25 Thread 特木勒
Hi all:

ceph version: 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8)

I have a strange question. I just created a multisite setup for a Ceph cluster,
but I notice the old data in the source cluster is not synced. Only new data
gets synced into the second zone cluster.

Is there anything I need to do to enable a full sync for a bucket, or is this a
bug?
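(For reference, these are commands commonly used to inspect sync state and to restart a full data sync; verify them against the docs for your release first, since "data sync init" re-runs the full sync and the gateways need a restart afterwards. Bucket and zone names are placeholders.)

    radosgw-admin sync status
    radosgw-admin bucket sync status --bucket=<bucket>
    radosgw-admin data sync init --source-zone=<source-zone>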

Thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Question about per MDS journals

2021-02-25 Thread Mykola Golub
On Thu, Feb 25, 2021 at 09:59:41AM +, John Spray wrote:
> osdc/Journaler is for RBD, client/Journaler is for CephFS.

Actually, src/journal/Journaler.h is for RBD (it is more generic,
but is currently used only by RBD).

And src/osdc/Journaler.h is for CephFS.
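A quick way to confirm which one the MDS actually uses, run in a source checkout:

    grep -rn 'osdc/Journaler.h' src/mds/ | head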

> 
> 
> On Thu, Feb 25, 2021 at 8:26 AM 조규진  wrote:
> >
> > Hi, John.
> >
> > Thanks for your kind reply!
> >
> > While i'm checking the code that you recommend to check and other .cc files 
> > about journal, I find that there is two Journaler class.
> > One is at "src/osdc/Journaler.h" and the other one is at 
> > "src/journal/Journaler.h".
> > If you don't mind, could you tell me which one is for MDS journal? and the 
> > differences between them?
> >
> > Thanks.
> > kyujin
> >
> > On Thu, Feb 25, 2021 at 1:15 AM, John Spray wrote:
> >>
> >> On Wed, Feb 24, 2021 at 9:10 AM 조규진  wrote:
> >> >
> >> > Hi.
> >> >
> >> > I'm a newbie in CephFS and I have some questions about how per-MDS 
> >> > journals
> >> > work.
> >> > In Sage's paper (osdi '06), I read that each MDSs has its own journal and
> >> > it lazily flushes metadata modifications on OSD cluster.
> >> > What I'm wondering is that some directory operations like rename work 
> >> > with
> >> > multiple metadata and It may work on two or more MDSs and their journals,
> >> > so I think it needs some mechanisms to construct a transaction that works
> >> > on multiple journals like some distributed transaction mechanisms.
> >> >
> >> > Could anybody explains how per-MDS journals work in such directory
> >> > operations? or recommends some references about it?
> >>
> >> Your intuition is correct: these transactions span multiple MDS journals.
> >>
> >> The code for this stuff is somewhat long, in src/mds/Server.cc, but
> >> here are a couple of pointers if you're interested in untangling it:
> >> - Server::handle_client_rename is the entry point
> >> - The MDS which handles the client request sends MMDSPeerRequest
> >> messages to peers in rename_prepare_witness, and waits for
> >> acknowledgements before writing EUpdate events to its journal
> >> - The peer(s) write EPeerUpdate(OP_PREPARE) events to their journals
> >> during prepare, and EPeerUpdate(OP_COMMIT) after the first MDS has
> >> completed.
> >>
> >> John
> >>
> >>
> >>
> >> >
> >> > Thanks.
> >> > kyujin.
> >> > ___
> >> > ceph-users mailing list -- ceph-users@ceph.io
> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
Mykola Golub
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Erasure coded calculation

2021-02-25 Thread Mykola Golub
On Thu, Feb 25, 2021 at 10:55:05AM +, Simon Sutter wrote:

> The output of ceph df detail is:
> --- RAW STORAGE ---
> CLASS  SIZE AVAILUSED RAW USED  %RAW USED
> hdd109 TiB  103 TiB  5.8 TiB   5.9 TiB   5.41
> TOTAL  109 TiB  103 TiB  5.8 TiB   5.9 TiB   5.41
> 
> --- POOLS ---
> POOL   ID  PGS  STORED   OBJECTS  %USED  MAX AVAIL
> device_health_metrics   11   51 MiB   48  0 30 TiB
> rep_data_fs 2   32   14 KiB3.41k  0 30 TiB
> rep_meta_fs 3   32  227 MiB1.72k  0 30 TiB
> ec_bkp14   32  4.2 TiB1.10M   6.11 67 TiB
> 
> So ec_bkp1 uses 4.2TiB an there are 67TiB free usable Storage.
> This means total EC usable storage would be 71.2TiB.
> But calculating with the 109TiB RAW storage, shouldn't it be  81.75?

The "MAX AVAIL" is not "free", i.e. it is not a difference between
total and used. It is estimated by the cluster as "how much data user
will be able to store additionally until one of osds is filled up".

Consider an example, when by some reason all osds are 10% full and one
50% full. The cluster will "assume" that if you store additional data
it will be distributed the same way, i.e. that one osd will continue
to store more than others, so when all other osds will 20% full that
one will be 100% full and the cluster is not usable.

I.e. basically "MAX AVAIL" in this example is (N - 1) * 10% + 1 * 50%,
instead of (N - 1) * 90% + 1 * 50%, which whould you expect for "free".

To make "MAX AVAIL" match exactly "free", you have to have a perfectly
balanced cluster. Look at `ceph osd df` output to see how well data is
balanced in your case.
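For a quick look at the spread, the summary lines at the end of `ceph osd df` show MIN/MAX VAR and STDDEV:

    ceph osd df | tail -n 2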

-- 
Mykola Golub
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] MON slow ops and growing MON store

2021-02-25 Thread Janek Bevendorff
Hi,

All of a sudden, we are experiencing very concerning MON behaviour. We have 
five MONs and all of them have thousands up to tens of thousands of slow ops, 
the oldest one blocking basically indefinitely (at least the timer keeps 
creeping up). Additionally, the MON stores keep inflating heavily. Under normal 
circumstances we have about 450-550MB there. Right now it's 27GB and growing 
(rapidly).

I tried restarting all MONs, I disabled auto-scaling (just in case) and checked 
the system load and hardware. I also restarted the MGR and MDS daemons, but to 
no avail.

Is there any way I can debug this properly? I can’t seem to find how I can 
actually view what ops are causing this and what client (if any) may be 
responsible for it.

Thanks
Janek
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MON slow ops and growing MON store

2021-02-25 Thread Dan van der Ster
ceph daemon mon.`hostname -s` ops

That should show you the accumulating ops.
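To see which kinds of ops dominate, something like this should work (assuming the usual op-tracker JSON layout and jq installed):

    ceph daemon mon.$(hostname -s) ops | jq -r '.ops[].description' \
        | sed 's/(.*//' | sort | uniq -c | sort -rn | head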

.. dan


On Thu, Feb 25, 2021, 8:23 PM Janek Bevendorff <
janek.bevendo...@uni-weimar.de> wrote:

> Hi,
>
> All of a sudden, we are experiencing very concerning MON behaviour. We
> have five MONs and all of them have thousands up to tens of thousands of
> slow ops, the oldest one blocking basically indefinitely (at least the
> timer keeps creeping up). Additionally, the MON stores keep inflating
> heavily. Under normal circumstances we have about 450-550MB there. Right
> now its 27GB and growing (rapidly).
>
> I tried restarting all MONs, I disabled auto-scaling (just in case) and
> checked the system load and hardware. I also restarted the MGR and MDS
> daemons, but to no avail.
>
> Is there any way I can debug this properly? I can’t seem to find how I can
> actually view what ops are causing this and what client (if any) may be
> responsible for it.
>
> Thanks
> Janek
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] MDSs report damaged metadata

2021-02-25 Thread ricardo.re.azevedo
Hi all,

 

My CephFS MDS is reporting damaged metadata following the addition (and
remapping) of 12 new OSDs.
`ceph tell mds.database-0 damage ls` reports ~85 files damaged, all of type
"backtrace", which is very concerning.
`ceph tell mds.database-0 scrub start / recursive repair` seems to have no
effect on the damage. What does this sort of damage mean? Is there anything
I can do to recover these files?


> ceph status reports:
  cluster:

id: 692905c0-f271-4cd8-9e43-1c32ef8abd13

health: HEALTH_ERR

1 MDSs report damaged metadata

630 pgs not deep-scrubbed in time

630 pgs not scrubbed in time

 

  services:

mon: 3 daemons, quorum database-0,file-server,webhost (age 37m)

mgr: webhost(active, since 3d), standbys: file-server, database-0

mds: cephfs:1 {0=database-0=up:active} 2 up:standby

osd: 48 osds: 48 up (since 56m), 48 in (since 13d); 10 remapped pgs

 

  task status:

scrub status:

mds.database-0: idle

 

  data:

pools:   7 pools, 633 pgs

objects: 60.82M objects, 231 TiB

usage:   336 TiB used, 246 TiB / 582 TiB avail

pgs: 623 active+clean

 6   active+remapped+backfilling

 4   active+remapped+backfill_wait

 

Thanks for the help.

 

Best,

Ricardo

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MON slow ops and growing MON store

2021-02-25 Thread Janek Bevendorff
Thanks, Dan.

On the first MON, the command doesn’t even return, but I was able to get a dump 
from the one I restarted most recently. The oldest ops look like this:

{
"description": "log(1000 entries from seq 17876238 at 
2021-02-25T15:13:20.306487+0100)",
"initiated_at": "2021-02-25T20:40:34.698932+0100",
"age": 183.762551121,
"duration": 183.762599201,
"type_data": {
"events": [
{
"time": "2021-02-25T20:40:34.698932+0100",
"event": "initiated"
},
{
"time": "2021-02-25T20:40:34.698636+0100",
"event": "throttled"
},
{
"time": "2021-02-25T20:40:34.698932+0100",
"event": "header_read"
},
{
"time": "2021-02-25T20:40:34.701407+0100",
"event": "all_read"
},
{
"time": "2021-02-25T20:40:34.701455+0100",
"event": "dispatched"
},
{
"time": "2021-02-25T20:40:34.701458+0100",
"event": "mon:_ms_dispatch"
},
{
"time": "2021-02-25T20:40:34.701459+0100",
"event": "mon:dispatch_op"
},
{
"time": "2021-02-25T20:40:34.701459+0100",
"event": "psvc:dispatch"
},
{
"time": "2021-02-25T20:40:34.701490+0100",
"event": "logm:wait_for_readable"
},
{
"time": "2021-02-25T20:40:34.701491+0100",
"event": "logm:wait_for_readable/paxos"
},
{
"time": "2021-02-25T20:40:34.701496+0100",
"event": "paxos:wait_for_readable"
},
{
"time": "2021-02-25T20:40:34.989198+0100",
"event": "callback finished"
},
{
"time": "2021-02-25T20:40:34.989199+0100",
"event": "psvc:dispatch"
},
{
"time": "2021-02-25T20:40:34.989208+0100",
"event": "logm:preprocess_query"
},
{
"time": "2021-02-25T20:40:34.989208+0100",
"event": "logm:preprocess_log"
},
{
"time": "2021-02-25T20:40:34.989278+0100",
"event": "forward_request_leader"
},
{
"time": "2021-02-25T20:40:34.989344+0100",
"event": "forwarded"
},
{
"time": "2021-02-25T20:41:58.658022+0100",
"event": "resend forwarded message to leader"
},
{
"time": "2021-02-25T20:42:27.735449+0100",
"event": "resend forwarded message to leader"
}
],
"info": {
"seq": 41550,
"src_is_mon": false,
"source": "osd.104 v2:XXX:6864/16579",
"forwarded_to_leader": true
}


Any idea what that might be about? Almost looks like this: 
https://tracker.ceph.com/issues/24180
I set debug_mon to 0, but I keep getting a lot of log spill in the journals. It's 
about 1-2 messages per second, mostly RocksDB stuff, but nothing that actually 
looks serious or even log-worthy. I had already noticed that, despite logging 
being set to warning level, the cluster log keeps being written to the MON log. 
But that shouldn't cause such massive stability issues, should it? The date on 
the log op is also weird: 15:13+0100 was hours ago.

Here’s my log config:

global   advanced   clog_to_syslog_level   warning
global   basic      err_to_syslog          true
global   basic      log_to_file            false
global   basic      log_to_stderr          false
global   basic      log_to_syslog          true
global

[ceph-users] Re: MDSs report damaged metadata

2021-02-25 Thread Patrick Donnelly
Hello Ricardo,

On Thu, Feb 25, 2021 at 11:51 AM  wrote:
>
> Hi all,
>
>
>
> My cephfs MDS is reporting damaged metadata following the addition (and
> remapping) of 12 new OSDs.
> `ceph tell mds.database-0 damage ls` reports ~85 files damaged. All of type
> "backtrace" which is very concerning.

It is not concerning, actually. This just indicates that the reverse
link of the file's object data to its path in the file system is
incorrect.

> ` ceph tell mds.database-0 scrub start / recursive repair` seems to have no
> effect on the damage. What does this sort of damage mean? Is there anything
> I can do to recover these files?

Scrubbing should correct it. Try "recursive repair force" to see if
that helps. "force" will cause the MDS to revisit metadata that has
been scrubbed previously but unchanged since then.
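Concretely, something along these lines (MDS name taken from this thread; depending on the release the scrub options may also need to be comma-separated):

    ceph tell mds.database-0 scrub start / recursive repair force
    ceph tell mds.database-0 scrub status
    ceph tell mds.database-0 damage ls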

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MON slow ops and growing MON store

2021-02-25 Thread Dan van der Ster
> "source": "osd.104...

What's happening on that osd? Is it something new which corresponds to when
your mon started growing? Are other OSDs also flooding the mons with logs?

I'm mobile so can't check... Are those logging configs the defaults? If not
 revert to default...
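If those options were set through the config database, removing them drops them back to their defaults, e.g.:

    for opt in clog_to_syslog_level err_to_syslog log_to_file log_to_stderr log_to_syslog; do
        ceph config rm global "$opt"
    done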

BTW do your mons have stable quorum or are they flapping with this load?

.. dan



On Thu, Feb 25, 2021, 8:58 PM Janek Bevendorff <
janek.bevendo...@uni-weimar.de> wrote:

> Thanks, Dan.
>
> On the first MON, the command doesn’t even return, but I was able to get a
> dump from the one I restarted most recently. The oldest ops look like this:
>
> {
> "description": "log(1000 entries from seq 17876238 at
> 2021-02-25T15:13:20.306487+0100)",
> "initiated_at": "2021-02-25T20:40:34.698932+0100",
> "age": 183.762551121,
> "duration": 183.762599201,
> "type_data": {
> "events": [
> {
> "time": "2021-02-25T20:40:34.698932+0100",
> "event": "initiated"
> },
> {
> "time": "2021-02-25T20:40:34.698636+0100",
> "event": "throttled"
> },
> {
> "time": "2021-02-25T20:40:34.698932+0100",
> "event": "header_read"
> },
> {
> "time": "2021-02-25T20:40:34.701407+0100",
> "event": "all_read"
> },
> {
> "time": "2021-02-25T20:40:34.701455+0100",
> "event": "dispatched"
> },
> {
> "time": "2021-02-25T20:40:34.701458+0100",
> "event": "mon:_ms_dispatch"
> },
> {
> "time": "2021-02-25T20:40:34.701459+0100",
> "event": "mon:dispatch_op"
> },
> {
> "time": "2021-02-25T20:40:34.701459+0100",
> "event": "psvc:dispatch"
> },
> {
> "time": "2021-02-25T20:40:34.701490+0100",
> "event": "logm:wait_for_readable"
> },
> {
> "time": "2021-02-25T20:40:34.701491+0100",
> "event": "logm:wait_for_readable/paxos"
> },
> {
> "time": "2021-02-25T20:40:34.701496+0100",
> "event": "paxos:wait_for_readable"
> },
> {
> "time": "2021-02-25T20:40:34.989198+0100",
> "event": "callback finished"
> },
> {
> "time": "2021-02-25T20:40:34.989199+0100",
> "event": "psvc:dispatch"
> },
> {
> "time": "2021-02-25T20:40:34.989208+0100",
> "event": "logm:preprocess_query"
> },
> {
> "time": "2021-02-25T20:40:34.989208+0100",
> "event": "logm:preprocess_log"
> },
> {
> "time": "2021-02-25T20:40:34.989278+0100",
> "event": "forward_request_leader"
> },
> {
> "time": "2021-02-25T20:40:34.989344+0100",
> "event": "forwarded"
> },
> {
> "time": "2021-02-25T20:41:58.658022+0100",
> "event": "resend forwarded message to leader"
> },
> {
> "time": "2021-02-25T20:42:27.735449+0100",
> "event": "resend forwarded message to leader"
> }
> ],
> "info": {
> "seq": 41550,
> "src_is_mon": false,
> "source": "osd.104 v2:XXX:6864/16579",
> "forwarded_to_leader": true
> }
>
>
> Any idea what that might be about? Almost looks like this:
> https://tracker.ceph.com/issues/24180
> I set debug_mon to 0, but I keep getting a lot of log spill in journals.
> It’s about 1-2 messages per second, mostly RocksDB stuff, but nothing that
> actually looks serious or even log-worthy. I noticed that before that
> despite logging being set to warning level, the cluster log keeps being
> written to the MON log. But it shouldn’t cause such massive stability
> is

[ceph-users] Re: MON slow ops and growing MON store

2021-02-25 Thread Janek Bevendorff
Nothing special is happening on that OSD as far as I can tell, and the OSD number in 
each op is different.
The config isn’t entirely default, but we have been using it successfully for 
quite a bit. It basically just redirects everything to journald so that we 
don’t have log creep. I reverted it nonetheless.

The MONs have a stable quorum, but the store size is so large now (35GB by this 
time), that I am seeing first daemon restarts.


> On 25. Feb 2021, at 21:10, Dan van der Ster  wrote:
> 
> > "source": "osd.104...
> 
> What's happening on that osd? Is it something new which corresponds to when 
> your mon started growing? Are other OSDs also flooding the mons with logs?
> 
> I'm mobile so can't check... Are those logging configs the defaults? If not 
>  revert to default...
> 
> BTW do your mons have stable quorum or are they flapping with this load?
> 
> .. dan
> 
> 
> 
> On Thu, Feb 25, 2021, 8:58 PM Janek Bevendorff 
> mailto:janek.bevendo...@uni-weimar.de>> 
> wrote:
> Thanks, Dan.
> 
> On the first MON, the command doesn’t even return, but I was able to get a 
> dump from the one I restarted most recently. The oldest ops look like this:
> 
> {
> "description": "log(1000 entries from seq 17876238 at 
> 2021-02-25T15:13:20.306487+0100)",
> "initiated_at": "2021-02-25T20:40:34.698932+0100",
> "age": 183.762551121,
> "duration": 183.762599201,
> "type_data": {
> "events": [
> {
> "time": "2021-02-25T20:40:34.698932+0100",
> "event": "initiated"
> },
> {
> "time": "2021-02-25T20:40:34.698636+0100",
> "event": "throttled"
> },
> {
> "time": "2021-02-25T20:40:34.698932+0100",
> "event": "header_read"
> },
> {
> "time": "2021-02-25T20:40:34.701407+0100",
> "event": "all_read"
> },
> {
> "time": "2021-02-25T20:40:34.701455+0100",
> "event": "dispatched"
> },
> {
> "time": "2021-02-25T20:40:34.701458+0100",
> "event": "mon:_ms_dispatch"
> },
> {
> "time": "2021-02-25T20:40:34.701459+0100",
> "event": "mon:dispatch_op"
> },
> {
> "time": "2021-02-25T20:40:34.701459+0100",
> "event": "psvc:dispatch"
> },
> {
> "time": "2021-02-25T20:40:34.701490+0100",
> "event": "logm:wait_for_readable"
> },
> {
> "time": "2021-02-25T20:40:34.701491+0100",
> "event": "logm:wait_for_readable/paxos"
> },
> {
> "time": "2021-02-25T20:40:34.701496+0100",
> "event": "paxos:wait_for_readable"
> },
> {
> "time": "2021-02-25T20:40:34.989198+0100",
> "event": "callback finished"
> },
> {
> "time": "2021-02-25T20:40:34.989199+0100",
> "event": "psvc:dispatch"
> },
> {
> "time": "2021-02-25T20:40:34.989208+0100",
> "event": "logm:preprocess_query"
> },
> {
> "time": "2021-02-25T20:40:34.989208+0100",
> "event": "logm:preprocess_log"
> },
> {
> "time": "2021-02-25T20:40:34.989278+0100",
> "event": "forward_request_leader"
> },
> {
> "time": "2021-02-25T20:40:34.989344+0100",
> "event": "forwarded"
> },
> {
> "time": "2021-02-25T20:41:58.658022+0100",
> "event": "resend forwarded message to leader"
> },
> {
> "time": "2021-02-25T20:42:27.735449+0100",
> "event": "resend forwarded message to leader"
> }
> ],
> "info": {
> "seq": 41550,
> "src_is_mon": false,
> "source": "osd.104 v2:XXX:6864/16579",
>  

[ceph-users] Re: MON slow ops and growing MON store

2021-02-25 Thread Seena Fallah
I had the same problem in my cluster and it was because of the insights mgr
module, which was storing lots of data in RocksDB because my cluster was
degraded.
If you have degraded PGs, try disabling the insights module.
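For example:

    ceph mgr module disable insights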

On Thu, Feb 25, 2021 at 11:40 PM Dan van der Ster 
wrote:

> > "source": "osd.104...
>
> What's happening on that osd? Is it something new which corresponds to when
> your mon started growing? Are other OSDs also flooding the mons with logs?
>
> I'm mobile so can't check... Are those logging configs the defaults? If not
>  revert to default...
>
> BTW do your mons have stable quorum or are they flapping with this load?
>
> .. dan
>
>
>
> On Thu, Feb 25, 2021, 8:58 PM Janek Bevendorff <
> janek.bevendo...@uni-weimar.de> wrote:
>
> > Thanks, Dan.
> >
> > On the first MON, the command doesn’t even return, but I was able to get
> a
> > dump from the one I restarted most recently. The oldest ops look like
> this:
> >
> > {
> > "description": "log(1000 entries from seq 17876238 at
> > 2021-02-25T15:13:20.306487+0100)",
> > "initiated_at": "2021-02-25T20:40:34.698932+0100",
> > "age": 183.762551121,
> > "duration": 183.762599201,
> > "type_data": {
> > "events": [
> > {
> > "time": "2021-02-25T20:40:34.698932+0100",
> > "event": "initiated"
> > },
> > {
> > "time": "2021-02-25T20:40:34.698636+0100",
> > "event": "throttled"
> > },
> > {
> > "time": "2021-02-25T20:40:34.698932+0100",
> > "event": "header_read"
> > },
> > {
> > "time": "2021-02-25T20:40:34.701407+0100",
> > "event": "all_read"
> > },
> > {
> > "time": "2021-02-25T20:40:34.701455+0100",
> > "event": "dispatched"
> > },
> > {
> > "time": "2021-02-25T20:40:34.701458+0100",
> > "event": "mon:_ms_dispatch"
> > },
> > {
> > "time": "2021-02-25T20:40:34.701459+0100",
> > "event": "mon:dispatch_op"
> > },
> > {
> > "time": "2021-02-25T20:40:34.701459+0100",
> > "event": "psvc:dispatch"
> > },
> > {
> > "time": "2021-02-25T20:40:34.701490+0100",
> > "event": "logm:wait_for_readable"
> > },
> > {
> > "time": "2021-02-25T20:40:34.701491+0100",
> > "event": "logm:wait_for_readable/paxos"
> > },
> > {
> > "time": "2021-02-25T20:40:34.701496+0100",
> > "event": "paxos:wait_for_readable"
> > },
> > {
> > "time": "2021-02-25T20:40:34.989198+0100",
> > "event": "callback finished"
> > },
> > {
> > "time": "2021-02-25T20:40:34.989199+0100",
> > "event": "psvc:dispatch"
> > },
> > {
> > "time": "2021-02-25T20:40:34.989208+0100",
> > "event": "logm:preprocess_query"
> > },
> > {
> > "time": "2021-02-25T20:40:34.989208+0100",
> > "event": "logm:preprocess_log"
> > },
> > {
> > "time": "2021-02-25T20:40:34.989278+0100",
> > "event": "forward_request_leader"
> > },
> > {
> > "time": "2021-02-25T20:40:34.989344+0100",
> > "event": "forwarded"
> > },
> > {
> > "time": "2021-02-25T20:41:58.658022+0100",
> > "event": "resend forwarded message to leader"
> > },
> > {
> > "time": "2021-02-25T20:42:27.735449+0100",
> > "event": "resend forwarded message to leader"
> > }
> > ],
> > "info": {
> > "seq": 41550,
> > "src_is_mon": false,
> > "source": "osd.104 v2:XXX:6864/16579",
> > "forwarded_to_leader": true
> >  

[ceph-users] Re: MON slow ops and growing MON store

2021-02-25 Thread Janek Bevendorff
Thanks for the tip, but I do not have degraded PGs and the module is already 
disabled.


> On 25. Feb 2021, at 21:17, Seena Fallah  wrote:
> 
> I had the same problem in my cluster and it was because of insights mgr 
> module that was storing lots of data to the RocksDB because mu cluster was 
> degraded. 
> If you have degraded pgs try to disable insights module.
> 
> On Thu, Feb 25, 2021 at 11:40 PM Dan van der Ster  > wrote:
> > "source": "osd.104...
> 
> What's happening on that osd? Is it something new which corresponds to when
> your mon started growing? Are other OSDs also flooding the mons with logs?
> 
> I'm mobile so can't check... Are those logging configs the defaults? If not
>  revert to default...
> 
> BTW do your mons have stable quorum or are they flapping with this load?
> 
> .. dan
> 
> 
> 
> On Thu, Feb 25, 2021, 8:58 PM Janek Bevendorff <
> janek.bevendo...@uni-weimar.de > wrote:
> 
> > Thanks, Dan.
> >
> > On the first MON, the command doesn’t even return, but I was able to get a
> > dump from the one I restarted most recently. The oldest ops look like this:
> >
> > {
> > "description": "log(1000 entries from seq 17876238 at
> > 2021-02-25T15:13:20.306487+0100)",
> > "initiated_at": "2021-02-25T20:40:34.698932+0100",
> > "age": 183.762551121,
> > "duration": 183.762599201,
> > "type_data": {
> > "events": [
> > {
> > "time": "2021-02-25T20:40:34.698932+0100",
> > "event": "initiated"
> > },
> > {
> > "time": "2021-02-25T20:40:34.698636+0100",
> > "event": "throttled"
> > },
> > {
> > "time": "2021-02-25T20:40:34.698932+0100",
> > "event": "header_read"
> > },
> > {
> > "time": "2021-02-25T20:40:34.701407+0100",
> > "event": "all_read"
> > },
> > {
> > "time": "2021-02-25T20:40:34.701455+0100",
> > "event": "dispatched"
> > },
> > {
> > "time": "2021-02-25T20:40:34.701458+0100",
> > "event": "mon:_ms_dispatch"
> > },
> > {
> > "time": "2021-02-25T20:40:34.701459+0100",
> > "event": "mon:dispatch_op"
> > },
> > {
> > "time": "2021-02-25T20:40:34.701459+0100",
> > "event": "psvc:dispatch"
> > },
> > {
> > "time": "2021-02-25T20:40:34.701490+0100",
> > "event": "logm:wait_for_readable"
> > },
> > {
> > "time": "2021-02-25T20:40:34.701491+0100",
> > "event": "logm:wait_for_readable/paxos"
> > },
> > {
> > "time": "2021-02-25T20:40:34.701496+0100",
> > "event": "paxos:wait_for_readable"
> > },
> > {
> > "time": "2021-02-25T20:40:34.989198+0100",
> > "event": "callback finished"
> > },
> > {
> > "time": "2021-02-25T20:40:34.989199+0100",
> > "event": "psvc:dispatch"
> > },
> > {
> > "time": "2021-02-25T20:40:34.989208+0100",
> > "event": "logm:preprocess_query"
> > },
> > {
> > "time": "2021-02-25T20:40:34.989208+0100",
> > "event": "logm:preprocess_log"
> > },
> > {
> > "time": "2021-02-25T20:40:34.989278+0100",
> > "event": "forward_request_leader"
> > },
> > {
> > "time": "2021-02-25T20:40:34.989344+0100",
> > "event": "forwarded"
> > },
> > {
> > "time": "2021-02-25T20:41:58.658022+0100",
> > "event": "resend forwarded message to leader"
> > },
> > {
> > "time": "2021-02-25T20:42:27.735449+0100",
> > "event": "resend forwarded message to leader"
> > }
> > ],
> 

[ceph-users] Re: MON slow ops and growing MON store

2021-02-25 Thread Dan van der Ster
Maybe the debugging steps in that insights tracker can be helpful
anyway: https://tracker.ceph.com/issues/39955

-- dan

On Thu, Feb 25, 2021 at 9:27 PM Janek Bevendorff
 wrote:
>
> Thanks for the tip, but I do not have degraded PGs and the module is already 
> disabled.
>
>
> On 25. Feb 2021, at 21:17, Seena Fallah  wrote:
>
> I had the same problem in my cluster and it was because of insights mgr 
> module that was storing lots of data to the RocksDB because mu cluster was 
> degraded.
> If you have degraded pgs try to disable insights module.
>
> On Thu, Feb 25, 2021 at 11:40 PM Dan van der Ster  wrote:
>>
>> > "source": "osd.104...
>>
>> What's happening on that osd? Is it something new which corresponds to when
>> your mon started growing? Are other OSDs also flooding the mons with logs?
>>
>> I'm mobile so can't check... Are those logging configs the defaults? If not
>>  revert to default...
>>
>> BTW do your mons have stable quorum or are they flapping with this load?
>>
>> .. dan
>>
>>
>>
>> On Thu, Feb 25, 2021, 8:58 PM Janek Bevendorff <
>> janek.bevendo...@uni-weimar.de> wrote:
>>
>> > Thanks, Dan.
>> >
>> > On the first MON, the command doesn’t even return, but I was able to get a
>> > dump from the one I restarted most recently. The oldest ops look like this:
>> >
>> > {
>> > "description": "log(1000 entries from seq 17876238 at
>> > 2021-02-25T15:13:20.306487+0100)",
>> > "initiated_at": "2021-02-25T20:40:34.698932+0100",
>> > "age": 183.762551121,
>> > "duration": 183.762599201,
>> > "type_data": {
>> > "events": [
>> > {
>> > "time": "2021-02-25T20:40:34.698932+0100",
>> > "event": "initiated"
>> > },
>> > {
>> > "time": "2021-02-25T20:40:34.698636+0100",
>> > "event": "throttled"
>> > },
>> > {
>> > "time": "2021-02-25T20:40:34.698932+0100",
>> > "event": "header_read"
>> > },
>> > {
>> > "time": "2021-02-25T20:40:34.701407+0100",
>> > "event": "all_read"
>> > },
>> > {
>> > "time": "2021-02-25T20:40:34.701455+0100",
>> > "event": "dispatched"
>> > },
>> > {
>> > "time": "2021-02-25T20:40:34.701458+0100",
>> > "event": "mon:_ms_dispatch"
>> > },
>> > {
>> > "time": "2021-02-25T20:40:34.701459+0100",
>> > "event": "mon:dispatch_op"
>> > },
>> > {
>> > "time": "2021-02-25T20:40:34.701459+0100",
>> > "event": "psvc:dispatch"
>> > },
>> > {
>> > "time": "2021-02-25T20:40:34.701490+0100",
>> > "event": "logm:wait_for_readable"
>> > },
>> > {
>> > "time": "2021-02-25T20:40:34.701491+0100",
>> > "event": "logm:wait_for_readable/paxos"
>> > },
>> > {
>> > "time": "2021-02-25T20:40:34.701496+0100",
>> > "event": "paxos:wait_for_readable"
>> > },
>> > {
>> > "time": "2021-02-25T20:40:34.989198+0100",
>> > "event": "callback finished"
>> > },
>> > {
>> > "time": "2021-02-25T20:40:34.989199+0100",
>> > "event": "psvc:dispatch"
>> > },
>> > {
>> > "time": "2021-02-25T20:40:34.989208+0100",
>> > "event": "logm:preprocess_query"
>> > },
>> > {
>> > "time": "2021-02-25T20:40:34.989208+0100",
>> > "event": "logm:preprocess_log"
>> > },
>> > {
>> > "time": "2021-02-25T20:40:34.989278+0100",
>> > "event": "forward_request_leader"
>> > },
>> > {
>> > "time": "2021-02-25T20:40:34.989344+0100",
>> > "event": "forwarded"
>> > },
>> > {
>> > "time": "2021-02-25T20:41:58.658022+0100",
>> > "event": "resend forwarded message to leader"
>> > },
>> >

[ceph-users] Re: MON slow ops and growing MON store

2021-02-25 Thread Janek Bevendorff
Thanks, I’ll try that tomorrow.


> On 25. Feb 2021, at 21:59, Dan van der Ster  wrote:
> 
> Maybe the debugging steps in that insights tracker can be helpful
> anyway: https://tracker.ceph.com/issues/39955
> 
> -- dan
> 
> On Thu, Feb 25, 2021 at 9:27 PM Janek Bevendorff
>  wrote:
>> 
>> Thanks for the tip, but I do not have degraded PGs and the module is already 
>> disabled.
>> 
>> 
>> On 25. Feb 2021, at 21:17, Seena Fallah  wrote:
>> 
>> I had the same problem in my cluster and it was because of insights mgr 
>> module that was storing lots of data to the RocksDB because mu cluster was 
>> degraded.
>> If you have degraded pgs try to disable insights module.
>> 
>> On Thu, Feb 25, 2021 at 11:40 PM Dan van der Ster  
>> wrote:
>>> 
 "source": "osd.104...
>>> 
>>> What's happening on that osd? Is it something new which corresponds to when
>>> your mon started growing? Are other OSDs also flooding the mons with logs?
>>> 
>>> I'm mobile so can't check... Are those logging configs the defaults? If not
>>>  revert to default...
>>> 
>>> BTW do your mons have stable quorum or are they flapping with this load?
>>> 
>>> .. dan
>>> 
>>> 
>>> 
>>> On Thu, Feb 25, 2021, 8:58 PM Janek Bevendorff <
>>> janek.bevendo...@uni-weimar.de> wrote:
>>> 
 Thanks, Dan.
 
 On the first MON, the command doesn’t even return, but I was able to get a
 dump from the one I restarted most recently. The oldest ops look like this:
 
{
"description": "log(1000 entries from seq 17876238 at
 2021-02-25T15:13:20.306487+0100)",
"initiated_at": "2021-02-25T20:40:34.698932+0100",
"age": 183.762551121,
"duration": 183.762599201,
"type_data": {
"events": [
{
"time": "2021-02-25T20:40:34.698932+0100",
"event": "initiated"
},
{
"time": "2021-02-25T20:40:34.698636+0100",
"event": "throttled"
},
{
"time": "2021-02-25T20:40:34.698932+0100",
"event": "header_read"
},
{
"time": "2021-02-25T20:40:34.701407+0100",
"event": "all_read"
},
{
"time": "2021-02-25T20:40:34.701455+0100",
"event": "dispatched"
},
{
"time": "2021-02-25T20:40:34.701458+0100",
"event": "mon:_ms_dispatch"
},
{
"time": "2021-02-25T20:40:34.701459+0100",
"event": "mon:dispatch_op"
},
{
"time": "2021-02-25T20:40:34.701459+0100",
"event": "psvc:dispatch"
},
{
"time": "2021-02-25T20:40:34.701490+0100",
"event": "logm:wait_for_readable"
},
{
"time": "2021-02-25T20:40:34.701491+0100",
"event": "logm:wait_for_readable/paxos"
},
{
"time": "2021-02-25T20:40:34.701496+0100",
"event": "paxos:wait_for_readable"
},
{
"time": "2021-02-25T20:40:34.989198+0100",
"event": "callback finished"
},
{
"time": "2021-02-25T20:40:34.989199+0100",
"event": "psvc:dispatch"
},
{
"time": "2021-02-25T20:40:34.989208+0100",
"event": "logm:preprocess_query"
},
{
"time": "2021-02-25T20:40:34.989208+0100",
"event": "logm:preprocess_log"
},
{
"time": "2021-02-25T20:40:34.989278+0100",
"event": "forward_request_leader"
},
{
"time": "2021-02-25T20:40:34.989344+0100",
"event": "forwarded"
},
{
"time": "2021-02-25T20:41:58.658022+0100",
"event": 

[ceph-users] Re: MON slow ops and growing MON store

2021-02-25 Thread Dan van der Ster
Also did you solve your log spam issue here?
https://tracker.ceph.com/issues/49161
Surely these things are related?

You might need to share more full logs from cluster, mon, osd, mds,
mgr so that we can help get to the bottom of this.
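
For example, something along these lines (a rough sketch -- adjust the mon id
and paths to your deployment) usually shows where the growth and the slow ops
are coming from:

  ceph daemon mon.$(hostname -s) ops | head -n 60        # oldest in-flight ops on this mon
  ceph log last 200 debug cluster                        # what is flooding the cluster log
  du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db  # mon store size on disk
  ceph config get mon debug_mon                          # debug level actually in effect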

-- dan

On Thu, Feb 25, 2021 at 10:04 PM Janek Bevendorff
 wrote:
>
> Thanks, I’ll try that tomorrow.
>
>
> > On 25. Feb 2021, at 21:59, Dan van der Ster  wrote:
> >
> > Maybe the debugging steps in that insights tracker can be helpful
> > anyway: https://tracker.ceph.com/issues/39955
> >
> > -- dan
> >
> > On Thu, Feb 25, 2021 at 9:27 PM Janek Bevendorff
> >  wrote:
> >>
> >> Thanks for the tip, but I do not have degraded PGs and the module is 
> >> already disabled.
> >>
> >>
> >> On 25. Feb 2021, at 21:17, Seena Fallah  wrote:
> >>
> >> I had the same problem in my cluster and it was because of insights mgr 
> >> module that was storing lots of data to the RocksDB because my cluster was 
> >> degraded.
> >> If you have degraded pgs try to disable insights module.
> >>
> >> On Thu, Feb 25, 2021 at 11:40 PM Dan van der Ster  
> >> wrote:
> >>>
>  "source": "osd.104...
> >>>
> >>> What's happening on that osd? Is it something new which corresponds to 
> >>> when
> >>> your mon started growing? Are other OSDs also flooding the mons with logs?
> >>>
> >>> I'm mobile so can't check... Are those logging configs the defaults? If 
> >>> not
> >>>  revert to default...
> >>>
> >>> BTW do your mons have stable quorum or are they flapping with this load?
> >>>
> >>> .. dan
> >>>
> >>>
> >>>
> >>> On Thu, Feb 25, 2021, 8:58 PM Janek Bevendorff <
> >>> janek.bevendo...@uni-weimar.de> wrote:
> >>>
>  Thanks, Dan.
> 
>  On the first MON, the command doesn’t even return, but I was able to get 
>  a
>  dump from the one I restarted most recently. The oldest ops look like 
>  this:
> 
> {
> "description": "log(1000 entries from seq 17876238 at
>  2021-02-25T15:13:20.306487+0100)",
> "initiated_at": "2021-02-25T20:40:34.698932+0100",
> "age": 183.762551121,
> "duration": 183.762599201,
> "type_data": {
> "events": [
> {
> "time": "2021-02-25T20:40:34.698932+0100",
> "event": "initiated"
> },
> {
> "time": "2021-02-25T20:40:34.698636+0100",
> "event": "throttled"
> },
> {
> "time": "2021-02-25T20:40:34.698932+0100",
> "event": "header_read"
> },
> {
> "time": "2021-02-25T20:40:34.701407+0100",
> "event": "all_read"
> },
> {
> "time": "2021-02-25T20:40:34.701455+0100",
> "event": "dispatched"
> },
> {
> "time": "2021-02-25T20:40:34.701458+0100",
> "event": "mon:_ms_dispatch"
> },
> {
> "time": "2021-02-25T20:40:34.701459+0100",
> "event": "mon:dispatch_op"
> },
> {
> "time": "2021-02-25T20:40:34.701459+0100",
> "event": "psvc:dispatch"
> },
> {
> "time": "2021-02-25T20:40:34.701490+0100",
> "event": "logm:wait_for_readable"
> },
> {
> "time": "2021-02-25T20:40:34.701491+0100",
> "event": "logm:wait_for_readable/paxos"
> },
> {
> "time": "2021-02-25T20:40:34.701496+0100",
> "event": "paxos:wait_for_readable"
> },
> {
> "time": "2021-02-25T20:40:34.989198+0100",
> "event": "callback finished"
> },
> {
> "time": "2021-02-25T20:40:34.989199+0100",
> "event": "psvc:dispatch"
> },
> {
> "time": "2021-02-25T20:40:34.989208+0100",
> "event": "logm:preprocess_query"
> },
> {
> "time": "2021-02-25T20:40:34.989208+0100",
> "even

[ceph-users] Re: MON slow ops and growing MON store

2021-02-25 Thread Janek Bevendorff


> On 25. Feb 2021, at 22:17, Dan van der Ster  wrote:
> 
> Also did you solve your log spam issue here?
> https://tracker.ceph.com/issues/49161
> Surely these things are related?


No. But I noticed that the DBG log spam only happens when log_to_syslog is 
enabled. systemd (journald) is smart enough to avoid filling up the disks/RAM, 
but it may still strain the whole system. I disabled that for now and enabled 
log_to_file again, which, unlike the syslog path, does respect the configured 
debug level. I am pretty sure the syslog behaviour is a bug.
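
For anyone else hitting this, the switch amounts to something like the
following (a rough sketch; apply it at whatever scope fits your setup):

  ceph config set global log_to_syslog false
  ceph config set global mon_cluster_log_to_syslog false
  ceph config set global log_to_file true
  ceph config set global mon_cluster_log_to_file true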

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Question about per MDS journals

2021-02-25 Thread John Spray
Quite right, I flipped them while writing the mail - oops.

John

On Thu, Feb 25, 2021 at 6:04 PM Mykola Golub  wrote:
>
> On Thu, Feb 25, 2021 at 09:59:41AM +, John Spray wrote:
> > osdc/Journaler is for RDB, client/Journaler is for CephFS.
>
> Actually, src/journal/Journaler.h is for RBD (it is more generic,
> but currently is used by RBD only).
>
> And src/osdc/Journaler.h is for cephfs.
>
> >
> >
> > On Thu, Feb 25, 2021 at 8:26 AM 조규진  wrote:
> > >
> > > Hi, John.
> > >
> > > Thanks for your kind reply!
> > >
> > > While i'm checking the code that you recommend to check and other .cc 
> > > files about journal, I find that there is two Journaler class.
> > > One is at "src/osdc/Journaler.h" and the other one is at 
> > > "src/journal/Journaler.h".
> > > If you don't mind, could you tell me which one is for MDS journal? and 
> > > the differences between them?
> > >
> > > Thanks.
> > > kyujin
> > >
> > > 2021년 2월 25일 (목) 오전 1:15, John Spray 님이 작성:
> > >>
> > >> On Wed, Feb 24, 2021 at 9:10 AM 조규진  wrote:
> > >> >
> > >> > Hi.
> > >> >
> > >> > I'm a newbie in CephFS and I have some questions about how per-MDS 
> > >> > journals
> > >> > work.
> > >> > In Sage's paper (osdi '06), I read that each MDSs has its own journal 
> > >> > and
> > >> > it lazily flushes metadata modifications on OSD cluster.
> > >> > What I'm wondering is that some directory operations like rename work 
> > >> > with
> > >> > multiple metadata and It may work on two or more MDSs and their 
> > >> > journals,
> > >> > so I think it needs some mechanisms to construct a transaction that 
> > >> > works
> > >> > on multiple journals like some distributed transaction mechanisms.
> > >> >
> > >> > Could anybody explains how per-MDS journals work in such directory
> > >> > operations? or recommends some references about it?
> > >>
> > >> Your intuition is correct: these transactions span multiple MDS journals.
> > >>
> > >> The code for this stuff is somewhat long, in src/mds/Server.cc, but
> > >> here are a couple of pointers if you're interested in untangling it:
> > >> - Server::handle_client_rename is the entry point
> > >> - The MDS which handles the client request sends MMDSPeerRequest
> > >> messages to peers in rename_prepare_witness, and waits for
> > >> acknowledgements before writing EUpdate events to its journal
> > >> - The peer(s) write EPeerUpdate(OP_PREPARE) events to their journals
> > >> during prepare, and EPeerUpdate(OP_COMMIT) after the first MDS has
> > >> completed.
> > >>
> > >> John
> > >>
> > >>
> > >>
> > >> >
> > >> > Thanks.
> > >> > kyujin.
> > >> > ___
> > >> > ceph-users mailing list -- ceph-users@ceph.io
> > >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
> --
> Mykola Golub
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Slow cluster / misplaced objects - Ceph 15.2.9

2021-02-25 Thread David Orman
Hi,

We've got an interesting issue we're running into on Ceph 15.2.9. We're
experiencing VERY slow performance from the cluster, and extremely slow
misplaced object correction, with very little cpu/disk/network utilization
(almost idle) across all nodes in the cluster.
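
(For anyone following along: the usual first checks for slow recovery are
something like the following -- a rough sketch using the stock option names.)

  ceph -s                                          # recovery/backfill rate, slow ops
  ceph config get osd osd_max_backfills
  ceph config get osd osd_recovery_max_active
  ceph config get osd osd_recovery_sleep_hdd
  ceph config dump | grep -Ei 'recovery|backfill'  # any non-default throttles?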

We have 7 servers in this cluster, each with 24 rotational OSDs and two
NVMes, each NVMe holding DB/WAL for 12 of the OSDs. The OSDs are all equally
weighted, so the tree is pretty straightforward:

root@ceph01:~# ceph osd tree
Inferring fsid 41bb9256-c3bf-11ea-85b9-9e07b0435492
Inferring config /var/lib/ceph/41bb9256-c3bf-11ea-85b9-9e07b0435492/mon.ceph01/config
Using recent ceph image docker.io/ceph/ceph@sha256:4e710662986cf366c282323bfb4c4ca507d7e117c5ccf691a8273732073297e5

ID   CLASS  WEIGHT      TYPE NAME        STATUS  REWEIGHT  PRI-AFF
 -1         2149.39062  root default
 -2         2149.39062      rack rack1
 -5          307.05579          host ceph01
  0    hdd    12.79399              osd.0      up   1.0      1.0
  1    hdd    12.79399              osd.1      up   1.0      1.0
  2    hdd    12.79399              osd.2      up   1.0      1.0
  3    hdd    12.79399              osd.3      up   1.0      1.0
  4    hdd    12.79399              osd.4      up   1.0      1.0
  5    hdd    12.79399              osd.5      up   1.0      1.0
  6    hdd    12.79399              osd.6      up   1.0      1.0
  7    hdd    12.79399              osd.7      up   1.0      1.0
  8    hdd    12.79399              osd.8      up   1.0      1.0
  9    hdd    12.79399              osd.9      up   1.0      1.0
 10    hdd    12.79399              osd.10     up   1.0      1.0
 11    hdd    12.79399              osd.11     up   1.0      1.0
 12    hdd    12.79399              osd.12     up   1.0      1.0
 13    hdd    12.79399              osd.13     up   1.0      1.0
 14    hdd    12.79399              osd.14     up   1.0      1.0
 15    hdd    12.79399              osd.15     up   1.0      1.0
 16    hdd    12.79399              osd.16     up   1.0      1.0
 17    hdd    12.79399              osd.17     up   1.0      1.0
 18    hdd    12.79399              osd.18     up   1.0      1.0
 19    hdd    12.79399              osd.19     up   1.0      1.0
 20    hdd    12.79399              osd.20     up   1.0      1.0
 21    hdd    12.79399              osd.21     up   1.0      1.0
 22    hdd    12.79399              osd.22     up   1.0      1.0
 23    hdd    12.79399              osd.23     up   1.0      1.0
 -7          307.05579          host ceph02
 24    hdd    12.79399              osd.24     up   1.0      1.0
 25    hdd    12.79399              osd.25     up   1.0      1.0
 26    hdd    12.79399              osd.26     up   1.0      1.0
 27    hdd    12.79399              osd.27     up   1.0      1.0
 28    hdd    12.79399              osd.28     up   1.0      1.0
 29    hdd    12.79399              osd.29     up   1.0      1.0
 30    hdd    12.79399              osd.30     up   1.0      1.0
 31    hdd    12.79399              osd.31     up   1.0      1.0
 32    hdd    12.79399              osd.32     up   1.0      1.0
 33    hdd    12.79399              osd.33     up   1.0      1.0
 34    hdd    12.79399              osd.34     up   1.0      1.0
 35    hdd    12.79399              osd.35     up   1.0      1.0
 36    hdd    12.79399              osd.36     up   1.0      1.0
 37    hdd    12.79399              osd.37     up   1.0      1.0
 38    hdd    12.79399              osd.38     up   1.0      1.0
 39    hdd    12.79399              osd.39     up   1.0      1.0
 40    hdd    12.79399              osd.40     up   1.0      1.0
 41    hdd    12.79399              osd.41     up   1.0      1.0
 42    hdd    12.79399              osd.42     up   1.0      1.0
 43    hdd    12.79399              osd.43     up   1.0      1.0
 44    hdd    12.79399              osd.44     up   1.0      1.0
 45    hdd    12.79399              osd.45     up   1.0      1.0
 46    hdd    12.79399              osd.46     up   1.0      1.0
 47    hdd    12.79399              osd.47     up   1.0      1.0
 -9          307.05579          host ceph03
 48    hdd    12.79399              osd.48     up   1.0      1.0
 49    hdd    12.79399              osd.49     up   1.0      1.0
 50    hdd    12.79399              osd.50     up   1.0      1.0
 51    hdd    12.79399              osd.51     up   1.0      1.0
 52    hdd    12.79399              osd.52     up   1.0      1.0
 53    hdd    12.79399              osd.53     up   1.0      1.0
 54    hdd    12.79399              osd.54     up   1.0      1.0
 55h

[ceph-users] Re: [Suspicious newsletter] RGW: Multiple Site does not sync olds data

2021-02-25 Thread Szabo, Istvan (Agoda)
Same for me on 15.2.8 as well.
I’m trying directional sync now; it looks like symmetrical sync has an issue.
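
In case it helps while comparing notes, the sync state can be inspected with
something like this (a rough sketch; the bucket name is just a placeholder):

  radosgw-admin sync status
  radosgw-admin bucket sync status --bucket=<bucket-name>   # per-bucket view
  radosgw-admin sync error list                             # anything stuck?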

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

On 2021. Feb 26., at 1:03, 特木勒  wrote:


Hi all:

ceph version: 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8)

I have a strange question: I just created a multisite setup for a Ceph
cluster, but I noticed that the old data in the source cluster is not being
synced. Only new data is synced to the second-zone cluster.

Is there anything I need to do to enable a full sync for the bucket, or is
this a bug?

Thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Newbie Requesting Help - Please, This Is Driving Me Mad/Crazy! - A Follow Up

2021-02-25 Thread duluxoz

Hi Everyone,

Thanks to all for both the online and PM help - once it was pointed out 
that the existing (Octopus) Documentation was... less than current, I 
ended up using the ceph-volume command.


A couple of follow-up questions:

When using ceph-volume lvm create:

1. Can you specify an osd number, or are you stuck with the
   system-assigned one?
2. How do you use the command with only part of a HDD - i.e. something
   along the lines of 'ceph-volume lvm create --data /dev/sda4'
   (because sda1-3 contain the os/boot/swap systems)? (See the sketch
   below for the sort of thing I mean.)
3. Where the hell do the drives get mapped/mounted/whatever to?
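
For (1) and (2), is something like the following the intended approach? (An
untested sketch - the partition, VG and LV names below are just examples.)

  # carve an LVM volume out of the spare partition, then hand it to ceph-volume
  pvcreate /dev/sda4
  vgcreate ceph-block /dev/sda4
  lvcreate -l 100%FREE -n osd-block ceph-block
  # --osd-id asks for a specific osd number instead of the next free one
  ceph-volume lvm create --data ceph-block/osd-block --osd-id 12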

Thanks for the help

Regards

Dulux-Oz

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Newbie Help With ceph-mgr

2021-02-25 Thread duluxoz

Hi All,

My ceph-mgr keeps stopping (for some unknown reason) after about an hour 
or so (but has run for up to 2-3 hours before stopping). Up till now 
I've simply restarted it with 'ceph-mgr -i ceph01'.


Is this normal behaviour? And if it isn't, what should I be looking for 
in the logs?


I was thinking of writing a quick cron script (with 'ceph-mgr -i 
ceph01') to run on the hour every hour to restart it, but figured that 
there had to be a better way - especially if ceph-mgr was crashing 
instead of being a "feature". Any ideas/advice?
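
Is something like this (a rough sketch, assuming a package-based install and
an mgr id of 'ceph01') the right place to start looking?

  systemctl status ceph-mgr@ceph01                 # did systemd see it exit or crash?
  journalctl -u ceph-mgr@ceph01 --since "3 hours ago"
  ceph crash ls                                    # crashes recorded by the crash module
  tail -n 200 /var/log/ceph/ceph-mgr.ceph01.log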


Thanks in advance

Dulux-Oz

--

*Matthew J BLACK*
  M.Inf.Tech.(Data Comms)
  MBA
  B.Sc.
  MACS (Snr), CP, IP3P

When you want it done /right/ ‒ the first time!

Phone:  +61 4 0411 0089
Email:  matt...@peregrineit.net
Web:    www.peregrineit.net

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io