[ceph-users] Re: rbd mirroring - journal growing and snapshot high io load

2022-09-16 Thread ronny.lippold

hi and thanks a lot.
good to know i'm not alone and that i understood some of it right :)

i will also report back if there is something new.


so from my point of view, the only consistent way is to freeze the fs or
shut down the vm.

after that, start journal mirroring. so i think only journal mode can work.
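a minimal sketch of that sequence, assuming the guest mount point and the
pool/image names below (they are placeholders, not taken from this thread):

  # inside the guest: quiesce the filesystem first
  fsfreeze --freeze /mnt/data

  # on the source cluster: switch the image to journal-based mirroring
  rbd mirror image disable libvirt-pool/vm-disk        # only if snapshot mode was enabled before
  rbd feature enable libvirt-pool/vm-disk journaling   # journal mode needs the journaling image feature
  rbd mirror image enable libvirt-pool/vm-disk journal

  # inside the guest: resume i/o once mirroring is running
  fsfreeze --unfreeze /mnt/data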

you helped me a lot, cause i had a major understanding problem.

maybe i will start a new thread in the mailing list and will see.

have a great weekend and hopefully a smooth job switch ... i know
what you mean :)



ronny


On 2022-09-15 15:33, Arthur Outhenin-Chalandre wrote:

Hi Ronny,


On 15/09/2022 14:32 ronny.lippold  wrote:
hi arthur, some time has passed ...

i would like to know if there is any news about your setup.
do you have replication actively running?


No, there was no change at CERN. I am switching jobs as well actually
so I won't have much news for you on CERN infra in the future. I know
other people from the Ceph team at CERN watch this ml so you might
hear from them as well I guess.


we are actually using snapshot-based mirroring and recently had a move of both
clusters.
after that, we had some damaged filesystems in the kvm vms.
did you ever have such problems in your tests?

i think there are not so many people who are using ceph replication.
for me it's hard to find the right way.
can snapshot-based ceph replication be crash consistent? i think not.


I never noticed it myself, but yes, it's actually written in the docs
https://docs.ceph.com/en/quincy/rbd/rbd-snapshot/ (though this is not
really explained in the mirroring docs). I never tested that
super carefully though and thought this was more a rare occurrence than
anything else.

I heard a while back (maybe a year-ish ago) that there was some long-term
plan to automatically trigger an fsfreeze for librbd/qemu on a
snapshot, which would probably solve your issue (and also allow
application-level consistency via custom fsfreeze hooks). But this was
apparently a tricky feature to add. I cc'ed Ilya, maybe he would know
more about that or whether something else could have caused your issue.
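Until something like that exists, one rough way to approximate an
application-consistent point is to drive the freeze from the hypervisor
around a manually triggered mirror snapshot. A sketch only; it assumes
qemu-guest-agent is running in the guest, and the domain/pool/image names
are placeholders:

  virsh domfsfreeze my-vm
  rbd mirror image snapshot libvirt-pool/vm-disk   # take the mirror snapshot while the fs is quiesced
  virsh domfsthaw my-vm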

Cheers,

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] CephFS Mirroring failed

2022-09-16 Thread Aristide Bekroundjo



Hi,

I have two clusters (A, B) under Ceph 16.2.10 and am trying to implement CephFS
mirroring (from A to B).

It fails at the FS mirroring step. Are there any other logs that can help me
investigate the cause of the failure?

[root@monnode1]# ceph --admin-daemon 
ceph-client.cephfs-mirror.monnode1.msygpf.7.93868793542088.asok fs mirror peer 
status fs-cluster@1 107f24c3-016f-467d-8ed0-87f1026445d5
{
    "/mnt/pointfs": {
        "state": "failed",
        "snaps_synced": 0,
        "snaps_deleted": 0,
        "snaps_renamed": 0
    }
}
[root@monnode1]#


The source cluster can see the remote cluster.

[root@monnode1 ~]#  ceph fs snapshot mirror peer_list fs-cluster | jq
{
  "b4b020a5-a3f8-4713-8b33-787f0210dbec": {
"client_name": "client.mirror_remote",
"site_name": "site-remote",
"fs_name": "fs-cluster2",
"mon_host": "[v2:192.168.1.176:3300/0,v1:192.168.1.176:6789/0] 
[v2:192.168.1.225:3300/0,v1:192.168.1.225:6789/0] 
[v2:192.168.1.222:3300/0,v1:192.168.1.222:6789/0] 
[v2:192.168.1.48:3300/0,v1:192.168.1.48:6789/0]"
  }
}
[root@monnode1 ~]#
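A couple of additional places that may be worth looking at (exact command
forms vary slightly between releases, so treat this as a sketch):

  # overall cephfs-mirror daemon / peer state as seen by the cluster
  ceph fs snapshot mirror daemon status | jq

  # bump the mirror daemon's own log level via its admin socket, then
  # check its log under /var/log/ceph/ on the node running it
  ceph --admin-daemon ceph-client.cephfs-mirror.monnode1.msygpf.7.93868793542088.asok \
      config set debug_cephfs_mirror 20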

Best regards,



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Public RGW access without any LB in front?

2022-09-16 Thread Boris Behrens
Hi,
does anyone have experience with having the RGW daemons directly handle
the public traffic, without any LB or similar in front?

We are thinking of ditching the HAProxy. It handles SSL termination, load
balancing (round robin only) and things like that, but because of the nature
of the setup we only get 6-8 GBit of traffic through it.

Then we thought about putting the HAProxy directly on the RGW hosts (which are
also mon, mgr and OSD hosts) and hope to get more bandwidth through it (one
network hop fewer, more power than a virtualized VM).

And now we are discussing removing the HAProxy entirely and having the RGW
processes handle it directly.
I am a bit scared this might be a bad idea (can it handle SSL certificate
updates well, without killing active connections? Does nonlocal bind work if we
move IP addresses between the three hosts via keepalived? How well does it
handle bad HTTP requests sent by an attacker?)
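For what it's worth, a minimal sketch of what terminating SSL on the beast
frontend itself could look like (the rgw instance name, ports and certificate
paths are placeholders):

  ceph config set client.rgw.host1 rgw_frontends \
      "beast port=80 ssl_port=443 ssl_certificate=/etc/ceph/rgw.crt ssl_private_key=/etc/ceph/rgw.key"

The daemons then listen on the public IPs directly; moving those IPs around
(e.g. via keepalived) and robustness against malformed requests are exactly
the parts that would need testing.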

Does anyone have experience with this and can share some insights?

Cheers
 Boris

-- 
The self-help group "UTF-8 problems" will, as an exception, meet in the
large hall this time.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.4 QE Validation status

2022-09-16 Thread Ilya Dryomov
On Wed, Sep 14, 2022 at 11:11 AM Ilya Dryomov  wrote:
>
> On Tue, Sep 13, 2022 at 10:03 PM Yuri Weinstein  wrote:
> >
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/57472#note-1
> > Release Notes - https://github.com/ceph/ceph/pull/48072
> >
> > Seeking approvals for:
> >
> > rados - Neha, Travis, Ernesto, Adam
> > rgw - Casey
> > fs - Venky
> > orch - Adam
> > rbd - Ilya, Deepika
>
> rbd approved.
>
> > krbd - missing packages, Adam Kr is looking into it
>
> It seems like a transient issue to me, I would just reschedule.

http://pulpito.ceph.com/yuriw-2022-09-14_15:46:47-krbd-quincy-release-testing-default-smithi
http://pulpito.ceph.com/yuriw-2022-09-15_14:38:20-krbd-quincy-release-testing-default-smithi
http://pulpito.ceph.com/yuriw-2022-09-15_17:55:23-krbd-quincy-release-testing-default-smithi

krbd approved.

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd mirroring - journal growing and snapshot high io load

2022-09-16 Thread Josef Johansson
Hi,

Are you guys affected by
https://tracker.ceph.com/issues/57396 ?

On Fri, 16 Sep 2022 at 09:40, ronny.lippold  wrote:

> hi and thanks a lot.
> good to know i'm not alone and that i understood some of it right :)
>
> i will also report back if there is something new.
>
>
> so from my point of view, the only consistent way is to freeze the fs or
> shut down the vm.
> after that, start journal mirroring. so i think only journal mode can work.
>
> you helped me a lot, cause i had a major understanding problem.
>
> maybe i will start a new thread in the mailing list and will see.
>
> have a great weekend and hopefully a smooth job switch ... i know
> what you mean :)
>
>
> ronny
>
>
> On 2022-09-15 15:33, Arthur Outhenin-Chalandre wrote:
> > Hi Ronny,
> >
> >> On 15/09/2022 14:32 ronny.lippold  wrote:
> >> hi arthur, some time has passed ...
> >>
> >> i would like to know if there is any news about your setup.
> >> do you have replication actively running?
> >
> > No, there was no change at CERN. I am switching jobs as well actually
> > so I won't have much news for you on CERN infra in the future. I know
> > other people from the Ceph team at CERN watch this ml so you might
> > hear from them as well I guess.
> >
> >> we are actually using snapshot-based mirroring and recently had a move of both
> >> clusters.
> >> after that, we had some damaged filesystems in the kvm vms.
> >> did you ever have such problems in your tests?
> >>
> >> i think there are not so many people who are using ceph replication.
> >> for me it's hard to find the right way.
> >> can snapshot-based ceph replication be crash consistent? i think not.
> >
> > I never noticed it myself, but yes, it's actually written in the docs
> > https://docs.ceph.com/en/quincy/rbd/rbd-snapshot/ (though this is not
> > really explained in the mirroring docs). I never tested that
> > super carefully though and thought this was more a rare occurrence than
> > anything else.
> >
> > I heard a while back (maybe a year-ish ago) that there was some long-term
> > plan to automatically trigger an fsfreeze for librbd/qemu on a
> > snapshot, which would probably solve your issue (and also allow
> > application-level consistency via custom fsfreeze hooks). But this was
> > apparently a tricky feature to add. I cc'ed Ilya, maybe he would know
> > more about that or whether something else could have caused your issue.
> >
> > Cheers,
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] default data pool and cephfs using erasure-coded pools

2022-09-16 Thread Jerry Buburuz


Hello,

Scenario 1.

Create 2 pools (1 data / 1 meta) for cephfs using EC

ceph fs new mycephfs data1 meta1
Error: "EC pool for default data pool discouraged"

Reading the creating-pools docs, I think I understand that you want the
recovery information for the system kept in replicated pools. I am just not
certain I understand whether ceph needs one default replicated pool for the
whole cluster?

Scenario 2.

Can I do this:

# create default pools
1. create 2 new pools, data and meta, replicated.
2. create a new fs from the pools in step 1.

# create EC pools for the cephfs export
3. create 2 new pools, data1 and meta1, erasure-coded.
4. create a new fs from the pools in step 3.

I hope this is clear? If this does work, does this mean a cluster has one
default pool, or a default replicated pool for every new erasure-coded
pool?
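For reference, a sketch of the pattern the docs describe: the metadata pool
has to be replicated (EC is not supported for metadata at all), the default
data pool should be replicated, and the EC pool is attached as an additional
data pool and selected via a file layout (pool and directory names are
examples):

  ceph osd pool create cephfs_meta replicated
  ceph osd pool create cephfs_data replicated
  ceph fs new mycephfs cephfs_meta cephfs_data

  ceph osd pool create cephfs_data_ec erasure        # default EC profile; pick your own as needed
  ceph osd pool set cephfs_data_ec allow_ec_overwrites true
  ceph fs add_data_pool mycephfs cephfs_data_ec

  # direct a directory tree at the EC pool via a file layout
  setfattr -n ceph.dir.layout.pool -v cephfs_data_ec /mnt/mycephfs/ecdata

In that layout there is one replicated default data pool per filesystem,
not one per erasure-coded pool.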

thanks
jerry

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.4 QE Validation status

2022-09-16 Thread Nizamudeen A
Dashboard LGTM!

On Wed, 14 Sept 2022, 01:33 Yuri Weinstein,  wrote:

> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/57472#note-1
> Release Notes - https://github.com/ceph/ceph/pull/48072
>
> Seeking approvals for:
>
> rados - Neha, Travis, Ernesto, Adam
> rgw - Casey
> fs - Venky
> orch - Adam
> rbd - Ilya, Deepika
> krbd - missing packages, Adam Kr is looking into it
> upgrade/octopus-x - missing packages, Adam Kr is looking into it
> ceph-volume - Guillaume is looking into it
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Josh, Neha - LRC upgrade pending major suites approvals.
> RC release - pending major suites approvals.
>
> Thx
> YuriW
>
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.4 QE Validation status

2022-09-16 Thread Neha Ojha
The new rados runs look good, rados approved!

Thanks,
Neha

On Fri, Sep 16, 2022 at 8:55 AM Nizamudeen A  wrote:
>
> Dashboard LGTM!
>
> On Wed, 14 Sept 2022, 01:33 Yuri Weinstein,  wrote:
>
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/57472#note-1
> > Release Notes - https://github.com/ceph/ceph/pull/48072
> >
> > Seeking approvals for:
> >
> > rados - Neha, Travis, Ernesto, Adam
> > rgw - Casey
> > fs - Venky
> > orch - Adam
> > rbd - Ilya, Deepika
> > krbd - missing packages, Adam Kr is looking into it
> > upgrade/octopus-x - missing packages, Adam Kr is looking into it
> > ceph-volume - Guillaume is looking into it
> >
> > Please reply to this email with approval and/or trackers of known
> > issues/PRs to address them.
> >
> > Josh, Neha - LRC upgrade pending major suites approvals.
> > RC release - pending major suites approvals.
> >
> > Thx
> > YuriW
> >
> > ___
> > Dev mailing list -- d...@ceph.io
> > To unsubscribe send an email to dev-le...@ceph.io
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS Mirroring failed

2022-09-16 Thread Aristide Bekroundjo
Hi,

In the mirror service status I can see the error about the directory not being
found. So what is the right full path?



sync_snaps: failed to sync snapshots for dir_root=/mnt/folderfs
build_snap_map: failed to open local snap directory=/mnt/folderfs/.snap: (2) No such file or directory
do_sync_snaps: failed to build local snap map
sync_snaps: failed to sync snapshots for dir_root=/mnt/folderfs
build_snap_map: failed to open local snap directory=/mnt/folderfs/.snap: (2) No such file or directory
do_sync_snaps: failed to build local snap map
sync_snaps: failed to sync snapshots for dir_root=/mnt/folderfs
build_snap_map: failed to open local snap directory=/mnt/folderfs/.snap: (2) No such file or directory
do_sync_snaps: failed to build local snap map
sync_snaps: failed to sync snapshots for dir_root=/mnt/folderfs
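One thing that may be worth double-checking (just a guess from the messages
above): the path given to "ceph fs snapshot mirror add" is resolved inside
the CephFS itself, not on the local node, so a dir_root of /mnt/folderfs
means that directory has to exist within the filesystem and have at least
one snapshot to pick up. A rough check, assuming the source fs is mounted
at /mnt/srcfs (placeholder):

  ls -d /mnt/srcfs/mnt/folderfs                  # does the configured dir_root exist inside the fs?
  mkdir /mnt/srcfs/mnt/folderfs/.snap/snap1      # create a snapshot for the mirror daemon to sync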




[root@monnode1]# ceph --admin-daemon 
ceph-client.cephfs-mirror.monnode1.msygpf.7.94270975030728.asok fs mirror peer 
status fs-cluster@1 107f24c3-016f-467d-8ed0-87f1026445d5
{
    "/mnt/folderfs": {
        "state": "failed",
        "snaps_synced": 0,
        "snaps_deleted": 0,
        "snaps_renamed": 0
    }
}
[root@monnode1 ca59e8b0-3398-11ed-8fc2-525400c2ee0b]#

Best regards,
Ari.

From: Aristide Bekroundjo
Sent: Friday, September 16, 2022 10:53
To: ceph-users@ceph.io
Subject: [ceph-users] CephFS Mirroring failed



Hi,

I have two clusters (A, B) under Ceph 16.2.10 and am trying to implement CephFS
mirroring (from A to B).

It fails at the FS mirroring step. Are there any other logs that can help me
investigate the cause of the failure?

[root@monnode1]# ceph --admin-daemon 
ceph-client.cephfs-mirror.monnode1.msygpf.7.93868793542088.asok fs mirror peer 
status fs-cluster@1 107f24c3-016f-467d-8ed0-87f1026445d5
{
    "/mnt/pointfs": {
        "state": "failed",
        "snaps_synced": 0,
        "snaps_deleted": 0,
        "snaps_renamed": 0
    }
}
[root@monnode1]#


The source cluster can see the remote cluster.

[root@monnode1 ~]#  ceph fs snapshot mirror peer_list fs-cluster | jq
{
  "b4b020a5-a3f8-4713-8b33-787f0210dbec": {
"client_name": "client.mirror_remote",
"site_name": "site-remote",
"fs_name": "fs-cluster2",
"mon_host": "[v2:192.168.1.176:3300/0,v1:192.168.1.176:6789/0] 
[v2:192.168.1.225:3300/0,v1:192.168.1.225:6789/0] 
[v2:192.168.1.222:3300/0,v1:192.168.1.222:6789/0] 
[v2:192.168.1.48:3300/0,v1:192.168.1.48:6789/0]"
  }
}
[root@monnode1 ~]#

Best regards,



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd mirroring - journal growing and snapshot high io load

2022-09-16 Thread Arthur Outhenin-Chalandre
Hi Josef,

> On 16/09/2022 14:15 Josef Johansson  wrote:
> Are you guys affected by
> https://tracker.ceph.com/issues/57396 ?

The issue with journal mode for me was more that the journal replay was slow,
which made the journal also grow... You should probably inspect your
rbd-mirror logs (and if nothing interesting is in there, increase the debug
level and retry).
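For example, something along these lines (the daemon id is a placeholder; it
is usually client.rbd-mirror.<something> on the node running rbd-mirror):

  ceph config set client.rbd-mirror.a debug_rbd_mirror 20
  ceph config set client.rbd-mirror.a debug_rbd 15
  ceph config set client.rbd-mirror.a debug_journaler 15

and then watch the rbd-mirror log on that node while reproducing the issue.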

Cheers,

-- 
Arthur Outhenin-Chalandre
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd mirroring - journal growing and snapshot high io load

2022-09-16 Thread Josef Johansson
Hi,

I've added as much logging as I can; it still shows nothing.

On Fri, 16 Sep 2022 at 21:35, Arthur Outhenin-Chalandre <
arthur.outhenin-chalan...@cern.ch> wrote:

> Hi Josef,
>
> > On 16/09/2022 14:15 Josef Johansson  wrote:
> > Are you guys affected by
> > https://tracker.ceph.com/issues/57396 ?
>
> The issue with journal mode for me was more that the journal replay was
> slow, which made the journal also grow... You should probably inspect your
> rbd-mirror logs (and if nothing interesting is in there, increase the debug
> level and retry).
>
> Cheers,
>
> --
> Arthur Outhenin-Chalandre
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: weird performance issue on ceph

2022-09-16 Thread Mark Nelson

Hi Zoltan,


So kind of interesting results.  In the "good" write test the OSD 
doesn't actually seem to be working very hard.  If you look at the kv 
sync thread, it's mostly idle with only about 22% of the time in the 
thread spent doing real work:


1.
   | + 99.90% BlueStore::_kv_sync_thread()
2.
   | + 78.60% std::condition_variable::wait(std::unique_lock&)
3.
   | |+ 78.60% pthread_cond_wait
4.
   | + 18.00%
   
RocksDBStore::submit_transaction_sync(std::shared_ptr)

...but at least it's actually doing work!  For reference though, on our
high-performing setup with enough concurrency we can push things hard
enough that this thread isn't spending much time in pthread_cond_wait.
In the "bad" state, your example OSD here is basically doing nothing at
all (100% of the time in pthread_cond_wait!).  The tp_osd_tp and the kv
sync thread are just waiting around twiddling their thumbs:


1.
   Thread 339848 (bstore_kv_sync) - 1000 samples
2.
   + 100.00% clone
3.
   + 100.00% start_thread
4.
   + 100.00% BlueStore::KVSyncThread::entry()
5.
   + 100.00% BlueStore::_kv_sync_thread()
6.
   + 100.00% std::condition_variable::wait(std::unique_lock&)
7.
   + 100.00% pthread_cond_wait


My first thought is that you might have one or more OSDs that are
slowing the whole cluster down, so that clients are backing up on them and
other OSDs are just waiting around for IO.  It might be worth checking
the perf admin socket stats on each OSD to see if you can narrow down
whether any of them are having issues.
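For example, something along these lines per OSD (the id is a placeholder):

  ceph osd perf                                   # per-OSD commit/apply latencies at a glance
  ceph daemon osd.12 perf dump | jq .bluestore    # detailed counters over the admin socket
  ceph daemon osd.12 dump_historic_ops            # recent slow ops on that OSD, if any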



Thanks,

Mark


On 9/16/22 05:57, Zoltan Langi wrote:
Hey people and Mark, the cluster was left overnight doing nothing and
the problem, as expected, came back in the morning. We managed to
capture the bad states on the exact same OSDs on which we captured the good
states earlier:


Here is the output of a read test when the cluster is in a bad state 
on the same OSD which I recorded in the good state earlier:


https://pastebin.com/jp5JLWYK

Here is the output of a write test when the cluster is in a bad state 
on the same OSD which I recorded in the good state earlier:


The write speed came down from 30.1 GB/s to 17.9 GB/s.

https://pastebin.com/9e80L5XY

We are still open to any suggestions, so please feel free to comment
or make suggestions. :)


Thanks a lot,
Zoltan

On 15.09.22 16:53, Zoltan Langi wrote:
Hey people and Mark, we managed to capture the good and bad states 
separately:


Here is the output of a read test when the cluster is in a bad state:

https://pastebin.com/0HdNapLQ

Here is the output of a write test when the cluster is in a bad state:

https://pastebin.com/2T2pKu6Q

Here is the output of a read test when the cluster is in a brand new 
reinstalled state:


https://pastebin.com/qsKeX0D8

Here is the output of a write test when the cluster is in a brand new 
reinstalled state:


https://pastebin.com/nTCuEUAb

I hope someone can suggest something; any ideas are welcome! :)

Zoltan

On 13.09.22 14:27, Zoltan Langi wrote:

Hey Mark,

Sorry about the silence for a while, but a lot of things came up. We
finally managed to fix up the profiler and here is an output when
ceph is under heavy write load, in a pretty bad state, and its
throughput is not achieving more than 12.2 GB/s.


For a good state we have to recreate the whole thing, so we thought
we would start with the bad state; maybe something obvious is already
visible to someone who knows the osd internals well.


You find the file here: https://pastebin.com/0HdNapLQ

Thanks a lot in advance,

Zoltan

On 12.08.22 18:25, Mark Nelson wrote:


Hi Zoltan,


Sadly it looks like some of the debug symbols are messed up, which
makes things a little rough to debug from this.  On the write path,
if you look at the bstore_kv_sync thread:



Good state write test:

1.
   + 86.00% FileJournal::_open_file(long, long, bool)
2.
   |+ 86.00% ???
3.
   + 11.50% ???
4.
   |+ 0.20% ???

Bad state write test:

1.
   Thread 2869223 (bstore_kv_sync) - 1000 samples
2.
   + 73.70% FileJournal::_open_file(long, long, bool)
3.
   |+ 73.70% ???
4.
   + 24.90% ???

That's really strange, because FileJournal is part of filestore.
There also seems to be stuff in this trace regarding
BtrfsFileStoreBackend and FuseStore::Stop(). Seems like the debug
symbols are basically just wrong.  Is it possible that somehow you
ended up with debug symbols for the wrong version of ceph or
something?



Mark


On 8/12/22 11:13, Zoltan Langi wrote:

Hi Mark,

I managed to profile one osd before and after the bad state. We
have downgraded ceph to 14.2.22.


Good state with read test:

https://pastebin.com/etreYzQc

Good state with write test:

https://pastebin.com/qrN5MaY6

Bad state with read test:

https://pastebin.com/S1pRiJDq

Bad state with write test:

https://pastebin.com/dEv05eGV

Do you see anything obvious that could give us a clue what is