[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-10 Thread Janek Bevendorff
Thanks for the reply. This issue seems to be VERY serious. New objects 
are disappearing every day. This is a silent, creeping data loss.


I couldn't find the object with rados stat, but I am now listing all the 
objects and will grep the dump to see if there is anything left.


Janek

On 09/11/2020 23:31, Rafael Lopez wrote:

Hi Mariusz, all

We have seen this issue as well, on redhat ceph 4 (I have an 
unresolved case open). In our case, `radosgw-admin stat` is not a 
sufficient check to guarantee that there are rados objects. You have 
to do a `rados stat` to know that.


In your case, the object is ~48M in size and appears to also use S3 
multipart.
This means that, when uploaded, S3 will slice it up into parts based on 
whatever S3 multipart size you use (5M default, I think 8M here). After 
that, rados further slices any incoming (multipart-sized) objects into 
rados objects of 4 MiB (default).


The end result is you have a bunch of rados objects labelled with the 
'prefix' from the `radosgw-admin stat` you ran, as well as a head 
object (named the same as the S3 object you uploaded) that contains 
the metadata so rgw knows how to put the S3 object back together. In 
our case, the head object is there but the other rados pieces that 
hold the actual data seem to be gone, so `radosgw-admin stat` returns 
fine, but we get NoSuchKey when trying to download.


Try `rados -p {rgw buckets pool} stat 
255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4`, 
it will show you the rados stat of the head object, which will be much 
smaller than the S3 object.


To check if you actually have all rados objects for this 48M S3 
object, try searching for parts of the prefix or the whole prefix on a 
list of all rados objects in buckets pool.
FYI, the `rados ls` will list every rados object in the pool, so the output 
may be very large and take a long time if you have many objects.


rados -p {rgw buckets pool} ls > {tmpfile}
grep '2~NTy88SkDkXR9ifSrrRcw5WPDxqN3PO2' {tmpfile}
grep 'juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4' {tmpfile}

The first grep is actually the S3 multipart ID string added to the 
prefix by rgw.


Rafael

On Tue, 10 Nov 2020 at 01:04, Janek Bevendorff 
> wrote:


We are having the exact same problem (also Octopus). The object is
listed by s3cmd, but trying to download it results in a 404 error.
radosgw-admin object stat shows that the object still exists. Any
further ideas how I can restore access to this object?

(Sorry if this is a duplicate, but it seems like the mailing list
hasn't
accepted my original mail).


> Mariusz Gronczewski wrote:
>
>
>> On 2020-07-27, at 21:31:33,
>> "Robin H. Johnson" <robb...@gentoo.org> wrote:
>>
>>
>>>
On Mon, Jul 27, 2020 at 08:02:23PM +0200, Mariusz Gronczewski wrote:
>>>
 Hi,
 I've got a problem on Octopus (15.2.3, debian packages) install,
 bucket S3 index shows a file:
 s3cmd ls s3://upvid/255/38355 --recursive
 2020-07-27 17:48  50584342


s3://upvid/255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4
 radosgw-admin bi list also shows it
 {
 "type": "plain",
 "idx":

"255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4",
 "entry": { "name":

"255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4",
 "instance": "", "ver": {
 "pool": 11,
 "epoch": 853842
 },
 "locator": "",
 "exists": "true",
 "meta": {
 "category": 1,
 "size": 50584342,
 "mtime": "2020-07-27T17:48:27.203008Z",
 "etag": "2b31cc8ce8b1fb92a5f65034f2d12581-7",
 "storage_class": "",
 "owner": "filmweb-app",
 "owner_display_name": "filmweb app user",
 "content_type": "",
 "accounted_size": 50584342,
 "user_data": "",
 "appendable": "false"
 },
 "tag": "_3ubjaztglHXfZr05wZCFCPzebQf-ZFP",
 "flags": 0,
 "pending_map": [],
 "versioned_epoch": 0
 }
 },

but trying to download it via curl (I've set permissions to public)
only gets me NoSuchKey
>>> Does the RADOS object for this still exist?
>>>
>>> try:
>>> radosgw-admin object stat --bucket ... --object
>>>
'255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_m

[ceph-users] Re: cephfs - blacklisted client coming back?

2020-11-10 Thread Dan van der Ster
Hi Andras,

I don't have much experience with blacklisting to know what is a safe default.
For our clusters we use the auto-reconnect settings and never
blacklist any clients.

Cheers, Dan

On Tue, Nov 10, 2020 at 2:10 AM Andras Pataki
 wrote:
>
> Hi Dan,
>
> That makes sense - the time between blacklist and magic comeback was
> around 1 hour - thanks for the explanation.  Is this a safe default?
> At eviction, the MDS takes all caps from the client away, so if it comes
> back in an hour, doesn't it then  write to files that it perhaps
> shouldn't have access to?
>
> There is the other strange thing ceph-fuse was doing for an hour
> (increased the objecter log level to 20).
>
> Here is the eviction:
> 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680 I was
> blacklisted at osd epoch 1717894
> 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> remove_session_caps still has dirty|flushing caps on
> 0x100673a2613.head(faked_ino=0 ref=5 ll_ref=1
> cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> size=106/4194304 nlink=1 btime=0.00 mtime=2020-11-09 14:34:28.139916
> ctime=2020-11-09 14:34:28.139916 caps=- dirty_caps=Fw
> objectset[0x100673a2613 ts 0/0 objects 1 dirty_or_tx 0]
> parents=0x10067375a7c.head["pwaf-00680.ene"] 0x7fffd034b4d0)
> 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> remove_session_caps still has dirty|flushing caps on
> 0x100673a2614.head(faked_ino=0 ref=5 ll_ref=1
> cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> size=106/4194304 nlink=1 btime=0.00 mtime=2020-11-09 14:34:28.145199
> ctime=2020-11-09 14:34:28.145199 caps=- dirty_caps=Fw
> objectset[0x100673a2614 ts 0/0 objects 1 dirty_or_tx 0]
> parents=0x10067375a7c.head["pwaf-00685.ene"] 0x7fffd034bc20)
> 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> remove_session_caps still has dirty|flushing caps on
> 0x100673a2615.head(faked_ino=0 ref=5 ll_ref=1
> cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> size=106/4194304 nlink=1 btime=0.00 mtime=2020-11-09 14:34:28.150306
> ctime=2020-11-09 14:34:28.150306 caps=- dirty_caps=Fw
> objectset[0x100673a2615 ts 0/0 objects 1 dirty_or_tx 0]
> parents=0x10067375a7c.head["pwaf-00682.ene"] 0x7fffd034c1d0)
> ... and a lot more of these ...
>
> then the following types of messages repeat:
>
> 2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered
> bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 131065 dirty
> firstbyte=32] waiters = {}
> 2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered
> bh[ 0x7ffe6c405f80 2051562~328804 0x7fffac4d0460 (328804) v 131065 dirty
> firstbyte=-42] waiters = {}
> 2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered
> bh[ 0x7fff9b14d850 2380366~4 0x7fffac4d0460 (4) v 131065 dirty
> firstbyte=32] waiters = {}
> 2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered
> bh[ 0x7fff9bc966f0 2380370~8176 0x7fffac4d0460 (8176) v 131065 dirty
> firstbyte=96] waiters = {}
> ... about 200 or so of these ...
>
> followed by
>
> 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7fff60a0ae40 2026998~4
> 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {} r = -108
> (108) Cannot send after transport endpoint shutdown
> 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7ffe6c405f80 2051562~328804
> 0x7fffac4d0460 (328804) v 131183 dirty firstbyte=-42] waiters = {} r =
> -108 (108) Cannot send after transport endpoint shutdown
> 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7fff9b14d850 2380366~4
> 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {} r = -108
> (108) Cannot send after transport endpoint shutdown
> 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7fff9bc966f0 2380370~8176
> 0x7fffac4d0460 (8176) v 131183 dirty firstbyte=96] waiters = {} r = -108
> (108) Cannot send after transport endpoint shutdown
> ... about 200 or so of these ...
>
> then again:
>
> 2020-11-09 16:51:11.260 7fffdaffd700  7 objectcacher bh_write_scattered
> bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 131183 dirty
> firstbyte=32] waiters = {}
> 2020-11-09 16:51:11.260 7fffdaffd700  7 objectcacher bh_write_scattered
> bh[ 0x7ffe6c405f80 2051562~328804 0x7fffac4d0460 (328804) v 131183 dirty
> firstbyte=-42] waiters = {}
> 2020-11-09 16:51:11.260 7fffdaffd700  7 objectcacher bh_write_scattered
> bh[ 0x7fff9b14d850 2380366~4 0x7fffac4d0460 (4) v 131183 dirty
> firstbyte=32] waiters = {}
> 2020-11-09 16:51:11.260 7fffdaffd700  7 objectcacher bh_write_scattered
> bh[ 0x7fff9bc966f0 2380370~8176 0x7fffac4d0460 (8176) v 131183 dirty
> firstbyte=96] waiters = {}
>
> rejected again:
>
> 2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 

[ceph-users] Re: Cephfs Kernel client not working properly without ceph cluster IP

2020-11-10 Thread Amudhan P
Hi Eugen,

Yes, you're right: other than the OSDs, nothing else requires the cluster IP.

But in my case, I don't know what went wrong; my kernel client requires the
cluster IP for the mount to work properly.

About my setup:
The cluster was initially bootstrapped with the public IP only; the cluster
IP was added later with the steps below.

### adding cluster network for ceph cluster ###
ceph config set global cluster_network 10.100.4.0/24

ceph orch daemon reconfig mon.host1
ceph orch daemon reconfig mon.host2
ceph orch daemon reconfig mon.host3
ceph orch daemon reconfig osd.1
ceph orch daemon reconfig osd.2
ceph orch daemon reconfig osd.3

restarting all daemons.
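
For reference, I can double-check what each daemon actually picked up with
something like this (daemon names are just examples from my setup):

ceph config get osd.1 public_network
ceph config get osd.1 cluster_network
ceph config dump | grep -E 'public_network|cluster_network'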

regards
Amudhan P

On Mon, Nov 9, 2020 at 9:49 PM Eugen Block  wrote:

> Clients don't need the cluster IP because that's only for OSD <--> OSD
> replication, no client traffic. But of course to be able to
> communicate with Ceph the clients need a public IP, how else would
> they contact the MON? Or did I misunderstand your setup?
>
>
> Zitat von Amudhan P :
>
> > Hi,
> >
> > I have mounted my cephfs (ceph octopus) thru kernel client in Debian.
> > I get following error in "dmesg" when I try to read any file from my
> mount.
> > "[  236.429897] libceph: osd1 10.100.4.1:6891 socket closed (con state
> > CONNECTING)"
> >
> > I use public IP (10.100.3.1) and cluster IP (10.100.4.1) in my ceph
> > cluster. I think public IP is enough to mount the share and work on it
> but
> > in my case, it needs me to assign public IP also to the client to work
> > properly.
> >
> > Does anyone have experience in this?
> >
> > I have earlier also mailed the ceph-user group but I didn't get any
> > response. So sending again not sure my mail went through.
> >
> > regards
> > Amudhan
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs - blacklisted client coming back?

2020-11-10 Thread Dan van der Ster
On Tue, Nov 10, 2020 at 10:59 AM Frank Schilder  wrote:
>
> Hi Dan.
>
> > For our clusters we use the auto-reconnect settings
>
> Could you give me a hint what settings these are? Are they available in mimic?

Yes. On the mds you need:
mds session blacklist on timeout = false
mds session blacklist on evict = false

And on the fuse client you need:
   client reconnect stale = true

And kernels reconnect by default.
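
(If you use the centralized config database, available since Mimic, a rough
sketch of setting the same options there would be, using the underscore
option names:)

ceph config set mds mds_session_blacklist_on_timeout false
ceph config set mds mds_session_blacklist_on_evict false
ceph config set client client_reconnect_stale true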

(There might be some consistency sacrificed by this config, but tbh we
never had an issue in a few years).

Cheers, Dan

>
> Thanks!
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Dan van der Ster 
> Sent: 10 November 2020 10:47:11
> To: Andras Pataki
> Cc: ceph-users
> Subject: [ceph-users] Re: cephfs - blacklisted client coming back?
>
> Hi Andras,
>
> I don't have much experience with blacklisting to know what is a safe default.
> For our clusters we use the auto-reconnect settings and never
> blacklist any clients.
>
> Cheers, Dan
>
> On Tue, Nov 10, 2020 at 2:10 AM Andras Pataki
>  wrote:
> >
> > Hi Dan,
> >
> > That makes sense - the time between blacklist and magic comeback was
> > around 1 hour - thanks for the explanation.  Is this a safe default?
> > At eviction, the MDS takes all caps from the client away, so if it comes
> > back in an hour, doesn't it then  write to files that it perhaps
> > shouldn't have access to?
> >
> > There is the other strange thing ceph-fuse was doing for an hour
> > (increased the objecter log level to 20).
> >
> > Here is the eviction:
> > 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680 I was
> > blacklisted at osd epoch 1717894
> > 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> > remove_session_caps still has dirty|flushing caps on
> > 0x100673a2613.head(faked_ino=0 ref=5 ll_ref=1
> > cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> > size=106/4194304 nlink=1 btime=0.00 mtime=2020-11-09 14:34:28.139916
> > ctime=2020-11-09 14:34:28.139916 caps=- dirty_caps=Fw
> > objectset[0x100673a2613 ts 0/0 objects 1 dirty_or_tx 0]
> > parents=0x10067375a7c.head["pwaf-00680.ene"] 0x7fffd034b4d0)
> > 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> > remove_session_caps still has dirty|flushing caps on
> > 0x100673a2614.head(faked_ino=0 ref=5 ll_ref=1
> > cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> > size=106/4194304 nlink=1 btime=0.00 mtime=2020-11-09 14:34:28.145199
> > ctime=2020-11-09 14:34:28.145199 caps=- dirty_caps=Fw
> > objectset[0x100673a2614 ts 0/0 objects 1 dirty_or_tx 0]
> > parents=0x10067375a7c.head["pwaf-00685.ene"] 0x7fffd034bc20)
> > 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> > remove_session_caps still has dirty|flushing caps on
> > 0x100673a2615.head(faked_ino=0 ref=5 ll_ref=1
> > cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> > size=106/4194304 nlink=1 btime=0.00 mtime=2020-11-09 14:34:28.150306
> > ctime=2020-11-09 14:34:28.150306 caps=- dirty_caps=Fw
> > objectset[0x100673a2615 ts 0/0 objects 1 dirty_or_tx 0]
> > parents=0x10067375a7c.head["pwaf-00682.ene"] 0x7fffd034c1d0)
> > ... and a lot more of these ...
> >
> > then the following types of messages repeat:
> >
> > 2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered
> > bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 131065 dirty
> > firstbyte=32] waiters = {}
> > 2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered
> > bh[ 0x7ffe6c405f80 2051562~328804 0x7fffac4d0460 (328804) v 131065 dirty
> > firstbyte=-42] waiters = {}
> > 2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered
> > bh[ 0x7fff9b14d850 2380366~4 0x7fffac4d0460 (4) v 131065 dirty
> > firstbyte=32] waiters = {}
> > 2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered
> > bh[ 0x7fff9bc966f0 2380370~8176 0x7fffac4d0460 (8176) v 131065 dirty
> > firstbyte=96] waiters = {}
> > ... about 200 or so of these ...
> >
> > followed by
> >
> > 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> > marking dirty again due to error bh[ 0x7fff60a0ae40 2026998~4
> > 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {} r = -108
> > (108) Cannot send after transport endpoint shutdown
> > 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> > marking dirty again due to error bh[ 0x7ffe6c405f80 2051562~328804
> > 0x7fffac4d0460 (328804) v 131183 dirty firstbyte=-42] waiters = {} r =
> > -108 (108) Cannot send after transport endpoint shutdown
> > 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> > marking dirty again due to error bh[ 0x7fff9b14d850 2380366~4
> > 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {} r = -108
> > (108) Cannot send after transport endpoint shutdown
> > 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> > marking dirty again due to error bh[ 0x7fff9bc966f0 23

[ceph-users] Re: Cephfs Kernel client not working properly without ceph cluster IP

2020-11-10 Thread Amudhan P
Hi Nathan,

The kernel client should only need the cluster's public IPs to
communicate with the OSDs.

But here it requires both IPs for the mount to work properly.

regards
Amudhan



On Mon, Nov 9, 2020 at 9:51 PM Nathan Fish  wrote:

> It sounds like your client is able to reach the mon but not the OSD?
> It needs to be able to reach all mons and all OSDs.
>
> On Sun, Nov 8, 2020 at 4:29 AM Amudhan P  wrote:
> >
> > Hi,
> >
> > I have mounted my cephfs (ceph octopus) thru kernel client in Debian.
> > I get following error in "dmesg" when I try to read any file from my
> mount.
> > "[  236.429897] libceph: osd1 10.100.4.1:6891 socket closed (con state
> > CONNECTING)"
> >
> > I use public IP (10.100.3.1) and cluster IP (10.100.4.1) in my ceph
> > cluster. I think public IP is enough to mount the share and work on it
> but
> > in my case, it needs me to assign public IP also to the client to work
> > properly.
> >
> > Does anyone have experience in this?
> >
> > I have earlier also mailed the ceph-user group but I didn't get any
> > response. So sending again not sure my mail went through.
> >
> > regards
> > Amudhan
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-10 Thread Janek Bevendorff
I found some of the data in the rados ls dump. We host some WARCs from 
the Internet Archive and one affected WARC still has its warc.os.cdx.gz 
file intact, while the actual warc.gz is gone.


A rados stat revealed

WIDE-20110903143858-01166.warc.os.cdx.gz mtime 
2019-07-14T17:48:39.00+0200, size 1060428


for the cdx.gz file, but

WIDE-20110903143858-01166.warc.gz mtime 2019-07-14T17:04:49.00+0200, 
size 0


for the warc.gz.

I couldn't find any of the suffixed multipart objects listed in 
radosgw-admin stat.


WIDE-20110903143858-01166.warc.gz.2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme.19: 
(2) No such file or directory



On 10/11/2020 10:14, Janek Bevendorff wrote:
Thanks for the reply. This issue seems to be VERY serious. New objects 
are disappearing every day. This is a silent, creeping data loss.


I couldn't find the object with rados stat, but I am now listing all 
the objects and will grep the dump to see if there is anything left.


Janek

On 09/11/2020 23:31, Rafael Lopez wrote:

Hi Mariusz, all

We have seen this issue as well, on redhat ceph 4 (I have an 
unresolved case open). In our case, `radosgw-admin stat` is not a 
sufficient check to guarantee that there are rados objects. You have 
to do a `rados stat` to know that.


In your case, the object is ~48M in size and appears to also use S3 
multipart.
This means that, when uploaded, S3 will slice it up into parts based on 
whatever S3 multipart size you use (5M default, I think 8M here). After 
that, rados further slices any incoming (multipart-sized) objects into 
rados objects of 4 MiB (default).


The end result is you have a bunch of rados objects labelled with the 
'prefix' from the `radosgw-admin stat` you ran, as well as a head 
object (named the same as the S3 object you uploaded) that contains 
the metadata so rgw knows how to put the S3 object back together. In 
our case, the head object is there but the other rados pieces that 
hold the actual data seem to be gone, so `radosgw-admin stat` returns 
fine, but we get NoSuchKey when trying to download.


Try `rados -p {rgw buckets pool} stat 
255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4`, 
it will show you the rados stat of the head object, which will be 
much smaller than the S3 object.


To check if you actually have all rados objects for this 48M S3 
object, try searching for parts of the prefix or the whole prefix on 
a list of all rados objects in buckets pool.
FYI, the `rados ls` will list every rados object in the pool, so the output 
may be very large and take a long time if you have many objects.


rados -p {rgw buckets pool} ls > {tmpfile}
grep '2~NTy88SkDkXR9ifSrrRcw5WPDxqN3PO2' {tmpfile}
grep 'juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4' 
{tmpfile}


The first grep is actually the S3 multipart ID string added to the 
prefix by rgw.


Rafael

On Tue, 10 Nov 2020 at 01:04, Janek Bevendorff 
> wrote:


    We are having the exact same problem (also Octopus). The object is
    listed by s3cmd, but trying to download it results in a 404 error.
    radosgw-admin object stat shows that the object still exists. Any
    further ideas how I can restore access to this object?

    (Sorry if this is a duplicate, but it seems like the mailing list
    hasn't
    accepted my original mail).


    > Mariusz Gronczewski wrote:
    >
    >
    >> On 2020-07-27, at 21:31:33,
    >> "Robin H. Johnson" <robb...@gentoo.org> wrote:
    >>
    >>
    >>>
On Mon, Jul 27, 2020 at 08:02:23PM +0200, Mariusz Gronczewski wrote:
    >>>
     Hi,
     
I've got a problem on Octopus (15.2.3, debian packages) install,

     bucket S3 index shows a file:
     s3cmd ls s3://upvid/255/38355 --recursive
     2020-07-27 17:48  50584342
    

s3://upvid/255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4
     radosgw-admin bi list also shows it
     {
     "type": "plain",
     "idx":
    
"255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4",
     "entry": { "name":
    
"255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4",
     "instance": "", "ver": {
     "pool": 11,
     "epoch": 853842
     },
     "locator": "",
     "exists": "true",
     "meta": {
     "category": 1,
     "size": 50584342,
     "mtime": "2020-07-27T17:48:27.203008Z",
     "etag": "2b31cc8ce8b1fb92a5f65034f2d12581-7",
     "storage_class": "",
     "owner": "filmweb-app",
     "owner_display_name": "filmweb app user",
     "content_type": "",
     "acco

[ceph-users] Re: Cephfs Kernel client not working properly without ceph cluster IP

2020-11-10 Thread Janne Johansson
Den tis 10 nov. 2020 kl 11:13 skrev Amudhan P :

> Hi Nathan,
>
> Kernel client should be using only the public IP of the cluster to
> communicate with OSD's.
>

"ip of the cluster" is a bit weird way to state it.

A mounting client needs only to talk to ips in the public range yes, but
OSDs alwaysneed to have an ip in the public range too.
The private range is only for OSD<->OSD traffic and can be in the private
network, meaning an OSD which uses both private and public ranges needs two
ips, one in each range.
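
A minimal sketch of how that usually looks in ceph.conf, using the subnets
from this thread as examples:

[global]
    public_network  = 10.100.3.0/24   # mons, OSDs and clients must all be reachable here
    cluster_network = 10.100.4.0/24   # optional, OSD<->OSD replication traffic only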



> But here it requires both IP's for mount to work properly.
>
> regards
> Amudhan
>
>
>
> On Mon, Nov 9, 2020 at 9:51 PM Nathan Fish  wrote:
>
> > It sounds like your client is able to reach the mon but not the OSD?
> > It needs to be able to reach all mons and all OSDs.
> >
> > On Sun, Nov 8, 2020 at 4:29 AM Amudhan P  wrote:
> > >
> > > Hi,
> > >
> > > I have mounted my cephfs (ceph octopus) thru kernel client in Debian.
> > > I get following error in "dmesg" when I try to read any file from my
> > mount.
> > > "[  236.429897] libceph: osd1 10.100.4.1:6891 socket closed (con state
> > > CONNECTING)"
> > >
> > > I use public IP (10.100.3.1) and cluster IP (10.100.4.1) in my ceph
> > > cluster. I think public IP is enough to mount the share and work on it
> > but
> > > in my case, it needs me to assign public IP also to the client to work
> > > properly.
> > >
> > > Does anyone have experience in this?
> > >
> > > I have earlier also mailed the ceph-user group but I didn't get any
> > > response. So sending again not sure my mail went through.
> > >
> > > regards
> > > Amudhan
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Dovecot and fnctl locks

2020-11-10 Thread Eugen Block

Hi Dan,

one of our customers reported practically the same issue with fcntl
locks but no negative PIDs:


0807178332093/mailboxes/Spam/rbox-Mails/dovecot.index.log (WRITE  
lock held by pid 25164)
0807178336211/mailboxes/INBOX/rbox-Mails/dovecot.index.log (WRITE  
lock held by pid 8143)


These errors occurred during failure tests where the underlying MDS
servers were shut off. Restarting dovecot was enough to get rid of the
errors. The mounted dovecot directories are pinned to specific MDS
daemons; the environment is not in production, though.
Since we saw these for the first time and the root cause was a
disaster scenario, we didn't really take the time to investigate, so I
can't really share anything, just confirm it (for now); maybe this
topic comes up again.


Regards,
Eugen



Zitat von Dan van der Ster :


Hi,

Yeah the negative pid is interesting. AFAICT we use a negative pid to
indicate that the lock was taken on another host:

https://github.com/torvalds/linux/blob/master/fs/ceph/locks.c#L119
https://github.com/torvalds/linux/commit/9d5b86ac13c573795525ecac6ed2db39ab23e2a8

"Finally, we convert remote filesystems to present remote pids using
negative numbers. Have lustre, 9p, ceph, cifs, and dlm negate the remote
pid returned for F_GETLK lock requests."

The good news is that my colleagues managed to clear this filelock by
restarting dovecot on a couple nodes.
But I'm still curious if others have a nice way to debug such things.

Cheers, Dan


On Mon, Nov 9, 2020 at 8:11 PM Anthony D'Atri  
 wrote:


Looks like a - in front of the 9605 — signed/unsigned int flern?

> On Nov 9, 2020, at 4:59 AM, Dan van der Ster  wrote:
>
> Hi all,
>
> MDS version v14.2.11
> Client kernel 3.10.0-1127.19.1.el7.x86_64
>
> We are seeing a strange issue with a dovecot use-case on cephfs.
> Occasionally we have dovecot reporting a file locked, such as:
>
> Nov 09 13:55:00 dovecot-backend-00.cern.ch dovecot[27710]:
> imap(reguero)<23945>: Error: Mailbox Deleted Items:
> Timeout (180s) while waiting for lock for transaction log file
> /mail/users/r/reguero//mdbox/mailboxes/Deleted
> Items/dbox-Mails/dovecot.index.log (WRITE lock held by pid -9605)
>
> We checked all hosts that have mounted the cephfs -- there is no pid 9605.
>
> Is there any way to see who exactly created the lock? ceph_filelock
> has a client id, but I didn't find a way to inspect the
> cephfs_metadata to see the ceph_filelock directly.
>
> Otherwise, are other Dovecot/CephFS users seeing this? Did you switch
> to flock or lockfile instead of fcntl locks?
>
> Thanks!
>
> Dan
>
> P.S. here is the output from print locks tool from the kernel client:
>
> Read lock:
>  Type: 1 (0: Read, 1: Write, 2: Unlocked)
>  Whence: 0 (0: start, 1: current, 2: end)
>  Offset: 0
>  Len: 1
>  Pid: -9605
> Write lock:
>  Type: 1 (0: Read, 1: Write, 2: Unlocked)
>  Whence: 0 (0: start, 1: current, 2: end)
>  Offset: 0
>  Len: 1
>  Pid: -9605
>
> and same file from a 15.2.5 fuse client :
>
> Read lock:
>  Type: 1 (0: Read, 1: Write, 2: Unlocked)
>  Whence: 0 (0: start, 1: current, 2: end)
>  Offset: 0
>  Len: 0
>  Pid: 0
> Write lock:
>  Type: 1 (0: Read, 1: Write, 2: Unlocked)
>  Whence: 0 (0: start, 1: current, 2: end)
>  Offset: 0
>  Len: 0
>  Pid: 0
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs - blacklisted client coming back?

2020-11-10 Thread Frank Schilder
Hi Dan.

> For our clusters we use the auto-reconnect settings

Could you give me a hint what settings these are? Are they available in mimic?

Thanks!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Dan van der Ster 
Sent: 10 November 2020 10:47:11
To: Andras Pataki
Cc: ceph-users
Subject: [ceph-users] Re: cephfs - blacklisted client coming back?

Hi Andras,

I don't have much experience with blacklisting to know what is a safe default.
For our clusters we use the auto-reconnect settings and never
blacklist any clients.

Cheers, Dan

On Tue, Nov 10, 2020 at 2:10 AM Andras Pataki
 wrote:
>
> Hi Dan,
>
> That makes sense - the time between blacklist and magic comeback was
> around 1 hour - thanks for the explanation.  Is this a safe default?
> At eviction, the MDS takes all caps from the client away, so if it comes
> back in an hour, doesn't it then  write to files that it perhaps
> shouldn't have access to?
>
> There is the other strange thing ceph-fuse was doing for an hour
> (increased the objecter log level to 20).
>
> Here is the eviction:
> 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680 I was
> blacklisted at osd epoch 1717894
> 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> remove_session_caps still has dirty|flushing caps on
> 0x100673a2613.head(faked_ino=0 ref=5 ll_ref=1
> cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> size=106/4194304 nlink=1 btime=0.00 mtime=2020-11-09 14:34:28.139916
> ctime=2020-11-09 14:34:28.139916 caps=- dirty_caps=Fw
> objectset[0x100673a2613 ts 0/0 objects 1 dirty_or_tx 0]
> parents=0x10067375a7c.head["pwaf-00680.ene"] 0x7fffd034b4d0)
> 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> remove_session_caps still has dirty|flushing caps on
> 0x100673a2614.head(faked_ino=0 ref=5 ll_ref=1
> cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> size=106/4194304 nlink=1 btime=0.00 mtime=2020-11-09 14:34:28.145199
> ctime=2020-11-09 14:34:28.145199 caps=- dirty_caps=Fw
> objectset[0x100673a2614 ts 0/0 objects 1 dirty_or_tx 0]
> parents=0x10067375a7c.head["pwaf-00685.ene"] 0x7fffd034bc20)
> 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> remove_session_caps still has dirty|flushing caps on
> 0x100673a2615.head(faked_ino=0 ref=5 ll_ref=1
> cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> size=106/4194304 nlink=1 btime=0.00 mtime=2020-11-09 14:34:28.150306
> ctime=2020-11-09 14:34:28.150306 caps=- dirty_caps=Fw
> objectset[0x100673a2615 ts 0/0 objects 1 dirty_or_tx 0]
> parents=0x10067375a7c.head["pwaf-00682.ene"] 0x7fffd034c1d0)
> ... and a lot more of these ...
>
> then the following types of messages repeat:
>
> 2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered
> bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 131065 dirty
> firstbyte=32] waiters = {}
> 2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered
> bh[ 0x7ffe6c405f80 2051562~328804 0x7fffac4d0460 (328804) v 131065 dirty
> firstbyte=-42] waiters = {}
> 2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered
> bh[ 0x7fff9b14d850 2380366~4 0x7fffac4d0460 (4) v 131065 dirty
> firstbyte=32] waiters = {}
> 2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered
> bh[ 0x7fff9bc966f0 2380370~8176 0x7fffac4d0460 (8176) v 131065 dirty
> firstbyte=96] waiters = {}
> ... about 200 or so of these ...
>
> followed by
>
> 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7fff60a0ae40 2026998~4
> 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {} r = -108
> (108) Cannot send after transport endpoint shutdown
> 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7ffe6c405f80 2051562~328804
> 0x7fffac4d0460 (328804) v 131183 dirty firstbyte=-42] waiters = {} r =
> -108 (108) Cannot send after transport endpoint shutdown
> 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7fff9b14d850 2380366~4
> 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {} r = -108
> (108) Cannot send after transport endpoint shutdown
> 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7fff9bc966f0 2380370~8176
> 0x7fffac4d0460 (8176) v 131183 dirty firstbyte=96] waiters = {} r = -108
> (108) Cannot send after transport endpoint shutdown
> ... about 200 or so of these ...
>
> then again:
>
> 2020-11-09 16:51:11.260 7fffdaffd700  7 objectcacher bh_write_scattered
> bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 131183 dirty
> firstbyte=32] waiters = {}
> 2020-11-09 16:51:11.260 7fffdaffd700  7 objectcacher bh_write_scattered
> bh[ 0x7ffe6c405f80 2051562~328804 0x7fffac4d0460 (328804) v 131183 dirty
> firstbyte=-42] waiters = {}
> 2020-11-09 16:51:11.260 7fffdaffd700  7 objectcache

[ceph-users] Re: cephfs - blacklisted client coming back?

2020-11-10 Thread Frank Schilder
Super, thanks! Yeah, I read that an unclean reconnect might lead to data loss 
and a proper mount/unmount is better. So far, any evicted client was rebooting, 
so the reconnect works fine for us with blacklisting. Good to know the 
alternative though.

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Dan van der Ster 
Sent: 10 November 2020 11:04:07
To: Frank Schilder
Cc: Andras Pataki; ceph-users
Subject: Re: [ceph-users] Re: cephfs - blacklisted client coming back?

On Tue, Nov 10, 2020 at 10:59 AM Frank Schilder  wrote:
>
> Hi Dan.
>
> > For our clusters we use the auto-reconnect settings
>
> Could you give me a hint what settings these are? Are they available in mimic?

Yes. On the mds you need:
mds session blacklist on timeout = false
mds session blacklist on evict = false

And on the fuse client you need:
   client reconnect stale = true

And kernels reconnect by default.

(There might be some consistency sacrificed by this config, but tbh we
never had an issue in a few years).

Cheers, Dan

>
> Thanks!
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Dan van der Ster 
> Sent: 10 November 2020 10:47:11
> To: Andras Pataki
> Cc: ceph-users
> Subject: [ceph-users] Re: cephfs - blacklisted client coming back?
>
> Hi Andras,
>
> I don't have much experience with blacklisting to know what is a safe default.
> For our clusters we use the auto-reconnect settings and never
> blacklist any clients.
>
> Cheers, Dan
>
> On Tue, Nov 10, 2020 at 2:10 AM Andras Pataki
>  wrote:
> >
> > Hi Dan,
> >
> > That makes sense - the time between blacklist and magic comeback was
> > around 1 hour - thanks for the explanation.  Is this a safe default?
> > At eviction, the MDS takes all caps from the client away, so if it comes
> > back in an hour, doesn't it then  write to files that it perhaps
> > shouldn't have access to?
> >
> > There is the other strange thing ceph-fuse was doing for an hour
> > (increased the objecter log level to 20).
> >
> > Here is the eviction:
> > 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680 I was
> > blacklisted at osd epoch 1717894
> > 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> > remove_session_caps still has dirty|flushing caps on
> > 0x100673a2613.head(faked_ino=0 ref=5 ll_ref=1
> > cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> > size=106/4194304 nlink=1 btime=0.00 mtime=2020-11-09 14:34:28.139916
> > ctime=2020-11-09 14:34:28.139916 caps=- dirty_caps=Fw
> > objectset[0x100673a2613 ts 0/0 objects 1 dirty_or_tx 0]
> > parents=0x10067375a7c.head["pwaf-00680.ene"] 0x7fffd034b4d0)
> > 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> > remove_session_caps still has dirty|flushing caps on
> > 0x100673a2614.head(faked_ino=0 ref=5 ll_ref=1
> > cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> > size=106/4194304 nlink=1 btime=0.00 mtime=2020-11-09 14:34:28.145199
> > ctime=2020-11-09 14:34:28.145199 caps=- dirty_caps=Fw
> > objectset[0x100673a2614 ts 0/0 objects 1 dirty_or_tx 0]
> > parents=0x10067375a7c.head["pwaf-00685.ene"] 0x7fffd034bc20)
> > 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> > remove_session_caps still has dirty|flushing caps on
> > 0x100673a2615.head(faked_ino=0 ref=5 ll_ref=1
> > cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> > size=106/4194304 nlink=1 btime=0.00 mtime=2020-11-09 14:34:28.150306
> > ctime=2020-11-09 14:34:28.150306 caps=- dirty_caps=Fw
> > objectset[0x100673a2615 ts 0/0 objects 1 dirty_or_tx 0]
> > parents=0x10067375a7c.head["pwaf-00682.ene"] 0x7fffd034c1d0)
> > ... and a lot more of these ...
> >
> > then the following types of messages repeat:
> >
> > 2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered
> > bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 131065 dirty
> > firstbyte=32] waiters = {}
> > 2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered
> > bh[ 0x7ffe6c405f80 2051562~328804 0x7fffac4d0460 (328804) v 131065 dirty
> > firstbyte=-42] waiters = {}
> > 2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered
> > bh[ 0x7fff9b14d850 2380366~4 0x7fffac4d0460 (4) v 131065 dirty
> > firstbyte=32] waiters = {}
> > 2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered
> > bh[ 0x7fff9bc966f0 2380370~8176 0x7fffac4d0460 (8176) v 131065 dirty
> > firstbyte=96] waiters = {}
> > ... about 200 or so of these ...
> >
> > followed by
> >
> > 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> > marking dirty again due to error bh[ 0x7fff60a0ae40 2026998~4
> > 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {} r = -108
> > (108) Cannot send after transport endpoint shutdown
> > 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> > marking dirty again due to error bh[

[ceph-users] Re: Cephfs Kernel client not working properly without ceph cluster IP

2020-11-10 Thread Eugen Block
Could it be that you have some misconfiguration in the name
resolution and IP mapping? I've never heard of or experienced a
client requiring a cluster address; that would make the whole concept
of separate networks obsolete, which is hard to believe, to be honest.
I would recommend double-checking your setup.
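
For example, something along these lines might show where the mismatch is
(the osd id and addresses are just the ones from this thread):

ceph osd dump | grep '^osd.1 '   # the public/cluster addresses registered for osd.1
ip route get 10.100.3.1          # run on the client: route used to reach the public IP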



Zitat von Amudhan P :


Hi Nathan,

Kernel client should be using only the public IP of the cluster to
communicate with OSD's.

But here it requires both IP's for mount to work properly.

regards
Amudhan



On Mon, Nov 9, 2020 at 9:51 PM Nathan Fish  wrote:


It sounds like your client is able to reach the mon but not the OSD?
It needs to be able to reach all mons and all OSDs.

On Sun, Nov 8, 2020 at 4:29 AM Amudhan P  wrote:
>
> Hi,
>
> I have mounted my cephfs (ceph octopus) thru kernel client in Debian.
> I get following error in "dmesg" when I try to read any file from my
mount.
> "[  236.429897] libceph: osd1 10.100.4.1:6891 socket closed (con state
> CONNECTING)"
>
> I use public IP (10.100.3.1) and cluster IP (10.100.4.1) in my ceph
> cluster. I think public IP is enough to mount the share and work on it
but
> in my case, it needs me to assign public IP also to the client to work
> properly.
>
> Does anyone have experience in this?
>
> I have earlier also mailed the ceph-user group but I didn't get any
> response. So sending again not sure my mail went through.
>
> regards
> Amudhan
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Nautilus - osdmap not trimming

2020-11-10 Thread m . sliwinski

Hi

We have a ceph cluster running on Nautilus, recently upgraded from Mimic.
While on Mimic we noticed an issue with the osdmap not trimming, which caused 
part of our cluster to crash due to osdmap cache misses. We solved it by 
adding "osd_map_cache_size = 5000" to our ceph.conf.
Because we had mixed OSD versions from both Mimic and Nautilus at that time, 
we decided to finish the upgrade, but it didn't solve our problem.
At the moment we have: "oldest_map": 67114, "newest_map": 72588, and the 
difference is not shrinking even though the cluster is in active+clean 
state. Restarting all mons didn't help. The bug seems similar to 
https://tracker.ceph.com/issues/44184 but there's no solution there.

What else can I check or do?
I don't want to do dangerous things like mon_osd_force_trim_to or 
something similar without finding the cause.


I noticed in MON debug log:

2020-11-10 17:11:14.612 7f9592d5b700 10 mon.monb01@0(leader).osd e72571 
should_prune could only prune 4957 epochs (67114..72071), which is less 
than the required minimum (1)
2020-11-10 17:11:19.612 7f9592d5b700 10 mon.monb01@0(leader).osd e72571 
should_prune could only prune 4957 epochs (67114..72071), which is less 
than the required minimum (1)


So I added config options to reduce those values:

  mon   dev  mon_debug_block_osdmap_trim   false
  mon   advanced mon_min_osdmap_epochs 100
  mon   advanced mon_osdmap_full_prune_min 500
  mon   advanced paxos_service_trim_min10

But it didn't help:

2020-11-10 18:28:26.165 7f1b700ab700 20 mon.monb01@0(leader).osd e72588 
load_osdmap_manifest osdmap manifest detected in store; reload.
2020-11-10 18:28:26.169 7f1b700ab700 10 mon.monb01@0(leader).osd e72588 
load_osdmap_manifest store osdmap manifest pinned (67114 .. 72484)
2020-11-10 18:28:26.169 7f1b700ab700 10 mon.monb01@0(leader).osd e72588 
should_prune not enough epochs to form an interval (last pinned: 72484, 
last to pin: 72488, interval: 10)


Command "ceph report | jq '.osdmap_manifest' |jq '.pinned_maps[]'" shows 
67114 on the top, but i'm unable to determine why.


Same with 'ceph report | jq .osdmap_first_committed':

root@monb01:/var/log/ceph# ceph report | jq .osdmap_first_committed
report 4073203295
67114
root@monb01:/var/log/ceph#

When I try to determine if a certain PG or OSD is keeping it so low, I 
don't get anything.
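
For reference, this is roughly the kind of query I have been trying, to see
whether some PG reports an old last_epoch_clean (the exact JSON paths may
differ between releases):

ceph pg dump -f json 2>/dev/null | jq '[.pg_map.pg_stats[] | {pgid, last_epoch_clean}] | sort_by(.last_epoch_clean) | .[0:10]'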


And in the MON debug log I get:

2020-11-10 18:42:41.767 7f1b74721700 10 mon.monb01@0(leader) e6 
refresh_from_paxos
2020-11-10 18:42:41.767 7f1b74721700 10 
mon.monb01@0(leader).paxosservice(mdsmap 1..1) refresh
2020-11-10 18:42:41.767 7f1b74721700 10 
mon.monb01@0(leader).paxosservice(osdmap 67114..72588) refresh
2020-11-10 18:42:41.767 7f1b74721700 20 mon.monb01@0(leader).osd e72588 
load_osdmap_manifest osdmap manifest detected in store; reload.
2020-11-10 18:42:41.767 7f1b74721700 10 mon.monb01@0(leader).osd e72588 
load_osdmap_manifest store osdmap manifest pinned (67114 .. 72484)


I also get:

root@monb01:/var/log/ceph#  ceph report |grep "min_last_epoch_clean"
report 2716976759
"min_last_epoch_clean": 0,
root@monb01:/var/log/ceph#


Additional info:
root@monb01:/var/log/ceph# ceph versions
{
"mon": {
"ceph version 14.2.13 (1778d63e55dbff6cedb071ab7d367f8f52a8699f) 
nautilus (stable)": 3

},
"mgr": {
"ceph version 14.2.13 (1778d63e55dbff6cedb071ab7d367f8f52a8699f) 
nautilus (stable)": 3

},
"osd": {
"ceph version 14.2.13 (1778d63e55dbff6cedb071ab7d367f8f52a8699f) 
nautilus (stable)": 120,
"ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) 
nautilus (stable)": 164

},
"mds": {},
"overall": {
"ceph version 14.2.13 (1778d63e55dbff6cedb071ab7d367f8f52a8699f) 
nautilus (stable)": 126,
"ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) 
nautilus (stable)": 164

}
}


root@monb01:/var/log/ceph# ceph mon feature ls

all features
supported: [kraken,luminous,mimic,osdmap-prune,nautilus]
persistent: [kraken,luminous,mimic,osdmap-prune,nautilus]
on current monmap (epoch 6)
persistent: [kraken,luminous,mimic,osdmap-prune,nautilus]
required: [kraken,luminous,mimic,osdmap-prune,nautilus]


root@monb01:/var/log/ceph# ceph osd dump | grep require
require_min_compat_client luminous
require_osd_release nautilus


root@monb01:/var/log/ceph# ceph report | jq 
'.osdmap_manifest.pinned_maps | length'

report 1777129876
538

root@monb01:/var/log/ceph# ceph pg dump -f json | jq .osd_epochs
dumped all
null

--
Best regards
Marcin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph RBD - High IOWait during the Writes

2020-11-10 Thread athreyavc
Hi All,

We have recently deployed a new CEPH cluster Octopus 15.2.4 which consists
of

12 OSD Nodes (16 Core + 200GB RAM, 30x14TB disks, CentOS 8)
3 Mon Nodes (8 Cores + 15GB, CentOS 8)

We use Erasure Coded Pool and RBD block devices.

3 Ceph clients use the RBD devices; each has 25 RBDs and each RBD is 10TB in
size. Each RBD is formatted with the EXT4 file system.

Cluster health is OK and the hardware is new and good.

All the machines have 10Gbps (Active/Passive) bond Interface  configured on
it.

Read operation of the cluster is OK, however, writes are very slow.

One one of the RBDs we did the perf test.

fio -ioengine=libaio -direct=1 -invalidate=1 -name=test -bs=4k -iodepth=128
-rw=randread -runtime=60 -filename=/dev/rbd40

Run status group 0 (all jobs):
   READ: bw=401MiB/s (420MB/s), 401MiB/s-401MiB/s (420MB/s-420MB/s),
io=23.5GiB (25.2GB), run=60054-60054msec

fio -ioengine=libaio -direct=1 -invalidate=1 -name=test -bs=4k -iodepth=128
-rw=randwrite -runtime=60 -filename=/dev/rbd40

Run status group 0 (all jobs):
  WRITE: bw=217KiB/s (222kB/s), 217KiB/s-217KiB/s (222kB/s-222kB/s),
io=13.2MiB (13.9MB), run=62430-62430msec

I see a High IO wait from the client.

Any suggestions/pointers address this issue is really appreciated.

Thanks and Regards,

Athreya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph RBD - High IOWait during the Writes

2020-11-10 Thread Jason Dillaman
On Tue, Nov 10, 2020 at 1:52 PM athreyavc  wrote:
>
> Hi All,
>
> We have recently deployed a new CEPH cluster Octopus 15.2.4 which consists
> of
>
> 12 OSD Nodes(16 Core + 200GB RAM,  30x14TB disks, CentOS 8)
> 3 Mon Nodes (8 Cores + 15GB, CentOS 8)
>
> We use Erasure Coded Pool and RBD block devices.
>
> 3 Ceph clients use the RBD devices, each has 25 RBDs  and Each RBD size is
> 10TB. Each RBD is partitioned with the EXT4 file system.
>
> Cluster Health Is OK and Hardware is New and good.
>
> All the machines have 10Gbps (Active/Passive) bond Interface  configured on
> it.
>
> Read operation of the cluster is OK, however, writes are very slow.
>
> One one of the RBDs we did the perf test.
>
> fio -ioengine=libaio -direct=1 -invalidate=1 -name=test -bs=4k -iodepth=128
> -rw=randread -runtime=60 -filename=/dev/rbd40
>
> Run status group 0 (all jobs):
>READ: bw=401MiB/s (420MB/s), 401MiB/s-401MiB/s (420MB/s-420MB/s),
> io=23.5GiB (25.2GB), run=60054-60054msec
>
> fio -ioengine=libaio -direct=1 -invalidate=1 -name=test -bs=4k -iodepth=128
> -rw=randwrite -runtime=60 -filename=/dev/rbd40
>
> Run status group 0 (all jobs):
>   WRITE: bw=217KiB/s (222kB/s), 217KiB/s-217KiB/s (222kB/s-222kB/s),
> io=13.2MiB (13.9MB), run=62430-62430msec
>
> I see a High IO wait from the client.
>
> Any suggestions/pointers address this issue is really appreciated.

EC pools + small random writes + performance: pick two of the three. ;-)

Writes against an EC pool require the chunk to be re-written via an
expensive read/modify/write cycle.
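
(One quick way to see how much of that is the small-write penalty is to rerun
the same fio test with a much larger block size, which should turn most writes
into full-stripe writes, e.g.:)

fio -ioengine=libaio -direct=1 -invalidate=1 -name=test -bs=4M -iodepth=128 -rw=randwrite -runtime=60 -filename=/dev/rbd40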

> Thanks and Regards,
>
> Athreya
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Jason
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph RBD - High IOWait during the Writes

2020-11-10 Thread athreyavc
Thanks for the Reply.

We are not really expecting the performance level needed for virtual
machines or databases. We want to use it as a file store where an
app writes into the mounts.

I think it is slow for file write operations and we see high IO waits.
Is there anything I can do to increase the throughput?

Thanks and regards,

Athreya

On Tue, Nov 10, 2020 at 7:10 PM Jason Dillaman  wrote:

> On Tue, Nov 10, 2020 at 1:52 PM athreyavc  wrote:
> >
> > Hi All,
> >
> > We have recently deployed a new CEPH cluster Octopus 15.2.4 which
> consists
> > of
> >
> > 12 OSD Nodes(16 Core + 200GB RAM,  30x14TB disks, CentOS 8)
> > 3 Mon Nodes (8 Cores + 15GB, CentOS 8)
> >
> > We use Erasure Coded Pool and RBD block devices.
> >
> > 3 Ceph clients use the RBD devices, each has 25 RBDs  and Each RBD size
> is
> > 10TB. Each RBD is partitioned with the EXT4 file system.
> >
> > Cluster Health Is OK and Hardware is New and good.
> >
> > All the machines have 10Gbps (Active/Passive) bond Interface  configured
> on
> > it.
> >
> > Read operation of the cluster is OK, however, writes are very slow.
> >
> > One one of the RBDs we did the perf test.
> >
> > fio -ioengine=libaio -direct=1 -invalidate=1 -name=test -bs=4k
> -iodepth=128
> > -rw=randread -runtime=60 -filename=/dev/rbd40
> >
> > Run status group 0 (all jobs):
> >READ: bw=401MiB/s (420MB/s), 401MiB/s-401MiB/s (420MB/s-420MB/s),
> > io=23.5GiB (25.2GB), run=60054-60054msec
> >
> > fio -ioengine=libaio -direct=1 -invalidate=1 -name=test -bs=4k
> -iodepth=128
> > -rw=randwrite -runtime=60 -filename=/dev/rbd40
> >
> > Run status group 0 (all jobs):
> >   WRITE: bw=217KiB/s (222kB/s), 217KiB/s-217KiB/s (222kB/s-222kB/s),
> > io=13.2MiB (13.9MB), run=62430-62430msec
> >
> > I see a High IO wait from the client.
> >
> > Any suggestions/pointers address this issue is really appreciated.
>
> EC pools + small random writes + performance: pick two of the three. ;-)
>
> Writes against an EC pool require the chunk to be re-written via an
> expensive read/modify/write cycle.
>
> > Thanks and Regards,
> >
> > Athreya
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
>
>
> --
> Jason
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-10 Thread Janek Bevendorff
Here's something else I noticed: when I stat objects that work via 
radosgw-admin, the stat info contains a "begin_iter" JSON object with RADOS key 
info like this


"key": {
"name": 
"29/items/WIDE-20110924034843-crawl420/WIDE-20110924065228-02544.warc.gz",
"instance": "",
"ns": ""
}


and then "end_iter" with key info like this:


"key": {
"name": ".8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh_239",
"instance": "",
"ns": "shadow"
}


However, when I check the broken 0-byte object, the "begin_iter" and "end_iter" 
keys look like this:


"key": {
"name": 
"29/items/WIDE-20110903143858-crawl428/WIDE-20110903143858-01166.warc.gz.2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme.1",
"instance": "",
"ns": "multipart"
}

[...]


"key": {
"name": 
"29/items/WIDE-20110903143858-crawl428/WIDE-20110903143858-01166.warc.gz.2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme.19",
"instance": "",
"ns": "multipart"
}


So, it's the full name plus a suffix and the namespace is multipart, not shadow 
(or empty). This in itself may just be an artefact of whether the object was 
uploaded in one go or as a multipart object, but the second difference is that 
I cannot find any of the multipart objects in my pool's object name dump. I 
can, however, find the shadow RADOS object of the intact S3 object.
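
(If I'm not mistaken, those namespaces show up in the RADOS object names as
"__multipart_" and "__shadow_" infixes after the bucket marker, so grepping
the full pool listing for them should show whether any of the parts survived,
e.g.:)

grep '__multipart_.*WIDE-20110903143858-01166.warc.gz' {tmpfile}
grep '__shadow_.*WIDE-20110903143858-01166.warc.gz' {tmpfile}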

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] safest way to re-crush a pool

2020-11-10 Thread Michael Thomas
I'm setting up a radosgw for my ceph Octopus cluster.  As soon as I 
started the radosgw service, I noticed that it created a handful of new 
pools.  These pools were assigned the 'replicated_data' crush rule 
automatically.


I have a mixed hdd/ssd/nvme cluster, and this 'replicated_data' crush 
rule spans all device types.  I would like radosgw to use a replicated 
SSD pool and avoid the HDDs.  What is the recommended way to change the 
crush device class for these pools without risking the loss of any data 
in the pools?  I will note that I have not yet written any user data to 
the pools.  Everything in them was added by the radosgw process 
automatically.
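
For context, I assume the change would be something along these lines (the
rule and pool names are only examples), but I'm not sure it is safe to simply
switch the crush_rule on a live pool:

ceph osd crush rule create-replicated replicated_ssd default host ssd
ceph osd pool set .rgw.root crush_rule replicated_ssd
ceph osd pool set default.rgw.log crush_rule replicated_ssd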


--Mike
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] (Ceph Octopus) Repairing a neglected Ceph cluster - Degraded Data Reduncancy, all PGs degraded, undersized, not scrubbed in time

2020-11-10 Thread seffyroff
I've inherited a Ceph Octopus cluster that seems like it needs urgent 
maintenance before data loss begins to happen. I'm the guy with the most Ceph 
experience on hand and that's not saying much. I'm experiencing most of the ops 
and repair tasks for the first time here.

Ceph health output looks like this:

HEALTH_WARN Degraded data redundancy: 3640401/8801868 objects degraded 
(41.359%),
 128 pgs degraded, 128 pgs undersized; 128 pgs not deep-scrubbed in time;
 128 pgs not scrubbed in time

Ceph -s output: https://termbin.com/i06u

The crush rule 'cephfs.media' is here: https://termbin.com/2klmq

So, it seems like all PGs are in a 'warning' state for the main pool, which is 
erasure coded and 11TiB across 4 OSDs, of which around 6.4TiB is used. The Ceph 
services themselves seem happy, they're stable and have Quorum. I'm able to 
access the web panel fine also.  The block devices are of different sizes and 
types (2 large, different sized spinners, and 2 identical SSDs)

I would welcome any pointers on the steps to bring this back to full health. 
If it's undersized, can I simply add another block device/OSD? Or will 
adjusting the config somewhere get it to rebalance successfully? (The 
rebalance jobs have been stuck at 0% for weeks.)

Thank you for your time reading this message.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] disable / remove multisite sync RGW (Ceph Nautilus)

2020-11-10 Thread gans
Hello everybody,

we are running a multisite (active/active) gateway on 2 ceph clusters:
one production and one backup cluster.
Now we make a backup with rclone from the master and no longer need the
second gateway.

What is the best way to shut down the second gateway and remove the multisite
sync from the master without losing the data on the master site?

greetings
Markus
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Slow ops and "stuck peering"

2020-11-10 Thread shehzaad . chakowree
Hello all,

We're trying to debug a "slow ops" situation on our cluster running Nautilus 
(latest version). Things were running smoothly for a while, but we had a few 
issues that made things fall apart (possible clock skew, faulty disk...)

- We've checked the ntp, everything seems fine, the whole cluster shows no 
clock skew. Network config seems fine too (we're using jumbo frames throughout 
the cluster).

- We have multiple PGs that are in a "stuck peering" or "stuck inactive" state.

ceph health detail
HEALTH_WARN Reduced data availability: 1020 pgs inactive, 1008 pgs peering; 
Degraded data redundancy: 208352/95157861 objects degraded (0.219%), 9 pgs 
degraded, 9 pgs undersized; 2 pgs not deep-scrubbed in time; 2 pgs not scrubbed 
in time; 3 daemons have recently crashed; 1184 slow ops, oldest one blocked for 
1792 sec, daemons 
[osd.100,osd.101,osd.102,osd.103,osd.104,osd.105,osd.106,osd.107,osd.108,osd.109]...
 have slow ops.
PG_AVAILABILITY Reduced data availability: 1020 pgs inactive, 1008 pgs peering
pg 12.3cd is stuck inactive for 8939.938831, current state peering, last 
acting [111,75,53]
pg 12.3ce is stuck peering for 350761.931800, current state peering, last 
acting [48,103,76]
pg 12.3cf is stuck peering for 345518.349253, current state peering, last 
acting [80,46,116]
pg 12.3d0 is stuck peering for 396432.771388, current state peering, last 
acting [114,95,42]
pg 12.3d1 is stuck peering for 389771.820478, current state peering, last 
acting [33,99,122]
pg 12.3d2 is stuck peering for 16385.796714, current state peering, last 
acting [48,75,105]
pg 12.3d3 is stuck peering for 375090.876123, current state peering, last 
acting [53,118,90]
pg 12.3d4 is stuck peering for 350665.788611, current state peering, last 
acting [59,81,40]
pg 12.3d5 is stuck peering for 344195.934260, current state peering, last 
acting [104,73,87]
pg 12.3d6 is stuck peering for 388515.338772, current state peering, last 
acting [57,79,60]
pg 12.3d7 is stuck peering for 27320.368320, current state peering, last 
acting [35,56,109]
pg 12.3d8 is stuck peering for 345470.520103, current state peering, last 
acting [91,41,74]
pg 12.3d9 is stuck peering for 347582.613090, current state peering, last 
acting [85,66,103]
pg 12.3da is stuck peering for 346518.712024, current state peering, last 
acting [87,63,56]
pg 12.3db is stuck peering for 348804.986864, current state peering, last 
acting [100,122,46]
pg 12.3dc is stuck peering for 343796.439591, current state peering, last 
acting [55,90,125]
pg 12.3dd is stuck peering for 345621.663979, current state peering, last 
acting [83,38,125]
pg 12.3de is stuck peering for 348026.449482, current state peering, last 
acting [38,113,82]
pg 12.3df is stuck peering for 350263.925579, current state peering, last 
acting [41,104,87]
pg 12.3e0 is stuck peering for 8738.645205, current state peering, last 
acting [57,86,108]
pg 12.3e1 is stuck peering for 397082.568164, current state peering, last 
acting [124,46]
pg 12.3e2 is stuck peering for 345232.402459, current state peering, last 
acting [80,114,65]
pg 12.3e3 is stuck peering for 347014.276511, current state peering, last 
acting [63,102,83]
pg 12.3e4 is stuck peering for 345470.524144, current state peering, last 
acting [91,38,71]
pg 12.3e5 is stuck peering for 346636.837554, current state peering, last 
acting [64,85,118]
pg 12.3e6 is stuck peering for 398952.293609, current state peering, last 
acting [92,36,75]
pg 12.3e7 is stuck peering for 346973.264600, current state peering, last 
acting [31,94,53]
pg 12.3e8 is stuck peering for 370098.248268, current state peering, last 
acting [119,90,72]
pg 12.3e9 is stuck peering for 345134.069457, current state peering, last 
acting [96,105,36]
pg 12.3ea is stuck peering for 346305.043394, current state peering, last 
acting [94,103,51]
pg 12.3eb is stuck peering for 388515.112735, current state peering, last 
acting [57,116,59]
pg 12.3ec is stuck peering for 348097.249845, current state peering, last 
acting [56,111,84]
pg 12.3ed is stuck peering for 346636.835287, current state peering, last 
acting [64,106,101]
pg 12.3ee is stuck peering for 398197.856231, current state peering, last 
acting [53,105,80]
pg 12.3ef is stuck peering for 347061.858678, current state peering, last 
acting [47,64,80]
pg 12.3f0 is stuck peering for 371495.723196, current state peering, last 
acting [77,115,81]
pg 12.3f1 is stuck peering for 27539.717691, current state peering, last 
acting [123,69,48]
pg 12.3f2 is stuck peering for 346973.596729, current state peering, last 
acting [31,80,45]
pg 12.3f3 is stuck peering for 345419.834162, current state peering, last 
acting [108,89,40]
pg 12.3f4 is stuck peering for 347400.170304, current state peering, last 
acting [82,67,104]
pg 12.3f5 is stuck peering for 346793.349638, current state peer

[ceph-users] Ceph RBD - High IOWait during the Writes

2020-11-10 Thread athreyavc
Hi, 

We have recently deployed a Ceph cluster with 

12 OSD nodes (16 cores + 200GB RAM + 30 x 14TB disks each), running CentOS 8
3 monitor nodes (8 cores + 16GB RAM), running CentOS 8

We are using Ceph Octopus and we are using RBD block devices.

We have three Ceph client nodes (16 cores + 30GB RAM, running CentOS 8) across 
which the RBDs are mapped and mounted, 25 RBDs per client node. Each RBD is 
10TB in size and formatted with an EXT4 file system.

On the network side, we have a 10Gbps active/passive bond on all the Ceph cluster 
nodes, including the clients. Jumbo frames are enabled and the MTU is 9000.

This is a new cluster and the cluster health reports OK, but we see high IO wait 
during writes.

From one of the clients:

15:14:30CPU %user %nice   %system   %iowait%steal %idle
15:14:31all  0.06  0.00  1.00 45.03  0.00 53.91
15:14:32all  0.06  0.00  0.94 41.28  0.00 57.72
15:14:33all  0.06  0.00  1.25 45.78  0.00 52.91
15:14:34all  0.00  0.00  1.06 40.07  0.00 58.86
15:14:35all  0.19  0.00  1.38 41.04  0.00 57.39
Average:all  0.08  0.00  1.13 42.64  0.00 56.16

and the system load is very high:

top - 15:19:15 up 34 days, 41 min,  2 users,  load average: 13.49, 13.62, 13.83


From 'atop', one of the CPUs shows this:

CPU | sys 7% | user 1% | irq 2% | idle 1394% | wait 195% | steal 0% | guest 0% | ipc initial | cycl initial | curf 806MHz | curscal ?%

On the OSD nodes, we don't see much %utilization on the disks.

RBD caching values are at their defaults.
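
If it helps, I can post the effective client-side options for one of the images, 
e.g. from (the pool/image name here is just an example from our setup):

rbd config image list rbd/vol01 | grep rbd_cache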

Are we overlooking some configuration item?

Thanks and Regards,

At
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW multisite sync and latencies problem

2020-11-10 Thread Miroslav Bohac
Hi,
I have a problem with RGW in a multisite configuration with Nautilus 14.2.11. Both 
zones use SSDs and a 10Gbps network. The master zone consists of 5x Dell R740XD 
servers (each with 256GB RAM, 8x800GB SSDs for Ceph, 24 CPUs). The secondary zone 
(temporary, for testing) consists of 3x HPE DL360 Gen10 servers (each with 256GB 
RAM, 6x800GB SSDs, 48 CPUs).

We have 17 test buckets with manual sharding (101 shards). Each bucket holds 10M 
small objects (10kB-15kB). The zonegroup configuration is attached below. 
Replication of the 150M objects from the master to the secondary zone took almost 
28 hours and completed successfully.

After deleting the objects from one bucket in the master zone, it is no longer 
possible to sync the zones properly. I tried restarting both secondary RGWs, but 
without success. The sync status on the secondary zone is behind the master, and 
the number of objects in the buckets on the master zone differs from the secondary zone.

The Ceph health status is WARNING in both zones. In the master zone I have "146 
large objects found in pool 'prg2a-1.rgw.buckets.index'" and "16 large objects 
found in pool 'prg2a-1.rgw.log'". In the secondary zone: "88 large objects found 
in pool 'prg2a-2.rgw.log'" and "1584 large objects found in pool 'prg2a-2.rgw.buckets.index'".

Average OSD latencies on the secondary zone during the sync were "read 0.158ms, 
write 1.897ms, overwrite 1.634ms". After the sync stalled (after about 12 hours 
the RGW requests, IOPS and throughput dropped off), the average OSD latencies 
jumped to "read 125ms, write 30ms, overwrite 272ms". After stopping both RGWs on 
the secondary zone the average OSD latencies are almost 0ms, but when I start the 
RGWs on the secondary zone again, the OSD latencies rise back to "read 125ms, 
write 30ms, overwrite 272ms" with spikes of up to 3 seconds.

We have seen the same behaviour of Ceph multisite with a large number of objects 
in one bucket (150M+ objects), so we tried a different strategy with smaller 
buckets, but the results are the same.

I would appreciate any help or advice on how to tune or diagnose multisite problems.
Does anyone else have any ideas, or a similar use case?
I do not know what is wrong.
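
The state I am looking at comes from commands like the following on the secondary 
zone (the bucket name is an example); I can post the full output if that helps:

radosgw-admin sync status
radosgw-admin bucket sync status --bucket=<bucket>
radosgw-admin data sync status --source-zone=prg2a-1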

Thank you and best regards,
Miroslav

radosgw-admin zonegroup get
{
    "id": "ac0005da-2e9f-4f38-835f-72b289c240d0",
    "name": "prg2a",
    "api_name": "prg2a",
    "is_master": "true",
    "endpoints": [
        "http://s3.prg1a.sys.cz:80",
        "http://s3.prg2a.sys.cz:80"
    ],
    "hostnames": [],
    "hostnames_s3website": [],
    "master_zone": "d9ebbd1f-3312-4083-b4c2-843e1fb899ad",
    "zones": [
        {
            "id": "d9ebbd1f-3312-4083-b4c2-843e1fb899ad",
            "name": "prg2a-1",
            "endpoints": [
                "http://10.104.200.101:7480",
                "http://10.104.200.102:7480"
            ],
            "log_meta": "false",
            "log_data": "true",
            "bucket_index_max_shards": 0,
            "read_only": "false",
            "tier_type": "",
            "sync_from_all": "true",
            "sync_from": [],
            "redirect_zone": ""
        },
        {
            "id": "fdd76c02-c679-4ec7-8e7d-c14d2ac74fb4",
            "name": "prg2a-2",
            "endpoints": [
                "http://10.104.200.221:7480",
                "http://10.104.200.222:7480"
            ],
            "log_meta": "false",
            "log_data": "true",
            "bucket_index_max_shards": 0,
            "read_only": "false",
            "tier_type": "",
            "sync_from_all": "true",
            "sync_from": [],
            "redirect_zone": ""
        }
    ],
    "placement_targets": [
        {
            "name": "default-placement",
            "tags": [],
            "storage_classes": [
                "STANDARD"
            ]
        }
    ],
    "default_placement": "default-placement",
    "realm_id": "cb831094-e219-44b8-89f3-fe25fc288c00"
}

ii  radosgw      14.2.11-pve1  amd64  REST gateway for RADOS distributed object store
ii  ceph         14.2.11-pve1  amd64  distributed storage and file system
ii  ceph-base    14.2.11-pve1  amd64  common ceph daemon libraries and management tools
ii  ceph-common  14.2.11-pve1  amd64  common utilities to mount and interact with a ceph storage cluster
ii  ceph-fuse    14.2.11-pve1  amd64  FUSE-based client for the Ceph distributed file system
ii  ceph-mds     14.2.11-pve1  amd64  metadata server for the ceph distributed file system
ii  ceph-mgr     14.2.11-pve1  amd64  manager for the ceph distributed storage system
ii  ceph-mon     14.2.11-pve1  amd64  monitor server for the ceph storage system
ii  ceph-osd 14.2.11-pve1 amd64

[ceph-users] 150mb per sec on NVMe pool

2020-11-10 Thread Alex L
Hi,
I have invested in 3x Samsung PM983 (MZ1LB960HAJQ-7) drives to run a fast pool on. 
However, I am only getting 150MB/s from these.

vfio results directly on the NVMe's:
https://docs.google.com/spreadsheets/d/1LXupjEUnNdf011QNr24pkAiDBphzpz5_MwM0t9oAl54/edit?usp=sharing

Config and Results of ceph bench:
https://pastebin.com/cScBv7Fv

Appreciate any help you can give me.
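
If it helps, I can re-run tests against the pool with something like the following 
(the pool name is a placeholder):

rados -p <nvme-pool> bench 60 write -b 4M -t 16 --no-cleanup
rados -p <nvme-pool> bench 60 seq -t 16
rados -p <nvme-pool> cleanup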
A
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OverlayFS with Cephfs to mount a snapshot read/write

2020-11-10 Thread Jeff Layton
Yes, you'd have to apply the patch to that kernel yourself. No RHEL7
kernels have that patch (so far). Newer RHEL8 kernels _do_ if that's an
option for you.
-- Jeff

On Mon, 2020-11-09 at 19:21 +0100, Frédéric Nass wrote:
> I feel lucky to have you on this one. ;-) Do you mean applying a 
> specific patch on 3.10 kernel? Or is this one too old to have it working 
> anyways.
> 
> Frédéric.
> 
> Le 09/11/2020 à 19:07, Luis Henriques a écrit :
> > Frédéric Nass  writes:
> > 
> > > Hi Luis,
> > > 
> > > Thanks for your help. Sorry I forgot about the kernel details. This is 
> > > latest
> > > RHEL 7.9.
> > > 
> > > ~/ uname -r
> > > 3.10.0-1160.2.2.el7.x86_64
> > > 
> > > ~/ grep CONFIG_TMPFS_XATTR /boot/config-3.10.0-1160.2.2.el7.x86_64
> > > CONFIG_TMPFS_XATTR=y
> > > 
> > > upper directory /upperdir is using xattrs
> > > 
> > > ~/ ls -l /dev/mapper/vg0-racine
> > > lrwxrwxrwx 1 root root 7  6 mars   2020 /dev/mapper/vg0-racine -> ../dm-0
> > > 
> > > ~/ cat /proc/fs/ext4/dm-0/options | grep xattr
> > > user_xattr
> > > 
> > > ~/ setfattr -n user.name -v upperdir /upperdir
> > > 
> > > ~/ getfattr -n user.name /upperdir
> > > getfattr: Removing leading '/' from absolute path names
> > > # file: upperdir
> > > user.name="upperdir"
> > > 
> > > Are you able to modify the content of a snapshot directory using 
> > > overlayfs on
> > > your side?
> > [ Cc'ing Jeff ]
> > 
> > Yes, I'm able to do that using a *recent* kernel.  I got curious and after
> > some digging I managed to reproduce the issue with kernel 5.3.  The
> > culprit was commit e09580b343aa ("ceph: don't list vxattrs in
> > listxattr()"), in 5.4.
> > 
> > Getting a bit more into the whole rabbit hole, it looks like
> > ovl_copy_xattr() will try to copy all the ceph-related vxattrs.  And that
> > won't work (for ex. for ceph.dir.entries).
> > 
> > Can you try cherry-picking this commit into your kernel to see if that
> > fixes it for you?
> > 
> > Cheers,

-- 
Jeff Layton 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] newbie question: direct objects of different sizes to different pools?

2020-11-10 Thread andersnb
Hi All,

I'm exploring deploying Ceph at my organization for use as an object storage 
system (using the S3 RGW interface). 
My users have a range of file sizes and I'd like to direct small files to a pool 
that uses replication and large files to a pool that uses erasure coding.

Is that possible?
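
From what I've read so far, the closest fit looks like RGW storage classes, e.g. 
adding an erasure-coded storage class to the default placement roughly like the 
commands below (the class name and data pool are my guesses from the docs), and 
then having clients set x-amz-storage-class per upload or using a lifecycle 
transition, since I don't think RGW can route by object size on its own. Is that 
the right direction?

radosgw-admin zonegroup placement add --rgw-zonegroup default \
    --placement-id default-placement --storage-class LARGE_EC
radosgw-admin zone placement add --rgw-zone default \
    --placement-id default-placement --storage-class LARGE_EC \
    --data-pool default.rgw.buckets.data.ec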

Thanks!

Bill
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-10 Thread Janek Bevendorff
We are having the exact same problem (also Octopus). The object is listed by 
s3cmd, but trying to download it results in a 404 error. radosgw-admin object 
stat shows that the object still exists. Any further ideas how I can restore 
access to this object?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Is there a way to make Cephfs kernel client to write data to ceph osd smoothly with buffer io

2020-11-10 Thread Sage Meng
Hi All,

  The CephFS kernel client is affected by the kernel page cache when we write
data to it: the burst of outgoing data is huge when the OS starts flushing the
page cache. Is there a way to make the CephFS kernel client write data to the
Ceph OSDs smoothly when buffered I/O is used?
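
The only generic knob I am aware of is the client-side writeback tuning, e.g. 
lowering the dirty thresholds (the values below are only an example, not a 
recommendation):

sysctl -w vm.dirty_background_bytes=268435456   # start background writeback at 256MB of dirty data
sysctl -w vm.dirty_bytes=1073741824             # throttle writers above 1GB of dirty data

Is there anything CephFS-specific that would work better?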
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph as a distributed filesystem and kerberos integration

2020-11-10 Thread Marco Venuti
Hi,
I have the same use-case.
Is there some alternative to Samba in order to export CephFS to the end user? I 
am somewhat concerned with its potential vulnerabilities, which appear to be 
quite frequent.
Specifically, I need server-side enforced permissions and possibly Kerberos 
authentication and server-side enforced quotas.

Thank you,
Marco
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: safest way to re-crush a pool

2020-11-10 Thread DHilsbos
Michael;

I run a Nautilus cluster, but all I had to do was change the rule associated 
with the pool, and ceph moved the data.
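
In our case that was essentially the following (the rule and pool names here are 
only examples):

# create a replicated rule restricted to the ssd device class
ceph osd crush rule create-replicated rgw-ssd default host ssd
# point the pool at the new rule; ceph backfills the data automatically
ceph osd pool set default.rgw.buckets.data crush_rule rgw-ssd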

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Michael Thomas [mailto:w...@caltech.edu] 
Sent: Tuesday, November 10, 2020 1:32 PM
To: ceph-users@ceph.io
Subject: [ceph-users] safest way to re-crush a pool

I'm setting up a radosgw for my ceph Octopus cluster.  As soon as I 
started the radosgw service, I notice that it created a handful of new 
pools.  These pools were assigned the 'replicated_data' crush rule 
automatically.

I have a mixed hdd/ssd/nvme cluster, and this 'replicated_data' crush 
rule spans all device types.  I would like radosgw to use a replicated 
SSD pool and avoid the HDDs.  What is the recommended way to change the 
crush device class for these pools without risking the loss of any data 
in the pools?  I will note that I have not yet written any user data to 
the pools.  Everything in them was added by the radosgw process 
automatically.

--Mike
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph RBD - High IOWait during the Writes

2020-11-10 Thread Frank Schilder
If this is a file server, why not use SAMBA on cephfs? That's what we do. RBD 
is a very cumbersome extra layer for storing files that eats a lot of 
performance. Add an SSD to each node for the meta-data and primary data pool 
and use an EC pool on HDDs for data. This will be much better.

Still, EC and small writes don't go well together. You may need to consider 
3-times replicated.
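
A minimal sketch of that layout on an existing file system (the pool names and 
mount path are examples):

# allow partial overwrites on the EC pool so cephfs can use it for data
ceph osd pool set cephfs_data_ec allow_ec_overwrites true
ceph fs add_data_pool cephfs cephfs_data_ec
# pin a directory (and everything created under it) to the EC pool
setfattr -n ceph.dir.layout.pool -v cephfs_data_ec /mnt/cephfs/bulk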

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: athreyavc 
Sent: 10 November 2020 20:20:55
To: dilla...@redhat.com
Cc: ceph-users
Subject: [ceph-users] Re: Ceph RBD - High IOWait during the Writes

Thanks for the Reply.

We are not really expecting the performance level needed for virtual machines or
databases. We want to use it as a file store where an app writes into the mounts.

I think it is slow for file write operations and we see high IO waits.
Is there anything I can do to increase the throughput?

Thanks and regards,

Athreya

On Tue, Nov 10, 2020 at 7:10 PM Jason Dillaman  wrote:

> On Tue, Nov 10, 2020 at 1:52 PM athreyavc  wrote:
> >
> > Hi All,
> >
> > We have recently deployed a new CEPH cluster Octopus 15.2.4 which
> consists
> > of
> >
> > 12 OSD Nodes(16 Core + 200GB RAM,  30x14TB disks, CentOS 8)
> > 3 Mon Nodes (8 Cores + 15GB, CentOS 8)
> >
> > We use Erasure Coded Pool and RBD block devices.
> >
> > 3 Ceph clients use the RBD devices, each has 25 RBDs  and Each RBD size
> is
> > 10TB. Each RBD is partitioned with the EXT4 file system.
> >
> > Cluster Health Is OK and Hardware is New and good.
> >
> > All the machines have 10Gbps (Active/Passive) bond Interface  configured
> on
> > it.
> >
> > Read operation of the cluster is OK, however, writes are very slow.
> >
> > One one of the RBDs we did the perf test.
> >
> > fio -ioengine=libaio -direct=1 -invalidate=1 -name=test -bs=4k
> -iodepth=128
> > -rw=randread -runtime=60 -filename=/dev/rbd40
> >
> > Run status group 0 (all jobs):
> >READ: bw=401MiB/s (420MB/s), 401MiB/s-401MiB/s (420MB/s-420MB/s),
> > io=23.5GiB (25.2GB), run=60054-60054msec
> >
> > fio -ioengine=libaio -direct=1 -invalidate=1 -name=test -bs=4k
> -iodepth=128
> > -rw=randwrite -runtime=60 -filename=/dev/rbd40
> >
> > Run status group 0 (all jobs):
> >   WRITE: bw=217KiB/s (222kB/s), 217KiB/s-217KiB/s (222kB/s-222kB/s),
> > io=13.2MiB (13.9MB), run=62430-62430msec
> >
> > I see a High IO wait from the client.
> >
> > Any suggestions/pointers address this issue is really appreciated.
>
> EC pools + small random writes + performance: pick two of the three. ;-)
>
> Writes against an EC pool require the chunk to be re-written via an
> expensive read/modify/write cycle.
>
> > Thanks and Regards,
> >
> > Athreya
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
>
>
> --
> Jason
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph 15.2.3 on Ubuntu 20.04 with odroid xu4 / python thread Problem

2020-11-10 Thread Dominik H
Hi all,
I am trying to run the Ceph client tools on an Odroid XU4 (armhf) with Ubuntu 20.04
and Python 3.8.5.
Unfortunately, the following error occurs on every "ceph" command (even
ceph --help):

Traceback (most recent call last):
  File "/usr/bin/ceph", line 1275, in 
retval = main()
  File "/usr/bin/ceph", line 981, in main
cluster_handle = run_in_thread(rados.Rados,
  File "/usr/lib/python3/dist-packages/ceph_argparse.py", line 1342, in
run_in_thread
raise Exception("timed out")
Exception: timed out

With this server I access an existing Ceph cluster using the same hardware.
I checked that part of the code; there is just a thread start and a join (waiting
for a RadosThread to finish).

Maybe this is a Python issue in combination with the armhf architecture? Maybe
someone can help.

Thanks and greetings
Dominik
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to use ceph-volume to create multiple OSDs per NVMe disk, and with fixed WAL/DB partition on another device?

2020-11-10 Thread victorhooi
I'm building a new 4-node Proxmox/Ceph cluster, to hold disk images for our 
VMs. (Ceph version is 15.2.5).

Each node has 6 x NVMe SSDs (4TB), and 1 x Optane drive (960GB).

CPU is AMD Rome 7442, so there should be plenty of CPU capacity to spare.

My aim is to create 4 x OSDs per NVMe SSD (to make more effective use of the 
NVMe performance) and use the Optane drive to store the WAL/DB partition for 
each OSD. (I.e. total of 24 x 35GB WAL/DB partitions).

However, I am struggling to get the right ceph-volume command to achieve this.

Thanks to a very kind Redditor, I was able to get close:

/dev/nvme0n1 is an Optane device (900GB).

/dev/nvme2n1 is an Intel NVMe SSD (4TB).

```
# ceph-volume lvm batch --osds-per-device 4 /dev/nvme2n1 --db-devices 
/dev/nvme0n1

Total OSDs: 4

Solid State VG:
  Targets:   block.db  Total size: 893.00 GB
  Total LVs: 16Size per LV: 223.25 GB
  Devices:   /dev/nvme0n1

  TypePathLV 
Size % of device

  [data]  /dev/nvme2n1
931.25 GB   25.0%
  [block.db]  vg: vg/lv   
223.25 GB   25%

  [data]  /dev/nvme2n1
931.25 GB   25.0%
  [block.db]  vg: vg/lv   
223.25 GB   25%

  [data]  /dev/nvme2n1
931.25 GB   25.0%
  [block.db]  vg: vg/lv   
223.25 GB   25%

  [data]  /dev/nvme2n1
931.25 GB   25.0%
  [block.db]  vg: vg/lv   
223.25 GB   25%
--> The above OSDs would be created if the operation continues
--> do you want to proceed? (yes/no)
```

This does split up the NVMe disk into 4 OSDs and creates the WAL/DB partitions on 
the Optane drive - however, it creates 4 x 223GB partitions on the Optane 
(whereas I want 35GB partitions).

Is there any way to specify the WAL/DB partition size in the above?

And can it be done, such that you can run successive ceph-volume commands, to 
add the OSDs and WAL/DB partitions for each NVMe disk?

(Or if there's an easier way to achieve the above layout, please let me know).
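
From the ceph-volume help, --block-db-size looks like it might be what I'm after, 
i.e. something like the following run once per data device, but I'm not sure 
whether this is the intended usage or how the size argument is parsed on 15.2.5:

ceph-volume lvm batch --osds-per-device 4 --block-db-size 35G \
    /dev/nvme2n1 --db-devices /dev/nvme0n1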

That being said - I also just saw this ceph-users thread:

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/3Y6DEJCF7ZMXJL2NRLXUUEX76W7PPYXK/

It talks there about "osd op num shards" and "osd op num threads per shard" - 
is there some way to set those, to achieve similar performance to say, 4 x OSDs 
per NVMe drive, but with only 1 x NVMe? Has anybody done any 
testing/benchmarking on this they can share?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: safest way to re-crush a pool

2020-11-10 Thread Michael Thomas
Yes, of course this works.  For some reason I recall having trouble when 
I tried this on my first ceph install.  But I think in that case I hadn't 
changed the crush rule; instead I had changed the device classes without 
changing the crush tree.


In any case, the re-crush worked fine.

--Mike

On 11/10/20 4:20 PM, dhils...@performair.com wrote:

Michael;

I run a Nautilus cluster, but all I had to do was change the rule associated 
with the pool, and ceph moved the data.

Thank you,

Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com



-Original Message-
From: Michael Thomas [mailto:w...@caltech.edu]
Sent: Tuesday, November 10, 2020 1:32 PM
To: ceph-users@ceph.io
Subject: [ceph-users] safest way to re-crush a pool

I'm setting up a radosgw for my ceph Octopus cluster.  As soon as I
started the radosgw service, I notice that it created a handful of new
pools.  These pools were assigned the 'replicated_data' crush rule
automatically.

I have a mixed hdd/ssd/nvme cluster, and this 'replicated_data' crush
rule spans all device types.  I would like radosgw to use a replicated
SSD pool and avoid the HDDs.  What is the recommended way to change the
crush device class for these pools without risking the loss of any data
in the pools?  I will note that I have not yet written any user data to
the pools.  Everything in them was added by the radosgw process
automatically.

--Mike
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-10 Thread Rafael Lopez
Hi Janek,

What you said sounds right - an S3 single part obj won't have an S3
multipart string as part of the prefix. S3 multipart string looks like
"2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme".

From memory, single part S3 objects that don't fit in a single rados object
are assigned a random prefix that has nothing to do with the object name,
and the rados tail/data objects (not the head object) have that prefix.
As per your working example, the prefix for that would be
'.8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh'. So there would be (239) "shadow"
objects with names containing that prefix, and if you add up the sizes it
should be the size of your S3 object.
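
One way to check that is to sum the sizes of the shadow objects for that prefix and
compare the total against the S3 object size, roughly like this (pool name as
before, prefix from your working example):

rados -p {rgw buckets pool} ls \
  | grep '8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh' \
  | while read obj; do rados -p {rgw buckets pool} stat "$obj"; done \
  | awk '{sum += $NF} END {print sum, "bytes"}'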

You should look at working and non working examples of both single and
multipart S3 objects, as they are probably all a bit different when you
look in rados.

I agree it is a serious issue, because once objects are no longer in rados,
they cannot be recovered. If it were the case that a link was broken or rados
objects were renamed, then we could work to recover them... but as far as I can
tell, it looks like stuff is just vanishing from rados. The only explanation I
can think of is that some (rgw or rados) background process is incorrectly doing
something with these objects (e.g. renaming/deleting). I had thought perhaps it
was a bug with the rgw garbage collector... but that is pure speculation.

Once you can articulate the problem, I'd recommend logging a bug tracker
upstream.


On Wed, 11 Nov 2020 at 06:33, Janek Bevendorff <
janek.bevendo...@uni-weimar.de> wrote:

> Here's something else I noticed: when I stat objects that work via
> radosgw-admin, the stat info contains a "begin_iter" JSON object with
> RADOS key info like this
>
>
> "key": {
> "name":
> "29/items/WIDE-20110924034843-crawl420/WIDE-20110924065228-02544.warc.gz",
> "instance": "",
> "ns": ""
> }
>
>
> and then "end_iter" with key info like this:
>
>
> "key": {
> "name": ".8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh_239",
> "instance": "",
> "ns": "shadow"
> }
>
> However, when I check the broken 0-byte object, the "begin_iter" and
> "end_iter" keys look like this:
>
>
> "key": {
> "name":
> "29/items/WIDE-20110903143858-crawl428/WIDE-20110903143858-01166.warc.gz.2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme.1",
> "instance": "",
> "ns": "multipart"
> }
>
> [...]
>
>
> "key": {
> "name":
> "29/items/WIDE-20110903143858-crawl428/WIDE-20110903143858-01166.warc.gz.2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme.19",
> "instance": "",
> "ns": "multipart"
> }
>
> So, it's the full name plus a suffix and the namespace is multipart, not
> shadow (or empty). This in itself may just be an artefact of whether the
> object was uploaded in one go or as a multipart object, but the second
> difference is that I cannot find any of the multipart objects in my pool's
> object name dump. I can, however, find the shadow RADOS object of the
> intact S3 object.
>
>

-- 
*Rafael Lopez*
Devops Systems Engineer
Monash University eResearch Centre

T: +61 3 9905 9118
E: rafael.lo...@monash.edu
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs Kernel client not working properly without ceph cluster IP

2020-11-10 Thread Amudhan P
Hi Janne,

My OSDs have both a public IP and a cluster IP configured. The monitor nodes
and OSD nodes are co-located.

regards
Amudhan P

On Tue, Nov 10, 2020 at 4:45 PM Janne Johansson  wrote:

>
>
> Den tis 10 nov. 2020 kl 11:13 skrev Amudhan P :
>
>> Hi Nathan,
>>
>> Kernel client should be using only the public IP of the cluster to
>> communicate with OSD's.
>>
>
> "ip of the cluster" is a bit weird way to state it.
>
> A mounting client needs only to talk to ips in the public range yes, but
> OSDs always need to have an ip in the public range too.
> The private range is only for OSD<->OSD traffic and can be in the private
> network, meaning an OSD which uses both private and public ranges needs two
> ips, one in each range.
>
>
>
>> But here it requires both IP's for mount to work properly.
>>
>> regards
>> Amudhan
>>
>>
>>
>> On Mon, Nov 9, 2020 at 9:51 PM Nathan Fish  wrote:
>>
>> > It sounds like your client is able to reach the mon but not the OSD?
>> > It needs to be able to reach all mons and all OSDs.
>> >
>> > On Sun, Nov 8, 2020 at 4:29 AM Amudhan P  wrote:
>> > >
>> > > Hi,
>> > >
>> > > I have mounted my cephfs (ceph octopus) thru kernel client in Debian.
>> > > I get following error in "dmesg" when I try to read any file from my
>> > mount.
>> > > "[  236.429897] libceph: osd1 10.100.4.1:6891 socket closed (con
>> state
>> > > CONNECTING)"
>> > >
>> > > I use public IP (10.100.3.1) and cluster IP (10.100.4.1) in my ceph
>> > > cluster. I think public IP is enough to mount the share and work on it
>> > but
>> > > in my case, it needs me to assign public IP also to the client to work
>> > > properly.
>> > >
>> > > Does anyone have experience in this?
>> > >
>> > > I have earlier also mailed the ceph-user group but I didn't get any
>> > > response. So sending again not sure my mail went through.
>> > >
>> > > regards
>> > > Amudhan
>> > > ___
>> > > ceph-users mailing list -- ceph-users@ceph.io
>> > > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
>
> --
> May the most significant bit of your life be positive.
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs Kernel client not working properly without ceph cluster IP

2020-11-10 Thread Amudhan P
Hi Eugen,

I have only added my public IP and the relevant hostnames to the hosts file.

Do you see any issue in the commands below that I used to set the cluster IP
in the cluster?

### adding the cluster network for the ceph cluster ###
ceph config set global cluster_network 10.100.4.0/24

ceph orch daemon reconfig mon.host1
ceph orch daemon reconfig mon.host2
ceph orch daemon reconfig mon.host3
ceph orch daemon reconfig osd.1
ceph orch daemon reconfig osd.2
ceph orch daemon reconfig osd.3

restarting all daemons.
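
After the restart, is checking the following enough to confirm that the OSDs
advertise the right addresses? (The OSD id is just an example.)

ceph config get osd cluster_network
ceph config get osd public_network
ceph osd metadata 1 | grep -E '"(front|back)_addr"'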

regards
Amudhan

On Tue, Nov 10, 2020 at 7:42 PM Eugen Block  wrote:

> Could it be possible that you have some misconfiguration in the name
> resolution and IP mapping? I've never heard or experienced that a
> client requires a cluster address, that would make the whole concept
> of separate networks obsolete which is hard to believe, to be honest.
> I would recommend to double-check your setup.
>
>
> Zitat von Amudhan P :
>
> > Hi Nathan,
> >
> > Kernel client should be using only the public IP of the cluster to
> > communicate with OSD's.
> >
> > But here it requires both IP's for mount to work properly.
> >
> > regards
> > Amudhan
> >
> >
> >
> > On Mon, Nov 9, 2020 at 9:51 PM Nathan Fish  wrote:
> >
> >> It sounds like your client is able to reach the mon but not the OSD?
> >> It needs to be able to reach all mons and all OSDs.
> >>
> >> On Sun, Nov 8, 2020 at 4:29 AM Amudhan P  wrote:
> >> >
> >> > Hi,
> >> >
> >> > I have mounted my cephfs (ceph octopus) thru kernel client in Debian.
> >> > I get following error in "dmesg" when I try to read any file from my
> >> mount.
> >> > "[  236.429897] libceph: osd1 10.100.4.1:6891 socket closed (con
> state
> >> > CONNECTING)"
> >> >
> >> > I use public IP (10.100.3.1) and cluster IP (10.100.4.1) in my ceph
> >> > cluster. I think public IP is enough to mount the share and work on it
> >> but
> >> > in my case, it needs me to assign public IP also to the client to work
> >> > properly.
> >> >
> >> > Does anyone have experience in this?
> >> >
> >> > I have earlier also mailed the ceph-user group but I didn't get any
> >> > response. So sending again not sure my mail went through.
> >> >
> >> > regards
> >> > Amudhan
> >> > ___
> >> > ceph-users mailing list -- ceph-users@ceph.io
> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io