[ceph-users] Re: rbd mirroring - journal growing and snapshot high io load

2022-05-24 Thread Arthur Outhenin-Chalandre
Hi Ronny,

Not sure what could have caused your outage with journaling, TBH :/. Best
of luck with the Ceph/Proxmox bug!
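
For anyone following along: dropping the journaling feature per image, as
described in the message below, is a one-liner (the pool/image names here
are made up):

$ rbd feature disable vm-pool/vm-101-disk-0 journaling

Once the feature is off, rbd-mirror no longer appends to that image's
journal, so it stops growing.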

On 5/23/22 20:09, ronny.lippold wrote:
> hi arthur,
> 
> just for information. we had some horrible days ...
> 
> last week, we shut down some virtual machines.
> most of them did not come back. timeout on the qmp socket ... and no kvm 
> console.
> 
> so, we switched to our rbd-mirror cluster and ... yes, it was working, phew.
> 
> some days later, we tried to install a devel proxmox package, which 
> was supposed to help.
> it did not ... what helped was to rbd move the image and then move it back 
> (like a rename).
> 
> today, i found the answer.
> 
> i cleaned up the pool config and we removed the journaling feature from 
> the images.
> after that, everything was booting fine.
> 
> maybe the performance issue with snapshots came from a proxmox bug ... 
> we will see
> (https://forum.proxmox.com/threads/possible-bug-after-upgrading-to-7-2-vm-freeze-if-backing-up-large-disks.109272/)
> 
> have a great time ...
> 
> ronny
> 
> Am 2022-05-12 15:29, schrieb Arthur Outhenin-Chalandre:
>> On 5/12/22 14:31, ronny.lippold wrote:
>>> many thanks, we will check the slides ... are looking great
>>>
>>>
>
> ok, you mean that the journal growing happened because the replication is 
> too slow?
> strange ... i thought our cluster was not so big ... but ok.
> so, we cannot use journaling ...
> maybe someone else has the same result?

 If you want a bit more detail on this you can check my slides here:
 https://codimd.web.cern.ch/p/-qWD2Y0S9#/.


 Hmmm, I think there is a plan to have a way to spread the snapshots
 over the provided interval in Reef (and not take every snapshot at
 once), but that's unfortunately not here today... The timing thing is a
 bit weird, but I am not an expert on the implications of RBD snapshots
 in general... Maybe you can try to reproduce by taking a snapshot by
 hand with `rbd mirror image snapshot` on some of your images; maybe
 it's something related to really big images? Or that there were a lot
 of writes since the last snapshot?

>>>
>>> yes right, i was also thinking of this ...
>>> i would like to find something to debug the problem.
>>> problems after 50 days ... i do not understand this
>>>
>>> which way are you actually going? do you have a replication?
>>
>> We are going towards mirror snapshots, but we didn't advertise it
>> internally so far and we won't enable it on every image; it would only
>> be for new volumes if people explicitly want that feature. So we are
>> probably not going to hit the performance issues that you suffered from
>> for quite some time, and the scope should be limited...

-- 
Arthur Outhenin-Chalandre
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] disaster in many of osd disk

2022-05-24 Thread farhad kh
 I lost some disks in my Ceph cluster, and it then began to repair the
placement of the objects and re-replicate them.
This caused me to get some errors on the S3 API:

Gateway Time-out (Service: Amazon S3; Status Code: 504; Error Code: 504
Gateway Time-out; Request ID: null; S3 Extended Request ID: null; Proxy:
null)

I also have a warning that the OSD disks are about to fill up.
How can this be explained? How can I slow down the recovery and cleanup of objects?
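
For what it's worth, the usual knobs for slowing down recovery/backfill so
that client I/O keeps breathing are along these lines (values are only
illustrative; check what your release supports):

$ ceph config set osd osd_max_backfills 1
$ ceph config set osd osd_recovery_max_active 1
$ ceph config set osd osd_recovery_sleep 0.1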


[ceph-users] HDD disk for RGW and CACHE tier for giving better performance

2022-05-24 Thread farhad kh
I want to put the data pools for RGW on HDD drives and use some SSDs
for a cache tier on top of them.
Has anyone tested this scenario?
Is this practical and optimal?
How can I do this?


[ceph-users] Re: HDD disk for RGW and CACHE tier for giving better performance

2022-05-24 Thread Boris
Hi Farhad,

you can put the block.db (which contains the WAL and metadata) on SSDs when 
creating the OSD. 
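
With ceph-volume, for example (device paths are hypothetical):

$ ceph-volume lvm create --data /dev/sdb --block.db /dev/nvme0n1p1

cephadm users can express the same thing with a drive-group spec that sends
data to rotational devices and block.db to non-rotational ones.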

Cheers

 - Boris

> Am 24.05.2022 um 11:52 schrieb farhad kh :
> 
> I want to save data pools for rgw on HDD disk drives And use some SSD hard
> drive for the cache  tier on top of it
> Has anyone tested this scenario?
> Is this practical and optimal?
> How can I do this?



[ceph-users] RGW error s3 api

2022-05-24 Thread farhad kh
 hi
 i have a lot of errors in the s3 api
on the s3 client i get this:

2022-05-24 10:49:58.095 ERROR 156723 --- [exec-upload-21640003-285-2]
i.p.p.d.service.UploadDownloadService: Gateway Time-out (Service:
Amazon S3; Status Code: 504; Error Code: 504 Gateway Time-out; Request ID:
null; S3 Extended Request ID: null; Proxy: null)
2022-05-24 10:49:52.933 ERROR 156723 --- [exec-upload-21640003-282-2]
i.p.p.d.service.UploadDownloadService: Unable to execute HTTP request:
Broken pipe (Write failed)
2022-05-24 10:49:30.646 ERROR 156723 --- [exec-upload-21640003-277-2]
i.p.p.d.service.UploadDownloadService: Unable to execute HTTP request:
The target server failed to respond

and the logs of the rgw container show these errors:

debug 2022-05-24T10:50:26.888+ 7fcd35b31700  0 req 17644239529065517086
1.464013577s ERROR: RESTFUL_IO(s)->complete_header() returned
err=Connection reset by peer
debug 2022-05-24T10:50:42.487+ 7fccbb23c700  0 req 17818983848567298192
2.116019487s ERROR: RESTFUL_IO(s)->complete_header() returned err=Broken
pipe
debug 2022-05-24T10:49:58.091+ 7fcd5eb83700  0 req 9823709374853588294
2.907027006s ERROR: RESTFUL_IO(s)->complete_header() returned
err=Connection reset by peer
debug 2022-05-24T10:49:58.104+ 7fcd2fb25700  1 req 13985293647902708528
0.00437s op->ERRORHANDLER: err_no=-2 new_err_no=-2
debug 2022-05-24T09:54:45.012+ 7fa10050e700  0 ERROR:
client_io->complete_request() returned Connection reset by peer

everything in my cluster is ok and healthy,
and the system load is much lower than normal.
Can anyone tell me why this is happening or how I can find the cause?


[ceph-users] rbd command hangs

2022-05-24 Thread Sopena Ballesteros Manuel
Dear ceph user community,


I am trying to install and configure a node with a ceph cluster. The Linux 
kernel we have does not include the rbd kernel module, hence we installed it 
ourselves:


zypper install -y ceph-common > 15
zypper install -y kernel-source = 
5.3.18-24.75_10.0.189_2.1_20.4__g0388af5bc3.shasta
cp /boot/config-5.3.18-24.75_10.0.189-cray_shasta_c /usr/src/linux/.config
chown root:root /usr/src/linux/.config
chmod 0644 /usr/src/linux/.config
cd /usr/src/linux
sed -i 's/^# CONFIG_BLK_DEV_RBD is not set/CONFIG_BLK_DEV_RBD=m/g' .config && 
echo 'CONFIG_TCM_RBD=m' >> .config
make drivers/block/rbd.ko
cp /usr/src/linux/drivers/block/rbd.ko 
/lib/modules/5.3.18-24.75_10.0.189-cray_shasta_c/extra/rbd.ko
chown root:root /lib/modules/5.3.18-24.75_10.0.189-cray_shasta_c/extra/rbd.ko
chmod 0644 /lib/modules/5.3.18-24.75_10.0.189-cray_shasta_c/extra/rbd.ko


My issue is that the rbd command sometimes hangs and we don't know why; this 
does not occur all the time but quite frequently. I googled a bit but could 
not find any relevant solution, so I am looking for advice.


What could cause rbd command to hang?


Below is an strace of when we try to run an rbd command:


nid001388:~ # strace rbd -n client.noir map noir-nvme-meta/nid001388
execve("/usr/bin/rbd", ["rbd", "-n", "client.noir", "map", 
"noir-nvme-meta/nid001388"], 0x7ffe8c35b7b0 /* 62 vars */) = 0
brk(NULL)   = 0x563a18e24000
access("/etc/ld.so.preload", R_OK)  = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib64/ceph/tls/x86_64/x86_64/librbd.so.1", 
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/ceph/tls/x86_64/x86_64", 0x7ffd2f218d10) = -1 ENOENT (No such 
file or directory)
openat(AT_FDCWD, "/usr/lib64/ceph/tls/x86_64/librbd.so.1", O_RDONLY|O_CLOEXEC) 
= -1 ENOENT (No such file or directory)
stat("/usr/lib64/ceph/tls/x86_64", 0x7ffd2f218d10) = -1 ENOENT (No such file or 
directory)
openat(AT_FDCWD, "/usr/lib64/ceph/tls/x86_64/librbd.so.1", O_RDONLY|O_CLOEXEC) 
= -1 ENOENT (No such file or directory)
stat("/usr/lib64/ceph/tls/x86_64", 0x7ffd2f218d10) = -1 ENOENT (No such file or 
directory)
openat(AT_FDCWD, "/usr/lib64/ceph/tls/librbd.so.1", O_RDONLY|O_CLOEXEC) = -1 
ENOENT (No such file or directory)
stat("/usr/lib64/ceph/tls", 0x7ffd2f218d10) = -1 ENOENT (No such file or 
directory)
openat(AT_FDCWD, "/usr/lib64/ceph/x86_64/x86_64/librbd.so.1", 
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/ceph/x86_64/x86_64", 0x7ffd2f218d10) = -1 ENOENT (No such file 
or directory)
openat(AT_FDCWD, "/usr/lib64/ceph/x86_64/librbd.so.1", O_RDONLY|O_CLOEXEC) = -1 
ENOENT (No such file or directory)
stat("/usr/lib64/ceph/x86_64", 0x7ffd2f218d10) = -1 ENOENT (No such file or 
directory)
openat(AT_FDCWD, "/usr/lib64/ceph/x86_64/librbd.so.1", O_RDONLY|O_CLOEXEC) = -1 
ENOENT (No such file or directory)
stat("/usr/lib64/ceph/x86_64", 0x7ffd2f218d10) = -1 ENOENT (No such file or 
directory)
openat(AT_FDCWD, "/usr/lib64/ceph/librbd.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT 
(No such file or directory)
stat("/usr/lib64/ceph", {st_mode=S_IFDIR|0755, st_size=42, ...}) = 0
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=61308, ...}) = 0
mmap(NULL, 61308, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7ff1f8f2e000
close(3)= 0
openat(AT_FDCWD, "/usr/lib64/librbd.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\300\3\6\0\0\0\0\0"..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=5584768, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7ff1f8f2c000
mmap(NULL, 769, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7ff1f85c
mprotect(0x7ff1f8af4000, 2097152, PROT_NONE) = 0
mmap(0x7ff1f8cf4000, 126976, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x534000) = 0x7ff1f8cf4000
mmap(0x7ff1f8d13000, 1, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7ff1f8d13000
close(3)= 0
openat(AT_FDCWD, "/usr/lib64/ceph/librados.so.2", O_RDONLY|O_CLOEXEC) = -1 
ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib64/librados.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0PM\3\0\0\0\0\0"..., 832) 
= 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1362768, ...}) = 0
mmap(NULL, 3459208, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7ff1f8273000
mprotect(0x7ff1f83b1000, 2097152, PROT_NONE) = 0
mmap(0x7ff1f85b1000, 61440, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x13e000) = 0x7ff1f85b1000
close(3)= 0
openat(AT_FDCWD, "/usr/lib64/ceph/libncurses.so.6", O_RDONLY|O_CLOEXEC) = -1 
ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib64/libncurses.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\\211\0\0\0\0\0\

[ceph-users] Re: rbd command hangs

2022-05-24 Thread Ilya Dryomov
On Tue, May 24, 2022 at 3:57 PM Sopena Ballesteros Manuel
 wrote:
>
> Dear ceph user community,
>
>
> I am trying to install and configure a node with a ceph cluster. The Linux 
> kernel we have does not include the rbd kernel module, hence we installed it 
> ourselves:
>
>
> zypper install -y ceph-common > 15
> zypper install -y kernel-source = 
> 5.3.18-24.75_10.0.189_2.1_20.4__g0388af5bc3.shasta
> cp /boot/config-5.3.18-24.75_10.0.189-cray_shasta_c /usr/src/linux/.config
> chown root:root /usr/src/linux/.config
> chmod 0644 /usr/src/linux/.config
> cd /usr/src/linux
> sed -i 's/^# CONFIG_BLK_DEV_RBD is not set/CONFIG_BLK_DEV_RBD=m/g' .config && 
> echo 'CONFIG_TCM_RBD=m' >> .config
> make drivers/block/rbd.ko
> cp /usr/src/linux/drivers/block/rbd.ko 
> /lib/modules/5.3.18-24.75_10.0.189-cray_shasta_c/extra/rbd.ko
> chown root:root /lib/modules/5.3.18-24.75_10.0.189-cray_shasta_c/extra/rbd.ko
> chmod 0644 /lib/modules/5.3.18-24.75_10.0.189-cray_shasta_c/extra/rbd.ko
>
>
> My issue is that the rbd command sometimes hangs and we don't know why; this 
> does not occur all the time but quite frequently. I googled a bit but could 
> not find any relevant solution, so I am looking for advice.
>
>
> What could cause rbd command to hang?

Hi Manuel,

Did you check if the RBD device gets mapped anyway?  If the mapping
succeeds despite the hang, it is probably hanging waiting for udev to
do its job.  It could be somehow related to the stripped-down kernel
you are using or, if you are running "rbd map" from a container, there
may be issues with netlink event propagation.  Try the "noudev" mapping
option:

$ rbd map -o noudev noir-nvme-meta/nid001388

>
>
> Below is an strace of when we try to run an rbd command:
>
>
> nid001388:~ # strace rbd -n client.noir map noir-nvme-meta/nid001388
> execve("/usr/bin/rbd", ["rbd", "-n", "client.noir", "map", 
> "noir-nvme-meta/nid001388"], 0x7ffe8c35b7b0 /* 62 vars */) = 0
>
> [...]
>
> add_key("ceph", "client.noir", "--REDACTED--", 28, KEY_SPEC_PROCESS_KEYRING) 
> = 201147173
> access("/run/udev/control", F_OK)   = 0
> socket(AF_NETLINK, SOCK_RAW|SOCK_CLOEXEC|SOCK_NONBLOCK, 
> NETLINK_KOBJECT_UEVENT) = 3
> setsockopt(3, SOL_SOCKET, SO_RCVBUFFORCE, [1048576], 4) = 0
> setsockopt(3, SOL_SOCKET, SO_ATTACH_FILTER, {len=13, filter=0x7ffd2f2179c0}, 
> 16) = 0
> bind(3, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x02}, 12) = 0
> getsockname(3, {sa_family=AF_NETLINK, nl_pid=21421, nl_groups=0x02}, 
> [12]) = 0
> setsockopt(3, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0
> pipe2([4, 5], O_NONBLOCK)   = 0
> mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 
> 0x7ff1e954f000
> mprotect(0x7ff1e955, 8388608, PROT_READ|PROT_WRITE) = 0
> clone(child_stack=0x7ff1e9d4a230, 
> flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
>  parent_tid=[21425], tls=0x7ff1e9d4f700, child_tidptr=0x7ff1e9d4f9d0) = 21425
> poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}], 2, -1

This doesn't tell anything definitive as the actual mapping is done
from a thread.  Pass -f to strace to also trace child processes.

Thanks,

Ilya


[ceph-users] Upgrade paths beyond octopus on Centos7

2022-05-24 Thread Gary Molenkamp

Good morning,

I'm looking into viable upgrade paths for my cephadm-based Octopus 
deployment running on CentOS 7.   Given the podman support matrix for 
cephadm, how did others successfully move to Pacific under a RHEL8-based OS?
I am looking to use Rocky moving forward, but the latest 8.6 ships podman 
4.0, which does not seem to be supported for either Octopus (podman < 
2.2) or Pacific (podman 2.0-3.0).


I was hoping to upgrade the host OS first, before moving from Octopus to 
Pacific, to limit the risks, so I'm trying to find a container solution 
that works for both the older Octopus and a future update to Pacific.
Perhaps I should switch back to docker-based containers until the 
podman compatibility issues stabilize?


Thanks
Gary

--
Gary Molenkamp  Science Technology Services
Systems/Cloud Administrator University of Western Ontario
molen...@uwo.ca http://sts.sci.uwo.ca
(519) 661-2111 x86882   (519) 661-3566



[ceph-users] Re: rbd command hangs

2022-05-24 Thread Sopena Ballesteros Manuel
Hi Ilya,


thank you very much for your prompt response,


Any rbd command variation is affected (mapping device included)

We are using a physical machine (no container involved)


Below is the output of running strace as suggested:


nid001388:/usr/src/linux # strace -f rbd -n client.noir -o noudev map 
noir-nvme-meta/nid001388
execve("/usr/bin/rbd", ["rbd", "-n", "client.noir", "-o", "noudev", "map", 
"noir-nvme-meta/nid001388"], 0x7ffc33caafe8 /* 63 vars */) = 0
brk(NULL)   = 0x55ee2015c000
access("/etc/ld.so.preload", R_OK)  = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib64/ceph/tls/x86_64/x86_64/librbd.so.1", 
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/ceph/tls/x86_64/x86_64", 0x7ffe3af2a840) = -1 ENOENT (No such 
file or directory)
openat(AT_FDCWD, "/usr/lib64/ceph/tls/x86_64/librbd.so.1", O_RDONLY|O_CLOEXEC) 
= -1 ENOENT (No such file or directory)
stat("/usr/lib64/ceph/tls/x86_64", 0x7ffe3af2a840) = -1 ENOENT (No such file or 
directory)
openat(AT_FDCWD, "/usr/lib64/ceph/tls/x86_64/librbd.so.1", O_RDONLY|O_CLOEXEC) 
= -1 ENOENT (No such file or directory)
stat("/usr/lib64/ceph/tls/x86_64", 0x7ffe3af2a840) = -1 ENOENT (No such file or 
directory)
openat(AT_FDCWD, "/usr/lib64/ceph/tls/librbd.so.1", O_RDONLY|O_CLOEXEC) = -1 
ENOENT (No such file or directory)
stat("/usr/lib64/ceph/tls", 0x7ffe3af2a840) = -1 ENOENT (No such file or 
directory)
openat(AT_FDCWD, "/usr/lib64/ceph/x86_64/x86_64/librbd.so.1", 
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/ceph/x86_64/x86_64", 0x7ffe3af2a840) = -1 ENOENT (No such file 
or directory)
openat(AT_FDCWD, "/usr/lib64/ceph/x86_64/librbd.so.1", O_RDONLY|O_CLOEXEC) = -1 
ENOENT (No such file or directory)
stat("/usr/lib64/ceph/x86_64", 0x7ffe3af2a840) = -1 ENOENT (No such file or 
directory)
openat(AT_FDCWD, "/usr/lib64/ceph/x86_64/librbd.so.1", O_RDONLY|O_CLOEXEC) = -1 
ENOENT (No such file or directory)
stat("/usr/lib64/ceph/x86_64", 0x7ffe3af2a840) = -1 ENOENT (No such file or 
directory)
openat(AT_FDCWD, "/usr/lib64/ceph/librbd.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT 
(No such file or directory)
stat("/usr/lib64/ceph", {st_mode=S_IFDIR|0755, st_size=42, ...}) = 0
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=61308, ...}) = 0
mmap(NULL, 61308, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fdc7c93e000
close(3)= 0
openat(AT_FDCWD, "/usr/lib64/librbd.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\300\3\6\0\0\0\0\0"..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=5584768, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7fdc7c93c000
mmap(NULL, 769, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7fdc7bfd
mprotect(0x7fdc7c504000, 2097152, PROT_NONE) = 0
mmap(0x7fdc7c704000, 126976, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x534000) = 0x7fdc7c704000
mmap(0x7fdc7c723000, 1, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fdc7c723000
close(3)= 0
openat(AT_FDCWD, "/usr/lib64/ceph/librados.so.2", O_RDONLY|O_CLOEXEC) = -1 
ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib64/librados.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0PM\3\0\0\0\0\0"..., 832) 
= 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1362768, ...}) = 0
mmap(NULL, 3459208, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7fdc7bc83000
mprotect(0x7fdc7bdc1000, 2097152, PROT_NONE) = 0
mmap(0x7fdc7bfc1000, 61440, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x13e000) = 0x7fdc7bfc1000
close(3)= 0
openat(AT_FDCWD, "/usr/lib64/ceph/libncurses.so.6", O_RDONLY|O_CLOEXEC) = -1 
ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib64/libncurses.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\\211\0\0\0\0\0\0"..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=166968, ...}) = 0
mmap(NULL, 2262464, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7fdc7ba5a000
mprotect(0x7fdc7ba82000, 2093056, PROT_NONE) = 0
mmap(0x7fdc7bc81000, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x27000) = 0x7fdc7bc81000
close(3)= 0
openat(AT_FDCWD, "/usr/lib64/ceph/libtinfo.so.6", O_RDONLY|O_CLOEXEC) = -1 
ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib64/libtinfo.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\225\0\0\0\0\0\0"..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=189288, ...}) = 0
mmap(NULL, 2285344, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7fdc7b82c000
mprotect(0x7fdc7b852000, 2093056, PROT_NONE) = 0
mmap(0x7fdc7ba51000, 36864, PROT_READ|PROT_WRITE, 
MAP_PRIV

[ceph-users] Re: rbd command hangs

2022-05-24 Thread Ilya Dryomov
On Tue, May 24, 2022 at 5:20 PM Sopena Ballesteros Manuel
 wrote:
>
> Hi Ilya,
>
>
> thank you very much for your prompt response,
>
>
> Any rbd command variation is affected (mapping device included)
>
> We are using a physical machine (no container involved)
>
>
> Below is the output of the running strace as suggested:
>
>
> nid001388:/usr/src/linux # strace -f rbd -n client.noir -o noudev map 
> noir-nvme-meta/nid001388
> execve("/usr/bin/rbd", ["rbd", "-n", "client.noir", "-o", "noudev", "map", 
> "noir-nvme-meta/nid001388"], 0x7ffc33caafe8 /* 63 vars */) = 0
>
> [...]
>
> [pid 28954] openat(AT_FDCWD, "/sys/bus/rbd/add_single_major", O_WRONLY) = 6
> [pid 28954] write(6, "148.187.20.141:6789 name=noir,ke"..., 72  ...>

OK, it appears to be stuck in the kernel, attempting to map the image.
Is there anything in dmesg?

Thanks,

Ilya


[ceph-users] Re: rbd command hangs

2022-05-24 Thread Sopena Ballesteros Manuel
yes dmesg shows the following:

...

[23661.367449] rbd: rbd12: failed to lock header: -13
[23661.367968] rbd: rbd2: no lock owners detected
[23661.369306] rbd: rbd11: no lock owners detected
[23661.370068] rbd: rbd11: breaking header lock owned by client21473520
[23661.370518] rbd: rbd11: blacklist of client21473520 failed: -13
[23661.370519] rbd: rbd11: failed to lock header: -13
[23661.370869] rbd: rbd5: no lock owners detected
[23661.371994] rbd: rbd1: no lock owners detected
[23661.372546] rbd: rbd1: breaking header lock owned by client21473520
[23661.373058] rbd: rbd1: blacklist of client21473520 failed: -13
[23661.373059] rbd: rbd1: failed to lock header: -13
[23661.374111] rbd: rbd2: breaking header lock owned by client21473520
[23661.374485] rbd: rbd4: no lock owners detected
[23661.375210] rbd: rbd4: breaking header lock owned by client21473520
[23661.375701] rbd: rbd4: blacklist of client21473520 failed: -13
[23661.375702] rbd: rbd4: failed to lock header: -13
[23661.376881] rbd: rbd5: breaking header lock owned by client21473520
[23661.381151] rbd: rbd2: blacklist of client21473520 failed: -13
[23661.385151] rbd: rbd5: blacklist of client21473520 failed: -13
[23661.388279] rbd: rbd2: failed to lock header: -13




From: Ilya Dryomov 
Sent: Tuesday, May 24, 2022 5:53:21 PM
To: Sopena Ballesteros Manuel
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] rbd command hangs

On Tue, May 24, 2022 at 5:20 PM Sopena Ballesteros Manuel
 wrote:
>
> Hi Ilya,
>
>
> thank you very much for your prompt response,
>
>
> Any rbd command variation is affected (mapping device included)
>
> We are using a physical machine (no container involved)
>
>
> Below is the output of the running strace as suggested:
>
>
> nid001388:/usr/src/linux # strace -f rbd -n client.noir -o noudev map 
> noir-nvme-meta/nid001388
> execve("/usr/bin/rbd", ["rbd", "-n", "client.noir", "-o", "noudev", "map", 
> "noir-nvme-meta/nid001388"], 0x7ffc33caafe8 /* 63 vars */) = 0
>
> [...]
>
> [pid 28954] openat(AT_FDCWD, "/sys/bus/rbd/add_single_major", O_WRONLY) = 6
> [pid 28954] write(6, "148.187.20.141:6789 name=noir,ke"..., 72  ...>

OK, it appears to be stuck in the kernel, attempting to map the image.
Is there anything in dmesg?

Thanks,

Ilya


[ceph-users] Re: rbd command hangs

2022-05-24 Thread Ilya Dryomov
On Tue, May 24, 2022 at 8:14 PM Sopena Ballesteros Manuel
 wrote:
>
> yes dmesg shows the following:
>
> ...
>
> [23661.367449] rbd: rbd12: failed to lock header: -13
> [23661.367968] rbd: rbd2: no lock owners detected
> [23661.369306] rbd: rbd11: no lock owners detected
> [23661.370068] rbd: rbd11: breaking header lock owned by client21473520
> [23661.370518] rbd: rbd11: blacklist of client21473520 failed: -13
> [23661.370519] rbd: rbd11: failed to lock header: -13
> [23661.370869] rbd: rbd5: no lock owners detected
> [23661.371994] rbd: rbd1: no lock owners detected
> [23661.372546] rbd: rbd1: breaking header lock owned by client21473520
> [23661.373058] rbd: rbd1: blacklist of client21473520 failed: -13
> [23661.373059] rbd: rbd1: failed to lock header: -13
> [23661.374111] rbd: rbd2: breaking header lock owned by client21473520
> [23661.374485] rbd: rbd4: no lock owners detected
> [23661.375210] rbd: rbd4: breaking header lock owned by client21473520
> [23661.375701] rbd: rbd4: blacklist of client21473520 failed: -13
> [23661.375702] rbd: rbd4: failed to lock header: -13
> [23661.376881] rbd: rbd5: breaking header lock owned by client21473520
> [23661.381151] rbd: rbd2: blacklist of client21473520 failed: -13
> [23661.385151] rbd: rbd5: blacklist of client21473520 failed: -13
> [23661.388279] rbd: rbd2: failed to lock header: -13

What is the output of "ceph auth get client.noir"?  The auth caps are
likely incorrect and missing blocklist permissions, see

https://docs.ceph.com/en/quincy/rbd/rados-rbd-cmds/#create-a-block-device-user
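
For reference, caps following the rbd profiles from that page (the client
name and pool are taken from this thread) would look like:

$ ceph auth caps client.noir mon 'profile rbd' osd 'profile rbd pool=noir-nvme-meta'

The rbd profile on the mon cap is what grants the blocklist permission that
the kernel client needs in order to break a stale header lock.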

Thanks,

Ilya


[ceph-users] cephadm error mgr not available and ERROR: Failed to add host

2022-05-24 Thread farhad kh
hi
i want to use a private registry for my ceph storage cluster, so i changed
the default registry of my container runtime (docker) in
/etc/docker/deamon.json:
{
  "registery-mirrors": ["https://private-registery.fst"]
}

and i changed all the registry addresses in /usr/sbin/cephadm (quay.ceph.io and
docker.io) to my private registry: cat /usr/sbin/cephadm | grep private-registery.fst

DEFAULT_IMAGE = 'private-registery.fst/ceph/ceph:v16.2.7'
DEFAULT_PROMETHEUS_IMAGE = 'private-registery.fst/ceph/prometheus:v2.18.1'
DEFAULT_NODE_EXPORTER_IMAGE =
'private-registery.fst/ceph/node-exporter:v0.18.1'
DEFAULT_ALERT_MANAGER_IMAGE =
'private-registery.fst/ceph/alertmanager:v0.20.0'
DEFAULT_GRAFANA_IMAGE = 'private-registery.fst/ceph/ceph-grafana:6.7.4'
DEFAULT_HAPROXY_IMAGE = 'private-registery.fst/ceph/haproxy:2.3'
DEFAULT_KEEPALIVED_IMAGE = 'private-registery.fst/ceph/keepalived'
DEFAULT_REGISTRY = 'private-registery.fst'   # normalize unqualified
digests to this
>>> normalize_image_digest('ceph/ceph:v16', 'private-registery.fst')
>>> normalize_image_digest('private-registery.fst/ceph/ceph:v16',
'private-registery.fst')
'private-registery.fst/ceph/ceph:v16'
>>> normalize_image_digest('private-registery.fst/ceph',
'private-registery.fst')
>>> normalize_image_digest('localhost/ceph', 'private-registery.fst')
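
As an aside, instead of patching /usr/sbin/cephadm, cephadm can usually be
pointed at a private registry directly at bootstrap time (flag names as in
Pacific; the credentials here are placeholders):

$ cephadm bootstrap --mon-ip 10.20.23.65 \
    --image private-registery.fst/ceph/ceph:v16.2.7 \
    --registry-url private-registery.fst \
    --registry-username <user> --registry-password <password>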

when i try to deploy the first node of the cluster with cephadm, i get this error:

 cephadm bootstrap   --mon-ip 10.20.23.65 --allow-fqdn-hostname
--initial-dashboard-user admin   --initial-dashboard-password admin
--dashboard-password-noupdate
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
docker (/bin/docker) is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: e52bee78-db8b-11ec-9099-00505695f8a8
Verifying IP 10.20.23.65 port 3300 ...
Verifying IP 10.20.23.65 port 6789 ...
Mon IP `10.20.23.65` is in CIDR network `10.20.23.0/24`
- internal network (--cluster-network) has not been provided, OSD
replication will default to the public_network
Pulling container image private-registery.fst/ceph/ceph:v16.2.7...
Ceph version: ceph version 16.2.7
(dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network to 10.20.23.0/24
Wrote config to /etc/ceph/ceph.conf
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Creating mgr...
Verifying port 9283 ...
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/15)...
mgr not available, waiting (2/15)...
mgr not available, waiting (3/15)...
mgr not available, waiting (4/15)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for mgr epoch 5...
mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Generating ssh key...
Wrote public SSH key to /etc/ceph/ceph.pub
Adding key to root@localhost authorized_keys...
Adding host opcpmfpsbpp0101...
Non-zero exit code 22 from /bin/docker run --rm --ipc=host
--stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph --init -e
CONTAINER_IMAGE=private-registery.fst/ceph/ceph:v16.2.7 -e
NODE_NAME=opcpmfpsbpp0101 -e  CEPH_USE_RANDOM_NONCE=1 -v
/var/log/ceph/e52bee78-db8b-11ec-9099-00505695f8a8:/var/log/ceph:z -v
/tmp/ceph-tmpwt99ep2e:/etc/ceph/ceph.client.admin.keyring:z -v
/tmp/ceph-tmpweojwqdh:/etc/ceph/ceph.conf:z opkbhfpsb
pp0101.fst/ceph/ceph:v16.2.7 orch host add opcpmfpsbpp0101 10.20.23.65
/usr/bin/ceph: stderr Error EINVAL: Failed to connect to opcpmfpsbpp0101
(10.20.23.65).
/usr/bin/ceph: stderr Please make sure that the host is reachable and
accepts connections using the cephadm SSH key
/usr/bin/ceph: stderr
/usr/bin/ceph: stderr To add the cephadm SSH key to the host:
/usr/bin/ceph: stderr > ceph cephadm get-pub-key > ~/ceph.pub
/usr/bin/ceph: stderr > ssh-copy-id -f -i ~/ceph.pub root@10.20.23.65
/usr/bin/ceph: stderr
/usr/bin/ceph: stderr To check that the host is reachable open a new shell
with the --no-hosts flag:
/usr/bin/ceph: stderr > cephadm shell --no-hosts
/usr/bin/ceph: stderr
/usr/bin/ceph: stderr Then run the following:
/usr/bin/ceph: stderr > ceph cephadm get-ssh-config > ssh_config
/usr/bin/ceph: stderr > ceph config-key get mgr/cephadm/ssh_identity_key >
~/cephadm_private_key
/usr/bin/ceph: stderr > chmod 0600 ~/cephadm_private_key
/usr/bin/ceph: stderr > ssh -F ssh_config -i ~/cephadm_private_key
root@10.20.23.65
ERROR: Failed to add host : Failed command: /bin/docker
run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint
/usr/bin/ceph --init -e
CONTAINER_IMAGE=private-registery.fst/ceph/ceph:v16.2 .7 -e
NODE_NAME=

[ceph-users] Re: Ceph Repo Branch Rename - May 24

2022-05-24 Thread David Galloway

This maintenance is ongoing. This was a much larger effort than anticipated.

I've unpaused Jenkins but fully expect many jobs to fail for the next 
couple days.


If you had a PR targeting master, you will need to edit the PR to target 
main now instead.


I appreciate your patience.

On 5/19/22 14:38, David Galloway wrote:

Hi all,

In an effort to use more inclusive language, we will be renaming all 
Ceph repo 'master' branches to 'main' on May 24.


I anticipate making the change in the morning Eastern US time, merging 
all 's/master/main' pull requests I already have open, then tracking 
down and fixing any remaining references to the master branch.


Please excuse the disruption and thank you for your patience.




[ceph-users] Re: Ceph Repo Branch Rename - May 24

2022-05-24 Thread Laura Flores
Thanks for the heads-up David!

FYI for anyone who doesn't know how to change the base branch, click on
"Edit" next to the PR title, click on "base", and change it to "main".

On Tue, May 24, 2022 at 5:31 PM David Galloway  wrote:

> This maintenance is ongoing. This was a much larger effort than
> anticipated.
>
> I've unpaused Jenkins but fully expect many jobs to fail for the next
> couple days.
>
> If you had a PR targeting master, you will need to edit the PR to target
> main now instead.
>
> I appreciate your patience.
>
> On 5/19/22 14:38, David Galloway wrote:
> > Hi all,
> >
> > In an effort to use more inclusive language, we will be renaming all
> > Ceph repo 'master' branches to 'main' on May 24.
> >
> > I anticipate making the change in the morning Eastern US time, merging
> > all 's/master/main' pull requests I already have open, then tracking
> > down and fixing any remaining references to the master branch.
> >
> > Please excuse the disruption and thank you for your patience.
>
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
>

-- 

Laura Flores

She/Her/Hers

Associate Software Engineer, Ceph Storage

Red Hat Inc. 

La Grange Park, IL

lflo...@redhat.com
M: +17087388804

