[ceph-users] problem with ceph osd blacklist
Hi all.

If a CephFS client is in a slow or unreliable network environment, the client will be added to the OSD blacklist and the OSD map, and the default duration is 1 hour. During this time the client is forbidden to access Ceph. If I want to solve this problem and ensure the client's normal I/O is not interrupted, are the following two options feasible, and which one is better?

1. Set "mds_session_blacklist_on_timeout" to false, so that slow clients are not added to the blacklist;

2. Just reduce the time slow clients spend on the blacklist, changing the default 1 hour to 5 minutes (set the value of "mon_osd_blacklist_default_expire" to 5 minutes).

Are the two schemes feasible? Will they have a great impact on the data security and integrity of Ceph? Can you give me some suggestions?

Thanks.
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
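For reference, a minimal sketch of what the two options look like in ceph.conf, assuming the expiry value is given in seconds and that the relevant daemons are restarted (or the options injected at runtime) afterwards:

    # Option 1: don't blacklist clients whose sessions merely time out
    [mds]
        mds_session_blacklist_on_timeout = false

    # Option 2: keep blacklisting, but expire entries after 5 minutes instead of 1 hour
    [mon]
        mon_osd_blacklist_default_expire = 300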
[ceph-users] Re: Problem with centos7 repository
Hi Tadas,

I also noticed the same issue a few days ago:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/GDUSELT7B3NY7NBU2XHZP6CRHE3OSD6A/

I have reported it to the developers via the ceph-devel IRC. I was told that it will be fixed on the coming Friday at the earliest.

Hong

On Wed, 2020-07-08 at 09:44 +0300, Tadas wrote:
> Hello. I'm having a problem with the CentOS 7 Nautilus repository.
> Since ~10 days ago (I guess after the release of the 14.2.10 packages), yum
> does not find earlier Nautilus releases anymore.
> They are visible in the repo if I browse it, but they are not in the yum
> metadata files, I think, so you can't install them via yum:
>
> === N/S matched: librados2 ===
> 1:librados2-10.2.5-4.el7.i686 : RADOS distributed object store client library
> 1:librados2-10.2.5-4.el7.x86_64 : RADOS distributed object store client library
> 1:librados2-10.2.5-4.el7.x86_64 : RADOS distributed object store client library
> 2:librados2-14.2.10-0.el7.x86_64 : RADOS distributed object store client library
> 1:librados2-devel-10.2.5-4.el7.i686 : RADOS headers
> 1:librados2-devel-10.2.5-4.el7.x86_64 : RADOS headers
>
> As you can see, yum finds only the latest version of the 14.2 release.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

--
Hurng-Chun (Hong) Lee, PhD
ICT manager
Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging
Radboud University Nijmegen
e-mail: h@donders.ru.nl
tel: +31(0) 243610977
web: http://www.ru.nl/donders/
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
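Not a fix, but while the repodata is in this state it may help to confirm what yum actually sees versus what is sitting in the repo directory; something along these lines (package name just as an example):

    yum clean metadata
    yum --showduplicates list librados2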
[ceph-users] Re: AdminSocket occurs segment fault with samba vfs ceph plugin
Got the same problem when we export directories of CephFS through samba + samba-vfs-cephfs. As a contrast, with the same Ceph cluster we export directories of CephFS through nfs-ganesha + nfs-ganesha-ceph, and the admin socket (generated by libcephfs) works fine.

ceph version: 14.2.5
samba version: 4.8.8
OS: CentOS 7.6.1810
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
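For context, the kind of share definition involved looks roughly like the sketch below; the share name, path and cephx user are placeholders rather than values taken from this report:

    [cephfs-share]
        path = /some/cephfs/dir
        vfs objects = ceph
        ceph:config_file = /etc/ceph/ceph.conf
        ceph:user_id = samba
        kernel share modes = no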
[ceph-users] Re: RGW: rgw_qactive perf is constantly growing
I upgraded one cluster to 14.2.10 and this perf counter is still growing. Does anyone have an idea of how to debug this problem?

Jacek

On Sat, 4 Jul 2020 at 18:49, Simon Leinen wrote:
> Jacek Suchenia writes:
> > On two of our clusters (all v14.2.8) we observe a very strange behavior:
>
> > Over time the rgw_qactive perf counter is constantly growing, within 12h to 6k entries.
>
> As another data point, we're seeing the same here, on one of our two
> clusters, both also running 14.2.8.
>
> The growth is a bit slower here, about 300-700 connections per 24h
> across 6 RadosGW instances, but it's quite obvious.
>
> Our other cluster doesn't show this behavior, even though it is bigger
> and presumably has higher load.
>
> > image.png
>
> > We observe this situation only on two of our clusters where the common
> > thing is an app uploading a lot of files as multipart uploads via ssl.
>
> Interesting. Our two clusters seem to have similar rates of multipart
> uploads, yet only one of them has the issue.
>
> > How can we debug this situation? How can we check what operations are
> > in a queue or why a perf counter has not been decreased?
>
> I'd be curious about that as well. Maybe it's just an accounting issue
> with some kinds of (failed/aborted?) requests. Looks a bit fishy...
> --
> Simon.

--
Jacek Suchenia
jacek.suche...@gmail.com
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
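If it helps anyone reproduce this, the counter (and the list of in-flight RADOS requests) can be read straight from the RGW admin socket; the socket path below depends on the local setup:

    ceph daemon /var/run/ceph/ceph-client.rgw.<name>.asok perf dump | grep -E 'qactive|qlen'
    ceph daemon /var/run/ceph/ceph-client.rgw.<name>.asok objecter_requests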
[ceph-users] Re: bluestore: osd bluestore_allocated is much larger than bluestore_stored
Thanks for your reply. It's helpful! We may consider adjusting min_alloc_size to a lower value or take other actions based on your analysis of the space overhead with EC pools. Thanks.

Best
Jerry Pu

Igor Fedotov wrote on Tue, Jul 7, 2020 at 4:10 PM:

> I think you're facing the issue covered by the following ticket:
>
> https://tracker.ceph.com/issues/44213
>
> Unfortunately the only known solution is migrating to a 4K min alloc size,
> which is to become available with Pacific.
>
> Thanks,
>
> Igor
>
> On 7/7/2020 6:38 AM, Jerry Pu wrote:
> > Hi:
> >
> > We have a cluster (v13.2.4), and we do some tests on an EC k=2, m=1
> > pool "VMPool0". We deploy some VMs (Windows, CentOS 7) on the pool and then
> > use IOMeter to write data to these VMs. After a period of time, we observe
> > a strange thing: the pool's actual usage is much larger than stored data *
> > 1.5 (stored_raw).
> >
> > [root@Sim-str-R6-4 ~]# ceph df
> > GLOBAL:
> >     CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
> >     hdd       3.5 TiB     1.8 TiB     1.7 TiB     1.7 TiB      48.74
> >     TOTAL     3.5 TiB     1.8 TiB     1.7 TiB     1.7 TiB      48.74
> > POOLS:
> >     NAME                 ID     USED        %USED      MAX AVAIL     OBJECTS
> >     cephfs_data          1      29 GiB      100.00     0 B           2597
> >     cephfs_md            2      831 MiB     100.00     0 B           133
> >     erasure_meta_hdd     3      22 MiB      100.00     0 B           170
> >     VMPool0              4      1.2 TiB     56.77      644 GiB       116011
> >     stresspool           5      2.6 MiB     100.00     0 B           32
> >
> > [root@Sim-str-R6-4 ~]# ceph df detail -f json-pretty
> > -snippet-
> > {
> >     "name": "VMPool0",
> >     "id": 4,
> >     "stats": {
> >         "kb_used": 132832,
> >         "bytes_used": 1360782163968,        <====
> >         "percent_used": 0.567110,
> >         "max_avail": 692481687552,
> >         "objects": 116011,
> >         "quota_objects": 0,
> >         "quota_bytes": 0,
> >         "dirty": 116011,
> >         "rd": 27449034,
> >         "rd_bytes": 126572760064,
> >         "wr": 20675381,
> >         "wr_bytes": 1006460652544,
> >         "comp_ratio": 1.00,
> >         "stored": 497657610240,
> >         "stored_raw": 746486431744,         <====
> >     }
> > },
> >
> > The perf counters of all osds (all hdd) used by VMPool0 also show
> > that bluestore_allocated is much larger than bluestore_stored.
> > > > [root@Sim-str-R6-4 ~]# for i in {0..3}; do echo $i; ceph daemon osd.$i > perf > > dump | grep bluestore | head -6; done > > 0 > > "bluestore": { > > "bluestore_allocated": 175032369152,< > > "bluestore_stored": 83557936482,< > > "bluestore_compressed": 958795770, > > "bluestore_compressed_allocated": 6431965184, > > "bluestore_compressed_original": 18576584704, > > 1 > > "bluestore": { > > "bluestore_allocated": 119943593984,< > > "bluestore_stored": 53325238866,< > > "bluestore_compressed": 670158436, > > "bluestore_compressed_allocated": 4751818752, > > "bluestore_compressed_original": 13752328192, > > 2 > > "bluestore": { > > "bluestore_allocated": 155444707328,< > > "bluestore_stored": 69067116553,< > > "bluestore_compressed": 565170876, > > "bluestore_compressed_allocated": 4614324224, > > "bluestore_compressed_original": 13469696000, > > 3 > > "bluestore": { > > "bluestore_allocated": 128179240960,< > > "bluestore_stored": 60884752114,< > > "bluestore_compressed": 1653455847, > > "bluestore_compressed_allocated": 9741795328, > > "bluestore_compressed_original": 27878768640, > > > > [root@Sim-str-R6-5 osd]# for i in {4..7}; do echo $i; sh -c "ceph daemon > > osd.$i perf dump | grep bluestore | head -6"; done > > 4 > > "bluestore": { > > "bluestore_allocated": 165950652416,< > > "bluestore_stored": 80255191687,< > > "bluestore_compressed": 1526871060, > > "bluestore_compressed_allocated": 8900378624, > > "bluestore_compressed_original": 25324142592, > > 5 > > admin_socket: exception getting command descriptions: [Errno 111] > > Connection refused > > 6 > > "bluestore": { > > "bluestore_allocated": 166022152192,< > > "bluestore_stored": 84645390708,< > > "bluestore_compressed": 1169055606, > > "bluestore_comp
[ceph-users] Re: bluestore: osd bluestore_allocated is much larger than bluestore_stored
Please note that simple min_alloc_size downsizing might negatively impact OSD performance. That's why this modification has been postponed till Pacific - we've made a bunch of additional changes to eliminate the drop. Regards, Igor On 7/8/2020 12:32 PM, Jerry Pu wrote: Thanks for your reply. It's helpful! We may consider to adjust min_alloc_size to a lower value or take other actions based on your analysis for space overhead with EC pools. Thanks. Best Jerry Pu Igor Fedotov mailto:ifedo...@suse.de>> 於 2020年7月7日 週二 下午4:10寫道: I think you're facing the issue covered by the following ticket: https://tracker.ceph.com/issues/44213 Unfortunately the only known solution is migrating to 4K min alloc size which to be available since Pacific. Thanks, Igor On 7/7/2020 6:38 AM, Jerry Pu wrote: > Hi: > > We have a cluster (v13.2.4), and we do some tests on a EC k=2, m=1 > pool "VMPool0". We deploy some VMs (Windows, CentOS7) on the pool and then > use IOMeter to write data to these VMs. After a period of time, we observe > a strange thing that pool actual usage is much larger than stored data * > 1.5 (stored_raw). > > [root@Sim-str-R6-4 ~]# ceph df > GLOBAL: > CLASS SIZE AVAIL USED RAW USED %RAW USED > hdd 3.5 TiB 1.8 TiB 1.7 TiB 1.7 TiB 48.74 > TOTAL 3.5 TiB 1.8 TiB 1.7 TiB 1.7 TiB 48.74 > POOLS: > NAME ID USED %USED MAX AVAIL OBJECTS > cephfs_data 1 29 GiB 100.00 0 B 2597 > cephfs_md 2 831 MiB 100.00 0 B 133 > erasure_meta_hdd 3 22 MiB 100.00 0 B 170 > VMPool0 4 1.2 TiB 56.77 644 GiB 116011 > stresspool 5 2.6 MiB 100.00 0 B 32 > > [root@Sim-str-R6-4 ~]# ceph df detail -f json-pretty > -snippet- > { > "name": "VMPool0", > "id": 4, > "stats": { > "kb_used": 132832, > "bytes_used": 1360782163968, < > "percent_used": 0.567110, > "max_avail": 692481687552, > "objects": 116011, > "quota_objects": 0, > "quota_bytes": 0, > "dirty": 116011, > "rd": 27449034, > "rd_bytes": 126572760064, > "wr": 20675381, > "wr_bytes": 1006460652544, > "comp_ratio": 1.00, > "stored": 497657610240, > "stored_raw": 746486431744, < > } > }, > > The perf counters of all osds (all hdd) used by VMPool0 also show > that bluestore_allocated is much larger than bluestore_stored. > > [root@Sim-str-R6-4 ~]# for i in {0..3}; do echo $i; ceph daemon osd.$i perf > dump | grep bluestore | head -6; done > 0 > "bluestore": { > "bluestore_allocated": 175032369152, < > "bluestore_stored": 83557936482, < > "bluestore_compressed": 958795770, > "bluestore_compressed_allocated": 6431965184, > "bluestore_compressed_original": 18576584704, > 1 > "bluestore": { > "bluestore_allocated": 119943593984, < > "bluestore_stored": 53325238866, < > "bluestore_compressed": 670158436, > "bluestore_compressed_allocated": 4751818752, > "bluestore_compressed_original": 13752328192, > 2 > "bluestore": { > "bluestore_allocated": 155444707328, < > "bluestore_stored": 69067116553, < > "bluestore_compressed": 565170876, > "bluestore_compressed_allocated": 4614324224, > "bluestore_compressed_original": 13469696000, > 3 > "bluestore": { > "bluestore_allocated": 128179240960, < > "bluestore_stored": 60884752114, < > "bluestore_compressed": 1653455847, > "bluestore_compressed_allocated": 9741795328, > "bluestore_compressed_original": 27878768640, > > [root@Sim-str-R6-5 osd]# for i in {4..7}; do echo $i; sh -c "ceph daemon > osd.$i perf dump | grep bluestore | head -6"; done > 4 > "bluestore": { > "bluestore_allocated": 165950652416, < > "bluestore_stored": 80255191687, <--
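As a practical aside, it is easy to check what alloc size the config would currently give an OSD; note that the value actually baked into an existing OSD was fixed when it was created, so changing these options only affects OSDs that are redeployed afterwards:

    ceph daemon osd.0 config show | grep min_alloc_size
    # bluestore_min_alloc_size_hdd / bluestore_min_alloc_size_ssd are only read at OSD creation (mkfs) time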
[ceph-users] Re: bluestore: osd bluestore_allocated is much larger than bluestore_stored
OK. Thanks for your reminder. We will think about how to make the adjustment to our cluster. Best Jerry Pu Igor Fedotov 於 2020年7月8日 週三 下午5:40寫道: > Please note that simple min_alloc_size downsizing might negatively impact > OSD performance. That's why this modification has been postponed till > Pacific - we've made a bunch of additional changes to eliminate the drop. > > > Regards, > > Igor > On 7/8/2020 12:32 PM, Jerry Pu wrote: > > Thanks for your reply. It's helpful! We may consider to adjust > min_alloc_size to a lower value or take other actions based on > your analysis for space overhead with EC pools. Thanks. > > Best > Jerry Pu > > Igor Fedotov 於 2020年7月7日 週二 下午4:10寫道: > >> I think you're facing the issue covered by the following ticket: >> >> https://tracker.ceph.com/issues/44213 >> >> >> Unfortunately the only known solution is migrating to 4K min alloc size >> which to be available since Pacific. >> >> >> Thanks, >> >> Igor >> >> On 7/7/2020 6:38 AM, Jerry Pu wrote: >> > Hi: >> > >> > We have a cluster (v13.2.4), and we do some tests on a EC >> k=2, m=1 >> > pool "VMPool0". We deploy some VMs (Windows, CentOS7) on the pool and >> then >> > use IOMeter to write data to these VMs. After a period of time, we >> observe >> > a strange thing that pool actual usage is much larger than stored data * >> > 1.5 (stored_raw). >> > >> > [root@Sim-str-R6-4 ~]# ceph df >> > GLOBAL: >> > CLASS SIZEAVAIL USEDRAW USED %RAW >> USED >> >hdd 3.5 TiB 1.8 TiB 1.7 TiB 1.7 TiB >> 48.74 >> > TOTAL 3.5 TiB 1.8 TiB 1.7 TiB 1.7 TiB >> 48.74 >> > POOLS: >> > NAME ID USED%USED MAX AVAIL >> OBJECTS >> > cephfs_data 1 29 GiB 100.00 0 B >> 2597 >> > cephfs_md2 831 MiB 100.00 0 B >>133 >> > erasure_meta_hdd 3 22 MiB 100.00 0 B >>170 >> > VMPool0 4 1.2 TiB 56.77 644 GiB >> 116011 >> > stresspool 5 2.6 MiB 100.00 0 B >> 32 >> > >> > [root@Sim-str-R6-4 ~]# ceph df detail -f json-pretty >> > -snippet- >> > { >> > "name": "VMPool0", >> > "id": 4, >> > "stats": { >> > "kb_used": 132832, >> > "bytes_used": 1360782163968,< >> > "percent_used": 0.567110, >> > "max_avail": 692481687552, >> > "objects": 116011, >> > "quota_objects": 0, >> > "quota_bytes": 0, >> > "dirty": 116011, >> > "rd": 27449034, >> > "rd_bytes": 126572760064, >> > "wr": 20675381, >> > "wr_bytes": 1006460652544, >> > "comp_ratio": 1.00, >> > "stored": 497657610240, >> > "stored_raw": 746486431744, < >> > } >> > }, >> > >> > The perf counters of all osds (all hdd) used by VMPool0 also >> show >> > that bluestore_allocated is much larger than bluestore_stored. 
>> > >> > [root@Sim-str-R6-4 ~]# for i in {0..3}; do echo $i; ceph daemon osd.$i >> perf >> > dump | grep bluestore | head -6; done >> > 0 >> > "bluestore": { >> > "bluestore_allocated": 175032369152,< >> > "bluestore_stored": 83557936482,< >> > "bluestore_compressed": 958795770, >> > "bluestore_compressed_allocated": 6431965184, >> > "bluestore_compressed_original": 18576584704, >> > 1 >> > "bluestore": { >> > "bluestore_allocated": 119943593984,< >> > "bluestore_stored": 53325238866,< >> > "bluestore_compressed": 670158436, >> > "bluestore_compressed_allocated": 4751818752, >> > "bluestore_compressed_original": 13752328192, >> > 2 >> > "bluestore": { >> > "bluestore_allocated": 155444707328,< >> > "bluestore_stored": 69067116553,< >> > "bluestore_compressed": 565170876, >> > "bluestore_compressed_allocated": 4614324224, >> > "bluestore_compressed_original": 13469696000, >> > 3 >> > "bluestore": { >> > "bluestore_allocated": 128179240960,< >> > "bluestore_stored": 60884752114,< >> > "bluestore_compressed": 1653455847, >> > "bluestore_compressed_allocated": 9741795328, >> > "bluestore_compressed_original": 27878768640, >> > >> > [root@Sim-str-R6-5 osd]# for i in {4..7}; do echo $i; sh -c "ceph >> daemon >> > osd.$i perf dump | grep bluestore | head -6"; done >> > 4 >> > "bluestore": { >> > "bluestore_allocated": 165950652416,< >> >
[ceph-users] Re: problem with ceph osd blacklist
From my point of view, preventing clients from being added to the blacklist may be better if you are in a poor network environment. AFAIK the server sends signals to clients frequently, and it will add a client to the blacklist if it doesn't receive a reply.

<380562...@qq.com> wrote on Wed, Jul 8, 2020 at 3:01 PM:

> Hi all.
>
> If a cephfs client is in a slow or unreliable network environment, the
> client will be added to the OSD blacklist and OSD map, and the default
> duration is 1 hour.
> During this time, the client will be forbidden to access Ceph. If I want
> to solve this problem and ensure the client's normal I/O operation is not
> interrupted, are the following two options feasible? Which one is better?
>
> 1. Set "mds_session_blacklist_on_timeout" to false, so slow
> clients are not added to the blacklist;
>
> 2. Just reduce the time slow clients spend on the blacklist, and
> change the default 1 hour to 5 minutes.
> (set the value of "mon_osd_blacklist_default_expire" to 5 minutes)
>
> Are the two schemes feasible? Will they have a great impact on the data
> security and integrity of Ceph?
> Can you give me some suggestions?
>
> Thanks.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
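Whichever option is chosen, the current blacklist can also be inspected (and individual entries removed early) from the CLI; note the upstream docs caution that removing entries by hand for an evicted CephFS client can put data consistency at risk, so it is not a routine operation:

    ceph osd blacklist ls
    ceph osd blacklist rm <addr:port/nonce>    # address as printed by "blacklist ls"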
[ceph-users] Re: Octopus upgrade breaks Ubuntu 18.04 libvirt
Please strace both virsh and libvirtd (you can attach to it by pid), and make sure that the strace command uses the "-f" switch (i.e. traces all threads). On Wed, Jul 8, 2020 at 6:20 PM Andrei Mikhailovsky wrote: > > Jason, > > After adding the 1:storage to the log line of the config and restarting the > service I do not see anything in the logs. I've started the "virsh pool-list" > command several times and there is absolutely nothing in the logs. The > command keeps hanging > > > running the strace virsh pool-list shows (the last 50-100 lines or so): > > > > ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0 > ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0 > getuid()= 0 > geteuid() = 0 > openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache", > O_RDONLY) = 3 > fstat(3, {st_mode=S_IFREG|0644, st_size=26376, ...}) = 0 > mmap(NULL, 26376, PROT_READ, MAP_SHARED, 3, 0) = 0x7fe979933000 > close(3)= 0 > futex(0x7fe978505a08, FUTEX_WAKE_PRIVATE, 2147483647) = 0 > uname({sysname="Linux", nodename="ais-cloudhost1", ...}) = 0 > futex(0x7fe9790bfce0, FUTEX_WAKE_PRIVATE, 2147483647) = 0 > socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP) = 3 > close(3)= 0 > futex(0x7fe9790c0700, FUTEX_WAKE_PRIVATE, 2147483647) = 0 > pipe2([3, 4], O_NONBLOCK|O_CLOEXEC) = 0 > mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = > 0x7fe96ca98000 > mprotect(0x7fe96ca99000, 8388608, PROT_READ|PROT_WRITE) = 0 > clone(child_stack=0x7fe96d297db0, > flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SE > TTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fe96d2989d0, > tls=0x7fe96d298700, child_tidptr=0x7fe96d2 > 989d0) = 54218 > futex(0x7fe9790bffb8, FUTEX_WAKE_PRIVATE, 2147483647) = 0 > futex(0x7fe9790c06f8, FUTEX_WAKE_PRIVATE, 2147483647) = 0 > geteuid() = 0 > access("/etc/libvirt/libvirt.conf", F_OK) = 0 > openat(AT_FDCWD, "/etc/libvirt/libvirt.conf", O_RDONLY) = 5 > read(5, "#\n# This can be used to setup UR"..., 8192) = 547 > read(5, "", 7645) = 0 > close(5)= 0 > getuid()= 0 > geteuid() = 0 > access("/proc/vz", F_OK)= -1 ENOENT (No such file or > directory) > geteuid() = 0 > getuid()= 0 > geteuid() = 0 > socket(AF_UNIX, SOCK_STREAM, 0) = 5 > connect(5, {sa_family=AF_UNIX, sun_path="/var/run/libvirt/libvirt-sock"}, > 110) = 0 > getsockname(5, {sa_family=AF_UNIX}, [128->2]) = 0 > futex(0x7fe9790c0a08, FUTEX_WAKE_PRIVATE, 2147483647) = 0 > fcntl(5, F_GETFD) = 0 > fcntl(5, F_SETFD, FD_CLOEXEC) = 0 > fcntl(5, F_GETFL) = 0x2 (flags O_RDWR) > fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK)= 0 > futex(0x7fe9790c0908, FUTEX_WAKE_PRIVATE, 2147483647) = 0 > pipe2([6, 7], O_CLOEXEC)= 0 > write(4, "\0", 1) = 1 > futex(0x7fe9790bfb60, FUTEX_WAKE_PRIVATE, 1) = 1 > futex(0x7fe9790c09d0, FUTEX_WAKE_PRIVATE, 2147483647) = 0 > futex(0x7fe9790c0920, FUTEX_WAKE_PRIVATE, 2147483647) = 0 > brk(0x5598ffebb000) = 0x5598ffebb000 > write(4, "\0", 1) = 1 > futex(0x7fe9790bfb60, FUTEX_WAKE_PRIVATE, 1) = 1 > rt_sigprocmask(SIG_BLOCK, [PIPE CHLD WINCH], [], 8) = 0 > poll([{fd=5, events=POLLOUT}, {fd=6, events=POLLIN}], 2, -1) = 1 ([{fd=5, > revents=POLLOUT}]) > rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 > write(5, "\0\0\0\34 \0\200\206\0\0\0\1\0\0\0B\0\0\0\0\0\0\0\0\0\0\0\0", 28) = > 28 > rt_sigprocmask(SIG_BLOCK, [PIPE CHLD WINCH], [], 8) = 0 > poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}], 2, -1 > > > > > > > It get's stuck at the last line and there is nothing happening. 
> > Andrei > > - Original Message - > > From: "Jason Dillaman" > > To: "Andrei Mikhailovsky" > > Cc: "ceph-users" > > Sent: Tuesday, 7 July, 2020 23:33:03 > > Subject: Re: [ceph-users] Re: Octopus upgrade breaks Ubuntu 18.04 libvirt > > > On Tue, Jul 7, 2020 at 5:14 PM Andrei Mikhailovsky > > wrote: > >> > >> Hi Jason, > >> The extract from the debug log file is given below in the first message. > >> It just > >> repeats those lines every so often. > >> > >> I can't find anything else. > > > > I would expect lots of debug logs from the storage backend. Do you > > have a "1:storage" entry in your libvirtd.conf? > > > >> Cheers > >> - Original Message - > >> > From: "Jason Dillaman" > >> > To: "Andrei Mikhailovsky" > >> > Cc: "ceph-users" > >> > Sent: Tuesday, 7 July, 2020 16:33:25 > >> > Subject: Re: [ceph-users] Re: Octopus upgrade breaks Ubuntu 18.04 libvirt > >> > >> > On Tue, Jul 7, 2020 at 11:07 AM Andrei Mikhailovsky > >> > wrote: > >> >> > >>
[ceph-users] Re: Octopus upgrade breaks Ubuntu 18.04 libvirt
Jason, this is what I currently have: log_filters="1:libvirt 1:util 1:qemu" log_outputs="1:file:/var/log/libvirt/libvirtd.log" I will add the 1:storage and send more logs. Thanks for trying to help. Andrei - Original Message - > From: "Jason Dillaman" > To: "Andrei Mikhailovsky" > Cc: "ceph-users" > Sent: Tuesday, 7 July, 2020 23:33:03 > Subject: Re: [ceph-users] Re: Octopus upgrade breaks Ubuntu 18.04 libvirt > On Tue, Jul 7, 2020 at 5:14 PM Andrei Mikhailovsky wrote: >> >> Hi Jason, >> The extract from the debug log file is given below in the first message. It >> just >> repeats those lines every so often. >> >> I can't find anything else. > > I would expect lots of debug logs from the storage backend. Do you > have a "1:storage" entry in your libvirtd.conf? > >> Cheers >> - Original Message - >> > From: "Jason Dillaman" >> > To: "Andrei Mikhailovsky" >> > Cc: "ceph-users" >> > Sent: Tuesday, 7 July, 2020 16:33:25 >> > Subject: Re: [ceph-users] Re: Octopus upgrade breaks Ubuntu 18.04 libvirt >> >> > On Tue, Jul 7, 2020 at 11:07 AM Andrei Mikhailovsky >> > wrote: >> >> >> >> I've left the virsh pool-list command 'hang' for a while and it did >> >> eventually >> >> get the results back. In about 4 hours! >> > >> > Perhaps enable the debug logging of libvirt [1] to determine what it's >> > spending its time on? >> > >> >> root@ais-cloudhost1:/home/andrei# time virsh pool-list >> >> Name State Autostart >> >> --- >> >> 12ca033f-e673-4060-8db9-909d79650f39 active no >> >> bcc753c6-e47a-3b7c-904a-fcc1d0a594c5 active no >> >> cf771bc7-8998-354d-8e10-5564585a3c20 active no >> >> d8d5ec36-3cb0-39af-8fc6-084a4abd5d28 active no >> >> >> >> >> >> real234m23.877s >> >> user0m0.351s >> >> sys 0m0.506s >> >> >> >> >> >> >> >> The second attempt was a mere 2 hours with a bit. >> >> >> >> >> >> root@ais-cloudhost1:/home/andrei# time virsh pool-list >> >> Name State Autostart >> >> --- >> >> 12ca033f-e673-4060-8db9-909d79650f39 active no >> >> bcc753c6-e47a-3b7c-904a-fcc1d0a594c5 active no >> >> cf771bc7-8998-354d-8e10-5564585a3c20 active no >> >> d8d5ec36-3cb0-39af-8fc6-084a4abd5d28 active no >> >> >> >> >> >> real148m54.763s >> >> user0m0.241s >> >> sys 0m0.304s >> >> >> >> >> >> >> >> Am I the only person having these issues with libvirt and Octopus release? >> >> >> >> Cheers >> >> >> >> - Original Message - >> >> > From: "Andrei Mikhailovsky" >> >> > To: "ceph-users" >> >> > Sent: Monday, 6 July, 2020 19:27:25 >> >> > Subject: [ceph-users] Re: Octopus upgrade breaks Ubuntu 18.04 libvirt >> >> >> >> > A quick update. >> >> > >> >> > I have done a fresh install of the CloudStack host server running >> >> > Ubuntu 18.04 >> >> > with the latest updates. I've installed ceph 12.x and connected it to >> >> > Cloudstack which uses kvm/libvirt/ceph/rbd. The rest of the ceph >> >> > services >> >> > (mon,mgr,osd,etc) are all running 15.2.3. Works like a charm. >> >> > >> >> > As soon as I've updated the host server to version 15.2.3, Libvirt >> >> > stopped >> >> > working. It just hangs without doing much it seems. Common commands >> >> > like 'virsh >> >> > pool-list' or 'virsh list' are just hanging. I've strace the process >> >> > and it >> >> > just doesn't show any activity. 
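For reference, with the extra filter added the logging section would look something like the lines below, followed by a restart of the daemon so the new filters take effect (service name may differ per distribution):

    # /etc/libvirt/libvirtd.conf
    log_filters="1:libvirt 1:util 1:qemu 1:storage"
    log_outputs="1:file:/var/log/libvirt/libvirtd.log"

    systemctl restart libvirtd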
>> >> > >> >> > >> >> > 2020-07-06 18:18:36.930+: 3273: info : >> >> > virEventPollUpdateTimeout:265 : >> >> > EVENT_POLL_UPDATE_TIMEOUT: timer=993 frequen >> >> > cy=5000 >> >> > 2020-07-06 18:18:36.930+: 3273: debug : >> >> > virEventPollUpdateTimeout:282 : Set >> >> > timer freq=5000 expires=1594059521930 >> >> > 2020-07-06 18:18:36.930+: 3273: debug : >> >> > virEventPollInterruptLocked:722 : >> >> > Skip interrupt, 1 140123172218240 >> >> > 2020-07-06 18:18:36.930+: 3273: info : virEventPollUpdateHandle:152 >> >> > : >> >> > EVENT_POLL_UPDATE_HANDLE: watch=1004 events=1 >> >> > 2020-07-06 18:18:36.930+: 3273: debug : >> >> > virEventPollInterruptLocked:722 : >> >> > Skip interrupt, 1 140123172218240 >> >> > 2020-07-06 18:18:36.930+: 3273: debug : >> >> > virEventPollCleanupTimeouts:525 : >> >> > Cleanup 8 >> >> > 2020-07-06 18:18:36.930+: 3273: debug : >> >> > virEventPollCleanupHandles:574 : >> >> > Cleanup 22 >> >> > 2020-07-06 18:18:36.930+: 3273: debug : virEventRunDefaultImpl:324 >> >> > : running >> >> > default event implementation >> >> > 2020-07-06 18:18:36.930+: 3273: debug : >> >> > virEventPollCleanupTimeouts:525 : >> >> > Cleanup 8 >> >> > 2020-07-06 18:18:36.930+: 3273: debug : >> >> > virEventPollCleanupHandles:574 : >> >> > Cleanup 22 >> >> > 2020-07-06 18:18:36.931+: 3273: debug : virEventPollMakePollFDs:401 >> >> > : >> >> > Prepare n=0 w=1, f=5 e=1 d=0 >> >> > 2020-07-06 18:18:36.931+: 3273: debug : virEventPollMakePollFDs:401 >> >>
[ceph-users] Re: Ceph stuck at: objects misplaced (0.064%)
Hi, have you tried restarting all osds? Am 08.07.2020 um 15:43 schrieb Ml Ml: > Hello, > > ceph is stuck since 4 days with 0.064% misplaced and i dunno why. Can > anyone help me to get it fixed? > I did restart some OSDs and reweight them again to get some data > moving but that did not help. > > root@node01:~ # ceph -s > cluster: > id: 251c937e-0b55-48c1-8f34-96e84e4023d4 > health: HEALTH_WARN > 1803/2799972 objects misplaced (0.064%) > mon node02 is low on available space > > services: > mon: 3 daemons, quorum node01,node02,node03 > mgr: node03(active), standbys: node01, node02 > osd: 16 osds: 16 up, 16 in; 1 remapped pgs > > data: > pools: 1 pools, 512 pgs > objects: 933.32k objects, 2.68TiB > usage: 9.54TiB used, 5.34TiB / 14.9TiB avail > pgs: 1803/2799972 objects misplaced (0.064%) > 511 active+clean > 1 active+clean+remapped > > io: > client: 131KiB/s rd, 8.57MiB/s wr, 28op/s rd, 847op/s wr > > root@node01:~ # ceph health detail > HEALTH_WARN 1803/2800179 objects misplaced (0.064%); mon node02 is low > on available space > OBJECT_MISPLACED 1803/2800179 objects misplaced (0.064%) > MON_DISK_LOW mon node02 is low on available space > mon.node02 has 28% avail > root@node01:~ # ceph versions > { > "mon": { > "ceph version 12.2.13 (98af9a6b9a46b2d562a0de4b09263d70aeb1c9dd) > luminous (stable)": 3 > }, > "mgr": { > "ceph version 12.2.13 (98af9a6b9a46b2d562a0de4b09263d70aeb1c9dd) > luminous (stable)": 3 > }, > "osd": { > "ceph version 12.2.13 (98af9a6b9a46b2d562a0de4b09263d70aeb1c9dd) > luminous (stable)": 16 > }, > "mds": {}, > "overall": { > "ceph version 12.2.13 (98af9a6b9a46b2d562a0de4b09263d70aeb1c9dd) > luminous (stable)": 22 > } > } > > root@node02:~ # df -h > Filesystem Size Used Avail Use% Mounted on > udev 63G 0 63G 0% /dev > tmpfs 13G 1.3G 12G 11% /run > /dev/sda3 46G 31G 14G 70% / > tmpfs 63G 57M 63G 1% /dev/shm > tmpfs 5.0M 0 5.0M 0% /run/lock > tmpfs 63G 0 63G 0% /sys/fs/cgroup > /dev/sda1 922M 206M 653M 24% /boot > /dev/fuse 30M 144K 30M 1% /etc/pve > /dev/sde1 93M 5.4M 88M 6% /var/lib/ceph/osd/ceph-11 > /dev/sdf1 93M 5.4M 88M 6% /var/lib/ceph/osd/ceph-14 > /dev/sdc1 889G 676G 214G 77% /var/lib/ceph/osd/ceph-3 > /dev/sdb1 889G 667G 222G 76% /var/lib/ceph/osd/ceph-2 > /dev/sdd1 93M 5.4M 88M 6% /var/lib/ceph/osd/ceph-7 > tmpfs 13G 0 13G 0% /run/user/0 > > root@node02:~ # ceph osd tree > ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF > -1 14.34781 root default > -2 4.25287 host node01 > 0 hdd 0.85999 osd.0 up 0.80005 1.0 > 1 hdd 0.86749 osd.1 up 0.85004 1.0 > 6 hdd 0.87270 osd.6 up 0.90002 1.0 > 12 hdd 0.78000 osd.12 up 0.95001 1.0 > 13 hdd 0.87270 osd.13 up 0.95001 1.0 > -3 3.91808 host node02 > 2 hdd 0.7 osd.2 up 0.80005 1.0 > 3 hdd 0.5 osd.3 up 0.85004 1.0 > 7 hdd 0.87270 osd.7 up 0.85004 1.0 > 11 hdd 0.87270 osd.11 up 0.75006 1.0 > 14 hdd 0.87270 osd.14 up 0.85004 1.0 > -4 6.17686 host node03 > 4 hdd 0.87000 osd.4 up 1.0 1.0 > 5 hdd 0.87000 osd.5 up 1.0 1.0 > 8 hdd 0.87270 osd.8 up 1.0 1.0 > 10 hdd 0.87270 osd.10 up 1.0 1.0 > 15 hdd 0.87270 osd.15 up 1.0 1.0 > 16 hdd 1.81879 osd.16 up 1.0 1.0 > > root@node01:~ # ceph osd df tree > ID CLASS WEIGHT REWEIGHT SIZEUSE DATAOMAPMETA > AVAIL %USE VAR PGS TYPE NAME > -1 14.55780- 14.9TiB 9.45TiB 7.46TiB 1.47GiB 23.2GiB > 5.43TiB 63.52 1.00 - root default > -24.27286- 4.35TiB 3.15TiB 2.41TiB 486MiB 7.62GiB > 1.21TiB 72.32 1.14 - host node01 > 0 hdd 0.85999 0.80005 888GiB 619GiB 269GiB 92.3MiB 0B > 269GiB 69.72 1.10 89 osd.0 > 1 hdd 0.86749 0.85004 888GiB 641GiB 248GiB 109MiB 0B > 248GiB 72.12 1.14 92 osd.1 > 6 hdd 0.87270 0.90002 
894GiB 634GiB 632GiB 98.9MiB 2.65GiB > 259GiB 70.99 1.12 107 osd.6 > 12 hdd 0.7 0.95001 894GiB 664GiB 661GiB 94.4MiB 2.52GiB > 230GiB 74.31 1.17 112 osd.12 > 13 hdd 0.87270 0.95001 894GiB 665GiB 663GiB 91.7MiB 2.46GiB > 229GiB 74.43 1.17 112 osd.13 > -34.10808- 4.35TiB 3.17TiB 2.18TiB 479MiB 6.99GiB > 1.18TiB 72.86 1.15 - host node02 > 2 hdd 0.78999 0.75006 888GiB 654GiB 235GiB 95.6MiB 0B > 235GiB 73.57 1.16 94 osd.2 > 3 hdd 0.7 0.80005 888GiB 737GiB 151GiB 114MiB 0B > 151GiB 82.98 1.31 105 osd.3 > 7 hdd 0.87270 0.85004 894GiB 612GiB 610GiB 88.9MiB 2.43GiB > 281GiB 68.50 1.08 103 osd.7 > 11 hdd 0.87270 0.75006 894GiB 576GiB 574GiB 81.8MiB 2.19GiB > 317GiB 64.47 1.01 97 osd.11 > 14 hdd 0.87270 0.85004 894GiB 669GiB 666GiB 98.8MiB 2.37GiB > 225GiB 74.85 1.18 112 osd.14 > -46.17686- 6.17TiB 3.13TiB 2.86TiB 541MiB 8.58GiB > 3.04TiB 50.73 0.80 - host node03 > 4 hdd 0.87000 1.0 888GiB 504GiB 384GiB 124MiB 0B > 384GiB 56.72 0.89 72 osd.4 > 5 hdd 0.87000 1.0 888GiB 520GiB 368GiB 96.2MiB 0
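To narrow down which PG is the one sitting in active+clean+remapped and where it wants to move, something like the following usually shows it (the PG id is a placeholder):

    ceph pg dump_stuck unclean
    ceph pg ls remapped
    ceph pg <pgid> query        # compare the "up" and "acting" OSD sets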
[ceph-users] Re: Octopus upgrade breaks Ubuntu 18.04 libvirt
Jason, After adding the 1:storage to the log line of the config and restarting the service I do not see anything in the logs. I've started the "virsh pool-list" command several times and there is absolutely nothing in the logs. The command keeps hanging running the strace virsh pool-list shows (the last 50-100 lines or so): ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0 ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0 getuid()= 0 geteuid() = 0 openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=26376, ...}) = 0 mmap(NULL, 26376, PROT_READ, MAP_SHARED, 3, 0) = 0x7fe979933000 close(3)= 0 futex(0x7fe978505a08, FUTEX_WAKE_PRIVATE, 2147483647) = 0 uname({sysname="Linux", nodename="ais-cloudhost1", ...}) = 0 futex(0x7fe9790bfce0, FUTEX_WAKE_PRIVATE, 2147483647) = 0 socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP) = 3 close(3)= 0 futex(0x7fe9790c0700, FUTEX_WAKE_PRIVATE, 2147483647) = 0 pipe2([3, 4], O_NONBLOCK|O_CLOEXEC) = 0 mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fe96ca98000 mprotect(0x7fe96ca99000, 8388608, PROT_READ|PROT_WRITE) = 0 clone(child_stack=0x7fe96d297db0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SE TTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fe96d2989d0, tls=0x7fe96d298700, child_tidptr=0x7fe96d2 989d0) = 54218 futex(0x7fe9790bffb8, FUTEX_WAKE_PRIVATE, 2147483647) = 0 futex(0x7fe9790c06f8, FUTEX_WAKE_PRIVATE, 2147483647) = 0 geteuid() = 0 access("/etc/libvirt/libvirt.conf", F_OK) = 0 openat(AT_FDCWD, "/etc/libvirt/libvirt.conf", O_RDONLY) = 5 read(5, "#\n# This can be used to setup UR"..., 8192) = 547 read(5, "", 7645) = 0 close(5)= 0 getuid()= 0 geteuid() = 0 access("/proc/vz", F_OK)= -1 ENOENT (No such file or directory) geteuid() = 0 getuid()= 0 geteuid() = 0 socket(AF_UNIX, SOCK_STREAM, 0) = 5 connect(5, {sa_family=AF_UNIX, sun_path="/var/run/libvirt/libvirt-sock"}, 110) = 0 getsockname(5, {sa_family=AF_UNIX}, [128->2]) = 0 futex(0x7fe9790c0a08, FUTEX_WAKE_PRIVATE, 2147483647) = 0 fcntl(5, F_GETFD) = 0 fcntl(5, F_SETFD, FD_CLOEXEC) = 0 fcntl(5, F_GETFL) = 0x2 (flags O_RDWR) fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK)= 0 futex(0x7fe9790c0908, FUTEX_WAKE_PRIVATE, 2147483647) = 0 pipe2([6, 7], O_CLOEXEC)= 0 write(4, "\0", 1) = 1 futex(0x7fe9790bfb60, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7fe9790c09d0, FUTEX_WAKE_PRIVATE, 2147483647) = 0 futex(0x7fe9790c0920, FUTEX_WAKE_PRIVATE, 2147483647) = 0 brk(0x5598ffebb000) = 0x5598ffebb000 write(4, "\0", 1) = 1 futex(0x7fe9790bfb60, FUTEX_WAKE_PRIVATE, 1) = 1 rt_sigprocmask(SIG_BLOCK, [PIPE CHLD WINCH], [], 8) = 0 poll([{fd=5, events=POLLOUT}, {fd=6, events=POLLIN}], 2, -1) = 1 ([{fd=5, revents=POLLOUT}]) rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 write(5, "\0\0\0\34 \0\200\206\0\0\0\1\0\0\0B\0\0\0\0\0\0\0\0\0\0\0\0", 28) = 28 rt_sigprocmask(SIG_BLOCK, [PIPE CHLD WINCH], [], 8) = 0 poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}], 2, -1 It get's stuck at the last line and there is nothing happening. Andrei - Original Message - > From: "Jason Dillaman" > To: "Andrei Mikhailovsky" > Cc: "ceph-users" > Sent: Tuesday, 7 July, 2020 23:33:03 > Subject: Re: [ceph-users] Re: Octopus upgrade breaks Ubuntu 18.04 libvirt > On Tue, Jul 7, 2020 at 5:14 PM Andrei Mikhailovsky wrote: >> >> Hi Jason, >> The extract from the debug log file is given below in the first message. It >> just >> repeats those lines every so often. >> >> I can't find anything else. 
> > I would expect lots of debug logs from the storage backend. Do you > have a "1:storage" entry in your libvirtd.conf? > >> Cheers >> - Original Message - >> > From: "Jason Dillaman" >> > To: "Andrei Mikhailovsky" >> > Cc: "ceph-users" >> > Sent: Tuesday, 7 July, 2020 16:33:25 >> > Subject: Re: [ceph-users] Re: Octopus upgrade breaks Ubuntu 18.04 libvirt >> >> > On Tue, Jul 7, 2020 at 11:07 AM Andrei Mikhailovsky >> > wrote: >> >> >> >> I've left the virsh pool-list command 'hang' for a while and it did >> >> eventually >> >> get the results back. In about 4 hours! >> > >> > Perhaps enable the debug logging of libvirt [1] to determine what it's >> > spending its time on? >> > >> >> root@ais-cloudhost1:/home/andrei# time virsh pool-list >> >> Name State Autostart >> >> --- >> >> 12ca033f-e673-4060-8db9-909
[ceph-users] Ceph stuck at: objects misplaced (0.064%)
Hello, ceph is stuck since 4 days with 0.064% misplaced and i dunno why. Can anyone help me to get it fixed? I did restart some OSDs and reweight them again to get some data moving but that did not help. root@node01:~ # ceph -s cluster: id: 251c937e-0b55-48c1-8f34-96e84e4023d4 health: HEALTH_WARN 1803/2799972 objects misplaced (0.064%) mon node02 is low on available space services: mon: 3 daemons, quorum node01,node02,node03 mgr: node03(active), standbys: node01, node02 osd: 16 osds: 16 up, 16 in; 1 remapped pgs data: pools: 1 pools, 512 pgs objects: 933.32k objects, 2.68TiB usage: 9.54TiB used, 5.34TiB / 14.9TiB avail pgs: 1803/2799972 objects misplaced (0.064%) 511 active+clean 1 active+clean+remapped io: client: 131KiB/s rd, 8.57MiB/s wr, 28op/s rd, 847op/s wr root@node01:~ # ceph health detail HEALTH_WARN 1803/2800179 objects misplaced (0.064%); mon node02 is low on available space OBJECT_MISPLACED 1803/2800179 objects misplaced (0.064%) MON_DISK_LOW mon node02 is low on available space mon.node02 has 28% avail root@node01:~ # ceph versions { "mon": { "ceph version 12.2.13 (98af9a6b9a46b2d562a0de4b09263d70aeb1c9dd) luminous (stable)": 3 }, "mgr": { "ceph version 12.2.13 (98af9a6b9a46b2d562a0de4b09263d70aeb1c9dd) luminous (stable)": 3 }, "osd": { "ceph version 12.2.13 (98af9a6b9a46b2d562a0de4b09263d70aeb1c9dd) luminous (stable)": 16 }, "mds": {}, "overall": { "ceph version 12.2.13 (98af9a6b9a46b2d562a0de4b09263d70aeb1c9dd) luminous (stable)": 22 } } root@node02:~ # df -h Filesystem Size Used Avail Use% Mounted on udev 63G 0 63G 0% /dev tmpfs 13G 1.3G 12G 11% /run /dev/sda3 46G 31G 14G 70% / tmpfs 63G 57M 63G 1% /dev/shm tmpfs 5.0M 0 5.0M 0% /run/lock tmpfs 63G 0 63G 0% /sys/fs/cgroup /dev/sda1 922M 206M 653M 24% /boot /dev/fuse 30M 144K 30M 1% /etc/pve /dev/sde1 93M 5.4M 88M 6% /var/lib/ceph/osd/ceph-11 /dev/sdf1 93M 5.4M 88M 6% /var/lib/ceph/osd/ceph-14 /dev/sdc1 889G 676G 214G 77% /var/lib/ceph/osd/ceph-3 /dev/sdb1 889G 667G 222G 76% /var/lib/ceph/osd/ceph-2 /dev/sdd1 93M 5.4M 88M 6% /var/lib/ceph/osd/ceph-7 tmpfs 13G 0 13G 0% /run/user/0 root@node02:~ # ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 14.34781 root default -2 4.25287 host node01 0 hdd 0.85999 osd.0 up 0.80005 1.0 1 hdd 0.86749 osd.1 up 0.85004 1.0 6 hdd 0.87270 osd.6 up 0.90002 1.0 12 hdd 0.78000 osd.12 up 0.95001 1.0 13 hdd 0.87270 osd.13 up 0.95001 1.0 -3 3.91808 host node02 2 hdd 0.7 osd.2 up 0.80005 1.0 3 hdd 0.5 osd.3 up 0.85004 1.0 7 hdd 0.87270 osd.7 up 0.85004 1.0 11 hdd 0.87270 osd.11 up 0.75006 1.0 14 hdd 0.87270 osd.14 up 0.85004 1.0 -4 6.17686 host node03 4 hdd 0.87000 osd.4 up 1.0 1.0 5 hdd 0.87000 osd.5 up 1.0 1.0 8 hdd 0.87270 osd.8 up 1.0 1.0 10 hdd 0.87270 osd.10 up 1.0 1.0 15 hdd 0.87270 osd.15 up 1.0 1.0 16 hdd 1.81879 osd.16 up 1.0 1.0 root@node01:~ # ceph osd df tree ID CLASS WEIGHT REWEIGHT SIZEUSE DATAOMAPMETA AVAIL %USE VAR PGS TYPE NAME -1 14.55780- 14.9TiB 9.45TiB 7.46TiB 1.47GiB 23.2GiB 5.43TiB 63.52 1.00 - root default -24.27286- 4.35TiB 3.15TiB 2.41TiB 486MiB 7.62GiB 1.21TiB 72.32 1.14 - host node01 0 hdd 0.85999 0.80005 888GiB 619GiB 269GiB 92.3MiB 0B 269GiB 69.72 1.10 89 osd.0 1 hdd 0.86749 0.85004 888GiB 641GiB 248GiB 109MiB 0B 248GiB 72.12 1.14 92 osd.1 6 hdd 0.87270 0.90002 894GiB 634GiB 632GiB 98.9MiB 2.65GiB 259GiB 70.99 1.12 107 osd.6 12 hdd 0.7 0.95001 894GiB 664GiB 661GiB 94.4MiB 2.52GiB 230GiB 74.31 1.17 112 osd.12 13 hdd 0.87270 0.95001 894GiB 665GiB 663GiB 91.7MiB 2.46GiB 229GiB 74.43 1.17 112 osd.13 -34.10808- 4.35TiB 3.17TiB 2.18TiB 479MiB 6.99GiB 1.18TiB 
72.86 1.15 - host node02 2 hdd 0.78999 0.75006 888GiB 654GiB 235GiB 95.6MiB 0B 235GiB 73.57 1.16 94 osd.2 3 hdd 0.7 0.80005 888GiB 737GiB 151GiB 114MiB 0B 151GiB 82.98 1.31 105 osd.3 7 hdd 0.87270 0.85004 894GiB 612GiB 610GiB 88.9MiB 2.43GiB 281GiB 68.50 1.08 103 osd.7 11 hdd 0.87270 0.75006 894GiB 576GiB 574GiB 81.8MiB 2.19GiB 317GiB 64.47 1.01 97 osd.11 14 hdd 0.87270 0.85004 894GiB 669GiB 666GiB 98.8MiB 2.37GiB 225GiB 74.85 1.18 112 osd.14 -46.17686- 6.17TiB 3.13TiB 2.86TiB 541MiB 8.58GiB 3.04TiB 50.73 0.80 - host node03 4 hdd 0.87000 1.0 888GiB 504GiB 384GiB 124MiB 0B 384GiB 56.72 0.89 72 osd.4 5 hdd 0.87000 1.0 888GiB 520GiB 368GiB 96.2MiB 0B 368GiB 58.57 0.92 75 osd.5 8 hdd 0.87270 1.0 894GiB 508GiB 505GiB 80.2MiB 2.07GiB 386GiB 56.80 0.89 85 osd.8 10 hdd 0.87270 1.0 894GiB 374GiB 373GiB 51.9MiB 1.73GiB 519GiB 41.88 0.66 63 osd.10 15 hdd 0.87270 1.0 894GiB 504GiB 502GiB 60.1MiB 1.99GiB 390GiB 56.37 0.89 84 osd
[ceph-users] Re: Octopus upgrade breaks Ubuntu 18.04 libvirt
Alexander, here you go: 1. strace of the libvirtd -l process: root@ais-cloudhost1:/etc/libvirt# strace -f -p 53745 strace: Process 53745 attached with 17 threads [pid 53786] futex(0x7fd90c0f4618, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 2, NULL, 0x [pid 53785] futex(0x55699ad13ae0, FUTEX_WAIT_PRIVATE, 0, NULL [pid 53784] futex(0x55699ad13ae0, FUTEX_WAIT_PRIVATE, 0, NULL [pid 53783] futex(0x55699ad13ae0, FUTEX_WAIT_PRIVATE, 0, NULL [pid 53782] futex(0x55699ad13ae0, FUTEX_WAIT_PRIVATE, 0, NULL [pid 53781] futex(0x55699ad13ae0, FUTEX_WAIT_PRIVATE, 0, NULL [pid 53764] futex(0x55699acc5fc0, FUTEX_WAIT_PRIVATE, 0, NULL [pid 53763] futex(0x55699acc5fc0, FUTEX_WAIT_PRIVATE, 0, NULL [pid 53762] futex(0x55699acc5fc0, FUTEX_WAIT_PRIVATE, 0, NULL [pid 53761] futex(0x55699acc5fc0, FUTEX_WAIT_PRIVATE, 0, NULL [pid 53760] futex(0x55699acc5fc0, FUTEX_WAIT_PRIVATE, 0, NULL [pid 53759] futex(0x55699acc5f20, FUTEX_WAIT_PRIVATE, 0, NULL [pid 53757] futex(0x55699acc5f20, FUTEX_WAIT_PRIVATE, 0, NULL [pid 53756] futex(0x55699acc5f20, FUTEX_WAIT_PRIVATE, 0, NULL [pid 53755] futex(0x55699acc5f20, FUTEX_WAIT_PRIVATE, 0, NULL [pid 53754] futex(0x55699acc5f20, FUTEX_WAIT_PRIVATE, 0, NULL [pid 53745] restart_syscall(<... resuming interrupted poll ...> 2. The strace -f of the virsh pool-list process (the last hundred lines or so): close(3)= 0 openat(AT_FDCWD, "/proc/net/psched", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 read(3, "03e8 0040 000f4240 3b9ac"..., 1024) = 36 close(3)= 0 openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=1683056, ...}) = 0 mmap(NULL, 1683056, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9137936000 close(3)= 0 ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0 ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0 getuid()= 0 geteuid() = 0 openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=26376, ...}) = 0 mmap(NULL, 26376, PROT_READ, MAP_SHARED, 3, 0) = 0x7f9137afa000 close(3)= 0 futex(0x7f91366cca08, FUTEX_WAKE_PRIVATE, 2147483647) = 0 uname({sysname="Linux", nodename="ais-cloudhost1", ...}) = 0 futex(0x7f9137286ce0, FUTEX_WAKE_PRIVATE, 2147483647) = 0 socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP) = 3 close(3)= 0 futex(0x7f9137287700, FUTEX_WAKE_PRIVATE, 2147483647) = 0 pipe2([3, 4], O_NONBLOCK|O_CLOEXEC) = 0 mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f912ac5f000 mprotect(0x7f912ac6, 8388608, PROT_READ|PROT_WRITE) = 0 clone(child_stack=0x7f912b45edb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SE TTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f912b45f9d0, tls=0x7f912b45f700, child_tidptr=0x7f912b4 5f9d0) = 54510 strace: Process 54510 attached [pid 54509] futex(0x7f9137286fb8, FUTEX_WAKE_PRIVATE, 2147483647 [pid 54510] set_robust_list(0x7f912b45f9e0, 24 [pid 54509] <... futex resumed> ) = 0 [pid 54510] <... set_robust_list resumed> ) = 0 [pid 54509] futex(0x7f91372876f8, FUTEX_WAKE_PRIVATE, 2147483647 [pid 54510] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0 [pid 54509] <... futex resumed> ) = 0 [pid 54510] <... mmap resumed> )= 0x7f9122c5f000 [pid 54509] geteuid( [pid 54510] munmap(0x7f9122c5f000, 20582400 [pid 54509] <... geteuid resumed> ) = 0 [pid 54510] <... munmap resumed> ) = 0 [pid 54509] access("/etc/libvirt/libvirt.conf", F_OK [pid 54510] munmap(0x7f912800, 46526464 [pid 54509] <... 
access resumed> ) = 0 [pid 54510] <... munmap resumed> ) = 0 [pid 54509] openat(AT_FDCWD, "/etc/libvirt/libvirt.conf", O_RDONLY [pid 54510] mprotect(0x7f912400, 135168, PROT_READ|PROT_WRITE [pid 54509] <... openat resumed> ) = 5 [pid 54510] <... mprotect resumed> )= 0 [pid 54509] read(5, "#\n# This can be used to setup UR"..., 8192) = 547 [pid 54510] futex(0x7f9137287218, FUTEX_WAKE_PRIVATE, 2147483647 [pid 54509] read(5, [pid 54510] <... futex resumed> ) = 0 [pid 54509] <... read resumed> "", 7645) = 0 [pid 54510] gettid( [pid 54509] close(5 [pid 54510] <... gettid resumed> ) = 54510 [pid 54509] <... close resumed> ) = 0 [pid 54509] getuid( [pid 54510] poll([{fd=3, events=POLLIN}], 1, -1 [pid 54509] <... getuid resumed> ) = 0 [pid 54509] geteuid() = 0 [pid 54509] access("/proc/vz", F_OK)= -1 ENOENT (No such file or directory) [pid 54509] geteuid() = 0 [pid 54509] getuid()= 0 [pid 54509] geteuid() = 0 [pid 54509] socket(AF_UNIX, SOCK_STREAM, 0) = 5 [pid 54509] connect(5, {sa_family=AF_U
[ceph-users] RBD thin provisioning and time to format a volume
Hello, My understanding is that the time to format an RBD volume is not dependent on its size as the RBD volumes are thin provisioned. Is this correct? For example, formatting a 1G volume should take almost the same time as formatting a 1TB volume - although accounting for differences in latencies due to load on the Ceph cluster. Is that a fair assumption? Thanks, Shridhar ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
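For what it's worth, the thin-provisioning side is easy to confirm on any cluster: image creation is near-instant regardless of size, and a fresh image consumes almost no space until data is written. Pool and image names below are just examples:

    rbd create --size 1T rbd/scratch
    rbd du rbd/scratch     # shows provisioned size vs. space actually used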
[ceph-users] post - bluestore default vs tuned performance comparison
Hi, For this post: https://ceph.io/community/bluestore-default-vs-tuned-performance-comparison/ I don't see a way to contact the authors so I thought I would try here. Does anyone know how the rocksdb tuning parameters of: " bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=32,min_write_buffer_number_to_merge=2,recycle_log_file_num=32,compaction_style=kCompactionStyleLevel,write_buffer_size=67108864,target_file_size_base=67108864,max_background_compactions=31,level0_file_num_compaction_trigger=8,level0_slowdown_writes_trigger=32,level0_stop_writes_trigger=64,max_bytes_for_level_base=536870912,compaction_threads=32,max_bytes_for_level_multiplier=8,flusher_threads=8,compaction_readahead_size=2MB " were chosen? Some of the settings seem to not be in line with the rocksdb tuning guide: https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide thx Frank ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
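Not an answer to how the values were chosen, but for comparison it is straightforward to dump what a running OSD is actually using and how it differs from the built-in defaults:

    ceph daemon osd.0 config get bluestore_rocksdb_options
    ceph daemon osd.0 config diff | grep -i rocksdb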
[ceph-users] Re: RBD thin provisioning and time to format a volume
On Wed, Jul 8, 2020 at 3:28 PM Void Star Nill wrote: > > Hello, > > My understanding is that the time to format an RBD volume is not dependent > on its size as the RBD volumes are thin provisioned. Is this correct? > > For example, formatting a 1G volume should take almost the same time as > formatting a 1TB volume - although accounting for differences in latencies > due to load on the Ceph cluster. Is that a fair assumption? Yes, that is a fair comparison when creating the RBD image. However, a format operation might initialize and discard extents on the disk, so a larger disk will take longer to format. > Thanks, > Shridhar > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io > -- Jason ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
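As a rough illustration of the discard point: most filesystem formatters trim the whole block device by default, which scales with the image size, and they can be told to skip that step (these are standard mkfs flags, nothing Ceph-specific; the device name is whatever "rbd map" returned):

    mkfs.xfs -K /dev/rbd0              # -K skips the discard pass
    mkfs.ext4 -E nodiscard /dev/rbd0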
[ceph-users] Re: RBD thin provisioning and time to format a volume
On Wed, Jul 8, 2020 at 4:56 PM Jason Dillaman wrote:

> On Wed, Jul 8, 2020 at 3:28 PM Void Star Nill wrote:
> >
> > Hello,
> >
> > My understanding is that the time to format an RBD volume is not dependent
> > on its size as the RBD volumes are thin provisioned. Is this correct?
> >
> > For example, formatting a 1G volume should take almost the same time as
> > formatting a 1TB volume - although accounting for differences in latencies
> > due to load on the Ceph cluster. Is that a fair assumption?
>
> Yes, that is a fair comparison when creating the RBD image. However, a
> format operation might initialize and discard extents on the disk, so
> a larger disk will take longer to format.

Thanks for the response Jason. Could you please explain a bit more about the format operation? Is there a relative time that we can determine based on the volume size?

Thanks
Shridhar

> > Thanks,
> > Shridhar
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
> --
> Jason
>
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Ceph stuck at: objects misplaced (0.064%)
Do you have pg_autoscaler enabled or the balancer module? Zitat von Ml Ml : Hello, ceph is stuck since 4 days with 0.064% misplaced and i dunno why. Can anyone help me to get it fixed? I did restart some OSDs and reweight them again to get some data moving but that did not help. root@node01:~ # ceph -s cluster: id: 251c937e-0b55-48c1-8f34-96e84e4023d4 health: HEALTH_WARN 1803/2799972 objects misplaced (0.064%) mon node02 is low on available space services: mon: 3 daemons, quorum node01,node02,node03 mgr: node03(active), standbys: node01, node02 osd: 16 osds: 16 up, 16 in; 1 remapped pgs data: pools: 1 pools, 512 pgs objects: 933.32k objects, 2.68TiB usage: 9.54TiB used, 5.34TiB / 14.9TiB avail pgs: 1803/2799972 objects misplaced (0.064%) 511 active+clean 1 active+clean+remapped io: client: 131KiB/s rd, 8.57MiB/s wr, 28op/s rd, 847op/s wr root@node01:~ # ceph health detail HEALTH_WARN 1803/2800179 objects misplaced (0.064%); mon node02 is low on available space OBJECT_MISPLACED 1803/2800179 objects misplaced (0.064%) MON_DISK_LOW mon node02 is low on available space mon.node02 has 28% avail root@node01:~ # ceph versions { "mon": { "ceph version 12.2.13 (98af9a6b9a46b2d562a0de4b09263d70aeb1c9dd) luminous (stable)": 3 }, "mgr": { "ceph version 12.2.13 (98af9a6b9a46b2d562a0de4b09263d70aeb1c9dd) luminous (stable)": 3 }, "osd": { "ceph version 12.2.13 (98af9a6b9a46b2d562a0de4b09263d70aeb1c9dd) luminous (stable)": 16 }, "mds": {}, "overall": { "ceph version 12.2.13 (98af9a6b9a46b2d562a0de4b09263d70aeb1c9dd) luminous (stable)": 22 } } root@node02:~ # df -h Filesystem Size Used Avail Use% Mounted on udev 63G 0 63G 0% /dev tmpfs 13G 1.3G 12G 11% /run /dev/sda3 46G 31G 14G 70% / tmpfs 63G 57M 63G 1% /dev/shm tmpfs 5.0M 0 5.0M 0% /run/lock tmpfs 63G 0 63G 0% /sys/fs/cgroup /dev/sda1 922M 206M 653M 24% /boot /dev/fuse 30M 144K 30M 1% /etc/pve /dev/sde1 93M 5.4M 88M 6% /var/lib/ceph/osd/ceph-11 /dev/sdf1 93M 5.4M 88M 6% /var/lib/ceph/osd/ceph-14 /dev/sdc1 889G 676G 214G 77% /var/lib/ceph/osd/ceph-3 /dev/sdb1 889G 667G 222G 76% /var/lib/ceph/osd/ceph-2 /dev/sdd1 93M 5.4M 88M 6% /var/lib/ceph/osd/ceph-7 tmpfs 13G 0 13G 0% /run/user/0 root@node02:~ # ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 14.34781 root default -2 4.25287 host node01 0 hdd 0.85999 osd.0 up 0.80005 1.0 1 hdd 0.86749 osd.1 up 0.85004 1.0 6 hdd 0.87270 osd.6 up 0.90002 1.0 12 hdd 0.78000 osd.12 up 0.95001 1.0 13 hdd 0.87270 osd.13 up 0.95001 1.0 -3 3.91808 host node02 2 hdd 0.7 osd.2 up 0.80005 1.0 3 hdd 0.5 osd.3 up 0.85004 1.0 7 hdd 0.87270 osd.7 up 0.85004 1.0 11 hdd 0.87270 osd.11 up 0.75006 1.0 14 hdd 0.87270 osd.14 up 0.85004 1.0 -4 6.17686 host node03 4 hdd 0.87000 osd.4 up 1.0 1.0 5 hdd 0.87000 osd.5 up 1.0 1.0 8 hdd 0.87270 osd.8 up 1.0 1.0 10 hdd 0.87270 osd.10 up 1.0 1.0 15 hdd 0.87270 osd.15 up 1.0 1.0 16 hdd 1.81879 osd.16 up 1.0 1.0 root@node01:~ # ceph osd df tree ID CLASS WEIGHT REWEIGHT SIZEUSE DATAOMAPMETA AVAIL %USE VAR PGS TYPE NAME -1 14.55780- 14.9TiB 9.45TiB 7.46TiB 1.47GiB 23.2GiB 5.43TiB 63.52 1.00 - root default -24.27286- 4.35TiB 3.15TiB 2.41TiB 486MiB 7.62GiB 1.21TiB 72.32 1.14 - host node01 0 hdd 0.85999 0.80005 888GiB 619GiB 269GiB 92.3MiB 0B 269GiB 69.72 1.10 89 osd.0 1 hdd 0.86749 0.85004 888GiB 641GiB 248GiB 109MiB 0B 248GiB 72.12 1.14 92 osd.1 6 hdd 0.87270 0.90002 894GiB 634GiB 632GiB 98.9MiB 2.65GiB 259GiB 70.99 1.12 107 osd.6 12 hdd 0.7 0.95001 894GiB 664GiB 661GiB 94.4MiB 2.52GiB 230GiB 74.31 1.17 112 osd.12 13 hdd 0.87270 0.95001 894GiB 665GiB 663GiB 91.7MiB 2.46GiB 229GiB 
74.43 1.17 112 osd.13 -34.10808- 4.35TiB 3.17TiB 2.18TiB 479MiB 6.99GiB 1.18TiB 72.86 1.15 - host node02 2 hdd 0.78999 0.75006 888GiB 654GiB 235GiB 95.6MiB 0B 235GiB 73.57 1.16 94 osd.2 3 hdd 0.7 0.80005 888GiB 737GiB 151GiB 114MiB 0B 151GiB 82.98 1.31 105 osd.3 7 hdd 0.87270 0.85004 894GiB 612GiB 610GiB 88.9MiB 2.43GiB 281GiB 68.50 1.08 103 osd.7 11 hdd 0.87270 0.75006 894GiB 576GiB 574GiB 81.8MiB 2.19GiB 317GiB 64.47 1.01 97 osd.11 14 hdd 0.87270 0.85004 894GiB 669GiB 666GiB 98.8MiB 2.37GiB 225GiB 74.85 1.18 112 osd.14 -46.17686- 6.17TiB 3.13TiB 2.86TiB 541MiB 8.58GiB 3.04TiB 50.73 0.80 - host node03 4 hdd 0.87000 1.0 888GiB 504GiB 384GiB 124MiB 0B 384GiB 56.72 0.89 72 osd.4 5 hdd 0.87000 1.0 888GiB 520GiB 368GiB 96.2MiB 0B 368GiB 58.57 0.92 75 osd.5 8 hdd 0.87270 1.0 894GiB 508GiB 505GiB 80.2MiB 2.07GiB 386GiB 56.80 0.89 85 osd.8 10 hdd 0.87270 1.0 894GiB 374GiB 373GiB 51.9MiB 1.73GiB 519GiB 41.88 0.66 63 osd.10 15 hdd 0.87270