[ceph-users] mds damaged with preallocated inodes that are inconsistent with inotable

2024-08-07 Thread zxcs
Hi, Experts, we are running a cephfs with v16.2.* and multiple active MDS daemons. Currently we are hitting `mds: fs cephfs mds.* is damaged`, and this mds always complains “client *** loaded with preallocated inodes that are inconsistent with inotable”, and the mds always suicides during replay
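
For anyone hitting the same message later: this kind of metadata damage is usually worked through with the CephFS disaster-recovery tools. A minimal sketch, assuming the damaged rank is 0 of a filesystem named cephfs, the MDS is stopped first, and the journal is exported before anything destructive; the resets below discard state, so only run them after reading the disaster-recovery docs or asking on this list:

    # back up the journal before touching anything (the steps below are destructive)
    cephfs-journal-tool --rank=cephfs:0 journal export /root/mds0-journal.bin

    # recover what can be recovered from the journal into the metadata pool,
    # then reset the journal and the session table
    cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
    cephfs-journal-tool --rank=cephfs:0 journal reset
    cephfs-table-tool all reset session

    # only on advice (e.g. from this list): reset the inode table itself
    cephfs-table-tool all reset inode

    # mark the rank repaired so the MDS is allowed to start again
    ceph mds repaired cephfs:0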

[ceph-users] Ceph nvme timeout and then aborting

2021-02-19 Thread zxcs
Hi, I have one ceph cluster on nautilus 14.2.10; each node has 3 SSDs and 4 HDDs, plus two NVMes as cache (nvme0n1 is the cache for SSDs 0-2 and nvme1n1 for HDDs 3-7). But one node’s nvme0n1 always hits the issues below (see `nvme … I/O … timeout, aborting`), and suddenly this nv

[ceph-users] Re: Ceph nvme timeout and then aborting

2021-02-19 Thread zxcs
: 0 C Thanks, zx > On 19 Feb 2021, at 18:01, Konstantin Shalygin wrote: > > Please paste your `nvme smart-log /dev/nvme0n1` output > > k > >> On 19 Feb 2021, at 12:53, zxcs wrote: >> >> I ha

[ceph-users] Re: Ceph nvme timeout and then aborting

2021-02-19 Thread zxcs
Temperature Sensor 8: 0 C Thanks, zx > On 19 Feb 2021, at 18:08, zxcs wrote: > > Thank you very much, Konstantin! > > Here is the output of `nvme smart-log /dev/nvme0n1` > > Smart Log for NVME device:nvme0n1 namespace-id: > critical_warning

[ceph-users] Re: Ceph nvme timeout and then aborting

2021-02-19 Thread zxcs
You mean OS? It’s Ubuntu 16.04, and the NVMe is a Samsung 970 PRO 1TB. Thanks, zx > On 19 Feb 2021, at 18:56, Konstantin Shalygin wrote: > > Looks good, what is your hardware? Server model & NVMe’s? > > k > >> On 19 Feb 2021

[ceph-users] Re: Ceph nvme timeout and then aborting

2021-02-21 Thread zxcs
One NVMe suddenly crashed again. Could anyone please help shed some light here? Thanks a ton!!! Below are the syslog and ceph logs. From /var/log/syslog: Feb 21 19:38:33 ip kernel: [232562.847916] nvme :03:00.0: I/O 943 QID 7 timeout, aborting Feb 21 19:38:34 ip kernel: [232563.847946] nvme :03:0
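
In case it helps others who see the same I/O timeout, aborting pattern: on consumer NVMe drives this is often an APST (power-state) issue rather than Ceph. A commonly suggested mitigation, assuming a GRUB-based Ubuntu install, is to cap the allowed power-state latency via a kernel parameter and reboot:

    # /etc/default/grub -- append the nvme_core parameter to the existing line
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=0"

    # apply and reboot
    sudo update-grub
    sudo reboot

    # afterwards, confirm the parameter took effect
    cat /sys/module/nvme_core/parameters/default_ps_max_latency_us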

[ceph-users] Re: Ceph nvme timeout and then aborting

2021-02-21 Thread zxcs
Mon, Feb 22, 2021 at 1:56 AM zxcs wrote: >> >> One NVMe suddenly crashed again. Could anyone please help shed some light here? >> Thanks a ton!!! >> Below are the syslog and ceph logs. >> >> From /var/log/syslog: >> Feb 21 19:38:33 ip kernel: [232562.847916] nvme

[ceph-users] Re: Ceph nvme timeout and then aborting

2021-02-22 Thread zxcs
Haven’t done any fio test on a single disk, but did run fio against the ceph cluster. Actually the cluster has 12 nodes, and each node has the same disks (2 NVMes for cache, 3 SSDs as OSDs, and 4 HDDs also as OSDs). Only two nodes have this problem, and those two nodes have crashed many times (at least 4 time

[ceph-users] Re: Ceph nvme timeout and then aborting

2021-02-22 Thread zxcs
past. If you did not do your research on > drives, I think it is probably your drives. > > " just throw away your crappy Samsung SSD 860 Pro " > https://www.mail-archive.com/ceph-users@ceph.io/msg06820.html > > > >> -Original Message- >> From

[ceph-users] Re: Ceph nvme timeout and then aborting

2021-02-22 Thread zxcs
; when you least expect it. Putting the db/wal on a separate drive is > usually premature optimization that is only useful for benchmarkers. > My opinion of course. > > Mark > > > > > > > > > On Sun, Feb 21, 2021 at 7:16 PM zxcs wrote: >

[ceph-users] how to disable ceph version check?

2023-11-07 Thread zxcs
Hi, Experts, we have a ceph cluster reporting HEALTH_ERR due to multiple old versions: health: HEALTH_ERR There are daemons running multiple old versions of ceph. After running `ceph versions` we see three ceph versions in {16.2.*}; these daemons are ceph-osd. Our question is: how to
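
For what it's worth, the usual way to quiet this specific check without downgrading or restarting anything, assuming the code shown by `ceph health detail` is DAEMON_OLD_VERSION, is to mute it; finishing the OSD upgrade is still the real fix:

    # confirm which health code is firing and which daemons are behind
    ceph health detail
    ceph versions

    # mute just that check, optionally with a time limit (e.g. one week)
    ceph health mute DAEMON_OLD_VERSION 1w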

[ceph-users] mds hit find_exports balancer runs too long

2023-11-09 Thread zxcs
Hi, Experts, we have a CephFS cluster running 16.2.* with multiple active MDS enabled, and found the mds somehow complaining as below: mds.*.bal find_exports balancer runs too long. We have already set the config below: mds_bal_interval = 30, mds_bal_sample_interval = 12, and then we can
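
A sketch of the workaround that usually comes up when find_exports dominates MDS time, assuming Pacific and that manual subtree placement is acceptable: switch the periodic balancer off entirely (a value of 0 for mds_bal_interval is commonly used for this) and place busy trees by hand with directory pinning instead:

    # stop periodic rebalancing between active ranks entirely
    ceph config set mds mds_bal_interval 0

    # check the value actually landed
    ceph config get mds mds_bal_interval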

[ceph-users] really need help how to save old client out of hang?

2023-11-16 Thread zxcs
Hi, Experts, we have a cephfs cluster on 16.2.* running with multiple active MDS, and we have some old machines running Ubuntu 16.04, so we mount those clients using ceph-fuse. After a full MDS process restart, none of these old Ubuntu 16.04 clients can connect to ceph; `ls -lrth` or `df -hT` hang o
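
When old ceph-fuse clients never recover after a full MDS restart, it is usually a stale session that the MDS ends up blocklisting. A sketch of how to check and recover, assuming an admin keyring, a mount point of /mnt/cephfs, and that losing the stale session's dirty state is acceptable (`<client-id>` is a placeholder taken from the session list):

    # list sessions on a rank and look for the stuck old clients
    ceph tell mds.0 session ls

    # evict the stale session (the client gets blocklisted)
    ceph tell mds.0 client evict id=<client-id>
    ceph osd blocklist ls

    # on the old Ubuntu 16.04 client: unmount and remount with ceph-fuse
    fusermount -uz /mnt/cephfs
    ceph-fuse -n client.admin /mnt/cephfs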

[ceph-users] mds slow request with “failed to authpin, subtree is being exported"

2023-11-22 Thread zxcs
Hi, Experts, we are using cephfs 16.2.* with multiple active MDS, and recently we have two nodes mounted with ceph-fuse due to their old OS. One node runs a python script with `glob.glob(path)`, and another client is doing a `cp` operation on the same path; then we see some log about

[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-11-22 Thread zxcs
here? Thanks, xz > On 22 Nov 2023, at 19:44, Xiubo Li wrote: > > > On 11/22/23 16:02, zxcs wrote: >> Hi, Experts, >> >> we are using cephfs 16.2.* with multiple active MDS, and recently we >> have two nodes mounted with ceph-fuse due to their old OS. >

[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-11-26 Thread zxcs
Li wrote on Thu, Nov 23, 2023 at 15:47: > >> >> On 11/23/23 11:25, zxcs wrote: >>> Thanks a ton, Xiubo! >>> >>> it did not disappear, >>> >>> even after we unmounted the ceph directory on these two old OS nodes. >>> >>> after dumping ops in flight

[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-12-04 Thread zxcs
The question is, why does it still see "internal op exportdir”? Do any other configs also need to be set to 0? Could you please shed some light on which config we need to set. Thanks, xz > On 27 Nov 2023, at 13:19, Xiubo Li wrote: > > > On 11/27/23 13:12, zxcs wrote: >> currently we are using `ceph config s
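
One way to avoid the exportdir/authpin tug-of-war entirely, rather than tuning balancer configs to 0, is to pin the contended subtree to a single rank so it is never migrated between MDS daemons; a sketch, assuming a client mount at /mnt/cephfs and that the contended path is known (path and daemon names are placeholders):

    # pin the contended subtree to rank 0 so the balancer never exports it
    setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/contended/path

    # confirm which rank now holds the subtree (on the MDS host)
    ceph daemon mds.<name> get subtrees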

[ceph-users] Cephfs too many repaired copies on osds

2023-12-12 Thread zxcs
Hi, Experts, we are using cephfs 16.2.* with multiple active MDS, and recently we see an osd reporting “full object read crc *** != expected ox on :head” and “missing primary copy of ***: will try to read copies on **”. From `ceph -s` we can see OSD_TOO_MANY_REPAIRS: Too many repaired

[ceph-users] Re: Cephfs too many repaired copies on osds

2023-12-12 Thread zxcs
Also the osd frequently reports these ERROR logs, which leads to this osd having slow requests. How can we stop these logs? > “full object read crc *** != expected ox on :head” > “missing primary copy of ***: will try to read copies on **” Thanks, xz > On 13 Dec 2023, at 01:20, zxcs wrote: >
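
Read errors like these usually point at a failing disk rather than at Ceph itself, so the drive behind that OSD is worth checking first. A sketch of the usual triage, assuming osd.N and /dev/sdX are placeholders for the noisy OSD and its device, and assuming `clear_shards_repaired` is available in your release (it only resets the warning counter, not the underlying problem):

    # check the physical drive behind the OSD
    smartctl -a /dev/sdX

    # find and repair the inconsistent PGs the OSD reported
    ceph health detail          # lists the inconsistent PG ids
    ceph pg repair <pgid>

    # once the disk is confirmed healthy or replaced, clear the warning counter
    ceph tell osd.N clear_shards_repaired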

[ceph-users] cephfs read hang after cluster stuck, but need attach the process to continue

2023-12-13 Thread zxcs
Hi, experts, we are using cephfs 16.2.* with multiple active MDS, and recently we saw a strange thing. We have some C++ code that reads files from cephfs; the client code just calls a very basic read(). The cluster hit MDS slow requests and later went back to normal, but the r
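
When a plain read() hangs like this on a kernel mount, the kernel client's debugfs files are usually the quickest way to see whether the stuck request is waiting on an MDS (caps/metadata) or on an OSD (data); a sketch, assuming debugfs is mounted at the default location and the kernel driver is in use:

    # outstanding metadata requests to the MDS (one line per stuck request)
    cat /sys/kernel/debug/ceph/*/mdsc

    # outstanding data reads/writes to OSDs
    cat /sys/kernel/debug/ceph/*/osdc

    # capability state per inode, useful when a cap revoke never completes
    cat /sys/kernel/debug/ceph/*/caps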

[ceph-users] binary file cannot execute in cephfs directory

2022-08-22 Thread zxcs
Hi, experts, We are using cephfs 15.2.13. After mounting ceph on one node and copying a binary into the ceph dir (see below; cmake-3.22 is a binary), when I run `./cmake-3.22` it reports permission denied. Why? The file has the “x” permission, and “ld” is the binary file’s owner. Could anyone p

[ceph-users] Re: binary file cannot execute in cephfs directory

2022-08-22 Thread zxcs
.22: Permission denied > On 23 Aug 2022, at 08:57, zxcs wrote: > > Hi, experts, > > > We are using cephfs 15.2.13. After mounting ceph on one node and copying a binary > into the ceph dir (see below; cmake-3.22 is a binary), > > when I run `./cmake-3.22` it reports per

[ceph-users] Re: binary file cannot execute in cephfs directory

2022-08-23 Thread zxcs
Oh, yes, there is a “noexec” option in the mount command. Thanks a ton! Thanks, Xiong > On 23 Aug 2022, at 22:01, Daniel Gryniewicz wrote: > > Does the mount have the "noexec" option on it? > > Daniel > > On 8/22/22 21:02, zxcs wrote: >> In case someone missed the
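
For anyone who lands on this thread later: the fix is just to mount without noexec. A minimal sketch, assuming a kernel mount at /mnt/cephfs; the fstab line is only an example layout:

    # remount the live cephfs mount with exec allowed
    sudo mount -o remount,exec /mnt/cephfs

    # or drop "noexec" from the options in /etc/fstab, e.g.
    # mon1:6789:/   /mnt/cephfs   ceph   name=admin,exec,_netdev   0 0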

[ceph-users] how to fix slow request without remote or restart mds

2022-08-26 Thread zxcs
Hi, experts, we have a cephfs cluster on 15.2.* with a kernel mount. Today there is a health report of mds slow requests as below. I checked this mds log, and it seems it has been reporting slow requests for a long time. mds report: 1 MDSs report slow requests mds log: log_channel(cluster) log [WRN]
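
A couple of read-only checks that often narrow down where a slow request is stuck, run on the host of the active MDS (mds.<name> is a placeholder for whatever `ceph fs status` shows); they do not fix anything by themselves, but they show which op and which client session are blocked:

    # ops the MDS has flagged as blocked or still in flight, with their age
    ceph daemon mds.<name> dump_blocked_ops
    ceph daemon mds.<name> dump_ops_in_flight

    # map the client id seen in the op back to a host / mount
    ceph daemon mds.<name> session ls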

[ceph-users] Re: how to fix slow request without remote or restart mds

2022-08-30 Thread zxcs
Thanks a ton! Yes, restarting the mds fixed this. But we can’t confirm it hit bug 50840; it seems we hit this when reading a huge number of small files (more than 10,000 small files in one directory). Thanks, Xiong > On 26 Aug 2022, at 19:13, Stefan Kooman wrote: > > On 8/26/22 12:33, zxcs wrote: >

[ceph-users] how to fix mds stuck at dispatched without restart mds

2022-08-30 Thread zxcs
Hi, experts, we have a cephfs (15.2.13) cluster with kernel mounts. When we read from 2000+ processes against one ceph path (call it /path/to/A/), all of the processes hang, and `ls -lrth /path/to/A/` always gets stuck, but listing other directories (/path/to/B/) is healthy. health detail always reports md

[ceph-users] Re: how to fix mds stuck at dispatched without restart mds

2022-09-01 Thread zxcs
after upgrade. Will try the flush mds journal option when we hit this bug next time (if no user urgently needs to list the directory). It seems we can reproduce it 100% these days. Thanks All! Thanks, zx > On 31 Aug 2022, at 15:23, Xiubo Li wrote: > > > On 8/31/22 2:43 PM, zxcs wrote: >> Hi,
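
For the record, the journal flush mentioned above can be done online through the MDS admin socket, without restarting anything; a sketch, assuming shell access to the host running the active MDS and that mds.<name> is a placeholder for whatever `ceph fs status` reports:

    # find the active MDS for the filesystem
    ceph fs status

    # flush its journal via the admin socket on that host
    ceph daemon mds.<name> flush journal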

[ceph-users] how to speed up hundreds of millions small files read base on cephfs?

2022-09-01 Thread zxcs
Hi, experts, We are using cephfs (15.2.*) with kernel mounts in our production environment. These days, when we do massive reads from the cluster (multiple processes), ceph health always reports slow ops for some osds (built on 8TB HDDs that use SSD as db cache). Our cluster has more reads than w
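
Two knobs that are commonly tried for metadata-heavy small-file read workloads, sketched here under the assumption of a kernel mount, spare RAM on the MDS hosts, and placeholder monitor/mount-point names (the numbers are examples, not recommendations):

    # give the MDS more cache so hot dentries/inodes stay in memory (example: 16 GiB)
    ceph config set mds mds_cache_memory_limit 17179869184

    # increase client readahead on the kernel mount (example: 64 MiB)
    mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,rasize=67108864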

[ceph-users] how to sync data on two site CephFS

2023-02-16 Thread zxcs
Hi, Experts, we already have a CephFS cluster, called A, and now we want to set up another CephFS cluster (called B) at another site. We need to synchronize data between them for some directories (if all directories can be synchronized, even better). That means when we write a file in the A cluste
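
Since both sites run CephFS, the built-in snapshot mirroring (cephfs-mirror) is the usual answer for one-way, per-directory sync; a rough sketch of the Pacific workflow, assuming the filesystem is called cephfs on both sides and site B is the target (client.mirror_remote, site-b, /some/dir and <token> are placeholders):

    # on both clusters
    ceph mgr module enable mirroring

    # on cluster B (the target): create a bootstrap token for cluster A
    ceph fs snapshot mirror peer_bootstrap create cephfs client.mirror_remote site-b

    # on cluster A (the source): run a mirror daemon, enable mirroring,
    # import the token, and pick the directories to mirror
    ceph orch apply cephfs-mirror        # if deployed with cephadm
    ceph fs snapshot mirror enable cephfs
    ceph fs snapshot mirror peer_bootstrap import cephfs <token>
    ceph fs snapshot mirror add cephfs /some/dir

Note that mirroring transfers snapshots, so the mirrored directories need periodic snapshots on the source (for example via the snap_schedule mgr module), and it is one-way per peer rather than a bidirectional sync.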

[ceph-users] how to set load balance on multi active mds?

2023-08-09 Thread zxcs
Hi, experts, we have a production env built with ceph version 16.2.11 pacific, using CephFS, with multiple active MDS enabled (more than 10). But we usually see unbalanced client-request load across these MDS daemons; see the picture below. The top MDS has 32.2k client requests, and the last one only 3

[ceph-users] Re: how to set load balance on multi active mds?

2023-08-09 Thread zxcs
; [2] > https://docs.ceph.com/en/reef/cephfs/multimds/#dynamic-subtree-partitioning-with-balancer-on-specific-ranks > > Quoting zxcs: > >> Hi, experts, >> >> we have a production env built with ceph version 16.2.11 pacific, and using >
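
Alongside the linked docs, the thing that usually evens out per-rank client load on Pacific is pinning, for example ephemeral distributed pinning on a parent directory; a sketch, assuming a client mount at /mnt/cephfs and a layout with many project directories under /mnt/cephfs/projects (both paths are placeholders):

    # spread the immediate children of "projects" across the active ranks
    setfattr -n ceph.dir.pin.distributed -v 1 /mnt/cephfs/projects

    # verify the attribute was set
    getfattr -n ceph.dir.pin.distributed /mnt/cephfs/projects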

[ceph-users] how to list ceph file size on ubuntu 20.04

2021-11-16 Thread zxcs
Hi, I want to list cephfs directory sizes on Ubuntu 20.04, but when I use `ls -alh [directory]` it shows the number of files and directories under that directory (it only counts entries, not size). I remember when I used `ls -alh [directory]` on Ubuntu 16.04, it showed the size of the directory (i

[ceph-users] Re: how to list ceph file size on ubuntu 20.04

2021-11-17 Thread zxcs
Thanks a ton!!! Very helpful! Thanks, Xiong. On 17 Nov 2021, at 11:16, Weiwen Hu wrote: There is an rbytes mount option [1]. Besides, you can use “getfattr -n ceph.dir.rbytes /path/in/cephfs”. [1]: https://docs.ceph.com/en/latest/man/8/mount.ceph/#advanced Weiwen Hu. On 17 Nov 2021, at 10:26, zxcs wrote: Hi, I want to list cephfs
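
To make `ls -l` behave as it did on Ubuntu 16.04 (newer kernel clients appear to default to norbytes), the rbytes option goes on the mount itself; a sketch with placeholder monitor and mount-point names, assuming the kernel driver:

    # remount (or set in /etc/fstab) with rbytes so `ls -l` shows recursive
    # directory sizes again, as it did on the old kernel
    mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,rbytes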

[ceph-users] How to use alluxio with Cephfs as backend storage?

2021-11-25 Thread zxcs
Hi, I want to use alluxio to speed up reads/writes on cephfs, so I want to ask if anyone has already done this? Any wiki or experience to share on how to set up the environment? I know there is a wiki about alluxio using cephfs as backend storage: https://docs.alluxio.io/os/user/stable/en/ufs/CephF

[ceph-users] Re: How to using alluxio with Cephfs as backend storage?

2021-11-25 Thread zxcs
Wow, so surprised! Words cannot express my thanks to you, yantao! I sent you a mail with my detailed questions; would you please help check it. Thanks a ton. Thanks, Xiong > On 26 Nov 2021, at 10:47, xueyantao2114 wrote: > > First, thanks for your question. Alluxio underfs ceph and ce

[ceph-users] how to change system time without cephfs losing connection

2022-01-04 Thread zxcs
Hi, Recently we needed to do some timezone-change tests on our Ubuntu node, and this node mounts a cephfs with the kernel driver. When I changed the system time (for example, the current time is 2022-01-05 09:00:00 and we change it to 2022-01-03 08:00:00 using the date command), after about 30m~1h this