Hi, Experts,
we are running CephFS on v16.2.* with multiple active MDS daemons. Currently we are
hitting `fs cephfs mds.* is damaged`, and this MDS always complains:
“client *** loaded with preallocated inodes that are inconsistent with
inotable”
and the MDS always suicides during replay.
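One possible first step for a rank damaged by an inotable/preallocated-inode mismatch
is sketched below, following the standard CephFS disaster-recovery tooling. This is
only a sketch: the fs name `cephfs` and rank `0` are assumptions, and the journal
should be backed up before anything else is touched.

  # Back up the journal of the damaged rank before doing anything else
  cephfs-journal-tool --rank=cephfs:0 journal export /root/cephfs-rank0-journal.bin

  # Check whether the journal itself is readable and consistent
  cephfs-journal-tool --rank=cephfs:0 journal inspect

  # If the journal looks healthy, clear the damaged flag so the rank retries replay
  ceph mds repaired cephfs:0

If replay keeps failing with the same inotable complaint, the disaster-recovery
documentation's table-reset steps are the next thing to look at, but those are
destructive and deserve a separate discussion.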
Hi,
I have one Ceph cluster on Nautilus 14.2.10, and each node has 3 SSDs and 4 HDDs.
Each node also has two NVMes used as cache (nvme0n1 caches SSD OSDs 0-2 and nvme1n1
caches HDD OSDs 3-7),
but one node's nvme0n1 keeps hitting the issue below (see the nvme ... I/O ...
timeout, aborting messages), and suddenly this nv
: 0 C
Thanks,
zx
> On Feb 19, 2021, at 6:01 PM, Konstantin Shalygin wrote:
>
> Please paste your `nvme smart-log /dev/nvme0n1` output
>
>
>
> k
>
>> On 19 Feb 2021, at 12:53, zxcs <zhuxion...@163.com> wrote:
>>
>> I ha
Temperature Sensor 8: 0 C
Thanks,
zx
> On Feb 19, 2021, at 6:08 PM, zxcs wrote:
>
> Thank you very much, Konstantin!
>
> Here is the output of `nvme smart-log /dev/nvme0n1`
>
> Smart Log for NVME device:nvme0n1 namespace-id:
> critical_warning
You mean the OS? It is Ubuntu 16.04, and the NVMe is a Samsung 970 PRO 1TB.
Thanks,
zx
> On Feb 19, 2021, at 6:56 PM, Konstantin Shalygin <k0...@k0ste.ru> wrote:
>
> Looks good, what is your hardware? Server model & NVMes?
>
>
>
> k
>
>> On 19 Feb 2021
One NVMe suddenly crashed again. Could anyone please help shed some light here?
Thanks a ton!!!
Below are the syslog and Ceph logs.
From /var/log/syslog
Feb 21 19:38:33 ip kernel: [232562.847916] nvme :03:00.0: I/O 943 QID 7
timeout, aborting
Feb 21 19:38:34 ip kernel: [232563.847946] nvme :03:0
On Mon, Feb 22, 2021 at 1:56 AM zxcs wrote:
>>
>> One NVMe suddenly crashed again. Could anyone please help shed some light here?
>> Thanks a ton!!!
>> Below are the syslog and Ceph logs.
>>
>> From /var/log/syslog
>> Feb 21 19:38:33 ip kernel: [232562.847916] nvme
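To narrow down whether the controller itself is failing, a few read-only diagnostics
can be collected when the timeouts appear; a sketch, with /dev/nvme0n1 taken from the
reports above:

  # Kernel-side view of the resets/timeouts
  dmesg -T | grep -i nvme | tail -n 50

  # Controller health counters (media errors, temperature, available spare)
  nvme smart-log /dev/nvme0n1

  # Controller error-log entries, if any were recorded
  nvme error-log /dev/nvme0n1

  # SMART view via smartmontools
  smartctl -a /dev/nvme0n1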
Haven't done any fio test on a single disk, but did run fio against the Ceph cluster;
the cluster actually has 12 nodes, and each node has the same disks (2 NVMes for
cache, 3 SSDs as OSDs, and 4 HDDs also as OSDs).
Only two nodes have this problem, and these two nodes have crashed many times (at
least 4 time
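If you want to rule the drive itself in or out, a single-disk sync-write latency test
of the kind commonly used to qualify journal/DB devices looks roughly like the sketch
below. Writing to the raw device destroys the data on it, so only run this against an
unused device, or point --filename at a test file instead.

  # 4k sync writes at queue depth 1 -- the same pattern a WAL/journal produces
  fio --name=sync-write-test \
      --filename=/dev/nvme0n1 \
      --ioengine=libaio --direct=1 --sync=1 \
      --rw=write --bs=4k --iodepth=1 --numjobs=1 \
      --runtime=60 --time_based --group_reporting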
past. If you did not do your research on
> drives, I think it is probably your drives.
>
> " just throw away your crappy Samsung SSD 860 Pro "
> https://www.mail-archive.com/ceph-users@ceph.io/msg06820.html
>
>
>
>> -Original Message-
>> From
; when you least expect it. Putting the db/wal on a separate drive is
> usually premature optimization that is only useful for benchmarkers.
> My opinion of course.
>
> Mark
>
> On Sun, Feb 21, 2021 at 7:16 PM zxcs wrote:
>
Hi, Experts,
we have a Ceph cluster reporting HEALTH_ERR due to multiple old versions.
health: HEALTH_ERR
There are daemons running multiple old versions of ceph
After running `ceph versions`, we see three Ceph versions within 16.2.*; these
daemons are Ceph OSDs.
our question is: how to
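One way to find which OSDs are behind is to map versions to daemons and then upgrade
or restart the stragglers; a sketch, assuming you can restart daemons host by host:

  # Summary of running versions per daemon type
  ceph versions

  # Ask every OSD which version it is actually running
  ceph tell osd.* version

  # After upgrading packages (or the container image) on the lagging hosts:
  systemctl restart ceph-osd@<id>                      # package-based deployments
  # ceph orch upgrade start --ceph-version 16.2.<x>    # cephadm-managed clusters

The health check clears once every daemon reports the same version.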
Hi, Experts,
we have a CephFS cluster running 16.2.* with multiple active MDS daemons enabled, and
found that somehow the MDS complains with the message below:
mds.*.bal find_exports balancer runs too long
and we have already set the config below:
mds_bal_interval = 30
mds_bal_sample_interval = 12
and then we can
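For reference, these tunables can also be applied at runtime through the config
database and then verified on a running MDS; a sketch, with `<name>` standing for the
MDS id:

  # Apply the balancer tunables cluster-wide without a restart
  ceph config set mds mds_bal_interval 30
  ceph config set mds mds_bal_sample_interval 12

  # Verify what a given MDS is actually running with (admin socket on the MDS host)
  ceph daemon mds.<name> config get mds_bal_interval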
Hi, Experts,
we have a CephFS cluster on 16.2.* running with multiple active MDS daemons, and we
have some old machines running Ubuntu 16.04, so we mount those clients using ceph-fuse.
After a full MDS restart, none of these old Ubuntu 16.04 clients can connect to Ceph;
`ls -lrth` or `df -hT` hang o
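When ceph-fuse clients hang after an MDS restart, it usually helps to first check
whether the MDS still holds (or has evicted) their sessions, and then remount the
affected clients; a rough sketch, with the mount point and monitor address as
placeholders:

  # On an MDS host: list client sessions and look for the old Ubuntu 16.04 clients
  ceph daemon mds.<name> session ls

  # Cluster-wide view of client/session warnings
  ceph health detail

  # On an affected client: force-unmount the dead fuse mount and mount again
  fusermount -uz /mnt/cephfs
  ceph-fuse -n client.admin -m <mon-host>:6789 /mnt/cephfs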
Hi, Experts,
we are using CephFS 16.2.* with multiple active MDS daemons, and recently we have two
nodes mounted with ceph-fuse because of their old OS.
One node runs a Python script calling `glob.glob(path)`, and another client does a
`cp` operation on the same path.
Then we see some log about
here?
Thanks,
xz
> On Nov 22, 2023, at 19:44, Xiubo Li wrote:
>
>
> On 11/22/23 16:02, zxcs wrote:
>> HI, Experts,
>>
>> we are using cephfs with 16.2.* with multi active mds, and recently, we
>> have two nodes mount with ceph-fuse due to the old os system.
>&
Li wrote on Thu, Nov 23, 2023 at 15:47:
>
>>
>> On 11/23/23 11:25, zxcs wrote:
>>> Thanks a ton, Xiubo!
>>>
>>> it does not disappear.
>>>
>>> even after we umount the Ceph directory on these two old OS nodes.
>>>
>>> after dumping ops in flight
question is, why does it still see "internal op exportdir"? Does any other config
also need to be set to 0? Could you please shed some light on which configs we need to set?
Thanks,
xz
> On Nov 27, 2023, at 13:19, Xiubo Li wrote:
>
>
> On 11/27/23 13:12, zxcs wrote:
>> current, we using `ceph config s
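To see why exports still happen with the balancer interval set to 0, it can help to
confirm the value the MDS is actually running with and to look at the subtree map,
since explicit or ephemeral directory pins also generate `exportdir` internal ops; a
sketch, with `<name>` as the MDS id:

  # Confirm the running value on the daemon itself, not just the config database
  ceph daemon mds.<name> config get mds_bal_interval

  # Dump the subtree map: shows which directories are pinned or exported to which rank
  ceph daemon mds.<name> get subtrees

  # Watch what the slow internal ops actually are
  ceph daemon mds.<name> dump_ops_in_flight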
Hi, Experts,
we are using CephFS 16.2.* with multiple active MDS daemons, and recently we see an
OSD reporting:
“full object read crc *** != expected ox on :head”
“missing primary copy of ***: will try to read copies on **”
From `ceph -s` we can see:
OSD_TOO_MANY_REPAIRS: Too many repaired
This OSD also frequently reports these ERROR logs, which leads to slow requests on
the OSD. How do we stop these logs?
> “full object read crc *** != expected ox on :head”
> “missing primary copy of ***: will try to read copies on **”
Thanks
xz
> On Dec 13, 2023, at 01:20, zxcs wrote:
>
&
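OSD_TOO_MANY_REPAIRS is a counter of reads the OSD had to repair, and the CRC /
missing-copy errors usually point at failing media under that OSD, so the disk is
worth checking before the warning is silenced; a sketch, with the device and OSD id
as placeholders:

  # See which OSD is accumulating repaired reads
  ceph health detail

  # Check the disk behind that OSD
  smartctl -a /dev/<device>
  dmesg -T | grep -i <device>

  # Reset the repaired-reads counter for that OSD (available in recent releases)
  ceph tell osd.<id> clear_shards_repaired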
Hi, experts,
we are using CephFS 16.2.* with multiple active MDS daemons, and recently we saw a
strange thing:
we have some C++ code that reads files from CephFS; the client code just calls a very
basic read().
When the cluster hit MDS slow requests and later came back to normal, the r
Hi, experts,
We are using CephFS 15.2.13. After mounting Ceph on one node and copying a binary
into the Ceph directory (see below; cmake-3.22 is a binary),
running `./cmake-3.22` reports "permission denied". Why? This file has the “x”
permission, and “ld" is the binary file's owner.
Could anyone p
.22: Permission denied
> On Aug 23, 2022, at 08:57, zxcs wrote:
>
> Hi, experts,
>
>
> We are using cephfs 15.2.13, and after mount ceph on one node, copy a binary
> into the ceph dir, see below (cmake-3.22 is a binary),
>
> but when i using `./cmake-3.22` it report per
oh, yes, there is a “noexec” option in the mount command. Thanks a ton!
Thanks,
Xiong
> On Aug 23, 2022, at 22:01, Daniel Gryniewicz wrote:
>
> Does the mount have the "noexec" option on it?
>
> Daniel
>
> On 8/22/22 21:02, zxcs wrote:
>> In case someone missing the
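For reference, checking for and removing the noexec flag looks roughly like this,
with /mnt/cephfs standing in for the real mount point:

  # Show the mount options in effect for the CephFS mount point
  findmnt -T /mnt/cephfs -o TARGET,FSTYPE,OPTIONS

  # Remount with exec allowed (or drop noexec from /etc/fstab and remount)
  mount -o remount,exec /mnt/cephfs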
Hi, experts,
we have a CephFS cluster on version 15.2.* with kernel mounts. Today there was a
health report about MDS slow requests, as below; I checked this MDS's log, and it
seems it has been reporting slow requests for a long time.
mds report:
1 MDSs report slow requests
mds log:
log_channel(cluster) log [WRN]
Thanks a ton!
Yes, restarting the MDS fixed this. But we can't confirm it hit bug 50840; it seems
we hit this when reading a huge number of small files (more than 10,000 small files
in one directory).
Thanks
Xiong
> On Aug 26, 2022, at 19:13, Stefan Kooman wrote:
>
> On 8/26/22 12:33, zxcs wrote:
>&
Hi, experts,
we have a CephFS (15.2.13) cluster with kernel mounts, and when we read from 2000+
processes against one Ceph path (call it /path/to/A/), all of the processes hang and
`ls -lrth /path/to/A/` always gets stuck, while listing other directories is healthy
(/path/to/B/).
`ceph health detail` always reports md
after upgrade.
Will try the flush MDS journal option when we hit this bug next time (if no user
urgently needs to list the directory). It seems we can reproduce it 100% these days.
Thanks all!
Thanks,
zx
> On Aug 31, 2022, at 15:23, Xiubo Li wrote:
>
>
> On 8/31/22 2:43 PM, zxcs wrote:
>> Hi,
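For the record, flushing the MDS journal goes through the admin socket on the host
running the active MDS; a sketch, with `<name>` as the MDS id (flushing can take a
while on a busy rank):

  # Flush the journal of the active MDS
  ceph daemon mds.<name> flush journal

  # Check the daemon state before/after
  ceph daemon mds.<name> status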
Hi, experts,
We are using CephFS (15.2.*) with kernel mounts in our production environment.
These days, when we do massive reads from the cluster (multiple processes), Ceph
health always reports slow ops for some OSDs (built on 8 TB HDDs with an SSD as the
DB cache).
Our cluster has more reads than w
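When the slow ops appear, looking at what the affected OSDs are actually stuck on
helps tell HDD saturation apart from something else; a sketch, with the OSD id as a
placeholder:

  # Which OSDs are currently reporting slow ops
  ceph health detail

  # On the OSD's host: inspect current and recent slow operations
  ceph daemon osd.<id> dump_ops_in_flight
  ceph daemon osd.<id> dump_historic_slow_ops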
Hi, Experts,
we already have a CephFS cluster, called A, and now we want to set up another CephFS
cluster (called B) at another site.
We need to synchronize data between them for some directories (if all directories can
be synchronized, even better). That means when we write a file in the A cluste
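For per-directory replication between two clusters, Pacific ships snapshot-based
mirroring via the cephfs-mirror daemon; note that it is one-way per file system
(A -> B), not a two-way sync, and a cephfs-mirror daemon must run on the source
cluster. A rough setup sketch, with file system names, the client name, and the path
as placeholders:

  # On both clusters: enable the mirroring mgr module
  ceph mgr module enable mirroring

  # On cluster B (target): create a bootstrap token for cluster A
  ceph fs snapshot mirror peer_bootstrap create <fs_b> client.mirror_remote site-b

  # On cluster A (source): enable mirroring, import the token, add a directory
  ceph fs snapshot mirror enable <fs_a>
  ceph fs snapshot mirror peer_bootstrap import <fs_a> <token-from-cluster-B>
  ceph fs snapshot mirror add <fs_a> /path/to/dir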
Hi, experts,
we have a production environment built with Ceph 16.2.11 Pacific, using CephFS.
We also enable multiple active MDS daemons (more than 10), but we usually see a load
imbalance of client requests across these MDSs.
See the picture below: the top MDS has 32.2k client requests, and the last one only 3
; [2]
> https://docs.ceph.com/en/reef/cephfs/multimds/#dynamic-subtree-partitioning-with-balancer-on-specific-ranks
>
> Quoting zxcs <zhuxion...@163.com>:
>
>> Hi, experts,
>>
>> we have a product env build with ceph version 16.2.11 pacific, and using
>
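If the balancer keeps concentrating client load on one rank, an alternative to tuning
it is pinning the busiest top-level directories to specific ranks, or spreading the
children of a directory with ephemeral distributed pinning; a sketch, with paths and
ranks as placeholders:

  # Pin a heavy directory (and everything under it) to rank 1
  setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/project_a

  # Or spread the immediate children of a directory across the active ranks
  setfattr -n ceph.dir.pin.distributed -v 1 /mnt/cephfs/home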
Hi,
I want to list CephFS directory sizes on Ubuntu 20.04, but when I use `ls -alh
[directory]` it shows the number of files and directories under this directory (it
only counts entries, not size). I remember that when I used `ls -alh [directory]` on
Ubuntu 16.04, it showed the size of this directory (i
Thanks a ton!!! Very helpful!
Thanks,
Xiong
> On Nov 17, 2021, at 11:16 AM, 胡 玮文 (Weiwen Hu) wrote:
> There is an rbytes mount option [1]. Besides, you can use
> `getfattr -n ceph.dir.rbytes /path/in/cephfs`.
> [1]: https://docs.ceph.com/en/latest/man/8/mount.ceph/#advanced
> Weiwen Hu
>> On Nov 17, 2021, at 10:26, zxcs wrote:
>> Hi, I want to list cephfs
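The recursive-size behaviour discussed above comes from the rbytes option on the
kernel mount, and the recursive statistics can also be read directly as extended
attributes; a brief sketch, assuming a kernel mount at /mnt/cephfs:

  # Mount with recursive directory sizes reported for directories
  mount -t ceph <mon-host>:6789:/ /mnt/cephfs -o name=admin,rbytes

  # Or query the recursive stats explicitly on any mount
  getfattr -n ceph.dir.rbytes   /mnt/cephfs/some/dir   # recursive bytes
  getfattr -n ceph.dir.rentries /mnt/cephfs/some/dir   # recursive file+dir count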
Hi,
I want to use Alluxio to speed up reads/writes on CephFS, so I want to ask if anyone
has already done this. Any wiki or experience to share on how to set up the
environment?
I know there is a wiki about Alluxio using CephFS as backend storage:
https://docs.alluxio.io/os/user/stable/en/ufs/CephF
Wow, so surprised! Words cannot express my thanks to you, yantao!
I sent you a mail with my detailed questions; would you please help to check it?
Thanks a ton!
Thanks,
Xiong
> On Nov 26, 2021, at 10:47 AM, xueyantao2114 wrote:
>
> First, thanks for your question. Alluxio underfs ceph and ce
Hi,
Recently we needed to do some timezone-change tests on our Ubuntu node, and this node
mounts a CephFS with the kernel driver. When I changed the system time (for example,
the current time is 2022-01-05 09:00:00 and we change it to 2022-01-03 08:00:00 using
the date command), after about 30m~1h, this