Re: [ceph-users] CephFS fsync failed and read error

2016-01-27 Thread Yan, Zheng
It seems the user has no read/write permission to the cephfs data pool. Regards Yan, Zheng On Thu, Jan 28, 2016 at 11:36 AM, FaHui Lin wrote: > Dear Ceph experts, > > I've got a problem with CephFS one day. > When I use vim to edit a file on cephfs, it will show fsync failed

Re: [ceph-users] CephFS fsync failed and read error

2016-01-28 Thread Yan, Zheng
ke > fsck?). This should be a kernel client issue, no need for fsck. Please enable kernel dynamic debug: echo module ceph +p > /sys/kernel/debug/dynamic_debug/control echo module libceph +p > /sys/kernel/debug/dynamic_debug/control Then run vim and cat, gather the kernel log and send it to us. Reg
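A minimal sketch of the dynamic-debug steps referenced above (the log file name is illustrative; the control path is the standard debugfs location):

    # enable verbose output from the ceph and libceph kernel client modules
    echo 'module ceph +p' > /sys/kernel/debug/dynamic_debug/control
    echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control
    # reproduce the problem (vim / cat on the affected file), then capture the kernel log
    dmesg > cephfs-client-debug.log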

Re: [ceph-users] CEPHFS: standby-replay mds crash

2016-02-01 Thread Yan, Zheng
On Tue, Feb 2, 2016 at 12:52 PM, Goncalo Borges wrote: > Hi... > > Seems very similar to > > http://tracker.ceph.com/issues/14144 > > Can you confirm it is the same issue? should be the same. it's fixed by https://github.com/ceph/ceph/pull/7132 Regards

Re: [ceph-users] CephFS is not maintaining consistency

2016-02-01 Thread Yan, Zheng
On Tue, Feb 2, 2016 at 2:27 AM, Mykola Dvornik wrote: > What version are you running on your servers and clients? > Are you using 4.1 or 4.2 kernel? https://bugzilla.kernel.org/show_bug.cgi?id=104911. Upgrade to 4.3+ kernel or 4.1.17 kernel or 4.2.8 kernel can resolve this issue. > > On the clie

Re: [ceph-users] CephFS is not maintaining consistency

2016-02-02 Thread Yan, Zheng
y on > > 3.10.0-327.4.4.el7.x86_64 (CentOS Linux release 7.2.1511) > > Should I file report a bug on the RedHat bugzilla? you can open a bug at http://tracker.ceph.com/projects/cephfs/issues Regards Yan, Zheng > > On Tue, Feb 2, 2016 at 8:57 AM, Yan, Zheng wrote: > > On Tue,

Re: [ceph-users] CephFS is not maintaining consistency

2016-02-02 Thread Yan, Zheng
n using ceph-fuse on 4.3.5 kernel ? (fuse mount can also be affected by kernel bug) Regards Yan, Zheng > > Anyway I will file the ceph-fuse bug then. > > On Tue, Feb 2, 2016 at 12:43 PM, Yan, Zheng wrote: > > On Tue, Feb 2, 2016 at 5:32 PM, Mykola Dvornik > wrote: > &g

Re: [ceph-users] mds0: Client X failing to respond to capability release

2016-02-03 Thread Yan, Zheng
ow. > We're already far away from centos-Dist-Kernel. but upgrading to 4.4.x > for the clients should be possible if that might help. > The MDS log should contain messages like: client. isn't responding to mclientcaps(revoke). Please send these messages to us. Regards Yan,

Re: [ceph-users] MDS: bad/negative dir size

2016-02-03 Thread Yan, Zheng
ix this error? Would it be safe to delete the affected directory? > Markus Directory 603 is a special directory used internally by the MDS. The error has already been fixed by the time the MDS reports these messages. No need to worry. Regards Yan, Zheng

Re: [ceph-users] mds0: Client X failing to respond to capability release

2016-02-03 Thread Yan, Zheng
> On Feb 3, 2016, at 21:50, Michael Metz-Martini | SpeedPartner GmbH > wrote: > > Hi, > > Am 03.02.2016 um 12:11 schrieb Yan, Zheng: >>> On Feb 3, 2016, at 17:39, Michael Metz-Martini | SpeedPartner GmbH >>> wrote: >>> Am 03.02.2016 um 10:26 sch

Re: [ceph-users] mds0: Client X failing to respond to capability release

2016-02-04 Thread Yan, Zheng
On Thu, Feb 4, 2016 at 4:36 PM, Michael Metz-Martini | SpeedPartner GmbH wrote: > Hi, > > Am 03.02.2016 um 15:55 schrieb Yan, Zheng: >>> On Feb 3, 2016, at 21:50, Michael Metz-Martini | SpeedPartner GmbH >>> wrote: >>> Am 03.02.2016 um 12:11 schrieb Yan, Z

Re: [ceph-users] mds0: Client X failing to respond to capability release

2016-02-04 Thread Yan, Zheng
> On Feb 4, 2016, at 17:00, Michael Metz-Martini | SpeedPartner GmbH > wrote: > > Hi, > > Am 04.02.2016 um 09:43 schrieb Yan, Zheng: >> On Thu, Feb 4, 2016 at 4:36 PM, Michael Metz-Martini | SpeedPartner >> GmbH wrote: >>> Am 03.02.2016 um 15:55 schri

Re: [ceph-users] mds0: Client X failing to respond to capability release

2016-02-05 Thread Yan, Zheng
> On Feb 6, 2016, at 13:41, Michael Metz-Martini | SpeedPartner GmbH > wrote: > > Hi, > > sorry for the delay - productional system unfortunately ;-( > > Am 04.02.2016 um 15:38 schrieb Yan, Zheng: >>> On Feb 4, 2016, at 17:00, Michael Metz-Martini | Spee

Re: [ceph-users] Cannot mount cephfs after some disaster recovery

2016-03-01 Thread Yan, Zheng
On Tue, Mar 1, 2016 at 11:51 AM, 1 <10...@candesoft.com> wrote: > Hi, > I meet a trouble on mount the cephfs after doing some disaster recovery > introducing by official > document(http://docs.ceph.com/docs/master/cephfs/disaster-recovery). > Now when I try to mount the cephfs, I get "m

Re: [ceph-users] MDS memory sizing

2016-03-01 Thread Yan, Zheng
f RAM then be enough for the > ceph-mon/ceph-msd nodes? > Each file inode in the MDS uses about 2k of memory (independent of file size). MDS memory usage depends on how large the active file set is. Regards Yan, Zheng > Thanks > Dietmar > >

Re: [ceph-users] Manual or fstab mount on Ceph FS

2016-03-01 Thread Yan, Zheng
the kernel shipped by 14.04 is a little old, please update it if possible. Yan, Zheng > > Thanks in advance,

Re: [ceph-users] cephfs infernalis (ceph version 9.2.1) - bonnie++

2016-03-20 Thread Yan, Zheng
files in sequential order...Bonnie: drastic I/O error (rmdir): > Directory not empty > Cleaning up test directory after error. Please check if there are leftover files in the test directory. This seems like a readdir bug (some files are missing in the readdir result) in an old kernel. Which version o

Re: [ceph-users] cephfs infernalis (ceph version 9.2.1) - bonnie++

2016-03-21 Thread Yan, Zheng
On Mon, Mar 21, 2016 at 2:33 PM, Michael Hanscho wrote: > On 2016-03-21 05:07, Yan, Zheng wrote: >> On Sat, Mar 19, 2016 at 9:38 AM, Michael Hanscho wrote: >>> Hi! >>> >>> Trying to run bonnie++ on cephfs mounted via the kernel driver on a >

Re: [ceph-users] DONTNEED fadvise flag

2016-03-21 Thread Yan, Zheng
ncluding cephfs kernel client) that use page cache. Yan, Zheng > On 16/03/16 20:29, Gregory Farnum wrote: >> On Wed, Mar 16, 2016 at 9:46 AM, Kenneth Waegeman >> wrote: >>> Hi all, >>> >>> Quick question: Does cephFS pass the fadvise DONTNEED flag and tak

Re: [ceph-users] DONTNEED fadvise flag

2016-03-23 Thread Yan, Zheng
> On Mar 24, 2016, at 01:28, Gregory Farnum wrote: > > On Mon, Mar 21, 2016 at 6:02 AM, Yan, Zheng wrote: >> >>> On Mar 21, 2016, at 18:17, Kenneth Waegeman >>> wrote: >>> >>> Thanks! As we are using the kernel client of EL7,

Re: [ceph-users] cephfs rm -rf on directory of 160TB /40M files

2016-04-04 Thread Yan, Zheng
ts are handled. If the MDS's CPU usage is less than 100%, you can try running multiple instances of 'rm -rf' (each one removing a different sub-directory). Regards Yan, Zheng >> >> Thank you very much! >> >> Kenneth >> >

Re: [ceph-users] Frozen Client Mounts

2016-04-04 Thread Yan, Zheng
On Sat, Apr 2, 2016 at 12:45 AM, Diego Castro wrote: > Ok, i got it. > Having a stable network will save the system from a node crash? What happens > if a osd goes down? Will the clients suffer from freeze mounts and things > like that? If an osd goes down, other osds will take its place and serv

Re: [ceph-users] ceph mds error

2016-04-08 Thread Yan, Zheng
newest el7 kernel. Regards Yan, Zheng > Regards > Prabu GJ > > > On Tue, 05 Apr 2016 16:43:27 +0530 John Spray wrote > > > Usually we see those warning from older clients which have some bugs. > You should use the most recent client version you can (or the mo

Re: [ceph-users] CephFS: Issues handling thousands of files under the same dir (?)

2016-04-17 Thread Yan, Zheng
che for inodes and we were wondering is there is some cache tuning > we can do with the FUSE client. Please run 'ceph daemon mds.xxx dump_ops_in_flight' a few times while running the MPI application, save the outputs and send them to us. Hopefully, they will give us hints where does t
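A small sketch of the sampling suggested above (mds.<name> and the 5-second interval are placeholders; adjust to the active MDS daemon and the length of the MPI run):

    # snapshot in-flight MDS operations a few times while the MPI job runs
    for i in 1 2 3 4 5; do
        ceph daemon mds.<name> dump_ops_in_flight > ops_in_flight.$i
        sleep 5
    done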

Re: [ceph-users] cephfs does not seem to properly free up space

2016-04-19 Thread Yan, Zheng
have you ever used fancy layout? see http://tracker.ceph.com/issues/15050 On Wed, Apr 20, 2016 at 3:17 AM, Simion Rad wrote: > Mounting and unmount doesn't change anyting. > The used space reported by df command is nearly the same as the values > returned by ceph -s command. > > Example 1, df

Re: [ceph-users] mds segfault on cephfs snapshot creation

2016-04-20 Thread Yan, Zheng
On Wed, Apr 20, 2016 at 12:12 PM, Brady Deetz wrote: > As soon as I create a snapshot on the root of my test cephfs deployment with > a single file within the root, my mds server kernel panics. I understand > that snapshots are not recommended. Is it beneficial to developers for me to > leave my c

Re: [ceph-users] cephfs does not seem to properly free up space

2016-04-20 Thread Yan, Zheng
fficult to write a script to find all orphan objects and delete them. If there are multiple data pools, repeat above steps for each data pool. Regards Yan, Zheng On Wed, Apr 20, 2016 at 4:20 PM, Simion Rad wrote: > Yes, we do use customized layout settings for most of our folders. > We h

Re: [ceph-users] mds segfault on cephfs snapshot creation

2016-04-20 Thread Yan, Zheng
On Wed, Apr 20, 2016 at 11:52 PM, Brady Deetz wrote: > > > On Wed, Apr 20, 2016 at 4:09 AM, Yan, Zheng wrote: >> >> On Wed, Apr 20, 2016 at 12:12 PM, Brady Deetz wrote: >> > As soon as I create a snapshot on the root of my test cephfs deployment >> > with

Re: [ceph-users] cephfs kernel client blocks when removing large files

2018-10-21 Thread Yan, Zheng
ible that heavy data IO slowed down metadata IO? > > Test results are from a new pre-production cluster that does not have any > significant data IO. We've also confirmed the same behaviour on another > cluster with similar configuration. Both clusters have separate > device-c

Re: [ceph-users] cephfs kernel client blocks when removing large files

2018-10-22 Thread Yan, Zheng
/mnt/cephfs_mountpoint/test1 > Result: delay ~16 seconds > real 0m16.818s > user 0m0.000s > s

Re: [ceph-users] cephfs kernel client blocks when removing large files

2018-10-22 Thread Yan, Zheng
> time ls /mnt/cephfs_mountpoint/test1 > Result: delay ~16 seconds >

Re: [ceph-users] Do you ever encountered a similar deadlock cephfs stack?

2018-10-22 Thread Yan, Zheng
Oct 20 15:11:40 2018] ? nfsd_destroy+0x60/0x60 [nfsd] > [Sat Oct 20 15:11:40 2018] ? kthread_park+0x60/0x60 > [Sat Oct 20 15:11:40 2018] ret_from_fork+0x25/0x30 I did see this before. Please run ‘echo t > /proc/sysrq-trigger’ and send the kernel log to us if you encounter this aga
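A sketch of that capture step (the log file name is illustrative; enabling sysrq first is an assumption in case it is disabled):

    # allow sysrq if it is disabled, then dump all kernel task states
    echo 1 > /proc/sys/kernel/sysrq
    echo t > /proc/sysrq-trigger
    # the task dump lands in the kernel ring buffer
    dmesg > task-states.log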

Re: [ceph-users] Ceph mds memory leak while replay

2018-10-26 Thread Yan, Zheng
On Fri, Oct 26, 2018 at 2:41 AM Johannes Schlueter wrote: > > Hello, > > os: ubuntu bionic lts > ceph v12.2.7 luminous (on one node we updated to ceph-mds 12.2.8 with no luck) > 2 mds and 1 backup mds > > we just experienced a problem while restarting a mds. As it has begun to > replay the journa

Re: [ceph-users] Ceph mds memory leak while replay

2018-10-26 Thread Yan, Zheng
> IMPORTFINISH: 197 > IMPORTSTART: 197 > OPEN: 28096 > SESSION: 2 > SESSIONS: 64 > SLAVEUPDATE: 8440 > SUBTREEMAP: 256 > UPDATE: 124222 > Errors: 0 > How about rank 1? (cephfs-journal-tool --rank 1 event get summary) > Yan, Zheng wrote on Fri., 26. O
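A sketch of the per-rank journal check mentioned above (the filesystem name is a placeholder; depending on the release the rank is given alone or as <fs>:<rank>):

    # summarize journal events for rank 1 of the filesystem
    cephfs-journal-tool --rank=<fs_name>:1 event get summary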

Re: [ceph-users] Ceph mds memory leak while replay

2018-10-26 Thread Yan, Zheng
Reset the source code and apply the attached patch. It should resolve the memory issue. good luck Yan, Zheng On Fri, Oct 26, 2018 at 2:41 AM Johannes Schlueter wrote: > Hello, > > os: ubuntu bionic lts > ceph v12.2.7 luminous (on one node we updated to ceph-mds 12.2.8 with no

Re: [ceph-users] Lost machine with MON and MDS

2018-10-26 Thread Yan, Zheng
ceph-mds stores all its data in the object store. You just need to create a new ceph-mds on another machine. On Sat, Oct 27, 2018 at 1:40 AM Maiko de Andrade wrote: > Hi, > > I have 3 machine with ceph config with cephfs. But I lost one machine, > just with mon and mds. It's possible recovey cephfs? If

Re: [ceph-users] ceph-mds failure replaying journal

2018-10-29 Thread Yan, Zheng
We backported a wrong patch to 13.2.2. Downgrade ceph to 13.2.1, then run 'ceph mds repaired fido_fs:1'. Sorry for the trouble. Yan, Zheng On Mon, Oct 29, 2018 at 7:48 AM Jon Morby wrote: > > We accidentally found ourselves upgraded from 12.2.8 to 13.2.2 after a > ceph-

Re: [ceph-users] ceph-mds failure replaying journal

2018-10-29 Thread Yan, Zheng
upted monitor's data when downgrading ceph-mon from minic to luminous). > > > - On 29 Oct, 2018, at 08:19, Yan, Zheng wrote: > > > We backported a wrong patch to 13.2.2. downgrade ceph to 13.2.1, then run > 'ceph mds repaired fido_fs:1" . > Sorry for

Re: [ceph-users] ceph-mds failure replaying journal

2018-10-29 Thread Yan, Zheng
Please try again with debug_mds=10 and send the log to me. Regards Yan, Zheng On Mon, Oct 29, 2018 at 6:30 PM Jon Morby (Fido) wrote: > fyi, downgrading to 13.2.1 doesn't seem to have fixed the issue either :( > > --- end dump of recent events --- > 2018-10-29 10:27:50.440 7feb58b43

Re: [ceph-users] ceph-mds failure replaying journal

2018-10-29 Thread Yan, Zheng
way, I'm currently getting errors that 13.2.1 isn't available / >> shaman is offline / etc >> >> What's the best / recommended way of doing this downgrade across our >> estate? >> >> > You have already upgraded ceph-mon. I don't know If it can be

Re: [ceph-users] is it right involving cap->session_caps without lock protection in the two functions ?

2018-10-30 Thread Yan, Zheng
> On Oct 30, 2018, at 18:10, ? ? wrote: > > Hello: > Recently, we have encountered a kernel dead question, and the reason > we analyses vmcore dmesg is that list_add_tail(&cap->session_caps) in > __ceph_remove_cap has wrong,since &cap->session_cap is NULL! > so we analyses codes with

Re: [ceph-users] is it right involving cap->session_caps without lock protection in the two functions ?

2018-10-30 Thread Yan, Zheng
needed by > list_del(&cap->session_caps)? > Spin_lock is not needed here because ‘dispose' is a local variable. No one else can touch items in it. > From: Yan, Zheng [mailto:z...@redhat.com] > Sent: 2018-10-30 20:34 > To: ? ? > Cc: ceph-users > Subject: Re: is it

Re: [ceph-users] Using Cephfs Snapshots in Luminous

2018-11-12 Thread Yan, Zheng
On Mon, Nov 12, 2018 at 3:53 PM Felix Stolte wrote: > > Hi folks, > > is anybody using cephfs with snapshots on luminous? Cephfs snapshots are > declared stable in mimic, but I'd like to know about the risks using > them on luminous. Do I risk a complete cephfs failure or just some not > working s

Re: [ceph-users] get cephfs mounting clients' infomation

2018-11-18 Thread Yan, Zheng
'ceph daemon mds.xx session ls' On Mon, Nov 19, 2018 at 2:40 PM Zhenshi Zhou wrote: > > Hi, > > I have a cluster providing cephfs and it looks well. But as times > goes by, more and more clients use it. I wanna write a script > for getting the clients' informations so that I can keep everything >

Re: [ceph-users] get cephfs mounting clients' infomation

2018-11-18 Thread Yan, Zheng
whole > cephfs usage from the server. For instance, I have /docker, /kvm, /backup, > etc. > I wanna know how much space is taken up by each of them. > 'getfattr -d -m - sub-dir' > Thanks. > > Yan, Zheng wrote on Mon, Nov 19, 2018 at 2:50 PM: >> >> 'ceph daemon
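A sketch of the per-directory accounting answered above (the mount point and directory name follow the question; querying the named vxattrs is an alternative if 'getfattr -d -m -' does not list them on your kernel):

    # recursive bytes and entry counts tracked by CephFS for a directory
    getfattr -n ceph.dir.rbytes /mnt/cephfs/docker
    getfattr -n ceph.dir.rentries /mnt/cephfs/docker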

Re: [ceph-users] how to mount one of the cephfs namespace using ceph-fuse?

2018-11-20 Thread Yan, Zheng
ceph-fuse --client_mds_namespace=xxx On Tue, Nov 20, 2018 at 7:33 PM ST Wong (ITSC) wrote: > > Hi all, > > > > We’re using mimic and enabled multiple fs flag. We can do kernel mount of particular fs (e.g. fs1) with mount option mds_namespace=fs1.However, this is not working for ceph-fuse:
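A minimal sketch of the fuse mount answered above (the filesystem name fs1 is taken from the question; the mount point is illustrative and credentials are omitted):

    # fuse-mount only the filesystem named fs1
    ceph-fuse --client_mds_namespace=fs1 /mnt/fs1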

Re: [ceph-users] MDS damaged after mimic 13.2.1 to 13.2.2 upgrade

2018-11-20 Thread Yan, Zheng
ase to 13.2.1. > > Thank you, > Chris Martin > > > Sorry, this is caused by a wrong backport. Downgrading mds to 13.2.1 and > > marking mds repaired can resolve this. > > > > Yan, Zheng > > On Sat, Oct 6, 2018 at 8:26 AM Sergey Malinin wrote: > > > > >

Re: [ceph-users] CephFS file contains garbage zero padding after an unclean cluster shutdown

2018-11-25 Thread Yan, Zheng
On Mon, Nov 26, 2018 at 4:30 AM Hector Martin wrote: > > On 26/11/2018 00.19, Paul Emmerich wrote: > > No, wait. Which system did kernel panic? Your CephFS client running rsync? > > In this case this would be expected behavior because rsync doesn't > > sync on every block and you lost your file sy

Re: [ceph-users] CephFs CDir fnode version far less than subdir inode version causes mds can't start correctly

2018-11-25 Thread Yan, Zheng
Did you do any special operation (such as resetting the journal) before this happened? On Sat, Nov 24, 2018 at 3:58 PM 关云飞 wrote: > > hi, > >According to my understanding, the parent directory CDir fnode > version should be incremented if creating file or directory operation > happened. But there was

Re: [ceph-users] [cephfs] Kernel outage / timeout

2018-12-04 Thread Yan, Zheng
On Tue, Dec 4, 2018 at 6:55 PM wrote: > > Hi, > > I have some wild freeze using cephfs with the kernel driver > For instance: > [Tue Dec 4 10:57:48 2018] libceph: mon1 10.5.0.88:6789 session lost, > hunting for new mon > [Tue Dec 4 10:57:48 2018] libceph: mon2 10.5.0.89:6789 session established

Re: [ceph-users] 【cephfs】cephfs hung when scp/rsync large files

2018-12-05 Thread Yan, Zheng
Is the cephfs mount on the same machine that runs OSDs? On Wed, Dec 5, 2018 at 2:33 PM NingLi wrote: > > Hi all, > > We found that some process writing cephfs will hang for a long time (> 120s) > when uploading(scp/rsync) large files(totally 50G ~ 300G)to the app node's > cephfs mountpoint. > >

Re: [ceph-users] 【cephfs】cephfs hung when scp/rsync large files

2018-12-05 Thread Yan, Zheng
On Wed, Dec 5, 2018 at 2:33 PM NingLi wrote: > > Hi all, > > We found that some process writing cephfs will hang for a long time (> 120s) > when uploading(scp/rsync) large files(totally 50G ~ 300G)to the app node's > cephfs mountpoint. > > This problem is not always reproduciable. But when thi

Re: [ceph-users] 【cephfs】cephfs hung when scp/rsync large files

2018-12-05 Thread Yan, Zheng
l on the ceph storage > server side. > > > Anyway,I will have a try. > > — > Best Regards > Li, Ning > > > > > On Dec 6, 2018, at 11:41, Yan, Zheng wrote: > > > > On Wed, Dec 5, 2018 at 2:33 PM NingLi wrote: > >> > >> Hi all, > >

Re: [ceph-users] mds lost very frequently

2018-12-12 Thread Yan, Zheng
> > > > The full log is also attached. Could you please help us? Thanks! > > Please try

Re: [ceph-users] mds lost very frequently

2018-12-13 Thread Yan, Zheng
On Thu, Dec 13, 2018 at 9:25 PM Sang, Oliver wrote: > > Thanks a lot, Yan Zheng! > > Regarding the "set debug_mds =10 for standby mds (change debug_mds to 0 > after mds becomes active)." > Could you please explain the purpose? Just want to collect debug log, or it

Re: [ceph-users] mds lost very frequently

2018-12-13 Thread Yan, Zheng
On Fri, Dec 14, 2018 at 12:05 PM Sang, Oliver wrote: > > Thanks a lot, Yan Zheng! > > I enabled only 2 MDS - node1(active) and node2. Then I modified ceph.conf of > node2 to have - > debug_mds = 10/10 > > At 08:35:28, I observed degradation, the node1 was not a MDS

Re: [ceph-users] cephfs file block size: must it be so big?

2018-12-14 Thread Yan, Zheng
been discussed much? Is there a good reason that it's > the RADOS object size? > > I'm thinking of modifying the cephfs filesystem driver to add a mount option > to specify a fixed block size to be reported for all files, and using

Re: [ceph-users] cephfs client operation record

2019-01-01 Thread Yan, Zheng
On Wed, Jan 2, 2019 at 11:12 AM Zhenshi Zhou wrote: > > Hi all, > > I have a cluster on Luminous(12.2.8). > Is there a way I can check clients' operation records? > There is no way to do that. > Thanks

Re: [ceph-users] CephFS client df command showing raw space after adding second pool to mds

2019-01-03 Thread Yan, Zheng
On Fri, Jan 4, 2019 at 1:53 AM David C wrote: > > Hi All > > Luminous 12.2.12 > Single MDS > Replicated pools > > A 'df' on a CephFS kernel client used to show me the usable space (i.e the > raw space with the replication overhead applied). This was when I just had a > single cephfs data pool. >

Re: [ceph-users] MDS uses up to 150 GByte of memory during journal replay

2019-01-06 Thread Yan, Zheng
likely caused by http://tracker.ceph.com/issues/37399. Regards Yan, Zheng On Sat, Jan 5, 2019 at 5:44 PM Matthias Aebi wrote: > > Hello everyone, > > We are running a small cluster on 5 machines with 48 OSDs / 5 MDSs / 5 MONs > based on Luminous 12.2.10 and Debian Stretch 9.

Re: [ceph-users] cephfs : rsync backup create cache pressure on clients, filling caps

2019-01-06 Thread Yan, Zheng
On Fri, Jan 4, 2019 at 11:40 AM Alexandre DERUMIER wrote: > > Hi, > > I'm currently doing cephfs backup, through a dedicated clients mounting the > whole filesystem at root. > others clients are mounting part of the filesystem. (kernel cephfs clients) > > > I have around 22millions inodes, > > be

Re: [ceph-users] tuning ceph mds cache settings

2019-01-09 Thread Yan, Zheng
etween 700-1k/minute. > Could you please run the following command (for each active mds) when operations are fast and when operations are slow - for i in `seq 10`; do ceph daemon mds.xxx dump_historic_ops > mds.xxx.$i; sleep 1; done Then send the results to us. Regards Yan, Zheng > T
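The same sampling loop from the excerpt, laid out as a script sketch (mds.xxx stands for each active MDS daemon name):

    # take 10 snapshots of recent MDS operations, one second apart
    for i in `seq 10`; do
        ceph daemon mds.xxx dump_historic_ops > mds.xxx.$i
        sleep 1
    done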

Re: [ceph-users] Ceph MDS laggy

2019-01-13 Thread Yan, Zheng
ype 'set logging on' and 'thread apply all bt' inside gdb, and send the output to us. Yan, Zheng > -- > Adam > > On Sat, Jan 12, 2019 at 7:53 PM Adam Tygart wrote: > > > > On a hunch, I shutdown the compute nodes for our HPC cluster, and 10 > >

Re: [ceph-users] mds0: Metadata damage detected

2019-01-15 Thread Yan, Zheng
"\/public\/video\/3h\/3hG6X7\/screen-msmall"}] > Looks like object 1005607c727. in the cephfs metadata pool is corrupted. Please run the following commands and send the mds.0 log to us: ceph tell mds.0 injectargs '--debug_mds 10' ceph tell mds.0 dama
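A sketch of those debug commands (the continuation of the truncated last command is assumed to be 'damage ls', which lists recorded metadata damage; treat that continuation as an assumption):

    # raise MDS debug verbosity on the running daemon
    ceph tell mds.0 injectargs '--debug_mds 10'
    # assumed continuation of the truncated command above: list recorded damage
    ceph tell mds.0 damage ls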

Re: [ceph-users] Process stuck in D+ on cephfs mount

2019-01-20 Thread Yan, Zheng
check /proc//stack to find where it is stuck On Mon, Jan 21, 2019 at 5:51 AM Marc Roos wrote: > > > I have a process stuck in D+ writing to cephfs kernel mount. Anything > can be done about this? (without rebooting) > > > CentOS Linux release 7.5.1804 (Core) > Linux 3.10.0-514.21.2.el7.x86_64 > >
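A small sketch of that check (<pid> is a placeholder for the stuck process ID):

    # list processes in uninterruptible sleep (D state)
    ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'
    # then see where a given one is blocked in the kernel
    cat /proc/<pid>/stack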

Re: [ceph-users] Ceph MDS laggy

2019-01-20 Thread Yan, Zheng
It's http://tracker.ceph.com/issues/37977. Thanks for your help. Regards Yan, Zheng On Sun, Jan 20, 2019 at 12:40 AM Adam Tygart wrote: > > It worked for about a week, and then seems to have locked up again. > > Here is the back trace from the threads on the mds: > http

Re: [ceph-users] MDS performance issue

2019-01-20 Thread Yan, Zheng
On Mon, Jan 21, 2019 at 11:16 AM Albert Yue wrote: > > Dear Ceph Users, > > We have set up a cephFS cluster with 6 osd machines, each with 16 8TB > harddisk. Ceph version is luminous 12.2.5. We created one data pool with > these hard disks and created another meta data pool with 3 ssd. We create

Re: [ceph-users] MDS performance issue

2019-01-21 Thread Yan, Zheng
On Mon, Jan 21, 2019 at 12:12 PM Albert Yue wrote: > > Hi Yan Zheng, > > 1. mds cache limit is set to 64GB > 2. we get the size of meta data pool by running `ceph df` and saw meta data > pool just used 200MB space. > That's very strange. One file uses about 1k me

Re: [ceph-users] Process stuck in D+ on cephfs mount

2019-01-21 Thread Yan, Zheng
; > no, there is no config for request timeout > > -Original Message- > From: Yan, Zheng [mailto:uker...@gmail.com] > Sent: 21 January 2019 02:50 > To: Marc Roos > Cc: ceph-users > Subject: Re: [ceph-users] Process stuck in D+ on cephfs mount > > check /proc//

Re: [ceph-users] MDS performance issue

2019-01-21 Thread Yan, Zheng
On Mon, Jan 21, 2019 at 11:16 AM Albert Yue wrote: > > Dear Ceph Users, > > We have set up a cephFS cluster with 6 osd machines, each with 16 8TB > harddisk. Ceph version is luminous 12.2.5. We created one data pool with > these hard disks and created another meta data pool with 3 ssd. We create

Re: [ceph-users] MDS performance issue

2019-01-22 Thread Yan, Zheng
On Tue, Jan 22, 2019 at 10:49 AM Albert Yue wrote: > > Hi Yan Zheng, > > In your opinion, can we resolve this issue by move MDS to a 512GB or 1TB > memory machine? > The problem is on the client side, especially clients with large memory. I don't think enlarging the mds cache siz

Re: [ceph-users] Broken CephFS stray entries?

2019-01-22 Thread Yan, Zheng
> } else { > - clog->error() << "unmatched fragstat on " << ino() << ", inode > has " > + clog->warn() << "unmatched fragstat on " << ino() << ", inode has > " >

Re: [ceph-users] Process stuck in D+ on cephfs mount

2019-01-22 Thread Yan, Zheng
On Wed, Jan 23, 2019 at 5:50 AM Marc Roos wrote: > > > I got one again > > [] wait_on_page_bit_killable+0x83/0xa0 > [] __lock_page_or_retry+0xb2/0xc0 > [] filemap_fault+0x3b7/0x410 > [] ceph_filemap_fault+0x13c/0x310 [ceph] > [] __do_fault+0x4c/0xc0 > [] do_read_fault.isra.42+0x43/0x130 > [] handl

Re: [ceph-users] cephfs performance degraded very fast

2019-01-22 Thread Yan, Zheng
On Tue, Jan 22, 2019 at 8:24 PM renjianxinlover wrote: > > hi, >at some time, as cache pressure or caps release failure, client apps mount > got stuck. >my use case is in kubernetes cluster and automatic kernel client mount in > nodes. >is anyone faced with same issue or has related

Re: [ceph-users] Broken CephFS stray entries?

2019-01-22 Thread Yan, Zheng
On Tue, Jan 22, 2019 at 10:42 PM Dan van der Ster wrote: > > On Tue, Jan 22, 2019 at 3:33 PM Yan, Zheng wrote: > > > > On Tue, Jan 22, 2019 at 9:08 PM Dan van der Ster > > wrote: > > > > > > Hi Zheng, > > > > > > We also just saw

Re: [ceph-users] MDS performance issue

2019-01-22 Thread Yan, Zheng
00G metadata, mds may need 1T or more memory. > On Tue, Jan 22, 2019 at 5:48 PM Yan, Zheng wrote: >> >> On Tue, Jan 22, 2019 at 10:49 AM Albert Yue >> wrote: >> > >> > Hi Yan Zheng, >> > >> > In your opinion, can we resolve this issue by mo

Re: [ceph-users] Process stuck in D+ on cephfs mount

2019-01-23 Thread Yan, Zheng
On Wed, Jan 23, 2019 at 6:07 PM Marc Roos wrote: > > Yes sort of. I do have an inconsistent pg for a while, but it is on a > different pool. But I take it this is related to a networking issue I > currently have with rsync and broken pipe. > > Where exactly does it go wrong? The cephfs kernel clie

Re: [ceph-users] MDS performance issue

2019-01-27 Thread Yan, Zheng
On Mon, Jan 28, 2019 at 10:34 AM Albert Yue wrote: > > Hi Yan Zheng, > > Our clients are also complaining about operations like 'du' or 'ncdu' being > very slow. Is there any alternative tool for such kind of operation on > CephFS? Thanks! > 'du

Re: [ceph-users] how to debug a stuck cephfs?

2019-01-27 Thread Yan, Zheng
http://docs.ceph.com/docs/master/cephfs/troubleshooting/ For your case, it's likely the client got evicted by the MDS. On Mon, Jan 28, 2019 at 9:50 AM Sang, Oliver wrote: > > Hello, > > > > Our cephfs looks just stuck. If I run some command such like ‘makdir’, > ‘touch’ a new file, it just stuck there

Re: [ceph-users] ceph-fs crashed after upgrade to 13.2.4

2019-01-29 Thread Yan, Zheng
Upgraded from which version? Have you tried downgrading ceph-mds to the old version? On Mon, Jan 28, 2019 at 9:20 PM Ansgar Jazdzewski wrote: > > hi folks we need some help with our cephfs, all mds keep crashing > > starting mds.mds02 at - > terminate called after throwing an instance of > 'ceph::buffe

Re: [ceph-users] tuning ceph mds cache settings

2019-01-29 Thread Yan, Zheng
s section of ceph.conf), and use 'export_pin' to manually pin directories to MDS ranks (https://ceph.com/community/new-luminous-cephfs-subtree-pinning/) > > On Wed, Jan 9, 2019 at 9:10 PM Yan, Zheng wrote: >> >> [...] >> Could you please run following command (for
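A sketch of the subtree pinning mentioned above (the path and rank are illustrative; the pin is set through a virtual xattr on the directory):

    # pin a directory subtree to MDS rank 0
    setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/some-dir
    # a value of -1 hands the subtree back to the default balancer
    setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/some-dir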

Re: [ceph-users] cephfs constantly strays ( num_strays)

2019-01-29 Thread Yan, Zheng
Nothing to worry about. On Sun, Jan 27, 2019 at 10:13 PM Marc Roos wrote: > > > I have constantly strays. What are strays? Why do I have them? Is this > bad? > > > > [@~]# ceph daemon mds.c perf dump | grep num_stray > "num_strays": 25823, > "num_strays_delayed": 0, > "nu

Re: [ceph-users] ceph-fs crashed after upgrade to 13.2.4

2019-01-29 Thread Yan, Zheng
01-28 14:46:47.983292 59=0+59), dirfrag has f(v0 > m2019-01-28 14:46:47.983292 58=0+58) > log [ERR] : unmatched rstat rbytes on single dirfrag 0x10002253db6, > inode has n(v11 rc2019-01-28 14:46:47.983292 b1478 71=11+60), dirfrag > has n(v11 rc2019-01-28 14:46:47.983292 b1347 68=10+58) >

Re: [ceph-users] tuning ceph mds cache settings

2019-01-29 Thread Yan, Zheng
On Tue, Jan 29, 2019 at 9:05 PM Jonathan Woytek wrote: > > On Tue, Jan 29, 2019 at 7:12 AM Yan, Zheng wrote: >> >> Looks like you have 5 active mds. I suspect your issue is related to >> load balancer. Please try disabling mds load balancer (add >> "mds_bal_m

Re: [ceph-users] MDS crash (Mimic 13.2.2 / 13.2.4 ) elist.h: 39: FAILED assert(!is_on_list())

2019-02-11 Thread Yan, Zheng
On Sat, Feb 9, 2019 at 12:36 AM Jake Grimmett wrote: > > Dear All, > > Unfortunately the MDS has crashed on our Mimic cluster... > > First symptoms were rsync giving: > "No space left on device (28)" > when trying to rename or delete > > This prompted me to try restarting the MDS, as it reported l

Re: [ceph-users] Controlling CephFS hard link "primary name" for recursive stat

2019-02-11 Thread Yan, Zheng
On Sat, Feb 9, 2019 at 8:10 AM Hector Martin wrote: > > Hi list, > > As I understand it, CephFS implements hard links as effectively "smart > soft links", where one link is the primary for the inode and the others > effectively reference it. When it comes to directories, the size for a > hardlinke

Re: [ceph-users] MDS crash (Mimic 13.2.2 / 13.2.4 ) elist.h: 39: FAILED assert(!is_on_list())

2019-02-11 Thread Yan, Zheng
> mds_cache_size = 8589934592 > mds_cache_memory_limit = 17179869184 > > Should these values be left in our configuration? No. You'd better change them back to the original values. > > again thanks for the assistance, > > Jake > > On 2/11/19 8:17 AM, Yan, Zheng wro

Re: [ceph-users] CephFS: client hangs

2019-02-18 Thread Yan, Zheng
On Mon, Feb 18, 2019 at 10:55 PM Hennen, Christian wrote: > > Dear Community, > > > > we are running a Ceph Luminous Cluster with CephFS (Bluestore OSDs). During > setup, we made the mistake of configuring the OSDs on RAID Volumes. Initially > our cluster consisted of 3 nodes, each housing 1 OSD

Re: [ceph-users] CephFS: client hangs

2019-02-19 Thread Yan, Zheng
On Tue, Feb 19, 2019 at 5:10 PM Hennen, Christian wrote: > > Hi! > > >mon_max_pg_per_osd = 400 > > > >In the ceph.conf and then restart all the services / or inject the config > >into the running admin > > I restarted each server (MONs and OSDs weren’t enough) and now the health > warning is gone

Re: [ceph-users] Cephfs recursive stats | rctime in the future

2019-02-28 Thread Yan, Zheng
On Thu, Feb 28, 2019 at 5:33 PM David C wrote: > > On Wed, Feb 27, 2019 at 11:35 AM Hector Martin wrote: >> >> On 27/02/2019 19:22, David C wrote: >> > Hi All >> > >> > I'm seeing quite a few directories in my filesystem with rctime years in >> > the future. E.g >> > >> > ]# getfattr -d -m ceph.d

Re: [ceph-users] Can CephFS Kernel Client Not Read & Write at the Same Time?

2019-03-07 Thread Yan, Zheng
The CephFS kernel mount blocks reads while another client has dirty data in its page cache. The cache coherency rules look like: state 1 - only one client opens a file for read/write; the client can use the page cache. state 2 - multiple clients open a file for read and no client opens the file for write; client

Re: [ceph-users] CephFS - large omap object

2019-03-18 Thread Yan, Zheng
cephfs does not create/use object "4.". Please show us some of its keys. On Mon, Mar 18, 2019 at 4:16 PM Dylan McCulloch wrote: > > Hi all, > > We have a large omap object warning on one of our Ceph clusters. > The only reports I've seen regarding the "large omap objects" warning from >

Re: [ceph-users] CephFS - large omap object

2019-03-18 Thread Yan, Zheng
On Mon, Mar 18, 2019 at 6:05 PM Dylan McCulloch wrote: > > > > > >cephfs does not create/use object "4.". Please show us some > >of its keys. > > > > https://pastebin.com/WLfLTgni > Thanks > Is the object recently modified? rados -p hpcfs_metadata stat 4. > >On Mon, Mar 18, 2

Re: [ceph-users] CephFS - large omap object

2019-03-18 Thread Yan, Zheng
please check if 4. has omap header and xattrs rados -p hpcfs_data listxattr 4. rados -p hpcfs_data getomapheader 4. On Mon, Mar 18, 2019 at 7:37 PM Dylan McCulloch wrote: > > >> > > >> >cephfs does not create/use object "4.". Please show us some > >> >of its ke

Re: [ceph-users] CephFS - large omap object

2019-03-18 Thread Yan, Zheng
Please run the following commands. They will show where 4. is: rados -p hpcfs_metadata getxattr 4. parent >/tmp/parent ceph-dencoder import /tmp/parent type inode_backtrace_t decode dump_json On Mon, Mar 18, 2019 at 8:15 PM Dylan McCulloch wrote: > > >> >> >cephfs does not create/u
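The same lookup laid out as a sketch (<object-id> stands for the full object name, which appears truncated to "4." in the excerpt; the pool name follows the thread):

    # fetch the backtrace xattr of the object and decode it to find the file path
    rados -p hpcfs_metadata getxattr <object-id> parent > /tmp/parent
    ceph-dencoder import /tmp/parent type inode_backtrace_t decode dump_json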

Re: [ceph-users] CephFS - large omap object

2019-03-18 Thread Yan, Zheng
On Mon, Mar 18, 2019 at 9:50 PM Dylan McCulloch wrote: > > > >please run following command. It will show where is 4. > > > >rados -p -p hpcfs_metadata getxattr 4. parent >/tmp/parent > >ceph-dencoder import /tmp/parent type inode_backtrace_t decode dump_json > > > > $ ceph-dencoder

Re: [ceph-users] CephFS: effects of using hard links

2019-03-20 Thread Yan, Zheng
't changed. A hard link in cephfs is a magic symbolic link. Its main overhead is at open. Regards Yan, Zheng 2. Is there any performance (dis)advantage? Generally not once the file is open. 3. When using hard links, is there an actual space savings, or is there some trickery

Re: [ceph-users] Ceph MDS laggy

2019-03-25 Thread Yan, Zheng
On Mon, Mar 25, 2019 at 6:36 PM Mark Schouten wrote: > > On Mon, Jan 21, 2019 at 10:17:31AM +0800, Yan, Zheng wrote: > > It's http://tracker.ceph.com/issues/37977. Thanks for your help. > > > > I think I've hit this bug. Ceph MDS using 100% ceph and reporting as

Re: [ceph-users] co-located cephfs client deadlock

2019-04-01 Thread Yan, Zheng
On Mon, Apr 1, 2019 at 6:45 PM Dan van der Ster wrote: > > Hi all, > > We have been benchmarking a hyperconverged cephfs cluster (kernel > clients + osd on same machines) for awhile. Over the weekend (for the > first time) we had one cephfs mount deadlock while some clients were > running ior. > >

Re: [ceph-users] MDS stuck at replaying status

2019-04-02 Thread Yan, Zheng
Please set debug_mds=10 and try again. On Tue, Apr 2, 2019 at 1:01 PM Albert Yue wrote: > > Hi, > > This happens after we restart the active MDS, and somehow the standby MDS > daemon cannot take over successfully and is stuck at up:replaying. It is > showing the following log. Any idea on how t

Re: [ceph-users] inline_data (was: CephFS and many small files)

2019-04-02 Thread Yan, Zheng
ture. We don't have plans to mark this feature stable (we will probably remove this feature in the future). Yan, Zheng > $ ceph fs dump | grep inline_data > dumped fsmap epoch 1224 > inline_data enabled > > I have reduced the size of the bonnie-generated files to 1
