Re: [ceph-users] inline_data (was: CephFS and many small files)

2019-04-02 Thread Yan, Zheng
On Tue, Apr 2, 2019 at 9:05 PM Yan, Zheng wrote: > > On Tue, Apr 2, 2019 at 8:23 PM Clausen, Jörn wrote: > > > > Hi! > > > > Am 29.03.2019 um 23:56 schrieb Paul Emmerich: > > > There's also some metadata overhead etc. You might want to consider

Re: [ceph-users] inline_data (was: CephFS and many small files)

2019-04-02 Thread Yan, Zheng
On Tue, Apr 2, 2019 at 9:10 PM Paul Emmerich wrote: > > On Tue, Apr 2, 2019 at 3:05 PM Yan, Zheng wrote: > > > > On Tue, Apr 2, 2019 at 8:23 PM Clausen, Jörn wrote: > > > > > > Hi! > > > > > > Am 29.03.2019 um 23:56 schrieb Paul Emmerich

Re: [ceph-users] MDS allocates all memory (>500G) replaying, OOM-killed, repeat

2019-04-02 Thread Yan, Zheng
Looks like http://tracker.ceph.com/issues/37399. which version of ceph-mds do you use? On Tue, Apr 2, 2019 at 7:47 AM Sergey Malinin wrote: > > These steps pretty well correspond to > http://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/ > Were you able to replay journal manually with no iss

Re: [ceph-users] "Failed to authpin" results in large number of blocked requests

2019-04-03 Thread Yan, Zheng
http://tracker.ceph.com/issues/25131 may relieve the issue. please try ceph version 13.2.5. Regards Yan, Zheng On Thu, Mar 28, 2019 at 6:02 PM Zoë O'Connell wrote: > > We're running a Ceph mimic (13.2.4) cluster which is predominantly used > for CephFS. We have recentl

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-15 Thread Yan, Zheng
On Wed, May 15, 2019 at 9:34 PM Frank Schilder wrote: > > Dear Stefan, > > thanks for the fast reply. We encountered the problem again, this time in a > much simpler situation; please see below. However, let me start with your > questions first: > > What bug? -- In a single-active MDS set-up, sh

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-16 Thread Yan, Zheng
ns, to reproduce the issue I will create a directory with many > entries and execute a test with the many-clients single-file-read load on it. > try setting mds_bal_split_rd and mds_bal_split_wr to a very large value, which prevents the mds from splitting hot dirfrags. Regards Yan, Zheng > I hop
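A minimal sketch of how those thresholds could be raised, assuming a single active mds named "a"; the values are placeholders, not tuned recommendations:

    ceph tell mds.a injectargs '--mds_bal_split_rd=100000 --mds_bal_split_wr=100000'
    # or persist the change via the monitor config store (Mimic and later):
    ceph config set mds mds_bal_split_rd 100000
    ceph config set mds mds_bal_split_wr 100000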

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-20 Thread Yan, Zheng
dirfrag.zip?l"; . Its a bit > more than 100MB. > MSD cache dump shows there is a snapshot related. Please avoid using snapshot until we fix the bug. Regards Yan, Zheng > The active MDS failed over to the standby after or during the dump cache > operation. Is this expected? As a re

Re: [ceph-users] Cephfs client evicted, how to unmount the filesystem on the client?

2019-05-22 Thread Yan, Zheng
try 'umount -f' On Tue, May 21, 2019 at 4:41 PM Marc Roos wrote: > > > > > > [@ceph]# ps -aux | grep D > USER PID %CPU %MEMVSZ RSS TTY STAT START TIME COMMAND > root 12527 0.0 0.0 123520 932 pts/1D+ 09:26 0:00 umount > /home/mail-archive > root 14549 0.2 0
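A sketch of the suggested force unmount, with a lazy detach as a fallback; the mount point is taken from the quoted ps output:

    umount -f /home/mail-archive
    # if the force unmount itself hangs, a lazy detach may still free the mount table entry:
    umount -l /home/mail-archive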

Re: [ceph-users] CephFS msg length greater than osd_max_write_size

2019-05-22 Thread Yan, Zheng
On Tue, May 21, 2019 at 6:10 AM Ryan Leimenstoll wrote: > > Hi all, > > We recently encountered an issue where our CephFS filesystem unexpectedly was > set to read-only. When we look at some of the logs from the daemons I can see > the following: > > On the MDS: > ... > 2019-05-18 16:34:24.341 7
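For reference, a sketch of how the limit named in this thread could be inspected and, if raising it turns out to be the chosen remedy, changed at runtime; the daemon id and value are assumptions:

    ceph daemon osd.0 config get osd_max_write_size
    ceph tell osd.* injectargs '--osd_max_write_size=180'   # value in MB, placeholder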

Re: [ceph-users] Quotas with Mimic (CephFS-FUSE) clients in a Luminous Cluster

2019-05-27 Thread Yan, Zheng
s-i-really-mean-it") but sadly, the max_bytes attribute is still not > > there > > (also not after remounting on the client / using the file creation and > > deletion trick). > > That's interesting - it suddenly started to work for one directory aft

Re: [ceph-users] CEPH MDS Damaged Metadata - recovery steps

2019-06-03 Thread Yan, Zheng
On Mon, Jun 3, 2019 at 3:06 PM James Wilkins wrote: > > Hi all, > > After a bit of advice to ensure we’re approaching this the right way. > > (version: 12.2.12, multi-mds, dirfrag is enabled) > > We have corrupt meta-data as identified by ceph > > health: HEALTH_ERR > 2 MDSs report

Re: [ceph-users] How to fix ceph MDS HEALTH_WARN

2019-06-05 Thread Yan, Zheng
On Thu, Jun 6, 2019 at 6:36 AM Jorge Garcia wrote: > > We have been testing a new installation of ceph (mimic 13.2.2) mostly > using cephfs (for now). The current test is just setting up a filesystem > for backups of our other filesystems. After rsyncing data for a few > days, we started getting t

Re: [ceph-users] MDS getattr op stuck in snapshot

2019-06-12 Thread Yan, Zheng
On Wed, Jun 12, 2019 at 3:26 PM Hector Martin wrote: > > Hi list, > > I have a setup where two clients mount the same filesystem and > read/write from mostly non-overlapping subsets of files (Dovecot mail > storage/indices). There is a third client that takes backups by > snapshotting the top-leve

Re: [ceph-users] How does cephfs ensure client cache consistency?

2019-06-18 Thread Yan, Zheng
On Tue, Jun 18, 2019 at 4:25 PM ?? ?? wrote: > > > > There are 2 clients, A and B. There is a directory /a/b/c/d/. > > Client A create a file /a/b/c/d/a.txt. > > Client B move the folder d to /a/. > > Now, this directory looks like this:/a/b/c/ and /a/d/. > > /a/b/c/d is not exist any more. > > Cl

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-06-21 Thread Yan, Zheng
On Fri, Jun 21, 2019 at 6:10 PM Frank Schilder wrote: > > Dear Yan, Zheng, > > does mimic 13.2.6 fix the snapshot issue? If not, could you please send me a > link to the issue tracker? > no https://tracker.ceph.com/issues/39987 > Thanks and best regards, > > ===

Re: [ceph-users] MDS getattr op stuck in snapshot

2019-06-29 Thread Yan, Zheng
On Fri, Jun 28, 2019 at 11:42 AM Hector Martin wrote: > > On 12/06/2019 22.33, Yan, Zheng wrote: > > I have tracked down the bug. Thank you for reporting this. 'echo 2 > > > /proc/sys/vm/drop_caches' should fix the hang. If you can compile ceph > > from sour
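The suggested client-side workaround, as a sketch to be run on the client with the hung mount:

    sync
    echo 2 | sudo tee /proc/sys/vm/drop_caches   # drop dentries and inodes only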

Re: [ceph-users] writable snapshots in cephfs? GDPR/DSGVO

2019-07-10 Thread Yan, Zheng
On Wed, Jul 10, 2019 at 4:16 PM Lars Täuber wrote: > > Hi everybody! > > Is it possible to make snapshots in cephfs writable? > We need to remove files because of this General Data Protection Regulation > also from snapshots. > It's possible (only deleting data), but it would need modifications to both mds and osd

Re: [ceph-users] HEALTH_WARN 1 MDSs report slow metadata IOs

2019-07-17 Thread Yan, Zheng
Check if there is any hung request in 'ceph daemon mds.xxx objecter_requests' On Tue, Jul 16, 2019 at 11:51 PM Dietmar Rieder wrote: > > On 7/16/19 4:11 PM, Dietmar Rieder wrote: > > Hi, > > > > We are running ceph version 14.1.2 with cephfs only. > > > > I just noticed that one of our pgs had s
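A sketch of the check being asked for; the mds daemon name is a placeholder and the command must run on the host that carries the mds admin socket:

    ceph daemon mds.cephmds01 objecter_requests   # look for long-standing OSD requests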

Re: [ceph-users] Mark CephFS inode as lost

2019-07-22 Thread Yan, Zheng
please create a ticket at http://tracker.ceph.com/projects/cephfs and upload mds log with debug_mds =10 On Tue, Jul 23, 2019 at 6:00 AM Robert LeBlanc wrote: > > We have a Luminous cluster which has filled up to 100% multiple times and > this causes an inode to be left in a bad state. Doing anyt
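A sketch of raising the mds debug level before reproducing, assuming a runtime change is acceptable; the mds name is a placeholder:

    ceph tell mds.mds1 injectargs '--debug_mds=10'
    # or, persistently, in ceph.conf under [mds]:  debug_mds = 10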

Re: [ceph-users] MDS fails repeatedly while handling many concurrent meta data operations

2019-07-23 Thread Yan, Zheng
On Wed, Jul 24, 2019 at 4:06 AM Janek Bevendorff < janek.bevendo...@uni-weimar.de> wrote: > Thanks for your reply. > > On 23/07/2019 21:03, Nathan Fish wrote: > > What Ceph version? Do the clients match? What CPUs do the MDS servers > > have, and how is their CPU usage when this occurs? > > Sorry,

Re: [ceph-users] MDS fails repeatedly while handling many concurrent meta data operations

2019-07-24 Thread Yan, Zheng
On Wed, Jul 24, 2019 at 1:58 PM Janek Bevendorff < janek.bevendo...@uni-weimar.de> wrote: > Ceph-fuse ? > > No, I am using the kernel module. > > which version? > > Was there "Client xxx failing to respond to cache pressure" health warning? > > > At first, yes (at least with the Mimic client). T

Re: [ceph-users] MDS fails repeatedly while handling many concurrent meta data operations

2019-07-24 Thread Yan, Zheng
On Wed, Jul 24, 2019 at 3:13 PM Janek Bevendorff < janek.bevendo...@uni-weimar.de> wrote: > > which version? > > Nautilus, 14.2.2. > I mean kernel version > try mounting cephfs on a machine/vm with small memory (4G~8G), then rsync > your data into the mount point of that machine. > > I could try ru

Re: [ceph-users] loaded dup inode (but no mds crash)

2019-07-29 Thread Yan, Zheng
cephfs_metadata rmomapkey 617. 10006289992_head. I suggest running 'cephfs-data-scan scan_links' after taking down cephfs (either use 'mds set down true' or 'flush all journals and kill all mds') Regards Yan, Zheng > > Thanks! > > Dan > _
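A sketch of the suggested sequence, assuming the filesystem is named "cephfs" and all mds daemons can be taken offline briefly:

    # stop all mds ranks first (exact command varies by release; alternatively
    # flush all journals and stop every mds daemon, as suggested above)
    ceph fs set cephfs down true
    cephfs-data-scan scan_links
    ceph fs set cephfs down false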

Re: [ceph-users] loaded dup inode (but no mds crash)

2019-07-29 Thread Yan, Zheng
On Mon, Jul 29, 2019 at 9:13 PM Dan van der Ster wrote: > > On Mon, Jul 29, 2019 at 2:52 PM Yan, Zheng wrote: > > > > On Fri, Jul 26, 2019 at 4:45 PM Dan van der Ster > > wrote: > > > > > > Hi all, > > > > > > Last night we h

Re: [ceph-users] loaded dup inode (but no mds crash)

2019-07-29 Thread Yan, Zheng
On Mon, Jul 29, 2019 at 9:54 PM Dan van der Ster wrote: > > On Mon, Jul 29, 2019 at 3:47 PM Yan, Zheng wrote: > > > > On Mon, Jul 29, 2019 at 9:13 PM Dan van der Ster > > wrote: > > > > > > On Mon, Jul 29, 2019 at 2:52 PM Yan, Zheng wrote: > >

Re: [ceph-users] Error Mounting CephFS

2019-08-07 Thread Yan, Zheng
On Wed, Aug 7, 2019 at 3:46 PM wrote: > > All; > > I have a server running CentOS 7.6 (1810), that I want to set up with CephFS > (full disclosure, I'm going to be running samba on the CephFS). I can mount > the CephFS fine when I use the option secret=, but when I switch to > secretfile=, I g
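For comparison, a sketch of a secretfile-based mount; the monitor address, client name and file path are assumptions, and the file must contain only the base64 key:

    ceph auth get-key client.samba > /etc/ceph/client.samba.secret
    mount -t ceph 192.168.0.10:6789:/ /mnt/cephfs -o name=samba,secretfile=/etc/ceph/client.samba.secret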

Re: [ceph-users] MDS corruption

2019-08-13 Thread Yan, Zheng
nautilus version (14.2.2) of ‘cephfs-data-scan scan_links’ can fix snaptable. hopefully it will fix your issue. you don't need to upgrade the whole cluster. Just install nautilus on a temp machine or compile ceph from source. On Tue, Aug 13, 2019 at 2:35 PM Adam wrote: > > Pierre Dittes helped me

Re: [ceph-users] cephfs-snapshots causing mds failover, hangs

2019-08-20 Thread Yan, Zheng
nd the mds > daemons on these machines have to be manually restarted. more often than > we wish, the failover fails altogether, resulting in an unresponsive cephfs. > Please enable debug mds (debug_mds=10), and try reproducing it again. Regards Yan, Zheng > this is with mimic 13.2.6

Re: [ceph-users] cephfs-snapshots causing mds failover, hangs

2019-08-26 Thread Yan, Zheng
On Mon, Aug 26, 2019 at 6:57 PM thoralf schulze wrote: > > hi Zheng, > > On 8/21/19 4:32 AM, Yan, Zheng wrote: > > Please enable debug mds (debug_mds=10), and try reproducing it again. > > please find the logs at > https://www.user.tu-berlin.de/thoralf.schulze/ceph-de

Re: [ceph-users] cephfs-snapshots causing mds failover, hangs

2019-08-26 Thread Yan, Zheng
On Mon, Aug 26, 2019 at 9:25 PM thoralf schulze wrote: > > hi Zheng - > > On 8/26/19 2:55 PM, Yan, Zheng wrote: > > I tracked down the bug > > https://tracker.ceph.com/issues/41434 > > wow, that was quick - thank you for investigating. we are looking > forward fo

Re: [ceph-users] Stray count increasing due to snapshots (?)

2019-09-05 Thread Yan, Zheng
On Thu, Sep 5, 2019 at 4:31 PM Hector Martin wrote: > > I have a production CephFS (13.2.6 Mimic) with >400K strays. I believe > this is caused by snapshots. The backup process for this filesystem > consists of creating a snapshot and rsyncing it over daily, and > snapshots are kept locally in the
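A sketch of how the stray counters can be watched on the active mds while the daily snapshots are rotated; the mds name is a placeholder:

    ceph daemon mds.mds1 perf dump mds_cache | grep -i stray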

Re: [ceph-users] CephFS deletion performance

2019-09-17 Thread Yan, Zheng
On Sat, Sep 14, 2019 at 8:57 PM Hector Martin wrote: > > On 13/09/2019 16.25, Hector Martin wrote: > > Is this expected for CephFS? I know data deletions are asynchronous, but > > not being able to delete metadata/directories without an undue impact on > > the whole filesystem performance is somew

Re: [ceph-users] ceph mdss keep on crashing after update to 14.2.3

2019-09-19 Thread Yan, Zheng
854dcfe8 > ?? > You are right. Sorry for the bug. For now, please go back to 14.2.2 (just the mds) or compile ceph-mds from source. Yan, Zheng > Did you already try going back to v14.2.2 (on the MDS's only) ?? > > -- dan > > On Thu, Sep 19, 2019 at 4:59 PM Kenneth Waeg

Re: [ceph-users] mds fail ing to start 14.2.2

2019-10-11 Thread Yan, Zheng
On Sat, Oct 12, 2019 at 1:10 AM Kenneth Waegeman wrote: > Hi all, > > After solving some pg inconsistency problems, my fs is still in > trouble. my mds's are crashing with this error: > > > > -5> 2019-10-11 19:02:55.375 7f2d39f10700 1 mds.1.564276 rejoin_start > > -4> 2019-10-11 19:02:5

Re: [ceph-users] Crashed MDS (segfault)

2019-10-17 Thread Yan, Zheng
On Tue, Oct 15, 2019 at 12:03 PM Gustavo Tonini wrote: > > Dear ceph users, > we're experiencing a segfault during MDS startup (replay process) which is > making our FS inaccessible. > > MDS log messages: > > Oct 15 03:41:39.894584 mds1 ceph-mds: -472> 2019-10-15 00:40:30.201 > 7f3c08f49700 1

Re: [ceph-users] Crashed MDS (segfault)

2019-10-17 Thread Yan, Zheng
-1c5e4d9ddbc > ceph@deployer:~$ > > Could a journal reset help with this? > > I could snapshot all FS pools and export the journal beforehand to guarantee a > rollback to this state if something goes wrong with the journal reset. > > On Thu, Oct 17, 2019, 09:07 Yan, Zheng wrot

Re: [ceph-users] Crashed MDS (segfault)

2019-10-21 Thread Yan, Zheng
w could variable "newparent" be NULL at > https://github.com/ceph/ceph/blob/master/src/mds/SnapRealm.cc#L599 ? Is there > a way to fix this? > try 'cephfs-data-scan init'. It will set up the root inode's snaprealm. > On Thu, Oct 17, 2019 at 9:58 PM Yan, Zh
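A sketch of the suggested repair, to be run only while all mds daemons for the filesystem are stopped; per the advice above it re-creates the root inode metadata, including its snaprealm:

    cephfs-data-scan init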

Re: [ceph-users] MDS crash - FAILED assert(omap_num_objs <= MAX_OBJECTS)

2019-10-21 Thread Yan, Zheng
On Sun, Oct 20, 2019 at 1:53 PM Stefan Kooman wrote: > > Dear list, > > Quoting Stefan Kooman (ste...@bit.nl): > > > I wonder if this situation is more likely to be hit on Mimic 13.2.6 than > > on any other system. > > > > Any hints / help to prevent this from happening? > > We have had this happe

Re: [ceph-users] MDS crash - FAILED assert(omap_num_objs <= MAX_OBJECTS)

2019-10-21 Thread Yan, Zheng
On Mon, Oct 21, 2019 at 4:33 PM Stefan Kooman wrote: > > Quoting Yan, Zheng (uker...@gmail.com): > > > delete 'mdsX_openfiles.0' object from cephfs metadata pool. (X is rank > > of the crashed mds) > > OK, MDS crashed again, restarted. I stopped it, deleted t
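A sketch of the deletion referred to in this thread, assuming rank 0 is the crashed mds and the metadata pool is called "cephfs_metadata":

    rados -p cephfs_metadata rm mds0_openfiles.0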

Re: [ceph-users] MDS crash - FAILED assert(omap_num_objs <= MAX_OBJECTS)

2019-10-21 Thread Yan, Zheng
On Mon, Oct 21, 2019 at 7:58 PM Stefan Kooman wrote: > > Quoting Yan, Zheng (uker...@gmail.com): > > > I double checked the code, but didn't find any clue. Can you compile > > mds with a debug patch? > > Sure, I'll try to do my best to get a properly packag

Re: [ceph-users] Crashed MDS (segfault)

2019-10-22 Thread Yan, Zheng
epair' after mds restart can fix the incorrect stat. > On Mon, Oct 21, 2019 at 4:36 AM Yan, Zheng wrote: >> >> On Fri, Oct 18, 2019 at 9:10 AM Gustavo Tonini >> wrote: >> > >> > Hi Zheng, >> > the cluster is running ceph mimic. T
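Assuming the truncated advice refers to the recursive scrub/repair command, a sketch; the mds name is a placeholder, and on Mimic this goes through the admin socket:

    ceph daemon mds.mds1 scrub_path / recursive repair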

Re: [ceph-users] Crashed MDS (segfault)

2019-10-25 Thread Yan, Zheng
> CephFS worked well for approximately 3 hours and then our MDS crashed again, > apparently due to the bug described at https://tracker.ceph.com/issues/38452 > does the method in issue #38452 work for you? If not, please set debug_mds to 10, and send us the log around the crash. Yan, Zheng

Re: [ceph-users] cephfs 1 large omap objects

2019-10-29 Thread Yan, Zheng
see https://tracker.ceph.com/issues/42515. just ignore the warning for now On Mon, Oct 7, 2019 at 7:50 AM Nigel Williams wrote: > > Out of the blue this popped up (on an otherwise healthy cluster): > > HEALTH_WARN 1 large omap objects > LARGE_OMAP_OBJECTS 1 large omap objects > 1 large objec

Re: [ceph-users] user and group acls on cephfs mounts

2019-11-05 Thread Yan, Zheng
On Wed, Nov 6, 2019 at 5:47 AM Alex Litvak wrote: > > Hello Cephers, > > > I am trying to understand how uid and gid are handled on the shared cephfs > mount. I am using 14.2.2 and cephfs kernel based client. > I have 2 client vms with following uid gid > > vm1 user dev (uid=500) group dev (gid=

Re: [ceph-users] user and group acls on cephfs mounts

2019-11-06 Thread Yan, Zheng
Does 'group dev' have the same id on the two VMs? Do the VMs use the same 'ceph auth name' to mount cephfs? On Wed, Nov 6, 2019 at 4:12 PM Alex Litvak wrote: > > Plot thickens. > > I create a new user sam2 and group sam2 both uid and gid = 1501. User sam2 > is a member of group dev. When I sw
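A sketch of the two checks being asked about, run on each VM; the user and group names are taken from the quoted mail:

    id sam2; getent group dev      # the gid of "dev" must match on both VMs
    mount | grep ceph              # compare the name= option used by each client mount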

Re: [ceph-users] MDS crash - FAILED assert(omap_num_objs <= MAX_OBJECTS)

2019-12-04 Thread Yan, Zheng
Base*, uint64_t, int)' thread > 7fd436ca7700 time 2019-12-04 20:28:34.939048 > /build/ceph-13.2.6/src/mds/OpenFileTable.cc: 476: FAILED assert(omap_num_objs > <= MAX_OBJECTS) > > mds.0.openfiles omap_num_objs 1025 <- ... just 1 higher than 1024? > Coincidence? > > Gr. S

Re: [ceph-users] Ceph MDS randomly hangs with no useful error message

2020-01-17 Thread Yan, Zheng
On Fri, Jan 17, 2020 at 4:47 PM Janek Bevendorff wrote: > > Hi, > > We have a CephFS in our cluster with 3 MDS to which > 300 clients > connect at any given time. The FS contains about 80 TB of data and many > million files, so it is important that meta data operations work > smoothly even when li

Re: [ceph-users] Ceph MDS randomly hangs with no useful error message

2020-01-20 Thread Yan, Zheng
dump_historic_ops' and 'ceph daemon mds.xxx perf reset; ceph daemon mds.xxx perf dump'. Send the outputs to us. > > On 17/01/2020 13:07, Yan, Zheng wrote: > > On Fri, Jan 17, 2020 at 4:47 PM Janek Bevendorff > > wrote: > >> Hi, > >>
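A sketch of the requested collection, run on the active mds host; the daemon name and output paths are placeholders:

    ceph daemon mds.mds1 dump_ops_in_flight  > ops_in_flight.json
    ceph daemon mds.mds1 dump_historic_ops   > historic_ops.json
    ceph daemon mds.mds1 perf reset all
    ceph daemon mds.mds1 perf dump           > perf_dump.json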
