kernel clients accessing the filesystem.
>
> Are there some new defaults I need to change perhaps? Or potentially a bug?
>
We introduced the config option 'mds_cache_memory_limit'.
Regards
Yan, Zheng
> Output of perf dump mds:
>
>>> "mds": {
>>>
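For reference, raising that limit looks roughly like this (the 4 GiB value
and the daemon name mds.a are illustrative, not from this thread):

# ceph.conf on the MDS host; the Luminous default is 1 GiB
[mds]
mds_cache_memory_limit = 4294967296

# or adjust it on a running daemon
ceph tell mds.a injectargs '--mds_cache_memory_limit=4294967296'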
On Fri, Nov 24, 2017 at 4:59 PM, Zhang Qiang wrote:
> Hi all,
>
> To observe what will happen to a ceph-fuse mount if the network is down, we
> blocked
> network connections to all three monitors by iptables. If we restore the
> network
> immediately (within minutes), the blocked I/O request will be
On Sat, Nov 25, 2017 at 2:27 AM, Jens-U. Mozdzen wrote:
> Hi all,
>
> with our Ceph Luminous CephFS, we're plagued with "failed to open ino"
> messages. These don't seem to affect daily business (in terms of "file
> access"). (There's a backup performance issue that may eventually be
> related, bu
On Wed, Nov 29, 2017 at 7:06 AM, Nigel Williams
wrote:
> On 29 November 2017 at 01:51, Daniel Baumann wrote:
>> On 11/28/17 15:09, Geoffrey Rhodes wrote:
>>> I'd like to run more than one Ceph file system in the same cluster.
>
> Are there opinions on how stable multiple filesystems per single Ce
On Thu, Nov 30, 2017 at 2:08 AM, Jens-U. Mozdzen wrote:
> Hi *,
>
> while tracking down a different performance issue with CephFS (creating tar
> balls from CephFS-based directories takes multiple times as long as when
> backing up the same data from local disks, i.e. 56 hours instead of 7), we
>
On Thu, Dec 7, 2017 at 11:59 PM, Reed Dier wrote:
>> You can try doubling (several times if necessary) the MDS configs
>> `mds_log_max_segments` and `mds_log_max_expiring` to make it more
>> aggressively trim its journal. (That may not help since your OSD
>> requests are slow.)
>
>
> This may be o
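A hedged sketch of the doubling suggested above (assuming the Luminous-era
defaults of 30 and 20, and an MDS named mds.a):

# double the journal trimming settings on the running MDS
ceph tell mds.a injectargs '--mds_log_max_segments=60 --mds_log_max_expiring=40'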
On Fri, Dec 8, 2017 at 6:51 PM, Florent B wrote:
> I don't know, I didn't touch that setting. Which one is recommended?
>
>
If multiple dovecot instances are running at the same time and they
all modify the same files, you need to set fuse_disable_pagecache to
true.
> On 08/12/2017 11:49, Alex
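A sketch of the setting mentioned above, placed on the dovecot client hosts
(section placement follows the usual ceph.conf conventions; verify for your setup):

# ceph.conf on the ceph-fuse clients: bypass the FUSE page cache
[client]
fuse_disable_pagecache = true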
On Thu, Dec 7, 2017 at 3:40 PM, Burkhard Linke
wrote:
> Hi,
>
>
> we have upgraded our cluster to luminous 12.2.2 and wanted to use a second
> MDS for HA purposes. Upgrade itself went well, setting up the second MDS
> from the former standby-replay configuration worked, too.
>
>
> But upon load bo
On Fri, Dec 8, 2017 at 10:04 PM, Florent B wrote:
> When I look in MDS slow requests I have a few like this :
>
> {
> "description": "client_request(client.460346000:5211
> setfilelockrule 1, type 2, owner 9688352835732396778, pid 660, start 0,
> length 0, wait 1 #0x100017da2aa 2017-12
On Mon, Dec 11, 2017 at 10:13 PM, Tobias Prousa wrote:
> Hi there,
>
> I'm running a CEPH cluster for some libvirt VMs and a CephFS providing /home
> to ~20 desktop machines. There are 4 Hosts running 4 MONs, 4MGRs, 3MDSs (1
> active, 2 standby) and 28 OSDs in total. This cluster is up and running
On Mon, Dec 11, 2017 at 11:17 PM, Tobias Prousa wrote:
>
> These are essentially the first commands I did execute, in this exact order.
> Additionally I did a:
>
> ceph fs reset cephfs --yes-i-really-mean-it
>
How many active MDS daemons were there before the upgrade?
>
> Any hint on how to find max i
On Tue, Dec 12, 2017 at 4:22 PM, Tobias Prousa wrote:
> Hi there,
>
> regarding my ML post from yesterday (Upgrade from 12.2.1 to 12.2.2 broke my
> CephFs) I was able to get a little further with the suggested
> "cephfs-table-tool take_inos ". This made the whole issue with
> loads of "falsely fre
missing on disk; some
> files may be lost" much earlier during that scrub (so three in total), but
> the first two did not make the MDS go to standby.
> I marked the FS repaired, restarted the MDS with mds debug level 20 and reran a
> scrub on that particular path but this time MDS wo
' and
stop mds. You'd better do this after the scrub.
>
> I cannot simply remove that dir through the filesystem as it refuses to delete
> that folder.
>
> Then you say it's easy to fix backtrace, yet here it looks like some
> backtraces get fixed with online MDS scrub while mos
On Wed, Dec 13, 2017 at 9:27 AM, 13605702...@163.com
<13605702...@163.com> wrote:
> hi
>
> since Jewel, cephfs is considered as production ready.
> but can anybody tell me which version of ceph is better? Jewel? kraken? or
> Luminous?
>
luminous, version 12.2.2
> thanks
>
> __
On Wed, Dec 13, 2017 at 10:00 PM, Florent B wrote:
> Hi,
>
> Trying to solve my problem of corrupted files on CephFS, I created a new
> thread to talk about the warning "1 MDSs report slow requests" which
> often occurs when a process locks a file for a long time while other
> processes ask for th
On Wed, Dec 13, 2017 at 6:45 PM, Tobias Prousa wrote:
> Hi there,
>
> sorry to disturb you again but I'm still not there. After restoring my
> CephFS to a working state (with a lot of help from Yan, Zheng, thank you so
> much), I got my CephFS back working by restarting MDSs
The problem can happen even when multiple clients access a file at
different times.
Regards
Yan, Zheng
>
> It seems my problem is gone when I set fuse_disable_pagecache to true,
> only on the client accessing this file.
>
> Is it possible a corruption occurs on CephFS ? I never h
tion (FS size around 630 GB per "df" output, current data pool size
> about 1100 GB, peak size was around 1.3 TB before the mass deletion).
>
There is a reconnect stage while the MDS recovers. To reduce reconnect
message size, clients trim unused inodes from their cache
aggressively. In
On Wed, Dec 13, 2017 at 11:49 PM, Florent B wrote:
> On 13/12/2017 16:48, Yan, Zheng wrote:
>> On Wed, Dec 13, 2017 at 11:23 PM, Florent B wrote:
>>> The problem is : only a single client accesses each file !
>>>
>> do you mean the file is only accessed by
On Thu, Dec 14, 2017 at 12:49 AM, Florent B wrote:
> On 13/12/2017 17:40, Yan, Zheng wrote:
>> On Wed, Dec 13, 2017 at 11:49 PM, Florent B wrote:
>>> On 13/12/2017 16:48, Yan, Zheng wrote:
>>>> On Wed, Dec 13, 2017 at 11:23 PM, Florent B wrote:
>>>
On Thu, Dec 14, 2017 at 2:14 PM, gjprabu wrote:
>
>
> Hi Team,
>
> Today we found one client's data was not accessible; it
> showed "d? ? ? ??? backups" like this.
> Has anybody faced the same, and is there any solution for this?
>
>
> [root@ /]# cd /data/build/rep
On Thu, Dec 14, 2017 at 12:52 AM, Jens-U. Mozdzen wrote:
> Hi Yan,
>
> Zitat von "Yan, Zheng" :
>>
>> [...]
>>
>> It's likely some clients had caps on unlinked inodes, which prevent
>> MDS from purging objects. When a file gets deleted, mds
On Thu, Dec 14, 2017 at 8:52 PM, Florent B wrote:
> On 14/12/2017 03:38, Yan, Zheng wrote:
>> On Thu, Dec 14, 2017 at 12:49 AM, Florent B wrote:
>>>
>>> Systems are on Debian Jessie : kernel 3.16.0-4-amd64 & libfuse 2.9.3-15.
>>>
>>> I don
: "admin"
> },
> "completed_requests" : 0,
> "num_leases" : 0,
> "inst" : "client.1069714 10.0.0.111:0/1876172355"
>},
>{
> "replay_requests" : 0,
> "reconn
On Fri, Dec 15, 2017 at 6:54 PM, Webert de Souza Lima
wrote:
> Hello, Mr. Yan
>
> On Thu, Dec 14, 2017 at 11:36 PM, Yan, Zheng wrote:
>>
>>
>> The client holds so many capabilities because the kernel keeps lots of
>> inodes in its cache. The kernel does not trim inodes
On Fri, Dec 15, 2017 at 8:46 PM, Yan, Zheng wrote:
> On Fri, Dec 15, 2017 at 6:54 PM, Webert de Souza Lima
> wrote:
>> Hello, Mr. Yan
>>
>> On Thu, Dec 14, 2017 at 11:36 PM, Yan, Zheng wrote:
>>>
>>>
>>> The client hold so many capabilities b
On Mon, Dec 18, 2017 at 9:24 AM, 13605702...@163.com
<13605702...@163.com> wrote:
> hi John
>
> thanks for your answer.
>
> in normal conditions, i can run "ceph mds fail" before reboot.
> but if the host reboots by itself for some reason, i can do nothing!
> if this happens, data must be lost.
>
in BOTH conditions?
>
> in my test, i echo the date string per second into the file under cephfs
> dir,
> when i reboot the master mds, there are 15 lines got lost.
>
> thanks
>
> ________
> 13605702...@163.com
>
>
> From: Yan, Zhe
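For reference, failing over the active MDS before a planned reboot is a
one-liner (rank 0 is an assumption; use your actual MDS rank or name):

# hand the active role to a standby before rebooting this host
ceph mds fail 0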
echo the date string per second into the file under cephfs
> dir,
> when i reboot the master mds, there are 15 lines got lost.
>
What do you mean by 15 lines got lost? Are you sure it's not caused by a write stall?
> thanks
>
> ________
> 136057
2017 <-- reboot
> Mon Dec 18 03:08:05 UTC 2017 <-- mds failover works
this is caused by write stall
> Mon Dec 18 03:08:06 UTC 2017
> Mon Dec 18 03:08:07 UTC 2017
> Mon Dec 18 03:08:08 UTC 2017
> Mon Dec 18 03:08:09 UTC 2017
> Mon Dec 18 03:08:10 UTC 2017
>
> __
8 03:08:00 UTC 2017
> Mon Dec 18 03:08:01 UTC 2017
> Mon Dec 18 03:08:02 UTC 2017
> Mon Dec 18 03:08:03 UTC 2017
> Mon Dec 18 03:08:04 UTC 2017
>
>
> 13605702...@163.com
>
>
> From: Yan, Zheng
> Date: 2017-12-18 11:27
> To: 13605702..
On Thu, Dec 21, 2017 at 6:18 PM, nigel davies wrote:
> Hey all, is it possible to set cephfs to have a space limit?
> e.g. I'd like to set my cephfs to have a limit of 20TB
> and my s3 storage to have 4TB, for example
>
you can set a pool quota on cephfs data pools
> thanks
>
>
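As a rough illustration of the pool-quota approach (the pool name
cephfs_data is an assumption; substitute your actual data pool):

# limit the CephFS data pool to 20 TB of stored bytes
ceph osd pool set-quota cephfs_data max_bytes 20000000000000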
On Thu, Dec 21, 2017 at 9:32 PM, Stefan Kooman wrote:
> Hi,
>
> We have two MDS servers. One active, one active-standby. While doing a
> parallel rsync of 10 threads with loads of files, dirs, subdirs we get
> the following HEALTH_WARN:
>
> ceph health detail
> HEALTH_WARN 2 MDSs behind on trimmin
ent kernel enough ?
>
See http://tracker.ceph.com/issues/22446. We haven't implemented that
feature. "echo 3 >/proc/sys/vm/drop_caches" should drop most caps.
>
>
> Regards,
>
> Webert Lima
> DevOps Engineer at MAV Tecnologia
> Belo Horizonte - Brasil
> I
rizonte - Brasil
> IRC NICK - WebertRLZ
>
> On Thu, Dec 21, 2017 at 11:55 AM, Yan, Zheng wrote:
>>
>> On Thu, Dec 21, 2017 at 7:33 PM, Webert de Souza Lima
>> wrote:
>> > I have upgraded the kernel on a client node (one that has close-to-zero
>> > tra
On Fri, Dec 22, 2017 at 3:23 PM, nigel davies wrote:
> Right ok I take an look. Can you do that after the pool /cephfs has been set
> up
>
yes, see http://docs.ceph.com/docs/jewel/rados/operations/pools/
>
> On 21 Dec 2017 12:25 pm, "Yan, Zheng" wrote:
>>
>
On Tue, Dec 26, 2017 at 2:28 PM, 周 威 wrote:
> We don't use hardlink.
> I reduced the mds_cache_size from 1000 to 200.
> After that, the num_strays reduce to about 100k
> The cluster is normal now. I think there is some bug about it.
> Anyway, thanks for your reply!
>
This seems like a cli
On Wed, Jan 10, 2018 at 10:59 AM, Mark Schouten wrote:
> Hi,
>
> While upgrading a server with a CephFS mount tonight, it stalled on installing
> a new kernel, because it was waiting for `sync`.
>
> I'm pretty sure it has something to do with the CephFS filesystem which caused
> some issues last w
On Thu, Jan 18, 2018 at 6:39 PM, Florent B wrote:
> I still have file corruption on Ceph-fuse with Luminous (on Debian
> Jessie, default kernel) !
>
> My mounts are using fuse_disable_pagecache=true
>
> And I have a lot of errors like "EOF reading msg header (got 0/30
> bytes)" in my app.
does th
> On 31 Jan 2018, at 15:23, donglifec...@gmail.com wrote:
>
> ZhengYan,
>
> I met a problem. I use cephfs (10.2.10, kernel client 4.12) as backend
> storage when configuring gitlab, so:
> 1. git clone ssh://git@10.100.161.182/source/test.git
> 2. git add test.file
> 3.git commit -am "test"
> 4.git
have you enabled path restriction on cephfs?
On Thu, Aug 25, 2016 at 1:25 AM, Lazuardi Nasution
wrote:
> Hi,
>
> I have a problem with CephFS when writing big files. I have found that my
> OpenStack Nova backup was not working after I changed the rbd-based mount of
> /var/lib/nova/instances/snapsh
'cat
/proc//stack'. This should tell us where dd hangs.
Regards
Yan, Zheng
> Best regards,
>
>
> On Aug 25, 2016 00:46, "Gregory Farnum" wrote:
>>
>> On Wed, Aug 24, 2016 at 10:25 AM, Lazuardi Nasution
>> wrote:
>> > Hi,
>> >
>
X" in MDS' auth string
> Best regards,
>
>
> On Aug 25, 2016 08:12, "Yan, Zheng" wrote:
>>
>> have you enabled path restriction on cephfs?
>>
>> On Thu, Aug 25, 2016 at 1:25 AM, Lazuardi Nasution
>> wrote:
>> > Hi,
>> >
>
No idea. I never encountered an issue like this. Maybe there is some
other issue that made you feel that writing a small file works, but writing
a large file does not. Did you try writing a small file again after writing
a large file failed?
Regards
Yan, Zheng
> Best regards,
>
>>
>> Date: Thu
>>
>> [root@server2]# ls -al /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>> -rw-r--r-- 1 apache apache 53317 Aug 28 23:46
>> /cephfs/webdata/static/456/JHL/66448H-755h.jpg
It seems this file was modified recently. Maybe the web server
silently modifies the files. Please
returns 0's - This
> is then cached by the file system page cache until it expires or is flushed
> manually.
do server1 or server2 use memory-mapped IO to read the file?
Regards
Yan, Zheng
>
> 5) As step 4 typically only happens on one of the two web servers before
> st
I thought about this again. This issue could be caused by a stale session.
Could you check the kernel logs of your servers? Are there any
ceph-related kernel messages (such as "ceph: mds0 caps stale")?
Regards
Yan, Zheng
On Thu, Sep 1, 2016 at 11:02 PM, Sean Redmond wrote:
> Hi,
>
> I
ime.
>
Can you reproduce this bug manually? (updating the file on one server and
reading the file on another server). If you can, please enable
debug_mds=10, repeat the steps that reproduce this and send the log to
us.
Regards
Yan, Zheng
> On Fri, Sep 2, 2016 at 4:37 AM, Yan, Zheng wrote:
>>
>
ut.
>
> I've mounted on a server that is almost completely idle and I'm still seeing
> those lines every 20 secs.
How busy is the MDS? I guess the MDS is too busy to respond to renew cap requests?
Regards
Yan, Zheng
>
> Ceph health is OK and the MDS server seems pretty happy althoug
that's why — normally these are kept live
> (unstale) just by passing normal messages back and forth. Although,
> Zheng, it ought to be sending off messages prior to stale if there's
> no other traffic, shouldn't it?
client sends CEPH_SESSION_REQUEST_RENEWCAPS to mds even i
>
> Is there already a solution for this? I don't see anything ceph-related
> popping up in the release notes of the newer kernels.
>
try updating your kernel. The newest fixes are included in kernel-3.10.0-448.el7
Regards
Yan, Zheng
> Thanks !!
>
> Kenneth
>
> ___
> On 3 Oct 2016, at 20:27, Ilya Dryomov wrote:
>
> On Mon, Oct 3, 2016 at 1:19 PM, Nikolay Borisov wrote:
>> Hello,
>>
>> I've been investigating the following crash with cephfs:
>>
>> [8734559.785146] general protection fault: [#1] SMP
>> [8734559.791921] ioatdma shpchp ipmi_devintf ip
>
>>> It's almost never the case that your cache is too small unless your
>>> workload is holding a silly number of files open at one time -- assume
>>> this is a client bug (although some people work around it by creating
>>> much bigger MDS caches!)
>>
On Mon, Oct 3, 2016 at 5:48 AM, Mykola Dvornik wrote:
> Hi Johan,
>
> Many thanks for your reply. I will try to play with the mds tunables and
> report back to you ASAP.
>
> So far I see that mds log contains a lot of errors of the following kind:
>
> 2016-10-02 11:58:03.002769 7f8372d54700 0 md
On Wed, Oct 5, 2016 at 5:06 PM, Burkhard Linke
wrote:
> Hi,
>
> I've managed to move the data from the old pool to the new one using some
> shell scripts and cp/rsync. Recursive getfattr on the mount point does not
> reveal any file with a layout referring to the old pool.
>
> Nonetheless 486 objects
icket.
>
I'm afraid that issue does cause corruption.
> BTW, when I posted this issue on the ML the amount of ground state stray
> objects was around 7.5K. Now it went up to 23K. No inconsistent PGs or any
> other problems happened to the cluster within this time scale.
>
>
Here is an untested method:
list omap keys in objects 600. ~ 609.; find all duplicated keys.
For each duplicated key, use ceph-dencoder to decode their values,
find the one that has the biggest version and delete the rest
(ceph-dencoder type inode_t skip 9 import /tmp/ decode dump_json
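A hedged sketch of those inspection steps (the pool name cephfs_metadata,
the object name and <key> are placeholders, not values from this thread):

# list omap keys in one of the stray directory objects
rados -p cephfs_metadata listomapkeys 609.00000000
# dump a duplicated key's value and decode it to compare versions
rados -p cephfs_metadata getomapval 609.00000000 <key> /tmp/val
ceph-dencoder type inode_t skip 9 import /tmp/val decode dump_json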
On Thu, Oct 6, 2016 at 4:11 PM, wrote:
> Is there any way to repair pgs/cephfs gracefully?
>
So far no. We need to write a tool to repair this type of corruption.
Which version of ceph did you use before upgrading to 10.2.3 ?
Regards
Yan, Zheng
>
>
> -Mykola
>
>
>
>
On Sat, Oct 8, 2016 at 2:05 AM, Kjetil Jørgensen wrote:
> Hi
>
> On Fri, Oct 7, 2016 at 6:31 AM, Yan, Zheng wrote:
>>
>> On Fri, Oct 7, 2016 at 8:20 AM, Kjetil Jørgensen
>> wrote:
>> > And - I just saw another recent thread -
>> > http://tracker.cep
I have written a tool that fixes this type of error. I'm currently
testing it. Will push it out tomorrow
Regards
Yan, Zheng
On Wed, Oct 12, 2016 at 9:18 PM, Davie De Smet
wrote:
> Hi Gregory,
>
> Thanks for the help! I've been looping over all trashcan files and the amo
On Wed, Oct 12, 2016 at 9:51 PM, Davie De Smet
wrote:
> Hi,
>
> That sounds great. I'll certainly try it out.
>
> Kind regards,
>
> Davie De Smet
>
> -Original Message-
> From: Yan, Zheng [mailto:uker...@gmail.com]
> Sent: Wednesday, October 12, 2016 3:41
On Sat, Oct 22, 2016 at 4:14 AM, Gregory Farnum wrote:
> On Fri, Oct 21, 2016 at 7:56 AM, Nick Fisk wrote:
>>> -Original Message-
>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>>> Haomai Wang
>>> Sent: 21 October 2016 15:40
>>> To: Nick Fisk
>>> Cc: ceph-u
> On 24 Oct 2016, at 17:29, Nick Fisk wrote:
>
>> -Original Message-----
>> From: Yan, Zheng [mailto:uker...@gmail.com]
>> Sent: 24 October 2016 10:19
>> To: Gregory Farnum
>> Cc: Nick Fisk ; Zheng Yan ; Ceph Users
>>
>> Subject: Re: [cep
I finally reproduced this issue. Adding the following lines to httpd.conf
can work around this issue.
EnableMMAP off
EnableSendfile off
On Sat, Sep 3, 2016 at 11:07 AM, Yan, Zheng wrote:
> On Fri, Sep 2, 2016 at 5:10 PM, Sean Redmond wrote:
>> I have checked all the servers in scop
kernel is too old, please use a recent 4.x kernel
Regards
Yan, Zheng
> Thank you
>
> Florent
>
er some time it crashes and I need to
> reset the fs to get it back again.
>
> I'm at a loss here.
I guess you did reset the mds journal. Have you run the complete recovery sequence?
cephfs-data-scan init
cephfs-data-scan scan_extents
cephfs-data-scan scan_inodes
cephfs-data-scan scan_links
On Thu, Jul 5, 2018 at 4:51 PM Dennis Kramer (DBS) wrote:
>
> Hi,
>
>
> On Thu, 2018-07-05 at 09:55 +0800, Yan, Zheng wrote:
> > On Wed, Jul 4, 2018 at 7:02 PM Dennis Kramer (DBS)
> > wrote:
> > >
> > >
> > > Hi,
> > >
>
On Thu, Jul 12, 2018 at 11:39 PM Alessandro De Salvo
wrote:
>
> Some progress, and more pain...
>
> I was able to recover the 200. using the ceph-objectstore-tool for
> one of the OSDs (all identical copies) but trying to re-inject it just with
> rados put was giving no error while the g
Could you profile memory allocation of the mds?
http://docs.ceph.com/docs/mimic/rados/troubleshooting/memory-profiling/
On Tue, Jul 24, 2018 at 7:54 AM Daniel Carrasco wrote:
>
> Yeah, it is also my thread. This thread was created before lowering the cache size
> from 512Mb to 8Mb. I thought that maybe was
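The profiling itself boils down to something like this (daemon id mds.a is
an assumption; see the linked memory-profiling page for details):

# start the tcmalloc heap profiler, let the MDS run for a while, then dump and stop
ceph tell mds.a heap start_profiler
ceph tell mds.a heap dump
ceph tell mds.a heap stop_profiler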
19 Thread heaps in use
> MALLOC: 8192 Tcmalloc page size
>
> Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
> Bytes released to the OS take up virtual address
running. Is there any
> problem if it is done in a low-traffic time? (less usage and maybe it doesn't
> fail, but maybe less info about usage).
>
just one time, wait a few minutes between start_profiler and stop_profiler
> Greetings!
>
> 2018-07-24 10:21 GMT+02:00 Yan, Zh
limit to 25Mb to test if still with acceptable values of RAM.
>
Looks like there is a memory leak in the async messenger. What's the output of
"ldd /usr/bin/ceph-mds"? Could you try the simple messenger (add "ms type =
simple" to the 'global' section of ceph.conf)?
Regards
Yan, Z
On Wed, Jul 25, 2018 at 8:12 PM Yan, Zheng wrote:
>
> On Wed, Jul 25, 2018 at 5:04 PM Daniel Carrasco wrote:
> >
> > Hello,
> >
> > I've attached the PDF.
> >
> > I don't know if it is important, but I made changes to the configuration and I've
On Fri, Jul 27, 2018 at 4:47 PM Guillaume Lefranc
wrote:
>
> Hi,
>
> I am trying to repair a failed cluster with multiple MDS, but the failed MDS
> crashes on restart and won't stay up. I could not find a bug report for that
> specific failure. Here are the logs:
>
> -9> 2018-07-27 10:40:45.
On Wed, Aug 1, 2018 at 6:43 AM Kamble, Nitin A
wrote:
> Hi John,
>
>
>
> I am running ceph Luminous 12.2.1 release on the storage nodes with
> v4.4.114 kernel on the cephfs clients.
>
>
>
> 3 client nodes are running 3 instances of a test program.
>
> The test program is doing this repeatedly in
On Thu, Aug 2, 2018 at 3:36 AM Benjeman Meekhof wrote:
>
> I've been encountering lately a much higher than expected memory usage
> on our MDS which doesn't align with the cache_memory limit even
> accounting for potential over-runs. Our memory limit is 4GB but the
> MDS process is steadily at ar
"items": 166781270,
> "bytes": 10728710833
> }
> }
> }
>
> and heap_stats:
>
> MALLOC:12418630040 (11843.3 MiB) Bytes in use by application
> MALLOC: + 1310720 ( 1.2 MiB) Bytes in page heap freelist
> MALLOC:
On Mon, Aug 6, 2018 at 5:36 PM Zhou Choury wrote:
>
> The mds of my cluster can't boot, it crashes all the time.
> The log is attached.
>
please set debug_mds=20 and try starting mds again.
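One way to do that, as a sketch (daemon id mds.a assumed; remember to lower
the level again afterwards):

# ceph.conf on the crashing MDS host, then restart the daemon and collect its log
[mds]
debug_mds = 20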
did you mount cephfs on the same machines that run ceph-osd?
On Tue, Aug 7, 2018 at 5:14 PM Zhenshi Zhou wrote:
>
> Hi Burkhard,
> Files located in /sys/kernel/debug/ceph/ are all new files generated
> after I reboot the server.
> The clients were in the blacklist and I manually removed them from the
On Tue, Aug 7, 2018 at 7:15 PM Zhenshi Zhou wrote:
>
> Yes, some osd servers mount cephfs
>
This can cause a memory deadlock. You should avoid doing this.
> Yan, Zheng 于2018年8月7日 周二19:12写道:
>>
>> did you mount cephfs on the same machines that run ceph-osd?
>>
>
memory
>>> pressure worse, until the whole thing just seizes up.
>>>
>>> John
>>>
>>> > Granted, I am using ceph-fuse rather than the kernel client at this
>>> > point, but that isn’t etched in stone.
>>> >
>>> > Curi
shi Zhou wrote:
>>>>>
>>>>> Hi, I find an old server which mounted cephfs and has the debug files.
>>>>> # cat osdc
>>>>> REQUESTS 0 homeless 0
>>>>> LINGER REQUESTS
>>>>> BACKOFFS
>>>>> # cat
Have you ever run disaster recovery
(http://docs.ceph.com/docs/luminous/cephfs/disaster-recovery/)? Try the
following steps.
Stop mds.a and run the following commands step by step:
cephfs-table-tool 0 reset session
cephfs-journal-tool event recover_dentries summary
cephfs-data-scan scan_links
restart mds
On Sat, Aug 11, 2018 at 1:21 PM Amit Handa wrote:
>
> Thanks for the response, gregory.
>
> We need to support a couple of production services we have migrated to ceph.
> So we are in a bit of soup.
>
> cluster is as follows:
> ```
> ceph osd tree
> ID CLASS WEIGHT TYPE NAME STATUS REWEI
On Mon, Aug 13, 2018 at 9:55 PM Zhenshi Zhou wrote:
>
> Hi Burkhard,
> I'm sure the user has permission to read and write. Besides, we're not using
> EC data pools.
> Now the situation is that any openration to a specific file, the command will
> hang.
> Operations to any other files won't hang.
On Wed, Aug 15, 2018 at 11:44 PM Jonathan Woytek wrote:
>
> Hi list people. I was asking a few of these questions in IRC, too, but
> figured maybe a wider audience could see something that I'm missing.
>
> I'm running a four-node cluster with cephfs and the kernel-mode driver as the
> primary ac
Aug 15, 2018 at 10:42 PM, Jonathan Woytek wrote:
> > On Wed, Aug 15, 2018 at 9:40 PM, Yan, Zheng wrote:
> >> How many clients reconnected when the mds restarted? The issue is likely
> >> because reconnected clients held too many inodes, mds was opening
> >> the
these files are open files hints. It's safe to delete them.
> On Wed, Aug 15, 2018 at 10:51 PM, Yan, Zheng wrote:
> > On Thu, Aug 16, 2018 at 10:50 AM Jonathan Woytek wrote:
> >>
> >> Actually, I missed it--I do see the wipe start, wipe done in the log.
> >
21:13:51.163236 #0x12e7e5a 2018-08-24
> 21:13:51.163236 caller_uid=0, caller_gid=0{}) currently failed to xlock,
> waiting
> 2018-08-24 21:14:54.698086 [WRN] 1 slow requests, 1 included below; oldest
> blocked for > 63.540533 secs
> 2018-08-24 21:14:28.217536 [WRN]
have osdmap 4545 want 4546
> have fsmap.user 0
> have mdsmap 446 want 447+
> fs_cluster_id -1
>
> mdsc:
>
> 649065 mds0 setattr #12e7e5a
>
> Anything useful?
>
>
>
> Yan, Zheng 于2018年8月25日周六 上午7:53写道:
>>
>> Are there hang request in /sys/k
Could you strace the apache process and check which syscall waits for a long time?
On Sat, Aug 25, 2018 at 3:04 AM Stefan Kooman wrote:
>
> Quoting Gregory Farnum (gfar...@redhat.com):
>
> > Hmm, these aren't actually the start and end times to the same operation.
> > put_inode() is literally adjusting a
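A hedged example of attaching strace to a running worker (the PID is a
placeholder):

# time every syscall of the running apache worker, following forked children
strace -t -T -f -p <apache_pid>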
stable. If you use the
kernel client, you'd better use a recent kernel and have all clients use the same
kernel version.
> Yan, Zheng 于2018年8月27日周一 上午11:41写道:
>>
>> please check client.213528, instead of client.267792. which version of
>> kernel client.213528 use.
>> On Sat, Aug 25,
On Mon, Aug 27, 2018 at 4:47 AM Stefan Kooman wrote:
>
> Hi,
>
> Quoting Yan, Zheng (uker...@gmail.com):
> > Could you strace apacha process, check which syscall waits for a long time.
>
> Yes, that's how I did all the tests (strace -t -T apache2 -X). With
> deb
"stone-aged", it matches CentOS 7's
> userspace and RedHat is taking good care to implement fixes.
>
We have already backported quota patches to RHEL 3.10 kernel. It may
take some time for redhat to release the new kernel.
Regards
Yan, Zheng
> Seeing that even features are back
It's a bug. Search for the thread "Poor CentOS 7.5 client performance" in ceph-users.
On Tue, Aug 28, 2018 at 2:50 AM Marc Roos wrote:
>
>
> I have a idle test cluster (centos7.5, Linux c04
> 3.10.0-862.9.1.el7.x86_64), and a client kernel mount cephfs.
>
> I tested reading a few files on this cephfs moun
On Mon, Sep 3, 2018 at 1:57 AM Marlin Cremers
wrote:
>
> Hey there,
>
> So I now have a problem since none of my MDSes can start anymore.
>
> They are stuck in the resolve state since Ceph thinks there are still MDSes
> alive which I can see when I run:
>
Need the mds log to check why the MDSes are stuck.
Could you please try making ceph-fuse use the simple messenger (add "ms type
= simple" to the client section of ceph.conf).
Regards
Yan, Zheng
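In ceph.conf on the ceph-fuse host that would look roughly like this (a
sketch of the suggestion above):

# switch ceph-fuse from the async messenger to the simple messenger
[client]
ms type = simple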
On Wed, Sep 5, 2018 at 10:09 PM Sage Weil wrote:
>
> On Wed, 5 Sep 2018, Andras Pataki wrote:
> > Hi cephers,
> >
> > Every so
"items": 16753337,
> > "bytes": 68782648777
> > },
> > "buffer_meta": {
> > "items": 771,
> > "bytes": 67848
> > },
> > ... snip ...
> > "osdmap"