[ceph-users] mds_cache_memory_limit value

2018-10-05 Thread Hervé Ballans
Hi all, I have just configured a new value for 'mds_cache_memory_limit'. The output message says "not observed, change may require restart". So I'm not really sure: has the new value been taken into account directly, or do I have to restart the mds daemons on each MDS node? $ sudo ceph tell
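
A minimal sketch of how such a change is usually applied and then verified, assuming an MDS daemon named mds.a and an example limit of 4 GiB (both are placeholders, not values from this thread):

    # inject the value into the running daemon
    ceph tell mds.a injectargs '--mds_cache_memory_limit=4294967296'

    # check what the running daemon actually uses (run on the MDS host)
    ceph daemon mds.a config get mds_cache_memory_limit

    # persist it in ceph.conf under [mds] so it survives a restart:
    #   mds_cache_memory_limit = 4294967296

As the follow-up below suggests, checking the daemon's own view of the limit and its cache size is a safer answer than assuming a restart is required.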

Re: [ceph-users] mds_cache_memory_limit value

2018-10-05 Thread Eugen Block
Hi, you can monitor the cache size and see if the new values are applied: ceph@mds:~> ceph daemon mds.<name> cache status { "pool": { "items": 106708834, "bytes": 5828227058 } } You should also see in top (or similar tools) that the memory increases/decreases. From my experi
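
A sketch of how this can be watched from the MDS host; the daemon name mds.a is an assumption:

    # cache size as seen by the MDS itself
    ceph daemon mds.a cache status

    # the limit the running daemon is using
    ceph daemon mds.a config get mds_cache_memory_limit

    # resident memory of the MDS process, for comparison with top
    ps -o rss,cmd -C ceph-mds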

Re: [ceph-users] Erasure coding with more chunks than servers

2018-10-05 Thread Caspar Smit
Hi Vlad, You can check this blog: http://cephnotes.ksperis.com/blog/2017/01/27/erasure-code-on-small-clusters Note! Be aware that these settings do not automatically cover a node failure. Check out this thread to see why: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-February/024423.html K
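
To illustrate the caveat: with more chunks than servers, several chunks of one object necessarily land on the same node, so a node failure can cost more chunks than m covers. The simplest variant of that idea uses the OSD as the failure domain; profile and pool names, k/m values and pg counts below are placeholders, and this is not the exact rule from the blog post:

    # 4 data + 2 coding chunks, distributed per OSD rather than per host
    ceph osd erasure-code-profile set ec42-osd k=4 m=2 crush-failure-domain=osd
    ceph osd erasure-code-profile get ec42-osd

    # pool using that profile (pg counts are illustrative)
    ceph osd pool create ecpool 64 64 erasure ec42-osd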

Re: [ceph-users] Cannot write to cephfs if some osd's are not available on the client network

2018-10-05 Thread Marc Roos
I guess then this waiting "quietly" should be looked at again, I am having load of 10 on this vm. [@~]# uptime 11:51:58 up 4 days, 1:35, 1 user, load average: 10.00, 10.01, 10.05 [@~]# uname -a Linux smb 3.10.0-862.11.6.el7.x86_64 #1 SMP Tue Aug 14 21:49:04 UTC 2018 x86_64 x86_64 x86_64
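
On Linux the load average also counts tasks in uninterruptible sleep (state D), which is what blocked CephFS I/O typically produces, so a load of 10 can mean ten stuck processes rather than CPU work. A quick way to check (generic sketch, not specific to this VM):

    # list tasks stuck in uninterruptible sleep and what they are waiting on
    ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'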

[ceph-users] Inconsistent directory content in cephfs

2018-10-05 Thread Burkhard Linke
Hi, a user just stumbled across a problem with directory content in cephfs (kernel client, ceph 12.2.8, one active, one standby-replay instance): root@host1:~# ls /ceph/sge-tmp/db/work/06/ | wc -l 224 root@host1:~# uname -a Linux host1 4.13.0-32-generic #35~16.04.1-Ubuntu SMP Thu Jan 25 10:1

[ceph-users] Invalid bucket in reshard list

2018-10-05 Thread Alexandru Cucu
Hello, I'm running a Luminous 12.2.7 cluster. Wanted to reshard the index of an RGW bucket and accidentally typed the name wrong. Now in "radosgw-admin reshard list" I have a task for a bucket that does not exist. Can't process or cancel it: # radosgw-admin reshard process ERROR: failed
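
For context, this is the usual sequence; the poster reports that the cancel step also fails for the mistyped name, so this is not a fix, just the commands being discussed (the bucket name is a placeholder):

    # list queued reshard tasks
    radosgw-admin reshard list

    # normally removes a queued task for one bucket
    radosgw-admin reshard cancel --bucket=<bucket-name>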

Re: [ceph-users] Inconsistent directory content in cephfs

2018-10-05 Thread Paul Emmerich
Try running a scrub on that directory; that might yield more information. ceph daemon mds.XXX scrub_path /path/in/cephfs recursive Afterwards you can maybe try to repair it if it finds the error. Could also be something completely different, like a bug in the clients. Paul On Fri, 5 Oct 2018 at
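
A sketch of that sequence on the active MDS host; mds.XXX stands for the local daemon name and the path is an example:

    # recursive scrub of the affected directory
    ceph daemon mds.XXX scrub_path /path/in/cephfs recursive

    # list anything the scrub flagged
    ceph daemon mds.XXX damage ls

    # optionally let the scrub attempt a repair of what it found
    ceph daemon mds.XXX scrub_path /path/in/cephfs recursive repair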

Re: [ceph-users] Best handling network maintenance

2018-10-05 Thread Martin Palma
Thank you all for the clarifications and suggestions. Here is a small experience report of what happened during the network maintenance; maybe it is useful for others too: As previously written, the Ceph cluster is stretched across two data centers and has a size of 39 storage nodes with a total of 525

Re: [ceph-users] Inconsistent directory content in cephfs

2018-10-05 Thread Sergey Malinin
Are you sure these mounts (work/06 and work/6c) refer to the same directory? > On 5.10.2018, at 13:57, Burkhard Linke > wrote: > > root@host2:~# ls /ceph/sge-tmp/db/work/06/ | wc -l ... > root@host3:~# ls /ceph/sge-tmp/db/work/6c | wc -l
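
One way to answer that question is to compare inode numbers on the hosts involved; a sketch using the paths from the quoted mails (run on each host, a missing path will simply report an error):

    # inode, device and name of each directory
    stat -c '%i %d  %n' /ceph/sge-tmp/db/work/06 /ceph/sge-tmp/db/work/6c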

Re: [ceph-users] Best handling network maintenance

2018-10-05 Thread Darius Kasparavičius
Hello, I would have risked the nodown option for this short downtime. We had a similar experience when we updated a bonded switch and had to reboot it. Some of the connections dropped and the whole cluster started marking some OSDs as down. Due to this almost all OSDs were marked as down, but none of the p
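
For reference, a sketch of how such flags are set around a short, planned interruption and removed afterwards; whether nodown, noout or both fit a given maintenance is exactly the judgement discussed in this thread:

    # before the maintenance window
    ceph osd set nodown
    ceph osd set noout

    # ... maintenance ...

    ceph osd unset nodown
    ceph osd unset noout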

Re: [ceph-users] Erasure coding with more chunks than servers

2018-10-05 Thread Paul Emmerich
Oh, and you'll need to use m>=3 to ensure availability during a node failure. Paul On Fri, 5 Oct 2018 at 11:22, Caspar Smit wrote: > > Hi Vlad, > > You can check this blog: > http://cephnotes.ksperis.com/blog/2017/01/27/erasure-code-on-small-clusters > > Note! Be aware that these setting
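
The reasoning, briefly: an EC pool only keeps serving I/O while at least min_size chunks per PG are available, so m has to be large enough to cover every chunk lost when a whole node goes down in such a layout. How to check the relevant values on an existing pool (pool and profile names are placeholders):

    ceph osd pool get <ecpool> min_size
    ceph osd pool get <ecpool> erasure_code_profile
    ceph osd erasure-code-profile get <profile>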

Re: [ceph-users] Mimic offline problem

2018-10-05 Thread Sage Weil
Quick update here: The problem with the OSDs that are throwing rocksdb errors (missing SST files) is that ceph-kvstore-tool bluestore-kv ... repair was run on OSDs, and it looks like the rocksdb repair function actually broke the (non-broken) rocksdb instance. I'm not quite sure why that is th

Re: [ceph-users] Cluster broken and ODSs crash with failed assertion in PGLog::merge_log

2018-10-05 Thread Neha Ojha
Hi JJ, In this case, the condition olog.head >= log.tail is not true, and therefore it crashes. Could you please open a tracker issue (https://tracker.ceph.com/) and attach the OSD logs and the pg dump output? Thanks, Neha On Thu, Oct 4, 2018 at 9:29 AM, Jonas Jelten wrote: > Hello! > > Unfortunately
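
A sketch of how the requested material is typically collected; the OSD id and log path are placeholders and depend on the installation:

    # pg dump to attach to the tracker issue
    ceph pg dump --format json-pretty > pg_dump.json

    # default log location of a crashing OSD on most packaged installs
    ls -l /var/log/ceph/ceph-osd.<id>.log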

Re: [ceph-users] deep scrub error caused by missing object

2018-10-05 Thread ceph
Hello Roman, I am not sure if I can be of help, but perhaps these commands can help to find the objects in question... ceph health detail rados list-inconsistent-pg rbd rados list-inconsistent-obj 2.10d I guess it is also interesting to know whether you use bluestore or filestore... HTH - Mehmet On
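
Cleaned up, the commands Mehmet is pointing at look roughly like this; the pg id 2.10d is taken from the quoted report, and a repair should only be issued once the cause of the inconsistency is understood:

    ceph health detail
    rados list-inconsistent-pg rbd
    rados list-inconsistent-obj 2.10d --format=json-pretty

    # once the bad copy is identified
    ceph pg repair 2.10d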

Re: [ceph-users] Some questions concerning filestore --> bluestore migration

2018-10-05 Thread solarflow99
oh my.. yes 2TB enterprise class SSDs, that's a much higher requirement than filestore had. That would be cost prohibitive for any lower-end ceph cluster. On Thu, Oct 4, 2018 at 11:19 PM Massimo Sgaravatto < massimo.sgarava...@gmail.com> wrote: > Argg !! > With 10x10TB SATA DB and 2 SSD disks

Re: [ceph-users] Some questions concerning filestore --> bluestore migration

2018-10-05 Thread Mark Nelson
FWIW, here are values I measured directly from the RocksDB SST files under different small write workloads (ie the ones where you'd expect a larger DB footprint): https://drive.google.com/file/d/1Ews2WR-y5k3TMToAm0ZDsm7Gf_fwvyFw/view?usp=sharing These tests were only with 256GB of data written
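
For comparison with such offline measurements, the live DB usage of a BlueStore OSD can be read from its bluefs perf counters; osd.0 is a placeholder and the command is run on the OSD host:

    ceph daemon osd.0 perf dump bluefs | grep -E 'db_(total|used)_bytes'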

Re: [ceph-users] interpreting ceph mds stat

2018-10-05 Thread Gregory Farnum
On Wed, Oct 3, 2018 at 10:09 AM Jeff Smith wrote: > I need some help deciphering the results of ceph mds stat. I have > been digging in the docs for hours. If someone can point me in the > right direction and/or help me understand. > > In the documentation it shows a result like this. > > cephf
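
For anyone else deciphering that output: the compact summary from ceph mds stat can be cross-checked against the more verbose views below (a sketch; the comment on the format follows the Luminous-era docs):

    # compact form, roughly <fsname>-<up>/<in>/<max_mds> up {rank=name=state}, N up:standby
    ceph mds stat

    # per-rank table with daemon state, clients and cache size
    ceph fs status

    # full FSMap, including standbys
    ceph fs dump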

Re: [ceph-users] provide cephfs to mutiple project

2018-10-05 Thread Gregory Farnum
Check out http://docs.ceph.com/docs/master/cephfs/client-auth/ On Wed, Oct 3, 2018 at 8:58 PM Joshua Chen wrote: > Hello all, > I am almost ready to provide storage (cephfs in the beginning) to my > colleagues, they belong to different main project, and according to their > budget that are pre
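
The page linked above boils down to per-path capabilities; a minimal sketch for two projects, where the filesystem name cephfs, the client names and the paths are all assumptions (the directories must already exist):

    # one key per project, restricted to its own directory
    ceph fs authorize cephfs client.project_a /project_a rw
    ceph fs authorize cephfs client.project_b /project_b rw

    # each group then mounts only its subtree, e.g. with the kernel client:
    # mount -t ceph mon1:6789:/project_a /mnt/project_a -o name=project_a,secretfile=/etc/ceph/project_a.secret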

Re: [ceph-users] Cannot write to cephfs if some osd's are not available on the client network

2018-10-05 Thread Gregory Farnum
On Fri, Oct 5, 2018 at 3:13 AM Marc Roos wrote: > > > I guess then this waiting "quietly" should be looked at again, I am > having load of 10 on this vm. > > [@~]# uptime > 11:51:58 up 4 days, 1:35, 1 user, load average: 10.00, 10.01, 10.05 > > [@~]# uname -a > Linux smb 3.10.0-862.11.6.el7.x

Re: [ceph-users] MDS hangs in "heartbeat_map" deadlock

2018-10-05 Thread Gregory Farnum
On Thu, Oct 4, 2018 at 3:58 PM Stefan Kooman wrote: > Dear list, > > Today we hit our first Ceph MDS issue. Out of the blue the active MDS > stopped working: > > mon.mon1 [WRN] daemon mds.mds1 is not responding, replacing it as rank 0 > with standby > daemon mds.mds2. > > Logging of ceph-mds1: >

Re: [ceph-users] MDS hangs in "heartbeat_map" deadlock

2018-10-05 Thread Stefan Kooman
Quoting Gregory Farnum (gfar...@redhat.com): > > Ah, there's a misunderstanding here — the output isn't terribly clear. > "is_healthy" is the name of a *function* in the source code. The line > > heartbeat_map is_healthy 'MDSRank' had timed out after 15 > > is telling you that the heartbeat_map'

Re: [ceph-users] MDS damaged after mimic 13.2.1 to 13.2.2 upgrade

2018-10-05 Thread Sergey Malinin
Update: I discovered http://tracker.ceph.com/issues/24236 and https://github.com/ceph/ceph/pull/22146 Make sure that it is not relevant in your case before proceeding to operations that modify on-disk data. > On

[ceph-users] daahboard

2018-10-05 Thread solarflow99
I enabled the dashboard module in ansible but I don't see ceph-mgr listening on a port for it. Is there something else I missed?
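
A hedged sketch of how this is usually checked by hand; details differ between Luminous and Mimic, so verify against the installed release:

    # is the module enabled, and which URL does the active mgr publish?
    ceph mgr module ls
    ceph mgr services

    # enable it manually if the playbook did not
    ceph mgr module enable dashboard

    # on Mimic the dashboard additionally needs a certificate and a user, e.g.
    # ceph dashboard create-self-signed-cert
    # ceph dashboard set-login-credentials <user> <password>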