[ceph-users] Unfound object on erasure when recovering

2018-10-02 Thread Jan Pekař - Imatic
Hi all, I'm playing with my testing cluster with ceph 12.2.8 installed. It happened to me for the second time that I have 1 unfound object on an erasure coded pool. I have an erasure 3+1 configuration. The first time I was adding an additional disk. During cluster rebalance I noticed one unfound ob
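
For anyone in the same state, a quick way to locate the unfound object (a hedged sketch; the PG id below is a placeholder) is:
ceph health detail | grep unfound     # shows which PG reports the unfound object
ceph pg <pgid> list_missing           # lists the missing/unfound objects and the OSDs that were probed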

Re: [ceph-users] Ceph 12.2.5 - FAILED assert(0 == "put on missing extent (nothing before)")

2018-10-02 Thread Gregory Farnum
I'd create a new ticket and reference the older one; they may not have the same cause. On Tue, Oct 2, 2018 at 12:33 PM Ricardo J. Barberis wrote: > Hello, > > I'm having this same issue on 12.2.8. Should I reopen the bug report? > > This cluster started on 12.2.4 and was upgraded to 12.2.5 and t

Re: [ceph-users] commit_latency equals apply_latency on bluestore

2018-10-02 Thread Gregory Farnum
As I mentioned in that email, the apply and commit values in BlueStore are equivalent. They're exported because it's part of the interface (thanks to FileStore), but they won't differ. If you're doing monitoring or graphs, just pick one. -Greg On Tue, Oct 2, 2018 at 3:43 PM Jakub Jaszewski wrote:
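
For reference, these are the two latency columns reported per OSD; a quick way to see them:
ceph osd perf     # columns: osd, commit_latency(ms), apply_latency(ms); on BlueStore both track the same value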

Re: [ceph-users] Mimic offline problem

2018-10-02 Thread Sage Weil
osd_find_best_info_ignore_history_les is a dangerous option and you should only use it in very specific circumstances when directed by a developer. In such cases it will allow a stuck PG to peer. But you're not getting to that point...you're seeing some sort of resource exhaustion. The noup t

Re: [ceph-users] Bluestore vs. Filestore

2018-10-02 Thread Christian Balzer
Hello, this has crept up before, find my thread "Bluestore caching, flawed by design?" for starters, if you haven't already. I'll have to build a new Ceph cluster next year and am also less than impressed with the choices at this time: 1. Bluestore is the new shiny, filestore is going to die

Re: [ceph-users] "rgw relaxed s3 bucket names" and underscores

2018-10-02 Thread Ryan Leimenstoll
Nope, you are right. I think it was just boto catching this for me and I took that for granted. I think that is the behavior I would expect too, S3-compliant restrictions on create and allow legacy buckets to remain. Anyway, noticed you created a ticket [0] in the tracker for this, thanks! Be

Re: [ceph-users] Help! OSDs across the cluster just crashed

2018-10-02 Thread Vasu Kulkarni
Can you file a tracker issue for your problem (http://tracker.ceph.com/projects/ceph/issues/new)? Once an email thread gets lengthy it is not a great way to track the issue. Ideally, full details of the environment (OS/Ceph versions, before/after state, workload info, tool used for the upgrade) are important if one has to recreate it. There a

[ceph-users] commit_latency equals apply_latency on bluestore

2018-10-02 Thread Jakub Jaszewski
Hi Cephers, Hi Gregory, I'm considering the same case as here, commit_latency==apply_latency in ceph osd perf http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/024317.html What's the meaning of commit_latency and apply_latency in bluestore OSD setups? How useful is it when troubleshooti

Re: [ceph-users] Help! OSDs across the cluster just crashed

2018-10-02 Thread Goktug Yildirim
Hi, Sorry to hear that. I've been battling with mine for 2 weeks :/ I corrected my OSDs with the following commands. My OSD logs (/var/log/ceph/ceph-OSDx.log) have a line including log(ERR) with the PG number beside it, just before the crash dump. ceph-objectstore-tool --data-path /var/lib/ceph/os
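
The exact command is cut off above; purely as an illustration (a hedged sketch with placeholder OSD and PG ids, not the author's actual invocation), exporting a problematic PG from a stopped OSD with this tool looks like:
systemctl stop ceph-osd@12                                # the tool requires the OSD to be offline
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --pgid 1.2a --op export --file /root/pg.1.2a.export
# the exported PG can later be re-imported elsewhere with --op import, or removed with --op remove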

[ceph-users] Testing cluster throughput - one OSD is always 100% utilized during rados bench write

2018-10-02 Thread Jakub Jaszewski
Hi Cephers, I'm testing cluster throughput before moving to production. Ceph version 13.2.1 (I'll update to 13.2.2). I run rados bench from 10 cluster nodes and 10 clients in parallel. Just after I call the rados command, the HDDs behind three OSDs are 100% utilized while others are < 40%. After the
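
For anyone wanting to reproduce this kind of measurement, a minimal sketch (assuming a throwaway pool named "bench" exists) is:
rados bench -p bench 60 write -t 16 --no-cleanup     # run from one or more client nodes
iostat -x 1                                          # on an OSD host, watch %util per HDD
rados -p bench cleanup                               # remove the benchmark objects afterwards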

Re: [ceph-users] "rgw relaxed s3 bucket names" and underscores

2018-10-02 Thread Robin H. Johnson
On Tue, Oct 02, 2018 at 12:37:02PM -0400, Ryan Leimenstoll wrote: > I was hoping to get some clarification on what "rgw relaxed s3 bucket > names = false” is intended to filter. Yes, it SHOULD have caught this case, but does not. Are you sure it rejects the uppercase? My test also showed that it

Re: [ceph-users] Mimic offline problem

2018-10-02 Thread Göktuğ Yıldırım
Has anyone heard about osd_find_best_info_ignore_history_les = true? Would that be useful here? There is very little information about it. Goktug Yildirim wrote (2 Oct 2018 22:11): > Hi, > > Indeed I left ceph-disk to decide the wal and db partitions when I read > somewhere that that will d

[ceph-users] Help! OSDs across the cluster just crashed

2018-10-02 Thread Brett Chancellor
Help. I have a 60 node cluster and most of the OSDs decided to crash themselves at the same time. They won't restart, the messages look like... --- begin dump of recent events --- 0> 2018-10-02 21:19:16.990369 7f57ab5b7d80 -1 *** Caught signal (Aborted) ** in thread 7f57ab5b7d80 thread_name:c

Re: [ceph-users] RBD Mirror Question

2018-10-02 Thread Jason Dillaman
On Tue, Oct 2, 2018 at 4:47 PM Vikas Rana wrote: > > Hi, > > We have a CEPH 3 node cluster at primary site. We created a RBD image and the > image has about 100TB of data. > > Now we installed another 3 node cluster on secondary site. We want to > replicate the image at primary site to this new

Re: [ceph-users] Bluestore vs. Filestore

2018-10-02 Thread Ronny Aasen
On 02.10.2018 21:21, jes...@krogh.cc wrote: On 02.10.2018 19:28, jes...@krogh.cc wrote: In the cephfs world there is no central server that hold the cache. each cephfs client reads data directly from the osd's. I can accept this argument, but nevertheless .. if I used Filestore - it would work.

[ceph-users] RBD Mirror Question

2018-10-02 Thread Vikas Rana
Hi, We have a CEPH 3 node cluster at primary site. We created a RBD image and the image has about 100TB of data. Now we installed another 3 node cluster on secondary site. We want to replicate the image at primary site to this new cluster on secondary site. As per documentation, we enabled journ

Re: [ceph-users] Mimic offline problem

2018-10-02 Thread Goktug Yildirim
Hi, Indeed I left ceph-disk to decide the wal and db partitions when I read somewhere that that will do the proper sizing. For the bluestore cache size I have plenty of RAM. I will increase it to 8GB for each and decide a more calculated number after the cluster settles. For the osd map loading I've a

Re: [ceph-users] Mimic offline problem

2018-10-02 Thread Goktug Yildirim
Thanks for the reply! My answers are inline. > On 2 Oct 2018, at 21:51, Paul Emmerich wrote: > > (Didn't follow the whole story, so you might have already answered that) > Did you check what the OSDs are doing during the period of high disk > utilization? > As in: > > * running perf top Did not

Re: [ceph-users] EC pool spread evenly across failure domains?

2018-10-02 Thread Paul Emmerich
step take default step choose indep 3 chassis step chooseleaf indep 2 host which will only work for k+m=6 setups Paul On Tue, 2 Oct 2018 at 20:36, Mark Johnston wrote: > > I have the following setup in a test cluster: > > -1 8.49591 root default > -15 2.83197 chassis vm1
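
To try a rule with those steps, the usual workflow (a sketch; file names and the rule id are arbitrary) is to decompile the CRUSH map, add the rule by hand, test it, and inject it back:
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt                                # add the rule with the steps above
crushtool -c crushmap.txt -o crushmap.new
crushtool -i crushmap.new --test --rule 1 --num-rep 6 --show-mappings    # verify it picks 2 hosts in each of 3 chassis
ceph osd setcrushmap -i crushmap.new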

Re: [ceph-users] Mimic offline problem

2018-10-02 Thread Darius Kasparavičius
Hi, I can see some issues from the osd log file. You have extremely small DB and WAL partitions: only 1GB for the DB and 576MB for the WAL. I would recommend cranking up the rocksdb cache size as much as possible. If you have RAM you can also increase bluestore's cache size for hdd. Default is 1GB be as

Re: [ceph-users] Mimic offline problem

2018-10-02 Thread Paul Emmerich
(Didn't follow the whole story, so you might have already answered that) Did you check what the OSDs are doing during the period of high disk utilization? As in: * running perf top * sampling a few stack traces from procfs or gdb * or just high log settings * running "status" on the admin socket l
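
Concretely, the checks Paul lists map to commands along these lines (OSD id and pid are placeholders; a sketch, not a prescription):
perf top -p <pid-of-ceph-osd>                              # where the OSD process spends CPU
cat /proc/<pid-of-ceph-osd>/stack                          # sample a kernel stack trace
ceph daemon osd.7 status                                   # "status" on the admin socket
ceph tell osd.7 injectargs '--debug_osd 20 --debug_ms 1'   # temporarily raise log levels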

Re: [ceph-users] EC pool spread evenly across failure domains?

2018-10-02 Thread Vasu Kulkarni
On Tue, Oct 2, 2018 at 11:35 AM Mark Johnston wrote: > > I have the following setup in a test cluster: > > -1 8.49591 root default > -15 2.83197 chassis vm1 > -3 1.41599 host ceph01 > 0 ssd 1.41599 osd.0 > -5 1.41599 host ceph02 > 1

Re: [ceph-users] Ceph 12.2.5 - FAILED assert(0 == "put on missing extent (nothing before)")

2018-10-02 Thread Ricardo J. Barberis
Hello, I'm having this same issue on 12.2.8. Should I reopen the bug report? This cluster started on 12.2.4 and was upgraded to 12.2.5 and then directly to 12.2.8 (we skipped 12.2.6 and 12.2.7), but the malfunctioning OSD is on a new node installed with 12.2.8. We're using CentOS 7.5, and bluestore f

Re: [ceph-users] cephfs issue with moving files between data pools gives Input/output error

2018-10-02 Thread Marc Roos
I would 'also' choose a solution where, in the case of a mv across pools, the user has to wait a bit longer for the copy to finish. And as said before, if you export cephfs via smb or nfs, I wonder how the nfs/smb server will execute the move. If I use a 1x replicated pool on /tmp and move the

Re: [ceph-users] Bluestore vs. Filestore

2018-10-02 Thread jesper
> On 02.10.2018 19:28, jes...@krogh.cc wrote: > In the cephfs world there is no central server that hold the cache. each > cephfs client reads data directly from the osd's. I can accept this argument, but nevertheless .. if I used Filestore - it would work. > This also means no > single point of

[ceph-users] getattr - failed to rdlock waiting

2018-10-02 Thread Thomas Sumpter
Hi Folks, I am looking for advice on how to troubleshoot some long operations found in MDS. Most of the time performance is fantastic, but occasionally, and with no real pattern or trend, a getattr op will take up to ~30 seconds to complete in MDS, which is stuck on "event": "failed to rdlock, wai
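
When one of these slow ops shows up, the in-flight operations and their event history can be dumped from the MDS admin socket (the MDS name is a placeholder):
ceph daemon mds.<name> dump_ops_in_flight     # each op's "event" history, e.g. "failed to rdlock, waiting"
ceph daemon mds.<name> dump_historic_ops      # recently completed slow ops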

Re: [ceph-users] Bluestore vs. Filestore

2018-10-02 Thread Ronny Aasen
On 02.10.2018 19:28, jes...@krogh.cc wrote: Hi. Based on some recommendations we have setup our CephFS installation using bluestore*. We're trying to get a strong replacement for "huge" xfs+NFS server - 100TB-ish size. Current setup is - a sizeable Linux host with 512GB of memory - one large De

[ceph-users] EC pool spread evenly across failure domains?

2018-10-02 Thread Mark Johnston
I have the following setup in a test cluster:
 -1       8.49591 root default
-15       2.83197     chassis vm1
 -3       1.41599         host ceph01
  0   ssd 1.41599             osd.0
 -5       1.41599         host ceph02
  1

[ceph-users] Recover data from cluster / get rid of down, incomplete, unknown pgs

2018-10-02 Thread Dylan Jones
Our ceph cluster stopped responding to requests two weeks ago, and I have been trying to fix it since then. After a semi-hard reboot, we had 11-ish OSDs "fail" spread across two hosts, with the pool size set to two. I was able to extract a copy of every PG that resided solely on the nonfunctional

Re: [ceph-users] QEMU/Libvirt + librbd issue using Luminous 12.2.7

2018-10-02 Thread Andre Goree
On 2018/10/02 2:03 pm, Andre Goree wrote: On 2018/10/02 1:54 pm, Jason Dillaman wrote: On Tue, Oct 2, 2018 at 1:48 PM Andre Goree wrote: I'm actually not so sure the libvirt user has write access to the location -- will libvirt automatically try to write to the file (given that it's a sett

Re: [ceph-users] QEMU/Libvirt + librbd issue using Luminous 12.2.7

2018-10-02 Thread Andre Goree
On 2018/10/02 1:54 pm, Jason Dillaman wrote: On Tue, Oct 2, 2018 at 1:48 PM Andre Goree wrote: I'm actually not so sure the libvirt user has write access to the location -- will libvirt automatically try to write to the file (given that it's a setting in ceph.conf)? I just confirmed that the

Re: [ceph-users] Mimic offline problem

2018-10-02 Thread Goktug Yildirim
Hello Darius, Thanks for the reply! The main problem is that we cannot query PGs. "ceph pg 67.54f query" gets stuck and waits forever since the OSD is unresponsive. We are certain that the OSD becomes unresponsive as soon as it is UP. And we are certain that the OSD responds again after its disk utilization stops. So

Re: [ceph-users] QEMU/Libvirt + librbd issue using Luminous 12.2.7

2018-10-02 Thread Jason Dillaman
On Tue, Oct 2, 2018 at 1:48 PM Andre Goree wrote: > > On 2018/10/02 1:29 pm, Jason Dillaman wrote: > > On Tue, Oct 2, 2018 at 1:25 PM Andre Goree wrote: > >> > >> > >> Unfortunately, it would appear that I'm not getting anything in the > >> logs > >> _but_ the creation of the rbd image -- i.e., n

Re: [ceph-users] QEMU/Libvirt + librbd issue using Luminous 12.2.7

2018-10-02 Thread Andre Goree
On 2018/10/02 1:29 pm, Jason Dillaman wrote: On Tue, Oct 2, 2018 at 1:25 PM Andre Goree wrote: Unfortunately, it would appear that I'm not getting anything in the logs _but_ the creation of the rbd image -- i.e., nothing regarding the attempt to attach it via libvirt. Here are the logs, fo

Re: [ceph-users] QEMU/Libvirt + librbd issue using Luminous 12.2.7

2018-10-02 Thread Jason Dillaman
On Tue, Oct 2, 2018 at 1:25 PM Andre Goree wrote: > > On 2018/10/02 10:26 am, Andre Goree wrote: > > On 2018/10/02 9:54 am, Jason Dillaman wrote: > >> Perhaps that pastebin link has the wrong log pasted? The provided log > >> looks like it's associated with the creation of image > >> "32635-b65927

[ceph-users] Bluestore vs. Filestore

2018-10-02 Thread jesper
Hi. Based on some recommendations we have setup our CephFS installation using bluestore*. We're trying to get a strong replacement for "huge" xfs+NFS server - 100TB-ish size. Current setup is - a sizeable Linux host with 512GB of memory - one large Dell MD1200 or MD1220 - 100TB + a Linux kernel N

Re: [ceph-users] QEMU/Libvirt + librbd issue using Luminous 12.2.7

2018-10-02 Thread Andre Goree
On 2018/10/02 10:26 am, Andre Goree wrote: On 2018/10/02 9:54 am, Jason Dillaman wrote: Perhaps that pastebin link has the wrong log pasted? The provided log looks like it's associated with the creation of image "32635-b6592790-5519-5184-b5ef-5f16b3523250" and not the attachment of an image to a

Re: [ceph-users] Cephfs mds cache tuning

2018-10-02 Thread Adam Tygart
It may be that having multiple mds is masking the issue, or that we truly didn't have a large enough inode cache at 55GB. Things are behaving for me now, even when presenting the same 0 entries in req and rlat. If this happens again, I'll attempt to get perf trace logs, along with ops, ops_in_flig

[ceph-users] "rgw relaxed s3 bucket names" and underscores

2018-10-02 Thread Ryan Leimenstoll
Hi all, I was hoping to get some clarification on what "rgw relaxed s3 bucket names = false” is intended to filter. In our cluster (Luminous 12.2.8, serving S3) it seems that RGW, with that setting set to false, is still allowing buckets with underscores in the name to be created, although thi

Re: [ceph-users] Mimic offline problem

2018-10-02 Thread Darius Kasparavičius
Hello, Currently you have 15 objects missing. I would recommend finding them and making backups of them. Ditch all other osds that are failing to start and concentrate on bringing online those that have missing objects. Then slowly turn off nodown and noout on the cluster and see if it stabilises
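
For the last step, clearing the flags and watching the result would look like this:
ceph osd unset nodown     # let the cluster mark dead OSDs down again
ceph osd unset noout      # allow recovery/backfill to re-replicate
ceph -w                   # watch whether the cluster stabilises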

Re: [ceph-users] Strange Ceph host behaviour

2018-10-02 Thread Steve Taylor
Unless this is related to load and the OSDs really are unresponsive, it is almost certainly some sort of network issue. Duplicate IP address maybe? Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation 380 Data Drive Suite 300 | Draper | Utah | 84020 Office: 801.871.2799 |

[ceph-users] Strange Ceph host behaviour

2018-10-02 Thread Vincent Godin
Ceph cluster on Jewel 10.2.11. Mons & Hosts are on CentOS 7.5.1804, kernel 3.10.0-862.6.3.el7.x86_64. Every day we can see in ceph.log on the Monitor a lot of logs like these: 2018-10-02 16:07:08.882374 osd.478 192.168.1.232:6838/7689 386 : cluster [WRN] map e612590 wrongly marked me down 2018-10-02 16
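
One quick way to see which OSDs are flapping most often, given cluster log lines in the format shown above (a simple sketch using the default log path):
grep 'wrongly marked me down' /var/log/ceph/ceph.log | awk '{print $3}' | sort | uniq -c | sort -rn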

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Sergey Malinin
Sent download link by email. verbosity=10, over 900M uncompressed. > On 2.10.2018, at 16:52, Igor Fedotov wrote: > > May I have a repair log for that "already expanded" OSD? > > > On 10/2/2018 4:32 PM, Sergey Malinin wrote: >> Repair goes through only when LVM volume has been expanded, otherw

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Alfredo Deza
On Tue, Oct 2, 2018 at 10:23 AM Alex Litvak wrote: > > Igor, > > Thank you for your reply. So what you are saying there are really no > sensible space requirements for a collocated device? Even if I setup 30 > GB for DB (which I really wouldn't like to do due to a space waste > considerations ) t

Re: [ceph-users] QEMU/Libvirt + librbd issue using Luminous 12.2.7

2018-10-02 Thread Andre Goree
On 2018/10/02 9:54 am, Jason Dillaman wrote: Perhaps that pastebin link has the wrong log pasted? The provided log looks like it's associated with the creation of image "32635-b6592790-5519-5184-b5ef-5f16b3523250" and not the attachment of an image to a VM. On Fri, Sep 28, 2018 at 3:15 PM Andr

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Alex Litvak
Igor, Thank you for your reply. So what you are saying is that there are really no sensible space requirements for a collocated device? Even if I set up 30 GB for the DB (which I really wouldn't like to do due to space waste considerations) there is a chance that if this space fills up I will be in th

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Igor Fedotov
Even with a single device bluestore has a sort of implicit "BlueFS partition" where the DB is stored. And it dynamically adjusts (rebalances) the space for that partition in the background. Unfortunately it might perform that "too lazily" and hence under some heavy load it might end up with the lack of
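
The current state of that implicit partition can be inspected per OSD via the admin socket, for example (OSD id is a placeholder; jq is only used for readability):
ceph daemon osd.3 perf dump | jq .bluefs     # compare db_used_bytes against db_total_bytes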

Re: [ceph-users] Mimic offline problem

2018-10-02 Thread by morphin
One of the ceph experts indicated that bluestore is somewhat preview tech (as far as Redhat is concerned). So it could be best to check out bluestore and rocksdb. There are some tools to check health and also repair, but there is limited documentation. Anyone who has experience with it? Anyone lead/help to a proper che

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Alex Litvak
I am sorry for interrupting the thread, but my understanding always was that bluestore on a single device should not care about the DB size, i.e. it would use the data part for all operations if the DB is full. And if that is not true, what would be sensible defaults on an 800 GB SSD? I used ceph-ansi

Re: [ceph-users] QEMU/Libvirt + librbd issue using Luminous 12.2.7

2018-10-02 Thread Jason Dillaman
Perhaps that pastebin link has the wrong log pasted? The provided log looks like it's associated with the creation of image "32635-b6592790-5519-5184-b5ef-5f16b3523250" and not the attachment of an image to a VM. On Fri, Sep 28, 2018 at 3:15 PM Andre Goree wrote: > > On 2018/09/28 2:26 pm, Andre G

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Igor Fedotov
May I have a repair log for that "already expanded" OSD? On 10/2/2018 4:32 PM, Sergey Malinin wrote: Repair goes through only when LVM volume has been expanded, otherwise it fails with enospc as well as any other operation. However, expanding the volume immediately renders bluefs unmountable

Re: [ceph-users] cephfs issue with moving files between data pools gives Input/output error

2018-10-02 Thread Janne Johansson
On Mon 1 Oct 2018 at 22:08, John Spray wrote: > > > totally new for me, also not what I would expect of a mv on a fs. I know > > this is normal to expect copying between pools, also from the s3cmd > > client. But I think more people will not expect this behaviour. Can't > > the move be implemente

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Sergey Malinin
Repair goes through only when the LVM volume has been expanded, otherwise it fails with enospc as well as any other operation. However, expanding the volume immediately renders bluefs unmountable with an IO error. 2 of 3 OSDs got their bluefs log corrupted (bluestore tool segfaults at the very end of bluefs
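
For reference, the expand step being described is roughly (assuming an LVM-backed OSD, with the OSD stopped; names and sizes are placeholders):
systemctl stop ceph-osd@5
lvextend -L +20G /dev/ceph-vg/osd-block-5
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-5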

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Igor Fedotov
You mentioned repair had worked before, is that correct? What's the difference now except the applied patch? Different OSD? Anything else? On 10/2/2018 3:52 PM, Sergey Malinin wrote: It didn't work, emailed logs to you. On 2.10.2018, at 14:43, Igor Fedotov wrote: The major change is in g

Re: [ceph-users] cephfs clients hanging multi mds to single mds

2018-10-02 Thread Paul Emmerich
The kernel cephfs client unfortunately has the tendency to get stuck in some unrecoverable states requiring a reboot, especially in older kernels. Usually it's not recoverable without a reboot. Paul On Tue, 2 Oct 2018 at 14:55, Jaime Ibar wrote: > > Hi Paul, > > I tried ceph-fuse mounting

Re: [ceph-users] cephfs clients hanging multi mds to single mds

2018-10-02 Thread Jaime Ibar
Hi Paul, I tried ceph-fuse mounting it in a different mount point and it worked. The problem here is we can't unmount the ceph kernel client as it is in use by some virsh processes. We forced the unmount and mounted ceph-fuse but we got an I/O error, and mount -l cleared all the processes but after

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Sergey Malinin
It didn't work, emailed logs to you. > On 2.10.2018, at 14:43, Igor Fedotov wrote: > > The major change is in get_bluefs_rebalance_txn function, it lacked > bluefs_rebalance_txn assignment.. > > > > On 10/2/2018 2:40 PM, Sergey Malinin wrote: >> PR doesn't seem to have changed since yesterd

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Igor Fedotov
The major change is in get_bluefs_rebalance_txn function, it lacked bluefs_rebalance_txn assignment.. On 10/2/2018 2:40 PM, Sergey Malinin wrote: PR doesn't seem to have changed since yesterday. Am I missing something? On 2.10.2018, at 14:15, Igor Fedotov wrote: Please update the patch f

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Sergey Malinin
PR doesn't seem to have changed since yesterday. Am I missing something? > On 2.10.2018, at 14:15, Igor Fedotov wrote: > > Please update the patch from the PR - it didn't update bluefs extents list > before. > > Also please set debug bluestore 20 when re-running repair and collect the log. >

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Igor Fedotov
Please update the patch from the PR - it didn't update bluefs extents list before. Also please set debug bluestore 20 when re-running repair and collect the log. If repair doesn't help - would you send repair and startup logs directly to me as I have some issues accessing ceph-post-file uplo
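
One way to run the repair at that debug level and capture a log for upload (assuming the tool's documented --log-file/--log-level options, and with the OSD stopped; paths are placeholders):
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-2 --log-file /tmp/osd.2-repair.log --log-level 20
ceph-post-file /tmp/osd.2-repair.log     # prints an id that can be shared on the list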

Re: [ceph-users] CRUSH puzzle: step weighted-take

2018-10-02 Thread Dan van der Ster
On Mon, Oct 1, 2018 at 8:09 PM Gregory Farnum wrote: > > On Fri, Sep 28, 2018 at 12:03 AM Dan van der Ster wrote: > > > > On Thu, Sep 27, 2018 at 9:57 PM Maged Mokhtar wrote: > > > > > > > > > > > > On 27/09/18 17:18, Dan van der Ster wrote: > > > > Dear Ceph friends, > > > > > > > > I have a CR

Re: [ceph-users] cephfs clients hanging multi mds to single mds

2018-10-02 Thread Paul Emmerich
Kernel 4.4 is not suitable for a multi MDS setup. In general, I wouldn't feel comfortable running 4.4 with kernel cephfs in production. I think at least 4.15 (not sure, but definitely > 4.9) is recommended for multi MDS setups. If you can't reboot: maybe try cephfs-fuse instead which is usually ve
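
A minimal fallback along those lines (monitor address and mount point are placeholders; the userspace client binary is ceph-fuse):
umount -l /mnt/cephfs                            # lazy-unmount the stuck kernel mount, if possible
ceph-fuse -m mon1.example.com:6789 /mnt/cephfs   # remount via the FUSE client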

Re: [ceph-users] NVMe SSD not assigned "nvme" device class

2018-10-02 Thread Hervé Ballans
Hi, You can easily configure it manually, e.g. : $ sudo ceph osd crush rm-device-class osd.xx $ sudo ceph osd crush set-device-class nvme osd.xx Indeed, it may be useful when you want to create custom rules on this type of device. Hervé On 01/10/2018 at 23:25, Vladimir Brik wrote: Hello,
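
As an example of such a custom rule (a sketch; rule and pool names are arbitrary), once the OSDs carry the nvme class you can create a replicated rule restricted to it and point a pool at it:
$ sudo ceph osd crush rule create-replicated nvme-only default host nvme
$ sudo ceph osd pool set mypool crush_rule nvme-only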

Re: [ceph-users] cephfs clients hanging multi mds to single mds

2018-10-02 Thread Jaime Ibar
Hi Paul, we're using 4.4 kernel. Not sure if more recent kernels are stable for production services. In any case, as there are some production services running on those servers, rebooting wouldn't be an option if we can bring ceph clients back without rebooting. Thanks Jaime On 01/10/18 21

Re: [ceph-users] cephfs clients hanging multi mds to single mds

2018-10-02 Thread Jaime Ibar
Hi, there's only one entry in the blacklist, however it is a mon, not a cephfs client, and no cephfs is mounted on that host. We're using the kernel client and the kernel version is 4.4 for ceph services and cephfs clients. This is what we have in /sys/kernel/debug/ceph: cat mdsmap epoch 59259 root 0

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Sergey Malinin
Yes, I did repair all OSDs and it finished with 'repair success'. I backed up OSDs so now I have more room to play. I posted log files using ceph-post-file with the following IDs: 4af9cc4d-9c73-41c9-9c38-eb6c551047a0 20df7df5-f0c9-4186-aa21-4e5c0172cd93 > On 2.10.2018, at 11:26, Igor Fedotov wr

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Igor Fedotov
You did repair for any of these OSDs, didn't you? For all of them? Would you please provide the logs for both types (failed on mount and failed with enospc) of failing OSDs. Prior to collecting, please remove the existing ones and set debug bluestore to 20. On 10/2/2018 2:16 AM, Sergey Mali