[ceph-users] After 13.2.2 upgrade: bluefs mount failed to replay log: (5) Input/output error

2018-10-03 Thread Kevin Olbrich
Hi! Yesterday one of our (non-priority) clusters failed when 3 OSDs went down (EC 8+2) together. *This is strange as we did an upgrade from 13.2.1 to 13.2.2 one or two hours before.* They failed exactly at the same moment, rendering the cluster unusable (CephFS). We are using CentOS 7 with latest

Re: [ceph-users] After 13.2.2 upgrade: bluefs mount failed to replay log: (5) Input/output error

2018-10-03 Thread Kevin Olbrich
Small addition: the failing disks are in the same host. This is a two-host cluster with failure domain set to OSD. On Wed, 3 Oct 2018 at 10:13, Kevin Olbrich wrote: > Hi! > > Yesterday one of our (non-priority) clusters failed when 3 OSDs went down > (EC 8+2) together. > *This is strange as we did

[ceph-users] network latency setup for osd nodes combined with vm

2018-10-03 Thread Marc Roos
It was not my first intention to host VMs on OSD nodes of the ceph cluster. But since this test cluster is not doing anything, I might as well use some of the cores. Currently I have configured a macvtap on the ceph client network configured as a vlan. Disadvantage is that the local osd's ca

Re: [ceph-users] Bluestore vs. Filestore

2018-10-03 Thread John Spray
On Tue, Oct 2, 2018 at 6:28 PM wrote: > > Hi. > > Based on some recommendations we have setup our CephFS installation using > bluestore*. We're trying to get a strong replacement for "huge" xfs+NFS > server - 100TB-ish size. > > Current setup is - a sizeable Linux host with 512GB of memory - one l

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-03 Thread Igor Fedotov
You may want to try new updates from the PR along with disabling flush on recovery for rocksdb (avoid_flush_during_recovery parameter). The full cmd line might look like: CEPH_ARGS="--bluestore_rocksdb_options avoid_flush_during_recovery=1" bin/ceph-bluestore-tool --path repair To be applied
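
Spelled out, the suggested invocation would look roughly like the sketch below; the OSD data path /var/lib/ceph/osd/ceph-<ID> is an assumption and should be replaced with the actual mount point.

    # run a BlueStore repair with RocksDB flush-on-recovery disabled
    CEPH_ARGS="--bluestore_rocksdb_options avoid_flush_during_recovery=1" \
        ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-<ID>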

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-03 Thread Igor Fedotov
Alex, upstream recommendations for DB sizing are probably good enough, but as with most fixed allocations they aren't super optimal for all use cases. Usually one either wastes space or lacks it one day in such configs. So I think we should have means to have more freedom in volumes managem

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-03 Thread Sergey Malinin
Repair has gone farther but failed on something different - this time it appears to be related to store inconsistency rather than lack of free space. Emailed log to you, beware: over 2GB uncompressed. > On 3.10.2018, at 13:15, Igor Fedotov wrote: > > You may want to try new updates from the P

Re: [ceph-users] After 13.2.2 upgrade: bluefs mount failed to replay log: (5) Input/output error

2018-10-03 Thread Igor Fedotov
I've seen somewhat similar behavior in a log from Sergey Malinin in another thread ("mimic: 3/4 OSDs crashed...") He claimed it happened after LVM volume expansion. Isn't this the case for you? Am I right that you use LVM volumes? On 10/3/2018 11:22 AM, Kevin Olbrich wrote: Small addition:

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-03 Thread Sergey Malinin
Update: I rebuilt ceph-osd with latest PR and it started, worked for a few minutes and eventually failed on enospc. After that ceph-bluestore-tool repair started to fail on enospc again. I was unable to collect ceph-osd log, so emailed you the most recent repair log. > On 3.10.2018, at 13:58,

Re: [ceph-users] Bluestore vs. Filestore

2018-10-03 Thread Paul Emmerich
I would never ever start a new cluster with Filestore nowadays. Sure, there are a few minor issues with Bluestore, like the fact that it currently requires some manual configuration for the cache. But overall, Bluestore is so much better. Your use case sounds like it might profit from the rados cache tier featur

[ceph-users] Some questions concerning filestore --> bluestore migration

2018-10-03 Thread Massimo Sgaravatto
Hi I have a ceph cluster, running luminous, composed of 5 OSD nodes, which is using filestore. Each OSD node has 2 E5-2620 v4 processors, 64 GB of RAM, 10x6TB SATA disk + 2x200GB SSD disk (then I have 2 other disks in RAID for the OS), 10 Gbps. So each SSD disk is used for the journal for 5 OSDs.
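
For reference, a minimal sketch of how one OSD is typically rebuilt as BlueStore with its DB on the SSD using ceph-volume; the device names /dev/sdb (data HDD) and /dev/sdg1 (SSD partition) are placeholders, not taken from the thread.

    # create a BlueStore OSD with RocksDB (block.db) on a separate SSD partition or LV
    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/sdg1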

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-03 Thread Igor Fedotov
To fix this specific issue please apply the following PR: https://github.com/ceph/ceph/pull/24339 This wouldn't fix the original issue, but just in case please try to run repair again. I will need a log if the error is different from the ENOSPC in your latest email. Thanks, Igor On 10/3/2018 1:58 PM

Re: [ceph-users] After 13.2.2 upgrade: bluefs mount failed to replay log: (5) Input/output error

2018-10-03 Thread Kevin Olbrich
The disks were deployed with ceph-deploy / ceph-volume using the default style (lvm) and not simple-mode. The disks were provisioned as a whole, no resizing. I never touched the disks after deployment. It is very strange that this first happened after the update, never met such an error before.

Re: [ceph-users] After 13.2.2 upgrade: bluefs mount failed to replay log: (5) Input/output error

2018-10-03 Thread Paul Emmerich
There's "ceph-bluestore-tool repair/fsck" In your scenario, a few more log files would be interesting: try setting debug bluefs to 20/20. And if that's not enough log try also setting debug osd, debug bluestore, and debug bdev to 20/20. Paul Am Mi., 3. Okt. 2018 um 13:48 Uhr schrieb Kevin Olbri

Re: [ceph-users] Mimic offline problem

2018-10-03 Thread Goktug Yildirim
Hi Sage, Thank you for your response. Now I am sure this incident is going to be resolved. The problem started when 7 servers crashed at the same time and they came back after ~5 minutes. Two of our 3 mon services were restarted in this crash. Since the mon services are enabled they should be started ne

Re: [ceph-users] Mimic offline problem

2018-10-03 Thread Darius Kasparavičius
Hello, You can also reduce the osd map updates by adding this to your ceph config file: "osd crush update on start = false". This should remove an update that is generated when the OSD starts. 2018-10-03 14:03:21.534 7fe15eddb700 0 mon.SRV-SBKUARK14@0(leader) e14 handle_command mon_command({"prefi
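
As a ceph.conf snippet, the suggested option would look like the sketch below (placing it in the [osd] section is an assumption):

    [osd]
        # skip the CRUSH location update (and the resulting osdmap churn) on OSD start
        osd crush update on start = false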

Re: [ceph-users] Mimic offline problem

2018-10-03 Thread Goktug Yildirim
Hello, It seems nothing has changed. OSD config: https://paste.ubuntu.com/p/MtvTr5HYW4/ OSD debug log: https://paste.ubuntu.com/p/7Sx64xGzkR/ > On 3 Oct 2018, at 14:27, Darius Kasparavičius wrote: > > Hello, >

[ceph-users] ceph-volume: recreate OSD with same ID after drive replacement

2018-10-03 Thread Andras Pataki
After replacing a failing drive I'd like to recreate the OSD with the same osd-id using ceph-volume (now that we've moved to ceph-volume from ceph-disk). However, I seem to not be successful. The command I'm using: ceph-volume lvm prepare --bluestore --osd-id 747 --data H901D44/H901D44 --block

Re: [ceph-users] ceph-volume: recreate OSD with same ID after drive replacement

2018-10-03 Thread Alfredo Deza
On Wed, Oct 3, 2018 at 9:57 AM Andras Pataki wrote: > > After replacing failing drive I'd like to recreate the OSD with the same > osd-id using ceph-volume (now that we've moved to ceph-volume from > ceph-disk). However, I seem to not be successful. The command I'm using: > > ceph-volume lvm pre

Re: [ceph-users] Mimic offline problem

2018-10-03 Thread Sage Weil
Oh... I think this is the problem: 2018-10-03 16:37:04.284 7efef2ae0700 20 slow op osd_pg_create(e72883 66.af:60196 66.ba:60196 66.be:60196 66.d8:60196 66.f8:60196 66.124:60196 66.14c:60196 66.1ac:60196 66.223:60196 66.248:60196 66.271:60196 66.2d1:60196 66.47a:68641) initiated 2018-10-03 16:20

Re: [ceph-users] Mimic offline problem

2018-10-03 Thread Goktug Yildirim
Sage, Pool 66 is the only pool it shows right now. This is a pool created months ago. ceph osd lspools 66 mypool As we recreated the mon db from the OSDs, the pools for MDS were unusable. So we deleted them. After we created another cephfs fs and pools we started MDS and it got stuck on creation. So we stoppe

Re: [ceph-users] ceph-volume: recreate OSD with same ID after drive replacement

2018-10-03 Thread Andras Pataki
Thanks - I didn't realize that was such a recent fix. I've now tried 12.2.8, and perhaps I'm not clear on what I should have done to the OSD that I'm replacing, since I'm getting the error "The osd ID 747 is already in use or does not exist." The case is clearly the latter, since I've comple

Re: [ceph-users] Recover data from cluster / get rid of down, incomplete, unknown pgs

2018-10-03 Thread Gregory Farnum
If you've really extracted all the PGs from the down OSDs, you should have been able to inject them into new OSDs and continue on from there with just rebalancing activity. The use of mark_unfound_lost_revert complicates matters a bit but I'm not sure what the behavior would be if you just put them

Re: [ceph-users] ceph-volume: recreate OSD with same ID after drive replacement

2018-10-03 Thread Alfredo Deza
On Wed, Oct 3, 2018 at 11:23 AM Andras Pataki wrote: > > Thanks - I didn't realize that was such a recent fix. > > I've now tried 12.2.8, and perhaps I'm not clear on what I should have > done to the OSD that I'm replacing, since I'm getting the error "The osd > ID 747 is already in use or does no
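
For context, the usual sequence that makes an OSD ID reusable is roughly the sketch below; osd ID 747 and the H901D44/H901D44 volume are taken from the original post.

    # mark the old OSD destroyed so its ID is kept and can be reused
    ceph osd destroy 747 --yes-i-really-mean-it
    # recreate the OSD on the replacement drive under the same ID
    ceph-volume lvm prepare --bluestore --osd-id 747 --data H901D44/H901D44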

Re: [ceph-users] getattr - failed to rdlock waiting

2018-10-03 Thread Gregory Farnum
On Tue, Oct 2, 2018 at 12:18 PM Thomas Sumpter wrote: > Hi Folks, > > > > I am looking for advice on how to troubleshoot some long operations found > in MDS. Most of the time performance is fantastic, but occasionally and to > no real pattern or trend, a getattr op will take up to ~30 seconds to

Re: [ceph-users] Help! OSDs across the cluster just crashed

2018-10-03 Thread Gregory Farnum
Yeah, don't run these commands blind. They are changing the local metadata of the PG in ways that may make it inconsistent with the overall cluster and result in lost data. Brett, it seems this issue has come up several times in the field but we haven't been able to reproduce it locally or get eno

Re: [ceph-users] Mimic offline problem

2018-10-03 Thread Sage Weil
On Wed, 3 Oct 2018, Goktug Yildirim wrote: > Sage, > > Pool 66 is the only pool it shows right now. This a pool created months ago. > ceph osd lspools > 66 mypool > > As we recreated mon db from OSDs, the pools for MDS was unusable. So we > deleted them. > After we create another cephfs fs and p

Re: [ceph-users] slow export of cephfs through samba

2018-10-03 Thread Gregory Farnum
On Thu, Sep 27, 2018 at 7:37 AM Chad W Seys wrote: > Hi all, >I am exporting cephfs using samba. It is much slower over samba than > direct. Anyone know how to speed it up? >Benchmarked using bonnie++ 5 times either directly to cephfs mounted > by kernel (v4.18.6) module: > bonnie++

Re: [ceph-users] Ceph 12.2.5 - FAILED assert(0 == "put on missing extent (nothing before)")

2018-10-03 Thread Ricardo J. Barberis
I created https://tracker.ceph.com/issues/36303 I can wait maybe a couple of days before recreating this OSD if you need me to extract some more info. Thanks. On Wednesday 03/10/2018 at 01:43, Gregory Farnum wrote: > I'd create a new ticket and reference the older one; they may not have th

Re: [ceph-users] Mimic offline problem

2018-10-03 Thread Goktug Yildirim
We are starting to work on it. First step is getting the structure out and dumping the current value as you say. And you were correct we did not run force_create_pg. > On 3 Oct 2018, at 17:52, Sage Weil wrote: > > On Wed, 3 Oct 2018, Goktug Yildirim wrote: >> Sage, >> >> Pool 66 is the only p

Re: [ceph-users] Help! OSDs across the cluster just crashed

2018-10-03 Thread Brett Chancellor
That turned out to be exactly the issue (and boy was it fun clearing pgs out on 71 OSDs). I think it's caused by a combination of two factors. 1. This cluster has way too many placement groups per OSD (just north of 800). It was fine when we first created all the pools, but upgrades (most recently t

Re: [ceph-users] Mimic offline problem

2018-10-03 Thread Sage Weil
On Wed, 3 Oct 2018, Goktug Yildirim wrote: > We are starting to work on it. First step is getting the structure out and > dumping the current value as you say. > > And you were correct we did not run force_create_pg. Great. So, eager to see what the current structure is... please attach once yo

[ceph-users] interpreting ceph mds stat

2018-10-03 Thread Jeff Smith
I need some help deciphering the results of ceph mds stat. I have been digging in the docs for hours; if someone could point me in the right direction and/or help me understand, I'd appreciate it. In the documentation it shows a result like this: cephfs-1/1/1 up {0=a=up:active} What does each of the 1s represent?

[ceph-users] fixing another remapped+incomplete EC 4+2 pg

2018-10-03 Thread Graham Allan
Following on from my previous adventure with recovering pgs in the face of failed OSDs, I now have my EC 4+2 pool operating with min_size=5, which is as things should be. However I have one pg which is stuck in state remapped+incomplete because it has only 4 out of 6 osds running, and I have be

Re: [ceph-users] Bluestore vs. Filestore

2018-10-03 Thread jesper
> Your use case sounds it might profit from the rados cache tier > feature. It's a rarely used feature because it only works in very > specific circumstances. But your scenario sounds like it might work. > Definitely worth giving it a try. Also, dm-cache with LVM *might* > help. > But if your activ

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-03 Thread Sergey Malinin
Finally goodness happened! I applied PR and ran repair on OSD unmodified after initial failure. It went through without any errors and now I'm able to fuse mount the OSD and export PGs off it using ceph-objectstore-tool. Just in order to not mess it up I haven't started ceph-osd until I have PGs

Re: [ceph-users] Mimic offline problem

2018-10-03 Thread Sage Weil
On Wed, 3 Oct 2018, Göktuğ Yıldırım wrote: > If I didn't do it wrong, I got the output as below. > > ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-SRV-SBKUARK14/store.db/ get > osd_pg_creating creating > dump > 2018-10-03 20:08:52.070 7f07f5659b80 1 rocksdb: do_open column families: > [defau

Re: [ceph-users] Bluestore vs. Filestore

2018-10-03 Thread Paul Emmerich
On Wed, 3 Oct 2018 at 20:10, wrote: > They are ordered and will hopefully arrive very soon. > > Can I: > 1) Add disks > 2) Create pool > 3) stop all MDS's > 4) rados cppool > 5) Start MDS > > .. Yes, that's a cluster-down on CephFS but shouldn't take long. Or is > there a better guide? you

Re: [ceph-users] Mimic offline problem

2018-10-03 Thread Goktug YILDIRIM
I changed the file name to make it clear. When I use your command with "+decode" I'm getting an error like this: ceph-dencoder type creating_pgs_t import DUMPFILE decode dump_json error: buffer::malformed_input: void creating_pgs_t::decode(ceph::buffer::list::iterator&) no longer understand old e

Re: [ceph-users] Mimic offline problem

2018-10-03 Thread Sage Weil
I bet the kvstore output is in a hexdump format? There is another option to get the raw data IIRC. On October 3, 2018 3:01:41 PM EDT, Goktug YILDIRIM wrote: >I changed the file name to make it clear. >When I use your command with "+decode" I'm getting an error like this: > >ceph-dencoder type

Re: [ceph-users] Mimic offline problem

2018-10-03 Thread Göktuğ Yıldırım
I'm so sorry about that, I missed the "out" parameter. My bad. This is the output: https://paste.ubuntu.com/p/KwT9c8F6TF/ Sage Weil wrote (3 Oct 2018 21:13): > I bet the kvstore output it in a hexdump format? There is another option to > get the raw data iirc > > > >> On October 3, 201
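
Putting the pieces of this sub-thread together, the working extract-and-decode sequence is roughly the sketch below; the mon store path comes from the thread and /tmp/creating.bin is a placeholder output file.

    # dump the raw value of the osd_pg_creating/creating key from the mon store
    ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-SRV-SBKUARK14/store.db \
        get osd_pg_creating creating out /tmp/creating.bin
    # decode it into readable JSON
    ceph-dencoder type creating_pgs_t import /tmp/creating.bin decode dump_json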

Re: [ceph-users] Mimic offline problem

2018-10-03 Thread Sage Weil
On Wed, 3 Oct 2018, Göktuğ Yıldırım wrote: > I'm so sorry about that I missed "out" parameter. My bad.. > This is the output: https://paste.ubuntu.com/p/KwT9c8F6TF/ Excellent, thanks. That looks like it confirms the problem is that the recovery tool didn't repopulate the creating pgs properly.

Re: [ceph-users] Mimic offline problem

2018-10-03 Thread Göktuğ Yıldırım
Also, you were asking for the RAW output. I've been trying to fix it for days and I haven't slept. Forgive the dumb mistakes. RAW dump output: https://drive.google.com/file/d/1SzFNNjSK9Q_j4iyYJTRqOYuLWJcsFX9C/view?usp=sharing Göktuğ Yıldırım wrote (3 Oct 2018 21:34): > I'm so sorry about tha

Re: [ceph-users] ceph-volume: recreate OSD with same ID after drive replacement

2018-10-03 Thread Andras Pataki
Ok, understood (for next time). But just as an update/closure to my investigation - it seems this is a limitation of ceph-volume (it can't just create an OSD from scratch with a given ID), not of base ceph. The underlying ceph command (ceph osd new) very happily accepts an osd-id as an extr
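
A sketch of what the poster appears to mean: the monitor command accepts an explicit ID as an optional second argument (the UUID here is freshly generated and the ID is illustrative).

    # base ceph lets you request a specific osd id directly
    ceph osd new $(uuidgen) 747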

Re: [ceph-users] Bluestore vs. Filestore

2018-10-03 Thread Ronny Aasen
On 03.10.2018 20:10, jes...@krogh.cc wrote: Your use case sounds it might profit from the rados cache tier feature. It's a rarely used feature because it only works in very specific circumstances. But your scenario sounds like it might work. Definitely worth giving it a try. Also, dm-cache with L

Re: [ceph-users] Bluestore vs. Filestore

2018-10-03 Thread Sage Weil
On Tue, 2 Oct 2018, jes...@krogh.cc wrote: > Hi. > > Based on some recommendations we have setup our CephFS installation using > bluestore*. We're trying to get a strong replacement for "huge" xfs+NFS > server - 100TB-ish size. > > Current setup is - a sizeable Linux host with 512GB of memory - o

Re: [ceph-users] ceph-volume: recreate OSD with same ID after drive replacement

2018-10-03 Thread Alfredo Deza
On Wed, Oct 3, 2018 at 3:52 PM Andras Pataki wrote: > > Ok, understood (for next time). > > But just as an update/closure to my investigation - it seems this is a > feature of ceph-volume (that it can't just create an OSD from scratch > with a given ID), not of base ceph. The underlying ceph comm

[ceph-users] hardware heterogeneous in same pool

2018-10-03 Thread Bruno Carvalho
Hi Cephers, I would like to know how you are growing your clusters: using dissimilar hardware in the same pool, or creating a pool for each different hardware group? What problems would I have using different hardware (CPU, memory, disk) in the same pool? Could someone share their experi
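
One common middle ground is to keep all hosts in one cluster but separate pools by CRUSH device class, so dissimilar drives never share a pool; a sketch with illustrative rule and pool names:

    # one CRUSH rule per device class, then point each pool at the matching rule
    ceph osd crush rule create-replicated replicated-hdd default host hdd
    ceph osd crush rule create-replicated replicated-ssd default host ssd
    ceph osd pool set mypool crush_rule replicated-hdd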

Re: [ceph-users] hardware heterogeneous in same pool

2018-10-03 Thread Jonathan D. Proulx
On Wed, Oct 03, 2018 at 07:09:30PM -0300, Bruno Carvalho wrote: :Hi Cephers, I would like to know how you are growing the cluster. : :Using dissimilar hardware in the same pool or creating a pool for each :different hardware group. : :What problem would I have many problems using different hardware

Re: [ceph-users] Some questions concerning filestore --> bluestore migration

2018-10-03 Thread solarflow99
I use the same configuration you have, and I plan on using bluestore. My SSDs are only 240GB and it worked with filestore all this time, I suspect bluestore should be fine too. On Wed, Oct 3, 2018 at 4:25 AM Massimo Sgaravatto < massimo.sgarava...@gmail.com> wrote: > Hi > > I have a ceph cluste

Re: [ceph-users] ceph-volume: recreate OSD with same ID after drive replacement

2018-10-03 Thread solarflow99
that's strange, I recall only deleting the OSD from the crush map, auth, then osd rm.. On Wed, Oct 3, 2018 at 2:54 PM Alfredo Deza wrote: > On Wed, Oct 3, 2018 at 3:52 PM Andras Pataki > wrote: > > > > Ok, understood (for next time). > > > > But just as an update/closure to my investigation - it

Re: [ceph-users] Mimic offline problem

2018-10-03 Thread Goktug Yildirim
This is our cluster state right now. I can reach rbd list and that's good! Thanks a lot Sage!!! ceph -s: https://paste.ubuntu.com/p/xBNPr6rJg2/ As you can see we have 2 unfound pgs since some of our OSDs can not start. 58 OSDs give different errors. How can I fix these OSDs? If I remember correct

[ceph-users] provide cephfs to mutiple project

2018-10-03 Thread Joshua Chen
Hello all, I am almost ready to provide storage (cephfs in the beginning) to my colleagues. They belong to different main projects and, according to the budgets they previously claimed, will have different capacities. For example, ProjectA will have 50TB and ProjectB will have 150TB. I chose cephf
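
One way to give each project a size-limited directory on a single CephFS is the quota extended attribute; a sketch, with the mount point /mnt/cephfs assumed and the byte values matching the 50TB/150TB figures from the post:

    # per-directory capacity limits via CephFS quotas (values in bytes)
    setfattr -n ceph.quota.max_bytes -v 50000000000000  /mnt/cephfs/projectA
    setfattr -n ceph.quota.max_bytes -v 150000000000000 /mnt/cephfs/projectB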

Re: [ceph-users] Mimic offline problem

2018-10-03 Thread Sage Weil
On Thu, 4 Oct 2018, Goktug Yildirim wrote: > This is our cluster state right now. I can reach rbd list and thats good! > Thanks a lot Sage!!! > ceph -s: https://paste.ubuntu.com/p/xBNPr6rJg2/ Progress! Not out of the woods yet, though... > As you can see we have 2 unfound pg since some of our O

[ceph-users] CephFS performance.

2018-10-03 Thread jesper
Hi All. First, thanks for the good discussion and strong answers I've gotten so far. The current cluster setup is 4 x 10 x 12TB 7.2K RPM drives, 10GbitE, and metadata on rotating drives - 3x replication - 256GB memory in the OSD hosts and 32+ cores. Behind a PERC with each disk as RAID0 and BBWC. Pl