Re: [ceph-users] data cleanup/disposal process

2018-01-11 Thread M Ranga Swami Reddy
Hi - The "rbd rm" or "rados rm -p " will not clean the data in side the OSDs. for ex: I wrote 1 MB data on my image/volume, then removed that image using "rbd rm" command, is this "rbd rm" will remove the data in side the OSD's object or just mark it as removed. Thanks Swami On Thu, Jan 4, 2018 a

Re: [ceph-users] Linux Meltdown (KPTI) fix and how it affects performance?

2018-01-11 Thread Christian Balzer
Hello, On Thu, 11 Jan 2018 11:42:53 -0600 Adam Tygart wrote: > Some people are doing hyperconverged ceph, colocating qemu > virtualization with ceph-osds. It is relevant for a decent subset of > people here. Therefore knowledge of the degree of performance > degradation is useful. > It was my u

Re: [ceph-users] Trying to increase number of PGs throws "Error E2BIG" though PGs/OSD < mon_max_pg_per_osd

2018-01-11 Thread Brad Hubbard
On Fri, Jan 12, 2018 at 11:27 AM, Subhachandra Chandra wrote: > Hello, > > We are running experiments on a Ceph cluster before we move data on it. > While trying to increase the number of PGs on one of the pools it threw the > following error > > root@ctrl1:/# ceph osd pool set data pg_num 65

[ceph-users] Trying to increase number of PGs throws "Error E2BIG" though PGs/OSD < mon_max_pg_per_osd

2018-01-11 Thread Subhachandra Chandra
Hello, We are running experiments on a Ceph cluster before we move data onto it. While trying to increase the number of PGs on one of the pools, it threw the following error: root@ctrl1:/# ceph osd pool set data pg_num 65536 Error E2BIG: specified pg_num 65536 is too large (creating 32768 new PG
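If I remember right, the monitor limits how many new PGs a single command may create per involved OSD (mon_osd_max_split_count), so one workaround is to grow pg_num in smaller steps, roughly:
  ceph osd pool set data pg_num 36864     # step size is illustrative only
  # wait for the new PGs to be created and to peer, then repeat in steps up to 65536
  ceph osd pool set data pgp_num 65536    # finally raise pgp_num to trigger the actual rebalance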

Re: [ceph-users] 4 incomplete PGs causing RGW to go offline?

2018-01-11 Thread David Turner
Which pools are the incomplete PGs a part of? I would say it's very likely that if some of the RGW metadata were incomplete, the daemons wouldn't be happy. On Thu, Jan 11, 2018, 6:17 PM Brent Kennedy wrote: > We have 3 RadosGW servers running behind HAProxy to enable clients to > connect to t
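A quick hedged way to answer that, since the leading number of a PG id is the pool id:
  ceph health detail | grep incomplete    # e.g. "pg 5.3f is incomplete, acting [12,7,31]" (output format approximate)
  ceph osd pool ls detail                 # match the leading "5" against "pool 5 '...'" to get the pool name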

[ceph-users] 4 incomplete PGs causing RGW to go offline?

2018-01-11 Thread Brent Kennedy
We have 3 RadosGW servers running behind HAProxy to enable clients to connect to the ceph cluster like an Amazon S3 bucket. After all the failures and upgrade issues were resolved, I cannot get the RadosGW servers to stay online. They were upgraded to luminous; I even upgraded the OS to Ubuntu 16 on

Re: [ceph-users] filestore to bluestore: osdmap epoch problem and is the documentation correct?

2018-01-11 Thread Reed Dier
Thank you for documenting your progress and peril on the ML. Luckily I only have 24x 8TB HDD and 50x 1.92TB SSDs to migrate over to bluestore. 8 nodes, 4 chassis (failure domain), 3 drives per node for the HDDs, so I’m able to do about 3 at a time (1 node) for rip/replace. Definitely taking it

Re: [ceph-users] filestore to bluestore: osdmap epoch problem and is the documentation correct?

2018-01-11 Thread Brady Deetz
I hear you on time. I have 350 x 6TB drives to convert. I recently posted about a disaster I created automating my migration. Good luck On Jan 11, 2018 12:22 PM, "Reed Dier" wrote: > I am in the process of migrating my OSDs to bluestore finally and thought > I would give you some input on how I

Re: [ceph-users] filestore to bluestore: osdmap epoch problem and is the documentation correct?

2018-01-11 Thread Reed Dier
I am in the process of migrating my OSDs to bluestore finally and thought I would give you some input on how I am approaching it. Some of the saga you can find in another ML thread here: https://www.spinics.net/lists/ceph-users/msg41802.html
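For anyone following along, a minimal per-OSD rip/replace sketch under Luminous (the OSD id and device name are placeholders, and this is only an outline, not Reed's exact procedure):
  ceph osd out 23
  # wait for backfill to finish and health to return to OK
  systemctl stop ceph-osd@23
  ceph osd purge 23 --yes-i-really-mean-it    # Luminous shortcut: removes the CRUSH entry, auth key and OSD id
  ceph-volume lvm zap /dev/sdX                # wipe the old filestore disk
  ceph-volume lvm create --bluestore --data /dev/sdX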

Re: [ceph-users] Ceph MGR Influx plugin 12.2.2

2018-01-11 Thread Reed Dier
This morning I went through and enabled the influx plugin in ceph-mgr on 12.2.2, so far so good. The only non-obvious step was installing the python-influxdb package that it depends on. That probably needs to be baked into the documentation somewhere. Other than that, 90% of the stats I use are in this,
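For anyone else enabling it, the steps described above amount to roughly the following (Debian/Ubuntu package name shown; the InfluxDB connection settings then go in via the module's config keys, so check the plugin documentation for the exact names):
  apt-get install python-influxdb    # the dependency that is not pulled in automatically
  ceph mgr module enable influx
  ceph mgr module ls                 # confirm "influx" is listed as enabled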

Re: [ceph-users] Linux Meltdown (KPTI) fix and how it affects performance?

2018-01-11 Thread Adam Tygart
Some people are doing hyperconverged ceph, colocating qemu virtualization with ceph-osds. It is relevant for a decent subset of people here. Therefore knowledge of the degree of performance degradation is useful. -- Adam On Thu, Jan 11, 2018 at 11:38 AM, wrote: > I don't understand how all of t

Re: [ceph-users] Linux Meltdown (KPTI) fix and how it affects performance?

2018-01-11 Thread ceph
I don't understand how all of this is related to Ceph. Ceph runs on dedicated hardware, there is nothing there except Ceph, and the ceph daemons already have full power over ceph's data. And there is no random code execution allowed on this node. Thus, Spectre & Meltdown are meaningless for Cep

Re: [ceph-users] Linux Meltdown (KPTI) fix and how it affects performance?

2018-01-11 Thread Dan van der Ster
Hi all, Is anyone getting useful results from their benchmarking? I've prepared two test machines/pools and don't see any definitive slowdown with patched kernels from CentOS [1]. I wonder if Ceph will be somewhat tolerant of these patches, similarly to what's described here: http://www.scylladb.c
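For anyone wanting to reproduce the comparison, a simple A/B run with rados bench on a throwaway pool (pool name and PG count assumed), executed once on the stock kernel and once on the patched one:
  ceph osd pool create bench 128 128
  rados bench -p bench 60 write -t 16 --no-cleanup
  rados bench -p bench 60 seq -t 16
  rados -p bench cleanup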

Re: [ceph-users] replace failed disk in Luminous v12.2.2

2018-01-11 Thread Dietmar Rieder
Hi Konstantin, thanks for your answer, see my answer to Alfredo which includes your suggestions. ~Dietmar On 01/11/2018 12:57 PM, Konstantin Shalygin wrote: >> Now wonder what is the correct way to replace a failed OSD block disk? > > Generic way for maintenance (e.g. disk replace) is rebalance

Re: [ceph-users] replace failed disk in Luminous v12.2.2

2018-01-11 Thread Dietmar Rieder
Hi Alfredo, thanks for your comments, see my answers inline. On 01/11/2018 01:47 PM, Alfredo Deza wrote: > On Thu, Jan 11, 2018 at 4:30 AM, Dietmar Rieder > wrote: >> Hello, >> >> we have failed OSD disk in our Luminous v12.2.2 cluster that needs to >> get replaced. >> >> The cluster was initiall

Re: [ceph-users] issue adding OSDs

2018-01-11 Thread Luis Periquito
This was a bit weird, but is now working... Writing it up for future reference in case someone faces the same issue. This cluster was upgraded from jewel to luminous following the recommended process. When it was finished I just set require_osd to luminous. However, I hadn't restarted the daemons since. S
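For anyone hitting the same thing, a hedged checklist of the order that avoids it:
  ceph versions                           # shows which daemons are still running pre-luminous code
  systemctl restart ceph-osd.target       # on each OSD host, one host at a time
  ceph osd require-osd-release luminous   # only after every OSD reports a luminous version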

Re: [ceph-users] Does anyone use rcceph script in CentOS/SUSE?

2018-01-11 Thread Ken Dreyer
Please drop it, it has been untested for a long time. - Ken On Thu, Jan 11, 2018 at 4:49 AM, Nathan Cutler wrote: > To all who are running Ceph on CentOS or SUSE: do you use the "rcceph" > script? The ceph RPMs ship it in /usr/sbin/rcceph > > (Why I ask: more-or-less the same functionality is pr

Re: [ceph-users] Ceph MGR Influx plugin 12.2.2

2018-01-11 Thread Benjeman Meekhof
Hi Reed, Someone in our group originally wrote the plugin and put in the PR. Since our commit the plugin was 'forward-ported' to master and made incompatible with Luminous, so we've been using our own version of the plugin while waiting for the necessary pieces to be back-ported to Luminous to use the

Re: [ceph-users] Cluster crash - FAILED assert(interval.last > last)

2018-01-11 Thread Nick Fisk
I take my hat off to you, well done for solving that!!! > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Zdenek Janda > Sent: 11 January 2018 13:01 > To: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Cluster crash - FAILED assert(int

[ceph-users] Unable to join additional mon servers (luminous)

2018-01-11 Thread Thomas Gebhardt
Hello, I'm running a ceph-12.2.2 cluster on debian/stretch with three mon servers, and am unsuccessfully trying to add another (or two additional) mon servers. While the new mon server stays in the "synchronizing" state, the old mon servers drop out of quorum, endlessly changing state from "peon" to "electing
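Two hedged diagnostics that usually help narrow down where the join gets stuck (the hostname-based mon id assumes the default naming):
  ceph quorum_status --format json-pretty       # run against a mon that is still in quorum
  ceph daemon mon.$(hostname -s) mon_status     # on the joining mon; its state should move past "synchronizing"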

Re: [ceph-users] Performance issues on Luminous

2018-01-11 Thread Rafał Wądołowski
These drives are running as OSDs, not as journals. What I can't understand is why the performance of rados bench with 1 thread is 3 times slower. Ceph osd bench shows good results. In my opinion it could be 20% less speed because of software overhead. I read the blog post (http://c
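For comparison, the two benchmarks measure different paths, which is why a 3x gap is not necessarily a bug: ceph osd bench writes locally on one OSD, while a single-threaded rados bench pays the full client-to-primary-to-replica round trip for every object. A hedged sketch (pool name assumed):
  ceph tell osd.0 bench               # local object-store write bench, no network or replication
  rados bench -p rbd 30 write -t 1    # one in-flight op: throughput is roughly object size divided by end-to-end latency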

Re: [ceph-users] cephfs degraded on ceph luminous 12.2.2

2018-01-11 Thread Alessandro De Salvo
Hi, it took quite some time to recover the pgs, and indeed the problem with the mds instances was due to the activating pgs. Once they were cleared, the fs went back to its original state. I had to restart some OSDs a few times though, in order to get all the pgs activated, and I didn't hit the limits

Re: [ceph-users] Cluster crash - FAILED assert(interval.last > last)

2018-01-11 Thread Zdenek Janda
Hi, we have restored a damaged OSD that would not start after the bug caused by this issue; detailed steps are at http://tracker.ceph.com/issues/21142#note-9 for reference. Should anybody hit this, it should fix it for you. Thanks Zdenek Janda On 11.1.2018 11:40, Zdenek Janda wrote: > Hi, > I have suc

Re: [ceph-users] replace failed disk in Luminous v12.2.2

2018-01-11 Thread Alfredo Deza
On Thu, Jan 11, 2018 at 4:30 AM, Dietmar Rieder wrote: > Hello, > > we have failed OSD disk in our Luminous v12.2.2 cluster that needs to > get replaced. > > The cluster was initially deployed using ceph-deploy on Luminous > v12.2.0. The OSDs were created using > > ceph-deploy osd create --bluesto

Re: [ceph-users] ceph-volume does not support upstart

2018-01-11 Thread Alfredo Deza
On Wed, Jan 10, 2018 at 8:38 PM, 赵赵贺东 wrote: > Hello, > I am sorry for the delay. > Thank you for your suggestion. > > Is it better to update the system or to keep using ceph-disk, in fact? > Thank you Alfredo Deza & Cary. Both are really OK options for you for now. Unless ceph-disk is causing you issues

Re: [ceph-users] replace failed disk in Luminous v12.2.2

2018-01-11 Thread Konstantin Shalygin
Now I wonder what the correct way is to replace a failed OSD block disk? The generic way for maintenance (e.g. disk replacement) is to rebalance by changing the osd weight: ceph osd crush reweight osdid 0, so the cluster migrates data "from this osd". When HEALTH_OK you can safely remove this OSD: ceph osd out osd_id syst
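In command form, the drain step described above looks roughly like this (OSD id 27 is a placeholder):
  ceph osd crush reweight osd.27 0    # backfill moves the PGs off while the OSD stays up
  ceph -s                             # repeat until backfill is done and health is OK
  ceph osd out 27
  systemctl stop ceph-osd@27          # then the disk can be pulled and replaced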

[ceph-users] Does anyone use rcceph script in CentOS/SUSE?

2018-01-11 Thread Nathan Cutler
To all who are running Ceph on CentOS or SUSE: do you use the "rcceph" script? The ceph RPMs ship it in /usr/sbin/rcceph (Why I ask: more-or-less the same functionality is provided by the ceph-osd.target and ceph-mon.target systemd units, and the script is no longer maintained, so we'd like to
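For reference, the systemd target equivalents mentioned above:
  systemctl start ceph-osd.target     # all OSDs on this host
  systemctl stop ceph-mon.target      # all mons on this host
  systemctl status ceph.target        # everything ceph on this host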

Re: [ceph-users] One object degraded cause all ceph requests hang - Jewel 10.2.6 (rbd + radosgw)

2018-01-11 Thread Vincent Godin
As no response was given, I will explain what I found; maybe it could help other people. A .dirXXX object is an index marker with a 0 data size. The metadata associated with this object (located in the levelDB of the OSDs currently holding this marker) is the index of the bucket corresponding to

[ceph-users] Ceph Future

2018-01-11 Thread Massimiliano Cuttini
Hi everybody, I'm always looking at Ceph for the future. But I do see several issues that are left unresolved and block near-future adoption. I would like to know if there are some answers already: 1) Separation between Client and Server distribution. At this time you always have to upd

[ceph-users] How to get the usage of an indexless-bucket

2018-01-11 Thread Vincent Godin
How to know the usage of an indexless bucket? We need to have this information for our billing process.

Re: [ceph-users] Cluster crash - FAILED assert(interval.last > last)

2018-01-11 Thread Zdenek Janda
Hi, I have succeeded in identifying faulty PG: -3450> 2018-01-11 11:32:20.015658 7f066e2a3e00 10 osd.15 15340 12.62d needs 13939-15333 -3449> 2018-01-11 11:32:20.019405 7f066e2a3e00 1 osd.15 15340 build_past_intervals_parallel over 13939-15333 -3448> 2018-01-11 11:32:20.019436 7f066e2a3e00 10

Re: [ceph-users] How to "reset" rgw?

2018-01-11 Thread Martin Emrich
Ok thanks, I'll try it out... Regards, Martin On 10.01.18 at 18:48, Casey Bodley wrote: On 01/10/2018 04:34 AM, Martin Emrich wrote: Hi! As I cannot find any solution for my broken rgw pools, the only way out is to give up and "reset". How do I throw away all rgw data from a ceph clust

Re: [ceph-users] Cluster crash - FAILED assert(interval.last > last)

2018-01-11 Thread Zdenek Janda
Hi, updated the issue at http://tracker.ceph.com/issues/21142#note-5 with the last lines of strace before the ABRT. The crash ends with: 0.002429 pread64(22, "\10\7\213,\0\0\6\1i\33\0\0c\341\353kW\rC\365\2310\34\307\212\270\215{\354:\0\0"..., 12288, 908492996608) = 12288 0.007869 pread64(22,

Re: [ceph-users] Cluster crash - FAILED assert(interval.last > last)

2018-01-11 Thread Zdenek Janda
Hi, does anyone have a suggestion for what to do with this? I have identified the underlying crashing code in src/osd/osd_types.cc [assert(interval.last > last);], committed by Sage Weil, however I didn't figure out the exact mechanism of the function and why it crashes. Also unclear is the mechanism of how this bug spread and cr

Re: [ceph-users] Cluster crash - FAILED assert(interval.last > last)

2018-01-11 Thread Josef Zelenka
I have posted logs/strace from our osds with details to a ticket in the ceph bug tracker - see here http://tracker.ceph.com/issues/21142. You can see where exactly the OSDs crash etc, this can be of help if someone decides to debug it. JZ On 10/01/18 22:05, Josef Zelenka wrote: Hi, today w

[ceph-users] replace failed disk in Luminous v12.2.2

2018-01-11 Thread Dietmar Rieder
Hello, we have a failed OSD disk in our Luminous v12.2.2 cluster that needs to be replaced. The cluster was initially deployed using ceph-deploy on Luminous v12.2.0. The OSDs were created using ceph-deploy osd create --bluestore cephosd-${osd}:/dev/sd${disk} --block-wal /dev/nvme0n1 --block-db /d
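For what it's worth, recreating such an OSD with ceph-volume would look roughly like this; the data device and the NVMe partitions are placeholders, and as far as I recall ceph-volume wants a partition or LV for --block.db/--block.wal rather than the whole NVMe device:
  ceph-volume lvm zap /dev/sdX
  ceph-volume lvm create --bluestore --data /dev/sdX \
      --block.wal /dev/nvme0n1p1 --block.db /dev/nvme0n1p2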