Re: [ceph-users] ceph-deploy won't install luminous

2017-11-15 Thread Hans van den Bogert
verify that you did that part? > On Nov 15, 2017, at 10:41 AM, Hans van den Bogert > wrote: > > Hi, > > Can you show the contents of the file, /etc/yum.repos.d/ceph.repo ? > > Regards, > > Hans >> On Nov 15, 2017, at 10:27 AM, Ragan, Tj (Dr.) >> wr

[ceph-users] osd/bluestore: Get block.db usage

2017-12-04 Thread Hans van den Bogert
Hi all, Is there a way to get the current usage of the bluestore's block.db? I'd really like to monitor this as we have a relatively high number of objects per OSD. A second question related to the above, are there mechanisms to influence which objects' metadata gets spilled once the block.db is
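
For the archive: one way to check this per OSD, on the OSD host, is the bluefs section of the perf counters (osd id illustrative). db_used_bytes versus db_total_bytes shows how full block.db is, and a non-zero slow_used_bytes indicates metadata has already spilled onto the slow device:

  # ceph daemon osd.0 perf dump bluefs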

[ceph-users] mgr dashboard and cull Removing data for x

2017-12-11 Thread Dan Van Der Ster
Hi all, I'm playing with the dashboard module in 12.2.2 (and it's very cool!) but I noticed that some OSDs do not have metadata, e.g. this page: http://xxx:7000/osd/perf/74 Has empty metadata. I *am* able to see all the info with `ceph osd metadata 74`. I noticed in the mgr log we have: 2017-

Re: [ceph-users] The way to minimize osd memory usage?

2017-12-11 Thread Hans van den Bogert
There’s probably multiple reasons. However I just wanted to chime in that I set my cache size to 1G and I constantly see OSD memory converge to ~2.5GB. In [1] you can see the difference between a node with 4 OSDs, v12.2.2, on the left; and a node with 4 OSDs v12.2.1 on the right. I really hoped

[ceph-users] ceph-volume lvm activate could not find osd..0

2017-12-12 Thread Dan van der Ster
Hi all, Did anyone successfully prepare a new OSD with ceph-volume in 12.2.2? We are trying the simplest thing possible and not succeeding :( # ceph-volume lvm prepare --bluestore --data /dev/sdb # ceph-volume lvm list == osd.0 === [block] /dev/ceph-4da6fd06-b069-49af-901f-c9513b

Re: [ceph-users] ceph-volume lvm activate could not find osd..0

2017-12-12 Thread Dan van der Ster
Doh! The activate command needs the *osd* fsid, not the cluster fsid. So this works: ceph-volume lvm activate 0 6608c0cf-3827-4967-94fd-5a3336f604c3 Is an "activate-all" equivalent planned? -- Dan On Tue, Dec 12, 2017 at 11:35 AM, Dan van der Ster wrote: > Hi all, &
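
For reference, a minimal sketch of the flow discussed here (device and UUIDs are illustrative): prepare the OSD, read its osd fsid from the listing, then activate with the osd id and the *osd* fsid. Newer ceph-volume releases also ship an --all form of activate that scans and activates every prepared OSD on the host:

  # ceph-volume lvm prepare --bluestore --data /dev/sdb
  # ceph-volume lvm list                 # note the "osd id" and "osd fsid" fields
  # ceph-volume lvm activate 0 6608c0cf-3827-4967-94fd-5a3336f604c3
  # ceph-volume lvm activate --all       # later releases: activate everything prepared on this host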

[ceph-users] Any RGW admin frontends?

2017-12-15 Thread Dan van der Ster
Hi all, As we are starting to ramp up our internal rgw service, I am wondering if someone already developed some "open source" high-level admin tools for rgw. On the one hand, we're looking for a web UI for users to create and see their credentials, quota, usage, and maybe a web bucket browser. Th

Re: [ceph-users] MDS behind on trimming

2017-12-21 Thread Dan van der Ster
oint? I'm thinking about the following > changes: > > mds log max segments = 200 > mds log max expiring = 200 > > Thanks, > > Stefan > > [1]: https://www.spinics.net/lists/ceph-users/msg39387.html > [2]: > http://lists.ceph.com/piperm

[ceph-users] ceph-volume lvm deactivate/destroy/zap

2017-12-21 Thread Dan van der Ster
Hi, For someone who is not an lvm expert, does anyone have a recipe for destroying a ceph-volume lvm osd? (I have a failed disk which I want to deactivate / wipe before physically removing from the host, and the tooling for this doesn't exist yet http://tracker.ceph.com/issues/22287) > ceph-volu
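
A rough sketch of the manual teardown being asked for, assuming osd.0 on /dev/sdb (ids and device purely illustrative). ceph osd purge exists since luminous, and newer ceph-volume releases add a --destroy option to zap that also removes the LVM volumes; older releases need the lvremove/vgremove done by hand:

  # systemctl stop ceph-osd@0
  # ceph osd out 0
  # ceph osd purge 0 --yes-i-really-mean-it    # removes the OSD from the crush map, osdmap and auth
  # ceph-volume lvm zap /dev/sdb               # wipe the device (add --destroy on newer releases)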

Re: [ceph-users] ceph-volume lvm deactivate/destroy/zap

2017-12-21 Thread Dan van der Ster
On Thu, Dec 21, 2017 at 3:59 PM, Stefan Kooman wrote: > Quoting Dan van der Ster (d...@vanderster.com): >> Hi, >> >> For someone who is not an lvm expert, does anyone have a recipe for >> destroying a ceph-volume lvm osd? >> (I have a failed disk which I

Re: [ceph-users] Open Compute (OCP) servers for Ceph

2017-12-22 Thread Dan van der Ster
Hi Wido, We have used a few racks of Wiwynn OCP servers in a Ceph cluster for a couple of years. The machines are dual Xeon [1] and use some of those 2U 30-disk "Knox" enclosures. Other than that, I have nothing particularly interesting to say about these. Our data centre procurement team have al

Re: [ceph-users] Increasing PG number

2018-01-02 Thread Hans van den Bogert
Please refer to standard documentation as much as possible, http://docs.ceph.com/docs/jewel/rados/operations/placement-groups/#set-the-number-of-placement-groups Han’s is also incomplet
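
For completeness, the commands behind that doc link are roughly (pool name and counts illustrative; pg_num can only be increased):

  # ceph osd pool set <poolname> pg_num 256
  # ceph osd pool set <poolname> pgp_num 256    # must be raised as well before data rebalances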

Re: [ceph-users] ceph-volume lvm deactivate/destroy/zap

2018-01-08 Thread Dan van der Ster
On Mon, Jan 8, 2018 at 4:37 PM, Alfredo Deza wrote: > On Thu, Dec 21, 2017 at 11:35 AM, Stefan Kooman wrote: >> Quoting Dan van der Ster (d...@vanderster.com): >>> Thanks Stefan. But isn't there also some vgremove or lvremove magic >>> that needs to bring down

Re: [ceph-users] Linux Meltdown (KPTI) fix and how it affects performance?

2018-01-11 Thread Dan van der Ster
Hi all, Is anyone getting useful results with your benchmarking? I've prepared two test machines/pools and don't see any definitive slowdown with patched kernels from CentOS [1]. I wonder if Ceph will be somewhat tolerant of these patches, similarly to what's described here: http://www.scylladb.c

Re: [ceph-users] Fwd: Ceph team involvement in Rook (Deploying Ceph in Kubernetes)

2018-01-21 Thread Hans van den Bogert
Should I summarize this is ceph-helm being being EOL? If I'm spinning up a toy cluster for a homelab, should I invest time in Rook, or stay with ceph-helm for now? On Fri, Jan 19, 2018 at 11:55 AM, Kai Wagner wrote: > Just for those of you who are not subscribed to ceph-users. > > > For

[ceph-users] Luminous: example of a single down osd taking out a cluster

2018-01-22 Thread Dan van der Ster
Hi all, We just saw an example of one single down OSD taking down a whole (small) luminous 12.2.2 cluster. The cluster has only 5 OSDs, on 5 different servers. Three of those servers also run a mon/mgr combo. First, we had one server (mon+osd) go down legitimately [1] -- I can tell when it went

Re: [ceph-users] Luminous: example of a single down osd taking out a cluster

2018-01-22 Thread Dan van der Ster
down at e601 Thanks for the help solving this puzzle, Dan On Mon, Jan 22, 2018 at 8:07 PM, Dan van der Ster wrote: > Hi all, > > We just saw an example of one single down OSD taking down a whole > (small) luminous 12.2.2 cluster. > > The cluster has only 5 OSDs, on 5 differ

[ceph-users] Redirect for restful API in manager

2018-02-05 Thread Hans van den Bogert
Hi all, In the release notes of 12.2.2 the following is stated: > Standby ceph-mgr daemons now redirect requests to the active messenger, easing configuration for tools & users accessing the web dashboard, restful API, or other ceph-mgr module services. However, it doesn't seem to be the cas

[ceph-users] Retrieving ceph health from restful manager plugin

2018-02-05 Thread Hans van den Bogert
Hi All, I might really be bad at searching, but I can't seem to find the ceph health status through the new(ish) restful api. Is that right? I know how I could retrieve it through a Python script, however I'm trying to keep our monitoring application as layer cake free as possible -- as such a res

Re: [ceph-users] Luminous 12.2.3 release date?

2018-02-12 Thread Hans van den Bogert
Hi Wido, Did you ever get an answer? I'm eager to know as well. Hans On Tue, Jan 30, 2018 at 10:35 AM, Wido den Hollander wrote: > Hi, > > Is there a ETA yet for 12.2.3? Looking at the tracker there aren't that many > outstanding issues: http://tracker.ceph.com/projects/ceph/roadmap > > On Git

Re: [ceph-users] balancer mgr module

2018-02-16 Thread Dan van der Ster
Hi Caspar, I've been trying the mgr balancer for a couple weeks now and can share some experience. Currently there are two modes implemented: upmap and crush-compat. Upmap requires all clients to be running luminous -- it uses this new pg-upmap mechanism to precisely move PGs one by one to a mor
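
A minimal sketch of driving the luminous mgr balancer as described above (upmap mode additionally requires that all clients are luminous-capable):

  # ceph osd set-require-min-compat-client luminous   # needed before upmap mode
  # ceph balancer mode upmap                          # or: crush-compat
  # ceph balancer eval                                # current distribution score, lower is better
  # ceph balancer on                                  # enable automatic optimization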

Re: [ceph-users] [Board] Ceph at OpenStack Barcelona

2016-09-01 Thread Dan Van Der Ster
Hi Patrick, > On 01 Sep 2016, at 16:29, Patrick McGarry wrote: > > Hey cephers, > > Now that our APAC roadshow has concluded I’m starting to look forward > to upcoming events like OpenStack Barcelona. There were a ton of talks > submitted this time around, so many of you did not get your talk >

[ceph-users] Cleanup old osdmaps after #13990 fix applied

2016-09-14 Thread Dan Van Der Ster
Hi, We've just upgraded to 0.94.9, so I believe this issue is fixed: http://tracker.ceph.com/issues/13990 AFAICT "resolved" means the number of osdmaps saved on each OSD will not grow unboundedly anymore. However, we have many OSDs with loads of old osdmaps, e.g.: # pwd /var/lib/ceph/osd

Re: [ceph-users] Cleanup old osdmaps after #13990 fix applied

2016-09-14 Thread Dan Van Der Ster
lete it, together with any > attachments, and be advised that any dissemination or copying of this message > is prohibited. > ________ > From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Dan Van Der > Ster [daniel.vanders...@cern.ch

Re: [ceph-users] Cleanup old osdmaps after #13990 fix applied

2016-09-14 Thread Dan Van Der Ster
; Office: 801.871.2799 | > If you are not the intended recipient of this message or received it > erroneously, please notify the sender and delete it, together with any > attachments, and be advised that any dissemination or copying of this message > is prohibited. > _

Re: [ceph-users] Cleanup old osdmaps after #13990 fix applied

2016-09-15 Thread Dan Van Der Ster
> On 14 Sep 2016, at 23:07, Gregory Farnum wrote: > > On Wed, Sep 14, 2016 at 7:19 AM, Dan Van Der Ster > wrote: >> Indeed, seems to be trimmed by osd_target_transaction_size (default 30) per >> new osdmap. >> Thanks a lot for your help! > > IIRC we had an

Re: [ceph-users] RBD Snapshots and osd_snap_trim_sleep

2016-09-19 Thread Dan van der Ster
Hi Nick, I assume you had osd_snap_trim_sleep > 0 when that snapshot was being deleted? I ask because we haven't seen this problem, but use osd_snap_trim_sleep = 0.1 -- Dan On Mon, Sep 19, 2016 at 11:20 AM, Nick Fisk wrote: > Hi, > > Does the osd_snap_trim_sleep throttle effect the deletion of
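
For reference, the throttle being discussed is set like this (value illustrative; depending on the release an OSD restart may be needed for the injected value to take effect):

  [osd]
  osd snap trim sleep = 0.1

  # ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.1'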

Re: [ceph-users] ceph reweight-by-utilization and increasing

2016-09-20 Thread Dan van der Ster
Hi Stefan, What's the current reweight value for osd.110? It cannot be increased above 1. Cheers, Dan On Tue, Sep 20, 2016 at 12:13 PM, Stefan Priebe - Profihost AG wrote: > Hi, > > while using ceph hammer i saw in the doc of ceph reweight-by-utilization > that there is a --no-increasing flag
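
For context, the override weight mentioned here is set with (osd id illustrative); it is a value in [0,1], so 1.0 is already the maximum:

  # ceph osd reweight 110 1.0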

Re: [ceph-users] Same pg scrubbed over and over (Jewel)

2016-09-21 Thread Dan van der Ster
There was a thread about this a few days ago: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/012857.html And the OP found a workaround. Looks like a bug though... (by default PGs scrub at most once per day). -- dan On Tue, Sep 20, 2016 at 10:43 PM, Martin Bureau wrote: > He

Re: [ceph-users] rbd pool:replica size choose: 2 vs 3

2016-09-23 Thread Dan van der Ster
On Fri, Sep 23, 2016 at 9:29 AM, Wido den Hollander wrote: > > > > Op 23 september 2016 om 9:11 schreef Tomasz Kuzemko > > : > > > > > > Hi, > > > > biggest issue with replica size 2 is that if you find an inconsistent > > object you will not be able to tell which copy is the correct one. With >

[ceph-users] Transitioning existing native CephFS cluster to OpenStack Manila

2016-09-29 Thread Dan van der Ster
hare namespaces would break the existing native usage of the HPC machines. We'd prefer (2), but just wanted to check if there is something we missed. Best Regards, Dan van der Ster CERN IT ___ ceph-users mailing list ceph-users@lists.ceph.com h

Re: [ceph-users] unfound objects blocking cluster, need help!

2016-10-02 Thread Dan van der Ster
Hi, Do you understand why removing that osd led to unfound objects? Do you have the ceph.log from yesterday? Cheers, Dan On 2 Oct 2016 10:18, "Tomasz Kuzemko" wrote: > > Forgot to mention Ceph version - 0.94.5. > > I managed to fix this. By chance I found that when an OSD for a blocked PG is st

Re: [ceph-users] Feedback wanted: health warning when standby MDS dies?

2016-10-18 Thread Dan van der Ster
+1 I would find this warning useful. On Tue, Oct 18, 2016 at 1:46 PM, John Spray wrote: > Hi all, > > Someone asked me today how to get a list of down MDS daemons, and I > explained that currently the MDS simply forgets about any standby that > stops sending beacons. That got me thinking about

Re: [ceph-users] HELP ! Cluster unusable with lots of "hit suicide timeout"

2016-10-19 Thread Dan van der Ster
Hi Yoann, On Wed, Oct 19, 2016 at 9:44 AM, Yoann Moulin wrote: > Dear List, > > We have a cluster in Jewel 10.2.2 under ubuntu 16.04. The cluster is compose > by 12 nodes, each nodes have 10 OSD with journal on disk. > > We have one rbd partition and a radosGW with 2 data pool, one replicated,

Re: [ceph-users] HELP ! Cluster unusable with lots of "hit suicide timeout"

2016-10-19 Thread Dan van der Ster
On Wed, Oct 19, 2016 at 3:22 PM, Yoann Moulin wrote: > Hello, > >>> We have a cluster in Jewel 10.2.2 under ubuntu 16.04. The cluster is >>> compose by 12 nodes, each nodes have 10 OSD with journal on disk. >>> >>> We have one rbd partition and a radosGW with 2 data pool, one replicated, >>> one

Re: [ceph-users] All PGs are active+clean, still remapped PGs

2016-10-24 Thread Dan van der Ster
Hi Wido, This seems similar to what our dumpling tunables cluster does when a few particular osds go down... Though in our case the remapped pgs are correctly shown as remapped, not clean. The fix in our case will be to enable the vary_r tunable (which will move some data). Cheers, Dan On 24 Oc

Re: [ceph-users] All PGs are active+clean, still remapped PGs

2016-10-26 Thread Dan van der Ster
On Tue, Oct 25, 2016 at 7:06 AM, Wido den Hollander wrote: > >> Op 24 oktober 2016 om 22:29 schreef Dan van der Ster : >> >> >> Hi Wido, >> >> This seems similar to what our dumpling tunables cluster does when a few >> particular osds go do

[ceph-users] ceph df show 8E pool

2016-10-27 Thread Dan van der Ster
Hi all, One of our 10.2.3 clusters has a pool with bogus statistics. The pool is empty, but it shows 8E of data used and 2^63-1 objects. POOLS: NAME ID USED %USED MAX AVAIL OBJECTS test 19 8E 0 1362T 9223372036854775807 Strangely,

Re: [ceph-users] ceph df show 8E pool

2016-10-28 Thread Dan van der Ster
with 'ceph df' ? > Cheers > G. > > > From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Dan van der Ster [d...@vanderster.com] > Sent: 28 October 2016 03:01 > To: ceph-users > Subject: [ceph-users] ceph df show 8E pool > > Hi all,

Re: [ceph-users] ceph df show 8E pool

2016-10-28 Thread Dan van der Ster
this state. Cheers, Dan On 28 Oct 2016 10:08, "Dan van der Ster" wrote: > > Hi Goncalo, > > Strange, now ceph df says the pool has 0 bytes used and -1 objects. rados df agrees with those numbers. > > Cheers, Dan > > On 28 Oct 2016 00:47, "Goncalo Borges"

Re: [ceph-users] Monitors stores not trimming after upgrade from Dumpling to Hammer

2016-11-03 Thread Dan van der Ster
Hi Wido, AFAIK mon's won't trim while a cluster is in HEALTH_WARN. Unset noscrub,nodeep-scrub, get that 3rd mon up, then it should trim. -- Dan On Thu, Nov 3, 2016 at 10:40 AM, Wido den Hollander wrote: > Hi, > > After finally resolving the remapped PGs [0] I'm running into a a problem > wher
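
A sketch of the suggested sequence (flag names as in the thread):

  # ceph osd unset noscrub
  # ceph osd unset nodeep-scrub
  # ceph -s     # once the cluster is back to HEALTH_OK the mon stores should begin to trim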

Re: [ceph-users] Monitors stores not trimming after upgrade from Dumpling to Hammer

2016-11-03 Thread Dan van der Ster
On Thu, Nov 3, 2016 at 11:57 AM, Wido den Hollander wrote: > >> Op 3 november 2016 om 10:46 schreef Wido den Hollander : >> >> >> >> > Op 3 november 2016 om 10:42 schreef Dan van der Ster : >> > >> > >> > Hi Wido, >> &

Re: [ceph-users] Scrubbing not using Idle thread?

2016-11-08 Thread Dan van der Ster
Hi Nick, That's expected since jewel, which moved the scrub IOs out of the disk thread and into the ?op? thread. They can now be prioritized using osd_scrub_priority, and you can experiment with osd_op_queue = prio/wpq to see if scrubs can be made more transparent with the latter, newer, queuing i
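
For reference, the knobs mentioned above live in ceph.conf (values illustrative, defaults from the jewel/luminous era; switching the op queue needs an OSD restart):

  [osd]
  osd scrub priority = 1     # default 5, versus osd client op priority (default 63)
  osd op queue = wpq         # or "prio"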

Re: [ceph-users] The largest cluster for now?

2016-11-10 Thread Dan Van Der Ster
Hi, > On 10 Nov 2016, at 12:17, han vincent wrote: > > Hello, all: >Recently, I have a plan to build a large-scale ceph cluster in > production for Openstack. I want to build the cluster as larger as > possible. >In the following maillist, Karol has asked a question about > "largest cep

[ceph-users] Intermittent permission denied using kernel client with mds path cap

2016-11-10 Thread Dan van der Ster
Hi all, Hi Zheng, We're seeing a strange issue with the kernel cephfs clients, combined with a path restricted mds cap. It seems that files/dirs are intermittently not created due to permission denied. For example, when I untar a kernel into cephfs, we see ~1/1000 files failed to open/mkdir. Clie
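
For reference, a path-restricted cap of the kind being exercised here looks roughly like this (client name, path and pool are illustrative):

  # ceph auth get-or-create client.foo \
      mds 'allow rw path=/foo' \
      mon 'allow r' \
      osd 'allow rw pool=cephfs_data'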

Re: [ceph-users] Intermittent permission denied using kernel client with mds path cap

2016-11-11 Thread Dan van der Ster
ssues in the kernel client. See the > discussion here. > > http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2016-June/010656.html > > http://tracker.ceph.com/issues/16358 > > Cheers > Goncalo > > ________ > From:

Re: [ceph-users] segfault in ceph-fuse when quota is enabled

2016-12-06 Thread Dan van der Ster
Hi Goncalo, That bug is fixed in 10.2.4. See http://tracker.ceph.com/issues/16066 -- Dan On Tue, Dec 6, 2016 at 5:11 AM, Goncalo Borges wrote: > Hi John, Greg, Zheng > > And now a much more relevant problem. Once again, my environment: > > - ceph/cephfs in 10.2.2 but patched for > o client:

Re: [ceph-users] stalls caused by scrub on jewel

2016-12-06 Thread Dan van der Ster
Hi Sage, Could you please clarify: do we need to set nodeep-scrub also, or does this somehow only affect the (shallow) scrub? (Note that deep scrubs will start when the deep_scrub_interval has passed, even with noscrub set). Cheers, Dan On Tue, Nov 15, 2016 at 11:35 PM, Sage Weil wrote: > Hi

Re: [ceph-users] 2x replication: A BIG warning

2016-12-07 Thread Dan van der Ster
Hi Wido, Thanks for the warning. We have one pool as you described (size 2, min_size 1), simply because 3 replicas would be too expensive and erasure coding didn't meet our performance requirements. We are well aware of the risks, but of course this is a balancing act between risk and cost. Anywa

Re: [ceph-users] problem after reinstalling system

2016-12-09 Thread Dan van der Ster
On Thu, Dec 8, 2016 at 5:51 PM, Jake Young wrote: > Hey Dan, > > I had the same issue that Jacek had after changing my OS and Ceph version > from Ubuntu 14 - Hammer to Centos 7 - Jewel. I was also able to recover from > the failure by renaming the .ldb files to .sst files. > > Do you know why thi

Re: [ceph-users] filestore_split_multiple hardcoded maximum?

2016-12-09 Thread Dan van der Ster
Coincidentally, we've been suffering from split-induced slow requests on one of our clusters for the past week. I wanted to add that it isn't at all obvious when slow requests are being caused by filestore splitting. (When you increase the filestore/osd logs to 10, probably also 20, all you see is

Re: [ceph-users] Luminous 12.2.6 release date?

2018-07-11 Thread Dan van der Ster
And voila, I see the 12.2.6 rpms were released overnight. Waiting here for an announcement before upgrading. -- dan On Tue, Jul 10, 2018 at 10:08 AM Sean Purdy wrote: > > While we're at it, is there a release date for 12.2.6? It fixes a > reshard/versioning bug for us. > > Sean >

Re: [ceph-users] unfound blocks IO or gives IO error?

2018-07-12 Thread Dan van der Ster
On Wed, Jul 11, 2018 at 11:40 PM Gregory Farnum wrote: > > On Mon, Jun 25, 2018 at 12:34 AM Dan van der Ster wrote: >> >> On Fri, Jun 22, 2018 at 10:44 PM Gregory Farnum wrote: >> > >> > On Fri, Jun 22, 2018 at 6:22 AM Sergey Malinin wrote: >> >

Re: [ceph-users] MDS damaged

2018-07-12 Thread Dan van der Ster
On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum wrote: > > On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo > wrote: >> >> OK, I found where the object is: >> >> >> ceph osd map cephfs_metadata 200. >> osdmap e632418 pool 'cephfs_metadata' (10) object '200.' -> pg >> 10.844f34

[ceph-users] upgrading to 12.2.6 damages cephfs (crc errors)

2018-07-13 Thread Dan van der Ster
Hi, Following the reports on ceph-users about damaged cephfs after updating to 12.2.6 I spun up a 1 node cluster to try the upgrade. I started with two OSDs on 12.2.5, wrote some data. Then I restarted the OSDs one by one while continuing to write to the cephfs mountpoint. Then I restarted the (si

Re: [ceph-users] MDS damaged

2018-07-13 Thread Dan van der Ster
n scrubbing, that magically disappeared after restarting the OSD. > > > > However, in my case it was clearly related to > > https://tracker.ceph.com/issues/22464 which doesn't > > seem to be the issue here. > > > > Paul > > > > 20

Re: [ceph-users] mds daemon damaged

2018-07-13 Thread Dan van der Ster
Hi Kevin, Are your OSDs bluestore or filestore? -- dan On Thu, Jul 12, 2018 at 11:30 PM Kevin wrote: > > Sorry for the long posting but trying to cover everything > > I woke up to find my cephfs filesystem down. This was in the logs > > 2018-07-11 05:54:10.398171 osd.1 [ERR] 2.4 full-object rea

Re: [ceph-users] upgrading to 12.2.6 damages cephfs (crc errors)

2018-07-13 Thread Dan van der Ster
The problem seems similar to https://tracker.ceph.com/issues/23871 which was fixed in mimic but not luminous: fe5038c7f9 osd/PrimaryLogPG: clear data digest on WRITEFULL if skip_data_digest .. dan On Fri, Jul 13, 2018 at 12:45 PM Dan van der Ster wrote: > > Hi, > > Following the rep

Re: [ceph-users] MDS damaged

2018-07-13 Thread Dan van der Ster
On Fri, Jul 13, 2018 at 4:07 PM Alessandro De Salvo wrote: > However, I cannot reduce the number of mdses anymore, I was used to do > that with e.g.: > > ceph fs set cephfs max_mds 1 > > Trying this with 12.2.6 has apparently no effect, I am left with 2 > active mdses. Is this another bug? Are yo
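
For the archive: in luminous, lowering max_mds does not by itself stop the extra ranks; the usual sequence is roughly (fs name illustrative):

  # ceph fs set cephfs max_mds 1
  # ceph mds deactivate cephfs:1   # explicitly stop rank 1; repeat for any higher ranks
  # ceph status                    # wait for the rank to finish stopping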

[ceph-users] luminous librbd::image::OpenRequest: failed to retreive immutable metadata

2018-07-17 Thread Dan van der Ster
Hi, This mail is for the search engines. An old "Won't Fix" ticket is still quite relevant: http://tracker.ceph.com/issues/16211 When you upgrade an old rbd cluster to luminous, there is a good chance you will have several rbd images with unreadable header objects. E.g. # rbd info -p volumes vol

Re: [ceph-users] Exact scope of OSD heartbeating?

2018-07-18 Thread Dan van der Ster
On Wed, Jul 18, 2018 at 3:20 AM Anthony D'Atri wrote: > > The documentation here: > > http://docs.ceph.com/docs/master/rados/configuration/mon-osd-interaction/ > > says > > "Each Ceph OSD Daemon checks the heartbeat of other Ceph OSD Daemons every 6 > seconds" > > and > > " If a neighboring Ceph

Re: [ceph-users] active+clean+inconsistent PGs after upgrade to 12.2.7

2018-07-20 Thread Dan van der Ster
On Thu, Jul 19, 2018 at 11:51 AM Robert Sander wrote: > > On 19.07.2018 11:15, Ronny Aasen wrote: > > > Did you upgrade from 12.2.5 or 12.2.6 ? > > Yes. > > > sounds like you hit the reason for the 12.2.7 release > > > > read : https://ceph.com/releases/12-2-7-luminous-released/ > > > > there shou

Re: [ceph-users] 12.2.6 upgrade

2018-07-20 Thread Dan van der Ster
CRC errors are expected in 12.2.7 if you ran 12.2.6 with bluestore. See https://ceph.com/releases/12-2-7-luminous-released/#upgrading-from-v12-2-6 On Fri, Jul 20, 2018 at 8:30 AM Glen Baars wrote: > > Hello Ceph Users, > > > > We have upgraded all nodes to 12.2.7 now. We have 90PGs ( ~2000 scrub

Re: [ceph-users] 12.2.6 upgrade

2018-07-20 Thread Dan van der Ster
the active+clean+inconsistent PGs will be OK? > > Is the data still getting replicated even if inconsistent? > > Kind regards, > Glen Baars > > -----Original Message- > From: Dan van der Ster > Sent: Friday, 20 July 2018 3:57 PM > To: Glen Baars > Cc: ceph-users

Re: [ceph-users] 12.2.7 + osd skip data digest + bluestore + I/O errors

2018-07-24 Thread Dan van der Ster
`ceph versions` -- you're sure all the osds are running 12.2.7 ? osd_skip_data_digest = true is supposed to skip any crc checks during reads. But maybe the cache tiering IO path is different and checks the crc anyway? -- dan On Tue, Jul 24, 2018 at 3:01 PM SCHAER Frederic wrote: > > Hi, > > >
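
For reference, two quick checks for the points raised above, run against a live OSD on its host (osd id illustrative):

  # ceph versions                                        # confirm every OSD really reports 12.2.7
  # ceph daemon osd.0 config get osd_skip_data_digest    # what the running OSD actually has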

Re: [ceph-users] download.ceph.com repository changes

2018-07-24 Thread Dan van der Ster
On Tue, Jul 24, 2018 at 4:38 PM Alfredo Deza wrote: > > Hi all, > > After the 12.2.6 release went out, we've been thinking on better ways > to remove a version from our repositories to prevent users from > upgrading/installing a known bad release. > > The way our repos are structured today means e

Re: [ceph-users] download.ceph.com repository changes

2018-07-24 Thread Dan van der Ster
On Tue, Jul 24, 2018 at 4:59 PM Alfredo Deza wrote: > > On Tue, Jul 24, 2018 at 10:54 AM, Dan van der Ster > wrote: > > On Tue, Jul 24, 2018 at 4:38 PM Alfredo Deza wrote: > >> > >> Hi all, > >> > >> After the 12.2.6 release went out, we'

Re: [ceph-users] download.ceph.com repository changes

2018-07-24 Thread Dan van der Ster
On Tue, Jul 24, 2018 at 5:08 PM Dan van der Ster wrote: > > On Tue, Jul 24, 2018 at 4:59 PM Alfredo Deza wrote: > > > > On Tue, Jul 24, 2018 at 10:54 AM, Dan van der Ster > > wrote: > > > On Tue, Jul 24, 2018 at 4:38 PM Alfredo Deza wrote: > > >>

Re: [ceph-users] Slack-IRC integration

2018-07-28 Thread Dan van der Ster
It's here https://ceph-storage.slack.com/ but for some reason the list of accepted email domains is limited. I have no idea who is maintaining this. Anyway, the slack is just mirroring #ceph and #ceph-devel on IRC so better to connect there directly. Cheers, Dan On Sat, Jul 28, 2018, 6:59 PM

Re: [ceph-users] mgr abort during upgrade 12.2.5 -> 12.2.7 due to multiple active RGW clones

2018-08-01 Thread Dan van der Ster
Sounds like https://tracker.ceph.com/issues/24982 On Wed, Aug 1, 2018 at 10:18 AM Burkhard Linke wrote: > > Hi, > > > I'm currently upgrading our ceph cluster to 12.2.7. Most steps are fine, > but all mgr instances abort after restarting: > > > > > -10> 2018-08-01 09:57:46.357696 7fc48122

[ceph-users] safe to remove leftover bucket index objects

2018-08-01 Thread Dan van der Ster
Dear rgw friends, Somehow we have more than 20 million objects in our default.rgw.buckets.index pool. They are probably leftover from this issue we had last year: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-June/018565.html and we want to clean the leftover / unused index objects To
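
A rough sketch of the cross-check described here (pool name as in the thread; verify carefully before deleting anything). Index objects are named ".dir.<bucket_instance_id>" (plus a shard suffix when resharded), so any object whose instance id no longer appears in the bucket.instance metadata is a candidate for cleanup:

  # rados -p default.rgw.buckets.index ls > index_objects
  # radosgw-admin metadata list bucket.instance > bucket_instances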

Re: [ceph-users] cephfs client version in RedHat/CentOS 7.5

2018-08-20 Thread Dan van der Ster
On Mon, Aug 20, 2018 at 5:37 PM Ilya Dryomov wrote: > > On Mon, Aug 20, 2018 at 4:52 PM Dietmar Rieder > wrote: > > > > Hi Cephers, > > > > > > I wonder if the cephfs client in RedHat/CentOS 7.5 will be updated to > > luminous? > > As far as I see there is some luminous related stuff that was > >

Re: [ceph-users] ceph balancer: further optimizations?

2018-08-20 Thread Dan van der Ster
On Mon, Aug 20, 2018 at 10:19 PM Stefan Priebe - Profihost AG wrote: > > > Am 20.08.2018 um 21:52 schrieb Sage Weil: > > On Mon, 20 Aug 2018, Stefan Priebe - Profihost AG wrote: > >> Hello, > >> > >> since loic seems to have left ceph development and his wonderful crush > >> optimization tool isn'

Re: [ceph-users] ceph balancer: further optimizations?

2018-08-21 Thread Dan van der Ster
On Mon, Aug 20, 2018 at 10:45 PM Stefan Priebe - Profihost AG wrote: > > > Am 20.08.2018 um 22:38 schrieb Dan van der Ster: > > On Mon, Aug 20, 2018 at 10:19 PM Stefan Priebe - Profihost AG > > wrote: > >> > >> > >> Am 20.08.2018 um 21:52 schr

Re: [ceph-users] ceph balancer: further optimizations?

2018-08-21 Thread Dan van der Ster
On Tue, Aug 21, 2018 at 11:54 AM Stefan Priebe - Profihost AG wrote: > > Am 21.08.2018 um 11:47 schrieb Dan van der Ster: > > On Mon, Aug 20, 2018 at 10:45 PM Stefan Priebe - Profihost AG > > wrote: > >> > >> > >> Am 20.08.2018 um 22:38 schrieb Dan v

[ceph-users] rocksdb mon stores growing until restart

2018-08-30 Thread Dan van der Ster
Hi, Is anyone else seeing rocksdb mon stores slowly growing to >15GB, eventually triggering the 'mon is using a lot of disk space' warning? Since upgrading to luminous, we've seen this happen at least twice. Each time, we restart all the mons and then stores slowly trim down to <500MB. We have 'm
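
For the archive, two things worth looking at in this situation (mon id illustrative); compaction also happens at startup when mon_compact_on_start is set:

  # du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db
  # ceph tell mon.$(hostname -s) compact      # trigger a manual rocksdb compaction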

Re: [ceph-users] safe to remove leftover bucket index objects

2018-08-30 Thread Dan van der Ster
Replying to self... On Wed, Aug 1, 2018 at 11:56 AM Dan van der Ster wrote: > > Dear rgw friends, > > Somehow we have more than 20 million objects in our > default.rgw.buckets.index pool. > They are probably leftover from this issue we had last year: > http://lists.ceph.com

Re: [ceph-users] safe to remove leftover bucket index objects

2018-08-31 Thread Dan van der Ster
ppening both in the research > list and all bucket reshard statuses. > > Does anyone know how to parse the names of these objects and how to tell what > can be deleted? This is of particular interest as I have another costed with > 1M injects in the index pool. > > On Thu, Aug 3

Re: [ceph-users] No announce for 12.2.8 / available in repositories

2018-09-03 Thread Dan van der Ster
I don't think those issues are known... Could you elaborate on your librbd issues with v12.2.8 ? -- dan On Tue, Sep 4, 2018 at 7:30 AM Linh Vu wrote: > > Version 12.2.8 seems broken. Someone earlier on the ML had a MDS issue. We > accidentally upgraded an openstack compute node from 12.2.7 to 1

Re: [ceph-users] v12.2.8 Luminous released

2018-09-05 Thread Dan van der Ster
ll-deps.sh fails on newest openSUSE Leap (issue#25064, > pr#23179, Kyr Shatskyy) > * build/ops: Mimic build fails with -DWITH_RADOSGW=0 (issue#24437, pr#22864, > Dan Mick) > * build/ops: order rbdmap.service before remote-fs-pre.target (issue#24713, > pr#22844, Ilya Dryomov) >

Re: [ceph-users] v12.2.8 Luminous released

2018-09-11 Thread Dan van der Ster
=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect protocol version mismatch, my 31 != 30 Cheers, Dan [1] http://docs.ceph.com/docs/luminous/cephfs/upgrading/ On Wed, Sep 5, 2018 at 4:20 PM Dan van der Ster wrote: > > Thanks for the release! > > We've

Re: [ceph-users] mds_cache_memory_limit

2018-09-11 Thread Dan van der Ster
We set it to 50% -- there seems to be some mystery inflation and possibly a small leak (in luminous, at least). -- dan On Tue, Sep 11, 2018 at 4:04 PM marc-antoine desrochers wrote: > > Hi, > > > > Is there any recommendation for the mds_cache_memory_limit ? Like a % of the > total ram or somet
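
For reference, the limit is expressed in bytes in ceph.conf; roughly 50% of a 32 GiB MDS host would be (value illustrative):

  [mds]
  mds cache memory limit = 17179869184   # 16 GiB; leave headroom for the overshoot described above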

[ceph-users] dm-writecache

2018-09-14 Thread Dan van der Ster
Hi, Has anyone tried the new dm-writecache target that landed in 4.18 [1]? Might be super useful in the osd context... Cheers, Dan [1] https://www.phoronix.com/scan.php?page=news_item&px=Linux-4.18-DM-Writecache ___ ceph-users mailing list ceph-users@l

[ceph-users] CRUSH puzzle: step weighted-take

2018-09-27 Thread Dan van der Ster
Dear Ceph friends, I have a CRUSH data migration puzzle and wondered if someone could think of a clever solution. Consider an osd tree like this: -2 4428.02979 room 0513-R-0050 -72 911.81897 rack RA01 -4 917.27899 rack RA05 -6 917.25500

Re: [ceph-users] CRUSH puzzle: step weighted-take

2018-09-28 Thread Dan van der Ster
how we backfill if we go that route. However I would prefer to avoid one big massive change that takes a long time to complete. - dan > > > On Thu, Sep 27, 2018 at 4:19 PM Dan van der Ster wrote: > > > > Dear Ceph friends, > > > > I have a CRUSH data migration puz

Re: [ceph-users] CRUSH puzzle: step weighted-take

2018-09-28 Thread Dan van der Ster
On Thu, Sep 27, 2018 at 9:57 PM Maged Mokhtar wrote: > > > > On 27/09/18 17:18, Dan van der Ster wrote: > > Dear Ceph friends, > > > > I have a CRUSH data migration puzzle and wondered if someone could > > think of a clever solution. > > > &g

Re: [ceph-users] CRUSH puzzle: step weighted-take

2018-09-28 Thread Dan van der Ster
On Fri, Sep 28, 2018 at 12:51 AM Goncalo Borges wrote: > > Hi Dan > > Hope to find you ok. > > Here goes a suggestion from someone who has been sitting in the side line for > the last 2 years but following stuff as much as possible > > Will weight set per pool help? > > This is only possible in l

Re: [ceph-users] CRUSH puzzle: step weighted-take

2018-10-02 Thread Dan van der Ster
On Mon, Oct 1, 2018 at 8:09 PM Gregory Farnum wrote: > > On Fri, Sep 28, 2018 at 12:03 AM Dan van der Ster wrote: > > > > On Thu, Sep 27, 2018 at 9:57 PM Maged Mokhtar wrote: > > > > > > > > > > > > On 27/09/18 17:18, Dan van der Ster wrot

Re: [ceph-users] ceph-mgr hangs on larger clusters in Luminous

2018-10-18 Thread Dan van der Ster
15 minutes seems like the ms tcp read timeout would be related. Try shortening that and see if it works around the issue... (We use ms tcp read timeout = 60 over here -- the 900s default seems really long to keep idle connections open) -- dan On Thu, Oct 18, 2018 at 9:39 PM Bryan Stillwell wr
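
For reference, the setting mentioned above goes in ceph.conf (the 900 s figure is the default being discussed):

  [global]
  ms tcp read timeout = 60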

Re: [ceph-users] ceph-volume activation

2018-02-21 Thread Dan van der Ster
On Wed, Feb 21, 2018 at 2:24 PM, Alfredo Deza wrote: > On Tue, Feb 20, 2018 at 9:05 PM, Oliver Freyermuth > wrote: >> Many thanks for your replies! >> >> Are there plans to have something like >> "ceph-volume discover-and-activate" >> which would effectively do something like: >> ceph-volume list

Re: [ceph-users] ceph-volume activation

2018-02-22 Thread Dan van der Ster
On Wed, Feb 21, 2018 at 11:56 PM, Oliver Freyermuth wrote: > Am 21.02.2018 um 15:58 schrieb Alfredo Deza: >> On Wed, Feb 21, 2018 at 9:40 AM, Dan van der Ster >> wrote: >>> On Wed, Feb 21, 2018 at 2:24 PM, Alfredo Deza wrote: >>>> On Tue, Feb 20, 2018 at 9:

Re: [ceph-users] ceph-volume activation

2018-02-27 Thread Dan van der Ster
Hi Oliver, No ticket yet... we were distracted. I have the same observations as what you show below... -- dan On Tue, Feb 27, 2018 at 2:33 PM, Oliver Freyermuth wrote: > Am 22.02.2018 um 09:44 schrieb Dan van der Ster: >> On Wed, Feb 21, 2018 at 11:56 PM, Oliver Freyermuth >>

[ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-02-28 Thread Dan van der Ster
Hi all, I'm just updating our test cluster from 12.2.2 to 12.2.4. Mon's and OSD's updated fine. When updating the MDS's (we have 2 active and 1 standby), I started with the standby. At the moment the standby MDS restarted into 12.2.4 [1], both active MDSs (still running 12.2.2) suicided like thi

Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-02-28 Thread Dan van der Ster
More: here is the MDS_FEATURES map for a running 12.2.2 cluster: compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2} and here it is

Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-02-28 Thread Dan van der Ster
"file layout v2") Is there a way to update from 12.2.2 without causing the other active MDS's to suicide? Cheers, Dan On Wed, Feb 28, 2018 at 11:01 AM, Dan van der Ster wrote: > More: > > here is the MDS_FEATURES map for a running 12.2.2 cluster: > > compat: compat={

Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-02-28 Thread Dan van der Ster
On Wed, Feb 28, 2018 at 11:38 AM, Patrick Donnelly wrote: > On Wed, Feb 28, 2018 at 2:07 AM, Dan van der Ster wrote: >> (Sorry to spam) >> >> I guess it's related to this fix to the layout v2 feature id: >> https://github.com/ceph/ceph

Re: [ceph-users] ceph mgr balancer bad distribution

2018-02-28 Thread Dan van der Ster
Hi Stefan, Which balancer mode are you using? crush-compat scores using a mix of nobjects, npgs, and size. It's doing pretty well over here as long as you have a relatively small number of empty PGs. I believe that upmap uses nPGs only, and I haven't tested it enough yet to know if it actually imp

Re: [ceph-users] Sizing your MON storage with a large cluster

2018-02-28 Thread Dan van der Ster
Hi Wido, Are your mon's using rocksdb or still leveldb? Are your mon stores trimming back to a small size after HEALTH_OK was restored? One v12.2.2 cluster here just started showing the "is using a lot of disk space" warning on one of our mons. In fact all three mons are now using >16GB. I tried

Re: [ceph-users] ceph mgr balancer bad distribution

2018-03-01 Thread Dan van der Ster
Is the score improving? ceph balancer eval It should be decreasing over time as the variances drop toward zero. You mentioned a crush optimize code at the beginning... how did that leave your cluster? The mgr balancer assumes that the crush weight of each OSD is equal to its size in TB. Do y
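
A short sketch of the checks described above (osd id and weight illustrative); the crush weight of each OSD is expected to match its size in TiB:

  # ceph balancer eval          # single score for the cluster, lower is better
  # ceph osd df tree            # compare each OSD's crush WEIGHT against its SIZE
  # ceph osd crush reweight osd.12 3.64    # e.g. reset a 4 TB OSD back to its size in TiB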

Re: [ceph-users] ceph mgr balancer bad distribution

2018-03-01 Thread Dan van der Ster
On Thu, Mar 1, 2018 at 9:31 AM, Stefan Priebe - Profihost AG wrote: > Hi, > Am 01.03.2018 um 09:03 schrieb Dan van der Ster: >> Is the score improving? >> >> ceph balancer eval >> >> It should be decreasing over time as the variances drop toward zero. >&
