Re: [ceph-users] reweight-by-utilization warning

2016-05-15 Thread Dan van der Ster
Hi Blaire! (re-copying to list) The good news is that the functionality of that python script is now available natively in jewel and has been backported to hammer 0.94.7. Now you can use ceph osd test-reweight-by-(pg|utilization) in order to see how the weights would change if you were to run
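A minimal sketch of the dry-run workflow described above, assuming jewel or the hammer 0.94.7 backport (default thresholds; adjust to taste):
  # preview the proposed weight changes without applying anything
  ceph osd test-reweight-by-utilization
  ceph osd test-reweight-by-pg
  # apply only once the proposed changes look sane
  ceph osd reweight-by-utilization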

Re: [ceph-users] v0.94.7 Hammer released

2016-05-16 Thread Dan van der Ster
On Mon, May 16, 2016 at 8:20 AM, Chris Dunlop wrote: > On Fri, May 13, 2016 at 10:21:51AM -0400, Sage Weil wrote: >> This Hammer point release fixes several minor bugs. It also includes a >> backport of an improved ‘ceph osd reweight-by-utilization’ command for >> handling OSDs with higher-than-av

Re: [ceph-users] failing to respond to cache pressure

2016-05-16 Thread Dan van der Ster
On 16 May 2016 16:36, "John Spray" wrote: > > On Mon, May 16, 2016 at 3:11 PM, Andrus, Brian Contractor > wrote: > > Both client and server are Jewel 10.2.0 > > So the fuse client, correct? If you are up for investigating further, > with potential client bugs (or performance issues) it is often

Re: [ceph-users] v0.94.7 Hammer released

2016-05-17 Thread Dan van der Ster
Hi Sage et al, I'm updating our pre-prod cluster from 0.94.6 to 0.94.7 and after upgrading the ceph-mon's I'm getting loads of warnings like: 2016-05-17 10:01:29.314785 osd.76 [WRN] failed to encode map e103116 with expected crc I've seen that error is whitelisted in the qa-suite: https://github

[ceph-users] CephFS Jewel not using cache tiering much

2016-05-17 Thread Daniel van Ham Colchete
Hello everyone! I'm putting CephFS in production here to host Dovecot mailboxes. That's a big use case in the Dovecot community. Versions: Ubuntu 14.04 LTS with kernel 4.4.0-22-generic Ceph 10.2.1-1trusty CephFS uses the kernel client Right now I'm migrating my users to this new system. That s

Re: [ceph-users] v0.94.7 Hammer released

2016-05-17 Thread Dan van der Ster
On Tue, May 17, 2016 at 1:56 PM, Sage Weil wrote: > On Tue, 17 May 2016, Dan van der Ster wrote: >> Hi Sage et al, >> >> I'm updating our pre-prod cluster from 0.94.6 to 0.94.7 and after >> upgrading the ceph-mon's I'm getting loads of warnings like:

[ceph-users] PG stuck incomplete after power failure.

2016-05-17 Thread Hein-Pieter van Braam
ave it like this for the time being. Help would be very much appreciated! Thank you, - Hein-Pieter van Braam# ceph pg 54.3e9 query { "state": "incomplete", "snap_trimq": "[]", "epoch": 90440, "up": [ 32,

Re: [ceph-users] PG stuck incomplete after power failure.

2016-05-17 Thread Hein-Pieter van Braam
imary osd for that pg with > osd_find_best_info_ignore_history_les set to true (don't leave it set > long term). > -Sam > > On Tue, May 17, 2016 at 7:50 AM, Hein-Pieter van Braam > wrote: > > > > Hello, > > > > Today we had a power failure in a ra

[ceph-users] Enabling hammer rbd features on cluster with a few dumpling clients

2016-05-19 Thread Dan van der Ster
Hi, We want to enable the hammer rbd features on newly created Cinder volumes [1], but we still have a few VMs running with super old librbd (dumpling). Perhaps it's academic, but does anyone know the expected behaviour if an old dumpling-linked qemu-kvm tries to attach an rbd with exclusi

Re: [ceph-users] v0.94.7 Hammer released

2016-05-24 Thread Dan van der Ster
sumed it to be a more noisy (but harmless) > upgrade artifact. > > Christian > > On Tue, 17 May 2016 14:07:21 +0200 Dan van der Ster wrote: > > > On Tue, May 17, 2016 at 1:56 PM, Sage Weil wrote: > > > On Tue, 17 May 2016, Dan van der Ster wrote: > > >> Hi Sage

Re: [ceph-users] v0.94.7 Hammer released

2016-05-24 Thread Dan van der Ster
: ceph tell osd.* injectargs -- --clog_to_monitors=false which made things much better. When I upgrade our 2nd cluster tomorrow, I'll set clog_to_monitors=false before starting. Cheers, Dan On Tue, May 24, 2016 at 10:02 AM, Dan van der Ster wrote: > Hi all, > > I'm mid-upgrade
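For reference, a sketch of that workaround; the runtime injection is the command quoted above, and the same option can presumably be pre-set in ceph.conf before restarting the OSDs:
  # runtime, on an already-running cluster
  ceph tell osd.* injectargs -- --clog_to_monitors=false
  # or persistently, in ceph.conf before the upgrade
  [osd]
      clog to monitors = false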

Re: [ceph-users] Issues after update (0.94.7): Failed to encode map eXXX with expected crc

2016-06-02 Thread Dan van der Ster
Hi, Are you sure all OSDs have been updated to 0.94.7? Those messages should only be printed by 0.94.6 OSDs trying to handle messages from a 0.94.7 ceph-mon. Also, see the thread about the 0.94.7 release -- I mentioned a workaround there. -- Dan On Thu, Jun 2, 2016 at 11:29 AM, Romero Junior wr

Re: [ceph-users] Cache pool with replicated pool don't work properly.

2016-06-13 Thread Hein-Pieter van Braam
Hi, I don't really have a solution but I can confirm I had the same problem trying to deploy my new Jewel cluster. I reinstalled the cluster with Hammer and everything is working as I expect it to (that is; writes hit the backing pool asynchronously) Although other than you I noticed the same pro

[ceph-users] Ceph Day Switzerland slides and video

2016-06-15 Thread Dan van der Ster
Dear Ceph Community, Yesterday we had the pleasure of hosting Ceph Day Switzerland, and we wanted to let you know that the slides and videos of most talks have been posted online: https://indico.cern.ch/event/542464/timetable/ Thanks again to all the speakers and attendees! Hervé & Dan CERN

Re: [ceph-users] pg scrub and auto repair in hammer

2016-06-27 Thread Dan van der Ster
On Mon, Jun 27, 2016 at 2:14 AM, Christian Balzer wrote: > On Sun, 26 Jun 2016 19:48:18 +0200 Stefan Priebe wrote: > >> Hi, >> >> is there any option or chance to have auto repair of pgs in hammer? >> > Short answer: > No, in any version of Ceph. Well, jewel has a new option to auto-repair a PG i

Re: [ceph-users] Quick short survey which SSDs

2016-07-05 Thread Dan van der Ster
Hi, On Tue, Jul 5, 2016 at 9:23 AM, Götz Reinicke - IT Koordinator wrote: > Hi, > > we have offers for ceph storage nodes with different SSD types and some > are already mentioned as a very good choice but some are total new to me. > > May be you could give some feedback on the SSDs in question o

Re: [ceph-users] Quick short survey which SSDs

2016-07-05 Thread Dan van der Ster
On Tue, Jul 5, 2016 at 9:53 AM, Christian Balzer wrote: >> Unfamiliar: Samsung SM863 >> > You might want to read the thread here: > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007871.html > > And google "ceph SM863". > > However I'm still waiting for somebody to confirm that

Re: [ceph-users] Quick short survey which SSDs

2016-07-05 Thread Dan van der Ster
On Tue, Jul 5, 2016 at 10:04 AM, Dan van der Ster wrote: > On Tue, Jul 5, 2016 at 9:53 AM, Christian Balzer wrote: >>> Unfamiliar: Samsung SM863 >>> >> You might want to read the thread here: >> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/0

Re: [ceph-users] multiple journals on SSD

2016-07-06 Thread Dan van der Ster
We have 5 journal partitions per SSD. Works fine (on el6 and el7). Best practice is to use ceph-disk: ceph-disk prepare /dev/sde /dev/sdc # where e is the osd, c is an SSD. -- Dan On Wed, Jul 6, 2016 at 2:03 PM, George Shuklin wrote: > Hello. > > I've been testing Intel 3500 as journal stor
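A sketch of what that layout looks like in practice (device names are examples); each ceph-disk call carves one more journal partition on the shared SSD:
  ceph-disk prepare /dev/sde /dev/sdc   # OSD on sde, 1st journal partition on sdc
  ceph-disk prepare /dev/sdf /dev/sdc   # OSD on sdf, 2nd journal partition on sdc
  ceph-disk prepare /dev/sdg /dev/sdc   # and so on, up to 5 per SSD in this setup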

[ceph-users] PG stuck remapped+incomplete

2016-07-16 Thread Hein-Pieter van Braam
Hi all, I had a crash of some OSDs today: every primary OSD of a particular PG just started to crash. I have recorded the information for a bug report. I had reweighted the affected osds to 0 and put the processes in a restart loop and eventually all but one placement group ended up recovering. I

Re: [ceph-users] PG stuck remapped+incomplete

2016-07-18 Thread Hein-Pieter van Braam
've attached the latest version of the pg query for this pg. Thanks a lot. - HP On Sat, 2016-07-16 at 19:56 +0200, Hein-Pieter van Braam wrote: > Hi all, > > I had a crash of some OSDs today, every primary OSD of a particular > PG > just started to crash. I have recorded the informatio

Re: [ceph-users] Uncompactable Monitor Store at 69GB -- Re: Cluster in warn state, not sure what to do next.

2016-07-21 Thread Dan van der Ster
Hi, The mons keep all maps going back to the last time the cluster had HEALTH_OK, which is why the mon leveldbs are so large in your case. (I see Greg responded with the same info.) Focus on getting the cluster healthy, then the mon sizes should resolve themselves. -- Dan On Thu, Jul 21, 2016

Re: [ceph-users] mon_osd_nearfull_ratio (unchangeable) ?

2016-07-26 Thread Dan van der Ster
On Tue, Jul 26, 2016 at 3:52 AM, Brad Hubbard wrote: >> 1./ if I try to change mon_osd_nearfull_ratio from 0.85 to 0.90, I get >> >># ceph tell mon.* injectargs "--mon_osd_nearfull_ratio 0.90" >>mon.rccephmon1: injectargs:mon_osd_nearfull_ratio = '0.9' >>(unchangeable) >>mon.rcceph

Re: [ceph-users] Recovery stuck after adjusting to recent tunables

2016-07-26 Thread Dan van der Ster
Hi, Starting from the beginning... If a 3-replica PG gets stuck with only 2 replicas after changing tunables, it's probably a case where choose_total_tries is too low for your cluster configuration. Try increasing choose_total_tries from 50 to 75. -- Dan On Fri, Jul 22, 2016 at 4:17 PM, Kosti
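One way to raise that tunable, as a sketch (the value 75 is just the suggestion above; edit the decompiled map and inject it back):
  ceph osd getcrushmap -o crush.map
  crushtool -d crush.map -o crush.txt
  # in crush.txt, change "tunable choose_total_tries 50" to 75
  crushtool -c crush.txt -o crush.new
  ceph osd setcrushmap -i crush.new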

Re: [ceph-users] Recovery stuck after adjusting to recent tunables

2016-07-26 Thread Dan van der Ster
with over 150 OSDs > and hundreds of TB... > > I would be grateful if you could point me to some code or > documentation (for this tunable and the others too also) that would > have make me "see" the problem earlier and make a plan for the future. > > Kostis > >

[ceph-users] rgw query bucket usage quickly

2016-07-28 Thread Dan van der Ster
Hi, Does anyone know a fast way for S3 users to query their total bucket usage? 's3cmd du' takes a long time on large buckets (is it iterating over all the objects?). 'radosgw-admin bucket stats' seems to know the bucket usage immediately, but I didn't find a way to expose that to end users. Hopi
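For reference, the admin-side command mentioned above (bucket name is an example); it appears to read the cached counters from the bucket index header rather than listing objects, which is why it returns immediately:
  radosgw-admin bucket stats --bucket=mybucket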

Re: [ceph-users] rgw query bucket usage quickly

2016-07-28 Thread Dan van der Ster
On Thu, Jul 28, 2016 at 5:33 PM, Abhishek Lekshmanan wrote: > > Dan van der Ster writes: > >> Hi, >> >> Does anyone know a fast way for S3 users to query their total bucket >> usage? 's3cmd du' takes a long time on large buckets (is it iterating >&

Re: [ceph-users] rgw query bucket usage quickly

2016-07-29 Thread Dan van der Ster
up >> 1656225129419 29 objects s3://seanbackup/ >> >> real 0m0.314s >> user 0m0.088s >> sys 0m0.019s >> [root@korn ~]# >> >> >> On Thu, Jul 28, 2016 at 4:49 PM, Dan van der Ster >> wrote: >>> >>> On Thu, Jul 2

Re: [ceph-users] rgw query bucket usage quickly

2016-07-29 Thread Dan van der Ster
On Fri, Jul 29, 2016 at 12:06 PM, Wido den Hollander wrote: > >> On 29 July 2016 at 11:59, Dan van der Ster wrote: >> >> >> Oh yes, that should help. BTW, which client are people using for the >> Admin Ops API? Is there something better than s3curl.pl ... >

Re: [ceph-users] rgw query bucket usage quickly

2016-07-29 Thread Dan van der Ster
192.168.100.100, my.ceph.cluster, etc.). Once you add that, you should stop > seeing the 403 responses from RGW. > > Brian > > On Fri, Jul 29, 2016 at 5:14 AM, Dan van der Ster > wrote: >> >> On Fri, Jul 29, 2016 at 12:06 PM, Wido den Hollander >> wrote: >&

[ceph-users] Cascading failure on a placement group

2016-08-13 Thread Hein-Pieter van Braam
Hello all, My cluster started to lose OSDs without any warning, whenever an OSD becomes the primary for a particular PG it crashes with the following stacktrace:  ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)  1: /usr/bin/ceph-osd() [0xada722]  2: (()+0xf100) [0x7fc28bca5100]  3:

Re: [ceph-users] Cascading failure on a placement group

2016-08-13 Thread Hein-Pieter van Braam
_ > From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of > Hein-Pieter van Braam [h...@tmm.cx] > Sent: 13 August 2016 21:48 > To: ceph-users > Subject: [ceph-users] Cascading failure on a placement group > > Hello all, > > M

Re: [ceph-users] Cascading failure on a placement group

2016-08-13 Thread Hein-Pieter van Braam
es/9732 > > Cheers > Goncalo > > From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of > Goncalo Borges [goncalo.bor...@sydney.edu.au] > Sent: 13 August 2016 22:23 > To: Hein-Pieter van Braam; ceph-users > Subject: Re: [ceph-users] Cascading failure on a placement

Re: [ceph-users] Multiple OSD crashing a lot

2016-08-13 Thread Hein-Pieter van Braam
Hi Blade, I appear to be stuck in the same situation you were in. Do you still happen to have a patch to implement this workaround you described? Thanks, - HP ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph

Re: [ceph-users] Multiple OSD crashing a lot

2016-08-13 Thread Hein-Pieter van Braam
e); > assert(obc); --ctx->delta_stats.num_objects; > --ctx->delta_stats.num_objects_hit_set_archive; > if( obc) > { >  ctx->delta_stats.num_bytes -= obc->obs.oi.size; >  ctx->delta_stats.num_bytes_hit_set_archive -= obc->obs.oi.size; > } > > >

Re: [ceph-users] Calculate and increase pg_num

2013-03-15 Thread Dan van der Ster
Hi, On Fri, Mar 15, 2013 at 9:52 AM, Sebastien Han wrote: > Hi, > > It's not recommended to use this command yet. > > As a workaround you can do: > > $ ceph osd pool create <new pool> <pg_num> > $ rados cppool <old pool> <new pool> > $ ceph osd pool delete <old pool> > $ ceph osd pool rename <new pool> <old pool> > We've just done exactly this on the default p

Re: [ceph-users] Calculate and increase pg_num

2013-03-15 Thread Dan van der Ster
monitor log has this line: > > 2013-03-15 16:08:08.327049 7fe957441700 0 -- 192.168.21.11:6789/0 >> > 192.168.21.10:0/491826119 pipe(0x1b94c80 sd=23 :6789 s=0 pgs=0 cs=0 > l=0).accept peer addr is really 192.168.21.10:0/491826119 (socket is > 192.168.21.10:54670/0) > > -- > M

Re: [ceph-users] Calculate and increase pg_num

2013-03-15 Thread Dan van der Ster
On Fri, Mar 15, 2013 at 4:44 PM, Marco Aroldi wrote: > Dan, > this sounds weird: > how can you run "cephfs /mnt/mycephfs set_layout 10" on an unmounted > mountpoint? We had cephfs still mounted from earlier (before the copy pool, delete pool). Basically, any file reads resulted in an I/O error, but

[ceph-users] kernel BUG when mapping unexisting rbd device

2013-03-25 Thread Dan van der Ster
Hi, Apologies if this is already a known bug (though I didn't find it). If we try to map a device that doesn't exist, we get an immediate and reproducible kernel BUG (see the P.S.). We hit this by accident because we forgot to add the --pool . This works: [root@afs245 /]# rbd map afs254-vicepa
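For the record, the two equivalent forms with the pool given explicitly (the pool name here is a placeholder):
  rbd map --pool mypool afs254-vicepa
  rbd map mypool/afs254-vicepa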

Re: [ceph-users] Cluster Map Problems

2013-03-28 Thread Dan van der Ster
Shouldn't it just be: step take default step chooseleaf firstn 0 type rack step emit Like he has for data and metadata? -- Dan On Thu, Mar 28, 2013 at 2:51 AM, Martin Mailand wrote: > Hi John, > > I still think this part in the crushmap is wrong. > > step take d

[ceph-users] spontaneous pg inconsistencies in the rgw.gc pool

2013-04-18 Thread Dan van der Ster
something deleted the objects from the .rgw.gc pool (as one would expect) but the pgs were left inconsistent afterwards. Best Regards, Dan van der Ster CERN IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] spontaneous pg inconsistencies in the rgw.gc pool

2013-04-18 Thread Dan van der Ster
-20130411 [root@ceph-radosgw01 ceph]# df -h . Filesystem Size Used Avail Use% Mounted on /dev/mapper/vg1-root 37G 37G 0 100% / The radosgw log filled up the disk. Perhaps this caused the problem.. Cheers, Dan CERN IT On Thu, Apr 18, 2013 at 3:52 PM, Dan van der Ster wrote

Re: [ceph-users] spontaneous pg inconsistencies in the rgw.gc pool

2013-04-18 Thread Dan van der Ster
log level or increase the log rotate frequency. Thanks again, Dan CERN IT On Thu, Apr 18, 2013 at 4:09 PM, Dan van der Ster wrote: > Replying to myself... > I just noticed this: > > [root@ceph-radosgw01 ceph]# ls -lh /var/log/ceph/ > total 27G > -rw-r--r--. 1 root roo

Re: [ceph-users] EPEL packages for QEMU-KVM with rbd support?

2013-05-07 Thread Dan van der Ster
Cinder and Glance are still failing to attach rbd volumes or boot from volumes for some unknown reason. We'd be very interested if someone else is trying/succeeding to achieve the same setup, RDO OpenStack + RBD. Cheers, Dan van der Ster CERN IT ___

Re: [ceph-users] e release

2013-05-13 Thread Dan van der Ster
On Fri, May 10, 2013 at 8:31 PM, Sage Weil wrote: > So far I've found > a few latin names, but the main problem is that I can't find a single > large list of species with the common names listed. Go here: http://www.marinespecies.org/aphia.php?p=search Search for common name begins with e Taxon r

[ceph-users] crushtool won't compile its own output

2013-07-08 Thread Dan Van Der Ster
Hi, We are just deploying a new cluster (0.61.4) and noticed this: [root@andy01 ~]# ceph osd getcrushmap -o crush.map got crush map from osdmap epoch 2166 [root@andy01 ~]# crushtool -d crush.map -o crush.txt [root@andy01 ~]# crushtool -c crush.txt -o crush2.map crush.txt:640 error: parse error at

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-22 Thread Dan van der Ster
trim more at a time. > > > On 06/21/2017 09:27 AM, Dan van der Ster wrote: >> >> Hi Casey, >> >> I managed to trim up all shards except for that big #54. The others >> all trimmed within a few seconds. >> >> But 54 is proving difficult. It's still

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-22 Thread Dan van der Ster
On Wed, Jun 21, 2017 at 4:16 PM, Peter Maloney wrote: > On 06/14/17 11:59, Dan van der Ster wrote: >> Dear ceph users, >> >> Today we had O(100) slow requests which were caused by deep-scrubbing >> of the metadata log: >> >> 2017-06-14 11:07:55.373184 os

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-22 Thread Dan van der Ster
On Thu, Jun 22, 2017 at 4:25 PM, Casey Bodley wrote: > > On 06/22/2017 04:00 AM, Dan van der Ster wrote: >> >> I'm now running the three relevant OSDs with that patch. (Recompiled, >> replaced /usr/lib64/rados-classes/libcls_log.so with the new version, >> t

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-23 Thread Dan van der Ster
On Thu, Jun 22, 2017 at 5:31 PM, Casey Bodley wrote: > > On 06/22/2017 10:40 AM, Dan van der Ster wrote: >> >> On Thu, Jun 22, 2017 at 4:25 PM, Casey Bodley wrote: >>> >>> On 06/22/2017 04:00 AM, Dan van der Ster wrote: >>>> >>>

Re: [ceph-users] TRIM/Discard on SSDs with BlueStore

2017-06-27 Thread Dan van der Ster
On Tue, Jun 27, 2017 at 1:56 PM, Christian Balzer wrote: > On Tue, 27 Jun 2017 13:24:45 +0200 (CEST) Wido den Hollander wrote: > >> > On 27 June 2017 at 13:05, Christian Balzer wrote: >> > >> > >> > On Tue, 27 Jun 2017 11:24:54 +0200 (CEST) Wido den Hollander wrote: >> > >> > > Hi, >> > > >> > >

[ceph-users] hammer -> jewel 10.2.8 upgrade and setting sortbitwise

2017-07-10 Thread Dan van der Ster
Hi all, With 10.2.8, ceph will now warn if you didn't yet set sortbitwise. I just updated a test cluster, saw that warning, then did the necessary ceph osd set sortbitwise I noticed a short re-peering which took around 10s on this small cluster with very little data. Has anyone done this alre

Re: [ceph-users] autoconfigured haproxy service?

2017-07-11 Thread Dan van der Ster
On Tue, Jul 11, 2017 at 5:40 PM, Sage Weil wrote: > On Tue, 11 Jul 2017, Haomai Wang wrote: >> On Tue, Jul 11, 2017 at 11:11 PM, Sage Weil wrote: >> > On Tue, 11 Jul 2017, Sage Weil wrote: >> >> Hi all, >> >> >> >> Luminous features a new 'service map' that lets rgw's (and rgw nfs >> >> gateways

Re: [ceph-users] Stealth Jewel release?

2017-07-12 Thread Dan van der Ster
On Wed, Jul 12, 2017 at 5:51 PM, Abhishek L wrote: > On Wed, Jul 12, 2017 at 9:13 PM, Xiaoxi Chen wrote: >> +However, it also introduced a regression that could cause MDS damage. >> +Therefore, we do *not* recommend that Jewel users upgrade to this version - >> +instead, we recommend upgrading di

Re: [ceph-users] PG stuck inconsistent, but appears ok?

2017-07-13 Thread Dan van der Ster
On Thu, Jul 13, 2017 at 4:23 PM, Aaron Bassett wrote: > Because it was a read error I check SMART stats for that osd's disk and sure > enough, it had some uncorrected read errors. In order to stop it from causing > more problems > I stopped the daemon to let ceph recover from the other osds. >

Re: [ceph-users] hammer -> jewel 10.2.8 upgrade and setting sortbitwise

2017-07-14 Thread Dan van der Ster
, we just upgraded our biggest prod clusters to jewel -- that also went totally smooth!) -- Dan > sage > > >> >> >> >> On Mon, Jul 10, 2017 at 3:17 PM, Dan van der Ster >> wrote: >> > Hi all, >> > >> > With 10.2.8, ceph will

[ceph-users] how to list and reset the scrub schedules

2017-07-14 Thread Dan van der Ster
Hi, Occasionally we want to change the scrub schedule for a pool or whole cluster, but we want to do this by injecting new settings without restarting every daemon. I've noticed that in jewel, changes to scrub_min/max_interval and deep_scrub_interval do not take immediate effect, presumably becau
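The kind of injection being discussed, as a sketch (values in seconds are illustrative; as noted above, injected intervals may not affect scrubs that are already scheduled):
  ceph tell osd.* injectargs -- \
      --osd_scrub_min_interval 86400 \
      --osd_scrub_max_interval 604800 \
      --osd_deep_scrub_interval 1209600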

Re: [ceph-users] PG stuck inconsistent, but appears ok?

2017-07-14 Thread Dan van der Ster
log [INF] >> : 21.1ae9 deep-scrub ok >> >> >> each time I run it, its the same pg. >> >> Is there some reason its not scrubbing all the pgs? >> >> Aaron >> >> > On Jul 13, 2017, at 10:29 AM, Aaron Bassett >> > wrote: >&g

Re: [ceph-users] XFS attempt to access beyond end of device

2017-07-18 Thread Dan van der Ster
On Tue, Jul 18, 2017 at 6:08 AM, Marcus Furlong wrote: > On 22 March 2017 at 05:51, Dan van der Ster wrote: >> On Wed, Mar 22, 2017 at 8:24 AM, Marcus Furlong >> wrote: >>> Hi, >>> >>> I'm experiencing the same issue as outlined in this post: &

Re: [ceph-users] how to list and reset the scrub schedules

2017-07-18 Thread Dan van der Ster
On Fri, Jul 14, 2017 at 10:40 PM, Gregory Farnum wrote: > On Fri, Jul 14, 2017 at 5:41 AM Dan van der Ster wrote: >> >> Hi, >> >> Occasionally we want to change the scrub schedule for a pool or whole >> cluster, but we want to do this by injecting new settings w

Re: [ceph-users] hammer -> jewel 10.2.8 upgrade and setting sortbitwise

2017-07-18 Thread Dan van der Ster
ave a cluster running OSDs on > 10.2.6 and some OSDs on 10.2.9? Or should we wait that all OSDs are on > 10.2.9? > > Monitor nodes are already on 10.2.9. > > Best, > Martin > > On Fri, Jul 14, 2017 at 1:16 PM, Dan van der Ster wrote: >> On Mon, Jul 10, 2017 at 5:06 PM, S

[ceph-users] ipv6 monclient

2017-07-19 Thread Dan van der Ster
Hi Wido, Quick question about IPv6 clusters which you may have already noticed. We have an IPv6 cluster and clients use this as the ceph.conf: [global] mon host = cephv6.cern.ch cephv6 is an alias to our three mons, which are listening on their v6 addrs (ms bind ipv6 = true). But those mon hos

[ceph-users] Linear space complexity or memory leak in `Radosgw-admin bucket check --fix`

2017-07-25 Thread Hans van den Bogert
Hi All, I don't seem to be able to fix a bucket, a bucket which has become inconsistent due to the use of the `inconsistent-index` flag 8). My ceph-admin VM has 4GB of RAM, but that doesn't seem to be enough to do a `radosgw-admin bucket check --fix` which holds 6M items, as the radosgw-admin pro

[ceph-users] ceph osd safe to remove

2017-07-28 Thread Dan van der Ster
Hi all, We are trying to outsource the disk replacement process for our ceph clusters to some non-expert sysadmins. We could really use a tool that reports if a Ceph OSD *would* or *would not* be safe to stop, e.g. # ceph-osd-safe-to-stop osd.X Yes it would be OK to stop osd.X (which of course m
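A rough sketch of what such a wrapper could look like; it assumes a release that ships the 'ceph osd ok-to-stop' mon command (Luminous or newer, if memory serves), so treat it as a starting point rather than a drop-in tool:
  # hypothetical check for osd.12; ok-to-stop exits non-zero if PGs would become unavailable
  if ceph osd ok-to-stop osd.12; then
      echo "osd.12 can be stopped safely"
  else
      echo "do NOT stop osd.12 right now"
  fi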

Re: [ceph-users] ceph osd safe to remove

2017-08-03 Thread Dan van der Ster
ll req. -- Dan On Fri, Jul 28, 2017 at 9:39 PM, Alexandre Germain wrote: > Hello Dan, > > Something like this maybe? > > https://github.com/CanonicalLtd/ceph_safe_disk > > Cheers, > > Alex > > 2017-07-28 9:36 GMT-04:00 Dan van der Ster : >> >> H

Re: [ceph-users] ceph osd safe to remove

2017-08-03 Thread Dan van der Ster
complete, respectively. (the magic that made my reweight script > efficient compared to the official reweight script) > > And I have not used such a method in the past... my cluster is small, so I > have always just let recovery completely finish instead. I hope you find it > usefu

Re: [ceph-users] ceph osd safe to remove

2017-08-03 Thread Dan van der Ster
On Thu, Aug 3, 2017 at 11:42 AM, Peter Maloney wrote: > On 08/03/17 11:05, Dan van der Ster wrote: > > On Fri, Jul 28, 2017 at 9:42 PM, Peter Maloney > wrote: > > Hello Dan, > > Based on what I know and what people told me on IRC, this means basicaly the > condition

[ceph-users] Gracefully reboot OSD node

2017-08-03 Thread Hans van den Bogert
Hi all, One thing which has bothered me since the beginning of using ceph is that a reboot of a single OSD causes a HEALTH_ERR state for the cluster for at least a couple of seconds. In the case of a planned reboot of an OSD node, should I run some extra commands in order not to go to HEALTH_ERR state?
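For reference, the commonly used pattern for a planned reboot, as a sketch (whether to also set nodown is debated in the replies below):
  ceph osd set noout      # keep CRUSH from rebalancing while the node is down
  # reboot the node, then wait for its OSDs to rejoin and PGs to go active+clean
  ceph osd unset noout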

Re: [ceph-users] Gracefully reboot OSD node

2017-08-03 Thread Hans van den Bogert
cted? On Thu, Aug 3, 2017 at 1:36 PM, linghucongsong wrote: > > > set the osd noout nodown > > > > > At 2017-08-03 18:29:47, "Hans van den Bogert" > wrote: > > Hi all, > > One thing which has bothered since the beginning of using ceph is that a >

Re: [ceph-users] Gracefully reboot OSD node

2017-08-03 Thread Hans van den Bogert
Aug 3, 2017 at 1:55 PM, Hans van den Bogert wrote: > What are the implications of this? Because I can see a lot of blocked > requests piling up when using 'noout' and 'nodown'. That probably makes > sense though. > Another thing, now when the OSDs come back onli

Re: [ceph-users] expanding cluster with minimal impact

2017-08-04 Thread Dan van der Ster
ious > threads on this topic from the list I've found the ceph-gentle-reweight > script > (https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-gentle-reweight) > created by Dan van der Ster (Thank you Dan for sharing the script with us!). > > I've done some e

Re: [ceph-users] expanding cluster with minimal impact

2017-08-08 Thread Dan van der Ster
0 each time which > seemed to reduce the extra data movement we were seeing with smaller weight > increases. Maybe something to try out next time? > > Bryan > > From: ceph-users on behalf of Dan van der > Ster > Date: Friday, August 4, 2017 at 1:59 AM > To: L

Re: [ceph-users] ceph-fuse mounting and returning 255

2017-08-10 Thread Dan van der Ster
Hi, I also noticed this and finally tracked it down: http://tracker.ceph.com/issues/20972 Cheers, Dan On Mon, Jul 10, 2017 at 3:58 PM, Florent B wrote: > Hi, > > Since 10.2.8 Jewel update, when ceph-fuse is mounting a file system, it > returns 255 instead of 0 ! > > $ mount /mnt/cephfs-drupal

Re: [ceph-users] Reaching aio-max-nr on Ubuntu 16.04 with Luminous

2017-08-30 Thread Dan van der Ster
Hi Thomas, Yes we set it to a million. From our puppet manifest: # need to increase aio-max-nr to allow many bluestore devs sysctl { 'fs.aio-max-nr': val => '1048576' } Cheers, Dan On Aug 30, 2017 9:53 AM, "Thomas Bennett" wrote: > > Hi, > > I've been testing out Lum
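The equivalent without puppet, as a sketch (the sysctl.d file name is arbitrary):
  echo 'fs.aio-max-nr = 1048576' > /etc/sysctl.d/90-ceph-aio.conf
  sysctl --system                        # reload persistent settings
  # or, for the running kernel only:
  sysctl -w fs.aio-max-nr=1048576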

Re: [ceph-users] Very slow start of osds after reboot

2017-08-31 Thread Dan van der Ster
Hi, I see the same with jewel on el7 -- it started in one of the recent point releases, around 10.2.5, IIRC. The problem seems to be the same -- the daemon is started before the osd is mounted... then the service waits several seconds before trying again. Aug 31 15:41:47 ceph-osd: 2017-08-31 15:41:47.26766

Re: [ceph-users] Very slow start of osds after reboot

2017-08-31 Thread Dan van der Ster
├─ceph-osd@84.service ● │ ├─ceph-osd@89.service ● │ ├─ceph-osd@90.service ● │ ├─ceph-osd@91.service ● │ └─ceph-osd@92.service ● ├─getty.target ... On Thu, Aug 31, 2017 at 4:57 PM, Dan van der Ster wrote: > Hi, > > I see the same with jewel on el7 -- it started one of the recent

Re: [ceph-users] moving mons across networks

2017-09-13 Thread Dan van der Ster
Hi Blair, You can add/remove mons on the fly -- connected clients will learn about all of the mons as the monmap changes and there won't be any downtime as long as the quorum is maintained. There is one catch when it comes to OpenStack, however. Unfortunately, OpenStack persists the mon IP addres

Re: [ceph-users] moving mons across networks

2017-09-13 Thread Dan van der Ster
On Wed, Sep 13, 2017 at 10:54 AM, Wido den Hollander wrote: > >> On 13 September 2017 at 10:38, Dan van der Ster wrote: >> >> >> Hi Blair, >> >> You can add/remove mons on the fly -- connected clients will learn >> about all of the mons as the monm

Re: [ceph-users] moving mons across networks

2017-09-13 Thread Dan van der Ster
On Wed, Sep 13, 2017 at 11:04 AM, Dan van der Ster wrote: > On Wed, Sep 13, 2017 at 10:54 AM, Wido den Hollander wrote: >> >>> On 13 September 2017 at 10:38, Dan van der Ster wrote: >>> >>> >>> Hi Blair, >>> >>> You

Re: [ceph-users] tunable question

2017-09-28 Thread Dan van der Ster
Hi, How big is your cluster and what is your use case? For us, we'll likely never enable the recent tunables that need to remap *all* PGs -- it would simply be too disruptive for marginal benefit. Cheers, Dan On Thu, Sep 28, 2017 at 9:21 AM, mj wrote: > Hi, > > We have completed the upgrade t

Re: [ceph-users] why sudden (and brief) HEALTH_ERR

2017-10-04 Thread Dan van der Ster
On Wed, Oct 4, 2017 at 9:08 AM, Piotr Dałek wrote: > On 17-10-04 08:51 AM, lists wrote: >> >> Hi, >> >> Yesterday I chowned our /var/lib/ceph ceph, to completely finalize our >> jewel migration, and noticed something interesting. >> >> After I brought back up the OSDs I just chowned, the system ha

Re: [ceph-users] ceph-volume: migration and disk partition support

2017-10-10 Thread Dan van der Ster
On Fri, Oct 6, 2017 at 6:56 PM, Alfredo Deza wrote: > Hi, > > Now that ceph-volume is part of the Luminous release, we've been able > to provide filestore support for LVM-based OSDs. We are making use of > LVM's powerful mechanisms to store metadata which allows the process > to no longer rely on

[ceph-users] How to get current min-compat-client setting

2017-10-13 Thread Hans van den Bogert
Hi, I’m in the middle of debugging some incompatibilities with an upgrade of Proxmox which uses Ceph. At this point I’d like to know what my current value is for the min-compat-client setting, which would’ve been set by: ceph osd set-require-min-compat-client … AFAIK, there is no direct g
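For anyone searching later: on Luminous the current value seems to be visible in the OSD map dump, e.g.:
  ceph osd dump | grep min_compat_client
  # expect a line like: require_min_compat_client jewel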

Re: [ceph-users] How to get current min-compat-client setting

2017-10-16 Thread Hans van den Bogert
> >> On 13 October 2017 at 10:22, Hans van den Bogert wrote: >> >> >> Hi, >> >> I’m in the middle of debugging some incompatibilities with an upgrade of >> Proxmox which uses Ceph. At this point I’d like to know what my current >>

[ceph-users] High mem with Luminous/Bluestore

2017-10-18 Thread Hans van den Bogert
Hi All, I've converted 2 nodes with 4 HDD/OSDs each from Filestore to Bluestore. I expected somewhat higher memory usage/RSS values, however I see, imo, a huge memory usage for all OSDs on both nodes. Small snippet from `top`: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ C

Re: [ceph-users] High mem with Luminous/Bluestore

2017-10-18 Thread Hans van den Bogert
ke HDDs and monitor the memory usage. Thanks, Hans On Wed, Oct 18, 2017 at 11:56 AM, Wido den Hollander wrote: > > > On 18 October 2017 at 11:41, Hans van den Bogert < hansbog...@gmail.com> wrote: > > > > > > Hi All, > > > > I've c

Re: [ceph-users] High mem with Luminous/Bluestore

2017-10-19 Thread Hans van den Bogert
> Memory usage is still quite high here even with a large onode cache! > Are you using erasure coding? I recently was able to reproduce a bug in > bluestore causing excessive memory usage during large writes with EC, > but have not tracked down exactly what's going on yet. > > Mark No, this is

Re: [ceph-users] Ceph delete files and status

2017-10-20 Thread Hans van den Bogert
My experience with RGW is that the actual freeing up of space is asynchronous to an S3 client’s command to delete an object. I.e., it might take a while before it’s actually freed up. Can you redo your little experiment and simply wait for an hour to let the garbage collector do its thing, or

[ceph-users] Drive write cache recommendations for Luminous/Bluestore

2017-10-23 Thread Hans van den Bogert
Hi All, For Jewel there is this page about drive cache: http://docs.ceph.com/docs/jewel/rados/configuration/filesystem-recommendations/#hard-drive-prep For Bluestore I can't find any documentation or discussions about drive write cache, while I can imagine that revisiting this subject might be ne
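For what it's worth, a sketch of how to inspect and toggle a drive's volatile write cache (device name is a placeholder; whether disabling it helps or hurts depends on the drive and on journal/WAL placement):
  hdparm -W /dev/sdX      # show the current write-cache setting
  hdparm -W0 /dev/sdX     # disable the volatile write cache
  hdparm -W1 /dev/sdX     # re-enable it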

Re: [ceph-users] announcing ceph-helm (ceph on kubernetes orchestration)

2017-10-25 Thread Hans van den Bogert
Very interesting. I've been toying around with Rook.io [1]. Did you know of this project, and if so can you tell if ceph-helm and Rook.io have similar goals? Regards, Hans [1] https://rook.io/ On 25 Oct 2017 21:09, "Sage Weil" wrote: > There is a new repo under the ceph org, ceph-helm, which

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-02 Thread Hans van den Bogert
> On Nov 1, 2017, at 4:45 PM, David Turner wrote: > > All it takes for data loss is that an osd on server 1 is marked down and a > write happens to an osd on server 2. Now the osd on server 2 goes down > before the osd on server 1 has finished backfilling and the first osd > receives a reque

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-02 Thread Hans van den Bogert
Never mind, I should’ve read the whole thread first. > On Nov 2, 2017, at 10:50 AM, Hans van den Bogert wrote: > > >> On Nov 1, 2017, at 4:45 PM, David Turner > <mailto:drakonst...@gmail.com>> wrote: >> >> All it takes for data loss is that an osd on

[ceph-users] Ceph versions not showing RGW

2017-11-02 Thread Hans van den Bogert
Hi all, During our upgrade from Jewel to Luminous I saw the following behaviour, if my memory serves me right: When upgrading for example monitors and OSDs, we saw that the `ceph versions` command correctly showed at one point that some OSDs were still on Jewel (10.2.x) and some were already upgraded a

Re: [ceph-users] Ceph versions not showing RGW

2017-11-02 Thread Hans van den Bogert
Just to get this really straight, Jewel OSDs do send this metadata? Otherwise I'm probably mistaken that I ever saw 10.2.x versions in the output. Thanks, Hans On 2 Nov 2017 12:31 PM, "John Spray" wrote: > On Thu, Nov 2, 2017 at 11:16 AM, Hans van den Bogert &g

Re: [ceph-users] Cephfs snapshot work

2017-11-07 Thread Dan van der Ster
On Tue, Nov 7, 2017 at 12:57 PM, John Spray wrote: > On Sun, Nov 5, 2017 at 4:19 PM, Brady Deetz wrote: >> My organization has a production cluster primarily used for cephfs upgraded >> from jewel to luminous. We would very much like to have snapshots on that >> filesystem, but understand that t

Re: [ceph-users] Cephfs snapshot work

2017-11-07 Thread Dan van der Ster
On Tue, Nov 7, 2017 at 4:15 PM, John Spray wrote: > On Tue, Nov 7, 2017 at 3:01 PM, Dan van der Ster wrote: >> On Tue, Nov 7, 2017 at 12:57 PM, John Spray wrote: >>> On Sun, Nov 5, 2017 at 4:19 PM, Brady Deetz wrote: >>>> My organization has a production clu

Re: [ceph-users] Fwd: Luminous RadosGW issue

2017-11-08 Thread Hans van den Bogert
Are you sure you deployed it with the client.radosgw.gateway name as well? Try to redeploy the RGW and make sure the name you give it corresponds to the name you give in the ceph.conf. Also, do not forget to push the ceph.conf to the RGW machine. On Wed, Nov 8, 2017 at 11:44 PM, Sam Huracan wrote

Re: [ceph-users] Fwd: Luminous RadosGW issue

2017-11-09 Thread Hans van den Bogert
config show | grep log_file > "log_file": "/var/log/ceph/ceph-client.rgw.radosgw.log", > > > [root@radosgw system]# cat /etc/ceph/ceph.client.radosgw.keyring > [client.radosgw.gateway] > key = AQCsywNaqQdDHxAAC24O8CJ0A9Gn6qeiPalEYg== > caps mon = "all

Re: [ceph-users] ceps-deploy won't install luminous

2017-11-15 Thread Hans van den Bogert
Hi, Can you show the contents of the file, /etc/yum.repos.d/ceph.repo ? Regards, Hans > On Nov 15, 2017, at 10:27 AM, Ragan, Tj (Dr.) > wrote: > > Hi All, > > I feel like I’m doing something silly. I’m spinning up a new cluster, and > followed the instructions on the pre-flight and quick s
