Hi Blaire! (re-copying to list)
The good news is that the functionality of that Python script is now
available natively in jewel and has been backported to hammer 0.94.7.
Now you can use
ceph osd test-reweight-by-(pg|utilization)
in order to see how the weights would change if you were to run
On Mon, May 16, 2016 at 8:20 AM, Chris Dunlop wrote:
> On Fri, May 13, 2016 at 10:21:51AM -0400, Sage Weil wrote:
>> This Hammer point release fixes several minor bugs. It also includes a
>> backport of an improved ‘ceph osd reweight-by-utilization’ command for
>> handling OSDs with higher-than-av
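For reference, a hedged sketch of the dry-run versus the real command; the 120% overload threshold is illustrative:
ceph osd test-reweight-by-utilization 120   # report the proposed weight changes only
ceph osd reweight-by-utilization 120        # actually apply the reweighting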
On 16 May 2016 16:36, "John Spray" wrote:
>
> On Mon, May 16, 2016 at 3:11 PM, Andrus, Brian Contractor
> wrote:
> > Both client and server are Jewel 10.2.0
>
> So the fuse client, correct? If you are up for investigating further,
> with potential client bugs (or performance issues) it is often
Hi Sage et al,
I'm updating our pre-prod cluster from 0.94.6 to 0.94.7 and after
upgrading the ceph-mon's I'm getting loads of warnings like:
2016-05-17 10:01:29.314785 osd.76 [WRN] failed to encode map e103116
with expected crc
I've seen that error is whitelisted in the qa-suite:
https://github
Hello everyone!
I'm putting CephFS in production here to host Dovecot mailboxes. That's a
big use case in the Dovecot community.
Versions:
Ubuntu 14.04 LTS with kernel 4.4.0-22-generic
Ceph 10.2.1-1trusty
CephFS uses the kernel client
Right now I'm migrating my users to this new system. That s
On Tue, May 17, 2016 at 1:56 PM, Sage Weil wrote:
> On Tue, 17 May 2016, Dan van der Ster wrote:
>> Hi Sage et al,
>>
>> I'm updating our pre-prod cluster from 0.94.6 to 0.94.7 and after
>> upgrading the ceph-mon's I'm getting loads of warnings like:
>
ave it like this for the time being. Help would be
very much appreciated!
Thank you,
- Hein-Pieter van Braam
# ceph pg 54.3e9 query
{
    "state": "incomplete",
    "snap_trimq": "[]",
    "epoch": 90440,
    "up": [
        32,
imary osd for that pg with
> osd_find_best_info_ignore_history_les set to true (don't leave it set
> long term).
> -Sam
>
> On Tue, May 17, 2016 at 7:50 AM, Hein-Pieter van Braam
> wrote:
> >
> > Hello,
> >
> > Today we had a power failure in a ra
Hi,
We want to enable the hammer rbd features on newly created Cinder
volumes [1], but we still have a few VMs running with super old librbd
running (dumpling).
Perhaps its academic, but does anyone know the expected behaviour if
an old dumpling-linked qemu-kvm tries to attach an rbd with
exclusi
sumed it to be a more noisy (but harmless)
> upgrade artifact.
>
> Christian
>
> On Tue, 17 May 2016 14:07:21 +0200 Dan van der Ster wrote:
>
> > On Tue, May 17, 2016 at 1:56 PM, Sage Weil wrote:
> > > On Tue, 17 May 2016, Dan van der Ster wrote:
> > >> Hi Sage
:
ceph tell osd.* injectargs -- --clog_to_monitors=false
which made things much better.
When I upgrade our 2nd cluster tomorrow, I'll set
clog_to_monitors=false before starting.
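For reference, a hedged sketch of pre-setting this in ceph.conf before the daemons start (placing it under [osd] is an assumption):
[osd]
    clog to monitors = false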
Cheers, Dan
On Tue, May 24, 2016 at 10:02 AM, Dan van der Ster wrote:
> Hi all,
>
> I'm mid-upgrade
Hi,
Are you sure all OSDs have been updated to 0.94.7? Those messages should
only be printed by 0.94.6 OSDs trying to handle messages from a 0.94.7
ceph-mon.
Also, see the thread about the 0.94.7 release -- I mentioned a workaround
there.
--
Dan
On Thu, Jun 2, 2016 at 11:29 AM, Romero Junior wr
Hi,
I don't really have a solution, but I can confirm I had the same problem
trying to deploy my new Jewel cluster. I reinstalled the cluster with
Hammer and everything is working as I expect it to (that is, writes hit
the backing pool asynchronously).
Although, unlike you, I noticed the same pro
Dear Ceph Community,
Yesterday we had the pleasure of hosting Ceph Day Switzerland, and we
wanted to let you know that the slides and videos of most talks have
been posted online:
https://indico.cern.ch/event/542464/timetable/
Thanks again to all the speakers and attendees!
Hervé & Dan
CERN
On Mon, Jun 27, 2016 at 2:14 AM, Christian Balzer wrote:
> On Sun, 26 Jun 2016 19:48:18 +0200 Stefan Priebe wrote:
>
>> Hi,
>>
>> is there any option or chance to have auto repair of pgs in hammer?
>>
> Short answer:
> No, in any version of Ceph.
Well, jewel has a new option to auto-repair a PG i
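Presumably the option referred to here is osd_scrub_auto_repair; a hedged sketch of enabling it in ceph.conf (it is off by default):
[osd]
    osd scrub auto repair = true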
Hi,
On Tue, Jul 5, 2016 at 9:23 AM, Götz Reinicke - IT Koordinator
wrote:
> Hi,
>
> we have offers for ceph storage nodes with different SSD types; some
> are already mentioned as a very good choice, but some are totally new to me.
>
> May be you could give some feedback on the SSDs in question o
On Tue, Jul 5, 2016 at 9:53 AM, Christian Balzer wrote:
>> Unfamiliar: Samsung SM863
>>
> You might want to read the thread here:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007871.html
>
> And google "ceph SM863".
>
> However I'm still waiting for somebody to confirm that
On Tue, Jul 5, 2016 at 10:04 AM, Dan van der Ster wrote:
> On Tue, Jul 5, 2016 at 9:53 AM, Christian Balzer wrote:
>>> Unfamiliar: Samsung SM863
>>>
>> You might want to read the thread here:
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/0
We have 5 journal partitions per SSD. Works fine (on el6 and el7).
Best practice is to use ceph-disk:
ceph-disk prepare /dev/sde /dev/sdc # where e is the osd, c is an SSD.
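A hedged sketch of how multiple journals end up on one SSD; device names are illustrative, and each invocation carves another journal partition out of the shared SSD:
ceph-disk prepare /dev/sde /dev/sdc   # first OSD, first journal partition on /dev/sdc
ceph-disk prepare /dev/sdf /dev/sdc   # second OSD, second journal partition on /dev/sdc
ceph-disk prepare /dev/sdg /dev/sdc   # and so on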
-- Dan
On Wed, Jul 6, 2016 at 2:03 PM, George Shuklin wrote:
> Hello.
>
> I've been testing Intel 3500 as journal stor
Hi all,
I had a crash of some OSDs today; every primary OSD of a particular PG
just started to crash. I have recorded the information for a
bug report.
I had reweighted the affected OSDs to 0 and put the processes in a
restart loop, and eventually all but one placement group ended up
recovering. I
've attached the latest version of the pg
query for this pg.
Thanks a lot.
- HP
On Sat, 2016-07-16 at 19:56 +0200, Hein-Pieter van Braam wrote:
> Hi all,
>
> I had a crash of some OSDs today, every primary OSD of a particular
> PG
> just started to crash. I have recorded the informatio
Hi,
The mons keep all maps going back to the last time the cluster had
HEALTH_OK, which is why the mon leveldb stores are so large in your case.
(I see Greg responded with the same info.) Focus on getting the
cluster healthy, and then the mon store sizes should resolve themselves.
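If the stores remain large even after the cluster returns to HEALTH_OK, a manual compaction is possible; a hedged sketch, with an illustrative mon ID:
ceph tell mon.a compact
or, in ceph.conf, have monitors compact at startup:
[mon]
    mon compact on start = true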
--
Dan
On Thu, Jul 21, 2016
On Tue, Jul 26, 2016 at 3:52 AM, Brad Hubbard wrote:
>> 1./ if I try to change mon_osd_nearfull_ratio from 0.85 to 0.90, I get
>>
>># ceph tell mon.* injectargs "--mon_osd_nearfull_ratio 0.90"
>>mon.rccephmon1: injectargs:mon_osd_nearfull_ratio = '0.9'
>>(unchangeable)
>>mon.rcceph
Hi,
Starting from the beginning...
If a 3-replica PG gets stuck with only 2 replicas after changing
tunables, it's probably a case where choose_total_tries is too low for
your cluster configuration.
Try increasing choose_total_tries from 50 to 75.
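A sketch of one way to change that tunable, using the usual crushmap round trip (filenames are illustrative):
ceph osd getcrushmap -o crush.map
crushtool -d crush.map -o crush.txt
# edit crush.txt: tunable choose_total_tries 75
crushtool -c crush.txt -o crush-new.map
ceph osd setcrushmap -i crush-new.map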
-- Dan
On Fri, Jul 22, 2016 at 4:17 PM, Kosti
with over 150 OSDs
> and hundreds of TB...
>
> I would be grateful if you could point me to some code or
> documentation (for this tunable and the others too also) that would
> have make me "see" the problem earlier and make a plan for the future.
>
> Kostis
>
>
Hi,
Does anyone know a fast way for S3 users to query their total bucket
usage? 's3cmd du' takes a long time on large buckets (is it iterating
over all the objects?). 'radosgw-admin bucket stats' seems to know the
bucket usage immediately, but I didn't find a way to expose that to
end users.
Hopi
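For reference, a hedged sketch of the admin-side query mentioned above; the bucket name is illustrative, and the per-bucket totals appear under the "usage" section of the output:
radosgw-admin bucket stats --bucket=mybucket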
On Thu, Jul 28, 2016 at 5:33 PM, Abhishek Lekshmanan wrote:
>
> Dan van der Ster writes:
>
>> Hi,
>>
>> Does anyone know a fast way for S3 users to query their total bucket
>> usage? 's3cmd du' takes a long time on large buckets (is it iterating
>&
up
>> 1656225129419 29 objects s3://seanbackup/
>>
>> real    0m0.314s
>> user    0m0.088s
>> sys     0m0.019s
>> [root@korn ~]#
>>
>>
>> On Thu, Jul 28, 2016 at 4:49 PM, Dan van der Ster
>> wrote:
>>>
>>> On Thu, Jul 2
On Fri, Jul 29, 2016 at 12:06 PM, Wido den Hollander wrote:
>
>> Op 29 juli 2016 om 11:59 schreef Dan van der Ster :
>>
>>
>> Oh yes, that should help. BTW, which client are people using for the
>> Admin Ops API? Is there something better than s3curl.pl ...
>
192.168.100.100, my.ceph.cluster, etc.). Once you add that, you should stop
> seeing the 403 responses from RGW.
>
> Brian
>
> On Fri, Jul 29, 2016 at 5:14 AM, Dan van der Ster
> wrote:
>>
>> On Fri, Jul 29, 2016 at 12:06 PM, Wido den Hollander
>> wrote:
>&
Hello all,
My cluster started to lose OSDs without any warning; whenever an OSD
becomes the primary for a particular PG, it crashes with the following
stack trace:
ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
1: /usr/bin/ceph-osd() [0xada722]
2: (()+0xf100) [0x7fc28bca5100]
3:
_
> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of
> Hein-Pieter van Braam [h...@tmm.cx]
> Sent: 13 August 2016 21:48
> To: ceph-users
> Subject: [ceph-users] Cascading failure on a placement group
>
> Hello all,
>
> M
es/9732
>
> Cheers
> Goncalo
>
> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of
> Goncalo Borges [goncalo.bor...@sydney.edu.au]
> Sent: 13 August 2016 22:23
> To: Hein-Pieter van Braam; ceph-users
> Subject: Re: [ceph-users] Cascading failure on a placement
Hi Blade,
I appear to be stuck in the same situation you were in. Do you still
happen to have a patch to implement this workaround you described?
Thanks,
- HP
e);
> assert(obc);
> --ctx->delta_stats.num_objects;
> --ctx->delta_stats.num_objects_hit_set_archive;
> if (obc)
> {
>     ctx->delta_stats.num_bytes -= obc->obs.oi.size;
>     ctx->delta_stats.num_bytes_hit_set_archive -= obc->obs.oi.size;
> }
>
>
>
Hi,
On Fri, Mar 15, 2013 at 9:52 AM, Sebastien Han
wrote:
> Hi,
>
> It's not recommended to use this command yet.
>
> As a workaround you can do:
>
> $ ceph osd pool create <new-pool> <pg-num>
> $ rados cppool <old-pool> <new-pool>
> $ ceph osd pool delete <old-pool>
> $ ceph osd pool rename <new-pool> <old-pool>
>
We've just done exactly this on the default p
monitor log has this line:
>
> 2013-03-15 16:08:08.327049 7fe957441700 0 -- 192.168.21.11:6789/0 >>
> 192.168.21.10:0/491826119 pipe(0x1b94c80 sd=23 :6789 s=0 pgs=0 cs=0
> l=0).accept peer addr is really 192.168.21.10:0/491826119 (socket is
> 192.168.21.10:54670/0)
>
> --
> M
On Fri, Mar 15, 2013 at 4:44 PM, Marco Aroldi wrote:
> Dan,
> this sound weird:
> how can you run "cephfs /mnt/mycephfs set_layout 10" on an unmounted
> mountpoint?
We had cephfs still mounted from earlier (before the copy pool, delete
pool). Basically, any file read resulted in an I/O error, but
Hi,
Apologies if this is already a known bug (though I didn't find it).
If we try to map a device that doesn't exist, we get an immediate and
reproducible kernel BUG (see the P.S.). We hit this by accident
because we forgot to add the --pool argument.
This works:
[root@afs245 /]# rbd map afs254-vicepa
Shouldn't it just be:
step take default
step chooseleaf firstn 0 type rack
step emit
Like he has for data and metadata?
--
Dan
On Thu, Mar 28, 2013 at 2:51 AM, Martin Mailand wrote:
> Hi John,
>
> I still think this part in the crushmap is wrong.
>
> step take d
something deleted the objects from the .rgw.gc pool (as one would
expect) but the pgs were left inconsistent afterwards.
Best Regards,
Dan van der Ster
CERN IT
-20130411
[root@ceph-radosgw01 ceph]# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg1-root   37G   37G     0 100% /
The radosgw log filled up the disk. Perhaps this caused the problem...
Cheers, Dan
CERN IT
On Thu, Apr 18, 2013 at 3:52 PM, Dan van der Ster wrote
log level or increase the log rotate frequency.
Thanks again,
Dan
CERN IT
On Thu, Apr 18, 2013 at 4:09 PM, Dan van der Ster wrote:
> Replying to myself...
> I just noticed this:
>
> [root@ceph-radosgw01 ceph]# ls -lh /var/log/ceph/
> total 27G
> -rw-r--r--. 1 root roo
Cinder and Glance are still failing to attach rbd volumes or
boot from volumes for some unknown reason. We'd be very interested if
someone else is trying/succeeding to achieve the same setup, RDO
OpenStack + RBD.
Cheers,
Dan van der Ster
CERN IT
On Fri, May 10, 2013 at 8:31 PM, Sage Weil wrote:
> So far I've found
> a few latin names, but the main problem is that I can't find a single
> large list of species with the common names listed.
Go here: http://www.marinespecies.org/aphia.php?p=search
Search for common name begins with e
Taxon r
Hi,
We are just deploying a new cluster (0.61.4) and noticed this:
[root@andy01 ~]# ceph osd getcrushmap -o crush.map
got crush map from osdmap epoch 2166
[root@andy01 ~]# crushtool -d crush.map -o crush.txt
[root@andy01 ~]# crushtool -c crush.txt -o crush2.map
crush.txt:640 error: parse error at
trim more at a time.
>
>
> On 06/21/2017 09:27 AM, Dan van der Ster wrote:
>>
>> Hi Casey,
>>
>> I managed to trim up all shards except for that big #54. The others
>> all trimmed within a few seconds.
>>
>> But 54 is proving difficult. It's still
On Wed, Jun 21, 2017 at 4:16 PM, Peter Maloney
wrote:
> On 06/14/17 11:59, Dan van der Ster wrote:
>> Dear ceph users,
>>
>> Today we had O(100) slow requests which were caused by deep-scrubbing
>> of the metadata log:
>>
>> 2017-06-14 11:07:55.373184 os
On Thu, Jun 22, 2017 at 4:25 PM, Casey Bodley wrote:
>
> On 06/22/2017 04:00 AM, Dan van der Ster wrote:
>>
>> I'm now running the three relevant OSDs with that patch. (Recompiled,
>> replaced /usr/lib64/rados-classes/libcls_log.so with the new version,
>> t
On Thu, Jun 22, 2017 at 5:31 PM, Casey Bodley wrote:
>
> On 06/22/2017 10:40 AM, Dan van der Ster wrote:
>>
>> On Thu, Jun 22, 2017 at 4:25 PM, Casey Bodley wrote:
>>>
>>> On 06/22/2017 04:00 AM, Dan van der Ster wrote:
>>>>
>>>
On Tue, Jun 27, 2017 at 1:56 PM, Christian Balzer wrote:
> On Tue, 27 Jun 2017 13:24:45 +0200 (CEST) Wido den Hollander wrote:
>
>> > Op 27 juni 2017 om 13:05 schreef Christian Balzer :
>> >
>> >
>> > On Tue, 27 Jun 2017 11:24:54 +0200 (CEST) Wido den Hollander wrote:
>> >
>> > > Hi,
>> > >
>> > >
Hi all,
With 10.2.8, ceph will now warn if you haven't yet set sortbitwise.
I just updated a test cluster, saw that warning, then did the necessary
ceph osd set sortbitwise
I noticed a short re-peering which took around 10s on this small
cluster with very little data.
Has anyone done this alre
On Tue, Jul 11, 2017 at 5:40 PM, Sage Weil wrote:
> On Tue, 11 Jul 2017, Haomai Wang wrote:
>> On Tue, Jul 11, 2017 at 11:11 PM, Sage Weil wrote:
>> > On Tue, 11 Jul 2017, Sage Weil wrote:
>> >> Hi all,
>> >>
>> >> Luminous features a new 'service map' that lets rgw's (and rgw nfs
>> >> gateways
On Wed, Jul 12, 2017 at 5:51 PM, Abhishek L
wrote:
> On Wed, Jul 12, 2017 at 9:13 PM, Xiaoxi Chen wrote:
>> +However, it also introduced a regression that could cause MDS damage.
>> +Therefore, we do *not* recommend that Jewel users upgrade to this version -
>> +instead, we recommend upgrading di
On Thu, Jul 13, 2017 at 4:23 PM, Aaron Bassett
wrote:
> Because it was a read error I checked SMART stats for that osd's disk and sure
> enough, it had some uncorrected read errors. In order to stop it from causing
> more problems I stopped the daemon to let ceph recover from the other osds.
>
, we just upgraded our biggest prod clusters to jewel -- that also
went totally smooth!)
-- Dan
> sage
>
>
>>
>>
>>
>> On Mon, Jul 10, 2017 at 3:17 PM, Dan van der Ster
>> wrote:
>> > Hi all,
>> >
>> > With 10.2.8, ceph will
Hi,
Occasionally we want to change the scrub schedule for a pool or whole
cluster, but we want to do this by injecting new settings without
restarting every daemon.
I've noticed that in jewel, changes to scrub_min/max_interval and
deep_scrub_interval do not take immediate effect, presumably becau
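The kind of injection being discussed, as a hedged sketch; the interval values, in seconds, are illustrative:
ceph tell osd.* injectargs -- --osd_scrub_min_interval 86400 --osd_scrub_max_interval 604800 --osd_deep_scrub_interval 1209600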
log [INF]
>> : 21.1ae9 deep-scrub ok
>>
>>
>> each time I run it, its the same pg.
>>
>> Is there some reason its not scrubbing all the pgs?
>>
>> Aaron
>>
>> > On Jul 13, 2017, at 10:29 AM, Aaron Bassett
>> > wrote:
>&g
On Tue, Jul 18, 2017 at 6:08 AM, Marcus Furlong wrote:
> On 22 March 2017 at 05:51, Dan van der Ster wrote:
>> On Wed, Mar 22, 2017 at 8:24 AM, Marcus Furlong
>> wrote:
>>> Hi,
>>>
>>> I'm experiencing the same issue as outlined in this post:
&
On Fri, Jul 14, 2017 at 10:40 PM, Gregory Farnum wrote:
> On Fri, Jul 14, 2017 at 5:41 AM Dan van der Ster wrote:
>>
>> Hi,
>>
>> Occasionally we want to change the scrub schedule for a pool or whole
>> cluster, but we want to do this by injecting new settings w
ave a cluster running OSDs on
> 10.2.6 and some OSDs on 10.2.9? Or should we wait until all OSDs are on
> 10.2.9?
>
> Monitor nodes are already on 10.2.9.
>
> Best,
> Martin
>
> On Fri, Jul 14, 2017 at 1:16 PM, Dan van der Ster wrote:
>> On Mon, Jul 10, 2017 at 5:06 PM, S
Hi Wido,
Quick question about IPv6 clusters which you may have already noticed.
We have an IPv6 cluster and clients use this as the ceph.conf:
[global]
mon host = cephv6.cern.ch
cephv6 is an alias to our three mons, which are listening on their v6
addrs (ms bind ipv6 = true). But those mon hos
Hi All,
I don't seem to be able to fix a bucket which has become
inconsistent due to the use of the `inconsistent-index` flag 8).
My ceph-admin VM has 4GB of RAM, but that doesn't seem to be enough to run
`radosgw-admin bucket check --fix` on a bucket which holds 6M items, as the
radosgw-admin pro
Hi all,
We are trying to outsource the disk replacement process for our ceph
clusters to some non-expert sysadmins.
We could really use a tool that reports if a Ceph OSD *would* or
*would not* be safe to stop, e.g.
# ceph-osd-safe-to-stop osd.X
Yes it would be OK to stop osd.X
(which of course m
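A hedged sketch of such a wrapper, relying on the 'ceph osd ok-to-stop' command available in later releases (Luminous and up); the script name and messages are illustrative:
#!/bin/bash
# ceph-osd-safe-to-stop: report yes only if stopping the OSD keeps all PGs available
osd="$1"
if ceph osd ok-to-stop "$osd"; then
    echo "Yes it would be OK to stop $osd"
else
    echo "NOT safe to stop $osd right now"
fi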
ll req.
-- Dan
On Fri, Jul 28, 2017 at 9:39 PM, Alexandre Germain
wrote:
> Hello Dan,
>
> Something like this maybe?
>
> https://github.com/CanonicalLtd/ceph_safe_disk
>
> Cheers,
>
> Alex
>
> 2017-07-28 9:36 GMT-04:00 Dan van der Ster :
>>
>> H
complete, respectively. (the magic that made my reweight script
> efficient compared to the official reweight script)
>
> And I have not used such a method in the past... my cluster is small, so I
> have always just let recovery completely finish instead. I hope you find it
> usefu
On Thu, Aug 3, 2017 at 11:42 AM, Peter Maloney
wrote:
> On 08/03/17 11:05, Dan van der Ster wrote:
>
> On Fri, Jul 28, 2017 at 9:42 PM, Peter Maloney
> wrote:
>
> Hello Dan,
>
> Based on what I know and what people told me on IRC, this means basically the
> condition
Hi all,
One thing which has bothered me since the beginning of using ceph is that a
reboot of a single OSD causes a HEALTH_ERR state for the cluster for at
least a couple of seconds.
In the case of a planned reboot of an OSD node, should I run some extra
commands in order not to go to HEALTH_ERR state?
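A hedged sketch of the noout approach suggested in the reply below; the flag is removed once the node is back up:
ceph osd set noout      # prevent OSDs being marked out while the node reboots
# ... reboot the node, wait for its OSDs to rejoin ...
ceph osd unset noout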
cted?
On Thu, Aug 3, 2017 at 1:36 PM, linghucongsong
wrote:
>
>
> set the osd noout nodown
>
>
>
>
> At 2017-08-03 18:29:47, "Hans van den Bogert"
> wrote:
>
> Hi all,
>
> One thing which has bothered me since the beginning of using ceph is that a
>
Aug 3, 2017 at 1:55 PM, Hans van den Bogert
wrote:
> What are the implications of this? Because I can see a lot of blocked
> requests piling up when using 'noout' and 'nodown'. That probably makes
> sense though.
> Another thing, no when the OSDs come back onli
ious
> threads on this topic from the list I've found the ceph-gentle-reweight
> script
> (https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-gentle-reweight)
> created by Dan van der Ster (Thank you Dan for sharing the script with us!).
>
> I've done some e
0 each time which
> seemed to reduce the extra data movement we were seeing with smaller weight
> increases. Maybe something to try out next time?
>
> Bryan
>
> From: ceph-users on behalf of Dan van der
> Ster
> Date: Friday, August 4, 2017 at 1:59 AM
> To: L
Hi,
I also noticed this and finally tracked it down:
http://tracker.ceph.com/issues/20972
Cheers, Dan
On Mon, Jul 10, 2017 at 3:58 PM, Florent B wrote:
> Hi,
>
> Since 10.2.8 Jewel update, when ceph-fuse is mounting a file system, it
> returns 255 instead of 0 !
>
> $ mount /mnt/cephfs-drupal
Hi Thomas,
Yes we set it to a million.
From our puppet manifest:
# need to increase aio-max-nr to allow many bluestore devs
sysctl { 'fs.aio-max-nr': val => '1048576' }
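The same setting outside of puppet, as a hedged sketch (the sysctl.d file name is an assumption):
sysctl -w fs.aio-max-nr=1048576                                  # apply now
echo 'fs.aio-max-nr = 1048576' > /etc/sysctl.d/90-ceph-aio.conf  # persist across reboots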
Cheers, Dan
On Aug 30, 2017 9:53 AM, "Thomas Bennett" wrote:
>
> Hi,
>
> I've been testing out Lum
Hi,
I see the same with jewel on el7 -- it started in one of the recent
point releases, around 10.2.5, IIRC.
Problem seems to be the same -- daemon is started before the osd is
mounted... then the service waits several seconds before trying again.
Aug 31 15:41:47 ceph-osd: 2017-08-31 15:41:47.26766
├─ceph-osd@84.service
● │ ├─ceph-osd@89.service
● │ ├─ceph-osd@90.service
● │ ├─ceph-osd@91.service
● │ └─ceph-osd@92.service
● ├─getty.target
...
On Thu, Aug 31, 2017 at 4:57 PM, Dan van der Ster wrote:
> Hi,
>
> I see the same with jewel on el7 -- it started one of the recent
Hi Blair,
You can add/remove mons on the fly -- connected clients will learn
about all of the mons as the monmap changes and there won't be any
downtime as long as the quorum is maintained.
There is one catch when it comes to OpenStack, however.
Unfortunately, OpenStack persists the mon IP addres
On Wed, Sep 13, 2017 at 10:54 AM, Wido den Hollander wrote:
>
>> Op 13 september 2017 om 10:38 schreef Dan van der Ster :
>>
>>
>> Hi Blair,
>>
>> You can add/remove mons on the fly -- connected clients will learn
>> about all of the mons as the monm
On Wed, Sep 13, 2017 at 11:04 AM, Dan van der Ster wrote:
> On Wed, Sep 13, 2017 at 10:54 AM, Wido den Hollander wrote:
>>
>>> Op 13 september 2017 om 10:38 schreef Dan van der Ster
>>> :
>>>
>>>
>>> Hi Blair,
>>>
>>> You
Hi,
How big is your cluster and what is your use case?
For us, we'll likely never enable the recent tunables that need to
remap *all* PGs -- it would simply be too disruptive for marginal
benefit.
Cheers, Dan
On Thu, Sep 28, 2017 at 9:21 AM, mj wrote:
> Hi,
>
> We have completed the upgrade t
On Wed, Oct 4, 2017 at 9:08 AM, Piotr Dałek wrote:
> On 17-10-04 08:51 AM, lists wrote:
>>
>> Hi,
>>
>> Yesterday I chowned our /var/lib/ceph to ceph, to completely finalize our
>> jewel migration, and noticed something interesting.
>>
>> After I brought back up the OSDs I just chowned, the system ha
On Fri, Oct 6, 2017 at 6:56 PM, Alfredo Deza wrote:
> Hi,
>
> Now that ceph-volume is part of the Luminous release, we've been able
> to provide filestore support for LVM-based OSDs. We are making use of
> LVM's powerful mechanisms to store metadata which allows the process
> to no longer rely on
Hi,
I’m in the middle of debugging some incompatibilities with an upgrade of
Proxmox which uses Ceph. At this point I’d like to know what my current value
is for the min-compat-client setting, which would’ve been set by:
ceph osd set-require-min-compat-client …
AFAIK, there is no direct g
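One way to inspect the current value, as a hedged sketch (on Luminous the field shows up in the osdmap dump):
ceph osd dump | grep require_min_compat_client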
>
>> Op 13 oktober 2017 om 10:22 schreef Hans van den Bogert
>> :
>>
>>
>> Hi,
>>
>> I’m in the middle of debugging some incompatibilities with an upgrade of
>> Proxmox which uses Ceph. At this point I’d like to know what my current
>>
Hi All,
I've converted 2 nodes with 4 HDD/OSDs each from Filestore to Bluestore. I
expected somewhat higher memory usage/RSS values; however, I see, IMO, huge
memory usage for all OSDs on both nodes.
Small snippet from `top`
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+
C
ke HDDs and monitor the memory usage.
Thanks,
Hans
On Wed, Oct 18, 2017 at 11:56 AM, Wido den Hollander wrote:
>
> > Op 18 oktober 2017 om 11:41 schreef Hans van den Bogert <
> hansbog...@gmail.com>:
> >
> >
> > Hi All,
> >
> > I've c
> Memory usage is still quite high here even with a large onode cache!
> Are you using erasure coding? I recently was able to reproduce a bug in
> bluestore causing excessive memory usage during large writes with EC,
> but have not tracked down exactly what's going on yet.
>
> Mark
No, this is
My experience with RGW is that the actual freeing up of space is asynchronous to
an S3 client's command to delete an object. I.e., it might take a while
before it's actually freed up.
Can you redo your little experiment and simply wait for an hour to let the
garbage collector do its thing, or
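To check on or nudge the collector, a hedged sketch:
radosgw-admin gc list --include-all   # objects still queued for deletion
radosgw-admin gc process              # run a collection cycle now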
Hi All,
For Jewel there is this page about drive cache:
http://docs.ceph.com/docs/jewel/rados/configuration/filesystem-recommendations/#hard-drive-prep
For Bluestore I can't find any documentation or discussions about drive
write cache, while I can imagine that revisiting this subject might be
ne
Very interesting.
I've been toying around with Rook.io [1]. Did you know of this project, and
if so can you tell if ceph-helm and Rook.io have similar goals?
Regards,
Hans
[1] https://rook.io/
On 25 Oct 2017 21:09, "Sage Weil" wrote:
> There is a new repo under the ceph org, ceph-helm, which
> On Nov 1, 2017, at 4:45 PM, David Turner wrote:
>
> All it takes for data loss is that an osd on server 1 is marked down and a
> write happens to an osd on server 2. Now the osd on server 2 goes down
> before the osd on server 1 has finished backfilling and the first osd
> receives a reque
Never mind, I should’ve read the whole thread first.
> On Nov 2, 2017, at 10:50 AM, Hans van den Bogert wrote:
>
>
>> On Nov 1, 2017, at 4:45 PM, David Turner > <mailto:drakonst...@gmail.com>> wrote:
>>
>> All it takes for data loss is that an osd on
Hi all,
During our upgrade from Jewel to Luminous I saw the following behaviour, if
my memory serves me right:
When upgrading, for example, monitors and OSDs, we saw that the `ceph
versions` command correctly showed at one point that some OSDs were still on
Jewel (10.2.x) and some were already upgraded a
Just to get this really straight, Jewel OSDs do send this metadata?
Otherwise I'm probably mistaken that I ever saw 10.2.x versions in the
output.
Thanks,
Hans
On 2 Nov 2017 12:31 PM, "John Spray" wrote:
> On Thu, Nov 2, 2017 at 11:16 AM, Hans van den Bogert
&g
On Tue, Nov 7, 2017 at 12:57 PM, John Spray wrote:
> On Sun, Nov 5, 2017 at 4:19 PM, Brady Deetz wrote:
>> My organization has a production cluster primarily used for cephfs upgraded
>> from jewel to luminous. We would very much like to have snapshots on that
>> filesystem, but understand that t
On Tue, Nov 7, 2017 at 4:15 PM, John Spray wrote:
> On Tue, Nov 7, 2017 at 3:01 PM, Dan van der Ster wrote:
>> On Tue, Nov 7, 2017 at 12:57 PM, John Spray wrote:
>>> On Sun, Nov 5, 2017 at 4:19 PM, Brady Deetz wrote:
>>>> My organization has a production clu
Are you sure you deployed it with the client.radosgw.gateway name as
well? Try to redeploy the RGW and make sure the name you give it
corresponds to the name you give in the ceph.conf. Also, do not forget
to push the ceph.conf to the RGW machine.
On Wed, Nov 8, 2017 at 11:44 PM, Sam Huracan wrote
config show | grep log_file
> "log_file": "/var/log/ceph/ceph-client.rgw.radosgw.log",
>
>
> [root@radosgw system]# cat /etc/ceph/ceph.client.radosgw.keyring
> [client.radosgw.gateway]
> key = AQCsywNaqQdDHxAAC24O8CJ0A9Gn6qeiPalEYg==
> caps mon = "all
Hi,
Can you show the contents of the file, /etc/yum.repos.d/ceph.repo ?
Regards,
Hans
> On Nov 15, 2017, at 10:27 AM, Ragan, Tj (Dr.)
> wrote:
>
> Hi All,
>
> I feel like I’m doing something silly. I’m spinning up a new cluster, and
> followed the instructions on the pre-flight and quick s