First of all, your disk removal process needs tuning. "ceph osd out" sets the
disk's reweight to 0 but NOT its crush weight; that is why you're seeing
misplaced objects after removing the OSD: the crush weights have changed (even
though the reweight meant the disk held no data at the time). Use
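The post is cut off here, but the sequence being recommended is roughly the
following (a sketch; osd.N is a placeholder, and on pre-Luminous releases the
final purge is replaced by separate crush remove / auth del / osd rm steps):

    ceph osd crush reweight osd.N 0    # drain the disk's data first
    # wait for rebalancing to complete (watch ceph -s)
    ceph osd out N
    systemctl stop ceph-osd@N
    ceph osd purge N --yes-i-really-mean-it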
On 11/07/17 17:08, Roger Brown wrote:
> What are some options for migrating from Apache/FastCGI to Civetweb for
> RadosGW object gateway *without* breaking other websites on the domain?
>
> I found documention on how to migrate the object gateway to Civetweb
> (http://docs.ceph.com/docs/luminous
Best guess, apache is munging together everything it picks up using the aliases
and translating the host to the ServerName before passing on the request. Try
setting ProxyPreserveHost on as per
https://httpd.apache.org/docs/2.4/mod/mod_proxy.html#proxypreservehost ?
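For illustration, the kind of vhost I mean looks roughly like this (a sketch;
the ServerName and the civetweb port 7480 on localhost are assumptions, adjust
to your setup):

    <VirtualHost *:80>
        ServerName rgw.example.com
        ProxyPreserveHost On
        ProxyPass / http://127.0.0.1:7480/
        ProxyPassReverse / http://127.0.0.1:7480/
    </VirtualHost>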
Rich
On 11/07/17 21:47, Rog
solve your
problem.
Rich
On 12/07/17 10:40, Richard Hesketh wrote:
> Best guess, apache is munging together everything it picks up using the
> aliases and translating the host to the ServerName before passing on the
> request. Try setting ProxyPreserveHost on as per
> https://http
On 11/07/17 20:05, Eino Tuominen wrote:
> Hi Richard,
>
> Thanks for the explanation, that makes perfect sense. I've missed the
> difference between ceph osd reweight and ceph osd crush reweight. I have to
> study that better.
>
> Is there a way to get ceph to prioritise fixing degraded objects
On 14/07/17 11:03, Ilya Dryomov wrote:
> On Fri, Jul 14, 2017 at 11:29 AM, Riccardo Murri
> wrote:
>> Hello,
>>
>> I am trying to install a test CephFS "Luminous" system on Ubuntu 16.04.
>>
>> Everything looks fine, but the `mount.ceph` command fails (error 110,
>> timeout);
>> kernel logs show a
Correct me if I'm wrong, but I understand rbd-nbd is a userland client for
mapping RBDs to local block devices (like "rbd map" in the kernel client), not
a client for mounting the CephFS filesystem, which is what Riccardo is using?
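To illustrate the difference (a sketch; the image and mountpoint names are
placeholders):

    # rbd-nbd exposes an RBD image as a local block device
    rbd-nbd map rbd/myimage            # -> /dev/nbd0
    # CephFS is mounted as a filesystem, via the kernel client...
    mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
    # ...or via the userspace client
    ceph-fuse /mnt/cephfs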
Rich
On 17/07/17 12:48, Massimiliano Cuttini wrote:
> Hi Riccard
On 19/07/17 15:14, Laszlo Budai wrote:
> Hi David,
>
> Thank you for that reference about CRUSH. It's a nice one.
> There I could read about expanding the cluster, but in one of my cases we
> want to do more: we want to move from host failure domain to chassis failure
> domain. Our concern is: h
> the impact of the recovery/refilling operation on your clients' data traffic?
> What setting have you used to avoid slow requests?
>
> Kind regards,
> Laszlo
>
>
> On 19.07.2017 17:40, Richard Hesketh wrote:
>> On 19/07/17 15:14, Laszlo Budai wrote:
>>
On 31/07/17 14:05, Edward R Huyer wrote:
> I’m migrating my Ceph cluster to entirely new hardware. Part of that is
> replacing the monitors. My plan is to add new monitors and remove old ones,
> updating config files on client machines as I go.
>
> I have clients actively using the cluster. T
On 01/08/17 12:41, Osama Hasebou wrote:
> Hi,
>
> What would be the best possible and efficient way for big Ceph clusters when
> maintenance needs to be performed ?
>
> Lets say that we have 3 copies of data, and one of the servers needs to be
> maintained, and maintenance might take 1-2 days d
r to Bluestore now and I've just been
reusing the partitions I set up for journals on my SSDs as DB devices for
Bluestore HDDs without specifying anything to do with the WAL, and I'd like to
know sooner rather than later if I'm making some sort of horrible mi
On 08/09/17 11:44, Richard Hesketh wrote:
> Hi,
>
> Reading the ceph-users list I'm obviously seeing a lot of people talking
> about using bluestore now that Luminous has been released. I note that many
> users seem to be under the impression that they need separate bloc
nity/new-luminous-bluestore/
>
> http://ceph.com/planet/understanding-bluestore-cephs-new-storage-backend/
>
> On Mon, Sep 11, 2017 at 8:45 PM, Richard Hesketh
> wrote:
>> On 08/09/17 11:44, Richard Hesketh wrote:
>>> Hi,
>>>
>>> Reading the ceph-users l
re recovery performance on the same version of ceph with the same
> sleep settings?
>
> Mark
>
> On 09/12/2017 05:24 AM, Richard Hesketh wrote:
>> Thanks for the links. That does seem to largely confirm that I haven't
>> horribly misunderstood anything a
/09/17 11:16, Richard Hesketh wrote:
> Hi Mark,
>
> No, I wasn't familiar with that work. I am in fact comparing speed of
> recovery to maintenance work I did while the cluster was in Jewel; I haven't
> manually done anything to sleep settings, only adjusted max backf
there's been no client
> activity for X seconds. Theoretically more advanced heuristics might cover
> this, but in the interim it seems to me like this would solve the very
> specific problem you are seeing while still throttling recovery when IO is
> happening.
>
> Mark
>
>
I asked the same question a couple of weeks ago. No response I got
contradicted the documentation but nobody actively confirmed the
documentation was correct on this subject, either; my end state was that
I was relatively confident I wasn't making some horrible mistake by
simply specifying a bi
As the subject says... any ceph fs administrative command I try to run hangs
forever and kills monitors in the background - sometimes they come back, on a
couple of occasions I had to manually stop/restart a suffering mon. Trying to
load the filesystem tab in the ceph-mgr dashboard dumps an erro
On 27/09/17 12:32, John Spray wrote:
> On Wed, Sep 27, 2017 at 12:15 PM, Richard Hesketh
> wrote:
>> As the subject says... any ceph fs administrative command I try to run hangs
>> forever and kills monitors in the background - sometimes they come back, on
>> a coup
On 27/09/17 19:35, John Spray wrote:
> On Wed, Sep 27, 2017 at 1:18 PM, Richard Hesketh
> wrote:
>> On 27/09/17 12:32, John Spray wrote:
>>> On Wed, Sep 27, 2017 at 12:15 PM, Richard Hesketh
>>> wrote:
>>>> As the subject says... any ceph fs admin
When I try to run the command "ceph osd status" on my cluster, I just get an
error. Luckily unlike the last issue I had with ceph fs commands it doesn't
seem to be crashing any of the daemons.
root@vm-ds-01:/var/log/ceph# ceph osd status
Error EINVAL: Traceback (most recent call last):
File "/
On 12/10/17 17:15, Josy wrote:
> Hello,
>
> After taking down couple of OSDs, the dashboard is not showing the
> corresponding hostname.
Ceph-mgr is known to have issues with associating services with hostnames
sometimes, e.g. http://tracker.ceph.com/issues/20887
Fixes look to be incoming.
Rich
On 16/10/17 03:40, Alex Gorbachev wrote:
> On Sat, Oct 14, 2017 at 12:25 PM, Oscar Segarra
> wrote:
>> Hi,
>>
>> In my VDI environment I have configured the suggested ceph
>> design/arquitecture:
>>
>> http://docs.ceph.com/docs/giant/rbd/rbd-snapshot/
>>
>> Where I have a Base Image + Protected S
On 16/10/17 13:45, Wido den Hollander wrote:
>> Op 26 september 2017 om 16:39 schreef Mark Nelson :
>> On 09/26/2017 01:10 AM, Dietmar Rieder wrote:
>>> thanks David,
>>>
>>> that's confirming what I was assuming. To bad that there is no
>>> estimate/method to calculate the db partition size.
>>
>>
On 19/10/17 11:00, Dennis Benndorf wrote:
> Hello @all,
>
> givin the following config:
>
> * ceph.conf:
>
> ...
> mon osd down out subtree limit = host
> osd_pool_default_size = 3
> osd_pool_default_min_size = 2
> ...
>
> * each OSD has its j
On 07/11/17 13:16, Gandalf Corvotempesta wrote:
> Hi to all
> I've been far from ceph from a couple of years (CephFS was still unstable)
>
> I would like to test it again, some questions for a production cluster for
> VMs hosting:
>
> 1. Is CephFS stable?
Yes, CephFS is stable and safe (though
>> >>> on a fast device (ssd,nvme) when using filestore backend.
>> >>> it's not when it comes to bluestore - are there any resources,
>> >>> performance test, etc. out there how a fast wal,db device impacts
>> >>> performance?
>>
On 15/11/17 12:58, Micha Krause wrote:
> Hi,
>
> I've build a few clusters with separated public/cluster network, but I'm
> wondering if this is really
> the way to go.
>
> http://docs.ceph.com/docs/jewel/rados/configuration/network-config-ref
>
> states 2 reasons:
>
> 1. There is more traffic
Has anyone tried just dd-ing a block.db partition from one device to another
and updating the symlink in OSD metadata partition? Ceph doesn't have commands
that support you moving these partitions from one device to another, but I
don't see a technical reason why manually copying these things sh
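The thought is cut off above, but roughly what I have in mind is this (purely
an untested sketch - the OSD must be stopped first, the target partition must
be at least as large as the source, and the device names are placeholders):

    systemctl stop ceph-osd@N
    dd if=/dev/old-db-partition of=/dev/new-db-partition bs=1M
    ln -sf /dev/new-db-partition /var/lib/ceph/osd/ceph-N/block.db
    systemctl start ceph-osd@N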
On 23/11/17 16:13, Rudi Ahlers wrote:
> Hi Caspar,
>
> Thanx. I don't see any mention that it's a bad idea to have the WAL and DB on
> the same SSD, but I guess it could improve performance?
It's not that it's a bad idea to put WAL and DB on the same device - it's that
if not otherwise specifi
On 23/11/17 17:19, meike.talb...@women-at-work.org wrote:
> Hello,
>
> in our preset Ceph cluster we used to have 12 HDD OSDs per host.
> All OSDs shared a common SSD for journaling.
> The SSD was used as root device and the 12 journals were files in the
> /usr/share directory, like this:
>
> OS
On 05/12/17 09:20, Ronny Aasen wrote:
> On 05. des. 2017 00:14, Karun Josy wrote:
>> Thank you for detailed explanation!
>>
>> Got one another doubt,
>>
>> This is the total space available in the cluster :
>>
>> TOTAL : 23490G
>> Use : 10170G
>> Avail : 13320G
>>
>>
>> But ecpool shows max avail
On 05/12/17 17:10, Graham Allan wrote:
> On 12/05/2017 07:20 AM, Wido den Hollander wrote:
>> Hi,
>>
>> I haven't tried this before but I expect it to work, but I wanted to
>> check before proceeding.
>>
>> I have a Ceph cluster which is running with manually formatted
>> FileStore XFS disks, Jewel
You are safe to upgrade packages just by doing an apt-get update; apt-get
upgrade, and you will then want to restart your ceph daemons to bring them to
the new version - though you should of course stagger your restarts of each
type to ensure your mons remain quorate (don't restart more than hal
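For illustration, the per-node procedure looks roughly like this (a sketch
assuming systemd-managed daemons; do one node at a time and check health
between nodes):

    apt-get update && apt-get upgrade
    systemctl restart ceph-mon.target   # mon nodes only, one node at a time
    systemctl restart ceph-osd.target   # then that node's OSDs
    ceph -s                             # confirm quorum/health before the next node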
On 06/12/17 09:17, Caspar Smit wrote:
>
> 2017-12-05 18:39 GMT+01:00 Richard Hesketh <mailto:richard.hesk...@rd.bbc.co.uk>>:
>
> On 05/12/17 17:10, Graham Allan wrote:
> > On 12/05/2017 07:20 AM, Wido den Hollander wrote:
> >> Hi,
> >&g
On 21/12/17 10:21, Konstantin Shalygin wrote:
>> Is this the correct way to removes OSDs, or am I doing something wrong ?
> Generic way for maintenance (e.g. disk replace) is rebalance by change osd
> weight:
>
>
> ceph osd crush reweight osdid 0
>
> cluster migrate data "from this osd"
>
>
>
On 21/12/17 10:28, Burkhard Linke wrote:
> OSD config section from ceph.conf:
>
> [osd]
> osd_scrub_sleep = 0.05
> osd_journal_size = 10240
> osd_scrub_chunk_min = 1
> osd_scrub_chunk_max = 1
> max_pg_per_osd_hard_ratio = 4.0
> osd_max_pg_per_osd_hard_ratio = 4.0
> bluestore_cache_size_hdd = 53687
On 02/01/18 02:36, linghucongsong wrote:
> Hi, all!
>
> I just use ceph rbd for openstack.
>
> my ceph version is 10.2.7.
>
> I find a surprise thing that the object save in the osd , in some pgs the
> objects are 8M, and in some pgs the objects are 4M, can someone tell me why?
> thanks!
>
>
No, most filesystems can be expanded pretty trivially (shrinking is a more
complex operation but usually also doable). Assuming the likely case of an
ext2/3/4 filesystem, the command "resize2fs /dev/rbd0" should resize the FS to
cover the available space in the block device.
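For example, growing an RBD-backed ext4 filesystem (a sketch; the image name
and new size are placeholders):

    rbd resize rbd/myimage --size 200G
    resize2fs /dev/rbd0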
Rich
On 03/01/18 1
Whoops meant to reply to-list
Forwarded Message
Subject: Re: [ceph-users] Different Ceph versions on OSD/MONs and Clients?
Date: Fri, 5 Jan 2018 15:10:56 +
From: Richard Hesketh
To: Götz Reinicke
On 05/01/18 15:03, Götz Reinicke wrote:
> Hi,
>
> our OSD and MO
I recently came across the bluestore_prefer_deferred_size family of config
options, for controlling the upper size threshold on deferred writes. Given a
number of users suggesting that write performance in filestore is better than
write performance in bluestore - because filestore writing to an
On 02/02/18 08:33, Kevin Olbrich wrote:
> Hi!
>
> I am planning a new Flash-based cluster. In the past we used SAMSUNG PM863a
> 480G as journal drives in our HDD cluster.
> After a lot of tests with luminous and bluestore on HDD clusters, we plan to
> re-deploy our whole RBD pool (OpenNebula clo
Waiting for rebalancing is considered the safest way, since it ensures
you retain your normal full number of replicas at all times. If you take
the disk out before rebalancing is complete, you will be causing some
PGs to lose a replica. That is a risk to your data redundancy, but it
might be an acc
r copies to the
> replaced drive, and doesn’t move data around.
> That script is specific for filestore to bluestore somewhat, as the
> flush-journal command is no longer used in bluestore.
>
> Hope thats helpful.
>
> Reed
>
>> On Aug 6, 2018, at 9:30 AM, Richard
n 07/08/18 17:10, Robert Stanford wrote:
>
> I was surprised to see an email on this list a couple of days ago,
> which said that write performance would actually fall with BlueStore. I
> thought the reason BlueStore existed was to increase performance.
> Nevertheless, it seems like filestore i
It can get confusing.
There will always be a WAL, and there will always be a metadata DB, for
a bluestore OSD. However, if a separate device is not specified for the
WAL, it is kept in the same device/partition as the DB; in the same way,
if a separate device is not specified for the DB, it is kep
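If you want to confirm where a given OSD's DB and WAL actually ended up,
something like this should show it (a sketch; key names such as
bluefs_db_partition_path may vary slightly by version):

    ceph osd metadata 0 | grep bluefs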
I am also curious about this, in light of the reported performance regression
switching from Filestore to Bluestore (when using SSDs for journalling/metadata
db). I didn't get any responses when I asked, though. The major consideration
that seems obvious is that this potentially hugely increases
On 29/03/18 09:25, ST Wong (ITSC) wrote:
> Hi all,
>
> We put 8 (4+4) OSD and 5 (2+3) MON servers in server rooms in 2 buildings for
> redundancy. The buildings are connected through direct connection.
>
> While servers in each building have alternate uplinks. What will happen in
> case the
No, you shouldn't invoke it that way; just don't specify a WAL device at all if
you want it to be stored with the DB - if not otherwise specified, the WAL is
automatically stored with the other metadata on the DB device. You should do
something like:
ceph-volume lvm prepare --bluestore
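(The command is cut off in the archive; for illustration only, the full
invocation would take roughly this shape - the device paths below are
placeholders, not from the original post:)

    ceph-volume lvm prepare --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1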
On 16/04/18 18:32, Shain Miley wrote:
> Hello,
>
> We are currently running Ceph Jewel (10.2.10) on Ubuntu 14.04 in production.
> We have been running into a kernel panic bug off an on for a while and I am
> starting to look into upgrading as a possible solution. We are currently
> running ve
ound all the OSD data just fine and everything did
> startup in a good state post OS reinstall.
>
> Thanks again for your help on this issue.
>
> Shain
>
>
> On 04/17/2018 06:00 AM, Richard Hesketh wrote:
>> On 16/04/18 18:32, Shain Miley wrote:
>
On 05/06/18 14:49, rafael.diazmau...@univ-rennes1.fr wrote:
> Hello,
>
> I run proxmox 5.2 with ceph 12.2 (bluestore).
>
> I've created an OSD on a Hard Drive (/dev/sda) and tried to put both WAL and
> Journal on a SSD part (/dev/sde1) like this :
> pveceph createosd /dev/sda --wal_dev /dev/sde1
On 16/02/17 20:44, girish kenkere wrote:
> Thanks David,
>
> Its not quiet what i was looking for. Let me explain my question in more
> detail -
>
> This is excerpt from Crush paper, this explains how crush algo running on
> each client/osd maps pg to an osd during the write operation[lets assu
On 21/03/17 17:48, Wes Dillingham wrote:
> a min_size of 1 is dangerous though because it means you are 1 hard disk
> failure away from losing the objects within that placement group entirely. a
> min_size of 2 is generally considered the minimum you want but many people
> ignore that advice, so
I definitely saw it on a Hammer cluster, though I decided to check my IRC logs
for more context and found that in my specific cases it was due to PGs going
incomplete. `ceph health detail` offered the following, for instance:
pg 8.31f is remapped+incomplete, acting [39] (reducing pool one min_si
On 12/04/17 09:47, Siniša Denić wrote:
> Hi to all, my cluster stuck after upgrade from hammer 0.94.5 to luminous.
> Iit seems somehow osds stuck at hammer version despite
>
> Can I somehow overcome this situation and what could happened during the
> upgrade?
> I performed upgrade from hammer by
splitting the HDDs and SSDs into separate pools, and
just using the SSD pool for VMs/datablocks which needed to be snappier. For
most of my users it didn't matter that the backing pool was kind of slow, and
only a few were wanting to do I/O intensive workloads where the speed was
required, so p
rimary, so I think this is a reliable way of doing it.
You would of course end up with an acting primary on one of the slow spinners
for a brief period if you lost an SSD for whatever reason and it needed to
rebalance.
The only downside is that if you have your SSD and H
The extra pools are probably the data and metadata pools that are
automatically created for cephfs.
http://ceph.com/pgcalc/ is a useful tool for helping to work out how
many PGs your pools should have.
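For reference, the rough rule of thumb pgcalc encodes (a sketch assuming a
target of ~100 PGs per OSD):

    total PGs ≈ (number of OSDs × 100) / replica count, rounded up to a power of two
    e.g. 15 OSDs at size 3: 15 × 100 / 3 = 500 → 512, split across your pools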
Rich
On 04/05/17 15:41, David Turner wrote:
> I'm guessing you have more than just the 1 pool
Is there a way, either by individual PG or by OSD, that I can prioritise
backfill/recovery on a set of PGs which are currently particularly important
to me?
For context, I am replacing disks in a 5-node Jewel cluster, on a node-by-node
basis - mark out the OSDs on a node, wait for them to clear, rep
Bionic's mimic packages do seem to depend on libcurl4 already, for what
that's worth:
root@vm-gw-1:/# apt-cache depends ceph-common
ceph-common
...
Depends: libcurl4
On 22/11/2018 12:40, Matthew Vernon wrote:
> Hi,
>
> The ceph.com ceph luminous packages for Ubuntu Bionic still depend on
> li
Another option would be adding a boot time script which uses ntpdate (or
something) to force an immediate sync with your timeservers before ntpd
starts - this is actually suggested in ntpdate's man page!
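A minimal sketch of what I mean on a systemd host (the unit name, ntpdate path
and pool.ntp.org upstream are assumptions - point it at your own timeservers):

    # /etc/systemd/system/ntpdate-once.service
    [Unit]
    Description=One-shot clock step before ntpd starts
    Before=ntp.service

    [Service]
    Type=oneshot
    ExecStart=/usr/sbin/ntpdate -b pool.ntp.org

    [Install]
    WantedBy=multi-user.target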
Rich
On 15/05/2019 13:00, Marco Stuurman wrote:
> Hi Yenya,
>
> You could try to synchronize