On 2019-02-20 17:38, Mykola Golub wrote:
Note, even if rbd supported live (without any downtime) migration, you
would still need to restart the client after the upgrade to a new
librbd with migration support.
You could probably get away with executing the client with a new librbd
version by li
Are there any means to notify the administrator that an auto-repair has
taken place?
-K.
On 2018-11-15 20:45, Mark Schouten wrote:
As a user, I’m very surprised that this isn’t a default setting.
Mark Schouten
On 15 Nov. 2018 at 18:40, Wido den Hollander wrote the
following:
Hi
The scenario is actually a bit different, see:
Let's assume size=2, min_size=1
-We are looking at pg "A" acting [1, 2]
-osd 1 goes down
-osd 2 accepts a write for pg "A"
-osd 2 goes down
-osd 1 comes back up, while osd 2 still down
-osd 1 has no way to know osd 2 accepted a write in pg "A"
-osd 1
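The usual remedy, as a rough sketch (the pool name "rbd" below is only
illustrative), is to never let a lone copy accept writes:

  ceph osd pool set rbd size 3
  ceph osd pool set rbd min_size 2   # a single surviving replica blocks I/O instead of diverging

With min_size=2 the write in the scenario above would have blocked while
osd 1 was down, instead of creating data that osd 1 can never learn about.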
On 2017-10-25 11:21, Wido den Hollander wrote:
>
>> On 25 October 2017 at 5:58, Christian Sarrasin wrote:
>>
>> The one thing I'm still wondering about is failure domains. With
>> Filestore and SSD-backed journals, an SSD failure would kill writes but
>> OSDs were otherwise still whole. Re
Is your min_size at least 2? Is it just one OSD affected?
If yes, and if it is only the journal that is corrupt but the actual OSD
store is intact (although now lagging behind in writes), and you do have
healthy copies of its PGs elsewhere (hence the min_size requirement), you
could resolve this situ
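For the archives, that procedure usually looks roughly like the
following sketch (osd.12 and the paths are made up, adjust to the
affected OSD; this is not a verbatim recipe):

  ceph osd set noout
  systemctl stop ceph-osd@12       # if the OSD is not already down
  # repoint the 'journal' symlink in /var/lib/ceph/osd/ceph-12 at a new partition, then:
  ceph-osd -i 12 --mkjournal       # create a fresh, empty journal
  systemctl start ceph-osd@12
  ceph osd unset noout

Whatever only ever lived in the dead journal is lost, which is exactly
why the healthy copies elsewhere matter: recovery brings osd.12 back up
to date from its peers.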
On 2017-06-02 14:07, Peter Maloney wrote:
> On 06/02/17 12:25, koukou73gr wrote:
>> On 2017-06-02 13:01, Peter Maloney wrote:
>>>> Is it easy for you to reproduce it? I had the same problem, and the same
>>>> solution. But it isn't easy to reproduce... Ja
On 2017-06-02 13:22, Peter Maloney wrote:
> On 06/02/17 12:06, koukou73gr wrote:
>> Thanks for the reply.
>>
>> Easy?
>> Sure, it happens reliably every time I boot the guest with
>> exclusive-lock on :)
> If it's that easy, also try with only exclusive
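For anyone wanting to run that test, toggling image features goes
roughly like this (the image name rbd/server is made up; note the
dependency order fast-diff -> object-map -> exclusive-lock):

  rbd feature disable rbd/server fast-diff
  rbd feature disable rbd/server object-map
  rbd feature disable rbd/server exclusive-lock
  # re-enable later in the reverse order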
On 2017-06-02 13:01, Peter Maloney wrote:
>> Is it easy for you to reproduce it? I had the same problem, and the same
>> solution. But it isn't easy to reproduce... Jason Dillaman asked me for
>> a gcore dump of a hung process but I wasn't able to get one. Can you do
>> that, and when you reply, C
Thanks for the reply.
Easy?
Sure, it happens reliably every time I boot the guest with
exclusive-lock on :)
I'll need some walkthrough on the gcore part though!
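My guess from the man page would be something along these lines
(assuming the guest process is called qemu-system-x86_64; correct me if
I'm wrong):

  # dump a core of the hung process without killing it
  gcore -o /tmp/qemu-hung $(pidof qemu-system-x86_64)
  # leaves /tmp/qemu-hung.<pid> behind for gdb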
-K.
On 2017-06-02 12:59, Peter Maloney wrote:
> On 06/01/17 17:12, koukou73gr wrote:
>> Hello list,
>>
>>
Hello list,
Today I had to create a new image for a VM. This was the first time
since our cluster was updated from Hammer to Jewel. So far I had just
been copying an existing golden image and resizing it as appropriate.
But this time I used rbd create.
So I "rbd create"d a 2T image and attached it to
On 2017-02-13 13:47, Wido den Hollander wrote:
>
> The udev rules of Ceph should chown the journal to ceph:ceph if it's set to
> the right partition UUID.
>
> This blog shows it partially:
> http://ceph.com/planet/ceph-recover-osds-after-ssd-journal-failure/
>
> This is done by *95-ceph-osd.r
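For reference, the partition type GUID those udev rules key on can be
set with sgdisk; a sketch (the device and partition number are made up,
and double-check the GUID against the rules file on your version):

  # tag sdb1 as a Ceph journal so 95-ceph-osd.rules chowns it to ceph:ceph
  sgdisk --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
  partprobe /dev/sdb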
On 2017-02-07 10:11, Tracy Reed wrote:
> Weird. The VMs that were hung in interruptible wait state have now
> disappeared. No idea why.
Have you tried the same procedure but with local storage instead?
-K.
Same here.
Warnings appeared for OSDs running the .6 version each time one of the
rest was restarted to the .7 version.
When the last .6 OSD host was upgraded, there were no more warnings
from the rest.
Cluster seems happy :)
-K.
On 05/17/2016 11:04 AM, Dan van der Ster wrote:
> Hi Sage et a
If you have empty drive slots in your OSD hosts, I'd be tempted to
insert a new drive in a slot, set noout, shut down the one OSD, unmount
the OSD directory, dd the old drive to the new one, remove the old
drive, and restart the OSD.
No rebalancing and minimal data movement when the OSD rejoins.
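Roughly, with made-up device names and osd id:

  ceph osd set noout
  service ceph stop osd.7        # or systemctl stop ceph-osd@7 on systemd hosts
  umount /var/lib/ceph/osd/ceph-7
  dd if=/dev/sdc of=/dev/sdh bs=4M conv=noerror,sync
  # swap the drives, mount the clone back at /var/lib/ceph/osd/ceph-7, then:
  service ceph start osd.7
  ceph osd unset noout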
-K.
On 04/14/2016 04:29 P
Space on the hosts in rack2 does not add up to cover the space in rack1.
After enough data is written to the cluster, all PGs on rack2 would be
allocated and the cluster won't be able to find a free PG to map new
data to for the 3rd replica.
Bottom line: spread your big disks across all 4 hosts, or add some mo
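The imbalance is easy to see from the CLI, e.g.:

  ceph osd tree   # per-rack/host weights
  ceph df         # MAX AVAIL for the pool already reflects the CRUSH constraints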
What is your pool size? 304 PGs sound awfully small for 20 OSDs.
More PGs will help distribute full PGs better.
But with a full or near-full OSD on your hands, increasing PGs is a no-no
operation. If you search the list archive, I believe there was a
thread last month or so which provided a walkthroug
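Not that walkthrough, but the usual first steps are to look at the
distribution and nudge data off the overfull OSD (osd.3 and the 0.9
factor below are made up):

  ceph osd df              # per-OSD utilisation and PG counts
  ceph osd reweight 3 0.9  # temporarily push some PGs off the overfull osd.3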
Are you running with the default failure domain of 'host'?
If so, with a pool size of 3 and your 20 OSDs physically on only 2 hosts,
Ceph is unable to find a 3rd host to map the 3rd replica to.
Either add a host and move some OSDs there or reduce pool size to 2.
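For the latter, as a sketch (assuming the pool is named "rbd"):

  ceph osd pool set rbd size 2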
-K.
On 03/23/2016 02:17 PM, Zhang Qia
You should have settled on the nearest power of 2, which for 666 is
512. Since you created the cluster and IIRC it is a testbed, you may as
well recreate it; however, it will be less of a hassle to just
increase the PGs to the next power of two: 1024.
Your 20 OSDs appear to be equal sized in your c
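As a sketch, assuming the pool is named "rbd":

  ceph osd pool set rbd pg_num 1024
  ceph osd pool set rbd pgp_num 1024   # without bumping pgp_num too, no data actually moves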
Have you tried restarting osd.0 ?
-K.
On 02/14/2016 09:56 PM, Mario Giammarco wrote:
> Hello,
> I am using ceph hammer under proxmox.
> I have a working cluster; I have been using it for several months.
> For reasons yet to be discovered, I am now in this situation:
>
> HEALTH_WARN 4 pgs incomplete; 4 pgs st
On 01/28/2016 03:44 PM, Simon Ironside wrote:
> Btw, using virtio-scsi devices as above and discard='unmap' above
> enables TRIM support. This means you can use fstrim or mount file
> systems with discard inside the VM to free up unused space in the image.
Doesn't discard require the pc-q35-rhel7
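Either way, once the disk is attached with discard='unmap', checking
from inside the guest is straightforward (device name is illustrative):

  lsblk --discard /dev/sda   # non-zero DISC-GRAN/DISC-MAX means discard is supported
  fstrim -v /                # or mount the filesystem with -o discard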
Even the cheapest stuff nowadays has some more or less decent wear
leveling algorithm built into their controller so this won't be a
problem. Wear leveling algorithms cycle the blocks internally so wear
evens out on the whole disk.
-K.
On 12/22/2015 06:57 PM, Alan Johnson wrote:
> I would also ad
On 06/08/2015 11:54 AM, Jan Schermer wrote:
>
> This should indicate the real wear:
>   100 Gigabytes_Erased       0x0032  000  000  000  Old_age  Always  -  62936
> Bytes written after compression:
>   233 SandForce_Internal     0x      000  000  000  O
On 05/21/2015 02:36 PM, Brad Hubbard wrote:
If that's correct then starting from there and building a new RPM
with RBD support is the proper way of updating. Correct?
I guess there are two ways to approach this.
1. Use the existing ceph source rpm here:
http://ceph.com/packages/ceph-extras/
Hello,
Can't really help you with nova, but using plain libvirt-1.1.1
and qemu-1.5.3, live migration of rbd-backed VMs is (almost*) instant on
the client side. We have the rbd write-back cache enabled everywhere and
have no problems at all.
-K.
*There is about a 1-2 second hitch at wors
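For completeness, the write-back cache bit is just a couple of
client-side ceph.conf lines; the exact values below are a sketch, not a
copy of our config:

  [client]
  rbd cache = true
  rbd cache writethrough until flush = true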
On 03/31/2015 09:23 PM, Sage Weil wrote:
It's nothing specific to peering (or ceph). The symptom we've seen is
just that bytes stop passing across a TCP connection, usually when there are
some largish messages being sent. The ping/heartbeat messages get through
because they are small and we disa
On 03/05/2015 07:19 PM, Josh Durgin wrote:
client.libvirt
key:
caps: [mon] allow r
caps: [osd] allow class-read object_prefix rbd_children, allow rw
class-read pool=rbd
This includes everything except class-write on the pool you're using.
You'll need that so that a copy_up c
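In other words, something along these lines should do it (assuming the
parent image lives in the "rbd" pool):

  ceph auth caps client.libvirt mon 'allow r' \
      osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd'

The rwx is what grants the class-write that copy_up needs.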
On 03/05/2015 03:40 AM, Josh Durgin wrote:
It looks like your libvirt rados user doesn't have access to whatever
pool the parent image is in:
librbd::AioRequest: write 0x7f1ec6ad6960 rbd_data.24413d1b58ba.0186 1523712~4096 should_complete: r = -1
-1 is EPERM, for operation not
Hi Josh,
Thanks for taking a look at this. I'm answering your questions inline.
On 03/04/2015 10:01 PM, Josh Durgin wrote:
[...]
And then proceeded to create a qemu-kvm guest with rbd/server as its
backing store. The guest booted but as soon as it got to mount the root
fs, things got weird:
On 03/03/2015 05:53 PM, Jason Dillaman wrote:
Your procedure appears correct to me. Would you mind re-running your cloned
image VM with the following ceph.conf properties:
[client]
rbd cache off
debug rbd = 20
log file = /path/writeable/by/qemu.$pid.log
If you recreate the issue, would you mi
Hello,
Today I thought I'd experiment with snapshots and cloning. So I did:
rbd import --image-format=2 vm-proto.raw rbd/vm-proto
rbd snap create rbd/vm-proto@s1
rbd snap protect rbd/vm-proto@s1
rbd clone rbd/vm-proto@s1 rbd/server
And then proceeded to create a qemu-kvm guest with rbd/server