Hi,
we use OSDs with data on HDD and db/wal on NVMe.
But for now, BlueStore.DB and BlueStore.WAL only store metadata, NOT
data. Right?
So, when we migrated from:
A) Filestore + HDD with hardware write cache + journal on SSD
to:
B) BlueStore + HDD without hardware write cache + DB/WAL on NVMe
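For context, this is roughly how such an OSD layout is created with ceph-volume; the device paths are hypothetical and this is only a sketch:
# Hypothetical devices: /dev/sdb = data HDD, /dev/nvme0n1p1 = DB, /dev/nvme0n1p2 = WAL
ceph-volume lvm create --bluestore \
    --data /dev/sdb \
    --block.db /dev/nvme0n1p1 \
    --block.wal /dev/nvme0n1p2
If --block.wal is omitted, the WAL simply lives inside the DB device, which is usually fine when both would sit on the same NVMe anyway.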
Per
> > much each case is using. If there is a memory leak, the
> > > autotuner
> > > can
> > > only do so much. At some point it will reduce the caches to fit
> > > within
> > > cache_min and leave it there.
> > >
> > >
> &
p is not always automatically
> > released. (You can check the heap freelist with `ceph tell osd.0
> > heap
> > stats`).
> > As a workaround we run this hourly:
> >
> > ceph tell mon.* heap release
> > ceph tell osd.* heap release
> >
ally
> released. (You can check the heap freelist with `ceph tell osd.0 heap
> stats`).
> As a workaround we run this hourly:
>
> ceph tell mon.* heap release
> ceph tell osd.* heap release
> ceph tell mds.* heap release
>
> -- Dan
>
> On Sat, Apr 6, 2019 at 1:
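For reference, a sketch of that hourly workaround as a cron job (the file path and schedule are assumptions, not from the original mail):
# /etc/cron.d/ceph-heap-release (hypothetical file)
# Ask tcmalloc to return freed memory to the OS every hour.
0 * * * * root ceph tell 'mon.*' heap release >/dev/null 2>&1
5 * * * * root ceph tell 'osd.*' heap release >/dev/null 2>&1
10 * * * * root ceph tell 'mds.*' heap release >/dev/null 2>&1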
Hi,
on a Luminous 12.2.11 deployment, my BlueStore OSDs exceed the
osd_memory_target:
daevel-ob@ssdr712h:~$ ps auxw | grep ceph-osd
ceph      3646 17.1 12.0 6828916 5893136 ? Ssl mars29 1903:42
/usr/bin/ceph-osd -f --cluster ceph --id 143 --setuser ceph --setgroup ceph
ceph      3991 1
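A few commands that help to see where that memory goes (osd.143 is taken from the ps output above; run them on the OSD's host):
# Effective memory target and cache settings for this OSD
ceph daemon osd.143 config show | egrep 'osd_memory_target|bluestore_cache'
# Per-pool memory accounting inside the OSD process
ceph daemon osd.143 dump_mempools
# tcmalloc view: how much freed memory has not been returned to the OS yet
ceph tell osd.143 heap stats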
Hi,
with Filestore, to estimate the weight of snapshots we use a simple find
script on each OSD:
nice find "$OSDROOT/$OSDDIR/current/" \
-type f -not -name '*_head_*' -not -name '*_snapdir_*' \
-printf '%P\n'
Then we aggregate by image prefix, and obtain an estimate for each
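A rough sketch of that aggregation step (the exact file-name layout is an assumption; filestore data object files usually look like rbd\udata.<image prefix>.<offset>__...):
# Count clone object files per image prefix (2nd dot-separated field)
nice find "$OSDROOT/$OSDDIR/current/" \
    -type f -not -name '*_head_*' -not -name '*_snapdir_*' \
    -printf '%P\n' \
  | awk -F. '{ count[$2]++ } END { for (p in count) print count[p], p }' \
  | sort -rn | head
Multiplying each per-prefix count by the object size (4 MB by default) gives a rough upper bound for the space held by that image's snapshots.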
eph osd pool ls detail"?
> Also, which version of Ceph are you running?
> Paul
>
> On Fri., Sep. 21, 2018 at 19:28, Olivier Bonvalet
> wrote:
> >
> > So I've totally disabled cache-tiering and overlay. Now OSD 68 & 69
> > are
> > fine, no more bloc
On Friday, September 21, 2018 at 16:51 +0200, Maks Kowalik wrote:
> According to the query output you pasted shards 1 and 2 are broken.
> But, on the other hand EC profile (4+2) should make it possible to
> recover from 2 shards lost simultaneously...
>
> Fri., 21 Sep 2018 at 16:29 Olivier Bonv
ing to
> your
> output in https://pastebin.com/zrwu5X0w). Can you verify if that
> block
> device is in use and healthy or is it corrupt?
>
>
> Quoting Maks Kowalik:
>
> > Could you, please paste the output of pg 37.9c query
> >
> > Fri., 21 Sep 2018
rbd_directory
> > rbd_data.f66c92ae8944a.000f2596
> > rbd_header.f66c92ae8944a
> >
> > And "cache-flush-evict-all" still hangs.
> >
> > I also switched the cache tier to "readproxy", to avoid using this
> > cache. But,
, it's still blocked.
On Friday, September 21, 2018 at 02:14 +0200, Olivier Bonvalet wrote:
> Hello,
>
> on a Luminous cluster, I have an incomplete PG and I can't find how to
> fix it.
>
> It's an EC pool (4+2) :
>
> pg 37.9c is incomplete
lems during recovery. Since
> only
> OSDs 68 and 69 are mentioned I was wondering if your cache tier
> also
> has size 2.
>
>
> Quoting Olivier Bonvalet:
>
> > Hi,
> >
> > cache-tier on this pool has 26GB of data (for 5.7TB of data on the
> >
lush it, maybe restarting those OSDs (68, 69) helps,
> too.
> Or there could be an issue with the cache tier, what do those logs
> say?
>
> Regards,
> Eugen
>
>
> Quoting Olivier Bonvalet:
>
> > Hello,
> >
> > on a Luminous cluster, I have a P
Hello,
on a Luminous cluster, I have an incomplete PG and I can't find how to
fix it.
It's an EC pool (4+2) :
pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool
bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for
'incomplete')
Of course, we can't reduce min_size fr
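The usual starting points for an incomplete EC PG are the PG query and the pool's min_size (names taken from the health output above):
# Why peering is stuck, and which OSDs/shards the PG has seen
ceph pg 37.9c query | less
# Current min_size of the 4+2 pool mentioned in the warning
ceph osd pool get bkp-sb-raid6 min_size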
On a recent Luminous cluster, with nvme*n1 devices, the class is
automatically set as "nvme" on "Intel SSD DC P3520 Series" :
~# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 2.15996 root default
-9 0.71999 roo
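If a single class is preferred (for example to keep existing "ssd" CRUSH rules working), the auto-detected class can be overridden; a sketch, with osd.0 as a placeholder:
# The existing class has to be removed before a new one can be set
ceph osd crush rm-device-class osd.0
ceph osd crush set-device-class ssd osd.0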
> Luminous (you got > 200 PGs per OSD): try to increase
> mon_max_pg_per_osd on the monitors to 300 or so to temporarily
> resolve this.
>
> Paul
>
> 2018-06-05 9:40 GMT+02:00 Olivier Bonvalet :
> > Some more information: the cluster was just upgraded from Jewel
,76]
21   [NONE,21,76]   21   286462'438402   2018-05-20 18:06:12.443141   286462'438402   2018-05-20 18:06:12.443141   0
On Tuesday, June 5, 2018 at 09:25 +0200, Olivier Bonvalet wrote:
> Hi,
>
> I have a cluster in "stale" sta
Hi,
I have a cluster in "stale" state: a lot of RBDs have been blocked for ~10
hours. In the status I see PGs in stale or down state, but those PGs
don't seem to exist anymore:
root! stor00-sbg:~# ceph health detail | egrep '(stale|down)'
HEALTH_ERR noout,noscrub,nodeep-scrub flag(s) set; 1 nearf
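A more direct way to list those PGs than grepping health detail (nothing cluster-specific assumed here):
# PGs stuck stale or inactive, with the OSDs they were last reported on
ceph pg dump_stuck stale
ceph pg dump_stuck inactive
# Map one suspicious PG back to its pool and acting set
ceph pg map <pgid>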
On Thursday, October 12, 2017 at 09:12 +0200, Ilya Dryomov wrote:
> It's a crash in memcpy() in skb_copy_ubufs(). It's not in ceph, but
> ceph-induced, it looks like. I don't remember seeing anything
> similar
> in the context of krbd.
>
> This is a Xen dom0 kernel, right? What did the workload lo
Hi,
I had a "general protection fault: " with Ceph RBD kernel client.
Not sure how to read the call, is it Ceph related ?
Oct 11 16:15:11 lorunde kernel: [311418.891238] general protection fault:
[#1] SMP
Oct 11 16:15:11 lorunde kernel: [311418.891855] Modules linked in: cpuid
binfmt_
On Thursday, October 5, 2017 at 21:52 +0200, Ilya Dryomov wrote:
> On Thu, Oct 5, 2017 at 6:05 PM, Olivier Bonvalet > wrote:
> > On Thursday, October 5, 2017 at 17:03 +0200, Ilya Dryomov wrote:
> > > When did you start seeing these errors? Can you correlate that
> >
On Thursday, October 5, 2017 at 17:03 +0200, Ilya Dryomov wrote:
> When did you start seeing these errors? Can you correlate that to
> a ceph or kernel upgrade? If not, and if you don't see other issues,
> I'd write it off as faulty hardware.
Well... I have one hypervisor (Xen 4.6 and kernel Linux
On Thursday, October 5, 2017 at 11:10 +0200, Ilya Dryomov wrote:
> On Thu, Oct 5, 2017 at 9:03 AM, Olivier Bonvalet > wrote:
> > I also see that, but on 4.9.52 and 4.13.3 kernel.
> >
> > I also have some kernel panic, but don't know if it's related (RBD
> &g
On Thursday, October 5, 2017 at 11:47 +0200, Ilya Dryomov wrote:
> The stable pages bug manifests as multiple sporadic connection
> resets,
> because in that case CRCs computed by the kernel don't always match
> the
> data that gets sent out. When the mismatch is detected on the OSD
> side, OSDs res
I also see that, but on 4.9.52 and 4.13.3 kernels.
I also have some kernel panics, but I don't know if they're related (RBDs are
mapped on Xen hosts).
On Thursday, October 5, 2017 at 05:53, Adrian Saul wrote:
> We see the same messages and are similarly on a 4.4 KRBD version that
> is affected by thi
On Wednesday, September 23, 2015 at 13:41 +0200, Wido den Hollander wrote:
> Hmm, that is weird. It works for me here from the Netherlands via
> IPv6:
You're right, I checked from other providers and it works.
So, a problem between Free (France) and Dreamhost ?
Hi,
for several hours now, http://ceph.com/ hasn't been replying over IPv6.
It pings, and we can open a TCP socket, but nothing more:
~$ nc -w30 -v -6 ceph.com 80
Connection to ceph.com 80 port [tcp/http] succeeded!
GET / HTTP/1.0
Host: ceph.com
But, a HEAD query works :
~$ n
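The same behaviour can be reproduced with curl forced to IPv6; this is just an illustration, not the exact commands from the original mail:
# HEAD request over IPv6 (works in the scenario described above)
curl -6 -I http://ceph.com/
# Full GET over IPv6 (hangs in the scenario described above)
curl -6 -v -o /dev/null http://ceph.com/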
HDD pool.
At the same time, are there any tips for tuning the journal in the case of
HDD OSDs, with a (potentially big) SSD journal, and a hardware RAID card
which handles write-back?
Thanks for your help.
Olivier
On Friday, September 18, 2015 at 02:35 +0200, Olivier Bonvalet wrote:
> Hi,
>
> I have a cluste
Hi,
not sure if it's related, but there are recent changes because of a
security issue:
http://ceph.com/releases/important-security-notice-regarding-signing-key-and-binary-downloads-of-ceph/
On Friday, September 18, 2015 at 08:45 -0500, Brian Kroth wrote:
> Hi all, we've had the following i
On Friday, September 18, 2015 at 14:14 +0200, Paweł Sadowski wrote:
> It might be worth checking how many threads you have in your system
> (ps
> -eL | wc -l). By default there is a limit of 32k (sysctl -q
> kernel.pid_max). There is/was a bug in fork()
> (https://lkml.org/lkml/2015/2/3/345) repo
On Friday, September 18, 2015 at 12:04 +0200, Jan Schermer wrote:
> > On 18 Sep 2015, at 11:28, Christian Balzer wrote:
> >
> > On Fri, 18 Sep 2015 11:07:49 +0200 Olivier Bonvalet wrote:
> >
> > > On Friday, September 18, 2015 at 10:59 +0200, Jan Schermer wrote:
s before I touch anything has become a
> routine now and that problem is gone.
>
> Jan
>
> > On 18 Sep 2015, at 10:53, Olivier Bonvalet
> > wrote:
> >
> > mmm good point.
> >
> > I don't see CPU or IO problems on the mons, but in the logs I have this :
&
neck when you try to investigate...
>
> Jan
>
> > On 18 Sep 2015, at 09:37, Olivier Bonvalet
> > wrote:
> >
> > Hi,
> >
> > sorry for the missing information. I was trying to avoid including too much
> > inappropriate info ;)
> >
> >
> >
> &g
On Friday, September 18, 2015 at 17:04 +0900, Christian Balzer wrote:
> Hello,
>
> On Fri, 18 Sep 2015 09:37:24 +0200 Olivier Bonvalet wrote:
>
> > Hi,
> >
> > sorry for the missing information. I was trying to avoid including too much
> > inappropriate info ;)
>
Hi,
sorry for the missing information. I was trying to avoid including too much
inappropriate info ;)
On Friday, September 18, 2015 at 12:30 +0900, Christian Balzer wrote:
> Hello,
>
> On Fri, 18 Sep 2015 02:43:49 +0200 Olivier Bonvalet wrote:
>
> The items below help, but be a s speci
seems waiting for something... but I don't see
> > what.
> >
> >
> > On Friday, September 18, 2015 at 02:35 +0200, Olivier Bonvalet
> > wrote:
> > > Hi,
> > >
> > > I have a cluster with lot of blocked operations each time I try
>
0 too
> > - bandwidth usage is also near 0
> >
> > The whole cluster seems waiting for something... but I don't see
> > what.
> >
> >
> > On Friday, September 18, 2015 at 02:35 +0200, Olivier Bonvalet
> > wrote:
> > > Hi,
> > >
&
Some additional information:
- I have 4 SSDs per node.
- the CPU usage is near 0
- IO wait is near 0 too
- bandwidth usage is also near 0
The whole cluster seems to be waiting for something... but I don't see what.
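When the cluster looks idle but requests are blocked, dumping the in-flight ops on one of the involved OSDs usually shows what they are waiting for (osd.X is a placeholder for an OSD with slow requests):
# Run on the host of the slow OSD
ceph daemon osd.X dump_ops_in_flight
# Recently completed slow ops, with per-step timestamps
ceph daemon osd.X dump_historic_ops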
On Friday, September 18, 2015 at 02:35 +0200, Olivier Bonvalet wrote:
> Hi
Hi,
I have a cluster with a lot of blocked operations each time I try to move
data (by slightly reweighting an OSD).
It's a full-SSD cluster, with a 10GbE network.
In the logs, when I have blocked OSDs, on the primary OSD I can see this:
2015-09-18 01:55:16.981396 7f89e8cb8700 0 log [WRN] : 2 slow request
On Tuesday, July 21, 2015 at 07:06 -0700, Sage Weil wrote:
> On Tue, 21 Jul 2015, Olivier Bonvalet wrote:
> > On Monday, July 13, 2015 at 11:31 +0100, Gregory Farnum wrote:
> > > On Mon, Jul 13, 2015 at 11:25 AM, Kostis Fardelas <
> > > dante1...@gmail.com>
On Monday, July 13, 2015 at 11:31 +0100, Gregory Farnum wrote:
> On Mon, Jul 13, 2015 at 11:25 AM, Kostis Fardelas <
> dante1...@gmail.com> wrote:
> > Hello,
> > it seems that new packages for firefly have been uploaded to repo.
> > However, I can't find any details in Ceph Release notes. There i
Hi,
On Monday, March 23, 2015 at 07:29 -0700, Gregory Farnum wrote:
> On Mon, Mar 23, 2015 at 6:21 AM, Olivier Bonvalet wrote:
> > Hi,
> >
> > I'm still trying to find out why there are many more write operations on
> > the filestore since Emperor/Firefly than with Dumpli
Erg... I sent too fast. Bad title, please read «More writes on
block device than on filestore»
On Monday, March 23, 2015 at 14:21 +0100, Olivier Bonvalet wrote:
> Hi,
>
> I'm still trying to find out why there are many more write operations on
> the filestore since Emperor/Firefly t
Hi,
I'm still trying to find out why there are many more write operations on
the filestore since Emperor/Firefly than with Dumpling.
So, I added monitoring of all perf counter values from the OSDs.
From what I see: «filestore.ops» reports an average of 78 operations
per second. But block device monitoring r
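One way to collect those counters is through the admin socket; a minimal sketch (osd.0 is a placeholder and jq is assumed to be installed):
# Dump all perf counters of one OSD and keep the filestore section
ceph daemon osd.0 perf dump | jq '.filestore'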
, could give you more logs ?
>
>
> ----- Original Message -
> From: "Olivier Bonvalet"
> To: "aderumier"
> Cc: "ceph-users"
> Sent: Wednesday, March 4, 2015 16:42:13
> Subject: Re: [ceph-users] Perf problem after upgrade from dumpling to firefly
>
-- Original Message -----
> From: "Olivier Bonvalet"
> To: "aderumier"
> Cc: "ceph-users"
> Sent: Wednesday, March 4, 2015 15:13:30
> Subject: Re: [ceph-users] Perf problem after upgrade from dumpling to firefly
>
> Ceph health is OK yes.
>
> The «firefly-
h health ok ?
>
>
>
> - Original Message -
> From: "Olivier Bonvalet"
> To: "aderumier"
> Cc: "ceph-users"
> Sent: Wednesday, March 4, 2015 14:49:41
> Subject: Re: [ceph-users] Perf problem after upgrade from dumpling to firefly
>
> Thanks
kport in dumpling, not sure it's already done for
> firefly
>
>
> Alexandre
>
>
>
> - Original Message -
> From: "Olivier Bonvalet"
> To: "ceph-users"
> Sent: Wednesday, March 4, 2015 12:10:30
> Subject: [ceph-users] Perf problem after upgrade from dump
Hi,
last Saturday I upgraded my production cluster from Dumpling to Emperor
(since we were successfully using it on a test cluster).
A couple of hours later, we had failing OSDs: some of them were marked
as down by Ceph, probably because of IO starvation. I marked the cluster
«noout», start dow
On Tuesday, March 3, 2015 at 16:32 -0800, Sage Weil wrote:
> On Wed, 4 Mar 2015, Olivier Bonvalet wrote:
> > Is the kernel client affected by the problem?
>
> Nope. The kernel client is unaffected.. the issue is in librbd.
>
> sage
>
Ok, thanks for the clarifi
Is the kernel client affected by the problem?
On Tuesday, March 3, 2015 at 15:19 -0800, Sage Weil wrote:
> Hi,
>
> This is just a heads up that we've identified a performance regression in
> v0.80.8 from previous firefly releases. A v0.80.9 is working its way
> through QA and should be out in a
mented, so check it out! :) »
So, it's not usable to back up a production cluster. I have to use a
replicated pool.
On Wednesday, July 23, 2014 at 17:51 +0200, Olivier Bonvalet wrote:
> Hi,
>
> from my tests, I can't import snapshot from a replicated pool (in
> cluster
Hi,
from my tests, I can't import snapshots from a replicated pool (in
cluster1) to an erasure-coded pool (in cluster2).
Is it a known limitation? A temporary one?
Or did I make a mistake somewhere?
cluster1 (aka production) is running Ceph 0.67.9, and cluster2
(aka backup) is runnin
On Wednesday, May 21, 2014 at 18:20 -0700, Josh Durgin wrote:
> On 05/21/2014 03:03 PM, Olivier Bonvalet wrote:
> > On Wednesday, May 21, 2014 at 08:20 -0700, Sage Weil wrote:
> >> You're certain that that is the correct prefix for the rbd image you
> >> removed? D
On Wednesday, May 21, 2014 at 08:20 -0700, Sage Weil wrote:
>
> You should definitely not do this! :)
Of course ;)
>
> You're certain that that is the correct prefix for the rbd image you
> removed? Do you see the objects lists when you do 'rados -p rbd ls - |
> grep '?
I'm pretty sure yes
.0.14bfb5a.238e1f29.*' -delete
Thanks for any advice,
Olivier
PS : not sure if this kind of problem is for the user or dev mailing
list.
On Tuesday, May 20, 2014 at 11:32 +0200, Olivier Bonvalet wrote:
> Hi,
>
> short : I removed a 1TB RBD image, but I still see files about it on
> OSD
Hi,
short: I removed a 1TB RBD image, but I still see files belonging to it on the
OSDs.
long:
1) I did: "rbd snap purge $pool/$img"
but since it overloaded the cluster, I stopped it (CTRL+C)
2) later, "rbd snap purge $pool/$img"
3) then, "rbd rm $pool/$img"
now, on the disk I can still find files of this
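A way to check whether RADOS itself still references those objects is to list them by the image's block name prefix (both variables are placeholders; the prefix is printed by "rbd info" before the image is deleted):
# Nothing should be listed once the image is really gone
rados -p "$pool" ls | grep "^$prefix"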
Hi,
not sure it's related to Ceph... you should probably look at the ownCloud
project, no?
Or use any S3/Swift client, which will know how to exchange data with a
RADOS gateway.
On Tuesday, March 25, 2014 at 16:49 +0100, Loic Dachary wrote:
> Hi,
>
> It's not available yet but ... are we far away ?
Hi,
On Tuesday, September 24, 2013 at 18:37 +0200, Corin Langosch wrote:
> Hi there,
>
> do snapshots have an impact on write performance? I assume on each write all
> snapshots have to get updated (cow) so the more snapshots exist the worse
> write
> performance will get?
>
Not exactly : the
I removed some garbage about hosts faude / rurkh / murmillia (they were
temporarily added because the cluster was full). So the "clean" CRUSH map:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
# devices
device 0 device0
device 1 de
y reports space used by partially written objects.
Or is it XFS-related only?
On Wednesday, September 11, 2013 at 11:00 +0200, Olivier Bonvalet wrote:
> Hi,
>
> do you need more information about that ?
>
> thanks,
> Olivier
>
On Tuesday, September 10, 2013 at 11:19 -0700, Samu
Hi,
do you need more information about that ?
thanks,
Olivier
On Tuesday, September 10, 2013 at 11:19 -0700, Samuel Just wrote:
> Can you post the rest of your crush map?
> -Sam
>
> On Tue, Sep 10, 2013 at 5:52 AM, Olivier Bonvalet wrote:
> > I also checked that all files in th
I removed some garbage about hosts faude / rurkh / murmillia (they were
temporarily added because the cluster was full). So the "clean" CRUSH map:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
# devices
device 0 device0
device 1 d
.46   up   1
47   2.72   osd.47   up   1
48   2.72   osd.48   up   1
On Tuesday, September 10, 2013 at 21:01 +0200, Olivier Bonvalet wrote:
> I removed some garb
On Tuesday, September 10, 2013 at 11:19 -0700, Samuel Just wrote:
> Can you post the rest of your crush map?
> -Sam
>
Yes :
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
devi
eferenced in rados (compared with "rados --pool
ssd3copies ls rados.ssd3copies.dump").
On Tuesday, September 10, 2013 at 13:46 +0200, Olivier Bonvalet wrote:
> Some additional information: if I look at one PG only, for example
> the 6.31f. "ceph pg dump" reports a size
n1
448M   total
and the content of the directory : http://pastebin.com/u73mTvjs
On Tuesday, September 10, 2013 at 10:31 +0200, Olivier Bonvalet wrote:
> Hi,
>
> I have a space problem on a production cluster, like if there is unused
> data not freed : "ceph df" and "rados df
Hi,
I have a space problem on a production cluster, as if there is unused
data that is not freed: "ceph df" and "rados df" report 613GB of data, and
disk usage is 2640GB (with 3 replicas). It should be near 1839GB.
I have 5 hosts, 3 with SAS storage and 2 with SSD storage. I use CRUSH
rules to put po
On Wednesday, August 28, 2013 at 10:07 +0200, Sylvain Munaut wrote:
> Hi,
>
> > I use Ceph 0.61.8 and Xen 4.2.2 (Debian) in production, and can't use
> > kernel 3.10.* on dom0, which hang very soon. But it's visible in kernel
> > logs of the dom0, not the domU.
>
> Weird. I'm using 3.10.0 without i
On Tuesday, August 27, 2013 at 13:44 -0700, Josh Durgin wrote:
> On 08/27/2013 01:39 PM, Timofey Koolin wrote:
> > Is there a way to know the real size of rbd images and rbd snapshots?
> > rbd ls -l shows the declared size of the image, but I want to know the real size.
>
> You can sum the sizes of the extents reported by:
>
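The quoted command is cut off in the archive; the usual approach (an assumption, not necessarily the exact command given here) is to sum the extent lengths reported by "rbd diff":
# Sum of allocated extents in bytes; add --from-snap <snap> to size one snapshot's delta
rbd diff $pool/$image | awk '{ sum += $2 } END { print sum " bytes" }'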
Hi,
I use Ceph 0.61.8 and Xen 4.2.2 (Debian) in production, and can't use
kernel 3.10.* on dom0, which hangs very soon. But it's visible in the kernel
logs of the dom0, not the domU.
Anyway, you should probably re-try with kernel 3.9.11 for the dom0 (I
also use 3.10.9 in domU).
Olivier
Le mardi 27 a
On Monday, August 19, 2013 at 12:27 +0200, Olivier Bonvalet wrote:
> Hi,
>
> I have an OSD which crash every time I try to start it (see logs below).
> Is it a known problem ? And is there a way to fix it ?
>
> root! taman:/var/log/ceph# grep -v ' pipe' osd.65.log
Hi,
I have an OSD which crashes every time I try to start it (see logs below).
Is it a known problem? And is there a way to fix it?
root! taman:/var/log/ceph# grep -v ' pipe' osd.65.log
2013-08-19 11:07:48.478558 7f6fe367a780 0 ceph version 0.61.7
(8f010aff684e820ecc837c25ac77c7a05d7191ff), pro
On Thursday, August 8, 2013 at 18:04 -0700, Sage Weil wrote:
> On Fri, 9 Aug 2013, Olivier Bonvalet wrote:
> > On Thursday, August 8, 2013 at 09:43 -0700, Sage Weil wrote:
> > > On Thu, 8 Aug 2013, Olivier Bonvalet wrote:
> > > > Hi,
> > > >
> > > >
On Thursday, August 8, 2013 at 09:43 -0700, Sage Weil wrote:
> On Thu, 8 Aug 2013, Olivier Bonvalet wrote:
> > Hi,
> >
> > from now I have 5 monitors which share slow SSD with several OSD
> > journal. As a result, each data migration operation (reweight, recovery,
&g
Hi,
right now I have 5 monitors which share slow SSDs with several OSD
journals. As a result, each data migration operation (reweight, recovery,
etc.) is very slow and the cluster is nearly down.
So I have to change that. I'm looking to replace these 5 monitors with 3
new monitors, which still share (very
client?
>
> James
>
> > -Original Message-
> > From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-
> > boun...@lists.ceph.com] On Behalf Of Olivier Bonvalet
> > Sent: Monday, 5 August 2013 11:07 AM
> > To: ceph-users@lists.ceph.com
> > Subject: [ceph-u
we have a patch that addresses the bug, would you be
> able to test it?
>
> Thanks!
> sage
>
>
> On Mon, 5 Aug 2013, Olivier Bonvalet wrote:
> > Sorry, the "dev" list is probably a better place for that one.
> >
> > On Monday, August 5, 2013 at 03:07 +0
Sorry, the "dev" list is probably a better place for that one.
On Monday, August 5, 2013 at 03:07 +0200, Olivier Bonvalet wrote:
> Hi,
>
> I've just upgraded a Xen Dom0 (Debian Wheezy with Xen 4.2.2) from Linux
> 3.9.11 to Linux 3.10.5, and now I have kernel panic afte
Hi,
I've just upgraded a Xen Dom0 (Debian Wheezy with Xen 4.2.2) from Linux
3.9.11 to Linux 3.10.5, and now I have kernel panics after launching some
VMs which use the RBD kernel client.
In the kernel logs, I have:
Aug 5 02:51:22 murmillia kernel: [ 289.205652] kernel BUG at
net/ceph/osd_client.c:2
On Monday, June 3, 2013 at 08:04 -0700, Gregory Farnum wrote:
> On Sunday, June 2, 2013, Dominik Mostowiec wrote:
> Hi,
> I try to start postgres cluster on VMs with second disk
> mounted from
> ceph (rbd - kvm).
> I started some writes (pgbench initialisati
Hi,
it's a Cuttlefish bug, which should be fixed in the next point release very
soon.
Olivier
On Sunday, June 2, 2013 at 18:51 +1000, Bond, Darryl wrote:
> Cluster has gone into HEALTH_WARN because the mon filesystem is 12%
> The cluster was upgraded to cuttlefish last week and had been running o
Ok, so :
- after a second "rbd rm XXX", the image was gone
- and "rados ls" doesn't see any object from that image
- so I tried to move thoses files
=> scrub is now ok !
So for me it's fixed. Thanks
Le vendredi 31 mai 2013 à 16:34 +0200, Olivier Bonvalet a écrit
Note that I still have scrub errors, but rados doesn't see those
objects:
root! brontes:~# rados -p hdd3copies ls | grep '^rb.0.15c26.238e1f29'
root! brontes:~#
On Friday, May 31, 2013 at 15:36 +0200, Olivier Bonvalet wrote:
> Hi,
>
> sorry for the late answer
On Thursday, May 23, 2013 at 15:53 -0700, Samuel Just wrote:
> Can you send the filenames in the pg directories for those 4 pgs?
> -Sam
>
> On Thu, May 23, 2013 at 3:27 PM, Olivier Bonvalet wrote:
> > No :
> > pg 3.7c is active+clean+inconsistent, acting [24,13,39]
> &
Hi,
I seem to have a bad edge effect in my setup; I don't know if it's an RBD
problem or a Xen problem.
So, I have one Ceph cluster, in which I set up 2 different storage
pools: one on SSD and one on SAS. With appropriate CRUSH rules, those
pools are completely separated; only the MONs are common.
Then,
15:17 -0700, Samuel Just wrote:
> Do all of the affected PGs share osd.28 as the primary? I think the
> only recovery is probably to manually remove the orphaned clones.
> -Sam
>
> On Thu, May 23, 2013 at 5:00 AM, Olivier Bonvalet wrote:
> > Not yet. I keep it for now.
>
Not yet. I keep it for now.
On Wednesday, May 22, 2013 at 15:50 -0700, Samuel Just wrote:
> rb.0.15c26.238e1f29
>
> Has that rbd volume been removed?
> -Sam
>
> On Wed, May 22, 2013 at 12:18 PM, Olivier Bonvalet
> wrote:
> > 0.61-11-g3b94f03 (0.61-1.1), but t
0.61-11-g3b94f03 (0.61-1.1), but the bug occurred with bobtail.
On Wednesday, May 22, 2013 at 12:00 -0700, Samuel Just wrote:
> What version are you running?
> -Sam
>
> On Wed, May 22, 2013 at 11:25 AM, Olivier Bonvalet
> wrote:
> > Is it enough ?
> >
> > # t
cluding all of these errors?
> -Sam
>
> On Wed, May 22, 2013 at 5:39 AM, Dzianis Kahanovich
> wrote:
> > Olivier Bonvalet writes:
> >>
> >> On Monday, May 20, 2013 at 00:06 +0200, Olivier Bonvalet wrote:
> >>> On Tuesday, May 7, 2013 at 15:51 +0300, Dzianis
On Monday, May 20, 2013 at 00:06 +0200, Olivier Bonvalet wrote:
> On Tuesday, May 7, 2013 at 15:51 +0300, Dzianis Kahanovich wrote:
> > I have 4 scrub errors (3 PGs - "found clone without head") on one OSD. Not
> > repairing. How can I repair it without re-creating the OSD?
as IMHO self-provocation for force reinstall). Now (at least
> to my summer outdoors) I keep v0.62 (3 nodes) with every pool size=3
> min_size=2
> (was - size=2 min_size=1).
>
> But try to do nothing first and try to install latest version. And keep your
> vote to issue #4937 to
On Tuesday, May 7, 2013 at 15:51 +0300, Dzianis Kahanovich wrote:
> I have 4 scrub errors (3 PGs - "found clone without head") on one OSD. Not
> repairing. How can I repair it without re-creating the OSD?
>
> Now it "easy" to clean+create OSD, but in theory - in case there are multiple
> OSDs - it may
g
for missing object".
On Friday, May 17, 2013 at 23:37 +0200, Olivier Bonvalet wrote:
> Yes, osd.10 is near full because of bad data distribution (not enough PGs,
> I suppose), and the difficulty of removing snapshots without overloading
> the cluster.
>
> The problem on osd.2
com/docs/master/rados/troubleshooting/troubleshooting-osd/#stopping-w-out-rebalancing
>
> If you have down OSDs that don't get marked out, that would certainly
> cause problems. Have you tried restarting the failed OSDs?
>
> What do the logs look like for osd.15 and
er able to help you.
>
> For example, "ceph osd tree" would help us understand the status of
> your cluster a bit better.
>
> On Thu, May 16, 2013 at 10:32 PM, Olivier Bonvalet
> wrote:
> > On Wednesday, May 15, 2013 at 00:15 +0200, Olivier Bonvalet wrote:
On Wednesday, May 15, 2013 at 00:15 +0200, Olivier Bonvalet wrote:
> Hi,
>
> I have some PGs in state down and/or incomplete on my cluster, because I
> lost 2 OSDs and a pool had only 2 replicas. So of course that
> data is lost.
>
> My problem now is that I can
Hi,
I have some PGs in state down and/or incomplete on my cluster, because I
lost 2 OSDs and a pool had only 2 replicas. So of course that
data is lost.
My problem now is that I can't get back to a "HEALTH_OK" status: if I try
to remove, read or overwrite the corresponding RBD images, near al
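Once the data is accepted as lost, the usual (destructive) way out looks roughly like this; the ids are placeholders and this is a sketch to check against the documentation, not a recipe:
# Declare a dead OSD as permanently lost
ceph osd lost <osd-id> --yes-i-really-mean-it
# For PGs that still report unfound objects afterwards
ceph pg <pgid> mark_unfound_lost delete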
On Friday, May 10, 2013 at 19:16 +0200, Greg wrote:
> Hello folks,
>
> I'm in the process of testing CEPH and RBD, I have set up a small
> cluster of hosts running each a MON and an OSD with both journal and
> data on the same SSD (ok this is stupid but this is simple to verify the
> disks a
On Saturday, April 20, 2013 at 09:10 +0200, Olivier Bonvalet wrote:
> On Wednesday, April 17, 2013 at 20:52 +0200, Olivier Bonvalet wrote:
> > What I didn't understand is why the OSD process crashes, instead of
> > marking that PG "corrupted", and does that PG really &