[ceph-users] add writeback to Bluestore thanks to lvm-writecache

2019-08-13 Thread Olivier Bonvalet
Hi, we use OSDs with data on HDD and db/wal on NVMe. But for now, BlueStore.DB and BlueStore.WAL only store metadata, NOT data. Right? So, when we migrated from: A) Filestore + HDD with hardware writecache + journal on SSD to: B) Bluestore + HDD without hardware writecache + DB/WAL on NVMe Per
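For context, the thread title refers to LVM's dm-writecache target. A minimal sketch of attaching a write-back cache LV to an HDD-backed OSD logical volume, assuming lvm2 >= 2.03 and hypothetical names (vg0, osd-block, osd-cache, /dev/nvme0n1) that are not taken from the thread:

  # create a cache LV on the NVMe device (name and size are placeholders)
  lvcreate -n osd-cache -L 50G vg0 /dev/nvme0n1
  # attach it as a writecache in front of the OSD data LV
  lvconvert --type writecache --cachevol osd-cache vg0/osd-block
  # to detach and flush later: lvconvert --splitcache vg0/osd-block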

Re: [ceph-users] osd_memory_target exceeding on Luminous OSD BlueStore

2019-04-09 Thread Olivier Bonvalet
> > much each case is using. If there is a memory leak, the autotuner can only do so much. At some point it will reduce the caches to fit within cache_min and leave it there.

Re: [ceph-users] osd_memory_target exceeding on Luminous OSD BlueStore

2019-04-09 Thread Olivier Bonvalet
p is not always automatically released. (You can check the heap freelist with `ceph tell osd.0 heap stats`.) As a workaround we run this hourly: ceph tell mon.* heap release ; ceph tell osd.* heap release

Re: [ceph-users] osd_memory_target exceeding on Luminous OSD BlueStore

2019-04-09 Thread Olivier Bonvalet
ally released. (You can check the heap freelist with `ceph tell osd.0 heap stats`.) As a workaround we run this hourly: ceph tell mon.* heap release ; ceph tell osd.* heap release ; ceph tell mds.* heap release -- Dan. On Sat, Apr 6, 2019 at 1:
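A minimal sketch of that hourly workaround as a cron entry; the three release commands are quoted above, the cron wrapping and file path are assumptions:

  # /etc/cron.d/ceph-heap-release (hypothetical path)
  0 * * * * root ceph tell mon.\* heap release && ceph tell osd.\* heap release && ceph tell mds.\* heap release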

[ceph-users] osd_memory_target exceeding on Luminous OSD BlueStore

2019-04-06 Thread Olivier Bonvalet
Hi, on a Luminous 12.2.11 deployment, my BlueStore OSDs exceed the osd_memory_target : daevel-ob@ssdr712h:~$ ps auxw | grep ceph-osd ceph 3646 17.1 12.0 6828916 5893136 ? Ssl mars29 1903:42 /usr/bin/ceph-osd -f --cluster ceph --id 143 --setuser ceph --setgroup ceph ceph 3991 1
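For reference, a few commands that help diagnose this kind of overrun, assuming a release where osd_memory_target exists (12.2.9+); osd.143 is the OSD from the ps output above:

  ceph daemon osd.143 config get osd_memory_target   # confirm the configured target
  ceph daemon osd.143 dump_mempools                  # which internal pools hold the memory
  ceph tell osd.143 heap stats                       # tcmalloc view: in-use vs freelist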

[ceph-users] Bluestore & snapshots weight

2018-10-28 Thread Olivier Bonvalet
Hi, with Filestore, to estimate the weight of snapshots we use a simple find script on each OSD : nice find "$OSDROOT/$OSDDIR/current/" \ -type f -not -name '*_head_*' -not -name '*_snapdir_*' \ -printf '%P\n' Then we aggregate by image prefix, and obtain an estimation of each
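A hedged sketch of the aggregation step described above, assuming Filestore object file names contain an 'rbd\udata.<prefix>' token (the exact escaping in object file names varies between versions, so the pattern may need adjusting):

  nice find "$OSDROOT/$OSDDIR/current/" -type f \
       -not -name '*_head_*' -not -name '*_snapdir_*' -printf '%s %P\n' \
   | awk '{ if (match($2, /rbd\\udata\.[0-9a-f]+/)) sum[substr($2, RSTART, RLENGTH)] += $1 }
          END { for (p in sum) printf "%14d %s\n", sum[p], p }' \
   | sort -rn | head    # bytes used per image prefix, biggest first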

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
eph osd pool ls detail"? > Also, which version of Ceph are you running? > Paul > > On Fri., 21 Sep. 2018 at 19:28, Olivier Bonvalet wrote: > > So I've totally disabled cache-tiering and overlay. Now OSD 68 & 69 > > are fine, no more bloc

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
day 21 September 2018 at 16:51 +0200, Maks Kowalik wrote: > According to the query output you pasted, shards 1 and 2 are broken. > But on the other hand, an EC profile (4+2) should make it possible to > recover from 2 shards lost simultaneously... > > On Fri., 21 Sep 2018 at 16:29, Olivier Bonv

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
ing to > your output in https://pastebin.com/zrwu5X0w). Can you verify if that > block device is in use and healthy or is it corrupt? > > > Quoting Maks Kowalik: > > > Could you please paste the output of pg 37.9c query > > > > Fri., 21 Sep 2018

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
rbd_directory > > rbd_data.f66c92ae8944a.000f2596 > > rbd_header.f66c92ae8944a > > > > And "cache-flush-evict-all" still hangs. > > > > I also switched the cache tier to "readproxy", to avoid using this > > cache. But,

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
, it's still blocked. On Friday 21 September 2018 at 02:14 +0200, Olivier Bonvalet wrote: > Hello, > > on a Luminous cluster, I have an incomplete PG and I can't find how to > fix it. > > It's an EC pool (4+2) : > > pg 37.9c is incomplete

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
lems during recovery. Since > only OSDs 68 and 69 are mentioned, I was wondering if your cache tier > also has size 2. > > > Quoting Olivier Bonvalet: > > > Hi, > > > > the cache tier on this pool has 26GB of data (for 5.7TB of data on the > >

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
lush it, maybe restarting those OSDs (68, 69) helps, > too. > Or there could be an issue with the cache tier; what do those logs > say? > > Regards, > Eugen > > > Quoting Olivier Bonvalet: > > > Hello, > > > > on a Luminous cluster, I have a P

[ceph-users] PG stuck incomplete

2018-09-20 Thread Olivier Bonvalet
Hello, on a Luminous cluster, I have an incomplete PG and I can't find how to fix it. It's an EC pool (4+2) : pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for 'incomplete') Of course, we can't reduce min_size fr
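For reference, the commands usually used to inspect such a PG (pg 37.9c and pool bkp-sb-raid6 are the names reported above; the commands are standard Ceph CLI):

  ceph pg 37.9c query                    # peering state, down_osds_we_would_probe, past intervals
  ceph osd pool ls detail                # EC profile and min_size of bkp-sb-raid6
  ceph health detail | grep incomplete   # all PGs currently reported incomplete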

Re: [ceph-users] Optane 900P device class automatically set to SSD not NVME

2018-08-13 Thread Olivier Bonvalet
On a recent Luminous cluster, with nvme*n1 devices, the class is automatically set as "nvme" on "Intel SSD DC P3520 Series" : ~# ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 2.15996 root default -9 0.71999 roo

Re: [ceph-users] ghost PG : "i don't have pgid xx"

2018-06-05 Thread Olivier Bonvalet
> Luminous (you got > 200 PGs per OSD): try to increase > mon_max_pg_per_osd on the monitors to 300 or so to temporarily > resolve this. > > Paul > > 2018-06-05 9:40 GMT+02:00 Olivier Bonvalet : > > Some more information : the cluster was just upgraded from Jewel

Re: [ceph-users] ghost PG : "i don't have pgid xx"

2018-06-05 Thread Olivier Bonvalet
,76] 21 [NONE,21,76] 21 286462'438402 2018-05-20 18:06:12.443141 286462'438402 2018-05-20 18:06:12.443141 0 On Tuesday 5 June 2018 at 09:25 +0200, Olivier Bonvalet wrote: > Hi, > > I have a cluster in "stale" sta

[ceph-users] ghost PG : "i don't have pgid xx"

2018-06-05 Thread Olivier Bonvalet
Hi, I have a cluster in "stale" state : a lot of RBDs have been blocked for ~10 hours. In the status I see PGs in stale or down state, but those PGs don't seem to exist anymore : root! stor00-sbg:~# ceph health detail | egrep '(stale|down)' HEALTH_ERR noout,noscrub,nodeep-scrub flag(s) set; 1 nearf
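A short sketch of how those stale/down PG IDs can be mapped back to OSDs, where <pgid> stands for one of the reported PGs:

  ceph pg dump_stuck stale       # stuck PGs with their acting sets
  ceph pg map <pgid>             # which OSDs the PG currently maps to
  ceph osd tree | grep -w down   # check whether those OSDs are really down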

[ceph-users] Re : general protection fault: 0000 [#1] SMP

2017-10-12 Thread Olivier Bonvalet
On Thursday 12 October 2017 at 09:12 +0200, Ilya Dryomov wrote: > It's a crash in memcpy() in skb_copy_ubufs(). It's not in ceph, but > ceph-induced, it looks like. I don't remember seeing anything > similar in the context of krbd. > > This is a Xen dom0 kernel, right? What did the workload lo

[ceph-users] general protection fault: 0000 [#1] SMP

2017-10-11 Thread Olivier Bonvalet
Hi, I had a "general protection fault: " with Ceph RBD kernel client. Not sure how to read the call, is it Ceph related ? Oct 11 16:15:11 lorunde kernel: [311418.891238] general protection fault: [#1] SMP Oct 11 16:15:11 lorunde kernel: [311418.891855] Modules linked in: cpuid binfmt_

[ceph-users] Re : Re : Re : bad crc/signature errors

2017-10-06 Thread Olivier Bonvalet
On Thursday 5 October 2017 at 21:52 +0200, Ilya Dryomov wrote: > On Thu, Oct 5, 2017 at 6:05 PM, Olivier Bonvalet > wrote: > > On Thursday 5 October 2017 at 17:03 +0200, Ilya Dryomov wrote: > > > When did you start seeing these errors? Can you correlate that > >

[ceph-users] Re : Re : bad crc/signature errors

2017-10-05 Thread Olivier Bonvalet
On Thursday 5 October 2017 at 17:03 +0200, Ilya Dryomov wrote: > When did you start seeing these errors? Can you correlate that to > a ceph or kernel upgrade? If not, and if you don't see other issues, > I'd write it off as faulty hardware. Well... I have one hypervisor (Xen 4.6 and kernel Linux

[ceph-users] Re : Re : bad crc/signature errors

2017-10-05 Thread Olivier Bonvalet
On Thursday 5 October 2017 at 11:10 +0200, Ilya Dryomov wrote: > On Thu, Oct 5, 2017 at 9:03 AM, Olivier Bonvalet > wrote: > > I also see that, but on 4.9.52 and 4.13.3 kernels. > > > > I also have some kernel panics, but don't know if it's related (RBD

[ceph-users] Re : bad crc/signature errors

2017-10-05 Thread Olivier Bonvalet
On Thursday 5 October 2017 at 11:47 +0200, Ilya Dryomov wrote: > The stable pages bug manifests as multiple sporadic connection > resets, because in that case CRCs computed by the kernel don't always match > the data that gets sent out. When the mismatch is detected on the OSD > side, OSDs res

[ceph-users] Re : bad crc/signature errors

2017-10-05 Thread Olivier Bonvalet
I also see that, but on 4.9.52 and 4.13.3 kernels. I also have some kernel panics, but don't know if it's related (RBDs are mapped on Xen hosts). On Thursday 5 October 2017 at 05:53 +, Adrian Saul wrote: > We see the same messages and are similarly on a 4.4 KRBD version that > is affected by thi

Re: [ceph-users] ceph.com IPv6 down

2015-09-23 Thread Olivier Bonvalet
On Wednesday 23 September 2015 at 13:41 +0200, Wido den Hollander wrote: > Hmm, that is weird. It works for me here from the Netherlands via > IPv6: You're right, I checked from other providers and it works. So, a problem between Free (France) and Dreamhost ?

[ceph-users] ceph.com IPv6 down

2015-09-23 Thread Olivier Bonvalet
Hi, for several hours now http://ceph.com/ has not been replying over IPv6. It pings, and we can open a TCP socket, but nothing more : ~$ nc -w30 -v -6 ceph.com 80 Connection to ceph.com 80 port [tcp/http] succeeded! GET / HTTP/1.0 Host: ceph.com But, a HEAD query works : ~$ n

Re: [ceph-users] Lot of blocked operations

2015-09-18 Thread Olivier Bonvalet
HDD pool. At the same time, are there tips for tuning the journal in the case of HDD OSDs, with a (potentially big) SSD journal and a hardware RAID card which handles write-back ? Thanks for your help. Olivier On Friday 18 September 2015 at 02:35 +0200, Olivier Bonvalet wrote: > Hi, > > I have a cluste

Re: [ceph-users] debian repositories path change?

2015-09-18 Thread Olivier Bonvalet
Hi, not sure if it's related, but there are recent changes because of a security issue : http://ceph.com/releases/important-security-notice-regarding-signing-key-and-binary-downloads-of-ceph/ On Friday 18 September 2015 at 08:45 -0500, Brian Kroth wrote: > Hi all, we've had the following i

Re: [ceph-users] Lot of blocked operations

2015-09-18 Thread Olivier Bonvalet
On Friday 18 September 2015 at 14:14 +0200, Paweł Sadowski wrote: > It might be worth checking how many threads you have in your system > (ps -eL | wc -l). By default there is a limit of 32k (sysctl -q > kernel.pid_max). There is/was a bug in fork() > (https://lkml.org/lkml/2015/2/3/345) repo
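A minimal sketch of that check; the first two commands are quoted above, the last line is a hedged example of raising the limit:

  ps -eL | wc -l                     # total number of threads on the host
  sysctl -q kernel.pid_max           # current limit (32768 by default)
  sysctl -w kernel.pid_max=4194303   # example only; pick a value suited to your hosts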

Re: [ceph-users] Lot of blocked operations

2015-09-18 Thread Olivier Bonvalet
On Friday 18 September 2015 at 12:04 +0200, Jan Schermer wrote: > > On 18 Sep 2015, at 11:28, Christian Balzer wrote: > > > > On Fri, 18 Sep 2015 11:07:49 +0200 Olivier Bonvalet wrote: > > > > > On Friday 18 September 2015 at 10:59 +0200, Jan Schermer wr

Re: [ceph-users] Lot of blocked operations

2015-09-18 Thread Olivier Bonvalet
s before I touch anything has become a > routine now and that problem is gone. > > Jan > > > On 18 Sep 2015, at 10:53, Olivier Bonvalet > > wrote: > > > > mmm good point. > > > > I don't see CPU or IO problems on mons, but in the logs, I have this :

Re: [ceph-users] Lot of blocked operations

2015-09-18 Thread Olivier Bonvalet
neck when you try to investigate... > > Jan > > > On 18 Sep 2015, at 09:37, Olivier Bonvalet > > wrote: > > > > Hi, > > > > sorry for the missing information. I wanted to avoid putting too much > > inappropriate info ;)

Re: [ceph-users] Lot of blocked operations

2015-09-18 Thread Olivier Bonvalet
On Friday 18 September 2015 at 17:04 +0900, Christian Balzer wrote: > Hello, > > On Fri, 18 Sep 2015 09:37:24 +0200 Olivier Bonvalet wrote: > > > Hi, > > > > sorry for the missing information. I wanted to avoid putting too much > > inappropriate info ;) >

Re: [ceph-users] Lot of blocked operations

2015-09-18 Thread Olivier Bonvalet
Hi, sorry for the missing information. I wanted to avoid putting too much inappropriate info ;) On Friday 18 September 2015 at 12:30 +0900, Christian Balzer wrote: > Hello, > > On Fri, 18 Sep 2015 02:43:49 +0200 Olivier Bonvalet wrote: > > The items below help, but be as speci

Re: [ceph-users] Lot of blocked operations

2015-09-18 Thread Olivier Bonvalet
seems to be waiting for something... but I don't see > > what. > > > > > > On Friday 18 September 2015 at 02:35 +0200, Olivier Bonvalet > > wrote: > > > Hi, > > > > > > I have a cluster with a lot of blocked operations each time I try >

Re: [ceph-users] Lot of blocked operations

2015-09-18 Thread Olivier Bonvalet
0 too > > - bandwidth usage is also near 0 > > > > The whole cluster seems to be waiting for something... but I don't see > > what. > > > > > > On Friday 18 September 2015 at 02:35 +0200, Olivier Bonvalet > > wrote: > > > Hi,

Re: [ceph-users] Lot of blocked operations

2015-09-17 Thread Olivier Bonvalet
Some additional information : - I have 4 SSDs per node. - the CPU usage is near 0 - IO wait is near 0 too - bandwidth usage is also near 0 The whole cluster seems to be waiting for something... but I don't see what. On Friday 18 September 2015 at 02:35 +0200, Olivier Bonvalet wrote: > Hi

[ceph-users] Lot of blocked operations

2015-09-17 Thread Olivier Bonvalet
Hi, I have a cluster with a lot of blocked operations each time I try to move data (by slightly reweighting an OSD). It's a full-SSD cluster, with a 10GbE network. In the logs, when I have a blocked OSD, on the main OSD I can see this : 2015-09-18 01:55:16.981396 7f89e8cb8700 0 log [WRN] : 2 slow request
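For reference, two admin-socket commands commonly used to dig into such slow requests on the reporting OSD (osd.<id> is a placeholder; run them on the node hosting that OSD):

  ceph daemon osd.<id> dump_ops_in_flight   # operations currently blocked, with their age and current step
  ceph daemon osd.<id> dump_historic_ops    # recently completed slow ops and where they spent their time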

Re: [ceph-users] Firefly 0.80.10 ready to upgrade to?

2015-07-21 Thread Olivier Bonvalet
On Tuesday 21 July 2015 at 07:06 -0700, Sage Weil wrote: > On Tue, 21 Jul 2015, Olivier Bonvalet wrote: > > On Monday 13 July 2015 at 11:31 +0100, Gregory Farnum wrote: > > > On Mon, Jul 13, 2015 at 11:25 AM, Kostis Fardelas < > > > dante1...@gmail.com>

Re: [ceph-users] Firefly 0.80.10 ready to upgrade to?

2015-07-21 Thread Olivier Bonvalet
On Monday 13 July 2015 at 11:31 +0100, Gregory Farnum wrote: > On Mon, Jul 13, 2015 at 11:25 AM, Kostis Fardelas < > dante1...@gmail.com> wrote: > > Hello, > > it seems that new packages for firefly have been uploaded to the repo. > > However, I can't find any details in the Ceph release notes. There i

Re: [ceph-users] More writes on filestore than on journal ?

2015-03-23 Thread Olivier Bonvalet
Hi, On Monday 23 March 2015 at 07:29 -0700, Gregory Farnum wrote: > On Mon, Mar 23, 2015 at 6:21 AM, Olivier Bonvalet wrote: > > Hi, > > > > I'm still trying to find why there are many more write operations on > > the filestore since Emperor/Firefly than with Dumpli

Re: [ceph-users] More writes on blockdevice than on filestore ?

2015-03-23 Thread Olivier Bonvalet
Erg... I sent too fast. Bad title, please read «More writes on blockdevice than on filestore». On Monday 23 March 2015 at 14:21 +0100, Olivier Bonvalet wrote: > Hi, > > I'm still trying to find why there are many more write operations on > the filestore since Emperor/Firefly t

[ceph-users] More writes on filestore than on journal ?

2015-03-23 Thread Olivier Bonvalet
Hi, I'm still trying to find why there are many more write operations on the filestore since Emperor/Firefly than with Dumpling. So, I added monitoring of all perf counter values from the OSDs. From what I see : «filestore.ops» reports an average of 78 operations per second. But, block device monitoring r
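A minimal sketch of how such counters are typically sampled, assuming jq is available (filestore.ops is the counter cited above; the exact JSON path may differ by release):

  ceph daemon osd.0 perf dump | jq '.filestore.ops'   # cumulative filestore operation count
  iostat -x 5 /dev/sdX                                # compare with write IOPS actually hitting the block device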

Re: [ceph-users] Perf problem after upgrade from dumpling to firefly

2015-03-04 Thread Olivier Bonvalet
, could give you more logs ? > > > ----- Original Message ----- > From: "Olivier Bonvalet" > To: "aderumier" > Cc: "ceph-users" > Sent: Wednesday 4 March 2015 16:42:13 > Subject: Re: [ceph-users] Perf problem after upgrade from dumpling to firefly >

Re: [ceph-users] Perf problem after upgrade from dumpling to firefly

2015-03-04 Thread Olivier Bonvalet
-- Original Message ----- > From: "Olivier Bonvalet" > To: "aderumier" > Cc: "ceph-users" > Sent: Wednesday 4 March 2015 15:13:30 > Subject: Re: [ceph-users] Perf problem after upgrade from dumpling to firefly > > Ceph health is OK, yes. > > The «firefly-

Re: [ceph-users] Perf problem after upgrade from dumpling to firefly

2015-03-04 Thread Olivier Bonvalet
h health ok ? > > > > - Original Message - > From: "Olivier Bonvalet" > To: "aderumier" > Cc: "ceph-users" > Sent: Wednesday 4 March 2015 14:49:41 > Subject: Re: [ceph-users] Perf problem after upgrade from dumpling to firefly > > Thanks

Re: [ceph-users] Perf problem after upgrade from dumpling to firefly

2015-03-04 Thread Olivier Bonvalet
kport in dumpling, not sure it's already done for > firefly > > > Alexandre > > > > - Original Message - > From: "Olivier Bonvalet" > To: "ceph-users" > Sent: Wednesday 4 March 2015 12:10:30 > Subject: [ceph-users] Perf problem after upgrade from dump

[ceph-users] Perf problem after upgrade from dumpling to firefly

2015-03-04 Thread Olivier Bonvalet
Hi, last Saturday I upgraded my production cluster from dumpling to emperor (since we were successfully using it on a test cluster). A couple of hours later, we had failing OSDs : some of them were marked down by Ceph, probably because of IO starvation. I set the cluster to «noout», start dow

Re: [ceph-users] v0.80.8 and librbd performance

2015-03-03 Thread Olivier Bonvalet
On Tuesday 3 March 2015 at 16:32 -0800, Sage Weil wrote: > On Wed, 4 Mar 2015, Olivier Bonvalet wrote: > > Is the kernel client affected by the problem ? > > Nope. The kernel client is unaffected.. the issue is in librbd. > > sage > Ok, thanks for the clarifi

Re: [ceph-users] v0.80.8 and librbd performance

2015-03-03 Thread Olivier Bonvalet
Is the kernel client affected by the problem ? On Tuesday 3 March 2015 at 15:19 -0800, Sage Weil wrote: > Hi, > > This is just a heads up that we've identified a performance regression in > v0.80.8 from previous firefly releases. A v0.80.9 is working its way > through QA and should be out in a

Re: [ceph-users] rbd import-diff + erasure coding

2014-07-23 Thread Olivier Bonvalet
mented, so check it out! :) » So, it's not usable to back up a production cluster. I have to use a replicated pool. On Wednesday 23 July 2014 at 17:51 +0200, Olivier Bonvalet wrote: > Hi, > > from my tests, I can't import snapshots from a replicated pool (in > cluster

[ceph-users] rbd import-diff + erasure coding

2014-07-23 Thread Olivier Bonvalet
Hi, from my tests, I can't import snapshots from a replicated pool (in cluster1) to an erasure-coded pool (in cluster2). Is it a known limitation ? A temporary one ? Or did I make a mistake somewhere ? Cluster1 (aka production) is running Ceph 0.67.9, and cluster2 (aka backup) is runnin

Re: [ceph-users] Data still in OSD directories after removing

2014-05-22 Thread Olivier Bonvalet
On Wednesday 21 May 2014 at 18:20 -0700, Josh Durgin wrote: > On 05/21/2014 03:03 PM, Olivier Bonvalet wrote: > > On Wednesday 21 May 2014 at 08:20 -0700, Sage Weil wrote: > >> You're certain that that is the correct prefix for the rbd image you > >> removed? D

Re: [ceph-users] Data still in OSD directories after removing

2014-05-21 Thread Olivier Bonvalet
On Wednesday 21 May 2014 at 08:20 -0700, Sage Weil wrote: > > You should definitely not do this! :) Of course ;) > > You're certain that that is the correct prefix for the rbd image you > removed? Do you see the objects listed when you do 'rados -p rbd ls - | > grep '? I'm pretty sure yes

Re: [ceph-users] Data still in OSD directories after removing

2014-05-21 Thread Olivier Bonvalet
.0.14bfb5a.238e1f29.*' -delete Thanks for any advice, Olivier PS : not sure if this kind of problem is for the user or the dev mailing list. On Tuesday 20 May 2014 at 11:32 +0200, Olivier Bonvalet wrote: > Hi, > > short : I removed a 1TB RBD image, but I still see files from it on > the OSDs

[ceph-users] Data still in OSD directories after removing

2014-05-20 Thread Olivier Bonvalet
Hi, short : I removed a 1TB RBD image, but I still see files from it on the OSDs. long : 1) I did : "rbd snap purge $pool/$img" but since it overloaded the cluster, I stopped it (CTRL+C) 2) later, "rbd snap purge $pool/$img" 3) then, "rbd rm $pool/$img" now, on the disk I can find files of this
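For reference, a hedged sketch of how leftover objects from a removed image are usually located, using the block-name prefix quoted later in the thread (rb.0.14bfb5a.238e1f29); adjust the pool name and OSD paths to your setup:

  rados -p $pool ls | grep '^rb\.0\.14bfb5a\.238e1f29'                      # objects RADOS still knows about
  find /var/lib/ceph/osd/ceph-*/current/ -name 'rb.0.14bfb5a.238e1f29.*'    # files still present on the OSD filesystem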

Re: [ceph-users] The Ceph disk I would like to have

2014-03-25 Thread Olivier Bonvalet
Hi, not sure it's related to ceph... you should probably look at the ownCloud project, no ? Or use any S3/Swift client which will know how to exchange data with a RADOS gateway. On Tuesday 25 March 2014 at 16:49 +0100, Loic Dachary wrote: > Hi, > > It's not available yet but ... are we far away ?

Re: [ceph-users] performance and disk usage of snapshots

2013-09-28 Thread Olivier Bonvalet
Hi, On Tuesday 24 September 2013 at 18:37 +0200, Corin Langosch wrote: > Hi there, > > do snapshots have an impact on write performance? I assume on each write all > snapshots have to get updated (cow), so the more snapshots exist the worse > write performance will get? > Not exactly : the

Re: [ceph-users] Ceph space problem, garbage collector ?

2013-09-11 Thread Olivier Bonvalet
I removed some garbage about hosts faude / rurkh / murmillia (they were temporarily added because the cluster was full). So the "clean" CRUSH map : # begin crush map tunable choose_local_tries 0 tunable choose_local_fallback_tries 0 tunable choose_total_tries 50 # devices device 0 device0 device 1 de

Re: [ceph-users] Ceph space problem, garbage collector ?

2013-09-11 Thread Olivier Bonvalet
y reports space used by partially written objects. Or is it XFS-related only ? On Wednesday 11 September 2013 at 11:00 +0200, Olivier Bonvalet wrote: > Hi, > > do you need more information about that ? > > thanks, > Olivier > > On Tuesday 10 September 2013 at 11:19 -0700, Samu

Re: [ceph-users] Ceph space problem, garbage collector ?

2013-09-11 Thread Olivier Bonvalet
Hi, do you need more information about that ? thanks, Olivier On Tuesday 10 September 2013 at 11:19 -0700, Samuel Just wrote: > Can you post the rest of your crush map? > -Sam > > On Tue, Sep 10, 2013 at 5:52 AM, Olivier Bonvalet wrote: > > I also checked that all files in th

Re: [ceph-users] Ceph space problem, garbage collector ?

2013-09-10 Thread Olivier Bonvalet
I removed some garbage about hosts faude / rurkh / murmillia (they were temporarily added because the cluster was full). So the "clean" CRUSH map : # begin crush map tunable choose_local_tries 0 tunable choose_local_fallback_tries 0 tunable choose_total_tries 50 # devices device 0 device0 device 1 d

Re: [ceph-users] Ceph space problem, garbage collector ?

2013-09-10 Thread Olivier Bonvalet
.46 up 1 47 2.72 osd.47 up 1 48 2.72 osd.48 up 1 On Tuesday 10 September 2013 at 21:01 +0200, Olivier Bonvalet wrote: > I removed some garb

Re: [ceph-users] Ceph space problem, garbage collector ?

2013-09-10 Thread Olivier Bonvalet
On Tuesday 10 September 2013 at 11:19 -0700, Samuel Just wrote: > Can you post the rest of your crush map? > -Sam > Yes : # begin crush map tunable choose_local_tries 0 tunable choose_local_fallback_tries 0 tunable choose_total_tries 50 # devices device 0 osd.0 device 1 osd.1 device 2 osd.2 devi

Re: [ceph-users] Ceph space problem, garbage collector ?

2013-09-10 Thread Olivier Bonvalet
eferenced in rados (compared with "rados --pool ssd3copies ls rados.ssd3copies.dump"). On Tuesday 10 September 2013 at 13:46 +0200, Olivier Bonvalet wrote: > Some additional information : if I look at one PG only, for example > 6.31f, "ceph pg dump" reports a size

Re: [ceph-users] Ceph space problem, garbage collector ?

2013-09-10 Thread Olivier Bonvalet
n1 448M total and the content of the directory : http://pastebin.com/u73mTvjs On Tuesday 10 September 2013 at 10:31 +0200, Olivier Bonvalet wrote: > Hi, > > I have a space problem on a production cluster, as if there is unused > data not being freed : "ceph df" and "rados df"

[ceph-users] Ceph space problem, garbage collector ?

2013-09-10 Thread Olivier Bonvalet
Hi, I have a space problem on a production cluster, as if there is unused data not being freed : "ceph df" and "rados df" report 613GB of data, but disk usage is 2640GB (with 3 replicas). It should be near 1839GB. I have 5 hosts, 3 with SAS storage and 2 with SSD storage. I use crush rules to put po

Re: [ceph-users] Ceph + Xen - RBD io hang

2013-08-28 Thread Olivier Bonvalet
On Wednesday 28 August 2013 at 10:07 +0200, Sylvain Munaut wrote: > Hi, > > > I use Ceph 0.61.8 and Xen 4.2.2 (Debian) in production, and can't use > > kernel 3.10.* on dom0, which hangs very soon. But it's visible in the kernel > > logs of the dom0, not the domU. > > Weird. I'm using 3.10.0 without i

Re: [ceph-users] Real size of rbd image

2013-08-27 Thread Olivier Bonvalet
On Tuesday 27 August 2013 at 13:44 -0700, Josh Durgin wrote: > On 08/27/2013 01:39 PM, Timofey Koolin wrote: > > Is there a way to know the real size of an rbd image and its snapshots? > > rbd ls -l shows the declared size of the image, but I want to know the real size. > > You can sum the sizes of the extents reported by: >
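The reply above is cut off; a common way to do that summation, as a hedged sketch (adds up the length column of 'rbd diff'):

  rbd diff $pool/$img | awk '{ sum += $2 } END { printf "%.1f MB\n", sum/1024/1024 }'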

Re: [ceph-users] Ceph + Xen - RBD io hang

2013-08-27 Thread Olivier Bonvalet
Hi, I use Ceph 0.61.8 and Xen 4.2.2 (Debian) in production, and can't use kernel 3.10.* on dom0, which hangs very soon. But it's visible in the kernel logs of the dom0, not the domU. Anyway, you should probably re-try with kernel 3.9.11 for the dom0 (I also use 3.10.9 in domU). Olivier On Tuesday 27 A

Re: [ceph-users] osd/OSD.cc: 4844: FAILED assert(_get_map_bl(epoch, bl)) (ceph 0.61.7)

2013-08-19 Thread Olivier Bonvalet
On Monday 19 August 2013 at 12:27 +0200, Olivier Bonvalet wrote: > Hi, > > I have an OSD which crashes every time I try to start it (see logs below). > Is it a known problem ? And is there a way to fix it ? > > root! taman:/var/log/ceph# grep -v ' pipe' osd.65.log

[ceph-users] osd/OSD.cc: 4844: FAILED assert(_get_map_bl(epoch, bl)) (ceph 0.61.7)

2013-08-19 Thread Olivier Bonvalet
Hi, I have an OSD which crashes every time I try to start it (see logs below). Is it a known problem ? And is there a way to fix it ? root! taman:/var/log/ceph# grep -v ' pipe' osd.65.log 2013-08-19 11:07:48.478558 7f6fe367a780 0 ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff), pro

Re: [ceph-users] Replace all monitors

2013-08-10 Thread Olivier Bonvalet
On Thursday 8 August 2013 at 18:04 -0700, Sage Weil wrote: > On Fri, 9 Aug 2013, Olivier Bonvalet wrote: > > On Thursday 8 August 2013 at 09:43 -0700, Sage Weil wrote: > > > On Thu, 8 Aug 2013, Olivier Bonvalet wrote: > > > > Hi, > > > > > > > >

Re: [ceph-users] Replace all monitors

2013-08-08 Thread Olivier Bonvalet
On Thursday 8 August 2013 at 09:43 -0700, Sage Weil wrote: > On Thu, 8 Aug 2013, Olivier Bonvalet wrote: > > Hi, > > > > right now I have 5 monitors which share a slow SSD with several OSD > > journals. As a result, each data migration operation (reweight, recovery,

[ceph-users] Replace all monitors

2013-08-08 Thread Olivier Bonvalet
Hi, right now I have 5 monitors which share a slow SSD with several OSD journals. As a result, each data migration operation (reweight, recovery, etc.) is very slow and the cluster is nearly down. So I have to change that. I'm looking to replace these 5 monitors with 3 new monitors, which still share (very

Re: [ceph-users] kernel BUG at net/ceph/osd_client.c:2103

2013-08-05 Thread Olivier Bonvalet
client? > > James > > > -Original Message- > > From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users- > > boun...@lists.ceph.com] On Behalf Of Olivier Bonvalet > > Sent: Monday, 5 August 2013 11:07 AM > > To: ceph-users@lists.ceph.com > > Subject: [ceph-u

Re: [ceph-users] kernel BUG at net/ceph/osd_client.c:2103

2013-08-04 Thread Olivier Bonvalet
we have a patch that addresses the bug, would you be > able to test it? > > Thanks! > sage > > > On Mon, 5 Aug 2013, Olivier Bonvalet wrote: > > Sorry, the "dev" list is probably a better place for that one. > > > > On Monday 5 August 2013 at 03:07 +0

Re: [ceph-users] kernel BUG at net/ceph/osd_client.c:2103

2013-08-04 Thread Olivier Bonvalet
Sorry, the "dev" list is probably a better place for that one. Le lundi 05 août 2013 à 03:07 +0200, Olivier Bonvalet a écrit : > Hi, > > I've just upgraded a Xen Dom0 (Debian Wheezy with Xen 4.2.2) from Linux > 3.9.11 to Linux 3.10.5, and now I have kernel panic afte

[ceph-users] kernel BUG at net/ceph/osd_client.c:2103

2013-08-04 Thread Olivier Bonvalet
Hi, I've just upgraded a Xen Dom0 (Debian Wheezy with Xen 4.2.2) from Linux 3.9.11 to Linux 3.10.5, and now I have kernel panics after launching some VMs which use the RBD kernel client. In the kernel logs, I have : Aug 5 02:51:22 murmillia kernel: [ 289.205652] kernel BUG at net/ceph/osd_client.c:2

Re: [ceph-users] VMs freez after slow requests

2013-06-03 Thread Olivier Bonvalet
On Monday 3 June 2013 at 08:04 -0700, Gregory Farnum wrote: > On Sunday, June 2, 2013, Dominik Mostowiec wrote: > Hi, > I try to start a postgres cluster on VMs with a second disk > mounted from > ceph (rbd - kvm). > I started some writes (pgbench initialisati

Re: [ceph-users] Mon store.db size

2013-06-02 Thread Olivier Bonvalet
Hi, it's a Cuttlefish bug, which should be fixed in the next point release very soon. Olivier On Sunday 2 June 2013 at 18:51 +1000, Bond, Darryl wrote: > The cluster has gone into HEALTH_WARN because the mon filesystem is 12% > The cluster was upgraded to cuttlefish last week and had been running o

Re: [ceph-users] [solved] scrub error: found clone without head

2013-05-31 Thread Olivier Bonvalet
Ok, so : - after a second "rbd rm XXX", the image was gone - and "rados ls" doesn't see any object from that image - so I tried to move those files => scrub is now ok ! So for me it's fixed. Thanks On Friday 31 May 2013 at 16:34 +0200, Olivier Bonvalet wrote

Re: [ceph-users] scrub error: found clone without head

2013-05-31 Thread Olivier Bonvalet
Note that I still have scrub errors, but rados doesn't see those objects : root! brontes:~# rados -p hdd3copies ls | grep '^rb.0.15c26.238e1f29' root! brontes:~# On Friday 31 May 2013 at 15:36 +0200, Olivier Bonvalet wrote: > Hi, > > sorry for the late answer

Re: [ceph-users] scrub error: found clone without head

2013-05-31 Thread Olivier Bonvalet
On Thursday 23 May 2013 at 15:53 -0700, Samuel Just wrote: > Can you send the filenames in the pg directories for those 4 pgs? > -Sam > > On Thu, May 23, 2013 at 3:27 PM, Olivier Bonvalet wrote: > > No : > > pg 3.7c is active+clean+inconsistent, acting [24,13,39]

[ceph-users] Edge effect with multiple RBD kernel clients per host ?

2013-05-25 Thread Olivier Bonvalet
Hi, I seem to have a bad edge effect in my setup; I don't know if it's an RBD problem or a Xen problem. So, I have one Ceph cluster, in which I set up 2 different storage pools : one on SSD and one on SAS. With appropriate CRUSH rules, those pools are completely separated; only the MONs are common. Then,

Re: [ceph-users] scrub error: found clone without head

2013-05-23 Thread Olivier Bonvalet
15:17 -0700, Samuel Just wrote: > Do all of the affected PGs share osd.28 as the primary? I think the > only recovery is probably to manually remove the orphaned clones. > -Sam > > On Thu, May 23, 2013 at 5:00 AM, Olivier Bonvalet wrote: > > Not yet. I keep it for now. >

Re: [ceph-users] scrub error: found clone without head

2013-05-23 Thread Olivier Bonvalet
Not yet. I'm keeping it for now. On Wednesday 22 May 2013 at 15:50 -0700, Samuel Just wrote: > rb.0.15c26.238e1f29 > > Has that rbd volume been removed? > -Sam > > On Wed, May 22, 2013 at 12:18 PM, Olivier Bonvalet > wrote: > > 0.61-11-g3b94f03 (0.61-1.1), but t

Re: [ceph-users] scrub error: found clone without head

2013-05-22 Thread Olivier Bonvalet
0.61-11-g3b94f03 (0.61-1.1), but the bug occurred with bobtail. On Wednesday 22 May 2013 at 12:00 -0700, Samuel Just wrote: > What version are you running? > -Sam > > On Wed, May 22, 2013 at 11:25 AM, Olivier Bonvalet > wrote: > > Is it enough ? > > > > # t

Re: [ceph-users] scrub error: found clone without head

2013-05-22 Thread Olivier Bonvalet
cluding all of these errors? > -Sam > > On Wed, May 22, 2013 at 5:39 AM, Dzianis Kahanovich > wrote: > > Olivier Bonvalet writes: > >> > >> On Monday 20 May 2013 at 00:06 +0200, Olivier Bonvalet wrote: > >>> On Tuesday 7 May 2013 at 15:51 +0300, Dzianis

Re: [ceph-users] scrub error: found clone without head

2013-05-22 Thread Olivier Bonvalet
On Monday 20 May 2013 at 00:06 +0200, Olivier Bonvalet wrote: > On Tuesday 7 May 2013 at 15:51 +0300, Dzianis Kahanovich wrote: > > I have 4 scrub errors (3 PGs - "found clone without head") on one OSD. They are not > > repairing. How can I repair this without re-creating the OSD?

Re: [ceph-users] scrub error: found clone without head

2013-05-20 Thread Olivier Bonvalet
as IMHO self-provocation for force reinstall). Now (at least > to my summer outdoors) I keep v0.62 (3 nodes) with every pool size=3 > min_size=2 > (was - size=2 min_size=1). > > But try to do nothing first and try to install latest version. And keep your > vote to issue #4937 to

Re: [ceph-users] scrub error: found clone without head

2013-05-19 Thread Olivier Bonvalet
On Tuesday 7 May 2013 at 15:51 +0300, Dzianis Kahanovich wrote: > I have 4 scrub errors (3 PGs - "found clone without head") on one OSD. They are not > repairing. How can I repair this without re-creating the OSD? > > For now it is "easy" to clean + re-create the OSD, but in theory - in case there are multiple > OSDs - it may

Re: [ceph-users] PG down & incomplete

2013-05-19 Thread Olivier Bonvalet
g for missing object". On Friday 17 May 2013 at 23:37 +0200, Olivier Bonvalet wrote: > Yes, osd.10 is near full because of bad data distribution (not enough PGs, > I suppose), and the difficulty of removing snapshots without overloading > the cluster. > > The problem on osd.2

Re: [ceph-users] PG down & incomplete

2013-05-17 Thread Olivier Bonvalet
com/docs/master/rados/troubleshooting/troubleshooting-osd/#stopping-w-out-rebalancing > > If you have down OSDs that don't get marked out, that would certainly > cause problems. Have you tried restarting the failed OSDs? > > What do the logs look like for osd.15 and

Re: [ceph-users] PG down & incomplete

2013-05-17 Thread Olivier Bonvalet
er able to help you. > > For example, "ceph osd tree" would help us understand the status of > your cluster a bit better. > > On Thu, May 16, 2013 at 10:32 PM, Olivier Bonvalet > wrote: > > On Wednesday 15 May 2013 at 00:15 +0200, Olivier Bonvalet wrote:

Re: [ceph-users] PG down & incomplete

2013-05-16 Thread Olivier Bonvalet
On Wednesday 15 May 2013 at 00:15 +0200, Olivier Bonvalet wrote: > Hi, > > I have some PGs in the down and/or incomplete state on my cluster, because I > lost 2 OSDs and a pool had only 2 replicas. So of course that > data is lost. > > My problem now is that I can't

[ceph-users] PG down & incomplete

2013-05-14 Thread Olivier Bonvalet
Hi, I have some PGs in the down and/or incomplete state on my cluster, because I lost 2 OSDs and a pool had only 2 replicas. So of course that data is lost. My problem now is that I can't retrieve a "HEALTH_OK" status : if I try to remove, read or overwrite the corresponding RBD images, near al

Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-12 Thread Olivier Bonvalet
On Friday 10 May 2013 at 19:16 +0200, Greg wrote: > Hello folks, > > I'm in the process of testing CEPH and RBD. I have set up a small > cluster of hosts, each running a MON and an OSD with both journal and > data on the same SSD (ok, this is stupid, but it is simple to verify the > disks a

Re: [ceph-users] Scrub shutdown the OSD process / data loss

2013-04-22 Thread Olivier Bonvalet
On Saturday 20 April 2013 at 09:10 +0200, Olivier Bonvalet wrote: > On Wednesday 17 April 2013 at 20:52 +0200, Olivier Bonvalet wrote: > > What I didn't understand is why the OSD process crashes, instead of > > marking that PG "corrupted", and does that PG really
