Re: [ceph-users] Some OSD and MDS crash

2014-07-01 Thread Samuel Just
Can you reproduce with debug osd = 20 debug filestore = 20 debug ms = 1 ? -Sam On Tue, Jul 1, 2014 at 1:21 AM, Pierre BLONDEAU wrote: > Hi, > > I join : > - osd.20 is one of osd that I detect which makes crash other OSD. > - osd.23 is one of osd which crash when i start osd.20 > - mds, is one
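The debug settings Sam asks for go in ceph.conf on the affected OSD node; a minimal sketch (section placement assumed, and note these levels produce very large logs):

```ini
# ceph.conf fragment: verbose logging to capture the crash.
# Revert to defaults once the log has been collected.
[osd]
    debug osd = 20
    debug filestore = 20
    debug ms = 1
```

If restarting the daemon is undesirable, the same values can usually be injected at runtime, e.g. `ceph tell osd.20 injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'`.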

Re: [ceph-users] Some OSD and MDS crash

2014-07-02 Thread Samuel Just
> osd.20's ? > > Thank you so much for the help > > Regards > Pierre > > On 01/07/2014 23:51, Samuel Just wrote: > >> Can you reproduce with >> debug osd = 20 >> debug filestore = 20 >> debug ms = 1 >> ? >> -Sam >> >> On

Re: [ceph-users] Some OSD and MDS crash

2014-07-02 Thread Samuel Just
the osd.20 some other osd crash. I pass from 31 osd up to 16. > I remark that after this the number of down+peering PG decrease from 367 to > 248. It's "normal" ? May be it's temporary, the time that the cluster > verifies all the PG ? > > Regards > Pierre > &

Re: [ceph-users] Some OSD and MDS crash

2014-07-02 Thread Samuel Just
Also, what version did you upgrade from, and how did you upgrade? -Sam On Wed, Jul 2, 2014 at 3:09 PM, Samuel Just wrote: > Ok, in current/meta on osd 20 and osd 23, please attach all files matching > > ^osdmap.13258.* > > There should be one such file on each osd. (should look

Re: [ceph-users] Some OSD and MDS crash

2014-07-02 Thread Samuel Just
Joao: this looks like divergent osdmaps, osd 20 and osd 23 have differing ideas of the acting set for pg 2.11. Did we add hashes to the incremental maps? What would you want to know from the mons? -Sam On Wed, Jul 2, 2014 at 3:10 PM, Samuel Just wrote: > Also, what version did you upgrade f

Re: [ceph-users] Some OSD and MDS crash

2014-07-02 Thread Samuel Just
OSD and 3 mds. > > Pierre > > PS : I find also "inc\uosdmap.13258__0_469271DE__none" on each meta > directory. > > On 03/07/2014 00:10, Samuel Just wrote: > >> Also, what version did you upgrade from, and how did you upgrade? >> -Sam >> >>

Re: [ceph-users] Some OSD and MDS crash

2014-07-02 Thread Samuel Just
rre: do you recall how and when that got set? -Sam On Wed, Jul 2, 2014 at 3:43 PM, Samuel Just wrote: > Yeah, divergent osdmaps: > 555ed048e73024687fc8b106a570db4f osd-20_osdmap.13258__0_4E62BB79__none > 6037911f31dc3c18b05499d24dcdbe5c osd-23_osdmap.13258__0_4E62BB79__none > > Joao:
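The divergence check above is just a checksum comparison of the same osdmap epoch as stored by two different OSDs; a minimal helper sketch (the file names are the ones quoted in the thread, pulled from each OSD's current/meta directory):

```shell
# Hypothetical helper: do two on-disk copies of the same osdmap epoch
# differ? Identical epochs must hash identically on every OSD, so a
# mismatch means divergent maps.
osdmap_diverges() {
    a=$(md5sum "$1" | cut -d' ' -f1)
    b=$(md5sum "$2" | cut -d' ' -f1)
    [ "$a" != "$b" ]    # exit 0 (true) when the copies diverge
}
```

Running `osdmap_diverges osd-20_osdmap.13258__0_4E62BB79__none osd-23_osdmap.13258__0_4E62BB79__none` and getting success reproduces the mismatch Sam found.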

Re: [ceph-users] Some OSD and MDS crash

2014-07-02 Thread Samuel Just
Can you confirm from the admin socket that all monitors are running the same version? -Sam On Wed, Jul 2, 2014 at 4:15 PM, Pierre BLONDEAU wrote: > On 03/07/2014 00:55, Samuel Just wrote: > >> Ah, >> >> ~/logs » for i in 20 23; do ../ceph/src/osdmaptool --export-crus

Re: [ceph-users] Some OSD and MDS crash

2014-07-02 Thread Samuel Just
n":"0.82"} > # ceph --admin-daemon /var/run/ceph/ceph-mon.joe.asok version > {"version":"0.82"} > > Pierre > > On 03/07/2014 01:17, Samuel Just wrote: > >> Can you confirm from the admin socket that all monitors are running >> t
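The version check amounts to running the same admin-socket query against every monitor and comparing the parsed results; a sketch with a small parser for the JSON the socket returns (the socket path and monitor name `joe` come from the thread, any other names are assumptions):

```shell
# Extract the "version" field from the admin socket's JSON reply,
# e.g. {"version":"0.82"} -> 0.82
mon_version() {
    printf '%s' "$1" | sed -n 's/.*"version":"\([^"]*\)".*/\1/p'
}

# On each monitor host one would run something like:
#   ceph --admin-daemon /var/run/ceph/ceph-mon.joe.asok version
# and compare the parsed values; any mismatch means mixed versions.
```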

Re: [ceph-users] scrub error on firefly

2014-07-10 Thread Samuel Just
Can you attach your ceph.conf for your osds? -Sam On Thu, Jul 10, 2014 at 8:01 AM, Christian Eichelmann wrote: > I can also confirm that after upgrading to firefly both of our clusters > (test and live) were going from 0 scrub errors each for about 6 Month to > about 9-12 per week... > This also

Re: [ceph-users] scrub error on firefly

2014-07-10 Thread Samuel Just
> b. is this reconciliation done automatically during deep-scrub > or does it have to be done "manually" because there is no majority? > > Thanks, > > -Sudip > > > -Original Message----- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com]

Re: [ceph-users] scrub error on firefly

2014-07-10 Thread Samuel Just
>> failures than with inconsistencies found during deep scrub - would you >> agree? >> >> Re: repair - do you mean the "repair" process during deep scrub - if yes, >> this is automatic - correct? >> Or >> Are you referring to the explicit

Re: [ceph-users] scrub error on firefly

2014-07-11 Thread Samuel Just
When you get the next inconsistency, can you copy the actual objects from the osd store trees and get them to us? That might provide a clue. -Sam On Fri, Jul 11, 2014 at 6:52 AM, Randy Smith wrote: > > > > On Thu, Jul 10, 2014 at 4:40 PM, Samuel Just wrote: >> >> It cou

Re: [ceph-users] scrub error on firefly

2014-07-11 Thread Samuel Just
> sage > > On Fri, 11 Jul 2014, Samuel Just wrote: > >> When you get the next inconsistency, can you copy the actual objects >> from the osd store trees and get them to us? That might provide a >> clue. >> -Sam >> >> On Fri, Jul 11, 2014 at 6

Re: [ceph-users] scrub error on firefly

2014-07-11 Thread Samuel Just
ead/DIR_6/DIR_C/DIR_5/rb.0.b0ce3.238e1f29.000b__head_34DC35C6__3 > ? > > > On Fri, Jul 11, 2014 at 2:00 PM, Samuel Just wrote: >> >> Also, what filesystem are you using? >> -Sam >> >> On Fri, Jul 11, 2014 at 10:37 AM, Sage Weil wrote: >> > One other thing w

Re: [ceph-users] scrub error on firefly

2014-07-11 Thread Samuel Just
And grab the xattrs as well. -Sam On Fri, Jul 11, 2014 at 2:39 PM, Samuel Just wrote: > Right. > -Sam > > On Fri, Jul 11, 2014 at 2:05 PM, Randy Smith wrote: >> Greetings, >> >> I'm using xfs. >> >> Also, when, in a previous email, you asked if I

Re: [ceph-users] scrub error on firefly

2014-07-12 Thread Samuel Just
> www.adams.edu > 719-587-7741 > > On Jul 12, 2014 10:34 AM, "Samuel Just" wrote: >> >> Here's a diff of the two files. One of the two files appears to >> contain ceph leveldb keys? Randy, do you have an idea of what this >> rbd image is being use

Re: [ceph-users] OSD is crashing while running admin socket

2014-09-08 Thread Samuel Just
That seems reasonable. Bug away! -Sam On Mon, Sep 8, 2014 at 5:11 PM, Somnath Roy wrote: > Hi Sage/Sam, > > > > I faced a crash in OSD with latest Ceph master. Here is the log trace for > the same. > > > > ceph version 0.85-677-gd5777c4 (d5777c421548e7f039bb2c77cb0df2e9c7404723) > > 1: ceph-osd(

Re: [ceph-users] OpTracker optimization

2014-09-10 Thread Samuel Just
Added a comment about the approach. -Sam On Tue, Sep 9, 2014 at 1:33 PM, Somnath Roy wrote: > Hi Sam/Sage, > > As we discussed earlier, enabling the present OpTracker code degrading > performance severely. For example, in my setup a single OSD node with 10 > clients is reaching ~103K read iops wi

Re: [ceph-users] OpTracker optimization

2014-09-10 Thread Samuel Just
.@vger.kernel.org > [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Samuel Just > Sent: Wednesday, September 10, 2014 11:17 AM > To: Somnath Roy > Cc: Sage Weil (sw...@redhat.com); ceph-de...@vger.kernel.org; > ceph-users@lists.ceph.com > Subject: Re: OpTracker optimization >

Re: [ceph-users] OpTracker optimization

2014-09-10 Thread Samuel Just
I don't quite understand. -Sam On Wed, Sep 10, 2014 at 2:38 PM, Somnath Roy wrote: > Thanks Sam. > So, you want me to go with optracker/shadedopWq , right ? > > Regards > Somnath > > -Original Message- > From: Samuel Just [mailto:sam.j...@inktank.com] &g

Re: [ceph-users] OpTracker optimization

2014-09-10 Thread Samuel Just
acker for the ios going through > ms_dispatch path. > > 2. Additionally, for ios going through ms_fast_dispatch, you want me to > implement optracker (without internal shard) per opwq shard > > Am I right ? > > Thanks & Regards > Somnath > > -Original

Re: [ceph-users] OpTracker optimization

2014-09-11 Thread Samuel Just
th > > -Original Message- > From: Sage Weil [mailto:sw...@redhat.com] > Sent: Wednesday, September 10, 2014 8:33 PM > To: Somnath Roy > Cc: Samuel Just; ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com > Subject: RE: OpTracker optimization > > I had two s

Re: [ceph-users] v0.92 released

2015-02-03 Thread Samuel Just
lander) > * osd: add fadvise flags to ObjectStore API (Jianpeng Ma) > * osd: add get_latest_osdmap asok command (#9483 #9484 Mykola Golub) > * osd: EIO on whole-object reads when checksum is wrong (Sage Weil) > * osd: filejournal: don't cache journal when not using direct IO (Jianp

Re: [ceph-users] Unexpected OSD down during deep-scrub

2015-03-05 Thread Samuel Just
The fix for this should be in 0.93, so this must be something different, can you reproduce with debug osd = 20 debug ms = 1 debug filestore = 20 and post the log to http://tracker.ceph.com/issues/11027? On Wed, 2015-03-04 at 00:04 +0100, Yann Dupont wrote: > Le 03/03/2015 22:03, Italo Santos a é

Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-09 Thread Samuel Just
You'll probably have to recreate osds with the same ids (empty ones), let them boot, stop them, and mark them lost. There is a feature in the tracker to improve this behavior: http://tracker.ceph.com/issues/10976 -Sam On Mon, 2015-03-09 at 12:24 +, joel.merr...@gmail.com wrote: > Hi, > > I'm

Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-10 Thread Samuel Just
What do you mean by "unblocked" but still "stuck"? -Sam On Mon, 2015-03-09 at 22:54 +, joel.merr...@gmail.com wrote: > On Mon, Mar 9, 2015 at 2:28 PM, Samuel Just wrote: > > You'll probably have to recreate osds with the same ids (empty ones), > > let

Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

2015-03-10 Thread Samuel Just
Can you reproduce this with debug osd = 20 debug filestore = 20 debug ms = 1 on the crashing osd? Also, what sha1 are the other osds and mons running? -Sam - Original Message - From: "Malcolm Haak" To: ceph-users@lists.ceph.com Sent: Tuesday, March 10, 2015 3:28:26 AM Subject: [ceph-us

Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

2015-03-10 Thread Samuel Just
Joao, it looks like map 2759 is causing trouble, how would he get the full and incremental maps for that out of the mons? -Sam On Tue, 2015-03-10 at 14:12 +, Malcolm Haak wrote: > Hi Samuel, > > The sha1? I'm going to admit ignorance as to what you are looking for. They > are all running the

Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-10 Thread Samuel Just
tree, osd dump etc). There were blocked_by > operations that no longer exist after doing the OSD addition. > > Side note, spent some time yesterday writing some bash to do this > programatically (might be useful to others, will throw on github) > > On Tue, Mar 10, 2015 at

Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-11 Thread Samuel Just
.467.query https://gist.github.com/05dbcdc9ee089bd52d0c On Tue, Mar 10, 2015 at 2:49 PM, Samuel Just wrote: Yeah, get a ceph pg query on one of the stuck ones. -Sam On Tue, 2015-03-10 at 14:41 +, joel.merr...@gmail.com wrote: Stuck unclean and stuck inactive. I can fire up a full query and h

Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-11 Thread Samuel Just
ances). When you say complicated and fragile, could you expand? Thanks again! Joel On Wed, Mar 11, 2015 at 1:21 PM, Samuel Just wrote: Ok, you lost all copies from an interval where the pgs went active. The recovery from this is going to be complicated and fragile. Are the pools valuable? -Sam

Re: [ceph-users] Random OSD failures - FAILED assert

2015-03-17 Thread Samuel Just
Most likely fixed in firefly. -Sam - Original Message - From: "Kostis Fardelas" To: "ceph-users" Sent: Tuesday, March 17, 2015 12:30:43 PM Subject: [ceph-users] Random OSD failures - FAILED assert Hi, we are running Ceph v.0.72.2 (emperor) from the ceph emperor repo. The latest week we

Re: [ceph-users] monitor 0.87.1 crashes

2015-03-27 Thread Samuel Just
You'll want to at least include the backtrace. -Sam On 03/27/2015 10:55 AM, samuel wrote: Hi all, In a fully functional ceph installation today we suffer a problem with ceph monitors, that started crashing with following error: include/interval_set.h: 340: FAILED assert(0) Is there any relat

Re: [ceph-users] OSDs failing on upgrade from Giant to Hammer

2015-04-19 Thread Samuel Just
I have a suspicion about what caused this. Can you restart one of the problem osds with debug osd = 20 debug filestore = 20 debug ms = 1 and attach the resulting log from startup to crash along with the osdmap binary (ceph osd getmap -o ). -Sam - Original Message - From: "Scott Laird"

Re: [ceph-users] OSDs failing on upgrade from Giant to Hammer

2015-04-21 Thread Samuel Just
m the crashing osds using the ceph-objectstore-tool. -Sam - Original Message - From: "Scott Laird" To: "Samuel Just" Cc: "Robert LeBlanc" , "'ceph-users@lists.ceph.com' (ceph-users@lists.ceph.com)" Sent: Monday, April 20, 2015 6:13:06 AM

Re: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down

2015-04-27 Thread Samuel Just
Can you explain exactly what you mean by: "Also I created one pool for tier to be able to move data without outage." -Sam - Original Message - From: "tuomas juntunen" To: "Ian Colle" Cc: ceph-users@lists.ceph.com Sent: Monday, April 27, 2015 4:23:44 AM Subject: Re: [ceph-users] Upgrade

Re: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down

2015-04-27 Thread Samuel Just
force-nonempty flag for that operation, I think. I think the immediate answer is probably to disallow pools with snapshots as a cache tier altogether until we think of a good way to make it work. -Sam - Original Message - From: "tuomas juntunen" To: "Samuel J

[ceph-users] straw vs straw2 mapping differences

2015-05-06 Thread Samuel Just
I took a bit of time to get a feel for how different the straw2 mappings are vs straw1 mappings. For a bucket in which all weights are the same, I saw no changed mappings, which is as expected. However, on a map with 3 hosts each of which has 4 osds with weights 1,2,3, and 4 (crush-different-w
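A comparison like this can be reproduced offline with crushtool's test mode; a dry-run sketch that only prints the commands (the compiled map file names and replica count are assumptions):

```shell
# Print (don't run) an offline comparison of mappings produced by two
# compiled crush maps that differ only in straw vs straw2 buckets.
straw_compare_plan() {
    echo "crushtool -i $1 --test --show-mappings --num-rep $3 > /tmp/straw1.txt"
    echo "crushtool -i $2 --test --show-mappings --num-rep $3 > /tmp/straw2.txt"
    echo "diff /tmp/straw1.txt /tmp/straw2.txt"
}
straw_compare_plan cm.straw cm.straw2 3   # hypothetical file names
```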

Re: [ceph-users] Write freeze when writing to rbd image and rebooting one of the nodes

2015-05-13 Thread Samuel Just
In short, the drawback is false positives which can cause unnecessary cluster churn. -Sam - Original Message - From: "Robert LeBlanc" To: "Vasiliy Angapov" Cc: "Sage Weil" , "ceph-users" Sent: Wednesday, May 13, 2015 12:21:16 PM Subject: Re: [ceph-users] Write freeze when writing to rb

Re: [ceph-users] OSD unable to start (giant -> hammer)

2015-05-18 Thread Samuel Just
You have most likely hit http://tracker.ceph.com/issues/11429. There are some workarounds in the bugs marked as duplicates of that bug, or you can wait for the next hammer point release. -Sam - Original Message - From: "Berant Lemmenes" To: ceph-users@lists.ceph.com Sent: Monday, May 1

Re: [ceph-users] OSD crashing over and over, taking cluster down

2015-05-19 Thread Samuel Just
You appear to be using pool snapshots with radosgw, I suspect that's what is causing the issue. Can you post a longer log? Preferably with debug osd = 20 debug filestore = 20 debug ms = 1 from startup to crash on an osd? -Sam - Original Message - From: "Daniel Schneller" To: ceph-use

Re: [ceph-users] OSD unable to start (giant -> hammer)

2015-05-19 Thread Samuel Just
If 2.14 is part of a non-existent pool, you should be able to rename it out of current/ in the osd directory to prevent the osd from seeing it on startup. -Sam - Original Message - From: "Berant Lemmenes" To: "Samuel Just" Cc: ceph-users@lists.ceph.com Sent: Tuesday

[ceph-users] Discuss: New default recovery config settings

2015-05-29 Thread Samuel Just
Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client io. We are talking about changing the defaults as follows: osd_max_backfills to 1 (from 10) osd_recovery_max_active to 3 (from 15) osd_recovery_op_priority to 1 (from 1
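For anyone who wants the proposed lower-impact values before any default change lands, they can be expressed as a ceph.conf fragment (a sketch; tune per cluster):

```ini
# Proposed lower-impact recovery settings from the thread.
[osd]
    osd_max_backfills = 1
    osd_recovery_max_active = 3
    osd_recovery_op_priority = 1
```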

[ceph-users] librados clone_range

2015-06-23 Thread Samuel Just
ObjectWriteOperations currently allow you to perform a clone_range from another object with the same object locator. Years ago, rgw used this as part of multipart upload. Today, the implementation complicates the OSD considerably, and it doesn't appear to have any users left. Is there anyone

Re: [ceph-users] PGs going inconsistent after stopping the primary

2015-07-22 Thread Samuel Just
Looks like it's just a stat error. The primary appears to have the correct stats, but the replica for some reason doesn't (thinks there's an object for some reason). I bet it clears itself it you perform a write on the pg since the primary will send over its stats. We'd need information from

Re: [ceph-users] PGs going inconsistent after stopping the primary

2015-07-22 Thread Samuel Just
Annoying that we don't know what caused the replica's stat structure to get out of sync. Let us know if you see it recur. What were those pools used for? -Sam - Original Message - From: "Dan van der Ster" To: "Samuel Just" Cc: ceph-users@lists.ceph.com

Re: [ceph-users] PGs going inconsistent after stopping the primary

2015-07-23 Thread Samuel Just
Oh, if you were running dev releases, it's not super surprising that the stat tracking was at some point buggy. -Sam - Original Message - From: "Dan van der Ster" To: "Samuel Just" Cc: ceph-users@lists.ceph.com Sent: Thursday, July 23, 2015 8:21:07 AM Subj

Re: [ceph-users] why are there "degraded" PGs when adding OSDs?

2015-07-27 Thread Samuel Just
Hmm, that's odd. Can you attach the osdmap and ceph pg dump prior to the addition (with all pgs active+clean), then the osdmap and ceph pg dump afterwards? -Sam - Original Message - From: "Chad William Seys" To: "Samuel Just" , "ceph-users" Sent

Re: [ceph-users] why are there "degraded" PGs when adding OSDs?

2015-07-28 Thread Samuel Just
sage - From: "Chad William Seys" To: "Samuel Just" Cc: "ceph-users" Sent: Tuesday, July 28, 2015 7:40:31 AM Subject: Re: [ceph-users] why are there "degraded" PGs when adding OSDs? Hi Sam, Trying again today with crush tunables set to firefly. Degraded pe

Re: [ceph-users] Inconsistent PGs that ceph pg repair does not fix

2015-08-03 Thread Samuel Just
Hrm, that's certainly supposed to work. Can you make a bug? Be sure to note what version you are running (output of ceph-osd -v). -Sam On Mon, Aug 3, 2015 at 12:34 PM, Andras Pataki wrote: > Summary: I am having problems with inconsistent PG's that the 'ceph pg > repair' command does not fix.

[ceph-users] C++11 and librados C++

2015-08-03 Thread Samuel Just
It seems like it's about time for us to make the jump to C++11. This is probably going to have an impact on users of the librados C++ bindings. It seems like such users would have to recompile code using the librados C++ libraries after upgrading the librados library version. Is that reasonable?

Re: [ceph-users] Is it safe to increase pg number in a production environment

2015-08-04 Thread Samuel Just
It will cause a large amount of data movement. Each new pg after the split will relocate. It might be ok if you do it slowly. Experiment on a test cluster. -Sam On Mon, Aug 3, 2015 at 12:57 AM, 乔建峰 wrote: > Hi Cephers, > > This is a greeting from Jevon. Currently, I'm experiencing an issue whi
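Doing the split "slowly" amounts to raising pg_num (and then pgp_num) in small increments and letting the cluster settle between steps; a dry-run sketch that only prints the commands (pool name and step targets are assumptions):

```shell
# Dry run: emit the stepped pg_num/pgp_num increases rather than
# executing them; review the plan, then run each pair and wait for
# HEALTH_OK before the next step.
pg_split_plan() {
    pool=$1; shift
    for pgs in "$@"; do
        echo "ceph osd pool set $pool pg_num $pgs"
        echo "ceph osd pool set $pool pgp_num $pgs"
    done
}
pg_split_plan rbd 256 512 1024   # hypothetical pool name and targets
```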

Re: [ceph-users] Repair inconsistent pgs..

2015-08-18 Thread Samuel Just
Is the number of inconsistent objects growing? Can you attach the whole ceph.log from the 6 hours before and after the snippet you linked above? Are you using cache/tiering? Can you attach the osdmap (ceph osd getmap -o )? -Sam On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor wrote: > ceph -

Re: [ceph-users] Repair inconsistent pgs..

2015-08-18 Thread Samuel Just
Also, what command are you using to take snapshots? -Sam On Tue, Aug 18, 2015 at 8:48 AM, Samuel Just wrote: > Is the number of inconsistent objects growing? Can you attach the > whole ceph.log from the 6 hours before and after the snippet you > linked above? Are you using cache/tier

Re: [ceph-users] any recommendation of using EnhanceIO?

2015-08-18 Thread Samuel Just
1. We've kicked this around a bit. What kind of failure semantics would you be comfortable with here (that is, what would be reasonable behavior if the client side cache fails)? 2. We've got a branch which should merge soon (tomorrow probably) which actually does allow writes to be proxied, so th

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
lume to new one. > > But after that - scrub errors growing... Was 15 errors.. .Now 35... We laos > try to out OSD which was lead, but after rebalancing this 2 pgs still have > 35 scrub errors... > > ceph osd getmap -o - attached > > > 2015-08-18 18:48 GMT+03:00 Samuel Just : &

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
Also, was there at any point a power failure/power cycle event, perhaps on osd 56? -Sam On Thu, Aug 20, 2015 at 9:23 AM, Samuel Just wrote: > Ok, you appear to be using a replicated cache tier in front of a > replicated base tier. Please scrub both inconsistent pgs and post the > ceph

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
What was the issue? -Sam On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor wrote: > Samuel, we turned off cache layer few hours ago... > I will post ceph.log in few minutes > > For snap - we found issue, was connected with cache tier.. > > 2015-08-20 19:23 GMT+03:00 Samuel Ju

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
requested from cache layer. > > 2015-08-20 19:53 GMT+03:00 Samuel Just : >> >> What was the issue? >> -Sam >> >> On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor >> wrote: >> > Samuel, we turned off cache layer few hours ago... >> > I

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
Which docs? -Sam On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor wrote: > Not yet. I will create. > But according to mail lists and Inktank docs - it's expected behaviour when > cache enable > > 2015-08-20 19:56 GMT+03:00 Samuel Just : >> >> Is there a bug

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
/ceph-users@lists.ceph.com/msg18338.html > > 2015-08-20 20:06 GMT+03:00 Samuel Just : >> >> Which docs? >> -Sam >> >> On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor >> wrote: >> > Not yet. I will create. >> > But according to mail li

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
The feature bug for the tool is http://tracker.ceph.com/issues/12740. -Sam On Thu, Aug 20, 2015 at 2:52 PM, Samuel Just wrote: > Ah, this is kind of silly. I think you don't have 37 errors, but 2 > errors. pg 2.490 object > 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
it? > > I mean i can help with coding/testing/etc... > > 2015-08-21 0:52 GMT+03:00 Samuel Just : >> >> Ah, this is kind of silly. I think you don't have 37 errors, but 2 >> errors. pg 2.490 object >> 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapd

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
block names started with this... > >> Actually, now that I think about it, you probably didn't remove the >> images for 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 >> and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2 > > > > > 2015-08

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
you'll probably just remove the images. -Sam On Thu, Aug 20, 2015 at 3:45 PM, Voloshanenko Igor wrote: > Image? One? > > We start deleting images only to fix thsi (export/import)m before - 1-4 > times per day (when VM destroyed)... > > > > 2015-08-21 1:44 GMT+03:00 Sa

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? -Sam On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic wrote: > This was related to the caching layer, which doesnt support snapshooting per > docs...for sake of closing the thread. > > On 17 August 2015 at 21:15, Voloshanen

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Also, can you include the kernel version? -Sam On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just wrote: > Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? > -Sam > > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic > wrote: >> This was related to the

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
17 17:37:22 UTC > 2015 x86_64 x86_64 x86_64 GNU/Linux > > 2015-08-21 1:54 GMT+03:00 Samuel Just : >> >> Also, can you include the kernel version? >> -Sam >> >> On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just wrote: >> > Snapshotting with cache/t

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
x-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC >> 2015 x86_64 x86_64 x86_64 GNU/Linux >> >> 2015-08-21 1:54 GMT+03:00 Samuel Just : >>> >>> Also, can you include the kernel version? >>> -Sam >>> >>> On Thu, Aug 2

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Certainly, don't reproduce this with a cluster you care about :). -Sam On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just wrote: > What's supposed to happen is that the client transparently directs all > requests to the cache pool rather than the cold pool when there is a > cache p

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
So you started draining the cache pool before you saw either the inconsistent pgs or the anomalous snap behavior? (That is, writeback mode was working correctly?) -Sam On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor wrote: > Good joke ) > > 2015-08-21 2:06 GMT+03:00 Sa

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Created a ticket to improve our testing here -- this appears to be a hole. http://tracker.ceph.com/issues/12742 -Sam On Thu, Aug 20, 2015 at 4:09 PM, Samuel Just wrote: > So you started draining the cache pool before you saw either the > inconsistent pgs or the anomalous snap behavior?

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
th data and evict/flush started... > > > > 2015-08-21 2:09 GMT+03:00 Samuel Just : >> >> So you started draining the cache pool before you saw either the >> inconsistent pgs or the anomalous snap behavior? (That is, writeback >> mode was working correctly?) >> -Sam >

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Also, what do you mean by "change journal side"? -Sam On Thu, Aug 20, 2015 at 4:15 PM, Samuel Just wrote: > Not sure what you mean by: > > but it's stop to work in same moment, when cache layer fulfilled with > data and evict/flush started... > -Sam >

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
notification from monitoring that we collect about 750GB in > hot pool ) So i changed values for max_object_bytes to be 0,9 of disk > size... And then evicting/flushing started... > > And issue with snapshots arrived > > 2015-08-21 2:15 GMT+03:00 Samuel Just : >> >> Not sur

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Yeah, I'm trying to confirm that the issues did happen in writeback mode. -Sam On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor wrote: > Right. But issues started... > > 2015-08-21 2:20 GMT+03:00 Samuel Just : >> >> But that was still in writeback mode, right? &g

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Specifically, the snap behavior (we already know that the pgs went inconsistent while the pool was in writeback mode, right?). -Sam On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just wrote: > Yeah, I'm trying to confirm that the issues did happen in writeback mode. > -Sam > > On Thu,

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
And you adjusted the journals by removing the osd, recreating it with a larger journal, and reinserting it? -Sam On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor wrote: > Right ( but also was rebalancing cycle 2 day before pgs corrupted) > > 2015-08-21 2:23 GMT+03:00 Sa

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Ok, create a ticket with a timeline and all of this information, I'll try to look into it more tomorrow. -Sam On Thu, Aug 20, 2015 at 4:25 PM, Voloshanenko Igor wrote: > Exactly > > On Friday, 21 August 2015, Samuel Just wrote: > >> And you adjusted the j

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
}-${TYPE} weight 9.000/item ${HOST}-${TYPE} weight > 1.000/" cm > > echo "Compile new CRUSHMAP" > crushtool -c cm -o cm.new > > echo "Inject new CRUSHMAP" > ceph osd setcrushmap -i cm.new > > #echo "Clean..." > #rm -rf cm cm.new >

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-21 Thread Samuel Just
Odd, did you happen to capture osd logs? -Sam On Thu, Aug 20, 2015 at 8:10 PM, Ilya Dryomov wrote: > On Fri, Aug 21, 2015 at 2:02 AM, Samuel Just wrote: >> What's supposed to happen is that the client transparently directs all >> requests to the cache pool rather than the

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-21 Thread Samuel Just
I think I found the bug -- need to whiteout the snapset (or decache it) upon evict. http://tracker.ceph.com/issues/12748 -Sam On Fri, Aug 21, 2015 at 8:04 AM, Ilya Dryomov wrote: > On Fri, Aug 21, 2015 at 5:59 PM, Samuel Just wrote: >> Odd, did you happen to capture osd logs? > &

Re: [ceph-users] Help with inconsistent pg on EC pool, v9.0.2

2015-08-28 Thread Samuel Just
David, does this look familiar? -Sam On Fri, Aug 28, 2015 at 10:43 AM, Aaron Ten Clay wrote: > Hi Cephers, > > I'm trying to resolve an inconsistent pg on an erasure-coded pool, running > Ceph 9.0.2. I can't seem to get Ceph to run a repair or even deep-scrub the > pg again. Here's the background

Re: [ceph-users] ceph -w warning "I don't have pgid 0.2c8"?

2013-07-17 Thread Samuel Just
What version are you running? How did you move the osds from 2TB to 4TB? -Sam On Wed, Jul 17, 2013 at 12:59 AM, Ta Ba Tuan wrote: > Hi everyone, > > I converted every osds from 2TB to 4TB, and when moving complete, show log > Ceph realtime"ceph -w": > displays error: "I don't have pgid 0.2c8" >

Re: [ceph-users] ceph -w warning "I don't have pgid 0.2c8"?

2013-07-18 Thread Samuel Just
on't know how to remove those pgs?. > Please guiding this error help me! > > Thank you! > --tuantaba > TA BA TUAN > > > On 07/18/2013 01:16 AM, Samuel Just wrote: > > What version are you running? How did you move the osds from 2TB to 4TB? > -Sam > > On We

Re: [ceph-users] how to repair laggy storage cluster

2013-08-09 Thread Samuel Just
Can you attach the output of ceph -s? -Sam On Fri, Aug 9, 2013 at 11:10 AM, Suresh Sadhu wrote: > how to repair laggy storage cluster,able to create images on the pools even > if HEATH state shows WARN, > > > > sudo ceph > > HEALTH_WARN 181 pgs degraded; 676 pgs stuck unclean; recovery 2/107 degr

Re: [ceph-users] OSD Keep Crashing

2013-08-12 Thread Samuel Just
Can you post more of the log? There should be a line towards the bottom indicating the line with the failed assert. Can you also attach ceph pg dump, ceph osd dump, ceph osd tree? -Sam On Mon, Aug 12, 2013 at 11:54 AM, John Wilkins wrote: > Stephane, > > You should post any crash bugs with sta

Re: [ceph-users] ceph-deploy and journal on separate disk

2013-08-12 Thread Samuel Just
Did you try using ceph-deploy disk zap ceph001:sdaa first? -Sam On Mon, Aug 12, 2013 at 6:21 AM, Pavel Timoschenkov wrote: > Hi. > > I have some problems with create journal on separate disk, using ceph-deploy > osd prepare command. > > When I try execute next command: > > ceph-deploy osd prepare

Re: [ceph-users] mounting a pool via fuse

2013-08-12 Thread Samuel Just
Can you elaborate on what behavior you are looking for? -Sam On Fri, Aug 9, 2013 at 4:37 AM, Georg Höllrigl wrote: > Hi, > > I'm using ceph 0.61.7. > > When using ceph-fuse, I couldn't find a way, to only mount one pool. > > Is there a way to mount a pool - or is it simply not supported? > > > >

Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-12 Thread Samuel Just
Can you attach the output of ceph osd tree? Also, can you run ceph osd getmap -o /tmp/osdmap and attach /tmp/osdmap? -Sam On Fri, Aug 9, 2013 at 4:28 AM, Jeff Moskow wrote: > Thanks for the suggestion. I had tried stopping each OSD for 30 seconds, > then restarting it, waiting 2 minutes and t

Re: [ceph-users] run ceph without auth

2013-08-12 Thread Samuel Just
I have referred you to someone more conversant with the details of mkcephfs, but for dev purposes, most of us use the vstart.sh script in src/ (http://ceph.com/docs/master/dev/). -Sam On Fri, Aug 9, 2013 at 2:59 AM, Nulik Nol wrote: > Hi, > I am configuring a single node for developing purposes,

Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-12 Thread Samuel Just
Are you using any kernel clients? Will osds 3,14,16 be coming back? -Sam On Mon, Aug 12, 2013 at 2:26 PM, Jeff Moskow wrote: > Sam, > > I've attached both files. > > Thanks! > Jeff > > On Mon, Aug 12, 2013 at 01:46:57PM -0700, Samuel Just wrote: &g

Re: [ceph-users] How to set Object Size/Stripe Width/Stripe Count?

2013-08-12 Thread Samuel Just
I think the docs you are looking for are http://ceph.com/docs/master/man/8/cephfs/ (specifically the set_layout command). -Sam On Thu, Aug 8, 2013 at 7:48 AM, Da Chun wrote: > Hi list, > I saw the info about data striping in > http://ceph.com/docs/master/architecture/#data-striping . > But couldn

Re: [ceph-users] Ceph pgs stuck unclean

2013-08-12 Thread Samuel Just
Can you attach the output of: ceph -s ceph pg dump ceph osd dump and run ceph osd getmap -o /tmp/osdmap and attach /tmp/osdmap/ -Sam On Wed, Aug 7, 2013 at 1:58 AM, Howarth, Chris wrote: > Hi, > > One of our OSD disks failed on a cluster and I replaced it, but when it > failed it did not

Re: [ceph-users] could not generate the bootstrap key

2013-08-12 Thread Samuel Just
Can you give a step by step account of what you did prior to the error? -Sam On Tue, Aug 6, 2013 at 10:52 PM, 於秀珠 wrote: > using the ceph-deploy to manage a existing cluster,i follow the steps in the > document ,but there is some errors that i can not gather the keys. > when i run the command "ce

Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-12 Thread Samuel Just
> On Mon, Aug 12, 2013 at 02:41:11PM -0700, Samuel Just wrote: >> Are you using any kernel clients? Will osds 3,14,16 be coming back? >> -Sam >> >> On Mon, Aug 12, 2013 at 2:26 PM, Jeff Moskow wrote: >> > Sam, >> > >> > I've attac

Re: [ceph-users] one pg stuck with 2 unfound pieces

2013-08-13 Thread Samuel Just
You can run 'ceph pg 0.cfa mark_unfound_lost revert'. (Revert Lost section of http://ceph.com/docs/master/rados/operations/placement-groups/). -Sam On Tue, Aug 13, 2013 at 6:50 AM, Jens-Christian Fischer wrote: > We have a cluster with 10 servers, 64 OSDs and 5 Mons on them. The OSDs are > 3TB di
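The revert command generalizes to any pg reporting unfound objects; a dry-run sketch that only prints the commands (the pg id list is an assumption, taken from the thread; get the real ones from `ceph health detail`):

```shell
# Print (don't run) the revert command for each pg with unfound objects.
# Reverting rolls unfound objects back to their previous version, so use
# it only once the objects are known to be unrecoverable.
revert_unfound_plan() {
    for pg in "$@"; do
        echo "ceph pg $pg mark_unfound_lost revert"
    done
}
revert_unfound_plan 0.cfa   # pg id from the thread
```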

Re: [ceph-users] Ceph pgs stuck unclean

2013-08-13 Thread Samuel Just
vered": 45, > "num_bytes_recovered": 188743680, > "num_keys_recovered": 0}, > "stat_cat_sum": {}, > "up": [ > 5, > 4], > "acting": [ > 5, >

Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-13 Thread Samuel Just
Cool! -Sam On Tue, Aug 13, 2013 at 4:49 AM, Jeff Moskow wrote: > Sam, > > Thanks that did it :-) > >health HEALTH_OK >monmap e17: 5 mons at > {a=172.16.170.1:6789/0,b=172.16.170.2:6789/0,c=172.16.170.3:6789/0,d=172.16.170.4:6789/0,e=172.16.170.5:6789/0}, > election epoch 9794, quorum
