Re: [ceph-users] xfs corruption

2016-03-07 Thread Ric Wheeler
You are right that some cards might not send those commands on to the backend storage, but spinning disks don't usually implement either trim or discard (SSD's do though). XFS, ext4, etc can pass down those commands to the firmware on the card and it is up to the firmware to propagate the comm

Re: [ceph-users] xfs corruption

2016-03-07 Thread Ferhat Ozkasgarli
I am always forgetting this reply all things. *RAID5 and RAID10 (or other raid levels) are a property of the block devices. XFS, ext4, etc can pass down those commands to the firmware on the card and it is up to the firmware to propagate the command on to the backend drives.* You mean I can get a

Re: [ceph-users] xfs corruption

2016-03-07 Thread Ric Wheeler
Unfortunately, you will have to follow up with the hardware RAID card vendors to see what commands their firmware handles. Good luck! Ric On 03/07/2016 01:37 PM, Ferhat Ozkasgarli wrote: I am always forgetting this reply all things. / / /RAID5 and RAID10 (or other raid levels) are a proper

Re: [ceph-users] Cache tier operation clarifications

2016-03-07 Thread Nick Fisk
Hi Christian, > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Christian Balzer > Sent: 07 March 2016 02:22 > To: ceph-users > Subject: Re: [ceph-users] Cache tier operation clarifications > > > Hello, > > I'd like to get some insights,

[ceph-users] After Reboot no OSD disks mountend

2016-03-07 Thread Martin Palma
Hi All, we are in the middle of patching our OSD servers and noticed that after rebooting no OSD disk is mounted and therefore no OSD service starts. We then have to manually call "ceph-disk-activate /dev/sdX1" for all our disks in order to mount and start the OSD service again. Here a the versio

Re: [ceph-users] After Reboot no OSD disks mountend

2016-03-07 Thread Dan van der Ster
Hi, As a workaround you can add "ceph-disk activate-all" to rc.local. (We use this all the time anyway just in case...) -- Dan On Mon, Mar 7, 2016 at 9:38 AM, Martin Palma wrote: > Hi All, > > we are in the middle of patching our OSD servers and noticed that > after rebooting no OSD disk is mou
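
Dan's workaround could be wired into rc.local roughly as follows. This is a minimal sketch, assuming a sysvinit-style /etc/rc.local that runs late in boot; the activate_osds wrapper and the check for the ceph-disk binary are illustrative additions, not part of the original message.

```shell
#!/bin/sh
# Fragment for /etc/rc.local (assumed path; adjust for your distribution).
# activate_osds is a hypothetical wrapper name used only for this sketch.
activate_osds() {
    if command -v ceph-disk >/dev/null 2>&1; then
        # Mount and start every prepared OSD partition that udev did not activate.
        ceph-disk activate-all
    else
        # Do not fail the boot script on hosts without ceph-disk installed.
        echo "ceph-disk not found; skipping OSD activation" >&2
    fi
}

activate_osds
```

Guarding on the binary keeps the same rc.local usable on hosts that carry no OSDs.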

Re: [ceph-users] xfs corruption

2016-03-07 Thread Jan Schermer
This functionality is common on RAID controllers in combination with HCL-certified drives. This usually means that you can't rely on it working unless you stick to the exact combination that's certified, which is impossible in practice. For example LSI controllers do this if you get the right SS

Re: [ceph-users] After Reboot no OSD disks mountend

2016-03-07 Thread Martin Palma
Hi Dan, thanks for the quick reply and fix suggestion. So we are not the only ones facing this issue :-) Best, Martin On Mon, Mar 7, 2016 at 10:04 AM, Dan van der Ster wrote: > Hi, > > As a workaround you can add "ceph-disk activate-all" to rc.local. > (We use this all the time anyway just in c

Re: [ceph-users] After Reboot no OSD disks mountend

2016-03-07 Thread Dan van der Ster
Hi, To clarify, I didn't notice this issue in 0.94.6 specifically... I just don't trust the udev magic to work every time after every kernel upgrade, etc. -- Dan On Mon, Mar 7, 2016 at 10:20 AM, Martin Palma wrote: > Hi Dan, > > thanks for the quick reply and fix suggestion. So we are not the

Re: [ceph-users] After Reboot no OSD disks mountend

2016-03-07 Thread Martin Palma
We have now also tested it on 0.94.6, and after rebooting the OSD host the disks aren't mounted and therefore no OSD service is running. So I'm with you and don't trust the udev magic. Best, Martin On Mon, Mar 7, 2016 at 11:00 AM, Dan van der Ster wrote: > Hi, > > To clarify, I didn't notice thi

Re: [ceph-users] Ceph & systemctl on Debian

2016-03-07 Thread Christian Balzer
Hello, since everybody else who might actually have answers for you will ask this: What version of Debian (Jessie one assumes)? What Ceph packages, Debian ones or from the Ceph repository? Exact versions please. As for me, I had similar experiences with Firefly (Debian package) under Jessie and

Re: [ceph-users] slow requests with rbd

2016-03-07 Thread Jan Krcmar
hi, i've got the following sysctl set: kernel.perf_event_paranoid=2 kernel.watchdog_thresh=10 vm.min_free_kbytes=262144 could any of these configs cause the problem? fous 2016-03-04 13:48 GMT+01:00 Max A. Krasilnikov : > Hello! > > On Fri, Mar 04, 2016 at 01:33:24PM +0100, honza801 wrote:

[ceph-users] 1 more way to kill OSD

2016-03-07 Thread Dzianis Kahanovich
This issue was fixed by "xfs_repair -L". 1) Megaraid SAS (Intel's SATA still unaffected, but time was limited), spinning HDDs. 2) XFS v5 (mkfs.xfs -m crc=1,finobt=1 ...) 3) mount -o logbsize=256k,logbufs=8... (noatime,attr2,inode64,noquota still in both cases) After few hours starts to xfs failur

Re: [ceph-users] Cache tier operation clarifications

2016-03-07 Thread Christian Balzer
Hello nick, On Mon, 7 Mar 2016 08:30:52 - Nick Fisk wrote: > Hi Christian, > > > -Original Message- > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf > > Of Christian Balzer > > Sent: 07 March 2016 02:22 > > To: ceph-users > > Subject: Re: [ceph-users] Cache

Re: [ceph-users] Problem: silently corrupted RadosGW objects caused by slow requests

2016-03-07 Thread Ritter Sławomir
> > Yes, > > > > fix for #8269 also has been included in our version: Dumpling 0.67.11. > > Guys from #13764 are using patched Hammer version > > I didn't notice that you were actually running Dumpling (which we > haven't supported and backported fixes for a while). Here's one issue > that you mig

[ceph-users] Port a cluster

2016-03-07 Thread Sándor Szombat
Hi guys! I want to port my cluster without using the rados import/export function. The scenario is as follows: I made a cluster with 3 nodes and filled it with data. After this I ran *ceph-deploy gatherkeys host1 *and *ceph-deploy config pull host1* -> I have keyring files and config file. I make an image from

[ceph-users] Problems with starting services on Debian Jessie/Infernalis

2016-03-07 Thread Josef Johansson
Hi, We’re setting up a new cluster, but we’re having trouble restarting the monitor services. The problem is the difference between the ceph.service and ceph-mon@osd11 service in our case. root@osd11:/etc/init.d# /bin/systemctl status ceph.service ● ceph.service - LSB: Start Ceph distributed

Re: [ceph-users] Can I rebuild object maps while VMs are running ?

2016-03-07 Thread Jason Dillaman
That's disheartening to hear that your RBD images were corrupted -- do you have any more detail as to what happened? Enabling the object map is designed to flag the object map as invalid, so it won't be used as a reference for any IO ops until it is successfully rebuilt. Documentation of these

Re: [ceph-users] Port a cluster

2016-03-07 Thread Oliver Dzombic
Hi Sandor, the deploy script is primarily a script to maintain existing ceph environments and build new ones. If you want to use existing keys/information you will have to modify the deploy script according to your needs. -- What should work would be, if you create your new cluster with

Re: [ceph-users] osd up_from, up_thru

2016-03-07 Thread Gregory Farnum
On Sun, Mar 6, 2016 at 10:56 PM, min fang wrote: > Dear, I used osd dump to extract osd monmap, and found up_from, up_thru > list, what is the difference between up_from and up_thru? > > osd.0 up in weight 1 up_from 673 up_thru 673 down_at 670 > last_clean_interval [637,669) up_from is when t

Re: [ceph-users] Infernalis 9.2.1: the "rados df"ommand show wrong data

2016-03-07 Thread Gregory Farnum
On Fri, Mar 4, 2016 at 11:56 PM, Mike Almateia wrote: > Hello Cephers! > > On my small cluster I see this: > > [root@c1 ~]# rados df > pool name KB objects clones degraded unfound > rdrd KB wrwr KB > data 0

Re: [ceph-users] Ceph & systemctl on Debian

2016-03-07 Thread Bill Sanders
Avoiding the merits of SysV vs SystemD: I went and grabbed the systemd init scripts (unit files, whatever) from upstream. As Christian suggests, answers will vary depending on your ceph version (and where your packages came from), but if you go to: https://github.com/ceph/ceph/tree/master/systemd

[ceph-users] osds crashing on Thread::create

2016-03-07 Thread Mike Lovell
first off, hello all. this is my first time posting to the list. i have seen a recurring problem that started in the past week or so on one of my ceph clusters. osds will crash and it seems to happen whenever backfill or recovery is started. looking at the logs it appears that the osd is

[ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Jeffrey McDonald
Hi, For a while, we've been seeing inconsistent placement groups on our erasure coded system. The placement groups go from a state of active+clean to active+clean+inconsistent after a deep scrub: 2016-03-07 13:45:42.044131 7f385d118700 -1 log_channel(cluster) log [ERR] : 70.320s0 deep-scrub st

Re: [ceph-users] osds crashing on Thread::create

2016-03-07 Thread Gregory Farnum
On Mon, Mar 7, 2016 at 11:04 AM, Mike Lovell wrote: > first off, hello all. this is my first time posting to the list. > > i have seen a recurring problem that started in the past week or so on > one of my ceph clusters. osds will crash and it seems to happen whenever > backfill or recovery i

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Gregory Farnum
On Mon, Mar 7, 2016 at 12:07 PM, Jeffrey McDonald wrote: > Hi, > > For a while, we've been seeing inconsistent placement groups on our erasure > coded system. The placement groups go from a state of active+clean to > active+clean+inconsistent after a deep scrub: > > > 2016-03-07 13:45:42.044131

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Gregory Farnum
[ Keeping this on the users list. ] Okay, so next time this happens you probably want to do a pg query on the PG which has been reported as dirty. I can't help much beyond that, but hopefully Kefu or David will chime in once there's a little more for them to look at. -Greg On Mon, Mar 7, 2016 at

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Jeffrey McDonald
Here is a PG which just went inconsistent: pg 70.459 is active+clean+inconsistent, acting [307,210,273,191,132,450] Attached is the result of a pg query on this. I will wait for your feedback before issuing a repair. From what I read, the inconsistencies are more likely the result of ntp, but

Re: [ceph-users] Cache Pool and EC: objects didn't flush to a cold EC storage

2016-03-07 Thread Robert LeBlanc
Did you also set "target_max_bytes" to the size of the pool? That bit us when we didn't have it set. The ratio then uses the target_max_bytes to know when to flush.
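
The relationship Robert describes can be sketched as simple arithmetic. This assumes the ratio in question is the pool's cache_target_dirty_ratio and that flushing starts once dirty data reaches that fraction of target_max_bytes; the helper name, the pool name "hot-cache", and the example sizes are all hypothetical.

```shell
#!/bin/sh
# flush_threshold_bytes TARGET_MAX_BYTES DIRTY_PERCENT
# Prints the byte count at which flushing would begin, using integer
# percent to stay within plain shell arithmetic (no floating point).
flush_threshold_bytes() {
    echo $(( $1 * $2 / 100 ))
}

# Example: a 1 TiB target with a 40% dirty ratio.
flush_threshold_bytes 1099511627776 40

# The matching pool settings would be applied with something like
# (pool name "hot-cache" is an assumption for illustration):
#   ceph osd pool set hot-cache target_max_bytes 1099511627776
#   ceph osd pool set hot-cache cache_target_dirty_ratio 0.4
```

Without target_max_bytes (or target_max_objects) there is nothing for the ratio to be a fraction of, which would be consistent with objects never being flushed.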

Re: [ceph-users] how to downgrade when upgrade from firefly to hammer fail

2016-03-07 Thread Robert LeBlanc
There is no downgrade path. You are best off trying to fix the issue preventing the upgrade. Post some of the logs from the upgraded OSD and people can try to help you out.

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Jeffrey McDonald
During the 'recovery' of the first pg mentioned, I see only these messages on the primary acting OSD: 016-03-07 13:51:28.468358 7f385d118700 -1 log_channel(cluster) log [ERR] : 70.320s0 repair 18 missing, 0 inconsistent objects 2016-03-07 13:51:28.469431 7f385d118700 -1 log_channel(cluster) log [E

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Samuel Just
Can you enable debug osd = 20 debug filestore = 20 debug ms = 1 on all osds in that PG, rescrub, and convey to us the resulting logs? -Sam On Mon, Mar 7, 2016 at 1:36 PM, Jeffrey McDonald wrote: > Here is a PG which just went inconsistent: > > pg 70.459 is active+clean+inconsistent, acting [307
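
One way to apply Sam's settings at runtime, without editing ceph.conf, is the injectargs form Jeffrey uses later in the thread. The sketch below only prints the commands so they can be reviewed first; the debug_cmds function name is an illustrative assumption, and the OSD ids are the acting set reported for pg 70.459.

```shell
#!/bin/sh
# debug_cmds "ID ID ...": print (do not run) one injectargs command
# per OSD id in the given space-separated acting set.
debug_cmds() {
    for i in $1; do
        printf "ceph tell osd.%s injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'\n" "$i"
    done
}

# Acting set of pg 70.459 as posted in the thread.
debug_cmds "307 210 273 191 132 450"
```

Piping the output to sh would execute the commands; injected values last only until the OSD restarts.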

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Samuel Just
Also, what version are you running? I think the unfound after repair bug has actually been fixed. -Sam On Mon, Mar 7, 2016 at 1:45 PM, Jeffrey McDonald wrote: > During the 'recovery' of the first pg mentioned, I see only these messages > on the primary acting OSD: > > 016-03-07 13:51:28.468358 7

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Jeffrey McDonald
Do you want me to enable this for the pg already with unfound objects or the placement group just scrubbed and now inconsistent? Jeff On Mon, Mar 7, 2016 at 3:54 PM, Samuel Just wrote: > Can you enable > > debug osd = 20 > debug filestore = 20 > debug ms = 1 > > on all osds in that PG, rescrub,

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Samuel Just
The one just scrubbed and now inconsistent. -Sam On Mon, Mar 7, 2016 at 1:57 PM, Jeffrey McDonald wrote: > Do you want me to enable this for the pg already with unfound objects or the > placement group just scrubbed and now inconsistent? > Jeff > > On Mon, Mar 7, 2016 at 3:54 PM, Samuel Just wro

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Samuel Just
I think the unfound object on repair is fixed by d51806f5b330d5f112281fbb95ea6addf994324e (not in hammer yet). I opened http://tracker.ceph.com/issues/15002 for the backport and to make sure it's covered in ceph-qa-suite. No idea at this time why the objects are disappearing though. -Sam On Mon,

[ceph-users] deleting objects with a full OSD

2016-03-07 Thread David Chen
Hi, If I write too much data to a Ceph cluster such that an OSD becomes full, it seems that I am unable to delete any objects. I've read about a few options to reduce usage and get out of this situation, namely: (A) add OSDs, or (B) temporarily disable backfill then increase the full ratio, or (C

[ceph-users] query about running ceph from source code

2016-03-07 Thread Ridwan Rashid Noel
Hi, I am new to ceph deployment from source code. I am trying to run ceph (version Infernalis) from source code in virtualbox. I have built the source according to http://docs.ceph.com/docs/master/install/build-ceph/ and also installed the build using sudo make install I want to know what are th

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Jeffrey McDonald
Hi Sam, I've done as you requested: pg 70.459 is active+clean+inconsistent, acting [307,210,273,191,132,450] # for i in 307 210 273 191 132 450 ; do > ceph tell osd.$i injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1' > done debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1 debug_o

Re: [ceph-users] osds crashing on Thread::create

2016-03-07 Thread Mike Lovell
i just checked several of the osds running in the environment and the hard and soft limits for the number of processes are set to 257486. if it's exceeding that, then it seems like there would still be a bug somewhere. i can't imagine it needing that many. $ for N in `pidof ceph-osd`; do echo ${N};
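
A check along these lines can be sketched as follows. The body of Mike's loop is cut off in the archive, so this is not his command; it assumes a Linux /proc filesystem, and the osd_proc_info helper name is hypothetical. It prints each ceph-osd's process limit next to its current thread count, which is the comparison the message is making.

```shell
#!/bin/sh
# osd_proc_info PID: print the "Max processes" limit line for PID and
# the number of threads it currently has (one /proc/PID/task entry each).
osd_proc_info() {
    pid=$1
    grep 'Max processes' "/proc/$pid/limits"
    printf 'threads: %s\n' "$(ls "/proc/$pid/task" | wc -l)"
}

# Report on every running ceph-osd; prints nothing if none are running.
for N in $(pidof ceph-osd 2>/dev/null); do
    echo "pid ${N}"
    osd_proc_info "$N"
done
```

Threads count against the per-user process limit, so the sum across all OSDs run by the same user is the figure to compare, alongside system-wide caps such as kernel.pid_max.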

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Samuel Just
So after the scrub, it came up clean? The inconsistent/missing objects reappeared? -Sam On Mon, Mar 7, 2016 at 2:33 PM, Jeffrey McDonald wrote: > Hi Sam, > > I've done as you requested: > > pg 70.459 is active+clean+inconsistent, acting [307,210,273,191,132,450] > > # for i in 307 210 273 191 13

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Samuel Just
If so, that strongly suggests that the pg was actually never inconsistent in the first place and that the bug is in scrub itself presumably getting confused about an object during a write. The next step would be to get logs like the above from a pg as it scrubs transitioning from clean to inconsis

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Samuel Just
Hmm, at the end of the log, the pg is still inconsistent. Can you attach a ceph pg query on that pg? -Sam On Mon, Mar 7, 2016 at 3:05 PM, Samuel Just wrote: > If so, that strongly suggests that the pg was actually never > inconsistent in the first place and that the bug is in scrub itself > pres

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Samuel Just
Hmm, instead, please rescrub on the same osds the same pg with the same logging and send the logs again. There are two inconsistent objects in the last set of logs (not 18). I bet in the next scrub either none are inconsistent, or it's a disjoint set. -Sam On Mon, Mar 7, 2016 at 3:19 PM, Samuel J

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Shinobu Kinjo
What could cause this kind of unexpected behaviour? Any assumption?? Sorry for interrupting you. Cheers, S On Tue, Mar 8, 2016 at 8:19 AM, Samuel Just wrote: > Hmm, at the end of the log, the pg is still inconsistent. Can you > attach a ceph pg query on that pg? > -Sam > > On Mon, Mar 7, 2016 a

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Samuel Just
Well, the fact that different objects are being selected as inconsistent strongly suggests that the objects are not actually inconsistent. Thus, at the moment my assumption is a bug in scrub... -Sam On Mon, Mar 7, 2016 at 3:31 PM, Shinobu Kinjo wrote: > What could cause this kind of unexpected b

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Samuel Just
Jeffrey: can you confirm through the admin socket the versions running on each of those osds and include the output in your reply? I have a theory about what's causing the objects to be erroneously reported as inconsistent, but it requires that osd.307 be running a different version. -Sam On Mon,

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Jeffrey McDonald
I have one other very strange event happening, I've opened a ticket on it: http://tracker.ceph.com/issues/14766 During this migration, OSD failed probably over 400 times while moving data around. We moved the empty directories and restarted the OSDs. I can't say if this is related--I have no r

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Samuel Just
Have you confirmed the versions? -Sam On Mon, Mar 7, 2016 at 4:29 PM, Jeffrey McDonald wrote: > I have one other very strange event happening, I've opened a ticket on it: > http://tracker.ceph.com/issues/14766 > > During this migration, OSD failed probably over 400 times while moving data > aroun

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Samuel Just
Hmm, so much for that theory, still looking. If you can produce another set of logs (as before) from scrubbing that pg, it might help. -Sam On Mon, Mar 7, 2016 at 4:34 PM, Jeffrey McDonald wrote: > They're all the same. See attached. > > On Mon, Mar 7, 2016 at 6:31 PM, Samuel Just wrote: >>

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Jeffrey McDonald
They're all the same. See attached. On Mon, Mar 7, 2016 at 6:31 PM, Samuel Just wrote: > Have you confirmed the versions? > -Sam > > On Mon, Mar 7, 2016 at 4:29 PM, Jeffrey McDonald wrote: > > I have one other very strange event happening, I've opened a ticket on > it: > > http://tracker.cep

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Jeffrey McDonald
I have three more pgs now showing up as inconsistent. Should I turn on debug for all OSDs to capture the transition from active+clean -> inconsistent? Do I understand correctly that the repair causes the 'unfound' objects because of a bug in the repair command? Regards, Jeff On Mon, Mar 7,

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Samuel Just
What filesystem and kernel are you running on the osds? This (and your other bug, actually) could be explained by some kind of weird kernel readdir behavior. -Sam On Mon, Mar 7, 2016 at 4:36 PM, Samuel Just wrote: > Hmm, so much for that theory, still looking. If you can produce > another set o

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Samuel Just
Yes, the unfound objects are due to a bug in the repair command. I suggest you don't repair anything, actually. I don't think any of the pgs are actually inconsistent. The log I have here is definitely a case of two objects showing up as inconsistent which are actually fine. -Sam On Mon, Mar 7,

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Samuel Just
I'd rather you just scrubbed the same pg with the same osds and the same debugging. -Sam On Mon, Mar 7, 2016 at 4:40 PM, Samuel Just wrote: > Yes, the unfound objects are due to a bug in the repair command. I > suggest you don't repair anything, actually. I don't think any of the > pgs are actu

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Jeffrey McDonald
The filesystem is xfs everywhere, there are nine hosts. The two new ceph nodes 08, 09 have a new kernel. I didn't see the errors in the tracker on the new nodes, but they were only receiving new data, not migrating it. Jeff ceph2: Linux ceph2 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Samuel Just
' I didn't see the errors in the tracker on the new nodes, but they were only receiving new data, not migrating it.' -- What do you mean by that? -Sam On Mon, Mar 7, 2016 at 4:42 PM, Jeffrey McDonald wrote: > The filesystem is xfs everywhere, there are nine hosts. The two new ceph > nodes 08, 0

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Samuel Just
Which node is osd.307 on? -Sam On Mon, Mar 7, 2016 at 4:43 PM, Samuel Just wrote: > ' I didn't see the errors in the tracker on the new nodes, but they > were only receiving new data, not migrating it.' -- What do you mean > by that? > -Sam > > On Mon, Mar 7, 2016 at 4:42 PM, Jeffrey McDonald wr

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Jeffrey McDonald
307 is on ceph03. Jeff On Mon, Mar 7, 2016 at 6:48 PM, Samuel Just wrote: > Which node is osd.307 on? > -Sam > > On Mon, Mar 7, 2016 at 4:43 PM, Samuel Just wrote: > > ' I didn't see the errors in the tracker on the new nodes, but they > > were only receiving new data, not migrating it.' -- Wha

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Samuel Just
Hmm, I'll look into this a bit more tomorrow. Can you get the tree structure of the 70.459 pg directory on osd.307 (find . will do fine). -Sam On Mon, Mar 7, 2016 at 4:50 PM, Jeffrey McDonald wrote: > 307 is on ceph03. > Jeff > > On Mon, Mar 7, 2016 at 6:48 PM, Samuel Just wrote: >> >> Which no

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Samuel Just
On the plus side, I think I figured out http://tracker.ceph.com/issues/14766. -Sam On Mon, Mar 7, 2016 at 4:52 PM, Samuel Just wrote: > Hmm, I'll look into this a bit more tomorrow. Can you get the tree > structure of the 70.459 pg directory on osd.307 (find . will do fine). > -Sam > > On Mon, M

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Jeffrey McDonald
Just to be sure I grab what you need: 1- set debug logs for the pg 70.459 2 - Issue a deep-scrub ceph pg deep-scrub 70.459 3- stop once the 70.459 pg goes inconsistent? Thanks, Jeff On Mon, Mar 7, 2016 at 6:52 PM, Samuel Just wrote: > Hmm, I'll look into this a bit more tomorrow. Can you get

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Samuel Just
Yep, just as before. Actually, do it twice (wait for 'scrubbing' to go away each time). -Sam On Mon, Mar 7, 2016 at 5:25 PM, Jeffrey McDonald wrote: > Just to be sure I grab what you need: > > 1- set debug logs for the pg 70.459 > 2 - Issue a deep-scrub ceph pg deep-scrub 70.459 > 3- stop once t
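
Waiting for 'scrubbing' to go away between the two runs can be sketched as a state check. The is_scrubbing helper is an illustrative assumption, as is get_pg_state in the commented driver, which stands in for however the PG's state string is obtained (for example from the pg query output mentioned earlier in the thread).

```shell
#!/bin/sh
# is_scrubbing STATE: succeed if the PG state string (for example
# "active+clean+scrubbing+deep") contains a scrubbing flag.
is_scrubbing() {
    case "$1" in
        *scrubbing*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical driver, left commented out because get_pg_state is not
# a real command; substitute your own way of reading the PG state:
#   ceph pg deep-scrub 70.459
#   while is_scrubbing "$(get_pg_state 70.459)"; do sleep 30; done
#   ceph pg deep-scrub 70.459
```

Matching on the substring covers both plain scrubs and deep scrubs, since the deep variant only appends a further flag to the state.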

[ceph-users] crush tunable docs and straw_calc_version

2016-03-07 Thread Sage Weil
I rewrote the CRUSH tunable docs after struggling to summarize to a customer what the impact would be to migrate a bunch of older clusters to the latest tunables: https://github.com/ceph/ceph/pull/7964 However, after trying to explain the hammer tunables vs the straw_calc_version tunab

Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system

2016-03-07 Thread Samuel Just
Nevermind on http://tracker.ceph.com/issues/14766 , OSD::remove_dir uses the right collection_list_partial. -Sam On Mon, Mar 7, 2016 at 5:26 PM, Samuel Just wrote: > Yep, just as before. Actually, do it twice (wait for 'scrubbing' to > go away each time). > -Sam > > On Mon, Mar 7, 2016 at 5:25 P

[ceph-users] write iops drops down after testing for some minutes

2016-03-07 Thread Pei Feng Lin
Hi, I setup a ceph environment with Hammer version. I run benchmark for the rbd image with following command: fio --ioengine=rbd --pool=performance_test --rbdname=test1 --clientname=admin --iodepth=32 --direct=1 --rw=randwrite --bs=4k --numjobs=4 --runtime=600 --ramp_time=100 --name=test --gr

[ceph-users] Fwd: write iops drops down after testing for some minutes

2016-03-07 Thread Pei Feng Lin
Hi, I setup a ceph environment with Hammer version. I run benchmark for the rbd image with following command: fio --ioengine=rbd --pool=performance_test --rbdname=test1 --clientname=admin --iodepth=32 --direct=1 --rw=randwrite --bs=4k --numjobs=4 --runtime=600 --ramp_time=100 --name=test --gr

[ceph-users] Ceph Recovery Assistance, pgs stuck peering

2016-03-07 Thread Ben Hines
Howdy, I was hoping someone could help me recover a couple pgs which are causing problems in my cluster. If we aren't able to resolve this soon, we may have to just destroy them and lose some data. Recovery has so far been unsuccessful. Data loss would probably cause some here to reconsider Ceph a

Re: [ceph-users] Fwd: write iops drops down after testing for some minutes

2016-03-07 Thread Christian Balzer
Hello, On Tue, 8 Mar 2016 12:04:23 +0800 Pei Feng Lin wrote: > Hi, > > I setup a ceph environment with Hammer version. I run benchmark for the > rbd image with following command: > As always, exact versions of everything, kernel, OS, Ceph. Then your cluster, HW, network, how the client is con

Re: [ceph-users] Cache Pool and EC: objects didn't flush to a cold EC storage

2016-03-07 Thread Mike Almateia
06-Mar-16 17:28, Christian Balzer wrote: On Sun, 6 Mar 2016 12:17:48 +0300 Mike Almateia wrote: Hello Cephers! When my cluster hit "full ratio" settings, objects from cache pool didn't flush to a cold storage. As always, versions of everything, Ceph foremost. Yes of course, I think it's an

Re: [ceph-users] Cache Pool and EC: objects didn't flush to a cold EC storage

2016-03-07 Thread Mike Almateia
08-Mar-16 00:41, Robert LeBlanc wrote: Did you also set "target_max_bytes" to the size of the pool? That bit us when we didn't have it set. The ratio then uses the target_max_bytes to know when to flush. Yes, later I set this option. But the clus

Re: [ceph-users] Infernalis 9.2.1: the "rados df"ommand show wrong data

2016-03-07 Thread Mike Almateia
07-Mar-16 21:28, Gregory Farnum wrote: On Fri, Mar 4, 2016 at 11:56 PM, Mike Almateia wrote: Hello Cephers! On my small cluster I see this: [root@c1 ~]# rados df pool name KB objects clones degraded unfound rdrd KB wrwr KB data