[ceph-users] Cache tiering and target_max_bytes

2014-08-14 Thread Paweł Sadowski
Hello, I have a cluster of 35 OSDs (30 HDD, 5 SSD) with cache tiering configured. During tests it looks like Ceph is not respecting the target_max_bytes setting. Steps to reproduce: - configure cache tiering - set target_max_bytes to 32G (on the hot pool) - write more than 32G of data - nothing happens
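
A minimal sketch of the settings involved, with hypothetical pool names hot-pool/cold-pool; on releases of that era the flushing/eviction agent generally does nothing until a hit set and the cache_target_* ratios are configured alongside target_max_bytes:

    # attach hot-pool as a writeback cache in front of cold-pool
    ceph osd tier add cold-pool hot-pool
    ceph osd tier cache-mode hot-pool writeback
    ceph osd tier set-overlay cold-pool hot-pool

    # the agent needs a hit set plus ratios relative to target_max_bytes
    ceph osd pool set hot-pool hit_set_type bloom
    ceph osd pool set hot-pool hit_set_count 1
    ceph osd pool set hot-pool hit_set_period 3600
    ceph osd pool set hot-pool target_max_bytes 34359738368   # 32 GiB
    ceph osd pool set hot-pool cache_target_dirty_ratio 0.4
    ceph osd pool set hot-pool cache_target_full_ratio 0.8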

Re: [ceph-users] Cache tiering and target_max_bytes

2014-08-14 Thread Paweł Sadowski
On 14.08.2014 17:20, Sage Weil wrote: > On Thu, 14 Aug 2014, Paweł Sadowski wrote: >> Hello, >> >> I have a cluster of 35 OSDs (30 HDD, 5 SSD) with cache tiering configured. >> During tests it looks like Ceph is not respecting the target_max_bytes >> setting. Steps to reproduce: >> - configure cache

Re: [ceph-users] Cache tiering and target_max_bytes

2014-08-17 Thread Paweł Sadowski
On 08/14/2014 10:30 PM, Sage Weil wrote: > On Thu, 14 Aug 2014, Paweł Sadowski wrote: >> On 14.08.2014 17:20, Sage Weil wrote: >>> On Thu, 14 Aug 2014, Paweł Sadowski wrote: Hello, I have a cluster of 35 OSDs (30 HDD, 5 SSD) with cache tiering configured. During tests it looks

[ceph-users] ceph-users@lists.ceph.com

2014-08-21 Thread Paweł Sadowski
Hi, I'm trying to start Qemu on top of RBD. In the documentation[1] there is a big warning: Important If you set rbd_cache=true, you must set cache=writeback or risk data loss. Without cache=writeback, QEMU will not send flush requests to librbd. If QEMU exits uncleanly in this confi
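
A minimal sketch of the combination the warning is about, with hypothetical pool/image names; rbd_cache is enabled on the client side and QEMU is told to use writeback so it forwards flushes to librbd:

    # /etc/ceph/ceph.conf on the hypervisor
    [client]
        rbd cache = true
        rbd cache writethrough until flush = true

    # QEMU drive spec: cache=writeback makes QEMU issue flushes to librbd
    qemu-system-x86_64 ... \
        -drive format=raw,file=rbd:rbd/vm-disk-1:id=admin:conf=/etc/ceph/ceph.conf,cache=writeback,if=virtio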

Re: [ceph-users] Ceph + Qemu cache=writethrough

2014-08-21 Thread Paweł Sadowski
Sorry for the missing subject. On 08/21/2014 03:09 PM, Paweł Sadowski wrote: > Hi, > > I'm trying to start Qemu on top of RBD. In the documentation[1] there is a > big warning: > > Important > > If you set rbd_cache=true, you must set cache=writeback or risk d

[ceph-users] osd id == 2147483647 (2^31 - 1)

2015-05-26 Thread Paweł Sadowski
Has anyone seen something like this: osd id == 2147483647 (2147483647 == 2^31 - 1)? Looks like some int casting bug but I have no idea where to look for it (and I don't know the exact steps to reproduce this - I was just doing osd in/osd out multiple times to test recovery speed under some client load).
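
For reference, 2147483647 is 0x7fffffff, the placeholder CRUSH uses (CRUSH_ITEM_NONE in crush/crush.h) when it cannot map a full set of OSDs for a PG; a quick, rough way to spot affected PGs:

    printf '%d\n' 0x7fffffff          # -> 2147483647
    ceph pg dump | grep 2147483647    # PGs whose up/acting set contains the placeholder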

Re: [ceph-users] osd id == 2147483647 (2^31 - 1)

2015-05-26 Thread Paweł Sadowski
"item": -1, "item_name": "default"}, { "op": "chooseleaf_indep", "num": 0, "type": "rack"}, { "op": "emit"}]}] Regar

[ceph-users] osd_scrub_sleep, osd_scrub_chunk_{min,max}

2015-06-09 Thread Paweł Sadowski
Hello Everyone, There are some options[1] that greatly reduce deep-scrub performance impact but they are not documented anywhere. Is there any reason for this? 1: - osd_scrub_sleep - osd_scrub_chunk_min - osd_scrub_chunk_max -- PS
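
For anyone searching later, a sketch of how these can be set, either at runtime or persistently (values are illustrative only):

    # runtime, all OSDs
    ceph tell osd.* injectargs '--osd_scrub_sleep 0.1 --osd_scrub_chunk_min 1 --osd_scrub_chunk_max 5'

    # persistent, in the [osd] section of ceph.conf
    osd scrub sleep = 0.1
    osd scrub chunk min = 1
    osd scrub chunk max = 5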

[ceph-users] Erasure coded pools and bit-rot protection

2015-06-12 Thread Paweł Sadowski
Hi All, I'm testing erasure coded pools. Is there any protection from bit-rot errors on object read? If I modify one bit in an object part (directly on an OSD) I'm getting a *broken* object: mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum bb2d82bbb95be6b9a039d135cc7a5d0d - # m
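
A rough outline of the test, assuming a filestore OSD and with the on-disk shard path left as a placeholder:

    dd if=/dev/urandom of=/tmp/testobj bs=4M count=4
    rados --pool ecpool put `hostname -f`_16 /tmp/testobj
    rados --pool ecpool get `hostname -f`_16 - | md5sum

    # flip one byte in one shard directly on an OSD, then read again through RADOS
    printf '\xff' | dd of=/var/lib/ceph/osd/ceph-<id>/current/<pgid>_head/<shard-file> \
        bs=1 seek=1024 count=1 conv=notrunc
    rados --pool ecpool get `hostname -f`_16 - | md5sum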

Re: [ceph-users] Erasure coded pools and bit-rot protection

2015-06-13 Thread Paweł Sadowski
's a ticket: > http://tracker.ceph.com/issues/12000 > > On Fri, Jun 12, 2015 at 12:32 PM, Gregory Farnum wrote: >> On Fri, Jun 12, 2015 at 1:07 AM, Paweł Sadowski wrote: >>> Hi All, >>> >>> I'm testing erasure coded pools. Is there any protection from

[ceph-users] Weight of new OSD

2014-10-22 Thread Paweł Sadowski
Hi, From time to time when I replace a broken OSD the new one gets a weight of zero. The crush map from the epoch before adding the OSD seems to be fine. Is there any way to debug this issue? Regards, -- PS
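
A sketch of how it can be checked and corrected by hand (osd id and weight are illustrative; the CRUSH weight is normally derived from the disk size in TiB):

    ceph osd tree | grep osd.42          # WEIGHT column shows the CRUSH weight
    ceph osd crush reweight osd.42 1.82  # e.g. for a 2 TB disk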

[ceph-users] osd_disk_thread_ioprio_class/_priority ignored?

2014-10-23 Thread Paweł Sadowski
Hi, I was trying to determine the performance impact of deep-scrubbing with the osd_disk_thread_ioprio_class option set but it looks like it's ignored. Performance (during deep-scrub) is the same with these options set or left at defaults (1/3 of "normal" performance). # ceph --admin-daemon /var/run/ce
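
For completeness, the way these are typically set (only honoured when the data disks use the CFQ I/O scheduler), plus the admin-socket check quoted above:

    cat /sys/block/sda/queue/scheduler   # should show [cfq]
    ceph tell osd.* injectargs '--osd_disk_thread_ioprio_class idle --osd_disk_thread_ioprio_priority 7'
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep ioprio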

Re: [ceph-users] osd_disk_thread_ioprio_class/_priority ignored?

2014-10-23 Thread Paweł Sadowski
On 10/23/2014 09:10 AM, Paweł Sadowski wrote: > Hi, > > I was trying to determine performance impact of deep-scrubbing with > osd_disk_thread_ioprio_class option set but it looks like it's ignored. > Performance (during deep-scrub) is the same with this options set or > lef

Re: [ceph-users] Weight of new OSD

2014-11-04 Thread Paweł Sadowski
> echo "Create or move OSD $id weight ${weight:-${defaultweight:-1}} to > location $location" >> /tmp/ceph-osd.log > > You probably want more detail though, like maybe the value of weight > and defaultweight, to see where the error is occurring. > > > O

[ceph-users] Ceph inconsistency after deep-scrub

2014-11-21 Thread Paweł Sadowski
Hi, During deep-scrub Ceph discovered some inconsistency between OSDs on my cluster (size 3, min size 2). I have found a broken object and calculated its md5sum on each OSD (osd.195 is acting_primary): osd.195 - md5sum_ osd.40 - md5sum_ osd.314 - md5sum_ I ran ceph pg repair and Cep
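
A sketch of the procedure (paths and pgid are placeholders; rados list-inconsistent-obj only exists on newer releases):

    ceph health detail | grep inconsistent        # which PGs, which primary
    rados list-inconsistent-obj <pgid>            # newer releases only
    # checksum the candidate object file on each replica (filestore layout)
    md5sum /var/lib/ceph/osd/ceph-195/current/<pgid>_head/<object-file>
    ceph pg repair <pgid>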

Re: [ceph-users] Ceph inconsistency after deep-scrub

2014-11-21 Thread Paweł Sadowski
On 21.11.2014 at 20:12, Gregory Farnum wrote: > On Fri, Nov 21, 2014 at 2:35 AM, Paweł Sadowski wrote: >> Hi, >> >> During deep-scrub Ceph discovered some inconsistency between OSDs on my >> cluster (size 3, min size 2). I have found a broken object and calculated >

Re: [ceph-users] Ceph inconsistency after deep-scrub

2014-11-24 Thread Paweł Sadowski
On 11/21/2014 10:46 PM, Paweł Sadowski wrote: > On 21.11.2014 at 20:12, Gregory Farnum wrote: >> On Fri, Nov 21, 2014 at 2:35 AM, Paweł Sadowski wrote: >>> Hi, >>> >>> During deep-scrub Ceph discovered some inconsistency between OSDs on my >>> clu

[ceph-users] Ceph data consistency

2014-12-30 Thread Paweł Sadowski
Hi, On our Ceph cluster we get some inconsistent PGs from time to time (after deep-scrub). We have some issues with disks/SATA cables/the LSI controller causing IO errors from time to time (but that's not the point in this case). When an IO error occurs on the OSD journal partition everything works as is sh

Re: [ceph-users] Ceph data consistency

2014-12-30 Thread Paweł Sadowski
On 12/30/2014 09:40 AM, Chen, Xiaoxi wrote: > Hi, > First of all, the data is safe since it's persistent in the journal; if an error > occurs on the OSD data partition, replaying the journal will get the data back. Agreed. Data is safe in the journal. But when the journal is flushed data is moved to the filestore and

Re: [ceph-users] Hammer reduce recovery impact

2015-09-10 Thread Paweł Sadowski
On 09/10/2015 10:56 PM, Robert LeBlanc wrote: > Things I've tried: > > * Lowered nr_requests on the spindles from 1000 to 100. This reduced > the max latency, which was sometimes up to 3000 ms, down to a max of 500-700 ms. > It has also reduced the huge swings in latency, but has also reduced > throughput som

Re: [ceph-users] Lot of blocked operations

2015-09-18 Thread Paweł Sadowski
On 09/18/2015 12:17 PM, Olivier Bonvalet wrote: > On Friday, 18 September 2015 at 12:04 +0200, Jan Schermer wrote: >>> On 18 Sep 2015, at 11:28, Christian Balzer wrote: >>> >>> On Fri, 18 Sep 2015 11:07:49 +0200 Olivier Bonvalet wrote: >>> On Friday, 18 September 2015 at 10:59 +0200, Jan S

[ceph-users] O_DIRECT on deep-scrub read

2015-10-07 Thread Paweł Sadowski
Hi, Can anyone tell if deep scrub is done using the O_DIRECT flag or not? I'm not able to verify that in the source code. If not, would it be possible to add such a feature (maybe as a config option) to help keep the Linux page cache in better shape? Thanks, -- PS
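
A crude way to observe the effect being asked about: trigger a deep-scrub on one PG and watch page-cache usage on the OSD host (pg id is illustrative):

    ceph pg deep-scrub 3.1f
    watch -n1 free -m        # 'cached' grows as the scrub reads go through the page cache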

Re: [ceph-users] O_DIRECT on deep-scrub read

2015-10-08 Thread Paweł Sadowski
On 10/07/2015 10:52 PM, Sage Weil wrote: > On Wed, 7 Oct 2015, David Zafman wrote: >> There would be a benefit to doing fadvise POSIX_FADV_DONTNEED after >> deep-scrub reads for objects not recently accessed by clients. > Yeah, it's the 'except for stuff already in cache' part that we don't do >

[ceph-users] Inconsistent PGs

2016-06-21 Thread Paweł Sadowski
Hello, We have an issue on one of our clusters. One node with 9 OSDs was down for more than 12 hours. During that time the cluster recovered without problems. When the host came back to the cluster we got two PGs in an incomplete state. We decided to mark the OSDs on this host as out but the two PGs are still in incom

Re: [ceph-users] Inconsistent PGs

2016-06-21 Thread Paweł Sadowski
On 06/21/2016 12:37 PM, M Ranga Swami Reddy wrote: > Try to restart OSDs 109 and 166 and check if it helps? > > > On Tue, Jun 21, 2016 at 4:05 PM, Paweł Sadowski wrote: >> Thanks for the response. >> >> All OSDs seem to be ok, they have been restarted, joined cluster after

Re: [ceph-users] Inconsistent PGs

2016-06-21 Thread Paweł Sadowski
ceph pg dump_stuck stale > ceph pg dump_stuck inactive > ceph pg dump_stuck unclean > === > > And then query the PGs which are in an unclean or stale state, and check for > any issue with a specific OSD. > > Thanks > Swami > > On Tue, Jun 21, 2016 at 3:02 PM, Paweł Sadowski wrote: >

Re: [ceph-users] Inconsistent PGs

2016-06-22 Thread Paweł Sadowski
"ceph-objectstore-tool" to recover that pg. > > 2016-06-21 19:09 GMT+08:00 Paweł Sadowski : > > Already restarted those OSDs and then the whole cluster (rack by rack, > failure domain is rack in this setup). > We would like to try *ceph-objectstore-tool
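
A sketch of the export/import path with ceph-objectstore-tool, with OSD ids, pgid and paths as placeholders; the OSDs must be stopped while the tool runs:

    systemctl stop ceph-osd@109
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-109 \
        --journal-path /var/lib/ceph/osd/ceph-109/journal \
        --pgid 3.1f --op export --file /root/pg.3.1f.export

    systemctl stop ceph-osd@166
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-166 \
        --journal-path /var/lib/ceph/osd/ceph-166/journal \
        --op import --file /root/pg.3.1f.export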

Re: [ceph-users] unfound objects blocking cluster, need help!

2016-10-07 Thread Paweł Sadowski
Hi, I work with Tomasz and I'm investigating this situation. We still don't fully understand why there was an unfound object after removing a single OSD. From the logs[1] it looks like all PGs were active+clean before marking that OSD out. After that backfills started on multiple OSDs. Three minutes later
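
The commands used to narrow it down (pgid is a placeholder); mark_unfound_lost is the last resort once the data is known to be gone:

    ceph health detail | grep unfound
    ceph pg 3.1f query                      # look at "might_have_unfound"
    ceph pg 3.1f list_missing
    ceph pg 3.1f mark_unfound_lost revert   # or: delete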

Re: [ceph-users] effectively reducing scrub io impact

2016-10-20 Thread Paweł Sadowski
You can inspect the source code or do: ceph --admin-daemon /var/run/ceph/ceph-osd.OSD_ID.asok config show | grep scrub # or similar And then check in the source code :) On 10/20/2016 03:03 PM, Oliver Dzombic wrote: > Hi Christian, > > thank you for your time. > > The problem is deep scrub only. > > Jewe

Re: [ceph-users] Remove - down_osds_we_would_probe

2016-11-19 Thread Paweł Sadowski
Hi, Make a temporary OSD with the same ID and weight 0 to avoid putting data on it. The cluster should contact this OSD and move forward. If not, you can also use 'ceph osd lost ID', but an OSD with that ID must exist in the crushmap (and that's probably not the case here). On 19.11.2016 13:46, Bruno Silv
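
Roughly, with ids and host name as placeholders:

    ceph osd create                             # allocate an id; repeat until the wanted id comes back
    ceph osd crush add osd.12 0 host=tmp-host   # CRUSH weight 0, so no data lands on it
    # or, if the id is still present in the CRUSH map:
    ceph osd lost 12 --yes-i-really-mean-it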

Re: [ceph-users] After OSD Flap - FAILED assert(oi.version == i->first)

2016-12-01 Thread Paweł Sadowski
Hi, We see this error on Hammer 0.94.6. Bug report updated with logs. Thanks, On 11/15/2016 07:30 PM, Samuel Just wrote: > http://tracker.ceph.com/issues/17916 > > I just pushed a branch wip-17916-jewel based on v10.2.3 with some > additional debugging. Once it builds, would you be able to st

Re: [ceph-users] RBD image "lightweight snapshots"

2018-08-10 Thread Paweł Sadowski
On 08/10/2018 06:24 PM, Gregory Farnum wrote: On Fri, Aug 10, 2018 at 4:53 AM, Paweł Sadowsk wrote: On 08/09/2018 04:39 PM, Alex Elder wrote: On 08/09/2018 08:15 AM, Sage Weil wrote: On Thu, 9 Aug 2018, Piotr Dałek wrote: Hello, At OVH we're heavily utilizing snapshots for our backup system

Re: [ceph-users] High TCP retransmission rates, only with Ceph

2018-04-15 Thread Paweł Sadowski
On 04/15/2018 08:18 PM, Robert Stanford wrote: Iperf gives about 7Gb/s between a radosgw host and one of my OSD hosts (8 disks, 8 OSD daemons, one of 3 OSD hosts). When I benchmark radosgw with cosbench I see high TCP retransmission rates (from sar -n ETCP 1). I don't see this with iperf.

Re: [ceph-users] Ceph Monitoring

2017-01-13 Thread Paweł Sadowski
We monitor a few things: - cluster health (errors only, ignoring warnings since we have separate checks for interesting things) - if all PGs are active (number of active replicas >= min_size) - if there are any blocked requests (it's a good indicator, in our case, that some disk is going to fail s
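
A sketch of those checks as shell one-liners (output parsing is approximate and depends on the release):

    ceph health | grep -q HEALTH_ERR && echo ALERT
    ceph pg dump pgs_brief | grep -v 'active+clean'        # PGs missing replicas / not clean
    ceph health detail | grep -iE 'blocked|slow request'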

Re: [ceph-users] SIGHUP to ceph processes every morning

2017-01-25 Thread Paweł Sadowski
Hi, 6:25 points to a daily cron job; it's probably logrotate trying to force Ceph to reopen its logs. On 01/26/2017 07:34 AM, Torsten Casselt wrote: > Hi, > > I get the following line in journalctl: > > Jan 24 06:25:02 ceph01 ceph-osd[28398]: 2017-01-24 06:25:02.302770 > 7f0655516700 -1 received sign
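
Roughly what the stock /etc/logrotate.d/ceph of that era does; signal -1 is SIGHUP, which makes the daemons reopen their log files:

    /var/log/ceph/*.log {
        daily
        rotate 7
        compress
        sharedscripts
        postrotate
            killall -q -1 ceph-mon ceph-mds ceph-osd radosgw || true
        endscript
        missingok
        notifempty
    }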

Re: [ceph-users] Sparse file info in filestore not propagated to other OSDs

2017-06-13 Thread Paweł Sadowski
On 04/13/2017 04:23 PM, Piotr Dałek wrote: > On 04/06/2017 03:25 PM, Sage Weil wrote: >> On Thu, 6 Apr 2017, Piotr Dałek wrote: >>> Hello, >>> >>> We recently had an interesting issue with RBD images and filestore >>> on Jewel >>> 10.2.5: >>> We have a pool with RBD images, all of them mostly unt

Re: [ceph-users] Flash for mon nodes ?

2017-06-21 Thread Paweł Sadowski
On 06/21/2017 12:38 PM, Osama Hasebou wrote: > Hi Guys, > > Has anyone used flash SSD drives for nodes hosting Monitor nodes only? > > If yes, any major benefits against just using SAS drives? We are using such a setup for big (>500 OSDs) clusters. It makes it less painful when such a cluster rebalan
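
What grows and churns during a rebalance is the monitor store; a quick way to look at it, assuming default paths:

    du -sh /var/lib/ceph/mon/*/store.db
    ceph --admin-daemon /var/run/ceph/ceph-mon.$(hostname -s).asok mon_status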

Re: [ceph-users] Intel P3700 PCI-e as journal drives?

2016-01-08 Thread Paweł Sadowski
Hi, Quick results for 1/5/10 jobs: # fio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1 fio-2.1.3 Starting 1 proce
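
The 5- and 10-job runs referred to above are presumably the same command with only --numjobs changed, e.g.:

    fio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4k \
        --numjobs=5 --iodepth=1 --runtime=60 --time_based --group_reporting \
        --name=journal-test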

[ceph-users] RadosGW and X-Storage-Url

2016-04-26 Thread Paweł Sadowski
Hi, I'm testing RadosGW on Infernalis (9.2.1) and have two questions regarding the X-Storage-Url header. The first thing is that it always returns something like the below: X-Storage-Url: http://my.example.domain:0/swift/v1 While the docs say it should return "... {api version}/{account} prefix" The second thi
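
For context, the header comes back from the Swift v1 auth request; a sketch with illustrative credentials (what ends up in X-Storage-Url is influenced by rgw swift url / rgw dns name in ceph.conf):

    curl -i -H "X-Auth-User: testuser:swift" -H "X-Auth-Key: secretkey" \
        http://my.example.domain/auth/1.0
    # response headers include X-Auth-Token and the X-Storage-Url discussed above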

Re: [ceph-users] New OSD with weight 0, rebalance still happen...

2018-11-22 Thread Paweł Sadowski
On 11/22/18 6:12 PM, Marco Gaiarin wrote: Hi! Paweł Sadowski wrote: From your osd tree it looks like you used 'ceph osd reweight'. Yes, and I supposed I was doing the right thing! Now, I've tried to lower the OSD being decommissioned, using: ceph osd reweight 2 0.95 l
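
For reference, the two weights are different knobs (osd id is illustrative): 'ceph osd reweight' sets the 0-1 override shown in the REWEIGHT column of the tree, while placement is driven by the CRUSH weight in the WEIGHT column, changed with 'ceph osd crush reweight':

    ceph osd tree                    # WEIGHT = CRUSH weight, REWEIGHT = override
    ceph osd reweight 2 0.95         # adjusts the override only
    ceph osd crush reweight osd.2 0  # takes osd.2 out of placement entirely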