Re: [ceph-users] ceph mgr balancer bad distribution

2018-03-01 Thread Dan van der Ster
On Thu, Mar 1, 2018 at 9:52 AM, Stefan Priebe - Profihost AG wrote: > Hi, > > Am 01.03.2018 um 09:42 schrieb Dan van der Ster: >> On Thu, Mar 1, 2018 at 9:31 AM, Stefan Priebe - Profihost AG >> wrote: >>> Hi, >>> Am 01.03.2018 um 09:03 schrieb Dan va

Re: [ceph-users] ceph mgr balancer bad distribution

2018-03-01 Thread Dan van der Ster
On Thu, Mar 1, 2018 at 10:24 AM, Stefan Priebe - Profihost AG wrote: > > Am 01.03.2018 um 09:58 schrieb Dan van der Ster: >> On Thu, Mar 1, 2018 at 9:52 AM, Stefan Priebe - Profihost AG >> wrote: >>> Hi, >>> >>> Am 01.03.2018 um 09:42 schrieb Dan van

Re: [ceph-users] ceph mgr balancer bad distribution

2018-03-01 Thread Dan van der Ster
On Thu, Mar 1, 2018 at 10:38 AM, Dan van der Ster wrote: > On Thu, Mar 1, 2018 at 10:24 AM, Stefan Priebe - Profihost AG > wrote: >> >> Am 01.03.2018 um 09:58 schrieb Dan van der Ster: >>> On Thu, Mar 1, 2018 at 9:52 AM, Stefan Priebe - Profihost AG >>> wro

Re: [ceph-users] ceph mgr balancer bad distribution

2018-03-01 Thread Dan van der Ster
On Thu, Mar 1, 2018 at 10:40 AM, Dan van der Ster wrote: > On Thu, Mar 1, 2018 at 10:38 AM, Dan van der Ster wrote: >> On Thu, Mar 1, 2018 at 10:24 AM, Stefan Priebe - Profihost AG >> wrote: >>> >>> Am 01.03.2018 um 09:58 schrieb Dan van der Ster: >>

Re: [ceph-users] ceph mgr balancer bad distribution

2018-03-01 Thread Dan van der Ster
On Thu, Mar 1, 2018 at 1:08 PM, Stefan Priebe - Profihost AG wrote: > nice, thanks, will try that soon. > > Can you tell me how to change the log level to info for the balancer module? debug mgr = 4/5 -- dan
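
A minimal sketch of applying that debug setting, assuming the mgr daemon id is "myhost" (hypothetical) and that the admin socket is reachable:

    # ceph.conf, [mgr] section
    debug mgr = 4/5

    # or at runtime through the admin socket
    ceph daemon mgr.myhost config set debug_mgr 4/5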

Re: [ceph-users] ceph mgr balancer bad distribution

2018-03-02 Thread Dan van der Ster
late evening, every day, the cluster is back to HEALTH_OK. Cheers, Dan > Stefan > > Excuse my typo sent from my mobile phone. > > Am 01.03.2018 um 13:12 schrieb Dan van der Ster : > > On Thu, Mar 1, 2018 at 1:08 PM, Stefan Priebe - Profihost AG > wrote: > > nice than

[ceph-users] Don't use ceph mds set max_mds

2018-03-07 Thread Dan van der Ster
Hi all, What is the purpose of ceph mds set max_mds ? We just used that by mistake on a cephfs cluster when attempting to decrease from 2 to 1 active mds's. The correct command to do this is of course ceph fs set max_mds So, is `ceph mds set max_mds` useful for something? If not, sho
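
For reference, a minimal sketch of the per-filesystem command referred to above, assuming a filesystem named "cephfs" (hypothetical):

    # correct way to change the number of active MDS ranks
    ceph fs set cephfs max_mds 1
    # verify the setting
    ceph fs get cephfs | grep max_mds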

Re: [ceph-users] Don't use ceph mds set max_mds

2018-03-07 Thread Dan van der Ster
On Wed, Mar 7, 2018 at 2:29 PM, John Spray wrote: > On Wed, Mar 7, 2018 at 10:11 AM, Dan van der Ster wrote: >> Hi all, >> >> What is the purpose of >> >>ceph mds set max_mds >> >> ? >> >> We just used that by mistake on a cephfs

[ceph-users] rctime not tracking inode ctime

2018-03-14 Thread Dan van der Ster
Hi all, On our luminous v12.2.4 ceph-fuse clients / mds the rctime is not tracking the latest inode ctime, but only the latest directory ctimes. Initial empty dir: # getfattr -d -m ceph . | egrep 'bytes|ctime' ceph.dir.rbytes="0" ceph.dir.rctime="1521043742.09466372697" Create a file, rctime is
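
A minimal sketch of reproducing the check described above, assuming a ceph-fuse mount at /mnt/cephfs (hypothetical path):

    mkdir /mnt/cephfs/testdir
    getfattr -n ceph.dir.rctime /mnt/cephfs/testdir
    touch /mnt/cephfs/testdir/file                      # create a file; rctime should advance
    getfattr -n ceph.dir.rctime /mnt/cephfs/testdir
    touch /mnt/cephfs/testdir/file                      # now only the existing file's ctime changes
    getfattr -n ceph.dir.rctime /mnt/cephfs/testdir     # reported issue: rctime no longer follows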

Re: [ceph-users] rctime not tracking inode ctime

2018-03-15 Thread Dan van der Ster
On Wed, Mar 14, 2018 at 11:43 PM, Patrick Donnelly wrote: > On Wed, Mar 14, 2018 at 9:22 AM, Dan van der Ster wrote: >> Hi all, >> >> On our luminous v12.2.4 ceph-fuse clients / mds the rctime is not >> tracking the latest inode ctime, but only the latest directory

Re: [ceph-users] Backfilling on Luminous

2018-03-15 Thread Dan van der Ster
Hi, Do you see any split or merge messages in the osd logs? I recall some surprise filestore splitting on a few osds after the luminous upgrade. .. Dan On Mar 15, 2018 6:04 PM, "David Turner" wrote: I upgraded a [1] cluster from Jewel 10.2.7 to Luminous 12.2.2 and last week I added 2 nodes t

Re: [ceph-users] Backfilling on Luminous

2018-03-15 Thread Dan van der Ster
store splitting), but actually segfaulting and restarting. On Thu, Mar 15, 2018 at 4:08 PM Dan van der Ster wrote: > Hi, > > Do you see any split or merge messages in the osd logs? > I recall some surprise filestore splitting on a few osds after the > luminous upgrade. > >

Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-03-28 Thread Dan van der Ster
Hi, Which versions were those MDS's before and after the restarted standby MDS? Cheers, Dan On Wed, Mar 28, 2018 at 11:11 AM, adrien.geor...@cc.in2p3.fr wrote: > Hi, > > I just had the same issue with our 12.2.4 cluster but not during the > upgrade. > One of our 3 monitors restarted (the one

Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-03-28 Thread Dan van der Ster
version. > > Adrien > > > Le 28/03/2018 à 14:47, Dan van der Ster a écrit : >> >> Hi, >> >> Which versions were those MDS's before and after the restarted standby >> MDS? >> >> Cheers, Dan >> >> >> >> On Wed, Mar

Re: [ceph-users] cephfs performance issue

2018-03-29 Thread Dan van der Ster
On Thu, Mar 29, 2018 at 10:31 AM, Robert Sander wrote: > On 29.03.2018 09:50, ouyangxu wrote: > >> I'm using Ceph 12.2.4 with CentOS 7.4, and trying to use cephfs for >> MariaDB deployment, > > Don't do this. > As the old saying goes: If it hurts, stop doing it. Why not? Let's find out where and w

Re: [ceph-users] [SOLVED] Replicated pool with an even size - has min_size to be bigger than half the size?

2018-03-29 Thread Dan van der Ster
Guys, Ceph does not have a concept of "osd quorum" or "electing a primary PG". The mons are in a PAXOS quorum, and the mon leader decides which OSD is primary for each PG. No need to worry about a split OSD brain. -- dan On Thu, Mar 29, 2018 at 2:51 PM, Peter Linder wrote: > > > Den 2018-03-2

Re: [ceph-users] scalability new node to the existing cluster

2018-04-18 Thread Hans van den Bogert
I keep seeing these threads where adding nodes has such an impact on the cluster as a whole, that I wonder what the rest of the cluster looks like. Normally I’d just advise someone to put a limit on the concurrent backfills that can be done, and `osd max backfills` is already 1 by default. Could
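
A minimal sketch of the throttling being suggested, applied cluster-wide at runtime (values are for illustration only):

    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'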

Re: [ceph-users] ceph luminous 12.2.4 - 2 servers better than 3 ?

2018-04-19 Thread Hans van den Bogert
Hi Steven, There is only one bench. Could you show multiple benches of the different scenarios you discussed? Also provide hardware details. Hans On Apr 19, 2018 13:11, "Steven Vacaroaia" wrote: Hi, Any idea why 2 servers with one OSD each will provide better performance than 3 ? Servers are

Re: [ceph-users] ceph luminous 12.2.4 - 2 servers better than 3 ?

2018-04-19 Thread Hans van den Bogert
4194304 > Bandwidth (MB/sec): 44.0793 > Stddev Bandwidth: 55.3843 > Max bandwidth (MB/sec): 232 > Min bandwidth (MB/sec): 0 > Average IOPS: 11 > Stddev IOPS:13 > Max IOPS: 58 > Min IOPS: 0 > Average Latency(s

Re: [ceph-users] ceph luminous 12.2.4 - 2 servers better than 3 ?

2018-04-19 Thread Hans van den Bogert
DB ( on separate SSD or same HDD) Thanks Steven On Thu, 19 Apr 2018 at 12:06, Hans van den Bogert wrote: > I take it that the first bench is with replication size 2, the second > bench is with replication size 3? Same for the 4 node OSD scenario? > > Also please let us know how you

Re: [ceph-users] ceph luminous 12.2.4 - 2 servers better than 3 ?

2018-04-19 Thread Hans van den Bogert
Write Cache : Disk's Default > Adapter 0-VD 2(target id: 2): Disk Write Cache : Disk's Default > Adapter 0-VD 3(target id: 3): Disk Write Cache : Disk's Default > > > On Thu, 19 Apr 2018 at 14:22, Hans van den Bogert > wrote: > >> I see, the second one i

Re: [ceph-users] cephfs luminous 12.2.4 - multi-active MDSes with manual pinning

2018-04-24 Thread Dan van der Ster
That "nicely exporting" thing is a logging issue that was apparently fixed in https://github.com/ceph/ceph/pull/19220. I'm not sure if that will be backported to luminous. Otherwise the slow requests could be due to either slow trimming (see previous discussions about mds log max expiring and mds

Re: [ceph-users] *** SPAM *** Re: Multi-MDS Failover

2018-04-27 Thread Dan van der Ster
Hi Scott, Multi MDS just assigns different parts of the namespace to different "ranks". Each rank (0, 1, 2, ...) is handled by one of the active MDSs. (You can query which parts of the name space are assigned to each rank using the jq tricks in [1]). If a rank is down and there are no more standby
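
The jq tricks in [1] are not reproduced here; as a hedged alternative, the rank-to-MDS assignment can be inspected with:

    ceph fs status      # shows which daemon currently holds each rank
    ceph mds stat       # compact summary of ranks and standbys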

Re: [ceph-users] ceph 12.2.5 - atop DB/WAL SSD usage 0%

2018-04-30 Thread Hans van den Bogert
Shouldn't Steven see some data being written to the block/wal for object metadata? Though that might be negligible with 4MB objects On 27-04-18 16:04, Serkan Çoban wrote: rados bench is using 4MB block size for io. Try with io size 4KB, you will see ssd will be used for write operations.
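
A minimal sketch of the small-write benchmark suggested above, assuming a test pool named "testpool" (hypothetical):

    rados bench -p testpool 60 write -b 4096 -t 16 --no-cleanup
    # watch the DB/WAL SSD while it runs, e.g. with: iostat -x 1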

Re: [ceph-users] Bluestore on HDD+SSD sync write latency experiences

2018-05-03 Thread Dan van der Ster
Hi Nick, Our latency probe results (4kB rados bench) didn't change noticeably after converting a test cluster from FileStore (sata SSD journal) to BlueStore (sata SSD db). Those 4kB writes take 3-4ms on average from a random VM in our data centre. (So bluestore DB seems equivalent to FileStore jou

Re: [ceph-users] Luminous radosgw S3/Keystone integration issues

2018-05-04 Thread Dan van der Ster
Hi Valery, Did you eventually find a workaround for this? I *think* we'd also prefer rgw to fallback to external plugins, rather than checking them before local. But I never understood the reasoning behind the change from jewel to luminous. I saw that there is work towards a cache for ldap [1] an

Re: [ceph-users] Luminous radosgw S3/Keystone integration issues

2018-05-07 Thread Dan van der Ster
> > We agreed in upstream RGW to make this change. Do you intend to > submit this as a PR? > > regards > > Matt > > On Fri, May 4, 2018 at 10:57 AM, Dan van der Ster wrote: >> Hi Valery, >> >> Did you eventually find a workaround for this? I *think* we'd

Re: [ceph-users] What is the meaning of size and min_size for erasure-coded pools?

2018-05-08 Thread Dan van der Ster
On Tue, May 8, 2018 at 7:35 PM, Vasu Kulkarni wrote: > On Mon, May 7, 2018 at 2:26 PM, Maciej Puzio wrote: >> I am an admin in a research lab looking for a cluster storage >> solution, and a newbie to ceph. I have setup a mini toy cluster on >> some VMs, to familiarize myself with ceph and to tes

Re: [ceph-users] jewel to luminous upgrade, chooseleaf_vary_r and chooseleaf_stable

2018-05-14 Thread Dan van der Ster
Hi Adrian, Is there a strict reason why you *must* upgrade the tunables? It is normally OK to run with old (e.g. hammer) tunables on a luminous cluster. The crush placement won't be state of the art, but that's not a huge problem. We have a lot of data in a jewel cluster with hammer tunables. We
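
A hedged sketch of inspecting the current tunables before deciding whether to change them:

    ceph osd crush show-tunables
    # or dump the full crush map for offline inspection
    ceph osd getcrushmap -o crush.bin && crushtool -d crush.bin -o crush.txt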

Re: [ceph-users] Poor CentOS 7.5 client performance

2018-05-17 Thread Dan van der Ster
Hi, It still isn't clear if you're using the fuse or kernel client. Do you `mount -t ceph` or something else? -- Dan On Wed, May 16, 2018 at 8:28 PM Donald "Mac" McCarthy wrote: > CephFS. 8 core atom C2758, 16 GB ram, 256GB ssd, 2.5 GB NIC (supermicro microblade node). > Read test: > dd if
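
For illustration, the two client types are mounted differently (monitor address and paths are hypothetical):

    # kernel client
    mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
    # fuse client
    ceph-fuse /mnt/cephfs
    # an existing mount shows up as type "ceph" (kernel) or "fuse.ceph-fuse" (fuse)
    mount | grep -E 'type (ceph|fuse\.ceph-fuse)'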

[ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
Hi all, We have an intermittent issue where bluestore osds sometimes fail to start after a reboot. The osds all fail the same way [see 2], failing to open the superblock. On one particular host, there are 24 osds and 4 SSDs partitioned for the block.db's. The affected non-starting OSDs all have b
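
A hedged sketch of checking what an affected OSD believes its block and block.db devices are (paths hypothetical; the lvm subcommand only applies if the OSDs were deployed with ceph-volume lvm):

    ceph-volume lvm list
    ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block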

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 4:33 PM Sage Weil wrote: > > On Thu, 7 Jun 2018, Dan van der Ster wrote: > > Hi all, > > > > We have an intermittent issue where bluestore osds sometimes fail to > > start after a reboot. > > The osds all fail the same way [see 2], fai

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 4:31 PM Alfredo Deza wrote: > > On Thu, Jun 7, 2018 at 10:23 AM, Dan van der Ster wrote: > > Hi all, > > > > We have an intermittent issue where bluestore osds sometimes fail to > > start after a reboot. > > The osds all fail the s

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 4:41 PM Sage Weil wrote: > > On Thu, 7 Jun 2018, Dan van der Ster wrote: > > On Thu, Jun 7, 2018 at 4:33 PM Sage Weil wrote: > > > > > > On Thu, 7 Jun 2018, Dan van der Ster wrote: > > > > Hi all, > > > > > > >

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 5:16 PM Alfredo Deza wrote: > > On Thu, Jun 7, 2018 at 10:54 AM, Dan van der Ster wrote: > > On Thu, Jun 7, 2018 at 4:41 PM Sage Weil wrote: > >> > >> On Thu, 7 Jun 2018, Dan van der Ster wrote: > >> > On Thu, Jun 7, 2018 at 4:33

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 5:34 PM Sage Weil wrote: > > On Thu, 7 Jun 2018, Dan van der Ster wrote: > > On Thu, Jun 7, 2018 at 4:41 PM Sage Weil wrote: > > > > > > On Thu, 7 Jun 2018, Dan van der Ster wrote: > > > > On Thu, Jun 7, 2018 at 4:33 PM Sage Weil

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 5:36 PM Dan van der Ster wrote: > > On Thu, Jun 7, 2018 at 5:34 PM Sage Weil wrote: > > > > On Thu, 7 Jun 2018, Dan van der Ster wrote: > > > On Thu, Jun 7, 2018 at 4:41 PM Sage Weil wrote: > > > > > > > > On Thu, 7 Jun

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 6:01 PM Dan van der Ster wrote: > > On Thu, Jun 7, 2018 at 5:36 PM Dan van der Ster wrote: > > > > On Thu, Jun 7, 2018 at 5:34 PM Sage Weil wrote: > > > > > > On Thu, 7 Jun 2018, Dan van der Ster wrote: > > > >

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 6:09 PM Sage Weil wrote: > > On Thu, 7 Jun 2018, Dan van der Ster wrote: > > On Thu, Jun 7, 2018 at 5:36 PM Dan van der Ster wrote: > > > > > > On Thu, Jun 7, 2018 at 5:34 PM Sage Weil wrote: > > > > > > > > On Thu

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 6:33 PM Sage Weil wrote: > > On Thu, 7 Jun 2018, Dan van der Ster wrote: > > > > Wait, we found something!!! > > > > > > > > In the 1st 4k on the block we found the block.db pointing at the wrong > > > > device (/dev/sd

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 6:58 PM Alfredo Deza wrote: > > On Thu, Jun 7, 2018 at 12:09 PM, Sage Weil wrote: > > On Thu, 7 Jun 2018, Dan van der Ster wrote: > >> On Thu, Jun 7, 2018 at 5:36 PM Dan van der Ster > >> wrote: > >> > > >>

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 8:58 PM Alfredo Deza wrote: > > On Thu, Jun 7, 2018 at 2:45 PM, Dan van der Ster wrote: > > On Thu, Jun 7, 2018 at 6:58 PM Alfredo Deza wrote: > >> > >> On Thu, Jun 7, 2018 at 12:09 PM, Sage Weil wrote: > >> > On Thu, 7 Jun 20

Re: [ceph-users] *****SPAM***** Re: Add ssd's to hdd cluster, crush map class hdd update necessary?

2018-06-13 Thread Dan van der Ster
See this thread: http://lists.ceph.com/pipermail/ceph-large-ceph.com/2018-April/000106.html http://lists.ceph.com/pipermail/ceph-large-ceph.com/2018-June/000113.html (Wido -- should we kill the ceph-large list??) -- dan On Wed, Jun 13, 2018 at 12:27 PM Marc Roos wrote: > > > Shit, I added th

Re: [ceph-users] Add ssd's to hdd cluster, crush map class hdd update necessary?

2018-06-13 Thread Dan van der Ster
See this thread: http://lists.ceph.com/pipermail/ceph-large-ceph.com/2018-April/000106.html http://lists.ceph.com/pipermail/ceph-large-ceph.com/2018-June/000113.html (Wido -- should we kill the ceph-large list??) On Wed, Jun 13, 2018 at 1:14 PM Marc Roos wrote: > > > I wonder if this is not a
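
The linked threads discuss the data movement triggered by introducing device classes; a minimal sketch of class-aware placement, with rule and pool names hypothetical:

    ceph osd crush class ls
    ceph osd crush rule create-replicated replicated_hdd default host hdd
    ceph osd pool set mypool crush_rule replicated_hdd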

[ceph-users] Performance issues with deep-scrub since upgrading from v12.2.2 to v12.2.5

2018-06-14 Thread Sander van Schie / True
Hello, We recently upgraded Ceph from version 12.2.2 to version 12.2.5. Since the upgrade we've been having performance issues which seem to relate to when deep-scrub actions are performed. Most of the time deep-scrub actions only take a couple of seconds at most, however occasionally it takes

Re: [ceph-users] Performance issues with deep-scrub since upgrading from v12.2.2 to v12.2.5

2018-06-14 Thread Sander van Schie / True
porarily disabled this. Could this somehow be related? Thanks Sander From: Gregory Farnum Sent: Thursday, June 14, 2018 19:45 To: Sander van Schie / True Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Performance issues with deep-scrub since upgrading

Re: [ceph-users] Performance issues with deep-scrub since upgrading from v12.2.2 to v12.2.5

2018-06-14 Thread Sander van Schie / True
the issue for us. Sander From: Gregory Farnum Sent: Thursday, June 14, 2018 22:45 To: Sander van Schie / True Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Performance issues with deep-scrub since upgrading from v12.2.2 to v12.2.5 Yes. Deep scrub o

[ceph-users] RGW Dynamic bucket index resharding keeps resharding all buckets

2018-06-15 Thread Sander van Schie / True
Hello, We're running into some problems with dynamic bucket index resharding. After an upgrade from Ceph 12.2.2 to 12.2.5, which fixed an issue with the resharding when using tenants (which we do), the cluster was busy resharding for 2 days straight, resharding the same buckets over and over again. Af

Re: [ceph-users] IO to OSD with librados

2018-06-18 Thread Dan van der Ster
Hi, One way you can see exactly what is happening when you write an object is with --debug_ms=1. For example, I write a 100MB object to a test pool: rados --debug_ms=1 -p test put 100M.dat 100M.dat I pasted the output of this here: https://pastebin.com/Zg8rjaTV In this case, it first gets the cl

Re: [ceph-users] IO to OSD with librados

2018-06-18 Thread Dan van der Ster
:6789/0,ngfdv078=128.55.xxx.xx:6789/0} > > election epoch 4, quorum 0,1 ngfdv076,ngfdv078 > > osdmap e280: 48 osds: 48 up, 48 in > > flags sortbitwise,require_jewel_osds > > pgmap v117283: 3136 pgs, 11 pools, 25600 MB data, 510 objects > &g

Re: [ceph-users] RGW Dynamic bucket index resharding keeps resharding all buckets

2018-06-18 Thread Sander van Schie / True
p would be greatly appreciated. Thanks, Sander From: Sander van Schie / True Sent: Friday, June 15, 2018 14:19 To: ceph-users@lists.ceph.com Subject: RGW Dynamic bucket index resharding keeps resharding all buckets Hello, We're into some proble

Re: [ceph-users] RGW Dynamic bucket index resharding keeps resharding all buckets

2018-06-18 Thread Sander van Schie / True
Thanks, I created the following issue: https://tracker.ceph.com/issues/24551 Sander

Re: [ceph-users] IO to OSD with librados

2018-06-19 Thread Dan van der Ster
82489 osd.19 up 1.0 1.0 >> > >> > 23 21.82489 osd.23 up 1.0 1.0 >> > >> > 27 21.82489 osd.27 up 1.0 1.0 >> > >> > 31 21.82489 osd.3

Re: [ceph-users] CentOS Dojo at CERN

2018-06-20 Thread Dan van der Ster
And BTW, if you can't make it to this event we're in the early days of planning a dedicated Ceph + OpenStack Days at CERN around May/June 2019. More news on that later... -- Dan @ CERN On Tue, Jun 19, 2018 at 10:23 PM Leonardo Vaz wrote: > > Hey Cephers, > > We will join our friends from OpenSt

Re: [ceph-users] CentOS Dojo at CERN

2018-06-21 Thread Dan van der Ster
On Thu, Jun 21, 2018 at 2:41 PM Kai Wagner wrote: > > On 20.06.2018 17:39, Dan van der Ster wrote: > > And BTW, if you can't make it to this event we're in the early days of > > planning a dedicated Ceph + OpenStack Days at CERN around May/June > > 2019. > >

[ceph-users] unfound blocks IO or gives IO error?

2018-06-22 Thread Dan van der Ster
Hi all, Quick question: does an IO with an unfound object result in an IO error or should the IO block? During a jewel to luminous upgrade some PGs passed through a state with unfound objects for a few seconds. And this seems to match the times when we had a few IO errors on RBD attached volumes.
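
A hedged sketch of spotting the condition while it occurs (the pg id is hypothetical):

    ceph health detail | grep -i unfound
    ceph pg 2.5 list_missing        # lists missing/unfound objects for that PG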

Re: [ceph-users] unfound blocks IO or gives IO error?

2018-06-22 Thread Dan van der Ster
there is no live ceph-osd who has a > copy. In this case, IO to those objects will block, and the cluster will hope > that the failed node comes back soon; this is assumed to be preferable to > returning an IO error to the user." > > On 22.06.2018, at 16:16, Dan van der Ster w

Re: [ceph-users] unfound blocks IO or gives IO error?

2018-06-25 Thread Dan van der Ster
viour of virtio-blk vs virtio-scsi: the latter has a timeout but blk blocks forever. On 5000 attached volumes we saw around 12 of these IO errors, and this was the first time in 5 years of upgrades that an IO error happened... -- dan > -Greg > >> >> >> On 22.06.2018, at 1

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-29 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 8:40 PM Dan van der Ster wrote: > > On Thu, Jun 7, 2018 at 6:33 PM Sage Weil wrote: > > > > On Thu, 7 Jun 2018, Dan van der Ster wrote: > > > > > Wait, we found something!!! > > > > > > > > > > In the

Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-01-13 Thread Dan van der Ster
Hammer or jewel? I've forgotten which thread pool is handling the snap trim nowadays -- is it the op thread yet? If so, perhaps all the op threads are stuck sleeping? Just a wild guess. (Maybe increasing # op threads would help?). -- Dan On Thu, Jan 12, 2017 at 3:11 PM, Nick Fisk wrote: > Hi, >
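
For reference, a minimal sketch of the setting under discussion (value for illustration only):

    # ceph.conf, [osd] section
    osd snap trim sleep = 0.1

    # or injected at runtime
    ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.1'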

Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-01-19 Thread Dan van der Ster
o:ceph-users-boun...@lists.ceph.com] On Behalf Of >> Nick Fisk >> Sent: 13 January 2017 20:38 >> To: 'Dan van der Ster' >> Cc: 'ceph-users' >> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep? >> >> We're o

[ceph-users] Crash on startup

2017-02-01 Thread Hans van den Bogert
Hi All, I'm clueless as to why an OSD crashed. I have a log at [1]. If anyone can explain how this should be interpreted, then please let me know. I can only see generic errors, probably triggered by a failed assert. Restarting the OSD fails with the same errors as in [1]. It seems like, though co

Re: [ceph-users] Workaround for XFS lockup resulting in down OSDs

2017-02-08 Thread Dan van der Ster
Hi, This is interesting. Do you have a bit more info about how to identify a server which is suffering from this problem? Is there some process (xfs* or kswapd?) we'll see as busy in top or iotop. Also, which kernel are you using? Cheers, Dan On Tue, Feb 7, 2017 at 6:59 PM, Thorvald Natvig wr

Re: [ceph-users] osd_disk_thread_ioprio_priority help

2017-03-13 Thread Dan van der Ster
On Mon, Mar 13, 2017 at 10:35 AM, Florian Haas wrote: > On Sun, Mar 12, 2017 at 9:07 PM, Laszlo Budai wrote: >> Hi Florian, >> >> thank you for your answer. >> >> We have already set the IO scheduler to cfq in order to be able to lower the >> priority of the scrub operations. >> My problem is tha

Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-13 Thread Dan van der Ster
On Sat, Mar 11, 2017 at 12:21 PM, wrote: > > The next and biggest problem we encountered had to do with the CRC errors on > the OSD map. On every map update, the OSDs that were not upgraded yet, got > that CRC error and asked the monitor for a full OSD map instead of just a > delta update. At f

[ceph-users] cephfs deep scrub error:

2017-03-13 Thread Dan van der Ster
Hi John, Last week we updated our prod CephFS cluster to 10.2.6 (clients and server side), and for the first time today we've got an object info size mismatch: I found this ticket you created in the tracker, which is why I've emailed you: http://tracker.ceph.com/issues/18240 Here's the detail of

Re: [ceph-users] cephfs deep scrub error:

2017-03-13 Thread Dan van der Ster
On Mon, Mar 13, 2017 at 1:35 PM, John Spray wrote: > On Mon, Mar 13, 2017 at 10:28 AM, Dan van der Ster > wrote: >> Hi John, >> >> Last week we updated our prod CephFS cluster to 10.2.6 (clients and >> server side), and for the first time today we've got an ob

Re: [ceph-users] [ceph-fuse] Quota size change does not notify another ceph-fuse client.

2017-03-14 Thread Dan van der Ster
Hi, This sounds familiar: http://tracker.ceph.com/issues/17939 I found that you can get the updated quota on node2 by touching the base dir. In your case: touch /shares/share0 -- Dan On Tue, Mar 14, 2017 at 10:52 AM, yu2xiangyang wrote: > Dear cephers, > I met a problem when using ce
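
A minimal sketch of the workaround, reusing the path from the report:

    # on node2: force a refresh, then re-read the quota attribute
    touch /shares/share0
    getfattr -n ceph.quota.max_bytes /shares/share0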

Re: [ceph-users] CephFS fuse client users stuck

2017-03-16 Thread Dan van der Ster
On Tue, Mar 14, 2017 at 5:55 PM, John Spray wrote: > On Tue, Mar 14, 2017 at 2:10 PM, Andras Pataki > wrote: >> Hi John, >> >> I've checked the MDS session list, and the fuse client does appear on that >> with 'state' as 'open'. So both the fuse client and the MDS agree on an >> open connection.

Re: [ceph-users] XFS attempt to access beyond end of device

2017-03-22 Thread Dan van der Ster
On Wed, Mar 22, 2017 at 8:24 AM, Marcus Furlong wrote: > Hi, > > I'm experiencing the same issue as outlined in this post: > > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/013330.html > > I have also deployed this jewel cluster using ceph-deploy. > > This is the message I see

Re: [ceph-users] Mon not starting after upgrading to 10.2.7

2017-04-12 Thread Dan van der Ster
Can't help, but just wanted to say that the upgrade worked for us: # ceph health HEALTH_OK # ceph tell mon.* version mon.p01001532077488: ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185) mon.p01001532149022: ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185) mon.p01001532

[ceph-users] fsping, why you no work no mo?

2017-04-13 Thread Dan van der Ster
Dear ceph-*, A couple weeks ago I wrote this simple tool to measure the round-trip latency of a shared filesystem. https://github.com/dvanders/fsping In our case, the tool is to be run from two clients who mount the same CephFS. First, start the server (a.k.a. the ping reflector) on one mach

Re: [ceph-users] v12.0.2 Luminous (dev) released

2017-04-25 Thread Dan van der Ster
Hi, The mon's on my test luminous cluster do not start after upgrading from 12.0.1 to 12.0.2. Here is the backtrace: 0> 2017-04-25 11:06:02.897941 7f467ddd7880 -1 *** Caught signal (Aborted) ** in thread 7f467ddd7880 thread_name:ceph-mon ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935

Re: [ceph-users] v12.0.2 Luminous (dev) released

2017-04-25 Thread Dan van der Ster
__ << " loading creating_pgs e" << creating_pgs.last_scan_epoch << dendl; } ... Cheers, Dan On Tue, Apr 25, 2017 at 11:15 AM, Dan van der Ster wrote: > Hi, > > The mon's on my test luminous cluster do not start after upgrading > from 12.0.1 to 12.0.2. Here is the b

Re: [ceph-users] v12.0.2 Luminous (dev) released

2017-04-25 Thread Dan van der Ster
Created ticket to follow up: http://tracker.ceph.com/issues/19769 On Tue, Apr 25, 2017 at 11:34 AM, Dan van der Ster wrote: > Could this change be the culprit? > > commit 973829132bf7206eff6c2cf30dd0aa32fb0ce706 > Author: Sage Weil > Date: Fri Mar 31 09:33:19 2017 -0

Re: [ceph-users] Intel power tuning - 30% throughput performance increase

2017-05-03 Thread Dan van der Ster
Hi Blair, We use cpu_dma_latency=1, because it was in the latency-performance profile. And indeed by setting cpu_dma_latency=0 on one of our OSD servers, powertop now shows the package as 100% in turbo mode. So I suppose we'll pay for this performance boost in energy. But more importantly, can th

Re: [ceph-users] Intel power tuning - 30% throughput performance increase

2017-05-03 Thread Dan van der Ster
On Wed, May 3, 2017 at 9:13 AM, Blair Bethwaite wrote: > We did the latter using the pmqos_static.py, which was previously part of > the RHEL6 tuned latency-performance profile, but seems to have been dropped > in RHEL7 (don't yet know why), It looks like el7's tuned natively supports the pmqos i
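
A hedged sketch of driving this through tuned on el7 rather than the standalone script:

    tuned-adm active                        # show the current profile
    tuned-adm profile latency-performance   # profile that requests a low cpu_dma_latency
    # per-C-state exit latencies, for reference
    cat /sys/devices/system/cpu/cpu0/cpuidle/state*/latency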

Re: [ceph-users] Intel power tuning - 30% throughput performance increase

2017-05-03 Thread Dan van der Ster
On Wed, May 3, 2017 at 10:32 AM, Blair Bethwaite wrote: > On 3 May 2017 at 18:15, Dan van der Ster wrote: >> It looks like el7's tuned natively supports the pmqos interface in >> plugins/plugin_cpu.py. > > Ahha, you are right, but I'm sure I tested tuned an

Re: [ceph-users] Intel power tuning - 30% throughput performance increase

2017-05-03 Thread Dan van der Ster
On Wed, May 3, 2017 at 10:52 AM, Blair Bethwaite wrote: > On 3 May 2017 at 18:38, Dan van der Ster wrote: >> Seems to work for me, or? > > Yeah now that I read the code more I see it is opening and > manipulating /dev/cpu_dma_latency in response to that option, so the > TOD

Re: [ceph-users] Changing SSD Landscape

2017-05-17 Thread Dan van der Ster
I am currently pricing out some DCS3520's, for OSDs. Word is that the price is going up, but I don't have specifics, yet. I'm curious, does your real usage show that the 3500 series don't offer enough endurance? Here's one of our DCS3700's after 2.5 years of RBD + a bit of S3: Model Family:
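
The endurance figures quoted here come from SMART; a hedged way to pull them (device path hypothetical, attribute names vary by vendor):

    smartctl -A /dev/sdb | egrep -i 'Media_Wearout|Wear_Leveling|Host_Writes'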

Re: [ceph-users] Changing SSD Landscape

2017-05-17 Thread Dan van der Ster
On Wed, May 17, 2017 at 11:29 AM, Dan van der Ster wrote: > I am currently pricing out some DCS3520's, for OSDs. Word is that the > price is going up, but I don't have specifics, yet. > > I'm curious, does your real usage show that the 3500 series don't > offer

Re: [ceph-users] Changing SSD Landscape

2017-05-18 Thread Dan van der Ster
On Thu, May 18, 2017 at 3:11 AM, Christian Balzer wrote: > On Wed, 17 May 2017 18:02:06 -0700 Ben Hines wrote: > >> Well, ceph journals are of course going away with the imminent bluestore. > Not really, in many senses. > But we should expect far fewer writes to pass through the RocksDB and its W

Re: [ceph-users] removing cluster name support

2017-06-08 Thread Dan van der Ster
Hi Sage, We need named clusters on the client side. RBD or CephFS clients, or monitoring/admin machines all need to be able to access several clusters. Internally, each cluster is indeed called "ceph", but the clients use distinct names to differentiate their configs/keyrings. Cheers, Dan On J

Re: [ceph-users] Living with huge bucket sizes

2017-06-09 Thread Dan van der Ster
Hi Bryan, On Fri, Jun 9, 2017 at 1:55 AM, Bryan Stillwell wrote: > This has come up quite a few times before, but since I was only working with > RBD before I didn't pay too close attention to the conversation. I'm > looking > for the best way to handle existing clusters that have buckets with a

Re: [ceph-users] removing cluster name support

2017-06-09 Thread Dan van der Ster
On Fri, Jun 9, 2017 at 5:58 PM, Vasu Kulkarni wrote: > On Fri, Jun 9, 2017 at 6:11 AM, Wes Dillingham > wrote: >> Similar to Dan's situation we utilize the --cluster name concept for our >> operations. Primarily for "datamover" nodes which do incremental rbd >> import/export between distinct clus

[ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-14 Thread Dan van der Ster
Dear ceph users, Today we had O(100) slow requests which were caused by deep-scrubbing of the metadata log: 2017-06-14 11:07:55.373184 osd.155 [2001:1458:301:24::100:d]:6837/3817268 7387 : cluster [INF] 24.1d deep-scrub starts ... 2017-06-14 11:22:04.143903 osd.155 [2001:1458:301:24::100:d]:6837/
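
A hedged sketch of inspecting the metadata log involved (the log pool name is the luminous default and may differ per site):

    radosgw-admin mdlog list | head
    rados -p default.rgw.log ls | wc -l     # rough count of log objects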

Re: [ceph-users] Help build a drive reliability service!

2017-06-14 Thread Dan van der Ster
Hi Patrick, We've just discussed this internally and I wanted to share some notes. First, there are at least three separate efforts in our IT dept to collect and analyse SMART data -- its clearly a popular idea and simple to implement, but this leads to repetition and begs for a common, good solu

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-19 Thread Dan van der Ster
On Thu, Jun 15, 2017 at 7:56 PM, Casey Bodley wrote: > > On 06/14/2017 05:59 AM, Dan van der Ster wrote: >> >> Dear ceph users, >> >> Today we had O(100) slow requests which were caused by deep-scrubbing >> of the metadata log: >> >> 2017-06-14 11:

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-21 Thread Dan van der Ster
'leveldb compact on mount = true' to the osd > config and restarting. > > Casey > > > On 06/19/2017 11:01 AM, Dan van der Ster wrote: >> >> On Thu, Jun 15, 2017 at 7:56 PM, Casey Bodley wrote: >>> >>> On 06/14/2017 05:59 AM, Dan van der Ster

Re: [ceph-users] High disk utilisation

2015-12-10 Thread Dan van der Ster
On Thu, Dec 10, 2015 at 5:06 AM, Christian Balzer wrote: > > Hello, > > On Wed, 9 Dec 2015 15:57:36 + MATHIAS, Bryn (Bryn) wrote: > >> to update this, the error looks like it comes from updatedb scanning the >> ceph disks. >> >> When we make sure it doesn’t, by putting the ceph mount points in

Re: [ceph-users] problem after reinstalling system

2015-12-10 Thread Dan van der Ster
On Wed, Dec 9, 2015 at 1:25 PM, Jacek Jarosiewicz wrote: > 2015-12-09 13:11:51.171377 7fac03c7f880 -1 > filestore(/var/lib/ceph/osd/ceph-5) Error initializing leveldb : Corruption: > 29 missing files; e.g.: /var/lib/ceph/osd/ceph-5/current/omap/046388.sst Did you have .ldb files? If so, this shou

Re: [ceph-users] Ceph monitors 100% full filesystem, refusing start

2016-01-21 Thread Dan van der Ster
On Wed, Jan 20, 2016 at 8:01 PM, Zoltan Arnold Nagy wrote: > > Wouldn’t actually blowing away the other monitors then recreating them from > scratch solve the issue? > > Never done this, just thinking out loud. It would grab the osdmap and > everything from the other monitor and form a quorum, w

Re: [ceph-users] cephfs - inconsistent nfs and samba directory listings

2016-02-05 Thread Dan van der Ster
Thanks for this thread. We just did the same mistake (rmfailed) on our hammer cluster which broke it similarly. The addfailed patch worked for us too. -- Dan On Fri, Jan 15, 2016 at 6:30 AM, Mike Carlson wrote: > Hey ceph-users, > > I wanted to follow up, Zheng's patch did the trick. We re-added

Re: [ceph-users] K is for Kraken

2016-02-09 Thread Dan van der Ster
On Mon, Feb 8, 2016 at 8:10 PM, Sage Weil wrote: > On Mon, 8 Feb 2016, Karol Mroz wrote: >> On Mon, Feb 08, 2016 at 01:36:57PM -0500, Sage Weil wrote: >> > I didn't find any other good K names, but I'm not sure anything would top >> > kraken anyway, so I didn't look too hard. :) >> > >> > For L,

Re: [ceph-users] Large directory block size on XFS may be harmful

2016-02-18 Thread Dan van der Ster
Hi, Thanks for linking to a current update on this problem [1] [2]. I really hope that new Ceph installations aren't still following that old advice... it's been known to be a problem for around a year and a half [3]. That said, the "-n size=64k" wisdom was really prevalent a few years ago, and I
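
A hedged way to check whether an existing OSD filesystem was created with the large directory block size (mount point hypothetical):

    xfs_info /var/lib/ceph/osd/ceph-0 | grep naming
    # "bsize=65536" in the naming line indicates -n size=64k was used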

Re: [ceph-users] Large directory block size on XFS may be harmful

2016-02-18 Thread Dan van der Ster
On Thu, Feb 18, 2016 at 3:46 PM, Jens Rosenboom wrote: > 2016-02-18 15:10 GMT+01:00 Dan van der Ster : >> Hi, >> >> Thanks for linking to a current update on this problem [1] [2]. I >> really hope that new Ceph installations aren't still following that >>

Re: [ceph-users] v0.94.6 Hammer released

2016-02-24 Thread Dan van der Ster
Thanks Sage, looking forward to some scrub randomization. Were binaries built for el6? http://download.ceph.com/rpm-hammer/el6/x86_64/ Cheers, Dan On Tue, Feb 23, 2016 at 5:01 PM, Sage Weil wrote: > This Hammer point release fixes a range of bugs, most notably a fix for > unbounded growth of t

Re: [ceph-users] Bug in rados bench with 0.94.6 (regression, not present in 0.94.5)

2016-02-26 Thread Dan van der Ster
I can reproduce and updated the ticket. (I only upgraded the client, not the server). It seems to be related to the new --no-verify option, which is giving strange results -- see the ticket. -- Dan On Fri, Feb 26, 2016 at 11:48 AM, Alexey Sheplyakov wrote: > Christian, > >> Note that "rand" wo

Re: [ceph-users] v0.94.6 Hammer released

2016-02-29 Thread Dan van der Ster
0.94.6 Hammer released > > Hi all, > > should we build el6 packages ourselves, or is it hoped that these packages will be built officially by the community? > > > Regards, > > Vladislav Odintsov

Re: [ceph-users] v0.94.6 Hammer released

2016-02-29 Thread Dan van der Ster
If it can help, it's really very little work for me to send the hammer SRPM to our Koji build system. I think the real work will come if people starting asking for jewel builds on el6 and other old platforms. In that case, if a reputable organisation offers to maintain the builds (+ deps), then IM
