Re: [ceph-users] Questions about osd journal configuration

2014-11-26 Thread Dan Van Der Ster
> On 26 Nov 2014, at 13:47, Christian Balzer wrote: > > On Wed, 26 Nov 2014 05:37:43 -0600 Mark Nelson wrote: > >> On 11/26/2014 04:05 AM, Yujian Peng wrote: > [snip] >> >>> >>> Since the size of jornal partitions on SSDs is 10G, I want to set >>> filestore max sync interval to 30 minutes. Is

Re: [ceph-users] Questions about osd journal configuration

2014-11-26 Thread Dan Van Der Ster
Hi, > On 26 Nov 2014, at 17:07, Yujian Peng wrote: > > > Thanks a lot! > IOPS is a bottleneck in my cluster and the object disks are much slower than > SSDs. I don't know whether SSDs will be used as caches if > filestore_max_sync_interval is set to a big value. I will set > filestore_max_s
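For context, the journal can only absorb the writes accumulated between syncs, so the journal size caps the useful sync interval. A rough sketch of the arithmetic, assuming ~100 MB/s of sustained writes per OSD (an illustrative figure, not taken from the thread):

    max useful filestore max sync interval ~= journal size / (2 x write rate)
                                           ~= 10240 MB / (2 x 100 MB/s)
                                           ~= 50 s

So with a 10 GB journal there is little to gain from raising the [osd] option "filestore max sync interval" much beyond a minute, let alone to 30 minutes.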

Re: [ceph-users] Questions about osd journal configuration

2014-11-26 Thread Dan Van Der Ster
> On 26 Nov 2014, at 17:26, Dan Van Der Ster wrote: > > Hi, > >> On 26 Nov 2014, at 17:07, Yujian Peng wrote: >> >> >> Thanks a lot! >> IOPS is a bottleneck in my cluster and the object disks are much slower than >> SSDs. I d

[ceph-users] large reads become 512 byte reads on qemu-kvm rbd

2014-11-27 Thread Dan Van Der Ster
Hi all, We throttle (with qemu-kvm) rbd devices to 100 w/s and 100 r/s (and 80MB/s write and read). With fio we cannot exceed 51.2MB/s sequential or random reads, no matter the reading block size. (But with large writes we can achieve 80MB/s). I just realised that the VM subsytem is probably sp
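A quick sanity check on those numbers, assuming each large read is being split into 512 KB requests (as the follow-ups below suggest):

    100 read requests/s (iops throttle) x 512 KB per request = 51.2 MB/s

which matches the observed read ceiling exactly, while writes that reach the device as larger requests can still hit the 80 MB/s bps limit.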

Re: [ceph-users] large reads become 512 kbyte reads on qemu-kvm rbd

2014-11-27 Thread Dan Van Der Ster
, Dan On 27 Nov 2014 18:26, Dan Van Der Ster wrote: Hi all, We throttle (with qemu-kvm) rbd devices to 100 w/s and 100 r/s (and 80MB/s write and read). With fio we cannot exceed 51.2MB/s sequential or random reads, no matter the reading block size. (But with large writes we can achieve 80MB/s). I

Re: [ceph-users] large reads become 512 kbyte reads on qemu-kvm rbd

2014-11-28 Thread Dan Van Der Ster
dev rule or similar that could set max_sectors_kb when a RBD device is attached? Cheers, Dan On 27 Nov 2014, at 20:29, Dan Van Der Ster <daniel.vanders...@cern.ch> wrote: Oops, I was off by a factor of 1000 in my original subject. We actually have 4M and 8M reads being split
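One way to do that inside the guest is a udev rule on the virtio disks; a minimal sketch (the rule file name, the vd[a-z] match and the 4096 KB value are assumptions for illustration):

    # e.g. /etc/udev/rules.d/99-vd-max-sectors.rules
    ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="vd[a-z]", ATTR{queue/max_sectors_kb}="4096"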

Re: [ceph-users] large reads become 512 kbyte reads on qemu-kvm rbd

2014-11-28 Thread Dan Van Der Ster
this impacts performance? Like small block size performance, etc? Cheers From: "Dan Van Der Ster" <daniel.vanders...@cern.ch> To: "ceph-users" <ceph-users@lists.ceph.com> Sent: Friday, 28 November, 2014 1:33:20 PM Subj

Re: [ceph-users] Removing Snapshots Killing Cluster Performance

2014-12-01 Thread Dan Van Der Ster
Hi, Which version of Ceph are you using? This could be related: http://tracker.ceph.com/issues/9487 See "ReplicatedPG: don't move on to the next snap immediately"; basically, the OSD is getting into a tight loop "trimming" the snapshot objects. The fix above breaks out of that loop more frequent
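Until a fixed release is running, one commonly used mitigation is to throttle the snap trimmer itself; a sketch (the 0.1 s value is an illustrative assumption, not a tuned recommendation):

    # sleep between snap trim operations on every OSD
    ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.1'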

Re: [ceph-users] large reads become 512 kbyte reads on qemu-kvm rbd

2014-12-01 Thread Dan Van Der Ster
Hi Ilya, > On 28 Nov 2014, at 17:56, Ilya Dryomov wrote: > > On Fri, Nov 28, 2014 at 5:46 PM, Dan Van Der Ster > wrote: >> Hi Andrei, >> Yes, I’m testing from within the guest. >> >> Here is an example. First, I do 2MB reads when the max_sectors_kb=512, an

Re: [ceph-users] Removing Snapshots Killing Cluster Performance

2014-12-01 Thread Dan Van Der Ster
> On 01 Dec 2014, at 13:37, Daniel Schneller > wrote: > > On 2014-12-01 10:03:35 +0000, Dan Van Der Ster said: > >> Which version of Ceph are you using? This could be related: >> http://tracker.ceph.com/issues/9487 > > Firefly. I had seen this ticket earlier

Re: [ceph-users] full osdmaps in mon txns

2015-01-05 Thread Dan van der Ster
Hi Sage, On Tue, Dec 23, 2014 at 10:10 PM, Sage Weil wrote: > > This fun issue came up again in the form of 10422: > > http://tracker.ceph.com/issues/10422 > > I think we have 3 main options: > > 1. Ask users to do a mon scrub prior to upgrade to > ensure it is safe. If a mon is out of s

Re: [ceph-users] full osdmaps in mon txns

2015-01-06 Thread Dan van der Ster
On Mon, Jan 5, 2015 at 10:12 AM, Dan van der Ster wrote: > Hi Sage, > > On Tue, Dec 23, 2014 at 10:10 PM, Sage Weil wrote: >> >> This fun issue came up again in the form of 10422: >> >> http://tracker.ceph.com/issues/10422 >> >> I think we ha

Re: [ceph-users] Is ceph production ready? [was: Ceph PG Incomplete = Cluster unusable]

2015-01-07 Thread Dan Van Der Ster
Hi Nico, Yes Ceph is production ready. Yes people are using it in production for qemu. Last time I heard, Ceph was surveyed as the most popular backend for OpenStack Cinder in production. When using RBD in production, it really is critically important to (a) use 3 replicas and (b) pay attention
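Concretely, for an existing pool that means something like the following (pool name is a placeholder; min_size 2 is a common companion setting, not something stated above):

    ceph osd pool set volumes size 3
    ceph osd pool set volumes min_size 2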

Re: [ceph-users] NUMA and ceph ... zone_reclaim_mode

2015-01-12 Thread Dan van der Ster
(resending to list) Hi Kyle, I'd like to +10 this old proposal of yours. Let me explain why... A couple months ago we started testing a new use-case with radosgw -- this new user is writing millions of small files and has been causing us some headaches. Since starting these tests, the relevant OS

[ceph-users] NUMA zone_reclaim_mode

2015-01-12 Thread Dan Van Der Ster
(apologies if you receive this more than once... apparently I cannot reply to a 1 year old message on the list). Dear all, I'd like to +10 this old proposal of Kyle's. Let me explain why... A couple months ago we started testing a new use-case with radosgw -- this new user is writing millions of
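For reference, checking and disabling zone reclaim looks like this (the persistence file path may vary by distribution):

    # a non-zero value means zone reclaim is active
    cat /proc/sys/vm/zone_reclaim_mode
    # disable at runtime
    sysctl -w vm.zone_reclaim_mode=0
    # persist across reboots
    echo 'vm.zone_reclaim_mode = 0' >> /etc/sysctl.conf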

Re: [ceph-users] NUMA zone_reclaim_mode

2015-01-12 Thread Dan Van Der Ster
On 12 Jan 2015, at 17:08, Sage Weil <s...@newdream.net> wrote: On Mon, 12 Jan 2015, Dan Van Der Ster wrote: Moving forward, I think it would be good for Ceph to at least document this behaviour, but better would be to also detect when zone_reclaim_mode != 0 and warn the admin

Re: [ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Dan van der Ster
Hi, I don't know the general calculation, but last week we split a pool with 20 million tiny objects from 512 to 1024 pgs, on a cluster with 80 OSDs. IIRC around 7 million objects needed to move, and it took around 13 hours to finish. The bottleneck in our case was objects per second (limited to ar

Re: [ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Dan van der Ster
applications don't deal with this well. > > I am not convinced that increase pg_num gradually is the right way to go. Have you tried giving backfilling traffic very low priorities? > > Thanks. > -Simon > > On Sun, Feb 1, 2015 at 2:39 PM, Dan van der Ster wrote: >> >
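If not, a typical way to deprioritize backfill/recovery traffic looks like this (values are illustrative; the same options can also be set in ceph.conf under [osd]):

    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_op_priority 1'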

Re: [ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Dan van der Ster
bile so can't answer precisely). If you have a large purged_snaps set on the images pool, then I'd bet you're suffering from the snap trim issue I mentioned. 0.80.8 fixes it... You won't see slow requests anymore. Cheers, Dan > Thanks. > -Simon > > > On Sun, Feb

Re: [ceph-users] centos6.4 + libvirt + qemu + rbd/ceph

2013-12-06 Thread Dan van der Ster
See thread a couple days ago "[ceph-users] qemu-kvm packages for centos" On Thu, Dec 5, 2013 at 10:44 PM, Chris C wrote: > I've been working on getting this setup working. I have virtual machines > working using rbd based images by editing the domain directly. > > Is there any way to make the cr

Re: [ceph-users] Basic cephx configuration

2013-12-06 Thread Dan Van Der Ster
Hi, All of our clusters have this in ceph.conf: [global] auth cluster required = cephx auth service required = cephx auth client required = cephx keyring = /etc/ceph/keyring and the client.admin secret in /etc/ceph/keyring: # cat /etc/ceph/keyring [client.admin] key = ... With t

Re: [ceph-users] ceph reliability in large RBD setups

2013-12-09 Thread Dan van der Ster
Hi Felix, I've been running similar calculations recently. I've been using this tool from Inktank to calculate RADOS reliabilities with different assumptions: https://github.com/ceph/ceph-tools/tree/master/models/reliability But I've also had similar questions about RBD (or any multi-part files

Re: [ceph-users] ulimit max user processes (-u) and non-root ceph clients

2013-12-16 Thread Dan van der Ster
ys discussed the client ulimit issues recently and is there a plan in the works? Best Regards, Dan, CERN IT/DSS On Sep 19, 2013 6:10 PM, "Gregory Farnum" wrote: > On Wed, Sep 18, 2013 at 11:43 PM, Dan Van Der Ster > wrote: > > > > On Sep 18, 2013, at 11:50 PM, Gregory Fa

Re: [ceph-users] ulimit max user processes (-u) and non-root ceph clients

2013-12-16 Thread Dan Van Der Ster
On Dec 16, 2013 8:26 PM, Gregory Farnum wrote: > > On Mon, Dec 16, 2013 at 11:08 AM, Dan van der Ster > wrote: > > Hi, > > > > Sorry to revive this old thread, but I wanted to update you on the current > > pains we're going through related to clients'

Re: [ceph-users] ulimit max user processes (-u) and non-root ceph clients

2013-12-17 Thread Dan van der Ster
On Mon, Dec 16, 2013 at 8:36 PM, Dan Van Der Ster wrote: > > On Dec 16, 2013 8:26 PM, Gregory Farnum wrote: >> >> On Mon, Dec 16, 2013 at 11:08 AM, Dan van der Ster >> wrote: >> > Hi, >> > >> > Sorry to revive this old thread, but I wanted
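For anyone else hitting this: on RHEL-family clients the per-user process/thread cap is set via limits.d (RHEL 6 ships a low default in 90-nproc.conf). A sketch, with the file name, user and values as placeholders:

    # /etc/security/limits.d/92-ceph-clients.conf
    someuser  soft  nproc  32768
    someuser  hard  nproc  32768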

Re: [ceph-users] Storing VM Images on CEPH with RBD-QEMU driver

2013-12-20 Thread Dan van der Ster
Hi, Our fio tests against qemu-kvm on RBD look quite promising, details here: https://docs.google.com/spreadsheet/ccc?key=0AoB4ekP8AM3RdGlDaHhoSV81MDhUS25EUVZxdmN6WHc&usp=drive_web#gid=0 tl;dr: rbd with caching enabled is (1) at least 2x faster than the local instance storage, and (2) reaches the

Re: [ceph-users] Storing VM Images on CEPH with RBD-QEMU driver

2013-12-20 Thread Dan van der Ster
On Fri, Dec 20, 2013 at 9:44 AM, Christian Balzer wrote: > > Hello, > > On Fri, 20 Dec 2013 09:20:48 +0100 Dan van der Ster wrote: > >> Hi, >> Our fio tests against qemu-kvm on RBD look quite promising, details here: >> >> http

Re: [ceph-users] Storing VM Images on CEPH with RBD-QEMU driver

2013-12-20 Thread Dan van der Ster
On Fri, Dec 20, 2013 at 6:19 PM, James Pearce wrote: > > "fio --size=100m --ioengine=libaio --invalidate=1 --direct=1 --numjobs=10 > --rw=read --name=fiojob --blocksize_range=4K-512k --iodepth=16" > > Since size=100m so reads would be entirely cached --invalidate=1 drops the cache, no? Our result

[ceph-users] backfilling after OSD marked out _and_ OSD removed

2014-01-09 Thread Dan Van Der Ster
any data on those OSDs and all PGs being previously healthy. Is this expected? Is there a way to avoid the 2nd rebalance? Best Regards, Dan van der Ster CERN IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi

Re: [ceph-users] backfilling after OSD marked out _and_ OSD removed

2014-01-09 Thread Dan Van Der Ster
ceph osd crush reweight osd.1036 2.5 it is going to result in some backfilling. Why? Cheers, Dan On 09 Jan 2014, at 12:11, Dan van der Ster wrote: > Hi, > I’m slightly confused about one thing we are observing at the moment. We’re > testing the shutdown/removal of OSD servers an

Re: [ceph-users] backfilling after OSD marked out _and_ OSD removed

2014-01-09 Thread Dan Van Der Ster
Thanks Greg. One thought I had is that I might try just crush rm'ing the OSD instead of or just after marking it out... That should avoid the double rebalance, right? Cheers, Dan On Jan 9, 2014 7:57 PM, Gregory Farnum wrote: On Thu, Jan 9, 2014 at 6:27 AM, Dan Van Der Ster wrote: >

Re: [ceph-users] radosgw machines virtualization

2014-02-06 Thread Dan van der Ster
Hi, Our three radosgw's are OpenStack VMs. Seems to work for our (limited) testing, and I don't see a reason why it shouldn't work. Cheers, Dan -- Dan van der Ster || Data & Storage Services || CERN IT Department -- On Thu, Feb 6, 2014 at 2:12 PM, Dominik Mostowiec wro

Re: [ceph-users] RBD Caching - How to enable?

2014-02-06 Thread Dan van der Ster
On Thu, Feb 6, 2014 at 12:11 PM, Alexandre DERUMIER wrote: >>>Do the VMs using RBD images need to be restarted at all? > I think yes. In our case, we had to restart the hypervisor qemu-kvm process to enable caching. Cheers, Dan ___ ceph-users mailing l
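For reference, RBD caching is a client-side setting; a minimal sketch of the relevant ceph.conf section (the second option is a common safety companion, and qemu also needs cache=writeback on the drive):

    [client]
    rbd cache = true
    rbd cache writethrough until flush = true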

[ceph-users] slow requests from rados bench with small writes

2014-02-15 Thread Dan van der Ster
source/queue is being exhausted during these tests? Oh yeah, we're running latest dumpling stable, 0.67.5, on the servers. Best Regards, Thanks in advance! Dan -- Dan van der Ster || Data & Storage Services || CERN IT Department -- ___ ceph

Re: [ceph-users] slow requests from rados bench with small writes

2014-02-16 Thread Dan van der Ster
a very long tail of writes. I hope that someone will chip in if they've already been down this path and has advice/warnings. Cheers, dan -- Dan van der Ster || Data & Storage Services || CERN IT Department -- On Sat, Feb 15, 2014 at 11:48 PM, Dan van der Ster wrote: > Dear Ceph

[ceph-users] hashpspool and backfilling

2014-02-18 Thread Dan van der Ster
e+degraded+remapped+wait_backfill [884,1186,122,841] [884,1186,182,1216] 5.1bc active+degraded+remapped+wait_backfill [884,1186,122,841] [884,1186,182,1216] 32.1a1 active+degraded+remapped+backfilling [884,1186,122] [884,1186,1216] full details at: http://pastebin.com/raw.php?i=LBpx5VsD -- Dan va

Re: [ceph-users] hashpspool and backfilling

2014-02-20 Thread Dan van der Ster
Hi, On Thu, Feb 20, 2014 at 7:47 PM, Gregory Farnum wrote: > On Tue, Feb 18, 2014 at 8:21 AM, Dan van der Ster > wrote: > > Hi, > > Today I've noticed an interesting result of not have hashpspool > > enabled on a number of pools -- backfilling is delayed. > >

Re: [ceph-users] CephFS and slow requests

2014-02-21 Thread Dan van der Ster
355 [write 0~4194304 [12@0],startsync 0~0] 0.c36d4557 snapc 1=[] e42655) v4 currently waiting for subops from [558,827] Cheers, Dan -- Dan van der Ster || Data & Storage Services || CERN IT Department -- On Thu, Feb 20, 2014 at 4:02 PM, Gregory Farnum wrote: > Arne, > Sorry th

Re: [ceph-users] CephFS and slow requests

2014-02-24 Thread Dan van der Ster
also explain why only the cephfs writes are becoming slow -- the 2kHz of other (mostly RBD) IOs are not affected by this "overload". Cheers, Dan -- Dan van der Ster || Data & Storage Services || CERN IT Department -- On Tue, Feb 25, 2014 at 7:25 AM, Gregory Farnum wrote: > I

[ceph-users] Qemu iotune values for RBD

2014-03-06 Thread Dan van der Ster
Hi all, We're about to go live with some qemu rate limiting to RBD, and I wanted to crosscheck our values with this list, in case someone can chime in with their experience or known best practices. The only reasonable, non test-suite, values I found on the web are: iops_wr 200 iops_rd 400 bps_
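For plain qemu-kvm command lines those knobs map onto -drive throttle options; a sketch using the iops values above (the rbd pool/image, cache mode and format are placeholders, and the bps limits are omitted because the original values are cut off here):

    qemu-kvm ... -drive file=rbd:volumes/vm1-disk,format=raw,if=virtio,cache=writeback,iops_rd=400,iops_wr=200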

Re: [ceph-users] Qemu iotune values for RBD

2014-03-07 Thread Dan van der Ster
On Thu, Mar 6, 2014 at 10:54 PM, Wido den Hollander wrote: > On 03/06/2014 08:38 PM, Dan van der Ster wrote: >> >> Hi all, >> >> We're about to go live with some qemu rate limiting to RBD, and I >> wanted to crosscheck our values with this list, in c

Re: [ceph-users] if partition name changes, will ceph get corrupted?

2014-03-12 Thread Dan Van Der Ster
We use /dev/disk/by-path for this reason, but we confirmed that is stable for our HBAs. Maybe /dev/disk/by-something is consistent with your controller. Cheers, Dan -- Dan van der Ster || Data & Storage Services || CERN IT Department -- Original Message From: Sidh

Re: [ceph-users] OSD down after PG increase

2014-03-13 Thread Dan Van Der Ster
Why do you create so many PGs ?? The goal is 100 per OSD, with your numbers you have 3 * (48000) / 140 ~= 1000 per OSD. -- Dan van der Ster || Data & Storage Services || CERN IT Department -- On 13 Mar 2014 at 11:11:16, Kasper Dieter (dieter.kas...@ts.fujitsu.com

Re: [ceph-users] OSD down after PG increase

2014-03-13 Thread Dan Van Der Ster
On 13 Mar 2014 at 10:46:13, Gandalf Corvotempesta (gandalf.corvotempe...@gmail.com) wrote: > Yes, if you have essentially high amount of commited data in the cluster > and/or large number of PG(tens of thousands). I've increased from 64 to 8192 PGs Do you

Re: [ceph-users] OSD down after PG increase

2014-03-13 Thread Dan Van Der Ster
On 13 Mar 2014 at 11:23:44, Gandalf Corvotempesta (gandalf.corvotempe...@gmail.com) wrote: 2014-03-13 11:19 GMT+01:00 Dan Van Der Ster : > Do you mean you used PG splitting? > > You should split PGs by a factor of 2x at a time. So to get

Re: [ceph-users] OSD down after PG increase

2014-03-13 Thread Dan Van Der Ster
On 13 Mar 2014 at 11:26:55, Gandalf Corvotempesta (gandalf.corvotempe...@gmail.com) wrote: I'm also unsure if 8192 PGs are correct for my cluster. At maximum i'll have 168 OSDs (14 servers, 12 disks each, 1 osd per disk), with replica set to 3, so: (168*100
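For reference, the usual rule of thumb works out like this for that hardware, counting all pools together and rounding up to a power of two:

    total PGs ~= (OSDs x 100) / replicas = (168 x 100) / 3 ~= 5600  ->  8192 (next power of two)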

Re: [ceph-users] OSD down after PG increase

2014-03-13 Thread Dan Van Der Ster
On 13 Mar 2014 at 11:41:30, Gandalf Corvotempesta (gandalf.corvotempe...@gmail.com) wrote: 2014-03-13 11:32 GMT+01:00 Dan Van Der Ster : > Do you have any other pools? Remember that you need to include _all_ pools > in the PG calculation, not j

Re: [ceph-users] PG Calculations

2014-03-14 Thread Dan Van Der Ster
correct me if I'm wrong. With your config you must have an avg 400 PGs per OSD. Do you find peering/backfilling/recovery to be responsive? How is the CPU and memory usage of your OSDs during backfilling? Cheers, Dan -- Dan van der Ster || Data & Storage Services || CERN IT Department --

Re: [ceph-users] Get list of all RADOSGW users

2014-03-20 Thread Dan Van Der Ster
Or radosgw-admin metadata list user On Mar 20, 2014 7:23 PM, "Michael J. Kidd" wrote: How about this: rados ls -p .users.uid Your pool name may vary, but should contain the .users.uid extension. Michael J. Kidd Sr. Storage Consultant Inktank Professional Services On Thu, Mar 20, 2014 at 2:00

[ceph-users] How to upgrade ceph v0.72.2 Emperor to v0.8

2014-03-25 Thread Thanh. Tran Van (3)
Hi, I'm trying to find out how to upgrade my current Ceph (v0.72.2 Emperor) to v0.8. I searched the Internet and the ceph.com website, but I haven't found any guide yet. Please show me how to upgrade, or point me to some resources on how to perform it. Best regards, Thanh Tran

[ceph-users] TCP failed connection attempts

2014-03-26 Thread Dan Van Der Ster
— but this doesn’t look good. Do others have similar numbers? Does anyone know if some ipv4 sysctl tuning can clear this up? Cheers, Dan -- Dan van der Ster || Data & Storage Services || CERN IT Department -- ___ ceph-users mailing list ceph-u

Re: [ceph-users] TCP failed connection attempts

2014-03-26 Thread Dan Van Der Ster
Thanks, I’ll try that. (Our current settings are the exact opposite of your suggestion). I found an old thread discussing a new option, ms tcp rcvbuf, but I found that it is still not enabled by default in dumpling: "ms_tcp_rcvbuf": "0", Not sure if that’s related. Chee

Re: [ceph-users] TCP failed connection attempts

2014-03-27 Thread Dan Van Der Ster
ct attempts should be replication from other OSDs. The suggested sysctl changes didn’t stop the failed conn attempts from increasing. I’m going to keep looking around… Cheers, Dan -- Dan van der Ster || Data & Storage Services || CERN IT Department -- _

Re: [ceph-users] TCP failed connection attempts

2014-03-27 Thread Dan Van Der Ster
d from above. So I’m still looking … Cheers, Dan -- Dan van der Ster || Data & Storage Services || CERN IT Department -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Linux kernel module / package / drivers needed for RedHat 6.4 to work with CEPH RBD

2014-03-28 Thread Dan Van Der Ster
We use the elrepo kernel-ml (kernel mainline) packages for some kernel RBD RHEL 6 clients. Works quite well, but obviously RedHat won’t give you enterprise support for it. http://elrepo.org/tiki/kernel-ml Cheers, Dan -- Dan van der Ster || Data & Storage Services || CERN IT Depart

Re: [ceph-users] Security Hole?

2014-03-31 Thread Dan Van Der Ster
Hi, I can't reproduce that with a dumpling cluster: # cat ceph.client.dpm.keyring [client.dpm] key = xxx caps mon = "allow r" caps osd = "allow x, allow rwx pool=dpm" # ceph health --id dpm HEALTH_OK # ceph auth list --id dpm Error EACCES: access denied Cheers, Dan _

Re: [ceph-users] Mon hangs when started after Emperor upgrade

2014-03-31 Thread Dan Van Der Ster
Perhaps as a workaround you should just wipe this mon's data dir and remake it? In the past when I upgraded our mons from spinning disks to SSDs, I went through a procedure to remake each mon from scratch (wiping and resyncing each mon's leveldb one at a time). I did something like this: servic

Re: [ceph-users] Largest Production Ceph Cluster

2014-04-01 Thread Dan Van Der Ster
other instances for ongoing tests. BTW#2, I don’t think the CERN cluster is the largest. Isn’t DreamHost’s bigger? Cheers, Dan -- Dan van der Ster || Data & Storage Services || CERN IT Department -- ___ ceph-users mailing list ceph-users@lists.

Re: [ceph-users] Largest Production Ceph Cluster

2014-04-03 Thread Dan Van Der Ster
Hi, On Apr 3, 2014 4:49 AM, Christian Balzer wrote: > > On Tue, 1 Apr 2014 14:18:51 +0000 Dan Van Der Ster wrote: > > [snip] > > > > > > http://www.slideshare.net/Inktank_Ceph/scaling-ceph-at-cern > > > > [snap] > > In that slide it says th

Re: [ceph-users] out then rm / just rm an OSD?

2014-04-03 Thread Dan Van Der Ster
Hi, By my observation, I don't think that marking it out before crush rm would be any safer. Normally what I do (when decommissioning an OSD or whole server) is stop the OSD process, then crush rm / osd rm / auth del the OSD shortly afterwards, before the down out interval expires. Since the OS
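Spelled out, that sequence looks roughly like this (osd id is a placeholder; the stop command is for sysvinit hosts):

    service ceph stop osd.123          # on the OSD's host
    ceph osd crush remove osd.123
    ceph auth del osd.123
    ceph osd rm 123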

Re: [ceph-users] Small Production system with Openstack

2014-04-05 Thread Dan Van Der Ster
Hi, I'm not looking at your hardware in detail (except to say that you absolutely must have 3 monitors and that I don't know what a load balancer would be useful for in this setup), but perhaps the two parameters below may help you evaluate your system. To estimate the IOPS capacity of your cl
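A common back-of-the-envelope estimate for a spinning-disk cluster, assuming ~100 IOPS per drive and journals co-located on the same disks (both assumptions, adjust for your hardware):

    write IOPS ~= (num drives x 100) / (replicas x 2)   # the extra /2 is for the co-located journal write
    read IOPS  ~=  num drives x 100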

Re: [ceph-users] ceph cluster health monitoring

2014-04-11 Thread Dan Van Der Ster
It’s pretty basic, but we run this hourly: https://github.com/cernceph/ceph-scripts/blob/master/ceph-health-cron/ceph-health-cron -- Dan van der Ster || Data & Storage Services || CERN IT Department -- On 11 Apr 2014 at 09:12:13, Pavel V. Kaygorodov (pa...@inasan.ru

Re: [ceph-users] Useful visualizations / metrics

2014-04-13 Thread Dan Van Der Ster
For our cluster we monitor write latency by running a short (10s) rados bench with one thread writing 64kB objects, every 5 minutes or so. rados bench tells you the min, max, and average of those writes -- we plot them all. An example is attached. The latency and other metrics that we plot (inc
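That probe is essentially the following (pool name is a placeholder):

    rados -p test bench 10 write -t 1 -b 65536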

Re: [ceph-users] multi-mds and directory sharding

2014-04-14 Thread Dan Van Der Ster
ltimds.  Does the dev version of the MDS rely on any dev features in RADOS? ie. can we use a dumpling or emperor cluster with dev MDS? And what is the status of fuse cephfs in the new dev version? Is that up to date with the latest kernel client? Cheers, Dan -- Dan van der Ster || Data & Storage

[ceph-users] RBD write access patterns and atime

2014-04-16 Thread Dan van der Ster
096: 675012 8192: 488194 516096: 342771 16384: 187577 65536: 87783 131072: 87279 12288: 66735 49152: 50170 24576: 47794 262144: 45199 466944: 23064 So reads are mostly 512kB, which is probably some default read-ahead size. -- Dan van der Ster || Data & Storage Services || CERN IT Department -- ___

Re: [ceph-users] RBD write access patterns and atime

2014-04-17 Thread Dan van der Ster
Hi, Gregory Farnum wrote: I forget which clients you're using — is rbd caching enabled? Yes, the clients are qemu-kvm-rhev with latest librbd from dumpling and rbd cache = true. Cheers, Dan -- Dan van der Ster || Data & Storage Services || CERN IT D

Re: [ceph-users] RBD write access patterns and atime

2014-04-17 Thread Dan van der Ster
/4096 The last num is the size of the write/read. Then run this: https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl Cheers, Dan -- -- Dan van der Ster || Data & Storage Services || CERN IT Department -- ___ ceph-users mailing

Re: [ceph-users] RBD write access patterns and atime

2014-04-17 Thread Dan van der Ster
y default on our RHEL6 client VMs, so it probably isn't the file accesses leading to many small writes. Any other theories? Cheers, Dan -- Dan van der Ster || Data & Storage Services || CERN IT Department -- ___ ceph-users mailing list ceph-users@

Re: [ceph-users] OSD distribution unequally

2014-04-18 Thread Dan Van Der Ster
ceph osd reweight-by-utilization Is that still in 0.79? I'd start with reweight-by-utilization 200 and then adjust that number down until you get to 120 or so. Cheers, Dan On Apr 18, 2014 12:49 PM, Kenneth Waegeman wrote: Hi, Some osds of our cluster filled up: health HEALTH_ERR 1 full

Re: [ceph-users] Pool with empty name recreated

2014-04-24 Thread Dan van der Ster
17 '' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 3347 owner 0 ceph version 0.67.7 (d7ab4244396b57aac8b7e80812115bbd079e6b73) How can I delete it forever? -- Dan van der Ster || Data & Storage Services || CERN IT Department -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Unable to add CEPH as Primary Storage - libvirt error undefined storage pool type

2014-04-28 Thread Dan van der Ster
On 28/04/14 14:54, Wido den Hollander wrote: On 04/28/2014 02:15 PM, Andrija Panic wrote: Thank you very much Wido, any suggestion on compiling libvirt with support (I already found a way) or perhaps use some prebuilt , that you would recommend ? No special suggestions, just make sure you us

Re: [ceph-users] Unable to add CEPH as Primary Storage - libvirt error undefined storage pool type

2014-04-28 Thread Dan van der Ster
ready have rbd enabled qemu, qemu-img etc from ceph.com site) I need just libvirt with rbd support ? Thanks On 28 April 2014 15:05, Andrija Panic <andrija.pa...@gmail.com> wrote: Thanks Dan :) On 28 April 2014 15:02, Dan van der Ster

Re: [ceph-users] Replace journals disk

2014-05-06 Thread Dan Van Der Ster
I've followed this recipe successfully in the past: http://wiki.skytech.dk/index.php/Ceph_-_howto,_rbd,_lvm,_cluster#Add.2Fmove_journal_in_running_cluster On May 6, 2014 12:34 PM, Gandalf Corvotempesta wrote: > > Hi to all, > I would like to replace a disk used as journal (one partition for eac
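A typical journal-replacement sequence looks roughly like this sketch (osd id and paths are placeholders; the linked recipe is the authoritative version):

    ceph osd set noout
    service ceph stop osd.12
    ceph-osd -i 12 --flush-journal
    # repoint /var/lib/ceph/osd/ceph-12/journal at the new partition, then:
    ceph-osd -i 12 --mkjournal
    service ceph start osd.12
    ceph osd unset noout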

Re: [ceph-users] Cache tiering

2014-05-07 Thread Dan van der Ster
Hi, Gregory Farnum wrote: 3) The cost of a cache miss is pretty high, so they should only be used when the active set fits within the cache and doesn't change too frequently. Can you roughly quantify how long a cache miss would take? Naively I'd assume it would turn one read into a read from

Re: [ceph-users] v0.80 Firefly released

2014-05-07 Thread Dan van der Ster
Hi, Sage Weil wrote: **Primary affinity*: Ceph now has the ability to skew selection of OSDs as the "primary" copy, which allows the read workload to be cheaply skewed away from parts of the cluster without migrating any data. Can you please elaborate a bit on this one? I found the bl
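For anyone wanting to experiment with it, the knob looks like this (osd id and the 0.5 value are placeholders; in firefly the monitors must also allow it first):

    # ceph.conf on the monitors
    [mon]
    mon osd allow primary affinity = true

    # then, e.g., halve the chance that osd.5 is chosen as primary
    ceph osd primary-affinity osd.5 0.5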

[ceph-users] 0.67.7 rpms changed today??

2014-05-08 Thread Dan van der Ster
-0.67.7-0.el6.x86_64.rpm (568 kiB) Yet the dates haven't changed. Is that understood? It's not a malicious incident, is it? Cheers, Dan -- Dan van der Ster || Data & Storage Services || CERN IT Department -- ___ ceph-users mailing l

Re: [ceph-users] Bulk storage use case

2014-05-13 Thread Dan van der Ster
Hi, I think you're not getting many replies simply because those are rather large servers and not many have such hardware in prod. We run with 24x3TB drives, 64GB ram, one 10Gbit NIC. Memory-wise there are no problems. Throughput-wise, the bottleneck is somewhere between the NIC (~1GB/s) and

[ceph-users] testing a crush rule against an out osd

2015-09-02 Thread Dan van der Ster
Hi all, We just ran into a small problem where some PGs wouldn't backfill after an OSD was marked out. Here's the relevant crush rule; being a non-trivial example I'd like to test different permutations of the crush map (e.g. increasing choose_total_tries): rule critical { ruleset 4

Re: [ceph-users] testing a crush rule against an out osd

2015-09-02 Thread Dan van der Ster
On Wed, Sep 2, 2015 at 4:11 PM, Sage Weil wrote: > On Wed, 2 Sep 2015, Dan van der Ster wrote: >> ... >> Normally I use crushtool --test --show-mappings to test rules, but >> AFAICT it doesn't let you simulate an out osd, i.e. with reweight = 0. >> Any ideas ho
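For the record, a sketch of simulating this, assuming crushtool's --test mode accepts per-device weights (rule 4 and 3 replicas are from the example above; osd id 52 and the input range are placeholders):

    ceph osd getcrushmap -o crushmap
    crushtool -i crushmap --test --show-mappings --rule 4 --num-rep 3 --weight 52 0 --min-x 0 --max-x 1023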

Re: [ceph-users] testing a crush rule against an out osd

2015-09-02 Thread Dan van der Ster
On Wed, Sep 2, 2015 at 4:23 PM, Sage Weil wrote: > On Wed, 2 Sep 2015, Dan van der Ster wrote: >> On Wed, Sep 2, 2015 at 4:11 PM, Sage Weil wrote: >> > On Wed, 2 Sep 2015, Dan van der Ster wrote: >> >> ... >> >> Normally I use crushtool --test --show-m

Re: [ceph-users] testing a crush rule against an out osd

2015-09-02 Thread Dan van der Ster
On Wed, Sep 2, 2015 at 7:23 PM, Sage Weil wrote: > On Wed, 2 Sep 2015, Dan van der Ster wrote: >> On Wed, Sep 2, 2015 at 4:23 PM, Sage Weil wrote: >> > On Wed, 2 Sep 2015, Dan van der Ster wrote: >> >> On Wed, Sep 2, 2015 at 4:11 PM, Sage Weil wrote: >> >&

Re: [ceph-users] move/upgrade from straw to straw2

2015-09-21 Thread Dan van der Ster
On Mon, Sep 21, 2015 at 12:11 PM, Wido den Hollander wrote: > You can also change 'straw_calc_version' to 2 in the CRUSHMap. AFAIK straw_calc_version = 1 is the optimal. straw_calc_version = 2 is not defined. See src/crush/builder.c Cheers, Dan ___ cep

Re: [ceph-users] Important security noticed regarding release signing key

2015-09-21 Thread Dan van der Ster
On Mon, Sep 21, 2015 at 3:50 PM, Wido den Hollander wrote: > > > On 21-09-15 15:05, SCHAER Frederic wrote: >> Hi, >> >> Forgive the question if the answer is obvious... It's been more than "an >> hour or so" and eu.ceph.com apparently still hasn't been re-signed or at >> least what I checked was

Re: [ceph-users] Antw: Hammer reduce recovery impact

2015-09-23 Thread Dan van der Ster
On Wed, Sep 23, 2015 at 1:44 PM, Steffen Weißgerber wrote: > "... osd recovery op priority: This is > the priority set for recovery operation. Lower the number, higher the > recovery priority. > Higher recovery priority might cause performance degradation until recovery > completes. " > > So w

Re: [ceph-users] v9.1.0 Infernalis release candidate released

2015-10-14 Thread Dan van der Ster
: fix peek_queue locking in FileStore (Xinze Chi) >> * osd: fix promotion vs full cache tier (Samuel Just) >> * osd: fix replay requeue when pg is still activating (#13116 Samuel Just) >> * osd: fix scrub stat bugs (Sage Weil, Samuel Just) >> * osd: force promotion for ops EC

Re: [ceph-users] ceph-mon crash after update to Hammer 0.94.3 from Firefly 0.80.10

2015-10-16 Thread Dan van der Ster
Hi, Is there a backtrace in /var/log/ceph/ceph-mon.*.log ? Cheers, Dan On Fri, Oct 16, 2015 at 12:46 PM, Richard Bade wrote: > Hi Everyone, > I upgraded our cluster to Hammer 0.94.3 a couple of days ago and today we've > had one monitor crash twice and another one once. We have 3 monitors total >

Re: [ceph-users] ceph-mon crash after update to Hammer 0.94.3 from Firefly 0.80.10

2015-10-16 Thread Dan van der Ster
a]#012 15: > (DispatchQueue::DispatchThread::entry()+0xd) [0x79e4ad]#012 16: (()+0x7e9a) > [0x7f4ca50d8e9a]#012 17: (clone()+0x6d) [0x7f4ca3dca38d]#012 NOTE: a copy of > the executable, or `objdump -rdS ` is needed to interpret this. > > Regards, > Richard > > On 17 October 2015 a

Re: [ceph-users] CephFS and page cache

2015-10-19 Thread Dan van der Ster
On Mon, Oct 19, 2015 at 12:34 PM, John Spray wrote: > On Mon, Oct 19, 2015 at 8:59 AM, Burkhard Linke > wrote: >> Hi, >> >> On 10/19/2015 05:27 AM, Yan, Zheng wrote: >>> >>> On Sat, Oct 17, 2015 at 1:42 AM, Burkhard Linke >>> wrote: Hi, I've noticed that CephFS (both ceph-fus

Re: [ceph-users] PG won't stay clean

2015-10-26 Thread Dan van der Ster
On Mon, Oct 26, 2015 at 4:38 AM, Robert LeBlanc wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA256 > > I set debug_osd = 20/20 and restarted the primary osd. The logs are at > http://162.144.87.113/files/ceph-osd.110.log.xz . > > The PG in question is 9.e3 and it is one of 15 that have thi

Re: [ceph-users] Understanding the number of TCP connections between clients and OSDs

2015-10-27 Thread Dan van der Ster
On Mon, Oct 26, 2015 at 10:48 PM, Jan Schermer wrote: > If we're talking about RBD clients (qemu) then the number also grows with > number of volumes attached to the client. I never thought about that but it might explain a problem we have where multiple attached volumes crashes an HV. I had assu

[ceph-users] Package ceph-debuginfo-0.94.5-0.el7.centos.x86_64.rpm is not signed

2015-10-28 Thread Dan van der Ster
Hi, During a repo sync, I got: Package ceph-debuginfo-0.94.5-0.el7.centos.x86_64.rpm is not signed Indeed: # rpm -K http://download.ceph.com/rpm-hammer/el7/x86_64/ceph-debuginfo-0.94.5-0.el7.centos.x86_64.rpm http://download.ceph.com/rpm-hammer/el7/x86_64/ceph-debuginfo-0.94.5-0.el7.centos.x

Re: [ceph-users] v9.2.0 Infernalis released

2015-11-08 Thread Dan van der Ster
On Mon, Nov 9, 2015 at 6:39 AM, Francois Lafont wrote: > 0: 10.0.2.101:6789/0 mon.1 > 1: 10.0.2.102:6789/0 mon.2 > 2: 10.0.2.103:6789/0 mon.3 Mon rank vs. Mon id is super confusing, especially if you use a number for the mon id. In your case: 0 -> mon.0 (which has id mon.1) 1 -> mon.1 (which has

Re: [ceph-users] Flapping OSDs, Large meta directories in OSDs

2015-11-30 Thread Dan van der Ster
The trick with debugging heartbeat errors is to grep back through the log to find the last thing the affected thread was doing, e.g. is 0x7f5affe72700 stuck in messaging, writing to the disk, reading through the omap, etc.. I agree this doesn't look to be network related, but if you want to rule i

Re: [ceph-users] Number of OSD map versions

2015-11-30 Thread Dan van der Ster
I wouldn't run with those settings in production. That was a test to squeeze too many OSDs into too little RAM. Check the values from infernalis/master. Those should be safe. -- Dan On 30 Nov 2015 21:45, "George Mihaiescu" wrote: > Hi, > > I've read the recommendation from CERN about the number

Re: [ceph-users] Flapping OSDs, Large meta directories in OSDs

2015-12-01 Thread Dan van der Ster
of them, and then it wakes up and realizes it > can delete a bunch of them... If all OSDs are up and all PGs are active+clean I don't know what would cause an OSD to need old maps. The debug logs should help. Another tool to use is perf top. With that you can see if the OSD is busy in some l

[ceph-users] radosgw in 0.94.5 leaking memory?

2015-12-02 Thread Dan van der Ster
Hi, We've had increased user activity on our radosgw boxes the past two days and are finding that the radosgw is growing quickly in used memory. Most of our gateways are VMs with 4GB of memory and these are getting OOM-killed after ~30 mins of high user load. We added a few physical gateways with

Re: [ceph-users] Removing OSD - double rebalance?

2015-12-02 Thread Dan van der Ster
Here's something that I didn't see mentioned in this thread yet: the set of PGs mapped to an OSD is a function of the ID of that OSD. So, if you replace a drive but don't reuse the same OSD ID for the replacement, you'll have more PG movement than if you kept the ID. -- dan On Wed, Dec 2, 2015 at

Re: [ceph-users] radosgw in 0.94.5 leaking memory?

2015-12-02 Thread Dan van der Ster
On Wed, Dec 2, 2015 at 11:09 AM, Dan van der Ster wrote: > Hi, > > We've had increased user activity on our radosgw boxes the past two > days and are finding that the radosgw is growing quickly in used > memory. Most of our gateways are VMs with 4GB of memory and these are

Re: [ceph-users] Incomplete PGs, how do I get them back without data loss?

2016-05-11 Thread Dan van der Ster
Hi George, Which version of Ceph is this? I've never had incompete pgs stuck like this before. AFAIK it means that osd.52 would need to be brought up before you can restore those PGs. Perhaps you'll need ceph-objectstore-tool to help dump osd.52 and bring up its data elsewhere. A quick check on t
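A sketch of what dumping a PG from the dead OSD and importing it elsewhere could look like (osd ids, pgid and paths are placeholders; both OSDs must be stopped, and keep the export file as a backup):

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-52 --journal-path /var/lib/ceph/osd/ceph-52/journal --op export --pgid 9.1e --file /tmp/pg9.1e.export
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --journal-path /var/lib/ceph/osd/ceph-7/journal --op import --file /tmp/pg9.1e.export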

Re: [ceph-users] Incomplete PGs, how do I get them back without data loss?

2016-05-12 Thread Dan van der Ster
d about 1GB (~27%) less data, everything has just been really inconsistent. > > Here's hoping Cunningham will come to the rescue. > > Cheers, > > George > > > From: Dan van der Ster [d...@vanderster.com] > Sent: 11 May 2016 17
