Re: [ceph-users] Consumer-grade SSD in Ceph

2020-01-03 Thread Reed Dier
Also, just for more diversity, Samsung has the 883 DCT and the 860 DCT models as well. Both less than 1 DWPD, but they are enterprise rated. Reed > On Jan 3, 2020, at 2:10 AM, Eneko Lacunza wrote: > > I'm sure you know also the following, but just in case: > - Intel SATA D3-S4610 (I think they

Re: [ceph-users] Local Device Health PG inconsistent

2019-10-02 Thread Reed Dier
ow my cluster is happy once more. So, in case anyone else runs into this issue, and doesn't think to run pg repair on the pg in question, in this case, go for it. Reed > On Sep 23, 2019, at 9:07 AM, Reed Dier wrote: > > And to come full circle, > > After this whole saga, I
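
The fix described above is the standard inconsistency workflow; a minimal sketch, using the PG id from this thread (30.0) — note the list-inconsistent step may come back empty here, which was the confusing part:

  $ ceph health detail                                      # identify the inconsistent PG
  $ rados list-inconsistent-obj 30.0 --format=json-pretty   # per-object scrub errors (empty in this case)
  $ ceph pg repair 30.0                                     # ask the primary to repair the PG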

Re: [ceph-users] Local Device Health PG inconsistent

2019-09-23 Thread Reed Dier
1 errors Nothing fancy set for the plugin: > $ ceph config dump | grep device > global basicdevice_failure_prediction_mode local > mgr advanced mgr/devicehealth/enable_monitoring true Reed > On Sep 18, 2019, at 11:33 AM, Reed Dier wrote: > > And to provide

Re: [ceph-users] Local Device Health PG inconsistent

2019-09-18 Thread Reed Dier
11d6862d55be) > nautilus (stable)": 1 > }, > "overall": { > "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) > nautilus (stable)": 206, > "ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) > nautilu

Re: [ceph-users] Local Device Health PG inconsistent

2019-09-18 Thread Reed Dier
> ceph_abort_msg("unexpected error") Of the 5 OSD's now down, 3 of them are the serving OSD's for pg 30.0 (that has now been erased), > OSD_DOWN 5 osds down > osd.5 is down > osd.12 is down > osd.128 is down > osd.183 is down

[ceph-users] Local Device Health PG inconsistent

2019-09-12 Thread Reed Dier
Trying to narrow down a strange issue where the single PG for the device_health_metrics pool (created when I enabled the 'diskprediction_local' module in ceph-mgr) keeps being flagged inconsistent. But I never see any inconsistent objects in the PG. > $ ceph health detail > OSD_SCRUB_ERRORS 1 scrub errors > PG_DAMAGED Po

Re: [ceph-users] iostat and dashboard freezing

2019-09-12 Thread Reed Dier
more optimizations, and also not running, due to some OSDs being marked as nearfull, again, because of poor distribution. Since running with balancer turned off, I have had very few issues with my MGRs. Reed > On Sep 9, 2019, at 11:19 PM, Konstantin Shalygin wrote: > > On 8/29/19 9:
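
Turning the balancer off, as described, is a single mgr command; a quick sketch:

  $ ceph balancer status   # shows mode, whether it is active, and any pending plans
  $ ceph balancer off      # stop automatic optimization until distribution issues are sorted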

Re: [ceph-users] iostat and dashboard freezing

2019-08-29 Thread Reed Dier
See responses below. > On Aug 28, 2019, at 11:13 PM, Konstantin Shalygin wrote: >> Just a follow up 24h later, and the mgr's seem to be far more stable, and >> have had no issues or weirdness after disabling the balancer module. >> >> Which isn't great, because the balancer plays an important r

Re: [ceph-users] iostat and dashboard freezing

2019-08-28 Thread Reed Dier
ugh' I'm taking the stability. Just wanted to follow up with another 2¢. Reed > On Aug 27, 2019, at 11:53 AM, Reed Dier wrote: > > Just to further piggyback, > > Probably the most "hard" the mgr seems to get pushed is when the balancer is > engaged. >

Re: [ceph-users] iostat and dashboard freezing

2019-08-27 Thread Reed Dier
NVMe OSD > for a replicated cephfs metadata pool. > > let me know if the balancer is your problem too... > > best, > > Jake > > On 8/27/19 3:57 PM, Jake Grimmett wrote: >> Yes, the problem still occurs with the dashboard disabled... >> >> Possibly rele

Re: [ceph-users] iostat and dashboard freezing

2019-08-27 Thread Reed Dier
I'm currently seeing this with the dashboard disabled. My instability decreases, but isn't wholly cured, by disabling prometheus and rbd_support, which I use in tandem, as the only thing I'm using the prom-exporter for is the per-rbd metrics. > ceph mgr module ls > { > "enabled_modules": [

Re: [ceph-users] iostat and dashboard freezing

2019-08-27 Thread Reed Dier
Curious what dist you're running on, as I've been having similar issues with instability in the mgr as well, curious if any similar threads to pull at. While the iostat command is running, is the active mgr using 100% CPU in top? Reed > On Aug 27, 2019, at 6:41 AM, Jake Grimmett wrote: > > De

Re: [ceph-users] ceph status: pg backfill_toofull, but all OSDs have enough space

2019-08-21 Thread Reed Dier
Just chiming in to say that I too had some issues with backfill_toofull PGs, despite no OSD's being in a backfill_full state, albeit, there were some nearfull OSDs. I was able to get through it by reweighting down the OSD that was the target reported by ceph pg dump | grep 'backfill_toofull'.
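
A sketch of that workaround — find which OSD a backfill_toofull PG is waiting on, then nudge its reweight down (the osd id and weight below are placeholders):

  $ ceph pg dump pgs_brief | grep backfill_toofull   # stuck PGs and their up/acting OSD sets
  $ ceph osd reweight 123 0.95                       # small steps; re-check after each change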

[ceph-users] compat weight reset

2019-08-02 Thread Reed Dier
Hi all, I am trying to find a simple way that might help me better distribute my data, as I wrap up my Nautilus upgrades. Currently rebuilding some OSD's with bigger block.db to prevent BlueFS spillover where it isn't difficult to do so, and I'm once again struggling with unbalanced distributi
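
If the end goal is to wipe the compat weight-set the balancer built and let it start over, the commands exist in Nautilus; a sketch:

  $ ceph osd crush weight-set ls            # lists any existing weight-sets, including the compat one
  $ ceph osd crush weight-set rm-compat     # drop it; OSDs fall back to their plain CRUSH weights
  $ ceph osd crush weight-set create-compat # recreate an empty one for the balancer to populate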

Re: [ceph-users] How to add 100 new OSDs...

2019-07-24 Thread Reed Dier
Just chiming in to say that this too has been my preferred method for adding [large numbers of] OSDs. Set the norebalance nobackfill flags. Create all the OSDs, and verify everything looks good. Make sure my max_backfills, recovery_max_active are as expected. Make sure everything has peered. Unse
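
Condensed into commands, the workflow described reads roughly like this (the injected values are illustrative, not recommendations):

  $ ceph osd set norebalance
  $ ceph osd set nobackfill
  # ... create all the new OSDs, wait for everything to peer, sanity-check ceph -s ...
  $ ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 3'
  $ ceph osd unset nobackfill
  $ ceph osd unset norebalance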

Re: [ceph-users] Need to replace OSD. How do I find physical disk

2019-07-18 Thread Reed Dier
You can use ceph-volume to get the LV ID > # ceph-volume lvm list > > == osd.24 == > > [block] > /dev/ceph-edeb727e-c6d3-4347-bfbb-b9ce7f60514b/osd-block-1da5910e-136a-48a7-8cf1-1c265b7b612a > > type block > osd id24 > osd
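
To get from the LV that ceph-volume reports down to the physical device, plain LVM tooling is enough; a sketch (the LV name is taken from the quoted output, sdX is a placeholder):

  $ lvs -o lv_name,vg_name,devices | grep osd-block-1da5910e   # shows the backing PV, e.g. /dev/sdX(0)
  $ ls -l /dev/disk/by-id/ | grep sdX                          # map sdX to a model/serial for physical identification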

Re: [ceph-users] Ubuntu 18.04 - Mimic - Nautilus

2019-07-10 Thread Reed Dier
6.04 and 14.2.1. ? > -Ed > >> On Jul 10, 2019, at 1:46 PM, Reed Dier > <mailto:reed.d...@focusvq.com>> wrote: >> >> It does not appear that that page has been updated in a while. >> >> The official Ceph deb repos only include Mimic and Nautilus pac

Re: [ceph-users] Ubuntu 18.04 - Mimic - Nautilus

2019-07-10 Thread Reed Dier
It does not appear that that page has been updated in a while. The official Ceph deb repos only include Mimic and Nautilus packages for 18.04, While the Ubuntu-bionic repos include a Luminous build. Hope that helps. Reed > On Jul 10, 2019, at 1:20 PM, Edward Kalk wrote: > > When reviewing: ht
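
For reference, the repo line in question looks like this (Nautilus shown; substitute debian-mimic for a Mimic install):

  # /etc/apt/sources.list.d/ceph.list
  deb https://download.ceph.com/debian-nautilus/ bionic main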

[ceph-users] Faux-Jewel Client Features

2019-07-02 Thread Reed Dier
Hi all, Starting to make preparations for Nautilus upgrades from Mimic, and I'm looking over my client/session features and trying to fully grasp the situation. > $ ceph versions > { > "mon": { > "ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic > (stable)": 3
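
Beyond ceph versions (daemons only), ceph features is the command that groups connected clients by feature bits and the release they decode to, which is what the faux-jewel question hinges on; a sketch:

  $ ceph features                              # mon/osd/client groups with "release" and raw feature bits
  $ ceph daemon mon.$(hostname -s) sessions    # on a mon host: per-session client addresses and features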

Re: [ceph-users] obj_size_info_mismatch error handling

2019-06-06 Thread Reed Dier
> > PG Repair doesn't fix the inconsistency, nor does Brad's omap > workaround earlier in the thread. > In our case, we can fix by cp'ing the file to a new inode, deleting > the inconsistent file, then scrubbing the PG. > > -- Dan > > > On Fri, May 3,

Re: [ceph-users] performance in a small cluster

2019-05-31 Thread Reed Dier
Is there any other evidence of this? I have 20 5100 MAX (MTFDDAK1T9TCC) and have not experienced any real issues with them. I would pick my Samsung SM863a's or any of my Intel's over the Micron's, but I haven't seen the Micron's cause any issues for me. For what its worth, they are all FW D0MU02

Re: [ceph-users] obj_size_info_mismatch error handling

2019-05-03 Thread Reed Dier
is *correct* you could try just doing a rados > get followed by a rados put of the object to see if the size is > updated correctly. > > It's more likely the object info size is wrong IMHO. > >> >> On Tue, Apr 30, 2019 at 1:06 AM Reed Dier wrote: >>>
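
The get/put suggestion from the quoted reply, spelled out; pool, object, and PG names are placeholders:

  $ rados -p <pool> get <object> /tmp/obj   # only if the on-disk data, not the object_info, is the correct size
  $ rados -p <pool> put <object> /tmp/obj   # rewriting refreshes the recorded object size
  $ ceph pg deep-scrub <pgid>               # confirm the size mismatch is gone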

[ceph-users] obj_size_info_mismatch error handling

2019-04-29 Thread Reed Dier
Hi list, Woke up this morning to two PG's reporting scrub errors, in a way that I haven't seen before. > $ ceph versions > { > "mon": { > "ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic > (stable)": 3 > }, > "mgr": { > "ceph version 13.2.5 (cbff8

Re: [ceph-users] SSD Recovery Settings

2019-03-20 Thread Reed Dier
Grafana is the web frontend for creating the graphs. InfluxDB holds the time series data that Grafana pulls from. To collect data, I am using collectd daemons run

Re: [ceph-users] SSD Recovery Settings

2019-03-20 Thread Reed Dier
Not sure what your OSD config looks like, When I was moving from Filestore to Bluestore on my SSD OSD's (and NVMe FS journal to NVMe Bluestore block.db), I had an issue where the OSD was incorrectly being reported as rotational in some part of the chain. Once I overcame that, I had a huge boost

Re: [ceph-users] collectd problems with pools

2019-02-28 Thread Reed Dier
I've been collecting with collectd since Jewel, and experienced the growing pains when moving to Luminous and collectd-ceph needing to be reworked to support Luminous. It is also worth mentioning that in Luminous+ there is an Influx plugin for ceph-mgr that has some per pool statistics. Reed

Re: [ceph-users] Bionic Upgrade 12.2.10

2019-01-14 Thread Reed Dier
This is because Luminous is not being built for Bionic for whatever reason. There are some other mailing list entries detailing this. Right now you have ceph installed from the Ubuntu bionic-updates repo, which has 12.2.8, but does not get regular release updates. This is what I ended up having

Re: [ceph-users] Mimic 13.2.3?

2019-01-10 Thread Reed Dier
> Could I suggest building Luminous for Bionic +1 for Luminous on Bionic. Ran into issues with bionic upgrades, and had to eventually revert from the ceph repos to the Ubuntu repos where they have 12.2.8, which isn’t ideal. Reed > On Jan 9, 2019, at 10:27 AM, Matthew Vernon wrote: > > Hi, >

Re: [ceph-users] Mimic 13.2.3?

2019-01-04 Thread Reed Dier
Piggy backing for a +1 on this. Really would love if bad packages would be recalled, and also if packages would follow release announcement, rather than precede it. For anyone wondering, this is the likely changelog for 13.2.3 in case people want to know what is in it. https://github.com/ceph/c

Re: [ceph-users] Should ceph build against libcurl4 for Ubuntu 18.04 and later?

2018-12-13 Thread Reed Dier
Figured I would chime in as also having this issue. Moving from 16.04 to 18.04 on some OSD nodes. I have been using the ceph apt repo > deb https://download.ceph.com/debian-luminous/ xenial main During the release-upgrade, it can’t find a candidate package, and actually removes the ceph-osd pack

Re: [ceph-users] Favorite SSD

2018-09-17 Thread Reed Dier
SM863a were always good to me. Micron 5100 MAX are fine, but felt less consistent than the Samsung’s. Haven’t had any issues with Intel S4600. Intel S3710’s obviously not available anymore, but those were a crowd favorite. Micron 5200 line seems to not have a high endurance SKU like the 5100 line

Re: [ceph-users] cephfs kernel client hangs

2018-08-07 Thread Reed Dier
This is the first I am hearing about this as well. Granted, I am using ceph-fuse rather than the kernel client at this point, but that isn’t etched in stone. Curious if there is more to share. Reed > On Aug 7, 2018, at 9:47 AM, Webert de Souza Lima > wrote: > > > Yan, Zheng mailto:uker...@

Re: [ceph-users] Best way to replace OSD

2018-08-06 Thread Reed Dier
ndurance, a rebalance or two is peanuts > compared to your normal I/O. If you're not, then there's more than > enough write endurance in an SSD to handle daily rebalances for years. > > On 06/08/18 17:05, Reed Dier wrote: >> This has been my modus operandi when replacing dr

Re: [ceph-users] Best way to replace OSD

2018-08-06 Thread Reed Dier
This has been my modus operandi when replacing drives. Only having ~50 OSD’s for each drive type/pool, rebalancing can be a lengthy process, and in the case of SSD’s, shuffling data adds unnecessary write wear to the disks. When migrating from filestore to bluestore, I would actually forklift a
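
A sketch of the keep-the-same-ID replacement that avoids reshuffling the whole pool (Luminous+ syntax; device path and id are placeholders, and --osd-id was buggy on very early 12.2 releases, as discussed elsewhere in this archive):

  $ ceph osd destroy 24 --yes-i-really-mean-it   # keeps the id and CRUSH entry, marks the OSD destroyed
  $ ceph-volume lvm zap /dev/sdX --destroy       # wipe the old device if it is being reused
  $ ceph-volume lvm create --bluestore --data /dev/sdX --osd-id 24   # recreate in place; only this OSD backfills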

Re: [ceph-users] Ceph Balancer per Pool/Crush Unit

2018-08-03 Thread Reed Dier
ght-set reweight-compat 28 1.964446 > ceph osd crush weight-set reweight-compat 29 1.629001 > ceph osd crush weight-set reweight-compat 30 1.961968 > ceph osd crush weight-set reweight-compat 31 1.738253 > ceph osd crush weight-set reweight-compat 32 1.884098 > ceph osd crush weight-

[ceph-users] Ceph Balancer per Pool/Crush Unit

2018-08-01 Thread Reed Dier
Hi Cephers, I’m starting to play with the Ceph Balancer plugin after moving to straw2 and running into something I’m surprised I haven’t seen posted here. My cluster has two crush roots, one for HDD, one for SSD. Right now, HDD’s are a single pool to themselves, SSD’s are a single pool to them

Re: [ceph-users] separate monitoring node

2018-06-22 Thread Reed Dier
> On Jun 22, 2018, at 2:14 AM, Stefan Kooman wrote: > > Just checking here: Are you using the telegraf ceph plugin on the nodes? > In that case you _are_ duplicating data. But the good news is that you > don't need to. There is a Ceph mgr telegraf plugin now (mimic) which > also works on luminou

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-06-04 Thread Reed Dier
Appreciate the input. Wasn’t sure if ceph-volume was the one setting these bits of metadata or something else. Appreciate the help guys. Thanks, Reed > The fix is in core Ceph (the OSD/BlueStore code), not ceph-volume. :) > journal_rotational is still a thing in BlueStore; it represents the

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-06-04 Thread Reed Dier
r with 10TB SATA HDD's which each have a 100GB SSD based > block.db > > Looking at ceph osd metadata for each of those: > > "bluefs_db_model": "SAMSUNG MZ7KM960", > "bluefs_db_rotational": "0", > "b

Re: [ceph-users] Luminous cluster - how to find out which clients are still jewel?

2018-05-29 Thread Reed Dier
Possibly helpful, If you are able to hit your ceph-mgr dashboard in a web browser, I find it possible to see a table of currently connected cephfs clients, hostnames, state, type (userspace/kernel), and ceph version. Assuming that the link is persistent, for me the url is ceph-mgr:7000/clients/

Re: [ceph-users] Show and Tell: Grafana cluster dashboard

2018-05-07 Thread Reed Dier
I’ll +1 on InfluxDB rather than Prometheus, though I think having a version for each infrastructure path would be best. I’m sure plenty here have existing InfluxDB infrastructure as their TSDB of choice, and moving to Prometheus would be less advantageous. Conversely, I’m sure all of the Prometh

[ceph-users] ceph-mgr balancer getting started

2018-04-12 Thread Reed Dier
Hi ceph-users, I am trying to figure out how to go about making ceph balancer do its magic, as I have some pretty unbalanced distribution across osd’s currently, both SSD and HDD. Cluster is 12.2.4 on Ubuntu 16.04. All OSD’s have been migrated to bluestore. Specifically, my HDD’s are the main
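
The basic bring-up on 12.2.x is only a handful of commands; a sketch using crush-compat mode (upmap requires all clients to be Luminous-aware):

  $ ceph mgr module enable balancer
  $ ceph balancer mode crush-compat
  $ ceph balancer eval                  # current distribution score; lower is better
  $ ceph balancer optimize myplan
  $ ceph balancer show myplan           # review the proposed weight-set changes
  $ ceph balancer execute myplan        # or `ceph balancer on` for continuous optimization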

Re: [ceph-users] Disk write cache - safe?

2018-03-14 Thread Reed Dier
Tim, I can corroborate David’s sentiments as it pertains to being a disaster. In the early days of my Ceph cluster, I had 8TB SAS drives behind an LSI RAID controller as RAID0 volumes (no IT mode), with on-drive write-caching enabled (pdcache=default). I subsequently had my data center whe
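
For checking or disabling the on-drive cache outside of a RAID controller (pdcache itself is set through the controller's own CLI), the usual tools are hdparm for SATA and sdparm for SAS; device names below are placeholders:

  $ hdparm -W /dev/sdX          # query the volatile write cache state
  $ hdparm -W0 /dev/sdX         # disable it
  $ sdparm --get WCE /dev/sdX   # SAS equivalent; sdparm --clear WCE /dev/sdX to disable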

Re: [ceph-users] ceph-mds suicide on upgrade

2018-03-12 Thread Reed Dier
/pipermail/ceph-users-ceph.com/2018-February/025092.html > <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-February/025092.html> > > Might be of interest. > > Dietmar > > Am 12. März 2018 18:19:51 MEZ schrieb Reed Dier : > Figured I would see if anyone has

[ceph-users] ceph-mds suicide on upgrade

2018-03-12 Thread Reed Dier
Figured I would see if anyone has seen this or can see something I am doing wrong. Upgrading all of my daemons from 12.2.2. to 12.2.4. Followed the documentation, upgraded mons, mgrs, osds, then mds’s in that order. All was fine, until the MDSs. I have two MDS’s in Active:Standby config. I dec

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-26 Thread Reed Dier
at I was able to use to figure out the issue. Adding to ceph.conf for future OSD conversions. Thanks, Reed > On Feb 26, 2018, at 4:12 PM, Reed Dier wrote: > > For the record, I am not seeing a demonstrative fix by injecting the value of > 0 into the OSDs running. >> osd_recove
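
The runtime knobs being discussed in this thread, for reference — the hybrid variant is the one that applies when an SSD OSD is mis-detected as having a rotational journal; values are illustrative:

  $ ceph osd metadata 59 | grep rotational    # check journal_rotational/rotational per OSD (id is a placeholder)
  $ ceph tell osd.* injectargs '--osd_recovery_sleep_hybrid 0 --osd_recovery_sleep_hdd 0'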

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-26 Thread Reed Dier
that should affect this issue. I could also attempt to change osd_recovery_sleep_hdd as well, since these are ssd osd’s, it shouldn’t make a difference, but its a free move. Thanks, Reed > On Feb 26, 2018, at 3:42 PM, Gregory Farnum wrote: > > On Mon, Feb 26, 2018 at 12:26 PM Reed Di

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-26 Thread Reed Dier
_front_addr": "", > "hostname": “host00", > "journal_rotational": "1", > "kernel_description": "#29~16.04.2-Ubuntu SMP Tue Jan 9 22:00:44 UTC > 2018", > "kernel_version"

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-26 Thread Reed Dier
"journal_rotational": "1", > "rotational": “0" > "id": 59, > "journal_rotational": "0", > "rotational": “0" I wonder if it matters/is correct to see "journal_rotational": “1

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-26 Thread Reed Dier
ice class is configured correctly as far as I know, it all shows as ssd/hdd correctly in ceph osd tree. So hopefully this may be enough of a smoking gun to help narrow down where this may be stemming from. Thanks, Reed > On Feb 23, 2018, at 10:04 AM, David Turner wrote: > > Here

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-23 Thread Reed Dier
/s wr Don’t mean to clutter the ML/thread, however it did seem odd, maybe its a culprit? Maybe its some weird sampling interval issue thats been solved in 12.2.3? Thanks, Reed > On Feb 23, 2018, at 8:26 AM, Reed Dier wrote: > > Below is ceph -s > >> cluster: >

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-23 Thread Reed Dier
u're seeing. (eg, it could be that reading out the omaps is > expensive, so you can get higher recovery op numbers by turning down the > number of entries per request, but not actually see faster backfilling > because you have to issue more requests.) > -Greg > > On Wed,

[ceph-users] SSD Bluestore Backfills Slow

2018-02-21 Thread Reed Dier
Hi all, I am running into an odd situation that I cannot easily explain. I am currently in the midst of destroy and rebuild of OSDs from filestore to bluestore. With my HDDs, I am seeing expected behavior, but with my SSDs I am seeing unexpected behavior. The HDDs and SSDs are set in crush accor

Re: [ceph-users] Is there a "set pool readonly" command?

2018-02-12 Thread Reed Dier
I do know that there is a pause flag in Ceph. What I do not know is if that also pauses recovery traffic, in addition to client traffic. Also worth mentioning, this is a cluster-wide flag, not a pool level flag. Reed > On Feb 11, 2018, at 11:45 AM, David Turner wrote: > > If you set min_size
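
For reference, pause is set like any other cluster-wide flag, and the flags that specifically target recovery/backfill traffic are separate; a sketch:

  $ ceph osd set pause        # sets pauserd+pausewr: blocks client reads and writes cluster-wide
  $ ceph osd set norecover    # these two, rather than pause, are the flags that halt recovery
  $ ceph osd set nobackfill
  $ ceph osd unset pause      # unset each flag when finished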

Re: [ceph-users] Migrating filestore to bluestore using ceph-volume

2018-01-26 Thread Reed Dier
0 0 00 0 osd.6 > 7 0 0 0 0 0 00 0 osd.7 > > I guess I can just remove them from crush,auth and rm them? > > Kind Regards, > > David Majchrzak > >> 26 jan. 2018 kl. 18:09 skrev Reed Dier > <mail

Re: [ceph-users] Migrating filestore to bluestore using ceph-volume

2018-01-26 Thread Reed Dier
This is the exact issue that I ran into when starting my bluestore conversion journey. See my thread here: https://www.spinics.net/lists/ceph-users/msg41802.html Specifying --osd-id causes it to fail. Below are my steps for OSD replace/m

Re: [ceph-users] filestore to bluestore: osdmap epoch problem and is the documentation correct?

2018-01-11 Thread Reed Dier
Jan 11, 2018 12:22 PM, "Reed Dier" <mailto:reed.d...@focusvq.com>> wrote: > I am in the process of migrating my OSDs to bluestore finally and thought I > would give you some input on how I am approaching it. > Some of saga you can find in another ML thread here: >

Re: [ceph-users] filestore to bluestore: osdmap epoch problem and is the documentation correct?

2018-01-11 Thread Reed Dier
I am in the process of migrating my OSDs to bluestore finally and thought I would give you some input on how I am approaching it. Some of saga you can find in another ML thread here: https://www.spinics.net/lists/ceph-users/msg41802.html

Re: [ceph-users] Ceph MGR Influx plugin 12.2.2

2018-01-11 Thread Reed Dier
he loops that create data points and apply it to every point > created in loops through stats. Of course we'll feed that back > upstream when we get to it and assuming it is still an issue in the > current code. > > thanks, > Ben > > On Thu, Jan 11, 2018 at 2:04 AM,

[ceph-users] Ceph MGR Influx plugin 12.2.2

2018-01-10 Thread Reed Dier
Hi all, Does anyone have any idea if the influx plugin for ceph-mgr is stable in 12.2.2? Would love to ditch collectd and report directly from ceph if that is the case. Documentation says that it is added in Mimic/13.x, however it looks like from an earlier ML post that it would be coming to Lu

Re: [ceph-users] OSD Bluestore Migration Issues

2018-01-09 Thread Reed Dier
ore is the default and filestore requires intervention. Thanks, Reed > On Jan 9, 2018, at 2:10 PM, Reed Dier wrote: > >> -221.81000 host node24 >> 0 hdd 7.26999 osd.0 destroyed0 >> 1.0

Re: [ceph-users] OSD Bluestore Migration Issues

2018-01-09 Thread Reed Dier
ng removed from the crush map. Thanks, Reed > On Jan 9, 2018, at 2:05 PM, Alfredo Deza wrote: > > On Tue, Jan 9, 2018 at 2:19 PM, Reed Dier <mailto:reed.d...@focusvq.com>> wrote: >> Hi ceph-users, >> >> Hoping that this is something small that I am overlo

[ceph-users] OSD Bluestore Migration Issues

2018-01-09 Thread Reed Dier
Hi ceph-users, Hoping that this is something small that I am overlooking, but could use the group mind to help. Ceph 12.2.2, Ubuntu 16.04 environment. OSD (0) is an 8TB spinner (/dev/sda) and I am moving from a filestore journal to a blocks.db and WAL device on an NVMe partition (/dev/nvme0n1p5
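
For reference, the bluestore creation step being attempted, using the devices named above (12.2.x ceph-volume syntax):

  $ ceph-volume lvm zap /dev/sda                # wipe the old filestore OSD first
  $ ceph-volume lvm create --bluestore --data /dev/sda --block.db /dev/nvme0n1p5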

Re: [ceph-users] ceph-volume lvm deactivate/destroy/zap

2018-01-09 Thread Reed Dier
I would just like to mirror what Dan van der Ster’s sentiments are. As someone attempting to move an OSD to bluestore, with limited/no LVM experience, it is a completely different beast and complexity level compared to the ceph-disk/filestore days. ceph-deploy was a very simple tool that did ex

Re: [ceph-users] CephFS log jam prevention

2017-12-07 Thread Reed Dier
Reed > On Dec 5, 2017, at 4:02 PM, Patrick Donnelly wrote: > > On Tue, Dec 5, 2017 at 8:07 AM, Reed Dier wrote: >> Been trying to do a fairly large rsync onto a 3x replicated, filestore HDD >> backed CephFS pool. >> >> Luminous 12.2.1 for all daemons, kernel

[ceph-users] CephFS log jam prevention

2017-12-05 Thread Reed Dier
Been trying to do a fairly large rsync onto a 3x replicated, filestore HDD backed CephFS pool. Luminous 12.2.1 for all daemons, kernel CephFS driver, Ubuntu 16.04 running mix of 4.8 and 4.10 kernels, 2x10GbE networking between all daemons and clients. > $ ceph versions > { > "mon": { >

Re: [ceph-users] CephFS metadata pool to SSDs

2017-10-13 Thread Reed Dier
e showing 23 P/E cycles so far. Thanks again, Reed > On Oct 12, 2017, at 4:18 PM, John Spray wrote: > > On Thu, Oct 12, 2017 at 9:34 PM, Reed Dier wrote: >> I found an older ML entry from 2015 and not much else, mostly detailing the >> doing performance testing to dispel

[ceph-users] CephFS metadata pool to SSDs

2017-10-12 Thread Reed Dier
I found an older ML entry from 2015 and not much else, mostly detailing the doing performance testing to dispel poor performance numbers presented by OP. Currently have the metadata pool on my slow 24 HDDs, and am curious if I should see any increased performance with CephFS by moving the metada
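
On Luminous with device classes, steering the metadata pool onto the SSDs is two commands; a sketch — the rule and pool names here are assumptions:

  $ ceph osd crush rule create-replicated ssd-rule default host ssd
  $ ceph osd pool set cephfs_metadata crush_rule ssd-rule   # data migrates to SSD OSDs as PGs remap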

Re: [ceph-users] min_size & hybrid OSD latency

2017-10-11 Thread Reed Dier
Just for the sake of putting this in the public forum, In theory, by placing the primary copy of the object on an SSD medium, and placing replica copies on HDD medium, it should still yield some improvement in writes, compared to an all HDD scenario. My logic here is rooted in the idea that the

Re: [ceph-users] Ceph monitoring

2017-10-02 Thread Reed Dier
As someone currently running collectd/influxdb/grafana stack for monitoring, I am curious if anyone has seen issues moving Jewel -> Luminous. I thought I remembered reading that collectd wasn’t working perfectly in Luminous, likely not helped with the MGR daemon. Also thought about trying teleg

Re: [ceph-users] Watch for fstrim running on your Ubuntu systems

2017-07-06 Thread Reed Dier
00% blocked for about 5min for 16GB > trimmed), and works just fine with firmware M017 (4s for 32GB trimmed). So > maybe you just need an update. > > Peter > > > > On 07/06/17 18:39, Reed Dier wrote: >> Hi Wido, >> >> I came across this ancient

Re: [ceph-users] Watch for fstrim running on your Ubuntu systems

2017-07-06 Thread Reed Dier
Hi Wido, I came across this ancient ML entry with no responses and wanted to follow up with you to see if you recalled any solution to this. Copying the ceph-users list to preserve any replies that may result for archival. I have a couple of boxes with 10x Micron 5100 SATA SSD’s, journaled on M

Re: [ceph-users] Ideas on the UI/UX improvement of ceph-mgr: Cluster Status Dashboard

2017-06-29 Thread Reed Dier
I’d like to see per pool iops/usage, et al. Being able to see rados vs rbd vs whatever else performance, or pools with different backing mediums and see which workloads result in what performance. Most of this I pretty well cobble together with collectd, but it would still be nice to have out o

Re: [ceph-users] Changing SSD Landscape

2017-06-08 Thread Reed Dier
g > interesting/comparable in the Samsung range... > > > On Wed, May 17, 2017 at 5:03 PM, Reed Dier wrote: >> Agreed, the issue I have seen is that the P4800X (Optane) is demonstrably >> more expensive than the P3700 for a roughly equivalent amount of storage >> space (

Re: [ceph-users] OSD scrub during recovery

2017-05-30 Thread Reed Dier
. Either way, make sense, and thanks for the insight. And don’t worry Wido, they aren’t SMR drives! Thanks, Reed > On May 30, 2017, at 11:03 AM, Wido den Hollander wrote: > >> >> Op 30 mei 2017 om 17:37 schreef Reed Dier : >> >> >> Lost an OSD and having

[ceph-users] OSD scrub during recovery

2017-05-30 Thread Reed Dier
Lost an OSD and having to rebuild it. 8TB drive, so it has to backfill a ton of data. Been taking a while, so looked at ceph -s and noticed that deep/scrubs were running even though I’m running newest Jewel (10.2.7) and OSD’s have the osd_scrub_during_recovery set to false. > $ cat /etc/ceph/ce

Re: [ceph-users] Changing SSD Landscape

2017-05-18 Thread Reed Dier
> BTW, you asked about Samsung parts earlier. We are running these > SM863's in a block storage cluster: > > Model Family: Samsung based SSDs > Device Model: SAMSUNG MZ7KM240HAGR-0E005 > Firmware Version: GXM1003Q > > > 177 Wear_Leveling_Count 0x0013 094 094 005Pre-fail >

Re: [ceph-users] Changing SSD Landscape

2017-05-17 Thread Reed Dier
Agreed, the issue I have seen is that the P4800X (Optane) is demonstrably more expensive than the P3700 for a roughly equivalent amount of storage space (400G v 375G). However, the P4800X is perfectly suited to a Ceph environment, with 30 DWPD, or 12.3 PBW. And on top of that, it seems to gener

Re: [ceph-users] Power Failure

2017-05-02 Thread Reed Dier
One scenario I can offer here as it relates to powercut/hard shutdown. I had my data center get struck by lightning very early on in my Ceph lifespan when I was testing and evaluating. I had 8 OSD’s on 8 hosts, and each OSD was a RAID0 (single) vd on my LSI RAID controller. On the RAID controll

Re: [ceph-users] Sharing SSD journals and SSD drive choice

2017-04-26 Thread Reed Dier
Hi Adam, How did you settle on the P3608 vs say the P3600 or P3700 for journals? And also the 1.6T size? Seems overkill, unless its pulling double duty beyond OSD journals. Only improvement over the P3x00 is the move from x4 lanes to x8 lanes on the PCIe bus, but the P3600/P3700 offer much mor

Re: [ceph-users] Adding New OSD Problem

2017-04-25 Thread Reed Dier
Others will likely be able to provide some better responses, but I’ll take a shot to see if anything makes sense. With 10.2.6 you should be able to set 'osd scrub during recovery’ to false to prevent any new scrubs from occurring during a recovery event. Current scrubs will complete, but future
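
That setting can be applied at runtime; a sketch — in-flight scrubs still finish, as noted:

  $ ceph tell osd.* injectargs '--osd_scrub_during_recovery false'
  $ ceph daemon osd.0 config get osd_scrub_during_recovery   # spot-check one OSD (id is a placeholder)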

Re: [ceph-users] SSD Primary Affinity

2017-04-19 Thread Reed Dier
In this case the spinners have their journals on an NVMe drive, 3 OSD : 1 NVMe Journal. Will be trying tomorrow to get some benchmarks and compare some hdd/ssd/hybrid workloads to see performance differences across the three backing layers. Most client traffic is read oriented to begin with, so

Re: [ceph-users] SSD Primary Affinity

2017-04-19 Thread Reed Dier
Hi Maxime, This is a very interesting concept. Instead of the primary affinity being used to choose SSD for primary copy, you set crush rule to first choose an osd in the ‘ssd-root’, then the ‘hdd-root’ for the second set. And with 'step chooseleaf first {num}’ > If {num} > 0 && < pool-num-repl

[ceph-users] SSD Primary Affinity

2017-04-17 Thread Reed Dier
Hi all, I am looking at a way to scale performance and usable space using something like Primary Affinity to effectively use 3x replication across 1 primary SSD OSD, and 2 replicated HDD OSD’s. Assuming production level, we would keep a pretty close 1:2 SSD:HDD ratio, but looking to experiment
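
The mechanics of that experiment: primary affinity has to be allowed by the mons first (on Jewel), then set per OSD; a sketch with placeholder ids:

  $ ceph tell mon.* injectargs '--mon-osd-allow-primary-affinity=true'
  $ ceph osd primary-affinity osd.12 0     # HDD replica: never chosen as primary
  $ ceph osd primary-affinity osd.3 1.0    # SSD copy: preferred primary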

[ceph-users] Strange crush / ceph-deploy issue

2017-03-31 Thread Reed Dier
Trying to add a batch of OSD’s to my cluster, (Jewel 10.2.6, Ubuntu 16.04) 2 new nodes (ceph01,ceph02), 10 OSD’s per node. I am trying to steer the OSD’s into a different root pool with crush location set in ceph.conf with > [osd.34] > crush_location = "host=ceph01 rack=ssd.rack2 root=ssd" > >

Re: [ceph-users] Ceph PG repair

2017-03-08 Thread Reed Dier
/{object.name} > 55a76349b758d68945e5028784c59f24 > /var/lib/ceph/osd/ceph-22/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name} So is the object actually inconsistent? Is rados somehow behind on something, not showing the third inconsistent PG? Appreciate any help. Reed

[ceph-users] Ceph PG repair

2017-03-02 Thread Reed Dier
Over the weekend, two inconsistent PG’s popped up in my cluster. This being after having scrubs disabled for close to 6 weeks after a very long rebalance after adding 33% more OSD’s, an OSD failing, increasing PG’s, etc. It appears we came out the other end with 2 inconsistent PG’s and I’m tryin
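
Rather than comparing on-disk checksums by hand, the scrub machinery can report exactly what it thinks is inconsistent; a sketch using the PG id implied by the quoted path (10.2d8):

  $ rados list-inconsistent-obj 10.2d8 --format=json-pretty   # per-object errors and which replica disagrees
  $ ceph pg deep-scrub 10.2d8                                 # refresh results if the report looks stale
  $ ceph pg repair 10.2d8                                     # only after confirming which copy is authoritative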

[ceph-users] Backfill/recovery prioritization

2017-02-01 Thread Reed Dier
Have a smallish cluster that has been expanding with almost a 50% increase in the number of OSD (16->24). This has caused some issues with data integrity and cluster performance as we have increased PG count, and added OSDs. 8x nodes with 3x drives, connected over 2x10G. My problem is that I h

Re: [ceph-users] OSD create with SSD journal

2017-01-11 Thread Reed Dier
"ceph-users on behalf of Reed Dier" > > wrote: > >>> 2017-01-03 12:10:23.514577 7f1d821f2800 0 ceph version 10.2.5 >>> (c461ee19ecbc0c5c330aca20f7392c9a00730367), process ceph-osd, pid 19754 >>> 2017-01-03 12:10:23.517465 7f1d821f2800 1 >>> fi

[ceph-users] OSD create with SSD journal

2017-01-11 Thread Reed Dier
So I was attempting to add an OSD to my ceph-cluster (running Jewel 10.2.5), using ceph-deploy (1.5.35), on Ubuntu. I have 2 OSD’s on this node, attempting to add third. The first two OSD’s I created with on-disk journals, then later moved them to partitions on the NVMe system disk (Intel P3600

Re: [ceph-users] High load on OSD processes

2016-12-09 Thread Reed Dier
> So, just like Diego, do you know if there is a fix for this yet and when it > might be available on the repo? Should I try to install the prior minor > release version for now? > > Thank you for the information. > > Have a good day, > > Lewis George > >

Re: [ceph-users] High load on OSD processes

2016-12-09 Thread Reed Dier
Assuming you deployed within the last 48 hours, I’m going to bet you are using v10.2.4 which has an issue that causes high cpu utilization. Should see large ramp up in loadav after 15 minutes exactly. See mailing list thread here: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg34390.

Re: [ceph-users] Interpretation Guidance for Slow Requests

2016-12-06 Thread Reed Dier
> On Dec 5, 2016, at 9:42 PM, Christian Balzer wrote: > > > Hello, > > On Tue, 6 Dec 2016 03:37:32 +0100 Christian Theune wrote: > >> Hi Christian (heh), >> >> thanks for picking this up. :) >> >> This has become a rather long post as I added more details and giving >> our history, but if w

Re: [ceph-users] Migrate OSD Journal to SSD

2016-12-02 Thread Reed Dier
> On Dec 1, 2016, at 6:26 PM, Christian Balzer wrote: > > On Thu, 1 Dec 2016 18:06:38 -0600 Reed Dier wrote: > >> Apologies if this has been asked dozens of times before, but most answers >> are from pre-Jewel days, and want to double check that the methodology still

[ceph-users] Migrate OSD Journal to SSD

2016-12-01 Thread Reed Dier
Apologies if this has been asked dozens of times before, but most answers are from pre-Jewel days, and want to double check that the methodology still holds. Currently have 16 OSD’s across 8 machines with on-disk journals, created using ceph-deploy. These machines have NVMe storage (Intel P3600
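
The procedure is still the classic flush/re-point/mkjournal dance on Jewel filestore; a sketch assuming osd.0 and that the new journal partition already exists:

  $ ceph osd set noout
  $ systemctl stop ceph-osd@0
  $ ceph-osd -i 0 --flush-journal                # flush pending entries into the filestore
  $ ln -sf /dev/disk/by-partuuid/<uuid> /var/lib/ceph/osd/ceph-0/journal   # point at the new NVMe partition
  $ ceph-osd -i 0 --mkjournal                    # initialize the new journal
  $ systemctl start ceph-osd@0
  $ ceph osd unset noout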

[ceph-users] CephFS in existing pool namespace

2016-10-27 Thread Reed Dier
Looking to add CephFS into our Ceph cluster (10.2.3), and trying to plan for that addition. Currently only using RADOS on a single replicated, non-EC, pool, no RBD or RGW, and segmenting logically in namespaces. No auth scoping at this time, but likely something we will be moving to in the fut

Re: [ceph-users] New cephfs cluster performance issues- Jewel - cache pressure, capability release, poor iostat await avg queue size

2016-10-21 Thread Reed Dier
> On Oct 19, 2016, at 7:54 PM, Christian Balzer wrote: > > > Hello, > > On Wed, 19 Oct 2016 12:28:28 + Jim Kilborn wrote: > >> I have setup a new linux cluster to allow migration from our old SAN based >> cluster to a new cluster with ceph. >> All systems running centos 7.2 with the 3.10

Re: [ceph-users] OSD won't come back "UP"

2016-10-07 Thread Reed Dier
ull 8TB disk. > Filesystem  1K-blocks   Used        Available   Use%  Mounted on > /dev/sda1   7806165996  1953556296  5852609700  26%   /var/lib/ceph/osd/ceph-0 Reed > On Oct 7, 2016, at 7:33 PM, Reed Dier wrote: > > Attempting to adjust parameters of some of my r

[ceph-users] OSD won't come back "UP"

2016-10-07 Thread Reed Dier
7 19:39:30.515618 7fd39aced700 0 mon.core@0(leader).osd e4363 > create-or-move crush item name 'osd.0' initial_weight 7.2701 at location > {host=node24,root=default} > 2016-10-07 19:41:59.714517 7fd39b4ee700 0 log_channel(cluster) log [INF] : > osd.0 out (down for 338.148761) Everything running latest Jewel release > ceph --version > ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b) Any help with this is extremely appreciated. Hoping someone has dealt with this before. Reed Dier

[ceph-users] Recovery/Backfill Speedup

2016-10-04 Thread Reed Dier
Attempting to expand our small ceph cluster currently. Have 8 nodes, 3 mons, and went from a single 8TB disk per node to 2x 8TB disks per node, and the rebalancing process is excruciatingly slow. Originally at 576 PGs before expansion, and wanted to allow rebalance to finish before expanding th
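
The usual Jewel-era knobs for nudging this along can be injected at runtime; the values below are illustrative only, and higher settings do eat into client I/O:

  $ ceph tell osd.* injectargs '--osd-max-backfills 4 --osd-recovery-max-active 8'
  $ ceph -s   # watch recovery throughput and client latency before pushing further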
