Re: [ceph-users] migrating cephfs data and metadat to new pools

2017-08-21 Thread David Turner
Why do you need to move the data between pools? My guess is that for your needs you can add another pool to the FS and do something with it that way. Then you are using the same MDS servers and the same FS. I would probably recommend doing the copy using mounted filesystems instead of a rados copy.

Re: [ceph-users] NVMe + SSD + HDD RBD Replicas with Bluestore...

2017-08-21 Thread David Turner
I don't understand why min_size = 2 would kill latency times. Regardless of your min_size, a write to ceph does not ack until it completes to all copies. That means that even with min_size = 1 the write will not be successful until it's written to the NVME, the SSD, and the HDD (given your propos

Re: [ceph-users] Small-cluster performance issues

2017-08-22 Thread David Turner
I would run some benchmarking throughout the cluster environment to see where your bottlenecks are before putting time and money into something that might not be your limiting resource. Sébastien Han put together a great guide for benchmarking your cluster here. https://www.sebastien-han.fr/blog/
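
For reference, a minimal baseline test along those lines might look like the following (the pool name `bench` is only an example here):

    # raw throughput from one client: 60s of writes, keeping the objects for the read tests
    rados bench -p bench 60 write --no-cleanup
    # sequential and random reads against the same objects
    rados bench -p bench 60 seq
    rados bench -p bench 60 rand
    # remove the benchmark objects afterwards
    rados -p bench cleanup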

Re: [ceph-users] OSD doesn't always start at boot

2017-08-23 Thread David Turner
This isn't a solution for them not starting at boot time, but it does save you from having to reboot the node again. `ceph-disk activate-all` should go through and start up the rest of your osds without another reboot. On Wed, Aug 23, 2017 at 9:36 AM Sean Purdy wrote: > Hi, > > Luminous 12.1.1 > > I'
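
As a rough sketch of that workflow (assuming a ceph-disk based deployment, as in Jewel/Luminous):

    # scan for prepared-but-inactive OSD partitions and start their daemons
    ceph-disk activate-all
    # confirm the OSDs came back up
    ceph osd tree | grep down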

Re: [ceph-users] Ruleset vs replica count

2017-08-24 Thread David Turner
> min_size 1 STOP THE MADNESS. Search the ML to realize why you should never use a min_size of 1. I'm curious as well as to what this sort of configuration will do for how many copies are stored between DCs. On Thu, Aug 24, 2017 at 1:03 PM Sinan Polat wrote: > Hi, > > > > In a Multi Datacente
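
For anyone checking their own pools, a quick way to review and raise min_size (the pool name `mypool` is just a placeholder):

    # show the current replica settings
    ceph osd pool get mypool size
    ceph osd pool get mypool min_size
    # require at least 2 complete copies before a PG accepts I/O
    ceph osd pool set mypool min_size 2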

Re: [ceph-users] RGW Multisite metadata sync init

2017-08-24 Thread David Turner
s. On Thu, Aug 24, 2017 at 10:59 AM David Turner wrote: > I have a RGW Multisite 10.2.7 set up for bi-directional syncing. This has > been operational for 5 months and working fine. I recently created a new > user on the master zone, used that user to create a bucket, and put in a >

[ceph-users] RGW Multisite metadata sync init

2017-08-24 Thread David Turner
t this point)? Thank you, David Turner

Re: [ceph-users] RGW multisite sync data sync shard stuck

2017-08-24 Thread David Turner
Andreas, did you find a solution to your multisite sync issues with the stuck shards? I'm also on 10.2.7 and having this problem. One realm has stuck shards for data sync and another realm says it's up to date, but isn't receiving new users via metadata sync. I ran metadata sync init on it and i

Re: [ceph-users] NVMe + SSD + HDD RBD Replicas with Bluestore...

2017-08-24 Thread David Turner
erformance on large reads, and lose more > performance on small writes/reads (dependent on cpu speed and various other > factors). > > Mark > > > > > Anyway, thanks for the info! > > Xavier. > > > > -Mensaje original- > > De: Christian Bal

Re: [ceph-users] RGW Multisite metadata sync init

2017-08-24 Thread David Turner
same page. Does anyone know if that command will overwrite any local data that the zone has that the other doesn't if you run `data sync init` on it? On Thu, Aug 24, 2017 at 1:51 PM David Turner wrote: > After restarting the 2 RGW daemons on the second site again, everything > caug

Re: [ceph-users] Monitoring a rbd map rbd connection

2017-08-25 Thread David Turner
Additionally, solely testing if you can write to the path could give a false sense of security if the path is writable when the RBD is not mounted. It would write a file to the system drive and you would see it as successful. On Fri, Aug 25, 2017 at 2:27 AM Adrian Saul wrote: > If you are monit

Re: [ceph-users] State of play for RDMA on Luminous

2017-08-28 Thread David Disseldorp
ble alongside a tcmu-runner backstore: http://linux-iscsi.org/wiki/ISCSI_Extensions_for_RDMA I'm not aware of any testing of this combination though. Cheers, David

Re: [ceph-users] RGW Multisite metadata sync init

2017-08-28 Thread David Turner
PM Casey Bodley wrote: > Hi David, > > The 'data sync init' command won't touch any actual object data, no. > Resetting the data sync status will just cause a zone to restart a full > sync of the --source-zone's data changes log. This log only lists which &

Re: [ceph-users] pros/cons of multiple OSD's per host

2017-08-28 Thread David Turner
In your example of EC 5 + 3, your min_size is 5. As long as you have 5 hosts up, you should still be serving content. My home cluster uses 2+1 and has 3 nodes. I can reboot any node (leaving 2 online) as long as the PGs in the cluster are healthy. If I were to actually lose a node, I would have to

Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread David Turner
To add to Steve's success, the rbd was created in a second cluster in the same datacenter so it didn't run the risk of deadlock that mapping rbds on machines running osds carries. It is still theoretical to work on the same cluster, but more inherently dangerous for a few reasons. On Tue, Aug 29,

Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread David Turner
But it was absolutely awesome to run an osd off of an rbd after the disk failed. On Tue, Aug 29, 2017, 1:42 PM David Turner wrote: > To addend Steve's success, the rbd was created in a second cluster in the > same datacenter so it didn't run the risk of deadlocking that m

Re: [ceph-users] OSD's flapping on ordinary scrub with cluster being static (after upgrade to 12.1.1

2017-08-29 Thread David Zafman
LTS release happened today, so 12.2.0 is the best thing to run as of now. See if any existing bugs like http://tracker.ceph.com/issues/21142 are related. David On 8/29/17 8:24 AM, Tomasz Kusmierz wrote: So nobody has any clue on this one ??? Should I go with this one to dev mailing list

Re: [ceph-users] Ceph re-ip of OSD node

2017-08-30 Thread David Turner
ALL OSDs need to be running the same private network at the same time. ALL clients, RGW, OSD, MON, MGR, MDS, etc, etc need to be running on the same public network at the same time. You cannot do this as a one at a time migration to the new IP space. Even if all of the servers can still communic

Re: [ceph-users] Changing the failure domain

2017-08-31 Thread David Turner
How long are you seeing these blocked requests for? Initially or perpetually? Changing the failure domain causes all PGs to peer at the same time. This would be the cause if it happens really quickly. There is no way to avoid all of them peering while making a change like this. After that, It

Re: [ceph-users] RGW Multisite metadata sync init

2017-08-31 Thread David Turner
": "failed to sync bucket instance: (5) Input\/output error" 65 "message": "failed to sync object" On Tue, Aug 29, 2017 at 10:00 AM Orit Wasserman wrote: > > Hi David, > > On Mon, Aug 28, 2017 at 8:33 PM, David Turner > wro

Re: [ceph-users] Possible way to clean up leaked multipart objects?

2017-08-31 Thread David Turner
Jewel 10.2.7. I found a discrepancy in object counts for a multisite configuration and it's looking like it might be orphaned multipart files causing it. It doesn't look like this PR has received much attention. Is there anything I can do to help you with testing/confirming a use case for this t

Re: [ceph-users] Changing the failure domain

2017-09-01 Thread David Turner
drive that you're reading from/writing to in sectors that are bad or at least slower. On Fri, Sep 1, 2017, 6:13 AM Laszlo Budai wrote: > Hi David, > > Well, most probably the larger part of our PGs will have to be > reorganized, as we are moving from 9 hosts to 3 chassis. But I

Re: [ceph-users] Changing the failure domain

2017-09-01 Thread David Turner
appearing only > during the backfill. I will try to dig deeper into the IO operations at the > next test. > > Kind regards, > Laszlo > > > > On 01.09.2017 16:08, David Turner wrote: > > That is normal to have backfilling because the crush map did change. The >

Re: [ceph-users] crushmap rule for not using all buckets

2017-09-04 Thread David Turner
I am unaware of any way to accomplish having 1 pool with all 3 racks and another pool with only 2 of them. If you could put the same osd in 2 different roots or have a crush rule choose from 2 different roots, then this might work out. To my knowledge neither of these is possible. What is your rea

Re: [ceph-users] ceph pgs state forever stale+active+clean

2017-09-04 Thread David Turner
> peering -> remapped -> >> active+clean) and ceph become health_ok >> >> ceph cluster become health_ok eventually, but in this time there was a >> problem that rbd can not found rbd images like below. >> # rbd ls -p volumes >> hhvol01 >> # rbd info

Re: [ceph-users] Ceph re-ip of OSD node

2017-09-05 Thread David Turner
d, Aug 30, 2017 at 3:55 PM Jeremy Hanmer > wrote: > >> This is simply not true. We run quite a few ceph clusters with >> rack-level layer2 domains (thus routing between racks) and everything >> works great. >> >> On Wed, Aug 30, 2017 at 10:52 AM, David Turner >

Re: [ceph-users] ceph OSD journal (with dmcrypt) replacement

2017-09-05 Thread David Turner
Did the journal drive fail during operation? Or was it taken out during pre-failure? If it fully failed, then most likely you can't guarantee the consistency of the underlying osds. In this case, you just remove the affected osds and add them back in as new osds. In the case of having good data on th

Re: [ceph-users] RGW Multisite metadata sync init

2017-09-07 Thread David Turner
hat is preventing the metadata from syncing in the other realm? I have 2 realms being sync using multi-site and it's only 1 of them that isn't getting the metadata across. As far as I can tell it is configured identically. On Thu, Aug 31, 2017 at 12:46 PM David Turner wrote: > All

Re: [ceph-users] Separate WAL and DB Partitions for existing OSDs ?

2017-09-07 Thread David Turner
On Filestore you would flush the journal and then after mapping the new journal device use the command to create the journal. I'm sure there's something similar for bluestore, but I don't have any experience with it yet. Is there a new command similar to flush and create for the WAL and DB? On T
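
The Filestore sequence being described is roughly the following (osd.12 and the journal partition are placeholders); whether an equivalent exists for a BlueStore WAL/DB is exactly the open question in this thread:

    systemctl stop ceph-osd@12
    # write out everything still sitting in the journal
    ceph-osd -i 12 --flush-journal
    # repoint the journal symlink at the new partition, then recreate the journal
    ln -sf /dev/disk/by-partuuid/<new-journal-uuid> /var/lib/ceph/osd/ceph-12/journal
    ceph-osd -i 12 --mkjournal
    systemctl start ceph-osd@12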

Re: [ceph-users] Blocked requests

2017-09-07 Thread David Turner
`ceph health detail` will give a little more information into the blocked requests. Specifically which OSDs are the requests blocked on and how long have they actually been blocked (as opposed to '> 32 sec'). I usually find a pattern after watching that for a time and narrow things down to an OSD
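
A sketch of that kind of investigation, assuming osd.5 turns out to be the repeat offender:

    # which OSDs the requests are blocked on, and for how long
    ceph health detail | grep blocked
    # then, on the host that owns osd.5, look at what those ops are actually waiting for
    ceph daemon osd.5 dump_ops_in_flight
    ceph daemon osd.5 dump_historic_ops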

Re: [ceph-users] Blocked requests

2017-09-07 Thread David Turner
To be fair, other times I have to go in and tweak configuration settings and timings to resolve chronic blocked requests. On Thu, Sep 7, 2017 at 1:32 PM David Turner wrote: > `ceph health detail` will give a little more information into the blocked > requests. Specifically which OSDs a

Re: [ceph-users] RGW Multisite metadata sync init

2017-09-07 Thread David Turner
ub wrote: > On Thu, Sep 7, 2017 at 7:44 PM, David Turner > wrote: > > Ok, I've been testing, investigating, researching, etc for the last week > and > > I don't have any problems with data syncing. The clients on one side are > > creating multipart objec

Re: [ceph-users] Blocked requests

2017-09-07 Thread David Turner
> > > 1 ops are blocked > 524.288 sec on osd.2 > > 1 ops are blocked > 262.144 sec on osd.2 > > 2 ops are blocked > 65.536 sec on osd.21 > > 9 ops are blocked > 1048.58 sec on osd.5 > > 9 ops are blocked > 524.288 sec on osd.5 > > 71 ops are blocke

Re: [ceph-users] RGW Multisite metadata sync init

2017-09-07 Thread David Turner
ucket both in `mdlog list`. On Thu, Sep 7, 2017 at 3:27 PM Yehuda Sadeh-Weinraub wrote: > On Thu, Sep 7, 2017 at 10:04 PM, David Turner > wrote: > > One realm is called public with a zonegroup called public-zg with a zone > for > > each datacenter. The second realm is c

Re: [ceph-users] RGW Multisite metadata sync init

2017-09-07 Thread David Turner
I'm pretty sure I'm using the cluster admin user/keyring. Is there any output that would be helpful? Period, zonegroup get, etc? On Thu, Sep 7, 2017 at 4:27 PM Yehuda Sadeh-Weinraub wrote: > On Thu, Sep 7, 2017 at 11:02 PM, David Turner > wrote: > > I created a test use

Re: [ceph-users] RGW Multisite metadata sync init

2017-09-07 Thread David Turner
I sent the output of all of the files including the logs to you. Thank you for your help so far. On Thu, Sep 7, 2017 at 4:48 PM Yehuda Sadeh-Weinraub wrote: > On Thu, Sep 7, 2017 at 11:37 PM, David Turner > wrote: > > I'm pretty sure I'm using the cluster admin user/

Re: [ceph-users] ceph OSD journal (with dmcrypt) replacement

2017-09-08 Thread David Turner
e as empty? > > On Wed, Sep 6, 2017 at 11:23 PM, M Ranga Swami Reddy > wrote: > > Thank you. Iam able to replace the dmcrypt journal successfully. > > > > On Sep 5, 2017 18:14, "David Turner" wrote: > >> > >> Did the journal drive fail during o

Re: [ceph-users] Significant uptick in inconsistent pgs in Jewel 10.2.9

2017-09-08 Thread David Zafman
h .1327 .dir.default.292886573.13181.12 remove" .dir.default.64449186.344176 has selected_object_info with "od 337cf025" so shards have "omap_digest_mismatch_oi" except for osd 990. The pg repair code will use osd.990 to fix the other 2 copies without further handling. D

Re: [ceph-users] Significant uptick in inconsistent pgs in Jewel 10.2.9

2017-09-08 Thread David Zafman
6.344176 get-omaphdr obj_header $ for i in $(ceph-objectstore-tool --data-path ... --pgid 5.3d40 .dir.default.64449186.344176 list-omap) do echo -n "${i}: " ceph-objectstore-tool --data-path ... .dir.default.292886573.13181.12 get-omap $i done key1: val1 key2: val2 key3: val3 David On

Re: [ceph-users] Ceph OSD crash starting up

2017-09-14 Thread David Turner
What do you mean by "updated crush map to 1"? Can you please provide a copy of your crush map and `ceph osd df`? On Wed, Sep 13, 2017 at 6:39 AM Gonzalo Aguilar Delgado < gagui...@aguilardelgado.com> wrote: > Hi, > > I'recently updated crush map to 1 and did all relocation of the pgs. At > the e

Re: [ceph-users] unknown PG state in a newly created pool.

2017-09-14 Thread David Turner
Did you configure your crush map to have that hierarchy of region, datacenter, room, row, rack, and chassis? If you're using the default crush map, then it has no idea about any of those places/locations. I don't know what the crush map would look like after using that syntax if the crush map did

Re: [ceph-users] Jewel -> Luminous upgrade, package install stopped all daemons

2017-09-14 Thread David Turner
d copy and paste the running command (viewable in ps) to know exactly what to run in the screens to start the daemons like this. On Wed, Sep 13, 2017 at 6:53 PM David wrote: > Hi All > > I did a Jewel -> Luminous upgrade on my dev cluster and it went very > smoothly. > > I

Re: [ceph-users] OSD_OUT_OF_ORDER_FULL even when the ratios are in order.

2017-09-14 Thread David Turner
The warning you are seeing is because those settings are out of order and it's showing you which ones are greater than the ones they should be. backfillfull_ratio is supposed to be higher than nearfull_ratio and osd_failsafe_full_ratio is supposed to be higher than full_ratio. nearfull_ratio is a
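
On Luminous, a consistent ordering can be restored along these lines (the values below are illustrative, not a recommendation):

    # nearfull < backfillfull < full, with the failsafe above full
    ceph osd set-nearfull-ratio 0.85
    ceph osd set-backfillfull-ratio 0.90
    ceph osd set-full-ratio 0.95
    # osd_failsafe_full_ratio is a config option (default 0.97), set in ceph.conf or injected
    ceph tell osd.* injectargs '--osd_failsafe_full_ratio 0.97'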

Re: [ceph-users] Ceph OSD crash starting up

2017-09-14 Thread David Turner
17.13 0.64 340 > 6 0.90919 1.0 931G 164G 766G 17.70 0.67 210 > TOTAL 4179G G 3067G 26.60 > MIN/MAX VAR: 0.64/2.32 STDDEV: 16.99 > > As I said I still have OSD1 intact so I can do whatever you need except > readding to the cluster. Since I don't

Re: [ceph-users] Some OSDs are down after Server reboot

2017-09-15 Thread David Turner
I have this issue with my NVMe OSDs, but not my HDD OSDs. I have 15 HDD's and 2 NVMe's in each host. We put most of the journals on one of the NVMe's and a few on the second, but added a small OSD partition to the second NVMe for RGW metadata pools. When restarting a server manually for testing,

Re: [ceph-users] Jewel -> Luminous upgrade, package install stopped all daemons

2017-09-15 Thread David Turner
rading the packages is causing a restart of the Ceph daemons, it is most definitely a bug and needs to be fixed. On Fri, Sep 15, 2017 at 4:48 PM David wrote: > Happy to report I got everything up to Luminous, used your tip to keep the > OSDs running, David, thanks again for that. > > I

Re: [ceph-users] Jewel -> Luminous upgrade, package install stopped all daemons

2017-09-15 Thread David Turner
S... good riddance! That's ridiculous! On Fri, Sep 15, 2017 at 6:06 PM Vasu Kulkarni wrote: > On Fri, Sep 15, 2017 at 2:10 PM, David Turner > wrote: > > I'm glad that worked for you to finish the upgrade. > > > > He has multiple MONs, but all of them are on n

Re: [ceph-users] Jewel -> Luminous upgrade, package install stopped all daemons

2017-09-15 Thread David Turner
start these (or guarantee that it will for those folks)? On Fri, Sep 15, 2017 at 6:49 PM Gregory Farnum wrote: > On Fri, Sep 15, 2017 at 3:34 PM David Turner > wrote: > >> I don't understand a single use case where I want updating my packages >> using yum, apt, etc to r

Re: [ceph-users] Rbd resize, refresh rescan

2017-09-18 Thread David Turner
I've never needed to do anything other than extend the partition and/or filesystem when I increased the size of an RBD. Particularly if I didn't partition the RBD I only needed to extend the filesystem. Which method are you using to map/mount the RBD? Is it through a Hypervisor or just mapped to a
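
A minimal example of the case being described (image, device, and mountpoint names are placeholders; a reasonably recent krbd picks up the size change on its own, older kernels need a rescan):

    # grow the image to 200 GiB (rbd resize takes megabytes by default)
    rbd resize --size 204800 rbd/myimage
    # then grow the filesystem on the client
    xfs_growfs /mnt/myimage        # XFS
    resize2fs /dev/rbd0            # ext4 on an unpartitioned RBD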

Re: [ceph-users] Rbd resize, refresh rescan

2017-09-18 Thread David Turner
ass, > which I don’t have) echo 1 > /sys/devices/rbd/21/refresh > > (I am trying to online increase the size via kvm, virtio disk in win > 2016) > > > -Original Message- > From: David Turner [mailto:drakonst...@gmail.com] > Sent: maandag 18 september 2017 22:42 >

Re: [ceph-users] Ceph OSD crash starting up

2017-09-19 Thread David Turner
Are you asking to add the osd back with its data or add it back in as a fresh osd? What is your `ceph status`? On Tue, Sep 19, 2017, 5:23 AM Gonzalo Aguilar Delgado < gagui...@aguilardelgado.com> wrote: > Hi David, > > Thank you for the great explanation of the weights, I th

Re: [ceph-users] Ceph OSD crash starting up

2017-09-19 Thread David Turner
do < gagui...@aguilardelgado.com> wrote: > Hi David, > > What I want is to add the OSD back with its data yes. But avoiding any > troubles that can happen from the time it was out. > > Is it possible? I suppose that some pg has been updated after. Will ceph > manage it gracefully? &

[ceph-users] OSD assert hit suicide timeout

2017-09-19 Thread David Turner
Just starting 3 nights ago we started seeing OSDs randomly going down in our cluster (Jewel 10.2.7). At first I saw that each OSD that was recently marked down in the cluster (`ceph osd dump | grep -E '^osd\.[0-9]+\s' | sort -nrk11` sorted list of OSDs by which OSDs have been marked down in the mo

Re: [ceph-users] Ceph fails to recover

2017-09-19 Thread David Turner
Can you please provide the output of `ceph status`, `ceph osd tree`, and `ceph health detail`? Thank you. On Tue, Sep 19, 2017 at 2:59 PM Jonas Jaszkowic < jonasjaszkowic.w...@gmail.com> wrote: > Hi all, > > I have setup a Ceph cluster consisting of one monitor, 32 OSD hosts (1 OSD > of size 320

Re: [ceph-users] Ceph fails to recover

2017-09-20 Thread David Turner
degraded, acting > [14,29,4,1,19,17,9,0,3,16,24,2] > pg 3.5b is active+recovery_wait+degraded, acting > [4,15,14,30,28,1,12,10,2,29,24,18] > pg 3.52 is active+recovery_wait+degraded, acting > [17,24,20,23,4,14,18,27,8,22,9,31] > pg 3.51 is active+recovery_wait+degraded

Re: [ceph-users] Ceph fails to recover

2017-09-20 Thread David Turner
ssible). Are there any important > options that I have to know? > > What is the best practice to deal with the issue recovery speed vs. > read/write speed during a recovery situation? Do you > have any suggestions/references/hints how to deal with such situations? > > > Am 20.09.2

Re: [ceph-users] Ceph fails to recover

2017-09-20 Thread David Turner
admin socket to see your currently running settings to make sure that they took effect. http://docs.ceph.com/docs/kraken/rados/operations/monitoring/#using-the-admin-socket On Wed, Sep 20, 2017 at 11:42 AM David Turner wrote: > You are currently on Kraken, but if you upgrade to Luminous you

Re: [ceph-users] Ceph fails to recover

2017-09-20 Thread David Turner
active+remapped+backfilling, 173 > active+clean; 1975 GB data, 3011 GB used, 7063 GB / 10075 GB avail; > 30549/1376215 objects degraded (2.220%); 12201/1376215 objects misplaced > (0.887%); 21868 kB/s, 3 objects/s recovering > > Is this an acceptable recovery rate? Unfortunately I hav

Re: [ceph-users] Ceph fails to recover

2017-09-20 Thread David Turner
Correction, if the OSD had been marked down and been marked out, some of its PGs would be in a backfill state while others would be in a recovery state depending on how long the OSD was marked down and how much backfilling had completed in the cluster. On Wed, Sep 20, 2017 at 12:06 PM David

Re: [ceph-users] Ceph OSD crash starting up

2017-09-20 Thread David Turner
been up just long enough before it crashed to cause problems. On Wed, Sep 20, 2017 at 1:12 PM Gonzalo Aguilar Delgado < gagui...@aguilardelgado.com> wrote: > Hi David, > > Thank you for your support. What can be the cause of > active+clean+inconsistent still growing up? Bad

Re: [ceph-users] Ceph fails to recover

2017-09-20 Thread David Turner
is (quite) working as described with *ceph osd out * and *ceph > osd in *, but I am wondering > if this produces a realistic behavior. > > > Am 20.09.2017 um 18:06 schrieb David Turner : > > When you posted your ceph status, you only had 56 PGs degraded. Any value > of osd_max_

Re: [ceph-users] Possible to change the location of run_dir?

2017-09-20 Thread David Turner
You can always add the telegraf user to the ceph group. That change will persist on reboots and allow the user running the commands to read any folder/file that is owned by the group ceph. I do this for Zabbix and Nagios now that the /var/lib/ceph folder is not public readable. On Wed, Sep 20, 2
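
A sketch of that change (the telegraf user name comes from the thread; any monitoring user works the same way):

    # add the monitoring user to the ceph group
    usermod -a -G ceph telegraf
    # after a fresh login/service restart it can read group-owned paths such as /var/lib/ceph
    sudo -u telegraf ls /var/lib/ceph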

Re: [ceph-users] OSD assert hit suicide timeout

2017-09-20 Thread David Turner
't use EC pools, but my experience with similar slow requests on >> RGW+replicated_pools is that in the logs you need to find out the first >> slow request and identify where it's from, for example, is it deep-scrub, >> or some client accessing corrupted objects, disk error

Re: [ceph-users] Stuck IOs

2017-09-22 Thread David Turner
The request remains blocked if you issue `ceph osd down 2`? Marking the offending OSD as down usually clears up blocked requests for me... at least it resets the timer on it and the requests start blocking again if the OSD is starting to fail. On Fri, Sep 22, 2017 at 11:51 AM Matthew Stroud wrot
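
The nudge being described is just (osd.2 taken from the thread):

    # mark the OSD down in the map; the daemon notices and rejoins on its own, re-peering its PGs
    ceph osd down 2
    # watch whether the blocked requests clear or immediately come back
    ceph health detail | grep blocked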

Re: [ceph-users] Stuck IOs

2017-09-22 Thread David Turner
g for missing object > > > > Thanks, > > Matthew Stroud > > > > *From: *David Turner > *Date: *Friday, September 22, 2017 at 9:57 AM > *To: *Matthew Stroud , " > ceph-users@lists.ceph.com" > *Subject: *Re: [ceph-users] Stuck IOs > > > >

Re: [ceph-users] Updating ceps client - what will happen to services like NFS on clients

2017-09-25 Thread David Turner
ed with the new Ceph version. In general RBDs are not affected by upgrades as long as you don't take down too much of the cluster at once and are properly doing a rolling upgrade. On Mon, Sep 25, 2017 at 8:07 AM David wrote: > Hi Götz > > If you did a rolling upgrade, RBD clients should

Re: [ceph-users] Bluestore OSD_DATA, WAL & DB

2017-09-25 Thread David Turner
db/wal partitions are per OSD. DB partitions need to be made as big as you need them. If they run out of space, they will fall back to the block device. If the DB and block are on the same device, then there's no reason to partition them and figure out the best size. If they are on separate dev
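
For context, a ceph-disk style BlueStore prepare with separate DB and WAL partitions looks roughly like this (device names are placeholders; sizing them is the part discussed above):

    # data on the HDD, DB and WAL on pre-made NVMe partitions
    ceph-disk prepare --bluestore /dev/sdc \
        --block.db /dev/nvme0n1p3 \
        --block.wal /dev/nvme0n1p4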

Re: [ceph-users] Updating ceps client - what will happen to services like NFS on clients

2017-09-26 Thread David Turner
You can update the server with the mapped rbd and shouldn't see as much as a blip on your VMs. On Tue, Sep 26, 2017, 3:32 AM Götz Reinicke wrote: > Hi Thanks David & David, > > we don’t use the fuse code. And may be I was a bit unclear, but your > feedback clears some

[ceph-users] osd max scrubs not honored?

2017-09-26 Thread J David
With “osd max scrubs” set to 1 in ceph.conf, which I believe is also the default, at almost all times, there are 2-3 deep scrubs running. 3 simultaneous deep scrubs is enough to cause a constant stream of: mon.ceph1 [WRN] Health check update: 69 slow requests are blocked > 32 sec (REQUEST_SLOW)

Re: [ceph-users] inconsistent pg will not repair

2017-09-26 Thread David Zafman
, "osd": 2, "primary": false } ], "selected_object_info": "3:ce3f1d6a::: mytestobject:head(47'54 osd.0.0:53 dirty|omap|data_digest|omap_digest s 143456 uv 3 dd 2ddbf8f5 od f5fba2c6 alloc_hint [0 0 0])", "union_shard_errors": [ "data_digest_mi

Re: [ceph-users] Minimum requirements to mount luminous cephfs ?

2017-09-27 Thread David Turner
You can also use ceph-fuse instead of the kernel driver to mount cephfs. It supports all of the luminous features. On Wed, Sep 27, 2017, 8:46 AM Yoann Moulin wrote: > Hello, > > > Try to work with the tunables: > > > > $ *ceph osd crush show-tunables* > > { > > "choose_local_tries": 0, > >
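
A minimal ceph-fuse mount, assuming the default admin keyring is present on the client:

    # mount the filesystem via FUSE instead of the kernel client
    ceph-fuse -m mon1.example.com:6789 /mnt/cephfs
    # unmount with
    fusermount -u /mnt/cephfs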

Re: [ceph-users] Different recovery times for OSDs joining and leaving the cluster

2017-09-27 Thread David Turner
When you lose 2 osds you have 30 osds accepting the degraded data and performing the backfilling. When the 2 osds are added back in you only have 2 osds receiving the majority of the data from the backfilling. 2 osds have a lot less available iops and spindle speed than the other 30 did when they

Re: [ceph-users] Re install ceph

2017-09-27 Thread David Turner
I've reinstalled a host many times over the years. We used dmcrypt so I made sure to back up the keys for that. Other than that it is seamless as long as your installation process only affects the root disk. If it affected any osd or journal disk, then you would need to mark those osds out and re-

Re: [ceph-users] osd max scrubs not honored?

2017-09-27 Thread David Turner
ooking for scrub should give you some ideas of things to try. On Tue, Sep 26, 2017, 2:04 PM J David wrote: > With “osd max scrubs” set to 1 in ceph.conf, which I believe is also > the default, at almost all times, there are 2-3 deep scrubs running. > > 3 simultaneous deep scrubs is enou

Re: [ceph-users] Need some help/advice upgrading Hammer to Jewel - HEALTH_ERR shutting down OSD

2017-09-27 Thread David Turner
There are new PG states that cause health_err. In this case it is undersized that is causing this state. While I decided to upgrade my tunables before upgrading the rest of my cluster, it does not seem to be a requirement. However I would recommend upgrading them sooner than later. It will cause a

Re: [ceph-users] osd max scrubs not honored?

2017-09-28 Thread David Turner
to do > it by turning off deep scrubs, forcing individual PGs to deep scrub at > intervals, and then enabling deep scrubs again. > -Greg > > > On Wed, Sep 27, 2017 at 6:34 AM David Turner > wrote: > >> This isn't an answer, but a suggestion to try and help track it

Re: [ceph-users] osd max scrubs not honored?

2017-09-29 Thread David Turner
If you're scheduling them appropriately so that no deep scrubs will happen on their own, then you can just check the cluster status if any PGs are deep scrubbing at all. If you're only scheduling them for specific pools, then you can confirm which PGs are being deep scrubbed in a specific pool wit
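
A sketch of that kind of check, plus kicking off a scheduled deep scrub by hand (the PG id is a placeholder):

    # list any PGs currently deep scrubbing
    ceph pg dump pgs_brief 2>/dev/null | grep scrubbing+deep
    # trigger a deep scrub of one PG yourself
    ceph pg deep-scrub 3.5b
    # if scheduling everything manually, keep the automatic ones from firing
    ceph osd set nodeep-scrub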

Re: [ceph-users] Ceph OSD get blocked and start to make inconsistent pg from time to time

2017-09-29 Thread David Turner
I'm going to assume you're dealing with your scrub errors and have a game plan for those as you didn't mention them in your question at all. One thing I'm always leary of when I see blocked requests happening is that the PGs might be splitting subfolders. It is pretty much a guarantee if you're a

Re: [ceph-users] Ceph OSD on Hardware RAID

2017-09-29 Thread David Turner
The reason it is recommended not to raid your disks is to give them all to Ceph. When a disk fails, Ceph can generally recover faster than the raid can. The biggest problem with raid is that you need to replace the disk and rebuild the raid asap. When a disk fails in Ceph, the cluster just moves

Re: [ceph-users] Get rbd performance stats

2017-09-29 Thread David Turner
There is no tool on the Ceph side to see which RBDs are doing what. Generally you need to monitor the mount points for the RBDs to track that down with iostat or something. That said, there are some tricky things you could probably do to track down the RBD that is doing a bunch of stuff (as long a

Re: [ceph-users] Get rbd performance stats

2017-09-29 Thread David Turner
His dilemma sounded like he has access to the cluster, but not any of the clients where the RBDs are used or even the hypervisors in charge of those. On Fri, Sep 29, 2017 at 12:03 PM Maged Mokhtar wrote: > On 2017-09-29 17:13, Matthew Stroud wrote: > > Is there a way I could get a performance st

Re: [ceph-users] erasure-coded with overwrites versus erasure-coded with cache tiering

2017-09-30 Thread David Turner
I can only think of 1 type of cache tier usage that is faster if you are using the cache tier on the same root of osds as the EC pool. That is cold storage where the file is written initially, modified and read door the first X hours, and then remains in cold storage for the remainder of its life

Re: [ceph-users] erasure-coded with overwrites versus erasure-coded with cache tiering

2017-09-30 Thread David Turner
Proofread failure. "modified and read during* the first X hours, and then remains in cold storage for the remainder of its life with rare* reads" On Sat, Sep 30, 2017, 1:32 PM David Turner wrote: > I can only think of 1 type of cache tier usage that is faster if you are > usin

Re: [ceph-users] right way to recover a failed OSD (disk) when using BlueStore ?

2017-09-30 Thread David Turner
I'm pretty sure that the process is the same as with filestore. The cluster doesn't really know if an osd is filestore or bluestore... It's just an osd running a daemon. If there are any differences, they would be in the release notes for Luminous as changes from Jewel. On Sat, Sep 30, 2017, 6:28

Re: [ceph-users] erasure-coded with overwrites versus erasure-coded with cache tiering

2017-09-30 Thread David Turner
e the cache tier. I mention that because if it is that easy to enable/disable, then testing it should be simple and easy to compare. On Sat, Sep 30, 2017, 8:10 PM Chad William Seys wrote: > Hi David, >Thanks for the clarification. Reminded me of some details I forgot > to mention. &g

Re: [ceph-users] decreasing number of PGs

2017-10-02 Thread David Turner
Adding more OSDs or deleting/recreating pools that have too many PGs are your only 2 options to reduce the number of PG's per OSD. It is on the Ceph roadmap, but is not a currently supported feature. You can alternatively adjust the setting threshold for the warning, but it is still a problem you
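
If you only want to quiet the warning while planning the real fix, the threshold can be raised (the option name below is the Jewel/Luminous-era warning knob; the value is illustrative):

    # raise the per-OSD PG count at which the health warning fires
    ceph tell mon.* injectargs '--mon_pg_warn_max_per_osd 400'
    # or make it persistent in ceph.conf under [mon]:
    #   mon pg warn max per osd = 400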

Re: [ceph-users] decreasing number of PGs

2017-10-03 Thread David Turner
> Andrei > -- > > *From: *"David Turner" > *To: *"Jack" , "ceph-users" < > ceph-users@lists.ceph.com> > *Sent: *Monday, 2 October, 2017 22:28:33 > *Subject: *Re: [ceph-users] decreasing number of PGs > > Adding more OSDs or

Re: [ceph-users] Ceph stuck creating pool

2017-10-03 Thread David Turner
My guess is a networking problem. Do you have vlans, cluster network vs public network in the ceph.conf, etc configured? Can you ping between all of your storage nodes on all of their IPs? All of your OSDs communicate with the mons on the public network, but they communicate with each other for

Re: [ceph-users] radosgw notify on creation/deletion of file in bucket

2017-10-03 Thread David Turner
Just to make sure you're not confusing redundancy with backups. Having your data in another site does not back up your data, but makes it more redundant. For instance if an object/file is accidentally deleted from RGW and you're syncing those files to AWS, Google buckets, or a second RGW cluster

Re: [ceph-users] Ceph cache pool full

2017-10-05 Thread David Turner
You're missing most all of the important bits. What the osds in your cluster look like, your tree, and your cache pool settings. ceph df ceph osd df ceph osd tree ceph osd pool get cephfs_cache all You have your writeback cache on 3 nvme drives. It looks like you have 1.6TB available between them

Re: [ceph-users] Ceph cache pool full

2017-10-06 Thread David Turner
On Fri, Oct 6, 2017, 1:05 AM Christian Balzer wrote: > > Hello, > > On Fri, 06 Oct 2017 03:30:41 + David Turner wrote: > > > You're missing most all of the important bits. What the osds in your > > cluster look like, your tree, and your cache pool settings.

Re: [ceph-users] Ceph cache pool full

2017-10-06 Thread David Turner
6, 2017 at 4:49 PM, Shawfeng Dong wrote: >> > >> > Dear all, >> > >> > >> > >> > Thanks a lot for the very insightful comments/suggestions! >> > >> > >> > >> > There are 3 OSD servers in our pilot Ceph clust

Re: [ceph-users] Ceph cache pool full

2017-10-06 Thread David Turner
gt; # rados -p cephfs_data ls > > Any advice? > > On Fri, Oct 6, 2017 at 9:45 AM, David Turner > wrote: > >> Notice in the URL for the documentation the use of "luminous". When you >> looked a few weeks ago, you might have been looking at the documentation &g

Re: [ceph-users] Ceph cache pool full

2017-10-06 Thread David Turner
> flushing of objects to the underlying data pool. Once I killed that > process, objects started to flush to the data pool automatically (with > target_max_bytes & target_max_objects set); and I can force the flushing > with 'rados -p cephfs_cache cache-flush-evict-all' as
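
For reference, the cache-tier targets and the manual flush mentioned here look like this (pool name and values are examples; the cache only flushes/evicts on its own once the targets are set):

    ceph osd pool set cephfs_cache target_max_bytes 1000000000000
    ceph osd pool set cephfs_cache cache_target_dirty_ratio 0.4
    ceph osd pool set cephfs_cache cache_target_full_ratio 0.8
    # force everything dirty out to the backing data pool right now
    rados -p cephfs_cache cache-flush-evict-all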

Re: [ceph-users] PGs get placed in the same datacenter (Trying to make a hybrid NVMe/HDD pool with 6 servers, 2 in each datacenter)

2017-10-07 Thread David Turner
way from the primary NVMe copy. It wastes a copy of all of the data in the pool, but that's on the much cheaper HDD storage and can probably be considered acceptable losses for the sake of having the primary OSD on NVMe drives. On Sat, Oct 7, 2017 at 3:36 PM Peter Linder wrote: > On 1

Re: [ceph-users] PGs get placed in the same datacenter (Trying to make a hybrid NVMe/HDD pool with 6 servers, 2 in each datacenter)

2017-10-07 Thread David Turner
Just to make sure you understand that the reads will happen on the primary osd for the PG and not the nearest osd, meaning that reads will go between the datacenters. Also that each write will not ack until all 3 writes happen adding the latency to the writes and reads both. On Sat, Oct 7, 2017, 1

Re: [ceph-users] PGs get placed in the same datacenter (Trying to make a hybrid NVMe/HDD pool with 6 servers, 2 in each datacenter)

2017-10-08 Thread David Turner
all operations in the isolated DC will be > frozen, so I believe you would not lose data. > > >> >> >> >> On Sat, Oct 7, 2017 at 3:36 PM Peter Linder >> wrote: >> >>> On 10/7/2017 8:08 PM, David Turner wrote: >>> >>> Just

Re: [ceph-users] right way to recover a failed OSD (disk) when using BlueStore ?

2017-10-11 Thread David Turner
ming after hot swaping the device, the drive letter > is "sdx" according to the link above what would be the right command to > re-use the two NVME partitions for block db and wal ? > > I presume that everything else is the same. > best. > > > On Sat, Sep 30,

Re: [ceph-users] advice on number of objects per OSD

2017-10-11 Thread David Turner
I've managed RBD cluster that had all of the RBDs configured to 1M objects and filled up the cluster to 75% full with 4TB drives. Other than the collection splitting (subfolder splitting as I've called it before) we didn't have any problems with object counts. On Wed, Oct 11, 2017 at 9:47 AM Greg

Re: [ceph-users] min_size & hybrid OSD latency

2017-10-11 Thread David Turner
Christian is correct that min_size does not affect how many need to ACK the write, it is responsible for how many copies need to be available for the PG to be accessible. This is where SSD journals for filestore and SSD DB/WAL partitions come into play. The write is considered ACK'd as soon as th
