[ceph-users] running Firefly client (0.80.1) against older version (dumpling 0.67.10) cluster?

2014-08-13 Thread Nigel Williams
Anyone know if this is safe in the short term? We're rebuilding our nova-compute nodes and can make sure the Dumpling versions are pinned as part of the process in the future.

[ceph-users] ceph-deploy with --release (--stable) for dumpling?

2014-08-25 Thread Nigel Williams
Running ceph-deploy --release dumpling (or previously ceph-deploy --stable dumpling) now results in Firefly (0.80.1) being installed; is this intentional? I'm adding another host with more OSDs and guessing it is preferable to deploy the same version.

Re: [ceph-users] ceph-deploy with --release (--stable) for dumpling?

2014-08-26 Thread Nigel Williams
On Tue, Aug 26, 2014 at 5:10 PM, Konrad Gutkowski wrote: > Ceph-deploy should set priority for ceph repository, which it doesn't, this > usually installs the best available version from any repository. Thanks Konrad for the tip. It took several goes (notably ceph-deploy purge did not, for me at l

Re: [ceph-users] SSD journal deployment experiences

2014-09-05 Thread Nigel Williams
On Fri, Sep 5, 2014 at 5:46 PM, Dan Van Der Ster wrote: >> On 05 Sep 2014, at 03:09, Christian Balzer wrote: >> You might want to look into cache pools (and dedicated SSD servers with >> fast controllers and CPUs) in your test cluster and for the future. >> Right now my impression is that there i

Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Nigel Williams
On Wed, Jun 3, 2015 at 8:30 AM, wrote: > We are running with Jumbo Frames turned on. Is that likely to be the issue? I got caught by this previously: http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-October/043955.html The problem is Ceph "almost-but-not-quite" works, leading you

[ceph-users] anyone using CephFS for HPC?

2015-06-11 Thread Nigel Williams
Wondering if anyone has done comparisons between CephFS and other parallel filesystems like Lustre, typically used in HPC deployments, either for scratch storage or persistent storage to support HPC workflows? Thanks.

Re: [ceph-users] anyone using CephFS for HPC?

2015-06-14 Thread Nigel Williams
On 12/06/2015 3:41 PM, Gregory Farnum wrote: ... and the test evaluation was on repurposed Lustre hardware so it was a bit odd, ... Agree, it was old (at least by now) DDN kit (SFA10K?) and not ideally suited for Ceph (really high OSD per host ratio). Sage's thesis or some of the earlier p

[ceph-users] EC pool needs hosts equal to k + m?

2015-06-21 Thread Nigel Williams
I recall a post to the mailing list in the last week(s) where someone said that for an EC pool the failure domain defaults to requiring k+m hosts in some versions of Ceph. Can anyone recall the post? Have I got the requirement correct?

Re: [ceph-users] EC pool needs hosts equal to k + m?

2015-06-24 Thread Nigel Williams
On Wed, Jun 24, 2015 at 4:29 PM, Yueliang wrote: > When I use K+M hosts in the EC pool, if M hosts go down and I still have K hosts > active, can I continue to write data to the pool? If your CRUSH map specifies a failure-domain at the host level (so no two chunks share the same host) then you will be
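For reference, a minimal sketch of creating an EC pool whose CRUSH rule places each chunk on a separate host (profile name, k/m values and PG count are illustrative, not taken from the thread; releases older than Luminous spell the option ruleset-failure-domain rather than crush-failure-domain):
  ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
  ceph osd pool create ecpool 128 128 erasure ec42
  ceph osd pool get ecpool min_size   # writes stop once fewer than min_size shards are available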

Re: [ceph-users] Configuring Ceph without DNS

2015-07-13 Thread Nigel Williams
> On 13 Jul 2015, at 4:58 pm, Abhishek Varshney wrote: > I have a requirement wherein I wish to set up Ceph where hostname resolution > is not supported and I just have IP addresses to work with. Is there a way > through which I can achieve this in Ceph? If yes, what are the caveats > associ
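A minimal sketch of an IP-only ceph.conf (addresses and monitor names are illustrative): the daemons themselves only need the monitor addresses, so name resolution can be avoided entirely here, although deployment tools such as ceph-deploy may still expect resolvable hostnames.
  [global]
  fsid = <cluster fsid>
  mon initial members = a, b, c
  mon host = 192.168.10.11, 192.168.10.12, 192.168.10.13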

Re: [ceph-users] 160 Thousand ceph-client.admin.*.asok files : Wired problem , never seen before

2015-08-09 Thread Nigel Williams
On 10/08/2015 12:02 AM, Robert LeBlanc wrote: > I'm guessing this is on an OpenStack node? There is a fix for this and I > think it will come out in the next release. For now we have had to disable > the admin sockets. Do you know what triggers the fault? We've not seen it on Firefly+RBD for Ope

[ceph-users] ceph-deploy preflight hostname check?

2013-09-04 Thread Nigel Williams
I notice under the HOSTNAME RESOLUTION section the use of 'host -4 {hostname}' as a required test; however, in all my trial deployments so far, none would pass, as this command is a direct DNS query, and instead I usually just add entries to the hosts file. Two thoughts: is Ceph expecting to only do DNS
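The difference matters because host(1) queries DNS servers directly and ignores /etc/hosts, while getent follows the normal resolver order; a quick illustration (hostname is hypothetical):
  host -4 ceph-node1        # direct DNS lookup; fails for names that exist only in /etc/hosts
  getent hosts ceph-node1   # honours nsswitch.conf, so /etc/hosts entries resolve too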

[ceph-users] CephFS test-case

2013-09-06 Thread Nigel Williams
I appreciate CephFS is not a high priority, but this is a user-experience test-case that can be a source of stability bugs for Ceph developers to investigate (and hopefully resolve):
CephFS test-case
1. Create two clusters, each 3 nodes with 4 OSDs each
2. I used Ubuntu 13.04 followed by update/

Re: [ceph-users] newbie question: rebooting the whole cluster, powerfailure

2013-09-06 Thread Nigel Williams
On 06/09/2013, at 7:49 PM, "Bernhard Glomm" wrote: > Can I introduce the cluster network later on, after the cluster is deployed > and started working? > (by editing ceph.conf, push it to the cluster members and restart the > daemons?) Thanks Bernhard for asking this question, I have the same q
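For what it's worth, the cluster network is just a couple of ceph.conf settings, so adding it later is essentially the edit/push/restart sequence described above; a minimal sketch (subnets are illustrative):
  [global]
  public network  = 10.0.0.0/24
  cluster network = 10.0.1.0/24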

Re: [ceph-users] xfsprogs not found in RHEL

2013-09-11 Thread Nigel Williams
On Wed, Aug 28, 2013 at 4:46 PM, Stroppa Daniele (strp) wrote: > You might need the RHEL Scalable File System add-on. Exactly. I understand this needs to be purchased from Red Hat in order to get access to it if you are using the Red Hat subscription management system. I expect you could drag ov

Re: [ceph-users] Placement groups on a 216 OSD cluster with multiple pools

2013-11-14 Thread Nigel Williams
On 15/11/2013 8:57 AM, Dane Elwell wrote: [2] - I realise the dangers/stupidity of a replica size of 0, but some of the data we wish to store just isn’t /that/ important. We've been thinking of this too. The application is storing boot-images, ISOs, local repository mirrors etc where recovery

[ceph-users] beware of jumbo frames

2014-10-23 Thread Nigel Williams
Spent a frustrating day trying to build a new test cluster; it turned out I had jumbo frames set on the cluster-network only, but having re-wired the machines recently with a new switch, I forgot to check that it could handle jumbo frames (it can't). Symptoms were stuck/unclean PGs - a small subset of PGs
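A quick way to confirm jumbo frames actually pass end-to-end (rather than 'almost-but-not-quite' working) is to ping with a full-size, non-fragmentable payload; with a 9000-byte MTU the largest ICMP payload is 8972 bytes after the 28 bytes of IP and ICMP headers (hostname is illustrative):
  ping -M do -s 8972 -c 3 osd-node2   # -M do forbids fragmentation, so any MTU mismatch in the path shows up as a failure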

Re: [ceph-users] v0.87 Giant released

2014-10-29 Thread Nigel Williams
On 30/10/2014 8:56 AM, Sage Weil wrote: * *Degraded vs misplaced*: the Ceph health reports from 'ceph -s' and related commands now make a distinction between data that is degraded (there are fewer than the desired number of copies) and data that is misplaced (stored in the wrong location

Re: [ceph-users] v0.87 Giant released

2014-10-29 Thread Nigel Williams
On 30/10/2014 11:51 AM, Christian Balzer wrote: Thus objects are (temporarily) not where they're supposed to be, but still present in sufficient replication. Thanks for the reminder, I suppose that is obvious :-) A much more benign scenario than degraded, and I hope that this doesn't even gene

Re: [ceph-users] Compile from source with Kinetic support

2014-11-28 Thread Nigel Williams
On Sat, Nov 29, 2014 at 5:19 AM, Julien Lutran wrote: > Where can I find this kinetic devel package? I guess you want this (the C++ kinetic client)? It has kinetic.h at least. https://github.com/Seagate/kinetic-cpp-client

Re: [ceph-users] experimental features

2014-12-05 Thread Nigel Williams
On Sat, Dec 6, 2014 at 4:36 AM, Sage Weil wrote: > - enumerate experimental options we want to enable >... > This has the property that no config change is necessary when the > feature drops its experimental status. It keeps the risky options in one place too, so they are easier to spot. > In all of the

[ceph-users] replacing an OSD or crush map sensitivity

2013-06-01 Thread Nigel Williams
Could I have a critique of this approach, please, as to how I could have done it better, or whether what I experienced simply reflects work still to be done? This is with Ceph 0.61.2 on a quite slow test cluster (logs shared with OSDs, no separate journals, using CephFS). I knocked the power co

Re: [ceph-users] replacing an OSD or crush map sensitivity

2013-06-03 Thread Nigel Williams
On 4/06/2013 9:16 AM, Chen, Xiaoxi wrote: > my 0.02: you really don't need to wait for health_ok between your > recovery steps, just go ahead. Every time a new map is generated and > broadcast, the old map and in-progress recovery will be canceled. Thanks Xiaoxi, that is helpful to know. It seems to

Re: [ceph-users] replacing an OSD or crush map sensitivity

2013-06-03 Thread Nigel Williams
On Tue, Jun 4, 2013 at 1:59 PM, Sage Weil wrote: > On Tue, 4 Jun 2013, Nigel Williams wrote: >> Something else I noticed: ... > > Does the monitor data directory share a disk with an OSD? If so, that > makes sense: compaction freed enough space to drop below the threshold...

Re: [ceph-users] Drive replacement procedure

2013-06-24 Thread Nigel Williams
On 25/06/2013 5:59 AM, Brian Candler wrote: On 24/06/2013 20:27, Dave Spano wrote: Here's my procedure for manually adding OSDs. The other thing I discovered is not to wait between steps; some changes result in a new crushmap that then triggers replication. You want to speed through the step

[ceph-users] Luminous 12.1.3: mgr errors

2017-08-11 Thread Nigel Williams
Cluster is OK and mgr is active, but I am unable to get the dashboard to start. I see the following errors in the logs:
2017-08-12 15:40:07.805991 7f508effd500 0 pidfile_write: ignore empty --pid-file
2017-08-12 15:40:07.810124 7f508effd500 -1 auth: unable to find a keyring on /var/lib/ceph/mgr/ceph-0/key

Re: [ceph-users] Luminous 12.1.3: mgr errors

2017-08-12 Thread Nigel Williams
On 12 August 2017 at 23:04, David Turner wrote: > I haven't set up the mgr service yet, but your daemon folder is missing > its keyring file (/var/lib/ceph/mgr/ceph-0/keyring). It's exactly what > the error message says. When you set it up, did you run a command like ceph > auth add? If you did,
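For anyone hitting the same error, a minimal sketch of creating the missing keyring by hand (using the mgr id 0 from the log above; roughly the manual mgr deployment procedure from the docs):
  mkdir -p /var/lib/ceph/mgr/ceph-0
  ceph auth get-or-create mgr.0 mon 'allow profile mgr' osd 'allow *' mds 'allow *' -o /var/lib/ceph/mgr/ceph-0/keyring
  chown -R ceph:ceph /var/lib/ceph/mgr/ceph-0
  systemctl restart ceph-mgr@0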

Re: [ceph-users] State of play for RDMA on Luminous

2017-08-28 Thread Nigel Williams
On 29 August 2017 at 00:21, Haomai Wang wrote: > On Wed, Aug 23, 2017 at 1:26 AM, Florian Haas wrote: >> - And more broadly, if a user wants to use the performance benefits of >> RDMA, but not all of their potential Ceph clients have InfiniBand HCAs, >> what are their options? RoCE? > > roce v2 i

Re: [ceph-users] v12.2.0 Luminous released

2017-08-29 Thread Nigel Williams
On 30 August 2017 at 16:05, Mark Kirkwood wrote: > Very nice! > > I tested an upgrade from Jewel, pretty painless. However we forgot to merge: > > http://tracker.ceph.com/issues/20950 > > So the mgr creation requires surgery still :-( > > regards > > Mark > > > > On 30/08/17 06:20, Abhishek Lekshm

Re: [ceph-users] v12.2.0 Luminous released

2017-08-29 Thread Nigel Williams
> On 30 August 2017 at 16:05, Mark Kirkwood wrote:
>> http://tracker.ceph.com/issues/20950
>> So the mgr creation requires surgery still :-(
Is there a way out of this error with ceph-mgr?
mgr init Authentication failed, did you specify a mgr ID with a valid keyring?
root@c0mds-100:~# sys

Re: [ceph-users] v12.2.0 Luminous released

2017-08-30 Thread Nigel Williams
On 30 August 2017 at 17:43, Mark Kirkwood wrote: > Yes - you just edit /var/lib/ceph/bootstrap-mgr/ceph.keyring so the key > matches what 'ceph auth list' shows and re-deploy the mgr (worked for me in > 12.1.3/4 and 12.2.0). Thanks for the tip; what I did to get it to work: - had already sync'd the

Re: [ceph-users] Centos7, luminous, cephfs, .snaps

2017-08-30 Thread Nigel Williams
On 30 August 2017 at 18:52, Marc Roos wrote: > I noticed it is .snap not .snaps Yes > mkdir: cannot create directory ‘.snap/snap1’: Operation not permitted > > Is this because my permissions are insufficient on the client id? Fairly sure you've forgotten this step: ceph mds set allow_new_snaps
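For completeness, the snapshot-enable step referred to above; the exact syntax depends on the release (filesystem name is illustrative), and some releases also require the confirmation flag on the fs variant:
  ceph mds set allow_new_snaps true --yes-i-really-mean-it   # older, cluster-wide syntax
  ceph fs set cephfs allow_new_snaps true                    # per-filesystem syntax on newer releases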

Re: [ceph-users] v12.2.0 Luminous released

2017-08-30 Thread Nigel Williams
On 30 August 2017 at 20:53, John Spray wrote:
> The mgr_initial_modules setting is only applied at the point of
> cluster creation,
OK.
> so I would guess that if it didn't seem to take
> effect then this was an upgrade from >=11.x
Not quite, it was a clean install of Luminous, and somewhere ar

Re: [ceph-users] Bluestore disk colocation using NVRAM, SSD and SATA

2017-09-20 Thread Nigel Williams
On 21 September 2017 at 04:53, Maximiliano Venesio wrote: > Hi guys, I'm reading different documents about bluestore, and it never > recommends using NVRAM to store the bluefs db; nevertheless the official > documentation says that it is better to use the faster device to put the > block.db on. >

Re: [ceph-users] Bluestore OSD_DATA, WAL & DB

2017-09-25 Thread Nigel Williams
On 26 September 2017 at 01:10, David Turner wrote: > If they are on separate > devices, then you need to make it as big as necessary to ensure that it > won't spill over (or, if it does, that you're OK with the degraded performance > while the db partition is full). I haven't come across an equat
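As background for the sizing question, the DB simply occupies whatever device or partition is handed to the OSD-creation tool, so its size is fixed at creation time; a minimal sketch with ceph-volume (device names are illustrative):
  ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1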

Re: [ceph-users] Bluestore OSD_DATA, WAL & DB

2017-09-25 Thread Nigel Williams
On 26 September 2017 at 08:11, Mark Nelson wrote: > The WAL should never grow larger than the size of the buffers you've > specified. It's the DB that can grow and is difficult to estimate both > because different workloads will cause different numbers of extents and > objects, but also because r

Re: [ceph-users] clients failing to advance oldest client/flush tid

2017-10-09 Thread Nigel Williams
On 9 October 2017 at 19:21, Jake Grimmett wrote: > HEALTH_WARN 9 clients failing to advance oldest client/flush tid; > 1 MDSs report slow requests; 1 MDSs behind on trimming On a proof-of-concept 12.2.1 cluster (few random files added, 30 OSDs, default Ceph settings) I can get the above error by

Re: [ceph-users] Bluestore OSD_DATA, WAL & DB

2017-11-02 Thread Nigel Williams
On 3 November 2017 at 07:45, Martin Overgaard Hansen wrote: > I want to bring this subject back in the light and hope someone can provide > insight regarding the issue, thanks. Thanks Martin, I was going to do the same. Is it possible to make the DB partition (on the fastest device) too big? in

Re: [ceph-users] how to improve performance

2017-11-20 Thread Nigel Williams
On 20 November 2017 at 23:36, Christian Balzer wrote: > On Mon, 20 Nov 2017 14:02:30 +0200 Rudi Ahlers wrote: >> The SATA drives are ST8000NM0055-1RM112 >> > Note that these (while fast) have an internal flash cache, limiting them to > something like 0.2 DWPD. > Probably not an issue with the WAL/

Re: [ceph-users] how to improve performance

2017-11-20 Thread Nigel Williams
On 21 November 2017 at 10:07, Christian Balzer wrote: > On Tue, 21 Nov 2017 10:00:28 +1100 Nigel Williams wrote: >> Is there something in the specifications that gives them away as SSHD? >> > The 550TB endurance per year for an 8TB drive and the claim of 30% faster > IOPS wou

[ceph-users] Transparent huge pages

2017-11-28 Thread Nigel Williams
Given that memory is a key resource for Ceph, this advice about switching the Transparent Huge Pages (THP) kernel setting to madvise would be worth testing, to see if THP is helping or hindering. Article: https://blog.nelhage.com/post/transparent-hugepages/ Discussion: https://news.ycombinator.com/item?id=1
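A minimal sketch of checking and flipping the setting for a test; the echo is not persistent across reboots, so use your distribution's boot configuration to make it stick:
  cat /sys/kernel/mm/transparent_hugepage/enabled    # shows e.g. [always] madvise never
  echo madvise > /sys/kernel/mm/transparent_hugepage/enabled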

Re: [ceph-users] CephFS - Mounting a second Ceph file system

2017-11-28 Thread Nigel Williams
On 29 November 2017 at 01:51, Daniel Baumann wrote: > On 11/28/17 15:09, Geoffrey Rhodes wrote: >> I'd like to run more than one Ceph file system in the same cluster. Are there opinions on how stable multiple filesystems per single Ceph cluster are in practice? Is anyone using it actively with a s

Re: [ceph-users] BlueStore upgrade steps broken

2018-08-19 Thread Nigel Williams
On 18 August 2018 at 03:06, David Turner wrote: > The WAL will choose the fastest device available. Any idea how it makes this determination automatically? Is it doing an hdparm -t or similar? Is fastest measured as bandwidth, IOPS, or latency?

Re: [ceph-users] ceph 12.2.4 - which OSD has slow requests ?

2018-04-17 Thread Nigel Williams
On 18 April 2018 at 05:52, Steven Vacaroaia wrote: > I can see many slow requests in the logs but no clue which OSD is the > culprit > How can I find the culprit?
ceph osd perf
or
ceph pg dump osds -f json-pretty | jq .[].fs_perf_stat
Searching the ML archives for threads about slow requ

[ceph-users] network connectivity test tool?

2016-02-04 Thread Nigel Williams
I thought I had book-marked a neat shell script that used the ceph.conf definitions to do an all-to-all, all-to-one check of network connectivity for a Ceph cluster (useful for discovering problems with jumbo frames), but I've lost the bookmark, and after trawling GitHub and trying various keywords
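In the absence of the original script, a rough sketch of the idea; it assumes a plain host list in ceph-hosts.txt, passwordless ssh between nodes, and a 9000-byte MTU (hence the 8972-byte payload) - none of which comes from the script being looked for:
  #!/bin/bash
  # All-to-all jumbo-frame reachability check: every host pings every other host
  # with a full-size, non-fragmentable packet.
  HOSTS=$(cat ceph-hosts.txt)
  for src in $HOSTS; do
    for dst in $HOSTS; do
      [ "$src" = "$dst" ] && continue
      if ssh "$src" ping -M do -s 8972 -c 1 -W 2 "$dst" > /dev/null 2>&1; then
        echo "OK   $src -> $dst"
      else
        echo "FAIL $src -> $dst"
      fi
    done
  done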

Re: [ceph-users] State of Ceph documention

2016-02-25 Thread Nigel Williams
On Fri, Feb 26, 2016 at 3:10 PM, Christian Balzer wrote: > Then we come to a typical problem for fast evolving SW like Ceph, things > that are not present in older versions. I was going to post on this too (I had similar frustrations), and would like to propose that a move to splitting the docu

Re: [ceph-users] State of Ceph documention

2016-02-25 Thread Nigel Williams
On Fri, Feb 26, 2016 at 4:09 PM, Adam Tygart wrote: > The docs are already split by version, although it doesn't help that > it isn't linked in an obvious manner. > > http://docs.ceph.com/docs/master/rados/operations/cache-tiering/ Is there any reason to keep this "master" (version-less variant)

Re: [ceph-users] State of Ceph documention

2016-02-26 Thread Nigel Williams
On Fri, Feb 26, 2016 at 11:28 PM, John Spray wrote: > Some projects have big angry warning banners at the top of their > master branch documentation, I think perhaps we should do that too, > and at the same time try to find a way to steer google hits to the > latest stable branch docs rather than

Re: [ceph-users] State of Ceph documention

2016-02-26 Thread Nigel Williams
On Sat, Feb 27, 2016 at 12:08 AM, Andy Allan wrote: > When I made a (trivial, to be fair) documentation PR it was dealt with > immediately, both when I opened it, and when I fixed up my commit > message. I'd recommend that if anyone sees anything wrong with the > docs, just submit a PR with the fi

Re: [ceph-users] BlueFS spillover detected - 14.2.1

2019-06-19 Thread Nigel Williams
On Thu, 20 Jun 2019 at 09:12, Vitaliy Filippov wrote: > All values except 4, 30 and 286 GB are currently useless in ceph with > default rocksdb settings :) > however, several commenters have said that during compaction rocksdb needs space during the process, and hence the DB partition needs to b
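To see where a given OSD sits relative to those size steps, the bluefs counters can be inspected on the OSD host (the osd id is illustrative); compare db_used_bytes with db_total_bytes, and a non-zero slow_used_bytes indicates spillover onto the main device:
  ceph daemon osd.12 perf dump bluefs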

[ceph-users] show-prediction-config - no valid command found?

2019-06-26 Thread Nigel Williams
Have I missed a step? Diskprediction module is not working for me.
root@cnx-11:/var/log/ceph# ceph device show-prediction-config
no valid command found; 10 closest matches:
root@cnx-11:/var/log/ceph# ceph mgr module ls
{
    "enabled_modules": [
        "dashboard",
        "diskprediction_cloud"

[ceph-users] Nautilus - cephfs auth caps problem?

2019-07-02 Thread Nigel Williams
I am getting "Operation not permitted" on a write when trying to set caps for a user. Admin user (allow * for everything) works ok. This does not work: caps: [mds] allow r,allow rw path=/home caps: [mon] allow r caps: [osd] allow rwx tag cephfs data=cephfs_data2 This does

Re: [ceph-users] Nautilus - cephfs auth caps problem?

2019-07-03 Thread Nigel Williams
Thanks for the tip. I did wonder about that, checked it at one point, and assumed it was OK.
root@cnx-11:~# ceph osd pool application get cephfs_data
{
    "cephfs": {
        "data": "cephfs"
    }
}
root@cnx-11:~# ceph osd pool application get cephfs_data2
{
    "cephfs": {
        "data
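For comparison, the tag-style OSD cap shown earlier is what the fs authorize helper generates, so one way to avoid hand-writing the caps is to let it build them (filesystem and client names are illustrative; this grants read on / and read-write on /home, similar to the caps quoted above):
  ceph fs authorize cephfs_fs2 client.homeuser / r /home rw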

Re: [ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-19 Thread Nigel Williams
On Sat, 20 Jul 2019 at 04:28, Nathan Fish wrote: > On further investigation, it seems to be this bug: > http://tracker.ceph.com/issues/38724 We just upgraded to 14.2.2, and had a dozen OSDs at 14.2.2 go down with this bug; recovered with:
systemctl reset-failed ceph-osd@160
systemctl start ceph-osd

[ceph-users] fixing a bad PG per OSD decision with pg-autoscaling?

2019-08-20 Thread Nigel Williams
Due to a gross miscalculation several years ago I set way too many PGs for our original Hammer cluster. We've lived with it ever since, but now that we are on Luminous, changes result in stuck requests and balancing problems. The cluster currently has 12% misplaced, and is grinding to re-balance but is
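For reference, the autoscaler being considered here arrives with Nautilus; a minimal sketch of pointing it at an oversized pool (pool name is illustrative), starting in warn mode so it only reports what it would change:
  ceph mgr module enable pg_autoscaler
  ceph osd pool set mypool pg_autoscale_mode warn   # report the recommended pg_num without changing anything
  ceph osd pool set mypool pg_autoscale_mode on     # let the autoscaler adjust pg_num itself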

[ceph-users] cephfs 1 large omap objects

2019-10-06 Thread Nigel Williams
Out of the blue this popped up (on an otherwise healthy cluster):
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
    1 large objects found in pool 'cephfs_metadata'
    Search the cluster log for 'Large omap object found' for more details.
"Search the cluster log" is som

Re: [ceph-users] cephfs 1 large omap objects

2019-10-06 Thread Nigel Williams
I followed some other suggested steps, and have this:
root@cnx-17:/var/log/ceph# zcat ceph-osd.178.log.?.gz | fgrep Large
2019-10-02 13:28:39.412 7f482ab1c700 0 log_channel(cluster) log [WRN] : Large omap object found. Object: 2:654134d2:::mds0_openfiles.0:head Key count: 306331 Size (bytes): 13993

Re: [ceph-users] cephfs 1 large omap objects

2019-10-06 Thread Nigel Williams
I've adjusted the threshold:
ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 35
A colleague suggested that this will take effect on the next deep-scrub. Is the default of 200,000 too small? Will this be adjusted in future releases, or is it meant to be adjusted in some use-ca

Re: [ceph-users] Issues with Nautilus 14.2.6 ceph-volume lvm batch --bluestore ?

2020-01-19 Thread Nigel Williams
On Mon, 20 Jan 2020 at 14:15, Dave Hall wrote: > BTW, I did try to search the list archives via > http://lists.ceph.com/pipermail/ceph-users-ceph.com/, but that didn't work > well for me. Is there another way to search? With your favorite search engine (say Google or DuckDuckGo), you can do this: ceph