Anyone know if this is safe in the short term? We're rebuilding our
nova-compute nodes and can make sure the Dumpling versions are pinned
as part of the process in the future.
ceph-deploy --release dumpling (or, previously, ceph-deploy --stable
dumpling) now results in Firefly (0.80.1) being installed; is this
intentional?
I'm adding another host with more OSDs and guessing it is preferable
to deploy the same version.
On Tue, Aug 26, 2014 at 5:10 PM, Konrad Gutkowski
wrote:
> Ceph-deploy should set the priority for the ceph repository, which it doesn't; this
> usually installs the best available version from any repository.
Thanks Konrad for the tip. It took several goes (notably ceph-deploy
purge did not, for me at l
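For anyone else hitting this, apt pinning is one way to force the ceph.com
packages to win (a sketch, assuming Debian/Ubuntu and the ceph.com repository;
the filename and origin below are placeholders to adjust for your setup):

# /etc/apt/preferences.d/ceph.pref
Package: *
Pin: origin "ceph.com"
Pin-Priority: 1001

Then check which candidate version wins with 'apt-cache policy ceph' before installing.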
On Fri, Sep 5, 2014 at 5:46 PM, Dan Van Der Ster
wrote:
>> On 05 Sep 2014, at 03:09, Christian Balzer wrote:
>> You might want to look into cache pools (and dedicated SSD servers with
>> fast controllers and CPUs) in your test cluster and for the future.
>> Right now my impression is that there i
On Wed, Jun 3, 2015 at 8:30 AM, wrote:
> We are running with Jumbo Frames turned on. Is that likely to be the issue?
I got caught by this previously:
http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-October/043955.html
The problem is Ceph "almost-but-not-quite" works, leading you
Has anyone done comparisons between CephFS and other
parallel filesystems like Lustre, typically used in HPC deployments
either for scratch storage or persistent storage to support HPC
workflows?
Thanks.
On 12/06/2015 3:41 PM, Gregory Farnum wrote:
... and the test evaluation was on repurposed Lustre
hardware so it was a bit odd, ...
Agreed, it was old (at least by now) DDN kit (SFA10K?) and not ideally suited for Ceph
(a really high OSD-per-host ratio).
Sage's thesis or some of the earlier p
I recall a post to the mailing list in the last week(s) where someone said that, for an EC
pool, the failure domain defaults to requiring k+m hosts in some versions of Ceph.
Can anyone recall the post? Have I got the requirement correct?
On Wed, Jun 24, 2015 at 4:29 PM, Yueliang wrote:
> When I use K+M hosts in the EC pool, if M hosts get down, still have K hosts
> active, Can I continue write data to the pool ?
If your CRUSH map specifies a failure-domain at the host level (so no
two chunks share the same host) then you will be
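For reference, a sketch of creating such a profile and pool (the names are
made up; recent releases spell the option crush-failure-domain, older ones
used ruleset-failure-domain):

ceph osd erasure-code-profile set ec-k4m2 k=4 m=2 crush-failure-domain=host
ceph osd pool create ecpool 128 128 erasure ec-k4m2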
> On 13 Jul 2015, at 4:58 pm, Abhishek Varshney
> wrote:
> I have a requirement wherein I wish to setup Ceph where hostname resolution
> is not supported and I just have IP addresses to work with. Is there a way
> through which I can achieve this in Ceph? If yes, what are the caveats
> associ
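Not an authoritative answer, but as a sketch, ceph.conf can be written purely
with IP addresses (the monitor IDs and addresses below are placeholders):

[global]
fsid = <your-fsid>
mon initial members = a, b, c
mon host = 192.0.2.11, 192.0.2.12, 192.0.2.13

[mon.a]
mon addr = 192.0.2.11:6789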
On 10/08/2015 12:02 AM, Robert LeBlanc wrote:
> I'm guessing this is on an OpenStack node? There is a fix for this and I
> think it will come out in the next release. For now we have had to disable
> the admin sockets.
Do you know what triggers the fault? We've not seen it on Firefly+RBD for
Ope
I notice under the HOSTNAME RESOLUTION section the use of 'host -4
{hostname}' as a required test; however, in all my trial deployments
so far, none would pass, as this command is a direct DNS query, and
I usually just add entries to the hosts file instead.
Two thoughts: is Ceph expecting to only do DNS
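For what it's worth, the difference shows up like this (ceph-node1 is a
placeholder): host(1) queries DNS directly and ignores /etc/hosts, while
getent goes through NSS and so honours the hosts file:

host -4 ceph-node1        # fails unless a real DNS record exists
getent hosts ceph-node1   # succeeds with only an /etc/hosts entry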
I appreciate CephFS is not a high priority, but this is a
user-experience test-case that can be a source of stability bugs for
Ceph developers to investigate (and hopefully resolve):
CephFS test-case
1. Create two clusters, each with 3 nodes and 4 OSDs per node
2. I used Ubuntu 13.04 followed by update/
On 06/09/2013, at 7:49 PM, "Bernhard Glomm" wrote:
> Can I introduce the cluster network later on, after the cluster is deployed
> and started working?
> (by editing ceph.conf, push it to the cluster members and restart the
> daemons?)
Thanks Bernhard for asking this question, I have the same q
On Wed, Aug 28, 2013 at 4:46 PM, Stroppa Daniele (strp) wrote:
> You might need the RHEL Scalable File System add-on.
Exactly.
I understand this needs to be purchased from Red Hat in order to get
access to it if you are using the Red Hat subscription management
system. I expect you could drag ov
On 15/11/2013 8:57 AM, Dane Elwell wrote:
[2] - I realise the dangers/stupidity of a replica size of 0, but some of the
data we wish
to store just isn’t /that/ important.
We've been thinking of this too. The application is storing boot-images, ISOs, local
repository mirrors etc where recovery
I spent a frustrating day trying to build a new test cluster; it turned out
I had jumbo frames set on the cluster network only, but having
re-wired the machines recently with a new switch, I forgot to check that it
could handle jumbo frames (it can't).
Symptoms were stuck/unclean PGs - a small subset of PGs
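In case it saves someone else a day, a quick check (assuming a 9000-byte MTU;
8972 is 9000 minus 28 bytes of IP+ICMP headers) is to ping across the cluster
network with don't-fragment set:

ping -c 3 -M do -s 8972 <peer-cluster-network-ip>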
On 30/10/2014 8:56 AM, Sage Weil wrote:
* *Degraded vs misplaced*: the Ceph health reports from 'ceph -s' and
related commands now make a distinction between data that is
degraded (there are fewer than the desired number of copies) and
data that is misplaced (stored in the wrong location
On 30/10/2014 11:51 AM, Christian Balzer wrote:
Thus objects are (temporarily) not where they're supposed to be, but still
present in sufficient replication.
Thanks for the reminder, I suppose that is obvious :-)
A much more benign scenario than degraded, and I hope that this doesn't
even gene
On Sat, Nov 29, 2014 at 5:19 AM, Julien Lutran wrote:
> Where can I find this kinetic devel package ?
I guess you want this (the C++ kinetic client)? It has kinetic.h at least.
https://github.com/Seagate/kinetic-cpp-client
On Sat, Dec 6, 2014 at 4:36 AM, Sage Weil wrote:
> - enumerate experimental options we want to enable
>...
> This has the property that no config change is necessary when the
> feature drops its experimental status.
It keeps the risky options in one place too, so they're easier to spot.
> In all of the
Could I have a critique of this approach, please: how I could have
done it better, or whether what I experienced simply reflects work still
to be done.
This is with Ceph 0.61.2 on a quite slow test cluster (logs shared with
OSDs, no separate journals, using CephFS).
I knocked the power co
On 4/06/2013 9:16 AM, Chen, Xiaoxi wrote:
> my 0.02: you really don't need to wait for health_ok between your
> recovery steps, just go ahead. Every time a new map is generated and
> broadcast, the old map and in-progress recovery will be cancelled
Thanks Xiaoxi, that is helpful to know.
It seems to
On Tue, Jun 4, 2013 at 1:59 PM, Sage Weil wrote:
> On Tue, 4 Jun 2013, Nigel Williams wrote:
>> Something else I noticed: ...
>
> Does the monitor data directory share a disk with an OSD? If so, that
> makes sense: compaction freed enough space to drop below the threshold...
On 25/06/2013 5:59 AM, Brian Candler wrote:
On 24/06/2013 20:27, Dave Spano wrote:
Here's my procedure for manually adding OSDs.
The other thing I discovered is not to wait between steps; some changes result in a new
crushmap, which then triggers replication. You want to speed through the step
Cluster is ok and mgr is active, but unable to get the dashboard to start.
I see the following errors in logs:
2017-08-12 15:40:07.805991 7f508effd500 0 pidfile_write: ignore empty
--pid-file
2017-08-12 15:40:07.810124 7f508effd500 -1 auth: unable to find a keyring
on /var/lib/ceph/mgr/ceph-0/key
On 12 August 2017 at 23:04, David Turner wrote:
> I haven't set up the mgr service yet, but your daemon folder is missing
> its keyring file (/var/lib/ceph/mgr/ceph-0/keyring). It's exactly what
> the error message says. When you set it up, did you run a command like ceph
> auth add? If you did,
On 29 August 2017 at 00:21, Haomai Wang wrote:
> On Wed, Aug 23, 2017 at 1:26 AM, Florian Haas wrote:
>> - And more broadly, if a user wants to use the performance benefits of
>> RDMA, but not all of their potential Ceph clients have InfiniBand HCAs,
>> what are their options? RoCE?
>
> roce v2 i
On 30 August 2017 at 16:05, Mark Kirkwood wrote:
> Very nice!
>
> I tested an upgrade from Jewel, pretty painless. However we forgot to merge:
>
> http://tracker.ceph.com/issues/20950
>
> So the mgr creation requires surgery still :-(
>
> regards
>
> Mark
>
> On 30/08/17 06:20, Abhishek Lekshm
> On 30 August 2017 at 16:05, Mark Kirkwood
> wrote:
>> http://tracker.ceph.com/issues/20950
>>
>> So the mgr creation requires surgery still :-(
Is there a way out of this error with ceph-mgr?
mgr init Authentication failed, did you specify a mgr ID with a valid keyring?
root@c0mds-100:~# sys
On 30 August 2017 at 17:43, Mark Kirkwood wrote:
> Yes - you just edit /var/lib/ceph/bootstrap-mgr/ceph.keyring so the key
> matches what 'ceph auth list' shows and re-deploy the mgr (worked for me in
> 12.1.3/4 and 12.2.0).
Thanks for the tip. What I did to get it to work:
- had already sync'd the
On 30 August 2017 at 18:52, Marc Roos wrote:
> I noticed it is .snap not .snaps
Yes
> mkdir: cannot create directory ‘.snap/snap1’: Operation not permitted
>
> Is this because my permissions are insufficient on the client id?
Fairly sure you've forgotten this step:
ceph mds set allow_new_snaps
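(The full command depends on the release in use - an assumption on my part,
but it is along the lines of 'ceph fs set <fs_name> allow_new_snaps true' on
Luminous and later, or 'ceph mds set allow_new_snaps true
--yes-i-really-mean-it' on older releases.)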
On 30 August 2017 at 20:53, John Spray wrote:
> The mgr_initial_modules setting is only applied at the point of
> cluster creation,
ok.
> so I would guess that if it didn't seem to take
> effect then this was an upgrade from >=11.x
Not quite; it was a clean install of Luminous, and somewhere ar
On 21 September 2017 at 04:53, Maximiliano Venesio
wrote:
> Hi guys, I'm reading different documents about bluestore, and it never
> recommends using NVRAM to store the bluefs db; nevertheless, the official
> documentation says that it is better to use the faster device to put the
> block.db in.
>
On 26 September 2017 at 01:10, David Turner wrote:
> If they are on separate
> devices, then you need to make it as big as you need to ensure that it
> won't spill over (or if it does, that you're OK with the degraded performance
> while the db partition is full). I haven't come across an equat
On 26 September 2017 at 08:11, Mark Nelson wrote:
> The WAL should never grow larger than the size of the buffers you've
> specified. It's the DB that can grow and is difficult to estimate both
> because different workloads will cause different numbers of extents and
> objects, but also because r
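As a concrete example of the syntax (not a sizing recommendation; the device
names are placeholders), the DB and WAL can be placed on faster partitions
with:

ceph-volume lvm create --bluestore --data /dev/sdb \
    --block.db /dev/nvme0n1p1 --block.wal /dev/nvme0n1p2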
On 9 October 2017 at 19:21, Jake Grimmett wrote:
> HEALTH_WARN 9 clients failing to advance oldest client/flush tid;
> 1 MDSs report slow requests; 1 MDSs behind on trimming
On a proof-of-concept 12.2.1 cluster (few random files added, 30 OSDs,
default Ceph settings) I can get the above error by
On 3 November 2017 at 07:45, Martin Overgaard Hansen wrote:
> I want to bring this subject back in the light and hope someone can provide
> insight regarding the issue, thanks.
Thanks Martin, I was going to do the same.
Is it possible to make the DB partition (on the fastest device) too
big? in
On 20 November 2017 at 23:36, Christian Balzer wrote:
> On Mon, 20 Nov 2017 14:02:30 +0200 Rudi Ahlers wrote:
>> The SATA drives are ST8000NM0055-1RM112
>>
> Note that these (while fast) have an internal flash cache, limiting them to
> something like 0.2 DWPD.
> Probably not an issue with the WAL/
On 21 November 2017 at 10:07, Christian Balzer wrote:
> On Tue, 21 Nov 2017 10:00:28 +1100 Nigel Williams wrote:
>> Is there something in the specifications that gives them away as SSHD?
>>
> The 550TB endurance per year for an 8TB drive and the claim of 30% faster
> IOPS wou
Given that memory is a key resource for Ceph, this advice about switching
the Transparent Huge Pages (THP) kernel setting to madvise would be worth
testing, to see whether THP is helping or hindering.
Article:
https://blog.nelhage.com/post/transparent-hugepages/
Discussion:
https://news.ycombinator.com/item?id=1
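For anyone wanting to try it, checking and switching the setting at runtime is
straightforward (the sysfs path below is the usual one; persist it via your
boot configuration if the results look good):

cat /sys/kernel/mm/transparent_hugepage/enabled
echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled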
On 29 November 2017 at 01:51, Daniel Baumann wrote:
> On 11/28/17 15:09, Geoffrey Rhodes wrote:
>> I'd like to run more than one Ceph file system in the same cluster.
Are there opinions on how stable multiple filesystems per single Ceph
cluster are in practice? Is anyone using it actively with a s
On 18 August 2018 at 03:06, David Turner wrote:
> The WAL will choose the fastest device available.
>
Any idea how it makes this determination automatically? Is it doing an
hdparm -t or similar? Is fastest = bandwidth, IOPS, or latency?
On 18 April 2018 at 05:52, Steven Vacaroaia wrote:
> I can see many slow requests in the logs but no clue which OSD is the
> culprit
> How can I find the culprit ?
>
ceph osd perf
or
ceph pg dump osds -f json-pretty | jq .[].fs_perf_stat
searching the ML archives for threads about slow requ
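Once a suspect OSD is identified, its in-flight and recent slow ops can also
be inspected via the admin socket on that host (osd.12 is a placeholder):

ceph daemon osd.12 dump_ops_in_flight
ceph daemon osd.12 dump_historic_ops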
I thought I had bookmarked a neat shell script that used the
ceph.conf definitions to do an all-to-all, all-to-one check of network
connectivity for a Ceph cluster (useful for discovering problems with
jumbo frames), but I've lost the bookmark and after trawling GitHub
and trying various keywords
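I never did find it, so here is a rough sketch of the kind of thing I mean
(not the original script; it naively greps addresses out of ceph.conf and
assumes a 9000-byte MTU, so adjust to taste):

#!/bin/bash
# Ping every address mentioned in ceph.conf with a jumbo-sized,
# don't-fragment packet to spot MTU mismatches.
CONF=/etc/ceph/ceph.conf
SIZE=8972   # 9000-byte MTU minus 28 bytes of IP+ICMP headers

HOSTS=$(awk -F'=' '/mon[ _]host|public[ _]addr|cluster[ _]addr/ {print $2}' "$CONF" \
        | grep -Eo '[0-9]+(\.[0-9]+){3}' | sort -u)

for h in $HOSTS; do
    if ping -c 2 -W 2 -M do -s "$SIZE" "$h" >/dev/null 2>&1; then
        echo "OK   $h  (jumbo frames pass)"
    else
        echo "FAIL $h  (jumbo frames blocked or host down)"
    fi
done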
On Fri, Feb 26, 2016 at 3:10 PM, Christian Balzer wrote:
> Then we come to a typical problem for fast evolving SW like Ceph, things
> that are not present in older versions.
I was going to post on this too (I had similar frustrations), and would
like to propose a move to splitting the docu
On Fri, Feb 26, 2016 at 4:09 PM, Adam Tygart wrote:
> The docs are already split by version, although it doesn't help that
> it isn't linked in an obvious manner.
>
> http://docs.ceph.com/docs/master/rados/operations/cache-tiering/
Is there any reason to keep this "master" (version-less variant)
On Fri, Feb 26, 2016 at 11:28 PM, John Spray wrote:
> Some projects have big angry warning banners at the top of their
> master branch documentation, I think perhaps we should do that too,
> and at the same time try to find a way to steer google hits to the
> latest stable branch docs rather than
On Sat, Feb 27, 2016 at 12:08 AM, Andy Allan wrote:
> When I made a (trivial, to be fair) documentation PR it was dealt with
> immediately, both when I opened it, and when I fixed up my commit
> message. I'd recommend that if anyone sees anything wrong with the
> docs, just submit a PR with the fi
On Thu, 20 Jun 2019 at 09:12, Vitaliy Filippov wrote:
> All values except 4, 30 and 286 GB are currently useless in ceph with
> default rocksdb settings :)
>
however, several commenters have said that rocksdb needs additional space
during compaction, and hence the DB partition needs to b
Have I missed a step? Diskprediction module is not working for me.
root@cnx-11:/var/log/ceph# ceph device show-prediction-config
no valid command found; 10 closest matches:
root@cnx-11:/var/log/ceph# ceph mgr module ls
{
"enabled_modules": [
"dashboard",
"diskprediction_cloud"
I am getting "Operation not permitted" on a write when trying to use the caps
set for a user. The admin user (allow * for everything) works OK.
This does not work:
caps: [mds] allow r,allow rw path=/home
caps: [mon] allow r
caps: [osd] allow rwx tag cephfs data=cephfs_data2
This does
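For reference, caps like the above would typically have been applied with
something along these lines (client.foo is a placeholder):

ceph auth caps client.foo \
    mds 'allow r, allow rw path=/home' \
    mon 'allow r' \
    osd 'allow rwx tag cephfs data=cephfs_data2'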
Thanks for the tip. I did wonder about that, checked it at one point,
and assumed it was OK.
root@cnx-11:~# ceph osd pool application get cephfs_data
{
"cephfs": {
"data": "cephfs"
}
}
root@cnx-11:~# ceph osd pool application get cephfs_data2
{
"cephfs": {
"data
On Sat, 20 Jul 2019 at 04:28, Nathan Fish wrote:
> On further investigation, it seems to be this bug:
> http://tracker.ceph.com/issues/38724
We just upgraded to 14.2.2, and had a dozen OSDs at 14.2.2 go down with this
bug; we recovered with:
systemctl reset-failed ceph-osd@160
systemctl start ceph-osd
Due to a gross miscalculation several years ago I set way too many PGs for
our original Hammer cluster. We've lived with it ever since, but now that we are
on Luminous, changes result in stuck requests and balancing problems.
The cluster currently has 12% misplaced, and is grinding to re-balance but
is
Out of the blue this popped up (on an otherwise healthy cluster):
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
1 large objects found in pool 'cephfs_metadata'
Search the cluster log for 'Large omap object found' for more details.
"Search the cluster log" is som
I followed some other suggested steps, and have this:
root@cnx-17:/var/log/ceph# zcat ceph-osd.178.log.?.gz|fgrep Large
2019-10-02 13:28:39.412 7f482ab1c700 0 log_channel(cluster) log [WRN] :
Large omap object found. Object: 2:654134d2:::mds0_openfiles.0:head Key
count: 306331 Size (bytes): 13993
I've adjusted the threshold:
ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 35
A colleague suggested that this will take effect on the next deep-scrub.
Is the default of 200,000 too small? Will this be adjusted in future
releases, or is it meant to be adjusted in some use-ca
On Mon, 20 Jan 2020 at 14:15, Dave Hall wrote:
> BTW, I did try to search the list archives via
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/, but that didn't work
> well for me. Is there another way to search?
With your favorite search engine (say Goog / ddg ), you can do this:
ceph