[ceph-users] running Firefly client (0.80.1) against older version (dumpling 0.67.10) cluster?

2014-08-13 Thread Nigel Williams
Does anyone know if this is safe in the short term? We're rebuilding our
nova-compute nodes and can make sure the Dumpling versions are pinned
as part of the process in the future.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-deploy with --release (--stable) for dumpling?

2014-08-25 Thread Nigel Williams
ceph-deploy --release dumpling (or previously ceph-deploy --stable
dumpling) now results in Firefly (0.80.1) being installed - is this
intentional?

I'm adding another host with more OSDs and guessing it is preferable
to deploy the same version.


Re: [ceph-users] ceph-deploy with --release (--stable) for dumpling?

2014-08-26 Thread Nigel Williams
On Tue, Aug 26, 2014 at 5:10 PM, Konrad Gutkowski
 wrote:
> Ceph-deploy should set priority for ceph repository, which it doesn't, this
> usually installs the best available version from any repository.

Thanks Konrad for the tip. It took several goes (notably, ceph-deploy
purge did not, for me at least, seem to remove librbd1 cleanly),
but I managed to get 0.67.10 preferred; basically I did this:

root@ceph12:~# ceph -v
ceph version 0.67.10
root@ceph12:~# cat /etc/apt/preferences
Package: *
Pin: origin ceph.com
Pin-Priority: 900

Package: *
Pin: origin ceph.newdream.net
Pin-Priority: 900
root@ceph12:~#
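For anyone reproducing this, the same pin can be written as a drop-in file rather than editing /etc/apt/preferences directly. A minimal sketch, using the origins and the 900 priority from the transcript above (the /tmp path is only so the fragment can be inspected harmlessly; the real location would be /etc/apt/preferences.d/ceph):

```shell
# Write the pin from the transcript above as an apt preferences file.
# In production this belongs in /etc/apt/preferences.d/ceph.
pin_file=/tmp/ceph-pin
cat > "$pin_file" <<'EOF'
Package: *
Pin: origin ceph.com
Pin-Priority: 900

Package: *
Pin: origin ceph.newdream.net
Pin-Priority: 900
EOF
# apt-cache policy librbd1   # on a real node, confirms which version now wins
grep -c 'Pin-Priority: 900' "$pin_file"   # prints: 2
```

After dropping the file in place, `apt-cache policy` on the affected packages should show the ceph.com candidate winning over the distribution repository.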


Re: [ceph-users] SSD journal deployment experiences

2014-09-05 Thread Nigel Williams
On Fri, Sep 5, 2014 at 5:46 PM, Dan Van Der Ster
 wrote:
>> On 05 Sep 2014, at 03:09, Christian Balzer  wrote:
>> You might want to look into cache pools (and dedicated SSD servers with
>> fast controllers and CPUs) in your test cluster and for the future.
>> Right now my impression is that there is quite a bit more polishing to be
>> done (retention of hot objects, etc) and there have been stability concerns
>> raised here.
>
> Right, Greg already said publicly not to use the cache tiers for RBD.

I lost the context for this statement you reference from Greg
(presumably Greg Farnum?) - was it a reference to bcache or Ceph cache
tiering? Could you point me to where it was stated please.


Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Nigel Williams
On Wed, Jun 3, 2015 at 8:30 AM,   wrote:
> We are running with Jumbo Frames turned on. Is that likely to be the issue?

I got caught by this previously:

http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-October/043955.html

The problem is Ceph "almost-but-not-quite" works, leading you down
lots of fruitless paths.
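For anyone else chasing this, an end-to-end probe usually settles it quickly. This is a generic sketch, not anything Ceph-specific: 9000 is the assumed jumbo MTU and the peer address is a placeholder.

```shell
# Largest ICMP payload that fits in a 9000-byte MTU: 9000 minus
# 20 bytes of IPv4 header and 8 bytes of ICMP header. With -M do
# (forbid fragmentation) the ping fails cleanly on any hop that
# silently drops jumbo frames -- switch ports included.
mtu=9000
payload=$((mtu - 20 - 8))
echo "probe payload: $payload bytes"        # prints: probe payload: 8972 bytes
# ping -M do -s "$payload" -c 3 <peer-ip>   # run against each cluster host
```

If the non-fragmenting ping fails while a normal ping succeeds, some device on the path is not passing jumbo frames.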



[ceph-users] anyone using CephFS for HPC?

2015-06-11 Thread Nigel Williams
Wondering if anyone has done comparisons between CephFS and other
parallel filesystems like Lustre typically used in HPC deployments
either for scratch storage or persistent storage to support HPC
workflows?

thanks.


Re: [ceph-users] anyone using CephFS for HPC?

2015-06-14 Thread Nigel Williams

On 12/06/2015 3:41 PM, Gregory Farnum wrote:

...  and the test evaluation was on repurposed Lustre
hardware so it was a bit odd, ...


Agree, it was old (at least by now) DDN kit (SFA10K?) and not ideally suited for Ceph 
(really high OSD per host ratio).



Sage's thesis or some of the earlier papers will be happy to tell you
all the ways in which Ceph > Lustre, of course, since creating a
successor is how the project started. ;)
-Greg


Thanks Greg, yes those original documents have been well-thumbed; but I was hoping someone 
had done a more recent comparison given the significant improvements over the last couple 
of Ceph releases.


My superficial poking about in Lustre doesn't reveal to me anything particularly 
compelling in the design or typical deployments that would magically yield 
higher-performance than an equally well tuned Ceph cluster. Blair Bethwaite commented that 
Lustre client-side write caching might be more effective than CephFS at the moment.






[ceph-users] EC pool needs hosts equal to k + m?

2015-06-21 Thread Nigel Williams
I recall a post to the mailing list in the last week or two where someone said that the 
default failure-domain means an EC pool needs k+m hosts in some versions of Ceph?


Can anyone recall the post? Have I got the requirement correct?




Re: [ceph-users] EC pool needs hosts equal to k + m?

2015-06-24 Thread Nigel Williams
On Wed, Jun 24, 2015 at 4:29 PM, Yueliang  wrote:
> When I use K+M hosts in the EC pool, if M hosts get down, still have K hosts
> active, Can I continue write data to the pool ?

If your CRUSH map specifies a failure-domain at the host level (so no
two chunks share the same host) then you will be unable to write to
the pool. If instead the failure-domain is OSD then with enough OSDs
pool writes would still be accepted.

> Since there only have K
> hosts, not K+M hosts, When client write a data to EC pool , Primary OSD will
> split the data to K data pieces,but how about the M coding pieces? is it
> still be calculated  and where it should be hold ?

Same as above, with failure-domain = host then there would be nowhere
to put the M coding pieces.
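The arithmetic above is easy to check for yourself. This toy sketch is not CRUSH itself, just the counting argument, and the numbers (k=4, m=2, five hosts, six OSDs per host) are illustrative:

```shell
# Toy feasibility check, not CRUSH: with failure-domain=host each of
# the k+m chunks needs its own host; with failure-domain=osd each
# chunk only needs its own OSD.
k=4; m=2; hosts=5; osds_per_host=6
if [ "$hosts" -ge $((k + m)) ]; then
  echo "host failure-domain: placement possible"
else
  echo "host failure-domain: $((k + m)) chunks but only $hosts hosts"
fi
if [ $((hosts * osds_per_host)) -ge $((k + m)) ]; then
  echo "osd failure-domain: placement possible ($((hosts * osds_per_host)) OSDs)"
fi
# Real profile syntax, for reference (check the docs for your release):
# ceph osd erasure-code-profile set myprofile k=4 m=2 crush-failure-domain=osd
```

With these numbers the host-level rule cannot place all six chunks, while the OSD-level rule has thirty candidates; that is exactly the situation described in the reply above.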


Re: [ceph-users] Configuring Ceph without DNS

2015-07-13 Thread Nigel Williams

> On 13 Jul 2015, at 4:58 pm, Abhishek Varshney  
> wrote:
> I have a requirement wherein I wish to setup Ceph where hostname resolution 
> is not supported and I just have IP addresses to work with. Is there a way 
> through which I can achieve this in Ceph? If yes, what are the caveats 
> associated with that approach?

We’ve been operating our Dumpling (now Firefly) cluster this way since it was 
put into production over 18 months ago, using hosts files to define all the Ceph 
hosts; it works perfectly well.
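For concreteness, the setup amounts to two small files on every node and client. All addresses and names below are invented for illustration; the `mon host` option taking plain IPs is what removes the dependency on name resolution:

```ini
# /etc/hosts on every node and client (addresses invented)
192.168.10.11  mon-a
192.168.10.12  mon-b
192.168.10.13  mon-c

# ceph.conf: listing the monitors by IP means nothing depends on DNS
[global]
mon host = 192.168.10.11,192.168.10.12,192.168.10.13
mon initial members = mon-a,mon-b,mon-c
```

The main caveat is operational: every hosts file must be kept in sync by hand (or by configuration management) whenever a node is added or renumbered.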




Re: [ceph-users] 160 Thousand ceph-client.admin.*.asok files : Wired problem , never seen before

2015-08-09 Thread Nigel Williams

On 10/08/2015 12:02 AM, Robert LeBlanc wrote:
> I'm guessing this is on an OpenStack node? There is a fix for this and I
> think it will come out in the next release. For now we have had to disable
> the admin sockets.

Do you know what triggers the fault? We've not seen it on Firefly+RBD for 
OpenStack.



[ceph-users] ceph-deply preflight hostname check?

2013-09-04 Thread Nigel Williams
I notice under the HOSTNAME RESOLUTION section the use of 'host -4
{hostname}' as a required test; however, in all my trial deployments
so far, none would pass, as this command is a direct DNS query - I
usually just add entries to the hosts file instead.

Two thoughts: is Ceph expecting to do only DNS queries? Or would it
be better for the pre-flight check to use getent hosts {hostname} as
the test?
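The difference between the two lookups is easy to demonstrate. A minimal sketch, using localhost as a name that typically exists only in /etc/hosts:

```shell
# 'host' talks straight to DNS and never consults /etc/hosts, whereas
# getent resolves through the NSS stack (whatever /etc/nsswitch.conf
# lists), which is what ordinary libc name lookups use.
getent hosts localhost && echo "getent: resolved via NSS"
# host -4 localhost   # succeeds only if DNS itself knows the name
```

So a pre-flight check based on getent would pass on hosts-file-only deployments, where a bare `host -4` query fails.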


[ceph-users] CephFS test-case

2013-09-06 Thread Nigel Williams
I appreciate CephFS is not a high priority, but this is a
user-experience test-case that can be a source of stability bugs for
Ceph developers to investigate (and hopefully resolve):

CephFS test-case

1. Create two clusters, each 3 nodes with 4 OSDs each

2. I used Ubuntu 13.04 followed by update/upgrade

3. Install Ceph version 0.61 on Cluster A

4. Install release on Cluster B with ceph-deploy

5. Fill Cluster A (version 0.61) with about one million files (all sizes)

6. rsync ClusterA ClusterB

7. In about 12 hours one or two OSDs on ClusterB will crash; restart the
OSDs and restart the rsync

8. At around 75% full, OSDs on ClusterB will become unbalanced
(some more full than others), and one or more OSDs will then crash.

For (4) it is possible to use freely available .ISOs of old user-group
CDROMs that are floating around the web; they are a good source of
varied content sizes, directory sizes and filename lengths.

My impression is that 0.61 was relatively stable, but subsequent
versions such as 0.67.2 are less stable in this particular scenario
with CephFS.


Re: [ceph-users] newbie question: rebooting the whole cluster, powerfailure

2013-09-06 Thread Nigel Williams
On 06/09/2013, at 7:49 PM, "Bernhard Glomm"  wrote:
> Can I introduce the cluster network later on, after the cluster is deployed 
> and started working?
> (by editing ceph.conf, push it to the cluster members and restart the 
> daemons?)

Thanks Bernhard for asking this question; I have the same question. To rephrase: 
if we use ceph-deploy to set up a cluster, what is the recommended way 
to add the cluster/client networks later on?

It seems that ceph-deploy provides a minimal ceph.conf, not explicitly defining 
OSDs; how is this file later re-populated with the missing detail?
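For what it's worth, the change itself is only a couple of lines in ceph.conf; the subnets below are invented for illustration. The usual sequence is to add the options, push the file to every node, and restart the daemons, as in the original question:

```ini
[global]
# client-facing traffic
public network  = 192.168.1.0/24
# OSD replication and heartbeat traffic
cluster network = 10.10.0.0/24
```

One assumption to verify for your release: the OSDs pick up their cluster-network addresses on restart, so a rolling restart is needed before the separation actually takes effect.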




Re: [ceph-users] xfsprogs not found in RHEL

2013-09-11 Thread Nigel Williams
On Wed, Aug 28, 2013 at 4:46 PM, Stroppa Daniele (strp)  wrote:
> You might need the RHEL Scalable File System add-on.

Exactly.

I understand this needs to be purchased from Red Hat in order to get
access to it if you are using the Red Hat subscription management
system. I expect you could drag over the CentOS RPM but you would then
need to track updates/patches yourself (or minimally reconcile
differences between Red Hat and CentOS).

In summary: XFS on Red Hat is a paid-for-option.


Re: [ceph-users] Placement groups on a 216 OSD cluster with multiple pools

2013-11-14 Thread Nigel Williams

On 15/11/2013 8:57 AM, Dane Elwell wrote:

[2] - I realise the dangers/stupidity of a replica size of 0, but some of the 
data we wish
to store just isn’t /that/ important.


We've been thinking of this too. The application is storing boot-images, ISOs, local 
repository mirrors etc where recovery is easy with a slight inconvenience if the data has 
to be re-fetched. This suggests a neat additional feature for Ceph would be the ability to 
have metadata attached to zero-replica objects that includes a URL where a copy could be 
recovered/re-fetched. Then it could all happen auto-magically.


We also have users trampolining data between systems in order to buffer fast-data streams 
or handle data-surges. This can be zero-replica too.






[ceph-users] beware of jumbo frames

2014-10-23 Thread Nigel Williams
Spent a frustrating day trying to build a new test cluster; it turned out
I had jumbo frames enabled on the cluster network only, and having
re-wired the machines recently with a new switch, I forgot to check that
it could handle jumbo frames (it can't).

Symptoms were stuck/unclean PGs - a small subset of PGs would go
active but a proportion always would not; I got side-tracked by using a
ruleset set to OSD (it worked once) but it would not work with host - all
red herrings, I think.

Anyhow, somewhere deep in Ceph a check might be useful at the network
layer for fragmentation (or just remember this message).

Thanks to Jean-Charles Lopez (JCL) on IRC for walking me through
diagnosis (and sticking with me) while I circled around and around...
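Until such a check exists inside Ceph, a per-host sanity check is cheap. This sketch assumes a Linux host with iproute2 installed:

```shell
# List the MTU of every interface on this host. Each NIC on the
# cluster network -- and every switch port between them -- must agree
# before jumbo frames are safe to enable.
ip -o link show | awk '{for (i = 1; i <= NF; i++) if ($i == "mtu") print $2, $(i+1)}'
```

A mismatch here (or on the switch) produces exactly the stuck/unclean-PG symptoms described above, because large messages are dropped while small ones get through.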


Re: [ceph-users] v0.87 Giant released

2014-10-29 Thread Nigel Williams

On 30/10/2014 8:56 AM, Sage Weil wrote:

* *Degraded vs misplaced*: the Ceph health reports from 'ceph -s' and
   related commands now make a distinction between data that is
   degraded (there are fewer than the desired number of copies) and
   data that is misplaced (stored in the wrong location in the
   cluster).


Is someone able to briefly describe how/why misplaced happens, please? Is it 
repaired eventually? I've not seen misplaced (yet).




 leveldb_write_buffer_size = 32*1024*1024  = 33554432  // 32MB
 leveldb_cache_size= 512*1024*1204 = 536870912 // 512MB


I noticed the typo and wondered about the code, but I'm not seeing the same 
values anyway?

https://github.com/ceph/ceph/blob/giant/src/common/config_opts.h

OPTION(leveldb_write_buffer_size, OPT_U64, 8 *1024*1024) // leveldb write 
buffer size
OPTION(leveldb_cache_size, OPT_U64, 128 *1024*1024) // leveldb cache size







Re: [ceph-users] v0.87 Giant released

2014-10-29 Thread Nigel Williams

On 30/10/2014 11:51 AM, Christian Balzer wrote:

Thus objects are (temporarily) not where they're supposed to be, but still
present in sufficient replication.


thanks for the reminder, I suppose that is obvious :-)


A much more benign scenario than degraded and I hope that this doesn't
even generate a WARN in the "ceph -s" report.


Better described as a transitory "hazardous" state, given that the PG distribution might 
not be optimal for a period of time and (inopportune) failures may tip the health into 
degraded.





Re: [ceph-users] Compile from source with Kinetic support

2014-11-28 Thread Nigel Williams
On Sat, Nov 29, 2014 at 5:19 AM, Julien Lutran  wrote:
> Where can I find this kinetic devel package ?

I guess you want this (the C++ kinetic client)? It has kinetic.h at least.

https://github.com/Seagate/kinetic-cpp-client


Re: [ceph-users] experimental features

2014-12-05 Thread Nigel Williams
On Sat, Dec 6, 2014 at 4:36 AM, Sage Weil  wrote:
> - enumerate experiemntal options we want to enable
>...
>   This has the property that no config change is necessary when the
> feature drops its experimental status.

It keeps the risky options in one place too so easier to spot.

> In all of these cases, we can also make a point of sending something to
> the log on daemon startup.  I don't think too many people will notice
> this, but it is better than nothing.

Perhaps change the cluster health status to FRAGILE? or AT_RISK?


[ceph-users] replacing an OSD or crush map sensitivity

2013-06-01 Thread Nigel Williams
Could I have a critique of this approach please, as to how I could have 
done it better, or whether what I experienced simply reflects work still 
to be done.


This is with Ceph 0.61.2 on a quite slow test cluster (logs shared with 
OSDs, no separate journals, using CephFS).


I knocked the power cord out from a storage node, taking down 4 of the 
hosted OSDs; all but one came back ok. This is one OSD out of a total of 
12, so 1/12 of the storage.


Losing an OSD put the cluster into recovery, so all good. The next action 
was how to get the missing (downed) OSD back online.


The OSD was xfs-based, and so I had to throw away the xfs log to get it 
to mount. Having done this and got it re-mounted, Ceph then started 
throwing issue #4855 (I added dmesg and logs to that issue if it helps - 
I wonder if throwing away the xfs log caused an internal OSD 
inconsistency, and this causes issue #4855?). Given that I could not 
"recover" this OSD as far as Ceph is concerned, I decided to delete and 
rebuild it.


Several hours later, the cluster was back to HEALTH_OK. I proceeded to 
remove and re-add the bad OSD, following the doc suggestions to do this.


The problem is that each change caused a slight change in the crush 
map, resulting in the cluster going back into recovery and adding several 
hours' wait per change. I chose to wait until the cluster was back 
to HEALTH_OK before doing the next step. Overall it has taken a few days 
to finally get a single OSD back into the cluster.


At one point during recovery the full threshold was triggered on a 
single OSD, causing the recovery to stop; doing "ceph pg set_full_ratio 
0.98" did not help. I was not planning to add data to the cluster while 
doing recovery operations, and did not understand the suggestion that PGs 
could be deleted to make space on a "full" OSD, so I expect raising the 
threshold was the best option, but it had no (immediate) effect.


I am now back to having all 12 OSDs in, with the hopefully final recovery 
under way while it re-balances the OSDs. Although I note I am still 
getting the full-OSD warning, I expect this to disappear soon now 
that the 12th OSD is back online.


During this recovery the degraded percentage has been a little 
confusing. While the 12th OSD was offline the percentages were around 
15-20% IIRC, but now I see the percentage is 35% and slowly dropping; 
I am not sure I understand the ratios, or why they are so high with a 
single missing OSD.


A few documentation errors caused confusion too.

This page still contains errors in the steps to create a new OSD (manually):

http://eu.ceph.com/docs/wip-3060/cluster-ops/add-or-rm-osds/#adding-an-osd-manual

"ceph osd create {osd-num}" should be "ceph osd create"


and this:

http://eu.ceph.com/docs/wip-3060/cluster-ops/crush-map/#addosd

I had to put host= to get the command accepted.

Suggestions and questions:

1. Is there a way to get documentation pages fixed? or at least 
health-warnings on them: "This page badly needs updating since it is 
wrong/misleading"


2. We need a small set of definitive succinct recipes that provide steps 
to recover from common failures with a narrative around what to expect 
at each step (your cluster will be in recovery here...).


3. Some commands throw erroneous errors that are actually benign: 
"ceph-osd -i 10 --mkfs --mkkey" complains about failures that are 
expected as the OSD is initially empty.


4. An easier way to capture the state of the cluster for analysis. I 
don't feel confident that, when asked for "logs", I am giving the 
most useful snippets or the complete story. It seems we need a tool that 
can gather all this in a neat bundle for later dissection or forensics.


5. Is there a more straightforward (faster) way of getting an OSD back 
online? It almost seems worth having a standby OSD ready to 
step in and assume duties (a hot spare?).


6. Is there a way to make the crush map less sensitive to changes during 
recovery operations? I would have liked to stall/slow recovery while I 
replaced the OSD and then let it run at full speed.


Excuses:

I'd be happy to action suggestions, but my current level of Ceph 
understanding is still too limited for effort on my part to be 
productive; I am prodding the community to see if there is consensus 
on the need.







Re: [ceph-users] replacing an OSD or crush map sensitivity

2013-06-03 Thread Nigel Williams
On 4/06/2013 9:16 AM, Chen, Xiaoxi wrote:
> my 0.02, you really dont need to wait for health_ok between your
> recovery steps,just go ahead. Everytime a new map be generated and
> broadcasted,the old map and in-progress recovery will be canceled

thanks Xiaoxi, that is helpful to know.

It seems to me that there might be a failure mode (or race condition?)
here though, as the cluster is now struggling to recover: the
replacement OSD caused the cluster to go into backfill_toofull.
The failure sequence might be:

1. From HEALTH_OK crash an OSD
2. Wait for recovery
3. Remove OSD using usual procedures
4. Wait for recovery
5. Add back OSD using usual procedures
6. Wait for recovery
7. Cluster is unable to recover due to toofull conditions

Perhaps this is a needed test case to round-trip a cluster through a
known failure/recovery scenario.

Note this is using a simplistically configured test-cluster with CephFS
in the mix and about 2.5 million files.

Something else I noticed: I restarted the cluster (and set the leveldb
compact option, since I'd run out of space on the roots) and now I see it
is again making progress on the backfill. It seems odd that the cluster
pauses but a restart clears the pause - is that by design?



Re: [ceph-users] replacing an OSD or crush map sensitivity

2013-06-03 Thread Nigel Williams
On Tue, Jun 4, 2013 at 1:59 PM, Sage Weil  wrote:
> On Tue, 4 Jun 2013, Nigel Williams wrote:
>> Something else I noticed: ...
>
> Does the monitor data directory share a disk with an OSD?  If so, that
> makes sense: compaction freed enough space to drop below the threshold...

 Of course! that is exactly it, thanks - scratch that last
observation, red herring.


Re: [ceph-users] Drive replacement procedure

2013-06-24 Thread Nigel Williams

On 25/06/2013 5:59 AM, Brian Candler wrote:

On 24/06/2013 20:27, Dave Spano wrote:

Here's my procedure for manually adding OSDs.


The other thing I discovered is not to wait between steps; some changes result in a new 
crushmap, that then triggers replication. You want to speed through the steps so the 
cluster does not waste time moving objects around to meet the replica requirements until 
you have finished crushmap changes.






[ceph-users] Luminous 12.1.3: mgr errors

2017-08-11 Thread Nigel Williams
Cluster is ok and mgr is active, but unable to get the dashboard to start.

I see the following errors in logs:
2017-08-12 15:40:07.805991 7f508effd500  0 pidfile_write: ignore empty
--pid-file
2017-08-12 15:40:07.810124 7f508effd500 -1 auth: unable to find a keyring
on /var/lib/ceph/mgr/ceph-0/keyring: (2) No such file or directory
2017-08-12 15:40:07.810145 7f508effd500 -1 monclient: ERROR: missing
keyring, cannot use cephx for authentication


and an unrelated error I think:

  File "/usr/lib/ceph/mgr/restful/module.py", line 248, in serve
self._serve()
  File "/usr/lib/ceph/mgr/restful/module.py", line 299, in _serve
raise RuntimeError('no certificate configured')
RuntimeError: no certificate configured

Setup was done by:
added [mgr] section to ceph.conf, then:
ceph config-key put mgr/dashboard/server_addr ::
systemctl restart ceph-mgr@0


Re: [ceph-users] Luminous 12.1.3: mgr errors

2017-08-12 Thread Nigel Williams
On 12 August 2017 at 23:04, David Turner  wrote:

> I haven't set up the mgr service yet, but your daemon folder is missing
> its keyring file (/var/lib/ceph/mgr/ceph-0/keyring). It's exactly what
> the error message says. When you set it up, did you run a command like ceph
> auth add? If you did, then you just need to ask the cluster what the auth
> key is and put it into that keyring file. You can look at the keyring for a
> mon to see what format it's expecting.
>
ceph auth list shows:
mgr.c0mds-100
key: AQDVXI1ZV1U0KRAAFDY6/ZCVzTjxhy0d5/ReSA==
caps: [mds] allow *
caps: [mon] allow profile mgr
caps: [osd] allow *

and there is this:

root@c0mds-100:/var/lib/ceph/mgr/ceph-c0mds-100# ls -l
total 4
-rw-r--r-- 1 root root  0 Aug 11 17:29 done
-rw-r--r-- 1 root root 64 Aug 11 17:29 keyring
-rw-r--r-- 1 root root  0 Aug 11 17:29 systemd
root@c0mds-100:/var/lib/ceph/mgr/ceph-c0mds-100# cat keyring
[mgr.c0mds-100]
key = AQDVXI1ZV1U0KRAAFDY6/ZCVzTjxhy0d5/ReSA==
root@c0mds-100:/var/lib/ceph/mgr/ceph-c0mds-100#

NOTE: this mgr is running on the MDS host. Everything was done with
ceph-deploy.

but mgr is looking for the keyring under /var/lib/ceph/mgr/ceph-0?
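For a later reader hitting the same thing: the mismatch is between the id the unit was started with (`ceph-mgr@0` reads /var/lib/ceph/mgr/ceph-0) and the directory the keyring actually lives in. A sketch of one way out; the live-cluster commands are shown as comments, and the file written below only demonstrates the expected keyring format, reusing the key already quoted above:

```shell
# Live-cluster fix (sketch, run on the mgr host):
#   ceph auth get mgr.c0mds-100 -o /var/lib/ceph/mgr/ceph-c0mds-100/keyring
#   systemctl restart ceph-mgr@c0mds-100
# The daemon id after the '@' must match the directory suffix; a unit
# started as ceph-mgr@0 looks under /var/lib/ceph/mgr/ceph-0 instead.
# Expected keyring format (the key is the one from 'ceph auth list'):
mgr_dir=/tmp/mgr-ceph-c0mds-100    # stand-in for /var/lib/ceph/mgr/ceph-c0mds-100
mkdir -p "$mgr_dir"
cat > "$mgr_dir/keyring" <<'EOF'
[mgr.c0mds-100]
    key = AQDVXI1ZV1U0KRAAFDY6/ZCVzTjxhy0d5/ReSA==
EOF
grep -q '^\[mgr.c0mds-100\]' "$mgr_dir/keyring" && echo "keyring format ok"
```

Whichever unit name systemd actually has enabled is the one whose directory needs the keyring.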


Re: [ceph-users] State of play for RDMA on Luminous

2017-08-28 Thread Nigel Williams
On 29 August 2017 at 00:21, Haomai Wang  wrote:
> On Wed, Aug 23, 2017 at 1:26 AM, Florian Haas  wrote:
>> - And more broadly, if a user wants to use the performance benefits of
>> RDMA, but not all of their potential Ceph clients have InfiniBand HCAs,
>> what are their options? RoCE?
>
> roce v2 is supported

I've no experience with RoCE, but given Florian's question, is the
implication that InfiniBand RDMA and RoCE can be bridged somehow?
Otherwise, how do clients with different transports access the same
Ceph cluster?

I'm guessing IPoIB clients could work with a RoCE Ceph cluster via an
Ethernet/InfiniBand gateway (like the Mellanox product), but the IPoIB
clients could not do RDMA, as this won't cross the gateway (at least I
understand this is the case with the Mellanox product).


Re: [ceph-users] v12.2.0 Luminous released

2017-08-29 Thread Nigel Williams
On 30 August 2017 at 16:05, Mark Kirkwood  wrote:
> Very nice!
>
> I tested an upgrade from Jewel, pretty painless. However we forgot to merge:
>
> http://tracker.ceph.com/issues/20950
>
> So the mgr creation requires surgery still :-(
>
> regards
>
> Mark
>
>
>
> On 30/08/17 06:20, Abhishek Lekshmanan wrote:
>>
>> We're glad to announce the first release of Luminous v12.2.x long term
>> stable release series. There have been major changes since Kraken
>> (v11.2.z) and Jewel (v10.2.z), and the upgrade process is non-trivial.
>> Please read the release notes carefully.
>>
>> For more details, links & changelog please refer to the
>> complete release notes entry at the Ceph blog:
>> http://ceph.com/releases/v12-2-0-luminous-released/
>>
>>
>> Major Changes from Kraken
>> -
>>
>> - *General*:
>>* Ceph now has a simple, built-in web-based dashboard for monitoring
>> cluster
>>  status.
>>
>> - *RADOS*:
>>* *BlueStore*:
>>  - The new *BlueStore* backend for *ceph-osd* is now stable and the
>>new default for newly created OSDs.  BlueStore manages data
>>stored by each OSD by directly managing the physical HDDs or
>>SSDs without the use of an intervening file system like XFS.
>>This provides greater performance and features.
>>  - BlueStore supports full data and metadata checksums
>>of all data stored by Ceph.
>>  - BlueStore supports inline compression using zlib, snappy, or LZ4.
>> (Ceph
>>also supports zstd for RGW compression but zstd is not recommended
>> for
>>BlueStore for performance reasons.)
>>
>>* *Erasure coded* pools now have full support for overwrites
>>  allowing them to be used with RBD and CephFS.
>>
>>* *ceph-mgr*:
>>  - There is a new daemon, *ceph-mgr*, which is a required part of
>>any Ceph deployment.  Although IO can continue when *ceph-mgr*
>>is down, metrics will not refresh and some metrics-related calls
>>(e.g., `ceph df`) may block.  We recommend deploying several
>>instances of *ceph-mgr* for reliability.  See the notes on
>>Upgrading below.
>>  - The *ceph-mgr* daemon includes a REST-based management API.
>>The API is still experimental and somewhat limited but
>>will form the basis for API-based management of Ceph going forward.
>>  - ceph-mgr also includes a Prometheus exporter plugin, which can
>> provide Ceph
>>perfcounters to Prometheus.
>>  - ceph-mgr now has a Zabbix plugin. Using zabbix_sender it sends
>> trapper
>>events to a Zabbix server containing high-level information of the
>> Ceph
>>cluster. This makes it easy to monitor a Ceph cluster's status and
>> send
>>out notifications in case of a malfunction.
>>
>>* The overall *scalability* of the cluster has improved. We have
>>  successfully tested clusters with up to 10,000 OSDs.
>>* Each OSD can now have a device class associated with
>>  it (e.g., `hdd` or `ssd`), allowing CRUSH rules to trivially map
>>  data to a subset of devices in the system.  Manually writing CRUSH
>>  rules or manual editing of the CRUSH is normally not required.
>>* There is a new upmap exception mechanism that allows individual PGs
>> to be moved around to achieve
>>  a *perfect distribution* (this requires luminous clients).
>>* Each OSD now adjusts its default configuration based on whether the
>>  backing device is an HDD or SSD. Manual tuning generally not
>> required.
>>* The prototype mClock QoS queueing algorithm is now available.
>>* There is now a *backoff* mechanism that prevents OSDs from being
>>  overloaded by requests to objects or PGs that are not currently able
>> to
>>  process IO.
>>* There is a simplified OSD replacement process that is more robust.
>>* You can query the supported features and (apparent) releases of
>>  all connected daemons and clients with `ceph features`
>>* You can configure the oldest Ceph client version you wish to allow to
>>  connect to the cluster via `ceph osd set-require-min-compat-client`
>> and
>>  Ceph will prevent you from enabling features that will break
>> compatibility
>>  with those clients.
>>* Several `sleep` settings, include `osd_recovery_sleep`,
>>  `osd_snap_trim_sleep`, and `osd_scrub_sleep` have been
>>  reimplemented to work efficiently.  (These are used in some cases
>>  to work around issues throttling background work.)
>>* Pools are now expected to be associated with the application using
>> them.
>>  Upon completing the upgrade to Luminous, the cluster will attempt to
>> associate
>>  existing pools to known applications (i.e. CephFS, RBD, and RGW).
>> In-use pools
>>  that are not associated to an application will generate a health
>> warning. Any
>>  unassociated pools can be manually associated using the new
>>  `ceph osd pool applica

Re: [ceph-users] v12.2.0 Luminous released

2017-08-29 Thread Nigel Williams
> On 30 August 2017 at 16:05, Mark Kirkwood  
> wrote:
>> http://tracker.ceph.com/issues/20950
>>
>> So the mgr creation requires surgery still :-(

is there a way out of this error with ceph-mgr?

mgr init Authentication failed, did you specify a mgr ID with a valid keyring?

root@c0mds-100:~# systemctl status ceph-mgr@c0mds-100
● ceph-mgr@c0mds-100.service - Ceph cluster manager daemon
   Loaded: loaded (/lib/systemd/system/ceph-mgr@.service; enabled;
vendor preset: enabled)
   Active: inactive (dead) (Result: exit-code) since Wed 2017-08-30
16:40:43 AEST; 3min 36s ago
  Process: 1821 ExecStart=/usr/bin/ceph-mgr -f --cluster ${CLUSTER}
--id %i --setuser ceph --setgroup ceph (code=exited, status=255)
 Main PID: 1821 (code=exited, status=255)


as reported previously, pre 12.2.0 versions seemed to create erroneous
ceph-mgr with the wrong host identifier (in my case
/var/lib/ceph/mgr/ceph-0 and ceph-c0mds/)


sorry for the previous empty email...keyboard stuck...


Re: [ceph-users] v12.2.0 Luminous released

2017-08-30 Thread Nigel Williams
On 30 August 2017 at 17:43, Mark Kirkwood  wrote:
> Yes - you just edit /var/lib/ceph/bootstrap-mgr/ceph.keyring so the key
> matches what 'ceph auth list' shows and re-deploy the mgr (worked for me in
> 12.1.3/4 and 12.2.0).

thanks for the tip, what I did to get it to work:

- had already sync'd the keyrings
- redid ceph-deploy --overwrite-conf mgr create c0mds-100
- ceph mgr module enable dashboard

I wasn't expecting the last item since I had added the [mgr] section
to /etc/ceph/ceph.conf...

anyhow, working now


Re: [ceph-users] Centos7, luminous, cephfs, .snaps

2017-08-30 Thread Nigel Williams
On 30 August 2017 at 18:52, Marc Roos  wrote:
> I noticed it is .snap not .snaps

Yes

> mkdir: cannot create directory ‘.snap/snap1’: Operation not permitted
>
> Is this because my permissions are insufficient on the client id?

fairly sure you've forgotten this step:

ceph mds set allow_new_snaps true --yes-i-really-mean-it
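Once that flag is set, snapshots are taken by creating directories under
the hidden .snap directory. A minimal sketch, assuming the filesystem is
mounted at /mnt/cephfs (paths are placeholders):

```shell
cd /mnt/cephfs/somedir   # any directory inside the mounted filesystem
mkdir .snap/snap1        # snapshot this subtree
ls .snap                 # list snapshots taken at this directory
rmdir .snap/snap1        # remove the snapshot
```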


Re: [ceph-users] v12.2.0 Luminous released

2017-08-30 Thread Nigel Williams
On 30 August 2017 at 20:53, John Spray  wrote:
> The mgr_initial_modules setting is only applied at the point of
> cluster creation,

ok.

> so I would guess that if it didn't seem to take
> effect then this was an upgrade from >=11.x

not quite: it was a clean install of Luminous, and somewhere around
12.1.3 ceph-deploy got confused about the service name. It created
both ceph-0 and c0mds-100 entries under /var/lib/ceph and messed up
the keys.


Re: [ceph-users] Bluestore disk colocation using NVRAM, SSD and SATA

2017-09-20 Thread Nigel Williams
On 21 September 2017 at 04:53, Maximiliano Venesio 
wrote:

> Hi guys i'm reading different documents about bluestore, and it never
> recommends to use NVRAM to store the bluefs db, nevertheless the official
> documentation says that, is better to use the faster device to put the
> block.db in.
>

Likely not mentioned since no one yet has had the opportunity to test it.

So how do i have to deploy using bluestore, regarding where i should put
> block.wal and block.db ?
>

block.* would be best on your NVRAM device, like this:

ceph-deploy osd create --bluestore c0osd-136:/dev/sda --block-wal
/dev/nvme0n1 --block-db /dev/nvme0n1


Re: [ceph-users] Bluestore OSD_DATA, WAL & DB

2017-09-25 Thread Nigel Williams
On 26 September 2017 at 01:10, David Turner  wrote:
> If they are on separate
> devices, then you need to make it as big as you need to to ensure that it
> won't spill over (or if it does that you're ok with the degraded performance
> while the db partition is full).  I haven't come across an equation to judge
> what size should be used for either partition yet.

Is it the case that only the WAL will spill if there is a backlog
clearing entries into the DB partition? So the WAL's fill-mark
oscillates, but the DB is going to steadily grow (depending on the
previously mentioned factors of "...extents, checksums, RGW bucket
indices, and potentially other random stuff").

Is there an indicator that can be monitored to show that a spill is occurring?


Re: [ceph-users] Bluestore OSD_DATA, WAL & DB

2017-09-25 Thread Nigel Williams
On 26 September 2017 at 08:11, Mark Nelson  wrote:
> The WAL should never grow larger than the size of the buffers you've
> specified.  It's the DB that can grow and is difficult to estimate both
> because different workloads will cause different numbers of extents and
> objects, but also because rocksdb itself causes a certain amount of
> space-amplification due to a variety of factors.

OK, I was confused about whether both types could spill. Within
BlueStore, does it simply block if the WAL hits 100%?

Would a drastic (quick) action to correct a too-small DB partition
(impacting performance) be to destroy the OSD and rebuild it with a
larger DB partition?


Re: [ceph-users] clients failing to advance oldest client/flush tid

2017-10-09 Thread Nigel Williams
On 9 October 2017 at 19:21, Jake Grimmett  wrote:
> HEALTH_WARN 9 clients failing to advance oldest client/flush tid;
> 1 MDSs report slow requests; 1 MDSs behind on trimming

On a proof-of-concept 12.2.1 cluster (few random files added, 30 OSDs,
default Ceph settings) I can get the above error by doing this from a
client:

bonnie++ -s 0 -n 1000 -u 0

This makes 1 million files in a single directory (we wanted to see
what might break).

This takes a few hours to run but seems to finish without incident.
Over that time we get this in the logs:

root@c0mon-101:/var/log/ceph# zcat ceph-mon.c0mon-101.log.6.gz|fgrep MDS_TRIM
2017-10-04 11:14:18.489943 7ff914a26700  0 log_channel(cluster) log
[WRN] : Health check failed: 1 MDSs behind on trimming (MDS_TRIM)
2017-10-04 11:14:22.523117 7ff914a26700  0 log_channel(cluster) log
[INF] : Health check cleared: MDS_TRIM (was: 1 MDSs behind on
trimming)
2017-10-04 11:14:26.589797 7ff914a26700  0 log_channel(cluster) log
[WRN] : Health check failed: 1 MDSs behind on trimming (MDS_TRIM)
2017-10-04 11:14:34.614567 7ff914a26700  0 log_channel(cluster) log
[INF] : Health check cleared: MDS_TRIM (was: 1 MDSs behind on
trimming)
2017-10-04 20:38:22.812032 7ff914a26700  0 log_channel(cluster) log
[WRN] : Health check failed: 1 MDSs behind on trimming (MDS_TRIM)
2017-10-04 20:41:14.700521 7ff914a26700  0 log_channel(cluster) log
[INF] : Health check cleared: MDS_TRIM (was: 1 MDSs behind on
trimming)
root@c0mon-101:/var/log/ceph#
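A rough way to quantify how often the warning flaps is to count fired
vs cleared MDS_TRIM events; a small sketch (the embedded lines stand in
for the real cluster log):

```shell
# Sample cluster-log lines (stand-ins for ceph-mon.*.log content)
log='2017-10-04 11:14:18 [WRN] : Health check failed: 1 MDSs behind on trimming (MDS_TRIM)
2017-10-04 11:14:22 [INF] : Health check cleared: MDS_TRIM (was: 1 MDSs behind on trimming)
2017-10-04 20:38:22 [WRN] : Health check failed: 1 MDSs behind on trimming (MDS_TRIM)
2017-10-04 20:41:14 [INF] : Health check cleared: MDS_TRIM (was: 1 MDSs behind on trimming)'
failed=$(printf '%s\n' "$log" | grep -c 'Health check failed.*MDS_TRIM')
cleared=$(printf '%s\n' "$log" | grep -c 'Health check cleared: MDS_TRIM')
echo "MDS_TRIM fired $failed times, cleared $cleared times"
```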


Re: [ceph-users] Bluestore OSD_DATA, WAL & DB

2017-11-02 Thread Nigel Williams
On 3 November 2017 at 07:45, Martin Overgaard Hansen  wrote:
> I want to bring this subject back in the light and hope someone can provide
> insight regarding the issue, thanks.

Thanks Martin, I was going to do the same.

Is it possible to make the DB partition (on the fastest device) too
big? In other words, is there a point where, for a given set of OSDs
(number + size), the DB partition is sized too large and is wasting
resources? I recall a comment by someone proposing to split up a
single large (fast) SSD into 100GB partitions, one for each OSD.

The answer could be couched as some intersection of pool type (RBD /
RADOS / CephFS), object change(update?) intensity, size of OSD etc and
rule-of-thumb.

An idea occurred to me: by monitoring for the logged spill message
(the event when the DB partition spills/overflows to the OSD), OSDs
could be (lazily) destroyed and recreated with a DB partition
increased in size by, say, 10% each time.
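For what it's worth, that lazy destroy-and-recreate could look roughly
like this (a hedged sketch; the OSD id, devices, and partition are
placeholders, and the ceph-volume flags are as of Luminous):

```shell
ID=12                                        # placeholder OSD id
ceph osd out $ID                             # let data drain off the OSD
systemctl stop ceph-osd@$ID
ceph osd destroy $ID --yes-i-really-mean-it  # keep the id, drop auth/data
ceph-volume lvm zap /dev/sdc                 # wipe the old data device
# recreate with a larger block.db partition prepared on the fast device
ceph-volume lvm create --bluestore --osd-id $ID \
    --data /dev/sdc --block.db /dev/nvme0n1p3
```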


Re: [ceph-users] how to improve performance

2017-11-20 Thread Nigel Williams
On 20 November 2017 at 23:36, Christian Balzer  wrote:
> On Mon, 20 Nov 2017 14:02:30 +0200 Rudi Ahlers wrote:
>> The SATA drives are ST8000NM0055-1RM112
>>
> Note that these (while fast) have an internal flash cache, limiting them to
> something like 0.2 DWPD.
> Probably not an issue with the WAL/DB on the Intels, but something to keep
> in mind.

I had forgotten about the flash-cache hybrid drives. Seagate calls
them SSHD (Solid State Hard Drives) and as Christian highlights they
have several GB of SSD as an on-board cache. I looked at the
specifications for the ST8000NM0055 but I cannot see them listed as
SSHD; rather, they seem like the usual Seagate enterprise hard drive.

https://www.seagate.com/www-content/product-content/enterprise-hdd-fam/enterprise-capacity-3-5-hdd/constellation-es-4/en-us/docs/ent-capacity-3-5-hdd-8tb-ds1863-2-1510us.pdf

Is there something in the specifications that gives them away as SSHD?


Re: [ceph-users] how to improve performance

2017-11-20 Thread Nigel Williams
On 21 November 2017 at 10:07, Christian Balzer  wrote:
> On Tue, 21 Nov 2017 10:00:28 +1100 Nigel Williams wrote:
>> Is there something in the specifications that gives them away as SSHD?
>>
> The 550TB endurance per year for an 8TB drive and the claim of 30% faster
> IOPS would be a dead giveaway, one thinks.

I just found this other answer:

http://products.wdc.com/library/other/2579-772003.pdf

Hard-drive manufacturers introduced workload specifications because
they better model failure rates than MTTF.

I see the drive has 2MB of NOR-flash for write-caching; what happens
when this wears out?


[ceph-users] Transparent huge pages

2017-11-28 Thread Nigel Williams
Given that memory is a key resource for Ceph, this advice about switching
the Transparent Huge Pages (THP) kernel setting to madvise would be worth
testing, to see whether THP is helping or hindering.

Article:
https://blog.nelhage.com/post/transparent-hugepages/

Discussion:
https://news.ycombinator.com/item?id=15795337


echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
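To confirm the change took effect: the same sysfs file lists all modes
with the active one in brackets. A small parsing sketch (the sample
string stands in for the real file contents on a live host):

```shell
# /sys/kernel/mm/transparent_hugepage/enabled reads like:
#   always [madvise] never
thp='always [madvise] never'    # sample contents; cat the real file on a host
active=$(printf '%s\n' "$thp" | sed 's/.*\[\([a-z]*\)\].*/\1/')
echo "$active"
```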


Re: [ceph-users] CephFS - Mounting a second Ceph file system

2017-11-28 Thread Nigel Williams
On 29 November 2017 at 01:51, Daniel Baumann  wrote:
> On 11/28/17 15:09, Geoffrey Rhodes wrote:
>> I'd like to run more than one Ceph file system in the same cluster.

Are there opinions on how stable multiple filesystems per single Ceph
cluster are in practice? Is anyone using it actively with a stressful
load?

I see the docs still place it under Experimental:

http://docs.ceph.com/docs/master/cephfs/experimental-features/#multiple-filesystems-within-a-ceph-cluster


Re: [ceph-users] BlueStore upgrade steps broken

2018-08-19 Thread Nigel Williams
On 18 August 2018 at 03:06, David Turner  wrote:

> The WAL will choose the fastest device available.
>

Any idea how it makes this determination automatically? Is it doing a
hdparm -t or similar? Is fastest = bandwidth, IOPS, or latency?


Re: [ceph-users] ceph 12.2.4 - which OSD has slow requests ?

2018-04-17 Thread Nigel Williams
On 18 April 2018 at 05:52, Steven Vacaroaia  wrote:

> I can see many slow requests in the logs but no clue which OSD is the
> culprit
> How can I find the culprit ?
>

ceph osd perf

or

ceph pg dump osds -f json-pretty | jq .[].fs_perf_stat

Searching the ML archives for threads about slow requests will surface
several techniques to explore:

slow requests site:http://lists.ceph.com/pipermail/ceph-users-ceph.com/
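As a rough way to rank OSDs from that output, the commit-latency column
of `ceph osd perf` can be scanned with awk (the sample numbers below
are made up for illustration):

```shell
# Fabricated `ceph osd perf` output for illustration
perf='osd commit_latency(ms) apply_latency(ms)
  0                 12                  9
  7                245                231
 14                  8                  6'
# Report the OSD with the highest commit latency
worst=$(printf '%s\n' "$perf" | awk 'NR>1 && $2+0 > max {max=$2+0; osd=$1} END {print osd, max}')
echo "slowest OSD: $worst"
```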


[ceph-users] network connectivity test tool?

2016-02-04 Thread Nigel Williams
I thought I had bookmarked a neat shell script that used the
ceph.conf definitions to do an all-to-all, all-to-one check of network
connectivity for a Ceph cluster (useful for discovering problems with
jumbo frames), but I've lost the bookmark, and after trawling github
and trying various keywords I cannot find it.

I thought the tool was in Ceph CBT or was a CERN-developed script, but
neither yielded a hit.

Anyone know where it is? thanks.


Re: [ceph-users] State of Ceph documention

2016-02-25 Thread Nigel Williams
On Fri, Feb 26, 2016 at 3:10 PM, Christian Balzer  wrote:

> Then we come to a typical problem for fast evolving SW like Ceph, things
> that are not present in older versions.


I was going to post on this too (I had similar frustrations), and would
like to propose a move to splitting the documentation by version:

OLD
http://docs.ceph.com/docs/master/rados/operations/cache-tiering/


NEW
http://docs.ceph.com/docs/master/hammer/rados/operations/cache-tiering/

http://docs.ceph.com/docs/master/infernalis/rados/operations/cache-tiering/

http://docs.ceph.com/docs/master/jewel/rados/operations/cache-tiering/

and so on.

When a new version is started, the documentation should be 100% cloned and
the tree restructured around the version. It could equally be a drop-down
on the page to select the version.

Postgres for example uses a similar mechanism:

http://www.postgresql.org/docs/

Note the version numbers are embedded in the URL. I like their commenting
mechanism too as it provides a running narrative of changes that should be
considered as practice develops around things to do or avoid.

Once the documentation is cloned for the new version, all the inapplicable
material should be removed and the new features/practice changes should be
added.


Re: [ceph-users] State of Ceph documention

2016-02-25 Thread Nigel Williams
On Fri, Feb 26, 2016 at 4:09 PM, Adam Tygart  wrote:
> The docs are already split by version, although it doesn't help that
> it isn't linked in an obvious manner.
>
> http://docs.ceph.com/docs/master/rados/operations/cache-tiering/

Is there any reason to keep this "master" (version-less variant) given
how much confusion it causes?

I think I noticed the version split one time back but it didn't lodge
in my mind, and when I looked for something today I hit the "master"
and there were no hits for the version (which I should have been
looking at).

I'd be glad to contribute to the documentation effort. For example, I
would like to be able to ask questions about the terminology that is
scattered through the documentation and that I think needs better
explanation. I'm not sure pull requests that try to annotate what is
there are the right tool: some parts would become a wall of text,
whereas the explanation would be better suited to a (more informal)
comment thread at the bottom of the page that can be browsed (mainly
by beginners trying to navigate an unfamiliar architecture).


Re: [ceph-users] State of Ceph documention

2016-02-26 Thread Nigel Williams
On Fri, Feb 26, 2016 at 11:28 PM, John Spray  wrote:
> Some projects have big angry warning banners at the top of their
> master branch documentation, I think perhaps we should do that too,
> and at the same time try to find a way to steer google hits to the
> latest stable branch docs rather than to master.

Are there reasons to "publish" the version-less master? Maybe I've
missed the explanation for why master is necessary, but could it be
completely hidden?


Re: [ceph-users] State of Ceph documention

2016-02-26 Thread Nigel Williams
On Sat, Feb 27, 2016 at 12:08 AM, Andy Allan  wrote:
> When I made a (trivial, to be fair) documentation PR it was dealt with
> immediately, both when I opened it, and when I fixed up my commit
> message. I'd recommend that if anyone sees anything wrong with the
> docs, just submit a PR with the fix.

Are we collectively OK with the discussion about the documentation
happening via the repo (presumably on GitHub)? The limitation with PRs
is that the submitter has to propose a concrete change, when sometimes
it is a less formal interpretation question.

Or would it be OK to conduct the discussions on this mailing list to
form up the ultimate PR?

I'm reluctant to suggest a ceph-docs mailing list but that would be
another option if we can't have commentary on the documentation
web-pages.


Re: [ceph-users] BlueFS spillover detected - 14.2.1

2019-06-19 Thread Nigel Williams
On Thu, 20 Jun 2019 at 09:12, Vitaliy Filippov  wrote:

> All values except 4, 30 and 286 GB are currently useless in ceph with
> default rocksdb settings :)
>

however, several commenters have said that rocksdb needs extra space
during compaction, and hence the DB partition needs to be twice those
sizes: 8GB, 60GB and 600GB.

Does rocksdb spill during compaction if it doesn't have enough space?
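Those figures fall out of rocksdb's geometric level structure. A
back-of-envelope sketch, assuming the default ~256 MB base level and a
10x size multiplier per level (the quoted 4/30/286 GB numbers add WAL
and overhead on top of these):

```shell
# Cumulative rocksdb level sizes: a DB partition is only fully usable
# when it can hold a whole set of levels.
out=$(awk 'BEGIN {
  base = 0.25; mult = 10; total = 0   # base level ~256 MB (0.25 GB)
  for (l = 0; l < 4; l++) {
    total += base * mult ^ l          # add the size of level l
    printf "levels 0..%d need about %.2f GB\n", l, total
  }
}')
echo "$out"
```

Doubling each figure for compaction headroom gives the 8/60/600 GB
sizes mentioned above.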


[ceph-users] show-prediction-config - no valid command found?

2019-06-26 Thread Nigel Williams
Have I missed a step? The diskprediction module is not working for me.

root@cnx-11:/var/log/ceph# ceph device show-prediction-config
no valid command found; 10 closest matches:

root@cnx-11:/var/log/ceph# ceph mgr module ls
{
"enabled_modules": [
"dashboard",
"diskprediction_cloud",
"iostat",
"pg_autoscaler",
"prometheus",
"restful"
],...

root@cnx-11:/var/log/ceph# ceph -v
ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus
(stable)

One other failure I get for:
ceph device get-health-metrics INTEL_SSDPE2KE020T7_BTLE74200D8J2P0DGN
...
"nvme_vendor": "intel",
"dev": "/dev/nvme0n1",
"error": "smartctl returned invalid JSON"
...
with smartmontools 7.1.
Running smartctl directly against the device with JSON output parses
OK (checked with an online parser).


[ceph-users] Nautilus - cephfs auth caps problem?

2019-07-02 Thread Nigel Williams
I am getting "Operation not permitted" on writes for a user with these
caps set. The admin user (allow * for everything) works OK.

This does not work:
caps: [mds] allow r,allow rw path=/home
caps: [mon] allow r
caps: [osd] allow rwx tag cephfs data=cephfs_data2

This does work:
caps: [mds] allow r,allow rw path=/home
caps: [mon] allow r
caps: [osd] allow *

No specific OSD caps I set allow files to be written, although I can
create files and directories.
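For comparison, Nautilus can generate matching caps itself; a hedged
sketch using the `ceph fs authorize` helper (client name, filesystem
name, and path are placeholders):

```shell
# Generate client caps scoped to /home on filesystem "cephfs";
# the helper tags the OSD cap against the filesystem's data pools.
ceph fs authorize cephfs client.homeuser /home rw
ceph auth get client.homeuser        # inspect the generated caps
```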


Re: [ceph-users] Nautilus - cephfs auth caps problem?

2019-07-03 Thread Nigel Williams
thanks for the tip, I did wonder about that, and checked that at one point,
and assumed that was ok.

root@cnx-11:~# ceph osd pool application get cephfs_data
{
"cephfs": {
"data": "cephfs"
}
}
root@cnx-11:~# ceph osd pool application get cephfs_data2
{
"cephfs": {
"data": "cephfs"
}
}
root@cnx-11:~# ceph osd pool application get cephfs_metadata
{
"cephfs": {
"metadata": "cephfs"
}
}
root@cnx-11:~#

Is the act of setting it again likely to make a needed change elsewhere
that is fixed by that git pull?


On Wed, 3 Jul 2019 at 17:20, Paul Emmerich  wrote:

> Your cephfs was probably created with a buggy version that didn't set the
> metadata tags on the data pools correctly. IIRC there still isn't any
> automated migration of old broken pools.
>


Re: [ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-19 Thread Nigel Williams
On Sat, 20 Jul 2019 at 04:28, Nathan Fish  wrote:

> On further investigation, it seems to be this bug:
> http://tracker.ceph.com/issues/38724


We just upgraded to 14.2.2, and had a dozen OSDs go down due to this
bug; recovered with:

systemctl reset-failed ceph-osd@160
systemctl start ceph-osd@160


[ceph-users] fixing a bad PG per OSD decision with pg-autoscaling?

2019-08-20 Thread Nigel Williams
Due to a gross miscalculation several years ago I set way too many PGs for
our original Hammer cluster. We've lived with it ever since, but now that
we are on Luminous, changes result in stuck requests and balancing problems.

The cluster currently has 12% misplaced, and is grinding to re-balance but
is unusable to clients (even with osd_max_pg_per_osd_hard_ratio set to 32,
and mon_max_pg_per_osd set to 1000).

Can I safely press on upgrading to Nautilus in this state so I can enable
the pg-autoscaling to finally fix the problem?

thanks.


[ceph-users] cephfs 1 large omap objects

2019-10-06 Thread Nigel Williams
Out of the blue this popped up (on an otherwise healthy cluster):

HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
1 large objects found in pool 'cephfs_metadata'
Search the cluster log for 'Large omap object found' for more details.

"Search the cluster log" is somewhat opaque: there are logs for many
daemons, so what is a "cluster" log? In the ML history, some found the
message in the OSD logs.

Another post suggested removing lost+found, but using cephfs-shell I don't
see one at the top level. Is there another way to disable this "feature"?

thanks.


Re: [ceph-users] cephfs 1 large omap objects

2019-10-06 Thread Nigel Williams
I followed some other suggested steps, and have this:

root@cnx-17:/var/log/ceph# zcat ceph-osd.178.log.?.gz|fgrep Large
2019-10-02 13:28:39.412 7f482ab1c700  0 log_channel(cluster) log [WRN] :
Large omap object found. Object: 2:654134d2:::mds0_openfiles.0:head Key
count: 306331 Size (bytes): 13993148
root@cnx-17:/var/log/ceph# ceph daemon osd.178 config show | grep
osd_deep_scrub_large_omap
"osd_deep_scrub_large_omap_object_key_threshold": "200000",
"osd_deep_scrub_large_omap_object_value_sum_threshold": "1073741824",

root@cnx-11:~# rados -p cephfs_metadata stat 'mds0_openfiles.0'
cephfs_metadata/mds0_openfiles.0 mtime 2019-10-06 23:37:23.00, size 0
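The key count of the flagged object can also be checked directly, which
should line up with the deep-scrub count above:

```shell
# Count the omap keys on the object the scrub flagged
rados -p cephfs_metadata listomapkeys mds0_openfiles.0 | wc -l
```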


Re: [ceph-users] cephfs 1 large omap objects

2019-10-06 Thread Nigel Williams
I've adjusted the threshold:

ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 350000

Colleague suggested that this will take effect on the next deep-scrub.

Is the default of 200,000 too small? Will this be adjusted in future
releases, or is it meant to be tuned in some use cases?


Re: [ceph-users] Issues with Nautilus 14.2.6 ceph-volume lvm batch --bluestore ?

2020-01-19 Thread Nigel Williams
On Mon, 20 Jan 2020 at 14:15, Dave Hall  wrote:
> BTW, I did try to search the list archives via 
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/, but that didn't work 
> well for me.  Is there another way to search?

With your favorite search engine (say Google or DuckDuckGo), you can do this:

ceph site:http://lists.ceph.com/pipermail/ceph-users-ceph.com/