Re: [ceph-users] Luminous : All OSDs not starting when ceph.target is started

2018-01-24 Thread nokia ceph
Hi,

We found a workaround for the above issue:

1) We created the directory 'ceph-osd.target.wants' under /etc/systemd/system.
2) Created symlinks to the OSD service template for all OSDs. Sample below:
cn7.chn6us1c1.cdn /etc/systemd/system/ceph-osd.target.wants# ll
total 0
lrwxrwxrwx 1 root root 41 Jan 23 09:36 ceph-osd@102.service ->
/usr/lib/systemd/system/ceph-osd@.service
lrwxrwxrwx 1 root root 41 Jan 23 09:36 ceph-osd@107.service ->
/usr/lib/systemd/system/ceph-osd@.service
lrwxrwxrwx 1 root root 41 Jan 23 09:36 ceph-osd@112.service ->
/usr/lib/systemd/system/ceph-osd@.service
lrwxrwxrwx 1 root root 41 Jan 23 09:36 ceph-osd@117.service ->
/usr/lib/systemd/system/ceph-osd@.service
lrwxrwxrwx 1 root root 41 Jan 23 09:36 ceph-osd@122.service ->
/usr/lib/systemd/system/ceph-osd@.service
lrwxrwxrwx 1 root root 41 Jan 23 09:36 ceph-osd@127.service ->
/usr/lib/systemd/system/ceph-osd@.service
lrwxrwxrwx 1 root root 41 Jan 23 09:36 ceph-osd@12.service ->
/usr/lib/systemd/system/ceph-osd@.service
.
.
.
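
For reference, the same symlinks can normally be created by enabling each OSD
unit instead of building them by hand; a minimal sketch, assuming the standard
ceph-osd@.service template is installed and the OSD IDs on the node are known:

# systemd creates the symlink in /etc/systemd/system/ceph-osd.target.wants itself
for id in 12 102 107 112 117 122 127; do
    systemctl enable ceph-osd@${id}.service
done
systemctl daemon-reload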



On Mon, Jan 8, 2018 at 3:49 PM, nokia ceph  wrote:

> Hello,
>
> I have installed Luminous 12.2.2 on a 5-node cluster with logical-volume
> OSDs.
> I am trying to stop and start Ceph on one of the nodes using systemctl
> commands:
> *systemctl stop ceph.target; systemctl start ceph.target*
>
> When I stop ceph.target, all OSDs on the node are stopped properly.
> But when I start it, I see only 14 OSDs (out of 68) being started. The
> remaining OSDs do not start automatically. When I check the OSD service
> status, I see it is still inactive and not starting.
>
> When I reboot the same node, I do not see the above problem. All OSDs come
> up automatically.
>
> ENV:
> Luminous 12.2.2 , 5 node, 5 mon and 5 mgr
> CentOS7.4
> EC 4+1 , 68 disks per node, bluestore
>
> Has anyone faced the same issue, or does anyone have any suggestions?
>
> Thanks in advance
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] SPDK for BlueStore rocksDB

2018-01-24 Thread Jorge Pinilla López
Hey, sorry if the question doesn't make a lot of sense; I am asking from
almost complete ignorance of the topic, and there is not a lot of info
about it.

I am planning to create a cluster with 7-10 NL-SAS HDDs and 1 NVMe per
host.

The NVMe would be used as the RocksDB and WAL (journal) device for each OSD
(HDD block device), but I am worried about that being a bottleneck, as all
the OSDs would share the same device.

I've been reading about SPDK for NVMe devices and have seen that the
BlueStore configuration supports it
(http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#spdk-usage),
but I would like to know the current status of SPDK and whether I could use
it for the RocksDB device and not for the block device.
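
For reference, the SPDK usage described on that page seems to boil down to
pointing BlueStore's main block device at the NVMe by serial number; a minimal
ceph.conf sketch with placeholder values (whether the DB/WAL paths can take an
spdk: target in the same way is exactly what I am unsure about):

# ceph.conf sketch -- the serial number is a placeholder
[osd]
bluestore_block_path = spdk:55cd2e404bd73932
# bluestore_block_db_path / bluestore_block_wal_path would still point at
# regular block devices in the split layout I am asking about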

I've also been reading about RDMA and would like to know if I could use it
in this scenario; all I have found so far uses whole NVMe devices for block
and RocksDB.

I would really appreciate it if someone could introduce me to this topic;
it's really interesting but also confusing at the same time.

Thanks a lot!

-- 

*Jorge Pinilla López*
jorp...@unizar.es
Computer engineering student
Systems area intern (SICUZ)
Universidad de Zaragoza
PGP-KeyID: A34331932EBC715A


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SPDK for BlueStore rocksDB

2018-01-24 Thread Igor Fedotov

Jorge,

I'd suggest starting with a regular (non-SPDK) configuration and deploying a
test cluster. Then do some benchmarking against it and check whether the NVMe
drive is the actual bottleneck; I doubt it is. I did some experiments a while
ago and didn't see any benefit from SPDK in my case, probably because the
bottlenecks were somewhere else.
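
For the benchmarking itself, a single-job sync-write fio run against the raw
NVMe is usually enough to see whether the device is the limit; a sketch only
(the device path is a placeholder, and the test destroys whatever is on it):

# destructive raw-device test -- use a spare/empty device only
fio --name=nvme-synctest --filename=/dev/nvme0n1 --direct=1 --sync=1 \
    --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based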



Hope this helps,

Igor


On 1/24/2018 12:46 PM, Jorge Pinilla López wrote:


Hey, sorry if the question doesn't make a lot of sense; I am asking from
almost complete ignorance of the topic, and there is not a lot of info
about it.

I am planning to create a cluster with 7-10 NL-SAS HDDs and 1 NVMe per
host.

The NVMe would be used as the RocksDB and WAL (journal) device for each OSD
(HDD block device), but I am worried about that being a bottleneck, as all
the OSDs would share the same device.

I've been reading about SPDK for NVMe devices and have seen that the
BlueStore configuration supports it
(http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#spdk-usage),
but I would like to know the current status of SPDK and whether I could use
it for the RocksDB device and not for the block device.

I've also been reading about RDMA and would like to know if I could use it
in this scenario; all I have found so far uses whole NVMe devices for block
and RocksDB.

I would really appreciate it if someone could introduce me to this topic;
it's really interesting but also confusing at the same time.

Thanks a lot!

--

*Jorge Pinilla López*
jorp...@unizar.es
Computer engineering student
Systems area intern (SICUZ)
Universidad de Zaragoza
PGP-KeyID: A34331932EBC715A 





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to remove deactivated cephFS

2018-01-24 Thread Thomas Bennett
Hi Eugen,

From my experience, to truly delete and recreate the Ceph FS *cephfs*
file system, I've done the following:

1. Remove the file system:
ceph fs rm cephfs --yes-i-really-mean-it
ceph fs rm_data_pool cephfs cephfs_data

2. Remove the associated pools:
ceph osd pool delete cephfs_data cephfs_data --yes-i-really-really-mean-it
ceph osd pool delete cephfs_metadata cephfs_metadata --yes-i-really-really-mean-it

3. Create a new default ceph file system:
ceph osd pool create cephfs_data <pg_num> <pgp_num>
ceph osd pool create cephfs_metadata <pg_num> <pgp_num>
ceph fs new cephfs cephfs_metadata cephfs_data
ceph fs set_default cephfs

Not sure if this helps, as you may need to repeat the whole process from
the start.
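
To double-check the result afterwards, I usually verify that only one file
system is left and that "ceph fs status" behaves again; a quick sketch:

ceph fs ls
ceph fs status            # the command that threw the traceback for you
ceph fs dump | grep -E 'Filesystem|fs_name'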

Regards,
Tom

On Mon, Jan 8, 2018 at 2:19 PM, Eugen Block  wrote:

> Hi list,
>
> all this is on Ceph 12.2.2.
>
> An existing cephFS (named "cephfs") was backed up as a tar ball, then
> "removed" ("ceph fs rm cephfs --yes-i-really-mean-it"), a new one created
> ("ceph fs new cephfs cephfs-metadata cephfs-data") and the content restored
> from the tar ball. According to the output of "ceph fs rm",  the old cephFS
> has only been deactivated, not deleted.  Looking at the Ceph manager's web
> interface, it now lists two entries "cephfs", one with id 0 (the "old" FS)
> and id "1" (the currently active FS).
>
> When we try to run "ceph fs status", we get an error with a traceback:
>
> ---cut here---
> ceph3:~ # ceph fs status
> Error EINVAL: Traceback (most recent call last):
>   File "/usr/lib64/ceph/mgr/status/module.py", line 301, in handle_command
> return self.handle_fs_status(cmd)
>   File "/usr/lib64/ceph/mgr/status/module.py", line 219, in
> handle_fs_status
> stats = pool_stats[pool_id]
> KeyError: (29L,)
> ---cut here---
>
> while this works:
>
> ---cut here---
> ceph3:~ # ceph fs ls
> name: cephfs, metadata pool: cephfs-metadata, data pools: [cephfs-data ]
> ---cut here---
>
> We see the new id 1 when we run
>
> ---cut here---
> ceph3:~ #  ceph fs get cephfs
> Filesystem 'cephfs' (1)
> fs_name cephfs
> [...]
> data_pools  [35]
> metadata_pool   36
> inline_data disabled
> balancer
> standby_count_wanted1
> [...]
> ---cut here---
>
> The new FS seems to work properly and can be mounted from the clients,
> just like before removing and rebuilding it. I'm not sure which other
> commands would fail with this traceback, for now "ceph fs status" is the
> only one.
>
> So it seems that having one deactivated cephFS has an impact on some of
> the functions/commands. Is there any way to remove it properly? Most of the
> commands work with the name, not the id of the FS, so it's difficult to
> access the data from the old FS. Has anyone some insights on how to clean
> this up?
>
> Regards,
> Eugen
>
> --
> Eugen Block voice   : +49-40-559 51 75
> NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
> Postfach 61 03 15
> D-22423 Hamburg e-mail  : ebl...@nde.ag
>
> Vorsitzende des Aufsichtsrates: Angelika Mozdzen
>   Sitz und Registergericht: Hamburg, HRB 90934
>   Vorstand: Jens-U. Mozdzen
>USt-IdNr. DE 814 013 983
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Thomas Bennett

SKA South Africa
Science Processing Team

Office: +27 21 5067341
Mobile: +27 79 5237105
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to remove deactivated cephFS

2018-01-24 Thread Eugen Block

Hi Tom,

thanks for the detailed steps.

I think our problem literally vanished. A couple of days after my email I
noticed that the web interface suddenly listed only one cephFS. Also, the
command "ceph fs status" no longer returns an error but shows the correct
output.

I guess Ceph is indeed a self-healing storage solution! :-)

Regards,
Eugen


Zitat von Thomas Bennett :


Hi Eugen,

From my experience, to truly delete and recreate the Ceph FS *cephfs*
file system, I've done the following:

1. Remove the file system:
ceph fs rm cephfs --yes-i-really-mean-it
ceph fs rm_data_pool cephfs cephfs_data

2. Remove the associated pools:
ceph osd pool delete cephfs_data cephfs_data --yes-i-really-really-mean-it
ceph osd pool delete cephfs_metadata cephfs_metadata --yes-i-really-really-mean-it

3. Create a new default ceph file system:
ceph osd pool create cephfs_data <pg_num> <pgp_num>
ceph osd pool create cephfs_metadata <pg_num> <pgp_num>
ceph fs new cephfs cephfs_metadata cephfs_data
ceph fs set_default cephfs

Not sure if this helps, as you may need to repeat the whole process from
the start.

Regards,
Tom

On Mon, Jan 8, 2018 at 2:19 PM, Eugen Block  wrote:


Hi list,

all this is on Ceph 12.2.2.

An existing cephFS (named "cephfs") was backed up as a tar ball, then
"removed" ("ceph fs rm cephfs --yes-i-really-mean-it"), a new one created
("ceph fs new cephfs cephfs-metadata cephfs-data") and the content restored
from the tar ball. According to the output of "ceph fs rm",  the old cephFS
has only been deactivated, not deleted.  Looking at the Ceph manager's web
interface, it now lists two entries "cephfs", one with id 0 (the "old" FS)
and id "1" (the currently active FS).

When we try to run "ceph fs status", we get an error with a traceback:

---cut here---
ceph3:~ # ceph fs status
Error EINVAL: Traceback (most recent call last):
  File "/usr/lib64/ceph/mgr/status/module.py", line 301, in handle_command
return self.handle_fs_status(cmd)
  File "/usr/lib64/ceph/mgr/status/module.py", line 219, in
handle_fs_status
stats = pool_stats[pool_id]
KeyError: (29L,)
---cut here---

while this works:

---cut here---
ceph3:~ # ceph fs ls
name: cephfs, metadata pool: cephfs-metadata, data pools: [cephfs-data ]
---cut here---

We see the new id 1 when we run

---cut here---
ceph3:~ #  ceph fs get cephfs
Filesystem 'cephfs' (1)
fs_name cephfs
[...]
data_pools  [35]
metadata_pool   36
inline_data disabled
balancer
standby_count_wanted1
[...]
---cut here---

The new FS seems to work properly and can be mounted from the clients,
just like before removing and rebuilding it. I'm not sure which other
commands would fail with this traceback, for now "ceph fs status" is the
only one.

So it seems that having one deactivated cephFS has an impact on some of
the functions/commands. Is there any way to remove it properly? Most of the
commands work with the name, not the id of the FS, so it's difficult to
access the data from the old FS. Has anyone some insights on how to clean
this up?

Regards,
Eugen

--
Eugen Block voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail  : ebl...@nde.ag

Vorsitzende des Aufsichtsrates: Angelika Mozdzen
  Sitz und Registergericht: Hamburg, HRB 90934
  Vorstand: Jens-U. Mozdzen
   USt-IdNr. DE 814 013 983

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





--
Thomas Bennett

SKA South Africa
Science Processing Team

Office: +27 21 5067341
Mobile: +27 79 5237105




--
Eugen Block voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail  : ebl...@nde.ag

Vorsitzende des Aufsichtsrates: Angelika Mozdzen
  Sitz und Registergericht: Hamburg, HRB 90934
  Vorstand: Jens-U. Mozdzen
   USt-IdNr. DE 814 013 983

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] client with uid

2018-01-24 Thread Keane Wolter
Hello all,

I was looking at the Client Config Reference page
(http://docs.ceph.com/docs/master/cephfs/client-config-ref/) and there was a
mention of a flag --client_with_uid. The way I read it is that you can
specify the UID of a user on a CephFS and the user mounting the filesystem
will act as that UID. I am using the flags --client_mount_uid and
--client_mount_gid, set equal to my UID and GID values on the CephFS, when
running ceph-fuse. Is this the correct use of these flags, or am I
misunderstanding them?
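
Concretely, I am invoking it along these lines (a sketch; the client name,
UID/GID and mount point are placeholders for my actual values):

ceph-fuse --id myuser \
    --client_mount_uid=1000 --client_mount_gid=1000 \
    /mnt/cephfs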

Thanks,
Keane
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD servers swapping despite having free memory capacity

2018-01-24 Thread Warren Wang
Forgot to mention another hint. If kswapd is constantly using CPU, and your
sar -r ALL and sar -B stats look like it's thrashing, kswapd is probably busy
evicting things from memory in order to make a larger-order allocation.
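
A quick sketch of the checks I mean (sar needs the sysstat package; the
intervals are arbitrary):

# per-order free pages -- numbers only in the first couple of columns = fragmented
cat /proc/buddyinfo
# paging/reclaim activity and memory breakdown over time
sar -B 1 10
sar -r ALL 1 10
# the reserve the kernel tries to keep free
sysctl vm.min_free_kbytes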

The other thing I can think of is if you have OSDs locking up and getting 
corrupted, there is a severe XFS bug where the kernel will throw a NULL pointer 
dereference under heavy memory pressure. Again, it's due to memory issues, but 
you will see the message in your kernel logs. It's fixed in upstream kernels as 
of this month. I forget what version exactly. 4.4.0-102? 
https://launchpad.net/bugs/1729256 

Warren Wang

On 1/23/18, 11:01 PM, "Blair Bethwaite"  wrote:

+1 to Warren's advice on checking for memory fragmentation. Are you
seeing kmem allocation failures in dmesg on these hosts?

On 24 January 2018 at 10:44, Warren Wang  wrote:
> Check /proc/buddyinfo for memory fragmentation. We have some pretty 
severe memory frag issues with Ceph to the point where we keep excessive 
min_free_kbytes configured (8GB), and are starting to order more memory than we 
actually need. If you have a lot of objects, you may find that you need to 
increase vfs_cache_pressure as well, to something like the default of 100.
>
> In your buddyinfo, the columns represent the quantity of each page size 
available. So if you only see numbers in the first 2 columns, you only have 4K 
and 8K pages available, and will fail any allocations larger than that. The 
problem is so severe for us that we have stopped using jumbo frames due to 
dropped packets as a result of not being able to DMA map pages that will fit 9K 
frames.
>
> In short, you might have enough memory, but not contiguous. It's even 
worse on RGW nodes.
>
> Warren Wang
>
> On 1/23/18, 2:56 PM, "ceph-users on behalf of Samuel Taylor Liston" 
 wrote:
>
> We have a 9 - node (16 - 8TB OSDs per node) running jewel on centos 
7.4.  The OSDs are configured with encryption.  The cluster is accessed via two 
- RGWs  and there are 3 - mon servers.  The data pool is using 6+3 erasure 
coding.
>
> About 2 weeks ago I found two of the nine servers wedged and had to 
hard power cycle them to get them back.  In this hard reboot 22 - OSDs came 
back with either a corrupted encryption or data partitions.  These OSDs were 
removed and recreated, and the resultant rebalance moved along just fine for 
about a week.  At the end of that week two different nodes were unresponsive 
complaining of page allocation failures.  This is when I realized the nodes 
were heavy into swap.  These nodes were configured with 64GB of RAM as a cost 
saving going against the 1GB per 1TB recommendation.  We have since then 
doubled the RAM in each of the nodes giving each of them more than the 1GB per 
1TB ratio.
>
> The issue I am running into is that these nodes are still swapping; a 
lot, and over time becoming unresponsive, or throwing page allocation failures. 
 As an example, “free” will show 15GB of RAM usage (out of 128GB) and 32GB of 
swap.  I have configured swappiness to 0 and and also turned up the 
vm.min_free_kbytes to 4GB to try to keep the kernel happy, and yet I am still 
filling up swap.  It only occurs when the OSDs have mounted partitions and 
ceph-osd daemons active.
>
> Anyone have an idea where this swap usage might be coming from?
> Thanks for any insight,
>
> Sam Liston (sam.lis...@utah.edu)
> 
> Center for High Performance Computing
> 155 S. 1452 E. Rm 405
> Salt Lake City, Utah 84112 (801)232-6932
> 
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Cheers,
~Blairo


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD servers swapping despite having free memory capacity

2018-01-24 Thread Nick Fisk
I know this may be a bit vague, but it also suggests the "try a newer kernel"
approach. We had constant problems with hosts mounting a number of RBD volumes
formatted with XFS. The servers would start aggressively swapping even though
the actual memory in use was nowhere near even 50%, and eventually processes
started dying/hanging (not OOM though). I couldn't quite put my finger on what
was actually using the memory, but it looked almost as if the page cache was
not releasing memory when requested.
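
A rough sketch of what I was watching at the time (nothing Ceph-specific, just
the kernel's own counters):

# where the memory is actually sitting
grep -E 'MemFree|MemAvailable|Cached|Dirty|Slab|SReclaimable' /proc/meminfo
# ongoing swap in/out activity
vmstat 5
# per-process swap usage, largest first
grep VmSwap /proc/*/status 2>/dev/null | sort -k2 -n -r | head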

This was happening on the  4.10 kernel, updated to 4.14 and the problem 
completely disappeared.

I've attached a graph (if it gets through) showing the memory change between 
4.10 and 4.14 on the 22nd Nov

Nick


> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Warren Wang
> Sent: 24 January 2018 17:54
> To: Blair Bethwaite 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] OSD servers swapping despite having free memory
> capacity
> 
> Forgot to mention another hint. If kswapd is constantly using CPU, and your 
> sar -
> r ALL and sar -B stats look like it's trashing, kswapd is probably busy 
> evicting
> things from memory in order to make a larger order allocation.
> 
> The other thing I can think of is if you have OSDs locking up and getting
> corrupted, there is a severe XFS bug where the kernel will throw a NULL 
> pointer
> dereference under heavy memory pressure. Again, it's due to memory issues,
> but you will see the message in your kernel logs. It's fixed in upstream 
> kernels as
> of this month. I forget what version exactly. 4.4.0-102?
> https://launchpad.net/bugs/1729256
> 
> Warren Wang
> 
> On 1/23/18, 11:01 PM, "Blair Bethwaite"  wrote:
> 
> +1 to Warren's advice on checking for memory fragmentation. Are you
> seeing kmem allocation failures in dmesg on these hosts?
> 
> On 24 January 2018 at 10:44, Warren Wang 
> wrote:
> > Check /proc/buddyinfo for memory fragmentation. We have some pretty
> severe memory frag issues with Ceph to the point where we keep excessive
> min_free_kbytes configured (8GB), and are starting to order more memory than
> we actually need. If you have a lot of objects, you may find that you need to
> increase vfs_cache_pressure as well, to something like the default of 100.
> >
> > In your buddyinfo, the columns represent the quantity of each page size
> available. So if you only see numbers in the first 2 columns, you only have 
> 4K and
> 8K pages available, and will fail any allocations larger than that. The 
> problem is
> so severe for us that we have stopped using jumbo frames due to dropped
> packets as a result of not being able to DMA map pages that will fit 9K 
> frames.
> >
> > In short, you might have enough memory, but not contiguous. It's even
> worse on RGW nodes.
> >
> > Warren Wang
> >
> > On 1/23/18, 2:56 PM, "ceph-users on behalf of Samuel Taylor Liston" 
>  users-boun...@lists.ceph.com on behalf of sam.lis...@utah.edu> wrote:
> >
> > We have a 9 - node (16 - 8TB OSDs per node) running jewel on centos 
> 7.4.
> The OSDs are configured with encryption.  The cluster is accessed via two -
> RGWs  and there are 3 - mon servers.  The data pool is using 6+3 erasure 
> coding.
> >
> > About 2 weeks ago I found two of the nine servers wedged and had to
> hard power cycle them to get them back.  In this hard reboot 22 - OSDs came
> back with either a corrupted encryption or data partitions.  These OSDs were
> removed and recreated, and the resultant rebalance moved along just fine for
> about a week.  At the end of that week two different nodes were unresponsive
> complaining of page allocation failures.  This is when I realized the nodes 
> were
> heavy into swap.  These nodes were configured with 64GB of RAM as a cost
> saving going against the 1GB per 1TB recommendation.  We have since then
> doubled the RAM in each of the nodes giving each of them more than the 1GB
> per 1TB ratio.
> >
> > The issue I am running into is that these nodes are still swapping; 
> a lot,
> and over time becoming unresponsive, or throwing page allocation failures.  As
> an example, “free” will show 15GB of RAM usage (out of 128GB) and 32GB of
> swap.  I have configured swappiness to 0 and and also turned up the
> vm.min_free_kbytes to 4GB to try to keep the kernel happy, and yet I am still
> filling up swap.  It only occurs when the OSDs have mounted partitions and 
> ceph-
> osd daemons active.
> >
> > Anyone have an idea where this swap usage might be coming from?
> > Thanks for any insight,
> >
> > Sam Liston (sam.lis...@utah.edu)
> > 
> > Center for High Performance Computing
> > 155 S. 1452 E. Rm 405
> > Salt Lake City, Utah 84112 (801)232-6932
> > 
> >
> >
> >
> > 

[ceph-users] Signature check failures.

2018-01-24 Thread Cary
Hello,

 We are running Luminous 12.2.2: 6 OSD hosts, each with 12 x 1 TB drives and
64 GB RAM, and each host with an SSD for BlueStore's block.wal and block.db.
There are 5 monitor nodes as well, each with 32 GB RAM. All servers run
Gentoo with kernel 4.12.12-gentoo.

When I export an image using:
rbd export pool-name/volume-name  /location/image-name.raw

The following messages show up. The signature check fails randomly.
The image is still exported successfully.

2018-01-24 17:35:15.616080 7fc8d4024700  0 cephx:
verify_authorizer_reply bad nonce got 4552544084014661633 expected
4552499520046621785 sent 4552499520046621784
2018-01-24 17:35:15.616098 7fc8d4024700  0 --
172.21.32.16:0/1412094654 >> 172.21.32.6:6802/6219 conn(0x7fc8b0078a50
:-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0
l=1)._process_connection failed verifying authorize reply
2018-01-24 17:35:15.699004 7fc8d4024700  0 SIGN: MSG 2 Message
signature does not match contents.
2018-01-24 17:35:15.699020 7fc8d4024700  0 SIGN: MSG 2Signature on message:
2018-01-24 17:35:15.699021 7fc8d4024700  0 SIGN: MSG 2sig:
8189090775647585001
2018-01-24 17:35:15.699047 7fc8d4024700  0 SIGN: MSG 2Locally
calculated signature:
2018-01-24 17:35:15.699048 7fc8d4024700  0 SIGN: MSG 2
sig_check:140500325643792
2018-01-24 17:35:15.699049 7fc8d4024700  0 Signature failed.
2018-01-24 17:35:15.699050 7fc8d4024700  0 --
172.21.32.16:0/1412094654 >> 172.21.32.2:6807/153106
conn(0x7fc8bc020870 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH
pgs=26018 cs=1 l=1).process Signature check failed

Does anyone know what could cause this, and what I can do to fix it?
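
For context, these are the signing-related settings I have been checking on
our side (a sketch; the option list is only what I assume is involved, not a
recommendation to change anything):

# dump the cephx/signing options from a running daemon
ceph daemon osd.0 config show | grep -E 'cephx|crc'
# options involved: cephx_sign_messages, cephx_require_signatures,
# cephx_cluster_require_signatures, cephx_service_require_signatures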

Thank you,

Cary
-Dynamic
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous - bad performance

2018-01-24 Thread Steven Vacaroaia
Hi ,

I have bundled the public NICs and added 2 more monitors (running on 2 of
the 3 OSD hosts).
This seems to improve things, but I still have high latency.
Also, performance of the SSD pool is worse than the HDD pool, which is very
confusing.

The SSD pool is using one Toshiba PX05SMB040Y per server (for a total of 3
OSDs), while the HDD pool is using 2 Seagate ST600MM0006 disks per server
(for a total of 6 OSDs).

Note:
I have also disabled C-states in the BIOS and added "intel_pstate=disable
intel_idle.max_cstate=0 processor.max_cstate=0 idle=poll" to GRUB.

Any hints/suggestions will be greatly appreciated.
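
To narrow it down further, I am also planning to compare a single SSD OSD
against a single HDD OSD directly (a sketch; OSD IDs taken from the tree
below):

# per-OSD write bench (1 GB in 4 MB objects by default)
ceph tell osd.6 bench     # SSD OSD
ceph tell osd.0 bench     # HDD OSD
# commit/apply latencies while client load is running
ceph osd perf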

[root@osd04 ~]# ceph status
  cluster:
id: 37161a51-a159-4895-a7fd-3b0d857f1b66
health: HEALTH_WARN
noscrub,nodeep-scrub flag(s) set
application not enabled on 2 pool(s)
mon osd02 is low on available space

  services:
mon: 3 daemons, quorum osd01,osd02,mon01
mgr: mon01(active)
osd: 9 osds: 9 up, 9 in
 flags noscrub,nodeep-scrub
tcmu-runner: 6 daemons active

  data:
pools:   2 pools, 228 pgs
objects: 50384 objects, 196 GB
usage:   402 GB used, 3504 GB / 3906 GB avail
pgs: 228 active+clean

  io:
client:   46061 kB/s rd, 852 B/s wr, 15 op/s rd, 0 op/s wr

[root@osd04 ~]# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME  STATUS REWEIGHT PRI-AFF
 -9   4.5 root ssds
-10   1.5 host osd01-ssd
  6   hdd 1.5 osd.6  up  1.0 1.0
-11   1.5 host osd02-ssd
  7   hdd 1.5 osd.7  up  1.0 1.0
-12   1.5 host osd04-ssd
  8   hdd 1.5 osd.8  up  1.0 1.0
 -1   2.72574 root default
 -3   1.09058 host osd01
  0   hdd 0.54529 osd.0  up  1.0 1.0
  4   hdd 0.54529 osd.4  up  1.0 1.0
 -5   1.09058 host osd02
  1   hdd 0.54529 osd.1  up  1.0 1.0
  3   hdd 0.54529 osd.3  up  1.0 1.0
 -7   0.54459 host osd04
  2   hdd 0.27229 osd.2  up  1.0 1.0
  5   hdd 0.27229 osd.5  up  1.0 1.0


 rados bench -p ssdpool 300 -t 32 write --no-cleanup && rados bench -p
ssdpool 300 -t 32  seq
Total time run: 302.058832
Total writes made:  4100
Write size: 4194304
Object size:4194304
Bandwidth (MB/sec): 54.2941
Stddev Bandwidth:   70.3355
Max bandwidth (MB/sec): 252
Min bandwidth (MB/sec): 0
Average IOPS:   13
Stddev IOPS:17
Max IOPS:   63
Min IOPS:   0
Average Latency(s): 2.35655
Stddev Latency(s):  4.4346
Max latency(s): 29.7027
Min latency(s): 0.045166

rados bench -p rbd 300 -t 32 write --no-cleanup && rados bench -p rbd 300
-t 32  seq
Total time run: 301.428571
Total writes made:  8753
Write size: 4194304
Object size:4194304
Bandwidth (MB/sec): 116.154
Stddev Bandwidth:   71.5763
Max bandwidth (MB/sec): 320
Min bandwidth (MB/sec): 0
Average IOPS:   29
Stddev IOPS:17
Max IOPS:   80
Min IOPS:   0
Average Latency(s): 1.10189
Stddev Latency(s):  1.80203
Max latency(s): 15.0715
Min latency(s): 0.0210309




[root@osd04 ~]# ethtool -k gth0
Features for gth0:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: on [fixed]
tx-checksum-sctp: on
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: on [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off
hw-tc-offload: off
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]



On 22 January 2018 at 12:09, Steven Vacaroaia  wrote:

> Hi David,
>
> I noticed the public interface of the server I am running the test from is
> heavily 

[ceph-users] Full Ratio

2018-01-24 Thread Karun Josy
Hi,

I am trying to increase the full ratio of OSDs in a cluster.
While adding a new node, one of the new disks got backfilled to more than 95%
and the cluster froze. So I am trying to avoid that from happening again.

I tried the pg set command, but it is not working:
$ ceph pg set_nearfull_ratio 0.88
Error ENOTSUP: this command is obsolete

I had initially increased the full ratio on the OSDs using injectargs, but it
didn't work; when a disk reached 95% it still showed the OSD full status.

$ ceph tell osd.* injectargs '--mon_osd_full_ratio 0.97'
osd.0: mon_osd_full_ratio = '0.97' (not observed, change may require
restart)
osd.1: mon_osd_full_ratio = '0.97' (not observed, change may require
restart)



How can I set the full ratio to more than 95%?

Karun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Full Ratio

2018-01-24 Thread Jean-Charles Lopez
Hi,

if you are using an older Ceph version, note that mon_osd_nearfull_ratio and
mon_osd_full_ratio must be set in the config file on the MON hosts first, and
then the MONs restarted one after the other.

If you are using a recent version, there are the commands ceph osd
set-full-ratio and ceph osd set-nearfull-ratio.
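
On a 12.2.x cluster that would look roughly like this (example ratios only):

ceph osd set-nearfull-ratio 0.88
ceph osd set-backfillfull-ratio 0.92
ceph osd set-full-ratio 0.97
ceph osd dump | grep ratio     # verify the new values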

Regards
JC

> On Jan 24, 2018, at 11:07, Karun Josy  wrote:
> 
> Hi,
> 
> I am trying to increase the full ratio of OSDs in a cluster.
> While adding a new node, one of the new disks got backfilled to more than
> 95% and the cluster froze. So I am trying to avoid that from happening again.
> 
> 
> I tried the pg set command, but it is not working:
> $ ceph pg set_nearfull_ratio 0.88
> Error ENOTSUP: this command is obsolete
> 
> I had initially increased the full ratio on the OSDs using injectargs, but
> it didn't work; when a disk reached 95% it still showed the OSD full status
> 
> $ ceph tell osd.* injectargs '--mon_osd_full_ratio 0.97'
> osd.0: mon_osd_full_ratio = '0.97' (not observed, change may require 
> restart)
> osd.1: mon_osd_full_ratio = '0.97' (not observed, change may require 
> restart)
> 
> 
> 
> How can I set full ratio to more than 95% ? 
> 
> Karun 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous - bad performance

2018-01-24 Thread Marc Roos
 

ceph osd pool application enable XXX rbd

-Original Message-
From: Steven Vacaroaia [mailto:ste...@gmail.com] 
Sent: woensdag 24 januari 2018 19:47
To: David Turner
Cc: ceph-users
Subject: Re: [ceph-users] Luminous - bad performance

Hi ,

I have bundled the public NICs and added 2 more monitors (running on 2
of the 3 OSD hosts). This seems to improve things, but I still have high
latency. Also, performance of the SSD pool is worse than the HDD pool,
which is very confusing.

The SSD pool is using one Toshiba PX05SMB040Y per server (for a total of 3
OSDs), while the HDD pool is using 2 Seagate ST600MM0006 disks per server
(for a total of 6 OSDs).

Note:
I have also disabled C-states in the BIOS and added
"intel_pstate=disable intel_idle.max_cstate=0 processor.max_cstate=0
idle=poll" to GRUB.

Any hints/suggestions will be greatly appreciated.

[root@osd04 ~]# ceph status
  cluster:
id: 37161a51-a159-4895-a7fd-3b0d857f1b66
health: HEALTH_WARN
noscrub,nodeep-scrub flag(s) set
application not enabled on 2 pool(s)
mon osd02 is low on available space

  services:
mon: 3 daemons, quorum osd01,osd02,mon01
mgr: mon01(active)
osd: 9 osds: 9 up, 9 in
 flags noscrub,nodeep-scrub
tcmu-runner: 6 daemons active

  data:
pools:   2 pools, 228 pgs
objects: 50384 objects, 196 GB
usage:   402 GB used, 3504 GB / 3906 GB avail
pgs: 228 active+clean

  io:
client:   46061 kB/s rd, 852 B/s wr, 15 op/s rd, 0 op/s wr

[root@osd04 ~]# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME  STATUS REWEIGHT PRI-AFF
 -9   4.5 root ssds
-10   1.5 host osd01-ssd
  6   hdd 1.5 osd.6  up  1.0 1.0
-11   1.5 host osd02-ssd
  7   hdd 1.5 osd.7  up  1.0 1.0
-12   1.5 host osd04-ssd
  8   hdd 1.5 osd.8  up  1.0 1.0
 -1   2.72574 root default
 -3   1.09058 host osd01
  0   hdd 0.54529 osd.0  up  1.0 1.0
  4   hdd 0.54529 osd.4  up  1.0 1.0
 -5   1.09058 host osd02
  1   hdd 0.54529 osd.1  up  1.0 1.0
  3   hdd 0.54529 osd.3  up  1.0 1.0
 -7   0.54459 host osd04
  2   hdd 0.27229 osd.2  up  1.0 1.0
  5   hdd 0.27229 osd.5  up  1.0 1.0


 rados bench -p ssdpool 300 -t 32 write --no-cleanup && rados bench -p 
ssdpool 300 -t 32  seq

Total time run: 302.058832
Total writes made:  4100
Write size: 4194304
Object size:4194304
Bandwidth (MB/sec): 54.2941
Stddev Bandwidth:   70.3355
Max bandwidth (MB/sec): 252
Min bandwidth (MB/sec): 0
Average IOPS:   13
Stddev IOPS:17
Max IOPS:   63
Min IOPS:   0
Average Latency(s): 2.35655
Stddev Latency(s):  4.4346
Max latency(s): 29.7027
Min latency(s): 0.045166

rados bench -p rbd 300 -t 32 write --no-cleanup && rados bench -p rbd 
300 -t 32  seq
Total time run: 301.428571
Total writes made:  8753
Write size: 4194304
Object size:4194304
Bandwidth (MB/sec): 116.154
Stddev Bandwidth:   71.5763
Max bandwidth (MB/sec): 320
Min bandwidth (MB/sec): 0
Average IOPS:   29
Stddev IOPS:17
Max IOPS:   80
Min IOPS:   0
Average Latency(s): 1.10189
Stddev Latency(s):  1.80203
Max latency(s): 15.0715
Min latency(s): 0.0210309




[root@osd04 ~]# ethtool -k gth0
Features for gth0:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: on [fixed]
tx-checksum-sctp: on
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: on [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fw

[ceph-users] Scrub mismatch since upgrade to Luminous (12.2.2)

2018-01-24 Thread hans
Since upgrading to Ceph Luminous (12.2.2) from Jewel, we get scrub mismatch
errors every day at the same time (19:25). How can we fix them? It seems to
be the same problem as described at
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-December/023202.html
(we can't reply to archived messages), but we don't use MDS.


We can force the error by running ceph scrub. Ceph health shows HEALTH_OK
all the time.



/var/log/ceph/ceph.log:

2018-01-24 19:25:17.627962 mon.stor01 mon.0 192.168.110.131:6789/0 7015 
: cluster [ERR] scrub mismatch
2018-01-24 19:25:17.627981 mon.stor01 mon.0 192.168.110.131:6789/0 7016 
: cluster [ERR]  mon.0 ScrubResult(keys 
{mgrstat=23,monmap=46,osd_metadata=25,osd_pg_creating=1,osdmap=5} crc 
{mgrstat=947634900,monmap=2803006433,osd_metadata=1876741065,osd_pg_creating=2944932770,osdmap=77513567})
2018-01-24 19:25:17.628001 mon.stor01 mon.0 192.168.110.131:6789/0 7017 
: cluster [ERR]  mon.1 ScrubResult(keys 
{mgrstat=23,monmap=46,osd_metadata=25,osd_pg_creating=1,osdmap=5} crc 
{mgrstat=947634900,monmap=463219445,osd_metadata=1876741065,osd_pg_creating=2944932770,osdmap=77513567})
2018-01-24 19:25:17.628014 mon.stor01 mon.0 192.168.110.131:6789/0 7018 
: cluster [ERR] scrub mismatch
2018-01-24 19:25:17.628029 mon.stor01 mon.0 192.168.110.131:6789/0 7019 
: cluster [ERR]  mon.0 ScrubResult(keys 
{mgrstat=23,monmap=46,osd_metadata=25,osd_pg_creating=1,osdmap=5} crc 
{mgrstat=947634900,monmap=2803006433,osd_metadata=1876741065,osd_pg_creating=2944932770,osdmap=77513567})
2018-01-24 19:25:17.628040 mon.stor01 mon.0 192.168.110.131:6789/0 7020 
: cluster [ERR]  mon.4 ScrubResult(keys 
{mgrstat=23,monmap=46,osd_metadata=25,osd_pg_creating=1,osdmap=5} crc 
{mgrstat=947634900,monmap=463219445,osd_metadata=1876741065,osd_pg_creating=2944932770,osdmap=77513567})
2018-01-23 19:25:17.348731 mon.stor01 mon.0 192.168.110.131:6789/0 5199 
: cluster [ERR] scrub mismatch
2018-01-23 19:25:17.348751 mon.stor01 mon.0 192.168.110.131:6789/0 5200 
: cluster [ERR]  mon.0 ScrubResult(keys 
{mgrstat=43,monmap=46,osd_metadata=11} crc 
{mgrstat=3495389473,monmap=2803006433,osd_metadata=1919422177})
2018-01-23 19:25:17.348764 mon.stor01 mon.0 192.168.110.131:6789/0 5201 
: cluster [ERR]  mon.1 ScrubResult(keys 
{mgrstat=43,monmap=46,osd_metadata=11} crc 
{mgrstat=3495389473,monmap=463219445,osd_metadata=1919422177})
2018-01-23 19:25:17.348772 mon.stor01 mon.0 192.168.110.131:6789/0 5202 
: cluster [ERR] scrub mismatch
2018-01-23 19:25:17.348780 mon.stor01 mon.0 192.168.110.131:6789/0 5203 
: cluster [ERR]  mon.0 ScrubResult(keys 
{mgrstat=43,monmap=46,osd_metadata=11} crc 
{mgrstat=3495389473,monmap=2803006433,osd_metadata=1919422177})
2018-01-23 19:25:17.348795 mon.stor01 mon.0 192.168.110.131:6789/0 5204 
: cluster [ERR]  mon.4 ScrubResult(keys 
{mgrstat=43,monmap=46,osd_metadata=11} crc 
{mgrstat=3495389473,monmap=463219445,osd_metadata=1919422177})
2018-01-22 19:25:17.197627 mon.stor01 mon.0 192.168.110.131:6789/0 3406 
: cluster [ERR] scrub mismatch
2018-01-22 19:25:17.197649 mon.stor01 mon.0 192.168.110.131:6789/0 3407 
: cluster [ERR]  mon.0 ScrubResult(keys 
{mgrstat=24,monmap=46,osd_metadata=25,osd_pg_creating=1,osdmap=4} crc 
{mgrstat=1702457410,monmap=2803006433,osd_metadata=1876741065,osd_pg_creating=1008471957,osdmap=4220436819})
2018-01-22 19:25:17.197663 mon.stor01 mon.0 192.168.110.131:6789/0 3408 
: cluster [ERR]  mon.1 ScrubResult(keys 
{mgrstat=24,monmap=46,osd_metadata=25,osd_pg_creating=1,osdmap=4} crc 
{mgrstat=1702457410,monmap=463219445,osd_metadata=1876741065,osd_pg_creating=1008471957,osdmap=4220436819})
2018-01-22 19:25:17.197678 mon.stor01 mon.0 192.168.110.131:6789/0 3409 
: cluster [ERR] scrub mismatch
2018-01-22 19:25:17.197691 mon.stor01 mon.0 192.168.110.131:6789/0 3410 
: cluster [ERR]  mon.0 ScrubResult(keys 
{mgrstat=24,monmap=46,osd_metadata=25,osd_pg_creating=1,osdmap=4} crc 
{mgrstat=1702457410,monmap=2803006433,osd_metadata=1876741065,osd_pg_creating=1008471957,osdmap=4220436819})
2018-01-22 19:25:17.197707 mon.stor01 mon.0 192.168.110.131:6789/0 3411 
: cluster [ERR]  mon.4 ScrubResult(keys 
{mgrstat=24,monmap=46,osd_metadata=25,osd_pg_creating=1,osdmap=4} crc 
{mgrstat=1702457410,monmap=463219445,osd_metadata=1876741065,osd_pg_creating=1008471957,osdmap=4220436819})



/var/log/ceph/ceph-mon.log:

2018-01-24 19:25:17.627955 7f4bd191e700 -1 log_channel(cluster) log 
[ERR] : scrub mismatch
2018-01-24 19:25:17.627979 7f4bd191e700 -1 log_channel(cluster) log 
[ERR] :  mon.0 ScrubRes

Re: [ceph-users] Full Ratio

2018-01-24 Thread Karun Josy
Thank you!

Ceph version is 12.2

Also, can you let me know the format to set  osd_backfill_full_ratio ?

Is it  " ceph osd   set   -backfillfull-ratio .89 " ?









Karun Josy

On Thu, Jan 25, 2018 at 1:29 AM, Jean-Charles Lopez 
wrote:

> Hi,
>
> if you are using an older Ceph version note that the
> mon_osd_near_full_ration and mon_osd_full_ration must be set in the config
> file on the MON hosts first and then the MONs restarted one after the other
> one.
>
> If using a recent  version there is a command ceph osd set-full-ratio and
> ceph osd set-nearfull-ratio
>
> Regards
> JC
>
> > On Jan 24, 2018, at 11:07, Karun Josy  wrote:
> >
> > Hi,
> >
> > I am trying to increase the full ratio of OSDs in a cluster.
> > While adding a new node one of the new disk got backfilled to more than
> 95% and cluster freezed. So I am trying to avoid it from happening again.
> >
> >
> > Tried pg set command but it is not working :
> > $ ceph pg set_nearfull_ratio 0.88
> > Error ENOTSUP: this command is obsolete
> >
> > I had increased the full ratio in osds using injectargs initially but it
> didnt work as when the disk reached 95% it showed osd full status
> >
> > $ ceph tell osd.* injectargs '--mon_osd_full_ratio 0.97'
> > osd.0: mon_osd_full_ratio = '0.97' (not observed, change may require
> restart)
> > osd.1: mon_osd_full_ratio = '0.97' (not observed, change may require
> restart)
> > 
> > 
> >
> > How can I set full ratio to more than 95% ?
> >
> > Karun
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD servers swapping despite having free memory capacity

2018-01-24 Thread Blair Bethwaite
On 25 January 2018 at 04:53, Warren Wang  wrote:
> The other thing I can think of is if you have OSDs locking up and getting 
> corrupted, there is a severe XFS bug where the kernel will throw a NULL 
> pointer dereference under heavy memory pressure. Again, it's due to memory 
> issues, but you will see the message in your kernel logs.

I think that's the one I was thinking of too; IIRC it only happens with
larger XFS directory block sizes though, and at some point it must have been
default or recommended to use "-n size=64k" with Ceph on filesystem
creation. We first hit that while my whole team was sitting in the
keynotes of the OpenStack Summit in Sydney... good times o_0.

-- 
Cheers,
~Blairo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Full Ratio

2018-01-24 Thread QR
ceph osd set-backfillfull-ratio <ratio>



 Original message 
From: Karun Josy
To: Jean-Charles Lopez
Cc: ceph-users@lists.ceph.com
Sent: 2018-01-25 (Thu) 04:42
Subject: Re: [ceph-users] Full Ratio

Thank you!
Ceph version is 12.2
Also, can you let me know the format to set  osd_backfill_full_ratio ?
Is it  " ceph osd   set   -backfillfull-ratio .89 " ?







Karun Josy
On Thu, Jan 25, 2018 at 1:29 AM, Jean-Charles Lopez  wrote:
Hi,

if you are using an older Ceph version note that the mon_osd_near_full_ration 
and mon_osd_full_ration must be set in the config file on the MON hosts first 
and then the MONs restarted one after the other one.

If using a recent  version there is a command ceph osd set-full-ratio and ceph 
osd set-nearfull-ratio

Regards
JC

> On Jan 24, 2018, at 11:07, Karun Josy  wrote:
>
> Hi,
>
> I am trying to increase the full ratio of OSDs in a cluster.
> While adding a new node one of the new disk got backfilled to more than 95% 
> and cluster freezed. So I am trying to avoid it from happening again.
>
>
> Tried pg set command but it is not working :
> $ ceph pg set_nearfull_ratio 0.88
> Error ENOTSUP: this command is obsolete
>
> I had increased the full ratio in osds using injectargs initially but it 
> didnt work as when the disk reached 95% it showed osd full status
>
> $ ceph tell osd.* injectargs '--mon_osd_full_ratio 0.97'
> osd.0: mon_osd_full_ratio = '0.97' (not observed, change may require 
> restart)
> osd.1: mon_osd_full_ratio = '0.97' (not observed, change may require 
> restart)
> 
> 
>
> How can I set full ratio to more than 95% ?
>
> Karun
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ideal Bluestore setup

2018-01-24 Thread Alex Gorbachev
Hi Ean,

I don't have any experience with fewer than 8 drives per OSD node, and
the setup heavily depends on what you want to use it for. Assuming a
small proof of concept without much performance requirement (due to the
low spindle count), I would do this:

On Mon, Jan 22, 2018 at 1:28 PM, Ean Price  wrote:
> Hi folks,
>
> I’m not sure the ideal setup for bluestore given the set of hardware I have 
> to work with so I figured I would ask the collective wisdom of the ceph 
> community. It is a small deployment so the hardware is not all that 
> impressive, but I’d still like to get some feedback on what would be the 
> preferred and most maintainable setup.
>
> We have 5 ceph OSD hosts with the following setup:
>
> 16 GB RAM
> 1 PCI-E NVRAM 128GB
> 1 SSD 250 GB
> 2 HDD 1 TB each
>
> I was thinking to put:
>
> OS on NVRAM with 2x20 GB partitions for bluestore’s WAL and rocksdb

I would put the OS on the SSD and not colocate it with the WAL/DB. I would
put the WAL/DB on the NVMe drive, as it is the fastest device.

> And either use bcache with the SSD to cache the 2x HDDs or possibly use 
> Ceph’s built in cache tiering.

Ceph cache tiering is likely out of the range of this setup, and
requires a very clear understanding of the workload.  I would not use
it.

No experience with bcache, but again it seems to be a bit of overkill for
a small setup like this. Simple = stable.

>
> My questions are:
>
> 1) is a 20GB logical volume adequate for the WAL and db with a 1TB HDD or 
> should it be larger?

I believe so, yes.  If it spills over, the data will just go onto the drives.

>
> 2) or - should I put the rocksdb on the SSD and just leave the WAL on the 
> NVRAM device?

You are likely better off with WAL and DB on the NVRAM
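
As a sketch of what that layout would look like with ceph-volume (device
names are placeholders; the NVRAM partitions would have to be created
beforehand):

# one HDD OSD with DB and WAL carved out of the NVRAM device
ceph-volume lvm create --bluestore --data /dev/sdb \
    --block.db /dev/nvram0p2 --block.wal /dev/nvram0p3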

>
> 3) Lastly, what are the downsides of bcache vs Ceph’s cache tiering? I see 
> both are used in production so I’m not sure which is the better choice for us.
>
> Performance is, of course, important but maintainability and stability are 
> definitely more important.

I would avoid both bcache and tiering to simplify the configuration, and
would seriously consider larger nodes and more OSD drives if possible.

HTH,
--
Alex Gorbachev
Storcium

>
> Thanks in advance for your advice!
>
> Best,
> Ean
>
>
>
>
>
> --
> __
>
> This message contains information which may be confidential.  Unless you
> are the addressee (or authorized to receive for the addressee), you may not
> use, copy, or disclose to anyone the message or any information contained
> in the message.  If you have received the message in error, please advise
> the sender by reply e-mail or contact the sender at Price Paper & Twine
> Company by phone at (516) 378-7842 and delete the message.  Thank you very
> much.
>
> __
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cache-tier forward mode hang in luminous

2018-01-24 Thread TYLin
Hi all,

Has anyone tried setting a cache tier to forward mode in Luminous 12.2.1? Our
cluster cannot write to the rados pool once the mode is set to forward. We set
up the cache tier with forward mode and then ran rados bench. However, the
throughput from rados bench is 0, and iostat shows no disk usage either.
Kraken works fine with forward mode.
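
For reference, this is how we switch the mode and reproduce the stall (pool
names are ours; the --yes-i-really-mean-it flag may be required depending on
the exact 12.2.x build):

ceph osd tier cache-mode hotpool forward --yes-i-really-mean-it
ceph osd dump | grep cache_mode        # confirm the mode took effect
rados -p coldpool bench 30 write       # throughput stays at 0 for us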

Environment:
* ceph/daemon container
* ceph 12.2.1
* 6 OSDs in total: 3 HDD, 3 SSD, all of them BlueStore

The logs are uploaded through ceph-post-file with tag 
fb8cb1f2-703b-4866-92b5-52c42974aac3

Thanks,
Ting-Yi Lin

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com