[ceph-users] Re: "Signature check failed" from certain clients

2019-08-15 Thread Hector Martin

On 15/08/2019 11.42, Peter Sarossy wrote:

hey folks,

I spent the past 2 hours digging through the forums and similar sources with no 
luck..

I use Ceph storage for Docker stacks, and this issue has taken the whole thing 
down, as I cannot mount their volumes back...
Starting yesterday, some of my nodes cannot mount the filesystem: the mount 
command just hangs, while the logs are full of the messages below. It doesn't 
matter which MDS node is active; for some clients it just works, and for 
others it doesn't.

Any hints?

2019-08-15 02:20:55.230 7f6e80e47700  0 --1- 
[v2:10.0.0.8:6800/859279545,v1:10.0.0.8:6801/859279545] >> 
v1:10.0.0.1:0/844039356 conn(0x564928535000 0x56492851b800 :6801 
s=READ_FOOTER_AND_DISPATCH pgs=13406 cs=3203 l=0).handle_message_footer Signature 
check failed
2019-08-15 02:20:56.254 7f6e7fe45700  0 SIGN: MSG 1 Message signature does not 
match contents.
2019-08-15 02:20:56.254 7f6e7fe45700  0 SIGN: MSG 1Signature on message:
2019-08-15 02:20:56.254 7f6e7fe45700  0 SIGN: MSG 1sig: 1045888080092928376
2019-08-15 02:20:56.254 7f6e7fe45700  0 SIGN: MSG 1Locally calculated signature:
2019-08-15 02:20:56.254 7f6e7fe45700  0 SIGN: MSG 1
sig_check:13982737427498638198
2019-08-15 02:20:56.254 7f6e7fe45700  0 Signature failed.
2019-08-15 02:20:56.254 7f6e7fe45700  0 --1- 
[v2:10.0.0.8:6800/859279545,v1:10.0.0.8:6801/859279545] >> 
v1:10.0.0.1:0/844039356 conn(0x564928535000 0x56492851b800 :6801 
s=READ_FOOTER_AND_DISPATCH pgs=13410 cs=3205 l=0).handle_message_footer Signature 
check failed
2019-08-15 02:20:57.246 7f6e80646700  0 SIGN: MSG 1 Message signature does not 
match contents.
2019-08-15 02:20:57.246 7f6e80646700  0 SIGN: MSG 1Signature on message:
2019-08-15 02:20:57.246 7f6e80646700  0 SIGN: MSG 1sig: 1045888080092928376
2019-08-15 02:20:57.246 7f6e80646700  0 SIGN: MSG 1Locally calculated signature:
2019-08-15 02:20:57.246 7f6e80646700  0 SIGN: MSG 1
sig_check:13982737427498638198
2019-08-15 02:20:57.246 7f6e80646700  0 Signature failed.
2019-08-15 02:20:57.246 7f6e80646700  0 --1- 
[v2:10.0.0.8:6800/859279545,v1:10.0.0.8:6801/859279545] >> 
v1:10.0.0.1:0/844039356 conn(0x564928535000 0x56492851b800 :6801 
s=READ_FOOTER_AND_DISPATCH pgs=13414 cs=3207 l=0).handle_message_footer Signature 
check failed


Are you using Canonical Livepatch? Looks like they didn't test a recent 
patch and broke CephFS in the process. You'll probably have to reboot 
the affected clients.


http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-August/036513.html
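
(If you want to confirm before rebooting: assuming the standard snap client is 
installed, the currently applied patch state on a client should show up with

  canonical-livepatch status

and a patch applied around the time the mounts broke would be a strong hint.)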

--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Failing heartbeats when no backfill is running

2019-08-15 Thread Lorenz Kiefner
Oh no, it's not that bad. It's

$ ping -s 65000 dest.inati.on

on a VPN connection that has an MTU of 1300 via IPv6. So I suspect that I
only get an answer when all 51 fragments get through. It's clear that big
packets with lots of fragments are hit much harder by packet loss than
64-byte pings: even at only 0.5% per-fragment loss, the chance that all 51
fragments survive is about 0.995^51 ≈ 77%, so roughly a quarter of the big
pings fail.
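
A way to separate raw loss from fragmentation loss (just a sketch, assuming
iputils ping; 1200 bytes plus headers still fits into the 1300 MTU) is to
compare unfragmented and fragmented pings side by side:

$ ping -c 100 -s 1200 dest.inati.on    # fits the MTU, no fragmentation
$ ping -c 100 -s 65000 dest.inati.on   # ~51 fragments per echo request

If the small pings lose well under 1% while the 64k ones lose ~20%, that
matches the all-fragments-must-arrive explanation above.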

I just (at 9 o'clock in the morning) repeated this ping test and got
hardly any drops (less than 1%), even at the 64k size. So it really does
depend on the time of day. It seems like some ISPs are dropping packets,
especially in the evening...

A few minutes ago I restarted all down-marked OSDs, but they are getting
marked down again... Ceph seems to be reasonably tolerant of packet loss
(it surely affects performance, but that is irrelevant for me).


Could erasure coded pools pose some problems?


Thank you all for every hint!

Lorenz


On 15.08.19 at 08:51, Janne Johansson wrote:
> On Wed, 14 Aug 2019 at 17:46, Lorenz Kiefner
> <root+cephus...@deinadmin.de> wrote:
>
> Is ceph sensitive to packet loss? On some VPN links I have up to 20%
> packet loss on 64k packets but less than 3% on 5k packets in the
> evenings.
>
>
> 20% seems crazy high, there must be something really wrong there.
>
> At 20%, you would get tons of packet timeouts to wait for on all those
> lost frames, then resends of (at least!) those 20% extra, which in turn
> would lead to 20% of those resends getting lost, all while the main
> streams of data try to move forward whenever some older packet does get
> through. This is a really bad situation to design for.
>
> I think you should look for a link solution that doesn't drop that many
> packets, instead of changing the software you try to run over that link;
> everything else you run over it will notice this too and act badly in
> some way or other.
>
> Heck, 20% is like taking a math schoolbook, removing all instances of
> "3" and "8", and seeing if kids can learn to count from it. 8-/
>  
> -- 
> May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] rgw luminous 12.2.12

2019-08-15 Thread Marc Roos


FYI, I just had an issue with radosgw / civetweb. I wanted to upload a 40 MB 
file; the transfer started out slow and kept getting slower, down to 20 KB/s 
by the time I stopped it. I had to kill radosgw and start it again to get 
'normal' operation back.






___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mgr stability

2019-08-15 Thread Mykola Golub
On Wed, Aug 14, 2019 at 12:12:36PM -0500, Reed Dier wrote:

> My main metrics source is the influx plugin, but I enabled the
> prometheus plugin to get access to the per-rbd image metrics.  I may
> disable prometheus and see if that yields better stability, until
> possibly the influx plugin gets updated to support those metric
> exports.

Before disabling the prometheus plugin, could you try just disabling the
per-rbd image metrics (i.e. set the rbd_stats_pools param to empty)?
Per-rbd image stats are a new feature and can be heavy depending on your
cluster size and image count, so it would be nice to check this first.

I also see you have the rbd_support module enabled. It would be good to
have it temporarily disabled during this experiment too.
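
If it helps, assuming the usual option and module names on Nautilus (please
double-check them for your version), that would be something like:

 ceph config set mgr mgr/prometheus/rbd_stats_pools ""
 ceph mgr module disable rbd_support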

-- 
Mykola Golub
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to tune the ceph balancer in nautilus

2019-08-15 Thread Manuel Lausch
Hi,

I am playing around with the ceph balancer in Luminous and Nautilus. While
tuning some balancer settings I ran into problems with Nautilus.

In Luminous I could configure the max_misplaced value like this:
 ceph config-key set mgr/balancer/max_misplaced 0.002

With the same command in nautilus I get this Warning:
 WARNING: it looks like you might be trying to set a ceph-mgr module
 configuration key.  Since Ceph 13.0.0 (Mimic), mgr module configuration
 is done with `config set`, and new values set using `config-key set`
 will be ignored.

After investigating the nautilus documentation
(https://docs.ceph.com/docs/nautilus/rados/operations/balancer/#throttling)
I tried this:
 ceph config set mgr mgr/balancer/max_misplaced 0.002

and get this error:
 Error EINVAL: unrecognized config option 'mgr/balancer/max_misplaced'

My question is: how can I configure these parameters? In general, the
whole "ceph config" stuff confuses me a bit.
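
One thing I have not tried yet (just a guess from skimming newer docs, so it
may well be wrong) is that the value might have moved to a plain mgr option:

 ceph config set mgr target_max_misplaced_ratio 0.002
 ceph config get mgr target_max_misplaced_ratio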


ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus
(stable)

Regards
Manuel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Upgrade luminous -> mimic, any pointers?

2019-08-15 Thread Marc Roos



I have a fairly dormant Ceph Luminous cluster on CentOS 7 with the stock 
kernel, and thought about upgrading it before putting it to more use. 
I remember some page on the Ceph website that had specific instructions 
for upgrading from Luminous, but I can't find it anymore; this page [0] 
even says I am viewing an unsupported version of Ceph??

Any pointers aside from the default upgrade procedure? 
- I am using rbd, rgw and cephfs (only one mds)
- Still using direct block devices (not lvm)
- I have had one pg active+clean+inconsistent for a year or so, which I am 
getting sentimentally attached to ;)


[0]
https://docs.ceph.com/docs/mimic/install/upgrading-ceph/#

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mgr stability

2019-08-15 Thread Reed Dier
I had already disabled the prometheus plugin (again, I was only using it for the 
rbd stats), but I will also remove the rbd pool from the rbd_support module, as 
well as disable the rbd_support module itself.

It seems slightly more stable so far, but still not as rock solid as it was before.

Thanks,

Reed

> On Aug 15, 2019, at 8:10 AM, Mykola Golub  wrote:
> 
> On Wed, Aug 14, 2019 at 12:12:36PM -0500, Reed Dier wrote:
> 
>> My main metrics source is the influx plugin, but I enabled the
>> prometheus plugin to get access to the per-rbd image metrics.  I may
>> disable prometheus and see if that yields better stability, until
>> possibly the influx plugin gets updated to support those metric
>> exports.
> 
> Before disabling the prometheus plugin, could you try just disabling the
> per-rbd image metrics (i.e. set the rbd_stats_pools param to empty)?
> Per-rbd image stats are a new feature and can be heavy depending on your
> cluster size and image count, so it would be nice to check this first.
> 
> I also see you have the rbd_support module enabled. It would be good to
> have it temporarily disabled during this experiment too.
> 
> -- 
> Mykola Golub



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgrade luminous -> mimic, any pointers?

2019-08-15 Thread Marc Roos


Pfff, you are right, I don't even know which one is the latest release; 
Nautilus it is.



-Original Message-
Subject: Re: [ceph-users] Upgrade luminous -> mimic, any pointers?

Why would you go to Mimic instead of Nautilus?


> 
> 
> 
> I have a fairly dormant Ceph Luminous cluster on CentOS 7 with the stock 
> kernel, and thought about upgrading it before putting it to more use.
> I remember some page on the Ceph website that had specific instructions 
> for upgrading from Luminous, but I can't find it anymore; this page [0] 
> even says I am viewing an unsupported version of Ceph??
> 
> Any pointers aside from the default upgrade procedure? 
> - I am using rbd, rgw and cephfs (only one mds)
> - Still using direct block devices (not lvm)
> - I have had one pg active+clean+inconsistent for a year or so, which I 
> am getting sentimentally attached to ;)
> 
> 
> [0]
> https://docs.ceph.com/docs/mimic/install/upgrading-ceph/#
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
> email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgrade luminous -> nautilus, any pointers?

2019-08-15 Thread Marc Roos
 

I have a fairly dormant Ceph Luminous cluster on CentOS 7 with the stock 
kernel, and thought about upgrading it before putting it to more use. 
I remember some page on the Ceph website that had specific instructions 
for upgrading from Luminous, but I can't find it anymore; this page [0] 
even says I am viewing an unsupported version of Ceph??

Any pointers aside from the default upgrade procedure? 
- I am using rbd, rgw and cephfs (only one mds)
- Still using direct block devices (not lvm)
- I have had one pg active+clean+inconsistent for a year or so, which I am 
getting sentimentally attached to ;)


[0]
https://docs.ceph.com/docs/mimic/install/upgrading-ceph/#
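
For reference, the rough order I would expect to follow (a sketch from memory
of the Nautilus release notes, to be double-checked against the official
upgrade instructions before doing anything):

 ceph osd set noout
 # upgrade packages and restart daemons in order: mons, mgrs, osds, then mds/rgw
 ceph versions                           # confirm all daemons report 14.2.x
 ceph osd require-osd-release nautilus
 ceph mon enable-msgr2
 ceph osd unset noout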


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mapped rbd is very slow

2019-08-15 Thread Vitaliy Filippov
rbd -p kube bench kube/bench --io-type write --io-threads 1 --io-total  
10G --io-pattern rand

elapsed:14  ops:   262144  ops/sec: 17818.16  bytes/sec: 72983201.32


It's a totally unreal number. Something is wrong with the test.

Test it with `fio` please:

fio -ioengine=rbd -name=test -bs=4k -iodepth=1 -rw=randwrite -runtime=60  
-pool=kube -rbdname=bench



Reads are very very slow:
elapsed:   445  ops:81216  ops/sec:   182.37  bytes/sec: 747006.15
elapsed:14  ops:14153  ops/sec:   957.57  bytes/sec: 3922192.15


This is closer to reality.
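
If you want to measure reads the same way, the matching run (same pool and
image as above) would be:

fio -ioengine=rbd -name=test -bs=4k -iodepth=1 -rw=randread -runtime=60 -pool=kube -rbdname=bench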

--
With best regards,
  Vitaliy Filippov
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Small HDD cluster, switch from Bluestore to Filestore

2019-08-15 Thread Rich Bade
Unfortunately the SCSI reset on this VM happened again last night, so this 
hasn't resolved the issue. 
Thanks for the suggestion though.

Rich
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Small HDD cluster, switch from Bluestore to Filestore

2019-08-15 Thread Robert LeBlanc
The overall latency in the cluster may be too high, but it was worth a
shot. I've noticed that these settings really narrow the latency
distribution so that it becomes more predictable, and they prevented some
single VMs from hanging for long periods while others worked just fine,
usually when one drive was at a constant 100%.

Did you notice any improvement in performance or system utilization in
other ways?

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Thu, Aug 15, 2019 at 3:49 PM Rich Bade  wrote:

> Unfortunately the SCSI reset on this VM happened again last night, so this
> hasn't resolved the issue.
> Thanks for the suggestion though.
>
> Rich
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io