[ceph-users] RDMA

2019-10-14 Thread gabryel . mason-williams
Hello,

I was wondering what your experience has been with running Ceph over RDMA (a rough sketch of the kind of setup I mean follows the questions below):
  - How did you set it up?
  - What documentation did you use to set it up?
  - What known issues did you hit while using it?
  - Do you still use it?
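
For context, the sort of configuration I'm asking about is roughly the following ceph.conf fragment. This is only a sketch on my side: the option names are the async messenger RDMA settings as I understand them, and the device name is just an example, so both may differ per release and NIC.

[global]
# assumption: the async messenger's RDMA backend is available in your build
ms_type = async+rdma
# RDMA-capable NIC to bind to -- "mlx5_0" is only an example device name
ms_async_rdma_device_name = mlx5_0
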
Kind regards

Gabryel Mason-Williams
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS rejects clients causing hanging mountpoint on linux kernel client

2019-10-14 Thread Florian Pritz
On Wed, Oct 02, 2019 at 10:24:41PM +0800, "Yan, Zheng"  
wrote:
> Can you reproduce this. If you can, run 'ceph daemon mds.x session ls'
> before restart mds.

I just managed to run into this issue again. 'ceph daemon mds.x session
ls' doesn't work because apparently our setup doesn't have the admin
socket in the expected place. I've therefore used 'ceph tell mds.0
session ls', which I think should be the same except for how the daemon
is contacted.
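
For the record, the variants in question (the asok path below is only the default location, which is not where ours ends up):

$ ceph daemon mds.x session ls                             # needs the admin socket at its default path
$ ceph daemon /var/run/ceph/ceph-mds.x.asok session ls     # pointing at the socket file directly also works
$ ceph tell mds.0 session ls                               # what I used; goes over the network instead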

When the issue happens and 2 clients are hanging, 'ceph tell mds.0
session ls' shows only 9 clients instead of 11. The hanging clients are
missing from the list. Once they are rebooted they show up in the
output.

On a potentially interesting note: The clients that were hanging this
time are the same ones as last time. They aren't set up any differently
from the others as far as I can tell though.

Florian


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Recurring issue: PG is inconsistent, but lists no inconsistent objects

2019-10-14 Thread Florian Haas
Hello,

I am running into an "interesting" issue with a PG that is being flagged
as inconsistent during scrub (causing the cluster to go to HEALTH_ERR),
but doesn't actually appear to contain any inconsistent objects.

$ ceph health detail
HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 10.10d is active+clean+inconsistent, acting [15,13]

$ rados list-inconsistent-obj 10.10d
{"epoch":12138,"inconsistents":[]}

"ceph pg query" (see below) on that PG does report num_scrub_errors=1,
num_shallow_scrub_errors=1, and num_objects_dirty=1. "osd scrub auto
repair = true" is set on all OSDs, but the PG never auto-repairs. (This
is a test cluster, the pool size is 2 — this may preclude auto repair
from ever kicking in; I'm not sure on that one.)

"ceph pg repair" does repair, but the issue reappears on the next
scheduled scrub.
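
For completeness, the cycle I go through each time is essentially the following, nothing exotic:

$ ceph pg repair 10.10d        # clears the scrub error
$ ceph pg deep-scrub 10.10d    # re-scrub right away to confirm the PG is clean again
$ ceph health detail           # back to HEALTH_OK, until the next scheduled scrub trips it again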

This issue was first discovered while the cluster was on
Jewel/Filestore. In an event like this I would normally suspect either a
problem with an individual OSD, or a bug in the FileStore code. But the
cluster has had *all* of its OSDs replaced since, as part of a full
Jewel→Luminous→Nautilus upgrade and a FileStore→BlueStore conversion.
The issue still persists.

A full "ceph pg 10.10d query" result is below. If anyone has ideas on
how to permanently fix this issue, I'd be most grateful.

Thanks!

Cheers,
Florian




{
"state": "active+clean+inconsistent",
"snap_trimq": "[]",
"snap_trimq_len": 0,
"epoch": 12143,
"up": [
15,
13
],
"acting": [
15,
13
],
"acting_recovery_backfill": [
"13",
"15"
],
"info": {
"pgid": "10.10d",
"last_update": "100'11",
"last_complete": "100'11",
"log_tail": "0'0",
"last_user_version": 11,
"last_backfill": "MAX",
"last_backfill_bitwise": 0,
"purged_snaps": [],
"history": {
"epoch_created": 45,
"epoch_pool_created": 45,
"last_epoch_started": 12139,
"last_interval_started": 12138,
"last_epoch_clean": 12139,
"last_interval_clean": 12138,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 12138,
"same_interval_since": 12138,
"same_primary_since": 12114,
"last_scrub": "100'11",
"last_scrub_stamp": "2019-10-14 08:33:57.347097",
"last_deep_scrub": "100'11",
"last_deep_scrub_stamp": "2019-10-11 14:09:29.016946",
"last_clean_scrub_stamp": "2019-10-11 14:09:29.016946"
},
"stats": {
"version": "100'11",
"reported_seq": "4927",
"reported_epoch": "12143",
"state": "active+clean+inconsistent",
"last_fresh": "2019-10-14 08:33:57.347147",
"last_change": "2019-10-14 08:33:57.347147",
"last_active": "2019-10-14 08:33:57.347147",
"last_peered": "2019-10-14 08:33:57.347147",
"last_clean": "2019-10-14 08:33:57.347147",
"last_became_active": "2019-10-11 14:44:09.312226",
"last_became_peered": "2019-10-11 14:44:09.312226",
"last_unstale": "2019-10-14 08:33:57.347147",
"last_undegraded": "2019-10-14 08:33:57.347147",
"last_fullsized": "2019-10-14 08:33:57.347147",
"mapping_epoch": 12138,
"log_start": "0'0",
"ondisk_log_start": "0'0",
"created": 45,
"last_epoch_clean": 12139,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "100'11",
"last_scrub_stamp": "2019-10-14 08:33:57.347097",
"last_deep_scrub": "100'11",
"last_deep_scrub_stamp": "2019-10-11 14:09:29.016946",
"last_clean_scrub_stamp": "2019-10-11 14:09:29.016946",
"log_size": 11,
"ondisk_log_size": 11,
"stats_invalid": false,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": true,
"pin_stats_invalid": true,
"manifest_stats_invalid": true,
"snaptrimq_len": 0,
"stat_sum": {
"num_bytes": 11,
"num_objects": 1,
"num_object_clones": 0,
"num_object_copies": 2,
"num_objects_missing_on_primary": 0,
"num_objects_missing": 0,
"num_objects_degraded": 0,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 1,
"num_whiteouts": 0,
"num_read": 33,
"num_read_kb": 22,
"num_write": 11,
"num_writ

[ceph-users] CephFS and 32-bit Inode Numbers

2019-10-14 Thread Dan van der Ster
Hi all,

One of our users has some 32-bit commercial software that they want to
use with CephFS, but it's not working because our inode numbers are
too large. E.g. his application gets a "file too big" error trying to
stat inode 0x40008445FB3.

I'm aware that CephFS offsets the inode numbers by (mds_rank + 1) *
2^40; in the case above the file is managed by mds.3.
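
As a quick sanity check, shifting off the low 40 bits recovers that rank:

$ printf '%d\n' $(( (0x40008445FB3 >> 40) - 1 ))
3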

Did anyone see this same issue and find a workaround? (I read that
GlusterFS has an enable-ino32 client option -- does CephFS have
something like that planned?)

Thanks!

Dan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Recurring issue: PG is inconsistent, but lists no inconsistent objects

2019-10-14 Thread Dan van der Ster
Hey Florian,

What does the ceph.log ERR or ceph-osd log show for this inconsistency?

-- Dan

On Mon, Oct 14, 2019 at 1:04 PM Florian Haas  wrote:
>
> Hello,
>
> I am running into an "interesting" issue with a PG that is being flagged
> as inconsistent during scrub (causing the cluster to go to HEALTH_ERR),
> but doesn't actually appear to contain any inconsistent objects.
>
> $ ceph health detail
> HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
> OSD_SCRUB_ERRORS 1 scrub errors
> PG_DAMAGED Possible data damage: 1 pg inconsistent
> pg 10.10d is active+clean+inconsistent, acting [15,13]
>
> $ rados list-inconsistent-obj 10.10d
> {"epoch":12138,"inconsistents":[]}
>
> "ceph pg query" (see below) on that PG does report num_scrub_errors=1,
> num_shallow_scrub_errors=1, and num_objects_dirty=1. "osd scrub auto
> repair = true" is set on all OSDs, but the PG never auto-repairs. (This
> is a test cluster, the pool size is 2 — this may preclude auto repair
> from ever kicking in; I'm not sure on that one.)
>
> "ceph pg repair" does repair, but the issue reappears on the next
> scheduled scrub.
>
> This issue was first discovered while the cluster was on
> Jewel/Filestore. In an event like this I would normally suspect either a
> problem with an individual OSD, or a bug in the FileStore code. But the
> cluster has had *all* of its OSDs replaced since, as part of a full
> Jewel→Luminous→Nautilus upgrade and a FileStore→BlueStore conversion.
> The issue still persists.
>
> A full "ceph pg 10.10d query" result is below. If anyone has ideas on
> how to permanently fix this issue, I'd be most grateful.
>
> Thanks!
>
> Cheers,
> Florian
>
>
>
>
> {
> "state": "active+clean+inconsistent",
> "snap_trimq": "[]",
> "snap_trimq_len": 0,
> "epoch": 12143,
> "up": [
> 15,
> 13
> ],
> "acting": [
> 15,
> 13
> ],
> "acting_recovery_backfill": [
> "13",
> "15"
> ],
> "info": {
> "pgid": "10.10d",
> "last_update": "100'11",
> "last_complete": "100'11",
> "log_tail": "0'0",
> "last_user_version": 11,
> "last_backfill": "MAX",
> "last_backfill_bitwise": 0,
> "purged_snaps": [],
> "history": {
> "epoch_created": 45,
> "epoch_pool_created": 45,
> "last_epoch_started": 12139,
> "last_interval_started": 12138,
> "last_epoch_clean": 12139,
> "last_interval_clean": 12138,
> "last_epoch_split": 0,
> "last_epoch_marked_full": 0,
> "same_up_since": 12138,
> "same_interval_since": 12138,
> "same_primary_since": 12114,
> "last_scrub": "100'11",
> "last_scrub_stamp": "2019-10-14 08:33:57.347097",
> "last_deep_scrub": "100'11",
> "last_deep_scrub_stamp": "2019-10-11 14:09:29.016946",
> "last_clean_scrub_stamp": "2019-10-11 14:09:29.016946"
> },
> "stats": {
> "version": "100'11",
> "reported_seq": "4927",
> "reported_epoch": "12143",
> "state": "active+clean+inconsistent",
> "last_fresh": "2019-10-14 08:33:57.347147",
> "last_change": "2019-10-14 08:33:57.347147",
> "last_active": "2019-10-14 08:33:57.347147",
> "last_peered": "2019-10-14 08:33:57.347147",
> "last_clean": "2019-10-14 08:33:57.347147",
> "last_became_active": "2019-10-11 14:44:09.312226",
> "last_became_peered": "2019-10-11 14:44:09.312226",
> "last_unstale": "2019-10-14 08:33:57.347147",
> "last_undegraded": "2019-10-14 08:33:57.347147",
> "last_fullsized": "2019-10-14 08:33:57.347147",
> "mapping_epoch": 12138,
> "log_start": "0'0",
> "ondisk_log_start": "0'0",
> "created": 45,
> "last_epoch_clean": 12139,
> "parent": "0.0",
> "parent_split_bits": 0,
> "last_scrub": "100'11",
> "last_scrub_stamp": "2019-10-14 08:33:57.347097",
> "last_deep_scrub": "100'11",
> "last_deep_scrub_stamp": "2019-10-11 14:09:29.016946",
> "last_clean_scrub_stamp": "2019-10-11 14:09:29.016946",
> "log_size": 11,
> "ondisk_log_size": 11,
> "stats_invalid": false,
> "dirty_stats_invalid": false,
> "omap_stats_invalid": false,
> "hitset_stats_invalid": false,
> "hitset_bytes_stats_invalid": true,
> "pin_stats_invalid": true,
> "manifest_stats_invalid": true,
> "snaptrimq_len": 0,
> "stat_sum": {
> "num_bytes": 11,
> "num_objects": 1,
> "num_object_clones": 0,
> "num_object_copies": 2,
> "num_objects_m

[ceph-users] Re: Recurring issue: PG is inconsistent, but lists no inconsistent objects

2019-10-14 Thread Florian Haas
On 14/10/2019 13:20, Dan van der Ster wrote:
> Hey Florian,
> 
> What does the ceph.log ERR or ceph-osd log show for this inconsistency?
> 
> -- Dan

Hi Dan,

what's in the log is (as far as I can see) consistent with the pg query
output:

2019-10-14 08:33:57.345 7f1808fb3700  0 log_channel(cluster) log [DBG] :
10.10d scrub starts
2019-10-14 08:33:57.345 7f1808fb3700 -1 log_channel(cluster) log [ERR] :
10.10d scrub : stat mismatch, got 0/1 objects, 0/0 clones, 0/1 dirty,
0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 0/11 bytes,
0/0 manifest objects, 0/0 hit_set_archive bytes.
2019-10-14 08:33:57.345 7f1808fb3700 -1 log_channel(cluster) log [ERR] :
10.10d scrub 1 errors

Have you seen this before?

Cheers,
Florian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Recurring issue: PG is inconsistent, but lists no inconsistent objects

2019-10-14 Thread Dan van der Ster
On Mon, Oct 14, 2019 at 1:27 PM Florian Haas  wrote:
>
> On 14/10/2019 13:20, Dan van der Ster wrote:
> > Hey Florian,
> >
> > What does the ceph.log ERR or ceph-osd log show for this inconsistency?
> >
> > -- Dan
>
> Hi Dan,
>
> what's in the log is (as far as I can see) consistent with the pg query
> output:
>
> 2019-10-14 08:33:57.345 7f1808fb3700  0 log_channel(cluster) log [DBG] :
> 10.10d scrub starts
> 2019-10-14 08:33:57.345 7f1808fb3700 -1 log_channel(cluster) log [ERR] :
> 10.10d scrub : stat mismatch, got 0/1 objects, 0/0 clones, 0/1 dirty,
> 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 0/11 bytes,
> 0/0 manifest objects, 0/0 hit_set_archive bytes.
> 2019-10-14 08:33:57.345 7f1808fb3700 -1 log_channel(cluster) log [ERR] :
> 10.10d scrub 1 errors
>
> Have you seen this before?

Yes occasionally we see stat mismatches -- repair always fixes
definitively though.
Are you using PG autoscaling? There's a known issue there which
generates stat mismatches.
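
If you're not sure whether it's on, something along these lines shows the per-pool setting (pool name is a placeholder):

$ ceph osd pool autoscale-status
$ ceph osd pool get <pool> pg_autoscale_mode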

-- dan


>
> Cheers,
> Florian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS and 32-bit Inode Numbers

2019-10-14 Thread Dan van der Ster
OK I found that the kernel has an "ino32" mount option which hashes 64
bit inos to 32-bit space.
Has anyone tried this?
What happens if two files collide?
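
For reference, the sort of mount line this would be, in case anyone has tried it (monitor address and credentials are placeholders):

$ mount -t ceph mon1:6789:/ /mnt/cephfs -o name=cephfs_user,secretfile=/etc/ceph/cephfs.secret,ino32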

-- Dan

On Mon, Oct 14, 2019 at 1:18 PM Dan van der Ster  wrote:
>
> Hi all,
>
> One of our users has some 32-bit commercial software that they want to
> use with CephFS, but it's not working because our inode numbers are
> too large. E.g. his application gets a "file too big" error trying to
> stat inode 0x40008445FB3.
>
> I'm aware that CephFS offsets the inode numbers by (mds_rank + 1) *
> 2^40; in the case above the file is managed by mds.3.
>
> Did anyone see this same issue and find a workaround? (I read that
> GlusterFS has an enable-ino32 client option -- does CephFS have
> something like that planned?)
>
> Thanks!
>
> Dan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Constant write load on 4 node ceph cluster

2019-10-14 Thread Ingo Schmidt
Hi all 

We have a 4 node ceph cluster that runs generally fine. It is the storage 
backend for our virtualization cluster with Proxmox, which runs about 40 virtual 
machines (80% various Linux servers). Now that we have implemented monitoring, 
I see that there is a fairly constant write load of about 4-10MB/s throughout 
the whole day and even on weekends, while read load is clearly bound to daily work 
hours. 
I am wondering what's up with this. Is this normal? 
Outside working hours there is almost no read load, well under 1MB/s, while 
write load stays relatively constant at around 5MB/s. During rebalancing of the 
cluster, normal write load decreases significantly to under 1MB/s. Overall 
performance is not greatly affected by rebalancing, as it is tuned to relatively 
slow backfills (osd_max_backfills = '3' | osd_recovery_max_active = '3'). 
After the rebalance has finished, the average write load of about 5MB/s 
reappears. 
I have attached a little screenshot for reference.


Can someone shed a little light on this? 
Thanks.

kind regards 
Ingo Schmidt 


IT-Department
Island Municipality Langeoog 
with in-house operations
Tourismusservice and Schiffahrt 

Hauptstraße 28 
26465 Langeoog 
Deutschland 

[ http://www.langeoog.de/ ] 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Recurring issue: PG is inconsistent, but lists no inconsistent objects

2019-10-14 Thread Florian Haas
On 14/10/2019 13:29, Dan van der Ster wrote:
>> Hi Dan,
>>
>> what's in the log is (as far as I can see) consistent with the pg query
>> output:
>>
>> 2019-10-14 08:33:57.345 7f1808fb3700  0 log_channel(cluster) log [DBG] :
>> 10.10d scrub starts
>> 2019-10-14 08:33:57.345 7f1808fb3700 -1 log_channel(cluster) log [ERR] :
>> 10.10d scrub : stat mismatch, got 0/1 objects, 0/0 clones, 0/1 dirty,
>> 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 0/11 bytes,
>> 0/0 manifest objects, 0/0 hit_set_archive bytes.
>> 2019-10-14 08:33:57.345 7f1808fb3700 -1 log_channel(cluster) log [ERR] :
>> 10.10d scrub 1 errors
>>
>> Have you seen this before?
> 
> Yes occasionally we see stat mismatches -- repair always fixes
> definitively though.

Not here, sadly. That error keeps coming back, always in the same PG,
and only in that PG.

> Are you using PG autoscaling? There's a known issue there which
> generates stat mismatches.

I'd appreciate a link to more information if you have one, but a PG
autoscaling problem wouldn't really match with the issue already
appearing in pre-Nautilus releases. :)

Cheers,
Florian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Constant write load on 4 node ceph cluster

2019-10-14 Thread Ashley Merrick
Is the storage being used for the whole VM disk?



If so, have you checked that none of your software is writing constant logs? Or 
something else that could continuously write to disk.



If you're running a new version you can use 
https://docs.ceph.com/docs/mimic/mgr/iostat/ to locate the exact RBD image.
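
Roughly, assuming the mgr module from that page:

$ ceph mgr module enable iostat
$ ceph iostat            # live cluster-wide IO rates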





 On Mon, 14 Oct 2019 20:32:53 +0800 Ingo Schmidt  
wrote 



Hi all 
 
We have a 4 node ceph cluster that runs generally fine. It is the storage 
backend for our virtualization cluster with Proxmox, which runs about 40 virtual 
machines (80% various Linux servers). Now that we have implemented monitoring, 
I see that there is a fairly constant write load of about 4-10MB/s throughout 
the whole day and even on weekends, while read load is clearly bound to daily work 
hours. 
I am wondering what's up with this. Is this normal? 
Outside working hours there is almost no read load, well under 1MB/s, while 
write load stays relatively constant at around 5MB/s. During rebalancing of the 
cluster, normal write load decreases significantly to under 1MB/s. Overall 
performance is not greatly affected by rebalancing, as it is tuned to relatively 
slow backfills (osd_max_backfills = '3' | osd_recovery_max_active = '3'). 
After the rebalance has finished, the average write load of about 5MB/s 
reappears. 
I have attached a little screenshot for reference. 
 
 
Can someone shed a little light on this? 
Thanks. 
 
kind regards 
Ingo Schmidt 
 
 
IT-Department 
Island Municipality Langeoog 
with in-house operations 
Tourismusservice and Schiffahrt 
 
Hauptstraße 28 
26465 Langeoog 
Deutschland 
 
[ http://www.langeoog.de/ ] 
___ 
ceph-users mailing list -- ceph-users@ceph.io 
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] object goes missing in bucket

2019-10-14 Thread Benjamin . Zieglmeier
Hey all,

We are experiencing an odd issue over the last week or so with a single bucket in a 
Ceph Luminous (12.2.11) cluster. We occasionally get a complaint from the owner 
of one bucket (bucket1) that a single object they have written has gone 
missing. If we list the bucket, the object is indeed missing, and it is not 
present if we look for it using `radosgw-admin bi list --bucket=bucket1`. If we 
dump the pool the object is placed in (rados -p <pool> ls) we can see that 
the object is present in the pool.

Looking in the logs we are having a hard time finding any errors, and if we 
look at the rgw logs we can see the PUT, followed by a couple GETs and HEADs 
(we believe this to be client app verification) but never a DELETE. This 
incident has happened a couple of times over the past week, but only to this 
bucket. The bucket in question does average anywhere from 100-200 ops per second 
based on the rgw usage logs. Any assistance in knowing where to look to see why 
these objects might be disappearing from the index would be greatly appreciated!
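
For reference, the checks we have been running, plus one we are only considering (object and pool names below are placeholders, and the bucket check is untested on our side):

$ radosgw-admin bi list --bucket=bucket1 | grep <object>          # index entry: missing
$ rados -p <rgw data pool> ls | grep <object>                     # backing RADOS object: still present
$ radosgw-admin object stat --bucket=bucket1 --object=<object>
$ radosgw-admin bucket check --bucket=bucket1                     # --fix would rebuild the index; untested here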

-Ben
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Recurring issue: PG is inconsistent, but lists no inconsistent objects

2019-10-14 Thread Dan van der Ster
On Mon, Oct 14, 2019 at 3:14 PM Florian Haas  wrote:
>
> On 14/10/2019 13:29, Dan van der Ster wrote:
> >> Hi Dan,
> >>
> >> what's in the log is (as far as I can see) consistent with the pg query
> >> output:
> >>
> >> 2019-10-14 08:33:57.345 7f1808fb3700  0 log_channel(cluster) log [DBG] :
> >> 10.10d scrub starts
> >> 2019-10-14 08:33:57.345 7f1808fb3700 -1 log_channel(cluster) log [ERR] :
> >> 10.10d scrub : stat mismatch, got 0/1 objects, 0/0 clones, 0/1 dirty,
> >> 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 0/11 bytes,
> >> 0/0 manifest objects, 0/0 hit_set_archive bytes.
> >> 2019-10-14 08:33:57.345 7f1808fb3700 -1 log_channel(cluster) log [ERR] :
> >> 10.10d scrub 1 errors
> >>
> >> Have you seen this before?
> >
> > Yes occasionally we see stat mismatches -- repair always fixes
> > definitively though.
>
> Not here, sadly. That error keeps coming back, always in the same PG,
> and only in that PG.
>
> > Are you using PG autoscaling? There's a known issue there which
> > generates stat mismatches.
>
> I'd appreciate a link to more information if you have one, but a PG
> autoscaling problem wouldn't really match with the issue already
> appearing in pre-Nautilus releases. :)

https://github.com/ceph/ceph/pull/30479

-- dan
>
> Cheers,
> Florian
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Past_interval start interval mismatch (last_clean_epoch reported)

2019-10-14 Thread Huseyin Cotuk
Hi all,

I also hit bug #24866 in my test environment. According to the logs, the 
last_clean_epoch in the specified OSD/PG is 17703, but the interval starts with 
17895, so the OSD fails to start. There are some other OSDs in the same state. 

2019-10-14 18:22:51.908 7f0a275f1700 -1 osd.21 pg_epoch: 18432 pg[18.51( v 
18388'4 lc 18386'3 (0'0,18388'4] local-lis/les=18430/18431 n=1 ec=295/295 lis/c 
18430/17702 les/c/f 18431/17703/0 18428/18430/18421) [11,21]/[11,21,20] r=1 
lpr=18431 pi=[17895,18430)/3 crt=18388'4 lcod 0'0 unknown m=1 mbc={}] 18.51 
past_intervals [17895,18430) start interval does not contain the required bound 
[17703,18430) start

The cause is that pg 18.51 went clean in epoch 17703, but 17895 was reported to the monitor. 

I am using the latest stable version of Mimic (13.2.6).

Any idea how to fix it? Is there any way to bypass this check or fix the 
reported epoch #?
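
In the meantime, I understand the affected PG can at least be exported as a backup with ceph-objectstore-tool before any manual intervention, roughly like this (OSD id and PG id are from the log above; paths are the defaults):

$ systemctl stop ceph-osd@21          # make sure the OSD is not (trying to) run
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-21 \
      --pgid 18.51 --op export --file /root/pg.18.51.export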

Thanks in advance. 

Best regards,
Huseyin Cotuk
hco...@gmail.com

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Constant write load on 4 node ceph cluster

2019-10-14 Thread Ingo Schmidt
Great, this helped a lot. Although "ceph iostat" didn't give iostats for single 
images, just a general overview of IO, I remembered the new Nautilus RBD 
performance monitoring.

https://ceph.com/rbd/new-in-nautilus-rbd-performance-monitoring/

With a "simple"
>rbd perf image iotop
I was able to see that the writes indeed are from the Log Server and the Zabbix 
Monitoring Server. I didn't expect that it would cause that much I/O... 
unbelievable...
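
For anyone finding this later, what I ran was roughly the following; the pool name is just an example, and the mgr module only needs enabling once:

$ ceph mgr module enable rbd_support
$ rbd perf image iotop --pool <rbd pool>       # interactive, top-like per-image view
$ rbd perf image iostat --pool <rbd pool>      # non-interactive variant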

- Ursprüngliche Mail -
Von: "Ashley Merrick" 
An: "i schmidt" 
CC: "ceph-users" 
Gesendet: Montag, 14. Oktober 2019 15:20:46
Betreff: Re: [ceph-users] Constant write load on 4 node ceph cluster

Is the storage being used for the whole VM disk? 

If so, have you checked that none of your software is writing constant logs? Or 
something else that could continuously write to disk. 

If you're running a new version you can use 
https://docs.ceph.com/docs/mimic/mgr/iostat/ to locate the exact RBD image. 




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Constant write load on 4 node ceph cluster

2019-10-14 Thread Paul Emmerich
It's pretty common to see way more writes than reads if you have lots of idle VMs.


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Oct 14, 2019 at 6:34 PM Ingo Schmidt  wrote:
>
> Great, this helped a lot. Although "ceph iostat" didn't give iostats for 
> single images, just a general overview of IO, I remembered the new Nautilus 
> RBD performance monitoring.
>
> https://ceph.com/rbd/new-in-nautilus-rbd-performance-monitoring/
>
> With a "simple"
> >rbd perf image iotop
> I was able to see that the writes indeed are from the Log Server and the 
> Zabbix Monitoring Server. I didn't expect that it would cause that much 
> I/O... unbelievable...
>
> - Ursprüngliche Mail -
> Von: "Ashley Merrick" 
> An: "i schmidt" 
> CC: "ceph-users" 
> Gesendet: Montag, 14. Oktober 2019 15:20:46
> Betreff: Re: [ceph-users] Constant write load on 4 node ceph cluster
>
> Is the storage being used for the whole VM disk?
>
> If so, have you checked that none of your software is writing constant logs? Or 
> something else that could continuously write to disk.
>
> If you're running a new version you can use 
> https://docs.ceph.com/docs/mimic/mgr/iostat/ to locate the exact RBD image.
>
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW blocking on large objects

2019-10-14 Thread Robert LeBlanc
We set up a new Nautilus cluster and only have RGW on it. While we had
a job doing 200k IOPs of really small objects, I noticed that HAProxy
was kicking out RGW backends because they were taking more than 2
seconds to return. We GET a large ~4GB file each minute and use that
as a health check to determine if the system is taking too long to
service requests. It seems that other IO is being blocked by this
large transfer. This seems to be the case with both civetweb and
beast. But I'm double checking beast at the moment because I'm not
100% sure we were using it at the start.

Any ideas how to mitigate this? It seems that IOs are being scheduled
on a thread and if they get unlucky enough to be scheduled behind a
big IO, they are just stuck, in this case HAProxy could kick out the
backend before the IO is returned and it has to re-request it.

Thank you,
Robert LeBlanc



Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Recurring issue: PG is inconsistent, but lists no inconsistent objects

2019-10-14 Thread Florian Haas
On 14/10/2019 17:21, Dan van der Ster wrote:
>> I'd appreciate a link to more information if you have one, but a PG
>> autoscaling problem wouldn't really match with the issue already
>> appearing in pre-Nautilus releases. :)
> 
> https://github.com/ceph/ceph/pull/30479

Thanks! But no, this doesn't look like a likely culprit, for the reason
that we also saw this in Luminous and hence, *definitely* without splits
or merges in play.

Has anyone else seen these scrub false positives — if that's what they are?

Cheers,
Florian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Recurring issue: PG is inconsistent, but lists no inconsistent objects

2019-10-14 Thread Reed Dier
I had something slightly similar to you.

However, my issue was specific/limited to the device_health_metrics pool that 
is auto-created with 1 PG when you turn that mgr feature on.

https://www.mail-archive.com/ceph-users@lists.ceph.com/msg56315.html 


I was never able to get a good resolution, other than finally running pg repair 
on it, and it resolved itself.

Reed

> On Oct 14, 2019, at 2:55 PM, Florian Haas  wrote:
> 
> On 14/10/2019 17:21, Dan van der Ster wrote:
>>> I'd appreciate a link to more information if you have one, but a PG
>>> autoscaling problem wouldn't really match with the issue already
>>> appearing in pre-Nautilus releases. :)
>> 
>> https://github.com/ceph/ceph/pull/30479
> 
> Thanks! But no, this doesn't look like a likely culprit, for the reason
> that we also saw this in Luminous and hence, *definitely* without splits
> or merges in play.
> 
> Has anyone else seen these scrub false positives — if that's what they are?
> 
> Cheers,
> Florian
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW blocking on large objects

2019-10-14 Thread Paul Emmerich
Could the 4 GB GET be saturating the connection from rgw to Ceph?
Simple to test: just rate-limit the health check GET.

Did you increase "objecter inflight ops" and "objecter inflight op bytes"?
You absolutely should adjust these settings for large RGW setups: the
defaults of 1024 and 100 MB are way too low for many RGW setups; we
default to 8192 and 800 MB.

Sometimes "ms async op threads" and "ms async max op threads" might
help as well (we adjust them by default, but for other reasons)
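
In ceph.conf terms that would be something like the following; the section name is just an example and depends on how your rgw instances are named:

[client.rgw.gateway1]
objecter_inflight_ops = 8192
objecter_inflight_op_bytes = 838860800   # 800 MB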


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Oct 14, 2019 at 9:54 PM Robert LeBlanc  wrote:
>
> We set up a new Nautilus cluster and only have RGW on it. While we had
> a job doing 200k IOPs of really small objects, I noticed that HAProxy
> was kicking out RGW backends because they were taking more than 2
> seconds to return. We GET a large ~4GB file each minute and use that
> as a health check to determine if the system is taking too long to
> service requests. It seems that other IO is being blocked by this
> large transfer. This seems to be the case with both civetweb and
> beast. But I'm double checking beast at the moment because I'm not
> 100% sure we were using it at the start.
>
> Any ideas how to mitigate this? It seems that IOs are being scheduled
> on a thread and if they get unlucky enough to be scheduled behind a
> big IO, they are just stuck, in this case HAProxy could kick out the
> backend before the IO is returned and it has to re-request it.
>
> Thank you,
> Robert LeBlanc
>
>
> 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io