[ceph-users] Re: cephadm - How to deploy ceph cluster with a partition on SSD for block.db

2020-09-08 Thread klemen
I found out that it's already possible to specify a storage path in the OSD service
specification YAML. It works for data_devices, but unfortunately not for
db_devices and wal_devices, at least not in my case.

service_type: osd
service_id: osd_spec_default
placement:
  host_pattern: '*'
data_devices:
  paths:
  - /dev/vdb1
db_devices:
  paths:
  - /dev/vdb2
wal_devices:
  paths:
  - /dev/vdb3
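
For anyone wanting to reproduce this: a spec like the above is normally applied
through the orchestrator; assuming it is saved as osd_spec.yml (the filename is
just an example), roughly:

ceph orch apply osd -i osd_spec.yml   # apply the OSD service specification
ceph orch ls osd                      # verify the service was created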
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Spam here still

2020-09-08 Thread Marc Roos
 

Do know that this is the only mailing list I am subscribed to, that 
sends me so much spam. Maybe the list admin should finally have a word 
with other list admins on how they are managing their lists
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Spam here still

2020-09-08 Thread Lindsay Mathieson

On 8/09/2020 5:30 pm, Marc Roos wrote:
> Do know that this is the only mailing list I am subscribed to, that
> sends me so much spam. Maybe the list admin should finally have a word
> with other list admins on how they are managing their lists
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


Same. I'm missing a number of legit emails as the list gets classified 
as spam.


--
Lindsay
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Syncing cephfs from Ceph to Ceph

2020-09-08 Thread Simon Sutter
Hello,


Is it possible to somehow sync a Ceph cluster from one site to a Ceph cluster at another site?
I'm just using the CephFS feature and no block devices.

Being able to sync CephFS pools between two sites would be great for a hot
backup, in case one site fails.


Thanks in advance,

Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Syncing cephfs from Ceph to Ceph

2020-09-08 Thread Stefan Kooman
On 2020-09-08 11:22, Simon Sutter wrote:
> Hello,
> 
> 
> Is it possible to somehow sync a ceph from one site to a ceph form another 
> site?
> I'm just using the cephfs feature and no block devices.
> 
> Being able to sync cephfs pools between two sites would be great for a hot 
> backup, in case one site fails.

It's a work in progress [1]. This might do what you want right now:
[2]. Note: I haven't used [2] myself.
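
Until [1] lands, a crude interim approach is to rsync from a read-only CephFS
snapshot, so the source stays consistent during the copy. A rough, untested
sketch (paths and host names purely illustrative, and assuming snapshots are
enabled on the filesystem):

SNAP=sync-$(date +%Y%m%d)
mkdir /mnt/cephfs/.snap/$SNAP            # create a CephFS snapshot
rsync -aH /mnt/cephfs/.snap/$SNAP/ root@site2:/mnt/cephfs/
rmdir /mnt/cephfs/.snap/$SNAP            # drop the snapshot afterwards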

Gr. Stefan

[1]: https://docs.ceph.com/docs/master/dev/cephfs-mirroring/
[2]: https://github.com/oliveiradan/cephfs-sync
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Syncing cephfs from Ceph to Ceph

2020-09-08 Thread Simon Sutter
Thanks Stefan,

First of all, for a bit more context: we use this Ceph cluster just for hot
backups, so 99% write / 1% read and no need for low latency.

OK, so the snapshot approach would give us something more like a cold backup,
similar to a snapshot of a VM, without any incremental functionality, which
also means scheduled transfers of large amounts of data.

What about the idea of stretching the cluster over two data centers?
Would it be possible to modify the CRUSH map so that one pool gets replicated
across those two data centers, and if one fails, the other one would still be
functional?
Additionally, would it be possible to prioritize one data center over the other?
This would allow saving data from site1 to a pool on site2, so that in case of
a disaster at site1, site2 would still have those backups.

We have a 10G connection with around 0.5ms latency.
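
To make the question a bit more concrete, this is roughly what I have in mind,
untested and with made-up bucket/pool names:

ceph osd crush add-bucket dc1 datacenter
ceph osd crush add-bucket dc2 datacenter
ceph osd crush move dc1 root=default
ceph osd crush move dc2 root=default
ceph osd crush move node01 datacenter=dc1          # repeat for every host
ceph osd crush rule create-replicated rep_per_dc default datacenter
ceph osd pool set cephfs_backup_data crush_rule rep_per_dc
ceph osd pool set cephfs_backup_data size 2        # one copy per data center

Presumably a hand-written CRUSH rule would be needed if we wanted more than
one copy per data center.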


Thanks in advance,
Simon



From: Stefan Kooman
Sent: Tuesday, 8 September 2020 11:38:29
To: Simon Sutter; ceph-users@ceph.io
Subject: Re: [ceph-users] Syncing cephfs from Ceph to Ceph

On 2020-09-08 11:22, Simon Sutter wrote:
> Hello,
>
>
> Is it possible to somehow sync a ceph from one site to a ceph form another 
> site?
> I'm just using the cephfs feature and no block devices.
>
> Being able to sync cephfs pools between two sites would be great for a hot 
> backup, in case one site fails.

It's a work in progress [1]. This might do what you want right now:
[2]. Note: I haven't used [2] myself.

Gr. Stefan

[1]: https://docs.ceph.com/docs/master/dev/cephfs-mirroring/
[2]: https://github.com/oliveiradan/cephfs-sync
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Spam here still

2020-09-08 Thread Gerhard W. Recher
just my 5 cents, the admin should disable posting via the web interface ...

all the spam is injected via HyperKitty!!

since there is no parameter to accomplish this, the admin could hack into
"post_to_list" and raise an exception upon posting attempts to mitigate
this!

regards

Gerhard W. Recher

net4sec UG (haftungsbeschränkt)
Leitenweg 6
86929 Penzing

+49 8191 4283888
+49 171 4802507
On 08.09.2020 at 10:50, Lindsay Mathieson wrote:
> On 8/09/2020 5:30 pm, Marc Roos wrote:
>>  
>> Do know that this is the only mailing list I am subscribed to, that
>> sends me so much spam. Maybe the list admin should finally have a word
>> with other list admins on how they are managing their lists
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>
> Same. I'm missing a number of legit emails as the list gets classified
> as spam.
>




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Spam here still

2020-09-08 Thread Gerhard W. Recher
update:

the admin should consider upgrading to HyperKitty 1.3.4:

https://hyperkitty.readthedocs.io/en/latest/news.html

  * Implemented a new HYPERKITTY_ALLOW_WEB_POSTING setting that allows
    disabling the web posting feature. (Closes #264)
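
That is, after upgrading, something along these lines in HyperKitty's Django
settings (the exact file depends on the installation, e.g. settings_local.py):

# disable posting through the HyperKitty web UI
HYPERKITTY_ALLOW_WEB_POSTING = False

followed by a restart of the web application.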




Gerhard W. Recher

net4sec UG (haftungsbeschränkt)
Leitenweg 6
86929 Penzing

+49 8191 4283888
+49 171 4802507
On 08.09.2020 at 12:53, Gerhard W. Recher wrote:
> just my 5 cents, admin should disable postings on web interface ...
>
> all spams are  injected via hyperkitty !!
>
> since there is no parameter to accomplish this, admin should hack into
> "post_to_list" and raise a exeption upon posting attempts to mittigate
> this !
>
> regards
>
> Gerhard W. Recher
>
> net4sec UG (haftungsbeschränkt)
> Leitenweg 6
> 86929 Penzing
>
> +49 8191 4283888
> +49 171 4802507
> On 08.09.2020 at 10:50, Lindsay Mathieson wrote:
>> On 8/09/2020 5:30 pm, Marc Roos wrote:
>>>  
>>> Do know that this is the only mailing list I am subscribed to, that
>>> sends me so much spam. Maybe the list admin should finally have a word
>>> with other list admins on how they are managing their lists
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> Same. I'm missing a number of legit emails as the list gets classified
>> as spam.
>>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm - How to deploy ceph cluster with a partition on SSD for block.db

2020-09-08 Thread Dimitri Savineau
https://tracker.ceph.com/issues/46558
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Multipart uploads with partsizes larger than 16MiB failing on Nautilus

2020-09-08 Thread shubjero
Hey all,

I'm creating a new post for this issue as we've narrowed the problem
down to a partsize limitation on multipart upload. We have discovered
that in our production Nautilus (14.2.11) cluster and our lab Nautilus
(14.2.10) cluster, multipart uploads with a configured part size
greater than 16777216 bytes (16MiB) return a status 500 /
internal server error from radosgw.

So far I have increased the following rgw settings/values that looked
suspect, without any success/improvement with partsizes:
"rgw_get_obj_window_size": "16777216",
"rgw_put_obj_min_window_size": "16777216",

I am trying to determine if this is because of a conservative default
setting somewhere that I don't know about or if this is perhaps a bug?

I would appreciate it if someone on Nautilus with rgw could also test
/ provide feedback. It's very easy to reproduce, and configuring your
partsize with aws2cli requires you to put the following in your aws
'config':
s3 =
  multipart_chunksize = 32MB
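
For clarity, the relevant piece of ~/.aws/config then looks roughly like this
(use whatever profile name applies in your setup):

[default]
s3 =
  multipart_chunksize = 32MB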

rgw server logs during a failed multipart upload (32MB chunk/partsize):
2020-09-08 15:59:36.054 7f2d32fa6700  1 == starting new request
req=0x55953dc36930 =
2020-09-08 15:59:36.082 7f2d32fa6700 -1 res_query() failed
2020-09-08 15:59:36.138 7f2d32fa6700  1 == req done
req=0x55953dc36930 op status=0 http_status=200 latency=0.0839988s
==
2020-09-08 16:00:07.285 7f2d3dfbc700  1 == starting new request
req=0x55953dc36930 =
2020-09-08 16:00:07.285 7f2d3dfbc700 -1 res_query() failed
2020-09-08 16:00:07.353 7f2d00741700  1 == starting new request
req=0x55954dd5e930 =
2020-09-08 16:00:07.357 7f2d00741700 -1 res_query() failed
2020-09-08 16:00:07.413 7f2cc56cb700  1 == starting new request
req=0x55953dc02930 =
2020-09-08 16:00:07.417 7f2cc56cb700 -1 res_query() failed
2020-09-08 16:00:07.473 7f2cb26a5700  1 == starting new request
req=0x5595426f6930 =
2020-09-08 16:00:07.473 7f2cb26a5700 -1 res_query() failed
2020-09-08 16:00:09.465 7f2d3dfbc700  0 WARNING: set_req_state_err
err_no=35 resorting to 500
2020-09-08 16:00:09.465 7f2d3dfbc700  1 == req done
req=0x55953dc36930 op status=-35 http_status=500 latency=2.17997s
==
2020-09-08 16:00:09.549 7f2d00741700  0 WARNING: set_req_state_err
err_no=35 resorting to 500
2020-09-08 16:00:09.549 7f2d00741700  1 == req done
req=0x55954dd5e930 op status=-35 http_status=500 latency=2.19597s
==
2020-09-08 16:00:09.605 7f2cc56cb700  0 WARNING: set_req_state_err
err_no=35 resorting to 500
2020-09-08 16:00:09.609 7f2cc56cb700  1 == req done
req=0x55953dc02930 op status=-35 http_status=500 latency=2.19597s
==
2020-09-08 16:00:09.641 7f2cb26a5700  0 WARNING: set_req_state_err
err_no=35 resorting to 500
2020-09-08 16:00:09.641 7f2cb26a5700  1 == req done
req=0x5595426f6930 op status=-35 http_status=500 latency=2.16797s
==

awscli client side output during a failed multipart upload:
root@jump:~# aws --no-verify-ssl --endpoint-url
http://lab-object.cancercollaboratory.org:7480 s3 cp 4GBfile
s3://troubleshooting
upload failed: ./4GBfile to s3://troubleshooting/4GBfile An error
occurred (UnknownError) when calling the UploadPart operation (reached
max retries: 2): Unknown

Thanks,

Jared Baker
Cloud Architect for the Cancer Genome Collaboratory
Ontario Institute for Cancer Research
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm didn't create journals

2020-09-08 Thread Dimitri Savineau
journal_devices is for filestore and filestore isn't supported with cephadm
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Multipart uploads with partsizes larger than 16MiB failing on Nautilus

2020-09-08 Thread shubjero
I had been looking into this issue all day and during testing found
that a specific configuration option we had been setting for years was
the culprit. Not setting this value and letting it fall back to the
default seems to have fixed our issue with multipart uploads.

If you are curious, the configuration option is rgw_obj_stripe_size,
which was being set to 67108864 bytes (64MiB). The default is 4194304
bytes (4MiB). This is a documented option
(https://docs.ceph.com/docs/nautilus/radosgw/config-ref/) and from my
testing it seems like using anything but the default (I only tried
larger values) breaks multipart uploads.
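
In case anyone else hits this, reverting just means removing the override; a
rough sketch, with section/unit names as examples only, depending on where the
option was set:

# if it was set in the mon config database
ceph config rm client.rgw.myrgw rgw_obj_stripe_size
# if it was set in ceph.conf, delete the rgw_obj_stripe_size line there
# and restart the gateways, e.g.
systemctl restart ceph-radosgw@rgw.myrgw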

On Tue, Sep 8, 2020 at 12:12 PM shubjero  wrote:
>
> Hey all,
>
> I'm creating a new post for this issue as we've narrowed the problem
> down to a partsize limitation on multipart upload. We have discovered
> that in our production Nautilus (14.2.11) cluster and our lab Nautilus
> (14.2.10) cluster that multipart uploads with a configured part size
> of greater than 16777216 bytes (16MiB) will return a status 500 /
> internal server error from radosgw.
>
> So far I have increased the following rgw settings/values that looked
> suspect, without any success/improvement with partsizes.
> Such as:
> "rgw_get_obj_window_size": "16777216",
> "rgw_put_obj_min_window_size": "16777216",
>
> I am trying to determine if this is because of a conservative default
> setting somewhere that I don't know about or if this is perhaps a bug?
>
> I would appreciate it if someone on Nautilus with rgw could also test
> / provide feedback. It's very easy to reproduce and configuring your
> partsize with aws2cli requires you to put the following in your aws
> 'config'
> s3 =
>   multipart_chunksize = 32MB
>
> rgw server logs during a failed multipart upload (32MB chunk/partsize):
> 2020-09-08 15:59:36.054 7f2d32fa6700  1 == starting new request
> req=0x55953dc36930 =
> 2020-09-08 15:59:36.082 7f2d32fa6700 -1 res_query() failed
> 2020-09-08 15:59:36.138 7f2d32fa6700  1 == req done
> req=0x55953dc36930 op status=0 http_status=200 latency=0.0839988s
> ==
> 2020-09-08 16:00:07.285 7f2d3dfbc700  1 == starting new request
> req=0x55953dc36930 =
> 2020-09-08 16:00:07.285 7f2d3dfbc700 -1 res_query() failed
> 2020-09-08 16:00:07.353 7f2d00741700  1 == starting new request
> req=0x55954dd5e930 =
> 2020-09-08 16:00:07.357 7f2d00741700 -1 res_query() failed
> 2020-09-08 16:00:07.413 7f2cc56cb700  1 == starting new request
> req=0x55953dc02930 =
> 2020-09-08 16:00:07.417 7f2cc56cb700 -1 res_query() failed
> 2020-09-08 16:00:07.473 7f2cb26a5700  1 == starting new request
> req=0x5595426f6930 =
> 2020-09-08 16:00:07.473 7f2cb26a5700 -1 res_query() failed
> 2020-09-08 16:00:09.465 7f2d3dfbc700  0 WARNING: set_req_state_err
> err_no=35 resorting to 500
> 2020-09-08 16:00:09.465 7f2d3dfbc700  1 == req done
> req=0x55953dc36930 op status=-35 http_status=500 latency=2.17997s
> ==
> 2020-09-08 16:00:09.549 7f2d00741700  0 WARNING: set_req_state_err
> err_no=35 resorting to 500
> 2020-09-08 16:00:09.549 7f2d00741700  1 == req done
> req=0x55954dd5e930 op status=-35 http_status=500 latency=2.19597s
> ==
> 2020-09-08 16:00:09.605 7f2cc56cb700  0 WARNING: set_req_state_err
> err_no=35 resorting to 500
> 2020-09-08 16:00:09.609 7f2cc56cb700  1 == req done
> req=0x55953dc02930 op status=-35 http_status=500 latency=2.19597s
> ==
> 2020-09-08 16:00:09.641 7f2cb26a5700  0 WARNING: set_req_state_err
> err_no=35 resorting to 500
> 2020-09-08 16:00:09.641 7f2cb26a5700  1 == req done
> req=0x5595426f6930 op status=-35 http_status=500 latency=2.16797s
> ==
>
> awscli client side output during a failed multipart upload:
> root@jump:~# aws --no-verify-ssl --endpoint-url
> http://lab-object.cancercollaboratory.org:7480 s3 cp 4GBfile
> s3://troubleshooting
> upload failed: ./4GBfile to s3://troubleshooting/4GBfile An error
> occurred (UnknownError) when calling the UploadPart operation (reached
> max retries: 2): Unknown
>
> Thanks,
>
> Jared Baker
> Cloud Architect for the Cancer Genome Collaboratory
> Ontario Institute for Cancer Research
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph pgs inconsistent, always the same checksum

2020-09-08 Thread David Orman
Hi,

I've got a Ceph cluster, 7 nodes, 168 OSDs, with 96G of RAM on each server.
Ceph has been instructed to set a memory target of 3G until we increase RAM
to 128G per node. Available memory tends to hover around 14G. I do see a
tiny bit (KB) of swap utilization per ceph-osd process, but there's no
reason for it, so I'm unsure what that's about:

root@ceph02:~# cat /proc/14363/status | egrep 'Name|VmSwap'
Name:   ceph-osd
VmSwap: 464 kB
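
For reference, the 3G memory target mentioned above corresponds to the
osd_memory_target option; setting it cluster-wide looks roughly like this
(value in bytes):

ceph config set osd osd_memory_target 3221225472   # ~3 GiB per OSD daemon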

We're seeing repeated inconsistent PG warnings, generally on the order of
3-10 per week.

pg 2.b9 is active+clean+inconsistent, acting [25,117,128,95,151,15]

PG query on that PG:

INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15

{
    "snap_trimq": "[]",
    "snap_trimq_len": 0,
    "state": "active+clean+inconsistent",
    "epoch": 20278,
    "up": [
        25,
        117,
        128,
        95,
        151,
        15
    ],
    "acting": [
        25,
        117,
        128,
        95,
        151,
        15
    ],
    "acting_recovery_backfill": [
        "15(5)",
        "25(0)",
        "95(3)",
        "117(1)",
        "128(2)",
        "151(4)"
    ],
    "info": {
        "pgid": "2.b9s0",
        "last_update": "20278'445510",
        "last_complete": "20278'445510",
        "log_tail": "20278'438137",
        "last_user_version": 445510,
        "last_backfill": "MAX",
        "purged_snaps": [],
        "history": {
            "epoch_created": 573,
            "epoch_pool_created": 100,
            "last_epoch_started": 14679,
            "last_interval_started": 14678,
            "last_epoch_clean": 14716,
            "last_interval_clean": 14678,
            "last_epoch_split": 573,
            "last_epoch_marked_full": 0,
            "same_up_since": 14678,
            "same_interval_since": 14678,
            "same_primary_since": 14396,
            "last_scrub": "20278'444009",
            "last_scrub_stamp": "2020-09-08T16:57:22.430246+",
            "last_deep_scrub": "20278'444009",
            "last_deep_scrub_stamp": "2020-09-08T16:57:22.430246+",
            "last_clean_scrub_stamp": "2020-09-07T06:34:26.320796+",
            "prior_readable_until_ub": 0
        },
        "stats": {
            "version": "20278'445510",
            "reported_seq": "896803",
            "reported_epoch": "20278",
            "state": "active+clean+inconsistent",
            "last_fresh": "2020-09-08T18:06:45.463880+",
            "last_change": "2020-09-08T16:57:22.430293+",
            "last_active": "2020-09-08T18:06:45.463880+",
            "last_peered": "2020-09-08T18:06:45.463880+",
            "last_clean": "2020-09-08T18:06:45.463880+",
            "last_became_active": "2020-08-06T19:35:02.634999+",
            "last_became_peered": "2020-08-06T19:35:02.634999+",
            "last_unstale": "2020-09-08T18:06:45.463880+",
            "last_undegraded": "2020-09-08T18:06:45.463880+",
            "last_fullsized": "2020-09-08T18:06:45.463880+",
            "mapping_epoch": 14678,
            "log_start": "20278'438137",
            "ondisk_log_start": "20278'438137",
            "created": 573,
            "last_epoch_clean": 14716,
            "parent": "0.0",
            "parent_split_bits": 10,
            "last_scrub": "20278'444009",
            "last_scrub_stamp": "2020-09-08T16:57:22.430246+",
            "last_deep_scrub": "20278'444009",
            "last_deep_scrub_stamp": "2020-09-08T16:57:22.430246+",
            "last_clean_scrub_stamp": "2020-09-07T06:34:26.320796+",
            "log_size": 7373,
            "ondisk_log_size": 7373,
            "stats_invalid": false,
            "dirty_stats_invalid": false,
            "omap_stats_invalid": false,
            "hitset_stats_invalid": false,
            "hitset_bytes_stats_invalid": false,
            "pin_stats_invalid": false,
            "manifest_stats_invalid": false,
            "snaptrimq_len": 0,
            "stat_sum": {
                "num_bytes": 322985947136,
                "num_objects": 78724,
                "num_object_clones": 0,
                "num_object_copies": 472344,
                "num_objects_missing_on_primary": 0,
                "num_objects_missing": 0,
                "num_objects_degraded": 0,
                "num_objects_misplaced": 0,
                "num_objects_unfound": 0,
                "num_objects_dirty": 78724,
                "num_whiteouts": 0,
                "num_read": 430713,
                "num_read_kb": 121695928,
                "num_write": 445501,
                "num_write_kb": 405283436,
                "num_scrub_errors": 1,
                "num_shallow_scrub_errors": 0,
                "num_deep_scrub_errors": 1,
                "num_objects_recovered": 21,
                "num_bytes_recovered": 88080384,
                "num_key

[ceph-users] Re: Multipart uploads with partsizes larger than 16MiB failing on Nautilus

2020-09-08 Thread Matt Benjamin
thanks, Shubjero

Would you consider creating a ceph tracker issue for this?

regards,

Matt

On Tue, Sep 8, 2020 at 4:13 PM shubjero  wrote:
>
> I had been looking into this issue all day and during testing found
> that a specific configuration option we had been setting for years was
> the culprit. Not setting this value and letting it fall back to the
> default seems to have fixed our issue with multipart uploads.
>
> If you are curious, the configuration option is rgw_obj_stripe_size
> which was being set to 67108864 bytes (64MiB). The default is 4194304
> bytes (4MiB). This is a documented option
> (https://docs.ceph.com/docs/nautilus/radosgw/config-ref/) and from my
> testing it seems like using anything but the default (only tried
> larger values) breaks multipart uploads.
>
> On Tue, Sep 8, 2020 at 12:12 PM shubjero  wrote:
> >
> > Hey all,
> >
> > I'm creating a new post for this issue as we've narrowed the problem
> > down to a partsize limitation on multipart upload. We have discovered
> > that in our production Nautilus (14.2.11) cluster and our lab Nautilus
> > (14.2.10) cluster that multipart uploads with a configured part size
> > of greater than 16777216 bytes (16MiB) will return a status 500 /
> > internal server error from radosgw.
> >
> > So far I have increased the following rgw settings/values that looked
> > suspect, without any success/improvement with partsizes.
> > Such as:
> > "rgw_get_obj_window_size": "16777216",
> > "rgw_put_obj_min_window_size": "16777216",
> >
> > I am trying to determine if this is because of a conservative default
> > setting somewhere that I don't know about or if this is perhaps a bug?
> >
> > I would appreciate it if someone on Nautilus with rgw could also test
> > / provide feedback. It's very easy to reproduce and configuring your
> > partsize with aws2cli requires you to put the following in your aws
> > 'config'
> > s3 =
> >   multipart_chunksize = 32MB
> >
> > rgw server logs during a failed multipart upload (32MB chunk/partsize):
> > 2020-09-08 15:59:36.054 7f2d32fa6700  1 == starting new request
> > req=0x55953dc36930 =
> > 2020-09-08 15:59:36.082 7f2d32fa6700 -1 res_query() failed
> > 2020-09-08 15:59:36.138 7f2d32fa6700  1 == req done
> > req=0x55953dc36930 op status=0 http_status=200 latency=0.0839988s
> > ==
> > 2020-09-08 16:00:07.285 7f2d3dfbc700  1 == starting new request
> > req=0x55953dc36930 =
> > 2020-09-08 16:00:07.285 7f2d3dfbc700 -1 res_query() failed
> > 2020-09-08 16:00:07.353 7f2d00741700  1 == starting new request
> > req=0x55954dd5e930 =
> > 2020-09-08 16:00:07.357 7f2d00741700 -1 res_query() failed
> > 2020-09-08 16:00:07.413 7f2cc56cb700  1 == starting new request
> > req=0x55953dc02930 =
> > 2020-09-08 16:00:07.417 7f2cc56cb700 -1 res_query() failed
> > 2020-09-08 16:00:07.473 7f2cb26a5700  1 == starting new request
> > req=0x5595426f6930 =
> > 2020-09-08 16:00:07.473 7f2cb26a5700 -1 res_query() failed
> > 2020-09-08 16:00:09.465 7f2d3dfbc700  0 WARNING: set_req_state_err
> > err_no=35 resorting to 500
> > 2020-09-08 16:00:09.465 7f2d3dfbc700  1 == req done
> > req=0x55953dc36930 op status=-35 http_status=500 latency=2.17997s
> > ==
> > 2020-09-08 16:00:09.549 7f2d00741700  0 WARNING: set_req_state_err
> > err_no=35 resorting to 500
> > 2020-09-08 16:00:09.549 7f2d00741700  1 == req done
> > req=0x55954dd5e930 op status=-35 http_status=500 latency=2.19597s
> > ==
> > 2020-09-08 16:00:09.605 7f2cc56cb700  0 WARNING: set_req_state_err
> > err_no=35 resorting to 500
> > 2020-09-08 16:00:09.609 7f2cc56cb700  1 == req done
> > req=0x55953dc02930 op status=-35 http_status=500 latency=2.19597s
> > ==
> > 2020-09-08 16:00:09.641 7f2cb26a5700  0 WARNING: set_req_state_err
> > err_no=35 resorting to 500
> > 2020-09-08 16:00:09.641 7f2cb26a5700  1 == req done
> > req=0x5595426f6930 op status=-35 http_status=500 latency=2.16797s
> > ==
> >
> > awscli client side output during a failed multipart upload:
> > root@jump:~# aws --no-verify-ssl --endpoint-url
> > http://lab-object.cancercollaboratory.org:7480 s3 cp 4GBfile
> > s3://troubleshooting
> > upload failed: ./4GBfile to s3://troubleshooting/4GBfile An error
> > occurred (UnknownError) when calling the UploadPart operation (reached
> > max retries: 2): Unknown
> >
> > Thanks,
> >
> > Jared Baker
> > Cloud Architect for the Cancer Genome Collaboratory
> > Ontario Institute for Cancer Research
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] The confusing output of ceph df command

2020-09-08 Thread norman kern
Hi,
I have changed most of the pools in my cluster from 3-replica to EC 4+2. When I use the ceph df command to show
the used capacity of the cluster:

RAW STORAGE:
    CLASS      SIZE     AVAIL    USED     RAW USED  %RAW USED
    hdd        1.8 PiB  788 TiB  1.0 PiB  1.0 PiB   57.22
    ssd        7.9 TiB  4.6 TiB  181 GiB  3.2 TiB   41.15
    ssd-cache  5.2 TiB  5.2 TiB  67 GiB   73 GiB    1.36
    TOTAL      1.8 PiB  798 TiB  1.0 PiB  1.0 PiB   56.99

POOLS:
    POOL                            ID  STORED   OBJECTS  USED     %USED  MAX AVAIL
    default-oss.rgw.control         1   0 B      8        0 B      0      1.3 TiB
    default-oss.rgw.meta            2   22 KiB   97       3.9 MiB  0      1.3 TiB
    default-oss.rgw.log             3   525 KiB  223      621 KiB  0      1.3 TiB
    default-oss.rgw.buckets.index   4   33 MiB   34       33 MiB   0      1.3 TiB
    default-oss.rgw.buckets.non-ec  5   1.6 MiB  48       3.8 MiB  0      1.3 TiB
    .rgw.root                       6   3.8 KiB  16       720 KiB  0      1.3 TiB
    default-oss.rgw.buckets.data    7   274 GiB  185.39k  450 GiB  0.14   212 TiB
    default-fs-metadata             8   488 GiB  153.10M  490 GiB  10.65  1.3 TiB
    default-fs-data0                9   374 TiB  1.48G    939 TiB  74.71  212 TiB

    ...

The relation USED = 3 * STORED for the 3-replica pools is exactly what I'd expect,
but for the EC 4+2 pool (default-fs-data0) USED is not equal to 1.5 * STORED.
Why? :(
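
For what it's worth, what I expected for the EC 4+2 pool was roughly

    USED ≈ STORED * (k+m)/k = 374 TiB * 6/4 ≈ 561 TiB

but ceph df reports 939 TiB, i.e. about 2.5 * STORED. default-fs-data0 holds
about 1.48 billion objects, so if I understand other threads correctly the
per-chunk allocation overhead from bluestore's min_alloc_size may account for
the difference, but I'd like to confirm that.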

 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to delete OSD benchmark data

2020-09-08 Thread Jayesh Labade
Dear Ceph Users,

I am testing my 3 node Proxmox + Ceph cluster.
I have run the OSD benchmark with the command below.

# ceph tell osd.0 bench

Do I need to perform any cleanup to delete the benchmark data from the OSD?

I have googled for this, but couldn't find any mention of cleanup steps to
perform after the OSD benchmark command.




Thanks
Jayesh
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph pgs inconsistent, always the same checksum

2020-09-08 Thread Janne Johansson
I googled "got 0x6706be76, expected" and found some hits regarding Ceph, so
whatever it is, you are not the first, and that number has some internal
meaning. A Red Hat solution for a similar issue says that this checksum
corresponds to reading all zeroes, and hints at a bad write cache on the
controller, or something else that ends up clearing data instead of writing
the correct information on shutdown.
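
If it is just the odd object, the usual way to see what is affected and clear
the error is something like (PG id taken from your report, do double-check
before repairing):

rados list-inconsistent-obj 2.b9 --format=json-pretty
ceph pg repair 2.b9

but that of course does nothing about the underlying cause.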


On Tue, 8 Sep 2020 at 23:21, David Orman wrote:

>
>
> We're seeing repeated inconsistent PG warnings, generally on the order of
> 3-10 per week.
>
> pg 2.b9 is active+clean+inconsistent, acting [25,117,128,95,151,15]
>
>


> Every time we look at them, we see the same checksum (0x6706be76):
>
> debug 2020-08-13T18:39:01.731+ 7fbc037a7700 -1
> bluestore(/var/lib/ceph/osd/ceph-25) _verify_csum bad crc32c/0x1000
> checksum at blob offset 0x0, got 0x6706be76, expected 0x61f2021c, device
> location [0x12b403c~1000], logical extent 0x0~1000, object
> 2#2:0f1a338f:::rbd_data.3.20d195d612942.01db869b:head#
>
> This looks a lot like: https://tracker.ceph.com/issues/22464
> That said, we've got the following versions in play (cluster was created
> with 15.2.3):
> ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus
> (stable)
>


-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io