[ceph-users] Re: Hadoop to Ceph

2020-11-06 Thread Jaroslaw Owsiewski
Hi,

What protocol do you want to make this data available on the Ceph?

-- 
Jarek

On Fri, 6 Nov 2020 at 04:00, Szabo, Istvan (Agoda) wrote:

> Hi,
>
> Has anybody tried to migrate data from Hadoop to Ceph?
> If so, what is the right way?
>
> Thank you
>
> 
> This message is confidential and is for the sole use of the intended
> recipient(s). It may also be privileged or otherwise protected by copyright
> or other legal rules. If you have received it by mistake please let us know
> by reply email and delete it from your system. It is prohibited to copy
> this message or disclose its content to anyone. Any confidentiality or
> privilege is not waived or lost by any mistaken delivery or unauthorized
> disclosure of the message. All messages sent to and from Agoda may be
> monitored to ensure compliance with company policies, to protect the
> company's interests and to remove potential malware. Electronic messages
> may be intercepted, amended, lost or deleted, or contain viruses.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mon went down and won't come back

2020-11-06 Thread Eugen Block

Hi,

can you share your ceph.conf (mon section)?


Zitat von Paul Mezzanini :


Hi everyone,

I figure it's time to pull in more brain power on this one.  We had  
an NVMe mostly die in one of our monitors and it caused the write  
latency for the machine to spike.  Ceph did the RightThing(tm): when
the mon on that machine fell out of quorum, it was ignored.  I pulled the
bad drive out of the array and tried to bring the mon and mgr back  
in (our monitors double-duty as managers).


The manager came up with zero problems, but the monitor got stuck probing.

I removed the bad host from the monmap and stood up a new one on an  
OSD node to get back to 3 active.  That new node added perfectly  
using the same methods I've tried on the old one.


Network appears to be clean between all hosts.  Packet captures show  
them chatting just fine.  Since we are getting ready to upgrade from  
RHEL7 to RHEL8 I took this as an opportunity to reinstall the  
monitor as an 8 box to get that process rolling.  Box is now on  
RHEL8 with no changes to how ceph-mon is acting.


I install machines with a kickstart and use our own ansible roles to  
get it 95% into service.  I then follow the manual install  
instructions  
(https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#adding-monitors).


Time is in sync, /var/lib/ceph/mon/* is owned by the right UID, keys  
are in sync, configs are in sync.  I pulled the old mon out of "mon  
initial members" and "mon host".  `nc` can talk to all the ports in  
question and we've tried it with firewalld off as well (ditto with  
selinux).  Cleaned up some stale DNS and even tried a different IP  
(same DNS name). I started all of this with 14.2.12 but .13 was  
released while debugging so I've got that on the broken monitor at  
the moment.


I manually start the daemon in debug mode (/usr/bin/ceph-mon -d  
--cluster ceph --id ceph-mon-02 --setuser ceph --setgroup ceph)  
until it's joined in then use the systemd scripts to start it once  
it's clean.  The current state is:


(Lightly sanitized output)
:snip:
2020-11-04 11:38:57.049 7f4232fb3540  0 mon.ceph-mon-02 does not  
exist in monmap, will attempt to join an existing cluster
2020-11-04 11:38:57.049 7f4232fb3540  0 using public_addr  
v2:Num.64:0/0 -> [v2:Num.64:3300/0,v1:Num.64:6789/0]
2020-11-04 11:38:57.050 7f4232fb3540  0 starting mon.ceph-mon-02  
rank -1 at public addrs [v2:Num.64:3300/0,v1:Num.64:6789/0] at bind  
addrs [v2:Num.64:3300/0,v1:Num.64:6789/0] mon_data  
/var/lib/ceph/mon/ceph-ceph-mon-02 fsid  
8514c8d5-4cd3-4dee-b460-27633e3adb1a
2020-11-04 11:38:57.051 7f4232fb3540  1 mon.ceph-mon-02@-1(???) e25  
preinit fsid 8514c8d5-4cd3-4dee-b460-27633e3adb1a
2020-11-04 11:38:57.051 7f4232fb3540  1 mon.ceph-mon-02@-1(???) e25   
initial_members ceph-mon-01,ceph-mon-03, filtering seed monmap
2020-11-04 11:38:57.051 7f4232fb3540  0 mon.ceph-mon-02@-1(???).mds  
e430081 new map
2020-11-04 11:38:57.051 7f4232fb3540  0 mon.ceph-mon-02@-1(???).mds  
e430081 print_map

:snip:
2020-11-04 11:38:57.053 7f4232fb3540  0 mon.ceph-mon-02@-1(???).osd  
e1198618 crush map has features 288514119978713088, adjusting msgr  
requires
2020-11-04 11:38:57.053 7f4232fb3540  0 mon.ceph-mon-02@-1(???).osd  
e1198618 crush map has features 288514119978713088, adjusting msgr  
requires
2020-11-04 11:38:57.053 7f4232fb3540  0 mon.ceph-mon-02@-1(???).osd  
e1198618 crush map has features 3314933069571702784, adjusting msgr  
requires
2020-11-04 11:38:57.053 7f4232fb3540  0 mon.ceph-mon-02@-1(???).osd  
e1198618 crush map has features 288514119978713088, adjusting msgr  
requires
2020-11-04 11:38:57.054 7f4232fb3540  1  
mon.ceph-mon-02@-1(???).paxosservice(auth 54141..54219) refresh  
upgraded, format 0 -> 3
2020-11-04 11:38:57.069 7f421d891700  1 mon.ceph-mon-02@-1(probing)  
e25 handle_auth_request failed to assign global_id

 ^^^ last line repeated every few seconds until process killed

I've exhausted everything I can think of so I've just been doing the  
scientific shotgun (one slug at a time) approach to see what  
changes.  Does anyone else have any ideas?
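
In case it's useful to anyone comparing notes, the monmaps can be compared with
something along these lines (ids and paths are only examples, not what I ran
verbatim):

ceph mon getmap -o /tmp/monmap.cluster          # current map from the quorum
monmaptool --print /tmp/monmap.cluster
ceph-mon --cluster ceph --id ceph-mon-02 --extract-monmap /tmp/monmap.local
monmaptool --print /tmp/monmap.local            # what the stuck mon thinks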


--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfm...@rit.edu

CONFIDENTIALITY NOTE: The information transmitted, including attachments, is
intended only for the person(s) or entity to which it is addressed and may
contain confidential and/or privileged material. Any review, retransmission,
dissemination or other use of, or taking of any action in reliance upon this
information by persons or entities other than the intended recipient is
prohibited. If you received this in error, please contact the sender and
destroy any copies of this information.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




[ceph-users] Re: high latency after maintenance]

2020-11-06 Thread Marcel Kuiper


Hi Anthony

Thank you for your response

I am looking at the "OSDs highest latency of write operations" panel of the
grafana dashboard found in the ceph source in
./monitoring/grafana/dashboards/osds-overview.json. It is a topk graph
that uses ceph_osd_op_w_latency_sum / ceph_osd_op_w_latency_count.
During normal operations we sometimes see latency spikes of 4 seconds max,
but while bringing the rack back we saw a consistent increase in
latency for a lot of OSDs, into the 20-second range.

The cluster has 1139 OSDs in total, of which we had 5 x 9 = 45 in maintenance.

We did not throttle the backfilling process because we successfully did the
same maintenance before on a few occasions for other racks without
problems. I will throttle backfills next time we have the same sort of
maintenance in the next rack.

Can you elaborate a bit more on what exactly happens during the peering
process? I understand that the OSDs need to catch up. I also see that the
number of scrubs increases a lot when OSDs are brought back online. Is that
part of the peering process?

Thx, Marcel


> HDDs and concern for latency don’t mix.  That said, you don’t specify
> what you mean by “latency”.  Does that mean average client write
> latency?  median? P99? Something else?
>
> If you have a 15 node cluster and you took a third of it down for two
> hours then yeah you’ll have a lot to catch up on when you come back.
> Bringing the nodes back one at a time can help, to spread out the peering.
>  Did you throttle backfill/recovery tunables all the way down to 1?  In a
> way that the restarted OSDs would use the throttled values as they boot?
>
>
>
>
>> On Nov 5, 2020, at 6:47 AM, Marcel Kuiper  wrote:
>>
>> Hi
>>
>> We had a rack down for 2hours for maintenance. 5 storage nodes were
>> involved. We had noout en norebalance flags set before the start of the
>> maintenance
>>
>> When the systems were brought back online we noticed a lot of osds with
>> high latency (in 20 seconds range) . Mostly osds that are not on the
>> storage nodes that were down. It took about 20 minutes for things to
>> settle down.
>>
>> We're running nautilus 14.2.11. The storage nodes run bluestore and have
>> 9
>> x 8T HDD's and 3 x SSD for rocksdb. Each with 3 x 123G LV
>>
>> - Can anyone give a reason for these high latencies?
>> - Is there a way to avoid or lower these latencies when bringing systems
>> back into operation?
>>
>> Best Regards
>>
>> Marcel
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Low Memory Nodes

2020-11-06 Thread Hans van den Bogert

> I already ordered more ram. Can i turn temporary down the RAM usage of
> the OSDs to not get into that vicious cycle and just suffer small but
> stable performance?

Hi,

Look at 
https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/#bluestore-config-reference


and then specifically  the `osd_memory_target` config key. This may help.
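
For example, something like this (just a sketch, pick a value that fits the
RAM you actually have):

ceph config set osd osd_memory_target 2147483648             # ~2 GiB per OSD, example value
ceph tell osd.* injectargs '--osd_memory_target 2147483648'  # nudge the already-running OSDs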

Regards,

Hans
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Low Memory Nodes

2020-11-06 Thread Dan van der Ster
How much RAM do you have, and how many OSDs?

This config should be considered close to the minimum:

   ceph config set osd osd_memory_target 1500000000

(1.5GB per OSD  -- remember the default is 4GB per OSD)
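
To confirm it took effect you could check something like (osd.0 is just an example id):

ceph config get osd osd_memory_target
ceph daemon osd.0 config get osd_memory_target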

-- dan


On Fri, Nov 6, 2020 at 11:52 AM Ml Ml  wrote:
>
> Hello List,
>
> i think 3 of 6 Nodes have to less memory. This triggers the effect,
> that the nodes will swap a lot and almost kill themselfes. That
> triggers OSDs to go down, which triggers a rebalance which does not
> really help :D
>
> I already ordered more ram. Can i turn temporary down the RAM usage of
> the OSDs to not get into that vicious cycle and just suffer small but
> stable performance?
>
> This is ceph version 15.2.5 with bluestore.
>
> Thanks,
> Michael
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Multisite sync not working - permission denied

2020-11-06 Thread Michael Breen
Hi,

radosgw-admin -v
ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus
(stable)

Multisite sync was something I had working with a previous cluster and an
earlier Ceph version, but it doesn't now, and I can't understand why.
If anyone with an idea of a possible cause could give me a clue I would be
grateful.
I have clusters set up using Rook, but as far as I can tell, that's not a
factor.

On the primary cluster, I have this:

radosgw-admin zonegroup get --rgw-zonegroup zonegroup-a
{
"id": "b115d74a-2d5f-4127-b621-0223f1e96c71",
"name": "zonegroup-a",
"api_name": "zonegroup-a",
"is_master": "true",
"endpoints": [
"http://192.168.30.8:80";
],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "024687e0-1461-4f45-9149-9e571791c2b3",
"zones": [
{
"id": "024687e0-1461-4f45-9149-9e571791c2b3",
"name": "zone-a",
"endpoints": [
"http://192.168.30.8:80";
],
"log_meta": "false",
"log_data": "true",
"bucket_index_max_shards": 11,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": [],
"redirect_zone": ""
},
{
"id": "6ba0ee26-0155-48f9-b057-2803336f0d66",
"name": "zone-b",
"endpoints": [
"http://192.168.30.108:80";
],
"log_meta": "false",
"log_data": "true",
"bucket_index_max_shards": 11,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": [],
"redirect_zone": ""
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": [],
"storage_classes": [
"STANDARD"
]
}
],
"default_placement": "default-placement",
"realm_id": "8c38fa05-c19d-4e30-bc98-e2bc84eccb68",
"sync_policy": {
"groups": []
}
}

It's identical on the secondary (that's after a realm pull, an update of
the zone-b endpoints, and a period commit), which I double-checked by
piping the output to md5sum on both sides.
The system user created on the primary is

radosgw-admin user info --uid realm-a-system-user
{
...
"keys": [
{
"user": "realm-a-system-user",
"access_key": "IUs+USI5IjA8WkZPRjU=",
"secret_key": "PGRDSzRERD4lbF9AYThuLzkvW1QvL148Q147PA=="
}
...
}

The zones on both sides have these keys

radosgw-admin zone get --rgw-zone zone-a
{
...
"system_key": {
"access_key": "IUs+USI5IjA8WkZPRjU=",
"secret_key": "PGRDSzRERD4lbF9AYThuLzkvW1QvL148Q147PA=="
},
...
}

radosgw-admin zone get --rgw-zonegroup zonegroup-a --rgw-zone zone-b
{
...
"system_key": {
"access_key": "IUs+USI5IjA8WkZPRjU=",
"secret_key": "PGRDSzRERD4lbF9AYThuLzkvW1QvL148Q147PA=="
},
...
}


Yet, on the secondary

radosgw-admin sync status
  realm 8c38fa05-c19d-4e30-bc98-e2bc84eccb68 (realm-a)
  zonegroup b115d74a-2d5f-4127-b621-0223f1e96c71 (zonegroup-a)
   zone 6ba0ee26-0155-48f9-b057-2803336f0d66 (zone-b)
  metadata sync preparing for full sync
full sync: 64/64 shards
full sync: 0 entries to sync
incremental sync: 0/64 shards
metadata is behind on 64 shards
behind shards:
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63]
  data sync source: 024687e0-1461-4f45-9149-9e571791c2b3 (zone-a)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source

and on the primary

radosgw-admin sync status
  realm 8c38fa05-c19d-4e30-bc98-e2bc84eccb68 (realm-a)
  zonegroup b115d74a-2d5f-4127-b621-0223f1e96c71 (zonegroup-a)
   zone 024687e0-1461-4f45-9149-9e571791c2b3 (zone-a)
  metadata sync no sync (zone is master)
2020-11-06T10:58:46.345+ 7fa805c201c0  0 data sync zone:6ba0ee26 ERROR:
failed to fetch datalog info
  data sync source: 6ba0ee26-0155-48f9-b057-2803336f0d66 (zone-b)
failed to retrieve sync info: (13) Permission denied

Given that all the keys above match, that "permission denied" is a mystery
to me, but it does accord with:

export AWS_ACCESS_KEY_ID="IUs+USI5IjA8WkZPRjU="
export AWS_SECRET_ACCESS_KEY="PGRDSzRERD4lbF9AYThuLzkvW1QvL148Q147PA=="
s3cmd ls --no-ssl --host-bucket= --host=192.168.30.8 # OK, but:
s3cmd ls --no-ssl --host-bucket= --host=192.168.30.108
# ERROR: S3 error: 403 (InvalidAccessKeyId)
# Although
curl -L http://192.168.30.108  # works: 


[ceph-users] Re: Multisite sync not working - permission denied

2020-11-06 Thread Michael Breen
I forgot to mention my earlier attempted debugging: I believe this is not
because the keys are wrong, but because it is looking for a user that is
not seen on the secondary:

debug 2020-11-03T16:37:47.330+ 7f32e9859700  5 req 60 0.00386s
:post_period error reading user info, uid=ACCESS can't authenticate
debug 2020-11-03T16:37:47.330+ 7f32e9859700 20 req 60 0.00386s
:post_period rgw::auth::s3::LocalEngine denied with reason=-2028
debug 2020-11-03T16:37:47.330+ 7f32e9859700 20 req 60 0.00386s
:post_period rgw::auth::s3::AWSAuthStrategy denied with reason=-2028
debug 2020-11-03T16:37:47.330+ 7f32e9859700  5 req 60 0.00386s
:post_period Failed the auth strategy, reason=-2028
debug 2020-11-03T16:37:47.330+ 7f32e9859700 10 failed to authorize
request

src/rgw/rgw_common.h:#define ERR_INVALID_ACCESS_KEY   2028

./src/rgw/rgw_rest_s3.cc
  if (rgw_get_user_info_by_access_key(ctl->user, access_key_id, user_info)
< 0) {
  ldpp_dout(dpp, 5) << "error reading user info, uid=" << access_key_id
  << " can't authenticate" << dendl;

On Fri, 6 Nov 2020 at 11:38, Michael Breen <
michael.br...@vikingenterprise.com> wrote:

> Hi,
>
> radosgw-admin -v
> ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus
> (stable)
>
> Multisite sync was something I had working with a previous cluster and an
> earlier Ceph version, but it doesn't now, and I can't understand why.
> If anyone with an idea of a possible cause could give me a clue I would be
> grateful.
> I have clusters set up using Rook, but as far as I can tell, that's not a
> factor.
>
> On the primary cluster, I have this:
>
> radosgw-admin zonegroup get --rgw-zonegroup zonegroup-a
> {
> "id": "b115d74a-2d5f-4127-b621-0223f1e96c71",
> "name": "zonegroup-a",
> "api_name": "zonegroup-a",
> "is_master": "true",
> "endpoints": [
> "http://192.168.30.8:80";
> ],
> "hostnames": [],
> "hostnames_s3website": [],
> "master_zone": "024687e0-1461-4f45-9149-9e571791c2b3",
> "zones": [
> {
> "id": "024687e0-1461-4f45-9149-9e571791c2b3",
> "name": "zone-a",
> "endpoints": [
> "http://192.168.30.8:80";
> ],
> "log_meta": "false",
> "log_data": "true",
> "bucket_index_max_shards": 11,
> "read_only": "false",
> "tier_type": "",
> "sync_from_all": "true",
> "sync_from": [],
> "redirect_zone": ""
> },
> {
> "id": "6ba0ee26-0155-48f9-b057-2803336f0d66",
> "name": "zone-b",
> "endpoints": [
> "http://192.168.30.108:80";
> ],
> "log_meta": "false",
> "log_data": "true",
> "bucket_index_max_shards": 11,
> "read_only": "false",
> "tier_type": "",
> "sync_from_all": "true",
> "sync_from": [],
> "redirect_zone": ""
> }
> ],
> "placement_targets": [
> {
> "name": "default-placement",
> "tags": [],
> "storage_classes": [
> "STANDARD"
> ]
> }
> ],
> "default_placement": "default-placement",
> "realm_id": "8c38fa05-c19d-4e30-bc98-e2bc84eccb68",
> "sync_policy": {
> "groups": []
> }
> }
>
> It's identical on the secondary (that's after a realm pull, an update of
> the zone-b endpoints, and a period commit), which I double-checked by
> piping the output to md5sum on both sides.
> The system user created on the primary is
>
> radosgw-admin user info --uid realm-a-system-user
> {
> ...
> "keys": [
> {
> "user": "realm-a-system-user",
> "access_key": "IUs+USI5IjA8WkZPRjU=",
> "secret_key": "PGRDSzRERD4lbF9AYThuLzkvW1QvL148Q147PA=="
> }
> ...
> }
>
> The zones on both sides have these keys
>
> radosgw-admin zone get --rgw-zone zone-a
> {
> ...
> "system_key": {
> "access_key": "IUs+USI5IjA8WkZPRjU=",
> "secret_key": "PGRDSzRERD4lbF9AYThuLzkvW1QvL148Q147PA=="
> },
> ...
> }
>
> radosgw-admin zone get --rgw-zonegroup zonegroup-a --rgw-zone zone-b
> {
> ...
> "system_key": {
> "access_key": "IUs+USI5IjA8WkZPRjU=",
> "secret_key": "PGRDSzRERD4lbF9AYThuLzkvW1QvL148Q147PA=="
> },
> ...
> }
>
>
> Yet, on the secondary
>
> radosgw-admin sync status
>   realm 8c38fa05-c19d-4e30-bc98-e2bc84eccb68 (realm-a)
>   zonegroup b115d74a-2d5f-4127-b621-0223f1e96c71 (zonegroup-a)
>zone 6ba0ee26-0155-48f9-b057-2803336f0d66 (zone-b)
>   metadata sync preparing for full sync
> full sync: 64/64 shards
> full sync: 0 entries to sync
> incremental sync: 0/64 shards
> metadata is behind on 64 shards
> behind shards:
> [0,1,2,3,4,5,6,7,8

[ceph-users] Re: Hadoop to Ceph

2020-11-06 Thread Jaroslaw Owsiewski
If S3, you can use distcp from HDFS to S3@Ceph.

For example:

hadoop distcp -Dmapred.job.queue.name=queue_name -Dfs.s3a.access.key=ACCESS_KEY
-Dfs.s3a.secret.key=SECRET_KEY -Dfs.s3a.endpoint=ENDPOINT
-Dfs.s3a.connection.ssl.enabled=false_or_true /hdfs_path/ s3a://path/
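
A filled-in example might look roughly like this (endpoint, bucket and
credentials are placeholders, and fs.s3a.path.style.access is typically
needed for RGW):

hadoop distcp \
  -Dfs.s3a.access.key=MY_ACCESS_KEY \
  -Dfs.s3a.secret.key=MY_SECRET_KEY \
  -Dfs.s3a.endpoint=http://rgw.example.com:7480 \
  -Dfs.s3a.path.style.access=true \
  -Dfs.s3a.connection.ssl.enabled=false \
  hdfs:///user/data/ s3a://my-bucket/data/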

Regards
-- 
Jarek

On Fri, 6 Nov 2020 at 12:29, Szabo, Istvan (Agoda) wrote:

> Objectstore
>
> On 2020. Nov 6., at 15:41, Jaroslaw Owsiewski <
> jaroslaw.owsiew...@allegro.pl> wrote:
>
> 
> Email received from outside the company. If in doubt don't click links nor
> open attachments!
> 
> Hi,
>
> What protocol do you want to make this data available on the Ceph?
>
> --
> Jarek
>
> On Fri, 6 Nov 2020 at 04:00, Szabo, Istvan (Agoda) wrote:
> Hi,
>
> Has anybody tried to migrate data from Hadoop to Ceph?
> If so, what is the right way?
>
> Thank you
>
> 
> This message is confidential and is for the sole use of the intended
> recipient(s). It may also be privileged or otherwise protected by copyright
> or other legal rules. If you have received it by mistake please let us know
> by reply email and delete it from your system. It is prohibited to copy
> this message or disclose its content to anyone. Any confidentiality or
> privilege is not waived or lost by any mistaken delivery or unauthorized
> disclosure of the message. All messages sent to and from Agoda may be
> monitored to ensure compliance with company policies, to protect the
> company's interests and to remove potential malware. Electronic messages
> may be intercepted, amended, lost or deleted, or contain viruses.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: high latency after maintenance]

2020-11-06 Thread Wout van Heeswijk
Hi Marcel,

The peering process is the process used by Ceph OSDs, on a per placement group 
basis, to agree on the state of that placement on each of the involved OSDs.

In your case, 2/3 of the placement group metadata that needs to be agreed 
upon/checked is on the nodes that did not undergo maintenance. You also need to 
consider that the acting primary OSD for everything is now hosted on the OSDs 
that did not undergo any maintenance.

This all means that, until the recovery/backfilling process is completed, all 
the 'heavy' lifting is done by the nodes that stayed online. Also consider that 
Ceph will, most likely, execute peering twice per PG: once when the OSDs start 
again, and once when the recovery and backfilling is finished.

I don't want to just say RTFM, but I don't think it is useful to copy it all here:
https://docs.ceph.com/en/latest/dev/peering/#description-of-the-peering-process

Peering
the process of bringing all of the OSDs that store a Placement Group (PG) into 
agreement about the state of all of the objects (and their metadata) in that 
PG. Note that agreeing on the state does not mean that they all have the latest 
contents.
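
If you want to soften the impact next time, the usual knobs look something
like this (values are only examples, adjust to taste):

# before / during the maintenance window
ceph osd set noout
ceph osd set norebalance
ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_max_active 1

# once everything has peered and recovery has finished
ceph osd unset norebalance
ceph osd unset noout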

Kind regards,

Wout
42on



From: Marcel Kuiper 
Sent: Friday, 6 November 2020 10:23
To: ceph-users@ceph.io
Subject: [ceph-users] Re: high latency after maintenance]


Hi Anthony

Thank you for your response

I am looking at the "OSDs highest latency of write operations" panel of the
grafana dashboard found in the ceph source in
./monitoring/grafana/dashboards/osds-overview.json. It is a topk graph
that uses ceph_osd_op_w_latency_sum / ceph_osd_op_w_latency_count.
During normal operations we sometimes see latency spikes of 4 seconds max,
but while bringing the rack back we saw a consistent increase in
latency for a lot of OSDs, into the 20-second range.

The cluster has 1139 OSDs in total, of which we had 5 x 9 = 45 in maintenance.

We did not throttle the backfilling process because we successfully did the
same maintenance before on a few occasions for other racks without
problems. I will throttle backfills next time we have the same sort of
maintenance in the next rack.

Can you elaborate a bit more on what exactly happens during the peering
process? I understand that the OSDs need to catch up. I also see that the
number of scrubs increases a lot when OSDs are brought back online. Is that
part of the peering process?

Thx, Marcel


> HDDs and concern for latency don’t mix.  That said, you don’t specify
> what you mean by “latency”.  Does that mean average client write
> latency?  median? P99? Something else?
>
> If you have a 15 node cluster and you took a third of it down for two
> hours then yeah you’ll have a lot to catch up on when you come back.
> Bringing the nodes back one at a time can help, to spread out the peering.
>  Did you throttle backfill/recovery tunables all the way down to 1?  In a
> way that the restarted OSDs would use the throttled values as they boot?
>
>
>
>
>> On Nov 5, 2020, at 6:47 AM, Marcel Kuiper  wrote:
>>
>> Hi
>>
>> We had a rack down for 2hours for maintenance. 5 storage nodes were
>> involved. We had noout en norebalance flags set before the start of the
>> maintenance
>>
>> When the systems were brought back online we noticed a lot of osds with
>> high latency (in 20 seconds range) . Mostly osds that are not on the
>> storage nodes that were down. It took about 20 minutes for things to
>> settle down.
>>
>> We're running nautilus 14.2.11. The storage nodes run bluestore and have
>> 9
>> x 8T HDD's and 3 x SSD for rocksdb. Each with 3 x 123G LV
>>
>> - Can anyone give a reason for these high latencies?
>> - Is there a way to avoid or lower these latencies when bringing systems
>> back into operation?
>>
>> Best Regards
>>
>> Marcel
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mon went down and won't come back

2020-11-06 Thread Paul Mezzanini
Relevant ceph.conf file lines:
[global]
mon initial members = ceph-mon-01,ceph-mon-03
mon host = IPFor01,IPFor03  

mon max pg per osd = 400
mon pg warn max object skew = -1

[mon]
mon allow pool delete = true



ceph config has:
global  advanced  mon_max_pg_per_osd                         400
global  advanced  mon_pg_warn_max_object_skew                -1.00
global  dev       mon_warn_on_pool_pg_num_not_power_of_two   false
mon     advanced  mon_allow_pool_delete                      true


I'm slowly pulling it all into ceph config and I just haven't sat down to 
verify it and deploy the stub config everywhere.  The non-power-of-two warning 
is disabled because I'm slowly walking a pool back to a lower PG num and I was 
sick of the health warn :)  (Same with max pg per osd, but I'm well under 400 
now so I could purge that line.)
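
The eventual move is basically to let the cluster absorb the file and keep only
a stub behind, something along the lines of (not run here yet):

ceph config assimilate-conf -i /etc/ceph/ceph.conf -o /etc/ceph/ceph.conf.minimal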

Again, lightly sanitized.  The actual IP's do match forward and reverse DNS.

--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfm...@rit.edu

CONFIDENTIALITY NOTE: The information transmitted, including attachments, is
intended only for the person(s) or entity to which it is addressed and may
contain confidential and/or privileged material. Any review, retransmission,
dissemination or other use of, or taking of any action in reliance upon this
information by persons or entities other than the intended recipient is
prohibited. If you received this in error, please contact the sender and
destroy any copies of this information.



From: Eugen Block 
Sent: Friday, November 6, 2020 3:41 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Mon went down and won't come back

Hi,

can you share your ceph.conf (mon section)?


Zitat von Paul Mezzanini :

> Hi everyone,
>
> I figure it's time to pull in more brain power on this one.  We had
> an NVMe mostly die in one of our monitors and it caused the write
> latency for the machine to spike.  Ceph did the RightThing(tm) and
> when it lost quorum on that machine it was ignored.  I pulled the
> bad drive out of the array and tried to bring the mon and mgr back
> in (our monitors double-duty as managers).
>
> The manager came up 0 problems but the monitor got stuck probing.
>
> I removed the bad host from the monmap and stood up a new one on an
> OSD node to get back to 3 active.  That new node added perfectly
> using the same methods I've tried on the old one.
>
> Network appears to be clean between all hosts.  Packet captures show
> them chatting just fine.  Since we are getting ready to upgrade from
> RHEL7 to RHEL8 I took this as an opportunity to reinstall the
> monitor as an 8 box to get that process rolling.  Box is now on
> RHEL8 with no changes to how ceph-mon is acting.
>
> I install machines with a kickstart and use our own ansible roles to
> get it 95% into service.  I then follow the manual install
> instructions
> (https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#adding-monitors).
>
> Time is in sync, /var/lib/ceph/mon/* is owned by the right UID, keys
> are in sync, configs are in sync.  I pulled the old mon out of "mon
> initial members" and "mon host".  `nc` can talk to all the ports in
> question and we've tried it with firewalld off as well (ditto with
> selinux).  Cleaned up some stale DNS and even tried a different IP
> (same DNS name). I started all of this with 14.2.12 but .13 was
> released while debugging so I've got that on the broken monitor at
> the moment.
>
> I manually start the daemon in debug mode (/usr/bin/ceph-mon -d
> --cluster ceph --id ceph-mon-02 --setuser ceph --setgroup ceph)
> until it's joined in then use the systemd scripts to start it once
> it's clean.  The current state is:
>
> (Lightly sanitized output)
> :snip:
> 2020-11-04 11:38:57.049 7f4232fb3540  0 mon.ceph-mon-02 does not
> exist in monmap, will attempt to join an existing cluster
> 2020-11-04 11:38:57.049 7f4232fb3540  0 using public_addr
> v2:Num.64:0/0 -> [v2:Num.64:3300/0,v1:Num.64:6789/0]
> 2020-11-04 11:38:57.050 7f4232fb3540  0 starting mon.ceph-mon-02
> rank -1 at public addrs [v2:Num.64:3300/0,v1:Num.64:6789/0] at bind
> addrs [v2:Num.64:3300/0,v1:Num.64:6789/0] mon_data
> /var/lib/ceph/mon/ceph-ceph-mon-02 fsid
> 8514c8d5-4cd3-4dee-b460-27633e3adb1a
> 2020-11-04 11:38:57.051 7f4232fb3540  1 mon.ceph-mon-02@-1(???) e25
> preinit fsid 8514c8d5-4cd3-4dee-b460-27633e3adb1a
> 2020-11-04 11:38:57.051 7f4232fb3540  1 mon.ceph-mon-02@-1(???) e25
> initial_members ceph-mon-01,ceph-mon-03, filtering seed monmap
> 2020-11-04 11:38:57.051 7f4232fb3540  0 mon

[ceph-users] Re: Mon went down and won't come back

2020-11-06 Thread Eugen Block

So the mon_host line is without a port, correct, just the IP?


Zitat von Paul Mezzanini :


Relevant ceph.conf file lines:
[global]
mon initial members = ceph-mon-01,ceph-mon-03
mon host = IPFor01,IPFor03
mon max pg per osd = 400
mon pg warn max object skew = -1

[mon]
mon allow pool delete = true



ceph config has:
global  advanced  mon_max_pg_per_osd                         400
global  advanced  mon_pg_warn_max_object_skew                -1.00
global  dev       mon_warn_on_pool_pg_num_not_power_of_two   false
mon     advanced  mon_allow_pool_delete                      true



I'm slowly pulling it all into ceph config and I just haven't sat  
down to verify it and deploy the stub config everywhere.   Non power  
of two is set because i'm slowly walking a pool back to a lower PG  
num and I was sick of the health warn :)  (same with max pg per osd  
but I'm well under 400 now so I could purge that line)


Again, lightly sanitized.  The actual IP's do match forward and reverse DNS.

--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfm...@rit.edu

CONFIDENTIALITY NOTE: The information transmitted, including attachments, is
intended only for the person(s) or entity to which it is addressed and may
contain confidential and/or privileged material. Any review, retransmission,
dissemination or other use of, or taking of any action in reliance upon this
information by persons or entities other than the intended recipient is
prohibited. If you received this in error, please contact the sender and
destroy any copies of this information.



From: Eugen Block 
Sent: Friday, November 6, 2020 3:41 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Mon went down and won't come back

Hi,

can you share your ceph.conf (mon section)?


Zitat von Paul Mezzanini :


Hi everyone,

I figure it's time to pull in more brain power on this one.  We had
an NVMe mostly die in one of our monitors and it caused the write
latency for the machine to spike.  Ceph did the RightThing(tm) and
when it lost quorum on that machine it was ignored.  I pulled the
bad drive out of the array and tried to bring the mon and mgr back
in (our monitors double-duty as managers).

The manager came up 0 problems but the monitor got stuck probing.

I removed the bad host from the monmap and stood up a new one on an
OSD node to get back to 3 active.  That new node added perfectly
using the same methods I've tried on the old one.

Network appears to be clean between all hosts.  Packet captures show
them chatting just fine.  Since we are getting ready to upgrade from
RHEL7 to RHEL8 I took this as an opportunity to reinstall the
monitor as an 8 box to get that process rolling.  Box is now on
RHEL8 with no changes to how ceph-mon is acting.

I install machines with a kickstart and use our own ansible roles to
get it 95% into service.  I then follow the manual install
instructions
(https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#adding-monitors).

Time is in sync, /var/lib/ceph/mon/* is owned by the right UID, keys
are in sync, configs are in sync.  I pulled the old mon out of "mon
initial members" and "mon host".  `nc` can talk to all the ports in
question and we've tried it with firewalld off as well (ditto with
selinux).  Cleaned up some stale DNS and even tried a different IP
(same DNS name). I started all of this with 14.2.12 but .13 was
released while debugging so I've got that on the broken monitor at
the moment.

I manually start the daemon in debug mode (/usr/bin/ceph-mon -d
--cluster ceph --id ceph-mon-02 --setuser ceph --setgroup ceph)
until it's joined in then use the systemd scripts to start it once
it's clean.  The current state is:

(Lightly sanitized output)
:snip:
2020-11-04 11:38:57.049 7f4232fb3540  0 mon.ceph-mon-02 does not
exist in monmap, will attempt to join an existing cluster
2020-11-04 11:38:57.049 7f4232fb3540  0 using public_addr
v2:Num.64:0/0 -> [v2:Num.64:3300/0,v1:Num.64:6789/0]
2020-11-04 11:38:57.050 7f4232fb3540  0 starting mon.ceph-mon-02
rank -1 at public addrs [v2:Num.64:3300/0,v1:Num.64:6789/0] at bind
addrs [v2:Num.64:3300/0,v1:Num.64:6789/0] mon_data
/var/lib/ceph/mon/ceph-ceph-mon-02 fsid
8514c8d5-4cd3-4dee-b460-27633e3adb1a
2020-11-04 11:38:57.051 7f4232fb3540  1 mon.ceph-mon-02@-1(???) e25
preinit fsid 8514c8d5-4cd3-4dee-b460-27633e3adb1a
2020-11-04 11:38:57.051 7f4232fb3540  1 mon.ceph-mon-02@-1(???) e25
initial_members ceph-mon-01,ceph-mon-03, filtering seed monmap
2020-11-04 11:38:57.051 7f4232fb3540  0 mon.ceph-mon-02@-1(???).mds
e430081 new map
2020-11-04 11:38:57.051 7f4232fb3540  0 mon.ceph-mon-02@-1(???).mds
e430081 print_map
:snip:
2020-11-04 11:38:57.053 7f4232fb3540  0 mon.ceph-mon-02@-1(???).osd
e1198618

[ceph-users] Re: Mon went down and won't come back

2020-11-06 Thread Paul Mezzanini
Correct, just comma-separated IP addresses.
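
(For reference, the msgr2 docs show that explicit ports are also accepted there,
e.g. something like:
mon host = [v2:10.0.0.11:3300,v1:10.0.0.11:6789],[v2:10.0.0.13:3300,v1:10.0.0.13:6789]
-- the addresses above are made up; ours are plain comma-separated IPs as said.)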

--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfm...@rit.edu

CONFIDENTIALITY NOTE: The information transmitted, including attachments, is
intended only for the person(s) or entity to which it is addressed and may
contain confidential and/or privileged material. Any review, retransmission,
dissemination or other use of, or taking of any action in reliance upon this
information by persons or entities other than the intended recipient is
prohibited. If you received this in error, please contact the sender and
destroy any copies of this information.



From: Eugen Block 
Sent: Friday, November 6, 2020 9:00 AM
To: Paul Mezzanini
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Mon went down and won't come back

So the mon_host line is without a port, correct, just the IP?


Zitat von Paul Mezzanini :

> Relevant ceph.conf file lines:
> [global]
> mon initial members = ceph-mon-01,ceph-mon-03
> mon host = IPFor01,IPFor03
> mon max pg per osd = 400
> mon pg warn max object skew = -1
>
> [mon]
> mon allow pool delete = true
>
>
>
> ceph config has:
> global  advanced  mon_max_pg_per_osd                         400
> global  advanced  mon_pg_warn_max_object_skew                -1.00
> global  dev       mon_warn_on_pool_pg_num_not_power_of_two   false
> mon     advanced  mon_allow_pool_delete                      true
>
>
> I'm slowly pulling it all into ceph config and I just haven't sat
> down to verify it and deploy the stub config everywhere.   Non power
> of two is set because i'm slowly walking a pool back to a lower PG
> num and I was sick of the health warn :)  (same with max pg per osd
> but I'm well under 400 now so I could purge that line)
>
> Again, lightly sanitized.  The actual IP's do match forward and reverse DNS.
>
> --
> Paul Mezzanini
> Sr Systems Administrator / Engineer, Research Computing
> Information & Technology Services
> Finance & Administration
> Rochester Institute of Technology
> o:(585) 475-3245 | pfm...@rit.edu
>
> CONFIDENTIALITY NOTE: The information transmitted, including attachments, is
> intended only for the person(s) or entity to which it is addressed and may
> contain confidential and/or privileged material. Any review, retransmission,
> dissemination or other use of, or taking of any action in reliance upon this
> information by persons or entities other than the intended recipient is
> prohibited. If you received this in error, please contact the sender and
> destroy any copies of this information.
> 
>
> 
> From: Eugen Block 
> Sent: Friday, November 6, 2020 3:41 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: Mon went down and won't come back
>
> Hi,
>
> can you share your ceph.conf (mon section)?
>
>
> Zitat von Paul Mezzanini :
>
>> Hi everyone,
>>
>> I figure it's time to pull in more brain power on this one.  We had
>> an NVMe mostly die in one of our monitors and it caused the write
>> latency for the machine to spike.  Ceph did the RightThing(tm) and
>> when it lost quorum on that machine it was ignored.  I pulled the
>> bad drive out of the array and tried to bring the mon and mgr back
>> in (our monitors double-duty as managers).
>>
>> The manager came up 0 problems but the monitor got stuck probing.
>>
>> I removed the bad host from the monmap and stood up a new one on an
>> OSD node to get back to 3 active.  That new node added perfectly
>> using the same methods I've tried on the old one.
>>
>> Network appears to be clean between all hosts.  Packet captures show
>> them chatting just fine.  Since we are getting ready to upgrade from
>> RHEL7 to RHEL8 I took this as an opportunity to reinstall the
>> monitor as an 8 box to get that process rolling.  Box is now on
>> RHEL8 with no changes to how ceph-mon is acting.
>>
>> I install machines with a kickstart and use our own ansible roles to
>> get it 95% into service.  I then follow the manual install
>> instructions
>> (https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#adding-monitors).
>>
>> Time is in sync, /var/lib/ceph/mon/* is owned by the right UID, keys
>> are in sync, configs are in sync.  I pulled the old mon out of "mon
>> initial members" and "mon host".  `nc` can talk to all the ports in
>> question and we've tried it with firewalld off as well (ditto with
>> selinux).  Cleaned up some stale DNS and even tried a different IP
>> (same DNS name). I started all of this with 14.2.12 but .13 was
>> released while debugging so I've got that on the broken monitor at
>> the moment.
>>
>> I manually start the daemon in debug mode (/usr/bin/ceph-mon -d
>> --cluster ceph --id ceph-mon-02 --setuser ceph --setgroup ceph)
>> until it's joined

[ceph-users] Low Memory Nodes

2020-11-06 Thread Ml Ml
Hello List,

I think 3 of 6 nodes have too little memory. This triggers the effect
that the nodes swap a lot and almost kill themselves. That causes
OSDs to go down, which triggers a rebalance, which does not
really help :D

I have already ordered more RAM. Can I temporarily turn down the RAM usage
of the OSDs to avoid getting into that vicious cycle and just accept small
but stable performance?

This is ceph version 15.2.5 with bluestore.

Thanks,
Michael
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Hadoop to Ceph

2020-11-06 Thread Szabo, Istvan (Agoda)
Objectstore

On 2020. Nov 6., at 15:41, Jaroslaw Owsiewski  
wrote:


Email received from outside the company. If in doubt don't click links nor open 
attachments!

Hi,

What protocol do you want to make this data available on the Ceph?

--
Jarek

On Fri, 6 Nov 2020 at 04:00, Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> wrote:
Hi,

Has anybody tried to migrate data from Hadoop to Ceph?
If so, what is the right way?

Thank you


This message is confidential and is for the sole use of the intended 
recipient(s). It may also be privileged or otherwise protected by copyright or 
other legal rules. If you have received it by mistake please let us know by 
reply email and delete it from your system. It is prohibited to copy this 
message or disclose its content to anyone. Any confidentiality or privilege is 
not waived or lost by any mistaken delivery or unauthorized disclosure of the 
message. All messages sent to and from Agoda may be monitored to ensure 
compliance with company policies, to protect the company's interests and to 
remove potential malware. Electronic messages may be intercepted, amended, lost 
or deleted, or contain viruses.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to 
ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Multisite sync not working - permission denied

2020-11-06 Thread Michael Breen
Continuing my fascinating conversation with myself:
The output of  radosgw-admin sync status  indicates that only the metadata
is a problem, i.e., the data itself is syncing, and I have confirmed that.
There is no S3 access to the secondary, zone-b, so I could not check
replication that way, but having created a bucket on the primary, on the
secondary I did
rados -p zone-b.rgw.buckets.data ls
and saw the bucket had been replicated.
My current suspicion is that the user problem is an effect rather than a
cause of the metadata sync problem.
I have also discovered a setting  debug_rgw_sync  which increases the debug
level only for the sync code, but found nothing interesting. The additional
output seemed all to relate to data rather than metadata.
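
For anyone comparing notes, the sort of checks that can be run on the secondary
look like this (flags from memory, so treat it as a sketch):

radosgw-admin user info --uid=realm-a-system-user   # has the system user synced over yet?
radosgw-admin metadata list user                    # which user metadata entries exist here
radosgw-admin metadata sync status
radosgw-admin metadata sync run                     # kick off another metadata sync attempt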

On Fri, 6 Nov 2020 at 11:47, Michael Breen <
michael.br...@vikingenterprise.com> wrote:

> I forgot to mention earlier attempted debugging: I believe this is not
> because the keys are wrong, but because it is looking for a user that is
> not seen on the secondary:
>
> debug 2020-11-03T16:37:47.330+ 7f32e9859700  5 req 60 0.00386s
> :post_period error reading user info, uid=ACCESS can't authenticate
> debug 2020-11-03T16:37:47.330+ 7f32e9859700 20 req 60 0.00386s
> :post_period rgw::auth::s3::LocalEngine denied with reason=-2028
> debug 2020-11-03T16:37:47.330+ 7f32e9859700 20 req 60 0.00386s
> :post_period rgw::auth::s3::AWSAuthStrategy denied with reason=-2028
> debug 2020-11-03T16:37:47.330+ 7f32e9859700  5 req 60 0.00386s
> :post_period Failed the auth strategy, reason=-2028
> debug 2020-11-03T16:37:47.330+ 7f32e9859700 10 failed to authorize
> request
>
> src/rgw/rgw_common.h:#define ERR_INVALID_ACCESS_KEY   2028
>
> ./src/rgw/rgw_rest_s3.cc
>   if (rgw_get_user_info_by_access_key(ctl->user, access_key_id, user_info)
> < 0) {
>   ldpp_dout(dpp, 5) << "error reading user info, uid=" << access_key_id
>   << " can't authenticate" << dendl;
>
> On Fri, 6 Nov 2020 at 11:38, Michael Breen <
> michael.br...@vikingenterprise.com> wrote:
>
>> Hi,
>>
>> radosgw-admin -v
>> ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus
>> (stable)
>>
>> Multisite sync was something I had working with a previous cluster and an
>> earlier Ceph version, but it doesn't now, and I can't understand why.
>> If anyone with an idea of a possible cause could give me a clue I would
>> be grateful.
>> I have clusters set up using Rook, but as far as I can tell, that's not a
>> factor.
>>
>> On the primary cluster, I have this:
>>
>> radosgw-admin zonegroup get --rgw-zonegroup zonegroup-a
>> {
>> "id": "b115d74a-2d5f-4127-b621-0223f1e96c71",
>> "name": "zonegroup-a",
>> "api_name": "zonegroup-a",
>> "is_master": "true",
>> "endpoints": [
>> "http://192.168.30.8:80";
>> ],
>> "hostnames": [],
>> "hostnames_s3website": [],
>> "master_zone": "024687e0-1461-4f45-9149-9e571791c2b3",
>> "zones": [
>> {
>> "id": "024687e0-1461-4f45-9149-9e571791c2b3",
>> "name": "zone-a",
>> "endpoints": [
>> "http://192.168.30.8:80";
>> ],
>> "log_meta": "false",
>> "log_data": "true",
>> "bucket_index_max_shards": 11,
>> "read_only": "false",
>> "tier_type": "",
>> "sync_from_all": "true",
>> "sync_from": [],
>> "redirect_zone": ""
>> },
>> {
>> "id": "6ba0ee26-0155-48f9-b057-2803336f0d66",
>> "name": "zone-b",
>> "endpoints": [
>> "http://192.168.30.108:80";
>> ],
>> "log_meta": "false",
>> "log_data": "true",
>> "bucket_index_max_shards": 11,
>> "read_only": "false",
>> "tier_type": "",
>> "sync_from_all": "true",
>> "sync_from": [],
>> "redirect_zone": ""
>> }
>> ],
>> "placement_targets": [
>> {
>> "name": "default-placement",
>> "tags": [],
>> "storage_classes": [
>> "STANDARD"
>> ]
>> }
>> ],
>> "default_placement": "default-placement",
>> "realm_id": "8c38fa05-c19d-4e30-bc98-e2bc84eccb68",
>> "sync_policy": {
>> "groups": []
>> }
>> }
>>
>> It's identical on the secondary (that's after a realm pull, an update of
>> the zone-b endpoints, and a period commit), which I double-checked by
>> piping the output to md5sum on both sides.
>> The system user created on the primary is
>>
>> radosgw-admin user info --uid realm-a-system-user
>> {
>> ...
>> "keys": [
>> {
>> "user": "realm-a-system-user",
>> "access_key": "IUs+USI5IjA8WkZPRjU=",
>> "secret_key": "PGRDSzRERD4lbF9AYThuLzkvW1QvL148Q147PA=="
>> }
>> ...
>> }
>>
>> The zones on both sides have these keys
>>
>> radosgw-a

[ceph-users] Re: using msgr-v1 for OSDs on nautilus

2020-11-06 Thread Void Star Nill
Thanks Eugen. I will give it a try.
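
For the record, what I plan to try is roughly this (assuming the config
database is used, plus a restart so the daemons rebind):

ceph config set global ms_bind_msgr2 false
systemctl restart ceph-osd.target   # on each OSD host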

Regards,
Shridhar


On Thu, 5 Nov 2020 at 23:52, Eugen Block  wrote:

> Hi,
>
> you could try do only bind to v1 [1] by setting
>
> ms_bind_msgr2 = false
>
>
> Regards,
> Eugen
>
>
> [1] https://docs.ceph.com/en/latest/rados/configuration/msgr2/
>
>
> Zitat von Void Star Nill :
>
> > Hello,
> >
> > I am running nautilus cluster. Is there a way to force the cluster to use
> > msgr-v1 instead of msgr-v2?
> >
> > I am debugging an issue and it seems like it could be related to the msgr
> > layer, so want to test it by using msgr-v1.
> >
> > Thanks,
> > Shridhar
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Debugging slow ops

2020-11-06 Thread Void Star Nill
Hello,

I am trying to debug slow operations in our cluster running Nautilus
14.2.13. I am analysing the output of the "ceph daemon osd.N dump_historic_ops"
command.

I am noticing that most of the time is spent between "header_read" and
"throttled" events. For example, below is an operation that took ~160
seconds to complete and almost all of that time was spent between these 2
events.

Going by the descriptions at
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-osd/#debugging-slow-requests


   - header_read: When the messenger first started reading the message off
     the wire.
   - throttled: When the messenger tried to acquire memory throttle space to
     read the message into memory.
   - all_read: When the messenger finished reading the message off the wire.


Does this mean that the slowness I am observing is because OSD's messaging
layer is not able to acquire the memory required for the message fast
enough?

The system has lots of available memory (over 300G), so how do I tune OSD
to perform better at this?

Appreciate any feedback on this.
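
In case it helps, the throttle-related things I know to look at so far are
along these lines (counter/option names from memory, they may differ per
release):

ceph daemon osd.N perf dump | grep -A 7 throttle-osd_client
ceph daemon osd.N config show | grep -E 'osd_client_message_(size_)?cap'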

{
"description": "osd_op(client.405792.0:98299 3.313
3:c8c63189:::rbd_data.51b046b8b4567.0180:head [set-alloc-hint
object_size 4194304 write_size 4194304,writefull 0~4194304] snapc 0=[]
ondisk+write+known_if_redirected e1073)",
"initiated_at": "2020-11-06 16:16:40.924448",
"age": 164.3215580289,
"duration": 159.57800813,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.405792",
"client_addr": "v1:x.y.156.101:0/3840080733",
"tid": 98299
},
"events": [
{
"time": "2020-11-06 16:16:40.924448",
"event": "initiated"
},
{
"time": "2020-11-06 16:16:40.924448",
"event": "header_read"
},
{
"time": "2020-11-06 16:19:20.481593",
"event": "throttled"
},
{
"time": "2020-11-06 16:19:20.487331",
"event": "all_read"
},
{
"time": "2020-11-06 16:19:20.487333",
"event": "dispatched"
},
{
"time": "2020-11-06 16:19:20.487340",
"event": "queued_for_pg"
},
{
"time": "2020-11-06 16:19:20.487372",
"event": "reached_pg"
},
{
"time": "2020-11-06 16:19:20.487507",
"event": "started"
},
{
"time": "2020-11-06 16:19:20.487586",
"event": "waiting for subops from 1,94"
},
{
"time": "2020-11-06 16:19:20.491873",
"event": "op_commit"
},
{
"time": "2020-11-06 16:19:20.501164",
"event": "sub_op_commit_rec"
},
{
"time": "2020-11-06 16:19:20.502423",
"event": "sub_op_commit_rec"
},
{
"time": "2020-11-06 16:19:20.502438",
"event": "commit_sent"
},
{
"time": "2020-11-06 16:19:20.502456",
"event": "done"
}
]
}
}
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io