[ceph-users] Re: Octopus - unbalanced OSDs

2021-04-19 Thread Dan van der Ster
This should help:

ceph config set mgr mgr/balancer/upmap_max_deviation 1
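
This lowers the balancer's target deviation (which defaults to 5 PGs per OSD on recent releases) to 1 PG. Whether the upmap balancer then converges can be checked with, for example:

ceph config get mgr mgr/balancer/upmap_max_deviation
ceph balancer status
ceph balancer eval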

On Mon, Apr 19, 2021 at 10:17 AM Ml Ml  wrote:
>
> Anyone an idea? :)
>
> On Fri, Apr 16, 2021 at 3:09 PM Ml Ml  wrote:
> >
> > Hello List,
> >
> > any ideas why my OSDs are that unbalanced ?
> >
> > root@ceph01:~# ceph -s
> >   cluster:
> > id: 5436dd5d-83d4-4dc8-a93b-60ab5db145df
> > health: HEALTH_WARN
> > 1 nearfull osd(s)
> > 4 pool(s) nearfull
> >
> >   services:
> > mon: 3 daemons, quorum ceph03,ceph01,ceph02 (age 2w)
> > mgr: ceph03(active, since 4M), standbys: ceph02.jwvivm
> > mds: backup:1 {0=backup.ceph06.hdjehi=up:active} 3 up:standby
> > osd: 56 osds: 56 up (since 29h), 56 in (since 3d)
> >
> >   task status:
> > scrub status:
> > mds.backup.ceph06.hdjehi: idle
> >
> >   data:
> > pools:   4 pools, 1185 pgs
> > objects: 24.29M objects, 44 TiB
> > usage:   151 TiB used, 55 TiB / 206 TiB avail
> > pgs: 675 active+clean
> >  476 active+clean+snaptrim_wait
> >  30  active+clean+snaptrim
> >  4   active+clean+scrubbing+deep
> >
> > root@ceph01:~# ceph osd df tree
> > ID   CLASS  WEIGHT     REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
> >  -1         206.79979         -  206 TiB  151 TiB  151 TiB   36 GiB  503 GiB   55 TiB  73.23  1.00    -          root default
> >  -2          28.89995         -   29 TiB   20 TiB   20 TiB  5.5 GiB   74 GiB  8.9 TiB  69.19  0.94    -          host ceph01
> >   0  hdd          2.7       1.0  2.7 TiB  1.8 TiB  1.8 TiB  590 MiB  6.9 GiB  908 GiB  66.81  0.91   44      up  osd.0
> >   1  hdd          2.7       1.0  2.7 TiB  1.6 TiB  1.6 TiB  411 MiB  6.5 GiB  1.1 TiB  60.43  0.83   39      up  osd.1
> >   4  hdd          2.7       1.0  2.7 TiB  1.8 TiB  1.8 TiB  501 MiB  6.8 GiB  898 GiB  67.15  0.92   43      up  osd.4
> >   8  hdd          2.7       1.0  2.7 TiB  2.0 TiB  2.0 TiB  453 MiB  7.0 GiB  700 GiB  74.39  1.02   47      up  osd.8
> >  11  hdd          1.7       1.0  1.7 TiB  1.3 TiB  1.3 TiB  356 MiB  5.6 GiB  433 GiB  75.39  1.03   31      up  osd.11
> >  12  hdd          2.7       1.0  2.7 TiB  2.1 TiB  2.1 TiB  471 MiB  7.0 GiB  591 GiB  78.40  1.07   48      up  osd.12
> >  14  hdd          2.7       1.0  2.7 TiB  1.6 TiB  1.6 TiB  448 MiB  6.0 GiB  1.1 TiB  59.68  0.82   38      up  osd.14
> >  18  hdd          2.7       1.0  2.7 TiB  1.7 TiB  1.7 TiB  515 MiB  6.2 GiB  980 GiB  64.15  0.88   41      up  osd.18
> >  22  hdd          1.7       1.0  1.7 TiB  1.2 TiB  1.2 TiB  360 MiB  4.2 GiB  491 GiB  72.06  0.98   29      up  osd.22
> >  30  hdd          1.7       1.0  1.7 TiB  1.2 TiB  1.2 TiB  366 MiB  4.7 GiB  558 GiB  68.26  0.93   28      up  osd.30
> >  33  hdd          1.5       1.0  1.6 TiB  1.2 TiB  1.2 TiB  406 MiB  4.9 GiB  427 GiB  74.28  1.01   29      up  osd.33
> >  64  hdd          3.2       1.0  3.3 TiB  2.4 TiB  2.4 TiB  736 MiB  8.6 GiB  915 GiB  73.22  1.00   60      up  osd.64
> >  -3          29.69995         -   30 TiB   22 TiB   22 TiB  5.4 GiB   81 GiB  7.9 TiB  73.20  1.00    -          host ceph02
> >   2  hdd          1.7       1.0  1.7 TiB  1.3 TiB  1.2 TiB  402 MiB  5.2 GiB  476 GiB  72.93  1.00   30      up  osd.2
> >   3  hdd          2.7       1.0  2.7 TiB  2.0 TiB  2.0 TiB  653 MiB  7.8 GiB  652 GiB  76.15  1.04   49      up  osd.3
> >   7  hdd          2.7       1.0  2.7 TiB  2.5 TiB  2.5 TiB  456 MiB  7.7 GiB  209 GiB  92.36  1.26   56      up  osd.7
> >   9  hdd          2.7       1.0  2.7 TiB  1.9 TiB  1.9 TiB  434 MiB  7.2 GiB  781 GiB  71.46  0.98   46      up  osd.9
> >  13  hdd          2.3       1.0  2.4 TiB  1.6 TiB  1.6 TiB  451 MiB  6.1 GiB  823 GiB  66.28  0.91   38      up  osd.13
> >  16  hdd          2.7       1.0  2.7 TiB  1.6 TiB  1.6 TiB  375 MiB  6.4 GiB  1.1 TiB  59.84  0.82   39      up  osd.16
> >  19  hdd          1.7       1.0  1.7 TiB  1.1 TiB  1.1 TiB  323 MiB  4.7 GiB  601 GiB  65.80  0.90   27      up  osd.19
> >  23  hdd          2.7       1.0  2.7 TiB  2.2 TiB  2.2 TiB  471 MiB  7.7 GiB  520 GiB  80.99  1.11   50      up  osd.23
> >  24  hdd          1.7       1.0  1.7 TiB  1.4 TiB  1.4 TiB  371 MiB  5.5 GiB  273 GiB  84.44  1.15   32      up  osd.24
> >  28  hdd          2.7       1.0  2.7 TiB  1.9 TiB  1.9 TiB  428 MiB  7.4 GiB  818 GiB  70.07  0.96   44      up  osd.28
> >  31  hdd          2.7       1.0  2.7 TiB  2.0 TiB  2.0 TiB  516 MiB  7.4 GiB  660 GiB  75.85  1.04   48      up  osd.31
> >  32  hdd          3.2       1.0  3.3 TiB  2.2 TiB  2.2 TiB  661 MiB  7.9 GiB  1.2 TiB  64.86  0.89   52      up  osd.32
> >  -4          26.29996         -   26 TiB   18 TiB

[ceph-users] Re: cephadm: how to create more than 1 rgw per host

2021-04-19 Thread Sebastian Wagner
Hi Ivan,

this is a feature that has not yet been released in Pacific. It seems the
documentation is a bit ahead of the code right now.
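
For reference, the placement the documentation describes maps to a service spec
roughly like the one below; count_per_host only takes effect once the feature is
actually present in your release, so treat this as a sketch:

    service_type: rgw
    service_id: z1.sa-1
    placement:
      label: rgw
      count_per_host: 2
    spec:
      rgw_frontend_port: 8000

It can then be applied with `ceph orch apply -i rgw.yaml` instead of passing the
placement on the command line.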

Sebastian

On Fri, Apr 16, 2021 at 10:58 PM i...@z1storage.com 
wrote:

> Hello,
>
> According to the documentation, there's count-per-host key to 'ceph
> orch', but it does not work for me:
>
> :~# ceph orch apply rgw z1 sa-1 --placement='label:rgw count-per-host:2'
> --port=8000 --dry-run
> Error EINVAL: Host and label are mutually exclusive
>
> Why does it say anything about Host if I don't specify any hosts, just labels?
>
> ~# ceph orch host ls
> HOST  ADDR  LABELS   STATUS
> s101  s101  mon rgw
> s102  s102  mgr mon rgw
> s103  s103  mon rgw
> s104  s104  mgr mon rgw
> s105  s105  mgr mon rgw
> s106  s106  mon rgw
> s107  s107  mon rgw
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cleanup multipart in radosgw

2021-04-19 Thread Boris Behrens
Hi,
is there a way to remove multipart uploads that are older than X days?

It doesn't need to be built into Ceph or fully automated; just
something I don't need to build on my own.

I'm currently trying to debug a problem where Ceph reports a lot more used space
than it actually requires (
https://www.mail-archive.com/ceph-users@ceph.io/msg09810.html).

I came across a lot of old _multipart_ files in some buckets and now I want
to clean them up.
I don't know if this will fix my problem but I would love to rule that out.

radosgw-admin bucket check --bucket=bucket --check-objects --fix does not
work because it is a sharded bucket.

I also have some buckets that look like this and contain 100% _multipart_
files which are >2 years old:
"buckets": [
{
"bucket": "ncprod",
"tenant": "",
"num_objects": -482,
"num_shards": 0,
"objects_per_shard": -482,
"fill_status": "OVER 180143985094819%"
}
]
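
For reference, stale uploads can also be listed and aborted per bucket from the
S3 side, e.g. with the aws CLI (the endpoint is a placeholder, and the
credentials obviously need access to the bucket):

aws --endpoint-url https://rgw.example.com s3api list-multipart-uploads --bucket ncprod
aws --endpoint-url https://rgw.example.com s3api abort-multipart-upload --bucket ncprod --key <key> --upload-id <upload-id>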

-- 
The self-help group "UTF-8 problems" will meet in the large hall this time, as an exception.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Radosgw - WARNING: couldn't find acl header for object, generating default

2021-04-19 Thread by morphin
Hello.

I have an RGW bucket (versioning=on), and there were objects like this:

radosgw-admin object stat --bucket=xdir
--object=f5492238-50cb-4bc2-93fa-424869018946
{
"name": "f5492238-50cb-4bc2-93fa-424869018946",
"size": 0,
"tag": "",
"attrs": {
"user.rgw.manifest": "",
"user.rgw.olh.idtag": "5rs3x0qh152tn0j865k8ybo9xqy92qjn",
"user.rgw.olh.info": "\u0001\u0001�",
"user.rgw.olh.pending.607c87b5pgo03tvm3sqt23i9":
"\u0001\u0001\u0008",
"user.rgw.olh.pending.607c87b5pyv13ugk3fadvxw7":
"\u0001\u0001\u0008",
"user.rgw.olh.pending.607c87b5qic02n0e54zsjkax":
"\u0001\u0001\u0008",
"user.rgw.olh.ver": "3"
}
}

I'm not sure, but I suspect that these objects may be leftovers from an
unfinished multisite sync.
I've removed the zone and made it master (because I had to for something
else), created a new bucket with versioning=off, and copied all
objects from the old RGW bucket to the new RGW bucket with "rclone".

cmd: "rclone copy --files-from "object.list" old:bucket new:bucket
--no-traverse -vv --progress --fast-list --no-check-dest
--no-update-modtime"

Config:
[bucket]
type = s3
provider = Ceph
env_auth = false
acl = private
bucket =
access_key_id =
secret_access_key =
endpoint =

After the copy I checked every object via "radosgw-admin object stat
--object=$i" and these problematic objects were still there. I tried to
overwrite these objects from the backup bucket on a different cluster (those
objects are fine), and I made sure everything was written as expected via
object stat again: the pending attributes were gone and everything was ok.
All of them were OK!

After a while I started to see this warning in radosgw.log when the client
GETs or HEADs the objects.

2021-04-19 11:37:50.230 7f2d917eb700  1 == starting new request
req=0x55a44414a710 =
2021-04-19 11:37:50.230 7f2d917eb700  0 WARNING: couldn't find acl header
for object, generating default
2021-04-19 11:37:50.230 7f2d917eb700  1 == req done req=0x55a44414a710
op status=0 http_status=200 latency=0s ==
2021-04-19 11:37:50.230 7f2d917eb700  1 beast: 0x55a44414a710: 10.10.10.1 -
- [2021-04-19 11:37:50.0.230489s] "HEAD
/xdir/f5492238-50cb-4bc2-93fa-424869018946 HTTP/1.1" 200 0 -
"aws-sdk-java/1.11.638 Linux/3.10.0-1160.11.1.el7.x86_64
Java_HotSpot(TM)_64-Bit_Server_VM/25.281-b09 java/1.8.0_281 groovy/2.5.6
vendor/Oracle_Corporation" -
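
The warning means RGW found no ACL attribute on the object's head (it is
normally stored in the user.rgw.acl xattr) and generated a default one. A quick
way to check whether the xattr is present, assuming the default data pool name
and a non-tenanted bucket, is something like:

rados -p default.rgw.buckets.data listxattr <bucket-marker>_f5492238-50cb-4bc2-93fa-424869018946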

These are the problematic objects: somehow the overwritten objects
are gone and these objects are what is left after the WARNING. (versioning is
still off --> multisite is off --> the cluster is alone in its zonegroup and is the master zone)

But I had checked every overwritten object via "radosgw-admin object stat
--object=$i" and they were all OK!!! How the hell did they disappear and the
others come back? HOW?
What should I do? Maybe "object rm", "log trim" and re-write again?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Suspicious newsletter] cleanup multipart in radosgw

2021-04-19 Thread Boris Behrens
Hi Istvan,

both of them require bucket access, correct?
Is there a way to add the LC policy globally?

Cheers
 Boris

On Mon, 19 Apr 2021 at 11:58, Szabo, Istvan (Agoda) <
istvan.sz...@agoda.com> wrote:

> Hi,
>
> You have 2 ways:
>
> First is using s3vrowser app and in the menu select the multipart uploads
> and clean it up.
> The other is like this:
>
> Set lifecycle policy
> On the client:
> vim lifecyclepolicy
> 
> <LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
>   <Rule>
>     <ID>Incomplete Multipart Uploads</ID>
>     <Prefix></Prefix>
>     <Status>Enabled</Status>
>     <AbortIncompleteMultipartUpload>
>       <DaysAfterInitiation>1</DaysAfterInitiation>
>     </AbortIncompleteMultipartUpload>
>   </Rule>
> </LifecycleConfiguration>
>
> /bin/s3cmd setlifecycle lifecyclepolicy  s3://bucketname
> On mon node process manually
> radosgw-admin lc list
> radosgw-admin lc process
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
>
> -Original Message-
> From: Boris Behrens 
> Sent: Monday, April 19, 2021 4:10 PM
> To: ceph-users@ceph.io
> Subject: [Suspicious newsletter] [ceph-users] cleanup multipart in radosgw
>
> Hi,
> is there a way to remove multipart uploads that are older than X days?
>
> It doesn't need to be build into ceph or is automated to the end. Just
> something I don't need to build on my own.
>
> I currently try to debug a problem where ceph reports a lot more used
> space than it actually requires (
> https://www.mail-archive.com/ceph-users@ceph.io/msg09810.html).
>
> I came across a lot of old _multipart_ files in some buckets and now I
> want to clean them up.
> I don't know if this will fix my problem but I would love to rule that out.
>
> radosgw-admin bucket check --bucket=bucket --check-objects --fix does not
> work because it is a shareded bucket.
>
> I have also some buckets that look like this, and contain 100% _multipart_
> files which are >2 years old:
> "buckets": [
> {
> "bucket": "ncprod",
> "tenant": "",
> "num_objects": -482,
> "num_shards": 0,
> "objects_per_shard": -482,
> "fill_status": "OVER 180143985094819%"
> }
> ]
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> email to ceph-users-le...@ceph.io
>
> 
>


-- 
The self-help group "UTF-8 problems" will meet in the large hall this time, as an exception.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Octopus - unbalanced OSDs

2021-04-19 Thread Ml Ml
Anyone an idea? :)

On Fri, Apr 16, 2021 at 3:09 PM Ml Ml  wrote:
>
> Hello List,
>
> any ideas why my OSDs are that unbalanced ?
>
> root@ceph01:~# ceph -s
>   cluster:
> id: 5436dd5d-83d4-4dc8-a93b-60ab5db145df
> health: HEALTH_WARN
> 1 nearfull osd(s)
> 4 pool(s) nearfull
>
>   services:
> mon: 3 daemons, quorum ceph03,ceph01,ceph02 (age 2w)
> mgr: ceph03(active, since 4M), standbys: ceph02.jwvivm
> mds: backup:1 {0=backup.ceph06.hdjehi=up:active} 3 up:standby
> osd: 56 osds: 56 up (since 29h), 56 in (since 3d)
>
>   task status:
> scrub status:
> mds.backup.ceph06.hdjehi: idle
>
>   data:
> pools:   4 pools, 1185 pgs
> objects: 24.29M objects, 44 TiB
> usage:   151 TiB used, 55 TiB / 206 TiB avail
> pgs: 675 active+clean
>  476 active+clean+snaptrim_wait
>  30  active+clean+snaptrim
>  4   active+clean+scrubbing+deep
>
> root@ceph01:~# ceph osd df tree
> ID   CLASS  WEIGHT     REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
>  -1         206.79979         -  206 TiB  151 TiB  151 TiB   36 GiB  503 GiB   55 TiB  73.23  1.00    -          root default
>  -2          28.89995         -   29 TiB   20 TiB   20 TiB  5.5 GiB   74 GiB  8.9 TiB  69.19  0.94    -          host ceph01
>   0  hdd          2.7       1.0  2.7 TiB  1.8 TiB  1.8 TiB  590 MiB  6.9 GiB  908 GiB  66.81  0.91   44      up  osd.0
>   1  hdd          2.7       1.0  2.7 TiB  1.6 TiB  1.6 TiB  411 MiB  6.5 GiB  1.1 TiB  60.43  0.83   39      up  osd.1
>   4  hdd          2.7       1.0  2.7 TiB  1.8 TiB  1.8 TiB  501 MiB  6.8 GiB  898 GiB  67.15  0.92   43      up  osd.4
>   8  hdd          2.7       1.0  2.7 TiB  2.0 TiB  2.0 TiB  453 MiB  7.0 GiB  700 GiB  74.39  1.02   47      up  osd.8
>  11  hdd          1.7       1.0  1.7 TiB  1.3 TiB  1.3 TiB  356 MiB  5.6 GiB  433 GiB  75.39  1.03   31      up  osd.11
>  12  hdd          2.7       1.0  2.7 TiB  2.1 TiB  2.1 TiB  471 MiB  7.0 GiB  591 GiB  78.40  1.07   48      up  osd.12
>  14  hdd          2.7       1.0  2.7 TiB  1.6 TiB  1.6 TiB  448 MiB  6.0 GiB  1.1 TiB  59.68  0.82   38      up  osd.14
>  18  hdd          2.7       1.0  2.7 TiB  1.7 TiB  1.7 TiB  515 MiB  6.2 GiB  980 GiB  64.15  0.88   41      up  osd.18
>  22  hdd          1.7       1.0  1.7 TiB  1.2 TiB  1.2 TiB  360 MiB  4.2 GiB  491 GiB  72.06  0.98   29      up  osd.22
>  30  hdd          1.7       1.0  1.7 TiB  1.2 TiB  1.2 TiB  366 MiB  4.7 GiB  558 GiB  68.26  0.93   28      up  osd.30
>  33  hdd          1.5       1.0  1.6 TiB  1.2 TiB  1.2 TiB  406 MiB  4.9 GiB  427 GiB  74.28  1.01   29      up  osd.33
>  64  hdd          3.2       1.0  3.3 TiB  2.4 TiB  2.4 TiB  736 MiB  8.6 GiB  915 GiB  73.22  1.00   60      up  osd.64
>  -3          29.69995         -   30 TiB   22 TiB   22 TiB  5.4 GiB   81 GiB  7.9 TiB  73.20  1.00    -          host ceph02
>   2  hdd          1.7       1.0  1.7 TiB  1.3 TiB  1.2 TiB  402 MiB  5.2 GiB  476 GiB  72.93  1.00   30      up  osd.2
>   3  hdd          2.7       1.0  2.7 TiB  2.0 TiB  2.0 TiB  653 MiB  7.8 GiB  652 GiB  76.15  1.04   49      up  osd.3
>   7  hdd          2.7       1.0  2.7 TiB  2.5 TiB  2.5 TiB  456 MiB  7.7 GiB  209 GiB  92.36  1.26   56      up  osd.7
>   9  hdd          2.7       1.0  2.7 TiB  1.9 TiB  1.9 TiB  434 MiB  7.2 GiB  781 GiB  71.46  0.98   46      up  osd.9
>  13  hdd          2.3       1.0  2.4 TiB  1.6 TiB  1.6 TiB  451 MiB  6.1 GiB  823 GiB  66.28  0.91   38      up  osd.13
>  16  hdd          2.7       1.0  2.7 TiB  1.6 TiB  1.6 TiB  375 MiB  6.4 GiB  1.1 TiB  59.84  0.82   39      up  osd.16
>  19  hdd          1.7       1.0  1.7 TiB  1.1 TiB  1.1 TiB  323 MiB  4.7 GiB  601 GiB  65.80  0.90   27      up  osd.19
>  23  hdd          2.7       1.0  2.7 TiB  2.2 TiB  2.2 TiB  471 MiB  7.7 GiB  520 GiB  80.99  1.11   50      up  osd.23
>  24  hdd          1.7       1.0  1.7 TiB  1.4 TiB  1.4 TiB  371 MiB  5.5 GiB  273 GiB  84.44  1.15   32      up  osd.24
>  28  hdd          2.7       1.0  2.7 TiB  1.9 TiB  1.9 TiB  428 MiB  7.4 GiB  818 GiB  70.07  0.96   44      up  osd.28
>  31  hdd          2.7       1.0  2.7 TiB  2.0 TiB  2.0 TiB  516 MiB  7.4 GiB  660 GiB  75.85  1.04   48      up  osd.31
>  32  hdd          3.2       1.0  3.3 TiB  2.2 TiB  2.2 TiB  661 MiB  7.9 GiB  1.2 TiB  64.86  0.89   52      up  osd.32
>  -4          26.29996         -   26 TiB   18 TiB   18 TiB  4.3 GiB   73 GiB  8.0 TiB  69.58  0.95    -          host ceph03
>   5  hdd          1.7       1.0  1.7 TiB  1.2 TiB  1.2 TiB  298 MiB  5.2 GiB  541 GiB  69.21  0.95   29      up  osd.5
>   6  hdd          1.7       1.0  1.7 TiB  1.0 TiB  1.0 TiB  321 MiB  4.4 GiB  697 GiB  60.34  0.

[ceph-users] Documentation of the LVM metadata format

2021-04-19 Thread Nico Schottelius


Good morning,

is there any documentation available regarding the meta data stored
within LVM that ceph-volume manages / creates?

My background is that ceph-volume activate does not work on non-systemd
Linux distributions, but if I know how to recreate the tmpfs, we can
easily start the osd without systemd.

Any pointers in the right direction are appreciated.

Best regards,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Suspicious newsletter] cleanup multipart in radosgw

2021-04-19 Thread Szabo, Istvan (Agoda)
Hi,

You have 2 ways:

The first is using the S3 Browser app: in the menu, select the multipart uploads
and clean them up.
The other is like this:

Set lifecycle policy
On the client:
vim lifecyclepolicy

<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Rule>
    <ID>Incomplete Multipart Uploads</ID>
    <Prefix></Prefix>
    <Status>Enabled</Status>
    <AbortIncompleteMultipartUpload>
      <DaysAfterInitiation>1</DaysAfterInitiation>
    </AbortIncompleteMultipartUpload>
  </Rule>
</LifecycleConfiguration>

/bin/s3cmd setlifecycle lifecyclepolicy  s3://bucketname
On mon node process manually
radosgw-admin lc list
radosgw-admin lc process

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

-Original Message-
From: Boris Behrens 
Sent: Monday, April 19, 2021 4:10 PM
To: ceph-users@ceph.io
Subject: [Suspicious newsletter] [ceph-users] cleanup multipart in radosgw

Hi,
is there a way to remove multipart uploads that are older than X days?

It doesn't need to be build into ceph or is automated to the end. Just 
something I don't need to build on my own.

I currently try to debug a problem where ceph reports a lot more used space 
than it actually requires ( 
https://www.mail-archive.com/ceph-users@ceph.io/msg09810.html).

I came across a lot of old _multipart_ files in some buckets and now I want to 
clean them up.
I don't know if this will fix my problem but I would love to rule that out.

radosgw-admin bucket check --bucket=bucket --check-objects --fix does not work 
because it is a shareded bucket.

I have also some buckets that look like this, and contain 100% _multipart_ 
files which are >2 years old:
"buckets": [
{
"bucket": "ncprod",
"tenant": "",
"num_objects": -482,
"num_shards": 0,
"objects_per_shard": -482,
"fill_status": "OVER 180143985094819%"
}
]

--
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im 
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm: how to create more than 1 rgw per host

2021-04-19 Thread i...@z1storage.com

Hi Sebastian,

Thank you. Is there a way to create more than 1 rgw per host until this 
new feature is released?


On 2021/04/19 11:39, Sebastian Wagner wrote:

Hi Ivan,

this is a feature that is not yet released in Pacific. It seems the 
documentation is a bit ahead of time right now.


Sebastian

On Fri, Apr 16, 2021 at 10:58 PM i...@z1storage.com wrote:


Hello,

According to the documentation, there's count-per-host key to 'ceph
orch', but it does not work for me:

:~# ceph orch apply rgw z1 sa-1 --placement='label:rgw
count-per-host:2'
--port=8000 --dry-run
Error EINVAL: Host and label are mutually exclusive

Why does it say anything about Host if I don't specify any hosts, just
labels?

~# ceph orch host ls
HOST  ADDR  LABELS   STATUS
s101  s101  mon rgw
s102  s102  mgr mon rgw
s103  s103  mon rgw
s104  s104  mgr mon rgw
s105  s105  mgr mon rgw
s106  s106  mon rgw
s107  s107  mon rgw

___
ceph-users mailing list -- ceph-users@ceph.io

To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Documentation of the LVM metadata format

2021-04-19 Thread Nico Schottelius


The best questions are the ones that one can answer oneself.
The great documentation on

https://docs.ceph.com/en/latest/dev/ceph-volume/lvm/

gives the right pointers. The right search term is "lvm list tags", and it
turns up something like this:

[15:56:04] server20.place6:~# lvs -o lv_tags
  /dev/sda: open failed: No medium found
  /dev/sdb: open failed: No medium found
  LV Tags
  ceph.block_device=/dev/ceph-26fdb7c4-17af-42af-8353-06d95b0071c7/osd-block-bb63e9b6-b2d7-40d1-83ee-815262ae8b45,...
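
Using those tags, a minimal sketch of what `ceph-volume lvm activate` does for a
bluestore OSD, done by hand (OSD id and LV path are placeholders; the same
commands show up in ceph-volume's own output):

mkdir -p /var/lib/ceph/osd/ceph-$ID
mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-$ID
ceph-bluestore-tool prime-osd-dir --dev /dev/<vg>/<osd-block-lv> --path /var/lib/ceph/osd/ceph-$ID --no-mon-config
ln -snf /dev/<vg>/<osd-block-lv> /var/lib/ceph/osd/ceph-$ID/block
chown -h ceph:ceph /var/lib/ceph/osd/ceph-$ID/block
chown -R ceph:ceph /var/lib/ceph/osd/ceph-$ID
ceph-osd --cluster ceph --id $ID --setuser ceph --setgroup ceph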

If anyone is interested in upstream ceph-volume support for non-systemd
Linux distributions to activate the ceph-osds, let me know.

In any case, we'll publish our new style scripts on [0].

Best regards,

Nico

[0] https://code.ungleich.ch/ungleich-public/ungleich-tools

Nico Schottelius  writes:

> Good morning,
>
> is there any documentation available regarding the meta data stored
> within LVM that ceph-volume manages / creates?
>
> My background is that ceph-volume activate does not work on non-systemd
> Linux distributions, but if I know how to recreate the tmpfs, we can
> easily start the osd without systemd.
>
> Any pointers in the right direction are appreciated.
>
> Best regards,
>
> Nico


--
Sustainable and modern Infrastructures by ungleich.ch
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] BlueFS spillover detected (Nautilus 14.2.16)

2021-04-19 Thread by morphin
Hello.

I'm trying to fix a bad cluster deployment (Nautilus 14.2.16).
Cluster usage is 40%, an EC pool with RGW.

Every node has:
20 x OSD = TOSHIBA MG08SCA16TEY 16.0TB
2 x DB = NVMe PM1725b 1.6TB (Linux mdadm RAID1)

NVMe utilization always hovers around 90-99%.
With "iostat -xdh 1"
     r/s      w/s    rkB/s    wkB/s   rrqm/s    wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util  Device
  168.00  3619.00     7.2M   367.7M     0.00  90510.00   0.0%  96.2%     1.10     9.21   22.86     43.8k    104.0k   0.25  96.0%  nvme0c0n1
   19.00  3670.00     1.7M   373.5M     0.00  90510.00   0.0%  96.1%     0.26    29.61   95.99     89.7k    104.2k   0.27  98.0%  nvme1c1n1

The problem is: BLUEFS_SPILLOVER BlueFS spillover detected on 120 OSD(s)
 osd.194 spilled over 42 GiB metadata from 'db' device (39 GiB used of
50 GiB) to slow device
 osd.195 spilled over 34 GiB metadata from 'db' device (40 GiB used of
50 GiB) to slow device
 osd.196 spilled over 28 GiB metadata from 'db' device (40 GiB used of
50 GiB) to slow device
 osd.197 spilled over 25 GiB metadata from 'db' device (41 GiB used of
50 GiB) to slow device
 osd.198 spilled over 30 GiB metadata from 'db' device (41 GiB used of
50 GiB) to slow device
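
To quantify how much of each OSD's metadata sits on the slow device, the BlueFS
perf counters can be checked on the OSD host (needs the admin socket; counter
names may vary slightly by release):

ceph daemon osd.194 perf dump bluefs | grep -E 'db_used_bytes|slow_used_bytes'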

Block and wal size:
bluestore_block_db_size = 53687091200
bluestore_block_wal_size = 0

nvme0n1       259:2    0  1.5T  0 disk
└─md0           9:0    0  1.5T  0 raid1
  ├─md0p1     259:4    0   50G  0 md
  ├─md0p2     259:5    0   50G  0 md
  +n
  └─md0p20    259:22   0   50G  0 md


How can I change the level sizes up to 500MB --> 5GB --> 50GB?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Documentation of the LVM metadata format

2021-04-19 Thread Dimitri Savineau
Hi,

> My background is that ceph-volume activate does not work on non-systemd
> Linux distributions

Why not use the --no-systemd option with the ceph-volume activate
command?

The systemd part only enables and starts the service, but the tmpfs
part should work if you're not using systemd

https://github.com/ceph/ceph/blob/master/src/ceph-volume/ceph_volume/devices/lvm/activate.py#L212

Dimitri

On Monday, April 19, 2021, Nico Schottelius 
wrote:

>
> The best questions are the ones that one can answer oneself.
> The great documentation on
>
> https://docs.ceph.com/en/latest/dev/ceph-volume/lvm/
>
> gives the right pointers. The right search term is "lvm list tags" and
> results into something like this:
>
> [15:56:04] server20.place6:~# lvs -o lv_tags
>   /dev/sda: open failed: No medium found
>   /dev/sdb: open failed: No medium found
>   LV Tags
>   ceph.block_device=/dev/ceph-26fdb7c4-17af-42af-8353-
> 06d95b0071c7/osd-block-bb63e9b6-b2d7-40d1-83ee-815262ae8b45,...
>
> If anyone is interested in upstream ceph-volume support for non-systemd
> Linux distributions to activate the ceph-osds, let me know.
>
> In any case, we'll publish our new style scripts on [0].
>
> Best regards,
>
> Nico
>
> [0] https://code.ungleich.ch/ungleich-public/ungleich-tools
>
> Nico Schottelius  writes:
>
> > Good morning,
> >
> > is there any documentation available regarding the meta data stored
> > within LVM that ceph-volume manages / creates?
> >
> > My background is that ceph-volume activate does not work on non-systemd
> > Linux distributions, but if I know how to recreate the tmpfs, we can
> > easily start the osd without systemd.
> >
> > Any pointers in the right direction are appreciated.
> >
> > Best regards,
> >
> > Nico
>
>
> --
> Sustainable and modern Infrastructures by ungleich.ch
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Documentation of the LVM metadata format

2021-04-19 Thread Nico Schottelius


Hey Dimitri,

because --no-systemd still requires systemd:

[19:03:00] server20.place6:~# ceph-volume lvm activate --all --no-systemd
--> Executable systemctl not in PATH: 
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
-->  FileNotFoundError: [Errno 2] No such file or directory: 'systemctl': 
'systemctl'

Best regards,

Nico

Dimitri Savineau  writes:

> Hi,
>
>> My background is that ceph-volume activate does not work on non-systemd
>  Linux distributions
>
> Why not using the --no-systemd option during the ceph-volume activate
> command?
>
> The systemd part is only enabling and starting the service but the tmpfs
> part should work if you're not using systemd
>
> https://github.com/ceph/ceph/blob/master/src/ceph-volume/ceph_volume/devices/lvm/activate.py#L212
>
> Dimitri
>
> On Monday, April 19, 2021, Nico Schottelius 
> wrote:
>
>>
>> The best questions are the ones that one can answer oneself.
>> The great documentation on
>>
>> https://docs.ceph.com/en/latest/dev/ceph-volume/lvm/
>>
>> gives the right pointers. The right search term is "lvm list tags" and
>> results into something like this:
>>
>> [15:56:04] server20.place6:~# lvs -o lv_tags
>>   /dev/sda: open failed: No medium found
>>   /dev/sdb: open failed: No medium found
>>   LV Tags
>>   ceph.block_device=/dev/ceph-26fdb7c4-17af-42af-8353-
>> 06d95b0071c7/osd-block-bb63e9b6-b2d7-40d1-83ee-815262ae8b45,...
>>
>> If anyone is interested in upstream ceph-volume support for non-systemd
>> Linux distributions to activate the ceph-osds, let me know.
>>
>> In any case, we'll publish our new style scripts on [0].
>>
>> Best regards,
>>
>> Nico
>>
>> [0] https://code.ungleich.ch/ungleich-public/ungleich-tools
>>
>> Nico Schottelius  writes:
>>
>> > Good morning,
>> >
>> > is there any documentation available regarding the meta data stored
>> > within LVM that ceph-volume manages / creates?
>> >
>> > My background is that ceph-volume activate does not work on non-systemd
>> > Linux distributions, but if I know how to recreate the tmpfs, we can
>> > easily start the osd without systemd.
>> >
>> > Any pointers in the right direction are appreciated.
>> >
>> > Best regards,
>> >
>> > Nico
>>
>>
>> --
>> Sustainable and modern Infrastructures by ungleich.ch
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>


--
Sustainable and modern Infrastructures by ungleich.ch
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgrade and lost osds Operation not permitted

2021-04-19 Thread Behzad Khoshbakhti
Thanks, by commenting out the ProtectClock directive the issue is resolved.
Thanks for the support.
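
For what it's worth, instead of editing the packaged unit file (which gets
overwritten on upgrades), the same change can be made as a drop-in override; a
sketch:

systemctl edit ceph-osd@.service
# in the editor, add:
#   [Service]
#   ProtectClock=false
systemctl daemon-reload
systemctl restart ceph-osd@2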

On Sun, Apr 18, 2021 at 9:28 AM Lomayani S. Laizer 
wrote:

> Hello,
>
> Uncomment ProtectClock=true in /lib/systemd/system/ceph-osd@.service
> should fix the issue
>
>
>
> On Thu, Apr 8, 2021 at 9:49 AM Behzad Khoshbakhti 
> wrote:
>
>> I believe there is some problem in the systemd unit, as the OSD starts
>> successfully when run manually using the ceph-osd command.
>>
>> On Thu, Apr 8, 2021, 10:32 AM Enrico Kern 
>> wrote:
>>
>> > I agree. But why does the process start manual without systemd which
>> > obviously has nothing to do with uid/gid 167 ? It is also not really a
>> fix
>> > to let all users change uid/gids...
>> >
>> > On Wed, Apr 7, 2021 at 7:39 PM Wladimir Mutel  wrote:
>> >
>> > > Could there be a smoother migration? On my Ubuntu I have the same
>> > > behavior and my ceph uid/gid are also 64045.
>> > > I started with Luminous in 2018 when it was not containerized, and I
>> > > still continue updating it with apt.
>> > > Since when have we had this hardcoded value of 167 ???
>> > >
>> > > Andrew Walker-Brown wrote:
>> > > > UID and GID should both be 167, I believe.
>> > > >
>> > > > Make a note of the current values and change them to 167 using
>> > > > usermod and groupmod.
>> > > >
>> > > > I had just this issue. It’s partly to do with how perms are used
>> > > > within the containers, I think.
>> > > >
>> > > > I changed the values to 167 in passwd and everything worked again.
>> > > > Symptoms for me were OSDs not starting and permissions/file-not-found errors.
>> > > >
>> > > > Sent from my iPhone
>> > > >
>> > > > On 4 Apr 2021, at 13:43, Lomayani S. Laizer 
>> > wrote:
>> > > >
>> > > > 
>> > > > Hello,
>> > > > Permissions are correct. uid/gid is 64045/64045
>> > > >
>> > > > ls -alh
>> > > > total 32K
>> > > > drwxrwxrwt 2 ceph ceph  200 Apr  4 14:11 .
>> > > > drwxr-xr-x 8 ceph ceph 4.0K Sep 18  2018 ..
>> > > > lrwxrwxrwx 1 ceph ceph   93 Apr  4 14:11 block -> /dev/...
>> > > > -rw--- 1 ceph ceph   37 Apr  4 14:11 ceph_fsid
>> > > > -rw--- 1 ceph ceph   37 Apr  4 14:11 fsid
>> > > > -rw--- 1 ceph ceph   56 Apr  4 14:11 keyring
>> > > > -rw--- 1 ceph ceph6 Apr  4 14:11 ready
>> > > > -rw--- 1 ceph ceph3 Apr  4 14:11 require_osd_release
>> > > > -rw--- 1 ceph ceph   10 Apr  4 14:11 type
>> > > > -rw--- 1 ceph ceph3 Apr  4 14:11 whoami
>> > > >
>> > > > On Sun, Apr 4, 2021 at 3:07 PM Andrew Walker-Brown <
>> > > andrew_jbr...@hotmail.com> wrote:
>> > > > Are the file permissions correct and UID/GID in passwd both 167?
>> > > >
>> > > > Sent from my iPhone
>> > > >
>> > > > On 4 Apr 2021, at 12:29, Lomayani S. Laizer <lomlai...@gmail.com> wrote:
>> > > >
>> > > > Hello,
>> > > >
>> > > > +1 Am facing the same problem in ubuntu after upgrade to pacific
>> > > >
>> > > > 2021-04-03T10:36:07.698+0300 7f9b8d075f00 -1
>> > bluestore(/var/lib/ceph/osd/
>> > > > ceph-29/block) _read_bdev_label failed to open
>> > > /var/lib/ceph/osd/ceph-29/block:
>> > > > (1) Operation not permitted
>> > > > 2021-04-03T10:36:07.698+0300 7f9b8d075f00 -1 ESC[0;31m ** ERROR:
>> unable
>> > > to
>> > > > open OSD superblock on /var/lib/ceph/osd/ceph-29: (2) No such file
>> or
>> > > > directoryESC[0m
>> > > >
>> > > > On Sun, Apr 4, 2021 at 1:52 PM Behzad Khoshbakhti <
>> > > khoshbakh...@gmail.com>
>> > > > wrote:
>> > > >
>> > > >> It is worth mentioning that as I issue the following command, the Ceph
>> > > >> OSD starts and joins the cluster:
>> > > >> /usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph
>> --setgroup
>> > > ceph
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Sun, Apr 4, 2021 at 3:00 PM Behzad Khoshbakhti <
>> > > khoshbakh...@gmail.com>
>> > > >> wrote:
>> > > >>
>> > > >>> Hi all,
>> > > >>>
>> > > >>> As I have upgraded my Ceph cluster from 15.2.10 to 16.2.0, during the
>> > > >>> manual upgrade using the precompiled packages, the OSDs were down with
>> > > >>> the following messages:
>> > > >>>
>> > > >>> root@osd03:/var/lib/ceph/osd/ceph-2# ceph-volume lvm activate
>> --all
>> > > >>> --> Activating OSD ID 2 FSID 2d3ffc61-e430-4b89-bcd4-105b2df26352
>> > > >>> Running command: /usr/bin/chown -R ceph:ceph
>> /var/lib/ceph/osd/ceph-2
>> > > >>> Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph
>> > > >> prime-osd-dir
>> > > >>> --dev
>> > > >>>
>> > > >>
>> > >
>> >
>> /dev/ceph-9d37674b-a269-4239-aa9e-66a3c74df76c/osd-block-2d3ffc61-e430-4b89-bcd4-105b2df26352
>> > > >>> --path /var/lib/ceph/osd/ceph-2 --no-mon-config
>> > > >>> Running command: /usr/bin/ln -snf
>> > > >>>
>> > > >>
>> > >
>> >
>> /dev/ceph-9d37674b-a269-4239-aa9e-66a3c74df76c/osd-block-2d3ffc61-e430-4b89-bcd4-105b2df26352
>> > > >>> /var/lib/ceph/osd/ceph-2/block
>> > > >>> Running command: /usr/bin/chown -h ceph:ceph
>> > > >> /var/lib/ceph/osd/ceph-

[ceph-users] Re: any experience on using Bcache on top of HDD OSD

2021-04-19 Thread Matthias Ferdinand
On Sun, Apr 18, 2021 at 10:31:30PM +0200, huxia...@horebdata.cn wrote:
> Dear Cephers,
> 
> Just curious about any one who has some experience on using Bcache on top of 
> HDD OSD to accelerate IOPS performance? 
> 
> If any, how about the stability and the performance improvement, and for how 
> long the running time?

Hi,

I have not used bcache with Bluestore, but I use bcache in a Jewel
cluster with Filestore on XFS on bcache on HDD, and I haven't seen any
bcache-related trouble with this setup so far. I don't have journal on
bcache, journal is separated out to SSD.
From time to time I drained and detached the caching device from an OSD
just to see if the added complexity still has some value, but latency
(as measured by iostat on the OSD) would go up by a factor of about 2 so
I kept bcache active on HDD OSDs.

For Bluestore I don't know how much (if at all) bcache would improve
performance where WAL/DB is placed on SSD already.

A word of warning: never use non-DC-class SSDs for bcache caching
devices. bcache does some write amplification, and you will regret it if
using consumer grade SSDs. Might even turn out slower than plain HDD
with highly irregular latency spikes.


Regards
Matthias
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Documentation of the LVM metadata format

2021-04-19 Thread Dimitri Savineau
So that's a bug ;)

https://github.com/ceph/ceph/blob/master/src/ceph-volume/ceph_volume/devices/lvm/activate.py#L248-L251

This doesn't honor the --no-systemd flag.

But this should work when you're not using the --all option.

Dimitri


On Mon, Apr 19, 2021 at 10:41 AM Nico Schottelius <
nico.schottel...@ungleich.ch> wrote:

>
> Hey Dimitir,
>
> because --no-systemd still requires systemd:
>
> [19:03:00] server20.place6:~# ceph-volume lvm activate --all --no-systemd
> --> Executable systemctl not in PATH:
> /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
> -->  FileNotFoundError: [Errno 2] No such file or directory: 'systemctl':
> 'systemctl'
>
> Best regards,
>
> Nico
>
> Dimitri Savineau  writes:
>
> > Hi,
> >
> >> My background is that ceph-volume activate does not work on non-systemd
> >  Linux distributions
> >
> > Why not using the --no-systemd option during the ceph-volume activate
> > command?
> >
> > The systemd part is only enabling and starting the service but the tmpfs
> > part should work if you're not using systemd
> >
> >
> https://github.com/ceph/ceph/blob/master/src/ceph-volume/ceph_volume/devices/lvm/activate.py#L212
> >
> > Dimitri
> >
> > On Monday, April 19, 2021, Nico Schottelius <
> nico.schottel...@ungleich.ch>
> > wrote:
> >
> >>
> >> The best questions are the ones that one can answer oneself.
> >> The great documentation on
> >>
> >> https://docs.ceph.com/en/latest/dev/ceph-volume/lvm/
> >>
> >> gives the right pointers. The right search term is "lvm list tags" and
> >> results into something like this:
> >>
> >> [15:56:04] server20.place6:~# lvs -o lv_tags
> >>   /dev/sda: open failed: No medium found
> >>   /dev/sdb: open failed: No medium found
> >>   LV Tags
> >>   ceph.block_device=/dev/ceph-26fdb7c4-17af-42af-8353-
> >> 06d95b0071c7/osd-block-bb63e9b6-b2d7-40d1-83ee-815262ae8b45,...
> >>
> >> If anyone is interested in upstream ceph-volume support for non-systemd
> >> Linux distributions to activate the ceph-osds, let me know.
> >>
> >> In any case, we'll publish our new style scripts on [0].
> >>
> >> Best regards,
> >>
> >> Nico
> >>
> >> [0] https://code.ungleich.ch/ungleich-public/ungleich-tools
> >>
> >> Nico Schottelius  writes:
> >>
> >> > Good morning,
> >> >
> >> > is there any documentation available regarding the meta data stored
> >> > within LVM that ceph-volume manages / creates?
> >> >
> >> > My background is that ceph-volume activate does not work on
> non-systemd
> >> > Linux distributions, but if I know how to recreate the tmpfs, we can
> >> > easily start the osd without systemd.
> >> >
> >> > Any pointers in the right direction are appreciated.
> >> >
> >> > Best regards,
> >> >
> >> > Nico
> >>
> >>
> >> --
> >> Sustainable and modern Infrastructures by ungleich.ch
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >>
>
>
> --
> Sustainable and modern Infrastructures by ungleich.ch
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Logging to Graylog

2021-04-19 Thread Andrew Walker-Brown
Hi All,

I want to send Ceph logs out to an external Graylog server.  I’ve configured 
the Graylog host IP using “ceph config set global log_graylog_host x.x.x.x” and 
enabled logging through the Ceph dashboard (I’m running Octopus 15.2.9 – 
container-based).  I’ve also set up a GELF UDP input on Graylog.

Do I need to do anything else?  Any services that need to be restarted?
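
For reference, besides the host there are separate switches that control whether
anything is actually sent; a sketch (option names are worth double-checking with
`ceph config help <option>` on your release):

ceph config set global log_to_graylog true
ceph config set global err_to_graylog true
ceph config set global log_graylog_port 12201
ceph config set global mon_cluster_log_to_graylog true

Some of these may only be picked up at daemon start-up, so a restart of the
affected services may still be needed.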

Cheers

Andrew

Sent from Mail for Windows 10
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: BlueFS spillover detected (Nautilus 14.2.16)

2021-04-19 Thread by morphin
Thanks for the answer. It seems very easy.
I've never played with rocksdb options before. I've always used the defaults,
and I think I need to play with them more, but I couldn't find a good
config reference to understand them on the Ceph side.
Can I use this guide instead?
https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide

This is the default options: "ceph config help bluestore_rocksdb_options"

bluestore_rocksdb_options - Rocksdb options
  (str, advanced)
  Default: 
compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152,max_background_compactions=2
  Can update at runtime: false


> * get your release's options: `ceph config help bluestore_rocksdb_options`
> * append `max_bytes_for_level_base=536870912` to this list
> * set `ceph config set osd bluestore_rocksdb_options `

 Are you trying to say I should add these (below) options to the config?

- options.max_bytes_for_level_base = 536870912; // 512MB
- options.max_bytes_for_level_multiplier = 10;
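
Spelled out, that would presumably mean taking the default string shown above
and appending the level base, e.g.:

ceph config set osd bluestore_rocksdb_options compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152,max_background_compactions=2,max_bytes_for_level_base=536870912

followed by an OSD restart, as noted below.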


At the link below I've found a tuned config, but it's for an all-flash cluster.

https://ceph.io/community/bluestore-default-vs-tuned-performance-comparison/
bluestore_rocksdb_options =
compression=kNoCompression,max_write_buffer_number=32,min_write_buffer_number_to_merge=2,recycle_log_file_num=32,compaction_style=kCompactionStyleLevel,write_buffer_size=67108864,target_file_size_base=67108864,max_background_compactions=31,level0_file_num_compaction_trigger=8,level0_slowdown_writes_trigger=32,level0_stop_writes_trigger=64,max_bytes_for_level_base=536870912,compaction_threads=32,max_bytes_for_level_multiplier=8,flusher_threads=8,compaction_readahead_size=2MB

Also, I have 2 types of OSDs: SSDs for the RGW index (no external DB) and HDDs
for the EC pool with NVMe cache. Do I need different options?






Konstantin Shalygin wrote on Mon, 19 Apr 2021 at 20:16:
>
> You need to adjust max_bytes_for_level_base rocksdb option from 
> bluestore_rocksdb_options to 536870912
>
> * get your release's options: `ceph config help bluestore_rocksdb_options`
> * append `max_bytes_for_level_base=536870912` to this list
> * set `ceph config set osd bluestore_rocksdb_options `
>
> Restart your OSD's.
>
>
> k
>
> On 19 Apr 2021, at 17:34, by morphin  wrote:
>
> How can I change the level up to 500MB --> 5GB --> 50GB ?
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] HBA vs caching Raid controller

2021-04-19 Thread Nico Schottelius


Good evening,

I have to tackle an old, probably recurring topic: HBAs vs. RAID
controllers. While generally speaking many people in the Ceph field
recommend going with HBAs, it seems that in our infrastructure the only
server we phased in with an HBA instead of a RAID controller is actually
doing worse in terms of latency.

For the background: we have many Perc H800+MD1200 [1] systems running with
10TB HDDs (raid0, read ahead, writeback cache).
One server has LSI SAS3008 [0] instead of the Perc H800,
which comes with 512MB RAM + BBU. On most servers latencies are around
4-12ms (average 6ms), on the system with the LSI controller we see
20-60ms (average 30ms) latency.

Now, my question is: are we doing something inherently wrong with the
SAS3008, or does the cache in fact help to possibly reduce seek time?

We were considering moving more towards LSI HBAs to reduce maintenance
effort; however, if we have a factor of 5 in latency between the two
different systems, it might be better to stay on the H800 path for
disks.

Any input/experiences appreciated.

Best regards,

Nico

[0]
05:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008 
PCI-Express Fusion-MPT SAS-3 (rev 02)
Subsystem: Dell 12Gbps HBA
Kernel driver in use: mpt3sas
Kernel modules: mpt3sas

[1]
08:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 
[Liberator] (rev 05)
Subsystem: Dell PERC H800 Adapter
Kernel driver in use: megaraid_sas
Kernel modules: megaraid_sas

--
Sustainable and modern Infrastructures by ungleich.ch
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: HBA vs caching Raid controller

2021-04-19 Thread Marc
> For the background: we have many Perc H800+MD1200 [1] systems running
> with
> 10TB HDDs (raid0, read ahead, writeback cache).
> One server has LSI SAS3008 [0] instead of the Perc H800,
> which comes with 512MB RAM + BBU. On most servers latencies are around
> 4-12ms (average 6ms), on the system with the LSI controller we see
> 20-60ms (average 30ms) latency.

How did you get these latencies? Then maybe I can show you what I have with the
SAS2308.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: HBA vs caching Raid controller

2021-04-19 Thread Nico Schottelius


Marc  writes:

>> For the background: we have many Perc H800+MD1200 [1] systems running
>> with
>> 10TB HDDs (raid0, read ahead, writeback cache).
>> One server has LSI SAS3008 [0] instead of the Perc H800,
>> which comes with 512MB RAM + BBU. On most servers latencies are around
>> 4-12ms (average 6ms), on the system with the LSI controller we see
>> 20-60ms (average 30ms) latency.
>
> How did you get these latencies? Then I can show you maybe what I have with 
> the SAS2308.

Via grafana->prometheus->ceph-mgr:


avg by (hostname) (ceph_osd_apply_latency_ms{dc="$place"} * on
(ceph_daemon) group_left(hostname) ceph_osd_metadata{dc="$place"})


where $place = the data center name. I cross checked the numbers with
the OSDs using


ceph_osd_apply_latency_ms{dc="$place"}


which showed that all OSDs attached to that controller are in a similar
range, so the average above is not hiding "one bad osd".

Does that help?


--
Sustainable and modern Infrastructures by ungleich.ch
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: HBA vs caching Raid controller

2021-04-19 Thread Marc



This is what I have when I query Prometheus. Most HDDs are still SATA 5400rpm,
and there are also some SSDs. I also did not optimize CPU frequency settings.
(Forget about instance=c03; that is just because the data comes from mgr
c03, these drives are on different hosts.)

ceph_osd_apply_latency_ms

ceph_osd_apply_latency_ms{ceph_daemon="osd.0", instance="c03", job="ceph"}  11
ceph_osd_apply_latency_ms{ceph_daemon="osd.1", instance="c03", job="ceph"}  5
ceph_osd_apply_latency_ms{ceph_daemon="osd.10", instance="c03", job="ceph"} 8
ceph_osd_apply_latency_ms{ceph_daemon="osd.11", instance="c03", job="ceph"} 33
ceph_osd_apply_latency_ms{ceph_daemon="osd.12", instance="c03", job="ceph"} 42
ceph_osd_apply_latency_ms{ceph_daemon="osd.13", instance="c03", job="ceph"} 17
ceph_osd_apply_latency_ms{ceph_daemon="osd.14", instance="c03", job="ceph"} 27
ceph_osd_apply_latency_ms{ceph_daemon="osd.15", instance="c03", job="ceph"} 15
ceph_osd_apply_latency_ms{ceph_daemon="osd.16", instance="c03", job="ceph"} 14
ceph_osd_apply_latency_ms{ceph_daemon="osd.17", instance="c03", job="ceph"} 4
ceph_osd_apply_latency_ms{ceph_daemon="osd.18", instance="c03", job="ceph"} 18
ceph_osd_apply_latency_ms{ceph_daemon="osd.19", instance="c03", job="ceph"} 1
ceph_osd_apply_latency_ms{ceph_daemon="osd.2", instance="c03", job="ceph"}  14
ceph_osd_apply_latency_ms{ceph_daemon="osd.20", instance="c03", job="ceph"} 0
ceph_osd_apply_latency_ms{ceph_daemon="osd.21", instance="c03", job="ceph"} 0
ceph_osd_apply_latency_ms{ceph_daemon="osd.22", instance="c03", job="ceph"} 9
ceph_osd_apply_latency_ms{ceph_daemon="osd.23", instance="c03", job="ceph"} 2
ceph_osd_apply_latency_ms{ceph_daemon="osd.24", instance="c03", job="ceph"} 0
ceph_osd_apply_latency_ms{ceph_daemon="osd.25", instance="c03", job="ceph"} 15
ceph_osd_apply_latency_ms{ceph_daemon="osd.26", instance="c03", job="ceph"} 18
ceph_osd_apply_latency_ms{ceph_daemon="osd.27", instance="c03", job="ceph"} 0
ceph_osd_apply_latency_ms{ceph_daemon="osd.28", instance="c03", job="ceph"} 4
ceph_osd_apply_latency_ms{ceph_daemon="osd.29", instance="c03", job="ceph"} 0
ceph_osd_apply_latency_ms{ceph_daemon="osd.3", instance="c03", job="ceph"}  10
ceph_osd_apply_latency_ms{ceph_daemon="osd.30", instance="c03", job="ceph"} 0
ceph_osd_apply_latency_ms{ceph_daemon="osd.31", instance="c03", job="ceph"} 2
ceph_osd_apply_latency_ms{ceph_daemon="osd.32", instance="c03", job="ceph"} 0
ceph_osd_apply_latency_ms{ceph_daemon="osd.33", instance="c03", job="ceph"} 1
ceph_osd_apply_latency_ms{ceph_daemon="osd.34", instance="c03", job="ceph"} 0
ceph_osd_apply_latency_ms{ceph_daemon="osd.35", instance="c03", job="ceph"} 2
ceph_osd_apply_latency_ms{ceph_daemon="osd.36", instance="c03", job="ceph"} 2
ceph_osd_apply_latency_ms{ceph_daemon="osd.37", instance="c03", job="ceph"} 0
ceph_osd_apply_latency_ms{ceph_daemon="osd.38", instance="c03", job="ceph"} 0
ceph_osd_apply_latency_ms{ceph_daemon="osd.39", instance="c03", job="ceph"} 1
ceph_osd_apply_latency_ms{ceph_daemon="osd.4", instance="c03", job="ceph"}  11
ceph_osd_apply_latency_ms{ceph_daemon="osd.40", instance="c03", job="ceph"} 8
ceph_osd_apply_latency_ms{ceph_daemon="osd.41", instance="c03", job="ceph"} 5
ceph_osd_apply_latency_ms{ceph_daemon="osd.5", instance="c03", job="ceph"}  12
ceph_osd_apply_latency_ms{ceph_daemon="osd.6", instance="c03", job="ceph"}  18
ceph_osd_apply_latency_ms{ceph_daemon="osd.7", instance="c03", job="ceph"}  8
ceph_osd_apply_latency_ms{ceph_daemon="osd.8", instance="c03", job="ceph"}  33
ceph_osd_apply_latency_ms{ceph_daemon="osd.9", instance="c03", job="ceph"}  22

avg (ceph_osd_apply_latency_ms)
9.336


So I guess it is possible for you to get lower values on the LSI HBA.

Maybe you can tune read-ahead on the LSI with something like this:
echo 8192 > /sys/block/$line/queue/read_ahead_kb
echo 1024 > /sys/block/$line/queue/nr_requests

Also check for PCIe 3; those cards have higher bus speeds.



> -Original Message-
> Sent: 19 April 2021 20:57
> Subject: Re: [ceph-users] HBA vs caching Raid controller
> 
> 
> 
> >> For the background: we have many Perc H800+MD1200 [1] systems running
> >> with
> >> 10TB HDDs (raid0, read ahead, writeback cache).
> >> One server has LSI SAS3008 [0] instead of the Perc H800,
> >> which comes with 512MB RAM + BBU. On most servers latencies are
> around
> >> 4-12ms (average 6ms), on the system with the LSI controller we see
> >> 20-60ms (average 30ms) latency.
> >
> > How did you get these latencies? Then I can show you maybe what I have
> with the SAS2308.
> 
> Via grafana->prometheus->ceph-mgr:
> 
> 
> 
> avg by (hostname) (ceph_osd_apply_latency_ms{dc="$place"} * on
> (ceph_daemon) group_left(hostname) ceph_osd_met

[ceph-users] Re: any experience on using Bcache on top of HDD OSD

2021-04-19 Thread Richard Bade
Hi,
I also have used bcache extensively on filestore with journals on SSD
for at least 5 years. This has worked very well in all versions up to
luminous. The iops improvement was definitely beneficial for vm disk
images in rbd. I am also using it under bluestore with db/wal on nvme
on both Luminous and Nautilus. This is a smaller cluster also for vm
disk images on rbd, and the iops appeared to be improved. However as
Matthias mentioned, be sure to get the right types of SSD's as you can
have strange performance issues. Make sure you do fio testing as I
found on paper Intel DC S4600 looked ok but performed very badly in
sequential 4k writes. Also keep an eye on your drive writes per day as
in a busy cluster you may hit 1DWPD or more.
Stability has been very good with bcache. I'm using it on Ubuntu and
have had to use newer kernels on some of the historical LTS releases.
In 16.04 the 4.15 edge kernel is recommended, newer versions stock
kernel is fine.
Rich
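
For reference, one commonly used check for this kind of SSD is a single-job
O_DSYNC 4k write test along these lines (destructive for data on the target
device, so only run it against an empty disk; /dev/sdX is a placeholder):

fio --name=synctest --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting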

On Tue, 20 Apr 2021 at 03:37, Matthias Ferdinand  wrote:
>
> On Sun, Apr 18, 2021 at 10:31:30PM +0200, huxia...@horebdata.cn wrote:
> > Dear Cephers,
> >
> > Just curious about any one who has some experience on using Bcache on top 
> > of HDD OSD to accelerate IOPS performance?
> >
> > If any, how about the stability and the performance improvement, and for 
> > how long the running time?
>
> Hi,
>
> I have not used bcache with Bluestore, but I use bcache in a Jewel
> cluster with Filestore on XFS on bcache on HDD, and I haven't seen any
> bcache-related trouble with this setup so far. I don't have journal on
> bcache, journal is separated out to SSD.
> From time to time I drained and detached the caching device from an OSD
> just to see if the added complexity still has some value, but latency
> (as measured by iostat on the OSD) would go up by a factor of about 2 so
> I kept bcache active on HDD OSDs.
>
> For Bluestore I don't know how much (if at all) bcache would improve
> performance where WAL/DB is placed on SSD already.
>
> A word of warning: never use non-DC-class SSDs for bcache caching
> devices. bcache does some write amplification, and you will regret it if
> using consumer grade SSDs. Might even turn out slower than plain HDD
> with highly irregular latency spikes.
>
>
> Regards
> Matthias
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] EC Backfill Observations

2021-04-19 Thread Josh Baergen
Hey all,

I wanted to confirm my understanding of some of the mechanics of
backfill in EC pools. I've yet to find a document that outlines this
in detail; if there is one, please send it my way. :) Some of what I
write below is likely in the "well, duh" category, but I tended
towards completeness.

First off, I understand that backfill reservations work the same way
between replicated pools and EC pools. A local reservation is taken on
the primary OSD, then a remote reservation on the backfill target(s),
before the backfill is allowed to begin. Until this point, the
backfill is in the backfill_wait state.

When the backfill begins, though, is when the differences begin. Let's
say we have an EC 3:2 PG that's backfilling from OSD 2 to OSD 5
(formatted here like pgs_brief):

1.1  active+remapped+backfilling   [0,1,5,3,4]  0   [0,1,2,3,4]  0
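
For readers following along: the up set [0,1,5,3,4] and the acting set
[0,1,2,3,4] differ only at shard index 2, i.e. the chunk moving from osd.2 to
osd.5. A listing in this shape can be pulled with, for example:

ceph pg dump pgs_brief | grep backfill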

The question in my mind was: Where is the data for this backfill
coming from? In replicated pools, all reads come from the primary.
However, in this case, the primary does not have the data in question;
the primary has to either read the EC chunk from OSD 2, or it has to
reconstruct it by reading from 3 of the OSDs in the acting set.

Based on observation, I _think_ this is what happens:
1. As long as the PG is not degraded, the backfill read is simply
forwarded by the primary to OSD 2.
2. Once the PG becomes degraded, the backfill read needs to use the
reconstructing path, and begins reading from 3 of the OSDs in the
acting set.

Questions:
1. Can anyone confirm or correct my description of how EC backfill
operates? In particular, in case 2 above, does it matter whether OSD 2
is the cause of degradation, for example? Does the read still get
forwarded to a single OSD when it's parity chunks that are being moved
via backfill?
2. I'm curious as to why a 3rd reservation, for the source OSD, wasn't
introduced as a part of EC in Ceph. We've occasionally seen an OSD
become overloaded because several backfills were reading from it
simultaneously, and there's no way to control this via the normal
osd_max_backfills mechanism. Is anyone aware of discussions to this
effect?

Thanks!
Josh
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] v15.2.11 Octopus released

2021-04-19 Thread David Galloway
This is the 11th bugfix release in the Octopus stable series.  It
addresses a security vulnerability in the Ceph authentication framework.
We recommend users to update to this release. For a detailed release
notes with links & changelog please refer to the official blog entry at
https://ceph.io/releases/v15-2-11-octopus-released

Security Fixes
--

* This release includes a security fix that ensures the global_id value
(a numeric value that should be unique for every authenticated client or
daemon in the cluster) is reclaimed after a network disconnect or ticket
renewal in a secure fashion. Two new health alerts may appear during the
upgrade indicating that there are clients or daemons that are not yet
patched with the appropriate fix.
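
In practice, per the full release notes, the post-upgrade sequence looks roughly
like this (only disable the transitional behaviour once every client and daemon
has been upgraded):

ceph health detail   # watch for the AUTH_INSECURE_GLOBAL_ID_RECLAIM* warnings
ceph config set mon auth_allow_insecure_global_id_reclaim false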

Getting Ceph

* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-15.2.11.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: e3523634d9c2227df9af89a4eac33d16738c49cb

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] v14.2.20 Nautilus released

2021-04-19 Thread David Galloway
This is the 20th bugfix release in the Nautilus stable series.  It
addresses a security vulnerability in the Ceph authentication framework.
We recommend users to update to this release. For a detailed release
notes with links & changelog please refer to the official blog entry at
https://ceph.io/releases/v14-2-20-nautilus-released

Security Fixes
--

* This release includes a security fix that ensures the global_id value
(a numeric value that should be unique for every authenticated client or
daemon in the cluster) is reclaimed after a network disconnect or ticket
renewal in a secure fashion.  Two new health alerts may appear during
the upgrade indicating that there are clients or daemons that are not
yet patched with the appropriate fix.

Getting Ceph

* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-14.2.20.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: 36274af6eb7f2a5055f2d53ad448f2694e9046a0

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] v16.2.1 Pacific released

2021-04-19 Thread David Galloway
This is the first bugfix release in the Pacific stable series. It
addresses a security vulnerability in the Ceph authentication framework.
 We recommend users to update to this release. For a detailed release
notes with links & changelog please refer to the official blog entry at
https://ceph.io/releases/v16-2-1-pacific-released

Security Fixes
--

* This release includes a security fix that ensures the global_id value
(a numeric value that should be unique for every authenticated client or
daemon in the cluster) is reclaimed after a network disconnect or ticket
renewal in a secure fashion.  Two new health alerts may appear during
the upgrade indicating that there are clients or daemons that are not
yet patched with the appropriate fix.

Getting Ceph

* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-16.2.1.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: afb9061ab4117f798c858c741efa6390e48ccf10

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: any experience on using Bcache on top of HDD OSD

2021-04-19 Thread huxia...@horebdata.cn
Dear Matthias,

Very glad to know that your setup with bcache works well in production.

How long have you been putting XFS on bcache on HDD in production? Which bcache
version (I mean the kernel) do you use? Or do you use a special version of
bcache?

thanks in advance,

samuel





huxia...@horebdata.cn
 
From: Matthias Ferdinand
Date: 2021-04-19 17:35
To: huxia...@horebdata.cn
CC: ceph-users
Subject: Re: [ceph-users] any experience on using Bcache on top of HDD OSD
On Sun, Apr 18, 2021 at 10:31:30PM +0200, huxia...@horebdata.cn wrote:
> Dear Cephers,
> 
> Just curious about any one who has some experience on using Bcache on top of 
> HDD OSD to accelerate IOPS performance? 
> 
> If any, how about the stability and the performance improvement, and for how 
> long the running time?
 
Hi,
 
I have not used bcache with Bluestore, but I use bcache in a Jewel
cluster with Filestore on XFS on bcache on HDD, and I haven't seen any
bcache-related trouble with this setup so far. I don't have journal on
bcache, journal is separated out to SSD.
From time to time I drained and detached the caching device from an OSD
just to see if the added complexity still has some value, but latency
(as measured by iostat on the OSD) would go up by a factor of about 2 so
I kept bcache active on HDD OSDs.
 
For Bluestore I don't know how much (if at all) bcache would improve
performance where WAL/DB is placed on SSD already.
 
A word of warning: never use non-DC-class SSDs for bcache caching
devices. bcache does some write amplification, and you will regret it if
using consumer grade SSDs. Might even turn out slower than plain HDD
with highly irregular latency spikes.
 
 
Regards
Matthias
 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io