[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-11 Thread Xiubo Li


On 4/11/23 03:24, Thomas Widhalm wrote:

Hi,

If you remember, I hit bug https://tracker.ceph.com/issues/58489 so I 
was very relieved when 17.2.6 was released and started to update 
immediately.



Please note, this fix is not in the v17.2.6 yet in upstream code.

Thanks

- Xiubo


But now I'm stuck again with my broken MDS. MDS won't get into 
up:active without the update but the update waits for them to get into 
up:active state. Seems like a deadlock / chicken-egg problem to me.


Since I'm still relatively new to Ceph, could you help me?

What I see when watching the update status:

{
    "target_image": 
"quay.io/ceph/ceph@sha256:1161e35e4e02cf377c93b913ce78773f8413f5a8d7c5eaee4b4773a4f9dd6635",

    "in_progress": true,
    "which": "Upgrading all daemon types on all hosts",
    "services_complete": [
    "crash",
    "mgr",
"mon",
"osd"
    ],
    "progress": "18/40 daemons upgraded",
    "message": "Error: UPGRADE_OFFLINE_HOST: Upgrade: Failed to 
connect to host ceph01 at addr (192.168.23.61)",

    "is_paused": false
}

(The offline host was one host that broke during the upgrade. I fixed 
that in the meantime and the update went on.)
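
For reference, a rough sketch of what I'm using to watch and control the upgrade (assuming the cephadm orchestrator, nothing exotic):

ceph orch upgrade status      # prints the JSON shown above
ceph orch upgrade pause       # temporarily stop the upgrade loop
ceph orch upgrade resume
ceph -W cephadm               # follow the cephadm log live
ceph fs status                # shows which MDS daemons exist and in which state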


And in the log:

2023-04-10T19:23:48.750129+ mgr.ceph04.qaexpv [INF] Upgrade: 
Waiting for mds.mds01.ceph04.hcmvae to be up:active (currently up:replay)
2023-04-10T19:23:58.758141+ mgr.ceph04.qaexpv [WRN] Upgrade: No 
mds is up; continuing upgrade procedure to poke things in the right 
direction



Please give me a hint what I can do.

Cheers,
Thomas

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-11 Thread Thomas Widhalm



On 11.04.23 09:16, Xiubo Li wrote:


On 4/11/23 03:24, Thomas Widhalm wrote:

Hi,

If you remember, I hit bug https://tracker.ceph.com/issues/58489 so I 
was very relieved when 17.2.6 was released and started to update 
immediately.



Please note, this fix is not in the v17.2.6 yet in upstream code.



Thanks for the information. I misread the information in the tracker. Do 
you have a predicted schedule for the backport? Or should I go for a 
specific pre-release? I don't want to take chances but I'm desperate 
because my production system is affected and offline for several weeks now.


Thanks,
Thomas


Thanks

- Xiubo


But now I'm stuck again with my broken MDS. MDS won't get into 
up:active without the update but the update waits for them to get into 
up:active state. Seems like a deadlock / chicken-egg problem to me.


Since I'm still relatively new to Ceph, could you help me?

What I see when watching the update status:

{
    "target_image": 
"quay.io/ceph/ceph@sha256:1161e35e4e02cf377c93b913ce78773f8413f5a8d7c5eaee4b4773a4f9dd6635",

    "in_progress": true,
    "which": "Upgrading all daemon types on all hosts",
    "services_complete": [
    "crash",
    "mgr",
"mon",
"osd"
    ],
    "progress": "18/40 daemons upgraded",
    "message": "Error: UPGRADE_OFFLINE_HOST: Upgrade: Failed to 
connect to host ceph01 at addr (192.168.23.61)",

    "is_paused": false
}

(The offline host was one host that broke during the upgrade. I fixed 
that in the meantime and the update went on.)


And in the log:

2023-04-10T19:23:48.750129+ mgr.ceph04.qaexpv [INF] Upgrade: 
Waiting for mds.mds01.ceph04.hcmvae to be up:active (currently up:replay)
2023-04-10T19:23:58.758141+ mgr.ceph04.qaexpv [WRN] Upgrade: No 
mds is up; continuing upgrade procedure to poke things in the right 
direction



Please give me a hint what I can do.

Cheers,
Thomas

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io






[ceph-users] Re: Disks are filling up even if there is not a single placement group on them

2023-04-11 Thread Michal Strnad

Hi.

Thank you for the explanation. I get it now.

Michal



On 4/10/23 20:44, Alexander E. Patrakov wrote:

On Sat, Apr 8, 2023 at 2:26 PM Michal Strnad  wrote:

cluster:
  id: a12aa2d2-fae7-df35-ea2f-3de23100e345
  health: HEALTH_WARN

...

  pgs: 1656117639/32580808518 objects misplaced (5.083%)


That's why the space is eaten. The stuff that eats the disk space on
MONs is osdmaps, and the MONs have to keep old osdmaps back to the
moment in the past when the cluster was 100% healthy. Note that
osdmaps are also copied to all OSDs and eat space there, which is what
you have seen.

The relevant (but dangerous) configuration parameter is
"mon_osd_force_trim_to". Better don't use it, and let your ceph
cluster recover. If you can't wait, try to use upmaps to say that all
PGs are fine where they are now, i.e. that they are not misplaced.
There is a script somewhere on GitHub that does this, but
unfortunately I can't find it right now.
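
The manual version of that idea looks roughly like this; the pg id and osd
numbers below are made up, so treat it purely as an illustration (and note
that upmap needs require-min-compat-client >= luminous):

ceph pg ls remapped                  # compare the UP and ACTING sets
# e.g. if pg 2.1f is acting on [4,11,23] but up says [4,11,57],
# pin it to where the data currently is:
ceph osd pg-upmap-items 2.1f 57 23   # replace osd.57 with osd.23 in the up set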


--
Alexander E. Patrakov
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




[ceph-users] Re: Why is my cephfs almostfull?

2023-04-11 Thread Frank Schilder
Hi Jorge,

firstly, it would be really helpful if you would not truncate output of ceph 
status or omit output of commands you refer to, like ceph df. We have seen way 
too many examples where the clue was in the omitted part.

Without any information, my bets in order are (according to many cases of this 
type on this list):

- the pool does not actually use all OSDs
- you have an imbalance in your cluster and at least one OSD/failure domain is 
85-90% full
- you have a huge amount of small files/objects in the data pool and suffer 
from allocation amplification
- you have a quota on the data pool
- there is an error in the crush map

If you provide a reasonable amount of information, like full output of 'ceph 
status', 'ceph df detail' and 'ceph osd df tree' (please use 
https://pastebin.com/), I'm willing to give it a second try. You may also - 
before replying - investigate a bit on your own to see if there is any 
potentially relevant information *additional* to the full output of these 
commands. Anything else that looks odd.
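
For example, collected roughly like this (the file names are just placeholders):

ceph status             > ceph-status.txt
ceph df detail          > ceph-df-detail.txt
ceph osd df tree        > ceph-osd-df-tree.txt
ceph osd pool ls detail > ceph-pools.txt   # often useful on top of the above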

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Jorge Garcia 
Sent: Thursday, April 6, 2023 1:09 AM
To: ceph-users
Subject: [ceph-users] Why is my cephfs almostfull?

We have a ceph cluster with a cephfs filesystem that we use mostly for
backups. When I do a "ceph -s" or a "ceph df", it reports lots of space:

 data:
   pools:   3 pools, 4104 pgs
   objects: 1.09 G objects, 944 TiB
   usage:   1.5 PiB used, 1.0 PiB / 2.5 PiB avail

   GLOBAL:
     SIZE     AVAIL    RAW USED  %RAW USED
     2.5 PiB  1.0 PiB  1.5 PiB   59.76
   POOLS:
     NAME             ID  USED     %USED  MAX AVAIL  OBJECTS
     cephfs_data      2   944 TiB  87.63  133 TiB    880988429
     cephfs_metadata  3   128 MiB  0      62 TiB     206535313
     .rgw.root        4   0 B      0      62 TiB     0

The whole thing consists of 2 pools: metadata (regular default
replication) and data (erasure k:5 m:2). The global raw space reports
2.5 PiB total, with 1.0 PiB still available. But, when the ceph
filesystem is mounted, it only reports 1.1 PB total, and the filesystem
is almost full:

Filesystem Size  Used Avail Use% Mounted on
x.x.x.x::/1.1P  944T  134T  88% /backups

So, where is the rest of my space? Or what am I missing?

Thanks!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Module 'cephadm' has failed: invalid literal for int() with base 10:

2023-04-11 Thread Duncan M Tooke
Hi,

Our Ceph cluster is in an error state with the message:

# ceph status
  cluster:
id: 58140ed2-4ed4-11ed-b4db-5c6f69756a60
health: HEALTH_ERR
Module 'cephadm' has failed: invalid literal for int() with base 
10: '352.broken'

This happened after trying to re-add an OSD which had failed. Adopting it back 
into Ceph failed because a directory was causing problems in 
/var/lib/ceph/{cephid}/osd.352. To re-add the OSD I renamed that directory to 
osd.352.broken (rather than deleting it), re-ran the command, and then everything 
worked perfectly. Then, 5 minutes later, the ceph orchestrator went into 
"HEALTH_ERR".

I've removed that directory, but "cephadm" isn't cleaning up after itself. Does 
anyone know if there's a way I can clear the cached state for this directory 
that it tried to inventory and failed on?
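
The kind of thing I have in mind, roughly sketched and not yet verified, is
forcing the orchestrator to refresh its inventory or restarting the module:

ceph orch ps --refresh
ceph orch device ls --refresh
# and if the module stays in the failed state:
ceph mgr module disable cephadm
ceph mgr module enable cephadm
ceph mgr fail                 # or fail over the active mgr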

Thanks,

Duncan
--
Dr Duncan Tooke | Research Cluster Administrator
Centre for Computational Biology, Weatherall Institute of Molecular Medicine,
University of Oxford, OX3 9DS
www.imm.ox.ac.uk

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Some hint for a DELL PowerEdge T440/PERC H750 Controller...

2023-04-11 Thread Marco Gaiarin
Mandi! Matthias Ferdinand wrote:

> To check current state:
> sdparm --get=WCE /dev/sdf
> /dev/sdf: SEAGATE   ST2000NM0045  DS03
> WCE 0  [cha: y, def:  0, sav:  0]
> "WCE 0" means: off
> "sav: 0" means: off next time the disk is powered on

Checking current state lead to:

 root@pppve1:~# sdparm --get=WCE /dev/sdd
 /dev/sdd: ATA   HGST HUS726T4TAL  PV07
 WCE   0  [cha: y]

So seems off, right?!

-- 
  Voi non ci crederete
  la mia ragazza sogna  (R. Vecchioni)

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Some hint for a DELL PowerEdge T440/PERC H750 Controller...

2023-04-11 Thread Marco Gaiarin
Mandi! Anthony D'Atri wrote:

> Dell's CLI guide describes setting individual drives in Non-RAID, which 
> *smells* like passthrough, not the more-complex RAID0 workaround we had to do 
> before passthrough.
> https://www.dell.com/support/manuals/en-nz/perc-h750-sas/perc_cli_rg/set-drive-state-commands?guid=guid-d4750845-1f57-434c-b4a9-935876ee1a8e&lang=en-us

Exactly. 'Non-RAID' also smells to us more like passthrough. The first strange
thing came from the fact that the SATA SSD disks are passed through in a way that
seems 'fully transparent' (e.g., Linux even sees the disk S/N), while the SAS disks
are passed more like a RAID0 disk, with a different S/N.


> Spinners are slow, this is news?

;-)

> That said, how slow is slow?  Testing commands and results or it didn't 
> happen.

A test done some months ago:

root@pppve1:~# fio --filename=/dev/sdc --direct=1 --rw=randrw --bs=128k 
--ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based 
--group_reporting --name=hdd-rw-128
hdd-rw-128: (g=0): rw=randrw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 
128KiB-128KiB, ioengine=libaio, iodepth=256
...
fio-3.12
Starting 4 processes
Jobs: 4 (f=4): [m(4)][0.0%][eta 08d:17h:11m:29s]
 
hdd-rw-128: (groupid=0, jobs=4): err= 0: pid=26198: Wed May 18 19:11:04 2022
  read: IOPS=84, BW=10.5MiB/s (11.0MB/s)(1279MiB/121557msec)
slat (usec): min=4, max=303887, avg=23029.19, stdev=61832.29
clat (msec): min=1329, max=6673, avg=4737.71, stdev=415.84
 lat (msec): min=1543, max=6673, avg=4760.74, stdev=420.10
clat percentiles (msec):
 |  1.00th=[ 2802],  5.00th=[ 4329], 10.00th=[ 4463], 20.00th=[ 4530],
 | 30.00th=[ 4597], 40.00th=[ 4665], 50.00th=[ 4732], 60.00th=[ 4799],
 | 70.00th=[ 4866], 80.00th=[ 4933], 90.00th=[ 5134], 95.00th=[ 5336],
 | 99.00th=[ 5805], 99.50th=[ 6007], 99.90th=[ 6342], 99.95th=[ 6409],
 | 99.99th=[ 6611]
   bw (  KiB/s): min=  256, max= 5120, per=25.18%, avg=2713.08, stdev=780.45, 
samples=929
   iops: min=2, max=   40, avg=21.13, stdev= 6.10, samples=929
  write: IOPS=87, BW=10.9MiB/s (11.5MB/s)(1328MiB/121557msec); 0 zone resets
slat (usec): min=9, max=309914, avg=23025.13, stdev=61676.77
clat (msec): min=1444, max=13086, avg=6943.12, stdev=2068.26
 lat (msec): min=1543, max=13086, avg=6966.15, stdev=2069.28
clat percentiles (msec):
 |  1.00th=[ 2769],  5.00th=[ 4597], 10.00th=[ 4799], 20.00th=[ 5067],
 | 30.00th=[ 5403], 40.00th=[ 5873], 50.00th=[ 6409], 60.00th=[ 7148],
 | 70.00th=[ 8020], 80.00th=[ 9060], 90.00th=[10134], 95.00th=[10671],
 | 99.00th=[11610], 99.50th=[11879], 99.90th=[12550], 99.95th=[12550],
 | 99.99th=[12684]
   bw (  KiB/s): min=  256, max= 5376, per=24.68%, avg=2762.20, stdev=841.30, 
samples=926
   iops: min=2, max=   42, avg=21.52, stdev= 6.56, samples=926
  cpu  : usr=0.05%, sys=0.09%, ctx=2847, majf=0, minf=49
  IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.2%, 16=0.3%, 32=0.6%, >=64=98.8%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
 issued rwts: total=10233,10627,0,0 short=0,0,0,0 dropped=0,0,0,0
 latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=10.5MiB/s (11.0MB/s), 10.5MiB/s-10.5MiB/s (11.0MB/s-11.0MB/s), 
io=1279MiB (1341MB), run=121557-121557msec
  WRITE: bw=10.9MiB/s (11.5MB/s), 10.9MiB/s-10.9MiB/s (11.5MB/s-11.5MB/s), 
io=1328MiB (1393MB), run=121557-121557msec

Disk stats (read/write):
  sdc: ios=10282/10601, merge=0/0, ticks=3041312/27373721, in_queue=30373472, 
util=99.99%


> Also, firmware matters.  Run Dell's DSU.

The controller does not have the very latest firmware, but a decently new one;
I've looked at the changelogs and found nothing that seems relevant to the
performance trouble.
Indeed, I'll do an upgrade ASAP.


> Give us details, perccli /c0 show, test results etc.  

root@pppve1:~# perccli /c0 show
Generating detailed summary of the adapter, it may take a while to complete.

CLI Version = 007.1910.. Oct 08, 2021
Operating system = Linux 5.4.203-1-pve
Controller = 0
Status = Success
Description = None

Product Name = PERC H750 Adapter
Serial Number = 23L01Y6
SAS Address =  5f4ee0802ba3a400
PCI Address = 00:b3:00:00
System Time = 04/11/2023 12:11:50
Mfg. Date = 03/25/22
Controller Time = 04/11/2023 10:11:47
FW Package Build = 52.16.1-4405
BIOS Version = 7.16.00.0_0x07100501
FW Version = 5.160.02-3552
Driver Name = megaraid_sas
Driver Version = 07.713.01.00-rc1
Current Personality = RAID-Mode 
Vendor Id = 0x1000
Device Id = 0x10E2
SubVendor Id = 0x1028
SubDevice Id = 0x2176
Host Interface = PCI-E
Device Interface = SAS-12G
Bus Number = 179
Device Number = 0
Function Number = 0
Domain ID = 0
Security Protocol = None
JBOD Drives = 6

JBOD LIST :
=


[ceph-users] Announcing go-ceph v0.21.0

2023-04-11 Thread John Mulligan
We are happy to announce another release of the go-ceph API library.
This is a regular release following our every-two-months release
cadence.


https://github.com/ceph/go-ceph/releases/tag/v0.21.0

Changes include additions to the rbd, cephfs, and cephfs/admin packages.
More details are available at the link above.

The library includes bindings that aim to play a similar role to the
"pybind" python bindings in the ceph tree but for the Go language. The
library also includes additional APIs that can be used to administer
cephfs, rbd, and rgw subsystems.

There are already a few consumers of this library in the wild,
including the ceph-csi project.
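
To pull the release into a Go module, the usual go get should be all that is
needed (building requires cgo plus the Ceph development headers for librados,
librbd and libcephfs):

go get github.com/ceph/go-ceph@v0.21.0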


-- 
John Mulligan

phlogistonj...@asynchrono.us
jmulli...@redhat.com


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: deploying Ceph using FQDN for MON / MDS Services

2023-04-11 Thread Eugen Block
What ceph version is this? Could it be this bug [1]? Although the  
error message is different, not sure if it could be the same issue,  
and I don't have anything to test ipv6 with.


[1] https://tracker.ceph.com/issues/47300

Zitat von Lokendra Rathour :


Hi All,
Requesting any inputs around the issue raised.

Best Regards,
Lokendra

On Tue, 24 Jan, 2023, 7:32 pm Lokendra Rathour, 
wrote:


Hi Team,



We have a ceph cluster with 3 storage nodes:

1. storagenode1 - abcd:abcd:abcd::21

2. storagenode2 - abcd:abcd:abcd::22

3. storagenode3 - abcd:abcd:abcd::23



The requirement is to mount ceph using the domain name of MON node:

Note: we resolved the domain name via DNS server.


For this we are using the command:

```

mount -t ceph [storagenode.storage.com]:6789:/  /backup -o
name=admin,secret=AQCM+8hjqzuZEhAAcuQc+onNKReq7MV+ykFirg==

```



We are getting the following logs in /var/log/messages:

```

Jan 24 17:23:17 localhost kernel: libceph: resolve '
storagenode.storage.com' (ret=-3): failed

Jan 24 17:23:17 localhost kernel: libceph: parse_ips bad ip '
storagenode.storage.com:6789'

```



We also tried mounting ceph storage using IP of MON which is working fine.



Query:


Could you please help us out with how we can mount ceph using FQDN.



My /etc/ceph/ceph.conf is as follows:

[global]

ms bind ipv6 = true

ms bind ipv4 = false

mon initial members = storagenode1,storagenode2,storagenode3

osd pool default crush rule = -1

fsid = 7969b8a3-1df7-4eae-8ccf-2e5794de87fe

mon host =
[v2:[abcd:abcd:abcd::21]:3300,v1:[abcd:abcd:abcd::21]:6789],[v2:[abcd:abcd:abcd::22]:3300,v1:[abcd:abcd:abcd::22]:6789],[v2:[abcd:abcd:abcd::23]:3300,v1:[abcd:abcd:abcd::23]:6789]

public network = abcd:abcd:abcd::/64

cluster network = eff0:eff0:eff0::/64



[osd]

osd memory target = 4294967296



[client.rgw.storagenode1.rgw0]

host = storagenode1

keyring = /var/lib/ceph/radosgw/ceph-rgw.storagenode1.rgw0/keyring

log file = /var/log/ceph/ceph-rgw-storagenode1.rgw0.log

rgw frontends = beast endpoint=[abcd:abcd:abcd::21]:8080

rgw thread pool size = 512

--
~ Lokendra
skype: lokendrarathour




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: deploying Ceph using FQDN for MON / MDS Services

2023-04-11 Thread Lokendra Rathour
Ceph version Quincy.

But now I am able to resolve the issue.

During mount I do not pass any monitor details; they are auto-discovered
via DNS SRV records.
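
For anyone else hitting this: by default the clients look for _ceph-mon._tcp
SRV records in the cluster's DNS domain (the service name is controlled by
mon_dns_srv_name, default "ceph-mon"). A quick way to check the DNS side,
using the example names from my earlier mail:

dig +short SRV _ceph-mon._tcp.storage.com
# expecting one answer per MON, e.g.:
#   10 60 6789 storagenode1.storage.com.
#   10 60 6789 storagenode2.storage.com.
#   10 60 6789 storagenode3.storage.com.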

On Tue, Apr 11, 2023 at 6:09 PM Eugen Block  wrote:

> What ceph version is this? Could it be this bug [1]? Although the
> error message is different, not sure if it could be the same issue,
> and I don't have anything to test ipv6 with.
>
> [1] https://tracker.ceph.com/issues/47300
>
> Zitat von Lokendra Rathour :
>
> > Hi All,
> > Requesting any inputs around the issue raised.
> >
> > Best Regards,
> > Lokendra
> >
> > On Tue, 24 Jan, 2023, 7:32 pm Lokendra Rathour, <
> lokendrarath...@gmail.com>
> > wrote:
> >
> >> Hi Team,
> >>
> >>
> >>
> >> We have a ceph cluster with 3 storage nodes:
> >>
> >> 1. storagenode1 - abcd:abcd:abcd::21
> >>
> >> 2. storagenode2 - abcd:abcd:abcd::22
> >>
> >> 3. storagenode3 - abcd:abcd:abcd::23
> >>
> >>
> >>
> >> The requirement is to mount ceph using the domain name of MON node:
> >>
> >> Note: we resolved the domain name via DNS server.
> >>
> >>
> >> For this we are using the command:
> >>
> >> ```
> >>
> >> mount -t ceph [storagenode.storage.com]:6789:/  /backup -o
> >> name=admin,secret=AQCM+8hjqzuZEhAAcuQc+onNKReq7MV+ykFirg==
> >>
> >> ```
> >>
> >>
> >>
> >> We are getting the following logs in /var/log/messages:
> >>
> >> ```
> >>
> >> Jan 24 17:23:17 localhost kernel: libceph: resolve '
> >> storagenode.storage.com' (ret=-3): failed
> >>
> >> Jan 24 17:23:17 localhost kernel: libceph: parse_ips bad ip '
> >> storagenode.storage.com:6789'
> >>
> >> ```
> >>
> >>
> >>
> >> We also tried mounting ceph storage using IP of MON which is working
> fine.
> >>
> >>
> >>
> >> Query:
> >>
> >>
> >> Could you please help us out with how we can mount ceph using FQDN.
> >>
> >>
> >>
> >> My /etc/ceph/ceph.conf is as follows:
> >>
> >> [global]
> >>
> >> ms bind ipv6 = true
> >>
> >> ms bind ipv4 = false
> >>
> >> mon initial members = storagenode1,storagenode2,storagenode3
> >>
> >> osd pool default crush rule = -1
> >>
> >> fsid = 7969b8a3-1df7-4eae-8ccf-2e5794de87fe
> >>
> >> mon host =
> >>
> [v2:[abcd:abcd:abcd::21]:3300,v1:[abcd:abcd:abcd::21]:6789],[v2:[abcd:abcd:abcd::22]:3300,v1:[abcd:abcd:abcd::22]:6789],[v2:[abcd:abcd:abcd::23]:3300,v1:[abcd:abcd:abcd::23]:6789]
> >>
> >> public network = abcd:abcd:abcd::/64
> >>
> >> cluster network = eff0:eff0:eff0::/64
> >>
> >>
> >>
> >> [osd]
> >>
> >> osd memory target = 4294967296
> >>
> >>
> >>
> >> [client.rgw.storagenode1.rgw0]
> >>
> >> host = storagenode1
> >>
> >> keyring = /var/lib/ceph/radosgw/ceph-rgw.storagenode1.rgw0/keyring
> >>
> >> log file = /var/log/ceph/ceph-rgw-storagenode1.rgw0.log
> >>
> >> rgw frontends = beast endpoint=[abcd:abcd:abcd::21]:8080
> >>
> >> rgw thread pool size = 512
> >>
> >> --
> >> ~ Lokendra
> >> skype: lokendrarathour
> >>
> >>
> >>
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
~ Lokendra
skype: lokendrarathour
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW don't use .rgw.root multisite configuration

2023-04-11 Thread Casey Bodley
there's a rgw_period_root_pool option for the period objects too. but
it shouldn't be necessary to override any of these

On Sun, Apr 9, 2023 at 11:26 PM  wrote:
>
> Up :)
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Object Gateway and lua scripts

2023-04-11 Thread Yuval Lifshitz
It is a simple fix. you can have a look here:
https://github.com/ceph/ceph/pull/50975
will backport to reef so it will be in the next release.

On Tue, Apr 11, 2023 at 2:48 PM Thomas Bennett  wrote:

> Thanks Yuval. From your email I've confirmed that it's not the logging
> that is broken - it's the CopyFrom is causing an issue :)
>
> I've got some other example Lua scripts working now.
>
> Kind regards,
> Thomas
>
>
>
> On Sun, 9 Apr 2023 at 11:41, Yuval Lifshitz  wrote:
>
>> Hi Thomas,
>> I think you found a crash when using the lua "CopyFrom" field.
>> Opened a tracker: https://tracker.ceph.com/issues/59381
>>
>> Will fix ASAP and keep you updated.
>>
>> Yuval
>>
>> On Wed, Apr 5, 2023 at 6:58 PM Thomas Bennett  wrote:
>>
>>> Hi,
>>>
>>> We're currently testing out lua scripting in the Ceph Object Gateway
>>> (Radosgw).
>>>
>>> Ceph version: 17.2.5
>>>
>>> We've tried a simple experiment with the simple lua script which is based
>>> on the documentation (see fixed width text below).
>>>
>>> However, the issue we're having is that we can't find the log messages
>>> anywhere. We've searched the entire journalctl database as well as
>>> raised
>>> the debug level on the radosgw by setting debug_rgw to 20 on the running
>>> daemon.
>>>
>>> Any help welcome :)
>>>
>>> function print_object(msg, object)
>>>   RGWDebugLog("  Title: " .. msg)
>>>   RGWDebugLog("  Name: " .. object.Name)
>>>   RGWDebugLog("  Instance: " .. object.Instance)
>>>   RGWDebugLog("  Id: " .. object.Id)
>>>   RGWDebugLog("  Size: " .. object.Size)
>>>   RGWDebugLog("  MTime: " .. object.MTime)
>>> end
>>>
>>> RGWDebugLog("This is a log message!")
>>>
>>> Request.Log()
>>> if Request.CopyFrom then
>>>   print_object("copy from", Request.CopyFrom.Object)
>>> if Request.CopyFrom.Object then
>>>   print_object("copy from-object" ,Request.CopyFrom.Object)
>>> end
>>> end
>>>
>>> if Request.Object then
>>>   print_object("Object" ,Request.Object)
>>> end
>>>
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Some hint for a DELL PowerEdge T440/PERC H750 Controller...

2023-04-11 Thread Frank Schilder
>   iops: min=2, max=   40, avg=21.13, stdev= 6.10, samples=929
>   iops: min=2, max=   42, avg=21.52, stdev= 6.56, samples=926

That looks horrible. We also have a few SATA HDDs in Dell servers and they do 
about 100-150 IOP/s read or write. Originally, I was also a bit afraid that 
these disks would drag performance down, but they are on par with the NL-SAS 
drives.

For ceph we use the cheapest Dell disk controller one can get (Dell HBA330 Mini 
(Embedded)) and it works perfectly. All ceph-disks are configured non-raid, 
which is equivalent to JBOD mode or pass-through. These controllers have no 
cache options, if your do, disable all of them. Mode should be write-through.

For your disk type I saw "volatile write cache available = yes" on "the 
internet". This looks a bit odd, but maybe these HDDs do have some volatile 
cache. Try to disable it with smartctl and do the benchmark again.
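
Something along these lines should do it; /dev/sdd is just the example device
from the earlier mail, and depending on the controller the drive may need to
be addressed through a passthrough option:

smartctl -g wcache /dev/sdd        # show the current write cache setting
smartctl -s wcache,off /dev/sdd    # disable the volatile write cache
# sdparm can do the same for SCSI/SAS style devices:
sdparm --clear=WCE --save /dev/sdd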

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Some hint for a DELL PowerEdge T440/PERC H750 Controller...

2023-04-11 Thread Mario Giammarco
Hi,
do you want to hear the truth from real experience?
Or the myth?
The truth is that:
- hdd are too slow for ceph, the first time you need to do a rebalance or
similar you will discover...
- if you want to use hdds do a raid with your controller and use the
controller BBU cache (do not consider controllers with hdd cache), and
present the raid as one ceph disk.
- enabling single hdd write cache (that is not battery protected) is far
worse than enabling controller cache (which I assume is always protected by
BBU)
- anyway the best thing for ceph is to use nvme disks.

Mario

On Thu, 6 Apr 2023 at 13:40, Marco Gaiarin <g...@lilliput.linux.it> wrote:

>
> We are testing an experimental Ceph cluster with server and controller at
> subject.
>
> The controller does not have an HBA mode, but only a 'NonRAID' mode, a sort
> of 'auto RAID0' configuration.
>
> We are using SATA SSD disks (MICRON MTFDDAK480TDT) that perform very well,
> and SAS HDD disks (SEAGATE ST8000NM014A) that instead perform very badly
> (particularly, very low IOPS).
>
>
> There's some hint for disk/controller configuration/optimization?
>
>
> Thanks.
>
> --
>   Io credo nella chimica tanto quanto Giulio Cesare credeva nel caso...
>   mi va bene fino a quando non riguarda me :)   (Emanuele Pucciarelli)
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Object Gateway and lua scripts

2023-04-11 Thread Thomas Bennett
Thanks Yuval. From your email I've confirmed that it's not the logging that
is broken - it's the CopyFrom that is causing an issue :)

I've got some other example Lua scripts working now.

Kind regards,
Thomas



On Sun, 9 Apr 2023 at 11:41, Yuval Lifshitz  wrote:

>
> 
> Hi Thomas,
> I think you found a crash when using the lua "CopyFrom" field.
> Opened a tracker: https://tracker.ceph.com/issues/59381
> 
>
> Will fix ASAP and keep you updated.
>
> Yuval
>
> On Wed, Apr 5, 2023 at 6:58 PM Thomas Bennett  wrote:
>
>> Hi,
>>
>> We're currently testing out lua scripting in the Ceph Object Gateway
>> (Radosgw).
>>
>> Ceph version: 17.2.5
>>
>> We've tried a simple experiment with the simple lua script which is based
>> on the documentation (see fixed width text below).
>>
>> However, the issue we're having is that we can't find the log messages
>> anywhere. We've searched the entire journalctl database as well as raised
>> the debug level on the radosgw by setting debug_rgw to 20 on the running
>> daemon.
>>
>> Any help welcome :)
>>
>> function print_object(msg, object)
>>   RGWDebugLog("  Title: " .. msg)
>>   RGWDebugLog("  Name: " .. object.Name)
>>   RGWDebugLog("  Instance: " .. object.Instance)
>>   RGWDebugLog("  Id: " .. object.Id)
>>   RGWDebugLog("  Size: " .. object.Size)
>>   RGWDebugLog("  MTime: " .. object.MTime)
>> end
>>
>> RGWDebugLog("This is a log message!")
>>
>> Request.Log()
>> if Request.CopyFrom then
>>   print_object("copy from", Request.CopyFrom.Object)
>> if Request.CopyFrom.Object then
>>   print_object("copy from-object" ,Request.CopyFrom.Object)
>> end
>> end
>>
>> if Request.Object then
>>   print_object("Object" ,Request.Object)
>> end
>>

[ceph-users] radosgw-admin bucket stats doesn't show real num_objects and size

2023-04-11 Thread viplanghe6
radosgw-admin bucket stats shows 209266 objects in this bucket, but that count 
includes failed multipart uploads, which also makes the size figures wrong. 
When I use boto3 to count objects, the bucket only has 209049 objects.

The only solution I can find is to use a lifecycle policy to clean up these failed 
multiparts, but in production it is the client who decides whether to use lifecycle or not.
So is there any way to exclude the failed multiparts from the bucket statistics?
Does Ceph allow automatic cleanup of failed multiparts to be configured globally?

Thanks!

"usage": {
"rgw.main": {
"size": 593286801276,
"size_actual": 593716080640,
"size_utilized": 593286801276,
"size_kb": 579381642,
"size_kb_actual": 579800860,
"size_kb_utilized": 579381642,
"num_objects": 209266
}
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw-admin bucket stats doesn't show real num_objects and size

2023-04-11 Thread Boris Behrens
I don't think you can exclude that.
We've built a notification in the customer panel that there are incomplete
multipart uploads which will be added as space to the bill. We also added a
button to create a LC policy for these objects.
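
Roughly like this via the S3 API; bucket name, endpoint and the 1-day
threshold are only placeholders:

aws --endpoint-url https://rgw.example.com s3api put-bucket-lifecycle-configuration \
    --bucket mybucket \
    --lifecycle-configuration '{"Rules": [{"ID": "abort-incomplete-mpu",
      "Status": "Enabled", "Filter": {"Prefix": ""},
      "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1}}]}'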

On Tue, 11 Apr 2023 at 19:07, wrote:

> The radosgw-admin bucket stats show there are 209266 objects in this
> bucket, but it included failed multiparts, so that make the size parameter
> is also wrong. When I use boto3 to count objects, the bucket only has
> 209049 objects.
>
> The only solution I can find is to use lifecycle to clean these failed
> multiparts, but in production, the client will decide to use lifecycle or
> not?
> So are there any way to exclude the failed multiparts in bucket statistic?
> Does Ceph allow to set auto clean failed multiparts globally?
>
> Thanks!
>
> "usage": {
> "rgw.main": {
> "size": 593286801276,
> "size_actual": 593716080640,
> "size_utilized": 593286801276,
> "size_kb": 579381642,
> "size_kb_actual": 579800860,
> "size_kb_utilized": 579381642,
> "num_objects": 209266
> }
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Some hint for a DELL PowerEdge T440/PERC H750 Controller...

2023-04-11 Thread Anthony D'Atri



> 
> The truth is that:
> - hdd are too slow for ceph, the first time you need to do a rebalance or
> similar you will discover...

Depends on the needs.  For cold storage, or sequential use-cases that aren't 
performance-sensitive ...  Can't say "too slow" without context.  In Marco's 
case, I wonder how the results might differ with numjobs=1 -- with a value of 4 
as reported, seems to me like the drive will be seeking an awful lot.  Mind you 
many Ceph multi-client workloads exhibit the "IO Blender"  effect where they 
present to the drives as random, but this FIO job may not be entirely 
indicative.
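
i.e. something like the run below against the same device (sdc from the earlier 
mail), to separate drive behaviour from pure queueing effects:

fio --filename=/dev/sdc --direct=1 --rw=randrw --bs=128k --ioengine=libaio \
    --iodepth=32 --runtime=120 --numjobs=1 --time_based --group_reporting \
    --name=hdd-rw-128-singlejob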

If you have to expand just to get more IOPs, that's a different story.

> - if you want to use hdds do a raid with your controller and use the
> controller BBU cache (do not consider controllers with hdd cache), and
> present the raid as one ceph disk.

Take care regarding OSD and PG counts with that strategy.  Plus, Ceph does 
replication, so replication under the OSD layer can be ... gratuitous.  

> - enabling single hdd write cache (that is not battery protected) is far
> worse than enabling controller cache (which I assume is always protected by
> BBU)

There are plenty of RoC HBAs out there without cache RAM or BBU/supercap, and 
also ones with cache RAM but without BBU/supercap.  These often default to 
writethrough caching and arguably don't have much or any net benefit.

> - anyway the best thing for ceph is to use nvme disks.

I wouldn't disagree, but it's not entirely cut and dried.  Notably the cost and 
hassle of an RoC HBA, cache, BBU/supercap, additional monitoring, replacement 
...  See my post a few years back about reasons I don't like RoC HBAs.  Go with 
a plain, non-RoC HBA and the savings can help justify going with SATA SSDs at a 
minimum.

> 
> Mario
> 
> Il giorno gio 6 apr 2023 alle ore 13:40 Marco Gaiarin <
> g...@lilliput.linux.it> ha scritto:
> 
>> 
>> We are testing an experimental Ceph cluster with server and controller at
>> subject.
>> 
>> The controller does not have an HBA mode, but only a 'NonRAID' mode, a sort
>> of 'auto RAID0' configuration.
>> 
>> We are using SATA SSD disks (MICRON MTFDDAK480TDT) that perform very well,
>> and SAS HDD disks (SEAGATE ST8000NM014A) that instead perform very badly
>> (particularly, very low IOPS).
>> 
>> 
>> There's some hint for disk/controller configuration/optimization?
>> 
>> 
>> Thanks.
>> 
>> --
>>  Io credo nella chimica tanto quanto Giulio Cesare credeva nel caso...
>>  mi va bene fino a quando non riguarda me :)   (Emanuele Pucciarelli)
>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] naming the S release

2023-04-11 Thread Josh Durgin
With the Reef dev cycle closing, it's time to think about S and future
releases.

There are a bunch of options for S already, add a +1 or a new option to
this etherpad, and we'll see what has the most votes next week:

  https://pad.ceph.com/p/s

Josh
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph 17.2.6 and iam roles (pr#48030)

2023-04-11 Thread Christopher Durham

Hi,
I see that this PR: https://github.com/ceph/ceph/pull/48030
made it into ceph 17.2.6, as per the change log  at: 
https://docs.ceph.com/en/latest/releases/quincy/  That's great.
But my scenario is as follows:
I have two clusters set up as multisite. Because of  the lack of replication 
for IAM roles, we have set things up so that roles on the primary 'manually' 
get replicated to the secondary site via a python script. Thus, if I create a 
role on the primary, add/delete users or buckets from said role, the role, 
including the AssumeRolePolicyDocument and policies, gets pushed to the 
replicated site. This has served us well for three years.
With the advent of this fix, what should I do before I upgrade to 17.2.6 
(currently on 17.2.5, Rocky 8)?

I know that in my situation, roles of the same name have different RoleIDs on 
the two sites. What should I do before I upgrade? Possibilities that *could* 
happen if I don't rectify things as we upgrade:
1. The different RoleIDs lead to two roles of the same name on the replicated 
site, perhaps with the system unable to address/look at/modify either
2. Roles just don't get repiicated to the second site

or other similar situations, all of which I want to avoid.
Perhaps the safest thing to do is to remove all roles on the secondary site, 
upgrade, and then force a replication of roles (How would I *force* that for 
IAM roles if it is the correct answer?)
Here is the original bug report: 

https://tracker.ceph.com/issues/57364
Thanks!
-Chris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph 17.2.6 and iam roles (pr#48030)

2023-04-11 Thread Casey Bodley
On Tue, Apr 11, 2023 at 3:19 PM Christopher Durham  wrote:
>
>
> Hi,
> I see that this PR: https://github.com/ceph/ceph/pull/48030
> made it into ceph 17.2.6, as per the change log  at: 
> https://docs.ceph.com/en/latest/releases/quincy/  That's great.
> But my scenario is as follows:
> I have two clusters set up as multisite. Because of  the lack of replication 
> for IAM roles, we have set things up so that roles on the primary 'manually' 
> get replicated to the secondary site via a python script. Thus, if I create a 
> role on the primary, add/delete users or buckets from said role, the role, 
> including the AssumeRolePolicyDocument and policies, gets pushed to the 
> replicated site. This has served us well for three years.
> With the advent of this fix, what should I do before I upgrade to 17.2.6 
> (currently on 17.2.5, rocky 8)
>
> I know that in my situation, roles of the same name have different RoleIDs on 
> the two sites. What should I do before I upgrade? Possibilities that *could* 
> happen if i dont rectify things as we upgrade:
> 1. The different RoleIDs lead to two roles of the same name on the replicated 
> site, perhaps with the system unable to address/look at/modify either
> 2. Roles just don't get repiicated to the second site

no replication would happen until the metadata changes again on the
primary zone. once that gets triggered, the role metadata would
probably fail to sync due to the name conflicts

>
> or other similar situations, all of which I want to avoid.
> Perhaps the safest thing to do is to remove all roles on the secondary site, 
> upgrade, and then force a replication of roles (How would I *force* that for 
> iAM roles if it is the correct answer?)

this removal will probably be necessary to avoid those conflicts. once
that's done, you can force a metadata full sync on the secondary zone
by running 'radosgw-admin metadata sync init' there, then restarting
its gateways. this will have to resync all of the bucket and user
metadata as well
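
roughly, on the secondary zone (adapt to taste, the role name is a placeholder):

radosgw-admin role list
radosgw-admin role delete --role-name=myrole   # repeat for each conflicting role
radosgw-admin metadata sync init
# restart the secondary zone's radosgw daemons, then watch progress with:
radosgw-admin metadata sync status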

> Here is the original bug report:
>
> https://tracker.ceph.com/issues/57364
> Thanks!
> -Chris
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph 17.2.6 and iam roles (pr#48030)

2023-04-11 Thread Casey Bodley
On Tue, Apr 11, 2023 at 3:53 PM Casey Bodley  wrote:
>
> On Tue, Apr 11, 2023 at 3:19 PM Christopher Durham  wrote:
> >
> >
> > Hi,
> > I see that this PR: https://github.com/ceph/ceph/pull/48030
> > made it into ceph 17.2.6, as per the change log  at: 
> > https://docs.ceph.com/en/latest/releases/quincy/  That's great.
> > But my scenario is as follows:
> > I have two clusters set up as multisite. Because of  the lack of 
> > replication for IAM roles, we have set things up so that roles on the 
> > primary 'manually' get replicated to the secondary site via a python 
> > script. Thus, if I create a role on the primary, add/delete users or 
> > buckets from said role, the role, including the AssumeRolePolicyDocument 
> > and policies, gets pushed to the replicated site. This has served us well 
> > for three years.
> > With the advent of this fix, what should I do before I upgrade to 17.2.6 
> > (currently on 17.2.5, rocky 8)
> >
> > I know that in my situation, roles of the same name have different RoleIDs 
> > on the two sites. What should I do before I upgrade? Possibilities that 
> > *could* happen if i dont rectify things as we upgrade:
> > 1. The different RoleIDs lead to two roles of the same name on the 
> > replicated site, perhaps with the system unable to address/look at/modify 
> > either
> > 2. Roles just don't get repiicated to the second site
>
> no replication would happen until the metadata changes again on the
> primary zone. once that gets triggered, the role metadata would
> probably fail to sync due to the name conflicts
>
> >
> > or other similar situations, all of which I want to avoid.
> > Perhaps the safest thing to do is to remove all roles on the secondary 
> > site, upgrade, and then force a replication of roles (How would I *force* 
> > that for iAM roles if it is the correct answer?)
>
> this removal will probably be necessary to avoid those conflicts. once
> that's done, you can force a metadata full sync on the secondary zone
> by running 'radosgw-admin metadata sync init' there, then restarting
> its gateways. this will have to resync all of the bucket and user
> metadata as well

p.s. don't use the DeleteRole rest api on the secondary zone after
upgrading, as the request would get forwarded to the primary zone and
delete it there too. you can use 'radosgw-admin role delete' on the
secondary instead

>
> > Here is the original bug report:
> >
> > https://tracker.ceph.com/issues/57364
> > Thanks!
> > -Chris
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Pacific dashboard: unable to get RGW information

2023-04-11 Thread Michel Jouvin

Hi,

Our cluster is running Pacific 16.2.10. We have a problem using the 
dashboard to display information about RGWs configured in the cluster. 
When clicking on "Object Gateway", we get an error 500. Looking in the 
mgr logs, I found that the problem is that the RGW is accessed by its IP 
address rather than its name. As the RGW has SSL enabled, the 
certificate cannot be matched against the IP address.


I dug into the configuration but was not able to identify where an 
IP address rather than a name was used (I checked the zonegroup parameters 
in particular, and names are used to define the endpoints). Did I do 
something wrong in the configuration, or is it a known issue when using 
SSL-enabled RGW?


Best regards,

Michel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Pacific dashboard: unable to get RGW information

2023-04-11 Thread Wyll Ingersoll
I have a similar issue with how the dashboard tries to access an SSL protected 
RGW service.  It doesn't use the correct name and doesn't allow for any way to 
override the RGW name that the dashboard uses.

https://tracker.ceph.com/issues/59111
Bug #59111: dashboard should use rgw_dns_name when talking to rgw api - Dashboard - Ceph




From: Michel Jouvin 
Sent: Tuesday, April 11, 2023 4:19 PM
To: Ceph Users 
Subject: [ceph-users] Pacific dashboard: unable to get RGW information

Hi,

Our cluster is running Pacific 16.2.10. We have a problem using the
dashboard to display information about RGWs configured in the cluster.
When clicking on "Object Gateway", we get an error 500. Looking in the
mgr logs, I found that the problem is that the RGW is accessed by its IP
address rather than its name. As the RGW has SSL enabled, the
certificate cannot be matched against the IP address.

I digged into the configuration but I was not able to identify where an
IP address rather than a name was used (I checked in particular the
zonegroup parameters and names are used to define endpoints). Did I make
something wrong in the configuration or is it a know issue when using
SSL-enabled RGW?

Best regards,

Michel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Pacific dashboard: unable to get RGW information

2023-04-11 Thread Eugen Block

Hi,

version 16.2.11 (which was just recently released) contains a fix for  
that. But it still doesn’t work with wildcard certificates, that’s  
still an issue for us.
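
One dashboard-side knob that at least works around the certificate check (at
the cost of disabling verification for the dashboard's RGW API calls, and it
does nothing about the name/IP selection itself):

ceph dashboard set-rgw-api-ssl-verify False
ceph dashboard get-rgw-api-ssl-verify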


Zitat von Michel Jouvin :


Hi,

Our cluster is running Pacific 16.2.10. We have a problem using the  
dashboard to display information about RGWs configured in the  
cluster. When clicking on "Object Gateway", we get an error 500.  
Looking in the mgr logs, I found that the problem is that the RGW is  
accessed by its IP address rather than its name. As the RGW has SSL  
enabled, the certificate cannot be matched against the IP address.


I digged into the configuration but I was not able to identify where  
an IP address rather than a name was used (I checked in particular  
the zonegroup parameters and names are used to define endpoints).  
Did I make something wrong in the configuration or is it a know  
issue when using SSL-enabled RGW?


Best regards,

Michel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Pacific dashboard: unable to get RGW information

2023-04-11 Thread Michel Jouvin
Thanks for these answers; I was not able to find information mentioning the 
problem, hence my email. I didn't try 16.2.11 because of the bug in volume 
activation mentioned by others when using cephadm.


Michel
Sent from my mobile
On 11 April 2023 at 22:28:37, Eugen Block wrote:


Hi,

version 16.2.11 (which was just recently released) contains a fix for
that. But it still doesn’t work with wildcard certificates, that’s
still an issue for us.

Zitat von Michel Jouvin :


Hi,

Our cluster is running Pacific 16.2.10. We have a problem using the
dashboard to display information about RGWs configured in the
cluster. When clicking on "Object Gateway", we get an error 500.
Looking in the mgr logs, I found that the problem is that the RGW is
accessed by its IP address rather than its name. As the RGW has SSL
enabled, the certificate cannot be matched against the IP address.

I digged into the configuration but I was not able to identify where
an IP address rather than a name was used (I checked in particular
the zonegroup parameters and names are used to define endpoints).
Did I make something wrong in the configuration or is it a know
issue when using SSL-enabled RGW?

Best regards,

Michel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Pacific dashboard: unable to get RGW information

2023-04-11 Thread Gilles Mocellin
Hi,

My problem is the opposite!
I don't use SSL on the RGWs, because I use a load balancer with an HTTPS endpoint,
so there is no problem with certificates and IP addresses.
With 16.2.11, it does not work anymore because it uses DNS names, and those 
names resolve to a management IP, which is not the network where I 
expose the RGWs...

So for everyone to have a working dashboard, we need to be able to override 
that configuration and set whatever we want as RGW endpoints.

PS: I still don't use cephadm, but ceph-ansible. Perhaps things are different 
with containers.

On Tuesday, 11 April 2023 at 22:26:28 CEST, Eugen Block wrote:
> Hi,
> 
> version 16.2.11 (which was just recently released) contains a fix for  
> that. But it still doesn’t work with wildcard certificates, that’s  
> still an issue for us.
> 
> Zitat von Michel Jouvin :
> 
> 
> > Hi,
> >
> >
> >
> > Our cluster is running Pacific 16.2.10. We have a problem using the  
> > dashboard to display information about RGWs configured in the  
> > cluster. When clicking on "Object Gateway", we get an error 500.  
> > Looking in the mgr logs, I found that the problem is that the RGW is  
> > accessed by its IP address rather than its name. As the RGW has SSL  
> > enabled, the certificate cannot be matched against the IP address.
> >
> >
> >
> > I digged into the configuration but I was not able to identify where  
> > an IP address rather than a name was used (I checked in particular  
> > the zonegroup parameters and names are used to define endpoints).  
> > Did I make something wrong in the configuration or is it a know  
> > issue when using SSL-enabled RGW?
> >
> >
> >
> > Best regards,
> >
> >
> >
> > Michel
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Pacific dashboard: unable to get RGW information

2023-04-11 Thread Eugen Block
Right, I almost forgot that one, I stumbled upon the performance  
regression as well. :-/


Zitat von Michel Jouvin :

Thanks for these answers, I was not able to find information  
mentioning the problem, thus my email. I didn't try 16.2.11 because  
of the big mentioned by others in volume activation when using  
cephadm.


Michel
Sent from my mobile
On 11 April 2023 at 22:28:37, Eugen Block wrote:


Hi,

version 16.2.11 (which was just recently released) contains a fix for
that. But it still doesn’t work with wildcard certificates, that’s
still an issue for us.

Zitat von Michel Jouvin :


Hi,

Our cluster is running Pacific 16.2.10. We have a problem using the
dashboard to display information about RGWs configured in the
cluster. When clicking on "Object Gateway", we get an error 500.
Looking in the mgr logs, I found that the problem is that the RGW is
accessed by its IP address rather than its name. As the RGW has SSL
enabled, the certificate cannot be matched against the IP address.

I digged into the configuration but I was not able to identify where
an IP address rather than a name was used (I checked in particular
the zonegroup parameters and names are used to define endpoints).
Did I make something wrong in the configuration or is it a know
issue when using SSL-enabled RGW?

Best regards,

Michel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Pacific dashboard: unable to get RGW information

2023-04-11 Thread Gilles Mocellin
I forgot, there's a similar bug around that:
https://tracker.ceph.com/issues/58811

On Tuesday, 11 April 2023 at 22:45:28 CEST, Gilles Mocellin wrote:
> Hi,
> 
> My problem is the opposite !
> I don't use SSL on RGWs, because I use a load balancer with HTTPS endpoint.
> so no problem with certificates and IP adresses.
> With 16.2.11, it does not work anymore because it uses DNS names, and those
> names are resolving to a management IP, which is not the network where I
> expose the RGWs...
> 
> So for everyone to have a working dashboard, we need to be able to override
> that configuration, and set what ever we want as RGW endpoints.
> 
> PS: I still don't use cephadm, but ceph-ansible. Perhaps things are
> different with containers.
> 
On Tuesday, 11 April 2023 at 22:26:28 CEST, Eugen Block wrote:
> 
> > Hi,
> > 
> > version 16.2.11 (which was just recently released) contains a fix for  
> > that. But it still doesn’t work with wildcard certificates, that’s  
> > still an issue for us.
> > 
> > Zitat von Michel Jouvin :
> > 
> > 
> > 
> > > Hi,
> > >
> > >
> > >
> > >
> > >
> > > Our cluster is running Pacific 16.2.10. We have a problem using the  
> > > dashboard to display information about RGWs configured in the  
> > > cluster. When clicking on "Object Gateway", we get an error 500.  
> > > Looking in the mgr logs, I found that the problem is that the RGW is  
> > > accessed by its IP address rather than its name. As the RGW has SSL  
> > > enabled, the certificate cannot be matched against the IP address.
> > >
> > >
> > >
> > >
> > >
> > > I digged into the configuration but I was not able to identify where  
> > > an IP address rather than a name was used (I checked in particular  
> > > the zonegroup parameters and names are used to define endpoints).  
> > > Did I make something wrong in the configuration or is it a know  
> > > issue when using SSL-enabled RGW?
> > >
> > >
> > >
> > >
> > >
> > > Best regards,
> > >
> > >
> > >
> > >
> > >
> > > Michel
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > 
> > 
> > 
> > 
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> 
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-11 Thread Xiubo Li


On 4/11/23 15:59, Thomas Widhalm wrote:



On 11.04.23 09:16, Xiubo Li wrote:


On 4/11/23 03:24, Thomas Widhalm wrote:

Hi,

If you remember, I hit bug https://tracker.ceph.com/issues/58489 so 
I was very relieved when 17.2.6 was released and started to update 
immediately.



Please note, this fix is not in the v17.2.6 yet in upstream code.



Thanks for the information. I misread the information in the tracker. 
Do you have a predicted schedule for the backport? Or should I go for 
a specific pre-release? I don't want to take chances but I'm desperate 
because my production system is affected and offline for several weeks 
now.


The backport is already queued for review and testing, but I am not very 
sure when it will get merged. I am also not sure it can 100% resolve 
your issue, in case you have other corruptions in your production system.


Thanks


Thanks,
Thomas


Thanks

- Xiubo


But now I'm stuck again with my broken MDS. MDS won't get into 
up:active without the update but the update waits for them to get 
into up:active state. Seems like a deadlock / chicken-egg problem to 
me.


Since I'm still relatively new to Ceph, could you help me?

What I see when watching the update status:

{
    "target_image": 
"quay.io/ceph/ceph@sha256:1161e35e4e02cf377c93b913ce78773f8413f5a8d7c5eaee4b4773a4f9dd6635",

    "in_progress": true,
    "which": "Upgrading all daemon types on all hosts",
    "services_complete": [
    "crash",
    "mgr",
"mon",
"osd"
    ],
    "progress": "18/40 daemons upgraded",
    "message": "Error: UPGRADE_OFFLINE_HOST: Upgrade: Failed to 
connect to host ceph01 at addr (192.168.23.61)",

    "is_paused": false
}

(The offline host was one host that broke during the upgrade. I 
fixed that in the meantime and the update went on.)


And in the log:

2023-04-10T19:23:48.750129+ mgr.ceph04.qaexpv [INF] Upgrade: 
Waiting for mds.mds01.ceph04.hcmvae to be up:active (currently 
up:replay)
2023-04-10T19:23:58.758141+ mgr.ceph04.qaexpv [WRN] Upgrade: No 
mds is up; continuing upgrade procedure to poke things in the right 
direction



Please give me a hint what I can do.

Cheers,
Thomas

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw-admin bucket stats doesn't show real num_objects and size

2023-04-11 Thread huyv nguyễn
Yeah, thanks for your suggestion

On Wed, 12 Apr 2023 at 00:10, Boris Behrens wrote:

> I don't think you can exclude that.
> We've build a notification in the customer panel that there are incomplete
> multipart uploads which will be added as space to the bill. We also added a
> button to create a LC policy for these objects.
>
> Am Di., 11. Apr. 2023 um 19:07 Uhr schrieb :
>
> > The radosgw-admin bucket stats show there are 209266 objects in this
> > bucket, but it included failed multiparts, so that make the size
> parameter
> > is also wrong. When I use boto3 to count objects, the bucket only has
> > 209049 objects.
> >
> > The only solution I can find is to use lifecycle to clean these failed
> > multiparts, but in production, the client will decide to use lifecycle or
> > not?
> > So are there any way to exclude the failed multiparts in bucket
> statistic?
> > Does Ceph allow to set auto clean failed multiparts globally?
> >
> > Thanks!
> >
> > "usage": {
> > "rgw.main": {
> > "size": 593286801276,
> > "size_actual": 593716080640,
> > "size_utilized": 593286801276,
> > "size_kb": 579381642,
> > "size_kb_actual": 579800860,
> > "size_kb_utilized": 579381642,
> > "num_objects": 209266
> > }
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io