[ceph-users] S3 Buckets with "object-lock"

2020-10-01 Thread Torsten Ennenbach
Hello,

we are using Ceph 14.x for our S3 storage, and some of our customers want to
create a bucket with object lock enabled.
BUT:
While the creation of a locked bucket works, the objects are still deletable.
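As far as I understand the S3 API, enabling object lock on the bucket alone is
not enough; a retention rule (or a legal hold) also has to be applied before
deletes are refused. A minimal sketch with the aws CLI (endpoint and bucket
name are placeholders):

aws --endpoint-url http://rgw.example.com s3api create-bucket \
    --bucket locked-bucket --object-lock-enabled-for-bucket
aws --endpoint-url http://rgw.example.com s3api put-object-lock-configuration \
    --bucket locked-bucket --object-lock-configuration \
    '{"ObjectLockEnabled": "Enabled", "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 30}}}'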

Any ideas or hints?

Best regards:
Torsten




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] bugs ceph-volume scripting

2020-10-01 Thread Marc Roos


I have been creating lvm osd's with:
ceph-volume lvm zap --destroy /dev/sdf && ceph-volume lvm create --data 
/dev/sdf --dmcrypt

Because this procedure failed:
ceph-volume lvm zap --destroy /dev/sdf  
(waiting on slow human typing)
ceph-volume lvm create --data /dev/sdf --dmcrypt

However, when I was looking at /var/lib/ceph/osd I expected these lvm 
mounts to be listed as world writable[1].

So I decided to compare:
[@osd]# cp -a ceph-33 ceph-33.bak2

[@osd]# service ceph-osd@33 stop

[@osd]# service ceph-volume@lvm-33-9a7a9a7c-8fc8-441c-8380-acf7f8b1a670 
start

[BUG1]
With the zap && create, I seem to be running the osd without the tmpfs 
mounted?! That means that if I reboot this node and the content differs from 
the tmpfs, I have a serious problem.

So I am trying to unmount ceph-33
[BUG2]
Wtf, ceph-osd@33 is running! Something (ceph-volume?) started the osd 
without me even having a chance to inspect the difference between these 
folders. If ceph-volume starts the osd 'by design', then the design is bad; 
nobody expects this behaviour, and I have no idea what can go wrong if the 
startup data in the tmpfs ceph-33 mount is different from the lvm create 
files on the os disk.

Stopping osd.33 again
[@osd]# service ceph-osd@33 stop

Trying again to unmount ceph-33
[@osd]# service ceph-volume@lvm-33-9a7a9a7c-8fc8-441c-8380-acf7f8b1a670 
stop

[BUG3]
service ceph-volume just does not unmount tmpfs. I have to unmount with
umount /var/lib/ceph/osd/ceph-33


Inspecting the differences between the two:
[@osd]# ls -l ceph-33.bak2 ceph-33.new
ceph-33.new:
total 28
lrwxrwxrwx 1 ceph ceph  50 Oct  1 10:06 block -> 
/dev/mapper/1K8AX3-D3Gv-VKdY-0wTW-qjgd-txAu-JbNJHo
-rw--- 1 ceph ceph  37 Oct  1 10:06 ceph_fsid
-rw--- 1 ceph ceph  37 Oct  1 10:06 fsid
-rw--- 1 ceph ceph  56 Oct  1 10:06 keyring
-rw--- 1 ceph ceph 106 Oct  1 10:06 lockbox.keyring
-rw--- 1 ceph ceph   6 Oct  1 10:06 ready
-rw--- 1 ceph ceph  10 Oct  1 10:06 type
-rw--- 1 ceph ceph   3 Oct  1 10:06 whoami

ceph-33.bak2:
total 56
-rw-r- 1 ceph ceph 373 Sep 30 21:23 activate.monmap
lrwxrwxrwx 1 ceph ceph  50 Sep 30 21:23 block -> 
/dev/mapper/1K8AX3-D3Gv-VKdY-0wTW-qjgd-txAu-JbNJHo
-rw--- 1 ceph ceph   2 Sep 30 21:23 bluefs
-rw--- 1 ceph ceph  37 Sep 30 21:23 ceph_fsid
-rw-r- 1 ceph ceph  37 Sep 30 21:23 fsid
-rw--- 1 ceph ceph  56 Sep 30 21:23 keyring
-rw--- 1 ceph ceph   8 Sep 30 21:23 kv_backend
-rw--- 1 ceph ceph 106 Sep 30 21:23 lockbox.keyring
-rw--- 1 ceph ceph  21 Sep 30 21:23 magic
-rw--- 1 ceph ceph   4 Sep 30 21:23 mkfs_done
-rw--- 1 ceph ceph  41 Sep 30 21:23 osd_key
-rw--- 1 ceph ceph   6 Sep 30 21:23 ready
-rw--- 1 ceph ceph   3 Sep 30 21:23 require_osd_release
-rw--- 1 ceph ceph  10 Sep 30 21:23 type
-rw--- 1 ceph ceph   3 Sep 30 21:23 whoami

The contents of the files in new (tmpfs) are luckily the same as in bak2 
(from ceph-volume create?). However, as you can see, I am missing quite a few 
files in the tmpfs.

So I am giving it a try and starting osd.33 with the tmpfs mounted.

[@osd]# service ceph-volume@lvm-33-9a7a9a7c-8fc8-441c-8380-acf7f8b1a670 
start^C
[@osd]# ps -ef | grep ceph-osd | grep 33
[@osd]# service ceph-volume@lvm-33-9a7a9a7c-8fc8-441c-8380-acf7f8b1a670 
start
Redirecting to /bin/systemctl start 
ceph-volume@lvm-33-9a7a9a7c-8fc8-441c-8380-acf7f8b1a670.service

And indeed, again ceph-osd is started.
[@osd]# ps -ef | grep ceph-osd | grep 33
ceph 1651105   1 48 11:29 ?00:00:00 /usr/bin/ceph-osd -f 
--cluster ceph --id 33 --setuser ceph --setgroup ceph


[QUESTION1]
Should I just copy the files like kv_backend and mkfs_done to the tmpfs 
mount? I seem to have these files on other ceph-volume-created osd's.
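Or should I rather regenerate the directory from the bluestore label, the way 
ceph-volume's own activate does it? Something like this (a sketch; the device 
path is taken from the block symlink in the listing above):

ceph-bluestore-tool --cluster=ceph prime-osd-dir \
    --dev /dev/mapper/1K8AX3-D3Gv-VKdY-0wTW-qjgd-txAu-JbNJHo \
    --path /var/lib/ceph/osd/ceph-33 --no-mon-config
chown -R ceph:ceph /var/lib/ceph/osd/ceph-33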

[QUESTION2]
Is there a reasonable explanation for running into such issues, before I 
start thinking this is shitty scripting?


[1]
https://tracker.ceph.com/issues/47549

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: hdd pg's migrating when converting ssd class osd's

2020-10-01 Thread Frank Schilder
Dear Mark and Nico,

I think this might be the time to file a tracker report. As far as I can see, 
your set-up is as it should be, OSD operations on your clusters should behave 
exactly as on ours. I don't know of any other configuration option that 
influences placement calculation.

The problems you (Nico in particular) describe seem serious enough. I have also 
heard other reports of admin operations killing a cluster starting with 
Nautilus, most notably this one: 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/W4M5XQRDBLXFGJGDYZALG6TQ4QBVGGAJ/#4KY3OW7PTOODLQVYKARZLGE5FZUNQOER
Maybe there are regressions in the crush placement computations (and 
elsewhere)? I will add this to the list of tests before considering an upgrade 
from mimic.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Marc Roos 
Sent: 30 September 2020 22:26:11
To: eblock; Frank Schilder
Cc: ceph-users; nico.schottelius
Subject: RE: [ceph-users] Re: hdd pg's migrating when converting ssd class osd's

I am not sure, but it looks like this remapping on the hdd's is not being
done when adding back the same ssd osd.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cephfs tag not working

2020-10-01 Thread Andrej Filipcic


Hi,

on octopus 15.2.4 I have an issue with cephfs tag auth. The following 
works fine:


client.f9desktop
    key: 
    caps: [mds] allow rw
    caps: [mon] allow r
    caps: [osd] allow rw  pool=cephfs_data, allow rw pool=ssd_data, 
allow rw pool=fast_data,  allow rw pool=arich_data, allow rw 
pool=ecfast_data


but this one does not.

client.f9desktopnew
    key: 
    caps: [mds] allow rw
    caps: [mon] allow r
    caps: [osd] allow rw tag cephfs data=cephfs

For the 2nd, the mds part works (files can be created or removed), but client 
read/write (native client, kernel version 5.7.4) fails with an I/O error, so 
the osd part does not seem to be working properly.
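For reference, the tag-based caps above can be set with something like this 
(client and fs names as above):

ceph auth caps client.f9desktopnew \
    mds 'allow rw' \
    mon 'allow r' \
    osd 'allow rw tag cephfs data=cephfs'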


Any clues as to what could be wrong? The cephfs was created in jewel...

Another issue: if osd caps are updated (adding a data pool), then some 
clients refresh the caps, but most of them do not, and the only way to 
refresh them is to remount the filesystem. A working tag would solve this.


Best regards,
Andrej

--
_
   prof. dr. Andrej Filipcic,   E-mail: andrej.filip...@ijs.si
   Department of Experimental High Energy Physics - F9
   Jozef Stefan Institute, Jamova 39, P.o.Box 3000
   SI-1001 Ljubljana, Slovenia
   Tel.: +386-1-477-3674Fax: +386-1-477-3166
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] rgw index shard much larger than others

2020-10-01 Thread Dan van der Ster
Dear friends,

Running 14.2.11, we have one particularly large bucket with a very
strange distribution of objects among the shards. The bucket has 512
shards, and most shards have ~75k entries, but shard 0 has 1.75M
entries:

# rados -p default.rgw.buckets.index listomapkeys
.dir.61c59385-085d-4caa-9070-63a3868dccb6.272652427.1.0 | wc -l
1752085

# rados -p default.rgw.buckets.index listomapkeys
.dir.61c59385-085d-4caa-9070-63a3868dccb6.272652427.1.1 | wc -l
78388

# rados -p default.rgw.buckets.index listomapkeys
.dir.61c59385-085d-4caa-9070-63a3868dccb6.272652427.1.2 | wc -l
78764

We had resharded this bucket (manually) from 32 up to 512 shards just
before upgrading from 12.2.12 to 14.2.11 a couple weeks ago.
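For reference, the manual reshard was done with something like the following 
(bucket name is a placeholder here):

# radosgw-admin bucket reshard --bucket=<bucket-name> --num-shards=512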

Any idea why shard .0 is getting such an imbalance of entries?
Should we manually reshard this bucket again?

Thanks!

Dan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] CEPH iSCSI issue - ESXi command timeout

2020-10-01 Thread Golasowski Martin
Dear All,

a week ago we had to reboot our ESXi nodes since our Ceph cluster suddenly 
stopped serving all I/O. We have identified a VM (vCenter appliance) which was 
swapping heavily and causing heavy load. However, since then we have been 
experiencing strange issues, as if the cluster cannot handle any spike in I/O 
load such as a migration or a VM reboot.

The main problem is that the iSCSI commands issued by ESXi sometimes time out 
and ESXi reports an inaccessible datastore. This disrupts I/O heavily; we had 
to reboot the VMware cluster entirely several times. It started suddenly after 
approx. 10 months of operation without problems.

I can see a steadily increasing number of dropped Rx packets on the iSCSI 
network interfaces in the OSDs.

Our Ceph setup is as follows: 4 OSD nodes, each with 3x 10TB 7.2k rpm HDDs. The 
OSD nodes are connected by 25 Gbps Ethernet to the other nodes. For the RBD 
pools I have 64 PGs. The OSD nodes have 32 GB RAM; free memory is around 1 GB 
on each, though I have seen even lower. The OS is CentOS 7, the Ceph release is 
Nautilus 14.2.11, deployed by ceph-ansible. MONs are virtualized in the ESXi 
nodes on the local SSD drives.

The iSCSI NICs are on a separate VLAN; other traffic is served via a 
balance-xor bond (LACP is unusable due to a VMware limitation when using the 
SW iSCSI HBA) on a different VLAN. Our network is Mellanox based - SN2100 
switches and ConnectX-5 NICs.

The iSCSI target serves 2 LUNs in an RBD pool which is erasure coded. Yesterday 
I increased the number of PGs for that pool from 64 to 128, without much effect 
after the cluster finished rebalancing.

In the OSD servers' kernel log we see the following:

[299560.618893] iSCSI Login negotiation failed.
[303088.450088] Did not receive response to NOPIN on CID: 0, failing connection 
for I_T Nexus 
iqn.1994-05.com.redhat:esxi1,i,0x00023d02,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,t,0x01
[324926.694077] Did not receive response to NOPIN on CID: 0, failing connection 
for I_T Nexus 
iqn.1994-05.com.redhat:esxi2,i,0x00023d01,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,t,0x01
[407067.404538] ABORT_TASK: Found referenced iSCSI task_tag: 5891
[407076.077175] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 5891
[411677.887690] ABORT_TASK: Found referenced iSCSI task_tag: 6722
[411683.297425] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 6722


The error in ESXi looks like this:

naa.60014053b46fc760ff0470dbd7980263" on path "vmhba64:C1:T0:L0" Failed:
2020-10-01T05:38:51.291Z cpu49:2144076)NMP: nmp_ThrottleLogForDevice:3856: Cmd 
0x89 (0x459a5b1b9480, 2097241) to dev "naa.6001405a527d78935724451aa5f53513" on 
path "vmhba64:C2:T0:L1" Failed:
2020-10-01T05:38:57.098Z cpu44:2099346)NMP: nmp_ThrottleLogForDevice:3856: Cmd 
0x8a (0x45ba96710ec0, 2107403) to dev "naa.60014053b46fc760ff0470dbd7980263" on 
path "vmhba64:C1:T0:L0" Failed:
2020-10-01T05:38:57.122Z cpu71:2098965)NMP: nmp_ThrottleLogForDevice:3856: Cmd 
0x89 (0x45ba9676aec0, 2146212) to dev "naa.60014053b46fc760ff0470dbd7980263" on 
path "vmhba64:C1:T0:L0" Failed:
2020-10-01T05:38:57.256Z cpu65:2098959)NMP: nmp_ThrottleLogForDevice:3856: Cmd 
0x89 (0x459a4179d8c0, 2146269) to dev "naa.6001405a527d78935724451aa5f53513" on 
path "vmhba64:C2:T0:L1" Failed:

We would appreciate any help you can give us.

Thank you very much.

Regards,
Martin Golasowski




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rgw index shard much larger than others

2020-10-01 Thread Matt Benjamin
Hi Dan,

Possibly you're reproducing https://tracker.ceph.com/issues/46456.

That explains how the underlying issue worked; I don't remember how a
bucket exhibiting this is repaired.

Eric?

Matt


On Thu, Oct 1, 2020 at 8:41 AM Dan van der Ster  wrote:
>
> Dear friends,
>
> Running 14.2.11, we have one particularly large bucket with a very
> strange distribution of objects among the shards. The bucket has 512
> shards, and most shards have ~75k entries, but shard 0 has 1.75M
> entries:
>
> # rados -p default.rgw.buckets.index listomapkeys
> .dir.61c59385-085d-4caa-9070-63a3868dccb6.272652427.1.0 | wc -l
> 1752085
>
> # rados -p default.rgw.buckets.index listomapkeys
> .dir.61c59385-085d-4caa-9070-63a3868dccb6.272652427.1.1 | wc -l
> 78388
>
> # rados -p default.rgw.buckets.index listomapkeys
> .dir.61c59385-085d-4caa-9070-63a3868dccb6.272652427.1.2 | wc -l
> 78764
>
> We had resharded this bucket (manually) from 32 up to 512 shards just
> before upgrading from 12.2.12 to 14.2.11 a couple weeks ago.
>
> Any idea why shard .0 is getting such an imbalance of entries?
> Should we manually reshard this bucket again?
>
> Thanks!
>
> Dan
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs tag not working

2020-10-01 Thread Eugen Block

Hi,

I have a one-node cluster (also 15.2.4) for testing purposes and just 
created a cephfs with the tag; it works for me. But my node is also 
its own client, so there's that. And it was installed with 15.2.4, no 
upgrade.


For the 2nd, mds works, files can be created or removed, but client  
read/write (native client, kernel version 5.7.4) fails with I/O  
error, so osd part does not seem to be working properly.


You mean it works if you mount it from a different host (within the  
cluster maybe) with the new client's key but it doesn't work with the  
designated clients? I'm not sure about the OSD part since the other  
syntax seems to work, you say.


Can you share more details about the error? The mount on the clients  
works but they can't read/write?


Regards,
Eugen


Zitat von Andrej Filipcic :


Hi,

on octopus 15.2.4 I have an issue with cephfs tag auth. The  
following works fine:


client.f9desktop
    key: 
    caps: [mds] allow rw
    caps: [mon] allow r
    caps: [osd] allow rw  pool=cephfs_data, allow rw  
pool=ssd_data, allow rw pool=fast_data,  allow rw pool=arich_data,  
allow rw pool=ecfast_data


but this one does not.

client.f9desktopnew
    key: 
    caps: [mds] allow rw
    caps: [mon] allow r
    caps: [osd] allow rw tag cephfs data=cephfs

For the 2nd, mds works, files can be created or removed, but client  
read/write (native client, kernel version 5.7.4) fails with I/O  
error, so osd part does not seem to be working properly.


Any clues what can be wrong? the cephfs was created in jewel...

Another issue is: if osd caps are updated (adding data pool), then  
some clients refresh the caps, but most of them do not, and the only  
way to refresh it is to remount the filesystem. working tag would  
solve it.


Best regards,
Andrej

--
_
   prof. dr. Andrej Filipcic,   E-mail: andrej.filip...@ijs.si
   Department of Experimental High Energy Physics - F9
   Jozef Stefan Institute, Jamova 39, P.o.Box 3000
   SI-1001 Ljubljana, Slovenia
   Tel.: +386-1-477-3674Fax: +386-1-477-3166
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-volume quite buggy compared to ceph-disk

2020-10-01 Thread Matt Larson
Hi Marc,

 Did you have any success with `ceph-volume` for activating your OSD?

 I am having a similar problem where the command `ceph-bluestore-tool`
fails to read a label for a previously created OSD on an
LVM partition. I had previously been using the OSD without issues, but
after a reboot it fails to load.

 1. I had initially created my OSD using Ceph Octopus 15.x with `ceph
orch daemon add osd :boot/cephfs_meta`, which was able to
create an OSD on the LVM partition and bring it up.
 2. After a reboot, the OSD fails to come up, with error from
`ceph-bluestore-tool` happening inside the container specifically
being unable to read the label of the device.
 3. When I query the symlinked device /dev/boot/cephfs_meta ->
/dev/dm-3 with `dmsetup info /dev/dm-3`, I can see the state is active
and that it has a UUID, etc.
 4. I installed the `ceph-osd` CentOS package providing
ceph-bluestore-tool and tried to test manually; `sudo
ceph-bluestore-tool show-label --dev /dev/dm-3` fails to read the
label. When I try with other OSDs that were created on entire disks,
this command is able to read the label and print out the information.

 I am considering submitting a ticket to the ceph issue tracker, as I
am unable to figure out why the ceph-bluestore-tool cannot read the
labels and it seems either the OSD was initially created incorrectly
or there is a bug in ceph-bluestore-tool.

 One possibility is that I did not have the LVM2 package installed on
this host prior to the `ceph orch daemon add ..` command and this
caused a particular issue with the LVM partition OSD.

 -Matt

On Sat, Sep 19, 2020 at 9:11 AM Marc Roos  wrote:
>
>
>
>
> [@]# ceph-volume lvm activate 36 82b94115-4dfb-4ed0-8801-def59a432b0a
> Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-36
> Running command: /usr/bin/ceph-authtool
> /var/lib/ceph/osd/ceph-36/lockbox.keyring --create-keyring --name
> client.osd-lockbox.82b94115-4dfb-4ed0-8801-def59a432b0a --add-key
> AQBxA2Zfj6avOBAAIIHqNNY2J22EnOZV+dNzFQ==
>  stdout: creating /var/lib/ceph/osd/ceph-36/lockbox.keyring
> added entity client.osd-lockbox.82b94115-4dfb-4ed0-8801-def59a432b0a
> auth(key=AQBxA2Zfj6avOBAAIIHqNNY2J22EnOZV+dNzFQ==)
> Running command: /usr/bin/chown -R ceph:ceph
> /var/lib/ceph/osd/ceph-36/lockbox.keyring
> Running command: /usr/bin/ceph --cluster ceph --name
> client.osd-lockbox.82b94115-4dfb-4ed0-8801-def59a432b0a --keyring
> /var/lib/ceph/osd/ceph-36/lockbox.keyring config-key get
> dm-crypt/osd/82b94115-4dfb-4ed0-8801-def59a432b0a/luks
> Running command: /usr/sbin/cryptsetup --key-file - --allow-discards
> luksOpen
> /dev/ceph-9263e83b-7660-4f5b-843a-2111e882a17e/osd-block-82b94115-4dfb-4
> ed0-8801-def59a432b0a I8MyTZ-RQjx-gGmd-XSRw-kfa1-L60n-fgQpCb
>  stderr: Device I8MyTZ-RQjx-gGmd-XSRw-kfa1-L60n-fgQpCb already exists.
> Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-36
> Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph
> prime-osd-dir --dev /dev/mapper/I8MyTZ-RQjx-gGmd-XSRw-kfa1-L60n-fgQpCb
> --path /var/lib/ceph/osd/ceph-36 --no-mon-config
>  stderr: failed to read label for
> /dev/mapper/I8MyTZ-RQjx-gGmd-XSRw-kfa1-L60n-fgQpCb: (2) No such file or
> directory
> -->  RuntimeError: command returned non-zero exit status: 1
>
> dmsetup ls lists this
>
> Where is an option to set the weight? As far as I can see you can only
> set this after peering started?
>
> How can I mount this tmpfs manually to inspect this? Maybe put in the
> manual[1]?
>
>
> [1]
> https://docs.ceph.com/en/latest/ceph-volume/lvm/activate/
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Matt Larson, PhD
Madison, WI  53705 U.S.A.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rgw index shard much larger than others

2020-10-01 Thread Eric Ivancich
Hi Matt and Dan,

I too suspect it’s the issue Matt linked to. That bug only affects versioned 
buckets, so I’m guessing your bucket is versioned, Dan.

This bug is triggered when the final instance of an object in a versioned 
bucket is deleted, but for reasons we do not yet understand, the object was not 
fully deleted from the bucket index. And then a reshard moves part of the 
object index to shard 0.

Upgrading to a version that included Casey’s fix would mean this situation is 
not re-created in the future.

An automated clean-up is non-trivial but feasible. It would have to take into 
account that an object with the same name as the previously deleted one was 
re-created in the versioned bucket.

Eric

> On Oct 1, 2020, at 8:46 AM, Matt Benjamin  wrote:
> 
> Hi Dan,
> 
> Possibly you're reproducing https://tracker.ceph.com/issues/46456.
> 
> That explains how the underlying issue worked, I don't remember how a
> bucked exhibiting this is repaired.
> 
> Eric?
> 
> Matt
> 
> 
> On Thu, Oct 1, 2020 at 8:41 AM Dan van der Ster  wrote:
>> 
>> Dear friends,
>> 
>> Running 14.2.11, we have one particularly large bucket with a very
>> strange distribution of objects among the shards. The bucket has 512
>> shards, and most shards have ~75k entries, but shard 0 has 1.75M
>> entries:
>> 
>> # rados -p default.rgw.buckets.index listomapkeys
>> .dir.61c59385-085d-4caa-9070-63a3868dccb6.272652427.1.0 | wc -l
>> 1752085
>> 
>> # rados -p default.rgw.buckets.index listomapkeys
>> .dir.61c59385-085d-4caa-9070-63a3868dccb6.272652427.1.1 | wc -l
>> 78388
>> 
>> # rados -p default.rgw.buckets.index listomapkeys
>> .dir.61c59385-085d-4caa-9070-63a3868dccb6.272652427.1.2 | wc -l
>> 78764
>> 
>> We had resharded this bucket (manually) from 32 up to 512 shards just
>> before upgrading from 12.2.12 to 14.2.11 a couple weeks ago.
>> 
>> Any idea why shard .0 is getting such an imbalance of entries?
>> Should we manually reshard this bucket again?
>> 
>> Thanks!
>> 
>> Dan
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
>> 
> 
> 
> -- 
> 
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
> 
> http://www.redhat.com/en/technologies/storage
> 
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rgw index shard much larger than others

2020-10-01 Thread Dan van der Ster
Thanks Matt and Eric,

Sorry for the basic question, but how can I as a ceph operator tell if
a bucket is versioned?

And for fixing this current situation, I would wait for the fix then reshard?
(We want to reshard this bucket anyway because listing perf is way too
slow for the user with 512 shards).

-- Dan


On Thu, Oct 1, 2020 at 4:36 PM Eric Ivancich  wrote:
>
> Hi Matt and Dan,
>
> I too suspect it’s the issue Matt linked to. That bug only affects versioned 
> buckets, so I’m guessing your bucket is versioned, Dan.
>
> This bug is triggered when the final instance of an object in a versioned 
> bucket is deleted, but for reasons we do not yet understand, the object was 
> not fully deleted from the bucket index. And then a reshard moves part of the 
> object index to shard 0.
>
> Upgrading to a version that included Casey’s fix would mean this situation is 
> not re-created in the future.
>
> An automated clean-up is non-trivial but feasible. It would have to take into 
> account that an object with the same name as the previously deleted one was 
> re-created in the versioned bucket.
>
> Eric
>
> > On Oct 1, 2020, at 8:46 AM, Matt Benjamin  wrote:
> >
> > Hi Dan,
> >
> > Possibly you're reproducing https://tracker.ceph.com/issues/46456.
> >
> > That explains how the underlying issue worked, I don't remember how a
> > bucked exhibiting this is repaired.
> >
> > Eric?
> >
> > Matt
> >
> >
> > On Thu, Oct 1, 2020 at 8:41 AM Dan van der Ster  wrote:
> >>
> >> Dear friends,
> >>
> >> Running 14.2.11, we have one particularly large bucket with a very
> >> strange distribution of objects among the shards. The bucket has 512
> >> shards, and most shards have ~75k entries, but shard 0 has 1.75M
> >> entries:
> >>
> >> # rados -p default.rgw.buckets.index listomapkeys
> >> .dir.61c59385-085d-4caa-9070-63a3868dccb6.272652427.1.0 | wc -l
> >> 1752085
> >>
> >> # rados -p default.rgw.buckets.index listomapkeys
> >> .dir.61c59385-085d-4caa-9070-63a3868dccb6.272652427.1.1 | wc -l
> >> 78388
> >>
> >> # rados -p default.rgw.buckets.index listomapkeys
> >> .dir.61c59385-085d-4caa-9070-63a3868dccb6.272652427.1.2 | wc -l
> >> 78764
> >>
> >> We had resharded this bucket (manually) from 32 up to 512 shards just
> >> before upgrading from 12.2.12 to 14.2.11 a couple weeks ago.
> >>
> >> Any idea why shard .0 is getting such an imbalance of entries?
> >> Should we manually reshard this bucket again?
> >>
> >> Thanks!
> >>
> >> Dan
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >>
> >
> >
> > --
> >
> > Matt Benjamin
> > Red Hat, Inc.
> > 315 West Huron Street, Suite 140A
> > Ann Arbor, Michigan 48103
> >
> > http://www.redhat.com/en/technologies/storage
> >
> > tel.  734-821-5101
> > fax.  734-769-8938
> > cel.  734-216-5309
> >
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-volume quite buggy compared to ceph-disk

2020-10-01 Thread Marc Roos

 > Did you have any success with `ceph-volume` for activating your OSD?

No, I have tried with ceph-volume prepare and ceph-volume activate, but 
got errors also. The only way for me to currently create an osd without 
hassle is:

ceph-volume lvm zap --destroy /dev/sdf && 
 ceph-volume lvm create --data /dev/sdf --dmcrypt

and really like this, with the &&.
But first read this[1] before you use it.

 > I am having a similar problem where the command `ceph-bluestore-tool`
 >fails to be able to read a label for a previously created OSD on an
 >LVM partition. I had previously been using the OSD without issues, but
 >after a reboot it fails to load.

ceph-volume creates the systemd links, I think, so if you go even 
more low-level you have to create these yourself. Check whether your 
ceph-volume units exist and are mounted. I have them like this:

[@ ~]#  find /etc/ -iname "*ceph-volume*"
/etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-38-7cfec20d-
e963-4908-b4fc-0020f050a0d3.service
...

 > 1. I had initially created my OSD using Ceph Octopus 15.x with `ceph
 >orch daemon add osd :boot/cephfs_meta` that was able to
 >create an OSD on the LVM partition and bring up an OSD.
 > 2. After a reboot, the OSD fails to come up, with error from
 >`ceph-bluestore-tool` happening inside the container specifically
 >being unable to read the label of the device.

check these systemd entries and/or logs I guess ;)

 > 3. When I query the symlinked device /dev/boot/cephfs_meta ->
 >/dev/dm3, with `dmsetup info /dev/dm-3`, I can see the state is active
 >and that it has a UUID, etc.

I have these for a running osd; you indeed have to check whether the tags 
exist on the vg. I think this is the process for an lvm dmcrypt osd[2]:

[@ ~]# dmsetup ls --tree
...
H1U5hz-j51i-HLeY-wzHx-Rqzv-buSc-xVIGnB (253:7)
 └─ceph--bb97bd0e--9edb--4d29--b6e8--0876359edc3c-osd--block--23bf0dfe
--6678--4633--b7ac--f4133da785be (253:6 

[@ ~]# lvs
  LV                                             VG                                        Attr   LSize
  ...
  osd-block-b232f7a5-8409-4992-ad4d-b5bbb8ffa2e1 ceph-e459af98-4013-4860-84b8-2eb80fbd2f57 -wi-ao <7.28t


 > 4. I installed `ceph-osd` CentOS package providing the
 >ceph-bluestore-tool, and tried to manually test and `sudo
 >ceph-bluestore-tool show-label --dev /dev/dm-3` fails to read the
 >label. When I try with other OSD's that were created for entire disks
 >this command is able to read the label and print out information.
 >
 > I am considering submitting a ticket to the ceph issue tracker, as I
 >am unable to figure out why the ceph-bluestore-tool cannot read the
 >labels and it seems either the OSD was initially created incorrectly
 >or there is a bug in ceph-bluestore-tool.

I cannot advise on this because I do not have a clear understanding of what 
the procedure is to create an osd at this level.

 > One possibility is that I did not have the LVM2 package installed on
 >this host prior to the `ceph orch daemon add ..` command and this
 >caused a particular issue with the LVM partition OSD.
 >
I am using CentOS 7, which has lvm2 by default. I am also having 
problems.


[1]
https://www.mail-archive.com/ceph-users@ceph.io/msg06624.html

[2]
https://www.mail-archive.com/ceph-users@ceph.io/msg06405.html
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Tech Talk: Karan Singh - Scale Testing Ceph with 10Billion+ Objects

2020-10-01 Thread Mike Perez
Hey all,

We're live now with the latest Ceph tech talk! Join us:

https://bluejeans.com/908675367/browser

-- 

Mike Perez

he/him

Ceph Community Manager


M: +1-951-572-2633

494C 5D25 2968 D361 65FB 3829 94BC D781 ADA8 8AEA
@Thingee   Thingee
 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Tech Talk: Karan Singh - Scale Testing Ceph with 10Billion+ Objects

2020-10-01 Thread Marc Roos
 
Mike, 

Can you allow access without mic and cam?

Thanks,
Marc



-Original Message-

To: ceph-users@ceph.io
Subject: *SPAM* [ceph-users] Ceph Tech Talk: Karan Singh - Scale 
Testing Ceph with 10Billion+ Objects

Hey all,

We're live now with the latest Ceph tech talk! Join us:

https://bluejeans.com/908675367/browser

-- 

Mike Perez

he/him

Ceph Community Manager


M: +1-951-572-2633

494C 5D25 2968 D361 65FB 3829 94BC D781 ADA8 8AEA @Thingee 
  Thingee 
 

___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Tech Talk: Karan Singh - Scale Testing Ceph with 10Billion+ Objects

2020-10-01 Thread Peter Sarossy
You can click "join without audio and video" at the bottom

On Thu, Oct 1, 2020 at 1:10 PM Marc Roos  wrote:

>
> Mike,
>
> Can you allow access without mic and cam?
>
> Thanks,
> Marc
>
>
>
> -Original Message-
>
> To: ceph-users@ceph.io
> Subject: *SPAM* [ceph-users] Ceph Tech Talk: Karan Singh - Scale
> Testing Ceph with 10Billion+ Objects
>
> Hey all,
>
> We're live now with the latest Ceph tech talk! Join us:
>
> https://bluejeans.com/908675367/browser
>
> --
>
> Mike Perez
>
> he/him
>
> Ceph Community Manager
>
>
> M: +1-951-572-2633
>
> 494C 5D25 2968 D361 65FB 3829 94BC D781 ADA8 8AEA @Thingee
>   Thingee
>  
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> email to ceph-users-le...@ceph.io
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Cheers,
Peter Sarossy
Technical Program Manager
Data Center Data Security - Google LLC.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Tech Talk: Karan Singh - Scale Testing Ceph with 10Billion+ Objects

2020-10-01 Thread Marc Roos
 
P, thanks, you are right; I was blind and impatient and did not look 
under the options.


-Original Message-
Cc: ceph-users; miperez
Subject: *SPAM* Re: [ceph-users] Ceph Tech Talk: Karan Singh - 
Scale Testing Ceph with 10Billion+ Objects

You can click "join without audio and video" at the bottom

On Thu, Oct 1, 2020 at 1:10 PM Marc Roos  
wrote:

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Feedback for proof of concept OSD Node

2020-10-01 Thread Ignacio Ocampo
RGW and RBD primarily, CephFS to a lesser extent.

> On 1 Oct 2020, at 9:58, Nathan Fish  wrote:
> 
> 
> What kind of cache configuration are you planning? Are you going to use 
> CephFS, RGW, and/or RBD?
> 
>> On Tue, Sep 29, 2020 at 2:45 AM Ignacio Ocampo  wrote:
>> Hi All :),
>> 
>> I would like to get your feedback about the components below to build a PoC 
>> OSD Node (I will build 3 of these).
>> 
>> SSD for OS.
>> NVMe for cache.
>> HDD for storage.
>> 
>> The Supermicro motherboard has 2 10Gb cards, and I will use ECC memories.
>> 
>> 
>> 
>> 
>> Thanks for your feedback!
>> 
>> -- 
>> Ignacio Ocampo
>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs tag not working

2020-10-01 Thread Frank Schilder
There used to be / is a bug in the ceph fs commands when using data pools. If 
you enable the cephfs application on a pool explicitly before running ceph fs 
add_data_pool, the fs tag is not applied. Maybe it's that? There is an older 
thread on the topic on the users list and also a fix/workaround.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Eugen Block 
Sent: 01 October 2020 15:33:53
To: ceph-users@ceph.io
Subject: [ceph-users] Re: cephfs tag not working

Hi,

I have a one-node-cluster (also 15.2.4) for testing purposes and just
created a cephfs with the tag, it works for me. But my node is also
its own client, so there's that. And it was installed with 15.2.4, no
upgrade.

> For the 2nd, mds works, files can be created or removed, but client
> read/write (native client, kernel version 5.7.4) fails with I/O
> error, so osd part does not seem to be working properly.

You mean it works if you mount it from a different host (within the
cluster maybe) with the new client's key but it doesn't work with the
designated clients? I'm not sure about the OSD part since the other
syntax seems to work, you say.

Can you share more details about the error? The mount on the clients
works but they can't read/write?

Regards,
Eugen


Zitat von Andrej Filipcic :

> Hi,
>
> on octopus 15.2.4 I have an issue with cephfs tag auth. The
> following works fine:
>
> client.f9desktop
> key: 
> caps: [mds] allow rw
> caps: [mon] allow r
> caps: [osd] allow rw  pool=cephfs_data, allow rw
> pool=ssd_data, allow rw pool=fast_data,  allow rw pool=arich_data,
> allow rw pool=ecfast_data
>
> but this one does not.
>
> client.f9desktopnew
> key: 
> caps: [mds] allow rw
> caps: [mon] allow r
> caps: [osd] allow rw tag cephfs data=cephfs
>
> For the 2nd, mds works, files can be created or removed, but client
> read/write (native client, kernel version 5.7.4) fails with I/O
> error, so osd part does not seem to be working properly.
>
> Any clues what can be wrong? the cephfs was created in jewel...
>
> Another issue is: if osd caps are updated (adding data pool), then
> some clients refresh the caps, but most of them do not, and the only
> way to refresh it is to remount the filesystem. working tag would
> solve it.
>
> Best regards,
> Andrej
>
> --
> _
>prof. dr. Andrej Filipcic,   E-mail: andrej.filip...@ijs.si
>Department of Experimental High Energy Physics - F9
>Jozef Stefan Institute, Jamova 39, P.o.Box 3000
>SI-1001 Ljubljana, Slovenia
>Tel.: +386-1-477-3674Fax: +386-1-477-3166
> -
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs tag not working

2020-10-01 Thread Patrick Donnelly
On Thu, Oct 1, 2020 at 6:57 AM Frank Schilder  wrote:
>
> There used to be / is a bug in ceph fs commands when using data pools. If you 
> enable the application cephfs on a pool explicitly before running cephfs add 
> datapool, the fs-tag is not applied. Maybe its that? There is an older thread 
> on the topic in the users-list and also a fix/workaround.

This is likely to be the problem. Please add the application tag to
your CephFS data pools:
https://docs.ceph.com/en/latest/rados/operations/pools/#associate-pool-to-application
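A sketch of the commands, assuming the pool and filesystem names from Andrej's
caps (cephfs_data and cephfs):

# show the application metadata currently on the data pool
ceph osd pool application get cephfs_data
# set the key that 'allow rw tag cephfs data=cephfs' matches on
ceph osd pool application set cephfs_data cephfs data cephfs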

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs tag not working

2020-10-01 Thread Andrej Filipcic

On 2020-10-01 15:56, Frank Schilder wrote:

There used to be / is a bug in ceph fs commands when using data pools. If you 
enable the application cephfs on a pool explicitly before running cephfs add 
datapool, the fs-tag is not applied. Maybe its that? There is an older thread 
on the topic in the users-list and also a fix/workaround.
Thanks, found it, that was it. I had enabled the application before adding the 
pool; only the latest pool had the cephfs key/value set. The fix worked.


Best regards,
Andrej


Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Eugen Block 
Sent: 01 October 2020 15:33:53
To: ceph-users@ceph.io
Subject: [ceph-users] Re: cephfs tag not working

Hi,

I have a one-node-cluster (also 15.2.4) for testing purposes and just
created a cephfs with the tag, it works for me. But my node is also
its own client, so there's that. And it was installed with 15.2.4, no
upgrade.


For the 2nd, mds works, files can be created or removed, but client
read/write (native client, kernel version 5.7.4) fails with I/O
error, so osd part does not seem to be working properly.

You mean it works if you mount it from a different host (within the
cluster maybe) with the new client's key but it doesn't work with the
designated clients? I'm not sure about the OSD part since the other
syntax seems to work, you say.

Can you share more details about the error? The mount on the clients
works but they can't read/write?

Regards,
Eugen


Zitat von Andrej Filipcic :


Hi,

on octopus 15.2.4 I have an issue with cephfs tag auth. The
following works fine:

client.f9desktop
 key: 
 caps: [mds] allow rw
 caps: [mon] allow r
 caps: [osd] allow rw  pool=cephfs_data, allow rw
pool=ssd_data, allow rw pool=fast_data,  allow rw pool=arich_data,
allow rw pool=ecfast_data

but this one does not.

client.f9desktopnew
 key: 
 caps: [mds] allow rw
 caps: [mon] allow r
 caps: [osd] allow rw tag cephfs data=cephfs

For the 2nd, mds works, files can be created or removed, but client
read/write (native client, kernel version 5.7.4) fails with I/O
error, so osd part does not seem to be working properly.

Any clues what can be wrong? the cephfs was created in jewel...

Another issue is: if osd caps are updated (adding data pool), then
some clients refresh the caps, but most of them do not, and the only
way to refresh it is to remount the filesystem. working tag would
solve it.

Best regards,
Andrej

--
_
prof. dr. Andrej Filipcic,   E-mail: andrej.filip...@ijs.si
Department of Experimental High Energy Physics - F9
Jozef Stefan Institute, Jamova 39, P.o.Box 3000
SI-1001 Ljubljana, Slovenia
Tel.: +386-1-477-3674Fax: +386-1-477-3166
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



--
_
   prof. dr. Andrej Filipcic,   E-mail: andrej.filip...@ijs.si
   Department of Experimental High Energy Physics - F9
   Jozef Stefan Institute, Jamova 39, P.o.Box 3000
   SI-1001 Ljubljana, Slovenia
   Tel.: +386-1-477-3674Fax: +386-1-425-7074
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Feedback for proof of concept OSD Node

2020-10-01 Thread Brian Topping
Welcome to Ceph!

I think better questions to start with are “what are your objectives in your 
study?” Is it just seeing Ceph run with many disks, or are you trying to see 
how much performance you can get out of it with distributed disk? What is your 
budget? Do you want to try different combinations of storage devices to learn 
how they differ in performance or do you just want to jump to the fastest 
things out there?

One often doesn’t need a bunch of machines to determine that Ceph is a really 
versatile and robust solution. I pretty regularly deploy Ceph on a single node 
using Kubernetes and Rook. Some would ask “why would one ever do that, just use 
direct storage!”. The answer is when I want to expand a cluster, I am willing 
to have traded initial performance overhead for letting Ceph distribute data at 
a later date. And the overhead is far lower than one might think when there’s 
not a network bottleneck to deal with. I do use direct storage on LVM when I 
have distributed workloads such as Kafka that abstract storage that a service 
instance depends on. It doesn’t make much sense in my mind for Kafka or 
Cassandra to use Ceph because I can afford to lose nodes using those services.

In other words, Ceph is virtualized storage. You have likely come to it because 
your workloads need to be able to come up anywhere on your network and reach 
that storage. How do you see those workloads exercising the capabilities of 
Ceph? That’s where your interesting use cases come from, and can help you 
better decide what the best lab platform is to get started.

Hope that helps, Brian

> On Sep 29, 2020, at 12:44 AM, Ignacio Ocampo  wrote:
> 
> Hi All :),
> 
> I would like to get your feedback about the components below to build a PoC 
> OSD Node (I will build 3 of these).
> 
> SSD for OS.
> NVMe for cache.
> HDD for storage.
> 
> The Supermicro motherboard has 2 10Gb cards, and I will use ECC memories.
> 
> 
> 
> Thanks for your feedback!
> 
> -- 
> Ignacio Ocampo
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-volume quite buggy compared to ceph-disk

2020-10-01 Thread tri
Hi Matt, Marc,

I'm using Ceph Octopus with cephadm as the orchestration tool. I've tried adding 
OSDs with ceph orch daemon add ... but it's pretty limited. For one, you can't 
create a dmcrypt OSD with it, nor have a separate db device. I found that the 
most reliable way to create OSDs with the cephadm orchestration tool is via the 
spec file (i.e. ceph orch apply osd -i osd.spec). For example, you can ask it to 
find all the HDD disks of a certain model, size, etc. on a particular host(s) 
and make them into osds. Here is a simple spec file:

service_type: osd
service_id: furry-osd
placement:
  host_pattern: 'furry'
data_devices:
  size: '5900G:6000G'
encrypted: true

You can find more info here: 
https://docs.ceph.com/en/latest/cephadm/drivegroups/

However, this method only works with full disks, not partitions or LVs. You 
can use 'ceph orch device ls --refresh' to list all available disks on a 
particular host and see why certain disks aren't available.

My understanding of ceph-volume lvm is that it uses LV tags exclusively to 
find the block/db/wal devices. During startup, it will use lvm to find the OSD 
block devices, set up the dmcrypt volume (if required), create the proper 
links, and execute the ceph-osd command. The existing links in 
/var/lib/ceph/osd/ would be overridden by the info from the LV tags.

You can use lvs -o lv_tags on an LV to see all the labels created for an OSD.
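A quick sketch (VG/LV names will differ per host):

# show the ceph tags ceph-volume stored on each OSD LV
lvs -o lv_name,lv_tags --noheadings
# or let ceph-volume itself report what it finds from those tags
ceph-volume lvm list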

Hope it helps.

--Tri Hoang


October 1, 2020 10:23 AM, "Matt Larson"  wrote:

> Hi Marc,
> 
> Did you have any success with `ceph-volume` for activating your OSD?
> 
> I am having a similar problem where the command `ceph-bluestore-tool`
> fails to be able to read a label for a previously created OSD on an
> LVM partition. I had previously been using the OSD without issues, but
> after a reboot it fails to load.
> 
> 1. I had initially created my OSD using Ceph Octopus 15.x with `ceph
> orch daemon add osd :boot/cephfs_meta` that was able to
> create an OSD on the LVM partition and bring up an OSD.
> 2. After a reboot, the OSD fails to come up, with error from
> `ceph-bluestore-tool` happening inside the container specifically
> being unable to read the label of the device.
> 3. When I query the symlinked device /dev/boot/cephfs_meta ->
> /dev/dm3, with `dmsetup info /dev/dm-3`, I can see the state is active
> and that it has a UUID, etc.
> 4. I installed `ceph-osd` CentOS package providing the
> ceph-bluestore-tool, and tried to manually test and `sudo
> ceph-bluestore-tool show-label --dev /dev/dm-3` fails to read the
> label. When I try with other OSD's that were created for entire disks
> this command is able to read the label and print out information.
> 
> I am considering submitting a ticket to the ceph issue tracker, as I
> am unable to figure out why the ceph-bluestore-tool cannot read the
> labels and it seems either the OSD was initially created incorrectly
> or there is a bug in ceph-bluestore-tool.
> 
> One possibility is that I did not have the LVM2 package installed on
> this host prior to the `ceph orch daemon add ..` command and this
> caused a particular issue with the LVM partition OSD.
> 
> -Matt
> 
> On Sat, Sep 19, 2020 at 9:11 AM Marc Roos  wrote:
> 
>> [@]# ceph-volume lvm activate 36 82b94115-4dfb-4ed0-8801-def59a432b0a
>> Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-36
>> Running command: /usr/bin/ceph-authtool
>> /var/lib/ceph/osd/ceph-36/lockbox.keyring --create-keyring --name
>> client.osd-lockbox.82b94115-4dfb-4ed0-8801-def59a432b0a --add-key
>> AQBxA2Zfj6avOBAAIIHqNNY2J22EnOZV+dNzFQ==
>> stdout: creating /var/lib/ceph/osd/ceph-36/lockbox.keyring
>> added entity client.osd-lockbox.82b94115-4dfb-4ed0-8801-def59a432b0a
>> auth(key=AQBxA2Zfj6avOBAAIIHqNNY2J22EnOZV+dNzFQ==)
>> Running command: /usr/bin/chown -R ceph:ceph
>> /var/lib/ceph/osd/ceph-36/lockbox.keyring
>> Running command: /usr/bin/ceph --cluster ceph --name
>> client.osd-lockbox.82b94115-4dfb-4ed0-8801-def59a432b0a --keyring
>> /var/lib/ceph/osd/ceph-36/lockbox.keyring config-key get
>> dm-crypt/osd/82b94115-4dfb-4ed0-8801-def59a432b0a/luks
>> Running command: /usr/sbin/cryptsetup --key-file - --allow-discards
>> luksOpen
>> /dev/ceph-9263e83b-7660-4f5b-843a-2111e882a17e/osd-block-82b94115-4dfb-4
>> ed0-8801-def59a432b0a I8MyTZ-RQjx-gGmd-XSRw-kfa1-L60n-fgQpCb
>> stderr: Device I8MyTZ-RQjx-gGmd-XSRw-kfa1-L60n-fgQpCb already exists.
>> Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-36
>> Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph
>> prime-osd-dir --dev /dev/mapper/I8MyTZ-RQjx-gGmd-XSRw-kfa1-L60n-fgQpCb
>> --path /var/lib/ceph/osd/ceph-36 --no-mon-config
>> stderr: failed to read label for
>> /dev/mapper/I8MyTZ-RQjx-gGmd-XSRw-kfa1-L60n-fgQpCb: (2) No such file or
>> directory
>> --> RuntimeError: command returned non-zero exit status: 1
>> 
>> dmsetup ls lists this
>> 
>> Where is an option to set the weight? As far as I can see you can only
>> set this after peering started?
>> 

[ceph-users] Re: rgw index shard much larger than others

2020-10-01 Thread Eric Ivancich
Hi Dan,

One way to tell would be to do a:

radosgw-admin bi list --bucket=

And see if any of the lines output contains (perhaps using `grep`):

"type": "olh",

That would tell you if there were any versioned objects in the bucket.
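Putting the two together, something like this gives a quick count (bucket name 
is a placeholder):

radosgw-admin bi list --bucket=<bucket-name> | grep -c '"type": "olh"'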

The “fix” we currently have only prevents this from happening in the future. We 
currently do not have a “fix” that cleans up the bucket index. Like I mentioned 
— an automated clean-up is non-trivial but feasible; it would have to take into 
account that an object with the same name as the previously deleted one was 
re-created in the versioned bucket.

I hope that’s informative, if not what you were hoping to hear.

Eric
--
J. Eric Ivancich
he / him / his
Red Hat Storage
Ann Arbor, Michigan, USA

> On Oct 1, 2020, at 10:53 AM, Dan van der Ster  wrote:
> 
> Thanks Matt and Eric,
> 
> Sorry for the basic question, but how can I as a ceph operator tell if
> a bucket is versioned?
> 
> And for fixing this current situation, I would wait for the fix then reshard?
> (We want to reshard this bucket anyway because listing perf is way too
> slow for the user with 512 shards).
> 
> -- Dan
> 
> 
> On Thu, Oct 1, 2020 at 4:36 PM Eric Ivancich  wrote:
>> 
>> Hi Matt and Dan,
>> 
>> I too suspect it’s the issue Matt linked to. That bug only affects versioned 
>> buckets, so I’m guessing your bucket is versioned, Dan.
>> 
>> This bug is triggered when the final instance of an object in a versioned 
>> bucket is deleted, but for reasons we do not yet understand, the object was 
>> not fully deleted from the bucket index. And then a reshard moves part of 
>> the object index to shard 0.
>> 
>> Upgrading to a version that included Casey’s fix would mean this situation 
>> is not re-created in the future.
>> 
>> An automated clean-up is non-trivial but feasible. It would have to take 
>> into account that an object with the same name as the previously deleted one 
>> was re-created in the versioned bucket.
>> 
>> Eric
>> 
>>> On Oct 1, 2020, at 8:46 AM, Matt Benjamin  wrote:
>>> 
>>> Hi Dan,
>>> 
>>> Possibly you're reproducing https://tracker.ceph.com/issues/46456.
>>> 
>>> That explains how the underlying issue worked, I don't remember how a
>>> bucked exhibiting this is repaired.
>>> 
>>> Eric?
>>> 
>>> Matt
>>> 
>>> 
>>> On Thu, Oct 1, 2020 at 8:41 AM Dan van der Ster  wrote:
 
 Dear friends,
 
 Running 14.2.11, we have one particularly large bucket with a very
 strange distribution of objects among the shards. The bucket has 512
 shards, and most shards have ~75k entries, but shard 0 has 1.75M
 entries:
 
 # rados -p default.rgw.buckets.index listomapkeys
 .dir.61c59385-085d-4caa-9070-63a3868dccb6.272652427.1.0 | wc -l
 1752085
 
 # rados -p default.rgw.buckets.index listomapkeys
 .dir.61c59385-085d-4caa-9070-63a3868dccb6.272652427.1.1 | wc -l
 78388
 
 # rados -p default.rgw.buckets.index listomapkeys
 .dir.61c59385-085d-4caa-9070-63a3868dccb6.272652427.1.2 | wc -l
 78764
 
 We had resharded this bucket (manually) from 32 up to 512 shards just
 before upgrading from 12.2.12 to 14.2.11 a couple weeks ago.
 
 Any idea why shard .0 is getting such an imbalance of entries?
 Should we manually reshard this bucket again?
 
 Thanks!
 
 Dan
 ___
 ceph-users mailing list -- ceph-users@ceph.io
 To unsubscribe send an email to ceph-users-le...@ceph.io
 
 
>>> 
>>> 
>>> --
>>> 
>>> Matt Benjamin
>>> Red Hat, Inc.
>>> 315 West Huron Street, Suite 140A
>>> Ann Arbor, Michigan 48103
>>> 
>>> http://www.redhat.com/en/technologies/storage
>>> 
>>> tel.  734-821-5101
>>> fax.  734-769-8938
>>> cel.  734-216-5309
>>> 
>> 
> 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RFC: Possible replacement for ceph-disk

2020-10-01 Thread Nico Schottelius


Good evening,

since 2018 we have been using a custom script to create disks /
partitions, because at the time both ceph-disk and ceph-volume exhibited
bugs that made them unreliable for us.

We recently re-tested ceph-volume, and while generally speaking it seems
to work [0], using LVM introduces an additional layer that we do not need.

The script we created is a little less than 100 lines long and works by
specifying the device and its device class:

./ceph-osd-create-start /dev/sdd ssd

Everything else is determined by the script. As the script is very
simple and works independently of any init system, we wanted to discuss
whether it would be useful for anyone else and whether it makes sense to
re-integrate something like it back into ceph upstream.

We are aware that ceph-disk has been deprecated; however,
ceph-volume raw had some issues in our setup [2].

The script itself can be found at [1]. At the end it has some ungleich
specifics, but we'd be very open to removing those.

Best regards,

Nico


[0] https://tracker.ceph.com/issues/47724
[1] 
https://code.ungleich.ch/ungleich-public/ungleich-tools/-/blob/master/ceph-osd-create-start
[2]

[00:19:45] server8.place6:/var/lib/ceph/osd# ceph-volume raw prepare 
--bluestore --data /dev/sdn
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd 
--keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 
d0fa7074-6cdf-4947-ac1e-e73dd0ec1fe8
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-3
--> Executable selinuxenabled not in PATH: 
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Running command: /bin/chown -R ceph:ceph /dev/sdn
Running command: /bin/ln -s /dev/sdn /var/lib/ceph/osd/ceph-3/block
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd 
--keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o 
/var/lib/ceph/osd/ceph-3/activate.monmap
 stderr: 2020-09-28 00:20:48.276 7f8c90f27700 -1 auth: unable to find a keyring 
on 
/etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:
 (2) No such file or directory
2020-09-28 00:20:48.276 7f8c90f27700 -1 AuthRegistry(0x7f8c8c081d08) no keyring 
found at 
/etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,,
 disabling cephx
 stderr: got monmap epoch 12
Running command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-3/keyring 
--create-keyring --name osd.3 --add-key AQA/EHFfINr/HhAAVFAP9NF2LLGFrJEGXbbMSw==
 stdout: creating /var/lib/ceph/osd/ceph-3/keyring
added entity osd.3 auth(key=AQA/EHFfINr/HhAAVFAP9NF2LLGFrJEGXbbMSw==)
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-3/keyring
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-3/
Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore 
--mkfs -i 3 --monmap /var/lib/ceph/osd/ceph-3/activate.monmap --keyfile - 
--osd-data /var/lib/ceph/osd/ceph-3/ --osd-uuid 
d0fa7074-6cdf-4947-ac1e-e73dd0ec1fe8 --setuser ceph --setgroup ceph
 stderr: 2020-09-28 00:20:48.760 7f6e34ebfd80 -1 
bluestore(/var/lib/ceph/osd/ceph-3/) _read_fsid unparsable uuid
 stderr: 2020-09-28 00:20:55.912 7f6e34ebfd80 -1 bdev(0x561a9a0f6700 
/var/lib/ceph/osd/ceph-3//block) _lock flock failed on 
/var/lib/ceph/osd/ceph-3//block
 stderr: 2020-09-28 00:20:55.912 7f6e34ebfd80 -1 bdev(0x561a9a0f6700 
/var/lib/ceph/osd/ceph-3//block) open failed to lock 
/var/lib/ceph/osd/ceph-3//block: (11) Resource temporarily unavailable
 stderr: 2020-09-28 00:20:55.912 7f6e34ebfd80 -1 OSD::mkfs: couldn't mount 
ObjectStore: error (11) Resource temporarily unavailable
 stderr: 2020-09-28 00:20:55.912 7f6e34ebfd80 -1  ** ERROR: error creating 
empty object store in /var/lib/ceph/osd/ceph-3/: (11) Resource temporarily 
unavailable
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd 
--keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.3 
--yes-i-really-mean-it
 stderr: 2020-09-28 00:20:56.072 7f3e446a9700 -1 auth: unable to find a keyring 
on 
/etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:
 (2) No such file or directory
2020-09-28 00:20:56.072 7f3e446a9700 -1 AuthRegistry(0x7f3e3c081d08) no keyring 
found at 
/etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,,
 disabling cephx
 stderr: purged osd.3
-->  RuntimeError: Command failed with exit code 250: /usr/bin/ceph-osd 
--cluster ceph --osd-objectstore bluestore --mkfs -i 3 --monmap 
/var/lib/ceph/osd/ceph-3/activate.monmap --keyfile - --osd-data 
/var/lib/ceph/osd/ceph-3/ --osd-uuid d0fa7074-6cdf-4947-ac1e-e73dd0ec1fe8 
--setuser ceph --setgroup ceph
[00:20:56] server8.place6:/var/lib/ceph/osd#


[ceph-users] Re: Feedback for proof of concept OSD Node

2020-10-01 Thread Ignacio Ocampo
Hi Brian,

Here is more context about what I want to accomplish: I've migrated a bunch of
services from AWS to a local server, but having everything in a single
server is not safe, and instead of investing in RAID, I would like to start
setting up a small Ceph Cluster to have redundancy and a robust mechanism
in case any component fails.

Also, in the mid-term, I do have plans to deploy a small OpenStack Cluster.

Because of that, I would like to set up the first small Ceph Cluster that
can scale as my needs grow, the idea is to have 3 OSD nodes with the same
characteristics and add additional HDDs as needed, up to 5 HDD per OSD
node, starting with 1 HDD per node.

Thanks!

On Thu, Oct 1, 2020 at 11:35 AM Brian Topping 
wrote:

> Welcome to Ceph!
>
> I think better questions to start with are “what are your objectives in
> your study?” Is it just seeing Ceph run with many disks, or are you trying
> to see how much performance you can get out of it with distributed disk?
> What is your budget? Do you want to try different combinations of storage
> devices to learn how they differ in performance or do you just want to jump
> to the fastest things out there?
>
> One often doesn’t need a bunch of machines to determine that Ceph is a
> really versatile and robust solution. I pretty regularly deploy Ceph on a
> single node using Kubernetes and Rook. Some would ask “why would one ever
> do that, just use direct storage!”. The answer is when I want to expand a
> cluster, I am willing to have traded initial performance overhead for
> letting Ceph distribute data at a later date. And the overhead is far lower
> than one might think when there’s not a network bottleneck to deal with. I
> do use direct storage on LVM when I have distributed workloads such as
> Kafka that abstract storage that a service instance depends on. It doesn’t
> make much sense in my mind for Kafka or Cassandra to use Ceph because I can
> afford to lose nodes using those services.
>
> In other words, Ceph is virtualized storage. You have likely come to it
> because your workloads need to be able to come up anywhere on your network
> and reach that storage. How do you see those workloads exercising the
> capabilities of Ceph? That’s where your interesting use cases come from,
> and can help you better decide what the best lab platform is to get started.
>
> Hope that helps, Brian
>
> On Sep 29, 2020, at 12:44 AM, Ignacio Ocampo  wrote:
>
> Hi All :),
>
> I would like to get your feedback about the components below to build a
> PoC OSD Node (I will build 3 of these).
>
> SSD for OS.
> NVMe for cache.
> HDD for storage.
>
> The Supermicro motherboard has 2 10Gb cards, and I will use ECC memories.
>
> 
>
> Thanks for your feedback!
>
> --
> Ignacio Ocampo
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>

-- 
Ignacio Ocampo
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Suspicious newsletter] Re: Ceph Tech Talk: Karan Singh - Scale Testing Ceph with 10Billion+ Objects

2020-10-01 Thread Szabo, Istvan (Agoda)
Hi,

Is it available for download or on YouTube?

Thank you.

From: Peter Sarossy 
Sent: Friday, October 2, 2020 12:12 AM
To: Marc Roos
Cc: ceph-users
Subject: [Suspicious newsletter] [ceph-users] Re: Ceph Tech Talk: Karan Singh - 
Scale Testing Ceph with 10Billion+ Objects

Email received from outside the company. If in doubt don't click links nor open 
attachments!


You can click "join without audio and video" at the bottom

On Thu, Oct 1, 2020 at 1:10 PM Marc Roos  wrote:

>
> Mike,
>
> Can you allow access without mic and cam?
>
> Thanks,
> Marc
>
>
>
> -Original Message-
>
> To: ceph-users@ceph.io
> Subject: *SPAM* [ceph-users] Ceph Tech Talk: Karan Singh - Scale
> Testing Ceph with 10Billion+ Objects
>
> Hey all,
>
> We're live now with the latest Ceph tech talk! Join us:
>
> https://bluejeans.com/908675367/browser
>
> --
>
> Mike Perez
>
> he/him
>
> Ceph Community Manager
>
>
> M: +1-951-572-2633
>
> 494C 5D25 2968 D361 65FB 3829 94BC D781 ADA8 8AEA @Thingee
>   Thingee
>  
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> email to ceph-users-le...@ceph.io
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


--
Cheers,
Peter Sarossy
Technical Program Manager
Data Center Data Security - Google LLC.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


This message is confidential and is for the sole use of the intended 
recipient(s). It may also be privileged or otherwise protected by copyright or 
other legal rules. If you have received it by mistake please let us know by 
reply email and delete it from your system. It is prohibited to copy this 
message or disclose its content to anyone. Any confidentiality or privilege is 
not waived or lost by any mistaken delivery or unauthorized disclosure of the 
message. All messages sent to and from Agoda may be monitored to ensure 
compliance with company policies, to protect the company's interests and to 
remove potential malware. Electronic messages may be intercepted, amended, lost 
or deleted, or contain viruses.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io