[ceph-users] Re: Benchmark WAL/DB on SSD and HDD for RGW RBD CephFS

2020-09-18 Thread huxia...@horebdata.cn
Dear Maged,

Do you mean that dm-writecache is better than bcache in terms of small-IO 
performance? By how much? Could you please share a few more details with us?

thanks in advance,

Samuel





huxia...@horebdata.cn
 
From: Maged Mokhtar
Date: 2020-09-18 02:12
To: ceph-users
Subject: [ceph-users] Re: Benchmark WAL/DB on SSD and HDD for RGW RBD CephFS
 
On 17/09/2020 19:21, vita...@yourcmc.ru wrote:
>   RBD in fact doesn't benefit much from the WAL/DB partition alone because 
> Bluestore never does more writes per second than HDD can do on average (it 
> flushes every 32 writes to the HDD). For RBD, the best thing is bcache.
 
rbd will benefit: for each data write iop there could be a metadata read 
iop (unless it is cached) plus a metadata write iop, so taking these extra 
metadata iops away from the hdd will make a difference for small block sizes. 
Even for data flushes (not sure if the batch is 32 or 64), if the data is not 
totally random, the io scheduler for the hdd (cfq or deadline) will 
either merge blocks or order them in a way that can sustain higher 
client iops.
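For reference, this flush batching is controlled by the bluestore_deferred_batch_ops 
options (the defaults differ for hdd and ssd). Assuming a cluster with the 
centralized config database, something like the following should show the 
effective values; osd.0 is just an example id:

ceph config get osd.0 bluestore_deferred_batch_ops_hdd
ceph config get osd.0 bluestore_deferred_batch_ops_ssd
# or ask a running daemon directly, on the host where it runs:
ceph daemon osd.0 config show | grep deferred_batch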
 
We did test dm-cache, bcache and dm-writecache; we found the latter to be 
much better.
 
/Maged
 
 
>
> Just try to fill your OSDs up to a decent level to see the difference, 
> because a lot of objects means a lot of metadata, and when there's a lot of 
> metadata it stops fitting in cache. The performance, and the performance 
> difference, will also depend on whether your HDDs have an internal SSD/media 
> cache (a lot of them do even if you're unaware of it).
>
> +1 for hsbench, just be careful and use my repo 
> https://github.com/vitalif/hsbench because the original has at least 2 bugs 
> for now:
> 1) it only reads the first 64KB when benchmarking GETs
> 2) it reads objects sequentially instead of reading them randomly
>
> The first one actually has a fix waiting to be merged in someone's pull 
> request; the second is my fix, I can submit a PR later.
>
>> Yes, I agree that there are many knobs for fine-tuning Ceph performance.
>> The problem is that we don't have data on which workloads benefit most from
>> WAL/DB on SSD versus on the same spinning drive, and by how much. Does it
>> really help in a cluster that is mostly used for object storage/RGW? Or is
>> it perhaps only block storage/RBD workloads that benefit most?
>>
>> IMHO we need some cost-benefit analysis for this, because the cost of
>> placing the WAL/DB on SSD is quite noticeable (multiple OSDs fail when the
>> SSD fails, and capacity is reduced).


[ceph-users] Re: Spanning OSDs over two drives

2020-09-18 Thread Robert Sander
Hi,

Am 18.09.20 um 03:53 schrieb Liam MacKenzie:

> As I understand it, using RAID isn't recommended, so how would I best deploy my 
> cluster so that it's smart enough to group drives according to the trays 
> they're in?

You could treat both disks as one and create a RAID0 over them with one OSD
on it. Double the space and the same risk.

If you have at least "host" as the failure domain, then no two copies of the
same object are stored on a single host. That means it does not matter if you
take two OSDs offline at the same time.

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin





[ceph-users] Re: Benchmark WAL/DB on SSD and HDD for RGW RBD CephFS

2020-09-18 Thread vitalif
> we did test dm-cache, bcache and dm-writecache, we found the later to be
> much better.

Did you set the bcache block size to 4096 during your tests? Without this setting 
it's slow, because 99.9% of SSDs don't handle 512-byte overwrites well. Otherwise I 
don't think bcache should be worse than dm-writecache. Also, dm-writecache only 
caches writes, while bcache also caches reads. And lvmcache is trash because it 
only writes to the SSD when the block is already on the SSD.
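If someone wants to retest bcache with that fixed, the block size has to be set 
when the devices are formatted. A minimal sketch, assuming bcache-tools is 
installed and with device names as placeholders (this wipes them!):

make-bcache --block 4k --bucket 2M --writeback -C /dev/nvme0n1p1 -B /dev/sdX
echo writeback > /sys/block/bcache0/bcache/cache_mode   # if not set at format time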

Please post some details about the comparison if you have them :)


[ceph-users] Re: multiple OSD crash, unfound objects

2020-09-18 Thread Frank Schilder
Dear Michael,

firstly, I'm a bit confused why you started deleting data. The objects were 
unfound, but still there. That's a small issue. Now the data might be gone and 
that's a real issue.


Interval:

Anyone reading this: I have seen many threads where ceph admins started 
deleting objects or PGs or even purging OSDs way too early from a cluster. 
Trying to recover health by deleting data is a contradiction. Ceph has bugs and 
sometimes it needs some help finding everything again. As far as I know, for 
most of these bugs there are workarounds that allow full recovery with a bit of 
work.


The first question is: did you delete the entire object or just a shard on one 
disk? Are there OSDs that might still have a copy?

If the object is gone for good, the file references something that doesn't 
exist - it's like a bad sector. You probably need to delete the file. It's a bit 
strange that the operation does not err out with a read error. Maybe it doesn't 
because it waits for the unfound-objects state to be resolved?

As for all the other unfound objects, they are there somewhere - you didn't lose 
a disk or anything. Try pushing ceph to scan the correct OSDs, for example by 
restarting the newly added OSDs one by one, or something similar. Sometimes 
exporting and importing a PG from one OSD to another forces a re-scan and 
subsequent discovery of unfound objects. It is also possible that ceph will 
find these objects during recovery, or when OSDs scrub or check for 
objects that can be deleted.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Michael Thomas 
Sent: 17 September 2020 22:27:47
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] multiple OSD crash, unfound objects

Hi Frank,

Yes, it does sound similar to your ticket.

I've tried a few things to restore the failed files:

* Locate a missing object with 'ceph pg $pgid list_unfound'

* Convert the hex oid to a decimal inode number

* Identify the affected file with 'find /ceph -inum $inode'

At this point, I know which file is affected by the missing object.  As
expected, attempts to read the file simply hang.  Unexpectedly, attempts
to 'ls' the file or its containing directory also hang.  I presume from
this that the stat() system call needs some information that is
contained in the missing object, and is waiting for the object to become
available.

Next I tried to remove the affected object with:

* ceph pg $pgid mark_unfound_lost delete

Now 'ceph status' shows one fewer missing objects, but attempts to 'ls'
or 'rm' the affected file continue to hang.

Finally, I ran a scrub over the part of the filesystem containing the
affected file:

ceph tell mds.ceph4 scrub start /frames/postO3/hoft recursive

Nothing seemed to come up during the scrub:

2020-09-17T14:56:15.208-0500 7f39bca24700  1 mds.ceph4 asok_command:
scrub status {prefix=scrub status} (starting...)
2020-09-17T14:58:58.013-0500 7f39bca24700  1 mds.ceph4 asok_command:
scrub start {path=/frames/postO3/hoft,prefix=scrub
start,scrubops=[recursive]} (starting...)
2020-09-17T14:58:58.013-0500 7f39b5215700  0 log_channel(cluster) log
[INF] : scrub summary: active
2020-09-17T14:58:58.014-0500 7f39b5215700  0 log_channel(cluster) log
[INF] : scrub queued for path: /frames/postO3/hoft
2020-09-17T14:58:58.014-0500 7f39b5215700  0 log_channel(cluster) log
[INF] : scrub summary: active [paths:/frames/postO3/hoft]
2020-09-17T14:59:02.535-0500 7f39bca24700  1 mds.ceph4 asok_command:
scrub status {prefix=scrub status} (starting...)
2020-09-17T15:00:12.520-0500 7f39bca24700  1 mds.ceph4 asok_command:
scrub status {prefix=scrub status} (starting...)
2020-09-17T15:02:32.944-0500 7f39b5215700  0 log_channel(cluster) log
[INF] : scrub summary: idle
2020-09-17T15:02:32.945-0500 7f39b5215700  0 log_channel(cluster) log
[INF] : scrub complete with tag '1405e5c7-3ecf-4754-918e-129e9d101f7a'
2020-09-17T15:02:32.945-0500 7f39b5215700  0 log_channel(cluster) log
[INF] : scrub completed for path: /frames/postO3/hoft
2020-09-17T15:02:32.945-0500 7f39b5215700  0 log_channel(cluster) log
[INF] : scrub summary: idle


After the scrub completed, access to the file (ls or rm) continues to
hang.  The MDS reports slow reads:

2020-09-17T15:11:05.654-0500 7f39b9a1e700  0 log_channel(cluster) log
[WRN] : slow request 481.867381 seconds old, received at
2020-09-17T15:03:03.788058-0500: client_request(client.451432:11309
getattr pAsLsXsFs #0x105b1c0 2020-09-17T15:03:03.787602-0500
caller_uid=0, caller_gid=0{}) currently dispatched

Does anyone have any suggestions on how else to clean up from a
permanently lost object?

--Mike

On 9/16/20 2:03 AM, Frank Schilder wrote:
> Sounds similar to this one: https://tracker.ceph.com/issues/46847
>
> If you have or can reconstruct the crush map from before adding the OSDs, you 
> might be able to discover everything with the tem

[ceph-users] Re: multiple OSD crash, unfound objects

2020-09-18 Thread Michael Thomas

Hi Frank,

On 9/18/20 2:50 AM, Frank Schilder wrote:

Dear Michael,

firstly, I'm a bit confused why you started deleting data. The objects were 
unfound, but still there. That's a small issue. Now the data might be gone and 
that's a real issue.


Interval:

Anyone reading this: I have seen many threads where ceph admins started 
deleting objects or PGs or even purging OSDs way too early from a cluster. 
Trying to recover health by deleting data is a contradiction. Ceph has bugs and 
sometimes it needs some help finding everything again. As far as I know, for 
most of these bugs there are workarounds that allow full recovery with a bit of 
work.


I disagree with the statement that trying to recover health by deleting 
data is a contradiction.  In some cases (such as mine), the data in ceph 
is backed up in another location (eg tape library).  Restoring a few 
files from tape is a simple and cheap operation that takes a minute, at 
most.  For the sake of expediency, sometimes it's quicker and easier to 
simply delete the affected files and restore them from the backup system.


This procedure has worked fine with our previous distributed filesystem 
(hdfs), so I (naively?) thought that it could be used with ceph as well. 
I was a bit surprised that ceph's behavior was to indefinitely block 
the 'rm' operation, so that the affected file could not even be removed.


Since I have 25 unfound objects spread across 9 PGs, I used a PG with a 
single unfound object to test this alternate recovery procedure.



First question is, did you delete the entire object or just a shard on one 
disk? Are there OSDs that might still have a copy?


Per the troubleshooting guide 
(https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/), 
I ran:


ceph pg 7.1fb mark_unfound_lost delete

So I presume that the entire object has been deleted.
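If I wanted to double-check whether any OSD still holds a copy, my 
understanding is that ceph-objectstore-tool can list the objects of a PG on a 
stopped OSD; a sketch, with the OSD id and object name as placeholders:

systemctl stop ceph-osd@120
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-120 --pgid 7.1fb --op list | grep OBJECT_NAME
systemctl start ceph-osd@120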


If the object is gone for good, the file references something that doesn't 
exist - its like a bad sector. You probably need to delete the file. Bit 
strange that the operation does not err out with a read error. Maybe it doesn't 
because it waits for the unfound objects state to be resolved?


Even before the object was removed, all read operations on the file 
would hang.  Worse, attempts to stat() the file with commands such 
as 'ls' or 'rm' would hang.  Worse still, attempts to 'ls' in the 
directory itself would hang.  This hasn't changed since removing the object.


*Update*: The stat() operations may not be hanging indefinitely.  They 
seem to hang for somewhere between 10 minutes and 8 hours.



For all the other unfound objects, they are there somewhere - you didn't loose 
a disk or something. Try pushing ceph to scan the correct OSDs, for example, by 
restarting the newly added OSDs one by one or something similar. Sometimes 
exporting and importing a PG from one OSD to another forces a re-scan and 
subsequent discovery of unfound objects. It is also possible that ceph will 
find these objects along the way of recovery or when OSDs scrub or check for 
objects that can be deleted.


I have restarted the new OSDs countless times.  I've used three 
different methods to restart the OSD:


* systemctl restart ceph-osd@120

* init 6

* ceph osd out 120
  ...wait for repeering to finish...
  systemctl restart ceph-osd@120
  ceph osd in 120

I've done this for all OSDs that a PG has listed in the 'not queried' 
state in 'ceph pg $pgid detail'.  But even when all OSDs in the PG are 
back to the 'already probed' state, the missing objects remain.


Over 90% of my PGs have not been deep scrubbed recently, due to the 
amount of backfilling and importing of data into the ceph cluster.  I 
plan to leave the cluster mostly idle over the weekend so that hopefully 
the deep scrubs can catch up and possibly locate any missing objects.


--Mike


Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Michael Thomas 
Sent: 17 September 2020 22:27:47
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] multiple OSD crash, unfound objects

Hi Frank,

Yes, it does sounds similar to your ticket.

I've tried a few things to restore the failed files:

* Locate a missing object with 'ceph pg $pgid list_unfound'

* Convert the hex oid to a decimal inode number

* Identify the affected file with 'find /ceph -inum $inode'

At this point, I know which file is affected by the missing object.  As
expected, attempts to read the file simply hang.  Unexpectedly, attempts
to 'ls' the file or its containing directory also hang.  I presume from
this that the stat() system call needs some information that is
contained in the missing object, and is waiting for the object to become
available.

Next I tried to remove the affected object with:

* ceph pg $pgid mark_unfound_lost delete

Now 'ceph status' shows one fewer missing objects, but attempts to 'ls'
or 'rm' the affected file cont

[ceph-users] RGW multisite replication doesn't start

2020-09-18 Thread Eugen Block

Hi *,

I have 2 virtual one-node-clusters configured for multisite RGW. In  
the beginning the replication actually worked  for some hundred MB or  
so, and then it stopped. In the meantime I wiped both RGWs twice to  
make sure the configuration is right (including wiping all pools  
clean). I don't see any errors in the logs but nothing happens on the  
secondary site. Both clusters are healthy, RGWs run with https.  
Uploading data directly to the secondary site also works, so the  
configuration seems ok to me.


This is the current rgw status:

---snip---
primary:~ # radosgw-admin sync status
  realm c7d5fd30-9c06-46a1-baf4-497f95bf3abc (hamburg)
  zonegroup 68adec15-aace-403d-bd63-f5182a6437b1 (zg-hamburg)
   zone 0fb33fa1-8110-4179-ae45-acf5f5f825c5 (z-primary)
  metadata sync no sync (zone is master)


secondary:~ # radosgw-admin sync status
2020-09-17T09:34:59.593+0200 7fdd3e706a40  1 Cannot find zone  
id=93ece7a6-beef-4f4e-841a-60ba0405f192 (name=z-secondary), switching  
to local zonegroup configuration

  realm c7d5fd30-9c06-46a1-baf4-497f95bf3abc (hamburg)
  zonegroup 68adec15-aace-403d-bd63-f5182a6437b1 (zg-hamburg)
   zone 93ece7a6-beef-4f4e-841a-60ba0405f192 (z-secondary)
  metadata sync syncing
full sync: 64/64 shards
full sync: 3 entries to sync
incremental sync: 0/64 shards
metadata is behind on 64 shards
behind shards:  
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63]

  data sync source: 0fb33fa1-8110-4179-ae45-acf5f5f825c5 (z-primary)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
---snip---


Since the data was not replicated, I ran 'radosgw-admin metadata sync  
run --source-zone=z-primary', but it never finishes. If I do the same  
for data, it shows that all shards are behind on data, but nothing  
happens either.
I also don't understand the 'Cannot find zone  
id=93ece7a6-beef-4f4e-841a-60ba0405f192 (name=z-secondary), switching  
to local zonegroup configuration' message, but it didn't break the  
replication in the first attempt, so I ignored it. Or is this  
something I should fix first (and if yes, how)?
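In case it is relevant, I can also provide the output of the usual 
status/debug commands on the secondary, e.g.:

radosgw-admin sync error list
radosgw-admin period get
radosgw-admin metadata sync status
radosgw-admin data sync status --source-zone=z-primary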


Can anyone point me to what's going on here? I can provide more  
details if necessary, just let me know.


Thank you!
Eugen


[ceph-users] Problem with manual deep-scrubbing PGs on EC pools

2020-09-18 Thread Osiński Piotr
Hi,

We have a little problem with deep-scrubbing of PGs on an EC pool.


[root@mon-1 ~]# ceph health detail
HEALTH_WARN 1 pgs not deep-scrubbed in time
PG_NOT_DEEP_SCRUBBED 1 pgs not deep-scrubbed in time
pg 14.d4 not deep-scrubbed since 2020-09-05 20:26:02.696191

[root@mon-1 ~]# ceph pg deep-scrub 14.d4
instructing pg 14.d4s0 on osd.113 to deep-scrub

[root@mon-1 ~]# grep deep-scrub /var/log/ceph/ceph.log |grep 14.d4


There is nothing in the log about pg 14.d4. I checked that pg 14.d4 belongs to
the pool default.rgw.buckets.data-ec.


[root@mon-1 ~]# ceph osd pool ls detail  |grep
default.rgw.buckets.data-ec
pool 14 'default.rgw.buckets.data-ec' erasure size 8 min_size 6
crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode
warn last_change 74563 flags hashpspool stripe_width 4160 application
rgw

[root@mon-1 ~]# ceph pg ls-by-pool default.rgw.buckets.data-ec |grep
14.d4
14.d4   00 0   0 0   0  0
 0 active+clean   67m 0'0 74562:10673
[113,40,125,80,16,24,95,32]p113 [113,40,125,80,16,24,95,32]p113 2020-
09-12 14:47:40.603264 2020-09-05 20:26:02.696191



When I try to run a manual scrub or deep-scrub on any PG that belongs
to the EC pool, it doesn't work. For PGs in replicated pools, it
works fine.

Is it possible to manually run pg deep-scrub on an EC pool?
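For reference, this is where I would check whether the deep-scrub was at 
least scheduled and whether something is limiting scrubs (assuming a release 
with the centralized config database; otherwise the same options can be read 
via the daemon admin socket):

ceph pg 14.d4 query | grep scrub
ceph config get osd osd_max_scrubs
ceph config get osd osd_scrub_during_recovery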





[ceph-users] disk scheduler for SSD

2020-09-18 Thread George Shuklin

I'm starting to wonder (again) which scheduler is better for Ceph on SSDs.

My reasoning.

None:

1. Reduces request latency. The lower the latency, the higher the 
perceived performance for an unbounded workload with a fixed queue depth 
(hello, benchmarks).
2. Can cause latency spikes for requests because of the 
'unfair' request ordering (hello, deep scrub).


Deadline-mq:

1. Reduces nr_requests (the queue size) to 256 (noop shows me 
916???). This may introduce latency.
2. May reduce latency spikes thanks to different rates for different types 
of workloads.


I'm doing some benchmarks, and they, of course, give higher marks 
to the 'none' scheduler. Nevertheless, I believe most normal workloads on 
Ceph do not drive the disks at an unbounded rate, so a bounded workload 
(e.g. an app issuing IO based on external, independent events) can be hurt by 
the lack of a disk scheduler in the presence of an unbounded workload.
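For reference, the knobs I am flipping between runs (sdX is a placeholder):

cat /sys/block/sdX/queue/scheduler
echo mq-deadline > /sys/block/sdX/queue/scheduler   # or 'none'
cat /sys/block/sdX/queue/nr_requests
echo 256 > /sys/block/sdX/queue/nr_requests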


Any ideas?



[ceph-users] September Ceph Science User Group Virtual Meeting

2020-09-18 Thread Kevin Hrpcek

Hey all,

We will be having a Ceph science/research/big cluster call on Wednesday 
September 23rd. If anyone wants to discuss something specific they can 
add it to the pad linked below. If you have questions or comments you 
can contact me.


This is an informal open call of community members mostly from 
hpc/htc/research environments where we discuss whatever is on our minds 
regarding ceph. Updates, outages, features, maintenance, etc...there is 
no set presenter but I do attempt to keep the conversation lively.


https://pad.ceph.com/p/Ceph_Science_User_Group_20200923

We try to keep it to an hour or less.

Ceph calendar event details:

September 23, 2020
14:00 UTC
4pm Central European
9am Central US

Description: Main pad for discussions: 
https://pad.ceph.com/p/Ceph_Science_User_Group_Index

Meetings will be recorded and posted to the Ceph Youtube channel.
To join the meeting on a computer or mobile phone: 
https://bluejeans.com/908675367?src=calendarLink

To join from a Red Hat Deskphone or Softphone, dial: 84336.
Connecting directly from a room system?
    1.) Dial: 199.48.152.152 or bjn.vc
    2.) Enter Meeting ID: 908675367
Just want to dial in on your phone?
    1.) Dial one of the following numbers: 408-915-6466 (US)
    See all numbers: https://www.redhat.com/en/conference-numbers
    2.) Enter Meeting ID: 908675367
    3.) Press #
Want to test your video connection? https://bluejeans.com/111


Kevin

--
Kevin Hrpcek
NASA VIIRS Atmosphere SIPS
Space Science & Engineering Center
University of Wisconsin-Madison



[ceph-users] Re: virtual machines crashes after upgrade to octopus

2020-09-18 Thread Lomayani S. Laizer
Hello Jason,

I can confirm this release fixes the crashes; there has not been a single crash for
the past 4 days.



On Mon, Sep 14, 2020 at 2:55 PM Jason Dillaman  wrote:

> On Mon, Sep 14, 2020 at 5:13 AM Lomayani S. Laizer 
> wrote:
> >
> > Hello,
> > Last week i got time to try debug crashes of these vms
> >
> > Below  log  includes rados debug which i left last time
> >
> > https://storage.habari.co.tz/index.php/s/AQEJ7tQS7epC4Zn
> >
> > I have observed the following with these settings in openstack and ceph
> >
> > disk_cachemodes="network=writeback" is set in openstack environment
> > and in ceph  rbd_cache_policy = writearound is set--- crashes occur
> >
> > disk_cachemodes="network=writeback" is set in openstack environment
> > and in ceph  rbd_cache_policy = writeback is set---  no crashes
> >
> > disk_cachemodes="none" is set in openstack environment and in ceph
> > rbd_cache_policy = writearound is set  no crashes
> >
> > disk_cachemodes="none" is set in openstack environment and in ceph
> > rbd_cache_policy = writeback is set --- crashes occur
> >
> > Is disk_cachemodes="network=writeback" is no longer recommended in
> > octopus because i see it is left out in new documentation for octopus?
> >
> > https://ceph.readthedocs.io/en/latest/rbd/rbd-openstack/
>
> Can you try the latest development release of Octopus [1]? A librbd
> crash fix has been sitting in that branch for about a month now to be
> included in the next point release.
>
> >
> >
> > >* > Hello,*
> > >* >*
> > >* > Below is full debug log of 2 minutes before crash of virtual
> machine.*
> > >* Download from below url*
> > >* >*
> > >* > https://storage.habari.co.tz/index.php/s/31eCwZbOoRTMpcU <
> https://storage.habari.co.tz/index.php/s/31eCwZbOoRTMpcU>*
> > >
> > >* This log has rbd debug output, but not rados :(*
> > >
> > >* I guess you'll need to try and capture a coredump if you can't get a*
> > >* backtrace.*
> > >
> > >* I'd also suggest opening a tracker in case one of the rbd devs has
> any*
> > >* ideas on this, or has seen something similar. Without a backtrace or*
> > >* core it will be impossible to definitively identify the issue though.*
> > >
> >
> > +1 to needing the backtrace. I don't see any indications of a problem in
> > that log.
> >
> >
> > >* >*
> > >* >*
> > >* > apport.log*
> > >* >*
> > >* > Wed May 13 09:35:30 2020: host pid 4440 crashed in a separate mount*
> > >* namespace, ignoring*
> > >* >*
> > >* > kernel.log*
> > >* > May 13 09:35:30 compute5 kernel: [123071.373217]
> fn-radosclient[4485]:*
> > >* segfault at 0 ip 7f4c8c85d7ed sp 7f4c66ffc470 error 4 in*
> > >* librbd.so.1.12.0[7f4c8c65a000+5cb000]*
> > >* > May 13 09:35:30 compute5 kernel: [123071.373228] Code: 8d 44 24 08
> 48 81*
> > >* c3 d8 3e 00 00 49 21 f9 48 c1 e8 30 83 c0 01 48 c1 e0 30 48 89 02 48
> 8b 03*
> > >* 48 89 04 24 48 8b 34 24 48 21 fe <48> 8b 06 48 89 44 24 08 48 8b 44
> 24 08*
> > >* 48 8b 0b 48 21 f8 48 39 0c*
> > >* > May 13 09:35:33 compute5 kernel: [123074.832700] brqa72d845b-e9:
> port*
> > >* 1(tap33511c4d-2c) entered disabled state*
> > >* > May 13 09:35:33 compute5 kernel: [123074.838520] device
> tap33511c4d-2c*
> > >* left promiscuous mode*
> > >* > May 13 09:35:33 compute5 kernel: [123074.838527] brqa72d845b-e9:
> port*
> > >* 1(tap33511c4d-2c) entered disabled state*
> > >* >*
> > >* > syslog*
> > >* > compute5 kernel: [123071.373217] fn-radosclient[4485]: segfault at
> 0 ip*
> > >* 7f4c8c85d7ed sp 7f4c66ffc470 error 4 i*
> > >* > n librbd.so.1.12.0[7f4c8c65a000+5cb000]*
> > >* > May 13 09:35:30 compute5 kernel: [123071.373228] Code: 8d 44 24 08
> 48 81*
> > >* c3 d8 3e 00 00 49 21 f9 48 c1 e8 30 83 c0 01 48 c1 e0 30 48 8*
> > >* > 9 02 48 8b 03 48 89 04 24 48 8b 34 24 48 21 fe <48> 8b 06 48 89 44
> 24 08*
> > >* 48 8b 44 24 08 48 8b 0b 48 21 f8 48 39 0c*
> > >* > May 13 09:35:30 compute5 libvirtd[1844]: internal error: End of
> file*
> > >* from qemu monitor*
> > >* > May 13 09:35:33 compute5 systemd-networkd[1326]: tap33511c4d-2c:
> Link*
> > >* DOWN*
> > >* > May 13 09:35:33 compute5 systemd-networkd[1326]: tap33511c4d-2c:
> Lost*
> > >* carrier*
> > >* > May 13 09:35:33 compute5 kernel: [123074.832700] brqa72d845b-e9:
> port*
> > >* 1(tap33511c4d-2c) entered disabled state*
> > >* > May 13 09:35:33 compute5 kernel: [123074.838520] device
> tap33511c4d-2c*
> > >* left promiscuous mode*
> > >* > May 13 09:35:33 compute5 kernel: [123074.838527] brqa72d845b-e9:
> port*
> > >* 1(tap33511c4d-2c) entered disabled state*
> > >* > May 13 09:35:33 compute5 networkd-dispatcher[1614]: Failed to
> request*
> > >* link: No such device*
> > >* >*
> > >* > On Fri, May 8, 2020 at 5:40 AM Brad Hubbard 
> wrote:*
> > >* >>*
> > >* >> On Fri, May 8, 2020 at 12:10 PM Lomayani S. Laizer
> *
> > >* wrote:*
> > >* >> >*
> > >* >> > Hello,*
> > >* >> > On my side at point of vm crash these are logs below. At the
> moment*
> > >* my debug is at 10 value. I will rise to 20 for full debug. these
> crashes*
> > >* are random and so 

[ceph-users] Re: Benchmark WAL/DB on SSD and HDD for RGW RBD CephFS

2020-09-18 Thread George Shuklin

On 17/09/2020 17:37, Mark Nelson wrote:
Does fio handle S3 objects spread across many buckets well? I think 
bucket listing performance was maybe missing too, but it's been a 
while since I looked at fio's S3 support.  Maybe they have those use 
cases covered now.  I wrote a Go-based benchmark called hsbench, based 
on the wasabi-tech benchmark, a while back that tries to cover some of 
those cases, but I haven't touched it in a while:



https://github.com/markhpc/hsbench


The way to spread load across many buckets is to use a 'farm' of servers under 
one client's management. You just give each server a different bucket to 
torture inside the jobfile. The iodepth=1 restriction of the http ioengine 
actually encourages this; a sketch is below.
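A minimal sketch of what I mean, using fio's http engine in S3 mode with one 
job per bucket (endpoint, keys and bucket names are placeholders):

[global]
ioengine=http
http_mode=s3
http_host=rgw.example.com
http_s3_keyid=ACCESSKEY
http_s3_key=SECRETKEY
https=off
rw=randwrite
bs=64k
size=256m
iodepth=1

[bucket1]
filename=/bucket1/object1

[bucket2]
filename=/bucket2/object1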





FWIW fio can be used for cephfs as well and it works reasonably well 
if you give it a long enough run time and only expect hero run 
scenarios from it.  For metadata intensive workloads you'll need to 
use mdtest or smallfile.  At this point I mostly just use the io500 
suite that includes both ior for hero runs and mdtest for metadata 
(but you need mpi to coordinate it across multiple nodes).
Yep, I was talking about metadata-intensive workloads. Romping around within a 
file or two is not a true fs-specific benchmark.



[ceph-users] It is an advantage to use assignment help in Australia

2020-09-18 Thread john seena
Do you want to connect with Australian assignment helpers to feel the low 
stress of submission? Are you seeking the best source of help for composing 
your academic documents? In this context, use Assignment Help Services and get 
completed papers without any delay. As I can understand that, whenever you have 
issues in writing your assignments, it gets hard for you to write flawless 
papers. When you are busy with lots of activities, you can’t focus on 
assignment writing and collecting requisite information for it. So, if you 
require a reliable way of finishing your homework, choosing the assistance of 
professionals is an advantage. You can manage your effort and save your time 
using the services of online academic writing. I can say so because I already 
used these services when I was studying Australia. You don’t have to think a 
lot because using online assignment help services is the best choice for 
completing your work without any issue. Don’t face any delay if you know about 
online academic writing service in Australia.
https://www.greatassignmenthelp.com/au/


[ceph-users] Re: Benchmark WAL/DB on SSD and HDD for RGW RBD CephFS

2020-09-18 Thread Maged Mokhtar



dm-writecache works using high and low watermarks, set at 50% and 45%. 
All writes land in the cache; once the cache fills up to the high watermark, 
backfilling to the slow device starts, and it stops when the low 
watermark is reached. Backfilling uses a b-tree with LRU blocks and tries to 
merge blocks to reduce hdd seeks; this is further helped by the io scheduler 
(cfq/deadline) ordering.
Each sync write op to the device requires 2 sync write ops on the SSD, one for 
data and one for metadata. Metadata is always kept in RAM, so there is no 
additional metadata read op (at the expense of using 2.5% of your cache 
partition size in RAM). So pure sync writes (those with REQ_FUA or 
REQ_FLUSH, which Ceph uses) get half the SSD iops performance at 
the device level.


Now to the question of what sustained performance you would get during 
backfilling: it totally depends on whether your workload is sequential 
or random. For purely sequential workloads, all blocks are merged, so there 
will not be a drop in input iops and backfilling occurs in small, step-like 
intervals; but for such workloads you would get good performance 
even without a cache. For purely random writes, theoretically you should 
drop to the hdd random iops speed (i.e. 80-150 iops), but in our testing 
with fio pure random writes we got 400-450 sustained iops; this is 
probably related to the non-randomness of fio rather than any magic. 
For real-life workloads that are a mix of both, this is where the real 
benefit of the cache is felt. However, it is not easy to simulate 
such workloads: fio does offer a zipf/theta random distribution control, 
but it was difficult for us to simulate real-life workloads with it, so we 
ran some manual workloads such as installing and copying multiple VMs, 
and we found the cache improved the completion time by a factor of 3-4.
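(The control we mean is fio's random_distribution option; an illustrative 
command line, with the target device as a placeholder and the usual warning 
that it overwrites data:

fio --name=skewed --filename=/dev/rbd0 --ioengine=libaio --direct=1 \
    --rw=randwrite --bs=4k --iodepth=16 --random_distribution=zipf:1.1 \
    --runtime=300 --time_based
)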


dm-writecache does serve reads if the data is in the cache; however, the OSD 
cache helps for reads, as does any client read-ahead, and in general writes 
are the performance issue with hdds in Ceph.


For bcache, the only configuration we did was to enable writeback mode; 
we did not set the block size to 4k.


If you want to try dm-writecache, use a recent 5.4+ kernel, or a kernel 
with the REQ_FUA support patch we did. You will need a recent LVM tools 
package to support dm-writecache. We also limit the number of backfill 
blocks in flight to 100k blocks, i.e. 400 MB. A rough setup sketch follows.
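A sketch of the LVM side, with VG/LV names and sizes as placeholders 
(the ssd PV must be in the same VG as the slow LV):

lvcreate -n osd1 -l 95%FREE vg_hdd /dev/sdb
lvcreate -n osd1_cache -L 100G vg_hdd /dev/nvme0n1p1
lvconvert --type writecache --cachevol osd1_cache vg_hdd/osd1
# tune watermarks and the number of writeback blocks in flight
lvchange --cachesettings 'high_watermark=50 low_watermark=45 writeback_jobs=102400' vg_hdd/osd1
# then hand vg_hdd/osd1 to ceph-volume as the OSD data device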


/Maged

On 18/09/2020 13:38, vita...@yourcmc.ru wrote:

we did test dm-cache, bcache and dm-writecache, we found the later to be
much better.

Did you set bcache block size to 4096 during your tests? Without this setting 
it's slow because 99.9% SSDs don't handle 512 byte overwrites well. Otherwise I 
don't think bcache should be worse than dm-writecache. Also dm-writecache only 
caches writes, and bcache also caches reads. And lvmcache is trash because it 
only writes to SSD when the block is already on the SSD.

Please post some details about the comparison if you have them :)



[ceph-users] Ceph RDMA GID Selection Problem

2020-09-18 Thread Lazuardi Nasution
Hi,

I have run into something weird with GID selection for Ceph with RDMA. When I do
the configuration with ms_async_rdma_device_name and ms_async_rdma_gid_idx,
Ceph with RDMA runs successfully. But when I do the configuration with
ms_async_rdma_device_name, ms_async_rdma_local_gid and
ms_async_rdma_roce_ver, Ceph with RDMA is not working: OSDs go down
seconds after they come up. The GID index used in the first attempt is
associated with the same GID and RoCE version used in the second
attempt. Is this a string-matching issue (maybe because the GID contains colon
characters) or something else? Using the GID index sometimes gives me
problems because it does not persist and changes every time I do a
network reconfiguration (for example, adding/removing a VLAN), or even a
reboot. The two variants are shown below for clarity.
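For illustration, the two ceph.conf variants look like this (the device name 
and GID below are placeholders, not my real values):

# variant 1: works
ms_async_rdma_device_name = mlx5_0
ms_async_rdma_gid_idx = 3

# variant 2: OSDs go down seconds after starting
ms_async_rdma_device_name = mlx5_0
ms_async_rdma_local_gid = 0000:0000:0000:0000:0000:ffff:0a00:010a
ms_async_rdma_roce_ver = 2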

Best regards,


[ceph-users] Re: multiple OSD crash, unfound objects

2020-09-18 Thread Frank Schilder
Dear Michael,

> I disagree with the statement that trying to recover health by deleting
> data is a contradiction.  In some cases (such as mine), the data in ceph
> is backed up in another location (eg tape library).  Restoring a few
> files from tape is a simple and cheap operation that takes a minute, at
> most.

I would agree with that if the data had been deleted using the appropriate 
high-level operation. Deleting an unfound object is like marking a sector on a 
disk as bad with smartctl. How should the file system react to that? Purging an 
OSD is like removing a disk from a raid set. Such operations increase 
inconsistencies/degradation rather than resolving them. Cleaning this up also 
requires executing other operations to remove all references to the object 
and, finally, the file inode itself.

The ls on a dir with corrupted file(s) hangs if ls calls stat on every file. 
For example, when coloring is enabled, ls will stat every file in the dir to be 
able to choose the color according to permissions. If one then disables 
coloring, a plain "ls" will return all names while an "ls -l" will hang due to 
stat calls.

An "rm" or "rm -f" should succeed if the folder permissions allow it. It 
should not stat the file itself, so it sounds a bit odd that it's hanging. I 
guess in some situations it does, like "rm -i", which will ask before removing 
read-only files. How does "unlink FILE" behave?

Most admin commands in ceph are asynchronous. A command like "pg repair" or 
"osd scrub" only schedules an operation. The command "ceph pg 7.1fb 
mark_unfound_lost delete" probably does just the same. Unfortunately, I don't 
know how to check whether a scheduled operation has 
started/completed/succeeded/failed. I asked this in an earlier thread (about PG 
repair) and didn't get an answer. On our cluster, the actual repair happened 
ca. 6-12 hours after scheduling (on a healthy cluster!). I would conclude that 
(some of) these operations have very low priority and will not start at least 
as long as there is recovery going on. One might want to consider the 
possibility that some of the scheduled commands have not been executed yet.

The output of "pg query" contains the IDs of the missing objects (in mimic) and 
each of these objects is on one of the peer OSDs of the PG (I think object here 
refers to shard or copy). It should be possible to find the corresponding OSD 
(or at least obtain confirmation that the object is really gone) and move the 
object to a place where it is expected to be found. This can probably be 
achieved with "PG export" and "PG import". I don't know of any other way(s).
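The export/import I have in mind is done with ceph-objectstore-tool on 
stopped OSDs, roughly like below (treat it as a sketch; OSD ids are 
placeholders, and the import will refuse to run if the target OSD already 
holds that PG):

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NN --pgid 7.1fb --op export --file /tmp/7.1fb.export
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-MM --pgid 7.1fb --op import --file /tmp/7.1fb.export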

I guess, in the current situation, sitting it out a bit longer might be a good 
strategy. I don't know how many asynchronous commands you executed and giving 
the cluster time to complete these jobs might improve the situation.

Sorry that I can't be of more help here. However, if you figure out a solution 
(ideally non-destructive), please post it here.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Michael Thomas 
Sent: 18 September 2020 14:15:53
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] multiple OSD crash, unfound objects

Hi Frank,

On 9/18/20 2:50 AM, Frank Schilder wrote:
> Dear Michael,
>
> firstly, I'm a bit confused why you started deleting data. The objects were 
> unfound, but still there. That's a small issue. Now the data might be gone 
> and that's a real issue.
>
> 
> Interval:
>
> Anyone reading this: I have seen many threads where ceph admins started 
> deleting objects or PGs or even purging OSDs way too early from a cluster. 
> Trying to recover health by deleting data is a contradiction. Ceph has bugs 
> and sometimes it needs some help finding everything again. As far as I know, 
> for most of these bugs there are workarounds that allow full recovery with a 
> bit of work.

I disagree with the statement that trying to recover health by deleting
data is a contradiction.  In some cases (such as mine), the data in ceph
is backed up in another location (eg tape library).  Restoring a few
files from tape is a simple and cheap operation that takes a minute, at
most.  For the sake of expediency, sometimes it's quicker and easier to
simply delete the affected files and restore them from the backup system.

This procedure has worked fine with our previous distributed filesystem
(hdfs), so I (naively?) thought that it could be used with ceph as well.
  I was a bit surprised that cephs behavior was to indefinitely block
the 'rm' operation so that the affected file could not even be removed.

Since I have 25 unfound objects spread across 9 PGs, I used a PG with a
single unfound object to test this alternate recovery procedure.

> First question is, did you delete the entire object or just a shard on one 
> disk? Are there OSDs that might still have a copy?

Per the troubleshootin

[ceph-users] Re: Using cephadm shell/ceph-volume

2020-09-18 Thread Eugen Block

Use the --block.db option without the device path, i.e. only {VG}/{LV}:

ceph-volume lvm prepare --data /dev/sda
--block.db vg/sda.db --dmcrypt

Zitat von t...@postix.net:


Hi all,

I'm having problem creating an osd using ceph-volume (by the way of  
cephadm). This is on an octopus installation with cephadm. So I use  
"cephadm shell" and then "ceph-volume" but got the following error:


root@furry:/var/lib# ceph-volume lvm prepare --data /dev/sda  
--block.db /dev/vg/sda.db --dmcrypt

Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name  
client.bootstrap-osd --keyring  
/var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new  
29d95564-733d-4f2a-a2c8-1bb9ceb5a14b

 stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
--> RuntimeError: Unable to create a new OSD id
if I pass the cluster ID, and use the correct key ring (using the  
client.admin keyring), I got a bit further
root@furry:/var/lib# ceph-volume --cluster  
c258000c-f3e4-11ea-9ebe-c3c75e8e9028 lvm prepare --data /dev/sda  
--block.db /dev/vg/sda.db --dmcrypt

Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster  
c258000c-f3e4-11ea-9ebe-c3c75e8e9028 --name client.bootstrap-osd  
--keyring  
/var/lib/ceph/bootstrap-osd/c258000c-f3e4-11ea-9ebe-c3c75e8e9028.keyring -i  
- osd new aa13c362-c9cf-4d03-9a86-d6118fbc312c
 stderr: Error initializing cluster client: ObjectNotFound('RADOS  
object not found (error calling conf_read_file)',)

--> RuntimeError: Unable to create a new OSD id

Any idea on how to get pass these errors? Thanks.

--Tri Hoang


[ceph-users] Re: Benchmark WAL/DB on SSD and HDD for RGW RBD CephFS

2020-09-18 Thread Daniel Poelzleithner
On 2020-09-17 19:21, vita...@yourcmc.ru wrote:
> It does, RGW really needs SSDs for bucket indexes. CephFS also needs SSDs for 
> metadata in any setup that's used by more than 1 user :).

Nah. I crashed my first cephfs with my music library, a 2 TB git annex
repo, just me alone (slow ops on mds).

Creating a cephfs on a non-SSD/NVMe metadata pool should require an
--i-really-want-this flag :)


poelzi


[ceph-users] RuntimeError: Unable check if OSD id exists

2020-09-18 Thread Marc Roos


I still have ceph-disk-created OSDs on Nautilus. I thought about using 
ceph-volume, but it looks like this manual for replacing ceph-disk [1] 
is not complete. I'm already getting this error:

RuntimeError: Unable check if OSD id exists: 

[1]
https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#rados-replacing-an-osd


[ceph-users] Re: multiple OSD crash, unfound objects

2020-09-18 Thread Frank Schilder
Dear Michael,

maybe there is a way to restore access for users and solve the issues later. 
Someone else with a lost/unfound object was able to move the affected file (or 
directory containing the file) to a separate location and restore the now 
missing data from backup. This will "park" the problem of cluster health for 
later fixing.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: 18 September 2020 15:38:51
To: Michael Thomas; ceph-users@ceph.io
Subject: [ceph-users] Re: multiple OSD crash, unfound objects

Dear Michael,

> I disagree with the statement that trying to recover health by deleting
> data is a contradiction.  In some cases (such as mine), the data in ceph
> is backed up in another location (eg tape library).  Restoring a few
> files from tape is a simple and cheap operation that takes a minute, at
> most.

I would agree with that if the data was deleted using the appropriate 
high-level operation. Deleting an unfound object is like marking a sector on a 
disk as bad with smartctl. How should the file system react to that? Purging an 
OSD is like removing a disk from a raid set. Such operations increase 
inconsistencies/degradation rather than resolving them. Cleaning this up also 
requires to execute other operations to remove all references to the object 
and, finally, the file inode itself.

The ls on a dir with corrupted file(s) hangs if ls calls stat on every file. 
For example, when coloring is enabled, ls will stat every file in the dir to be 
able to choose the color according to permissions. If one then disables 
coloring, a plain "ls" will return all names while an "ls -l" will hang due to 
stat calls.

An "rm" or "rm -f" should succeed if the folder permissions allow that. It 
should not stat the file itself, so it sounds a bit odd that its hanging. I 
guess in some situations it does, like "rm -i", which will ask before removing 
read-only files. How does "unlink FILE" behave?

Most admin commands on ceph are asynchronous. A command like "pg repair" or 
"osd scrub" only schedules an operation. The command "ceph pg 7.1fb 
mark_unfound_lost delete" does probably just the same. Unfortunately, I don't 
know how to check that a scheduled operation has 
started/completed/succeeded/failed. I asked this in an earlier thread (about PG 
repair) and didn't get an answer. On our cluster, the actual repair happened 
ca. 6-12 hours after scheduling (on a healthy cluster!). I would conclude that 
(some of) these operations have very low priority and will not start at least 
as long as there is recovery going on. One might want to consider the 
possibility that some of the scheduled commands have not been executed yet.

The output of "pg query" contains the IDs of the missing objects (in mimic) and 
each of these objects is on one of the peer OSDs of the PG (I think object here 
refers to shard or copy). It should be possible to find the corresponding OSD 
(or at least obtain confirmation that the object is really gone) and move the 
object to a place where it is expected to be found. This can probably be 
achieved with "PG export" and "PG import". I don't know of any other way(s).

I guess, in the current situation, sitting it out a bit longer might be a good 
strategy. I don't know how many asynchronous commands you executed and giving 
the cluster time to complete these jobs might improve the situation.

Sorry that I can't be of more help here. However, if you figure out a solution 
(ideally non-destructive), please post it here.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Michael Thomas 
Sent: 18 September 2020 14:15:53
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] multiple OSD crash, unfound objects

Hi Frank,

On 9/18/20 2:50 AM, Frank Schilder wrote:
> Dear Michael,
>
> firstly, I'm a bit confused why you started deleting data. The objects were 
> unfound, but still there. That's a small issue. Now the data might be gone 
> and that's a real issue.
>
> 
> Interval:
>
> Anyone reading this: I have seen many threads where ceph admins started 
> deleting objects or PGs or even purging OSDs way too early from a cluster. 
> Trying to recover health by deleting data is a contradiction. Ceph has bugs 
> and sometimes it needs some help finding everything again. As far as I know, 
> for most of these bugs there are workarounds that allow full recovery with a 
> bit of work.

I disagree with the statement that trying to recover health by deleting
data is a contradiction.  In some cases (such as mine), the data in ceph
is backed up in another location (eg tape library).  Restoring a few
files from tape is a simple and cheap operation that takes a minute, at
most.  For the sake of expediency, sometimes it's quicker and easier to
simply delete the affected files

[ceph-users] Using cephadm shell/ceph-volume

2020-09-18 Thread tri
Hi all,

I'm having a problem creating an OSD using ceph-volume (by way of cephadm). 
This is on an Octopus installation with cephadm. So I use "cephadm shell" and 
then "ceph-volume", but I get the following error:

root@furry:/var/lib# ceph-volume lvm prepare --data /dev/sda --block.db 
/dev/vg/sda.db --dmcrypt
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd 
--keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 
29d95564-733d-4f2a-a2c8-1bb9ceb5a14b
 stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
--> RuntimeError: Unable to create a new OSD id
If I pass the cluster ID and use the correct keyring (the client.admin 
keyring), I get a bit further:
root@furry:/var/lib# ceph-volume --cluster c258000c-f3e4-11ea-9ebe-c3c75e8e9028 
lvm prepare --data /dev/sda --block.db /dev/vg/sda.db --dmcrypt
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster c258000c-f3e4-11ea-9ebe-c3c75e8e9028 
--name client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/c258000c-f3e4-11ea-9ebe-c3c75e8e9028.keyring -i - 
osd new aa13c362-c9cf-4d03-9a86-d6118fbc312c
 stderr: Error initializing cluster client: ObjectNotFound('RADOS object not 
found (error calling conf_read_file)',)
--> RuntimeError: Unable to create a new OSD id

Any idea on how to get past these errors? Thanks.

--Tri Hoang


[ceph-users] Re: Process for adding a separate block.db to an osd

2020-09-18 Thread Eugen Block
Don’t forget to change the lv tags and make sure ceph-bluestore-tool  
show-label has the right labels. This has been discussed multiple  
times [1].



[1]  
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/GSFUUIMYDPSFM2HHO25TCTPLTXBS3O2K/
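Roughly, and treating the names as placeholders (the block LV, the db LV and 
the tag values come from your own setup; see the thread above for details):

ceph-bluestore-tool show-label --dev /dev/vg/sdc.db
lvs -o lv_tags vg/osd-block-lv
# remove any stale tag first, if present
lvchange --deltag ceph.db_device=OLD_VALUE vg/osd-block-lv
lvchange --addtag ceph.db_device=/dev/vg/sdc.db vg/osd-block-lv
lvchange --addtag ceph.db_uuid=UUID_OF_THE_NEW_DB_LV vg/osd-block-lv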


Zitat von t...@postix.net:


Hey all,

I'm trying to figure out the appropriate process for adding a  
separate SSD block.db to an existing OSD. From what I gather the two  
steps are:


 1. Use ceph-bluestore-tool bluefs-bdev-new-db to add the new db device
 2. Migrate the data ceph-bluestore-tool bluefs-bdev-migrate

I followed this and got both executed fine without any error. Yet  
when the OSD got started up, it keeps on using the integrated  
block.db instead of the new db. The block.db link to the new db  
device was deleted. Again, no error, just not using the new db


Any suggestion? Thanks.

--Tri Hoang

root@elmo:/#CEPH_ARGS="--bluestore_block_db_size=26843545600  
--bluestore_block_db_create=true" ceph-bluestore-tool --path  
/mnt/ceph/c258000c-f3e4-11ea-9ebe-c3c75e8e9028/osd.2  
bluefs-bdev-new-db --dev-target /dev/vg/sdc.db

inferring bluefs devices from bluestore path
DB device added /dev/dm-8

root@elmo:/# ceph-bluestore-tool --path  
/mnt/ceph/c258000c-f3e4-11ea-9ebe-c3c75e8e9028/osd.2 --devs-source  
/mnt/ceph/c258000c-f3e4-11ea-9ebe-c3c75e8e9028/osd.2/block  
--dev-target  
/mnt/ceph/c258000c-f3e4-11ea-9ebe-c3c75e8e9028/osd.2/block.db  
bluefs-bdev-migrate

inferring bluefs devices from bluestore path


[ceph-users] Process for adding a separate block.db to an osd

2020-09-18 Thread tri
Hey all,

I'm trying to figure out the appropriate process for adding a separate SSD 
block.db to an existing OSD. From what I gather the two steps are:

 1. Use ceph-bluestore-tool bluefs-bdev-new-db to add the new db device
 2. Migrate the data ceph-bluestore-tool bluefs-bdev-migrate

I followed this and both steps executed fine without any error. Yet when the OSD 
started up, it kept on using the integrated block.db instead of the new 
db. The block.db link to the new db device was deleted. Again, no error, just 
not using the new db.

Any suggestion? Thanks.

--Tri Hoang

root@elmo:/#CEPH_ARGS="--bluestore_block_db_size=26843545600 
--bluestore_block_db_create=true" ceph-bluestore-tool --path 
/mnt/ceph/c258000c-f3e4-11ea-9ebe-c3c75e8e9028/osd.2 bluefs-bdev-new-db 
--dev-target /dev/vg/sdc.db
inferring bluefs devices from bluestore path
DB device added /dev/dm-8

root@elmo:/# ceph-bluestore-tool --path 
/mnt/ceph/c258000c-f3e4-11ea-9ebe-c3c75e8e9028/osd.2 --devs-source 
/mnt/ceph/c258000c-f3e4-11ea-9ebe-c3c75e8e9028/osd.2/block --dev-target 
/mnt/ceph/c258000c-f3e4-11ea-9ebe-c3c75e8e9028/osd.2/block.db 
bluefs-bdev-migrate
inferring bluefs devices from bluestore path