[ceph-users] virtual machines crashes after upgrade to octopus

2020-09-14 Thread Lomayani S. Laizer
Hello,
Last week I got time to try to debug the crashes of these VMs.

The log below includes the rados debug output that was missing last time:

https://storage.habari.co.tz/index.php/s/AQEJ7tQS7epC4Zn

I have observed the following with these settings in OpenStack and Ceph:

disk_cachemodes="network=writeback" set in the OpenStack environment
and rbd_cache_policy = writearound set in Ceph --- crashes occur

disk_cachemodes="network=writeback" set in the OpenStack environment
and rbd_cache_policy = writeback set in Ceph --- no crashes

disk_cachemodes="none" set in the OpenStack environment
and rbd_cache_policy = writearound set in Ceph --- no crashes

disk_cachemodes="none" set in the OpenStack environment
and rbd_cache_policy = writeback set in Ceph --- crashes occur

Is disk_cachemodes="network=writeback" no longer recommended in
Octopus? I ask because I see it is left out of the new documentation for Octopus:

https://ceph.readthedocs.io/en/latest/rbd/rbd-openstack/
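
For reference, the combinations above map to settings along these lines (a
sketch only; the exact file locations depend on the deployment):

    # nova.conf on the compute nodes, [libvirt] section
    disk_cachemodes = "network=writeback"

    # ceph.conf as seen by the qemu/librbd client, [client] section
    rbd_cache = true
    rbd_cache_policy = writearound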














> > Hello,
> >
> > Below is full debug log of 2 minutes before crash of virtual machine.
> Download from below url
> >
> > https://storage.habari.co.tz/index.php/s/31eCwZbOoRTMpcU
>
>
> This log has rbd debug output, but not rados :(
>
> I guess you'll need to try and capture a coredump if you can't get a
> backtrace.
>
> I'd also suggest opening a tracker in case one of the rbd devs has any
> ideas on this, or has seen something similar. Without a backtrace or
> core it will be impossible to definitively identify the issue though.
>

+1 to needing the backtrace. I don't see any indications of a problem in
that log.
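
For what it's worth, one rough way to get a core from the qemu process on an
Ubuntu compute node is along these lines (a sketch only; apport normally
intercepts cores, as the apport.log entry below shows, so it may need to be
stopped first, and <qemu-pid> is a placeholder):

    systemctl stop apport                 # stop apport from swallowing the core
    sysctl -w kernel.core_pattern=/var/crash/core.%e.%p
    prlimit --pid <qemu-pid> --core=unlimited:unlimited   # repeat per qemu process

    # after the next crash
    gdb /usr/bin/qemu-system-x86_64 /var/crash/core.<name>.<pid>
    (gdb) thread apply all bt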


> >
> >
> > apport.log
> >
> > Wed May 13 09:35:30 2020: host pid 4440 crashed in a separate mount
> namespace, ignoring
> >
> > kernel.log
> > May 13 09:35:30 compute5 kernel: [123071.373217] fn-radosclient[4485]:
> segfault at 0 ip 7f4c8c85d7ed sp 7f4c66ffc470 error 4 in
> librbd.so.1.12.0[7f4c8c65a000+5cb000]
> > May 13 09:35:30 compute5 kernel: [123071.373228] Code: 8d 44 24 08 48 81
> c3 d8 3e 00 00 49 21 f9 48 c1 e8 30 83 c0 01 48 c1 e0 30 48 89 02 48 8b 03
> 48 89 04 24 48 8b 34 24 48 21 fe <48> 8b 06 48 89 44 24 08 48 8b 44 24 08
> 48 8b 0b 48 21 f8 48 39 0c
> > May 13 09:35:33 compute5 kernel: [123074.832700] brqa72d845b-e9: port
> 1(tap33511c4d-2c) entered disabled state
> > May 13 09:35:33 compute5 kernel: [123074.838520] device tap33511c4d-2c
> left promiscuous mode
> > May 13 09:35:33 compute5 kernel: [123074.838527] brqa72d845b-e9: port
> 1(tap33511c4d-2c) entered disabled state
> >
> > syslog
> > compute5 kernel: [123071.373217] fn-radosclient[4485]: segfault at 0 ip
> 7f4c8c85d7ed sp 7f4c66ffc470 error 4 i
> > n librbd.so.1.12.0[7f4c8c65a000+5cb000]
> > May 13 09:35:30 compute5 kernel: [123071.373228] Code: 8d 44 24 08 48 81
> c3 d8 3e 00 00 49 21 f9 48 c1 e8 30 83 c0 01 48 c1 e0 30 48 8
> > 9 02 48 8b 03 48 89 04 24 48 8b 34 24 48 21 fe <48> 8b 06 48 89 44 24 08
> 48 8b 44 24 08 48 8b 0b 48 21 f8 48 39 0c
> > May 13 09:35:30 compute5 libvirtd[1844]: internal error: End of file
> from qemu monitor
> > May 13 09:35:33 compute5 systemd-networkd[1326]: tap33511c4d-2c: Link
> DOWN
> > May 13 09:35:33 compute5 systemd-networkd[1326]: tap33511c4d-2c: Lost
> carrier
> > May 13 09:35:33 compute5 kernel: [123074.832700] brqa72d845b-e9: port
> 1(tap33511c4d-2c) entered disabled state
> > May 13 09:35:33 compute5 kernel: [123074.838520] device tap33511c4d-2c
> left promiscuous mode
> > May 13 09:35:33 compute5 kernel: [123074.838527] brqa72d845b-e9: port
> 1(tap33511c4d-2c) entered disabled state
> > May 13 09:35:33 compute5 networkd-dispatcher[1614]: Failed to request
> link: No such device
> >
> > On Fri, May 8, 2020 at 5:40 AM Brad Hubbard  wrote:
> >>
> >> On Fri, May 8, 2020 at 12:10 PM Lomayani S. Laizer
> wrote:
> >> >
> >> > Hello,
> >> > On my side at point of vm crash these are logs below. At the moment
> my debug is at 10 value. I will rise to 20 for full debug. these crashes
> are random and so far happens on very busy vms. Downgrading clients in host
> to Nautilus these crashes disappear
> >>
> >> You could try adding debug_rados as well but you may get a very large
> >> log so keep an eye on things.
> >>
> >> >
> >> > Qemu is not shutting down in general because other vms on the same
> host continues working
> >>
> >> A process can not reliably continue after encountering a segfault so
> >> the qemu-kvm process must be ending and therefore it should be
> >> possible to capture a coredump with the right configuration.
> >>
> >> In the following example, if you were to search for pid 6060 you would
> >> find it is no longer running.
> >> >> > [ 7682.233684] fn-radosclient[6060]: segfault at 2b19 ip
> 7f8165cc0a50 sp 7f81397f6490 error 4 in
> librbd.so.1.12.0[7f8165ab4000+53

[ceph-users] Re: Is it possible to assign osd id numbers?

2020-09-14 Thread George Shuklin

On 11/09/2020 22:43, Shain Miley wrote:

Thank you for your answer below.

I'm not looking to reuse them as much as I am trying to control what unused 
number is actually used.

For example if I have 20 osds and 2 have failed...when I replace a disk in one 
server I don't want it to automatically use the next lowest number for the osd 
assignment.

I understand what you mean about not focusing on the osd ids...but my ocd is 
making me ask the question.

Well, technically, you can create fake OSDs to hold numbers and release 
those 'fake OSDs' when you need to use their numbers, but that really 
complicates everything. I suggest you stop worrying about the numbers. If 
you are OK with every OSD on every server using /dev/sdb (OCD would require 
that server1 uses /dev/sda, server2 uses /dev/sdb, server3 /dev/sdc, 
etc.), then you should be fine with random OSD numbers. Likewise, you 
should be fine with a discrepancy between the sorting order of OSD uuids and their 
numbers, or a misalignment of IP address and OSD number (192.168.0.4 for 
OSD.1).


While it may be fun to play with numbers in a lab, if you are using 
Ceph in production you should avoid unnecessary changes, as they 
will surprise other people (and you!) trying to keep the thing running.
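
For what it's worth, the usual disk-replacement flow can also pin the ID
(a rough sketch; /dev/sdX is a placeholder and behaviour differs a bit
between releases):

    # keep the failed OSD's ID reserved instead of purging it
    ceph osd destroy 7 --yes-i-really-mean-it

    # recreate on the replacement disk, reusing the same ID
    ceph-volume lvm create --osd-id 7 --data /dev/sdX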


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Orchestrator & ceph osd purge

2020-09-14 Thread Robert Sander
Hi,

is it correct that when using the orchestrator to deploy and manage a
cluster you should not use "ceph osd purge" any more as the orchestrator
then is not able to find the OSD for the "ceph orch osd rm" operation?

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-osd performance on ram disk

2020-09-14 Thread George Shuklin

On 11/09/2020 17:44, Mark Nelson wrote:


On 9/11/20 4:15 AM, George Shuklin wrote:

On 10/09/2020 19:37, Mark Nelson wrote:

On 9/10/20 11:03 AM, George Shuklin wrote:


...
Are there any knobs to tweak to see higher performance for 
ceph-osd? I'm pretty sure it's not any kind of leveling, GC or 
other 'iops-related' issues (brd has performance of two order of 
magnitude higher).




...

I've disabled C-states (governor=performance); it makes no difference - 
same IOPS, same CPU use by ceph-osd. I just can't force Ceph to 
consume more than 330% of a CPU. I can push reads up to 150k IOPS (both 
network and local), hitting the CPU limit, but writes are somehow 
restricted by Ceph itself.



Ok, can I assume block/db/wal are all on the ramdisk?  I'd start a 
benchmark and attach gdbpmp to the OSD and see if you can get a 
callgraph (1000 samples is nice if you don't mind waiting a bit). That 
will tell us a lot more about where the code is spending time.  It 
will slow the benchmark way down fwiw.  Some other things you could 
try:  Try to tweak the number of osd worker threads to better match 
the number of cores in your system.  Too many and you end up with 
context switching.  Too few and you limit parallelism.  You can also 
check rocksdb compaction stats in the osd logs using this tool:



https://github.com/ceph/cbt/blob/master/tools/ceph_rocksdb_log_parser.py


Given that you are on ramdisk the 1GB default WAL limit should be 
plenty to let you avoid WAL throttling during compaction, but just 
verifying that compactions are not taking a long time is good peace of 
mind. 
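
In case it helps, concretely that could look something like the following
(tool and option names as given in the gdbpmp and cbt READMEs, and example
knob values only, so double-check with --help):

    # profile the OSD under load (this slows the benchmark down a lot)
    ./gdbpmp.py -p $(pidof ceph-osd) -n 1000 -o osd.gdbpmp
    ./gdbpmp.py -i osd.gdbpmp          # print the collected call graph

    # worker-thread knobs to match the core count
    ceph config set osd osd_op_num_shards 8
    ceph config set osd osd_op_num_threads_per_shard 2

    # rocksdb compaction stats from the OSD log
    ./ceph_rocksdb_log_parser.py /var/log/ceph/ceph-osd.0.log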



Thank you very much for the feedback. In my case all OSD data was on the brd 
device. (To test it, just create a ramdisk: modprobe brd rd_size=20G, 
create a PV and VG for Ceph, and let ceph-ansible consume them as OSD 
devices.)


The stuff you've given me here is really cool, but a bit beyond my skills 
right now. I've written it into my task list, and I'll continue to research this 
topic further.


Thank you for directions to look into.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Issues with the ceph-bluestore-tool during cluster upgrade from Mimic to Nautilus

2020-09-14 Thread Igor Fedotov
Well, I can see a duplicate admin socket command 
registration/de-registration (and the second de-registration asserts), 
but I don't understand how this could happen.


Would you share the full log, please?


Thanks,

Igor

On 9/11/2020 7:26 PM, Jean-Philippe Méthot wrote:

Here’s the out file, as requested.




Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.
4414-4416 Louis B Mayer
Laval, QC, H7P 0G1, Canada
TEL : +1.514.802.1644 - Poste : 2644
FAX : +1.514.612.0678
CA/US : 1.855.774.4678
FR : 01 76 60 41 43
UK : 0808 189 0423






On 11 Sept 2020, at 10:38, Igor Fedotov wrote:


Could you please run:

CEPH_ARGS="--log-file log --debug-asok 5" ceph-bluestore-tool repair 
--path <...> ; cat log | grep asok > out


and share 'out' file.


Thanks,

Igor

On 9/11/2020 5:15 PM, Jean-Philippe Méthot wrote:

Hi,

We’re upgrading our cluster OSD node per OSD node to Nautilus from 
Mimic. From some release notes, it was recommended to run the 
following command to fix stats after an upgrade :


ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0

However, running that command gives us the following error message:

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc 
: In
 function 'virtual Allocator::SocketHook::~SocketHook()' thread 
7f1a6467eec0 time 2020-09-10 14:40:25.872353
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc 
: 53

: FAILED ceph_assert(r == 0)
 ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) 
nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x14a) [0x7f1a5a823025]

 2: (()+0x25c1ed) [0x7f1a5a8231ed]
 3: (()+0x3c7a4f) [0x55b33537ca4f]
 4: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
 5: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
 6: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
 7: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1) 
[0x55b3352749a1]

 8: (main()+0x10b3) [0x55b335187493]
 9: (__libc_start_main()+0xf5) [0x7f1a574aa555]
 10: (()+0x1f9b5f) [0x55b3351aeb5f]
2020-09-10 14:40:25.873 7f1a6467eec0 -1 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc 
: In function 'virtual 
Allocator::SocketHook::~SocketHook()' thread 7f1a6467eec0 time 
2020-09-10 14:40:25.872353
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc 
: 53: FAILED ceph_assert(r == 0)


 ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) 
nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x14a) [0x7f1a5a823025]

 2: (()+0x25c1ed) [0x7f1a5a8231ed]
 3: (()+0x3c7a4f) [0x55b33537ca4f]
 4: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
 5: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
 6: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
 7: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1) 
[0x55b3352749a1]

 8: (main()+0x10b3) [0x55b335187493]
 9: (__libc_start_main()+0xf5) [0x7f1a574aa555]
 10: (()+0x1f9b5f) [0x55b3351aeb5f]
*** Caught signal (Aborted) **
 in thread 7f1a6467eec0 thread_name:ceph-bluestore-
ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) 
nautilus (stable)

 1: (()+0xf630) [0x7f1a58cf0630]
 2: (gsignal()+0x37) [0x7f1a574be387]
 3: (abort()+0x148) [0x7f1a574bfa78]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x199) [0x7f1a5a823074]

 5: (()+0x25c1ed) [0x7f1a5a8231ed]
 6: (()+0x3c7a4f) [0x55b33537ca4f]
 7: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
 8: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
 9: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
 10: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1) 
[0x55b3352749a1]

 11: (main()+0x10b3) [0x55b335187493]
 12: (__libc_start_main()+0xf5) [0x7f1a574aa555]
 13: (()+0x1f9b5f) [0x55b3351aeb5f]
2020-09-10 14:40:25.874 7f1a6467eec0 -1 *** Caught signal (Aborted) **
 in thread 7f1a6467eec0 thread_name:ceph-bluestore-

 ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) 
nautilus (stable)

 1: (()+0xf630) [0x7f1a58cf0630]
 2: (gsignal()+0x37) [0x7f1a574be387]
 3: (abort()+0x148) [0x7f1a574bfa78]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x

[ceph-users] Re: ceph pgs inconsistent, always the same checksum

2020-09-14 Thread Igor Fedotov

Hi David,

you might want to try disabling swap on your nodes. It looks like there is 
some implicit correlation between such read errors and enabled swapping.


I'm also wondering whether you can observe non-zero values for the 
"bluestore_reads_with_retries" performance counter on your OSDs. How 
widespread are these cases? How high does this counter get?
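
For example, something along these lines on each OSD host (a sketch only;
adjust the socket access for containerized OSDs):

    for id in $(ls /var/lib/ceph/osd | sed 's/ceph-//'); do
        echo -n "osd.$id: "
        ceph daemon osd.$id perf dump | grep -o '"bluestore_reads_with_retries": [0-9]*'
    done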



Thanks,

Igor


On 9/9/2020 4:59 PM, David Orman wrote:

Right, you can see the previously referenced ticket/bug in the link I had
provided. It's definitely not an unknown situation.

We have another one today:

debug 2020-09-09T06:49:36.595+ 7f570871d700 -1
bluestore(/var/lib/ceph/osd/ceph-123) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x6, got 0x6706be76, expected 0x929a618, device
location [0x2f387d7~1000], logical extent 0xe~1000, object
0#2:7ff493bc:::rbd_data.3.20d195d612942.04228a96:head#

debug 2020-09-09T06:49:36.611+ 7f570871d700 -1
bluestore(/var/lib/ceph/osd/ceph-123) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x6, got 0x6706be76, expected 0x929a618, device
location [0x2f387d7~1000], logical extent 0xe~1000, object
0#2:7ff493bc:::rbd_data.3.20d195d612942.04228a96:head#

debug 2020-09-09T06:49:36.611+ 7f570871d700 -1
bluestore(/var/lib/ceph/osd/ceph-123) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x6, got 0x6706be76, expected 0x929a618, device
location [0x2f387d7~1000], logical extent 0xe~1000, object
0#2:7ff493bc:::rbd_data.3.20d195d612942.04228a96:head#

debug 2020-09-09T06:49:36.611+ 7f570871d700 -1
bluestore(/var/lib/ceph/osd/ceph-123) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x6, got 0x6706be76, expected 0x929a618, device
location [0x2f387d7~1000], logical extent 0xe~1000, object
0#2:7ff493bc:::rbd_data.3.20d195d612942.04228a96:head#

debug 2020-09-09T06:49:37.315+ 7f570871d700 -1 log_channel(cluster) log
[ERR] : 2.3fe shard 123(0) soid
2:7ff493bc:::rbd_data.3.20d195d612942.04228a96:head : candidate had
a read error

debug 2020-09-09T06:57:08.930+ 7f570871d700 -1 log_channel(cluster) log
[ERR] : 2.3fes0 deep-scrub 0 missing, 1 inconsistent objects

debug 2020-09-09T06:57:08.930+ 7f570871d700 -1 log_channel(cluster) log
[ERR] : 2.3fe deep-scrub 1 errors

This happens across the entire cluster, not just one server, so we don't
think it's faulty hardware.

On Wed, Sep 9, 2020 at 12:51 AM Janne Johansson  wrote:


I googled "got 0x6706be76, expected" and found some hits regarding ceph,
so whatever it is, you are not the first, and that number has some internal
meaning.
The Red Hat solution for a similar issue says that checksum is what you get
when reading all zeroes, and hints at a bad write cache on the controller, or
something that ends up clearing data instead of writing the correct
information on shutdown.


Den tis 8 sep. 2020 kl 23:21 skrev David Orman :



We're seeing repeated inconsistent PG warnings, generally on the order of
3-10 per week.

 pg 2.b9 is active+clean+inconsistent, acting [25,117,128,95,151,15]





Every time we look at them, we see the same checksum (0x6706be76):

debug 2020-08-13T18:39:01.731+ 7fbc037a7700 -1
bluestore(/var/lib/ceph/osd/ceph-25) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x0, got 0x6706be76, expected 0x61f2021c, device
location [0x12b403c~1000], logical extent 0x0~1000, object
2#2:0f1a338f:::rbd_data.3.20d195d612942.01db869b:head#

This looks a lot like: https://tracker.ceph.com/issues/22464
That said, we've got the following versions in play (cluster was created
with 15.2.3):
ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus
(stable)



--
May the most significant bit of your life be positive.


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Nautilus Scrub and deep-Scrub execution order

2020-09-14 Thread Johannes L
Hello Ceph-Users

after upgrading one of our clusters to Nautilus we noticed the "x pgs not 
scrubbed/deep-scrubbed in time" warnings.
Through some digging we found that the scrubbing seems to take place 
at random and doesn't take the age of the last scrub/deep-scrub into 
consideration.
I dumped the time of the last scrub twice, with a 90-minute gap in between:
ceph pg dump | grep active | awk '{print $22}' | sort | uniq -c
dumped all
   2434 2020-08-30
   5935 2020-08-31
   1782 2020-09-01
  2 2020-09-02
  2 2020-09-03
  5 2020-09-06
  3 2020-09-08
  5 2020-09-09
 17 2020-09-10
259 2020-09-12
  26672 2020-09-13
  12036 2020-09-14

dumped all
   2434 2020-08-30
   5933 2020-08-31
   1782 2020-09-01
  2 2020-09-02
  2 2020-09-03
  5 2020-09-06
  3 2020-09-08
  5 2020-09-09
 17 2020-09-10
 51 2020-09-12
  24862 2020-09-13
  14056 2020-09-14

It is pretty obvious that PGs that were scrubbed only a day ago have been 
scrubbed again for some reason, while ones that are 2 weeks old are basically 
left untouched.
One way we are currently dealing with this issue is setting the 
osd_scrub_min_interval to 72h to force the cluster to scrub the older PGs.
This can't be intentional.
Has anyone else seen this behavior?
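
For reference, that workaround amounts to something like the following
(osd_scrub_min_interval is in seconds, so 72h = 259200):

    ceph config set osd osd_scrub_min_interval 259200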

Kind regards
Johannes
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: virtual machines crashes after upgrade to octopus

2020-09-14 Thread Jason Dillaman
On Mon, Sep 14, 2020 at 5:13 AM Lomayani S. Laizer  wrote:
>
> Hello,
> Last week i got time to try debug crashes of these vms
>
> Below  log  includes rados debug which i left last time
>
> https://storage.habari.co.tz/index.php/s/AQEJ7tQS7epC4Zn
>
> I have observed the following with these settings in openstack and ceph
>
> disk_cachemodes="network=writeback" is set in openstack environment
> and in ceph  rbd_cache_policy = writearound is set--- crashes occur
>
> disk_cachemodes="network=writeback" is set in openstack environment
> and in ceph  rbd_cache_policy = writeback is set---  no crashes
>
> disk_cachemodes="none" is set in openstack environment and in ceph
> rbd_cache_policy = writearound is set  no crashes
>
> disk_cachemodes="none" is set in openstack environment and in ceph
> rbd_cache_policy = writeback is set --- crashes occur
>
> Is disk_cachemodes="network=writeback" is no longer recommended in
> octopus because i see it is left out in new documentation for octopus?
>
> https://ceph.readthedocs.io/en/latest/rbd/rbd-openstack/

Can you try the latest development release of Octopus [1]? A librbd
crash fix has been sitting in that branch for about a month now to be
included in the next point release.

>
>
> > > Hello,
> > >
> > > Below is full debug log of 2 minutes before crash of virtual machine.
> > Download from below url
> > >
> > > https://storage.habari.co.tz/index.php/s/31eCwZbOoRTMpcU
> >
> >
> > This log has rbd debug output, but not rados :(
> >
> > I guess you'll need to try and capture a coredump if you can't get a
> > backtrace.
> >
> > I'd also suggest opening a tracker in case one of the rbd devs has any
> > ideas on this, or has seen something similar. Without a backtrace or
> > core it will be impossible to definitively identify the issue though.
> >
>
> +1 to needing the backtrace. I don't see any indications of a problem in
> that log.
>
>
> > >
> > >
> > > apport.log
> > >
> > > Wed May 13 09:35:30 2020: host pid 4440 crashed in a separate mount
> > namespace, ignoring
> > >
> > > kernel.log
> > > May 13 09:35:30 compute5 kernel: [123071.373217] fn-radosclient[4485]:
> > segfault at 0 ip 7f4c8c85d7ed sp 7f4c66ffc470 error 4 in
> > librbd.so.1.12.0[7f4c8c65a000+5cb000]
> > > May 13 09:35:30 compute5 kernel: [123071.373228] Code: 8d 44 24 08 48 81
> > c3 d8 3e 00 00 49 21 f9 48 c1 e8 30 83 c0 01 48 c1 e0 30 48 89 02 48 8b 03
> > 48 89 04 24 48 8b 34 24 48 21 fe <48> 8b 06 48 89 44 24 08 48 8b 44 24 08
> > 48 8b 0b 48 21 f8 48 39 0c
> > > May 13 09:35:33 compute5 kernel: [123074.832700] brqa72d845b-e9: port
> > 1(tap33511c4d-2c) entered disabled state
> > > May 13 09:35:33 compute5 kernel: [123074.838520] device tap33511c4d-2c
> > left promiscuous mode
> > > May 13 09:35:33 compute5 kernel: [123074.838527] brqa72d845b-e9: port
> > 1(tap33511c4d-2c) entered disabled state
> > >
> > > syslog
> > > compute5 kernel: [123071.373217] fn-radosclient[4485]: segfault at 0 ip
> > 7f4c8c85d7ed sp 7f4c66ffc470 error 4 i
> > > n librbd.so.1.12.0[7f4c8c65a000+5cb000]
> > > May 13 09:35:30 compute5 kernel: [123071.373228] Code: 8d 44 24 08 48 81
> > c3 d8 3e 00 00 49 21 f9 48 c1 e8 30 83 c0 01 48 c1 e0 30 48 8
> > > 9 02 48 8b 03 48 89 04 24 48 8b 34 24 48 21 fe <48> 8b 06 48 89 44 24 08
> > 48 8b 44 24 08 48 8b 0b 48 21 f8 48 39 0c
> > > May 13 09:35:30 compute5 libvirtd[1844]: internal error: End of file
> > from qemu monitor
> > > May 13 09:35:33 compute5 systemd-networkd[1326]: tap33511c4d-2c: Link
> > DOWN
> > > May 13 09:35:33 compute5 systemd-networkd[1326]: tap33511c4d-2c: Lost
> > carrier
> > > May 13 09:35:33 compute5 kernel: [123074.832700] brqa72d845b-e9: port
> > 1(tap33511c4d-2c) entered disabled state
> > > May 13 09:35:33 compute5 kernel: [123074.838520] device tap33511c4d-2c
> > left promiscuous mode
> > > May 13 09:35:33 compute5 kernel: [123074.838527] brqa72d845b-e9: port
> > 1(tap33511c4d-2c) entered disabled state
> > > May 13 09:35:33 compute5 networkd-dispatcher[1614]: Failed to request
> > link: No such device
> > >
> > > On Fri, May 8, 2020 at 5:40 AM Brad Hubbard  wrote:
> > >>
> > >> On Fri, May 8, 2020 at 12:10 PM Lomayani S. Laizer
> >
> > wrote:
> > >> >
> > >> > Hello,
> > >> > On my side at point of vm crash these are logs below. At the moment
> > my debug is at 10 value. I will rise to 20 for full debug. these crashes
> > are random and so far happens on very busy vms. Downgrading clients in
> > host
> > to Nautilus these crashes disappear
> > >>
> > >> You could try adding debug_rados as well but you may get a very large
> > >> log so keep an eye on things.
> > >>
> > >> >
> > >> > Qemu is not shutting down in general because other vms on the same
> > host continues working
> > >>
> > >> A process can not reliably continue after

[ceph-users] Re: ceph rbox test on passive compressed pool

2020-09-14 Thread Marc Roos
 

>   mail/b875f40571f1545ff43052412a8e mtime 2020-09-06 
> 16:25:53.00,
>   size 63580
>   mail/e87c120b19f1545ff43052412a8e mtime 2020-09-06 
> 16:24:25.00,
>   size 525

Hi David, How is this going. To me this looks more like deduplication 
than compression. This is a link[1] to the 62kb text I used



[1]
https://pastebin.pl/view/e45ac998
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph rbox test on passive compressed pool

2020-09-14 Thread Marc Roos



>   mail/b875f40571f1545ff43052412a8e mtime 2020-09-06 
> 16:25:53.00,
>   size 63580
>   mail/e87c120b19f1545ff43052412a8e mtime 2020-09-06 
> 16:24:25.00,
>   size 525

Hi David, How is this going? To me this looks more like deduplication 
than compression. This is a link[1] to the 62kb text I used. I cannot 
really believe that this compresses to 526 bytes. If I compress this 
with gzip, it is already 7189 bytes.


[1]
https://pastebin.pl/view/e45ac998
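
In case it helps narrow it down, the pool-level compression counters and the
logical object size can be checked with something like the following (the pool
name is a placeholder here):

    ceph df detail        # USED COMPR / UNDER COMPR columns per pool
    rados -p <rbox-pool> stat mail/b875f40571f1545ff43052412a8e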

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] New pool with SSD OSDs

2020-09-14 Thread Tecnologia Charne.Net

Hello!

We have a Ceph cluster with 30 HDD 4 TB in 6 hosts, only for RBD.

Now we're receiving another 6 servers with 6 SSD 2 TB each, and we want to 
create a separate pool for RBD on SSD and let unused and backup volumes 
stay on HDD.



I have some questions:


I am currently only using "replicated_rule". If I add an SSD OSD to the 
cluster, will Ceph start to migrate PGs to it?


If so, to prevent this, first I have to create rule like

    # ceph osd crush rule create-replicated pool-hdd default host hdd

and then

    #ceph osd pool set rbd crush_rule pool-hdd

?


Or, if Ceph does not automatically mix hdd and ssd, do I just create the SSD OSDs 
and then


    # ceph osd crush rule create-replicated pool-ssd default host ssd

    # ceph osd pool create pool-ssd 256 256 ssdpool

?

And then migrate images from one pool to another as needed.
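
(For the migration step I assume RBD live migration, available since Nautilus,
should work; a rough sketch with placeholder image names:

    rbd migration prepare rbd/volume-x pool-ssd/volume-x
    rbd migration execute pool-ssd/volume-x
    rbd migration commit pool-ssd/volume-x
)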


Any thoughts are welcome!

Thanks in advance for your time.


Javier.-






___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2020-09-14 Thread vitalif
Samsung PM983 M.2
 I want to have a separate disk for buckets index pool and all of my server 
bays are full and I should use m2 storage devices. Also the bucket index 
doesn't need much space so I plan to have a 6x device with replica 3 for it. 
Each disk could be 240GB to not waste space but there is no enterprise nvme 
disk in this space! Do you have any recommendations? 
 On Sun, Sep 13, 2020 at 10:17 PM Виталий Филиппов (vita...@yourcmc.ru) wrote: 
Easy, 883 has capacitors and 970 evo doesn't
On 13 September 2020 at 0:57:43 GMT+03:00, Seena Fallah (seenafal...@gmail.com) wrote: 

Hi. How do you say 883DCT is faster than 970 EVO? I saw the 
specifications and 970 EVO has higher IOPS than 883DCT! Can you please tell why 
970 EVO act lower than 883DCT? 

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
--
With best regards,
Vitaliy Filippov
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2020-09-14 Thread vitalif
There's also Micron 7300 Pro/Max. Please benchmark it like described here 
https://docs.google.com/spreadsheets/d/1E9-eXjzsKboiCCX-0u0r5fAjjufLKayaut_FOPxYZjc/edit
and send me the results if you get one :))
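
The usual tests from that spreadsheet look roughly like this (the target
device is a placeholder, and the test destroys data on it, so only run it on
an empty disk):

    fio -ioengine=libaio -direct=1 -fsync=1 -rw=randwrite -bs=4k -iodepth=1 \
        -runtime=60 -name=test -filename=/dev/nvme0n1
    fio -ioengine=libaio -direct=1 -rw=randwrite -bs=4k -iodepth=128 \
        -runtime=60 -name=test -filename=/dev/nvme0n1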
Samsung PM983 M.2
 I want to have a separate disk for buckets index pool and all of my server 
bays are full and I should use m2 storage devices. Also the bucket index 
doesn't need much space so I plan to have a 6x device with replica 3 for it. 
Each disk could be 240GB to not waste space but there is no enterprise nvme 
disk in this space! Do you have any recommendations? 
 On Sun, Sep 13, 2020 at 10:17 PM Виталий Филиппов (vita...@yourcmc.ru) wrote: 
Easy, 883 has capacitors and 970 evo doesn't
On 13 September 2020 at 0:57:43 GMT+03:00, Seena Fallah (seenafal...@gmail.com) wrote: 

Hi. How do you say 883DCT is faster than 970 EVO? I saw the 
specifications and 970 EVO has higher IOPS than 883DCT! Can you please tell why 
970 EVO act lower than 883DCT? 

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
--
With best regards,
Vitaliy Filippov
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2020-09-14 Thread Seena Fallah
Thanks for the sheet. I need a low space disk for my use case (around
240GB). Do you have any suggestions with M.2 and capacitors?

On Mon, Sep 14, 2020 at 6:11 PM  wrote:

> There's also Micron 7300 Pro/Max. Please benchmark it like described here
> https://docs.google.com/spreadsheets/d/1E9-eXjzsKboiCCX-0u0r5fAjjufLKayaut_FOPxYZjc/edit
> and send me the results if you get one :))
>
> Samsung PM983 M.2
>
> I want to have a separate disk for buckets index pool and all of my server
> bays are full and I should use m2 storage devices. Also the bucket index
> doesn't need much space so I plan to have a 6x device with replica 3 for
> it. Each disk could be 240GB to not waste space but there is no enterprise
> nvme disk in this space! Do you have any recommendations?
> On Sun, Sep 13, 2020 at 10:17 PM Виталий Филиппов 
> wrote:
>
> Easy, 883 has capacitors and 970 evo doesn't
> On 13 September 2020 at 0:57:43 GMT+03:00, Seena Fallah 
> wrote:
>
> Hi. How do you say 883DCT is faster than 970 EVO? I saw the specifications 
> and 970 EVO has higher IOPS than 883DCT! Can you please tell why 970 EVO act 
> lower than 883DCT?
>
> --
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> --
> With best regards,
> Vitaliy Filippov
>
>
>
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2020-09-14 Thread response
https://www.kingston.com/unitedkingdom/en/ssd/dc1000b-data-center-boot-ssd

looks good for your purpose. 



- Original Message -
From: "Seena Fallah" 
To: "Виталий Филиппов" 
Cc: "Anthony D'Atri" , "ceph-users" 

Sent: Monday, September 14, 2020 2:47:14 PM
Subject: [ceph-users] Re: Choosing suitable SSD for Ceph cluster

Thanks for the sheet. I need a low space disk for my use case (around
240GB). Do you have any suggestions with M.2 and capacitors?

On Mon, Sep 14, 2020 at 6:11 PM  wrote:

> There's also Micron 7300 Pro/Max. Please benchmark it like described here
> https://docs.google.com/spreadsheets/d/1E9-eXjzsKboiCCX-0u0r5fAjjufLKayaut_FOPxYZjc/edit
> and send me the results if you get one :))
>
> Samsung PM983 M.2
>
> I want to have a separate disk for buckets index pool and all of my server
> bays are full and I should use m2 storage devices. Also the bucket index
> doesn't need much space so I plan to have a 6x device with replica 3 for
> it. Each disk could be 240GB to not waste space but there is no enterprise
> nvme disk in this space! Do you have any recommendations?
> On Sun, Sep 13, 2020 at 10:17 PM Виталий Филиппов 
> wrote:
>
> Easy, 883 has capacitors and 970 evo doesn't
> On 13 September 2020 at 0:57:43 GMT+03:00, Seena Fallah 
> wrote:
>
> Hi. How do you say 883DCT is faster than 970 EVO? I saw the specifications 
> and 970 EVO has higher IOPS than 883DCT! Can you please tell why 970 EVO act 
> lower than 883DCT?
>
> --
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> --
> With best regards,
> Vitaliy Filippov
>
>
>
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: New pool with SSD OSDs

2020-09-14 Thread Marc Roos
 
I did the same 1 or 2 years ago, creating a replicated_ruleset_hdd and a 
replicated_ruleset_ssd. Even though I did not have any SSDs on any of 
the nodes at that time, adding the hdd device-class criterion made PGs migrate. 
I thought it was strange that this happened on an HDD-only cluster, so I 
mentioned it here. I am not sure whether this is still an issue, but 
better to take it into account.





-Original Message-
To: ceph-users@ceph.io
Subject: [ceph-users] New pool with SSD OSDs

Hello!

We have a Ceph cluster with 30 HDD 4 TB in 6 hosts, only for RBD.

Now, we're receiving other 6 servers with 6 SSD 2 TB each and we want to 
create a separate pool for RBD on SSD, and let unused and backup volumes 
stays in HDD.


I have some questions:


As I am only using "replicated_rule". ¿If I add an SSD OSD to the 
cluster, Ceph starts to migrate PGs to it?

If so, to prevent this, first I have to create rule like

     # ceph osd crush rule create-replicated pool-hdd default host hdd

and then

     #ceph osd pool set rbd crush_rule pool-hdd

?


Or, if Ceph does not mix automatically hdd and ssd, I create the SSD OSD 
and then

     # ceph osd crush rule create-replicated pool-ssd default host ssd

     # ceph osd pool create pool-ssd 256 256 ssdpool

?

And then migrate images from one to another pool as needed.


Any thoughts are wellcome!

Thanks in advanced for your time.


Javier.-

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: New pool with SSD OSDs

2020-09-14 Thread André Gemünd
Same happened to us two weeks ago using nautilus, although we added the rules 
and storage classes. 

- On 14 Sep 2020 at 16:02, Marc Roos m.r...@f1-outsourcing.eu wrote:

> I did the same, 1 or 2 years ago, creating a replicated_ruleset_hdd and
> replicated_ruleset_ssd. Eventhough I did not have any ssd's on any of
> the nodes at that time, adding this hdd type criteria made pg's migrate.
> I thought it was strange that this happens on a hdd only cluster, so I
> mentioned it here. I am not sure however if this is still an issue, but
> better take this into account.
> 
> 
> 
> 
> 
> -Original Message-
> To: ceph-users@ceph.io
> Subject: [ceph-users] New pool with SSD OSDs
> 
> Hello!
> 
> We have a Ceph cluster with 30 HDD 4 TB in 6 hosts, only for RBD.
> 
> Now, we're receiving other 6 servers with 6 SSD 2 TB each and we want to
> create a separate pool for RBD on SSD, and let unused and backup volumes
> stays in HDD.
> 
> 
> I have some questions:
> 
> 
> As I am only using "replicated_rule". ¿If I add an SSD OSD to the
> cluster, Ceph starts to migrate PGs to it?
> 
> If so, to prevent this, first I have to create rule like
> 
>     # ceph osd crush rule create-replicated pool-hdd default host hdd
> 
> and then
> 
>     #ceph osd pool set rbd crush_rule pool-hdd
> 
> ?
> 
> 
> Or, if Ceph does not mix automatically hdd and ssd, I create the SSD OSD
> and then
> 
>     # ceph osd crush rule create-replicated pool-ssd default host ssd
> 
>     # ceph osd pool create pool-ssd 256 256 ssdpool
> 
> ?
> 
> And then migrate images from one to another pool as needed.
> 
> 
> Any thoughts are wellcome!
> 
> Thanks in advanced for your time.
> 
> 
> Javier.-
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
Dipl.-Inf. André Gemünd, Leiter IT / Head of IT
Fraunhofer-Institute for Algorithms and Scientific Computing
andre.gemu...@scai.fraunhofer.de
Tel: +49 2241 14-2193
/C=DE/O=Fraunhofer/OU=SCAI/OU=People/CN=Andre Gemuend
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph-container: docker restart, mon's unable to join

2020-09-14 Thread Stefan Kooman
Hi,

In an attempt to get a (test) Mimic cluster running on Ubuntu 20.04 we
are using docker with ceph-container images (ceph/daemon:latest-mimic).
Deploying monitors and mgrs works fine. If however a monitor container
gets stopped and started (i.e. docker restart) two out of three (with
exception of mon initial member) mons won't join the cluster anymore and
keep logging the following:

/opt/ceph-container/bin/entrypoint.sh: Existing mon, trying to rejoin
cluster...

If docker is stopped, the mon directory "/var/lib/ceph/mon/$mon-name"
removed and docker started again the mon is able to join the cluster.
This directory is a persistent volume with correct permissions
(167.167). No etcd cluster is in use here. We manually copied the
/etc/ceph and /var/lib/ceph directories to the docker hosts.
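
For reference, the mon containers are started along the lines of the
ceph-container docs (the IP/network values here are placeholders):

    docker run -d --net=host --name ceph-mon \
      -v /etc/ceph:/etc/ceph \
      -v /var/lib/ceph:/var/lib/ceph \
      -e MON_IP=192.168.1.11 \
      -e CEPH_PUBLIC_NETWORK=192.168.1.0/24 \
      ceph/daemon:latest-mimic mon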

Any hints on how to make a mon container survive a restart are welcome.

Gr. Stefan

P.s And yes, we know about Rook, kubernetes, etc. but that's not what
want to use now.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Issues with the ceph-bluestore-tool during cluster upgrade from Mimic to Nautilus

2020-09-14 Thread Igor Fedotov

Thanks!

Now got the root cause. The fix is on its way...

Meanwhile you might want to try to work around the issue by setting 
"bluestore_hybrid_alloc_mem_cap" to 0, or by using a different allocator, e.g. 
avl for bluestore_allocator (and optionally for bluefs_allocator too).
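
For the repair run that could look roughly like this (same CEPH_ARGS trick as
before, with the options passed as overrides):

    CEPH_ARGS="--bluestore_allocator=avl --bluefs_allocator=avl" \
      ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0

or, to make it persistent for running OSDs, set bluestore_allocator = avl in
the [osd] section of ceph.conf.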



Hope this helps,

Igor.



On 9/14/2020 5:02 PM, Jean-Philippe Méthot wrote:

Alright, here’s the full log file.





Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.
4414-4416 Louis B Mayer
Laval, QC, H7P 0G1, Canada
TEL : +1.514.802.1644 - Poste : 2644
FAX : +1.514.612.0678
CA/US : 1.855.774.4678
FR : 01 76 60 41 43
UK : 0808 189 0423






On 14 Sept 2020, at 06:49, Igor Fedotov wrote:


Well, I can see duplicate admin socket command 
registration/de-registration (and the second de-registration asserts) 
but don't understand how this could happen.


Would you share the full log, please?


Thanks,

Igor

On 9/11/2020 7:26 PM, Jean-Philippe Méthot wrote:

Here’s the out file, as requested.




Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.
4414-4416 Louis B Mayer
Laval, QC, H7P 0G1, Canada
TEL : +1.514.802.1644 - Poste : 2644
FAX : +1.514.612.0678
CA/US : 1.855.774.4678
FR : 01 76 60 41 43
UK : 0808 189 0423






On 11 Sept 2020, at 10:38, Igor Fedotov wrote:


Could you please run:

CEPH_ARGS="--log-file log --debug-asok 5" ceph-bluestore-tool 
repair --path <...> ; cat log | grep asok > out


and share 'out' file.


Thanks,

Igor

On 9/11/2020 5:15 PM, Jean-Philippe Méthot wrote:

Hi,

We’re upgrading our cluster OSD node per OSD node to Nautilus from 
Mimic. From some release notes, it was recommended to run the 
following command to fix stats after an upgrade :


ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0

However, running that command gives us the following error message:

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc 
: In
 function 'virtual Allocator::SocketHook::~SocketHook()' thread 
7f1a6467eec0 time 2020-09-10 14:40:25.872353
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc 
: 53

: FAILED ceph_assert(r == 0)
 ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) 
nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x14a) [0x7f1a5a823025]

 2: (()+0x25c1ed) [0x7f1a5a8231ed]
 3: (()+0x3c7a4f) [0x55b33537ca4f]
 4: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
 5: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
 6: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
 7: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1) 
[0x55b3352749a1]

 8: (main()+0x10b3) [0x55b335187493]
 9: (__libc_start_main()+0xf5) [0x7f1a574aa555]
 10: (()+0x1f9b5f) [0x55b3351aeb5f]
2020-09-10 14:40:25.873 7f1a6467eec0 -1 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc 
: In function 'virtual 
Allocator::SocketHook::~SocketHook()' thread 7f1a6467eec0 time 
2020-09-10 14:40:25.872353
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc 
: 53: FAILED ceph_assert(r == 0)


 ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) 
nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x14a) [0x7f1a5a823025]

 2: (()+0x25c1ed) [0x7f1a5a8231ed]
 3: (()+0x3c7a4f) [0x55b33537ca4f]
 4: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
 5: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
 6: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
 7: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1) 
[0x55b3352749a1]

 8: (main()+0x10b3) [0x55b335187493]
 9: (__libc_start_main()+0xf5) [0x7f1a574aa555]
 10: (()+0x1f9b5f) [0x55b3351aeb5f]
*** Caught signal (Aborted) **
 in thread 7f1a6467eec0 thread_name:ceph-bluestore-
ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) 
nautilus (stable)

 1: (()+0xf630) [0x7f1a58cf0630]
 2: (gsignal()+0x37) [0x7f1a574be387]
 3: (abort()+0x148) [0x7f1a574bfa78]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x199) [0x7f1a5a823074]

 5: (()+0x25c1ed) [0x7f1a5a8231ed]
 6: (()+0x3c7a4f) [0x55b33537ca4f]
 7: (Hy

[ceph-users] Re: New pool with SSD OSDs

2020-09-14 Thread Tecnologia Charne.Net
Exactly! I created a replicated-hdd rule and set it on an existing small 
pool without any changes to the OSDs (all HDD), and PGs started migrating... 
It seems like new rules force migrations...


On 14/9/20 at 11:09, André Gemünd wrote:

Same happened to us two weeks ago using nautilus, although we added the rules 
and storage classes.

- On 14 Sep 2020 at 16:02, Marc Roos m.r...@f1-outsourcing.eu wrote:


I did the same, 1 or 2 years ago, creating a replicated_ruleset_hdd and
replicated_ruleset_ssd. Eventhough I did not have any ssd's on any of
the nodes at that time, adding this hdd type criteria made pg's migrate.
I thought it was strange that this happens on a hdd only cluster, so I
mentioned it here. I am not sure however if this is still an issue, but
better take this into account.





-Original Message-
To: ceph-users@ceph.io
Subject: [ceph-users] New pool with SSD OSDs

Hello!

We have a Ceph cluster with 30 HDD 4 TB in 6 hosts, only for RBD.

Now, we're receiving other 6 servers with 6 SSD 2 TB each and we want to
create a separate pool for RBD on SSD, and let unused and backup volumes
stays in HDD.


I have some questions:


As I am only using "replicated_rule". ¿If I add an SSD OSD to the
cluster, Ceph starts to migrate PGs to it?

If so, to prevent this, first I have to create rule like

     # ceph osd crush rule create-replicated pool-hdd default host hdd

and then

     #ceph osd pool set rbd crush_rule pool-hdd

?


Or, if Ceph does not mix automatically hdd and ssd, I create the SSD OSD
and then

     # ceph osd crush rule create-replicated pool-ssd default host ssd

     # ceph osd pool create pool-ssd 256 256 ssdpool

?

And then migrate images from one to another pool as needed.


Any thoughts are wellcome!

Thanks in advanced for your time.


Javier.-

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2020-09-14 Thread Martin Verges
Hello,

Please keep in mind that you can have significant operational problems if
you choose OSDs that are too small. Sometimes your OSDs require >40G for
osdmaps/pgmaps/... and the smaller your OSD, the more likely this will become a
problem, as Ceph is totally unable to deal with full disks and will break apart.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


Am Mo., 14. Sept. 2020 um 15:58 Uhr schrieb :

> https://www.kingston.com/unitedkingdom/en/ssd/dc1000b-data-center-boot-ssd
>
> look good for your purpose.
>
>
>
> - Original Message -
> From: "Seena Fallah" 
> To: "Виталий Филиппов" 
> Cc: "Anthony D'Atri" , "ceph-users" <
> ceph-users@ceph.io>
> Sent: Monday, September 14, 2020 2:47:14 PM
> Subject: [ceph-users] Re: Choosing suitable SSD for Ceph cluster
>
> Thanks for the sheet. I need a low space disk for my use case (around
> 240GB). Do you have any suggestions with M.2 and capacitors?
>
> On Mon, Sep 14, 2020 at 6:11 PM  wrote:
>
> > There's also Micron 7300 Pro/Max. Please benchmark it like described here
> >
> https://docs.google.com/spreadsheets/d/1E9-eXjzsKboiCCX-0u0r5fAjjufLKayaut_FOPxYZjc/edit
> > and send me the results if you get one :))
> >
> > Samsung PM983 M.2
> >
> > I want to have a separate disk for buckets index pool and all of my
> server
> > bays are full and I should use m2 storage devices. Also the bucket index
> > doesn't need much space so I plan to have a 6x device with replica 3 for
> > it. Each disk could be 240GB to not waste space but there is no
> enterprise
> > nvme disk in this space! Do you have any recommendations?
> > On Sun, Sep 13, 2020 at 10:17 PM Виталий Филиппов 
> > wrote:
> >
> > Easy, 883 has capacitors and 970 evo doesn't
> > On 13 September 2020 at 0:57:43 GMT+03:00, Seena Fallah <
> seenafal...@gmail.com>
> > wrote:
> >
> > Hi. How do you say 883DCT is faster than 970 EVO? I saw the
> specifications and 970 EVO has higher IOPS than 883DCT! Can you please tell
> why 970 EVO act lower than 883DCT?
> >
> > --
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> > --
> > With best regards,
> > Vitaliy Filippov
> >
> >
> >
> >
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2020-09-14 Thread Seena Fallah
Yes, I'm planning to use only 60-70% of my disks, and pools like
buckets.index don't grow much and don't need much space! I'm just
trying to make this pool faster because I see it sometimes needs 1 million
IOPS, and I think NVMe is a good option for this pool. But finding a good
datacenter NVMe at low capacity is hard :(

On Mon, Sep 14, 2020 at 7:32 PM Martin Verges 
wrote:

> Hello,
>
> Please keep in mind that you can have significant operational problems if
> you choose too small OSDs. Sometimes your OSDs require >40G for
> osdmaps/pgmaps/... and the smaller you OSD, the more likely it will be a
> problem as Ceph is totally unable to deal with full disks and break apart.
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
>
> Am Mo., 14. Sept. 2020 um 15:58 Uhr schrieb :
>
>> https://www.kingston.com/unitedkingdom/en/ssd/dc1000b-data-center-boot-ssd
>>
>> look good for your purpose.
>>
>>
>>
>> - Original Message -
>> From: "Seena Fallah" 
>> To: "Виталий Филиппов" 
>> Cc: "Anthony D'Atri" , "ceph-users" <
>> ceph-users@ceph.io>
>> Sent: Monday, September 14, 2020 2:47:14 PM
>> Subject: [ceph-users] Re: Choosing suitable SSD for Ceph cluster
>>
>> Thanks for the sheet. I need a low space disk for my use case (around
>> 240GB). Do you have any suggestions with M.2 and capacitors?
>>
>> On Mon, Sep 14, 2020 at 6:11 PM  wrote:
>>
>> > There's also Micron 7300 Pro/Max. Please benchmark it like described
>> here
>> >
>> https://docs.google.com/spreadsheets/d/1E9-eXjzsKboiCCX-0u0r5fAjjufLKayaut_FOPxYZjc/edit
>> > and send me the results if you get one :))
>> >
>> > Samsung PM983 M.2
>> >
>> > I want to have a separate disk for buckets index pool and all of my
>> server
>> > bays are full and I should use m2 storage devices. Also the bucket index
>> > doesn't need much space so I plan to have a 6x device with replica 3 for
>> > it. Each disk could be 240GB to not waste space but there is no
>> enterprise
>> > nvme disk in this space! Do you have any recommendations?
>> > On Sun, Sep 13, 2020 at 10:17 PM Виталий Филиппов 
>> > wrote:
>> >
>> > Easy, 883 has capacitors and 970 evo doesn't
>> > On 13 September 2020 at 0:57:43 GMT+03:00, Seena Fallah <
>> seenafal...@gmail.com>
>> > wrote:
>> >
>> > Hi. How do you say 883DCT is faster than 970 EVO? I saw the
>> specifications and 970 EVO has higher IOPS than 883DCT! Can you please tell
>> why 970 EVO act lower than 883DCT?
>> >
>> > --
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>> >
>> > --
>> > With best regards,
>> > Vitaliy Filippov
>> >
>> >
>> >
>> >
>> >
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: New pool with SSD OSDs

2020-09-14 Thread André Gemünd
Oh, yes, I think this would have helped indeed! 

Thanks for pointing it out.

Greetings
André

- On 14 Sep 2020 at 16:48, Stefan Kooman ste...@bit.nl wrote:

> On 2020-09-14 16:09, André Gemünd wrote:
>> Same happened to us two weeks ago using nautilus, although we added the rules
>> and storage classes.
> 
> I think this post [1] from Wido den Hollander might be useful
> information. That way you can avoid data movement if data is already on hdd.
> 
> Gr. Stefan
> 
> [1]: https://blog.widodh.nl/2019/02/comparing-two-ceph-crush-maps/

-- 
Dipl.-Inf. André Gemünd, Leiter IT / Head of IT
Fraunhofer-Institute for Algorithms and Scientific Computing
andre.gemu...@scai.fraunhofer.de
Tel: +49 2241 14-2193
/C=DE/O=Fraunhofer/OU=SCAI/OU=People/CN=Andre Gemuend
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: New pool with SSD OSDs

2020-09-14 Thread Tecnologia Charne.Net

Thanks Stefan!
Compiling the crush map by hand on a production cluster makes me sweat,
but we like to take risks, don't we?
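
If I read the docs right, most of the hand-editing can be avoided with
crushtool's reclassify mode; roughly (flags as in the CRUSH map documentation,
to be double-checked):

    ceph osd getcrushmap -o original
    crushtool -i original --reclassify \
        --set-subtree-class default hdd \
        --reclassify-root default hdd \
        -o adjusted
    crushtool -i original --compare adjusted   # should report no changed mappings
    ceph osd setcrushmap -i adjusted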


On 14/9/20 at 11:48, Stefan Kooman wrote:

On 2020-09-14 16:09, André Gemünd wrote:

Same happened to us two weeks ago using nautilus, although we added the rules 
and storage classes.

I think this post [1] from Wido den Hollander might be useful
information. That way you can avoid data movement if data is already on hdd.

Gr. Stefan

[1]: https://blog.widodh.nl/2019/02/comparing-two-ceph-crush-maps/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph pgs inconsistent, always the same checksum

2020-09-14 Thread Welby McRoberts
Hi Igor

We'll take a look at disabling swap on the nodes and see if that improves
the situation.

Having checked across all OSDs, we're not seeing
bluestore_reads_with_retries at anything other than zero. We see
anywhere from 3-10 occurrences of the error a week, but it's
usually only one or two PGs that are inconsistent at any one time.

Thanks
Welby

On Mon, Sep 14, 2020 at 12:17 PM Igor Fedotov  wrote:

> Hi David,
>
> you might want to try to disable swap for your nodes. Look like there is
> some implicit correlation between such read errors and enabled swapping.
>
> Also wondering whether you can observe non-zero values for
> "bluestore_reads_with_retries" performance counters over your OSDs. How
> wide-spread these cases are present? How high this counter might get?
>
>
> Thanks,
>
> Igor
>
>
> On 9/9/2020 4:59 PM, David Orman wrote:
> > Right, you can see the previously referenced ticket/bug in the link I had
> > provided. It's definitely not an unknown situation.
> >
> > We have another one today:
> >
> > debug 2020-09-09T06:49:36.595+ 7f570871d700 -1
> > bluestore(/var/lib/ceph/osd/ceph-123) _verify_csum bad crc32c/0x1000
> > checksum at blob offset 0x6, got 0x6706be76, expected 0x929a618,
> device
> > location [0x2f387d7~1000], logical extent 0xe~1000, object
> > 0#2:7ff493bc:::rbd_data.3.20d195d612942.04228a96:head#
> >
> > debug 2020-09-09T06:49:36.611+ 7f570871d700 -1
> > bluestore(/var/lib/ceph/osd/ceph-123) _verify_csum bad crc32c/0x1000
> > checksum at blob offset 0x6, got 0x6706be76, expected 0x929a618,
> device
> > location [0x2f387d7~1000], logical extent 0xe~1000, object
> > 0#2:7ff493bc:::rbd_data.3.20d195d612942.04228a96:head#
> >
> > debug 2020-09-09T06:49:36.611+ 7f570871d700 -1
> > bluestore(/var/lib/ceph/osd/ceph-123) _verify_csum bad crc32c/0x1000
> > checksum at blob offset 0x6, got 0x6706be76, expected 0x929a618,
> device
> > location [0x2f387d7~1000], logical extent 0xe~1000, object
> > 0#2:7ff493bc:::rbd_data.3.20d195d612942.04228a96:head#
> >
> > debug 2020-09-09T06:49:36.611+ 7f570871d700 -1
> > bluestore(/var/lib/ceph/osd/ceph-123) _verify_csum bad crc32c/0x1000
> > checksum at blob offset 0x6, got 0x6706be76, expected 0x929a618,
> device
> > location [0x2f387d7~1000], logical extent 0xe~1000, object
> > 0#2:7ff493bc:::rbd_data.3.20d195d612942.04228a96:head#
> >
> > debug 2020-09-09T06:49:37.315+ 7f570871d700 -1 log_channel(cluster)
> log
> > [ERR] : 2.3fe shard 123(0) soid
> > 2:7ff493bc:::rbd_data.3.20d195d612942.04228a96:head : candidate
> had
> > a read error
> >
> > debug 2020-09-09T06:57:08.930+ 7f570871d700 -1 log_channel(cluster)
> log
> > [ERR] : 2.3fes0 deep-scrub 0 missing, 1 inconsistent objects
> >
> > debug 2020-09-09T06:57:08.930+ 7f570871d700 -1 log_channel(cluster)
> log
> > [ERR] : 2.3fe deep-scrub 1 errors
> >
> > This happens across the entire cluster, not just one server, so we don't
> > think it's faulty hardware.
> >
> > On Wed, Sep 9, 2020 at 12:51 AM Janne Johansson 
> wrote:
> >
> >> I googled "got 0x6706be76, expected" and found some hits regarding ceph,
> >> so whatever it is, you are not the first, and that number has some
> internal
> >> meaning.
> >> Redhat solution for similar issue says that checksum is for seeing all
> >> zeroes, and hints at a bad write cache on the controller or something
> that
> >> ends up clearing data instead of writing the correct information on
> >> shutdowns.
> >>
> >>
> >> Den tis 8 sep. 2020 kl 23:21 skrev David Orman :
> >>
> >>>
> >>> We're seeing repeated inconsistent PG warnings, generally on the order
> of
> >>> 3-10 per week.
> >>>
> >>>  pg 2.b9 is active+clean+inconsistent, acting
> [25,117,128,95,151,15]
> >>>
> >>>
> >>
> >>> Every time we look at them, we see the same checksum (0x6706be76):
> >>>
> >>> debug 2020-08-13T18:39:01.731+ 7fbc037a7700 -1
> >>> bluestore(/var/lib/ceph/osd/ceph-25) _verify_csum bad crc32c/0x1000
> >>> checksum at blob offset 0x0, got 0x6706be76, expected 0x61f2021c,
> device
> >>> location [0x12b403c~1000], logical extent 0x0~1000, object
> >>> 2#2:0f1a338f:::rbd_data.3.20d195d612942.01db869b:head#
> >>>
> >>> This looks a lot like: https://tracker.ceph.com/issues/22464
> >>> That said, we've got the following versions in play (cluster was
> created
> >>> with 15.2.3):
> >>> ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus
> >>> (stable)
> >>>
> >>
> >> --
> >> May the most significant bit of your life be positive.
> >>
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing

[ceph-users] Re: Issues with the ceph-bluestore-tool during cluster upgrade from Mimic to Nautilus

2020-09-14 Thread Jean-Philippe Méthot
Alright, here’s the full log file.





Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.
4414-4416 Louis B Mayer
Laval, QC, H7P 0G1, Canada
TEL : +1.514.802.1644 - Poste : 2644
FAX : +1.514.612.0678
CA/US : 1.855.774.4678
FR : 01 76 60 41 43
UK : 0808 189 0423






> Le 14 sept. 2020 à 06:49, Igor Fedotov  a écrit :
> 
> Well, I can see duplicate admin socket command registration/de-registration 
> (and the second de-registration asserts) but don't understand how this could 
> happen.
> 
> Would you share the full log, please?
> 
> 
> 
> Thanks,
> 
> Igor
> 
> On 9/11/2020 7:26 PM, Jean-Philippe Méthot wrote:
>> Here’s the out file, as requested.
>> 
>> 
>> 
>> 
>> Jean-Philippe Méthot
>> Senior Openstack system administrator
>> Administrateur système Openstack sénior
>> PlanetHoster inc.
>> 4414-4416 Louis B Mayer
>> Laval, QC, H7P 0G1, Canada
>> TEL : +1.514.802.1644 - Poste : 2644
>> FAX : +1.514.612.0678
>> CA/US : 1.855.774.4678
>> FR : 01 76 60 41 43
>> UK : 0808 189 0423
>> 
>> 
>> 
>> 
>> 
>> 
>>> Le 11 sept. 2020 à 10:38, Igor Fedotov >> > a écrit :
>>> 
>>> Could you please run:
>>> 
>>> CEPH_ARGS="--log-file log --debug-asok 5" ceph-bluestore-tool repair --path 
>>> <...> ; cat log | grep asok > out
>>> 
>>> and share 'out' file.
>>> 
>>> 
>>> Thanks,
>>> 
>>> Igor
>>> 
>>> On 9/11/2020 5:15 PM, Jean-Philippe Méthot wrote:
 Hi,
 
 We’re upgrading our cluster from Mimic to Nautilus, one OSD node at a time. 
 Some release notes recommended running the following command to fix stats 
 after an upgrade:
 
 ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0
 
 However, running that command gives us the following error message:
 
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc
>  : In
>  function 'virtual Allocator::SocketHook::~SocketHook()' thread 
> 7f1a6467eec0 time 2020-09-10 14:40:25.872353
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc
>  : 53
> : FAILED ceph_assert(r == 0)
>  ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus 
> (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x14a) [0x7f1a5a823025]
>  2: (()+0x25c1ed) [0x7f1a5a8231ed]
>  3: (()+0x3c7a4f) [0x55b33537ca4f]
>  4: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
>  5: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
>  6: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
>  7: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1) [0x55b3352749a1]
>  8: (main()+0x10b3) [0x55b335187493]
>  9: (__libc_start_main()+0xf5) [0x7f1a574aa555]
>  10: (()+0x1f9b5f) [0x55b3351aeb5f]
> 2020-09-10 14:40:25.873 7f1a6467eec0 -1 
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc
>  : In function 'virtual 
> Allocator::SocketHook::~SocketHook()' thread 7f1a6467eec0 time 2020-09-10 
> 14:40:25.872353
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc
>  : 53: FAILED ceph_assert(r == 0)
> 
>  ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus 
> (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x14a) [0x7f1a5a823025]
>  2: (()+0x25c1ed) [0x7f1a5a8231ed]
>  3: (()+0x3c7a4f) [0x55b33537ca4f]
>  4: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
>  5: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
>  6: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
>  7: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1) [0x55b3352749a1]
>  8: (main()+0x10b3) [0x55b335187493]
>  9: (__libc_start_main()+0xf5) [0x7f1a574aa555]
>  10: (()+0x1f9b5f) [0x55b3351aeb5f]
> *** Caught signal (Aborted) **
>  in thread 7f1a6467eec0 thread_name:ceph-bluestore-
> ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus 
> (stable)
>  1: (()+0xf630) [0x7f1a58cf0630]
>  2: (gsignal()+0x37) [0x7f1a574be387]
>  3: (abort()+0x148) [0x7f1a574bfa78]
>  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
>>>

[ceph-users] Re: Nautilus Scrub and deep-Scrub execution order

2020-09-14 Thread Robin H. Johnson
On Mon, Sep 14, 2020 at 11:40:22AM -, Johannes L wrote:
> Hello Ceph-Users
> 
> After upgrading one of our clusters to Nautilus we noticed the "x pgs not 
> scrubbed/deep-scrubbed in time" warnings.
> Through some digging we found that scrubbing seems to take place at random 
> and doesn't take the age of the last scrub/deep-scrub into consideration.
> I dumped the last-scrub timestamps twice, with a 90 min gap in between:
> ceph pg dump | grep active | awk '{print $22}' | sort | uniq -c
> dumped all
>2434 2020-08-30
>5935 2020-08-31
>1782 2020-09-01
>   2 2020-09-02
>   2 2020-09-03
>   5 2020-09-06
>   3 2020-09-08
>   5 2020-09-09
>  17 2020-09-10
> 259 2020-09-12
>   26672 2020-09-13
>   12036 2020-09-14
> 
> dumped all
>2434 2020-08-30
>5933 2020-08-31
>1782 2020-09-01
>   2 2020-09-02
>   2 2020-09-03
>   5 2020-09-06
>   3 2020-09-08
>   5 2020-09-09
>  17 2020-09-10
>  51 2020-09-12
>   24862 2020-09-13
>   14056 2020-09-14
> 
> It is pretty obvious that PGs that were scrubbed only a day ago have been 
> scrubbed again for some reason, while ones that are two weeks old are 
> basically left untouched.
> One way we are currently dealing with this is setting 
> osd_scrub_min_interval to 72h to force the cluster to scrub the older PGs.
> This can't be intentional.
> Has anyone else seen this behavior?
Yes, this has existed for a long time, but the warnings are what's new.

- What's your workload? RBD/RGW/CephFS/???
- Is there a pattern to which pools are behind?

At more than one job now, we have written tooling that drove scrubs of the
oldest PGs, either in addition to or instead of Ceph's own scrub scheduling.

The one thing that absolutely stood out from that, however, is that some PGs
took much longer than others or never completed (which meant other PGs on
those OSDs also got delayed). I never got to the bottom of why at my last
job, and it hasn't been a high enough priority at my current job for the
cases we saw there (it may have been a precursor to a disk failing).
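For anyone wanting to try the same approach, a rough sketch of that kind of
tooling (assumes jq; the JSON field names match a Nautilus-era 'ceph pg dump',
older releases may return a bare array instead of a pg_stats object):

  # Deep-scrub the 20 PGs with the oldest last_deep_scrub_stamp
  ceph pg dump pgs -f json 2>/dev/null \
    | jq -r '.pg_stats[] | [.last_deep_scrub_stamp, .pgid] | @tsv' \
    | sort \
    | head -n 20 \
    | while read -r stamp pgid; do
        echo "deep-scrubbing ${pgid} (last deep scrub: ${stamp})"
        ceph pg deep-scrub "${pgid}"
      done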

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: New pool with SSD OSDs

2020-09-14 Thread Stefan Kooman
On 2020-09-14 16:09, André Gemünd wrote:
> Same happened to us two weeks ago using nautilus, although we added the rules 
> and storage classes. 

I think this post [1] from Wido den Hollander might be useful. That way you
can avoid data movement if the data is already on hdd.

Gr. Stefan

[1]: https://blog.widodh.nl/2019/02/comparing-two-ceph-crush-maps/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unable to start mds when creating cephfs volume with erasure encoding data pool

2020-09-14 Thread Patrick Donnelly
On Sun, Sep 13, 2020 at 1:26 PM  wrote:
>
> Hi all,
>
> I'm using ceph Octopus version and deployed it using cephadm. The ceph 
> documentation provides 2 ways for creating a new cephfs volume:
>
>  1. via "ceph fs volume create ..." - I can use this and it works fine with 
> the MDS automatically deployed but there is no provision for using EC with 
> the data pool

See "Using EC pools with CephFS"  in
https://ceph.io/community/new-luminous-erasure-coding-rbd-cephfs/

I will make a note to improve the ceph documentation on this.
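In the meantime, a rough sketch of one way to do it (pool names, PG counts and
the default EC profile here are placeholders, not recommendations): keep a
replicated default data pool and add the EC pool as an additional data pool,
then pin a directory to it.

  ceph osd pool create cephfs_data_ec 64 64 erasure
  ceph osd pool set cephfs_data_ec allow_ec_overwrites true
  ceph fs add_data_pool <fs_name> cephfs_data_ec
  # On a mounted client, direct a directory (and new files under it) to the EC pool
  setfattr -n ceph.dir.layout.pool -v cephfs_data_ec /mnt/cephfs/ec_data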

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Disk consume for CephFS

2020-09-14 Thread fotofors
Hello.

I'm using the Nautilus Ceph version for a huge folder with approximately
1.7TB of files. I created the filesystem and started copying the files via rsync.

However, I had to stop the process because Ceph shows the new size of the
folder as almost 6TB. I double-checked the replicated size and it is 2. I also
double-checked the rsync options, and I did not follow symlinks when copying.

How would it be possible to explain the extreme difference between the size of 
the original folder and CephFS?
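A rough sketch of the checks I can run if that helps (the mount point below is
just an example):

  # Raw vs. stored usage per pool (STORED is pre-replication on Nautilus)
  ceph df detail
  # Recursive bytes and entry counts as CephFS sees them for the copied directory
  getfattr -n ceph.dir.rbytes /mnt/cephfs/hugefolder
  getfattr -n ceph.dir.rentries /mnt/cephfs/hugefolder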
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: New pool with SSD OSDs

2020-09-14 Thread Stefan Kooman
On 2020-09-14 17:51, Tecnologia Charne.Net wrote:
> Thanks Stefan!
> Compiling a crush map by hand on a production cluster makes me sweat,
> but we like to take risks, don't we?

If crushtool says it's OK, I guess it's OK ;-). But yeah, that's about the
most powerful operation one can perform on a cluster ... you don't want to
mess that up.
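For the record, the usual round trip, so crushtool can sanity-check the edit
before it is injected (the rule id and replica count below are just examples):

  ceph osd getcrushmap -o crush.bin
  crushtool -d crush.bin -o crush.txt
  # ... edit crush.txt ...
  crushtool -c crush.txt -o crush.new
  # Dry-run the mapping with the edited map before applying it
  crushtool -i crush.new --test --show-statistics --rule 0 --num-rep 3
  ceph osd setcrushmap -i crush.new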

Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Syncing cephfs from Ceph to Ceph

2020-09-14 Thread Stefan Kooman
On 2020-09-09 15:51, Eugen Block wrote:
> Hi Simon,
> 
>> What about the idea of creating the cluster over two data centers?
>> Would it be possible to modify the crush map, so one pool gets
>> replicated over those two data centers and if one fails, the other one
>> would still be functional?
> 
> A stretched cluster is a valid approach, but you have to consider
> several things like MON quorum (you'll need a third MON independent of
> the two DCs) and failure domains and resiliency. The crush map and rules
> can be easily adjusted to reflect two DCs.

There is a PR open to _explicitly_ support stretch clusters in Ceph [1].
To get it explained you can watch this presentation Gregory Farnum gave
at FOSDEM 2020 [2]. Fortunately you can pause the video as he is going
quite fast ;-).

Better than two DCs are three DCs: that just works. And even better than
three are, of course, four DCs ... so you can recover from a complete DC
failure ...
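As a rough illustration of the rule adjustment Eugen mentions for two DCs
(bucket names are made up; with pool size=4 this places two copies in each DC):

  rule replicated_two_dcs {
      id 1
      type replicated
      min_size 1
      max_size 10
      step take default
      step choose firstn 2 type datacenter
      step chooseleaf firstn 2 type host
      step emit
  }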

Gr. Stefan

[1]: https://github.com/ceph/ceph/pull/32336
[2]:
https://archive.fosdem.org/2020/schedule/event/sds_ceph_stretch_clusters/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Disk consume for CephFS

2020-09-14 Thread Nathan Fish
What about hardlinks, are there any of those? Are there lots of
directories or tiny (<4k) files?
Also, size=2 is not very safe. You want size=3, min_size=2 if you are
doing replication.

On Mon, Sep 14, 2020 at 6:15 PM  wrote:
>
> Hello.
>
> I'm using the Nautilus Ceph version for a huge folder with approximately
> 1.7TB of files. I created the filesystem and started copying the files via rsync.
>
> However, I had to stop the process because Ceph shows the new size of the
> folder as almost 6TB. I double-checked the replicated size and it is 2. I also
> double-checked the rsync options, and I did not follow symlinks when copying.
>
> How would it be possible to explain the extreme difference between the size 
> of the original folder and CephFS?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] benchmark Ceph

2020-09-14 Thread Tony Liu
Hi,

I have a 3-OSD-node Ceph cluster with 1 x 480GB SSD and 8 x 2TB 12Gbps SAS
HDDs on each node, providing storage to an OpenStack cluster. Both public
and cluster networks are 2x10G. The WAL and DB of each OSD are on the SSD,
and they share the same 60GB partition.

I run fio with different combinations of operation, block size and
io-depth to collect IOPS, bandwidth and latency. I tried fio on a
compute node with ioengine=rbd, and also fio within a VM (backed by Ceph)
with ioengine=libaio.

The results don't seem good. Here are a couple of examples.

fio --name=test --ioengine=rbd --clientname=admin \
--pool=benchmark --rbdname=test --numjobs=1 \
--runtime=30 --direct=1 --size=2G \
--rw=read --bs=4k --iodepth=1

test: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, 
ioengine=rbd, iodepth=1
fio-3.7
Starting 1 process
Jobs: 1 (f=0): [f(1)][100.0%][r=27.6MiB/s,w=0KiB/s][r=7075,w=0 IOPS][eta 
00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=56310: Mon Sep 14 19:01:24 2020
   read: IOPS=7610, BW=29.7MiB/s (31.2MB/s)(892MiB/30001msec)
slat (nsec): min=1550, max=57662, avg=3312.74, stdev=2981.42
clat (usec): min=77, max=4799, avg=127.39, stdev=39.88
 lat (usec): min=78, max=4812, avg=130.70, stdev=40.67
clat percentiles (usec):
 |  1.00th=[   82],  5.00th=[   86], 10.00th=[   95], 20.00th=[   98],
 | 30.00th=[  100], 40.00th=[  104], 50.00th=[  116], 60.00th=[  129],
 | 70.00th=[  141], 80.00th=[  157], 90.00th=[  182], 95.00th=[  198],
 | 99.00th=[  233], 99.50th=[  245], 99.90th=[  359], 99.95th=[  515],
 | 99.99th=[  709]
   bw (  KiB/s): min=27160, max=40696, per=100.00%, avg=30474.29, 
stdev=2826.23, samples=59
   iops: min= 6790, max=10174, avg=7618.56, stdev=706.56, samples=59
  lat (usec)   : 100=28.89%, 250=70.72%, 500=0.34%, 750=0.05%, 1000=0.01%
  lat (msec)   : 2=0.01%, 10=0.01%
  cpu  : usr=3.55%, sys=3.80%, ctx=228358, majf=0, minf=29
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued rwts: total=228333,0,0,0 short=0,0,0,0 dropped=0,0,0,0
 latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=29.7MiB/s (31.2MB/s), 29.7MiB/s-29.7MiB/s (31.2MB/s-31.2MB/s), 
io=892MiB (935MB), run=30001-30001msec

Disk stats (read/write):
dm-0: ios=290/3, merge=0/0, ticks=2427/19, in_queue=2446, util=0.95%, 
aggrios=290/4, aggrmerge=0/0, aggrticks=2427/39, aggrin_queue=2332, 
aggrutil=0.95%
  sda: ios=290/4, merge=0/0, ticks=2427/39, in_queue=2332, util=0.95%


fio --name=test --ioengine=rbd --clientname=admin \
--pool=benchmark --rbdname=test --numjobs=1 \
--runtime=30 --direct=1 --size=2G \
--rw=write --bs=4k --iodepth=1

test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, 
ioengine=rbd, iodepth=1
fio-3.7
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=6352KiB/s][r=0,w=1588 IOPS][eta 
00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=56544: Mon Sep 14 19:03:36 2020
  write: IOPS=1604, BW=6417KiB/s (6571kB/s)(188MiB/30003msec)
slat (nsec): min=2240, max=45925, avg=6526.95, stdev=3486.19
clat (usec): min=399, max=35411, avg=615.88, stdev=231.41
 lat (usec): min=402, max=35421, avg=622.40, stdev=232.08
clat percentiles (usec):
 |  1.00th=[  420],  5.00th=[  449], 10.00th=[  469], 20.00th=[  498],
 | 30.00th=[  529], 40.00th=[  562], 50.00th=[  611], 60.00th=[  652],
 | 70.00th=[  685], 80.00th=[  709], 90.00th=[  766], 95.00th=[  799],
 | 99.00th=[  881], 99.50th=[  955], 99.90th=[ 2671], 99.95th=[ 3097],
 | 99.99th=[ 3785]
   bw (  KiB/s): min= 5944, max= 6792, per=100.00%, avg=6415.95, stdev=178.72, 
samples=60
   iops: min= 1486, max= 1698, avg=1603.93, stdev=44.67, samples=60
  lat (usec)   : 500=20.82%, 750=67.23%, 1000=11.55%
  lat (msec)   : 2=0.25%, 4=0.14%, 10=0.01%, 20=0.01%, 50=0.01%
  cpu  : usr=1.22%, sys=1.25%, ctx=48143, majf=0, minf=18
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued rwts: total=0,48129,0,0 short=0,0,0,0 dropped=0,0,0,0
 latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=6417KiB/s (6571kB/s), 6417KiB/s-6417KiB/s (6571kB/s-6571kB/s), 
io=188MiB (197MB), run=30003-30003msec

Disk stats (read/write):
dm-0: ios=31/2, merge=0/0, ticks=342/14, in_queue=356, util=0.12%, 
aggrios=33/3, aggrmerge=0/0, aggrticks=390/27, aggrin_queue=404, aggrutil=0.13%
  sda: ios=33/3, merge=0/0, ticks=390/27, in_queue=404, util=0.13%
==

[ceph-users] Re: benchmark Ceph

2020-09-14 Thread rainning
Can you post the fio results with the libaio ioengine? From what you posted,
it seems to me that the read test hit the cache. The write performance was
not good either; the maximum latency was quite high (~35.4ms) even though
numjobs and iodepth were both 1. Did you monitor system stats on both sides
(VM/compute node and cluster)?




-- Original --
From:  "Tony Liu";

[ceph-users] Re: Disk consume for CephFS

2020-09-14 Thread tri
I suggest trying the rsync --sparse option. Typically, qcow2 files (which
tend to be large) are sparse files. Without the sparse option, the files
expand to their full size at the destination.
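A rough sketch of what that could look like (the extra flags are just what I
would typically use, adjust to taste):

  # Compare apparent vs. allocated size on the source to confirm the files are sparse
  du -sh --apparent-size /srv/images/
  du -sh /srv/images/
  # Preserve sparseness (plus hardlinks, ACLs and xattrs) while copying
  rsync -aHAX --sparse --progress /srv/images/ /mnt/cephfs/images/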


September 14, 2020 6:15 PM, fotof...@gmail.com wrote:

> Hello.
> 
> I'm using the Nautilus Ceph version for a huge folder with approximately
> 1.7TB of files. I created the filesystem and started copying the files via rsync.
> 
> However, I had to stop the process because Ceph shows the new size of the
> folder as almost 6TB. I double-checked the replicated size and it is 2. I also
> double-checked the rsync options, and I did not follow symlinks when copying.
> 
> How would it be possible to explain the extreme difference between the size 
> of the original folder
> and CephFS?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: benchmark Ceph

2020-09-14 Thread Tony Liu
Here is the test inside VM.

# fio --name=test --ioengine=libaio --numjobs=1 --runtime=30 \
  --direct=1 --size=2G --end_fsync=1 \
  --rw=read --bs=4K --iodepth=1
test: (groupid=0, jobs=1): err= 0: pid=14615: Mon Sep 14 21:50:55 2020
   read: IOPS=3209, BW=12.5MiB/s (13.1MB/s)(376MiB/30001msec)
slat (usec): min=3, max=162, avg= 6.91, stdev= 4.74
clat (usec): min=85, max=17366, avg=303.17, stdev=639.42
 lat (usec): min=161, max=17373, avg=310.38, stdev=639.93
clat percentiles (usec):
 |  1.00th=[  167],  5.00th=[  172], 10.00th=[  176], 20.00th=[  182],
 | 30.00th=[  188], 40.00th=[  194], 50.00th=[  204], 60.00th=[  221],
 | 70.00th=[  239], 80.00th=[  277], 90.00th=[  359], 95.00th=[  461],
 | 99.00th=[ 3130], 99.50th=[ 5735], 99.90th=[ 8094], 99.95th=[11338],
 | 99.99th=[14091]
   bw (  KiB/s): min= 9688, max=15120, per=99.87%, avg=12820.51, stdev=1001.88, 
samples=59
   iops: min= 2422, max= 3780, avg=3205.12, stdev=250.47, samples=59
  lat (usec)   : 100=0.01%, 250=74.99%, 500=20.76%, 750=2.21%, 1000=0.50%
  lat (msec)   : 2=0.39%, 4=0.27%, 10=0.81%, 20=0.06%
  cpu  : usr=0.65%, sys=3.06%, ctx=96287, majf=0, minf=13
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued rwts: total=96287,0,0,0 short=0,0,0,0 dropped=0,0,0,0
 latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=12.5MiB/s (13.1MB/s), 12.5MiB/s-12.5MiB/s (13.1MB/s-13.1MB/s), 
io=376MiB (394MB), run=30001-30001msec

Disk stats (read/write):
  vda: ios=95957/2, merge=0/0, ticks=29225/12, in_queue=6027, util=82.52%


# fio --name=test --ioengine=libaio --numjobs=1 --runtime=30 \
  --direct=1 --size=2G --end_fsync=1 \
  --rw=write --bs=4K --iodepth=1
test: (groupid=0, jobs=1): err= 0: pid=14619: Mon Sep 14 21:52:04 2020
  write: IOPS=16.3k, BW=63.7MiB/s (66.8MB/s)(1917MiB/30074msec)
slat (usec): min=3, max=182, avg= 5.94, stdev= 1.30
clat (usec): min=11, max=5234, avg=54.08, stdev=18.58
 lat (usec): min=35, max=5254, avg=60.26, stdev=18.80
clat percentiles (usec):
 |  1.00th=[   36],  5.00th=[   38], 10.00th=[   40], 20.00th=[   46],
 | 30.00th=[   48], 40.00th=[   50], 50.00th=[   53], 60.00th=[   56],
 | 70.00th=[   59], 80.00th=[   63], 90.00th=[   67], 95.00th=[   71],
 | 99.00th=[   85], 99.50th=[  100], 99.90th=[  289], 99.95th=[  355],
 | 99.99th=[  412]
   bw (  KiB/s): min=59640, max=80982, per=100.00%, avg=65462.25, 
stdev=7166.81, samples=59
   iops: min=14910, max=20245, avg=16365.54, stdev=1791.69, samples=59
  lat (usec)   : 20=0.01%, 50=39.85%, 100=59.65%, 250=0.36%, 500=0.14%
  lat (usec)   : 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%
  cpu  : usr=2.10%, sys=11.63%, ctx=490639, majf=0, minf=12
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued rwts: total=0,490635,0,1 short=0,0,0,0 dropped=0,0,0,0
 latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=63.7MiB/s (66.8MB/s), 63.7MiB/s-63.7MiB/s (66.8MB/s-66.8MB/s), 
io=1917MiB (2010MB), run=30074-30074msec

Disk stats (read/write):
  vda: ios=9/490639, merge=0/0, ticks=26/27102, in_queue=184, util=99.36%

Both networking and storage workloads are light.
Which system stats should I monitor?

Thanks!
Tony
> -Original Message-
> From: rainning 
> Sent: Monday, September 14, 2020 8:39 PM
> To: Tony Liu ; ceph-users 
> Subject: [ceph-users] Re: benchmark Ceph
> 
> Can you post the fio results with the libaio ioengine? From what you posted,
> it seems to me that the read test hit the cache. The write performance was
> not good either; the maximum latency was quite high (~35.4ms) even though
> numjobs and iodepth were both 1. Did you monitor system stats on both sides
> (VM/compute node and cluster)?
> 
> 
> 
> 
> -- Original --
> From:  "Tony Liu"; Date:  Sep 15, 2020
> To:  "ceph-users" 
> Subject:  [ceph-users] benchmark Ceph
> 
> 
> 
> Hi,
> 
> I have a 3-OSD-node Ceph cluster with 1 x 480GB SSD and 8 x 2TB 12Gbps SAS
> HDDs on each node, providing storage to an OpenStack cluster. Both public
> and cluster networks are 2x10G. The WAL and DB of each OSD are on the SSD,
> and they share the same 60GB partition.
> 
> I run fio with different combinations of operation, block size and io-
> depth to collect IOPS, bandwidth and latency. I tried fio on compute
> node with ioengine=rbd, also fio within VM (backed by Ceph) with
> ioeng


[ceph-users] Re: benchmark Ceph

2020-09-14 Thread rainning
What is your Ceph version? From the test results you posted, your environment's
performance is okay given your setup, but there are definitely many things that
can be tuned to get you better numbers.


I normally use top, iostat, pidstat, vmstat, dstat, iperf3, blktrace, netmon,
and the ceph admin socket to monitor system stats.
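A rough sketch of what I would typically run during a benchmark (device names,
the OSD id and addresses are just examples):

  # On the client / compute node and on the OSD nodes: per-device latency and utilisation
  iostat -x 1
  # On an OSD node: OSD-side op latencies via the admin socket
  ceph daemon osd.0 perf dump | jq '.osd | {op_latency, op_r_latency, op_w_latency}'
  # Network sanity check between client and cluster
  iperf3 -s                # on the OSD node
  iperf3 -c <osd-node-ip>  # on the client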





-- Original --
From:  "Tony Liu";