[ceph-users] Re: Understanding Bluestore performance characteristics

2020-02-04 Thread Vitaliy Filippov
Hi, Try to repeat your test with numjobs=1; I've already seen strange behaviour with parallel jobs against one RBD image. Also, as usual: https://yourcmc.ru/wiki/Ceph_performance :-) Hi, We have a production cluster of 27 OSDs across 5 servers (all SSDs running bluestore), and have started to
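For reference, a minimal fio run of the kind being discussed might look like the following; the pool and image names are placeholders, not taken from the thread:

# single-job write test against one RBD image (pool/image are examples)
fio --name=rbd-write --ioengine=rbd --clientname=admin \
    --pool=testpool --rbdname=testimage \
    --rw=write --bs=4M --iodepth=32 --numjobs=1 --runtime=60 --time_based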

[ceph-users] Bluestore cache parameter precedence

2020-02-04 Thread Boris Epstein
Hello list, As stated in this document: https://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/ there are multiple parameters defining cache limits for BlueStore. You have bluestore_cache_size (presumably controlling the cache size), bluestore_cache_size_hdd (presumably doing
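As a rough illustration of how these options are typically set (the values below are placeholders, not recommendations from this thread); a non-zero bluestore_cache_size takes precedence over the per-device variants, as confirmed later in this list:

# global cache size; when non-zero it overrides the per-device options
ceph config set osd bluestore_cache_size 0
# per-device defaults, used only while the global option is 0
ceph config set osd bluestore_cache_size_hdd 1073741824   # 1 GiB
ceph config set osd bluestore_cache_size_ssd 3221225472   # 3 GiB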

[ceph-users] Re: recovery_unfound

2020-02-04 Thread Jake Grimmett
Hi Paul, Many thanks for your helpful suggestions. Yes, we have 13 pgs with "might_have_unfound" entries (also 1 pg without "might_have_unfound" stuck in the active+recovery_unfound+degraded+repair state). Taking one pg with unfound objects: [root@ceph1 ~]# ceph health detail | grep 5.5c9 pg
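For anyone following along, the usual commands to dig into a PG in this state (the PG id is taken from the message above; output will differ per cluster) are roughly:

# list unfound objects and the OSDs that might still hold them
ceph pg 5.5c9 list_unfound
# full peering/recovery state, including the might_have_unfound list
ceph pg 5.5c9 query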

[ceph-users] OSDs crashing

2020-02-04 Thread Raymond Clotfelter
I have 30 or so OSDs on a cluster with 240 that just keep crashing. Below is the last part of one of the log files showing the crash; can anyone please help me read this to figure out what is going on and how to correct it? When I start the OSDs they generally seem to work for 5-30 minutes, and
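A rough sketch of how to pull the relevant crash information for a single OSD (the OSD id is an example; the ceph crash commands assume the cluster runs Nautilus or later):

# last log lines of a crashing OSD, usually including the backtrace
journalctl -u ceph-osd@12 --no-pager | tail -n 200
# on Nautilus+, crash reports are also collected by the crash module
ceph crash ls
ceph crash info <crash-id>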

[ceph-users] Doubt about AVAIL space on df

2020-02-04 Thread German Anders
Hello Everyone, I would like to understand if this output is right: *# ceph df* GLOBAL: SIZE AVAIL RAW USED %RAW USED 85.1TiB 43.7TiB 41.4TiB 48.68 POOLS: NAME ID USED %USED MAX AVAIL OBJECTS volumes 13 13.8TiB

[ceph-users] Re: Doubt about AVAIL space on df

2020-02-04 Thread EDH - Manuel Rios
Hi German, Can you post "ceph osd df tree"? Looks like your usage distribution is not even, and that's why you get less usable space than the raw total. Regards -Original Message- From: German Anders Sent: Tuesday, 4 February 2020 14:00 To: ceph-us...@ceph.com Subject: [ceph-users] Doubt

[ceph-users] Re: Doubt about AVAIL space on df

2020-02-04 Thread German Anders
Hi Manuel, Sure thing: # ceph osd df ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS 0 nvme 1.0 1.0 1.09TiB 496GiB 622GiB 44.35 0.91 143 1 nvme 1.0 1.0 1.09TiB 488GiB 630GiB 43.63 0.89 141 2 nvme 1.0 1.0 1.09TiB 537GiB 581GiB 48.05 0.99 155

[ceph-users] osd_memory_target ignored

2020-02-04 Thread Frank Schilder
I recently upgraded from 13.2.2 to 13.2.8 and observe two changes that I struggle with: - from release notes: The bluestore_cache_* options are no longer needed. They are replaced by osd_memory_target, defaulting to 4GB. - the default for bluestore_allocator has changed from stupid to bitmap, w

[ceph-users] Re: Doubt about AVAIL space on df

2020-02-04 Thread EDH - Manuel Rios
With “ceph osd df tree” it will be clearer, but right now I can see that the %USE of some OSDs is between 44% and 65%. “ceph osd df tree” also gives the balance at host level. Do you have the balancer enabled? A less-than-“perfect” distribution means you can't use the full space. In our case we gained space by manually rebalan
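For reference, checking and enabling the balancer looks roughly like the commands below; upmap mode assumes all clients are Luminous or newer:

# is the balancer running, and in which mode?
ceph balancer status
# upmap mode requires luminous-or-newer clients
ceph osd set-require-min-compat-client luminous
ceph balancer mode upmap
ceph balancer on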

[ceph-users] Re: Doubt about AVAIL space on df

2020-02-04 Thread German Anders
Manuel, find the output of the ceph osd df tree command: # ceph osd df tree ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS TYPE NAME -7 84.00099 - 85.1TiB 41.6TiB 43.6TiB 48.82 1.00 - root root -5 12.0 - 13.1TiB 5.81TiB 7.29TiB 44.38 0.91 - r

[ceph-users] Re: Doubt about AVAIL space on df

2020-02-04 Thread Wido den Hollander
On 2/4/20 2:00 PM, German Anders wrote: > Hello Everyone, > > I would like to understand if this output is right: > > *# ceph df* > GLOBAL: > SIZE AVAIL RAW USED %RAW USED > 85.1TiB 43.7TiB 41.4TiB 48.68 > POOLS: > NAME ID USED %U

[ceph-users] Re: Write i/o in CephFS metadata pool

2020-02-04 Thread Samy Ascha
> On 2 Feb 2020, at 12:45, Patrick Donnelly wrote: > > On Wed, Jan 29, 2020 at 1:25 AM Samy Ascha wrote: >> >> Hi! >> >> I've been running CephFS for a while now and ever since setting it up, I've >> seen unexpectedly large write i/o on the CephFS metadata pool. >> >> The filesystem is ot

[ceph-users] Re: osd_memory_target ignored

2020-02-04 Thread Stefan Kooman
Hi, Quoting Frank Schilder (fr...@dtu.dk): > I recently upgraded from 13.2.2 to 13.2.8 and observe two changes that > I struggle with: > > - from release notes: The bluestore_cache_* options are no longer > needed. They are replaced by osd_memory_target, defaulting to 4GB. - > the default for bl
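A quick way to verify what a running OSD is actually using, and to set the target via the mon config store (the OSD id is taken from the follow-up message; the value is only an example):

# value the running daemon is using right now
ceph daemon osd.243 config get osd_memory_target
# set it cluster-wide via the config database (available in 13.2.x)
ceph config set osd osd_memory_target 4294967296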

[ceph-users] All pgs peering indefinetely

2020-02-04 Thread Rodrigo Severo - Fábrica
Hi, I have a rather small cephfs cluster with 3 machines right now: all of them sharing MDS, MON, MGS and OSD roles. I had to move all machines to a new physical location and, unfortunately, I had to move all of them at the same time. They are already on again but ceph won't be accessible as al

[ceph-users] Re: osd_memory_target ignored

2020-02-04 Thread Frank Schilder
Dear Stefan, I check with top the total allocation. ps -aux gives: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND ceph 784155 15.8 3.1 6014276 4215008 ? Sl Jan31 932:13 /usr/bin/ceph-osd --cluster ceph -f -i 243 ... ceph 784732 16.6 3.0 6058736 40825

[ceph-users] Re: All pgs peering indefinetely

2020-02-04 Thread DHilsbos
Rodrigo; Are all your hosts using the same IP addresses as before the move? Is the new network structured the same? Thank you, Dominic L. Hilsbos, MBA Director - Information Technology Perform Air International Inc. dhils...@performair.com www.PerformAir.com -Original Message- Fr

[ceph-users] More OMAP Issues

2020-02-04 Thread DHilsbos
All; We're back to having large OMAP object warnings regarding our RGW index pool. This cluster is now in production, so I can't simply dump the buckets / pools and hope everything works out. I did some additional research on this issue, and it looks like I need to (re)shard the bucket (index
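The usual manual reshard workflow looks roughly like the commands below; the bucket name and shard count are placeholders, and resharding in a multi-site deployment has caveats, as the follow-ups discuss:

# current shard count and per-shard object counts for the bucket
radosgw-admin bucket stats --bucket=<bucket>
# reshard the index to a higher shard count
radosgw-admin bucket reshard --bucket=<bucket> --num-shards=128
# show queued/ongoing reshard operations
radosgw-admin reshard list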

[ceph-users] Re: Understanding Bluestore performance characteristics

2020-02-04 Thread Bradley Kite
Hi Vitaliy, Yes - I tried this and I can still see a number of reads (~110 iops, 440KB/sec) on the SSD, so it is significantly better, but the result is still puzzling - I'm trying to understand what is causing the reads. The problem is amplified with numjobs >= 2, but it looks like it is still ther

[ceph-users] Re: recovery_unfound

2020-02-04 Thread Chad William Seys
Hi Jake and all, We're having what looks to be the exact same problem. In our case it happened when I was "draining" an OSD for removal. (ceph crush remove...) Adding the OSD back doesn't help workaround the bug. Everything is either triply replicated or EC k3m2, either of which should st

[ceph-users] Re: Understanding Bluestore performance characteristics

2020-02-04 Thread Igor Fedotov
Hi Bradley, you might want to check the performance counters for this specific OSD, available via the 'ceph daemon osd.0 perf dump' command in Nautilus (the command is a bit different in Luminous, AFAIR). Then look for the 'read' substring in the dump and try to find unexpectedly high read-related counter valu
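For example, something along these lines (the OSD id is the one used in the thread; grep works here because the dump is pretty-printed JSON):

# dump all perf counters for the OSD and keep the read-related ones
ceph daemon osd.0 perf dump | grep -i read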

[ceph-users] Re: More OMAP Issues

2020-02-04 Thread Paul Emmerich
Are you running a multi-site setup? In this case it's best to set the default shard size to a large enough number *before* enabling multi-site. If you didn't do this: well... I think the only way is still to completely re-sync the second site... Paul -- Paul Emmerich Looking for help with your

[ceph-users] Re: More OMAP Issues

2020-02-04 Thread DHilsbos
Paul; Yes, we are running a multi-site setup. Re-sync would be acceptable at this point, as we only have 4 TiB in use right now. Tearing down and reconfiguring the second site would also be acceptable, except that I've never been able to cleanly remove a zone from a zone group. The only way

[ceph-users] Re: Bluestore cache parameter precedence

2020-02-04 Thread Igor Fedotov
Hi Boris, general settings (unless they are set to zero) override disk-specific settings, i.e. bluestore_cache_size overrides both bluestore_cache_size_hdd and bluestore_cache_size_ssd. Here is the code snippet in case you know C++: if (cct->_conf->bluestore_cache_size) { cache_size
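A quick way to see which value an OSD actually ended up with at runtime (the OSD id is an example):

# effective values on a running OSD; a non-zero bluestore_cache_size wins
ceph daemon osd.0 config get bluestore_cache_size
ceph daemon osd.0 config get bluestore_cache_size_ssd
ceph daemon osd.0 config get bluestore_cache_size_hdd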

[ceph-users] Re: Understanding Bluestore performance characteristics

2020-02-04 Thread vitalif
The SSD (block.db) partition contains the object metadata in RocksDB, so it probably loads the metadata before modifying objects (if it's not in the cache yet). It also sometimes performs compaction, which likewise results in disk reads and writes. There are other things going on that I'm not completely aware of

[ceph-users] Cephalocon Seoul is canceled

2020-02-04 Thread Sage Weil
Hi everyone, We are sorry to announce that, due to the recent coronavirus outbreak, we are canceling Cephalocon for March 3-5 in Seoul. More details will follow about how to best handle cancellation of hotel reservations and so forth. Registrations will of course be refunded--expect an email

[ceph-users] Re: Bluestore cache parameter precedence

2020-02-04 Thread Boris Epstein
Hi Igor, Thanks! I think the code needs to be corrected - the choice criteria for which setting to use when cct->_conf->bluestore_cache_size == 0 should be as follows: 1) See what kind of storage you have. 2) Select the type-appropriate setting. Is this code publicly editable? I'll be happy to cor

[ceph-users] Bucket rename with

2020-02-04 Thread EDH - Manuel Rios
Hi, a customer asked us about what should be a simple problem: they want to rename a bucket. Checking the Nautilus documentation it looks like this is not possible yet, but I checked the master documentation and a CLI command should apparently accomplish this. $ radosgw-admin bucket link --bucket=foo --bucket-new-name=bar --
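For completeness, the full invocation from the master docs looks roughly like the line below; the owning uid is a placeholder, and availability depends on the radosgw-admin version in use:

# relink the bucket under a new name for the owning user
radosgw-admin bucket link --bucket=foo --bucket-new-name=bar --uid=<owner-uid>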

[ceph-users] Re: All pgs peering indefinetely

2020-02-04 Thread Rodrigo Severo - Fábrica
On Tue, 4 Feb 2020 at 13:11, wrote: > > Rodrigo; > > Are all your hosts using the same IP addresses as before the move? Is the > new network structured the same? Yes for both questions. Rodrigo ___ ceph-users mailing list -- ceph-users@ce

[ceph-users] Re: All pgs peering indefinetely

2020-02-04 Thread Rodrigo Severo - Fábrica
On Tue, 4 Feb 2020 at 12:39, Rodrigo Severo - Fábrica wrote: > > Hi, > > > I have a rather small cephfs cluster with 3 machines right now: all of > them sharing MDS, MON, MGS and OSD roles. > > I had to move all machines to a new physical location and, > unfortunately, I had to move all

[ceph-users] Re: All pgs peering indefinetely

2020-02-04 Thread Wesley Dillingham
I would guess that you have something preventing osd to osd communication on ports 6800-7300 or osd to mon communication on port 6789 and/or 3300. Respectfully, *Wes Dillingham* w...@wesdillingham.com LinkedIn On Tue, Feb 4, 2020 at 12:44 PM Rodri
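Quick connectivity checks along those lines (host names are placeholders):

# list listening Ceph daemons and their ports on each node
ss -tlnp | grep ceph
# test mon reachability from another node (msgr v1 and v2 ports)
nc -zv <mon-host> 6789
nc -zv <mon-host> 3300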

[ceph-users] Re: All pgs peering indefinetely

2020-02-04 Thread Rodrigo Severo - Fábrica
On Tue, 4 Feb 2020 at 14:54, Wesley Dillingham wrote: > > > I would guess that you have something preventing osd to osd communication on > ports 6800-7300 or osd to mon communication on port 6789 and/or 3300. The 3 servers are on the same subnet. They are connected to a non-managed swi

[ceph-users] Re: All pgs peering indefinetely

2020-02-04 Thread DHilsbos
Rodrigo; Best bet would be to check logs. Check the OSD logs on the affected server. Check cluster logs on the MONs. Check OSD logs on other servers. Your Ceph version(s) and your OS distribution and version would also be useful to help you troubleshoot this OSD flapping issue. Thank you,

[ceph-users] Re: All pgs peering indefinetely

2020-02-04 Thread Rodrigo Severo - Fábrica
On Tue, 4 Feb 2020 at 15:19, wrote: > > Rodrigo; > > Best bet would be to check logs. Check the OSD logs on the affected server. > Check cluster logs on the MONs. Check OSD logs on other servers. > > Your Ceph version(s) and your OS distribution and version would also be > useful t

[ceph-users] Re: osd_memory_target ignored

2020-02-04 Thread Stefan Kooman
Quoting Frank Schilder (fr...@dtu.dk): > Dear Stefan, > > I check with top the total allocation. ps -aux gives: > > USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND > ceph 784155 15.8 3.1 6014276 4215008 ? Sl Jan31 932:13 > /usr/bin/ceph-osd --cluster ceph -

[ceph-users] Migrate journal to Nvme from old SSD journal drive?

2020-02-04 Thread Alex L
Hi, I finally got my Samsung PM983 [1] to use as a journal for about 6 drives plus drive cache, replacing a consumer SSD (Kingston SV300). But I can't for the life of me figure out how to move an existing journal to this NVMe on my Nautilus cluster. # Created a new big partition on the NVMe sgdi
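On Nautilus, moving an existing BlueStore DB/WAL onto a new device is normally done with ceph-bluestore-tool; a rough sketch with placeholder OSD id and device paths follows (the OSD must be stopped first, and ceph-volume metadata/symlinks may also need updating afterwards):

systemctl stop ceph-osd@6
# move BlueFS data (RocksDB DB/WAL) from the old SSD to the new NVMe partition
ceph-bluestore-tool bluefs-bdev-migrate \
    --path /var/lib/ceph/osd/ceph-6 \
    --devs-source /var/lib/ceph/osd/ceph-6/block.db \
    --dev-target /dev/nvme0n1p1
systemctl start ceph-osd@6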

[ceph-users] Re: Understanding Bluestore performance characteristics

2020-02-04 Thread Bradley Kite
Hi Igor, This has been very helpful. I have identified (when numjobs=1, the least-worst case) that there are approximately as many bluestore_write_small_pre_read operations per second as there are sequential-write IOPS: Tue 4 Feb 22:44:34 GMT 2020 "bluestore_write_small_pre_read":