[ceph-users] Re: Benchmark results for Seagate Exos2X14 Dual Actuator HDDs

2020-01-16 Thread Konstantin Shalygin

On 1/15/20 11:58 PM, Paul Emmerich wrote:


we ran some benchmarks with a few samples of Seagate's new HDDs that 
some of you might find interesting:


Blog post:
https://croit.io/2020/01/06/2020-01-06-benchmark-mach2

GitHub repo with scripts and raw data:
https://github.com/croit/benchmarks/tree/master/mach2-disks

Tl;dr: way faster for writes, somewhat faster for reads in some scenarios


Very interesting, thanks for sharing results. Price is available?



k
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Benchmark results for Seagate Exos2X14 Dual Actuator HDDs

2020-01-16 Thread mj

Hi,

Interesting technology!

It seems they offer only one capacity, 14 TB? Or are they planning
different sizes as well? The linked PDF also mentions just this one disk.


And obviously the price would be interesting to know...

MJ

On 1/16/20 9:51 AM, Konstantin Shalygin wrote:

On 1/15/20 11:58 PM, Paul Emmerich wrote:


we ran some benchmarks with a few samples of Seagate's new HDDs that 
some of you might find interesting:


Blog post:
https://croit.io/2020/01/06/2020-01-06-benchmark-mach2

GitHub repo with scripts and raw data:
https://github.com/croit/benchmarks/tree/master/mach2-disks

Tl;dr: way faster for writes, somewhat faster for reads in some scenarios


Very interesting, thanks for sharing results. Price is available?



k
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Benchmark results for Seagate Exos2X14 Dual Actuator HDDs

2020-01-16 Thread mj

More details, different capacities, etc.:

https://www.seagate.com/nl/nl/support/internal-hard-drives/enterprise-hard-drives/exos-X/

MJ

On 1/16/20 9:51 AM, Konstantin Shalygin wrote:

On 1/15/20 11:58 PM, Paul Emmerich wrote:


we ran some benchmarks with a few samples of Seagate's new HDDs that 
some of you might find interesting:


Blog post:
https://croit.io/2020/01/06/2020-01-06-benchmark-mach2

GitHub repo with scripts and raw data:
https://github.com/croit/benchmarks/tree/master/mach2-disks

Tl;dr: way faster for writes, somewhat faster for reads in some scenarios


Very interesting, thanks for sharing results. Price is available?



k
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Benchmark results for Seagate Exos2X14 Dual Actuator HDDs

2020-01-16 Thread Martin Verges
Hello,

According to the prices we have heard so far, the Seagate dual-actuator
HDD will cost around 15-20% more than a single-actuator drive.

We can help with a good hardware selection if interested.

--
Martin Verges
Managing director

Hint: Secure one of the last slots in the upcoming 4-day Ceph Intensive
Training at https://croit.io/training/4-days-ceph-in-depth-training.

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


Am Do., 16. Jan. 2020 um 10:11 Uhr schrieb mj :

> Hi,
>
> Interesting technology!
>
> It seems they have only one capacity: 14TB? Or are they planning
> different sizes as well? Also the linked pdf mentions just this one disk.
>
> And obviously the price would be interesting to know...
>
> MJ
>
> On 1/16/20 9:51 AM, Konstantin Shalygin wrote:
> > On 1/15/20 11:58 PM, Paul Emmerich wrote:
> >>
> >> we ran some benchmarks with a few samples of Seagate's new HDDs that
> >> some of you might find interesting:
> >>
> >> Blog post:
> >> https://croit.io/2020/01/06/2020-01-06-benchmark-mach2
> >>
> >> GitHub repo with scripts and raw data:
> >> https://github.com/croit/benchmarks/tree/master/mach2-disks
> >>
> >> Tl;dr: way faster for writes, somewhat faster for reads in some
> scenarios
> >
> > Very interesting, thanks for sharing results. Price is available?
> >
> >
> >
> > k
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High CPU usage by ceph-mgr in 14.2.5

2020-01-16 Thread Wido den Hollander
Anybody upgraded to 14.2.6 yet?

On an 1800-OSD cluster I see that ceph-mgr is consuming 200 to 450% CPU
on a 4C/8T system (Intel Xeon E3-1230 3.3 GHz CPU).

The logs don't show anything special; the mgr is just very busy.

I noticed this when I executed:

$ ceph balancer status

That command wouldn't return, so I checked the mgr. Only after
restarting ceph-mgr did the balancer module return results again. The
restart didn't change the CPU usage (the mgr is still consuming a lot
of CPU), but at least the balancer seems to work again.
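
(For reference, restarting or failing over the active mgr can be done
e.g. like this, depending on how the daemons are deployed; mgr-host-01
is a placeholder for the active mgr's name:

  $ ceph mgr fail mgr-host-01            # fail over to a standby mgr
  $ systemctl restart ceph-mgr.target    # or restart the daemon on the active mgr host
)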

Wido

On 1/9/20 10:21 AM, Lars Täuber wrote:
> yesterday:
> https://ceph.io/releases/v14-2-6-nautilus-released/
> 
> 
> Cheers,
> Lars
> 
> Thu, 9 Jan 2020 10:10:12 +0100
> Wido den Hollander  ==> Neha Ojha , Sasha 
> Litvak  :
>> On 12/24/19 9:19 PM, Neha Ojha wrote:
>>> The root cause of this issue is the overhead added by the network ping
>>> time monitoring feature for the mgr to process.
>>> We have a fix that disables sending the network ping times related
>>> stats to the mgr and Eric has helped verify the fix(Thanks Eric!) -
>>> https://tracker.ceph.com/issues/43364#note-9. We'll get this fix out
>>> in 14.2.6 after the holidays.
>>>   
>>
>> It's after the holidays now and this is affecting a lot of deployments.
>> Can people expect 14.2.6 soon?
>>
>> Wido
>>
>>>
>>>
>>> On Fri, Dec 20, 2019 at 6:24 PM Neha Ojha  wrote:  

 Not yet, but we have a theory and a test build in
 https://tracker.ceph.com/issues/43364#note-6, if anybody would like to
 give it a try.

 Thanks,
 Neha

 On Fri, Dec 20, 2019 at 2:31 PM Sasha Litvak
  wrote:  
>
> Was the root cause found and fixed?  If so, will the fix be available in 
> 14.2.6 or sooner?
>
> On Thu, Dec 19, 2019 at 5:48 PM Mark Nelson  wrote:  
>>
>> Hi Paul,
>>
>>
>> Thanks for gathering this!  It looks to me like at the very least we
>> should redo the fixed_u_to_string and fixed_to_string functions in
>> common/Formatter.cc.  That alone looks like it's having a pretty
>> significant impact.
>>
>>
>> Mark
>>
>>
>> On 12/19/19 2:09 PM, Paul Mezzanini wrote:  
>>> Based on what we've seen with perf, we think this is the relevant 
>>> section.  (attached is also the whole file)
>>>
>>> Thread: 73 (mgr-fin) - 1000 samples
>>>
>>> + 100.00% clone
>>>+ 100.00% start_thread
>>>  + 100.00% Finisher::finisher_thread_entry()
>>>+ 99.40% Context::complete(int)
>>>| + 99.40% FunctionContext::finish(int)
>>>|   + 99.40% ActivePyModule::notify(std::string const&, 
>>> std::string const&)
>>>| + 91.30% PyObject_CallMethod
>>>| | + 91.30% call_function_tail
>>>| |   + 91.30% PyObject_Call
>>>| | + 91.30% instancemethod_call
>>>| |   + 91.30% PyObject_Call
>>>| | + 91.30% function_call
>>>| |   + 91.30% PyEval_EvalCodeEx
>>>| | + 88.40% PyEval_EvalFrameEx
>>>| | | + 88.40% PyEval_EvalFrameEx
>>>| | |   + 88.40% ceph_state_get(BaseMgrModule*, 
>>> _object*)
>>>| | | + 88.40% 
>>> ActivePyModules::get_python(std::string const&)
>>>| | |   + 51.10% 
>>> PGMap::dump_osd_stats(ceph::Formatter*) const
>>>| | |   | + 51.10% 
>>> osd_stat_t::dump(ceph::Formatter*) const
>>>| | |   |   + 22.50% 
>>> ceph::fixed_u_to_string(unsigned long, int)
>>>| | |   |   | + 10.50% 
>>> std::basic_ostringstream, 
>>> std::allocator >::basic_ostringstream(std::_Ios_Openmode)
>>>| | |   |   | | + 9.30% std::basic_ios>> std::char_traits >::init(std::basic_streambuf>> std::char_traits >*)
>>>| | |   |   | | | + 7.00% 
>>> std::basic_ios 
>>> >::_M_cache_locale(std::locale const&)
>>>| | |   |   | | | | + 1.60% std::ctype 
>>> const& std::use_facet >(std::locale const&)
>>>| | |   |   | | | | | + 1.50% __dynamic_cast
>>>| | |   |   | | | | |   + 0.80% 
>>> __cxxabiv1::__vmi_class_type_info::__do_dyncast(long, 
>>> __cxxabiv1::__class_type_info::__sub_kind, 
>>> __cxxabiv1::__class_type_info const*, void const*, 
>>> __cxxabiv1::__class_type_info const*, void const*, 
>>> __cxxabiv1::__class_type_info::__dyncast_result&) const
>>>| | |   |   | | | | + 1.40% bool 
>>> std::has_facet >(std::locale const&)
>>>| | |   |   | | | | | + 1.30% __dynamic_cast
>>>| | |   |   | | | | |   + 0.9

[ceph-users] Re: High CPU usage by ceph-mgr in 14.2.5

2020-01-16 Thread Thomas Schneider
Hi,

I've experienced exactly the same with 14.2.5 and upgraded to 14.2.6.
I'm running a 7-node cluster with ~500 OSDs.
Since the upgrade the CPU for ceph-mgr is back to normal, and ceph
balancer status is responsive.
However, balancing is still not working... but this is another issue.

Thomas

Am 16.01.2020 um 11:15 schrieb Wido den Hollander:
> Anybody upgraded to 14.2.6 yet?
>
> On a 1800 OSD cluster I see that ceph-mgr is consuming 200 to 450% CPU
> on a 4C/8T system (Intel Xeon E3-1230 3.3Ghz CPU).
>
> The logs don't show anything very special, it's just that the mgr is
> super busy.
>
> I noticed this when I executed:
>
> $ ceph balancer status
>
> That command wouldn't return and then I checked the mgr. Only after
> restarting ceph-mgr the balancer module returned results again. It
> didn't change the CPU usage, it's still consuming a lot of CPU, but at
> least the balancer seems to work again.
>
> Wido
>
> On 1/9/20 10:21 AM, Lars Täuber wrote:
>> yesterday:
>> https://ceph.io/releases/v14-2-6-nautilus-released/
>>
>>
>> Cheers,
>> Lars
>>
>> Thu, 9 Jan 2020 10:10:12 +0100
>> Wido den Hollander  ==> Neha Ojha , Sasha 
>> Litvak  :
>>> On 12/24/19 9:19 PM, Neha Ojha wrote:
 The root cause of this issue is the overhead added by the network ping
 time monitoring feature for the mgr to process.
 We have a fix that disables sending the network ping times related
 stats to the mgr and Eric has helped verify the fix(Thanks Eric!) -
 https://tracker.ceph.com/issues/43364#note-9. We'll get this fix out
 in 14.2.6 after the holidays.
   
>>> It's after the holidays now and this is affecting a lot of deployments.
>>> Can people expect 14.2.6 soon?
>>>
>>> Wido
>>>

 On Fri, Dec 20, 2019 at 6:24 PM Neha Ojha  wrote:  
> Not yet, but we have a theory and a test build in
> https://tracker.ceph.com/issues/43364#note-6, if anybody would like to
> give it a try.
>
> Thanks,
> Neha
>
> On Fri, Dec 20, 2019 at 2:31 PM Sasha Litvak
>  wrote:  
>> Was the root cause found and fixed?  If so, will the fix be available in 
>> 14.2.6 or sooner?
>>
>> On Thu, Dec 19, 2019 at 5:48 PM Mark Nelson  wrote:  
>>> Hi Paul,
>>>
>>>
>>> Thanks for gathering this!  It looks to me like at the very least we
>>> should redo the fixed_u_to_string and fixed_to_string functions in
>>> common/Formatter.cc.  That alone looks like it's having a pretty
>>> significant impact.
>>>
>>>
>>> Mark
>>>
>>>
>>> On 12/19/19 2:09 PM, Paul Mezzanini wrote:  
 Based on what we've seen with perf, we think this is the relevant 
 section.  (attached is also the whole file)

 Thread: 73 (mgr-fin) - 1000 samples

 + 100.00% clone
+ 100.00% start_thread
  + 100.00% Finisher::finisher_thread_entry()
+ 99.40% Context::complete(int)
| + 99.40% FunctionContext::finish(int)
|   + 99.40% ActivePyModule::notify(std::string const&, 
 std::string const&)
| + 91.30% PyObject_CallMethod
| | + 91.30% call_function_tail
| |   + 91.30% PyObject_Call
| | + 91.30% instancemethod_call
| |   + 91.30% PyObject_Call
| | + 91.30% function_call
| |   + 91.30% PyEval_EvalCodeEx
| | + 88.40% PyEval_EvalFrameEx
| | | + 88.40% PyEval_EvalFrameEx
| | |   + 88.40% ceph_state_get(BaseMgrModule*, 
 _object*)
| | | + 88.40% 
 ActivePyModules::get_python(std::string const&)
| | |   + 51.10% 
 PGMap::dump_osd_stats(ceph::Formatter*) const
| | |   | + 51.10% 
 osd_stat_t::dump(ceph::Formatter*) const
| | |   |   + 22.50% 
 ceph::fixed_u_to_string(unsigned long, int)
| | |   |   | + 10.50% 
 std::basic_ostringstream, 
 std::allocator >::basic_ostringstream(std::_Ios_Openmode)
| | |   |   | | + 9.30% 
 std::basic_ios 
 >::init(std::basic_streambuf >*)
| | |   |   | | | + 7.00% 
 std::basic_ios 
 >::_M_cache_locale(std::locale const&)
| | |   |   | | | | + 1.60% 
 std::ctype const& std::use_facet >(std::locale 
 const&)
| | |   |   | | | | | + 1.50% __dynamic_cast
| | |   |   | | | | |   + 0.80% 
 __cxxabiv1::__vmi_class_type_info::__do_dyncast(long, 
 __cxxabiv1::__class_type_info::__sub_kind, 
 __

[ceph-users] Re: High CPU usage by ceph-mgr in 14.2.5

2020-01-16 Thread Dan van der Ster
Hey Wido,
We upgraded a 550-osd cluster from 14.2.4 to 14.2.6 and everything seems to
be working fine. Here's top:

    PID USER  PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND

1432693 ceph  20   0 3246580   2.0g  18260 S  78.4 13.9   2760:58 ceph-mgr

2075038 ceph  20   0 2235072   1.1g  16408 S  11.6  7.6 176:15.30 ceph-mon

And the balancer is quick:

# ceph balancer status
{
"last_optimize_duration": "0:00:02.806449",
"plans": [],
"mode": "upmap",
"active": true,
"optimize_result": "Optimization plan created successfully",
"last_optimize_started": "Thu Jan 16 11:26:19 2020"
}

Cheers, Dan


On Thu, Jan 16, 2020 at 11:19 AM Wido den Hollander  wrote:

> Anybody upgraded to 14.2.6 yet?
>
> On a 1800 OSD cluster I see that ceph-mgr is consuming 200 to 450% CPU
> on a 4C/8T system (Intel Xeon E3-1230 3.3Ghz CPU).
>
> The logs don't show anything very special, it's just that the mgr is
> super busy.
>
> I noticed this when I executed:
>
> $ ceph balancer status
>
> That command wouldn't return and then I checked the mgr. Only after
> restarting ceph-mgr the balancer module returned results again. It
> didn't change the CPU usage, it's still consuming a lot of CPU, but at
> least the balancer seems to work again.
>
> Wido
>
> On 1/9/20 10:21 AM, Lars Täuber wrote:
> > yesterday:
> > https://ceph.io/releases/v14-2-6-nautilus-released/
> >
> >
> > Cheers,
> > Lars
> >
> > Thu, 9 Jan 2020 10:10:12 +0100
> > Wido den Hollander  ==> Neha Ojha ,
> Sasha Litvak  :
> >> On 12/24/19 9:19 PM, Neha Ojha wrote:
> >>> The root cause of this issue is the overhead added by the network ping
> >>> time monitoring feature for the mgr to process.
> >>> We have a fix that disables sending the network ping times related
> >>> stats to the mgr and Eric has helped verify the fix(Thanks Eric!) -
> >>> https://tracker.ceph.com/issues/43364#note-9. We'll get this fix out
> >>> in 14.2.6 after the holidays.
> >>>
> >>
> >> It's after the holidays now and this is affecting a lot of deployments.
> >> Can people expect 14.2.6 soon?
> >>
> >> Wido
> >>
> >>>
> >>>
> >>> On Fri, Dec 20, 2019 at 6:24 PM Neha Ojha  wrote:
> 
>  Not yet, but we have a theory and a test build in
>  https://tracker.ceph.com/issues/43364#note-6, if anybody would like
> to
>  give it a try.
> 
>  Thanks,
>  Neha
> 
>  On Fri, Dec 20, 2019 at 2:31 PM Sasha Litvak
>   wrote:
> >
> > Was the root cause found and fixed?  If so, will the fix be
> available in 14.2.6 or sooner?
> >
> > On Thu, Dec 19, 2019 at 5:48 PM Mark Nelson 
> wrote:
> >>
> >> Hi Paul,
> >>
> >>
> >> Thanks for gathering this!  It looks to me like at the very least we
> >> should redo the fixed_u_to_string and fixed_to_string functions in
> >> common/Formatter.cc.  That alone looks like it's having a pretty
> >> significant impact.
> >>
> >>
> >> Mark
> >>
> >>
> >> On 12/19/19 2:09 PM, Paul Mezzanini wrote:
> >>> Based on what we've seen with perf, we think this is the relevant
> section.  (attached is also the whole file)
> >>>
> >>> Thread: 73 (mgr-fin) - 1000 samples
> >>>
> >>> + 100.00% clone
> >>>+ 100.00% start_thread
> >>>  + 100.00% Finisher::finisher_thread_entry()
> >>>+ 99.40% Context::complete(int)
> >>>| + 99.40% FunctionContext::finish(int)
> >>>|   + 99.40% ActivePyModule::notify(std::string const&,
> std::string const&)
> >>>| + 91.30% PyObject_CallMethod
> >>>| | + 91.30% call_function_tail
> >>>| |   + 91.30% PyObject_Call
> >>>| | + 91.30% instancemethod_call
> >>>| |   + 91.30% PyObject_Call
> >>>| | + 91.30% function_call
> >>>| |   + 91.30% PyEval_EvalCodeEx
> >>>| | + 88.40% PyEval_EvalFrameEx
> >>>| | | + 88.40% PyEval_EvalFrameEx
> >>>| | |   + 88.40%
> ceph_state_get(BaseMgrModule*, _object*)
> >>>| | | + 88.40%
> ActivePyModules::get_python(std::string const&)
> >>>| | |   + 51.10%
> PGMap::dump_osd_stats(ceph::Formatter*) const
> >>>| | |   | + 51.10%
> osd_stat_t::dump(ceph::Formatter*) const
> >>>| | |   |   + 22.50%
> ceph::fixed_u_to_string(unsigned long, int)
> >>>| | |   |   | + 10.50%
> std::basic_ostringstream, std::allocator
> >::basic_ostringstream(std::_Ios_Openmode)
> >>>| | |   |   | | + 9.30%
> std::basic_ios
> >::init(std::basic_streambuf >*)
> >>>| | |   |   | | | + 7.00%
> std::basic_ios >::_M_cache_locale(std::locale
> const&)
> >>>| | |   |   | | | | + 

[ceph-users] Re: Benchmark results for Seagate Exos2X14 Dual Actuator HDDs

2020-01-16 Thread vitalif

Hi,

The results look strange to me...

To begin with, it's strange that read and write performance differs. But
the thing is that many (if not most) large Seagate Exos drives have an
internal SSD cache (~8 GB of it). I suspect the new Exos does too, and
I'm not sure whether the Toshiba has one. That could explain the write
performance difference in your test.


Try disabling the Seagates' write cache with sdparm --set WCE=0 /dev/sdX
and see how the performance changes. If there is an SSD cache you'll
probably see an increase in IOPS: due to the nature of BlueStore, at
least with an external block.db on SSD, the difference is roughly ~230
IOPS vs ~1200 IOPS at iodepth=1. That is the result for an ST8000NM0055.


Also it's strange that read performance is almost the same. Can you 
benchmark the drive with fio alone, without Ceph?
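
(Something along these lines; sdX is a placeholder, the numbers are only
a suggestion, and the write test destroys data on the disk, so only run
it against an empty drive:

# check / disable the volatile write cache first
$ sdparm --get WCE /dev/sdX
$ sdparm --set WCE=0 /dev/sdX

# single-threaded 4k random sync writes, the worst case for an HDD
$ fio --name=randwrite --filename=/dev/sdX --direct=1 --sync=1 \
      --rw=randwrite --bs=4k --iodepth=1 --runtime=60 --time_based

# 4k random reads for comparison
$ fio --name=randread --filename=/dev/sdX --direct=1 \
      --rw=randread --bs=4k --iodepth=1 --runtime=60 --time_based
)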



Hi,

we ran some benchmarks with a few samples of Seagate's new HDDs that
some of you might find interesting:

Blog post:

https://croit.io/2020/01/06/2020-01-06-benchmark-mach2

GitHub repo with scripts and raw data:
https://github.com/croit/benchmarks/tree/master/mach2-disks

Tl;dr: way faster for writes, somewhat faster for reads in some
scenarios

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at
https://croit.io

Looking for Ceph training? We have some free spots available
https://croit.io/training/4-days-ceph-in-depth-training

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io [1]
Tel: +49 89 1896585 90

Links:
--
[1] http://www.croit.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Uneven Node utilization

2020-01-16 Thread Sasha Litvak
Hello, Cephers,

I have a small 6-node cluster with 36 OSDs.  When running
benchmark/torture tests I noticed that some nodes, usually storage2n6-la
and sometimes others, are utilized much more.  I see some OSDs at 100%
utilization and the load average going up to 21, while on the other
nodes the load average is 5-6 and the OSDs stay at 40-60% utilization.
I cannot use the balancer's upmap mode because I still have some client
machines running hammer.  I wonder if my issue is caused by the compat
balancing mode, as the compat weight-set shows nodes with the same disk
count and disk sizes but different compat weights.  If so, what can I do
to improve the load/disk usage distribution in the cluster?  Also, my
legacy client machines only need to access CephFS on the new cluster, so
I wonder whether keeping hammer as the oldest client version still makes
sense, or whether I should change it to jewel and set the crush tunables
to optimal.
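
(The commands I have in mind would be roughly the following; note that
changing the tunables triggers data movement:

$ ceph features                                  # which releases the connected clients report
$ ceph osd set-require-min-compat-client jewel   # raise the floor from hammer to jewel
$ ceph osd crush tunables optimal                # jewel tunables; causes rebalancing

and, as far as I understand, upmap mode would additionally need
"ceph osd set-require-min-compat-client luminous" plus
"ceph balancer mode upmap".)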

Help is greatly appreciated,


ceph df
RAW STORAGE:
    CLASS SIZE   AVAIL  USED    RAW USED %RAW USED
    ssd   94 TiB 88 TiB 5.9 TiB  5.9 TiB      6.29
    TOTAL 94 TiB 88 TiB 5.9 TiB  5.9 TiB      6.29

POOLS:
    POOL            ID STORED  OBJECTS USED    %USED MAX AVAIL
    cephfs_data      1 1.6 TiB   3.77M 4.9 TiB  5.57    28 TiB
    cephfs_metadata  2 3.9 GiB 367.34k 4.3 GiB     0    28 TiB
    one              5 344 GiB  90.94k 1.0 TiB  1.20    28 TiB
 ceph -s
  cluster:
id: 9b4468b7-5bf2-4964-8aec-4b2f4bee87ad
health: HEALTH_OK

  services:
mon: 3 daemons, quorum storage2n1-la,storage2n2-la,storage2n3-la (age
39h)
mgr: storage2n1-la(active, since 39h), standbys: storage2n2-la,
storage2n3-la
mds: cephfs:1 {0=storage2n4-la=up:active} 1 up:standby-replay 1
up:standby
osd: 36 osds: 36 up (since 37h), 36 in (since 10w)

  data:
pools:   3 pools, 1664 pgs
objects: 4.23M objects, 1.9 TiB
usage:   5.9 TiB used, 88 TiB / 94 TiB avail
pgs: 1664 active+clean

  io:
client:   1.2 KiB/s rd, 46 KiB/s wr, 5 op/s rd, 2 op/s wr

Ceph osd df looks like this

ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META     AVAIL   %USE VAR  PGS STATUS
 6   ssd 1.74609  1.0 1.7 TiB 115 GiB 114 GiB 186 MiB  838 MiB 1.6 TiB 6.45 1.02  92 up
12   ssd 1.74609  1.0 1.7 TiB 122 GiB 121 GiB  90 MiB  934 MiB 1.6 TiB 6.81 1.08  92 up
18   ssd 1.74609  1.0 1.7 TiB 112 GiB 111 GiB 107 MiB  917 MiB 1.6 TiB 6.24 0.99  91 up
24   ssd 3.49219  1.0 3.5 TiB 233 GiB 232 GiB 206 MiB  818 MiB 3.3 TiB 6.53 1.04 185 up
30   ssd 3.49219  1.0 3.5 TiB 224 GiB 223 GiB 246 MiB  778 MiB 3.3 TiB 6.25 0.99 187 up
35   ssd 3.49219  1.0 3.5 TiB 216 GiB 215 GiB 252 MiB  772 MiB 3.3 TiB 6.04 0.96 184 up
 5   ssd 1.74609  1.0 1.7 TiB 112 GiB 111 GiB  88 MiB  936 MiB 1.6 TiB 6.28 1.00  92 up
11   ssd 1.74609  1.0 1.7 TiB 112 GiB 111 GiB 112 MiB  912 MiB 1.6 TiB 6.26 0.99  92 up
17   ssd 1.74609  1.0 1.7 TiB 112 GiB 111 GiB 274 MiB  750 MiB 1.6 TiB 6.25 0.99  94 up
23   ssd 3.49219  1.0 3.5 TiB 234 GiB 233 GiB 192 MiB  832 MiB 3.3 TiB 6.54 1.04 183 up
29   ssd 3.49219  1.0 3.5 TiB 216 GiB 215 GiB 356 MiB  668 MiB 3.3 TiB 6.03 0.96 184 up
34   ssd 3.49219  1.0 3.5 TiB 227 GiB 226 GiB 267 MiB  757 MiB 3.3 TiB 6.34 1.01 184 up
 4   ssd 1.74609  1.0 1.7 TiB 125 GiB 124 GiB  16 MiB 1008 MiB 1.6 TiB 7.00 1.11  94 up
10   ssd 1.74609  1.0 1.7 TiB 108 GiB 107 GiB 163 MiB  861 MiB 1.6 TiB 6.01 0.96  93 up
16   ssd 1.74609  1.0 1.7 TiB 107 GiB 106 GiB 163 MiB  861 MiB 1.6 TiB 6.00 0.95  94 up
22   ssd 3.49219  1.0 3.5 TiB 221 GiB 220 GiB 385 MiB  700 MiB 3.3 TiB 6.18 0.98 187 up
28   ssd 3.49219  1.0 3.5 TiB 223 GiB 222 GiB 257 MiB  767 MiB 3.3 TiB 6.23 0.99 186 up
33   ssd 3.49219  1.0 3.5 TiB 241 GiB 240 GiB 233 MiB  791 MiB 3.3 TiB 6.74 1.07 185 up
 1   ssd 1.74609  1.0 1.7 TiB 103 GiB 102 GiB 240 MiB  784 MiB 1.6 TiB 5.76 0.92  93 up
 7   ssd 1.74609  1.0 1.7 TiB 117 GiB 116 GiB  70 MiB  954 MiB 1.6 TiB 6.56 1.04  91 up
13   ssd 1.74609  1.0 1.7 TiB 126 GiB 125 GiB  76 MiB  948 MiB 1.6 TiB 7.03 1.12  95 up
19   ssd 3.49219  1.0 3.5 TiB 230 GiB 229 GiB 307 MiB  717 MiB 3.3 TiB 6.44 1.02 186 up
25   ssd 3.49219  1.0 3.5 TiB 220 GiB 219 GiB 309 MiB  715 MiB 3.3 TiB 6.15 0.98 185 up
31   ssd 3.49219  1.0 3.5 TiB 223 GiB 222 GiB 205 MiB  819 MiB 3.3 TiB 6.23 0.99 186 up
 0   ssd 1.74609  1.0 1.7 TiB 116 GiB 115 GiB 151 MiB  873 MiB 1.6 TiB 6.49 1.03  93 up
 3   ssd 1.74609  1.0 1.7 TiB 121 GiB 120 GiB  89 MiB  935 MiB 1.6 TiB 6.77 1.08  91 up
 9   ssd 1.74609  1.0 1.7 TiB 104 GiB 103 GiB 183 MiB  841 MiB 1.6 TiB 5.81 0.92  93 up
15   ssd 3.49219  1.0 3.5 TiB 222 GiB 221 GiB 205 MiB  819 MiB 3.3 TiB 6.20 0.98 185 up
21   ssd 3.49219  1.0 3.5 TiB 213 GiB 21

[ceph-users] Ceph MDS specific perf info disappeared in Nautilus

2020-01-16 Thread Stefan Kooman
Hi,

The command "ceph daemon mds.$mds perf dump" does not give the
collection with MDS specific data anymore. In Mimic I get the following
MDS specific collections:

- mds
- mds_cache
- mds_log
- mds_mem
- mds_server
- mds_sessions

But those are not available in Nautilus (14.2.4) anymore. They are also
not listed in the "perf schema" output.

Where did these metrics go?
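
(For reference, what I'm running looks like this; mds.$mds stands for
the local daemon id and is queried via the admin socket on the MDS host:

$ ceph daemon mds.$mds perf schema            # should list all collections/counters
$ ceph daemon mds.$mds perf dump mds_cache    # dump a single collection, e.g. mds_cache
)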

Thanks,

Stefan

-- 
| BIT BV  https://www.bit.nl/Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High CPU usage by ceph-mgr in 14.2.5

2020-01-16 Thread Reed Dier
Chiming in to mirror this.

250 OSDs here; after 14.2.6 the CPU usage on the mgr is much lower and
the balancer no longer hangs, which was the main thing that stalled previously.

Reed

> On Jan 16, 2020, at 4:30 AM, Dan van der Ster  wrote:
> 
> Hey Wido,
> We upgraded a 550-osd cluster from 14.2.4 to 14.2.6 and everything seems to 
> be working fine. Here's top:
> 
> PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND 
>   
>   
> 1432693 ceph  20   0 3246580   2.0g  18260 S  78.4 13.9   2760:58 
> ceph-mgr  
>   
> 2075038 ceph  20   0 2235072   1.1g  16408 S  11.6  7.6 176:15.30 
> ceph-mon
> 
> And the balancer is quick:
> 
> # ceph balancer status
> {
> "last_optimize_duration": "0:00:02.806449", 
> "plans": [], 
> "mode": "upmap", 
> "active": true, 
> "optimize_result": "Optimization plan created successfully", 
> "last_optimize_started": "Thu Jan 16 11:26:19 2020"
> }
> 
> Cheers, Dan
> 
> 
> On Thu, Jan 16, 2020 at 11:19 AM Wido den Hollander  > wrote:
> Anybody upgraded to 14.2.6 yet?
> 
> On a 1800 OSD cluster I see that ceph-mgr is consuming 200 to 450% CPU
> on a 4C/8T system (Intel Xeon E3-1230 3.3Ghz CPU).
> 
> The logs don't show anything very special, it's just that the mgr is
> super busy.
> 
> I noticed this when I executed:
> 
> $ ceph balancer status
> 
> That command wouldn't return and then I checked the mgr. Only after
> restarting ceph-mgr the balancer module returned results again. It
> didn't change the CPU usage, it's still consuming a lot of CPU, but at
> least the balancer seems to work again.
> 
> Wido
> 
> On 1/9/20 10:21 AM, Lars Täuber wrote:
> > yesterday:
> > https://ceph.io/releases/v14-2-6-nautilus-released/ 
> > 
> > 
> > 
> > Cheers,
> > Lars
> > 
> > Thu, 9 Jan 2020 10:10:12 +0100
> > Wido den Hollander mailto:w...@42on.com>> ==> Neha Ojha 
> > mailto:no...@redhat.com>>, Sasha Litvak 
> > mailto:alexander.v.lit...@gmail.com>> :
> >> On 12/24/19 9:19 PM, Neha Ojha wrote:
> >>> The root cause of this issue is the overhead added by the network ping
> >>> time monitoring feature for the mgr to process.
> >>> We have a fix that disables sending the network ping times related
> >>> stats to the mgr and Eric has helped verify the fix(Thanks Eric!) -
> >>> https://tracker.ceph.com/issues/43364#note-9 
> >>> . We'll get this fix out
> >>> in 14.2.6 after the holidays.
> >>>   
> >>
> >> It's after the holidays now and this is affecting a lot of deployments.
> >> Can people expect 14.2.6 soon?
> >>
> >> Wido
> >>
> >>>
> >>>
> >>> On Fri, Dec 20, 2019 at 6:24 PM Neha Ojha  >>> > wrote:  
> 
>  Not yet, but we have a theory and a test build in
>  https://tracker.ceph.com/issues/43364#note-6 
>  , if anybody would like to
>  give it a try.
> 
>  Thanks,
>  Neha
> 
>  On Fri, Dec 20, 2019 at 2:31 PM Sasha Litvak
>  mailto:alexander.v.lit...@gmail.com>> 
>  wrote:  
> >
> > Was the root cause found and fixed?  If so, will the fix be available 
> > in 14.2.6 or sooner?
> >
> > On Thu, Dec 19, 2019 at 5:48 PM Mark Nelson  > > wrote:  
> >>
> >> Hi Paul,
> >>
> >>
> >> Thanks for gathering this!  It looks to me like at the very least we
> >> should redo the fixed_u_to_string and fixed_to_string functions in
> >> common/Formatter.cc.  That alone looks like it's having a pretty
> >> significant impact.
> >>
> >>
> >> Mark
> >>
> >>
> >> On 12/19/19 2:09 PM, Paul Mezzanini wrote:  
> >>> Based on what we've seen with perf, we think this is the relevant 
> >>> section.  (attached is also the whole file)
> >>>
> >>> Thread: 73 (mgr-fin) - 1000 samples
> >>>
> >>> + 100.00% clone
> >>>+ 100.00% start_thread
> >>>  + 100.00% Finisher::finisher_thread_entry()
> >>>+ 99.40% Context::complete(int)
> >>>| + 99.40% FunctionContext::finish(int)
> >>>|   + 99.40% ActivePyModule::notify(std::string const&, 
> >>> std::string const&)
> >>>| + 91.30% PyObject_CallMethod
> >>>| | + 91.30% call_function_tail
> >>>| |   + 91.30% PyObject_Call
> >>>| | + 91.30% instancemethod_call
> >>>| |   + 91.30% PyObject_Call
> >>>| | + 91.30% function_call
> >>>| |   + 91.30% PyEval_EvalCodeEx
> >>>| | + 88.40% Py

[ceph-users] Re: Benchmark results for Seagate Exos2X14 Dual Actuator HDDs

2020-01-16 Thread Paul Emmerich
Sorry, we no longer have these test drives :(


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Thu, Jan 16, 2020 at 1:48 PM  wrote:

> Hi,
>
> The results look strange to me...
>
> To begin with, it's strange that read and write performance differs. But
> the thing is that a lot (if not most) large Seagate EXOS drives have
> internal SSD cache (~8 GB of it). I suspect that new EXOS also does and
> I'm not sure if Toshiba has it. It could explain the write performance
> difference in your test.
>
> Try to disable Seagates' write cache with sdparm --set WCE=0 /dev/sdX
> and see how the performance changes. If there is an SSD cache you'll
> probably see an increase in iops. Due to the nature of Bluestore and at
> least with an external block.db on SSD the difference is like ~230 iops
> vs ~1200 iops with iodepth=1. This is the result for ST8000NM0055.
>
> Also it's strange that read performance is almost the same. Can you
> benchmark the drive with fio alone, without Ceph?
>
> > Hi,
> >
> > we ran some benchmarks with a few samples of Seagate's new HDDs that
> > some of you might find interesting:
> >
> > Blog post:
> >
> > https://croit.io/2020/01/06/2020-01-06-benchmark-mach2
> >
> > GitHub repo with scripts and raw data:
> > https://github.com/croit/benchmarks/tree/master/mach2-disks
> >
> > Tl;dr: way faster for writes, somewhat faster for reads in some
> > scenarios
> >
> > Paul
> >
> > --
> > Paul Emmerich
> >
> > Looking for help with your Ceph cluster? Contact us at
> > https://croit.io
> >
> > Looking for Ceph training? We have some free spots available
> > https://croit.io/training/4-days-ceph-in-depth-training
> >
> > croit GmbH
> > Freseniusstr. 31h
> > 81247 München
> > www.croit.io [1]
> > Tel: +49 89 1896585 90
> >
> > Links:
> > --
> > [1] http://www.croit.io
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io