[ceph-users] Best practices for OSD on bcache

2021-02-28 Thread Norman.Kern
Hi, guys

I am testing Ceph on bcache devices, and I found the performance is not as good as 
expected. Does anyone have any best practices for it? Thanks.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Best practices for OSD on bcache

2021-03-02 Thread Norman.Kern

On 2021/3/1 6:32 PM, Matthias Ferdinand wrote:
> On Mon, Mar 01, 2021 at 12:37:38PM +0800, Norman.Kern wrote:
>> Hi, guys
>>
>> I am testing ceph on bcache devices,  I found the performance is not
>> good as expected. Does anyone have any best practices for it?  Thanks.
> Hi,
>
> sorry to say, but since use cases and workloads differ so much, there is
> no easy list of best practices.
>
> The number one reason for low bcache performance is a consumer-grade caching
> device: bcache does a lot of write amplification, and not even "PRO"
> consumer devices will give you decent and consistent performance. You
> might even end up with worse performance than on a direct HDD under load.
>
> With decent caching device, there still are quite a few tuning knobs in
> bcache, but it all depends on your workload.
>
> You also have to consider the added complexity of a bcache setup for
> maintenance operations. Moving an OSD between hosts becomes a complex
> operation (wait for bcache draining, detach bcache, move HDD, create new
> bcache caching device, attach bcache).

Matthias, 

I agree with you about tuning. I asked this question because my OSDs have
problems when cache_available_percent drops below 30: the SSDs become almost
useless and all I/Os bypass to the HDDs with large latency.

So I think maybe I have the wrong configs for bcache.

>
> Regards
> Matthias


[ceph-users] Re: Best practices for OSD on bcache

2021-03-02 Thread Norman.Kern

On 2021/3/2 5:09 AM, Andreas John wrote:
> Hallo,
>
> do you expect that to be better (faster), than having the OSD's Journal
> on a different disk (ssd, nvme) ?
No, I created the OSD storage devices using bcache devices.
>
>
> rgds,
>
> derjohn
>
>
> On 01.03.21 05:37, Norman.Kern wrote:
>> Hi, guys
>>
>> I am testing ceph on bcache devices,  I found the performance is not good as 
>> expected. Does anyone have any best practices for it?  Thanks.


[ceph-users] Re: Best practices for OSD on bcache

2021-03-02 Thread Norman.Kern

On 2021/3/2 4:49 PM, James Page wrote:
> Hi Norman
>
> On Mon, Mar 1, 2021 at 4:38 AM Norman.Kern  wrote:
>
>> Hi, guys
>>
>> I am testing ceph on bcache devices,  I found the performance is not good
>> as expected. Does anyone have any best practices for it?  Thanks.
>>
> I've used bcache quite a bit with Ceph with the following configuration
> options tweaked
>
> a) use writeback mode rather than writethrough (which is the default)
>
> This ensures that the cache device is actually used for write caching
>
> b) turn off the sequential cutoff
>
> sequential_cutoff = 0
>
> This means that sequential writes will also always go to the cache device
> rather than the backing device
>
> c) disable the congestion read and write thresholds
>
> congested_read_threshold_us = congested_write_threshold_us = 0
>
> The following repository:
>
> https://git.launchpad.net/charm-bcache-tuning/tree/src/files
>
> has a python script and systemd configuration to do b) and c) automatically
> on all bcache devices on boot; a) we let the provisioning system take care
> of.
>
> HTH
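The three settings above can be applied through the standard bcache sysfs interface. A minimal sketch of such a boot-time script follows; the sysfs paths are the usual bcache locations, it needs root to actually write, and it is deliberately a no-op on hosts without bcache devices:

```shell
#!/bin/sh
# Apply the three bcache tweaks (a/b/c above) to every bcache device present.
for dev in /sys/block/bcache*/bcache; do
    [ -d "$dev" ] || continue                   # skip if no bcache devices exist
    echo writeback > "$dev/cache_mode"          # a) cache writes, not just reads
    echo 0 > "$dev/sequential_cutoff"           # b) never bypass sequential I/O
done
for cset in /sys/fs/bcache/*/; do
    [ -f "$cset/congested_read_threshold_us" ] || continue
    echo 0 > "$cset/congested_read_threshold_us"    # c) don't bypass the cache
    echo 0 > "$cset/congested_write_threshold_us"   #    when it looks congested
done
```

Note that cache_mode and sequential_cutoff reset on reboot unless reapplied, which is why the linked charm installs a systemd unit for this.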

I have set the variables described above. Didn't you meet latency problems
once cache usage grew past 30%?

My cache status looks like this:

root@WXS0089:~# cat /sys/block/sda/bcache/priority_stats
Unused: 4%
Clean:  28%
Dirty:  70%
Metadata:   0%
Average:    551
Sectors per Q:  29197312
Quantiles:      [27 135 167 199 230 262 294 326 358 390 422 454 486 517 549 581 613 645 677 709 741 773 804 836 844 847 851 855 860 868 881]


>
>


[ceph-users] Re: Best practices for OSD on bcache

2021-03-02 Thread Norman.Kern
James,

Can you tell me the hardware config of your bcache setup? I use a 400G SATA
SSD as the cache device and a 10T HDD as the backing device. Could it be
hardware-related?

On 2021/3/2 4:49 PM, James Page wrote:
> Hi Norman
>
> On Mon, Mar 1, 2021 at 4:38 AM Norman.Kern  wrote:
>
>> Hi, guys
>>
>> I am testing ceph on bcache devices,  I found the performance is not good
>> as expected. Does anyone have any best practices for it?  Thanks.
>>
> I've used bcache quite a bit with Ceph with the following configuration
> options tweaked
>
> a) use writeback mode rather than writethrough (which is the default)
>
> This ensures that the cache device is actually used for write caching
>
> b) turn off the sequential cutoff
>
> sequential_cutoff = 0
>
> This means that sequential writes will also always go to the cache device
> rather than the backing device
>
> c) disable the congestion read and write thresholds
>
> congested_read_threshold_us = congested_write_threshold_us = 0
>
> The following repository:
>
> https://git.launchpad.net/charm-bcache-tuning/tree/src/files
>
> has a python script and systemd configuration to do b) and c) automatically
> on all bcache devices on boot; a) we let the provisioning system take care
> of.
>
> HTH
>
>


[ceph-users] Re: balance OSD usage.

2021-03-07 Thread Norman.Kern
I met the same problem and set the reweight values to keep it from getting
worse. Did you solve it by enabling the balancer? My Ceph version is 14.2.5.

Thank you,

Norman

On 2021/3/7 12:40 PM, Anthony D'Atri wrote:
> ceph balancer status


[ceph-users] Openstack rbd image Error deleting problem

2021-03-09 Thread Norman.Kern
Hi Guys,

I have used Ceph RBD with OpenStack for some time, and I met a problem while
destroying a VM: OpenStack tried to delete the RBD image but failed. I also
tested deleting an image with the rbd command, and it takes a long time
(image size 512G or more).

Has anyone met the same problem?

Thanks,

Norman


[ceph-users] Re: Openstack rbd image Error deleting problem

2021-03-10 Thread Norman.Kern

On 2021/3/10 3:05 PM, Konstantin Shalygin wrote:
>> On 10 Mar 2021, at 09:50, Norman.Kern  wrote:
>>
>> I have used Ceph rbd for Openstack for sometime, I met a problem while 
>> destroying a VM. The Openstack tried to
>>
>> delete rbd image but failed. I have a test deleting a image by rbd command, 
>> it costs lots of time(image size 512G or more).
>>
>> Anyone met the same problem with me?
> Object-map feature is enabled for your RBD's?

No, I use its default features like this:

rbd -p openstack-volumes-rs info volume-34a720b5-372f-4cb3-b5d0-e92e819b3cfc
rbd image 'volume-34a720b5-372f-4cb3-b5d0-e92e819b3cfc':
    size 2 TiB in 524288 objects
    order 22 (4 MiB objects)
    snapshot_count: 0
    id: c6a8a7964566af
    block_name_prefix: rbd_data.c6a8a7964566af
    format: 2
    features: layering
    op_features:
    flags:
    create_timestamp: Tue Aug 25 10:05:05 2020
    access_timestamp: Tue Feb 23 09:57:34 2021
    modify_timestamp: Thu Mar 11 08:59:11 2021
    parent: openstack-images-rs/6cdcd5a0-410b-43cd-b9b8-ed61a611ed91@snap
    overlap: 15 GiB
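Since this image only has the layering feature, deleting it makes the client probe every one of the 524288 backing objects, which is why large deletes are slow. One hedged option, if all clients attached to the image support these features, is to enable object-map so deletion only touches objects that actually exist; a sketch using the image above:

```shell
POOL=openstack-volumes-rs
IMG=volume-34a720b5-372f-4cb3-b5d0-e92e819b3cfc
rbd feature enable "$POOL/$IMG" exclusive-lock       # prerequisite for object-map
rbd feature enable "$POOL/$IMG" object-map fast-diff
rbd object-map rebuild "$POOL/$IMG"                  # build the map for existing data
```

The rebuild walks the image once up front, but subsequent deletes, flattens, and resizes no longer need to scan every object.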

Thanks,

Norman

>
>
>
> k


[ceph-users] How to know which client hold the lock of a file

2021-03-22 Thread Norman.Kern
Hi,

Does anyone know how to find out which client holds the lock on a file in
CephFS?

I met a deadlock problem where a client hangs trying to get the lock, but I
don't know which client holds it.
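One way to dig into this is through the MDS op tracker and session list; a hedged sketch (these commands assume access to the active MDS admin socket, and `mds.a` is a placeholder daemon name):

```shell
ceph daemon mds.a dump_ops_in_flight   # stuck requests show the lock/cap they wait on
ceph daemon mds.a dump_blocked_ops     # just the blocked ones
ceph daemon mds.a session ls           # map the client id back to a host/mount
```

Cross-referencing the client id in the blocked op against `session ls` should identify the holder.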


[ceph-users] Re: v16.2.2 Pacific released

2021-05-07 Thread Norman.Kern
Hi David,

The web page is missing: 
https://docs.ceph.com/en/latest/docs/master/install/get-packages/


The page shows only an ASCII-art placeholder: "SORRY. This page does not exist yet."

On 2021/5/6 12:51 AM, David Galloway wrote:
> This is the second backport release in the Pacific stable series. For a
> detailed release notes with links & changelog please refer to the
> official blog entry at https://ceph.io/releases/v16-2-2-pacific-released
>
> Notable Changes
> ---
> * Cephadm now supports an *ingress* service type that provides load
> balancing and HA (via haproxy and keepalived on a virtual IP) for RGW
> service.  The experimental *rgw-ha* service has been removed.
>
> Getting Ceph
> 
> * Git at git://github.com/ceph/ceph.git
> * Tarball at http://download.ceph.com/tarballs/ceph-16.2.2.tar.gz
> * For packages, see http://docs.ceph.com/docs/master/install/get-packages/
> * Release git sha1: e8f22dde28889481f4dda2beb8a07788204821d3


[ceph-users] Which verison of ceph is better

2021-10-18 Thread norman.kern

Hi guys,

I have been on a long holiday since this summer. I came back to set up a new
Ceph cluster, and I want to know which stable version of Ceph you're using in
production.



[ceph-users] How many data disks share one meta disks is better

2021-11-19 Thread norman.kern

Hi guys,

I have some SATA SSDs (400G) and HDDs (8T). How many HDDs (data) should share
one SSD (DB/WAL)?

And if the SSD breaks down, will it take down all the OSDs that share it?

Waiting for your replies.


[ceph-users] Re: How many data disks share one meta disks is better

2021-11-19 Thread norman.kern

Hi Anthony,

Thanks for your reply. If the SSD fails, do I have to rebuild the 3-4 OSDs
and rebalance the data?
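For what it's worth, provisioning a shared-DB layout like this can be done in one step with ceph-volume; a sketch with assumed device names (four HDD data devices sharing one SSD for block.db):

```shell
ceph-volume lvm batch /dev/sdb /dev/sdc /dev/sdd /dev/sde \
    --db-devices /dev/sdf
```

ceph-volume splits the SSD into one block.db LV per HDD, which is also the unit you would have to recreate if the SSD dies.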

On 2021/11/20 2:27 PM, Anthony D'Atri wrote:



On Nov 19, 2021, at 10:25 PM, norman.kern  wrote:

Hi guys,

Nowadays, I have some SATA SSDs(400G) and HDDs(8T),How many HHDs(Data)
share one SSD(DB&Journal) is better?

With that mix people often do 3-4 to 1.


And If the SSD is broken down, it will cause all OSDs which share it down?

yes.


Wait for your replies.


[ceph-users] Large latency for single thread

2021-12-15 Thread norman.kern
I created an RBD pool using only two SATA SSDs (one for data, the other for
the database/WAL), and set the replica size to 1.

After that, I ran a fio test on the same host the OSD is placed on. I found
the latency is hundreds of microseconds (versus sixty microseconds for the
raw SATA SSD).
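The job name suggests 4k sequential writes at queue depth 1 from a single job. A hedged reconstruction of such a fio invocation (the rbd ioengine and the pool, image, and client names are assumptions, not taken from the original test):

```shell
fio --name=m-seqwr-004k-001q-001j \
    --ioengine=rbd --clientname=admin --pool=rbd --rbdname=testimg \
    --rw=write --bs=4k --iodepth=1 --numjobs=1 \
    --direct=1 --time_based --runtime=180
```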


The fio output:

m-seqwr-004k-001q-001j: (groupid=0, jobs=1): err= 0: pid=46: Wed Dec 15 14:05:32 2021
  write: IOPS=794, BW=3177KiB/s (3254kB/s)(559MiB/180002msec); 0 zone resets
    slat (usec): min=4, max=123, avg=22.30, stdev= 9.18
    clat (usec): min=630, max=16977, avg=1232.89, stdev=354.67
     lat (usec): min=639, max=17009, avg=1255.19, stdev=358.99
    clat percentiles (usec):
     |  1.00th=[  709],  5.00th=[  775], 10.00th=[  824], 20.00th=[  906],
     | 30.00th=[ 1074], 40.00th=[ 1172], 50.00th=[ 1237], 60.00th=[ 1303],
     | 70.00th=[ 1369], 80.00th=[ 1450], 90.00th=[ 1565], 95.00th=[ 1663],
     | 99.00th=[ 2606], 99.50th=[ 3261], 99.90th=[ 3785], 99.95th=[ 3949],
     | 99.99th=[ 6718]
   bw (  KiB/s): min= 1928, max= 5048, per=100.00%, avg=3179.54, stdev=588.79, samples=360
   iops        : min=  482, max= 1262, avg=794.76, stdev=147.20, samples=360
  lat (usec)   : 750=2.98%, 1000=22.41%
  lat (msec)   : 2=73.38%, 4=1.18%, 10=0.04%, 20=0.01%
  cpu          : usr=2.69%, sys=1.78%, ctx=145218, majf=0, minf=2
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,142985,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1


Parts of the OSD's perf status:

 "state_io_done_lat": {
    "avgcount": 151295,
    "sum": 0.336297058,
    "avgtime": 0.0
    },
    "state_kv_queued_lat": {
    "avgcount": 151295,
    "sum": 18.812333051,
    "avgtime": 0.000124342
    },
    "state_kv_commiting_lat": {
    "avgcount": 151295,
    "sum": 64.555436175,
    "avgtime": 0.000426685
    },
    "state_kv_done_lat": {
    "avgcount": 151295,
    "sum": 0.130403628,
    "avgtime": 0.00861
    },
    "state_deferred_queued_lat": {
    "avgcount": 148,
    "sum": 215.726286547,
    "avgtime": 1.457610044
    },

... ...

    "op_w_latency": {
    "avgcount": 151133,
    "sum": 130.134246667,
    "avgtime": 0.000861057
    },
    "op_w_process_latency": {
    "avgcount": 151133,
    "sum": 125.301196872,
    "avgtime": 0.000829079
    },
    "op_w_prepare_latency": {
    "avgcount": 151133,
    "sum": 29.892687947,
    "avgtime": 0.000197790
    },

Is this reasonable for this benchmark case, and how can I improve it?
It's really not friendly to single-threaded workloads.





[ceph-users] Re: Large latency for single thread

2021-12-21 Thread norman.kern

Marc,

Thanks for your reply. The wiki page is very helpful to me. I have analyzed
the I/O flow and intend to optimize the librbd client.

I also found that RBD supports a persistent cache
(https://docs.ceph.com/en/pacific/rbd/rbd-persistent-write-back-cache/), and
I will give it a try.

P.S. Does anyone have best practices for the RBD persistent cache? I'm not
sure whether it's stable or not; little information can be found on
docs.ceph.com.
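For reference, the persistent write-back cache is enabled on the client side via config options; a hedged example (option names are from the Pacific documentation, the cache path and size are assumptions for illustration):

```shell
ceph config set client rbd_plugins pwl_cache
ceph config set client rbd_persistent_cache_mode ssd        # or "rwl" on PMEM
ceph config set client rbd_persistent_cache_path /mnt/pwl-cache
ceph config set client rbd_persistent_cache_size 10G
```

The image also needs exclusive-lock enabled, since the cache is tied to the lock owner.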



On 12/16/21 2:44 AM, Marc wrote:

Is this not just inherent to SDS? And wait for the new osd code, I think they 
are working on it.

https://yourcmc.ru/wiki/Ceph_performance







[ceph-users] Re: Large latency for single thread

2021-12-21 Thread norman.kern

Mark,

Thanks for your reply. I ran the test on the local host, with no replica PGs.
Crimson may help me a lot, and I will do more tests.

I will also try the RBD persistent cache feature, since the client is
sensitive to latency.

P.S. Can Crimson be used in production yet?

On 12/16/21 3:53 AM, Mark Nelson wrote:
FWIW, we ran single OSD, iodepth=1 O_DSYNC write tests against classic 
and crimson bluestore OSDs in our Q3 crimson slide deck. You can see 
the results starting on slide 32 here:



https://docs.google.com/presentation/d/1eydyAFKRea8n-VniQzXKW8qkKM9GLVMJt2uDjipJjQA/edit#slide=id.gf880cf6296_1_73 




That was with the OSD restricted to 2 cores, but for these tests it 
shouldn't really matter.  Also keep in mind that the fio client was on 
localhost as well.  Note that Crimson is less efficient than the 
classic OSD in this test (while being more efficient in other tests) 
because the reactor is working in a tight loop to reduce latency and 
since the OSD isn't doing a ton of IO that ends up dominating in terms 
of CPU usage.  Seastar provides an option to have the reactor be a bit 
more lazy that lowers idle CPU consumption but we don't utilize it yet.



Running with replication across multiple OSDs (that requires round 
trips to multiple replicas) does make this tougher to do well on a 
real cluster.  I suspect that long term crimson should be better at 
this kind of workload vs classic, but with synchronous replication 
we're always going to be fighting against the slowest link.



Mark

On 12/15/21 12:44 PM, Marc wrote:
Is this not just inherent to SDS? And wait for the new osd code, I 
think they are working on it.


https://yourcmc.ru/wiki/Ceph_performance






[ceph-users] Re: Where do I find information on the release timeline for quincy?

2021-12-22 Thread norman.kern

Joshua,

Quincy should be released in March 2022. You can find the release cycle and
standards at https://docs.ceph.com/en/latest/releases/general/


Norman

Best regards

On 12/22/21 9:37 PM, Joshua West wrote:

Where do I find information on the release timeline for quincy?
I learned a lesson some time ago with regard to building from source
and accidentally upgrading my cluster to the dev branch. whoops.

Just wondering if there is a published timeline on the next major
release, so I can figure out my game plan from here.

Joshua West
~Small Cluster Hobby User


[ceph-users] Re: min_size ambiguity

2021-12-22 Thread norman.kern

Chad,

As the documentation notes, min_size is the "minimum number of replicas to
serve the request", so I/O is blocked when the number of available replicas
in a PG falls below min_size.
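The value can be inspected and adjusted per pool; for example (the pool name is a placeholder):

```shell
ceph osd pool get mypool min_size
ceph osd pool set mypool min_size 2
```

The usual guidance for size=3 pools is min_size=2, so a single failed replica neither blocks I/O nor allows writes with only one copy.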


Norman

Best regards

On 12/17/21 10:59 PM, Chad William Seys wrote:

ill open an issue to h



[ceph-users] Cephadm is stable or not in product?

2022-03-07 Thread norman.kern

Dear Ceph folks,

Is anyone using cephadm in production (version: Pacific)? I found several
bugs in it, and I really have doubts about it.



[ceph-users] Re: Cephadm is stable or not in product?

2022-03-09 Thread norman.kern

Martin,

Thanks for your reply. I watched your video on the website and had a try with
croit. It's a very good product at first sight.

Much better than Ceph's default dashboard. Good job :).

On 3/8/22 2:26 PM, Martin Verges wrote:

Some say it is, some say it's not.
Every time I try it, it's buggy as hell and I can destroy my test clusters
with ease. That's why I still avoid it. But as you can see in my signature,
I am biased ;).

--
Martin Verges
Managing director

Mobile: +49 174 9335695  | Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx


On Tue, 8 Mar 2022 at 05:18, norman.kern  wrote:


Dear Ceph folks,

Anyone is using cephadm in product(Version: Pacific)? I found several bugs
on it and
I really doubt it.



[ceph-users] Re: Scrubbing

2022-03-09 Thread norman.kern

Ray,

Can you provide more information about your cluster (hardware and software
configs)?

On 3/10/22 7:40 AM, Ray Cunningham wrote:

  make any difference. Do



[ceph-users] Re: Scrubbing

2022-03-10 Thread norman.kern

Ray,

You can use node-exporter + Prometheus + Grafana to collect CPU load
statistics over time. You can use the uptime command to see the current load
averages.

On 3/10/22 10:51 PM, Ray Cunningham wrote:

From:

osd_scrub_load_threshold
The normalized maximum load. Ceph will not scrub when the system load (as 
defined by getloadavg() / number of online CPUs) is higher than this number. 
Default is 0.5.

Does anyone know how I can run getloadavg() / number of online CPUs so I can 
see what our load is? Is that a ceph command, or an OS command?
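It's an OS-level value, not a Ceph command: getloadavg() is just the 1-minute load average, divided by the number of online CPUs. A quick way to compute the same normalized figure on Linux:

```shell
#!/bin/sh
# Normalized load as Ceph compares it against osd_scrub_load_threshold:
# 1-minute load average divided by the number of online CPUs.
loadavg=$(cut -d ' ' -f1 /proc/loadavg)       # same source as getloadavg()
ncpu=$(getconf _NPROCESSORS_ONLN)             # online CPU count
awk -v l="$loadavg" -v n="$ncpu" 'BEGIN { printf "normalized load: %.2f\n", l / n }'
```

If the printed value stays above your osd_scrub_load_threshold, the OSDs will keep deferring scrubs.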

Thank you,
Ray


-Original Message-
From: Ray Cunningham
Sent: Thursday, March 10, 2022 7:59 AM
To: norman.kern 
Cc: ceph-users@ceph.io
Subject: RE: [ceph-users] Scrubbing


We have 16 Storage Servers each with 16TB HDDs and 2TB SSDs for DB/WAL, so we 
are using bluestore. The system is running Nautilus 14.2.19 at the moment, with 
an upgrade scheduled this month. I can't give you a complete ceph config dump 
as this is an offline customer system, but I can get answers for specific 
questions.

Off the top of my head, we have set:

osd_max_scrubs 20
osd_scrub_auto_repair true
osd_scrub_load_threshold 0.6
We do not limit scrub hours.

Thank you,
Ray




-Original Message-
From: norman.kern 
Sent: Wednesday, March 9, 2022 7:28 PM
To: Ray Cunningham 
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Scrubbing

Ray,

Can you  provide more information about your cluster(hardware and software 
configs)?

On 3/10/22 7:40 AM, Ray Cunningham wrote:

   make any difference. Do



[ceph-users] Re: Scrubbing

2022-03-10 Thread norman.kern

Ray,

Do you know the IOPS/BW of the cluster? A 16TB HDD is more suitable for cold
data; if the clients' BW/IOPS are too high, you can never finish the scrub.

And if you raise the scrub priority, it will have a great impact on the
clients.
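If client impact is the concern, one hedged mitigation is to confine scrubbing to off-peak hours and keep concurrency low; the values below are illustrative, not a recommendation tuned for this cluster:

```shell
ceph config set osd osd_scrub_begin_hour 22   # only start scrubs at night
ceph config set osd osd_scrub_end_hour 6
ceph config set osd osd_max_scrubs 1          # one concurrent scrub per OSD
```

Note this is the opposite trade-off to osd_max_scrubs 20: scrubs take longer to complete, but clients see less interference.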

On 3/10/22 9:59 PM, Ray Cunningham wrote:

We have 16 Storage Servers each with 16TB HDDs and 2TB SSDs for DB/WAL, so we 
are using bluestore. The system is running Nautilus 14.2.19 at the moment, with 
an upgrade scheduled this month. I can't give you a complete ceph config dump 
as this is an offline customer system, but I can get answers for specific 
questions.

Off the top of my head, we have set:

osd_max_scrubs 20
osd_scrub_auto_repair true
osd_scrub_load_threshold 0.6
We do not limit scrub hours.

Thank you,
Ray




-Original Message-----
From: norman.kern 
Sent: Wednesday, March 9, 2022 7:28 PM
To: Ray Cunningham 
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Scrubbing

Ray,

Can you  provide more information about your cluster(hardware and software 
configs)?

On 3/10/22 7:40 AM, Ray Cunningham wrote:

   make any difference. Do



[ceph-users] which cdn tool for rgw in production

2022-04-18 Thread norman.kern

Hi guys,

I want to put a CDN service in front of my RGWs and provide URLs without
authentication.

I have tested OpenResty, but I'm not sure it is suitable for production.
Which tool do you use in production?
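To serve objects without credentials, the usual approach is to make the bucket or objects publicly readable so the CDN can fetch plain HTTP URLs from RGW; a hedged example with s3cmd (the bucket name is a placeholder):

```shell
s3cmd setacl s3://public-assets --acl-public --recursive
# objects are then reachable at http://<rgw-endpoint>/public-assets/<key>
```

The CDN or reverse proxy then only needs to cache those anonymous GETs.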

Thanks.



[ceph-users] Re: Is Ceph with rook ready for production?

2022-07-04 Thread norman.kern

I used Rook for a year; it's really easy to manage a Ceph cluster with it,
but I stopped using it, because a Ceph cluster is complicated enough and I
didn't want to make it more complicated with k8s.

If you go that way, you have to know Ceph + k8s + Rook, and each module can
cause problems.


On 7/4/22 2:15 PM, Szabo, Istvan (Agoda) wrote:

Hi,

Is ceph with rook ready for production?
Not really clear based on this documentation because of the development 
section:  https://docs.ceph.com/en/octopus/mgr/rook/ and 
https://docs.ceph.com/en/octopus/dev/kubernetes/#kubernetes-dev

"This is not official user documentation for setting up production Ceph clusters 
with Kubernetes. It is aimed at developers who want to hack on Ceph in Kubernetes."

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---


