Yes:

Two days ago I did a complete re-deployment of ceph from my test cluster to a 
production cluster. As part of this re-deployment I also added the following to 
my ceph.conf:

[osd]
bluestore compression mode = aggressive
bluestore compression min blob size hdd = 262144

Apparently, cephfs and rbd clients do not provide hints to ceph about blob 
sizes, so for these apps bluestore will (at least in current versions) always 
compress blobs of size bluestore_compression_min_blob_size_hdd. The best 
achievable compression ratio is 
bluestore_compression_min_blob_size_hdd/bluestore_min_alloc_size_hdd.

I did not want to reduce the default of bluestore_min_alloc_size_hdd = 64KB and 
only increased bluestore_compression_min_blob_size_hdd to 
4*bluestore_min_alloc_size_hdd (262144), which means that for large 
compressible files the best ratio is 4. I tested this with a 1TB file of zeros 
and it works.
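
For reference, a sketch of how the result can be checked on an individual OSD; 
osd.0 is just an example, and the counters sit in the "bluestore" section of 
the perf dump:

    # on the OSD host, via the admin socket
    ceph daemon osd.0 perf dump | grep -E 'bluestore_compressed'
    # bluestore_compressed_allocated = space allocated for compressed blobs,
    # bluestore_compressed_original  = logical data that went through compression;
    # with the settings above the allocated/original ratio should approach
    # 0.25 (i.e. a factor of 4) for highly compressible data.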

I'm not sure what the performance impact and RocksDB overhead implications of 
all the bluestore options are. Pretty much none of this is easy to find in the 
documentation. I will keep watching how the above works in practice and maybe 
run some more advanced experiments later.

Best regards,

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Ragan, Tj 
(Dr.) <tj.ra...@leicester.ac.uk>
Sent: 14 March 2019 11:22:07
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] bluestore compression enabled but no data compressed

Hi Frank,

Did you ever get the 0.5 compression ratio thing figured out?

Thanks
-TJ Ragan


On 23 Oct 2018, at 16:56, Igor Fedotov <ifedo...@suse.de> wrote:

Hi Frank,


On 10/23/2018 2:56 PM, Frank Schilder wrote:
Dear David and Igor,

thank you very much for your help. I have one more question about chunk sizes 
and data granularity on bluestore and will summarize the information I got on 
bluestore compression at the end.

1) Compression ratio
---------------------------

Following Igor's explanation, I tried to understand the numbers for 
compressed_allocated and compressed_original and am somewhat stuck with 
figuring out how bluestore arithmetic works. I created a 32GB file of zeros 
using dd with write size bs=8M on a cephfs with

    ceph.dir.layout="stripe_unit=4194304 stripe_count=1 object_size=4194304 
pool=con-fs-data-test"
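
For completeness, a sketch of what such a test could look like; the mount point 
and file name are just placeholders:

    # pin the layout of a (still empty) test directory to the EC data pool
    setfattr -n ceph.dir.layout.pool -v con-fs-data-test /mnt/cephfs/compr-test
    setfattr -n ceph.dir.layout.stripe_unit -v 4194304 /mnt/cephfs/compr-test
    setfattr -n ceph.dir.layout.stripe_count -v 1 /mnt/cephfs/compr-test
    setfattr -n ceph.dir.layout.object_size -v 4194304 /mnt/cephfs/compr-test
    # write 32GB of zeros with 8M writes
    dd if=/dev/zero of=/mnt/cephfs/compr-test/zeros.bin bs=8M count=4096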

The data pool is an 8+2 erasure coded pool with properties

    pool 37 'con-fs-data-test' erasure size 10 min_size 9 crush_rule 11 
object_hash rjenkins pg_num 900 pgp_num 900 last_change 9970 flags 
hashpspool,ec_overwrites stripe_width 32768 compression_mode aggressive 
application cephfs

As I understand EC pools, a 4M object is split into 8x0.5M data shards, which 
are stored together with 2x0.5M coding shards, one shard per OSD. So I would 
expect a full object write to put a 512K chunk on each OSD in the PG. Looking 
at some config options of one of the OSDs, I see:

    "bluestore_compression_max_blob_size_hdd": "524288",
    "bluestore_compression_min_blob_size_hdd": "131072",
    "bluestore_max_blob_size_hdd": "524288",
    "bluestore_min_alloc_size_hdd": "65536",

From this, I would conclude that the largest chunk size is 512K, which also 
equals compression_max_blob_size. The minimum allocation size is 64K for any 
object. What I would expect now is that the full object writes to cephfs 
create chunk sizes of 512K per OSD in the PG, meaning that with an all-zero 
file I should observe a compressed_allocated ratio of 64K/512K=0.125 instead 
of the 0.5 reported below. It looks like chunks of 128K are written instead 
of 512K. I'm happy with the 64K granularity, but the observed maximum chunk 
size seems a factor of 4 too small.

Where am I going wrong, what am I overlooking?

Please note how the choice between compression_max_blob_size and 
compression_min_blob_size is made.

The max blob size threshold is mainly applied to objects tagged with flags 
indicating non-random access, e.g. sequential read and/or write, immutable, 
append-only etc.
Here is how it's determined in the code:
  if ((alloc_hints & CEPH_OSD_ALLOC_HINT_FLAG_SEQUENTIAL_READ) &&
      (alloc_hints & CEPH_OSD_ALLOC_HINT_FLAG_RANDOM_READ) == 0 &&
      (alloc_hints & (CEPH_OSD_ALLOC_HINT_FLAG_IMMUTABLE |
                      CEPH_OSD_ALLOC_HINT_FLAG_APPEND_ONLY)) &&
      (alloc_hints & CEPH_OSD_ALLOC_HINT_FLAG_RANDOM_WRITE) == 0) {
    dout(20) << __func__ << " will prefer large blob and csum sizes" << dendl;

This is done to minimize the overhead during future random access, since that 
would require decompressing the full blob.
Hence the min blob size is used for regular random I/O, which is probably your 
case as well.
You can check the bluestore log (once its level is raised to 20) to confirm 
this, e.g. by looking for the following output line:
  dout(20) << __func__ << " prefer csum_order " << wctx->csum_order
           << " target_blob_size 0x" << std::hex << wctx->target_blob_size
           << std::dec << dendl;

So you can simply increase bluestore_compression_min_blob_size_hdd if you want 
longer compressed chunks, with the above-mentioned penalty on subsequent access 
though.
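
A sketch of how one might raise the log level on a single OSD to look for that 
line; osd.0 and the log path are just examples, and the level should be turned 
back down afterwards:

    # raise bluestore debug logging on one OSD
    ceph tell osd.0 injectargs '--debug-bluestore 20'
    # look for the chosen target blob size in that OSD's log
    grep 'target_blob_size' /var/log/ceph/ceph-osd.0.log
    # restore the default level
    ceph tell osd.0 injectargs '--debug-bluestore 1/5'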

2) Bluestore compression configuration
---------------------------------------------------

If I understand David correctly, pool and OSD settings do *not* override each 
other, but are rather *combined* into a resulting setting as follows. Let

    0 - (n)one
    1 - (p)assive
    2 - (a)ggressive
    3 - (f)orce

    ? - (u)nset

be the 4+1 possible settings of compression modes with numeric values assigned 
as shown. Then, the resulting numeric compression mode for data in a pool on a 
specific OSD is

    res_compr_mode = min(mode OSD, mode pool)

or in form of a table:

              pool
         | n  p  a  f  u
       --+--------------
       n | n  n  n  n  n
    O  p | n  p  p  p  ?
    S  a | n  p  a  a  ?
    D  f | n  p  a  f  ?
       u | n  ?  ?  ?  u

which would allow for the flexible configuration as mentioned by David below.

I'm actually not sure if I can confirm this. I have some pools where 
compression_mode is not set and which reside on separate OSDs with compression 
enabled, yet there is compressed data on these OSDs. I wonder whether I 
polluted my test with the "ceph config set bluestore_compression_mode 
aggressive" that I executed earlier, or whether my interpretation above is 
still wrong. Does the setting issued with "ceph config set 
bluestore_compression_mode aggressive" apply to pools where 'compression_mode' 
is not set on the pool (see the question marks in the table above; what is the 
resulting mode)?

What I would like to do is enable compression on all OSDs, enable compression 
on all data pools and disable compression on all metadata pools. Data and 
metadata pools might share OSDs in the future. The above table says I should 
be able to do just that by being explicit.
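
A sketch of what "being explicit" could look like; con-fs-data-test is the data 
pool from above, while the metadata pool name is hypothetical:

    # enable compression for all OSDs via the central config
    ceph config set osd bluestore_compression_mode aggressive
    # enable compression explicitly on the data pool
    ceph osd pool set con-fs-data-test compression_mode aggressive
    # disable compression explicitly on the metadata pool (name is made up)
    ceph osd pool set con-fs-meta compression_mode none
    # check what a pool currently has set
    ceph osd pool get con-fs-data-test compression_mode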

Many thanks again and best regards,

Will try to answer later.

Thanks,
Igor

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
