You're missing most of the important bits: what the OSDs in your cluster
look like, your CRUSH tree, and your cache pool settings. Please post the
output of:

ceph df
ceph osd df
ceph osd tree
ceph osd pool get cephfs_cache all

You have your writeback cache on 3 NVMe drives. It looks like you have
1.6TB available between them for the cache. I don't know the behavior of a
writeback cache tier on CephFS for large files, but I would guess that it
can only hold full files and cannot flush partial files. That would mean your
cache needs enough space for any single file being written to the cluster.
In this case a 1.3TB file with 3x replication would require 3.9TB of space
in your writeback cache, more than double what you have available.
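The 'ceph osd pool get cephfs_cache all' output I asked for should also show
whether the flush/evict thresholds were ever set; if I remember right, the
tiering agent won't flush or evict anything at all while target_max_bytes and
target_max_objects are unset. As a quick sanity check (pool name taken from
your lspools output), I'd look at:

ceph osd pool get cephfs_cache target_max_bytes
ceph osd pool get cephfs_cache target_max_objects
ceph osd pool get cephfs_cache cache_target_dirty_ratio
ceph osd pool get cephfs_cache cache_target_full_ratio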

There are very few use cases that benefit from a cache tier; the Luminous
docs warn as much. What is your goal in implementing this cache? If the
answer is to utilize extra space on the NVMes, then just remove it and say
thank you. A better use of the NVMes in that case is as part of the
BlueStore stack, giving your OSDs larger DB partitions. Keeping your
metadata pool on the NVMes is still a good idea.
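If you do decide to drop the cache tier, the general sequence from the
cache-tiering docs looks roughly like the sketch below (pool names taken
from your lspools output); please verify it against the Luminous docs for
your setup before running anything:

# stop new writes from landing in the cache pool
ceph osd tier cache-mode cephfs_cache forward --yes-i-really-mean-it
# flush and evict everything still sitting in the cache pool
rados -p cephfs_cache cache-flush-evict-all
# detach the overlay and remove the tier relationship
ceph osd tier remove-overlay cephfs_data
ceph osd tier remove cephfs_data cephfs_cache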

On Thu, Oct 5, 2017, 7:45 PM Shawfeng Dong <s...@ucsc.edu> wrote:

> Dear all,
>
> We just set up a Ceph cluster, running the latest stable release Ceph
> v12.2.0 (Luminous):
> # ceph --version
> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous
> (rc)
>
> The goal is to serve Ceph filesystem, for which we created 3 pools:
> # ceph osd lspools
> 1 cephfs_data,2 cephfs_metadata,3 cephfs_cache,
> where
> * cephfs_data is the data pool (36 OSDs on HDDs), which is erasure-coded;
> * cephfs_metadata is the metadata pool
> * cephfs_cache is the cache tier (3 OSDs on NVMes) for cephfs_data. The
> cache-mode is writeback.
>
> Everything had worked fine, until today when we tried to copy a 1.3TB file
> to the CephFS.  We got the "No space left on device" error!
>
> 'ceph -s' says some OSDs are full:
> # ceph -s
>   cluster:
>     id:     e18516bf-39cb-4670-9f13-88ccb7d19769
>     health: HEALTH_ERR
>             full flag(s) set
>             1 full osd(s)
>             1 pools have many more objects per pg than average
>
>   services:
>     mon: 3 daemons, quorum pulpo-admin,pulpo-mon01,pulpo-mds01
>     mgr: pulpo-mds01(active), standbys: pulpo-admin, pulpo-mon01
>     mds: pulpos-1/1/1 up  {0=pulpo-mds01=up:active}
>     osd: 39 osds: 39 up, 39 in
>          flags full
>
>   data:
>     pools:   3 pools, 2176 pgs
>     objects: 347k objects, 1381 GB
>     usage:   2847 GB used, 262 TB / 265 TB avail
>     pgs:     2176 active+clean
>
>   io:
>     client:   19301 kB/s rd, 2935 op/s rd, 0 op/s wr
>
> And indeed the cache pool is full:
> # rados df
> POOL_NAME       USED  OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED   RD_OPS    RD  WR_OPS     WR
> cephfs_cache    1381G  355385      0 710770                  0       0        0 10004954 1522G 1398063  1611G
> cephfs_data         0       0      0      0                  0       0        0        0     0       0      0
> cephfs_metadata 8515k      24      0     72                  0       0        0        3  3072    3953 10541k
>
> total_objects    355409
> total_used       2847G
> total_avail      262T
> total_space      265T
>
> However, the data pool is completely empty! So it seems that data has only
> been written to the cache pool, but not written back to the data pool.
>
> I am really at a loss whether this is due to a setup error on my part, or
> a Luminous bug. Could anyone shed some light on this? Please let me know if
> you need any further info.
>
> Best,
> Shaw
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
