You're missing most of the important bits: what the OSDs in your cluster look like, your CRUSH tree, and your cache pool settings. Please post the output of:

ceph df
ceph osd df
ceph osd tree
ceph osd pool get cephfs_cache all
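That last command is the one I'd look at first. The settings I care about are the flushing/eviction thresholds on the cache pool; as far as I know, the tiering agent won't flush or evict anything until something along these lines has been set (pool name taken from your setup, the values here are purely illustrative):

ceph osd pool set cephfs_cache target_max_bytes 500000000000
ceph osd pool set cephfs_cache cache_target_dirty_ratio 0.4
ceph osd pool set cephfs_cache cache_target_full_ratio 0.8

If none of those were ever set, the cache will just fill until the OSDs behind it hit the full ratio, which looks a lot like what you're seeing.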
You have your writeback cache on 3 NVMe drives. It looks like you have 1.6TB available between them for the cache. I don't know the behavior of a writeback cache tier on CephFS for large files, but I would guess that it can only hold full files and not flush partial files. That would mean your cache needs enough space for any file being written to the cluster; in this case, a 1.3TB file with 3x replication would require 3.9TB of space in your writeback cache, more than double what you have available.

There are very few use cases that benefit from a cache tier, and the Luminous docs warn as much. What is your goal in implementing this cache? If the answer is to utilize the extra space on the NVMes, then just remove it and say thank you. A better use of the NVMes in that case is as part of the BlueStore stack, giving your OSDs larger DB partitions. Keeping your metadata pool on the NVMes is still a good idea.
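If you do go that route, removing a writeback tier is roughly this sequence (a sketch from memory; double-check it against the Luminous cache-tiering docs and make sure the flush/evict actually finishes before you pull the overlay):

ceph osd tier cache-mode cephfs_cache forward --yes-i-really-mean-it
rados -p cephfs_cache cache-flush-evict-all
ceph osd tier remove-overlay cephfs_data
ceph osd tier remove cephfs_data cephfs_cache

After that the cache pool can be deleted and the NVMes redeployed, for example as block.db devices for the HDD OSDs.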
On Thu, Oct 5, 2017, 7:45 PM Shawfeng Dong <s...@ucsc.edu> wrote:

> Dear all,
>
> We just set up a Ceph cluster, running the latest stable release Ceph
> v12.2.0 (Luminous):
> # ceph --version
> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
>
> The goal is to serve Ceph filesystem, for which we created 3 pools:
> # ceph osd lspools
> 1 cephfs_data,2 cephfs_metadata,3 cephfs_cache,
> where
> * cephfs_data is the data pool (36 OSDs on HDDs), which is erasure-coded;
> * cephfs_metadata is the metadata pool;
> * cephfs_cache is the cache tier (3 OSDs on NVMes) for cephfs_data. The
>   cache-mode is writeback.
>
> Everything had worked fine, until today when we tried to copy a 1.3TB file
> to the CephFS. We got the "No space left on device" error!
>
> 'ceph -s' says some OSDs are full:
> # ceph -s
>   cluster:
>     id:     e18516bf-39cb-4670-9f13-88ccb7d19769
>     health: HEALTH_ERR
>             full flag(s) set
>             1 full osd(s)
>             1 pools have many more objects per pg than average
>
>   services:
>     mon: 3 daemons, quorum pulpo-admin,pulpo-mon01,pulpo-mds01
>     mgr: pulpo-mds01(active), standbys: pulpo-admin, pulpo-mon01
>     mds: pulpos-1/1/1 up {0=pulpo-mds01=up:active}
>     osd: 39 osds: 39 up, 39 in
>          flags full
>
>   data:
>     pools:   3 pools, 2176 pgs
>     objects: 347k objects, 1381 GB
>     usage:   2847 GB used, 262 TB / 265 TB avail
>     pgs:     2176 active+clean
>
>   io:
>     client: 19301 kB/s rd, 2935 op/s rd, 0 op/s wr
>
> And indeed the cache pool is full:
> # rados df
> POOL_NAME       USED  OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS   RD    WR_OPS  WR
> cephfs_cache    1381G  355385      0 710770                  0       0        0 10004954 1522G 1398063  1611G
> cephfs_data         0       0      0      0                  0       0        0        0     0       0      0
> cephfs_metadata 8515k      24      0     72                  0       0        0        3  3072    3953 10541k
>
> total_objects   355409
> total_used      2847G
> total_avail     262T
> total_space     265T
>
> However, the data pool is completely empty! So it seems that data has only
> been written to the cache pool, but not written back to the data pool.
>
> I am really at a loss whether this is due to a setup error on my part, or
> a Luminous bug. Could anyone shed some light on this? Please let me know if
> you need any further info.
>
> Best,
> Shaw