** Description changed:

  [Impact]
  
  For certain workloads, bluefs might never compact its log file, which
  causes the log file to slowly grow to a huge size (in some cases larger
  than 1TB on a 1.5TB device).
  
  The bluefs perf counters captured when the issue occurred give more
  detail, e.g.:
  "bluefs": {
  "gift_bytes": 811748818944,
  "reclaim_bytes": 0,
  "db_total_bytes": 888564350976,
  "db_used_bytes": 867311747072,
  "wal_total_bytes": 0,
  "wal_used_bytes": 0,
  "slow_total_bytes": 0,
  "slow_used_bytes": 0,
  "num_files": 11,
  "log_bytes": 866545131520,
  "log_compactions": 0,
  "logged_bytes": 866542977024,
  "files_written_wal": 2,
  "files_written_sst": 3,
  "bytes_written_wal": 32424281934,
  "bytes_written_sst": 25382201
  }
  
  This bug can eventually cause the OSD to crash and then fail to restart,
  because it cannot get through the bluefs replay phase at boot time.
  When trying to restart the OSD, the log may show:
  bluefs mount failed to replay log: (5) Input/output error
  
  As the counters above show, log_compactions is 0, which means the log has
  never been compacted, and the log file size (log_bytes) is already 800+G.
  After compaction, the log file size should drop to around 1G.
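
The reading of the counters above can be sketched as a small check. This is only an illustration: the counter values are copied from the report, while the function name `bluefs_log_report` and the 0.5 threshold are hypothetical choices, not part of Ceph. On a live system the same JSON would typically come from the OSD admin socket (`ceph daemon osd.<id> perf dump`).

```python
import json

# Counter values copied verbatim from the report above (subset).
PERF_DUMP = json.loads("""
{
  "bluefs": {
    "db_total_bytes": 888564350976,
    "db_used_bytes": 867311747072,
    "log_bytes": 866545131520,
    "log_compactions": 0,
    "logged_bytes": 866542977024
  }
}
""")


def bluefs_log_report(perf, max_log_fraction=0.5):
    """Summarise how much of the DB device the bluefs log occupies.

    max_log_fraction is an arbitrary, hypothetical alert threshold.
    """
    b = perf["bluefs"]
    log_bytes = b["log_bytes"]
    db_total = b["db_total_bytes"]
    fraction = log_bytes / db_total if db_total else 0.0
    never_compacted = b["log_compactions"] == 0
    return {
        "log_gib": log_bytes / 2**30,
        "log_fraction_of_db": fraction,
        # Space used on the DB device by everything except the log.
        "non_log_used_bytes": b["db_used_bytes"] - log_bytes,
        "never_compacted": never_compacted,
        "suspect": never_compacted and fraction > max_log_fraction,
    }


report = bluefs_log_report(PERF_DUMP)
for key, value in report.items():
    print(f"{key}: {value}")
```

With the values from this report, the log accounts for almost all of db_used_bytes, which matches the observation that the log was never compacted.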
  
  [Test Case]
  
  Deploy a test Ceph cluster (Luminous 12.2.13, which has the bug) and
  drive I/O. Compaction is not triggered often when most I/O is reads, so
  first fill the cluster with lots of writes and then switch to a
  read-heavy workload (no writes). The problem should then occur.
  Small OSDs are fine, as we're only interested in filling up the OSD
  and growing the bluefs log.
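
One possible way to drive the two phases described above is `rados bench`. The report does not say which tool was used, so this is an assumption. The sketch below only builds the command lines and does not run them. The pool name, durations and thread count are hypothetical placeholders.

```python
def build_workload(pool, fill_seconds, read_seconds, threads=16):
    """Return the two rados bench invocations: write fill, then reads only."""
    # Phase 1: fill the cluster with writes; --no-cleanup keeps the
    # objects in place so the read phase has data to read.
    fill = ["rados", "bench", "-p", pool, str(fill_seconds),
            "write", "-t", str(threads), "--no-cleanup"]
    # Phase 2: random reads only, no writes.
    read = ["rados", "bench", "-p", pool, str(read_seconds),
            "rand", "-t", str(threads)]
    return [fill, read]


# Hypothetical pool name and durations, for illustration only.
commands = build_workload("testpool", fill_seconds=3600, read_seconds=86400)
for cmd in commands:
    print(" ".join(cmd))
```

While the read phase runs, log_bytes and log_compactions in the bluefs perf counters can be watched to see whether the log keeps growing without compaction.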
  
  [Where problems could occur]
  
  This fix has been part of all upstream releases since Mimic, so it has
  had plenty of runtime.
  The changes ensure that compaction happens more often, which is not
  expected to cause any problems.
  I can't see any real problems.
  
  [Other Info]
-  - It's only needed for Luminous (Bionic). All new releases since have this already.
-  - Upstream PR: https://github.com/ceph/ceph/pull/17354
+  - It's only needed for Luminous (Bionic). All new releases since have this already.
+  - Upstream master PR: https://github.com/ceph/ceph/pull/17354
+  - Upstream Luminous PR: https://github.com/ceph/ceph/pull/34876/files

https://bugs.launchpad.net/bugs/1914911

Title:
  [SRU] bluefs doesn't compact log file
