Anton Vinogradov created IGNITE-28843:
-----------------------------------------

             Summary: Improve diagnostics when snapshot/dump directory creation fails
                 Key: IGNITE-28843
                 URL: https://issues.apache.org/jira/browse/IGNITE-28843
             Project: Ignite
          Issue Type: Task
            Reporter: Anton Vinogradov
            Assignee: Anton Vinogradov


When snapshot or dump directory creation fails, Ignite reports only the
target path with no OS-level cause, e.g.:

  class org.apache.ignite.IgniteCheckedException: Dump directory can't be
  created: /opt/ignite/nvme5/snapshot/dump_.../db/cell_2_node_2/
  cache-replication.replications_v1

The real reason (permissions, no space, read-only FS, a file where a
directory is expected, missing mount) is lost because the code uses
java.io.File.mkdirs(), which returns a boolean and discards the cause.
mkdirs() also returns false when the directory already exists, which can
produce a misleading error on retry.
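
The two behaviours described above can be reproduced with plain JDK calls.
This is a minimal sketch, not Ignite code: the class and method names
(MkdirsDemo, mkdirsOnExisting, mkdirsUnderFile) are made up for
illustration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class MkdirsDemo {
    /** Calls File.mkdirs() on a directory that already exists. */
    public static boolean mkdirsOnExisting() throws IOException {
        File dir = Files.createTempDirectory("mkdirs-demo").toFile();
        try {
            // Nothing is created here, so mkdirs() reports false.
            return dir.mkdirs();
        } finally {
            dir.delete();
        }
    }

    /** Calls File.mkdirs() on a path whose parent is a regular file. */
    public static boolean mkdirsUnderFile() throws IOException {
        File file = Files.createTempFile("mkdirs-demo", ".tmp").toFile();
        try {
            // Creation fails, but the boolean gives no hint as to why.
            return new File(file, "child").mkdirs();
        } finally {
            file.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("existing directory: " + mkdirsOnExisting()); // false
        System.out.println("file in the path:   " + mkdirsUnderFile());  // false
    }
}
```

Both calls return false, so a caller that only checks the boolean cannot
tell "already there" from "cannot be created".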

Fix: create directories via NIO Files.createDirectories(Path). It is
idempotent for existing directories and throws a typed exception that
carries the OS reason, normally an IOException subclass
(AccessDeniedException; FileSystemException "No space left on device";
NotDirectoryException / FileAlreadyExistsException; NoSuchFileException).
ReadOnlyFileSystemException, where a provider throws it, is unchecked
rather than an IOException. Include the exception class and message in
the thrown Ignite exception and attach the original exception as the
cause.
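
A minimal sketch of the approach, assuming nothing about the actual Ignite
implementation: EnsureDirectorySketch and DirectoryCreationException are
hypothetical names (the latter stands in for IgniteCheckedException so the
example has no Ignite dependency), and the message layout is only modelled
on the example in this issue. Only IOException is wrapped here; unchecked
exceptions propagate unchanged.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class EnsureDirectorySketch {
    /** Hypothetical stand-in for IgniteCheckedException. */
    public static class DirectoryCreationException extends Exception {
        public DirectoryCreationException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    /**
     * Creates the directory and all missing parents. Succeeds silently if the
     * directory already exists; otherwise reports the exception class and
     * message, and keeps the original exception as the cause.
     */
    public static void ensureDirectory(Path dir, String what) throws DirectoryCreationException {
        try {
            Files.createDirectories(dir);
        } catch (IOException e) {
            throw new DirectoryCreationException(
                "Failed to create " + what + " directory: " + dir +
                " [reason=" + e.getClass().getSimpleName() +
                ", detail=" + e.getMessage() + ']', e);
        }
    }

    public static void main(String[] args) throws Exception {
        // A regular file used as a path component forces a failure
        // that does not depend on permissions or on the running user.
        Path file = Files.createTempFile("ensure-dir", ".tmp");
        try {
            ensureDirectory(file.resolve("child"), "dump");
        } catch (DirectoryCreationException e) {
            System.out.println(e.getMessage());
        } finally {
            Files.delete(file);
        }
    }
}
```

The exact exception class and detail text printed by main depend on the
platform and file system, so no fixed output is shown.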

Updated:
- IgniteUtils.ensureDirectory(File, String, IgniteLogger) - shared helper
  used by WAL, binary metadata, maintenance, snapshot and dump config.
- CreateDumpFutureTask.prepare() - the exact spot from the report; now
  routed through ensureDirectory (also removes the inconsistency with
  saveCacheConfigs, which already used ensureDirectory).
- SnapshotFutureTask - temp cache-configuration directory.
- SharedFileTree.mkdir(File, String) - remaining raw-mkdirs throwing site.

Not changed: IgniteUtils.mkdirs(File) keeps its boolean contract - a
boolean cannot carry a reason, and 9 call-sites (3 of them ignoring the
result) rely on it.

Resulting message example:
  Failed to create dump directory: /opt/ignite/nvme5/snapshot/.../
  cell_2_node_2/cache-replication.replications_v1
  [reason=AccessDeniedException, detail=/opt/ignite/.../cell_2_node_2:
  Permission denied]

Testing:
IgniteUtilsUnitTest, 3 new cases (plain JUnit, no grid started):
- ensureDirectoryIsIdempotentForExistingDirectory - an existing directory
  must not raise an error (guards the mkdirs()-returns-false regression).
- ensureDirectorySurfacesReasonAndCauseWhenPathComponentIsFile - the
  message contains "reason=" and the cause is an IOException
  (deterministic, independent of permissions or the running user).
- ensureDirectoryReportsAccessDeniedWhenParentNotWritable - a non-writable
  parent yields AccessDeniedException in both the message and the cause.
  Guarded with assumeTrue(POSIX) and an assumeFalse writable-probe so it
  self-skips on non-POSIX filesystems or when running as root.
Verified locally against the compiled class: OK (3 tests).
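
The self-skipping logic of the third case can be sketched without JUnit or
Ignite. This is an illustration only: AccessDeniedCheckSketch and run() are
hypothetical names, the check calls Files.createDirectories directly rather
than ensureDirectory, and using Files.isWritable as the writable-probe is
an assumption about how such a probe might look.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

public class AccessDeniedCheckSketch {
    /** Returns "SKIPPED", "PASSED", or "FAILED: ..." with a short reason. */
    public static String run() throws IOException {
        // Guard 1: skip when the default file system has no POSIX permissions.
        if (!FileSystems.getDefault().supportedFileAttributeViews().contains("posix"))
            return "SKIPPED";

        Path parent = Files.createTempDirectory("ro-parent");
        Path child = parent.resolve("child");
        try {
            Files.setPosixFilePermissions(parent, PosixFilePermissions.fromString("r-xr-xr-x"));

            // Guard 2: skip when the parent is still writable (e.g. running as root).
            if (Files.isWritable(parent))
                return "SKIPPED";

            try {
                Files.createDirectories(child);
                return "FAILED: directory was created";
            } catch (AccessDeniedException e) {
                return "PASSED";
            } catch (IOException e) {
                return "FAILED: unexpected " + e.getClass().getSimpleName();
            }
        } finally {
            // Restore permissions so the temporary directory can be removed.
            Files.setPosixFilePermissions(parent, PosixFilePermissions.fromString("rwx------"));
            Files.deleteIfExists(child);
            Files.delete(parent);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run());
    }
}
```

In a JUnit test the two guards would map to assumeTrue and assumeFalse, so
the case is reported as skipped rather than passed.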



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
