Sunil Kumar created SPARK-18155:
-----------------------------------
Summary: CLONE - HDFSMetadataLog should not leak CRC files
Key: SPARK-18155
URL: https://issues.apache.org/jira/browse/SPARK-18155
Project: Spark
Issue Type: Sub-task
Components: Streaming
Reporter: Sunil Kumar
When HDFSMetadataLog uses a log directory on a filesystem other than HDFS (e.g.
NFS or the driver node's local filesystem), the class leaves orphan checksum
(CRC) files in the log directory. The files have names that follow the pattern
"..[long UUID hex string].tmp.crc". These files exist because HDFSMetadataLog
renames other temporary files without renaming the corresponding checksum
files. There is one CRC file per batch, so the directory fills up quite quickly.
I'm not certain, but this problem might also occur on certain versions of the
HDFS APIs.
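
The leak described above can be reproduced in miniature without Spark or
Hadoop. The following is only a sketch in plain Python, not the actual
HDFSMetadataLog code: write_batch_leaky and find_orphan_crc_files are
hypothetical helper names, the one-byte checksum is a placeholder, and the
sidecar naming (a leading "." plus a ".crc" suffix around the temp file name)
is assumed from the pattern quoted in this report.

```python
import os
import tempfile
import uuid


def write_batch_leaky(log_dir, batch_id, payload):
    """Hypothetical stand-in for the write-then-rename step described above.

    Writes a temp file plus a checksum sidecar, then renames only the temp
    file to its final batch name, leaving the sidecar behind.
    """
    tmp_name = ".{}.tmp".format(uuid.uuid4())
    tmp_path = os.path.join(log_dir, tmp_name)
    # Assumed sidecar naming: "." + <temp file name> + ".crc"
    crc_path = os.path.join(log_dir, "." + tmp_name + ".crc")

    with open(tmp_path, "wb") as f:
        f.write(payload)
    with open(crc_path, "wb") as f:
        f.write(b"\x00")  # placeholder bytes, not a real checksum

    # Only the data file is renamed; the sidecar is never touched again.
    os.rename(tmp_path, os.path.join(log_dir, str(batch_id)))


def find_orphan_crc_files(log_dir):
    """Return sidecar names whose corresponding temp file no longer exists."""
    orphans = []
    for name in sorted(os.listdir(log_dir)):
        if name.startswith("..") and name.endswith(".tmp.crc"):
            data_name = name[1:-len(".crc")]  # "..x.tmp.crc" -> ".x.tmp"
            if not os.path.exists(os.path.join(log_dir, data_name)):
                orphans.append(name)
    return orphans


if __name__ == "__main__":
    log_dir = tempfile.mkdtemp()
    for batch_id in range(3):
        write_batch_leaky(log_dir, batch_id, b"metadata")
    for name in find_orphan_crc_files(log_dir):
        print(name)
```

Run against a fresh directory, each simulated batch leaves one orphan sidecar
behind, which matches the "one CRC file per batch" growth noted above. A scan
like find_orphan_crc_files could also serve as a stopgap for locating the
leaked files on a local or NFS log directory, under the same naming assumption.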