Joe McDonnell created HDFS-14308:
------------------------------------
Summary: DFSStripedInputStream should implement unbuffer()
Key: HDFS-14308
URL: https://issues.apache.org/jira/browse/HDFS-14308
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Joe McDonnell
Attachments: ec_heap_dump.png
Some users of HDFS cache opened HDFS file handles to avoid repeated roundtrips
to the NameNode. For example, Impala caches up to 20,000 HDFS file handles by
default. Recent tests on erasure coded files show that the open file handles
can consume a large amount of memory when not in use.
For example, here is the output from Impala's JMX endpoint when 608 file handles
are cached:
{noformat}
{
"name": "java.nio:type=BufferPool,name=direct",
"modelerType": "sun.management.ManagementFactoryHelper$1",
"Name": "direct",
"TotalCapacity": 1921048960,
"MemoryUsed": 1921048961,
"Count": 633,
"ObjectName": "java.nio:type=BufferPool,name=direct"
},
{noformat}
This works out to roughly 3 MB of direct buffer memory per
DFSStripedInputStream (1921048960 bytes across 608 cached handles).
Attached is output from Eclipse MAT showing that the direct buffers come from
DFSStripedInputStream objects.
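The per-handle figure can be checked with a short calculation. The sketch below only divides the TotalCapacity value from the JMX output by the number of cached handles. It assumes that all direct buffer capacity belongs to the cached handles, which is an approximation, since the pool reports 633 buffers for 608 handles. The class and method names are made up for illustration.

```java
import java.util.Locale;

public class DirectBufferPerHandle {
    // Values copied from the JMX output above.
    static final long TOTAL_CAPACITY_BYTES = 1_921_048_960L;
    static final int CACHED_HANDLES = 608;

    /** Average direct buffer bytes per cached handle (integer division). */
    public static long bytesPerHandle(long totalBytes, int handles) {
        return totalBytes / handles;
    }

    /** Converts a byte count to mebibytes. */
    public static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long perHandle = bytesPerHandle(TOTAL_CAPACITY_BYTES, CACHED_HANDLES);
        // Locale.ROOT keeps the decimal separator stable across environments.
        System.out.println(String.format(Locale.ROOT,
                "%d bytes per handle (%.2f MiB)", perHandle, toMiB(perHandle)));
        // prints: 3159620 bytes per handle (3.01 MiB)
    }
}
```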
To support caching file handles on erasure coded files, DFSStripedInputStream
should implement the unbuffer() call. See HDFS-7694. unbuffer() is intended
to move an input stream to a lower memory state to support these caching use
cases. Both Impala and HBase call unbuffer() when a file handle is being cached
and may go unused for long periods of time.
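To illustrate the behavior being requested, here is a minimal, self-contained sketch of a stream that allocates a direct buffer lazily and drops it on unbuffer(). It is not the DFSStripedInputStream code and does not use Hadoop classes. The Unbufferable interface and the BufferedHandle class are hypothetical stand-ins, and a real implementation would likely return its buffers to a pool rather than just dropping the reference.

```java
import java.nio.ByteBuffer;

public class UnbufferSketch {
    /** Hypothetical stand-in for a stream that can shed its buffers. */
    public interface Unbufferable {
        void unbuffer();
    }

    /** Hypothetical cached file handle holding one direct buffer. */
    public static final class BufferedHandle implements Unbufferable {
        private final int bufferSize;
        private ByteBuffer buffer; // null while in the low-memory state

        public BufferedHandle(int bufferSize) {
            this.bufferSize = bufferSize;
        }

        /** Simulates a read: allocates the buffer on first use. */
        public void read() {
            if (buffer == null) {
                buffer = ByteBuffer.allocateDirect(bufferSize);
            }
            // A real stream would fill the buffer from the DataNodes here.
        }

        /** Drops the buffer so the cached handle holds no direct memory. */
        @Override
        public void unbuffer() {
            // Assumption: dropping the reference is enough for this sketch;
            // the memory is reclaimed once the buffer is garbage collected.
            buffer = null;
        }

        /** Bytes of buffer capacity currently held by this handle. */
        public int bufferedBytes() {
            return buffer == null ? 0 : buffer.capacity();
        }
    }

    public static void main(String[] args) {
        BufferedHandle handle = new BufferedHandle(1024 * 1024);
        handle.read();
        System.out.println("after read: " + handle.bufferedBytes());
        handle.unbuffer(); // called before the handle goes into the cache
        System.out.println("after unbuffer: " + handle.bufferedBytes());
        handle.read();     // the handle remains usable after unbuffer()
        System.out.println("after reuse: " + handle.bufferedBytes());
    }
}
```

The property that matters for caching is that unbuffer() leaves the handle open and reusable: the next read simply reallocates whatever it needs.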