Nick Dimiduk created HDFS-9803:
----------------------------------
Summary: Proactively refresh ShortCircuitCache entries to avoid
latency spikes
Key: HDFS-9803
URL: https://issues.apache.org/jira/browse/HDFS-9803
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Nick Dimiduk
My region server logs are flooded with messages like
"SecretManager$InvalidToken: access control error while attempting to set up
short-circuit access to <hdfs-file-path> ... is expired". These logs correspond
with responseTooSlow WARNings from the region server.
{noformat}
2016-01-19 22:10:14,432 INFO [B.defaultRpcServer.handler=4,queue=1,port=16020]
shortcircuit.ShortCircuitCache: ShortCircuitCache(0x71bdc547): could not load
1074037633_BP-1145309065-XXX-1448053136416 due to InvalidToken exception.
org.apache.hadoop.security.token.SecretManager$InvalidToken: access control
error while attempting to set up short-circuit access to <hfile path> token
with block_token_identifier (expiryDate=1453194430724, keyId=1508822027,
userId=hbase, blockPoolId=BP-1145309065-XXX-1448053136416, blockId=1074037633,
access modes=[READ]) is expired.
    at org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:591)
    at org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:490)
    at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:782)
    at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:716)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:422)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:333)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:678)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1372)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1591)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1470)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:437)
...
{noformat}
A potential solution could be a background thread that makes a best
effort to proactively refresh tokens in the cache before they expire, so as
to minimize latency impact on the critical path.
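As a rough sketch of that idea (not the ShortCircuitCache API; {{TokenEntry}}, {{scanOnce}}, and the refresh window below are all hypothetical names for illustration), a daemon thread could periodically scan the cache and re-fetch any entry whose token is close to its expiryDate:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Illustrative sketch only: a best-effort daemon thread that scans cached
 * entries and refreshes any whose token is within a window of expiry, so
 * the read path never hits an InvalidToken on a hot block. All names here
 * are hypothetical, not the real ShortCircuitCache implementation.
 */
public class ProactiveRefresher {
  /** Minimal stand-in for a cached short-circuit entry and its block token. */
  static final class TokenEntry {
    volatile long expiryMillis;
    TokenEntry(long expiryMillis) { this.expiryMillis = expiryMillis; }
  }

  private final Map<Long, TokenEntry> cache = new ConcurrentHashMap<>();
  private final long refreshWindowMillis;   // refresh when this close to expiry
  private final long tokenLifetimeMillis;   // lifetime granted by a refresh
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "sc-cache-refresher");
        t.setDaemon(true);  // best effort; must not block JVM shutdown
        return t;
      });

  ProactiveRefresher(long refreshWindowMillis, long tokenLifetimeMillis) {
    this.refreshWindowMillis = refreshWindowMillis;
    this.tokenLifetimeMillis = tokenLifetimeMillis;
  }

  void put(long blockId, TokenEntry entry) { cache.put(blockId, entry); }

  /** One scan pass: refresh every entry whose expiry falls inside the window. */
  void scanOnce(long nowMillis) {
    for (TokenEntry entry : cache.values()) {
      if (entry.expiryMillis - nowMillis <= refreshWindowMillis) {
        // In real code this would re-request a block token rather than
        // simply extending the deadline.
        entry.expiryMillis = nowMillis + tokenLifetimeMillis;
      }
    }
  }

  /** Kick off the periodic background scan. */
  void start(long periodMillis) {
    scheduler.scheduleWithFixedDelay(
        () -> scanOnce(System.currentTimeMillis()),
        periodMillis, periodMillis, TimeUnit.MILLISECONDS);
  }
}
```

Since the refresh is best effort, a failed scan simply leaves the entry to be reloaded on the next read, which is the behavior we have today.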
Thanks to [~cnauroth] for providing an explanation and suggesting a solution
over on the [user
list|http://mail-archives.apache.org/mod_mbox/hadoop-user/201601.mbox/%3CCANZa%3DGt%3Dhvuf3fyOJqf-jdpBPL_xDknKBcp7LmaC-YUm0jDUVg%40mail.gmail.com%3E].
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)