[ https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744366#comment-15744366 ]
Rui Li commented on HIVE-13278:
-------------------------------

Hi [~xuefuz], the conclusion is that we somehow try to read reduce.xml for a map-only job, and yes, it happens with MR as well. The call path is {{HiveOutputFormatImpl.checkOutputSpecs -> Utilities.getMapRedWork}}. HiveOutputFormatImpl needs the MapRedWork because it has to run some checks on all the FS operators. Since an FS only exists at the end of a job, my suggestion is to first try to get the MapWork. If the MapWork has an FS in it, this is a map-only job and we don't have to look for the ReduceWork. But [~stakiar] found that some map-only jobs may not have an FS in the MapWork, e.g. {{ANALYZE TABLE}}. For a complete fix, we'll need a flag in the JobConf indicating whether the job is map-only; or we can use my solution, which solves the issue for most cases.

Some special handling may be needed for HoS: there, map.xml and reduce.xml reside in different paths, so we can use {{mapred.task.is.map}} to determine whether the JobConf is for the MapWork or the ReduceWork, and then call getMapWork or getReduceWork respectively. (A rough sketch of both ideas is appended after the quoted issue text below.)

> Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-13278
>                 URL: https://issues.apache.org/jira/browse/HIVE-13278
>             Project: Hive
>          Issue Type: Bug
>        Environment: Hive on Spark engine
>                     Found based on:
>                     Apache Hive 2.0.0
>                     Apache Spark 1.6.0
>            Reporter: Xin Hao
>            Assignee: Sahil Takiar
>            Priority: Minor
>
> Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark.
> Certainly, it doesn't prevent the query from running successfully, so it is marked as Minor for now.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
>         at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}
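To make the two suggestions in the comment concrete, here is a rough sketch, not the patch actually attached to this issue: the class name {{FileSinkLookupSketch}}, the helper {{addFileSinks}}, and the exact shapes of the {{Utilities}}/{{BaseWork}} calls are assumptions modeled on the Hive 2.x code base.

{code:java}
// Hypothetical sketch of the two ideas discussed above (not the HIVE-13278 patch).
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.hive.ql.exec.FileSinkOperator;
import org.apache.hadoop.hive.ql.exec.Operator;
import org.apache.hadoop.hive.ql.exec.Utilities;
import org.apache.hadoop.hive.ql.plan.BaseWork;
import org.apache.hadoop.hive.ql.plan.MapWork;

public class FileSinkLookupSketch {

  /**
   * Idea 1 (general case): deserialize map.xml first; if the MapWork already
   * contains an FS operator the job is map-only, so reduce.xml is never looked
   * up and the 'File not found' noise disappears for those jobs.
   */
  public static List<Operator<?>> collectFileSinksMapFirst(JobConf job) {
    List<Operator<?>> fileSinks = new ArrayList<>();
    MapWork mapWork = Utilities.getMapWork(job);
    boolean mapHasFileSink = addFileSinks(mapWork, fileSinks);
    if (!mapHasFileSink) {
      // Not provably map-only; fall back to reduce.xml as before. Note that some
      // map-only jobs (e.g. ANALYZE TABLE) have no FS in the MapWork, so this
      // path can still probe a non-existent reduce.xml -- hence only a partial fix.
      addFileSinks(Utilities.getReduceWork(job), fileSinks);
    }
    return fileSinks;
  }

  /**
   * Idea 2 (HoS): map.xml and reduce.xml live under different paths, so the
   * "mapred.task.is.map" flag in the JobConf tells us which plan to load.
   */
  public static List<Operator<?>> collectFileSinksBySide(JobConf job) {
    List<Operator<?>> fileSinks = new ArrayList<>();
    BaseWork work = job.getBoolean("mapred.task.is.map", true)
        ? Utilities.getMapWork(job)
        : Utilities.getReduceWork(job);
    addFileSinks(work, fileSinks);
    return fileSinks;
  }

  /** Collects FS operators from a work object; returns true if any were found. */
  private static boolean addFileSinks(BaseWork work, List<Operator<?>> fileSinks) {
    boolean found = false;
    if (work == null) {
      return false;
    }
    for (Operator<?> op : work.getAllOperators()) {
      if (op instanceof FileSinkOperator) {
        fileSinks.add(op);
        found = true;
      }
    }
    return found;
  }
}
{code}

The first method is the engine-agnostic map-first check; the second relies on {{mapred.task.is.map}}, which matters for HoS because its map.xml and reduce.xml are kept under separate paths.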