Hello all,

I'm running a from-scratch cluster on AWS EC2. I have a partitioned external
table whose data lives on S3. I can query this table and get results back on
the console with a simple select * statement:

--------------------------------------------------------------------------------------------------------
hive> set hive.execution.engine=tez;
hive> select * from external_table where partition_1='1' and partition_2='2';
[correct results returned]
--------------------------------------------------------------------------------------------------------
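
(If I understand correctly, a plain select * with only partition filters is
served by Hive's fetch task and never launches a Tez job at all, which would
explain why this one works. hive.fetch.task.conversion is the property I
believe controls that, and its current value can be checked with:)

--------------------------------------------------------------------------------------------------------
hive> set hive.fetch.task.conversion;
--------------------------------------------------------------------------------------------------------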

Running any query that actually requires a Tez job (an aggregation, for
example) returns no results to the console:

--------------------------------------------------------------------------------------------------------
hive> set hive.execution.engine=tez;
hive> select count(*) from external_table where partition_1='1' and partition_2='2';
Status: Running (Executing on YARN cluster with App id application_1572972524483_0012)

OK
+------+
| _c0  |
+------+
+------+
No rows selected (8.902 seconds)
--------------------------------------------------------------------------------------------------------
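
As far as I can tell, the Tez application itself completes. In case the way
I'm checking that matters, this is the command I've been using (app id taken
from the output above):

--------------------------------------------------------------------------------------------------------
[root #] yarn application -status application_1572972524483_0012
--------------------------------------------------------------------------------------------------------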

However, if I dig through the logs and the local filesystem, I can find the
results of that query:

--------------------------------------------------------------------------------------------------------
(yarn.resourcemanager.log) org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1572972524483_0022 CONTAINERID=container_1572972524483_0022_01_000002 RESOURCE=<memory:1024, vCores:1> QUEUENAME=default
(container_folder/syslog_attempt) [TezChild] |exec.FileSinkOperator|: New Final Path: FS file:/tmp/[REALLY LONG FILE PATH]/000000_0
[root #] cat /tmp/[REALLY LONG FILE PATH]/000000_0
SEQ"org.apache.hadoop.io.BytesWritableorg.apache.hadoop.io.Textl▒ꩇ1som}▒▒j¹▒ 
2060
--------------------------------------------------------------------------------------------------------

2060 is the correct count for the partition.
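
The "file:" scheme in that New Final Path makes me suspect the tasks are
resolving the default filesystem as the local FS rather than HDFS. Unless I'm
mistaken, fs.defaultFS is the relevant property, and it can be checked from
both sides like this:

--------------------------------------------------------------------------------------------------------
[root #] hdfs getconf -confKey fs.defaultFS
hive> set fs.defaultFS;
--------------------------------------------------------------------------------------------------------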

Now, oddly enough, I am able to get the results from the application if I
INSERT OVERWRITE DIRECTORY on HDFS:

--------------------------------------------------------------------------------------------------------
hive> set hive.execution.engine=tez;
hive> INSERT OVERWRITE DIRECTORY '/tmp/local_out' select count(*) from external_table where partition_1='1' and partition_2='2';
[root #] hdfs dfs -cat /tmp/local_out/000000_0
2060
--------------------------------------------------------------------------------------------------------
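
My (possibly wrong) understanding is that the console fetch reads the final
results out of the Hive scratch directory, so if the tasks write under
file:/tmp on their own nodes, the client would find nothing to fetch. If that
reasoning holds, hive.exec.scratchdir is the setting to inspect:

--------------------------------------------------------------------------------------------------------
hive> set hive.exec.scratchdir;
--------------------------------------------------------------------------------------------------------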

However, attempting to INSERT OVERWRITE LOCAL DIRECTORY fails:

--------------------------------------------------------------------------------------------------------
hive> set hive.execution.engine=tez;
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/local_out' select count(*) from external_table where partition_1='1' and partition_2='2';
[root #] cat /tmp/local_out/000000_0
cat: /tmp/local_out/000000_0: No such file or directory
--------------------------------------------------------------------------------------------------------
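
One possibility I considered is that LOCAL DIRECTORY gets written on whatever
node runs the final task rather than the node I'm logged into, so the file
could exist on a different machine. A rough check, with node1 and node2
standing in for my actual hostnames:

--------------------------------------------------------------------------------------------------------
[root #] for h in node1 node2; do ssh "$h" ls -l /tmp/local_out; done
--------------------------------------------------------------------------------------------------------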

If I cat the container result file for this query, it contains only the
number, with no class name or special characters:

--------------------------------------------------------------------------------------------------------
[root #] cat /tmp/[REALLY LONG FILE PATH]/000000_0
2060
--------------------------------------------------------------------------------------------------------
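
The difference in contents (a SequenceFile header with class names in the
first dump, plain text here) looks consistent with
hive.query.result.fileformat, which I believe defaults to SequenceFile in
Hive 3 and only applies to results fetched to the console:

--------------------------------------------------------------------------------------------------------
hive> set hive.query.result.fileformat;
--------------------------------------------------------------------------------------------------------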

The only out-of-place log message I can find comes from the YARN 
ResourceManager log:

--------------------------------------------------------------------------------------------------------
(yarn.resourcemanager.log) INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root 
OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS 
APPID=application_1572972524483_0023 
CONTAINERID=container_1572972524483_0023_01_000004 RESOURCE=<memory:1024, 
vCores:1> QUEUENAME=default
(yarn.resourcemanager.log) WARN 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root IP=NMIP 
OPERATION=AM Released Container TARGET=Scheduler RESULT=FAILURE 
DESCRIPTION=Trying to release container not owned by app or with invalid id. 
PERMISSIONS=Unauthorized access or invalid container 
APPID=application_1572972524483_0023 
CONTAINERID=container_1572972524483_0023_01_000004
--------------------------------------------------------------------------------------------------------
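
In case more detail is hiding in the aggregated container logs, this is how
I've been pulling them (assuming log aggregation is enabled):

--------------------------------------------------------------------------------------------------------
[root #] yarn logs -applicationId application_1572972524483_0023
--------------------------------------------------------------------------------------------------------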

I've also tried creating a table and inserting data into it. The table is
created just fine, but inserting data throws an error:

--------------------------------------------------------------------------------------------------------
hive> set hive.execution.engine=tez;
hive> insert into test_table (test_col) values ('blah'), ('blahblah');
Query ID = root_20191106172949_5301b127-7219-46d1-8fd2-dc80ca7e96ee
Total jobs = 1
Launching Job 1 out of 1
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1573060958692_0001_1_00, diagnostics=[Vertex vertex_1573060958692_0001_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: _dummy_table initializer failed, vertex=vertex_1573060958692_0001_1_00 [Map 1], org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/tmp/root/a9b76683-8e19-446a-be74-7a5daedf70e5/hive_2019-11-06_17-29-49_820_224977921325223208-2/dummy_path
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:332)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:274)
        at org.apache.hadoop.hive.shims.Hadoop23Shims$1.listStatus(Hadoop23Shims.java:134)
        at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:217)
        at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:76)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:321)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:444)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:564)
        at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:488)
        at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:337)
        at org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:122)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
--------------------------------------------------------------------------------------------------------
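
That missing dummy_path is also under file:/tmp, which again points at the
job resolving the default filesystem as the local FS. For reference, this is
the shape of the core-site.xml entry I would expect to need; the host and
port below are placeholders, not my actual values:

--------------------------------------------------------------------------------------------------------
<!-- core-site.xml; host/port below are placeholders, not my real setting -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode.example.com:8020</value>
</property>
--------------------------------------------------------------------------------------------------------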

My versions are as follows:

Hadoop 3.2.1
Hive 3.1.2
Tez 0.9.2
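
Since this is a from-scratch install, my Tez setup itself might be suspect;
tez.lib.uris in tez-site.xml is what I'd double-check (the path below is a
placeholder for wherever the Tez tarball actually sits on HDFS, not my real
value):

--------------------------------------------------------------------------------------------------------
<!-- tez-site.xml; the path is a placeholder, not my actual setting -->
<property>
  <name>tez.lib.uris</name>
  <value>hdfs:///apps/tez/tez-0.9.2.tar.gz</value>
</property>
--------------------------------------------------------------------------------------------------------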

Any help is much appreciated!
