> another case of a query hangin' in v2.1.0.

I'm not sure that's a hang. If you can repro this, can you please grab a jstack while it is "hanging" (i.e. a jstack of the HiveServer2 or CLI process)?
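Something like this would do it (the pgrep pattern and output paths are just examples, adjust for your deployment); two dumps a few seconds apart will also show whether the threads are actually making progress:

jstack $(pgrep -f HiveServer2) > /tmp/hs2-stack-1.txt
sleep 5
jstack $(pgrep -f HiveServer2) > /tmp/hs2-stack-2.txt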
I have a theory that you're hitting a slow path in HDFS remote reads, because of the following stacktrace:

  at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:700)
  at java.io.DataInputStream.readInt(DataInputStream.java:387)
  at org.apache.hadoop.io.SequenceFile$Reader.readBlock(SequenceFile.java:2101)
  at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2508)
  at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:484)

Notice that it is firing off a 4-byte HDFS read call without any buffering. This is probably because compression is usually what provides the natural buffering for SequenceFiles; uncompressed data might be triggering 4-byte remote reads directly, which would be an extremely slow way to read data out of HDFS (see the P.S. for a self-contained sketch of the effect).

> * so empty result expected.

An empty result is actually the worst-case scenario for the FetchTask optimization, because it means the CLI tool deserializes every single row in a single thread. ORC, which has internal indexes, is somewhat safe against that.

> set hive.fetch.task.conversion=none;
> but not sure its the right thing to set globally just yet.

No, it's not - the right thing is to tune the size threshold for that optimization instead:

  set hive.fetch.task.conversion.threshold=1073741824; -- 1G

Setting that to <= 1G can be a win, while setting it to -1 (no input-size limit) can cause so much pain.

Cheers,
Gopal
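P.S. Here is a minimal sketch of the buffering point, using plain java.io against a local file rather than the actual DFSInputStream/SequenceFile code (class name, file argument, and the 64KB buffer size are all just illustrative): every readInt() on the unbuffered stream turns into tiny read() calls on the underlying stream, standing in for round trips to the remote DataNode, while the buffered wrapper turns the same loop into a handful of large sequential reads.

import java.io.*;

public class ReadIntBuffering {
    // Counts how many read() calls reach the underlying stream,
    // standing in for remote read round trips against HDFS.
    static class CountingInputStream extends FilterInputStream {
        long calls = 0;
        CountingInputStream(InputStream in) { super(in); }
        @Override public int read() throws IOException {
            calls++; return super.read();
        }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            calls++; return super.read(b, off, len);
        }
    }

    // usage: java ReadIntBuffering <some-uncompressed-file>
    public static void main(String[] args) throws IOException {
        // Unbuffered: each readInt() hits the underlying stream directly,
        // one tiny read per length word, just like the stacktrace above.
        CountingInputStream raw = new CountingInputStream(new FileInputStream(args[0]));
        try (DataInputStream slow = new DataInputStream(raw)) {
            while (slow.available() >= 4) slow.readInt();
        }

        // Buffered: the same readInt() calls are served from a 64KB buffer,
        // so the underlying stream only sees occasional large reads.
        CountingInputStream buffered = new CountingInputStream(new FileInputStream(args[0]));
        try (DataInputStream fast = new DataInputStream(
                new BufferedInputStream(buffered, 64 * 1024))) {
            while (fast.available() >= 4) fast.readInt();
        }

        System.out.println("unbuffered read() calls: " + raw.calls);
        System.out.println("buffered   read() calls: " + buffered.calls);
    }
}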