Hi guys, I've banged my head on this one all day and i need to surrender. I have a query that hangs (never returns). However, when i turn on logging to DEBUG level it works. I'm stumped. I include here the query, the different query plans (with the only thing different being the log level) and a traceback on the query that hangs.
* the query #!/bin/bash sql=" explain SELECT * FROM raw_logs.fact_www_access_datekey_by_hour WHERE date_key = 20160828 -- partition key and lower(http_string) = 'foo' -- always evaluates to false limit 1 -- if i remove limit 1 it works. I expect an empty result set. ; " #hive -hiveconf hive.root.logger=ERROR,console -e "$sql" >stdout.bad 2>stderr.bad #hangs hive -hiveconf hive.root.logger=DEBUG,console -e "$sql" >stdout.good 2>stderr.good #works * explain plan for the one that hangs (ERROR,console) STAGE DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 Fetch Operator limit: 1 Processor Tree: TableScan alias: fact_www_access_datekey_by_hour Statistics: Num rows: 144355924 Data size: 158755813572 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (lower(http_string) = 'foo') (type: boolean) Statistics: Num rows: 72177962 Data size: 79377906786 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: payload_year (type: int), payload_month (type: int), payload_day (type: int), payload_time (type: string), payload_gmt_offset (type: string), perf_time_micros (type: int), http_string (type: string), http_type (type: string), http_version (type: string), http_rc (type: int), http_size (type: int), referrer (type: string), user_agent (type: string), bot_flag (type: string), ip_list (type: array<string>), scalar_tags (type: map<string,string>), url_components (type: array<string>), user_id (type: string), lt_session_id (type: string), session_id (type: string), log_name (type: string), log_lineno (type: int), log_line (type: string), web_server (type: string), location_info (type: map<string,string>), dp_main_url_type (type: string), dp_inventory_type (type: string), dp_main_url (type: map<string,array<string>>), dp_tag_url (type: map<string,array<string>>), dp_referrer_url (type: map<string,array<string>>), hour (type: int), 20160828 (type: int) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31 Statistics: Num rows: 72177962 Data size: 79377906786 Basic stats: COMPLETE Column stats: NONE Limit Number of rows: 1 Statistics: Num rows: 1 Data size: 1099 Basic stats: COMPLETE Column stats: NONE ListSink * query plan for the one that works (DEBUG,console) STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: fact_www_access_datekey_by_hour Statistics: Num rows: 144355924 Data size: 158755813572 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (lower(http_string) = 'foo') (type: boolean) Statistics: Num rows: 72177962 Data size: 79377906786 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: payload_year (type: int), payload_month (type: int), payload_day (type: int), payload_time (type: string), payload_gmt_offset (type: string), perf_time_micros (type: int), http_string (type: string), http_type (type: string), http_version (type: string), http_rc (type: int), http_size (type: int), referrer (type: string), user_agent (type: string), bot_flag (type: string), ip_list (type: array<string>), scalar_tags (type: map<string,string>), url_components (type: array<string>), user_id (type: string), lt_session_id (type: string), session_id (type: string), log_name (type: string), log_lineno (type: int), log_line (type: string), web_server (type: string), location_info (type: map<string,string>), dp_main_url_type (type: string), dp_inventory_type (type: string), dp_main_url (type: map<string,array<string>>), dp_tag_url (type: map<string,array<string>>), dp_referrer_url (type: map<string,array<string>>), hour (type: int), 20160828 (type: int) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31 Statistics: Num rows: 72177962 Data size: 79377906786 Basic stats: COMPLETE Column stats: NONE Limit Number of rows: 1 Statistics: Num rows: 1 Data size: 1099 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: true Statistics: Num rows: 1 Data size: 1099 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: 1 Processor Tree: ListSink * traceback when i press ctl-c on the client ^CFailed with exception java.io.IOException:java.io.IOException: Filesystem closed 16/08/30 20:16:09 [62cf87a8-2b39-47d2-b234-53631b60d84e main]: ERROR CliDriver: - Failed with exception java.io.IOException:java.io.IOException: Filesystem closed java.io.IOException: java.io.IOException: Filesystem closed at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:517) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:424) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:144) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1885) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:252) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:742) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:794) at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:833) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:897) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:700) at java.io.DataInputStream.readInt(DataInputStream.java:387) at org.apache.hadoop.io.SequenceFile$Reader.readBlock(SequenceFile.java:2101) at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2508) at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:484) ... 16 more on the one that hangs the job is never submitted to YARN. On the one that succeeds the job is submitted. Ultimately though why does changing the log level influence the query plan? any insights greatly appreciated. Thanks, Stephen.