Hi, I am trying to use percentile function of HIVE but getting exception from Amazon EMR service. I am using version 0.7.
Please assist. It is very critical and urgent. Below is the code snippet: CREATE EXTERNAL TABLE IF NOT EXISTS server_d ( ag_date STRING, median _time BIGINT, 95percentile _time BIGINT ) COMMENT 'server' PARTITIONED by (dt STRING, hh STRING, min STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION '${hiveconf:S3_INPUT_BUCKET}/server /'; INSERT OVERWRITE TABLE server_d PARTITION(dt='${hiveconf:DT}', hh='${hiveconf:HH}', min='${hiveconf:MIN}') SELECT '${hiveconf:DT}' as ag_date, percentile(total _time, 0.50) as median _time, percentile(total_time, 0.95) as 95percentile_time from my_log my where my.date_req = '${hiveconf:DT}' and my.dt = '${hiveconf:DT}'; exception that I am getting: Exception in thread "Thread-239" java.lang.RuntimeException: Error while reading from task log url at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130) at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211) at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: http://10.***.***.**:9103/tasklog?taskid=attempt_201****0835_0005_m_000001_3&start=-8193 at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436) at java.net.URL.openStream(URL.java:1010) at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120) ... 3 more Thanks, Puneet From: Felix.徐 [mailto:ygnhz...@gmail.com] Sent: Wednesday, November 21, 2012 2:22 PM To: user@hive.apache.org Subject: Bugs exist in SEMI JOIN? Hi, I am using the version 0.9.0 and my tables are the same with TPC-H benchmark: Here is a simple query(works correctly): Q1 INSERT OVERWRITE TABLE customer_orders_statistics SELECT C_CUSTKEY FROM CUSTOMER LEFT SEMI JOIN( SELECT O_CUSTKEY FROM ORDERS WHERE unix_timestamp(O_ORDERDATE, 'yyyy-MM-dd') > unix_timestamp('1995-12-31','yyyy-MM-dd') ) tempTable ON tempTable.O_CUSTKEY=CUSTOMER.C_CUSTKEY it means inserting the key of customers who has orders since 1995-12-31 into another table. But if I write the query like this: Q2 INSERT OVERWRITE TABLE customer_orders_statistics SELECT C_CUSTKEY FROM CUSTOMER LEFT SEMI JOIN ORDERS ON CUSTOMER.C_CUSTKEY=ORDERS.O_CUSTKEY AND unix_timestamp(ORDERS.O_ORDERDATE, 'yyyy-MM-dd') > unix_timestamp('1995-12-31','yyyy-MM-dd') I will get exception from Hive: FAILED: Hive Internal Error: java.lang.NullPointerException(null) java.lang.NullPointerException at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:1566) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.pushJoinFilters(SemanticAnalyzer.java:5254) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6754) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7531) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Also,If I write the query like this: Q3 INSERT OVERWRITE TABLE customer_orders_statistics SELECT C_CUSTKEY FROM CUSTOMER LEFT SEMI JOIN ORDERS ON CUSTOMER.C_CUSTKEY=ORDERS.O_CUSTKEY WHERE unix_timestamp(ORDERS.O_ORDERDATE, 'yyyy-MM-dd') > unix_timestamp('1995-12-31','yyyy-MM-dd') Then this query can be executed(wondering the right hand of SEMI JOIN can be referenced in WHERE clause now?), but the result is wrong(comparing to Q1, Q1's result is the same with mysql). Any comments or statements made in this email are not necessarily those of Tavant Technologies. The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. If you have received this in error, please contact the sender and delete the material from any computer. All e-mails sent from or to Tavant Technologies may be subject to our monitoring procedures.