Hi hive: We are looking at an interesting exception running count(*) queries in hive from embedded mode.
http://stackoverflow.com/questions/20956124/how-to-run-hive-sql-querys-containing-select-count-and-group-by-clauses I've pasted extra details on the question here: How can I run this query (1) in Hive embedded mode select product,count(*) as cnt from hive_bigpetstore_etl group by product in the maven console I get a `InvocationTargetException` exception `java.lang.reflect.InvocationTargetException Continuing ... java.lang.reflect.InvocationTargetException Continuing ...` in the Hive log file I find `2014-01-06 08:46:44,811 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(348)) - Map task executor complete. 2014-01-06 08:46:44,813 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(435)) - job_local817775241_0001 java.lang.Exception: java.lang.NullPointerException at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.Utilities.setColumnTypeList(Utilities.java:2033) at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushFilters(HiveInputFormat.java:351) at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:432) at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:374) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:540) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:191) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662)` this is a typical row in the input data `BigPetStore,storeCode_AK,1 russell,baird,Sun Dec 21 11:57:31 PST 1969,20.1,antelope-caller` this is the create table hive sql `String create = "CREATE TABLE hive_bigpetstore_etl (" + " state STRING," + " trans_id STRING," + " lname STRING," + " fname STRING," + " date STRING," + " price STRING," + " product STRING" + ")" + " ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' " + "WITH SERDEPROPERTIES (" + "\"input.regex\" = \"INPUT_REGEX\" , " + "\"output.format.string\" = \"%1$s %2$s %3$s %4$s %5$s\") " + "STORED AS TEXTFILE"; // \\d+ seems to lose its delimiters when sent to hive ? so opt for // another regex // create=create.replaceAll("INPUT_REGEX", // ".*_([A-Z][A-Z]),(\\d+)\\s+([a-z]*),([a-z]*),([^,]*),([^,]*),([^,]*).*"); create = create.replaceAll("INPUT_REGEX", "(?:BigPetStore,storeCode_)" + "([A-Z][A-Z])," + // state (CT) "([0-9]*)" + // state transaction id (1) "(?:\t)" + // [tab] "([a-z]*)," + // fname (jay) "([a-z]*)," + // lname (vyas) "([A-Z][^,]*)," + // date starts with capital letter (MWTFSS) "([^,]*)," + // price (12.19) "([^,]*).*"); // product (premium cat food)` this is the data load java code `ResultSet res = stmt.executeQuery(create); res = stmt.executeQuery("LOAD DATA INPATH '<rawInput>' INTO TABLE `enter code here`hive_bigpetstore_etl".replaceAll("<rawInput>", pathToRawInput.toString()));` the data loads into the table successfully as if I change (1) to (2) `select* from hive_hive_bigpetstore_etl` it returns a correct ResultSet with all the data I have checked everything is on the class-path, there are no exceptions like class not found, the hive and hadoop home env variables are set and checked with `printenv` if I run (1) against a standalone Hive/Thrift query (1) runs with no exceptions, I get the exceptions only in embedded mode. How can I run the `select count(*)` and `group by` clause's in Hive embedded mode? -- Jay Vyas http://jayunit100.blogspot.com