Hi hive:

We are looking at an interesting exception when running count(*) queries in Hive in embedded mode.

http://stackoverflow.com/questions/20956124/how-to-run-hive-sql-querys-containing-select-count-and-group-by-clauses

I've pasted extra details from the question below:


How can I run this query (1) in Hive embedded mode?

    select product, count(*) as cnt from hive_bigpetstore_etl group by product
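
For reference, this is roughly how the embedded connection is opened and (1) is issued. Treat it as a minimal sketch rather than my exact harness code; it assumes the pre-HiveServer2 JDBC driver `org.apache.hadoop.hive.jdbc.HiveDriver` and the empty `jdbc:hive://` URL that selects embedded mode:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class EmbeddedHiveCount {
        public static void main(String[] args) throws Exception {
            // Load the HiveServer1-era JDBC driver that ships with Hive.
            Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
            // An empty host/port means embedded mode: Hive runs inside this JVM.
            Connection con = DriverManager.getConnection("jdbc:hive://", "", "");
            Statement stmt = con.createStatement();
            // Query (1): the group-by / count(*) that triggers the exception.
            ResultSet res = stmt.executeQuery(
                    "select product, count(*) as cnt from hive_bigpetstore_etl group by product");
            while (res.next()) {
                System.out.println(res.getString(1) + "\t" + res.getLong(2));
            }
            con.close();
        }
    }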

In the Maven console I get an `InvocationTargetException`:

    java.lang.reflect.InvocationTargetException
    Continuing ...
    java.lang.reflect.InvocationTargetException
    Continuing ...

In the Hive log file I find:


    2014-01-06 08:46:44,811 INFO  mapred.LocalJobRunner (LocalJobRunner.java:run(348)) - Map task executor complete.
    2014-01-06 08:46:44,813 WARN  mapred.LocalJobRunner (LocalJobRunner.java:run(435)) - job_local817775241_0001
    java.lang.Exception: java.lang.NullPointerException
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
    Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.exec.Utilities.setColumnTypeList(Utilities.java:2033)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushFilters(HiveInputFormat.java:351)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:432)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:374)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:540)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:191)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

This is a typical row in the input data:

    BigPetStore,storeCode_AK,1    russell,baird,Sun Dec 21 11:57:31 PST 1969,20.1,antelope-caller

This is the CREATE TABLE Hive SQL:

    String create = "CREATE TABLE hive_bigpetstore_etl ("
            + "  state STRING,"
            + "  trans_id STRING,"
            + "  lname STRING,"
            + "  fname STRING,"
            + "  date STRING,"
            + "  price STRING,"
            + "  product STRING"
            + ")"
            + " ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' "
            + "WITH SERDEPROPERTIES  ("
            + "\"input.regex\" = \"INPUT_REGEX\" , "
            + "\"output.format.string\" = \"%1$s %2$s %3$s %4$s %5$s\") "
            + "STORED AS TEXTFILE";
    // \\d+ seems to lose its delimiters when sent to hive? so opt for another regex
    // create = create.replaceAll("INPUT_REGEX",
    //     ".*_([A-Z][A-Z]),(\\d+)\\s+([a-z]*),([a-z]*),([^,]*),([^,]*),([^,]*).*");
    create = create.replaceAll("INPUT_REGEX", "(?:BigPetStore,storeCode_)"
            + "([A-Z][A-Z])," + // state (CT)
            "([0-9]*)" + // state transaction id (1)
            "(?:\t)" + // [tab]
            "([a-z]*)," + // fname (jay)
            "([a-z]*)," + // lname (vyas)
            "([A-Z][^,]*)," + // date starts with capital letter (MWTFSS)
            "([^,]*)," + // price (12.19)
            "([^,]*).*"); // product (premium cat food)

This is the data-load Java code:

    ResultSet res = stmt.executeQuery(create);
    res = stmt.executeQuery("LOAD DATA INPATH '<rawInput>' INTO TABLE hive_bigpetstore_etl"
            .replaceAll("<rawInput>", pathToRawInput.toString()));
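
For completeness, this is roughly how the load is verified over the same embedded connection (a sketch; the LIMIT value is arbitrary):

    // Confirm the table schema, then peek at a few loaded rows.
    ResultSet desc = stmt.executeQuery("describe hive_bigpetstore_etl");
    while (desc.next()) {
        System.out.println(desc.getString(1) + "\t" + desc.getString(2));
    }
    ResultSet sample = stmt.executeQuery("select * from hive_bigpetstore_etl limit 5");
    while (sample.next()) {
        // product is the 7th column in the table definition above.
        System.out.println(sample.getString(7));
    }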

The data loads into the table successfully: if I change (1) to (2)

    select * from hive_bigpetstore_etl

it returns a correct ResultSet with all the data.

I have checked that everything is on the classpath and there are no class-not-found exceptions; the Hive and Hadoop home environment variables are set and verified with

    printenv

If I run (1) against a standalone Hive/Thrift server, it runs with no exceptions; I get the exceptions only in embedded mode.
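
(The same thing can also be checked from inside the embedded JVM with standard Java lookups, e.g. this small sketch, independent of the shell environment:)

    // Confirm what the embedded JVM itself sees.
    System.out.println("HIVE_HOME   = " + System.getenv("HIVE_HOME"));
    System.out.println("HADOOP_HOME = " + System.getenv("HADOOP_HOME"));
    System.out.println("java.class.path = " + System.getProperty("java.class.path"));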

How can I run the `select count(*)` and `group by` clauses in Hive embedded mode?

-- 
Jay Vyas
http://jayunit100.blogspot.com
