Hi Sameer,

Try the following:
CREATE INDEX user_id_ordering_mode_index ON TABLE products(user_id, ordering_mode)
AS 'COMPACT' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;

Let me know what it says.

On Mon, Jul 28, 2014 at 1:03 PM, Sameer Tilak <ssti...@live.com> wrote:

> Hi everyone,
>
> I have a TSV file (around 4 GB). I have created a Hive table on it using
> the following command. It works fine without indexing. However, when I
> create an index on 2 columns, I get the following error:
>
> create table products (user_id String, session_id String, ordering_date
> Date, product_id String, reorder_date Date, ordering_mode int) ROW FORMAT
> DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
>
> CREATE INDEX user_id_ordering_mode_index ON TABLE
> products(user_id, ordering_mode)
> AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
> WITH DEFERRED REBUILD;
>
> ALTER INDEX user_id_ordering_mode_index ON products REBUILD;
>
> SET hive.index.compact.file=/user/hive/warehouse/default__user_id_ordering_mode_index__;
> SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
>
> hive> select count (*) from products where ordering_mode = 1;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> java.lang.NumberFormatException: For input string: "00000001218 1 hdfs://pzxdcc0250.cdbt.pldc.kp.org:8020/user/hive/warehouse/products/products_clean.tsv 16599219"
>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>     at java.lang.Long.parseLong(Long.java:441)
>     at java.lang.Long.parseLong(Long.java:483)
>     at org.apache.hadoop.hive.ql.index.HiveIndexResult.add(HiveIndexResult.java:174)
>     at org.apache.hadoop.hive.ql.index.HiveIndexResult.<init>(HiveIndexResult.java:127)
>     at org.apache.hadoop.hive.ql.index.HiveIndexedInputFormat.getSplits(HiveIndexedInputFormat.java:120)
>     at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520)
>     at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512)
>     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
>     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
>     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
>     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:564)
>     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:559)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:559)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:550)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:425)
>     at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1485)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1263)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1091)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:921)
>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
>     at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:790)
>     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:623)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Job Submission failed with exception 'java.lang.NumberFormatException(For input string: "00000001218 1 hdfs://pzxdcc0250.cdbt.pldc.kp.org:8020/user/hive/warehouse/products/products.tsv 16599219")'
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
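
A bit of background on why I suggest this: the trace shows Long.parseLong failing inside HiveIndexResult.add with the entire index-file line as the input string, which suggests the compact index input format could not split the line into its expected fields (indexed columns, bucket file name, offsets) and tried to parse the whole line as an offset. In other words, the on-disk format of the index file doesn't match what the reader expects, which is what recreating the index with an explicit storage format is meant to fix. A full recreate sequence would look something like the sketch below — this is only a sketch, assuming the default database and the DROP INDEX / SHOW INDEX syntax of Hive releases from that era:

```sql
-- Drop the old index so its stale index table is removed.
DROP INDEX IF EXISTS user_id_ordering_mode_index ON products;

-- Recreate it with an explicit delimited text format, then rebuild so
-- the index file on HDFS is rewritten in that format.
CREATE INDEX user_id_ordering_mode_index
ON TABLE products (user_id, ordering_mode)
AS 'COMPACT'
WITH DEFERRED REBUILD
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

ALTER INDEX user_id_ordering_mode_index ON products REBUILD;

-- Sanity check that the index exists before re-running the query.
SHOW INDEX ON products;
```

After the rebuild, re-run your two SET commands (double-checking that hive.index.compact.file points at the index table's actual warehouse directory) and then the count(*) query.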