[ https://issues.apache.org/jira/browse/HIVE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030831#comment-13030831 ]

Marc Harris commented on HIVE-2111:
-----------------------------------

I can't tell in which way you are suggesting the regular expression should
change. If it is about the presence of the backslash before the parentheses,
the Hive command line processor seems to require these. If it is the dot
slashes, then the JIRA text editor seems to be mangling them. Each set of
parentheses should contain "dot star data", and then there should be
"dot star dollar" at the end.

In any case, none of this would account for the fact that
"select part1, col1, col2" and "select *" produce different results.

> NullPointerException on select * with table using RegexSerDe and partitions
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-2111
>                 URL: https://issues.apache.org/jira/browse/HIVE-2111
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.7.0
>         Environment: Amazon Elastic Mapreduce
>            Reporter: Marc Harris
>
> When querying against a table that is partitioned and uses RegexSerDe,
> select with explicit columns works, but "select *" results in a
> NullPointerException.
> To reproduce:
> 1) create a file containing the following text (notice the blank line):
> ====start====
> fillerdatafillerdatafiller
>
> fillerdata2fillerdata2filler
> =====end=====
> 2) copy the file to hdfs:
> hadoop dfs -put foo.txt test/part1=x/foo.txt
> 3) run the following hive commands to create a table:
> add jar s3://elasticmapreduce/samples/hive/jars/hive_contrib.jar;
> drop table test;
> create external table test(col1 STRING, col2 STRING)
> partitioned by (part1 STRING)
> row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
> with serdeproperties ( "input.regex" = "^\(.*data\)\(.*data\).*$")
> stored as textfile
> location 'hdfs:///user/hadoop/test';
> alter table test add partition (part1='x');
> (Note that the text processor seems to have mangled the regex a bit. Inside
> each pair of parentheses should be dot star data. After the second pair of
> parentheses should be dot star dollar).
> 4) select from it with explicit columns:
> select part1, col1, col2 from test;
> outputs:
> OK
> x fillerdata fillerdata
> x NULL NULL
> x fillerdata 2fillerdata
> 5) select from it with * columns
> select * from test;
> outputs:
> Failed with exception java.io.IOException:java.lang.NullPointerException
> 11/04/12 14:28:27 ERROR CliDriver: Failed with exception java.io.IOException:java.lang.NullPointerException
> java.io.IOException: java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:149)
>     at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1039)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:172)
>     at org.apache.hadoop.hive.cli.CliDriver.processLineInternal(CliDriver.java:228)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:209)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:398)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: java.lang.NullPointerException
>     at java.util.ArrayList.addAll(ArrayList.java:472)
>     at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldsDataAsList(UnionStructObjectInspector.java:144)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:357)
>     at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:141)
>     ... 10 more

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
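
As a sanity check on the regex discussed in this thread, here is a minimal Python sketch (not Hive code) of how the pattern splits the three sample lines. It assumes that Python's re module behaves like the Java regex engine for this simple pattern, that the backslashes before the parentheses are stripped before the pattern is compiled, and that a non-matching line yields NULL for every column, as the step 4 output suggests. The names split_row and SAMPLE_LINES are hypothetical and used only for illustration.

```python
import re

# Pattern as described in the report: "dot star data" inside each pair of
# parentheses, followed by "dot star dollar". The backslashes needed by the
# Hive command line are omitted here (assumption: they are stripped before
# the regex is compiled).
INPUT_REGEX = re.compile(r"^(.*data)(.*data).*$")


def split_row(line):
    """Return (col1, col2) for a line, or (None, None) if it does not match."""
    match = INPUT_REGEX.match(line)
    if match is None:
        # Assumption: an unmatched line becomes a row of NULLs.
        return (None, None)
    return match.groups()


# The three lines of foo.txt from the report, including the blank line.
SAMPLE_LINES = [
    "fillerdatafillerdatafiller",
    "",
    "fillerdata2fillerdata2filler",
]

if __name__ == "__main__":
    for line in SAMPLE_LINES:
        print(split_row(line))
    # ('fillerdata', 'fillerdata')
    # (None, None)
    # ('fillerdata', '2fillerdata')
```

Under these assumptions the groups agree with the explicit-column output in step 4, which is consistent with the point that the regex alone would not explain why "select *" behaves differently.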