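Use \\d instead of \d. Hive treats a backslash inside a quoted string literal as an escape character, so the "\d" in your CREATE TABLE statement reaches RegexSerDe as a plain "d". The resulting pattern, ([a-z,A-Z]*)(-d*), cannot match the trailing digit in lines like TESTONE-1, and RegexSerDe returns NULL for every column of a line it cannot match.

On Jul 1, 2011, at 6:52 PM, Sal Scalisi <sal...@hotmail.com> wrote: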
> I'm new to Hive and I'm having an issue loading a simple set of data via
> regex.
>
> I have a data file called test.txt that contains the following:
>
> TESTONE-1
> TESTTWO-2
> TESTTHREE-3
> TESTFOUR-4
> TESTFIVE-5
>
> I have this Hive script:
>
> hive> CREATE TABLE test
>     > (
>     > field_1 STRING
>     > )
>     > ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
>     > WITH SERDEPROPERTIES
>     > (
>     > "input.regex" = "([^ ]*)",
>     > "output.regex" = "%1$s"
>     > )
>     > STORED AS TEXTFILE;
> Found class for org.apache.hadoop.hive.contrib.serde2.RegexSerDe
> OK
> Time taken: 0.064 seconds
>
> hive> LOAD DATA LOCAL INPATH '/home/hadoop/test' OVERWRITE INTO TABLE test;
> Copying data from file:/home/hadoop/test
> Loading data to table test
> OK
> Time taken: 0.213 seconds
>
> hive> SELECT * FROM test LIMIT 10;
> OK
> TESTONE-1
> TESTTWO-2
> TESTTHREE-3
> TESTFOUR-4
> TESTFIVE-5
> Time taken: 0.153 seconds
>
> This produces the expected output.
>
> When I alter the Hive script to include two fields, I get all null values:
>
> hive> CREATE TABLE test
>     > (
>     > field_1 STRING,
>     > field_2 STRING
>     > )
>     > ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
>     > WITH SERDEPROPERTIES
>     > (
>     > "input.regex" = "([a-z,A-Z]*)(-\d*)",
>     > "output.regex" = "%1$s %2$s"
>     > )
>     > STORED AS TEXTFILE;
> Found class for org.apache.hadoop.hive.contrib.serde2.RegexSerDe
> OK
> Time taken: 0.025 seconds
>
> hive> LOAD DATA LOCAL INPATH '/home/hadoop/test' OVERWRITE INTO TABLE test;
> Copying data from file:/home/hadoop/test
> Loading data to table test
> OK
> Time taken: 0.187 seconds
>
> hive> SELECT * FROM test LIMIT 10;
> OK
> NULL NULL
> NULL NULL
> NULL NULL
> NULL NULL
> NULL NULL
> Time taken: 0.162 seconds
>
> I've checked the regular expression against http://regexpal.com/ and it seems
> to check out. I think there may be an issue with SerDe, but I don't know how
> to go about troubleshooting it.
>
> I'm running this on Amazon's Elastic MapReduce.
>
> Any help is appreciated.
>
> -Sal
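For reference, here is a minimal sketch of the two-field table from the question with only the escaping changed. It assumes the same data file and the same contrib RegexSerDe on the classpath, and it is untested on Elastic MapReduce:

```sql
-- Two-field table from the question; the only change is \\d in input.regex.
-- Hive unescapes the string literal, so RegexSerDe should receive ([a-z,A-Z]*)(-\d*).
CREATE TABLE test
(
  field_1 STRING,
  field_2 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES
(
  "input.regex" = "([a-z,A-Z]*)(-\\d*)",
  "output.regex" = "%1$s %2$s"
)
STORED AS TEXTFILE;
```

After reloading the file with the same LOAD DATA statement, a line like TESTONE-1 should split into TESTONE and -1. The hyphen stays in field_2 because it sits inside the second group; "([a-z,A-Z]*)-(\\d*)" would keep only the digits. Other regex escapes such as \s or \w likely need the same doubling.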