Hi, 
I have a question about the behavior of the class 
org.apache.hadoop.hive.contrib.serde2.RegexSerDe. Here is the example I tested 
using the Cloudera hive-0.7.1-cdh3u3 release. The class did NOT do what I 
expected. Does anyone know the reason?
user:~/tmp> more Test.java
import java.io.*;
import java.text.*;

class Test {
    public static void main (String[] argv) throws Exception
    {
        String line = "aaa,\"bbb\",\"cc,c\"";
        String[] tokens = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
        int i = 1;
        for(String t : tokens) {
            System.out.println(i + "> "+t);
            i++;
        }
    }
}
:~/tmp> java Test
1> aaa
2> "bbb"
3> "cc,c"
As you can see, the Java regular expression ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)" 
did what I wanted: used with String.split(), it parses the string 
aaa,"bbb","cc,c" into 3 tokens: (aaa), ("bbb"), and ("cc,c"). So the regular 
expression works fine as a split delimiter.
Now in Hive:
:~> more test.txt
aaa,"bbb","cc,c"
:~> hive
Hive history file=/tmp/user/hive_job_log_user_201204031242_591028210.txt
hive> create table test(
    >  c1 string,
    >  c2 string,
    >  c3 string
    > )
    > row format
    > SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    > WITH SERDEPROPERTIES (
    > "input.regex" = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)"
    > )
    > STORED AS TEXTFILE;
OK
Time taken: 0.401 seconds
hive> load data local inpath 'test.txt' overwrite into table test;
Copying data from file:/home/user/test.txt
Copying file: file:/home/user/test.txt
Loading data to table dev.test
Deleted hdfs://host/user/hive/warehouse/dev.db/test
OK
Time taken: 0.282 seconds
hive> select * from test;
OK
NULL    NULL    NULL
When I query this table, I don't get what I expected. I expected the output 
to be the 3 strings, like this ----->        aaa        "bbb"       "cc,c"
Why does the output give me 3 NULLs?
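
One thing that may be worth checking is whether the pattern is being used the same way in both tests. The sketch below rests on an assumption that is not verified against this exact release: that RegexSerDe applies "input.regex" to the whole line with Matcher.matches() and maps capturing groups 1..n to the columns, instead of using the pattern as a split delimiter. Under that assumption a delimiter-style pattern never matches a full line, which would explain all-NULL rows. The class name MatchVsSplit, the helper methods, and LINE_REGEX are hypothetical names made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchVsSplit {
    // The delimiter-style pattern used in the table definition.
    static final String SPLIT_REGEX = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical whole-line pattern with one capturing group per column
    // (assumes an unquoted first field followed by two quoted fields).
    static final String LINE_REGEX = "([^,]*),(\"[^\"]*\"),(\"[^\"]*\")";

    // True only when the pattern matches the entire line.
    static boolean matchesWholeLine(String regex, String line) {
        return Pattern.compile(regex).matcher(line).matches();
    }

    // Returns capturing groups 1..n when the whole line matches, else null.
    static String[] groups(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 0; i < out.length; i++) {
            out[i] = m.group(i + 1);
        }
        return out;
    }

    public static void main(String[] argv) {
        String line = "aaa,\"bbb\",\"cc,c\"";

        // The delimiter pattern starts with a literal comma, so it cannot
        // match a whole line that starts with "aaa".
        System.out.println("split regex matches whole line: "
                + matchesWholeLine(SPLIT_REGEX, line));

        // The whole-line pattern yields one group per column.
        String[] cols = groups(LINE_REGEX, line);
        if (cols != null) {
            for (int i = 0; i < cols.length; i++) {
                System.out.println((i + 1) + "> " + cols[i]);
            }
        }
    }
}
```

If the assumption holds, the table would need a whole-line pattern along the lines of LINE_REGEX in "input.regex", with the quoting adjusted for Hive string literals. That part is untested here.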
Thanks for your help.                                     
