The example below shows that the RCFILE SerDe doesn't handle "\n" in string fields correctly.
It seems that the SerDe uses "\n" internally as a record delimiter but fails to serialize/deserialize it correctly when it appears within a field. Is that correct? Any ideas on how to work around it?

    $ echo X > dual.data
    $ hive
    hive> CREATE TABLE araujo_sandbox.dual(dummy string) stored as textfile;
    OK
    hive> use araujo_sandbox;
    OK
    hive> LOAD DATA LOCAL INPATH 'dual.data' INTO TABLE araujo_sandbox.dual;
    ...
    OK
    hive> select * from dual;
    OK
    X
    hive> CREATE TABLE araujo_sandbox.testIssue(
        > id int,
        > first_name string,
        > last_name string
        > )
        > ROW FORMAT SERDE
        > 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
        > STORED AS INPUTFORMAT
        > 'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
        > OUTPUTFORMAT
        > 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
        > ;
    OK
    hive> insert into table araujo_sandbox.testIssue
        > select 1, 'John\n', 'Doe' from dual;
    ...
    1 Rows loaded to testissue
    ...
    OK
    hive> select id from testIssue;
    ...
    OK
    1
    Time taken: 4.475 seconds, Fetched: 1 row(s)
    hive> select first_name from testIssue;
    ...
    OK
    John
                    <---- there's an empty row here!!
    Time taken: 4.44 seconds, Fetched: 2 row(s)    <---- there should be only 1 row
    hive> select last_name from testIssue;
    ...
    OK
    Doe
    Time taken: 4.414 seconds, Fetched: 1 row(s)
    hive> select * from testIssue;
    OK
    1 John Doe
    Time taken: 0.065 seconds, Fetched: 1 row(s)
    hive>

Thanks,
Andre

--
André Araújo
Big Data Consultant/Solutions Architect
The Pythian Group - Australia - www.pythian.com