Hi Sreeman,

Unfortunately, I don't think Hive's built-in format can currently read CSV files with fields enclosed in double quotes. More generally, having ingested quite a lot of messy CSV files myself, I would recommend writing a MapReduce (or Spark) job that cleans your CSV before handing it to Hive. This is what I did. The other kinds of issues I ran into included:
- Files not encoded in UTF-8, which made special characters unreadable for Hive
- Some lines with missing or extra columns, which can shift your columns and ruin your stats
- Some lines with unreadable characters (probably data corruption)
- I even got some lines with Java stack traces in them

I hope your CSV is cleaner than that. If you have control over how it is generated, I would recommend replacing your current separator with a tab (and replacing inline tabs with \t) or something similar.

There might already be some open source data-cleaning tools out there. I plan to release mine one day, maybe once I've migrated it to Spark, and if my company agrees. If you're lazy, I've heard that Dataiku Studio (which has a free version) can do this kind of thing, though I have never used it myself.

Hope this helps,

Furcy

2015-02-13 7:30 GMT+01:00 Slava Markeyev:

> You can use LazySimpleSerDe with ROW FORMAT DELIMITED FIELDS TERMINATED
> BY ',' ESCAPED BY '\\'. Check the DDL documentation for details:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
>
> On Thu, Feb 12, 2015 at 8:19 PM, Sreeman wrote:
>
>> Hi All,
>>
>> How are all of you creating Hive/Impala tables when the CSV file has some
>> values with a comma inside them? For example:
>>
>> sree,12345,"payment made,but it is not successful"
>>
>> I know the OpenCSV SerDe exists, but it is not available in Hive versions
>> lower than 0.14.0.
>
> --
> Slava Markeyev | Engineering | Upsight
> Find me on LinkedIn <http://www.linkedin.com/in/slavamarkeyev>
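A minimal sketch of the kind of cleaning pass suggested above, written in plain Python with the standard csv module rather than as a MapReduce or Spark job, so it runs standalone. It assumes the input is already decoded text and that every valid row has a known column count (3 here, matching Sreeman's example row). The names clean_csv, escape_field and EXPECTED_COLUMNS are hypothetical, not part of any library. The sketch parses quoted fields, sets aside rows with the wrong number of columns, and emits the rest as tab-separated lines with inline tabs replaced by the literal sequence \t, as suggested above.

```python
import csv
import io

# Assumption: 3 columns, matching the example row "sree,12345,...".
# Adjust to your own schema.
EXPECTED_COLUMNS = 3


def escape_field(value: str) -> str:
    """Escape backslashes, tabs and line breaks so a field fits on one TSV line."""
    return (
        value.replace("\\", "\\\\")  # escape the escape character first
        .replace("\t", "\\t")        # inline tab -> literal \t
        .replace("\n", "\\n")        # embedded newline -> literal \n
        .replace("\r", "\\r")
    )


def clean_csv(text: str, expected_columns: int = EXPECTED_COLUMNS):
    """Parse quoted CSV text; return (tsv_lines, rejected_rows)."""
    good, rejected = [], []
    # csv.reader honours double-quoted fields, so the comma inside
    # "payment made,but ..." stays part of a single field.
    for row in csv.reader(io.StringIO(text, newline="")):
        if not row:
            continue  # skip blank lines
        if len(row) != expected_columns:
            # Too few or too many columns: set the row aside instead of
            # letting it shift the columns of the loaded table.
            rejected.append(row)
            continue
        good.append("\t".join(escape_field(field) for field in row))
    return good, rejected


if __name__ == "__main__":
    sample = 'sree,12345,"payment made,but it is not successful"\nbad,row\n'
    lines, rejected = clean_csv(sample)
    print(lines)     # ['sree\t12345\tpayment made,but it is not successful']
    print(rejected)  # [['bad', 'row']]
```

Rows in rejected can be logged or inspected by hand rather than silently loaded. For a real file, reading it with open(path, encoding="utf-8", newline="") makes non-UTF-8 input raise a UnicodeDecodeError instead of producing unreadable characters later in Hive.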
