Hi Mark,

You could try RegexSerDe to deserialize the log using a regular expression. Here is a good example:
http://books.google.com/books?id=Nff49D7vnJcC&lpg=PA391&ots=IicwYn7zOq&dq=ROW%20FORMAT%20SERDE%20input.regex&pg=PA391#v=onepage&q=ROW%20FORMAT%20SERDE%20input.regex&f=false

Good luck,
Vince

From: Mark Kerzner <mark.kerz...@shmsoft.com>
Reply-To: <user@hive.apache.org>
Date: Sat, 24 Sep 2011 22:43:23 -0500
To: <user@hive.apache.org>
Subject: Re: How to load quote-separated fields?

That is what I ended up doing. Since I could not change the format of the existing logs, I wrote a utility <https://github.com/markkerzner/WebLogAnalyzer/blob/master/src/main/java/com/shmsoft/webloganalyzer/ApacheWebLog.java> to convert them to something more standard that Hive can easily accept.

Thank you,
Mark

2011/9/24 longmans163 <longmans...@163.com>

Hi Mark,

I see that the request field beginning "GET /dynLink/?" contains spaces. Hive treats each of them as the field terminator you defined with FIELDS TERMINATED BY. I think you should encode the spaces inside fields as some other character that is not a terminator.

At 2011-09-23 04:58:59, "Mark Kerzner" <mark.kerz...@shmsoft.com> wrote:

Hi,

I have an Apache web log (sample below) and want to LOAD DATA INPATH. My fields are separated by a space, and those that contain spaces are enclosed in quotes. I tried this:

    ROW FORMAT DELIMITED FIELDS TERMINATED BY " " COLLECTION ITEMS TERMINATED BY '"' MAP KEYS TERMINATED BY ","

but it did not work: Hive treated GET as a separate field. What should I change?

Thank you,
Mark

[01/May/2011:00:00:00 +0000] 68.115.109.118 TLSv1 RC4-MD5 "GET /dynLink/?PCD=CHICHHH&EBC=3425154412&RCC=D2RVX&GAD=20110426&NMN=2&NOA=1&NOC=0&LNG=en&TBP=325.43&GEM=STEPHENCLAUDENELSON%40GMAIL.COM&GEN=&GSL=&GLN=NELSON&GFN=STEPHEN&GCC=&GST=&GCT=&GPC=&GAR=&GPN=&PRT=0&PLC=&PCC=brandwebsite&PSC=&SRP=CIBMS0&PID=HIL&PET=WEB&GNR=1&CRP=0901452 HTTP/1.1" 200 95 0 99885 "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C; .NET4.0E; MS-RTC LM 8)" "https://secure.hilton.com/en/hi/res/retrieved_reservation.jhtml;jsessionid=UIBJ2MH0JDJPOCSGBIYMVCQ?_requestid=153483" "t=1304208000431979" "D=99766"
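
For readers who want to try the conversion approach discussed in this thread, here is a minimal Python sketch of splitting such a line into fields. It is not Mark's Java utility and not the book example Vince links to. The names TOKEN, split_log_line and to_tsv are hypothetical, and the rule that a [bracketed] timestamp counts as one field is an assumption drawn from the sample line. The same alternation idea might serve as a starting point for a RegexSerDe input.regex, but that is untested here.

```python
import re

# One token is a [bracketed] timestamp, a "quoted" string, or a run of
# non-space characters. Treating brackets as a field wrapper is an
# assumption based on the sample line, whose timestamp contains a space.
TOKEN = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')


def split_log_line(line):
    """Split a log line into fields, keeping quoted or bracketed text whole."""
    fields = []
    for match in TOKEN.finditer(line):
        bracketed, quoted, bare = match.groups()
        # Exactly one group matched; the other two are None.
        if bracketed is not None:
            fields.append(bracketed)
        elif quoted is not None:
            fields.append(quoted)
        else:
            fields.append(bare)
    return fields


def to_tsv(line):
    """Join the fields with tabs so a plain delimited table can read them."""
    return "\t".join(split_log_line(line))


# Shortened line in the same shape as the sample above; the query string
# and the trailing fields are made up for illustration.
sample = ('[01/May/2011:00:00:00 +0000] 68.115.109.118 TLSv1 '
          '"GET /dynLink/?A=1 HTTP/1.1" 200 "t=1"')

if __name__ == "__main__":
    print(split_log_line(sample))
```

Lines converted with to_tsv could then be loaded into a table declared with FIELDS TERMINATED BY '\t', assuming no field itself contains a tab.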