Hi Mark,

You could try RegexSerDe, which deserializes each row using a regular 
expression. Here is a good example:

http://books.google.com/books?id=Nff49D7vnJcC&lpg=PA391&ots=IicwYn7zOq&dq=ROW%20FORMAT%20SERDE%20input.regex&pg=PA391#v=onepage&q=ROW%20FORMAT%20SERDE%20input.regex&f=false
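
To make the idea concrete, here is a minimal Python sketch of what a regex-based row format does: one regular expression matches the whole log line, and each capturing group becomes one column. The field layout is only inferred from the sample line quoted further down in this thread, the names parse_line and SAMPLE are made up for illustration, and the sample is abbreviated. Hive uses Java regular expressions, so treat the pattern as a starting point for an input.regex value rather than something to paste in unchanged.

```python
import re

# One capturing group per column. The layout below is an assumption
# inferred from the sample log line in this thread.
FIELD_PATTERNS = [
    r'\[([^\]]*)\]',   # timestamp in square brackets (contains a space)
    r'(\S+)',          # client IP
    r'(\S+)',          # TLS protocol
    r'(\S+)',          # cipher
    r'"([^"]*)"',      # quoted request line, e.g. GET ... HTTP/1.1
    r'(\S+)',          # status code
    r'(\S+)',          # numeric field (meaning not stated in the thread)
    r'(\S+)',          # numeric field (meaning not stated in the thread)
    r'(\S+)',          # numeric field (meaning not stated in the thread)
    r'"([^"]*)"',      # quoted user agent
    r'"([^"]*)"',      # quoted referrer
    r'"([^"]*)"',      # quoted t=... value
    r'"([^"]*)"',      # quoted D=... value
]

# Fields are separated by one or more whitespace characters.
LINE_RE = re.compile(r'\s+'.join(FIELD_PATTERNS) + r'\s*$')

# Abbreviated, hypothetical version of the sample line.
SAMPLE = (
    '[01/May/2011:00:00:00 +0000] 68.115.109.118 TLSv1 RC4-MD5 '
    '"GET /dynLink/?PCD=CHICHHH&NMN=2 HTTP/1.1" 200 95 0 99885 '
    '"Mozilla/4.0 (compatible; MSIE 7.0)" '
    '"https://secure.hilton.com/en/hi/res/retrieved_reservation.jhtml" '
    '"t=1304208000431979"  "D=99766"'
)


def parse_line(line):
    """Return the list of column values, or None if the line does not match."""
    match = LINE_RE.match(line)
    return list(match.groups()) if match else None


if __name__ == "__main__":
    for index, value in enumerate(parse_line(SAMPLE)):
        print(index, value)
```

The quoted request comes back as a single column, spaces included, which is exactly what the plain space delimiter could not do.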

Good luck,
Vince


From: Mark Kerzner <mark.kerz...@shmsoft.com>
Reply-To: <user@hive.apache.org>
Date: Sat, 24 Sep 2011 22:43:23 -0500
To: <user@hive.apache.org>
Subject: Re: How to load quote-separated fields?

That is what I ended up doing - since I could not change the format of the 
existing logs, I wrote a utility
<https://github.com/markkerzner/WebLogAnalyzer/blob/master/src/main/java/com/shmsoft/webloganalyzer/ApacheWebLog.java>
to convert them to something more standard that Hive can easily accept.
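
For readers who want to try the same conversion approach, below is a rough Python sketch. It is not the linked Java utility. It only illustrates one way to do such a conversion, assuming fields are either bracketed, double-quoted, or bare tokens, and the name to_tsv is hypothetical. It rewrites each line as tab-separated text, which a table declared with FIELDS TERMINATED BY '\t' should then be able to load.

```python
import re

# A field is a [bracketed] run, a "quoted" run, or a run of non-space
# characters. Each alternative has its own capturing group.
TOKEN_RE = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')


def to_tsv(line):
    """Convert one space/quote-delimited log line to a tab-separated line."""
    fields = []
    for match in TOKEN_RE.finditer(line):
        # Exactly one of the three groups participates in each match.
        value = next(g for g in match.groups() if g is not None)
        # Guard against stray tabs inside a value breaking the output format.
        fields.append(value.replace('\t', ' '))
    return '\t'.join(fields)


if __name__ == "__main__":
    import sys
    # Usage sketch: python convert.py < access.log > access.tsv
    for raw in sys.stdin:
        print(to_tsv(raw.rstrip('\n')))
```

Because the enclosing quotes and brackets are dropped, the output has one plain delimiter and no quoting rules left to handle.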

Thank you,
Mark

2011/9/24 longmans163 <longmans...@163.com>
Hi Mark, I see that the quoted field starting with "GET /dynLink/? contains a 
space, and Hive treats that space as the FIELDS TERMINATED BY delimiter you 
defined. I think you should encode those spaces as some other character that 
is not a delimiter.


At 2011-09-23 04:58:59, "Mark Kerzner" <mark.kerz...@shmsoft.com> wrote:
Hi,

I have an apache web log (sample below), and want to LOAD DATA INPATH.

My fields are separated by a space, and those that contain spaces are enclosed 
in quotes.

I tried this,

ROW FORMAT   DELIMITED
FIELDS TERMINATED BY " "
COLLECTION ITEMS TERMINATED BY '"'
MAP KEYS TERMINATED BY ","

but it did not work: Hive treated GET as a separate field. What should I 
change?

Thank you,
Mark


[01/May/2011:00:00:00 +0000] 68.115.109.118 TLSv1 RC4-MD5 "GET 
/dynLink/?PCD=CHICHHH&EBC=3425154412&RCC=D2RVX&GAD=20110426&NMN=2&NOA=1&NOC=0&LNG=en&TBP=325.43&GEM=STEPHENCLAUDENELSON%40GMAIL.COM&GEN=&GSL=&GLN=NELSON&GFN=STEPHEN&GCC=&GST=&GCT=&GPC=&GAR=&GPN=&PRT=0&PLC=&PCC=brandwebsite&PSC=&SRP=CIBMS0&PID=HIL&PET=WEB&GNR=1&CRP=0901452
 HTTP/1.1" 200 95 0 99885 "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 
.NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 
3.5.30729; InfoPath.2; .NET4.0C; .NET4.0E; MS-RTC LM 8)" 
"https://secure.hilton.com/en/hi/res/retrieved_reservation.jhtml;jsessionid=UIBJ2MH0JDJPOCSGBIYMVCQ?_requestid=153483"
  "t=1304208000431979"  "D=99766"




