That is what I ended up doing - since I could not change the format of the existing logs, I wrote a utility <https://github.com/markkerzner/WebLogAnalyzer/blob/master/src/main/java/com/shmsoft/webloganalyzer/ApacheWebLog.java> to convert them to something more standard that Hive can easily accept.
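For illustration, here is a minimal sketch of that kind of conversion. It is not the code from the linked utility. It assumes that a field is either a [bracketed] timestamp, a "quoted" string, or a run of non-space characters, and it writes the fields tab-separated so that a table declared with FIELDS TERMINATED BY '\t' could read them. The name to_tsv and the shortened sample line are hypothetical.

```python
import re

# One token is a [bracketed] value, a "quoted" value, or a run of non-space characters.
TOKEN = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')


def to_tsv(line: str) -> str:
    """Split one log line into fields and join them with tabs (hypothetical helper)."""
    fields = []
    for match in TOKEN.finditer(line):
        bracketed, quoted, bare = match.groups()
        if bracketed is not None:
            value = bracketed
        elif quoted is not None:
            value = quoted
        else:
            value = bare
        # Trim stray padding and make sure no tab survives inside a field.
        fields.append(value.strip().replace("\t", " "))
    return "\t".join(fields)


# Abbreviated version of the sample log line (query string, user agent and referrer shortened).
sample = (
    '[01/May/2011:00:00:00 +0000] 68.115.109.118 TLSv1 RC4-MD5 '
    '"GET /dynLink/?PCD=CHICHHH HTTP/1.1" 200 95 0 99885 '
    '"Mozilla/4.0 (compatible; MSIE 7.0)" "t=1304208000431979" "D=99766"'
)
print(to_tsv(sample))
```

With this approach the request ("GET ... HTTP/1.1") stays a single field, which is what the space-delimited table definition could not achieve.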

Thank you,
Mark

2011/9/24 longmans163 <longmans...@163.com>

> hi, Mark, I saw ""GET /dynLink/?" contains a space, since hive should
> recognize this as a FIELDS TERMINATED which you have defined before. I
> think you should encode the spaces to other non-terminate char.
>
> At 2011-09-23 04:58:59,"Mark Kerzner" <mark.kerz...@shmsoft.com> wrote:
>
> Hi,
>
> I have an apache web log (sample below), and want to LOAD DATA INPATH.
>
> My fields are separated by a space, and those that contains spaces are
> enclosed in quotes.
>
> I tried this,
>
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY " "
> COLLECTION ITEMS TERMINATED BY '"'
> MAP KEYS TERMINATED BY ","
>
> but it did not work, and thought that GET is a separate field. What should
> I change?
>
> Thank you,
> Mark
>
> [01/May/2011:00:00:00 +0000] 68.115.109.118 TLSv1 RC4-MD5 "GET
> /dynLink/?PCD=CHICHHH&EBC=3425154412&RCC=D2RVX&GAD=20110426&NMN=2&NOA=1&amp;NOC=0&LNG=en&TBP=325.43&GEM=STEPHENCLAUDENELSON%40GMAIL.COM&GEN=&GSL=&GLN=NELSON&GFN=STEPHEN&GCC=&GST=&GCT=&GPC=&GAR=&GPN=&PRT=0&PLC=&PCC=brandwebsite&PSC=&SRP=CIBMS0&PID=HIL&PET=WEB&GNR=1&CRP=0901452
> HTTP/1.1" 200 95 0 99885 "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1;
> .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729;
> InfoPath.2; .NET4.0C; .NET4.0E; MS-RTC LM 8)" "
> https://secure.hilton.com/en/hi/res/retrieved_reservation.jhtml;jsessionid=UIBJ2MH0JDJPOCSGBIYMVCQ?_requestid=153483"
> "t=1304208000431979" "D=99766"