That is what I ended up doing. Since I could not change the format of the
existing logs, I wrote a utility
<https://github.com/markkerzner/WebLogAnalyzer/blob/master/src/main/java/com/shmsoft/webloganalyzer/ApacheWebLog.java>
to convert them to something more standard that Hive can easily accept.
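
For anyone who finds this thread later, here is a minimal sketch of the
conversion idea. It is written in Python and is not taken from the Java
utility linked above; the function names split_log_line and to_tsv are
made up for illustration, and it assumes that quoted fields never contain
an embedded double quote and that no field contains a tab.

```python
import re

# A token is a [bracketed] field, a "quoted" field, or a run of
# non-space characters. Order matters: brackets and quotes are tried first.
TOKEN = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')


def split_log_line(line):
    """Split a log line on spaces, keeping [..] and ".." fields whole."""
    fields = []
    for match in TOKEN.finditer(line):
        bracketed, quoted, bare = match.groups()
        if bracketed is not None:
            fields.append(bracketed)
        elif quoted is not None:
            # Strip padding that mail line-wrapping may leave inside quotes.
            fields.append(quoted.strip())
        else:
            fields.append(bare)
    return fields


def to_tsv(line):
    """Join the parsed fields with tabs, a delimiter Hive can split on."""
    return "\t".join(split_log_line(line))


if __name__ == "__main__":
    # Shortened, hypothetical variant of the sample line from this thread.
    sample = (
        '[01/May/2011:00:00:00 +0000] 68.115.109.118 TLSv1 RC4-MD5 '
        '"GET /dynLink/?PCD=CHICHHH HTTP/1.1" 200 95 0 99885 '
        '"Mozilla/4.0 (compatible; MSIE 7.0)" "t=1304208000431979"'
    )
    print(to_tsv(sample))
```

The converted file can then be loaded into a table declared with
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t', because the spaces inside
the request and user-agent fields no longer act as field separators.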

Thank you,
Mark

2011/9/24 longmans163 <longmans...@163.com>

> Hi Mark, I see that "GET /dynLink/?" contains a space, and Hive treats
> every space as the field terminator you defined with FIELDS TERMINATED BY.
> I think you should encode the spaces inside fields as some other,
> non-terminator character.
>
>
> At 2011-09-23 04:58:59,"Mark Kerzner" <mark.kerz...@shmsoft.com> wrote:
>
> Hi,
>
> I have an Apache web log (sample below) and want to load it with LOAD DATA
> INPATH.
>
> My fields are separated by a space, and those that contain spaces are
> enclosed in quotes.
>
> I tried this,
>
> ROW FORMAT   DELIMITED
> FIELDS TERMINATED BY " "
> COLLECTION ITEMS TERMINATED BY '"'
> MAP KEYS TERMINATED BY ","
>
> but it did not work: Hive treated GET as a separate field. What should
> I change?
>
> Thank you,
> Mark
>
>
> [01/May/2011:00:00:00 +0000] 68.115.109.118 TLSv1 RC4-MD5 "GET
> /dynLink/?PCD=CHICHHH&EBC=3425154412&RCC=D2RVX&GAD=20110426&NMN=2&NOA=1&
> amp;NOC=0&LNG=en&TBP=325.43&GEM=STEPHENCLAUDENELSON%40GMAIL.COM&GEN=&GSL=&GLN=NELSON&GFN=STEPHEN&GCC=&GST=&GCT=&GPC=&GAR=&GPN=&PRT=0&PLC=&PCC=brandwebsite&PSC=&SRP=CIBMS0&PID=HIL&PET=WEB&GNR=1&CRP=0901452
> HTTP/1.1" 200 95 0 99885 "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1;
> .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729;
> InfoPath.2; .NET4.0C; .NET4.0E; MS-RTC LM 8)" "
> https://secure.hilton.com/en/hi/res/retrieved_reservation.jhtml;jsessionid=UIBJ2MH0JDJPOCSGBIYMVCQ?_requestid=153483";
>  "t=1304208000431979"  "D=99766"
>
>
>
>
