The log lines are in some kind of JSON format though. The regex needs
to literally match the log lines.

On Fri, Sep 30, 2011 at 8:52 PM, Jonathan <jonny2...@gmail.com> wrote:
> Hi,
> I am  trying to parse an apache2 log using
> the 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'. I am able to load
> the tables using the script below but its showing up each of the 3 rows as
> null for every entry.
> CREATE TABLE apachelog4 (
>   ip STRING,
>   time STRING,
>   beacon STRING)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
> WITH SERDEPROPERTIES (
>   "input.regex" = "([0-9]*?\.[0-9]*?\.[0-9]*?\.[0-9]*?) (\[.*\]) (beacon.*
> )",
>   "output.regex" = "%1$s %2$s %3$s"
> )
> STORED AS TEXTFILE;
>
> As an example here is a few rows of the input data that I am trying to get
> Hive to parse.
>
> {"body":"68.40.84.98 - - [29/Sep/2011:17:38:15 -0400] \"GET
> /beacon?action=imp&pos=2&PlacementId=&AdType=1&hmGUID=e9a51cd2-14ec-48d5-8652-b7607af26962&advertiserPostcode=WA12+0HE&lat=53.466&lon=-2.6327&searchRadius=&DistanceModified=&sort=priceasc&vMake=&vModel=&vMaximumPrice=&vMaximumPriceModified=&vMinimumPrice=&vMinimumPriceModified=&vFuelType=&vMileage=&vAge=&vBodyType=&vTransmission=&vEngineCc=&vColour=&sellertype=&vNoOfDoors=&channel=cars&pgN=1&vMinAge=&vMaxAge=&vMinMileage=&vMaxMileage=&vMinEngineCc=&vMaxEngineCc=&Platform=&PlatformVersion=&cUserID=&cid=70&pid=7fc9e5b6-6dfc-4c45-8cd9-8709b7c8e0fb&advertisementId=201113383357957&advertiserId=0
> HTTP/1.1\" 304
> 148","timestamp":1317332295456,"pri":"INFO","nanos":10720819422720,"host":"domU-12-31-39-0F-1D-C3.compute-1.internal","fields":{"rolltag":"20110929-173823752-0400.7672766572025.00000058"}}
>  {"body":"68.40.84.98 - - [29/Sep/2011:17:38:15 -0400] \"GET /favicon.ico
> HTTP/1.1\" 404
> 467","timestamp":1317332295693,"pri":"INFO","nanos":10721056284876,"host":"domU-12-31-39-0F-1D-C3.compute-1.internal","fields":{"rolltag":"20110929-173823752-0400.7672766572025.00000058"}}
>  {"body":"68.40.84.98 - - [29/Sep/2011:17:38:16 -0400] \"GET
> /beacon?action=imp&pos=2&PlacementId=&AdType=1&hmGUID=e9a51cd2-14ec-48d5-8652-b7607af26962&advertiserPostcode=WA12+0HE&lat=53.466&lon=-2.6327&searchRadius=&DistanceModified=&sort=priceasc&vMake=&vModel=&vMaximumPrice=&vMaximumPriceModified=&vMinimumPrice=&vMinimumPriceModified=&vFuelType=&vMileage=&vAge=&vBodyType=&vTransmission=&vEngineCc=&vColour=&sellertype=&vNoOfDoors=&channel=cars&pgN=1&vMinAge=&vMaxAge=&vMinMileage=&vMaxMileage=&vMinEngineCc=&vMaxEngineCc=&Platform=&PlatformVersion=&cUserID=&cid=70&pid=7fc9e5b6-6dfc-4c45-8cd9-8709b7c8e0fb&advertisementId=201113383357957&advertiserId=0
> HTTP/1.1\" 304
> 148","timestamp":1317332296116,"pri":"INFO","nanos":10721479503001,"host":"domU-12-31-39-0F-1D-C3.compute-1.internal","fields":{"rolltag":"20110929-173823752-0400.7672766572025.00000058"}}
>  {"body":"68.40.84.98 - - [29/Sep/2011:17:38:16 -0400] \"GET /favicon.ico
> HTTP/1.1\" 404
> 467","timestamp":1317332296353,"pri":"INFO","nanos":10721716299339,"host":"domU-12-31-39-0F-1D-C3.compute-1.internal","fields":{"rolltag":"20110929-173823752-0400.7672766572025.00000058"}}
>  {"body":"68.40.84.98 - - [29/Sep/2011:17:38:16 -0400] \"GET
> /beacon?action=imp&pos=2&PlacementId=&AdType=1&hmGUID=e9a51cd2-14ec-48d5-8652-b7607af26962&advertiserPostcode=WA12+0HE&lat=53.466&lon=-2.6327&searchRadius=&DistanceModified=&sort=priceasc&vMake=&vModel=&vMaximumPrice=&vMaximumPriceModified=&vMinimumPrice=&vMinimumPriceModified=&vFuelType=&vMileage=&vAge=&vBodyType=&vTransmission=&vEngineCc=&vColour=&sellertype=&vNoOfDoors=&channel=cars&pgN=1&vMinAge=&vMaxAge=&vMinMileage=&vMaxMileage=&vMinEngineCc=&vMaxEngineCc=&Platform=&PlatformVersion=&cUserID=&cid=70&pid=7fc9e5b6-6dfc-4c45-8cd9-8709b7c8e0fb&advertisementId=201113383357957&advertiserId=0
> HTTP/1.1\" 304
> 148","timestamp":1317332296533,"pri":"INFO","nanos":10721896540315,"host":"domU-12-31-39-0F-1D-C3.compute-1.internal","fields":{"rolltag":"20110929-173823752-0400.7672766572025.00000058"}}
>  {"body":"68.40.84.98 - - [29/Sep/2011:17:38:16 -0400] \"GET /favicon.ico
> HTTP/1.1\" 404
> 467","timestamp":1317332296833,"pri":"INFO","nanos":10722196180417,"host":"domU-12-31-39-0F-1D-C3.compute-1.internal","fields":{"rolltag":"20110929-173823752-0400.7672766572025.00000058"}}
>  {"body":"68.40.84.98 - - [29/Sep/2011:17:38:17 -0400] \"GET
> /beacon?action=imp&pos=2&PlacementId=&AdType=1&hmGUID=e9a51cd2-14ec-48d5-8652-b7607af26962&advertiserPostcode=WA12+0HE&lat=53.466&lon=-2.6327&searchRadius=&DistanceModified=&sort=priceasc&vMake=&vModel=&vMaximumPrice=&vMaximumPriceModified=&vMinimumPrice=&vMinimumPriceModified=&vFuelType=&vMileage=&vAge=&vBodyType=&vTransmission=&vEngineCc=&vColour=&sellertype=&vNoOfDoors=&channel=cars&pgN=1&vMinAge=&vMaxAge=&vMinMileage=&vMaxMileage=&vMinEngineCc=&vMaxEngineCc=&Platform=&PlatformVersion=&cUserID=&cid=70&pid=7fc9e5b6-6dfc-4c45-8cd9-8709b7c8e0fb&advertisementId=201113383357957&advertiserId=0
> HTTP/1.1\" 304
>
> Thanks in advance for all you help.

Reply via email to