The log lines are in some kind of JSON format though. The regex needs to literally match the log lines.
On Fri, Sep 30, 2011 at 8:52 PM, Jonathan <jonny2...@gmail.com> wrote: > Hi, > I am trying to parse an apache2 log using > the 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'. I am able to load > the tables using the script below but its showing up each of the 3 rows as > null for every entry. > CREATE TABLE apachelog4 ( > ip STRING, > time STRING, > beacon STRING) > ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' > WITH SERDEPROPERTIES ( > "input.regex" = "([0-9]*?\.[0-9]*?\.[0-9]*?\.[0-9]*?) (\[.*\]) (beacon.* > )", > "output.regex" = "%1$s %2$s %3$s" > ) > STORED AS TEXTFILE; > > As an example here is a few rows of the input data that I am trying to get > Hive to parse. > > {"body":"68.40.84.98 - - [29/Sep/2011:17:38:15 -0400] \"GET > /beacon?action=imp&pos=2&PlacementId=&AdType=1&hmGUID=e9a51cd2-14ec-48d5-8652-b7607af26962&advertiserPostcode=WA12+0HE&lat=53.466&lon=-2.6327&searchRadius=&DistanceModified=&sort=priceasc&vMake=&vModel=&vMaximumPrice=&vMaximumPriceModified=&vMinimumPrice=&vMinimumPriceModified=&vFuelType=&vMileage=&vAge=&vBodyType=&vTransmission=&vEngineCc=&vColour=&sellertype=&vNoOfDoors=&channel=cars&pgN=1&vMinAge=&vMaxAge=&vMinMileage=&vMaxMileage=&vMinEngineCc=&vMaxEngineCc=&Platform=&PlatformVersion=&cUserID=&cid=70&pid=7fc9e5b6-6dfc-4c45-8cd9-8709b7c8e0fb&advertisementId=201113383357957&advertiserId=0 > HTTP/1.1\" 304 > 148","timestamp":1317332295456,"pri":"INFO","nanos":10720819422720,"host":"domU-12-31-39-0F-1D-C3.compute-1.internal","fields":{"rolltag":"20110929-173823752-0400.7672766572025.00000058"}} > {"body":"68.40.84.98 - - [29/Sep/2011:17:38:15 -0400] \"GET /favicon.ico > HTTP/1.1\" 404 > 467","timestamp":1317332295693,"pri":"INFO","nanos":10721056284876,"host":"domU-12-31-39-0F-1D-C3.compute-1.internal","fields":{"rolltag":"20110929-173823752-0400.7672766572025.00000058"}} > {"body":"68.40.84.98 - - [29/Sep/2011:17:38:16 -0400] \"GET > /beacon?action=imp&pos=2&PlacementId=&AdType=1&hmGUID=e9a51cd2-14ec-48d5-8652-b7607af26962&advertiserPostcode=WA12+0HE&lat=53.466&lon=-2.6327&searchRadius=&DistanceModified=&sort=priceasc&vMake=&vModel=&vMaximumPrice=&vMaximumPriceModified=&vMinimumPrice=&vMinimumPriceModified=&vFuelType=&vMileage=&vAge=&vBodyType=&vTransmission=&vEngineCc=&vColour=&sellertype=&vNoOfDoors=&channel=cars&pgN=1&vMinAge=&vMaxAge=&vMinMileage=&vMaxMileage=&vMinEngineCc=&vMaxEngineCc=&Platform=&PlatformVersion=&cUserID=&cid=70&pid=7fc9e5b6-6dfc-4c45-8cd9-8709b7c8e0fb&advertisementId=201113383357957&advertiserId=0 > HTTP/1.1\" 304 > 148","timestamp":1317332296116,"pri":"INFO","nanos":10721479503001,"host":"domU-12-31-39-0F-1D-C3.compute-1.internal","fields":{"rolltag":"20110929-173823752-0400.7672766572025.00000058"}} > {"body":"68.40.84.98 - - [29/Sep/2011:17:38:16 -0400] \"GET /favicon.ico > HTTP/1.1\" 404 > 467","timestamp":1317332296353,"pri":"INFO","nanos":10721716299339,"host":"domU-12-31-39-0F-1D-C3.compute-1.internal","fields":{"rolltag":"20110929-173823752-0400.7672766572025.00000058"}} > {"body":"68.40.84.98 - - [29/Sep/2011:17:38:16 -0400] \"GET > /beacon?action=imp&pos=2&PlacementId=&AdType=1&hmGUID=e9a51cd2-14ec-48d5-8652-b7607af26962&advertiserPostcode=WA12+0HE&lat=53.466&lon=-2.6327&searchRadius=&DistanceModified=&sort=priceasc&vMake=&vModel=&vMaximumPrice=&vMaximumPriceModified=&vMinimumPrice=&vMinimumPriceModified=&vFuelType=&vMileage=&vAge=&vBodyType=&vTransmission=&vEngineCc=&vColour=&sellertype=&vNoOfDoors=&channel=cars&pgN=1&vMinAge=&vMaxAge=&vMinMileage=&vMaxMileage=&vMinEngineCc=&vMaxEngineCc=&Platform=&PlatformVersion=&cUserID=&cid=70&pid=7fc9e5b6-6dfc-4c45-8cd9-8709b7c8e0fb&advertisementId=201113383357957&advertiserId=0 > HTTP/1.1\" 304 > 148","timestamp":1317332296533,"pri":"INFO","nanos":10721896540315,"host":"domU-12-31-39-0F-1D-C3.compute-1.internal","fields":{"rolltag":"20110929-173823752-0400.7672766572025.00000058"}} > {"body":"68.40.84.98 - - [29/Sep/2011:17:38:16 -0400] \"GET /favicon.ico > HTTP/1.1\" 404 > 467","timestamp":1317332296833,"pri":"INFO","nanos":10722196180417,"host":"domU-12-31-39-0F-1D-C3.compute-1.internal","fields":{"rolltag":"20110929-173823752-0400.7672766572025.00000058"}} > {"body":"68.40.84.98 - - [29/Sep/2011:17:38:17 -0400] \"GET > /beacon?action=imp&pos=2&PlacementId=&AdType=1&hmGUID=e9a51cd2-14ec-48d5-8652-b7607af26962&advertiserPostcode=WA12+0HE&lat=53.466&lon=-2.6327&searchRadius=&DistanceModified=&sort=priceasc&vMake=&vModel=&vMaximumPrice=&vMaximumPriceModified=&vMinimumPrice=&vMinimumPriceModified=&vFuelType=&vMileage=&vAge=&vBodyType=&vTransmission=&vEngineCc=&vColour=&sellertype=&vNoOfDoors=&channel=cars&pgN=1&vMinAge=&vMaxAge=&vMinMileage=&vMaxMileage=&vMinEngineCc=&vMaxEngineCc=&Platform=&PlatformVersion=&cUserID=&cid=70&pid=7fc9e5b6-6dfc-4c45-8cd9-8709b7c8e0fb&advertisementId=201113383357957&advertiserId=0 > HTTP/1.1\" 304 > > Thanks in advance for all you help.