Yes views looks like a way to go On Tue, Sep 20, 2016 at 3:49 PM, Damien Carol <damien.ca...@gmail.com> wrote:
> The royal way to do that is a view IMHO. > > 2016-09-20 12:14 GMT+02:00 Manish Rangari <linuxtricksfordev...@gmail.com> > : > >> Thanks for the reply Damien. The suggestion you gave is really useful. >> Currently I am achieving my desired output by performing below steps. But I >> want to achieve the desired result in one step instead of two. Do we have >> any way so that I can get the aid, did etc in create table statement? If >> not I will have to look for the option that you mentioned >> >> 1. >> CREATE TABLE elblog ( >> Request_date STRING, >> ELBName STRING, >> RequestIP STRING, >> RequestPort INT, >> BackendIP STRING, >> BackendPort INT, >> RequestProcessingTime DOUBLE, >> BackendProcessingTime DOUBLE, >> ClientResponseTime DOUBLE, >> ELBResponseCode STRING, >> BackendResponseCode STRING, >> ReceivedBytes BIGINT, >> SentBytes BIGINT, >> RequestVerb STRING, >> URL STRING, >> Protocol STRING, >> Useragent STRING, >> ssl_cipher STRING, >> ssl_protocol STRING >> ) >> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' >> WITH SERDEPROPERTIES ( >> "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ >> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*) >> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$" >> ) >> STORED AS TEXTFILE; >> >> 2. >> create table elb_raw_log as select request_date, elbname, requestip, >> requestport, backendip, backendport, requestprocessingtime, >> backendprocessingtime, clientresponsetime, elbresponsecode, >> backendresponsecode, receivedbytes, sentbytes, requestverb, url, >> regexp_extract(url, '.*aid=([a-zA-Z0-9]+).*', 1) as aid, >> regexp_extract(url, '.*tid=([a-zA-Z0-9]+).*', 1) as tid, >> regexp_extract(url, '.*eid=([a-zA-Z0-9]+).*', 1) as eid, >> regexp_extract(url, '.*did=([a-zA-Z0-9]+).*', 1) as did, protocol, >> useragent, ssl_cipher, ssl_protocol from elblog; >> >> On Tue, Sep 20, 2016 at 3:12 PM, Damien Carol <damien.ca...@gmail.com> >> wrote: >> >>> see the udf >>> *parse_url_tuple* >>> SELECT b.* >>> FROM src LATERAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY', >>> 'QUERY:id') b as host, path, query, query_id LIMIT 1; >>> >>> >>> https://cwiki.apache.org/confluence/display/Hive/LanguageMan >>> ual+UDF#LanguageManualUDF-parse_url_tuple >>> >>> 2016-09-20 11:22 GMT+02:00 Manish Rangari <linuxtricksfordev...@gmail.co >>> m>: >>> >>>> Guys, >>>> >>>> I want to get the field of elb logs. A sample elb log is given below >>>> and I am using below create table definition. It is working fine. I am >>>> getting what I wanted but now I want the bold part as well. For example >>>> eid, tid, aid. Can anyone help me how can I match them as well. >>>> >>>> NOTE: The position of aid, eid, tid is not fixed and it may change. >>>> >>>> 2016-09-16T06:55:19.056871Z testelb 2.1.7.2:52399 192.168.1.5:80 >>>> 0.000021 0.000596 0.00002 200 200 0 43 "GET >>>> https://site1.example.com:443/peek? >>>> *eid=aw123&tid=fskc235n&aid=2ADSFGSDG* HTTP/1.1" "Mozilla/5.0 (Windows >>>> NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 >>>> Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 >>>> >>>> >>>> CREATE TABLE elblog ( >>>> Request_date STRING, >>>> ELBName STRING, >>>> RequestIP STRING, >>>> RequestPort INT, >>>> BackendIP STRING, >>>> BackendPort INT, >>>> RequestProcessingTime DOUBLE, >>>> BackendProcessingTime DOUBLE, >>>> ClientResponseTime DOUBLE, >>>> ELBResponseCode STRING, >>>> BackendResponseCode STRING, >>>> ReceivedBytes BIGINT, >>>> SentBytes BIGINT, >>>> RequestVerb STRING, >>>> URL STRING, >>>> Protocol STRING, >>>> Useragent STRING, >>>> ssl_cipher STRING, >>>> ssl_protocol STRING >>>> ) >>>> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' >>>> WITH SERDEPROPERTIES ( >>>> "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ >>>> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*) >>>> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$" >>>> ) >>>> STORED AS TEXTFILE; >>>> >>> >>> >> >