Retrieve the file content based on file name

2016-11-21 Thread Arun Patel
Hive Experts, I am using INPUT__FILE__NAME to store the file name in a column of an ORC table. Now, is there a way to retrieve the content of the HDFS file using any Hive function? If no function is available, is there a UDF I can use? Thank you very much for your help. - Arun

Re: HDFS small files to Sequence file using Hive

2016-10-06 Thread Arun Patel
Is there a way to increase the file/block size beyond 1MB? Thank you! On Mon, Sep 26, 2016 at 7:50 PM, Arun Patel wrote: > Thanks Dudu and Gopal. I tried HAR files and it works. I want to use Sequence file because I want to expose data using a table (filename a

Re: HDFS small files to Sequence file using Hive

2016-09-26 Thread Arun Patel
Thanks Dudu and Gopal. I tried HAR files and it works. I want to use a SequenceFile because I want to expose the data through a table (filename and content columns). Can this be done for HAR files? This is what I am doing to create a SequenceFile: create external table raw_files (raw_data string) l

HDFS small files to Sequence file using Hive

2016-09-23 Thread Arun Patel
I'm trying to resolve the small files issue using Hive. Is there a way to create an external table on a directory, extract the file name as 'key' and the file content as 'value', and write them to a sequence file table? Or is there a better option in Hive? Thank you, Arun

Re: RegexSerDe with Filters

2016-07-14 Thread Arun Patel
ngHandler Message%' then val_num end)) as timestamps_no_dup ... from v group by tid; From: Arun Patel [mailto:arunp.bigd...@gmail.com] Sent: Sunday, July 03, 2016 12:39 AM To: user@h

Re: RegexSerDe with Filters

2016-07-02 Thread Arun Patel
ssage, min (case when att = 'Request received in writer' then ts end) as ts_Request_received_in_writer, min (case when att = 'Total time' then ts end) as ts_Total_time

Re: RegexSerDe with Filters

2016-07-01 Thread Arun Patel
een TID: and TID number. How do I create the base table and views? I am planning to join these 3 views on TID. Are there any special considerations I need to take into account? Regards, Venkat On Fri, Jun 24, 2016 at 5:17 PM, Arun Patel wrote: > Dudu, Thanks for the clarification. Looks like I have an

Hive Query Error: Cannot obtain block length

2016-06-28 Thread Arun Patel
I am trying to do log analytics on the logs created by Flume. Hive queries are failing with the error below. The "hadoop fs -cat" command works on all of these open files. Is there a way to read these open files? My requirement is to read the data from open files too. I am using Tez as the execution engine.

Re: RegexSerDe with Filters

2016-06-24 Thread Arun Patel
^]]+)\]\s+(\S+)\s+:\s+(TID:\s\d+)?\s*(.*) I’ll send you a screenshot in private, since you don’t want to expose the data. Dudu From: Arun Patel [mailto:arunp.bigd...@gmail.com] Sent: Friday, June 24, 2016 9:33 PM

Re: RegexSerDe with Filters

2016-06-24 Thread Arun Patel
It looks like the regex pattern is not working. I tested the pattern on https://regex101.com/ and it does not find any match. Any suggestions? On Thu, Jun 23, 2016 at 3:01 PM, Markovitz, Dudu wrote: > My pleasure. Please feel free to reach me if needed. Dudu

Re: RegexSerDe with Filters

2016-06-21 Thread Arun Patel
; from log_V where txt like 'GET request received for path %'; select * from log_v_get_request;

RegexSerDe with Filters

2016-06-20 Thread Arun Patel
Hello Hive Experts, I use Flume to ingest application-specific logs from syslog to HDFS. Currently, I grep the HDFS directory for specific patterns (for multiple types of requests) and then create reports. However, generating weekly and monthly reports this way is not scalable. I would like to create