Re: Improving query performance on hive and hdfs

2012-09-04 Thread MiaoMiao
You store 90 million records in a DB? What kind? Sure, there are some optimizations to speed up Hive queries, but I don't see a universal one, except adding more servers. On Wed, Sep 5, 2012 at 2:19 PM, iwannaplay games wrote: > Hi all, > > I ran a query on hive on top of 90 million records that too

Re: loading logfile into hive tables using certain format

2012-09-04 Thread MiaoMiao
Awk is good, except that it is not distributed. Some may use it with Hadoop streaming, but I haven't given it a try yet. Pig has some advanced features; its field-based, SQL-like language (Pig Latin) is more flexible than text processing. Anyway, glad I could help. On Wed, Sep 5, 2012 at 9:33 AM,

Re: loading logfile into hive tables using certain format

2012-09-04 Thread Elaine Gan
Hi Miao Miao, Thanks for the response and solution idea. I am not familiar with Pig (as I am still a beginner on Hadoop & Hive), will check it out. The simplest way that comes to mind now is to awk the logs and create a CSV file with the input values I want before I load it to my Hive table
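
A minimal sketch of the preprocessing idea in this message (pick the wanted fields out of each log line and write them as CSV), in Python instead of awk. The thread does not show the real log layout, so the Apache "common" log format, the `LOG_PATTERN` regex, the `log_to_csv` helper, and the sample line are all assumptions made for illustration.

```python
import csv
import io
import re

# Assumption: Apache "common" log format; the thread does not show the real one.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def log_to_csv(lines, fields=("host", "time", "path", "status")):
    """Return CSV text holding only the chosen fields of each matching line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        writer.writerow([match.group(name) for name in fields])
    return buf.getvalue()


# Hypothetical sample input: one well-formed line and one that is skipped.
sample = [
    '127.0.0.1 - - [04/Sep/2012:10:42:00 +0900] "GET /index.html HTTP/1.1" 200 2326',
    'not a log line',
]
csv_text = log_to_csv(sample)
```

The result is plain comma-separated text, one row per matching log line, which is the shape a Hive table declared with comma-delimited fields would expect.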

Re: loading logfile into hive tables using certain format

2012-09-04 Thread Elaine Gan
Hi Rekha, Thank you for your response. Now I can be sure that output.format.string doesn't support what I am trying to do. Thanks for providing the solution of creating an external table as a staging table in order to do this. Will try it out. Thank you. > Hi lai, > > Interesting, and ideally we must

Re: Hive UDF intialization

2012-09-04 Thread Ruslan Al-Fakikh
Ravi, It looks like you are missing the ADD JAR ... command Ruslan On Tue, Sep 4, 2012 at 6:45 PM, Edward Capriolo wrote: > You could start with this: > > https://github.com/edwardcapriolo/hive-geoip > > On Tue, Sep 4, 2012 at 10:42 AM, Ravi Shetye wrote: >> Hi >> I am trying to register a jav

Re: Hive sort by using a single reducer

2012-09-04 Thread Ruslan Al-Fakikh
Hi https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-DifferencebetweenSortByandOrderBy Sort By will give you only partially sorted results if you have more than one reducer Ruslan On Mon, Sep 3, 2012 at 1:38 AM, Binesh Gummadi wrote: > Thanks for your q
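
A small simulation of the point in this message, not Hive itself: rows are spread across a hypothetical number of reducers, each reducer sorts only its own share (as SORT BY does), and the concatenated output is compared with a single global sort (as ORDER BY does). The function names and the modulo-based distribution are assumptions made for illustration; Hive distributes rows differently.

```python
def sort_by(rows, num_reducers):
    """Simulate SORT BY: each reducer sorts only the rows it receives."""
    # Assumption: rows are spread over reducers by a simple modulo on the
    # row's position; Hive's real distribution differs.
    buckets = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        buckets[i % num_reducers].append(row)
    # Each reducer's output is sorted; the outputs are then concatenated.
    out = []
    for bucket in buckets:
        out.extend(sorted(bucket))
    return out


def order_by(rows):
    """Simulate ORDER BY: one reducer sees every row, so the order is total."""
    return sorted(rows)


rows = [5, 1, 4, 2, 3, 0]
partial = sort_by(rows, num_reducers=2)  # sorted within each reducer only
total = order_by(rows)                   # globally sorted
```

With two reducers each half of `partial` is sorted but the whole list is not; with `num_reducers=1` the two functions agree, which is why a single reducer gives a total order.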

Re: Java methods from within a Hive query

2012-09-04 Thread Tamil A
Thanks Bertrand, will look into the same. Regards, Tamil On Tue, Sep 4, 2012 at 8:24 PM, Bertrand Dechoux wrote: > Hi, > > Would reflection be the answer for your question? > http://hive.apache.org/docs/r0.9.0/udf/reflect.html > > This is a UDF but a very flexible one. > > Regards > > Bertrand

Re: Java methods from within a Hive query

2012-09-04 Thread Bertrand Dechoux
Hi, Would reflection be the answer for your question? http://hive.apache.org/docs/r0.9.0/udf/reflect.html This is a UDF but a very flexible one. Regards Bertrand On Tue, Sep 4, 2012 at 4:38 PM, Tamil A <4tamil...@gmail.com> wrote: > Hi Experts, > I am familiar with calling out to python from

Re: Hive UDF intialization

2012-09-04 Thread Edward Capriolo
You could start with this: https://github.com/edwardcapriolo/hive-geoip On Tue, Sep 4, 2012 at 10:42 AM, Ravi Shetye wrote: > Hi > I am trying to register a java udf which looks like > > public final class IP_2_GEO extends UDF { > String geo_file; > String geo_type; > public IP_2_GEO(String geo_

Re: loading logfile into hive tables using certain format

2012-09-04 Thread MiaoMiao
I tried importing an apache2 log into Hive a few weeks ago, and took a look at SERDEPROPERTIES, but it was too complicated and pasting others' demos wouldn't work. Then I came up with another solution: apache2 log -> Apache Pig (for ETL) -> Hive external table. But I ran into a problem with Pig ( which wa

Re: loading logfile into hive tables using certain format

2012-09-04 Thread Joshi, Rekha
Hi lai, Interesting, and ideally we must have a feature like below, but I don't think we have either partition-based UPDATE on metastore (MERGE/APPEND if partition already exists) or named/positional direct HDFS column filtering on SQL. CREATE TABLE c ... UPDATE TABLE c PARTITION(dt=) SELECT $0, $3