You store 90 million records in a DB? What kind?
Sure, there are some optimizations to speed up Hive queries, but I don't
see a universal one, except adding more servers.
On Wed, Sep 5, 2012 at 2:19 PM, iwannaplay games
wrote:
> Hi all,
>
> I ran a query on hive on top of 90 million records that too
Awk is good, except that it is not distributed. Some may use it with
Hadoop streaming, but I haven't given it a try yet.
Pig has some advanced features, and its field-based, SQL-like language
(Pig Latin) is more flexible than text processing.
Anyway, glad I could help.
On Wed, Sep 5, 2012 at 9:33 AM,
Hi Miao Miao,
Thanks for the response and solution idea.
I am not familiar with Pig (as I am still a beginner on hadoop & hive),
will check it out.
The simplest way that comes to mind now is to awk the logs and
create a CSV file with the input values I want before I load it into my
Hive table.
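
As a rough illustration of the "extract the wanted values into a CSV before loading" idea above, here is a minimal Python sketch. It is not from the thread: the regular expression assumes the Apache common log format, and the function name `log_lines_to_csv` and the chosen field list are hypothetical. A real LogFormat may need a different pattern.

```python
import csv
import io
import re

# Assumption: logs follow the Apache common log format.
# Adjust the pattern if the server uses a custom LogFormat.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def log_lines_to_csv(lines, fields=("host", "time", "request", "status")):
    """Return CSV text holding only the chosen fields of each parsable line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        writer.writerow([match.group(name) for name in fields])
    return out.getvalue()


# Hypothetical sample input: one valid line and one line to be skipped.
sample = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326',
    'not a log line',
]
csv_text = log_lines_to_csv(sample)
```

The resulting text could be written to a file and loaded into a comma-delimited Hive table. Using the `csv` module instead of plain string joins keeps fields that contain commas or quotes intact.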
Hi Rekha
Thank you for your response.
Now I can be sure that output.format.string doesn't support what I am
trying to do.
Thanks for providing the solution of creating an external table as a staging
table in order to do this.
Will try it out.
Thank you.
> Hi lai,
>
> Interesting, and ideally we must
Ravi,
It looks like you are missing the
ADD JAR ...
command
Ruslan
On Tue, Sep 4, 2012 at 6:45 PM, Edward Capriolo wrote:
> You could start with this:
>
> https://github.com/edwardcapriolo/hive-geoip
>
> On Tue, Sep 4, 2012 at 10:42 AM, Ravi Shetye wrote:
>> Hi
>> I am trying to register a jav
Hi
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-DifferencebetweenSortByandOrderBy
Sort By will give you only partially sorted results if you have more
than one reducer
Ruslan
On Mon, Sep 3, 2012 at 1:38 AM, Binesh Gummadi wrote:
> Thanks for your q
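
To make the "partially sorted" remark above concrete, here is a small Python simulation. It is only a sketch and not Hive code: the round-robin split stands in for however Hive actually distributes rows across reducers, and the function names are made up for illustration.

```python
def sort_by(rows, num_reducers):
    """Simulate SORT BY: each reducer sorts only the rows it receives."""
    buckets = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        # Assumption: round-robin distribution, a stand-in for Hive's real one.
        buckets[i % num_reducers].append(row)
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))  # sorted within a reducer only
    return result


def order_by(rows):
    """Simulate ORDER BY: one total ordering over all rows."""
    return sorted(rows)


rows = [5, 1, 4, 2, 3, 6]
partial = sort_by(rows, 2)   # each reducer's slice is sorted, the whole is not
total = order_by(rows)       # globally sorted
```

With two simulated reducers, each slice of the output is ordered but the concatenation is not. With a single reducer the two functions return the same result.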
Thanks Bertrand, will look into the same.
Regards,
Tamil
On Tue, Sep 4, 2012 at 8:24 PM, Bertrand Dechoux wrote:
> Hi,
>
> Would reflection be the answer for your question?
> http://hive.apache.org/docs/r0.9.0/udf/reflect.html
>
> This is a UDF but a very flexible one.
>
> Regards
>
> Bertrand
Hi,
Would reflection be the answer for your question?
http://hive.apache.org/docs/r0.9.0/udf/reflect.html
This is a UDF but a very flexible one.
Regards
Bertrand
On Tue, Sep 4, 2012 at 4:38 PM, Tamil A <4tamil...@gmail.com> wrote:
> Hi Experts,
> I am familiar with calling out to python from
You could start with this:
https://github.com/edwardcapriolo/hive-geoip
On Tue, Sep 4, 2012 at 10:42 AM, Ravi Shetye wrote:
> Hi
> I am trying to register a java udf which looks like
>
> public final class IP_2_GEO extends UDF {
> String geo_file;
> String geo_type;
> public IP_2_GEO(String geo_
I tried importing apache2 logs into Hive a few weeks ago and took a look
at SERDEPROPERTIES, but it was too complicated, and pasting others'
demos wouldn't work.
Then I came up with another solution: apache2 log -> Apache Pig (for
ETL) -> Hive external table. But I ran into a problem with Pig (which
wa
Hi lai,
Interesting, and ideally we should have a feature like the one below, but I don't
think we have either a partition-based UPDATE on the metastore (MERGE/APPEND if
the partition already exists) or named/positional direct HDFS column filtering
in SQL.
CREATE TABLE c ...
UPDATE TABLE c PARTITION(dt=) SELECT $0, $3