date:20130404

Re: Loopup objects in distributed cache

2013-04-04 Thread vivek thakre

Thanks Jan for your reply. This is helpful.
Vivek

On Thu, Apr 4, 2013 at 12:11 AM, Jan Dolinár wrote:
> Hello Vivek,
>
> GenericUDTF has method initialize() which is only called once per task. So
> if you read your files in this method and store the structures in memory
> then the overhead is r

Re: Correct syntax for EXPLAIN DEPENDENCY

2013-04-04 Thread Sanjay Subramanian

Thanks. I did upgrade but got stumped by this, so I reverted: https://issues.cloudera.org/browse/DISTRO-461
Regards
sanjay

On 4/4/13 7:37 PM, "Jarek Jarcec Cecho" wrote:
>Hi Sanjay,
>you can upgrade to CDH4.2.0 that contains Hive 0.10.
>
>Jarcec
>
>On Fri, Apr 05, 2013 at 01:48:39AM +, Sa

Re: Correct syntax for EXPLAIN DEPENDENCY

2013-04-04 Thread Jarek Jarcec Cecho

Hi Sanjay,
you can upgrade to CDH4.2.0 that contains Hive 0.10.

Jarcec

On Fri, Apr 05, 2013 at 01:48:39AM +, Sanjay Subramanian wrote:
> Ah its available only in 0.10.0 :-(
> And I am still using 0.9.x from the CDH4.1.2 distribution
>
>
> From: Sanjay Subramanian
> mailto:sanjay.subraman..

Re: Correct syntax for EXPLAIN DEPENDENCY

2013-04-04 Thread Sanjay Subramanian

Ah, it's available only in 0.10.0 :-(
And I am still using 0.9.x from the CDH4.1.2 distribution.

From: Sanjay Subramanian <sanjay.subraman...@wizecommerce.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Thursday, April 4, 2013 6:40 PM
T

Correct syntax for EXPLAIN DEPENDENCY

2013-04-04 Thread Sanjay Subramanian

Hi,
What's the correct syntax for EXPLAIN DEPENDENCY?

Query
==
/usr/lib/hive/bin/hive -e "explain dependency select * from channel_market_lang where channelid > 29000"

org.apache.hadoop.hive.ql.parse.ParseException: line 1:8 cannot recognize input near 'plan' 'dependency' 'select' in stateme

Re: Partition performance

2013-04-04 Thread Dean Wampler

Also, how big are the files in each directory? Are they roughly the size of one HDFS block or a multiple? Lots of small files will mean lots of mapper tasks with little to do. You can also compare the job tracker console output for each job. I bet the slow one has a lot of very short map and reduc
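The small-files point above can be made concrete with a rough estimate. The sketch below assumes one map task per HDFS block per file, with no split combining; the 128 MB block size and the file counts are made-up illustration values, not figures from the thread, and `estimate_map_tasks` is a hypothetical helper.

```python
def estimate_map_tasks(file_sizes_bytes, block_size_bytes):
    """Rough estimate: one map task per block per file (no split combining assumed)."""
    # -(-a // b) is integer ceiling division, avoiding float rounding.
    return sum(-(-size // block_size_bytes) for size in file_sizes_bytes)

MB = 1024 * 1024
BLOCK = 128 * MB  # hypothetical HDFS block size

# The same total volume (2,560 MB) laid out two ways; both layouts are made up.
many_small = [1 * MB] * 2560   # 2,560 files of 1 MB each
few_large = [BLOCK] * 20       # 20 files of exactly one block each

small_tasks = estimate_map_tasks(many_small, BLOCK)
large_tasks = estimate_map_tasks(few_large, BLOCK)
print(small_tasks, large_tasks)
```

Under these assumptions the small-file layout needs far more map tasks for the same data, each with very little to read, which is the pattern to look for in the job tracker output.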

Re: Partition performance

2013-04-04 Thread Owen O'Malley

See slide #9 from my Optimizing Hive Queries talk http://www.slideshare.net/oom65/optimize-hivequeriespptx . Certainly, we will improve it, but for now you are much better off with 1,000 partitions than 10,000.

-- Owen

On Thu, Apr 4, 2013 at 4:21 PM, Ramki Palle wrote:
> Is it possible for you

Re: Partition performance

2013-04-04 Thread Ramki Palle

Is it possible for you to send the explain plan of these two queries?

Regards,
Ramki.

On Thu, Apr 4, 2013 at 4:06 PM, Sanjay Subramanian <sanjay.subraman...@wizecommerce.com> wrote:
> The slow down is most possibly due to large number of partitions.
> I believe the Hive book authors tell us t

Re: Partition performance

2013-04-04 Thread Sanjay Subramanian

The slowdown is most likely due to the large number of partitions.
I believe the Hive book authors tell us to be cautious with a large number of partitions :-) and I abide by that.
Users, please add your points of view and experiences.

Thanks
sanjay

From: Ian <liu...@yahoo.com>
Reply-To: "us

Partition performance

2013-04-04 Thread Ian

Hi,
I created 3 years of hourly log files (26280 files in total) and use an external table with partitions to query them. I tried two partition methods.

1). Log files are stored as /test1/2013/04/02/16/00_0 (a directory per hour). Use date and hour as partition keys. Add 3 years of directories to
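For reference, the arithmetic behind the 26280 figure can be checked with a short sketch. The per-hour directory format follows the path quoted in the message; the daily layout is only a hypothetical coarser alternative for comparison (the second method in the message is cut off in this preview), and `hourly_dir` is an illustrative helper name.

```python
from datetime import datetime

HOURS_PER_DAY = 24
DAYS = 3 * 365  # three non-leap years, which matches the 26280 files in the message

# One partition per (date, hour) directory, as in method 1.
hourly_partitions = DAYS * HOURS_PER_DAY
# Hypothetical coarser layout: one partition per date, holding 24 hourly files.
daily_partitions = DAYS
files_per_daily_partition = hourly_partitions // daily_partitions

def hourly_dir(ts, root="/test1"):
    """Build the per-hour directory path in the /test1/YYYY/MM/DD/HH layout."""
    return f"{root}/{ts:%Y/%m/%d/%H}"

example_dir = hourly_dir(datetime(2013, 4, 2, 16))
print(hourly_partitions, daily_partitions, files_per_daily_partition, example_dir)
```

The hourly layout lands well above the 10,000-partition range mentioned elsewhere in this thread, while a per-date layout would sit near 1,000.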

builtins submodule - is it still needed?

2013-04-04 Thread Travis Crawford

Hey hive gurus -

Is the "builtins" hive submodule in use? The submodule was added in HIVE-2523 as a location for builtin-UDFs, but it appears to not have taken off. Any objections to removing it?

DETAILS

For HIVE-4278 I'm making some build changes for the HCatalog integration. The "builtins" sub

Re: Huge join performance issue

2013-04-04 Thread Nitin Pawar

You don't really need subqueries to join tables which have common columns. It's an additional overhead. The best way to filter your data and speed up your data processing is how you lay out your data. When you have a larger table, I will use partitioning and bucketing to trim down the data and improve the

Huge join performance issue

2013-04-04 Thread Gabi D

Hi all,
I have two tables I need to join and then summarize. They are both huge (about 1B rows each, in the relevant partitions) and the query runs for over 2 hours, creating 5T of intermediate data. The current query looks like this:

select t1.b,t1.c,t2.d,t2.e, count(*) from (select a,b,c from ta

Re: Loopup objects in distributed cache

2013-04-04 Thread Jan Dolinár

Hello Vivek,

GenericUDTF has a method initialize(), which is only called once per task. So if you read your files in this method and store the structures in memory, then the overhead is relatively small (reading 15MB per mapper is negligible compared to several GB of processed data).

Best regards,
Ja
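The load-once pattern described above can be sketched in a language-neutral way. This is not Hive code: `LookupTableFunction` is a hypothetical stand-in that only mirrors the idea of an `initialize()` step run once per task and a per-row step that does in-memory lookups, and the side data is made up.

```python
class LookupTableFunction:
    """Illustrative stand-in for a per-task function object; not a Hive class."""

    def __init__(self):
        self.lookup = None
        self.load_count = 0

    def initialize(self, lines):
        # Runs once per task: parse the side data into an in-memory dict.
        self.lookup = dict(line.split("\t", 1) for line in lines)
        self.load_count += 1

    def process(self, key):
        # Runs once per row: a plain dictionary lookup, no file I/O.
        return self.lookup.get(key)

side_data = ["us\tUnited States", "de\tGermany"]  # made-up lookup file contents
fn = LookupTableFunction()
fn.initialize(side_data)
results = [fn.process(k) for k in ["de", "us", "fr", "de"]]
print(results, fn.load_count)
```

However many rows are processed, the side data is parsed only once, so the loading cost is paid per task rather than per row.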


14 matches
