Setting up stats database

2011-08-15 Thread wd
Hi, I'm trying to use Postgres as the stats database, and made the following settings in hive-site.xml: hive.stats.dbclass = jdbc:postgresql, description: The default database that stores temporary hive statistics. hive.stats.autogather = true, description: A flag to gather statistics automatically during the INSERT OVERWR

Re: Setting up stats database

2011-08-15 Thread wd
Oh, I found that Hive only supports MySQL and HBase. I'll try HBase. On Mon, Aug 15, 2011 at 3:09 PM, wd wrote: > hi, > > I'm try to use postgres as stats database. And made following settings > in hive-site.xml > > hive.stats.dbclass > jdbc:postgresql > The default database that stores temporary

Re: Setting up stats database

2011-08-15 Thread wd
HBase Publisher/Aggregator classes cannot be loaded. I would need to configure a publisher/aggregator for HBase, so the only way left is to use MySQL. Will a stats database optimize Hive queries? I'm considering whether or not to set up MySQL for this. On Mon, Aug 15, 2011 at 3:17 PM, wd wrote: > oh, foun

slow performance when using udf

2011-08-15 Thread wd
Hi, I created a UDF to decode URL-encoded strings, but found that the MapReduce job takes about 3 times as long as before (73 sec -> 213 sec). How can I optimize it? package com.test.hive.udf; import org.apache.hadoop.hive.ql.exec.UDF; import java.net.URLDecoder; public final class urldecode extends UDF { public Str

Re: slow performance when using udf

2011-08-15 Thread Carl Steinbach
Converting it to a GenericUDF (i.e. extending GenericUDF instead of UDF) should help some with performance. On Mon, Aug 15, 2011 at 1:49 AM, wd wrote: > hi, > > I create a udf to decode urlencoded things, but found the speed for > mapred is 3 times(73sec -> 213 sec) as before. How to optimize it

Re: slow performance when using udf

2011-08-15 Thread Edward Capriolo
On Monday, August 15, 2011, Carl Steinbach wrote: > Converting it to a GenericUDF (i.e. extending GenericUDF instead of UDF) should help some with performance. > On Mon, Aug 15, 2011 at 1:49 AM, wd wrote: >> >> hi, >> >> I create a udf to decode urlencoded things, but found the speed for >> mapre

copy table, change serde

2011-08-15 Thread Jonathan Grimm
Hi, I'm trying to do what I think should be a simple task, but I'm running into some issues with carrying through column names. All I want to do is essentially copy an existing table but change the serialization format (if you're curious, this is to help integrate with some existing map reduce

Single Map task for Hive queries

2011-08-15 Thread Jon Bender
Hello, I have external tables in Hive stored in a single flat text file. When I execute queries against it, all of my jobs are run as a single map task, even on very large tables. What steps do I need to take to ensure that these queries are split up and pushed out to multiple TTs? Do I need to

Re: Single Map task for Hive queries

2011-08-15 Thread Loren Siebert
Is your external file compressed with GZip or BZip? Those file formats aren’t splittable, so they get assigned to one mapper. On Aug 15, 2011, at 10:23 AM, Jon Bender wrote: > Hello, > > I have external tables in Hive stored in a single flat text file. When I > execute queries against it, al
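One quick way to check the compression question raised above is to look at the first few bytes of the data file. The sketch below is only an illustration: `CompressionSniffer` is a hypothetical helper, not part of Hive or Hadoop, and it reads a local copy of the file rather than one on HDFS. It relies on the standard magic numbers: gzip streams start with bytes 0x1f 0x8b, and bzip2 streams start with the ASCII characters "BZh".

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper (not a Hive or Hadoop class): guesses whether a file
// is gzip- or bzip2-compressed from its leading magic bytes.
final class CompressionSniffer {

    // Classify a file from its first bytes.
    static String detect(byte[] head) {
        if (head.length >= 2 && (head[0] & 0xff) == 0x1f && (head[1] & 0xff) == 0x8b) {
            return "gzip";
        }
        if (head.length >= 3 && head[0] == 'B' && head[1] == 'Z' && head[2] == 'h') {
            return "bzip2";
        }
        return "neither";
    }

    // Read up to three bytes from a local file and classify them.
    static String detect(Path file) throws IOException {
        byte[] head = new byte[3];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0) {
                    break; // file is shorter than three bytes
                }
                filled += n;
            }
        }
        return detect(Arrays.copyOf(head, filled));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CompressionSniffer <local-file>");
            return;
        }
        System.out.println(detect(Paths.get(args[0])));
    }
}
```

A result of "neither" only rules out these two formats; it says nothing about why a job gets a single mapper.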

Re: Single Map task for Hive queries

2011-08-15 Thread Jon Bender
It's actually just an uncompressed UTF-8 text file. This was essentially the create table clause: CREATE EXTERNAL TABLE foo ROW FORMAT DELIMITED STORED AS TEXTFILE LOCATION '/data/foo' Using Hive 0.7. On Mon, Aug 15, 2011 at 10:37 AM, Loren Siebert wrote: > Is your external file compressed wit

Re: Single Map task for Hive queries

2011-08-15 Thread Ayon Sinha
Can you try to recreate the external table with FIELDS TERMINATED BY and LINES TERMINATED BY clauses? -Ayon From: Jon Bender To: user@hive.apache.org Sent: Monday, Augus

Re: Single Map task for Hive queries

2011-08-15 Thread Loren Siebert
You should not have to do anything special to Hive to make it use all of your TT’s. The actual MR job should be governed by your mapred-site.xml file. When you run sample MR jobs (like the Pi example) and look at the job tracker, are you seeing all your TT’s getting used? On Aug 15, 2011, at 10

Re: Single Map task for Hive queries

2011-08-15 Thread Jon Bender
Yeah, MapReduce itself is set up to use all of my task trackers--only one map task gets created on the external table queries. I tried querying another external table (composed of some 20 files) and it created 20 map tasks in turn during the query. I will try the LINES TERMINATED BY clause next t

Wiki write access, please

2011-08-15 Thread Jakob Homan
The current DDL page doesn't have documentation about the describe database command. I'd like to add that. I'm listed under my apache addr: jgho...@apache.org Thanks, Jakob

Re: Wiki write access, please

2011-08-15 Thread John Sichi
Granted! JVS On Aug 15, 2011, at 4:35 PM, Jakob Homan wrote: > The current DDL page doesn't have documentation about the describe > database command. I'd like to add that. I'm listed under my apache > addr: jgho...@apache.org > > Thanks, > Jakob

Re: slow performance when using udf

2011-08-15 Thread wd
Thanks for all your advice, I'll try it out. On Mon, Aug 15, 2011 at 9:02 PM, Edward Capriolo wrote: > > > On Monday, August 15, 2011, Carl Steinbach wrote: >> Converting it to a GenericUDF (i.e. extending GenericUDF instead of UDF) >> should help some with performance. >> On Mon, Aug 15, 2011 a

Re: slow performance when using udf

2011-08-15 Thread wd
Finally, the following code shows no performance loss. I think the point is to avoid using the getString method. Thanks again, everyone. //import org.apache.hadoop.hive.ql.udf.generic.GenericUDF; import org.apache.hadoop.hive.ql.exec.UDF; import org.apache.hadoop.io.Text; import java.net.URLDecoder;
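The listing above is cut off in the archive, so the following is not wd's code. It is a minimal sketch of the decoding step alone, in plain Java with no Hive or Hadoop dependency, so that it can be run and checked on its own. The class name `UrlDecodeSketch`, the choice of UTF-8, and the policy of returning malformed input unchanged are all assumptions made for illustration; a real UDF would call something like this from its evaluate method.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical standalone sketch of the decoding core; not the UDF from the thread.
final class UrlDecodeSketch {

    // Decode a URL-encoded string as UTF-8.
    // Assumed policy: null stays null, malformed input is returned unchanged.
    static String decode(String s) {
        if (s == null) {
            return null;
        }
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        } catch (IllegalArgumentException e) {
            // Thrown for broken escapes such as a trailing "%" or "%zz".
            return s;
        }
    }

    public static void main(String[] args) {
        for (String arg : args) {
            System.out.println(decode(arg));
        }
    }
}
```

Returning the input unchanged on a malformed escape keeps a single bad row from failing a whole job; whether that is the right policy depends on the data.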