Re: how to load data to partitioned table

2011-08-11 Thread wd
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML 2011/8/12 Daniel,Wu > suppose the table is partitioned by period_key, and the csv file also has > a column named as period_key. The csv file contains multiple days of data, > how can we load it into the table? > > I think of

how to load data to partitioned table

2011-08-11 Thread Daniel,Wu
Suppose the table is partitioned by period_key, and the csv file also has a column named period_key. The csv file contains multiple days of data; how can we load it into the table? I think of a workaround by first loading the data into a non-partitioned table, and then insert the data from n
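
The workaround described in this question (stage the raw file first, then insert into the partitioned table) comes down to routing each row to the partition named by its period_key value. Below is a minimal Python sketch of that routing, run outside Hive. The function name and the sample columns store_key and amount are assumptions made for illustration; they do not come from the thread.

```python
import csv
import io
from collections import defaultdict


def split_by_partition(csv_text, partition_column="period_key"):
    """Group CSV rows by the partition column, removing that column from each row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    partitions = defaultdict(list)
    for row in reader:
        # The partition value becomes the group key instead of a data column.
        key = row.pop(partition_column)
        partitions[key].append(row)
    return dict(partitions)


# Hypothetical sample data covering two days.
sample = (
    "period_key,store_key,amount\n"
    "20110801,1,9.5\n"
    "20110802,1,3.0\n"
    "20110801,2,4.25\n"
)
parts = split_by_partition(sample)
```

Each key of parts corresponds to one partition, and each value holds the rows that would land in it.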

create table for hive

2011-08-11 Thread Daniel,Wu
DROP TABLE store_sales; CREATE TABLE store_sales (SUBVENDOR_ID_KEY int, VENDOR_KEY int, RETAILER_KEY int, ITEM_KEY int, STORE_KEY int, SubvendorId string, OOS_REASON_KEY int, Total_Sales_Amount float, Total_Sales_Volume_Units float, Store_On_Hand_Volume_Units float, Promoted_Sal

Re: Running Hive from Eclipse

2011-08-11 Thread john smith
Hi, The trace shows that the log4j properties file is not found. I added the Hive_conf dir to the classpath while running, and now I get this trace: http://pastebin.com/vXs98aZ5 I am completely clueless! Thanks, JS On Fri, Aug 12, 2011 at 9:54 AM, john smith wrote: > Hi Carl, > > This is the stack tra

Re: Running Hive from Eclipse

2011-08-11 Thread john smith
Hi Carl, This is the stack trace I get: http://pastebin.com/3pASqvDq I configured MySQL as my metastore, and it is updated correctly whenever I add tables via the command line. One more thing: I am not getting any log statements while using the command line. I haven't messed up wi

how to make the data in one table available to multiple tables?

2011-08-11 Thread Daniel,Wu
We have a table named sales, which is partitioned by period (MMDD), and we also need a table ly_sales (last year's sales). For performance reasons, we don't use a view that joins sales with the last-year mapping table (e.g. 20110603 mapped to 20100603). However we used the

Re: how to distribute a small table to all nodes?

2011-08-11 Thread Loren Siebert
The Hive table is just a directory in HDFS, so you can recursively set the replication factor on it as you like. You can set it to the number of datanodes you have. If you have 100 nodes, then run this after you create your table: hadoop fs -setrep -R -w 100 /path/to/hive/warehouse/smal
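
As a rough illustration of the command in this reply, the Python sketch below builds the argument list for hadoop fs -setrep -R -w from a directory and a replication factor. The helper name and the placeholder directory are hypothetical; only the command and its flags come from the message.

```python
def setrep_command(table_dir, replication):
    """Build the argument list for `hadoop fs -setrep -R -w <n> <dir>`."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return ["hadoop", "fs", "-setrep", "-R", "-w", str(replication), table_dir]


# Placeholder directory; substitute the real table directory in HDFS.
cmd = setrep_command("/path/to/table_dir", 100)

# To run it on a machine with the Hadoop client installed:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Setting the replication factor to the number of datanodes, as suggested, means passing that count as the second argument.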

how to distribute a small table to all nodes?

2011-08-11 Thread Daniel,Wu
We have a very small table to join. We can use a map-side join, which needs the small table to be available on each map task. Is it possible to replicate the small table to ALL nodes when creating it, to cut the time spent distributing it?

Re: multiple tables join with only one huge table.

2011-08-11 Thread Ayon Sinha
The Mapjoin hint helps optimize the query by loading the smaller tables specified in the hint into memory, so every small table is held in the memory of each mapper. -Ayon Fr
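
The idea in this reply (small tables held in memory, the large table streamed past them) can be sketched in a few lines of Python. This is a conceptual illustration only, not Hive's implementation; the function name, table contents, and column names are assumptions.

```python
def map_side_join(fact_rows, small_tables):
    """Inner-join streamed fact rows against small tables held in memory.

    small_tables maps a key column of the fact row to a lookup dict
    of the form {key value: attributes to add}.
    """
    for row in fact_rows:
        joined = dict(row)
        for key_column, lookup in small_tables.items():
            match = lookup.get(row[key_column])
            if match is None:
                break  # no match in this small table: drop the row
            joined.update(match)
        else:
            # Reached only when every small table had a match.
            yield joined


# Hypothetical small tables, each small enough to fit in memory.
stores = {1: {"store_name": "north"}, 2: {"store_name": "south"}}
products = {10: {"product_name": "tea"}}

# Hypothetical fact rows; in practice these would be streamed, not listed.
facts = [
    {"store_key": 1, "item_key": 10, "units": 3},
    {"store_key": 2, "item_key": 99, "units": 1},  # no matching product
]

result = list(map_side_join(facts, {"store_key": stores, "item_key": products}))
```

The fact table is read once and never held in memory as a whole, which is why this approach suits one very large table joined with several small ones.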

multiple tables join with only one huge table.

2011-08-11 Thread Daniel,Wu
The retailer fact table is sale_fact with 10B rows, and it joins with 3 small tables: stores (10K), products (10K), period (1K). What's the best join solution? Oracle can first build a hash for stores, a hash for products, and a hash for period, then probe using the fact table; if the row mat

Re: Running Hive from Eclipse

2011-08-11 Thread Carl Steinbach
Hi John, Can you please include the error messages/exceptions that you're encountering? Thanks. Carl On Thu, Aug 11, 2011 at 1:40 PM, john smith wrote: > Hi folks, > > I am trying to run Hive from eclipse. I've set it up correctly and it is > building the jars and stuff. However I face execep

Running Hive from Eclipse

2011-08-11 Thread john smith
Hi folks, I am trying to run Hive from Eclipse. I've set it up correctly and it is building the jars and stuff. However, I face exceptions when I try to run Hive queries like "show tables" etc. There has been a discussion on this in the mailing list previously, but there was no solution provided.

RE: Reducer Issue in New Setup

2011-08-11 Thread Aggarwal, Vaibhav
Are you using a custom scheduler? I have seen issues with jobs having 0 mappers and 1 reducer with the Fair scheduler. From: hadoop n00b [mailto:new2h...@gmail.com] Sent: Thursday, August 11, 2011 9:32 AM To: user@hive.apache.org Subject: Reducer Issue in New Setup Hello, We have just set up Hive on

Re: Reducer Issue in New Setup

2011-08-11 Thread Loren Siebert
Can you run normal MR jobs, like the example Pi calculation? Sometimes a no-reducer problem stems from DNS issues: reducers use node names, not IP addresses, so each machine needs to know how to resolve the names of all the other machines in the cluster. If it's a new cluster, you may

RE: Reducer Issue in New Setup

2011-08-11 Thread Travis Powell
Have you checked your logs? These are often the best places to start. Look at the running job and click on the running count, the current task, then the task logs. Sometimes they're helpful, sometimes they're not. http://hadoop-master:50030/jobtracker.jsp Travis Powell / tpow...@tealea

Reducer Issue in New Setup

2011-08-11 Thread hadoop n00b
Hello, We have just set up Hive on a new Hadoop cluster. When I run a select * on a table, it works fine, but when I run any query which needs a reducer, like count(1) or a where condition, the query just sits there doing nothing (map 0%). I see a message along the lines of no reducers to run. How do I fix th

Re: Help IN CHD

2011-08-11 Thread Harsh J
Vikas, This question belongs on Hadoop's lists. I'm moving it to hdfs-u...@hadoop.apache.org. To answer your question: DN hostnames must be listed in the file that dfs.hosts points to if you want selective inclusion. Otherwise you just have to start the DN with the right config and network access to the NN, and

Help IN CHD

2011-08-11 Thread Vikas Srivastava
Hey All, Please tell me where to enter the datanode IPs in CDH3U2. I have installed all the components on the namenode and datanodes, but I am confused about where to put the datanode IPs on the namenode so that they get connected. -- With Regards Vikas Srivastava DWH & Analytics Team Mob:+91 9560885900 One97 | Let's

Re: CDH3 U1 Hive Job-commit very slow

2011-08-11 Thread air
I did some tests and found that it is not a Hive issue: when I submit a job using hadoop jar, it has the same problem, so I need to find the cause in the Hadoop cluster. 2011/8/11 air > hi Aggarwal, I am using the newest version (CDH3 Update1 Hive 0.7), after > submitting several jobs us