Re: Loading data into hive tables

2011-12-08 Thread alo alt
Adithya, Here are two short articles about this: MS SQL => Sqoop http://mapredit.blogspot.com/2011/10/sqoop-and-microsoft-sql-server.html And on reusing the self-generated classes: http://mapredit.blogspot.com/2011/10/speedup-sqoop.html - Alex
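For reference, a minimal sketch of the kind of import the first article covers (the host, database, table, and credentials below are hypothetical, and the Microsoft SQL Server JDBC driver jar must be on Sqoop's classpath):

  sqoop import \
    --connect "jdbc:sqlserver://dbhost:1433;databaseName=weblogs" \
    --username sqoop_user -P \
    --table page_views \
    --target-dir /user/hadoop/page_views

The speedup in the second article amounts to reusing a previously generated record class via --class-name and --jar-file instead of regenerating it on every import.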

Re: Loading data into hive tables

2011-12-08 Thread bejoy_ks
Adithya, The answer is yes. Sqoop is the tool you are looking for. It has an import option to load data from any JDBC-compliant database into Hive. It even creates the Hive table for you by referring to the source DB table. Hope it helps! Regards, Bejoy K S
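A minimal sketch of the kind of invocation this describes (the connect string and table names are hypothetical):

  sqoop import \
    --connect jdbc:mysql://dbhost/salesdb \
    --username sqoop_user -P \
    --table orders \
    --hive-import \
    --hive-table orders

With --hive-import, Sqoop derives the CREATE TABLE statement from the source table's metadata and loads the imported files into the Hive warehouse.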

Loading data into hive tables

2011-12-08 Thread Aditya Singh30
Hi, I want to know if there is any way to load data directly from some other DB, say Oracle/MySQL etc., into Hive tables, without first extracting the data from the DB into a text/RCFile/sequence file in a specific format and then loading that file into the Hive table. Regards, Adi

Re: Partitioning EXTERNAL TABLE without copying or moving files

2011-12-08 Thread Aniket Mokashi
It is a Hadoop limitation. An HDFS move operation is inexpensive. I am assuming that is not an option for you because you want to preserve the path structure (for backward compatibility's sake). Something like symbolic links (I think they're not supported in 0.20, not sure) or a path filter might help. But,
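To illustrate why the move is inexpensive: it only rewrites namenode metadata, so no blocks are copied. Using the paths from Vince's question (the partition directory name is assumed to match the cal_date column):

  hadoop fs -mkdir /logs/cal_date=2011-09-01
  hadoop fs -mv /logs/log-2011-09-01 /logs/cal_date=2011-09-01/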

Re: Partitioning EXTERNAL TABLE without copying or moving files

2011-12-08 Thread Jasper Knulst
Hi Vince, Hive partitioning can only exist by introducing new directories in HDFS. There is no way to partition the data in a Hive table without adding extra file paths/dirs in HDFS. For an external table you have to redistribute the data yourself into corresponding file paths and add the new partition
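Once each file sits in its own directory, registering it is one statement per partition. A sketch, reusing the table and column names from Matt's reply below (the directory layout is assumed):

  ALTER TABLE logs ADD PARTITION (cal_date='2011-09-01')
  LOCATION '/logs/cal_date=2011-09-01';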

Re: Partitioning EXTERNAL TABLE without copying or moving files

2011-12-08 Thread Vince Hoang
Hi Matt, Thanks for the response. We tried the example you provided without success. When we tried to add a partition by specifying the location as a file (log-2011-09-01.log), Hive complained with "Parent path is not a directory". I think Hive expects a directory. Our directory structure, a

RE: Partitioning EXTERNAL TABLE without copying or moving files

2011-12-08 Thread Tucker, Matt
Hi Vince, External tables shouldn't issue copy or move commands against your data files. You should define the base table location as '/logs', and issue ALTER TABLE commands to add partitions for each date. Example: CREATE EXTERNAL TABLE logs ( Data STRING ) PARTITIONED BY (cal_date STRING) ROW FO
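The quoted example is cut off; it presumably continued roughly as follows (a reconstruction, not Matt's original text; the row format is assumed):

  CREATE EXTERNAL TABLE logs (Data STRING)
  PARTITIONED BY (cal_date STRING)
  ROW FORMAT DELIMITED
  LOCATION '/logs';

  ALTER TABLE logs ADD PARTITION (cal_date='2011-09-01')
  LOCATION '/logs/cal_date=2011-09-01';

Note that, as Vince found, each partition's LOCATION must be a directory, not an individual log file.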

meeting minutes for 5-Dec-2011 contributor meeting

2011-12-08 Thread John Sichi
https://cwiki.apache.org/confluence/display/Hive/ContributorMinutes20111205 I created an INFRA ticket to take Hive out of Review Board: https://issues.apache.org/jira/browse/INFRA-4200 Please use Phabricator for all new review requests: https://cwiki.apache.org/confluence/display/Hive/Phabricat

Partitioning EXTERNAL TABLE without copying or moving files

2011-12-08 Thread Vince Hoang
Hi, I am running Hive 0.7.0 with Hadoop 0.20.2. I have one HDFS folder full of web server logs dating back several months. Is it possible to partition an EXTERNAL TABLE without copying/moving files or altering the layout of the directory? For example, in HDFS, I have:
> /logs/log-2011-09-01
> /l

Re: Hive UDFs/ FunctionRegistry etc

2011-12-08 Thread John Sichi
On Dec 8, 2011, at 12:20 PM, Sam William wrote: > I have a bunch of custom UDFs and I'd like the others in the company to > make use of them in an easy way. I'm not very happy with the 'CREATE > TEMPORARY FUNCTION' arrangement for each session. It'd be great if our > site-specific
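(John's actual suggestion is cut off above.) One common workaround at the time: the Hive CLI executes a .hiverc file at startup, so site-wide registrations can live there instead of being typed each session. A sketch, with hypothetical jar and class names:

  -- contents of $HIVE_HOME/bin/.hiverc (or ~/.hiverc), run by the CLI on startup
  ADD JAR /opt/site/hive-udfs.jar;
  CREATE TEMPORARY FUNCTION normalize_url AS 'com.example.hive.udf.NormalizeUrl';

Making the functions truly built-in would mean registering the classes in FunctionRegistry and rebuilding Hive.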

Hive UDFs/ FunctionRegistry etc

2011-12-08 Thread Sam William
Hi, I have a bunch of custom UDFs and I'd like the others in the company to make use of them in an easy way. I'm not very happy with the 'CREATE TEMPORARY FUNCTION' arrangement for each session. It'd be great if our site-specific functions worked the same way as the built-in functio

Re: Data loading from Datanode

2011-12-08 Thread Bejoy Ks
Hi Keshav, Adding on to the others' comments: you can install Hive anywhere, not necessarily on the namenode. You can install it on a data node, or on a utility server other than the namenode as well; I know a few large clusters that operate that way. The same applies to Pig and other librar

Re: Hive query taking too much time

2011-12-08 Thread Wojciech Langiewicz
Using CombineFileInputFormat might help, but it still creates overhead when you hold many small files in HDFS. I don't know the details of your requirements, but option 2 seems to be better; make sure that X is at least the size of a few blocks in HDFS. You could also merge files incrementally, lik
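For reference, the CombineFileInputFormat suggestion maps onto Hive's CombineHiveInputFormat (the HIVE-74 work Aniket links below); a sketch of the relevant session settings, with illustrative sizes:

  SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
  SET mapred.max.split.size=268435456;  -- cap each combined split at ~256 MB
  SET hive.merge.mapfiles=true;         -- merge small files from map-only jobs
  SET hive.merge.mapredfiles=true;      -- merge small files from map-reduce jobs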

Re: Data loading from Datanode

2011-12-08 Thread Jasper Knulst
Hi Keshav, What you want is not possible, I guess. You can't submit anything into HDFS without the namenode. Datanodes report their local blocks to the namenode; if the namenode does not know them, it will instruct the datanode to delete them. But what's the point? If you submit local files to HDF

RE: Data loading from Datanode

2011-12-08 Thread Savant, Keshav
Hi Vikas, I think there is some misunderstanding. I have my cluster set up with Hive installed on the namenode, and I can insert data into HDFS using Hive. My question is: can I install Hive on any of the datanodes (instead of the namenode) and load data from there on the datanode directl

Re: Hive query taking too much time

2011-12-08 Thread Aniket Mokashi
You can also take a look at: https://issues.apache.org/jira/browse/HIVE-74 On Wed, Dec 7, 2011 at 9:05 PM, Savant, Keshav < keshav.c.sav...@fisglobal.com> wrote: > You are right Wojciech Langiewicz, we did the same thing and posted my > result yesterday. Now we are planning to do this using a sh