Re: Wikipedia Dump Analysis..

2013-10-07 Thread Ajeet S Raina
Any suggestions? On 7 Oct 2013 11:24, "Ajeet S Raina" wrote: > I was just trying to see whether some interesting analysis is possible or not. One thing that came to mind was tracking contributors. > > Is it really possible? > On 7 Oct 2013 11:13, "Ajeet S Raina" wrote:

Re: Execution failed with exit status: 3

2013-10-07 Thread Sanjay Subramanian
Hi Nick, How many partitions are there in table t1 and table t2? If there are many partitions in either t1 or t2 or both, can you modify your query as follows and see if the error comes up? SELECT T1.somecolumn, T2.someothercolumn FROM (SELECT * FROM t1 WHERE partition_column1='') T1 JOIN

Execution failed with exit status: 3

2013-10-07 Thread Martin, Nick
Hi all, I'm doing a very basic join in Hive and getting the error below. The HiveQL join syntax I'm using is: SELECT T1.somecolumn, T2.someothercolumn FROM t1 JOIN t2 ON (t1.idfield=t2.idfield) Driver returned: 3. Errors: OK Total MapReduce jobs = 1 setting HADOOP_USER_NAME someuser E

JSON format files versus AVRO

2013-10-07 Thread Sanjay Subramanian
Sorry if the subject sounds really stupid! Basically, I am re-architecting our web log record format. Currently we have a "Multiple lines = 1 Record" format (I have Hadoop jobs that parse the files and create columnar output for Hive tables): [begin_unique_id] Pipe delimited Blah..

Re: Setting evn variables through Hive

2013-10-07 Thread Edward Capriolo
Technically it is NOT possible to set ENVIRONMENT VARIABLES, even from Java. This is because System.getenv() returns a read-only map in Java. However, I DID find a hack for this: https://github.com/edwardcapriolo/hive_test/blob/master/src/main/java/com/jointhegrid/hive_test/EnvironmentHack.java We would need t

Setting evn variables through Hive

2013-10-07 Thread vivek thakre
Hello, I am using some legacy binaries for streaming in Hive. These binaries depend on libraries that are installed on all the nodes of the cluster under /user/project_name/lib. The env variable I want to set is LD_LIBRARY_PATH, something like LD_LIBRARY_PATH=/user/project_name/lib. I tried
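
One possible workaround, not confirmed anywhere in this thread, is to wrap the legacy binary in a small script that sets LD_LIBRARY_PATH for the child process only. The following is a minimal Python sketch under that assumption; the helper name run_with_library_path is hypothetical, and the child command is a stand-in for the legacy binary.

```python
import os
import subprocess
import sys


def run_with_library_path(command, library_path):
    """Run `command` with `library_path` prepended to LD_LIBRARY_PATH.

    Only the child's environment is changed; the parent's is left alone.
    """
    child_env = dict(os.environ)  # copy, so the parent environment is untouched
    existing = child_env.get("LD_LIBRARY_PATH")
    # Prepend rather than overwrite, so libraries already on the path still resolve.
    child_env["LD_LIBRARY_PATH"] = (
        library_path if not existing else library_path + os.pathsep + existing
    )
    result = subprocess.run(
        command, env=child_env, capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in for the legacy binary: a child process that echoes the variable.
    echo_cmd = [sys.executable, "-c",
                "import os; print(os.environ['LD_LIBRARY_PATH'])"]
    print(run_with_library_path(echo_cmd, "/user/project_name/lib"))
```

Whether such a wrapper can be shipped to and invoked by a Hive streaming job depends on the cluster setup, which the thread does not describe.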

Re: Is there any API that tells me what files comprise a hive table?

2013-10-07 Thread demian rosas
Nitin, This sounds good. I will write to hdfs dev guys to try to get the information I need. It is good to know that the information is there and that my problem is "just" to figure out how to make sense of it. Thanks a lot !!! On 7 October 2013 12:04, Nitin Pawar wrote: > You may want to rea

Re: Is there any API that tells me what files comprise a hive table?

2013-10-07 Thread demian rosas
Hi Sanjay, Thanks a lot for this. I am precisely looking at the available information in the hive metastore db. This is going to be of great help. Cheers. On 7 October 2013 12:04, Sanjay Subramanian < sanjay.subraman...@wizecommerce.com> wrote: > Perhaps a good thing to have in your Hive che

Re: Is there any API that tells me what files comprise a hive table?

2013-10-07 Thread Sanjay Subramanian
Perhaps a good thing to have in your Hive cheat sheet :-) I use the following MySQL query to find out the locations of the Hive table: echo "select t.TBL_NAME, p.PART_NAME, s.LOCATION from PARTITIONS p, SDS s, TBLS t where t.TBL_ID=p.TBL_ID and p.SD_ID=s.SD_ID "| mysql -u -p -A | grep "" Th
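
To see what this join returns, here is a minimal Python sketch that runs the same three-table query against an in-memory SQLite stand-in for the metastore. The tables contain only the columns named in the query, the sample rows and the table name "weblogs" are invented for illustration, and the WHERE filter on TBL_NAME replaces the grep step as an assumption.

```python
import sqlite3

# Same join as the MySQL one-liner, written with explicit JOINs and a
# parameterized table-name filter in place of the trailing grep.
QUERY = """
SELECT t.TBL_NAME, p.PART_NAME, s.LOCATION
FROM PARTITIONS p
JOIN SDS s ON p.SD_ID = s.SD_ID
JOIN TBLS t ON t.TBL_ID = p.TBL_ID
WHERE t.TBL_NAME = ?
ORDER BY p.PART_NAME
"""


def build_toy_metastore():
    """Create a tiny in-memory imitation of the metastore tables (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, TBL_NAME TEXT);
        CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, LOCATION TEXT);
        CREATE TABLE PARTITIONS (
            PART_ID INTEGER PRIMARY KEY, TBL_ID INTEGER,
            SD_ID INTEGER, PART_NAME TEXT);
        INSERT INTO TBLS VALUES (1, 'weblogs'), (2, 'other');
        INSERT INTO SDS VALUES
            (10, 'hdfs://namenode/warehouse/weblogs/dt=2013-10-06'),
            (11, 'hdfs://namenode/warehouse/weblogs/dt=2013-10-07'),
            (12, 'hdfs://namenode/warehouse/other/dt=2013-10-07');
        INSERT INTO PARTITIONS VALUES
            (100, 1, 10, 'dt=2013-10-06'),
            (101, 1, 11, 'dt=2013-10-07'),
            (102, 2, 12, 'dt=2013-10-07');
    """)
    return conn


def partition_locations(conn, table_name):
    """Return (table, partition, location) rows for one table."""
    return conn.execute(QUERY, (table_name,)).fetchall()


if __name__ == "__main__":
    for row in partition_locations(build_toy_metastore(), "weblogs"):
        print(row)
```

Against a real metastore the same SELECT would be sent through a MySQL client instead of sqlite3.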

Re: Is there any API that tells me what files comprise a hive table?

2013-10-07 Thread Nitin Pawar
You may want to reach out to hdfs dev for the format of the editlog. There is a lot of information there and I am not sure how accurate I am. In one of my previous jobs, we converted the daily editlog to a partitioned hive table and did exactly what you want to do. Sadly we could not opensource t

Re: Is there any API that tells me what files comprise a hive table?

2013-10-07 Thread demian rosas
Edward, Thanks a lot for this info !!! This gives me a clearer picture of the problem and how I can approach it. Cheers. On 7 October 2013 11:52, Edward Capriolo wrote: > Not a direct API. > > What I do is this. From java/thrift: > Table t = client.getTable("name_of_table"); > Path p = new P

Re: Is there any API that tells me what files comprise a hive table?

2013-10-07 Thread Edward Capriolo
Not a direct API. What I do is this. From java/thrift: Table t = client.getTable("name_of_table"); Path p = new Path(t.getSd().getLocation()); FileSystem fs = FileSystem.get(conf); FileStatus[] f = fs.listStatus(p); // your logic here. On Mon, Oct 7, 2013 at 2:01 PM, demian rosas wrote: > Hi all, > > I

Re: Is there any API that tells me what files comprise a hive table?

2013-10-07 Thread demian rosas
Nitin, Thanks a lot for these answers. Then based on this: I should be able to call the dfs command and hdfs APIs from a java application, is this correct? (I bet this sounds naive) I know about the edit log file. I have to do more investigation on this but, are you aware of any sort of stand

Re: Is there any API that tells me what files comprise a hive table?

2013-10-07 Thread Nitin Pawar
Answers as per my understanding, and I may be wrong, so wait for others to correct me as well. 1. What files in hdfs constitute a hive table. Unless you specifically do an alter table command and map it to a single file, all the files inside a directory where the table is created are mapped to the tabl
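
Since the table's files are simply the contents of its directory, the path, size and modification time asked about can be read from a directory listing. The following is a minimal Python sketch that parses text in the usual eight-column layout of `hdfs dfs -ls`; the names HdfsEntry and parse_dfs_ls are hypothetical, the sample listing is invented, and the listing shows modification time rather than creation time.

```python
from datetime import datetime
from typing import List, NamedTuple


class HdfsEntry(NamedTuple):
    path: str
    size_bytes: int
    modified: datetime
    is_dir: bool


def parse_dfs_ls(listing: str) -> List[HdfsEntry]:
    """Parse `hdfs dfs -ls` style text into entries.

    Assumed columns: permissions, replication, owner, group, size,
    date (YYYY-MM-DD), time (HH:MM), path.
    """
    entries = []
    for line in listing.splitlines():
        # maxsplit=7 keeps a path containing spaces in one field.
        fields = line.split(None, 7)
        if len(fields) != 8:
            continue  # skips headers such as "Found 2 items" and blank lines
        perms, _replication, _owner, _group, size, date, time, path = fields
        modified = datetime.strptime(date + " " + time, "%Y-%m-%d %H:%M")
        entries.append(HdfsEntry(path, int(size), modified, perms.startswith("d")))
    return entries


if __name__ == "__main__":
    # Invented sample listing, for illustration only.
    sample = (
        "Found 2 items\n"
        "-rw-r--r--   3 hive supergroup       1234 2013-10-07 12:04 /user/hive/warehouse/t1/000000_0\n"
        "drwxr-xr-x   - hive supergroup          0 2013-10-06 09:30 /user/hive/warehouse/t1/dt=2013-10-06\n"
    )
    for entry in parse_dfs_ls(sample):
        print(entry)
```

In practice the listing text would come from running the dfs command against the table's location rather than from a hard-coded string.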

Re: Hive Connection Pooling

2013-10-07 Thread Nitin Pawar
It's a setting available in Apache Hive itself, so I suppose they would not take it away. If you want to confirm, wait for someone from Cloudera to answer, or if you are in a hurry, they have active support forums. On Mon, Oct 7, 2013 at 11:18 PM, S R wrote: > Is this something available out of the

Is there any API that tells me what files comprise a hive table?

2013-10-07 Thread demian rosas
Hi all, I want to track the changes made to the files of a Hive table. I wonder whether there is any API that I can use to find out the following: 1. What files in hdfs constitute a hive table. 2. What is the size of each of these files. 3. The time stamp of the creation/last update to each of

Re: Hive Connection Pooling

2013-10-07 Thread S R
Is this something available out of the box from Cloudera Hive? On Mon, Oct 7, 2013 at 12:02 AM, Sonal Goyal wrote: > Yes, the Hive MetaStore does support JDBC connection pooling to the > underlying metastore database. You can configure this in hive-site.xml > > > datanucleus.connectionPooli

Re: How to load /t /n file to Hive

2013-10-07 Thread Raj Hadoop
Yes, I have it. Thanks, Raj From: Sonal Goyal To: "user@hive.apache.org" ; Raj Hadoop Sent: Monday, October 7, 2013 1:38 AM Subject: Re: How to load /t /n file to Hive Do you have the option to escape your tabs and newlines in your base file?  Best Re

Re: Need help with Installation hive 0.11

2013-10-07 Thread Nitin Pawar
So here is what I did quickly. (I did not follow the step by step guide from hive wiki) 1) I already have a running hadoop cluster 2) Downloaded the bin-tar.gz from hive.apache.org links 3) Uncompressed the binaries 4) Inside the conf folder there are a few .template files (I don't know there is no h