Re: Help with Join involving Non-Equality condition

2012-08-16 Thread Bertrand Dechoux
What is the data volume? And what is the meaning of those data? From what I can see, you have a 'pack' per day. If that's true, a map join could be used, because you should not have that many pack creations (but I am not sure how to enforce that), so filtering could happen right after. You woul
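Bertrand's map-join idea can be sketched roughly as follows; this is an illustrative guess only, since the thread is truncated, and every table and column name here is a placeholder, not taken from the original question. In Hive of that era the small side is broadcast with a MAPJOIN hint:

```sql
-- Hypothetical sketch: broadcast the small "packs" table to every mapper
-- via the MAPJOIN hint, join on the equality part of the condition, and
-- apply the non-equality part as a post-join filter.
SELECT /*+ MAPJOIN(p) */ t.id, p.packid
FROM big_table t
JOIN packs p
  ON (t.id = p.id)
WHERE t.eventtime < p.packcreatetime;  -- inequality applied as a filter
```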

Re: Help with Join involving Non-Equality condition

2012-08-16 Thread Navis류승우
If you don't specify a join condition, Hive performs a cross join. What was added in Hive 0.10.0 is just clarifying grammar. 2012/8/17 Himanish Kushary > We are on Hive 0.8; I think cross join is available only since 0.10.0 > > Do we have any other options? > > On Thu, Aug 16, 2012 at 2:28 PM, A

Re: Help with Join involving Non-Equality condition

2012-08-16 Thread Himanish Kushary
We are on Hive 0.8; I think cross join is available only since 0.10.0. Do we have any other options? On Thu, Aug 16, 2012 at 2:28 PM, Ablimit Aji wrote: > You can do a CROSS JOIN, then filter with the original inequality join > condition. > This would generate a lot of redundant tuples and may

Re: Help with Join involving Non-Equality condition

2012-08-16 Thread Ablimit Aji
You can do a CROSS JOIN, then filter with the original inequality join condition. This would generate a lot of redundant tuples and may not work if you have large amounts of data. On Thu, Aug 16, 2012 at 2:07 PM, Himanish Kushary wrote: > Hi, > > We have two tables in the following structure : >
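A minimal sketch of Ablimit's suggestion; the real table and column names are not visible in the truncated thread, so these are placeholders. On pre-0.10 Hive (where the CROSS JOIN keyword is unavailable), a join written with no ON clause behaves as a cross join, as Navis notes elsewhere in this thread:

```sql
-- Join with no ON clause: pre-0.10 Hive treats this as a cross join,
-- pairing every row of a with every row of b. The original equality and
-- inequality conditions are then applied as a filter in WHERE.
SELECT a.id, b.packid
FROM table1 a
JOIN table2 b
WHERE a.id = b.id
  AND a.eventtime < b.packcreatetime;   -- the non-equality condition
```

This is simple but expensive: the intermediate result is |a| × |b| rows before filtering, which is why it may not scale to large inputs.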

Help with Join involving Non-Equality condition

2012-08-16 Thread Himanish Kushary
Hi, We have two tables in the following structure:

Table1:

| id  | packcreatetime      | packid |
|-----|---------------------|--------|
| 505 | 2012-07-16 11:51:12 | 111024 |
| 505 | 2012-07-18 11:52:13 | 111025 |
| 505

RE: Aggregate Multiple Columns

2012-08-16 Thread richin.jain
Thanks Jan, I was looking for the first one, summing the values from two columns into one number. I did it as sum(col1) + sum(col2), but your solution is more elegant ☺ Regards, Richin From: ext Jan Dolinár [mailto:dolik@gmail.com] Sent: Thursday, August 16, 2012 12:07 PM To: user@hive.apac

Re: Aggregate Multiple Columns

2012-08-16 Thread Jan Dolinár
Hi Richin, Do you mean summing the values from two columns into one number, or calculating the sum of each column as two separate sums in one query? Both are possible: the first can be done simply as SUM(col1 + col2), the second can be accomplished with two sums: SUM(col1), SUM(col2). Does that answer your q
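Both forms Jan describes can be sketched as follows (the table name `t` and columns `col1`/`col2` are placeholders):

```sql
-- One combined total: sum of the per-row sums.
-- Note: a row where either column is NULL contributes nothing here,
-- so with NULLs present this can differ from SUM(col1) + SUM(col2).
SELECT SUM(col1 + col2) AS total FROM t;

-- Two separate totals in one query; NULLs are ignored per column.
SELECT SUM(col1) AS total1, SUM(col2) AS total2 FROM t;
```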

Problem when copying data from local drive to HDFS and creating external table.

2012-08-16 Thread Manish
Hello All, I am copying data from my local drive to HDFS and creating an external table in Hive. But for some reason the data is not copied, and the create table script gives an error "file/folder doesn't exists". Note: there is also a "log4j" error; I don't know the exact reason for it. In

RE: Converting rows into dynamic columns in Hive

2012-08-16 Thread richin.jain
You could do it using a Pivot table in MS Excel. It's under the Insert tab, first option on the left. Richin -Original Message- From: Jain Richin (Nokia-LC/Boston) Sent: Thursday, August 09, 2012 4:16 PM To: user@hive.apache.org Subject: RE: Converting rows into dynamic columns in Hive Th
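For a fixed, known set of values, the same row-to-column pivot can also be done in Hive itself with conditional aggregation, a common workaround that is not mentioned in the truncated reply; all names and values here are illustrative assumptions:

```sql
-- Hypothetical sketch: turn rows keyed by (id, metric) into one column
-- per metric value. CASE without ELSE yields NULL, which MAX ignores.
SELECT id,
       MAX(CASE WHEN metric = 'clicks' THEN value END) AS clicks,
       MAX(CASE WHEN metric = 'views'  THEN value END) AS views
FROM metrics
GROUP BY id;
```

The limitation, and the reason "dynamic" columns are hard in Hive, is that the set of output columns must be known when the query is written.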

Aggregate Multiple Columns

2012-08-16 Thread richin.jain
Hello, Is there a way to aggregate multiple columns in Hive? I can do it in two separate queries but is there something similar to sum(col1,col2)? Thanks, Richin

Re: Hive directory permissions

2012-08-16 Thread John Meagher
Creating the /user/hive/warehouse folder is a one-time setup step that can be done as the hdfs user. With g+w permissions any user can then create and read the tables. On Thu, Aug 16, 2012 at 9:57 AM, Connell, Chuck wrote: > I have no doubt that works, but surely a Hive user should not need su

RE: Hive directory permissions

2012-08-16 Thread Connell, Chuck
I have no doubt that works, but surely a Hive user should not need sudo privileges! I am also looking for best practices, since we have run into the same issue. From: Himanish Kushary [mailto:himan...@gmail.com] Sent: Thursday, August 16, 2012 9:51 AM To: user@hive.apache.org Subject: Re: Hive direc

Re: Hive directory permissions

2012-08-16 Thread Himanish Kushary
We usually start the shell through sudo, otherwise we get a "Permission denied" while creating Hive tables. But this is a good point; any suggestions/best practices from the user community? Thanks On Thu, Aug 16, 2012 at 9:37 AM, Connell, Chuck wrote: > I have run into similar problems. Thanks fo

RE: Hive directory permissions

2012-08-16 Thread Connell, Chuck
I have run into similar problems. Thanks for the suggestions. One concern... Isn't hdfs a highly privileged user within the Hadoop cluster? So do we really want it to be standard practice for all Hive users to su to hdfs? Chuck Connell Nuance R&D Data Team Burlington, MA From: Himanish Kushary

Re: Reducer throwing warning during join operations.Defaulting int columns to 0

2012-08-16 Thread Himanish Kushary
Hi, To address this issue, for now I have changed all the fields in the external tables to the STRING datatype. The joins on the external tables are working fine now. I will try to change the datatypes while transforming to a Hive managed table and re-execute the joins on the new tables. Any other suggesti
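The transform-time datatype change Himanish mentions could look roughly like this; the table and column names are assumptions, since the actual schema is not shown in the thread:

```sql
-- Hypothetical sketch: materialize the all-STRING external table into a
-- managed table, casting back to the intended types. Values that fail to
-- parse become NULL rather than raising an error.
CREATE TABLE managed_events AS
SELECT CAST(id AS INT)        AS id,
       CAST(amount AS DOUBLE) AS amount,
       event_date
FROM ext_events;
```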

Re: Hive directory permissions

2012-08-16 Thread Himanish Kushary
Hi Sean, From the Hive language manual - "Moreover, we strongly advise users to create the HDFS directories /tmp and /user/hive/warehouse (aka hive.metastore.warehouse.dir) and set them chmod g+w before tables are created in Hive". My warehouse directory has the following permissions: *Name* *Ty

RE: UNION ALL - what is the simplest form

2012-08-16 Thread Balaraman, Anand
Thanks for your suggestion Bejoy. I am using Hive 0.7.1, so I can't use your first solution... The second one is a good idea, but I get a large chunk of files in staging, which clutters my HDFS. Each file sizes from 40 KB to a max of 4.4 MB, though my block size is 64 MB. This is one of the reaso

Re: UNION ALL - what is the simplest form

2012-08-16 Thread Bejoy KS
Hi Anand You don't necessarily need to go in for UNION ALL for your requirement. Use INSERT INTO instead, which has less overhead. It is supported from Hive 0.8. INSERT INTO main_table SELECT * FROM stage_table; Or an even better approach if you are just copying whole data from one table to