ROW_NUMBER() equivalent in Hive

2013-02-20 Thread kumar mr
Hi, This is Kumar, and this is my first question in this group. I have a requirement to implement Teradata's ROW_NUMBER() in Hive, where partitioning happens on multiple columns along with ordering on multiple columns. It can be easily implemented in Hadoop MR, but I have to do it in Hive. By doin
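To pin down what is being asked, here is a minimal Python sketch of the semantics of ROW_NUMBER() OVER (PARTITION BY several columns ORDER BY several columns). It is not Hive code. It mirrors one workaround sometimes used before Hive had windowing functions: distribute and sort the rows by the partition and order columns, then number each row with a counter that restarts whenever the partition key changes. The function name `row_number` and the row-as-dict representation are hypothetical choices for illustration.

```python
from itertools import groupby
from operator import itemgetter


def row_number(rows, partition_cols, order_cols):
    """Emulate ROW_NUMBER() OVER (PARTITION BY partition_cols ORDER BY order_cols).

    rows: list of dicts (hypothetical representation of table rows).
    order_cols: list of (column, descending) pairs, most significant first.
    Returns a new list of dicts, each with an added 'row_number' key.
    """
    ordered = list(rows)
    # Python's sort is stable (also with reverse=True), so sorting by the
    # least significant key first and the most significant key last yields
    # a correct multi-column ordering with mixed ASC/DESC directions.
    for col, descending in reversed(order_cols):
        ordered.sort(key=itemgetter(col), reverse=descending)

    def part_key(r):
        return tuple(r[c] for c in partition_cols)

    # Sorting by the partition key last groups each partition together
    # while keeping the ORDER BY order inside it.
    ordered.sort(key=part_key)

    result = []
    for _, group in groupby(ordered, key=part_key):
        # The counter restarts at 1 for every partition.
        for n, row in enumerate(group, start=1):
            result.append({**row, "row_number": n})
    return result
```

For example, `row_number(rows, ["dept", "region"], [("sal", True), ("id", False)])` numbers rows within each (dept, region) pair by descending sal, breaking ties by ascending id.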

Re: bucketing on a column with millions of unique IDs

2013-02-20 Thread bejoy_ks
Hi Li, The major consideration you should give is the size of each bucket. One bucket corresponds to one file in HDFS, and you should ensure that every bucket is at least one block in size or, in the worst case, that at least the majority of the buckets are. So based on the data size you should derive on
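The sizing rule in this reply can be turned into simple arithmetic: if each bucket is one file and should be at least one HDFS block, the bucket count should not exceed the total data size divided by the block size. The sketch below assumes rows hash evenly across buckets (Hive assigns a row to a bucket by hashing the bucketing column modulo the bucket count) and uses a 64 MB block size as an assumed default; substitute your cluster's actual block size. The function name `max_buckets` is hypothetical.

```python
def max_buckets(total_bytes: int, block_bytes: int = 64 * 1024 * 1024) -> int:
    """Largest bucket count for which each bucket file is at least one block.

    Assumes data is spread evenly across buckets and that block_bytes
    (64 MB here, an assumed default) matches the cluster's HDFS block size.
    """
    if total_bytes <= 0 or block_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer division rounds down so no bucket falls below one block;
    # tiny tables still get at least one bucket.
    return max(1, total_bytes // block_bytes)
```

As a purely hypothetical illustration, a 20 GB table with 64 MB blocks gives `max_buckets(20 * 1024**3) == 320`; with 7 million distinct userids, that would be about 21,875 userids per bucket on average.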

bucketing on a column with millions of unique IDs

2013-02-20 Thread Echo Li
Hi guys, I plan to bucket a table by "userid" as I'm going to do intense calculations using "group by userid". There are about 110 million rows, with 7 million unique userids, so my question is: what is a good number of buckets for this scenario, and how do I determine the number of buckets? Any input is

Re: Need tab separated output file and put limit on number of lines in an output file

2013-02-20 Thread Chunky Gupta
Hi Mark, We mostly do insert overwrite into a local directory; multiple files with the output of that query are created at that location, and we use these files for our analysis. So we want these files to be tab separated. Limiting the number of records means limiting the length of a file, not limiting
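One way to meet both requirements outside Hive is to post-process the files written by INSERT OVERWRITE LOCAL DIRECTORY. The Python sketch below assumes the input uses Hive's default Ctrl-A (\x01) field delimiter and that field values contain no tabs or newlines; it rewrites the rows as tab-separated text and starts a new file every `max_lines` lines. The function name `split_to_tsv` and the `part-NNNNN.tsv` naming scheme are hypothetical.

```python
import os


def split_to_tsv(input_paths, out_dir, max_lines, in_delim="\x01"):
    """Rewrite Hive text output as tab-separated files of at most max_lines lines.

    Assumes Hive's default Ctrl-A field delimiter in the input and no tabs
    or newlines inside field values. Returns the output file paths written.
    """
    if max_lines < 1:
        raise ValueError("max_lines must be >= 1")
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    count = 0
    try:
        for path in input_paths:
            with open(path, encoding="utf-8") as src:
                for line in src:
                    # Roll over to a new output file when the current one is full.
                    if out is None or count == max_lines:
                        if out is not None:
                            out.close()
                        name = os.path.join(out_dir, f"part-{len(written):05d}.tsv")
                        out = open(name, "w", encoding="utf-8")
                        written.append(name)
                        count = 0
                    # Swap the input delimiter for tabs.
                    fields = line.rstrip("\n").split(in_delim)
                    out.write("\t".join(fields) + "\n")
                    count += 1
    finally:
        if out is not None:
            out.close()
    return written
```

Splitting by line count after the query, rather than with LIMIT, keeps every result row while capping the length of each file.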

Re: Need tab separated output file and put limit on number of lines in an output file

2013-02-20 Thread Mark Grover
Chunky, There may be another way to do this, but to get tab-separated output, I usually create an external table that's tab separated and insert overwrite into that table. For limiting the number of records in the output, you can use the LIMIT clause in your query. Mark On Tue, Feb 19, 2013 at 10

owner of hive table

2013-02-20 Thread hadoop hive
hi folks, I have a quick question: suppose I create a table in Hive as one user, and after some time I want to change the owner of the table. 1- How can I change the owner of the table? 2- Do I also need to change the directory owner? 3- Or what is the feasible way to do that? thanks hadoophive