Hi,
This is Kumar, and this is my first question in this group.
I have a requirement to implement Teradata's ROW_NUMBER() in Hive, where
partitioning happens on multiple columns along with ordering on multiple
columns. It can be easily implemented in Hadoop MR, but I have to do it in Hive.
Hi Li
The major consideration you should give is the size of each bucket. One
bucket corresponds to a file in HDFS, and you should ensure that every bucket
is at least a block in size, or in the worst case that at least the majority
of the buckets are.
So you should derive the number of buckets from the data size: roughly, the
total table size divided by the HDFS block size.
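As a rough sketch of that sizing rule (the table name, row size, and block size below are assumptions, not figures from the thread):

```sql
-- Hypothetical sizing: ~110M rows at ~200 bytes/row is roughly 22 GB.
-- With a 128 MB HDFS block size, 22 GB / 128 MB ~= 176, so something in
-- the 128-256 range keeps each bucket near a full block.
CREATE TABLE user_activity (
  userid STRING,
  amount DOUBLE
)
CLUSTERED BY (userid) INTO 176 BUCKETS;

-- On older Hive versions, enable enforced bucketing before inserting,
-- so the insert actually produces one file per bucket:
SET hive.enforce.bucketing = true;
```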
Hi guys,
I plan to bucket a table by "userid" as I'm going to do intense calculation
using "group by userid". There are about 110 million rows, with 7 million
unique userids, so my question is: what is a good number of buckets for this
scenario, and how do I determine the number of buckets?
Any input is appreciated.
Hi Mark,
We mostly do insert overwrite into a local directory, and at that location
multiple files with the output of that query are created; we use these files
for our analysis. So we want these files to be tab-separated.
Limiting the number of records means limiting the length of a file, not
limiting
Chunky,
There may be another way to do this, but to get tab-separated output, I
usually create an external table that's tab-separated and insert
overwrite into that table.
For limiting the number of records in the output, you can use the
limit clause in your query.
Mark
On Tue, Feb 19, 2013 at 10
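The approach Mark describes might look like the following; the table name, columns, and path are hypothetical:

```sql
-- 1. Create a tab-separated external table over the output directory:
CREATE EXTERNAL TABLE results_tsv (
  userid STRING,
  score  DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/tmp/results_tsv';

-- 2. Insert the query output, capping the record count with LIMIT:
INSERT OVERWRITE TABLE results_tsv
SELECT userid, score
FROM some_source_table
LIMIT 1000;
```

The files written under `/tmp/results_tsv` will then be tab-delimited text, ready to copy out for analysis.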
hi folks,
I have a quick question: suppose a table is created in Hive by one user, and
after some time I want to change the owner of the table.
1- How can I change the owner of the table?
2- Do I also need to change the directory owner?
3- Or what is the feasible way to do that?
thanks
hadoophive