hi Aggarwal, I am using the newest version (CDH3 Update1 Hive 0.7), after
submitting several jobs using hive, the submit becomes very slow (about 2-5
minutes), following is some error information from hive.log (seems the
metastore has some problem, I upgrade the metastore from 0.5 to 0.6 and then
f
Hi Daniel. The Hive query uses a reduce step to group by retailer_key and
calculate count(*). The "copy" step is a copy of data from the mapper to the
reducer.
Kai Ju
2011/8/10 Daniel,Wu
> I run a single query like
>
> select retailer_key,count(*) from records group by retailer_key;
>
> it uses
Hi
Hive queries are parsed into hadoop map reduce jobs. In map reduce jobs,
between map and reduce tasks there are two phases, copy-phase and sort-phase
together known as sort and shuffle phase. So the copy task indicated in hive
job here should be the copy phase of map reduce. It does the co
How much time is the query startup taking?
In earlier versions of Hive (before HIVE 2299) the query startup process had an
algorithm which took O(n^2) operations in number of partitions.
This means 100M operations before it would submit the map reduce job.
From: air [mailto:cnwe...@gmail.com]
Se
There is a lot of overhead in the hadoop distributed computing
infrastructure
If you are not designing for data that is much bigger then this example, you
might consider other alternatives
Guy
2011/8/10 Daniel,Wu
>
> Anyone know why hive has such a high latency? scan a table with
> 16,522,43
I run a single query like
select retailer_key,count(*) from records group by retailer_key;
it uses a single map as shown below, since the file is already on HDFS, so I
think hadoop/hive doesn't need to copy anything.
Kind% CompleteNum TasksPendingRunningCompleteKilledFailed/Killed
Task Attempt
there is only 10186 partitions in the metadata store (select count(1) from
PARTITIONS; in mysql), I think it is not the problem.
2011/8/10 Aggarwal, Vaibhav
> Do you have a lot of partitions in your table?
>
> Time taken to process the partitions before submitting the job is
> proportional t
Anyone know why hive has such a high latency? scan a table with 16,522,439
rows take more than 85 seconds. To read these data off disk, we only need about
10 seconds (even not consider the caching which read data from memory). So
where does 75 seconds go to? will Deserialize & Serialize t
Hi, all
In order to use hive, Hadoop conf "fs.default.name" must be setted
hostname:port otherwise Hive will throw Wrong FS exception. In my opinion,
ip and hostname is equivalent. Is there something wrong in my Hive conf?
*Any Help Will* *be greatly appreciated.*--
Thanks,
Jander
Hi,
I know I have asked this question before but the answers didn't quite satisfy
my requirements. Our problem stems from the fact that there is not built-in UDF
equivalent to yearweek of Mysql. This gives the year + week concatenated, with
the week starting on Sunday.
So we wrote our UDF but no
10 matches
Mail list logo