This should be expected. Compressed text files are not splittable, so
CombineHiveInputFormat cannot read multiple files per mapper.
CombineHiveInputFormat is used when hive.merge.maponly=true. If you set it to
false, we'll use HiveInputFormat and that should be able to merge compressed
tex
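To illustrate the "not splittable" point, here is a minimal Python sketch, not Hive code. The sample data and the helper name can_decode_from are made up for the illustration. It only shows that a gzip stream cannot be decoded from an arbitrary byte offset, which is why a reader cannot treat a piece of a .gz file as an independent split.

```python
import gzip
import zlib

# Hypothetical sample data standing in for a text log file.
lines = "".join(f"row {i}\n" for i in range(10000)).encode("utf-8")
compressed = gzip.compress(lines)


def can_decode_from(data: bytes, offset: int) -> bool:
    """Return True if gzip decoding succeeds when started at `offset`."""
    try:
        gzip.decompress(data[offset:])
        return True
    except (OSError, EOFError, zlib.error):
        # No gzip header at this offset, or the stream is corrupt from here.
        return False


midpoint = len(compressed) // 2
print("decode from start:   ", can_decode_from(compressed, 0))
print("decode from midpoint:", can_decode_from(compressed, midpoint))
```

Plain text has no such restriction: a reader can start at any offset and skip ahead to the next newline.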
I found another criterion that determines whether or not the merge job
runs with compression turned on. It seems that if the target table is
stored as an RCFile, merges work, but if it is stored as a text file,
merges fail. For instance:
-- merge will work here:
create table alogs_dbg_sample3 (server_host
That makes sense. CombineHiveInputFormat does not work with compressed text files
(suffix *.gz) since they are not splittable. I think your default
hive.input.format=CombineHiveInputFormat. But I think that by setting
hive.merge.maponly it should work (meaning the merge should succeed). By
setting hive.m
I don't think this could be the cause.
The problem is more likely that your files cannot be merged, i.e. the file
size is bigger than the split size
On Friday, November 19, 2010, Leo Alekseyev wrote:
Folks, thanks for your help. I've narrowed the problem down to
compression. When I set hive.exec.compress.output=false, merges
proceed as expected. When compression is on, the merge job doesn't
seem to actually merge, it just spits out the input.
On Fri, Nov 19, 2010 at 10:51 AM, yongqiang he
These are the parameters that control the behavior. (Try to set them
to different values if it does not work in your environment.)
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapred.min.split.size.per.node=10;
set mapred.min.split.size.per.rack=10
I'm using Hadoop 0.20.2. Merge jobs (with static partitions) have
worked for me in the past. Again, what's strange here is with the
latest Hive build the merge stage appears to run, but it doesn't
actually merge -- it's a quick map-only job that, near as I can tell,
doesn't do anything.
On Fri,
What version of Hadoop are you on?
On Thu, Nov 18, 2010 at 10:48 PM, Leo Alekseyev wrote:
I thought I was running Hive with those changes merged in, but to make
sure, I built the latest trunk version. The behavior changed somewhat
(as in, it runs 2 stages instead of 1), but it still generates the
same number of files (# of files generated is equal to the number of
the original mappers,
I see. If you are using dynamic partitions, HIVE-1307 and HIVE-1622 need to be
there for merging to take place. HIVE-1307 was committed to trunk on 08/25 and
HIVE-1622 was committed on 09/13. The simplest way is to update your Hive trunk
and rerun the query. If it still doesn't work maybe you ca
Leo:
You may find this helpful:
http://indoos.wordpress.com/2010/06/24/hive-remote-debugging/
On Thu, Nov 18, 2010 at 2:57 PM, Leo Alekseyev wrote:
Hi Ning,
For the dataset I'm experimenting with, the total size of the output
is 2 MB, and the files are at most a few KB in size. My
hive.input.format was set to default HiveInputFormat; however, when I
set it to CombineHiveInputFormat, it only made the first stage of the
job use fewer mappers. T
The settings look good. The parameter hive.merge.size.smallfiles.avgsize is
used to determine at run time if a merge should be triggered: if the average
size of the files in the partition is SMALLER than the parameter and there is
more than one file, the merge should be scheduled. Can you try to
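The trigger rule described above can be sketched in a few lines of Python. This is only a restatement of the rule as worded in this message, not Hive's actual implementation. The function name should_schedule_merge and the threshold value are assumptions chosen for illustration.

```python
from typing import Sequence


def should_schedule_merge(file_sizes: Sequence[int], avg_size_threshold: int) -> bool:
    """Apply the rule as described: merge when there is more than one file
    and the average file size is smaller than the threshold."""
    if len(file_sizes) <= 1:
        return False
    average = sum(file_sizes) / len(file_sizes)
    return average < avg_size_threshold


# Roughly the case from this thread: about 3000 files of about 1 KB each.
small_files = [1024] * 3000
# Assumed threshold in bytes; use whatever your configuration actually sets.
threshold = 16_000_000
print(should_schedule_merge(small_files, threshold))
```

Under this rule a partition with thousands of 1 KB files qualifies for a merge with any threshold above 1 KB, so if no merge happens the cause is probably elsewhere.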
I have jobs that sample (or generate) a small amount of data from a
large table. At the end, I get e.g. about 3000 or more files of 1 KB
or so. This becomes a nuisance. How can I make Hive do another pass
to merge the output? I have the following settings:
hive.merge.mapfiles=true
hive.merge.ma