RE: mapper is slower than hive' mapper

Connell, Chuck Wed, 01 Aug 2012 07:36:03 -0700

This is actually not surprising. Hive is essentially a MapReduce compiler. It 
is common for regular compilers (C, C#, Fortran) to emit faster assembler code 
than you write yourself. Compilers know the tricks of their target language.
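
One trick of this kind is map-side partial aggregation: when hive.map.aggr is enabled, Hive pre-sums rows inside the mapper, so far fewer records reach the shuffle than with one context.write() per input row. The sketch below shows the idea in plain Java with no Hadoop dependency. It is not Hive's generated code. The class name PartialSumMapper, the assumption of exactly three fields per row, and the omission of escape handling are all illustrative.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java sketch of map-side partial aggregation (hypothetical class, no Hadoop types). */
class PartialSumMapper {
    // Assumption: rows use Hive's default field delimiter, Ctrl-A (\u0001).
    static final char FIELD_DELIMITER = '\u0001';

    // Partial sums keyed by the two group-by columns.
    private final Map<String, Double> partialSums = new HashMap<>();

    /** Analogue of map(): fold one row into the in-memory partial sums. */
    void map(String line) {
        int first = line.indexOf(FIELD_DELIMITER);
        int second = line.indexOf(FIELD_DELIMITER, first + 1);
        // Assumption: exactly three fields, with the summed column last.
        String groupKey = line.substring(0, second);
        double value = Double.parseDouble(line.substring(second + 1));
        partialSums.merge(groupKey, value, Double::sum);
    }

    /** Analogue of cleanup(): hand back one record per distinct key, then reset. */
    Map<String, Double> flush() {
        Map<String, Double> out = new HashMap<>(partialSums);
        partialSums.clear();
        return out;
    }
}
```

In a real Mapper the map would be filled in map() and written out in cleanup(), or earlier if it grows too large for the heap. The sketch also avoids allocating a key object and a DoubleWritable for every row, which is another common source of mapper overhead.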


Chuck Connell
Nuance R&D Data Team
Burlington, MA


-----Original Message-----
From: Yue Guan [mailto:pipeha...@gmail.com] 
Sent: Wednesday, August 01, 2012 10:29 AM
To: user@hive.apache.org
Subject: mapper is slower than hive' mapper

Hi, there

I'm writing a MapReduce job to replace a Hive query, and I find that my mapper 
is slower than Hive's mapper. The Hive query is like:

select sum(column1) from table group by column2, column3;

My MapReduce program looks like this:

     public static class HiveTableMapper
             extends Mapper<BytesWritable, Text, MyKey, DoubleWritable> {

         public void map(BytesWritable key, Text value, Context context)
                 throws IOException, InterruptedException {
             String[] sLine = StringUtils.split(value.toString(),
                     StringUtils.ESCAPE_CHAR, HIVE_FIELD_DELIMITER_CHAR);
             context.write(new MyKey(Integer.parseInt(sLine[0]), sLine[1]),
                     new DoubleWritable(Double.parseDouble(sLine[2])));
         }

     }

I assume hive is doing something similar. Is there any trick in hive to speed 
this thing up? Thank you!

Best,
