This is actually not surprising. Hive is essentially a MapReduce compiler. It is common for regular compilers (C, C#, Fortran) to emit faster assembly code than you would write by hand. Compilers know the tricks of their target language.
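One such trick is probably map-side aggregation: with the hive.map.aggr setting enabled, Hive keeps partial sums per group in a hash table inside the mapper and emits one record per group instead of one per input row. Whether that explains the gap in this particular job is a guess. Below is a minimal sketch of the idea in plain Java with no Hadoop dependencies. The class name, the method name, the Ctrl-A delimiter (standing in for HIVE_FIELD_DELIMITER_CHAR), and the three-column row layout are all assumptions for illustration, not Hive's actual code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of map-side (in-mapper) partial aggregation.
public class MapSideAggregation {

    // Assumption: fields are separated by Ctrl-A, Hive's default delimiter.
    static final char FIELD_DELIMITER = '\u0001';

    // Sums the third field per (first field, second field) group.
    // In a real mapper, this map would be filled in map() and flushed
    // to the context in cleanup(), so far fewer records get shuffled.
    static Map<String, Double> aggregate(Iterable<String> lines) {
        Map<String, Double> partialSums = new HashMap<>();
        for (String line : lines) {
            int first = line.indexOf(FIELD_DELIMITER);
            int second = line.indexOf(FIELD_DELIMITER, first + 1);
            int third = line.indexOf(FIELD_DELIMITER, second + 1);
            int valueEnd = (third < 0) ? line.length() : third;

            // The group key is the raw text of the first two fields,
            // which avoids allocating a String[] for every row.
            String groupKey = line.substring(0, second);
            double value = Double.parseDouble(line.substring(second + 1, valueEnd));
            partialSums.merge(groupKey, value, Double::sum);
        }
        return partialSums;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "1\u0001a\u00012.5",
                "1\u0001a\u00011.5",
                "2\u0001b\u00014.0");
        // Three input rows collapse into two partial sums.
        System.out.println(aggregate(rows).size() + " groups emitted");
    }
}
```

In an actual Mapper the hash table needs a size cap with periodic flushing so it cannot exhaust the heap. Reusing a single output key and DoubleWritable instance instead of calling new for every row should also reduce garbage-collection pressure.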

Chuck Connell
Nuance R&D Data Team
Burlington, MA

-----Original Message-----
From: Yue Guan [mailto:pipeha...@gmail.com]
Sent: Wednesday, August 01, 2012 10:29 AM
To: user@hive.apache.org
Subject: mapper is slower than hive' mapper

Hi, there

I'm writing a MapReduce program to replace some Hive queries, and I find that my mapper is slower than Hive's mapper. The Hive query is like:

    select sum(column1) from table group by column2, column3;

My MapReduce program looks like this:

    public static class HiveTableMapper
            extends Mapper<BytesWritable, Text, MyKey, DoubleWritable> {

        public void map(BytesWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] sLine = StringUtils.split(value.toString(),
                    StringUtils.ESCAPE_CHAR, HIVE_FIELD_DELIMITER_CHAR);
            context.write(
                    new MyKey(Integer.parseInt(sLine[0]), sLine[1]),
                    new DoubleWritable(Double.parseDouble(sLine[2])));
        }
    }

I assume Hive is doing something similar. Is there any trick in Hive to speed this up?

Thank you!

Best,