Hive doesn't use Writable?! Could you please give me a pointer to the Hive code, so I can see how it does the job?

I checked the map output records and found this:

My case:
    total mapper input records: 23091348
    total mapper output records: 23091348
    avg mapper output bytes/record: 34.819994
    total combiner output records: 27298

Hive:
    total mapper input records: 23091348
    total mapper output records: 13164
    avg mapper output bytes/record: 36.199407
    total combiner output records: 0

Does Hive actually do the reduce in the mapper? How does that work?
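In effect, yes: those counts are consistent with Hive's map-side hash aggregation (hive.map.aggr), where each mapper keeps a hash table of partial sums keyed by the GROUP BY columns and emits one record per distinct key when it flushes, leaving the combiner almost nothing to do. A self-contained plain-Java sketch of that in-mapper pattern (the class name and sample data are made up for illustration, not Hive's actual code):

```java
import java.util.HashMap;
import java.util.Map;

public class MapSideAggregation {
    // Partially aggregate rows of {column2, column3, column1} inside the
    // "mapper": keep a hash table of sums keyed by the GROUP BY columns.
    public static Map<String, Double> aggregate(String[][] records) {
        Map<String, Double> partialSums = new HashMap<>();
        for (String[] rec : records) {
            String groupKey = rec[0] + "\t" + rec[1];   // column2 + column3
            double value = Double.parseDouble(rec[2]);  // column1
            partialSums.merge(groupKey, value, Double::sum);
        }
        // In a real mapper this table is emitted in cleanup(): one output
        // record per distinct key instead of one per input row.
        return partialSums;
    }

    public static void main(String[] args) {
        String[][] records = {
            {"a", "x", "1.0"}, {"a", "x", "2.0"},
            {"b", "y", "3.0"}, {"a", "x", "4.0"},
        };
        Map<String, Double> sums = aggregate(records);
        System.out.println(sums.size());      // 2 distinct keys from 4 rows
        System.out.println(sums.get("a\tx")); // 7.0
    }
}
```

This is why Hive's mapper output count is close to the number of distinct (column2, column3) groups rather than the input row count.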



On 08/01/2012 10:41 AM, Bertrand Dechoux wrote:
One hint would be to reduce the number of writable instances you need.
Create the object once and reuse it.
By the way, Hive does not use Writable. ;)

Bertrand
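The reuse Bertrand suggests looks roughly like this: allocate the output key once per mapper and refill it for each record, instead of calling new MyKey(...) per input row. A plain-Java sketch of the pattern (MutableKey and its set(...) method stand in for the poster's MyKey, which would need an equivalent setter for this to work with Hadoop Writables):

```java
import java.util.ArrayList;
import java.util.List;

public class ReusePattern {
    // Stand-in for a mutable Writable key: refilled per record, never
    // reallocated inside the map loop.
    static final class MutableKey {
        int id;
        String name;
        void set(int id, String name) { this.id = id; this.name = name; }
        // context.write(...) serializes the key's current contents, which
        // is why reusing the instance after the write returns is safe;
        // snapshot() models that serialization here.
        String snapshot() { return id + "\t" + name; }
    }

    public static List<String> mapAll(String[][] rows) {
        MutableKey key = new MutableKey();  // created once, reused per row
        List<String> written = new ArrayList<>();
        for (String[] row : rows) {
            key.set(Integer.parseInt(row[0]), row[1]);
            written.add(key.snapshot());    // stands in for context.write
        }
        return written;
    }

    public static void main(String[] args) {
        System.out.println(mapAll(new String[][]{{"1", "a"}, {"2", "b"}}));
    }
}
```

The saving is allocation and GC pressure: millions of short-lived key/value objects per map task become two long-lived ones.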

On Wed, Aug 1, 2012 at 4:35 PM, Connell, Chuck <chuck.conn...@nuance.com> wrote:

    This is actually not surprising. Hive is essentially a MapReduce
    compiler. It is common for regular compilers (C, C#, Fortran) to
    emit faster assembler code than you write yourself. Compilers know
    the tricks of their target language.

    Chuck Connell
    Nuance R&D Data Team
    Burlington, MA


    -----Original Message-----
    From: Yue Guan [mailto:pipeha...@gmail.com]
    Sent: Wednesday, August 01, 2012 10:29 AM
    To: user@hive.apache.org
    Subject: mapper is slower than Hive's mapper

    Hi, there

    I'm writing a MapReduce job to replace some Hive queries, and I
    find that my mapper is slower than Hive's mapper. The Hive query
    is like:

    select sum(column1) from table group by column2, column3;

    My MapReduce program looks like this:

        public static class HiveTableMapper
                extends Mapper<BytesWritable, Text, MyKey, DoubleWritable> {

            @Override
            public void map(BytesWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] sLine = StringUtils.split(value.toString(),
                        StringUtils.ESCAPE_CHAR, HIVE_FIELD_DELIMITER_CHAR);
                context.write(new MyKey(Integer.parseInt(sLine[0]), sLine[1]),
                        new DoubleWritable(Double.parseDouble(sLine[2])));
            }
        }

    I assume Hive is doing something similar. Is there any trick in
    Hive to speed this up? Thank you!
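The main trick, as the rest of the thread suggests, is Hive's map-side hash aggregation, which is controlled per session by settings like these (the default values shown are my assumption; check them against your Hive version):

```sql
-- Enable hash-based aggregation inside the mapper (on by default).
SET hive.map.aggr = true;
-- Fraction of mapper memory the aggregation hash table may use
-- (assumed default: 0.5).
SET hive.map.aggr.hash.percentmemory = 0.5;
```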

    Best,




--
Bertrand Dechoux
