My bad. I wasn't sure; at least I know now. But other solutions may use other 'Serialization' strategies, like Thrift (serialization being the only other customisation point of Hadoop).

Bertrand
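[A minimal sketch of that customisation point, assuming stock Hadoop: serialization frameworks are plugged in through the io.serializations property, which lists implementations of org.apache.hadoop.io.serializer.Serialization. WritableSerialization and JavaSerialization ship with Hadoop; the Thrift entry below is a hypothetical placeholder for a third-party implementation, not a real class name.]

import org.apache.hadoop.conf.Configuration;

public class SerializationConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Hadoop's SerializationFactory walks this list to find a framework
        // that accepts a given key/value class.
        conf.setStrings("io.serializations",
                "org.apache.hadoop.io.serializer.WritableSerialization",
                "org.apache.hadoop.io.serializer.JavaSerialization",
                "com.example.serde.ThriftSerialization"); // hypothetical third-party class
        System.out.println(conf.get("io.serializations"));
    }
}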
On Wed, Aug 1, 2012 at 5:49 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> Hive does not use combiners; it uses map-side aggregation. Hive does
> use writables: sometimes it uses ones from hadoop, sometimes it uses
> its own custom writables for things like timestamps.
>
> On Wed, Aug 1, 2012 at 11:40 AM, Bertrand Dechoux <decho...@gmail.com> wrote:
> > I am not sure about Hive, but if you look at Cascading they use a pseudo
> > combiner instead of the standard (I mean Hadoop's) combiner.
> > I guess Hive has a similar strategy.
> >
> > The point is that when you use a compiler, the compiler does smart things
> > that you don't need to think about (like loop unwinding).
> > The result is that your code is still readable but optimized, and in most
> > cases the compiler will do better than you.
> >
> > Even your naive implementation of the Mapper (without the Reducer and the
> > configuration) is more complicated than the whole Hive query.
> >
> > Like Chuck said, Hive is basically a MapReduce compiler. It is fun to look
> > at how it works. But it is often best to let the compiler work for you
> > instead of trying to beat it.
> >
> > For simple cases, like a 'select', Hive (or any other same-level
> > alternative solution) is helpful. And for complex cases, with multiple
> > joins, you will want something like Hive too, because with the vanilla
> > MapReduce API it can become quite hard to grasp everything. Basically,
> > two reasons: faster to express and cheaper to maintain.
> >
> > One reason not to use Hive is if your approach is more programmatic, like
> > if you want to do machine learning, which will require a highly specific
> > workflow and user-defined functions.
> >
> > It would be interesting to know your goal: are you trying to benchmark
> > Hive (and yourself)? Or do you have other reasons?
> >
> > Bertrand
> >
> > On Wed, Aug 1, 2012 at 5:13 PM, Edward Capriolo <edlinuxg...@gmail.com>
> > wrote:
> >>
> >> As mentioned, if you avoid using new, by re-using objects and possibly
> >> using buffer objects, you may be able to match or beat the speed. But in
> >> the general case Hive saves you time by allowing you not to worry
> >> about low-level details like this.
> >>
> >> On Wed, Aug 1, 2012 at 10:35 AM, Connell, Chuck
> >> <chuck.conn...@nuance.com> wrote:
> >> > This is actually not surprising. Hive is essentially a MapReduce
> >> > compiler. It is common for regular compilers (C, C#, Fortran) to emit
> >> > faster assembler code than you write yourself. Compilers know the
> >> > tricks of their target language.
> >> >
> >> > Chuck Connell
> >> > Nuance R&D Data Team
> >> > Burlington, MA
> >> >
> >> > -----Original Message-----
> >> > From: Yue Guan [mailto:pipeha...@gmail.com]
> >> > Sent: Wednesday, August 01, 2012 10:29 AM
> >> > To: user@hive.apache.org
> >> > Subject: mapper is slower than hive's mapper
> >> >
> >> > Hi there,
> >> >
> >> > I'm writing mapreduce to replace some hive queries and I find that my
> >> > mapper is slower than hive's mapper.
> >> > The Hive query is like:
> >> >
> >> > select sum(column1) from table group by column2, column3;
> >> >
> >> > My mapreduce program looks like this:
> >> >
> >> > public static class HiveTableMapper extends Mapper<BytesWritable,
> >> >         Text, MyKey, DoubleWritable> {
> >> >
> >> >     public void map(BytesWritable key, Text value, Context context)
> >> >             throws IOException, InterruptedException {
> >> >         String[] sLine = StringUtils.split(value.toString(),
> >> >                 StringUtils.ESCAPE_CHAR, HIVE_FIELD_DELIMITER_CHAR);
> >> >         context.write(new MyKey(Integer.parseInt(sLine[0]), sLine[1]),
> >> >                 new DoubleWritable(Double.parseDouble(sLine[2])));
> >> >     }
> >> >
> >> > }
> >> >
> >> > I assume hive is doing something similar. Is there any trick in hive to
> >> > speed this thing up? Thank you!
> >> >
> >> > Best,
> >
> > --
> > Bertrand Dechoux
>

--
Bertrand Dechoux
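[A minimal sketch of the object-reuse advice from this thread, applied to the quoted mapper. MyKey is the original poster's class and is not shown, so the stub below, its set() method, and the \001 field delimiter standing in for HIVE_FIELD_DELIMITER_CHAR are assumptions for illustration, not the poster's actual code.]

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.StringUtils;

public class ReuseSketch {

    // Stand-in for the poster's MyKey class (not shown in the thread); the
    // set() method is what makes reuse possible.
    public static class MyKey implements WritableComparable<MyKey> {
        private int id;
        private String name = "";

        public void set(int id, String name) { this.id = id; this.name = name; }

        @Override public void write(DataOutput out) throws IOException {
            out.writeInt(id);
            out.writeUTF(name);
        }
        @Override public void readFields(DataInput in) throws IOException {
            id = in.readInt();
            name = in.readUTF();
        }
        @Override public int compareTo(MyKey other) {
            int c = Integer.compare(id, other.id);
            return c != 0 ? c : name.compareTo(other.name);
        }
        @Override public int hashCode() { return 31 * id + name.hashCode(); }
        @Override public boolean equals(Object o) {
            return o instanceof MyKey && compareTo((MyKey) o) == 0;
        }
    }

    public static class HiveTableMapper
            extends Mapper<BytesWritable, Text, MyKey, DoubleWritable> {

        // Hive's default field delimiter is Ctrl-A (\001); the original post
        // used its own constant, so this exact value is an assumption.
        private static final char FIELD_DELIMITER = '\001';

        // Allocated once and refilled for every record instead of calling new.
        private final MyKey outKey = new MyKey();
        private final DoubleWritable outValue = new DoubleWritable();

        @Override
        protected void map(BytesWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = StringUtils.split(value.toString(),
                    StringUtils.ESCAPE_CHAR, FIELD_DELIMITER);
            outKey.set(Integer.parseInt(fields[0]), fields[1]);
            outValue.set(Double.parseDouble(fields[2]));
            // context.write() serializes the key and value immediately, so
            // refilling the same objects on the next call is safe.
            context.write(outKey, outValue);
        }
    }
}

Reuse like this removes two allocations per record; whether it actually closes the gap with Hive's generated mapper still has to be measured on the real data.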