IMHO, comparing "performance" is boring and has been done umpteen times
before. The world won't get much out of another performance benchmark,
other than a bunch of fanboys saying "Look, ours is faster, hahahahah" and
then the other side saying "but in this case ours is faster, and that is the
more important case". Benchmarks are easy to bias and manipulate, and
comparing two similar but not identical systems is hard. For example, you
will see Impala "winning" HPC benchmarks by rewriting queries, and then
someone on the Tez side rewrites a query another way, tunes a setting, and
then they are "winning" the benchmark.

You would be better off focusing on the design, the implementation with
third-party tools (UDFs, SerDes, loaders), and the nuances of a more
procedural language compared with a declarative one. Look in the world for
scripts and see who is deploying them effectively.

On Sat, May 3, 2014 at 4:46 AM, Sarfraz Ramay <sarfraz.ra...@gmail.com> wrote:

> Thanks, Thejas, for your input! These are interesting and very specific,
> which is exactly what is required for a master's thesis.
>
> Are there any publications on Hive and the evaluation of its performance
> that I can use for comparison?
>
> Regards,
> Sarfraz Rasheed Ramay (DIT)
> Dublin, Ireland.
>
>
> On Sat, May 3, 2014 at 3:07 AM, Thejas Nair <the...@hortonworks.com> wrote:
>
>> The primary difference between Hive and Pig is the language. There are
>> implementation differences that will result in performance
>> differences, but it will be hard to figure out which aspect of the
>> implementation is responsible for which improvement.
>>
>> I think a more interesting project would be to compare the impact of
>> various performance improvements in Hive. There are many features that
>> you can turn on and off.
>>
>> example -
>> - hive vectorization
>> - file format - text vs RCFile vs ORC
>> - compressed vs uncompressed
>> - mapreduce vs tez execution engine
>> - stats optimized queries
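
A minimal sketch of how the on/off comparison above could be laid out as a
test matrix. It assumes the session-level property names
hive.vectorized.execution.enabled, hive.execution.engine,
hive.compute.query.using.stats and hive.exec.compress.output, and treats the
file format as a per-table choice (STORED AS ...). The helper names
benchmark_matrix and render_preamble are hypothetical, and the sketch only
generates configurations; it does not run or time any query.

```python
from itertools import product

# Session-level Hive properties to toggle (assumed names; verify against
# the Hive version under test before relying on them).
SESSION_TOGGLES = {
    "hive.vectorized.execution.enabled": ["false", "true"],
    "hive.execution.engine": ["mr", "tez"],
    "hive.compute.query.using.stats": ["false", "true"],
    "hive.exec.compress.output": ["false", "true"],
}

# File format is chosen when the table is created, not per session.
FILE_FORMATS = ["TEXTFILE", "RCFILE", "ORC"]


def benchmark_matrix():
    """Yield (file_format, settings) for every combination of toggles."""
    names = list(SESSION_TOGGLES)
    for fmt in FILE_FORMATS:
        for values in product(*(SESSION_TOGGLES[n] for n in names)):
            yield fmt, dict(zip(names, values))


def render_preamble(settings):
    """Render one configuration as SET statements to prepend to a query."""
    return "\n".join(f"SET {key}={value};" for key, value in settings.items())


if __name__ == "__main__":
    for fmt, settings in benchmark_matrix():
        print(f"-- table stored as {fmt}")
        print(render_preamble(settings))
        print()
```

Running every combination is rarely necessary; changing one feature at a
time against a fixed baseline makes it easier to attribute a difference to
a single feature.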
>>
>>
>>
>> On Thu, May 1, 2014 at 5:47 AM, Sarfraz Ramay <sarfraz.ra...@gmail.com>
>> wrote:
>> >>
>> >> Hi,
>> >>
>> >> It seems that both Hive and Pig are used for managing large data sets.
>> >> Hive is more SQL-oriented, whereas Pig is more for data flows. I am
>> >> doing a master's thesis on the performance evaluation of both. Can
>> >> someone please provide a list of tasks that would make for an
>> >> interesting comparison?
>> >>
>> >>
>> >> What is Hive good at?
>> >>
>> >> What is Pig good at?
>> >>
>> >> Ideally, I would like to take what Hive is good at and test it in Pig,
>> >> and vice versa. The competitive characteristics would make for an
>> >> interesting comparison.
>> >>
>> >>
>> >>
>> >>
>> >> Regards,
>> >> Sarfraz Rasheed Ramay (DIT)
>> >> Dublin, Ireland.
>> >
>> >
>>
>
>