I think a lot will depend on what the scripts do. I've seen some legacy
Hive scripts that were written in an awkward way (e.g., lots of subqueries,
nested explodes) because, pre-Spark, that was the only way to express certain
logic. For fairly straightforward operations I expect Catalyst would reduce
b
My 2 cents: this is a complicated question, since I'm not confident
that Spark is 100% compatible with Hive in terms of query language. I have
an unanswered question on this list about it:
http://apache-spark-user-list.1001560.n3.nabble.com/Should-SHOW-TABLES-statement-return-a-hive-compat
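One practical way to probe this kind of compatibility gap is to run the same statement (e.g. SHOW TABLES) on both engines, export the results, and diff them. Below is a minimal, hypothetical sketch; the table lists are hard-coded stand-ins for output you would actually collect from Hive (e.g. via beeline) and from Spark (e.g. via spark.sql("SHOW TABLES").collect()):

```python
# Hypothetical helper: diff the table lists returned by two engines.
# The inputs are plain lists of table names; wiring them to real Hive
# and Spark sessions is omitted so the sketch stays self-contained.

def diff_table_lists(hive_tables, spark_tables):
    """Return (only_in_hive, only_in_spark) as sorted lists."""
    hive, spark = set(hive_tables), set(spark_tables)
    return sorted(hive - spark), sorted(spark - hive)

# Example: pretend outputs from each engine.
only_hive, only_spark = diff_table_lists(
    ["orders", "customers", "events"],
    ["orders", "customers", "tmp_view"],
)
print(only_hive)   # ['events']
print(only_spark)  # ['tmp_view']
```

The same pattern extends to comparing row counts or checksums of query results, which is usually more telling than eyeballing query text for dialect differences.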
Hi All,
Not sure if I need to ask this question on the Spark community or the Hive community.
We have a set of Hive scripts that run on EMR (Tez engine). We would like to
experiment by moving some of them onto Spark. We are planning to experiment with
two options.
1. Use the current code based on H