I think a lot will depend on what the scripts do. I've seen some legacy
Hive scripts that were written in an awkward way (e.g. lots of subqueries,
nested explodes) because before Spark that was the only way to express certain
logic. For fairly straightforward operations I expect Catalyst would reduce
b
My 2 cents: this is a complicated question, since I'm not confident
that Spark is 100% compatible with Hive's query language. I have
an unanswered question on this list about exactly that:
http://apache-spark-user-list.1001560.n3.nabble.com/Should-SHOW-TABLES-statement-return-a-hive-compat