To answer your questions:
1) Unless you have a service (Hive LLAP or HiveServer2) that keeps
running and holds a handle to the Tez session (YARN application), it is
not possible to reuse it. Pig does not host any service of its own. So
unless you bring a JVM up, keep it running and
We were wondering whether there is something like Apache Hive LLAP:
https://cwiki.apache.org/confluence/display/Hive/LLAP
We submit scripts asynchronously throughout the day: never more than 20
at a time, and up to a thousand a day. Input file size varies from less
than a megabyte to a couple of terabytes.
If you are using PigServer and submitting programmatically from the same JVM, it
should automatically reuse the application if the requested AM resources
are the same.
https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezSessionManager.java#L242-L245
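
The reuse rule described above can be illustrated with a small toy model. This is only a sketch of the idea (hand back an idle session when the requested AM resources match, otherwise start a new one); it is not Pig's TezSessionManager code, and the names SessionPool, Session and AmResources are hypothetical, as is the choice of memory and vcores as the resources being compared.

```java
import java.util.ArrayList;
import java.util.List;

public class SessionReuseSketch {

    /** Hypothetical stand-in for the AM resources a script asks for. */
    static final class AmResources {
        final int memoryMb;
        final int vcores;

        AmResources(int memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }

        boolean sameAs(AmResources other) {
            return memoryMb == other.memoryMb && vcores == other.vcores;
        }
    }

    /** Hypothetical stand-in for a running Tez session (one YARN application). */
    static final class Session {
        final int id;
        final AmResources resources;
        boolean inUse;

        Session(int id, AmResources resources) {
            this.id = id;
            this.resources = resources;
        }
    }

    /** Lives as long as the JVM does; this is why the JVM has to stay up. */
    static final class SessionPool {
        private final List<Session> sessions = new ArrayList<>();
        private int nextId = 1;

        /** Reuse an idle session with matching AM resources, else create one. */
        synchronized Session acquire(AmResources requested) {
            for (Session s : sessions) {
                if (!s.inUse && s.resources.sameAs(requested)) {
                    s.inUse = true;
                    return s;
                }
            }
            Session created = new Session(nextId++, requested);
            created.inUse = true;
            sessions.add(created);
            return created;
        }

        /** Mark the session idle so a later script can pick it up. */
        synchronized void release(Session s) {
            s.inUse = false;
        }

        synchronized int size() {
            return sessions.size();
        }
    }

    public static void main(String[] args) {
        SessionPool pool = new SessionPool();

        Session first = pool.acquire(new AmResources(1024, 1));
        pool.release(first);

        // Same AM resources and the session is idle: it is reused.
        Session second = pool.acquire(new AmResources(1024, 1));
        System.out.println("reused: " + (first == second));

        // Different AM resources: a new session is started instead.
        Session third = pool.acquire(new AmResources(2048, 1));
        System.out.println("sessions in pool: " + pool.size());
    }
}
```

In this model the pool only helps while the process holding it is alive, which matches the point made in the thread: each fresh JVM starts with an empty pool and pays the application master startup cost again.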
On Fri, Ja
Hi!
We are developing an application that watches a folder for new files,
runs a few Pig scripts to prepare those files and, finally, loads them
into our database.
The problem is that, for small files, the time that Pig / Tez / YARN take
to create a new application master and spawn new