Put your jobs into a parallel collection using .par; then you can submit
them to Spark very easily using .foreach. The jobs will then run under
Spark's FIFO scheduler.

The advantage over the prior approaches is that you won't have to deal
with threads, and that you can leave the parallelism completely to Spark.
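
For example, here is a minimal sketch of that approach (the SparkSession
setup, the table names, and the "_parquet" target-table suffix are
assumptions for illustration, not taken from this thread):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-text-to-parquet")
  .enableHiveSupport()
  .getOrCreate()

// Hypothetical list of the text-format Hive tables to convert.
val tableNames = List("table1", "table2", "table3")

// .par turns the List into a parallel collection, so foreach submits the
// per-table jobs from multiple driver threads; Spark then schedules the
// concurrent jobs (FIFO by default) across the available executors.
tableNames.par.foreach { table =>
  spark.table(table)
    .write
    .mode("overwrite")
    .format("parquet")
    .saveAsTable(s"${table}_parquet")   // hypothetical target table name
}

The driver-side concurrency is whatever the parallel collection's default
thread pool allows; if you want the concurrent jobs to share executors more
evenly instead of queueing, spark.scheduler.mode can be set to FAIR.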

On Mon, Jul 17, 2017 at 2:28 PM, Simon Kitching <
simon.kitch...@unbelievable-machine.com> wrote:

> Have you tried simply making a list with your tables in it, then using
> SparkContext.makeRDD(Seq)? i.e.
>
> val tablenames = List("table1", "table2", "table3", ...)
> val tablesRDD = sc.makeRDD(tablenames, nParallelTasks)
> tablesRDD.foreach(....)
>
> > Am 17.07.2017 um 14:12 schrieb FN <nuson.fr...@gmail.com>:
> >
> > Hi
> > I am currently trying to parallelize reading multiple tables from Hive.
> > As part of an archival framework, I need to convert a few hundred tables
> > that are in text format to Parquet. For now I am calling Spark SQL inside
> > a for loop for the conversion, but this executes sequentially and the
> > entire process takes a long time to finish.
> >
> > I tried submitting 4 different Spark jobs (each with a set of tables to
> > be converted), which did give me some parallelism, but I would like to do
> > this in a single Spark job due to a few limitations of our cluster and
> > process.
> > Any help will be greatly appreciated
> >
> >
> >
> >
> >
> > --
> > View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Reading-Hive-tables-Parallel-in-Spark-tp28869.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
