I was thinking about replacing a legacy batch job with Spark, but I'm not sure whether Spark is suited to this use case. Before I start a proof of concept, I wanted to ask for opinions.
The legacy job works as follows: a file with 100k to 1 million entries is iterated. Each row contains a (book) order with an id, and for each row roughly 15 processing steps have to be performed that involve access to multiple database tables. In total, roughly 25 tables (each containing 10k-700k entries) have to be scanned using the book's id, and the retrieved data is joined together.

As I'm new to Spark, I'm not sure whether I can leverage Spark's processing model for this use case.
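To make the question concrete, here is a rough sketch of what I imagine the Spark version could look like. The file format, JDBC connection details, table name and join column are placeholders I made up; the point is only the shape of the job: read the orders file as a DataFrame, pull each database table in over JDBC, and join on the book id instead of issuing per-row lookups as the legacy job does.

import org.apache.spark.sql.SparkSession

object OrderBatchSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("order-batch-poc")
      .getOrCreate()

    // Read the input file of book orders (path, format and schema are placeholders).
    val orders = spark.read
      .option("header", "true")
      .csv("/data/orders.csv")

    // Pull one of the ~25 lookup tables in over JDBC; in practice this would be
    // repeated (or looped) for each table involved in the processing steps.
    val bookDetails = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/books") // placeholder connection
      .option("dbtable", "book_details")                    // placeholder table name
      .option("user", "batch_user")
      .option("password", "...")
      .load()

    // Join the orders against the table on the book id ("book_id" is an assumed
    // column name) instead of querying the database once per order row.
    val enriched = orders.join(bookDetails, Seq("book_id"), "left")

    enriched.write.mode("overwrite").parquet("/data/orders_enriched")

    spark.stop()
  }
}

Would expressing the ~15 processing steps as a chain of such joins and column transformations be a reasonable fit for Spark, or is this workload better left to the database?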