I am thinking about replacing a legacy batch job with Spark, but I'm not
sure whether Spark is suited for this use case. Before I start a proof of
concept, I wanted to ask for opinions.

The legacy job works as follows: a file with 100k to 1 million entries is
iterated. Every row contains a (book) order with an id, and for each row
approximately 15 processing steps have to be performed that involve access
to multiple database tables. In total, approximately 25 tables (each
containing 10k-700k entries) have to be looked up using the book's id, and
the retrieved data is joined together.

As I'm new to Spark, I'm not sure whether I can leverage Spark's processing
model for this use case.
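
To make this more concrete, here is a rough sketch of how I imagine the job
could be expressed with the DataFrame API, reading the lookup tables via
JDBC. The file path, JDBC URL, and the table and column names below are all
placeholders, not the real schema:

import org.apache.spark.sql.SparkSession

object OrderEnrichmentSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("order-enrichment-sketch")
      .getOrCreate()

    // Read the input file (100k - 1 million order rows); assuming CSV with
    // a header and a column named "book_id" (placeholder names).
    val orders = spark.read
      .option("header", "true")
      .csv("/path/to/orders.csv")

    // Helper to load one of the ~25 database tables via JDBC.
    // URL, credentials and table names are placeholders.
    def loadTable(table: String) =
      spark.read
        .format("jdbc")
        .option("url", "jdbc:postgresql://dbhost:5432/bookshop")
        .option("dbtable", table)
        .option("user", "spark")
        .option("password", "secret")
        .load()

    // Each processing step becomes a join against one of the lookup
    // tables, keyed on the book id. Two of the ~25 tables shown here.
    val enriched = orders
      .join(loadTable("book_details"), Seq("book_id"), "left")
      .join(loadTable("book_prices"),  Seq("book_id"), "left")
      // ... roughly 23 more joins / transformation steps ...

    enriched.write.parquet("/path/to/output")

    spark.stop()
  }
}

In the real job there would be roughly 25 such joins, one per table.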
