I was thinking about replacing a legacy batch job with Spark, but I'm not sure whether Spark is suited to this use case. Before I start a proof of concept, I wanted to ask for opinions.
The legacy job works as follows: a file with 100k to 1 million entries is iterated. Each row contains a (book) order with an id, and for each row roughly 15 processing steps have to be performed that involve access to multiple database tables. In total, roughly 25 tables (each containing 10k-700k entries) have to be scanned using the book's id, and the retrieved data is joined together.

As I'm new to Spark, I'm not sure whether I can leverage Spark's processing model for this use case.
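To make the question concrete, here is a rough sketch of what I imagine the Spark version could look like. The file format, JDBC connection details, table name and join column are placeholders I made up; the point is only the shape of the job: read the orders file as a DataFrame, pull each database table in over JDBC, and join on the book id instead of issuing per-row lookups as the legacy job does.

import org.apache.spark.sql.SparkSession

object OrderBatchSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("order-batch-poc")
      .getOrCreate()

    // Read the input file of book orders (path, format and schema are placeholders).
    val orders = spark.read
      .option("header", "true")
      .csv("/data/orders.csv")

    // Pull one of the ~25 lookup tables in over JDBC; in practice this would be
    // repeated (or looped) for each table involved in the processing steps.
    val bookDetails = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/books") // placeholder connection
      .option("dbtable", "book_details")                    // placeholder table name
      .option("user", "batch_user")
      .option("password", "...")
      .load()

    // Join the orders against the table on the book id ("book_id" is an assumed
    // column name) instead of querying the database once per order row.
    val enriched = orders.join(bookDetails, Seq("book_id"), "left")

    enriched.write.mode("overwrite").parquet("/data/orders_enriched")

    spark.stop()
  }
}

Would expressing the ~15 processing steps as a chain of such joins and column transformations be a reasonable fit for Spark, or is this workload better left to the database?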