Hi, maybe this is a stupid question:
I have a list of files, and I want to use each file as input to an ML algorithm. The files are all independent of one another. How do I distribute the work so that each worker takes a block of files and runs the algorithm on them one by one?

I hope somebody can point me in the right direction! :)

Best regards,
Lydia
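
P.S. To make the question concrete, here is a rough sketch of the pattern I mean, written in plain Python so it runs without a cluster. It is only an illustration under assumptions: `run_algorithm` is a hypothetical placeholder for the real ML algorithm, the file names are made up, and the PySpark call in the closing comment (`parallelize` with `numSlices`, then `mapPartitions`) is just one way this might map onto Spark, not necessarily the recommended one.

```python
from typing import Iterable, Iterator, List, Tuple


def run_algorithm(path: str) -> int:
    """Hypothetical stand-in for the ML algorithm; it just returns the path length."""
    return len(path)


def process_block(paths: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Run the algorithm on each file of one block, one file at a time."""
    for path in paths:
        yield path, run_algorithm(path)


def split_into_blocks(paths: List[str], num_blocks: int) -> List[List[str]]:
    """Split the file list into num_blocks contiguous blocks of near-equal size."""
    size, extra = divmod(len(paths), num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        # The first `extra` blocks each take one additional file.
        end = start + size + (1 if i < extra else 0)
        blocks.append(paths[start:end])
        start = end
    return blocks


# Made-up file names, purely for illustration.
files = ["a.csv", "bb.csv", "ccc.csv", "dddd.csv", "eeeee.csv"]

# Each block stands for the share of files one worker would receive.
blocks = split_into_blocks(files, 2)
results = [r for block in blocks for r in process_block(block)]

# With PySpark, the same idea would roughly be (assumption, not run here):
#   sc.parallelize(files, numSlices=2).mapPartitions(process_block).collect()
```

Here each block is processed sequentially in one process; the point is only that `process_block` takes an iterator of file paths, which is the shape a per-partition function would need.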