Hello,
I need to develop an application which:
- reads XML files in thousands of directories, two levels down, from year x to
year y
- extracts data from <image> tags in those files and stores it in an SQL or
NoSQL database
- generates ImageMagick commands based on the extracted data to generate images
- generates curl commands to index the image files with Solr
Does Spark provide any tools/features to facilitate and automate ("batchify")
the above tasks?
I can do all of the above with one or several Java programs, but I wondered if
using Spark would be of any use in such an endeavour.
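
To make the steps concrete, here is a minimal plain-Java sketch of the extraction and command-generation steps as I picture them. It is only an illustration under assumptions: the attribute names "src", "width" and "height" are hypothetical (the real <image> tags may carry different data), and the class and method names are made up for this example. Directory traversal, the database write and the Solr indexing step are left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative only.
class ImageTagExtractor {

    // Parses one XML document and returns the attributes of every
    // <image> element, in document order, as name -> value maps.
    public static List<Map<String, String>> extractImages(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("image");
            List<Map<String, String>> images = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                NamedNodeMap attrs = element.getAttributes();
                Map<String, String> row = new TreeMap<>();
                for (int j = 0; j < attrs.getLength(); j++) {
                    row.put(attrs.item(j).getNodeName(),
                            attrs.item(j).getNodeValue());
                }
                images.add(row);
            }
            return images;
        } catch (Exception e) {
            // Wrap checked parser/IO exceptions so callers stay simple.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Builds the argument list for an ImageMagick "convert ... -resize WxH ..."
    // call. Assumes the hypothetical attributes src, width and height exist.
    public static List<String> resizeCommand(Map<String, String> image,
                                             String outputPath) {
        String geometry = image.get("width") + "x" + image.get("height");
        return List.of("convert", image.get("src"),
                       "-resize", geometry, outputPath);
    }
}
```

The returned argument list could be handed to a ProcessBuilder or simply written out to a shell script, one command per image.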
Many thanks.
Philippe