Just wondering if anyone has experience running Graphhopper (or a similar
routing engine) in Spark?

In short, I can get it running on the master, but not on the worker nodes. The
key trouble seems to be that Graphhopper depends on a pre-processed graph,
which it builds from OSM data. In normal (desktop) use, it pre-processes the
OSM data once and then caches the result to disk. My current thinking is that
I could create the cache locally, put it in HDFS, and tweak Graphhopper to
read from the HDFS source. Alternatively, I could try to broadcast the cache
(or the entire Graphhopper instance), though I believe that would require them
to be serializable (which I've got little clue about). Does anyone have any
recommendations on the above?
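
To make the HDFS idea concrete, here's roughly what I have in mind (an
untested sketch - the hdfs:///data/gh-cache path is made up, and I'm assuming
the GraphHopper 0.x API, i.e. setGraphHopperLocation / load):

import com.graphhopper.GraphHopper
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Copy the pre-built graph cache from HDFS onto the worker's local disk,
// then point GraphHopper at that local directory. GraphHopper expects a
// plain directory on the local filesystem, which is why I copy first
// rather than trying to make it read HDFS directly.
def loadHopperFromHdfs(hdfsCacheDir: String, localDir: String): GraphHopper = {
  val fs = FileSystem.get(new Configuration())
  fs.copyToLocalFile(new Path(hdfsCacheDir), new Path(localDir))

  val hopper = new GraphHopper()
  hopper.setGraphHopperLocation(localDir)
  // load() reads an existing cache rather than re-importing OSM data;
  // I *think* the encoder config is picked up from the cache's properties
  // file, but I haven't verified that.
  hopper.load(localDir)
  hopper
}

That would sidestep the serialization question entirely: only the HDFS path
(a String) needs to reach the workers, not the GraphHopper instance itself.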

In addition, I'm not quite sure how to structure it to minimise the cache
reading - I don't want to read the cache (and initialise Graphhopper) for,
say, every route, as that's likely to be slow. It'd be nicer if this were done
only once (e.g. once per partition) and all the routes in the partition were
then processed with the same Graphhopper instance. Again, any thoughts on
this?
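
Concretely, something like this is what I'm picturing (again untested; it
assumes an RDD of (fromLat, fromLon, toLat, toLon) tuples and the
loadHopperFromHdfs helper sketched above):

import com.graphhopper.GHRequest

val distances = requests.mapPartitions { iter =>
  // One (expensive) initialisation per partition, not per route.
  val hopper = loadHopperFromHdfs("hdfs:///data/gh-cache", "/tmp/gh-cache")
  iter.map { case (fromLat, fromLon, toLat, toLon) =>
    val req = new GHRequest(fromLat, fromLon, toLat, toLon).setVehicle("car")
    val rsp = hopper.route(req)
    // Route length in metres for the best path; real code should check
    // rsp.hasErrors() before calling getBest.
    rsp.getBest.getDistance
  }
}

If several partitions land on the same executor, this still re-reads the cache
once per partition; a lazily initialised singleton (one GraphHopper per
executor JVM) would presumably avoid even that, at the cost of a bit more
plumbing.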

FYI, there is a discussion on the Graphhopper forum at
https://discuss.graphhopper.com/t/how-to-use-graphhopper-in-spark/998, though
no luck there so far.



