We make extensive use of the PubMed and clinical trial databases in our work, which involves a large number of parameterized REST API queries. When the data download is large, the requests usually time out, so we have to run the queries in very small batches. We also run a large number (thousands) of NLP queries for our ML work. Given that our content is quite large and we are constrained by the public database interfaces, such a framework would be very beneficial for our use case. Since I just stumbled on this post, I will try this package in the context of our framework and let you know how using the library compares with the way we conventionally do it. Thanks for sharing it with the community.