Hi Sourav,

This is quite a useful addition to the Spark family. This use case comes up more often than it is talked about:

* getting third-party mapping data (geo coordinates),
* accessing database data through REST,
* downloading data from a bulk data API service.

It would also be really useful to be able to interact with the application layer through a REST API and send data over to it (the POST request case you already mentioned).

I have a few follow-up thoughts:

1) What are your thoughts on the case where a REST API returns more complex, nested JSON data? Will this map seamlessly to a DataFrame, given that DataFrames are flatter in nature?

2) How can this DataFrame be kept in a distributed cache on the Spark workers, to encourage reuse of slow-changing data (does broadcast work on a DataFrame)? This is related to your b).

3) The last case I have in mind is how this can be extended for streaming: how to control the frequency of the REST API calls and perform a join of two DataFrames, one slow-moving (maybe a lookup table in a DB accessed over REST) and the other a fast-moving event stream.

Thanks,
Sathi