Hi Sourav,
This is quite a useful addition to the Spark family. It covers a use case
that comes up more often than it is talked about, for example:
* getting 3rd-party mapping data (geo coordinates),
* accessing database data through REST,
* downloading data from a bulk data API service.


It will be really useful to be able to interact with the application layer
through a REST API and to send data over to it (the POST request case, which
you already mentioned).

I have a few follow-up thoughts:
1) What are your thoughts on the case where a REST API returns more complex,
nested JSON data? Will this map seamlessly to a dataframe, given that
dataframes are flatter in nature?
2) How can this dataframe be kept in a distributed cache on the Spark workers,
so that it is available for re-use of slow-changing data (does broadcast
work on a dataframe?). This is related to your b).
3) The last case on my mind is how this can be extended to streaming: how to
control the frequency of the REST API calls and join two dataframes, one
slow-moving (maybe a lookup table in a db accessed over REST) and the other
a fast-moving event stream.
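
To make point 1) concrete, here is a minimal sketch in plain Python (not Spark,
and not your library's API) of the kind of flattening I have in mind. The
function name `flatten`, the `_` separator, and the sample record are all my own
assumptions for illustration. Nested objects become prefixed columns; arrays are
left untouched, and in Spark they would presumably need separate handling.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested dicts into one flat dict with prefixed keys (hypothetical helper)."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the parent key as a prefix.
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            # Scalars and lists are kept as they are.
            flat[name] = value
    return flat


# Made-up response record, only for illustration.
sample = {"id": 7, "geo": {"lat": 12.97, "lon": 77.59}, "tags": ["a", "b"]}
row = flatten(sample)
```

The question is whether something like this happens automatically, or whether
the caller is expected to do it after the dataframe is built.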

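For point 3), the behaviour I am asking about could be sketched roughly as
below, again in plain Python rather than Spark. The class `RefreshingLookup`,
the `enrich` function, the `region` field, and the fake fetch function are all
hypothetical names I made up. The idea is that the slow-moving table is
re-fetched at most once per `ttl_seconds`, while every batch of fast-moving
events is joined against whatever copy is current.

```python
import time
from typing import Callable, Dict, List, Optional


class RefreshingLookup:
    """Caches a lookup table and re-fetches it only after ttl_seconds (a sketch)."""

    def __init__(self, fetch: Callable[[], Dict[str, str]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Dict[str, str] = {}
        self._loaded_at: Optional[float] = None

    def get(self) -> Dict[str, str]:
        now = self._clock()
        # Fetch on first use, or once the cached copy is older than the TTL.
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._data = self._fetch()
            self._loaded_at = now
        return self._data


def enrich(events: List[Dict[str, str]], lookup: RefreshingLookup) -> List[dict]:
    """Left-join each event against the current copy of the lookup table."""
    table = lookup.get()
    return [{**event, "region": table.get(event["key"])} for event in events]


# Stand-in for the REST call; it records how often it is invoked.
calls: List[int] = []

def fake_fetch() -> Dict[str, str]:
    calls.append(1)
    return {"a": "EU", "b": "US"}

# A controllable clock so the example does not need to sleep.
fake_now = [0.0]
lookup = RefreshingLookup(fake_fetch, ttl_seconds=60, clock=lambda: fake_now[0])

batch1 = enrich([{"key": "a"}], lookup)   # first use triggers a fetch
fake_now[0] = 30.0
batch2 = enrich([{"key": "b"}], lookup)   # within the TTL, cached copy is reused
fake_now[0] = 61.0
batch3 = enrich([{"key": "c"}], lookup)   # TTL expired, table is fetched again
```

Here three batches lead to only two fetches. What I would like to understand is
how the equivalent refresh interval would be controlled in a streaming job.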

Thanks
Sathi