I'd suggest looking at the Avro data source as an example implementation: https://github.com/databricks/spark-avro
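
For orientation, here is a minimal sketch of what a custom data source can look like with the interfaces in org.apache.spark.sql.sources (RelationProvider, BaseRelation, TableScan). It is not taken from spark-avro. The package name com.example.range, the class RangeRelation, and the "rows" option are hypothetical names chosen for illustration, and the sketch assumes the Spark SQL artifacts are on the classpath.

```scala
package com.example.range  // hypothetical package; it doubles as the format name

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Spark looks for a class named DefaultSource inside the package given as the format.
class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    // "rows" is a made-up option for this sketch; it defaults to 10.
    val n = parameters.getOrElse("rows", "10").toInt
    new RangeRelation(n)(sqlContext)
  }
}

// A relation that exposes the integers 0 until n as a single-column table.
class RangeRelation(n: Int)(@transient val sqlContext: SQLContext)
    extends BaseRelation with TableScan {

  // The schema Spark SQL will see for this source.
  override def schema: StructType =
    StructType(Seq(StructField("value", IntegerType, nullable = false)))

  // Full table scan: produce every row, with no column pruning or filter pushdown.
  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.parallelize(0 until n).map(Row(_))
}
```

With Spark 1.4 or later, something like sqlContext.read.format("com.example.range").option("rows", "5").load() should then give a DataFrame backed by this relation. A real source would more likely implement PrunedScan or PrunedFilteredScan so that Spark can push column selection and filters down to it.
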
I also gave a talk a while ago: https://www.youtube.com/watch?v=GQSNJAzxOr8

Hi,

You can connect to a database over JDBC, as described in https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases.

Another option is to use HadoopRDD and NewHadoopRDD to connect to Hadoop-compatible databases such as HBase. Some examples can be found in chapter 5 of "Learning Spark": https://books.google.es/books?id=tOptBgAAQBAJ&pg=PT190&dq=learning+spark+hadooprdd&hl=en&sa=X&ei=4bqLVaDaLsXaU46NgfgL&ved=0CCoQ6AEwAA#v=onepage&q=%20hadooprdd&f=false

For Spark Streaming, see the "Custom Sources" section of https://spark.apache.org/docs/latest/streaming-programming-guide.html

Hope that helps.

Greetings,

Juan

2015-06-25 8:25 GMT+02:00 诺铁 <noty...@gmail.com>:

> hi,
>
> I can't find documentation about datasource api, how to implement custom
> datasource. any hint is appreciated. thanks.