Hi Ningjun,
I haven't done this myself, but I saw your question, was curious about the
answer, and found this article, which you might find useful:
http://www.sparkexpert.com/2015/03/28/loading-database-data-into-spark-using-data-sources-api/
According to this article, you can pass your SQL statement, wrapped in
parentheses, as the value of the "dbtable" key, i.e., something like:
val jdbcDF = sqlContext.read.format("jdbc")
  .options(Map(
    "url" -> "jdbc:postgresql:dbserver",
    // The subquery goes where a table name would; most databases want an alias on it.
    "dbtable" -> "(select docid, title, docText from dbo.document where docid between 10 and 1000) as doc"
  )).load()
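
Since your database is SQL Server rather than PostgreSQL, here is a rough, untested sketch of how the options might look in that case. The helper asDerivedTable is a hypothetical name of my own (it is not part of Spark), and the server name, port, database name "mydb" and alias "doc" are placeholders I made up. The URL format and driver class are the ones used by Microsoft's JDBC driver, which would need to be on the classpath.

```scala
// Hypothetical helper (not part of Spark): wraps a SELECT statement in
// parentheses and gives it an alias so it can stand in for a table name.
def asDerivedTable(query: String, alias: String): String =
  s"($query) AS $alias"

// Assumed connection details; replace the server, port and database with your own.
val jdbcOptions: Map[String, String] = Map(
  "url"     -> "jdbc:sqlserver://dbserver:1433;databaseName=mydb",
  "driver"  -> "com.microsoft.sqlserver.jdbc.SQLServerDriver",
  "dbtable" -> asDerivedTable(
    "select docid, title, docText from dbo.document where docid between 10 and 1000",
    "doc")
)

// With a SQLContext in scope (Spark 1.4 or later), something like:
//   val jdbcDF = sqlContext.read.format("jdbc").options(jdbcOptions).load()
//   val docRDD = jdbcDF.rdd   // RDD[Row], if you need an RDD rather than a DataFrame
```

The alias is there because SQL Server rejects a subquery in a FROM clause that has no name. Calling .rdd on the resulting DataFrame should give you the RDD you originally asked about.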
-sujit
On Mon, Dec 7, 2015 at 8:26 AM, Wang, Ningjun (LNG-NPV) <
[email protected]> wrote:
> How can I create an RDD from a SQL query against a SQL Server database? Here
> is the DataFrame example:
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#overview
>
> val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:dbserver",
>   "dbtable" -> "schema.tablename")).load()
>
> This code creates a DataFrame from a table. How can I create a DataFrame
> from a query, e.g. “select docid, title, docText from dbo.document where
> docid between 10 and 1000”?
>
> Ningjun