This is a very helpful article. Thanks for the help. Ningjun
From: Sujit Pal [mailto:sujitatgt...@gmail.com]
Sent: Monday, December 07, 2015 12:42 PM
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: Re: How to create dataframe from SQL Server SQL query

Hi Ningjun,

Haven't done this myself, but I saw your question, was curious about the answer, and found this article which you might find useful:
http://www.sparkexpert.com/2015/03/28/loading-database-data-into-spark-using-data-sources-api/

According to this article, you can pass your SQL statement in the "dbtable" mapping, i.e., something like:

val jdbcDF = sqlContext.read.format("jdbc")
  .options(Map(
    "url" -> "jdbc:postgresql:dbserver",
    "dbtable" -> "(select docid, title, docText from dbo.document where docid between 10 and 1000)"
  )).load

-sujit

On Mon, Dec 7, 2015 at 8:26 AM, Wang, Ningjun (LNG-NPV) <ningjun.w...@lexisnexis.com> wrote:

How can I create an RDD from a SQL query against a SQL Server database? Here is the dataframe example from
http://spark.apache.org/docs/latest/sql-programming-guide.html#overview

val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:dbserver",
      "dbtable" -> "schema.tablename")).load()

This code creates a dataframe from a table. How can I create a dataframe from a query, e.g. “select docid, title, docText from dbo.document where docid between 10 and 1000”?

Ningjun
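
For reference, applying the same approach against SQL Server itself might look roughly like the sketch below. It is untested here; the host, port, database name, credentials, and the "doc" alias are placeholders, and it assumes Microsoft's JDBC driver (com.microsoft.sqlserver.jdbc.SQLServerDriver) is on the classpath. Note that the subquery is wrapped in parentheses and given an alias so it can stand in wherever Spark expects a table name.

val sqlServerDF = sqlContext.read.format("jdbc")
  .options(Map(
    // Placeholder connection details; adjust host, port, database and credentials.
    "url" -> "jdbc:sqlserver://dbserver:1433;databaseName=mydb;user=myuser;password=secret",
    // Explicit driver class, assuming Microsoft's JDBC driver jar is on the classpath.
    "driver" -> "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    // The query becomes a derived table; "doc" is an arbitrary alias.
    "dbtable" -> "(select docid, title, docText from dbo.document where docid between 10 and 1000) as doc"
  ))
  .load()

// sqlServerDF is a DataFrame; call .rdd on it if an RDD of Rows is needed.
sqlServerDF.show()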