There is a newly introduced JDBC data source in Spark 1.3.0 (not the JdbcRDD in Spark core), which may be useful. However, there is currently no SQL Server-specific logic implemented, so I'd assume only standard SQL queries will work.
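As a rough sketch, the 1.3.0 JDBC data source can be pointed at SQL Server through Microsoft's JDBC driver (which must be on the classpath); the host, database, table, and credential values below are placeholders, and the partitioning options are what let Spark read the table in parallel:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("sqlserver-jdbc"))
val sqlContext = new SQLContext(sc)

// Load a SQL Server table as a DataFrame via the generic JDBC source.
// Connection details here are illustrative placeholders.
val activity = sqlContext.load("jdbc", Map(
  "url"     -> "jdbc:sqlserver://dbhost:1433;databaseName=mydb;user=me;password=secret",
  "dbtable" -> "dbo.Activity",
  // Optional: split the scan into parallel partitions on a numeric column.
  "partitionColumn" -> "ActivityId",
  "lowerBound"      -> "1",
  "upperBound"      -> "10000000000",
  "numPartitions"   -> "64"))

activity.registerTempTable("activity")
sqlContext.sql("SELECT COUNT(*) FROM activity WHERE EventDate > '2015-01-01'").show()
```

Note that without the partitioning options the whole table is read through a single connection, which would be a bottleneck at the data sizes you describe.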

Cheng

On 2/24/15 7:02 PM, Suhel M wrote:
Hey,

I am trying to work out the best way we can leverage Spark for crunching data that is sitting in SQL Server databases. The ideal scenario is being able to work efficiently with big data (10 billion+ rows of activity data). We need to shape this data for machine learning problems, and want to run ad-hoc and complex queries and get results in a timely manner.

All our data crunching is done via SQL/MDX queries, but these obviously take a very long time to run over data of this size. Also, we currently don't have Hadoop or any other distributed storage.

Keen to hear feedback/thoughts/war stories from the Spark community on best way to approach this situation.

Thanks
Suhel
