There is a newly introduced JDBC data source in Spark 1.3.0 (not the
JdbcRDD in Spark core), which may be useful. However, there is currently
no SQL Server-specific logic implemented, so I'd assume standard SQL
queries should work.
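As a minimal sketch of how the 1.3.0 JDBC data source could be pointed at a SQL Server table (the connection URL, table name, and partitioning column below are hypothetical placeholders; the Microsoft JDBC driver jar would need to be on the classpath of the driver and executors):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SqlServerJdbcExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sqlserver-jdbc"))
    val sqlContext = new SQLContext(sc)

    // Load a SQL Server table through the generic JDBC data source.
    // partitionColumn/lowerBound/upperBound/numPartitions split the table
    // scan into parallel JDBC reads, one per partition.
    val activity = sqlContext.load("jdbc", Map(
      "url"             -> "jdbc:sqlserver://dbhost:1433;databaseName=analytics",
      "dbtable"         -> "dbo.Activity",
      "partitionColumn" -> "ActivityId",
      "lowerBound"      -> "1",
      "upperBound"      -> "10000000000",
      "numPartitions"   -> "64"
    ))

    // Register the DataFrame so standard SQL can be run against it.
    activity.registerTempTable("activity")
    sqlContext.sql("SELECT COUNT(*) FROM activity").show()
  }
}
```

Since the queries are pushed through plain JDBC, the database still does the row reads; the parallelism only helps if SQL Server can serve the partitioned range scans concurrently.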
Cheng
On 2/24/15 7:02 PM, Suhel M wrote:
Hey,
I am trying to work out what is the best way we can leverage Spark for
crunching data that is sitting in SQL Server databases.
The ideal scenario is being able to work efficiently with big data
(10 billion+ rows of activity data). We need to shape this data for
machine learning problems and want to run ad-hoc and complex queries and
get results in a timely manner.
All our data crunching is done via SQL/MDX queries, but these
obviously take a very long time to run over data of this size. Also, we
currently don't have Hadoop or any other distributed storage.
Keen to hear feedback/thoughts/war stories from the Spark community on
the best way to approach this situation.
Thanks
Suhel