I would try using the JDBC Data Source <http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases> from the development machine that is allowed to reach the database, and save the data as Parquet <http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files>. You can then copy those files to storage that your Spark cluster can read (you will probably want to install HDFS on it) and do the processing there.
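A rough sketch of the export step described above, in PySpark. It assumes a Spark version that provides SparkSession and that the Microsoft JDBC driver jar is on the classpath of the machine running it. The host, database, table, and output path are placeholders, and the helper names (sqlserver_jdbc_options, export_table_to_parquet, main) are made up for illustration, so treat this as a starting point rather than a tested recipe for your environment.

```python
def sqlserver_jdbc_options(host, database, table, port=1433):
    """Build options for Spark's JDBC reader against SQL Server.

    Uses integrated security, so no user name or password is passed.
    """
    url = "jdbc:sqlserver://{}:{};databaseName={};integratedSecurity=true".format(
        host, port, database
    )
    return {
        "url": url,
        "dbtable": table,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


def export_table_to_parquet(spark, options, output_path):
    """Read one table over JDBC and write it out as Parquet files."""
    df = spark.read.format("jdbc").options(**options).load()
    df.write.mode("overwrite").parquet(output_path)


def main():
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    # Local mode: everything runs on the machine that has database access.
    spark = (
        SparkSession.builder.master("local[*]")
        .appName("sqlserver-export")
        .getOrCreate()
    )
    # Placeholder host, database, table, and path; replace with your own.
    opts = sqlserver_jdbc_options("dbhost.example.com", "mydb", "dbo.mytable")
    export_table_to_parquet(spark, opts, "/tmp/mytable.parquet")
    spark.stop()
```

Call main() on the development machine, copy the resulting Parquet directory to HDFS on the cluster, and load it there with spark.read.parquet. For a large table, the JDBC reader's partitionColumn, lowerBound, upperBound, and numPartitions options can split the read into several parallel queries.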

On Fri, Oct 30, 2015 at 6:49 PM, Thomas Ginter <[email protected]> wrote:

> I am working in an environment where data is stored in MS SQL Server. It
> has been secured so that only a specific set of machines can access the
> database through an integrated security Microsoft JDBC connection. We also
> have a couple of beefy linux machines we can use to host a Spark cluster
> but those machines do not have access to the databases directly. How can I
> pull the data from the SQL database on the smaller development machine and
> then have it distribute to the Spark cluster for processing? Can the
> driver pull data and then distribute execution?
>
> Thanks,
>
> Thomas Ginter
> 801-448-7676
> [email protected]
