I would try using the JDBC data source
<http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases>
on a machine that can reach the database, and save the data to Parquet
<http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files>.
You can then copy those files to storage your Spark cluster can read (you will
probably want to install HDFS).
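
A rough sketch of the export step in PySpark, untested against your setup. The
two helper functions, the host, database, and table names, and the output path
are all placeholders I made up for illustration. The URL assumes the Microsoft
JDBC driver jar is on the classpath and that integrated security is configured
on the machine running it.

```python
def sqlserver_jdbc_options(host, database, table, port=1433):
    """Build options for Spark's JDBC data source (hypothetical helper).

    Assumes the Microsoft SQL Server JDBC driver with integrated security.
    """
    url = "jdbc:sqlserver://{}:{};databaseName={};integratedSecurity=true".format(
        host, port, database
    )
    return {
        "url": url,
        "dbtable": table,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


def export_table_to_parquet(session, options, output_path):
    """Read one table over JDBC and write it out as Parquet files.

    `session` is anything exposing Spark's DataFrameReader as `.read`,
    for example a SQLContext or a SparkSession.
    """
    df = session.read.format("jdbc").options(**options).load()
    df.write.parquet(output_path)
```

You would run this with Spark in local mode on the development machine, e.g.
export_table_to_parquet(sqlContext, sqlserver_jdbc_options("dbhost", "mydb",
"dbo.mytable"), "/tmp/mytable.parquet"), then copy the output directory to the
cluster (for instance with hdfs dfs -put) and load it there with
sqlContext.read.parquet(...).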

On Fri, Oct 30, 2015 at 6:49 PM, Thomas Ginter <[email protected]>
wrote:

> I am working in an environment where data is stored in MS SQL Server.  It
> has been secured so that only a specific set of machines can access the
> database through an integrated security Microsoft JDBC connection.  We also
> have a couple of beefy linux machines we can use to host a Spark cluster
> but those machines do not have access to the databases directly.  How can I
> pull the data from the SQL database on the smaller development machine and
> then have it distribute to the Spark cluster for processing?  Can the
> driver pull data and then distribute execution?
>
> Thanks,
>
> Thomas Ginter
> 801-448-7676
> [email protected]
