Thomas,

I have the same problem, though in my case Kerberos authentication to MS SQL
Server from the cluster nodes does not seem to be supported. A couple of
options come to mind:

1) You can pull the data by running Sqoop in local mode on the smaller
development machines and write it to HDFS or to a persistent store connected
to your Spark cluster.
2) You can run Spark in local mode on the smaller development machines and
use the JDBC Data Source to do something similar; see the sketch after this
list.
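
For option 2, something along these lines should work. This is an untested
sketch for Spark 1.5; the host, database, table and path names are
placeholders, and you may need to adjust the JDBC URL for your integrated
security setup:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Run in local mode on a dev machine that can reach SQL Server.
    val conf = new SparkConf().setAppName("sqlserver-extract").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Placeholder connection details; the Microsoft JDBC driver jar must be
    // on the classpath.
    val df = sqlContext.read.format("jdbc").options(Map(
      "url" -> "jdbc:sqlserver://dbhost:1433;databaseName=mydb;integratedSecurity=true",
      "dbtable" -> "dbo.myTable",
      "driver" -> "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    )).load()

    // Write to storage the cluster can also see (HDFS, an NFS mount, S3, ...).
    df.write.parquet("hdfs://namenode:8020/staging/myTable")

Your cluster jobs can then read the staged Parquet files without ever
touching the database.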

Regards
Deenar

*Think Reactive Ltd*
deenar.toras...@thinkreactive.co.uk
07714140812

On 31 October 2015 at 11:35, Michael Armbrust <mich...@databricks.com>
wrote:

> I would try using the JDBC Data Source
> <http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases>
> and saving the data to Parquet
> <http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files>.
> You can then put that data on your Spark cluster (you will probably want to
> install HDFS there).
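>
> Once staged, reading it back on the cluster is straightforward. A rough,
> untested sketch (the path is a placeholder, and sqlContext is the one
> provided by spark-shell):
>
>     // On the cluster: read the staged Parquet files; no database
>     // connectivity is needed at this point.
>     val df = sqlContext.read.parquet("hdfs://namenode:8020/staging/myTable")
>     df.registerTempTable("myTable")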
>
> On Fri, Oct 30, 2015 at 6:49 PM, Thomas Ginter <thomas.gin...@utah.edu>
> wrote:
>
>> I am working in an environment where data is stored in MS SQL Server.  It
>> has been secured so that only a specific set of machines can access the
>> database through an integrated-security Microsoft JDBC connection.  We also
>> have a couple of beefy Linux machines we can use to host a Spark cluster,
>> but those machines do not have direct access to the databases.  How can I
>> pull the data from the SQL database on the smaller development machine and
>> then distribute it to the Spark cluster for processing?  Can the driver
>> pull data and then distribute execution?
>>
>> Thanks,
>>
>> Thomas Ginter
>> 801-448-7676
>> thomas.gin...@utah.edu
>>
>
