For JDBC to work, you can start spark-submit with the appropriate JDBC driver
jars (using --jars); the driver will then be available on the executors.
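
For example, something along these lines (the connector jar path, job class
and application jar below are just placeholders):

spark-submit \
  --master yarn \
  --jars /path/to/mysql-connector-java.jar \
  --class com.example.MysqlDumpJob \
  my-dump-job.jar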

For acquiring connections, create a singleton connection per executor. I
think the DataFrame JDBC reader (sqlContext.read.jdbc) already takes care
of that.
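
For example, a minimal sketch of such a read (the JDBC URL, table name and
credentials are placeholders):

import java.util.Properties

val props = new Properties()
props.setProperty("user", "myuser")            // placeholder credentials
props.setProperty("password", "mypassword")
props.setProperty("driver", "com.mysql.jdbc.Driver")

// Read one MySQL table into a DataFrame; Spark opens the JDBC connections for you.
val df = sqlContext.read.jdbc("jdbc:mysql://mysql-host:3306/mydb", "my_table", props)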

Finally, if you want multiple MySQL tables to be accessed in a single Spark
job, you can create a list of tables and run a map over that list. Something
like:

def getTable(tablename: String): DataFrame
def saveTable(df: DataFrame): Unit

val tables = sc.parallelize(<List of Tables>)
tables.map(getTable).foreach(saveTable)
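
A slightly fuller sketch of that idea, reusing the placeholder JDBC URL and
props from the read example above (the table names and HDFS output path are
also placeholders; the map over the table list runs on the driver, while
Spark distributes each read and write):

import org.apache.spark.sql.DataFrame

// Read one MySQL table into a DataFrame.
def getTable(tablename: String): DataFrame =
  sqlContext.read.jdbc("jdbc:mysql://mysql-host:3306/mydb", tablename, props)

// Dump a DataFrame to HDFS as Parquet under a per-table directory.
def saveTable(tablename: String, df: DataFrame): Unit =
  df.write.parquet(s"hdfs:///data/mysql_dump/$tablename")

val tables = Seq("table_a", "table_b", "table_c")   // placeholder table list
tables.foreach(t => saveTable(t, getTable(t)))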

On Wed, Mar 22, 2017 at 9:41 AM, Shashank Mandil <mandil.shash...@gmail.com>
wrote:

> I am using Spark to dump data from MySQL into HDFS.
> The way I am doing this is by creating a Spark DataFrame with the metadata
> of the different MySQL tables to dump from multiple MySQL hosts, and then
> running a map over that DataFrame to dump each MySQL table's data into HDFS
> inside the executor.
>
> The reason I want a Spark context is that I would like to use the Spark
> JDBC reader to read the MySQL tables and then the Spark writer to write
> to HDFS.
>
> Thanks,
> Shashank
>
> On Tue, Mar 21, 2017 at 3:37 PM, ayan guha <guha.a...@gmail.com> wrote:
>
>> What is your use case? I am sure there must be a better way to solve
>> it....
>>
>> On Wed, Mar 22, 2017 at 9:34 AM, Shashank Mandil <
>> mandil.shash...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I am using Spark in yarn-cluster mode.
>>> When I run a YARN application it creates multiple executors on the
>>> Hadoop datanodes for processing.
>>>
>>> Is it possible for me to create a local Spark context (master=local) on
>>> these executors to be able to get a Spark context?
>>>
>>> Theoretically, since each executor is a Java process, this should be
>>> doable, isn't it?
>>>
>>> Thanks,
>>> Shashank
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>
>


-- 
Best Regards,
Ayan Guha
