If all you’re doing is dumping tables from SQL Server to HDFS, have you
looked at Sqoop?
Otherwise, if you need to run this in Spark, could you just use the existing
JdbcRDD?
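For context on why JdbcRDD avoids holding a whole table in memory: it takes a query with two ? placeholders plus a lower bound, an upper bound and a number of partitions, and each partition runs the query against its own slice of the key range. Below is a rough, Spark-free sketch of that kind of range split, just to illustrate the idea. The class name KeyRangeSplitter, the id column and the query text are made up for illustration, and it assumes your tables have a numeric key column to range over.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: splits an inclusive key range into per-partition slices,
// the same idea JdbcRDD applies to its lowerBound/upperBound/numPartitions.
final class KeyRangeSplitter {

    /** Returns numPartitions pairs {start, end}, both inclusive, covering [lower, upper]. */
    static List<long[]> split(long lower, long upper, int numPartitions) {
        if (numPartitions <= 0 || upper < lower) {
            throw new IllegalArgumentException("need numPartitions > 0 and upper >= lower");
        }
        // Number of keys in the range. Assumes the range is small enough that
        // (i + 1) * length still fits in a long.
        long length = upper - lower + 1;
        List<long[]> slices = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            long start = lower + (i * length) / numPartitions;
            long end = lower + ((i + 1) * length) / numPartitions - 1;
            // If there are more partitions than keys, some slices have end < start (empty).
            slices.add(new long[] {start, end});
        }
        return slices;
    }

    public static void main(String[] args) {
        // Hypothetical table and key column; each slice becomes one bounded query.
        String sql = "SELECT * FROM dbname.tablename WHERE id >= ? AND id <= ?";
        for (long[] s : split(1, 100, 4)) {
            System.out.println(sql + "  -- bound to " + s[0] + " and " + s[1]);
        }
    }
}
```

With bounds 1 to 100 and four partitions, the slices come out as 1-25, 26-50, 51-75 and 76-100, so each task only ever reads a quarter of the rows through its own result set.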
From: Shushant Arora
Date: Wednesday, July 1, 2015 at 10:19 AM
To: user
Subject: custom RDD in java
Hi
Is it possible to write a custom RDD in Java?
The requirement is: I have a list of SQL Server tables that need to be dumped to
HDFS.
So I have a
List<String> tables = Arrays.asList("dbname.tablename", "dbname.tablename2", ...);
then
JavaRDD<String> rdd = javasparkcontext.parallelize(tables);
JavaRDD<Iterable<String>> tablecontent = rdd.map(new
Function<String, Iterable<String>>() { /* fetch the table and return a populated Iterable */ });
tablecontent.saveAsTextFile("hdfs path");
In rdd.map(new Function<String,>) I cannot keep the complete table content in
memory, so I want to create my own RDD to handle it.
Thanks
Shushant