Ajay,

You can use Sqoop if you want to ingest data into HDFS. This is a POC where the customer wants to prove that Spark ETL would be faster than the C#-based raw SQL statements; that's all. There are no timestamp columns in the source tables to make the load incremental.
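Regarding Ajay's point about overloading SQL Server: when reading over JDBC, Spark lets you cap the number of concurrent queries with the partitionColumn / lowerBound / upperBound / numPartitions options on any numeric key (it does not have to be a timestamp). The snippet below is a simplified, pure-Python mimic of how the JDBC source carves that range into per-partition WHERE clauses (table and column names are made up), just to show that numPartitions is the knob that bounds concurrency against the server:

```python
def jdbc_partition_predicates(partition_column, lower_bound, upper_bound, num_partitions):
    """Simplified mimic of how Spark's JDBC source splits a numeric column
    range into per-partition WHERE clauses. Each predicate becomes one task,
    so at most num_partitions queries hit the database concurrently."""
    stride = (upper_bound - lower_bound) // num_partitions  # integer stride per partition
    predicates = []
    current = lower_bound
    for i in range(num_partitions):
        lower = current
        current += stride
        if i == 0:
            # first partition also sweeps up rows below lowerBound (and NULLs)
            predicates.append(f"{partition_column} < {current} OR {partition_column} IS NULL")
        elif i == num_partitions - 1:
            # last partition sweeps up everything from its lower edge onward
            predicates.append(f"{partition_column} >= {lower}")
        else:
            predicates.append(f"{partition_column} >= {lower} AND {partition_column} < {current}")
    return predicates

# e.g. 4 partitions over ids 1..1,000,000 -> at most 4 concurrent queries
for p in jdbc_partition_predicates("id", 1, 1_000_000, 4):
    print(p)
```

On the write side you can likewise limit concurrent connections with df.coalesce(n) before .write.jdbc(...), and tune rows-per-round-trip with the batchsize JDBC option; that may be enough of a throttle for the POC without bringing in Sqoop.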
On Thu, May 24, 2018 at 1:08 AM, ayan guha <guha.a...@gmail.com> wrote:

> Curious question: what is the reason for using Spark here? Why not a simple
> SQL-based ETL?
>
> On Thu, May 24, 2018 at 5:09 AM, Ajay <ajay.ku...@gmail.com> wrote:
>
>> Do you worry about Spark overloading the SQL server? We have had this
>> issue in the past, where all Spark slaves tend to send lots of data at once
>> to SQL and that slows down the latency of the rest of the system. We
>> overcame this by using Sqoop and running it in a controlled environment.
>>
>> On Wed, May 23, 2018 at 7:32 AM Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>
>>> Super, just giving a high-level idea of what I want to do. I have one
>>> source schema which is MS SQL Server 2008, and the target is also MS SQL
>>> Server 2008. Currently there is a C#-based ETL application which does
>>> extract, transform, and load into the customer-specific schema, including
>>> indexing etc.
>>>
>>> Thanks
>>>
>>> On Wed, May 23, 2018 at 7:11 PM, kedarsdixit <kedarnath_di...@persistent.com> wrote:
>>>
>>>> Yes.
>>>>
>>>> Regards,
>>>> Kedar Dixit
>>>>
>>>> --
>>>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>
>>
>> --
>> Thanks,
>> Ajay
>
> --
> Best Regards,
> Ayan Guha