Ajay, you can use Sqoop if you want to ingest data into HDFS. This is a POC
where the customer wants to prove that Spark ETL would be faster than the
C#-based raw SQL statements, that's all. There are no timestamp-based columns
in the source tables to support an incremental load.
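Since there is no timestamp column, each run will be a full extract. One way to still parallelize the JDBC read is to pass explicit, non-overlapping predicates to `spark.read.jdbc`. A minimal sketch (PySpark; the integer key `id` and the connection names are assumptions, adjust to your actual schema):

```python
def jdbc_range_predicates(key, lower, upper, num_partitions):
    """Build non-overlapping WHERE clauses covering [lower, upper]
    so each Spark task reads one slice of the source table."""
    span = upper - lower + 1
    stride, extra = divmod(span, num_partitions)
    preds, start = [], lower
    for i in range(num_partitions):
        size = stride + (1 if i < extra else 0)  # spread remainder evenly
        end = start + size - 1
        preds.append(f"{key} >= {start} AND {key} <= {end}")
        start = end + 1
    return preds

# Usage with Spark's JDBC source (url/table/props are placeholders):
# preds = jdbc_range_predicates("id", 1, 10_000_000, 16)
# df = spark.read.jdbc(url=jdbc_url, table="dbo.SourceTable",
#                      predicates=preds, properties=conn_props)
```

Each predicate becomes one partition, so you control the read parallelism explicitly instead of relying on `partitionColumn`/`numPartitions` stride inference.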

On Thu, May 24, 2018 at 1:08 AM, ayan guha <guha.a...@gmail.com> wrote:

> Curious question: what is the reason for using Spark here? Why not a
> simple SQL-based ETL?
>
> On Thu, May 24, 2018 at 5:09 AM, Ajay <ajay.ku...@gmail.com> wrote:
>
>> Do you worry about Spark overloading the SQL server? We have had this
>> issue in the past where all the Spark slaves tend to send lots of data at
>> once to SQL, which hurts the latency of the rest of the system. We
>> overcame this by using Sqoop and running it in a controlled environment.
>>
>> On Wed, May 23, 2018 at 7:32 AM Chetan Khatri <
>> chetan.opensou...@gmail.com> wrote:
>>
>>> Super, just to give a high-level idea of what I want to do: I have one
>>> source schema, which is MS SQL Server 2008, and the target is also MS SQL
>>> Server 2008. Currently there is a C#-based ETL application which does the
>>> extract, transform, and load into a customer-specific schema, including
>>> indexing etc.
>>>
>>>
>>> Thanks
>>>
>>> On Wed, May 23, 2018 at 7:11 PM, kedarsdixit <
>>> kedarnath_di...@persistent.com> wrote:
>>>
>>>> Yes.
>>>>
>>>> Regards,
>>>> Kedar Dixit
>>>>
>>>>
>>>>
>>>> --
>>>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>>>>
>>>>
>>>>
>>>
>>
>> --
>> Thanks,
>> Ajay
>>
>
>
>
> --
> Best Regards,
> Ayan Guha
>
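On Ajay's point above about overloading the SQL server: each Spark write partition opens its own JDBC connection, so capping the partition count with `coalesce` and tuning the JDBC `batchsize` option bounds the load on the target. A rough sketch of picking the writer parallelism (the helper function and the thresholds are my own illustration, not from the thread):

```python
import math

def writer_parallelism(total_rows, target_rows_per_conn, max_conns):
    """Cap concurrent JDBC writer connections: enough partitions to
    keep batches a sensible size, but never more than the target
    database can comfortably absorb."""
    needed = math.ceil(total_rows / target_rows_per_conn)
    return max(1, min(max_conns, needed))

# Then throttle the write (url/table/props are placeholders):
# n = writer_parallelism(df.count(), target_rows_per_conn=100_000, max_conns=4)
# (df.coalesce(n)                   # at most n concurrent connections
#    .write.mode("append")
#    .option("batchsize", 5000)     # rows per INSERT batch
#    .jdbc(jdbc_url, "dbo.TargetTable", properties=conn_props))
```

This achieves in Spark roughly what the "controlled environment" around Sqoop did: a hard ceiling on concurrent load against the database.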
