Hi Nitin,
here is my stab at answering the questions:

>       • Does sqoop perform a clean up of the already imported/exported data?

Import writes to a temporary directory, so if the job doesn’t finish, all 
partially imported data gets dropped. On the export side we use a lot of 
smaller transactions, so you will end up with a partial export in case of 
failure. However, we have an option to export through a staging table, which 
is designed to deal with this partial export issue. I would suggest taking a 
look at our user guide [1].
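
To make the staging table option concrete, here is a rough sketch in Python 
that only assembles the command line for such an export. The flags 
--staging-table and --clear-staging-table are the ones described in the user 
guide [1]; the function name, the JDBC URL, the table names and the HDFS path 
are made-up placeholders. The sketch assumes the staging table already exists 
with the same structure as the target table.

```python
import shlex


def build_export_command(connect_url, table, staging_table, export_dir):
    """Assemble a `sqoop export` argument list that goes through a staging table.

    Rows are written to `staging_table` first and moved to `table` only after
    all tasks have succeeded, so a failed job should not leave a partial
    export in the target table.
    """
    return [
        "sqoop", "export",
        "--connect", connect_url,
        "--table", table,
        # Assumption: the staging table was created beforehand and has the
        # same structure as the target table.
        "--staging-table", staging_table,
        # Empty the staging table before the job starts, so rows left over
        # from a previously failed run are not exported again.
        "--clear-staging-table",
        "--export-dir", export_dir,
    ]


if __name__ == "__main__":
    # All values below are placeholders, not taken from a real deployment.
    cmd = build_export_command(
        "jdbc:mysql://db.example.com/sales",
        "orders",
        "orders_stage",
        "/data/orders",
    )
    # Print a copy-pasteable shell command instead of executing it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The sketch prints the command rather than running it, so you can review it 
before pointing it at a real database.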

>       • Does sqoop automatically restart the job in the case of network 
> failure?

There are multiple levels of parallelism and retries. If a task fails, 
Hadoop will by default re-run it 3 times before killing the whole job. 
We’re not restarting the whole job, as we’re assuming that if 3 retries 
didn’t help, there is no point in trying again.
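
If the default number of task attempts is too low for a flaky network, it can 
be raised per job. Below is a rough sketch in Python that only assembles the 
command line for an import with a custom attempt limit. It assumes a Hadoop 2 
cluster, where the relevant property is mapreduce.map.maxattempts (as far as 
I know its default is 4 attempts in total, i.e. the first run plus 3 
re-runs). The function name, the JDBC URL, the table name and the target 
directory are made-up placeholders.

```python
import shlex


def build_import_command(connect_url, table, target_dir, map_attempts=4):
    """Assemble a `sqoop import` argument list with a custom task attempt limit."""
    if map_attempts < 1:
        raise ValueError("map_attempts must be at least 1")
    return [
        "sqoop", "import",
        # Generic Hadoop options such as -D go right after the tool name and
        # before the tool-specific arguments.
        # Assumption: Hadoop 2 property name for the per-task attempt limit.
        "-D", f"mapreduce.map.maxattempts={map_attempts}",
        "--connect", connect_url,
        "--table", table,
        "--target-dir", target_dir,
    ]


if __name__ == "__main__":
    # All values below are placeholders, not taken from a real deployment.
    cmd = build_import_command(
        "jdbc:mysql://db.example.com/sales",
        "orders",
        "/data/orders",
        map_attempts=6,
    )
    # Print a copy-pasteable shell command instead of executing it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Raising the limit only helps with short outages. If the network stays down 
longer than the retries take, the job will still fail and has to be started 
again.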

Jarcec

Links:
1: http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_literal_sqoop_export_literal


> On Jan 24, 2016, at 10:30 PM, Nitin Kumar <[email protected]> wrote:
> 
> 
> I am using apache sqoop 1.4.6 (distributed with HortonWorks HDP 2.3 package) 
> to import and export data between rdbms systems and hdfs. I have to deploy 
> this in a production environment and was wondering about the network 
> resilience of sqoop.
> Say I'm done with about 90% of the import/export job and there is a network 
> failure between the rdbms system and my hadoop cluster. Since sqoop 
> internally executes a map/reduce job for this I'm guessing the job will fail 
> completely and require a manual restart. In this regard I have the following 
> questions
> 
>       • Does sqoop perform a clean up of the already imported/exported data?
>       • Does sqoop automatically restart the job in the case of network 
> failure?
>       • If a manual clean up and restart is required, what other technology 
> alongside sqoop do people generally use to achieve network resilience?
>       • Is there a different version of sqoop that offers this feature?
> Your answers and suggestions would be highly appreciated.
> 
> Thanks!
> 
