Well, what do you do in case of failure?
I think one should use a professional ingestion tool, ideally one that does
not need to reload everything after a failure and that verifies, via
checksums, that the file was transferred correctly.
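
For illustration, here is a minimal sketch of that idea using Apache Commons
Net: it resumes a partial FTP download from the local file's current length
and verifies an MD5 checksum before the file goes anywhere near HDFS. The
host, remote path, and expected checksum are placeholders, not anything
specific to your setup.

  import java.io.File;
  import java.io.FileInputStream;
  import java.io.FileOutputStream;
  import java.io.InputStream;
  import java.io.OutputStream;
  import java.security.MessageDigest;

  import org.apache.commons.net.ftp.FTP;
  import org.apache.commons.net.ftp.FTPClient;

  public class ResumableFtpFetch {
    public static void main(String[] args) throws Exception {
      String host = "ftp.example.com";        // placeholder
      String remotePath = "/data/big.csv.gz"; // placeholder
      String expectedMd5 = args[0];           // checksum published by the provider
      File local = new File("big.csv.gz");

      FTPClient ftp = new FTPClient();
      ftp.connect(host);
      ftp.login("anonymous", "");
      ftp.enterLocalPassiveMode();
      ftp.setFileType(FTP.BINARY_FILE_TYPE);

      // Resume from wherever a previous failed attempt stopped.
      ftp.setRestartOffset(local.exists() ? local.length() : 0);
      try (InputStream in = ftp.retrieveFileStream(remotePath);
           OutputStream out = new FileOutputStream(local, true)) {
        byte[] buf = new byte[64 * 1024];
        int n;
        while ((n = in.read(buf)) != -1) {
          out.write(buf, 0, n);
        }
      }
      ftp.completePendingCommand();
      ftp.disconnect();

      // Verify the whole file before pushing it into HDFS.
      MessageDigest md = MessageDigest.getInstance("MD5");
      try (InputStream in = new FileInputStream(local)) {
        byte[] buf = new byte[64 * 1024];
        int n;
        while ((n = in.read(buf)) != -1) {
          md.update(buf, 0, n);
        }
      }
      StringBuilder hex = new StringBuilder();
      for (byte b : md.digest()) {
        hex.append(String.format("%02x", b));
      }
      if (!hex.toString().equals(expectedMd5)) {
        throw new IllegalStateException("checksum mismatch, retry the transfer");
      }
      System.out.println("checksum OK, safe to hdfs dfs -put");
    }
  }

A real ingestion tool adds retries and scheduling on top, but restart
offsets plus an end-to-end checksum are the two pieces that make a transfer
safe to repeat.
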
I am not sure whether Flume supports FTP, but SSH/SCP should be supported.
You may also check the other Flume sources, or write your own for FTP
(taking the comments above into account). I hope your file is compressed.
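
If you do write your own, a rough skeleton of a custom Flume PollableSource
that streams an FTP file into the channel in fixed-size chunks could look
like the following. The class name, config keys, and chunk size are made up
for the sketch, and real code needs proper error handling and offset
tracking.

  import java.io.IOException;
  import java.io.InputStream;

  import org.apache.commons.net.ftp.FTPClient;
  import org.apache.flume.Context;
  import org.apache.flume.Event;
  import org.apache.flume.EventDeliveryException;
  import org.apache.flume.PollableSource;
  import org.apache.flume.conf.Configurable;
  import org.apache.flume.event.EventBuilder;
  import org.apache.flume.source.AbstractSource;

  public class FtpFileSource extends AbstractSource
      implements Configurable, PollableSource {

    private String host;
    private String path;
    private FTPClient ftp;
    private InputStream in;
    private final byte[] buffer = new byte[64 * 1024]; // one event per 64 KB chunk

    @Override
    public void configure(Context context) {
      // Config keys are invented for this sketch.
      host = context.getString("ftp.host");
      path = context.getString("ftp.path");
    }

    @Override
    public synchronized void start() {
      try {
        ftp = new FTPClient();
        ftp.connect(host);
        ftp.login("anonymous", "");
        ftp.enterLocalPassiveMode();
        in = ftp.retrieveFileStream(path);
      } catch (IOException e) {
        throw new RuntimeException("FTP connect failed", e);
      }
      super.start();
    }

    @Override
    public Status process() throws EventDeliveryException {
      try {
        int n = in.read(buffer);
        if (n < 0) {
          return Status.BACKOFF; // end of file; a real source would stop here
        }
        byte[] body = new byte[n];
        System.arraycopy(buffer, 0, body, 0, n);
        Event event = EventBuilder.withBody(body);
        getChannelProcessor().processEvent(event);
        return Status.READY;
      } catch (IOException e) {
        return Status.BACKOFF; // a real source would track the offset and resume
      }
    }

    @Override
    public synchronized void stop() {
      try {
        if (in != null) in.close();
        if (ftp != null) ftp.disconnect();
      } catch (IOException ignored) {
        // best effort on shutdown
      }
      super.stop();
    }
  }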

On Fri, Aug 14, 2015 at 10:23 PM, Marcelo Vanzin <van...@cloudera.com> wrote:

> Why do you need to use Spark or Flume for this?
>
> You can just use curl and hdfs:
>
>   curl ftp://blah | hdfs dfs -put - /blah
>
>
> On Fri, Aug 14, 2015 at 1:15 PM, Varadhan, Jawahar <
> varad...@yahoo.com.invalid> wrote:
>
>> What is the best way to bring such a huge file from an FTP server into
>> Hadoop to persist it in HDFS? Since a single JVM process might run out of
>> memory, I was wondering if I can use Spark or Flume to do this. Any help on
>> this matter is appreciated.
>>
>> I would prefer an application/process running inside Hadoop that does this
>> transfer.
>>
>> Thanks.
>>
>
>
>
> --
> Marcelo
>
