Re: DistCP - Spark-based

2014-09-11 Thread Nicholas Chammas
I've created SPARK-3499 to track creating a Spark-based distcp utility.

Nick

On Tue, Aug 12, 2014 at 4:20 PM, Matei Zaharia wrote:
> Good question; I don't know of one but I believe people at Cloudera had
> some thoughts of porting Sqoop to Spa

Re: DistCP - Spark-based

2014-08-12 Thread Matei Zaharia
Good question; I don't know of one but I believe people at Cloudera had some thoughts of porting Sqoop to Spark in the future, and maybe they'd consider DistCP as part of this effort. I agree it's missing right now.

Matei

On August 12, 2014 at 11:04:28 AM, Gary Malouf (malouf.g...@gmail.com) wr

DistCP - Spark-based

2014-08-12 Thread Gary Malouf
We are probably still in the minority, but our analytics platform, based on Spark + HDFS, does not have MapReduce installed. I'm wondering if there is a distcp equivalent that leverages Spark to do the work. Our team is trying to find the best way to do cross-datacenter replication of our HDFS data t