Re: Re: EXT: Dual Write to HDFS and MinIO in faster way

2024-05-30 Thread Subhasis Mukherjee
From: Gera Shegalov Sent: Wednesday, May 29, 2024 7:57:56 am To: Prem Sahoo Cc: eab...@163.com ; Vibhor Gupta ; user @spark Subject: Re: Re: EXT: Dual Write to HDFS and MinIO in faster way I agree with the previous answers that (if requirements allow it

Re: Re: EXT: Dual Write to HDFS and MinIO in faster way

2024-05-28 Thread Gera Shegalov
Tue, May 21, 2024 at 9:15 PM eab...@163.com wrote: > >> Hi, >> I think you should write to HDFS then copy the files (parquet or orc) >> from HDFS to MinIO. >> >> -- >> eabour >> >> >> *From:* Prem Sahoo >>
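
A minimal sketch of that copy step in Scala, assuming MinIO is exposed through Hadoop's S3A connector; the endpoint, credentials, and paths below are placeholders rather than values from this thread:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileUtil, Path}

    // Placeholder endpoint/credentials for a MinIO deployment behind s3a.
    val conf = new Configuration()
    conf.set("fs.s3a.endpoint", "http://minio.example.com:9000")
    conf.set("fs.s3a.access.key", "ACCESS_KEY")
    conf.set("fs.s3a.secret.key", "SECRET_KEY")
    conf.set("fs.s3a.path.style.access", "true")

    val src = new Path("hdfs://namenode:8020/warehouse/table1")
    val dst = new Path("s3a://bucket/warehouse/table1")

    // Copy the committed HDFS output to MinIO, keeping the HDFS copy.
    FileUtil.copy(src.getFileSystem(conf), src,
                  dst.getFileSystem(conf), dst,
                  false /* deleteSource */, conf)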

Re: Re: EXT: Dual Write to HDFS and MinIO in faster way

2024-05-21 Thread Prem Sahoo
I am looking for a writer/committer optimization that can make the Spark write faster. On Tue, May 21, 2024 at 9:15 PM eab...@163.com wrote: > Hi, > I think you should write to HDFS then copy the files (parquet or orc) from > HDFS to MinIO. > > -- >
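
On the committer side, a hedged sketch of the S3A "magic" committer settings from the Spark cloud-integration docs; it assumes a Hadoop 3.x S3A client and the spark-hadoop-cloud module on the classpath, and whether it speeds up a MinIO write depends on the deployment:

    import org.apache.spark.sql.SparkSession

    // Committer settings per the Spark cloud-integration docs; treat this
    // as a sketch, not a confirmed fix for this particular cluster.
    val spark = SparkSession.builder()
      .appName("s3a-committer-sketch")
      .config("spark.hadoop.fs.s3a.committer.name", "magic")
      .config("spark.hadoop.fs.s3a.committer.magic.enabled", "true")
      .config("spark.sql.sources.commitProtocolClass",
        "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
      .config("spark.sql.parquet.output.committer.class",
        "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
      .getOrCreate()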

Re: Re: EXT: Dual Write to HDFS and MinIO in faster way

2024-05-21 Thread eab...@163.com
Hi, I think you should write to HDFS then copy the files (parquet or orc) from HDFS to MinIO. eabour From: Prem Sahoo Date: 2024-05-22 00:38 To: Vibhor Gupta; user Subject: Re: EXT: Dual Write to HDFS and MinIO in faster way On Tue, May 21, 2024 at 6:58 AM Prem Sahoo wrote: Hello Vibhor

Re: EXT: Dual Write to HDFS and MinIO in faster way

2024-05-21 Thread Prem Sahoo
help me in scenario 2? > How to make Spark write to MinIO faster? > Sent from my iPhone > > On May 21, 2024, at 1:18 AM, Vibhor Gupta > wrote: > > Hi Prem, > > You can try to write to HDFS then read from HDFS and write to MinIO. > >
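
A minimal sketch of that staged approach, assuming df is the DataFrame being persisted and that s3a is already configured for the MinIO endpoint; both paths are placeholders:

    // Land on HDFS first (fast), then re-read and push to MinIO (slower).
    val hdfsPath  = "hdfs://namenode:8020/tmp/staging/table1"
    val minioPath = "s3a://bucket/warehouse/table1"

    df.write.mode("overwrite").parquet(hdfsPath)
    spark.read.parquet(hdfsPath)
      .write.mode("overwrite").parquet(minioPath)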

AWS EMR slow write to HDFS

2019-06-11 Thread Femi Anthony
I'm writing a large dataset in Parquet format to HDFS using Spark, and it runs rather slowly on EMR vs., say, Databricks. I realize that if I were able to use Hadoop 3.1, it would be much more performant because it has a high-performance output committer. Is this the case, and if so - when will the
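
Two committer levers commonly cited for this era, shown as a sketch rather than a confirmed fix; the first is the standard Hadoop FileOutputCommitter v2 algorithm, the second is the EMRFS S3-optimized committer toggle that EMR 5.19+ documents for Parquet output:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      // v2 commits task output with direct renames, skipping the slow
      // serial rename pass that v1 performs at job commit.
      .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
      // EMR-specific optimized committer for Parquet writes via EMRFS.
      .config("spark.sql.parquet.fs.optimized.committer.optimization-enabled", "true")
      .getOrCreate()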

Re: Write to HDFS

2017-10-20 Thread Deepak Sharma
Better use coalesce instead of repartition On Fri, Oct 20, 2017 at 9:47 PM, Marco Mistroni wrote: > Use counts.repartition(1).save.. > Hth > > > On Oct 20, 2017 3:01 PM, "Uğur Sopaoğlu" wrote: > > Actually, when I run the following code, > > val textFile = sc.textFile("Sample.txt") > val co
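
Putting the replies together, a sketch of the full word count with a single output file; coalesce(1) narrows to one partition without the shuffle that repartition(1) triggers, at the cost of funnelling the write through one task:

    val counts = sc.textFile("Sample.txt")
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // One partition in, one part file out.
    counts.coalesce(1).saveAsTextFile("hdfs://master:8020/user/abc")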

Re: Write to HDFS

2017-10-20 Thread Marco Mistroni
Use counts.repartition(1).save.. Hth On Oct 20, 2017 3:01 PM, "Uğur Sopaoğlu" wrote: Actually, when I run the following code, val textFile = sc.textFile("Sample.txt") val counts = textFile.flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByK

Re: Write to HDFS

2017-10-20 Thread Uğur Sopaoğlu
Actually, when I run the following code, val textFile = sc.textFile("Sample.txt") val counts = textFile.flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByKey(_ + _) It saves the results into more than one partition, like part-0, part-1. I w

Re: Write to HDFS

2017-10-20 Thread Marco Mistroni
Hi, Could you just create an rdd/df out of what you want to save and store it in hdfs? Hth On Oct 20, 2017 9:44 AM, "Uğur Sopaoğlu" wrote: > Hi all, > > In the word count example, > > val textFile = sc.textFile("Sample.txt") > val counts = textFile.flatMap(line => line.split(" ")) >
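
A sketch of that suggestion, assuming the values to save already sit in a local collection on the driver; the sample data is hypothetical:

    // Hypothetical data standing in for "what you want to save".
    val pairs = Seq(("spark", 3), ("hdfs", 2))

    // One slice yields one part file on HDFS.
    sc.parallelize(pairs, 1).saveAsTextFile("hdfs://master:8020/user/abc")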

Write to HDFS

2017-10-20 Thread Uğur Sopaoğlu
Hi all, In the word count example, val textFile = sc.textFile("Sample.txt") val counts = textFile.flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByKey(_ + _) counts.saveAsTextFile("hdfs://master:8020/user/abc") I want to write collection of "*c

Slow Parquet write to HDFS using Spark

2016-11-03 Thread morfious902002
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Slow-Parquet-write-to-HDFS-using-Spark-tp28011.html

Re: Looking for the method executors uses to write to HDFS

2015-11-06 Thread Reynold Xin
driver. I could use the HDFS API > but I am worried that it won't work on a secure cluster. I assume that the > method the executors use to write to HDFS takes care of managing Hadoop > security. However, I can't find the place where the HDFS write happens in the > Spark source.

Looking for the method executors uses to write to HDFS

2015-11-04 Thread Tóth Zoltán
Hi, I'd like to write a parquet file from the driver. I could use the HDFS API but I am worried that it won't work on a secure cluster. I assume that the method the executors use to write to HDFS takes care of managing Hadoop security. However, I can't find the place where HDFS w
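
A minimal sketch of the HDFS API route from the driver, assuming the job was submitted with valid Kerberos credentials; sc.hadoopConfiguration carries the same security setup the executors inherit, so a FileSystem built from it should authenticate the same way. The path is a placeholder, and for the Parquet case a ParquetWriter would sit on top of this same configuration:

    import org.apache.hadoop.fs.{FileSystem, Path}

    // Reuse the driver's Hadoop configuration, and with it the credentials.
    val fs = FileSystem.get(sc.hadoopConfiguration)
    val out = fs.create(new Path("/user/abc/from-driver.txt"))
    try out.write("written from the driver".getBytes("UTF-8"))
    finally out.close()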

Re: Removing empty partitions before we write to HDFS

2015-08-06 Thread Richard Marscher
to delete empty files from the write path. On Thu, Aug 6, 2015 at 3:33 PM, Patanachai Tangchaisin wrote: > Currently, I use rdd.isEmpty() > > Thanks, > Patanachai > > > > On 08/06/2015 12:02 PM, gpatcham wrote: > >> Is there a way to filter out empty partitions

Re: Removing empty partitions before we write to HDFS

2015-08-06 Thread Patanachai Tangchaisin
Currently, I use rdd.isEmpty() Thanks, Patanachai On 08/06/2015 12:02 PM, gpatcham wrote: Is there a way to filter out empty partitions before I write to HDFS other than using repartition and coalesce? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com
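
Combining the two ideas in the thread into one sketch, assuming rdd is the RDD about to be written: skip the write when there is nothing to save, and coalesce so empty partitions get merged into populated ones instead of becoming empty part files. The target partition count is an assumption:

    if (!rdd.isEmpty()) {
      // 8 is an arbitrary target; size it to the expected data volume.
      rdd.coalesce(8).saveAsTextFile("hdfs://namenode:8020/user/abc/out")
    }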

Removing empty partitions before we write to HDFS

2015-08-06 Thread gpatcham
Is there a way to filter out empty partitions before I write to HDFS other than using repartition and coalesce? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Removing-empty-partitions-before-we-write-to-HDFS-tp24156.html