Re: Re: EXT: Dual Write to HDFS and MinIO in faster way

2024-05-30 Thread Subhasis Mukherjee
From: Gera Shegalov Sent: Wednesday, May 29, 2024 7:57:56 am To: Prem Sahoo Cc: eab...@163.com ; Vibhor Gupta ; user @spark Subject: Re: Re: EXT: Dual Write to HDFS and MinIO in faster way I agree with the previous answers that (if requirements allow it

Re: Re: EXT: Dual Write to HDFS and MinIO in faster way

2024-05-28 Thread Gera Shegalov
Tue, May 21, 2024 at 9:15 PM eab...@163.com wrote: > >> Hi, >> I think you should write to HDFS then copy the files (parquet or orc) >> from HDFS to MinIO. >> >> -- >> eabour >> >> >> *From:* Prem Sahoo >>
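
A minimal sketch of that copy step in Scala, assuming MinIO is exposed through Hadoop's S3A connector; the endpoint, credentials, and paths below are placeholders rather than values from this thread:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileUtil, Path}

    // Placeholder endpoint/credentials for a MinIO deployment behind s3a.
    val conf = new Configuration()
    conf.set("fs.s3a.endpoint", "http://minio.example.com:9000")
    conf.set("fs.s3a.access.key", "ACCESS_KEY")
    conf.set("fs.s3a.secret.key", "SECRET_KEY")
    conf.set("fs.s3a.path.style.access", "true")

    val src = new Path("hdfs://namenode:8020/warehouse/table1")
    val dst = new Path("s3a://bucket/warehouse/table1")

    // Copy the committed HDFS output to MinIO, keeping the HDFS copy.
    FileUtil.copy(src.getFileSystem(conf), src,
                  dst.getFileSystem(conf), dst,
                  false /* deleteSource */, conf)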

Re: Re: EXT: Dual Write to HDFS and MinIO in faster way

2024-05-21 Thread Prem Sahoo
I am looking for a writer/committer optimization that can make the Spark write faster. On Tue, May 21, 2024 at 9:15 PM eab...@163.com wrote: > Hi, > I think you should write to HDFS then copy the files (parquet or orc) from > HDFS to MinIO. > > -- >
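
On the committer side, a hedged sketch of the S3A "magic" committer settings from the Spark cloud-integration docs; it assumes a Hadoop 3.x S3A client and the spark-hadoop-cloud module on the classpath, and whether it speeds up a MinIO write depends on the deployment:

    import org.apache.spark.sql.SparkSession

    // Committer settings per the Spark cloud-integration docs; treat this
    // as a sketch, not a confirmed fix for this particular cluster.
    val spark = SparkSession.builder()
      .appName("s3a-committer-sketch")
      .config("spark.hadoop.fs.s3a.committer.name", "magic")
      .config("spark.hadoop.fs.s3a.committer.magic.enabled", "true")
      .config("spark.sql.sources.commitProtocolClass",
        "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
      .config("spark.sql.parquet.output.committer.class",
        "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
      .getOrCreate()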

Re: Re: EXT: Dual Write to HDFS and MinIO in faster way

2024-05-21 Thread eab...@163.com
Hi, I think you should write to HDFS then copy the files (parquet or orc) from HDFS to MinIO. eabour From: Prem Sahoo Date: 2024-05-22 00:38 To: Vibhor Gupta; user Subject: Re: EXT: Dual Write to HDFS and MinIO in faster way On Tue, May 21, 2024 at 6:58 AM Prem Sahoo wrote: Hello Vibhor

Re: EXT: Dual Write to HDFS and MinIO in faster way

2024-05-21 Thread Prem Sahoo
help me in scenario 2? > How to make Spark write to MinIO faster? > Sent from my iPhone > > On May 21, 2024, at 1:18 AM, Vibhor Gupta > wrote: > > Hi Prem, > > You can try to write to HDFS then read from HDFS and write to MinIO. > >
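
A minimal sketch of that staged approach, assuming df is the DataFrame being persisted and that s3a is already configured for the MinIO endpoint; both paths are placeholders:

    // Land on HDFS first (fast), then re-read and push to MinIO (slower).
    val hdfsPath  = "hdfs://namenode:8020/tmp/staging/table1"
    val minioPath = "s3a://bucket/warehouse/table1"

    df.write.mode("overwrite").parquet(hdfsPath)
    spark.read.parquet(hdfsPath)
      .write.mode("overwrite").parquet(minioPath)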

AWS EMR slow write to HDFS

2019-06-11 Thread Femi Anthony
I'm writing a large dataset in Parquet format to HDFS using Spark, and it runs rather slowly on EMR vs., say, Databricks. I realize that if I were able to use Hadoop 3.1, it would be much more performant because it has a high-performance output committer. Is this the case, and if so - when will the
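
Two committer levers commonly cited for this era, shown as a sketch rather than a confirmed fix; the first is the standard Hadoop FileOutputCommitter v2 algorithm, the second is the EMRFS S3-optimized committer toggle that EMR 5.19+ documents for Parquet output:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      // v2 commits task output with direct renames, skipping the slow
      // serial rename pass that v1 performs at job commit.
      .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
      // EMR-specific optimized committer for Parquet writes via EMRFS.
      .config("spark.sql.parquet.fs.optimized.committer.optimization-enabled", "true")
      .getOrCreate()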

Re: Write to HDFS

2017-10-20 Thread Deepak Sharma
Better use coalesce instead of repartition On Fri, Oct 20, 2017 at 9:47 PM, Marco Mistroni wrote: > Use counts.repartition(1).save.. > Hth > > > On Oct 20, 2017 3:01 PM, "Uğur Sopaoğlu" wrote: > > Actually, when I run the following code, > > val textFile = sc.textFile("Sample.txt") > val co
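
Putting the replies together, a sketch of the full word count with a single output file; coalesce(1) narrows to one partition without the shuffle that repartition(1) triggers, at the cost of funnelling the write through one task:

    val counts = sc.textFile("Sample.txt")
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // One partition in, one part file out.
    counts.coalesce(1).saveAsTextFile("hdfs://master:8020/user/abc")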

Re: Write to HDFS

2017-10-20 Thread Marco Mistroni
Use counts.repartition(1).save.. Hth On Oct 20, 2017 3:01 PM, "Uğur Sopaoğlu" wrote: Actually, when I run the following code, val textFile = sc.textFile("Sample.txt") val counts = textFile.flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByK

Re: Write to HDFS

2017-10-20 Thread Uğur Sopaoğlu
Actually, when I run the following code, val textFile = sc.textFile("Sample.txt") val counts = textFile.flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByKey(_ + _) It saves the results into more than one partition, like part-0, part-1. I w

Re: Write to HDFS

2017-10-20 Thread Marco Mistroni
Hi, Could you just create an rdd/df out of what you want to save and store it in hdfs? Hth On Oct 20, 2017 9:44 AM, "Uğur Sopaoğlu" wrote: > Hi all, > > In the word count example, > > val textFile = sc.textFile("Sample.txt") > val counts = textFile.flatMap(line => line.split(" ")) >
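
A sketch of that suggestion, assuming the values to save already sit in a local collection on the driver; the sample data is hypothetical:

    // Hypothetical data standing in for "what you want to save".
    val pairs = Seq(("spark", 3), ("hdfs", 2))

    // One slice yields one part file on HDFS.
    sc.parallelize(pairs, 1).saveAsTextFile("hdfs://master:8020/user/abc")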

Write to HDFS

2017-10-20 Thread Uğur Sopaoğlu
Hi all, In the word count example, val textFile = sc.textFile("Sample.txt") val counts = textFile.flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByKey(_ + _) counts.saveAsTextFile("hdfs://master:8020/user/abc") I want to write collection of "*c

Slow Parquet write to HDFS using Spark

2016-11-03 Thread morfious902002
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Slow-Parquet-write-to-HDFS-using-Spark-tp28011.html

Re: Looking for the method executors uses to write to HDFS

2015-11-06 Thread Reynold Xin
driver. I could use the HDFS API > but I am worried that it won't work on a secure cluster. I assume that the > method the executors use to write to HDFS takes care of managing Hadoop > security. However, I can't find the place where the HDFS write happens in the > Spark source.

Looking for the method executors uses to write to HDFS

2015-11-04 Thread Tóth Zoltán
Hi, I'd like to write a parquet file from the driver. I could use the HDFS API but I am worried that it won't work on a secure cluster. I assume that the method the executors use to write to HDFS takes care of managing Hadoop security. However, I can't find the place where HDFS w
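
A minimal sketch of the HDFS API route from the driver, assuming the job was submitted with valid Kerberos credentials; sc.hadoopConfiguration carries the same security setup the executors inherit, so a FileSystem built from it should authenticate the same way. The path is a placeholder, and for the Parquet case a ParquetWriter would sit on top of this same configuration:

    import org.apache.hadoop.fs.{FileSystem, Path}

    // Reuse the driver's Hadoop configuration, and with it the credentials.
    val fs = FileSystem.get(sc.hadoopConfiguration)
    val out = fs.create(new Path("/user/abc/from-driver.txt"))
    try out.write("written from the driver".getBytes("UTF-8"))
    finally out.close()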

Re: Removing empty partitions before we write to HDFS

2015-08-06 Thread Richard Marscher
to delete empty files from the write path. On Thu, Aug 6, 2015 at 3:33 PM, Patanachai Tangchaisin wrote: > Currently, I use rdd.isEmpty() > > Thanks, > Patanachai > > > > On 08/06/2015 12:02 PM, gpatcham wrote: > >> Is there a way to filter out empty partitions

Re: Removing empty partitions before we write to HDFS

2015-08-06 Thread Patanachai Tangchaisin
Currently, I use rdd.isEmpty() Thanks, Patanachai On 08/06/2015 12:02 PM, gpatcham wrote: Is there a way to filter out empty partitions before I write to HDFS other than using repartition and coalesce? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com
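
Combining the two ideas in the thread into one sketch, assuming rdd is the RDD about to be written: skip the write when there is nothing to save, and coalesce so empty partitions get merged into populated ones instead of becoming empty part files. The target partition count is an assumption:

    if (!rdd.isEmpty()) {
      // 8 is an arbitrary target; size it to the expected data volume.
      rdd.coalesce(8).saveAsTextFile("hdfs://namenode:8020/user/abc/out")
    }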

Removing empty partitions before we write to HDFS

2015-08-06 Thread gpatcham
Is there a way to filter out empty partitions before I write to HDFS other than using repartition and coalesce? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Removing-empty-partitions-before-we-write-to-HDFS-tp24156.html