From: Gera Shegalov
Sent: Wednesday, May 29, 2024 7:57:56 am
To: Prem Sahoo
Cc: eab...@163.com ; Vibhor Gupta ;
user@spark
Subject: Re: Re: EXT: Dual Write to HDFS and MinIO in faster way
I agree with the previous answers that (if requirements allow it
Tue, May 21, 2024 at 9:15 PM eab...@163.com wrote:
>
>> Hi,
>> I think you should write to HDFS, then copy the files (Parquet or ORC)
>> from HDFS to MinIO.
>>
>> --
>> eabour
>>
>>
>> *From:* Prem Sahoo
>>
I am looking for a writer/committer optimization which can make the Spark
write faster.
On Tue, May 21, 2024 at 9:15 PM eab...@163.com wrote:
> Hi,
> I think you should write to HDFS, then copy the files (Parquet or ORC) from
> HDFS to MinIO.
>
> --
>
Hi,
I think you should write to HDFS, then copy the files (Parquet or ORC) from HDFS
to MinIO.
eabour
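A minimal sketch of the copy step eabour describes, assuming the Parquet output has already been committed to HDFS; the paths, MinIO endpoint, and credential settings below are placeholders, and a hadoop distcp job would do the same thing for larger datasets:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileUtil, Path}

// Hypothetical paths and endpoint; credentials (fs.s3a.access.key / fs.s3a.secret.key)
// for the MinIO bucket must also be configured.
val conf = new Configuration()
conf.set("fs.s3a.endpoint", "http://minio.example.com:9000")
conf.set("fs.s3a.path.style.access", "true")

val src = new Path("hdfs:///data/output/my_table")
val dst = new Path("s3a://my-bucket/my_table")

// Copy the already-written Parquet files from HDFS to MinIO.
FileUtil.copy(src.getFileSystem(conf), src, dst.getFileSystem(conf), dst,
  /* deleteSource = */ false, conf)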
From: Prem Sahoo
Date: 2024-05-22 00:38
To: Vibhor Gupta; user
Subject: Re: EXT: Dual Write to HDFS and MinIO in faster way
On Tue, May 21, 2024 at 6:58 AM Prem Sahoo wrote:
Hello Vibhor,
Can you help me with scenario 2?
> How to make Spark write to MinIO faster?
> Sent from my iPhone
>
> On May 21, 2024, at 1:18 AM, Vibhor Gupta
> wrote:
>
>
>
> Hi Prem,
>
>
>
> You can try to write to HDFS then read from HDFS and write to MinIO.
>
>
>
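A rough sketch of Vibhor's suggestion, assuming an existing SparkSession spark, a DataFrame df, and an s3a configuration already pointed at the MinIO endpoint; the paths are made up for illustration:

// First write once to HDFS, where the write is fastest.
df.write.mode("overwrite").parquet("hdfs:///data/output/my_table")

// Then re-read the committed files and write them out to MinIO over s3a.
spark.read.parquet("hdfs:///data/output/my_table")
  .write.mode("overwrite").parquet("s3a://my-bucket/my_table")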
I'm writing a large dataset in Parquet format to HDFS using Spark and it runs
rather slowly on EMR vs., say, Databricks. I realize that if I were able to use
Hadoop 3.1, it would be much more performant because it has a high-performance
output committer. Is this the case, and if so - when will the
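For the HDFS-only part of this question, a commonly tuned knob is the FileOutputCommitter algorithm version; a sketch follows, with settings that should be verified against your Hadoop and Spark versions (for S3A targets, the dedicated "magic"/"staging" committers shipped with Hadoop 3.1+ and the spark-hadoop-cloud module are the usual answer):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("parquet-write")
  // Algorithm v2 commits task output directly into the destination directory,
  // avoiding the serial rename pass during job commit.
  .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
  // Skip the _SUCCESS marker if nothing downstream depends on it.
  .config("spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")
  .getOrCreate()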
Better use coalesce instead of repartition.
On Fri, Oct 20, 2017 at 9:47 PM, Marco Mistroni wrote:
> Use counts.repartition(1).save..
> Hth
>
>
> On Oct 20, 2017 3:01 PM, "Uğur Sopaoğlu" wrote:
>
> Actually, when I run the following code,
>
> val textFile = sc.textFile("Sample.txt")
> val co
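For a single output file, a sketch based on the quoted word-count example (assuming the counts RDD and output path from that thread); coalesce(1) avoids the full shuffle that repartition(1) triggers:

// coalesce(1) merges the existing partitions without a shuffle,
// so everything is written as a single part file.
counts.coalesce(1).saveAsTextFile("hdfs://master:8020/user/abc")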
Use counts.repartition(1).save..
Hth
On Oct 20, 2017 3:01 PM, "Uğur Sopaoğlu" wrote:
Actually, when I run the following code,
val textFile = sc.textFile("Sample.txt")
val counts = textFile.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByK
Actually, when I run the following code,
val textFile = sc.textFile("Sample.txt")
val counts = textFile.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByKey(_ + _)
It saves the results into more than one partition, like part-0,
part-1. I w
Hi
Could you just create an RDD/DataFrame out of what you want to save and store it
in HDFS?
Hth
On Oct 20, 2017 9:44 AM, "Uğur Sopaoğlu" wrote:
> Hi all,
>
> In word count example,
>
> val textFile = sc.textFile("Sample.txt")
> val counts = textFile.flatMap(line => line.split(" "))
>
Hi all,
In word count example,
val textFile = sc.textFile("Sample.txt")
val counts = textFile.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://master:8020/user/abc")
I want to write collection of "*c
Hi,
I'd like to write a parquet file from the driver. I could use the HDFS API
but I am worried that it won't work on a secure cluster. I assume that the
method the executors use to write to HDFS takes care of managing Hadoop
security. However, I can't find the place where HDFS w
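One common workaround, rather than calling the HDFS API from the driver directly, is to put the small driver-side collection into a one-partition DataFrame and let Spark's normal write path handle the Kerberos/delegation-token plumbing; a sketch with made-up column names and output path:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("driver-side-write").getOrCreate()
import spark.implicits._

// Small dataset built on the driver; Spark performs the secured HDFS write.
val summary = Seq(("jobA", 42L), ("jobB", 7L)).toDF("job", "rows")
summary.coalesce(1).write.mode("overwrite").parquet("hdfs:///user/abc/summary")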
o delete empty files from the
write path.
On Thu, Aug 6, 2015 at 3:33 PM, Patanachai Tangchaisin
wrote:
> Currently, I use rdd.isEmpty()
>
> Thanks,
> Patanachai
>
>
>
> On 08/06/2015 12:02 PM, gpatcham wrote:
>
>> Is there a way to filter out empty partitions
Currently, I use rdd.isEmpty()
Thanks,
Patanachai
On 08/06/2015 12:02 PM, gpatcham wrote:
Is there a way to filter out empty partitions before I write to HDFS other
than using repartition and coalesce?
Is there a way to filter out empty partitions before I write to HDFS other
than using repartition and coalesce?
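If the empty partitions cannot easily be filtered out before the write, the "delete empty files from the write path" idea mentioned above can be sketched with the Hadoop FileSystem API (the output path is hypothetical and sc is the running SparkContext):

import org.apache.hadoop.fs.Path

val outputPath = new Path("hdfs://master:8020/user/abc/output")
val fs = outputPath.getFileSystem(sc.hadoopConfiguration)

// Remove zero-length part files left behind by empty partitions.
fs.listStatus(outputPath)
  .filter(s => s.isFile && s.getLen == 0 && s.getPath.getName.startsWith("part-"))
  .foreach(s => fs.delete(s.getPath, false))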