[Spark Core] saveAsTextFile is unable to rename a directory using hadoop-azure NativeAzureFileSystem

2021-09-13 Thread Abhishek Jindal
Hello, I am trying to use the Spark rdd.saveAsTextFile function which calls the FileSystem.rename() under the hood. This errors out with “com.microsoft.azure.storage.StorageException: One of the request inputs is not valid” when using hadoop-azure NativeAzureFileSystem. I have written a small test

Spark saveAsTextFile Disk Recommendation

2021-03-21 Thread ranju goel
just checked the 2nd argument of saveAsTextFile and I believe read and write will be faster on disk after use of compression. I will try this. So I think there is no special requirement on type of disk for execution of saveAsTextFile as they are local I/O operations. Regards Ranju

RE: Spark saveAsTextFile Disk Recommendation

2021-03-21 Thread Ranju Jain
csv file. This script runs on every node and later they all combine to single file. On the other hand is your data really just a collection of strings without any repetitions [Ranju]: Yes It is comma separated string. And I just checked the 2nd argument of saveAsTextFile and I believe read and

Re: Spark saveAsTextFile Disk Recommendation

2021-03-20 Thread Attila Zsolt Piros
Hi! I would like to reflect only to the first part of your mail: I have a large RDD dataset of around 60-70 GB which I cannot send to driver > using *collect* so first writing that to disk using *saveAsTextFile* and > then this data gets saved in the form of multiple part files on eac

Spark saveAsTextFile Disk Recommendation

2021-03-20 Thread Ranju Jain
Hi All, I have a large RDD dataset of around 60-70 GB which I cannot send to driver using collect so first writing that to disk using saveAsTextFile and then this data gets saved in the form of multiple part files on each node of the cluster and after that driver reads the data from that

Re: How to improve performance of saveAsTextFile()

2017-03-11 Thread Yan Facai
How about increasing the RDD's partitions / rebalancing the data? On Sat, Mar 11, 2017 at 2:33 PM, Parsian, Mahmoud wrote: > How to improve performance of JavaRDD.saveAsTextFile(“hdfs://…“). > This is taking over 30 minutes on a cluster of 10 nodes. > Running Spark on YARN. > > JavaRDD has 120 million e
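The suggestion above (more partitions / rebalancing) can be sketched in Scala. The partition-count heuristic, sizes, and paths below are illustrative assumptions, not from the thread; the `saveAsTextFile` call itself is left in a comment because it needs a live SparkContext.

```scala
// Heuristic (an assumption): size each write task at roughly one 128 MB
// HDFS block, so the save is spread across many moderately sized tasks.
def partitionsFor(dataSizeBytes: Long,
                  targetPartitionBytes: Long = 128L * 1024 * 1024): Int =
  math.max(1L, (dataSizeBytes + targetPartitionBytes - 1) / targetPartitionBytes).toInt

val n = partitionsFor(70L * 1024 * 1024 * 1024)  // e.g. ~70 GB of text
// rdd.repartition(n).saveAsTextFile("hdfs://namenode:8020/out")  // needs a SparkContext
println(n)
```

With 120 million records, repartitioning before the save spreads the write across many tasks instead of a few oversized ones.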

How to improve performance of saveAsTextFile()

2017-03-10 Thread Parsian, Mahmoud
How to improve performance of JavaRDD.saveAsTextFile(“hdfs://…“). This is taking over 30 minutes on a cluster of 10 nodes. Running Spark on YARN. JavaRDD has 120 million entries. Thank you, Best regards, Mahmoud

Re: Writing/Saving RDD to HDFS using saveAsTextFile

2016-10-07 Thread Deepak Sharma
Hi Mahendra Did you try mapping the X case class members further to a String object and then saving the RDD[String]? Thanks Deepak On Oct 7, 2016 23:04, "Mahendra Kutare" wrote: > Hi, > > I am facing issue with writing RDD[X] to HDFS file path. X is a simple > case class with variable time
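The reply's suggestion, sketched under assumptions: the field names of `X` and the comma delimiter are hypothetical stand-ins (the thread only says `X` has a primitive long `time`), and the Spark call is commented because it needs a SparkContext.

```scala
// Hypothetical record type; only `time: Long` is mentioned in the thread.
case class X(time: Long, value: String)

// Render each record to a plain text line first, then save an RDD[String]
// instead of serializing the case class itself.
def toLine(x: X): String = s"${x.time},${x.value}"   // delimiter is an assumption

// rdd.map(toLine).saveAsTextFile("hdfs://namenode:8020/out")  // needs a SparkContext
println(toLine(X(1L, "a")))   // 1,a
```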

Writing/Saving RDD to HDFS using saveAsTextFile

2016-10-07 Thread Mahendra Kutare
Hi, I am facing issue with writing RDD[X] to HDFS file path. X is a simple case class with variable time as primitive long. When I run the driver program with - master as spark://:7077 I get this - Caused by: java.io.EOFException at java.io.ObjectInputStream$BlockDataInputStream.readFully(Ob

Re: saveAsTextFile at treeEnsembleModels.scala:447, took 2.513396 s Killed

2016-07-28 Thread Ascot Moss
my_model.save(sc, "/my_model") >> >> - >> 16/07/28 08:36:19 INFO TaskSchedulerImpl: Removed TaskSet 69.0, whose >> tasks have all completed, from pool >> >> 16/07/28 08:36:19 INFO DAGScheduler: ResultStage 69 (saveAsTex

saveAsTextFile at treeEnsembleModels.scala:447, took 2.513396 s Killed

2016-07-27 Thread Ascot Moss
s have all completed, from pool 16/07/28 08:36:19 INFO DAGScheduler: ResultStage 69 (saveAsTextFile at treeEnsembleModels.scala:447) finished in 0.901 s 16/07/28 08:36:19 INFO DAGScheduler: Job 38 finished: saveAsTextFile at treeEnsembleModels.scala:447, took 2.513396

Re: problem about RDD map and then saveAsTextFile

2016-05-27 Thread Christian Hellström
Internally, saveAsTextFile uses saveAsHadoopFile: https://github.com/apache/spark/blob/d5911d1173fe0872f21cae6c47abf8ff479345a4/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala . The final bit in the method first creates the output path and then saves the data set. However, if

problem about RDD map and then saveAsTextFile

2016-05-27 Thread Reminia Scarlet
Hi all: I’ve tried to execute something as below: result.map(transform).saveAsTextFile(hdfsAddress) Result is an RDD calculated from an MLlib algorithm. I submit this to yarn, and after two attempts, the application failed. But the exception in the log is very misleading. It said hdfsAddress

RE: saveAsTextFile is not writing to local fs

2016-02-01 Thread Mohammed Guller
ler Cc: spark users Subject: Re: saveAsTextFile is not writing to local fs Hi Mohamed, Thanks for your response. Data is available in worker nodes. But looking for something to write directly to local fs. Seems like it is not an option. Thanks, Sivakumar Bhavanari. On Mon, Feb 1, 2016 at 5

Re: saveAsTextFile is not writing to local fs

2016-02-01 Thread Siva
hor: Big Data Analytics with Spark > <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/> > > > > *From:* Siva [mailto:sbhavan...@gmail.com] > *Sent:* Friday, January 29, 2016 5:40 PM > *To:* Mohammed Guller > *Cc:* spark users > *

RE: saveAsTextFile is not writing to local fs

2016-02-01 Thread Mohammed Guller
Analytics with Spark<http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/> From: Siva [mailto:sbhavan...@gmail.com] Sent: Friday, January 29, 2016 5:40 PM To: Mohammed Guller Cc: spark users Subject: Re: saveAsTextFile is not writing to local fs Hi Mohammed, Thanks fo

Re: saveAsTextFile is not writing to local fs

2016-01-29 Thread Siva
Hi Mohammed, Thanks for your quick response. I'm submitting the spark job to Yarn in "yarn-client" mode on a 6 node cluster. I ran the job by turning on DEBUG mode. I see the below exception, but this exception occurred after the saveAsTextFile function finished. 16/01/29 20:26:57 DEBUG

RE: saveAsTextFile is not writing to local fs

2016-01-29 Thread Mohammed Guller
: Siva [mailto:sbhavan...@gmail.com] Sent: Friday, January 29, 2016 3:38 PM To: spark users Subject: saveAsTextFile is not writing to local fs Hi Everyone, We are using spark 1.4.1 and we have a requirement of writing data local fs instead of hdfs. When trying to save rdd to local fs with saveAsTextFile,

saveAsTextFile is not writing to local fs

2016-01-29 Thread Siva
Hi Everyone, We are using spark 1.4.1 and we have a requirement of writing data to local fs instead of hdfs. When trying to save rdd to local fs with saveAsTextFile, it is just writing _SUCCESS file in the folder with no part- files and also no error or warning messages on console. Is there any
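Two things usually matter in this situation, sketched below: the path must be an explicit `file:///` URI (otherwise it can be resolved against the cluster's default filesystem), and on a YARN cluster each executor writes its partitions to its own node's local disk, which is why only the driver-side `_SUCCESS` marker shows up on the submitting machine. The helper is illustrative, not Spark API.

```scala
// A file: URI needs three slashes before an absolute path; forms like
// "file:home/..." or a bare "/home/..." may be resolved against the
// default filesystem (e.g. HDFS) instead of the local disk.
def localUri(absolutePath: String): String = {
  require(absolutePath.startsWith("/"), "need an absolute path")
  "file://" + absolutePath
}

val out = localUri("/home/someuser/output")   // "file:///home/someuser/output"
// rdd.saveAsTextFile(out)  // each executor writes its partitions to ITS OWN node's disk
println(out)
```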

Re: coalesce(1).saveAsTextfile() takes forever?

2016-01-05 Thread Andy Davidson
coalesce(1).saveAsTextfile() takes forever? > hi I am trying to save many partitions of Dataframe into one CSV file and it > take forever for large data sets of around 5-6 GB. > > sourceFrame.coalesce(1).write().format("com.databricks.spark.csv").option("gzi > p&qu

Re: coalesce(1).saveAsTextfile() takes forever?

2016-01-05 Thread Umesh Kacha
") >> >> For small data above code works well but for large data it hangs forever >> does not move on because of only one partitions has to shuffle data of GBs >> please help me >> >> >> >> -- >> View this message in context: >> http:/

Re: coalesce(1).saveAsTextfile() takes forever?

2016-01-05 Thread Igor Berman
>> >> >> -- >> View this message in context: >> http://apache-spark-user-list.1001560.n3.nabble.com/coalesce-1-saveAsTextfile-takes-forever-tp25886.html >> Sent from the Apache Spark User List mailing list archive at Nabble.com. >> >> -

Re: coalesce(1).saveAsTextfile() takes forever?

2016-01-05 Thread Alexander Pivovarov
n context: > http://apache-spark-user-list.1001560.n3.nabble.com/coalesce-1-saveAsTextfile-takes-forever-tp25886.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > - > To unsubs

coalesce(1).saveAsTextfile() takes forever?

2016-01-05 Thread unk1102
ll but for large data it hangs forever does not move on because of only one partitions has to shuffle data of GBs please help me -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/coalesce-1-saveAsTextfile-takes-forever-tp25886.html Sent from the Apache Spark Use

Re: Improve saveAsTextFile performance

2015-12-07 Thread Akhil Das
3b#file-saveasparquet-java-L80 > > Best Regards, > Ram > -- > > Date: Saturday, December 5, 2015 at 7:18 AM > To: Akhil Das > > Cc: user > Subject: Re: Improve saveAsTextFile performance > > >If you are doing a join/groupBy kind of operations then you need t

Re: Improve saveAsTextFile performance

2015-12-05 Thread Ram VISWANADHA
, Ram -- Date: Saturday, December 5, 2015 at 7:18 AM To: Akhil Das mailto:ak...@sigmoidanalytics.com>> Cc: user mailto:user@spark.apache.org>> Subject: Re: Improve saveAsTextFile performance >If you are doing a join/groupBy kind of operations then you need to make sure >the keys are

Re: Improve saveAsTextFile performance

2015-12-05 Thread Ram VISWANADHA
0 8 3.9 MB / 95334 Best Regards, Ram From: Akhil Das mailto:ak...@sigmoidanalytics.com>> Date: Saturday, December 5, 2015 at 1:32 AM To: Ram VISWANADHA mailto:ram.viswana...@dailymotion.com>> Cc: user mailto:user@spark.apache.org>> Subject: Re: Improve saveAsTex

Re: Improve saveAsTextFile performance

2015-12-05 Thread Akhil Das
the partitions. Thanks Best Regards On Sat, Dec 5, 2015 at 8:24 AM, Ram VISWANADHA < ram.viswana...@dailymotion.com> wrote: > That didn’t work :( > Any help I have documented some steps here. > > http://stackoverflow.com/questions/34048340/spark-saveastextfile-last-stage-alm

Re: Improve saveAsTextFile performance

2015-12-04 Thread Ram VISWANADHA
That didn’t work :( Any help I have documented some steps here. http://stackoverflow.com/questions/34048340/spark-saveastextfile-last-stage-almost-never-finishes Best Regards, Ram From: Sahil Sareen mailto:sareen...@gmail.com>> Date: Wednesday, December 2, 2015 at 10:18 PM To: Ram VISW

Re: Improve saveAsTextFile performance

2015-12-02 Thread Sahil Sareen
From: Ted Yu > Date: Wednesday, December 2, 2015 at 3:25 PM > To: Ram VISWANADHA > Cc: user > Subject: Re: Improve saveAsTextFile performance > > Have you tried calling coalesce() before saveAsTextFile ? > > Cheers > > On Wed, Dec 2, 2015 at 3:15 PM, Ram V

Re: Improve saveAsTextFile performance

2015-12-02 Thread Ram VISWANADHA
Yes. That did not help. Best Regards, Ram From: Ted Yu mailto:yuzhih...@gmail.com>> Date: Wednesday, December 2, 2015 at 3:25 PM To: Ram VISWANADHA mailto:ram.viswana...@dailymotion.com>> Cc: user mailto:user@spark.apache.org>> Subject: Re: Improve saveAsTextFile performan

Re: Improve saveAsTextFile performance

2015-12-02 Thread Ted Yu
Have you tried calling coalesce() before saveAsTextFile ? Cheers On Wed, Dec 2, 2015 at 3:15 PM, Ram VISWANADHA < ram.viswana...@dailymotion.com> wrote: > JavaRDD.saveAsTextFile is taking a long time to succeed. There are 10 > tasks, the first 9 complete in a reasonable time but t

Improve saveAsTextFile performance

2015-12-02 Thread Ram VISWANADHA
JavaRDD.saveAsTextFile is taking a long time to succeed. There are 10 tasks, the first 9 complete in a reasonable time but the last task is taking a long time to complete. The last task contains the maximum number of records like 90% of the total number of records. Is there any way to paralleli

Re: streaming: missing data. does saveAsTextFile() append or replace?

2015-11-09 Thread Andy Davidson
: "user @spark" Subject: Re: streaming: missing data. does saveAsTextFile() append or replace? > Andy, > > Using the rdd.saveAsTextFile(...) will overwrite the data if your target is > the same file. > > If you want to save to HDFS, DStream offers dstream.saveAsTextFile

Re: streaming: missing data. does saveAsTextFile() append or replace?

2015-11-08 Thread Gerard Maas
ote: > Hi > > I just started a new spark streaming project. In this phase of the system > all we want to do is save the data we received to hdfs. I after running for > a couple of days it looks like I am missing a lot of data. I wonder if > saveAsTextFile("hdfs:///rawSt

streaming: missing data. does saveAsTextFile() append or replace?

2015-11-07 Thread Andy Davidson
Hi I just started a new spark streaming project. In this phase of the system all we want to do is save the data we received to hdfs. After running for a couple of days it looks like I am missing a lot of data. I wonder if saveAsTextFile("hdfs:///rawSteamingData"); is overwriting the d

RE: error with saveAsTextFile in local directory

2015-11-03 Thread Jack Yang
Yes. My one is 1.4.0. Then is this problem to do with the version? I doubt that. Any comments please? From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Wednesday, 4 November 2015 11:52 AM To: Jack Yang Cc: user@spark.apache.org Subject: Re: error with saveAsTextFile in local directory Looks

Re: error with saveAsTextFile in local directory

2015-11-03 Thread Ted Yu
> val hdfsFilePath = "hdfs://master:ip/ tempFile "; > > val localFilePath = "file:///home/hduser/tempFile"; > > hiveContext.sql(s"""my hql codes here""") > > res.printSchema() --working > > res.show() --working

error with saveAsTextFile in local directory

2015-11-03 Thread Jack Yang
res.show() --working res.map{ x => tranRow2Str(x) }.coalesce(1).saveAsTextFile(hdfsFilePath) --still working res.map{ x => tranRow2Str(x) }.coalesce(1).saveAsTextFile(localFilePath) --wrong! then at last, I get the correct results in hdfsFilePath, but nothing in localFilePath. Btw, the l

Re: saveAsTextFile creates an empty folder in HDFS

2015-10-03 Thread Ajay Chander
: >> >> This sequence produces the following log and creates the empty folder >> "test": >> >> scala> val l = Seq.fill(1)(nextInt) >> scala> val dist = sc.parallelize(l) >> scala> dist.saveAsTextFile("hdfs://node1.i3a.info/user/ja

Re: saveAsTextFile creates an empty folder in HDFS

2015-10-03 Thread Jacinto Arias
e produces the following log and creates the empty folder > "test": > > scala> val l = Seq.fill(1)(nextInt) > scala> val dist = sc.parallelize(l) > scala> dist.saveAsTextFile("hdfs://node1.i3a.info/user/jarias/test/ > <http://node1.i3a.info/user/jarias/tes

Re: saveAsTextFile creates an empty folder in HDFS

2015-10-03 Thread Ted Yu
t = sc.parallelize(l) > scala> dist.saveAsTextFile("hdfs://node1.i3a.info/user/jarias/test/") > > > 15/10/02 10:19:22 INFO FileOutputCommitter: File Output Committer Algorithm > version is 1 > 15/10/02 10:19:22 INFO SparkContext: Starting job: saveAsTextFile at > :27 > 15/10/02

saveAsTextFile creates an empty folder in HDFS

2015-10-02 Thread jarias
parallelize(l) scala> dist.saveAsTextFile("hdfs://node1.i3a.info/user/jarias/test/") 15/10/02 10:19:22 INFO FileOutputCommitter: File Output Committer Algorithm version is 1 15/10/02 10:19:22 INFO SparkContext: Starting job: saveAsTextFile at :27 15/10/02 10:19:22 INFO DAGSche

Exception during SaveAstextFile Stage

2015-09-24 Thread Chirag Dewan
Hi, I have 2 stages in my job map and save as text file. During the save text file stage I am getting an exception : 15/09/24 15:38:16 WARN AkkaUtils: Error sending message in 1 attempts java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] at scala.concurrent.impl.

RE: spark 1.4.1 saveAsTextFile (and Parquet) is slow on emr-4.0.0

2015-09-03 Thread Ewan Leith
via the –configurations flag on the “aws emr create-cluster” command Thanks, Ewan From: Alexander Pivovarov [mailto:apivova...@gmail.com] Sent: 03 September 2015 00:12 To: Neil Jonkers Cc: user@spark.apache.org Subject: Re: spark 1.4.1 saveAsTextFile is slow on emr-4.0.0 Hi Neil Yes! it helps

Re: spark 1.4.1 saveAsTextFile is slow on emr-4.0.0

2015-09-02 Thread Alexander Pivovarov
Hi Neil Yes! it helps!!! I do not see _temporary in console output anymore. saveAsTextFile is fast now. 2015-09-02 23:07:00,022 INFO [task-result-getter-0] scheduler.TaskSetManager (Logging.scala:logInfo(59)) - Finished task 18.0 in stage 0.0 (TID 18) in 4398 ms on ip-10-0-24-103.ec2.internal

Re: spark 1.4.1 saveAsTextFile is slow on emr-4.0.0

2015-09-02 Thread Neil Jonkers
> >> >> >> On Tue, Sep 1, 2015 at 4:01 PM, Alexander Pivovarov > > wrote: >> >>> I run spark 1.4.1 in amazom aws emr 4.0.0 >>> >>> For some reason spark saveAsTextFile is very slow on emr 4.0.0 in >>> comparison to emr 3.8 (was 5 sec, now

Re: spark 1.4.1 saveAsTextFile is slow on emr-4.0.0

2015-09-01 Thread Alexander Pivovarov
ter? > spark.hadoop.mapred.output.committer.class > com.appsflyer.spark.DirectOutputCommitter > > > > On Tue, Sep 1, 2015 at 4:01 PM, Alexander Pivovarov > wrote: > >> I run spark 1.4.1 in amazom aws emr 4.0.0 >> >> For some reason spark saveAsTextFile is very slow on emr 4.0.0 in >

Re: spark 1.4.1 saveAsTextFile is slow on emr-4.0.0

2015-09-01 Thread Alexander Pivovarov
Should I use DirectOutputCommitter? spark.hadoop.mapred.output.committer.class com.appsflyer.spark.DirectOutputCommitter On Tue, Sep 1, 2015 at 4:01 PM, Alexander Pivovarov wrote: > I run spark 1.4.1 in amazom aws emr 4.0.0 > > For some reason spark saveAsTextFile is very slow on

spark 1.4.1 saveAsTextFile is slow on emr-4.0.0

2015-09-01 Thread Alexander Pivovarov
I run spark 1.4.1 in amazon aws emr 4.0.0 For some reason spark saveAsTextFile is very slow on emr 4.0.0 in comparison to emr 3.8 (was 5 sec, now 95 sec) Actually saveAsTextFile says that it's done in 4.356 sec but after that I see lots of INFO messages with 404 error from com.amazonaws.la

Re: EC2 cluster doesn't work saveAsTextFile

2015-08-10 Thread Dean Wampler
twitter.com/deanwampler> >> http://polyglotprogramming.com >> >> On Mon, Aug 10, 2015 at 7:08 AM, Yasemin Kaya wrote: >> >>> Hi, >>> >>> I have EC2 cluster, and am using spark 1.3, yarn and HDFS . When i >>> submit at local there is n

Re: EC2 cluster doesn't work saveAsTextFile

2015-08-10 Thread Yasemin Kaya
m> > @deanwampler <http://twitter.com/deanwampler> > http://polyglotprogramming.com > > On Mon, Aug 10, 2015 at 7:08 AM, Yasemin Kaya wrote: > >> Hi, >> >> I have EC2 cluster, and am using spark 1.3, yarn and HDFS . When i submit >> at local there i

Re: EC2 cluster doesn't work saveAsTextFile

2015-08-10 Thread Dean Wampler
3, yarn and HDFS . When i submit > at local there is no problem , but i run at cluster, saveAsTextFile doesn't > work."*It says me User class threw exception: Output directory > hdfs://172.31.42.10:54310/./weblogReadResult > <http://172.31.42.10:54310/./weblogReadResul

EC2 cluster doesn't work saveAsTextFile

2015-08-10 Thread Yasemin Kaya
Hi, I have EC2 cluster, and am using spark 1.3, yarn and HDFS . When i submit at local there is no problem , but i run at cluster, saveAsTextFile doesn't work."*It says me User class threw exception: Output directory hdfs://172.31.42.10:54310/./weblogReadResult <http://172.3

Re: Combining Spark Files with saveAsTextFile

2015-08-06 Thread MEETHU MATHEW
Hi, Try using coalesce(1) before calling saveAsTextFile() Thanks & Regards, Meethu M On Wednesday, 5 August 2015 7:53 AM, Brandon White wrote: What is the best way to make saveAsTextFile save as only a single file?
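The suggestion in this thread, sketched with a caveat: `coalesce(1)` funnels every record through a single task, so it only makes sense for modest outputs; for large data it is usually better to save normally and merge the part files afterwards (e.g. with `hadoop fs -getmerge <dir> <localFile>`). The Spark call is commented since it needs a SparkContext.

```scala
// Sketch of the suggestion above (needs a SparkContext, hence commented):
// rdd.coalesce(1).saveAsTextFile("hdfs://namenode:8020/single")
//
// A normal (uncoalesced) save instead produces one part file per partition,
// which can be merged after the fact. Illustrative layout:
val parts = (0 until 3).map(i => f"part-$i%05d")
println(parts.mkString(", "))   // part-00000, part-00001, part-00002
```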

Re: Combining Spark Files with saveAsTextFile

2015-08-05 Thread Igor Berman
her clarify, you can first call coalesce with argument 1 and >> then call saveAsTextFile. For example, >> >> >> >> rdd.coalesce(1).saveAsTextFile(...) >> >> >> >> >> >> >> >> Mohammed >> >> >> >

Re: Combining Spark Files with saveAsTextFile

2015-08-04 Thread Igor Berman
August 2015 at 07:43, Mohammed Guller wrote: > Just to further clarify, you can first call coalesce with argument 1 and > then call saveAsTextFile. For example, > > > > rdd.coalesce(1).saveAsTextFile(...) > > > > > > > > Mohammed > > > > *F

RE: Combining Spark Files with saveAsTextFile

2015-08-04 Thread Mohammed Guller
Just to further clarify, you can first call coalesce with argument 1 and then call saveAsTextFile. For example, rdd.coalesce(1).saveAsTextFile(...) Mohammed From: Mohammed Guller Sent: Tuesday, August 4, 2015 9:39 PM To: 'Brandon White'; user Subject: RE: Combining Spark

RE: Combining Spark Files with saveAsTextFile

2015-08-04 Thread Mohammed Guller
One options is to use the coalesce method in the RDD class. Mohammed From: Brandon White [mailto:bwwintheho...@gmail.com] Sent: Tuesday, August 4, 2015 7:23 PM To: user Subject: Combining Spark Files with saveAsTextFile What is the best way to make saveAsTextFile save as only a single file?

Combining Spark Files with saveAsTextFile

2015-08-04 Thread Brandon White
What is the best way to make saveAsTextFile save as only a single file?

spark cache issue while doing saveAsTextFile and saveAsParquetFile

2015-07-14 Thread mathewvinoj
che() val l = applySchema(outputRecords, schemaName).cache() l.saveAsTextFile(filename + ".txt") l.saveAsParquetFile(filename+ ".parquet") Expected result: When we do saveAsTextFile the computation should happen and cache the result and the second time when we do saveAsparque

Re: RDD saveAsTextFile() to local disk

2015-07-08 Thread Vijay Pawnarkar
Thanks for the help. Following are the folders I was trying to write to saveAsTextFile("file:///home/someuser/test2/testupload/20150708/0/") saveAsTextFile("file:///home/someuser/test2/testupload/20150708/1/") saveAsTextFile("file:///home/someuser/te

Re: RDD saveAsTextFile() to local disk

2015-07-08 Thread canan chen
wing function > > saveAsTextFile("file:home/someuser/dir2/testupload/20150708/") > > The dir (/home/someuser/dir2/testupload/) was created before running the > job. The error message is misleading. > > > org.apache.spark.SparkException: Job aborted due to stage

RDD saveAsTextFile() to local disk

2015-07-08 Thread spok20nn
Getting exception when writing RDD to local disk using following function saveAsTextFile("file:home/someuser/dir2/testupload/20150708/") The dir (/home/someuser/dir2/testupload/) was created before running the job. The error message is misleading. org.apache.spark.SparkExce

RDD saveAsTextFile() to local disk

2015-07-08 Thread Vijay Pawnarkar
Getting exception when writing RDD to local disk using following function saveAsTextFile("file:home/someuser/dir2/testupload/20150708/") The dir (/home/someuser/dir2/testupload/) was created before running the job. The error message is misleading. org.apache.spark.SparkExce

Re: Spark dramatically slow when I add "saveAsTextFile"

2015-05-24 Thread Joe Wass
This may sound like an obvious question, but are you sure that the program is doing any work when you don't have a saveAsTextFile? If there are transformations but no actions to actually collect the data, there's no need for Spark to execute the transformations. As to the question o
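The point made above, that transformations do no work until an action such as `saveAsTextFile` runs, can be demonstrated with a plain Scala view, which is lazy in the same way an RDD is:

```scala
// Spark transformations are lazy, just like a Scala view: the map body
// below does not run until something actually consumes the result.
var mapCalls = 0
val view = (1 to 1000).view.map { x => mapCalls += 1; x * 2 }

assert(mapCalls == 0)   // no action yet -> no work done (map without saveAsTextFile)
val total = view.sum    // an "action" forces evaluation (like saveAsTextFile)
assert(mapCalls == 1000)
println(total)          // 1001000
```

This is why a job can look "fast" without the save: nothing upstream was ever executed.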

Spark dramatically slow when I add "saveAsTextFile"

2015-05-24 Thread allanjie
save the result to HDFS using "saveAsTextFile". Problem: if I don't add "saveAsTextFile", the program runs very fast (a few seconds), otherwise extremely slow until about 30 mins. My program (is very simple) public static void main(String[]

Re: saveAsTextFile() part- files are missing

2015-05-21 Thread Tomasz Fruboes
1001560.n3.nabble.com/saveAsTextFile-part-files-are-missing-tp22974.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-ma

saveAsTextFile() part- files are missing

2015-05-21 Thread rroxanaioana
oing wrong? Thank you! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/saveAsTextFile-part-files-are-missing-tp22974.html Sent from the Apache Spark User List mailing list archive at Nabble.com. --

Re: FP Growth saveAsTextFile

2015-05-20 Thread Xiangrui Meng
at this. > > scala> > model.freqItemsets.saveAsTextFile("c:///repository/trunk/Scala_210_wspace/fpGrowth/modelText1") > 15/05/20 14:07:47 INFO SparkContext: Starting job: saveAsTextFile at > :33 > 15/05/20 14:07:47 INFO DAGScheduler: Got job 15 (saveAsTextFile at > :33)

Re: FP Growth saveAsTextFile

2015-05-20 Thread Xiangrui Meng
Could you post the stack trace? If you are using Spark 1.3 or 1.4, it would be easier to save freq itemsets as a Parquet file. -Xiangrui On Wed, May 20, 2015 at 12:16 PM, Eric Tanner wrote: > I am having trouble with saving an FP-Growth model as a text file. I can > print out the results, but wh

FP Growth saveAsTextFile

2015-05-20 Thread Eric Tanner
I am having trouble with saving an FP-Growth model as a text file. I can print out the results, but when I try to save the model I get a NullPointerException. model.freqItemsets.saveAsTextFile("c://fpGrowth/model") Thanks, Eric

Re: SaveAsTextFile brings down data nodes with IO Exceptions

2015-05-16 Thread Ilya Ganelin
All - this issue showed up when I was tearing down a spark context and creating a new one. Often, I was unable to then write to HDFS due to this error. I subsequently switched to a different implementation where instead of tearing down and re initializing the spark context I'd instead submit a sepa

Re: SaveAsTextFile brings down data nodes with IO Exceptions

2015-05-15 Thread Puneet Kapoor
I am seeing this on hadoop 2.4.0 version. Thanks for your suggestions, i will try those and let you know if they help ! On Sat, May 16, 2015 at 1:57 AM, Steve Loughran wrote: > What version of Hadoop are you seeing this on? > > > On 15 May 2015, at 20:03, Puneet Kapoor > wrote: > > Hey, > >

Re: SaveAsTextFile brings down data nodes with IO Exceptions

2015-05-15 Thread Steve Loughran
What version of Hadoop are you seeing this on? On 15 May 2015, at 20:03, Puneet Kapoor mailto:puneet.cse.i...@gmail.com>> wrote: Hey, Did you find any solution for this issue, we are seeing similar logs in our Data node logs. Appreciate any help. 2015-05-15 10:51:43,615 ERROR org.apache.

Re: SaveAsTextFile brings down data nodes with IO Exceptions

2015-05-15 Thread Puneet Kapoor
Hey, Did you find any solution for this issue, we are seeing similar logs in our Data node logs. Appreciate any help. 2015-05-15 10:51:43,615 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: NttUpgradeDN1:50010:DataXceiver error processing WRITE_BLOCK operation src: /192.168.112.190:46253

Re: saveAsTextFile() to save output of Spark program to HDFS

2015-05-05 Thread Sudarshan Murty
nstead of minReplication (=1). >>>>> There are 1 datanode(s) running and 1 node(s) are excluded in this >>>>> operation.* >>>>> at >>>>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.j

Re: saveAsTextFile() to save output of Spark program to HDFS

2015-05-05 Thread ayan guha
.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1550) >>>>at >>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3447) >>>>at >>>> org.apache.hadoop.hdfs.se

Re: saveAsTextFile() to save output of Spark program to HDFS

2015-05-05 Thread Sudarshan Murty
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3447) >>> at >>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:642) >>> >>> >>> I tried this with Spark 1.2.1 sam

Re: saveAsTextFile() to save output of Spark program to HDFS

2015-05-05 Thread Sudarshan Murty
deRpcServer.addBlock(NameNodeRpcServer.java:642) >> >> >> I tried this with Spark 1.2.1 same error. >> I have plenty of space on the DFS. >> The Name Node, Sec Name Node & the one Data Node are all healthy. >> >> Any hint as to what may be the problem ? >

Re: saveAsTextFile() to save output of Spark program to HDFS

2015-05-05 Thread ayan guha
& the one Data Node are all healthy. > > Any hint as to what may be the problem ? > thanks in advance. > Sudarshan > > > -- > View this message in context: saveAsTextFile() to save output of Spark > program to HDFS > <http://apach

saveAsTextFile() to save output of Spark program to HDFS

2015-05-05 Thread Sudarshan
che-spark-user-list.1001560.n3.nabble.com/saveAsTextFile-to-save-output-of-Spark-program-to-HDFS-tp22774.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Efficient saveAsTextFile by key, directory for each key?

2015-04-22 Thread Arun Luthra
PARK-3007 On Tue, Apr 21, 2015 at 5:45 PM, Arun Luthra wrote: > Is there an efficient way to save an RDD with saveAsTextFile in such a way > that the data gets shuffled into separated directories according to a key? > (My end goal is to wrap the result in a multi-partitioned Hive table

Efficient saveAsTextFile by key, directory for each key?

2015-04-21 Thread Arun Luthra
Is there an efficient way to save an RDD with saveAsTextFile in such a way that the data gets shuffled into separated directories according to a key? (My end goal is to wrap the result in a multi-partitioned Hive table) Suppose you have: case class MyData(val0: Int, val1: String, directory_name
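A pattern commonly used for this (and related to SPARK-3007, referenced in the reply) is a custom `MultipleTextOutputFormat` passed to `saveAsHadoopFile`. The sketch below is a hedged reconstruction, not code from the thread; the Hadoop-dependent parts are commented because they need hadoop-mapred on the classpath.

```scala
// Sketch (assumes org.apache.hadoop classes are available; commented
// since it cannot compile without them):
//
// import org.apache.hadoop.io.NullWritable
// import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
//
// class KeyBasedOutput extends MultipleTextOutputFormat[Any, Any] {
//   override def generateFileNameForKeyValue(key: Any, value: Any, name: String) =
//     keyedFileName(key.toString, name)
//   override def generateActualKey(key: Any, value: Any) = NullWritable.get()
// }
//
// rdd.map(d => (d.directory_name, d.toString))
//    .saveAsHadoopFile(out, classOf[String], classOf[String], classOf[KeyBasedOutput])

// The per-key layout it produces, i.e. <out>/<key>/part-NNNNN:
def keyedFileName(key: String, leafName: String): String = key + "/" + leafName
println(keyedFileName("2015-04-21", "part-00000"))   // 2015-04-21/part-00000
```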

Re: Spark 1.3 saveAsTextFile with codec gives error - works with Spark 1.2

2015-04-17 Thread Akhil Das
Not sure if this will help, but try clearing your jar cache (for sbt ~/.ivy2 and for maven ~/.m2) directories. Thanks Best Regards On Wed, Apr 15, 2015 at 9:33 PM, Manoj Samel wrote: > Env - Spark 1.3 Hadoop 2.3, Kerbeos > > xx.saveAsTextFile(path, codec) gives following trace. Same works with

Re: saveAsTextFile

2015-04-16 Thread Vadim Bichutskiy
in different "files", which are really directories containing >>> partitions, as is common in Hadoop. You can move them later, or just >>> read them where they are. >>> >>> On Thu, Apr 16, 2015 at 6:32 PM, Vadim Bichutskiy >>> wrote: >>>

RE: saveAsTextFile

2015-04-16 Thread Evo Eftimov
files and directories From: Vadim Bichutskiy [mailto:vadim.bichuts...@gmail.com] Sent: Thursday, April 16, 2015 6:45 PM To: Evo Eftimov Cc: Subject: Re: saveAsTextFile Thanks Evo for your detailed explanation. On Apr 16, 2015, at 1:38 PM, Evo Eftimov wrote: The reason for this is

Re: saveAsTextFile

2015-04-16 Thread Sean Owen
d >> in different "files", which are really directories containing >> partitions, as is common in Hadoop. You can move them later, or just >> read them where they are. >> >> On Thu, Apr 16, 2015 at 6:32 PM, Vadim Bichutskiy >> wrote: >>> I am using Spark

Re: saveAsTextFile

2015-04-16 Thread Vadim Bichutskiy
im.bichuts...@gmail.com] > Sent: Thursday, April 16, 2015 6:33 PM > To: user@spark.apache.org > Subject: saveAsTextFile > > I am using Spark Streaming where during each micro-batch I output data to S3 > using > saveAsTextFile. Right now each batch of data is put into its own

RE: saveAsTextFile

2015-04-16 Thread Evo Eftimov
@spark.apache.org Subject: Re: saveAsTextFile Thanks Sean. I want to load each batch into Redshift. What's the best/most efficient way to do that? Vadim > On Apr 16, 2015, at 1:35 PM, Sean Owen wrote: > > You can't, since that's how it's designed to work. Batches are saved >

RE: saveAsTextFile

2015-04-16 Thread Evo Eftimov
Nop Sir, it is possible - check my reply earlier -Original Message- From: Sean Owen [mailto:so...@cloudera.com] Sent: Thursday, April 16, 2015 6:35 PM To: Vadim Bichutskiy Cc: user@spark.apache.org Subject: Re: saveAsTextFile You can't, since that's how it's designed t

RE: saveAsTextFile

2015-04-16 Thread Evo Eftimov
HDFS adapter and invoke it in forEachRDD and foreach Regards Evo Eftimov From: Vadim Bichutskiy [mailto:vadim.bichuts...@gmail.com] Sent: Thursday, April 16, 2015 6:33 PM To: user@spark.apache.org Subject: saveAsTextFile I am using Spark Streaming where during each micro-batch I

Re: saveAsTextFile

2015-04-16 Thread Vadim Bichutskiy
", which are really directories containing > partitions, as is common in Hadoop. You can move them later, or just > read them where they are. > > On Thu, Apr 16, 2015 at 6:32 PM, Vadim Bichutskiy > wrote: >> I am using Spark Streaming where during each micro-batch I output data

Re: saveAsTextFile

2015-04-16 Thread Sean Owen
skiy wrote: > I am using Spark Streaming where during each micro-batch I output data to S3 > using > saveAsTextFile. Right now each batch of data is put into its own directory > containing > 2 objects, "_SUCCESS" and "part-0." > > How do I

saveAsTextFile

2015-04-16 Thread Vadim Bichutskiy
I am using Spark Streaming where during each micro-batch I output data to S3 using saveAsTextFile. Right now each batch of data is put into its own directory containing 2 objects, "_SUCCESS" and "part-0." How do I output each batch into a common directory? Thanks, Vadim ᐧ
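A note on the mechanics behind this thread, hedged against the DStream API documentation: `RDD.saveAsTextFile` targets a single directory, while `DStream.saveAsTextFiles(prefix, suffix)` writes one new directory per batch named `<prefix>-<batchTimeMs>.<suffix>`, so batches never clobber each other. A sketch of that naming (the streaming call itself is commented, since it needs a StreamingContext):

```scala
// Directory name for one micro-batch, following the documented
// "<prefix>-<TIME_IN_MS>[.<suffix>]" pattern of DStream.saveAsTextFiles.
def batchDir(prefix: String, batchTimeMs: Long, suffix: String = ""): String =
  if (suffix.isEmpty) s"$prefix-$batchTimeMs" else s"$prefix-$batchTimeMs.$suffix"

// stream.saveAsTextFiles("s3://bucket/raw/batch", "txt")  // needs a StreamingContext
println(batchDir("s3://bucket/raw/batch", 1429210800000L, "txt"))
```

Writing every batch into one common directory would require a custom output step (e.g. `foreachRDD` with unique filenames), since a plain save would overwrite the previous batch.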

Spark 1.3 saveAsTextFile with codec gives error - works with Spark 1.2

2015-04-15 Thread Manoj Samel
Env - Spark 1.3 Hadoop 2.3, Kerbeos xx.saveAsTextFile(path, codec) gives following trace. Same works with Spark 1.2 in same environment val codec = classOf[] val a = sc.textFile("/some_hdfs_file") a.saveAsTextFile("/some_other_hdfs_file", codec) fails with following trace in Spark 1.3, works i
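The codec class is truncated in the post (`classOf[]`), so the following is a hypothetical reconstruction using `GzipCodec`, one standard Hadoop codec; the Spark/Hadoop lines are commented because they need a SparkContext and hadoop-common on the classpath.

```scala
// Hypothetical reconstruction of the truncated snippet above:
//
// import org.apache.hadoop.io.compress.GzipCodec
// val a = sc.textFile("/some_hdfs_file")
// a.saveAsTextFile("/some_other_hdfs_file", classOf[GzipCodec])
//
// With a codec, each part file carries the codec's extension:
def compressedPart(name: String, ext: String = ".gz"): String = name + ext
println(compressedPart("part-00000"))   // part-00000.gz
```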

Re: Spark permission denied error when invoking saveAsTextFile

2015-04-01 Thread Kannan Rajah
Ignore the question. There was a Hadoop setting that needed to be set to get it working. -- Kannan On Wed, Apr 1, 2015 at 1:37 PM, Kannan Rajah wrote: > Running a simple word count job in standalone mode as a non root user from > spark-shell. The spark master, worker services are running as ro

Spark permission denied error when invoking saveAsTextFile

2015-04-01 Thread Kannan Rajah
Running a simple word count job in standalone mode as a non root user from spark-shell. The spark master, worker services are running as root user. The problem is the _temporary under /user/krajah/output2/_temporary/0 dir is being created with root permission even when running the job as non root

Pyspark saveAsTextFile exceptions

2015-03-13 Thread Madabhattula Rajesh Kumar
Hi Team, I'm getting below exception for saving the results into hadoop. Code: rdd.saveAsTextFile("hdfs://localhost:9000/home/rajesh/data/result.rdd") Could you please help me how to resolve this issue. 15/03/13 17:19:31 INFO spark.SparkContext: Starting job: sa

Re: saveAsTextFile extremely slow near finish

2015-03-11 Thread Imran Rashid
From WebUI, the job is > splitted into two stages: saveAsTextFile and mapToPair. MapToPair finished > in 8 mins. While saveAsTextFile took ~15mins to reach (2366/2373) progress > and the last few jobs just took forever and never finishes. > > Cluster setup: > 8 nodes > on each nod
