Hello,
I am trying to use the Spark rdd.saveAsTextFile function, which calls FileSystem.rename() under the hood. This errors out with
“com.microsoft.azure.storage.StorageException: One of the request inputs is not valid” when using the hadoop-azure NativeAzureFileSystem. I have written a small test
I just checked the 2nd argument of saveAsTextFile, and I believe reads and writes on disk will be faster after compression. I will try this.
So I think there is no special requirement on the type of disk for executing saveAsTextFile, as these are local I/O operations.
Regards
Ranju

csv file. This script runs on every node, and later they all combine into a single file.
On the other hand, is your data really just a collection of strings without any repetitions?
[Ranju]:
Yes, it is a comma-separated string.
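A minimal sketch of that second argument (assuming the stock Hadoop GzipCodec and a hypothetical output path; any CompressionCodec subclass should work):

import org.apache.hadoop.io.compress.GzipCodec

// Write the RDD as gzip-compressed text part files; the optional second
// argument of saveAsTextFile takes a Hadoop CompressionCodec class.
rdd.saveAsTextFile("file:///tmp/output", classOf[GzipCodec])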
Hi!
I would like to respond only to the first part of your mail:
> I have a large RDD dataset of around 60-70 GB which I cannot send to the driver
> using *collect*, so first writing that to disk using *saveAsTextFile*, and
> then this data gets saved in the form of multiple part files on each node
Hi All,
I have a large RDD dataset of around 60-70 GB which I cannot send to the driver using collect, so I first write it to disk using saveAsTextFile; the data gets saved in the form of multiple part files on each node of the cluster, and after that the driver reads the data from there.
How about increasing RDD's partitions / rebalancing data?
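A sketch of that suggestion (the partition count and output path are assumptions; tune the count to the cluster):

// repartition shuffles the data into evenly sized partitions, so more
// tasks write in parallel and no single task becomes a straggler.
rdd.repartition(200).saveAsTextFile("hdfs://namenode:8020/output")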
On Sat, Mar 11, 2017 at 2:33 PM, Parsian, Mahmoud wrote:
How to improve the performance of JavaRDD.saveAsTextFile(“hdfs://…“)?
This is taking over 30 minutes on a cluster of 10 nodes.
Running Spark on YARN.
The JavaRDD has 120 million entries.
Thank you,
Best regards,
Mahmoud
Hi Mahendra,
Did you try mapping the X case class members further to a String object and then saving the RDD[String]?
Thanks
Deepak
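A minimal sketch of Deepak's idea (the case class X with a long time field is from the original mail; the RDD name and formatting are assumptions):

case class X(time: Long)

// Render each record to a plain string up front, so saveAsTextFile
// writes simple strings instead of relying on X's default toString.
val asText = rddOfX.map(x => x.time.toString)
asText.saveAsTextFile("hdfs:///user/out")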
On Oct 7, 2016 23:04, "Mahendra Kutare" wrote:
Hi,
I am facing an issue writing RDD[X] to an HDFS file path. X is a simple case class with a variable time as a primitive long.
When I run the driver program with --master spark://:7077, I get this:
Caused by: java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.readFully(Ob
my_model.save(sc, "/my_model")

16/07/28 08:36:19 INFO TaskSchedulerImpl: Removed TaskSet 69.0, whose tasks have all completed, from pool
16/07/28 08:36:19 INFO DAGScheduler: ResultStage 69 (saveAsTextFile at treeEnsembleModels.scala:447) finished in 0.901 s
16/07/28 08:36:19 INFO DAGScheduler: Job 38 finished: saveAsTextFile at treeEnsembleModels.scala:447, took 2.513396
Internally, saveAsTextFile uses saveAsHadoopFile:
https://github.com/apache/spark/blob/d5911d1173fe0872f21cae6c47abf8ff479345a4/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
The final bit of the method first creates the output path and then saves the data set. However, if
Hi all:
I’ve tried to execute something like the following:
result.map(transform).saveAsTextFile(hdfsAddress)
result is an RDD calculated from an MLlib algorithm.
I submit this to YARN, and after two attempts the application failed. But the exception in the log is very misleading. It said hdfsAddress
To: Mohammed Guller
Cc: spark users
Subject: Re: saveAsTextFile is not writing to local fs
Hi Mohammed,
Thanks for your response. The data is available on the worker nodes, but I am looking for something that writes directly to the local fs. It seems that is not an option.
Thanks,
Sivakumar Bhavanari.
From: Siva [mailto:sbhavan...@gmail.com]
Sent: Friday, January 29, 2016 5:40 PM
To: Mohammed Guller
Cc: spark users
Subject: Re: saveAsTextFile is not writing to local fs
Hi Mohammed,
Thanks for your quick response. I am submitting the Spark job to YARN in "yarn-client" mode on a 6-node cluster. I ran the job with DEBUG mode turned on and see the exception below, but it occurred after the saveAsTextFile function finished.
16/01/29 20:26:57 DEBUG
From: Siva [mailto:sbhavan...@gmail.com]
Sent: Friday, January 29, 2016 3:38 PM
To: spark users
Subject: saveAsTextFile is not writing to local fs
Hi Everyone,
We are using Spark 1.4.1 and have a requirement to write data to the local fs instead of HDFS.
When trying to save an RDD to the local fs with saveAsTextFile, it just writes a _SUCCESS file in the folder, with no part- files and no error or warning messages on the console.
Is there any
coalesce(1).saveAsTextfile() takes forever?
Hi, I am trying to save many partitions of a DataFrame into one CSV file, and it takes forever for large data sets of around 5-6 GB.
sourceFrame.coalesce(1).write().format("com.databricks.spark.csv").option("gzi
For small data the above code works well, but for large data it hangs forever and does not move on, because only one partition has to shuffle GBs of data. Please help me.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/coalesce-1-saveAsTextfile-takes-forever-tp25886.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Best Regards,
Ram
--
Date: Saturday, December 5, 2015 at 7:18 AM
To: Akhil Das
Cc: user
Subject: Re: Improve saveAsTextFile performance
> If you are doing join/groupBy kind of operations, then you need to make sure the keys are
Best Regards,
Ram
From: Akhil Das
Date: Saturday, December 5, 2015 at 1:32 AM
To: Ram VISWANADHA
Cc: user
Subject: Re: Improve saveAsTextFile performance
the partitions.
Thanks
Best Regards
On Sat, Dec 5, 2015 at 8:24 AM, Ram VISWANADHA <ram.viswana...@dailymotion.com> wrote:
That didn’t work :(
Any help? I have documented some steps here:
http://stackoverflow.com/questions/34048340/spark-saveastextfile-last-stage-almost-never-finishes
Best Regards,
Ram
From: Sahil Sareen
Date: Wednesday, December 2, 2015 at 10:18 PM
To: Ram VISWANADHA
From: Ted Yu
Date: Wednesday, December 2, 2015 at 3:25 PM
To: Ram VISWANADHA
Cc: user
Subject: Re: Improve saveAsTextFile performance
Yes. That did not help.
Best Regards,
Ram
From: Ted Yu
Date: Wednesday, December 2, 2015 at 3:25 PM
To: Ram VISWANADHA
Cc: user
Subject: Re: Improve saveAsTextFile performance
Have you tried calling coalesce() before saveAsTextFile?
Cheers
On Wed, Dec 2, 2015 at 3:15 PM, Ram VISWANADHA <ram.viswana...@dailymotion.com> wrote:
JavaRDD.saveAsTextFile is taking a long time to succeed. There are 10 tasks; the first 9 complete in a reasonable time, but the last task is taking a long time to complete. The last task contains the maximum number of records, around 90% of the total. Is there any way to parallelize this?
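Since the last task holds ~90% of the records, this looks like key skew rather than slow I/O. A hedged sketch of rebalancing before the save (the partition count and path are assumptions):

// coalesce() only merges partitions and will not split the big one;
// repartition() forces a full shuffle and spreads records evenly.
val rebalanced = rdd.repartition(100)
rebalanced.saveAsTextFile("hdfs:///output")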
: "user @spark"
Subject: Re: streaming: missing data. does saveAsTextFile() append or
replace?
> Andy,
>
> Using the rdd.saveAsTextFile(...) will overwrite the data if your target is
> the same file.
>
> If you want to save to HDFS, DStream offers dstream.saveAsTextFile
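A minimal sketch of the DStream variant (the path is an assumption). saveAsTextFiles writes each batch to its own prefix-<batchTime>.suffix directory, so later batches never overwrite earlier ones:

// Every batch interval creates a new directory such as
// hdfs:///rawStreamingData/batch-1429200000000.txt
dstream.saveAsTextFiles("hdfs:///rawStreamingData/batch", "txt")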
Hi
I just started a new Spark Streaming project. In this phase of the system, all we want to do is save the data we receive to HDFS. After running for a couple of days, it looks like I am missing a lot of data. I wonder if saveAsTextFile("hdfs:///rawSteamingData") is overwriting the data.
Yes. Mine is 1.4.0.
Then is this problem to do with the version? I doubt that. Any comments, please?
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Wednesday, 4 November 2015 11:52 AM
To: Jack Yang
Cc: user@spark.apache.org
Subject: Re: error with saveAsTextFile in local directory
Looks
val hdfsFilePath = "hdfs://master:ip/tempFile"
val localFilePath = "file:///home/hduser/tempFile"
hiveContext.sql(s"""my hql codes here""")
res.printSchema() -- working
res.show() -- working
res.map{ x => tranRow2Str(x) }.coalesce(1).saveAsTextFile(hdfsFilePath) -- still working
res.map{ x => tranRow2Str(x) }.coalesce(1).saveAsTextFile(localFilePath) -- wrong!
Then, at last, I get the correct results in hdfsFilePath, but nothing in localFilePath.
Btw, the l
This sequence produces the following log and creates the empty folder "test":

scala> val l = Seq.fill(1)(nextInt)
scala> val dist = sc.parallelize(l)
scala> dist.saveAsTextFile("hdfs://node1.i3a.info/user/jarias/test/")

15/10/02 10:19:22 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
15/10/02 10:19:22 INFO SparkContext: Starting job: saveAsTextFile at <console>:27
15/10/02 10:19:22 INFO DAGScheduler
Hi,
I have 2 stages in my job: map, and save-as-text-file. During the save-as-text-file stage I am getting an exception:
15/09/24 15:38:16 WARN AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.
via the --configurations flag on the “aws emr create-cluster” command.
Thanks,
Ewan
From: Alexander Pivovarov [mailto:apivova...@gmail.com]
Sent: 03 September 2015 00:12
To: Neil Jonkers
Cc: user@spark.apache.org
Subject: Re: spark 1.4.1 saveAsTextFile is slow on emr-4.0.0
Hi Neil
Yes! It helps!!! I do not see _temporary in the console output anymore. saveAsTextFile is fast now.
2015-09-02 23:07:00,022 INFO [task-result-getter-0] scheduler.TaskSetManager (Logging.scala:logInfo(59)) - Finished task 18.0 in stage 0.0 (TID 18) in 4398 ms on ip-10-0-24-103.ec2.internal
Should I use DirectOutputCommitter?
spark.hadoop.mapred.output.committer.class com.appsflyer.spark.DirectOutputCommitter
On Tue, Sep 1, 2015 at 4:01 PM, Alexander Pivovarov wrote:
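A hedged sketch of setting that committer programmatically (assuming the DirectOutputCommitter class is on the classpath):

import org.apache.spark.{SparkConf, SparkContext}

// The spark.hadoop. prefix forwards the property to the Hadoop JobConf,
// replacing the default committer and its slow _temporary rename step.
val conf = new SparkConf()
  .set("spark.hadoop.mapred.output.committer.class",
       "com.appsflyer.spark.DirectOutputCommitter")
val sc = new SparkContext(conf)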
I run Spark 1.4.1 on Amazon AWS EMR 4.0.0.
For some reason saveAsTextFile is very slow on emr-4.0.0 in comparison to emr-3.8 (it was 5 sec, now it is 95 sec).
Actually, saveAsTextFile reports that it is done in 4.356 sec, but after that I see lots of INFO messages with 404 errors from com.amazonaws.la
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com
On Mon, Aug 10, 2015 at 7:08 AM, Yasemin Kaya wrote:
Hi,
I have an EC2 cluster and am using Spark 1.3, YARN, and HDFS. When I submit locally there is no problem, but when I run on the cluster, saveAsTextFile doesn't work. It says: "User class threw exception: Output directory hdfs://172.31.42.10:54310/./weblogReadResult"
Hi,
Try using coalesce(1) before calling saveAsTextFile().
Thanks & Regards,
Meethu M
On Wednesday, 5 August 2015 7:53 AM, Brandon White wrote:
What is the best way to make saveAsTextFile save as only a single file?
Just to further clarify, you can first call coalesce with argument 1 and then
call saveAsTextFile. For example,
rdd.coalesce(1).saveAsTextFile(...)
Mohammed
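One caveat: coalesce(1) funnels every record through a single task, which is why the large jobs elsewhere in this thread hang. A hedged alternative sketch: write with full parallelism, then merge the part files with Hadoop's FileUtil.copyMerge (present in Hadoop 2.x, removed in 3.x; paths are assumptions):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

// Write part files in parallel first...
rdd.saveAsTextFile("hdfs:///tmp/out-parts")

// ...then concatenate them into a single file.
val conf = new Configuration()
val fs = FileSystem.get(conf)
FileUtil.copyMerge(fs, new Path("hdfs:///tmp/out-parts"),
  fs, new Path("hdfs:///tmp/out-single.txt"),
  false, conf, null)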
From: Mohammed Guller
Sent: Tuesday, August 4, 2015 9:39 PM
To: 'Brandon White'; user
Subject: RE: Combining Spark
One option is to use the coalesce method in the RDD class.
Mohammed
From: Brandon White [mailto:bwwintheho...@gmail.com]
Sent: Tuesday, August 4, 2015 7:23 PM
To: user
Subject: Combining Spark Files with saveAsTextFile
What is the best way to make saveAsTextFile save as only a single file?
val l = applySchema(outputRecords, schemaName).cache()
l.saveAsTextFile(filename + ".txt")
l.saveAsParquetFile(filename + ".parquet")
Expected result: when we do saveAsTextFile, the computation should happen and cache the result, and the second time, when we do saveAsParquetFile, it should reuse the cached data.
Thanks for the help.
The following are the folders I was trying to write to:
saveAsTextFile("file:///home/someuser/test2/testupload/20150708/0/")
saveAsTextFile("file:///home/someuser/test2/testupload/20150708/1/")
saveAsTextFile("file:///home/someuser/te
Getting an exception when writing an RDD to local disk using the following function:
saveAsTextFile("file:home/someuser/dir2/testupload/20150708/")
The dir (/home/someuser/dir2/testupload/) was created before running the job. The error message is misleading.
org.apache.spark.SparkExce
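One thing to check: "file:home/..." is not an absolute file URI, and that alone can produce a confusing failure. A minimal sketch of the usual form (the path is hypothetical):

// Use file:/// plus an absolute path so every executor resolves
// the same local location.
rdd.saveAsTextFile("file:///home/someuser/dir2/testupload/20150708/")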
This may sound like an obvious question, but are you sure that the program is doing any work when you don't have a saveAsTextFile? If there are transformations but no actions to actually collect the data, there is no need for Spark to execute the transformations.
As to the question o
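A minimal sketch of that point (names are hypothetical), assuming nothing else in the job forces evaluation:

// Transformations are lazy: this line does no work by itself.
val transformed = input.map(expensiveTransform)
// Only an action such as saveAsTextFile triggers the whole pipeline,
// which is why removing it makes the job look instant.
transformed.saveAsTextFile("hdfs:///tmp/result")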
save the result to HDFS using saveAsTextFile.
Problem: if I don't add saveAsTextFile, the program runs very fast (a few seconds); otherwise it is extremely slow, taking about 30 mins.
My program (very simple):
public static void main(String[]
What am I doing wrong?
Thank you!
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/saveAsTextFile-part-files-are-missing-tp22974.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
at this.
scala> model.freqItemsets.saveAsTextFile("c:///repository/trunk/Scala_210_wspace/fpGrowth/modelText1")
15/05/20 14:07:47 INFO SparkContext: Starting job: saveAsTextFile at <console>:33
15/05/20 14:07:47 INFO DAGScheduler: Got job 15 (saveAsTextFile at <console>:33)
Could you post the stack trace? If you are using Spark 1.3 or 1.4, it would be easier to save the freq itemsets as a Parquet file. -Xiangrui
On Wed, May 20, 2015 at 12:16 PM, Eric Tanner wrote:
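A hedged sketch of Xiangrui's suggestion (assuming a SQLContext in scope; FreqItemset exposes items and freq fields). The write API below is Spark 1.4 style; on 1.3, saveAsParquetFile plays the same role:

import sqlContext.implicits._

// Convert the freq itemsets to a DataFrame and store it as Parquet,
// sidestepping the text-serialization path that hits the NPE here.
val df = model.freqItemsets
  .map(fi => (fi.items, fi.freq))
  .toDF("items", "freq")
df.write.parquet("/fpGrowth/model-parquet")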
I am having trouble with saving an FP-Growth model as a text file. I can
print out the results, but when I try to save the model I get a
NullPointerException.
model.freqItemsets.saveAsTextFile("c://fpGrowth/model")
Thanks,
Eric
All - this issue showed up when I was tearing down a Spark context and creating a new one. Often, I was unable to then write to HDFS due to this error. I subsequently switched to a different implementation where, instead of tearing down and re-initializing the Spark context, I'd instead submit a sepa
I am seeing this on Hadoop version 2.4.0.
Thanks for your suggestions, I will try those and let you know if they help!
On Sat, May 16, 2015 at 1:57 AM, Steve Loughran wrote:
> What version of Hadoop are you seeing this on?
What version of Hadoop are you seeing this on?
On 15 May 2015, at 20:03, Puneet Kapoor <puneet.cse.i...@gmail.com> wrote:
Hey,
Did you find any solution for this issue? We are seeing similar logs in our Data Node logs. Appreciate any help.
2015-05-15 10:51:43,615 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: NttUpgradeDN1:50010:DataXceiver error processing WRITE_BLOCK operation src: /192.168.112.190:46253
instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1550)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3447)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:642)

I tried this with Spark 1.2.1: same error.
I have plenty of space on the DFS.
The Name Node, Sec Name Node & the one Data Node are all healthy.
Any hint as to what may be the problem?
Thanks in advance.
Sudarshan
--
View this message in context: saveAsTextFile() to save output of Spark program to HDFS
<http://apache-spark-user-list.1001560.n3.nabble.com/saveAsTextFile-to-save-output-of-Spark-program-to-HDFS-tp22774.html>
Sent from the Apache Spark User List mailing list archive at Nabble.com.
SPARK-3007
On Tue, Apr 21, 2015 at 5:45 PM, Arun Luthra wrote:
Is there an efficient way to save an RDD with saveAsTextFile in such a way that the data gets shuffled into separate directories according to a key? (My end goal is to wrap the result in a multi-partitioned Hive table.)
Suppose you have:
case class MyData(val0: Int, val1: String, directory_name
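A hedged sketch of one common approach: key each record by its directory name and use Hadoop's MultipleTextOutputFormat so each key lands in its own subdirectory (the class and field names here are assumptions):

import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

class KeyDirOutputFormat extends MultipleTextOutputFormat[Any, Any] {
  // Put each record under <key>/part-NNNNN.
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    s"$key/$name"
  // Drop the key from the written line; only the value is saved.
  override def generateActualKey(key: Any, value: Any): Any = NullWritable.get()
}

rdd.map(d => (d.directoryName, d.toString))
  .saveAsHadoopFile("hdfs:///out", classOf[String], classOf[String],
    classOf[KeyDirOutputFormat])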
Not sure if this will help, but try clearing your jar cache directories (~/.ivy2 for sbt and ~/.m2 for maven).
Thanks
Best Regards
On Wed, Apr 15, 2015 at 9:33 PM, Manoj Samel wrote:
in different "files", which are really directories containing
>>> partitions, as is common in Hadoop. You can move them later, or just
>>> read them where they are.
>>>
>>> On Thu, Apr 16, 2015 at 6:32 PM, Vadim Bichutskiy
>>> wrote:
>>>
files and directories
From: Vadim Bichutskiy [mailto:vadim.bichuts...@gmail.com]
Sent: Thursday, April 16, 2015 6:45 PM
To: Evo Eftimov
Subject: Re: saveAsTextFile
Thanks Evo for your detailed explanation.
On Apr 16, 2015, at 1:38 PM, Evo Eftimov wrote:
The reason for this is
Cc: user@spark.apache.org
Subject: Re: saveAsTextFile
Thanks Sean. I want to load each batch into Redshift. What's the best/most efficient way to do that?
Vadim
> On Apr 16, 2015, at 1:35 PM, Sean Owen wrote:
> You can't, since that's how it's designed to work. Batches are saved
Nope Sir, it is possible - check my reply earlier
-----Original Message-----
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Thursday, April 16, 2015 6:35 PM
To: Vadim Bichutskiy
Cc: user@spark.apache.org
Subject: Re: saveAsTextFile
You can't, since that's how it's designed to work.
HDFS adapter and invoke it in foreachRDD and foreach
Regards
Evo Eftimov
From: Vadim Bichutskiy [mailto:vadim.bichuts...@gmail.com]
Sent: Thursday, April 16, 2015 6:33 PM
To: user@spark.apache.org
Subject: saveAsTextFile
I am using Spark Streaming where during each micro-batch I output data to S3 using saveAsTextFile. Right now each batch of data is put into its own directory containing 2 objects, "_SUCCESS" and "part-0."
How do I output each batch into a common directory?
Thanks,
Vadim
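A hedged sketch of Evo's approach: drive the save yourself in foreachRDD so every batch lands under one common parent directory (the bucket and layout are hypothetical):

// Each batch writes a time-stamped subdirectory of one common root,
// so downstream jobs can read the whole root as a single dataset.
dstream.foreachRDD { (rdd, time) =>
  if (!rdd.isEmpty()) {
    rdd.saveAsTextFile(s"s3n://my-bucket/common-root/batch-${time.milliseconds}")
  }
}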
Env - Spark 1.3, Hadoop 2.3, Kerberos
xx.saveAsTextFile(path, codec) gives the following trace. The same works with Spark 1.2 in the same environment.
val codec = classOf[]
val a = sc.textFile("/some_hdfs_file")
a.saveAsTextFile("/some_other_hdfs_file", codec) fails with the following trace in Spark 1.3, works i
Ignore the question. There was a Hadoop setting that needed to be set to
get it working.
--
Kannan
On Wed, Apr 1, 2015 at 1:37 PM, Kannan Rajah wrote:
Running a simple word count job in standalone mode as a non-root user from spark-shell. The Spark master and worker services are running as the root user. The problem is that the _temporary dir under /user/krajah/output2/_temporary/0 is being created with root permission even when running the job as a non-root user.
Hi Team,
I'm getting the below exception when saving the results into Hadoop.
Code:
rdd.saveAsTextFile("hdfs://localhost:9000/home/rajesh/data/result.rdd")
Could you please help me resolve this issue?
15/03/13 17:19:31 INFO spark.SparkContext: Starting job: sa
From the WebUI, the job is split into two stages: saveAsTextFile and mapToPair. MapToPair finished in 8 mins, while saveAsTextFile took ~15 mins to reach (2366/2373) progress, and the last few tasks just took forever and never finished.
Cluster setup:
8 nodes
on each nod