Please find the code below.

    sankar2 <- read.df("/nfspartition/sankar/test/2016/08/test.json")

I tried these two commands:

    write.df(sankar2, "/nfspartition/sankar/test/test.csv", "csv", header = "true")
    saveDF(sankar2, "sankartest.csv", source = "csv", mode = "append", schema = "true")
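For reference, the same write with the mode = "overwrite" setting Kevin suggests below would look roughly like this (untested sketch; argument names as in the SparkR write.df documentation linked further down):

    # Untested sketch: same write as above, but with mode = "overwrite" so Spark
    # replaces any existing output directory instead of failing on it
    # (sankar2 is the DataFrame read from the JSON file above).
    write.df(sankar2,
             path = "/nfspartition/sankar/test/test.csv",
             source = "csv",
             mode = "overwrite",
             header = "true")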
On Tue, Sep 20, 2016 at 9:40 PM, Kevin Mellott <kevin.r.mell...@gmail.com> wrote:

> Can you please post the line of code that is doing the df.write command?
>
> On Tue, Sep 20, 2016 at 9:29 AM, Sankar Mittapally <sankar.mittapally@creditvidya.com> wrote:
>
>> Hey Kevin,
>>
>> It is an empty directory. Spark is able to write the part files to the
>> directory, but we get the above error while it is merging those part files.
>>
>> Regards
>>
>> On Tue, Sep 20, 2016 at 7:46 PM, Kevin Mellott <kevin.r.mell...@gmail.com> wrote:
>>
>>> Have you checked to see if any files already exist at
>>> /nfspartition/sankar/banking_l1_v2.csv? If so, you will need to delete
>>> them before attempting to save your DataFrame to that location (see the
>>> sketch at the end of this thread). Alternatively, you may be able to set
>>> the "mode" option of the df.write operation to "overwrite", depending on
>>> the version of Spark you are running.
>>>
>>> *ERROR (from log)*
>>> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir
>>> [/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/.part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv.crc]:
>>> it still exists.
>>> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir
>>> [/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv]:
>>> it still exists.
>>>
>>> *df.write Documentation*
>>> http://spark.apache.org/docs/latest/api/R/write.df.html
>>>
>>> Thanks,
>>> Kevin
>>>
>>> On Tue, Sep 20, 2016 at 12:16 AM, sankarmittapally
>>> <sankar.mittapa...@creditvidya.com> wrote:
>>>
>>>> We have set up a Spark cluster on NFS shared storage. There are no
>>>> permission issues with the NFS storage; all users are able to write to
>>>> it. When I run the write.df command in SparkR, I get the error below.
>>>> Can someone please help me fix this issue?
>>>>
>>>> 16/09/17 08:03:28 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
>>>> java.io.IOException: Failed to rename DeprecatedRawLocalFileStatus
>>>> {path=file:/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv;
>>>> isDirectory=false; length=436486316; replication=1; blocksize=33554432;
>>>> modification_time=1474099400000; access_time=0; owner=; group=;
>>>> permission=rw-rw-rw-; isSymlink=false}
>>>> to file:/nfspartition/sankar/banking_l1_v2.csv/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv
>>>> at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:371)
>>>> at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:384)
>>>> at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:326)
>>>> at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222)
>>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144)
>>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>>> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>>>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>>>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>>>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>>>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>>>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>>>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>>> at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>>>> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
>>>> at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
>>>> at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
>>>> at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:487)
>>>> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
>>>> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>>> at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
>>>> at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:86)
>>>> at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38)
>>>> at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>>>> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>>>> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>>>> at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>>>> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>>>> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>>>> at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
>>>> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>>>> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>>>> at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>>>> at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>>>> at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>>>> at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>>>> at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>>>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>>>> at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>>>> at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir
>>>> [/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/.part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv.crc]:
>>>> it still exists.
>>>> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir
>>>> [/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv]:
>>>> it still exists.
>>>> 16/09/17 08:03:28 ERROR DefaultWriterContainer: Job job_201609170803_0000 aborted.
>>>> 16/09/17 08:03:28 ERROR RBackendHandler: save on 625 failed
>>>> Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
>>>>   org.apache.spark.SparkException: Job aborted.
>>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:149)
>>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>>> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>>>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>>>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>>>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.doE
>>>
>>
>> --
>> Regards
>>
>> Sankar Mittapally
>> Senior Software Engineer
>>
>

--
Regards

Sankar Mittapally
Senior Software Engineer
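A quick sketch of Kevin's other suggestion above (clearing the existing output directory before re-running the save), in plain R. This is untested and only illustrative: "df" is a placeholder for whatever DataFrame is being saved, and the path is the output directory from the log above.

    # Untested sketch of the "delete it first" approach: remove any leftover
    # output (including the _temporary and .crc files named in the warnings
    # above) before writing again. "df" is a placeholder DataFrame.
    out <- "/nfspartition/sankar/banking_l1_v2.csv"
    if (dir.exists(out)) {
      unlink(out, recursive = TRUE)   # clears stale part files and .crc files
    }
    write.df(df, path = out, source = "csv", header = "true")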