Thanks much for your help.

Here's what was happening ...
The HDP VM was running in VirtualBox, and the host was connected to the guest
VM in NAT mode. When I connected it in "Bridged Adapter" mode, it worked!
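
For anyone else who hits this, the switch can also be scripted with
VBoxManage. A minimal sketch, assuming the VM is named "Hortonworks_Sandbox"
and the host NIC is eth0 (substitute your own names):

  # power the VM off, switch adapter 1 from NAT to bridged, restart
  VBoxManage controlvm "Hortonworks_Sandbox" poweroff
  VBoxManage modifyvm "Hortonworks_Sandbox" --nic1 bridged --bridgeadapter1 eth0
  VBoxManage startvm "Hortonworks_Sandbox" --type headless

In NAT mode the host can reach forwarded ports on the guest, but the DataNode
address the NameNode hands back is not reachable from the host, which would
explain why the client treated the one DN as excluded.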

On Tue, May 5, 2015 at 8:54 PM, ayan guha <guha.a...@gmail.com> wrote:

> Try to add one more data node, or set min replication to 0. HDFS is trying
> to replicate at least one more copy and is not able to find another DN to do
> that.
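>
> For reference, a sketch of the relevant hdfs-site.xml settings (values are
> illustrative for a single-DataNode sandbox; check your own config):
>
>   <!-- hdfs-site.xml -->
>   <property>
>     <name>dfs.replication</name>
>     <value>1</value>   <!-- only one DN available -->
>   </property>
>   <property>
>     <name>dfs.namenode.replication.min</name>
>     <value>1</value>   <!-- minimum replicas for a write to succeed -->
>   </property>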
> On 6 May 2015 09:37, "Sudarshan Murty" <njmu...@gmail.com> wrote:
>
>> Another thing: could it be a permission problem?
>> It creates the whole directory structure
>> /tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_000001_3/part-00001
>> so I am guessing not.
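>>
>> A quick way to rule permissions out (paths as above; the -chmod is for a
>> sandbox test only):
>>
>>   hdfs dfs -ls -R /tmp/wordcount    # check owner/permissions on the output
>>   hdfs dfs -chmod -R 777 /tmp/wordcount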
>>
>> On Tue, May 5, 2015 at 7:27 PM, Sudarshan Murty <njmu...@gmail.com>
>> wrote:
>>
>>> You are most probably right. I assumed others may have run into this.
>>> When I try to put the files in there, it creates the directory structure
>>> with the part-00000 and part-00001 files, but these files are of size 0 -
>>> no content. The client error and the server logs show the error message
>>> below, which seems to indicate that the system is aware that a datanode
>>> exists but excludes it from the operation. So it does not look like the
>>> DN is partitioned off, and Ambari indicates that HDFS is in good health
>>> with one NN, one secondary NN, and one DN.
>>> I am unable to figure out what the issue is.
>>> Thanks for your help.
>>>
>>> On Tue, May 5, 2015 at 6:39 PM, ayan guha <guha.a...@gmail.com> wrote:
>>>
>>>> What happens when you try to put files into your HDFS from the local
>>>> filesystem? It looks like an HDFS issue rather than a Spark thing.
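>>>> For example, something like this (paths are illustrative):
>>>>
>>>>   echo "hello hdfs" > /tmp/hello.txt
>>>>   hdfs dfs -put /tmp/hello.txt /tmp/hello.txt
>>>>   hdfs dfs -cat /tmp/hello.txt
>>>>
>>>> If the put fails with the same "could only be replicated to 0 nodes"
>>>> error, the problem is in HDFS or the network path to it, not in Spark.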
>>>> On 6 May 2015 05:04, "Sudarshan" <njmu...@gmail.com> wrote:
>>>>
>>>>> I have searched all the replies to this question and not found an answer.
>>>>>
>>>>> I am running standalone Spark 1.3.1 and the Hortonworks HDP 2.2 VM, side
>>>>> by side on the same machine, and am trying to write the output of a
>>>>> wordcount program into HDFS (it works fine writing to a local file,
>>>>> /tmp/wordcount).
>>>>>
>>>>> The only line I added to the wordcount program is (where 'counts' is the
>>>>> JavaPairRDD):
>>>>>
>>>>> counts.saveAsTextFile("hdfs://sandbox.hortonworks.com:8020/tmp/wordcount");
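>>>>>
>>>>> For context, a minimal sketch of the driver (input path and app name are
>>>>> placeholders, not my exact program; the master is supplied via
>>>>> spark-submit):
>>>>>
>>>>>   import java.util.Arrays;
>>>>>   import org.apache.spark.SparkConf;
>>>>>   import org.apache.spark.api.java.JavaPairRDD;
>>>>>   import org.apache.spark.api.java.JavaRDD;
>>>>>   import org.apache.spark.api.java.JavaSparkContext;
>>>>>   import scala.Tuple2;
>>>>>
>>>>>   public class WordCount {
>>>>>     public static void main(String[] args) {
>>>>>       JavaSparkContext sc =
>>>>>           new JavaSparkContext(new SparkConf().setAppName("WordCount"));
>>>>>       JavaRDD<String> lines = sc.textFile("/tmp/input.txt");  // placeholder input
>>>>>       JavaPairRDD<String, Integer> counts = lines
>>>>>           .flatMap(s -> Arrays.asList(s.split(" ")))  // Spark 1.x flatMap returns Iterable
>>>>>           .mapToPair(w -> new Tuple2<>(w, 1))
>>>>>           .reduceByKey((a, b) -> a + b);
>>>>>       counts.saveAsTextFile("hdfs://sandbox.hortonworks.com:8020/tmp/wordcount");
>>>>>       sc.stop();
>>>>>     }
>>>>>   }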
>>>>>
>>>>> When I check in HDFS at that location (/tmp) here's what I find.
>>>>> /tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_000000_2/part-00000
>>>>> and
>>>>> /tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_000001_3/part-00001
>>>>>
>>>>> and both part-000[01] are 0-size files.
>>>>>
>>>>> The wordcount client output error is:
>>>>> [Stage 1:>                                                  (0 + 2) / 2]
>>>>> 15/05/05 14:40:45 WARN DFSClient: DataStreamer Exception
>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>>>> /tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_000001_3/part-00001
>>>>> could only be replicated to 0 nodes instead of minReplication (=1).
>>>>> There are 1 datanode(s) running and 1 node(s) are excluded in this
>>>>> operation.
>>>>>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1550)
>>>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3447)
>>>>>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:642)
>>>>>
>>>>>
>>>>> I tried this with Spark 1.2.1; same error.
>>>>> I have plenty of space on the DFS.
>>>>> The NameNode, Secondary NameNode, and the one DataNode are all healthy.
>>>>>
>>>>> Any hint as to what the problem may be?
>>>>> Thanks in advance.
>>>>> Sudarshan
>>>>>
>>>>>
>>>>
>>>
>>
