Re: How to write large string to file in HDFS

nguyenhuynh.mr Wed, 29 Apr 2009 03:15:35 -0700

Wang Zhong wrote:

> You can try using FSDataOutputStream in the reduce phase. Create a file
> with FSDataOutputStream as shown below:
>
> ====
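> FileSystem fs = FileSystem.get(conf);
> // fs.create() returns an FSDataOutputStream, which provides writeChars()
> FSDataOutputStream os = fs.create(path);
> os.writeChars(str); // writes each char as 2 bytes (UTF-16)
> os.close();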
> ====
>
> You should call writeChars for each value as you iterate over them,
> rather than accumulating everything in a StringBuffer. The key should be
> part of your file name to indicate the group of URIs.
>
>
> On Wed, Apr 29, 2009 at 2:56 PM, nguyenhuynh.mr
> <[email protected]> wrote:
>   
>> Wang Zhong wrote:
>>
>>     
>>> Where did you get the large string? Can't you generate the string one
>>> line at a time and append it to a local file, then upload it to HDFS
>>> when finished?
>>>
>>> On Wed, Apr 29, 2009 at 10:47 AM, nguyenhuynh.mr
>>> <[email protected]> wrote:
>>>
>>>       
>>>> Hi all!
>>>>
>>>>
>>>> I have a large String and I want to write it to a file in HDFS.
>>>>
>>>> (The string has more than 100,000 lines.)
>>>>
>>>>
>>>> Currently, I use the copyBytes method of org.apache.hadoop.io.IOUtils.
>>>> But copyBytes requires an InputStream for the content, so I have
>>>> to convert the String to an InputStream, something like:
>>>>
>>>>
>>>>
>>>>    InputStream in = new ByteArrayInputStream(sb.toString().getBytes());
>>>>
>>>>    where "sb" is a StringBuffer.
>>>>
>>>>
>>>> It does not work with the line above. :(
>>>>
>>>> This is the error:
>>>>
>>>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>>>    at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:232)
>>>>    at java.lang.StringCoding.encode(StringCoding.java:272)
>>>>    at java.lang.String.getBytes(String.java:947)
>>>>    at asnet.haris.mapred.jobs.Test.main(Test.java:32)
>>>>
>>>>
>>>>
>>>> Please suggest a good solution!
>>>>
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> Best regards,
>>>>
>>>> Nguyen,
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>         
>>>
>>>
>>>       
>> Thanks for your answer!
>>
>> I have a Map/Reduce job that partitions URIs from HBase into groups.
>> In the map phase, I get the group name of each URI and collect the
>> output <groupname, uri>.
>> In the reduce phase, I build a String from the URIs of the partition
>> and save it into HDFS.
>> Each group is one file.
>>
>> Thanks,
>>
>> Best regards,
>> NguyenHuynh.
>>
>>
>>     
>
>
>
>   
Thanks very much!


Best,
Nguyen.
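A minimal sketch of the approach suggested in the thread (write each value as it arrives instead of building one large String), assuming the reducer receives the URIs of a group one at a time. In Hadoop, fs.create(path) returns an FSDataOutputStream, which is an OutputStream, so helpers like these could be handed that stream; here a ByteArrayOutputStream stands in so the example runs without Hadoop. The names StreamingWrite, writeLines and writeChunked are hypothetical, and UTF-8 through an OutputStreamWriter is used instead of writeChars() (which writes UTF-16), as an assumption about the desired file encoding.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;

public class StreamingWrite {

    // Hypothetical helper: write each value as one UTF-8 line as it arrives,
    // so the whole group never exists in memory as a single String or byte[].
    // Assumption: in a Hadoop reducer, `out` could be the FSDataOutputStream
    // returned by fs.create(path), with the key used in the file name.
    static void writeLines(OutputStream out, Iterator<String> values) throws IOException {
        Writer w = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        while (values.hasNext()) {
            w.write(values.next());
            w.write('\n');
        }
        w.flush(); // the caller still owns `out` and is responsible for closing it
    }

    // Hypothetical helper: if the data is already in a StringBuilder/StringBuffer,
    // encode it slice by slice instead of calling sb.toString().getBytes(),
    // which allocates a full String copy and then a full byte[] on top of it.
    static void writeChunked(OutputStream out, CharSequence sb, int chunkSize) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        for (int start = 0; start < sb.length(); start += chunkSize) {
            int end = Math.min(sb.length(), start + chunkSize);
            w.append(sb, start, end); // only one chunk is copied at a time
        }
        w.flush();
    }

    public static void main(String[] args) throws IOException {
        // ByteArrayOutputStream stands in for the HDFS output stream.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLines(out, Arrays.asList("http://a/1", "http://a/2").iterator());
        System.out.print(out.toString("UTF-8"));
    }
}
```

writeChunked only matters if the content has already been accumulated; streaming the values directly with writeLines avoids holding the group in memory at all, which is the point of the advice above.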
