Re: Could not get FileSystem obj, get java.lang.NullPointerException !!

2010-06-23 Thread chaitanya krishna
Hi Elton, Can you mention the hadoop version? Also, can you double-check if you set "fs.default.name" property correctly in conf/hdfs-site.xml? -Chaitanya. On Thu, Jun 24, 2010 at 12:12 PM, elton sky wrote: > Hi, > I am new to hadoop programming. I am trying to copy a local file to HDFS. > M

Could not get FileSystem obj, get java.lang.NullPointerException !!

2010-06-23 Thread elton sky
Hi, I am new to hadoop programming. I am trying to copy a local file to HDFS. My code snippet is:

    . .
    Configuration conf = new Configuration();
    InputStream in = null;
    OutputStream out = null;
    try {
        in = new BufferedInputStream(new FileInputStream(src));

[jira] Created: (HADOOP-6838) Investigate Eclipse API Tools for enforcing or reporting on API compatibility

2010-06-23 Thread Tom White (JIRA)
Investigate Eclipse API Tools for enforcing or reporting on API compatibility -- Key: HADOOP-6838 URL: https://issues.apache.org/jira/browse/HADOOP-6838 Project: Hadoop Commo

Re: Questions about recommendation value of the "io.sort.mb" parameter

2010-06-23 Thread Yu Li
Hi Todd, Thanks a lot for your further explanation! It makes this parameter much clearer to me. BTW, please allow me to express my thanks to everyone who helped. Best Regards, Carp On June 24, 2010 at 1:49 AM, Todd Lipcon wrote: > Plus there's some overhead for each record of map output. Specifically, 2

[jira] Created: (HADOOP-6837) Support for LZMA compression

2010-06-23 Thread Nicholas Carlini (JIRA)
Attachments: HADOOP-6837-lzma-java-20100623.patch Add support for LZMA (http://www.7-zip.org/sdk.html) compression, which generally achieves higher compression ratios than both gzip and bzip2.

Re: Questions about recommendation value of the "io.sort.mb" parameter

2010-06-23 Thread Todd Lipcon
Plus there's some overhead for each record of map output. Specifically, 24 bytes. So if you output 64MB worth of data, but each of your objects is only 24 bytes long itself, you need more than 128MB worth of spill space for it. Last, the map output buffer begins spilling when it is partially full s
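
The arithmetic in the message above can be sketched in a few lines of Java. This is only an illustration: the 24-byte per-record overhead is the figure quoted in the message, the class and method names are hypothetical, and the estimate ignores the spill threshold that the message goes on to mention.

```java
public class SpillSpaceEstimate {
    // Per-record accounting overhead as quoted in the message (assumption: 24 bytes).
    static final long PER_RECORD_OVERHEAD_BYTES = 24;

    // Estimates the buffer space needed to hold outputBytes of map output
    // when each record averages avgRecordBytes: payload plus per-record overhead.
    static long bufferBytesNeeded(long outputBytes, long avgRecordBytes) {
        // Round the record count up so a partial trailing record is still counted.
        long records = (outputBytes + avgRecordBytes - 1) / avgRecordBytes;
        return outputBytes + records * PER_RECORD_OVERHEAD_BYTES;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 64 MB of output made of 24-byte records, the case from the message.
        long needed = bufferBytesNeeded(64 * mb, 24);
        System.out.printf("%.2f MB%n", (double) needed / mb);
    }
}
```

With 24-byte records the overhead equals the payload, so the estimate comes out at roughly double the 64 MB of output, which matches the "more than 128MB" figure in the message. Larger records dilute the overhead: the same call with an average record size of 240 bytes adds only about 10%.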

Re: New attachment added to page download on Hadoop Wiki

2010-06-23 Thread Tsz Wo (Nicholas), Sze
We are getting more spam on the Hadoop wiki pages. How should we deal with it? Nicholas Sze - Original Message > From: Apache Wiki > To: Apache Wiki > Sent: Wed, June 23, 2010 8:31:37 AM > Subject: New attachment added to page download on Hadoop Wiki > > Dear Wiki user, You have

Re: Questions about recommendation value of the "io.sort.mb" parameter

2010-06-23 Thread 李钰
Hi Jeff, Thanks for your quick reply. Seems my thinking is stuck on the job style I'm running. Now I'm much clearer about it. Best Regards, Carp 2010/6/23 Jeff Zhang > Hi 李钰 > > The size of map output depends on your Mapper class. The Mapper class > will do processing on the input data. > > >

Re: Questions about recommendation value of the "io.sort.mb" parameter

2010-06-23 Thread Jeff Zhang
Hi 李钰 The size of map output depends on your Mapper class. The Mapper class will do processing on the input data. 2010/6/23 李钰 : > Hi Sriguru, > > Thanks a lot for your comments and suggestions! > Here I still have some questions: since map mainly does data preparation, > say split input data int

Re: Questions about recommendation value of the "io.sort.mb" parameter

2010-06-23 Thread 李钰
Hi Sriguru, Thanks a lot for your comments and suggestions! Here I still have some questions: since map mainly does data preparation, say splitting input data into KVPs, then sorting and partitioning before spill, would the size of map output KVPs be much larger than the input data size? If not, since one map task

RE: Questions about recommendation value of the "io.sort.mb" parameter

2010-06-23 Thread Srigurunath Chakravarthi
Hi Carp, Your assumption is right that this is a per-map-task setting. However, this buffer stores map output KVPs, not input. Therefore the optimal value depends on how much data your map task is generating. If your output per map is greater than io.sort.mb, these rules of thumb that could wor