Re: MultithreadedMapper

2012-07-26 Thread kenyh
With a multithreaded mapper, there are more chances to combine the mapper output. Meanwhile, the locality of some global data will also be better. But the implementation in Hadoop 1.0.2 uses heavy synchronization, which adds a lot of overhead. Are there any optimizations for the multithreaded mapper? s

Re: regarding _HOST token replacement in security hadoop

2012-07-26 Thread Wangwenli
Could you spend one minute to check whether the code below will cause an issue or not? In org.apache.hadoop.hdfs.server.namenode.NameNode.loginAsNameNodeUser(), it uses socAddr.getHostName() to get _HOST, but in org.apache.hadoop.security.SecurityUtil.replacePattern(), in getLocalHostName(), it uses get

Re: regarding _HOST token replacement in security hadoop

2012-07-26 Thread Arpit Gupta
You need to use HTTP/_h...@site.com, as that is the principal needed by SPNEGO. So you would need to create the HTTP/_HOST principal and add it to the same keytab (/home/hdfs/keytab/nn.service.keytab). -- Arpit Gupta Hortonworks Inc. http://hortonworks.com/ On Jul 26, 2012, at 6:54 PM, Wangwenli w

Re: regarding _HOST token replacement in security hadoop

2012-07-26 Thread Wangwenli
Thanks for your response. I am using hadoop-2.0.0-alpha from the Apache site. In which version should it be configured with HTTP/_h...@site.com? I think not in hadoop-2.0.0-alpha, because I logged in successfully with another principal. Please refer to the log below: 2012-07-23 22:48:17,303 INFO org.apache.hadoop.security

Re: regarding _HOST token replacement in security hadoop

2012-07-26 Thread Arpit Gupta
What version of hadoop are you using? Also, dfs.web.authentication.kerberos.principal should be set to HTTP/_h...@site.com -- Arpit Gupta Hortonworks Inc. http://hortonworks.com/ On Jul 26, 2012, at 6:11 PM, Wangwenli wrote: > Hi all, > > I configured like below in hdfs-site.xml: > > > d

regarding _HOST token replacement in security hadoop

2012-07-26 Thread Wangwenli
Hi all, I configured hdfs-site.xml as below: dfs.namenode.kerberos.principal = nn/_HOST@site and dfs.web.authentication.kerberos.principal = nn/_HOST@site. When starting up the namenode, I found that the namenode uses the principal nn/167-52-0-56@site to log in, but the http server will
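
A minimal Java sketch of the _HOST substitution this post describes, assuming the principal has the form service/_HOST@REALM. This is not Hadoop's own code: the class HostTokenSketch, the method replaceHostToken, and the hostname nn1.example.com are hypothetical names used only for illustration. It shows that the resolved principal depends entirely on which hostname is passed in, so two code paths that obtain the hostname differently can end up with different principals.

```java
import java.util.Locale;

// Hypothetical illustration only; not the Hadoop SecurityUtil implementation.
final class HostTokenSketch {
    static final String HOST_TOKEN = "_HOST";

    // Replaces the _HOST component of a "service/_HOST@REALM" principal
    // with the given hostname (lower-cased). Any other input is returned as-is.
    static String replaceHostToken(String principal, String hostname) {
        String[] parts = principal.split("[/@]");
        if (parts.length != 3 || !HOST_TOKEN.equals(parts[1])) {
            return principal;
        }
        return parts[0] + "/" + hostname.toLowerCase(Locale.ROOT) + "@" + parts[2];
    }

    public static void main(String[] args) {
        String pattern = "nn/_HOST@site";
        // Hostname as reported in the post.
        System.out.println(replaceHostToken(pattern, "167-52-0-56"));
        // A different (hypothetical) hostname yields a different principal.
        System.out.println(replaceHostToken(pattern, "nn1.example.com"));
    }
}
```

If the two lookups disagree, the keytab would need an entry matching whichever principal each component actually resolves.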

[jira] [Created] (HADOOP-8626) Typo in default setting for hadoop.security.group.mapping.ldap.search.filter.user

2012-07-26 Thread Jonathan Natkins (JIRA)
Jonathan Natkins created HADOOP-8626: Summary: Typo in default setting for hadoop.security.group.mapping.ldap.search.filter.user Key: HADOOP-8626 URL: https://issues.apache.org/jira/browse/HADOOP-8626

Re: MultithreadedMapper

2012-07-26 Thread syscokid
Why multithread the mapper? Just create more mappers. That way you spread the data load as well as the mapping load potentially across multiple nodes. kenyh wrote: > > I wonder if there are any optimization about the multithread mapper to > decrease the contention of input reading and output?

Re: MultithreadedMapper

2012-07-26 Thread Radim Kolar
But I found that synchronization is needed for record reading (reading the input key and value) and result output. I use Spring Batch for that. It has I/O buffering built in, and it is very easy to use and well documented.
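
The contention discussed in this thread can be sketched in plain Java. This is not Hadoop's MultithreadedMapper; the class SharedReaderSketch and the method runAll are hypothetical, and upper-casing a string stands in for the real map function. Several worker threads share one record reader and one output collector, and each takes a lock only while reading a record or writing a result, so the map work itself runs in parallel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of the locking pattern; not Hadoop code.
final class SharedReaderSketch {

    // Runs `threads` workers over `input`. Output order is not deterministic.
    static List<String> runAll(List<String> input, int threads) {
        final Iterator<String> reader = input.iterator(); // shared record reader
        final List<String> output = new ArrayList<>();    // shared output collector
        final Object readLock = new Object();
        final Object writeLock = new Object();

        Runnable worker = () -> {
            while (true) {
                String record;
                synchronized (readLock) {             // contention point 1: input
                    if (!reader.hasNext()) {
                        return;
                    }
                    record = reader.next();
                }
                String mapped = record.toUpperCase(); // "map" work runs unlocked
                synchronized (writeLock) {            // contention point 2: output
                    output.add(mapped);
                }
            }
        };

        List<Thread> pool = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(worker);
            pool.add(t);
            t.start();
        }
        try {
            for (Thread t : pool) {
                t.join(); // join also makes the workers' writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> result = runAll(Arrays.asList("a", "b", "c", "d"), 2);
        System.out.println(result.size() + " records mapped");
    }
}
```

When the per-record work is cheap, threads spend most of their time waiting on the two locks, which is consistent with the overhead reported at the start of the thread; the pattern pays off mainly when each record involves heavy computation or slow external I/O.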

Re: MultithreadedMapper

2012-07-26 Thread Doug Cutting
On Thu, Jul 26, 2012 at 7:42 AM, Robert Evans wrote: > About the only time that > MultiThreaded mapper makes a lot of sense is if there is a lot of > computation associated with each key/value pair. Or if the mapper does a lot of I/O to some external resource, e.g., a web crawler. Doug

Re: MultithreadedMapper

2012-07-26 Thread Robert Evans
In general, multithreading does not get you much in traditional Map/Reduce. If you want the mappers to run faster, you can drop the split size and get a similar result, because you get more parallelism. This is the use case that we have typically concentrated on. About the only time that MultiThread

[jira] [Created] (HADOOP-8625) Use GzipCodec to decompress data in ResetableGzipOutputStream test

2012-07-26 Thread Mike Percy (JIRA)
Mike Percy created HADOOP-8625: -- Summary: Use GzipCodec to decompress data in ResetableGzipOutputStream test Key: HADOOP-8625 URL: https://issues.apache.org/jira/browse/HADOOP-8625 Project: Hadoop Common

Build failed in Jenkins: Hadoop-Common-trunk #484

2012-07-26 Thread Apache Jenkins Server
See Changes: [szetszwo] HDFS-3696. Set chunked streaming mode in WebHdfsFileSystem write operations to get around a Java library bug causing OutOfMemoryError. [todd] Amend previous commit of HDFS-3626: accidentally included a hunk