Re: HDFS Blockreport question
Thanks! I'm already using Eclipse to browse the code.

In this scenario, am I right in understanding that Java serializes the object and its parameters and sends them over the network?

For example, if I wanted to make a pure C library (with no JNI interfaces), would that be possible/feasible, or would it be hellishly hard?

Thanks once again!!!

On Sat, Apr 3, 2010 at 1:54 AM, Ryan Rawson wrote:
> If you look at the getProxy code, it passes an "Invoker" (or something
> like that) which the proxy code uses to delegate calls TO. The
> Invoker will call another class, "Client", which has sub-classes like
> Call and Connection which wrap the actual Java IO. This all lives in
> the org.apache.hadoop.ipc package.
>
> Be sure to use a good IDE like IJ or Eclipse to browse the code, it
> makes following all this stuff much easier.
>
> On Fri, Apr 2, 2010 at 4:39 PM, Alberich de megres wrote:
>> Hi again!
>>
>> Could anyone help me?
>> I can't understand how the RPC class works. To me, it only seems to
>> instantiate an interface with no implementation for some methods,
>> like blockReport. But then it uses RPC.getProxy to get a new class
>> which exchanges messages with the name node.
>>
>> I'm sorry for this silly question, but I am really lost at this point.
>>
>> Thanks for the patience.
>>
>> On Fri, Apr 2, 2010 at 2:11 AM, Alberich de megres wrote:
>>> Hi Jay!
>>>
>>> Thanks for the answer, but I'm asking what exactly it sends:
>>> blockReport is a method of the DatanodeProtocol interface that has no
>>> implementation.
>>>
>>> Thanks!
>>>
>>> On Thu, Apr 1, 2010 at 5:50 PM, Jay Booth wrote:
>>>> In DataNode:
>>>> public DatanodeProtocol namenode
>>>>
>>>> It's not a reference to an actual namenode, it's a wrapper for a
>>>> network protocol created by that RPC.waitForProxy call -- so when it
>>>> calls namenode.blockReport, it's sending that information over RPC to
>>>> the namenode instance over the network.
>>>>
>>>> On Thu, Apr 1, 2010 at 5:50 AM, Alberich de megres wrote:
>>>>> Hi everyone!
>>>>>
>>>>> Sailing through the HDFS source code that comes with Hadoop 0.20.2, I
>>>>> could not understand how HDFS sends block reports to the NameNode.
>>>>>
>>>>> As far as I can see, in
>>>>> src/hdfs/org/apache/hadoop/hdfs/server/datanode/DataNode.java we
>>>>> create the this.namenode interface with an RPC.waitForProxy call
>>>>> (and I could not understand which class it instantiates, or how it
>>>>> works).
>>>>>
>>>>> After that, the datanode generates the block list report
>>>>> (blockListAsLongs) with data.getBlockReport and calls
>>>>> this.namenode.blockReport(..); inside namenode.blockReport it calls
>>>>> namesystem.processReport in turn. This leads to an update of the
>>>>> block lists inside the name server.
>>>>>
>>>>> But how does it send this block report over the network?
>>>>>
>>>>> Can anyone shed some light?
>>>>>
>>>>> Thanks for everything!
>>>>> (and sorry for the newbie question)
>>>>>
>>>>> Alberich
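To make the proxy/Invoker mechanism Ryan describes a bit more concrete, here is a minimal, self-contained sketch of how a java.lang.reflect.Proxy can route interface calls into a single handler. This is not Hadoop's actual org.apache.hadoop.ipc code; ReportProtocol, RpcProxySketch, and the "would send over the wire" step are hypothetical stand-ins for DatanodeProtocol and the real Client/Connection serialization.

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;

// Hypothetical stand-in for DatanodeProtocol; not the real Hadoop interface.
interface ReportProtocol {
    String blockReport(long[] blocks);
}

public class RpcProxySketch {
    public static void main(String[] args) {
        // This handler plays the role of the "Invoker": every call made on the
        // proxy lands in invoke(), where a real RPC layer would serialize the
        // method name and parameters and write them to a socket (Hadoop does
        // this in org.apache.hadoop.ipc.Client with its Call and Connection
        // classes).
        InvocationHandler invoker = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] callArgs) {
                System.out.println("would send over the wire: " + method.getName()
                        + Arrays.toString((long[]) callArgs[0]));
                // A real client would block here until the server's reply arrives.
                return "ack";
            }
        };

        ReportProtocol namenode = (ReportProtocol) Proxy.newProxyInstance(
                ReportProtocol.class.getClassLoader(),
                new Class<?>[] { ReportProtocol.class },
                invoker);

        // Looks like a local method call, but there is no local implementation
        // behind it -- exactly the situation with this.namenode.blockReport().
        System.out.println(namenode.blockReport(new long[] { 1L, 2L, 3L }));
    }
}

RPC.waitForProxy ends up returning this kind of dynamically generated proxy for the DatanodeProtocol interface, which is why blockReport has no implementation anywhere in the datanode code itself.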
Re: HDFS Blockreport question
A pure C library to communicate with HDFS?

Certainly possible, but it would be a lot of work, and the HDFS wire protocols are ad hoc, only somewhat documented, and subject to change between releases right now, so you'd be chasing a moving target. I'd try to think of another way to accomplish what you want before attempting a client reimplementation in C right now. If you only need to talk to the namenode and not the datanodes it might be a little easier, but it's still a lot of work that will probably be obsolete after another release or two.

On Tue, Apr 6, 2010 at 9:47 AM, Alberich de megres wrote:
> Thanks! I'm already using Eclipse to browse the code.
>
> In this scenario, am I right in understanding that Java serializes the
> object and its parameters and sends them over the network?
>
> For example, if I wanted to make a pure C library (with no JNI
> interfaces), would that be possible/feasible, or would it be hellishly
> hard?
>
> Thanks once again!!!
>
> On Sat, Apr 3, 2010 at 1:54 AM, Ryan Rawson wrote:
>> [...]
Re: HDFS Blockreport question
Hey Jay,

I think that if you're experienced in implementing transfer protocols, it is not difficult to implement the HDFS wire protocol. As you point out, though, the protocols are subject to change between releases (especially between 0.20, 0.21, and 0.22) and are basically documented in fragments in the Java source code. At least, I looked at doing this for the read portions, and it wasn't horrible.

However, the *really hard part* is the client retry/recovery logic. That's where a lot of the intelligence is, in very large classes, and it is not incredibly well documented.

I've had lots of luck with scaling libhdfs - we average >20TB/day and billions of I/O operations a day with it. I'd strongly advise not re-inventing the wheel, unless it's for a research project.

Brian

On Apr 6, 2010, at 8:53 AM, Jay Booth wrote:
> A pure C library to communicate with HDFS?
>
> Certainly possible, but it would be a lot of work, and the HDFS wire
> protocols are ad hoc, only somewhat documented, and subject to change
> between releases right now, so you'd be chasing a moving target. I'd try
> to think of another way to accomplish what you want before attempting a
> client reimplementation in C right now. If you only need to talk to the
> namenode and not the datanodes it might be a little easier, but it's
> still a lot of work that will probably be obsolete after another release
> or two.
>
> [...]
[jira] Resolved: (HDFS-481) Bug Fixes + HdfsProxy to use proxy user to impersonate the real user
[ https://issues.apache.org/jira/browse/HDFS-481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE resolved HDFS-481.
-----------------------------------------
    Resolution: Fixed

I also have tested it locally. It worked fine.

I have committed this. Thanks, Srikanth!

> Bug Fixes + HdfsProxy to use proxy user to impersonate the real user
> ---------------------------------------------------------------------
>
>                 Key: HDFS-481
>                 URL: https://issues.apache.org/jira/browse/HDFS-481
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: contrib/hdfsproxy
>    Affects Versions: 0.21.0
>            Reporter: zhiyong zhang
>            Assignee: Srikanth Sundarrajan
>         Attachments: HDFS-481-bp-y20.patch, HDFS-481-bp-y20s.patch, HDFS-481.out, HDFS-481.patch, HDFS-481.patch, HDFS-481.patch, HDFS-481.patch, HDFS-481.patch, HDFS-481.patch, HDFS-481.patch, HDFS-481.patch
>
> Bugs:
> 1. hadoop-version is not recognized if the ant command is run from src/contrib/ or from src/contrib/hdfsproxy.
>    If ant is run from $HADOOP_HDFS_HOME, hadoop-version is passed to contrib's build through subant. But if it is run from src/contrib or src/contrib/hdfsproxy, hadoop-version is not recognized.
> 2. LdapIpDirFilter.java is not thread safe. userName, Group & Paths are per-request and can't be class members.
> 3. Addressed the following StackOverflowError:
>    ERROR [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/].[proxyForward]] Servlet.service() for servlet proxyForward threw exception
>    java.lang.StackOverflowError
>        at org.apache.catalina.core.ApplicationHttpRequest.getAttribute(ApplicationHttpRequest.java:229)
>    This happens when the target war (/target.war) does not exist: the forwarding war forwards to its parent context path /, which is defined by the forwarding war itself, causing an infinite loop. Added "HDFS Proxy Forward".equals(dstContext.getServletContextName()) to the if logic to break the loop.
> 4. Kerberos credentials of the remote user aren't available. HdfsProxy needs to act on behalf of the real user to service the requests.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
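As an aside on item 2 in the bug list above: the usual fix for that kind of filter bug is to keep per-request values in local variables or request attributes instead of instance fields, because the servlet container shares one Filter instance across concurrent requests. A minimal sketch of the pattern, with hypothetical names (PerRequestStateFilter, lookupUser) rather than the actual LdapIpDirFilter code:

{code}
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

// Hypothetical filter, not the real LdapIpDirFilter: it only illustrates
// why per-request values must not live in instance fields.
public class PerRequestStateFilter implements Filter {

    // WRONG for per-request data: a single Filter instance is shared by all
    // concurrent requests, so a field like this would be overwritten by
    // whichever request ran last.
    // private String userName;

    public void init(FilterConfig conf) throws ServletException { }

    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
            throws IOException, ServletException {
        // RIGHT: resolve per-request values into locals (or request
        // attributes) so each thread sees only its own data.
        String userName = lookupUser(req.getRemoteAddr()); // hypothetical lookup
        req.setAttribute("org.example.userName", userName);
        chain.doFilter(req, resp);
    }

    public void destroy() { }

    private String lookupUser(String remoteAddr) {
        // Stand-in for the LDAP/IP-based lookup the real filter performs.
        return "user-for-" + remoteAddr;
    }
}
{code}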
[jira] Created: (HDFS-1082) CHANGES.txt in the last three branches diverged
CHANGES.txt in the last three branches diverged
-----------------------------------------------

                 Key: HDFS-1082
                 URL: https://issues.apache.org/jira/browse/HDFS-1082
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 0.20.2
            Reporter: Konstantin Shvachko
             Fix For: 0.20.3


Particularly, CHANGES.txt in hdfs trunk and 0.21 don't reflect that 0.20.2 has been released, there is no section for 0.20.3, and the diff on the fixed issues is not uniform.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Reopened: (HDFS-1009) Allow HDFSProxy to impersonate the real user while processing user request
[ https://issues.apache.org/jira/browse/HDFS-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Srikanth Sundarrajan reopened HDFS-1009:
----------------------------------------

Separating the KerberosAuthorization patch from HDFS-481. KerberosAuthorizationFilter uses a proxy user to act on behalf of the requesting user picked up from LDAP through LdapIpDirFilter.

> Allow HDFSProxy to impersonate the real user while processing user request
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-1009
>                 URL: https://issues.apache.org/jira/browse/HDFS-1009
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: contrib/hdfsproxy
>    Affects Versions: 0.22.0
>            Reporter: Srikanth Sundarrajan
>            Assignee: Srikanth Sundarrajan
>
> HDFSProxy, when processing a user request, should perform the operations as the real user.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
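For context, the general proxy-user pattern in Hadoop's security code looks roughly like the sketch below. This is illustrative, not the actual KerberosAuthorizationFilter: the user name "alice" and the path are made up, and the cluster must also whitelist the proxy principal via the hadoop.proxyuser.* settings for the impersonated call to be accepted.

{code}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserSketch {
    public static void main(String[] args) throws Exception {
        final Configuration conf = new Configuration();

        // The proxy service itself is logged in (e.g. from its own keytab);
        // "alice" stands in for the real end user identified from the request.
        UserGroupInformation realUser = UserGroupInformation.getLoginUser();
        UserGroupInformation ugi =
                UserGroupInformation.createProxyUser("alice", realUser);

        // Everything inside doAs() runs with "alice" as the effective user,
        // so the NameNode enforces alice's permissions rather than the proxy's.
        FileStatus[] listing =
                ugi.doAs(new PrivilegedExceptionAction<FileStatus[]>() {
                    public FileStatus[] run() throws Exception {
                        FileSystem fs = FileSystem.get(conf);
                        return fs.listStatus(new Path("/user/alice"));
                    }
                });

        for (FileStatus status : listing) {
            System.out.println(status.getPath());
        }
    }
}
{code}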
[jira] Created: (HDFS-1083) Update TestHDFSCLI to not expect exception class name in the error messages
Update TestHDFSCLI to not expect exception class name in the error messages
----------------------------------------------------------------------------

                 Key: HDFS-1083
                 URL: https://issues.apache.org/jira/browse/HDFS-1083
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: test
            Reporter: Suresh Srinivas
            Assignee: Suresh Srinivas
            Priority: Minor
             Fix For: 0.22.0


With the change from HADOOP-6686, the error messages from FsShell no longer include the redundant exception name. TestHDFSCLI needs to be updated to not expect the exception name in command output.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Created: (HDFS-1084) TestDFSShell fails in trunk.
TestDFSShell fails in trunk.
----------------------------

                 Key: HDFS-1084
                 URL: https://issues.apache.org/jira/browse/HDFS-1084
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: test
    Affects Versions: 0.22.0
            Reporter: Konstantin Shvachko
             Fix For: 0.22.0


{{TestDFSShell.testFilePermissions()}} fails on an assert attached below. I see it on my Linux box. Don't see it failing with Hudson, and the same test runs fine in the 0.21 branch.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Created: (HDFS-1085) hftp read failing silently
hftp read failing silently
--------------------------

                 Key: HDFS-1085
                 URL: https://issues.apache.org/jira/browse/HDFS-1085
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Koji Noguchi


When performing a massive distcp through hftp, we saw many tasks fail with

{quote}
2010-04-06 17:56:43,005 INFO org.apache.hadoop.tools.DistCp: FAIL 2010/0/part-00032 : java.io.IOException: File size not matched: copied 193855488 bytes (184.9m) to tmpfile (=hdfs://omehost.com:8020/somepath/part-00032) but expected 1710327403 bytes (1.6g) from hftp://someotherhost/somepath/part-00032
        at org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:435)
        at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:543)
        at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:310)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:159)
{quote}

This means that the read itself didn't fail, but the resulting file was somehow smaller.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
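The "File size not matched" failure above comes from a post-copy length comparison of the kind DistCp performs after each file. A stripped-down sketch of that style of check, using the public FileSystem API rather than DistCp's internals (the class and method names here are hypothetical):

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyLengthCheck {
    // Compares the byte length reported by the source (e.g. an hftp:// URI)
    // with the length of the file actually written to the destination, which
    // is how a silently truncated transfer like the one above gets caught.
    static void verifySameLength(Path src, Path dst, Configuration conf)
            throws IOException {
        long srcLen = src.getFileSystem(conf).getFileStatus(src).getLen();
        long dstLen = dst.getFileSystem(conf).getFileStatus(dst).getLen();
        if (srcLen != dstLen) {
            throw new IOException("File size not matched: copied " + dstLen
                    + " bytes to " + dst + " but expected " + srcLen
                    + " bytes from " + src);
        }
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Hypothetical invocation; in the report above the source was an
        // hftp:// path and the destination an hdfs:// path.
        verifySameLength(new Path(args[0]), new Path(args[1]), conf);
    }
}
{code}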