Re: HDFS Blockreport question

2010-04-06 Thread Alberich de megres
Thanks!

I'm already using Eclipse to browse the code.
In this scenario, am I right in understanding that Java serializes the
object and its parameters and sends them over the network?

For example, if I wanted to make a pure C library (with no JNI
interfaces), would that be possible/feasible, or hopelessly painful?

Thanks once again!!!


On Sat, Apr 3, 2010 at 1:54 AM, Ryan Rawson  wrote:
> If you look at the getProxy code, it passes an "Invoker" (or something
> like that) which the proxy code uses to delegate calls TO.  The
> Invoker will call another class, "Client", which has inner classes like
> Call and Connection that wrap the actual Java IO.  This all lives in
> the org.apache.hadoop.ipc package.
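To make that concrete, here is a minimal, runnable sketch of the
java.lang.reflect pattern Ryan is describing.  The toy GreeterProtocol
stands in for DatanodeProtocol, and the handler below is a stand-in for
Hadoop's Invoker, not the real code:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Toy illustration of how a getProxy-style call can hand back an object
// implementing an interface with no hand-written class behind it: every
// method call is funneled into a single invoke(), which is where Hadoop's
// Invoker serializes the method name and arguments and ships them to the
// server.
public class ProxySketch {
    interface GreeterProtocol {            // stand-in for DatanodeProtocol
        String greet(String name);
    }

    public static void main(String[] args) {
        GreeterProtocol proxy = (GreeterProtocol) Proxy.newProxyInstance(
            GreeterProtocol.class.getClassLoader(),
            new Class<?>[] { GreeterProtocol.class },
            new InvocationHandler() {      // Hadoop's version is RPC.Invoker
                public Object invoke(Object p, Method m, Object[] a) {
                    // The real Invoker wraps m and a in a Writable
                    // "Invocation", hands it to ipc.Client, and blocks
                    // until the server's response arrives.
                    return "would send RPC: " + m.getName() + "(" + a[0] + ")";
                }
            });
        // No greet() implementation exists anywhere, yet this "works":
        System.out.println(proxy.greet("namenode"));
    }
}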
>
> Be sure to use a good IDE like IJ or Eclipse to browse the code, it
> makes following all this stuff much easier.
>
>
>
>
>
> On Fri, Apr 2, 2010 at 4:39 PM, Alberich de megres
>  wrote:
>> Hi again!
>>
>> Could anyone help me?
>> I can't understand how the RPC class works. As far as I can see, it
>> only instantiates a single interface with no implementation for some
>> methods like blockReport, but then it uses RPC.getProxy to get a new
>> class which sends messages to the name node.
>>
>> I'm sorry for this silly question, but I am really lost at this point.
>>
>> Thanks for the patience.
>>
>>
>>
>> On Fri, Apr 2, 2010 at 2:11 AM, Alberich de megres
>>  wrote:
>>> Hi Jay!
>>>
>>> thanks for the answer, but what I'm asking is how it actually sends
>>> anything: blockReport is declared in the DatanodeProtocol interface
>>> with no implementation.
>>>
>>> thanks!
>>>
>>>
>>>
>>> On Thu, Apr 1, 2010 at 5:50 PM, Jay Booth  wrote:
 In DataNode:
 public DatanodeProtocol namenode

 It's not a reference to an actual namenode, it's a wrapper for a network
 protocol created by that RPC.waitForProxy call -- so when it calls
 namenode.blockReport, it's sending that information over RPC to the
 namenode instance over the network.
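Paraphrased from memory of the 0.20 DataNode.java, the two halves of what
Jay describes look roughly like this -- a sketch, not a verbatim excerpt,
so check the source for the exact signatures:

// How the proxy is created: "namenode" ends up being a
// java.lang.reflect.Proxy behind the DatanodeProtocol interface, not a
// reference to a NameNode object.
this.namenode = (DatanodeProtocol) RPC.waitForProxy(
    DatanodeProtocol.class,        // the interface to speak
    DatanodeProtocol.versionID,    // protocol version sanity check
    nameNodeAddr,                  // namenode host:port
    conf);

// Later, in offerService(): this call runs no local blockReport code at
// all -- the proxy serializes the method name and arguments, writes them
// to the namenode's socket, and blocks until the reply arrives.
Block[] bReport = data.getBlockReport();
DatanodeCommand cmd = namenode.blockReport(dnRegistration,
    BlockListAsLongs.convertToArrayLongs(bReport));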

 On Thu, Apr 1, 2010 at 5:50 AM, Alberich de megres 
 wrote:

> Hi everyone!
>
> Sailing through the HDFS source code that comes with Hadoop 0.20.2, I
> could not understand how HDFS sends the block report to the NameNode.
>
> As far as I can see, in
> src/hdfs/org/apache/hadoop/hdfs/server/datanode/DataNode.java we
> create the this.namenode interface with an RPC.waitForProxy call (and I
> could not work out which class it instantiates, or how it works).
>
> After that, the datanode generates the block list report
> (blockListAsLongs) with data.getBlockReport and calls
> this.namenode.blockReport(..); inside NameNode.blockReport it calls
> namesystem.processReport, which updates the block lists inside the
> name server.
>
> But how does it send this block report over the network?
>
> Can anyone shed some light?
>
> thanks for all!
> (and sorry for the newbie question)
>
> Alberich
>

>>>
>>
>


Re: HDFS Blockreport question

2010-04-06 Thread Jay Booth
A pure C library to communicate with HDFS?

Certainly possible, but it would be a lot of work: the HDFS wire
protocols are ad hoc, only somewhat documented, and subject to change
between releases right now, so you'd be chasing a moving target.  I'd try
to think of another way to accomplish what you want before attempting a
client reimplementation in C.  If you only need to talk to the namenode
and not the datanodes it might be a little easier, but it's still a lot of
work that will probably be obsolete after another release or two.


On Tue, Apr 6, 2010 at 9:47 AM, Alberich de megres wrote:

> Thanks!
>
> I'm already using Eclipse to browse the code.
> In this scenario, am I right in understanding that Java serializes the
> object and its parameters and sends them over the network?
>
> For example, if I wanted to make a pure C library (with no JNI
> interfaces), would that be possible/feasible, or hopelessly painful?
>
> Thanks once again!!!


Re: HDFS Blockreport question

2010-04-06 Thread Brian Bockelman
Hey Jay,

I think, if you're experienced in implementing transfer protocols, it is not 
difficult to implement the HDFS wire protocol.  As you point out, it is 
subject to change between releases (especially between 0.20, 0.21, and 0.22) 
and documented only in fragments in the Java source code.  At least, I 
looked at doing this for the read portions, and it wasn't horrible.
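For a sense of what the read portion involves, here is a rough sketch of
the request a 0.20-era client writes to a datanode to start a block read.
The field order and both constants are reconstructed from memory of
DFSClient.BlockReader and are assumptions, not authoritative -- verify
against DataXceiver and DataTransferProtocol in your exact release before
reimplementing anything:

import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;

import org.apache.hadoop.io.Text;

// Rough sketch of an OP_READ_BLOCK request (0.20-era framing, from
// memory).  Both constants change between releases -- check
// DataTransferProtocol before using.
public class ReadRequestSketch {
    static final int DATA_TRANSFER_VERSION = 14;  // ASSUMED value
    static final byte OP_READ_BLOCK = (byte) 81;  // ASSUMED value

    static void sendReadRequest(Socket s, long blockId, long genStamp,
                                long startOffset, long len,
                                String clientName) throws IOException {
        DataOutputStream out = new DataOutputStream(s.getOutputStream());
        out.writeShort(DATA_TRANSFER_VERSION); // protocol version handshake
        out.writeByte(OP_READ_BLOCK);          // opcode
        out.writeLong(blockId);                // which block
        out.writeLong(genStamp);               // its generation stamp
        out.writeLong(startOffset);            // where to start reading
        out.writeLong(len);                    // how many bytes
        Text.writeString(out, clientName);     // vint length + UTF-8 bytes
        out.flush();
        // ...after this the datanode answers with a status code followed
        // by a stream of checksummed packets, which the client must parse.
    }
}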

However, the *really hard part* is the client retry/recovery logic.  That's 
where a lot of the intelligence is; it lives in very large classes and is not 
incredibly well documented.

I've had lots of luck with scaling libhdfs - we average >20TB/day and 
billions of I/O operations a day with it.  I'd strongly advise against 
re-inventing the wheel, unless it's for a research project.

Brian

On Apr 6, 2010, at 8:53 AM, Jay Booth wrote:

> A pure C library to communicate with HDFS?
>
> Certainly possible, but it would be a lot of work: the HDFS wire
> protocols are ad hoc, only somewhat documented, and subject to change
> between releases right now, so you'd be chasing a moving target.  I'd try
> to think of another way to accomplish what you want before attempting a
> client reimplementation in C.  If you only need to talk to the namenode
> and not the datanodes it might be a little easier, but it's still a lot of
> work that will probably be obsolete after another release or two.




[jira] Resolved: (HDFS-481) Bug Fixes + HdfsProxy to use proxy user to impersonate the real user

2010-04-06 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE resolved HDFS-481.
-

Resolution: Fixed

I also have tested it locally.  It worked fine.

I have committed this.  Thanks, Srikanth!

> Bug Fixes + HdfsProxy to use proxy user to impersonate the real user
> 
>
> Key: HDFS-481
> URL: https://issues.apache.org/jira/browse/HDFS-481
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: contrib/hdfsproxy
>Affects Versions: 0.21.0
>Reporter: zhiyong zhang
>Assignee: Srikanth Sundarrajan
> Attachments: HDFS-481-bp-y20.patch, HDFS-481-bp-y20s.patch, 
> HDFS-481.out, HDFS-481.patch, HDFS-481.patch, HDFS-481.patch, HDFS-481.patch, 
> HDFS-481.patch, HDFS-481.patch, HDFS-481.patch, HDFS-481.patch
>
>
> Bugs:
> 1. hadoop-version is not recognized if the ant command is run from 
> src/contrib/ or from src/contrib/hdfsproxy.
> If ant is run from $HADOOP_HDFS_HOME, hadoop-version is passed to contrib's 
> build through subant, but if it is run from src/contrib or 
> src/contrib/hdfsproxy, hadoop-version is not recognized.
> 2. LdapIpDirFilter.java is not thread-safe: userName, group, and paths are 
> per-request state and cannot be class members (see the sketch after this 
> list).
> 3. Addressed the following StackOverflowError:
> ERROR [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/].[proxyForward]]
> Servlet.service() for servlet proxyForward threw exception
> java.lang.StackOverflowError
> at org.apache.catalina.core.ApplicationHttpRequest.getAttribute(ApplicationHttpRequest.java:229)
>  This occurs when the target war (/target.war) does not exist: the 
> forwarding war forwards to its parent context path /, which is defined by the 
> forwarding war itself, causing an infinite loop.  Added an 
> "HDFS Proxy Forward".equals(dstContext.getServletContextName()) check to the 
> if logic to break the loop.
> 4. Kerberos credentials of the remote user aren't available.  HdfsProxy 
> needs to act on behalf of the real user to service the requests.
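A hypothetical sketch of the thread-safety bug in item 2 -- illustrative
Java only, not the actual LdapIpDirFilter code; the field and method names
here are invented:

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

// A servlet container creates ONE Filter instance and runs all requests
// through it concurrently, so any per-request value stored in a field is
// shared across threads.
public class UnsafeFilterSketch implements Filter {
    private String userName;  // BUG: shared by every request thread

    public void init(FilterConfig conf) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res,
                         FilterChain chain)
            throws IOException, ServletException {
        // Thread A writes its user here; thread B can overwrite it before
        // thread A reaches chain.doFilter(), corrupting A's request.
        userName = lookupUser(req);
        chain.doFilter(req, res);

        // The fix is to keep the value in a local variable instead:
        //   String user = lookupUser(req);
        // or stash it as a request attribute, never in an instance field.
    }

    private String lookupUser(ServletRequest req) {
        return req.getRemoteAddr();  // stand-in for the real LDAP lookup
    }
}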

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1082) CHANGES.txt in the last three branches diverged

2010-04-06 Thread Konstantin Shvachko (JIRA)
CHANGES.txt in the last three branches diverged
---

 Key: HDFS-1082
 URL: https://issues.apache.org/jira/browse/HDFS-1082
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.2
Reporter: Konstantin Shvachko
 Fix For: 0.20.3


In particular, CHANGES.txt in HDFS trunk and 0.21 does not reflect that 0.20.2 
has been released, there is no section for 0.20.3, and the lists of fixed 
issues are not consistent across the branches.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Reopened: (HDFS-1009) Allow HDFSProxy to impersonate the real user while processing user request

2010-04-06 Thread Srikanth Sundarrajan (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Srikanth Sundarrajan reopened HDFS-1009:



Separating the KerberosAuthorization patch from HDFS-481.

KerberosAuthorizationFilter uses a proxy user to act on behalf of the 
requesting user, which is picked up from LDAP through LdapIpDirFilter.

> Allow HDFSProxy to impersonate the real user while processing user request
> --
>
> Key: HDFS-1009
> URL: https://issues.apache.org/jira/browse/HDFS-1009
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: contrib/hdfsproxy
>Affects Versions: 0.22.0
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
>
> When processing a user request, HDFSProxy should perform the operations as 
> the real user.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1083) Update TestHDFSCLI to not expect the exception class name in the error messages

2010-04-06 Thread Suresh Srinivas (JIRA)
Update TestHDFSCLI to not expect the exception class name in the error messages
--------------------------------------------------------------------------------

 Key: HDFS-1083
 URL: https://issues.apache.org/jira/browse/HDFS-1083
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: test
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
Priority: Minor
 Fix For: 0.22.0


With the change from HADOOP-6686, the error messages from FsShell no longer 
include the redundant exception name.  TestHDFSCLI needs to be updated to not 
expect the exception name in the command output.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1084) TestDFSShell fails in trunk.

2010-04-06 Thread Konstantin Shvachko (JIRA)
TestDFSShell fails in trunk.


 Key: HDFS-1084
 URL: https://issues.apache.org/jira/browse/HDFS-1084
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0
Reporter: Konstantin Shvachko
 Fix For: 0.22.0


{{TestDFSShell.testFilePermissions()}} fails on the assert attached below.  I 
see it on my Linux box, but I don't see it failing on Hudson, and the same 
test runs fine in the 0.21 branch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1085) hftp read failing silently

2010-04-06 Thread Koji Noguchi (JIRA)
hftp read failing silently
--------------------------

 Key: HDFS-1085
 URL: https://issues.apache.org/jira/browse/HDFS-1085
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Koji Noguchi


When performing a massive distcp through hftp, we saw many tasks fail with:

{quote}
2010-04-06 17:56:43,005 INFO org.apache.hadoop.tools.DistCp: FAIL 2010/0/part-00032 :
java.io.IOException: File size not matched: copied 193855488 bytes (184.9m) to
tmpfile (=hdfs://omehost.com:8020/somepath/part-00032) but expected 1710327403
bytes (1.6g) from hftp://someotherhost/somepath/part-00032
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:435)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:543)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:310)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:159)
{quote}

This means that the read itself didn't fail, but the resulting file was 
somehow smaller.
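
A hypothetical sketch (not the actual DistCp code) of why the only defense
here is a size check after the fact: a stream read loop cannot distinguish
a genuine end-of-file from a connection that was cut short, so the copied
byte count has to be compared against the length the source advertised:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// read() returning -1 looks identical for "file finished" and "server
// closed the connection early", so a copy can end short without any
// exception being thrown.  Counting bytes and comparing with the expected
// length is what surfaces the failure.
public class CopyWithLengthCheck {
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[64 * 1024];
        long copied = 0;
        int n;
        while ((n = in.read(buf)) != -1) {  // -1 == EOF, genuine or not
            out.write(buf, 0, n);
            copied += n;
        }
        return copied;
    }

    static void copyChecked(InputStream in, OutputStream out,
                            long expected) throws IOException {
        long copied = copy(in, out);
        if (copied != expected) {           // mirrors the distcp error above
            throw new IOException("File size not matched: copied "
                + copied + " bytes but expected " + expected);
        }
    }
}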


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.