Change log level

2016-04-19 Thread Kun Ren
Hi All,

I compiled the source code and used Eclipse to debug it remotely. I want to see
the debug information from the log, so I changed the log level for some
classes. For example, I changed FsShell's log level to DEBUG (via
http://localhost:50070/logLevel) and then added the following test code to
FsShell.java:

LOG.debug("FsShell:main(), log level=debug");
LOG.info("FsShell:main(), log level=info");

I recompiled the code and debugged it remotely. However, I can only see the
output "FsShell:main(), log level=info" and not the LOG.debug line, so it looks
like the log level is still INFO. But when I check
http://localhost:50070/logLevel, it shows that the level is DEBUG. Do you know
why, or how to enable debug output and change the log level to DEBUG? Thanks so
much for your help.

By the way, I also tried changing log4j.properties, but that didn't work
either.
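One possible explanation (an assumption about this setup, not verified): the
/logLevel page only changes levels inside the NameNode daemon's JVM, while
FsShell runs in its own client JVM and picks up the client's log4j.properties
(or the HADOOP_ROOT_LOGGER environment variable). A minimal sketch of forcing
DEBUG inside the client JVM for a quick experiment; the class and logger name
here are only illustrative, not the project's recommended approach:

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;

    public class ClientLogLevelSketch {
      private static final Log LOG = LogFactory.getLog("org.apache.hadoop.fs.FsShell");

      public static void main(String[] args) {
        // Raise the level for this logger in the current (client) JVM only.
        Logger.getLogger("org.apache.hadoop.fs.FsShell").setLevel(Level.DEBUG);
        LOG.debug("FsShell:main(), log level=debug"); // now passes the logger's level check
        LOG.info("FsShell:main(), log level=info");
      }
    }

Alternatively, adding a line such as
log4j.logger.org.apache.hadoop.fs.FsShell=DEBUG to the log4j.properties that
the client command actually loads, or running the command with
HADOOP_ROOT_LOGGER=DEBUG,console, should have a similar effect.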

Best,
Kun


HDFS Federation

2016-04-27 Thread Kun Ren
Hi Genius,

I have two questions about the HDFS Federation:
(1) Since there are multiple namenodes, there should be some code that
analyzes the client request and forwards it to the appropriate namenode. Could
you please point me to where I can find the related code?

(2) Also, just to confirm: Hadoop 2.7.2 supports HDFS Federation, but by
default there is only one namenode, is this correct? Meanwhile, do you think it
is possible to configure HDFS Federation in pseudo-distributed mode on one
node?

Thanks so much in advance.

Best,
Kun Ren


Re: HDFS Federation

2016-04-28 Thread Kun Ren
Thanks a lot, Kihwal.

So some code in ViewFileSystem analyzes the client request and forwards it to
the correct namenode, correct? Can you point out which function in particular
does this? Thanks.
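For context, a minimal sketch of the client-side mount table Kihwal describes
below; the property names follow the viewfs documentation, while the cluster
name, namenode addresses, and mount points are made up:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ViewFsMountSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "viewfs://ClusterX/");
        // Each mount point maps a client-visible path prefix to one namenode's namespace.
        conf.set("fs.viewfs.mounttable.ClusterX.link./data", "hdfs://nn1.example.com:8020/data");
        conf.set("fs.viewfs.mounttable.ClusterX.link./logs", "hdfs://nn2.example.com:8020/logs");

        FileSystem fs = FileSystem.get(URI.create("viewfs://ClusterX/"), conf);
        // ViewFileSystem resolves the /data prefix and sends this request to nn1.
        fs.mkdirs(new Path("/data/test1"));
      }
    }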

On Thu, Apr 28, 2016 at 10:57 AM, Kihwal Lee 
wrote:

> Kun,
>
> (1) The client-facing counterpart of federation is ViewFileSystem, aka the
> client-side mount table.
> (2) Federation is supported in 2.7. There are test cases bringing up a
> federated mini cluster, so I assume setting up a pseudo-distributed cluster
> is possible. I am not sure whether all support scripts will work as is.
> 73, Kihwal
>
>
>   From: Kun Ren 
>  To: hdfs-dev@hadoop.apache.org
>  Sent: Wednesday, April 27, 2016 7:29 PM
>  Subject: HDFS Federation
>
> Hi Genius,
>
> I have two questions about the HDFS Federation:
> (1) Since there are multiple namenodes, there should be some code that
> analyzes the client request and forwards it to the appropriate namenode.
> Could you please point me to where I can find the related code?
>
> (2) Also, just to confirm: Hadoop 2.7.2 supports HDFS Federation, but by
> default there is only one namenode, is this correct? Meanwhile, do you think
> it is possible to configure HDFS Federation in pseudo-distributed mode on one
> node?
>
> Thanks so much in advance.
>
> Best,
> Kun Ren
>
>
>
>


handlerCount

2016-04-28 Thread Kun Ren
Hi Genius,

I have a quick question:

I remember seeing that the default value for handlerCount (the number of
Handler threads) is 10, but I cannot find where it is defined in the source
code. Could you please point me to where I can find it in the 2.7.2 codebase?
Thanks a lot.


Re: handlerCount

2016-04-28 Thread Kun Ren
Thanks a lot, Chris and Kihwal.

I found it in the DFSConfigKeys as you mentioned.
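A small sketch of the lookup pattern Chris describes below; the constant names
match DFSConfigKeys in 2.7.x, while the wrapper class here is made up for
illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hdfs.DFSConfigKeys;
    import org.apache.hadoop.hdfs.HdfsConfiguration;

    public class HandlerCountSketch {
      public static void main(String[] args) {
        Configuration conf = new HdfsConfiguration(); // pulls in hdfs-default.xml / hdfs-site.xml
        // The compiled-in default (10) is used only if no value is found in the XML resources.
        int handlerCount = conf.getInt(
            DFSConfigKeys.DFS_NAMENODE_HANDLER_COUNT_KEY,      // "dfs.namenode.handler.count"
            DFSConfigKeys.DFS_NAMENODE_HANDLER_COUNT_DEFAULT); // 10
        System.out.println("handlerCount = " + handlerCount);
      }
    }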

On Thu, Apr 28, 2016 at 6:00 PM, Chris Nauroth 
wrote:

> Hello,
>
> In general, configuration property default values will be defined in two
> places: 1) hdfs-default.xml, which defines the default property values
> when a deployment doesn't specifically set them and 2) DFSConfigKeys, a
> class that defines constant default values that the code uses if for some
> reason no default value is found during the configuration lookup.
>
> https://github.com/apache/hadoop/blob/rel/release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml#L602-L606
>
>
> https://github.com/apache/hadoop/blob/rel/release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java#L473-L474
>
>
> --Chris Nauroth
>
>
>
>
> On 4/28/16, 2:51 PM, "Kun Ren"  wrote:
>
> >Hi Genius,
> >
> >I have a quick question:
> >
> >I remember seeing that the default value for handlerCount (the number of
> >Handler threads) is 10, but I cannot find where it is defined in the source
> >code. Could you please point me to where I can find it in the 2.7.2
> >codebase? Thanks a lot.
>
>


Get the methodName and parameters from the Call object in Server.java

2016-05-02 Thread Kun Ren
Hi Genius,

I want to intercept requests in the processRpcRequest() method of the Listener
component in Server.java. For example, to intercept the "mkdirs" and "append"
requests, I try to get the method name and parameters before this line:
callQueue.put(call);

Currently I use the following way to get the method name:
  rpcRequest = call.rpcRequest;
  RpcRequestWrapper request = (RpcRequestWrapper) rpcRequest;
  RequestHeaderProto rpcRequestProto = request.getRequestHeader();
  String methodName = rpcRequestProto.getMethodName();

Then the methodName is "mkdirs" if the request is "./bin/hdfs dfs -mkdir
input/test1". However, I don't know how to get the parameters, like
"input/test1". Does anyone know how to get the methodName and parameters from
the Call object?
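In case it helps, a hedged sketch of one way to decode the parameters: the
bytes that follow the RequestHeaderProto on the wire are the protobuf request
message for that method, so they can be parsed with the generated class for
that method (MkdirsRequestProto for "mkdirs"). How to obtain those payload
bytes from the RpcRequestWrapper is an assumption here — 2.7.2 keeps them in a
non-public field, so a small accessor or a change to the wrapper may be needed:

    import com.google.protobuf.InvalidProtocolBufferException;
    import org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos.MkdirsRequestProto;

    public class RpcParamSketch {
      // payloadBytes stands for the serialized request message read after the header.
      static void printMkdirsArgs(String methodName, byte[] payloadBytes)
          throws InvalidProtocolBufferException {
        if ("mkdirs".equals(methodName)) {
          MkdirsRequestProto req = MkdirsRequestProto.parseFrom(payloadBytes);
          System.out.println("src = " + req.getSrc());            // e.g. input/test1
          System.out.println("createParent = " + req.getCreateParent());
        }
      }
    }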

Thanks a lot; I really appreciate any help.


Get the methodName and parameters from the Call object in Server.java

2016-05-05 Thread Kun Ren
Hi Genius,

I want to intercept requests in the processRpcRequest() method of the Listener
component in Server.java. For example, to intercept the "mkdirs" and "append"
requests, I try to get the method name and parameters before this line:
callQueue.put(call);

Currently I use the following way to get the method name:
  rpcRequest = call.rpcRequest;
  RpcRequestWrapper request = (RpcRequestWrapper) rpcRequest;
  RequestHeaderProto rpcRequestProto = request.getRequestHeader();
  String methodName = rpcRequestProto.getMethodName();

Then the methodName is "mkdirs" if the request is "./bin/hdfs dfs -mkdir
input/test1". However, I don't know how to get the parameters, like
"input/test1". Does anyone know how to get the methodName and parameters from
the Call object?

Thanks a lot; I really appreciate any help.


Compile proto

2016-05-06 Thread Kun Ren
Hi Genius,

I added a new proto file to
HADOOP_DIR/hadoop-common-project/hadoop-common/src/main/proto.

However, every time I run the following Maven commands:

   mvn install -DskipTests
   mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true

It compiles all the other protos but does not compile my newly added proto. Do
you know why, and how I can configure it? Otherwise I have to compile the new
proto by hand.

Thanks a lot for your help.


Re: Compile proto

2016-05-10 Thread Kun Ren
Yes, this fixed the problem. Thanks a lot for your reply.
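For reference, a hedged sketch of what Colin suggests below: in hadoop-common's
pom.xml, the hadoop-maven-plugins "protoc" execution lists every .proto file
explicitly, so a new file has to be added to that list. The structure below is
abbreviated from memory and the new file name is made up:

    <plugin>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-maven-plugins</artifactId>
      <executions>
        <execution>
          <id>compile-protoc</id>
          <goals><goal>protoc</goal></goals>
          <configuration>
            <source>
              <directory>${basedir}/src/main/proto</directory>
              <includes>
                <!-- existing entries ... -->
                <include>MyNewProtocol.proto</include> <!-- the newly added file -->
              </includes>
            </source>
          </configuration>
        </execution>
      </executions>
    </plugin>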

On Tue, May 10, 2016 at 2:13 PM, Colin McCabe  wrote:

> Hi Kun Ren,
>
> You have to add your new proto file to the relevant pom.xml file.
>
> best,
> Colin
>
> On Fri, May 6, 2016, at 13:04, Kun Ren wrote:
> > Hi Genius,
> >
> > I added a new proto file to
> > HADOOP_DIR/hadoop-common-project/hadoop-common/src/main/proto.
> >
> > However, every time I run the following Maven commands:
> >
> >    mvn install -DskipTests
> >    mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true
> >
> > It compiles all the other protos but does not compile my newly added
> > proto. Do you know why, and how I can configure it? Otherwise I have to
> > compile the new proto by hand.
> >
> > Thanks a lot for your help.
>
>
>


cp and mv

2016-05-20 Thread Kun Ren
Hi Genius,

I have been debugging the cp and mv operations, for example:
(1) ./bin/hdfs dfs -cp input/a.xml input/b.xml
(2)./bin/hdfs dfs -mv input/a.xml input/b.xml

My understanding is that the cp operation creates a new file b.xml and copies
the content of a.xml into it, while the mv operation creates b.xml, copies the
content of a.xml into it, and then deletes a.xml.

However, when I debugged the code, I found that both operations eventually go
through the create() method in NameNodeRpcServer.java, but I didn't see any
calls to a copy or delete function. Could you please point me to where I can
debug and see the full logic of the cp and mv operations? Thanks a lot.


Re: cp and mv

2016-05-23 Thread Kun Ren
Makes sense, thanks a lot, Mingliang. Another quick question: why is there a
shell command that can copy an HDFS file to another HDFS file, but no single
API that can do this? The API only supports copying a file from local to HDFS,
so I would have to call multiple APIs to implement this myself, is that
correct? Thanks.
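As a hedged aside: there is no single "copy" RPC on the NameNode, but from
application code an HDFS-to-HDFS copy can be driven with FileUtil.copy(), which
wraps the open/create/write sequence. The paths below reuse the example above;
whether this is the right API for the use case is left open:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class HdfsCopySketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf); // assumes fs.defaultFS points at HDFS
        Path src = new Path("input/a.xml");
        Path dst = new Path("input/b.xml");
        // deleteSource=false behaves like "-cp"; true approximates "-mv" across filesystems.
        FileUtil.copy(fs, src, fs, dst, /* deleteSource */ false, conf);
        // Within a single namespace, "-mv" is just a rename on the NameNode:
        // fs.rename(src, dst);
      }
    }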

On Fri, May 20, 2016 at 5:44 PM, Mingliang Liu  wrote:

> Kun,
>
> I think you need to be aware of the difference between client-side and
> server-side logic. Perhaps you're more interested in the client side in this
> case. The commands generally run in the shell, and the
> org.apache.hadoop.fs.shell package is a good place to start. Specifically,
> have a look at CommandWithDestination.java.
>
> Ciao,
>
> L
>
> On May 20, 2016, at 12:05 PM, Kun Ren  wrote:
>
> Hi Genius,
>
> I have been debugging the cp and mv operations, for example:
> (1) ./bin/hdfs dfs -cp input/a.xml input/b.xml
> (2) ./bin/hdfs dfs -mv input/a.xml input/b.xml
>
> My understanding is that the cp operation creates a new file b.xml and copies
> the content of a.xml into it, while the mv operation creates b.xml, copies
> the content of a.xml into it, and then deletes a.xml.
>
> However, when I debugged the code, I found that both operations eventually go
> through the create() method in NameNodeRpcServer.java, but I didn't see any
> calls to a copy or delete function. Could you please point me to where I can
> debug and see the full logic of the cp and mv operations? Thanks a lot.
>
>
>


Cp command is not atomic

2016-05-25 Thread Kun Ren
Hi Genius,

If I understand correctly, the shell command "cp" for HDFS is not atomic, is
that correct?

For example:

./bin/hdfs dfs -cp input/a.xml input/b.xml

This command actually does three things: 1. read input/a.xml; 2. create a new
file input/b.xml; 3. write the content of a.xml to b.xml.

When I looked at the code, I saw that the client side actually performs these
three steps and there is no lock across them. Does this mean that the cp
command is not guaranteed to be atomic?


Thanks a lot for your reply.


HDFS Federation -- cross-namenode operations

2016-05-25 Thread Kun Ren
Hi Genius,

Does HDFS Federation support cross-namenode operations?

For example:

./bin/hdfs dfs -cp input1/a.xml input2/b.xml

Suppose that input1 belongs to namenode 1 and input2 belongs to namenode 2;
does Federation support this operation? And if not, why not?

Thanks.


Re: Cp command is not atomic

2016-05-25 Thread Kun Ren
Thanks a lot, Chris, this is helpful.
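For what it's worth, a minimal sketch of the write-to-temp-then-rename sequence
Chris describes below; the real logic lives in the shell's
CommandWithDestination, so this standalone class and its error handling are
only illustrative:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class TempThenRenameCopySketch {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path src = new Path("input/a.xml");
        Path dst = new Path("input/b.xml");
        Path tmp = new Path(dst.getParent(), dst.getName() + "._COPYING_");

        FileUtil.copy(fs, src, fs, tmp, /* deleteSource */ false, conf); // steps 1-3
        if (!fs.rename(tmp, dst)) {                                      // step 4: atomic rename
          fs.delete(tmp, false); // best effort; a crash before this line leaves tmp behind
          throw new IOException("rename to " + dst + " failed");
        }
      }
    }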

On Wed, May 25, 2016 at 12:33 PM, Chris Nauroth 
wrote:

> Hello Kun,
>
> You are correct that "hdfs dfs -cp" is not atomic, but the details of that
> are a bit different from what you described.  For the example you gave,
> the sequence of events would be:
>
> 1. Open a.xml.
> 2. Create file b.xml._COPYING_.
> 3. Copy the bytes from a.xml to b.xml._COPYING_.
> 4. Rename b.xml._COPYING_ to b.xml.
>
> b.xml._COPYING_ is a temporary file.  All the bytes are written to this
> location first.  Only if the full copy is successful, it proceeds to step
> 4 to rename it to its final destination at b.xml.  The rename is atomic,
> so overall, this has the effect that b.xml will never have
> partially-written data.  Either the whole copy succeeds or the copy fails
> and b.xml doesn't exist.
>
> However, even though the rename is atomic, we can't claim the overall
> operation is atomic.  For example, if the process dies between step 2 and
> step 3, then the command leaves a lingering side effect in the form of the
> b.xml._COPYING_ file.
>
> Perhaps it's sufficient for your use case that the final rename step is
> atomic.
>
> --Chris Nauroth
>
>
>
>
> On 5/25/16, 8:21 AM, "Kun Ren"  wrote:
>
> >Hi Genius,
> >
> >If I understand correctly, the shell command "cp" for HDFS is not atomic,
> >is that correct?
> >
> >For example:
> >
> >./bin/hdfs dfs -cp input/a.xml input/b.xml
> >
> >This command actually does three things: 1. read input/a.xml; 2. create a
> >new file input/b.xml; 3. write the content of a.xml to b.xml.
> >
> >When I looked at the code, I saw that the client side actually performs
> >these three steps and there is no lock across them. Does this mean that the
> >cp command is not guaranteed to be atomic?
> >
> >
> >Thanks a lot for your reply.
>
>


Start client side daemon

2016-07-22 Thread Kun Ren
Hi Genius,

I understand that we use commands to start the namenode and datanode, but I
don't know how HDFS starts the client side and creates the client-side objects
(like DistributedFileSystem) and the client-side RPC server. Could you please
point out how HDFS starts the client-side daemon?
If the client side uses the same RPC server as the server side, does that mean
the client side has to be located on either the Namenode or a Datanode?
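For what it's worth, a minimal sketch of how a client-side DistributedFileSystem
typically comes into existence; the address below is made up, and no separate
daemon is assumed here — the object simply lives in whatever JVM runs this code:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ClientSideSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // FileSystem.get() instantiates DistributedFileSystem inside this JVM; it acts
        // as an RPC *client* to the NameNode rather than starting an RPC server.
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);
        System.out.println(fs.exists(new Path("/")));
      }
    }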

Thanks so much.
Kun


Multiple namenodes

2016-07-22 Thread Kun Ren
Hi Genius,

I am currently involved in a project that will create/start multiple
namenodes (it differs from Federation in that we want to partition the metadata
not only by directory, may support other partitioning schemes, and want to
support distributed operations that cross multiple namenodes). It would be
great if I could get some suggestions:

(1). How do I create/start multiple namenodes? Suppose I want to create two
Namenodes, one on machine A and the other on machine B; I should start the
Namenode on both machines (for example, each machine will call the initialize()
method in NameNode.java), correct? I definitely need to change the related
code, such as partitioning the metadata across the namenodes and changing the
data structures, etc.; I just want to make sure that I can start multiple
namenodes this way. Or do you have any other suggestions for doing so?

(2). Once I have multiple Namenodes running, what do you think is the
best/simplest way to change the HDFS client code so that clients send their
requests to a random Namenode?

(3). I need to support communication between the Namenodes. My current plan is
to create one more protocol that supports this communication, something like
ClientProtocol and ClientDatanodeProtocol. Do you think it is easy to do so? Or
do you have other suggestions for supporting communication between the
Namenodes?

Thanks so much.
Kun


Re: Multiple namenodes

2016-07-22 Thread Kun Ren
Thanks a lot for your reply, Daniel, very helpful.

About (1): I will consider this approach, thanks. Also, besides multiple
clusters, are there any other options for doing so? Thanks.
About (2): if I understand correctly, HDFS uses the quorum journal manager
(QJM) for HA, and the client still only communicates with the active namenode,
not both nodes; is my understanding right? Thanks.
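A tiny sketch of the random-pick idea Daniel suggests below; the namenode
address list here is made up, and a real client would read it from
configuration:

    import java.util.Arrays;
    import java.util.List;
    import java.util.Random;

    public class RandomNamenodePicker {
      public static void main(String[] args) {
        List<String> namenodes =
            Arrays.asList("nn1.example.com:8020", "nn2.example.com:8020");
        Random rand = new Random();
        // Index into the list with a random number mod the number of NNs.
        String target = namenodes.get(rand.nextInt(namenodes.size()));
        System.out.println("sending request to " + target);
      }
    }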

On Fri, Jul 22, 2016 at 1:27 PM, Daniel Templeton 
wrote:

> On 7/22/16 8:45 AM, Kun Ren wrote:
>
>> (1). How do I create/start multiple namenodes?
>>
>
> Just pretend like you have two separate HDFS clusters and set them up that
> way.  (That is actually what you have.)
>
>> (2). Once I have multiple Namenodes running, what do you think is the
>> best/simplest way to change the HDFS client code so that clients send their
>> requests to a random Namenode?
>>
>
> The client code is already built to try multiple NNs to handle HA. You can
> look there for inspiration.  If you want random, grab a random number and
> mod it by the number of NNs, then use that as an index into the list of NNs.
>
>> (3). I need to support communication between the Namenodes. My current plan
>> is to create one more protocol that supports this communication, something
>> like ClientProtocol and ClientDatanodeProtocol. Do you think it is easy to
>> do so? Or do you have other suggestions for supporting communication between
>> the Namenodes?
>>
>
> You will indeed need to define a new protocol.  Not the easiest thing in
> the world, but there are plenty of docs on protobuf.  Good luck!
>
> Daniel
>


Re: Multiple namenodes

2016-07-22 Thread Kun Ren
Thanks a lot for your suggestions.

On Fri, Jul 22, 2016 at 3:58 PM, Daniel Templeton 
wrote:

> On 7/22/16 12:23 PM, Kun Ren wrote:
>
>> Thanks a lot for your reply, Daniel, very helpful.
>>
>> About (1): I will consider this approach, thanks. Also, besides multiple
>> clusters, are there any other options for doing so? Thanks.
>>
>
> HDFS does not support two active NNs in a single cluster.  Each DN belongs
> to a single NN, so a federated cluster is really multiple clusters that are
> stitched together at the NNs.
>
>> About (2): if I understand correctly, HDFS uses the quorum journal
>> manager (QJM) for HA, and the client still only communicates with the
>> active namenode, not both nodes; is my understanding right?
>>
>
> I might be confusing HDFS with YARN, but I thought the way the client
> found the active NN was by just trying them all until one responds.
>
> Daniel
>