Change log level
Hi All,

I compiled the source code and used Eclipse to debug it remotely. I want to see the DEBUG information in the log, so I changed the log level for some classes. For example, I changed FsShell's log level to DEBUG (via http://localhost:50070/logLevel) and then added the following test code in FsShell.java:

LOG.debug("FsShell:main(), log level=debug");
LOG.info("FsShell:main(), log level=info");

I re-compiled the code and debugged it remotely. I can see the output "FsShell:main(), log level=info", but not the LOG.debug line, so it looks like the log level is still INFO. However, when I check http://localhost:50070/logLevel it shows that the level is DEBUG. Do you know why, or how to change the log level to DEBUG so the debug output shows up?

By the way, I also tried changing log4j.properties, but that still doesn't work.

Thanks so much for your help.

Best,
Kun
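A minimal sketch, assuming the Log4j 1.x backend that Hadoop 2.x uses through commons-logging: the logger can be forced to DEBUG programmatically at the start of main(), independent of what log4j.properties says. The class name is the one from the thread; the wrapper class and placement are illustrative only.

// Sketch only: force DEBUG for FsShell's logger at runtime (Log4j 1.x API).
// This affects the current JVM only -- the /logLevel servlet likewise changes
// levels only inside the daemon JVM that serves it.
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class ForceDebugLevel {
  public static void main(String[] args) {
    Logger.getLogger("org.apache.hadoop.fs.FsShell").setLevel(Level.DEBUG);
    // ... run the code under test; LOG.debug(...) calls in FsShell should now appear.
  }
}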
HDFS Federation
Hi Genius,

I have two questions about HDFS Federation:

(1) Since there are multiple namenodes, there should be some code that analyzes the client request and forwards it to the appropriate namenode. Could you please point me to where I can find the related code?

(2) Also, just to confirm: Hadoop 2.7.2 supports HDFS Federation, but by default there is only one namenode, is this correct? Meanwhile, do you think it is possible to configure HDFS Federation in pseudo-distributed mode on one node?

Thanks so much in advance.

Best,
Kun Ren
Re: HDFS Federation
Thanks a lot, Kihwal. So some code in ViewFileSystem analyzes the client request and forwards it to the correct namenode, correct? Can you point out which function in particular does this? Thanks.

On Thu, Apr 28, 2016 at 10:57 AM, Kihwal Lee wrote:

> Kun,
>
> (1) The client-facing counterpart of federation is ViewFileSystem, aka the client-side mount table.
> (2) Federation is supported in 2.7. There are test cases bringing up a federated mini cluster, so I assume setting up a pseudo-distributed cluster is possible. I am not sure whether all support scripts will work as is.
>
> 73, Kihwal
>
> From: Kun Ren
> To: hdfs-dev@hadoop.apache.org
> Sent: Wednesday, April 27, 2016 7:29 PM
> Subject: HDFS Federation
>
> Hi Genius,
>
> I have two questions about HDFS Federation:
>
> (1) Since there are multiple namenodes, there should be some code that analyzes the client request and forwards it to the appropriate namenode. Could you please point me to where I can find the related code?
>
> (2) Also, just to confirm: Hadoop 2.7.2 supports HDFS Federation, but by default there is only one namenode, is this correct? Meanwhile, do you think it is possible to configure HDFS Federation in pseudo-distributed mode on one node?
>
> Thanks so much in advance.
>
> Best,
> Kun Ren
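To make the client-side mount table Kihwal mentions concrete, here is a minimal sketch of how a viewfs client routes a path to one namenode or another. The mount-point paths and namenode hostnames are hypothetical; the configuration keys follow the fs.viewfs.mounttable.* pattern that ViewFileSystem reads.

// Sketch: with a client-side mount table, path resolution on the client (not any
// namenode) decides which namespace a request goes to. nn1/nn2 are made-up hosts.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ViewFsRouting {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.viewfs.mounttable.default.link./data", "hdfs://nn1:8020/data");
    conf.set("fs.viewfs.mounttable.default.link./logs", "hdfs://nn2:8020/logs");

    FileSystem viewFs = FileSystem.get(URI.create("viewfs://default/"), conf);

    // Resolves to an hdfs://nn1:8020/... path, so the RPCs for it go to nn1.
    System.out.println(viewFs.resolvePath(new Path("/data/input/a.xml")));
  }
}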
handlerCount
Hi Genius,

I have a quick question: I remember seeing that the default value for handlerCount is 10 (the number of handler threads), but I cannot find where it is defined in the source code. Could you please point me to where I can find it in the 2.7.2 codebase? Thanks a lot.
Re: handlerCount
Thanks a lot, Chris and Kihwal. I found it in DFSConfigKeys, as you mentioned.

On Thu, Apr 28, 2016 at 6:00 PM, Chris Nauroth wrote:

> Hello,
>
> In general, configuration property default values will be defined in two places: 1) hdfs-default.xml, which defines the default property values when a deployment doesn't specifically set them, and 2) DFSConfigKeys, a class that defines constant default values that the code uses if for some reason no default value is found during the configuration lookup.
>
> https://github.com/apache/hadoop/blob/rel/release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml#L602-L606
>
> https://github.com/apache/hadoop/blob/rel/release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java#L473-L474
>
> --Chris Nauroth
>
> On 4/28/16, 2:51 PM, "Kun Ren" wrote:
>
> > Hi Genius,
> >
> > I have a quick question: I remember seeing that the default value for handlerCount is 10 (the number of handler threads), but I cannot find where it is defined in the source code. Could you please point me to where I can find it in the 2.7.2 codebase? Thanks a lot.
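As a small illustration of how those two defaults meet in code, here is a hedged sketch of the lookup pattern; the constant names are the ones defined in DFSConfigKeys for 2.7.2, and the standalone main() wrapper is just for demonstration.

// Sketch: the server code reads dfs.namenode.handler.count from the Configuration
// (hdfs-default.xml / hdfs-site.xml); the second argument is the in-code fallback, 10.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;

public class HandlerCountLookup {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    int handlerCount = conf.getInt(
        DFSConfigKeys.DFS_NAMENODE_HANDLER_COUNT_KEY,      // "dfs.namenode.handler.count"
        DFSConfigKeys.DFS_NAMENODE_HANDLER_COUNT_DEFAULT); // 10
    System.out.println("NameNode RPC handler threads: " + handlerCount);
  }
}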
Get the methodName and parameters from the Call object in server.java
Hi Genius,

I want to intercept requests in the processRpcRequest() method in the Listener component of Server.java. For example, to intercept the "mkdirs" and "append" requests, I try to get the method name and parameters before this line:

callQueue.put(call);

Currently I use the following to get the method name:

rpcRequest = call.rpcRequest;
RpcRequestWrapper request = (RpcRequestWrapper) rpcRequest;
RequestHeaderProto rpcRequestProto = request.getRequestHeader();
String methodName = rpcRequestProto.getMethodName();

The methodName is "mkdirs" if the request is "./bin/hdfs dfs -mkdir input/test1". However, I don't know how to get the parameters, like "input/test1". Does anyone know how to get the method name and parameters from the Call object? Thanks a lot; any help is much appreciated.
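A hedged sketch of one way to decode the parameters once the method name is known: parse the request body bytes against the matching protobuf request type. It assumes the serialized request message is reachable from the wrapper (in 2.7.2 the server side reads it into a byte[] inside ProtobufRpcEngine's request wrapper; check the exact field/accessor there), and note that hadoop-common does not normally see the HDFS protobuf classes, so a real hook would more likely live on the HDFS side.

// Sketch only: decode the path argument for the two intercepted methods.
// requestBytes is assumed to be the raw serialized request message that follows
// the RequestHeaderProto on the wire.
import org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos.AppendRequestProto;
import org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos.MkdirsRequestProto;

public class RpcParamDecoder {
  public static String decodePath(String methodName, byte[] requestBytes) throws Exception {
    if ("mkdirs".equals(methodName)) {
      return MkdirsRequestProto.parseFrom(requestBytes).getSrc(); // e.g. the mkdir path
    } else if ("append".equals(methodName)) {
      return AppendRequestProto.parseFrom(requestBytes).getSrc();
    }
    return null; // not a method we care about
  }
}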
Compile proto
Hi Genius,

I added a new proto file under HADOOP_DIR/hadoop-common-project/hadoop-common/src/main/proto. However, every time I run the following Maven commands:

mvn install -DskipTests
mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true

it compiles all the other protos but does not compile my newly added proto. Do you know why, and how I can configure it? Otherwise I have to compile the new proto by hand.

Thanks a lot for your help.
Re: Compile proto
Yes, this fixed the problem. Thanks a lot for your reply.

On Tue, May 10, 2016 at 2:13 PM, Colin McCabe wrote:

> Hi Kun Ren,
>
> You have to add your new proto file to the relevant pom.xml file.
>
> best,
> Colin
>
> On Fri, May 6, 2016, at 13:04, Kun Ren wrote:
>
> > Hi Genius,
> >
> > I added a new proto file under HADOOP_DIR/hadoop-common-project/hadoop-common/src/main/proto. However, every time I run the following Maven commands:
> >
> > mvn install -DskipTests
> > mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true
> >
> > it compiles all the other protos but does not compile my newly added proto. Do you know why, and how I can configure it? Otherwise I have to compile the new proto by hand.
> >
> > Thanks a lot for your help.
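For concreteness, here is a rough sketch of the kind of entry Colin means. The element names mirror the existing compile-protoc execution of hadoop-maven-plugins in hadoop-common's pom.xml; "MyNew.proto" is a placeholder for whatever file was added.

<!-- Sketch: inside the hadoop-maven-plugins "protoc" execution in
     hadoop-common-project/hadoop-common/pom.xml, list the new file
     alongside the existing ones. "MyNew.proto" is a placeholder. -->
<source>
  <directory>${basedir}/src/main/proto</directory>
  <includes>
    <!-- ... existing .proto files ... -->
    <include>MyNew.proto</include>
  </includes>
</source>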
cp and mv
Hi Genius,

I am currently debugging the cp and mv operations, for example:

(1) ./bin/hdfs dfs -cp input/a.xml input/b.xml
(2) ./bin/hdfs dfs -mv input/a.xml input/b.xml

My understanding is that for the cp operation, it will create a new file b.xml and copy the content of a.xml to b.xml; for the mv operation, it will create b.xml, copy the content of a.xml to b.xml, and delete a.xml.

However, when I debug the code, I find that both operations eventually go through the create() method in NameNodeRpcServer.java, but I don't see any calls to copy or delete functions. Could you please point me to where I can debug and see the full logic of the cp and mv operations? Thanks a lot.
Re: cp and mv
Makes sense, thanks a lot, Mingliang. Another quick question: why is there a shell command that can copy one HDFS file to another HDFS file, but no single API that does this? The API only supports copying a file from local to HDFS, so I would have to call multiple APIs to implement it myself; is that correct? Thanks.

On Fri, May 20, 2016 at 5:44 PM, Mingliang Liu wrote:

> Kun,
>
> I think you need to be aware of the difference between client-side and server-side logic. Perhaps you're more interested in the client side in this case. The commands generally run in the shell, and the org.apache.hadoop.fs.shell package is a good place to start. Specifically, have a look at CommandWithDestination.java.
>
> Ciao,
>
> L
>
> On May 20, 2016, at 12:05 PM, Kun Ren wrote:
>
> > Hi Genius,
> >
> > I am currently debugging the cp and mv operations, for example:
> >
> > (1) ./bin/hdfs dfs -cp input/a.xml input/b.xml
> > (2) ./bin/hdfs dfs -mv input/a.xml input/b.xml
> >
> > My understanding is that for the cp operation, it will create a new file b.xml and copy the content of a.xml to b.xml; for the mv operation, it will create b.xml, copy the content of a.xml to b.xml, and delete a.xml.
> >
> > However, when I debug the code, I find that both operations eventually go through the create() method in NameNodeRpcServer.java, but I don't see any calls to copy or delete functions. Could you please point me to where I can debug and see the full logic of the cp and mv operations? Thanks a lot.
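On the follow-up about doing an HDFS-to-HDFS copy through the API, here is a minimal sketch using existing public helpers; the paths are illustrative and error handling is omitted. FileUtil.copy streams the bytes much like the shell does, while a rename within one file system is a pure metadata operation.

// Sketch: HDFS-to-HDFS "cp" and "mv" through the public API (paths are examples).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class CopyMoveExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // "cp": open a.xml and stream its bytes into a newly created b.xml.
    FileUtil.copy(fs, new Path("input/a.xml"),
                  fs, new Path("input/b.xml"),
                  false /* deleteSource */, conf);

    // "mv" within one file system is just a rename on the namenode, no data copy.
    fs.rename(new Path("input/a.xml"), new Path("input/c.xml"));
  }
}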
Cp command is not atomic
Hi Genius,

If I understand correctly, the shell command "cp" for HDFS is not atomic, is that correct?

For example:

./bin/hdfs dfs -cp input/a.xml input/b.xml

This command actually does 3 things: 1. read input/a.xml; 2. create a new file input/b.xml; 3. write the content of a.xml to b.xml. When I looked at the code, the client side actually performs these 3 steps and there is no lock between them. Does that mean the cp command is not guaranteed to be atomic?

Thanks a lot for your reply.
HDFS Federation -- cross-namenode operations
Hi Genius,

Does HDFS Federation support cross-namenode operations? For example:

./bin/hdfs dfs -cp input1/a.xml input2/b.xml

Suppose that input1 belongs to namenode 1 and input2 belongs to namenode 2; does Federation support this operation? And if not, why? Thanks.
Re: Cp command is not atomic
Thanks a lot, Chris, this is helpful.

On Wed, May 25, 2016 at 12:33 PM, Chris Nauroth wrote:

> Hello Kun,
>
> You are correct that "hdfs dfs -cp" is not atomic, but the details of that are a bit different from what you described. For the example you gave, the sequence of events would be:
>
> 1. Open a.xml.
> 2. Create file b.xml._COPYING_.
> 3. Copy the bytes from a.xml to b.xml._COPYING_.
> 4. Rename b.xml._COPYING_ to b.xml.
>
> b.xml._COPYING_ is a temporary file. All the bytes are written to this location first. Only if the full copy is successful does it proceed to step 4 to rename it to its final destination at b.xml. The rename is atomic, so overall this has the effect that b.xml will never have partially-written data. Either the whole copy succeeds, or the copy fails and b.xml doesn't exist.
>
> However, even though the rename is atomic, we can't claim the overall operation is atomic. For example, if the process dies between step 2 and step 3, then the command leaves a lingering side effect in the form of the b.xml._COPYING_ file.
>
> Perhaps it's sufficient for your use case that the final rename step is atomic.
>
> --Chris Nauroth
>
> On 5/25/16, 8:21 AM, "Kun Ren" wrote:
>
> > Hi Genius,
> >
> > If I understand correctly, the shell command "cp" for HDFS is not atomic, is that correct?
> >
> > For example:
> >
> > ./bin/hdfs dfs -cp input/a.xml input/b.xml
> >
> > This command actually does 3 things: 1. read input/a.xml; 2. create a new file input/b.xml; 3. write the content of a.xml to b.xml. When I looked at the code, the client side actually performs these 3 steps and there is no lock between them. Does that mean the cp command is not guaranteed to be atomic?
> >
> > Thanks a lot for your reply.
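A minimal sketch of the copy-to-temp-then-rename pattern Chris describes, in API terms; the "._COPYING_" suffix matches what the shell uses, the paths are from the example above, and error handling and cleanup of a leftover temp file are deliberately omitted.

// Sketch: write everything to a temporary file, then publish it with a single
// atomic rename, so readers never observe a half-written b.xml.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class CopyThenRename {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path src = new Path("input/a.xml");
    Path dst = new Path("input/b.xml");
    Path tmp = new Path(dst + "._COPYING_");

    // Steps 1-3: stream all bytes into the temporary file.
    try (FSDataInputStream in = fs.open(src);
         FSDataOutputStream out = fs.create(tmp, true)) {
      IOUtils.copyBytes(in, out, conf, false);
    }

    // Step 4: the rename is atomic; b.xml appears only after the copy completed.
    fs.rename(tmp, dst);
  }
}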
Start client side daemon
Hi Genius,

I understand that we use commands to start the namenode and datanode. But I don't know how HDFS starts the client side and creates the client-side objects (like DistributedFileSystem) and the client-side RPC server. Could you please point out how HDFS starts the client-side daemon? If the client side uses the same RPC server as the server side, does that mean the client has to be located on either the Namenode or a Datanode?

Thanks so much.
Kun
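A minimal sketch of how the client-side object typically comes into existence, for reference; the URI here is a placeholder for whatever fs.defaultFS in core-site.xml points at. The point it illustrates is that DistributedFileSystem is built inside the client's own JVM by FileSystem.get(), rather than by a separately started daemon.

// Sketch: a client JVM creates its own DistributedFileSystem (and the RPC client
// proxy to the NameNode inside it) on demand; no client-side daemon is started.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ClientSideObject {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // reads core-site.xml / hdfs-site.xml
    // The hdfs:// URI is a placeholder; normally it comes from fs.defaultFS.
    FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000/"), conf);

    System.out.println(fs.getClass().getName()); // org.apache.hadoop.hdfs.DistributedFileSystem
    System.out.println(fs.exists(new Path("/"))); // issues a ClientProtocol RPC to the NameNode
  }
}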
Multiple namenodes
Hi Genius,

I am currently involved in a project that will create/start multiple namenodes. (It is different from Federation in that we want to partition the metadata not only by directory and may support other partitioning schemes, and we want to support distributed operations that cross multiple namenodes.) It would be great if I could get some suggestions:

(1) How do I create/start multiple namenodes? Suppose I want to create 2 namenodes, one on machine A and the other on machine B. I should start the namenode on both machines (for example, each machine will call the initialize() method in NameNode.java), correct? Yes, I definitely need to change the related code, like partitioning the metadata across the namenodes and changing the data structures, etc.; I just want to make sure I can use this approach to simply start multiple namenodes. Or do you have any suggestions on how to do so?

(2) Once I have multiple namenodes running, what do you think is the best/simplest way to change the HDFS client code so that clients send their requests to a random namenode?

(3) I need to support communication between the namenodes. My current plan is to create one more protocol that supports communication between the namenodes, something like ClientProtocol and ClientDatanodeProtocol. Do you think it is easy to do so? Or do you have other suggestions for supporting communication between the namenodes?

Thanks so much.
Kun
Re: Multiple namenodes
Thanks a lot for your reply, Daniel, very helpful.

About (1): I will consider this way, thanks. Also, besides multiple clusters, are there any other options for doing so? Thanks.

About (2): if I understand correctly, HDFS uses the quorum journal manager (QJM) for HA, and the client still only communicates with the active namenode, not both nodes; am I understanding that right? Thanks.

On Fri, Jul 22, 2016 at 1:27 PM, Daniel Templeton wrote:

> On 7/22/16 8:45 AM, Kun Ren wrote:
>
>> (1) How do I create/start multiple namenodes?
>
> Just pretend like you have two separate HDFS clusters and set them up that way. (That is actually what you have.)
>
>> (2) Once I have multiple namenodes running, what do you think is the best/simplest way to change the HDFS client code so that clients send their requests to a random namenode?
>
> The client code is already built to try multiple NNs to handle HA. You can look there for inspiration. If you want random, grab a random number and mod it by the number of NNs, then use that as an index into the list of NNs.
>
>> (3) I need to support communication between the namenodes. My current plan is to create one more protocol that supports communication between the namenodes, something like ClientProtocol and ClientDatanodeProtocol. Do you think it is easy to do so? Or do you have other suggestions for supporting communication between the namenodes?
>
> You will indeed need to define a new protocol. Not the easiest thing in the world, but there are plenty of docs on protobuf. Good luck!
>
> Daniel
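A tiny sketch of the random-pick idea Daniel describes, purely illustrative; the namenode address list is hypothetical and would in practice come from configuration, the way the HA client builds its list of proxies.

// Sketch: pick a random namenode by taking a random index into the list of NNs.
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class RandomNameNodePicker {
  private static final Random RANDOM = new Random();

  static URI pick(List<URI> namenodes) {
    return namenodes.get(RANDOM.nextInt(namenodes.size()));
  }

  public static void main(String[] args) {
    List<URI> nns = Arrays.asList(
        URI.create("hdfs://nn1:8020"),  // hypothetical addresses
        URI.create("hdfs://nn2:8020"));
    System.out.println("Sending this request to " + pick(nns));
  }
}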
Re: Multiple namenodes
Thanks a lot for your suggestions.

On Fri, Jul 22, 2016 at 3:58 PM, Daniel Templeton wrote:

> On 7/22/16 12:23 PM, Kun Ren wrote:
>
>> Thanks a lot for your reply, Daniel, very helpful.
>>
>> About (1): I will consider this way, thanks. Also, besides multiple clusters, are there any other options for doing so? Thanks.
>
> HDFS does not support two active NNs in a single cluster. Each DN belongs to a single NN, so a federated cluster is really multiple clusters that are stitched together at the NNs.
>
>> About (2): if I understand correctly, HDFS uses the quorum journal manager (QJM) for HA, and the client still only communicates with the active namenode, not both nodes; am I understanding that right?
>
> I might be confusing HDFS with YARN, but I thought the way the client found the active NN was by just trying them all until one responds.
>
> Daniel