This is mostly disk-bound on the NameNode. I think it ends up being one
fsync for each file. If you have multiple directories, you could start
multiple commands in parallel. Because of the way the NameNode syncs,
having multiple clients helps.
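
As a rough sketch of the parallel approach, one recursive chown can be started per subdirectory and then waited on. The function name parallel_chown and the HADOOP_BIN variable are made up for illustration, and the subdirectory names in the example are hypothetical; only the "dfs -chown -R" invocation comes from the original message.

```shell
#!/usr/bin/env bash
# Sketch: run one recursive chown per subdirectory, all in parallel.
# HADOOP_BIN defaults to the path used in the original message; override as needed.
HADOOP_BIN="${HADOOP_BIN:-/home/hadoop/hadoop-0.18.1/bin/hadoop}"

# parallel_chown OWNER DIR [DIR...]   (hypothetical helper)
# Starts a background "dfs -chown -R" for each directory, then waits for all.
# Returns non-zero if any of the commands failed.
parallel_chown() {
    local owner="$1"; shift
    local pids=() dir pid status=0
    for dir in "$@"; do
        "$HADOOP_BIN" dfs -chown -R "$owner" "$dir" &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        wait "$pid" || status=1
    done
    return "$status"
}

# Example (subdirectory names are hypothetical):
# parallel_chown frank:frank /home/frank/proj100/a /home/frank/proj100/b
```

Note that this only covers the subdirectories passed in. The parent directory itself and any files sitting directly in it would still need their own chown.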
Raghu.
Frank Singleton wrote:
Hi,
I ran a test of recursive chown on a Fedora 9 box here (2x quad-core, 16 GB RAM).
It took about 12.5 minutes to complete for 45,000 files (approx. 60 files/sec).
I executed the command on the namenode itself.
Q1. Is this rate (60 files/sec) typical of what other folks are seeing?
Q2. Are there any dfs/jvm parameters I should look at to see if I can improve
this?
time /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -chown -R frank:frank /home/frank/proj100
real 12m38.631s
user 1m54.662s
sys 0m33.124s
time /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -count /home/frank/proj100
220 45891 3965996260 hdfs://namenode:9000/home/frank/proj100
real 0m1.579s
user 0m0.686s
sys 0m0.129s
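
For reference, the rate can be recomputed from the figures above. This is a small sketch, assuming the second column of the -count output (45891) is the file count and taking the "real" time of the chown run (12m38.631s, i.e. 758.631 seconds). The function name files_per_sec is made up for illustration.

```shell
#!/usr/bin/env bash
# files_per_sec FILE_COUNT ELAPSED_SECONDS   (hypothetical helper)
# Prints the throughput in files per second, rounded to one decimal place.
files_per_sec() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", n / s }'
}

# Example with the numbers from this message:
# files_per_sec 45891 758.631
```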
cheers / frank