--- Begin Message ---
Package: libc6
Version: 2.7-18
Severity: normal
I am sorry if I should have originally filed it against kernel but I am not
sure if it is actually an NFS client issue.. It seems to be not nfs server
issue either since other clients of the same server perform fine. It shouldn't
be issue bonded interface on this box, since other nodes in the cluster with
bonded interfaces perform fine.
2 days ago I upgraded our cluster from etch to lenny.
Yesterday I've mentioned (it could be that it wasn't that slow before) that it
takes too long to start bash/zsh. strace showed that syscalls dealing with NFS
mounted files:
13:10:46.301746 access("/home/yoh/.zsh/sourcedir", F_OK) = 0 <0.019635>
whenever on other nodes it takes just 0.0001 or so
I've done obvious checks of nfsstat and netstat -- no obvious reasons for
slow-down.
Here is an excerpt from my notes in tiddlywiki. I would appreciate any hints on
what else to look at, since nfs client traffic seems to be going quite fast,
but there is significant delays between transactions, which probably what boils
down to slow syscalls.
examples from main node
{{{
strace -fF -o /tmp/zsh2.strace -T -ttt zsh-beta
1235583796.011574 access("/home/yoh/.zshenv", F_OK) = 0 <0.023661>
1235583796.035416 munmap(0x7fd834c96000, 2100856) = 0 <0.000053>
1235583796.035562 stat("/home/yoh/.zshenv.zwc", 0x7fff3e262f00) = -1 ENOENT (No
such file or directory) <0.009246>
}}}
has corresponding tcpdump:
{{{
tcpdump -i bond0 -tt -vvv -n "(dst host raider or src host raider)" -s 192 >|
/tmp/zsh-beta.tcpdump
1235583796.020630 IP (tos 0x0, ttl 64, id 43841, offset 0, flags [DF], proto
TCP (6), length 224) 10.0.0.1.514335490 > 10.0.0.252.2049: 172 access [|nfs]
1235583796.020777 IP (tos 0x0, ttl 64, id 46892, offset 0, flags [DF], proto
TCP (6), length 176) 10.0.0.252.2049 > 10.0.0.1.514335490: reply ok 124 access
attr: DIR 40755 ids 47521/47522 sz 16384 nlink 221 rdev 0/0 fsid
7d6310991d5a808d fileid 51855823 a/m/ctime 1235582942.843330957
1235583781.787331514 1235583781.787331514 c 001f
1235583796.020857 IP (tos 0x0, ttl 64, id 43842, offset 0, flags [DF], proto
TCP (6), length 232) 10.0.0.1.531112706 > 10.0.0.252.2049: 180 lookup [|nfs]
1235583796.021001 IP (tos 0x0, ttl 64, id 46893, offset 0, flags [DF], proto
TCP (6), length 292) 10.0.0.252.2049 > 10.0.0.1.531112706: reply ok 240 lookup
fh Unknown/010006013CC4464909BF4398B1441C5490AF20E507AF8051ED1A000000000001 LNK
120777 ids 47521/47522 sz 16 [|nfs]
1235583796.025559 IP (tos 0x0, ttl 64, id 43843, offset 0, flags [DF], proto
TCP (6), length 228) 10.0.0.1.547889922 > 10.0.0.252.2049: 176 lookup [|nfs]
1235583796.025709 IP (tos 0x0, ttl 64, id 46894, offset 0, flags [DF], proto
TCP (6), length 292) 10.0.0.252.2049 > 10.0.0.1.547889922: reply ok 240 lookup
fh Unknown/010006013CC4464909BF4398B1441C5490AF20E513906E780700000000000001 DIR
40700 ids 47521/47522 sz 90 [|nfs]
1235583796.030339 IP (tos 0x0, ttl 64, id 43844, offset 0, flags [DF], proto
TCP (6), length 224) 10.0.0.1.564667138 > 10.0.0.252.2049: 172 access [|nfs]
1235583796.030489 IP (tos 0x0, ttl 64, id 46895, offset 0, flags [DF], proto
TCP (6), length 176) 10.0.0.252.2049 > 10.0.0.1.564667138: reply ok 124 access
attr: DIR 40700 ids 47521/47522 sz 90 nlink 10 rdev 0/0 fsid 7d6310991d5a808d
fileid 786e9013 a/m/ctime 1231731824.355110521 1231737665.659793897
1231737665.659793897 c 001f
1235583796.035048 IP (tos 0x0, ttl 64, id 43845, offset 0, flags [DF], proto
TCP (6), length 224) 10.0.0.1.581444354 > 10.0.0.252.2049: 172 access [|nfs]
1235583796.035191 IP (tos 0x0, ttl 64, id 46896, offset 0, flags [DF], proto
TCP (6), length 176) 10.0.0.252.2049 > 10.0.0.1.581444354: reply ok 124 access
attr: DIR 40775 ids 47521/47522 sz 66 nlink 6 rdev 0/0 fsid 7d6310991d5a808d
fileid 18005733 a/m/ctime 1218207520.461796280 1218207523.890125418
1218207523.890125418 c 001f
1235583796.044623 IP (tos 0x0, ttl 64, id 43846, offset 0, flags [DF], proto
TCP (6), length 236) 10.0.0.1.598221570 > 10.0.0.252.2049: 184 lookup [|nfs]
1235583796.044773 IP (tos 0x0, ttl 64, id 46897, offset 0, flags [DF], proto
TCP (6), length 172) 10.0.0.252.2049 > 10.0.0.1.598221570: reply ok 120 lookup
ERROR: No such file or directory post dattr: DIR 40755 ids 47521/47522 sz 16384
nlink 221 rdev 0/0 fsid 7d6310991d5a808d fileid 51855823 a/m/ctime
1235582942.843330957 1235583781.787331514 1235583781.787331514
}}}
so nfs queries processed ok -- around 0.0001 sec (like on other nodes) but dt
between transactions is high (up to 0.005sec), which leads to total 0.02 for a
syscall... too high.
on another example used NFS client debugging:
{{{
13:10:45.714835 access("/home/yoh/.zshenv", F_OK) = 0 <0.018355>
13:10:45.733420 munmap(0x7f789cc36000, 2100856) = 0 <0.000063>
13:10:45.733623 stat("/home/yoh/.zshenv.zwc", 0x7fffa6202ea0) = -1 ENOENT (No
such file or directory) <0.006971
}}}
{{{
# used sudo rpcdebug -v -m nfs -s all to enable and -c to disable debugging
Feb 25 13:10:45 ravana kernel: [222679.726216] NFS:
nfs_lookup_revalidate(yoh/.zshenv) is valid
Feb 25 13:10:45 ravana kernel: [222679.729475] NFS:
permission(0:18/1367693347), mask=0x1, res=0
Feb 25 13:10:45 ravana kernel: [222679.729495] NFS call lookup .etc
Feb 25 13:10:45 ravana kernel: [222679.729827] NFS:
nfs_update_inode(0:18/1367693347 ct=1 info=0x6)
Feb 25 13:10:45 ravana kernel: [222679.729835] NFS reply lookup: 0
Feb 25 13:10:45 ravana kernel: [222679.729842] NFS:
nfs_update_inode(0:18/2020511763 ct=1 info=0x6)
Feb 25 13:10:45 ravana kernel: [222679.729852] NFS:
nfs_lookup_revalidate(yoh/.etc) is valid
Feb 25 13:10:45 ravana kernel: [222679.733126] NFS call access
Feb 25 13:10:45 ravana kernel: [222679.733563] NFS:
nfs_update_inode(0:18/2020511763 ct=1 info=0x6)
Feb 25 13:10:45 ravana kernel: [222679.733572] NFS reply access: 0
Feb 25 13:10:45 ravana kernel: [222679.733579] NFS:
permission(0:18/2020511763), mask=0x1, res=0
Feb 25 13:10:45 ravana kernel: [222679.733592] NFS:
nfs_lookup_revalidate(.etc/zsh) is valid
}}}
so major delay is between
{{{
Feb 25 13:10:45 ravana kernel: [222679.729852] NFS:
nfs_lookup_revalidate(yoh/.etc) is valid
Feb 25 13:10:45 ravana kernel: [222679.733126] NFS call access
}}}
-- System Information:
Debian Release: 5.0
APT prefers stable
APT policy: (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.26-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Versions of packages libc6 depends on:
ii libgcc1 1:4.3.2-1.1 GCC support library
libc6 recommends no packages.
Versions of packages libc6 suggests:
ii glibc-doc 2.7-18 GNU C Library: Documentation
ii locales 2.7-18 GNU C Library: National Language (
-- debconf information:
* glibc/upgrade: true
glibc/restart-failed:
* glibc/restart-services: ssh snmpd rsync openbsd-inetd nis exim4 cupsys cron
atd apache-ssl
--- End Message ---