On Thu, May 03, 2007 at 11:55:49AM -0700, Steven Schlansker wrote:
> Karl E. Jorgensen wrote:
> >On Wed, May 02, 2007 at 11:23:21AM -0700, Steven Schlansker wrote:
> >>I'm having a rather strange error while trying to ls a large directory.
> >>The setup is as follows:
> >>
> >>/home is nfs-mounted from a BSD box
> >>nsswitch is set to use LDAP for passwd, shadow, and group info
> >>nscd is running to cache the responses from LDAP
> >>
> >>I try to run ls -l /home, and get the error
> >>
> >>[EMAIL PROTECTED]:~$ ls -l /home
> >>*** glibc detected *** free(): invalid pointer: 0xa7f9ad38 ***
> >>Aborted
> >
> >Questions that might help narrow it down:
> >- Do other commands (find, shell wildcard expansion) behave strangely
> >  too?
> >- Do you get the same error if you omit "-l" ?
> >- What about "ls --numeric-uid-gid /home" ? (might blame/eliminate ldap)
> >- Does the same happen if you run the commands on the actual (BSD?) box?
> >  This would eliminate/blame NFS...
> >- Any out-of-the-ordinary options in /etc/fstab for /home ?
> >
> >It would be nice to narrow it down to one of:
> >- nfs
> >- ldap
> >- specific users/groups
> >- specific files
> >- network trouble (unlikely...)
[snip]

> I did some more narrowing down.  The problem was almost certainly with
> LDAP.  Our LDAP server was heavily overloaded (19!  Never seen a
> 15-minute load average that high before...) because we had an index on
> the wrong key (uid instead of uidNumber, and all the queries used
> uidNumber as their search term)

19 is workable. But it starts to hurt around there. I once had one of my
boxes up to 78 (didn't want to reboot, as this would lose both the uptime
counter and a diagnostic opportunity).

> So what was apparently happening was name lookups were taking too long
> (a few seconds?).  Adding a proper index to slapd made the problem go
> away.  It's probably a bug though that ls and friends would abort if it
> couldn't resolve the name in a certain amount of time though - is that
> intended behavior?

I suspect that this is *not* the intended behaviour of ls :-) Sounds
like there's an obscure bug somewhere there...

> Wouldn't it be better to log a timeout and use the numeric ID or
> something?

I concur. But setting up a test case for it might require a bit of work -
it might not be worth it for such an obscure bug...

-- 
Karl E. Jorgensen
[EMAIL PROTECTED]  http://www.jorgensen.org.uk/
[EMAIL PROTECTED]  http://karl.jorgensen.com
==== Today's fortune:
Before destruction a man's heart is haughty, but humility goes before
honour.
		-- Proverbs 18:12
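
For what it's worth, the fallback suggested above (use the numeric ID when the name lookup fails or takes too long) could be sketched roughly as follows. This is Python for illustration only, not what ls actually does internally; the helper name `owner_label`, the worker pool and the one-second default timeout are all assumptions made up for this sketch.

```python
import pwd
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as LookupTimeout

# Shared worker pool so a slow name-service lookup cannot block the caller.
_pool = ThreadPoolExecutor(max_workers=4)


def owner_label(uid, lookup=pwd.getpwuid, timeout=1.0):
    """Return the user name for uid, or the numeric ID as a string.

    Hypothetical helper: falls back to the number when the lookup
    fails (KeyError) or takes longer than `timeout` seconds.
    """
    future = _pool.submit(lookup, uid)
    try:
        return future.result(timeout=timeout).pw_name
    except (KeyError, LookupTimeout):
        # Unknown user, or the directory server is too slow to answer.
        return str(uid)
```

With the default `lookup`, `owner_label(os.stat(path).st_uid)` would give a name when the passwd source answers in time and a bare number otherwise. The `lookup` parameter exists only so that a slow or failing backend can be simulated without a real LDAP server.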