Hi Diego,

> I've noticed that access to files on the Hurd is noticeably slower than
> on Linux. For example, compiling the Hurd requires a lot of disk access
> under the Hurd, while cross-compiling the Hurd on Linux uses few disk
> accesses.

I'm sure you're not comparing apples and oranges here, though it may seem so at first sight ;-).
You're right: the Hurd _is_ currently slower at accessing files than Linux or *BSD. Part of the _perceived_ slowness arises from the very inefficient fork()/exec() syscall emulation in glibc, but this is only visible when accessing a lot of files in multiple processes, as in e.g. 'configure' runs (among others).

The main reason for the slowness of the Hurd's file I/O is that data is actually _copied_ more often than necessary between the clients and the file servers/translators. Just look at the sources of glibc and the Hurd, starting e.g. with glibc's write() and its hurd-ish sysdeps, which IPCs the translator, which in turn goes to storeio, etc. That's quite a lot of copying going on. Compared to Linux/BSD's integrated VFS/VM buffer cache, that's extremely inefficient ;-).

The unnecessary multiple copying of data would already consume a lot of CPU cycles, but wait, it is even worse than it may seem at first sight: every single access (e.g. an "atomic" write(2) call) not only induces two context switches [to the kernel and back], but a lot more: one IPC to the translator (via Mach), another IPC from the translator to storeio (again via Mach), and then access to the disks, etc. [if not cached in storeio]. Add to this the reply IPCs, etc. All this consumes a lot of CPU cycles as well. Now if the applications use a lot of small read(2)/write(2) calls, you've got a flurry of data copying, IPCs and context switches that is, say, one order of magnitude slower than on Linux and *BSD.

> The question is: is there some kind of caching on the Hurd or in GNU Mach?
> Could it be useful to implement caching, for example, in the storeio
> translator?

The first possible caching is in storeio: data that is being read from physical devices can be cached in storeio (look at the sources). This is the most important cache when it comes to speed. The next best thing in caching is when the filesystem translator shares memory (with Mach's vm_map() call) with its underlying storeio.
Sharing memory this way would speed things up quite a lot. The problem right now is that there is no memory sharing between normal clients and the filesystem translators: data is simply copied across a costly IPC path, thus wasting a lot of CPU cycles.

It would be nice to add an mmap()/munmap() IPC to the translators, so that clients can obtain vm_objects directly linked to, say, storeio's memory space. This way, the clients could change [pages of] a file through direct memory reads/writes. If done well, this would be hidden from user programs, which would still use read(2)/write(2) calls; glibc's sysdeps would then either IPC the translators or request memory-mapping of [e.g. frequently used?] parts of the file. This change would however break the current design of the Hurd, and I'm not expert enough here to see the ramifications of this choice.

> I've also noted that while compiling the Hurd, there is a lot of free
> memory available, so why don't we use it?

Here too, you're right. The reason is the same as said above: we don't have a VFS buffer cache like Linux/BSD, but just copy the data when available [from client to translator and back].

> Diego Roversi | diegor at maganet.net
>               | diegor at tiscalinet.it

-Farid.

--
Farid Hajji -- Unix Systems and Network Admin | Phone: +49-2131-67-555
Broicherdorfstr. 83, D-41564 Kaarst, Germany  | [EMAIL PROTECTED]
- - - - - - - - - - - - - - - - - - - - - - - + - - - - - - - - - - - -
One OS To Rule Them All And In The Darkness Bind Them... --Bill Gates.

_______________________________________________
Bug-hurd mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/bug-hurd