Re: Thread stack allocation (was Re: cvs commit: src/lib/libc_r Makefile src/lib/libc_r/uthread pthread_private.h uthread_create.c uthread_gc.c uthread_init.c)

1999-07-12 Thread Dmitrij Tejblum

> On Mon, 12 Jul 1999, Dmitrij Tejblum wrote:
> > Alan Cox wrote:
> > > When you create a stack or grow an existing stack, the minimum chunk size
> > > is 128K.
> > 
> > This makes the use of "growable" stacks in libc_r particularly useful, given that 
> > libc_r makes only 64K stacks "growable".
> 
> That is a problem, to be sure.  In order to make effective use of growable
> stacks, each stack really needs to be at least 256KB.  However, Alan also
> pointed out that growable stacks are a bit of a non-feature, since the VM
> is lazy about backing mapped regions.  In light of this, I'm leaning
> toward using MAP_ANON instead of MAP_STACK.

I don't see how MAP_ANON is better than MAP_STACK.

> > These changes create other troubles. For example, they limit the stack size of the 
> > initial thread to 1M, which is too small and not tunable.
> 
> Making the initial stack size tunable at runtime would require a
> non-standard interface.  How big is big enough?  I picked 1MB rather
> randomly; increasing the value is quite easy.  One possible solution would
> be to pay at least some heed to the value of getrlimit(RLIMIT_STACK, ..).

Well, why do you map the stacks at a fixed address, and inside the process stack? I think 
you could map them at an arbitrary address (and mprotect the red zone).
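
A minimal sketch of that suggestion, assuming a plain MAP_ANON mapping placed
wherever the kernel likes, with the lowest page turned into a red zone. The
helper names are hypothetical; this is not the libc_r code.

```c
#define _DEFAULT_SOURCE		/* exposes MAP_ANON on glibc; harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `usable` bytes of stack plus one guard page
 * below it, letting the kernel choose the address. Returns the lowest
 * usable address, or NULL on failure.
 */
static void *
stack_alloc_with_red_zone(size_t usable)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	char *base;

	/* The usable size must be a whole number of pages. */
	assert(usable > 0 && usable % page == 0);

	base = mmap(NULL, usable + page, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	/* Stacks grow down, so the red zone is the lowest page. */
	if (mprotect(base, page, PROT_NONE) != 0) {
		munmap(base, usable + page);
		return NULL;
	}
	return base + page;
}

/* Release a stack obtained from stack_alloc_with_red_zone(). */
static void
stack_free_with_red_zone(void *stack, size_t usable)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);

	munmap((char *)stack - page, usable + page);
}
```

An overflow past the bottom of such a stack then faults on the protected
page instead of silently running into a neighbouring mapping.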

> Another problem with the changes I made was also pointed out by Alan.
> Each stack is a separate region, and with the red zones, there end up
> being two regions for every stack.  This apparently has a direct impact on
> page table lookups.  Somehow, the stack allocation code needs to be more
> economical in this regard, but I haven't thought of a slick method yet.

Anyway, what are the advantages of mmap over malloc? Especially if you change 
MAP_STACK to MAP_ANON?

Dima




To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: newfs of a ccd failing?

1999-08-25 Thread Dmitrij Tejblum

> 
> I've had this problem since at least FreeBSD 3.1-RELEASE (it works in
> 2.2.7/2.2.8).  Same problem in 3.2-RELEASE and -current (as of last night).
> 
> Can someone reproduce this error?  I can't believe that you can't newfs
> a ccd...  did I miss something?

I have seen this error message for the last few months, but it is harmless in 
practice: the filesystem is OK and the ccd is ready to use.

> newfs: ioctl (WDINFO): No such process
> newfs: /dev/rccd0c: can't rewrite disk label

Dima







Re: NFSv3 on freebsd<-->solaris

1999-08-25 Thread Dmitrij Tejblum

Doug Rabson wrote:
> 
> This is probably because our server detects that the directory has been
> modified and rejects the solaris client's directory cookies.

I think we should never reject a client's cookie. Consider a local 
program that scans the directory with the getdirentries() syscall. The 
offset in the directory is essentially the cookie that would be sent to
an NFS client. But we never "reject" the offset, and everyone is happy.
(Not to mention NFSv2, where we never reject a client's cookie either.) 
So, what are we trying to achieve by rejecting an NFSv3 client's cookie?

> Instead of
> recovering, the solaris client barfs. It's a solaris bug really

IMHO, that is very arguable. Why should the client "recover" after a "stale 
cookie" error, but not after a "stale filehandle" error? 
How should it perform the recovery? If a reliable recovery is possible, 
why is it not done on the server? 

(After all, Sun knows how NFSv3 should work, since they wrote the spec, 
right? :-|)

Dima









Re: NFSv3 on freebsd<-->solaris

1999-08-29 Thread Dmitrij Tejblum

[sorry for some delay...]

Doug Rabson wrote:
> > I think we should never reject a client's cookie. Consider a local 
> > program that scans the directory with the getdirentries() syscall. The 
> > offset in the directory is essentially the cookie that would be sent to
> > an NFS client. But we never "reject" the offset, and everyone is happy.
> > (Not to mention NFSv2, where we never reject a client's cookie either.) 
> > So, what are we trying to achieve by rejecting an NFSv3 client's cookie?
> 
> Notify the client that the directory contents may have been compacted and
> therefore that their seek offsets are now wrong.

You apparently missed my above paragraph. Do we notify a local process
about such a condition?

Then, what is so special about compacting? What if I quickly moved out 
a directory's contents and replaced them with something completely different?
How can you recover after that? With rm -r the recovery is easy, but how 
will the recovery work if the program is, say, du?

> From rfc1813:
> 
>   If the
>   server detects that the cookie is no longer valid, the
>   server will reject the READDIR request with the status,
>   NFS3ERR_BAD_COOKIE. 

I propose that our cookies are always valid, just like directory 
offsets after a getdirentries() syscall (on a local filesystem).

> The client should be careful to
>   avoid holding directory entry cookies across operations
>   that modify the directory contents, such as REMOVE and
>   CREATE.
> 
> It seems to me that the solaris client is holding directory cookies across
> a REMOVE operation and therefore should expect to get stale cookie errors
> occasionally.

Yes. FreeBSD programs typically use fts(3), which reads the whole directory 
before returning its contents to the application. That is, the rule is 
honored. But this solution is in userland.

> Our NFS client used to have the same problem (a long time ago) and I put
> code into it to re-read the directory if its cookies are stale.

(According to a mail recently sent to -hackers, that doesn't work. 
In -current, the recovery code has a debugging printf(), so I guess 
the code is only triggered in very rare cases (see above).)

Anyway, I don't actually care what the correct NFS client behavior is. I am 
saying that sending a "bad cookie" error is not useful for a FreeBSD server.

Dima







Re: NFSv3 on freebsd<-->solaris

1999-08-29 Thread Dmitrij Tejblum

> It isn't possible to do this and still remain synchronized.  If the
> directory changes on the server, the client has no way of knowing 
> whether a cookie corresponds to the same file if you always return
> a valid response.  This breaks the protocol.
> 
> A local filesystem getdirentries() call is monotonic, stateful, and
> cache coherent.  An NFS readdir rpc is stateless, not monotonic, and can
> only approximate cache coherency.

Perhaps I am mistaken, but I disagree. The getdirentries() call is not 
monotonic, and it is stateless. Let's see:

To read a directory with the getdirentries() call, the application has 
to open it just like any other file and get a file descriptor. Like 
any other file descriptor, the open directory has an associated offset, 
or pointer. 

The getdirentries() syscall supplies the directory pointer to VOP_READDIR 
as uio_offset. (The cookie sent by an NFS client is supplied to VOP_READDIR 
as uio_offset too.) After VOP_READDIR returns, the uio_offset is stored 
back in the file descriptor offset. The file offset is the only state
saved.

Note also that the offset has nothing to do with the size of the data 
transferred by getdirentries(), especially if the filesystem is not 
UFS. That is, the offset is actually just a handy place to store the 
cookie. (OTOH, for any local filesystem I am aware of, it is indeed the 
offset in the physical directory.)

Note that the application can do lseek on the directory, that is, change 
the next cookie used. It is used by seekdir(). (And, of course, the
application may lseek to anywhere it likes, and the filesystem will have 
to deal with the bogus cookie.)
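
The point that the directory offset is just a cookie the application hands
back can be seen from userland with telldir(3) and seekdir(3). A small
sketch, with a hypothetical helper name:

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical demo: remember the directory position (the "cookie")
 * before reading one entry, read further, then seek back to the saved
 * position and read again. Returns 1 if the same entry name comes
 * back, 0 if not, -1 on error.
 */
static int
cookie_roundtrip(const char *path)
{
	DIR *d = opendir(path);
	struct dirent *e;
	char first[256];
	long cookie;
	int same;

	if (d == NULL)
		return -1;

	cookie = telldir(d);		/* position before the first entry */
	e = readdir(d);
	if (e == NULL) {
		closedir(d);
		return -1;
	}
	snprintf(first, sizeof(first), "%s", e->d_name);

	(void)readdir(d);		/* move on ... */

	seekdir(d, cookie);		/* ... then hand the cookie back */
	e = readdir(d);
	same = (e != NULL && strcmp(first, e->d_name) == 0);

	closedir(d);
	return same;
}
```

Nothing but the saved position is needed to resume the scan, which is the
same information an NFS client sends in a READDIR request.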

> * an NFS readdir rpc is stateless and not monotonic.  The server cannot
>   tell the difference between a new rpc, a retry, or several different
>   processes on the client scanning the same directory (running at different
>   points in the directory).

With local applications, VOP_READDIR cannot tell the difference 
either. There may be several programs scanning one directory, and a program 
may do seekdir(); the only thing known is the uio_offset, that is, the 
cookie.

> 
> * An NFS readdir rpc can only approximate cache coherency, but that
>   doesn't mean you can throw cache coherency out the window.  

What cache coherency? No one ever mmap()s a directory, I hope. After a 
getdirentries() syscall has finished, someone may change the directory in 
any way (just as after a read() call on a regular file). After the NFS 
readdir reply has been sent to the client, someone may change the directory in 
any way. Again, I don't see any difference. 

> It 
>   approximates cache coherency through the use of the verifier key.  If
>   the verifier key supplied by the client is wrong, the server has to
>   tell it so.  Otherwise the client's directory cache will get out of
>   sync.

Nope, the verifier is there so that the server can validate the cookie. Cache 
validation needs to be done by checking mtime, as with regular 
files. What if the client cached the whole directory, and then the 
directory changed? So, cache coherency with directories is 
no worse than with regular files.

Note that just as the READ call returns file attributes that can be used 
for cache validation, the READDIR call returns the directory attributes, 
which can be used for the same purpose.
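
A rough sketch of mtime-based validation, using stat(2) on a local path as a
stand-in for the attributes an NFS reply would carry. The structure and
function names are hypothetical, not the FreeBSD client code.

```c
#include <sys/stat.h>
#include <time.h>

/* Hypothetical client-side cache record for one directory. */
struct dir_cache {
	time_t mtime;	/* directory mtime when the cache was filled */
	int    valid;	/* nonzero while cached contents may be used */
};

/* Record the directory's current mtime; returns 0 on success. */
static int
dir_cache_fill(struct dir_cache *c, const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	c->mtime = st.st_mtime;
	c->valid = 1;
	return 0;
}

/*
 * Returns 1 if the cached contents may still be used, 0 if the
 * directory must be re-read because its mtime no longer matches.
 */
static int
dir_cache_check(struct dir_cache *c, const char *path)
{
	struct stat st;

	if (!c->valid || stat(path, &st) != 0 || st.st_mtime != c->mtime) {
		c->valid = 0;
		return 0;
	}
	return 1;
}
```

The check needs only the attributes, so no directory contents have to be
fetched while the cache is still good.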

> Furthermore, the NFS readdir rpc has no notion of 'dead' directory entries
> as far as I can tell.  This means that from the point of view of an NFS
> client, directories are always 'compacted'.  Since clients may implement
> a block cache for directories, the server cannot afford to return a valid
> response if the verifier mismatches because it will screw up the client's
> block cache for the directory.  This is very different from the way most
> local directories are scanned - filesystems such as UFS maintain dead
> directory entries and thus allow a directory data block to be scanned 
> without any locking.  We cannot use this trick with NFS.
> 
> Add on top of that the fact that the NFS directory 'block size' may be
> different than a local filesystem's.  NFS must translate padding 
> characteristics between the local filesystem and the NFS client's notion
> of the directory.  Even if we did support the notion of dead directory
> entries in NFS, trying to translate the padding characteristics at the
> same time would be fairly difficult to accomplish.

Umm, I don't understand what the translation has to do with the issue. 
BTW, not all local filesystems are UFS.

> 
> :> Our NFS client used to have the same problem (a long time ago) and I put
> :> code into it to re-read the directory if its cookies are stale.
> :
> :(According to a mail recently sent to -hackers, that doesn't work. 
> :In -current, the recovery code has a debugging printf(), so I guess 
> :the code only triggered in very rare cases (see above).)
> 
> This works on FreeBSD clients as

Re: NFSv3 on freebsd<-->solaris

1999-08-30 Thread Dmitrij Tejblum


> The client system -- A FreeBSD
> client system - has a buffer cache.  The buffer cache holds an abstraction
> for both files and directories.  

Well, the discussion is about the FreeBSD NFS server, not about the FreeBSD NFS 
client. A FreeBSD server cannot assume a FreeBSD client, nor can a 
FreeBSD client assume a FreeBSD server. The NFS server is a simple 
thing that just does what the client requests, for example reads the 
directory. Bugs in the FreeBSD NFS client are a completely different story.

> 
> Our NFS implementation on the client caches the NFS directory via the
> buffer cache.  It translates the cookies returned by the server to
> a block number and offset as cached in the client's buffer cache.
> 
> See nfs_readdirrpc() in sys/nfs/nfs_vnops.c
> 
> This creates a directory-block abstraction on the client.  The 'cookies'
> the client returns to processes are based on this abstraction and do not
> match the cookies returned by the server.
> 
> The problem that we have is that our buffer cache abstraction essentially
> fits a variable number of directory entries returned from the server.  If
> a file is created or deleted on the server, our buffer cache abstraction
> gets thrown for a loop.

The client _cannot_ depend on a "bad cookie" error being returned by the next 
readdir after a file is created or removed on the server. RFC1813 does not 
require it in any way.

> 
> In order to maintain consistency within the set of cached pages (note:
> I'm not talking about cache coherency with the server here, just 
> consistency within the buffer cache on the client), our buffer cache
> abstraction is currently dependent on the verifier key changing on the
> server.  I don't know why it was done this way -- perhaps mtime was found to
> not be sufficient.  Maybe because it doesn't have sufficient resolution
> under NFSv2.  Under NFSv3 it should theoretically have sufficient 
> resolution but how many servers do you know keep the nanoseconds field
> updated?

I don't believe it. First of all, NFSv2 has no verifiers, and works 
reasonably well. (There is a belief that NFSv2 is much more reliable 
than NFSv3, you know.) Then, invalidation of cached data depends 
heavily on mtime anyway. The client doesn't do a readdir RPC if it thinks 
that its cache is valid; it only verifies the mtime. Finally, -current 
has had a debugging printf in the "bad cookie" handling code for about 4 
months, and no one has complained that his logs are filled with the message.

I think I now understand why the "bad cookie" handling code doesn't do 
the right thing. Removing files in the directory effectively shifts its 
contents to the left. So, if you read the directory and remove files at 
the same time, you will miss some entries.

> When applied to files, the use of mtime to determine when to flush the
> cache is nothing more than an inconvenience.  But the use of mtime to
> determine when to flush a directory cache can be fatal.

I still don't see why.

> If you want to change the way our directory verifier works, you have to
> completely rewrite the directory caching code for the client.  I think
> you can argue that the verifier is not being implemented properly, but
> I'm not going to let anyone change it unless the directory caching code
> on the client is rewritten at the same time to use the server's cookies
> directly.  

Really?

> Right now the server's cookies are only used by the client to demark 
> client-buffer-cache buffer boundaries.  The actual cookies returned to
> the *process* running on the client are translated from the client's
> buffer cache abstraction of the NFS directory.
> 
> The change that would have to be made would be for the server's cookies
> to be passed through all the way to the process sitting on the client
> rather than translated in the buffer cache.  Then cache consistency in
> our client would then not be as sensitive to the varying amounts of
> information the server sends us and we could safely leave the verifier 
> alone on the server.  This would require us to change the abstraction our
> client uses significantly -- it would no longer be able to use the 
> cookies passed to it by the user process as direct offsets into the
> client's buffer cache.

Hmm. I don't think such big changes in the directory caching are 
necessary at all, though I haven't actually thought about it. Anyway, the 
verifiers only add to the breakage (see above).

Dima







Re: Perl still broken in 4.0-CURRENT

1999-09-01 Thread Dmitrij Tejblum

Pascal Hofstee wrote:
> Hi,
> 
> Perl seems to be broken for about 3 consecutive days now 
> Anybody have any idea what might be causing this ?

I suspect it is the recent changes in rtld.

Dima







Re: (P)review: sigset_t for more than 32 signals

1999-09-06 Thread Dmitrij Tejblum

> > typedef struct {
> > unsigned int n;
> > uint64_t v;
> > } sigset_t;
> 
> You can't use any BSD or FreeBSD specific types (such as u_int32_t) in
> publicly visible types (such as sigset_t). It breaks programs because it's
> not ANSI and/or Posix.

You can use internal names like __uint32_t from  for. They 
are in the implementation namespace.
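
For illustration only, a sketch of a signal set wider than 32 bits built from
fixed-width words. All names here are hypothetical. A real system header
would spell the word type with a reserved name such as __uint32_t; an
application must not define reserved names itself, so the sketch uses an
ordinary stand-in and an assumed signal count of 128.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for an implementation-namespace 32-bit type. */
typedef uint32_t sketch_u32;

#define SKETCH_NSIG	128			/* assumed signal count */
#define SKETCH_WORDS	(SKETCH_NSIG / 32)

typedef struct {
	sketch_u32 bits[SKETCH_WORDS];
} sketch_sigset_t;

/* Clear every signal in the set. */
static void
sketch_sigemptyset(sketch_sigset_t *set)
{
	memset(set, 0, sizeof(*set));
}

/* Add signal `sig` (1..SKETCH_NSIG); returns 0, or -1 if out of range. */
static int
sketch_sigaddset(sketch_sigset_t *set, int sig)
{
	if (sig < 1 || sig > SKETCH_NSIG)
		return -1;
	set->bits[(sig - 1) / 32] |= (sketch_u32)1 << ((sig - 1) % 32);
	return 0;
}

/* Returns 1 if `sig` is in the set, 0 if not, -1 if out of range. */
static int
sketch_sigismember(const sketch_sigset_t *set, int sig)
{
	if (sig < 1 || sig > SKETCH_NSIG)
		return -1;
	return (set->bits[(sig - 1) / 32] >> ((sig - 1) % 32)) & 1;
}
```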

Dima







Re: ccd build failure

1999-09-23 Thread Dmitrij Tejblum

"Matthew D. Fuller" wrote:
> OK:
> #!/bin/sh
> (cvs status | grep '^File:' | grep -v 'Status: Up-to-date$') 2> /dev/null

Excuse me, I apparently completely missed the idea, but what is wrong 
with

cvs -qn up

?

Dima







Re: wierd message

1999-09-25 Thread Dmitrij Tejblum

Kenneth Culver wrote:
> what does this message mean?
> Sep 24 18:34:04 culverk /kernel: arpresolve: can't allocate llinfo for 127.0.0.1rt

I don't know how you got it, but here is an easy way to do so:

In /etc/rc.conf:
network_interfaces="ed0 lo0"
ifconfig_ed0="DHCP"

(ed0 apparently can be replaced with any other NIC)

Then run amd. (I don't know why amd is the first program I tried that 
triggers it.)

The workaround (fix?) is to put lo0 before ed0. But sysinstall puts 
network interfaces in this order :-(.

I have seen this when I recently installed 3.3-RELEASE on a new machine.

I suspect this is because DHCP does something like 
route add -host  127.0.0.1
before lo0 is ifconfiged.

Dima







Re: {a}sync updates (was Re: make install trick)

1999-10-09 Thread Dmitrij Tejblum

"David Schwartz" wrote:
>   We're talking about the special case of small root partitions, such that
> softupdates inability to make empty space available quickly can make the
> difference between a major operation's success or failure.
> 
>   This is almost impossible on a 1.8Gb root partition.

[sorry, cannot resist...]


Once upon a time, a month or so ago, there were ~30G of free space on
our 130G filesystem (with softupdates). An important application that 
was going to create a ~35G file was running. It had already written out 1G 
or so. My colleague called me and said: "I removed 10G of files TWO 
HOURS ago and the space hasn't been freed up yet!!! The free space isn't 
going to appear!!! What do we do now???"

Well, after I stopped the application with kill -STOP, and temporarily 
killed off another I/O consuming program, the free space started to 
appear and after several minutes there were >40G of free space.


I think that the problem in this particular case was the inability of the 
syncer to run as fast as it is supposed to. It is assumed that the syncer 
fsyncs 1/30 of all files and processes the softupdates worklist every 
second. If there are several I/O bound processes running, the syncer 
will not have enough I/O bandwidth to do this job at the required speed. 
Perhaps running several syncer processes could help. (OTOH, the machine 
in question is running a quite old version of FreeBSD-CURRENT, so it is 
possible things are already better. I don't see serious changes in the 
softupdates code since then, though. It is also possible that the 
machine is mistuned in some way, but I don't know how :-()

Dima







Re: freefall hangs w/ nfs

1999-10-26 Thread Dmitrij Tejblum

Matthew Dillon wrote:
> Actually, what I meant was that AMD itself is equivalent to a loopback
> mount, whether or not you make loopback mounts through it.

No. The loopback deadlock happens when the NFS server handles a write 
operation. But there cannot be any writes in the amd filesystem. The 
filesystem only contains symlinks to the outside.

Dima







Re: Serious locking problem in CURRENT

1999-01-04 Thread Dmitrij Tejblum

David Malone wrote:
> A child process seems to be able to let go of a parent's lock on
> 4.0 by closing a file descriptor, the same doesn't seem to be true
> on 3.3.

So, apparently, it was broken in rev. 1.68 of kern_descrip.c. (Another 
example that comments (in closef() in this case) serve no purpose :-/).

BTW, I have another little concern with that commit: it makes it possible for 
the last close() of a file descriptor to return 0 instead of the error from 
VOP_CLOSE(), and for the error from VOP_CLOSE() to be ignored.

Dima







Re: Serious locking problem in CURRENT

1999-11-06 Thread Dmitrij Tejblum

Brian Fundakowski Feldman wrote:
> There were zero comments about what order things happen in; in fact,
> the ordering in this case is Just Plain Lame (TM).  It's much more
> correct to explicitly check for fp->f_count == 1.

Not sure what you mean. The commit clearly states that POSIX and BSD 
locking are intentionally handled in different ways here. Frankly, I see 
nothing lame in the ordering. The second VOP_ADVLOCK should just be 
moved to fdrop().

> > BTW, I have another little concern with that commit: it makes it possible for 
> > the last close() of a file descriptor to return 0 instead of the error from 
> > VOP_CLOSE(), and for the error from VOP_CLOSE() to be ignored.

When a process does closef() on a descriptor "held" by another process 
(by fhold(), e.g. the other process is doing read() on the descriptor), it will 
just return 0 without calling fo_close(). Then, when the other 
process drops the descriptor, fdrop() calls fo_close() but the error is 
thrown away. No?
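
The scenario can be modelled in a few lines. This is a simplified userland
model of the situation as described above, with hypothetical sketch_* names;
it is not the actual kern_descrip.c code.

```c
#include <errno.h>

/* Simplified stand-in for the kernel's struct file. */
struct sketch_file {
	int f_count;		/* references held on the file */
	int close_error;	/* what the underlying close will return */
	int closed;		/* set once the underlying close has run */
};

/* Underlying close operation; may fail. */
static int
sketch_fo_close(struct sketch_file *fp)
{
	fp->closed = 1;
	return fp->close_error;
}

/* Drop one reference; only the last reference closes the file. */
static int
sketch_fdrop(struct sketch_file *fp)
{
	if (--fp->f_count > 0)
		return 0;
	return sketch_fo_close(fp);
}

/* close(2) path: reports whatever dropping its reference returns. */
static int
sketch_closef(struct sketch_file *fp)
{
	return sketch_fdrop(fp);
}

/* End of a read(2)-like operation that had taken an extra reference. */
static void
sketch_io_done(struct sketch_file *fp)
{
	(void)sketch_fdrop(fp);	/* the close error has nowhere to go */
}
```

With two references and a failing close, sketch_closef() returns 0 and the
EIO surfaces only inside sketch_io_done(), where it is dropped.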

Dima







Re: A call to squease more bytes from `boot2'

1999-11-14 Thread Dmitrij Tejblum

Try "-mpreferred-stack-boundary=2".

Dima







Re: egcs -O breaks ping.c:in_cksum()

1999-11-15 Thread Dmitrij Tejblum

> 
> > Maybe I can at least commit the addition of "volatile" to the source
> > code. That will work around that particular bug until egcs is
> > fixed...
> 
> FWIW, the newly committed gcc-2.95.2 doesn't "fix" the problem.

Are you sure? GCC-2.95.2 seems OK here.

Dima







Re: kernel: -mpreferred-stack-boundary=2 ??

1999-11-30 Thread Dmitrij Tejblum

Bruce Evans wrote:
>  I would have
> expected the most generally efficient way to align doubles and the new PIII
> objects to be aligning the stack only in functions that have such objects
> on the stack.  This requires at most one extra instruction:
> 
>   andl $~0xf,%esp 16-byte alignment

I think, it's not that simple in the usual case when the object is accessed via 
(%ebp-).
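
One way to read this objection, sketched as plain arithmetic: after %esp is
rounded down, the distance from %ebp to the aligned slot depends on the value
%ebp happened to have, so a fixed %ebp-relative displacement no longer
reaches it. The function names and addresses are made up for illustration.

```c
#include <stdint.h>

/* Round a stack pointer value down to a 16-byte boundary (andl $~0xf). */
static uint32_t
align16_down(uint32_t sp)
{
	return sp & ~(uint32_t)0xf;
}

/*
 * Distance from the frame pointer to the aligned slot, for a frame
 * whose %ebp has the given value and which reserves `locals` bytes
 * before aligning.
 */
static uint32_t
ebp_to_slot(uint32_t ebp, uint32_t locals)
{
	return ebp - align16_down(ebp - locals);
}
```

For the same 16 bytes of locals, a frame at 0xbfbff7f8 ends up 24 bytes from
its slot while a frame at 0xbfbff7f0 ends up 16 bytes from it.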

Dima







Re: Flash (was: Re: Sound card support)

1999-12-09 Thread Dmitrij Tejblum

> Good luck using it under current.
> 
> First site you hit quits netscape without reasons...
> 
> ...until you drop out of X and see a __sh_getcontext  IIRC warning on
> your console.

If you can hack on the flash plugin's Makefile, try adding -fno-exceptions 
there.

Dima







Re: msdosfs is dead, Jim.

1999-02-07 Thread Dmitrij Tejblum
Brian Feldman wrote:
> The basic problem is that msdosfs panic()s quite easily with a "zone
> not free" error (INVARIANTS is /ON/ in the kernel), when I attempt to do a rw
> mount of a FAT16.

Do you, by any chance, load the msdosfs module dynamically? If so, the 
module must also be compiled with INVARIANTS.

Dima



To Unsubscribe: send mail to majord...@freebsd.org
with "unsubscribe freebsd-current" in the body of the message


Re: panic: zone: entry not free

1999-02-22 Thread Dmitrij Tejblum
Jos Backus wrote:
> This occurs almost immediately after copying a file to an msdos fs. I can
> provide more info if that is deemed useful.

I suspect your kernel is compiled with INVARIANTS, you load the msdosfs module 
dynamically, and the module isn't compiled with INVARIANTS. If so, 
don't do that. If not, please provide more info.

Dima







Re: panic: zone: entry not free

1999-02-23 Thread Dmitrij Tejblum
Jos Backus wrote:
> On Tue, Feb 23, 1999 at 02:41:14AM +0300, Dmitrij Tejblum wrote:
> > Jos Backus wrote:
> > > This occurs almost immediately after copying a file to an msdos fs. I can
> > > provide more info if that is deemed useful.
> > 
> > I suspect your kernel compiled with INVARIANTS,
> 
> Yes, and with INVARIANTS_SUPPORT as well as per Matt's instructions.
> 
> > you load msdosfs module dynamically, and the module isn't compiled with
> > INVARIANTS.
> 
> This is after a successful world and subsequent kernel build.
> 
> > If so, don't do that.
> 
> I'm not aware that I do, really.
> 
> jos:/usr/src/sys/modules/msdos# grep INVARIANTS *
> jos:/usr/src/sys/modules/msdos# 
> jos:/usr/src/sys/msdosfs# grep INVARIANTS *
> jos:/usr/src/sys/msdosfs# 

Inline functions in vm/vm_zone.h depend on INVARIANTS. These functions 
are used in msdosfs and in other parts of the kernel.

> How does one add INVARIANTS support to modules?

You could add -DINVARIANTS to CFLAGS in sys/modules/msdos/Makefile. 
But you had better just link msdosfs statically, or remove INVARIANTS from your 
kernel. That is, INVARIANTS in the kernel is incompatible with dynamic loading.

Dima









Re: tail /proc/map/*

1999-02-28 Thread Dmitrij Tejblum
Bruce Evans wrote:
> tail(1) assumes that mmap(2) works on regular files.  mmap(2) on
> the irregular regular files /proc/*/map returns success but doesn't work.

IMO, it ought to work. There should be no reason why regular files on 
procfs are more "irregular" than regular files on NFS.

> The first access to the mmapped memory usually causes the kernel to
> printf messages like the following:
> 
> vnode_pager: *** WARNING *** stale FS getpages

(Still no such messages here. I still don't run Terry's submission 
that introduced the message and spread a lot of identical trivial 
getpages/putpages routines over the kernel... I hope to clean out all 
this junk some day, when everyone has finally forgotten the matter ;-)
The message is not quite relevant to the problem, though.)

> No strategy for buffer at 0xf12828e8
> : 0xf3877800: type VREG, usecount 4, writecount 0, refcount 0, flags (VOBJBUF)
>   tag VT_PROCFS, type 11, pid 591, mode 124, flags 0
> : 0xf3877800: type VREG, usecount 4, writecount 0, refcount 0, flags (VOBJBUF)
>   tag VT_PROCFS, type 11, pid 591, mode 124, flags 0
> vnode_pager_getpages: I/O read error

That is because procfs defines a bogus BMAP operation, but doesn't define 
a STRATEGY operation. The BMAP operation is apparently only useful to 
break mmap(2).

After the BMAP code is removed, other procfs bugs become apparent. 
procfs claims that /proc/*/map files are all sizeof(struct reg) (== 76) bytes 
long (:-|), but doesn't allow reading only 76 bytes from the 'map' file.
This confuses the vm code that converts mmap to read, and it may also 
confuse other things. If I change the size of the 'map' file to something 
larger, tail /proc/*/map outputs something quite reasonable. 

I think procfs_domap should do what is requested, and should not try to 
guarantee "atomicity" as it does now: after all, any file may change its contents 
between reads, not just under procfs. Also, procfs could compute the 'map' 
file size more accurately.

Apparently, such a mmap implementation has coherency problems. But I 
don't think that they are more difficult to solve (or more serious)
than in NFS case.

Dima

P.S. These are the changes that allow me to see a reasonably good result 
from 'tail /proc/*/map'.

--- procfs_vnops.c  Sun Feb 28 15:33:52 1999
+++ procfs_vnops.c  Sun Feb 28 17:29:22 1999
@@ -560,6 +560,12 @@
 
 	case Ptype:
 	case Pmap:
+		vap->va_bytes = vap->va_size = 4096;
+		vap->va_nlink = 1;
+		vap->va_uid = procp->p_ucred->cr_uid;
+		vap->va_gid = procp->p_ucred->cr_gid;
+		break;
+
 	case Pregs:
 		vap->va_bytes = vap->va_size = sizeof(struct reg);
 		vap->va_nlink = 1;
@@ -982,7 +988,7 @@
 	{ &vop_abortop_desc,	(vop_t *) procfs_abortop },
 	{ &vop_access_desc,	(vop_t *) procfs_access },
 	{ &vop_advlock_desc,	(vop_t *) procfs_badop },
-	{ &vop_bmap_desc,	(vop_t *) procfs_bmap },
+	/*{ &vop_bmap_desc,	(vop_t *) procfs_bmap },*/
 	{ &vop_close_desc,	(vop_t *) procfs_close },
 	{ &vop_create_desc,	(vop_t *) procfs_badop },
 	{ &vop_getattr_desc,	(vop_t *) procfs_getattr },







Re: Simple DOS against 3.x locks box solid

1999-03-14 Thread Dmitrij Tejblum
Matthew Dillon wrote:
> - error = acquire(lkp, extflags,
> - LK_HAVE_EXCL | LK_WANT_EXCL | LK_WANT_UPGRADE);
> + if (p->p_flag & P_DEADLKTREAT) {
> + error = acquire(

This is broken: p may be NULL (that is checked several lines earlier). 
My kernel just panicked for this reason.

Well, sorry for late response, but: what was wrong with Tor Egge's 
"workaround" from kern/8416?

Dima







Re: Simple DOS against 3.x locks box solid

1999-03-15 Thread Dmitrij Tejblum
Matthew Dillon wrote:
> 
> We'll get a quick fix committed but the lockmgr stuff needs a real
> going-over... having interrupts using the general lockmgr call is
> a disaster waiting to happen.

Hmmm. After I looked a bit further, it looks like a bug in the 
scheduler (?). Here is the stack trace:

#9  0xc01ff64e in trap (frame={tf_es = 16, tf_ds = 16, tf_edi = 0, 
  tf_esi = 16777216, tf_ebp = -999002708, tf_isp = -999002744, 
  tf_ebx = -1071228500, tf_edx = -2, tf_ecx = 0, tf_eax = 0, 
  tf_trapno = 12, tf_err = 0, tf_eip = -1072584332, tf_cs = 8, 
  tf_eflags = 66050, tf_esp = -999002524, tf_ss = -1071228500})
at ../../i386/i386/trap.c:438
#10 0xc011a974 in lockmgr (lkp=0xc02659ac, flags=1, interlkp=0x0, p=0x0)
at ../../kern/kern_lock.c:217
#11 0xc01d8c5b in vm_map_lookup (var_map=0xc4746e64, vaddr=3294351360, 
fault_typea=1 '\001', out_entry=0xc4746e68, object=0xc4746e5c, 
pindex=0xc4746e60, out_prot=0xc4746e4b "ю\a", wired=0xc4746e44)
at ../../vm/vm_map.c:2463
#12 0xc01d4153 in vm_fault (map=0xc02659ac, vaddr=3294351360, 
fault_type=1 '\001', fault_flags=0) at ../../vm/vm_fault.c:197
#13 0xc01ff9ac in trap_pfault (frame=0xc4746f18, usermode=0, eva=3294351360)
at ../../i386/i386/trap.c:825
#14 0xc01ff64e in trap (frame={tf_es = 16, tf_ds = 16, tf_edi = 46137344, 
  tf_esi = -1071149988, tf_ebp = -999002244, tf_isp = -999002304, 
  tf_ebx = 18341888, tf_edx = -1000615936, tf_ecx = -1005747008, 
  tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -1071650796, tf_cs = 8, 
  tf_eflags = 65606, tf_esp = -1072552121, tf_ss = -999654400})
at ../../i386/i386/trap.c:438
#15 0xc01fe814 in swtch_com ()
#16 0xc01ff859 in trap (frame={tf_es = 47, tf_ds = 47, tf_edi = 20, 
  tf_esi = 136019608, tf_ebp = -1077948228, tf_isp = -999002156, 
  tf_ebx = 307, tf_edx = 136220264, tf_ecx = 136630944, 
  tf_eax = 135716928, tf_trapno = 7, tf_err = 0, tf_eip = 134536416, 
  tf_cs = 31, tf_eflags = 514, tf_esp = -1077948244, tf_ss = 47})
at ../../i386/i386/trap.c:195
#17 0xc01f5aa3 in swi_ast_user ()

The trap in swtch_com() (frame #15) is here:
        /* switch address space */      <- line 622
        movl    %cr3,%ebx
        cmpl    PCB_CR3(%edx),%ebx      <- trap
        je      4f

I don't think this line is supposed to cause a trap...
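
A note for reading the trace: gdb prints the trap frame registers as signed 32-bit decimals, so they have to be reinterpreted as unsigned values before they can be compared with the hex addresses in the backtrace. A minimal sketch in C; the helper names `frame_reg_to_addr` and `print_frame_reg` are made up for illustration and are not part of any kernel or gdb interface:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: reinterpret a signed 32-bit trap frame value
 * (as gdb prints it) as the unsigned address it represents.
 */
uint32_t
frame_reg_to_addr(int32_t reg)
{
	return (uint32_t)reg;
}

/* Print one register in the hex form used by the backtrace. */
void
print_frame_reg(const char *name, int32_t reg)
{
	printf("%s = 0x%08lx\n", name,
	    (unsigned long)frame_reg_to_addr(reg));
}
```

With the values from frame #14, tf_eip = -1071650796 becomes 0xc01fe814, which is the swtch_com address shown in frame #15, and tf_edx = -1000615936 becomes 3294351360, the same value as eva in frame #13. So the faulting access appears to be the read through %edx itself.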

Dima







Re: msdosfs problems?

1999-04-10 Thread Dmitrij Tejblum
Alex Zepeda wrote:
> panic: vm_page_bits: illegal base/size 4096/2048

The panic was, hopefully, just fixed in vnode_pager.c rev. 1.107. I didn't 
quite understand whether you have other msdosfs problems.

Dima







Re: SPAM

1999-05-10 Thread Dmitrij Tejblum
> > "Jonathan M. Bresler" wrote:
> > >   with volunteers, we could moderate the list(s). mail transfer
> > > would be slower as we wait for the moderator(s) to approve each piece
> > > of email.  if we use more than one moderator per list, the
> > > time-sequence of email would be lost... we would get some very
> > > strange threads... could be entertaining.
> >
> > Have you ever considered only allowing list members to post, or are
> > there difficulties that make this impossible?

I suggest the following approach: moderate only mail that lacks the 
mailing list name in the To: or Cc: headers. It is far from ideal, but 
I think it would work reasonably well.
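
The rule could be sketched roughly like this in C. This is only an illustration under simple assumptions: the header values are already extracted into strings, a missing header is passed as NULL, and a plain case-insensitive substring match on the list name is good enough. The function names are hypothetical and this is not how majordomo actually does it.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive substring search; returns 1 if needle occurs in hay. */
int
contains_nocase(const char *hay, const char *needle)
{
	size_t i, j;

	if (hay == NULL || needle == NULL)
		return 0;
	if (needle[0] == '\0')
		return 1;
	for (i = 0; hay[i] != '\0'; i++) {
		for (j = 0; needle[j] != '\0'; j++) {
			/* Stop at the end of hay or at the first mismatch. */
			if (hay[i + j] == '\0' ||
			    tolower((unsigned char)hay[i + j]) !=
			    tolower((unsigned char)needle[j]))
				break;
		}
		if (needle[j] == '\0')
			return 1;
	}
	return 0;
}

/*
 * Returns 1 if the message should go to a moderator, i.e. the list
 * name appears in neither the To: nor the Cc: header value.
 */
int
needs_moderation(const char *to, const char *cc, const char *list_name)
{
	return !contains_nocase(to, list_name) &&
	    !contains_nocase(cc, list_name);
}
```

Mail addressed to the list directly or by Cc: passes through unmoderated; mail that reaches the list only through Bcc: or a hidden recipient list is held for a moderator.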

Dima







Re: make world croaks in perl ??

1999-05-13 Thread Dmitrij Tejblum
Yup, this is supposed to fix the problem that I introduced a day 
before. The problem depended on 'make -jN'. Sorry for the 
inconvenience.

BTW, I hope there will be no more 'Your makefile has been rebuilt' 
failures.

Dima

Steve Kargl wrote:
> It doesn't fix it here, but "dt" committed a fix to a Makefile
> in the Perl tree.  This might actually fix the problem.  I'm
> rebuilding now to see. 
> 
> Poul-Henning Kamp wrote:
> > 
> > Hmm, double make cleandir fixed it.
> > 
> > Could a conveniently located perl wizard try to figure out what
> > this is tripping over and fix the build ?
> > 
> > In message <9459.926593...@critter.freebsd.dk>, Poul-Henning Kamp writes:
> > >
> > >Is anybody but me seeing this ?
> > >
> > >===> gnu/usr.bin/perl
> > >===> gnu/usr.bin/perl/libperl
> > >===> gnu/usr.bin/perl/miniperl
> > >sh config_h.sh
> > >Extracting config.h (with variable substitutions)
> > >cc -nostdinc -O -pipe
> > >-I/usr/src/gnu/usr.bin/perl/miniperl/../../../../contrib/perl5
> > >-I/usr/obj/usr/src/gnu/usr.bin/perl/miniperl
> > >-I/usr/obj/usr/src/tmp/usr/include -c
> > >/usr/src/gnu/usr.bin/perl/miniperl/../../../../contrib/perl5/miniperlmain.c
> > >cc -nostdinc -O -pipe
> > >-I/usr/src/gnu/usr.bin/perl/miniperl/../../../../contrib/perl5
> > >-I/usr/obj/usr/src/gnu/usr.bin/perl/miniperl
> > >-I/usr/obj/usr/src/tmp/usr/include -static -o miniperl miniperlmain.o -lperl -lm -lcrypt
> > >===> gnu/usr.bin/perl/perl
> > >make: don't know how to make writemain.sh. Stop
> > >*** Error code 2
> > >
> > >
> > >--
> > >Poul-Henning Kamp FreeBSD coreteam member
> > >p...@freebsd.org   "Real hackers run -current on their laptop."
> > >FreeBSD -- It will take a long time before progress goes too far!
> 
> 
> -- 
> Steve







Re: Berkeley DB 1.85 --> 2.0

1999-05-14 Thread Dmitrij Tejblum
"Andrey A. Chernov" wrote:
> On Fri, May 14, 1999 at 11:15:35AM -0400, John R. LoVerso wrote:
> > Of course, DB 2 is still available as an easily installed port/package.
> 
> Not so easily, it conflicts with libc's DB in a subtle but harmful manner.

Only if it is configured with --enable-compat185. Just Don't Do That.
(Yep, the port does it, and for this reason I consider it broken.)

Dima







calcru and upages

1999-05-19 Thread Dmitrij Tejblum
calcru() accesses p_stats, which is in upages. Therefore, as I understand 
it, it should not be called on a swapped-out process. Neither calcru() nor 
its callers seem to ensure this. At least the call in procfs_dostatus()
may happen on a swapped-out process. (It tests for P_INMEM before another 
access to p_stats several lines earlier :-/)
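
To make the missing check concrete, here is a userland sketch of the guard a caller would need. Everything in it is a stand-in: the struct layouts, the flag value and the function name are hypothetical, and the real struct proc and calcru() are considerably more involved.

```c
#include <stddef.h>

#define P_INMEM_SKETCH	0x00004	/* arbitrary stand-in for P_INMEM */

struct pstats_sketch {
	long ticks;
};

struct proc_sketch {
	int p_flag;
	struct pstats_sketch *p_stats;	/* points into the u-area */
};

/*
 * Copy the statistics out only if the process is resident.
 * Returns 0 and fills *out on success, -1 if the process is
 * swapped out and p_stats must not be dereferenced.
 */
int
read_stats_checked(const struct proc_sketch *p, struct pstats_sketch *out)
{
	if ((p->p_flag & P_INMEM_SKETCH) == 0)
		return -1;
	*out = *p->p_stats;
	return 0;
}
```

The point is only that the residency test has to happen before every dereference of p_stats, not just before some of them.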

Dima







Re: calcru and upages

1999-05-23 Thread Dmitrij Tejblum
Peter Wemm wrote:
> Bruce Evans wrote:
> > >calcru() accesses p_stats, which is in upages. Therefore, as I understand, 
> > >it should not be called on a swapped out process. Neither calcru() nor 
> > 
> > Does anyone object to moving everything except the stack from the upages
> > to the proc table?

This would certainly make me sleep better. However, IMO the real 
problem here is the hackish way the VM maintains upages. It is not
so hard to make such incorrect accesses to the u-area easier to detect.
I used this:

--- vm_glue.c   Thu May 20 00:24:18 1999
+++ vm_glue.c   Thu May 20 00:27:33 1999
@@ -317,6 +317,9 @@ faultin(p)
 			setrunqueue(p);
 
 		p->p_flag |= P_INMEM;
+		p->p_stats = &p->p_addr->u_stats;
+		if (p->p_sigacts == NULL)
+			p->p_sigacts = &p->p_addr->u_sigacts;
 
 		/* undo the effect of setting SLOCK above */
 		--p->p_lock;
@@ -516,6 +519,9 @@ swapout(p)
 	(void) splhigh();
 	p->p_flag &= ~P_INMEM;
 	p->p_flag |= P_SWAPPING;
+	p->p_stats = NULL;
+	if (p->p_sigacts == &p->p_addr->u_sigacts)
+		p->p_sigacts = NULL;
 	if (p->p_stat == SRUN)
 		remrq(p);
 	(void) spl0();


Probably a better idea would be to pass MAP_NOFAULT in a (currently 
nonexistent) argument to kmem_alloc_pageable() in pmap_new_proc().
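
The effect of the patch above can be modelled in userland as follows. This is a simplified sketch: the types and function names are hypothetical, the flag value is arbitrary, and the p_sigacts handling is left out.

```c
#include <stddef.h>

#define INMEM_FLAG	0x1	/* arbitrary stand-in for P_INMEM */

struct ustats_model {
	long ticks;
};

struct uarea_model {
	struct ustats_model u_stats;
};

struct proc_model {
	int p_flag;
	struct uarea_model *p_addr;	/* the u-area */
	struct ustats_model *p_stats;	/* NULL while swapped out */
};

/* On swapout, drop the pointer so a stale user hits NULL at once. */
void
model_swapout(struct proc_model *p)
{
	p->p_flag &= ~INMEM_FLAG;
	p->p_stats = NULL;
}

/* On faultin, point p_stats back into the u-area. */
void
model_faultin(struct proc_model *p)
{
	p->p_flag |= INMEM_FLAG;
	p->p_stats = &p->p_addr->u_stats;
}
```

An access to p_stats on a swapped-out process then goes through a NULL pointer and fails immediately, instead of touching whatever the stale u-area address happens to map to.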

> Well, we have three things that are about the same size:
>     struct  pcb u_pcb;            240 bytes
>     struct  sigacts u_sigacts;    292 bytes
>     struct  pstats u_stats;       248 bytes
> 
> On the other hand:  sizeof (struct proc) = 328 bytes.
> 
> the pcb contains a heap of space for the FP state.  It accounts for 176 of
> the 240 bytes, leaving 64-odd bytes left for the pcb proper.  The ldt
> pointers need to move to proc scope for rfork()/clone(), and gc'ing a few
> things that can get it as low as 40 - 48 bytes.  pcb_savefpu has padding in
> case a FPU emulator is used and is actually smaller than 176 bytes, and
> could be changed depending on whether it's a real or emulated fpu.

I guess this is a bit different on alpha ;-).

> 
> IMHO, I'd move them to reference counted malloc'ed structs since sigacts
> needs to be shared for clone/rforked processes.

I think sigacts is already sometimes shared, and is not stored in the 
u-area in these cases.

> I think there is also
> benefit to having the sigacts at least malloced, one day we should be able
> to extend the signals beyond the existing 32 set, at least for the 32
> extra RT signals.

Isn't this an argument for keeping them in upages? The larger the 
struct is, the more you want to be able to swap it out.

> I personally would love to see this come out of the upages, it makes
> tracking through a stack overflow even harder.  We could put an unmapped
> red-line page below the bottom stack page to ensure we get a double fault
> on an overflow instead of mystery corruptions etc.

Why not move 'struct user' to the top of the upages, above the kernel stack?

Having said all that, I don't actually suggest keeping struct user. I 
almost hate it.

Dima







Re: kvm_getswapinfo is broken

1999-05-26 Thread Dmitrij Tejblum
"Andrey A. Chernov" wrote:
> Just check 'swapinfo' in recent -current, it shows "/dev/(null)" as swap
> device, it means that devinfo() call in kvm_getswapinfo() returns NULL,
> i.e. called with wrong argument which is swinfo.sw_dev

This is a known problem. It happens because dev_t in the kernel and 
dev_t in userland are now different things. This is worse on the alpha, 
where they also have different sizes. So, on the alpha, the numbers are 
broken too, not just the device names.
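
Why a size difference breaks the numbers and not only the names can be shown with a small C sketch. The two structs below are hypothetical: the field names and the 8-byte versus 4-byte widths are assumptions chosen for illustration, not the actual kernel structure that kvm_getswapinfo() reads.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record as the kernel lays it out: wide device field. */
struct swrec_kernel {
	uint64_t dev;
	int32_t nblks;
	int32_t used;
};

/* The same record as a userland reader assumes it: narrow device field. */
struct swrec_user {
	uint32_t dev;
	int32_t nblks;
	int32_t used;
};

/* Byte offset at which each side expects to find the 'used' field. */
size_t
kernel_used_offset(void)
{
	return offsetof(struct swrec_kernel, used);
}

size_t
user_used_offset(void)
{
	return offsetof(struct swrec_user, used);
}
```

Once the first field differs in size, every later field sits at a different offset, so a reader that copies the kernel's bytes into its own idea of the struct picks up the counters from the wrong place.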

Supposedly, it will be fixed by a junior kernel hacker.

Dima







Re: net.inet.tcp.always_keepalive on as default ?

1999-06-08 Thread Dmitrij Tejblum
"Louis A. Mamakos" wrote:
>
> Before documenting it, how about we fix its name to be more accurate
> to newcomers: net.inet.tcp.always_makedead, etc.  There's no part of
> this (in many cases misguided) mechanism that keeps anything "alive."

I disagree. I use keepalive exactly to keep my connections (mostly ssh 
sessions) alive.

The sysadmin of our corporate LAN made our router drop all connections 
that have been idle for 15 minutes or so. As I understand it, the router 
sends fake RST packets. The sysadmin believes that this increases 
security. I tried to convince him not to do it, I tried to kill him, but 
didn't succeed. So now I set net.inet.tcp.keepidle to a really low value, 
and it keeps my connections alive!
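
For completeness: the above relies on the system-wide sysctls, but a program can also ask for keepalive probes on a single socket with the standard SO_KEEPALIVE option. A minimal sketch in C; the wrapper name `enable_keepalive` is made up for illustration, and this is a per-socket alternative, not what I actually did:

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Ask the TCP stack to send keepalive probes on this socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt).
 */
int
enable_keepalive(int fd)
{
	int on = 1;

	return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}
```

This sketch only switches the probes on. It sets no per-socket timing, so how long the connection may sit idle before the first probe still comes from the system-wide setting.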

Dima



