from:"Jeff Roberson"

PXE build?

2000-11-14 Thread Jeff Roberson

Does anyone know of any current issues with PXE?  I've searched the mailing lists and I don't see any mention of a problem similar to mine.

I'm running FreeBSD-CURRENT from 2000-09-15 on a server.  The client has an Intel 21143-based Ethernet card that claims PXE 2.0 (Build 74) support.  I've set up bootp/tftp on the server, which the client successfully uses to pull down the 'pxeboot' file.  After the client retrieves pxeboot it just hangs, with no further output from the machine.

Does anyone know which particular build of PXE 2.0 works with pxeboot?  Or is this even a problem with my firmware?


Thanks,
Jeff

Bug Fix for SYSV semaphores.

2000-12-12 Thread Jeff Roberson

I noticed that sysv semaphores initialize the otime member of the semid_ds structure to 0, but they never update it afterwards.  This field is supposed to be the last operation time, i.e., the last time a semctl was done.  In UNIX Network Programming, Stevens suggests using this variable to detect races between multiple processes creating/accessing a sysv semaphore.  Anyway, I looked through the code and came up with the following trivial patch.  Could someone review it and perhaps commit it?  This patch was made against current, but I noticed the bug is there in 4.1.1 and most likely everything before that.

Thanks,
Jeff


(Pardon the revision numbers, they are from my own repository)


*** sysv_sem.c  2000/09/15 11:11:48 1.1.1.1
--- sysv_sem.c  2000/12/12 23:44:28
***************
*** 543,548 ****
--- 543,550 ----
  		return(EINVAL);
  	}
  
+ 	semaptr->sem_otime = time_second;
+ 
  	if (eval == 0)
  		p->p_retval[0] = rval;
  	return(eval);
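
For context, the check Stevens describes can be sketched in userland C as below.  This is a minimal sketch and not part of the patch: the helper names sem_initialized and wait_for_sem_init and the retry count are hypothetical.  Note that POSIX documents sem_otime as the time of the last semop() call, so a late arrival treats a non-zero value as a sign that the creator has finished setting up the semaphore set.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>

#if defined(__linux__)
/* glibc leaves union semun for the caller to declare. */
union semun {
	int val;
	struct semid_ds *buf;
	unsigned short *array;
};
#endif

/* True once sem_otime has been set, i.e. the set has been operated on. */
bool
sem_initialized(const struct semid_ds *ds)
{
	return (ds->sem_otime != 0);
}

/*
 * Poll the set until sem_otime becomes non-zero or the attempts run out.
 * Returns 0 when the set looks initialized, -1 on error or timeout.
 */
int
wait_for_sem_init(int semid, int max_tries)
{
	struct semid_ds ds;
	union semun arg;
	int i;

	arg.buf = &ds;
	for (i = 0; i < max_tries; i++) {
		if (semctl(semid, 0, IPC_STAT, arg) == -1)
			return (-1);
		if (sem_initialized(&ds))
			return (0);
		sleep(1);	/* give the creator time to finish */
	}
	return (-1);
}
```

A process that loses the IPC_CREAT | IPC_EXCL race would call wait_for_sem_init() on the existing set before using it.  If the kernel never updates sem_otime, that loop can only time out, which is why the field matters.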

Broken mmap in current?

2001-01-11 Thread Jeff Roberson

I have written a character device driver for a proprietary PCI device that has a large amount of mappable memory.  The character device supports mmap(), which I use to export the memory into a user process.  I have no problems accessing the memory on this device, but I notice that my mmap routine is called for every access!  Is this a problem with current, or a problem with my mmap?

I use bus_alloc_resource and then rman_get_start to get the physical address in my attach, and then the mmap just returns atop(physical address).  I assumed this is correct since I have verified with a logic analyzer that I am indeed writing to the memory on the device.  Also, I noticed that the device's mmap interface does not provide any way to limit the size of the block being mapped.  Can I specify the length of the region?

Thanks,
Jeff

RE: Broken mmap in current?

2001-01-12 Thread Jeff Roberson

I think I spoke too soon.  I saw thousands of calls to mmap and assumed they were for the thousands of reads/writes that I was doing.  They are actually for the thousands (8192) of pages that I'm mapping in.  Oddly enough, though, there are only 3272 calls to my mmap routine each time I run the program.  I will investigate further.

I did find a bug in mlock() and munlock().  I tried mlock()ing after I mmap()ed, which I later realized was bogus since the pages are always resident as they exist on the bus.  Anyway, the kernel faults in vm_page_unwire when I munlock.  I will investigate further and post a PR though.

Thanks for your help!
Jeff

-----Original Message-----
From: Bruce Evans [mailto:[EMAIL PROTECTED]]
Sent: Thursday, January 11, 2001 8:52 PM
To: Jeff Roberson
Cc: '[EMAIL PROTECTED]'
Subject: Re: Broken mmap in current?

On Thu, 11 Jan 2001, Jeff Roberson wrote:

> I have written a character device driver for a proprietary PCI device that
> has a large sum of mapable memory.  The character device supports mmap()
> which I use to export the memory into a user process.  I have no problems
> accessing the memory on this device, but I notice that my mmap routine is
> called for every access!  Is this a problem with current, or a problem with
> my mmap?

Maybe both.  The device mmap routine is called mainly by the mmap
syscall for every page to be mmapped.  It is also called by
dev_pager_getpages() for some pagefaults, but I think this rarely happens.

> I use bus_alloc_resource and then rman_get_start to get the physical address
> in my attach, and then the mmap just returns atop(physical address).  I
> assumed this is correct since I have verified with a logical analyzer that I
> am indeed writing to the memory on the device.

This is correct.  I looked at some examples.  Many drivers get this
wrong by using i386_btop(), alpha_btop(), etc.  (AFAIK, atop() is
for addresses, which are what we are converting here, btop() is for
(byte) offsets, and the machine-dependent prefixes are a vestige of
page clustering code that mostly went away 7 years ago.)

> Also, I noticed that the
> device's mmap interface does not provide any way to limit the size of the
> block being mapped?  Can I specify the length of the region?

The length is implicitly PAGE_SIZE.  The device mmap function is called
for each page to be mapped.  It must verify that the memory from offset
to (offset + PAGE_SIZE - 1) belongs to the device and can be accessed
with the given protection, and do any device-specific things necessary to
enable this memory.  This scheme can't support bank-switched device
memory very well, if at all.

pcvvt_mmap() in the pcvt driver is the simplest example of this.
agp_mmap() is a more up to date example with the same old bug that the
vga drivers used to have (off by 1 (page) error checking the offset).
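
The per-page check described above, including the off-by-one page error, can be modelled in plain C roughly as follows.  This is a userland sketch under stated assumptions, not driver code: PAGE_SHIFT_SKETCH, PAGE_SIZE_SKETCH, struct dev_region, and region_mmap_page are hypothetical names, and a 4 KB page is assumed.  A real device mmap entry point has a different signature; the shift here only mimics what atop() does to a physical address.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12	/* assumed 4 KB pages */
#define PAGE_SIZE_SKETCH  ((uint64_t)1 << PAGE_SHIFT_SKETCH)

struct dev_region {
	uint64_t phys_base;	/* page-aligned physical start address */
	uint64_t size;		/* length of the device memory in bytes */
};

/*
 * Return the physical page index for the page starting at offset, or -1
 * if [offset, offset + PAGE_SIZE - 1] is not wholly inside the region.
 */
int64_t
region_mmap_page(const struct dev_region *r, uint64_t offset)
{
	/*
	 * Testing "offset > r->size" instead would accept offset == size,
	 * which is one page past the end: the off-by-one (page) error.
	 */
	if (r->size < PAGE_SIZE_SKETCH ||
	    offset > r->size - PAGE_SIZE_SKETCH)
		return (-1);
	return ((int64_t)((r->phys_base + offset) >> PAGE_SHIFT_SKETCH));
}
```

For an 8192-page region, the last valid offset is 8191 * 4096; the offset 8192 * 4096 must be rejected even though it is not greater than the region size.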

Bruce

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message

HEADS UP: SUJ Going in to head today

2010-04-20 Thread Jeff Roberson


Hi Folks,

You may have seen my other Soft-updates journaling (SUJ) announcements. 
If not, it is a journaling system that works cooperatively with 
soft-updates to eliminate the full background filesystem check after an 
unclean shutdown.  SUJ may be enabled with tunefs -j enable and disabled 
with tunefs -j disable on an unmounted filesystem.  It is backwards 
compatible with soft-updates with no journal.


I'm going to do another round of tests and buildworld this afternoon to
verify the diff and then I'm committing to head.  This is a very large
feature and fundamentally changes softupdates.  Although it has been
extensively tested by many, there may be unforeseen problems.  If you run
into an issue that you think may be SUJ-related, please email me directly
as well as posting on current, as I sometimes miss list email and this
will ensure the quickest response.


Thanks,
Jeff
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: HEADS UP: SUJ Going in to head today

2010-04-21 Thread Jeff Roberson


On Tue, 20 Apr 2010, Patrick Tracanelli wrote:


Jeff Roberson wrote:

Hi Folks,

You may have seen my other Soft-updates journaling (SUJ) announcements.
If not, it is a journaling system that works cooperatively with
soft-updates to eliminate the full background filesystem check after an
unclean shutdown.  SUJ may be enabled with tunefs -j enable and disabled
with tunefs -j disable on an unmounted filesystem.  It is backwards
compatible with soft-updates with no journal.

I'm going to do another round of tests and buildworld this afternoon to
verify the diff and then I'm committing to head.  This is a very large
feature and fundamentally changes softupdates.  Although it has been
extensively tested by many there may be unforseen problems.  If you run
into an issue that you think may be suj please email me directly as well
as posting on current as I sometimes miss list email and this will
ensure the quickest response.


Hello Jeff, McKusick and others involved.

Is an MFC technically possible? If so, are there plans to do so?


I do have an 8 backport branch available, although it is a little stale.  I
intend to keep it somewhat up to date.  I think it will take some time to
gain sufficient experience with SUJ in head before we want to merge it
back to 8.  It is quite a complex and disruptive feature.


Thanks,
Jeff



Thank you.

--
Patrick Tracanelli



Re: HEADS UP: SUJ Going in to head today

2010-04-23 Thread Jeff Roberson


On Wed, 21 Apr 2010, Garrett Cooper wrote:


On Wed, Apr 21, 2010 at 12:39 AM, Gary Jennejohn wrote:

On Tue, 20 Apr 2010 12:15:48 -1000 (HST)
Jeff Roberson wrote:


Hi Folks,

You may have seen my other Soft-updates journaling (SUJ) announcements.
If not, it is a journaling system that works cooperatively with
soft-updates to eliminate the full background filesystem check after an
unclean shutdown.  SUJ may be enabled with tunefs -j enable and disabled
with tunefs -j disable on an unmounted filesystem.  It is backwards
compatible with soft-updates with no journal.

I'm going to do another round of tests and buildworld this afternoon to
verify the diff and then I'm committing to head.  This is a very large
feature and fundamentally changes softupdates.  Although it has been
extensively tested by many there may be unforseen problems.  If you run
into an issue that you think may be suj please email me directly as well
as posting on current as I sometimes miss list email and this will ensure
the quickest response.



And the crowd goes wild.

SUJ is _great_ and I'm glad to see it finally making it into the tree.


   Indeed. I'm looking forward to testing the junk out of this --
this is definitely a good move forward with UFS2 :]...
Cheers,
-Garrett

PS How does this interact with geom with journaling BTW? Has this been
tested performance wise (I know it doesn't make logistical sense, but
it does kind of seem to null and void the importance of geom with
journaling, maybe...)?



A quick update: I found a bug with snapshots that held up the commit.
Hopefully I will be done with it tonight.


About gjournal: there would be no reason to use the two together.  There
may be cases where each is faster.  In fact it is very likely.  pjd has
said he thinks suj will simply replace gjournal.  GEOM itself is no less
important with suj in place, as it of course fills many roles.


Performance testing has been done.  There is no regression in softdep 
performance with journaling disabled.  With journaling enabled there are 
some cases that are slightly slower.  It adds an extra ordered write so 
any time you modify the filesystem metadata and then require it to be 
synchronously written to disk you may wait for an extra transaction.


There are ways to further improve the performance.  In fact I did some 
experiments that showed dbench performance nearly identical to vanilla 
softdep if I can resolve one wait situation.  Although this is not trivial 
it is possible.  The CPU overhead ended up being surprisingly trivial in 
the cases I tested.  Really the extra overhead is only when doing sync 
writes that allocate new blocks.


I am eager to see wider coverage and hear feedback from more people.  I 
suspect for all desktop and nearly all server use it will simply be 
transparent.


Thanks,
Jeff

Re: HEADS UP: SUJ Going in to head today

2010-04-24 Thread Jeff Roberson


On Sun, 25 Apr 2010, Alex Keda wrote:


try in single user mode:

tunefs -j enable /
tunefs: Insuffient free space for the journal
tunefs: soft updates journaling can not be enabled

tunefs -j enable /dev/ad0s2a
tunefs: Insuffient free space for the journal
tunefs: soft updates journaling can not be enabled
tunefs: /dev/ad0s2a: failed to write superblock


There is a bug that prevents enabling journaling on a mounted filesystem. 
So for now you can't enable it on /.  I see that you have a large / volume 
but in general I would also suggest people not enable suj on / anyway as 
it's typically not very large.  I only run it on my /usr and /home 
filesystems.


I will send a mail out when I figure out why tunefs can't enable suj on / 
while it is mounted read-only.


Thanks,
Jeff



on / (/dev/ad0s2a) ~40Gb free.
dc7700p$ uname -a
FreeBSD dc7700p.lissyara.su 9.0-CURRENT FreeBSD 9.0-CURRENT #0 r207156: Sun 
Apr 25 00:04:24 MSD 2010 
lissy...@dc7700p.lissyara.su:/usr/obj/usr/src/sys/GENERIC  amd64

dc7700p$

Re: HEADS UP: SUJ Going in to head today - panic on rename()

2010-04-26 Thread Jeff Roberson


On Mon, 26 Apr 2010, Vladimir Grebenschikov wrote:


Hi

First, many thanks for this effort, it is really very appreciated,

Panic on Gnome starting:


Thank you for the report with stack.  That was very helpful.  I know how 
to fix this bug but it will take me a day or two as my primary test 
machine seems to have died.


For now you will have to tunefs -j disable on that volume.

Thanks,
Jeff



# kgdb -q /usr/obj/usr/src/sys/VBOOK/kernel.debug /var/crash/vmcore.12
...
#0  doadump () at pcpu.h:246
246 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) x/s panicstr
0xc07c2160 :"remove_from_journal: 0xc581ec40 is not in journal"
(kgdb) bt
#0  doadump () at pcpu.h:246
#1  0xc056b883 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:416
#2  0xc056babd in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:590
#3  0xc0488ba9 in db_fncall (dummy1=1, dummy2=0, dummy3=-1065321792, dummy4=0xd90d572c 
"") at /usr/src/sys/ddb/db_command.c:548
#4  0xc0488fa1 in db_command (last_cmdp=0xc07abb1c, cmd_table=0x0, dopager=1) 
at /usr/src/sys/ddb/db_command.c:445
#5  0xc04890fa in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#6  0xc048af7d in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:229
#7  0xc0597f54 in kdb_trap (type=3, code=0, tf=0xd90d58c4) at 
/usr/src/sys/kern/subr_kdb.c:535
#8  0xc06f842e in trap (frame=0xd90d58c4) at /usr/src/sys/i386/i386/trap.c:694
#9  0xc06dcf7b in calltrap () at /usr/src/sys/i386/i386/exception.s:165
#10 0xc05980ba in kdb_enter (why=0xc0747a43 "panic", msg=0xc0747a43 "panic") at 
cpufunc.h:71
#11 0xc056baa1 in panic (fmt=0xc0755fee "remove_from_journal: %p is not in 
journal") at /usr/src/sys/kern/kern_shutdown.c:573
#12 0xc0672135 in remove_from_journal (wk=0xc0c3ec2f) at 
/usr/src/sys/ufs/ffs/ffs_softdep.c:2204
#13 0xc067e273 in cancel_jaddref (jaddref=0xc581ec40, inodedep=0xc5c58700, 
wkhd=0xc5c5875c) at /usr/src/sys/ufs/ffs/ffs_softdep.c:3336
#14 0xc067f163 in softdep_revert_link (dp=0xc681f9f8, ip=0xc681f910) at 
/usr/src/sys/ufs/ffs/ffs_softdep.c:3871
#15 0xc0697fd0 in ufs_rename (ap=0xd90d5c1c) at 
/usr/src/sys/ufs/ufs/ufs_vnops.c:1546
#16 0xc070ead6 in VOP_RENAME_APV (vop=0xc0796340, a=0xd90d5c1c) at 
vnode_if.c:1474
#17 0xc05f2902 in kern_renameat (td=0xc586e8c0, oldfd=-100, old=0x4856ca30 
, newfd=-100,
   new=0x4856ca90 , pathseg=UIO_USERSPACE) at 
vnode_if.h:636
#18 0xc05f29b6 in kern_rename (td=0xc586e8c0, from=0x4856ca30 , to=0x4856ca90 , pathseg=UIO_USERSPACE)
   at /usr/src/sys/kern/vfs_syscalls.c:3574
#19 0xc05f29e9 in rename (td=0xc586e8c0, uap=0xd90d5cf8) at 
/usr/src/sys/kern/vfs_syscalls.c:3551
#20 0xc06f7c49 in syscall (frame=0xd90d5d38) at 
/usr/src/sys/i386/i386/trap.c:1113
#21 0xc06dcfe0 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:261
#22 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)


Just after fsck -y && tunefs -j enable for both / and /usr in
single-user mode and then usual boot

panic is reproducible


--
Vladimir B. Grebenschikov
v...@fbsd.ru



Re: HEADS UP: SUJ Going in to head today

2010-04-26 Thread Jeff Roberson


On Sun, 25 Apr 2010, Lucius Windschuh wrote:


Hi Jeff,
thank you for your effort in implementing the soft update journaling.
I tried to test SUJ on a provider with 4 kB block size. My system runs
9-CURRENT r207195 (i386).
Unfortunately, tunefs is unable to cope with the device. It can easily
be reproduced with these steps:

# mdconfig -s 128M -S 4096
0
#  newfs -U /dev/md0


Thanks for the repro.  This is an interesting case.  I'll have to slightly 
rewrite the directory handling code in tunefs but it should not take long.


Thanks,
Jeff


/dev/md0: 128.0MB (262144 sectors) block size 16384, fragment size 4096
   using 4 cylinder groups of 32.02MB, 2049 blks, 2112 inodes.
   with soft updates
# tunefs -j enable /dev/md0
Using inode 4 in cg 0 for 4194304 byte journal
tunefs: Failed to read dir block: Invalid argument
tunefs: soft updates journaling can not be enabled

The bread() in tunefs.c:701 fails as the requested block size (512) is
smaller than the provider's block size (4096 bytes).
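
The failure described here follows from the general rule that I/O on a raw provider has to be issued in whole sectors, at sector-aligned offsets.  A minimal sketch of rounding a request length up to the sector size is shown below; round_up_to_sector is a hypothetical helper for illustration and is not taken from tunefs.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round a requested transfer length up to a whole number of sectors.
 * secsize must be non-zero; it need not be a power of two.
 */
size_t
round_up_to_sector(size_t len, size_t secsize)
{
	return (((len + secsize - 1) / secsize) * secsize);
}
```

With the values from this report, a 512-byte request against a 4096-byte-sector provider would become one 4096-byte read, and the caller would then use only the bytes it needs from the buffer.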

As a simple attempt to fix it, I changed tunefs.c:760 to "if
(dir_extend(blk, nblk, size, ino) == -1)", as I thought that this made
more sense. Then, tunefs succeeded, but mounting the file system
resulted in a panic:
panic: ufs_dirbad: /mnt/md-test: bad dir ino 2 at offset 512: mangled entry

db:0:kdb.enter.default>  bt
Tracing pid 2714 tid 100262 td 0xc7ea6480
kdb_enter(c0a21226,c0a21226,c0a49886,eb1e6714,0,...) at kdb_enter+0x3a
panic(c0a49886,c688f468,2,200,c0a498df,...) at panic+0x136
ufs_dirbad(c81bb000,200,c0a498df,0,eb1e67b0,...) at ufs_dirbad+0x46
ufs_lookup_ino(c81d5990,0,eb1e67d8,eb1e6800,0,...) at ufs_lookup_ino+0x367
softdep_journal_lookup(c688f288,eb1e68c4,c0a45eca,750,eb1e6834,...) at
softdep_journal_lookup+0xb0
softdep_mount(c7e3fbb0,c688f288,c8165000,c7bdf900,c7bdf900,...) at
softdep_mount+0xdb
ffs_mount(c688f288,0,c0a2df89,3d6,0,...) at ffs_mount+0x23e1
vfs_donmount(c7ea6480,0,c7bc6100,c7bc6100,c8031000,...) at vfs_donmount+0x1000
nmount(c7ea6480,eb1e6cf8,c,c,207,...) at nmount+0x64
syscall(eb1e6d38) at syscall+0x1da
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (378, FreeBSD ELF32, nmount), eip = 0x280f205b, esp =
0xbfbfdcec, ebp = 0xbfbfe248 ---

... so this attempt did not succeed, but was worth a try ;-)

But it would be nice to use SUJ even on such an unusual configuration.

Lucius



Re: HEADS UP: SUJ Going in to head today

2010-04-26 Thread Jeff Roberson


On Sun, 25 Apr 2010, Gary Jennejohn wrote:


On Sat, 24 Apr 2010 16:57:59 -1000 (HST)
Jeff Roberson  wrote:


On Sun, 25 Apr 2010, Alex Keda wrote:


try in single user mode:

tunefs -j enable /
tunefs: Insuffient free space for the journal
tunefs: soft updates journaling can not be enabled

tunefs -j enable /dev/ad0s2a
tunefs: Insuffient free space for the journal
tunefs: soft updates journaling can not be enabled
tunefs: /dev/ad0s2a: failed to write superblock


There is a bug that prevents enabling journaling on a mounted filesystem.
So for now you can't enable it on /.  I see that you have a large / volume
but in general I would also suggest people not enable suj on / anyway as
it's typically not very large.  I only run it on my /usr and /home
filesystems.

I will send a mail out when I figure out why tunefs can't enable suj on /
while it is mounted read-only.



Jeff -
One thing which surprised me was that I couldn't reuse the existing
.sujournal files on my disks.  I did notice that there are now more
flags set on them.  Was that the reason?  Or were you just being
careful?


There were a few iterations of the code to create and discover the actual 
journal inode.  I may have introduced an incompatibility when making fsck 
more careful about what it treats as a journal.  If it were to attempt to 
apply changes from a garbage file it could corrupt your filesystem.


Thanks,
Jeff



--
Gary Jennejohn



Re: HEADS UP: SUJ Going in to head today

2010-04-26 Thread Jeff Roberson


On Sun, 25 Apr 2010, Scott Long wrote:


On Apr 24, 2010, at 8:57 PM, Jeff Roberson wrote:

On Sun, 25 Apr 2010, Alex Keda wrote:


try in single user mode:

tunefs -j enable /
tunefs: Insuffient free space for the journal
tunefs: soft updates journaling can not be enabled

tunefs -j enable /dev/ad0s2a
tunefs: Insuffient free space for the journal
tunefs: soft updates journaling can not be enabled
tunefs: /dev/ad0s2a: failed to write superblock


There is a bug that prevents enabling journaling on a mounted filesystem. So 
for now you can't enable it on /.  I see that you have a large / volume but in 
general I would also suggest people not enable suj on / anyway as it's 
typically not very large.  I only run it on my /usr and /home filesystems.

I will send a mail out when I figure out why tunefs can't enable suj on / while 
it is mounted read-only.



This would preclude enabling journaling on / on an existing system, but I would 
think that you could enable it on / on a system that is being installed, since 
(at least in theory) the target / filesystem won't be the actual root of the 
system, and therefore can be unmounted at will.


That's definitely true.  Some users have had mixed success enabling it on 
/.  It looks like it is a bug either in g_access or ffs's use of g_access 
which does not allow tunefs to write after a downgrade.  I'm not yet sure 
how this is presently working for the softdep flag itself, or if it 
actually is at all.


To clarify my earlier statements:  Journaling only makes sense when the 
fsck time is longer than a few tens of seconds.  So volumes less than a 
gig or two don't really need journaling.  It just costs extra writes and 
fsck time will likely be similar.  In some pathological cases it can even 
be faster to fsck a small volume than it is to run the journal recovery on 
it.


Thanks,
Jeff



Scott



Re: HEADS UP: SUJ Going in to head today

2010-04-26 Thread Jeff Roberson


On Sun, 25 Apr 2010, Bruce Cran wrote:


On Sunday 25 April 2010 19:47:00 Scott Long wrote:

On Apr 24, 2010, at 8:57 PM, Jeff Roberson wrote:

On Sun, 25 Apr 2010, Alex Keda wrote:

try in single user mode:

tunefs -j enable /
tunefs: Insuffient free space for the journal
tunefs: soft updates journaling can not be enabled

tunefs -j enable /dev/ad0s2a
tunefs: Insuffient free space for the journal
tunefs: soft updates journaling can not be enabled
tunefs: /dev/ad0s2a: failed to write superblock


There is a bug that prevents enabling journaling on a mounted filesystem.
So for now you can't enable it on /.  I see that you have a large /
volume but in general I would also suggest people not enable suj on /
anyway as it's typically not very large.  I only run it on my /usr and
/home filesystems.

I will send a mail out when I figure out why tunefs can't enable suj on /
while it is mounted read-only.


This would preclude enabling journaling on / on an existing system, but I
would think that you could enable it on / on a system that is being
installed, since (at least in theory) the target / filesystem won't be the
actual root of the system, and therefore can be unmounted at will.


It worked here - it's shown as enabled after I booted in single-user mode and
enabled it yesterday:


I think some people are enabling after returning to single user from a 
live system rather than booting into single user.  This is a different 
path in the filesystem as booting directly just mounts read-only while the 
other option updates a mount from read/write.  I believe this is the path 
that is broken.


Thanks,
Jeff



core# dumpfs / | grep -i journal
flags   soft-updates+journal

--
Bruce Cran



Re: HEADS UP: SUJ Going in to head today

2010-04-26 Thread Jeff Roberson


On Mon, 26 Apr 2010, pluknet wrote:


On 26 April 2010 17:42, dikshie  wrote:

Hi Jeff,
thanks for SUJ.
btw, why is there nan% utilization? and what does it mean?
--
** SU+J Recovering /dev/ad0s1g
** Reading 33554432 byte journal from inode 4.
** Building recovery table.
** Resolving unreferenced inode list.
** Processing journal entries.
** 0 journal records in 0 bytes for nan% utilization <
** Freed 0 inodes (0 dirs) 0 blocks, and 0 frags.
--



That may be due to an empty journal (the only plausible version for me),
so jrecs and jblocks are not updated.


Yes, this is it exactly.  It's a simple bug; I will post a fix in the next
few days.
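
The nan% is what a floating-point 0/0 prints as when the journal holds no records.  A guarded form of such a calculation might look like the sketch below.  It is an illustration only: the function names, the formula, and the 32-byte record size are assumptions, not the code in fsck.

```c
#include <assert.h>
#include <stdint.h>

#define REC_SIZE_SKETCH 32	/* assumed bytes per journal record */

/* Unguarded: yields NaN when both counters are zero (0.0 / 0.0). */
double
utilization_naive(uint64_t recs, uint64_t bytes)
{
	return ((double)recs / ((double)bytes / REC_SIZE_SKETCH) * 100.0);
}

/* Guarded: report 0% for an empty journal instead of dividing by zero. */
double
utilization_safe(uint64_t recs, uint64_t bytes)
{
	if (bytes == 0)
		return (0.0);
	return (utilization_naive(recs, bytes));
}
```

Returning early for an empty journal, before any statistics are printed, would be another way to avoid the same division.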


Thanks,
Jeff



   /* Next ensure that segments are ordered properly. */
   seg = TAILQ_FIRST(&allsegs);
   if (seg == NULL) {
           if (debug)
                   printf("Empty journal\n");
           return;
   }

--
wbr,
pluknet

SUJ update

2010-04-29 Thread Jeff Roberson


Hello,

I fixed a few SUJ bugs.  If those of you who reported one of the following 
bugs could re-test I would greatly appreciate it.


1)  panic on gnome start via softdep_cancel_link().
2)  Difficulty setting flags on /.  This can only be done from a direct 
boot into single user but there were problems with tunefs that could lead 
to the kernel and disk becoming out of sync with filesystem state.

3)  Kernel compiles without SOFTUPDATES defined in the config now work.

I have had some reports of a hang waiting on journal space with certain
types of activity.  I have only had this reported twice and I am not able
to reproduce it no matter how much load I throw at the machine.  If you
reproduce this, please try to get a coredump or minidump.


Thanks,
Jeff

Re: SUJ update

2010-05-01 Thread Jeff Roberson


On Sat, 1 May 2010, Bruce Cran wrote:


On Thu, Apr 29, 2010 at 06:37:00PM -1000, Jeff Roberson wrote:


I fixed a few SUJ bugs.  If those of you who reported one of the
following bugs could re-test I would greatly appreciate it.



I've started seeing a panic "Sleeping thread owns a non-sleepable lock",
though it seems to be occurring both with and without journaling. The
back trace when journaling is disabled is:


Can you tell me what the lock is?  This may be related to recent vm work 
which went in at the same time.




sched_switch
mi_switch
sleepq_wait
_sleep
bwait
bufwait
bufwrite
ffs_balloc_ufs2
ffs_write
VOP_WRITE_APV
vnode_pager_generic_putpages
VOP_PUTPAGES
vnode_pager_putpages
vm_pageout_flush
vm_object_page_collect_flush
vm_object_page_clean
vfs_msync
sync_fsync
VOP_FSYNC_APV
sync_vnode
sched_sync
fork_exit
fork_trampoline

I've also noticed that since disabling journaling a full fsck seems to
be occurring on boot; background fsck seems to have been disabled.


When you disable journaling it also disables soft-updates.  You need to 
re-enable it.  I could decouple this.  It's hard to say which is the POLA.


Thanks,
Jeff



--
Bruce Cran



Re: SUJ update

2010-05-02 Thread Jeff Roberson


On Sun, 2 May 2010, Fabien Thomas wrote:


Hi Jeff,

Before sending the 'bad' part I would like to say that it is very useful and
has saved me a lot of time after a crash.

I updated the ports and the FS ran out of space.
It ended up with this backtrace (after one reboot the kernel crashed a second
time with the same backtrace):


When did you update?  I fixed a bug that looked just like this a day or 
two ago.


Thanks,
Jeff



(kgdb) bt
#0  doadump () at /usr/home/fabient/fabient-sandbox/sys/kern/kern_shutdown.c:245
#1  0xc0a1a8fe in boot (howto=260) at 
/usr/home/fabient/fabient-sandbox/sys/kern/kern_shutdown.c:416
#2  0xc0a1ad4c in panic (fmt=Could not find the frame base for "panic".
) at /usr/home/fabient/fabient-sandbox/sys/kern/kern_shutdown.c:590
#3  0xc0d058b3 in remove_from_journal (wk=0xc4b4aa80) at 
/usr/home/fabient/fabient-sandbox/sys/ufs/ffs/ffs_softdep.c:2204
#4  0xc0d07ebb in cancel_jaddref (jaddref=0xc4b4aa80, inodedep=0xc46bed00, 
wkhd=0xc46bed5c)
   at /usr/home/fabient/fabient-sandbox/sys/ufs/ffs/ffs_softdep.c:3336
#5  0xc0d09401 in softdep_revert_mkdir (dp=0xc46ba6cc, ip=0xc4bba244)
   at /usr/home/fabient/fabient-sandbox/sys/ufs/ffs/ffs_softdep.c:3898
#6  0xc0d37c49 in ufs_mkdir (ap=0xc8510b2c) at 
/usr/home/fabient/fabient-sandbox/sys/ufs/ufs/ufs_vnops.c:1973
#7  0xc0e7bc6e in VOP_MKDIR_APV (vop=0xc1085ea0, a=0xc8510b2c) at 
vnode_if.c:1534
#8  0xc0add64a in VOP_MKDIR (dvp=0xc485e990, vpp=0xc8510bec, cnp=0xc8510c00, 
vap=0xc8510b6c) at vnode_if.h:665
#9  0xc0add58f in kern_mkdirat (td=0xc4649720, fd=-100, path=0x804e9a0 ,
   segflg=UIO_USERSPACE, mode=448) at 
/usr/home/fabient/fabient-sandbox/sys/kern/vfs_syscalls.c:3783
#10 0xc0add2fe in kern_mkdir (td=0xc4649720, path=0x804e9a0 , segflg=UIO_USERSPACE, mode=448)
   at /usr/home/fabient/fabient-sandbox/sys/kern/vfs_syscalls.c:3727
#11 0xc0add289 in mkdir (td=0xc4649720, uap=0x0) at 
/usr/home/fabient/fabient-sandbox/sys/kern/vfs_syscalls.c:3706
#12 0xc0e5324b in syscall (frame=0xc8510d38) at 
/usr/home/fabient/fabient-sandbox/sys/i386/i386/trap.c:1116
#13 0xc0e2b3c0 in Xint0x80_syscall () at 
/usr/home/fabient/fabient-sandbox/sys/i386/i386/exception.s:261
#14 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)

Regards,
Fabien




Re: SUJ update - new panic - "ffs_copyonwrite: recursive call"

2010-05-02 Thread Jeff Roberson


On Sun, 2 May 2010, Vladimir Grebenschikov wrote:


Hi

While 'make buildworld'


This is a problem with snapshots and the journal full condition.  I will 
address it shortly.


Thanks,
Jeff



kgdb /boot/kernel/kernel /var/crash/vmcore.13
GNU gdb 6.1.1 [FreeBSD]
...
#0  0xc056b93c in doadump ()
(kgdb) bt
#0  0xc056b93c in doadump ()
#1  0xc0489019 in db_fncall ()
#2  0xc0489411 in db_command ()
#3  0xc048956a in db_command_loop ()
#4  0xc048b3ed in db_trap ()
#5  0xc05985a4 in kdb_trap ()
#6  0xc06f8b5e in trap ()
#7  0xc06dd6eb in calltrap ()
#8  0xc059870a in kdb_enter ()
#9  0xc056c1d1 in panic ()
#10 0xc066d602 in ffs_copyonwrite ()
#11 0xc068742a in ffs_geom_strategy ()
#12 0xc05d8955 in bufwrite ()
#13 0xc0686e64 in ffs_bufwrite ()
#14 0xc067a8a2 in softdep_sync_metadata ()
#15 0xc068c568 in ffs_syncvnode ()
#16 0xc0681425 in softdep_prealloc ()
#17 0xc066592a in ffs_balloc_ufs2 ()
#18 0xc066a252 in ffs_snapblkfree ()
#19 0xc065eb9a in ffs_blkfree ()
#20 0xc0673de0 in freework_freeblock ()
#21 0xc06797c7 in handle_workitem_freeblocks ()
#22 0xc0679aaf in process_worklist_item ()
#23 0xc06821f4 in softdep_process_worklist ()
#24 0xc0682940 in softdep_flush ()
#25 0xc0542a00 in fork_exit ()
#26 0xc06dd760 in fork_trampoline ()
(kgdb) x/s panicstr
0xc07c2b80:  "ffs_copyonwrite: recursive call"
(kgdb)



--
Vladimir B. Grebenschikov
v...@fbsd.ru



HEADS UP: Required kernel config file change soon

2003-01-25 Thread Jeff Roberson

I'm about to commit code that will make one of:

options SCHED_4BSD

or

options SCHED_ULE

mandatory.  This will go in a few hours from now.  I will update all of
the standard config files to account for this change.  SCHED_4BSD selects
the old scheduler in case you're wondering.  Failure to specify one of the
two or specifying both will generate compile errors.  I will add code to a
header file at some point to check this at make depend time.

Cheers,
Jeff


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message

Re: HEADS UP: Required kernel config file change soon

2003-01-25 Thread Jeff Roberson

Ok, this has been committed.  Expect build breakage if you don't add one of
these options or use GENERIC.

Cheers,
Jeff

On Sun, 26 Jan 2003, Jeff Roberson wrote:

> I'm about to commit code that will make one of :
>
> options SCHED_4BSD
>
> or
>
> options SCHED_ULE
>
> mandatory.  This will go in a few hours from now.  I will update all of
> the standard config files to account for this change.  SCHED_4BSD selects
> the old scheduler in case you're wondering.  Failure to specify one of the
> two or specifying both will generate compile errors.  I will add code to a
> header file at some point to check this at make depend time.
>
> Cheers,
> Jeff



Re: SCHED_ULE and priorities

2003-02-04 Thread Jeff Roberson

On Tue, 4 Feb 2003, Kris Kennaway wrote:

> On Tue, Feb 04, 2003 at 09:54:23PM -0800, Steve Kargl wrote:
> > On Tue, Feb 04, 2003 at 09:38:18PM -0800, Kris Kennaway wrote:
> > > I just booted a kernel with SCHED_ULE.  It looks like there's a pretty
> > > serious bug:
> > >
> > > >   PID USERNAME PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
> > > >   573 dnetc    139   20  1000K   804K RUN      1:29 85.94% 85.94% dnetc
> > > >   661 kris      96    0  2252K  1496K RUN      0:00  6.25%  6.25% top
> > > >   590 root      96    0 28620K 28128K select   0:04  3.12%  3.12% XFree86
> > > >   641 root     120    0  4856K  4744K RUN      0:03  3.12%  3.12% make
> > >
> > > The make you see there was a 'make -j4' in /usr/src/secure.  It has
> > > been sitting there for about 5 minutes having done nothing other than:
> > >
> >
> > Kris,
> >
> > How old is your src/ directory?  I reported a similar
> > problem to Jeff right after he committed ULE.  He
> > "fixed" the problem a couple of days later.  I put "fixed"
> > in quotes because after his fixes the system experienced
> > 2 stalls under heavy load, but I couldn't prove it was
> > ULE related (David Xu's KSE commit may have been involved
> > in the stalls).
>
> I cvsupped and built kernel tonight.
>

I may have broken the nice stuff when I was fixing interactivity.  Or
maybe I broke nice and interactivity when I was fixing the SMP case. ;-)
Anyway, I have a few ideas.  I'm going to play some more with the
SCHED_STRICT_RESCHED stuff and make it automatic.  I have some scripts
that prove the nice computations as well.  I'll rerun those.

Cheers,
Jeff



Re: Two witness panics in vfs_bio

2003-02-10 Thread Jeff Roberson

On Mon, 10 Feb 2003, Kris Kennaway wrote:

> *Grump* I can't get my boxes to stay up more than a few
> minutes..evidently this code was not tested prior to commit.
>
> So much for getting work done on the package cluster today.
>

It was tested.  I ran it on my desktop and did several buildworlds on an
SMP machine.  I should have let it kick around for a bit longer than a few
days, I agree.  I will commit the fix in just a moment.

Cheers,
Jeff


Re: Two witness panics in vfs_bio

2003-02-10 Thread Jeff Roberson

On Mon, 10 Feb 2003, Kris Kennaway wrote:
> > It was tested.  I ran it on my desktop and did several buildworlds on an
> > smp machine.  I should have let it kick around for a bit longer than a few
> > days, I agree.  I will commit the fix in just a moment.
>
> Thanks!
>

Yeah, I really am sorry about the trouble.  I didn't test in any low
memory situations or with slow enough disks.  The problems that came up
were due to this.  My machine was able to sync fast enough to avoid buffer
starvation.

Cheers,
Jeff


Re: Still problems with ULE

2003-02-10 Thread Jeff Roberson

On Mon, 10 Feb 2003, Kris Kennaway wrote:

> I gave ULE another try just now, following your recent commits, and
> I'm seeing even worse problems:
>
> At boot time when the X server is loading, disk activity occurs
> briefly about once every 2 seconds; the mouse is active briefly at the
> same time, and nothing much else happens for about a minute until the
> entire system deadlocks.

Very weird.  Is this on UP or SMP?  I'm still working on a couple of
issues with ULE.  I think I am very happy with the dynamic priority
selection now but in the process the slice size selection got kinda dirty.
It's pretty much bug for bug compatible with the old scheduler's context
switching decisions but the errors there are more serious with this
design.  I also think the max slice size is way too high now.  I was
experimenting with that and I accidentally checked it in.

I think I know what the remaining problems are.  I'll make a post on
current@ when they're all sorted out.

Thanks for testing!

Cheers,
Jeff


Re: Diskless børked

2003-02-25 Thread Jeff Roberson

Please try this:
Index: nfs_vnops.c
===
RCS file: /home/ncvs/src/sys/nfsclient/nfs_vnops.c,v
retrieving revision 1.195
diff -u -r1.195 nfs_vnops.c
--- nfs_vnops.c 25 Feb 2003 03:37:47 -  1.195
+++ nfs_vnops.c 25 Feb 2003 08:31:54 -
@@ -2812,9 +2812,9 @@
panic("nfs_fsync: not dirty");
if ((passone || !commit) && (bp->b_flags & B_NEEDCOMMIT))
{
BUF_UNLOCK(bp);
-   VI_LOCK(vp);
continue;
}
+   VI_UNLOCK(vp);
bremfree(bp);
if (passone || !commit)
bp->b_flags |= B_ASYNC;


On Tue, 25 Feb 2003, Poul-Henning Kamp wrote:

>
> recursed on non-recursive lock (sleep mutex) vnode interlock @ ../../../kern/vfs_subr.c:1897
> first acquired @ ../../../nfsclient/nfs_vnops.c:2786
> panic: recurse
> Debugger("panic")
> Stopped at  0xc0405cde = Debugger+0x7e: xchgl   %ebx,0xc05aece0 = in_Debugger.0
> db> trace
> Debugger(c0466931,c04ff7e0,c0469426,d7aad8f8,1) at 0xc0405cde = Debugger+0x7e
> panic(c0469426,c0470d18,ae2,c046d74d,769) at 0xc022cd7d = panic+0x11d
> witness_lock(c418eb68,8,c046d74d,769,269) at 0xc0263853 = witness_lock+0x643
> _mtx_lock_flags(c418eb68,0,c046d74d,769,ce537610) at 0xc021ca1a = _mtx_lock_flags+0x11a
> reassignbuf(ce537610,c418eb68,269,ce537610,ce537610) at 0xc02b7154 = reassignbuf+0xa4
> bundirty(ce537610,29c,d7aad9d0,c021cb22,c053d260) at 0xc029b185 = bundirty+0x85
> nfs_writebp(ce537610,1,c3f67c30,29c,d7aadaa0) at 0xc032e30d = nfs_writebp+0x9d
> nfs_bwrite(ce537610,12,0,ae2,c02652be) at 0xc0314631 = nfs_bwrite+0x31
> nfs_flush(c418eb68,c14fd280,1,c3f67c30,1) at 0xc032ddf3 = nfs_flush+0xb13
> nfs_fsync(d7aadb04,0,c046d74d,460,264) at 0xc032d2bf = nfs_fsync+0x3f
> vinvalbuf(c418eb68,1,c14fd280,c3f67c30,0) at 0xc02b48bf = vinvalbuf+0x16f
> nfs_vinvalbuf(c418eb68,1,c14fd280,c3f67c30,1) at 0xc0317af5 = nfs_vinvalbuf+0x255
> nfs_close(d7aadb94,c0482180,c418eb68,a,c14fd280) at 0xc0324090 = nfs_close+0xe0
> vn_close(c418eb68,a,c14fd280,c3f67c30,d7aadc34) at 0xc02c8695 = vn_close+0x65
> vn_closefile(c415b528,c3f67c30,c046412c,765,0) at 0xc02c9d9e = vn_closefile+0x3e
> fdrop_locked(c415b528,c3f67c30,c046412c,69f,4002) at 0xc01ff7cc = fdrop_locked+0x1fc
> fdrop(c415b528,c3f67c30,c413de34,0,4002) at 0xc01ff00a = fdrop+0x5a
> closef(c415b528,c3f67c30,c046412c,350,c415b528) at 0xc01fef8c = closef+0x1bc
> close(c3f67c30,d7aadd10,c047a999,407,1) at 0xc01fc73e = close+0x21e
> syscall(2f,2f,2f,0,0) at 0xc0422c0e = syscall+0x45e
> Xint0x80_syscall() at 0xc04084ad = Xint0x80_syscall+0x1d
> --- syscall (6, FreeBSD ELF32, close), eip = 0x804afdb, esp = 0xbfbff81c, ebp = 0xbfbff828 ---
> db>
>
> --
> Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
> [EMAIL PROTECTED] | TCP/IP since RFC 956
> FreeBSD committer   | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.



Re: Re: Diskless børked

2003-02-25 Thread Jeff Roberson

This patch worked.
On Tue, 25 Feb 2003, Jeff Roberson wrote:

> Please try this:
> Index: nfs_vnops.c
> ===
> RCS file: /home/ncvs/src/sys/nfsclient/nfs_vnops.c,v
> retrieving revision 1.195
> diff -u -r1.195 nfs_vnops.c
> --- nfs_vnops.c 25 Feb 2003 03:37:47 -  1.195
> +++ nfs_vnops.c 25 Feb 2003 08:31:54 -
> @@ -2812,9 +2812,9 @@
> panic("nfs_fsync: not dirty");
> if ((passone || !commit) && (bp->b_flags & B_NEEDCOMMIT))
> {
> BUF_UNLOCK(bp);
> -   VI_LOCK(vp);
> continue;
> }
> +   VI_UNLOCK(vp);
> bremfree(bp);
> if (passone || !commit)
> bp->b_flags |= B_ASYNC;
>
>
> On Tue, 25 Feb 2003, Poul-Henning Kamp wrote:
>
> >
> > recursed on non-recursive lock (sleep mutex) vnode interlock @ ../../../kern/vfs_subr.c:1897
> > first acquired @ ../../../nfsclient/nfs_vnops.c:2786
> > panic: recurse
> > Debugger("panic")
> > Stopped at  0xc0405cde = Debugger+0x7e: xchgl   %ebx,0xc05aece0 = in_Debugger.0
> > db> trace
> > Debugger(c0466931,c04ff7e0,c0469426,d7aad8f8,1) at 0xc0405cde = Debugger+0x7e
> > panic(c0469426,c0470d18,ae2,c046d74d,769) at 0xc022cd7d = panic+0x11d
> > witness_lock(c418eb68,8,c046d74d,769,269) at 0xc0263853 = witness_lock+0x643
> > _mtx_lock_flags(c418eb68,0,c046d74d,769,ce537610) at 0xc021ca1a = _mtx_lock_flags+0x11a
> > reassignbuf(ce537610,c418eb68,269,ce537610,ce537610) at 0xc02b7154 = reassignbuf+0xa4
> > bundirty(ce537610,29c,d7aad9d0,c021cb22,c053d260) at 0xc029b185 = bundirty+0x85
> > nfs_writebp(ce537610,1,c3f67c30,29c,d7aadaa0) at 0xc032e30d = nfs_writebp+0x9d
> > nfs_bwrite(ce537610,12,0,ae2,c02652be) at 0xc0314631 = nfs_bwrite+0x31
> > nfs_flush(c418eb68,c14fd280,1,c3f67c30,1) at 0xc032ddf3 = nfs_flush+0xb13
> > nfs_fsync(d7aadb04,0,c046d74d,460,264) at 0xc032d2bf = nfs_fsync+0x3f
> > vinvalbuf(c418eb68,1,c14fd280,c3f67c30,0) at 0xc02b48bf = vinvalbuf+0x16f
> > nfs_vinvalbuf(c418eb68,1,c14fd280,c3f67c30,1) at 0xc0317af5 = nfs_vinvalbuf+0x255
> > nfs_close(d7aadb94,c0482180,c418eb68,a,c14fd280) at 0xc0324090 = nfs_close+0xe0
> > vn_close(c418eb68,a,c14fd280,c3f67c30,d7aadc34) at 0xc02c8695 = vn_close+0x65
> > vn_closefile(c415b528,c3f67c30,c046412c,765,0) at 0xc02c9d9e = vn_closefile+0x3e
> > fdrop_locked(c415b528,c3f67c30,c046412c,69f,4002) at 0xc01ff7cc = fdrop_locked+0x1fc
> > fdrop(c415b528,c3f67c30,c413de34,0,4002) at 0xc01ff00a = fdrop+0x5a
> > closef(c415b528,c3f67c30,c046412c,350,c415b528) at 0xc01fef8c = closef+0x1bc
> > close(c3f67c30,d7aadd10,c047a999,407,1) at 0xc01fc73e = close+0x21e
> > syscall(2f,2f,2f,0,0) at 0xc0422c0e = syscall+0x45e
> > Xint0x80_syscall() at 0xc04084ad = Xint0x80_syscall+0x1d
> > --- syscall (6, FreeBSD ELF32, close), eip = 0x804afdb, esp = 0xbfbff81c, ebp = 0xbfbff828 ---
> > db>
> >
> > --
> > Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
> > [EMAIL PROTECTED] | TCP/IP since RFC 956
> > FreeBSD committer   | BSD since 4.3-tahoe
> > Never attribute to malice what can adequately be explained by incompetence.



Please test: cluster locking patch.

2003-03-02 Thread Jeff Roberson

I have a patch that should clear up buf locking issues and race conditions
in vfs_cluster.c.  Since this code is so tricky I'd like to have a few
people test it.  You should notice no difference in your system
performance or behavior.

Please see:  http://www.chesapeake.net/~jroberson/cluster.diff

I will post on arch about the contents of the patch.

Cheers,
Jeff



Re: Please test: cluster locking patch.

2003-03-03 Thread Jeff Roberson

Found a bug.  Please update your source from the same location if you
previously applied this patch.

On Sun, 2 Mar 2003, Jeff Roberson wrote:

> I have a patch that should clear up buf locking issues and race conditions
> in vfs_cluster.c.  Since this code is so tricky I'd like to have a few
> people test it.  You should notice no difference in your system
> performance or behavior.
>
> Please see:  http://www.chesapeake.net/~jroberson/cluster.diff
>
> I will post on arch about the contents of the patch.
>
> Cheers,
> Jeff



SCHED_ULE ok again. feedback please?

2003-03-03 Thread Jeff Roberson

I'm using SCHED_ULE on my laptop now.  My recent round of fixes seems to
have helped out.  I'm getting good interactive performance.  I'm doing the
following:

nice -5'd for (;;) {} process.
make -j4 buildworld

Mozilla, pine, irc, screen, vi, etc.

All interactive tasks are very responsive.  My nice -5'd looping process
is getting 70% of the cpu and my compile is taking the rest.  nice +20 may
not behave as well as in sched_4bsd right now.  I'm going to work on that.

This is on a 2 GHz laptop though so your mileage may vary.  Use reports are
welcome.

Interactivity suffered so much over the last few weeks because I changed
the mechanism that determines interactivity and that impacts slice
assignment and priorities.  It took me a while to get it right but it
solved a major drawback with the old scheme.  I do not anticipate any
major rework on this part of the scheduler now.  It should only be tuning.

One thing that I'm looking for feedback on specifically is expensive but
interactive applications.  I'm thinking of office programs or mozilla on a
slow machine.  Do this while running a compile or a compute bound task.

Thanks,
Jeff



Re: HEADSUP: UMA not reentrant / possible memory leak

2003-07-29 Thread Jeff Roberson

On Wed, 30 Jul 2003 [EMAIL PROTECTED] wrote:

> > The indication of this is that the g_bio zone does not return to
> > zero USED as it should.
>
> It looks like z->uz_cachefree is slightly out of date (updated in
> zone_timout() every 20th second) and often too low (not taking the
> z->uz_full_bucket list into account).
>
> The enclosed patch recalculates the number of free elements on the
> buckets instead of using z->uz_cachefree.
>

I definitely like this patch.  If it works would you please commit it?
There are other issues for sure though.  UMA can leak pages from zones
when they are destroyed.  I'm going to look into this asap.

Cheers,
Jeff


Re: make buildkernel hang with SCHED_ULE

2003-08-14 Thread Jeff Roberson


On Thu, 14 Aug 2003, Adam Migus wrote:

> Andrew Gallatin wrote:
>
> >Adam Migus writes:
> > > Folks,
> > > While doing some performance analysis (doing make -j5 buildkernel)
> > > on a set of 14 kernels I've hit one using the SCHED_ULE scheduler
> > > that hangs.   It happens every time but not necessarily in the same
> > > place in the make.
> > >
> >
> ><...>
> >
> > > The hardware is a dual Xeon box.  The kernel is SMP w/ SCHED_ULE
> > > instead of SCHED_4BSD, the options required for diskless and the
> > > following two options:
> >
> >You have machdep.hlt_logical_cpus: 1 in your sysctl output.  [BTW,
> >lots of people read this mail via the web archives at
> >http://docs.freebsd.org/cgi/getmsg.cgi?fetch=1073654+0+current/freebsd-current,
> >where it's impossible to view MIME; it would be MUCH better for us if
> >you appended things like stack traces and sysctl output rather than
> >scrambling them for no reason]
> >
> >SCHED_ULE is incompatible with halting logical CPUs.  Something about
> >it doesn't know the core isn't running, so it schedules a job there
> >which never runs, and then it gets confused.  When I boot a 1 CPU P4
> >with an SMP kernel and machdep.hlt_logical_cpus=1, it hangs before
> >making it to multiuser mode..
> >
> >Try setting machdep.hlt_logical_cpus=0 (via sysctl now, and in
> >/boot/loader.conf so it doesn't happen again).
> >
> >
> >Drew
> >
> >
>
> Andrew,
> WRT the mime thing.  My apologies.  It never occurred to me as everyone I
> know personally uses a "real" mail reader.  I'd attached them simply to
> keep the scrolling down and allow order independent viewing.  Thanks for
> the tip.  I'll just read them in as plain text in the future.
>
> WRT the sysctl value.  Thanks for the tip.  Is this to be considered a
> bug in SCHED_ULE?  If the default is hlt_logical_cpus=1 I would think
> the scheduler should be able to handle it or deal with it
> appropriately.  Perhaps ignoring the value, setting it to 0 internally
> or even just putting a warning message on boot?  After all, not everyone
> RTFM's.  :-)
>

The MD code does not currently export the status of the CPUs in any
reliable way.  ULE attempts to recognize halted CPUs but it is not able to
due to other issues.  I think John Baldwin might be solving this for x86.
If not, I can take a stab at it again.

> Thanks again,
>
> --
> Adam - Migus Dot Org (http://www.migus.org)
>


Re: panic: softdep_deallocate_dependencies: dangling deps

2003-08-31 Thread Jeff Roberson


On Sun, 31 Aug 2003, Christian Brueffer wrote:

> On Sun, Aug 31, 2003 at 05:28:10AM -0400, Jeff Roberson wrote:
> > On Sun, 31 Aug 2003, Christian Brueffer wrote:
> >
> > > Hi,
> > >
> > > got a panic on my server tonight.  Coredump available for further debugging.
> > >
> > > FreeBSD haakonia.hitnet.rwth-aachen.de 5.1-CURRENT FreeBSD 5.1-CURRENT #6: Thu 
> > > Aug 28 00:16:19 CEST 2003
> > > [EMAIL PROTECTED]:/usr/obj/usr/src/sys/LORIEN  i386
> >
> > When are your sources from?  Specifically, what version of vfs_bio.c do
> > you have?
> >
>
> It should be rev 1.397 of vfs_bio.c, I updated the sources just before the
> kernel build.
>
> - Christian
>

Oh, fortunately for me, it's not my fault then.  I'll try to grab the
attention of someone who can help.

Cheers,
Jeff


Re: panic: softdep_deallocate_dependencies: dangling deps

2003-08-31 Thread Jeff Roberson

On Sun, 31 Aug 2003, Christian Brueffer wrote:

> Hi,
>
> got a panic on my server tonight.  Coredump available for further debugging.
>
> FreeBSD haakonia.hitnet.rwth-aachen.de 5.1-CURRENT FreeBSD 5.1-CURRENT #6: Thu Aug 
> 28 00:16:19 CEST 2003
> [EMAIL PROTECTED]:/usr/obj/usr/src/sys/LORIEN  i386

When are your sources from?  Specifically, what version of vfs_bio.c do
you have?

Thanks,
Jeff

>
>
> GNU gdb 5.3 (FreeBSD)
> Copyright 2002 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i386-portbld-freebsd5.1"...
> panic: softdep_deallocate_dependencies: dangling deps
> panic messages:
> ---
> panic: softdep_deallocate_dependencies: dangling deps
> cpuid = 1; lapic.id = 0100
> boot() called on cpu#1
>
> syncing disks, buffers remaining... panic: bremfree: removing a buffer not on a queue
> cpuid = 1; lapic.id = 0100
> boot() called on cpu#1
> Uptime: 2d5h32m34s
> Dumping 511 MB
>  16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320 336 352 
> 368 384 400 416 432 448 464 4
>  80 496
>  ---
> #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:240
> 240 dumping++;
> (kgdb) bt
> #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:240
> #1  0xc0212e20 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:372
> #2  0xc0213226 in panic (fmt=0xc03b2848 "bremfree: removing a buffer not on a queue")
> at /usr/src/sys/kern/kern_shutdown.c:550
> #3  0xc025a051 in bremfreel (bp=0xce6b6228) at /usr/src/sys/kern/vfs_bio.c:644
> #4  0xc0259f25 in bremfree (bp=0x0) at /usr/src/sys/kern/vfs_bio.c:626
> #5  0xc025c658 in vfs_bio_awrite (bp=0x0) at /usr/src/sys/kern/vfs_bio.c:1699
> #6  0xc030a54c in ffs_fsync (ap=0xd8361a70) at /usr/src/sys/ufs/ffs/ffs_vnops.c:268
> #7  0xc0309693 in ffs_sync (mp=0xc454d600, waitfor=2, cred=0xc150de80, 
> td=0xc040d7a0) at vnode_if.h:627
> #8  0xc027040b in sync (td=0xc040d7a0, uap=0x0) at 
> /usr/src/sys/kern/vfs_syscalls.c:142
> #9  0xc021296f in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:281
> #10 0xc0213226 in panic (fmt=0xc03bf702 "softdep_deallocate_dependencies: dangling 
> deps")
> at /usr/src/sys/kern/kern_shutdown.c:550
> #11 0xc0306c35 in softdep_deallocate_dependencies (bp=0x0) at 
> /usr/src/sys/ufs/ffs/ffs_softdep.c:5874
> #12 0xc025b30a in brelse (bp=0xce6b6228) at /usr/src/sys/sys/buf.h:427
> #13 0xc026b93a in flushbuflist (blist=0xce6b6228, flags=0, vp=0xc60fb490, slpflag=0, 
> slptimeo=0, errorp=0x0)
> at /usr/src/sys/kern/vfs_subr.c:1277
> #14 0xc026b548 in vinvalbuf (vp=0xc60fb490, flags=0, cred=0x0, td=0x0, slpflag=0, 
> slptimeo=0)
> at /usr/src/sys/kern/vfs_subr.c:1160
> #15 0xc026e3cc in vclean (vp=0xc60fb490, flags=8, td=0xc4008000) at 
> /usr/src/sys/kern/vfs_subr.c:2577
> #16 0xc026e959 in vgonel (vp=0xc60fb490, td=0x0) at /usr/src/sys/kern/vfs_subr.c:2761
> #17 0xc026a679 in vlrureclaim (mp=0xc454d600) at /usr/src/sys/kern/vfs_subr.c:723
> #18 0xc026a8bf in vnlru_proc () at /usr/src/sys/kern/vfs_subr.c:776
> #19 0xc01ea01f in fork_exit (callout=0xc026a710 , arg=0x0, frame=0x0)
> at /usr/src/sys/kern/kern_fork.c:796
> (kgdb)
>
>
> Anyone interested?
>
> - Christian
>
> --
> Christian Brueffer[EMAIL PROTECTED]   [EMAIL PROTECTED]
> GPG Key:   http://people.freebsd.org/~brueffer/brueffer.key.asc
> GPG Fingerprint: A5C8 2099 19FF AACA F41B  B29B 6C76 178C A0ED 982D
>


Re: panic: softdep_lock: locking against myself

2003-09-02 Thread Jeff Roberson


On Tue, 2 Sep 2003, Christian Brueffer wrote:

> Hi,
>
> got a panic with a kernel from sources around September 1st, 8pm.
>
> Dump and debugging kernel available for further debugging.
> cg@ got the same panic on his machine.
>

This is probably my fault.  I will look into it tonight.  Until then you
could back up to sources from Aug 28th or so to avoid these changes.

Thanks!
Jeff

>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; lapic.id = 
> fault virtual address   = 0xdeadc1e6
> fault code  = supervisor read, page not present
> instruction pointer = 0x8:0xc0306f82
> stack pointer   = 0x10:0xdb832528
> frame pointer   = 0x10:0xdb832558
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
>   processor eflags= interrupt enabled, resume, IOPL = 0
>   current process = 42532 (as)
> trap number = 12
> panic: page fault
> cpuid = 0; lapic.id = 
> boot() called on cpu#0
>
> syncing disks, buffers remaining... panic: softdep_lock: locking against myself
> cpuid = 0; lapic.id = 
> boot() called on cpu#0
> Uptime: 5h39m29s
> Dumping 511 MB
>  16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320 336 352 
> 368 384 400 416 432 448 4
>  64 480 496
>  ---
> #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:240
>   240 dumping++;
> (kgdb) bt
> #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:240
> #1  0xc0212d70 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:372
> #2  0xc0213176 in panic (fmt=0xc03bf620 "softdep_lock: locking against myself")
>at /usr/src/sys/kern/kern_shutdown.c:550
> #3  0xc02fde43 in acquire_lock (lk=0x0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:258
> #4  0xc0303152 in initiate_write_filepage (pagedep=0xc4f9d8c0, bp=0xce661980)
>at /usr/src/sys/ufs/ffs/ffs_softdep.c:3535
> #5  0xc0302fac in softdep_disk_io_initiation (bp=0xce661980) at 
> /usr/src/sys/ufs/ffs/ffs_softdep.c:3452
> #6  0xc01c0b14 in spec_xstrategy (vp=0xc420adb0, bp=0xce661980) at 
> /usr/src/sys/sys/buf.h:416
> #7  0xc01c0cf2 in spec_specstrategy (ap=0xdb832258) at 
> /usr/src/sys/fs/specfs/spec_vnops.c:529
> #8  0xc01bfc88 in spec_vnoperate (ap=0x0) at /usr/src/sys/fs/specfs/spec_vnops.c:122
> #9  0xc0318380 in ufs_strategy (ap=0x0) at vnode_if.h:1141
> #10 0xc0319138 in ufs_vnoperate (ap=0x0) at /usr/src/sys/ufs/ufs/ufs_vnops.c:2792
> #11 0xc025a497 in bwrite (bp=0xce661980) at vnode_if.h:1116
> #12 0xc025acfc in bawrite (bp=0x0) at /usr/src/sys/kern/vfs_bio.c:1139
> #13 0xc030ab59 in ffs_fsync (ap=0xdb832350) at /usr/src/sys/ufs/ffs/ffs_vnops.c:247
> #14 0xc0309d03 in ffs_sync (mp=0xc41c3e00, waitfor=2, cred=0xc150de80, 
> td=0xc040eb20) at vnode_if.h:627
> #15 0xc027024b in sync (td=0xc040eb20, uap=0x0) at 
> /usr/src/sys/kern/vfs_syscalls.c:142
> #16 0xc02128bf in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:281
> #17 0xc0213176 in panic (fmt=0xc039f254 "%s") at 
> /usr/src/sys/kern/kern_shutdown.c:550
> #18 0xc0373446 in trap_fatal (frame=0xdb8324e8, eva=0) at 
> /usr/src/sys/i386/i386/trap.c:818
> #19 0xc03730b2 in trap_pfault (frame=0xdb8324e8, usermode=0, eva=3735929318)
> at /usr/src/sys/i386/i386/trap.c:732
> #20 0xc0372c6d in trap (frame=
> {tf_fs = 24, tf_es = 16, tf_ds = 16, tf_edi = 0, tf_esi = -831494876, tf_ebp = 
> -612162216, tf_isp = -6
> 12162284, tf_ebx = -559038242, tf_edx = 0, tf_ecx = -1069303248, tf_eax = 0, 
> tf_trapno = 12, tf_err = 0, tf_
> eip = -1070567550, tf_cs = 8, tf_eflags = 66182, tf_esp = -1069607688, tf_ss = 
> 1})
> at /usr/src/sys/i386/i386/trap.c:417
> #21 0xc0306f82 in getdirtybuf (bpp=0xc546ebbc, mtx=0x0, waitfor=1)
> at /usr/src/sys/ufs/ffs/ffs_softdep.c:5827
> #22 0xc030601d in flush_deplist (listhead=0x0, waitfor=1, errorp=0xdb832590)
> at /usr/src/sys/ufs/ffs/ffs_softdep.c:5271
> #23 0xc0305f29 in flush_inodedep_deps (fs=0xc41ca000, ino=918604)
> at /usr/src/sys/ufs/ffs/ffs_softdep.c:5235
> #24 0xc0305977 in softdep_sync_metadata (ap=0xdb8326d4) at 
> /usr/src/sys/ufs/ffs/ffs_softdep.c:4968
> #25 0xc030ac69 in ffs_fsync (ap=0xdb8326d4) at /usr/src/sys/ufs/ffs/ffs_vnops.c:299
> #26 0xc02f61cd in ffs_truncate (vp=0xc5b25000, length=26112, flags=2052, 
> cred=0xc44c9e80, td=0xc531a980)
> at vnode_if.h:627
> #27 0xc0312b29 in ufs_direnter (dvp=0xc5b25000, tvp=0xc4dde6d8, dirp=0xdb832910, 
> cnp=0xdb832c00,
> newdirbp=0x0) at /usr/src/sys/ufs/ufs/ufs_lookup.c:966
> #28 0xc0318e4d in ufs_makeinode (mode=33188, dvp=0xc5b25000, vpp=0xdb832bec, 
> cnp=0xdb832c00)
> at /usr/src/sys/ufs/ufs/ufs_vnops.c:2541
> #29 0xc0314f89 in ufs_create (ap=0xdb832a78) at /usr/src/sys/ufs/ufs/ufs_vnops.c:199
> #30 0xc0319138 in ufs_vnoperate (ap=0x0) at /usr/src/sys/ufs/ufs/ufs_vnops.c:2792
> #31 0xc0278218 in vn_open_cred (ndp=0xdb832bd8, flagp=0xdb832cd8, cmode=420, 
> cred=0

Re: Syncer "giving up" on buffers

2003-09-02 Thread Jeff Roberson


On Tue, 2 Sep 2003, Kevin Oberman wrote:

> > Date: Tue, 2 Sep 2003 10:53:43 +0200
> > From: Jan Srzednicki <[EMAIL PROTECTED]>
> > Sender: [EMAIL PROTECTED]
> >
> > On Mon, Sep 01, 2003 at 07:53:48PM +0300, Lefteris Chatzibarbas wrote:
> > > Hello,
> > >
> > > I have a problem with kernels,  built the last couple of days, where
> > > during shutdown syncer is "giving up" on buffers.  During the next boot
> > > all filesystems are checked because of improper dismount.  Here follow
> > > the exact messages I get:
> > >
> > >   Waiting (max 60 seconds) for system process `vnlru' to stop...stopped
> > >   Waiting (max 60 seconds) for system process `bufdaemon' to stop...stopped
> > >   Waiting (max 60 seconds) for system process `syncer' to stop...stopped
> > >
> > >   syncing disks, buffers remaining... 8 8 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
> > >   giving up on 6 buffers
> > >   Uptime: 41m20s
> > >   pfs_vncache_unload(): 1 entries remaining
> > >   Shutting down ACPI
> > >   Rebooting...
> > >
> > > After some testing I found out that this does _not_ happen if I manually
> > > unmount my ext2 filesystems, before shutting down.  In this case syncer
> > > finishes without any problems.
> >
> > I confirm that, same thing happened in my case. But, I had just one
> > buffer remaining and ext2fs mounted in read-only. It seems that it's not
> > so read-only then..
>
> While this seems to impact ext2fs system, the issue of syncer failing
> on read-only volumes is also showing up in cases where ext2fs systems
> are not present. See reports over the past couple of days on this.
>
> I can't be sure that these are the same problem, but they sure do look
> like the same thing.
>

The ext2 problem is likely related to a change that I made to ext2
recently.  I always unmounted my volumes before rebooting, so I missed
this case.  The fix is simple.  I will look at it tonight.  Until then,
you just need to unmount your volumes and it should work just fine.

I don't know of any general syncer issues outside of this.  I doubt it is
related to this specific ext2fs bug though.

Cheers,
Jeff

___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

softupdates panics fixed.

2003-09-02 Thread Jeff Roberson

I found the bug that I introduced around the 29th of August.  It is fixed
in ffs_softdep.c rev 1.143.  Truly, most of the leg work was done by
tegge.  I just produced and tested a patch.  This completes a buildworld
with 128M of memory now, whereas before it only completed with 64M and
512M, perhaps only by chance.

Cheers,
Jeff


Re: panic: vm_fault:

2003-09-17 Thread Jeff Roberson

On Wed, 17 Sep 2003, Florian C. Smeets wrote:

> Hi.
>
> I get this panic on a system with kernel/world from 03 September.
> Usually i only run X and xawtv on that system but when i wanted to make
> world today i got the panic:
>

This was fixed recently.  Can you cvsup and rebuild?

> Kris Kennaway reported something IMHO similar on 07/31/03
>
> panic: vm_fault: fault on nofault entry, addr: deadc000
> Debugger("panic")
> Stopped at  Debugger+0x4d:  xchgl   %ebx,in_Debugger.0
> db> trace
> Debugger(c03c1cf1,c043fd00,c03d3c82,e929f9e4,100) at Debugger+0x4d
> panic(c03d3c82,deadc000,1,e929fa80,e929fa70) at panic+0xcc
> vm_fault(c082f000,deadc000,1,0,c5db65f0) at vm_fault+0x1187
> trap_pfault(e929fb48,0,deadc1e6,c03d958f,deadc1e6) at trap_pfault+0x163
> trap(18,10,10,0,d222036c) at trap+0x2ca
> calltrap() at calltrap+0x5
> --- trap 0xc, eip = 0xc0328512, esp = 0xe929fb88, ebp = 0xe929fbb0 ---
> getdirtybuf(c5ebc8bc,0,1,1,e929fbe8) at getdirtybuf+0x22
> flush_deplist(c5ebccc4,1,e929fbe8,e929fbec,0) at flush_deplist+0x32
> flush_inodedep_deps(c5c63000,28c4,c040d6ac,c5eefb68,124) at flush_inodedep_deps+0x89
> softdep_sync_metadata(e929fca8,0,c03d2cc0,124,0) at softdep_sync_metadata+0x7e
> ffs_fsync(e929fca8,0,c03c9837,ad8,0) at ffs_fsync+0x3a9
> fsync(c5db65f0,e929fd14,c03d958f,3eb,1) at fsync+0x166
> syscall(2f,2f,2f,80a1000,0) at syscall+0x253
> Xint0x80_syscall() at Xint0x80_syscall+0x1d
> --- syscall (95, FreeBSD ELF32, fsync), eip = 0x2814ca0f, esp = 0xbfbff90c, ebp = 0xbfbff928 ---
> db>
>
> Is any additional information required?
>
> Regards,
> flo
>
>


Re: Status of SCHED_ULE?

2003-09-27 Thread Jeff Roberson

On Sat, 27 Sep 2003, Morten Rodal wrote:

> On Sat, Sep 27, 2003 at 06:47:54PM +0200, Roderick van Domburg wrote:
> > Hello everyone,
> >
> > I was wondering about the status of the ULE scheduler. Is it very
> > experimental still or is it reasonably suitable for everyday (i.e.
> > non-mission-critical) use?
> >
>
> It has improved quite a bit lately, and is now also working with KSE.
> However, the mouse will get sluggish whenever the computer is under
> bursts of load (i.e. a compile)
>
> --
> Morten Rodal
>

I have not had this experience.  Can you give me details of your machine
and the kind of load that causes sluggishness?  I'll correct it as soon as
I can identify it.

Thanks,
Jeff


Re: Status of SCHED_ULE?

2003-09-28 Thread Jeff Roberson

On Sun, 28 Sep 2003, Morten Rodal wrote:

> On Sat, Sep 27, 2003 at 11:31:25PM -0400, Jeff Roberson wrote:
> > On Sat, 27 Sep 2003, Morten Rodal wrote:
> > > It has improved quite a bit lately, and is now also working with KSE.
> > > However, the mouse will get sluggish whenever the computer is under
> > > bursts of load (i.e. a compile)
> > >
> >
> > I have not had this experience.  Can you give me details of your machine
> > and the kind of load that causes sluggishness?  I'll correct it as soon as
> > I can identify it.
> >
>
> The machine is a dual Pentium 2 300MHz, and I'm running gnome 2.4.
> I do also experience this with my computer at school, a single Pentium3
> 733MHz.
>
> The load isn't very complicated, usually just gnome 2.4 and mozilla
> firebird running.  If I then do anything that requires lots of cpu,
> like a compile of a program, the interactivity drops fast.
>
> On the dual machine I have also experienced a *HUGE* increase in the
> time for "portupgrade -ar" to complete.  I am not familiar with how
> portupgrade works, but it seems to spawn a few make's and sort's, but
> I am not sure why it is currently using 3 hours instead of 10 minutes
> to complete! (This was tested when there were no packages to upgrade,
> which shouldn't take long)
>
> Both machines (this dual and the one at school) are running with a
> libmap.conf in order to use libkse, is this perhaps affecting the
> performance of ULE?

It could be.  Can you try with libthr or libc_r and let me know?

>
> I am not sure how useful this is to you, but if you have any other
> pointers as to what I should look at just ask.
>
> --
> Morten Rodal
>
>


Re: Status of SCHED_ULE?

2003-09-28 Thread Jeff Roberson

On Sun, 28 Sep 2003, Arjan van Leeuwen wrote:

> On Sunday 28 September 2003 14:38, Matt wrote:
> > Morten Rodal wrote:
> > > On Sun, Sep 28, 2003 at 01:26:24PM +0100, Matt wrote:
> > >>Morten Rodal wrote:
> > >>>On Sat, Sep 27, 2003 at 11:31:25PM -0400, Jeff Roberson wrote:
> > >>>>On Sat, 27 Sep 2003, Morten Rodal wrote:
> > >>>>>It has improved quite a bit lately, and is now also working with KSE.
> > >>>>>However, the mouse will get sluggish whenever the computer is under
> > >>>>>bursts of load (i.e. a compile)
> > >>>>
> > >>>>I have not had this experience.  Can you give me details of your
> > >>>> machine and the kind of load that causes sluggishness?  I'll correct it
> > >>>> as soon as I can identify it.
> > >>>
> > >>>The machine is a dual Pentium 2 300MHz, and I'm running gnome 2.4.
> > >>>I do also experience this with my computer at school, a single Pentium3
> > >>>733MHz.
> > >>>
> > >>>The load isn't very complicated, usually just gnome 2.4 and mozilla
> > >>>firebird running.  If I then do anything that requires lots of cpu,
> > >>>like a compile of a program, the interactivity drops fast.
> > >>>
> > >>>On the dual machine I have also experienced a *HUGE* increase in the
> > >>>time for "portupgrade -ar" to complete.  I am not familiar with how
> > >>>portupgrade works, but it seems to spawn a few make's and sort's, but
> > >>>I am not sure why it is currently using 3 hours instead of 10 minutes
> > >>>to complete! (This was tested when there were no packages to upgrade,
> > >>>which shouldn't take long)
> > >>>
> > >>>Both machines (this dual and the one at school) are running with a
> > >>>libmap.conf in order to use libkse, is this perhaps affecting the
> > >>>performance of ULE?
> > >>>
> > >>>I am not sure how useful this is to you, but if you have any other
> > >>>pointers as to what I should look at just ask.
> > >>
> > >>Are you running 5.1-release or 5.1-current?
> > >>
> > >>I ask because I have used ULE on two different kernels so far on this
> > >>box. One was 5.1-release running gnome2, mozilla, xmms. On this the
> > >>mouse stutters really badly whenever anything is being compiled.
> > >>
> > >>However on the 5.1-current kernel this behavior no longer happens and
> > >>the mouse is fine.
> > >>
> > >>I suspect ULE has had a few enhancements between the release and now.
> > >
> > > I am running 5.1-current
> > >
> > > Dual machine:
> > > FreeBSD slurp.rodal.no 5.1-CURRENT FreeBSD 5.1-CURRENT #3: Thu Sep 25
> > > 04:03:23 CEST 2003 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/slurp
> > > i386
> > >
> > > School computer:
> > > FreeBSD hauk10.idi.ntnu.no 5.1-CURRENT FreeBSD 5.1-CURRENT #2: Fri Sep 26
> > > 09:12:55 CEST 2003 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/hauk10
> > > i386
> >
> > Ahh I tell you the other difference. I had a USB mouse when I tried ULE
> > with 5.1-release and it stuttered. It's just a ps2 one on the current
> > kernel where it's not stuttering.
> >
> > Matt.
>
> I have a PS/2 mouse, I run -CURRENT from 2 days ago, and I experience the
> stuttering too.
>
> It happens when compiling stuff, when loading complicated pages in Mozilla
> Firebird, and when logging out of GNOME 2.4 (the 'background fade' animation
> brings my Athlon XP 2000+ to its knees when I use SCHED_ULE).
>
> Arjan
>

Gnome seems to be a common theme.  Are you also using libkse?  There could
be some interaction there.

Thanks,
Jeff


Re: Status of SCHED_ULE?

2003-09-29 Thread Jeff Roberson

On Tue, 30 Sep 2003, Aggelos Economopoulos wrote:

> On Monday 29 September 2003 08:05, Jeff Roberson wrote:
> > On Sun, 28 Sep 2003, Arjan van Leeuwen wrote:
> [...]
> > > It happens when compiling stuff, when loading complicated pages in
> > > Mozilla Firebird, and when logging out of GNOME 2.4 (the 'background
> > > fade' animation brings my Athlon XP 2000+ to its knees when I use
> > > SCHED_ULE).
> > >
> > > Arjan
> >
> > Gnome seems to be a common theme.  Are you also using libkse?  There could
> > be some interaction there.
>
> I'm experiencing similar stuff with kde + libthr (kernel built with sources
> from 18/9).
>
> Aggelos
>
Are you running seti, rc4, etc?  Any programs that sit in the background
and consume 100% of the cpu?



SCHED_ULE

2003-09-29 Thread Jeff Roberson

There seems to have been some regression in the interactivity of
SCHED_ULE since I was last measuring it.  I'll send a follow up mail when
I have found and fixed the issue.

Thanks,
Jeff


Re: getdirtybuf: interlock not locked but should be

2003-09-30 Thread Jeff Roberson

On Wed, 1 Oct 2003, Garrett Wollman wrote:

> I'm working on getting the AFS client to work under FreeBSD.  I just
> compiled a -current kernel with DEBUG_VFS_LOCKS, and before I could
> even load the AFS module I had the system stop with the following
> locking assertion:
>
> getdirtybuf: 0xc2678000 interlock is not locked but should be

This is my fault.  You are safe to comment out this check for now.  I need
to better understand the softupdates code before it is really valid.

Jeff

>
> Backtrace looks like:
>
> getdirtybuf(de17cbb4, 0, 1, c7732ba0, 1) +0xee
> flush_deplist(c268ad4c, 1, de17cbdc, de17cbe0, 0) +0x43
> flush_inodedep_deps(c267,1ab,,c26ed000,124) +0xa3
> softdep_sync_metadata(de17cca4, 0, c037b672, 124, 0) +0x87
> ffs_fsync(de17cca4, c03714ea, c0373416, ad8, 0) +0x3b9
> fsync(c25d7850, de17cd10, c038276b, 3ec, 1) +0x1d4
> syscall() ...
>
> One vnode is locked:
> 0xc26ed000: tag ufs, type VREG, usecount 1, writecount 1, refcount 1,
> flags (VV_OBJBUF), lock type ufs: EXCL (count 1) by thread 0xc25d7850
>   ino 427, on dev ad0s1a (4, 13)
>
> This is repeated four times with the same vnode.  Obviously, it would
> help to have a solution to this problem so that I can debug what I'm
> really interested in rather than worrying about UFS.
>
> -GAWollman
>
>


Re: Sched_Ule

2003-10-09 Thread Jeff Roberson

On Thu, 9 Oct 2003, Evan Dower wrote:

> I ran with SCHED_ULE for a couple days recently and had trouble beyond just
> sluggishness. When doing really intensive tasks such as buildworld or
> installworld, the computer would actually stall. The first time was
> immediately after booting single user after building the world and kernel. I
> started an installworld, and it hung there until I did a hard reset.
> Horrible timing for it, but such is life. Later (after I fixed the damage
> from the partial install) it hung when doing my buildworld, so I booted up
> on a different (SCHED_4BSD) kernel to do the buildworld (subsequently
> switching back to SCHED_4BSD). If you want any particulars about my system
> just let me know.

Do you have P4's with hyper threading?


> --
> Evan Dower
> Undergraduate, Computer Science
> University of Washington
> Public key: http://students.washington.edu/evantd/pgp-pub-key.txt
> Key fingerprint = D321 FA24 4BDA F82D 53A9  5B27 7D15 5A4F 033F 887D
>
>
>
>
> >From: Scott Sipe <[EMAIL PROTECTED]>
> >To: [EMAIL PROTECTED]
> >Subject: Sched_Ule
> >Date: Thu, 9 Oct 2003 00:28:59 -0400 (EDT)
> >
> >
> >Hi,
> >
> >I see some posts from late Sept about people having issues with SCHED_ULE.
> >I just wanted to add in that I am having the exact same problems.  In
> >short:
> >
> >Anything that seems disk intensive: bzip2 (unbzip2ing one big file makes
> >this happen), making world, building ports, etc makes my X environment
> >practically unusable.  Mouse stutters, reaction time is very slow, feels
> >10x more sluggish than normal.  (I'm running KDE if anyone is curious).
> >
> >I rebuilt my kernel today (running yesterday's world) with SCHED_4BSD
> >instead, and things are much better.  System is much much more responsive.
> >
> >If there's any tests I can run, or anyone I should talk to, I'd love to be
> >of assistance,
> >
> >thanks much,
> >Scott
>


Re: Sched_Ule

2003-10-09 Thread Jeff Roberson

On Thu, 9 Oct 2003, Sheldon Hearn wrote:

> On (2003/10/09 00:28), Scott Sipe wrote:
>
> > Anything that seems disk intensive: bzip2 (unbzip2ing one big file makes
> > this happen), making world, building ports, etc makes my X environment
> > practically unusable.  Mouse stutters, reaction time is very slow, feels
> > 10x more sluggish than normal.  (I'm running KDE if anyone is curious).
>
> A number of us are seeing this problem, and not all of us are entry
> level end-users.  I'm using a single PIII with 1GB of RAM and maxusers
> 0.  No Hyper-threading, nothing interesting in the kernel (apart from
> I686_CPU only, KTRACE and _KPOSIX_PRIORITY_SCHEDULING).
>
> The problem (as I recall) is that Jeff hasn't received reports from
> people who can dig into the problem and have the time to do so.
>
> For example, I'm pretty sure I could at least point a finger at the
> problem if I had time.  But I'm under heavy pressure, and so the only
> solution that's feasible for me is to just switch to SCHED_4BSD and keep
> moving.
>
> What surprises me is that Jeff can't reproduce it.
>
> For me, the sluggish mouse problem manifests under these conditions:
>
> 1) Use a USB mouse, not a PS2 mouse.

Is this _only_ with usb?

> 2) SCHED_ULE in the kernel.
> 3) make buildworld (no -j necessary, but -k exacerbates the problem).
> 4) Fiddle around in X (no particular window manager required).
>
> Ciao,
> Sheldon.
>


ULE Update

2003-10-10 Thread Jeff Roberson

I have reproduced the lagging mouse issue on my laptop.  I tried moused to
no effect.  Eventually, I grudgingly installed kde and immediately started
encountering problems with mouse lag.  It would seem that twm was not
stressing my machine in the same ways that kde is. ;-)

I suspect a problem with IPC.  I will know more soon.

There have also been a few reports of problems related to nice.  I was
able to reproduce some awkward behavior but I have nothing conclusive yet.

There is still a known issue with hyperthreading.  I'm waiting on some of
John Baldwin's work to fix this.  If you halt logical CPUs your machine
will hang.

Expect some resolution on the ULE problems within a week or so.  Thanks
for the detailed bug reports everyone.

Cheers,
Jeff


Re: Interesting...sched_ule discussion

2003-10-11 Thread Jeff Roberson

On Sat, 11 Oct 2003, Brendon and Wendy wrote:

> Hi,
>
> Just saw the talk about sched_ule, nvidia driver, moused and pauses...
>
> I was running -current up until about a month ago, using the nvidia
> driver, sched_bsd on a dual ht xeon, with htt disabled. Mouse
> interactivity with moused was terrible - I actually thought the mouse
> was faulty. Getting rid of moused and using psm0 was better, but not
> hugely so.
>
> I found that under kde, things were "ok" but under nautilus the system
> was bordering on unusable.

What kind of hardware do you have?  Were you running with WITNESS and
INVARIANTS?

>
> By contrast, under linux things are "just fine".

What version of the linux kernel did you switch to?

>
> Maybe this will turn out to be useful data for you...

Could be, thanks.

>
> Thanks,
> Brendon
>
>
>


ULE status; interactivity fixed? nice uninvestigated, HTT broken

2003-10-12 Thread Jeff Roberson

I committed a fix for a bug that would have caused all of the jerky
behavior under some load.  I was not able to reproduce this problem with
kde running afterwards.

I'm going to look into the reports of some problems with nice, although I
suspect that they could have been caused by the same issues.

HTT is awaiting some jhb fixes which are awaiting some UMA fixes.  I'll
give an update on that later.

Cheers,
Jeff


Re: panic: softdep_deallocate_dependencies: dangling deps

2003-10-12 Thread Jeff Roberson

On Mon, 13 Oct 2003, Oliver Fischer wrote:

> My notebook was a little bit "panic" this night. After rebooting I found
>   this message in my system log:
>
>   panic: softdep_deallocate_dependencies: dangling deps
>
> ?

When are your sources from?

>
> Regards,
>
> Oliver Fischer
>
>


Re: ULE status; interactivity fixed? nice uninvestigated, HTT broken

2003-10-13 Thread Jeff Roberson

On Mon, 13 Oct 2003, Arjan van Leeuwen wrote:

> On Sunday 12 October 2003 23:21, Jeff Roberson wrote:
> > I committed a fix that would have caused all of the jerky behaviors under
> > some load.  I was not able to reproduce this problem with kde running
> > afterwards.
>
> Thanks for the fix! However, the problem is still here for me (using rev.
> 1.58). I just noticed it when compiling Mozilla. I can also still see it when
> logging out of GNOME.

Is it somewhat better?  I specifically fixed the problem for Giant but
other locks could have the same issues.  I suspect that they are far less
frequently held without Giant, but I could be wrong.

>
> Arjan
>


Re: ULE status; interactivity fixed? nice uninvestigated, HTT broken

2003-10-13 Thread Jeff Roberson

On Tue, 14 Oct 2003, Arjan van Leeuwen wrote:

> On Monday 13 October 2003 21:27, Jeff Roberson wrote:
> > On Mon, 13 Oct 2003, Arjan van Leeuwen wrote:
> > > On Sunday 12 October 2003 23:21, Jeff Roberson wrote:
> > > > I committed a fix that would have caused all of the jerky behaviors
> > > > under some load.  I was not able to reproduce this problem with kde
> > > > running afterwards.
> > >
> > > Thanks for the fix! However, the problem is still here for me (using rev.
> > > 1.58). I just noticed it when compiling Mozilla. I can also still see it
> > > when logging out of GNOME.
> >
> > Is it somewhat better?  I specifically fixed the problem for Giant but
> > other locks could have the same issues.  I suspect that they are far less
> > frequently held without Giant, but I could be wrong.
>
> Now that I looked at it better, yes, it does indeed seem better :). It still
> seems to happen at the same places, but the jerkiness is less... jerky. the
> position of the mouse pointer is updated more often than used to be the case.

Thanks.  This feedback is very important for me to resolve these issues.  I
think I know how to solve the issue now; I'm going to make the required
changes soon.  I'll send a status update again when I do.

Cheers,
Jeff

>
> Arjan
>


More ULE bugs fixed.

2003-10-15 Thread Jeff Roberson

I fixed two bugs that were exposed due to more of the kernel running
outside of Giant.  ULE had some issues with priority propagation that
stopped it from working very well.
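
For readers unfamiliar with the term, priority propagation can be sketched
roughly as below.  This is a toy model and not the sched_ule.c code: the
toy_thread and toy_lock types and both functions are made up for
illustration, and it assumes the usual BSD convention that a lower number
means a higher priority.

```c
#include <assert.h>
#include <stddef.h>

struct toy_lock;

/* Toy model only: lower number = higher priority. */
struct toy_thread {
	int base_pri;                 /* priority the thread normally runs at */
	int cur_pri;                  /* effective priority, possibly lent */
	struct toy_lock *blocked_on;  /* lock being waited for, or NULL */
};

struct toy_lock {
	struct toy_thread *owner;     /* current holder, or NULL when free */
};

/*
 * Lend the waiter's priority to the lock owner, and keep walking the
 * chain if that owner is itself blocked on another lock.
 */
static void
toy_propagate_priority(struct toy_thread *waiter, struct toy_lock *lk)
{
	int pri = waiter->cur_pri;

	waiter->blocked_on = lk;
	while (lk != NULL && lk->owner != NULL && lk->owner->cur_pri > pri) {
		lk->owner->cur_pri = pri;     /* boost the holder */
		lk = lk->owner->blocked_on;   /* follow the chain */
	}
}

/*
 * On release the former owner drops back to its base priority.  A real
 * implementation would also have to account for other locks it holds.
 */
static void
toy_unlock(struct toy_lock *lk)
{
	assert(lk->owner != NULL);
	lk->owner->cur_pri = lk->owner->base_pri;
	lk->owner = NULL;
}
```

If the boost is skipped or applied to the wrong thread, a low-priority
lock holder can keep an interactive thread waiting behind unrelated work,
which is one plausible way to get the kind of stalls reported here.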

Things should be much improved.  Feedback, as always, is welcome.  I'd
like to look into making this the default scheduler for 5.2 if things
start looking up.  I hope that scares you all into using it more. :-)

Cheers,
Jeff


Re: More ULE bugs fixed.

2003-10-15 Thread Jeff Roberson

On Wed, 15 Oct 2003, Eirik Oeverby wrote:

> Eirik Oeverby wrote:
> > Jeff Roberson wrote:
> >
> >> I fixed two bugs that were exposed due to more of the kernel running
> >> outside of Giant.  ULE had some issues with priority propagation that
> >> stopped it from working very well.
> >>
> >> Things should be much improved.  Feedback, as always, is welcome.  I'd
> >> like to look into making this the default scheduler for 5.2 if things
> >> start looking up.  I hope that scares you all into using it more. :-)
> >
> >
> > Hi..
> > Just tested, so far it seems good. System CPU load is floored (near 0),
> > system is very responsive, no mouse sluggishness or random
> > mouse/keyboard input.
> > Doing a make -j 20 buildworld now (on my 1ghz p3 thinkpad ;), and
> > running some SQLServer stuff in VMWare. We'll see how it fares.
>
> Hi, just a followup message.
> I'm now running the buildworld mentioned above, and the system is pretty
> much unusable. It exhibits the same symptoms as I have mentioned before,
> mouse jumpiness, bogus mouse input (movement, clicks), and the system is
> generally very jerky and unresponsive. This is particularly evident
> when doing things like webpage loading/browsing/rendering, but it's
> noticeable all the time, no matter what I am doing. As an example, the
> last sentence I wrote without seeing a single character on screen before
> I was finished writing it, and it appeared with a lot more typos than I
> usually make ;)
>
> I'm running *without* invariants and witness right now, i.e. a kernel
> 100% equal to the SCHED_4BSD kernel.

Can you confirm the revision of your sys/kern/sched_ule.c file?  How does
SCHED_4BSD respond in this same test?

Thanks,
Jeff

>
> Best regards,
> /Eirik
>
>
>


Re: More ULE bugs fixed.

2003-10-15 Thread Jeff Roberson

On Wed, 15 Oct 2003, Daniel Eischen wrote:

> On Wed, 15 Oct 2003, Jeff Roberson wrote:
>
> > I fixed two bugs that were exposed due to more of the kernel running
> > outside of Giant.  ULE had some issues with priority propagation that
> > stopped it from working very well.
> >
> > Things should be much improved.  Feedback, as always, is welcome.  I'd
> > like to look into making this the default scheduler for 5.2 if things
> > start looking up.  I hope that scares you all into using it more. :-)
>
> Before you do that, can you look into changing the scheduler
> interfaces to address David Xu's concern with it being
> suboptimal for KSE processes?

Certainly, it may not happen if I can't find out what's making things so
jerky for gnome/kde users.  If it looks like it will, I'll investigate the
kse issues.

>
> --
> Dan Eischen
>



Re: More ULE bugs fixed.

2003-10-16 Thread Jeff Roberson

On Thu, 16 Oct 2003, Eirik Oeverby wrote:

> Jeff Roberson wrote:
> > On Wed, 15 Oct 2003, Eirik Oeverby wrote:
> >
> >
> >>Eirik Oeverby wrote:
> >>
> >>>Jeff Roberson wrote:
> >>>
> >>>
> >>>>I fixed two bugs that were exposed due to more of the kernel running
> >>>>outside of Giant.  ULE had some issues with priority propagation that
> >>>>stopped it from working very well.
> >>>>
> >>>>Things should be much improved.  Feedback, as always, is welcome.  I'd
> >>>>like to look into making this the default scheduler for 5.2 if things
> >>>>start looking up.  I hope that scares you all into using it more. :-)
> >>>
> >>>
> >>>Hi..
> >>>Just tested, so far it seems good. System CPU load is floored (near 0),
> >>>system is very responsive, no mouse sluggishness or random
> >>>mouse/keyboard input.
> >>>Doing a make -j 20 buildworld now (on my 1ghz p3 thinkpad ;), and
> >>>running some SQLServer stuff in VMWare. We'll see how it fares.
> >>
> >>Hi, just a followup message.
> >>I'm now running the buildworld mentioned above, and the system is pretty
> >>much unusable. It exhibits the same symptoms as I have mentioned before,
> >>mouse jumpiness, bogus mouse input (movement, clicks), and the system is
> >>generally very jerky and unresponsive. This is particularly evident
> >>when doing things like webpage loading/browsing/rendering, but it's
> >>noticeable all the time, no matter what I am doing. As an example, the
> >>last sentence I wrote without seeing a single character on screen before
> >>I was finished writing it, and it appeared with a lot more typos than I
> >>usually make ;)
> >>
> >>I'm running *without* invariants and witness right now, i.e. a kernel
> >>100% equal to the SCHED_4BSD kernel.
> >
> >
> > Can you confirm the revision of your sys/kern/sched_ule.c file?  How does
> > SCHED_4BSD respond in this same test?
>
> Yes I can. From file:
> __FBSDID("$FreeBSD: src/sys/kern/sched_ule.c,v 1.59 2003/10/15 07:47:06
> jeff Exp $");
> I am running SCHED_4BSD now, with a make -j 20 buildworld running, and I
> do not experience any of the problems. Keyboard and mouse input is
> smooth, and though apps run slightly slower due to the massive load on
> the system, there is none of the jerkiness I have seen before.
>
> Anything else I can do to help?

Yup, try again. :-)  I found another bug and tuned some parameters of the
scheduler.  The bug was introduced after I did my paper for BSDCon and so
I never ran into it when I was doing serious stress testing.

Hopefully this will be a huge improvement.  I did a make -j16 buildworld
and used mozilla while in kde2.  It was fine unless I tried to scroll
around rapidly in a page full of several megabyte images for many minutes.

>
> /Eirik
>
> > Thanks,
> > Jeff
> >
> >
> >>Best regards,
> >>/Eirik
> >>
> >>
>
>


Re: Page faults with today's current

2003-10-16 Thread Jeff Roberson

On Thu, 16 Oct 2003, Arjan van Leeuwen wrote:

> I just cvsupped and installed a new world and kernel (previous kernel was from
> October 13), and now my machine gets a page fault when I try to run any GTK2
> application (Firebird, Gnome 2). Are others seeing this as well?
>
> Arjan

If you're running ULE and KSE I just fixed a bug with that.  If not, please
provide a stack trace.  You can manually transcribe one by starting a gtk2
application from a console with your DISPLAY variable set appropriately.

Thanks,
Jeff

>
>


Re: sched_ule.c & SMP error

2003-10-16 Thread Jeff Roberson


On Thu, 16 Oct 2003, Valentin Chopov wrote:

> I'm getting an error in the sched_ule.c
>
> It looks that sched_add is called with "struct kse" arg. instead of
> "struct thread"

Fixed, thanks.
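
The committed change is not shown in this thread, but the class of mistake
behind the warning can be sketched as follows.  The structures here are
heavily simplified stand-ins, toy_move() and the td_queued field are
hypothetical, and the ke_thread back-pointer is assumed to work the way it
is used below.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real kernel structures carry far more state. */
struct thread {
	int td_queued;              /* hypothetical flag: on a run queue? */
};

struct kse {
	struct thread *ke_thread;   /* thread behind this scheduling entity */
};

/* The scheduler entry point takes a thread, not a kse. */
static void
sched_add(struct thread *td)
{
	td->td_queued = 1;
}

/*
 * Hypothetical caller in the spirit of kseq_move(): it has a kse in hand,
 * so it must pass the thread behind it.  Writing sched_add(ke) here would
 * draw the "incompatible pointer type" warning, which is fatal when the
 * kernel is built with -Werror.
 */
static void
toy_move(struct kse *ke)
{
	assert(ke != NULL && ke->ke_thread != NULL);
	sched_add(ke->ke_thread);
}
```

Because the kernel build uses -Werror, a mismatch like this stops the
build instead of slipping through as a warning, which matches the error
output quoted below.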

>
> Thanks,
>
> Val
>
>
> cc -c -O -pipe -march=pentiumpro -Wall -Wredundant-decls -Wnested-externs
> -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline
> -Wcast-qual -fformat-extensions -std=c99 -nostdinc -I- -I. -I/usr/src/sys
> -I/usr/src/sys/contrib/dev/acpica -I/usr/src/sys/contrib/ipfilter
> -I/usr/src/sys/contrib/dev/ath -I/usr/src/sys/contrib/dev/ath/freebsd
> -D_KERNEL -include opt_global.h -fno-common -finline-limit=15000
> -fno-strict-aliasing -mno-align-long-strings -mpreferred-stack-boundary=2
> -ffreestanding -Werror /usr/src/sys/kern/sched_ule.c
> /usr/src/sys/kern/sched_ule.c: In function `kseq_move':
> /usr/src/sys/kern/sched_ule.c:465: warning: passing arg 1 of `sched_add'
> from incompatible pointer type
> *** Error code 1
>
> Stop in /usr/obj/usr/src/sys/MYKERNEL.
> *** Error code 1
>
> Stop in /usr/src.
> *** Error code 1
>
> Stop in /usr/src.
>
>
> ==
> Valentin S. Chopov, CC[ND]P
> Sys/Net Admin
> SEI Data Inc.
> E-Mail: [EMAIL PROTECTED]
> ==
>
>
>


Re: More ULE bugs fixed.

2003-10-16 Thread Jeff Roberson

On Fri, 17 Oct 2003, Bruce Evans wrote:

> How would one test if it was an improvement on the 4BSD scheduler?  It
> is not even competitive in my simple tests.

[scripts results deleted]

>
> Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for the
> obj and depend stages.  These stages have little parallelism.  SCHED_ULE
> was only 19% slower for the all stage.  It apparently misses many
> opportunities to actually run useful processes.  This may be related
> to /usr being nfs mounted.  There is lots of idling waiting for nfs
> even in the SCHED_4BSD case.  The system times are smaller for SCHED_ULE,
> but this might not be significant.  E.g., zeroing pages can account
> for several percent of the system time in buildworld, but on unbalanced
> systems that have too much idle time most page zero gets done in idle
> time and doesn't show up in the system time.

At one point ULE was at least as fast as 4BSD and in most cases faster.
This is a regression.  I'll sort it out soon.


>
> Test 1 for fair scheduling related to niceness:
>
>   for i in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
>   do
>   nice -$i sh -c "while :; do echo -n;done" &
>   done
>   top -o time
>
> [Output deleted].  This shows only a vague correlation between niceness
> and runtime for SCHED_ULE.  However, top -o cpu shows a strong correlation
> between %CPU and niceness.  Apparently, %CPU is very inaccurate and/or
> not enough history is kept for long-term scheduling to be fair.
>
> Test 5 for fair scheduling related to niceness:
>
>   for i in -20 -16 -12 -8 -4 0 4 8 12 16 20
>   do
>   nice -$i sh -c "while :; do echo -n;done" &
>   done
>   time top -o cpu
>
> With SCHED_ULE, this now hangs the system, but it worked yesterday.  Today
> it doesn't get as far as running top and it stops the nfs server responding.
> To unhang the system and see what the above does, run a shell at rtprio 0
> and start top before the above, and use top to kill processes (I normally
> use "killall sh" to kill all the shells generated by tests 1-5, but killall
> doesn't work if it is on nfs when the nfs server is not responding).

  661 root 112  -20   900K   608K RUN  0:24 27.80% 27.64% sh
  662 root 114  -16   900K   608K RUN  0:19 12.43% 12.35% sh
  663 root 114  -12   900K   608K RUN  0:15 10.66% 10.60% sh
  664 root 114   -8   900K   608K RUN  0:11  9.38%  9.33% sh
  665 root 115   -4   900K   608K RUN  0:10  7.91%  7.86% sh
  666 root 115    0   900K   608K RUN  0:07  6.83%  6.79% sh
  667 root 115    4   900K   608K RUN  0:06  5.01%  4.98% sh
  668 root 115    8   900K   608K RUN  0:04  3.83%  3.81% sh
  669 root 115   12   900K   608K RUN  0:02  2.21%  2.20% sh
  670 root 115   16   900K   608K RUN  0:01  0.93%  0.93% sh

I think you cvsup'd at a bad time.  I fixed a bug that would have caused
the system to lock up in this case late last night.  On my system it
freezes for a few seconds and then returns.  I can stop that by turning
down the interactivity threshold.
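
For anyone unfamiliar with the knob mentioned above, here is a rough sketch
of what an interactivity threshold does.  It is only a toy model, not the
code in sched_ule.c: the names (toy_interact_score, toy_is_interactive), the
0..100 score range and the example thresholds are all assumptions made for
illustration.  The idea is that a thread that sleeps much more than it runs
gets a low score, and only threads scoring below the threshold are treated
as interactive, so lowering the threshold narrows that set.

```c
#include <assert.h>

#define TOY_INTERACT_MAX	100	/* hypothetical upper bound of the score */
#define TOY_INTERACT_HALF	(TOY_INTERACT_MAX / 2)

/*
 * Toy score in 0..TOY_INTERACT_MAX from recent run and sleep time
 * (same arbitrary unit).  Mostly-sleeping threads score below the
 * midpoint, mostly-running threads score above it.
 */
static int
toy_interact_score(unsigned int run, unsigned int slept)
{
	if (slept > run)
		return ((int)((unsigned long long)TOY_INTERACT_HALF *
		    run / slept));
	if (run > slept)
		return (TOY_INTERACT_MAX -
		    (int)((unsigned long long)TOY_INTERACT_HALF *
		    slept / run));
	/* Equal run and sleep time: midpoint, or 0 for a brand new thread. */
	return (run != 0 ? TOY_INTERACT_HALF : 0);
}

/* A thread counts as interactive only if its score is below thresh. */
static int
toy_is_interactive(unsigned int run, unsigned int slept, int thresh)
{
	assert(thresh >= 0 && thresh <= TOY_INTERACT_MAX);
	return (toy_interact_score(run, slept) < thresh);
}
```

In this model a thread with run 300 and sleep 700 scores 21, so it counts as
interactive with a threshold of 30 but not with a threshold of 20.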

Thanks,
Jeff

>
> Bruce

Re: More ULE bugs fixed.

2003-10-17 Thread Jeff Roberson


On Fri, 17 Oct 2003, Bruce Evans wrote:

> On Fri, 17 Oct 2003, Jeff Roberson wrote:
>
> > On Fri, 17 Oct 2003, Bruce Evans wrote:
> >
> > > How would one test if it was an improvement on the 4BSD scheduler?  It
> > > is not even competitive in my simple tests.
> > > ...
> >
> > At one point ULE was at least as fast as 4BSD and in most cases faster.
> > This is a regression.  I'll sort it out soon.
>
> How much faster?

Apache benchmarked at 30% greater throughput due to the cpu affinity some
time ago.  I haven't done more recent tests with apache.  buildworld is
the most degenerate case for per cpu run queues because cpu affinity
doesn't help much and load imbalances hurt a lot.  On my machine the
compiler hardly ever wants to run for more than a few slices before doing
a msleep() so it's not bouncing around between CPUs so much with 4BSD.


>
> > > Test 5 for fair scheduling related to niceness:
> > >
> > >   for i in -20 -16 -12 -8 -4 0 4 8 12 16 20
> > >   do
> > >   nice -$i sh -c "while :; do echo -n;done" &
> > >   done
> > >   time top -o cpu
> > >
> > > With SCHED_ULE, this now hangs the system, but it worked yesterday.  Today
> > > it doesn't get as far as running top and it stops the nfs server responding.
>
> >   661 root 112  -20   900K   608K RUN  0:24 27.80% 27.64% sh
> >   662 root 114  -16   900K   608K RUN  0:19 12.43% 12.35% sh
> >   663 root 114  -12   900K   608K RUN  0:15 10.66% 10.60% sh
> >   664 root 114   -8   900K   608K RUN  0:11  9.38%  9.33% sh
> >   665 root 115   -4   900K   608K RUN  0:10  7.91%  7.86% sh
> >   666 root 115    0   900K   608K RUN  0:07  6.83%  6.79% sh
> >   667 root 115    4   900K   608K RUN  0:06  5.01%  4.98% sh
> >   668 root 115    8   900K   608K RUN  0:04  3.83%  3.81% sh
> >   669 root 115   12   900K   608K RUN  0:02  2.21%  2.20% sh
> >   670 root 115   16   900K   608K RUN  0:01  0.93%  0.93% sh
>
> Perhaps the bug only affects SMP.  The above is for UP (no CPU column).
>

That is likely; I don't use my SMP machine much anymore.  I should set up
some automated tests.

> I see a large difference from the above, at least under SMP: %CPU
> tapers off to 0 at nice 0.
>
> BTW, I just noticed that SCHED_4BSD never really worked for the SMP case.
> sched_clock() is called for each CPU, and for N CPU's this has the same
> effect as calling sched_clock() N times too often for 1 CPU.  Calling
> sched_clock() too often was fixed for the UP case in kern_synch.c 1.83
> by introducing a scale factor.  The scale factor is fixed so it doesn't
> help for SMP.

Wait.. why are we calling sched_clock() too frequently on UP?

>
> > I think you cvsup'd at a bad time.  I fixed a bug that would have caused
> > the system to lock up in this case late last night.  On my system it
> > freezes for a few seconds and then returns.  I can stop that by turning
> > down the interactivity threshold.
>
> No, I tested with an up to date kernel (sched_ule.c 1.65).

Curious.  ULE seems to have suffered from bitrot.  These things were all
tested and working when I did my paper for BSDCon.  I have largely
neglected FreeBSD since.  I can't fix it this weekend, but I'm sure I'll
sort it out next weekend.

Cheers,
Jeff

>
> Bruce
>


Re: More ULE bugs fixed.

2003-10-17 Thread Jeff Roberson


On Fri, 17 Oct 2003, Sean Chittenden wrote:

> > I think you cvsup'd at a bad time.  I fixed a bug that would have
> > caused the system to lock up in this case late last night.  On my
> > system it freezes for a few seconds and then returns.  I can stop
> > that by turning down the interactivity threshold.
>
> Hrm, I must concur that while ULE seems a tad snappier on the
> responsiveness end, it seems to be lacking in terms of real world
> performance compared to 4BSD.

Thanks for the stats.  Is this on SMP or UP?

>
> Fresh CVSup (~midnight 2003-10-17) and build with a benchmark from
> before and after.  I was "benchmarking" a chump calc program using
> bison vs. lemon earlier today under 4BSD
> (http://groups.yahoo.com/group/sqlite/message/5506) and figured I'd
> throw my hat in on the subject with some relative numbers.  System
> time is down for ULE, but user and real are up.
>
>
> Under ULE:
>
> Running a dry run with bison calc...done.
> Running 1st run with bison calc... 52.11 real 45.63 user 0.56 sys
> Running 2nd run with bison calc... 52.16 real 45.52 user 0.69 sys
> Running 3rd run with bison calc... 51.80 real 45.32 user 0.87 sys
>
> Running a dry run with lemon calc...done.
> Running 1st run with lemon calc... 129.69 real 117.91 user 1.10 sys
> Running 2nd run with lemon calc... 130.26 real 117.88 user 1.13 sys
> Running 3rd run with lemon calc... 130.76 real 117.90 user 1.10 sys
>
> Time spent in user mode   (CPU seconds) : 654.049s
> Time spent in kernel mode (CPU seconds) : 7.047s
> Total time  : 12:19.06s
> CPU utilization (percentage): 89.4%
> Times the process was swapped   : 0
> Times of major page faults  : 34
> Times of minor page faults  : 2361
>
>
> And under 4BSD:
>
>  Running a dry run with bison calc...done.
>  Running 1st run with bison calc... 44.22 real 37.94 user 0.85 sys
>  Running 2nd run with bison calc... 46.21 real 37.98 user 0.85 sys
>  Running 3rd run with bison calc... 45.32 real 38.13 user 0.67 sys
>
>  Running a dry run with lemon calc...done.
>  Running 1st run with lemon calc... 116.53 real 100.10 user 1.13 sys
>  Running 2nd run with lemon calc... 112.61 real 100.35 user 0.86 sys
>  Running 3rd run with lemon calc... 114.16 real 100.19 user 1.04 sys
>
>  Time spent in user mode (CPU seconds) : 553.392s
>  Time spent in kernel mode (CPU seconds) : 6.978s
>  Total time : 10:40.80s
>  CPU utilization (percentage) : 87.4%
>  Times the process was swapped : 223
>  Times of major page faults : 50
>  Times of minor page faults : 2750
>
>
> Just a heads up, it does indeed look as though things have gone
> backwards in terms of performance.  -sc
>
> --
> Sean Chittenden
>


Re: More ULE bugs fixed.

2003-10-27 Thread Jeff Roberson

On Fri, 17 Oct 2003, Bruce Evans wrote:

> On Fri, 17 Oct 2003, Jeff Roberson wrote:
>
> > On Fri, 17 Oct 2003, Bruce Evans wrote:
> >
> > > How would one test if it was an improvement on the 4BSD scheduler?  It
> > > is not even competitive in my simple tests.
> > > ...
> >
> > At one point ULE was at least as fast as 4BSD and in most cases faster.
> > This is a regression.  I'll sort it out soon.
>
> How much faster?
>

make kernel on UP seems to be within 1% of 4BSD now.  I actually had some
runs which showed lower system time.  I think I can still improve the
situation some.  Anyway, I found some bugs relating to idle prio tasks,
and also ULE had been doing almost twice as many context switches as 4BSD.
Now it's doing about 8% more.  I'm still tracking this down.

Anyhow, it should be much closer now.  I still have some plans for SMP
that should improve things quite a bit there but UP is looking good.

Cheers,
Jeff


Re: ULE page fault with sched_ule.c 1.67

2003-10-27 Thread Jeff Roberson

On Mon, 27 Oct 2003, Jonathan Fosburgh wrote:

> On Monday 27 October 2003 12:06 pm, Arjan van Leeuwen wrote:
> > Hi,
> >
> > I just cvsupped and built a new kernel that includes sched_ule.c 1.67. I'm
> > getting a page fault when working in Mozilla Firebird. It happens pretty
> > soon, after opening one or two pages. The trace shows that it panics at
> > sched_prio().
> >
> I should have said, I am getting the same panic, same trace, but not using
> Mozilla.  I get it shortly after launching my KDE session, though I'm not
> sure where in my session the problem is being hit.

It's KSE.  You can disable it to work around temporarily.  I will fix it
tonight.

>
> --
> Jonathan Fosburgh
> AIX and Storage Administrator
> UT MD Anderson Cancer Center
> Houston, TX
>

Re: More ULE bugs fixed.

2003-10-29 Thread Jeff Roberson

On Thu, 30 Oct 2003, Bruce Evans wrote:

> > Test for scheduling buildworlds:
> >
> > cd /usr/src/usr.bin
> > for i in obj depend all
> > do
> > MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
> > done >/tmp/zqz 2>&1
> >
> > (Run this with an empty /somewhere/obj.  The all stage doesn't quite
> > finish.)  On an ABIT BP6 system with a 400MHz and a 366MHz CPU, with
> > /usr (including /usr/src) nfs-mounted (with 100 Mbps ethernet and a
> > reasonably fast server) and /somewhere/obj ufs1-mounted (on a fairly
> > slow disk; no soft-updates), this gives the following times:
> >
> > SCHED_ULE-yesterday, with not so careful setup:
> >      40.37 real      8.26 user      6.26 sys
> >     278.90 real     59.35 user     41.32 sys
> >     341.82 real    307.38 user     69.01 sys
> > SCHED_ULE-today, run immediately after booting:
> >      41.51 real      7.97 user      6.42 sys
> >     306.64 real     59.66 user     40.68 sys
> >     346.48 real    305.54 user     69.97 sys
> > SCHED_4BSD-yesterday, with not so careful setup:
> >   [same as today except the depend step was 10 seconds slower (real)]
> > SCHED_4BSD-today, run immediately after booting:
> >      18.89 real      8.01 user      6.66 sys
> >     128.17 real     58.33 user     43.61 sys
> >     291.59 real    308.48 user     72.33 sys
> > SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz CPU) with
> > many local changes and not so careful setup:
> >      17.39 real      8.28 user      5.49 sys
> >     130.51 real     60.97 user     34.63 sys
> >     390.68 real    310.78 user     60.55 sys
> >
> > Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for the
> > obj and depend stages.  These stages have little parallelism.  SCHED_ULE
> > was only 19% slower for the all stage.  ...
>
> I reran this with -current (sched_ule.c 1.68, etc.).  Result: no
> significant change.  However, with a UP kernel there was no significant
> difference between the times for SCHED_ULE and SCHED_4BSD.
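
The figures in that summary can be checked against the "today" rows of the
table.  A minimal sketch, assuming slowdown is measured as the relative
increase in real time; slowdown_pct is just a helper name made up for this
example:

```c
#include <assert.h>

/*
 * Percentage by which t_test exceeds t_base, both in seconds of real
 * time.  100.0 means "twice as slow".
 */
static double
slowdown_pct(double t_test, double t_base)
{
	assert(t_base > 0.0);
	return ((t_test / t_base - 1.0) * 100.0);
}
```

With the SCHED_ULE-today and SCHED_4BSD-today numbers,
slowdown_pct(41.51, 18.89) and slowdown_pct(306.64, 128.17) both come out
above 100 (obj and depend more than twice as slow), and
slowdown_pct(346.48, 291.59) is just under 19 (the all stage).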

There was a significant difference on UP until last week.  I'm working on
SMP now.  I have some patches but they aren't quite ready yet.

>
> > Test 5 for fair scheduling related to niceness:
> >
> > for i in -20 -16 -12 -8 -4 0 4 8 12 16 20
> > do
> > nice -$i sh -c "while :; do echo -n;done" &
> > done
> > time top -o cpu
> >
> > With SCHED_ULE, this now hangs the system, but it worked yesterday.  Today
> > it doesn't get as far as running top and it stops the nfs server responding.
> > To unhang the system and see what the above does, run a shell at rtprio 0
> > and start top before the above, and use top to kill processes (I normally
> > use "killall sh" to kill all the shells generated by tests 1-5, but killall
> > doesn't work if it is on nfs when the nfs server is not responding).
>
> This shows problems much more clearly with UP kernels.  It gives the
> nice -20 and -16 processes approx. 55% and 50% of the CPU, respectively
> (the total is significantly more than 100%), and it gives approx.  0%
> of the CPU to the other sh processes (perhaps exactly 0).  It also
> apparently gives 0% of the CPU to some important nfs process (I
> couldn't see exactly which) so the nfs server stops responding.
> SCHED_4BSD errs in the opposite direction by giving too many cycles to
> highly niced processes so it is naturally immune to this problem.  With
> SMP, SCHED_ULE lets many more processes run.

I seem to have broken something related to nice.  I only tested
interactivity and performance after my last round of changes.  I have a
standard test that I do that is similar to the one that you have posted
here.  I used it to gather results for my paper
(http://www.chesapeake.net/~jroberson/ULE.pdf).  There you can see what
the intended nice curve is like.  Oddly enough, I ran your test again on
my laptop and I did not see 55% of the cpu going to nice -20.  It was
spread proportionally from -20 to 0 with positive nice values not receiving
cpu time, as intended.  It did not, however, let interactive processes
proceed.  This is certainly a bug and it sounds like there may be others
which lead to the problems that you're having.

>
> The nfs server also sometimes stops responding with only non-negatively
> niced processes (0 through 20 in the above), but it takes longer.
>
> The nfs server restarts if enough of the hog processes are killed.
> Apparently nfs has some critical process running at only user priority
> and nice 0 and even non-negatively niced processes are enough to prevent
> it running.

This shouldn't be the case; it sounds like my interactivity boost is
somewhat broken.

>
> Top output with loops like the above shows many anomalies in PRI, TIME,
> WCPU and CPU, but no worse than the ones with SCHED_4BSD.  PRI tends to
> stick at 139 (the max) with SCHED_ULE.  With SCHED_4BSD, this indicates

Re: More ULE bugs fixed.

2003-10-31 Thread Jeff Roberson

On Wed, 29 Oct 2003, Jeff Roberson wrote:

> On Thu, 30 Oct 2003, Bruce Evans wrote:
>
> > > Test for scheduling buildworlds:
> > >
> > >   cd /usr/src/usr.bin
> > >   for i in obj depend all
> > >   do
> > >   MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
> > >   done >/tmp/zqz 2>&1
> > >
> > > (Run this with an empty /somewhere/obj.  The all stage doesn't quite
> > > finish.)  On an ABIT BP6 system with a 400MHz and a 366MHz CPU, with
> > > /usr (including /usr/src) nfs-mounted (with 100 Mbps ethernet and a
> > > reasonably fast server) and /somewhere/obj ufs1-mounted (on a fairly
> > > slow disk; no soft-updates), this gives the following times:
> > >
> > > SCHED_ULE-yesterday, with not so careful setup:
> > >      40.37 real      8.26 user      6.26 sys
> > >     278.90 real     59.35 user     41.32 sys
> > >     341.82 real    307.38 user     69.01 sys
> > > SCHED_ULE-today, run immediately after booting:
> > >      41.51 real      7.97 user      6.42 sys
> > >     306.64 real     59.66 user     40.68 sys
> > >     346.48 real    305.54 user     69.97 sys
> > > SCHED_4BSD-yesterday, with not so careful setup:
> > >   [same as today except the depend step was 10 seconds slower (real)]
> > > SCHED_4BSD-today, run immediately after booting:
> > >      18.89 real      8.01 user      6.66 sys
> > >     128.17 real     58.33 user     43.61 sys
> > >     291.59 real    308.48 user     72.33 sys
> > > SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz CPU) with
> > > many local changes and not so careful setup:
> > >      17.39 real      8.28 user      5.49 sys
> > >     130.51 real     60.97 user     34.63 sys
> > >     390.68 real    310.78 user     60.55 sys
> > >
> > > Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for the
> > > obj and depend stages.  These stages have little parallelism.  SCHED_ULE
> > > was only 19% slower for the all stage.  ...
> >
> > I reran this with -current (sched_ule.c 1.68, etc.).  Result: no
> > significant change.  However, with a UP kernel there was no significant
> > difference between the times for SCHED_ULE and SCHED_4BSD.
>
> There was a significant difference on UP until last week.  I'm working on
> SMP now.  I have some patches but they aren't quite ready yet.

I have committed my SMP fixes.  I would appreciate it if you could post
updated results.  ULE now outperforms 4BSD in a single threaded kernel
compile and performs almost identically in a 16 way make.  I still have a
few more things that I can do to improve the situation.  I would expect
ULE to pull further ahead in the months to come.

The nice issue is still outstanding, as is the incorrect wcpu reporting.

Cheers,
Jeff

>
> >
> > > Test 5 for fair scheduling related to niceness:
> > >
> > >   for i in -20 -16 -12 -8 -4 0 4 8 12 16 20
> > >   do
> > >   nice -$i sh -c "while :; do echo -n;done" &
> > >   done
> > >   time top -o cpu
> > >
> > > With SCHED_ULE, this now hangs the system, but it worked yesterday.  Today
> > > it doesn't get as far as running top and it stops the nfs server responding.
> > > To unhang the system and see what the above does, run a shell at rtprio 0
> > > and start top before the above, and use top to kill processes (I normally
> > > use "killall sh" to kill all the shells generated by tests 1-5, but killall
> > > doesn't work if it is on nfs when the nfs server is not responding).
> >
> > This shows problems much more clearly with UP kernels.  It gives the
> > nice -20 and -16 processes approx. 55% and 50% of the CPU, respectively
> > (the total is significantly more than 100%), and it gives approx.  0%
> > of the CPU to the other sh processes (perhaps exactly 0).  It also
> > apparently gives 0% of the CPU to some important nfs process (I
> > couldn't see exactly which) so the nfs server stops responding.
> > SCHED_4BSD errs in the opposite direction by giving too many cycles to
> > highly niced processes so it is naturally immune to this problem.  With
> > SMP, SCHED_ULE lets many more processes run.
>
> I seem to have broken something related to nice.  I only tested
> interactivity and performance after my last round of changes.  I have a
> standard test that I do that is similar to

Re: More ULE bugs fixed.

2003-10-31 Thread Jeff Roberson

On Fri, 31 Oct 2003, Bruno Van Den Bossche wrote:

> Jeff Roberson <[EMAIL PROTECTED]> wrote:
>
> > On Wed, 29 Oct 2003, Jeff Roberson wrote:
> >
> > > On Thu, 30 Oct 2003, Bruce Evans wrote:
> > >
> > > > > Test for scheduling buildworlds:
> > > > >
> > > > >   cd /usr/src/usr.bin
> > > > >   for i in obj depend all
> > > > >   do
> > > > >   MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
> > > > >   done >/tmp/zqz 2>&1
> > > > >
> > > > > (Run this with an empty /somewhere/obj.  The all stage doesn't
> > > > > quite finish.)  On an ABIT BP6 system with a 400MHz and a 366MHz
> > > > > CPU, with /usr (including /usr/src) nfs-mounted (with 100 Mbps
> > > > > ethernet and a reasonably fast server) and /somewhere/obj
> > > > > ufs1-mounted (on a fairly slow disk; no soft-updates), this
> > > > > gives the following times:
> > > > >
> > > > > SCHED_ULE-yesterday, with not so careful setup:
> > > > >      40.37 real      8.26 user      6.26 sys
> > > > >     278.90 real     59.35 user     41.32 sys
> > > > >     341.82 real    307.38 user     69.01 sys
> > > > > SCHED_ULE-today, run immediately after booting:
> > > > >      41.51 real      7.97 user      6.42 sys
> > > > >     306.64 real     59.66 user     40.68 sys
> > > > >     346.48 real    305.54 user     69.97 sys
> > > > > SCHED_4BSD-yesterday, with not so careful setup:
> > > > >   [same as today except the depend step was 10 seconds
> > > > >   slower (real)]
> > > > > SCHED_4BSD-today, run immediately after booting:
> > > > >      18.89 real      8.01 user      6.66 sys
> > > > >     128.17 real     58.33 user     43.61 sys
> > > > >     291.59 real    308.48 user     72.33 sys
> > > > > SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz
> > > > > CPU) with
> > > > > many local changes and not so careful setup:
> > > > >      17.39 real      8.28 user      5.49 sys
> > > > >     130.51 real     60.97 user     34.63 sys
> > > > >     390.68 real    310.78 user     60.55 sys
> > > > >
> > > > > Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for
> > > > > the obj and depend stages.  These stages have little
> > > > > parallelism.  SCHED_ULE was only 19% slower for the all stage.
> > > > > ...
> > > >
> > > > I reran this with -current (sched_ule.c 1.68, etc.).  Result: no
> > > > significant change.  However, with a UP kernel there was no
> > > > significant difference between the times for SCHED_ULE and
> > > > SCHED_4BSD.
> > >
> > > There was a significant difference on UP until last week.  I'm
> > > working on SMP now.  I have some patches but they aren't quite ready
> > > yet.
> >
> > I have committed my SMP fixes.  I would appreciate it if you could post
> > updated results.  ULE now outperforms 4BSD in a single threaded kernel
> > compile and performs almost identically in a 16 way make.  I still
> > have a few more things that I can do to improve the situation.  I
> > would expect ULE to pull further ahead in the months to come.
>
> I recently had to complete a little piece of software in a course on
> parallel computing.  I've put it online[1] (we only had to write the
> pract2.cpp file).  It calculates the inverse of a Vandermonde matrix and
> allows you to spawn multiple slave-processes who each perform a part of
> the work.  Everything happens in memory so
> I've used it lately to test the different changes you made to
> sched_ule.c and these last fixes do improve the performance on my dual
> p3 machine a lot.
>
> Here are the results of my (very limited tests) :
>
> sched4bsd
> ---
> dimension   slaves  time
> 1000        1       90.925408
> 1000        2       58.897038
>
> 200         1       0.735962
> 200         2       0.676660
>
> sched_ule 1.68
> ---
> dimension   slaves  time
> 1000        1       90.951015
> 1000        2       70.402845
>
> 200         1       0.743551

Re: Sticky mouse with SCHED_ULE 10-30-03

2003-10-31 Thread Jeff Roberson


On Fri, 31 Oct 2003, Michal wrote:

> FreeBSD 5.1-CURRENT #0: Thu Oct 30 17:49:13 EST 2003
> When kernel compiled with SCHED_ULE, USB mouse (MS USB Intellimouse) is
> almost unusable. Even if CPU is idle, mouse feels sticky. When loading
> mozilla or compiling something mouse freezes for several seconds and is
> nonresponsive in general. Switched back to SCHED_4BSD and mouse is
> better than ever. No problems at all when loading programs or compiling.
> My subjective feeling is that the mouse responds worse than a month ago with
> SCHED_ULE and much better with SCHED_4BSD than before.

Are you using moused?  Is this SMP or UP?  What CPUs are you using?

Thanks,
Jeff

>
> Michal
>

Re: More ULE bugs fixed.

2003-11-02 Thread Jeff Roberson

On Sat, 1 Nov 2003, Bruce Evans wrote:

> On Fri, 31 Oct 2003, Jeff Roberson wrote:
>
> > I have committed my SMP fixes.  I would appreciate it if you could post
> > updated results.  ULE now outperforms 4BSD in a single threaded kernel
> > compile and performs almost identically in a 16 way make.  I still have a
> > few more things that I can do to improve the situation.  I would expect
> > ULE to pull further ahead in the months to come.
>
> My simple make benchmark now takes infinitely longer with ULE under SMP,
> since make -j 16 with ULE under SMP now hangs nfs after about a minute.
> 4BSD works better.  However, some networking bugs have developed in the
> last few days.  One of their manifestations is that SMP kernels always
> panic in sbdrop() on shutdown.
>
> > The nice issue is still outstanding, as is the incorrect wcpu reporting.
>
> It may be related to nfs processes not getting any cycles even when there
> are no niced processes.
>

I've just run your script myself.  I was using sched_ule.c rev 1.75.  I
did not encounter any problem.  I also have not run it with 4BSD so I
don't have any performance comparisons.  Hopefully the next time you have
an opportunity to test, things will go smoothly.  I fixed a bug in
sched_prio() that may have caused this behavior.

You commented on the nice cutoff before.  What do you believe the correct
behavior is?  In ULE I went to great lengths to be certain that I emulated
the old behavior of denying nice +20 processes cpu time when anything nice
0 or above was running.  As a result of that, nice -20 processes inhibit
any processes with a nice above zero from receiving cpu time.  Prior to a
commit earlier today, nice -20 would stop nice 0 processes that were
non-interactive.  I've changed that though so nice 0 will always be able
to run, just with a small slice.  Based on your earlier comments, you
don't believe that this behavior is correct.  Why, and what would you like
to see?

Thanks,
Jeff

> Bruce

Re: Sticky mouse with SCHED_ULE 10-30-03

2003-11-02 Thread Jeff Roberson

On Sun, 2 Nov 2003, Schnoopay wrote:

> >> Are you using moused?  Is this SMP or UP?  What CPUs are you using?
> >  >
> >  > Thanks,
> >  > Jeff
> >
> > I am having similar problems after my last cvsup (10-31-03) also using a
> > USB MS Intellimouse. Mouse is slow to respond under ULE but fine under
> > 4BSD. The mouse feels like it's being sampled at a slow rate.
> >
> > I am using moused, on a UP Athlon XP 1800+. I am running seti at home at
> > nice 15, but killing the seti process made no notable difference. I failed
> > to check objective performance as the interactive experience was truly
> > difficult to work with and I just wanted to get my work done. =]
> >
> > -Schnoopay
>
> I just disabled moused and told X to read from /dev/ums0 and the mouse
> problems are gone. I haven't changed anything else from when the mouse
> was "sticky" so I guess not using moused is a good work around.
>

I'm not able to reproduce this at all.  Could any of you folks that are
experiencing this problem update to sched_ule.c rev 1.75 and tell me if it
persists?


> -Schnoopay
>

How nice should behave (was Re: More ULE bugs fixed.)

2003-11-03 Thread Jeff Roberson


On Tue, 4 Nov 2003, Bruce Evans wrote:

> On Sun, 2 Nov 2003, Jeff Roberson wrote:
>
> > You commented on the nice cutoff before.  What do you believe the correct
> > behavior is?  In ULE I went to great lengths to be certain that I emulated
> > the old behavior of denying nice +20 processes cpu time when anything nice
> > 0 or above was running.  As a result of that, nice -20 processes inhibit
> > any processes with a nice above zero from receiving cpu time.  Prior to a
> > commit earlier today, nice -20 would stop nice 0 processes that were
> > non-interactive.  I've changed that though so nice 0 will always be able
> > to run, just with a small slice.  Based on your earlier comments, you
> > don't believe that this behavior is correct.  Why, and what would you like
> > to see?
>
> Only RELENG_4 has that "old" behaviour.
>
> I think the existence of rtprio and a non-broken idprio makes infinite
> deprioritization using niceness unnecessary.  (idprio is still broken
> (not available to users) in -current, but it doesn't need to be if
> priority propagation is working as it should be.)  It's safer and fairer
> for all niced processes to not completely prevent each other being
> scheduled, and use the special scheduling classes for cases where this
> is not wanted.  I'd mainly like the slices for nice -20 vs nice --20
> processes to be very small and/or infrequent.

idprio should be able to function properly since we have priority
propagation and elevated priorities for m/tsleep.  I believe that many
people rely on the nice +20 behavior.  We could change this and make it a
matter of user education.

ULE's nice mechanism is very flexible in this regard.  I would only have
to change one define to force the slice assignment to scale across the
whole slice range.  Although, I only have 14 possible slice values to
hand out, so small differences would be meaningless.
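
To make that concrete, here is a rough sketch of the kind of mapping I mean.
It is a simplified model rather than the code in sched_ule.c: the function
name, the window argument and the constants (slices of 1 to 14 ticks, a
window of 20 nice levels) are assumptions picked to match the numbers in
this thread.  A process gets a slice that shrinks with its distance from the
least nice process on the run queue; once the distance reaches the window it
gets no slice at all, except that nice 0 always keeps the minimum slice.

```c
#include <assert.h>

#define TOY_SLICE_MIN	1	/* hypothetical shortest slice, in ticks */
#define TOY_SLICE_MAX	14	/* hypothetical longest slice: 14 values in all */

/*
 * Slice for a process at `nice` when the least nice runnable process
 * is at `min_nice`.  `window` is the nice distance at which slices run
 * out: 20 models the cutoff, 41 spreads slices over all of -20..20.
 */
static int
toy_slice(int nice, int min_nice, int window)
{
	int dist;

	assert(window > 1 && nice >= min_nice);
	dist = nice - min_nice;
	if (dist < window)
		return (TOY_SLICE_MAX -
		    dist * (TOY_SLICE_MAX - TOY_SLICE_MIN) / (window - 1));
	if (nice == 0)
		return (TOY_SLICE_MIN);	/* nice 0 may always run */
	return (0);			/* outside the window: no cpu time */
}
```

With a window of 20, toy_slice(-20, -20, 20) is 14, toy_slice(0, -20, 20) is
1 and toy_slice(4, -20, 20) is 0.  With a window of 41 every nice level gets
something, but toy_slice(-19, -20, 41) and toy_slice(-20, -20, 41) are both
14, which is what I mean about small differences being meaningless.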

>
> Bruce

Re: More ULE bugs fixed.

2003-11-03 Thread Jeff Roberson

On Mon, 3 Nov 2003, Eirik Oeverby wrote:

> Hi,
>
> Just recompiled yesterday, running sched_ule.c 1.75. It seems to have
> re-introduced the bogus mouse events I talked about earlier, after a
> period of having no problems with it. The change happened between 1.69
> and 1.75, and there's also the occasional glitch in keyboard input.

How unfortunate; it seems to have fixed other problems.  Can you describe
the mouse problem?  Is it jittery constantly or only under load?  Or are
you having other problems?  Have you tried reverting to SCHED_4BSD?  What
window manager do you run?

Thanks for the report.

Cheers,
Jeff

>
> If you need me to do anything to track this down, let me know. I am, and
> have always been, running with moused, on a uniprocessor box (ThinkPad
> T21 1ghz p3).
>
> Best regards,
> /Eirik
>
> Jeff Roberson wrote:
> > On Fri, 31 Oct 2003, Bruno Van Den Bossche wrote:
> >
> >
> >>Jeff Roberson <[EMAIL PROTECTED]> wrote:
> >>
> >>
> >>>On Wed, 29 Oct 2003, Jeff Roberson wrote:
> >>>
> >>>
> >>>>On Thu, 30 Oct 2003, Bruce Evans wrote:
> >>>>
> >>>>
> >>>>>>Test for scheduling buildworlds:
> >>>>>>
> >>>>>>cd /usr/src/usr.bin
> >>>>>>for i in obj depend all
> >>>>>>do
> >>>>>>MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
> >>>>>>done >/tmp/zqz 2>&1
> >>>>>>
> >>>>>>(Run this with an empty /somewhere/obj.  The all stage doesn't
> >>>>>>quite finish.)  On an ABIT BP6 system with a 400MHz and a 366MHz
> >>>>>>CPU, with /usr (including /usr/src) nfs-mounted (with 100 Mbps
> >>>>>>ethernet and a reasonably fast server) and /somewhere/obj
> >>>>>>ufs1-mounted (on a fairly slow disk; no soft-updates), this
> >>>>>>gives the following times:
> >>>>>>
> >>>>>>SCHED_ULE-yesterday, with not so careful setup:
> >>>>>>   40.37 real     8.26 user     6.26 sys
> >>>>>>  278.90 real    59.35 user    41.32 sys
> >>>>>>  341.82 real   307.38 user    69.01 sys
> >>>>>>SCHED_ULE-today, run immediately after booting:
> >>>>>>   41.51 real     7.97 user     6.42 sys
> >>>>>>  306.64 real    59.66 user    40.68 sys
> >>>>>>  346.48 real   305.54 user    69.97 sys
> >>>>>>SCHED_4BSD-yesterday, with not so careful setup:
> >>>>>>  [same as today except the depend step was 10 seconds
> >>>>>>  slower (real)]
> >>>>>>SCHED_4BSD-today, run immediately after booting:
> >>>>>>   18.89 real     8.01 user     6.66 sys
> >>>>>>  128.17 real    58.33 user    43.61 sys
> >>>>>>  291.59 real   308.48 user    72.33 sys
> >>>>>>SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz
> >>>>>>CPU) with many local changes and not so careful setup:
> >>>>>>   17.39 real     8.28 user     5.49 sys
> >>>>>>  130.51 real    60.97 user    34.63 sys
> >>>>>>  390.68 real   310.78 user    60.55 sys
> >>>>>>
> >>>>>>Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for
> >>>>>>the obj and depend stages.  These stages have little
> >>>>>>parallelism.  SCHED_ULE was only 19% slower for the all stage.
> >>>>>>...
> >>>>>
> >>>>>I reran this with -current (sched_ule.c 1.68, etc.).  Result: no
> >>>>>significant change.  However, with a UP kernel there was no
> >>>>>significant difference between the times for SCHED_ULE and
> >>>>>SCHED_4BSD.
> >>>>
> >>>>There was a significant difference on UP until last week.  I'm
> >>>>working on SMP now.  I have some patches but they aren't quite ready
> >>>>yet.
> >>>
> >>>I have committed my SMP fixes.  I would appreciate it if you could post
> >>>updated results.  ULE now outperforms 4BSD in a single threaded kernel
> >>>com

Re: More ULE bugs fixed.

2003-11-04 Thread Jeff Roberson

On Tue, 4 Nov 2003, Sheldon Hearn wrote:

> On (2003/11/04 09:29), Eirik Oeverby wrote:
>
> > The problem is two parts: The mouse tends to 'lock up' for brief moments
> > when the system is under load, in particular during heavy UI operations
> > or when doing compile jobs and such.
> > The second part of the problem is related, and is manifested by the
> > mouse actually making movements I never asked it to make.
>
> Wow, I just assumed it was a local problem.  I'm also seeing unrequested
> mouse movement, as if the signals from movements are repeated or
> amplified.
>
> The thing is, I'm using 4BSD, not ULE, so I wouldn't trouble Jeff to
> look for a cause for that specific problem in ULE.

How long have you been seeing this?  Are you using a usb mouse?  Can you
try with PS/2 if you are?

Thanks,
Jeff

>
> Ciao,
> Sheldon.

Re: Was: More ULE bugs fixed. Is: Mouse problem?

2003-11-04 Thread Jeff Roberson

On Wed, 5 Nov 2003, Eirik Oeverby wrote:

> Alex Wilkinson wrote:
> > On Wed, Nov 05, 2003 at 12:27:04AM +0100, Eirik Oeverby wrote:
> >
> > Just for those interested:
> > I do *not* get any messages at all from the kernel (or elsewhere) when
> > my mouse goes haywire. And it's an absolute truth (just tested back and
> > forth 8 times) that it *only* happens with SCHED_ULE and *only* with old
> > versions (~1.50) and the very latest ones (1.75 as I'm currently
> > running). 1.69 for instance did *not* show any such problems.
> >
> > I will, however, update my kernel again now, to get the latest
> > sched_ule.c (if any changes have been made since 1.75) and to test with
> > the new interrupt handler. I have a suspicion it might be a combination
> > of SCHED_ULE and some signal/message/interrupt handling causing messages
> > to get lost along the way. Because that's exactly how it feels...
> >
> > Question: How can I find out what version of SCHED_ULE I am running?
>
> I asked the same recently, and here's what I know:
>   - check /usr/src/sys/kern/sched_ule.c - a page or so down there's a
> line with the revision
>   - ident /boot/kernel/kernel | grep sched_ule

Ident also works on source files.
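
For readers without ident(1) at hand, the following rough Python sketch imitates what it does: it scans a file (source or binary) for RCS-style `$Keyword: value $` strings.  The regular expression is a simplification of ident's real matching rules, and the function name is made up for illustration.

```python
import re
import sys

# Simplified pattern for RCS keyword strings such as "$FreeBSD: ... $".
# ident(1) itself has more detailed rules; this is only an approximation.
KEYWORD_RE = re.compile(rb"\$[A-Za-z]+: [^$\n]* \$")


def find_rcs_keywords(data: bytes) -> list[str]:
    """Return every RCS-style keyword string found in data."""
    return [m.group(0).decode("ascii", "replace")
            for m in KEYWORD_RE.finditer(data)]


if __name__ == "__main__":
    # Usage: python ident_sketch.py /usr/src/sys/kern/sched_ule.c
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for keyword in find_rcs_keywords(f.read()):
                print(f"{path}: {keyword}")
```

Because the keyword string is plain text in both cases, the same scan works on sched_ule.c and on a kernel binary that was built from it.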

Cheers,
Jeff

>
> /Eirik
>
> >
> >  - aW
>
>

Re: Was: More ULE bugs fixed. Is: Mouse problem?

2003-11-04 Thread Jeff Roberson


On Wed, 5 Nov 2003, Eirik Oeverby wrote:

> Eirik Oeverby wrote:
> > Just for those interested:
> > I do *not* get any messages at all from the kernel (or elsewhere) when
> > my mouse goes haywire. And it's an absolute truth (just tested back and
> > forth 8 times) that it *only* happens with SCHED_ULE and *only* with old
> > versions (~1.50) and the very latest ones (1.75 as I'm currently
> > running). 1.69 for instance did *not* show any such problems.
> >
> > I will, however, update my kernel again now, to get the latest
> > sched_ule.c (if any changes have been made since 1.75) and to test with
> > the new interrupt handler. I have a suspicion it might be a combination
> > of SCHED_ULE and some signal/message/interrupt handling causing messages
> > to get lost along the way. Because that's exactly how it feels...
>
> Whee. Either the bump from sched_ule.c 1.75 to 1.77 changed something
> back to the old status, or the new interrupt handling has had some major
> influence.
> All I can say is - wow. My system is now more responsive than ever, I
> cannot (so far) reproduce any mouse jerkiness or bogus input or
> anything, and things seem smoother.
>
> As always I cannot guarantee that this report is not influenced by the
> placebo effect, but I do feel that it's a very real improvement. The
> fact that I can start VMWare, Firebird, Thunderbird, Gaim and gkrellm at
> the same time without having *one* mouse hiccup speaks for itself. I
> couldn't even do that with ULE.
>
> So Jeff or whoever did the interrupt stuff - what did you do?

This is wonderful news.  I fixed a few bugs over the last couple of days.
I'm not sure which one caused your problem.  I'm very pleased to hear your
report though.

Cheers,
Jeff

>
> /Eirik
>
> >
> > Greetings,
> > /Eirik
> >
> > Morten Johansen wrote:
> >
> >> On Tue, 4 Nov 2003, Sheldon Hearn wrote:
> >>
> >>> On (2003/11/04 09:29), Eirik Oeverby wrote:
> >>>
> >>> > The problem is two parts: The mouse tends to 'lock up' for brief
> >>> moments
> >>> > when the system is under load, in particular during heavy UI
> >>> operations
> >>> > or when doing compile jobs and such.
> >>> > The second part of the problem is related, and is manifested by the
> >>> > mouse actually making movements I never asked it to make.
> >>>
> >>> Wow, I just assumed it was a local problem.  I'm also seeing unrequested
> >>> mouse movement, as if the signals from movements are repeated or
> >>> amplified.
> >>>
> >>> The thing is, I'm using 4BSD, not ULE, so I wouldn't trouble Jeff to
> >>> look for a cause for that specific problem in ULE.
> >>
> >>
> >>
> >>
> >> Me too. Have had this problem since I got a "Intellimouse" PS/2
> >> wheel-mouse. (It worked fine with previous mice (no wheel)).
> >> With any scheduler in 5-CURRENT and even more frequent in 4-STABLE,
> >> IIRC. Using moused or not doesn't make a difference.
> >> Get these messages on console: "psmintr: out of sync", and the mouse
> >> freezes then goes wild for a few seconds.
> >> Can happen under load and sometimes when closing Mozilla (not often).
> >> It could be related to the psm-driver. Or maybe I have a bad mouse, I
> >> don't know.
> >> I will try another mouse, but it does work perfectly in Linux and
> >> Windogs...
> >>
> >> mj
> >>
> >>
> >>

Re: SYSENTER in FreeBSD

2003-11-05 Thread Jeff Roberson

On Wed, 5 Nov 2003, David Xu wrote:

> Jun Su wrote:
>
> >I noticed that Jeff Roberson has implemented this already. When will it be committed?
> >http://kerneltrap.org/node/view/1531
> >
> >I googled this because I found this feature in the list of kernel
> >improvements in Windows XP. :-)
> >
> >Thanks,
> >Jun Su
> >
> >
> >
> I almost finished this experiment about 10 months ago.
> http://people.freebsd.org/~davidxu/fastsyscall/
> The patch is out of date and still not complete.
> It can give you some performance improvement, but I think too many
> things need to be changed, and this really makes the user ret code
> very dirty.  Some syscalls, for example pipe(), cannot use this fast
> syscall, because pipe() seems to use two registers to return file
> handles, and the performance gain is immediately lost when the assembly
> code becomes more complex.  I don't think this hack is worth doing on
> IA32.  I heard AMD has a different way to support fast syscalls, which
> may already be in the FreeBSD AMD64 branch.

This works with every syscall.  I have a patch in perforce that doesn't
require any changes to userret().  The performance gain is not so
substantial for most things but I feel that it is worth it.  Mini is
probably going to finish this up over the next week or so.

Cheers,
Jeff


>
> David Xu
>
>

Re: ULE and very bad responsiveness

2003-11-13 Thread Jeff Roberson


On Thu, 13 Nov 2003, Harald Schmalzbauer wrote:

> On Thursday 13 November 2003 07:17, Harald Schmalzbauer wrote:
> > Hi,
> >
> > from comp.unix.bsd.freebsd.misc:
> >
> > Kris Kennaway wrote:
> > > On 2003-11-13, Harald Schmalzbauer <[EMAIL PROTECTED]> wrote:
> > >> Well, I don't have any measurements but in my case it's not necessary
> > >> at all. I built a UP kernel with ULE like Kris advised me.
> > >
> > > Are you running an up-to-date 5.1-CURRENT?  ULE was broken with these
> > > characteristics until very recently.  If you're up-to-date and still
> > > see these problems, you need to post to the current mailing list.
> > >
> > > Kris
> >
> > Yes, I am running current as of 13. Nov.
> >
> > Find attached my first problem description.
>
> This time I also attached my dmesg and kernel conf

Try running seti with nice +20 rather than 15.  Do you experience bad
interactivity without seti running?

Thanks,
Jeff

>
> >
> > Thanks,
> >
> > -Harry
>


Re: ULE and very bad responsiveness

2003-11-14 Thread Jeff Roberson


On Fri, 14 Nov 2003, Jonathan Fosburgh wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> On Thursday 13 November 2003 06:01 pm, Harald Schmalzbauer wrote:
>
> > I also could play quake(2) and have something compiling in the background
> > but I see every new object file in form of a picture freeze. Also every
> > other disk access seems to block the whole machine for a moment.
> > I'll try again if somebody has an idea what's wrong. Then I can try running
> > seti with nice 20 but that's not really a solution. It's working perfectly
> > with nice 15 and the old scheduler.
> >
>
> I see something similar: as a file is generated during a compile I get a
> momentary hang in the mouse, but it is not every compile.  I think I see it
> mostly when running some invocation of make -j, but I've not been able to
> lock down a particular set of circumstances where I do see it.  My
> sched_ule.c is at 1.80.  I have a UP system.  This behaviour, intermittent
> though it is, persists across a normal UP kernel, and also one with SMP+APIC
> (I was *supposed* to have two CPUs, but that is another issue ...) enabled. I
> have a PS/2 mouse and use moused.  I'm running KDE3.1.4.

This does not happen with SCHED_4BSD?  How fast is your system?  Can you
give me an example including what applications you're running and what
you're compiling?

> - --
> Jonathan Fosburgh
> AIX and Storage Administrator
> UT MD Anderson Cancer Center
> Houston, TX
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.2.3 (FreeBSD)
>
> iD8DBQE/tNYwqUvQmqp7omYRAnzjAKCx8by6w77iT5G+7NiBOC8lVkxJ3QCcDgWP
> J9I+Sgx4yuzqOOQ+Gu9Ge3s=
> =GEi2
> -END PGP SIGNATURE-
>

Re: ULE and very bad responsiveness

2003-11-14 Thread Jeff Roberson

On Fri, 14 Nov 2003, Jonathan Fosburgh wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> On Friday 14 November 2003 01:52 pm, Jeff Roberson wrote:
>
> >
> > This does not happen with SCHED_4BSD?  How fast is your system?  Can you
> > give me an example including what applications you're running and what
> > you're compiling?
>
> I haven't tried SCHED_4BSD lately.  It will probably be next week before I
> have a chance.  Basically this is while running things such as konqueror,
> kmail, konsole, sometimes Mozilla or Firebird, usually wine for Lotus Notes.
> I think I see it more often on building the world, and again mostly with -j,
> even set at 4 or 5.  This is a 600MHz system with ~380M RAM on an ATA drive at
> UDMA-66.

I suspect that you are experiencing some paging activity.  Does top show
that any of your swap is in use?  You probably don't have enough memory to
fit a parallelized buildworld, all the files that it touches, Mozilla
(60MB on my machine), X Windows (another 60MB on my machine), and your
window manager, which, if you're using KDE, is probably at least another
60MB.
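
The arithmetic behind this estimate can be written out as a small sketch.  The ~380MB total and the three 60MB figures come from this thread; treating them as exact and ignoring the kernel's own memory use are simplifying assumptions, and the names are illustrative.

```python
# Rough memory budget for the machine described above (all sizes in MB).
# Figures are taken from the thread; the kernel's own usage is ignored.

TOTAL_RAM_MB = 380

resident_mb = {
    "mozilla": 60,
    "X server": 60,
    "KDE (window manager and desktop)": 60,
}


def remaining_mb(total: int, usage: dict[str, int]) -> int:
    """Memory left for everything else once the listed programs are resident."""
    return total - sum(usage.values())


if __name__ == "__main__":
    left = remaining_mb(TOTAL_RAM_MB, resident_mb)
    print(f"left for buildworld jobs and the file cache: {left} MB")
```

Under these assumptions about 200MB remains for several concurrent compiler processes plus the cache for the source and object trees, so some swapping under make -j is plausible.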

>
> - --
> Jonathan Fosburgh
> AIX and Storage Administrator
> UT MD Anderson Cancer Center
> Houston, TX
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.2.3 (FreeBSD)
>
> iD8DBQE/tUF5qUvQmqp7omYRAsaSAJ0Y8fZBrNEQ8UcTtf1XfVUHnE3lPwCfcup4
> k4bw4D68b7Lrdf0ygWJ4zrE=
> =ZXZ4
> -END PGP SIGNATURE-
>

Re: HEADS-UP new statfs structure

2003-11-15 Thread Jeff Roberson

On Sat, 15 Nov 2003, Garance A Drosihn wrote:

> At 6:20 PM -0800 11/15/03, David O'Brien wrote:
> >On Sat, Nov 15, 2003 at 03:16:03PM -0800, Marcel Moolenaar wrote:
> >>  Provided that we
> >  > 2. replace the date with a convenient sequence number,
> >  >which we can call the minor version number, and
> >..
> >  > E.g.: libc.so.6.0, libc.so.6.1, and (first release) libc.so.6.2...
> >
> >Please no -- it wouldn't be easy to see a.out libs from ELF ones.
> >(yes I still have some a.out binaries)
>
> Maybe:
> libc.so.6.e0, libc.so.6.e1, and (first release) libc.so.6.e2...
>
> I have no idea what would be best to do, but I do think we
> (developers and users alike) would be much better off if
> we had some way to handle all these changes which come in.
>
> Or maybe the real problem is that we claim that there will
> be no API/ABI changes after X.0-RELEASE, and we've really
> missed that mark with 5.0-RELEASE, for a variety of reasons.
> If we're going to keep missing that mark with the 6.x-series,
> then we should plan to do something to make life a little
> less painful.  Right now it's getting more painful, if for
> no other reason than we have more developers, and thus more
> major-changes in the pipeline.

The API and ABI are frozen when we make 5.x-STABLE and branch 6.x.  Until
then it's open to change.  This was decided up front.

Cheers,
Jeff

>
> --
> Garance Alistair Drosehn            =   [EMAIL PROTECTED]
> Senior Systems Programmer           or  [EMAIL PROTECTED]
> Rensselaer Polytechnic Institute    or  [EMAIL PROTECTED]

Re: LOR (swap_pager.c:1323, swap_pager.c:1838, uma_core.c:876) (current:Nov17)

2003-11-18 Thread Jeff Roberson


On Tue, 18 Nov 2003, Cosmin Stroe wrote:

> Here is the stack backtrace:
>

Thanks, this is known and is actually safe.  We're pursuing ways to quiet
these warnings.


> lock order reversal
>  1st 0xc1da318c vm object (vm object) @ /usr/src/sys/vm/swap_pager.c:1323
>  2nd 0xc0724900 swap_pager swhash (swap_pager swhash) @ 
> /usr/src/sys/vm/swap_pager.c:1838
>  3rd 0xc0c358c4 vm object (vm object) @ /usr/src/sys/vm/uma_core.c:876
> Stack backtrace:
> backtrace(c0692be9,c0c358c4,c06a376c,c06a376c,c06a464d) at backtrace+0x17
> witness_lock(c0c358c4,8,c06a464d,36c,1) at witness_lock+0x672
> _mtx_lock_flags(c0c358c4,0,c06a464d,36c,1) at _mtx_lock_flags+0xba
> obj_alloc(c0c22480,1000,c976f9db,101,c06f3f50) at obj_alloc+0x3f
> slab_zalloc(c0c22480,1,c06a464d,68c,c0c22494) at slab_zalloc+0xb3
> uma_zone_slab(c0c22480,1,c06a464d,68c,c0c22520) at uma_zone_slab+0xd6
> uma_zalloc_internal(c0c22480,0,1,5c1,72e,c06f55a8) at uma_zalloc_internal+0x3e
> uma_zalloc_arg(c0c22480,0,1,72e,2) at uma_zalloc_arg+0x3ab
> swp_pager_meta_build(c1da318c,7,0,2,0) at swp_pager_meta_build+0x174
> swap_pager_putpages(c1da318c,c976fbb8,8,0,c976fb20) at swap_pager_putpages+0x32d
> default_pager_putpages(c1da318c,c976fbb8,8,0,c976fb20) at default_pager_putpages+0x2e
> vm_pageout_flush(c976fbb8,8,0,0,c06f36a0) at vm_pageout_flush+0x17a
> vm_pageout_clean(c0dae2d8,0,c06a4468,32a,0) at vm_pageout_clean+0x305
> vm_pageout_scan(0,0,c06a4468,5a9,1f4) at vm_pageout_scan+0x65f
> vm_pageout(0,c976fd48,c068d4ed,311,0) at vm_pageout+0x31b
> fork_exit(c0625250,0,c976fd48) at fork_exit+0xb4
> fork_trampoline() at fork_trampoline+0x8
> --- trap 0x1, eip = 0, esp = 0xc976fd7c, ebp = 0 ---
> Debugger("witness_lock")
> Stopped at  Debugger+0x54:  xchgl   %ebx,in_Debugger.0
> db>
>
> I'm running the sources from yesterday, nov 17:
>
> FreeBSD 5.1-CURRENT #0: Mon Nov 17 06:40:05 CST 2003 
> root@:/usr/obj/usr/src/sys/GALAXY
>

Re: zone(9) is broken on SMP?!

2003-11-26 Thread Jeff Roberson

On Thu, 27 Nov 2003, Florian C. Smeets wrote:

> Max Laier wrote:
> > If I build attached kmod and kldload/-unload it on a GENERIC kernel w/
> > SMP & apic it'll error out:
> > "Zone was not empty (xx items).  Lost X pages of memory."
> >
> > This is on a p4 HTT, but seems reproducible on "proper" SMP systems as
> > well. UP systems don't show it however.
> >
> > Can somebody please try and report? Thanks!
>
> Yes this is reproducible on a real SMP system:
>
> bender kernel: Zone UMA test zone was not empty (65 items).  Lost 1
> pages of memory.

I'll look into this over the weekend, thanks.

Cheers,
Jeff

>
> Regards,
> flo

Re: Frequent lockups with 5.2-BETA

2003-11-27 Thread Jeff Roberson



On 27 Nov 2003, Christian Laursen wrote:

> Since upgrading from 5.1-RELEASE to 5.2-BETA, I've been
> experiencing hard lockups once or twice every day.
>
> I managed to get a trace by enabling the watchdog, which
> put me into the debugger. This is the trace:
>
> db> trace
> Debugger(c06f85ec,1bc169d,0,c0754fc4,c0754bf8) at Debugger+0x54
> watchdog_fire(d73e2bcc,c067b433,c0c57100,0,d73e2bcc) at watchdog_fire+0xc1
> hardclock(d73e2bcc,0,0,d73e2b98,c43bd400) at hardclock+0x15a
> clkintr(d73e2bcc,d73e2b9c,c06c6bf6,0,c1925000) at clkintr+0xe9
> intr_execute_handlers(c0756be0,d73e2bcc,c0c570a0,1000,c0c61300) at intr_execute8
> atpic_handle_intr(0) at atpic_handle_intr+0xef
> Xatpic_intr0() at Xatpic_intr0+0x1e
> --- interrupt, eip = 0xc068db74, esp = 0xd73e2c10, ebp = 0xd73e2c14 ---
> uma_zone_slab(c0c61300,1,0,c068e516,c074edd8) at uma_zone_slab+0x4
> uma_zalloc_internal(c0c61300,0,1,0,d73e2c80) at uma_zalloc_internal+0x5c
> bucket_alloc(2f,1,0,0,0) at bucket_alloc+0x65
> uma_zfree_arg(c0c48240,c49121b8,0,c1922580,3600) at uma_zfree_arg+0x2c6
> tcp_hc_purge(0,c1922580,161e9,142b64fd,c05d7c70) at tcp_hc_purge+0x11f

Great debugging work.  I'm glad to see the software watchdog put to use.
This looks like a problem with the hostcache.  Perhaps Andre can look at
it.

Thanks,
Jeff

> softclock(0,0,0,0,c192b54c) at softclock+0x25e
> ithread_loop(c1922580,d73e2d48,0,11,55ff44fd) at ithread_loop+0x1d8
> fork_exit(c0524ec0,c1922580,d73e2d48) at fork_exit+0x80
> fork_trampoline() at fork_trampoline+0x8
> --- trap 0x1, eip = 0, esp = 0xd73e2d7c, ebp = 0 ---
>
> This is the dmesg from the last boot:
>
> Copyright (c) 1992-2003 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>   The Regents of the University of California. All rights reserved.
> FreeBSD 5.2-BETA #7: Wed Nov 26 17:24:32 CET 2003
> [EMAIL PROTECTED]:/usr/obj/usr/src/sys/BORG
> Preloaded elf kernel "/boot/kernel/kernel" at 0xc0823000.
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Intel(R) Celeron(R) CPU 1.70GHz (1716.04-MHz 686-class CPU)
>   Origin = "GenuineIntel"  Id = 0xf13  Stepping = 3
>   
> Features=0x3febfbff
> real memory  = 536805376 (511 MB)
> avail memory = 511885312 (488 MB)
> Pentium Pro MTRR support enabled
> acpi0:  on motherboard
> pcibios: BIOS version 2.10
> Using $PIR table, 15 entries at 0xc00f7810
> acpi0: Power Button (fixed)
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
> acpi_cpu0:  port 0x530-0x537 on acpi0
> acpi_cpu1:  port 0x530-0x537 on acpi0
> device_probe_and_attach: acpi_cpu1 attach returned 6
> acpi_button0:  on acpi0
> pcib0:  port 0xcf8-0xcff on acpi0
> pci0:  on pcib0
> agp0:  mem 0xe000-0xe3ff at device 0.0 on 
> pci0
> pcib1:  at device 1.0 on pci0
> pci1:  on pcib1
> pcib0: slot 1 INTA is routed to irq 5
> pcib1: slot 0 INTA is routed to irq 5
> pci1:  at device 0.0 (no driver attached)
> pcib2:  at device 30.0 on pci0
> pci2:  on pcib2
> pcib2: slot 10 INTA is routed to irq 10
> pcib2: slot 13 INTA is routed to irq 3
> pcm0:  port 0xdc00-0xdcff irq 10 at device 10.0 on pci2
> em0:  port 0xd800-0xd83f mem 
> 0xdfec-0xdfed,0xdfee-0xdfef irq 3 at device 13.0 on pci2
> em0:  Speed:N/A  Duplex:N/A
> isab0:  at device 31.0 on pci0
> isa0:  on isab0
> atapci0:  port 0xfc00-0xfc0f,0-0x3,0-0x7,0-0x3,0-0x7 
> at device 31.1 on pci0
> ata0: at 0x1f0 irq 14 on atapci0
> ata0: [MPSAFE]
> ata1: at 0x170 irq 15 on atapci0
> ata1: [MPSAFE]
> pci0:  at device 31.3 (no driver attached)
> atkbdc0:  port 0x64,0x60 irq 1 on acpi0
> atkbd0:  flags 0x1 irq 1 on atkbdc0
> kbd0 at atkbd0
> psm0:  irq 12 on atkbdc0
> psm0: model MouseMan+, device ID 0
> fdc0: cmd 3 failed at out byte 1 of 3
> sio0 port 0x3f8-0x3ff irq 4 on acpi0
> sio0: type 16550A, console
> ppc0 port 0x778-0x77b,0x378-0x37f irq 7 drq 3 on acpi0
> ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
> ppc0: FIFO with 16/16/9 bytes threshold
> ppbus0:  on ppc0
> plip0:  on ppbus0
> lpt0:  on ppbus0
> lpt0: Interrupt-driven port
> ppi0:  on ppbus0
> acpi_cpu1:  port 0x530-0x537 on acpi0
> device_probe_and_attach: acpi_cpu1 attach returned 6
> fdc0: cmd 3 failed at out byte 1 of 3
> npx0: [FAST]
> npx0:  on motherboard
> npx0: INT 16 interface
> orm0:  at iomem 0xe-0xe0fff on isa0
> pmtimer0 on isa0
> fdc0: cannot reserve I/O port range (6 ports)
> sc0:  at flags 0x100 on isa0
> sc0: VGA <16 virtual consoles, flags=0x100>
> sio1: configured irq 3 not in bitmap of probed irqs 0
> sio1: port may not be enabled
> vga0:  at port 0x3c0-0x3df iomem 0xa-0xb on isa0
> Timecounter "TSC" frequency 1716042336 Hz quality 800
> Timecounters tick every 10.000 msec
> IPv6 packet filtering initialized, default to accept, logging limited to 100 
> packets/entry
> ipfw2 initialized, divert disabled, rule-based forwarding enabled, default to 
> accept, logging limited to 100 packets/entry by defau

Re: amd64/SMP(/ata-raid ?) not happy...

2003-11-30 Thread Jeff Roberson

On Sun, 30 Nov 2003, Poul-Henning Kamp wrote:

>
> Timecounters tick every 10.000 msec
> GEOM: create disk ad0 dp=0xff00eebfaca0
> ad0: 35772MB  [72680/16/63] at ata0-master UDMA66
> GEOM: create disk ad4 dp=0xff00eebfa4a0
> ad4: 35304MB  [71730/16/63] at ata2-master UDMA133
> GEOM: create disk ad6 dp=0xff00eebfa0a0
> ad6: 35304MB  [71730/16/63] at ata3-master UDMA133
> GEOM: create disk ad8 dp=0xff00014c4ea0
> ad8: 35304MB  [71730/16/63] at ata4-master UDMA133
> GEOM: create disk ar0 dp=0xff00f04a3270
> ar0: 105913MB  [13502/255/63] status: READY subdisks:
>  disk0 READY on ad4 at ata2-master
>  disk1 READY on ad6 at ata3-master
>  disk2 READY on ad8 at ata4-master
> SMP: AP CPU #1 Launched!
> panic: mtx_lock() of spin mutex (null) @ ../../../vm/uma_core.c:1716

I mailed re about this.  There has been some disagreement over how
mp_maxid is implemented on all architectures.  Until this gets resolved
and stamped as approved by re, please add mp_maxid++; at line 187 of
amd64/amd64/mp_machdep.c

Thanks,
Jeff


> cpuid = 1;
> Stack backtrace:
> backtrace() at backtrace+0x17
> panic() at panic+0x1d2
> _mtx_lock_flags() at _mtx_lock_flags+0x4f
> uma_zfree_arg() at uma_zfree_arg+0x7e
> g_destroy_bio() at g_destroy_bio+0x1b
> g_disk_done() at g_disk_done+0x85
> biodone() at biodone+0x66
> ad_done() at ad_done+0x31
> ata_completed() at ata_completed+0x237
> taskqueue_run() at taskqueue_run+0x88
> taskqueue_swi_run() at taskqueue_swi_run+0x10
> ithread_loop() at ithread_loop+0x189
> fork_exit() at fork_exit+0xbd
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xad5b0d30, rbp = 0 ---
> Debugger("panic")
> Stopped at  Debugger+0x4c:  xchgl   %ebx,0x2caefe
> db>
>
> --
> Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
> [EMAIL PROTECTED] | TCP/IP since RFC 956
> FreeBSD committer   | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.

Re: user:sys time ratio

2003-11-30 Thread Jeff Roberson

On Sun, 30 Nov 2003, Colin Percival wrote:

>Robert Watson suggested that I compare performance from UP and SMP kernels:
>
> # /usr/bin/time -hl sh -c 'make -s buildworld 2>&1' > /dev/null
>            Real      User        Sys
>UP kernel   38m33.29s 27m10.09s   10m59.15s
>   (retest) 38m33.18s 27m04.40s   11m05.73s
>SMP w/o HTT 41m01.54s 27m10.27s   13m29.82s
>   (retest) 39m47.50s 27m08.05s   12m12.20s
>SMP w/HTT   42m17.16s 28m12.82s   14m04.93s
>   (retest) 44m09.61s 28m15.31s   15m44.86s
>
>That enabling HTT degrades performance is not surprising, since I'm not
> passing the -j option to make; but a 5% performance delta between UP and
> SMP kernels is rather surprising (to me, at least), and the fact that the
> system time varies so much on the SMP kernel also seems peculiar.

So you have enabled SMP on a system with one physical core and two logical
cores?  Looks like almost a 20% slowdown in system time with the SMP
kernel.  It's too bad it's enabled by default now.  I suspect that some of
this is due to using the lock prefix on P4 cores.  It makes the cost of a
mutex over 300 cycles vs 50.  It might be interesting to do an experiment
without HTT, but with SMP enabled and the lock prefix commented out.
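
As a sanity check on the figure above, this sketch recomputes the system-time slowdown from the numbers in the quoted table.  Averaging each pair of runs is an assumption of the sketch, not something stated in the thread, and the helper names are illustrative.

```python
import re

# System times from the quoted table (first run, retest).
SYS_TIMES = {
    "UP kernel":   ("10m59.15s", "11m05.73s"),
    "SMP w/o HTT": ("13m29.82s", "12m12.20s"),
    "SMP w/HTT":   ("14m04.93s", "15m44.86s"),
}


def to_seconds(stamp: str) -> float:
    """Convert a time(1) -h style value such as '10m59.15s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def mean_seconds(name: str) -> float:
    """Average the two runs recorded for one kernel configuration."""
    runs = [to_seconds(s) for s in SYS_TIMES[name]]
    return sum(runs) / len(runs)


def slowdown_percent(name: str, baseline: str = "UP kernel") -> float:
    """System-time increase of `name` relative to the baseline, in percent."""
    return (mean_seconds(name) / mean_seconds(baseline) - 1.0) * 100.0


if __name__ == "__main__":
    for name in ("SMP w/o HTT", "SMP w/HTT"):
        print(f"{name}: {slowdown_percent(name):.1f}% more system time than UP")
```

Averaged this way, the SMP kernel without HTT spends roughly 16% more system time than the UP kernel, which is in line with the estimate above, although the two individual runs differ considerably.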

I have a set of changes for ULE that should fix some of the HTT slowdown,
although it is inevitable that there will always be some.  If you would
like to try the patch, it's available at:

http://www.chesapeake.net/~jroberson/ulehtt.diff

Cheers,
Jeff

>Is this normal?
>
> Colin Percival
>

Re: kernel compile fails in uma_core.c

2003-11-30 Thread Jeff Roberson


On Sun, 30 Nov 2003, Paulius Bulotas wrote:

> Hello,
>
> when building kernel:
> ../../../vm/uma_core.c: In function `zone_timeout':
> ../../../vm/uma_core.c:345: error: `mp_maxid' undeclared (first use in
> this function)
> and so on.
>
> Anything I missed?

I just fixed this, sorry.

>
> Paulius

Re: SUJ deadlock

2010-05-03 Thread Jeff Roberson


On Mon, 3 May 2010, Fabien Thomas wrote:


Hi Jeff,

I'm at r207548 now, and for some days I've had system deadlocks.
It seems related to SUJ with process waiting on suspfs or ppwait.


I've also seen it stalled in suspfs, but this information is way better
than what I was able to garner.   I was only able to tell via ctrl-t on
a stalled 'ls' process in a terminal before hard booting.

Right now it occurs every time I attempt to do the portmaster -a upgrade
of X/KDE on this system.


I've spotted this during multiple portupgrade -aR :)


Hi folks,

I'm really not sure why I haven't been able to reproduce this.  I do have 
some debugging info reported by others.  Hopefully it will be sufficient. 
I will send another mail when I resolve the issue, and if I cannot, I may
ask for coredumps or other details.


Thanks,
Jeff



Fabien

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: SUJ update

2010-05-04 Thread Jeff Roberson


On Mon, 3 May 2010, Ed Maste wrote:


On Mon, May 03, 2010 at 04:32:37PM -0700, Doug Barton wrote:


I also don't want to bikeshed this to death. I imagine that once the
feature is stable that users will just twiddle it once and then leave it
alone, or it will be set at install time and then not twiddled at all. :)


Speaking of which, is there any reason for us not to support enabling SU+J
at newfs time?  (Other than just needing a clean way to share the code
between tunefs and newfs.)


The code is actually totally different between the two so it'll 
essentially have to be rewritten in newfs.  tunefs uses libufs and some of 
the code for manipulating directories that was added to tunefs needs to be 
moved back into libufs and made more general.  However, newfs doesn't use 
libufs anyway.  So it'd have to be converted or you'd just have to 
re-write journal creation.


For now, I think an extra step in the installer is probably easier.

Thanks,
Jeff



-Ed

Re: SUJ deadlock

2010-05-05 Thread Jeff Roberson


On Mon, 3 May 2010, Fabien Thomas wrote:


Hi Jeff,

I'm at r207548 now, and for the past few days I've had system deadlocks.
It seems related to SUJ, with processes waiting on suspfs or ppwait.


I've also seen it stalled in suspfs, but this information is way better
than what I was able to garner.   I was only able to tell via ctrl-t on
a stalled 'ls' process in a terminal before hard booting.

Right now it occurs every time I attempt to do the portmaster -a upgrade
of X/KDE on this system.


I've spotted this during multiple portupgrade -aR :)


Can anyone who has experienced this hang test this patch:

Thanks,
Jeff

Index: ffs_softdep.c
===
--- ffs_softdep.c   (revision 207480)
+++ ffs_softdep.c   (working copy)
@@ -9301,7 +9301,7 @@
hadchanges = 1;
}
/* Leave this inodeblock dirty until it's in the list. */
-   if ((inodedep->id_state & (UNLINKED | DEPCOMPLETE)) == UNLINKED)
+   if ((inodedep->id_state & (UNLINKED | UNLINKONLIST)) == UNLINKED)
hadchanges = 1;
/*
 * If we had to rollback the inode allocation because of
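
The one-line change above swaps which state bit is masked together with UNLINKED. A small stand-alone sketch of the two tests follows. The flag names come from the patch, but the bit values here are hypothetical (chosen only so the example compiles on its own), and `keep_dirty_old()`, `keep_dirty_new()`, and `demo()` are illustrative helpers, not functions from ffs_softdep.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the real ones live in the softdep headers. */
#define	UNLINKED	0x01u
#define	DEPCOMPLETE	0x02u
#define	UNLINKONLIST	0x04u

/* Test before the patch: UNLINKED set and DEPCOMPLETE clear. */
static bool
keep_dirty_old(unsigned state)
{
	return ((state & (UNLINKED | DEPCOMPLETE)) == UNLINKED);
}

/* Test after the patch: UNLINKED set and UNLINKONLIST clear. */
static bool
keep_dirty_new(unsigned state)
{
	return ((state & (UNLINKED | UNLINKONLIST)) == UNLINKED);
}

/* Walk through the states where the two tests disagree. */
static void
demo(void)
{
	/* Unlinked, dependencies complete, not yet on the list. */
	assert(!keep_dirty_old(UNLINKED | DEPCOMPLETE));
	assert(keep_dirty_new(UNLINKED | DEPCOMPLETE));

	/* Unlinked and already on the list. */
	assert(keep_dirty_old(UNLINKED | UNLINKONLIST));
	assert(!keep_dirty_new(UNLINKED | UNLINKONLIST));

	/* Not unlinked at all: neither test fires. */
	assert(!keep_dirty_old(0) && !keep_dirty_new(0));
}
```

Read this way, the patched test appears to match the comment in the hunk: the inode block stays dirty until the unlinked inode is on the list, whether or not DEPCOMPLETE is set.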




Fabien

Re: SUJ update - new panic - "ffs_copyonwrite: recursive call"

2010-05-07 Thread Jeff Roberson


On Sun, 2 May 2010, Vladimir Grebenschikov wrote:


Hi

While 'make buildworld'

kgdb /boot/kernel/kernel /var/crash/vmcore.13
GNU gdb 6.1.1 [FreeBSD]


Hi Vladimir,

I checked in a fix for this at revision 207742.  If you can verify that it 
works for you it would be appreciated.


Thanks!
Jeff


...
#0  0xc056b93c in doadump ()
(kgdb) bt
#0  0xc056b93c in doadump ()
#1  0xc0489019 in db_fncall ()
#2  0xc0489411 in db_command ()
#3  0xc048956a in db_command_loop ()
#4  0xc048b3ed in db_trap ()
#5  0xc05985a4 in kdb_trap ()
#6  0xc06f8b5e in trap ()
#7  0xc06dd6eb in calltrap ()
#8  0xc059870a in kdb_enter ()
#9  0xc056c1d1 in panic ()
#10 0xc066d602 in ffs_copyonwrite ()
#11 0xc068742a in ffs_geom_strategy ()
#12 0xc05d8955 in bufwrite ()
#13 0xc0686e64 in ffs_bufwrite ()
#14 0xc067a8a2 in softdep_sync_metadata ()
#15 0xc068c568 in ffs_syncvnode ()
#16 0xc0681425 in softdep_prealloc ()
#17 0xc066592a in ffs_balloc_ufs2 ()
#18 0xc066a252 in ffs_snapblkfree ()
#19 0xc065eb9a in ffs_blkfree ()
#20 0xc0673de0 in freework_freeblock ()
#21 0xc06797c7 in handle_workitem_freeblocks ()
#22 0xc0679aaf in process_worklist_item ()
#23 0xc06821f4 in softdep_process_worklist ()
#24 0xc0682940 in softdep_flush ()
#25 0xc0542a00 in fork_exit ()
#26 0xc06dd760 in fork_trampoline ()
(kgdb) x/s panicstr
0xc07c2b80:  "ffs_copyonwrite: recursive call"
(kgdb)



--
Vladimir B. Grebenschikov
v...@fbsd.ru



Re: SUJ deadlock

2010-05-07 Thread Jeff Roberson


On Fri, 7 May 2010, Fabien Thomas wrote:


fixed/works a lot better for me.


Thanks Fabien,  I just committed this.

Thanks everyone for the assistance finding bugs so far.  Please let me 
know if you run into anything else.  For now I don't know of any other 
than some feature/change requests for tunefs.


Thanks,
Jeff




Applied and restarted portupgrade.
Will tell you tomorrow.

Fabien

On 6 May 2010, at 00:54, Jeff Roberson wrote:


On Mon, 3 May 2010, Fabien Thomas wrote:


Hi Jeff,

I'm at r207548 now, and for the past few days I've had system deadlocks.
It seems related to SUJ, with processes waiting on suspfs or ppwait.


I've also seen it stalled in suspfs, but this information is way better
than what I was able to garner.   I was only able to tell via ctrl-t on
a stalled 'ls' process in a terminal before hard booting.

Right now it occurs every time I attempt to do the portmaster -a upgrade
of X/KDE on this system.


I've spotted this during multiple portupgrade -aR :)


Can anyone who has experienced this hang test this patch:

Thanks,
Jeff

Index: ffs_softdep.c
===
--- ffs_softdep.c   (revision 207480)
+++ ffs_softdep.c   (working copy)
@@ -9301,7 +9301,7 @@
  hadchanges = 1;
  }
  /* Leave this inodeblock dirty until it's in the list. */
-   if ((inodedep->id_state & (UNLINKED | DEPCOMPLETE)) == UNLINKED)
+   if ((inodedep->id_state & (UNLINKED | UNLINKONLIST)) == UNLINKED)
  hadchanges = 1;
  /*
   * If we had to rollback the inode allocation because of




Fabien

Re: LOR: ufs vs bufwait

2010-05-08 Thread Jeff Roberson


On Sat, 8 May 2010, Ulrich Spörlein wrote:


On Sat, 08.05.2010 at 18:00:50 +0200, Attilio Rao wrote:

2010/5/8 Ulrich Spörlein :

On Sat, 08.05.2010 at 12:20:05 +0200, Ulrich Spörlein wrote:

This LOR also is not yet listed on the LOR page, so I guess it's rather
new. I do use SUJ.

lock order reversal:
 1st 0xc48388d8 ufs (ufs) @ /usr/src/sys/kern/vfs_lookup.c:502
 2nd 0xec0fe304 bufwait (bufwait) @ /usr/src/sys/ufs/ffs/ffs_softdep.c:11363
 3rd 0xc49e56b8 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2091
KDB: stack backtrace:
db_trace_self_wrapper(c09394fe,fb817308,c062e515,c061e8ab,c093c4f1,...) at db_trace_self_wrapper+0x26
kdb_backtrace(c061e8ab,c093c4f1,c418b168,c418ef28,fb817364,...) at kdb_backtrace+0x29
_witness_debugger(c093c4f1,c49e56b8,c092e785,c418ef28,c094369d,...) at _witness_debugger+0x25
witness_checkorder(c49e56b8,9,c094369d,82b,0,...) at witness_checkorder+0x839
__lockmgr_args(c49e56b8,80100,c49e56d8,0,0,...) at __lockmgr_args+0x7f9
ffs_lock(fb817488,c062e2bb,c0942b3f,80100,c49e5660,...) at ffs_lock+0x82
VOP_LOCK1_APV(c09bd600,fb817488,c4827cd4,c09d62a0,c49e5660,...) at VOP_LOCK1_APV+0xb5
_vn_lock(c49e5660,80100,c094369d,82b,4,...) at _vn_lock+0x5e
vget(c49e5660,80100,c4827c30,50,0,...) at vget+0xb9
vfs_hash_get(c47bea20,b803,8,c4827c30,fb8175d8,...) at vfs_hash_get+0xe6
ffs_vgetf(c47bea20,b803,8,fb8175d8,1,...) at ffs_vgetf+0x49
softdep_sync_metadata(c4838880,0,c0962957,144,0,...) at softdep_sync_metadata+0xc82
ffs_syncvnode(c4838880,1,c4827c30,fb817698,246,...) at ffs_syncvnode+0x3e2
ffs_truncate(c4838880,200,0,880,c41fb480,...) at ffs_truncate+0x862
ufs_direnter(c4838880,c49e5660,fb81794c,fb817bd4,0,...) at ufs_direnter+0x8d4
ufs_makeinode(fb817bd4,0,fb817b30,fb817a94,c08e4cf5,...) at ufs_makeinode+0x517
ufs_create(fb817b30,fb817b48,0,0,fb817ba8,...) at ufs_create+0x30
VOP_CREATE_APV(c09bd600,fb817b30,2,fb817ac0,0,...) at VOP_CREATE_APV+0xa5
vn_open_cred(fb817ba8,fb817c5c,1a4,0,c41fb480,...) at vn_open_cred+0x1de
vn_open(fb817ba8,fb817c5c,1a4,c47e2428,0,...) at vn_open+0x3b
kern_openat(c4827c30,ff9c,804c5e8,0,602,...) at kern_openat+0x125
kern_open(c4827c30,804c5e8,0,601,21b6,...) at kern_open+0x35
open(c4827c30,fb817cf8,c0972725,c091f062,c47ea2a8,...) at open+0x30
syscall(fb817d38) at syscall+0x220
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (5, FreeBSD ELF32, open), eip = 0x2817bf33, esp = 0xbfbfec4c, ebp = 0xbfbfecb8 ---


And now the system is hanging again. While I can still ping and receive
dmesg updates (eg. USB ports appearing), I/O is frozen solid. This is
during portupgrade, when the configure script runs and usually takes 1-2
minutes to provoke.

This part looks suspicious to me:

db> show alllocks
Process 28014 (mkdir) thread 0xc691ac30 (100152)
exclusive lockmgr bufwait (bufwait) r = 0 (0xec2bdaf0) locked @ /usr/src/sys/ufs/ffs/ffs_softdep.c:10684
exclusive lockmgr ufs (ufs) r = 0 (0xc6bcd5a8) locked @ /usr/src/sys/kern/vfs_subr.c:2091
exclusive lockmgr bufwait (bufwait) r = 0 (0xec2983f4) locked @ /usr/src/sys/ufs/ffs/ffs_softdep.c:11363
exclusive lockmgr ufs (ufs) r = 0 (0xc6d976b8) locked @ /usr/src/sys/kern/vfs_lookup.c:502
Process 1990 (sshd) thread 0xc5462750 (100117)
exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xc546e08c) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 12 (intr) thread 0xc41f4750 (14)
exclusive sleep mutex ttymtx (ttymtx) r = 0 (0xc425ae04) locked @ /usr/src/sys/dev/dcons/dcons_os.c:232
db>


Along with show alllocks may you also get the following from DDB:
ps, show pcpu, alltrace, lockedvnods.


1. a kernel before SUJ went in is running fine with SU only
2. the following is on a recent -CURRENT that has SUJ, *but* i've
disabled it, so it is running with soft-updates only (I hope)

I ran a portupgrade and the first configure script triggered the I/O
hang

db> ps
  pid  ppid  pgrp   uid  state  wmesg       wchan       cmd
13467 13444 12937     0  R+                             mkdir
13444 13204 12937     0  S+     wait        0xc54352a8  sh
13204 13035 12937     0  S+     wait        0xc5436000  sh
13035 12937 12937     0  S+     wait        0xc4ffad48  sh
12937 12936 12937     0  Ss+    wait        0xc4ff9d48  make
12936  3722  3722     0  R+                             script
 3722  2021  3722     0  S+     (threaded)              ruby18
100132                   S      wait        0xc4ffa7f8  ruby18
 2404  2007  2404  1000  Ss+    ttyin       0xc4d74870  zsh
 2325  2015  2325  1000  R+                             top
 2021  2009  2021     0  S+     pause       0xc4ff9058  csh
 2015  2007  2015  1000  Ss+    pause       0xc4ffa058  zsh
 2009  2007  2009  1000  Ss+    pause       0xc4d4e850  zsh
 2007  2006  2007  1000  Rs                             screen
 2006  1991  2006  1000  R+                             screen
 2005  2001  2005     0  R+                             systat
 2001  1976  2001     0  S+     pause       0xc3d52058  csh
 2000     1  2000     0  Ss     select      0xc3d5b1a4  ssh-agent
 1991  1990  1991  1000  Ss+    pause       0xc3d52850  zsh
 1990  1986  1986

Re: LOR: ufs vs bufwait

2010-05-12 Thread Jeff Roberson


On Wed, 12 May 2010, Ulrich Spörlein wrote:


On Mon, 10.05.2010 at 22:53:32 +0200, Attilio Rao wrote:

2010/5/10 Peter Jeremy :

On 2010-May-08 12:20:05 +0200, Ulrich Spörlein wrote:

This LOR also is not yet listed on the LOR page, so I guess it's rather
new. I do use SUJ.

lock order reversal:
1st 0xc48388d8 ufs (ufs) @ /usr/src/sys/kern/vfs_lookup.c:502
2nd 0xec0fe304 bufwait (bufwait) @ /usr/src/sys/ufs/ffs/ffs_softdep.c:11363
3rd 0xc49e56b8 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2091


I'm seeing exactly the same LOR (and subsequent deadlock) on a recent
-current without SUJ.


I think this LOR was reported a long time ago.
The deadlock may be new and somehow related to the vm_page_lock work
(if not SUJ).


I was not able to reproduce this with a kernel prior to SUJ; a kernel
just after SUJ went in shows this "deadlock" or infinite loop ...

Now it might be that the SUJ kernel only increases the pressure so it
happens during a system's uptime. It does not seem directly related to
actually using SUJ on a volume, as I could reproduce it with SU only,
too.

I will try to get a hang not involving GELI and also re-do my tests when
the volumes have neither SUJ nor SU enabled, which led to 10-20s "hangs"
of the system IIRC. It seems SU/SUJ then only prolongs these hangs ad
infinitum.


I think Peter Holm also saw this once while we were testing SUJ and 
reproduced ~30 second hangs with stock sources.  At this point we need to 
brainstorm ideas for adding debugging instrumentation and come up with the 
quickest possible repro.


It would probably be good to add some KTR tracing and log that when it 
wedges.  The core I looked at was hung in bufwait.  Is there any cpu 
activity or io activity when things hang?  You'll probably have to keep
iostat/vmstat in memory to find out so they don't try to fault in pages 
once things are hung.


Thanks,
Jeff



I'll be back next week with new results here

Uli

SUJ Changes

2010-05-17 Thread Jeff Roberson

I fixed the sparse inode tunefs bug and changed the tunefs behavior based 
on discussions here on curr...@.  Hopefully this works for everyone.


I have one bad perf bug and one journal overflow bug left to resolve. 
Please keep the reports coming and thank you for your help.


Thanks,
Jeff

-- Forwarded message --
Date: Tue, 18 May 2010 01:45:28 + (UTC)
From: Jeff Roberson 
To: src-committ...@freebsd.org, svn-src-...@freebsd.org,
svn-src-h...@freebsd.org
Subject: svn commit: r208241 - head/sbin/tunefs

Author: jeff
Date: Tue May 18 01:45:28 2010
New Revision: 208241
URL: http://svn.freebsd.org/changeset/base/208241

Log:
   - Round up the journal size to the block size so we don't confuse fsck.

  Reported by:  Mikolaj Golub 

   - Only require 256k of blocks per-cg when trying to allocate contiguous
 journal blocks.  The storage may not actually be contiguous but is at
 least within one cg.
   - When disabling SUJ leave SU enabled and report this to the user.  It
 is expected that users will upgrade SU filesystems to SUJ and want
 a similar downgrade path.

Modified:
  head/sbin/tunefs/tunefs.c

Modified: head/sbin/tunefs/tunefs.c
==
--- head/sbin/tunefs/tunefs.c   Tue May 18 00:46:15 2010(r208240)
+++ head/sbin/tunefs/tunefs.c   Tue May 18 01:45:28 2010(r208241)
@@ -358,10 +358,12 @@ main(int argc, char *argv[])
warnx("%s remains unchanged as disabled", name);
} else {
journal_clear();
-   sblock.fs_flags &= ~(FS_DOSOFTDEP | FS_SUJ);
+   sblock.fs_flags &= ~FS_SUJ;
sblock.fs_sujfree = 0;
-   warnx("%s cleared, "
-   "remove .sujournal to reclaim space", name);
+   warnx("%s cleared but soft updates still set.",
+   name);
+
+   warnx("remove .sujournal to reclaim space");
}
}
}
@@ -546,7 +548,7 @@ journal_balloc(void)
 * Try to minimize fragmentation by requiring a minimum
 * number of blocks present.
 */
-   if (cgp->cg_cs.cs_nbfree > 128 * 1024 * 1024)
+   if (cgp->cg_cs.cs_nbfree > 256 * 1024)
break;
if (contig == 0 && cgp->cg_cs.cs_nbfree)
break;
@@ -906,6 +908,8 @@ journal_alloc(int64_t size)
if (size / sblock.fs_fsize > sblock.fs_fpg)
size = sblock.fs_fpg * sblock.fs_fsize;
size = MAX(SUJ_MIN, size);
+   /* fsck does not support fragments in journal files. */
+   size = roundup(size, sblock.fs_bsize);
}
resid = blocks = size / sblock.fs_bsize;
if (sblock.fs_cstotal.cs_nbfree < blocks) {
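
The rounding step added in the last hunk can be tried in isolation. This is a sketch, not the tunefs code: `ROUNDUP` has the same shape as the usual `roundup()` macro from `<sys/param.h>`, while `journal_size()`, `demo()`, and the sizes passed in are made up for illustration (the real code uses `SUJ_MIN` and `sblock.fs_bsize`).

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the classic roundup() macro; works for any positive y. */
#define	ROUNDUP(x, y)	((((x) + ((y) - 1)) / (y)) * (y))

/*
 * Hypothetical helper: enforce a minimum journal size, then round up to
 * a whole number of filesystem blocks so no fragment is left at the end.
 */
static int64_t
journal_size(int64_t requested, int64_t min_size, int64_t bsize)
{
	int64_t size;

	size = requested < min_size ? min_size : requested;
	return (ROUNDUP(size, bsize));
}

/* Illustrative values only: 16 KiB blocks, 64 KiB minimum. */
static void
demo(void)
{
	/* 100000 bytes is not block aligned; it becomes 7 blocks. */
	assert(journal_size(100000, 65536, 16384) == 114688);
	/* A request below the minimum is raised to the minimum. */
	assert(journal_size(1000, 65536, 16384) == 65536);
	/* An already aligned size is left unchanged. */
	assert(journal_size(131072, 65536, 16384) == 131072);
}
```

With the size always a multiple of the block size, the later `size / sblock.fs_bsize` division in the diff loses nothing to truncation.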

Re: ffs_copyonwrite panics

2010-05-19 Thread Jeff Roberson


On Tue, 18 May 2010, Roman Bogorodskiy wrote:


Hi,

I've been using -CURRENT last update in February for quite a long time
and few weeks ago decided to finally update it. The update was quite
unfortunate as system became very unstable: it just hangs few times a
day and panics sometimes.

Some things can be reproduced, some cannot. Reproducible ones:

1. background fsck always makes system hang
2. system crashes on operations with nullfs mounts (disabled that for
now)

The most annoying one is ffs_copyonwrite panic which I cannot reproduce.
The thing is that if I will run 'startx' on it with some X apps it will
panic just in few minutes. When I leave the box with nearly no stress
(just use it as internet gateway for my laptop) it behaves a little
better but will eventually crash in few hours anyway.


This may have been my fault.  Can you please update and let me know if it 
is resolved?  There was both a deadlock and a copyonwrite panic as a 
result of the softupdates journaling import.  I just fixed the deadlock 
today.


Thanks,
Jeff



The even more annoying thing is that I cannot save the dump,
because when the system boots and runs 'savecore' it leads to an
ffs_copyonwrite panic as well. The panic happens when about 90% complete
(as seen via ctrl-t).

Any ideas how to debug and get rid of this issue?

System arch is amd64. I don't know what other details could be useful.

Roman Bogorodskiy



Re: LOR: ufs vs bufwait

2010-05-21 Thread Jeff Roberson


On Fri, 21 May 2010, Erik Cederstrand wrote:



On 12/05/2010, at 22:44, Jeff Roberson wrote:


I think Peter Holm also saw this once while we were testing SUJ and reproduced 
~30 second hangs with stock sources.  At this point we need to brainstorm ideas 
for adding debugging instrumentation and come up with the quickest possible 
repro.


FWIW, I get this LOR on a ClangBSD virtual machine running the stress2 test 
suite.

I can reproduce the LOR reliably like this:

# cd stress2
# ./run.sh lockf.cfg
- press ctrl-C
- another LOR is triggered by the ctrl-C (a dirhash/bufwait LOR described in 
kern/137852)
# ./run.sh mkdir.cfg
- LOR is triggered immediately

Erik


The LOR is actually safe.  I need to bless the acquisition.  We have 
always acquired the buffers in this order.


The deadlocks people were seeing were actually livelocks due to 
softdepflush looping indefinitely.  I have committed a fix for that.


Thanks,
Jeff

Re: ffs_copyonwrite panics

2010-05-23 Thread Jeff Roberson


On Sun, 23 May 2010, Roman Bogorodskiy wrote:


 Jeff Roberson wrote:


On Tue, 18 May 2010, Roman Bogorodskiy wrote:


Hi,

I've been using -CURRENT last update in February for quite a long time
and few weeks ago decided to finally update it. The update was quite
unfortunate as system became very unstable: it just hangs few times a
day and panics sometimes.

Some things can be reproduced, some cannot. Reproducible ones:

1. background fsck always makes system hang
2. system crashes on operations with nullfs mounts (disabled that for
now)

The most annoying one is ffs_copyonwrite panic which I cannot reproduce.
The thing is that if I will run 'startx' on it with some X apps it will
panic just in few minutes. When I leave the box with nearly no stress
(just use it as internet gateway for my laptop) it behaves a little
better but will eventually crash in few hours anyway.


This may have been my fault.  Can you please update and let me know if it
is resolved?  There was both a deadlock and a copyonwrite panic as a
result of the softupdates journaling import.  I just fixed the deadlock
today.


Tried today's -CURRENT and unfortunately the behaviour is still same.


Can you give me a full stack trace?  Do you have coredumps enabled?  I 
would like to have you look at a few things in a core or send it to me 
with your kernel.


Thanks,
Jeff



Roman Bogorodskiy



Re: Panic on current when enabling SUJ

2010-06-03 Thread Jeff Roberson


On Thu, 3 Jun 2010, John Doe wrote:


Boot into single user-mode

# tunefs -j enable /
# tunefs -j enable /usr
# tunefs -j enable /tmp
# tunefs -j enable /var
# reboot

The machine then panics.

Looks like the machine is trying to write to a read-only filesystem.


Can you please give me information on the panic?  What was the state of 
the filesystems upon reboot?  Does dumpfs show suj enabled?


Thanks,
Jeff






Re: crash: bwrite: need chained iodone

2003-03-11 Thread Jeff Roberson

I've trimmed to the relevant part of the stack.

On Tue, 11 Mar 2003, Thomas Quinot wrote:

> #11 0xc0232072 in bwrite (bp=0xce5313e0) at /usr/src/sys/kern/vfs_bio.c:795
> #12 0xc0232a7c in bawrite (bp=0x0) at /usr/src/sys/kern/vfs_bio.c:1138
> #13 0xc023a02b in cluster_wbuild (vp=0xc4a21124, size=16384, start_lbn=44,
> len=3) at /usr/src/sys/kern/vfs_cluster.c:996
> #14 0xc02396ff in cluster_write (bp=0xce6bd4e8, filesize=753664, seqcount=18)
> at /usr/src/sys/kern/vfs_cluster.c:596
> #15 0xc02e3fec in ffs_write (ap=0xe5db4be0)
> at /usr/src/sys/ufs/ffs/ffs_vnops.c:728
> #16 0xc024e1b2 in vn_write (fp=0xc456921c, uio=0xe5db4c7c,
> ---Type <return> to continue, or q <return> to quit---
> active_cred=0xc48e5780, flags=0, td=0xc46e2000) at vnode_if.h:417
> #17 0xc0214008 in dofilewrite (td=0xc46e2000, fp=0xc456921c, fd=0,
> buf=0x8e1e400, nbyte=0, offset=0, flags=0) at file.h:239
> #18 0xc0213e49 in write (td=0xc46e2000, uap=0xe5db4d10)
> at /usr/src/sys/kern/sys_generic.c:329
> #19 0xc033a68e in syscall (frame=
>   {tf_fs = 47, tf_es = 47, tf_ds = 134742063, tf_edi = 677204256, tf_esi = 0, 
> tf_ebp = -1077939928, tf_isp = -438612620, tf_ebx = 677216484, tf_edx = 20, tf_ecx = 
> 0, tf_eax = 4, tf_trapno = 0, tf_err = 2, tf_eip = 677548851, tf_cs = 31, tf_eflags 
> = 518, tf_esp = -1077939988, tf_ss = 47})
> at /usr/src/sys/i386/i386/trap.c:1030
> #20 0xc032a89d in Xint0x80_syscall () at {standard input}:138
> ---Can't read userspace from dump, or kernel process---

>
> #11 0xc0232072 in bwrite (bp=0xce5313e0) at /usr/src/sys/kern/vfs_bio.c:795
> 795   panic("bwrite: need chained iodone");

> (kgdb) list
> 790   (bp->b_flags & B_ASYNC) &&
> 791   !vm_page_count_severe() &&
> 792   !buf_dirty_count_severe()) {
> 793   if (bp->b_iodone != NULL) {
> 794   printf("bp->b_iodone = %p\n", bp->b_iodone);
> 795   panic("bwrite: need chained iodone");
> 796   }
> 797
> 798   /* get a new block */
> 799   newbp = geteblk(bp->b_bufsize);
> (kgdb) print bp->b_iodone
> $1 = (void (*)(struct buf *)) 0xc0239320 
> (kgdb) quit

Can you please print bp?  I'd like to know what all of the members are.  A
cluster buf should NEVER have BX_BKGRDWRITE set.  This is totally bogus.

> I still have the crash dump at hand, if further forensics is necessary.
>

Thanks!


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message

Re: crash: bwrite: need chained iodone

2003-03-12 Thread Jeff Roberson

On Wed, 12 Mar 2003, Thomas Quinot wrote:

> On 2003-03-12, Jeff Roberson wrote:
>
> > Can you please print bp?  I'd like to know what all of the members are.  A
> > cluster buf should NEVER have BX_BKGRDWRITE set.  This is totally bogus.
>
> (kgdb) fr
> #11 0xc0232072 in bwrite (bp=0xce5313e0) at
> /usr/src/sys/kern/vfs_bio.c:795
> 795 panic("bwrite: need chained iodone");
> (kgdb) print *bp
> $3 = {b_io = {bio_cmd = 2, bio_dev = 0x, bio_disk = 0x0,
> bio_blkno = 18540672, bio_offset = 9492758528, bio_bcount = 32768,
> bio_data = 0xd42da000 "", bio_flags = 0, bio_error = 0, bio_resid = 0,
> bio_done = 0xc0235db0 , bio_driver1 = 0x0, bio_driver2 = 0x0,
> bio_caller1 = 0x0, bio_caller2 = 0xce5313e0, bio_queue = {tqe_next = 0x0,
>   tqe_prev = 0xc408200c}, bio_attribute = 0x0, bio_from = 0x0,
> bio_to = 0x0, bio_length = 0, bio_completed = 0, bio_children = 91,
> bio_inbed = 0, bio_parent = 0x0, bio_t0 = {sec = 0, frac = 0},
> bio_task = 0, bio_task_arg = 0x0, bio_pblkno = 64}, b_op = 0xc03a89f8,
>   b_magic = 280038160, b_iodone = 0xc0239320 ,
>   b_offset = 688128, b_vnbufs = {tqe_next = 0x0, tqe_prev = 0x0},
>   b_left = 0x0, b_right = 0x0, b_vflags = 0, b_freelist = {
> tqe_next = 0xce531228, tqe_prev = 0xc03dcb3c}, b_qindex = 0,
>   b_flags = 1677721604, b_xflags = 0 '\0', b_lock = {
> lk_interlock = 0xc03d750c, lk_flags = 0, lk_sharecount = 0,
> lk_waitcount = 0, lk_exclusivecount = 0, lk_prio = 80,
> lk_wmesg = 0xc0379b53 "bufwait", lk_timo = 0, lk_lockholder = 0x,
> lk_newlock = 0x0}, b_bufsize = 32768, b_runningbufspace = 0,
>   b_kvabase = 0xd42da000 "", b_kvasize = 32768, b_lblkno = 42,
>   b_vp = 0xc4a21124, b_object = 0x0, b_dirtyoff = 0, b_dirtyend = 32768,
>
>   b_rcred = 0x0, b_wcred = 0x0, b_saveaddr = 0xbfbfea40, b_pager = {
> pg_spc = 0x0, pg_reqpage = 0}, b_cluster = {cluster_head = {
>   tqh_first = 0xce67bfe8, tqh_last = 0xce6b6b80}, cluster_entry = {
>   tqe_next = 0xce67bfe8, tqe_prev = 0xce6b6b80}}, b_pages = {0xc0d14748,
> 0xc0acff90, 0xc0a7cbd8, 0xc0bffc20, 0xc1074868, 0xc10106b0, 0xc10700f8,
> 0xc0f9c040, 0xc0af5808, 0xc0c56c50, 0xc0b47198, 0xc0bdb9e0, 0xc10e7b28,
> 0xc0abba70, 0xc09888b8, 0xc09d3600, 0xc0d14748, 0xc0acff90, 0xc0a7cbd8,
> 0xc0bffc20, 0xc1074868, 0xc10106b0, 0xc10700f8, 0xc0f9c040, 0xc0d21888,
> 0xc105cfd0, 0xc1057f18, 0xc109ff60, 0xc0a18948, 0xc0ab3d90, 0xc0a36fd8,
> 0xc0b91820}, b_npages = 8, b_dep = {lh_first = 0x0}}
> (kgdb)
>
> Hum. Now this is *most* peculiar. bp->b_xflags is 0, so we should never
> have entered that 'if', unless there is a race condition somewhere such
> that we test b_xflags on a buffer and carry on processing on another...
>

Can you disable sync on panic to make sure that something has not come
along and cleaned this buffer?  I suspect that it has been modified after
the first panic.  Do you know when this first started to happen?  Do you
have any more clues into what triggered it?

Cheers,
Jeff



Re: big file became broken on 2003-03-11(cvsuped)

2003-03-12 Thread Jeff Roberson

On Thu, 13 Mar 2003, Norikatsu Shigemura wrote:

>   Big file like OOo_1.0.2_source.tar.bz2 became broken with making
>   openoffice in my environment.

How much memory is in your machine?  Can you go back to an earlier date
and see if this is still a problem? Are you doing anything else with the
machine while this is going on?

Thanks,
Jeff

> # uname -a
> FreeBSD ***.*-.*** 5.0-CURRENT FreeBSD 5.0-CURRENT #2: Wed Mar 12 
> 18:39:05 JST 2003 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/MELFINA  i386
>
> # mount
> /dev/ad0s1a on / (ufs, local)
> devfs on /dev (devfs, local)
> /dev/md0 on /tmp (ufs, local, soft-updates)
> /dev/ad0s1f on /export (ufs, local, soft-updates)
> /dev/ad0s1d on /usr (ufs, local, soft-updates)
> /dev/ad0s1e on /var (ufs, local, soft-updates)
> /usr/compat on /compat (nullfs, local)
> linprocfs on /compat/linux/proc (linprocfs, local)
> /export/home on /home (nullfs, local)
>
>   I tested on /, /var, /usr, /home, /tmp. There is a same problem.
>
> # cd /any/partition; scp -p 
> mystable.machine:/usr/ports/distfiles/openoffice/OOo_1.0.2_source.tar.bz2 .; chflags 
> schg OOo_1.0.2_source.tar.bz2; while true; do md5 OOo_1.0.2_source.tar.bz2; sleep 1; 
> done
> OOo_1.0.2_source.tar 100% |*|   154 MB00:41
> MD5 (OOo_1.0.2_source.tar.bz2) = 8a82b4dbdd4e305b6f6db70ea65dce8c
> MD5 (OOo_1.0.2_source.tar.bz2) = 8a82b4dbdd4e305b6f6db70ea65dce8c
>   (snip)
> MD5 (OOo_1.0.2_source.tar.bz2) = 83d7c6e49bb4586ba9b8478798952c29
> MD5 (OOo_1.0.2_source.tar.bz2) = 83d7c6e49bb4586ba9b8478798952c29
>   (snip)
> MD5 (OOo_1.0.2_source.tar.bz2) = 142ee73901a58445ebd4cccb3d0af223
> MD5 (OOo_1.0.2_source.tar.bz2) = 2e9fa2b1b924595eb11760cd728ade95
> MD5 (OOo_1.0.2_source.tar.bz2) = c3eee272f6f9b4c90f10c8ca0a2eb537
>   :
>



Re: big file became broken on 2003-03-11(cvsuped)

2003-03-12 Thread Jeff Roberson

On Thu, 13 Mar 2003, Jeff Roberson wrote:

> On Thu, 13 Mar 2003, Norikatsu Shigemura wrote:
>
> > Big file like OOo_1.0.2_source.tar.bz2 became broken with making
> > openoffice in my environment.
>
> How much memory is in your machine?  Can you go back to an earlier date
> and see if this is still a problem? Are you doing anything else with the
> machine while this is going on?

Also, can you do 'sysctl vfs.read_max=0' and retest?

>
> > # uname -a
> > FreeBSD ***.*-.*** 5.0-CURRENT FreeBSD 5.0-CURRENT #2: Wed Mar 12 
> > 18:39:05 JST 2003 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/MELFINA  i386
> >
> > # mount
> > /dev/ad0s1a on / (ufs, local)
> > devfs on /dev (devfs, local)
> > /dev/md0 on /tmp (ufs, local, soft-updates)
> > /dev/ad0s1f on /export (ufs, local, soft-updates)
> > /dev/ad0s1d on /usr (ufs, local, soft-updates)
> > /dev/ad0s1e on /var (ufs, local, soft-updates)
> > /usr/compat on /compat (nullfs, local)
> > linprocfs on /compat/linux/proc (linprocfs, local)
> > /export/home on /home (nullfs, local)
> >
> > I tested on /, /var, /usr, /home, /tmp. There is a same problem.
> >
> > # cd /any/partition; scp -p 
> > mystable.machine:/usr/ports/distfiles/openoffice/OOo_1.0.2_source.tar.bz2 .; 
> > chflags schg OOo_1.0.2_source.tar.bz2; while true; do md5 
> > OOo_1.0.2_source.tar.bz2; sleep 1; done
> > OOo_1.0.2_source.tar 100% |*|   154 MB00:41
> > MD5 (OOo_1.0.2_source.tar.bz2) = 8a82b4dbdd4e305b6f6db70ea65dce8c
> > MD5 (OOo_1.0.2_source.tar.bz2) = 8a82b4dbdd4e305b6f6db70ea65dce8c
> > (snip)
> > MD5 (OOo_1.0.2_source.tar.bz2) = 83d7c6e49bb4586ba9b8478798952c29
> > MD5 (OOo_1.0.2_source.tar.bz2) = 83d7c6e49bb4586ba9b8478798952c29
> > (snip)
> > MD5 (OOo_1.0.2_source.tar.bz2) = 142ee73901a58445ebd4cccb3d0af223
> > MD5 (OOo_1.0.2_source.tar.bz2) = 2e9fa2b1b924595eb11760cd728ade95
> > MD5 (OOo_1.0.2_source.tar.bz2) = c3eee272f6f9b4c90f10c8ca0a2eb537
> > :
> >


