Re: Curious failure of ZFS snapshots

2008-12-01 Thread Gerrit Kühn
On Sat, 29 Nov 2008 11:46:40 +0100 Pawel Jakub Dawidek <[EMAIL PROTECTED]>
wrote about Re: Curious failure of ZFS snapshots:

PJD> > > GK> mclane# ll /tank/home/pt/.zfs/
PJD> > > GK> ls: snapshot: Bad file descriptor
PJD> > > GK> total 0

PJD> Is there a way for me to reproduce that?

None that I could tell you right now.
This was on a machine which uses zfs send/receive to back up its zfs
filesystem to a backup server. Only one out of 6 or 7 zfs filesystems
showed this problem. After rebooting it went away and has not appeared
since.

cu
  Gerrit
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Curious failure of ZFS snapshots

2008-12-01 Thread Gerrit Kühn
On Sun, 30 Nov 2008 01:05:48 + Pete French
<[EMAIL PROTECTED]> wrote about Re: Curious failure of ZFS
snapshots:

PF> Here is what I am doing - this script is run with an argument '7am' or
PF> '7pm' once per day. The mysql database is a slave replication from a
PF> master, so there is a continuous trickle of data into it. The symbolic
PF> links are there so you can connect to the mysql server and access
PF> 'xxx-7am' or 'xxx-7pm' to get a previous version of database 'xxx'.
PF> In case it's not obvious, the filesystem 'tank/zfs' is mounted on the
PF> directory '/var/db/mysql'. If you run this for a few cycles it should
PF> presumably break for you too.

If you think it will be useful I can also post my scripts. However, as I
have not seen the problem again so far, it may be that I messed
something up manually while developing the scripts one or two weeks ago.
As mentioned, even the inaccessible zfs snapshots did send/receive fine,
so internally zfs seems to be happy (only unmounting them was a bad
idea :-).


cu
  Gerrit


dhclient doing DISCOVER with bad IP checksum - bge

2008-12-01 Thread Jonathan Feally

Sorry for the cross-post, but this could be either list's problem.

I have 2 boxes running 7-STABLE as of 20081130, both i386 SMP. One is 
running ISC DHCPD 3.0.x from recent ports, and the other dhclient from 
make world.


The server is refusing to answer the DISCOVER request, as it thinks the 
IP checksum is wrong, which tcpdump also confirms. Other DHCP clients 
are working fine on this network, so I do not believe it to be the 
network, server or dhcpd.
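
For readers who want to check a captured header by hand, here is a minimal Python sketch of the standard IPv4 header checksum (the one's-complement sum described in RFC 1071). The sample header bytes are a generic textbook example, not taken from the attached capture, and the function names are illustrative only.

```python
import struct

def ones_complement_sum(data: bytes) -> int:
    """Sum 16-bit big-endian words, folding carries back in (end-around carry)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def ipv4_checksum(header: bytes) -> int:
    """Value that belongs in header bytes 10-11, computed with that field zeroed."""
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    return ~ones_complement_sum(zeroed) & 0xFFFF

def header_is_valid(header: bytes) -> bool:
    """A header with a correct checksum in place sums to 0xFFFF."""
    return ones_complement_sum(header) == 0xFFFF

# Hypothetical 20-byte header (UDP, 192.168.0.1 -> 192.168.0.199).
sample = bytes.fromhex("4500007300004000" "4011b861c0a80001" "c0a800c7")

print(hex(ipv4_checksum(sample)))   # 0xb861
print(header_is_valid(sample))      # True
```

Running the same check over the IP header bytes of a DISCOVER packet from the capture would show whether the stored checksum or the header contents are off, which is the same test tcpdump and dhcpd apply.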


Server is running a 2 Port Intel card - em driver.

Client is a Dell PE1750 with 2 onboard NICs - bge driver.

I have tried turning off both RXCSUM and TXCSUM on both the client and 
server machines with no luck. I also tried the second NIC on the server 
with the same result.


This setup was working just a couple of weeks ago, and the only thing 
that has changed is updating the src for a make world. PXE booting this 
server does result in an IP being issued, so it is pointing towards 
something new/changed in 7-STABLE.


I have attached a 3 packet dump of the DISCOVER requests.

Can anybody shed some light on this for me?

Thanks, -Jon




dhclient_badcsum.cap
Description: Binary data

make distribution halts during install (7.1Prerelease today)

2008-12-01 Thread Dewayne Geraghty
The "make distribution" phase of a full build of 7.1 Prerelease, sourced today 
(Mon Dec  1 10:34:45 UTC 2008), unfortunately failed.

make distribution DESTDIR=/differentplace 
failed (see below), however the following worked ok: buildkernel, 
installkernel, buildworld, installworld. Two systems (a Uni and Dual 
processor) were used and both failed at the same point.  Both building systems 
were successful in building/installing kernel and world for themselves and for 
a different DESTDIR (per the Handbook).  Only the make distribution failed (a 
clue?)

The repeated attempts to make distribution DESTDIR=X failed at the same 
location (see below). The error message suggests incorrect parameters to 
"install".  Advice/guidance welcome.

#cd /usr/src && make DESTDIR=/usr/k_brfw-d distribution
cd /usr/src/etc; MAKEOBJDIRPREFIX=/usr/obj  MACHINE_ARCH=i386  MACHINE=i386  
CPUTYPE=  GROFF_BIN_PATH=/usr/obj/usr/src/tmp/legacy/usr/bin  
GROFF_FONT_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/groff_font  
GROFF_TMAC_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/tmac 
PATH=/usr/obj/usr/src/tmp/legacy/usr/sbin:/usr/obj/usr/src/tmp/legacy/usr/bin:/usr/obj/usr/src/tmp/legacy/usr/games:/usr/obj/usr/src/tmp/usr/sbin:/usr/obj/usr/src/tmp/usr/bin:/usr/obj/usr/src/tmp/usr/games:/sbin:/bin:/usr/sbin:/usr/bin
 make distribution
cd /usr/src/etc;  install -o root -g wheel -m 644  amd.map apmd.conf auth.conf  
crontab csh.cshrc csh.login csh.logout devd.conf devfs.conf  ddb.conf 
dhclient.conf disktab fbtab freebsd-update.conf  ftpusers gettytab group  hosts 
hosts.allow hosts.equiv hosts.lpd  inetd.conf libalias.conf login.access 
login.conf mac.conf motd  netconfig network.subr networks newsyslog.conf 
nsswitch.conf  portsnap.conf pf.os phones profile protocols  rc rc.bsdextended 
rc.firewall rc.firewall6 rc.initdiskless  rc.sendmail rc.shutdown  rc.subr 
remote rpc services shells  snmpd.config sysctl.conf syslog.conf  etc.i386/ttys 
 /usr/src/etc/../gnu/usr.bin/man/manpath/manpath.config  
/usr/src/etc/../usr.bin/mail/misc/mail.rc  
/usr/src/etc/../usr.bin/locate/locate/locate.rc nscd.conf /usr/k_brfw-d/etc;  
cap_mkdb -l /usr/k_brfw-d/etc/login.conf;  install -o root -g wheel -m 755  
netstart pccard_ether rc.suspend rc.resume /usr/k_brfw-d/etc;  install -o root 
-g wheel -m 600 
 master.passwd nsmb.conf opieaccess /usr/k_brfw-d/etc;  pwd_mkdb -L -i -p -d 
/usr/k_brfw-d/etc  /usr/k_brfw-d/etc/master.passwd
install: wrong number or types of arguments
usage: install [-bCcpSsv] [-B suffix] [-f flags] [-g group] [-m mode]
   [-o owner] file1 file2
   install [-bCcpSsv] [-B suffix] [-f flags] [-g group] [-m mode]
   [-o owner] file1 ... fileN directory
   install -d [-v] [-g group] [-m mode] [-o owner] directory ...
*** Error code 64

Stop in /usr/src/etc.
*** Error code 1

Stop in /usr/src.
*** Error code 1

Stop in /usr/src.
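
One possible reading of the usage error, offered as an assumption rather than a diagnosis: going by the usage text above, install(1) accepts more than two operands only when the last one is an existing directory, so this message would be expected if /usr/k_brfw-d/etc did not exist when the command ran. The Python sketch below only mimics that rule as implied by the usage text; it is not the real install source, and the function name is made up for illustration.

```python
import os
import tempfile

def install_operands_ok(operands):
    """Approximate the operand rule implied by install(1)'s usage message."""
    if len(operands) < 2:
        return False
    target = operands[-1]
    if os.path.isdir(target):
        return True            # form: file1 ... fileN directory
    return len(operands) == 2  # form: file1 file2

files = ["master.passwd", "nsmb.conf", "opieaccess"]

with tempfile.TemporaryDirectory() as destdir:
    etc = os.path.join(destdir, "etc")
    missing = install_operands_ok(files + [etc])   # etc/ not created yet
    os.mkdir(etc)
    present = install_operands_ok(files + [etc])   # etc/ now exists

print(missing, present)   # False True
```

If this reading applies, checking whether the etc directory exists under DESTDIR before running "make distribution" would confirm or rule it out.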

I also tried various CPUTYPES to ensure that all parameters to make, were 
populated.  Having spent most of the day building kernels/worlds and 
gstripping, gjournalling and building a lot of ports, the package is looking 
pretty good.  I hope that my explanation is concise; it's been a long day and 
I'm stuck.
Regards, Dewayne.




7.1-PRERELEASE: arcmsr write performance problem

2008-12-01 Thread Jan Mikkelsen

Hi,

I am seeing extremely poor performance (~100kB/s) when untarring large tar 
files into fresh ufs filesystems.  I see the problem with softupdates and 
without softupdates but with an async mount.  This is a Supermicro X7DB8 
board, 4GB, 2 x Xeon 5140.


Sample gstat output:

dT: 1.033s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
  585     61      0      0    0.0     61    170 13812.0  100.1| da2

I see ms/w start at about 200ms with a ~3MB/s throughput, and then I see 
ms/w rise and kBps drop.  ms/w goes as high as 16-20s, and then suddenly 
drops back down to about 200ms.   Using iostat, while the performance is 
high(er), kb/t is 64kB, as the problem starts it drops towards 2kB.
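
As a rough cross-check of the sample above, the following Python sketch works through the arithmetic. It assumes the gstat line reads as queue length 585, 61 writes/s and 170 kB/s written (the columns in the archived copy are run together, so this reading is an assumption), and it uses Little's law as an approximation only.

```python
# Assumed reading of the gstat sample for da2.
queue_depth = 585       # L(q): requests outstanding
writes_per_sec = 61     # w/s
write_kbps = 170        # kBps written

# Average size of each write request.
avg_write_kb = write_kbps / writes_per_sec

# Little's law: time in queue ~= queue length / completion rate.
est_wait_s = queue_depth / writes_per_sec

print(f"{avg_write_kb:.1f} kB per write, ~{est_wait_s:.1f} s estimated queue wait")
# prints: 2.8 kB per write, ~9.6 s estimated queue wait
```

Under those assumptions the request size is close to the ~2kB that iostat reports when the problem starts, and the estimated wait is of the same order as the multi-second ms/w values, so the slowdown looks like a deep queue of very small writes rather than a drop in raw bandwidth.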


Copying a single large file doesn't exhibit this problem, although 
throughput isn't great (~3-5MB/s).  However, that's better than 100kB/s.


arcmsr0: mem 0xd890-0xd8900fff,0xd800-0xd83f irq 16 at device 14.0 on pci10

ARECA RAID ADAPTER0: Driver Version 1.20.00.15 2007-10-07
ARECA RAID ADAPTER0: FIRMWARE VERSION V1.46 2008-08-06
arcmsr0: [ITHREAD]

There are eight disks connected in a RAID-6 configuration.  The 
controller's cache is write-through and the disks' write caches are 
disabled.  NCQ is enabled on the drives.


The same hardware when it ran 6.3-p1 didn't have this problem.  However, 
the system BIOS was updated at the same time as the operating system (in 
an attempt to solve a recent em problem), so it is possible that it is a 
BIOS related problem.  The same build on an entirely different machine 
with an aac controller and SAS disks also doesn't show this problem.


Running 'devinfo -r' doesn't list arcmsr as having an interrupt at all. 
(see below).  That strikes me as odd; checking another machine that is 
still running 6.2 with an arcmsr controller, I can see the interrupt just 
fine.


So:

- Does anyone have any suggestions?

- Is it normal for arcmsr to not show an interrupt in the output from 
devinfo in 7.1?


Full dmesg, devinfo below.

Thanks,

Jan Mikkelsen


Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.1-PRERELEASE #0: Mon Dec  1 14:53:12 EST 2008
   [EMAIL PROTECTED]:/home/janm/p4/freebsd-image-std-2008.2/work/base-freebsd/home/janm/p4/freebsd-image-std-2008.2/FreeBSD/src/sys/TW-SMP
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU 5140  @ 2.33GHz (2333.35-MHz K8-class CPU)

 Origin = "GenuineIntel"  Id = 0x6f6  Stepping = 6
 
Features=0xbfebfbff
 Features2=0x4e3bd
 AMD Features=0x20100800
 AMD Features2=0x1
 Cores per package: 2
usable memory = 4280651776 (4082 MB)
avail memory  = 4117843968 (3927 MB)
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
cpu0 (BSP): APIC ID:  0
cpu1 (AP): APIC ID:  1
cpu2 (AP): APIC ID:  6
cpu3 (AP): APIC ID:  7
ioapic0  irqs 0-23 on motherboard
ioapic1  irqs 24-47 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)

acpi0:  on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
pcib1:  at device 2.0 on pci0
pci1:  on pcib1
pcib2:  irq 16 at device 0.0 on pci1
pci2:  on pcib2
pcib3:  irq 16 at device 0.0 on pci2
pci3:  on pcib3
pcib4:  at device 0.0 on pci3
pci4:  on pcib4
ahd0:  port 0x2400-0x24ff,0x2000-0x20ff mem 0xd850-0xd8501fff irq 16 at device 2.0 on pci4

ahd0: [ITHREAD]
aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
ahd1:  port 0x2c00-0x2cff,0x2800-0x28ff mem 0xd8502000-0xd8503fff irq 17 at device 2.1 on pci4

ahd1: [ITHREAD]
aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
pcib5:  at device 0.2 on pci3
pci5:  on pcib5
bge0:  mem 0xd860-0xd860 irq 16 at device 1.0 on pci5

miibus0:  on bge0
brgphy0:  PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto

bge0: Ethernet address: 00:40:f4:66:b1:56
bge0: [ITHREAD]
pcib6:  irq 18 at device 2.0 on pci2
pci6:  on pcib6
em0:  port 0x3000-0x301f mem 0xd840-0xd841 irq 18 at device 0.0 on pci6

em0: Using MSI interrupt
em0: [FILTER]
em0: Ethernet address: 00:30:48:31:67:86
em1:  port 0x3020-0x303f mem 0xd842-0xd843 irq 19 at device 0.1 on pci6

em1: Using MSI interrupt
em1: [FILTER]
em1: Ethernet address: 00:30:48:31:67:87
pcib7:  at device 0.3 on pci1
pci7:  on pcib7
pcib8:  at device 4.0 on pci0
pci8:  on pcib8
pcib9:  at device 6.0 on pci0
pci9:  on pcib9
pcib10:  at device 0.0 on pci9
pci10:  on pcib10
arcmsr0: mem 0xd890-0xd8900fff,0xd800-0xd83f irq 16 at device 14.0 on pci10

ARECA RAID ADAPTER0: Driver Version 1.20.00.1

Re: Can I get a committer to mark this bug as blocking 6.4-RELEASE ?

2008-12-01 Thread Jo Rhett

On Nov 26, 2008, at 1:12 PM, Ken Smith wrote:

> Unfortunately no.  As John indicated in the earlier thread BIOS
> issues tend to be extremely hard to diagnose and so far it seems
> like it's specific to this one motherboard.
>
> Given this problem does cause issues with installs I'd be willing
> to provide ISOs built at the point we've done the Errata Notice that
> fixes the problem.  But it's too nebulous an issue to hold up the
> release itself for.


It does *not* cause an issue with installs.  Installs work fine.  It  
prevents booting an installed operating system.  This appears to  
affect *ALL* of the Intel multi-cpu motherboards, including 3  
generations of Rackable systems.


The only reason it is nebulous is because absolutely nobody bothered  
to investigate the issue.  I've been asking for what information would  
help.  I've offered to set up serial consoles, or even ship systems, to  
anyone who would work on this problem.


This is a very big problem that will affect thousands of freebsd servers.

Ken, the complete lack of action taken by FreeBSD to even CONSIDER  
investigating a significant bug reported during the testing process is  
shocking.  And it truly puts a lie to those who continue to claim that  
we should be more active in the testing process.  Every time I have  
done this, I've found significant issues that affect a significant  
portion of the user base and COMPLETELY prevent deployment of a given  
release, and absolutely nothing has been done to even investigate the  
reports, never mind address them.


Congratulations.  Good Job.  If you aren't going to accept bug  
reports, why exactly do you release testing candidates at all?


--
Jo Rhett
Net Consonance : consonant endings by net philanthropy, open source  
and other randomness





Re: usb keyboard dying at loader prompt

2008-12-01 Thread Jo Rhett
Just FYI we are seeing the exact same problem with PS/2 keyboards and  
the 6.4 loader, so this may not be a USB-only issue.


The complete lack of response to serious bug reports about 6.4-REL is  
fairly shocking.


On Nov 28, 2008, at 5:24 AM, Andriy Gapon wrote:

> I did more testing and it seems that our loader does have something to
> do with the problem.
>
> If I boot to memtest86 the keyboard keeps working.
> If I pause boot menu, wait for many minutes, the keyboard still works.
> If I escape to loader prompt, this is when the keyboard stops working
> after a few seconds.
>
> Not sure how to explain this.
> I think I've seen some changes to reduce memory usage of loader, I will
> try them to see if that would make any difference for my situation.
>
> --
> Andriy Gapon


--
Jo Rhett
Net Consonance : consonant endings by net philanthropy, open source  
and other randomness





Re: Can I get a committer to mark this bug as blocking 6.4-RELEASE ?

2008-12-01 Thread Ken Smith
On Mon, 2008-12-01 at 10:20 -0800, Jo Rhett wrote:
> On Nov 26, 2008, at 1:12 PM, Ken Smith wrote:
> > Unfortunately no.  As John indicated in the earlier thread BIOS
> > issues tend to be extremely hard to diagnose and so far it seems
> > like it's specific to this one motherboard.
> >
> > Given this problem does cause issues with installs I'd be willing
> > to provide ISOs built at the point we've done the Errata Notice that
> > fixes the problem.  But it's too nebulous an issue to hold up the
> > release itself for.
> 
> It does *not* cause an issue with installs.  Installs work fine.  It  
> prevents booting an installed operating system.  This appears to  
> affect *ALL* of the Intel multi-cpu motherboards, including 3  
> generations of Rackable systems.

Understood, I guess I wasn't quite specific enough.  The machine not
being able to boot what got installed on its disk I consider an install
problem.

To date this is the first mention I've seen of it affecting more than
one specific machine type.  I might have missed it but I can't recall
you mentioning this affected more than one particular machine.  And it
does not seem to affect *ALL* of the Intel multi-cpu motherboards.

> The only reason it is nebulous is because absolutely nobody bothered  
> to investigate the issue.  I've been asking for what information would  
> help.  I've offered to set up serial consoles, or even ship systems, to  
> anyone who would work on this problem.

Both John and Xin Li have chimed in on the two threads I've seen that
are related to this specific topic.  John diagnosed it as an issue with
the BIOS.  That's what makes it a nebulous problem.  When working on
those sorts of things most people liken it to "Whack-a-mole".

> This is a very big problem that will affect thousands of freebsd servers.

It's still not clear it will affect thousands of servers.  The same set
of changes got made to stable/7 as were done to stable/6, and the test
builds for the 7.1 release have been seeing much more testing than the
test builds for the 6.4 release.  If the problem was as wide-spread as
you're suggesting we'd likely have seen a lot more reports and that
factored into the decision about whether to go ahead or not.

This all left me with a decision.  My choices were to back out the BTX
changes that were known to fix boot issues with certain motherboards and
enabled booting from USB devices or leave things as they are.  The
motherboards that didn't boot with the older code had no work-around.
The motherboards that did boot with the older code but not the newer
code do have a work-around (use the old loader).  Decisions like that
suck, no matter which choice I make it's wrong.  Holding the release
until all bios issues get resolved isn't a viable option because of the
"Whack-a-mole" thing mentioned above.  Fix it for one and two break.  It
takes a lot of time/work to settle into what seems to work for the
widest set of machines.

> Ken, the complete lack of action taken by FreeBSD to even CONSIDER  
> investigating a significant bug reported during the testing process is  
> shocking.  And it truly puts a lie to those who continue to claim that  
> we should be more active in the testing process.  Every time I have  
> done this, I've found significant issues that affect a significant  
> portion of the user base and COMPLETELY prevent deployment of a given  
> release, and absolutely nothing has been done to even investigate the  
> reports, never mind address them.
> 
> Congratulations.  Good Job.  If you aren't going to accept bug  
> reports, why exactly do you release testing candidates at all?

So you're saying John and Xin Li's responses (Xin Li's questions still
un-answered) to you show a complete lack to even consider investigating
it?  I know from past email threads your preference is for 6.X right now
but as a test point if you aren't totally fried over this whole thing it
would still be useful to know for sure if the issue exists with 7.1 test
builds.  If yes it eliminates a variety of possibilities and helps focus
on the exact problem.

-- 
Ken Smith
- From there to here, from here to  |   [EMAIL PROTECTED]
  there, funny things are everywhere.   |
  - Theodore Geisel |





Re: Can I get a committer to mark this bug as blocking 6.4-RELEASE ?

2008-12-01 Thread Jo Rhett

On Dec 1, 2008, at 11:30 AM, Ken Smith wrote:

> Both John and Xin Li have chimed in on the two threads I've seen that
> are related to this specific topic.  John diagnosed it as an issue with
> the BIOS.  That's what makes it a nebulous problem.  When working on
> those sorts of things most people liken it to "Whack-a-mole".


Diagnosed without testing.  John never asked for any more information  
than the page fault description from me.  When I asked what else to  
test and offered to supply systems for testing he stopped responding.   
Xin Li proposed a work-around that would have castrated the systems.   
It might work, but it wasn't a useful workaround so I deferred testing  
and focused on trying to get someone to address the real problem.


>> This is a very big problem that will affect thousands of freebsd
>> servers.
>
> It's still not clear it will affect thousands of servers.


Um... Rackable.   Rackable ships cabinets full of systems to people  
that run FreeBSD.  They don't sell to home or small corporate users,  
period.  Any problem that affects a standard Rackable build will by  
definition affect thousands of systems.  (much like any standard Dell  
or HP server build)



> This all left me with a decision.  My choices were to back out the BTX
> changes that were known to fix boot issues with certain motherboards and
> enabled booting from USB devices or leave things as they are.


Or do some more testing and determine the problem and fix it.  I had a  
stack of systems demonstrating the problem.  I could have shipped one  
to each freebsd developer you wanted to work on it.  If you were  
willing to identify the affected source code and relevant gdb traps I  
would have happily worked on the source directly if that is what it  
took.


I would test.  I would supply console access and build systems.  I  
would ship them to anyone who wanted one in their hot little hands.  I  
would investigate the source code myself with a mere hour of "here's  
the relevant bits you need to consider" training.


You could have done *anything* that suited your needs for testing.   
Instead you did nothing.



> The
> motherboards that didn't boot with the older code had no work-around.
> The motherboards that did boot with the older code but not the newer
> code do have a work-around (use the old loader).


Not true.  I tested this, installing the old loader and it did not  
change the problem.  As reported.



> Decisions like that
> suck, no matter which choice I make it's wrong.  Holding the release
> until all bios issues get resolved isn't a viable option because of the
> "Whack-a-mole" thing mentioned above.  Fix it for one and two break.  It
> takes a lot of time/work to settle into what seems to work for the
> widest set of machines.


Break the boot loader for a very wide variety of systems rather than  
spend EVEN A SINGLE HOUR trying to diagnose the boot problem?


Ken, your diagnosis here would make sense if ANY diagnosis had been  
attempted.  This could be a trivial problem.  It could be solved with  
5 minutes of actually looking at it.  What happened here is that you  
proceeded WITHOUT EVEN TRYING.



> So you're saying John and Xin Li's responses (Xin Li's questions still
> un-answered) to you show a complete lack to even consider investigating
> it?


No actual diagnosis was done.  I'm sorry, but if I pull my car up to  
my mechanic's garage and he makes a diagnosis of "no idea what's  
wrong" without even popping the hood, yeah that counts as "didn't even  
consider investigating"


Worse yet, I would happily have done all of the grunt work for the  
investigation.  But I'm not going to start by reading the source tree  
and making guesses where to look.  If someone had given me some useful  
tests to do, I would have done them.



> I know from past email threads your preference is for 6.X right now


Not my preference, my ability to justify the evaluation and testing  
costs based on the support available for a given release.  7.0 doesn't  
work on this hardware at all.  No, I haven't tested 7.1 because 6.4  
was the easier testing target and I had thought that the security team  
was working on fixing the support model.


So now we have the brilliant strategy of a long-term support -REL  
that we will never be able to use.  The same stupid stunt that gave us  
6.1 which was unusable and 6.2 which worked great but expired at the  
same time as 6.1.  And so forth.  6.5 will likely be short term  
support again, but the first release we can consider for deployment.


> but as a test point if you aren't totally fried over this whole thing it
> would still be useful to know for sure if the issue exists with 7.1 test
> builds.  If yes it eliminates a variety of possibilities and helps focus
> on the exact problem.


I'm not burnt, but testing 7.1 has no meaningful relevance to my day  
job until we have a reasonable and working support mechanism.


And given that I really pulled out the stops to make sure we had  
hardware f

Re: Can I get a committer to mark this bug as blocking 6.4-RELEASE ?

2008-12-01 Thread Xin LI

Jo Rhett wrote:
> On Dec 1, 2008, at 11:30 AM, Ken Smith wrote:
>> Both John and Xin Li have chimed in on the two threads I've seen that
>> are related to this specific topic.  John diagnosed it as an issue with
>> the BIOS.  That's what makes it a nebulous problem.  When working on
>> those sorts of things most people liken it to "Whack-a-mole".
> 
> Diagnosed without testing.  John never asked for any more information
> than the page fault description from me.  When I asked what else to test
> and offered to supply systems for testing he stopped responding.  Xin Li
> proposed a work-around that would have castrated the systems.  It might
> work, but it wasn't a useful workaround so I deferred testing and
> focused on trying to get someone to address the real problem.

What I proposed is to *narrow down* the problem so we can diagnose
further. Since nobody has any idea at the moment what the problem is,
we do need further information; the alternative is to get the whole
6.3->6.4 diff reviewed, which is (in my opinion) not an optimal use of
developers' time.

Cheers,
--
Xin LI <[EMAIL PROTECTED]>  http://www.delphij.net/
FreeBSD - The Power to Serve!


confirming bugs is bad behavior, etc.

2008-12-01 Thread Jo Rhett

On Dec 1, 2008, at 11:59 AM, George V. Neville-Neil wrote:

> I have mostly stayed away from these threads because they've often
> devolved into unproductive finger pointing.
>
> Please leave the hyperbole out of your posts, or at least attempt to
> cut it back.  People on these lists are working quite hard to solve
> problems for the whole of the FreeBSD community and your posts, such
> as this one, are not helping us to move forward.



My posts have always been directed at solving very real, operational  
problems with using FreeBSD on server platforms, which is exactly the  
stated goal for freebsd.  I have always offered not only problems, but  
resources to help test or evaluate the issues, and serious  
considerations for ways to improve the process.


Yes, you're right.  Threads I start about real problems always devolve  
into unproductive finger pointing.  That would be the freebsd  
developers attacking the reporter for identifying a real, operational  
problem.  Take a look at the posts of the FreeBSD developers, and view  
for yourself the unprofessional attacks and personal insults hurled by  
them at people who are simply trying to get real problems resolved.


And yet, instead of asking your developers to stop violating the  
posted rules of the mailing list, you are asking a bug reporter who  
simply informed another bug reporter that their problem was both  
widespread and not limited to USB devices to stop posting to the  
list.  Because god knows that "yes we saw it too and it's widely  
reported" is bad behavior.  Much worse than personal attacks which are  
strictly against the list rules.


Yes, I'm sure that the personal attacks really do help drive freebsd  
development forward.  Much more so than me bringing resources and  
actually testing things does.


Now that Core has clearly spoken their mind on this issue, by refusing  
to ask freebsd developers to avoid violating the list charter and then  
publicly calling out someone for just saying "yeah, it's a widely  
reported problem" ... leaves any doubt that positive change is going  
to happen here.


Your request is accepted.  I'm unsubscribing now.

--
Jo Rhett
Net Consonance : consonant endings by net philanthropy, open source  
and other randomness





Re: Can I get a committer to mark this bug as blocking 6.4-RELEASE ?

2008-12-01 Thread Jo Rhett

On Dec 1, 2008, at 12:25 PM, Xin LI wrote:

> What I proposed is to *narrow down* the problem so we can diagnose
> further. Since nobody has any idea at the moment what the problem is,
> we do need further information; the alternative is to get the whole
> 6.3->6.4 diff reviewed, which is (in my opinion) not an optimal use of
> developers' time.



I got your request at the beginning of a vacation period where I was  
out of town.  I had explicitly requested that 6.4 be blocked for this  
issue.  I didn't think that "just my problem" would be enough to hold  
it up, but I apparently never even considered that -REL would happen  
without even responding to my request.


Since nobody had responded to my request, and several posts had gone  
out about more testing for 7.1 (which had the same loader and the same  
problems) I assumed that 6.4 was similarly delayed.  Had anyone said  
you needed this information pronto I would have canceled my  
Thanksgiving plans and spent the day in the lab testing this for you.


For that matter, I had already pulled a diff of 6.3 to 6.4 and was  
working my way through it trying to find the relevant parts.  If you  
would have identified the relevant portions, I would have happily  
tried backing out some of the changes on a per-component basis to  
figure it out.


In short, tell me what you wanted/needed, and I would have done it ASAP.

It's apparently irrelevant now.

--
Jo Rhett
Net Consonance : consonant endings by net philanthropy, open source  
and other randomness





Re: usb keyboard dying at loader prompt

2008-12-01 Thread George V. Neville-Neil
At Mon, 1 Dec 2008 10:22:31 -0800,
Jo Rhett wrote:
> 
> Just FYI we are seeing the exact same problem with PS/2 keyboards and  
> the 6.4 loader, so this may not be a USB-only issue.
> 
> The complete lack of response to serious bug reports about 6.4-REL is  
> fairly shocking.
> 

Jo,

I have mostly stayed away from these threads because they've often
devolved into unproductive finger pointing.

Please leave the hyperbole out of your posts, or at least attempt to
cut it back.  People on these lists are working quite hard to solve
problems for the whole of the FreeBSD community and your posts, such
as this one, are not helping us to move forward.

Thanks,
George Neville-Neil


Re: confirming bugs is bad behavior, etc.

2008-12-01 Thread gnn
At Mon, 1 Dec 2008 12:27:57 -0800,
Jo Rhett wrote:
> 
> Now that Core has clearly spoken their mind on this issue, by refusing  
> to ask freebsd developers to avoid violating the list charter and then  
> publicly calling out someone for just saying "yeah, it's a widely  
> reported problem" ... leaves any doubt that positive change is going  
> to happen here.
> 

Note that my mail was not marked in any way "From core" but was merely
as a list participant.  I've always been all for people finding and
helping to work through bugs.  What I object to is hyperbole and 
passive aggressiveness.

For more on this see here:

http://video.google.com/videoplay?docid=-4216011961522818645

If we can identify the issue let's fix it, but let's do it without
lots of emotional stuff.

Best,
George


RE: usb keyboard dying at loader prompt

2008-12-01 Thread Jan Mikkelsen
Hi,

> Just FYI we are seeing the exact same problem with PS/2 keyboards and
> the 6.4 loader, so this may not be a USB-only issue.
>
> [ ... ]
>
> On Nov 28, 2008, at 5:24 AM, Andriy Gapon wrote:
> > I did more testing and it seems that our loader does have something to
> > do with the problem.
> >
> > If I boot to memtest86 the keyboard keeps working.
> > If I pause boot menu, wait for many minutes, the keyboard still works.
> > If I escape to loader prompt, this is when the keyboard stops working
> > after a few seconds.
> >
> > Not sure how to explain this.
> > I think I've seen some changes to reduce memory usage of loader, I
> > will try them to see if that would make any difference for my situation.

I have seen a similar problem on a Sun X4240 with 7.1-PRE.  Using the ILOM
remote keyboard works at the loader prompt but fails at the root filesystem
prompt.  I could work around the problem by attaching a different keyboard
to the front USB port.

Have you tried different keyboards?

Regards,

Jan.



Re: no priority on the console?

2008-12-01 Thread Jan Mikkelsen
Hi,

> As per my previous message, I've spent about 3 months trying to debug  
> a problem that was causing all disk I/O to go very slowly.

At first glance this sounds similar to the problem I am having with very slow
I/O on the Areca controller.  (see: "7.1-PRERELEASE: arcmsr write
performance problem")  What controller are you using?  Is the write cache
enabled?

> One of the things which made this nearly impossible to diagnose was  
> the absolute lack of priority given to the console.  Logging in on the  
> console would take 12-15 minutes.  Hitting enter on the console would  
> usually take between 3 and 5 minutes.

Yes, I see this when I get the slow I/O problem.  I think this has been a
problem for some time; I have also seen "console freezes" (ssh, console,
etc.) on 6.0 and 6.1 systems under SATA load.  That was a while ago now
(2006?).  I also recall others reporting having seen the same problem
intermittently.

> This doesn't seem right to me.  Can someone explain why the console  
> isn't given a very high priority?  Why not?  What other mechanism does  
> the sysadmin have for debugging, at a time when SSH logins either  
> fail, or take up to an hour to complete?

In my case I could log into the system and start things like iostat and
gstat and they kept running while the problem occurred so that I could see
some of what was going on.  I could also have what seemed like a reasonable
ssh session with a jail on the same machine.  This indicates to me that it
is not the console that is the issue, but rather that the process of logging
into the main machine touches some file that causes it to get caught up in
the slow I/O quagmire.

If the problem I am seeing now is the same as the one I saw a few years ago
then I think the nature might have changed.  My recollection is that
utilities like iostat would also freeze back then, but I can't be sure.

I'd like to resolve this problem too.

Regards,

Jan.



Re: make distribution halts during install (7.1Prerelease today)

2008-12-01 Thread Dewayne Geraghty
My earlier post falls into the embarrassing, wish that I hadn't category. To 
prevent anyone wasting effort, I'm replying.

make distribution DESTDIR=/newplace
requires a 
make world DESTDIR=/newplace
as a prerequisite.  

The earlier post, caused me to believe that there was an error in 
/usr/bin/install, when using:
make distribution DESTDIR=/differentplace
The granularity of my testing was inappropriate. Apologies for the distraction. 
 

Dewayne





Re: dhclient doing DISCOVER with bad IP checksum - bge (7.1 show stopper??)

2008-12-01 Thread Jonathan Feally
Can someone please confirm or rule out my issue with dhclient sending 
bad IP checksum packets. It would really suck if 7.1 was released with a 
broken DHCP client.


Jonathan Feally wrote:

Sorry for the cross-post, but this could be either list's problem.

I have 2 boxes running 7-STABLE as of 20081130, both i386 SMP. One is 
running ISC DHCPD 3.0.x from recent ports, and the other dhclient from 
make world.


The server is refusing to answer the DISCOVER request, as it thinks 
the IP checksum is wrong, which tcpdump also confirms. Other DHCP 
clients are working fine on this network, so I do not believe it to be 
the network, server or dhcpd.
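
For anyone who wants to check the captured headers by hand rather than
trusting tcpdump's verdict, here is a minimal sketch of the standard
Internet checksum (RFC 1071) applied to an IPv4 header. The function
names are made up for illustration, and the sample bytes used below are
a generic textbook header, not taken from the attached dump.

```python
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """Compute the RFC 1071 checksum of an IPv4 header.

    The stored checksum field (bytes 10-11) is treated as zero, so the
    result is the value that *should* be stored in the header.
    """
    if len(header) < 20 or len(header) % 2:
        raise ValueError("expected an IPv4 header of even length >= 20 bytes")
    # Zero out the checksum field before summing.
    data = header[:10] + b"\x00\x00" + header[12:]
    # Sum the header as big-endian 16-bit words.
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    # Fold the carries back in (one's-complement addition).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def checksum_ok(header: bytes) -> bool:
    """Return True if the checksum stored in the header matches."""
    stored = struct.unpack("!H", header[10:12])[0]
    return stored == ipv4_header_checksum(header)


if __name__ == "__main__":
    # Generic sample header (hypothetical, not from the poster's capture).
    sample = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
    print(hex(ipv4_header_checksum(sample)))
```

Feeding it the 20 header bytes from each DISCOVER packet would show
whether the stored value is simply wrong or was never filled in, which
might help narrow down whether checksum offload is involved.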


Server is running a 2 Port Intel card - em driver.

Client is a Dell PE1750 with 2 onboard NIC's - bge driver.

I have tried turning off both RXCSUM and TXCSUM on both the client and 
server machines with no luck. I also tried the second NIC on the 
server with the same result.


This setup was working just a couple of weeks ago, and the only thing 
that has changed is updating the src for a make world. PXE booting 
this server does result in an IP being issued, so it is pointing 
towards something new/changed in 7-STABLE.


I have attached a 3 packet dump of the DISCOVER requests.

Can anybody shed some light on this for me?

Thanks, -Jon





Re: 7.1-PRERELEASE: arcmsr write performance problem

2008-12-01 Thread Jan Mikkelsen

Replying to my own post ...

I have done a test on the same machine comparing 6.3-p1 to 7.1-PRE.  The 
performance is the expected ~6MB/s (because of the lack of cache) on 
6.3-p1, so the BIOS change doesn't seem to be at fault.


This seems to be a regression somewhere between 6.3 and 7.1.  The Areca 
driver is the same in 6.3 and 7.1, so the problem seems to be elsewhere.


I think this is more than just a "performance" problem.  The 
observations with gstat showing extremely high ms/w values (I have seen 
them as high as 22000) make it look like IO completion interrupts are 
being lost.


Any suggestions on where to look next?  Are there obvious candidates?


Jan Mikkelsen wrote:

Hi,

I am seeing extremely poor performance (~100kB/s) when untarring large 
tar files into fresh ufs filesystems.  I see the problem with 
softupdates and without softupdates but with an async mount.  This is a 
Supermicro X7DB8 board, 4GB, 2 x Xeon 5140.


Sample gstat output:

dT: 1.033s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps    ms/w   %busy Name
  585     61      0      0    0.0     61    170 13812.0   100.1| da2

I see ms/w start at about 200ms with a ~3MB/s throughput, and then I see 
ms/w rise and kBps drop.  ms/w goes as high as 16-20s, and then suddenly 
drops back down to about 200ms.   Using iostat, while the performance is 
high(er), kb/t is 64kB, as the problem starts it drops towards 2kB.
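
As a rough cross-check on those numbers (not a diagnosis), Little's law
relates queue depth, completion rate and time spent in the queue. The
sketch below assumes the queue is in something like a steady state,
which may well not hold here; the function name is made up for
illustration.

```python
def expected_wait_ms(queue_length: float, ops_per_second: float) -> float:
    """Estimate mean time in queue from Little's law (L = lambda * W).

    queue_length   -- outstanding requests, e.g. gstat's L(q) column
    ops_per_second -- completed requests per second, e.g. gstat's w/s
    Assumes a steady state; returns the estimate in milliseconds.
    """
    if ops_per_second <= 0:
        raise ValueError("ops_per_second must be positive")
    return queue_length / ops_per_second * 1000.0


if __name__ == "__main__":
    # Values read from the sample gstat line: L(q) = 585, w/s = 61.
    print("%.0f ms" % expected_wait_ms(585, 61))
```

With L(q) at 585 and 61 writes per second this gives roughly 9.6
seconds per request, the same order of magnitude as the ms/w column, so
the open question is more why so few writes complete per second than
why each one appears to take so long.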


Copying a single large file doesn't exhibit this problem, although 
throughput isn't great (~3-5MB/s).  However, that's better than 100kB/s.


arcmsr0: mem 0xd890-0xd8900fff,0xd800-0xd83f irq 16 at device 14.0 
on pci10

ARECA RAID ADAPTER0: Driver Version 1.20.00.15 2007-10-07
ARECA RAID ADAPTER0: FIRMWARE VERSION V1.46 2008-08-06
arcmsr0: [ITHREAD]

There are eight disks connected in a RAID-6 configuration.  The 
controller's cache is write-through and the disks' write caches are 
disabled.  NCQ is enabled on the drives.


The same hardware when it ran 6.3-p1 didn't have this problem.  However, 
the system BIOS was updated at the same time as the operating system (in 
an attempt to solve a recent em problem), so it is possible that it is a 
BIOS related problem.  The same build on an entirely different machine 
with an aac controller and SAS disks also doesn't show this problem.


Running 'devinfo -r' doesn't list arcmsr as having an interrupt at all. 
(see below).  That strikes me as odd; checking another machine that is 
still running 6.2 with an arcmsr controller, I can see the interrupt 
just fine.


So:

- Does anyone have any suggestions?

- Is it normal for arcmsr to not show an interrupt in the output from 
devinfo in 7.1?


Full dmesg, devinfo below.

Thanks,

Jan Mikkelsen


Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.1-PRERELEASE #0: Mon Dec  1 14:53:12 EST 2008
   
[EMAIL PROTECTED]:/home/janm/p4/freebsd-image-std-2008.2/work/base-freebsd/home/janm/p4/freebsd-image-std-2008.2/FreeBSD/src/sys/TW-SMP 


Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU5140  @ 2.33GHz (2333.35-MHz 
K8-class CPU)

 Origin = "GenuineIntel"  Id = 0x6f6  Stepping = 6
 Features=0xbfebfbff 

 Features2=0x4e3bd 


 AMD Features=0x20100800
 AMD Features2=0x1
 Cores per package: 2
usable memory = 4280651776 (4082 MB)
avail memory  = 4117843968 (3927 MB)
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
cpu0 (BSP): APIC ID:  0
cpu1 (AP): APIC ID:  1
cpu2 (AP): APIC ID:  6
cpu3 (AP): APIC ID:  7
ioapic0  irqs 0-23 on motherboard
ioapic1  irqs 24-47 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0:  on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
pcib1:  at device 2.0 on pci0
pci1:  on pcib1
pcib2:  irq 16 at device 0.0 on pci1
pci2:  on pcib2
pcib3:  irq 16 at device 0.0 on pci2
pci3:  on pcib3
pcib4:  at device 0.0 on pci3
pci4:  on pcib4
ahd0:  port 
0x2400-0x24ff,0x2000-0x20ff mem 0xd850-0xd8501fff irq 16 at device 
2.0 on pci4

ahd0: [ITHREAD]
aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
ahd1:  port 
0x2c00-0x2cff,0x2800-0x28ff mem 0xd8502000-0xd8503fff irq 17 at device 
2.1 on pci4

ahd1: [ITHREAD]
aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
pcib5:  at device 0.2 on pci3
pci5:  on pcib5
bge0:  mem 
0xd860-0xd860 irq 16 at device 1.0 on pci5

miibus0:  on bge0
brgphy0:  PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
1000baseT-FDX, auto

bge0: Ethernet address: 00:40:f4:66:b1:56
bge0: [ITHREAD]
p

exim 4.69 freebsd 7.0 locking issues

2008-12-01 Thread Andrew

Hi all,

Running Exim 4.69 on 7.0-RELEASE-p6 FreeBSD.
The box has been recently upgraded from 6.3 (like 24 hours ago).

Currently Exim is sending the following lines to the log files.

2008-12-01 19:02:35 Failed to get write lock for 
/var/spool/exim/db/callout.lockfile: Invalid argument
2008-12-01 19:02:35 Failed to get write lock for 
/var/spool/exim/db/callout.lockfile: Invalid argument
2008-12-01 19:02:35 1L74Cp-000GRN-3R Cannot lock 
/var/spool/exim/input//1L74Cp-000GRN-3R-D (22): Invalid argument


The permissions are all correct for the spool directories and for Exim 
itself.


It is creating stacks and stacks of 0 byte files in the message spool 
directory.  I have recompiled all the ports but to no avail.  I've 
upgraded 2 other machines with 99.0% the same setup with no issues.
The only difference is hostnames/ips and that this machine is running 
mysql on it. Everything else on the machine (spam-assassin, clamav, 
mysql) is working fine.
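
One way to separate Exim from the filesystem might be to try a plain
advisory lock on a scratch file in the spool directory. The sketch
below assumes the failing call is an fcntl()-style lock, which the
"Invalid argument" (errno 22) in the log suggests but does not prove;
the function name and the scratch-file prefix are made up.

```python
import errno
import fcntl
import os
import tempfile


def probe_write_lock(directory):
    """Take and release an exclusive fcntl lock on a scratch file.

    Returns None if locking works in `directory`, otherwise the symbolic
    errno name (for example "EINVAL") reported by the kernel.
    """
    fd, path = tempfile.mkstemp(dir=directory, prefix="lockprobe.")
    try:
        try:
            # Non-blocking exclusive lock, then release it again.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.lockf(fd, fcntl.LOCK_UN)
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
        return None
    finally:
        # Always clean up the scratch file.
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    # Path taken from the log lines above; run as a user with write access.
    print(probe_write_lock("/var/spool/exim/db"))
```

If this also reports EINVAL, the problem would seem to lie with the
filesystem or the upgraded kernel rather than with Exim or its build.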


Has anybody got any ideas, other than downgrade back to fbsd 6.3?

Cheers
cya
Andrew


Re: dhclient doing DISCOVER with bad IP checksum - bge (7.1 show stopper??)

2008-12-01 Thread Daniel O'Connor
On Tuesday 02 December 2008 15:57:15 Jonathan Feally wrote:
> Can someone please confirm or rule out my issue with dhclient sending
> bad IP checksum packets. It would really suck if 7.1 was released with a
> broken DHCP client.

I had 7.1-PRE (early October) send out DHCP requests without issue, although I 
don't have that system available now. It was using an em card.

I have a 7.0-STABLE system with an sk card from July that does DHCP requests 
just fine too..

I don't have any bge systems running 7 to test with though sorry.. Does it 
always give dud packets or just DHCP? Can you try another card in the client?

-- 
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C

