Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Andriy Gapon
on 18/08/2011 02:15 Steven Hartland said the following: > In a nutshell the jail manager we're using will attempt to resurrect the jail > from a dying state in a few specific scenarios. > > Here's an example:- > 1. jail restart requested > 2. jail is stopped, so the java process is killed off,

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Andriy Gapon
on 20/08/2011 13:02 Andriy Gapon said the following: > on 18/08/2011 02:15 Steven Hartland said the following: >> In a nutshell the jail manager we're using will attempt to resurrect the jail >> from a dying state in a few specific scenarios. >> >> Here's an example:- >> 1. jail restart requested

Remote installing

2011-08-20 Thread Willem Jan Withagen
Hi, Today I felt like living dangerously and wanted to upgrade a backup server from i386 to amd64, just to see if we could. Otherwise I'd scrap it and install from a USB stick. So I have my server running amd64 build GENERIC. Export /, /var, /usr on the server to be upgraded. But upgrading worl

Re: Remote installing

2011-08-20 Thread Willem Jan Withagen
On 2011-08-20 13:15, Willem Jan Withagen wrote: Hi, Today I felt like living dangerously and wanted to upgrade a backup server from i386 to amd64, just to see if we could. Otherwise I'd scrap it and install from a USB stick. So I have my server running amd64 build GENERIC. Export /, /var, /usr on

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - From: "Andriy Gapon" BTW, I suspect the following scenario, but I am not able to verify it either via testing or in the code: - last process in a dying jail exits - pr_uref of the jail reaches zero - pr_uref of prison0 gets decremented - you attach to the jail and

Re: USB/coredump hangs in 8 and 9

2011-08-20 Thread Hans Petter Selasky
On Friday 19 August 2011 18:32:13 Andriy Gapon wrote: > on 19/08/2011 00:24 Hans Petter Selasky said the following: > > On Thursday 18 August 2011 19:04:10 Andriy Gapon wrote: > >> If you can help Hans to figure out what is wrong with the USB subsystem > >> in this respect that would help us all. >

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - From: "Andriy Gapon" BTW, I suspect the following scenario, but I am not able to verify it either via testing or in the code: - last process in a dying jail exits - pr_uref of the jail reaches zero - pr_uref of prison0 gets decremented - you attach to the jail and

Re: USB/coredump hangs in 8 and 9

2011-08-20 Thread Andriy Gapon
on 20/08/2011 16:35 Hans Petter Selasky said the following: > On Friday 19 August 2011 18:32:13 Andriy Gapon wrote: >> on 19/08/2011 00:24 Hans Petter Selasky said the following: >>> On Thursday 18 August 2011 19:04:10 Andriy Gapon wrote: >>>> If you can help Hans to figure out what is wrong wi

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Andriy Gapon
on 20/08/2011 18:51 Steven Hartland said the following: > - Original Message - From: "Andriy Gapon" > >> BTW, I suspect the following scenario, but I am not able to verify it either >> via >> testing or in the code: >> - last process in a dying jail exits >> - pr_uref of the jail reaches

Re: USB/coredump hangs in 8 and 9

2011-08-20 Thread Hans Petter Selasky
On Saturday 20 August 2011 18:45:57 Andriy Gapon wrote: > SCHEDULER_STOPPED The USB code needs to check for the SCHEDULER_STOPPED and cold at the present moment. If this state can be set during bootup, and cleared at the same time like "cold", it would be very good. --HPS

Re: USB/coredump hangs in 8 and 9

2011-08-20 Thread Andriy Gapon
on 20/08/2011 19:54 Hans Petter Selasky said the following: > On Saturday 20 August 2011 18:45:57 Andriy Gapon wrote: >> SCHEDULER_STOPPED > > The USB code needs to check for the SCHEDULER_STOPPED and cold at the present > moment. If this state can be set during bootup, and cleared at the same ti

Re: USB/coredump hangs in 8 and 9

2011-08-20 Thread Hans Petter Selasky
On Saturday 20 August 2011 19:09:02 Andriy Gapon wrote: > on 20/08/2011 19:54 Hans Petter Selasky said the following: > > On Saturday 20 August 2011 18:45:57 Andriy Gapon wrote: > >> SCHEDULER_STOPPED > > > > The USB code needs to check for the SCHEDULER_STOPPED and cold at the > > present moment.

Re: bad sector in gmirror HDD

2011-08-20 Thread Dan Langille
On Aug 19, 2011, at 11:24 PM, Jeremy Chadwick wrote: > On Fri, Aug 19, 2011 at 09:39:17PM -0400, Dan Langille wrote: >> >> On Aug 19, 2011, at 7:21 PM, Jeremy Chadwick wrote: >> >>> On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote: System in question: FreeBSD 8.2-STABLE #3: Thu

Re: 32GB limit per swap device?

2011-08-20 Thread Kostik Belousov
On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote: > On Thu, Aug 18, 2011 at 3:16 AM, Alexander V. Chernikov > wrote: > > > On 10.08.2011 19:16, per...@pluto.rain.com wrote: > > > >> Chuck Swiger wrote: > >> > >> On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote: > >>> > I am trying

Re: bad sector in gmirror HDD

2011-08-20 Thread Alex Samorukov
You can run a long self-test with smartmontools (-t long). You can then get the failed sector number from smartmontools (-l selftest) and use dd to write zeros to that specific sector. I also highly recommend setting up smartd as a daemon to monitor the number of reallocated sectors. If
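The three steps suggested above could be sketched roughly as follows. This is a dry run that only prints the commands; the function name plan_sector_rewrite is hypothetical, the device /dev/ad2 is borrowed from the dd output quoted elsewhere in this thread, the LBA is a placeholder, and bs=512 assumes 512-byte sectors. Note that a later reply in this listing calls the advice inaccurate, so treat it as an illustration of the proposal, not a recommended procedure.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for the proposed procedure
# instead of executing them, because the final dd is destructive.

# plan_sector_rewrite DISK LBA   (hypothetical helper name)
plan_sector_rewrite() {
    disk="$1"
    lba="$2"
    # 1. Start a long (extended) SMART self-test.
    echo "smartctl -t long $disk"
    # 2. Read the self-test log; a failed test may report the first bad LBA.
    echo "smartctl -l selftest $disk"
    # 3. Overwrite that single sector with zeros (assumes 512-byte sectors).
    echo "dd if=/dev/zero of=$disk bs=512 seek=$lba count=1"
}
```

For example, `plan_sector_rewrite /dev/ad2 4242` prints the three commands for review; nothing touches the disk until someone runs them by hand.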

Re: 32GB limit per swap device?

2011-08-20 Thread Alan Cox
On Thu, Aug 18, 2011 at 3:16 AM, Alexander V. Chernikov wrote: > On 10.08.2011 19:16, per...@pluto.rain.com wrote: > >> Chuck Swiger wrote: >> >> On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote: >>> I am trying to set up 64GB partitions for swap for a system that has 64GB of RAM (with

Re: bad sector in gmirror HDD

2011-08-20 Thread Diane Bruce
On Sat, Aug 20, 2011 at 01:34:41PM -0400, Dan Langille wrote: > On Aug 19, 2011, at 11:24 PM, Jeremy Chadwick wrote: > > > On Fri, Aug 19, 2011 at 09:39:17PM -0400, Dan Langille wrote: ... > >> Information such as this? > >> http://beta.freebsddiary.org/smart-fixing-bad-sector.php ... > > 3) A v

Re: bad sector in gmirror HDD

2011-08-20 Thread Dan Langille
On Aug 20, 2011, at 1:54 PM, Alex Samorukov wrote: >> [root@bast:~] # dd of=/dev/null if=/dev/ad2 bs=1m conv=noerror >> dd: /dev/ad2: Input/output error >> 2717+0 records in >> 2717+0 records out >> 2848980992 bytes transferred in 127.128503 secs (22410246 bytes/sec) >> dd: /dev/ad2: Input/output

Re: bad sector in gmirror HDD

2011-08-20 Thread Dan Langille
On Aug 20, 2011, at 2:04 PM, Diane Bruce wrote: > On Sat, Aug 20, 2011 at 01:34:41PM -0400, Dan Langille wrote: >> On Aug 19, 2011, at 11:24 PM, Jeremy Chadwick wrote: >> >>> On Fri, Aug 19, 2011 at 09:39:17PM -0400, Dan Langille wrote: > ... Information such as this? http://beta.fre

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Roger Marquis
Repeat this enough times and prison0.pr_uref reaches zero. To reach zero even sooner just kill enough non-jailed processes. Interesting. We've been getting kernel panics in -stable but with only one jail started at boot without being restarted. Are you using SAS drives by any chance? Setti

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - From: "Roger Marquis" Sent: Saturday, August 20, 2011 7:10 PM Subject: Re: debugging frequent kernel panics on 8.2-RELEASE Repeat this enough times and prison0.pr_uref reaches zero. To reach zero even sooner just kill enough non-jailed processes. Inter

Re: bad sector in gmirror HDD

2011-08-20 Thread Jeremy Chadwick
On Sat, Aug 20, 2011 at 07:54:30PM +0200, Alex Samorukov wrote: > You can run a long self-test with smartmontools (-t long). You can then > get the failed sector number from smartmontools (-l selftest) and > use dd to write zeros to that specific sector. This is inaccurate advice. I covere

Re: bad sector in gmirror HDD

2011-08-20 Thread Jeremy Chadwick
Dan, I will respond to your reply sometime tomorrow. I do not have time to review the email today (~7.7KBytes), but will have time tomorrow. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Sys

Re: 32GB limit per swap device?

2011-08-20 Thread Alan Cox
On 08/20/2011 12:41, Kostik Belousov wrote: On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote: On Thu, Aug 18, 2011 at 3:16 AM, Alexander V. Chernikov wrote: On 10.08.2011 19:16, per...@pluto.rain.com wrote: Chuck Swiger wrote: On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote: I

Re: bad sector in gmirror HDD

2011-08-20 Thread Dan Langille
On Aug 20, 2011, at 2:36 PM, Jeremy Chadwick wrote: > Dan, I will respond to your reply sometime tomorrow. I do not have time > to review the Email today (~7.7KBytes), but will have time tomorrow. No worries. Thank you. -- Dan Langille - http://langille.org

Re: bad sector in gmirror HDD

2011-08-20 Thread Alex Samorukov
"The SMART tests you did didn't really amount to anything; no surprise. short and long tests usually do not test the surface of the disk. There are some drives which do it on a long test, but as I said before, everything varies from drive to drive." That statement is not correct, sorry. Long tes

Re: 32GB limit per swap device?

2011-08-20 Thread Alexander V. Chernikov
Alan Cox wrote: > On 08/20/2011 12:41, Kostik Belousov wrote: >> On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote: >>> On Thu, Aug 18, 2011 at 3:16 AM, Alexander V. >>> Chernikov wrote: >>> On 10.08.2011 19:16, per...@pluto.rain.com wrote:

Re: 32GB limit per swap device?

2011-08-20 Thread Kostik Belousov
On Sat, Aug 20, 2011 at 10:42:28PM +0400, Alexander V. Chernikov wrote: > Alan Cox wrote: > > On 08/20/2011 12:41, Kostik Belousov wrote: > >> On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote: > >>> On Thu, Aug 18, 2011 at 3:16 AM, Alexan

Re: bad sector in gmirror HDD

2011-08-20 Thread Jeremy Chadwick
On Sat, Aug 20, 2011 at 08:43:09PM +0200, Alex Samorukov wrote: > > >"The SMART tests you did didn't really amount to anything; no surprise. > >short and long tests usually do not test the surface of the disk. There > >are some drives which do it on a long test, but as I said before, > >everythin

Re: Remote installing

2011-08-20 Thread Willem Jan Withagen
On 20-8-2011 13:26, Willem Jan Withagen wrote: On 2011-08-20 13:15, Willem Jan Withagen wrote: Hi, Today I felt like living dangerously and wanted to upgrade a backup server from i386 to amd64, just to see if we could. Otherwise I'd scrap it and install from a USB stick. So I have my server runn

Re: bad sector in gmirror HDD

2011-08-20 Thread Jeremy Chadwick
Dan, sorry for the previous mail. It seems my schedule today has just unexpectedly changed; I had social events to deal with, but as I found out a few minutes ago those events are cancelled, which means I have time today to look at your mail. On Sat, Aug 20, 2011 at 01:34:41PM -0400, Dan Langille wrote

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - From: "Andriy Gapon" Thanks for doing this! I'll reiterate my suspicion just in case - I think that you should look for the cases where you stop a jail, but then re-attach and resurrect the jail before it's completely dead. Yes, that's where I think it's happening

Re: bad sector in gmirror HDD

2011-08-20 Thread Dan Langille
On Aug 20, 2011, at 3:57 PM, Jeremy Chadwick wrote: >>> I still suggest you replace the drive, although given its age I doubt >>> you'll be able to find a suitable replacement. I tend to keep disks >>> like this around for testing/experimental purposes and not for actual >>> use. >> >> I have se

Re: bad sector in gmirror HDD

2011-08-20 Thread Jeremy Chadwick
A follow-up given that I just viewed the SMART attribute data at the very bottom of this page as of this writing (Sat Aug 20 13:00:09 PDT 2011): http://beta.freebsddiary.org/smart-fixing-bad-sector.php And I see this: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WH

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - From: "Steven Hartland" Looking through the code I believe I may have noticed a scenario which could trigger the problem. Given the following code:- static void prison_deref(struct prison *pr, int flags) { struct prison *ppr, *tpr; int vfslocked; if (!(fl

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Andriy Gapon
on 20/08/2011 23:24 Steven Hartland said the following: > - Original Message - From: "Steven Hartland" >> Looking through the code I believe I may have noticed a scenario which could >> trigger the problem. >> >> Given the following code:- >> >> static void >> prison_deref(struct prison *p

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - From: "Andriy Gapon" diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c --- sys/kern/kern_jail.c.orig 2011-08-20 21:17:14.856618854 +0100 +++ sys/kern/kern_jail.c 2011-08-20 21:18:35.307201425 +0100 @@ -2455,7 +2455,8 @@ if (--tp

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - From: "Steven Hartland" Something else you may be more interested in, Andriy:- I added the debugging options DDB & INVARIANTS to see if I could get more useful info, and the panic results in a looping panic constantly scrolling up the console. Not sure if this is a

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - From: "Andriy Gapon" on 20/08/2011 23:24 Steven Hartland said the following: - Original Message - From: "Steven Hartland" Looking through the code I believe I may have noticed a scenario which could trigger the problem. Given the following code:- static

Re: bad sector in gmirror HDD

2011-08-20 Thread perryh
Jeremy Chadwick wrote: > ... using dd to find the bad LBAs is the only choice he has. or sysutils/diskcheckd. It uses a 64KB blocksize, falling back to 512 -- to identify the bad LBA(s) -- after getting a failure when reading a large block, and IME it runs something like 10x faster than dd with
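The coarse-then-fine reading strategy described above (large reads first, 512-byte reads only inside a block that failed) might look something like this in plain sh. It is a sketch of the idea, not diskcheckd's actual implementation; scan_device and read_range are hypothetical names, and the caller is assumed to supply the device size in bytes.

```shell
#!/bin/sh
# Sketch: scan in 64 KB blocks; when a block fails, re-read it in
# 512-byte sectors and print the LBA of each unreadable sector.
BIG=65536
SMALL=512

# read_range DEV OFFSET_BYTES LENGTH_BYTES  (hypothetical helper)
# Succeeds only if dd can read the range. OFFSET must be a multiple
# of LENGTH, which holds for every call made by scan_device below.
read_range() {
    dd if="$1" of=/dev/null bs="$3" skip=$(( $2 / $3 )) count=1 2>/dev/null
}

# scan_device DEV TOTAL_BYTES  (hypothetical helper)
scan_device() {
    dev="$1"
    total="$2"
    off=0
    while [ "$off" -lt "$total" ]; do
        if ! read_range "$dev" "$off" "$BIG"; then
            # Large read failed: narrow it down sector by sector.
            sub="$off"
            end=$(( off + BIG ))
            while [ "$sub" -lt "$end" ] && [ "$sub" -lt "$total" ]; do
                read_range "$dev" "$sub" "$SMALL" || echo $(( sub / SMALL ))
                sub=$(( sub + SMALL ))
            done
        fi
        off=$(( off + BIG ))
    done
}
```

Keeping the read step in its own function means the scan logic can be exercised against a stub that simulates a bad sector, without needing a failing drive.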

Re: bad sector in gmirror HDD

2011-08-20 Thread Jeremy Chadwick
On Sun, Aug 21, 2011 at 02:00:33AM -0700, per...@pluto.rain.com wrote: > Jeremy Chadwick wrote: > > > ... using dd to find the bad LBAs is the only choice he has. > > or sysutils/diskcheckd. It uses a 64KB blocksize, falling back to > 512 -- to identify the bad LBA(s) -- after getting a failure