Re: Any success stories for HAST + ZFS?

2011-03-27 Thread Mikolaj Golub

On Sat, 26 Mar 2011 10:52:08 -0700 Freddie Cash wrote:

 FC> hastd backtrace is here:
 FC> http://www.sd73.bc.ca/downloads/crash/hast-backtrace.png

It is not a hastd crash, but a kernel crash triggered by the hastd process.

I am not sure I got the same crash as you, but apparently a race is possible
in g_gate on device creation.

I got the following crash when starting many hast providers simultaneously:

fault virtual address   = 0x0

#8  0xc0c11adc in calltrap () at /usr/src/sys/i386/i386/exception.s:168
#9  0xc086ac6b in g_gate_ioctl (dev=0xc6a24300, cmd=3374345472, 
addr=0xc9fec000 "\002", flags=3, td=0xc7ff0b80)
at /usr/src/sys/geom/gate/g_gate.c:410
#10 0xc0853c5b in devfs_ioctl_f (fp=0xc9b9e310, com=3374345472, 
data=0xc9fec000, cred=0xc8c9c200, td=0xc7ff0b80)
at /usr/src/sys/fs/devfs/devfs_vnops.c:678
#11 0xc09210cd in kern_ioctl (td=0xc7ff0b80, fd=3, com=3374345472, 
data=0xc9fec000 "\002") at file.h:262
#12 0xc0921254 in ioctl (td=0xc7ff0b80, uap=0xf5edbcec)
at /usr/src/sys/kern/sys_generic.c:679
#13 0xc0916616 in syscallenter (td=0xc7ff0b80, sa=0xf5edbce4)
at /usr/src/sys/kern/subr_trap.c:315
#14 0xc0c2b9ff in syscall (frame=0xf5edbd28)
at /usr/src/sys/i386/i386/trap.c:1086
#15 0xc0c11b71 in Xint0x80_syscall ()
at /usr/src/sys/i386/i386/exception.s:266

Or just creating many ggate devices simultaneously:

for i in `jot 100`; do
	./ggiocreate $i &
done

ggiocreate.c is attached.

In my case the kernel crashes in g_gate_create() when checking for name
collisions in strcmp():

/* Check for name collision. */
for (unit = 0; unit < g_gate_maxunits; unit++) {
	if (g_gate_units[unit] == NULL)
		continue;
	if (strcmp(name, g_gate_units[unit]->sc_provider->name) != 0)
		continue;
	mtx_unlock(&g_gate_units_lock);
	mtx_destroy(&sc->sc_queue_mtx);
	free(sc, M_GATE);
	return (EEXIST);
}

I think the issue is the following. When preparing sc we take
g_gate_units_lock, check for name collisions, fill the sc fields except
sc->sc_provider, and register sc in g_gate_units[unit]. sc_provider is filled
in later, after g_gate_units_lock has been released. So the following scenario
is possible:

1) Thread A registers sc in g_gate_units[unit] with
g_gate_units[unit]->sc_provider still NULL and releases g_gate_units_lock.

2) Thread B traverses g_gate_units[] when checking for name collisions and
crashes accessing g_gate_units[unit]->sc_provider->name.

The attached patch fixes the issue in my case.

-- 
Mikolaj Golub



ggiocreate.c
Description: Binary data
Index: sys/geom/gate/g_gate.c
===
--- sys/geom/gate/g_gate.c	(revision 220050)
+++ sys/geom/gate/g_gate.c	(working copy)
@@ -407,13 +407,14 @@ g_gate_create(struct g_gate_ctl_create *ggio)
 	for (unit = 0; unit < g_gate_maxunits; unit++) {
 		if (g_gate_units[unit] == NULL)
 			continue;
-		if (strcmp(name, g_gate_units[unit]->sc_provider->name) != 0)
+		if (strcmp(name, g_gate_units[unit]->sc_name) != 0)
 			continue;
 		mtx_unlock(&g_gate_units_lock);
 		mtx_destroy(&sc->sc_queue_mtx);
 		free(sc, M_GATE);
 		return (EEXIST);
 	}
+	sc->sc_name = name;
 	g_gate_units[sc->sc_unit] = sc;
 	g_gate_nunits++;
 	mtx_unlock(&g_gate_units_lock);
@@ -432,6 +433,9 @@ g_gate_create(struct g_gate_ctl_create *ggio)
 	sc->sc_provider = pp;
 	g_error_provider(pp, 0);
 	g_topology_unlock();
+	mtx_lock(&g_gate_units_lock);
+	sc->sc_name = sc->sc_provider->name;
+	mtx_unlock(&g_gate_units_lock);
 
 	if (sc->sc_timeout > 0) {
 		callout_reset(&sc->sc_callout, sc->sc_timeout * hz,
Index: sys/geom/gate/g_gate.h
===
--- sys/geom/gate/g_gate.h	(revision 220050)
+++ sys/geom/gate/g_gate.h	(working copy)
@@ -76,6 +76,7 @@
  * 'P:' means 'Protected by'.
  */
 struct g_gate_softc {
+	char			*sc_name;		/* P: (read-only) */
 	int			 sc_unit;		/* P: (read-only) */
 	int			 sc_ref;		/* P: g_gate_list_mtx */
 	struct g_provider	*sc_provider;		/* P: (read-only) */
@@ -96,7 +97,6 @@ struct g_gate_softc {
 	LIST_ENTRY(g_gate_softc) sc_next;		/* P: g_gate_list_mtx */
 	char			 sc_info[G_GATE_INFOSIZE]; /* P: (read-only) */
 };
-#define	sc_name	sc_provider->geom->name
 
 #define	G_GATE_DEBUG(lvl, ...)	do {	\
 	if (g_gate_debug >= (lvl)) {	\
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: Any success stories for HAST + ZFS?

2011-03-27 Thread Mikolaj Golub

On Sun, 27 Mar 2011 15:16:15 +0300 Mikolaj Golub wrote to Freddie Cash:

 MG> The attached patch fixes the issue in my case.

The patch is committed to current.

-- 
Mikolaj Golub


Re: Any success stories for HAST + ZFS?

2011-03-28 Thread Mikolaj Golub

On Mon, 28 Mar 2011 10:47:22 +0100 Pete French wrote:

 >> It is not a hastd crash, but a kernel crash triggered by hastd process.
 >>
 >> I am not sure I got the same crash as you but apparently the race is
 >> possible in g_gate on device creation.
 >>
 >> I got the following crash starting many hast providers simultaneously:

 PF> This is very interesting to me - my successful ZFS+HAST only had
 PF> a single drive, but in my new setup I am intending to use two
 PF> HAST processes and then mirror across them under ZFS, so I am
 PF> likely to hit this bug. Are the processes stable once launched?

Yes, you may hit it only on hast device creation. The workaround is to avoid
using 'hastctl role primary all' and start the providers one by one instead.
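
A minimal sketch of that workaround, assuming hypothetical resource names
(data0, data1; take the real ones from hast.conf) and assuming the provider
shows up as /dev/hast/<name> once it has been created. The HASTCTL and
HAST_DEV_DIR variables are not part of HAST; they exist only so the loop can
be dry-run. Waiting for the device node before starting the next resource is
my own addition, not something the fix requires.

```shell
#!/bin/sh
# Switch HAST resources to primary one at a time instead of using
# 'hastctl role primary all'. Resource names are placeholders (assumption).
HASTCTL=${HASTCTL:-hastctl}
HAST_DEV_DIR=${HAST_DEV_DIR:-/dev/hast}

start_providers() {
    for res in "$@"; do
        $HASTCTL role primary "$res" || return 1
        # Wait up to about 30 seconds for the provider to appear
        # before moving on to the next resource.
        tries=0
        while [ ! -e "$HAST_DEV_DIR/$res" ]; do
            tries=$((tries + 1))
            if [ "$tries" -ge 30 ]; then
                return 1
            fi
            sleep 1
        done
    done
}

# Example: start_providers data0 data1
```

Setting HASTCTL=echo prints the commands instead of running them.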

-- 
Mikolaj Golub


Re: way for failover zpool (no HAST needed)

2011-03-29 Thread Mikolaj Golub

On Tue, 29 Mar 2011 13:17:01 +0200 Denny Schierz wrote:

 DS> hi,

 DS> my two nodes are running fine with 8.2-stable and the LSI 9200-8e and
 DS> now, I want to build a failover for the Zpool (and later ISCSI target)

 DS> Both nodes are connected to the same disks (jbod) and now I need a way,
 DS> to get the zpool(s) running on the node with the CARP public IP.

You don't need HAST, but you might want to try net-mgmt/hastmon? :-)

I wrote it because I did not much like doing failover with CARP.

For hastmon you need at least 3 hosts: 2 cluster nodes (primary/secondary) and
a watchdog. The watchdog polls the states of the cluster nodes. The secondary
decides to fail over when:

1) There is no connection with primary.

2) There are complaints from watchdog.

The configuration is simple and would look like below (on all 3 hosts):

resource iscsi {
	exec /etc/iscsi.sh

	on hostA {
		remote hostB
		priority 0
	}
	on hostB {
		remote hostA
		priority 1
	}
	on hostW {
		remote hostA hostB
	}
}

The /etc/iscsi.sh script should support at least 3 arguments:

start -- switch node to primary (iscsi up, IP up, etc);
stop  -- switch node to secondary;
status -- return current status (0 - UP, 1 - DOWN, 2 - UNKNOWN).
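
A minimal skeleton of such a script, only to show the argument handling and
the status exit codes listed above. The function names and the STATE_FILE
marker are hypothetical placeholders; a real script would bring the iSCSI
target, the IP address and the pool up or down, and would check their actual
state instead of a marker file.

```shell
#!/bin/sh
# Skeleton for a hastmon 'exec' script. The bodies are placeholders
# (assumption): replace them with the real commands for your site.
STATE_FILE=${STATE_FILE:-/var/run/iscsi-role.state}   # hypothetical marker file

resource_start() {
    # Would: import the pool, add the IP, start the iSCSI target.
    echo primary > "$STATE_FILE"
}

resource_stop() {
    # Would: stop the iSCSI target, remove the IP, export the pool.
    echo secondary > "$STATE_FILE"
}

resource_status() {
    [ -r "$STATE_FILE" ] || return 2      # 2 - UNKNOWN
    case "$(cat "$STATE_FILE")" in
        primary)   return 0 ;;            # 0 - UP
        secondary) return 1 ;;            # 1 - DOWN
        *)         return 2 ;;            # 2 - UNKNOWN
    esac
}

main() {
    case "$1" in
        start)  resource_start ;;
        stop)   resource_stop ;;
        status) resource_status ;;
        *)      echo "usage: $0 {start|stop|status}" >&2; return 64 ;;
    esac
}

# The action is passed as the first argument.
if [ $# -gt 0 ]; then
    main "$@"
    exit $?
fi
```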

You can find more information in README:

http://code.google.com/p/hastmon/wiki/README

-- 
Mikolaj Golub


Re: Any success stories for HAST + ZFS?

2011-04-01 Thread Mikolaj Golub

On Fri, 01 Apr 2011 11:40:11 +0100 Pete French wrote:

 >> Yes, you may hit it only on hast devices creation. The workaround is to
 >> avoid using 'hastctl role primary all', start providers one by one instead.

 PF> Interesting to note that I just hit a lockup in hast (the discs froze
 PF> up - could not run hastctl or zpool import, and could not kill
 PF> them). I have two hast devices instead of one, but I am starting them
 PF> individually instead of using 'all'. The code includes all the latest
 PF> patches which have gone into STABLE over the last few days, none of which
 PF> look particularly controversial!

 PF> I haven't tried your patch yet, nor been able to reproduce the lockup, but
 PF> thought you might be interested to know that I also had problems with
 PF> multiple providers.

This looks like a different problem. If it happens again, please provide the
output of 'procstat -kka'.

-- 
Mikolaj Golub


Re: geli(4) memory leak

2011-04-01 Thread Mikolaj Golub

On Fri, 1 Apr 2011 19:43:54 +0200 Victor Balada Diaz wrote:

 VBD> On Sat, Mar 26, 2011 at 01:33:48AM +0100, Victor Balada Diaz wrote:
 >> Hello,
 >> 
 >> I'm trying to setup a new geli disk and i'm seeing what looks like a memory
 >> leak. After initializing the device i've tried to do the dd command from
 >> /dev/random like this one:
 >> 
 >> dd if=/dev/random of=/dev/da0p1.eli  bs=1m
 >> 

 VBD> Hello again,

 VBD> I've found the cause of the memory leak and I attach a patch to fix it.
 VBD> I hope the patch is good enough to get committed or at least helps
 VBD> someone make a better patch and commit it. Patched file is
 VBD> src/sys/geom/eli/g_eli.c

 VBD> The problem happens when you're using data integrity verification and
 VBD> you need to write more than MAXPHYS. If you look at
 VBD> g_eli_integrity.c:314 you'll see that geli creates a second request to
 VBD> write all that's needed.

 VBD> Each of the requests gets the callback to g_eli_write_done once it is
 VBD> done. The first request will get up to g_eli.c:209 and find that there
 VBD> are still requests pending, so instead of calling g_io_deliver to notify
 VBD> that it has written its data, it just returns and waits until all
 VBD> requests are done to say everything's OK. The problem is that once you
 VBD> return, you're leaking this g_bio. You can see with vmstat -z how g_bio
 VBD> increases and never releases memory.

 VBD> I just destroy the current bio before returning and that prevents the
 VBD> memory leak.

Your patch looks correct to me. But the same issue exists for read :-). Also, to
avoid the leak I think we can just do g_destroy_bio() before the "all sectors"
check. See the attached patch (it had some testing).

-- 
Mikolaj Golub

Index: sys/geom/eli/g_eli.c
===
--- sys/geom/eli/g_eli.c	(revision 220168)
+++ sys/geom/eli/g_eli.c	(working copy)
@@ -160,13 +160,13 @@ g_eli_read_done(struct bio *bp)
 	pbp = bp->bio_parent;
 	if (pbp->bio_error == 0)
 		pbp->bio_error = bp->bio_error;
+	g_destroy_bio(bp);
 	/*
 	 * Do we have all sectors already?
 	 */
 	pbp->bio_inbed++;
 	if (pbp->bio_inbed < pbp->bio_children)
 		return;
-	g_destroy_bio(bp);
 	sc = pbp->bio_to->geom->softc;
 	if (pbp->bio_error != 0) {
 		G_ELI_LOGREQ(0, pbp, "%s() failed", __func__);
@@ -202,6 +202,7 @@ g_eli_write_done(struct bio *bp)
 		if (bp->bio_error != 0)
 			pbp->bio_error = bp->bio_error;
 	}
+	g_destroy_bio(bp);
 	/*
 	 * Do we have all sectors already?
 	 */
@@ -215,7 +216,6 @@ g_eli_write_done(struct bio *bp)
 		pbp->bio_error);
 		pbp->bio_completed = 0;
 	}
-	g_destroy_bio(bp);
 	/*
 	 * Write is finished, send it up.
 	 */

Re: geli(4) memory leak

2011-04-03 Thread Mikolaj Golub

On Sat, 2 Apr 2011 12:17:50 +0200 Pawel Jakub Dawidek wrote:

 PJD> On Sat, Apr 02, 2011 at 12:04:09AM +0300, Mikolaj Golub wrote:
 >> For me your patch look correct. But the same issue is for read :-). Also, to
 >> avoid the leak I think we can just do g_destroy_bio() before "all sectors"
 >> check. See the attached patch (had some testing).

 PJD> The patch looks good. Please commit.

Committed, thanks.

-- 
Mikolaj Golub


Re: geli(4) memory leak

2011-04-03 Thread Mikolaj Golub

On Mon, 4 Apr 2011 01:51:24 +0200 Victor Balada Diaz wrote:

 VBD> On Sun, Apr 03, 2011 at 08:43:45PM +0300, Mikolaj Golub wrote:
 >> 
 >> On Sat, 2 Apr 2011 12:17:50 +0200 Pawel Jakub Dawidek wrote:
 >> 
 >>  PJD> On Sat, Apr 02, 2011 at 12:04:09AM +0300, Mikolaj Golub wrote:
 >>  >> For me your patch look correct. But the same issue is for read :-).
 >>  >> Also, to avoid the leak I think we can just do g_destroy_bio() before
 >>  >> "all sectors" check. See the attached patch (had some testing).
 >> 
 >>  PJD> The patch looks good. Please commit.
 >> 
 >> Committed, thanks.

 VBD> I've been out all the weekend, so I've been unable to answer before.
 VBD> I'm glad it got committed and it's great you discovered and fixed the
 VBD> same problem on the read path.

 VBD> Are there any plans to MFC this?

In approximately one week.

-- 
Mikolaj Golub


Re: Any success stories for HAST + ZFS?

2011-04-05 Thread Mikolaj Golub

On Mon, 4 Apr 2011 11:08:16 -0700 Freddie Cash wrote:

 FC> On Sat, Apr 2, 2011 at 1:44 AM, Pawel Jakub Dawidek wrote:
 >>
 >> I just committed a fix for a problem that might look like a deadlock.
 >> With trociny@ patch and my last fix (to GEOM GATE and hastd) do you
 >> still have any issues?

 FC> Just to confirm, this is commit r220264, 220265, 220266 to -CURRENT?

Yes, r220264 and r220266. As stated in the commit log, the MFC is planned
after 1 week.

-- 
Mikolaj Golub


Re: Any success stories for HAST + ZFS?

2011-04-10 Thread Mikolaj Golub

On Mon, 4 Apr 2011 11:08:16 -0700 Freddie Cash wrote:

 FC> Once the deadlock patches above are MFC'd to -STABLE, I can do an
 FC> upgrade cycle and test them.

Committed to STABLE.

-- 
Mikolaj Golub


Re: geli(4) memory leak

2011-04-10 Thread Mikolaj Golub

On Sun, 03 Apr 2011 20:43:45 +0300 Mikolaj Golub wrote to Pawel Jakub Dawidek:

 MG> On Sat, 2 Apr 2011 12:17:50 +0200 Pawel Jakub Dawidek wrote:

 PJD>> On Sat, Apr 02, 2011 at 12:04:09AM +0300, Mikolaj Golub wrote:
 >>> For me your patch look correct. But the same issue is for read :-).
 >>> Also, to avoid the leak I think we can just do g_destroy_bio() before
 >>> "all sectors" check. See the attached patch (had some testing).

 PJD>> The patch looks good. Please commit.

 MG> Committed, thanks.

In STABLE too.

-- 
Mikolaj Golub


Re: Any success stories for HAST + ZFS?

2011-04-11 Thread Mikolaj Golub

On Mon, 11 Apr 2011 11:26:15 -0700 Freddie Cash wrote:

 FC> On Sun, Apr 10, 2011 at 12:36 PM, Mikolaj Golub wrote:
 >> On Mon, 4 Apr 2011 11:08:16 -0700 Freddie Cash wrote:
 >>  FC> Once the deadlock patches above are MFC'd to -STABLE, I can do an
 >>  FC> upgrade cycle and test them.
 >>
 >> Committed to STABLE.

 FC> Updated src tree to r220537.  Recompiled world, kernel, etc.
 FC> Installed world, kernel, etc.  ZFSv28 patch was not affected.

 FC> Everything is detected correctly, everything comes up correctly.  See
 FC> a new option (reload) in the RC script for hast.

 FC> Can create/change role for 24 hast devices simultaneously.

 FC> Can switch between master/slave modes.

 FC> Have 5 rsyncs running in parallel without any issues, transferring
 FC> 80-120 Mbps over the network (just under 100 Mbps seems to be the
 FC> average right now).

 FC> Switching roles while the rsyncs are running succeeds without
 FC> deadlocking (obviously, rsync complains a whole bunch while the switch
 FC> happens as the pool disappears out from underneath it, but it picks up
 FC> again when the pool is back in place).

 FC> Hitting the reset switch on the box while the rsyncs are running
 FC> doesn't affect the hast devices or the pool, beyond losing the last 5
 FC> seconds of writes.

 FC> It's only been a couple of hours of testing and hammering, but so far
 FC> things are much more stable/performant than before.

Cool! Thanks for reporting!

 FC> Anything else I should test?

Nothing in particular, but any tests and reports are appreciated. E.g. among
the recent features Pawel has added are checksum and compression. You could
try different options and compare :-)

-- 
Mikolaj Golub


Re: buildworld FAIL.

2011-04-26 Thread Mikolaj Golub

On Sat, 23 Apr 2011 09:38:39 -0500 Matthew D. Fuller wrote:

 MDF> On Sat, Apr 23, 2011 at 05:52:47AM -0700 I heard the voice of
 MDF> Jeremy Chadwick, and lo! it spake thus:
 >> On Sat, Apr 23, 2011 at 09:04:42AM +0200, Pawel Tyll wrote:
 >> > So was NO_OPENSSL deprecated or something?
 >> 
 >> I think he's implying that hast indirectly relies upon OpenSSL.

 MDF> There's some conditionalization on MK_OPENSSL in the Makefile (and via
 MDF> that, in the code), but it's incomplete.  Whether that means it
 MDF> _should_ be buildable without OpenSSL and is just insufficiently
 MDF> tested, or whether it really just flat needs OpenSSL and the
 MDF> conditionalization is vestigial, I don't know.  pjd@ cc'd.

The attached patch should fix this.

-- 
Mikolaj Golub
Index: sbin/hastd/hast_proto.c
===
--- sbin/hastd/hast_proto.c	(revision 221054)
+++ sbin/hastd/hast_proto.c	(working copy)
@@ -69,7 +69,9 @@ struct hast_pipe_stage {
 
 static struct hast_pipe_stage pipeline[] = {
 	{ "compression", compression_send, compression_recv },
+#ifdef HAVE_CRYPTO
 	{ "checksum", checksum_send, checksum_recv }
+#endif
 };
 
 /*

Re: buildworld FAIL.

2011-04-26 Thread Mikolaj Golub

On Tue, 26 Apr 2011 18:25:09 +0200 Pawel Jakub Dawidek wrote:

 PJD> On Tue, Apr 26, 2011 at 12:44:31PM +0300, Mikolaj Golub wrote:
 >> 
 >> On Sat, 23 Apr 2011 09:38:39 -0500 Matthew D. Fuller wrote:
 >> 
 >>  MDF> On Sat, Apr 23, 2011 at 05:52:47AM -0700 I heard the voice of
 >>  MDF> Jeremy Chadwick, and lo! it spake thus:
 >>  >> On Sat, Apr 23, 2011 at 09:04:42AM +0200, Pawel Tyll wrote:
 >>  >> > So was NO_OPENSSL deprecated or something?
 >>  >> 
 >>  >> I think he's implying that hast indirectly relies upon OpenSSL.
 >> 
 >>  MDF> There's some conditionalization on MK_OPENSSL in the Makefile (and via
 >>  MDF> that, in the code), but it's incomplete.  Whether that means it
 >>  MDF> _should_ be buildable without OpenSSL and is just insufficiently
 >>  MDF> tested, or whether it really just flat needs OpenSSL and the
 >>  MDF> conditionalization is vestigial, I don't know.  pjd@ cc'd.
 >> 
 >> The attached patch should fix this.

 PJD> The patch looks good. Please commit.

Thanks. Committed.

-- 
Mikolaj Golub


Re: way for failover zpool (no HAST needed)

2011-04-27 Thread Mikolaj Golub

On Wed, 27 Apr 2011 14:05:11 +0200 Denny Schierz wrote:

 DS> hi,

 DS> Am Dienstag, den 29.03.2011, 23:36 +0300 schrieb Mikolaj Golub:
 >> 
 >> 2) There are complaints from watchdog. 

 DS> what happens, if the watchdog isn't available and one or both nodes are
 DS> rebooting or something else?

Without receiving complaints the secondary won't switch to primary. This is done
intentionally, so that the node does not make the decision on its own. But you
can have several watchdogs if this really worries you.

 DS> the other thing that could happen: the connection between the host and
 DS> the SAS switch is dead.

 DS> carp, ifstate and hastmon looking for the reachable IP, but not, if the
 DS> local storage is available. So I have a closer look to devd and zfs and

hastmon isn't just checking whether the IP is reachable. The watchdog connects
to a cluster node and asks for its status. So the result depends on how smart
the script used to get the status is.

 DS> shutdown in case of problems the carp interface / or whole machine, to
 DS> force a switch.

 DS> cu denny

-- 
Mikolaj Golub


Re: way for failover zpool (no HAST needed): hastmon

2011-04-29 Thread Mikolaj Golub
Oops, just noticed this mail :-) Denny sent me another message privately and I
hope I answered his questions, but I will answer this message too, in case
someone is interested.

On Thu, 28 Apr 2011 15:22:22 +0200 Denny Schierz wrote:

 DS> hi,

 DS> ok, here we go: I've installed hastmon on both FreeBSD nodes and on one
 DS> Debian Linux host as watchdog:

 DS> Simple setup:
 DS>  
 DS> # cat /etc.local/hastmon.conf 

 DS> resource sanip {
 DS>     exec /usr/local/_rbg/bin/san-ip
 DS>     friends iscsihead-m iscsihead-s nos

 DS>     on iscsihead-m {
 DS>         remote tcp4://iscsihead-s
 DS>         priority 0
 DS>     }
 DS>     on iscsihead-s {
 DS>         remote tcp4://iscsihead-m
 DS>         priority 1
 DS>     }
 DS>     on linux {
 DS>         remote tcp4://iscsihead-m tcp4://iscsihead-s
 DS>     }
 DS> }

 DS> It only half works.

 DS> The simple script adds/removes an alias for em0, and for status it
 DS> does a ping -c 1 to the global IP. After telling every host what its
 DS> role is, I get "state unknown" on the primary, "state run" on the
 DS> secondary, and watchdog for the Linux host.

It is difficult to tell what happened without additional information. It might
be that your '/usr/local/_rbg/bin/san-ip status' was returning unknown status.

In this case, running

/usr/local/_rbg/bin/san-ip status; echo $?

manually might be helpful. And logs too :-).
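
The exit code from that manual check can be translated into the status names
used earlier in this thread (0 - UP, 1 - DOWN, 2 - UNKNOWN). A small sketch;
check_status is a hypothetical helper name, not part of hastmon, and how
hastmon treats any other exit code is not covered here.

```shell
#!/bin/sh
# Run the 'status' action of a hastmon exec script and name the result.
# Mapping as described in this thread: 0 - UP, 1 - DOWN, 2 - UNKNOWN.
check_status() {
    rc=0
    "$1" status >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0) echo "UP" ;;
        1) echo "DOWN" ;;
        2) echo "UNKNOWN" ;;
        *) echo "unexpected exit code $rc" ;;
    esac
}

# Example: check_status /usr/local/_rbg/bin/san-ip
```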

 DS> Then I rebooted the primary; the secondary took over and executed the
 DS> script. After the primary was reachable again, it didn't get the
 DS> secondary role, but init/unknown.

 DS> The same happens in the opposite direction:

 DS> from Linux:

 DS> hastmonctl status
 DS> sanip:
 DS>   role: watchdog
 DS>   exec: /usr/local/_rbg/bin/san-ip
 DS>   remote:
 DS> tcp4://iscsihead-m (primary/run)
 DS> tcp4://iscsihead-s (init/unknown)
 DS>   state: run
 DS>   attempts: 0 from 5
 DS>   complaints: 0 for last 60 sec (threshold 3)
 DS>   heartbeat: 10 sec

 DS> from iscsihead-s:

 DS> hastmonctl status
 DS> sanip:
 DS>   role: init
 DS>   exec: /usr/local/_rbg/bin/san-ip
 DS>   remote:
 DS> tcp4://iscsihead-m
 DS>   state: unknown
 DS>   attempts: 0 from 5
 DS>   complaints: 0 for last 60 sec (threshold 3)
 DS>   heartbeat: 10 sec

 DS> and last from iscsihead-m


 DS> hastmonctl status
 DS> sanip:
 DS>   role: primary
 DS>   exec: /usr/local/_rbg/bin/san-ip
 DS>   remote:
 DS> tcp4://iscsihead-s (disconnected)
 DS>   state: run
 DS>   attempts: 0 from 5
 DS>   complaints: 0 for last 60 sec (threshold 3)
 DS>   heartbeat: 10 sec

 DS> If I take a look into the logfile from the iscsihead-m:

 DS> [sanip] (primary) Remote node acts as init for the resource and not as
 DS> secondary.

 DS> [sanip] (primary) Handshake header from tcp4://iscsihead-s has no
 DS> 'token' field.

 DS> Have I missed something?

 DS> cu denny

This is expected behavior. After start, hastmon is in the init role. You need to
set the role you want manually or via a startup script.

This is because you might want different configurations depending on your
requirements:

1) After start the role is set manually by the administrator (useful e.g. if
you prefer to investigate a crashed host before returning it to the cluster).

2) After start the node is switched to secondary automatically (by an rc
script).

If all cluster nodes are configured to be secondary on startup, and all are
started simultaneously, the watchdog will figure out that there is no primary
and will send complaints to all secondary nodes. The nodes will try to
switch to primary simultaneously and the node with the highest priority will
win.

3) The node that has the highest priority is always set to primary on startup.
All others are set to secondary.

With this configuration, if the primary fails, the secondary switches to
primary; then, when the initial primary comes back, it becomes primary again
automatically.

-- 
Mikolaj Golub


Re: HAST instability

2011-05-30 Thread Mikolaj Golub

On Mon, 30 May 2011 17:43:04 +0300 Daniel Kalchev wrote:

 DK> Some further investigation:

 DK> The HAST nodes do not disconnect when checksum is enabled (either
 DK> crc32 or sha256).

 DK> One strange thing is that there is never established TCP connection
 DK> between both nodes:

 DK> tcp4       0      0 10.2.101.11.48939  10.2.101.12.8457   FIN_WAIT_2
 DK> tcp4       0   1288 10.2.101.11.57008  10.2.101.12.8457   CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.11.46346  10.2.101.12.8457   FIN_WAIT_2
 DK> tcp4       0  90648 10.2.101.11.13916  10.2.101.12.8457   CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.11.8457   *.*                LISTEN

This is normal. hastd uses each connection in only one direction, so it calls
shutdown(2) to close the unused direction.

 DK> When using sha256 one CPU core is 100% utilized by each hastd process,
 DK> while 70-80MB/sec per HAST resource is being transferred (total of up
 DK> to 140 MB/sec traffic for both);

 DK> When using crc32 each CPU core is at 22% utilization;

 DK> When using none as checksum, CPU usage is under 10%

I suppose that when checksum is enabled the bottleneck is the CPU, so the
traffic rate is lower and the problem is not triggered.

 DK> Eventually after many hours, got corrupted communication:

 DK> May 30 17:32:35 b1b hastd[9827]: [data0] (secondary) Hash mismatch.

"Hash mismatch" message suggests that actually you were using checksum then,
weren't you?

 DK> May 30 17:32:35 b1b hastd[9827]: [data0] (secondary) Unable to receive
 DK> request data: No such file or directory.
 DK> May 30 17:32:38 b1b hastd[9397]: [data0] (secondary) Worker process
 DK> exited ungracefully (pid=9827, exitcode=75).

 DK> and

 DK> May 30 17:32:27 b1a hastd[1837]: [data0] (primary) Unable to receive
 DK> reply header: Operation timed out.
 DK> May 30 17:32:30 b1a hastd[1837]: [data0] (primary) Disconnected from
 DK> 10.2.101.12.
 DK> May 30 17:32:30 b1a hastd[1837]: [data0] (primary) Unable to send
 DK> request (Broken pipe): WRITE(99128470016, 131072).

It looks a little different than in your first message.

Are the clocks in sync on both nodes?

I would like to look at full logs for a fairly long period, covering several
cases, from both primary and secondary (with the time known to be synchronized).

Also, it might be worth checking that there is no network packet corruption
(look for strange things in netstat -di and netstat -s, or maybe copy large
files over the network and compare checksums).
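
A minimal sketch of the checksum comparison, assuming a large file has already
been copied to the other node and back (for example with scp). same_file is a
hypothetical helper name. It uses POSIX cksum, which prints a CRC and the byte
count; a stronger hash such as sha256(1) could be substituted.

```shell
#!/bin/sh
# Compare a local file with a copy that travelled over the network.
# Returns 0 if CRC and size match, 1 if they differ, 2 on a read error.
same_file() {
    a=$(cksum < "$1") || return 2
    b=$(cksum < "$2") || return 2
    [ "$a" = "$b" ]
}

# Example (file names are placeholders):
#   same_file big.img big.img.roundtrip && echo "no corruption detected"
```

A single matching pair proves little; repeating the copy several times with
files of a few gigabytes gives a corrupting link more chances to show up.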

-- 
Mikolaj Golub


Re: HAST instability

2011-05-30 Thread Mikolaj Golub

On Mon, 30 May 2011 17:43:04 +0300 Daniel Kalchev wrote:

 DK> tcp4       0      0 10.2.101.11.48939  10.2.101.12.8457   FIN_WAIT_2
 DK> tcp4       0   1288 10.2.101.11.57008  10.2.101.12.8457   CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.11.46346  10.2.101.12.8457   FIN_WAIT_2
 DK> tcp4       0  90648 10.2.101.11.13916  10.2.101.12.8457   CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.11.8457   *.*                LISTEN

Also, it might be useful to see whether you normally have full receive buffers
like above or only when the issue is observed, by running netstat in a loop,
something like below:

while sleep 5; do
  t=`date '+%F %H:%M:%S'`;
  netstat -na | grep 8457 |
  while read l; do
    echo "$t $l";
  done;
done > /tmp/netstat.log

-- 
Mikolaj Golub


Re: HAST instability

2011-05-31 Thread Mikolaj Golub

On Tue, 31 May 2011 15:51:07 +0300 Daniel Kalchev wrote:

 DK> On 30.05.11 21:42, Mikolaj Golub wrote:
 >>   DK>  One strange thing is that there is never established TCP connection
 >>   DK>  between both nodes:
 >>
 >>   DK>  tcp4       0      0 10.2.101.11.48939  10.2.101.12.8457   FIN_WAIT_2
 >>   DK>  tcp4       0   1288 10.2.101.11.57008  10.2.101.12.8457   CLOSE_WAIT
 >>   DK>  tcp4       0      0 10.2.101.11.46346  10.2.101.12.8457   FIN_WAIT_2
 >>   DK>  tcp4       0  90648 10.2.101.11.13916  10.2.101.12.8457   CLOSE_WAIT
 >>   DK>  tcp4       0      0 10.2.101.11.8457   *.*                LISTEN
 >>
 >> It is normal. hastd uses the connections only in one direction so it calls
 >> shutdown to close unused directions.
 DK> So the TCP connections are all too short-lived that I can never see a
 DK> single one in ESTABLISHED state? 10Gbit Ethernet is indeed fast, so
 DK> this might well be possible...

No, the connections are persistent; only one (unused) direction of
communication is closed. See shutdown(2) for further info.

 >> I would like to look at full logs for some rather large period, with several
 >> cases, from both primary and secondary (and be sure about synchronized 
 >> time).
 DK> I have made sure clocks are synchronized and am currently running on
 DK> freshly rebooted nodes (with two additional SATA drives at each node) --
 DK> so far some interesting findings, like I get hash errors and
 DK> disconnects much more frequently now. Will post when a bonnie++ run on
 DK> the ZFS filesystem on top of the HAST resources finishes.

As I wrote privately, it would be nice to see both netstat and hast logs (from
both nodes) for the same rather long period, covering several occurrences. It
would be good to place them somewhere on the web so other people can access
them too, as I will be offline for 7-10 days and will not be able to help you
until I am back.

 DK> One additional note: while playing with this setup, I tried to
 DK> simulate local disk going away in the hope HAST will switch to using
 DK> the remote disk. Instead of asking someone at the site to pull out the
 DK> drive, I just issued on the primary

 DK> hastctl role init data0

 DK> which resulted in kernel panic. Unfortunately, there was no sufficient
 DK> dump space for 48GB. I will re-run this again with more drives for the
 DK> crash dump. Anything you want me to look for in particular? (kernels
 DK> have no KDB compiled in yet)

Well, removing a physical disk (the device /dev/gpt/data0 consumed by hastd
disappears) and switching a resource to the init role (the device
/dev/hast/data0 consumed by the FS disappears) are two different things. You
should not normally change the resource role (destroy the hast device) before
unmounting (exporting) the FS.

-- 
Mikolaj Golub


Re: hast syncronization speed issue

2011-06-08 Thread Mikolaj Golub

On Thu, 2 Jun 2011 11:47:26 +0300 Yurius Radomskyi wrote:

 YR> Hi,

 YR> I have a HAST device set up between two systems. I experience very low
 YR> speed with dirty blocks synchronization after a split-brain condition
 YR> has been recovered: it's 200KB/s average on a 1Gbit link. On the other
 YR> hand, when I copy a big file to the zfs partition that is created on top
 YR> of the hast device the synchronization speed between the hosts is 50MB/s
 YR> (which is not too high for a 1Gbit link, but acceptable.)

Could you please try the patch (the kernel needs rebuilding)?

http://people.freebsd.org/~trociny/uipc_socket.c.patch

The patch was committed to current (r222454) and is going to be MFCed after
some time.

-- 
Mikolaj Golub


Re: Unusable hastd in FreeBSD 8.2

2011-06-08 Thread Mikolaj Golub
Hi,

On Mon, 6 Jun 2011 16:46:55 +0200 Victor Balada Diaz wrote:

 VBD> Hello,

 VBD> Hastd in its current form is not usable on FreeBSD 8.2-RELEASE or in
 VBD> 8-STABLE. You can see why in this thread:

 VBD> http://lists.freebsd.org/pipermail/freebsd-fs/2011-February/010752.html

 VBD> You can see the committed fix in:

 VBD> http://svnweb.freebsd.org/base?view=revision&revision=219721

 VBD> But it's never been MFCd. Is it possible to MFC it to 8-STABLE and maybe
 VBD> do an errata notice for RELENG_8_2?

Actually, it was MFCed. In r220151.

Also, I don't think this is an issue that makes hastd unusable in FreeBSD
8.2 :-).

The issue is the following. Before switching the node to primary, the
(third-party) failover script checks whether the secondary process is still
alive (assuming that in this case the primary on the other node is still alive
too) and fails if it is -- some protection against split brain. But before
r219721 the secondary might not die automatically when the primary host went
down.

This can be worked around, e.g. by removing the check in the script :-), or by
setting net.inet.tcp.keepidle to some small value (e.g. 10 seconds) -- this
should make the secondary notice that the other end is dead after this interval.
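
For a program whose sockets you control, a per-socket counterpart of that
sysctl can be sketched as below. This is only an illustration, not what hastd
or the failover script does: enable_keepalive() is a hypothetical helper, and
it assumes the TCP_KEEPIDLE socket option is available (Linux has it; older
FreeBSD releases may not). How quickly a dead peer is actually dropped also
depends on the probe interval and the probe count.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Enable keepalive probing on one TCP socket and start probing after
 * idle_secs seconds of silence (hypothetical helper, for illustration).
 * Returns 0 on success, -1 on error.
 */
int
enable_keepalive(int fd, int idle_secs)
{
	int on = 1;

	/* Turn keepalive probes on for this socket. */
	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
		return (-1);
	/* Idle time before the first probe is sent. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_secs,
	    sizeof(idle_secs)) != 0)
		return (-1);
	return (0);
}
```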

-- 
Mikolaj Golub


Re: HAST instability

2011-06-10 Thread Mikolaj Golub

On Fri, 03 Jun 2011 19:18:29 +0300 Daniel Kalchev wrote:

 DK> Well, apparently my HAST joy was short. On a second run, I got stuck with

 DK> Jun  3 19:08:16 b1a hastd[1900]: [data2] (primary) Unable to receive
 DK> reply header: Operation timed out.

 DK> on the primary. No messages on the secondary.

 DK> On primary:

 DK> # netstat -an | grep 8457

 DK> tcp4       0      0 10.2.101.11.42659  10.2.101.12.8457   FIN_WAIT_2
 DK> tcp4       0      0 10.2.101.11.62058  10.2.101.12.8457   CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.11.34646  10.2.101.12.8457   FIN_WAIT_2
 DK> tcp4       0      0 10.2.101.11.11419  10.2.101.12.8457   CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.11.37773  10.2.101.12.8457   FIN_WAIT_2
 DK> tcp4       0      0 10.2.101.11.21911  10.2.101.12.8457   FIN_WAIT_2
 DK> tcp4       0      0 10.2.101.11.40169  10.2.101.12.8457   CLOSE_WAIT
 DK> tcp4       0  97749 10.2.101.11.44360  10.2.101.12.8457   CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.11.8457   *.*                LISTEN

 DK> on secondary

 DK> # netstat -an | grep 8457

 DK> tcp4       0      0 10.2.101.12.8457   10.2.101.11.42659  CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.12.8457   10.2.101.11.62058  FIN_WAIT_2
 DK> tcp4       0      0 10.2.101.12.8457   10.2.101.11.34646  CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.12.8457   10.2.101.11.11419  FIN_WAIT_2
 DK> tcp4       0      0 10.2.101.12.8457   10.2.101.11.37773  CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.12.8457   10.2.101.11.21911  CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.12.8457   10.2.101.11.40169  FIN_WAIT_2
 DK> tcp4   66415      0 10.2.101.12.8457   10.2.101.11.44360  FIN_WAIT_2
 DK> tcp4       0      0 10.2.101.12.8457   *.*                LISTEN

 DK> on primary

 DK> # hastctl status
 DK> data0:
 DK>   role: primary
 DK>   provname: data0
 DK>   localpath: /dev/gpt/data0
 DK>   extentsize: 2097152 (2.0MB)
 DK>   keepdirty: 64
 DK>   remoteaddr: 10.2.101.12
 DK>   sourceaddr: 10.2.101.11
 DK>   replication: fullsync
 DK>   status: complete
 DK>   dirty: 0 (0B)
 DK> data1:
 DK>   role: primary
 DK>   provname: data1
 DK>   localpath: /dev/gpt/data1
 DK>   extentsize: 2097152 (2.0MB)
 DK>   keepdirty: 64
 DK>   remoteaddr: 10.2.101.12
 DK>   sourceaddr: 10.2.101.11
 DK>   replication: fullsync
 DK>   status: complete
 DK>   dirty: 0 (0B)
 DK> data2:
 DK>   role: primary
 DK>   provname: data2
 DK>   localpath: /dev/gpt/data2
 DK>   extentsize: 2097152 (2.0MB)
 DK>   keepdirty: 64
 DK>   remoteaddr: 10.2.101.12
 DK>   sourceaddr: 10.2.101.11
 DK>   replication: fullsync
 DK>   status: complete
 DK>   dirty: 6291456 (6.0MB)
 DK> data3:
 DK>   role: primary
 DK>   provname: data3
 DK>   localpath: /dev/gpt/data3
 DK>   extentsize: 2097152 (2.0MB)
 DK>   keepdirty: 64
 DK>   remoteaddr: 10.2.101.12
 DK>   sourceaddr: 10.2.101.11
 DK>   replication: fullsync
 DK>   status: complete
 DK>   dirty: 0 (0B)

 DK> Sits in this state for over 10 minutes.

 DK> Unfortunately, no KDB in kernel. Any ideas what else to look for?

Could you please try this patch?

http://people.freebsd.org/~trociny/hastd.no_shutdown.patch

After patching you need to rebuild hastd and restart it (I expect doing this
only on the secondary is enough, but it is better to do it on both nodes). No
server restart is needed.

-- 
Mikolaj Golub


Re: HAST instability

2011-06-10 Thread Mikolaj Golub

On Fri, 10 Jun 2011 20:05:43 +0300 Mikolaj Golub wrote to Daniel Kalchev:

 MG> Could you please try this patch?

 MG> http://people.freebsd.org/~trociny/hastd.no_shutdown.patch

Of course, you still need your kernel patched with uipc_socket.c.patch :-)

-- 
Mikolaj Golub


Re: HAST instability

2011-06-14 Thread Mikolaj Golub

On Tue, 14 Jun 2011 16:39:11 +0300 Daniel Kalchev wrote:

 DK> On 10.06.11 20:07, Mikolaj Golub wrote:
 >> On Fri, 10 Jun 2011 20:05:43 +0300 Mikolaj Golub wrote to Daniel Kalchev:
 >>
 >>   MG>  Could you please try this patch?
 >>
 >>   MG>  http://people.freebsd.org/~trociny/hastd.no_shutdown.patch
 >>
 >> Sure you still have to have your kernel patched with uipc_socket.c.patch :-)
 >>

 DK> It is now running for about a day with both patches applied, without
 DK> disconnects.

 DK> Also, now TCP/IP connections always stay in ESTABLISHED state. As I
 DK> believe they should. Primary to secondary drain quickly on switching
 DK> from init to primary etc. No troubles without checksums as
 DK> well. Kernel is as of

Thanks!

It has turned out that automatic receive buffer sizing works only for
connections in ESTABLISHED state. And with a small receive buffer the
connection might get stuck sending data only via TCP window probes -- one byte
every few seconds (see "Scenario to make recv(MSG_WAITALL) stuck" on net@ for
details).

hastd.no_shutdown.patch disables closing of unused directions so the
connections remain in ESTABLISHED state and automatic receive buffer sizing
works again.
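
What "closing an unused direction" means at the socket level can be sketched
with shutdown(2). The example below is an illustration only, not hastd code: it
uses a local AF_UNIX socket pair instead of TCP, and half_close_demo() is a
made-up name. On a TCP connection the same SHUT_WR call sends a FIN, which is
what takes the two ends out of ESTABLISHED and into the FIN_WAIT_2/CLOSE_WAIT
states seen in the netstat output earlier in the thread.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Half-close one direction of a connected stream socket pair and check
 * that the other direction still carries data.
 * Returns 0 if the sockets behave as described, -1 otherwise.
 */
int
half_close_demo(void)
{
	int sv[2];
	char buf[8];
	int ok = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return (-1);

	/* sv[0] is meant to receive only, so close its sending direction. */
	if (shutdown(sv[0], SHUT_WR) != 0)
		goto out;

	/* The peer now sees end-of-file on the closed direction... */
	if (read(sv[1], buf, sizeof(buf)) != 0)
		goto out;

	/* ...but can still send data the other way. */
	if (write(sv[1], "ping", 4) != 4)
		goto out;
	if (read(sv[0], buf, sizeof(buf)) != 4 || memcmp(buf, "ping", 4) != 0)
		goto out;
	ok = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return (ok);
}
```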

uipc_socket.c.patch has been committed to CURRENT and I am going to MFC it soon.

 DK> FreeBSD b1a 8.2-STABLE FreeBSD 8.2-STABLE #1: Mon Jun 13 11:32:38 EEST
 DK> 2011 root@b1a:/usr/obj/usr/src/sys/GENERIC  amd64

 DK> Daniel

-- 
Mikolaj Golub


Re: HAST + ZFS: no action on drive failure

2011-07-02 Thread Mikolaj Golub

On Thu, 30 Jun 2011 20:02:19 -0700 Timothy Smith wrote:

 TS> First posting here, hopefully I'm doing it right =)

 TS> I also posted this to the FreeBSD forum, but I know some hast folks monitor
 TS> this list regularly and not so much there, so...

 TS> Basically, I'm testing failure scenarios with HAST/ZFS. I got two nodes,
 TS> scripted up a bunch of checks and failover actions between the nodes.
 TS> Looking good so far, though more complex than I expected. It would be cool
 TS> to post it somewhere to get some pointers/critiques, but that's another
 TS> thing.

 TS> Anyway, now I'm just seeing what happens when a drive fails on primary
 TS> node. Oddly/sadly, NOTHING!

 TS> Hast just keeps on a ticking, and doesn't change the state of the failed
 TS> drive, so the zpool has no clue the drive is offline. The
 TS> /dev/hast/ remains. The hastd does log some errors to the system
 TS> log like this, but nothing more.

 TS> messages.0:Jun 30 18:39:59 nas1 hastd[11066]: [ada6] (primary) Unable to
 TS> flush activemap to disk: Device not configured.
 TS> messages.0:Jun 30 18:39:59 nas1 hastd[11066]: [ada6] (primary) Local
 TS> request failed (Device not configured): WRITE(4736512, 512).

Although the request to the local drive failed, it succeeded on the remote
node, so no data was lost. The request was therefore considered successful and
no error was returned to ZFS.

 TS> So, I guess the question is, "Do I have to script a cronjob to check for
 TS> these kinds of errors and then change the hast resource to 'init' or
 TS> something to handle this?" Or is there some kind of hastd config setting
 TS> that I need to set? What's the SOP for this?

Currently the only way to know is to monitor the logs. It would not be
difficult to add an event hook for these errors in the HAST code (as is done
for connect/disconnect, syncstart/done etc.) so one could script what to do
when an error occurs, but I am not sure it is a good idea -- the errors may be
generated at a high rate.
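
A log monitor along those lines could start from a simple line classifier. The
sketch below is an assumption-laden illustration, not part of HAST:
is_hast_local_error() is a hypothetical helper, and the substrings it matches
are copied from the log excerpt quoted above, so other hastd versions may word
these messages differently.

```c
#include <string.h>

/*
 * Return 1 if a syslog line looks like a hastd report of a failed local
 * I/O request, 0 otherwise.  The patterns come from the messages quoted
 * in this thread and are not an exhaustive list.
 */
int
is_hast_local_error(const char *line)
{
	/* Only consider lines logged by hastd. */
	if (strstr(line, "hastd[") == NULL)
		return (0);
	if (strstr(line, "Local request failed") != NULL)
		return (1);
	if (strstr(line, "Unable to flush activemap to disk") != NULL)
		return (1);
	return (0);
}
```

Because such errors can arrive at a high rate, a caller would probably want to
rate-limit whatever action it takes on a match.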

 TS> As something related too, when the zpool in FreeBSD does finally notice
 TS> that the drive is missing because I have manually changed the hast resource
 TS> to INIT (so the /dev/hast/ is gone), my zpool (raidz2) hot spare doesn't
 TS> engage, even with "autoreplace=on". The zpool status of the degraded pool
 TS> seems to indicate that I should manually replace the failed drive. If
 TS> that's the case, it's not really a "hot spare". Does this mean the "FMA
 TS> Agent" referred to in the ZFS manual is not implemented in FreeBSD?

 TS> thanks!

-- 
Mikolaj Golub


Re: HAST + ZFS: no action on drive failure

2011-07-03 Thread Mikolaj Golub

On Sat, 2 Jul 2011 14:43:15 -0700 Timothy Smith wrote:

 TS> Hello Mikolaj,

 TS> So, just to be clear, if a local drive fails in my pool, but the
 TS> corresponding remote drive remains available, then hastd will both write to
 TS> and read from the remote drive? That's really very cool!

Yes.

 TS> I looked more closely at the hastd(8) man page. There is some indication of
 TS> what you say, but not so clear:

 TS> "Read operations (BIO_READ) are handled locally unless I/O error occurs or
 TS> local version of the data is not up-to-date yet (synchronization is in
 TS> progress)."

This is about READ operations, and for WRITE we have just above:

 Every write, delete and flush operation (BIO_WRITE,
 BIO_DELETE, BIO_FLUSH) is send to local component and synchronously
 replicated to the remote (secondary) node if it is available.

There might be things that should be improved in the documentation but I don't
feel capable of doing this :-)

 TS> Perhaps this can be modified a bit? Adding, "or the local disk is
 TS> unavailable. In such a case, the I/O operation will be handled by the
 TS> remote resource."

 TS> It does make sense however, since HAST is based on the idea of RAID. This
 TS> feature increases the redundancy of the system greatly. My boss will be
 TS> very impressed, as am I!

 TS> I did notice however that when the pulled drive is reinserted, I need to
 TS> change the associated hast resource to init, then back to primary to allow
 TS> hastd to once again use it (perhaps the same if the secondary drive is
 TS> failed?). Unless it will do this on its own after some time? I did not
 TS> wait more than a few minutes. But this is easy enough to script or to
 TS> monitor the log and present a notification to admin at such a time.

When you are reinserting the drive the resource should be in init state.

Remember, some data was updated on secondary only, so the right sequence of
operations could be:

1) Failover (switch primary to init and secondary to primary).

2) Fix the disk issue.

3) If this is a new drive, recreate HAST metadata on it with hastctl utility.

4) Switch the repaired resource to secondary and wait until the new primary
connects to it and updates metadata. After this, synchronization starts.

5) You can switch back to the previous primary before the synchronization is
complete -- it will continue in the right direction, but then you should expect
performance degradation until the synchronization is complete, because the READ
requests will go to the remote node. So it might be better to wait until the
synchronization is complete before switching back.

-- 
Mikolaj Golub


Re: /usr/bin/script eating 100% cpu with portupgrade and xargs

2011-09-18 Thread Mikolaj Golub

On Sun, 18 Sep 2011 08:47:13 +0200 Ronald Klop wrote:

 RK> On Sun, 18 Sep 2011 07:39:01 +0200, Jeremy Chadwick
 RK>  wrote:

 >> On Sun, Sep 18, 2011 at 12:54:13AM -0400, Jason Hellenthal wrote:
 >>> On Sun, Sep 18, 2011 at 01:49:15AM +0200, Ronald Klop wrote:
 >>> > Hi,
 >>> >
 >>> > I'm running portupgrade in screen to update all the ports for
 >>> > 9-BETA2/9-CURRENT on amd64. While doing this script eats 100% cpu.
 >>> > Because portupgrade -fa crashed I'm running this command to update the
 >>> > remaining non-updates ports.
 >>> > find /var/db/pkg -name +DESC -mtime +2 |cut -d / -f 5 | xargs time nice -n
 >>> > 20 portupgrade -f
 >>> >
 >>> > The output of truss -p `pgrep script` is this:
 >>> > clock_gettime(13,{1316301104.0 })= 0 (0x0)
 >>> > select(5,{0 4},0x0,0x0,{30.00 }) = 1 (0x1)
 >>> > read(0,0x7fffcdf0,1024)  = 0 (0x0)
 >>> > write(4,0x7fffcdf0,0)= 0 (0x0)
 >>> > clock_gettime(13,{1316301104.0 })= 0 (0x0)
 >>> > select(5,{0 4},0x0,0x0,{30.00 }) = 1 (0x1)
 >>> > read(0,0x7fffcdf0,1024)  = 0 (0x0)
 >>> > write(4,0x7fffcdf0,0)= 0 (0x0)
 >>> > clock_gettime(13,{1316301104.0 })= 0 (0x0)
 >>> > select(5,{0 4},0x0,0x0,{30.00 }) = 1 (0x1)
 >>> > read(0,0x7fffcdf0,1024)  = 0 (0x0)
 >>> > write(4,0x7fffcdf0,0)= 0 (0x0)
 >>> > clock_gettime(13,{1316301104.0 })= 0 (0x0)
 >>> > select(5,{0 4},0x0,0x0,{30.00 }) = 1 (0x1)
 >>> > read(0,0x7fffcdf0,1024)  = 0 (0x0)
 >>> > write(4,0x7fffcdf0,0)= 0 (0x0)
 >>> >
 >>> > So it is really fast in reading and writing 0 bytes most of the time.
 >>> >
 >>> > I also found http://web.archiveorange.com/archive/v/6ETvLvjo60Gj9geAUAb6
 >>> > and I think I am better off by rewriting my command so stdin/stdout is
 >>> > still the terminal. Although the link is a couple of years old.
 >>> >
 >>> > Is this known? Can somebody explain to me why my xargs command is not
 >>> > working well?
 >>> >
 >>>
 >>> Are you absolutely sure that its script(1) causing this ? 100% CPU usage
 >>> has been a known side effect of screen(1) for quite some time. Rebuild
 >>> it and try again.
 >>
 >> Jason's referring to this, I believe:
 >> http://www.freebsd.org/cgi/cvsweb.cgi/ports/sysutils/screen/Makefile#rev1.55
 >>
 >> To clarify what the commit message means: it does not mean "when the
 >> package is installed the installation takes up 100% CPU".  It means
 >> "once the package is installed and screen is used, screen takes up 100%
 >> CPU".  I know because I've seen this behaviour in the past (one of the
 >> many, many reasons I build ports from source).
 >>
 >> However:
 >> http://www.freebsd.org/cgi/cvsweb.cgi/ports/sysutils/screen/Makefile#rev1.78
 >>
 >> So: If a binary package is being installed through your above
 >> portupgrade command, and you're seeing this problem, then it sounds to
 >> me like commit revision 1.78 is a regression and NO_PACKAGE should be
 >> put back into place + packages removed from all mirrors.
 >>
 >> There are many reasons to not use GNU screen at all, or if you must have
 >> something like it, use tmux.  I recently had to provide an analysis of
 >> how GNU screen destroys one's terminal[1]; so if the above problem turns
 >> out to be caused by GNU screen as well, I'll just add it to my
 >> ever-growing list of reasons the software should be nuked from orbit.
 >>
 >> Otherwise, if this turns out to be a problem with portupgrade (which you
 >> found some evidence supporting such), then the solution is simple: stop
 >> using portupgrade, use portmaster (if it lacks things you need ask Doug
 >> Barton, he's incredibly receptive to adding new features/fixing things).
 >> Two databases that aren't compatible, ruby shims, and other crap = not
 >> worth it.  Think the database ordeal is long over with/fixed/whatever?
 >> It isn't[2].
 >>
 >> [1]:
 >> http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/063052.html
 >> [2]:
 >> [...]

Re: /usr/bin/script eating 100% cpu with portupgrade and xargs

2011-09-18 Thread Mikolaj Golub

On Sun, 18 Sep 2011 13:25:26 +0200 Ronald Klop wrote:

 RK> It is a while since I programmed C, but why will writing 0 bytes give
 RK> the reader an end-of-file? Shouldn't the fd be closed to indicate
 RK> end-of-file?

AFAIR, this trick of writing 0 bytes is there to emulate EOF, because we can't
close the fd -- we still want to read from it. A poor man's shutdown(2) for a
non-socket :-).

Colin might tell more...

-- 
Mikolaj Golub


Re: /usr/bin/script eating 100% cpu with portupgrade and xargs

2011-09-18 Thread Mikolaj Golub

On Sun, 18 Sep 2011 20:24:23 +0300 Kostik Belousov wrote:

 KB> On Sun, Sep 18, 2011 at 02:54:34PM +0300, Mikolaj Golub wrote:
 >> 
 >> On Sun, 18 Sep 2011 13:25:26 +0200 Ronald Klop wrote:
 >> 
 >>  RK> It is a while since I programmed C, but why will writing 0 bytes give
 >>  RK> the reader an end-of-file? Shouldn't the fd be closed to indicate
 >>  RK> end-of-file?
 >> 
 >> AFAIR, this trick with writing 0 to emulate EOF because we can't close the 
 >> fd
 >> -- we still want to read from it.  Poor shutdown(2) for non-socket :-).
 >> 
 >> Colin might tell more...

 KB> Please note that interpreting the receiving of 0 bytes on the terminal 
 KB> as EOF is only a convention. If done absolutely properly, script shall
 KB> not interpret zero-byte read as EOF. Might be, the reasonable thing to
 KB> do would be to only look at the stdin once in a second after receiving
 KB> zero-bytes, and switching it back to normal mode if something is read.

Ok. I see. Below is the patch that does something like this.

-- 
Mikolaj Golub

Index: usr.bin/script/script.c
===
--- usr.bin/script/script.c	(revision 225653)
+++ usr.bin/script/script.c	(working copy)
@@ -53,6 +53,7 @@ static const char sccsid[] = "@(#)script.c	8.1 (Be
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -86,6 +87,7 @@ main(int argc, char *argv[])
 	char ibuf[BUFSIZ];
 	fd_set rfd;
 	int flushtime = 30;
+	bool readstdin;
 
 	aflg = kflg = 0;
 	while ((ch = getopt(argc, argv, "aqkt:")) != -1)
@@ -155,19 +157,20 @@ main(int argc, char *argv[])
 		doshell(argv);
 	close(slave);
 
-	if (flushtime > 0)
-		tvp = &tv;
-	else
-		tvp = NULL;
-
-	start = time(0);
-	FD_ZERO(&rfd);
+	start = tvec = time(0);
+	readstdin = true;
 	for (;;) {
+		FD_ZERO(&rfd);
 		FD_SET(master, &rfd);
-		FD_SET(STDIN_FILENO, &rfd);
-		if (flushtime > 0) {
-			tv.tv_sec = flushtime;
+		if (readstdin)
+			FD_SET(STDIN_FILENO, &rfd);
+		if (!readstdin || flushtime > 0) {
+			tv.tv_sec = !readstdin ? 1 : flushtime - (tvec - start);
 			tv.tv_usec = 0;
+			tvp = &tv;
+			readstdin = true;
+		} else {
+			tvp = NULL;
 		}
 		n = select(master + 1, &rfd, 0, 0, tvp);
 		if (n < 0 && errno != EINTR)
@@ -176,8 +179,10 @@ main(int argc, char *argv[])
 			cc = read(STDIN_FILENO, ibuf, BUFSIZ);
 			if (cc < 0)
 				break;
-			if (cc == 0)
+			if (cc == 0) {
 				(void)write(master, ibuf, 0);
+				readstdin = false;
+			}
 			if (cc > 0) {
 				(void)write(master, ibuf, cc);
 				if (kflg && tcgetattr(master, &stt) >= 0 &&

Re: /usr/bin/script eating 100% cpu with portupgrade and xargs

2011-10-04 Thread Mikolaj Golub
On Sun, Sep 18, 2011 at 1:58 PM, Mikolaj Golub  wrote:
> [...]

Re: /usr/bin/script eating 100% cpu with portupgrade and xargs

2011-10-06 Thread Mikolaj Golub

On Tue, 04 Oct 2011 18:34:07 +0200 Michiel Boland wrote:

 MB> On 10/04/2011 13:15, Mikolaj Golub wrote:
 >> On Sun, Sep 18, 2011 at 1:58 PM, Mikolaj Golub  wrote:
 MB> [...]
 >>>
 >>> I believe the behaviour is after this commit:
 >>>
 >>> http://svnweb.freebsd.org/base?view=revision&revision=125848
 >>>
 >>> I think we should skip select on STDIN after reading EOF from it, like in
 >>> the patch below.
 >>
 >> For the record. The issue has been fixed in CURRENT and the fix has
 >> been merged to STABLE.
 >>
 >> Thanks Kostik and Chris for their comments and suggestions.
 >>

 MB> Does this mean that bin/72501 can be closed?

Yes, thanks for pointing that out. Closed.

-- 
Mikolaj Golub


Re: /usr/bin/script eating 100% cpu with portupgrade and xargs

2011-10-15 Thread Mikolaj Golub

On Wed, 12 Oct 2011 23:25:35 +0100 Adrian Wontroba wrote:

 AW> On Sat, Oct 08, 2011 at 01:27:07AM +0100, Adrian Wontroba wrote:
 >> I won't be in a position to create a simpler test case, raise a PR or
 >> try patches till Tuesday evening (UK) at the earliest.

 AW> So far I have been unable to reproduce the problem with portupgrade (and
 AW> will probably move to portmaster).

 AW> I have however found a different but possibly related problem with the
 AW> new version of script in RELENG_8, for which I have raised this PR:

 AW> misc/161526: script outputs corrupt if input is not from a terminal

As Jilles wrote, ^D\b\b are echoed by the terminal when script sends VEOF
to the program being scripted.

In my recent commit r225809 the intention was to send VEOF only once if STDIN
was not a terminal. Unfortunately the fix was incorrect and for flushtime > 0 it
keeps sending VEOF. That is why you are observing a series of ^D\b\b characters.

I am going to commit the attached patch to HEAD, which fixes this. But we will
still have one ^D\b\b in the output.

-- 
Mikolaj Golub
Index: usr.bin/script/script.c
===
--- usr.bin/script/script.c	(revision 226349)
+++ usr.bin/script/script.c	(working copy)
@@ -163,12 +163,15 @@ main(int argc, char *argv[])
 		FD_SET(master, &rfd);
 		if (readstdin)
 			FD_SET(STDIN_FILENO, &rfd);
-		if ((!readstdin && ttyflg) || flushtime > 0) {
-			tv.tv_sec = !readstdin && ttyflg ? 1 :
-			flushtime - (tvec - start);
+		if (!readstdin && ttyflg) {
+			tv.tv_sec = 1;
 			tv.tv_usec = 0;
 			tvp = &tv;
 			readstdin = 1;
+		} else if (flushtime > 0) {
+			tv.tv_sec = flushtime - (tvec - start);
+			tv.tv_usec = 0;
+			tvp = &tv;
 		} else {
 			tvp = NULL;
 		}
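
The timeout selection in the patch above can be read as a small pure function.
This is a sketch for illustration, not the code in script.c: pick_timeout() is
a hypothetical name, and the elapsed parameter stands for (tvec - start) in the
original. It shows that stdin is re-armed only in the one-second terminal
branch, so a non-terminal stdin that has hit EOF is never selected on again.

```c
#include <stddef.h>
#include <sys/time.h>

/*
 * Choose the select(2) timeout the way the patched loop does.
 * Returns tv filled in, or NULL to block indefinitely.
 */
struct timeval *
pick_timeout(int *readstdin, int ttyflg, int flushtime, long elapsed,
    struct timeval *tv)
{
	if (!*readstdin && ttyflg) {
		/* Terminal stdin returned 0 bytes: look at it again in 1s. */
		tv->tv_sec = 1;
		tv->tv_usec = 0;
		*readstdin = 1;
		return (tv);
	}
	if (flushtime > 0) {
		/* Wake up in time for the next periodic flush. */
		tv->tv_sec = flushtime - elapsed;
		tv->tv_usec = 0;
		return (tv);
	}
	return (NULL);
}
```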

Re: /usr/bin/script eating 100% cpu with portupgrade and xargs

2011-10-15 Thread Mikolaj Golub

On Fri, 14 Oct 2011 14:03:37 +0200 Jilles Tjoelker wrote:

 JT> On Wed, Oct 12, 2011 at 11:25:35PM +0100, Adrian Wontroba wrote:
 >> On Sat, Oct 08, 2011 at 01:27:07AM +0100, Adrian Wontroba wrote:
 >> > I won't be in a position to create a simpler test case, raise a PR or
 >> > try patches till Tuesday evening (UK) at the earliest.

 >> So far I have been unable to reproduce the problem with portupgrade (and
 >> will probably move to portmaster).

 >> I have however found a different but possibly related problem with the
 >> new version of script in RELENG_8, for which I have raised this PR:

 >> misc/161526: script outputs corrupt if input is not from a terminal

 >> Blast, should of course been bin/

 JT> The extra ^D\b\b are the EOF character being echoed. These EOF
 JT> characters are being generated by the new script(1) to pass through the
 JT> EOF condition on stdin.

 JT> One fix would be to change the termios settings temporarily to disable
 JT> the echoing but this may cause problems if the application is changing
 JT> termios settings concurrently and generally feels bad.

 JT> It may be best to remove writing EOF characters, perhaps adding an
 JT> option to enable it again if there is a concrete use case for it.

Without passing EOF to the program being scripted the following command
will hang forever:

echo 1 |script /tmp/script.out cat

-- 
Mikolaj Golub


Re: /usr/bin/script eating 100% cpu with portupgrade and xargs

2011-10-15 Thread Mikolaj Golub

On Fri, 14 Oct 2011 22:50:32 +0200 Stefan Bethke wrote:

 SB> I finally figured out why my ports aren't updating anymore: when running
 SB> portupgrade -a --batch from cron, stdin is /dev/null, and that produces the
 SB> gobs of ^D in the output, as well as the script file that portupgrade
 SB> creates.  What's worse is that the upgrade never completes.

 SB> You can easily see this for yourself:
 SB> # portupgrade -a --batch
 SB> This is on 8-stable from October 5th.

Could you please try the patch I attached to another mail of mine in this
thread to see if it helps?

-- 
Mikolaj Golub


Re: /usr/bin/script eating 100% cpu with portupgrade and xargs

2011-10-15 Thread Mikolaj Golub

On Sat, 15 Oct 2011 11:50:22 +0200 Stefan Bethke wrote:

 SB> Am 15.10.2011 um 09:36 schrieb Mikolaj Golub:

 >> 
 >> On Fri, 14 Oct 2011 22:50:32 +0200 Stefan Bethke wrote:
 >> 
 >> SB> I finally figured out why my ports aren't updating anymore: when 
 >> running portupgrade -a --batch from cron, stdin is /dev/null, and that 
 >> produces the gobs of ^D in the output, as well as the script file that 
 >> portupgrade creates.  What's worse is that the upgrade never completes.
 >> 
 >> SB> You can easily see this for yourself:
 >> SB> # portupgrade -a --batch > 
 >> SB> This is on 8-stable from October 5th.
 >> 
 >> Could you please try the patch I attached to another my mail in this thread 
 >> to
 >> see if it helps?


 SB> Seems to do the trick, thanks!

Thanks for testing! Committed. I am going to MFC it soon.

-- 
Mikolaj Golub


Re: problems with hast

2012-01-28 Thread Mikolaj Golub

Hi, 

On Wed, 18 Jan 2012 20:23:25 +0200 Artem Kajalainen wrote:

 AK> Hello,

 AK> I'm trying to set up hastd on two servers and got an error which I can't
 AK> understand. The box is running as primary, then I reboot it, the other box
 AK> gets the primary role by carp events, then the 1st box at boot tries to set
 AK> up the primary role on its own hast instance and fails with this:
 AK> Jan 18 22:13:03 gw_chlb_2 hastd[1387]: [storage0] (primary)
 AK> G_GATE_CMD_DONE failed: No such file or directory.
 AK> Jan 18 22:13:08 gw_chlb_2 hastd[1004]: [storage0] (primary) Worker
 AK> process exited ungracefully (pid=1387, exitcode=71).

 AK> I thought that geom_gate module can be problem, so i compiled it in
 AK> kernel. As you can see - it doesn't help. Both servers are
 AK> FreeBSD9.0-stable, updated 1 week ago. Hastd use whole disk. More info
 AK> from hastd:
 AK> gw_chlb_2# hastd -dF -c /etc/hast.conf
 AK> [INFO] Started successfully, running protocol version 1.
 AK> [DEBUG][1] Listening on control address /var/run/hastctl.
 AK> [INFO] Listening on address 192.168.0.1:8457.
 AK> [INFO] [storage0] (init) Role changed to primary.
 AK> [DEBUG][1] [storage0] (primary) Obtained info about /dev/ada2.
 AK> [DEBUG][1] [storage0] (primary) Locked /dev/ada2.
 AK> [INFO] [storage0] (primary) Device hast/storage0 created.
 AK> [DEBUG][1] [storage0] (primary) Privileges successfully dropped using
 AK> jail+setgid+setuid.
 AK> [INFO] [storage0] (primary) Privileges successfully dropped.
 AK> [INFO] [storage0] (primary) Connected to tcp4://192.168.0.2.
 AK> [INFO] [storage0] (primary) Synchronization started. 6.0MB to go.
 AK> [ERROR] [storage0] (primary) G_GATE_CMD_DONE failed: No such file or directory.
 AK> [INFO] [storage0] (primary) Received cancel from the kernel, exiting.
 AK> [DEBUG][1] Unable to receive event header: Socket is not connected.
 AK> [ERROR] [storage0] (primary) Worker process exited ungracefully
 AK> (pid=1452, exitcode=71).
 AK> [INFO] [storage0] (primary) Changing resource role back to init.

 AK> Any thoughts?

Sorry, Artem, I read your email only today.

Investigating, it looks like after r226859, when 'async' mode was added, we have 2
issues with synchronization from secondary to primary (normally a very rare
case):

1) When the synchronization from secondary to primary is running and the primary
gets a READ request, the request should be sent to the secondary but actually it
is lost. As a result the READ operation gets stuck. After the synchronization is
complete, subsequent READ requests, which can now be served by the primary, work
ok.

2) In async mode, for synchronization requests, the write_complete() function,
which sends the G_GATE_CMD_DONE command to ggate, is called twice and the second
call fails.

Artem, did you run async mode? If you did then I suppose you observed the
second issue. Could you please try the attached patch?

-- 
Mikolaj Golub

Index: sbin/hastd/primary.c
===
--- sbin/hastd/primary.c	(revision 230661)
+++ sbin/hastd/primary.c	(working copy)
@@ -1255,7 +1255,7 @@ ggate_recv_thread(void *arg)
 		pjdlog_debug(2,
 		"ggate_recv: (%p) Moving request to the send queues.", hio);
 		refcount_init(&hio->hio_countdown, ncomps);
-		for (ii = ncomp; ii < ncomps; ii++)
+		for (ii = ncomp; ncomps != 0; ncomps--, ii++)
 			QUEUE_INSERT1(hio, send, ii);
 	}
 	/* NOTREACHED */
@@ -1326,7 +1326,7 @@ local_send_thread(void *arg)
 			} else {
 hio->hio_errors[ncomp] = 0;
 if (hio->hio_replication ==
-HAST_REPLICATION_ASYNC) {
+HAST_REPLICATION_ASYNC && !ISSYNCREQ(hio)) {
 	ggio->gctl_error = 0;
 	write_complete(res, hio);
 }

Re: 9.0 Stable unable to buildworld, missing KERN_PROC_ENV in kvm_proc.c

2012-02-05 Thread Mikolaj Golub

On Sun, 5 Feb 2012 20:09:08 +1100 Dewayne wrote:

 D> Unfortunately 9.0 Stable fails to compile due to missing declaration of
 D> KERN_PROC_ENV in /usr/src/lib/libkvm/kvm_proc.c.  csup'ed from today.

 D> Please refer to the following changes on 30-Jan-2012: 
 D> 
http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libkvm/kvm_proc.c.diff?r1=1.106.2.1;r2=1.106.2.2;f=h

 D> Compile error reads:
 D> cc -O2 -pipe -pipe -O2 -g0 -DSTRIP_FBSDID -UDEBUGGING -march=prescott 
-mtune=prescott  -DLIBC_SCCS -I/usr/src/lib/libkvm
 D> -DNDEBUG -std=gnu99 -fstack-protector -Wsystem-headers -Wall 
-Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes
 D> -Wmissing-prototypes -Wpointer-arith -Wno-uninitialized -Wno-pointer-sign 
-c /usr/src/lib/libkvm/kvm_proc.c
 D> /usr/src/lib/libkvm/kvm_proc.c: In function 'kvm_argv':
 D> /usr/src/lib/libkvm/kvm_proc.c:663: error: 'KERN_PROC_ENV' undeclared 
(first use in this function)
 D> /usr/src/lib/libkvm/kvm_proc.c:663: error: (Each undeclared identifier is 
reported only once
 D> /usr/src/lib/libkvm/kvm_proc.c:663: error: for each function it appears in.)

 D> Am I the last person using i386 architecture?  ;) I'm half joking. The
 D> buildworld completes successfully for architecture=amd64.

And there should not be problems with i386 either. The error does not look
architecture specific. Could you please recheck your sources and build
procedure and give more details if the error still exists?

KERN_PROC_ENV is declared in sys/sys/sysctl.h, and this was MFCed in r230754,
before the MFC of lib/libkvm (r230780) you are referring to.

-- 
Mikolaj Golub


Re: problems with hast

2012-02-05 Thread Mikolaj Golub

On Sun, 5 Feb 2012 10:27:54 +0100 Pawel Jakub Dawidek wrote:

 PJD> The analysis and fixes look good to me, please go ahead and commit
 PJD> (small nits below).

Thanks. Committed.

-- 
Mikolaj Golub


Re: Issue with hast replication

2012-03-11 Thread Mikolaj Golub

On Sun, 11 Mar 2012 19:54:57 +0100 Phil Regnauld wrote:

 PR> Hi,

 PR> I've got a fairly simple setup: two hosts running 9.0-R (will upgrade to 
stable
 PR> if told to, but want to check here first), ZFS and HAST. HAST is 
configured to
 PR> run on top of zvols configured on each host, as illustrated:

 PR>      FS                   FS
 PR>   +------+             +------+
 PR>   | hvol | <- hastd -> | hvol |
 PR>   +------+             +------+
 PR>   | zvol |             | zvol |
 PR>   +------+             +------+
 PR>   | zfs  |             | zfs  |
 PR>   +------+             +------+
 PR>      h1                   h2

 PR> Connection is gigabit to the same switch. No issues with large TCP
 PR> transfers such as SCP/FTP.

 PR> Config is vanilla:

 PR> # zfs create -V 10G zfs/hvol

 PR> hast.conf:

 PR> resource hvol {
 PR> on h1 {
 PR> local /dev/zvol/zfs/hvol
 PR> remote tcp4://192.168.1.100
 PR> }
 PR> on h2 {
 PR> local /dev/zvol/zfs/hvol
 PR> remote tcp4://192.168.1.200
 PR> }
 PR> }


 PR> h1 is behaving fine as primary, either with h2 turned off or in init -
 PR> but as soon as I set the role to secondary for h2, the receiver
 PR> repeatedly crashes and restarts - see the traces below.

 PR> Primary:

 PR> Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
 PR> Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
 PR> Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31642091520, 131072).

31642091520 looks like a rather large offset for a 10GB volume...
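A quick way to check this arithmetic is sketched below. It assumes the 10G
volume means 10 * 2^30 bytes; request_in_bounds() and gib() are hypothetical
helper names, not anything from hastd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if the byte range [offset, offset + length) fits inside
 * a volume of `size` bytes.  Written so that it cannot overflow.
 */
static bool
request_in_bounds(uint64_t offset, uint64_t length, uint64_t size)
{
	assert(length > 0);
	return (offset < size && length <= size - offset);
}

/* Converts a size given in GiB to bytes (assumes 1 GiB = 2^30 bytes). */
static uint64_t
gib(uint64_t n)
{
	return (n << 30);
}
```

For the logged request, request_in_bounds(31642091520ULL, 131072, gib(10))
is false: the offset is about 29.5 GiB, well past the end of a 10 GiB volume.
The offset itself is an exact multiple of the 131072-byte block size.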

Just to be more confident that this is a HAST issue could you please try the
following experiment?

1) Stop hastd on h2.

2) On h1 run something like below:

  dd if=/dev/zvol/zfs/hvol bs=131072 | ssh h2 dd bs=131072 of=/dev/zvol/zfs/hvol

(copy hvol from h1 to h2 without hastd to see if it will succeed).

Note: you will need to recreate the HAST provider on the secondary after this.

-- 
Mikolaj Golub


Re: Issue with hast replication

2012-03-12 Thread Mikolaj Golub

On Mon, 12 Mar 2012 15:31:27 +0100 Phil Regnauld wrote:

 PR> Phil Regnauld (regnauld) writes:
 >> 
 >> 7) ktrace on the destination dd:
 >> 
 >> fstat(0,{ mode=p- ,inode=5,size=16384,blksize=4096 }) = 0 (0x0)
 >> lseek(0,0x0,SEEK_CUR)ERR#29 'Illegal seek'

 PR> [...]

 >> Illegal seek, eh ? Any clues ?
 >> 
 >> The boxes are identical (HP DL380 G6), though the RAM config is 
 >> different.
 >> 
 >> Summary:
 >> 
 >> - ssh works fine
 >> - h1 zvol to h2 zvol over ssh fails
 >> - h1 zvol to h2 /tmp/x over ssh is fine
 >> - h2 /dev/zero locally to h2 zvol is fine
 >> - h2 /tmp/x locally to h2 zvol fails at first, but works afterwards...

 PR> A few more data points: dd from a local zvol to a local zvol on either
 PR> machine works fine.

 PR> Using nc instead of ssh, this time it's the sender nc dying:

 PR> ktrace on the sender:

 PR> 47704 nc   CALL  write(0x3,0x7fff5450,0x800)
 PR> 47704 nc   RET   write -1 errno 32 Broken pipe
 PR> 47704 nc   PSIG  SIGPIPE SIG_DFL code=0x10006

 PR> truss on the sender:

 PR> poll({3/POLLIN 0/POLLIN},2,-1)   = 2 (0x2)
 PR> read(3,0x7fff5450,2048)  ERR#54 'Connection 
reset by peer'
 PR> close(3) = 0 (0x0)


 PR> On tcpdump, I do see the receiver send a FIN when using nc.
 PR> When using ssh, the sender is sending the FIN.

 PR> Anything else I can look for ?

It looks like in the case of hastd it was send(2) that returned ENOMEM, but
it would be good to check. Could you please start synchronization again,
ktrace the primary worker process when ENOMEM errors are observed and show the
output here?

If it is send(2) that fails then monitoring netstat and network driver
statistics might be helpful. Something like

netstat -nax
netstat -naT
netstat -m
netstat -nid

sysctl -a dev.

And maybe

vmstat -m
vmstat -z

-- 
Mikolaj Golub


Re: Issue with hast replication

2012-03-13 Thread Mikolaj Golub

On Tue, 13 Mar 2012 00:22:23 +0100 Phil Regnauld wrote:

 PR> Mikolaj Golub (to.my.trociny) writes:
 >> 
 >> It looks like in the case of hastd this was send(2) who returned ENOMEM, but
 >> it would be good to check. Could you please start synchronization again,
 >> ktrace primary worker process when ENOMEM errors are observed and show 
 >> output
 >> here?

 PR> Ok, took a little while, as running ktrace on the hastd does slow it 
down
 PR> significantly, and the error normally occurs at 30-90 sec intervals.

 PR>0x0f90 b2f3 3ad5 e657 7f0f 3e50 698f 5deb 12af  |..:..W..>Pi.]...|
 PR>0x0fa0 740d c343 6e80 75f3 e1a7 bfdf a4c1 f6a6  |t..Cn.u.|
 PR>0x0fb0 ea85 655d e423 bd5e 42f7 7e9a 05d2 363a  |..e].#.^B.~...6:|
 PR>0x0fc0 025e a7b5 0956 417c f31c a6eb 2cd9 d073  |.^...VA|,..s|
 PR>0x0fd0 2589 e8c0 d76a 889f 8345 eeaf f2a0 c2d6  |%j...E..|
 PR>0x0fe0 b89e aaef fee2 6593 e515 7271 88aa cf66  |..e...rq...f|
 PR>0x0ff0 d272 411a 7289 d6c9 6643 bdbe 3c8c 8ae8  |.rA.r...fC..<...|
 PR>  50959 hastdRET   sendto 32768/0x8000
 PR>  50959 hastdCALL  
sendto(0x6,0x8024bf000,0x8000,0x2,0,0)
 PR>  50959 hastdRET   sendto -1 errno 12 Cannot allocate memory
 PR>  50959 hastdCALL  clock_gettime(0xd,0x7f3f86f0)
 PR>  50959 hastdRET   clock_gettime 0
 PR>  50959 hastdCALL  getpid
 PR>  50959 hastdRET   getpid 50959/0xc70f
 PR>  50959 hastdCALL  sendto(0x3,0x7f3f8780,0x84,0,0,0)
 PR>  50959 hastdGIO   fd 3 wrote 132 bytes
 PR>"<27>Mar 12 23:42:43 hastd[50959]: [hvol] (primary) Unable to sen\
 PR> d request (Cannot allocate memory): WRITE(8626634752, 131072)."  
 PR>  50959 hastdRET   sendto 132/0x84
 PR>  50959 hastdCALL  close(0x7)
 PR>  50959 hastdRET   close 0

Ok. So it is send(2). I suppose the network driver could generate the
error. Did you mention which network adapter you have?

 >> If it is send(2) who fails then monitoring netstat and network driver
 >> statistics might be helpful. Something like
 >> 
 >> netstat -nax
 >> netstat -naT
 >> netstat -m
 >> netstat -nid

 PR> I could run this in a loop, but that would be a lot of data, and might
 PR> not be appropriate to paste here.

 PR> I didn't see any obvious errors, but I'm not sure what I'm looking for.
 PR> netstat -m didn't show anything close to running out of buffers or
 PR> clusters...

 >> sysctl -a dev.
 >>
 >> And may be
 >> 
 >> vmstat -m
 >> vmstat -z

 PR> No obvious errors there either, but again what should I look out for ?

I would look at the sysctl -a dev. statistics and try to find out whether there
is a correlation between ENOMEM failures and growth of the error counters.

 PR> In the meantime, I've also experimented with a few different 
scenarios, and
 PR> I'm quite puzzled.

 PR> For instance, I configured one of the other gigabit cards on each host 
to
 PR> provide a dedicated replication network. The main difference is that up
 PR> until now this has been running using tagged vlans. To be on the safe 
side,
 PR> I decided to use an untagged interface (the second gigabit adapter in 
each
 PR> machine).
 PR> 
 PR> Here's where I observed, and it is very odd:
 PR> 
 PR> - doing a dd ... | ssh dd fails in the same fashion as before

 PR> - I created a second zvol + hast resource of just 1 GB, and it 
replicated
 PR>   without any problems, peaking at 75 MB / sec (!) - maybe 1GB is too 
small
 PR>   ?
 PR> 
 PR>   (side note: hastd doesn't pick up configuration changes even with 
SIGHUP,
 PR>which makes it hard to provision new resources on the fly) 

 PR> - I restarted replication on the 100 G hast resource, and it's 
currently
 PR>   replicating without any problems over the second ethernet, but it's
 PR>   dragging along at 9-10 MB/sec, peaking at 29 MB/sec occasionally.

Looking at buffer usage in 'netstat -nax' output taken during synchronization
(on both hosts) could provide useful info about where the bottleneck is. top -HS
output might be useful too.

 PR>   Earlier, I was observing peaks at 65-70 MB sec in between failures...

 PR> So I don't really know what to conclude :-| 

-- 
Mikolaj Golub


Re: Issue with hast replication

2012-03-13 Thread Mikolaj Golub

On Tue, 13 Mar 2012 22:19:28 +0100 Phil Regnauld wrote:

 PR> dev.bce.0.l2fhdr_error_count: 0
 PR> dev.bce.0.stat_emac_tx_stat_dot3statsinternalmactransmiterrors: 0
 PR> dev.bce.0.stat_Dot3StatsCarrierSenseErrors: 0
 PR> dev.bce.0.stat_Dot3StatsFCSErrors: 0
 PR> dev.bce.0.stat_Dot3StatsAlignmentErrors: 0

What about failed counters like mbuf_alloc_failed_count,
dma_map_addr_rx_failed_count, dma_map_addr_tx_failed_count?

-- 
Mikolaj Golub


Re: Issue with hast replication

2012-03-17 Thread Mikolaj Golub

On Tue, 13 Mar 2012 00:22:23 +0100 Phil Regnauld wrote:

 PR>   (side note: hastd doesn't pick up configuration changes even with 
SIGHUP,
 PR>which makes it hard to provision new resources on the fly) 

I just tried to reproduce this and failed. For me a new resource was added
without problems on reload.

Mar 17 20:04:24 kopusha hastd[52678]: Reloading configuration...
Mar 17 20:04:24 kopusha hastd[52678]: Keep listening on address 0.0.0.0:7771.
Mar 17 20:04:24 kopusha hastd[52678]: Resource rtest added.
Mar 17 20:04:24 kopusha hastd[52678]: Configuration reloaded successfully.

You sent SIGHUP to the master process, and on both hosts, didn't you?

Could you please provide more details (configuration, log messages) if you
still fail to add new resources on the fly?

-- 
Mikolaj Golub


Re: svn commit: r233953 - stable/8/usr.bin/procstat

2012-04-08 Thread Mikolaj Golub

On Sun, 8 Apr 2012 17:12:18 -0400 Jason Hellenthal wrote:

 JH> This commit in action does not seem to be doing the correct thing even
 JH> though it does report an error when kern.proc.pathname is not known.

 JH> Running procstat -a -b produces:
 JH> [...]
 JH> 1848 ksh803500 /bin/ksh
 JH> procstat: sysctl: kern.proc.pathname: 2208: No such file or directory

Please note, the error was generated by the kern.proc.pathname sysctl, not
kern.proc.osrel. I suppose it is because the /bin/ksh binary had been reinstalled
and 1848 ran the old binary.

My commit has not touched the kern.proc.pathname part and the behavior was the
same before the change.

 JH> 2210 ksh803500 /bin/ksh
 JH> [...]

 JH> While procstat -a produces:
 JH> [...]
 JH> 1848  1846  1848  1848  1848   1 jhellenthal wait  FreeBSD ELF32 ksh   
  
 JH> 2208  1814  2208  2208 0   1 jhellenthal selectFreeBSD ELF32 xterm
 JH> 2210  2208  2210  2210  2210   1 jhellenthal wait  FreeBSD ELF32 ksh
 JH> [...]

 JH> If process 2208 can be seen during (procstat -a) I do not see a reason
 JH> to bailout and print an error when (-b) is used. Just print the
 JH> orrelease as (0) and print the rest of the information that should be
 JH> seen...

 JH> Could someone have a closer look at this?


 JH> On Fri, Apr 06, 2012 at 04:32:29PM +, Mikolaj Golub wrote:
 >> Author: trociny
 >> Date: Fri Apr  6 16:32:29 2012
 >> New Revision: 233953
 >> URL: http://svn.freebsd.org/changeset/base/233953
 >> 
 >> Log:
 >>   MFC r233390:
 >>   
 >>   When displaying binary information show also osreldate.
 >>   
 >>   Suggested by:kib
 >> 
 >> Modified:
 >>   stable/8/usr.bin/procstat/procstat.1
 >>   stable/8/usr.bin/procstat/procstat_bin.c
 >> Directory Properties:
 >>   stable/8/usr.bin/procstat/   (props changed)
 >> 
 >> Modified: stable/8/usr.bin/procstat/procstat.1
 >> ==
 >> --- stable/8/usr.bin/procstat/procstat.1Fri Apr  6 16:31:29 2012
 >> (r233952)
 >> +++ stable/8/usr.bin/procstat/procstat.1Fri Apr  6 16:32:29 2012
 >> (r233953)
 >> @@ -25,7 +25,7 @@
 >>  .\"
 >>  .\" $FreeBSD$
 >>  .\"
 >> -.Dd March 7, 2010
 >> +.Dd March 23, 2012
 >>  .Dt PROCSTAT 1
 >>  .Os
 >>  .Sh NAME
 >> @@ -98,6 +98,8 @@ Display the process ID, command, and pat
 >>  process ID
 >>  .It COMM
 >>  command
 >> +.It OSREL
 >> +osreldate for process binary
 >>  .It PATH
 >>  path to process binary (if available)
 >>  .El
 >> 
 >> Modified: stable/8/usr.bin/procstat/procstat_bin.c
 >> ==
 >> --- stable/8/usr.bin/procstat/procstat_bin.cFri Apr  6 16:31:29 
 >> 2012(r233952)
 >> +++ stable/8/usr.bin/procstat/procstat_bin.cFri Apr  6 16:32:29 
 >> 2012(r233953)
 >> @@ -42,11 +42,11 @@ void
 >>  procstat_bin(pid_t pid, struct kinfo_proc *kipp)
 >>  {
 >>  char pathname[PATH_MAX];
 >> -int error, name[4];
 >> +int error, osrel, name[4];
 >>  size_t len;
 >>  
 >>  if (!hflag)
 >> -printf("%5s %-16s %-53s\n", "PID", "COMM", "PATH");
 >> +printf("%5s %-16s %8s %s\n", "PID", "COMM", "OSREL", 
 >> "PATH");
 >>  
 >>  name[0] = CTL_KERN;
 >>  name[1] = KERN_PROC;
 >> @@ -64,7 +64,19 @@ procstat_bin(pid_t pid, struct kinfo_pro
 >>  if (len == 0 || strlen(pathname) == 0)
 >>  strcpy(pathname, "-");
 >>  
 >> +name[2] = KERN_PROC_OSREL;
 >> +
 >> +len = sizeof(osrel);
 >> +error = sysctl(name, 4, &osrel, &len, NULL, 0);
 >> +if (error < 0 && errno != ESRCH) {
 >> +warn("sysctl: kern.proc.osrel: %d", pid);
 >> +return;
 >> +}
 >> +if (error < 0)
 >> +return;
 >> +
 >>  printf("%5d ", pid);
 >>  printf("%-16s ", kipp->ki_comm);
 >> +printf("%8d ", osrel);
 >>  printf("%s\n", pathname);
 >>  }
 >> ___
 >> svn-src-stabl...@freebsd.org mailing list
 >> http://lists.freebsd.org/mailman/listinfo/svn-src-stable-8
 >> To unsubscribe, send any mail to "svn-src-stable-8-unsubscr...@freebsd.org"

 JH> -- 
 JH> ;s =;

-- 
Mikolaj Golub


Re: nfe0 loses network connectivity (8.0-RELEASE-p2)

2010-06-07 Thread Mikolaj Golub

On Mon, 7 Jun 2010 16:06:11 +0200 Olaf Seibert wrote:

 OS> I do get the impression there is a mbuf leak somehow. On a much older
 OS> file server (FreeBSD 6.1, serves a bit of NFS but has no ZFS) the mbuf
 OS> cluster useage is much lower, despite a longer uptime:

 OS> 256/634/890/25600 mbuf clusters in use (current/cache/total/max)

 OS> Also, it shows signs that measures are taken in case of mbuf shortage:

 OS> 2259806/466391/598621 requests for mbufs denied 
(mbufs/clusters/mbuf+clusters)
 OS> 1016 calls to protocol drain routines

 OS> whereas the FreeBSD 8.0 machine has zero or very low numbers:

 OS> 0/3956/1959 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
 OS> 0 calls to protocol drain routines

 OS> and useage keeps growing:

 OS> 26122/1782/27904/32768 mbuf clusters in use (current/cache/total/max)

It looks like the issue that has been fixed in STABLE.

http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/144330

-- 
Mikolaj Golub


Re: freeBSD nullfs together with nfs and "silly rename"

2010-06-12 Thread Mikolaj Golub

On Sun, 6 Jun 2010 16:44:43 +0200 Leon Meßner wrote:

 LM> Hi,
 LM> I hope this is not the wrong list to ask. Didn't get any answers on
 LM> -questions.

 LM> When you try to do the following inside a nullfs mounted directory,
 LM> where the nullfs origin is itself mounted via nfs you get an error:

 LM> # foo 
 LM> # tail -f foo& 
 LM> # rm -f foo 
 LM> tail: foo: Stale NFS file handle
 LM> # fg

 LM> This is really a problem when running services inside jails and using
 LM> NFS as storage. As of [2] it looks like this problem is known for a
 LM> while. On a normal NFS mount this does not happen as "silly renaming"
 LM> [1] works there (producing nasty little .nfs files).

nfs_sillyrename() is called when the vnode's usecount is more than 1. It is
expected that the unlink() syscall increases the vnode's usecount in namei(), and
if the file has already been opened, usecount will be more than 1.

But with the nullfs layer present the reference counts are held by the upper node,
not the lower (nfs) one, so when unlink() is called it increases the usecount of
the upper vnode, not the nfs vnode, and nfs_sillyrename() is never called.

The straightforward solution seems to be to implement null_remove(), which will
increase the lower vnode's refcount before calling null_bypass() and then
decrement it after the call. See the attached patch (it works for me on both
8-STABLE and CURRENT).

-- 
Mikolaj Golub



Re: freeBSD nullfs together with nfs and "silly rename"

2010-06-12 Thread Mikolaj Golub

On Sat, 12 Jun 2010 11:56:10 +0300 Mikolaj Golub wrote to Leon Meßner:

 MG> See the attached patch (it works for me on both 8-STABLE and CURRENT).

Sorry, actually here is the patch.

-- 
Mikolaj Golub

Index: sys/fs/nullfs/null_vnops.c
===
--- sys/fs/nullfs/null_vnops.c	(revision 208960)
+++ sys/fs/nullfs/null_vnops.c	(working copy)
@@ -499,6 +499,23 @@
 }
 
 /*
+ * Increasing refcount of lower vnode is needed at least for the case
+ * when lower FS is NFS to do sillyrename if the file is in use.
+ */
+static int
+null_remove(struct vop_remove_args *ap)
+{
+	int retval;
+	struct vnode *lvp;
+
+	lvp = NULLVPTOLOWERVP(ap->a_vp);
+	VREF(lvp);
+	retval = null_bypass(&ap->a_gen);
+	vrele(lvp);
+	return (retval);
+}
+
+/*
  * We handle this to eliminate null FS to lower FS
  * file moving. Don't know why we don't allow this,
  * possibly we should.
@@ -809,6 +826,7 @@
 	.vop_open =		null_open,
 	.vop_print =		null_print,
 	.vop_reclaim =		null_reclaim,
+	.vop_remove =		null_remove,
 	.vop_rename =		null_rename,
 	.vop_setattr =		null_setattr,
 	.vop_strategy =		VOP_EOPNOTSUPP,

Re: Has anyone usd hast in production yet - opinions ?

2010-10-15 Thread Mikolaj Golub

On Mon, 04 Oct 2010 16:55:05 +0100 Pete French wrote:

 >> Please see the freebsd-fs mailing list, which has quite a large number
 >> of problem reports/issues being posted to it on a regular basis (and
 >> patches are often provided).

 PF> Thanks have signed up - I was signed up to 'geom' but not that one.
 PF> A large number of problem reports is not quite what I was hping for, but
 PF> good to know,a nd maybe I shall hold off for a while :-)

Being the author of many of the problem reports I can say that most of them were
not critical and concerned marginal cases (like some issues with hooks or a race
that showed up when changing the HAST role in a loop -- you would never do this in
production). And fixes were committed within several days of a report. I don't
know of any open issues.

-- 
Mikolaj Golub


Re: hast vs ggate+gmirror sychrnoisation speed

2010-10-22 Thread Mikolaj Golub

On Thu, 21 Oct 2010 13:25:34 +0100 Pete French wrote:

 PF> Well, I bit the bullet and moved to using hast - all went beautifully,
 PF> and I migrated the pool with no downtime. The one thing I do notice,
 PF> however, is that the synchronisation with hast is much slower
 PF> than the older ggate+gmirror combination. It's about half the
 PF> speed in fact.

 PF> When I orginaly setup my ggate configuration I did a lot of tweaks to
 PF> get the speed good - these copnsisted of expanding the send and
 PF> receive space for the sockets using sysctl.conf, and then providing
 PF> large buffers to ggate. Is there a way to control this with hast ?
 PF> I still have the sysctls set (as the machines have not rebooted)
 PF> but I cant see any options in hast.conf which are equivalent to the
 PF> "-S 262144 -R 262144" which I use with ggate

 PF> Any advice, or am I barking up the wrong tree here ?

Currently there are no options in hast.conf to change send and receive buffer
size. They are hardcoded in sbin/hastd/proto_tcp4.c:

	val = 131072;
	if (setsockopt(tctx->tc_fd, SOL_SOCKET, SO_SNDBUF, &val,
	    sizeof(val)) == -1) {
		pjdlog_warning("Unable to set send buffer size on %s", addr);
	}
	val = 131072;
	if (setsockopt(tctx->tc_fd, SOL_SOCKET, SO_RCVBUF, &val,
	    sizeof(val)) == -1) {
		pjdlog_warning("Unable to set receive buffer size on %s", addr);
	}

You could change the values and recompile hastd :-). It would be interesting
to know about the results of your experiment (if you do).
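For such an experiment it may help to see what the kernel actually granted,
since the value read back can differ from the one requested. A minimal sketch
follows; set_socket_buffers() is a hypothetical helper, not part of hastd.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Sets both socket buffer sizes to `size` bytes and reports what the
 * kernel actually granted.  Returns 0 on success, -1 on failure (errno
 * is set by the failing call).
 */
static int
set_socket_buffers(int fd, int size, int *sndbuf, int *rcvbuf)
{
	socklen_t len;

	assert(size > 0);
	if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size)) == -1)
		return (-1);
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size)) == -1)
		return (-1);
	len = sizeof(*sndbuf);
	if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, sndbuf, &len) == -1)
		return (-1);
	len = sizeof(*rcvbuf);
	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, rcvbuf, &len) == -1)
		return (-1);
	return (0);
}
```

Calling it with 262144 (the value used with ggate in the quoted message) and
logging the two returned sizes would show whether the larger buffers really
took effect.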

Also note there is another hardcoded value in sbin/hastd/proto_common.c:

/* Maximum size of packet we want to use when sending data. */
#define MAX_SEND_SIZE   32768

which looks like it might affect synchronization speed too. Previously we had
128kB here but this was changed to 32kB because slow synchronization was
reported with MAX_SEND_SIZE=128kB.

http://svn.freebsd.org/viewvc/base?view=revision&revision=211452
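To make the role of such a limit concrete, here is a generic sketch of sending
a buffer in pieces of at most a fixed size. It is not the code from
proto_common.c; toy_send_chunked() and TOY_MAX_SEND_SIZE are hypothetical
names, and only the 32768 value is taken from the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Upper bound for a single send(2) call; the value quoted above. */
#define	TOY_MAX_SEND_SIZE	((size_t)32768)

/*
 * Sends `size` bytes from `data`, never passing more than
 * TOY_MAX_SEND_SIZE bytes to one send(2) call.  Returns 0 on success
 * or -1 with errno set.
 */
static int
toy_send_chunked(int fd, const unsigned char *data, size_t size)
{
	size_t chunk;
	ssize_t done;

	assert(data != NULL);
	while (size > 0) {
		chunk = size < TOY_MAX_SEND_SIZE ? size : TOY_MAX_SEND_SIZE;
		done = send(fd, data, chunk, 0);
		if (done == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return (-1);
		}
		/* send(2) may accept fewer bytes than requested. */
		data += done;
		size -= (size_t)done;
	}
	return (0);
}
```

With this structure a 128kB write turns into at least four send(2) calls, so
the chunk size and the socket buffer sizes interact.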

I wonder whether the slow synchronization with MAX_SEND_SIZE=131072 could be due
to SO_SNDBUF/SO_RCVBUF being equal to this size. Maybe by increasing
SO_SNDBUF/SO_RCVBUF we could reach better performance with
MAX_SEND_SIZE=128kB?

-- 
Mikolaj Golub


Re: hast vs ggate+gmirror sychrnoisation speed

2010-10-25 Thread Mikolaj Golub

On Mon, 25 Oct 2010 11:55:34 +0100 Pete French wrote:

 >> You could change the values and recompile hastd :-). It would be interesting
 >> to know about the results of your experiment (if you do).

 PF> I changed the buffer sizes to the same as I was using for ggate, but the 
speed
 PF> is still the same - 44meg/second (about half of what the link can do)

You can check whether the queue size is an issue by monitoring Recv-Q and
Send-Q for the hastd connections with netstat during the test, running
something like:

while sleep 1; do netstat -na |grep '\.8457.*ESTAB'; done

Also tcpdump may help :-)

-- 
Mikolaj Golub
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: hast vs ggate+gmirror sychrnoisation speed

2010-10-27 Thread Mikolaj Golub

On Tue, 26 Oct 2010 17:01:01 +0100 Pete French wrote:

 PF>  Actually, I just llooked I dmesg on the secondary - it is full
 PF> of messages thus:

 PF> Oct 26 15:44:59 serpentine-passive hastd[10394]: [serp0] (secondary) Unable to receive request header: RPC version wrong.
 PF> Oct 26 15:45:00 serpentine-passive hastd[782]: [serp0] (secondary) Worker process exited ungracefully (pid=10394, exitcode=75).
 PF> Oct 26 15:46:59 serpentine-passive hastd[10421]: [serp0] (secondary) Unable to receive request header: RPC version wrong.
 PF> Oct 26 15:47:04 serpentine-passive hastd[782]: [serp0] (secondary) Worker process exited ungracefully (pid=10421, exitcode=75).

I saw this too, but only sporadic messages, so I forgot about it and did not
investigate it then :-).

Now running synchronization I see them too (but again only sporadically). Setting
an assertion and looking at the received header:

(gdb) list
309 goto fail;
310
311 if (hdr.version != HAST_PROTO_VERSION) {
312 assert(0);
313 errno = ERPCMISMATCH;
314 goto fail;
315 }
316
317 hdr.size = le32toh(hdr.size);
318
(gdb) p/x hdr
$2 = {version = 0x9, size = 0x65657266}

So it looks like garbage.

In hast_proto_send() we send the header and then the data. Couldn't it be that
the remote_send and sync threads interfere and their packets get mixed? Maybe some
synchronization is needed here?

I put sleep(1) in hast_proto_send() between proto_send(header) and
proto_send(data). The error started to occur frequently.
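As an illustration of what "some synchronization" could mean, the sketch below
sends header and data under one lock so that another sender cannot get in
between. This is only one possible approach and not what hastd does; the
connection here is an in-memory buffer and all names (toy_conn,
toy_send_packet, ...) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Toy "connection": an in-memory byte stream guarded by a mutex. */
struct toy_conn {
	pthread_mutex_t	lock;
	unsigned char	buf[256];
	size_t		len;
};

static void
toy_conn_init(struct toy_conn *c)
{
	pthread_mutex_init(&c->lock, NULL);
	c->len = 0;
}

/* Appends bytes to the stream; stands in for one low-level send call. */
static void
toy_append(struct toy_conn *c, const void *data, size_t size)
{
	assert(c->len + size <= sizeof(c->buf));
	memcpy(c->buf + c->len, data, size);
	c->len += size;
}

/*
 * Sends header and payload as one unit: the lock is held across both
 * appends, so packets from different threads cannot interleave.
 */
static void
toy_send_packet(struct toy_conn *c, const void *hdr, size_t hdrsize,
    const void *data, size_t datasize)
{
	pthread_mutex_lock(&c->lock);
	toy_append(c, hdr, hdrsize);
	toy_append(c, data, datasize);
	pthread_mutex_unlock(&c->lock);
}
```

Every thread that writes to the connection would have to go through
toy_send_packet(); a single sender that bypasses the lock is enough to corrupt
the stream again.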

-- 
Mikolaj Golub


Re: hast vs ggate+gmirror sychrnoisation speed

2010-10-28 Thread Mikolaj Golub

On Thu, 28 Oct 2010 18:30:36 +0200 Pawel Jakub Dawidek wrote:

 PJD> On Wed, Oct 27, 2010 at 10:05:20PM +0300, Mikolaj Golub wrote:
 >> In hast_proto_send() we send header and then data. Couldn't it be that
 >> remote_send and sync threads interfere and their packets are mixed? May be 
 >> some
 >> synchronization is needed here?
 >> 
 >> I set sleep(1) in hast_proto_send() between proto_send(header) and
 >> proto_send(data). The error started to occur frequently.

 PJD> Synchronization requests are sent through the remote thread just like
 PJD> regular I/O requests, exactly because of races that can occur.

 PJD> I looked at the code and the keepalive packets arbe sent from another
 PJD> thread. Could you try turning them off in primary.c and see if that
 PJD> helps?

At first I set RETRY_SLEEP to 1 sec to have more keepalive packets. The errors
started to be observed frequently:

Oct 28 21:35:53 bolek hastd[1709]: [storage] (secondary) Unable to receive request header: RPC version wrong.
Oct 28 21:35:54 bolek hastd[1632]: [storage] (secondary) Worker process exited ungracefully (pid=1709, exitcode=75).
Oct 28 21:36:12 bolek hastd[1722]: [storage] (secondary) Unable to receive request header: RPC version wrong.
Oct 28 21:36:12 bolek hastd[1632]: [storage] (secondary) Worker process exited ungracefully (pid=1722, exitcode=75).
...

Now I have been running synchronization for more than half an hour with
keepalive_send disabled and have not seen any errors.

-- 
Mikolaj Golub
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: hast vs ggate+gmirror sychrnoisation speed

2010-10-30 Thread Mikolaj Golub

On Thu, 28 Oct 2010 22:08:54 +0300 Mikolaj Golub wrote to Pawel Jakub Dawidek:

 PJD>> I looked at the code and the keepalive packets are sent from another
 PJD>> thread. Could you try turning them off in primary.c and see if that
 PJD>> helps?

 MG> At first I set RETRY_SLEEP to 1 sec to have more keepalive packets. The
 MG> errors started to occur frequently:

 MG> Oct 28 21:35:53 bolek hastd[1709]: [storage] (secondary) Unable to receive request header: RPC version wrong.
 MG> Oct 28 21:35:54 bolek hastd[1632]: [storage] (secondary) Worker process exited ungracefully (pid=1709, exitcode=75).
 MG> Oct 28 21:36:12 bolek hastd[1722]: [storage] (secondary) Unable to receive request header: RPC version wrong.
 MG> Oct 28 21:36:12 bolek hastd[1632]: [storage] (secondary) Worker process exited ungracefully (pid=1722, exitcode=75).
 MG> ...

 MG> Now I have been running synchronization for more than half an hour with
 MG> keepalive_send disabled and have not seen any errors.

So :-) What do you think about sending keepalives from remote_send_thread() to
avoid this problem, and sending them only when the connection is idle (there
seems to be not much use in sending them all the time)? Something like the
patch below (it works for me).

-- 
Mikolaj Golub

Index: sbin/hastd/primary.c
===
--- sbin/hastd/primary.c	(revision 214550)
+++ sbin/hastd/primary.c	(working copy)
@@ -190,6 +190,19 @@ static pthread_mutex_t metadata_lock;
 	hio_next[(ncomp)]);		\
 	mtx_unlock(&hio_##name##_list_lock[(ncomp)]);			\
 } while (0)
+#define	QUEUE_TRY1(hio, name, ncomp)		do {			\
+	mtx_lock(&hio_##name##_list_lock[(ncomp)]);			\
+	(hio) = TAILQ_FIRST(&hio_##name##_list[(ncomp)]);		\
+	if (hio == NULL) {		\
+		cv_timedwait(&hio_##name##_list_cond[(ncomp)],		\
+		&hio_##name##_list_lock[(ncomp)], RETRY_SLEEP);	\
+		hio = TAILQ_FIRST(&hio_##name##_list[(ncomp)]);		\
+	}\
+	if (hio != NULL)		\
+		TAILQ_REMOVE(&hio_##name##_list[(ncomp)], hio,		\
+		hio_next[(ncomp)]);	\
+	mtx_unlock(&hio_##name##_list_lock[(ncomp)]);			\
+} while (0)
 #define	QUEUE_TAKE2(hio, name)	do {	\
 	mtx_lock(&hio_##name##_list_lock);\
 	while (((hio) = TAILQ_FIRST(&hio_##name##_list)) == NULL) {	\
@@ -1176,6 +1189,38 @@ local_send_thread(void *arg)
 	return (NULL);
 }
 
+static void
+keepalive_send(struct hast_resource *res, unsigned int ncomp)
+{
+	struct nv *nv;
+
+	if (!ISCONNECTED(res, ncomp))
+		return;
+	
+	assert(res->hr_remotein != NULL);
+	assert(res->hr_remoteout != NULL);
+
+	nv = nv_alloc();
+	nv_add_uint8(nv, HIO_KEEPALIVE, "cmd");
+	if (nv_error(nv) != 0) {
+		nv_free(nv);
+		pjdlog_debug(1,
+		"keepalive_send: Unable to prepare header to send.");
+		return;
+	}
+	if (hast_proto_send(res, res->hr_remoteout, nv, NULL, 0) < 0) {
+		pjdlog_common(LOG_DEBUG, 1, errno,
+		"keepalive_send: Unable to send request");
+		nv_free(nv);
+		rw_unlock(&hio_remote_lock[ncomp]);
+		remote_close(res, ncomp);
+		rw_rlock(&hio_remote_lock[ncomp]);
+		return;
+	}
+	nv_free(nv);
+	pjdlog_debug(2, "keepalive_send: Request sent.");
+}
+
 /*
  * Thread sends request to secondary node.
  */
@@ -1184,6 +1229,7 @@ remote_send_thread(void *arg)
 {
 	struct hast_resource *res = arg;
 	struct g_gate_ctl_io *ggio;
+	time_t lastcheck, now;
 	struct hio *hio;
 	struct nv *nv;
 	unsigned int ncomp;
@@ -1194,10 +1240,19 @@ remote_send_thread(void *arg)
 
 	/* Remote component is 1 for now. */
 	ncomp = 1;
+	lastcheck = time(NULL);	
 
 	for (;;) {
 		pjdlog_debug(2, "remote_send: Taking request.");
-		QUEUE_TAKE1(hio, send, ncomp);
+		QUEUE_TRY1(hio, send, ncomp);
+		if (hio == NULL) {
+			now = time(NULL);
+			if (lastcheck + RETRY_SLEEP <= now) {
+				keepalive_send(res, ncomp);
+				lastcheck = now;
+			}
+			continue;
+		}
 		pjdlog_debug(2, "remote_send: (%p) Got request.", hio);
 		ggio = &hio->hio_ggio;
 		switch (ggio->gctl_cmd) {
@@ -1883,32 +1938,6 @@ failed:
 }
 
 static void
-keepalive_send(struct hast_resource *res, unsigned int ncomp)
-{
-	struct nv *nv;
-
-	nv = nv_alloc();
-	nv_add_uint8(nv, HIO_KEEPALIVE, "cmd");
-	if (nv_error(nv) != 0) {
-		nv_free(nv);
-		pjdlog_debug(1,
-		"keepalive_send: Unable to prepare header to send.");
-		return;
-	}
-	if (hast_proto_send(res, res->hr_remoteout, nv, NULL, 0) < 0) {
-		pjdlog_common(LOG_DEBUG, 1, errno,
-		"keepalive_send: Unable to send request");
-		nv_free(nv);
-		rw_unlock(&hio_remote_lock[ncomp]);
-		remote_close(res, ncomp);
-		rw_rlock(&hio_remote_lock[ncomp]);
-		return;
-	}
-	nv_free(nv);
-	pjdlog_debug(2, "keepalive_send: Request sent.");
-}
-
-static void
 guard_one(struct hast_resource *res, unsigned int ncomp)
 {
 	struct proto_conn *in, *out;
@@ -192

Re: hast vs ggate+gmirror synchronisation speed

2010-11-01 Thread Mikolaj Golub

On Mon, 1 Nov 2010 12:01:00 +0100 Pawel Jakub Dawidek wrote:

 PJD> I like your patch and I agree of course it is better to send keepalive
 PJD> packets only when connection is idle. The only thing I'd change is to
 PJD> modify QUEUE_TAKE1() macro to take additional argument 'timeout' - if we
 PJD> don't want it to time out, we pass 0. Could you modify your patch?

Sure :-). Could you look at the updated version?

Note: so far I have only checked that hastd with this updated patch compiles
and runs. I will do proper testing later today, when I have access to my test
instances, and will report the results.

-- 
Mikolaj Golub

Index: sbin/hastd/primary.c
===
--- sbin/hastd/primary.c	(revision 214624)
+++ sbin/hastd/primary.c	(working copy)
@@ -180,14 +180,20 @@ static pthread_mutex_t metadata_lock;
 	if (_wakeup)			\
 		cv_signal(&hio_##name##_list_cond);			\
 } while (0)
-#define	QUEUE_TAKE1(hio, name, ncomp)	do {\
+#define	QUEUE_TAKE1(hio, name, ncomp, timeout)	do {			\
+	bool _last;			\
+	\
 	mtx_lock(&hio_##name##_list_lock[(ncomp)]);			\
-	while (((hio) = TAILQ_FIRST(&hio_##name##_list[(ncomp)])) == NULL) { \
-		cv_wait(&hio_##name##_list_cond[(ncomp)],		\
-		&hio_##name##_list_lock[(ncomp)]);			\
+	_last = false;			\
+	while (((hio) = TAILQ_FIRST(&hio_##name##_list[(ncomp)])) == NULL && !_last) { \
+		cv_timedwait(&hio_##name##_list_cond[(ncomp)],		\
+		&hio_##name##_list_lock[(ncomp)], (timeout));	\
+		if ((timeout) != 0) 	\
+			_last = true;	\
 	}\
-	TAILQ_REMOVE(&hio_##name##_list[(ncomp)], (hio),		\
-	hio_next[(ncomp)]);		\
+	if (hio != NULL)		\
+		TAILQ_REMOVE(&hio_##name##_list[(ncomp)], (hio),	\
+		hio_next[(ncomp)]);	\
 	mtx_unlock(&hio_##name##_list_lock[(ncomp)]);			\
 } while (0)
 #define	QUEUE_TAKE2(hio, name)	do {	\
@@ -1112,7 +1118,7 @@ local_send_thread(void *arg)
 
 	for (;;) {
 		pjdlog_debug(2, "local_send: Taking request.");
-		QUEUE_TAKE1(hio, send, ncomp);
+		QUEUE_TAKE1(hio, send, ncomp, 0);
 		pjdlog_debug(2, "local_send: (%p) Got request.", hio);
 		ggio = &hio->hio_ggio;
 		switch (ggio->gctl_cmd) {
@@ -1176,6 +1182,38 @@ local_send_thread(void *arg)
 	return (NULL);
 }
 
+static void
+keepalive_send(struct hast_resource *res, unsigned int ncomp)
+{
+	struct nv *nv;
+
+	if (!ISCONNECTED(res, ncomp))
+		return;
+	
+	assert(res->hr_remotein != NULL);
+	assert(res->hr_remoteout != NULL);
+
+	nv = nv_alloc();
+	nv_add_uint8(nv, HIO_KEEPALIVE, "cmd");
+	if (nv_error(nv) != 0) {
+		nv_free(nv);
+		pjdlog_debug(1,
+		"keepalive_send: Unable to prepare header to send.");
+		return;
+	}
+	if (hast_proto_send(res, res->hr_remoteout, nv, NULL, 0) < 0) {
+		pjdlog_common(LOG_DEBUG, 1, errno,
+		"keepalive_send: Unable to send request");
+		nv_free(nv);
+		rw_unlock(&hio_remote_lock[ncomp]);
+		remote_close(res, ncomp);
+		rw_rlock(&hio_remote_lock[ncomp]);
+		return;
+	}
+	nv_free(nv);
+	pjdlog_debug(2, "keepalive_send: Request sent.");
+}
+
 /*
  * Thread sends request to secondary node.
  */
@@ -1184,6 +1222,7 @@ remote_send_thread(void *arg)
 {
 	struct hast_resource *res = arg;
 	struct g_gate_ctl_io *ggio;
+	time_t lastcheck, now;
 	struct hio *hio;
 	struct nv *nv;
 	unsigned int ncomp;
@@ -1194,10 +1233,19 @@ remote_send_thread(void *arg)
 
 	/* Remote component is 1 for now. */
 	ncomp = 1;
+	lastcheck = time(NULL);	
 
 	for (;;) {
 		pjdlog_debug(2, "remote_send: Taking request.");
-		QUEUE_TAKE1(hio, send, ncomp);
+		QUEUE_TAKE1(hio, send, ncomp, RETRY_SLEEP);
+		if (hio == NULL) {
+			now = time(NULL);
+			if (lastcheck + RETRY_SLEEP <= now) {
+				keepalive_send(res, ncomp);
+				lastcheck = now;
+			}
+			continue;
+		}
 		pjdlog_debug(2, "remote_send: (%p) Got request.", hio);
 		ggio = &hio->hio_ggio;
 		switch (ggio->gctl_cmd) {
@@ -1883,32 +1931,6 @@ failed:
 }
 
 static void
-keepalive_send(struct hast_resource *res, unsigned int ncomp)
-{
-	struct nv *nv;
-
-	nv = nv_alloc();
-	nv_add_uint8(nv, HIO_KEEPALIVE, "cmd");
-	if (nv_error(nv) != 0) {
-		nv_free(nv);
-		pjdlog_debug(1,
-		"keepalive_send: Unable to prepare header to send.");
-		return;
-	}
-	if (hast_proto_send(res, res->hr_remoteout, nv, NULL, 0) < 0) {
-		pjdlog_common(LOG_DEBUG, 1, errno,
-		"keepalive_send: Unable to send request");
-		nv_free(nv);
-		rw_unlock(&hio_remote_lock[ncomp]);
-		remote_close(res, ncomp);
-		rw_rlock(&hio_remote_lock[ncomp]);
-		return;
-	}
-	nv_free(nv);
-	pjdlog_debug(2, "keepalive_send: Request sent.");
-}
-
-static void
 guard_one(struct hast_resource *res, unsigned int ncomp)
 {
 	struct proto_conn *in, *out;
@@ -1926,12 +1948,6 @@ guard_one(struct 

Re: hast vs ggate+gmirror synchronisation speed

2010-11-01 Thread Mikolaj Golub

On Mon, 01 Nov 2010 17:06:49 +0200 Mikolaj Golub wrote:

 MG> On Mon, 1 Nov 2010 12:01:00 +0100 Pawel Jakub Dawidek wrote:

 PJD>> I like your patch and I agree of course it is better to send keepalive
 PJD>> packets only when connection is idle. The only thing I'd change is to
 PJD>> modify QUEUE_TAKE1() macro to take additional argument 'timeout' - if we
 PJD>> don't want it to time out, we pass 0. Could you modify your patch?

 MG> Sure :-). Could you look at the updated version?

 MG> Note: so far I have only checked that hastd with this updated patch compiles
 MG> and runs. I will do proper testing later today, when I have access to my test
 MG> instances, and will report the results.

Tested. It works for me.

-- 
Mikolaj Golub


Re: can't disable hyperthreading on 7.1

2008-12-29 Thread Mikolaj Golub

On Wed, 24 Dec 2008 15:36:10 +0200 Alexander Melnik wrote:

 AM> Hi

 AM> I have several computers with 2 xeon processors with hyperthreading under
 AM> FreeBSD 7.1-RC2 and in any case can not turn off hyperthreading:

 AM> [...@vmat ~]$ cat /boot/loader.conf
 AM> machdep.hyperthreading_allowed="0"
 AM> machdep.hlt_logical_cpus="1"

 AM> [...@vmat ~]$ sysctl machdep.hyperthreading_allowed
 AM> machdep.hyperthreading_allowed: 0

 AM> [...@vmat ~]$ sysctl machdep.hlt_logical_cpus
 AM> machdep.hlt_logical_cpus: 1

 AM> [...@vmat ~]$ sysctl hw.ncpu
 AM> hw.ncpu: 4

 AM> If machdep.hyperthreading_allowed = "0", shouldn't hw.ncpu be equal to 2?

 AM> [...@vmat ~]$ top -nd 1
 AM> last pid:   825;  load averages:  0.00,  0.00,  0.00  up 0+00:21:19  15:22:24
 AM> 17 processes:  1 running, 16 sleeping

 AM> Mem: 6228K Active, 6984K Inact, 20M Wired, 9520K Buf, 960M Free
 AM> Swap: 2048M Total, 2048M Free


 AM>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 AM>   762 root        1   4    0  8428K  3936K sbwait 2   0:00  0.00% sshd
 AM>   767 old         1   8    0  4396K  2212K wait   2   0:00  0.00% bash
 AM>   765 old         1  44    0  8428K  3952K select 0   0:00  0.00% sshd
 AM>   571 root        1  44    0  3184K  1200K select 1   0:00  0.00% syslogd
 AM>   706 root        1  44    0  5876K  3196K select 0   0:00  0.00% sendmail
 AM>   716 root        1   8    0  3212K  1276K nanslp 2   0:00  0.00% cron
 AM>   759 root        1   5    0  3184K  1088K ttyin  2   0:00  0.00% getty
 AM>   758 root        1   5    0  3184K  1088K ttyin  3   0:00  0.00% getty
 AM>   760 root        1   5    0  3184K  1088K ttyin  0   0:00  0.00% getty
 AM>   700 root        1  44    0  5752K  3276K select 0   0:00  0.00% sshd
 AM>   710 smmsp       1  20    0  5876K  3200K pause  2   0:00  0.00% sendmail
 AM>   297 root        1  96    0  3128K  1208K select 0   0:00  0.00% dhclient
 AM>   737 root        1  96    0  3240K  1152K select 3   0:00  0.00% inetd
 AM>   163 root        1  20    0  1380K   804K pause  0   0:00  0.00% adjkerntz
 AM>   512 root        1  44    0  1888K   564K select 0   0:00  0.00% devd
 AM>   313 _dhcp       1  44    0  3128K  1320K select 0   0:00  0.00% dhclient
 AM>   825 old         1  44    0  3496K  1656K CPU0   0   0:00  0.00% top

 AM> 
 AM> If machdep.hlt_logical_cpus = "1", shouldn't processors 2 and 3 never
 AM> appear in the top output?

You can run

vmstat -i | grep cpu

to see how many CPUs are actually used.

I also observe on some hosts (6.3) with machdep.hlt_logical_cpus=1 that the C
column of the top output shows CPU numbers for CPUs that are actually halted
according to vmstat -i, and I am curious too about what this means.

-- 
Mikolaj Golub


pthread.h: typo in #define pthread_cleanup_push/pthread_cleanup_pop

2009-11-24 Thread Mikolaj Golub
Hi,

I have problems with compiling our application under 8.0.

It fails due to these definitions in pthread.h that look like a typo or
incorrectly applied patch:

#define pthread_cleanup_push(cleanup_routine, cleanup_arg)              \
                {                                                       \
                        struct _pthread_cleanup_info __cleanup_info__;  \
                        __pthread_cleanup_push_imp(cleanup_routine, cleanup_arg,\
                                &__cleanup_info__);                     \
                        {

#define pthread_cleanup_pop(execute)                                    \
                        }                                               \
                        __pthread_cleanup_pop_imp(execute);             \
                }


This patch fixes the problem for me:

--- pthread.h.orig      2009-11-24 16:44:13.0 +0200
+++ pthread.h   2009-11-24 16:44:45.0 +0200
@@ -172,10 +172,10 @@
                        struct _pthread_cleanup_info __cleanup_info__;  \
                        __pthread_cleanup_push_imp(cleanup_routine, cleanup_arg,\
                                &__cleanup_info__);                     \
-                       {
+                       }
 
 #define pthread_cleanup_pop(execute)                                    \
-                       }                                               \
+                       {                                               \
                        __pthread_cleanup_pop_imp(execute);             \
                }

-- 
Mikolaj Golub


Re: pthread.h: typo in #define pthread_cleanup_push/pthread_cleanup_pop

2009-11-24 Thread Mikolaj Golub
On Tue, 24 Nov 2009 16:53:35 +0200 Mikolaj Golub wrote:

> Hi,
>
> I have problems with compiling our application under 8.0.
>
> It fails due to these definitions in pthread.h that look like a typo or
> incorrectly applied patch:
>
> #define pthread_cleanup_push(cleanup_routine, cleanup_arg)              \
>                 {                                                       \
>                         struct _pthread_cleanup_info __cleanup_info__;  \
>                         __pthread_cleanup_push_imp(cleanup_routine, cleanup_arg,\
>                                 &__cleanup_info__);                     \
>                         {
>
> #define pthread_cleanup_pop(execute)                                    \
>                         }                                               \
>                         __pthread_cleanup_pop_imp(execute);             \
>                 }
>
>
> This patch fixes the problem for me:

I was too hasty when I said that the patch fixed the problem. The application
compiled, but later it crashed in pthread_cleanup_pop:

(gdb) bt
#0  0xbf4f9ee0 in ?? ()
#1  0x287d18c9 in __pthread_cleanup_pop_imp () from /lib/libthr.so.3
#2  0x287d18ed in pthread_cleanup_pop () from /lib/libthr.so.3
#3  0x287d123c in pthread_exit () from /lib/libthr.so.3
#4  0x287c7757 in pthread_getprio () from /lib/libthr.so.3
#5  0x in ?? ()

So, I don't know what these macros actually were supposed to be. They were
introduced in r179662:

Revision 1.43
Mon Jun 9 01:14:10 2008 UTC (17 months, 2 weeks ago) by davidxu
Changes since revision 1.42: +21 -2 lines

SVN rev 179662 on 2008-06-09 01:14:10Z by davidxu

Make pthread_cleanup_push() and pthread_cleanup_pop() as a pair of macros,
use stack space to keep cleanup information, this eliminates overhead of
calling malloc() and free() in thread library.

Discussed on: thread@

> --- pthread.h.orig      2009-11-24 16:44:13.0 +0200
> +++ pthread.h   2009-11-24 16:44:45.0 +0200
> @@ -172,10 +172,10 @@
>                         struct _pthread_cleanup_info __cleanup_info__;  \
>                         __pthread_cleanup_push_imp(cleanup_routine, cleanup_arg,\
>                                 &__cleanup_info__);                     \
> -                       {
> +                       }
>  
>  #define pthread_cleanup_pop(execute)                                    \
> -                       }                                               \
> +                       {                                               \
>                         __pthread_cleanup_pop_imp(execute);             \
>                 }

-- 
Mikolaj Golub


Re: pthread.h: typo in #define pthread_cleanup_push/pthread_cleanup_pop

2009-11-24 Thread Mikolaj Golub
On Tue, 24 Nov 2009 17:34:22 +0200 Kostik Belousov wrote:

> pthread_cleanup_push/pop are supposed to be used from the common
> lexical scope. Citation from SUSv4:
>
> These functions may be implemented as macros. The application shall
> ensure that they appear as statements, and in pairs within the same
> lexical scope (that is, the pthread_cleanup_push() macro may be
> thought to expand to a token list whose first token is '{' with
> pthread_cleanup_pop() expanding to a token list whose last token is the
> corresponding '}' ).
>
> Your change is wrong.
>
> Basically, the code should do
>   pthread_cleanup_push(some_func, arg);
>   something ...
>   pthread_cleanup_pop(1);
> (1 denotes that some_func should be called).

I see. Thank you. So it really looks like a bug in our application, as
pthread_cleanup_pop(1) is missing. I will tell our developers :-)
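
For reference, a minimal sketch of the paired usage described above. The
names release_buffer, worker and run_worker are made up for the example;
only pthread_cleanup_push() and pthread_cleanup_pop() come from pthread.h.
Both calls sit in the same lexical scope, so the braces hidden inside the
macros balance.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

static int cleanup_calls;	/* how many times the handler has run */

/* Cleanup handler: releases the buffer passed as the argument. */
static void
release_buffer(void *arg)
{
	free(arg);
	cleanup_calls++;
}

static void *
worker(void *unused)
{
	char *buf;

	(void)unused;
	buf = malloc(64);
	if (buf == NULL)
		return (NULL);

	/* push and pop must appear as a pair in the same block */
	pthread_cleanup_push(release_buffer, buf);
	strcpy(buf, "work in progress");
	/* non-zero argument: run release_buffer(buf) now */
	pthread_cleanup_pop(1);

	return (NULL);
}

/* Start one worker, wait for it, and report how often the handler ran. */
static int
run_worker(void)
{
	pthread_t t;
	int rc;

	cleanup_calls = 0;
	rc = pthread_create(&t, NULL, worker, NULL);
	assert(rc == 0);
	pthread_join(t, NULL);
	return (cleanup_calls);
}
```

Leaving out the pthread_cleanup_pop() call on a system where these are
macros typically shows up as a compile error about unbalanced braces, which
matches the build failure reported at the start of this thread.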

-- 
Mikolaj Golub


FreeBSD 7.1: QUOTA: kernel panics in jailed()

2009-12-05 Thread Mikolaj Golub
Hi,

Today we observed a panic on our FreeBSD 7.1 box built with QUOTA
support.

According to the backtrace, ffs_truncate() called chkdq() with NOCRED, but later
jailed() was called and the system crashed dereferencing cred->cr_prison.

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 7; apic id = 07
fault virtual address   = 0x64
fault code  = supervisor read, page not present
instruction pointer = 0x20:0xc07a1d26
stack pointer   = 0x28:0xedb2d8b8
frame pointer   = 0x28:0xedb2d8b8
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 9742 (icoms_agent_cox476)
trap number = 12
panic: page fault
cpuid = 7
Uptime: 19h54m4s
Physical memory: 3315 MB
Dumping 326 MB: 311 295 279 263 247 231 215 199 183 167 151 135 119 103 87 71 55 39 23 7

Reading symbols from /boot/kernel/if_lagg.ko...Reading symbols from /boot/kernel/if_lagg.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/if_lagg.ko
Reading symbols from /boot/kernel/acpi.ko...Reading symbols from /boot/kernel/acpi.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/acpi.ko
#0  doadump () at pcpu.h:196
196 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:196
#1  0xc07c2b27 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xc07c2df9 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0xc0ada1ec in trap_fatal (frame=0xedb2d878, eva=100) at /usr/src/sys/i386/i386/trap.c:939
#4  0xc0ada470 in trap_pfault (frame=0xedb2d878, usermode=0, eva=100) at /usr/src/sys/i386/i386/trap.c:852
#5  0xc0adae2c in trap (frame=0xedb2d878) at /usr/src/sys/i386/i386/trap.c:530
#6  0xc0ac0c9b in calltrap () at /usr/src/sys/i386/i386/exception.s:159
#7  0xc07a1d26 in jailed (cred=0x0) at /usr/src/sys/kern/kern_jail.c:465
#8  0xc07a1da5 in prison_priv_check (cred=0x0, priv=320) at /usr/src/sys/kern/kern_jail.c:581
#9  0xc07b62ce in priv_check_cred (cred=0x0, priv=320, flags=0) at /usr/src/sys/kern/kern_priv.c:86
#10 0xc09e742d in chkdq (ip=0xcb55c980, change=28, cred=0x0, flags=Variable "flags" is not available.
)
    at /usr/src/sys/ufs/ufs/ufs_quota.c:188
#11 0xc09c24f7 in ffs_truncate (vp=0xcac04cf0, length=0, flags=2048, cred=0xc9871d00, td=0xc95d28c0)
    at /usr/src/sys/ufs/ffs/ffs_inode.c:276
#12 0xc09ed372 in ufs_setattr (ap=0xedb2db64) at /usr/src/sys/ufs/ufs/ufs_vnops.c:600
#13 0xc0af0582 in VOP_SETATTR_APV (vop=0xc0c2ff80, a=0xedb2db64) at vnode_if.c:583
#14 0xc084c446 in kern_open (td=0xc95d28c0, path=0x4890e68c , pathseg=UIO_USERSPACE, flags=Variable "flags" is not available.
) at vnode_if.h:315
#15 0xc084c5b0 in open (td=0xc95d28c0, uap=0xedb2dcfc) at /usr/src/sys/kern/vfs_syscalls.c:999
#16 0xc0ada7c5 in syscall (frame=0xedb2dd38) at /usr/src/sys/i386/i386/trap.c:1090
#17 0xc0ac0d00 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:255
#18 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) fr 11
#11 0xc09c24f7 in ffs_truncate (vp=0xcac04cf0, length=0, flags=2048, cred=0xc9871d00, td=0xc95d28c0)
    at /usr/src/sys/ufs/ffs/ffs_inode.c:276
276 (void) chkdq(ip, -datablocks, NOCRED, 0);
(kgdb) list
271 if (ip->i_flag & IN_SPACECOUNTED)
272 fs->fs_pendingblocks -= datablocks;
273 UFS_UNLOCK(ump);
274 } else {
275 #ifdef QUOTA
276 (void) chkdq(ip, -datablocks, NOCRED, 0);
277 #endif
278 softdep_setup_freeblocks(ip, length, needextclean ?
279 IO_EXT | IO_NORMAL : IO_NORMAL);
280 ASSERT_VOP_LOCKED(vp, "ffs_truncate1");
(kgdb) fr 7
#7  0xc07a1d26 in jailed (cred=0x0) at /usr/src/sys/kern/kern_jail.c:465
465 {
(kgdb) list
460 /*
461  * Return 1 if the passed credential is in a jail, otherwise 0.
462  */
463 int
464 jailed(struct ucred *cred)
465 {
466
467 return (cred->cr_prison != NULL);
468 }
469

-- 
Mikolaj Golub


Re: FreeBSD 7.1: QUOTA: kernel panics in jailed()

2009-12-07 Thread Mikolaj Golub
On Sun, 6 Dec 2009 20:18:13 +0200 Kostik Belousov wrote:

> The kernel panicked because chkdq was supplied NULL credentials and a
> _positive_ block use count change. Line 276 calls chkdq with
> -datablocks as the change. This could happen if you have problems
> either with hardware (e.g. memory or CPU cache), or your fs
> is damaged.
>
> Another possibility is random corruption of the kernel memory, but
> I recommend starting with fsck and then continuing with memory testers
> if fsck shows no problems.

We have checked the FS and it looks OK. For now we have just rebooted into a
kernel without quota. Checking the hardware is in our plans. Thank you.
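
As I understand the explanation above, the credentials only matter when the
change is positive. A simplified user-space model (not the kernel code; every
name starting with model_ is made up for the illustration) shows why a
negative datablocks value at line 276 would be enough to reach the
credential check with NOCRED:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ucred; only the field used by jailed(). */
struct model_cred {
	void *cr_prison;
};

/* Mirrors the shape of jailed(): dereferences cred unconditionally. */
static int
model_jailed(const struct model_cred *cred)
{
	return (cred->cr_prison != NULL);
}

/*
 * Model of the quota update: releasing blocks (change <= 0) never looks
 * at the credentials, charging blocks (change > 0) does.  The NULL guard
 * below exists only in this model so that the example can return an
 * error; in the reported panic the NULL pointer was dereferenced instead.
 */
static int
model_chkdq(long *usage, long change, const struct model_cred *cred)
{
	if (change <= 0) {
		*usage += change;
		return (0);
	}
	if (cred == NULL)
		return (EINVAL);
	(void)model_jailed(cred);	/* privilege check path */
	*usage += change;
	return (0);
}

/* Model of the truncate path: always passes -datablocks and no credentials. */
static int
model_truncate(long *usage, long datablocks)
{
	return (model_chkdq(usage, -datablocks, NULL));
}
```

With a sane datablocks of 28 the model releases 28 blocks and never touches
the credentials. With a corrupted datablocks of -28 the change becomes +28,
the same value seen as change=28 in frame #10, and the credential path is
reached with a NULL pointer.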

-- 
Mikolaj Golub


Re: FreeBSD 7.1: QUOTA: kernel panics in jailed()

2009-12-10 Thread Mikolaj Golub
On Wed, 9 Dec 2009 15:52:23 -0600 Mike Pritchard wrote:

> On Mon, Dec 07, 2009 at 10:23:49AM +0200, Mikolaj Golub wrote:
>> On Sun, 6 Dec 2009 20:18:13 +0200 Kostik Belousov wrote:
>> 
>> > The kernel panicked because chkdq was supplied NULL credentials and a
>> > _positive_ block use count change. Line 276 calls chkdq with
>> > -datablocks as the change. This could happen if you have problems
>> > either with hardware (e.g. memory or CPU cache), or your fs
>> > is damaged.
>> >
>> > Another possibility is random corruption of the kernel memory, but
>> > I recommend starting with fsck and then continuing with memory testers
>> > if fsck shows no problems.
>> 
>> We have checked the FS and it looks OK. For now we have just rebooted into a
>> kernel without quota. Checking the hardware is in our plans. Thank you.
>
> Did you happen to turn quotas off then back on for the file system in
> question?

Do you mean at the moment of the crash? No, our admins were far from the host
then :-).

-- 
Mikolaj Golub


NFS locking issue with FreeBSD7.1 client

2009-12-30 Thread Mikolaj Golub
call+0x335 

73265 100685 ls        - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_stat+0x3d stat+0x2f syscall+0x335
73292 100832 mc        - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_lstat+0x4f lstat+0x2f syscall+0x335
73357 100772 ls        - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_stat+0x3d stat+0x2f syscall+0x335
73796 100746 ls        - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_lstat+0x4f lstat+0x2f syscall+0x335
74074 100800 tcsh      - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_stat+0x3d stat+0x2f syscall+0x335
74125 100543 ls        - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_stat+0x3d stat+0x2f syscall+0x335
74449 100547 df        - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_statfs+0x69 __vfs_statfs+0x2f kern_getfsstat+0x2d5 getfsstat+0x2e syscall+0x335 Xint0x80_syscall+0x20
74497 100737 bash      - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_stat+0x3d stat+0x2f syscall+0x335
74650 100837 df        - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_statfs+0x69 __vfs_statfs+0x2f kern_getfsstat+0x2d5 getfsstat+0x2e syscall+0x335 Xint0x80_syscall+0x20
76499 100771 perl5.8.9 - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_stat+0x3d stat+0x2f syscall+0x335
76533 100850 perl5.8.9 - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_stat+0x3d stat+0x2f syscall+0x335

I can send the full output privately if any of the developers is interested
in looking at it.

We have removed all NFS shares from this server after the last incident, but
we have other servers where the problem might occur too. So any suggestions
on what we should check or do then, to provide more info, would be helpful.

-- 
Mikolaj Golub


Re: FreeBSD NFS client/Linux NFS server issue

2010-01-19 Thread Mikolaj Golub
, 
15, 31, 52}, nm_sdrtt = {3, 3, 15, 15}, nm_sent = 0, nm_cwnd = 4096, 
nm_timeouts = 0, 
  nm_deadthresh = 9, nm_rsize = 32768, nm_wsize = 32768, nm_readdirsize = 4096, 
nm_readahead = 1, 
  nm_wcommitsize = 1177026, nm_acdirmin = 30, nm_acdirmax = 60, nm_acregmin = 
3, nm_acregmax = 60, 
  nm_verf = "JК╬W\000\004oМ", nm_bufq = {tqh_first = 0xda82dc70, tqh_last = 
0xda8058e0}, 
  nm_bufqlen = 2, nm_bufqwant = 0, nm_bufqiods = 1, nm_maxfilesize = 
1099511627775, 
  nm_rpcops = 0xc0c2b5bc, nm_tprintf_initial_delay = 12, nm_tprintf_delay = 30, 
nm_nfstcpstate = {
rpcresid = 0, flags = 1, sock_send_inprog = 0}, 
  nm_hostname = "172.30.10.92\000/var/www/app31", '\0' , 
nm_clientid = 0, nm_fsid = {
val = {0, 0}}, nm_lease_time = 0, nm_last_renewal = 0}

buffers on it:

(kgdb) p *nmp->nm_bufq.tqh_first
$7 = {b_bufobj = 0xc7324960, b_bcount = 31565, b_caller1 = 0x0, 
  b_data = 0xde581000 " valid_lines:", ' ' , "1341\n  
invalid_lines:", ' ' , "1556\n  total_lines:", ' ' 
, "2897\n\nError summary:\n  Inactive pr"..., 
b_error = 0, b_iocmd = 2 '\002', b_ioflags = 0 '\0', b_iooffset = 196608, 
  b_resid = 0, b_iodone = 0, b_blkno = 384, b_offset = 196608, b_bobufs = 
{tqe_next = 0x0, 
tqe_prev = 0xc7324964}, b_left = 0x0, b_right = 0x0, b_vflags = 0, 
b_freelist = {
tqe_next = 0xda805894, tqe_prev = 0xc725d3c0}, b_qindex = 0, b_flags = 
536870948, 
  b_xflags = 2 '\002', b_lock = {lk_object = {lo_name = 0xc0b73635 "bufwait", 
  lo_type = 0xc0b73635 "bufwait", lo_flags = 70844416, lo_witness_data = 
{lod_list = {
  stqe_next = 0x0}, lod_witness = 0x0}}, lk_interlock = 0xc0c77b50, 
lk_flags = 262144, 
lk_sharecount = 0, lk_waitcount = 0, lk_exclusivecount = 1, lk_prio = 80, 
lk_timo = 0, 
lk_lockholder = 0xfffe, lk_newlock = 0x0}, b_bufsize = 31744, 
b_runningbufspace = 0, 
  b_kvabase = 0xde581000 " valid_lines:", ' ' , "1341\n   
   invalid_lines:", ' ' , "1556\n  total_lines:", ' ' 
, "2897\n\nError summary:\n  Inactive pr"..., 
b_kvasize = 32768, b_lblkno = 6, b_vp = 0xc73248a0, b_dirtyoff = 31512, 
  b_dirtyend = 31565, b_rcred = 0x0, b_wcred = 0xcebec400, b_saveaddr = 
0xde581000, b_pager = {
pg_reqpage = 0}, b_cluster = {cluster_head = {tqh_first = 0xda917ec8, 
tqh_last = 0xda888e94}, 
cluster_entry = {tqe_next = 0xda917ec8, tqe_prev = 0xda888e94}}, b_pages = 
{0xc3726e90, 0xc448dca8, 
    0xc2a55b98, 0xc3bf1a28, 0xc3467ff0, 0xc3299600, 0xc28db130, 0xc2301398, 0x0 
}, 
  b_npages = 8, b_dep = {lh_first = 0x0}, b_fsprivate1 = 0x0, b_fsprivate2 = 
0x0, b_fsprivate3 = 0x0, 
  b_pin_count = 0}

These are entries from our log file. Note that b_qindex is 0. But
bufqueues[0] is empty:

(kgdb) p bufqueues[0]
$8 = {tqh_first = 0x0, tqh_last = 0xc0c83e20}

Also, doesn't it look strange that lk_lockholder of b_lock points to an
invalid location (0xfffe)?

-- 
Mikolaj Golub


Re: FreeBSD NFS client/Linux NFS server issue

2010-01-19 Thread Mikolaj Golub
On Tue, 19 Jan 2010 10:02:57 +0200 Mikolaj Golub wrote:

> I have found on the Internet that other people have observed a similar
> problem with a FreeBSD 6.2 client:
>
> http://forums.freebsd.org/showthread.php?t=1697

Reading this through carefully, it looks like the guy did not experience the
problem (stuck processes). He just described the behaviour of the FreeBSD
client when appending to a file.

-- 
Mikolaj Golub


Re: FreeBSD NFS client/Linux NFS server issue

2010-01-22 Thread Mikolaj Golub
On Tue, 19 Jan 2010 10:02:57 +0200 Mikolaj Golub wrote:

> So, on some of our freebsd7.1 nfs clients (and it looks like we have had
> similar case with 6.3), which have several nfs mounts to the same CentOS 5.3
> NFS server (mount options: rw,-3,-T,-s,-i,-r=32768,-w=32768,-o=noinet6), at
> some moment the access to one of the NFS mount gets stuck, while the access to
> the other mounts works ok.
>
> In all cases we have observed so far, the first stuck process was a
> php script (or two) that was (were) writing to a log file (appending). In
> tcpdump we see that every write to the file causes the sequence of the
> following rpc: ACCESS - READ - WRITE - COMMIT. And at some moment this stops
> after READ rpc call and successful reply.
>
> After this in tcpdump successful readdir/access/lookup/fstat calls are
> observed from our other utilities, which just check the presence of some files
> and they work ok (df also works). The php process at this state is in bo_wwait
> invalidating buffer cache [1].
>
> If at this time we try accessing the share with mc then it hangs acquiring the
> vn_lock held by php process [2] and after this any operations with this NFS
> share hang (df hangs too).
>
> If instead some other process is started that writes to some other file on
> this share (append) then the first process "unfreezes" too (starting from
> WRITE rpc, so there are no retransmits).

So it looks to me like the problem here is that the affected nfsmount
eventually ends up in this state:

(kgdb) p *nmp
$1 = {nm_mtx = {lock_object = {lo_name = 0xc0b808ee "NFSmount lock", 
  lo_type = 0xc0b808ee "NFSmount lock", lo_flags = 16973824, 
lo_witness_data = {lod_list = {
  stqe_next = 0x0}, lod_witness = 0x0}}, mtx_lock = 4, mtx_recurse = 
0}, nm_flag = 35399, 
  nm_state = 1310720, nm_mountp = 0xc6b472cc, nm_numgrps = 16, 
  nm_fh = "\001\000\000\000\000\223\000\000\...@\003\n", '\0' , nm_fhsize = 12, 
  nm_rpcclnt = {rc_flag = 0, rc_wsize = 0, rc_rsize = 0, rc_name = 0x0, rc_so = 
0x0, rc_sotype = 0, 
rc_soproto = 0, rc_soflags = 0, rc_timeo = 0, rc_retry = 0, rc_srtt = {0, 
0, 0, 0}, rc_sdrtt = {0, 
  0, 0, 0}, rc_sent = 0, rc_cwnd = 0, rc_timeouts = 0, rc_deadthresh = 0, 
rc_authtype = 0, 
rc_auth = 0x0, rc_prog = 0x0, rc_proctlen = 0, rc_proct = 0x0}, nm_so = 
0xc6e81d00, nm_sotype = 1, 
  nm_soproto = 0, nm_soflags = 44, nm_nam = 0xc6948640, nm_timeo = 6000, 
nm_retry = 2, nm_srtt = {15, 
15, 31, 52}, nm_sdrtt = {3, 3, 15, 15}, nm_sent = 0, nm_cwnd = 4096, 
nm_timeouts = 0, 
  nm_deadthresh = 9, nm_rsize = 32768, nm_wsize = 32768, nm_readdirsize = 4096, 
nm_readahead = 1, 
  nm_wcommitsize = 1177026, nm_acdirmin = 30, nm_acdirmax = 60, nm_acregmin = 
3, nm_acregmax = 60, 
  nm_verf = "JК╬W\000\004oМ", nm_bufq = {tqh_first = 0xda82dc70, tqh_last = 
0xda8058e0}, 
  nm_bufqlen = 2, nm_bufqwant = 0, nm_bufqiods = 1, nm_maxfilesize = 
1099511627775, 
  nm_rpcops = 0xc0c2b5bc, nm_tprintf_initial_delay = 12, nm_tprintf_delay = 30, 
nm_nfstcpstate = {
rpcresid = 0, flags = 1, sock_send_inprog = 0}, 
  nm_hostname = "172.30.10.92\000/var/www/app31", '\0' , 
nm_clientid = 0, nm_fsid = {
val = {0, 0}}, nm_lease_time = 0, nm_last_renewal = 0}

We have a non-empty nm_bufq and nm_bufqiods = 1, but there is actually no nfsiod
thread running for this mount, which is wrong -- nm_bufq will not be emptied until
some other process starts writing to the nfsmount and starts an nfsiod thread for
this mount.

Reviewing the code to see how this could happen, I see the following path. Could
someone confirm or disprove it?

in nfs_bio.c:nfs_asyncio() we have:

   1363         mtx_lock(&nfs_iod_mtx);
...
   1374         /*
   1375          * Find a free iod to process this request.
   1376          */
   1377         for (iod = 0; iod < nfs_numasync; iod++)
   1378                 if (nfs_iodwant[iod]) {
   1379                         gotiod = TRUE;
   1380                         break;
   1381                 }
   1382 
   1383         /*
   1384          * Try to create one if none are free.
   1385          */
   1386         if (!gotiod) {
   1387                 iod = nfs_nfsiodnew();
   1388                 if (iod != -1)
   1389                         gotiod = TRUE;
   1390         }

Let's consider the situation when a new nfsiod is created.

Before creating the nfssvc_iod thread, nfs_nfsiod.c:nfs_nfsiodnew() unlocks
nfs_iod_mtx:

179         mtx_unlock(&nfs_iod_mtx);
180         error = kthread_create(nfssvc_iod, nfs_asyncdaemon + i, NULL, RFHIGHPID,
181             0, "nfsiod %d", newiod);
182         mtx_lock(&nfs_iod_mtx);


And nfs_nfsiod.c:nfssvc_iod() does the following:

226 mtx_lock(&nfs_iod_mtx);
...
238 nfs_iodwant[myiod] = curthread->td_proc;
239 nfs

Re: FreeBSD NFS client/Linux NFS server issue

2010-01-22 Thread Mikolaj Golub
On Fri, 22 Jan 2010 14:37:48 -0500 (EST) Rick Macklem wrote:

>> --- nfs_bio.c.orig  2010-01-22 15:38:02.0 +
>> +++ nfs_bio.c   2010-01-22 15:39:58.0 +
>> @@ -1385,7 +1385,7 @@ again:
>> */
>>if (!gotiod) {
>>iod = nfs_nfsiodnew();
>> -   if (iod != -1)
>> +   if ((iod != -1) && (nfs_iodwant[iod] == NULL))
>>gotiod = TRUE;
>>}
>>
>
> Unfortunately, I don't think the above fixes the problem.
> If another thread that called nfs_asyncio() has "stolen" this "iod",
> it will have set nfs_iodwant[iod] == NULL (set non-NULL at #238)
> and it will remain NULL until the other thread is done with it.

I see. I have missed this. Thanks.

>
> There should probably be some sort of 3 way handshake between
> the code in nfs_asyncio() after calling nfs_nfsiodnew() and the
> code near the beginning of nfssvc_iod(), but I think the following
> somewhat cheesy fix might do the trick:
>
>   if (!gotiod) {
>   iod = nfs_nfsiodnew();
>   if (iod != -1) {
>   if (nfs_iodwant[iod] == NULL) {
>   /*
>* Either another thread has acquired this
>* iod or I acquired the nfs_iod_mtx mutex
>* before the new iod thread did in
>* nfssvc_iod(). To be safe, go back and
>* try again after allowing another thread
>* to acquire the nfs_iod_mtx mutex.
>*/
>   mtx_unlock(&nfs_iod_mtx);
>   /*
>* So long as mtx_lock() implements some
>* sort of fairness, nfssvc_iod() should
>* get nfs_iod_mtx here and set
>* nfs_iodwant[iod] != NULL for the case
>* where the iod has not been "stolen" by
>* another thread for a different mount
>* point.
>*/
>   mtx_lock(&nfs_iod_mtx);
>   goto again;
>   }
>   gotiod = TRUE;
>   }
>   }
>
> Does anyone else have a better solution?
> (Mikolaj, could you by any chance test this? You can test yours, but I
> think it breaks.)

Unfortunately we have observed this only on our production servers. A week ago
we made some configuration changes as a workaround: we reconfigured cron not to
run scripts simultaneously, and added cron scripts that just periodically write
a line to a file on the nfs share (to "unlock" it if it is locked). We have not
observed problems since then and we would not like to experiment in
production. If I manage to produce a good test case in a test environment I will
be able to test the patch, but I am not sure...

-- 
Mikolaj Golub


Re: top Segmentation faulting on 8.0p2 amd64

2010-01-22 Thread Mikolaj Golub
On Wed, 20 Jan 2010 08:06:23 +0100 Harald Schmalzbauer wrote:

> Dear all,
>
> I have no idea why top crashes with segmentation fault on my amd64
> machine running FreeBSD 8.0-RELEASE-p2.
> If someone wants to have a look at the core dump:
> http://www.schmalzbauer.de/downloads/top.core

A core file is useless without the binary and libraries, so it is better to run
gdb on your host, produce a backtrace and post it here:

gdb /usr/bin/top top.core
bt

And of course a backtrace from a top built with -g would be much better.

cd /usr/src/usr.bin/top
CFLAGS=-g make

-- 
Mikolaj Golub


Re: top Segmentation faulting on 8.0p2 amd64 (nss_ldapd problem?)

2010-01-23 Thread Mikolaj Golub
On Sat, 23 Jan 2010 02:02:04 +0100 Harald Schmalzbauer wrote:

> gdb /usr/bin/top top.core
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> Core was generated by `top'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /lib/libncurses.so.8...done.
> Loaded symbols for /lib/libncurses.so.8
> Reading symbols from /lib/libm.so.5...done.
> Loaded symbols for /lib/libm.so.5
> Reading symbols from /lib/libkvm.so.5...done.
> Loaded symbols for /lib/libkvm.so.5
> Reading symbols from /lib/libc.so.7...done.
> Loaded symbols for /lib/libc.so.7
> Reading symbols from /usr/local/lib/nss_ldap.so.1...done.
> Loaded symbols for /usr/local/lib/nss_ldap.so.1
> Reading symbols from /libexec/ld-elf.so.1...done.
> Loaded symbols for /libexec/ld-elf.so.1
> bt:
> #0  0x000800d08403 in __nss_compat_gethostbyname () from
> /usr/local/lib/nss_ldap.so.1
> #0  0x000800d08403 in __nss_compat_gethostbyname () from
> /usr/local/lib/nss_ldap.so.1
> #1  0x000800d0606f in _nss_ldap_getpwent_r () from
> /usr/local/lib/nss_ldap.so.1

It is worth rebuilding and installing nss_ldap.so with debugging symbols.

> #2  0x0008009ffc54 in __nss_compat_getpwent_r () from /lib/libc.so.7
> #3  0x000800a84a3d in nsdispatch () from /lib/libc.so.7
> #4  0x000800a50976 in getpwent_r () from /lib/libc.so.7
> #5  0x000800a50596 in sysctlbyname () from /lib/libc.so.7

And maybe libc.so :-)

> #6  0x00406c6d in machine_init (statics=0x7fffea30,
> do_unames=1 '\001')
> at /usr/src/usr.bin/top/machine.c:257
> #7  0x00407a10 in main (argc=1, argv=0x7fffeb08)
> at /usr/src/usr.bin/top/../../contrib/top/top.c:458
>
> I'm using nss_ldapd-0.7.2 and there's no way to live without ldap...

-- 
Mikolaj Golub


Re: FreeBSD NFS client/Linux NFS server issue

2010-01-23 Thread Mikolaj Golub
On Fri, 22 Jan 2010 17:13:09 -0500 (EST) Rick Macklem wrote:

> On Fri, 22 Jan 2010, Rick Macklem wrote:
>
>>
>> There should probably be some sort of 3 way handshake between
>> the code in nfs_asyncio() after calling nfs_nfsiodnew() and the
>> code near the beginning of nfssvc_iod(), but I think the following
>> somewhat cheesy fix might do the trick:
>>
> [stuff deleted]
> I know it's a little weird to reply to my own posting, but I think
> this might be a reasonable patch (I have only tested it for a few
> minutes at this point).
>
> I basically redefined nfs_iodwant[] as a tri-state variable (although
> it was a struct proc *, it was only tested NULL/non-NULL).
> 0 - was NULL
> 1 - was non-NULL
> -1 - just created by nfs_asyncio() and will be used by it
>
> I'll keep testing it, but hopefully someone else can test and/or
> review it... rick
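
The tri-state idea quoted above can be sketched in user space as follows.
This is only a toy model, not the patch itself: the enum and function names
are hypothetical, locking is left out, and how the real patch uses the third
state is an assumption based on the three-line description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the three states described in the quote. */
enum toy_iod_state {
	TOY_IOD_BUSY = 0,	/* "was NULL": not waiting for work */
	TOY_IOD_IDLE = 1,	/* "was non-NULL": waiting for work */
	TOY_IOD_RESERVED = -1	/* just created for one specific caller */
};

/* Find an idle iod and take it; return its index, or -1 if none. */
static int
toy_claim_idle(enum toy_iod_state *want, size_t n)
{
	size_t i;

	assert(want != NULL);
	for (i = 0; i < n; i++) {
		if (want[i] == TOY_IOD_IDLE) {
			want[i] = TOY_IOD_BUSY;
			return ((int)i);
		}
	}
	return (-1);
}

/* Creator side: mark the new slot as reserved before its thread runs. */
static void
toy_reserve_new(enum toy_iod_state *want, size_t slot)
{
	assert(want != NULL);
	want[slot] = TOY_IOD_RESERVED;
}

/*
 * New thread side: advertise the slot as idle only if nobody reserved it.
 * Returns 1 if the thread belongs to its creator, 0 if it is up for grabs.
 */
static int
toy_iod_start(enum toy_iod_state *want, size_t slot)
{
	assert(want != NULL);
	if (want[slot] == TOY_IOD_RESERVED) {
		want[slot] = TOY_IOD_BUSY;
		return (1);
	}
	want[slot] = TOY_IOD_IDLE;
	return (0);
}
```

In this model a second caller scanning with toy_claim_idle() never sees a
reserved slot, so it cannot "steal" an iod that was created for another mount.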

I applied your patch to FreeBSD 8.0 (the box I get at the weekend :-), mounted 10
shares, set vfs.nfs.iodmaxidle=10 (to make nfsiod creation more frequent)
and have been running tests for 4 hours -- just to check that the patch does not
break anything. No issues have been detected.

It would be very nice to have this patch committed.

-- 
Mikolaj Golub


Re: bsnmpd returns incorrect hrProcessorLoad values

2010-01-29 Thread Mikolaj Golub
On Fri, 29 Jan 2010 12:37:52 +0100 Gustau Pérez wrote:

>   Hi,
>
>   I'm using cacti to monitor some servers running FBSD. I was using 7.2
> with SCHED_4BSD. With this configuration : bsnmpd+bsnmp-ucd was
> returning right values for the cores' load.
>
>I recently updated the servers (via csup) to RELENG_8 and bsnmpd is
> returning negative values for the cores' load. If I try something like
> in a 4-core system :
>
>   snmpwalk -v 2c -c community server .1.3.6.1.2.1.25.3.3.1
>
>what I get is :
>
> .1.3.6.1.2.1.25.3.3.1.1.6 = OID: .0.0
> .1.3.6.1.2.1.25.3.3.1.1.10 = OID: .0.0
> .1.3.6.1.2.1.25.3.3.1.1.14 = OID: .0.0
> .1.3.6.1.2.1.25.3.3.1.1.18 = OID: .0.0
> .1.3.6.1.2.1.25.3.3.1.2.6 = INTEGER: -182
> .1.3.6.1.2.1.25.3.3.1.2.10 = INTEGER: -182
> .1.3.6.1.2.1.25.3.3.1.2.14 = INTEGER: -182
> .1.3.6.1.2.1.25.3.3.1.2.18 = INTEGER: -182
>
>   I tried an old bsnmp-ucd (0.2.1, works fine on a 7.2 system) with an
> 8.0 system. Same wrong results. And it seems bsnmpd in /usr/src/contrib
> has not changed between 7.2 and 8.0.
>
>   Any ideas? I'm not an expert, but with tcpdump I see different
> results. Against an old 7.2 system, the field related to each core load
> gives the right value. Instead, against an 8.0 system, those fields show
> (in hex) values like fd 4b. What I don't know is how bsnmp-ucd retrieves
> those values and how it constructs the UDP response packet.

bsnmp-ucd has nothing to do with HOST-RESOURCES-MIB. These MIBs are provided
by the snmp_hostres(3) module (/usr/lib/snmp_hostres.so), so something is wrong
there (I suppose it is not in sync with some recent changes in the kernel or
libkvm).

-- 
Mikolaj Golub


virtualbox status on 8.0-STABLE i386

2010-03-06 Thread Mikolaj Golub
Hi,

I recently updated my 8.0-STABLE i386 system and found that virtualbox now
crashes my box with the error
panic: vm_fault: fault on nofault entry, addr: c1608000

(kgdb) bt
#0  doadump () at pcpu.h:246
#1  0xc04ec379 in db_fncall (dummy1=-1064468854, dummy2=0, dummy3=-1, 
dummy4=0xe865d5bc "пуeХ")
at /usr/src/sys/ddb/db_command.c:548
#2  0xc04ec7af in db_command (last_cmdp=0xc0e04c9c, cmd_table=0x0, dopager=0)
at /usr/src/sys/ddb/db_command.c:445
#3  0xc04ec864 in db_command_script (command=0xc0e05bc4 "call doadump")
at /usr/src/sys/ddb/db_command.c:516
#4  0xc04f09a0 in db_script_exec (scriptname=0xe865d6c8 "kdb.enter.panic", 
warnifnotfound=Variable "warnifnotfound" is not available.
)
at /usr/src/sys/ddb/db_script.c:302
#5  0xc04f0a87 in db_script_kdbenter (eventname=0xc0cc248d "panic") at 
/usr/src/sys/ddb/db_script.c:324
#6  0xc04ee768 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:228
#7  0xc08d7d06 in kdb_trap (type=3, code=0, tf=0xe865d804) at 
/usr/src/sys/kern/subr_kdb.c:535
#8  0xc0beb39b in trap (frame=0xe865d804) at /usr/src/sys/i386/i386/trap.c:690
#9  0xc0bccd0b in calltrap () at /usr/src/sys/i386/i386/exception.s:165
#10 0xc08d7e8a in kdb_enter (why=0xc0cc248d "panic", msg=0xc0cc248d "panic") at 
cpufunc.h:71
#11 0xc08a88b6 in panic (fmt=0xc0cecbc4 "vm_fault: fault on nofault entry, 
addr: %lx")
at /usr/src/sys/kern/kern_shutdown.c:562
#12 0xc0b0c3d7 in vm_fault (map=0xc199, vaddr=3244326912, 
fault_type=Variable "fault_type" is not available.
)
at /usr/src/sys/vm/vm_fault.c:283
#13 0xc0bea7d6 in trap_pfault (frame=0xe865dac0, usermode=0, eva=3244330720)
at /usr/src/sys/i386/i386/trap.c:840
#14 0xc0beb225 in trap (frame=0xe865dac0) at /usr/src/sys/i386/i386/trap.c:533
#15 0xc0bccd0b in calltrap () at /usr/src/sys/i386/i386/exception.s:165
#16 0xc12beed0 in rtR0MemObjNativeGetPagePhysAddr (pMem=0xc5ed3110, iPage=0) at 
pmap.h:300
#17 0xc12ac354 in SUPR0LockMem (pSession=0xc5c61c10, pvR3=695959552, cPages=1, 
paPages=0xc5f83668)
at SUPDrv.c:2307
#18 0xc12ac8cb in supdrvIOCtl (uIOCtl=536892942, pDevExt=0xc12c9ac0, 
pSession=0xc5c61c10, 
pReqHdr=0xc5f83650) at SUPDrv.c:1245
#19 0xc12b0c3a in VBoxDrvFreeBSDIOCtl (pDev=0xc665d800, ulCmd=536892942, 
pvData=0xe865dd00 "ю8 )\003╬кюq\002", fFile=3, pTd=0xc69556f0)
at 
/usr/ports/emulators/virtualbox-ose-kmod/work/VirtualBox-3.1.2_OSE/out/freebsd.x86/debug/bin/src/vboxdrv/freebsd/SUPDrv-freebsd.c:505
#20 0xc0829658 in devfs_ioctl_f (fp=0xc670fa80, com=536892942, data=0xe865dd00, 
cred=0xc6bbeb00, 
td=0xc69556f0) at /usr/src/sys/fs/devfs/devfs_vnops.c:659
#21 0xc08eec8d in kern_ioctl (td=0xc69556f0, fd=7, com=536892942, 
data=0xe865dd00 "ю8 )\003╬кюq\002")
at file.h:262
#22 0xc08eee14 in ioctl (td=0xc69556f0, uap=0xe865dcf8) at 
/usr/src/sys/kern/sys_generic.c:678
#23 0xc0beaad0 in syscall (frame=0xe865dd38) at 
/usr/src/sys/i386/i386/trap.c:
#24 0xc0bccda0 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:261
#25 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) fr 16
#16 0xc12beed0 in rtR0MemObjNativeGetPagePhysAddr (pMem=0xc5ed3110, iPage=0) at 
pmap.h:300
300 pa = (pa & PG_FRAME) | (va & PAGE_MASK);
(kgdb) list
295  * access the PTE because it would use the new PDE.  It 
is,
296  * however, safe to use the old PDE because the page 
table
297  * page is preserved by the promotion.
298  */
299 pa = KPTmap[i386_btop(va)];
300 pa = (pa & PG_FRAME) | (va & PAGE_MASK);
301 }
302 return (pa);
303 }
304

There were some changes in this part recently (r203182):

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/i386/include/pmap.h.diff?r1=1.140.2.2;r2=1.140.2.3;only_with_tag=RELENG_8

So I replaced KPTmap[i386_btop(va)] with *vtopte(va) and have a working
virtualbox again, but I suppose this is a problem with virtualbox rather than
with the kernel code.

In February Alexander Eichner posted a patch to freebsd-emulation@ (thread
with the subject "Patch to fix VirtualBox with recent kernel versions"):

http://lists.freebsd.org/pipermail/freebsd-emulation/2010-February/007434.html

But it does not fix my panics. The patch adds additional handling in
rtR0MemObjNativeGetPagePhysAddr() for the case 
pMem.enmType == RTR0MEMOBJTYPE_MAPPING, while I am observing the panics
for pMem.enmType == RTR0MEMOBJTYPE_LOCK:

(kgdb) fr 17
#17 0xc12ac354 in SUPR0LockMem (pSession=0xc5c61c10, pvR3=695959552, cPages=1, 
paPages=0xc5f83668)
at SUPDrv.c:2307
2307        paPages[iPage] = RTR0MemObjGetPagePhysAddr(Mem.MemObj, iPage);
(kgdb) p Mem.MemObj.enmType
$1 = RTR0MEMOBJTYPE_LOCK

So, it looks like some additional handling should b

Re: net.inet.tcp.timer_race: does anyone have a non-zero value?

2010-03-07 Thread Mikolaj Golub
On Sun, 7 Mar 2010 11:59:35 + (GMT) Robert Watson wrote:

> Please check the results of the following command:
>
>   % sysctl net.inet.tcp.timer_race
>   net.inet.tcp.timer_race: 0

Are results for FreeBSD 7 of interest to you? We currently have mostly
FreeBSD 7.1 hosts in production and I observe nonzero values on 8
hosts (about 15%). I can send more details to you privately if you are
interested.

-- 
Mikolaj Golub


Re: virtualbox status on 8.0-STABLE i386

2010-03-08 Thread Mikolaj Golub
On Sun, 07 Mar 2010 15:28:48 +0100 Alexander Eichner wrote:

> Hi,
>
> can you try the attached patch please?
> This should fix the panic you encountered. Please undo your kernel
> changes before testing.

Unfortunately, the same panic:

(kgdb) bt
#0  doadump () at pcpu.h:246
#1  0xc04ec379 in db_fncall (dummy1=-1064468854, dummy2=0, dummy3=-1, 
dummy4=0xe866b5b4 "х╣fХ")
at /usr/src/sys/ddb/db_command.c:548
#2  0xc04ec7af in db_command (last_cmdp=0xc0e04c9c, cmd_table=0x0, dopager=0)
at /usr/src/sys/ddb/db_command.c:445
#3  0xc04ec864 in db_command_script (command=0xc0e05bc4 "call doadump")
at /usr/src/sys/ddb/db_command.c:516
#4  0xc04f09a0 in db_script_exec (scriptname=0xe866b6c0 "kdb.enter.panic", 
warnifnotfound=Variable "warnifnotfound" is not available.
)
at /usr/src/sys/ddb/db_script.c:302
#5  0xc04f0a87 in db_script_kdbenter (eventname=0xc0cc246d "panic") at 
/usr/src/sys/ddb/db_script.c:324
#6  0xc04ee768 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:228
#7  0xc08d7d06 in kdb_trap (type=3, code=0, tf=0xe866b7fc) at 
/usr/src/sys/kern/subr_kdb.c:535
#8  0xc0beb38b in trap (frame=0xe866b7fc) at /usr/src/sys/i386/i386/trap.c:690
#9  0xc0bcccfb in calltrap () at /usr/src/sys/i386/i386/exception.s:165
#10 0xc08d7e8a in kdb_enter (why=0xc0cc246d "panic", msg=0xc0cc246d "panic") at 
cpufunc.h:71
#11 0xc08a88b6 in panic (fmt=0xc0cecba4 "vm_fault: fault on nofault entry, 
addr: %lx")
at /usr/src/sys/kern/kern_shutdown.c:562
#12 0xc0b0c3c7 in vm_fault (map=0xc199, vaddr=3244318720, 
fault_type=Variable "fault_type" is not available.
)
at /usr/src/sys/vm/vm_fault.c:283
#13 0xc0bea7c6 in trap_pfault (frame=0xe866bab8, usermode=0, eva=3244322776)
at /usr/src/sys/i386/i386/trap.c:840
#14 0xc0beb215 in trap (frame=0xe866bab8) at /usr/src/sys/i386/i386/trap.c:533
#15 0xc0bcccfb in calltrap () at /usr/src/sys/i386/i386/exception.s:165
#16 0xc12beef3 in rtR0MemObjNativeGetPagePhysAddr () from 
/boot/modules/vboxdrv.ko
#17 0xc12ac374 in SUPR0LockMem () from /boot/modules/vboxdrv.ko
#18 0xc12ac8eb in supdrvIOCtl () from /boot/modules/vboxdrv.ko
#19 0xc12b0c5a in VBoxDrvFreeBSDIOCtl () from /boot/modules/vboxdrv.ko
#20 0xc0829658 in devfs_ioctl_f (fp=0xc5f1c8c0, com=3321378576, 
data=0xe866bd00, cred=0xc6972a00, 
td=0xc728e250) at /usr/src/sys/fs/devfs/devfs_vnops.c:659
#21 0xc08eec8d in kern_ioctl (td=0xc728e250, fd=7, com=536892942, 
data=0xe866bd00 "@г\023)Ь\023Эю8╫fХЬ\023Эю,╫fХ\005\\╫ю\001") at file.h:262
#22 0xc08eee14 in ioctl (td=0xc728e250, uap=0xe866bcf8) at 
/usr/src/sys/kern/sys_generic.c:678
#23 0xc0beaac0 in syscall (frame=0xe866bd38) at 
/usr/src/sys/i386/i386/trap.c:
#24 0xc0bccd90 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:261
#25 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)

(this time I built the modules without debugging symbols).

Just to be sure that I did everything properly, here are the steps I took:

1) restored the original pmap.h (with KPTmap), rebuilt the kernel and rebooted
2) rebuilt the virtualbox drivers and virtualbox with the patch (not sure the
   latter was needed, but just in case...):
     cd emulators/virtualbox-ose-kmod && make patch
     applied this patch and your previous patch ("Patch to fix VirtualBox with
     recent kernel versions")
     built and reinstalled
     the same for emulators/virtualbox-ose
3) rebooted and started the vm guest

virtualbox-ose-3.1.2_1 A general-purpose full virtualizer for x86 hardware
virtualbox-ose-kmod-3.1.2_1 VirtualBox kernel module for FreeBSD

-- 
Mikolaj Golub


Re: Fatal trap 12: page fault while in kernel mode/current process: 12 (swi2: cambio)

2010-03-21 Thread Mikolaj Golub
On Sun, 21 Mar 2010 00:39:01 -0400 jhell wrote:

> DDB as I have heard can be configured AFAIR to textdump but I have no
> knowledge of that.

ddb_enable="YES" in /etc/rc.conf would be enough. But I also remove "textdump
set" from the kdb.enter.panic script (/etc/ddb.conf), as I prefer normal dumps
(with the output of ddb scripts in the capture buffer) to textdumps. You can't
debug a textdump, and crashinfo will fail on it too. And all the info provided
in a textdump is retrieved automatically from the vmcore capture buffer by the
crashinfo utility.

-- 
Mikolaj Golub


Re: sysctl(?) problem on boot in 8-STABLE

2010-04-05 Thread Mikolaj Golub
On Mon, 5 Apr 2010 10:56:22 -0400 Jeff Blank wrote:

> Hi,
>
> I upgraded an 8-STABLE box to r206119 and am now unable to boot
> multi-user.  I found that it hangs at line 58/59 of
> /etc/rc.d/initrandom:
>
> ( ps -fauxww; sysctl -a; date; df -ib; dmesg; ps -fauxww ) \
> | dd of=/dev/random bs=8k 2>/dev/null
>
> when I run each of these commands by hand, I get only as far as
> 'sysctl -a', which seems to exit normally but leaves my keyboard
> unresponsive (actually acting like I'm leaning on the  key).
> more digging reveals 'sysctl dev.uart' to be what triggers it.

kern/143040 looks similar.

-- 
Mikolaj Golub


Re: em driver regression

2010-04-11 Thread Mikolaj Golub
Hi,

On Thu, 8 Apr 2010 14:52:07 -0500 Brandon Gooch wrote:

> On Thu, Apr 8, 2010 at 2:17 PM, Jack Vogel  wrote:
>> Try the code I just checked in, it puts in the CRC stripping, but also
>> tweaks the
>> TX code, this may resolve the watchdogs. Let me know.
>>
>> Cheers,
>>
>> Jack
>>
>
> Yes, this is indeed the fix for both the dhclient and VirtualBox issue
> (at least with my setup). There appear to be no ill effects either.

Today I upgraded the kernel in my VirtualBox (3.1.51.r27187) to the
latest current and now have the "em0: Watchdog timeout -- resetting" issue. My
previous kernel was from Mar 12.

Tracking down the revision where the problem appeared, I see that the issue is
not observed for r203834 and starts to be observed after r205869.

Interestingly, if I enter ddb and then exit (sometimes I needed to do this
twice) the errors stop and network starts working.

-- 
Mikolaj Golub
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: em driver regression

2010-04-14 Thread Mikolaj Golub

On Sun, 11 Apr 2010 23:40:03 +0300 Mikolaj Golub wrote:

 MG> Hi,

 MG> Today I have upgraded the kernel in my VirtualBox (3.1.51.r27187) to the
 MG> latest current and have "em0: Watchdog timeout -- resetting" issue. My
 MG> previous kernel was for Mar 12.

 MG> Tracking the revision where the problem appeared I see that the issue is
 MG> not observed for r203834 and starts to observe after r205869.

 MG> Interestingly, if I enter ddb and then exit (sometimes I needed to do this
 MG> twice) the errors stop and network starts working.

Adding some prints I observed the following:

Apr 14 07:14:08 hasta kernel: em0: lem_init_locked started (ticks 813, watchdog_time: 0)
Apr 14 07:14:08 hasta kernel: em0: lem_init_locked returned at 3 (ticks 818, watchdog_time: 0)
Apr 14 07:14:08 hasta kernel: em0: setting watchdog_check to TRUE in lem_mq_start_locked 1 (ticks 818, watchdog_time: 0)
Apr 14 07:14:08 hasta kernel: em0: lem_init_locked started (ticks 818, watchdog_time: 0)
Apr 14 07:14:08 hasta kernel: em0: lem_init_locked returned at 3 (ticks 823, watchdog_time: 0)
Apr 14 07:14:08 hasta kernel: em0: setting watchdog_check to TRUE in lem_mq_start_locked 1 (ticks 828, watchdog_time: 0)
Apr 14 07:14:08 hasta kernel: em0: lem_txeof started (ticks: 923, watchdog_time: 0)
Apr 14 07:14:08 hasta kernel: em0: lem_txeof returned at 3 (ticks: 923, watchdog_time: 0)
Apr 14 07:14:08 hasta kernel: em0: lem_txeof started (ticks: 1023, watchdog_time: 0)
Apr 14 07:14:08 hasta kernel: em0: lem_txeof returned at 3 (ticks: 1023, watchdog_time: 0)
Apr 14 07:14:08 hasta kernel: em0: Watchdog timeout -- resetting (ticks: 1023, watchdog_time: 0)
Apr 14 07:14:08 hasta kernel: em0: lem_init_locked started (ticks 1024, watchdog_time: 0)
Apr 14 07:14:08 hasta kernel: em0: lem_init_locked returned at 3 (ticks 1028, watchdog_time: 0)
Apr 14 07:14:08 hasta kernel: em0: lem_txeof started (ticks: 1128, watchdog_time: 0)
Apr 14 07:14:08 hasta kernel: em0: lem_txeof returned at 1 (ticks: 1128, watchdog_time: 0)
Apr 14 07:14:08 hasta kernel: em0: Watchdog timeout -- resetting (ticks: 1128, watchdog_time: 0)
...

So although adapter->watchdog_check was set to TRUE, adapter->watchdog_time was
never set.

I see that before r205869 watchdog_time was set in em_xmit, but lem_xmit does
not contain this. After adding this line back to lem_xmit (see the first patch
below) the problem went away on my box.

Also seeing that in the current em_mq_start_locked() both watchdog_check and
watchdog_time are set I tried another patch adding watchdog_time setting in
lem_mq_start_locked() too (see the second patch below). This has also fixed
the issue for me but I don't know if this is a correct fix and if this is the
only place where watchdog_time should be set (there are other places in the
function and in the code where watchdog_check is set to TRUE but watchdog_time
is not set).

-- 
Mikolaj Golub

Index: sys/dev/e1000/if_lem.c
===
--- sys/dev/e1000/if_lem.c	(revision 206595)
+++ sys/dev/e1000/if_lem.c	(working copy)
@@ -1880,6 +1880,7 @@ lem_xmit(struct adapter *adapter, struct mbuf **m_
 	 */
 	tx_buffer = &adapter->tx_buffer_area[first];
 	tx_buffer->next_eop = last;
+	adapter->watchdog_time = ticks;
 
 	/*
 	 * Advance the Transmit Descriptor Tail (TDT), this tells the E1000
Index: sys/dev/e1000/if_lem.c
===
--- sys/dev/e1000/if_lem.c	(revision 206595)
+++ sys/dev/e1000/if_lem.c	(working copy)
@@ -873,6 +873,7 @@ lem_mq_start_locked(struct ifnet *ifp, struct mbuf
 			*/
 			ETHER_BPF_MTAP(ifp, m);
 			adapter->watchdog_check = TRUE;
+			adapter->watchdog_time = ticks;
 		}
 	} else if ((error = drbr_enqueue(ifp, adapter->br, m)) != 0)
 		return (error);

Re: em driver regression

2010-04-14 Thread Mikolaj Golub
On Wed, 14 Apr 2010 09:28:33 -0700 Jack Vogel wrote:

> Oh, didn't realize you were running the lem code :)  Will make the changes
> shortly,

r206614 works for me. Thanks :-)

> thanks for your debugging efforts.
>
> Jack

-- 
Mikolaj Golub


Re: bsnmpd always died on HDD detach

2012-09-09 Thread Mikolaj Golub
On Sun, Sep 09, 2012 at 11:56:55PM +0200, Miroslav Lachman wrote:
> I am running bsnmpd with basic snmpd.config (only community and location 
> changed).
> 
> When there is a problem with an HDD and the disk disappears from the ATA
> channel (e.g. the disk is physically removed), bsnmpd always dumps core:
> 
> kernel: pid 1188 (bsnmpd), uid 0: exited on signal 11 (core dumped)
> 
> I have seen this for a long time on all releases of the 7.x and 8.x branches
> (i386 and amd64). I have not tested 9.x.
> 
> Is it a known bug, or should I file PR?

Do you happen to run bsnmp-ucd too? If you do, what version is it?
In bsnmp-ucd-0.3.5 I introduced a bug that led to a bsnmpd crash on
disk detach. It was fixed (thanks to Brian Somers) in 0.3.6.

-- 
Mikolaj Golub


Re: bsnmpd always died on HDD detach

2012-09-10 Thread Mikolaj Golub
On Mon, Sep 10, 2012 at 04:46:15PM +0200, Miroslav Lachman wrote:
> Mikolaj Golub wrote:
> > On Sun, Sep 09, 2012 at 11:56:55PM +0200, Miroslav Lachman wrote:
> >> I am running bsnmpd with basic snmpd.config (only community and location
> >> changed).
> >>
> >> When there is a problem with an HDD and the disk disappears from the ATA
> >> channel (e.g. the disk is physically removed), bsnmpd always dumps core:
> >>
> >> kernel: pid 1188 (bsnmpd), uid 0: exited on signal 11 (core dumped)
> >>
> >> I have seen this for a long time on all releases of the 7.x and 8.x
> >> branches (i386 and amd64). I have not tested 9.x.
> >>
> >> Is it a known bug, or should I file PR?
> >
> > Do you happen to run bsnmp-ucd too? If you do then what version is it?
> > In bsnmp-ucd-0.3.5 I introduced a bug that lead to bsnmpd crash on a
> > disk detach. It has been fixed (thanks to Brian Somers) in 0.3.6.
> 
> No, I never installed bsnmpd-ucd. We are using plain bsnmpd from base 
> without any modules.
> It is used by MRTG only for network traffic. Nothing else.

Then the backtrace might be useful.

gdb /usr/sbin/bsnmpd /path/to/bsnmpd.core
bt
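
For anyone who has not used gdb before, the two steps above can also be run non-interactively. This is only a minimal sketch and not part of the original exchange: the helper names bt_commands and core_backtrace are made up for illustration, the core path is whatever the caller passes in, and it relies only on gdb's -batch and -x options.

```shell
#!/bin/sh
# Print the gdb commands to run against the core file.
bt_commands() {
    printf 'bt\n'
}

# core_backtrace BINARY CORE
# Hypothetical helper: runs gdb in batch mode and prints the backtrace.
# Requires gdb to be installed; it is only defined here, not called.
core_backtrace() {
    cmds=$(mktemp) || return 1
    bt_commands > "$cmds"
    rc=0
    gdb -batch -x "$cmds" "$1" "$2" || rc=$?
    rm -f "$cmds"
    return "$rc"
}

# Example (not run here):
#   core_backtrace /usr/sbin/bsnmpd /path/to/bsnmpd.core > bsnmpd-bt.txt
```

Redirecting the output to a file makes it easy to paste the whole backtrace into a reply.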

-- 
Mikolaj Golub


Re: bsnmpd always died on HDD detach

2012-09-11 Thread Mikolaj Golub
On Tue, Sep 11, 2012 at 10:16:57PM +0200, Miroslav Lachman wrote:

> (gdb) bt
> #0  0x000801046cba in refresh_disk_storage_tbl () from /usr/lib/snmp_hostres.so
> #1  0x0008010478bd in refresh_device_tbl () from /usr/lib/snmp_hostres.so
> #2  0x000801047be6 in start_device_tbl () from /usr/lib/snmp_hostres.so
> #3  0x00080065fad5 in poll_dispatch () from /lib/libbegemot.so.4
> #4  0x0040616a in main ()
> 
> 
> Is it all you need? (I don't know how to use gdb)
> 
> It is on FreeBSD 8.3-RELEASE #0: Mon Apr  9 21:23:18 UTC 2012 
> r...@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64

I am not sure we can get more than this from the core, as snmp_hostres
is not built with debugging symbols. You can try rebuilding
snmp_hostres with the -g option, installing it, and running gdb/bt again:

DEBUG_FLAGS=-g make -C /usr/src/usr.sbin/bsnmpd/modules/snmp_hostres clean all install

AFAIK it may or may not work. If it does not, then wait for another crash :-)

-- 
Mikolaj Golub


Re: bsnmpd always died on HDD detach

2012-09-12 Thread Mikolaj Golub
On Wed, Sep 12, 2012 at 10:39:12AM +0200, Miroslav Lachman wrote:

> (gdb) bt
> #0  0x000801046cba in disk_query_disk (entry=0x0) at 
> hostres_diskstorage_tbl.c:241
> #1  0x000801dd6a00 in ?? ()
> #2  0x000801dd6600 in ?? ()
> #3  0x in ?? ()
> #4  0x000801048230 in device_entry_create (name=0x0, 
> location=0x800c14ee0 "0", descr=0x8010482a6 "") at hostres_device_tbl.c:217
> #5  0x000801dd7800 in ?? ()
> #6  0x000801dd7800 in ?? ()
> #7  0x000801dd7400 in ?? ()
> #8  0x in ?? ()
> #9  0x000801048230 in device_entry_create (name=0x801dd7c00 "", 
> location=0x801048230 "˙˙I\213|$8čŕ\201˙˙L\211çčŘ\201˙˙é\035ţ˙˙H\215\025",
>  descr=0x8010482a6 "") at hostres_device_tbl.c:217
> #10 0x000801dd4a00 in ?? ()
> #11 0x000801dd4a00 in ?? ()
> #12 0x000801dd1a00 in ?? ()
> #13 0x in ?? ()
> #14 0x000801048230 in device_entry_create (name=0x801dd8400 "", 
> location=0x801048230 "˙˙I\213|$8čŕ\201˙˙L\211çčŘ\201˙˙é\035ţ˙˙H\215\025",
>  descr=0x8010482a6 "") at hostres_device_tbl.c:217
> #15 0x000801dd1800 in ?? ()
> #16 0x000801dd1800 in ?? ()
> #17 0x000800c00ea8 in ?? ()
> #18 0x0051b1c8 in ?? ()
> #19 0x000800c00938 in ?? ()
> #20 0x0051b258 in ?? ()
> #21 0x000801dc8a00 in ?? ()
> #22 0x0008009f7be9 in free () from /lib/libc.so.7
> #23 0x in ?? ()
> #24 0x7fffed98 in ?? ()
> #25 0x0008010478bd in device_entry_delete () at hostres_device_tbl.c:266
> #26 0x005187d0 in snmp_error ()
> #27 0x000801047be6 in op_hrDeviceTable (ctx=Variable "ctx" is not 
> available.
> ) at hostres_device_tbl.c:671
> #28 0x0051b840 in ?? ()
> #29 0x0051b830 in ?? ()
> #30 0x in ?? ()
> #31 0x7fffc360 in ?? ()
> #32 0x0051b830 in ?? ()
> #33 0x in ?? ()
> #34 0x0008009efbd2 in _pthread_mutex_init_calloc_cb () from 
> /lib/libc.so.7
> #35 0x0008009f2d32 in _malloc_prefork () from /lib/libc.so.7
> #36 0x0008009f6e1f in realloc () from /lib/libc.so.7
> #37 0x000800e0b441 in mib_if_is_dyn () from /usr/lib/snmp_mibII.so
> #38 0x in ?? ()
> #39 0x7fffc5cc in ?? ()
> #40 0x0001 in ?? ()
> #41 0x7fffc5e0 in ?? ()
> #42 0x31fa39e2fac72819 in ?? ()
> #43 0x0001 in ?? ()
> #44 0x00080065fad5 in poll_dispatch () from /lib/libbegemot.so.4
> #45 0x0040616a in main ()
> 
> 
> I hope it helps you to debug this problem.

Looks like we can't trust this output.

-- 
Mikolaj Golub


Re: bsnmpd always died on HDD detach

2012-09-15 Thread Mikolaj Golub
On Sun, Sep 09, 2012 at 11:56:55PM +0200, Miroslav Lachman wrote:
> I am running bsnmpd with basic snmpd.config (only community and location 
> changed).
> 
> When there is a problem with an HDD and the disk disappears from the ATA
> channel (e.g. the disk is physically removed), bsnmpd always dumps core:
> 
> kernel: pid 1188 (bsnmpd), uid 0: exited on signal 11 (core dumped)
> 
> I have seen this for a long time on all releases of the 7.x and 8.x
> branches (i386 and amd64). I did not test 9.x.

OK, I was able to reproduce this under qemu by running
  
  atacontrol detach ata1

It crashes in snmp_hostres module, in

  refresh_device_tbl->refresh_disk_storage_tbl->disk_OS_get_ATA_disks

when traversing the device_map list and dereferencing map->entry_p, which
is NULL here.

The device_map table is used for consistent device table indexing.

refresh_device_tbl(), the refresh routine for hrDeviceTable, checks the
list of available devices and calls device_entry_delete() for devices
that have gone. It does not remove the entry from the device_map table,
but just sets its entry_p to NULL (so that the index stays reserved and
is not reused by another device).

Then refresh_disk_storage_tbl() is called, which in turn calls

 disk_OS_get_ATA_disks();
 disk_OS_get_MD_disks();
 disk_OS_get_disks();

and it crashes in disk_OS_get_ATA_disks() when the removed map entry
is dereferenced.

I am attaching the patch that fixes the issue for me.

I was wondering why the issue was not observed after md device
removal, as disk_OS_get_MD_disks() does the same thing. It turned
out that hostres just does not see md devices, so this function is
currently useless: hostres gets devices from devinfo(3), which does
not return md devices.

disk_OS_get_disks() calls kern.disks sysctl to get the list of disks,
and uses device_map differently, so it is not affected.

-- 
Mikolaj Golub
Index: usr.sbin/bsnmpd/modules/snmp_hostres/hostres_diskstorage_tbl.c
===
--- usr.sbin/bsnmpd/modules/snmp_hostres/hostres_diskstorage_tbl.c	(revision 240529)
+++ usr.sbin/bsnmpd/modules/snmp_hostres/hostres_diskstorage_tbl.c	(working copy)
@@ -287,6 +287,9 @@ disk_OS_get_ATA_disks(void)
 
 	/* Walk over the device table looking for ata disks */
 	STAILQ_FOREACH(map, &device_map, link) {
+		/* Skip deleted entries. */
+		if (map->entry_p == NULL)
+			continue;
 		for (found = lookup; found->media != DSM_UNKNOWN; found++) {
 			if (strncmp(map->name_key, found->dev_name,
 			strlen(found->dev_name)) != 0)
@@ -345,6 +348,9 @@ disk_OS_get_MD_disks(void)
 
 	/* Look for md devices */
 	STAILQ_FOREACH(map, &device_map, link) {
+		/* Skip deleted entries. */
+		if (map->entry_p == NULL)
+			continue;
 		if (sscanf(map->name_key, "md%d", &unit) != 1)
 			continue;
 

Re: bsnmpd always died on HDD detach

2012-09-16 Thread Mikolaj Golub
On Sun, Sep 16, 2012 at 05:56:22PM +0400, Andrey V. Elsukov wrote:
> On 15.09.2012 16:50, Mikolaj Golub wrote:
> > I am attaching the patch that fixes the issue for me.
> > 
> > I was wondering why the issue was not observed after md device
> > removal, as disk_OS_get_MD_disks() does the same thing. It turned
> > out that hostres just does not see md devices, so this function is
> > currently useless: hostres gets devices from devinfo(3), which does
> > not return md devices.
> > 
> > disk_OS_get_disks() calls kern.disks sysctl to get the list of disks,
> > and uses device_map differently, so it is not affected.
> 
> I also have a big patch to the hostres module, but it is not yet
> finished. Probably i should commit the part related to the disk
> subsystem. This part has been rewritten to be GEOM aware.

Wonderful! And as I understand it, that will solve this problem too? Then I
think there is no need to commit my patch, unless you are not planning
to merge your work to stable/[78] (where any fix for this problem is
highly desirable).

-- 
Mikolaj Golub


Re: bsnmpd always died on HDD detach

2012-09-17 Thread Mikolaj Golub
On Sun, Sep 16, 2012 at 07:07:20PM +0200, Miroslav Lachman wrote:
 
> I am glad to read that you found the bug!
> The fix (patch) seems trivial - will it be committed / MFCed? :)

Andrey told me that he was not sure when he would be able to commit
his work, so I have just committed my fix. I am going to MFC it.

-- 
Mikolaj Golub


Re: hastctl hang

2012-11-25 Thread Mikolaj Golub
Sorry, the message went privately to Daisuke, which was not my intention.

-- Forwarded message --
From: Mikolaj Golub 
Date: Mon, Nov 26, 2012 at 9:38 AM
Subject: Re: hastctl hang
To: Daisuke Aoyama 


On Mon, Nov 26, 2012 at 01:17:46AM +0900, Daisuke Aoyama wrote:
> Hello,
>
> I'm trying to integrate HAST to NAS4Free (FreeBSD 9.1-RC3).
> Now I have created version 9.1.0.1.531.
> http://sourceforge.net/projects/nas4free/files/NAS4Free-9.1.0.1/9.1.0.1.531/
>
> Basic CARP + HAST + iSCSI target setup can be done, but very frequently I
> get hastctl hang when called:
>
> /sbin/hastctl status
> /sbin/hastctl dump
>
> Is it better not to call these from a script?
> Or is there something wrong with how I use them?

Normally it is OK to use hastctl for scripting.

Does it hang forever or just for a few seconds?

Usually a hung hastctl means that the hastd master process is waiting
for its worker (either for its response or for its exit).

Could you provide logs from both the primary and the secondary? Also,
you might want to run hastd with -d to make it more verbose.

> Also, I don't know how to detect an error of writing to local device from
> hastd.
> Does anyone know about it?

Currently only by monitoring logs. It looks like a good idea to add
error counters to hastctl statistics output...

> Thanks,
> Daisuke Aoyama
>
> -- the procstat shows like this:
> [root@nas4free-nodeb /tmp]# procstat -ka|grep hast
> 11668 100069 hastd-mi_switch
> sleepq_catch_signals sleepq_wait_sig _sleep kern_wait sys_wait4
> amd64_syscall Xfast_syscall
> 17981 100406 hastd-mi_switch
> sleepq_catch_signals sleepq_wait_sig _sleep do_wait
> __umtx_op_wait_uint_private amd64_syscall Xfast_syscall
> 17981 100559 hastd-mi_switch
> sleepq_catch_signals sleepq_wait_sig _sleep soreceive_generic kern_recvit
> recvit sys_recvfrom amd64_syscall Xfast_syscall
> 17981 100560 hastd-mi_switch
> sleepq_catch_signals sleepq_wait_sig _sleep soreceive_generic kern_recvit
> recvit sys_recvfrom amd64_syscall Xfast_syscall
> 17981 100561 hastd-mi_switch
> sleepq_catch_signals sleepq_wait_sig _sleep do_wait
> __umtx_op_wait_uint_private amd64_syscall Xfast_syscall
> 17984 100078 hastd-mi_switch
> sleepq_catch_signals sleepq_wait_sig _sleep do_wait
> __umtx_op_wait_uint_private amd64_syscall Xfast_syscall
> 17984 100562 hastd-mi_switch
> sleepq_catch_signals sleepq_wait_sig _sleep soreceive_generic kern_recvit
> recvit sys_recvfrom amd64_syscall Xfast_syscall
> 17984 100563 hastd-mi_switch
> sleepq_catch_signals sleepq_wait_sig _sleep soreceive_generic kern_recvit
> recvit sys_recvfrom amd64_syscall Xfast_syscall
> 17984 100564 hastd-mi_switch
> sleepq_catch_signals sleepq_wait_sig _sleep do_wait
> __umtx_op_wait_uint_private amd64_syscall Xfast_syscall
> 18218 100145 hastctl  -mi_switch
> sleepq_catch_signals sleepq_wait_sig _sleep soreceive_generic kern_recvit
> recvit sys_recvfrom amd64_syscall Xfast_syscall
>
> [root@nas4free-nodeb /tmp]# procstat -ta|grep hast
> 11668 100069 hastd-  0  120 sleep   wait
> 17979 100557 hastd-  2  120 sleep   g_waitid

Strange, I don't see process 17979 in the procstat -k output. Again, the
logs might be helpful here.

> 17981 100406 hastd-  2  120 sleep   uwait
> 17981 100559 hastd-  0  120 sleep   sbwait
> 17981 100560 hastd-  0  120 sleep   sbwait
> 17981 100561 hastd-  1  120 sleep   uwait
> 17984 100078 hastd-  2  121 sleep   uwait
> 17984 100562 hastd-  3  120 sleep   sbwait
> 17984 100563 hastd-  2  120 sleep   sbwait
> 17984 100564 hastd-  1  121 sleep   uwait
> 18218 100145 hastctl  -  2  152 sleep   sbwait
> -- the procstat shows like this:
>
>

--
Mikolaj Golub


libstdc++, libsupc++, delete operators and valgrind

2013-01-20 Thread Mikolaj Golub
(operator delete[](void*)) redirected to 0x1005700 
(operator delete[](void*))

Now the question is: is it OK that the "new" operators are still
called via libstdc++ while the "delete" operators are called
directly from libsupc++?

If it is OK, is the proposed solution of adding redirects for
libsupc++ the right way to fix valgrind?

-- 
Mikolaj Golub


Re: libstdc++, libsupc++, delete operators and valgrind

2013-01-27 Thread Mikolaj Golub
On Sun, Jan 20, 2013 at 02:19:55PM +0200, Mikolaj Golub wrote:
> Hi,
> 
> Some time ago I noticed that valgrind started to complain about
> "Mismatched free() / delete / delete []" for valid new/delete
> combinations.
> 
> For example, the following test program
> 
>   int main()
>   {
>   char* buf = new char[10];
>   delete [] buf;
>   
>   return 0;
>   }
> 
> produced a warning:
> 
> ==38718== Mismatched free() / delete / delete []
> ==38718==at 0x100416E: free (vg_replace_malloc.c:473)
> ==38718==by 0x4007BE: main (test.cpp:5)
> ==38718==  Address 0x2400040 is 0 bytes inside a block of size 10 alloc'd
> ==38718==at 0x10047D7: operator new[](unsigned long) 
> (vg_replace_malloc.c:382)
> ==38718==by 0x40079D: main (test.cpp:4)
> 
> For some time I hoped that "someone" would fix the problem but seeing
> that after several upgrades it was still there I decided it is time to
> do some investigations.
> 
> Running the valgrind with "--trace-redir=yes -v" showed that valgrind
> activates redirections for new/delete symbols in libstdc++:
> 
> --6729-- Reading syms from /usr/lib/libstdc++.so.6 (0x1209000)
> ...
> --6729---- ACTIVE --
> ...
> --6729-- 0x01260770 (operator new[](unsig) R-> (1001.0) 0x010041b0 
> operator new[](unsigned long, std::nothrow_t const&)
> --6729-- 0x01260780 (operator new(unsigne) R-> (1001.0) 0x01004270 
> operator new(unsigned long, std::nothrow_t const&)
> --6729-- 0x012608a0 (operator delete[](vo) R-> (1005.0) 0x01003e40 
> operator delete[](void*, std::nothrow_t const&)
> --6729-- 0x012608b0 (operator delete(void) R-> (1005.0) 0x01003fa0 
> operator delete(void*, std::nothrow_t const&)
> --6729-- 0x012dea90 (operator new[](unsig) R-> (1003.0) 0x01004770 
> operator new[](unsigned long)
> --6729-- 0x012deab0 (operator new(unsigne) R-> (1003.0) 0x01004860 
> operator new(unsigned long)
> --6729-- 0x012deca0 (operator delete[](vo) R-> (1005.0) 0x01003ef0 
> operator delete[](void*)
> --6729-- 0x012e2b80 (operator delete(void) R-> (1005.0) 0x01004050 
> operator delete(void*)
> 
> But "delete" redirection is not triggered, while "new" is:
> 
> --6729-- REDIR: 0x12dea90 (operator new[](unsigned long)) redirected to 
> 0x1004770 (operator new[](unsigned long))
> --6729-- REDIR: 0x19dd9a0 (free) redirected to 0x1004100 (free)
> ==6729== Mismatched free() / delete / delete []
> ==6729==at 0x100416E: free (vg_replace_malloc.c:473)
> ==6729==by 0x400715: main (test.cpp:5)
> ==6729==  Address 0x1ed7040 is 0 bytes inside a block of size 10 alloc'd
> ==6729==at 0x10047D7: operator new[](unsigned long) 
> (vg_replace_malloc.c:382)
> ==6729==by 0x400701: main (test.cpp:4)
> 
> A little research revealed that in this case the delete operator from
> libsupc++ is called and valgrind does not provide redirections for the
> symbols in libsupc++.
> 
> When I added the redirections for libsupc++ to valgrind's
> vg_replace_malloc.c:
> 
>   #define  VG_Z_LIBSUPCXX_SONAME  libsupcZpZpZa   // libsupc++*
>   
>   FREE(VG_Z_LIBSUPCXX_SONAME, _ZdlPv,__builtin_delete );
>   FREE(VG_Z_LIBSUPCXX_SONAME, _ZdlPvRKSt9nothrow_t,  __builtin_delete );
>   FREE(VG_Z_LIBSUPCXX_SONAME,  _ZdaPv,   __builtin_vec_delete );
>   FREE(VG_Z_LIBSUPCXX_SONAME,  _ZdaPvRKSt9nothrow_t, __builtin_vec_delete );
> 
> the issue was fixed:
> 
> --99254-- Reading syms from /usr/lib/libstdc++.so.6
> ...
> --99254---- ACTIVE --
> ...
> --99254-- 0x012627c0 (operator new[](unsig) R-> (1001.0) 0x01004ce0 
> operator new[](unsigned long, std::nothrow_t const&)
> --99254-- 0x012627d0 (operator new(unsigne) R-> (1001.0) 0x01004860 
> operator new(unsigned long, std::nothrow_t const&)
> --99254-- 0x012628d0 (operator delete[](vo) R-> (1005.0) 0x01005b00 
> operator delete[](void*, std::nothrow_t const&)
> --99254-- 0x012628e0 (operator delete(void) R-> (1005.0) 0x01005500 
> operator delete(void*, std::nothrow_t const&)
> --99254-- 0x012c27e0 (operator new[](unsig) R-> (1003.0) 0x01004a80 
> operator new[](unsigned long)
> --99254-- 0x012c2800 (operator new(unsigne) R-> (1003.0) 0x01004430 
> operator new(unsigned long)
> --99254-- 0x012c29a0 (operator delete[](vo) R-> (1005.0) 0x01005800 
> operator delete[](void*)
> --99254-- 0x012c3e40 (operator delete(void) R-> (1005.0) 0x01005200 
> operator delete(void*)
> ...
> --99254-- Reading syms from /usr/lib/libsupc++.so.1
> ...
> --99254---- ACTIVE --
> 
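
The soname macro quoted above, libsupcZpZpZa, is written in valgrind's Z-encoding, and the comment next to it gives the decoded form, libsupc++*. A rough sketch of that decoding follows. It is illustrative only: it handles just a subset of the escapes, the escape table is an assumption about the encoding rather than something copied from the valgrind sources, the function names are made up, and shell glob matching merely stands in for valgrind's own soname matching.

```shell
#!/bin/sh
# zdecode NAME
# Decode a subset of Z-encoded escapes (assumed table: Za=*, Zp=+, Zd=.,
# Zu=_, Zh=-, ZZ=Z), scanning left to right so that "ZZ" is handled
# before the character that follows it.
zdecode() {
    printf '%s\n' "$1" | awk '
    BEGIN {
        map["a"] = "*"; map["p"] = "+"; map["d"] = "."
        map["u"] = "_"; map["h"] = "-"; map["Z"] = "Z"
    }
    {
        out = ""
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "Z" && i < n && (substr($0, i + 1, 1) in map)) {
                out = out map[substr($0, i + 1, 1)]
                i++
            } else {
                out = out c
            }
        }
        print out
    }'
}

# soname_matches PATTERN SONAME
# Shell-glob match, used here as a stand-in for valgrind's matching.
soname_matches() {
    case "$2" in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}
```

With this, `zdecode libsupcZpZpZa` gives the pattern libsupc++*, which matches libsupc++.so.1 but not libstdc++.so.6, in line with the trace output above where only the libstdc++ redirections were active.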

Re: Vimage Jail kernel crashed

2013-05-04 Thread Mikolaj Golub
On Sat, May 04, 2013 at 02:52:23PM +0900, KIRIYAMA Kazuhiko wrote:

> May  4 11:19:46 xx kernel: Fatal trap 12: page fault while in kernel mode
> May  4 11:19:46 xx kernel: cpuid = 2; apic id = 02
> May  4 11:19:46 xx kernel: fault virtual address  = 0x7818c3798
> May  4 11:19:46 xx kernel: fault code = supervisor write 
> data, page not present
> May  4 11:19:46 xx kernel: instruction pointer= 
> 0x20:0x8162c19e
> May  4 11:19:46 xx kernel: stack pointer  = 
> 0x28:0xff8121b22860
> May  4 11:19:46 xx kernel: frame pointer  = 
> 0x28:0xff8121b22870
> May  4 11:19:46 xx kernel: code segment   = base 0x0, limit 
> 0xf, type 0x1b
> May  4 11:19:46 xx kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
> May  4 11:19:46 xx kernel: processor eflags   = interrupt enabled, 
> resume, IOPL = 0
> May  4 11:19:46 xx kernel: current process= 15360 
> (ifconfig)
> May  4 11:19:46 xx kernel: trap number= 12
> May  4 11:19:46 xx kernel: panic: page fault
> May  4 11:19:46 xx kernel: cpuid = 2
> May  4 11:19:46 xx kernel: KDB: stack backtrace:
> May  4 11:19:46 xx kernel: #0 0x80923446 at kdb_backtrace+0x66
> May  4 11:19:46 xx kernel: #1 0x808ed0be at panic+0x1ce
> May  4 11:19:46 xx kernel: #2 0x80c7e330 at trap_fatal+0x290
> May  4 11:19:46 xx kernel: #3 0x80c7e668 at trap_pfault+0x1e8
> May  4 11:19:46 xx kernel: #4 0x80c7ec6e at trap+0x3be
> May  4 11:19:46 xx kernel: #5 0x80c682ef at calltrap+0x8
> May  4 11:19:46 xx kernel: #6 0x8162c76d at 
> pfi_change_group_event+0x4d
> May  4 11:19:46 xx kernel: #7 0x809a0d3b at if_delgroup+0x38b
> May  4 11:19:46 xx kernel: #8 0x809a7846 at 
> if_clone_destroyif+0x136
> May  4 11:19:46 xx kernel: #9 0x809a831a at if_clone_destroy+0x17a
> May  4 11:19:46 xx kernel: #10 0x809a5892 at ifioctl+0x482
> May  4 11:19:46 xx kernel: #11 0x80934ef6 at kern_ioctl+0x106
> May  4 11:19:46 xx kernel: #12 0x8093513d at sys_ioctl+0xfd
> May  4 11:19:46 xx kernel: #13 0x80c7dc10 at amd64_syscall+0x540
> May  4 11:19:46 xx kernel: #14 0x80c685d7 at Xfast_syscall+0xf7

It looks like it crashed when referring to a vnet that had already been
destroyed, in the pfi_change_group_event hook.

> Are there any suggestions? 

VIMAGE+pf support is fragile. If it works for someone, it is rather by
accident. I expect replacing pf with ipfw_nat or natd will give better
results.

If you still prefer pf, you may try destroying the epair interface before
destroying the vnet, e.g. using prestop rc.d/jail hooks instead of
poststop, if that is possible.

-- 
Mikolaj Golub


Re: Vimage Jail kernel crashed

2013-05-04 Thread Mikolaj Golub
On Sat, May 04, 2013 at 10:41:46PM +0900, KIRIYAMA Kazuhiko wrote:

> > If you still prefer pf, you may try destroying the epair interface before
> > destroying the vnet, e.g. using prestop rc.d/jail hooks instead of
> > poststop, if that is possible.
> 
> In particular, execute following sequence?
> 
> # ifconfig epairXa destroy
> # ifconfig bridge0 deletem epairXa

Yes, but in the reverse order: delete it from the bridge first. It is
about lines like these in your configuration:

 export jail_web_exec_poststop0="ifconfig bridge0 deletem epair4a"
 export jail_web_exec_poststop1="ifconfig epair4a destroy"

The crash happened when executing ifconfig epair destroy. 

You might want to try running commands manually before using the rc
script.
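
Put together, the suggestion amounts to moving the two quoted poststop lines to prestop hooks, keeping the bridge removal first. The fragment below is a sketch only and has not been tested against this setup: it reuses the jail name "web", bridge0 and epair4a from the lines quoted above, and assumes the prestop variables follow the same numbered naming as the poststop ones.

```shell
# Configuration fragment in sh syntax (sketch, untested).
# Run on the host before the jail and its vnet are torn down,
# instead of from the poststop hooks.

# 1. Remove the host side of the epair from the bridge first.
export jail_web_exec_prestop0="ifconfig bridge0 deletem epair4a"

# 2. Then destroy the epair while the vnet still exists.
export jail_web_exec_prestop1="ifconfig epair4a destroy"
```

The corresponding jail_web_exec_poststop0 and jail_web_exec_poststop1 lines would be dropped at the same time, so the commands are not run twice.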

-- 
Mikolaj Golub


Re: Nullfs leaks i-nodes

2013-05-07 Thread Mikolaj Golub
On Tue, May 07, 2013 at 08:30:06AM +0200, Göran Löwkrantz wrote:
> I created a PR, kern/178238, on this but would like to know if anyone has 
> any ideas or patches?
> 
> Have updated the system where I see this to FreeBSD 9.1-STABLE #0 r250229 
> and still have the problem.

I am observing an effect that might look like an inode leak, which I
think is due to the caching of free nullfs vnodes, recently added by kib
(r240285): the number of free inodes does not increase after unlink, but
if I purge the free vnodes cache (temporarily setting
vfs.wantfreevnodes to 0 and observing vfs.freevnodes decrease to 0) the
number of free inodes grows back.

You have only about 1000 inodes available on your underlying fs, while
vfs.wantfreevnodes is, I think, much higher, so you run out of
i-nodes.

If this is really your case, you can disable caching by mounting nullfs
with nocache (it looks like caching is not important in your case).
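
The check described above can be scripted roughly as follows. This is a sketch under assumptions, not a tested procedure: the function names are made up, the drain time is a guess, the mount paths are placeholders, and ifree_for assumes the FreeBSD `df -i` column layout in which "ifree" is the third field from the end.

```shell
#!/bin/sh
# ifree_for MOUNTPOINT
# Read `df -i` output on stdin and print the free inode count ("ifree")
# for the given mount point.
ifree_for() {
    awk -v mp="$1" 'NR > 1 && $NF == mp { print $(NF - 2) }'
}

# purge_free_vnodes [SECONDS]
# Hypothetical helper (FreeBSD only, run as root): temporarily set
# vfs.wantfreevnodes to 0, give vfs.freevnodes time to drain, show it,
# then restore the old value. Only defined here, not called.
purge_free_vnodes() {
    old=$(sysctl -n vfs.wantfreevnodes) || return 1
    sysctl vfs.wantfreevnodes=0
    sleep "${1:-10}"                  # drain time is a guess
    sysctl vfs.freevnodes
    sysctl vfs.wantfreevnodes="$old"
}

# Example (not run here; /data is a placeholder for the lower fs):
#   df -i | ifree_for /data      # before
#   purge_free_vnodes
#   df -i | ifree_for /data      # after: should grow back if it is the cache
#
# To rule the cache out entirely, mount with nocache (placeholder paths):
#   mount -t nullfs -o nocache /data/lower /jail/upper
```

If the free inode count recovers after the purge, the nocache mount is the simpler long-term setting.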

-- 
Mikolaj Golub

Re: Proposed MFC to hastctl: compact 'status' and introduce 'list' command

2013-05-23 Thread Mikolaj Golub
On Fri, May 24, 2013 at 12:54:56AM +0400, Dmitry Morozovsky wrote:
> Dear colleagues,
> 
> is there any objection for MFCing the change introduced in
> 
> http://svnweb.freebsd.org/changeset/base/248291
> 
> (the most major change: compacting output of `hastctl status' to one-liner
> per provider; old output is retained as `list' command)
> 
> to at least stable/9 ?
 
If we agree to merge, I would prefer merging to both stable/9 and
stable/8, to keep the divergence between the branches as small as possible.

> The reason I'm asking is that it could lead to changes in hast-related
> scripts which one use in production.
> 
> If no objections are received I'm (with the generous support from trociny, 
> thank you Mikolaj!) tend to merge it after, say, 2 weeks.
> 
> Thanks!
> 
> -- 
> Sincerely,
> D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
> [ FreeBSD committer: ma...@freebsd.org ]
> 
> *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***
> 

-- 
Mikolaj Golub


Re: Proposed MFC to hastctl: compact 'status' and introduce 'list' command

2013-05-24 Thread Mikolaj Golub
On Fri, May 24, 2013 at 02:08:28PM +0400, Dmitry Morozovsky wrote:
> Pete,
> 
> On Fri, 24 May 2013, Pete French wrote:
> 
> > > http://svnweb.freebsd.org/changeset/base/248291
> > ...
> > > The reason I'm asking is that it could lead to changes in hast-related 
> > > scripts 
> > > which one use in production.
> > 
> > 
> > Any chance we could do this in 2 stages - first being to add 'list' to
> > give us a chance to change scripts over, then make the changes to
> > 'status'. I have scripts which try and parse the output from 'status'
> > which will need changing, and I suspect I am not the only one...
> 
> I see no problem with this, as it is a one-line patch (modulo usage/manual
> page changes); it would be a direct commit to -stable, but as it is
> temporary, I see no problem there too.
> 
> Mikolaj, your opinion?

It looks like a very good idea.

-- 
Mikolaj Golub


Re: hast and zfs trim possibly causing some problems in 9.2

2013-10-10 Thread Mikolaj Golub
On Wed, Oct 09, 2013 at 03:47:29PM +0100, Steven Hartland wrote:

> ZFS will try to send DELETE requests to the underlying storage to
> support TRIM. If that fails then it will disable TRIM support for
> that vdev.
> 
> My guess would be you're just seeing hast being a bit verbose
> when these initial batch failures happen.

If the device on the secondary node does not support DELETE, but the
device on the primary does, HAST will report to ZFS that DELETE
succeeded (although it failed on the secondary), and ZFS will not
disable TRIM. Pete, isn't this your case?

> From: "Pete French" 
> 
> > I just had a machine fall over on me for the first time in ages - one
> > of a pair of machines we have running hast with zfs on top. I haven't
> > got any concrete evidence of what made it die as yet, but I
> > did notice the logfiles filling up with thousands of lines like this
> > just prior to the crash:
> > 
> > serpentine-active hastd[1522]: [serp1] (primary) Remote request failed (Operation not supported): DELETE(26847744000, 1536).
> > 
> > so I am guessing that is ZFS trying to send a trim command to hast, and hast
> > does not support it. Have disabled zfs trim now, but thought it was
> > worth mentioning - I would have not expected zfs to be trying to issue
> > a trim command to an underlying device which doesn't support it. These
> > machines were rock solid under 8, and the only change I can see with 9 is
> > the trim support being added.

Another important change that comes to mind is the default replication
mode, changed from fullsync to memsync. Do you have the replication
mode explicitly set in your config?

-- 
Mikolaj Golub


Re: hast and zfs trim possibly causing some problems in 9.2

2013-10-11 Thread Mikolaj Golub
On Fri, Oct 11, 2013 at 11:27:36AM +0100, Pete French wrote:
> > If the device on the secondary node does not support DELETE, but the
> > device on the primary does, HAST will report to ZFS that DELETE
> > succeeded (although it failed on the secondary), and ZFS will not
> > disable TRIM. Pete, isn't this your case?
> 
> Afraid not, both machines are running normal "spinning rust" hard
> drives as the actual storage layer, so there is nothing TRIM capable
> anywhere.
> 
> I didn't get much chance to look at this yesterday, but am looking at the logs
> again now, and I see these messages right up to the time the machine
> fell over. That machine had been up for a long time, and it was still logging
> these messages, so it looks very much as if ZFS did not stop trying to
> issue the TRIM.

You showed only "Remote request failed" errors from your logs. Do you
have "Local request failed" errors too?
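
One quick way to answer that question is to count both kinds of message in the log. The sketch below is hedged accordingly: the function name is made up, the log location is an assumption (hastd normally logs via syslog, so something like /var/log/messages), and the "Local request failed" lines are assumed to look like the "Remote request failed" line quoted earlier in the thread.

```shell
#!/bin/sh
# count_request_failures FILE
# Print how many local and remote HAST request failures FILE contains.
count_request_failures() {
    awk '
        /Local request failed/  { local_n++ }
        /Remote request failed/ { remote_n++ }
        END { printf "local=%d remote=%d\n", local_n, remote_n }
    ' "$1"
}

# Example (not run here; the path is an assumption):
#   count_request_failures /var/log/messages
```

A result with remote failures only would point at the secondary's device; local failures as well would suggest that the primary's device rejects DELETE too.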

-- 
Mikolaj Golub


Re: hast and zfs trim possibly causing some problems in 9.2

2013-10-11 Thread Mikolaj Golub
On Fri, Oct 11, 2013 at 01:42:39PM +0300, Mikolaj Golub wrote:

> You showed only "Remote request failed" errors from your logs. Do you
> have "Local request failed" errors too?

You should also see them in the "local errors" statistics in the
`hastctl list' output.

-- 
Mikolaj Golub


Re: NICs locking up, "*tcp_sc_h"

2009-03-13 Thread Mikolaj Golub
On Fri, 13 Mar 2009 20:56:24 +1100 Nick Withers wrote:

> I'm sorry to ask what is probably a very simple question, but is there
> somewhere I should look to get clues on debugging from a manually
> generated dump? I tried "panic" after manually invoking the kernel
> debugger but proved highly inept at getting from the dump the same
> information "ps" / "where" gave me within the debugger live.

You can capture the ddb session in the capture buffer and then extract it
from the dump. In ddb run

capture on

do your debugging

then run "panic" or "call doadump" and after reboot:

ddb capture -M /var/crash/vmcore.X print > out

I would recommend increasing the debug.ddb.capture.bufsize sysctl variable
to be sure the whole ddb session is captured.
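
The post-reboot half of this can be wrapped in a small script. A minimal sketch, assuming the dumps are in a directory such as /var/crash and are named vmcore.N; the function names and the buffer size shown in the comment are illustrative, not recommendations.

```shell
#!/bin/sh
# latest_vmcore DIR
# Print the name of the highest-numbered vmcore.N file in DIR
# (entries such as vmcore.last are ignored).
latest_vmcore() {
    ls "$1" | grep -E '^vmcore\.[0-9]+$' | sort -t . -k 2 -n | tail -n 1
}

# extract_ddb_capture DIR OUT
# Hypothetical wrapper around the "ddb capture ... print" command above.
# FreeBSD only; it is only defined here, not called.
extract_ddb_capture() {
    core=$(latest_vmcore "$1")
    [ -n "$core" ] || return 1
    ddb capture -M "$1/$core" print > "$2"
}

# Before entering the debugger, enlarge the capture buffer, for example:
#   sysctl debug.ddb.capture.bufsize=1048576    # size is an assumption
# After the reboot (not run here):
#   extract_ddb_capture /var/crash ddb-session.txt
```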

-- 
Mikolaj Golub


Re: 7.2-PRERELEASE/sunx2200/bge/msi broken

2009-03-22 Thread Mikolaj Golub

On Sun, 22 Mar 2009 12:55:02 +0200 Danny Braniss wrote:

 DB> Hi,
 DB> between March 16 and now, bge on a Sun X2200 stopped working,
 DB> turning off msi (via hw.pci.enable_msi=0) got it working again.
 DB> I tried first replacing bge with an older version but that did not help.

It looks related to this report:

http://www.freebsd.org/cgi/getmsg.cgi?fetch=1253844+1263253+/usr/local/www/db/text/2009/freebsd-bugs/20090322.freebsd-bugs

-- 
Mikolaj Golub


Re: RELENG_7 crash

2009-04-21 Thread Mikolaj Golub

On Tue, 21 Apr 2009 01:25:06 -0400 Mike Tancsa wrote:

 MT> The box has a fairly heavy UDP load.  Its RELENG_7 as of today and
 MT> took 3hrs for it to dump core.

 MT> Fatal trap 12: page fault while in kernel mode
 MT> cpuid = 1; apic id = 01
 MT> fault virtual address   = 0x68
 MT> fault code  = supervisor read, page not present
 MT> instruction pointer = 0x20:0xc0637146
 MT> stack pointer   = 0x28:0xe766eaac
 MT> frame pointer   = 0x28:0xe766eb54
 MT> code segment= base 0x0, limit 0xf, type 0x1b
 MT> = DPL 0, pres 1, def32 1, gran 1
 MT> processor eflags= interrupt enabled, resume, IOPL = 0
 MT> current process = 761 (bsnmpd)
 MT> trap number = 12
 MT> panic: page fault
 MT> cpuid = 1
 MT> Uptime: 3h47m43s
 MT> Physical memory: 2036 MB
 MT> Dumping 83 MB: 68 52 36 20 4

 MT> (kgdb) bt
 MT> #0  doadump () at pcpu.h:196
 MT> #1  0xc05964d7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
 MT> #2  0xc05967a9 in panic (fmt=Variable "fmt" is not available.
 MT> ) at /usr/src/sys/kern/kern_shutdown.c:574
 MT> #3  0xc07f64ac in trap_fatal (frame=0xe766ea6c, eva=104) at
 MT> /usr/src/sys/i386/i386/trap.c:939
 MT> #4  0xc07f6730 in trap_pfault (frame=0xe766ea6c, usermode=0, eva=104)
 MT> at /usr/src/sys/i386/i386/trap.c:852
 MT> #5  0xc07f70dc in trap (frame=0xe766ea6c) at
 MT> /usr/src/sys/i386/i386/trap.c:530
 MT> #6  0xc07db7eb in calltrap () at /usr/src/sys/i386/i386/exception.s:159
 MT> #7  0xc0637146 in sysctl_ifdata (oidp=0xc08816a0, arg1=0xe766ec24,
 MT> arg2=2, req=0xe766eba4) at /usr/src/sys/net/if_mib.c:127
 MT> #8  0xc059fd77 in sysctl_root (oidp=Variable "oidp" is not available.
 MT> ) at /usr/src/sys/kern/kern_sysctl.c:1413
 MT> #9  0xc059ff14 in userland_sysctl (td=0xc5374460, name=0xe766ec14,
 MT> namelen=6, old=0x0, oldlenp=0xbfbf8478, inkernel=0, new=0x0,
 MT> newlen=0, retval=0xe766ec10, flags=0) at
 MT> /usr/src/sys/kern/kern_sysctl.c:1506
 MT> #10 0xc05a0064 in __sysctl (td=0xc5374460, uap=0xe766ecfc) at
 MT> /usr/src/sys/kern/kern_sysctl.c:1443
 MT> #11 0xc07f6a85 in syscall (frame=0xe766ed38) at
 MT> /usr/src/sys/i386/i386/trap.c:1090
 MT> #12 0xc07db850 in Xint0x80_syscall () at
 MT> /usr/src/sys/i386/i386/exception.s:255
 MT> #13 0x0033 in ?? ()
 MT> Previous frame inner to this frame (corrupt stack?)
 MT> (kgdb)

Just FYI, the same problem has already been registered in the PR database
as kern/132734.

-- 
Mikolaj Golub

