Jails on ZFS yielding 100% load on gstat

2018-08-13 Thread Marco Steinbach
Hi there.

% zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zroot  5.41T   670G  4.75T         -    13%    12%  1.00x  ONLINE  -

% uname -a
FreeBSD XXX 11.1-STABLE FreeBSD 11.1-STABLE #0 r322984 [...] amd64


I'm running multiple jails on ZFS, using ezjail to manage them,
including a webserver and a mailserver. The mailserver is using a MySQL
database, otherwise depending on dovecot and postfix. Very low volume,
just a few polls / logins per minute.

I am experiencing very high loads as per gstat:

dT: 1.021s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    4    181      0      0    0.0    169    873   20.1    98.2| ada0
    2    111      0      0    0.0    100    540    7.3    90.6| ada1
    0     88      0      0    0.0     76    458    1.4    43.3| ada2
    0      0      0      0    0.0      0      0    0.0     0.0| ada0p1
    3    150      0      0    0.0    150    603   20.2    95.1| ada0p2
    1     31      0      0    0.0     20    270   19.2   117.0| ada0p3
    0      0      0      0    0.0      0      0    0.0     0.0| gpt/gptboot0
    0      0      0      0    0.0      0      0    0.0     0.0| ada1p1
    1     85      0      0    0.0     85    341    8.4    68.9| ada1p2
    1     25      0      0    0.0     15    200    0.9    75.0| ada1p3
    0      0      0      0    0.0      0      0    0.0     0.0| ada2p1
    0     62      0      0    0.0     62    251    1.6     9.9| ada2p2
    0     26      0      0    0.0     15    208    0.5    42.0| ada2p3
    0      0      0      0    0.0      0      0    0.0     0.0| gpt/gptboot1
    0      0      0      0    0.0      0      0    0.0     0.0| gpt/gptboot2


These loads lead to the system suffering from very much delayed
responses to even the basic task of echoing characters entered on the
console, consequently rendering the services offered unusable to the
users because of the delays.

Restarting the jails (or even the whole machine) leaves me in
exactly the same situation.

I do have lab machines for running load scenarios, so if anyone feels
compelled to lend a hand, please do.

MfG CoCo
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


FreeBSD blocks on BOCHS serial port

2018-08-13 Thread Alexander Lochmann
Hi folks!

We are doing some automatic experiments using FreeBSD running in a
virtual machine.
To control the experiment from the outside, we use serial ports to
communicate with a userspace program.
The communication via serial does work with QEMU. However, it does not
work with BOCHS which is our desired emulator.
Even simple operations like 'echo FOO | tee /dev/ttyu1' or 'cat
/dev/ttyu1' do not work. Both commands block 'forever'.
It does not matter whether we use ttyu0 (file backend) or ttyu1 (tcp
socket).
I put some debug output in sys/dev/uart/uart_dev_ns8250.c. The output
suggests that the driver more or less reads and writes to the serial
ports. At least it does something...

Do you have any hints how we can further analyze this problem?
Has anyone come across a similar problem?

Thank you!

Regards,
Alex

-- 
Technische Universität Dortmund
Alexander Lochmann                PGP key: 0xBC3EF6FD
Otto-Hahn-Str. 16                 phone:  +49.231.7556141
D-44227 Dortmund                  fax:    +49.231.7556116
http://ess.cs.tu-dortmund.de/Staff/al





Re: Jails on ZFS yielding 100% load on gstat

2018-08-13 Thread Alan Somers
Jails probably aren't the source of your problem.  You need to find out
what process or processes are responsible for all this activity.  Since the
write bandwidth is fairly low, you might have a process that's sync(2)ing
or fsync(2)ing too often.  "gstat -o" will show if that's the case.  You
can also try running "top -mio" to see which processes are doing the most
I/O.
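
A minimal sketch of how the suggestion above could be applied. It assumes
the gstat row layout shown in the original report: the provider name is the
last field and %busy, with a trailing "|", is the field before it.
`busy_providers` is a hypothetical helper, not part of gstat; it only
filters rows by %busy.

```shell
#!/bin/sh
# Sketch only: busy_providers is a hypothetical helper, not part of gstat.
# It reads gstat-style rows on stdin and prints "name %busy" for every
# provider whose %busy exceeds the given threshold (default 90).
busy_providers() {
    awk -v limit="${1:-90}" '
        NF >= 3 && $(NF-1) ~ /^[0-9.]+\|$/ {
            busy = $(NF-1)
            sub(/\|$/, "", busy)          # drop the trailing "|"
            if (busy + 0 > limit) print $NF, busy
        }'
}

# On the affected FreeBSD host (guarded so the script is harmless elsewhere):
if command -v gstat >/dev/null 2>&1; then
    # -o adds columns for other operations such as BIO_FLUSH,
    # -b prints a single batch of statistics and exits.
    gstat -o -b | busy_providers 90
fi
```

"top -mio" remains an interactive check and is not scripted here.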

-Alan

On Sun, Aug 12, 2018 at 12:50 PM, Marco Steinbach <
c...@executive-computing.de> wrote:

> Hi there.
>
> % zpool list
> NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
> zroot  5.41T   670G  4.75T         -    13%    12%  1.00x  ONLINE  -
>
> % uname -a
> FreeBSD XXX 11.1-STABLE FreeBSD 11.1-STABLE #0 r322984 [...] amd64
>
>
> I'm running multiple jails on ZFS, using ezjail to manage them,
> including a webserver and a mailserver. The mailserver is using a MySQL
> database, otherwise depending on dovecot and postfix. Very low volume,
> just a few polls / logins per minute.
>
> I am experiencing very high loads as per gstat:
>
> dT: 1.021s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     4    181      0      0    0.0    169    873   20.1    98.2| ada0
>     2    111      0      0    0.0    100    540    7.3    90.6| ada1
>     0     88      0      0    0.0     76    458    1.4    43.3| ada2
>     0      0      0      0    0.0      0      0    0.0     0.0| ada0p1
>     3    150      0      0    0.0    150    603   20.2    95.1| ada0p2
>     1     31      0      0    0.0     20    270   19.2   117.0| ada0p3
>     0      0      0      0    0.0      0      0    0.0     0.0| gpt/gptboot0
>     0      0      0      0    0.0      0      0    0.0     0.0| ada1p1
>     1     85      0      0    0.0     85    341    8.4    68.9| ada1p2
>     1     25      0      0    0.0     15    200    0.9    75.0| ada1p3
>     0      0      0      0    0.0      0      0    0.0     0.0| ada2p1
>     0     62      0      0    0.0     62    251    1.6     9.9| ada2p2
>     0     26      0      0    0.0     15    208    0.5    42.0| ada2p3
>     0      0      0      0    0.0      0      0    0.0     0.0| gpt/gptboot1
>     0      0      0      0    0.0      0      0    0.0     0.0| gpt/gptboot2
>
>
> These loads lead to the system suffering from very much delayed
> responses to even the basic task of echoing characters entered on the
> console, consequently rendering the services offered unusable to the
> users because of the delays.
>
> Restarting the jails (or even the whole machine) leaves me in
> exactly the same situation.
>
> I do have lab machines for running load scenarios, so if anyone feels
> compelled to lend a hand, please do.
>
> MfG CoCo


Re: Jails on ZFS yielding 100% load on gstat

2018-08-13 Thread Mike Tancsa
On 8/12/2018 2:50 PM, Marco Steinbach wrote:
> 
> These loads lead to the system suffering from very much delayed
> responses to even the basic task of echoing characters entered on the
> console, consequently rendering the services offered unusable to the
> users because of the delays.


Do you have a LOT of files and/or metadata? Have a look at the cache
stats to see if you are perhaps grinding away on big directory lookups.
Install sysutils/zfs-stats and post the output of
zfs-stats -a

Also, does
top -mio -I
shed any light on what's taking up the disk I/O?

---Mike



-- 
---
Mike Tancsa, tel +1 519 651 3400 x203
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada


Re: All the memory eaten away by ZFS 'solaris' malloc - on 11.2-R amd64

2018-08-13 Thread Mark Martinec

2018-08-04 21:47, Mark Johnston wrote:

Sorry, I missed that message.  Given that information, it would be
useful to see the output of the following script instead:

# dtrace -c "zpool list -Hp" -x temporal=off -n '
 dtmalloc::solaris:malloc
   /pid == $target/{@allocs[stack(), args[3]] = count()}
 dtmalloc::solaris:free
   /pid == $target/{@frees[stack(), args[3]] = count();}'
This will record all allocations and frees from a single instance of
"zpool list".




2018-08-07 14:58, Mark Martinec wrote:

Collected, here it is:
  https://www.ijs.si/usr/mark/tmp/dtrace-cmd.out.bz2




Was there a mention of a defunct pool?


Indeed.
Haven't tried yet to destroy it, so it is only my hypothesis
that a defunct pool plays a role in this leak.

[...]

I have jumped from 10.3 directly to 11.1-RELEASE-p11, so I'm not sure
with exactly which version / patch level the problem was introduced.

Tried to reproduce the problem on another host running 11.2R,
using memory disk (md), created GPT partition on it and a ZFS pool
on top, then destroyed the disk, so the pool was left as UNAVAILABLE.
Unfortunately this did not reproduce the problem, the "zpool list"
on that host does not cause ZFS to leak memory. Must be something
specific to that failed disk or pool, which is causing the leak.
  Mark



More news: on my last posting I said I can't reproduce the issue
on another 11.2 host. Well, it turned out this was only half the truth.

So this is what I did the last time:

  # create a test pool on md
  mdconfig -a -t swap -s 1Gb
  gpart create -s gpt /dev/md0
  gpart add -t freebsd-zfs -a 4k /dev/md0
  zpool create test /dev/md0p1
  # destroy the disk underneath the pool, making it "unavailable"
  mdconfig -d -u 0 -o force

and I reported that the "zpool list" command does not leak memory,
unlike on another host where the problem was first detected.

But in the following days after this, the second machine
started to run out of memory and ground to a standstill after
a couple of days - this now happened three times, until I realized
the same thing was happening here as on the original host.
(the "zpool list" is running periodically as a plugin to a
"telegraf" monitoring)

Sure enough the "zpool list" was leaking "solaris" zone memory
here too, and even in larger chunks (previously by 570, now by about 2k):


  # (while true; do zpool list >/dev/null; vmstat -m | \
  fgrep solaris; sleep 0.5; done) | awk '{print $2-a; a=$2}'
  12224540
  2509
  3121
  5022
  2507
  1834
  2508
  2505

And it's not just the "zpool list" command. The same leak occurs with
"zpool status" and with "zpool iostat", either when explicitly specifying
the defunct pool as argument, or without specifying a pool (implying all).

(but not when a healthy pool is explicitly specified to such a command)

And to confirm the hypothesis: while running the "zpool list" in the
loop above, I destroyed the defunct pool from another terminal, and
the leak immediately vanished (the vmstat -m | fgrep solaris
no longer grew).
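
The measurement loop above can be wrapped into a small script. This is only
a sketch: `growth` is a hypothetical helper name, and the assumption (taken
from the loop above) is that the second column of the "solaris" line in
`vmstat -m` is the value to watch. Unlike the one-liner, it skips the first
sample, so only increments are printed.

```shell
#!/bin/sh
# Sketch only: growth is a hypothetical helper. It reads one counter
# sample per line on stdin and prints the change between consecutive
# samples, skipping the first one (which has no predecessor).
growth() {
    awk 'NR > 1 { print $1 - prev } { prev = $1 }'
}

# On the affected FreeBSD host (guarded so the script is harmless elsewhere):
if [ "$(uname -s)" = "FreeBSD" ]; then
    # Take ten samples, half a second apart, of the "solaris" malloc type.
    i=0
    while [ "$i" -lt 10 ]; do
        zpool list >/dev/null
        vmstat -m | awk '$1 == "solaris" { print $2 }'
        sleep 0.5
        i=$((i + 1))
    done | growth
fi
```

A run of zeros after the defunct pool is destroyed would match the
observation above that the leak vanished.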

So the only missing link is: why the leak did not start immediately
after revoking the disk and making the pool unavailable, but only
some time later (hours? few days? after a reboot? after running some
other command?).

  Mark







Re: FreeBSD blocks on BOCHS serial port

2018-08-13 Thread Eugene Grosbein
13.08.2018 20:52, Alexander Lochmann wrote:

> Hi folks!
> 
> We are doing some automatic experiments using FreeBSD running in a
> virtual machine.
> To control the experiment from the outside, we use serial ports to
> communicate with a userspace program.
> The communication via serial does work with QEMU. However, it does not
> work with BOCHS which is our desired emulator.
> Even simple operations like 'echo FOO | tee /dev/ttyu1' or 'cat
> /dev/ttyu1' do not work. Both commands block 'forever'.
> It does not matter whether we use ttyu0 (file backend) or ttyu1 (tcp
> socket).
> I put some debug output in sys/dev/uart/uart_dev_ns8250.c. The output
> suggests that the driver more or less reads and writes to the serial
> ports. At least it does something...
> 
> Do you have any hints how we can further analyze this problem?
> Has anyone come across a similar problem?

This could be a modem control line "Carrier Detection" (CD) or flow control
problem: emulators can have distinct default settings for serial ports.

You should not rely on defaults; make sure you disable modem control/CD
either explicitly (using stty(1) etc.) or implicitly by switching to /dev/cuau0
instead of /dev/ttyu0. Flow control settings should match too, for both sides
of the virtual port.




Re: All the memory eaten away by ZFS 'solaris' malloc - on 11.1-R amd64

2018-08-13 Thread Volodymyr Kostyrko

23.07.18 18:12, Mark Martinec wrote:

After upgrading an older AMD host from FreeBSD 10.3 to 11.1-RELEASE-p11
(amd64), ZFS is gradually eating up all memory, so that it crashes every
few days when the memory is completely exhausted (after swapping heavily
for a couple of hours).


I've been in the same situation. ZFS, only one pool, no ZFS errors.

I think the problem lies rather between swapping and the ZFS ARC. This host
has a varying load: sometimes it needs more active memory, sometimes
less. This means that the active zone can expand and shrink by about +-2G of
memory (I have 16 GB installed there). The problem is that when a huge task
is idle it doesn't use much active memory, and other activity pushes
its memory out to swap. When active runs low and the ARC holds >50% of
memory, it becomes very hard to make the ARC give some memory back. My host
was even brought to the point where it couldn't get tasks back into memory
from swap: while some pages were being restored from swap, time
passed and other pages were instead pushed out to swap due to some ARC
activity. Finally the active zone shrinks so badly that the host becomes
unresponsive.

About 6 months ago I tried tweaking the kernel and swap to make things go
the other way. Currently I have `vm.swap_idle_enabled=1` in /etc/loader.conf
and it looks like this solves my problem. Other interesting things to
look at are `vfs.zfs.arc_free_target`, `vfs.zfs.arc_shrink_shift` and
`vfs.zfs.arc_grow_retry`.

Or you can take another route and simply limit the ARC size with
`vfs.zfs.arc_max`.
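
A minimal sketch of the ARC-limit route, under stated assumptions:
`arc_cap_bytes` is a hypothetical helper, and the 25% figure is an arbitrary
illustration rather than a recommendation from this thread. The script only
prints a candidate line for /boot/loader.conf and changes nothing.

```shell
#!/bin/sh
# Sketch only: arc_cap_bytes is a hypothetical helper that returns the
# given percentage of a physical memory size, in bytes.
arc_cap_bytes() {
    physmem=$1
    percent=$2
    echo $(( physmem * percent / 100 ))
}

# On a FreeBSD host (guarded so the script is harmless elsewhere):
if [ "$(uname -s)" = "FreeBSD" ]; then
    # 25 is an illustrative value only; pick one that suits the workload.
    cap=$(arc_cap_bytes "$(sysctl -n hw.physmem)" 25)
    # Print a candidate line for /boot/loader.conf; review before adding it.
    printf 'vfs.zfs.arc_max="%s"\n' "$cap"
    # Show the current limit and the current ARC size for comparison.
    sysctl vfs.zfs.arc_max kstat.zfs.misc.arcstats.size
fi
```

For the 16 GB host mentioned above, 25% would work out to 4 GiB.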


Hope that helps.

--
Sphinx of black quartz judge my vow.


Re: FreeBSD blocks on BOCHS serial port

2018-08-13 Thread Alexander Lochmann


On 13.08.2018 21:36, Eugene Grosbein wrote:
> 13.08.2018 20:52, Alexander Lochmann wrote:
> 
>> Hi folks!
>>
>> We are doing some automatic experiments using FreeBSD running in a
>> virtual machine.
>> To control the experiment from the outside, we use serial ports to
>> communicate with a userspace program.
>> The communication via serial does work with QEMU. However, it does not
>> work with BOCHS which is our desired emulator.
>> Even simple operations like 'echo FOO | tee /dev/ttyu1' or 'cat
>> /dev/ttyu1' do not work. Both commands block 'forever'.
>> It does not matter whether we use ttyu0 (file backend) or ttyu1 (tcp
>> socket).
>> I put some debug output in sys/dev/uart/uart_dev_ns8250.c. The output
>> suggests that the driver more or less reads and writes to the serial
>> ports. At least it does something...
>>
>> Do you have any hints how we can further analyze this problem?
>> Has anyone come across a similar problem?
> 
> This could be a modem control line "Carrier Detection" (CD) or flow control
> problem: emulators can have distinct default settings for serial ports.
> 
> You should not rely on defaults; make sure you disable modem control/CD
> either explicitly (using stty(1) etc.) or implicitly by switching to
> /dev/cuau0 instead of /dev/ttyu0. Flow control settings should match too,
> for both sides of the virtual port.
Thx. I cannot even run 'stty < /dev/ttyu1' to see the current settings.
It simply blocks...

'stty < /dev/ttyu0' works perfectly. ttyu0 uses a file-based backend.
Whereas ttyu1 uses a tcp server-based backend with a connected netcat.
> 
> 

-- 
Technische Universität Dortmund
Alexander Lochmann                PGP key: 0xBC3EF6FD
Otto-Hahn-Str. 16                 phone:  +49.231.7556141
D-44227 Dortmund                  fax:    +49.231.7556116
http://ess.cs.tu-dortmund.de/Staff/al





Re: FreeBSD blocks on BOCHS serial port

2018-08-13 Thread Eugene Grosbein
14.08.2018 3:15, Alexander Lochmann wrote:

>> You should not rely on defaults; make sure you disable modem control/CD
>> either explicitly (using stty(1) etc.) or implicitly by switching to
>> /dev/cuau0 instead of /dev/ttyu0. Flow control settings should match too,
>> for both sides of the virtual port.
> Thx. I cannot even run 'stty < /dev/ttyu1' to see the current settings.
> It simply blocks...

Use /dev/ttyu1.init to see defaults and /dev/ttyu1.lock to set/show
locked defaults that cannot be changed without disabling a lock first.
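
A minimal sketch of that suggestion, under stated assumptions:
`serial_nodes` is a hypothetical helper that only lists the device nodes
discussed in this thread for one port, and the stty settings (clocal,
-crtscts) are one plausible choice for ignoring modem control and disabling
hardware flow control, not a confirmed fix for the BOCHS setup.

```shell
#!/bin/sh
# Sketch only: serial_nodes is a hypothetical helper that prints the
# device nodes related to one serial port unit, following the
# ttyuN / cuauN naming used in this thread.
serial_nodes() {
    unit=$1
    printf '%s\n' \
        "/dev/ttyu${unit}" \
        "/dev/ttyu${unit}.init" \
        "/dev/ttyu${unit}.lock" \
        "/dev/cuau${unit}"
}

# Only touch the port if the initial-state node exists (i.e. on the guest).
if [ -c /dev/ttyu1.init ]; then
    # Opening the .init node should not wait for carrier, so this
    # is expected to return even when "stty < /dev/ttyu1" blocks.
    stty -f /dev/ttyu1.init -a
    # Assumption: ignore modem control lines and disable RTS/CTS.
    stty -f /dev/ttyu1.init clocal -crtscts
    # Later opens of /dev/ttyu1 should pick up this initial state.
fi
```

The settings on the emulator side of the port would still need to match.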






Re: All the memory eaten away by ZFS 'solaris' malloc - on 11.1-R amd64

2018-08-13 Thread Mark Martinec

2018-08-13 21:48, Volodymyr Kostyrko wrote:

I've been in the same situation. ZFS, only one pool, no ZFS errors.

I think the problem lies rather between swapping and the ZFS ARC. This host
has a varying load: sometimes it needs more active memory, sometimes
less. This means that the active zone can expand and shrink by about +-2G of
memory (I have 16 GB installed there). The problem is that when a huge task
is idle it doesn't use much active memory, and other activity pushes
its memory out to swap. When active runs low and the ARC holds >50% of
memory, it becomes very hard to make the ARC give some memory back. My host
was even brought to the point where it couldn't get tasks back into memory
from swap: while some pages were being restored from swap, time
passed and other pages were instead pushed out to swap due to some ARC
activity. Finally the active zone shrinks so badly that the host becomes
unresponsive.

About 6 months ago I tried tweaking the kernel and swap to make things go
the other way. Currently I have `vm.swap_idle_enabled=1` in /etc/loader.conf
and it looks like this solves my problem. Other interesting things to
look at are `vfs.zfs.arc_free_target`, `vfs.zfs.arc_shrink_shift` and
`vfs.zfs.arc_grow_retry`.

Or you can take another route and simply limit the ARC size with
`vfs.zfs.arc_max`.


What you describe is not the same problem as the one I described
in this thread. In my case the ZFS malloc'ed memory ("solaris" zone)
is growing, while the size of the ARC remains capped to a reasonably
low value, and the ARC even shrinks as the "solaris" zone approaches
the memory size.

I too have been bitten previously by the ARC size being reluctant to
shrink. This problem is described here, but only partially mitigated
now in the 11.? version:

  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594

The usually suggested workaround is to limit the size of the ARC,
although it would be nice to find a solution to handle ARC UMA
shrinking automatically, like it worked well in FreeBSD 9 but
broke in FreeBSD 10.

Like I said, the problem I described in this thread is different.

  Mark


Re: FreeBSD blocks on BOCHS serial port

2018-08-13 Thread Kurt Jaeger
Hi!

> 14.08.2018 3:15, Alexander Lochmann wrote:
> 
> >> You should not rely on defaults; make sure you disable modem control/CD
> >> either explicitly (using stty(1) etc.) or implicitly by switching to
> >> /dev/cuau0 instead of /dev/ttyu0. Flow control settings should match too,
> >> for both sides of the virtual port.
> > Thx. I cannot even run 'stty < /dev/ttyu1' to see the current settings.
> > It simply blocks...
> 
> Use /dev/ttyu1.init to see defaults and /dev/ttyu1.lock to set/show
> locked defaults that cannot be changed without disabling a lock first.

Thanks for this pointer! Is that behaviour written down/explained
somewhere in the man pages?

-- 
p...@freebsd.org +49 171 3101372  2 years to go !