stable/13: ARC no longer self-tuning?

2022-03-30 Thread Peter


Hi,

  while up to Rel. 12 the ZFS ARC adjusted its size to the demand, in
Rel. 13 it appears to be locked to a fixed minimum of about 100M compressed.
 
Consequently I just got a machine stall/freeze under moderate load:
no cmdline reaction (except in the guests), no login possible, all
processes in "D" state. Reset button needed, all guests and jails
destroyed:

38378  -  DJ   0:03.36 find -sx / /ext /var /usr/local /usr/ports /usr/obj 
39414  -  DJ   0:00.00 sendmail: running queue: /var/spool/mqueue (sendmail
39415  -  DJ   0:00.00 sendmail: running queue: /var/spool/clientmqueue (se
39416  -  DJ   0:00.00 /usr/local/www/cgit/cgit.cgi
39417  -  D<   0:00.00 /usr/local/bin/ruby /ext/libexec/heatctl.rb (ruby27)
39418  -  DJ   0:00.00 sendmail: running queue: /var/spool/clientmqueue (se
39419  -  DJ   0:00.00 sendmail: running queue: /var/spool/mqueue (sendmail
39420  -  DJ   0:00.00 sendmail: running queue: /var/spool/clientmqueue (se
39421  -  DJ   0:00.00 sendmail: accepting connections (sendmail)
39426  -  D    0:00.00 sendmail: running queue: /var/spool/mqueue (sendmail
39427  -  D    0:00.00 sendmail: running queue: /var/spool/clientmqueue (se
39428  -  DJ   0:00.00 sendmail: Queue runner@00:03:00 for /var/spool/clien
39429  -  DJ   0:00.00 sendmail: accepting connections (sendmail)
39430  -  DJ   0:00.00 sendmail: running queue: /var/spool/clientmqueue (se
39465  -  Ds   0:00.01 newsyslog
39466  -  Ds   0:00.01 /bin/sh /usr/libexec/save-entropy
59365  -  DsJ  0:00.09 /usr/sbin/cron -s

"top", apparently the only process still running, shows this:

last pid: 39657;  load averages:  0.27,  1.24,  4.55   up 0+04:05:42  04:11:54
805 processes: 1 running, 804 sleeping
CPU:  0.1% user,  0.0% nice,  0.9% system,  0.0% interrupt, 99.0% idle
Mem: 16G Active, 5118M Inact, 1985M Laundry, 7144M Wired, 462M Buf, 905M Free
ARC: 1417M Total, 326M MFU, 347M MRU, 8216K Anon, 30M Header, 706M Other
 119M Compressed, 546M Uncompressed, 4.57:1 Ratio
Swap: 36G Total, 995M Used, 35G Free, 2% Inuse, 76K In

This is different to 12.3: there I would expect the ARC near 6G, wired
near 11G, and swap near 5G.

Last message in the log was 20 minutes earlier:
Mar 30 03:45:17 edge ntpd[7768]: no peer for too long, server running free now

So, strangely, networking also stalled. I thought networking uses
device drivers separate from the disk drivers?

The effect appeared slowly: the machine became increasingly unresponsive
and laggy (for all kinds of I/O) during the "periodic daily". On its
first night it runs find over a million files in all the jails, as these
are not yet in the L2ARC. Apparently the periodic daily running find in
every jail is what killed it:

35944  -  DJ   0:04.71 find -sx / /var /ext /usr/local /usr/obj /usr/ports 
36186  -  DJ   0:04.75 find -sx / /var /usr/local /usr/obj /usr/ports /dev/
37599  -  DJ   0:04.14 find -sx / /var /ext /usr/local /ext/rapp /usr/ports
38378  -  DJ   0:03.36 find -sx / /ext /var /usr/local /usr/ports /usr/obj 
...

This would need a *lot* of inodes, and the ARC seems quite small for
that.

I've not seen such behaviour before - I had ZFS running in ~2007 with
384 MB RAM installed; now this machine has 32G (which I wouldn't have
bought, I got them by accident), and that doesn't work well.

The ARC is configured in loader.conf:
# kenv
vfs.zfs.arc_max="10240M"
vfs.zfs.arc_min="1024M"

However, sysctl shows:
vfs.zfs.arc.max: 10737418240
vfs.zfs.arc.min: 0
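
What I would try next, as a sketch (assuming the legacy vfs.zfs.arc_min
loader tunable is simply not mapped to the new vfs.zfs.arc.min name,
and that the new-style sysctls are writable at runtime):

  # /boot/loader.conf: use the new-style OpenZFS names directly
  # (values in bytes, to avoid any doubt about suffix parsing)
  vfs.zfs.arc.max="10737418240"
  vfs.zfs.arc.min="1073741824"

  # or set them on the running system:
  sysctl vfs.zfs.arc.min=1073741824
  sysctl vfs.zfs.arc.max=10737418240

If vfs.zfs.arc.min still reads back as 0 after that, the minimum is
really not being applied, which would be consistent with the shrinking
seen below.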

Observing the behaviour, ARC wants to stay at or even below 1G:

last pid: 38718;  load averages:  2.12,  2.93,  2.88   up 0+01:09:08  05:30:25
625 processes: 1 running, 624 sleeping
CPU:  0.0% user,  0.1% nice,  6.3% system,  0.0% interrupt, 93.6% idle
Mem: 12G Active, 1433M Inact, 9987M Wired, 50M Buf, 8237M Free
ARC: 749M Total, 116M MFU, 254M MRU, 2457K Anon, 42M Header, 334M Other
 84M Compressed, 396M Uncompressed, 4.70:1 Ratio
Swap: 36G Total, 36G Free

There are 3 bhyve guests with 16G + 7G + 2G; these naturally create a
lot of dirty memory. The point is that this should go to swap; that's
what the SSDs are for.

The ARC only grows when there is not much activity on the system. That
may be nice for desktops, but it is no good for a sustained workload. I
need it to grow under workload (which it did before, but now doesn't)
and to push back against paging (which doesn't even happen).

Do we have some new knobs to tune?
This one appears to be zero by default already:
  vfs.zfs.arc.grow_retry: 0
And what does this one do?
  vfs.zfs.arc.p_dampener_disable=1
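
For reference, a quick way to survey which knobs exist under the new
naming (this just dumps the vfs.zfs.arc subtree; I don't claim any
particular one of them is the culprit):

  # list the ARC-related sysctls with their descriptions
  sysctl -d vfs.zfs.arc
  # and their current values
  sysctl vfs.zfs.arc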

Do I need to read all the code? There are lots of other things that
worked on 12.3 and now fail or crash, like net/dhcpcd (now crashes in
libc), or mountd not understanding the ZFS exports (the syntax changed,
it doesn't match the manpage, it didn't in 12.3 either, but differently),
and I only have two eyes (and they don't get better with age).

What would be needed for the ARC is an affinity balance: should it
pref

Re: pfctl: Cannot allocate memory.

2022-03-30 Thread Mark Johnston
On Mon, Mar 28, 2022 at 09:44:14AM +0200, Kristof Provost wrote:
> On 27 Mar 2022, at 22:11, Marcel Bischoff wrote:
> > Hello all,
> >
> > when updating a table of ~370k entries, PF sometimes refuses to do so and 
> > from then on continues to refuse until I reboot the machine.
> >
> > $ doas pfctl -f /etc/pf.conf
> > /etc/pf.conf:27: cannot define table pfbadhost: Cannot allocate memory
> > pfctl: Syntax error in config file: pf rules not loaded
> >
> That sounds a lot like 
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=260406

Just a heads-up that this is believed to be fixed now in the main
branch.  The fix should appear in stable/13 and hopefully releng/13.1
shortly.
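
Until the fix reaches your branch, one thing that may be worth checking
(an assumption on my side, not the bug referenced above) is whether the
pf table limits are simply too small for a ~370k-entry table; pf.conf
can raise them:

  # /etc/pf.conf: raise the table-entries limit above the table size
  set limit table-entries 500000

  # then reload the ruleset
  doas pfctl -f /etc/pf.conf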

> 
> My current theory is that this is a bug in the memory allocator somewhere. I 
> do not have the background or time to debug that.



Re: Slow startup from D19488 (rtsol: sendmsg: Permission denied)

2022-03-30 Thread Kevin Oberman
On Tue, Mar 29, 2022 at 5:10 PM Peter  wrote:

>
> Hello Bjoern,
>
>   thanks much for the quick reply!
>
> On Tue, Mar 29, 2022 at 10:04:11PM +, Bjoern A. Zeeb wrote:
> ! On Tue, 29 Mar 2022, Peter wrote:
> !
> ! Hi,
> !
> ! I am a bit puzzled as after two years you are the first one to report
> ! that problem to my knowledge for either base system or jails.
>
> This is what greatly puzzles me, too. So I was strongly thinking
> that I am doing something wrong or unusual. But I cannot figure
> it out, it just seems that the detrimental effect of the change
> cannot be avoided (e.g. "service jail start" takes quite long now -
> there's a lot of them).
>
> ! >  after upgrading 12.3 to stable/13, I am seeing these
> ! > errors in all my jails:
> ! >
> ! > > Additional TCP/IP options: log_in_vain=1.
> ! > > ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib
> ! > > /usr/local/lib/compat/pkg /usr/local/lib/compat/pkg
> ! > > 32-bit compatibility ldconfig path:
> ! > > rtsol: sendmsg on nrail1l: Permission denied
> ! > > rtsol: sendmsg on nrail1l: Permission denied
> ! > > rtsol: sendmsg on nrail1l: Permission denied
> ! > > Starting Network: lo0 nrail1l.
> !
> ! Can you give us a full startup log?
>
> It's the above, right from the beginning, and then follows:
>
> > lo0: flags=8049 metric 0 mtu 16384
> > options=680003
> > inet 127.0.0.1 netmask 0xff00
> > inet6 ::1 prefixlen 128
> > inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
> > groups: lo
> > nd6 options=21
> > nrail1l: flags=8843 metric 0 mtu
> 1500
> > options=28
> > ether 06:1d:92:01:01:0a
> > hwaddr 58:9c:fc:10:28:71
> > inet * netmask ** broadcast *
> > inet6 fe80::41d:92ff:fe01:10a%nrail1l prefixlen 64 scopeid 0x2
> > inet6 fd00: prefixlen 120
> > media: Ethernet autoselect (1000baseT )
> > status: active
> > nd6 options=23
> > Starting rtsold.
> > add host 127.0.0.1: gateway lo0 fib 0: route already in table
> > add net default: gateway *
> > Additional inet routing options: log ICMP redirect=YES.
> > add host ::1: gateway lo0 fib 0: route already in table
> > add net fe80::: gateway ::1
> > add net ff02::: gateway ::1
> > add net :::0.0.0.0: gateway ::1
> > add net ::0.0.0.0: gateway ::1
> > add net default: gateway fd00:*
> > Flushed all rules.
> > Firewall rules loaded.
> > Firewall logging pseudo-interface (ipfw0) created.
> > Creating and/or trimming log files.
> > Updating /var/run/os-release done.
> > Clearing /tmp (X related).
> > Updating motd:.
> > Starting syslogd.
> > Starting rapp.
> > Starting cron.
> > Starting sendmail.
> > Starting sendmail_msp_queue.
> > Performing sanity check on sshd configuration.
> > Starting sshd.
> >
> > Wed Mar 30 00:52:15 CEST 2022
>
> ! > Searching for the cause I found change  1b5be7204eaeeaf  aka  D19488
> ! >
> ! > This doesn't work, because the firewall is not yet present. This is
> !
> ! Given you are talking firewall, I assume you are using vnet jails?
>
> Yes.
>
> ! And given you are talking ipfw I assume your default policy is deny
> ! and not accept?
>
> Yes.
>
> ! And given rtsol runs I assume you have IPv6 configured and in use?
>
> Yes. Here is how I do it:
> https://daemon.contact/ankh/articles/X3OyjgTpuv
>
> ! The same issue then should also happen in your base system on boot?
>
> No. The base system does (second level) prefix delegation and has
> ipv6_gateway_enable="YES" and rtsold_enable="NO" and is not affected.
>
> There is one vnet jail intended as VPN server, which also has these
> parameters in rc.conf and is also not affected.
>
> (I did not yet bother to figure out why; the shell code run from
> rc.d/netif is a bit lengthy...)
>
> ! > happening in rc.d/netif, and that must run before rc.d/ipfw in any
> ! > case, because the firewall needs to see the netifs.
> !
> ! I thought ipfw could deal with interfaces coming and going?
>
> Maybe it can, but then modifying the rc.d logic so as to get "ipfw" to
> run before "netif" - that likely opens a can of worms.
>
> Furthermore, I do use ipfw as a genuine rerouting+filtering
> framework, and that logic is entirely based on the interfaces; all
> rules belong to exactly two interfaces. Here is a short abstract
> of the idea:
> https://forums.freebsd.org/threads/ipfw-or-pf.46706/post-561760
>
>
> cheerio,
> PMc
>
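
For the rtsol failure itself, a workaround sketch (untested, and only
applicable if the default-deny ipfw policy in the vnet jail is indeed
what drops the packets): let the ruleset pass router solicitation and
advertisement, and re-run rtsol once the firewall is up, e.g. from
rc.local in the jail:

  # in the jail's ipfw ruleset: pass ICMPv6 RS (type 133) and RA (type 134)
  ipfw add 100 allow ipv6-icmp from any to any icmp6types 133,134

  # /etc/rc.local in the jail: solicit again once the rules are loaded
  /sbin/rtsol nrail1l

This only works around the symptom; it does not change the ordering of
rc.d/netif vs. rc.d/ipfw.
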
This may be irrelevant, but updating to the stable branch is not
recommended as it is not regularly tested. Updating to 13.0-RELEASE and
then to stable is less likely to be problematic.
-- 
Kevin Oberman, Part time kid herder and retired Network Engineer
E-mail: rkober...@gmail.com
PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683