Re: Wired Memory Increasing about 500MBytes per day

2021-08-02 Thread Andrey V. Elsukov
02.08.2021 08:00, Özkan KIRIK writes:
> Hello,
> 
> I'm using FreeBSD stable/12 0f97f2a1857a96563792f0d873b11a16ff9f818c
> (built Jul 25).
> pf, ipfw and ipsec options are built into the kernel. The server is used
> as a firewall running squid and snort3 (daq - netmap).
> 
> I noticed that wired memory is increasing every day, by about 500MBytes
> per day. I checked vmstat and top (sorted by res), but I couldn't find
> what is consuming the wired memory.
> 
> How can I find which process or which part of the kernel is consuming the
> wired memory?

Hi,

We noticed the same problem. I'm not sure about the exact version, but you
can check the output of:
# vmstat -z | egrep "ITEM|pgcache"

The page cache grows until the lowmem threshold is reached; then it is
automatically cleaned and begins to grow again.

-- 
WBR, Andrey V. Elsukov



OpenPGP_signature
Description: OpenPGP digital signature


Re: Wired Memory Increasing about 500MBytes per day

2021-08-03 Thread Andrey V. Elsukov
03.08.2021 11:40, Özkan KIRIK writes:
> Thank you Andrey,
> 
> There is no line that contains the expression "pgcache".

It is probably only present in 13+.

> I wonder, what is the unit of the USED column in vmstat -z output?
> Is the size of the allocated memory USED * SIZE bytes, or USED bytes?

Yes, USED is the number of allocated entries, each of SIZE bytes, so the
allocated memory is USED * SIZE bytes.
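
For example (my sketch, not tooling from the thread), the memory held by a
set of zones can be summed from the SIZE and USED columns with awk; the
here-doc below reuses pgcache numbers posted later in this thread, and on a
live system you would pipe `vmstat -z | egrep pgcache` in directly:

```shell
# Sum USED * SIZE over matching zones.  After stripping commas,
# $3 is SIZE (bytes per item) and $5 is USED (number of items).
cat <<'EOF' |
vm pgcache: 4096, 0, 5225, 139, 412976, 0, 0, 0
vm pgcache: 4096, 0, 28381269, 77, 190108006, 24, 0, 0
vm pgcache: 4096, 0, 166358, 11523, 1684567513, 3054, 0, 0
vm pgcache: 4096, 0, 29548679, 576, 780034183, 1730, 0, 0
EOF
awk '{ gsub(/,/, ""); used_bytes += $3 * $5 }
     END { printf "%d GiB\n", used_bytes / 1024 / 1024 / 1024 }'
```

With the sample numbers above this prints "221 GiB", matching the bc
computation later in the thread.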

-- 
WBR, Andrey V. Elsukov





Re: Wired Memory Increasing about 500MBytes per day

2021-08-03 Thread Andrey V. Elsukov
03.08.2021 16:47, Mark Johnston writes:
>> We noticed the same problem, I'm not sure the exact version, but you can
>> check the output:
>> # vmstat -z | egrep "ITEM|pgcache"
>>
>> The page cache grows until the lowmem threshold is reached; then it is
>> automatically cleaned and begins to grow again.
> 
> The pgcache zones simply provide a per-CPU cache and allocator for
> physical page frames.  The sizes of the caches are bounded.  The numbers
> of "used" items from the pgcache zones do not really tell you anything
> since those pages may be allocated for any number of purposes, including
> for other UMA zones.  For instance, if ZFS allocates a buffer page from
> its ABD UMA zone, and that zone's caches are empty, UMA may allocate a
> new slab using uma_small_alloc() -> vm_page_alloc() -> pgcache zone.
> 
> So if there is some wired page leak, the pgcache zones are probably not
> directly responsible.

We don't see any leaks, but our monitoring shows that "free" memory
migrates to "wired", and only these zones grow. So our graphs show linear
growth of wired memory over 7 days. When free memory reaches ~4%,
everything returns to normal, and then the linear growth starts again for
another 7 days. And the pgcache zones reset their number of USED items to a
low value. This is on a server with 256G RAM.

E.g., this is when 9% of free memory is left:

$ vmstat -z | egrep "ITEM|pgcache"
ITEM          SIZE   LIMIT      USED   FREE         REQ  FAIL SLEEP XDOMAIN
vm pgcache:   4096,      0,     5225,   139,     412976,    0,    0,      0
vm pgcache:   4096,      0, 28381269,    77,  190108006,   24,    0,      0
vm pgcache:   4096,      0,   166358, 11523, 1684567513, 3054,    0,      0
vm pgcache:   4096,      0, 29548679,   576,  780034183, 1730,    0,      0
$ bc
>>> 5225+28381269+166358+29548679
58101531
>>> 58101531*4096/1024/1024/1024
221
>>>

This is when lowmem was triggered:
% vmstat -z | egrep "ITEM|pgcache"
ITEM          SIZE   LIMIT      USED   FREE         REQ  FAIL SLEEP XDOMAIN
vm pgcache:   4096,      0,     5336,   337,     410052,    0,    0,      0
vm pgcache:   4096,      0,  3126129,   117,   56689945,   24,    0,      0
vm pgcache:   4096,      0,    49771,  3910,  413657845, 1828,    0,      0
vm pgcache:   4096,      0,  4249924,   706,  224519238,  562,    0,      0
% bc
>>> 5336+3126129+49771+4249924
7431160
>>> 7431160*4096/1024/1024/1024
28
>>>

Look at the graph:
https://imgur.com/yhqK1p8.png

-- 
WBR, Andrey V. Elsukov





Re: Wired Memory Increasing about 500MBytes per day

2021-08-03 Thread Andrey V. Elsukov
03.08.2021 17:30, Mark Johnston writes:
>>> So if there is some wired page leak, the pgcache zones are probably not
>>> directly responsible.
>>
>> We don't see any leaks, but our monitoring shows that "free" memory
>> migrates to "wired", and only these zones grow.
> 
> How are you measuring this?  USED or USED+FREE?

AFAIK, the monitoring uses these sysctl variables:

vm.stats.vm.v_page_size
vm.stats.vm.v_free_count
vm.stats.vm.v_wire_count
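
As an illustration (my sketch, not the actual monitoring code), wired
memory in bytes is simply v_page_size * v_wire_count:

```shell
# On FreeBSD the counters would come from sysctl, e.g.:
#   page_size=$(sysctl -n vm.stats.vm.v_page_size)
#   wired_pages=$(sysctl -n vm.stats.vm.v_wire_count)
# Hypothetical sample values are used here so the arithmetic stands alone.
page_size=4096
wired_pages=58101531
echo "wired: $(( page_size * wired_pages / 1024 / 1024 / 1024 )) GiB"
```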

-- 
WBR, Andrey V. Elsukov





Re: IPv6 checksum errors with divert

2021-10-29 Thread Andrey V. Elsukov
27.10.2021 16:28, Peter writes:
> I see these checksum error when the packet goes into the divert
> socket, I see it when the packet comes back from divert, and I
> see it when the packet goes out onto the network.

> But, when I remove the divert socket from the path, then I still
> see the checksum error at the place where the divert would have
> happened, but when the packet goes out to the network, the checksums
> are okay.

Hi,

This is usually due to IPv6 checksum offloading being enabled on the NIC.
When upper-level protocols like TCP/UDP/SCTP send a packet, they can leave
the checksum for delayed calculation. This delayed calculation happens when
the IP packet is handed to the physical interface. If the interface is
unable to offload the checksum calculation, the IP layer calculates it in
software; otherwise it leaves the checksum as is. This is why you see
corrupted checksums in the tcpdump output on the egress interface: the
checksum simply has not been calculated by the interface yet.
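
If you want tcpdump on the egress interface to show final checksums, you
can disable IPv6 transmit checksum offload on the NIC (the interface name
em0 is an assumption; substitute your own). This trades a little CPU for
software checksumming:

```shell
# Disable IPv6 TX checksum offload so the kernel fills in checksums
# in software before the packet reaches the wire (and tcpdump).
ifconfig em0 -txcsum6
# Re-enable hardware offload later with:
# ifconfig em0 txcsum6
```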

Divert was designed for IPv4 only and does not properly support other
address families.

But you can try this patch:
 https://people.freebsd.org/~ae/ipv6_divert_csum.diff

-- 
WBR, Andrey V. Elsukov





Re: IPv6 inflight fragmentation

2021-11-01 Thread Andrey V. Elsukov
31.10.2021 05:24, Peter writes:
> From what I understood, inflight fragmentation (on an intermediate router)
> is not practical with IPv6. But it happens:
> And it doesn't seem like these packets would be answered at all.
> 
> This happens when there is a dummynet pipe/queue rule (or a divert
> rule) in the outbound rules to an interface that must reduce the MTU.
> As soon as we skip over that dummynet (or divert), we get these ICMPv6
> messages at the other end, and the fragmentation ceases:

Hi,

A divert rule implicitly reassembles IP fragments before passing a packet
to the application. I don't think dummynet is affected by this.

-- 
WBR, Andrey V. Elsukov





Re: IPv6 inflight fragmentation

2021-11-02 Thread Andrey V. Elsukov
01.11.2021 23:56, Peter writes:
> ! divert rule does implicit IP fragments reassembling before passing a
> ! packet to application. I don't think dummynet is affected by this.
> 
> No, we're not going to an application, we are routing to the
> Internet. And the uplink iface (tun0) has mtu=1492. And we have a rule
> in ipfw, like:
> 
>> queue 21 proto all  xmit tun0 out
> 
> And we have sysctl net.inet.ip.fw.one_pass=0
> 
> So, at the time when we go thru the queue, we do not yet know the
> actual interface to use for xmit (because there might still be a
> "forward" rule following), so we do not yet know the mtu.
> 
> Only when we finally give the packet out for sending, *after* passing
> the queue, then we will recognize our actual mtu. And then the
> difference happens:
> 
>  * if we did *not* go through the queue, the packet is (probably)
>dropped and an ICMPv6 type 2 ("too big") is sent back to the
>originator. This is how I understand that it should work, and
>that works.

Hi,

Without divert/dummynet rules, packets are handled through the usual
forwarding path, i.e. ip6_tryforward() handles the MTU case and sends an
ICMP6_PACKET_TOO_BIG message.

>  * if we *did* go through the queue, the packet is split into
>fragments although it is IPv6. And that does not work; such packet
>does not get answered by Youtube, and playback hangs. From a quick
>glance the fragments do look technically correct - and I have no
>idea why YT would receive a fullsized packet from the player,
>anyway (and I won't analyze their stuff).

And here it seems we have the problem. When you use a dummynet rule with
the "out xmit" opcode, it is handled on the PFIL_OUT|PFIL_FWD pass. As a
result, dummynet consumes the packet and sends it to ip6_output() with the
IPV6_FORWARDING flag. Currently this flag only makes sense for multicast
routing. And there is the problem: a router that uses a dummynet rule for a
forwarded packet can do IP fragmentation, which it must not do.

Alexander and Bjoern, can you take a look at this?
I made a quick patch that checks the PFIL_FWD and IPV6_FORWARDING flags, so
dummynet now knows that we are forwarding and sets IPV6_FORWARDING only in
that case. Then ip6_output() sets the dontfrag variable when
IPV6_FORWARDING is present. And in the end, when we get an EMSGSIZE error
with the IPV6_FORWARDING flag, we send an ICMP6_PACKET_TOO_BIG error
message instead of quietly dropping the packet.

https://people.freebsd.org/~ae/ip6_dont_frag.diff

The patch doesn't touch the divert code. I think a diverted packet can be
treated as locally generated, so it is OK to fragment it.

> The behaviour is the same if there is either a "queue" action or
> a "divert" action or both.
> With "divert" we know that the mbuf flags are lost - with dummynet
> I did not yet look into the code. I had a hard time finding the cause
> in bulky video data, and then I simply reduced the mtu one hop earlier
> within my intranet, to workaround the issue for now.

As a workaround, it is usually enough to use the tcp-setmss opcode.
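
For example, a rule like the following (a sketch only: the rule number is
arbitrary, the 1432 MSS is 1492 minus 40 bytes of IPv6 header minus 20
bytes of TCP header, and the tcp-setmss action requires the ipfw_pmod
module):

```shell
# Clamp the MSS of TCP SYNs leaving the 1492-MTU uplink so endpoints
# never send full-sized segments that would need fragmentation.
kldload -n ipfw_pmod
ipfw add 90 tcp-setmss 1432 tcp from any to any tcpflags syn out xmit tun0
```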

-- 
WBR, Andrey V. Elsukov


