load balancing outgoing traffic with 4 uplinks

Thomas Huber Sat, 23 Mar 2019 11:48:41 -0700

hi misc

This is again about my load-balancer setup: OpenBSD -stable on an APU2 board.


I have four ADSL uplinks provided by two different ISPs:
- pppoe0 runs on em0, is directly connected to a modem, and has a
static IP address from the ISP.
- pppoe1 runs over vlan2 via em1 to a managed switch (switch1),
to which a dedicated bridge modem is connected, with a dynamic IP from the
ISP.
- vlan[3|4] run over em1 to switch1 and on to
two router-modems, which handle the pppoe connections.

I didn't manage - although I thought I did - to run pppoe within OpenBSD
for the third and fourth uplink, which is why it is set up like this.
See here for that issue: https://marc.info/?l=openbsd-misc&m=155277213709648

On the LAN-side I have
vlan32 (10.10.10.0/24) and
vlan64 (10.64.0.0/10) via em2 to another managed switch (switch2).
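
As a quick sanity check of the LAN address plan above, here is a minimal sketch using Python's standard ipaddress module. It only does arithmetic on the two prefixes quoted in this mail. The variable names are made up for illustration and nothing is read from the router.

```python
from ipaddress import ip_network

# Prefixes as given in the mail; the names are illustrative only.
office = ip_network("10.10.10.0/24")   # vlan32
guests = ip_network("10.64.0.0/10")    # vlan64

# First and last address covered by the guest /10.
print(guests[0], "-", guests.broadcast_address)
print("addresses in guest net:", guests.num_addresses)
# The two LAN networks should not overlap, otherwise filtering
# "from vlan64:network to vlan32:network" would be ambiguous.
print("overlaps office net:", guests.overlaps(office))
```

The computed broadcast address agrees with the 10.127.255.255 entry on vlan64 in the routing table further down.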

As further information, this is a hotel setup:
vlan32 is internal (office computers, VoIP and gear)
vlan64 is guest wifi with a UniFi controller and 10 APs
with ~20-100 connected devices.

The hostname.pppoeX and hostname.vlanX files look like this:

$hostname.pppoe0
inet 0.0.0.0 255.255.255.255 NONE \
        pppoedev em0 authproto pap authname 'xxx' authkey 'xx' up
dest 0.0.0.1
!/sbin/route add -mpath default -ifp pppoe0 0.0.0.1

$hostname.vlan3:
dhcp vlan 3 vlandev em1
!/sbin/route add -mpath default -ifp vlan3 192.168.3.1

$hostname.vlan4:
dhcp vlan 4 vlandev em1
!/sbin/route add -mpath default -ifp vlan4 192.168.4.1

All of pppoe[0|1] and vlan[3|4] are successfully connected to the ISP
or router-modem, and due to the -mpath in the !/sbin/route command
all interfaces are in the egress interface group:

# ifconfig egress
pppoe0: flags=8851<UP,POINTOPOINT,RUNNING,SIMPLEX,MULTICAST> mtu 1492
        index 7 priority 0 llprio 3
        dev: em0 state: session
        sid: 0x185 PADI retries: 0 PADR retries: 0 time: 1107d 16:47:37
        sppp: phase network authproto pap authname "my-first-adsl-username"
        groups: pppoe egress
        status: active
        inet 79.140.xxx.xxx --> 62.27.xxx.xxx netmask 0xffffffff
pppoe1: flags=8851<UP,POINTOPOINT,RUNNING,SIMPLEX,MULTICAST> mtu 1492
        index 8 priority 0 llprio 3
        dev: vlan2 state: session
        sid: 0x186 PADI retries: 0 PADR retries: 0 time: 1107d 16:47:37
        sppp: phase network authproto pap authname "my-second-adsl-username"
        groups: pppoe egress
        status: active
        inet6 fe80::98f8:2562:d5f3:23a3%pppoe1 ->  prefixlen 64 scopeid 0x8
        inet 85.212.xxx.xxx --> 62.27.xxx.xxx netmask 0xffffffff
vlan4: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:0d:b9:43:43:b5
        index 42 priority 0 llprio 3
        encap: vnetid 4 parent em1
        groups: vlan egress
        media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
        status: active
        inet 192.168.4.2 netmask 0xffffff00 broadcast 192.168.4.255
vlan3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:0d:b9:43:43:b5
        index 43 priority 0 llprio 3
        encap: vnetid 3 parent em1
        groups: vlan egress
        media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
        status: active
        inet 192.168.3.2 netmask 0xffffff00 broadcast 192.168.3.255



#route show -gateway -inet
Routing tables

Internet:
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
default            192.168.4.1        UGSP       8    30803     -     8 vlan4
default            62.27.93.140       UGSP       0        2     -     8 pppoe0
default            62.27.93.143       UGSP       0        4     -     8 pppoe1
default            192.168.3.1        UGSP       0        3     -     8 vlan3
base-address.mcast localhost          URS        0        0 32768     8 lo0
10.64/10           10.64.0.1          UCn       50        0     -     4 vlan64
10.10.10/24        10.10.10.1         UCn        8        4     -     4 vlan32
10.10.10.255       10.10.10.1         UHb        0        0     -     1 vlan32
10.127.255.255     10.64.0.1          UHb        0        1     -     1 vlan64
62.27.93.140       79.140.177.216     UHh        1        1     -     8 pppoe0
62.27.93.143       55d4e174.access.ec UHh        1        1     -     8 pppoe1
79.140.177.216     79.140.177.216     UHl        0     2779     -     1 pppoe0
55d4e174.access.ec 55d4e174.access.ec UHl        0     1657     -     1 pppoe1
127/8              localhost          UGRS       0        0 32768     8 lo0
localhost          localhost          UHhl      13     2010 32768     1 lo0
192.168.3/24       192.168.3.2        UCn        1        0     -     4 vlan3
192.168.3.255      192.168.3.2        UHb        0        0     -     1 vlan3
192.168.4/24       192.168.4.2        UCn        1        2     -     4 vlan4
192.168.4.255      192.168.4.2        UHb        0        0     -     1 vlan4


I would like to achieve the following:
1. roughly even usage of the 4 ADSL uplinks
2. prefer VoIP traffic over vlan32 traffic over vlan64 traffic
3. ssh should always be available through the static IP on pppoe0
4. vlan32 (internal) should not be reachable from vlan64 (hotel-guests)

To do so, I mostly followed /faq/pf/pools.html, with the following change:
I assume that almost all traffic in my setup is https these days,
so I don't see the point in two different pass in rules for https and
non-https.
To address the problem with "secure" web applications*
I use the source-hash method for nat-to and route-to.

This is my working pf.conf, which does the load balancing across
the two pppoe interfaces:

# cat /etc/pf_pppoe.conf


int_if = "{ vlan32, vlan64 }"
int_lan = "{ 10.10.10.0/24, 10.64.0.0/10}"

table <martians> { 0.0.0.0/8 10.0.0.0/8 127.0.0.0/8 169.254.0.0/16     \
                   172.16.0.0/12 192.0.0.0/24 192.0.2.0/24 224.0.0.0/3 \
                   192.168.0.0/16 198.18.0.0/15 198.51.100.0/24        \
                   203.0.113.0/24 }
set block-policy drop
#set loginterface egress
set skip on lo0

match in all scrub (no-df random-id max-mss 1440)
match out on pppoe from $int_lan nat-to (pppoe) source-hash #least-states sticky-address

# VOIP Prio
match on vlan32 proto { tcp udp } to port { 5060 5064 } set prio 7
match on vlan32 proto udp from port 11780:12780 set prio 7

#Internal prio
match on vlan32 set prio 5

block in quick on pppoe from <martians> to any
block return out quick on pppoe from any to <martians>
block in
pass quick on vlan32 to vlan32:network
pass quick on vlan64 to vlan64:network
pass out on egress

block return in on vlan from vlan64:network to vlan32:network # no guests to office
block return in on vlan inet proto tcp from any to any port 25 # avoid spam out

pass in on $int_if route-to { (pppoe0 pppoe0:network), \
        (pppoe1 pppoe1:network) } source-hash

# these lines are commented out because everything seems to work with the
# source-hash method
#pass out on pppoe0 from pppoe1 route-to (pppoe1 pppoe1:network)
#pass out on pppoe1 from pppoe0 route-to (pppoe0 pppoe0:network)

pass in on egress inet proto icmp all
pass in on pppoe0 proto tcp from any to (pppoe0) port ssh



Basically everything works, but I notice some strange things:

1. Sometimes the traffic is not evenly distributed between the uplinks.
My guess is that this is due to the source-hash method, which
- if I understand correctly - distributes traffic per source IP and not per
connection.
When I use [round-robin | least-states] sticky-address I have problems with my
VoIP.
And maybe some guests have problems with "secure" web apps* too.
Does anybody have an idea how to do proper load balancing with almost only
https traffic?
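
To illustrate the guess in point 1, here is a small sketch of what per-source-IP hashing does to the distribution. It is a toy model under loud assumptions: SHA-256 is only a stand-in and is not the hash pf uses internally, the client addresses are invented, and the four uplink names are simply the interfaces from this mail. The point is only that each client is pinned to one uplink, so with a few dozen clients the per-uplink counts need not be equal.

```python
import hashlib
from collections import Counter
from ipaddress import ip_address

# Interface names taken from the mail; used here only as bucket labels.
UPLINKS = ["pppoe0", "pppoe1", "vlan3", "vlan4"]

def pick_uplink(src_ip, uplinks=UPLINKS):
    """Map a source IP to one uplink; the same IP always gets the same uplink.

    ASSUMPTION: SHA-256 stands in for pf's real source-hash function.
    """
    digest = hashlib.sha256(ip_address(src_ip).packed).digest()
    return uplinks[int.from_bytes(digest[:4], "big") % len(uplinks)]

def distribution(clients, uplinks=UPLINKS):
    """Count how many clients land on each uplink."""
    counts = Counter({u: 0 for u in uplinks})
    for ip in clients:
        counts[pick_uplink(ip, uplinks)] += 1
    return counts

# Hypothetical guest addresses inside vlan64 (10.64.0.0/10).
guests = [f"10.64.0.{n}" for n in range(10, 30)]
print(distribution(guests))
```

The model counts clients, not bytes. Even a perfectly even split of clients says nothing about bandwidth if one guest is streaming video and the others are idle, which may be another reason the uplinks look unevenly used.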

2. I tried to customize these rules to also include vlan[3|4] in the
load balancing.
2.1. Use the egress group instead of the pppoe group for nat-to:

match out on egress from $int_lan nat-to (egress) source-hash

2.2. Add vlan[3|4] to the route-to rule:

pass in on $int_if route-to { (pppoe0 pppoe0:network), \
        (pppoe1 pppoe1:network), (vlan3 vlan3:network), \
        (vlan4 vlan4:network) } source-hash

But it didn't work: no internet connection from vlan32 and vlan64.


3. ping with the -I flag behaves strangely.
To check whether my uplinks are working I use:
# ping -I [assigned or static IP] 8.8.8.8
Sometimes it works for one target IP and not for another, for example:
# ping -I [my static IP] 8.8.8.8     works
# ping -I [my static IP] 1.1.1.1     doesn't work
# ping 1.1.1.1                       works

# ping -I [dynamic IP] 8.8.8.8       doesn't work
# ping -I [dynamic IP] 1.1.1.1       works
# ping 8.8.8.8                       works

I don't have any clue about this or where to look besides the routing table.
This problem is a bit odd, and it keeps me from properly investigating
the issue. Ping from the vlan IP to the vlan gateway works fine:

# ping -I 192.168.3.2 192.168.3.1
PING 192.168.3.1 (192.168.3.1): 56 data bytes
64 bytes from 192.168.3.1: icmp_seq=0 ttl=64 time=1.475 ms
64 bytes from 192.168.3.1: icmp_seq=1 ttl=64 time=0.719 ms
64 bytes from 192.168.3.1: icmp_seq=2 ttl=64 time=0.762 ms

# ping -I 192.168.4.2 192.168.4.1
PING 192.168.4.1 (192.168.4.1): 56 data bytes
64 bytes from 192.168.4.1: icmp_seq=0 ttl=64 time=0.828 ms
64 bytes from 192.168.4.1: icmp_seq=1 ttl=64 time=0.834 ms
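
The manual checks above could be scripted so that every source address is tried against every target in one go. This is only a sketch: the function names are hypothetical, the source list contains just the two vlan addresses shown in this mail (the pppoe addresses are masked and would have to be added), and it assumes the system ping accepts -c and -I and returns exit status 0 when at least one reply arrives. As written, the block only prints the commands.

```python
import subprocess

def ping_command(source, target, count=3):
    """Build the argument list for one ping from a given source address."""
    return ["ping", "-c", str(count), "-I", source, target]

def ping_matrix(sources, targets, count=3):
    """One command per (source, target) pair."""
    return [ping_command(s, t, count) for s in sources for t in targets]

def run_matrix(commands):
    """Run each command; True means ping exited with status 0.

    ASSUMPTION: exit status 0 means at least one reply was received.
    """
    results = {}
    for cmd in commands:
        proc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
        results[(cmd[-2], cmd[-1])] = proc.returncode == 0
    return results

# Addresses from the mail; add the pppoe addresses as needed.
SOURCES = ["192.168.3.2", "192.168.4.2"]
TARGETS = ["8.8.8.8", "1.1.1.1"]

# Dry run: show what would be executed without sending any packets.
for cmd in ping_matrix(SOURCES, TARGETS):
    print(" ".join(cmd))
```

Calling run_matrix(ping_matrix(SOURCES, TARGETS)) on the router would return a dictionary keyed by (source, target), which makes it easier to see whether the failures follow the source address, the target, or neither.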


4. My static IP is not always reachable from the outside.
One day it works, the next day it doesn't.
I guess this could be related to the renewal of the dynamic IPs,
but this is only a guess; they are renewed every 24h.
Beyond that, I don't know where else to look or investigate.

Hope someone has a clue on this...
Thanks in advance and all the best

Thomas

*) When I write "secure" in quotation marks, please understand it
as in the example at /faq/pf/pools.html
