> Sent: Monday, April 09, 2018 at 6:50 PM
> From: "Aham Brahmasmi" <aham.brahma...@gmx.com>
> To: misc@openbsd.org
> Subject: Re: Cannot access internet with virtual switch
>
> > Sent: Saturday, April 07, 2018 at 5:02 AM
> > From: "Ayaka Koshibe" <akosh...@gmail.com>
> > To: "Aham Brahmasmi" <aham.brahma...@gmx.com>
> > Cc: misc@openbsd.org
> > Subject: Re: Cannot access internet with virtual switch
> >
> > On Fri, Apr 6, 2018 at 4:40 PM, Aham Brahmasmi <aham.brahma...@gmx.com> wrote:
> > > Hello misc,
> > >
> > > Problem
> > > A physical server with a switch (add em0 up) cannot access the internet.
> > > However, the same host with a bridge (add em0 up) can access the
> > > internet.
> > >
> > > Steps
> > > $ ifconfig
> > > em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> > >         lladdr 22:22:22:22:22:22
> > >         index 1 priority 0 llprio 3
> > >         groups: egress
> > >         media: Ethernet autoselect (1000baseT full-duplex,master)
> > >         status: active
> > >         inet 20.20.20.20 netmask 0xffffff00 broadcast 20.20.20.255
> > > ...
> > > $ doas route -n show
> > > Routing tables
> > >
> > > Internet:
> > > Destination     Gateway            Flags   Refs      Use   Mtu  Prio Iface
> > > default         20.20.20.1         UGS        0     1XXX     -     8 em0
> > > 224/4           127.0.0.1          URS        0        0 32768     8 lo0
> > > 127/8           127.0.0.1          UGRS       0        0 32768     8 lo0
> > > 127.0.0.1       127.0.0.1          UHhl       1        X 32768     1 lo0
> > > 20.20.20/24     20.20.20.20        UCn        1      9XX     -     4 em0
> > > 20.20.20.1      33:33:33:33:33:33  UHLch      1     1XXX     -     3 em0
> > > 20.20.20.20     44:44:44:44:44:44  UHLl       0        X     -     1 em0
> > > 20.20.20.255    20.20.20.20        UHb        0        0     -     1 em0
> > > $ ping 8.8.8.8
> > > PING 8.8.8.8 (8.8.8.8): 56 data bytes
> > > 64 bytes from 8.8.8.8: icmp_seq=0 ttl=61 time=x.xxx ms
> > > ...
> > > $ doas ifconfig switch0 create
> > > $ doas ifconfig switch0 add em0
> > > $ doas ifconfig switch0 up
> > > $ ping 8.8.8.8
> > > PING 8.8.8.8 (8.8.8.8): 56 data bytes
> > > ^C
> > > --- 8.8.8.8 ping statistics ---
> > > 31 packets transmitted, 0 packets received, 100.0% packet loss
> > 
> > Hi,
> > 
> > Seems you haven't started switchd(8), or connected your switch to it
> > -- it shouldn't forward traffic until you do so.
> 
> 
> Hi Koshibe-san,
> 
> Thank you for your reply.
> 
> I have started switchd and connected to it. However, I still cannot
> ping 8.8.8.8. Starting switchd in debug mode results in output which
> broadly says error and closes the switch.
> 
> Steps (after the above switch0 up)
> $ cat /etc/switchd.conf
> listen on 0.0.0.0 tls port 6633
> $ doas switchd -dvvvv
> listen on 0.0.0.0 6633
> 
> (On another session)
> $ switchctl connect /dev/switch0
> 
> (Back to main session)
> ofrelay_input_done: ...
> /dev/switch0 > any: ...
> switch_learn: ...
> packet_input: ...
> any > /dev/switch0: ...
> (above block repeated multiple times)
> ...
> ofrelay_input_done: connection 1.1: 76 bytes from switch 1
> 0401004c 00000013 00020004 040d00a9 00000013 ffffffff 00000001 00100000
> 00000000 00000010 ffffffff ffff0000 00000000 00c88be2 d687ac1f 6b2e22ce
> 8100026f 08004500 006f42d2
> /dev/switch0 > any: version 1_3 type ERROR length 76 xid 19
>         error type BAD_ACTION code 4
> ofp13_input: message not supported: ERROR
> ofrelay_close: connection 1.1 closed
> switch_remove: switch 1 removed.
> 
> (Another session)
> $ tail -10 /var/log/messages
> Apr 9 XX:XX:XX MachineName /bsd: arp: attempt to add entry for GATEWAY_IP
> on em0 by XX:XX:XX:XX:XX:XX on tap0
> (above message repeated infrequently)
> 
> If it helps in any way, this machine is a dedicated/bare-metal machine
> on a large dedicated/bare-metal machine provider's network. The em0
> interface is in the egress group, has a public IP and is connected to
> the internet via the provider's network equipment.
> 
> The end goal in using the switch is to enable multiple OpenBSD VMs with
> non-contiguous public IPs to be connected to the Internet as real
> hosts. In https://www.openbsd.org/faq/faq6.html#VMMnet, this is the
> Option 4, except using a switch instead of a bridge and public IPs
> on the host network.
> 
> Regards,
> ab
> ---------|---------|---------|---------|---------|---------|---------|--

Hi,

I have tried to locate the piece of code that might be causing the
switch to close.

To do so, I first looked at the OpenFlow Switch Specification version
1.3.5 [1], specifically the ERROR message, because the connection is
closed right after switchd logs "message not supported: ERROR". Page
113 (of 177) of the specification states:

"If the error message is in response to a specific message from the
controller, then the xid field of the header in the error message must
match that of the offending message"

The OFP ERROR message has an xid of 19. In the log, the message
immediately before it also has an xid of 19, which implies that it is
the offending message. The full output from "doas switchd -dvvvv" for
that message is:

any > /dev/switch0: version 1_3 type PACKET_OUT length 169 xid 19
        buffer NO_BUFFER in_port <1> actions_len 16
                action OUTPUT len 16 port ANY max_len NO_BUFFER
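
The xid match can also be checked against the raw bytes. Below is a
minimal Python sketch, not part of switchd, that unpacks the first 20
bytes of the hex dump quoted earlier in this thread. It assumes the
standard OpenFlow layout: an 8-byte header, then a 4-byte error
type/code, then the start of the offending request. The helper name
parse_header is made up for illustration.

```python
import struct

# Start of the ERROR message, copied from the quoted "switchd -dvvvv" dump.
dump = "0401004c 00000013 00020004 040d00a9 00000013"
raw = bytes.fromhex(dump.replace(" ", ""))


def parse_header(buf):
    """Unpack an OpenFlow header: version, type, length, xid (big-endian)."""
    version, msg_type, length, xid = struct.unpack("!BBHI", buf[:8])
    return {"version": version, "type": msg_type, "length": length, "xid": xid}


error_hdr = parse_header(raw)
# The ERROR body is a 16-bit type and a 16-bit code (4 bytes); the data that
# follows starts with the offending request, so its header sits at offset 12.
offending_hdr = parse_header(raw[12:])

print(error_hdr)
print(offending_hdr)
print("xid match:", error_hdr["xid"] == offending_hdr["xid"])
```

The embedded header decodes to type 13 (PACKET_OUT in OpenFlow 1.3),
length 169 and xid 19, which agrees with the PACKET_OUT line above.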

Now looking at the ERROR message, it contains error type BAD_ACTION
and code 4. In source, BAD_ACTION is OFP_ERRTYPE_BAD_ACTION. Code 4
according to the enum ofp_bad_action_code on page 115 of the spec is
OFPBAC_BAD_OUT_PORT. OFP_ERRTYPE_BAD_ACTION is defined in sys/net/ofp.h
and used in the function swofp_validate_action in sys/net/switchofp.c.
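
As a cross-check of the type and code, here is a small Python sketch
that decodes bytes 8 to 11 of the ERROR message from the quoted dump.
The enum lists only the first five values of ofp_bad_action_code from
the 1.3 specification. The Python names are chosen for illustration and
are not taken from the OpenBSD source.

```python
import struct
from enum import IntEnum


class BadActionCode(IntEnum):
    """First five values of ofp_bad_action_code in OpenFlow 1.3."""
    OFPBAC_BAD_TYPE = 0
    OFPBAC_BAD_LEN = 1
    OFPBAC_BAD_EXPERIMENTER = 2
    OFPBAC_BAD_EXP_TYPE = 3
    OFPBAC_BAD_OUT_PORT = 4


OFPET_BAD_ACTION = 2  # ofp_error_type value for BAD_ACTION

# Bytes 8..11 of the ERROR message in the quoted dump ("00020004"):
# a 16-bit error type followed by a 16-bit error code, big-endian.
err_type, err_code = struct.unpack("!HH", bytes.fromhex("00020004"))

print("type is BAD_ACTION:", err_type == OFPET_BAD_ACTION)
print("code:", BadActionCode(err_code).name)
```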

Based on the above information, my guess is that the following lines in
the function swofp_validate_action cause the error:
case OFP_ACTION_OUTPUT:
...
        case OFP_PORT_ANY:
                *err = OFP_ERRACTION_OUT_PORT;
                return (-1);

This tells us that an OUTPUT action in a PACKET_OUT message cannot use
ANY as its port. I do not know why this is disallowed. More
importantly, I do not know why the controller sends a PACKET_OUT with
action OUTPUT and port ANY.
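
To rule out a logging artifact, the port value can be read from the
bytes the switch echoed back inside the ERROR message. This is a
minimal Python sketch assuming the OpenFlow 1.3 layouts of
ofp_packet_out (after its header) and ofp_action_output. The constant
names follow the specification, not the OpenBSD headers.

```python
import struct

OFPP_ANY = 0xFFFFFFFF        # reserved port number "ANY"
OFP_NO_BUFFER = 0xFFFFFFFF   # buffer_id meaning "no buffer"
OFPAT_OUTPUT = 0             # action type OUTPUT

# Bytes of the offending PACKET_OUT that follow its 8-byte header, taken from
# the data echoed back inside the ERROR message in the quoted dump.
body = bytes.fromhex("ffffffff00000001" "0010000000000000")
action = bytes.fromhex("00000010ffffffff" "ffff000000000000")

# ofp_packet_out: buffer_id, in_port, actions_len, 6 pad bytes.
buffer_id, in_port, actions_len = struct.unpack("!IIH6x", body)
# ofp_action_output: type, len, port, max_len, 6 pad bytes.
act_type, act_len, out_port, max_len = struct.unpack("!HHIH6x", action)

print("buffer NO_BUFFER:", buffer_id == OFP_NO_BUFFER)
print("in_port:", in_port, "actions_len:", actions_len)
print("action OUTPUT:", act_type == OFPAT_OUTPUT, "len:", act_len)
print("port ANY:", out_port == OFPP_ANY, "max_len:", hex(max_len))
```

The decoded fields agree with the debug line for xid 19, so the
controller does appear to put port ANY on the wire.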

Beyond this, unfortunately, I have reached the limit of my skills. If
possible, I would request Reyk, Goda-san or Yasuoka-san to please share
any insights. If required, I can share the 1100+ line log output of
"doas switchd -dvvvv".

Regards,
ab
[1] - https://www.opennetworking.org/images/stories/downloads/sdn-resources/onf-specifications/openflow/openflow-switch-v1.3.5.pdf
---------|---------|---------|---------|---------|---------|---------|--

> 
> > 
> > > $ ifconfig
> > > em0: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
> > >         lladdr 22:22:22:22:22:22
> > >         index 1 priority 0 llprio 3
> > >         groups: egress
> > >         media: Ethernet autoselect (1000baseT full-duplex,master)
> > >         status: active
> > >         inet 20.20.20.20 netmask 0xffffff00 broadcast 20.20.20.255
> > > switch0: flags=41<UP,RUNNING>
> > >         index 6 llprio 3
> > >         groups: switch
> > >         datapath xxxxxxxxxxxxxxxxxx maxflow 10000 maxgroup 1000
> > >         em0 flags=0<>
> > >                 port 1 ifpriority 0 ifcost 0
> > > ...
> > > $ doas route -n show
> > > Routing tables
> > >
> > > Internet:
> > > Destination     Gateway            Flags   Refs      Use   Mtu  Prio Iface
> > > default         20.20.20.1         UGS        0     1XXX     -     8 em0
> > > 224/4           127.0.0.1          URS        0        0 32768     8 lo0
> > > 127/8           127.0.0.1          UGRS       0        0 32768     8 lo0
> > > 127.0.0.1       127.0.0.1          UHhl       1        X 32768     1 lo0
> > > 20.20.20/24     20.20.20.20        UCn        1      9XX     -     4 em0
> > > 20.20.20.1      33:33:33:33:33:33  UHLch      1     1XXX     -     3 em0
> > > 20.20.20.20     44:44:44:44:44:44  UHLl       0        X     -     1 em0
> > > 20.20.20.255    20.20.20.20        UHb        0        0     -     1 em0
> > > $ doas ifconfig switch0 destroy
> > > $ ping 8.8.8.8
> > > PING 8.8.8.8 (8.8.8.8): 56 data bytes
> > > 64 bytes from 8.8.8.8: icmp_seq=0 ttl=61 time=x.xxx ms
> > >
> > > Repeating the above steps with bridge0 does let the ping pass through
> > > after the bridge is brought up. The only delta between the switch and
> > > bridge output is in the ifconfig.
> > > $ ifconfig
> > > bridge0: flags=41<UP,RUNNING>
> > >         index 8 llprio 3
> > >         groups: bridge
> > >         priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto rtsp
> > >         em0 flags=3<LEARNING,DISCOVER>
> > >                 port 1 ifpriority 0 ifcost 0
> > > ...
> > >
> > > I have run "doas route -n monitor" in a separate session while doing
> > > this. However, I cannot comprehend the output. pf is not involved -
> > > running tcpdump -nettti pflog0 with the catchall "block log" produces
> > > the normal output of blocked packets with the bridge. However, it stops
> > > producing the normal output of blocked packets with the switch. Once the
> > > switch is destroyed, it is back to normal blocked packets output.
> > >
> > > What am I doing wrong/missing? The only thing that stands out to me is
> > > the em0 flags=0<> line in the ifconfig for the switch. And I do not know
> > > what to make of it.
> > >
> > > Regards,
> > > ab
> > > ---------|---------|---------|---------|---------|---------|---------|--
> > >
> > 
> 
> 
