Re: [Linux-HA] Antw: Re: Q: Debug clustered IP Adress

Ulrich Windl Wed, 29 Aug 2012 04:31:36 -0700

>>> Lars Marowsky-Bree <[email protected]> wrote on 29.08.2012 at 11:30 in
>>> message
<[email protected]>:
> On 2012-08-29T10:15:50, Ulrich Windl <[email protected]> 
> wrote:
> 
> > The network guys say no. Should "arp" show the Cluster-IP? I cannot see it,
> > so I wonder if something's wrong.
> 
> Well, you should see the MAC/IP mapping in the arp table if the host is
> on the same ethernet segment, yes. Otherwise the host doesn't know where
> to send the packets to.


I checked the arp table of the host that is hosting the cluster IP address. 
I thought the host should also accept its own broadcasts. However, the machine 
is also a Xen hypervisor (Dom0), so everything is connected via software bridges.

> 
> You should see the ARP responses come in with tcpdump/wireshark.
> 
> > Could the "martian source" thing be responsible? I see this for the ARPs:
> > Aug 29 09:21:35 o1 kernel: [ 1261.556861] martian source 172.20.3.59 from
> > 172.20.3.59, on dev br0
> 
> That's difficult to comment on without knowing if "o1" is the gateway
> router, one of the servers, or one of the clients on the network, and
> what the network interfaces are like.

"o1" is a cluster node hosting the cluster IP.

[...]
> > > Can you get the network trace of the arp traffic on the router into the
> > > subnet when an outside ping comes in?
> > I see this on the host (one cluster node):
> > o1:~ # tcpdump -p -i br0 -s100 -v -n host 172.20.3.59

The router is part of some HP switch to which I have no access.

> 
> Are you trying to reach the cluster IP from one of the cluster nodes
> itself? I'm not sure that will work.

Why not (curiosity)? No, I was using a host that is some distance away.

> 
> > tcpdump: listening on br0, link-type EN10MB (Ethernet), capture size 100 bytes
> > 09:43:38.305460 arp who-has 172.20.3.59 tell 172.20.3.62
> > 09:43:38.305493 arp reply 172.20.3.59 is-at f1:e9:91:b1:b9:51
> > 
> > (172.20.3.62 is the gateway)
> 
> That looks OK. You should check the ARP table on the gateway if it is
> correctly updated with the address, though.

I'll have to meet my local guru ;-) ... Actually the MAC address was found on 
the gateway as "(dynamic)", whatever that means...
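
One property of the MAC address in the ARP reply above is worth checking: its first octet (f1) has the least significant bit set, which marks it as a multicast (group) address. The CLUSTERIP target relies on such a shared multicast MAC so that every node receives the frames. Some routers and switches are known to be reluctant to use an ARP entry that maps a unicast IP to a multicast MAC; whether that applies to the HP device here cannot be told from this thread. A minimal Python sketch, with hypothetical helper names, that extracts the mapping from the tcpdump line and tests the group bit:

```python
def is_multicast_mac(mac):
    """True if the group bit (least significant bit of the first octet) is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x01)


def parse_arp_reply(line):
    """Extract (ip, mac) from a tcpdump line of the form
    '<time> arp reply <ip> is-at <mac>'."""
    words = line.split()
    i = words.index("is-at")
    return words[i - 1], words[i + 1].lower()


# Line taken from the tcpdump output quoted in this mail.
ip, mac = parse_arp_reply(
    "09:43:38.305493 arp reply 172.20.3.59 is-at f1:e9:91:b1:b9:51"
)
print(ip, mac, is_multicast_mac(mac))  # 172.20.3.59 f1:e9:91:b1:b9:51 True
```

If the gateway's ARP table shows this same MAC for 172.20.3.59 and packets still do not arrive, the forwarding of frames addressed to that multicast MAC would be the next thing to look at.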

> 
> If you try to ping the cluster IP from a client, what does tcpdump show
> on the servers/gateway? Do you see the ICMP ECHO REQUEST go to the
> cluster IP with the above MAC? How do the servers respond?

A remote server only shows outgoing ICMP ECHO requests, but no replies, and TCP 
open attempts to 172.20.3.59:445/139. I'm afraid packets end at the gateway (as 
you suspected).

> 
> > Packets also arrive via broadcast:
> > 09:45:03.826371 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 271) 172.20.3.59.138 > 172.20.3.63.138: NBT UDP PACKET(138)
> > 09:45:13.836608 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 271) 172.20.3.59.138 > 172.20.3.63.138: NBT UDP PACKET(138)
> 
> You have traffic *from* the cluster IP to the broadcast address of your
> network? That looks wrong. All nodes are likely to log a martian source
> for that one (since they're getting traffic from a locally bound IP). To
> communicate internally in the cluster, Samba should use one of the local
> IP addresses.

I thought port 138 is the NetBIOS datagram service, which is renowned for 
broadcasting all the time.

> 
> The cluster IP is only useful for communicating with the outside world,
> not inside the cluster itself.

Well, the amazing thing is that it doesn't work here, but is supported by 
Novell. In contrast, the "public_address" of CTDB works just fine here, but 
isn't supported by Novell: "Due to technical limitations, this also includes 
the CTDB internal fail-over functionality for IP address take-over. Please note 
that this part is not supported by Novell. Only Pacemaker clusters are fully 
supported."

> 
> > Still don't know where to start debugging.
> 
> Start with something simpler than Samba, see if the CIP can be pinged
> from the outside and what happens there.

Well, shouldn't the manual (sle-ha-manuals_en/manual/book.sleha.html) include 
some notes on understanding and/or troubleshooting clustered IP addresses? 
Anyway, if one clustered IP address is up, it can also be used for testing with 
ping.

I also inspected the Firewall (but that's a bit complicated for me):
Chain INPUT (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 CLUSTERIP  all  --  br0    *       0.0.0.0/0            172.20.3.59
        CLUSTERIP hashmode=sourceip-sourceport clustermac=F1:E9:91:B1:B9:51
        total_nodes=5 local_node=2 hash_init=0
[...]
 307K   47M input_int  all  --  br0    *       0.0.0.0/0            0.0.0.0/0
[...]
    0     0 input_int  all  --  eth0   *       0.0.0.0/0            0.0.0.0/0
[...]
Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
30836 1584K ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0   
        PHYSDEV match --physdev-is-bridged
[...]
Chain input_int (8 references)
 pkts bytes target     prot opt in     out     source               destination
 618K   92M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0
[...]
Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
  148 10168 ACCEPT     all      *      *       ::/0                 ::/0        
        PHYSDEV match --physdev-is-bridged
[...]
Chain input_int (8 references)
 pkts bytes target     prot opt in     out     source               destination
  488 35136 ACCEPT     all      *      *       ::/0                 ::/0
[...]
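
For readers puzzling over the CLUSTERIP rule above: every node receives every frame sent to the cluster MAC, hashes the fields selected by hashmode (here source IP and source port), and only the node whose number matches the result keeps the packet. The sketch below illustrates that idea with the values from the rule (total_nodes=5, local_node=2, hash_init=0). It is an illustration only: CRC32 is a stand-in and NOT the hash the kernel uses, so the node numbers it produces will not match a real cluster, and the client address is a made-up documentation address.

```python
import ipaddress
import zlib


def responsible_node(src_ip, src_port, total_nodes, hash_init=0):
    """Map a flow to a node number in 1..total_nodes.

    Stand-in hash (CRC32 over source IP and source port), not the
    kernel's hash function.
    """
    key = ipaddress.ip_address(src_ip).packed + src_port.to_bytes(2, "big")
    return zlib.crc32(key, hash_init) % total_nodes + 1


def node_accepts(local_node, src_ip, src_port, total_nodes=5, hash_init=0):
    """True if the node numbered `local_node` should keep this packet."""
    return responsible_node(src_ip, src_port, total_nodes, hash_init) == local_node


# Hypothetical client flow; 192.0.2.10 is a documentation address.
flow = ("192.0.2.10", 40000)
owners = [n for n in range(1, 6) if node_accepts(n, *flow)]
print(owners)  # a list containing exactly one node number
```

The point of the scheme is that each flow has exactly one owner without the nodes talking to each other, which also means a flow goes unanswered if the node that owns it is not running the resource.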

Regards,
Ulrich

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
