On Thursday, 25 August 2016 13:21:24 UTC+1, Marc Haber wrote:
>
> On Wed, Aug 24, 2016 at 08:36:49AM -0700, Luke Bigum wrote:
> > Here we have very strict control over our hardware and what interface
> > goes where. We keep CentOS 6's naming scheme on Dell hardware, so p2p1
> > is PCI slot 2, Port 1, and don't try to rename it.
>
> Isn't CentOS 6 still using eth0, 1, 2, 3? How do you handle different
> hardware having different slot numbers, or PCI bridges shifting bus
> numbers?
>

I find this depends on the manufacturer. I've never come across a Dell 
server newer than an R510 that *doesn't* give you PCI-based names; I just 
checked an R510 and it does. All of our ancient HP gear (7 years old, older 
than the R510s, which are themselves old) gives the ethX names, as does 
random SuperMicro hardware. I don't know exactly what's missing for the 
kernel / udev to name them that way, but for us it doesn't really matter.
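For anyone not used to the biosdevname-style scheme, the mapping can be sketched as a simple classifier (a hypothetical helper for illustration, not LMAX code): p2p1 is PCI slot 2, port 1; em1 is an embedded port; ethX is the kernel default you get when the mapping doesn't come up.

```python
import re

def naming_scheme(ifname):
    """Rough classifier for the interface naming schemes discussed above.
    Illustrative sketch only - real naming depends on BIOS/SMBIOS data."""
    if re.fullmatch(r'p\d+p\d+', ifname):   # PCI slot N, port M, e.g. p2p1
        return 'pci-slot'
    if re.fullmatch(r'em\d+', ifname):      # embedded / on-board NIC
        return 'embedded'
    if re.fullmatch(r'eth\d+', ifname):     # classic kernel default
        return 'kernel-default'
    return 'unknown'
```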

> > We have a 3rd party patch manager tool (patchmanager.com), LLDP on
> > our switches, and a Nagios check that tells me if an interface is not
> > plugged into the switch port it is supposed to be plugged into
> > (according to patchmanager).
>
> Nice ;-) Is the code for the Nagios stuff public?
>

Unfortunately no :-( It's another one of those LMAX modules that's had 
years of development but has too much company-specific stuff hard-coded in 
it to release. It's not a huge amount of work though, and I did just ask my 
Lead if I could clean up our networking module and release it and he was 
more than happy, so I'm sure I could do the same for our Nagios module. 
Watch this space, but don't hold your breath.


> > This works perfectly on Dell hardware because the PCI name mapping
> > works.
>
> And you don't have many different kinds of servers.


We try to keep as few as possible, but it's not that small a list:

*******************
[root@puppet ~]# mco facts productname
Report for fact: productname

        .................................        found 1 times
        KVM                                      found 603 times
        OptiPlex 7010                            found 1 times
        OptiPlex 7020                            found 2 times
        PowerEdge FC430                          found 15 times
        PowerEdge FC630                          found 56 times
        PowerEdge R220                           found 1 times
        PowerEdge R320                           found 92 times
        PowerEdge R330                           found 1 times
        PowerEdge R510                           found 17 times
        PowerEdge R520                           found 66 times
        PowerEdge R720                           found 36 times
        PowerEdge R720xd                         found 30 times
        PowerEdge R730                           found 7 times
        PowerEdge R730xd                         found 37 times
        Precision Tower 5810                     found 10 times
        Precision WorkStation T5500              found 7 times
        ProLiant DL360 G6                        found 2 times
        ProLiant DL380 G5                        found 16 times
        ProLiant DL380 G6                        found 11 times
        To Be Filled By O.E.M.                   found 1 times
        X9SCL/X9SCM                              found 6 times
*************************
 

 
>
> > On really old HP gear it doesn't work,
>
> What does that mean?
>
I meant that on our very old HP servers the PCI device name mapping doesn't 
come up, so you end up with eth0, eth1, etc.

 

> > We still need some sort of "glue record" that says "this interface
> > should be up and have this IP". In our older designs this was managed
> > entirely in Hiera - so there's a giant multi-level hash that we run
> > create_resources() over to define every single network interface. You
> > can imagine the amount of Hiera data we have.
>
> That's what we're trying to avoid. Can you share example snippets?
>


Here is a snippet of the older style, in a node's Hiera. It is what I'm 
trying to move away from, because if you want to create 20 of these 
machines you've got to copy this Hiera hash around 20 times over. Oh, the 
number of typos... You can probably infer the defined types that this data 
has create_resources() run over; the key names are pretty Red Hat-specific:

*******************************
networking::interfaces:
  bond1:
    bonding_opts: mode=802.3ad xmit_hash_policy=layer3+4 lacp_rate=slow miimon=100
    enable: true
    onboot: 'yes'
    type: Bonding
  bond1.3:
    broadcast: 1.1.3.255
    enable: true
    ipaddr: 1.1.3.7
    netmask: 255.255.255.0
    network: 1.1.3.0
    onboot: 'yes'
    vlan: 'yes'
  p4p1:
    enable: true
    master: bond1
    onboot: 'yes'
    slave: 'yes'
    type: Ethernet
  p4p2:
    enable: true
    master: bond1
    onboot: 'yes'
    slave: 'yes'
    type: Ethernet
networking::routes:
  bond1:
    device: bond1
    routes:
    - 1.1.2.0/24 via 1.1.3.1

*******************************
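To make the duplication concrete: everything in that hash except the IP address and the slave interface names is boilerplate, so in principle it could be generated from a handful of parameters. A minimal sketch (hypothetical helper, assuming a /24 network, which matches the netmask above):

```python
def storage_bond_hiera(ip, slaves, vlan=3, bond='bond1'):
    """Generate the repeated per-node interface hash from the few values
    that actually differ between machines. Illustrative only - not part
    of the real networking module."""
    net = ip.rsplit('.', 1)[0]  # /24 assumed, e.g. '1.1.3' from '1.1.3.7'
    data = {
        bond: {'bonding_opts': 'mode=802.3ad xmit_hash_policy=layer3+4 '
                               'lacp_rate=slow miimon=100',
               'enable': True, 'onboot': 'yes', 'type': 'Bonding'},
        '%s.%d' % (bond, vlan): {
            'ipaddr': ip, 'netmask': '255.255.255.0',
            'network': net + '.0', 'broadcast': net + '.255',
            'enable': True, 'onboot': 'yes', 'vlan': 'yes'},
    }
    for s in slaves:  # each physical slave gets the same stanza
        data[s] = {'enable': True, 'master': bond, 'onboot': 'yes',
                   'slave': 'yes', 'type': 'Ethernet'}
    return data
```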


> >  In the newer designs which are a lot more of a role/profile approach 
> >  I've been trying to conceptualise the networking based on our 
> >  profiles. So if one of our servers is fulfilling function "database" 
> >  there will be a Class[profile::database]. This Class might create a 
> >  bonded interface for the "STORAGE" network and another interface for 
> >  the "CLIENT" network. 
>
> That is interesting and a nice concept. But nothing one introduces 
> just to remedy an error report named "help, my interface names do not 
> fit any more". 


Probably not - it's a lot of work if burying an error message is your only 
aim. What I get from the abstraction above is being able to take our 
profiles and re-use them in a completely different site on the other side 
of the world, or in a staging / testing environment. So I don't have the 
concept of "VLAN 123 in Production UK"; I've just got "the STORAGE 
network", which in Production UK happens to be vlan 123 (buried low down in 
Hiera, and only specified once), but in Dev it's 456, and over there it 
doesn't exist so we'll give it the same vlan tag as the CLIENT network, 
etc... The physical-ness of the network is abstracted away from the 
concepts our software relies on.
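In Hiera terms that might look something like this (hypothetical layer and key names, just to show the vlan tag being stated once per site):

```yaml
# hieradata/site/production-uk.yaml
profile::networks::storage_vlan: 123

# hieradata/site/dev.yaml
profile::networks::storage_vlan: 456
```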
 

> So you do create network interfaces in the profile and not in the 
> role? 
>

We try to follow the design rule that "Roles only include Profiles". Since 
our software stack is heavily dependent on the networking architecture, the 
profiles for our software are not designed to exist on the same server. I 
would never have profile::database and profile::frontend on the same 
server, as they might both try to create a STORAGE network and fail catalog 
compilation. As such our roles generally only contain one "business level" 
profile, and look something like:

*******************
class role::database {
  include profile::mandatory         # Everything mandatory on EL6
  include profile::authentication    # Authentication is not mandatory
  include profile::database          # The profile that does most of the work for our software
}
*******************

This is more a function of our software being heavily tied to networking 
layout though. I do have other roles and profiles for internal (office) 
systems that can move around a lot easier, as they are just your standard 
"Run a webapp and DB" thing.
 

> >  Through various levels of Hiera I can define the STORAGE network as 
> >  VLAN 100, because it might be a different vlan tag at a different 
> >  location. Then at the Hiera node level (on each individual server) I 
> >  will have something like: 
> > 
> > profile::database::bond_storage_slaves: [ 'p2p1', 'p2p2' ] 
> > 
> > That's the glue. At some point I need to tell Puppet that on this
> > specific server, the storage network is a bond of p2p1 and p2p2.
>
> So you need to know when writing this code what kind of hardware the 
> system is running on, probably down to firmware version and hardware 
> extras?
>

No, the exact opposite ideally.  You need to know *conceptually* what the 
requirements of our software are. So sticking with the same fictitious 
"database" example, you must have a STORAGE network and you must have a 
CLIENT network otherwise the App simply won't run (we're a little bit more 
complicated than a LAMP-stack-in-AWS company). When we've got this coded 
correctly it should be hardware independent, but, there is this "mandatory 
data" that we need to supply to get it to build (what interfaces are for 
what network). These are the "glue records" I keep talking about (to borrow 
a term from DNS). Ideally this would be zero. We *could* programmatically 
determine it, but the arguments against are part "effort vs gain" and part 
"what do you want your source of truth to be". Maybe the best way for us to 
do it auto-magically would be to query Patch Manager to determine what 
network interfaces should be present and what logical networks they attach 
to - we already run Nagios checks against it. That's not trivial though, 
and it also means my Puppet builds would rely on an externally hosted SaaS 
(not going to fly).


> > I have bounced around the idea of removing this step and trusting the
> > switch - ie: write a fact to do an LLDP query for the VLAN of the
> > switch port each interface is connected to, that way you wouldn't
> > need the glue, there'd be a fact called vlan_100_interfaces.
>
> So the fact would be coming from the _server_ based on what lldpcli 
> show neighbors detail returns, which is supposed to include the VLAN 
> information? Would this work on 802.1q trunks as well? 
>

That was the idea, yes. I don't know about 802.1q trunks; it depends on 
what the switch OS reports for such interfaces.
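For what it's worth, the fact itself would only be a few lines. The sketch below assumes an lldpcli-style key-value input of the shape `lldp.<iface>.vlan.vlan-id=<tag>` - that shape is an assumption, and as discussed further down, real LLDP output varies wildly between vendors:

```python
from collections import defaultdict

def vlan_interface_facts(lldp_keyvalue):
    """Group interfaces by the VLAN the switch reports via LLDP,
    producing fact names like vlan_100_interfaces. The input key shape
    is an assumed example, not a guaranteed lldpcli format."""
    vlans = defaultdict(list)
    for line in lldp_keyvalue.splitlines():
        key, _, value = line.partition('=')
        parts = key.split('.')
        # expect e.g. ['lldp', 'em1', 'vlan', 'vlan-id']
        if len(parts) >= 4 and parts[0] == 'lldp' and parts[2:4] == ['vlan', 'vlan-id']:
            vlans['vlan_%s_interfaces' % value].append(parts[1])
    return dict(vlans)
```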

> > Two problems with this approach: we end up trusting the switch to be
> > our source of truth (it may not be correct,
>
> The switch uses its own source of truth which also influences which 
> network traffic gets sent down the link, so trusting the switch will 
> at least fail to the safe side and avoid accidentally putting full 
> trust on an Internet link. 
>

Yeah if that suits your use case, you could do that. For me though, I'd 
much prefer a Puppet manifest to fail to compile because someone hasn't 
supplied the correct data. It forces an engineer to think about what they 
are building, and where it's attached.
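In code terms, "fail to compile without the data" is just a mandatory parameter with no default. A Python sketch of the same idea (the key name comes from the glue-record example above; the helper itself is hypothetical):

```python
def storage_bond_slaves(node_hiera):
    """Refuse to build if the glue record is missing, rather than
    guessing from the switch: the engineer must state which physical
    interfaces carry the STORAGE network on this node."""
    slaves = node_hiera.get('profile::database::bond_storage_slaves')
    if not slaves:
        raise ValueError('profile::database::bond_storage_slaves must '
                         'list the STORAGE interfaces for this node')
    return slaves
```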

> > and, what if the switch port is down?).
>
> One would have to fall back to a certain safety net then. 
>
> >  Secondly the quality and consistency of LLDP information you get out 
> >  of various manufacturers of networking hardware is very different, so 
> >  relying on LLDP information to define your OS network config is a bit 
> >  risky for me. 
>
> Is it really this bad? I do have experience with HP and Cisco, and 
> their LLDP/CDP information is usually fine. 
>

In my opinion it is, yes. One of our Network Engineers changed a Dell FX2 
chassis internal I/O switch between one mode and the other to get MLAG 
working (these are Dell Force10 switches internally) and the structure of 
the LLDP information changed - and this was simple stuff, too: the switch 
description just "disappeared" :-(

Here's one part of our client-side Nagios monitoring: a script that 
converts the LLDP information into parse-able CSV. Our Nagios servers 
query this data via SNMP and compare it to Patch Manager, thereby telling 
us if something is plugged into the wrong port. Sanitised to match the 
"database" example, the output looks like this:

[root@server ~]$ sudo .//interfaces.py 
p3p2,yes,clientswitch01.example.com,16,456,Arista DCS-7124SX
em1,yes,storageswitch01.example.com,8/1/20,123,Brocade ICX6450-48
em2,yes,storageswitch01.example.com,8/1/20,123,Brocade ICX6450-48
p4p1,yes,clientswitch02.example.com,16,456,Arista DCS-7124SX

And here's the Python that generates that output. Note the number of if 
statements in the function parse_switch_type_from_data(), and how I have to 
fall back on MAC address checks because some models simply don't want to 
report that they are a "Brocade", etc:

https://gist.github.com/lukebigum/efb5b789bfeaf962ef15128092015d08

I haven't read the LLDP standard, but from personal experience I assume it 
reads something like "Here is a list of optional fields, put whatever you 
want in them".
 

> > It's a different story for our VMs. Since they are Puppet defined we
> > specify a MAC address and so we "know" which MAC will be attached to
> > which VM bridge. We drop a MAC based udev rule into the guest to name
> > them similarly, ie: eth100 is on br100.
>
> How do you puppet define your MAC addresses? Which virtualization do 
> you use? Can i see a code snippet? 
>


KVM. MAC addresses are statically defined by a deterministic formula - if 
the IP is 1.2.3.4, the MAC address is 52:54:00:02:03:04; the last three IP 
octets become the last three hex numbers of the MAC. This means no MAC 
address clashes :-) Unfortunately I haven't got everything magically 
defined once, so we must define a VM in Hiera (not the best place for it, 
but it's what we've got):

*****************
libvirt::vms:
  database:
    cpus: '4'
    ensure: running
    interfaces:
    - bridge:br123,54:52:00:01:02:03
    - bridge:br456,54:52:00:
    memory: '4096'
    on_crash: restart
    on_poweroff: destroy
    on_reboot: restart
    virt_disk: path=/var/lib/libvirt/images/ld4deploy01/ld4deploy01.img,size=16,bus=virtio,sparse=false
    virt_type: kvm

*****************
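The formula itself is trivial; one way to write it down (a sketch - I'm assuming the octets are rendered as hex, which coincides with decimal for the 1.2.3.4 example either way):

```python
def ip_to_mac(ip):
    """Derive the deterministic VM MAC from its IP, per the scheme above:
    the 52:54:00 KVM/libvirt prefix plus the last three IP octets.
    Hex rendering of the octets is an assumption about the scheme."""
    octets = [int(o) for o in ip.split('.')]
    return '52:54:00:' + ':'.join('%02x' % o for o in octets[1:])
```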

And then we must duplicate the MAC address in the Hiera of the VM itself 
when creating the networking inside the VM. This is crap, as it's the same 
MAC address in multiple places, but it's tricky to fix. I might be able to 
solve it with exported resources... but I'd probably get VM definitions 
out of Hiera before I fixed this.


> > That's what we do, but it's made easy by an almost homogeneous
> > hardware platform and strict physical patch management.
>
> Yes. The homogenous hardware platform is probably something that can 
> only be maintained for really large installations. 
>
> > When I read about your problem, it sounds like you are missing a "glue 
> > record" that describes your logical interfaces to your physical devices. 
>
> We're desperately trying to avoid having this in Hiera. 
>

I can understand that, and it's good you've got that mindset. I'd like to 
get to the same place eventually. For me, going from 100s of lines of Hiera 
for a node to < 20 is good enough so far.

> > If you were to follow something along the lines of our approach, you
> > might have something like this:
> > 
> > class profile::some_firewall( 
> >   $external_interface_name = 'eth0', 
> >   $internal_interface_name = 'eth1', 
> >   $perimiter_interface_name = 'eth2' 
> > ) { 
> >   firewall { '001_allow_internal': 
> >     chain   => 'INPUT', 
> >     iniface => $internal_interface_name, 
> >     action  => 'accept', 
> >     proto => 'all', 
> >   } 
> > 
> >   firewall { '002_some_external_rule': 
> >     chain   => 'INPUT', 
> >     iniface => $external_interface_name, 
> >     action  => 'accept', 
> >     proto => 'tcp', 
> >     dport => '443', 
> >   } 
> > } 
> > 
> > That very simple firewall profile probably already works on your HP
> > hardware, and on your Dell hardware you'd need to override the 3
> > parameters in Hiera:
> >
> > profile::some_firewall::internal_interface_name: 'em1'
> > profile::some_firewall::external_interface_name: 'p3p1'
> > profile::some_firewall::perimiter_interface_name: 'p1p1'
>
> On the Dell R680, yes. A hypothetical "R680s" would need some other
> definition, and a VMware VM comes up with eno<number> interfaces with an
> eight-digit <number>, basically random.

Thanks for your input, I appreciate it. 
>
> Greetings 
> Marc 
>
>
>
> -- 
> -----------------------------------------------------------------------------
> Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
> Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
> Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
>

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to puppet-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/puppet-users/f041efb7-8dcb-4011-a6ba-67a7b7597020%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
