On Thu, Aug 25, 2016 at 08:08:13AM -0700, Luke Bigum wrote:
> On Thursday, 25 August 2016 13:21:24 UTC+1, Marc Haber wrote:
> > On Wed, Aug 24, 2016 at 08:36:49AM -0700, Luke Bigum wrote: 
> > > Here we have very strict control over our hardware and what interface goes
> > > where. We keep CentOS 6's naming scheme on Dell hardware, so p2p1 is PCI
> > > slot 2, Port 1, and don't try to rename it.
> >
> > Isn't CentOS 6 still using eth0, 1, 2, 3? How do you handle different 
> > hardware having different slot numbers, or PCI bridges shifting bus 
> > numbers? 
> >
> 
> I find this depends on the manufacturer. I've never come across a Dell 
> server newer than an R510 that *doesn't* give you PCI based names. I just 
> checked an R510 and it does. All of our ancient HP gear (7 years, older
> than the R510s, which are old) gives the ethX names. Also random SuperMicro
> hardware gives ethX. I don't really know what's missing for the kernel / 
> udev to name them so, but for us it doesn't really matter.

Can you run
$ for iface in /sys/class/net/*; do echo $iface; sudo udevadm info -q all -p $iface | grep ID_NET_NAME; done
on some of your gear? I'd like to learn what different vendors deliver.

My Thinkpad T450:
/sys/class/net/br0
/sys/class/net/enx000011121314
E: ID_NET_NAME=enp0s20u4
E: ID_NET_NAME_MAC=enx000011121314
E: ID_NET_NAME_PATH=enp0s20u4
/sys/class/net/eth0
E: ID_NET_NAME=enp0s25
E: ID_NET_NAME_MAC=enx507b9d681b36
E: ID_NET_NAME_PATH=enp0s25
/sys/class/net/lo

My APU at home (with a lot of bridges and VLANs):
[2/501]mh@aida:~$ for iface in /sys/class/net/*; do echo $iface; sudo udevadm info -q all -p $iface | grep ID_NET_NAME; done
/sys/class/net/br181
/sys/class/net/br182
/sys/class/net/br183
/sys/class/net/br184
/sys/class/net/br185
/sys/class/net/br186
/sys/class/net/br187
/sys/class/net/br188
/sys/class/net/br189
/sys/class/net/br191
/sys/class/net/br192
/sys/class/net/br281
/sys/class/net/br381
/sys/class/net/br382
/sys/class/net/br383
/sys/class/net/brenp1s0
/sys/class/net/brenp2s0
/sys/class/net/brenp3s0
/sys/class/net/enp1s0
E: ID_NET_NAME_MAC=enx000db9342afc
E: ID_NET_NAME_PATH=enp1s0
/sys/class/net/enp2s0
E: ID_NET_NAME_MAC=enx000db9342afd
E: ID_NET_NAME_PATH=enp2s0
/sys/class/net/enp3s0
E: ID_NET_NAME_MAC=enx000db9342afe
E: ID_NET_NAME_PATH=enp3s0
/sys/class/net/int181
/sys/class/net/int182
/sys/class/net/int191
/sys/class/net/int192
/sys/class/net/lo
/sys/class/net/per281
/sys/class/net/unt381
/sys/class/net/unt382
/sys/class/net/unt383
[3/502]mh@aida:~$

A KVM VM with a lot of interfaces:
/sys/class/net/eth0
E: ID_NET_NAME=enx525400422c88
E: ID_NET_NAME_MAC=enx525400422c88
/sys/class/net/eth1
E: ID_NET_NAME=enx525400d22ad3
E: ID_NET_NAME_MAC=enx525400d22ad3
/sys/class/net/eth2
E: ID_NET_NAME=enx52540095dfa6
E: ID_NET_NAME_MAC=enx52540095dfa6
/sys/class/net/int181
/sys/class/net/int182
/sys/class/net/int183
/sys/class/net/int184
/sys/class/net/int185
/sys/class/net/int186
/sys/class/net/int187
/sys/class/net/int188
/sys/class/net/int189
/sys/class/net/int191
/sys/class/net/int192
/sys/class/net/lo
/sys/class/net/per281
/sys/class/net/unt381
/sys/class/net/unt382
/sys/class/net/unt383
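
To compare such dumps from many machines, the output could be folded into
one record per interface. A minimal sketch in Python, assuming exactly the
format shown above (a /sys/class/net/ path line followed by zero or more
"E: KEY=VALUE" lines). The function name parse_udev_names is made up for
this illustration and is not part of any tool:

```python
def parse_udev_names(text):
    """Group the ID_NET_NAME* properties by interface name."""
    result = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("/sys/class/net/"):
            # A new interface section starts; keep only the last path element.
            current = line.rsplit("/", 1)[-1]
            result[current] = {}
        elif line.startswith("E: ") and current is not None:
            key, _, value = line[3:].partition("=")
            result[current][key] = value
    return result


# Sample taken from the Thinkpad output above.
sample = """\
/sys/class/net/eth0
E: ID_NET_NAME=enp0s25
E: ID_NET_NAME_MAC=enx507b9d681b36
E: ID_NET_NAME_PATH=enp0s25
/sys/class/net/lo
"""

names = parse_udev_names(sample)
for iface, props in names.items():
    # Interfaces without a path-based name (bridges, lo) print "-".
    print(iface, props.get("ID_NET_NAME_PATH", "-"))
```

Interfaces for which udev reports nothing, such as bridges and lo, end up
with an empty record, which makes it easy to spot hardware that only offers
MAC-based names.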

> > >  We have a 3rd party patch manager tool (patchmanager.com), LLDP on 
> > >  our switches, and a Nagios check that tells me if an interface is not 
> > >  plugged into the switch port it is supposed to be plugged into 
> > >  (according to patchmanager). 
> >
> > Nice ;-) Is the code for the Nagios stuff public? 
> >
> 
> Unfortunately no

Too bad ;-)

>  :-( Another one of those LMAX modules that's had years of 
>  development but too much company specific stuff hard coded in it to 
>  release. It's not a huge amount though, and I did just ask my Lead if
>  I could clean up our networking module and release it and he was more
>  than happy, I'm sure I could do the same for our nagios module. Watch
>  this space, but don't hold your breath.

That would be really really nice.

> > >  This works perfectly on Dell hardware because the PCI name mapping 
> > >  works. 
> >
> > And you don't have many different kinds of servers. 
> 
> We try to keep as few as possible, but it's not that small a list:
> 
> *******************
> [root@puppet ~]# mco facts productname
> Report for fact: productname
> 
>         .................................        found 1 times
>         KVM                                      found 603 times
>         OptiPlex 7010                            found 1 times
>         OptiPlex 7020                            found 2 times
>         PowerEdge FC430                          found 15 times
>         PowerEdge FC630                          found 56 times
>         PowerEdge R220                           found 1 times
>         PowerEdge R320                           found 92 times
>         PowerEdge R330                           found 1 times
>         PowerEdge R510                           found 17 times
>         PowerEdge R520                           found 66 times
>         PowerEdge R720                           found 36 times
>         PowerEdge R720xd                         found 30 times
>         PowerEdge R730                           found 7 times
>         PowerEdge R730xd                         found 37 times
>         Precision Tower 5810                     found 10 times
>         Precision WorkStation T5500              found 7 times
>         ProLiant DL360 G6                        found 2 times
>         ProLiant DL380 G5                        found 16 times
>         ProLiant DL380 G6                        found 11 times
>         To Be Filled By O.E.M.                   found 1 times
>         X9SCL/X9SCM                              found 6 times
> *************************

That's rather nice and straightforward, but you probably need to
distinguish what kinds of cards are plugged in. There have been cases
where bus numbers changed depending on whether cards with PCI bridges
were plugged in. Also, platforms like the raspi do spontaneously
reassign their USB bus numbers; I guess this happens in the PCI world
as well.

> > > We still need some sort of "glue record" that says "this interface should
> > > be up and have this IP". In our older designs this was managed entirely in
> > > Hiera - so there's a giant multi-level hash that we run create_resources()
> > > over to define every single network interface. You can imagine the amount
> > > of Hiera data we have.
> >
> > That's what we're trying to avoid. Can you share example snippets? 
> >
> 
> 
> Here is a snippet of the older style, in a Node's Hiera. It is what I'm 
> trying to move away from, because if you want to create 20 of these 
> machines you've got to copy this Hiera hash around 20 times over.

I understand. But somewhere you need to maintain this info. This
really belongs in a CMDB, but if one doesn't have one, Hiera is
probably the next best place.

> > >  In the newer designs which are a lot more of a role/profile approach 
> > >  I've been trying to conceptualise the networking based on our 
> > >  profiles. So if one of our servers is fulfilling function "database" 
> > >  there will be a Class[profile::database]. This Class might create a 
> > >  bonded interface for the "STORAGE" network and another interface for 
> > >  the "CLIENT" network. 
> >
> > That is interesting and a nice concept. But it is not something one
> > introduces just to remedy an error report named "help, my interface
> > names do not fit any more".
> 
> 
> Probably not, it's a lot of work for burying an error message if that's 
> just your aim.

No, that's the subject of the ticket that prompted the discussion
here. People kept hardcoding ethX in their code...

>  What I get from the abstraction above is being able to take our
>  profiles and re-use them in a completely different site on the other 
>  side of the world, or in a staging / testing environment. So I don't
>  have the concept of "VLAN 123 in Production UK", I've just got "The
>  STORAGE network" which in Production UK happens to be vlan 123
>  (buried low down in Hiera, and only specified once), but in Dev
>  it's 456, and over there it doesn't exist so we'll give it the same
>  vlan tag as the CLIENT network, etc... The physical-ness of the
>  network is abstracted from the concepts our software relies on.

Yes, that is a really nice concept which should have been considered
here years ago. Alas, people didn't.
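
To make the idea concrete: the profile only ever asks for a logical network
name, and the site decides which tag that means. The following Python
sketch is only an illustration of that lookup, not Luke's Hiera data. The
STORAGE tags 123 and 456 come from his description; the site keys, the
CLIENT tags and the names SITE_VLANS, FALLBACKS and vlan_for are invented
for the example:

```python
# Hypothetical per-site data; in the setup described this lives in Hiera.
SITE_VLANS = {
    "production_uk": {"STORAGE": 123, "CLIENT": 200},  # CLIENT tag invented
    "dev": {"STORAGE": 456, "CLIENT": 201},            # CLIENT tag invented
    "elsewhere": {"CLIENT": 300},                      # no STORAGE network here
}

# Sites where a logical network shares the tag of another one.
FALLBACKS = {"elsewhere": {"STORAGE": "CLIENT"}}


def vlan_for(site, network):
    """Resolve a logical network name to the VLAN tag used at a site."""
    vlans = SITE_VLANS[site]
    if network in vlans:
        return vlans[network]
    alias = FALLBACKS.get(site, {}).get(network)
    if alias is None:
        # Fail loudly instead of guessing when the data is missing.
        raise KeyError(f"no VLAN defined for {network} at {site}")
    return vlans[alias]


for site in SITE_VLANS:
    print(site, vlan_for(site, "STORAGE"))
```

The code that consumes vlan_for() never mentions a tag, so the same profile
can be reused at another site by changing data only.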

> > So you do create network interfaces in the profile and not in the 
> > role? 
> >
> 
> We try to follow the design rule that "Roles only include Profiles".

... "and don't define their own resources", you mean?

That's one of the aspects of the roles-and-profiles approach that I
have never seen spelled out explicitly, but that is still honored by
nearly everybody, and I have not yet fully grokked the reasons for it.

> > >  Through various levels of Hiera I can define the STORAGE network as 
> > >  VLAN 100, because it might be a different vlan tag at a different 
> > >  location. Then at the Hiera node level (on each individual server) I 
> > >  will have something like: 
> > > 
> > > profile::database::bond_storage_slaves: [ 'p2p1', 'p2p2' ] 
> > > 
> > > That's the glue. At some point I need to tell Puppet that on this specific
> > > server, the storage network is a bond of p2p1 and p2p2. 
> >
> > So you need to know when writing this code what kind of hardware the 
> > system is running on, probably down to firmware version and hardware 
> > extras?
> >
> 
> No, the exact opposite ideally.  You need to know *conceptually* what the 
> requirements of our software are. So sticking with the same fictitious 
> "database" example, you must have a STORAGE network and you must have a 
> CLIENT network otherwise the App simply won't run (we're a little bit more 
> complicated than a LAMP-stack-in-AWS company). When we've got this coded 
> correctly it should be hardware independent, but, there is this "mandatory 
> data" that we need to supply to get it to build (what interfaces are for 
> what network). These are the "glue records" I keep talking about (to borrow 
> a term from DNS). Ideally this would be zero. We *could* programmatically 
> determine it, but the arguments against are part "effort vs gain" and part 
> "what do you want your source of truth to be". Maybe the best way for us to 
> do it auto-magically would be to query Patch Manager to determine what 
> networking interfaces should be present and what logical networks they 
> attach to. We already Nagios check against it... That's not trivial though, 
> and it also means my Puppet builds rely on an externally hosted SaaS (not 
> going to fly).

I think I have understood now.

> > The switch uses its own source of truth which also influences which 
> > network traffic gets sent down the link, so trusting the switch will 
> > at least fail to the safe side and avoid accidentally putting full 
> > trust on an Internet link. 
> 
> Yeah if that suits your use case, you could do that. For me though, I'd 
> much prefer a Puppet manifest to fail to compile because someone hasn't 
> supplied the correct data. It forces an engineer to think about what they 
> are building, and where it's attached.

Yes, that's of course a valid approach. I was thinking more about
the intern who was sent to the datacenter to re-wire the networking
cables of Server 3847 but instead works on 3874. The LLDP setup would
catch that.

> > >  and, what if the switch port is down?). 
> >
> > One would have to fall back to a certain safety net then. 
> >
> > >  Secondly the quality and consistency of LLDP information you get out 
> > >  of various manufacturers of networking hardware is very different, so 
> > >  relying on LLDP information to define your OS network config is a bit 
> > >  risky for me. 
> >
> > Is it really this bad? I do have experience with HP and Cisco, and 
> > their LLDP/CDP information is usually fine. 
> 
> In my opinion it is, yes.  One of our Network Engineers changed a Dell FX2 
> chassis internal I/O switch between one mode and the other to get MLAG 
> working (these are Dell Force 10 internally) and the structure of the LLDP 
> information changed, and this was simple shit too - the switch description 
> just "disappeared" :-(

Ouch. All software sucks.

> Here's one part of our client side Nagios monitoring, a script that 
> converts the LLDP information into a parse-able CSV. Our Nagios servers 
> query this data via SNMP and compare it to Patch Manager, thereby telling 
> us if something is plugged in to the wrong port. It is sanitised to the 
> "database" example, it looks like this:
> 
> [root@server ~]$ sudo .//interfaces.py 
> p3p2,yes,clientswitch01.example.com,16,456,Arista DCS-7124SX
> em1,yes,storageswitch01.example.com,8/1/20,123,Brocade ICX6450-48
> em2,yes,storageswitch01.example.com,8/1/20,123,Brocade ICX6450-48
> p4p1,yes,clientswitch02.example.com,16,456,Arista DCS-7124SX
> 
> And here's the Python that generates that output. Note the number of if 
> statements in the function parse_switch_type_from_data(), and how I have to 
> fall back on MAC address checks because some models simply don't want to 
> report that they are a "Brocade", etc:
> 
> https://gist.github.com/lukebigum/efb5b789bfeaf962ef15128092015d08
> 
> I haven't read the LLDP standard, but from personal experience I assume it 
> reads something like "Here is a list of optional fields, put whatever you 
> want in them".

Nice, thanks. It is always enlightening to be given a look into other
people's machine rooms. People should do that way more often.
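
For readers who want to play with the comparison step, here is a small
Python sketch that reads CSV lines in the format quoted above and checks
them against an expected patching table. It is not taken from Luke's gist.
The column names in FIELDS are my guess at what the six fields mean, the
function names are invented, and the "expected" dictionary merely stands in
for whatever Patch Manager would return:

```python
import csv
import io

# Assumed meaning of the six columns; the original output has no header.
FIELDS = ["interface", "link", "switch", "port", "vlan", "model"]


def parse_lldp_csv(text):
    """Turn the CSV output into a dict keyed by interface name."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS)
    return {row["interface"]: row for row in reader}


def find_mispatched(observed, expected):
    """Return interfaces whose (switch, port) differs from the expectation."""
    problems = []
    for iface, (switch, port) in expected.items():
        row = observed.get(iface)
        if row is None or (row["switch"], row["port"]) != (switch, port):
            problems.append(iface)
    return sorted(problems)


observed = parse_lldp_csv("""\
p3p2,yes,clientswitch01.example.com,16,456,Arista DCS-7124SX
em1,yes,storageswitch01.example.com,8/1/20,123,Brocade ICX6450-48
""")

# Hypothetical expectation; em1 is deliberately listed on the wrong port.
expected = {
    "p3p2": ("clientswitch01.example.com", "16"),
    "em1": ("storageswitch01.example.com", "8/1/21"),
}

print(find_mispatched(observed, expected))
```

An interface that is missing from the LLDP data altogether is reported as
well, which covers the "switch port is down" case in a crude way.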

> > > It's a different story for our VMs. Since they are Puppet defined we
> > > specify a MAC address and so we "know" which MAC will be attached to which
> > > VM bridge. We drop a MAC based udev rule into the guest to name them
> > > similarly, ie: eth100 is on br100. 
> >
> > How do you puppet define your MAC addresses? Which virtualization do 
> > you use? Can I see a code snippet? 
> 
> KVM. MAC addresses statically defined by a deterministic formula - so if 
> the IP is 1.2.3.4 the MAC address is 52:54:00:02:03:04 - the last 3 IP
> bytes are the same as the last three MAC Hex numbers. This means no MAC 
> address clash :-)

My first reaction to that was "stupid idea. Don't do this". But after
thinking about it, and especially realizing that IPv6 SLAAC basically
does the same thing with the entire MAC address, it's actually
pretty smart.
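
As an illustration only (this is not Luke's code), the formula could be
sketched in Python like this. The 52:54:00 prefix and the example address
are taken from his description; the function name ip_to_mac is made up:

```python
import ipaddress

# Prefix from the description above.
KVM_PREFIX = "52:54:00"


def ip_to_mac(ip, prefix=KVM_PREFIX):
    """Derive a MAC address from the last three bytes of an IPv4 address."""
    octets = ipaddress.IPv4Address(ip).packed  # four raw bytes
    return prefix + "".join(f":{b:02x}" for b in octets[1:])


print(ip_to_mac("1.2.3.4"))
```

With the example from the mail, 1.2.3.4 becomes 52:54:00:02:03:04. Because
the first IP byte is dropped, two addresses that differ only in that byte
map to the same MAC, so the "no clash" property holds only as long as such
addresses never share a layer 2 segment.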

> *****************
> libvirt::vms:
>   database:
>     cpus: '4'
>     ensure: running
>     interfaces:
>     - bridge:br123,52:54:00:01:02:03
>     - bridge:br456,52:54:00:
>     memory: '4096'
>     on_crash: restart
>     on_poweroff: destroy
>     on_reboot: restart
>     virt_disk: path=/var/lib/libvirt/images/ld4deploy01/ld4deploy01.img,size=16,bus=virtio,sparse=false
>     virt_type: kvm
> 
> *****************
> 
> And then we must duplicate the MAC address in the Hiera of the VM itself 
> when creating the networking inside the VM. This is crap, as it's the same 
> MAC address in multiple places, but it's tricky to fix. I might be able to 
> solve it with exported resources... but I'd probably get VM definitions out 
> of Hiera first before I fixed this.

I like it.

> > > That's what we do, but it's made easy by an almost homogeneous hardware 
> > > platform and strict physical patch management. 
> >
> > Yes. The homogenous hardware platform is probably something that can 
> > only be maintained for really large installations. 
> >
> > > When I read about your problem, it sounds like you are missing a "glue 
> > > record" that describes your logical interfaces to your physical devices. 
> >
> > We're desperately trying to avoid having this in Hiera. 
> 
> I can understand that, and it's good you've got that mindset. I'd like to 
> get to the same place eventually. For me, going from 100s of lines of Hiera 
> for a node to < 20 is good enough so far.

It's a matter of style and history. As someone who changes teams often,
I have stopped thinking about stuff like that (those decisions tend to
be taken before I come in) and just adopt what the environment is used
to.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
