Forum: CFEngine Help
Subject: High Availability / High Performance Advanced Automation Cfengine Architecture
Author: msvob...@linkedin.com
Link to topic: https://cfengine.com/forum/read.php?3,26239,26239#msg-26239
At LinkedIn, we wanted our Cfengine automation architecture to scale linearly to support tens of thousands of machines. Our requirements were:
[*] Keep all Cfengine network traffic within the same cage. Do not cross core network routers / infrastructure.
[*] Implement client-side software load balancing across multiple master policy servers.
[*] Be highly available. If a single master policy server goes offline, we should still be able to maintain our automation.
[*] Allow us to co-locate various services (yum servers, range servers, rsync servers, etc.) that could also take advantage of the above.

Since deploying 3.3.1 into production, we have been able to meet these requirements using the new select_class feature. From the reference guide: https://cfengine.com/manuals/cf3-Reference#select_005fclass-in-classes

This feature is somewhat like the splayclass function, but instead of selecting a class for a moment in time, it always chooses one class in the list – the same class each time for a given host. This allows hosts to be distributed across a controlled list of classes, e.g for load balancing purposes. The class is chosen deterministically (not randomly) but it is not possible to say which host will end up in which class in advance – only that hosts will always end up in the same class every time.

So the neat thing about select_class is that clients can make a decision and always stay consistent with that decision, and that clients will automatically balance themselves across all available choices. That last sentence probably didn't make much sense, so let's start looking at some code examples.

First, to understand how we have implemented multiple master policy servers, here is an example of what a possible configuration could look like. If you've been using Cfengine, you understand that clients execute cf-agent against failsafe.cf, and then promises.cf. Failsafe.cf provides the network transfer of data to clients.
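Before the policy itself, the "same class every time, balanced across the list" behavior quoted above can be sketched outside of Cfengine. This is only an illustration: the reference text does not say how Cfengine derives its choice, so the SHA-256 hash of a host name used here, the select_class() Python function, and the example.com host names are all assumptions made for the sketch, not Cfengine's actual implementation.

```python
import hashlib
from collections import Counter
from typing import Sequence


def select_class(host_id: str, choices: Sequence[str]) -> str:
    """Pick one entry of `choices` deterministically from a host identifier.

    Assumption: hashing the host name with SHA-256 stands in for whatever
    stable per-host input Cfengine really uses.
    """
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and map it onto the list.
    index = int.from_bytes(digest[:8], "big") % len(choices)
    return choices[index]


CHOICES = ["mps1_primary", "mps2_primary"]

# Hypothetical fleet of 10,000 clients.
hosts = ["app%05d.example.com" % n for n in range(10000)]
counts = Counter(select_class(h, CHOICES) for h in hosts)

# Each host always lands in the same class, and the fleet as a whole
# splits roughly in half without any central coordination.
print(counts)
```

The point of the sketch is that no host needs to know what any other host chose: a stable per-host input plus a fixed list of choices is enough to get both consistency and an approximately even spread.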
Let's look at failsafe.cf first.

$ cat failsafe.cf
body common control
{
    bundlesequence => { "update" };
    inputs => { "shared_global_environment.cf", "cf-execd.cf", "update.cf" };
}

bundle common global
{
    # Define the environment and site as classes. We will use this in update.cf
    # to determine what point in the SVN tree / filesystem directory hierarchy we should pull site specific policies.
    # failsafe_global.cf is one of them. That means failsafe_global.cf is dependent upon itself for a properly
    # functioning CFEngine infrastructure. That's how critical this is. Don't break it.
    vars:
        "global_nic" string => execresult("/usr/bin/getent hosts `/bin/hostname` | awk '{print $1}'","useshell");
}

body agent control
{
    # Bind to the global NIC on all inbound / outbound network i/o
    bindtointerface => "$(global.global_nic)";

    # We hit this class if we didn't find our SITE/ENV from shared_global_environment.cf. We can't perform network transfers
    # if we don't know where we are and who we should contact. Someone probably forgot to update shared_global_environment with
    # the new VLAN information.
    abortclasses => { "stop_cfengine_execution" };
}

So, our failsafe.cf is actually very simple. We import 3 files that contain the primary logic of how to perform network transfers. cf-execd.cf is just configuration for the cf-execd daemon and isn't that interesting, so let's focus on shared_global_environment.cf and update.cf. The exciting stuff happens in shared_global_environment.cf.

$ cat -n shared_global_environment.cf
  1  bundle common shared_global_environment
  2
  3  # This file is imported in failsafe.cf and promises.cf to set global classes in execution of both policies. If you break this file,
  4  # you break Cfengine everywhere.
  5
  6  {
  7    vars:
  8      PROD::
  9        "environment" string => "PROD";
 10      CORP::
 11        "environment" string => "CORP";
 12      STG::
 13        "environment" string => "STG";
 14      QA::
 15        "environment" string => "QA";
 16
 17      dc2::
 18        "site" string => "dc2";
 19      dc3::
 20        "site" string => "dc3";
 21      dc1::
 22        "site" string => "dc1";
 23      dc5::
 24        "site" string => "dc5";
 25      dc4::
 26        "site" string => "dc4";
 27      dc6::
 28        "site" string => "dc6";
 29
 30  ################################################### dc1 ##################################################################
 31      (dc1|dc5)::
 32        "mps1" string => "dc1-cage1-mps01.corp.cfengine.com",
 33               policy => "free";
 34
 35        "mps2" string => "dc1-cage1-mps02.corp.cfengine.com",
 36               policy => "free";
 37
 38  ################################################### dc1 ##################################################################
 39  ################################################### dc4 ##################################################################
 40      dc4::
 41        "mps1" string => "dc4-cage10-mps01.corp.cfengine.com",
 42               policy => "free";
 43
 44        "mps2" string => "dc4-cage10-mps02.corp.cfengine.com",
 45               policy => "free";
 46  ################################################### dc4 ##################################################################
 47  ################################################### dc6 ##################################################################
 48      dc6::
 49        "mps1" string => "dc6-cage11-mps01.prod.cfengine.com",
 50               policy => "free";
 51
 52        "mps2" string => "dc6-cage11-mps02.prod.cfengine.com",
 53               policy => "free";
 54
 55  ################################################### dc6 ##################################################################
 56  ################################################### dc2 ##################################################################
 57      dc2_cage1::
 58        "mps1" string => "dc2-cage1-mps01.prod.cfengine.com",
 59               policy => "free";
 60      dc2_cage1::
 61        "mps2" string => "dc2-cage1-mps02.prod.cfengine.com",
 62               policy => "free";
 63      dc2_cage2::
 64        "mps1" string => "dc2-cage2-mps01.prod.cfengine.com",
 65               policy => "free";
 66      dc2_cage2::
 67        "mps2" string => "dc2-cage2-mps02.prod.cfengine.com",
 68               policy => "free";
 69      dc2_cage3::
 70        "mps1" string => "dc2-cage1-mps01.prod.cfengine.com",
 71               policy => "free";
 72      dc2_cage3::
 73        "mps2" string => "dc2-cage1-mps02.prod.cfengine.com",
 74               policy => "free";
 75      dc2_cage4::
 76        "mps1" string => "dc2-cage4-mps01.prod.cfengine.com",
 77               policy => "free";
 78      dc2_cage4::
 79        "mps2" string => "dc2-cage4-mps02.prod.cfengine.com",
 80               policy => "free";
 81      dc2_cage5::
 82        "mps1" string => "dc2-cage5-mps01.prod.cfengine.com",
 83               policy => "free";
 84      dc2_cage5::
 85        "mps2" string => "dc2-cage5-mps02.prod.cfengine.com",
 86               policy => "free";
 87  ################################################### dc2 ##################################################################
 88  ################################################### dc3 ##################################################################
 89      dc3_cage6::
 90        "mps1" string => "dc3-cage6-mps01.prod.cfengine.com",
 91               policy => "free";
 92      dc3_cage6::
 93        "mps2" string => "dc3-cage6-mps02.prod.cfengine.com",
 94               policy => "free";
 95      dc3_cage7::
 96        "mps1" string => "dc3-cage7-mps01.prod.cfengine.com",
 97               policy => "free";
 98      dc3_cage7::
 99        "mps2" string => "dc3-cage7-mps02.prod.cfengine.com",
100               policy => "free";
101      dc3_cage8::
102        "mps1" string => "dc3-cage7-mps01.prod.cfengine.com",
103               policy => "free";
104      dc3_cage8::
105        "mps2" string => "dc3-cage7-mps02.prod.cfengine.com",
106               policy => "free";
107      dc3_cage9::
108        "mps1" string => "dc3-cage9-mps01.prod.cfengine.com",
109               policy => "free";
110      dc3_cage9::
111        "mps2" string => "dc3-cage9-mps02.prod.cfengine.com",
112               policy => "free";
113  ################################################### dc3 ##################################################################
114  ################################################### MPS ##################################################################
115  # The master policy servers use select_class to alternate between mps1 and mps2 as well. They're just both set to localhost.
116  # We reference shared_global_environment.mps1/mps2 for range queries, yum repo servers, etc. so this needs to resolve for the MPS.
117      master_policy_servers::
118        "mps1" string => "$(sys.host)",
119               policy => "free";
120
121        "mps2" string => "$(sys.host)",
122               policy => "free";
123  ################################################### MPS ##################################################################
124    classes:
125        "mps_selection" select_class => { "mps1_primary", "mps2_primary" };
126
127        "master_policy_servers" or => {
128            "dc3_cage6_mps01_prod_cfengine_com",
129            "dc3_cage6_mps02_prod_cfengine_com",
130            "dc3_cage7_mps01_prod_cfengine_com",
131            "dc3_cage7_mps02_prod_cfengine_com",
132            "dc3_cage9_mps01_prod_cfengine_com",
133            "dc3_cage9_mps02_prod_cfengine_com",
134            "dc1_cage1_mps01_corp_cfengine_com",
135            "dc1_cage1_mps02_corp_cfengine_com",
136            "dc2_cage1_mps01_prod_cfengine_com",
137            "dc2_cage1_mps02_prod_cfengine_com",
138            "dc2_cage5_mps01_prod_cfengine_com",
139            "dc2_cage5_mps02_prod_cfengine_com",
140            "dc2_cage2_mps01_prod_cfengine_com",
141            "dc2_cage2_mps02_prod_cfengine_com",
142            "dc2_cage4_mps01_prod_cfengine_com",
143            "dc2_cage4_mps02_prod_cfengine_com",
144            "dc4_cage10_mps01_corp_cfengine_com",
145            "dc4_cage10_mps02_corp_cfengine_com",
146            "dc6_cage11_mps01_prod_cfengine_com",
147            "dc6_cage11_mps02_prod_cfengine_com", };
148
149
150  ################################################ dc3 ##########################################################
151        "dc3_cage7" or => {"ipv4_192_20_130", "ipv4_192_20_131", "ipv4_192_20_132", "ipv4_192_20_133",
152            "ipv4_192_20_134", "ipv4_192_20_135", "ipv4_192_20_136",
153            "ipv4_192_20_137", "ipv4_192_20_138", "ipv4_192_20_139", "ipv4_192_20_140",
154            "ipv4_192_20_141", "ipv4_192_20_142", "ipv4_192_20_143", "ipv4_192_20_144",
155            "ipv4_192_20_145", "ipv4_192_20_146", "ipv4_192_20_147", "ipv4_192_20_148",
156            "ipv4_192_20_149", "ipv4_192_20_150", "ipv4_192_20_151", "ipv4_192_20_152",
157            "ipv4_192_20_153", "ipv4_192_20_154", "ipv4_192_20_155", "ipv4_192_20_156", };
158
159        "dc3_cage6" or => {"ipv4_192_20_162", "ipv4_192_20_163", "ipv4_192_20_166", "ipv4_192_20_167",
160            "ipv4_192_20_168", "ipv4_192_20_169", "ipv4_192_20_170", "ipv4_192_20_171",
161            "ipv4_192_20_192", "ipv4_192_20_173", "ipv4_192_20_174", "ipv4_192_20_175",
162            "ipv4_192_20_176", "ipv4_192_20_179", "ipv4_192_20_180", };
163
164        "dc3_cage9" or => {"ipv4_192_20_225", "ipv4_192_20_226", "ipv4_192_20_227", "ipv4_192_20_228",
165            "ipv4_192_20_229", "ipv4_192_20_230", "ipv4_192_20_231", "ipv4_192_20_232",
166            "ipv4_192_20_233", "ipv4_192_20_234", "ipv4_192_20_235", "ipv4_192_20_236",
167            "ipv4_192_20_237", "ipv4_192_20_240", "ipv4_192_20_241", "ipv4_192_20_242",
168            "ipv4_192_20_243", "ipv4_192_20_244", "ipv4_192_20_245", "ipv4_192_20_246",
169            "ipv4_192_20_247", "ipv4_192_20_248", "ipv4_192_20_249", "ipv4_192_20_250",
170            "ipv4_192_20_251", "ipv4_192_20_252", "ipv4_192_20_253", "ipv4_192_20_254", };
171
172        "dc3_cage8" or => {"ipv4_192_20_194", "ipv4_192_20_195", "ipv4_192_20_196", "ipv4_192_20_197",
173            "ipv4_192_20_198", };
174
175        "dc3" or => {"dc3_cage7", "dc3_cage6", "dc3_cage9", "dc3_cage8", "dc3_cage8", };
176  ################################################ dc3 ##########################################################
177  ################################################ dc2 ##########################################################
178        "dc2_cage1" or => {"ipv4_192_20_50", "ipv4_192_20_51", "ipv4_192_20_52", "ipv4_192_20_53",
179            "ipv4_192_20_54", "ipv4_192_20_55", "ipv4_192_20_56", "ipv4_192_20_57",
180            "ipv4_192_20_58", "ipv4_192_20_59", "ipv4_192_20_60", "ipv4_192_20_61",
181            "ipv4_192_20_62", "ipv4_192_20_63", "ipv4_192_20_64", "ipv4_192_20_65",
182            "ipv4_192_20_66", "ipv4_192_20_67", "ipv4_192_20_68", "ipv4_192_20_69",
183            "ipv4_192_20_70", "ipv4_192_20_71", "ipv4_192_20_72", "ipv4_192_20_73",
184            "ipv4_192_20_74", "ipv4_192_20_77", };
185
186        "dc2_cage2" or => {"ipv4_192_20_82", "ipv4_192_20_83", "ipv4_192_20_84", "ipv4_192_20_85",
187            "ipv4_192_20_86", "ipv4_192_20_87", "ipv4_192_20_88", "ipv4_192_20_89",
188            "ipv4_192_20_90", "ipv4_192_20_91", "ipv4_192_20_92", "ipv4_192_20_93",
189            "ipv4_192_20_96", "ipv4_192_20_97", "ipv4_192_20_98", "ipv4_192_20_99",
190            "ipv4_192_20_100", "ipv4_192_20_101", "ipv4_192_20_102", "ipv4_192_20_103",
191            "ipv4_192_20_104", "ipv4_192_20_105", "ipv4_192_20_109", "ipv4_192_20_110", };
192
193        "dc2_cage3" or => {"ipv4_192_20_114", "ipv4_192_20_115", "ipv4_192_20_116", "ipv4_192_20_117",
194            "ipv4_192_20_118", "ipv4_192_20_119", "ipv4_192_20_120", "ipv4_192_20_121",
195            "ipv4_192_20_122", "ipv4_192_20_123", "ipv4_192_20_124", "ipv4_192_20_125", };
196
197        "dc2_cage4" or => {"ipv4_192_20_18", "ipv4_192_20_18", "ipv4_192_20_19", "ipv4_192_20_20",
198            "ipv4_192_20_21", "ipv4_192_20_22", "ipv4_192_20_23", "ipv4_192_20_24",
199            "ipv4_192_20_25", "ipv4_192_20_26", "ipv4_192_20_27", "ipv4_192_20_28",
200            "ipv4_192_20_29", "ipv4_192_20_30", "ipv4_192_20_33", "ipv4_192_20_34",
201            "ipv4_192_20_35", "ipv4_192_20_36", "ipv4_192_20_37", "ipv4_192_20_38",
202            "ipv4_192_20_39", "ipv4_192_20_40", "ipv4_192_20_41", "ipv4_192_20_45", };
203
204        "dc2_cage5" or => {"ipv4_192_24_128", "ipv4_192_24_129", "ipv4_192_24_130", "ipv4_192_24_131",
205            "ipv4_192_24_132", "ipv4_192_24_133", "ipv4_192_24_134", "ipv4_192_24_135",
206            "ipv4_192_24_136", "ipv4_192_24_137", "ipv4_192_24_138", "ipv4_192_24_139",
207            "ipv4_192_24_140", "ipv4_192_24_141", "ipv4_192_24_142", "ipv4_192_24_143",
208            "ipv4_192_24_144", "ipv4_192_24_145", "ipv4_192_24_146", "ipv4_192_24_147",
209            "ipv4_192_24_148", "ipv4_192_24_149", "ipv4_192_24_150", "ipv4_192_24_151",
210            "ipv4_192_24_152", "ipv4_192_24_153", "ipv4_192_24_154", "ipv4_192_24_155",
211            "ipv4_192_24_156", "ipv4_192_24_157", "ipv4_192_24_158", "ipv4_192_24_159",
212            "ipv4_192_24_160", "ipv4_192_24_161", "ipv4_192_24_162", "ipv4_192_24_163",
213            "ipv4_192_24_164", "ipv4_192_24_165", "ipv4_192_24_166", "ipv4_192_24_167",
214            "ipv4_192_24_168", "ipv4_192_24_169", "ipv4_192_24_170", "ipv4_192_24_171",
215            "ipv4_192_24_192", "ipv4_192_24_173", "ipv4_192_24_174", "ipv4_192_24_175",
216            "ipv4_192_24_176", "ipv4_192_24_177", "ipv4_192_24_178", "ipv4_192_24_179",
217            "ipv4_192_24_180", "ipv4_192_24_181", "ipv4_192_24_182", "ipv4_192_24_183",
218            "ipv4_192_24_184", "ipv4_192_24_185", "ipv4_192_24_186", "ipv4_192_24_187",
219            "ipv4_192_24_188", "ipv4_192_24_189", "ipv4_192_24_190", "ipv4_192_24_191", };
220
221        "dc2" or => {"dc2_cage1", "dc2_cage2", "dc2_cage3", "dc2_cage4", "dc2_cage5", };
222  ################################################ dc2 ##########################################################
223  ################################################ dc1 ##########################################################
224        "dc1_corp_cfengine_com" or => {"ipv4_192_21_16", "ipv4_192_21_17", "ipv4_192_21_18", "ipv4_192_21_19",
225            "ipv4_192_21_20",
226            "ipv4_192_21_24", "ipv4_192_21_25", "ipv4_192_21_26", "ipv4_192_21_27",
227            "ipv4_192_21_28", "ipv4_192_21_29", "ipv4_192_21_30", "ipv4_192_21_31",
228            "ipv4_192_21_32", "ipv4_192_21_33", "ipv4_192_21_34", "ipv4_192_21_35",
229            "ipv4_192_21_36", "ipv4_192_21_37", "ipv4_192_21_38", "ipv4_192_21_39",
230            "ipv4_192_21_40", "ipv4_192_21_41", "ipv4_192_21_42", "ipv4_192_21_45",};
231
232        "dc1_stg_cfengine_com" or => {"ipv4_192_16_64", "ipv4_192_16_65", "ipv4_192_16_66", "ipv4_192_16_67",
233            "ipv4_192_16_68", "ipv4_192_16_69",
234            "ipv4_192_16_72", "ipv4_192_16_73", "ipv4_192_16_74", "ipv4_192_16_75",
235            "ipv4_192_16_76", "ipv4_192_16_77", "ipv4_192_16_78", "ipv4_192_16_79",
236            "ipv4_192_21_98", "ipv4_192_16_70", "ipv4_192_16_71", };
237
238        "dc1" or => {"dc1_corp_cfengine_com", "dc1_stg_cfengine_com", };
239  ################################################ dc1 ##########################################################
240  ################################################ dc5 ##########################################################
241        "dc5_corp_cfengine_com" or => {"ipv4_192_21_48", "ipv4_192_21_49", "ipv4_192_21_50", "ipv4_192_21_51",
242            "ipv4_192_21_52", "ipv4_192_21_53", "ipv4_192_21_54", "ipv4_192_21_55",
243            "ipv4_192_21_56", "ipv4_192_21_57", "ipv4_192_21_58", "ipv4_192_21_59",
244            "ipv4_192_21_60", "ipv4_192_21_61", "ipv4_192_21_62", "ipv4_192_21_63", };
245
246        "dc5" or => {"dc5_corp_cfengine_com", };
247  ################################################ dc5 ##########################################################
248  ################################################ dc6 ##########################################################
249        "dc6_corp_cfengine_com" or => {"ipv4_192_21_81", "ipv4_192_21_82", "ipv4_192_21_83", "ipv4_192_21_86",
250            "ipv4_192_21_87", };
251
252        "dc6_prod_cfengine_com" or => {"ipv4_192_20_210", "ipv4_192_20_211", "ipv4_192_20_214", "ipv4_192_20_215",
253            "ipv4_192_20_216", "ipv4_192_20_217", "ipv4_192_20_218", "ipv4_192_20_219",
254            "ipv4_192_20_220", "ipv4_192_20_221", };
255
256        "dc6" or => {"dc6_corp_cfengine_com", "dc6_prod_cfengine_com" };
257  ################################################ dc6 ##########################################################
258  ################################################ dc4 ##########################################################
259        "dc4_corp_cfengine_com" or => {"ipv4_192_21_128", "ipv4_192_21_129", "ipv4_192_21_130", "ipv4_192_21_131",
260            "ipv4_192_21_132", "ipv4_192_21_133", "ipv4_192_21_134", "ipv4_192_21_135",
261            "ipv4_192_21_136", "ipv4_192_21_137", "ipv4_192_21_138", "ipv4_192_21_141",
262            "ipv4_192_21_142", "ipv4_192_21_143", "ipv4_192_21_158", "ipv4_192_21_159",
263            "ipv4_192_21_191", };
264
265        "dc4_stg_cfengine_com" or => {"ipv4_192_21_160", "ipv4_192_21_161", "ipv4_192_21_162", "ipv4_192_21_163",
266            "ipv4_192_21_164", "ipv4_192_21_165", "ipv4_192_21_166", "ipv4_192_21_167",
267            "ipv4_192_21_168", "ipv4_192_21_169", "ipv4_192_21_170", "ipv4_192_21_171",
268            "ipv4_192_21_192", "ipv4_192_21_173", "ipv4_192_21_174", "ipv4_192_21_175",
269            "ipv4_192_21_176", "ipv4_192_21_177", "ipv4_192_21_178", "ipv4_192_21_179",
270            "ipv4_192_21_180", "ipv4_192_21_181", "ipv4_192_21_182", "ipv4_192_21_183",
271            "ipv4_192_21_184", "ipv4_192_21_185", "ipv4_192_21_186", "ipv4_192_21_187",
272            "ipv4_192_21_188", "ipv4_192_21_189", "ipv4_192_21_190", };
273
274        "dc4_qa_cfengine_com" or => {"ipv4_192_21_144", "ipv4_192_21_145", "ipv4_192_21_146", "ipv4_192_21_147",
275            "ipv4_192_21_148", "ipv4_192_21_149", "ipv4_192_21_150", "ipv4_192_21_151", };
276
277        "dc4" or => {"dc4_corp_cfengine_com", "dc4_stg_cfengine_com", "dc4_qa_cfengine_com", };
278  ################################################ dc4 ##########################################################
279
280  ############################################## SITE/ENV ########################################################
281  # Behaves as the XOR operation on class expressions. It can be used to define a class if exactly one of the class
282  # expressions on the RHS matches. We only want a single SITE/ENV defined so we use xor instead of or to guarantee only 1 class
283  # is active at any time. With multiple SITE/ENV classes defined, we don't know what/where to transfer data from.
284
285        "CORP" xor => {"dc4_corp_cfengine_com", "dc5_corp_cfengine_com", "dc1_corp_cfengine_com", "dc6_corp_cfengine_com", };
286        "STG" xor => {"dc4_stg_cfengine_com", "dc1_stg_cfengine_com", };
287        "QA" xor => {"dc4_qa_cfengine_com", };
288        "PROD" xor => {"dc2", "dc3", "dc6_prod_cfengine_com", };
289
290        "SINGLE_SITE_DEFINED" xor => {"dc4", "dc5", "dc1", "dc2", "dc3", "dc6", };
291        "SINGLE_ENV_DEFINED" xor => {"CORP", "FIN", "STG", "QA", "PROD" };
292        "SITE_ENV_DEFINED" and => {"SINGLE_SITE_DEFINED", "SINGLE_ENV_DEFINED"};
293        "ABORT_CFENGINE_EXECUTION" not => "SITE_ENV_DEFINED";
294  ############################################## SITE/ENV ########################################################
295  }

[*] line 1: we enter bundle common, which means that the classes / variables defined here are global in scope.
[*] lines 2-30 are just a method of converting a class into a variable.
[*] lines 30-113 define the master policy servers.
[*] lines 114-123: we have to make a special case for the master policy servers themselves. Since we run our master policy servers just like clients, we need to be able to resolve $(shared_global_environment.mps1) and $(shared_global_environment.mps2) from the master policy servers as well. Here, we just set them to resolve to the local host.
[*] line 125 uses select_class to determine if the client should fall under the mps1_primary or mps2_primary class. The whole logic of this policy example relies on this line. Clients will evenly balance between the two classes.
[*] lines 127 - 150 create the master_policy_servers class by hostname.
[*] lines 150 - 278 create the datacenter / cage classes by VLANs. We use these classes to decide which master policy servers to use in lines 30-113. This is how we contain Cfengine traffic in the same cage / datacenter.
[*] lines 280 - 295 create the PROD/CORP/STG/etc. environment classes that we use to determine the machine's function.
The machine must exist in only a single environment, as we use this environment to transfer data in update.cf. If a machine belongs to multiple environments (or to none), we raise the ABORT_CFENGINE_EXECUTION class, which causes the client to halt, since network transfers / policy logic would break.

So above, we have defined a TON of classes and the logic of an environment. Finally, let's actually perform the network transfers using the above data.

$ cat -n update.cf
  1  bundle agent update
  2  {
  3    vars:
  4        "master_modules" string => "/var/cfengine/masterfiles/cf-agent_modules";
  5        "generic_policy_location" string => "/var/cfengine/masterfiles/generic_cf-agent_policies";
  6
  7      PROD.dc1::
  8        "site_specific_policy_location" string => "/var/cfengine/masterfiles/PROD/dc1/cf-agent";
  9      PROD.dc2::
 10        "site_specific_policy_location" string => "/var/cfengine/masterfiles/PROD/dc2/cf-agent";
 11      PROD.dc3::
 12        "site_specific_policy_location" string => "/var/cfengine/masterfiles/PROD/dc3/cf-agent";
 13      CORP.dc4::
 14        "site_specific_policy_location" string => "/var/cfengine/masterfiles/CORP/dc4/cf-agent";
 15      CORP.dc5::
 16        "site_specific_policy_location" string => "/var/cfengine/masterfiles/CORP/dc5/cf-agent";
 17      CORP.dc6::
 18        "site_specific_policy_location" string => "/var/cfengine/masterfiles/CORP/dc6/cf-agent";
 19      CORP.dc3::
 20        "site_specific_policy_location" string => "/var/cfengine/masterfiles/CORP/dc3/cf-agent";
 21      STG.dc4::
 22        "site_specific_policy_location" string => "/var/cfengine/masterfiles/STG/dc4/cf-agent";
 23      STG.dc5::
 24        "site_specific_policy_location" string => "/var/cfengine/masterfiles/STG/dc5/cf-agent";
 25      STG.dc6::
 26        "site_specific_policy_location" string => "/var/cfengine/masterfiles/STG/dc6/cf-agent";
 27      QA.dc6::
 28        "site_specific_policy_location" string => "/var/cfengine/masterfiles/QA/dc6/cf-agent";
 29
 30    classes:
 31  # If we don't have a SITE/ENV defined in shared_global_environment, then we must abort execution. We don't know what to transfer
 32  # into the client's /var/cfengine/inputs_site_specific, so, bail. promises.cf would miss a bunch of stuff too. We probably added
 33  # a VLAN, but we didn't update shared_global_environment.cf with that mapping.
 34        "stop_cfengine_execution" or => {"ABORT_CFENGINE_EXECUTION", };
 35
 36    files:
 37  ########################################### /var/cfengine/inputs ################################################
 38      !master_policy_servers::
 39        "/var/cfengine/inputs"
 40            handle => "multi_mps_update_general_policies_for_cf_agent",
 41            copy_from => multiple_remote_copy("$(generic_policy_location)","$(shared_global_environment.mps1)", "$(shared_global_environment.mps2)"),
 42            depth_search => recurse(inf),
 43            action => immediate;
 44
 45      master_policy_servers::
 46        "/var/cfengine/inputs"
 47            handle => "update_general_policies_for_mps",
 48            copy_from => mycopy("$(generic_policy_location)"),
 49            depth_search => recurse(inf),
 50            action => immediate;
 51  ########################################### /var/cfengine/inputs ################################################
 52  ########################################### /var/cfengine/modules ###############################################
 53      !master_policy_servers::
 54        "/var/cfengine/modules"
 55            handle => "multi_mps_update_modules_for_cf_agent",
 56            perms => mog("0700","root","root"),
 57            copy_from => multiple_remote_copy("$(master_modules)","$(shared_global_environment.mps1)", "$(shared_global_environment.mps2)"),
 58            depth_search => recurse(inf),
 59            action => immediate;
 60
 61      master_policy_servers::
 62        "/var/cfengine/modules"
 63            handle => "update_modules_for_mps",
 64            perms => mog("0700","root","root"),
 65            copy_from => mycopy("$(master_modules)"),
 66            depth_search => recurse("inf"),
 67            action => immediate;
 68  ########################################### /var/cfengine/modules ###############################################
 69  ##################################### /var/cfengine/inputs_site_specific ########################################
 70      !master_policy_servers::
 71        "/var/cfengine/inputs_site_specific"
 72            handle => "multi_mps_site_specific_policies_for_cf_agent",
 73            copy_from => multiple_remote_copy("$(site_specific_policy_location)","$(shared_global_environment.mps1)", "$(shared_global_environment.mps2)"),
 74            depth_search => recurse(inf),
 75            action => immediate;
 76
 77      master_policy_servers::
 78        "/var/cfengine/inputs_site_specific"
 79            handle => "update_site_specific_polices_for_mps",
 80            copy_from => mycopy("$(site_specific_policy_location)"),
 81            depth_search => recurse(inf),
 82            action => immediate;
 83  ##################################### /var/cfengine/inputs_site_specific ########################################
 84
 85  ############################################### Filesystem ######################################################
 86  # Update the binaries from the local sbin directory
 87      linux|sunos_5_10::
 88        "/var/cfengine/bin"
 89            handle => "update_binaries_for_global_use",
 90            perms => mog("0700","root","root"),
 91            copy_from => mycopy("/var/cfengine/sbin"),
 92            depth_search => recurse("1"),
 93            action => immediate;
 94
 95  # Keep the root CFEngine directory 0700/root/root so nobody can peek in here but root.
 96      linux|sunos_5_10::
 97        "/var/cfengine/."
 98            handle => "cfengine_root_dir_perms",
 99            perms => mog("0700","root","root");
100
101      !master_policy_servers.mps1_primary::
102        "/etc/cm.conf"
103            handle => "add_multi_mps1_primary_to_cm_conf",
104            create => "true",
105            perms => mog("0644","root","root"),
106            edit_line => append_if_no_line("PRIMARY_MPS:$(shared_global_environment.mps1)");
107
108      !master_policy_servers.mps1_primary::
109        "/etc/cm.conf"
110            handle => "add_multi_mps2_secondary_to_cm_conf",
111            create => "true",
112            perms => mog("0644","root","root"),
113            edit_line => append_if_no_line("SECONDARY_MPS:$(shared_global_environment.mps2)");
114
115      !master_policy_servers.mps2_primary::
116        "/etc/cm.conf"
117            handle => "add_multi_mps2_primary_to_cm_conf",
118            create => "true",
119            perms => mog("0644","root","root"),
120            edit_line => append_if_no_line("PRIMARY_MPS:$(shared_global_environment.mps2)");
121
122      !master_policy_servers.mps2_primary::
123        "/etc/cm.conf"
124            handle => "add_multi_mps1_secondary_to_cm_conf",
125            create => "true",
126            perms => mog("0644","root","root"),
127            edit_line => append_if_no_line("SECONDARY_MPS:$(shared_global_environment.mps1)");
128
129      master_policy_servers::
130        "/etc/cm.conf"
131            handle => "add_localhost_to_cm_conf",
132            create => "true",
133            perms => mog("0644","root","root"),
134            edit_line => append_if_no_line("MASTER_POLICY_SERVER:$(sys.host)");
135
136  # Insert our site/env into cm.conf since we remove module_site_env and rely on shared_global_environment.cf now.
137      linux|sunos_5_10::
138        "/etc/cm.conf"
139            handle => "add_env_site_to_cm_conf",
140            edit_line => append_if_no_line("ENV_SITE:$(shared_global_environment.environment)@$(shared_global_environment.site)");
141  }
142  #########################################################
143  body perms mog(m,o,g)
144  {
145      mode => "$(m)";
146      owners => { "$(o)" };
147      groups => { "$(g)" };
148  }
149  #########################################################
150  body depth_search recurse(d)
151  {
152      depth => "$(d)";
153      exclude_dirs => { "\.svn" };
154
155  }
156  #########################################################
157  body copy_from mycopy(from)
158  {
159      source => "$(from)";
160      compare => "digest";
161      # we must keep purge=false or we will purge cf-serverd as it's handled in a separate promise.
162      purge => "false";
163  }
164  #########################################################
165  body copy_from multiple_remote_copy(sourcedir,mps1,mps2)
166  {
167      source => "$(sourcedir)";
168      copy_backup => "true";
169      purge => "true";
170      trustkey => "true";
171      compare => "digest";
172      encrypt => "true";
173      verify => "true";
174
175    mps1_primary::
176      servers => {"$(mps1)", "$(mps2)"};
177
178    mps2_primary::
179      servers => {"$(mps2)", "$(mps1)"};
180  }
181  #########################################################
182  body action immediate
183  {
184      ifelapsed => "1";
185  }
186  #########################################################
187  bundle edit_line append_if_no_line(str)
188  {
189    insert_lines:
190
191      "$(str)";
192  }

[*] lines 3-30 define where we are going to pull site-specific configuration files from on the master policy servers.
[*] line 34 aborts Cfengine client execution if we have defined multiple sites / environments (or none). stop_cfengine_execution is the class we listed under abortclasses in failsafe.cf.
[*] lines 37 - 83 perform the actual network transfers. Note our copy_from promise.
We pull in the variables $(shared_global_environment.mps1) and $(shared_global_environment.mps2):
    copy_from => multiple_remote_copy("$(site_specific_policy_location)","$(shared_global_environment.mps1)", "$(shared_global_environment.mps2)"),
[*] lines 165 - 180 define the multiple_remote_copy copy_from body. It takes two master policy servers as arguments, and orders them depending on the mps1_primary or mps2_primary class we defined in shared_global_environment.cf on line 125. If the first master policy server in the list is offline, the client contacts the standby / backup master policy server next. This gives us high availability.
[*] lines 86 - 141 perform verification on the local filesystem. The only thing of note here is that we add our PRIMARY_MPS and SECONDARY_MPS to a file, /etc/cm.conf, so humans can easily determine which master policy server this client will hit first.

So, the above configuration has given us the ability to add two master policy servers per cage / datacenter / network block. If we wanted to add more master policy servers, we would just extend the above logic. Theoretically, if you had a datacenter with 50,000 machines, you could stand up 10x master policy servers (or more) and the above architecture would give you instant performance and high availability.

Finally, we want to extend this configuration to perform other actions in policy and still get high availability / high performance. So, let's look at some policies.

In promises.cf, we just need to add shared_global_environment.cf to our inputs statement. This means that shared_global_environment.cf defines global classes / variables in BOTH scopes (failsafe.cf and promises.cf execution).

$ more promises.cf
body common control
{
    inputs => {"cfengine_stdlib.cf", "shared_global_environment.cf", .....

And if we wanted to set global classes using Range as described in this post?
https://cfengine.com/forum/read.php?3,24968

    agent.!range_classes_defined.!sparc.mps1_primary::
        "range_classes_defined" expression => usemodule("module_probe_range_classes.py -p $(shared_global_environment.mps1) -s $(shared_global_environment.mps2)","");
    agent.!range_classes_defined.mps2_primary::
        "range_classes_defined" expression => usemodule("module_probe_range_classes.py -p $(shared_global_environment.mps2) -s $(shared_global_environment.mps1)","");

So, the above two usemodule statements alternate between $(shared_global_environment.mps1) and $(shared_global_environment.mps2) as the primary machine to query, based on the mps1_primary or mps2_primary classes. The script accepts a primary and a secondary machine to query, much as update.cf does for network transfers.

Want another example? How about setting YUM repo servers?

    linux.mps1_primary::
        "reposerver" string => "$(shared_global_environment.mps1)";
    linux.mps2_primary::
        "reposerver" string => "$(shared_global_environment.mps2)";

I think you get the idea at this point... The select_class statement that we've made in shared_global_environment.cf can be used anywhere in any policy to instantly give us high availability / load balancing using two simple class and variable statements. Cfengine is extremely scalable software, and this architecture will allow us to scale indefinitely.

Cheers
Mike