Forum: CFEngine Help
Subject: Integrate Cfengine in your environment / The Cfengine challenge throw 
down
Author: msvob...@linkedin.com
Link to topic: https://cfengine.com/forum/read.php?3,24968,24968#msg-24968

LinkedIn had the pleasure of hosting Mark Burgess last month at the BayLISA 
meeting at our headquarters in Mountain View.  At the end of Mark's 
presentation, I asked a question about what was the best / most reliable way of 
pulling in "external information" outside of your Cfengine SVN/CVS/Git 
repository in a safe way.

Every advanced Cfengine administrator has run into this issue in one way or 
another: you have to pull in an external datasource that lives outside of your 
source code repository and deliver it to clients.  How can you do this 
reliably?  How can we keep with Cfengine's methodology that if the network 
link is busted, no damage can happen to the client?

So, the use case that we have in this example is a technology that Yahoo! open 
sourced called range:
https://github.com/yahoo/range

Range is a very simple HTTP lookup service that keeps metadata about machines.  
The server side is just a mod_range.so Apache module that you stick into 
httpd.conf.  If you execute a range query for a host, you get back all sorts 
of metadata about it.  This is very useful, because it allows machines to be 
categorized into groups (What machines are running as webservers?  What JVMs 
are running on machineXYZ?)

We can already define global classes in Cfengine in promises.cf or local 
classes in any other policy -- but that's not the point.  The business should 
define what systems belong in which classes.  The Cfengine administrator should 
build policy.  Once the Cfengine administrator is left manually defining 
classes within policy, you become a bottleneck.  What if you could write a 
policy and use a class within that policy -- but the business could control 
which machines were associated with that class external to Cfengine?  
You end up with an extremely powerful configuration management system.  You 
aren't the bottleneck, and you give your business a whole new flexibility.  
The business can deploy an application to a machine, and Cfengine automatically 
responds in its policy because new classes have been set on the client 
automatically.  Other business units can create classes of machines, and file 
a ticket for you to implement XYZ against the class they created.  You are 
allowing your business to work the way it wants to.

So, range is an excellent example of an external datasource in our environment 
that I need to set classes upon.  This could be anything in your environment -- 
Zookeeper, Ning's Galaxy, or whatever else you use to manage applications on 
your systems.  If you need to know what application / service is defined on 
specific machines, this information does not reside in your Cfengine 
repository.  You have to go out and grab it.  This leaves us with some problems:

1.  What if the client is off the network?
2.  What if the external source is unavailable?  Load balancer down / 
intermediate network issues / etc.
3.  I need to time out my request.  My module can't hang forever, or cf-agent / 
cf-promises will just hang.
4.  The classes I define have to be canonified.
5.  I need persistence, which doesn't have a defined time.  It could be hours / 
days / months before this machine gets communication restored to the range 
servers.
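
For problem 4, here is a minimal sketch of the canonifying step on its own.  
It is not the code from the script later in this post: it uses Python 3 syntax 
(the script targets Python 2.6), the names canonify and is_canonical are made 
up for illustration, and it replaces every character outside letters, digits, 
and underscore, which is broader than only swapping periods and dashes.

```python
import re

# Anything outside [A-Za-z0-9_] is treated as unsafe in a class name
# (assumption for this sketch; the script in this post only swaps '.' and '-').
_NON_CLASS_CHARS = re.compile(r'[^A-Za-z0-9_]')


def canonify(name, prefix=""):
    """Return prefix + name with every unsafe character replaced by '_'."""
    return prefix + _NON_CLASS_CHARS.sub('_', name)


def is_canonical(name):
    """True if name is non-empty and contains only letters, digits, or '_'."""
    return bool(name) and _NON_CLASS_CHARS.search(name) is None


if __name__ == "__main__":
    # Prints: range_clusters_alpha_agent_1
    print(canonify("alpha.agent-1", "range_clusters_"))
```

Checking the result with is_canonical before printing a "+" line keeps a 
surprising value in the external datasource from turning into a broken class.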


Here's my solution to this problem.  I think it would be cool if Cfengine could 
implement something like this in the product itself, but that's what is 
awesome about the usemodule function.  If you can define what you want to do 
programmatically, you can create whatever client-side behavior you want.  In 
short, here's how it works:

1.  I define my FQDN, and perform a lookup against it for all "range clusters".  
A range cluster is basically just a group that this machine belongs to.
2.  I perform 2 additional lookups for any running applications on this machine.  
These are called "tag" lookups.  You could create a tag for anything.  
CONTAINER and SERVICE are just the tags we used.
3.  Canonify all of the above to happy Cfengine classes.
4.  Dump the data to JSON format on the filesystem.
5.  If I can't reach the range servers, or my request times out, read the JSON 
file on the filesystem and raise a class that we'll report on.  This class 
basically means that I read from disk instead of performing a live query.  This 
is class persistence for an unlimited amount of time.  Hopefully administrators 
are watching the reports and someone figures out WTF is going on with the 
communication problems between the client and range servers.
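
Steps 4 and 5 boil down to "write the cache on success, read it on failure".  
Here is a stripped-down sketch of just that pattern, under some assumptions: 
Python 3 syntax, a flat list of class names instead of the nested arrays the 
real script stores, and a made-up function name classes_with_fallback that 
takes any callable as the live query.  It also writes through a temporary 
file and a rename, which the script in this post does not do.

```python
import json
import os
import tempfile


def classes_with_fallback(fetch, cache_path, stale_class="invalid_range_data"):
    """Return class names from fetch(); if fetch() fails, return the cached
    names with stale_class in front so the fallback shows up in reports."""
    try:
        classes = list(fetch())
    except Exception:
        # Live query failed (network down, timeout, ...): fall back to disk.
        try:
            with open(cache_path) as fh:
                cached = json.load(fh)
        except (OSError, ValueError):
            cached = []  # no usable cache either; only the marker class is set
        return [stale_class] + cached

    # Live query worked: refresh the cache.  Writing to a temp file and then
    # renaming means a crash mid-write never leaves a half-written cache.
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(classes, fh, indent=2)
    os.replace(tmp_path, cache_path)
    return classes
```

A module would then print "+" in front of each returned name.  The cache has 
no expiry on purpose: the classes stay set for as long as the datasource is 
unreachable, and the marker class is what tells you it is happening.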

So, this satisfies all of the above.  Here's my code:


# cat -n module_define_range_classes.py 
     1  #!/usr/bin/python2.6
     6  
     7  import os
     8  import sys
     9  import signal
    10  import json
    11  import time
    12  import platform
    13  import subprocess
    14  import re
    15  from optparse import OptionParser
    16  import site
    17  site.addsitedir('/usr/local/linkedin/lib/python2.6/site-packages')
    18  import seco.range
    19  
    20  class timeout_exception(Exception):
    21    pass
    22  
    23  ####################################################################################################################
    24  def process_range_data(range_data_array, range_string):
    25    flush = 0
    26    # We need to canonify the strings so that we can set Cfengine classes on these.  We could probably also set global variables
    27    # but right now we just need classes set.  First, replace every period and dash with an underscore.  Then confirm that every
    28    # character in the string is alphanumeric or an underscore.
    29    # Include any possible bad characters below, which will be replaced.
    30    p = re.compile(r'[-.]')
    31    holder = ""
    32  
    33    # Append the arrays, regardless if they are empty or not since we are requiring the mapping of arrays to indexes
    34    # If we make a successful query, then set a flag which will cause us to return a true value.  this determines if we should
    35    # flush our current results out to range_classes.conf or if we should still read from what is on disk.
    36    temp_array = []
    37    if range_data_array:
    38      for item in range_data_array:
    39        # range_string identifies if this is a range cluster, container, or service.
    40        holder = range_string + p.sub('_',item)
    41        # Make sure our item is now alphanumeric after we substituted underscores for the periods and dashes
    42        if re.match(r'^[A-Za-z0-9_]+$',holder):
    43          temp_array.append(holder)
    44          # Set the global class by printing a + sign with the name that we've verified is canonical
    45          print "+" + holder
    46          flush = 1
    47        else:
    48          print "+invalid_range_data"
    49      range_classes.append(temp_array)
    50  
    51    if flush:
    52      return 1
    53  ####################################################################################################################
    54  def execute_range_query():
    55    flush = 0
    56    range_clusters = []
    57    range_containers = []
    58    range_services = []
    59  
    60    def timeout_handler(signum, frame):
    61      raise timeout_exception()
    62  
    63    old_handler = signal.signal(signal.SIGALRM, timeout_handler) 
    64    # set a 5 second alarm
    65    signal.alarm(5) 
    66    
    67    try:
    68      # Grab all range clusters.  This is the expensive query to run on the range servers, because they have to search all of their
    69      # maps for the clusters specific for this host (bottom up search.)  Once we have the clusters, we perform "tag" lookups which
    70      # is a cheap operation on the range infrastructure.
    71      range_object = seco.range.Range(options.url)
    72      try:
    73        range_clusters = range_object.expand('?' + fqdn)
    74      except seco.range.RangeException, e:
    75        if "NO_CLUSTER" in str(e)\
    76          or "NOCLUSTER" in str(e):
    77            print "+no_range_clusters"
    78  
    79      # We define range_clusters, range_containers, and range_services as local arrays because we append to a two dimensional array,
    80      # range_classes, which is global in scope in process_range_data.
    81      if range_clusters:
    82        for cluster in range_clusters:
    83          try:
    84            for container in range_object.expand('%{'+ cluster +'}:CONTAINER'):
    85              range_containers.append(container)
    86          except seco.range.RangeException, e:
    87            if "NO_CLUSTER" in str(e)\
    88              or "NOCLUSTER" in str(e):
    89                print "+no_range_containers"
    90          
    91          try:
    92            for service in range_object.expand('%{'+ cluster +'}:SERVICE'):
    93              range_services.append(service)
    94          except seco.range.RangeException, e:
    95            if "NO_CLUSTER" in str(e)\
    96              or "NOCLUSTER" in str(e):
    97                print "+no_range_services"
    98  
    99        # Now that we have the arrays which contain all of the clusters, containers, and services, process through them to make them
   100        # Cfengine-happy strings.  Flush the data out if we see any modifications.
   101        if process_range_data(range_clusters, "range_clusters_"):
   102          flush = 1
   103        if process_range_data(range_containers, "range_containers_"):
   104          flush = 1
   105        if process_range_data(range_services, "range_services_"):
   106          flush = 1
   107  
   108    except timeout_exception:
   109      print "+invalid_range_data"
   110    finally:
   111      signal.signal(signal.SIGALRM, old_handler)
   112      signal.alarm(0)
   113   
   114    if flush:
   115      return 1
   116    else:
   117      return 0
   118  ####################################################################################################################
   119  def print_previous_results():
   120    try:
   121      temp_array = previous_range_classes.pop()
   122      if temp_array:
   123        while temp_array:
   124          for item in temp_array:
   125            print "+" + item
   126          temp_array = previous_range_classes.pop()
   127    except IndexError:
   128      pass
   129  ####################################################################################################################
   130  if __name__ == '__main__':
   131    """
   132    Query the range servers and set global classes within Cfengine based upon their output.  When complete, dump data into a JSON file.
   133    If for whatever reason we can't query the range servers, then read the range class data from this file instead of querying the range
   134    servers directly.  This allows for "persistent classes".
   142    """
   143    parser = OptionParser(usage ="usage: %prog ",
   144      version ="%prog 1.0") 
   145    parser.add_option("-v", "--verbose",
   146      action = "store_true",
   147      dest = "verbose",
   148      default = False,
   149      help = "Enable verbose execution")
   150    parser.add_option("-u", "--url",
   151      action = "store",
   152      dest = "url",
   153      help = "Which URL to query against?  PROD/STG load balancers.  REQUIRED")
   154    parser.add_option("-f", "--file",
   155      action = "store",
   156      dest = "file",
   157      help = "Which file should we read / write range classes to for persistence? (In case the range servers are not answering)")
   158  
   159    (options, args) = parser.parse_args()
   160  
   161    if options.url is None:
   162      print "A URL is required to execute this script.  Exiting."
   163      sys.exit(1)
   164  
   165    if options.file is None:
   166      options.file = "/etc/range_classes.conf"
   167  
   168    previous_range_classes = []
   169    range_classes = []
   170    file_created = 0
   171    fqdn = ""
   172  
   173    # The below statement sets a global class that we use to key off of in promises.cf so only one execution of the script occurs
   174    # per Cfengine execution.  Otherwise, we'd hit this script like 5 times and overload the range servers.
   175    print "+module_define_range_classes_executed"
   176  
   177    if "linkedin.com" not in platform.node():
   178      fqdn = platform.node() + ".linkedin.com"
   179    else:
   180      fqdn = platform.node()
   181  
   182    try:
   183      if os.path.exists(options.file):
   184        with open(options.file) as fh:
   185          previous_range_classes = json.load(fh)
   186      else:
   187        file_created = 1
   188    except:
   189      file_created = 1
   190  
   191    # we return a 1 "flush" above if we should flush our results out to disk.  If not, then just set classes based off of
   192    # what we found in range_classes.conf (our way of making persistent classes if the range servers don't respond)
   193    if execute_range_query() or file_created:
   194      try:
   195        with open(options.file, mode="w") as fh:
   196          json.dump(range_classes, fh, sort_keys=True, indent=2)
   197      except:
   198        print "We tried to dump data to the JSON files, but for whatever reason, we couldn't.  Sorry"
   199    else:
   200      # We didn't successfully poll the range servers, so, loop through and print our previous range classes found in range_classes.conf
   201      # There is no need to dump data back into a JSON file since we didn't read anything new.
   202      print_previous_results()





So, when we execute this script, this is what we get...


$ /var/cfengine/modules/module_define_range_classes.py -u range.servers.url.linkedin.com
+module_define_range_classes_executed
+range_clusters_alpha_agent_1
+range_clusters_alpha_fuse_usagecontrol_1
+range_clusters_alpha_genie_services_1
+range_clusters_alpha_languagepack_1
+range_clusters_alpha_liar_life_1
+range_clusters_alpha_profile_services_1
+range_clusters_alpha_tether_1
+range_containers_agent
+range_containers_fuse_usagecontrol
+range_containers_genie_services
+range_containers_languagepack
+range_containers_liar_life
+range_containers_profile_services
+range_containers_tether
+range_services_agent
+range_services_fuse_usagecontrol
+range_services_genie_services
+range_services_language_pack_cs_CZ
+range_services_language_pack_da_DK
+range_services_language_pack_de_DE
+range_services_language_pack_en_US
+range_services_language_pack_es_ES
+range_services_language_pack_fr_FR
+range_services_language_pack_in_ID
+range_services_language_pack_it_IT
+range_services_language_pack_ja_JP
+range_services_language_pack_ko_KR
+range_services_language_pack_ms_MY
+range_services_language_pack_nl_NL
+range_services_language_pack_no_NO
+range_services_language_pack_pl_PL
+range_services_language_pack_pt_BR
+range_services_language_pack_ro_RO
+range_services_language_pack_ru_RU
+range_services_language_pack_sv_SE
+range_services_language_pack_tr_TR
+range_services_liar_life
+range_services_profile_services
+range_services_tether




All of this data is dumped to JSON format into /etc/range_classes.conf.    We 
read from this file if our network path to the range servers is busted.  This 
allows the clients to continue to set these classes until range returns data to 
set it otherwise.


$ cat /etc/range_classes.conf 
[
  [
    "range_clusters_alpha_agent_1", 
    "range_clusters_alpha_fuse_usagecontrol_1", 
    "range_clusters_alpha_genie_services_1", 
    "range_clusters_alpha_languagepack_1", 
    "range_clusters_alpha_liar_life_1", 
    "range_clusters_alpha_profile_services_1", 
    "range_clusters_alpha_tether_1"
  ], 
  [
    "range_containers_agent", 
    "range_containers_fuse_usagecontrol", 
    "range_containers_genie_services", 
    "range_containers_languagepack", 
    "range_containers_liar_life", 
    "range_containers_profile_services", 
    "range_containers_tether"
  ], 
  [
    "range_services_agent", 
    "range_services_fuse_usagecontrol", 
    "range_services_genie_services", 
    "range_services_language_pack_cs_CZ", 
    "range_services_language_pack_da_DK", 
    "range_services_language_pack_de_DE", 
    "range_services_language_pack_en_US", 
    "range_services_language_pack_es_ES", 
    "range_services_language_pack_fr_FR", 
    "range_services_language_pack_in_ID", 
    "range_services_language_pack_it_IT", 
    "range_services_language_pack_ja_JP", 
    "range_services_language_pack_ko_KR", 
    "range_services_language_pack_ms_MY", 
    "range_services_language_pack_nl_NL", 
    "range_services_language_pack_no_NO", 
    "range_services_language_pack_pl_PL", 
    "range_services_language_pack_pt_BR", 
    "range_services_language_pack_ro_RO", 
    "range_services_language_pack_ru_RU", 
    "range_services_language_pack_sv_SE", 
    "range_services_language_pack_tr_TR", 
    "range_services_liar_life", 
    "range_services_profile_services", 
    "range_services_tether"
  ]
]



Here, I'll introduce a time.sleep(10) at line 71 just to simulate the range 
servers not responding:


$ time /var/cfengine/modules/module_define_range_classes.py -u range.servers.url.linkedin.com
+module_define_range_classes_executed
+invalid_range_data
+range_services_agent
+range_services_fuse_usagecontrol
+range_services_genie_services
+range_services_language_pack_cs_CZ
+range_services_language_pack_da_DK
+range_services_language_pack_de_DE
+range_services_language_pack_en_US
+range_services_language_pack_es_ES
+range_services_language_pack_fr_FR
+range_services_language_pack_in_ID
+range_services_language_pack_it_IT
+range_services_language_pack_ja_JP
+range_services_language_pack_ko_KR
+range_services_language_pack_ms_MY
+range_services_language_pack_nl_NL
+range_services_language_pack_no_NO
+range_services_language_pack_pl_PL
+range_services_language_pack_pt_BR
+range_services_language_pack_ro_RO
+range_services_language_pack_ru_RU
+range_services_language_pack_sv_SE
+range_services_language_pack_tr_TR
+range_services_liar_life
+range_services_profile_services
+range_services_tether
+range_containers_agent
+range_containers_fuse_usagecontrol
+range_containers_genie_services
+range_containers_languagepack
+range_containers_liar_life
+range_containers_profile_services
+range_containers_tether
+range_clusters_alpha_agent_1
+range_clusters_alpha_fuse_usagecontrol_1
+range_clusters_alpha_genie_services_1
+range_clusters_alpha_languagepack_1
+range_clusters_alpha_liar_life_1
+range_clusters_alpha_profile_services_1
+range_clusters_alpha_tether_1

real    0m5.098s
user    0m0.070s
sys     0m0.031s




So, this behaved exactly like we were expecting.  Once we passed the 5 second 
timeout, we read from the JSON file to set the classes instead of performing 
the live query.   We also raised the global class invalid_range_data, so we 
will report that we've read from JSON.   
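
The timeout itself is the SIGALRM trick from execute_range_query.  Here is the 
same idea pulled out into a small reusable sketch, with some assumptions: 
Python 3 syntax, made-up names (QueryTimeout, time_limit, slow_query, 
query_or_none), and signal.setitimer instead of signal.alarm so that 
fractional seconds work.  Like the original, it only works on Unix and only 
in the main thread.

```python
import signal
import time
from contextlib import contextmanager


class QueryTimeout(Exception):
    """Raised inside the guarded block when the time limit expires."""


@contextmanager
def time_limit(seconds):
    """Raise QueryTimeout in the with-block if it runs longer than seconds."""
    def _on_alarm(signum, frame):
        raise QueryTimeout()

    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Always cancel the timer and put the previous handler back.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, old_handler)


def slow_query(delay):
    """Stand-in for the live range lookup (hypothetical, just sleeps)."""
    time.sleep(delay)
    return ["range_clusters_example"]


def query_or_none(delay, limit):
    """Return the query result, or None if it did not finish within limit."""
    try:
        with time_limit(limit):
            return slow_query(delay)
    except QueryTimeout:
        return None
```

With a 5 second limit and a lookup that takes 10 seconds, query_or_none 
returns None after roughly 5 seconds, which is the point where the module 
switches over to the JSON file.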

The only other takeaway from this script is at line 175.  We raise a class, 
module_define_range_classes_executed, so we only execute this script a single 
time.  Without it, when cf-agent runs through all of the modules in 
promises.cf, it executes the module about 5-6 times, which ends up hammering 
our range servers.  So in promises.cf, I call this script using the below code:


        !module_define_range_classes_executed::
                "discover_range_crud"           expression      =>      usemodule("module_define_range_classes.py -u range.servers.url.linkedin.com", "");





I hope this helps someone else trying to figure out how to pull in external 
datasources / information into Cfengine client execution.

If you are a Cfengine guru / expert, then please consider sharing some of your 
code / policies.  I'm not sharing this for my well being.  I'm sharing this 
with you because I want (I need) you to share what you are doing within your 
organization.  I need automation ideas.  I want to expand what we're doing at 
LinkedIn, and I need your help doing so.  

This list isn't just for the n00bs asking basic Cfengine questions.  I want to 
learn something from you, and it'll help the n00bs too.  Share what you're 
automating and amaze me.   If I end up implementing some uber cool automation 
idea you've shared, I'll send you a cookie (seriously, it'll be one of those 
big chocolate chip cookies that you see at the mall.)

Thanks
Mike
