On 02/21/2014 09:59 PM, Denis Usanov wrote:
Good evening.

First of all, I would like to apologize for the subject line. I really didn't
know how to phrase it more accurately.

I mostly develop automation scripts in Python for things like deployment (not with
Fabric, and maybe not over SSH at all), testing, etc. In these scripts I have an
abstraction called a "step".

Some code:

class IStep(object):
    def run(self):
        raise NotImplementedError()

And the concrete steps:

class DeployStep(IStep): ...
class ValidateUSBFlash(IStep): ...
class SwitchVersionS(IStep): ...

Each of them implements the run method.
Then I use a "builder" class that adds steps to an internal list and has a method
"start" that runs all the steps one by one.

And I like this. It's a loosely coupled system, and it works fine in simple cases. But sometimes
a step has to use the results of previous steps, and that is where I run into problems. Until now
I kept an internal dict in the "builder", named it "world", and passed it to each step's run()
method. It worked, but I disliked it.
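To make the question concrete, the "world" approach described above might look roughly like the sketch below. It is a reconstruction under assumptions, not Denis's actual code: `Builder`, `add`, the `world` argument to `run`, and the dictionary keys are all hypothetical.

```python
class IStep(object):
    """Base class for steps; each step receives the shared 'world' dict."""

    def run(self, world):
        raise NotImplementedError()


class Builder(object):
    """Hypothetical builder: collects steps and runs them in order."""

    def __init__(self):
        self.steps = []
        self.world = {}  # shared state every step can read and write

    def add(self, step):
        self.steps.append(step)
        return self  # return self so add() calls can be chained

    def start(self):
        for step in self.steps:
            step.run(self.world)
        return self.world


class DeployStep(IStep):
    def run(self, world):
        # Hypothetical result that a later step depends on.
        world["deployed_version"] = "1.2.3"


class SwitchVersionS(IStep):
    def run(self, world):
        # Implicitly relies on DeployStep having run first.
        world["active_version"] = world["deployed_version"]


world = Builder().add(DeployStep()).add(SwitchVersionS()).start()
```

The coupling in question is visible in `SwitchVersionS`, which silently depends on a key that `DeployStep` happens to write; nothing in the step interfaces records that dependency.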

How would you solve this problem, and how would you implement it? I understand that
it's more of an architecture question than a Python one.

I bet I wouldn't have asked this if I had worked with a functional
programming language.

A few months ago I posted a summary of a data transformation framework, inviting comments (https://mail.python.org/pipermail/python-list/2013-August/654226.html). It didn't meet with much interest and I forgot about it. Now that someone is looking for something along those lines, as I understand his post, there might be some interest after all.


My module is called TX. A base class "Transformer" handles the flow of data. A custom Transformer defines a method "T.transform (self)" which transforms input to output. Transformers are callable, taking input as an argument and returning the output:

    transformed_input = T (some_input)

A Transformer object retains both input and output after a run. If it is called a second time without input, it simply returns its output, without needlessly repeating its job:

    same_transformed_input = T ()
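The calling convention described above can be sketched as a small base class. This is a guess at the shape of TX, not its actual source: the attribute names `input` and `output`, the use of `*args` to detect a call without input, and the `Upper` example are assumptions.

```python
class Transformer(object):
    """Hypothetical minimal base class that retains its last input and output."""

    def __init__(self):
        self.input = None
        self.output = None

    def __call__(self, *args):
        if args:
            # New input: store it and do the work.
            self.input = args[0]
            self.output = self.transform()
        elif self.output is None:
            # No input and nothing cached yet: work on the retained input.
            self.output = self.transform()
        # In every case, return the (possibly cached) output.
        return self.output

    def transform(self):
        raise NotImplementedError()


class Upper(Transformer):
    """Example subclass that counts how often it actually runs."""

    def __init__(self):
        Transformer.__init__(self)
        self.runs = 0

    def transform(self):
        self.runs += 1
        return self.input.upper()


T = Upper()
transformed_input = T("some input")  # runs transform()
same_transformed_input = T()         # returns the retained output
```

One consequence of this simplification is that a transform returning None is never treated as cached and would be re-run on the next empty call.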

Because of this IO design, Transformers nest:

    csv_text = CSV_Maker (Data_Line_Picker (Line_Splitter (File_Reader ('1st-quarter-2013.statement'))))

A better alternative to nesting is to build a Chain:

    Statement_To_CSV = TX.Chain (File_Reader, Line_Splitter, Data_Line_Picker, CSV_Maker)

A Chain is functionally equivalent to a Transformer:

    csv_text = Statement_To_CSV ('1st-quarter-2013.statement')
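A Chain can be sketched as a Transformer whose transform() feeds each element's output into the next element. This is a minimal sketch assuming the Chain receives Transformer instances; the toy `Line_Splitter`, `Data_Line_Picker` and `CSV_Maker` below are hypothetical stand-ins for the real classes, and `File_Reader` is left out so that nothing touches the disk.

```python
class Transformer(object):
    """Hypothetical minimal base class that retains its last input and output."""

    def __init__(self):
        self.input = None
        self.output = None

    def __call__(self, *args):
        if args:
            self.input = args[0]
            self.output = self.transform()
        elif self.output is None:
            self.output = self.transform()
        return self.output

    def transform(self):
        raise NotImplementedError()


class Chain(Transformer):
    """Feeds each element's output into the next; itself a Transformer."""

    def __init__(self, *elements):
        Transformer.__init__(self)
        self.elements = list(elements)

    def transform(self):
        data = self.input
        for element in self.elements:
            data = element(data)
        return data


class Line_Splitter(Transformer):
    def transform(self):
        return self.input.split("\n")


class Data_Line_Picker(Transformer):
    def transform(self):
        # Keep non-empty lines that are not comments.
        return [line for line in self.input if line and not line.startswith("#")]


class CSV_Maker(Transformer):
    def transform(self):
        return "\n".join(",".join(line.split()) for line in self.input)


statement = "# header\n1 2 3\n4 5 6"
nested = CSV_Maker()(Data_Line_Picker()(Line_Splitter()(statement)))
Statement_To_CSV = Chain(Line_Splitter(), Data_Line_Picker(), CSV_Maker())
csv_text = Statement_To_CSV(statement)
```

The nested call and the Chain produce the same text; the Chain simply names the pipeline once so it can be reused and inspected.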

Since Transformers retain their data, developing or debugging a Chain is a relatively simple affair. If a Chain fails, the method "show ()" displays the innards of its elements one by one. The failing element is the first one that has no output. "show ()" also displays any messages the method "transform (self)" logged with self.log (message). While you fix the failing element, the preceding element keeps providing the original input for testing until the repair is done.
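The debugging behaviour described above can be sketched as follows, assuming (hypothetically) that each element clears its output before running, so an element that raises is left without output, and that log() simply appends to a per-element list. The report format of `show()` and the `Split`/`To_Ints` classes are invented for illustration.

```python
class Transformer(object):
    """Hypothetical base class with a per-element message log."""

    def __init__(self):
        self.input = None
        self.output = None
        self.messages = []

    def log(self, message):
        self.messages.append(message)

    def __call__(self, *args):
        if args:
            self.input = args[0]
            self.messages = []
            self.output = None  # cleared first, so a failure leaves no output
            self.output = self.transform()
        elif self.output is None:
            self.output = self.transform()
        return self.output

    def transform(self):
        raise NotImplementedError()


class Chain(Transformer):
    def __init__(self, *elements):
        Transformer.__init__(self)
        self.elements = list(elements)

    def transform(self):
        data = self.input
        for element in self.elements:
            data = element(data)
        return data

    def show(self):
        """Print each element's output (or its absence) and its log."""
        lines = []
        for i, element in enumerate(self.elements):
            if element.output is None:
                status = "no output"
            else:
                status = repr(element.output)
            lines.append("[%d] %s: %s" % (i, type(element).__name__, status))
            for message in element.messages:
                lines.append("    log: %s" % message)
        report = "\n".join(lines)
        print(report)
        return report


class Split(Transformer):
    def transform(self):
        self.log("splitting %d characters" % len(self.input))
        return self.input.split(",")


class To_Ints(Transformer):
    def transform(self):
        return [int(item) for item in self.input]  # fails on "x"


chain = Chain(Split(), To_Ints())
try:
    chain("1,2,x")
except ValueError:
    report = chain.show()  # the first element without output is the culprit
```

While `To_Ints` is being repaired, `chain.elements[0]()` keeps returning `['1', '2', 'x']` without re-running `Split`, which supplies a stable test input.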

Since a Chain is functionally equivalent to a Transformer, a Chain can be placed into a containing Chain alongside Transformers:

    Table_Maker = TX.Chain (TX.File_Reader (), TX.Line_Splitter (), TX.Table_Maker ())
    Table_Writer = TX.Chain (Table_Maker, Table_Formatter, TX.File_Writer (file_name = '/home/xy/office/addresses-4214'))
    DB_Writer = TX.Chain (Table_Maker, DB_Formatter, TX.DB_Writer (table_name = 'contacts'))

Better:

    Splitter = TX.Splitter (TX.Table_Writer (), TX.DB_Writer ())
    Table_Handler = TX.Chain (Table_Maker, Splitter)

    Table_Handler ('home/xy/Downloads/report-4214') # Writes to both file and to DB
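A Splitter can be sketched as a fan-out Transformer that hands the same input to each of its branches. The post does not say what a Splitter returns, so this sketch assumes its output is the list of branch outputs; `Lines` and `Collector` are hypothetical stand-ins for the table maker and for the file and DB writers.

```python
class Transformer(object):
    """Hypothetical minimal base class that retains its last input and output."""

    def __init__(self):
        self.input = None
        self.output = None

    def __call__(self, *args):
        if args:
            self.input = args[0]
            self.output = self.transform()
        elif self.output is None:
            self.output = self.transform()
        return self.output

    def transform(self):
        raise NotImplementedError()


class Chain(Transformer):
    def __init__(self, *elements):
        Transformer.__init__(self)
        self.elements = list(elements)

    def transform(self):
        data = self.input
        for element in self.elements:
            data = element(data)
        return data


class Splitter(Transformer):
    """Hypothetical fan-out: hands the same input to every branch."""

    def __init__(self, *branches):
        Transformer.__init__(self)
        self.branches = list(branches)

    def transform(self):
        # Assumption: the Splitter's output is the list of branch outputs.
        return [branch(self.input) for branch in self.branches]


class Lines(Transformer):
    def transform(self):
        return self.input.splitlines()


class Collector(Transformer):
    """Stand-in for a file or DB writer: appends its input to a list."""

    def __init__(self, sink):
        Transformer.__init__(self)
        self.sink = sink

    def transform(self):
        self.sink.append(self.input)
        return len(self.input)


file_rows, db_rows = [], []
Table_Handler = Chain(Lines(), Splitter(Collector(file_rows), Collector(db_rows)))
result = Table_Handler("a,b\nc,d")
```

Because the table is built once, upstream of the Splitter, both writers receive the same object without the file being read twice.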


If a structure grows too complex to remember, the method "show_tree ()" displays something like this:

    Chain
    Chain[0] - Chain
    Chain[0][0] - Quotes
    Chain[0][1] - Adjust Splits
    Chain[1] - Splitter
    Chain[1][0] - Chain
    Chain[1][0][0] - High_Low_Range
    Chain[1][0][1] - Splitter
    Chain[1][0][1][0] - Trailing_High_Low_Ratio
    Chain[1][0][1][1] - Standard Deviations
    Chain[1][1] - Chain
    Chain[1][1][0] - Trailing Trend
    Chain[1][1][1] - Pegs
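A tree listing like the one above falls out of a depth-first walk over nested containers. In this sketch `Node`, `Container` and `_tree_lines` are hypothetical names, and leaves carry only a display name; the point is the indexing scheme, in which each path segment is the child's position in its parent.

```python
class Node(object):
    """Hypothetical stand-in for a Transformer; only its name matters here."""

    def __init__(self, name=None):
        self.name = name or type(self).__name__


def _tree_lines(node, path):
    """Depth-first walk producing 'path[i] - name' lines."""
    lines = []
    for i, child in enumerate(getattr(node, "elements", [])):
        child_path = "%s[%d]" % (path, i)
        lines.append("%s - %s" % (child_path, child.name))
        lines.extend(_tree_lines(child, child_path))
    return lines


class Container(Node):
    """Anything that holds child elements (a Chain or a Splitter)."""

    def __init__(self, *elements):
        Node.__init__(self)
        self.elements = list(elements)

    def show_tree(self):
        text = "\n".join([self.name] + _tree_lines(self, self.name))
        print(text)
        return text


class Chain(Container):
    pass


class Splitter(Container):
    pass


C = Chain(
    Chain(Node("Quotes"), Node("Adjust Splits")),
    Splitter(
        Chain(Node("High_Low_Range"),
              Splitter(Node("Trailing_High_Low_Ratio"),
                       Node("Standard Deviations"))),
        Chain(Node("Trailing Trend"), Node("Pegs")),
    ),
)
tree_text = C.show_tree()
```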

Following a run, all intermediate results are accessible:

    standard_deviations = C[1][0][1][1]()

    TM = TX.Table_Maker ()
    TM (standard_deviations).write ()

         0      | 1      | 2     |

         116.49 | 132.93 | 11.53 |
         115.15 | 128.70 | 11.34 |
           1.01 |   0.00 |  0.01 |
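Access such as `C[1][0][1][1]()` works if a Chain forwards indexing to its element list and every element keeps its last output. Below is a minimal sketch of that combination; `__getitem__` on Chain is an assumption about how TX does it, and `Double`/`Total` are invented examples.

```python
class Transformer(object):
    """Hypothetical minimal base class that retains its last input and output."""

    def __init__(self):
        self.input = None
        self.output = None

    def __call__(self, *args):
        if args:
            self.input = args[0]
            self.output = self.transform()
        elif self.output is None:
            self.output = self.transform()
        return self.output

    def transform(self):
        raise NotImplementedError()


class Chain(Transformer):
    def __init__(self, *elements):
        Transformer.__init__(self)
        self.elements = list(elements)

    def __getitem__(self, index):
        # Chain[i] returns the i-th element, so indexes can be stacked.
        return self.elements[index]

    def transform(self):
        data = self.input
        for element in self.elements:
            data = element(data)
        return data


class Double(Transformer):
    def transform(self):
        return [2 * x for x in self.input]


class Total(Transformer):
    def transform(self):
        return sum(self.input)


C = Chain(Chain(Double(), Double()), Total())
total = C([1, 2, 3])
intermediate = C[0][1]()  # cached output of the second Double
```

Calling an inner element without input does no new work; it returns what that element produced during the last run of the whole Chain.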

A Transformer takes parameters, either at construction time or by means of the method "T.set (key = parameter)". A File Reader receives no payload, so as a convenience it may take a file name as its input argument. A File Writer, by contrast, does take a payload, so its file name must be set by keyword:

    File_Writer = TX.File_Writer (file_name = '/tmp/memos-with-dates-1')
    File_Writer (input)  # Writes file
    File_Writer.set (file_name = '/tmp/memos-with-dates-2')
    File_Writer ()  # Writes the same thing to the second file
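Keyword parameters and the "write the same thing again" behaviour fit together if set() invalidates the cached output, so that the next call without input re-runs on the retained input. That invalidation rule, the `defaults` class attribute, and the rejection of unknown keywords are assumptions of this sketch, not documented TX behaviour; files go to a temporary directory so the example is safe to run.

```python
import os
import tempfile


class Transformer(object):
    """Hypothetical base class with keyword parameters and a cached output."""

    defaults = {}  # subclasses declare their keyword parameters here

    def __init__(self, **params):
        self.input = None
        self.output = None
        self.params = dict(self.defaults)
        self.set(**params)

    def set(self, **params):
        unknown = [key for key in params if key not in self.defaults]
        if unknown:
            raise TypeError("unknown parameter(s): %s" % ", ".join(sorted(unknown)))
        self.params.update(params)
        self.output = None  # parameters changed, so the cached output is stale

    def __call__(self, *args):
        if args:
            self.input = args[0]
            self.output = self.transform()
        elif self.output is None:
            # Re-run on the retained input, e.g. after set() cleared the output.
            self.output = self.transform()
        return self.output

    def transform(self):
        raise NotImplementedError()


class File_Writer(Transformer):
    """Payload arrives as input; the target file is a keyword parameter."""

    defaults = {"file_name": None}

    def transform(self):
        with open(self.params["file_name"], "w") as handle:
            handle.write(self.input)
        return self.params["file_name"]


directory = tempfile.mkdtemp()
first = os.path.join(directory, "memos-with-dates-1")
second = os.path.join(directory, "memos-with-dates-2")

writer = File_Writer(file_name=first)
writer("memo text\n")         # writes the first file
writer.set(file_name=second)
writer()                      # writes the same payload to the second file
```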



That's about it. I am very pleased with the design. I developed it to wrap a growing jungle of existing modules and classes that had no interconnectability and no common input-output specifications. The improvement in terms of work time and resource management is enormous. I would be glad to share the base class and a few custom classes that are autonomous enough not to require surgical extraction from the jungle.

Writing a custom class requires no more than defining private keywords, if any, and writing the method "transform (self)", or "process_record (self, record)" if the input is a list of records, which it often is. The modular design encourages each Transformer to do just one simple thing, which is easy to write and easy to debug. Complexity comes from assembling simple Transformers in a great variety of configurations.
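The record-wise shortcut can be sketched by giving the base class a default transform() that applies process_record() to each element of a list input. That default is an assumption about how TX dispatches between the two methods; `Strip_Currency` is an invented example of a one-method custom class.

```python
class Transformer(object):
    """Hypothetical base class whose default transform() maps over records."""

    def __init__(self):
        self.input = None
        self.output = None

    def __call__(self, *args):
        if args:
            self.input = args[0]
            self.output = self.transform()
        elif self.output is None:
            self.output = self.transform()
        return self.output

    def transform(self):
        # Default behaviour: the input is a list of records, handled one by one.
        return [self.process_record(record) for record in self.input]

    def process_record(self, record):
        raise NotImplementedError()


class Strip_Currency(Transformer):
    """A custom class only needs process_record() for record-wise work."""

    def process_record(self, record):
        return [field.lstrip("$") for field in record]


cleaned = Strip_Currency()([["$116.49", "$132.93"], ["$1.01", "$0.00"]])
```

Subclasses that need to see the whole input at once still override transform() instead, so the two styles coexist in one hierarchy.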


Frederic

--
https://mail.python.org/mailman/listinfo/python-list
