General Purpose Pipeline library?

Jason Mon, 20 Nov 2017 07:53:38 -0800

A pipeline can be described as a sequence of functions applied to an 
input, with each function receiving the output of the preceding 
function:


out = f6(f5(f4(f3(f2(f1(in))))))

However, this isn't very readable and does not support conditionals.
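
For illustration, the nested call can be flattened into a left-to-right list of steps. This is only a minimal sketch: compose_pipeline and the toy f1..f3 bodies are made up for the example, and it does nothing about conditionals.

```python
from functools import reduce


def compose_pipeline(*funcs):
    """Return a function that applies funcs left to right."""
    def run(value):
        # Feed the output of each function into the next one.
        return reduce(lambda acc, f: f(acc), funcs, value)
    return run


# Hypothetical stand-ins for f1..f3.
def f1(x):
    return x + 1


def f2(x):
    return x * 2


def f3(x):
    return x - 3


pipeline = compose_pipeline(f1, f2, f3)
out = pipeline(5)  # same as f3(f2(f1(5)))
```

The steps now read in the order they run, but every function still has to take exactly one argument.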

TensorFlow has tensor-focused pipelines:
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')

I have some code that lets me mimic this, but with the previous result passed 
as an implied parameter:

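from __future__ import print_function  # only needed on Python 2
import functools
from functools import reduce  # a builtin on Python 2, functools-only on Python 3

def executePipeline(steps, collection_funcs=(map, filter, reduce)):
    """Run each (func, params) step, feeding the previous result forward."""
    results = None
    for func, params in steps:
        if func in collection_funcs:
            # params[0] is the per-item callable; bind its leading
            # arguments, then apply it across the previous results.
            print(func, params[0])
            results = func(functools.partial(params[0], *params[1:]), results)
        else:
            print(func)
            if results is None:
                # First step: there is no previous result to pass in.
                results = func(*params)
            else:
                # The previous result always goes in as the last argument.
                results = func(*(params + (results,)))
    return results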

executePipeline([
    (read_rows, (in_file,)),
    (map, (lower_row, field)),
    (stash_rows, ('stashed_file',)),
    (map, (lemmatize_row, field)),
    (vectorize_rows, (field, min_count)),
    (evaluate_rows, (weights, None)),
    (recombine_rows, ('stashed_file',)),
    (write_rows, (out_file,)),
])

This gets me close, but I can't control where the rows get passed in. In the 
above code, they are always the last parameter.
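
A minimal sketch of one way the position could be made explicit, assuming a hypothetical RESULTS sentinel that marks where the previous step's output should go. The names RESULTS, execute_pipeline_with_slot and join_rows, and the toy row functions, are invented for the example; this is not an existing library.

```python
import functools

# Hypothetical sentinel marking where the previous result should be inserted.
RESULTS = object()


def execute_pipeline_with_slot(steps):
    """Run (func, params) steps in order.

    Any RESULTS entry in a step's params is replaced by the previous
    step's output; steps without RESULTS simply do not receive it.
    """
    results = None
    for func, params in steps:
        args = tuple(results if p is RESULTS else p for p in params)
        results = func(*args)
    return results


# Toy stand-ins for the row functions (assumptions, not the real ones).
def read_rows(name):
    return ["A b", "C d"]


def lower_row(field, row):
    return row.lower()


def join_rows(sep, rows, suffix):
    return sep.join(rows) + suffix


out = execute_pipeline_with_slot([
    (read_rows, ("in_file",)),
    (map, (functools.partial(lower_row, "text"), RESULTS)),
    (join_rows, ("|", RESULTS, "!")),  # rows passed as the middle argument
])
# out == "a b|c d!"
```

Here the collection functions need no special case, at the cost of writing functools.partial by hand in the step list.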

I feel like I'm reinventing a wheel here.  I was wondering if there's already 
something that exists?

-- 
https://mail.python.org/mailman/listinfo/python-list
