On Nov 20, 2017 10:50 AM, "Jason" <jasonh...@gmail.com> wrote:
>
> A pipeline can be described as a sequence of functions that are applied to an input, with each subsequent function getting the output of the preceding function:
>
> out = f6(f5(f4(f3(f2(f1(in))))))
>
> However, this isn't very readable and does not support conditionals.
>
> TensorFlow has tensor-focused pipelines:
>
> fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
> fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
> out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
>
> I have some code which allows me to mimic this, but with an implied parameter.
>
> import functools
> from functools import reduce  # reduce is not a builtin on Python 3
>
> def executePipeline(steps, collection_funcs = [map, filter, reduce]):
>     results = None
>     for step in steps:
>         func = step[0]
>         params = step[1]
>         if func in collection_funcs:
>             # Bind the extra parameters, then apply over the previous results.
>             print(func, params[0])
>             results = func(functools.partial(params[0], *params[1:]), results)
>         else:
>             print(func)
>             if results is None:
>                 # First step: nothing to pass along yet.
>                 results = func(*params)
>             else:
>                 # Previous results are always appended as the last argument.
>                 results = func(*(params+(results,)))
>     return results
>
> executePipeline( [
>     (read_rows, (in_file,)),
>     (map, (lower_row, field)),
>     (stash_rows, ('stashed_file', )),
>     (map, (lemmatize_row, field)),
>     (vectorize_rows, (field, min_count,)),
>     (evaluate_rows, (weights, None)),
>     (recombine_rows, ('stashed_file', )),
>     (write_rows, (out_file,))
>     ]
> )
>
> Which gets me close, but I can't control where rows gets passed in. In the above code, it is always the last parameter.
>
> I feel like I'm reinventing a wheel here. I was wondering if there's already something that exists?
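On the specific point that the rows always end up as the last parameter: one possible way around it, sketched here under my own assumptions, is to put a placeholder object in the argument tuple wherever the previous step's output should go. The names `PREV`, `run_pipeline`, `scale` and `offset` are hypothetical, made up for this illustration; they are not part of the code above or of any library.

```python
# Hypothetical sentinel: marks where the previous step's output is inserted.
PREV = object()


def run_pipeline(steps):
    """Run (func, args) steps in order, threading each result into the next.

    If a step's args contain PREV, every PREV is replaced by the previous
    result. Otherwise the previous result is appended as the last argument,
    which mirrors the behaviour of the quoted executePipeline.
    """
    result = None
    for index, (func, args) in enumerate(steps):
        args = tuple(args)
        if any(arg is PREV for arg in args):
            # Explicit placement requested by the caller.
            args = tuple(result if arg is PREV else arg for arg in args)
        elif index > 0:
            # No placeholder: fall back to "previous result goes last".
            args = args + (result,)
        result = func(*args)
    return result


# Toy steps that want the rows in different positions.
def scale(rows, factor):
    return [row * factor for row in rows]


def offset(amount, rows):
    return [row + amount for row in rows]


total = run_pipeline([
    (list, ([1, 2, 3],)),    # first step: nothing is threaded in
    (scale, (PREV, 10)),     # previous result goes first
    (offset, (1, PREV)),     # previous result goes last, stated explicitly
    (sum, ()),               # no placeholder: appended as the last argument
])
print(total)
```

This only covers positional arguments; keyword arguments and the special handling of map/filter/reduce in the quoted code would need extra work.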
IBM has long had a program called Pipelines, which runs on IBM mainframes. It does what you want. A number of attempts have been made to create cross-platform versions of this marvelous program. A long time ago I started, but never completed, an open-source Python version. If you are interested in taking a look at it, let me know.
-- 
https://mail.python.org/mailman/listinfo/python-list