[Python-ideas] A new itertools.product variant that supports kwargs and returns dicts

Daniel Grießhaber Tue, 18 Feb 2020 04:11:14 -0800

When doing a grid-search over some hyper-parameters of an algorithm I often 
need to specify the possible value space for each parameter and then evaluate 
on every combination of these values.
In this case, the product function of the itertools module is handy:


    from itertools import product
    
    def train_and_evaluate(lr, num_layers, dataset):
        print(f'training on {dataset} with {num_layers=} and {lr=}')
    
    params = (
                (0.01, 0.1, 0.5), #learning rate
                range(10, 40, 10), #number of hidden neurons
                ('MNLI', 'SNLI', 'QNLI') # training dataset
    )
    
    for config in product(*params):
        train_and_evaluate(*config)

However, this code relies on the order of parameters in the function 
train_and_evaluate, which could be considered not very pythonic, as explicit is 
better than implicit. 
In the same way, the 'intention' for the values is only clear with the context 
of the function signature (or the comments, if someone bothered to write them).

Therefore I propose a new variant of the product() function that accepts 
keyword arguments (a Mapping) instead of positional Iterables, so that the role 
of each iterable's values in the product is stated explicitly. A trivial 
implementation could be:

    import itertools

    def named_product(repeat=1, **kwargs):
        for combination in itertools.product(*kwargs.values(), repeat=repeat):
            # pair each keyword with its value in this combination
            yield dict(zip(kwargs.keys(), combination))

which could then be used like this:

    params = {
        'lr': (0.01, 0.1, 0.5), #learning rate
        'num_layers': range(10, 40, 10), #number of hidden neurons
        'dataset': ('MNLI', 'SNLI', 'QNLI') # training dataset
    }
    
    for config in named_product(**params):
        train_and_evaluate(**config)

This has the advantage that the order of the entries in params no longer has 
to match the signature of train_and_evaluate.
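
To make the ordering point concrete, here is a minimal self-contained sketch. 
The function describe is a hypothetical stand-in for train_and_evaluate, and 
this named_product leaves out the repeat argument for brevity; neither is part 
of the proposal itself.

```python
from itertools import product

def named_product(**kwargs):
    """Yield one dict per combination of the keyword iterables."""
    keys = list(kwargs)
    for combination in product(*kwargs.values()):
        yield dict(zip(keys, combination))

def describe(lr, dataset):
    # Hypothetical stand-in for train_and_evaluate.
    return f'{dataset} @ lr={lr}'

# The keywords are given in a different order than describe's signature.
configs = list(named_product(dataset=('MNLI', 'SNLI'), lr=(0.01, 0.1)))
results = [describe(**config) for config in configs]
# configs[0] is {'dataset': 'MNLI', 'lr': 0.01}
```

Because each config is unpacked with **, swapping the keywords in the call to 
named_product only changes the iteration order, not which value reaches which 
parameter.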

I also would appreciate your input on the following questions:

- Support scalar values?
    In the example use case it may be nice not to require every kwarg to be an 
iterable, because you may want to specify all parameters of the function, even 
those that do not vary across the product items (factors?)

        params = named_product(a=(1, 2), b=3)

    instead of

        params = named_product(a=(1, 2), b=(3, ))

    However, this may be unexpected to users when they supply a string which 
would then result in an iteration over its characters.

- Would this maybe be suited for the more-itertools package?
    However, I would love to have this built in and not depend on a non-stdlib 
module

- More functionality:
    In my toolbox-package I implemented a variant with much more functionality: 
[https://py-toolbox.readthedocs.io/en/latest/modules/itertools.html](https://py-toolbox.readthedocs.io/en/latest/modules/itertools.html).
    This version offers additional functionality compared to the proposed function:
    - accepts both a dict as a positional argument and **kwargs, and merges them
    - accepts scalar values, not only iterables, and treats strings as scalars
    - if the value of any item is itself a dict (not a scalar or Sequence), 
iterates over the nested dict as a separate call to named_product would and 
updates the 'outer' dict with those values
    - copies values before yielding, which is useful if the returned values are 
mutable (can be excluded)

Maybe some of these features would also be useful for a general version? 
However, this version is mostly developed for my special use-case and some 
behaviour is solely based on my peculiarities
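
The scalar question above could be handled roughly as follows. This is only a 
sketch under my own assumptions: the helper _as_iterable and the name 
named_product_scalars are made up for illustration, strings and bytes are 
treated as scalars, and anything else that cannot be iterated is wrapped in a 
one-element tuple.

```python
from itertools import product

def _as_iterable(value):
    # Assumption: strings and bytes count as scalars, so they are not
    # split into characters.
    if isinstance(value, (str, bytes)):
        return (value,)
    try:
        iter(value)
    except TypeError:
        # Not iterable at all (e.g. an int): treat it as a single value.
        return (value,)
    return value

def named_product_scalars(**kwargs):
    keys = list(kwargs)
    axes = [_as_iterable(value) for value in kwargs.values()]
    for combination in product(*axes):
        yield dict(zip(keys, combination))

configs = list(named_product_scalars(a=(1, 2), b=3, name='MNLI'))
# [{'a': 1, 'b': 3, 'name': 'MNLI'}, {'a': 2, 'b': 3, 'name': 'MNLI'}]
```

Note that with this rule a dict passed as a value would still be iterated over 
its keys, so a real implementation would have to decide how to treat mappings.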

## Example usage:
Here is an example from my actual code which may clarify some of the design 
decisions behind the features of 
[https://py-toolbox.readthedocs.io/en/latest/modules/itertools.html](https://py-toolbox.readthedocs.io/en/latest/modules/itertools.html):

    configs = named_product({
        'nb_epoch': 3,
        'acquisition_iterations': 0, 
        'training_function': {
            partial(train_bert, decoder_function=create_decoder):{
                'decoder_name': 'FFNN'
            },
            partial(train_bert, decoder_function=create_cnn_decoder):{
                'decoder_name': 'CNN'
            }
        },
        'training_data': partial(load_glue, initial_training_data_size=10),
        'number_queries': 100,
        'batch_size': 64,
        'dataset_class': [QNLIDataset, MNLIDataset],
        'num_frozen_layers': 0,
        'acquire_function': {
            acquire_uncertain: {
                'certainty_function': calculate_random, 
                'predict_function': 'predict',
                'dropout_iterations': 1
            }
        }
    })
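
For readers who want to see how such a nested specification might expand, here 
is a rough sketch of one possible reading of the nested-dict rule. It is not 
the py-toolbox implementation: the function name expand and the placeholder 
strings in the example are my assumptions. In this reading each key of a 
nested dict is one choice for the outer key, and the dict stored under that 
choice is expanded recursively and merged into the result.

```python
from itertools import product

def expand(spec):
    """Yield flat dicts for a spec whose values are scalars, sequences or dicts."""
    axes = []
    for key, value in spec.items():
        if isinstance(value, dict):
            # Each nested key is one choice for `key`; its sub-spec is
            # expanded recursively and merged into the outer dict.
            axis = [{key: choice, **extra}
                    for choice, sub in value.items()
                    for extra in expand(sub)]
        elif isinstance(value, (list, tuple, range)):
            axis = [{key: item} for item in value]
        else:
            axis = [{key: value}]  # scalar, including strings
        axes.append(axis)
    for parts in product(*axes):
        merged = {}
        for part in parts:
            merged.update(part)
        yield merged

# Placeholder strings stand in for the partial(...) objects and classes.
spec = {
    'nb_epoch': 3,
    'training_function': {
        'train_ffnn': {'decoder_name': 'FFNN'},
        'train_cnn': {'decoder_name': 'CNN'},
    },
    'dataset_class': ['QNLI', 'MNLI'],
}
configs = list(expand(spec))
```

With this reading, the small spec yields four dicts: two training functions, 
each carrying its own decoder_name, combined with two dataset classes.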
