This is an automated email from the ASF dual-hosted git repository. skrawcz pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/hamilton.git
commit f723544c6643f9cdb6bb460db322a12e92ee534e
Author: Charles Swartz <[email protected]>
AuthorDate: Sun Apr 6 09:24:47 2025 -0400

    Add documentation for `unpack_fields`
---
 docs/concepts/function-modifiers.rst        | 29 +++++++++++++++++--
 docs/reference/decorators/index.rst         |  1 +
 docs/reference/decorators/unpack_fields.rst | 44 +++++++++++++++++++++++++++++
 3 files changed, 71 insertions(+), 3 deletions(-)

diff --git a/docs/concepts/function-modifiers.rst b/docs/concepts/function-modifiers.rst
index c3b38c50..4c4797b2 100644
--- a/docs/concepts/function-modifiers.rst
+++ b/docs/concepts/function-modifiers.rst
@@ -191,13 +191,36 @@ Sometimes, your node outputs multiple values that you would like to name and mak
 
 To add metadata to extracted nodes, use ``@tag_output``, which works just like ``@tag``.
 
+@unpack_fields
+~~~~~~~~~~~~~~
+
+A good example is splitting a dataset into training, validation, and test splits. We use ``@unpack_fields``, which requires specifying the names of the fields to extract. The function must return a tuple with at least as many elements as there are specified fields. Note that selecting a subset of the tuple or using an indeterminate tuple size is also possible.
+
+.. code-block:: python
+
+    from typing import Tuple
+    from hamilton.function_modifiers import unpack_fields
+
+    @unpack_fields("X_train", "X_validation", "X_test")
+    def dataset_splits(X: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
+        """Randomly split data into train, validation, test"""
+        X_train, X_validation, X_test = random_split(X)
+        return X_train, X_validation, X_test
+
+.. image:: ./_function-modifiers/extract_fields.png
+    :height: 250px
+
+
+Now, ``X_train``, ``X_validation``, and ``X_test`` are available to other nodes and can be queried with ``.execute()``. However, since ``dataset_splits`` is itself a node, you can query it to obtain all splits in a single tuple!
+
 @extract_fields
 ~~~~~~~~~~~~~~~
 
-A good example is splitting a dataset into train, validation, and test splits. We will use ``@extract_fields``, which requires specifying in a dictionary the ``field_name: field_type`` of each field.
+Additionally, we can extract fields from an output dictionary using ``@extract_fields``. In this case, you must specify the dictionary keys and their types. The function must return a dictionary that contains, at a minimum, those keys specified in the decorator.
 
 .. code-block:: python
 
+    from typing import Dict
     from hamilton.function_modifiers import extract_fields
 
     @extract_fields(dict( # don't forget the dictionary
@@ -205,7 +228,7 @@ A good example is splitting a dataset into train, validation, and test splits. W
         X_validation=np.ndarray,
         X_test=np.ndarray,
     ))
-    def dataset_splits(X: np.ndarray) -> dict:
+    def dataset_splits(X: np.ndarray) -> Dict:
         """Randomly split data into train, validation, test"""
         X_train, X_validation, X_test = random_split(X)
         return dict(
@@ -218,7 +241,7 @@ A good example is splitting a dataset into train, validation, and test splits. W
     :height: 250px
 
 
-Now, ``X_train``, ``X_validation``, and ``X_test`` are available to other nodes, and they can be queried by ``.execute()``. But, since ``dataset_splits`` is its own node, you can query it to get all splits in a dictionary!
+Again, ``X_train``, ``X_validation``, and ``X_test`` are now available to other nodes, or you can query the ``dataset_splits`` node to retrieve all splits in a dictionary.
 
 @extract_columns
 ~~~~~~~~~~~~~~~~
diff --git a/docs/reference/decorators/index.rst b/docs/reference/decorators/index.rst
index d5154e01..49bd8d43 100644
--- a/docs/reference/decorators/index.rst
+++ b/docs/reference/decorators/index.rst
@@ -21,6 +21,7 @@ Reference
    dataloader
    datasaver
    does
+   unpack_fields
    extract_columns
    extract_fields
    inject
diff --git a/docs/reference/decorators/unpack_fields.rst b/docs/reference/decorators/unpack_fields.rst
new file mode 100644
index 00000000..f91c3187
--- /dev/null
+++ b/docs/reference/decorators/unpack_fields.rst
@@ -0,0 +1,44 @@
+=======================
+unpack_fields
+=======================
+This decorator works on a function that outputs a tuple and unpacks its elements to make them individually available for consumption. Essentially, it expands the original function into n separate functions, each of which takes the original output tuple and, in return, outputs a specific field based on the index supplied to the ``unpack_fields`` decorator.
+
+.. code-block:: python
+
+    import numpy as np
+    from hamilton.function_modifiers import unpack_fields
+
+    @unpack_fields('X_train', 'X_test', 'y_train', 'y_test')
+    def train_test_split_func(
+        feature_matrix: np.ndarray,
+        target: np.ndarray,
+        test_size_fraction: float,
+        shuffle_train_test_split: bool,
+    ) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
+        ... # Calculate the train-test split
+        return X_train, X_test, y_train, y_test
+
+
+The arguments to the decorator not only represent the names of the resulting fields but also determine their position in the output tuple. This means you can choose to unpack a subset of the fields or declare an indeterminate number of fields, as long as the number of requested fields does not exceed the number of elements in the output tuple.
+
+.. code-block:: python
+
+    import numpy as np
+    from hamilton.function_modifiers import unpack_fields
+
+    @unpack_fields('X_train', 'X_test', 'y_train', 'y_test')
+    def train_test_split_func(
+        feature_matrix: np.ndarray,
+        target: np.ndarray,
+        test_size_fraction: float,
+        shuffle_train_test_split: bool,
+    ) -> Tuple[np.ndarray, ...]: # indeterminate number of fields
+        ... # Calculate the train-test split
+        return X_train, X_test, y_train, y_test
+
+----
+
+**Reference Documentation**
+
+.. autoclass:: hamilton.function_modifiers.unpack_fields
+    :special-members: __init__
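
Reviewer note: the new reference page describes ``unpack_fields`` as expanding one tuple-returning function into n extractors selected by position. The sketch below illustrates that idea in plain Python, without importing Hamilton. ``expand_tuple_fields`` and the toy ``dataset_splits`` are hypothetical names made up for this illustration; this is not how Hamilton implements the decorator, only a way to see the positional semantics and the "subset is allowed, too many fields is not" rule.

```python
from typing import Any, Callable, Dict, List, Tuple


def expand_tuple_fields(*field_names: str) -> Dict[str, Callable[[Tuple[Any, ...]], Any]]:
    """Hypothetical helper (NOT Hamilton's API): one extractor per field name.

    The position of a name in ``field_names`` is the tuple index it reads.
    """

    def make_extractor(index: int, name: str) -> Callable[[Tuple[Any, ...]], Any]:
        def extractor(output: Tuple[Any, ...]) -> Any:
            # Requesting more fields than the tuple holds is an error.
            if index >= len(output):
                raise ValueError(
                    f"field {name!r} needs index {index}, "
                    f"but the tuple has only {len(output)} elements"
                )
            return output[index]

        return extractor

    return {name: make_extractor(i, name) for i, name in enumerate(field_names)}


def dataset_splits(rows: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Toy stand-in for the documented example: a deterministic 60/20/20 split."""
    n_train = int(len(rows) * 0.6)
    n_val = int(len(rows) * 0.2)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


splits = dataset_splits(list(range(10)))

# Unpack only the first two positions; the third element is simply not exposed.
extractors = expand_tuple_fields("X_train", "X_validation")
unpacked = {name: extract(splits) for name, extract in extractors.items()}

# Asking for a fourth field from a 3-tuple fails when that extractor runs.
too_many = expand_tuple_fields("a", "b", "c", "d")
try:
    too_many["d"](splits)
    overflow_rejected = False
except ValueError:
    overflow_rejected = True
```

In this sketch the full tuple stays available as ``splits``, mirroring the documented point that the original node can still be queried alongside the unpacked fields.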
