This is an automated email from the ASF dual-hosted git repository.

skrawcz pushed a commit to branch update_references
in repository https://gitbox.apache.org/repos/asf/hamilton.git
commit a319142643ea68ee99cd22297d1a7afb599ef887
Author: Stefan Krawczyk <[email protected]>
AuthorDate: Sun Jun 22 20:58:17 2025 -0700

    Reduces overuse of Apache Hamilton

    Since in most cases Hamilton suffices when describing concepts within Hamilton.
---
 docs/code-comparisons/langchain.rst                | 26 +++++++++----------
 docs/concepts/best-practices/function-naming.rst   |  2 +-
 .../best-practices/migrating-to-hamilton.rst       |  2 +-
 .../best-practices/output-immutability.rst         |  2 +-
 .../using-within-your-etl-system.rst               |  4 +--
 docs/concepts/driver.rst                           |  2 +-
 docs/concepts/function-modifiers.rst               |  8 +++---
 docs/concepts/glossary.rst                         |  8 +++---
 docs/concepts/materialization.rst                  |  2 +-
 docs/concepts/node.rst                             |  2 +-
 docs/concepts/parallel-task.rst                    |  2 +-
 docs/concepts/visualization.rst                    |  2 +-
 docs/how-tos/load-data.rst                         |  2 +-
 docs/how-tos/pre-commit-hooks.md                   |  2 +-
 docs/how-tos/use-in-jupyter-notebook.md            | 30 +++++++++----------
 docs/how-tos/wrapping-driver.rst                   | 12 ++++-----
 docs/integrations/fastapi.md                       |  4 +--
 docs/integrations/streamlit.md                     |  2 +-
 docs/reference/disabling-telemetry.md              |  2 +-
 docs/reference/drivers/index.rst                   |  2 +-
 docs/reference/graph-adapters/DaskGraphAdapter.rst |  2 +-
 .../species_distribution_modeling/README.md        |  2 +-
 writeups/garbage_collection/post.md                |  2 +-
 23 files changed, 62 insertions(+), 62 deletions(-)

diff --git a/docs/code-comparisons/langchain.rst b/docs/code-comparisons/langchain.rst
index 6bff3018..ba57b317 100644
--- a/docs/code-comparisons/langchain.rst
+++ b/docs/code-comparisons/langchain.rst
@@ -33,11 +33,11 @@ A simple joke example
 
 .. figure:: langchain_snippets/hamilton-invoke.png
-    :alt: Structure of the Apache Hamilton DAG
+    :alt: Structure of the Hamilton DAG
     :align: center
     :width: 50%
 
-    The Apache Hamilton DAG visualized.
+    The Hamilton DAG visualized.
 
 -----------------------
 A streamed joke example
 -----------------------
@@ -57,11 +57,11 @@ Note: you could use @config.when to include both streamed and non-streamed versi
 
 .. figure:: langchain_snippets/hamilton-streamed.png
-    :alt: Structure of the Apache Hamilton DAG
+    :alt: Structure of the Hamilton DAG
     :align: center
     :width: 50%
 
-    The Apache Hamilton DAG visualized.
+    The Hamilton DAG visualized.
 
 -------------------------------
 A "batch" parallel joke example
 -------------------------------
@@ -82,18 +82,18 @@ e.g. Ray, Dask, etc. We use multi-threading here.
 
 .. figure:: langchain_snippets/hamilton-batch.png
-    :alt: Structure of the Apache Hamilton DAG
+    :alt: Structure of the Hamilton DAG
     :align: center
     :width: 75%
 
-    The Apache Hamilton DAG visualized.
+    The Hamilton DAG visualized.
 
 ----------------------
 A "async" joke example
 ----------------------
 
 Here we show how to make the joke using async constructs. With Apache Hamilton
 you can mix and match async and regular functions, the only change
-is that you need to use the async Apache Hamilton Driver.
+is that you need to use the async Hamilton Driver.
 
 .. table:: Async Version
    :align: left
@@ -107,11 +107,11 @@ is that you need to use the async Apache Hamilton Driver.
 
 .. figure:: langchain_snippets/hamilton-async.png
-    :alt: Structure of the Apache Hamilton DAG
+    :alt: Structure of the Hamilton DAG
     :align: center
     :width: 50%
 
-    The Apache Hamilton DAG visualized.
+    The Hamilton DAG visualized.
 
 
 ---------------------------------
@@ -133,11 +133,11 @@ that uses the different OpenAI model.
 
 .. figure:: langchain_snippets/hamilton-completion.png
-    :alt: Structure of the Apache Hamilton DAG
+    :alt: Structure of the Hamilton DAG
     :align: center
     :width: 50%
 
-    The Apache Hamilton DAG visualized with configuration provided for the completion path. Note the dangling node - that's normal, it's not used in the completion path.
+    The Hamilton DAG visualized with configuration provided for the completion path. Note the dangling node - that's normal, it's not used in the completion path.
 
 
 ---------------------------------
@@ -160,11 +160,11 @@ to use Anthropic.
 
 .. figure:: langchain_snippets/hamilton-anthropic.png
-    :alt: Structure of the Apache Hamilton DAG
+    :alt: Structure of the Hamilton DAG
     :align: center
     :width: 50%
 
-    The Apache Hamilton DAG visualized with configuration provided to use Anthropic.
+    The Hamilton DAG visualized with configuration provided to use Anthropic.
 
 
 ---------------------------------
diff --git a/docs/concepts/best-practices/function-naming.rst b/docs/concepts/best-practices/function-naming.rst
index 3255d4cd..7fb1f569 100644
--- a/docs/concepts/best-practices/function-naming.rst
+++ b/docs/concepts/best-practices/function-naming.rst
@@ -35,7 +35,7 @@ When people come to encounter your code, they'll need to understand it, add to i
 You'll want to ensure some standardization to enable:
 
 #. Mapping business concepts to function names. E.g. That will help people to find things in the code that map to things that happen within your business.
-#. Ensuring naming uniformity across the code base. People usually follow the precedent of the code around them, so if everything in a particular module for say, date features, has a ``D_`` prefix, then they will likely follow that naming convention. This is likely something you will iterate on -- and it's best to try to converge on a team naming convention once you have a feel for the Apache Hamilton functions being written by the team.
+#. Ensuring naming uniformity across the code base. People usually follow the precedent of the code around them, so if everything in a particular module for say, date features, has a ``D_`` prefix, then they will likely follow that naming convention. This is likely something you will iterate on -- and it's best to try to converge on a team naming convention once you have a feel for the Hamilton functions being written by the team.
 
 We suggest that long functions names that are separated by ``_`` aren't a bad thing. E.g.
 if you were to come across a function named ``life_time_value`` versus ``ltv`` versus ``l_t_v``, which one is more obvious as to what it is and what
diff --git a/docs/concepts/best-practices/migrating-to-hamilton.rst b/docs/concepts/best-practices/migrating-to-hamilton.rst
index 3a656e1a..d5ef913b 100644
--- a/docs/concepts/best-practices/migrating-to-hamilton.rst
+++ b/docs/concepts/best-practices/migrating-to-hamilton.rst
@@ -25,7 +25,7 @@ change those systems to be able to use Apache Hamilton. If that's the case, then
 
 Specifically, this custom wrapper object class's purpose is to match your existing API expectations. It will act as
 the translation layer from your existing API expectations, to what running Apache Hamilton requires, and back. In Apache Hamilton
-terminology, this is a `Custom Driver Wrapper`, since it wraps around the Apache Hamilton Driver class.
+terminology, this is a `Custom Driver Wrapper`, since it wraps around the Hamilton Driver class.
 
 .. image:: ../../_static/Hamilton_ApplyMeetup_2022_wrapper.svg
     :alt: The wrapper driver class helps ensure your existing API expectations are matched.
diff --git a/docs/concepts/best-practices/output-immutability.rst b/docs/concepts/best-practices/output-immutability.rst
index 4bdf218c..4ee172d3 100644
--- a/docs/concepts/best-practices/output-immutability.rst
+++ b/docs/concepts/best-practices/output-immutability.rst
@@ -12,7 +12,7 @@ output of a function is immutable, then there's only one place it was created; i
 provides a great debugging experience if there are ever issues in your dataflow.
 
 We believe that by default, one should always strive for immutability of outputs.
-However, it is up to you, the Apache Hamilton function writer, to ensure that immutability is something that is adhered to.
+However, it is up to you, the Hamilton function writer, to ensure that immutability is something that is adhered to.
 
 Best practice:
 --------------
diff --git a/docs/concepts/best-practices/using-within-your-etl-system.rst b/docs/concepts/best-practices/using-within-your-etl-system.rst
index 444b9a48..b41634a9 100644
--- a/docs/concepts/best-practices/using-within-your-etl-system.rst
+++ b/docs/concepts/best-practices/using-within-your-etl-system.rst
@@ -34,8 +34,8 @@ Compatibility Matrix
 ETL Recipe
 ----------
 
-#. Write Apache Hamilton functions & `“driver”` code.
-#. Publish your Apache Hamilton functions in a package, or import via other means (e.g. checkout a repository & include in python path).
+#. Write Hamilton functions & `“driver”` code.
+#. Publish your Hamilton functions in a package, or import via other means (e.g. checkout a repository & include in python path).
 #. Include `sf-hamilton` as a python dependency
 #. Have your ETL system execute your “driver” code.
 #. Profit.
diff --git a/docs/concepts/driver.rst b/docs/concepts/driver.rst
index 1ddc4fc5..b07e4cc6 100644
--- a/docs/concepts/driver.rst
+++ b/docs/concepts/driver.rst
@@ -2,7 +2,7 @@
 Driver
 ======
 
-Once you defined your dataflow in a Python module, you need to create a Apache Hamilton Driver to execute it. This page details the Driver basics, which include:
+Once you defined your dataflow in a Python module, you need to create a Hamilton Driver to execute it. This page details the Driver basics, which include:
 
 1. Defining the Driver
 2. Visualizing the dataflow
diff --git a/docs/concepts/function-modifiers.rst b/docs/concepts/function-modifiers.rst
index 0e16c781..cbe97d06 100644
--- a/docs/concepts/function-modifiers.rst
+++ b/docs/concepts/function-modifiers.rst
@@ -2,9 +2,9 @@
 Function modifiers
 ==================
 
-In :doc:`node`, we discussed how to write Python functions to define Apache Hamilton nodes and dataflow. In the basic case, each function defines one node.
+In :doc:`node`, we discussed how to write Python functions to define Hamilton nodes and dataflow. In the basic case, each function defines one node.
 
-Yet, it's common to need nodes with similar purposes but different dependencies, such as preprocessing training and evaluation datasets. In that case, using a **function modifier** can help create both nodes from a single Apache Hamilton function!
+Yet, it's common to need nodes with similar purposes but different dependencies, such as preprocessing training and evaluation datasets. In that case, using a **function modifier** can help create both nodes from a single Hamilton function!
 
 On this page, you'll learn:
 
@@ -41,7 +41,7 @@ Function modifiers were designed to have clear semantics, so you should be able
 Reminder: Anatomy of a node
 ---------------------------
 
-This section from the page :doc:`node` details how a Python function maps to a Apache Hamilton node. We'll reuse these terms to explain the function modifiers.
+This section from the page :doc:`node` details how a Python function maps to a Hamilton node. We'll reuse these terms to explain the function modifiers.
 
 .. image:: ../_static/function_anatomy.png
     :scale: 13%
@@ -161,7 +161,7 @@ The next snippet checks if the returned Series is of type ``np.int32``, which is
 
 - To see all available validators, go to the file ``hamilton/data_quality/default_validators.py`` and view the variable ``AVAILABLE_DEFAULT_VALIDATORS``.
-- The function modifier ``@check_output_custom`` allows you to define your own validator. Validators inherit the ``base.BaseDefaultValidator`` class and are essentially standardized Apache Hamilton node definitions (instead of functions). See ``hamilton/data_quality/default_validators.py`` or reach out on `Slack <https://join.slack.com/t/hamilton-opensource/shared_invite/zt-2niepkra8-DGKGf_tTYhXuJWBTXtIs4g>`_ for help!
+- The function modifier ``@check_output_custom`` allows you to define your own validator. Validators inherit the ``base.BaseDefaultValidator`` class and are essentially standardized Hamilton node definitions (instead of functions). See ``hamilton/data_quality/default_validators.py`` or reach out on `Slack <https://join.slack.com/t/hamilton-opensource/shared_invite/zt-2niepkra8-DGKGf_tTYhXuJWBTXtIs4g>`_ for help!
 - Note: ``@check_output_custom`` decorators cannot be stacked, but they instead can take multiple validators.
 
 .. note::
diff --git a/docs/concepts/glossary.rst b/docs/concepts/glossary.rst
index b04f578b..3addd578 100644
--- a/docs/concepts/glossary.rst
+++ b/docs/concepts/glossary.rst
@@ -16,13 +16,13 @@ Before we dive into the concepts, let's clarify the terminology we'll be using:
       the other), acyclic, (there are no cycles, i.e., no function runs before itself), and a graph (it is easily \
       naturally represented by nodes and edges) and can be represented visually. See :doc:`node`.
   * - Node |
-      Apache Hamilton node |
+      Hamilton node |
       Transform
     - A single step in the dataflow DAG representing a computation -- usually 1:1 with functions but decorators break that \
       pattern -- in which case multiple transforms trace back to a single function. See :doc:`node`.
   * - Function |
       Python function |
-      Apache Hamilton function |
+      Hamilton function |
       Node definition
     - A Python function written by a user to create a single node (in the standard case) or \
       many (using function modifiers). See :doc:`node`.
@@ -30,7 +30,7 @@ Before we dive into the concepts, let's clarify the terminology we'll be using:
       Python module
     - Python code organized into a ``.py`` file. These are natural groupings of functions that turn to a set of nodes. See :doc:`best-practices/code-organization` for more details.
   * - Driver |
-      Apache Hamilton Driver
+      Hamilton Driver
     - An object that loads Python modules to build a dataflow. It is responsible for visualizing and executing the \
       dataflow. See :doc:`driver`.
  * - script |
@@ -41,4 +41,4 @@ Before we dive into the concepts, let's clarify the terminology we'll be using:
     - Data that dictates the way the DAG is constructed. See :doc:`driver`.
   * - Function modifiers |
       Decorators
-    - A function that modifies how your Apache Hamilton function is compiled into a Apache Hamilton node. See :doc:`function-modifiers`.
+    - A function that modifies how your Hamilton function is compiled into a Hamilton node. See :doc:`function-modifiers`.
diff --git a/docs/concepts/materialization.rst b/docs/concepts/materialization.rst
index b12d7680..426b0c25 100644
--- a/docs/concepts/materialization.rst
+++ b/docs/concepts/materialization.rst
@@ -113,7 +113,7 @@ The dataflow is executed by passing ``from_`` and ``to`` objects to ``Driver.mat
 Function modifiers
 ~~~~~~~~~~~~~~~~~~
 
-By adding ``@load_from`` and ``@save_to`` function modifiers (:ref:`loader-saver-decorators`) to Apache Hamilton functions, materializers are generated when using ``Builder.with_modules()``. This approach ressembles **1) from Driver**:
+By adding ``@load_from`` and ``@save_to`` function modifiers (:ref:`loader-saver-decorators`) to Hamilton functions, materializers are generated when using ``Builder.with_modules()``. This approach ressembles **1) from Driver**:
 
 .. note::
diff --git a/docs/concepts/node.rst b/docs/concepts/node.rst
index e270a60b..89d06418 100644
--- a/docs/concepts/node.rst
+++ b/docs/concepts/node.rst
@@ -75,7 +75,7 @@ A node is a single "operation" or "step" in a dataflow. Apache Hamilton users wr
 Anatomy of a node
 ~~~~~~~~~~~~~~~~~
 
-The following figure and table detail how a Python function maps to a Apache Hamilton node.
+The following figure and table detail how a Python function maps to a Hamilton node.
 
 .. image:: ../_static/function_anatomy.png
diff --git a/docs/concepts/parallel-task.rst b/docs/concepts/parallel-task.rst
index e5bdfa11..54796b86 100644
--- a/docs/concepts/parallel-task.rst
+++ b/docs/concepts/parallel-task.rst
@@ -12,7 +12,7 @@ The adapter approach effectively farms out the execution of each node/function t
 futures. That is, Apache Hamilton walks the DAG and submits each node to the adapter, which then submits the node for execution,
 and internally the execution resolves any Futures from prior submitted nodes.
 
-To make use of this, the general pattern is you apply an adapter to the driver and don't need to touch your Apache Hamilton functions!:
+To make use of this, the general pattern is you apply an adapter to the driver and don't need to touch your Hamilton functions!:
 
 .. code-block:: python
diff --git a/docs/concepts/visualization.rst b/docs/concepts/visualization.rst
index 83a037ce..407e7c6f 100644
--- a/docs/concepts/visualization.rst
+++ b/docs/concepts/visualization.rst
@@ -124,7 +124,7 @@ Learn more about :doc:`materialization`.
 View node dependencies
 ----------------------
 
-Representing data pipelines, ML experiments, or LLM applications as a dataflow helps reason about the dependencies between operations. The Apache Hamilton Driver has the following utilities to select and return a list of nodes (to learn more :doc:`../how-tos/use-hamilton-for-lineage`):
+Representing data pipelines, ML experiments, or LLM applications as a dataflow helps reason about the dependencies between operations. The Hamilton Driver has the following utilities to select and return a list of nodes (to learn more :doc:`../how-tos/use-hamilton-for-lineage`):
 
 - ``.what_is_upstream_of(*node_names: str)``
 - ``.what_is_downstream_of(*node_names: str)``
diff --git a/docs/how-tos/load-data.rst b/docs/how-tos/load-data.rst
index d0c9675a..cb2fec5a 100644
--- a/docs/how-tos/load-data.rst
+++ b/docs/how-tos/load-data.rst
@@ -2,7 +2,7 @@
 Loading data
 ==================
 
-While we've been injecting data in from the driver in previous examples, Apache Hamilton functions are fully capable of loading their own data.
+While we've been injecting data in from the driver in previous examples, Hamilton functions are fully capable of loading their own data.
 In the following example, we'll show how to use Apache Hamilton to:
 
 1. Load data from an external source (CSV file and duckdb database)
diff --git a/docs/how-tos/pre-commit-hooks.md b/docs/how-tos/pre-commit-hooks.md
index b7c519c0..a2856cfc 100644
--- a/docs/how-tos/pre-commit-hooks.md
+++ b/docs/how-tos/pre-commit-hooks.md
@@ -65,7 +65,7 @@ Apache Hamilton doesn't have many syntactic constraints, but there's a few thing
 - functions with a name starting with underscore (`_`) are ignored from the dataflow
 - functions with a `@config` decorator received a trailing double underscore with a suffix (e.g., `hello__weekday()`, `hello__weekend()`)
 
-Instead of reimplementing this logic, we can try to build the Apache Hamilton Driver with the command `hamilton build MODULES` and catch errors. This also ensures the verification is always in sync with the actual build mechanism. This hook will help prevent us from committing invalid dataflow definitions.
+Instead of reimplementing this logic, we can try to build the Hamilton Driver with the command `hamilton build MODULES` and catch errors. This also ensures the verification is always in sync with the actual build mechanism. This hook will help prevent us from committing invalid dataflow definitions.
 
 ### Checking dataflow paths
 
 A dataflow definition might be valid, but it might break paths in unexpected ways. The command `hamilton validate` (which internally uses `Driver.validate_execution()`) can check if a node is reachable.
diff --git a/docs/how-tos/use-in-jupyter-notebook.md b/docs/how-tos/use-in-jupyter-notebook.md
index cd44a0da..5ad41652 100644
--- a/docs/how-tos/use-in-jupyter-notebook.md
+++ b/docs/how-tos/use-in-jupyter-notebook.md
@@ -6,10 +6,10 @@ There are two main ways to use Apache Hamilton in a notebook.
 2. Import modules into the notebook.
 
 ## 1 - Dynamically create modules within your notebook
-There's two main ways, using the Apache Hamilton Jupyter magic, or using `ad_hoc_utils` to create a temporary module.
+There's two main ways, using the Hamilton Jupyter magic, or using `ad_hoc_utils` to create a temporary module.
 
-### Use Apache Hamilton Jupyter Magic
-The Apache Hamilton Jupyter magic allows you to dynamically create a module from a cell in your notebook. This is useful for quick iteration and development.
+### Use Hamilton Jupyter Magic
+The Hamilton Jupyter magic allows you to dynamically create a module from a cell in your notebook. This is useful for quick iteration and development.
 Once you're then happy, it's easy to then write out a module with the functions you've developed using `%%writefile` magic.
 
 To load the magic:
@@ -54,7 +54,7 @@ Once you're happy with the functions you've developed, you can then write them o
 
 #### Importing specific functions into cell modules
 
-If you import parts of modules in a Apache Hamilton Jupyter Magic cell, these will need to be reloaded when changes are made to their source. This can be done either by restarting the kernel or with the help of importlib.reload:
+If you import parts of modules in a Hamilton Jupyter Magic cell, these will need to be reloaded when changes are made to their source. This can be done either by restarting the kernel or with the help of importlib.reload:
 
 ```python
 %%cell_to_module MODULE_NAME
@@ -95,7 +95,7 @@ temp_module = ad_hoc_utils.create_temporary_module(
     log_avg_3wk_spend, module_name='function_example')
 ```
 
-You can now treat `temp_module` like a python module and pass it to your driver and use Apache Hamilton like normal:
+You can now treat `temp_module` like a python module and pass it to your driver and use Hamilton like normal:
 
 ```python
 # Step 3 - add the module to the driver and continue as usual
@@ -125,17 +125,17 @@ Then to start the notebook server it should just be:
 ### Step 2— Set up the files <a href="#57fe" id="57fe"></a>
 
 1. Start up your Jupyter notebook.
-2. Go to the directory where you want your notebook and Apache Hamilton function module(s) to live.
+2. Go to the directory where you want your notebook and Hamilton function module(s) to live.
 3. Create a python file(s). Do that by going to “New > text file”. It’ll open a “file” editor view. Name the file and give it a `.py` extension. Once you save it, you’ll see that jupyter now provides python syntax highlighting. Keep this tab open, so you can flip back to it to edit this file.
 4. Start up a notebook that you will use in another browser tab.
 
 ### Step 3— The basic process of iteration <a href="#e434" id="e434"></a>
 
-At a high level, you will be switching back and forth between your tabs. You will add functions to your Apache Hamilton function python module, and then import/reimport that module into your notebook to get the changes. From there you will then use Apache Hamilton as usual to run and execute things and the notebook for all the standard things you use notebooks for.
+At a high level, you will be switching back and forth between your tabs. You will add functions to your Hamilton function python module, and then import/reimport that module into your notebook to get the changes. From there you will then use Apache Hamilton as usual to run and execute things and the notebook for all the standard things you use notebooks for.
 
 Let’s walk through an example.
 
-Here’s a function I added to our Apache Hamilton function module. I named the module `some_functions.py` (obviously choose a better name for your situation).
+Here’s a function I added to our Hamilton function module. I named the module `some_functions.py` (obviously choose a better name for your situation).
 
 ```python
 import pandas as pd
@@ -146,7 +146,7 @@ def avg_3wk_spend(spend: pd.Series) -> pd.Series:
     return spend.rolling(3).mean()
 ```
 
-And here’s what I set up in my notebook to be able to use Apache Hamilton and import this module:
+And here’s what I set up in my notebook to be able to use Hamilton and import this module:
 
 Cell 1: This just imports the base things we need; see the pro-tip at the bottom of this page for how to automatically reload changes.
 
@@ -156,7 +156,7 @@ import pandas as pd
 from hamilton import driver
 ```
 
-Cell 2: Import your Apache Hamilton function module(s)
+Cell 2: Import your Hamilton function module(s)
 
 ```python
 # import your hamilton function module(s) here
@@ -172,7 +172,7 @@ importlib.reload(some_functions)
 
 What this will do is reload the module, and therefore make sure the code is up to date for you to use.
 
-Cell 4: Use Apache Hamilton
+Cell 4: Use Hamilton
 
 ```python
 config = {}
@@ -183,7 +183,7 @@ df = dr.execute(['avg_3wk_spend'], inputs=input_data)
 
 You should see `foo` printed as an output after running this cell.
 
-Okay, so let’s now say we’re iterating on our Apache Hamilton functions. Go to your Apache Hamilton function module (`some_functions.py` in this example) in your other browser tab, and change the `print("foo")` to something else, e.g. `print("foo-bar").` Save the file — it should look something like this:
+Okay, so let’s now say we’re iterating on our Hamilton functions. Go to your Hamilton function module (`some_functions.py` in this example) in your other browser tab, and change the `print("foo")` to something else, e.g. `print("foo-bar").` Save the file — it should look something like this:
 
 ```python
 def avg_3wk_spend(spend: pd.Series) -> pd.Series:
@@ -231,15 +231,15 @@ hamilton_driver.execute(['desired_output1', 'desired_output2'])
 
 You'd then follow the following process:
 
 1. Write your data transformation in the open python module
-2. In the notebook, instantiate a Apache Hamilton driver and test the DAG with a small subset of data. 
-3. Because of %autoreload, the module is reimported with the latest changes each time the Apache Hamilton DAG is executed. This approach prevents out-of-order notebook executions, and functions always reside in clean .py files.
+2. In the notebook, instantiate a Hamilton Driver and test the DAG with a small subset of data. 
+3. Because of %autoreload, the module is reimported with the latest changes each time the Hamilton DAG is executed. This approach prevents out-of-order notebook executions, and functions always reside in clean .py files.
 
 Credit: [Thierry Jean's blog post](https://medium.com/@thijean/the-perks-of-creating-dataflows-with-hamilton-36e8c56dd2a).
 
 ## Pro-tip: You can import functions directly <a href="#2e10" id="2e10"></a>
 
-The nice thing about forcing Apache Hamilton functions into a module, is that it’s very easy to re-use in another context. E.g. another notebook, or directly.
+The nice thing about forcing Hamilton functions into a module, is that it’s very easy to re-use in another context. E.g. another notebook, or directly.
 
 For example, it is easy to directly use the functions in the notebook, like so:
diff --git a/docs/how-tos/wrapping-driver.rst b/docs/how-tos/wrapping-driver.rst
index 2f8bbb32..21ae8843 100644
--- a/docs/how-tos/wrapping-driver.rst
+++ b/docs/how-tos/wrapping-driver.rst
@@ -1,11 +1,11 @@
 Wrapping the Driver
 ------------------------------
 
-The APIs that the Apache Hamilton Driver is built on, are considered internal. So it is possible for you to define your own
-driver in place of the stock Apache Hamilton driver, we suggest the following path if you don't like how the current Apache Hamilton
+The APIs that the Hamilton Driver is built on, are considered internal. So it is possible for you to define your own
+driver in place of the stock Hamilton Driver, we suggest the following path if you don't like how the current Apache Hamilton
 Driver interface is designed:
 
-`Write a "Wrapper" class that delegates to the Apache Hamilton Driver.`
+`Write a "Wrapper" class that delegates to the Hamilton Driver.`
 
 i.e.
 
@@ -21,11 +21,11 @@ i.e.
         # ...
         def my_execute_function(self, arg1, arg2, ...):
-            """What actually calls the Apache Hamilton"""
+            """What actually calls the Hamilton"""
             dr = driver.Driver(self.constructor_arg, ...)
             df = dr.execute(self.outputs)
             return self.augmetn(df)
 
-That way, you can create the right API constructs to invoke Apache Hamilton in your context, and then delegate to the stock
-Apache Hamilton Driver. By doing so, it will ensure that your code continues to work, since we intend to honor the Apache Hamilton
+That way, you can create the right API constructs to invoke Hamilton in your context, and then delegate to the stock
+Hamilton Driver. By doing so, it will ensure that your code continues to work, since we intend to honor the Hamilton
 Driver APIs with backwards compatibility as much as possible.
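For readers skimming this change, the "Custom Driver Wrapper" pattern that `docs/how-tos/wrapping-driver.rst` describes can be sketched roughly as below. All names here are illustrative, and `StubDriver` is a hypothetical stand-in (not Hamilton's actual API) so the sketch is self-contained; with Hamilton installed you would delegate to `hamilton.driver.Driver` instead.

```python
class StubDriver:
    """Hypothetical stand-in for hamilton.driver.Driver (illustration only)."""

    def __init__(self, config, *modules):
        self.config = config
        self.modules = modules

    def execute(self, outputs, inputs=None):
        # The real Driver would run the DAG; here we just echo the request.
        return {name: f"computed:{name}" for name in outputs}


class MyWrapper:
    """Translation layer matching your existing API expectations."""

    def __init__(self, config, outputs):
        # Real code would do: self._dr = driver.Driver(config, my_module)
        self._dr = StubDriver(config)
        self._outputs = list(outputs)

    def run(self, **inputs):
        results = self._dr.execute(self._outputs, inputs=inputs)
        # Post-process into whatever shape existing callers expect.
        return [results[name] for name in self._outputs]


wrapper = MyWrapper(config={}, outputs=["a", "b"])
print(wrapper.run())  # ['computed:a', 'computed:b']
```

The point of the pattern is that only `MyWrapper` touches the Driver API, so callers are insulated if that API ever shifts.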
diff --git a/docs/integrations/fastapi.md b/docs/integrations/fastapi.md
index 9517bf14..8b0626d9 100644
--- a/docs/integrations/fastapi.md
+++ b/docs/integrations/fastapi.md
@@ -38,7 +38,7 @@ FastAPI already does a great job at automating API documentation by integrating
 ## Apache Hamilton + FastAPI
 Adding Apache Hamilton to your FastAPI server can provide a better separation between the dataflow and the API endpoints. Each endpoint can use `Driver.execute()` to request variables and wrap results into an HTTP response. Then, data transformations and interactions with resources (e.g., database, web service) are curated into standalone Python modules and decoupled from the server code.
 
-Since Apache Hamilton dataflows will run the same way inside or outside FastAPI, you can write simpler unit tests for Apache Hamilton functions without defining a mock server and client. Additionnally, visualizations for the defined Apache Hamilton dataflows can be added to the FastAPI [Swagger UI documentation](https://fastapi.tiangolo.com/features/#automatic-docs). They will remain in sync with the API behavior because they are generated from the code.
+Since Apache Hamilton dataflows will run the same way inside or outside FastAPI, you can write simpler unit tests for Hamilton functions without defining a mock server and client. Additionnally, visualizations for the defined Apache Hamilton dataflows can be added to the FastAPI [Swagger UI documentation](https://fastapi.tiangolo.com/features/#automatic-docs). They will remain in sync with the API behavior because they are generated from the code.
 
 ### Example
 In this example, we'll build a backend for a PDF summarizer application.
@@ -97,7 +97,7 @@ import summarization
 
 app = FastAPI()
 
-# build the Apache Hamilton driver with the summarization module
+# build the Hamilton Driver with the summarization module
 dr = (
     driver.Builder()
     .with_modules(summarization)
diff --git a/docs/integrations/streamlit.md b/docs/integrations/streamlit.md
index db391b28..65e3db26 100644
--- a/docs/integrations/streamlit.md
+++ b/docs/integrations/streamlit.md
@@ -192,4 +192,4 @@ if __name__ == "__main__":
 - **Reusable code**: the module `logic.py` can be reused elsewhere with Apache Hamilton.
     - If you are building a proof-of-concept with Streamlit, your Apache Hamilton module will be able to grow with your project and be useful for your production pipelines.
     - If you are already building dataflows with Apache Hamilton, using it with Streamlit ensures your dashboard metrics have the same implementation with your production pipeline (i.e., prevent [implementation skew](https://building.nubank.com.br/dealing-with-train-serve-skew-in-real-time-ml-models-a-short-guide/))
-- **Performance boost**: by caching the Apache Hamilton Driver and its execution call, we are able to effectively cache all data operations in a few lines of code. Furthermore, Apache Hamilton can scale further by using a remote task executor on a separate machine from the Streamlit application.
+- **Performance boost**: by caching the Hamilton Driver and its execution call, we are able to effectively cache all data operations in a few lines of code. Furthermore, Apache Hamilton can scale further by using a remote task executor on a separate machine from the Streamlit application.
diff --git a/docs/reference/disabling-telemetry.md b/docs/reference/disabling-telemetry.md
index 03e9d4f6..38d5783d 100644
--- a/docs/reference/disabling-telemetry.md
+++ b/docs/reference/disabling-telemetry.md
@@ -1,7 +1,7 @@
 # Telemetry
 If you do not wish to participate in telemetry capture, one can opt-out with one of the following methods:
-1. Set it to false programmatically in your code before creating a Apache Hamilton driver:
+1. Set it to false programmatically in your code before creating a Hamilton Driver:
 ```python
 from hamilton import telemetry
 telemetry.disable_telemetry()
diff --git a/docs/reference/drivers/index.rst b/docs/reference/drivers/index.rst
index 491df069..0c534006 100644
--- a/docs/reference/drivers/index.rst
+++ b/docs/reference/drivers/index.rst
@@ -10,7 +10,7 @@ It's highly parameterizable, allowing you to customize:
 To tune the above, pass in a Graph Adapter, a Result Builder, and/or anotehr lifecycle method -- see
 :doc:`../result-builders/index`, :doc:`../graph-adapters/index`.
 
-Let's walk through how you might use the Apache Hamilton Driver.
+Let's walk through how you might use the Hamilton Driver.
 
 Instantiation
 =============
diff --git a/docs/reference/graph-adapters/DaskGraphAdapter.rst b/docs/reference/graph-adapters/DaskGraphAdapter.rst
index e51a73a1..d22f1ec4 100644
--- a/docs/reference/graph-adapters/DaskGraphAdapter.rst
+++ b/docs/reference/graph-adapters/DaskGraphAdapter.rst
@@ -2,7 +2,7 @@
 h_dask.DaskGraphAdapter
 =======================
 
-Runs the entire Apache Hamilton DAG on dask.
+Runs the entire Hamilton DAG on dask.
 
 .. autoclass:: hamilton.plugins.h_dask.DaskGraphAdapter
diff --git a/examples/scikit-learn/species_distribution_modeling/README.md b/examples/scikit-learn/species_distribution_modeling/README.md
index 0d084531..62be47b8 100644
--- a/examples/scikit-learn/species_distribution_modeling/README.md
+++ b/examples/scikit-learn/species_distribution_modeling/README.md
@@ -5,7 +5,7 @@ We translate the Species distribution modeling from scikit-learn into Apache Ham
 # Highlights
 
 Example of a simple ETL pipeline broken into modules with external couplings.
 
-1) To see how to couple external modules / source code and integrate it into a Apache Hamilton DAG check out `grids.py` or `preprocessing.py`, where we use `@pipe` to wrap and inject external functions as Apache Hamilton nodes.
+1) To see how to couple external modules / source code and integrate it into a Apache Hamilton DAG check out `grids.py` or `preprocessing.py`, where we use `@pipe` to wrap and inject external functions as Hamilton nodes.
 
 2) To see how to re-use functions check out `train_and_predict.py`, where we use `@pipe_output` to evaluate our model on the individual test and train datasets. 
diff --git a/writeups/garbage_collection/post.md b/writeups/garbage_collection/post.md
index d061eeff..8044d1e2 100644
--- a/writeups/garbage_collection/post.md
+++ b/writeups/garbage_collection/post.md
@@ -116,7 +116,7 @@ swap space, takes longer to handle memory retrieval/commitment, and eventually j
 Now that we have a grasp on the problem and a profiling methodology, let's dig in to see what we can fix. Luckily, the root of the issue
 was fairly clear.
 
-To execute a Apache Hamilton node and just its required upstream dependencies, we conduct a depth-first-traversal of the graph, storing results
+To execute a Hamilton node and just its required upstream dependencies, we conduct a depth-first-traversal of the graph, storing results
 we've realized in a `computed` dictionary. We use this for multiple purposes -- it can help us (a) avoid recomputing nodes and instead store
 the results, and (b) return the final results at the end that we need. The problem is that we held onto all results, regardless of
 whether we would need them later. In the (albeit contrived) script above, we only need the prior node in the chain to compute the current one.
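The depth-first traversal with a `computed` dictionary that the garbage-collection write-up describes can be sketched in plain Python. The toy graph and helper names below are hypothetical illustrations, not Hamilton internals; note how every intermediate result stays in `computed`, which is exactly the memory-retention behavior the post identifies as the problem.

```python
# Toy dataflow: node -> list of upstream dependencies.
graph = {
    "a": [],
    "b": ["a"],
    "c": ["b"],
}
# The computation each node performs on its dependencies' results.
funcs = {
    "a": lambda: 1,
    "b": lambda a: a + 1,
    "c": lambda b: b * 2,
}


def execute(node, computed):
    """Depth-first execution, memoizing results in `computed`."""
    if node in computed:  # avoid recomputing nodes
        return computed[node]
    dep_values = [execute(dep, computed) for dep in graph[node]]
    computed[node] = funcs[node](*dep_values)
    return computed[node]


computed = {}
print(execute("c", computed))  # 4
print(computed)  # {'a': 1, 'b': 2, 'c': 4} -- all intermediates retained
```

The fix the post goes on to describe amounts to dropping entries from `computed` once no remaining node depends on them, rather than retaining everything until the run finishes.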
