http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_sources/installation.rst.txt ---------------------------------------------------------------------- diff --git a/_sources/installation.rst.txt b/_sources/installation.rst.txt index 6d32c07..5faca5e 100644 --- a/_sources/installation.rst.txt +++ b/_sources/installation.rst.txt @@ -1,3 +1,20 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + Installation ------------ @@ -14,7 +31,7 @@ You can also install Airflow with support for extra features like ``s3`` or ``po .. code-block:: bash - pip install apache-airflow[postgres,s3] + pip install "apache-airflow[s3, postgres]" .. 
note:: GPL dependency @@ -41,67 +58,66 @@ Here's the list of the subpackages and what they enable: +---------------+----------------------------------------------+-------------------------------------------------+ | subpackage | install command | enables | +===============+==============================================+=================================================+ -| all | ``pip install apache-airflow[all]`` | All Airflow features known to man | -+---------------+----------------------------------------------+-------------------------------------------------+ -| all_dbs | ``pip install apache-airflow[all_dbs]`` | All databases integrations | +| all | ``pip install apache-airflow[all]`` | All Airflow features known to man | +---------------+----------------------------------------------+-------------------------------------------------+ -| async | ``pip install apache-airflow[async]`` | Async worker classes for Gunicorn | +| all_dbs | ``pip install apache-airflow[all_dbs]`` | All databases integrations | +---------------+----------------------------------------------+-------------------------------------------------+ -| celery | ``pip install apache-airflow[celery]`` | CeleryExecutor | +| async | ``pip install apache-airflow[async]`` | Async worker classes for gunicorn | +---------------+----------------------------------------------+-------------------------------------------------+ -| cloudant | ``pip install apache-airflow[cloudant]`` | Cloudant hook | +| devel | ``pip install apache-airflow[devel]`` | Minimum dev tools requirements | +---------------+----------------------------------------------+-------------------------------------------------+ -| crypto | ``pip install apache-airflow[crypto]`` | Encrypt connection passwords in metadata db | +| devel_hadoop | ``pip install apache-airflow[devel_hadoop]`` | Airflow + dependencies on the Hadoop stack | 
+---------------+----------------------------------------------+-------------------------------------------------+ -| devel | ``pip install apache-airflow[devel]`` | Minimum dev tools requirements | +| celery | ``pip install apache-airflow[celery]`` | CeleryExecutor | +---------------+----------------------------------------------+-------------------------------------------------+ -| devel_hadoop | ``pip install apache-airflow[devel_hadoop]`` | Airflow + dependencies on the Hadoop stack | +| crypto | ``pip install apache-airflow[crypto]`` | Encrypt connection passwords in metadata db | +---------------+----------------------------------------------+-------------------------------------------------+ -| druid | ``pip install apache-airflow[druid]`` | Druid related operators & hooks | +| druid | ``pip install apache-airflow[druid]`` | Druid.io related operators & hooks | +---------------+----------------------------------------------+-------------------------------------------------+ -| gcp_api | ``pip install apache-airflow[gcp_api]`` | Google Cloud Platform hooks and operators | +| gcp_api | ``pip install apache-airflow[gcp_api]`` | Google Cloud Platform hooks and operators | | | | (using ``google-api-python-client``) | +---------------+----------------------------------------------+-------------------------------------------------+ -| hdfs | ``pip install apache-airflow[hdfs]`` | HDFS hooks and operators | +| jdbc | ``pip install apache-airflow[jdbc]`` | JDBC hooks and operators | +---------------+----------------------------------------------+-------------------------------------------------+ -| hive | ``pip install apache-airflow[hive]`` | All Hive related operators | +| hdfs | ``pip install apache-airflow[hdfs]`` | HDFS hooks and operators | +---------------+----------------------------------------------+-------------------------------------------------+ -| jdbc | ``pip install apache-airflow[jdbc]`` | JDBC hooks and operators | +| hive | ``pip install 
apache-airflow[hive]`` | All Hive related operators | +---------------+----------------------------------------------+-------------------------------------------------+ -| kerberos | ``pip install apache-airflow[kerberos]`` | Kerberos integration for Kerberized Hadoop | +| kerberos | ``pip install apache-airflow[kerberos]`` | kerberos integration for kerberized hadoop | +---------------+----------------------------------------------+-------------------------------------------------+ -| ldap | ``pip install apache-airflow[ldap]`` | LDAP authentication for users | +| ldap | ``pip install apache-airflow[ldap]`` | ldap authentication for users | +---------------+----------------------------------------------+-------------------------------------------------+ -| mssql | ``pip install apache-airflow[mssql]`` | Microsoft SQL Server operators and hook, | +| mssql | ``pip install apache-airflow[mssql]`` | Microsoft SQL operators and hook, | | | | support as an Airflow backend | +---------------+----------------------------------------------+-------------------------------------------------+ -| mysql | ``pip install apache-airflow[mysql]`` | MySQL operators and hook, support as an Airflow | -| | | backend. The version of MySQL server has to be | -| | | 5.6.4+. The exact version upper bound depends | -| | | on version of ``mysqlclient`` package. For | -| | | example, ``mysqlclient`` 1.3.12 can only be | +| mysql | ``pip install apache-airflow[mysql]`` | MySQL operators and hook, support as | +| | | an Airflow backend. The version of MySQL server | +| | | has to be 5.6.4+. The exact version upper bound | +| | | depends on version of ``mysqlclient`` package. | +| | | For example, ``mysqlclient`` 1.3.12 can only be | | | | used with MySQL server 5.6.4 through 5.7.
| +---------------+----------------------------------------------+-------------------------------------------------+ -| password | ``pip install apache-airflow[password]`` | Password authentication for users | +| password | ``pip install apache-airflow[password]`` | Password Authentication for users | +---------------+----------------------------------------------+-------------------------------------------------+ -| postgres | ``pip install apache-airflow[postgres]`` | PostgreSQL operators and hook, support as an | -| | | Airflow backend | +| postgres | ``pip install apache-airflow[postgres]`` | Postgres operators and hook, support | +| | | as an Airflow backend | +---------------+----------------------------------------------+-------------------------------------------------+ -| qds | ``pip install apache-airflow[qds]`` | Enable QDS (Qubole Data Service) support | +| qds | ``pip install apache-airflow[qds]`` | Enable QDS (qubole data services) support | +---------------+----------------------------------------------+-------------------------------------------------+ -| rabbitmq | ``pip install apache-airflow[rabbitmq]`` | RabbitMQ support as a Celery backend | +| rabbitmq | ``pip install apache-airflow[rabbitmq]`` | Rabbitmq support as a Celery backend | +---------------+----------------------------------------------+-------------------------------------------------+ -| redis | ``pip install apache-airflow[redis]`` | Redis hooks and sensors | +| s3 | ``pip install apache-airflow[s3]`` | ``S3KeySensor``, ``S3PrefixSensor`` | +---------------+----------------------------------------------+-------------------------------------------------+ -| s3 | ``pip install apache-airflow[s3]`` | ``S3KeySensor``, ``S3PrefixSensor`` | +| samba | ``pip install apache-airflow[samba]`` | ``Hive2SambaOperator`` | +---------------+----------------------------------------------+-------------------------------------------------+ -| samba | ``pip install apache-airflow[samba]`` | 
``Hive2SambaOperator`` | +| slack | ``pip install apache-airflow[slack]`` | ``SlackAPIPostOperator`` | +---------------+----------------------------------------------+-------------------------------------------------+ -| slack | ``pip install apache-airflow[slack]`` | ``SlackAPIPostOperator`` | +| vertica | ``pip install apache-airflow[vertica]`` | Vertica hook | +| | | support as an Airflow backend | +---------------+----------------------------------------------+-------------------------------------------------+ -| ssh | ``pip install apache-airflow[ssh]`` | SSH hooks and Operator | +| cloudant | ``pip install apache-airflow[cloudant]`` | Cloudant hook | +---------------+----------------------------------------------+-------------------------------------------------+ -| vertica | ``pip install apache-airflow[vertica]`` | Vertica hook support as an Airflow backend | +| redis | ``pip install apache-airflow[redis]`` | Redis hooks and sensors | +---------------+----------------------------------------------+-------------------------------------------------+ Initiating Airflow Database
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_sources/integration.rst.txt ---------------------------------------------------------------------- diff --git a/_sources/integration.rst.txt b/_sources/integration.rst.txt index 4c513bf..1ab8e60 100644 --- a/_sources/integration.rst.txt +++ b/_sources/integration.rst.txt @@ -1,3 +1,20 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + Integration =========== @@ -64,6 +81,15 @@ Your reverse proxy (ex: nginx) should be configured as follow: } } +To ensure that Airflow generates URLs with the correct scheme when +running behind a TLS-terminating proxy, you should configure the proxy +to set the `X-Forwarded-Proto` header, and enable the `ProxyFix` +middleware in your `airflow.cfg`:: + + enable_proxy_fix = True + +Note: you should only enable the `ProxyFix` middleware when running +Airflow behind a trusted proxy (AWS ELB, nginx, etc.). .. _Azure: @@ -439,6 +465,135 @@ BigQueryHook .. autoclass:: airflow.contrib.hooks.bigquery_hook.BigQueryHook :members: +Cloud SQL +''''''''' + +Cloud SQL Operators +""""""""""""""""""" + +- :ref:`CloudSqlInstanceDatabaseDeleteOperator` : deletes a database from a Cloud SQL +instance. 
+- :ref:`CloudSqlInstanceDatabaseCreateOperator` : creates a new database inside a Cloud +SQL instance. +- :ref:`CloudSqlInstanceDatabasePatchOperator` : updates a database inside a Cloud +SQL instance. +- :ref:`CloudSqlInstanceDeleteOperator` : deletes a Cloud SQL instance. +- :ref:`CloudSqlInstanceCreateOperator` : creates a new Cloud SQL instance. +- :ref:`CloudSqlInstancePatchOperator` : patches a Cloud SQL instance. + +.. _CloudSqlInstanceDatabaseDeleteOperator: + +CloudSqlInstanceDatabaseDeleteOperator +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. autoclass:: airflow.contrib.operators.gcp_sql_operator.CloudSqlInstanceDatabaseDeleteOperator + +.. _CloudSqlInstanceDatabaseCreateOperator: + +CloudSqlInstanceDatabaseCreateOperator +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. autoclass:: airflow.contrib.operators.gcp_sql_operator.CloudSqlInstanceDatabaseCreateOperator + +.. _CloudSqlInstanceDatabasePatchOperator: + +CloudSqlInstanceDatabasePatchOperator +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. autoclass:: airflow.contrib.operators.gcp_sql_operator.CloudSqlInstanceDatabasePatchOperator + +.. _CloudSqlInstanceDeleteOperator: + +CloudSqlInstanceDeleteOperator +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. autoclass:: airflow.contrib.operators.gcp_sql_operator.CloudSqlInstanceDeleteOperator + +.. _CloudSqlInstanceCreateOperator: + +CloudSqlInstanceCreateOperator +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. autoclass:: airflow.contrib.operators.gcp_sql_operator.CloudSqlInstanceCreateOperator + +.. _CloudSqlInstancePatchOperator: + +CloudSqlInstancePatchOperator +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. autoclass:: airflow.contrib.operators.gcp_sql_operator.CloudSqlInstancePatchOperator + +Cloud SQL Hook """""""""""""" + +.. autoclass:: airflow.contrib.hooks.gcp_sql_hook.CloudSqlHook + :members: + +Compute Engine '''''''''''''' + +Compute Engine Operators """""""""""""""""""""""" + +- :ref:`GceInstanceStartOperator` : start an existing Google Compute Engine instance.
+- :ref:`GceInstanceStopOperator` : stop an existing Google Compute Engine instance. +- :ref:`GceSetMachineTypeOperator` : change the machine type for a stopped instance. + +.. _GceInstanceStartOperator: + +GceInstanceStartOperator +^^^^^^^^^^^^^^^^^^^^^^^^ + +.. autoclass:: airflow.contrib.operators.gcp_compute_operator.GceInstanceStartOperator + +.. _GceInstanceStopOperator: + +GceInstanceStopOperator +^^^^^^^^^^^^^^^^^^^^^^^ + +.. autoclass:: airflow.contrib.operators.gcp_compute_operator.GceInstanceStopOperator + +.. _GceSetMachineTypeOperator: + +GceSetMachineTypeOperator +^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. autoclass:: airflow.contrib.operators.gcp_compute_operator.GceSetMachineTypeOperator + + +Cloud Functions +''''''''''''''' + +Cloud Functions Operators +""""""""""""""""""""""""" + +- :ref:`GcfFunctionDeployOperator` : deploy Google Cloud Function to Google Cloud Platform +- :ref:`GcfFunctionDeleteOperator` : delete Google Cloud Function in Google Cloud Platform + +.. autoclass:: airflow.contrib.operators.gcp_operator.GCP + +.. _GcfFunctionDeployOperator: + +GcfFunctionDeployOperator +^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. autoclass:: airflow.contrib.operators.gcp_function_operator.GcfFunctionDeployOperator + + +.. _GcfFunctionDeleteOperator: + +GcfFunctionDeleteOperator +^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. autoclass:: airflow.contrib.operators.gcp_function_operator.GcfFunctionDeleteOperator + + +Cloud Functions Hook +"""""""""""""""""""" + +.. autoclass:: airflow.contrib.hooks.gcp_function_hook.GcfHook + :members: + Cloud DataFlow '''''''''''''' @@ -776,12 +931,6 @@ GKEClusterDeleteOperator .. autoclass:: airflow.contrib.operators.gcp_container_operator.GKEClusterDeleteOperator .. _GKEClusterDeleteOperator: -GKEPodOperator -^^^^^^^^^^^^^^ - -.. autoclass:: airflow.contrib.operators.gcp_container_operator.GKEPodOperator -.. 
_GKEPodOperator: - Google Kubernetes Engine Hook """"""""""""""""""""""""""""" http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_sources/kubernetes.rst.txt ---------------------------------------------------------------------- diff --git a/_sources/kubernetes.rst.txt b/_sources/kubernetes.rst.txt index a491685..372b27c 100644 --- a/_sources/kubernetes.rst.txt +++ b/_sources/kubernetes.rst.txt @@ -1,10 +1,28 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + Kubernetes Executor ^^^^^^^^^^^^^^^^^^^ The kubernetes executor is introduced in Apache Airflow 1.10.0. The Kubernetes executor will create a new pod for every task instance. Example helm charts are available at `scripts/ci/kubernetes/kube/{airflow,volumes,postgres}.yaml` in the source distribution. The volumes are optional and depend on your configuration. There are two volumes available: -- Dags: by storing all the dags onto the persistent disks, all the workers can read the dags from there. Another option is using git-sync, before starting the container, a git pull of the dags repository will be performed and used throughout the lifecycle of the pod/ + +- Dags: by storing all the dags onto the persistent disks, all the workers can read the dags from there. 
Another option is using git-sync, before starting the container, a git pull of the dags repository will be performed and used throughout the lifecycle of the pod. - Logs: by storing the logs onto a persistent disk, all the logs will be available for all the workers and the webserver itself. If you don't configure this, the logs will be lost after the worker pods shut down. Another option is to use S3/GCS/etc to store the logs. @@ -81,6 +99,14 @@ Kubernetes Operator } } + tolerations = [ + { + 'key': "key", + 'operator': 'Equal', + 'value': 'value' + } + ] + k = KubernetesPodOperator(namespace='default', image="ubuntu:16.04", cmds=["bash", "-cx"], @@ -91,11 +117,13 @@ Kubernetes Operator volume_mounts=[volume_mount], name="test", task_id="task", - affinity=affinity + affinity=affinity, + is_delete_operator_pod=True, + hostnetwork=False, + tolerations=tolerations ) .. autoclass:: airflow.contrib.operators.kubernetes_pod_operator.KubernetesPodOperator .. autoclass:: airflow.contrib.kubernetes.secret.Secret - http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_sources/license.rst.txt ---------------------------------------------------------------------- diff --git a/_sources/license.rst.txt b/_sources/license.rst.txt index 3c53035..bcb2b76 100644 --- a/_sources/license.rst.txt +++ b/_sources/license.rst.txt @@ -1,3 +1,20 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. 
Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + License ======= http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_sources/lineage.rst.txt ---------------------------------------------------------------------- diff --git a/_sources/lineage.rst.txt b/_sources/lineage.rst.txt index 719ef01..c94fe70 100644 --- a/_sources/lineage.rst.txt +++ b/_sources/lineage.rst.txt @@ -1,3 +1,20 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + Lineage ======= http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_sources/plugins.rst.txt ---------------------------------------------------------------------- diff --git a/_sources/plugins.rst.txt b/_sources/plugins.rst.txt index 3f1f7ee..61b4957 100644 --- a/_sources/plugins.rst.txt +++ b/_sources/plugins.rst.txt @@ -1,3 +1,20 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. 
See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + Plugins ======= @@ -72,31 +89,16 @@ looks like: # A list of objects created from a class derived # from flask_admin.BaseView admin_views = [] - # A list of Blueprint object created from flask.Blueprint + # A list of Blueprint object created from flask.Blueprint. For use with the flask_admin based GUI flask_blueprints = [] - # A list of menu links (flask_admin.base.MenuLink) + # A list of menu links (flask_admin.base.MenuLink). For use with the flask_admin based GUI menu_links = [] + # A list of dictionaries containing FlaskAppBuilder BaseView object and some metadata. See example below + appbuilder_views = [] + # A list of dictionaries containing FlaskAppBuilder BaseView object and some metadata. See example below + appbuilder_menu_items = [] -You can derive it by inheritance (please refer to the example below). -Please note ``name`` inside this class must be specified. - -After the plugin is imported into Airflow, -you can invoke it using statement like - - -.. code:: python - - from airflow.{type, like "operators", "sensors"}.{name specificed inside the plugin class} import * - - -When you write your own plugins, make sure you understand them well. -There are some essential properties for each type of plugin. -For example, - -* For ``Operator`` plugin, an ``execute`` method is compulsory. 
-* For ``Sensor`` plugin, a ``poke`` method returning a Boolean value is compulsory. - Example ------- @@ -159,6 +161,22 @@ definitions in Airflow. name='Test Menu Link', url='https://airflow.incubator.apache.org/') + # Creating a flask appbuilder BaseView + class TestAppBuilderBaseView(AppBuilderBaseView): + @expose("/") + def test(self): + return self.render("test_plugin/test.html", content="Hello galaxy!") + v_appbuilder_view = TestAppBuilderBaseView() + v_appbuilder_package = {"name": "Test View", + "category": "Test Plugin", + "view": v_appbuilder_view} + + # Creating a flask appbuilder Menu Item + appbuilder_mitem = {"name": "Google", + "category": "Search", + "category_icon": "fa-th", + "href": "https://www.google.com"} + # Defining the plugin class class AirflowTestPlugin(AirflowPlugin): name = "test_plugin" @@ -170,3 +188,13 @@ definitions in Airflow. admin_views = [v] flask_blueprints = [bp] menu_links = [ml] + appbuilder_views = [v_appbuilder_package] + appbuilder_menu_items = [appbuilder_mitem] + + +Note on role based views +------------------------ + +Airflow 1.10 introduced role based views using FlaskAppBuilder. You can configure which UI is used by setting +rbac = True. To support plugin views and links for both versions of the UI and maintain backwards compatibility, +the fields appbuilder_views and appbuilder_menu_items were added to the AirflowTestPlugin class. http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_sources/profiling.rst.txt ---------------------------------------------------------------------- diff --git a/_sources/profiling.rst.txt b/_sources/profiling.rst.txt index 0910233..c4f1c0b 100644 --- a/_sources/profiling.rst.txt +++ b/_sources/profiling.rst.txt @@ -1,3 +1,22 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. 
The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +.. TODO: This section would be removed after we migrate to www_rbac completely. + Data Profiling ============== http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_sources/project.rst.txt ---------------------------------------------------------------------- diff --git a/_sources/project.rst.txt b/_sources/project.rst.txt index cd3b60f..6e4074f 100644 --- a/_sources/project.rst.txt +++ b/_sources/project.rst.txt @@ -1,3 +1,20 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. 
+ Project ======= @@ -30,7 +47,6 @@ Committers - @fokko (Fokko Driesprong) - @ash (Ash Berlin-Taylor) - @kaxilnaik (Kaxil Naik) -- @feng-tao (Tao Feng) For the full list of contributors, take a look at `Airflow's Github Contributor page: http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_sources/scheduler.rst.txt ---------------------------------------------------------------------- diff --git a/_sources/scheduler.rst.txt b/_sources/scheduler.rst.txt index 3e89589..72a3d8f 100644 --- a/_sources/scheduler.rst.txt +++ b/_sources/scheduler.rst.txt @@ -1,3 +1,20 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + Scheduling & Triggers ===================== @@ -13,7 +30,7 @@ execute ``airflow scheduler``. It will use the configuration specified in ``airflow.cfg``. Note that if you run a DAG on a ``schedule_interval`` of one day, -the run stamped ``2016-01-01`` will be triggered soon after ``2016-01-01T23:59``. +the run stamped ``2016-01-01`` will be trigger soon after ``2016-01-01T23:59``. In other words, the job instance is started once the period it covers has ended. 
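The scheduling semantics restated in the scheduler.rst hunk above (the run stamped ``2016-01-01`` fires soon after ``2016-01-01T23:59``, i.e. once the period it covers has ended) can be sketched in plain Python. ``run_trigger_time`` below is a hypothetical helper for illustration only, not an Airflow API:

```python
from datetime import datetime, timedelta

def run_trigger_time(execution_date, schedule_interval):
    # A DAG run stamped with execution_date covers the period
    # [execution_date, execution_date + schedule_interval); the
    # scheduler triggers the run once that period has ended.
    return execution_date + schedule_interval

# A daily run stamped 2016-01-01 is triggered at 2016-01-02T00:00,
# i.e. soon after 2016-01-01T23:59.
print(run_trigger_time(datetime(2016, 1, 1), timedelta(days=1)))
```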
@@ -63,6 +80,8 @@ use one of these cron "preset": | ``@yearly`` | Run once a year at midnight of January 1 | ``0 0 1 1 *`` | +--------------+----------------------------------------------------------------+---------------+ +**Note**: Use ``schedule_interval=None`` and not ``schedule_interval='None'`` when +you don't want to schedule your DAG. Your DAG will be instantiated for each schedule, while creating a ``DAG Run`` entry for each schedule. @@ -134,8 +153,6 @@ specific ``run_id``. The ``DAG Runs`` created externally to the scheduler get associated to the trigger's timestamp, and will be displayed in the UI alongside scheduled ``DAG runs``. -In addition, you can also manually trigger a ``DAG Run`` using the web UI (tab "DAGs" -> column "Links" -> button "Trigger Dag"). - To Keep in Mind ''''''''''''''' @@ -160,7 +177,6 @@ Here are some of the ways you can **unblock tasks**: states (``failed``, or ``success``) * Clearing a task instance will no longer delete the task instance record. Instead it updates max_tries and set the current task instance state to be None. -* Marking task instances as failed can be done through the UI. This can be used to stop running task instances. * Marking task instances as successful can be done through the UI. This is mostly to fix false negatives, or for instance when the fix has been applied outside of Airflow. * The ``airflow backfill`` CLI subcommand has a flag to ``--mark_success`` and allows selecting http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_sources/security.rst.txt ---------------------------------------------------------------------- diff --git a/_sources/security.rst.txt b/_sources/security.rst.txt index 253587a..0be3609 100644 --- a/_sources/security.rst.txt +++ b/_sources/security.rst.txt @@ -1,3 +1,20 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. 
See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + Security ======== @@ -10,6 +27,13 @@ backends or creating your own. Be sure to check out :doc:`api` for securing the API. +.. note:: + + Airflow uses the config parser of Python. This config parser interpolates + '%'-signs. Make sure to escape any ``%`` signs in your config file (but not + environment variables) as ``%%``, otherwise Airflow might leak these + passwords on a config parser exception to a log. + Web Authentication ------------------ @@ -54,8 +78,7 @@ LDAP '''' To turn on LDAP authentication configure your ``airflow.cfg`` as follows. Please note that the example uses -an encrypted connection to the ldap server as you probably do not want passwords be readable on the network level. -It is however possible to configure without encryption if you really want to. +an encrypted connection to the ldap server as we do not want passwords to be readable on the network level. Additionally, if you are using Active Directory, and are not explicitly specifying an OU that your users are in, you will need to change ``search_scope`` to "SUBTREE". @@ -280,7 +303,7 @@ Google Authentication ''''''''''''''''''''' The Google authentication backend can be used to authenticate users -against Google using OAuth2. 
You must specify the domains to restrict login, separated with a comma, to only members of those domains. .. code-block:: bash @@ -337,10 +360,10 @@ certs and keys. .. code-block:: bash [celery] - CELERY_SSL_ACTIVE = True - CELERY_SSL_KEY = <path to key> - CELERY_SSL_CERT = <path to cert> - CELERY_SSL_CACERT = <path to cacert> + ssl_active = True + ssl_key = <path to key> + ssl_cert = <path to cert> + ssl_cacert = <path to cacert> Impersonation ------------- http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_sources/start.rst.txt ---------------------------------------------------------------------- diff --git a/_sources/start.rst.txt b/_sources/start.rst.txt index a3e21f9..e3b16b2 100644 --- a/_sources/start.rst.txt +++ b/_sources/start.rst.txt @@ -1,3 +1,20 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + Quick Start ----------- http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_sources/timezone.rst.txt ---------------------------------------------------------------------- diff --git a/_sources/timezone.rst.txt b/_sources/timezone.rst.txt index 9e8598e..078f948 100644 --- a/_sources/timezone.rst.txt +++ b/_sources/timezone.rst.txt @@ -1,24 +1,41 @@ +.. 
Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + Time zones ========== Support for time zones is enabled by default. Airflow stores datetime information in UTC internally and in the database. -It allows you to run your DAGs with time zone dependent schedules. At the moment Airflow does not convert them to the -end user's time zone in the user interface. There it will always be displayed in UTC. Also templates used in Operators +It allows you to run your DAGs with time zone dependent schedules. At the moment Airflow does not convert them to the +end user's time zone in the user interface. There it will always be displayed in UTC. Also templates used in Operators are not converted. Time zone information is exposed and it is up to the writer of DAG what do with it. -This is handy if your users live in more than one time zone and you want to display datetime information according to +This is handy if your users live in more than one time zone and you want to display datetime information according to each user's wall clock. -Even if you are running Airflow in only one time zone it is still good practice to store data in UTC in your database -(also before Airflow became time zone aware this was also to recommended or even required setup).
The main reason is -Daylight Saving Time (DST). Many countries have a system of DST, where clocks are moved forward in spring and backward -in autumn. If you're working in local time, you're likely to encounter errors twice a year, when the transitions -happen. (The pendulum and pytz documentation discusses these issues in greater detail.) This probably doesn't matter -for a simple DAG, but it's a problem if you are in, for example, financial services where you have end of day -deadlines to meet. +Even if you are running Airflow in only one time zone it is still good practice to store data in UTC in your database +(also before Airflow became time zone aware this was also to recommended or even required setup). The main reason is +Daylight Saving Time (DST). Many countries have a system of DST, where clocks are moved forward in spring and backward +in autumn. If you're working in local time, you're likely to encounter errors twice a year, when the transitions +happen. (The pendulum and pytz documentation discusses these issues in greater detail.) This probably doesn't matter +for a simple DAG, but it's a problem if you are in, for example, financial services where you have end of day +deadlines to meet. -The time zone is set in `airflow.cfg`. By default it is set to utc, but you change it to use the system's settings or -an arbitrary IANA time zone, e.g. `Europe/Amsterdam`. It is dependent on `pendulum`, which is more accurate than `pytz`. +The time zone is set in `airflow.cfg`. By default it is set to utc, but you change it to use the system's settings or +an arbitrary IANA time zone, e.g. `Europe/Amsterdam`. It is dependent on `pendulum`, which is more accurate than `pytz`. Pendulum is installed when you install Airflow. Please note that the Web UI currently only runs in UTC.
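To make the store-in-UTC advice in the hunk above concrete, here is a minimal sketch (not code from the Airflow docs — it uses the standard library's ``zoneinfo``, available from Python 3.9, whereas Airflow itself relies on ``pendulum``) of keeping a datetime in UTC and converting it per user's wall clock only at display time:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # stdlib from Python 3.9; Airflow uses pendulum

# Store in UTC, as Airflow does internally and in the database.
stored = datetime(2017, 1, 1, 12, 0, tzinfo=timezone.utc)

# Convert to each user's time zone only when displaying.
amsterdam = stored.astimezone(ZoneInfo("Europe/Amsterdam"))  # 13:00 CET
new_york = stored.astimezone(ZoneInfo("America/New_York"))   # 07:00 EST
```

All three objects denote the same instant; only the wall-clock representation differs, which is why storing UTC sidesteps the DST errors described above.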
@@ -28,8 +45,8 @@ Concepts Naïve and aware datetime objects '''''''''''''''''''''''''''''''' -Python's datetime.datetime objects have a tzinfo attribute that can be used to store time zone information, -represented as an instance of a subclass of datetime.tzinfo. When this attribute is set and describes an offset, +Python's datetime.datetime objects have a tzinfo attribute that can be used to store time zone information, +represented as an instance of a subclass of datetime.tzinfo. When this attribute is set and describes an offset, a datetime object is aware. Otherwise, it's naive. You can use timezone.is_aware() and timezone.is_naive() to determine whether datetimes are aware or naive. @@ -39,7 +56,7 @@ Because Airflow uses time-zone-aware datetime objects. If your code creates date .. code:: python from airflow.utils import timezone - + now = timezone.utcnow() a_date = timezone.datetime(2017,1,1) @@ -49,9 +66,9 @@ Interpretation of naive datetime objects Although Airflow operates fully time zone aware, it still accepts naive date time objects for `start_dates` and `end_dates` in your DAG definitions. This is mostly in order to preserve backwards compatibility. In -case a naive `start_date` or `end_date` is encountered the default time zone is applied. It is applied +case a naive `start_date` or `end_date` is encountered the default time zone is applied. It is applied in such a way that it is assumed that the naive date time is already in the default time zone. In other -words if you have a default time zone setting of `Europe/Amsterdam` and create a naive datetime `start_date` of +words if you have a default time zone setting of `Europe/Amsterdam` and create a naive datetime `start_date` of `datetime(2017,1,1)` it is assumed to be a `start_date` of Jan 1, 2017 Amsterdam time. ..
code:: python @@ -65,16 +82,16 @@ words if you have a default time zone setting of `Europe/Amsterdam` and create a op = DummyOperator(task_id='dummy', dag=dag) print(op.owner) # Airflow -Unfortunately, during DST transitions, some datetimes don't exist or are ambiguous. -In such situations, pendulum raises an exception. That's why you should always create aware +Unfortunately, during DST transitions, some datetimes don't exist or are ambiguous. +In such situations, pendulum raises an exception. That's why you should always create aware datetime objects when time zone support is enabled. -In practice, this is rarely an issue. Airflow gives you aware datetime objects in the models and DAGs, and most often, -new datetime objects are created from existing ones through timedelta arithmetic. The only datetime that's often +In practice, this is rarely an issue. Airflow gives you aware datetime objects in the models and DAGs, and most often, +new datetime objects are created from existing ones through timedelta arithmetic. The only datetime that's often created in application code is the current time, and timezone.utcnow() automatically does the right thing. -Default time zone +Default time zone ''''''''''''''''' The default time zone is the time zone defined by the `default_timezone` setting under `[core]`. If @@ -92,15 +109,15 @@ it is therefore important to make sure this setting is equal on all Airflow node Time zone aware DAGs -------------------- -Creating a time zone aware DAG is quite simple. Just make sure to supply a time zone aware `start_date`. It is +Creating a time zone aware DAG is quite simple. Just make sure to supply a time zone aware `start_date`. It is recommended to use `pendulum` for this, but `pytz` (to be installed manually) can also be used for this. ..
code:: python import pendulum - + local_tz = pendulum.timezone("Europe/Amsterdam") - + default_args=dict( start_date=datetime(2016, 1, 1, tzinfo=local_tz), owner='Airflow' @@ -110,18 +127,21 @@ recommended to use `pendulum` for this, but `pytz` (to be installed manually) ca op = DummyOperator(task_id='dummy', dag=dag) print(dag.timezone) # <Timezone [Europe/Amsterdam]> - +Please note that while it is possible to set a `start_date` and `end_date` for Tasks always the DAG timezone +or global timezone (in that order) will be used to calculate the next execution date. Upon first encounter +the start date or end date will be converted to UTC using the timezone associated with start_date or end_date, +then for calculations this timezone information will be disregarded. Templates ''''''''' -Airflow returns time zone aware datetimes in templates, but does not convert them to local time so they remain in UTC. +Airflow returns time zone aware datetimes in templates, but does not convert them to local time so they remain in UTC. It is left up to the DAG to handle this. .. code:: python import pendulum - + local_tz = pendulum.timezone("Europe/Amsterdam") local_tz.convert(execution_date) @@ -129,10 +149,10 @@ It is left up to the DAG to handle this. Cron schedules '''''''''''''' -In case you set a cron schedule, Airflow assumes you will always want to run at the exact same time. It will -then ignore day light savings time. Thus, if you have a schedule that says -run at end of interval every day at 08:00 GMT+1 it will always run end of interval 08:00 GMT+1, -regardless if day light savings time is in place. +In case you set a cron schedule, Airflow assumes you will always want to run at the exact same time. It will +then ignore day light savings time. Thus, if you have a schedule that says +run at end of interval every day at 08:00 GMT+1 it will always run end of interval 08:00 GMT+1, +regardless if day light savings time is in place. 
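The cron/DST behaviour documented in the hunk above can be illustrated with a small sketch (an illustration using the standard library's ``zoneinfo``, not code from the Airflow docs): the same 08:00 wall-clock time in ``Europe/Amsterdam`` maps to different UTC offsets across the year, and this is exactly the shift that Airflow's cron handling avoids by pinning the schedule to a fixed offset such as GMT+1:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # stdlib from Python 3.9

tz = ZoneInfo("Europe/Amsterdam")

# The same local wall-clock time carries different UTC offsets
# depending on whether DST is in effect.
winter_run = datetime(2018, 1, 15, 8, 0, tzinfo=tz)  # CET,  UTC+1
summer_run = datetime(2018, 7, 15, 8, 0, tzinfo=tz)  # CEST, UTC+2

print(winter_run.utcoffset())  # 1:00:00
print(summer_run.utcoffset())  # 2:00:00
```

A schedule pinned to a fixed offset fires at the same instant year-round; a schedule expressed in local wall-clock time drifts by an hour twice a year.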
Time deltas http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_sources/tutorial.rst.txt ---------------------------------------------------------------------- diff --git a/_sources/tutorial.rst.txt b/_sources/tutorial.rst.txt index 1c2dfd6..69670d7 100644 --- a/_sources/tutorial.rst.txt +++ b/_sources/tutorial.rst.txt @@ -1,3 +1,20 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + Tutorial ================ @@ -37,7 +54,7 @@ complicated, a line by line explanation follows below. # 'end_date': datetime(2016, 1, 1), } - dag = DAG('tutorial', default_args=default_args) + dag = DAG('tutorial', default_args=default_args, schedule_interval=timedelta(days=1)) # t1, t2 and t3 are examples of tasks created by instantiating operators t1 = BashOperator( @@ -147,7 +164,7 @@ define a ``schedule_interval`` of 1 day for the DAG. .. code:: python dag = DAG( - 'tutorial', default_args=default_args, schedule_interval=timedelta(1)) + 'tutorial', default_args=default_args, schedule_interval=timedelta(days=1)) Tasks ----- @@ -247,23 +264,36 @@ in templates, make sure to read through the :ref:`macros` section Setting up Dependencies ----------------------- -We have two simple tasks that do not depend on each other. 
Here's a few ways +We have tasks `t1`, `t2` and `t3` that do not depend on each other. Here's a few ways you can define dependencies between them: .. code:: python - t2.set_upstream(t1) + t1.set_downstream(t2) # This means that t2 will depend on t1 - # running successfully to run - # It is equivalent to - # t1.set_downstream(t2) + # running successfully to run. + # It is equivalent to: + t2.set_upstream(t1) - t3.set_upstream(t1) + # The bit shift operator can also be + # used to chain operations: + t1 >> t2 + + # And the upstream dependency with the + # bit shift operator: + t2 << t1 + + # Chaining multiple dependencies becomes + # concise with the bit shift operator: + t1 >> t2 >> t3 - # all of this is equivalent to - # dag.set_dependency('print_date', 'sleep') - # dag.set_dependency('print_date', 'templated') + # A list of tasks can also be set as + # dependencies. These operations + # all have the same effect: + t1.set_downstream([t2, t3]) + t1 >> [t2, t3] + [t2, t3] << t1 Note that when executing your script, Airflow will raise exceptions when it finds cycles in your DAG or when a dependency is referenced more @@ -277,8 +307,8 @@ something like this: .. 
code:: python """ - Code that goes along with the Airflow located at: - http://airflow.readthedocs.org/en/latest/tutorial.html + Code that goes along with the Airflow tutorial located at: + https://github.com/apache/incubator-airflow/blob/master/airflow/example_dags/tutorial.py """ from airflow import DAG from airflow.operators.bash_operator import BashOperator @@ -301,7 +331,7 @@ something like this: } dag = DAG( - 'tutorial', default_args=default_args, schedule_interval=timedelta(1)) + 'tutorial', default_args=default_args, schedule_interval=timedelta(days=1)) # t1, t2 and t3 are examples of tasks created by instantiating operators t1 = BashOperator( http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_sources/ui.rst.txt ---------------------------------------------------------------------- diff --git a/_sources/ui.rst.txt b/_sources/ui.rst.txt index 4b232fa..a2f6b85 100644 --- a/_sources/ui.rst.txt +++ b/_sources/ui.rst.txt @@ -1,6 +1,23 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + UI / Screenshots ================= -The Airflow UI make it easy to monitor and troubleshoot your data pipelines. +The Airflow UI makes it easy to monitor and troubleshoot your data pipelines. 
Here's a quick overview of some of the features and visualizations you can find in the Airflow UI. http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_static/basic.css ---------------------------------------------------------------------- diff --git a/_static/basic.css b/_static/basic.css index 19ced10..104f076 100644 --- a/_static/basic.css +++ b/_static/basic.css @@ -81,6 +81,10 @@ div.sphinxsidebar input { font-size: 1em; } +div.sphinxsidebar #searchbox form.search { + overflow: hidden; +} + div.sphinxsidebar #searchbox input[type="text"] { float: left; width: 80%; @@ -427,6 +431,13 @@ table.field-list td, table.field-list th { hyphens: manual; } +/* -- hlist styles ---------------------------------------------------------- */ + +table.hlist td { + vertical-align: top; +} + + /* -- other body styles ----------------------------------------------------- */ ol.arabic {
