Re: [DISCUSS] Getting rid of regular expressions (and getting rid of re2 as result of it) ?

Jarek Potiuk Wed, 19 Feb 2025 15:41:34 -0800

> Challenge would be... who has capacity to make it?

This does not seem to be much work. Assuming we agree on how to approach
it, I can do it, with help from the people involved on the single case
where I am not sure. I would love some feedback.

I actually did an "inventory" of sorts in the Python 3.13 first-pass PR
https://github.com/apache/airflow/pull/46891 - and I can extract the
google-re2 removal from it while fixing the dependency issues, so part of
the work is already done.

I also have concrete proposals for the approach (with one thing I am unsure
about).

*1. Internal regexps - just use re*

We can just use re here. I already removed and reviewed re2 where only
"internal" regexps are used.

*2. Very likely no changes needed.*

We can assume that the admin who configures these will not be "rogue", so
regexps set in configuration likely do not have to be "protected".

* airflow/metrics/validators.py -> metrics allow/block list from
configuration
* airflow/serialization/serde.py -> allowed_deserialization_classes_regexp
from configuration
* airflow/models/dagrun.py -> allowed_run_id_pattern
* task_sdk/src/airflow/sdk/execution_time/secrets_masker.py -> patterns
configured by secret adapters

*3. Security model clarification maybe? Or breaking changes?*

I think that while we could add some breaking changes, it would be better
to clarify things in the security model.

* airflow/cli/commands/remote_commands/dag_command.py: dag command,
mark_success_pattern
* airflow/models/dag.py: mark_success_pattern: dag/test dag command
* airflow/utils/cli.py: get_dags: dags command

Those regexps are provided via a potentially "low access" CLI (especially
once we have a remote CLI) - so we could likely change them to glob.
Alternatively, we can add an explanation to the security model saying that
whoever has access to the CLI and those specific commands should be trusted
not to abuse them and cause ReDoS. Also - we can worry about it when we go
remote, and we can mention that the CLI command line is not covered by the
"back-compatibility" guarantee (so that we can change it later).

* airflow/utils/file.py: those are the *.airflowignore* rules. DAG authors
could potentially modify them, but DAG authors can already cause a "DoS"
pretty freely with just top-level code and tasks doing "stuff" - so while
we could consider dropping regexp here, I see no big value in it and we can
use "re".

*4. Unsure (here I need some help)*

* task_ids_or_regex (airflow/models/dag.py, sdk/definitions, grid.py) ->
this is the one that was flagged by the security report (passed as the root
parameter). It is used by the UI and also exposed via task_sdk. I am not
really sure whether we need regex here. Maybe glob (which is less flexible
and safer) can be used instead, even if it will be somewhat breaking/new
behaviour for the user?

I think Brent, Ash, Amogh, Pierre - you've been touching that one
recently. I would love to hear whether you think we can do something
with it.
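
To make the glob idea concrete, here is a minimal sketch of filtering task
IDs with shell-style patterns from Python's standard fnmatch module. The
filter_task_ids helper and the sample task IDs are hypothetical and are not
taken from the Airflow code base. Note that fnmatch translates the glob to
a regex internally, so this only illustrates the narrower user-facing
syntax (*, ?, [seq]) and is not a guarantee about matching cost.

```python
import fnmatch


def filter_task_ids(task_ids, pattern):
    """Return the task IDs that match a shell-style glob (case-sensitive)."""
    # fnmatchcase avoids the platform-dependent case folding of fnmatch().
    return [tid for tid in task_ids if fnmatch.fnmatchcase(tid, pattern)]


# Hypothetical task IDs, for illustration only.
task_ids = ["extract_orders", "extract_users", "load_orders", "transform_orders"]

print(filter_task_ids(task_ids, "extract_*"))
print(filter_task_ids(task_ids, "*_orders"))
```

Users who today pass something like "^extract_" would need to switch to
"extract_*", which is the breaking part of such a change.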

*5. Future protection*

Looking at the "number of cases" above and how those regexes could be
passed between the UI, task SDK, API etc. - I am not sure we can easily
automate "future" avoidance. I think we might simply start paying attention
and treat the "introduce regexp - you have two problems" saying more
seriously during reviews, and possibly - similar to the "internal-api"
label - flag and fail if a PR introduces a regexp :). Not sure if it's
worth it though.
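
If we ever wanted to try the "flag" part, one possible shape is a small
check that reports imports of regex modules in a changed file, so that a
reviewer or CI job can take a closer look. This is only a sketch under
assumptions: find_regex_imports and the REGEX_MODULES set are hypothetical
names, no such check exists in Airflow today, and it only sees direct
imports (not regexps hidden behind helper functions).

```python
import ast

# Modules treated as "introducing a regexp"; this set is an assumption.
REGEX_MODULES = {"re", "re2", "regex"}


def find_regex_imports(source: str) -> list[tuple[int, str]]:
    """Return (line number, module name) for each regex-module import."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in REGEX_MODULES:
                    found.append((node.lineno, alias.name))
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports such as "from .re import x".
            if node.level == 0 and node.module:
                if node.module.split(".")[0] in REGEX_MODULES:
                    found.append((node.lineno, node.module))
    return sorted(found)


sample = "import os\nimport re\nfrom re2 import compile\n"
for lineno, module in find_regex_imports(sample):
    print(f"line {lineno}: imports {module}")
```

A real check would also have to compare against the base branch so that
only newly introduced imports are reported.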

WDYT? Does the approach look good? Any comments?

J.

On Wed, Feb 19, 2025 at 9:35 PM Jens Scheffler <j_scheff...@gmx.de.invalid>
wrote:

> Sounds good to me - I was not aware of user-facing interfaces... and if
> removing support then 3.0 is the best point.
>
> Challenge would be... who has capacity to make it?
>
> On 19.02.25 15:39, Jed Cunningham wrote:
> > Sounds good to me.
> >
