> Challenge would be... who has capacity to make it? This seems to be not much work - assuming that we agree on how to approach it, I can do it with the help of involved people to understand the single case where I am not sure. I would love some feedback.
I actually did an "inventory" of sorts in the Python 3.13 first-pass PR https://github.com/apache/airflow/pull/46891 - and I can extract google-re removal from it while fixing the dependency issues. so part of the work is already done. And I have concrete proposals for the approach (with one unsure thing). *1. Internal regexps - just use re* We can just use re here. I already removed and reviewed re2 where only "internal" regexps are used. *2. Very likely no changes needed.* We can assume that an admin person configuring it will not be "rogue", so likely setting regexp in configuration does not have to be "protected". * airflow/metrics/validators.py -> metrics allow/block list from configuration * airflow/serialization/serde.py -> allowed_deserialization_classes_regexp from configuration * airflow/models/dagrun.py -> allowed_run_id_pattern * task_sdk/src/airflow/sdk/execution_time/secrets_masker.py -> patterns configured by secret adapters *3. Security model clarification maybe? Or breaking changes?* I think while we could add some breaking changes, it will be better to clarify things in security mode. * airflow/cli/commands/remote_commands/dag_command.py : dag command, mark_success_pattern * airflow/models.dag.py : mark_success_pattern: dag/test dag command * airflow/utils/cli.py: get_dags : dags command Those regexps are provided by potentially "low access" CLI (especially when later we do remote CLI) - so we could likely change it to glob. But instead we can also make a security model explanation and explain that whoever has access to CLI and those specific commands should be trusted not to abuse it and cause ReDoc. Also - we can worry about it when we go remot, and we can also mention that the CLI command line is not covered by "back-compatibility" guarantee (so that we can change it later). * airflow/utils/file.py: those are .*ariflowignore* rules. DAG authors could potentially modify them, but DAG authors can still do "dos" pretty freely with just TOP level code and task doing "stuff" - so while we could consider dropping regexp, I see no big value in it and we can use "re". *4. Unsure (here I need some help) * * task_ids_or_regex (airflow/models/dag.py, sdk/definitions, grid.py -> this is the one that was flagged by the security report (passed as root parameter) - this is used by both UI and exposed via task_sdk. I am not really sure - if we really need to use regex here, maybe glob (which is less flexible and safer) can be used instead of regex even if it will be somewhat breaking/new behaviour for the user? I think Brent, Ash, Amogh, Pierre - you've been touching that one recently. Would love to hear what you think whether we can do something with it? *5. Future protection* Looking at the "number of cases" above and how those regexes could be passed between UI/ task SDK API etc - I am not sure if we can automate "future" avoidance easily. I think we might simply start paying attention and treat the "introduce regexp - you have two problems" saying more seriously during reviews and possibly maybe - similar to the "internal-api" label - flag and fail if a PR introduces a regexp :) ?. Not sure if it's worth it though WDYT? Does the approach look good? Any comments? J. On Wed, Feb 19, 2025 at 9:35 PM Jens Scheffler <j_scheff...@gmx.de.invalid> wrote: > Sounds good to me - I was not aware of user-facing interfaces... and if > removing support then 3.0 is the best point. > > Challenge would be... who has capacity to make it? > > On 19.02.25 15:39, Jed Cunningham wrote: > > Sounds good to me. > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > For additional commands, e-mail: dev-h...@airflow.apache.org > >