Re: [PR] Improve translation completeness checker (required keys, coverage, unused) [airflow]

via GitHub Fri, 06 Mar 2026 02:07:19 -0800


Copilot commented on code in PR #62983:
URL: https://github.com/apache/airflow/pull/62983#discussion_r2894918357



##########
dev/breeze/src/airflow_breeze/commands/ui_commands.py:
##########
@@ -88,7 +88,7 @@ def char_key(c: str) -> tuple:
     "ko": ["_other"],
     "nl": MOST_COMMON_PLURAL_SUFFIXES,
     "pl": ["_one", "_few", "_many", "_other"],
-    "pt": ["_zero", "_one", "_many", "_other"],
+    "pt": MOST_COMMON_PLURAL_SUFFIXES,

Review Comment:
   The Portuguese (`pt`) plural suffix list is changed from `["_zero", "_one", 
"_many", "_other"]` to `MOST_COMMON_PLURAL_SUFFIXES` (i.e. `["_one", 
"_other"]`). Portuguese (European) uses four plural categories in i18next: 
`zero`, `one`, `many`, and `other` (see the i18next plural rules). The PT 
locale already has many `_zero` and `_many` keys that are meaningful 
translations (e.g. `"connection_zero": "Nenhuma conexão"`, `"dag_many": 
"Dags"`, etc.). After this change, all those keys would be reported as "unused" 
and would be deleted if `--remove-unused` is run, causing incorrect 
pluralization for Portuguese users at runtime (e.g. `t("connection", {count: 
0})` expects a `connection_zero` key). This is a correctness regression for the 
Portuguese locale.
   ```suggestion
       "pt": ["_zero", "_one", "_many", "_other"],
   ```



##########
dev/breeze/src/airflow_breeze/commands/ui_commands.py:
##########
@@ -355,91 +391,93 @@ def print_translation_progress(locale_files, 
missing_counts, summary):
         table = tables[lang]
         table.title = f"Translation Progress: {lang}"
         table.add_column("File", style="bold cyan")
-        table.add_column("Missing", style="red")
-        table.add_column("Extra", style="yellow")
-        table.add_column("TODOs", style="magenta")
+        table.add_column("Required (base EN)", style="dim")
+        table.add_column("Required (plural)", style="dim")
+        table.add_column("Total required", style="bold")
         table.add_column("Translated", style="green")
-        table.add_column("Total", style="bold")
+        table.add_column("Missing", style="red")
         table.add_column("Coverage", style="bold")
-        table.add_column("Completed", style="bold")
+        table.add_column("TODOs", style="magenta")
+        table.add_column("Unused", style="yellow")
+        total_required_base = 0
+        total_required_plural = 0
+        total_required = 0
+        total_translated = 0
         total_missing = 0
-        total_extra = 0
         total_todos = 0
-        total_translated = 0
-        total_total = 0
+        total_unused = 0
         for filename in sorted(all_files):
-            file_path = Path(LOCALES_DIR / lang / filename)
-            # Always get total from English version
             en_path = Path(LOCALES_DIR / "en" / filename)
-            if en_path.exists():
-                with open(en_path) as f:
-                    en_data = json.load(f)
-                file_total = sum(1 for _ in flatten_keys(en_data))
-            else:
-                file_total = 0
+            file_path = Path(LOCALES_DIR / lang / filename)
+            diff = summary.get(filename, LocaleSummary({}, {}))
+            if not en_path.exists():
+                continue
+            with open(en_path) as f:
+                en_data = json.load(f)
+            en_keys = set(flatten_keys(en_data))
+            en_key_to_value = flatten_keys_to_values(en_data)
+            required_keys = expand_plural_keys(en_keys, lang, en_key_to_value)
+            required_base_en = len(en_keys)
+            required_plural = len(required_keys) - required_base_en
+            total_req = len(required_keys)
+            file_missing = len(diff.missing_keys.get(lang, []))
+            file_unused = len(diff.unused_keys.get(lang, []))
             if file_path.exists():
                 with open(file_path) as f:
-                    data = json.load(f)
-                file_missing = missing_counts.get(filename, {}).get(lang, 0)
-                file_extra = len(summary.get(filename, LocaleSummary({}, 
{})).extra_keys.get(lang, []))
-
-                file_todos = count_todos(data)
+                    lang_data = json.load(f)
+                lang_key_to_value = flatten_keys_to_values(lang_data)
+                file_translated = sum(
+                    1
+                    for k in required_keys
+                    if k in lang_key_to_value and not 
is_todo_value(lang_key_to_value[k])
+                )
+                file_todos = sum(
+                    1 for k in required_keys if k in lang_key_to_value and 
is_todo_value(lang_key_to_value[k])
+                )
                 if file_todos > 0:
                     has_todos = True
-                file_translated = file_total - file_missing
-                # Coverage: translated / total
-                file_coverage_percent = 100 * file_translated / file_total if 
file_total else 100
-                # Complete percent: (translated - todos) / translated
-                file_actual_translated = file_translated - file_todos
-                complete_percent = 100 * file_actual_translated / 
file_translated if file_translated else 100
-                style = (
-                    "bold green"
-                    if file_missing == 0 and file_extra == 0 and file_todos == 0
-                    else (
-                        "yellow" if file_missing < file_total or file_extra > 
0 or file_todos > 0 else "red"
-                    )
-                )
             else:
-                file_missing = file_total
-                file_extra = len(summary.get(filename, LocaleSummary({}, 
{})).extra_keys.get(lang, []))
-                file_todos = 0
                 file_translated = 0
-                file_coverage_percent = 0
-                complete_percent = 0
-                style = "red"
+                file_todos = 0
+            file_coverage = 100 * file_translated / total_req if total_req 
else 100.0
+            style = (
+                "bold green"
+                if file_missing == 0 and file_unused == 0 and file_todos == 0
+                else ("yellow" if file_missing > 0 or file_todos > 0 or 
file_unused > 0 else "red")
+            )

Review Comment:
   The `style` computation at lines 443–446 has an unreachable `"red"` branch. 
The outer condition is `file_missing == 0 and file_unused == 0 and file_todos 
== 0`, so its `else` branch is only entered when at least one of those is 
non-zero. In that branch, the inner condition `file_missing > 0 or file_todos > 
0 or file_unused > 0` is always `True` (same three variables, just reversed), 
meaning the `"red"` fallback is never reached. The style will be `"yellow"` for 
all non-perfect rows, but the intent was likely to show `"red"` for more severe 
cases (e.g. missing keys) versus `"yellow"` for minor ones (only unused keys or 
todos). The style logic needs to be restructured to reflect the intended 
severity distinction.
   ```suggestion
               if file_missing > 0:
                   style = "red"
               elif file_todos > 0 or file_unused > 0:
                   style = "yellow"
               else:
                   style = "bold green"
   ```



##########
dev/breeze/src/airflow_breeze/commands/ui_commands.py:
##########
@@ -355,91 +391,93 @@ def print_translation_progress(locale_files, 
missing_counts, summary):
         table = tables[lang]
         table.title = f"Translation Progress: {lang}"
         table.add_column("File", style="bold cyan")
-        table.add_column("Missing", style="red")
-        table.add_column("Extra", style="yellow")
-        table.add_column("TODOs", style="magenta")
+        table.add_column("Required (base EN)", style="dim")
+        table.add_column("Required (plural)", style="dim")
+        table.add_column("Total required", style="bold")
         table.add_column("Translated", style="green")
-        table.add_column("Total", style="bold")
+        table.add_column("Missing", style="red")
         table.add_column("Coverage", style="bold")
-        table.add_column("Completed", style="bold")
+        table.add_column("TODOs", style="magenta")
+        table.add_column("Unused", style="yellow")

Review Comment:
   The PR description states "Table columns (order): Missing, Coverage, TODOs, 
Unused. Optional columns (Required base EN, Required plural, Total required, 
Translated) are available for detail." However, the implementation always shows 
all 8 columns, with the optional columns appearing first (before Missing, 
Coverage, TODOs, Unused). If the intent was to make the detail columns truly 
optional (e.g., behind a `--verbose` flag), that has not been implemented. If 
all columns are always intended to be shown, the PR description is misleading.



##########
dev/breeze/tests/test_ui_commands.py:
##########
@@ -61,6 +61,21 @@ def test_expand_plural_keys_polish(self):
         assert "message_many" in expanded
         assert "message_other" in expanded
 
+    def 
test_expand_plural_keys_only_expands_when_en_has_count_placeholder(self):
+        # When en has error_one without {{count}}, we must not require 
error_other
+        keys = {"error_one"}
+        en_key_to_value = {"error_one": "1 Error"}
+        expanded = expand_plural_keys(keys, "pt", en_key_to_value)
+        assert "error_one" in expanded
+        assert "error_other" not in expanded
+
+    def test_expand_plural_keys_expands_when_en_has_count_placeholder(self):
+        keys = {"warning_one", "warning_other"}
+        en_key_to_value = {"warning_one": "1 Warning", "warning_other": 
"{{count}} Warnings"}
+        expanded = expand_plural_keys(keys, "pt", en_key_to_value)
+        assert "warning_one" in expanded

Review Comment:
   The test `test_expand_plural_keys_expands_when_en_has_count_placeholder` 
only verifies that the existing input keys (`warning_one` and `warning_other`) 
remain in the output — these would be present even without expansion since 
`expanded = set(keys)` is initialized with all input keys. The test does not 
verify that additional locale-specific plural forms are added when the locale 
has more forms than EN. For example, for Polish (`pl`) with 4 plural forms, and 
a base whose EN value contains `{{count}}`, the test should assert that 
`warning_few` and `warning_many` are also present in the expanded set. Without 
such an assertion, the test would pass even if the expansion logic had a bug 
that prevented new plural forms from being added.
   ```suggestion
           # When en has a {{count}} placeholder for the plural base, locales 
with more plural
           # forms than English (e.g. Polish) should have all their forms 
required/expanded.
           keys = {"warning_one"}
           en_key_to_value = {"warning_one": "1 Warning", "warning_other": 
"{{count}} Warnings"}
           expanded = expand_plural_keys(keys, "pl", en_key_to_value)
           assert "warning_one" in expanded
           assert "warning_few" in expanded
           assert "warning_many" in expanded
   ```



##########
dev/breeze/src/airflow_breeze/commands/ui_commands.py:
##########
@@ -512,15 +550,15 @@ def sort_dict_keys(obj):
         console.print(f"[green]Added missing translations to 
{lang_path}[/green]")
 
 
-def remove_extra_translations(language: str, summary: dict[str, 
LocaleSummary]):
+def remove_unused_translations(language: str, summary: dict[str, 
LocaleSummary]):
     """
-    Remove extra translations for the selected language.
+    Remove unused translations for the selected language.
 
-    Removes keys that are present in the language file but missing in the 
English file.
+    Removes keys that are present in the language file but are not required.
     """
     console = get_console()
     for filename, diff in summary.items():
-        extra_keys = set(diff.extra_keys.get(language, []))
+        extra_keys = set(diff.unused_keys.get(language, []))
         if not extra_keys:

Review Comment:
   The local variable is named `extra_keys` but it holds the `unused_keys` from 
the diff. This is a stale variable name left over from the rename of 
`remove_extra_translations` to `remove_unused_translations`. The variable 
should be renamed to `unused_keys` for consistency with the rest of the 
refactoring.



##########
dev/breeze/src/airflow_breeze/commands/ui_commands.py:
##########
@@ -342,7 +378,7 @@ def print_translation_progress(locale_files, 
missing_counts, summary):
 
     tables = defaultdict(lambda: Table(show_lines=True))
     all_files = set()
-    coverage_per_language = {}  # Collect total coverage per language
+    coverage_per_language = {}
     for lf in locale_files:
         all_files.update(lf.files)
 

Review Comment:
   The `missing_counts` parameter passed to `print_translation_progress` is no 
longer used in the function body. The function now computes `file_missing` 
directly from `diff.missing_keys.get(lang, [])` (line 423). The unused 
parameter should be removed from the function signature and all call sites. 
Keeping it creates a misleading signature and unnecessary coupling.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Improve translation completeness checker (required keys, coverage, unused) [airflow]

Reply via email to