pomegranited opened a new issue, #32854:
URL: https://github.com/apache/superset/issues/32854

   ## [SIP] Proposal for Translating Superset asset data
   
   ### Motivation
   
   Superset provides translation support for built-in components in the UI. 
However, the Open edX project also needs the user-provided text in the 
assets themselves to be translatable, e.g. dashboard and chart titles, axis 
labels, and metric labels.
   
   We also need these asset translations to be easily maintained between 
upgrades of Superset, and to be re-deployable when translations are updated. An 
automatic way to export translated asset fields and import translations would 
also help integrate Superset into Open edX's ecosystem.
   
   Superset's requirements for this feature include (refs 
[1](https://github.com/apache/superset/issues/32139#issuecomment-2675387707), 
[2](https://github.com/apache/superset/issues/32139#issuecomment-2670053442)):
   
   * Zero performance hit for English `en` (or when running with a single 
system default language)
     The system default language strings should live in the field itself, so no 
extra lookups are required.
   * No data translations will be added to Superset's officially supported 
translations -- just like asset data, these translations must be managed by 
Superset users.
   * Translators can update asset translations in the Superset UI and see their 
updates in real time.
   
   Desirable features include:
   
   * Searchability of the appropriate translated text when the user has selected a 
non-`en` language 
([ref](https://github.com/apache/superset/issues/32139#issuecomment-2675846271)).
   
   Superset, by default, is configured to use English (`en`) as the single 
default language via the 
[`BABEL_DEFAULT_LOCALE`](https://github.com/apache/superset/blob/45ea11c1b65887755f30e5945ea280abc0847929/superset/config.py#L374)
 setting. Some translations for built-in UI components are provided for 16 
additional languages which can be enabled via the 
[`LANGUAGES`](https://github.com/apache/superset/blob/45ea11c1b65887755f30e5945ea280abc0847929/superset/config.py#L378-L396)
 setting, but [these languages are disabled by 
default](https://github.com/apache/superset/blob/45ea11c1b65887755f30e5945ea280abc0847929/superset/config.py#L397-L399)
 for performance reasons, and due to incomplete available translations.
   
   Open edX aims to implement this feature to increase adoption of Superset 
in our ecosystem. The more internationalised sites that run Superset, 
the more people will be motivated to fill in Superset's missing UI translations 
for more supported languages.
   
   ### Proposed Change
   
   We propose implementing this feature in two phases.
   
   This feature will be disabled by default, and can be enabled by an app-wide 
feature flag.
   
   #### Phase 1: Functionality
   
   Add the base functionality for rendering configured translatable model 
fields using a new jinja filter called `i18n`.
   No changes to the UI or data will be visible to a user who edits a 
translatable field -- the user continues to provide the full 
`BABEL_DEFAULT_LOCALE` default text for the field.
   When this feature is enabled and the user views an asset, they will see 
their selected language's translation, if found, or the default text, if not.
   
   When this feature is enabled, translatable fields are rendered as follows:
   * If there's only one language configured, or the user is using the app's 
default language, simply return the field's default text.
   * Else:
     * Show the `translated_text` found in the `i18n_translations` table for the 
given asset UUID + default text + user's current language.
     * If no translation is found, fall back to the default text.
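
The rendering rules above can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `translate_field`, its parameters, and the in-memory dictionary standing in for the `i18n_translations` table are hypothetical, and wiring the function up as the `i18n` jinja filter is left out.

```python
from typing import Mapping, Sequence, Tuple

# Hypothetical stand-in for the proposed `i18n_translations` table,
# keyed by (asset_uuid, default_text, language_code).
TranslationKey = Tuple[str, str, str]


def translate_field(
    asset_uuid: str,
    default_text: str,
    user_language: str,
    default_language: str,
    configured_languages: Sequence[str],
    translations: Mapping[TranslationKey, str],
) -> str:
    """Return the text to display for one translatable field."""
    # Fast path: single-language deployments and users on the default
    # language never touch the translations store.
    if len(configured_languages) <= 1 or user_language == default_language:
        return default_text
    translated = translations.get((asset_uuid, default_text, user_language))
    # Fall back to the default text when no translation is stored.
    return translated if translated else default_text
```

Because the default text lives in the field itself, the first branch returns without any lookup, which is how the "zero performance hit" requirement could be met in this sketch.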
   
   #### Phase 2: UX
   
   Add buttons to the Superset UI to enable editors to provide translations of 
asset text in the Superset UI.
   Exact UX TBD, but see https://github.com/apache/superset/issues/13442 for 
one nice way to do this.
   
   When editing a templated field, the user should be able to see:
   
   * The full `default_text` value to be translated
   * All configured 
[`LANGUAGES`](https://github.com/apache/superset/blob/45ea11c1b65887755f30e5945ea280abc0847929/superset/config.py#L378-L396)
 that the text can be translated into
   
   This phase requires many UI changes, so it will likely be broken 
into smaller contributions.
   It would also benefit from the input of a UI/UX designer, which we can 
provide.
   
   ### New or Changed Public Interfaces
   
   * Visualization types -- unchanged
   * Form data for saved dashboards and charts -- likely unchanged, but see 
"Open Question 4" below.
   
   #### Phase 1: Functionality 
   
   Configuration:
   
   1. A new configuration feature flag would be added to enable/disable this 
feature, called `ASSET_TRANSLATIONS`.
   
   Data passed between backend and frontend:
   
   1. Add new jinja filter called `i18n` which uses the `i18n_translations` 
table described below to render translated text in a request.
   2. Add a model mixin to add the concept of "templated fields".
      E.g. the `Dashboard` model would have the following class variable:
      ```python
      templated_fields = [
          'dashboard_title',
          'description',
          'json_metadata',
          'position_json',
      ]
      ```
   3. When rendering model fields to the "view" API:
      * If the app-wide setting is enabled: wrap templated field values in the 
`i18n` jinja filter, and render values.
      * Else: pass field values straight through.
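
A minimal sketch of how such a mixin might behave, assuming a hypothetical `TemplatedFieldsMixin` class and `render_fields` method (neither exists in Superset today). The `translate` callable stands in for the proposed `i18n` filter, and the `Dashboard` class here is a plain Python stand-in, not the real SQLAlchemy model.

```python
from typing import Any, Callable, Dict, List, Optional


class TemplatedFieldsMixin:
    """Hypothetical mixin: declares which fields are translatable."""

    templated_fields: List[str] = []

    def render_fields(
        self, feature_enabled: bool, translate: Callable[[str, str], str]
    ) -> Dict[str, Any]:
        """Return templated field values, translated only if the flag is on."""
        rendered: Dict[str, Any] = {}
        for name in self.templated_fields:
            value = getattr(self, name)
            # Only string values are passed through the translation step;
            # with the feature disabled, values pass straight through.
            if feature_enabled and isinstance(value, str):
                value = translate(name, value)
            rendered[name] = value
        return rendered


class Dashboard(TemplatedFieldsMixin):
    """Plain stand-in for the real model, for illustration only."""

    templated_fields = ["dashboard_title", "description"]

    def __init__(self, dashboard_title: str, description: Optional[str]) -> None:
        self.dashboard_title = dashboard_title
        self.description = description
```

Keeping the flag check in one place means stored data never changes when the feature is toggled; only the rendering path differs.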
   
   Command line tools and arguments:
   
   1. New command-line tools will be added to extract translatable asset fields 
and import translated text into the `i18n_translations` table, using the `.mo` 
translation file format.
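
As a rough illustration of the extraction step only, the sketch below collects translatable entries from a set of assets before they would be written to a translation catalog. The `TranslatableEntry` record, the `extract_entries` function, and the assumption that each asset exposes `uuid` and `templated_fields` attributes are all hypothetical; writing and reading the catalog files is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class TranslatableEntry:
    """One string to be handed to translators (hypothetical record)."""

    asset_uuid: str
    model_name: str
    field_name: str
    default_text: str


def extract_entries(assets: Iterable[object]) -> Iterator[TranslatableEntry]:
    """Yield one entry per non-empty string in each asset's templated fields."""
    for asset in assets:
        # Assumption: models expose `__tablename__`; fall back to the class name.
        model_name = getattr(asset, "__tablename__", type(asset).__name__)
        for field_name in getattr(asset, "templated_fields", []):
            value = getattr(asset, field_name, None)
            # Skip empty or non-string values; there is nothing to translate.
            if isinstance(value, str) and value.strip():
                yield TranslatableEntry(
                    asset_uuid=str(getattr(asset, "uuid")),
                    model_name=model_name,
                    field_name=field_name,
                    default_text=value,
                )
```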
   
   #### Phase 2: UX
   
   A new "translation UI component" would take these properties as input:
   * `asset_uuid`
   * `model_name`
   * `field_name`
   * `default_text`
   
   REST endpoints:
   
   1. A "translation API" would be added to the backend to support these 
components for viewing and editing translated text for a given asset model 
field with the configured `BABEL_DEFAULT_LOCALE` and `LANGUAGES` settings.
   
   ### New dependencies
   
   No new `npm`/`PyPI` packages required.
   
   ### Migration Plan and Compatibility
   
   A migration will be added to create a new table called `i18n_translations`.
   Fields will be:
   * `asset_uuid`: unique identifier for the asset being translated.
   * `default_text`: user-provided text for the configured 
`BABEL_DEFAULT_LOCALE`
   * `language_code`: language/locale code for the translated text
   * `translated_text`: `default_text` translation for the `language_code`
   * `model_name`: name of the table containing the translatable field.
   * `field_name`: name of the field being translated.
   
   Indexes:
   * `asset_uuid` + `default_text` + `language_code` (unique): used to locate 
any `translated_text` fields when rendering a given asset.
      May also add `model_name` and/or `field_name` to this index, if needed.
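
To make the table and index concrete, here is a sketch using Python's built-in `sqlite3` module. The column types, `NOT NULL` constraints, index name, and the helper names `create_schema` and `lookup` are assumptions for illustration; the actual migration would be written against Superset's own migration tooling and supported databases.

```python
import sqlite3
from typing import Optional

# Assumed column types; the proposal only names the fields.
DDL = """
CREATE TABLE i18n_translations (
    asset_uuid      TEXT NOT NULL,
    default_text    TEXT NOT NULL,
    language_code   TEXT NOT NULL,
    translated_text TEXT NOT NULL,
    model_name      TEXT NOT NULL,
    field_name      TEXT NOT NULL
);
CREATE UNIQUE INDEX ix_i18n_translations_lookup
    ON i18n_translations (asset_uuid, default_text, language_code);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the table and its unique lookup index."""
    conn.executescript(DDL)


def lookup(
    conn: sqlite3.Connection, asset_uuid: str, default_text: str, language_code: str
) -> Optional[str]:
    """Return the stored translation, or None if there is none."""
    row = conn.execute(
        "SELECT translated_text FROM i18n_translations "
        "WHERE asset_uuid = ? AND default_text = ? AND language_code = ?",
        (asset_uuid, default_text, language_code),
    ).fetchone()
    return row[0] if row else None
```

The unique index serves both purposes described above: it backs the lookup used at render time and prevents two translations for the same asset, text, and language.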
   
   Notes:
   * Dashboards and charts that are saved or bookmarked will still work after 
the change, or when this change is enabled/disabled.
   
   ### Open Questions:
   
   1. May need to batch queries or use a cache for performance.
   2. Some model fields are edited within a modal popup – so we'd want to avoid 
a modal-within-modal UI.
   3. Is there a difference in how echart editors are rendered vs old-style 
charts? i.e. Do they both use the data API?
   4. Unsure how to provide translations for complex fields like dashboard 
[`json_metadata`](https://github.com/apache/superset/blob/45ea11c1b65887755f30e5945ea280abc0847929/superset/models/dashboard.py#L141),
 
[`position_json`](https://github.com/apache/superset/blob/45ea11c1b65887755f30e5945ea280abc0847929/superset/models/dashboard.py#L136)
 and slice 
[`params`](https://github.com/apache/superset/blob/45ea11c1b65887755f30e5945ea280abc0847929/superset/models/slice.py#L80),
 e.g. `metadata.native_filter_configuration.name`, `position.*.meta.text`, 
`params.x_axis_label`.
      May need to store templated field values in these complex fields, and 
mark the whole field (e.g. `json_metadata`) as a `templated_field`.
   5. Unsure how to provide asset search/filtering for translated text -- can 
we use 
[base_filters](https://flask-appbuilder.readthedocs.io/en/latest/advanced.html#base-filtering)
 somehow?
   
   ### Rejected Alternatives
   
   * [SIP-60](https://github.com/apache/superset/issues/13442) -- The frontend 
UI proposed in SIP-60 could be used here. But we propose an alternative backend 
implementation which uses a single `i18n_translations` table to store all 
translations instead of introducing an individual `<field_name>_i18n` field for 
each translatable field.
   * [SIP-153](https://github.com/apache/superset/issues/32139) -- rejected 
because its approach was ["build time" focused and not as dynamic as 
desired](https://github.com/apache/superset/issues/32139#issuecomment-2670053442)
 and because it abuses the [frontend language packs shipped from the 
backend](https://github.com/apache/superset/issues/32139#issuecomment-2670069005),
 which is not ideal. SIP-153 also didn't support returning translated fields in 
the data APIs, while this solution does.
   * Decided not to support the more complex jinja filter format [proposed 
here](https://github.com/apache/superset/issues/32139#issuecomment-2675387707), 
e.g. `"CompanyX {{ "Revenue Dashboard" | i18n("rev_dash") }}"`, because:
     1. Many languages would need to change the order of the translatable 
phrases in order to be sensible, e.g. "Panel de ingresos de la CompanyX", and 
so translating the full phrase is better.
     2. Simplifies the editing user's experience by not requiring them to use 
jinja template syntax.
     3. Allows this feature to be enabled or disabled without having to change 
data. E.g. if the actual stored `dashboard_title = "CompanyX {{ "Revenue 
Dashboard" | i18n("rev_dash") }}"`, this field becomes unintelligible if the 
feature is subsequently disabled.
     4. We have enough information (asset UUID, model name, field name) to 
uniquely identify the translated field text if we treat it as an atomic unit, 
rather than allowing its value to be broken into some unknown number of parts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

