pomegranited opened a new issue, #32854: URL: https://github.com/apache/superset/issues/32854
## [SIP] Proposal for Translating Superset asset data

### Motivation

Superset provides translation support for built-in components in the UI. However, the Open edX project also needs the user-provided terms used in the assets themselves to be translatable, e.g. dashboard and chart titles, axis labels, and metric labels. We also need these asset translations to be easily maintained between upgrades of Superset, and to be re-deployable when translations are updated. An automatic way to export translated asset fields and import translations would also help integrate Superset into Open edX's ecosystem.

Superset's requirements for this feature include (refs [1](https://github.com/apache/superset/issues/32139#issuecomment-2675387707), [2](https://github.com/apache/superset/issues/32139#issuecomment-2670053442)):

* Zero performance hit for English `en` (or when running with a single system default language). The system default language strings should live in the field itself, so no extra lookups are required.
* No data translations will be added to Superset's officially supported translations -- just like asset data, these translations must be managed by Superset users.
* Translators can update asset translations in the Superset UI and see their updates in real time.

Desirable features include:

* Searchability of the appropriate translated text when the user has selected a non-`en` language ([ref](https://github.com/apache/superset/issues/32139#issuecomment-2675846271)).

Superset, by default, is configured to use English (`en`) as the single default language via the [`BABEL_DEFAULT_LOCALE`](https://github.com/apache/superset/blob/45ea11c1b65887755f30e5945ea280abc0847929/superset/config.py#L374) setting.
Some translations for built-in UI components are provided for 16 additional languages, which can be enabled via the [`LANGUAGES`](https://github.com/apache/superset/blob/45ea11c1b65887755f30e5945ea280abc0847929/superset/config.py#L378-L396) setting, but [these languages are disabled by default](https://github.com/apache/superset/blob/45ea11c1b65887755f30e5945ea280abc0847929/superset/config.py#L397-L399) for performance reasons, and because the available translations are incomplete.

Open edX aims to implement this feature to increase uptake of Superset in our ecosystem. The more internationalised sites that run Superset, the more people will be motivated to fill in Superset's missing UI translations for more supported languages.

### Proposed Change

We propose implementing this feature in two phases. This feature will be disabled by default, and can be enabled by an app-wide feature flag.

#### Phase 1: Functionality

Add the base functionality for rendering configured translatable model fields using a new jinja filter called `i18n`.

No changes to the UI or data will be visible to a user who edits a translatable field -- the user continues to provide the full `BABEL_DEFAULT_LOCALE` default text for the field.

When this feature is enabled and the user views an asset, they will see their selected language's translation, if found, or the default text, if not.

When rendering translatable fields with this feature enabled:

* If there's only one language configured, or the user is using the app's default language, simply return the field's default text.
* Else:
  * Show `translated_text` found in the `i18n_translations` table for the given asset UUID + default text + user's current language.
  * Fall back to the default text if no translation is found.

#### Phase 2: UX

Add buttons to the Superset UI so that editors can provide translations of asset text there. Exact UX TBD, but see https://github.com/apache/superset/issues/13442 for one nice way to do this.
When editing a templated field, the user should be able to see:

* The full `default_text` value to be translated
* All configured [`LANGUAGES`](https://github.com/apache/superset/blob/45ea11c1b65887755f30e5945ea280abc0847929/superset/config.py#L378-L396) to be translated

This phase requires making a lot of UI changes, so will likely be broken into smaller contributions. It also would benefit from the input of a UI/UX designer, which we can provide.

### New or Changed Public Interfaces

* Visualization types -- unchanged
* Form data for saved dashboards and charts -- likely unchanged, but see "Open Question 4" below.

#### Phase 1: Functionality

Configuration:

1. A new configuration feature flag would be added to enable/disable this feature, called `ASSET_TRANSLATIONS`.

Data passed between backend and frontend:

1. Add new jinja filter called `i18n` which uses the `i18n_translations` table described below to render translated text in a request.
2. Add a model mixin to add the concept of "templated fields". E.g. the Dashboard model would have the following class variable:

   ```python
   templated_fields = [
       'dashboard_title',
       'description',
       'json_metadata',
       'position_json',
   ]
   ```
3. When rendering model fields to the "view" API:
   * If the app-wide setting is enabled: wrap templated field values in the `i18n` jinja filter, and render values.
   * Else: pass field values straight through.

Command line tools and arguments:

1. New command-line tools will be added to extract translatable asset fields and import translated text into the `i18n_translations` table, using the `.mo` translation file format.

#### Phase 2: UX

A new "translation UI component" would take these properties as input:

* `asset_uuid`
* `model_name`
* `field_name`
* `default_text`

REST endpoints:

1. A "translation API" would be added to the backend to support these components for viewing and editing translated text for a given asset model field with the configured `BABEL_DEFAULT_LOCALE` and `LANGUAGES` settings.
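
To make the Phase 1 interfaces above more concrete, here is a minimal, non-authoritative sketch of the lookup-with-fallback behaviour. It is not Superset code. It assumes the standard-library `sqlite3` module in place of Superset's database layer, a plain `i18n()` function in place of the proposed jinja filter, and module-level `DEFAULT_LOCALE` / `LANGUAGES` constants in place of the real settings. The table columns follow the `i18n_translations` field list in this proposal, but their types are guesses; the helper names `make_table` and `render_templated_fields` and the sample row are hypothetical.

```python
import sqlite3

# Assumptions: stand-ins for BABEL_DEFAULT_LOCALE and the keys of LANGUAGES.
DEFAULT_LOCALE = "en"
LANGUAGES = ("en", "es", "fr")


def make_table(conn):
    """Create an illustrative i18n_translations table (column types are assumed)."""
    conn.execute(
        """
        CREATE TABLE i18n_translations (
            asset_uuid      TEXT NOT NULL,
            default_text    TEXT NOT NULL,
            language_code   TEXT NOT NULL,
            translated_text TEXT NOT NULL,
            model_name      TEXT NOT NULL,
            field_name      TEXT NOT NULL,
            UNIQUE (asset_uuid, default_text, language_code)
        )
        """
    )


def i18n(conn, asset_uuid, default_text, language):
    """Return the translation for `language`, or `default_text` if none exists."""
    # Fast path: single-language installs and default-locale users never query.
    if len(LANGUAGES) <= 1 or language == DEFAULT_LOCALE:
        return default_text
    row = conn.execute(
        "SELECT translated_text FROM i18n_translations "
        "WHERE asset_uuid = ? AND default_text = ? AND language_code = ?",
        (asset_uuid, default_text, language),
    ).fetchone()
    return row[0] if row else default_text


def render_templated_fields(conn, asset, templated_fields, language, enabled=True):
    """Translate only the listed fields; pass values straight through when disabled."""
    return {
        name: i18n(conn, asset["uuid"], asset[name], language) if enabled else asset[name]
        for name in templated_fields
    }


# Hypothetical sample data: one Spanish translation for one dashboard title.
conn = sqlite3.connect(":memory:")
make_table(conn)
conn.execute(
    "INSERT INTO i18n_translations VALUES (?, ?, ?, ?, ?, ?)",
    ("uuid-1", "Revenue Dashboard", "es", "Panel de ingresos",
     "dashboards", "dashboard_title"),
)
dashboard = {"uuid": "uuid-1", "dashboard_title": "Revenue Dashboard"}
```

With this sample row, calling `render_templated_fields(conn, dashboard, ["dashboard_title"], "es")` returns the stored translation, while `en`, an untranslated language such as `fr`, or `enabled=False` all return the stored title unchanged. Because the lookup is keyed on `default_text`, editing the title would presumably fall back to the new default text until a matching translation is added.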
### New dependencies

No new `npm`/`PyPI` packages required.

### Migration Plan and Compatibility

A migration will be added to create a new table called `i18n_translations`. Fields will be:

* `asset_uuid`: unique identifier for the asset being translated.
* `default_text`: user-provided text for the configured `BABEL_DEFAULT_LOCALE`
* `language_code`: language/locale code for the translated text
* `translated_text`: `default_text` translation for the `language_code`
* `model_name`: name of the table containing the translatable field.
* `field_name`: name of the field being translated.

Indexes:

* `asset_uuid` + `default_text` + `language_code` (unique): used to locate any `translated_text` fields when rendering a given asset. May also add `model_name` and/or `field_name` to this index, if needed.

Notes:

* Dashboards and charts that are saved or bookmarked will still work after the change, or when this change is enabled/disabled.

### Open Questions:

1. May need to batch queries or use a cache for performance.
2. Some model fields are edited within a modal popup, so we'd want to avoid a modal-within-modal UI.
3. Is there a difference in how echart editors are rendered vs old-style charts? i.e. Do they both use the data API?
4. Unsure how to provide translations for complex fields like dashboard [`json_metadata`](https://github.com/apache/superset/blob/45ea11c1b65887755f30e5945ea280abc0847929/superset/models/dashboard.py#L141), [`position_json`](https://github.com/apache/superset/blob/45ea11c1b65887755f30e5945ea280abc0847929/superset/models/dashboard.py#L136) and slice [`params`](https://github.com/apache/superset/blob/45ea11c1b65887755f30e5945ea280abc0847929/superset/models/slice.py#L80), e.g. `metadata.native_filter_configuration.name`, `position.*.meta.text`, `params.x_axis_label`. May need to store templated field values in these complex fields, and mark the whole field (e.g. `json_metadata`) as a `templated_field`.
5. Unsure how to provide asset search/filtering for translated text -- can we use [base_filters](https://flask-appbuilder.readthedocs.io/en/latest/advanced.html#base-filtering) somehow?

### Rejected Alternatives

* [SIP-60](https://github.com/apache/superset/issues/13442) -- The frontend UI proposed in SIP-60 could be used here. But we propose an alternative backend implementation which uses a single `i18n_translations` table to store all translations instead of introducing an individual `<field_name>_i18n` field for each translatable field.
* [SIP-153](https://github.com/apache/superset/issues/32139) -- rejected because its approach was ["build time" focused and not as dynamic as desired](https://github.com/apache/superset/issues/32139#issuecomment-2670053442) and because it abuses the [frontend language packs shipped from the backend](https://github.com/apache/superset/issues/32139#issuecomment-2670069005), which is not ideal. SIP-153 also didn't support returning translated fields in the data APIs, while this solution does.
* Decided not to support the more complex jinja filter format [proposed here](https://github.com/apache/superset/issues/32139#issuecomment-2675387707), e.g. `"CompanyX {{ "Revenue Dashboard" | i18n("rev_dash") }}"`, because:
  1. Many languages would need to change the order of the translatable phrases in order to be sensible, e.g. "Panel de ingresos de la CompanyX", and so translating the full phrase is better.
  2. Simplifies the editing user's experience by not requiring them to use jinja template syntax.
  3. Allows this feature to be enabled or disabled without having to change data. E.g. if the actual stored `dashboard_title = "CompanyX {{ "Revenue Dashboard" | i18n("rev_dash") }}"`, this field becomes unintelligible if the feature is subsequently disabled.
  4. We have enough information (asset UUID, model name, field name) to uniquely identify the translated field text if we treat it as an atomic unit, rather than allowing its value to be broken into some unknown number of parts.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
