mistercrunch commented on issue #32854:
URL: https://github.com/apache/superset/issues/32854#issuecomment-2780156812
There's a lot to parse here, but will dump more thoughts:
### Clarifying the use of `string_id`s and/or strings as keys in `i18n_translations`
Say I want to mark a string for translation: I'd put something like
`{{ i18n('Amazing Dashboard') }}`, but we could also consider supporting
`{{ i18n(string_id='AMZ_DASH', main_locale='Amazing Dashboard') }}`. There are
tradeoffs here, but either works. I would suggest against
`{{ i18n(string_id='AMZ_DASH') }}`, where the main-locale text has to be looked
up, but hey, that could be manageable too. Allowing **only**
`{{ i18n('Amazing Dashboard') }}` is probably my preferred approach, though if
you change something minor in the string, like capitalization, it orphans all
the related translations, which is probably right (capitalization should change
there too, but you might have to copy/paste/edit for long strings).
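To make the tradeoff concrete, here is a minimal sketch of a lookup function that accepts both call styles. Everything in it is an assumption for illustration: the `TRANSLATIONS` dict stands in for the `i18n_translations` table, the `locale` keyword and the French sample strings are made up, and falling back to the raw key for the `string_id`-only form is just one possible behavior.

```python
from typing import Optional

# Hypothetical stand-in for the `i18n_translations` table,
# keyed by (string key, locale). Sample data only.
TRANSLATIONS = {
    ("Amazing Dashboard", "fr"): "Tableau de bord incroyable",
    ("AMZ_DASH", "fr"): "Tableau de bord incroyable",
}


def i18n(
    text: Optional[str] = None,
    *,
    string_id: Optional[str] = None,
    main_locale: Optional[str] = None,
    locale: str = "en",
) -> str:
    """Return the translation for `locale`, else the main-locale text."""
    # The key is the explicit string_id when given, otherwise the string itself.
    key = string_id if string_id is not None else text
    if key is None:
        raise ValueError("i18n() needs either a string or a string_id")
    # Fallback text: the positional string, or the explicit main_locale.
    fallback = text if text is not None else main_locale
    if fallback is None:
        # string_id-only form: the main-locale text would need its own
        # lookup; this sketch simply falls back to the key.
        fallback = key
    return TRANSLATIONS.get((key, locale), fallback)
```

With the string-as-key form, `i18n("Amazing dashboard", locale="fr")` (lowercase "d") misses the existing row and returns the untranslated text, which is the orphaning behavior described above.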
### About caching
Worried about lots of IO with `i18n_translations`, I'd say it's probably
reasonable for the backend to manage an in-memory data structure that it looks
up first, prior to fetching/caching from `i18n_translations`. Maybe that's
behind a config flag `ENABLE_I18N_IN_MEMORY_CACHE`, so those who prefer IO over
memory footprint can turn it off. Simple tuples used as keys, as in
`i18n_cache[(str_id, locale)]`: if it's there it's free to look up, if not it
fetches from `i18n_translations`. Memory footprint is probably reasonable here,
a simple factor of (avg_string_size * number_of_strings * number_of_locales).
Note that each web worker will have this incremental memory footprint, so
provision your instances accordingly.
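A rough sketch of that lookup path, assuming a hypothetical `fetch` callable that reads one row from `i18n_translations` (the class name, the `misses` counter, and the choice to also cache "not found" results are assumptions, not anything in Superset today):

```python
from typing import Callable, Dict, Optional, Tuple


class I18nCache:
    """Per-process cache keyed by (str_id, locale)."""

    def __init__(self, fetch: Callable[[str, str], Optional[str]]) -> None:
        self._fetch = fetch  # hypothetical DB lookup
        self._cache: Dict[Tuple[str, str], Optional[str]] = {}
        self.misses = 0  # number of times we had to hit the backing store

    def get(self, str_id: str, locale: str) -> Optional[str]:
        key = (str_id, locale)
        if key not in self._cache:
            # Cache miss: fetch once, then remember the result
            # (including None, so missing rows are not re-queried).
            self.misses += 1
            self._cache[key] = self._fetch(str_id, locale)
        return self._cache[key]


def estimate_bytes(
    avg_string_size: int, number_of_strings: int, number_of_locales: int
) -> int:
    """Raw string payload per worker, ignoring Python object overhead."""
    return avg_string_size * number_of_strings * number_of_locales
```

As a back-of-the-envelope example, 10,000 strings averaging 100 bytes across 5 locales comes to about 5 MB of raw string data per worker, before Python object overhead.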
### About batch-retrieval
I think that's hard with jinja, not sure if it's even possible. With in-mem
caching it's not as needed as each lookup is super cheap.
### About auto-populating translation through a service
In the age of AI, I think no one should manually translate stuff anymore.
The jinja macro, as it encounters new strings, will auto-populate rows in
`i18n_translations` and flag them as "to-be-translated". It seems simple enough
to write a job, or generate async jobs, that calls a translation service using
celery-beat (a cron that runs every N minutes, looks for new strings, and
batches a call to GPT or Google Translate). No UI required then. Even if we do
want/need a UI for translation, I'd suggest building it externally to Superset
(similar to the Poedit approach), so we'd only have to expose a CRUD REST API
(if even), and you could slap a simple UI on top (Claude can probably build one
quickly, or you could even plug a simple no-code solution like Retool or
Airtable on top of that table/model). It punts this UI complexity outside of
Superset.
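The body of such a periodic job could look roughly like the sketch below. The row shape, the status values, and the `translate_batch` callable are all hypothetical; the callable stands in for whatever external service (GPT, Google Translate, ...) a deployment chooses, and the scheduling itself (celery-beat or otherwise) is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TranslationRow:
    """Hypothetical shape of a row in `i18n_translations`."""
    string_id: str
    locale: str
    source_text: str
    translated_text: Optional[str] = None
    status: str = "to-be-translated"


def translate_pending(
    rows: List[TranslationRow],
    translate_batch: Callable[[List[str], str], List[str]],
    batch_size: int = 50,
) -> int:
    """Translate all flagged rows in per-locale batches; return the count."""
    # Group pending rows by target locale so each service call is one batch.
    by_locale: Dict[str, List[TranslationRow]] = {}
    for row in rows:
        if row.status == "to-be-translated":
            by_locale.setdefault(row.locale, []).append(row)

    done = 0
    for locale, group in by_locale.items():
        for start in range(0, len(group), batch_size):
            chunk = group[start:start + batch_size]
            results = translate_batch([r.source_text for r in chunk], locale)
            for row, text in zip(chunk, results):
                row.translated_text = text
                row.status = "translated"
                done += 1
    return done
```

Because the translation service is injected as a plain callable, the job logic stays independent of any particular non-open-source provider.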
### About garbage collection
Lots of strings might get orphaned, and it can be good to trim the table. That
would require maintaining some sort of "last_used_dttm" thing, but that's more
IO to manage... Maybe it's yet another config flag that you could turn on
periodically, say for a month, and then delete based on usage... Maybe I'm
overthinking it. Or maybe it's lazy: if we read the string and notice that
it's been more than, say, a week since the last recorded read, only then do we
run an UPDATE for that `last_used_dttm`, minimizing write operations on read.
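The lazy variant can be sketched as follows. The one-week interval comes from the paragraph above; the dict-shaped row and the function names are assumptions, and the in-place assignment stands in for the actual `UPDATE` statement.

```python
from datetime import datetime, timedelta
from typing import Optional

TOUCH_INTERVAL = timedelta(days=7)  # "say a week"


def needs_touch(
    last_used_dttm: Optional[datetime],
    now: datetime,
    interval: timedelta = TOUCH_INTERVAL,
) -> bool:
    """True when the recorded read is missing or older than the interval."""
    return last_used_dttm is None or now - last_used_dttm > interval


def read_translation(row: dict, now: datetime) -> str:
    """Return the text, writing `last_used_dttm` only when it is stale."""
    if needs_touch(row.get("last_used_dttm"), now):
        # Stand-in for: UPDATE i18n_translations SET last_used_dttm = :now ...
        row["last_used_dttm"] = now
    return row["text"]
```

With this scheme a frequently read string triggers at most one write per interval, and garbage collection can delete rows whose `last_used_dttm` is older than some larger cutoff.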
### About Jinja overhead
Curious how much overhead the `templated_fields` approach adds, even in
`en`-only envs. You have to create the jinja template object and `.render()`
it, and even if it's just a short "oh I'm in en, don't have to do anything
here...", it may add significant overhead. I'd suggest either profiling this,
or simply not bothering with the template logic if/when we're in a non-i18n
environment.
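One cheap way to skip the work is a guard in front of the renderer, sketched here under assumptions: `i18n_enabled` is a hypothetical setting, `render` is whatever callable builds and renders the Jinja template, and the marker check assumes Jinja's default `{{` and `{%` delimiters.

```python
from typing import Callable


def maybe_render(
    value: str, i18n_enabled: bool, render: Callable[[str], str]
) -> str:
    """Render `value` only when i18n is on and it looks like a template."""
    # Fast path: no template object is created in non-i18n environments,
    # or for plain strings without any Jinja markers.
    if not i18n_enabled or ("{{" not in value and "{%" not in value):
        return value
    return render(value)
```

The substring check costs far less than constructing a template, but whether the saving matters in practice is exactly what profiling would tell us.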
-------
All in all, a few new ideas here and things to clarify. Main thing I'd push
hard on is keeping the "translation UI" out of Superset, maybe even the
translation service logic as there are a bunch of non-open-source services we'd
have to rely on... Maybe it's a plugin or something that lives outside of
Superset.