Re: [I] [SIP-161] Translating Superset asset data [superset]

via GitHub Fri, 04 Apr 2025 19:49:05 -0700


mistercrunch commented on issue #32854:
URL: https://github.com/apache/superset/issues/32854#issuecomment-2780156812


   There's a lot to parse here, but will dump more thoughts:
   
   ### Clarifying the use of `string_id`s and/or strings as keys in `i18n_translations`
   
   Say I want to mark a string for translation: I'd put something like 
`{{ i18n('Amazing Dashboard') }}`, but we could also consider supporting 
`{{ i18n(string_id='AMZ_DASH', main_locale='Amazing Dashboard') }}`. There are 
tradeoffs here, but either works. I would suggest against 
`{{ i18n(string_id='AMZ_DASH') }}`, where the main-locale string has to be 
looked up, but hey, that could be manageable too. Allowing **only** 
`{{ i18n('Amazing Dashboard') }}` is probably my preferred approach, though if 
you change something minor in the string, like capitalization, it orphans all 
the related translations. That is probably right (capitalization should change 
in the translations too), but you might have to copy/paste/edit for long 
strings.
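
   A minimal sketch of the string-as-key lookup and of the orphaning effect, 
in plain Python rather than as a registered Jinja macro. The `TRANSLATIONS` 
dict stands in for rows of `i18n_translations`, and the `current_locale` and 
`source_locale` parameters are assumptions made for illustration, not part of 
the proposal.

```python
# Hypothetical in-memory stand-in for rows of `i18n_translations`.
# Keys are (source_string, locale); a real implementation would read the table.
TRANSLATIONS = {
    ("Amazing Dashboard", "fr"): "Tableau de bord incroyable",
}


def i18n(text, current_locale="en", source_locale="en"):
    """Translate `text`, using the source string itself as the lookup key."""
    if current_locale == source_locale:
        return text
    # Fall back to the source string when no translation exists yet.
    return TRANSLATIONS.get((text, current_locale), text)


fr_title = i18n("Amazing Dashboard", current_locale="fr")
# A capitalization change is a different key, so the old translation is
# orphaned and the lookup falls back to the (new) source string.
orphaned = i18n("Amazing dashboard", current_locale="fr")
```

   With a `string_id` as the key instead, the second call would still find the 
French row, at the cost of carrying an extra identifier in every template.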
   
   ### About caching
   
   I'm worried about lots of IO against `i18n_translations`. It's probably 
reasonable for the backend to manage an in-memory data structure that it checks 
first, before fetching from `i18n_translations` and caching the result. Maybe 
that's behind a config flag, `ENABLE_I18N_IN_MEMORY_CACHE`, for those who 
prefer IO over memory footprint. Simple tuples are used as keys, as in 
`i18n_cache[(str_id, locale)]`: if the entry is there, the lookup is free; if 
not, it fetches from `i18n_translations`. Memory footprint is probably 
reasonable here, roughly (avg_string_size * number_of_strings * 
number_of_locales). Note that each web worker carries this extra memory 
footprint, so provision your instances accordingly.
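
   A rough sketch of that cache, assuming a hypothetical `fetch_from_db` 
callable that reads one row from `i18n_translations` and returns the text or 
`None`. The class name, the choice to cache only hits, and the byte estimate 
helper are all illustrative assumptions.

```python
class I18nCache:
    """Per-process cache keyed by (str_id, locale) tuples."""

    def __init__(self, fetch_from_db):
        # `fetch_from_db(str_id, locale)` is a hypothetical callable, not an
        # existing Superset API.
        self._fetch_from_db = fetch_from_db
        self._cache = {}

    def get(self, str_id, locale):
        key = (str_id, locale)
        if key in self._cache:
            return self._cache[key]  # in memory: no IO
        value = self._fetch_from_db(str_id, locale)
        if value is not None:
            # Only hits are cached, so a string translated later is picked up.
            self._cache[key] = value
        return value


def estimate_cache_bytes(avg_string_size, number_of_strings, number_of_locales):
    """Rough footprint per web worker, ignoring dict and object overhead."""
    return avg_string_size * number_of_strings * number_of_locales


calls = []


def fake_fetch(str_id, locale):
    calls.append((str_id, locale))
    return {"AMZ_DASH": "Tableau de bord incroyable"}.get(str_id)


cache = I18nCache(fake_fetch)
first = cache.get("AMZ_DASH", "fr")   # goes to the (fake) table
second = cache.get("AMZ_DASH", "fr")  # served from memory
```

   As a worked example of the footprint formula, 50 bytes per string times 
10,000 strings times 5 locales is 2,500,000 bytes, so about 2.5 MB per worker 
before Python's own per-object overhead.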
   
   ### About batch-retrieval
   
   I think that's hard with jinja, not sure if it's even possible. With in-mem 
caching it's not as needed as each lookup is super cheap.
   
   ### About auto-populating translation through a service
   
   In the age of AI, I think no one should manually translate stuff anymore. 
The Jinja macro, as it encounters new strings, will auto-populate rows in 
`i18n_translations` and flag them as "to-be-translated". It seems like writing 
a simple job, or generating async jobs that call a service using celery-beat (a 
cron that runs every N minutes, looks for new strings, and batches a call to 
GPT or Google Translate), would be enough. No UI required then. Even if we do 
want/need a UI for translation, I'd suggest building it outside of Superset 
(similar to the poedit approach), so we'd only have to expose a CRUD REST API 
(if even that), and you could slap a simple UI on top (Claude can probably 
build one quickly, or you could plug a simple no-code solution like ReTool or 
AirTable on top of that table/model). It punts this UI complexity outside of 
Superset.
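
   The periodic job could look roughly like the sketch below. The row fields 
(`source`, `locale`, `status`, `translation`) and the `translate_batch` 
callable are assumptions standing in for the real table schema and for 
whichever translation service gets used; the scheduling itself (celery-beat or 
otherwise) is left out.

```python
from collections import defaultdict


def translate_pending(rows, translate_batch, batch_size=50):
    """Fill in rows flagged "to-be-translated", one service call per batch.

    `translate_batch(texts, locale)` is a hypothetical callable that returns
    one translated string per input text, in the same order.
    """
    by_locale = defaultdict(list)
    for row in rows:
        if row["status"] == "to-be-translated":
            by_locale[row["locale"]].append(row)

    translated = 0
    for locale, pending in by_locale.items():
        for start in range(0, len(pending), batch_size):
            batch = pending[start:start + batch_size]
            results = translate_batch([r["source"] for r in batch], locale)
            for row, text in zip(batch, results):
                row["translation"] = text
                row["status"] = "translated"
                translated += 1
    return translated


def fake_translate(texts, locale):
    # Stand-in for a real service call; just tags each string with the locale.
    return [f"[{locale}] {text}" for text in texts]


rows = [
    {"source": "Amazing Dashboard", "locale": "fr",
     "status": "to-be-translated", "translation": None},
    {"source": "Sales", "locale": "de",
     "status": "to-be-translated", "translation": None},
    {"source": "Revenue", "locale": "fr",
     "status": "translated", "translation": "Revenu"},
]
count = translate_pending(rows, fake_translate)
```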
   
   ### About garbage collection
   
   Lots of strings might get orphaned, and it can be good to trim the table. 
That would require maintaining some sort of `last_used_dttm` column, but that's 
more IO to manage... Maybe it's yet another config flag that you turn on 
periodically, say for a month, and then delete based on usage... Maybe I'm 
overthinking it. Or maybe it's lazy: if we read the string and notice that it's 
been more than, say, a week since the last recorded read, only then do we run 
an UPDATE for that `last_used_dttm`, minimizing write operations on read.
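
   The lazy variant can be sketched as follows, assuming a row dict with 
`translation` and `last_used_dttm` fields and a hypothetical 
`write_last_used` callable that would issue the UPDATE. The one-week 
threshold is the example value from the paragraph above.

```python
from datetime import datetime, timedelta

TOUCH_INTERVAL = timedelta(days=7)  # "say a week"


def read_translation(row, now, write_last_used):
    """Return the translation, refreshing `last_used_dttm` only when stale."""
    last_used = row["last_used_dttm"]
    if last_used is None or now - last_used > TOUCH_INTERVAL:
        # `write_last_used` is a hypothetical hook that would run the UPDATE.
        write_last_used(row, now)
    return row["translation"]


writes = []


def fake_update(row, now):
    writes.append(now)
    row["last_used_dttm"] = now


now = datetime(2025, 4, 4, 12, 0)
row = {
    "translation": "Tableau de bord incroyable",
    "last_used_dttm": now - timedelta(days=10),
}
read_translation(row, now, fake_update)                      # stale: one UPDATE
read_translation(row, now + timedelta(days=1), fake_update)  # fresh: no UPDATE
```

   A cleanup job could then delete rows whose `last_used_dttm` is older than 
some cutoff, at the cost of at most one write per string per week.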
   
   ### About Jinja overhead
   
   Curious how much overhead the `templated_fields` approach adds, even in 
`en`-only envs. You have to create the Jinja template object and `.render()` 
it, and even if the result is just a short "oh I'm in en, don't have to do 
anything here...", it may add significant overhead. I'd suggest either 
profiling this, or simply not bothering with the template logic if/when we're 
in a non-i18n environment.
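
   One way to skip the template logic is a guard in front of the render call. 
In this sketch, `i18n_enabled` and the `render` callable are assumptions 
(the latter standing in for the Jinja create-and-render step), and it assumes 
that fields in a non-i18n deployment hold plain strings. The check for `{{` is 
an extra shortcut that is not part of the suggestion above.

```python
def render_field(value, i18n_enabled, render):
    """Render a templated field only when there is something to render."""
    if not i18n_enabled:
        # Assumes non-i18n deployments store plain strings in these fields.
        return value
    if "{{" not in value:
        # Extra shortcut (an assumption): no template marker, nothing to do.
        return value
    return render(value)


render_calls = []


def fake_render(value):
    # Stand-in for building a Jinja template and calling .render() on it.
    render_calls.append(value)
    return "Tableau de bord incroyable"


plain = render_field("Amazing Dashboard", True, fake_render)
disabled = render_field("Amazing Dashboard", False, fake_render)
templated = render_field("{{ i18n('Amazing Dashboard') }}", True, fake_render)
```

   Only the last call reaches the render step; the first two return the input 
unchanged.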
   
   -------
   All in all, a few new ideas here and things to clarify. Main thing I'd push 
hard on is keeping the "translation UI" out of Superset, maybe even the 
translation service logic as there are a bunch of non-open-source services we'd 
have to rely on... Maybe it's a plugin or something that lives outside of 
Superset.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

