On Tue, May 16, 2017 at 7:03 AM, Tim Guan-tin Chien
<timdr...@mozilla.com> wrote:
> According to Alexa top 100 Taiwan sites and quick spot checks, I can only
> see the following two sites encoded in Big5:
>
> http://www.ruten.com.tw/
> https://www.momoshop.com.tw/
>
> Both are shopping sites (eBay-like and Amazon-like) so you get the idea how
> forms are used there.

Thank you. It seems to me that encoder performance doesn't really
matter for sites like these, since the number of characters one would
enter in the search field at a time is very small.

> Mike reminded me to check the Tax filing website:
> http://www.tax.nat.gov.tw/ . Yes, it's unfortunately also in Big5.

I guess I'm not going to try filing taxes there for testing. :-)

- -

One option I've been thinking about is computing an encode
acceleration table for JIS X 0208 on the first attempt to encode a CJK
Unified Ideograph in any of Shift_JIS, EUC-JP or ISO-2022-JP, for GBK
on the first attempt to encode a CJK Unified Ideograph in either GBK
or gb18030, and for Big5 on the first attempt to encode a CJK Unified
Ideograph in Big5.

Each of the three tables would then remain allocated through to the
termination of the process.

This would have the advantage of not bloating our binary footprint
with data that can be computed from other data in the binary while
still making legacy Chinese and Japanese encode fast without a setup
cost for each encoder instance.
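
To make the idea concrete, here is a minimal sketch of one such
lazily-computed table. It is only an illustration under stated
assumptions, not encoding_rs code: DECODE_INDEX is a made-up six-entry
stand-in for the decode-side index data that already ships in the
binary, encode_table() and encode_pointer() are hypothetical names, and
std::sync::OnceLock (Rust 1.70 or later) is one possible way to get
once-only initialization. The sketch also ignores code points that
appear more than once in a real index.

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for decode-side index data already in the binary:
// the array position is the pointer, the value is the BMP code point.
static DECODE_INDEX: [u16; 6] = [0x4E00, 0x4E8C, 0x4E09, 0x56DB, 0x4E94, 0x516D];

// One process-wide table. It is built on first use and intentionally never
// freed; process termination is what reclaims it.
static ENCODE_TABLE: OnceLock<Vec<(u16, u16)>> = OnceLock::new();

// Returns the (code point, pointer) pairs sorted by code point, computing
// them from DECODE_INDEX the first time any encoder asks for them.
fn encode_table() -> &'static [(u16, u16)] {
    ENCODE_TABLE
        .get_or_init(|| {
            let mut table: Vec<(u16, u16)> = DECODE_INDEX
                .iter()
                .enumerate()
                .map(|(pointer, &code_point)| (code_point, pointer as u16))
                .collect();
            table.sort_by_key(|&(code_point, _)| code_point);
            table
        })
        .as_slice()
}

// Looks up the pointer for a code point with a binary search instead of a
// linear scan over the decode-side data.
fn encode_pointer(code_point: u16) -> Option<u16> {
    let table = encode_table();
    table
        .binary_search_by_key(&code_point, |&(cp, _)| cp)
        .ok()
        .map(|i| table[i].1)
}
```

In this sketch every encoder instance would call encode_pointer(), so
only the first call in the process pays the setup cost, and there is no
per-instance state to tear down.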

The downsides would be that the memory for the tables wouldn't be
reclaimed if the tables aren't needed anymore (the browser can't
predict the future) and executions where any of the tables has been
created wouldn't be valgrind-clean. Also, in the multi-process world,
the tables would be recomputed per-process. OTOH, if we shut down
renderer processes from time to time, it would work as a coarse
mechanism to reclaim the memory in case Japanese or Chinese legacy
encode is a relatively isolated event in the user's browsing pattern.

Creating a mechanism for the encoding library to become aware of
application shutdown just in order to be valgrind-clean would be
messy, though. (Currently, we have shutdown bugs where uconv gets used
after we've told it that it can shut down. I'd really want to avoid
re-introducing that class of bugs with encoding_rs.)

Is it OK to create allocations that are intentionally never freed
(i.e. process termination is what "frees" them)? Is valgrind's message
suppression mechanism granular enough to suppress three allocations
from a particular Rust crate statically linked into libxul?

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
