On Tue, May 16, 2017 at 7:03 AM, Tim Guan-tin Chien <timdr...@mozilla.com> wrote:
> According to Alexa top 100 Taiwan sites and quick spot checks, I can only
> see the following two sites encoded in Big5:
>
> http://www.ruten.com.tw/
> https://www.momoshop.com.tw/
>
> Both are shopping sites (eBay-like and Amazon-like) so you get the idea how
> forms are used there.

Thank you. It seems to me that encoder performance doesn't really matter for sites like these, since the number of characters one would enter in the search field at a time is very small.

> Mike reminded me to check the Tax filing website: http://www.tax.nat.gov.tw/
> .Yes, it's unfortunately also in Big5.

I guess I'm not going to try filing taxes there for testing. :-)

- -

One option I've been thinking about is computing an encode acceleration table lazily:

 * for JIS X 0208, on the first attempt to encode a CJK Unified Ideograph in any of Shift_JIS, EUC-JP or ISO-2022-JP,
 * for GBK, on the first attempt to encode a CJK Unified Ideograph in either GBK or gb18030, and
 * for Big5, on the first attempt to encode a CJK Unified Ideograph in Big5.

Each of the three tables would then remain allocated until the process terminates.

This would have the advantage of not bloating our binary footprint with data that can be computed from other data in the binary, while still making legacy Chinese and Japanese encode fast without a setup cost for each encoder instance.

The downsides would be that the memory for the tables wouldn't be reclaimed if the tables aren't needed anymore (the browser can't predict the future), and that executions where any of the tables has been created wouldn't be valgrind-clean. Also, in the multi-process world, the tables would be recomputed per process. OTOH, if we shut down renderer processes from time to time, that would work as a coarse mechanism to reclaim the memory in case Japanese or Chinese legacy encode is a relatively isolated event in the user's browsing pattern.

Creating a mechanism for the encoding library to become aware of application shutdown just in order to be valgrind-clean would be messy, though. (Currently, we have shutdown bugs where uconv gets used after we've told it that it can shut down. I'd really want to avoid re-introducing that class of bugs with encoding_rs.)

Is it OK to create allocations that are intentionally never freed (i.e. process termination is what "frees" them)? Is valgrind's message suppression mechanism granular enough to suppress three allocations from a particular Rust crate statically linked into libxul?

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
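
For illustration, the idea of a table that is computed on first use and then stays allocated until process exit could look roughly like the sketch below. This is not encoding_rs code: `DECODE_INDEX`, `ENCODE_TABLE`, `encode_table` and `encode_ideograph` are hypothetical names, the index contents are placeholder values rather than real Big5 data, and `std::sync::OnceLock` (Rust 1.70 or later) is assumed as the lazy-initialization primitive.

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for a decode index that already ships in the binary:
// the array position is the "pointer", the value is the code point.
// Placeholder data, not the real Big5 index.
static DECODE_INDEX: [u16; 6] = [0x4E00, 0x4E59, 0x4E01, 0x4E03, 0x4E43, 0x4E5D];

// Encode acceleration table: (code point, pointer) pairs sorted by code point.
// Statics are never dropped, so the Vec's heap allocation is intentionally
// never freed; it lives until the process terminates.
static ENCODE_TABLE: OnceLock<Vec<(u16, u16)>> = OnceLock::new();

// Returns the table, computing it from the decode index on the first call only.
fn encode_table() -> &'static [(u16, u16)] {
    ENCODE_TABLE.get_or_init(|| {
        let mut table: Vec<(u16, u16)> = DECODE_INDEX
            .iter()
            .enumerate()
            .map(|(pointer, &code_point)| (code_point, pointer as u16))
            .collect();
        table.sort_unstable_by_key(|&(code_point, _)| code_point);
        table
    })
}

// Maps a code point to its pointer by binary search over the lazily built table.
fn encode_ideograph(code_point: u16) -> Option<u16> {
    let table = encode_table();
    table
        .binary_search_by_key(&code_point, |&(cp, _)| cp)
        .ok()
        .map(|i| table[i].1)
}
```

Here the first call to `encode_ideograph` pays the setup cost and every later call in the same process, from any encoder instance, reuses the same table. `OnceLock` also makes the initialization safe if two threads hit the first encode at the same time. Nothing in this sketch ever frees the table, which is exactly the behavior the questions above are about.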