Status of support for Android on ARMv7 without NEON
Do we support Android on ARMv7 without NEON? Does the answer differ for our different API level builds?
Aarch64 as higher than tier-3?
Are there any plans to support Aarch64 as a tier higher than tier-3? For Android or the upcoming Aarch64 flavor of Windows 10 maybe?
Re: Status of support for Android on ARMv7 without NEON
On Mon, Jan 23, 2017 at 8:03 PM, Nicholas Alexander wrote:
> On Mon, Jan 23, 2017 at 9:58 AM, Henri Sivonen wrote:
>> Do we support Android on ARMv7 without NEON?
>
> Ralph Giles told me just a few days ago that yes, we support ARMv7 with and
> without NEON.

OK. That makes it impractical to use NEON in Rust at this time, then (even with optimism about currently nightly-only Rust features becoming available for Firefox code soon enough).

> Right now, we ship only a single Fennec APK that supports Android API 15+.

Thanks. I thought we were shipping a couple of different APKs for different API levels.
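For context, a minimal sketch (with a hypothetical function name) of why a single APK that must also run on non-NEON ARMv7 makes compile-time NEON use impractical: the cfg below resolves to the scalar fallback for such a build, so the NEON path could only ever be reached with run-time detection and dispatch.

    // Sketch only; the function name is hypothetical. The point is the cfg
    // gating: a build that must support non-NEON ARMv7 compiles the fallback,
    // so NEON code is never reached without run-time dispatch.
    #[cfg(target_feature = "neon")]
    fn fill_with_zeros(buf: &mut [u8]) {
        // NEON intrinsics would go here (nightly-only Rust at the time);
        // plain safe code stands in for them in this sketch.
        for b in buf.iter_mut() {
            *b = 0;
        }
    }

    #[cfg(not(target_feature = "neon"))]
    fn fill_with_zeros(buf: &mut [u8]) {
        // Scalar fallback used whenever the target may lack NEON.
        for b in buf.iter_mut() {
            *b = 0;
        }
    }

    fn main() {
        let mut buf = [1u8; 16];
        fill_with_zeros(&mut buf);
        assert!(buf.iter().all(|&b| b == 0));
    }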
Re: Aarch64 as higher than tier-3?
On Mon, Jan 23, 2017 at 9:00 PM, Kevin Brosnan wrote:
> AArch64 (ARMv8) has been shipping on Android phones since Q3 2014.

Furthermore, the flagships all seem to have 4 GB of RAM now. It's not clear to me whether ARMv7 userland processes on an AArch64 Android kernel get 2 GB or 3 GB of virtual address space, so it's not exactly clear to me to what extent we are currently failing to use the RAM on the latest phones, but in any case, the physical RAM is already in the territory where the pointer size starts to matter.

(The context of my question was, however, understanding how soon we might have higher-than-tier-3 configs that have NEON available unconditionally.)
How to see which parts of Servo are in use?
A search like https://searchfox.org/mozilla-central/search?q=EncodingRef finds a bunch of stuff that is under servo/components/script/. I gather we don't use that part of Servo in Quantum. Correct?

How does one see which parts of servo/ are actually in use in Quantum? Is there a way to filter out the unused parts on Searchfox?
Tool for converting C++ standard library naming style to Gecko naming style?
Is there some kind of tool, similar to ./mach clang-format for whitespace, that changes identifiers from C++ standard library style to Gecko style? I.e. foo becomes Foo, foo_bar becomes FooBar, and arguments and members get prefixed with a and m, respectively?
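To illustrate the conversion in question (hypothetical declarations, not real Gecko code):

    #include <stddef.h>
    #include <stdint.h>

    // C++ standard library style:
    class byte_sink {
     public:
      void append_bytes(const uint8_t* data, size_t byte_count);
     private:
      size_t total_bytes;
    };

    // The same class in Gecko style: types and methods become CamelCase,
    // arguments gain an 'a' prefix and members an 'm' prefix.
    class ByteSink {
     public:
      void AppendBytes(const uint8_t* aData, size_t aByteCount);
     private:
      size_t mTotalBytes;
    };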
Mozilla naming style vs. C++ standard library containers
It seems that we already have MFBT code that has lower-case methods begin/end/cbegin/cend, etc., for compatibility with C++ standard library iteration: https://dxr.mozilla.org/mozilla-central/source/mfbt/ReverseIterator.h#136

I guess these methods shouldn't be capitalized, then.

It seems that even size() and empty() should be lower-case if one wants a type to quack like a Container: http://en.cppreference.com/w/cpp/concept/Container

Should these, too, no longer be capitalized?

If a containerish type has more methods that are uncapitalized for the above reasons than methods whose names aren't constrained by standard interop, should the remaining methods follow Mozilla naming or standard library naming?
Re: Mozilla naming style vs. C++ standard library containers
On Feb 16, 2017 9:09 PM, "Botond Ballo" wrote:

> On Thu, Feb 16, 2017 at 1:05 PM, smaug wrote:
>> AFAIK, uncapitalized method names in MFBT are the old style, and new code
>> should just use Mozilla coding style.
>> This was discussed in some other thread in dev.platform, but I can't find
>> it right now.
>
> In the case of begin() and end(), it's not just a matter of style. These
> methods need to be lowercase to make the class work with the range-based
> for loop.

My question really is: Given that some methods have to have specific lower-case names, if I take the time to convert the other methods to Mozilla style, can I expect that a reviewer won't tell me to convert the other methods back in order to be consistent with the names whose case has to be lower case?
Re: Mozilla naming style vs. C++ standard library containers
On Thu, Feb 16, 2017 at 9:09 PM, Botond Ballo wrote:
> In the case of begin() and end(), it's not just a matter of style.

It seems that nsTArray, too, mixes these into a class that otherwise follows Mozilla naming, so I guess we have enough precedent for using the standard-library naming for iterator interop and Mozilla naming for other stuff in the same class.

On Thu, Feb 16, 2017 at 7:24 PM, Henri Sivonen wrote:
> It seems that even size() and empty() should be lower-case if one
> wants a type to quack like a Container:
> http://en.cppreference.com/w/cpp/concept/Container
>
> Should these, too, no longer be capitalized?

I fail to find precedent for lower-case size() and empty() in a class with otherwise Mozilla-case methods. However, we do have code that uses standard-library containers in m-c (mainly code originating from Google), so it seems like a bad idea to break compatibility with standard-library collections.

I think I'm going to proceed with keeping size() and empty() in lower case in order to quack like a standard library container but provide synonyms Length() and IsEmpty() for use from Mozilla-style code.
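A minimal sketch (not an actual Gecko class) of that convention: lower-case spellings where standard-library interop requires them, plus Mozilla-style synonyms for use from Mozilla-style code.

    #include <stddef.h>
    #include <stdint.h>

    class ByteView {
     public:
      ByteView(const uint8_t* aPtr, size_t aLength) : mPtr(aPtr), mLength(aLength) {}

      // Spellings required for range-based for and Container-like generic code:
      const uint8_t* begin() const { return mPtr; }
      const uint8_t* end() const { return mPtr + mLength; }
      size_t size() const { return mLength; }
      bool empty() const { return mLength == 0; }

      // Mozilla-style synonyms for use from Mozilla-style code:
      size_t Length() const { return size(); }
      bool IsEmpty() const { return empty(); }

     private:
      const uint8_t* mPtr;
      size_t mLength;
    };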
Doxygen output?
Our comments mostly try to follow the Doxygen format, and MDN says that the documentation team has a tool for importing Doxygen-formatted IDL comments into MDN articles. Other than that, is Doxygen output from m-c input being published anywhere? https://people-mozilla.org/~bgirard/doxygen/gfx/ is 404 these days.
Re: Doxygen output?
On Tue, Feb 21, 2017 at 12:42 AM, wrote:
> My short (<2yr) experience of the code gave me the impression that only a
> small amount of it has proper doxygen comments.
> We must be frequenting different circles; or I'm somehow blind to them. :-)

I get to look at stuff like:

/**
 * Cause parser to parse input from given URL
 * @update  gess 5/11/98
 * @param   aURL is a descriptor for source document
 * @param   aListener is a listener to forward notifications to
 * @return  TRUE if all went well -- FALSE otherwise
 */

> Anyway, they're mainly useful when generated websites/documents are readily
> available, which it seems isn't the case (anymore).

Right. I'm trying to assess how much effort I should put into writing Doxygen-formatted docs, and if we aren't really publishing Doxygen output, I feel like it's probably good to write /** ... */ comments in case we start using Doxygen again but probably not worthwhile to use the @ tags.

On Tue, Feb 21, 2017 at 10:13 PM, Bill McCloskey wrote:
> I've been thinking about how to integrate documentation into Searchfox. One
> obvious thing is to allow it to display Markdown files and
> reStructuredText. I wonder if it could do something useful with Doxygen
> comments though? Is this something people would be interested in?

I think integrating docs with Searchfox would be more useful than having unintegrated Doxygen output somewhere. Compared to just reading a .h with comments, I think a documentation view would be particularly useful for templates and headers with a lot of inline definitions as a means to let the reader focus on the interface and hide the implementation (including hiding whatever is in a namespace with the substring "detail" in the name of the namespace for templates).
Should cheddar-generated headers be checked in?
Looking at mp4parse, the C header is generated: https://searchfox.org/mozilla-central/source/media/libstagefright/binding/mp4parse_capi/build.rs

But also checked in: https://searchfox.org/mozilla-central/source/media/libstagefright/binding/include/mp4parse.h

Is this the best current practice that I should follow with encoding_rs?

See also: https://users.rust-lang.org/t/how-to-retrieve-h-files-from-dependencies-into-top-level-crates-target/9488 (unanswered at the moment)
Re: Should cheddar-generated headers be checked in?
On Wed, Feb 22, 2017 at 5:49 PM, Ted Mielczarek wrote:
> Given that
> the C API here is under your complete control, it seems like it's
> possible to generate a cross-platform header

I believe the header is cross-platform, yes.

> Alternately you could just generate it at build time, and we could pass
> the path to $(DIST)/include in a special environment variable so you
> could put the header in the right place.

So just https://doc.rust-lang.org/std/env/fn.var.html in build.rs? Any naming conventions for the special variable? (I'm inferring from the way you said it that DIST itself isn't being passed to the build.rs process. Right?)
Re: Tool for converting C++ standard library naming style to Gecko naming style?
On Fri, Feb 17, 2017 at 8:54 PM, Birunthan Mohanathas wrote:
> On 16 February 2017 at 13:41, Henri Sivonen wrote:
>> Is there some kind of tool, similar to ./mach clang-format for
>> whitespace, that changes identifiers from C++ standard library style to
>> Gecko style?
>
> Yes, clang-tidy has the readability-identifier-naming check:
> http://clang.llvm.org/extra/clang-tidy/checks/readability-identifier-naming.html

Thank you! This does more than half of the work. (When it renames method arguments, it doesn't rename the uses of the arguments in the method body, but that seems to be the only thing in my case that needs manual post-processing.)
Re: Should cheddar-generated headers be checked in?
On Thu, Feb 23, 2017 at 4:37 PM, Ted Mielczarek wrote:
> On Thu, Feb 23, 2017, at 06:40 AM, Emilio Cobos Álvarez wrote:
>> On Thu, Feb 23, 2017 at 08:25:30AM +0200, Henri Sivonen wrote:
>> > On Wed, Feb 22, 2017 at 5:49 PM, Ted Mielczarek wrote:
>> > > Given that
>> > > the C API here is under your complete control, it seems like it's
>> > > possible to generate a cross-platform header
>> >
>> > I believe the header is cross-platform, yes.
>> >
>> > > Alternately you could just generate it at build time, and we could pass
>> > > the path to $(DIST)/include in a special environment variable so you
>> > > could put the header in the right place.
>> >
>> > So just https://doc.rust-lang.org/std/env/fn.var.html in build.rs? Any
>> > naming conventions for the special variable? (I'm inferring from the
>> > way you said it that DIST itself isn't being passed to the build.rs
>> > process. Right?)
>>
>> FWIW, in Stylo we use MOZ_DIST[1], which is passed to the build script,
>> not sure if it's stylo only though.
>>
>> [1]: https://searchfox.org/mozilla-central/rev/b1044cf7c2000c3e75e8181e893236a940c8b6d2/servo/components/style/build_gecko.rs#48
>
> So if you're only concerned about it working in Gecko--there you go!

Thanks. I'm interested both in the Gecko case and the general case.

When doing a parallel build, I see an interleave of C++ and Rust build system output. What guarantees that a build.rs that exports headers runs before the C++ compiler wants to see the headers?

> I'm
> not aware that there's any better convention for this in Rust in the
> general sense.

:-(
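A sketch of the kind of build.rs discussed above, assuming that MOZ_DIST is set by the Gecko build (as in the Stylo build script linked above) and that the crate ships its generated header as include/mp4parse.h; outside Gecko the variable is absent and the copy is skipped.

    // Sketch only: copies a pre-generated C header into $MOZ_DIST/include
    // when the Gecko build exports MOZ_DIST to the build script.
    use std::env;
    use std::fs;
    use std::path::Path;

    fn main() {
        if let Ok(dist) = env::var("MOZ_DIST") {
            let dest_dir = Path::new(&dist).join("include");
            fs::create_dir_all(&dest_dir).expect("failed to create include dir");
            fs::copy("include/mp4parse.h", dest_dir.join("mp4parse.h"))
                .expect("failed to copy header");
        }
    }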
Editing vendored crates
I tried to add some panics to a vendored crate (rust-encoding) to see if the code in question runs. However, I didn't get to the running part, because the edited code failed to build.

It turns out that each vendored crate has a .cargo-checksum.json file that contains hashes of all the files in the crate, and Cargo refuses to build the crate if the hashes don't match or the .cargo-checksum.json file is removed.

This seems hostile not only to experimental local edits as in my case but also to use cases such as uplifting fixes to branches and shipping modified crates if the upstream is slow to accept patches or disagrees with the patches.

As far as I can tell, there doesn't seem to be a way to ask Cargo to regenerate the .cargo-checksum.json after edits. Also, adding a [replace] section to the Cargo.toml of libgkrust pointing to the edited crate under third_party/rust or adding paths = [ "third_party/rust" ] to .cargo/config.in doesn't make Cargo happy.

What's the right way to build with edited crates under third_party/rust? (I think we should have an easy way to do so.)
Re: Editing vendored crates
On Mon, Feb 27, 2017 at 12:10 AM, Bobby Holley wrote:
> Can you elaborate on what goes wrong here? This worked for ted's experiment
> mentioned upthread, and for me on at least one occasion in
> https://hg.mozilla.org/try/rev/18dc070e0308 (permalink:
> https://pastebin.mozilla.org/8980438 )
>
> You'll need to |cargo update| after adding the replace in Cargo.toml to
> update Cargo.lock.

Thanks. I failed to do that yesterday. When I do that, Cargo complains about not finding the encoding-index-foo crates in subdirectories of encoding. Replacement in gkrust's Cargo.toml doesn't work. So then I go and edit encoding's Cargo.toml to point it to the right place. Then Cargo complains about those crates not finding encoding_index_tests. So then I edit their Cargo.tomls to point to the test crate. Then cargo update passes. But then I do ./mach build and Cargo complains about the checksums not matching, because I edited the Cargo.tomls under the crates that I thought I was changing from "locally-stored crates.io crate" status to "local replacement" status. Then I remove the checksum file. Then Cargo complains about not finding the checksum file.

I find this level of difficulty (self-inflicted quasi-Tivoization, practically) an unreasonable impediment to practicing trivial Software Freedom with respect to the vendored crates.

> This is basically the right way to do it, rather than editing the checksums.
> [replace] tells the Cargo machinery that the vendored code is its own
> snowflake, rather than just a cache of what's on crates.io. Doing otherwise
> breaks cargo invariants.

What are the invariants? Why do we need the invariants, when we can do without impediments like these for e.g. the vendored copy of ICU? What badness would arise from patching Cargo to ignore the .cargo-checksum.json files?

On Mon, Feb 27, 2017 at 1:23 AM, Xidorn Quan wrote:
> This workflow should really be automated via a mach command. Filed bug
> 1342815 [1] for that.
>
> [1] https://bugzilla.mozilla.org/show_bug.cgi?id=1342815

Thank you. I think it should be possible to stick a diagnostic println!() or panic!() in the vendored code with zero or minimal ceremony.
Re: Should cheddar-generated headers be checked in?
On Mon, Feb 27, 2017 at 1:40 AM, Xidorn Quan wrote:
>> When doing a parallel build, I see an interleave of C++ and Rust build
>> system output. What guarantees that a build.rs that exports headers
>> runs before the C++ compiler wants to see the headers?
>
> Oh, it's about C++ header generated from Rust?

Yes. cheddar, not bindgen.

> I don't think it is
> possible given the current architecture.

...

> So if it is Rust library exposing C API, it is probably responsibility
> of the Rust library to also provide a C header for other code to use.

I take it that I should check in the header, then.

Is there some way to disable build.rs for a vendored crate (without making cargo unhappy about the hashes)? (If the header is checked in, compiling cheddar in order to generate it at build time is useless.)
Re: Should cheddar-generated headers be checked in?
On Mon, Feb 27, 2017 at 2:38 PM, Henri Sivonen wrote:
> Is there some way to disable build.rs for a vendored crate (without
> making cargo unhappy about the hashes)? (If the header is checked in,
> compiling cheddar in order to generate it at build time is useless.)

This seems not only relevant for build time but relevant to what dependencies we vendor. Right now, running ./mach vendor rust with gkrust depending on https://crates.io/crates/encoding_c pulls in the dependencies for cheddar, which we apparently don't already have in the tree.

Maybe having the header generation as part of build.rs is more trouble than it's worth in the first place...
Re: Editing vendored crates
On Mon, Feb 27, 2017 at 7:04 PM, Ralph Giles wrote:
> On Mon, Feb 27, 2017 at 4:03 AM, Henri Sivonen wrote:
>> I find this level of difficulty (self-inflicted quasi-Tivoization
>> practically) an unreasonable impediment to practicing trivial Software
>> Freedom with respect to the vendored crates.
>
> I agree we need to fix the ergonomics here, but I don't think you
> should be so hard on cargo.

Sorry about the tone. I'm rather frustrated at how hard it is to do something that should be absolutely trivial (adding a local diagnostic panic!()/println!()).

> The hash checking is designed to make
> builds more reproducible, so that unless we make an explicit diversion
> we know we're building with the same source as every other use of that
> same package version. This has benefits for security, debugging, and
> change control.

We don't seem to need such change control beyond hg logs for e.g. the in-tree ICU or Skia, though.
Re: Editing vendored crates
On Mon, Feb 27, 2017 at 7:47 PM, Ted Mielczarek wrote:
> On Mon, Feb 27, 2017, at 12:32 PM, Henri Sivonen wrote:
>> We don't seem to need such change control beyond hg logs for e.g. the
>> in-tree ICU or Skia, though.
>
> As someone who has maintained a vendored upstream C++ project (Breakpad)
> for a decade, I can say that this causes us headaches *all the time*.

OK.

> I'm sorry this is causing you pain, and we should figure out a way to
> make it less painful, but note that the intention is that things in
> `third_party/rust` should be actual third-party code, not things under
> active development by Mozilla. We don't currently have a great middle
> ground between "mozilla-central is the repository of record for this
> crate" and "this crate is vendored from crates.io". We're finding our
> way there with Servo, so we might have a better story for things like
> encoding-rs when we get that working well.

Note that my problem at hand isn't with encoding_rs but with the currently-vendored rust-encoding. That is, I indeed would like to add a diagnostic panic!()/println!() to genuinely third-party code--not to code I've written. That is, I'd like to experimentally understand what, if anything, rust-encoding is currently used for. (My best current hypothesis from reading things on SearchFox is that it's used in order to be able to compile one Option that's always None in Gecko.)
Re: Editing vendored crates
On Mon, Feb 27, 2017 at 8:00 PM, Bobby Holley wrote:
> FWIW, |cargo tree| is a very helpful tool to figure out who's pulling in a
> crate.

Thanks, but what I'm trying to figure out isn't who's pulling it in (the style crate is) but whether it is actually used beyond an always-None Option in a way that would result in the data tables actually getting included in the build as opposed to (hopefully) getting discarded by LTO.

(Motivation: If we are taking on the data tables, I want to argue that we should include encoding_rs instead, even before the "replace uconv with encoding_rs" step is done.)
Re: Editing vendored crates
On Mon, Feb 27, 2017 at 8:47 PM, Simon Sapin wrote:
> As an aside, I have a plan to remove rust-encoding entirely from Servo
> (including Stylo) and use encoding_rs instead. But doing this the way I want
> to has a number of prerequisites, and I’d prefer to switch everything at
> once to avoid having both in the build. At the moment I’m prioritizing other
> Stylo-related work, but I’m confident it’ll happen before we start shipping
> Stylo.

Nice! Works for me. :-)
Re: Should cheddar-generated headers be checked in?
On Mon, Feb 27, 2017 at 5:57 PM, Henri Sivonen wrote:
> Maybe having the header generation as part of build.rs is more trouble
> than it's worth in the first place...

Now that the build.rs is commented out in the crates.io crate and the generated header is shipped in the crates.io crate:

Considering that editing the vendored crates is not allowed, so I can't put moz.build files on the path to the headers, what's the appropriate way to make the m-c build system pick up headers from third_party/rust/encoding_c/include?
Tracking bug for removals after XPCOM extensions are no more?
Do we have a tracking bug for all the stuff that we can and should remove once we no longer support XPCOM extensions?
Re: Tracking bug for removals after XPCOM extensions are no more?
On Mon, Mar 13, 2017 at 3:17 PM, Nathan Froyd wrote:
> We do not.

OK. I filed one: https://bugzilla.mozilla.org/show_bug.cgi?id=1347507
Re: Tracking bug for removals after XPCOM extensions are no more?
On Wed, Mar 15, 2017 at 10:24 PM, Boris Zbarsky wrote:
> On 3/15/17 3:26 PM, Botond Ballo wrote:
>> What will happen to WebExtension Experiments once these APIs start
>> being removed? My understanding is that WebExtension Experiments use
>> the same XPCOM APIs as XUL addons.
>
> We shouldn't be removing APIs that have no alternative.

In some cases there's an alternative, but the legacy dependencies are turtles all the way down.

What's the current outlook on letting chrome JS read ArrayBuffers as opposed to JS strings where the high 8 bits are zero and the low 8 bits are the byte values from XPCOM streams? (Or letting chrome JS access things that are currently exposed as XPCOM streams via some other thing that exposes bytes as ArrayBuffers?)

It would be good to remove nsIScriptableUConv, nsIConverterInputStream and nsIConverterOutputStream sooner rather than later and let chrome JS use TextDecoder like Web JS.
Will script implementations of nsIOutputStream be prohibited once XPCOM extensions are no more?
Do we need to keep caring about https://bugzilla.mozilla.org/show_bug.cgi?id=170416 once XPCOM extensions are no more?
Re: Tracking bug for removals after XPCOM extensions are no more?
On Thu, Mar 16, 2017 at 7:34 AM, Boris Zbarsky wrote:
> On 3/15/17 5:35 PM, Henri Sivonen wrote:
>> What's the current outlook on letting chrome JS read ArrayBuffers as
>> opposed to JS strings where the high 8 bits are zero and the low 8
>> bits are the byte values from XPCOM streams?
>
> I see no reason not to allow that. We'd just add this on
> nsIScriptableInputStream, I assume, so we don't have to modify every single
> stream impl

OK. Thanks. Turns out this is already on file: https://bugzilla.mozilla.org/show_bug.cgi?id=923017
Re: Tracking bug for removals after XPCOM extensions are no more?
On Thu, Mar 16, 2017 at 8:12 PM, Kris Maglione wrote:
> On Wed, Mar 15, 2017 at 11:35:10PM +0200, Henri Sivonen wrote:
>> What's the current outlook on letting chrome JS read ArrayBuffers as
>> opposed to JS strings where the high 8 bits are zero and the low 8
>> bits are the byte values from XPCOM streams? (Or letting chrome JS
>> access things that are currently exposed as XPCOM streams via some
>> other thing that exposes bytes as ArrayBuffers?)
>
> This is already possible via nsIBinaryInputStream:
>
> http://searchfox.org/mozilla-central/rev/571c1fd0ba0617f83175ccc06ed6f3eb0a1a8b92/xpcom/io/nsIBinaryInputStream.idl#71-82

The stated purpose of nsIBinaryInputStream is very different from the stated purpose of nsIScriptableInputStream. Since the needed code is already in the former, should we nonetheless tell people to use the former and deprecate the latter instead of trying to modernize the latter within its stated purpose?

(I'd be fine with changing the documentation on both IDLs to broaden the stated purpose of nsIBinaryInputStream and to deprecate and hopefully subsequently remove nsIScriptableInputStream. Fewer nsIFoo is better.)
Do we actually have use cases for rejecting non-characters in UTF-8ness check?
Our IsUTF8() by default rejects strings that contain code points whose lowest 16 bits are 0xFFFE or 0xFFFF.

Do we actually have use cases for rejecting such strings in UTF-8ness checks?

The code was introduced in https://bugzilla.mozilla.org/show_bug.cgi?id=191541 and both the patch author and the reviewer seemed unsure of the utility of this quirk at the time.

(To reduce bloat and to benefit from SIMD, I'd like to replace the implementation of IsUTF8() with a call to Rust code that contains optimized UTF-8ness checking code in any case, but that code doesn't have the quirk of rejecting non-characters.)
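For reference, a minimal sketch of what a pure UTF-8ness check looks like in Rust: std::str::from_utf8 accepts non-characters such as U+FFFF, so it matches the "pure UTF-8ness check" semantics rather than the current IsUTF8() quirk. (This is only an illustration, not the optimized SIMD code referred to above.)

    fn is_utf8(bytes: &[u8]) -> bool {
        std::str::from_utf8(bytes).is_ok()
    }

    fn main() {
        // U+FFFF encodes as EF BF BF; it is a non-character but valid UTF-8,
        // so a pure UTF-8ness check accepts it.
        assert!(is_utf8(&[0xEF, 0xBF, 0xBF]));
        // A lone continuation byte is not valid UTF-8.
        assert!(!is_utf8(&[0x80]));
    }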
Proper way to return nsresult from Rust before Stylo is mandatory?
It seems that our Rust bindings for nsresult are part of Stylo, but Stylo isn't yet a guaranteed part of the build. Until Stylo becomes a mandatory part of the build, what's the proper way to return nsresult from Rust such that it works with or without Stylo enabled?
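One workaround along the lines implied by the question, sketched under the assumption that the Stylo-provided nsresult bindings can't be relied on: return the raw 32-bit value over the FFI and let the C++ side interpret it as nsresult. The function name below is hypothetical; NS_OK and NS_ERROR_FAILURE are the well-known numeric values.

    // Sketch: returning nsresult as a plain u32 across the FFI when the
    // generated bindings aren't guaranteed to be in the build.
    pub type RawNsresult = u32;

    pub const NS_OK: RawNsresult = 0;
    pub const NS_ERROR_FAILURE: RawNsresult = 0x8000_4005;

    #[no_mangle]
    pub extern "C" fn example_do_work(succeed: bool) -> RawNsresult {
        if succeed {
            NS_OK
        } else {
            NS_ERROR_FAILURE
        }
    }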
Re: Do we actually have use cases for rejecting non-characters in UTF-8ness check?
On Fri, Mar 17, 2017 at 12:12 PM, Anne van Kesteren wrote:
> On Fri, Mar 17, 2017 at 11:00 AM, Henri Sivonen wrote:
>> Our IsUTF8() by default rejects strings that contain code points whose
>> lowest 16 bits are 0xFFFE or 0xFFFF.
>>
>> Do we actually have use cases for rejecting such strings in UTF-8ness checks?
>
> I'm not aware of any web-observable feature that would need that.

Thanks.

> The
> only places I know of that do something with non-characters are URLs
> and HTML, which exclude them for validity purposes, but there's no
> browser API necessarily affected by that and they wouldn't use a
> IsUTF8() code path. Are there too many callers to examine the
> implications?

The callers aren't many, but they involve protocols and formats that I'm not familiar with on the quirk level of detail: https://searchfox.org/mozilla-central/search?q=symbol:_Z6IsUTF8RK10nsACStringb&redirect=false

As a matter of API design, I disapprove of a method called IsUTF8 doing something other than a pure UTF-8ness check. For example, the reason why it now has the option to opt out of the non-character rejection quirk is that Web Socket code used the function for what its name says and that was a bug. Instead of changing the semantics to match the name for everyone, an opt-out was introduced for callers in Web Socket code.
Future of out-of-tree spell checkers?
Without XPCOM extensions, what's the story for out-of-tree spell checkers?

Finnish spell checking in Firefox (and Thunderbird) has so far been accomplished using the mozvoikko extension, which implements mozISpellCheckingEngine in JS and connects to the libvoikko[1] back end via jsctypes. (Even though hunspell was initially developed for Hungarian and, therefore, was initially hoped to be suitable for Finnish, it turned out to be inadequate for dealing with Finnish.)

Previously, libvoikko was GPL-only, but it seems that most code in the newest version can be alternatively used under MPL 1.1. (I don't know why one would want to compile in the GPL-only stuff. Maybe for compatibility with legacy-format Northern Sami or Greenlandic dictionaries?)

Considering that mozvoikko already requires libvoikko to be present on the system by other means and libvoikko now supports a non-GPL configuration, could we put C++ glue code in-tree and dlopen libvoikko if found?

[1] http://voikko.puimula.org/
Re: Future of out-of-tree spell checkers?
On Wed, Mar 22, 2017 at 11:18 AM, Henri Sivonen wrote:
> Without XPCOM extensions, what's the story for out-of-tree spell checkers?
>
> Finnish spell checking in Firefox (and Thunderbird) has so far been
> accomplished using the mozvoikko extension, which implements
> mozISpellCheckingEngine in JS and connects to the libvoikko[1] back
> end via jsctypes.

Further searching strongly suggests that there exist just 3 implementors of mozISpellCheckingEngine:

1) The in-tree wrapper for Mozilla's fork of Hunspell.
2) The mozvoikko extension that provides Finnish spell checking using libvoikko.
3) The Kukkuniiaat extension that provides Greenlandic spell checking using libvoikko.

To me, this is a strong indication that we should add a C++ adapter for (dlopened) libvoikko in-tree and deCOMtaminate mozISpellCheckingEngine while at it.

(FWIW, the desktop browser market share of Firefox in both Finland and Greenland is above the average for Europe. It would be sad to mess that up by just letting this stuff break.)
Re: Future of out-of-tree spell checkers?
On Wed, Mar 22, 2017 at 3:52 PM, Nicolas B. Pierron wrote:
> On 03/22/2017 09:18 AM, Henri Sivonen wrote:
>> Without XPCOM extensions, what's the story for out-of-tree spell checkers?
>>
>> […], which implements
>> mozISpellCheckingEngine in JS and connects to the libvoikko[1] back
>> end via jsctypes. […]
>
> Would compiling libvoikko to WebAssembly remove the need for jsctypes and
> XPCOM?

It would remove the need for jsctypes, but how would a WebAssembly program in a Web Extension get to act as a spell checking engine once extensions can no longer implement XPCOM interfaces (mozISpellCheckingEngine in this case)?
Re: Future of out-of-tree spell checkers?
On Wed, Mar 22, 2017 at 4:45 PM, Axel Hecht wrote:
> Am 22.03.17 um 15:39 schrieb Jorge Villalobos:
>> On 3/22/17 8:10 AM, Henri Sivonen wrote:
>>> On Wed, Mar 22, 2017 at 3:52 PM, Nicolas B. Pierron wrote:
>>>> On 03/22/2017 09:18 AM, Henri Sivonen wrote:
>>>>> Without XPCOM extensions, what's the story for out-of-tree spell
>>>>> checkers?
>>>>>
>>>>> […], which implements
>>>>> mozISpellCheckingEngine in JS and connects to the libvoikko[1] back
>>>>> end via jsctypes. […]
>>>>
>>>> Would compiling libvoikko to WebAssembly remove the need for jsctypes
>>>> and XPCOM?
>>>
>>> It would remove the need for jsctypes, but how would a WebAssembly
>>> program in a Web Extension get to act as a spell checking engine once
>>> extensions can no longer implement XPCOM interfaces
>>> (mozISpellCheckingEngine in this case)?
>>
>> Note there is a bug on file to implement a spell-checker API for
>> WebExtensions: https://bugzilla.mozilla.org/show_bug.cgi?id=1343551
>>
>> The API request was approved but is low priority.
>>
>> Jorge
>
> Note, that bug seems to be about using an API like mozISpellCheckingEngine
> from web extensions.
>
> It doesn't seem to be about providing an implementation of it via a web
> extension.

Indeed. Considering that there seems to be only one out-of-tree library that gets glued into a mozISpellCheckingEngine provider (libvoikko), it seems to me that it would be misplaced effort if Mozilla designed a Web Extension API for providing a spell checker and then asked the Voikko developers to figure out how to compile the code into WebAssembly and how to package the wasm and all the data files as a Web Extension.

dlopening libvoikko, if installed, and having thin C++ glue code in-tree seems much simpler, except maybe for sandboxing. What are the sandboxing implications of dlopening a shared library that will want to load its data files?
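For concreteness, a rough sketch of the kind of glue being proposed: dlopen libvoikko if present and resolve its C entry points with dlsym. The voikkoInit/voikkoSpellCstr/voikkoTerminate names and signatures below are written from memory of the libvoikko C API and should be treated as assumptions to verify, not as a definitive binding.

    // Sketch only; POSIX dlopen/dlsym, no error reporting, no data-file paths.
    #include <dlfcn.h>
    #include <stdio.h>

    struct VoikkoHandle;  // opaque handle, per the (assumed) libvoikko C API

    typedef VoikkoHandle* (*voikkoInitFn)(const char** error, const char* langcode,
                                          const char* path);
    typedef int (*voikkoSpellCstrFn)(VoikkoHandle* handle, const char* word);
    typedef void (*voikkoTerminateFn)(VoikkoHandle* handle);

    int main() {
      void* lib = dlopen("libvoikko.so.1", RTLD_LAZY | RTLD_LOCAL);
      if (!lib) {
        return 1;  // Not installed: Voikko-based spell checking unavailable.
      }
      voikkoInitFn init = (voikkoInitFn)dlsym(lib, "voikkoInit");
      voikkoSpellCstrFn spell = (voikkoSpellCstrFn)dlsym(lib, "voikkoSpellCstr");
      voikkoTerminateFn terminate = (voikkoTerminateFn)dlsym(lib, "voikkoTerminate");
      if (!init || !spell || !terminate) {
        dlclose(lib);
        return 1;
      }
      const char* error = nullptr;
      VoikkoHandle* handle = init(&error, "fi", nullptr);  // nullptr: default dictionary path
      if (handle) {
        printf("'kissa' spelled %s\n",
               spell(handle, "kissa") ? "correctly" : "incorrectly");
        terminate(handle);
      }
      dlclose(lib);
      return 0;
    }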
Re: Faster gecko builds with IceCC on Mac and Linux
On Wed, Jul 6, 2016 at 2:42 AM, Gregory Szorc wrote:
> The Lenovo ThinkStation P710 is a good starting point (
> http://shop.lenovo.com/us/en/workstations/thinkstation/p-series/p710/).

To help others who follow the above advice save some time: Xeons don't have Intel integrated GPUs, so one has to figure out how to get this up and running with a discrete GPU. In the case of the Nvidia Quadro M2000, the latest Ubuntu and Fedora install images don't work.

This works:

Disable or enable the TPM. (By default, it's in a mode where the kernel can see it but it doesn't work. It should either be hidden or be allowed to work.)

Disable secure boot. (Nvidia's proprietary drivers don't work with secure boot enabled.)

Use the Ubuntu 16.04.1 install image (i.e. an intentionally old image--you can upgrade later).

After installing, edit /etc/default/grub and set GRUB_CMDLINE_LINUX_DEFAULT="" (i.e. make the string empty; without this, the nvidia proprietary driver conflicts with LUKS pass phrase input). Then run:

update-initramfs -u
update-grub
apt install nvidia-375

Then upgrade the rest. Even rolling forward the HWE stack works *after* the above steps.

(For a Free Software alternative, install Ubuntu 16.04.1, stick to 2D graphics from nouveau with llvmpipe for 3D, and be sure never to roll the HWE stack forward.)
Re: Future of out-of-tree spell checkers?
On Fri, Mar 24, 2017 at 2:38 AM, Ehsan Akhgari wrote:
> On Wed, Mar 22, 2017 at 11:50 AM, Jeff Muizelaar wrote:
>> On Wed, Mar 22, 2017 at 11:08 AM, Henri Sivonen wrote:
>>> dlopening libvoikko, if installed, and having thin C++ glue code
>>> in-tree seems much simpler, except maybe for sandboxing. What are the
>>> sandboxing implications of dlopening a shared library that will want
>>> to load its data files?
>>
>> My understanding is that the spell checker mostly lives in the Chrome
>> process so it seems sandboxing won't be a problem.
>
> That is mostly correct. The spell checker *completely* lives in the parent
> process and is completely unaffected by sandboxing.
>
> But that's actually a problem. My understanding is that WebExtensions won't
> be allowed to load code in the parent process. Bill, Kris, is that correct?
> If yes, we should work with the maintainers of the Finnish and Greenlandic
> dictionaries on adding custom support for loading their code...

But when (according to doing a Google Web search excluding mozilla.org and wading through all the results and by searching the JS for all AMO-hosted extensions) the only out-of-tree spell checkers use libvoikko, why involve Web Extensions at all? Why wouldn't we dlopen libvoikko and put a thin C++ adapter between libvoikko's C API and our internal C++ interface in-tree? That would be significantly simpler than involving Web Extensions.
Re: Future of out-of-tree spell checkers?
On Fri, Mar 24, 2017 at 3:20 PM, Ehsan Akhgari wrote:
> On 2017-03-24 4:20 AM, Henri Sivonen wrote:
>> On Fri, Mar 24, 2017 at 2:38 AM, Ehsan Akhgari wrote:
>>> On Wed, Mar 22, 2017 at 11:50 AM, Jeff Muizelaar wrote:
>>>> On Wed, Mar 22, 2017 at 11:08 AM, Henri Sivonen wrote:
>>>>> dlopening libvoikko, if installed, and having thin C++ glue code
>>>>> in-tree seems much simpler, except maybe for sandboxing. What are the
>>>>> sandboxing implications of dlopening a shared library that will want
>>>>> to load its data files?
>>>>
>>>> My understanding is that the spell checker mostly lives in the Chrome
>>>> process so it seems sandboxing won't be a problem.
>>>
>>> That is mostly correct. The spell checker *completely* lives in the parent
>>> process and is completely unaffected by sandboxing.
>>>
>>> But that's actually a problem. My understanding is that WebExtensions won't
>>> be allowed to load code in the parent process. Bill, Kris, is that correct?
>>> If yes, we should work with the maintainers of the Finnish and Greenlandic
>>> dictionaries on adding custom support for loading their code...
>>
>> But when (according to doing a Google Web search excluding mozilla.org
>> and wading through all the results and by searching the JS for all
>> AMO-hosted extensions) the only out-of-tree spell checkers use
>> libvoikko, why involve Web Extensions at all? Why wouldn't we dlopen
>> libvoikko and put a thin C++ adapter between libvoikko's C API and our
>> internal C++ interface in-tree? That would be significantly simpler
>> than involving Web extensions.
>
> Is that different than what I suggested above in some way that I'm
> missing?

I thought you meant that Web Extensions were your primary choice if they could load code into the parent process.

> I think it's better to engage the developers of those
> libraries first and ask them how they would like us to proceed.

I wanted to get an understanding of what we'd be OK with before contacting Harri Pitkänen (libvoikko developer) or Timo Jyrinki (libvoikko and mozvoikko maintainer for Debian and Ubuntu), because I don't want to cause them to write code only to find a Mozilla decision render the code useless.

On Fri, Mar 24, 2017 at 8:45 PM, Bill McCloskey wrote:
> If we do end up going with the dlopen plan, let's make sure that we enforce
> some kind of code signing. We're finally almost rid of all the untrusted
> binary code that we used to load (NPAPI, binary XPCOM, ctypes). It would be
> a shame to open up a new path.

What threat do you intend to defend against?

On Linux, we should think of libvoikko as an optional system library. (If you install Ubuntu choosing English as the system language at install time, libvoikko is not installed by default. If you install Ubuntu choosing Finnish as the system language at install time, libvoikko is installed by default. In any case, you can get it from the distro repo.) We already dlopen() PulseAudio as a system library that we don't verify. In the Crash Reporter, we dlopen() libcurl and some Gnome stuff. I expect that someone operating with the user's privileges can cause whatever unverified code to be mapped into our address space via LD_PRELOAD and system libraries that we link against unconditionally.

As for Windows, since a spell checker doesn't add Web-exposed functionality, we wouldn't have the risk that we had with NPAPI (or, technically, with arbitrary add-ons) that a site could entice users to run a random setup.exe in order to see some additional Web content. The libvoikko API is pretty narrow, so I wouldn't expect it to enable more anti-virus mischief than what can be done by hooking stuff into the Windows system DLLs that we need to use.

The main problems I see are:

1) Right now the libvoikko distribution point is without https. (Fixable with Let's Encrypt.)

2) We couldn't trigger a libvoikko autoupdate on Windows/Mac if there was a crasher bug in the library. (I'd expect the distros to take care of pushing an update in the Linux case. It's the same situation with e.g. PulseAudio and Gtk anyway.)

On Sat, Mar 25, 2017 at 8:48 PM, Ehsan Akhgari wrote:
> Another option would be talking to the maintainers of libvoikko and
> seeing if they would be open to maintaining the Mozilla bindings, in
> which case I think we should even consider doing something like what we
> do to download the OpenH264 binary at runtime when we need to. We could
> even build and sign it in the infrastructure ourselves if we imported it
> into the tree, with task cluster this is possible today with a super
> simple shell script (well, at least the building side of it!).

If we are willing to do the engineering for that, that would be great! (Of course, just putting libvoikko into libxul would be simpler, but would cost an added 250 KB in libxul size for everyone who doesn't need libvoikko.)
Re: Future of out-of-tree spell checkers?
On Sat, Mar 25, 2017 at 8:48 PM, Ehsan Akhgari wrote:
> Another option would be talking to the maintainers of libvoikko and
> seeing if they would be open to maintaining the Mozilla bindings,

I started a fact-finding thread on the libvoikko list: http://lists.puimula.org/pipermail/libvoikko/2017-March/000896.html

(Not about anyone writing any code yet.)
Re: Rationalising Linux audio backend support
On Wed, Mar 29, 2017 at 12:29 PM, Kurt Roeckx wrote:
> The FAQ seems to suggest that telemetry is only enabled in the pre-release
> versions and not in the release versions. I assume there is a bias that is
> caused by this.

There are two types of telemetry: "Firefox Health Report" (enabled by default) and "Telemetry" (enabled by default in Nightly, Aurora and Beta but not in release builds).

Arguably, system configuration info belongs under FHR, so it would not be optimal if the Pulse check wasn't there but was in opt-in Telemetry instead. Where was it?

It's a problem if distros disable FHR by default or if distros disable the first-run on-boarding UI for opt-in Telemetry.

In any case, running without telemetry means not having a say in data-driven decisions about what configurations Mozilla should support. It's OK to disable telemetry (that's why it's user-controllable), but both users and distros that make decisions on users' behalf should take into account that if you don't let Firefox send info about your system config to Mozilla, your system config is invisible to Mozilla's decision making about what to support.

> Pulseaudio is really a layer between the application and alsa. If pulseaudio
> can do something it should be possible to do the same with alsa.

It's not that "ALSA can't do this or that"; it's "cubeb on top of ALSA without Pulse in between can't do this or that without additional work that's already done by Pulse or by cubeb on top of Pulse".

> But maybe pulseaudio makes certain things easier, I don't know.

That PulseAudio makes things easier has been a key point that has been made.
PSA: mozilla::Span - slices for C++
https://bugzilla.mozilla.org/show_bug.cgi?id=1295611 has landed and added mozilla::Span (#include "mozilla/Span.h"; code in mfbt/Span.h).

Going forward, please consider using mozilla::Span as a method argument where you'd previously have used a pointer argument and a length argument.

Span implements Rust's slice concept for C++. It's called "Span" instead of "Slice" to follow the naming used in C++ Core Guidelines. A Span wraps a pointer and a length that identify a non-owning view to a contiguous block of memory of objects of the same type.

Various types, including (pre-decay) C arrays, XPCOM strings, nsTArray, mozilla::Array, mozilla::Range and contiguous standard-library containers, auto-convert into Spans when attempting to pass them as arguments to methods that take Spans. MakeSpan() functions can be used for explicit conversion in other contexts. MakeSpan() works conveniently with local variables declared as auto, so you don't need to write the type parameter of the Span. (Span itself autoconverts into mozilla::Range.)

Like Rust's slices, Span provides safety against out-of-bounds access by performing run-time bound checks. However, unlike Rust's slices, Span cannot provide safety against use-after-free.

(Note: Span is like Rust's slice only conceptually. Due to the lack of ABI guarantees, you should still decompose spans/slices to raw pointer and length parts when crossing the FFI.)

In addition to having constructors and MakeSpan() functions that take various well-known types, a Span for an arbitrary type can be constructed (via constructor or MakeSpan()) from a pointer and a length or a pointer and another pointer pointing just past the last element.

A Span can be obtained for const char* pointing to a zero-terminated C string using the MakeCStringSpan() function. A corresponding implicit constructor does not exist in order to avoid accidental construction in cases where const char* does not point to a zero-terminated C string.

Span has methods that follow the Mozilla naming style and methods that don't. The methods that follow the Mozilla naming style are meant to be used directly from Mozilla code. The methods that don't are meant for integration with C++11 range-based loops and with meta-programming that expects the same methods that are found on the standard-library containers. For example, to decompose a Span into its parts in Mozilla code, use Elements() and Length() (as with nsTArray) instead of data() and size() (as with std::vector).

The pointer and length wrapped by a Span cannot be changed after a Span has been created. When new values are required, simply create a new Span.

Span has a method called Subspan() that works analogously to the Substring() method of XPCOM strings, taking a start index and an optional length. As a Mozilla extension (relative to Microsoft's gsl::span that mozilla::Span is based on), Span has methods From(start), To(end) and FromTo(start, end) that correspond to Rust's &slice[start..], &slice[..end] and &slice[start..end], respectively. (That is, the end index is the index of the first element not to be included in the new subspan.)

When indicating a Span that's only read from, const goes inside the type parameter. Don't put const in front of Span. That is:

size_t ReadsFromOneSpanAndWritesToAnother(Span<const uint8_t> aReadFrom, Span<uint8_t> aWrittenTo);

Any Span can be viewed as Span<const uint8_t> using the function AsBytes(). Any Span of non-const elements can be viewed as Span<uint8_t> using the function AsWritableBytes().

FAQ: Why introduce mozilla::Span when mozilla::Range already exists?

1) Span's storage (pointer and length as opposed to pointer and pointer) matches Rust's slices without arithmetic, so they decompose and recompose as the other on the other side of the FFI without arithmetic.

2) An earlier dev-platform thread seemed to OK Span on the grounds of it having enough of a pedigree from C++ Core Guidelines.
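A small usage sketch based on the description above (the function names and variables are hypothetical):

    #include <stddef.h>
    #include <stdint.h>

    #include "mozilla/Span.h"
    #include "nsTArray.h"

    // Takes a read-only view; const goes inside the type parameter.
    static size_t CountZeroBytes(mozilla::Span<const uint8_t> aData)
    {
      size_t count = 0;
      for (uint8_t b : aData) {  // Span works with range-based for
        if (b == 0) {
          ++count;
        }
      }
      return count;
    }

    static void Example()
    {
      const uint8_t buffer[16] = {};
      nsTArray<uint8_t> array;
      array.AppendElements(buffer, 16);

      // C arrays and nsTArray auto-convert into Span at the call site:
      CountZeroBytes(buffer);
      CountZeroBytes(array);

      // Subspans analogous to Rust's &slice[4..], &slice[..8] and &slice[4..8]:
      auto span = mozilla::MakeSpan(buffer, 16);
      CountZeroBytes(span.From(4));
      CountZeroBytes(span.To(8));
      CountZeroBytes(span.FromTo(4, 8));
    }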
Re: Rationalising Linux audio backend support
On Mar 31, 2017 4:49 PM, "Chris Coulson" wrote:

> The Firefox package in Ubuntu is maintained by 1 contributor in his spare
> time and myself who is only able to do the minimum in order to provide
> updates,

Does today’s announcement of Ubuntu’s change in direction affect resourcing for Firefox packaging?
Welcome new HTML Parser peer
William Chen made the largest change to the HTML Parser since the initial landing of the current HTML parser by adding support for HTML templates and, since then, has been reviewing HTML Parser changes for a long time now. I'm pleased to announce that William is now officially an HTML Parser peer.
Re: Future of out-of-tree spell checkers?
On Sat, Apr 15, 2017 at 8:06 PM, Ehsan Akhgari wrote:
> On 2017-03-27 3:30 AM, Henri Sivonen wrote:
>> 2) We couldn't trigger a libvoikko autoupdate on Windows/Mac if there
>> was a crasher bug in the library. (I'd expect the distros to take care
>> of pushing an update in the Linux case. It's the same situation with
>> e.g. PulseAudio and Gtk anyway.)
>
> It is also untrusted and unsigned code and can cause security and
> unstability issues. We have done a lot of work to remove all such code
> from our parent process. I don't think it's useful to make an analogy
> between this code and things like gtk.

I get it that libvoikko and gtk may (I haven't checked) have a different code quality level and, therefore, involve different parent process crash or exploitability risk. However, on e.g. Ubuntu and Debian the trust and signedness status is indeed the same as for gtk: both gtk and libvoikko are distro-provided code that is signed for delivery but whose signatures aren't checked when executing the code (i.e. the trust model of the OS doesn't treat root-owned libraries under /usr as adversarial in general), and the distro is responsible for pushing updates in case of critical bugs.

It would help me understand the issues if you could expand on your trust and signing concerns.

>> On Sat, Mar 25, 2017 at 8:48 PM, Ehsan Akhgari wrote:
>>> Another option would be talking to the maintainers of libvoikko and
>>> seeing if they would be open to maintaining the Mozilla bindings, in
>>> which case I think we should even consider doing something like what we
>>> do to download the OpenH264 binary at runtime when we need to. We could
>>> even build and sign it in the infrastructure ourselves if we imported it
>>> into the tree, with task cluster this is possible today with a super
>>> simple shell script (well, at least the building side of it!).
>>
>> If we are willing to do the engineering for that, that would be great!
>> (Of course, just putting libvoikko into libxul would be simpler, but
>> would cost an added 250 KB in libxul size for everyone who doesn't
>> need libvoikko.)
>
> That's not an option. 250KB for essentially dead code for most of our
> users is too much.

Annoyingly, chances are that no one will be willing to say ahead of time how many kilobytes would be acceptable. :-/

As for how many users this would benefit, there's a big difference between the immediate and the potential. The immediate is: very few relative to the entire Firefox user population. There exist dictionaries with clear licensing for Finnish, Northern Sami, Southern Sami and Lule Sami and a dictionary with unclear (at least to me) licensing for Greenlandic.

The spell checking engine has broader applicability, though. Maybe if we made it available with the same ease as Hunspell, it would make it worthwhile for other languages that are too complex for Hunspell to get dictionaries made, or maybe some languages that are unsatisfactorily supported by Hunspell would migrate, leading to better UX for users whose language already seems to be covered by Hunspell but isn't actually handled well by Hunspell. Hard to say.

> It may still be possible for them to provide a native library to us that
> we can load on the background thread and call into but it may require
> code changes on their side as well as our side to get that to work properly.

In a background thread in the chrome process? I.e. not isolated in a way that would protect against the spell checker crashing the chrome process?
Re: Future of out-of-tree spell checkers?
On Wed, Apr 19, 2017 at 4:43 AM, Ehsan Akhgari wrote:
> On 2017-04-18 2:38 AM, Henri Sivonen wrote:
>> On Sat, Apr 15, 2017 at 8:06 PM, Ehsan Akhgari wrote:
>>> On 2017-03-27 3:30 AM, Henri Sivonen wrote:
>>>> 2) We couldn't trigger a libvoikko autoupdate on Windows/Mac if there
>>>> was a crasher bug in the library. (I'd expect the distros to take care
>>>> of pushing an update in the Linux case. It's the same situation with
>>>> e.g. PulseAudio and Gtk anyway.)
>>>
>>> It is also untrusted and unsigned code and can cause security and
>>> unstability issues. We have done a lot of work to remove all such code
>>> from our parent process. I don't think it's useful to make an analogy
>>> between this code and things like gtk.
>>
>> I get it that libvoikko and gtk may (I haven't checked) have a
>> different code quality level and, therefore, involve different parent
>> process crash or exploitability risk. However, on e.g. Ubuntu and
>> Debian the trust and signedness status is indeed the same as for gtk:
>> both gtk and libvoikko are distro-provided code that is signed for
>> delivery but signatures aren't checked when executing the code (i.e.
>> the trust model of the OS doesn't treat root-owned libraries under
>> /usr as adversarial in general) and the distro is responsible for
>> pushing updates in case of critical bugs.
>
> Sure, but why do you keep bringing up these two distros? What about
> Windows, where presumably most of Finnish and Greenlandic speaking users
> will be? :-)

I made the gtk/pulse comparison in the Linux context only.

>> It would help me understand the issues if you could expand on your
>> trust and signing concerns.
>
> The security issues should be obvious. I don't trust the C++ code that
> I write and by extension I don't trust the C++ code that anybody else
> writes.

I see. I thought about "trusted" in the usual sense. I.e. code is "trusted" if it has been given the necessary privileges to mess everything up.

> The stability issues: If you go to
> https://crash-stats.mozilla.com/topcrashers/?product=Firefox&version=52.0.2&days=7
> right now, you will see top crashers caused by untrusted binary code
> that we don't control doing bad things (I spotted #11,
> js::Proxy::construct based on a cursory look right now). We have years
> of concrete hard evidence in terms of 100s of crash bug reports. What's
> even worse about this particular case is that due to the smaller size of
> the user base, the chances of issues like crashes raising to an extent
> that they become visible under our radar is slim. So the concrete risk
> would be the possibility of loading this code in the parent process
> causing a startup crash that flies under the radar and costs us all
> users in those locales.

It's unclear to me if you are arguing that Mozilla shouldn't distribute libvoikko, because it might have a crasher bug that we might not detect despite having the ability to push updates, or if you are arguing that we shouldn't load a libvoikko that's present on the user's system via a non-Mozilla distribution mechanism, because it might have a crasher bug that we could neither detect nor push a fix for.

Either way, I still don't see how code signing would address this concern. Running spell checking in a separate process would. What problem did you mean to address by code signing?

>>>> On Sat, Mar 25, 2017 at 8:48 PM, Ehsan Akhgari wrote:
>>>>> Another option would be talking to the maintainers of libvoikko and
>>>>> seeing if they would be open to maintaining the Mozilla bindings, in
>>>>> which case I think we should even consider doing something like what we
>>>>> do to download the OpenH264 binary at runtime when we need to. We could
>>>>> even build and sign it in the infrastructure ourselves if we imported it
>>>>> into the tree, with task cluster this is possible today with a super
>>>>> simple shell script (well, at least the building side of it!).
>>>>
>>>> If we are willing to do the engineering for that, that would be great!
>>>> (Of course, just putting libvoikko into libxul would be simpler, but
>>>> would cost an added 250 KB in libxul size for everyone who doesn't
>>>> need libvoikko.)
>>>
>>> That's not an option. 250KB for essentially dead code for most of our
>>> users is too much.
>>
>> Annoyingly, chances are that no one will be willing to say ahead of
>> time how many kilobytes would be acceptable. :-/
Re: Future of out-of-tree spell checkers?
On Tue, Apr 25, 2017 at 9:02 PM, Bill McCloskey wrote: > On Tue, Apr 25, 2017 at 5:41 AM, Henri Sivonen wrote: >> >> What problem did you mean to address by code signing? > > The reason I suggested code signing is because loading libvoikko would > provide an easy way for people to inject code into Firefox. For a while > we've been trying to make it difficult for semi-legit-but-not-quite-malware > parties to load crappy code into Firefox (I'm thinking of crappy antivirus > software, adware, etc.). Removing binary XPCOM components and NPAPI support, > and requiring add-on signing, are all facets of this. If we simply load and > run code from any file named voikko.dll on the user's computer, then we've > opened up another door. It's a less powerful door since we probably (I hope) > wouldn't give them access to XPCOM. But they could still open windows that > look like they came from Firefox and I imagine there's other bad stuff I > haven't thought of. > > People often object to this argument by saying that, without libvoikko, > these bad actors could just replace libxul or something. But I think in > practice it would be harder for them to pull that off, both technically and > socially. From a technical perspective, it's harder to replace core parts of > Firefox while still leaving it in a working state, especially if the updater > is still allowed to run. And socially, I think it makes their software look > a lot more like malware if they replace parts of Firefox rather than simply > install a new DLL that we then load. This concern applies to Windows but not to Linux, right? What about Mac? To address that concern, the local system itself would have to be treated as semi-hostile and the signature would have to be checked at library load time as opposed to the usual library install time. Do we have pre-existing code for that? AFAIK, in the case of OpenH264 we check a hash at library install time, but when we subsequently load the library, we don't check a hash or signature. In the case of OpenH264, the library gets loaded into a sandbox, which probably addresses the concern of a replacement OpenH264 with dodgy additional code being able to open windows that look like they came from Firefox. Assuming that we don't already have code for validating library provenance at library load time, wouldn't it make more sense to put effort into reusing the facilities for spawning a GMP process to spawn a low-privilege spell checking process than to try validate the provenance of already-installed code in a way that still doesn't address the crash impact concern in the case of the code being legitimate? > Overall, though, I agree with Ehsan that this discussion isn't very > worthwhile unless we what the voikko people want to do. It seems to me that this thread raises enough concerns on our side that it doesn't make sense to ask a third party what they want to do before we have an idea what we'd be OK with. Suppose they'd say they'd want to include libvoikko in Firefox like Hunspell? We'd have binary size and crash impact concerns. Suppose they'd say they'd want to make libvoikko download on-demand using Mozilla infra like OpenH264? We'd have concerns of finding release engineers and front end engineers with time to set it up, the crash impact concern and the concern of another party dropping malware in libvoikko's place. Suppose they'd say they'd want to install libvoikko somewhere on the user's library path and have us dlopen() it? 
We'd have concerns about crash impact, being able to remedy crashes, directing people to install non-Mozilla software (though @firefox on Twitter regularly does) and other parties dropping malware in libvoikko's place. Suppose they'd say they'd want to ship a back end for system spell checking frameworks and have us use the system spell checking API[1]? We'd have concerns of Windows 7 not being covered, directing people to install non-Mozilla software and crash impact at least in the Linux case (AFAICT, Enchant doesn't provide process isolation from the back end; I *think* Apple's solution does; not sure about Windows 8+) and having to write 3 system-specific spell checking integrations. Suppose they'd say they'd want to ship it as Web Assembly in a Web Extension? We'd have concern about allocating engineering time to enable a Web Extension to act as a spell checking provider, when there's only one extension that'd foreseeably use it. - - [1] Enchant on Linux (currently hard-codes the assumption that Voikko is Finnish only, so at least for the time being (until a potential future version of Enchant without that hard-coding makes its way through the distros) would throw Greenlandic, Sami and hope of other lan
Rounding to the jemalloc bucket size
For growable buffer types that internally contain the logical length (how many slots of the buffer are in use as far as external callers are concerned) and capacity (the actual allocation length of the buffer), the capacity should ideally always be equal to a bucket size of the underlying allocator. Otherwise, the growable buffer type may think its capacity has been reached and there's a need to reallocate when the true underlying allocation still has space. To this end, we should have a method that takes a size and rounds it up to the closest jemalloc bucket size. Do we already have such a method somewhere? -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Future of out-of-tree spell checkers?
On Wed, Apr 26, 2017 at 9:49 PM, Ehsan Akhgari wrote: > On 04/26/2017 07:02 AM, Henri Sivonen wrote: >> >> On Tue, Apr 25, 2017 at 9:02 PM, Bill McCloskey >> wrote: >>> >>> On Tue, Apr 25, 2017 at 5:41 AM, Henri Sivonen >>> wrote: >>>> >>>> What problem did you mean to address by code signing? >>> >>> The reason I suggested code signing is because loading libvoikko would >>> provide an easy way for people to inject code into Firefox. > > > Yes, this is precisely what I'm worried about as well. >>> >>> For a while >>> we've been trying to make it difficult for >>> semi-legit-but-not-quite-malware >>> parties to load crappy code into Firefox (I'm thinking of crappy >>> antivirus >>> software, adware, etc.). Removing binary XPCOM components and NPAPI >>> support, >>> and requiring add-on signing, are all facets of this. If we simply load >>> and >>> run code from any file named voikko.dll on the user's computer, then >>> we've >>> opened up another door. It's a less powerful door since we probably (I >>> hope) >>> wouldn't give them access to XPCOM. But they could still open windows >>> that >>> look like they came from Firefox and I imagine there's other bad stuff I >>> haven't thought of. >>> >>> People often object to this argument by saying that, without libvoikko, >>> these bad actors could just replace libxul or something. But I think in >>> practice it would be harder for them to pull that off, both technically >>> and >>> socially. From a technical perspective, it's harder to replace core parts >>> of >>> Firefox while still leaving it in a working state, especially if the >>> updater >>> is still allowed to run. And socially, I think it makes their software >>> look >>> a lot more like malware if they replace parts of Firefox rather than >>> simply >>> install a new DLL that we then load. >> >> This concern applies to Windows but not to Linux, right? What about Mac? > > FTR my main concern is about Windows here. But that being said I think we > can probably do something similar for Linux and Mac (but if we don't have > the time or resources to address those first/now, that's probably fine.) As noted previously about how we load other libs on Linux, I think it doesn't make sense to do load-time signature checking on Linux. >> To address that concern, the local system itself would have to be >> treated as semi-hostile and the signature would have to be checked at >> library load time as opposed to the usual library install time. Do we >> have pre-existing code for that? > > We should treat the local system as *hostile*. Because that's what it is in > the real world at least for our Windows users. > > I was hoping that we can use the code that we use to sign and verify our mar > files for the updater here, see for example this header which we use for > signature verification > <http://searchfox.org/mozilla-central/source/modules/libmar/verify/cryptox.h>. > I'm suggesting to use this code as a *basis* for this work, so there will be > some new code to be written for sure. > > The advantage of this code is that it's pretty self-contained, so for > example we can use it to create a small command line utility to give the > voikko folks to use for signing, etc. So this would be a special Mozilla-specific code signing scheme and not Authenticode for Windows. >> AFAIK, in the case of OpenH264 we check a hash at library install >> time, but when we subsequently load the library, we don't check a hash >> or signature. 
In the case of OpenH264, the library gets loaded into a >> sandbox, which probably addresses the concern of a replacement >> OpenH264 with dodgy additional code being able to open windows that >> look like they came from Firefox. >> >> Assuming that we don't already have code for validating library >> provenance at library load time, wouldn't it make more sense to put >> effort into reusing the facilities for spawning a GMP process to spawn >> a low-privilege spell checking process than to try validate the >> provenance of already-installed code in a way that still doesn't >> address the crash impact concern in the case of the code being >> legitimate? >> >>> Overall, though, I agree with Ehsan that this discussion isn't very >>> worthwhile unless we what the voikko people want to do. >> &g
Re: Rounding to the jemalloc bucket size
On Thu, Apr 27, 2017 at 10:37 AM, Marco Bonardo wrote: > I'm far from being an expert here, but I seem to remember from Storage that > we have malloc_good_size Thank you! I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1360139 and https://bugzilla.mozilla.org/show_bug.cgi?id=1360138 . -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
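For concreteness, a minimal Rust sketch of how such a rounding helper could be used (the extern declaration below assumes the C-callable malloc_good_size that mozjemalloc exposes to C++; the binding and the function name round_up_to_bucket_size are illustrative, not an existing Gecko API):

  // Illustrative binding; the symbol name and linkage are assumptions.
  extern "C" {
      fn malloc_good_size(size: usize) -> usize;
  }

  /// Rounds a requested allocation size in bytes up to the allocator's
  /// bucket size, so that a growable buffer's stored capacity matches the
  /// real allocation instead of triggering an early reallocation.
  fn round_up_to_bucket_size(bytes: usize) -> usize {
      unsafe { malloc_good_size(bytes) }
  }

A growable buffer would call this when computing its next capacity, so that the capacity it remembers is exactly what the allocator actually hands back.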
Updating a particular vendored crate by minor version
I have toolkit/library/rust/shared/Cargo.toml depending on a crates.io crate encoding_c, which depends on encoding_rs. Then I update the minor version of encoding_rs on crates.io but don't update encoding_c. Now if I re-run ./mach vendor rust, nothing happens, because I didn't change the encoding_c dependency version in toolkit/library/rust/shared/Cargo.toml to force the dependencies to be unsatisfied. If, instead, I delete toolkit/library/rust/shared/Cargo.lock and then run ./mach vendor rust, I get minor version updates to all crates in the dependency graph that have changed since they were vendored. Other than manually merging lock files and unstaging unrelated crate changes, how do I scope the re-vendoring to encoding_rs only? -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Updating a particular vendored crate by minor version
On Tue, May 2, 2017 at 3:11 PM, Kartikaya Gupta wrote: > You can update a specific crate to a specific version like so: > > cd toolkit/library/rust > cargo update -p encoding_rs --precise > cd ../gtest/rust > cargo update -p encoding_rs --precise > cd ../../../../ > mach vendor rust Thank you. This works. Filed https://bugzilla.mozilla.org/show_bug.cgi?id=1361734 to get mach vendor to perform those steps. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Representing a pointer to static in XPConnected JS?
Our codebase has the conceptual design flaw of representing character encodings as nsACStrings holding the name of the encoding instead of having a type-safe representation. This causes ambiguity between strings that are external protocol text designating an encoding ("label" in spec speak; many labels to one encoding) and internal strings that have one-to-one mapping to encodings ("name" in spec speak). In https://bugzilla.mozilla.org/show_bug.cgi?id=encoding_rs , I'm in the process of introducing a type-safe representation for encodings: const mozilla::Encoding* in C++ and Option<&'static encoding_rs::Encoding> in Rust, which have the same memory representation and, thus, are zero-toll bridged between C++ and Rust. (That is, what gets passed around is just a pointer to static. The pointee's fields are only accessed on the Rust side.) The next step (as multiple separate work items and landings) would be using the type-safe representation throughout the tree. However, using const mozilla::Encoding* in place of char* or nsACString throughout C++ code has JavaScript implications, because encodings occur as arguments (including outparams) in XPIDL methods that are exposed to JS, too. Is it feasible (with reasonably low effort) to introduce a new XPIDL type that is a pointer to a non-refcounted immutable static object in C++ and still gets bridged to JS? -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
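To make the bridging concrete, a minimal Rust-side sketch (the exported function names below are illustrative, not the actual encoding_c API): thanks to the niche optimization, Option<&'static Encoding> has the same representation as a plain, possibly-null pointer, so the C++ side can declare these functions as taking or returning const mozilla::Encoding*.

  extern crate encoding_rs;

  use encoding_rs::{Encoding, UTF_8};

  // None crosses the FFI boundary as nullptr; Some(&ENCODING) crosses as
  // the address of the static.
  #[no_mangle]
  pub extern "C" fn example_utf8_encoding() -> Option<&'static Encoding> {
      Some(UTF_8)
  }

  // Label lookup that returns a nullable pointer to a static on the C++ side.
  #[no_mangle]
  pub unsafe extern "C" fn example_encoding_for_label(
      label: *const u8,
      label_len: usize,
  ) -> Option<&'static Encoding> {
      let label = std::slice::from_raw_parts(label, label_len);
      Encoding::for_label(label)
  }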
Re: Representing a pointer to static in XPConnected JS?
On Thu, May 4, 2017 at 10:08 AM, Henri Sivonen wrote: > Is it feasible (with reasonably low effort) to introduce a new XPIDL > type that is a pointer to a non-refcounted immutable static object in > C++ and still gets bridged to JS? My question was underspecified. At minimum, the JS bridging should have these properties: 1) Returning a non-null const Encoding* from C++ materializes a JS object. 2) Passing the JS object back to C++ materializes the original pointer on the C++ side. 3) Returning a null const Encoding* from C++ materializes a JS null. 4) Passing a JS null to C++ materializes nullptr. 5) Comparing two JS objects materialized per point #1 with == is true iff they were materialized from the same C++ pointer. Is that kind of thing doable with little effort? It would be a bonus if the JS objects could come with pre-defined methods, but that's not strictly necessary. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Representing a pointer to static in XPConnected JS?
On Thu, May 4, 2017 at 4:27 PM, Nathan Froyd wrote: > On Thu, May 4, 2017 at 3:08 AM, Henri Sivonen wrote: >> Is it feasible (with reasonably low effort) to introduce a new XPIDL >> type that is a pointer to a non-refcounted immutable static object in >> C++ and still gets bridged to JS? > > You can certainly have static objects with what amount to dummy > AddRef/Release methods passed through XPIDL (we do this in a couple of > places throughout Gecko), but I don't think you can get away with > having a non-refcounted object passed through XPIDL. Do the AddRef/Release need to be virtual? -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Representing a pointer to static in XPConnected JS?
On Thu, May 4, 2017 at 7:39 PM, Nathan Froyd wrote: > On Thu, May 4, 2017 at 12:32 PM, Henri Sivonen wrote: >> On Thu, May 4, 2017 at 4:27 PM, Nathan Froyd wrote: >>> On Thu, May 4, 2017 at 3:08 AM, Henri Sivonen wrote: >>>> Is it feasible (with reasonably low effort) to introduce a new XPIDL >>>> type that is a pointer to a non-refcounted immutable static object in >>>> C++ and still gets bridged to JS? >>> >>> You can certainly have static objects with what amount to dummy >>> AddRef/Release methods passed through XPIDL (we do this in a couple of >>> places throughout Gecko), but I don't think you can get away with >>> having a non-refcounted object passed through XPIDL. >> >> Do the AddRef/Release need to be virtual? > > Yes. (I'm not sure how XPConnect would discover the refcounting > methods if they were non-virtual.) > > Please note that the static objects with dummy AddRef/Release methods > also implement XPConnect interfaces, i.e. QueryInterface, nsresult > virtual methods, etc. OK. That doesn't fit my case. There's nothing virtual on either the C++ or Rust side about mozilla::Encoding / encoding_rs::Encoding. All the instances come from the Rust side and the interpretation of the pointer just changes when crossing the FFI so that C++ thinks it's a pointer to mozilla::Encoding. On the C++ side, the (all non-virtual) methods take the "this" pointer and send it back to Rust as the first argument to FFI. > I think you could possibly make your things a WebIDL interface, which > don't require refcounting, and magically make the WebIDL interfaces > work with XPIDL, but I do not know the details there. I'll keep that in mind. Thanks. Another option is to have dual methods on objects that are accessed from both C++ and JS: A non-scriptable method that takes const mozilla::Encoding* and a scriptable method that takes something else: either a string containing a name or some kind of manually-applied XPCOM/XPIDL-ish wrapper for const mozilla::Encoding*. It just would be nice for the wrapping part to be automagic in the binding. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
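As a sketch of that calling convention (the exported name is illustrative): the non-virtual C++ method body would just forward this to an extern function like the one below, and the Rust side reinterprets the pointer as a reference to the static Encoding.

  extern crate encoding_rs;

  use encoding_rs::Encoding;

  #[no_mangle]
  pub unsafe extern "C" fn example_encoding_is_ascii_compatible(
      encoding: *const Encoding, // the C++ `this`, i.e. const mozilla::Encoding*
  ) -> bool {
      (*encoding).is_ascii_compatible()
  }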
Re: Representing a pointer to static in XPConnected JS?
On Fri, May 5, 2017 at 6:02 PM, Boris Zbarsky wrote: > I'm not sure this is going to work in this case. WebIDL interfaces that > don't require refcounting basically require that the JS object owns the C++ > thing; it will delete it when finalized. > > You could do non-virtual no-op refcounting. But you still have a problem > where ToJSValue only knows how to work with subclasses of either nsISupports > or nsWrapperCache, both of which involve virtual functions. Yeah, that's not OK for my case. > If I can take a step back, though, you have the following requirements: > > 1) The same Encoding* has to lead to the same JS object. I thought that == could be true when the objects aren't the same as with strings, but now that I look at the spec, it seems that I was wrong. But perhaps chrome JS could get away with not comparing these things: just passing them from one XPCOM API to another. > 2) The JS representation needs to be an object. This is probably good in > terms of typesafety (in that we can check whether the object is actually the > thing we were passed), but complicates the other bits. For example, if this > were not a requirement, we could conceivably use a JS PrivateValue to just > directly encode the Encoding* in a value that looks like a JS Number. This > does mean that on JS-to-C++ conversion we'd effectively reinterpret_cast a > double to an Encoding*, so if someone messed up and passed some random > double bad things would happen. I suppose compared to that it would be pretty tame to have a string-based API that MOZ_CRASHes if the string argument isn't an encoding name. (Previously, we couldn't do that, since XPCOM extension bugs could then crash the browser, but maybe we could have higher expectations for our own chrome JS.) > One thing that is not clear to me: do you need support for this on worker > threads, or just mainthread? I'm not sure what chrome JS runs on non-main threads and if there's non-main-thread chrome JS doing things like obtain an encoding name from a channel and pass it to the UTF8 converter service. > This would be a bit of work, but not too insane, I think. This seems complicated enough that it's probably the best to have non-scriptable methods that are type-safe for C++ usage and scriptable overloads that deal with encoding names as strings for chrome JS. After all, Web Platform JS represents encodings as strings, too. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: CodeCoverage Monthly Update
On Thu, Apr 6, 2017 at 6:26 AM, Kyle Lahnakoski wrote: > * Getting Rust to emit coverage artifacts is important: > https://bugzilla.mozilla.org/show_bug.cgi?id=1335518 Is there a plan to factor "cargo test" of individual vendored crates into the coverage of Rust code? For example, for encoding_rs, I was thinking of testing mainly the C++ integration as a mozilla-central-specific gtest and leaving the testing of the crate internals to the crate's standalone "cargo test". -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Using references vs. pointers in C++ code
On Tue, May 9, 2017 at 12:58 PM, Emilio Cobos Álvarez wrote: > I think references help to encode that a bit more in the type system, > and help reasoning about the code without having to look at the > implementation of the function you're calling into, or without having to > rely on the callers to know that you expect a non-null argument. > > Personally, I don't think that the fact that they're not used as much as > they could/should is a good argument to prevent their usage, but I don't > know what's the general opinion on that. The relevant bit of the Core Guidelines is https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Rf-ptr-ref and says: "A pointer (T*) can be a nullptr and a reference (T&) cannot, there is no valid "null reference". Sometimes having nullptr as an alternative to indicated "no object" is useful, but if it is not, a reference is notationally simpler and might yield better code." As a result, I have an in-flight patch that takes T& instead of NotNull where applicable, even though I do use NotNull to annotate return values. I agree that in principle it makes sense to use the type system instead of relying on partial debug-build run-time measures to denote non-null arguments when possible. That said, having to dereference a smart pointer with prefix * in order to pass its referent to a method that takes a T& argument feels a bit odd when one is logically thinking of passing a pointer still, but then, again, &*foo seems like a common pattern on the Rust side of FFI to make a reference out of a pointer and effectively asserting to the human reader that the pointer is not null. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
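For readers not familiar with the idiom, a tiny Rust sketch of the &*foo pattern referred to above (the Foo type and value_of function are hypothetical, nothing Gecko-specific):

  struct Foo {
      value: i32,
  }

  unsafe fn value_of(foo: *const Foo) -> i32 {
      debug_assert!(!foo.is_null());
      // `&*foo` reborrows the raw pointer as a &Foo; this is only sound if
      // `foo` is non-null and points at a valid object, which is exactly
      // what the pattern asserts to the human reader.
      let foo_ref: &Foo = &*foo;
      foo_ref.value
  }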
Is Big5 form submission fast enough?
In Firefox 43, I rewrote our Big5 support and, among other things, I optimized the *encoder* for footprint rather than speed on the theory that users won't notice anyway since the encoder run is followed by a dominating wait for the network when submitting a form. Since then, I've learned that the relative slowness of the Big5 encoder is greater than I had anticipated. Still, I haven't seen anyone complaining, but I don't know if anyone who finds it too slow knows how to attribute the complaint. I'd like to hear from someone who uses a Web site/app that involves submitting a textarea of Traditional Chinese text in Big5 if the form submission performance seems normal (doesn't feel particularly slow) on low-performance hardware, like an Android phone. (In the phone case, I mean the amount of text you'd feel OK to input on a phone at one time.) If UTF-8 is so widely deployed that no one in the Taipei office needs to submit forms in Big5 anymore, that would be good to know, too. Context: I need to decide if I should make Big5 encode faster or if I should trade off speed for smaller footprint for the legacy Simplified Chinese and Japanese *encoders*, too. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Is Big5 form submission fast enough?
On Fri, May 12, 2017 at 4:28 AM, Kan-Ru Chen wrote: > On Thu, May 11, 2017, at 01:43 PM, Henri Sivonen wrote: >> In Firefox 43, I rewrote our Big5 support and, among other things, I >> optimized the *encoder* for footprint rather than speed on the theory >> that users won't notice anyway since the encoder run is followed by a >> dominating wait for the network when submitting a form. >> >> Since then, I've learned that the relative slowness of the Big5 >> encoder is greater than I had anticipated. Still, I haven't seen >> anyone complaining, but I don't know if anyone who finds it too slow >> knows how to attribute the complaint. >> >> I'd like to hear from someone who uses a Web site/app that involves >> submitting a textarea of Traditional Chinese text in Big5 if the form >> submission performance seems normal (doesn't feel particularly slow) >> on low-performance hardware, like an Android phone. (In the phone >> case, I mean the amount of text you'd feel OK to input on a phone at >> one time.) >> >> If UTF-8 is so widely deployed that no one in the Taipei office needs >> to submit forms in Big5 anymore, that would be good to know, too. > > I don't feel that I see a lot of Big5 websites out there. It's hard for > me to even find one to test. Thank you. I guess it doesn't really matter whether Big5 form submission feels slow or not if it's something that people very rarely experience. >> Context: >> I need to decide if I should make Big5 encode faster or if I should >> trade off speed for smaller footprint for the legacy Simplified >> Chinese and Japanese *encoders*, too. > > I think Shift_JIS are still widely used. But this is just my experience > and guessing. If we really want to know the real word usage we should > collect data. Is there some telemetry probe for this already? If Big5 form submission is so rarely used that its performance doesn't matter, we can't then extrapolate the lack of complaints to inform the implementation of Japanese or Simplified Chinese legacy encodings. Keeping the implementation approach asymmetry between Big5 on one hand and legacy Japanese and legacy Simplified Chinese encodings on the other hand seems like a valid approach in the case of a large disparity in usage today. There aren't telemetry probes for "this" regardless of what "this" you meant. To my knowledge, there's no telemetry probe for counting form submission encodings and there is no telemetry probe measuring form submission round trip time by encoding. Telemetry analysis in this area would have to be scoped by locale (e.g. analysis of relative frequency of Big5 and UTF-8 form submissions scoped to the zh-TW locale) to be meaningful, and, from experience, such analyses are annoying to carry out, because they need manual privileged access to telemetry data. Locale-scoped views aren't available on the telemetry dashboards, because some locales have so few users that scoping by locale would narrow the population so much that the data could be potentially too identifiable. It would be nice if locale-scoped views were available for locales whose telemetry-enabled instances are more numerous than some threshold. (Each of zh-TW, zh-CN and ja-JP surely has enough users, at least on the release channel, for aggregate data views not to be too identifiable.) Additionally, I don't really know what would be a good way to place a probe in our code to measure form submission round trip time (what the user perceives) rather than the encode step only. 
(It's already obvious that the encode step itself would show a disparity by a massive factor between UTF-8 and Big5.) (I don't think we need telemetry to believe that Shift_JIS and gbk form submissions still happen routinely.) -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Is it OK to make allocations that intentionally aren't freed? (was: Re: Is Big5 form submission fast enough?)
On Tue, May 16, 2017 at 7:03 AM, Tim Guan-tin Chien wrote: > According to Alexa top 100 Taiwan sites and quick spot checks, I can only > see the following two sites encoded in Big5: > > http://www.ruten.com.tw/ > https://www.momoshop.com.tw/ > > Both are shopping sites (eBay-like and Amazon-like) so you get the idea how > forms are used there. Thank you. It seems to me that encoder performance doesn't really matter for sites like these, since the number of characters one would enter in the search field at a time is very small. > Mike reminded me to check the Tax filing website: http://www.tax.nat.gov.tw/ > .Yes, it's unfortunately also in Big5. I guess I'm not going to try filing taxes there for testing. :-) - - One option I've been thinking about is computing an encode acceleration table for JIS X 0208 on the first attempt to encode a CJK Unified Ideograph in any of Shift_JIS, EUC-JP or ISO-2022-JP, for GBK on the first attempt to encode a CJK Unified Ideograph in either GBK or gb18030, and for Big5 on the first attempt to encode a CJK Unified Ideograph in Big5. Each of the three tables would then remain allocated through to the termination of the process. This would have the advantage of not bloating our binary footprint with data that can be computed from other data in the binary while still making legacy Chinese and Japanese encode fast without a setup cost for each encoder instance. The downsides would be that the memory for the tables wouldn't be reclaimed if the tables aren't needed anymore (the browser can't predict the future) and executions where any of the tables has been created wouldn't be valgrind-clean. Also, in the multi-process world, the tables would be recomputed per-process. OTOH, if we shut down renderer processes from time to time, it would work as a coarse mechanism to reclaim the memory in case Japanese or Chinese legacy encode is a relatively isolated event in the user's browsing pattern. Creating a mechanism for the encoding library to become aware of application shutdown just in order to be valgrind-clean would be messy, though. (Currently, we have shutdown bugs where uconv gets used after we've told it that it can shut down. I'd really want to avoid re-introducing that class of bugs with encoding_rs.) Is it OK to create allocations that are intentionally never freed (i.e. process termination is what "frees" them)? Is valgrind's message suppression mechanism granular enough to suppress three allocations from a particular Rust crate statically linked into libxul? -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Is it OK to make allocations that intentionally aren't freed? (was: Re: Is Big5 form submission fast enough?)
On Sat, May 20, 2017 at 5:48 AM, Botond Ballo wrote: > On Fri, May 19, 2017 at 10:38 PM, Nicholas Nethercote > wrote: >> There's also a pre-processor constant that we define in Valgrind/ASAN/etc. >> builds that you can check in order to free more stuff than you otherwise >> would. But I can't for the life of me remember what it's called :( > > It looks like some code checks for MOZ_ASAN and MOZ_VALGRIND. Thanks. On Fri, May 19, 2017 at 10:09 PM, Jeff Muizelaar wrote: > We use functions like cairo_debug_reset_static_data() on shutdown to > handle cases like this. The comment there is encouraging, since it suggests that Cairo doesn't attempt to deal with cairo_debug_reset_static_data() getting called too early. On Fri, May 19, 2017 at 9:58 PM, Kris Maglione wrote: > On Fri, May 19, 2017 at 08:44:58AM +0300, Henri Sivonen wrote: >> >> The downsides would be that the memory for the tables wouldn't be >> reclaimed if the tables aren't needed anymore (the browser can't >> predict the future) and executions where any of the tables has been >> created wouldn't be valgrind-clean. > > > If we do this, it would be nice to flush the tables when we get a > memory-pressure event, which should at least mitigate some of the effects > for users on memory-constrained systems. How large would the tables have to be for it to be worthwhile, in your estimate, to engineer a mechanism for dynamically dropping them (for non-valgrind reasons)? If there is only a one-way transition (first a table doesn't exist and after some point in time it exists and will continue to exist), there can be an atomic pointer to the table and no mutex involved when the pointer is read as non-null. That is, only threads that see the pointer as null would then obtain a mutex to make sure that only one thread creates the table. If the tables can go away, the whole use of the table becomes a critical section and then entering the critical section on a per-character basis probably becomes a bad idea and hoisting the mutex acquisition to cover a larger swath of work means that the fact that the table is dynamically created will leak from behind some small lookup abstraction. That's doable, of course, but I'd really like to avoid over-designing this, when there's a good chance that users wouldn't even notice if GBK and Shift_JIS got the same slowdown as Big5 got in Firefox 43. I guess instead of looking at the relative slowness and pondering acceleration tables, I should measure how much Chinese or Japanese text a Raspberry Pi 3 (the underpowered ARM device I have access to and that has predictable-enough scheduling to be benchmarkable in a usefully repeatable way unlike Android devices) can legacy-encode in a tenth of a second or 1/24th of a second without an acceleration table. (I posit that with the network roundtrip happening afterwards, no one is going to care if the form encode step in the legacy case takes up to one movie frame duration. Possibly, the "don't care" allowance is much larger.) > And is there a reason ClearOnShutdown couldn't be used to deal with valgrind > issues? I could live with having a valgrind/ASAN-only clean-up method that would be UB to call too early if I can genuinely not be on the hook for someone calling it too early. I don't want to deal with what we have now: First we tell our encoding framework to shut down but then we still occasionally do stuff like parse URLs afterwards. 
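For concreteness, a minimal Rust sketch of the one-way transition described above (the names and table contents are illustrative): the pointer is read without any lock once it is non-null, a mutex is taken only by threads that saw null, and the allocation is deliberately never freed.

  use std::ptr;
  use std::sync::atomic::{AtomicPtr, Ordering};
  use std::sync::Mutex;

  struct EncodeTable {
      // decode-data-derived acceleration entries would live here
      entries: Vec<u16>,
  }

  static TABLE: AtomicPtr<EncodeTable> = AtomicPtr::new(ptr::null_mut());
  static TABLE_INIT_LOCK: Mutex<()> = Mutex::new(());

  fn get_table() -> &'static EncodeTable {
      let existing = TABLE.load(Ordering::Acquire);
      if !existing.is_null() {
          // Fast path: no lock once the one-way transition has happened.
          return unsafe { &*existing };
      }
      let _guard = TABLE_INIT_LOCK.lock().unwrap();
      let raced = TABLE.load(Ordering::Acquire);
      if !raced.is_null() {
          // Another thread built the table while we waited for the lock.
          return unsafe { &*raced };
      }
      // Build the table and leak it on purpose: the allocation lives until
      // process termination, which is exactly the question this thread raises.
      let table = Box::into_raw(Box::new(EncodeTable {
          entries: build_entries(),
      }));
      TABLE.store(table, Ordering::Release);
      unsafe { &*table }
  }

  fn build_entries() -> Vec<u16> {
      // placeholder for the real computation from decode-oriented data
      Vec::new()
  }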
> That said, can we try to get some telemetry on how often we'd need to build > these tables, and how likely they are to be needed again in the same > process, before we make a decision? I'd really like to proceed with work sooner than it takes to do the whole round-trip of setting up telemetry and getting results. What kind of thresholds would we be looking for to make decisions? On Fri, May 19, 2017 at 10:38 PM, Eric Rahm wrote: > I'd be less concerned about overhead if we had a good way of sharing these > static tables across processes Seems sad to add process-awareness complexity to a library that otherwise doesn't need to know about processes when the necessity of fast legacy encode is itself doubtful. > (ICU seems like a good candidate as well). What do you mean? On Fri, May 19, 2017 at 10:22 PM, Jet Villegas wrote: > Might be good to serialize to/from disk after the first run, so only > the first process pays the compute cost? Building the acceleration table would be more of a matter of writing memory in a non-sequential order than of doing serious math. Reading from disk would mostly have the benefit of making the memory write sequential. I'd expect the general overhead of disk access to be worse than a recompute. In any case, I don't want encoding_rs to have to know about file sys
Re: Is it OK to make allocations that intentionally aren't freed? (was: Re: Is Big5 form submission fast enough?)
On Sun, May 21, 2017 at 3:46 PM, Henri Sivonen wrote: > I guess instead of looking at the relative slowness and pondering > acceleration tables, I should measure how much Chinese or Japanese > text a Raspberry Pi 3 (the underpowered ARM device I have access to > and that has predictable-enough scheduling to be benchmarkable in a > usefully repeatable way unlike Android devices) can legacy-encode in a > tenth of a second or 1/24th of a second without an acceleration table. > (I posit that with the network roundtrip happening afterwards, no one > is going to care if the form encode step in the legacy case takes up > to one movie frame duration. Possibly, the "don't care" allowance is > much larger.) Here are numbers from ARMv7 code running on RPi3: UTF-16 to Shift_JIS: 626000 characters per second or the human-readable non-markup text of a Wikipedia article in 1/60th of a second. UTF-16 to GB18030 (same as GBK for the dominant parts): 206000 characters per second or the human-readable non-markup text of a Wikipedia article in 1/15th of a second. UTF-16 to Big5: 258000 characters per second or the human-readable non-markup text of a Wikipedia article in 1/20th of a second. Considering that usually a user submits considerably less than a Wikipedia article's worth of text in a form at a time, I think we can conclude that as far as user perception of form submission goes, it's OK to ship Japanese and Chinese legacy encoders that do linear search over decode-optimized data (no encode-specific data structures at all) and are extremely slow *relative* (by a factor of over 200!) to UTF-16 to UTF-8 encode. The test data I used was: https://github.com/hsivonen/encoding_bench/blob/master/src/wikipedia/zh_tw.txt https://github.com/hsivonen/encoding_bench/blob/master/src/wikipedia/zh_cn.txt https://github.com/hsivonen/encoding_bench/blob/master/src/wikipedia/ja.txt So it's human-authored text, but my understanding is that the Simplified Chinese version has been machine-mapped from the Traditional Chinese version, so it's possible that some slowness of the Simplified Chinese case is attributable to the conversion from Traditional Chinese exercising less common characters than if it had been human-authored directly as Simplified Chinese. Japanese is not fully ideographic and the kana mapping is a matter of a range check plus offset, which is why the Shift_JIS case is so much faster. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
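For anyone who wants to reproduce this kind of measurement outside the Gecko build, a rough Rust sketch using the encoding_rs crate (the sample text, buffer sizing and single-shot timing are simplifications; a real harness would use larger inputs and repeat the runs):

  extern crate encoding_rs;

  use encoding_rs::BIG5;
  use std::time::Instant;

  fn main() {
      // Pretend this is the UTF-16 content of a textarea.
      let sample: Vec<u16> = "中文測試".encode_utf16().collect();

      let mut encoder = BIG5.new_encoder();
      // Generous output buffer; a real caller would size this with the
      // encoder's max_buffer_length helpers instead.
      let mut dst = vec![0u8; sample.len() * 4 + 16];

      let start = Instant::now();
      let (_result, _read, written, _had_unmappables) =
          encoder.encode_from_utf16(&sample, &mut dst, true);
      let elapsed = start.elapsed();

      println!("encoded {} bytes in {:?}", written, elapsed);
  }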
Re: Mozilla Charset Detectors
On Mon, May 22, 2017 at 12:13 PM, Gabriel Sandor wrote: > I recently came across the Mozilla Charset Detectors tool, at > https://www-archive.mozilla.org/projects/intl/chardet.html. I'm working on > a C# project where I could use a port of this library (e.g. > https://github.com/errepi/ude) for advanced charset detection. It's somewhat unfortunate that chardet got ported over to languages like Python and C# with its shortcomings. The main shortcoming is that despite the name saying "universal", the detector was rather arbitrary in what it detected and what it didn't. Why Hebrew and Thai but not Arabic or Vietnamese? Why have a Hungarian-specific frequency model (that didn't actually work) but no models for e.g. Polish and Czech from the same legacy encoding family? The remaining detector bits in Firefox are for Japanese, Russian and Ukrainian only, and I strongly suspect that the Russian and Ukrainian detectors are doing more harm than good. > I'm not sure however if this tool is deprecated or not, and still > recommended by Mozilla for use in modern applications. The tool page is > archived and most of the links are dead, while the code seems to be at > least 7-8 years old. Could you please tell me what's the status of this > tool and whether I should use it in my project or look for something else? I recommend not using it. (I removed most of it from Firefox.) I recommend avoiding heuristic detection unless your project absolutely can't do without. If you *really* need a detector, ICU and https://github.com/google/compact_enc_det/ might be worth looking at, though this shouldn't be read as an endorsement of either. With both ICU and https://github.com/google/compact_enc_det/ , watch out for the detector's possible guess space containing very rarely used encodings that you really don't want content detected as by mistake. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Scope of XML parser rewrite?
In reference to: https://twitter.com/nnethercote/status/866792097101238272 Is the rewrite meant to replace expat only or also some of our old code both above and below expat? Back in 2011, I wrote a plan for rewriting the code around expat without rewriting expat itself: https://wiki.mozilla.org/Platform/XML_Rewrite I've had higher-priority stuff to do ever since... (The above plan talks about pushing UTF-16 to the XML parser and having deep C++ namespaces. Any project starting this year should make the new parser use UTF-8 internally for cache-friendliness and use less deep C++ namespaces.) Also, I think the decision of which XML version to support should be a deliberate decision and not an accident. I think the reasonable choices are XML 1.0 4th edition (not rocking the boat) and reviving XML5 (original discussion: https://annevankesteren.nl/2007/10/xml5 , latest draft: https://ygg01.github.io/xml5_draft/). XML 1.1 is dead. XML 1.0 5th edition tried to have the XML 1.0 cake and eat the XML 1.1 cake too and expanded the set of documents that the parser doesn't reject. Any of the newly well-formed documents would be incompatible with 4th ed. and earlier parsers, which would be a break from universal XML interop. I think it doesn't make sense to relax XML only a bit. If XML is to be relaxed (breaking interop in the sense of starting to accept docs that old browsers would show the Yellow Screen of Death on), we should go all the way (i.e. XML5). Notably, it looks like Servo already has an XML5 parser written in Rust: https://github.com/servo/html5ever/tree/master/xml5ever The tweets weren't clear about whether xml5ever had been considered, but https://twitter.com/eroc/status/866808814959378434 looks like it's talking about writing a new one. It seems like integrating xml5ever (as opposed to another XML parser written in Rust) into Gecko would give some insight into how big a deal it would be to replace Gecko's HTML parser with html5ever (although due to document.write(), HTML is always a bigger deal integration-wise than XML). (If the outcome here is to do XML5, we should make sure the spec is polished enough at the WHATWG in order not to do a unilateral thing in relative secret.) -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Scope of XML parser rewrite?
Figured out the email address of the XML5 editor / xml5ever developer, so adding to CC. On Tue, May 23, 2017 at 9:43 AM, Henri Sivonen wrote: > In reference to: https://twitter.com/nnethercote/status/866792097101238272 > > Is the rewrite meant to replace expat only or also some of our old > code on both above and below expat? > > Back in 2011, I wrote a plan for rewriting the code around expat > without rewriting expat itself: > https://wiki.mozilla.org/Platform/XML_Rewrite > I've had higher-priority stuff to do ever since... > > (The above plan talks about pushing UTF-16 to the XML parser and > having deep C++ namespaces. Any project starting this year should make > the new parser use UTF-8 internally for cache-friendliness and use > less deep C++ namespaces.) > > Also, I think the decision of which XML version to support should be a > deliberate decision and not an accident. I think the reasonable > choices are XML 1.0 4th edition (not rocking the boat) and reviving > XML5 (original discussion: https://annevankesteren.nl/2007/10/xml5 , > latest draft: https://ygg01.github.io/xml5_draft/). XML 1.1 is dead. > XML 1.0 5th edition tried to have the XML 1.0 cake and eat the XML 1.1 > cake too and expanded the set of documents that parser doesn't reject. > Any of the newly well-forming documents would be incompatible with 4th > ed. and earlier parsers, which would be a break from universal XML > interop. I think it doesn't make sense to relax XML only a bit. If XML > is to be relaxed (breaking interop in the sense of starting to accept > docs that old browsers would show the Yellow Screen of Death on), we > should go all the way (i.e. XML5). > > Notably, it looks like Servo already has an XML5 parser written in Rust: > https://github.com/servo/html5ever/tree/master/xml5ever > > The tweets weren't clear about whether xml5ever had been considered, > but https://twitter.com/eroc/status/866808814959378434 looks like it's > talking about writing a new one. > > It seems like integrating xml5ever (as opposed to another XML parser > written in Rust) into Gecko would give some insight into how big a > deal it would be to replace Gecko's HTML parser with html5ever > (although due to document.write(), HTML is always a bigger deal > integration-wise than XML). > > (If the outcome here is to do XML5, we should make sure the spec is > polished enough at the WHATWG in order not to a unilateral thing in > relative secret.) > > -- > Henri Sivonen > hsivo...@hsivonen.fi > https://hsivonen.fi/ -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Scope of XML parser rewrite?
On Tue, May 23, 2017 at 12:44 PM, Daniel Fath wrote: >> (If the outcome here is to do XML5, we should make sure the spec is >> polished enough at the WHATWG in order not to a unilateral thing in >> relative secret.) > > What does it mean to be polished enough at the WHATWG? I was thinking of having resolutions for the issues that are currently warnings in red and multi-vendor buy-in. (Previously, Tab from Google was interested in making SVG parsing non-Draconian, but I have no idea how reflective of wider buy-in that remark was.) > Also how far reaching should spec be? Include Namespaces? I would expect the spec to take a byte stream as input, specify how the encoding is determined, delegate the decoding from bytes to Unicode code points to the Encoding Standard and then define how the code point stream is processed into a DOM tree. (Bonus points for defining a coercion to an XML 1.0 4th ed. Infoset, too, for non-browser use cases.) That would include the processing of Namespaces. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Scope of XML parser rewrite?
On Tue, May 23, 2017 at 5:01 PM, Daniel Fath wrote: > So, if I understand this correctly - We'll first need to land this component > in Firefox, right? And if it proves itself fine, then formalize it. No, both the implementation and the spec would have to be pretty solid before stuff can go into Firefox. But, as noted, DTDs are a blocker (if Firefox is to use the same XML parser for both XUL and for the Web, which makes sense in terms of binary size even if it's rather sad for XUL to constrain the Web side). >> I was thinking of having resolutions for the issues that are currently >> warnings in red and multi-vendor buy-in. (Previously, Tab from Google >> was interested in making SVG parsing non-Draconian, but I have no idea >> how reflective of wider buy-in that remark was.) > > You also mentioned warnings in red and multi-vendor buy-in. What does that > entail? Looks like at this time, even Mozilla-internal buy-in is lacking. :-/ On Tue, May 23, 2017 at 9:23 PM, Eric Rahm wrote: > I was hoping to write a more thorough blog post about this proposal (I have > some notes in a gist [1]), but for now I've added comments inline. The main > takeaway here is that I want to do a bare-bones replacement of just the > parts of expat we currently use. It needs to support DTD entities, have a > streaming interface, and support XML 1 v4. That's it, no new features, no > rewrite of our entire XML stack. OK. > Our current interface is UTF-16, so that's my target for now. I think > whatever cache-friendliness would be lost converting from UTF-16 -> UTF-8 -> > UTF-16. I hope this can be reconsidered, because the assumption that it would have to be UTF-16 -> UTF-8 -> UTF-16 isn't accurate. encoding_rs (https://bugzilla.mozilla.org/show_bug.cgi?id=encoding_rs) adds the capability to decode directly to UTF-8. This is a true direct-to-UTF-8 capability without pivoting through UTF-16. When the input is UTF-8 (as is the case with our chrome XML and with most on-the-Web XML), in the streaming mode, except for the few bytes representing code points split across buffer boundaries, this is fast UTF-8 validation (without doing math to compute scalar values and with SIMD acceleration for ASCII runs) and memcpy. (In the non-streaming case, it's validation and borrow when called from Rust and validation and nsStringBuffer refcount increment when called from C++.) On the other side of the parser, it's true that our DOM API takes UTF-16, but if all the code points in a text node are U+00FF or under, the text gets stored with leading zeros omitted. It would be fairly easy to add a hole in the abstraction to allow a UTF-8-oriented parser to set the compressed form directly without expansion to UTF-16 and then compression immediately back to ASCII when the parser knows that a text node is all ASCII. For element and attribute names, we already support finding atoms by UTF-8 representation and in most cases element and attribute names are ASCII with static atoms already existing for them. It seems to me that attribute values would be the only case where a conversion from UTF-8 to UTF-16 would be needed all the time, and that conversion can be fast for ASCII, which is what attribute values mostly are. Furthermore, the main Web XML case is SVG, which has relatively little natural-language text, so it's almost entirely ASCII. Looking at the ratio of markup to natural-language text in XUL, it seems fair to guess that parsing XUL as UTF-8 would be a cache-friendliness win, too. 
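To illustrate the decode-directly-to-UTF-8 path described above, a small Rust sketch against the encoding_rs streaming API (a single final chunk and a generously sized output buffer, for simplicity):

  extern crate encoding_rs;

  use encoding_rs::UTF_8;

  fn main() {
      // Pretend these bytes arrived from the network in one chunk.
      let chunk: &[u8] = "<svg xmlns=\"http://www.w3.org/2000/svg\"/>".as_bytes();

      let mut decoder = UTF_8.new_decoder();
      // Generous allocation keeps the sketch simple; real callers would use
      // the decoder's buffer-length helpers.
      let mut dst = vec![0u8; chunk.len() * 3 + 16];

      // `last == true` because this sketch feeds a single, final chunk.
      let (_result, read, written, had_errors) =
          decoder.decode_to_utf8(chunk, &mut dst, true);

      // For valid UTF-8 input, this amounts to validation plus copy:
      // no pivot through UTF-16 anywhere.
      assert_eq!(read, chunk.len());
      assert!(!had_errors);
      let text = std::str::from_utf8(&dst[..written]).unwrap();
      println!("{}", text);
  }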
-- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Scope of XML parser rewrite?
On Wed, May 24, 2017 at 10:11 AM, Henri Sivonen wrote: > It seems to me that attribute values would be the only case where a > conversion from UTF-8 to UTF-16 would be needed all the time, and that > conversion can be fast for ASCII, which is what attribute values > mostly are. Moreover, this conversion doesn't need to have the cost of converting potentially-bogus UTF-8 to UTF-16 but only the cost of converting guaranteed-valid UTF-8 to UTF-16, because UTF-8 validity was already guaranteed earlier. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Scope of XML parser rewrite?
On Wed, May 24, 2017 at 10:34 AM, Anne van Kesteren wrote: > Contrary to Henri, I think XML 1.0 edition 5 (which isn't "XML5") is > worth considering given > https://bugzilla.mozilla.org/show_bug.cgi?id=501837. It's what Chrome > ships and our current implementation doesn't seem to align with either > the 4th or 5th edition of XML 1.0. OK, if Chrome has shipped 1.0 5th ed., we should, too. :-( -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Scope of XML parser rewrite?
On Wed, May 24, 2017 at 10:11 AM, Henri Sivonen wrote: >> Our current interface is UTF-16, so that's my target for now. I think >> whatever cache-friendliness would be lost converting from UTF-16 -> UTF-8 -> >> UTF-16. > I hope this can be reconsidered, because the assumption that it would > have to be UTF-16 -> UTF-8 -> UTF-16 isn't accurate. I see that this part didn't get an on-list reply but got a blog reply: http://www.erahm.org/2017/05/24/a-rust-based-xml-parser-for-firefox/ I continue to think it's a bad idea to write another parser that uses UTF-16 internally. Even though I can see your desire to keep the project tightly scoped, I think it's fair to ask you to expand the scope a bit by 1) adding a way to pass Latin-1 data to text nodes directly (and use this when the parser sees a text node is all ASCII) and 2) replacing nsScanner with a bit of new buffering code that takes bytes from the network and converts them to UTF-8 using encoding_rs. We've both had the displeasure of modifying nsScanner as part of a security fix. nsScanner isn't valuable code that we should try to keep. It's no longer scanning for anything. It's just an over-complicated way of maintaining a buffer of UTF-16 data. While nsScanner and the associated classes are a lot of code, they do something simple that should be done in quite a bit less code, so as scope creep, replacing nsScanner should be a drop in a bucket effort-wise compared to replacing expat. I think it's super-sad if we get another UTF-16-using parser because replacing nsScanner was scoped out of the project. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Scope of XML parser rewrite?
On Fri, May 26, 2017 at 1:18 AM, Eric Rahm wrote: > Limiting to modifying nsScanner might be an option, but probably not > changing all callers that use the nsAString interface. I guess we can just > UTF-16 => UTF-8 those and file a bunch of follow ups? Yeah. The main follow-up would be https://bugzilla.mozilla.org/show_bug.cgi?id=1355106 , which would allow the avoidance of UTF-16 expansion in the innerHTML/createContextualFragment/DOMParser cases for strings that the JS engine doesn't store as 16-bit units in the first place. (Performance-wise, I see the network entry point as the main thing for the XML parser and innerHTML/createContextualFragment/DOMParser as secondary.) > One thing we've ignored are all the consumers expect output to be UTF-16, so > there's a fair amount of work on that side as well. I guess we have a viewpoint difference in terms of what the "consumers" are. I think of the DOM as a consumer, and the DOM takes Atoms (which can be looked up from UTF-8). While the callbacks in nsExpatDriver aren't bad code like nsScanner is, I don't think of the exact callback code as worth preserving in its precise current state. > Maybe a reasonable approach is to use a UTF-8 interface for the replacement > Rust library and work on a staged rollout: > > Start just converting UTF-16 => UTF-8 for input at the nsExpatDriver level, > UTF-8 => UTF-16 for output > Modify/replace nsScanner with something that works with UTF-8 (and > encoding_rs?), convert UTF-16 => UTF-8 for the nsAString methods > Follow up replacing nsAString methods with UTF-8 versions > Look into whether modifying the consumers of the tokenized data to handle > UTF-8 is reasonable, follow up as necessary > > WDYT? Seems good to me with the note that doing the direct UTF-8 to nsIAtom lookup would probably be a pretty immediate thing rather than a true follow-up. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Mozilla Charset Detectors
On Thu, May 25, 2017 at 10:44 PM, wrote: > Think of XML files without the "encoding" attribute in the declaration or > HTML files without the meta charset tag. Per spec, these must be treated as UTF-16 if there's a UTF-16 BOM and as UTF-8 otherwise. It's highly inappropriate to run heuristic detection for XML. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Consensus check: Asking the Unicode Technical Committee to revert their decision to change the preferred UTF-8 error handling
(If you don't care about the details of UTF-8 error handling, it's safe to stop reading.) In reference to https://hsivonen.fi/broken-utf-8/ , I think it would be appropriate to submit that post to the Unicode Consortium with a cover note asking the Unicode Technical Committee to revert their decision to change the preferred UTF-8 error handling for Unicode 11 and to retract the action item to draft corresponding new text for Unicode 11 for reasons given in the post. I think it would be preferable to do this via Mozilla's liaison membership of the Unicode Consortium rather than me doing it as a random member of the public, because submission via Mozilla's liaison membership allows for visibility into the process and opportunity for follow-up whereas if I do it on my own, it's basically a matter of dropping a note into a one-way black box. (It seems that this kind of thing is exactly what Mozilla's liaison membership is for.) However, submitting via Mozilla's liaison membership raises the question of whether the submission would properly represent a Mozilla consensus. I estimate this to be noncontroversial, because deliberate effort has been expended to make the Mozilla-affiliated implementations that I am aware of (uconv, encoding_rs and the Rust standard library) behave according to the pre-Unicode 11 version of the guidance either directly by looking at the Unicode Standard or by the way of implementing the WHATWG Encoding Standard, which elevates the pre-Unicode 11 preferred approach into a requirement. If I have mis-guessed that the above-contemplated submission should be non-controversial from the Mozilla perspective and you believe that the above-contemplated submission should not be made via Mozilla's liaison membership, please let me know. (My understanding is that a reversal of the decision is quite possible, but actually making the above-contemplated submission is a process prerequisite for a reversal to take place.) -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Consensus check: Asking the Unicode Technical Committee to revert their decision to change the preferred UTF-8 error handling
On Wed, Jun 7, 2017 at 12:16 PM, Jet Villegas wrote: > SGTM, Thanks. > Thanks for pushing on this one. > > One comment: although this is a proposed change to non-normative spec > text, it appears that several implementations already implement the > original (also non-normative) recommendation. Would it be worthwhile > to propose the reversal and also mark the section as normative? It seems to me that this episode has revealed that the Unicode Consortium doesn't have an objective way to deem one way of doing things as clearly the best one, so it seems to me that there's a better chance of Unicode toning down the language that expresses a preference to make it a lesser preference (by calling it something lesser than "best practice" going forward), and it seems to me that there wouldn't be Unicode-level agreement on elevating the old or the new preference to a requirement on the Unicode level. The WHATWG spec would continue to make the old Unicode preference required, so I think it's an OK outcome for the requirement to live in the WHATWG spec and for Unicode to prefer the same thing (i.e. reverting the change to the preference) in weaker terms than it has so far. Letting it be this way wouldn't invite objections from non-Web-oriented implementors who implement something else that's currently within Unicode compliance and who don't want to change any code. > --Jet > > On Wed, Jun 7, 2017 at 2:11 AM, Henri Sivonen wrote: >> (If you don't care about the details of UTF-8 error handling, it's >> safe to stop reading.) >> >> In reference to https://hsivonen.fi/broken-utf-8/ , I think it would >> be appropriate to submit that post to the Unicode Consortium with a >> cover note asking the Unicode Technical Committee to revert their >> decision to change the preferred UTF-8 error handling for Unicode 11 >> and to retract the action item to draft corresponding new text for >> Unicode 11 for reasons given in the post. >> >> I think it would be preferable to do this via Mozilla's liaison >> membership of the Unicode Consortium rather than me doing it as a >> random member of the public, because submission via Mozilla's liaison >> membership allows for visibility into the process and opportunity for >> follow-up whereas if I do it on my own, it's basically a matter of >> dropping a note into a one-way black box. (It seems that this kind of >> thing is exactly what Mozilla's liaison membership is for.) >> >> However, submitting via Mozilla's liaison membership raises the >> question of whether the submission would properly represent a Mozilla >> consensus. I estimate this to be noncontroversial, because deliberate >> effort has been expended to make the Mozilla-affiliated >> implementations that I am aware of (uconv, encoding_rs and the Rust >> standard library) behave according to the pre-Unicode 11 version of >> the guidance either directly by looking at the Unicode Standard or by >> the way of implementing the WHATWG Encoding Standard, which elevates >> the pre-Unicode 11 preferred approach into a requirement. >> >> If I have mis-guessed that the above-contemplated submission should be >> non-controversial from the Mozilla perspective and you believe that >> the above-contemplated submission should not be made via Mozilla's >> liaison membership, please let me know. >> >> (My understanding is that a reversal of the decision is quite >> possible, but actually making the above-contemplated submission is a >> process prerequisite for a reversal to take place.) 
>> >> -- >> Henri Sivonen >> hsivo...@hsivonen.fi >> https://hsivonen.fi/ >> ___ >> dev-platform mailing list >> dev-platform@lists.mozilla.org >> https://lists.mozilla.org/listinfo/dev-platform -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
New character encoding conversion API
ate a new one. It's cheap. The creation doesn't perform any lookup table preparation or things like that. From C++, to avoid using malloc again, you can use mozilla::Encoding::NewDecoderInto() and variants to recycle the old heap allocation. * We don't have third-party crates in m-c that (unconditionally) require rust-encoding. However, if you need to import such a crate and it's infeasible to make it use encoding_rs directly, please do not vendor rust-encoding into the tree. Vendoring rust-encoding into the tree would bring in another set of lookup tables, which encoding_rs is specifically trying to avoid. I have a compatibility shim ready in case the need to vendor rust-encoding-dependent crates arises. https://github.com/hsivonen/encoding_rs_compat -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
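To make the malloc-avoidance point concrete, here is a sketch of recycling a decoder from C++ (based on my recollection of intl/Encoding.h; treat names other than NewDecoderInto() as assumptions, and the actual decode calls are elided):

    #include "mozilla/Encoding.h"

    using namespace mozilla;

    void DecodeTwoStreams() {
      // First stream: a single heap allocation via NewDecoder().
      UniquePtr<Decoder> decoder = UTF_8_ENCODING->NewDecoder();
      // ... feed the first stream's bytes to *decoder ...

      // Second stream: reset the existing allocation in place instead of
      // calling malloc again.
      UTF_8_ENCODING->NewDecoderInto(*decoder);
      // ... feed the second stream's bytes to *decoder ...
    }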
Re: Intent to unship: ISO-2022-JP-2 support in the ISO-2022-JP decoder
On Mon, Nov 30, 2015 at 5:38 PM, Henri Sivonen wrote: > Japanese *email* is often encoded as ISO-2022-JP, and Web browsers > also support ISO-2022-JP even though Shift_JIS and EUC-JP are the more > common Japanese legacy encodings on the *Web*. The two UTF-16 variants > and ISO-2022-JP are the only remaining encodings in the Web Platform > that encode non-Basic Latin characters to bytes that represent Basic > Latin characters in ASCII. > > There exists an extension of ISO-2022-JP called ISO-2022-JP-2. The > ISO-2022-JP decoder (not encoder) in Gecko supports ISO-2022-JP-2 > features, which include the use of characters from JIS X 0212, KS X > 1001 (better known as the repertoire for EUC-KR), GB 2312, ISO-8859-1 > and ISO-8859-7. The reason originally given for adding ISO-2022-JP-2 > support to Gecko was: "I want to add a ISO-2022-JP-2 charset decoder > to Mozilla."[1] > > Other browsers don't support this extension, so it clearly can't be a > requirement for the Web Platform, and the Encoding Standard doesn't > include the ISO-2022-JP-2 extension in its definition for the > ISO-2022-JP decoder. Bringing our ISO-2022-JP decoder to compliance[2] > would, therefore, involve removing ISO-2022-JP-2 support. > > The only known realistic source of ISO-2022-JP-2 data is Apple's Mail > application under some circumstances, which may impact Thunderbird and > SeaMonkey. > > Are there any objections to removing the ISO-2022-JP-2 functionality > from mozilla-central? > > [1] https://bugzilla.mozilla.org/show_bug.cgi?id=72468 > [2] https://bugzilla.mozilla.org/show_bug.cgi?id=715833 > -- > Henri Sivonen > hsivo...@hsivonen.fi > https://hsivonen.fi/ Code implementing the above-quoted intent has landed. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: New character encoding conversion API
On Thu, Jun 15, 2017 at 3:58 PM, Nathan Froyd wrote: > Can you file a bug so `mach vendor rust` complains about vendoring > rust-encoding? Filed https://bugzilla.mozilla.org/show_bug.cgi?id=1373554 -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Overhead of returning a string from C++ to JS over WebIDL bindings
I noticed a huge performance difference between https://hsivonen.com/test/moz/encoding_bench_web/ and https://github.com/hsivonen/encoding_bench/ . The former has the overhead of JS bindings. The latter doesn't. On a 2009 Mac Mini (Core 2 Duo), in the case of English, the overhead is over twice the time spent by encoding_rs in isolation, so the time per iteration more than triples! The snowman test indicates that this isn't caused by SpiderMonkey's Latin1 space optimization. Safari performs better than Firefox, despite Safari using ICU (except for UTF-8 and windows-1252) and ICU being slower than encoding_rs in isolation on encoding_bench (tested on Linux). Also, looking at Safari's UTF-8 and windows-1252 decoders, which I haven't had the time to extract for isolated testing, and Safari's TextDecoder implementation, there's no magic there (especially no magic compared to the Blink fork of the UTF-8 and windows-1252 decoders). My hypothesis is that the JSC/WebKit overhead of returning a string from C++ to JS is much lower than SpiderMonkey/Gecko overhead or the V8/Blink overhead. (When encoding from string to ArrayBuffer, Safari doesn't have the advantage, which is also suggestive of this not being a matter of how GC happens relative to the timed runs.) Do we know why? -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Overhead of returning a string from C++ to JS over WebIDL bindings
On Fri, Jun 16, 2017 at 3:08 PM, Jan de Mooij wrote: > I profiled this quickly and we're spending a lot of time in GC. OK. So I accidentally created a string GC test instead of creating a TextDecoder test. :-( Is there a good cross-browser way to cause GC predictably outside the timed benchmark section in order to count only the TextDecoder run in the timing? On Fri, Jun 16, 2017 at 3:41 PM, Jan de Mooij wrote: > Also note that we have an external string cache (see > ExternalStringCache::lookup), it compares either the char pointers or the > actual characters (if length <= 100). If we are returning the same strings > (same characters) all the time on this benchmark, you could try removing > that length check to see if it makes a difference. The length of the string is always well over 100, so that already means that a string cache isn't interfering with the test, right? (Interfering meaning making the test reflect something other than the performance of the back end used by TextDecoder.) On Fri, Jun 16, 2017 at 11:19 PM, Boris Zbarsky wrote: > On 6/16/17 7:22 AM, Henri Sivonen wrote: >> >> My hypothesis is that the JSC/WebKit overhead of returning a string >> from C++ to JS is much lower than SpiderMonkey/Gecko overhead or the >> V8/Blink overhead. > > > It definitely is. JSC and WebKit use the same exact refcounted strings, > last I checked, so returning a string from WebKit to JSC involves a single > non-atomic refcount increment. It's super-fast. Jan said "We create external strings (a JS string referencing a DOM string buffer) for the strings returned from DOM to JS", so that means Gecko does roughly the same in this case, right? Could it be that JSC realizes that nothing holds onto the string and derefs it right away without having to wait for an actual GC? -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Overhead of returning a string from C++ to JS over WebIDL bindings
On Fri, Jun 16, 2017 at 3:08 PM, Jan de Mooij wrote: > It may be different for other parts of the benchmark - it would be nice to > have a minimal testcase showing the problem. https://hsivonen.com/test/moz/encoding_bench_web/english-only.html is minimized in the sense that 1) it runs only one benchmark and 2) it does the setup first and then waits for the user to click a button, which hopefully makes it easier to omit the setup from examination. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
C++ performance test harness?
Do we have a gtest analog for local performance testing? That is, something that makes it easy to microbenchmark libxul methods? -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: C++ performance test harness?
On Wed, Jul 5, 2017 at 3:36 PM, Emilio Cobos Álvarez wrote: > On 07/05/2017 10:19 AM, Henri Sivonen wrote: >> Do we have a gtest analog for local performance testing? That is, >> something that makes it easy to microbenchmark libxul methods? > > For CSS parsing there is a benchmark using MOZ_GTEST_BENCH[1]. Thanks. This seems to be what I'm looking for. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
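For reference, a sketch of what such a benchmark looks like (the include path and the exact macro arguments are from memory, so double-check against MozGTestBench.h; MyBench and BenchmarkedFunction are made up):

    #include "gtest/gtest.h"
    #include "gtest/MozGTestBench.h"

    static void BenchmarkedFunction() {
      // Call the libxul method to be timed here, enough times that the
      // measurement isn't dominated by harness overhead.
    }

    // Registers a gtest that reports its wall-clock time; run locally with
    // ./mach gtest.
    MOZ_GTEST_BENCH(MyBench, LibxulMethodMicrobenchmark, [] {
      BenchmarkedFunction();
    });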
Re: Embedding Firefox or Gecko in 2017
Positron isn't real Win32 embedding, though, right? That is, Positron runs the whole app instead of a view inside an otherwise Win32 app. To put a Gecko view inside a larger Win32 app, resurrecting EmbedLite might be the more appropriate way to go. The actively-maintained user of EmbedLite code is the Sailfish Browser (not Win32 but Qt5), so it probably makes sense to start at https://git.merproject.org/mer-core/qtmozembed/activity and discover the repo dependencies to find the latest EmbedLite code, which seems to be at https://git.merproject.org/mer-core/gecko-dev/tree/nemo_embedlite/embedding/embedlite . On Fri, Jul 7, 2017 at 12:23 AM, Ralph Giles wrote: > You're right that this isn't currently easy to do. > > A more recent project, now also abandoned, was > https://github.com/mozilla/positron You might see if that's a place to > start. > > -r > > On Thu, Jul 6, 2017 at 12:47 PM, cnico7 wrote: > >> Hi, >> >> I would like to embed gecko or firefox in a native windows application. >> The need is to be able to use gecko's rendering and javascript engines in >> order to display web pages that are not correctly displayed with other >> browsers (mainly for legacy reasons of the intranet web site). Those pages >> are part of a desktop application deployed on windows 7. >> >> So I did some searches around this subject and I discovered it could be a >> complex path to achieve it with recent gecko versions. >> >> Here is what I found : >> * around 2000, an embedding solution was available through XPCOM as >> described here : https://developer.mozilla.org/en-US/docs/Gecko/Embedding_ >> Mozilla >> Unfortunately, this is obsolete and does not work any more from my >> understanding. >> * this blog post http://chrislord.net/2016/03/08/state-of-embedding-in- >> gecko/ lists some old embedding possibilities. >> Unfortunately, none of them seems to still be available. >> * the servo project seems interesting but is to be shipped in years from >> now. >> >> So here are my questions : >> If I had the technical capabilities to contribute to a gecko embedding >> solution, from what should I start investigating ? >> Is EmbedLite (aka IPCLite) the good starting point or is there a better >> path to embed gecko ? >> If yes, is there a description of the API to use ? >> >> I had the idea of using firefox with marionette protocol in order to >> interact with the engine and to use a custom plugin in order to hide all >> the design (menus, tab bars,...). This idea has many drawbacks : it is slow >> at launching time, it requires to improve marionette protocol in order to >> intercept browsing events and hiding all ui elements with webextensions is >> no more possible. >> So my idea is clearly not a the good way. >> >> Any help would be appreciated. Even if I have not all the technical >> knowledge of firefox internal, I am ready to work on it but I need the good >> entry points to start. >> >> Thank you for reading me and for your answers, >> >> Regards, >> _______ >> dev-platform mailing list >> dev-platform@lists.mozilla.org >> https://lists.mozilla.org/listinfo/dev-platform >> > ___ > dev-platform mailing list > dev-platform@lists.mozilla.org > https://lists.mozilla.org/listinfo/dev-platform -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Only build the Gecko-dev html parser
On Thu, Jul 20, 2017 at 11:00 PM, wrote: > Hi everyone, as title, I want to use the C/C++ to research the Gecko-dev html > parser. > Is it possible to build the Gecko-dev html parser? > > https://github.com/mozilla/gecko-dev/tree/e9fa5c772abe4426c5e33ffe61c438f75f990aca/parser What kind of research? That directory as a whole can't be used outside the context of Gecko. However, the core parts of both the HTML parser and the XML parser are available separately. The core of the HTML parser is written in Java (there's a translator program that translates it into C++ for use in Gecko) and can be run (as Java) in isolation from Gecko: https://hg.mozilla.org/projects/htmlparser/ In theory, it would be possible to edit the translator to support a non-Gecko translation target alongside the Gecko-coupled target, but that hasn't happened because first I had other stuff to do and then https://github.com/google/gumbo-parser showed up and ended the demand for a standalone C++ translation. The core of the XML parser is expat, which, of course, is available as an independent piece of software. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: More Rust code
lining across object files produced by different-language compilers. Naïvely, one would think that it should be possible to do that with clang producing "object files" holding LLVM IR and rustc producing "object files" holding LLVM IR and the "link" step involving mashing those together, running LLVM optimizations again and then producing machine code from a massive collection of mashed-together LLVM IR. In London, so over a year ago, I asked people who (unlike me) actually understand the issues involved how far off this kind of cross-language inlining would be, and I was told that it was very far off. Most obviously, it would require us to compile using clang instead of MSVC on Windows. Now that it's been over a year and two significant things have happened, 1) we actually have (traditionally-linked for the FFI boundary) Rust code in Firefox and 2) clang is ready enough on Windows that Chrome has switched to it on Windows, I guess it's worthwhile to ask again: If we were compiling C++ using clang on all platforms, how far off would such cross-language inlining be? If we could have the cross-language inlining benefit from compiling C++ using clang on all platforms, how far off would we be from being able to switch to clang on all platforms? - - But to go back to Rust and DOM objects: Even in the context of DOM objects, there are two very different scenarios of relevance: 1) Rust code participating in DOM mutations 2) Rust code reading from the DOM when the DOM is guaranteed not to change. Scenario #2 applies to Stylo, but Stylo isn't the only case where it could be useful to have Rust code reading from the DOM when the DOM is guaranteed not to change. I've been talking about wishing to rewrite our DOM serializers (likely excluding the one we use for innerHTML when the document is in the HTML mode) in Rust. I have been assuming that such work could reuse the code that Stylo has for viewing the DOM from Rust in a read-only stop-the-world fashion. I haven't actually examined how reusable that Stylo code is for non-Stylo purposes. Is it usable for non-Stylo purposes? - - And on the topic of memory management: DOM nodes themselves obviously have to be able to deal with multiple references to them, but otherwise we have a lot of useless use of refcounting attributable to the 1998/1999 mindset of making everything an nsIFoo. In cases where mozilla::UniquePtr would suffice and nsCOMPtr isn't truly needed considering the practical ownership pattern, making the Rust objects holdable by mozilla::UniquePtr is rather easy: mozilla::Decoder and mozilla::Encoder are real-world examples. 
The main thing is implementing operator delete for the C++ stand-in class that has no fields, no virtual methods, an empty destructor and deleted constructors and operator=: https://searchfox.org/mozilla-central/source/intl/Encoding.h#903 For the rest of the boilerplate, see: https://searchfox.org/mozilla-central/source/intl/Encoding.h#1069 https://searchfox.org/mozilla-central/source/intl/Encoding.h#661 https://searchfox.org/mozilla-central/source/third_party/rust/encoding_c/src/lib.rs#467 https://searchfox.org/mozilla-central/source/third_party/rust/encoding_c/include/encoding_rs.h#350 https://searchfox.org/mozilla-central/source/third_party/rust/encoding_c/src/lib.rs#677 This, of course, involves more boilerplate than scenarios that stay completely within C++ or that stay completely within Rust, but in the case of encoding_rs, the work needed to create the boilerplate was trivial compared to the overall effort of implementing the bulk of the library itself. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
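For anyone wanting to apply the same pattern to another Rust type, here is a minimal sketch of the C++ side (the type Widget and the FFI function widget_free are invented for illustration; they are not real Gecko or encoding_rs names, and the corresponding Rust side that boxes/drops the object is omitted):

    class Widget;

    extern "C" {
    // Implemented in Rust; takes ownership of the pointer and drops the
    // underlying Rust object.
    void widget_free(Widget* aWidget);
    }

    class Widget final {
     public:
      Widget() = delete;
      Widget(const Widget&) = delete;
      Widget& operator=(const Widget&) = delete;
      ~Widget() {}  // empty; the real teardown happens on the Rust side

      // Forward deletion to Rust so that mozilla::UniquePtr<Widget> with the
      // default deleter just works.
      void operator delete(void* aPtr) {
        widget_free(reinterpret_cast<Widget*>(aPtr));
      }

      // No fields and no virtual methods: instances only ever live behind
      // pointers handed out by Rust.
    };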
Re: More Rust code
I guess I buried my questions in too long a post, so extracting them: On Mon, Jul 31, 2017 at 1:02 PM, Henri Sivonen wrote: > Naïvely, one would think that it should be possible to do that with > clang producing "object files" holding LLVM IR and rustc producing > "object files" holding LLVM IR and the "link" step involving mashing > those together, running LLVM optimizations again and then producing > machine code from a massive collection of mashed-together LLVM IR. ... > If we were compiling C++ using clang on all platforms, how far off > would such cross-language inlining be? > > If we could have the cross-language inlining benefit from compiling > C++ using clang on all platforms, how far off would we be from being > able to switch to clang on all platforms? ... > 2) Rust code reading from the DOM when the DOM is guaranteed not to change. ... > I've been talking about wishing to rewrite our DOM serializers (likely > excluding the one we use for innerHTML when the document is in the HTML > mode) in Rust. I have been assuming that such work could reuse the > code that Stylo has for viewing the DOM from Rust in a read-only > stop-the-world fashion. > > I haven't actually examined how reusable that Stylo code is for > non-Stylo purposes. Is it usable for non-Stylo purposes? -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: More Rust code
On Mon, Jul 31, 2017 at 4:53 PM, smaug wrote: >> And on the topic of memory management: >> >> DOM nodes themselves obviously have to be able to deal with multiple >> references to them, but otherwise we have a lot of useless use of >> refcounting attributable to the 1998/1999 mindset of making everything >> an nsIFoo. In cases where mozilla::UniquePtr would suffice and >> nsCOMPtr isn't truly needed considering the practical ownership >> pattern, making the Rust objects holdable by mozilla::UniquePtr is >> rather easy: mozilla::Decoder and mozilla::Encoder are real-world >> examples. > > Reference counting is needed always if both JS and C++ can have a pointer to > the object. Being able to have JS point to a lot of things is part of the original "everything is an nsIFoo" XPCOM mindset. For example, the types that mozilla::Decoder and mozilla::Encoder replaced were not scriptable but had XPCOM reference counting. They were always used in a manner where mozilla::UniquePtr would have sufficed in C++ and didn't work from JS anyway. > And UniquePtr makes it harder to ensure some object stays alive during a > method call, say, if a member variable is > UniquePtr. With refcounting it is always easy to have local strong > reference. When UniquePtr points to a Rust object, there's the mitigation that introducing a call from Rust to C++ involves more intentionality than adding another C++ to C++ call, so it's easier to notice when a call to C++ that might release the C++ object that holds the UniquePtr is introduced. For example, mozilla::Decoder and mozilla::Encoder never call to C++ code, so it's easy to reason that such leaf code can't accidentally delete what `self` points to. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Intent to remove:
On Thu, May 5, 2016 at 10:55 PM, Ehsan Akhgari wrote: > At the very least we need to make sure that a surprisingly large > number of sessions don't run into isindex, right? Out of 36.15 million release-channel Firefox 54 sessions examined, there were 8 (just 8, no multiplier) with at least one isindex form submission. The removal corresponding to this intent is in 56. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: 64-bit Firefox progress report: 2017-07-18
On Thu, Jul 20, 2017 at 10:42 AM, Chris Peterson wrote: > Users with only 2 GB and 5 minute browser sessions would probably have a > faster user experience with 32-bit Firefox than with 64-bit, but how do we > weigh that experience versus the security benefits of ASLR? Not giving users a security mechanism due to a non-obvious reason feels bad. Furthermore, considering that Microsoft documents 2 GB as a "requirement" for 64-bit Windows, is it really worthwhile for us to treat three Windows pointer size combinations (32-bit on 32-bit, 64-bit on 64-bit and 32-bit on 64-bit) as fully supported when one of the combinations is in contradiction with the OS vendor's stated requirements? Do we have any metrics on whether 32-bit on 64-bit exhibits bugs that 32-bit on 32-bit and 64-bit on 64-bit don't? That is, what kind of bug burden are we keeping by catering to users who've installed 64-bit Windows with less than 2 GB of RAM in contradiction with what Microsoft states as a requirement? -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: More Rust code
On Tue, Aug 8, 2017 at 1:12 AM, Mike Hommey wrote: > Here's a bunch of data why "let's switch compilers" is not necessarily > easy (I happen to have gathered that recently): Thank you. > Newer versions of clang-cl might generate faster code, but they crash > during the build: https://bugs.llvm.org/show_bug.cgi?id=33997 I'm guessing using a very new clang was what allowed Chrome to switch from MSVC to clang? (Chrome accepted a binary size increase on 32-bit Windows as a result of switching to clang.) -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Changing .idl files
On Tue, Aug 8, 2017 at 6:17 AM, Kris Maglione wrote: > On nightlies and > in unbranded builds, it will still be possible to enable them by flipping a > pref, but they will be completely unsupported. > > Yes, that means that some users and developers will continue to use them, > and continue to get upset when they break Why is it worthwhile to keep a configuration that will continue to make people upset? Does there exist a known set of legacy extensions that 1) do something that WebExtensions can't yet do and that 2) are limited to using interfaces that we don't want to remove or refactor ASAP? -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Finalizer for WebIDL object backed by JS impl of an XPCOM interface
What's the correct way to take an action right before a JS-implemented XPCOM object that acts as the implementation for a WebIDL interface gets garbage collected? -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Finalizer for WebIDL object backed by JS impl of an XPCOM interface
On Tue, Aug 8, 2017 at 1:26 PM, Henri Sivonen wrote: > What's the correct way to take an action right before a JS-implemented > XPCOM object that acts as the implementation for a WebIDL interface > gets garbage collected? Taking action soon after GC would work for me as well. I'm thinking of introducing a C++-implemented XPCOM object that the JS-implemented XPCOM object can hold a reference to and that has a C++ destructor that does what I want. But is there a simpler way? -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Finalizer for WebIDL object backed by JS impl of an XPCOM interface
On Wed, Aug 9, 2017 at 9:39 PM, Boris Zbarsky wrote: > On 8/9/17 1:55 PM, Henri Sivonen wrote: >> >> I'm thinking of introducing a C++-implemented XPCOM object that the >> JS-implemented XPCOM object can hold a reference to and that has a C++ >> destructor that does what I want. > > > Does that mean your action doesn't depend on which exact JS object got > collected, or that you can encode that information in some way without > referencing the JS object? The action is decrementing a counter on the inner window, so it's sufficient if the C++ destructor knows if it needs to decrement the counter and knows which window. The concrete situation is described in https://bugzilla.mozilla.org/show_bug.cgi?id=1378123#c9 -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
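A rough sketch of the helper contemplated above (the class name is invented, the actual counter manipulation on the inner window is only hinted at in comments, and whether the window reference should be strong or weak is glossed over here):

    #include "nsCOMPtr.h"
    #include "nsISupports.h"
    #include "nsISupportsImpl.h"
    #include "nsPIDOMWindow.h"

    class WindowCounterHolder final : public nsISupports {
     public:
      NS_DECL_ISUPPORTS

      explicit WindowCounterHolder(nsPIDOMWindowInner* aWindow)
          : mWindow(aWindow) {
        // Increment the counter on the inner window here.
      }

     private:
      ~WindowCounterHolder() {
        // Runs once the JS-implemented object holding the last reference has
        // been garbage/cycle collected: decrement the counter on the inner
        // window here.
      }

      nsCOMPtr<nsPIDOMWindowInner> mWindow;
    };

    NS_IMPL_ISUPPORTS0(WindowCounterHolder)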
Re: Linking with lld instead of ld/gold
On Mon, Aug 14, 2017 at 12:08 AM, Sylvestre Ledru wrote: > Thanks to bug https://bugzilla.mozilla.org/show_bug.cgi?id=1336978, it > is now possible to link with LLD (the linker from the LLVM toolchain) > on Linux instead of bfd or gold. Great news. Thank you! Does this enable lld to ingest object files that contain LLVM bitcode instead of target machine code and to perform cross-compilation-unit optimization? How far are we from cross-compilation-unit optimization when some compilation units come from clang and some from rustc? -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Coding style: Argument alignment
Regardless of the outcome of this particular style issue, where are we in terms of clang-formatting all the non-third-party C++ in the tree? I've had a couple of cases of late where the initializers in a pre-existing constructor didn't follow our style, so when I changed the list a tiny bit, the post-clang-format patch showed the whole list as changed (due to any change to the list triggering reformatting the whole thing to our style). I think it would be better for productivity not to have to explain artifacts of clang-format during review, and at this point the way to avoid it would be to make sure the base revision is already clang-formatted. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Coding style: Argument alignment
On Wed, Aug 30, 2017 at 10:21 AM, Sylvestre Ledru wrote: > > On 30/08/2017 at 08:53, Henri Sivonen wrote: > > Regardless of the outcome of this particular style issue, where are we > in terms of clang-formatting all the non-third-party C++ in the tree? > > We have been working on that but we delayed it to avoid doing it during the > 57 work. > > We will share more news about that soon. Cool. Thanks. > I've had a couple of cases of late where the initializers in a > pre-existing constructor didn't follow our style, so when I changed > the list a tiny bit, the post-clang-format patch showed the whole list > as changed (due to any change to the list triggering reformatting the > whole thing to our style). I think it would be better for productivity > not to have to explain artifacts of clang-format during review, and at > this point the way to avoid it would be to make sure the base revision > is already clang-formatted. > > Could you report a bug? We wrote a few patches upstream to improve > the support of our coding style. It's not a bug: The constructors were not formatted according to our style to begin with, so an edit triggered clang-format formatting them to our style. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform