On Thursday, September 22, 2022 9:20:46 AM MST Agustin Martin wrote:
> First of all, I am curious about the reasons behind this new format,
> the problems it deals with and its advantages. I assume they are valid
> enough, but they imply yet another spellchecking engine/format. We
> currently have good old ispell, aspell and hunspell. vim has its own
> spellchecker engine using its own format, with dicts that can be
> created from old myspell2 dicts. We did not add vim format dicts (from
> aspell dicts sources) since there seems to be some work to make vim
> use hunspell directly. And now these bdict dicts.

The .bdic format is specified by the upstream Chromium project and is required
by anything based on Chromium's code, such as Qt WebEngine.  I do not know why
they went with a custom binary format, but I assume that if they went to so much
trouble to avoid the standard Hunspell format, something made it worthwhile,
such as a performance improvement.  Perhaps I am giving Google too much credit
for having logical reasons instead of making arbitrary decisions.

> From your info and proposed locations seems that these dicts are
> arch:all, ¿is that true?

I have not seen anything to indicate they are not arch:all, although it
probably depends on how the binary data is processed.  There is a possibility of
an endianness issue.
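
Whether the files can be arch:all comes down to whether the format fixes its
byte order or writes integers in the host CPU's native order.  A minimal Python
sketch of that distinction follows, using a hypothetical header (a made-up
4-byte magic plus one 32-bit integer); it is NOT Chromium's actual .bdic
layout, only an illustration of why an explicit byte order makes a binary file
portable across architectures.

```python
import struct
from typing import Tuple

# Hypothetical header for illustration only (not the real .bdic layout):
# a 4-byte magic followed by a 32-bit integer, explicitly little-endian.
HEADER = struct.Struct("<4sI")  # "<" = little-endian, standard sizes, no padding


def write_header(magic: bytes, value: int) -> bytes:
    """Serialize the header with a fixed, explicit byte order."""
    return HEADER.pack(magic, value)


def read_header(data: bytes) -> Tuple[bytes, int]:
    """Decode with the same explicit byte order, whatever the host CPU is."""
    magic, value = HEADER.unpack_from(data, 0)
    return magic, value


def read_value_native(data: bytes) -> int:
    """Decode the integer in the host's native order ("=" = native, standard size)."""
    (value,) = struct.unpack_from("=I", data, 4)
    return value


blob = write_header(b"TEST", 0x01020304)
print(blob.hex())               # 5445535404030201 on every architecture
print(read_header(blob))        # (b'TEST', 16909060) on every architecture
print(read_value_native(blob))  # 16909060 on little-endian, 67305985 on big-endian
```

If a format's reader behaves like read_header, one binary package serves all
architectures; if it behaves like read_value_native, the files would have to be
built per architecture (or per byte order).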

> Another question is what happens with affix files, which I see are
> used at build time, ¿are they used (from their path) at runtime or is
> all the info (dic+aff) bundled into the bdic file? If explicit affix
> files are still required at runtime, both bdic and aff files should
> probably be in the same dir. Otherwise I am more for a separate
> location. In this case, since bdic dicts seem to be more generic than
> just a qtwebengine issue and they are indeed created from hunspell
> files I would go for a rather generic name (may be something like
> /usr/share/hunspell-bdic or something without the hunspell name?)

The .bdic binary file contains all the information from the .dic and .aff
files, so neither is needed by Qt WebEngine at runtime.  As such, I think a
dedicated directory for the .bdic files is best.

My personal motivation for getting these dictionaries into Debian is that I am
the developer of Privacy Browser, a web browser based on Qt WebEngine.  The PC
version is currently in a pre-alpha state.

https://www.stoutner.com/privacy-browser-pc/[1]

When adding spell-checking functionality, I realized that these dictionaries
were not already packaged.  The little bit of poking around that I did showed
that Arch Linux packages them, but I do not know whether other distributions do.

https://archlinux.org/todo/packaging-qtwebengine-dictionaries/[2]

There are a number of existing web browsers in Debian based on Qt WebEngine
that could take advantage of these .bdic dictionaries.  A non-exhaustive list
includes Konqueror, Falkon, qutebrowser, and Angelfish.  If it turns out to be
feasible for Chromium to also use a system-wide .bdic location, then any
Chromium fork would benefit as well.

Once Privacy Browser reaches an alpha release, I intend to maintain a Debian
package for it.  I have the option of bundling the .bdic files directly into the
program's own data folders, but that seems suboptimal, because anything else on
the system that wanted to use them would need its own copy.  When the binary
dictionaries are installed in the correct system-wide folder, any Qt WebEngine
program can use them with a single line of code that specifies which dictionary
to use (only one can be active at a time).  Of course, the program would
probably also need a GUI where the user can select which dictionary should be
active, and that involves more than a single line of code.
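
As a rough illustration of the selection side, here is a minimal Python sketch
of the logic such a GUI might sit on: find the dictionary directory, list the
installed .bdic files, and pick the one to activate.  The directory
/usr/share/hunspell-bdic is only the hypothetical location floated earlier in
this thread, and the function names are made up for illustration.  The sketch
assumes each file is named after its language (for example en-US.bdic) and
checks QTWEBENGINE_DICTIONARIES_PATH first, an environment variable Qt
WebEngine documents for overriding its dictionary location; it does not
reproduce Qt's full lookup order.  In a real Qt program the chosen name would
then be passed to QWebEngineProfile::setSpellCheckLanguages().

```python
import os
import tempfile
from pathlib import Path
from typing import List, Optional

# Hypothetical system-wide directory; no location has been agreed on yet.
SYSTEM_BDIC_DIR = Path("/usr/share/hunspell-bdic")


def dictionary_dir(system_dir: Path = SYSTEM_BDIC_DIR) -> Path:
    """Use QTWEBENGINE_DICTIONARIES_PATH if it is set, else the system directory."""
    override = os.environ.get("QTWEBENGINE_DICTIONARIES_PATH")
    return Path(override) if override else system_dir


def available_languages(directory: Path) -> List[str]:
    """List dictionary names, assuming files are named <language>.bdic."""
    if not directory.is_dir():
        return []
    return sorted(p.stem for p in directory.glob("*.bdic") if p.is_file())


def choose_language(directory: Path, wanted: str) -> Optional[str]:
    """Return the single dictionary to activate, or None if it is not installed."""
    return wanted if wanted in available_languages(directory) else None


if __name__ == "__main__":
    # Demo against a throwaway directory standing in for the system folder.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp)
        for name in ("en-US", "de-DE"):
            (demo / f"{name}.bdic").write_bytes(b"")  # empty placeholder files
        print(available_languages(demo))             # ['de-DE', 'en-US']
        print(choose_language(demo, "en-US"))        # en-US
        print(choose_language(demo, "fr-FR"))        # None
```

Keeping the directory lookup separate from the listing means a packaged
system-wide folder and a per-user override can coexist without the GUI code
caring which one is in use.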

-- 
Soren Stoutner
so...@stoutner.com

--------
[1] https://www.stoutner.com/privacy-browser-pc/
[2] https://archlinux.org/todo/packaging-qtwebengine-dictionaries/
