On Sun, 2020-04-05 at 21:33 +0200, Roland Clobus wrote:
> Thank you for your efforts to make the live-manual better.
> I'm adding the mailing list to the list of recipients, because the work is done by the live team (of which I'm not a member, just a person with interest in the project) and the team can decide on the future direction of the project.
Yes, I'm aware. I was just curious about the nature of the update to live-manual you were tackling. I was not ready to initiate a proper discussion/proposal of format changes; that would have happened openly later, if I was still interested by then in actually carrying it out. For now I merely wanted to understand whether this was something you were already doing, and if not, to get some idea of when your work might be available to build such a change upon.

> > I believe you were the person who spoke of being in the middle of a big update to live-manual?
>
> Well big... I am working (with the little time that I have) on updating the live-manual, primarily such that all references to Alioth get removed. While doing so, I got to know the live building process and updated parts of the manual that got out-of-date over time.

Yes, I understood that your work was essentially focussed on bringing it up to date. My impression was that this was more than just a few minor tweaks, especially after previously noticing a question being asked about live-wrapper, which the manual does not currently cover at all, and since if it were a very minor change you'd probably have submitted it by now. (Though of course the response indicated that live-wrapper is effectively dead, so it no longer needs covering anyway...)

Alioth references: aside from translation files, which I've ignored, all I can find when searching for "alioth" is three old links. One of them I missed in my previous "live-systems" update, and I've just submitted an MR for that; the other two I'd previously spotted but had not known what to do with, and they only require minor corrections. So I don't understand what it is you're actually doing with respect to "removing alioth references" if it's not just that... (not that I need to know) (and not that I've really looked at the manual in quite some time).
Perhaps you could publish a preview (WIP) copy of your changes so far, even if it's currently an incomplete mess? (I looked at your salsa repo on Saturday but saw no sign of such work, so I suppose you must have it locally only.)

btw, I've noticed that the manual is currently missing any discussion of injecting environment variables via the user config. If fixing that fits within your scope of improvements and you want to tackle it, it would be good to have that covered. I can provide some details if needed.

> > Does your work involve at all:
> > b. any change to the markup language and respective "build" tool used for "building" the documentation (generation of HTML pages and PDFs).
>
> No, the markup language SiSU is not a common markup language, but for me it suits its purpose.

ok.

> > I ask because I made a relatively small change the other day to the coding style section (submitted an MR which is pending review);
>
> I assume you are referring to
> https://salsa.debian.org/live-team/live-manual/-/merge_requests/22

yes.

> > The build tool SiSU was my biggest gripe. I was not impressed with the hundreds of megabytes of dozens of packages required for sisu-complete installation, that it seemed to be generating a postgresql database on installation, that it pulls in ruby and such, and the manual having to give advice on speeding up a supposedly very slow build process (I followed the tips to limit the scope of building, which was reasonably quick, I did not explore how slow the worst case supposedly is).
>
> As far as I can see, SiSU was a nice tool at the time the live-manual was started. It apparently didn't catch on, as nowadays there are hardly any reverse-(build)dependencies on SiSU.

Yes, I believe that live-manual is the only package build depending on sisu-complete, and unless I've made a mistake in how I've searched the archive, it is the only package in the entire archive actually using SiSU.
A good few dozen, on the other hand, depend upon markdown or pandoc. Hell, I just looked at the SiSU website and found all of their source code links to be dead (everything at http://git.sisudoc.org: http://git.sisudoc.org/software/sisu; http://git.sisudoc.org/git/code/sisu.git; http://git.sisudoc.org/gitweb/?p=code/sisu.git;a=summary), which is not at all a good sign, and not a state expected of a healthy project.

> However, SiSU allows the author to focus on the content without being bothered by layout decisions and markup.

I don't exactly agree, and I do not think that the separation is a significant concern here.

From what little of SiSU I know (primarily from noting the difference in ssi vs. ssm files), yes, SiSU separates content from layout, but it certainly does not stop authors being bothered by markup. Surely you did not mean to suggest that? The SSI content files are riddled with SiSU markup artefacts, as is to be expected in anything that is not just plaintext. You cannot escape from having to deal with markup of one form or another unless writing only plaintext.

Furthermore, the uncommon nature of SiSU markup is itself a hindrance to authoring changes. I only just about got by without having to find SiSU documentation, by copying what I was seeing elsewhere in the files. Markdown/commonmark and HTML, on the other hand, are widely understood.

The layout components of live-manual are _very_ minimal. I would not expect anyone to reasonably consider them to be getting in the way of authoring changes if we were to move to one of the two proposed alternate formats.

If we consider markdown as an alternate to SiSU, then essentially all of the ssi files will translate to equivalent markdown/commonmark markup. They could perhaps work standalone without any template, with the all-in-one version being created by a simple bit of script that mostly just appends each page into a single file.
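To give a rough idea of how small that "simple bit of script" could be, here is a minimal sketch in Python. It assumes the pages already exist as separate markdown files and that the page order is supplied by the caller; the function names are made up for illustration, and nothing like this exists in live-manual today.

```python
from pathlib import Path


def join_pages(pages):
    """Join page texts into one document, separated by a blank line."""
    # Strip trailing whitespace so that each page contributes exactly one
    # blank line of separation, however its source file happened to end.
    return "\n\n".join(page.rstrip() for page in pages) + "\n"


def build_all_in_one(page_paths, output_path):
    """Read each page in the given order and write the combined file."""
    # page_paths and output_path are hypothetical; the real manual would
    # supply its own ordered page list.
    texts = [Path(p).read_text(encoding="utf-8") for p in page_paths]
    Path(output_path).write_text(join_pages(texts), encoding="utf-8")
```

It might then be called as `build_all_in_one(["about_manual.md", "user_basics.md"], "live-manual.md")`, where those file names are only placeholders for whatever the converted pages end up being called.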
For HTML as an alternative, you'd have a relatively small block of HTML before and after the chunk that is the content. The HTML within the content would largely just be equivalent markup, with the major difference just being <p></p> tags surrounding paragraphs, and similar for headings.

Or, if you really want to separate out the starting and ending HTML from the chunk that is the content, you could have that in a "template" HTML file with a placeholder for the content, and then have the build script inject the file content of pages using the `sed` tool or similar. You thus split out this small amount of non-content HTML, at the expense of having to "build" pages to "test"/review changes in their non-source form. Leaving it in each file, though, I really don't consider to be worth calling a bother to authors. If you ever want to modify this starting/ending portion of HTML, having a template has benefits over having to change multiple files across multiple translations, but how often are such changes ever going to be made that this becomes a concern? Is it really enough of a concern to make templating worthwhile vs. the trade-off of needing to "build" pages?

Generating the all-in-one copy of the documentation can, in theory, quite trivially be achieved from both markdown and HTML forms; you're essentially just copying all of the individual pages into one file. If using HTML and no template, then it's still a trivial task to extract the content portion to do this, so that's no bother.

Another factor is that my editor of choice - gedit - has syntax highlighting for many formats, but this does not include SiSU.

> For the translators, the translation framework is also present, using the well-known po files.

I would argue that for documentation files like the manual, having a translation framework is over the top and inappropriate.
The po translation system is well suited to code files because it separates the text strings from the mass of code they are sparsely located within, allowing the translator to focus on the data they need: the strings. It also makes sense considering how the translations are used. They do not go into translation-specific binaries; they just translate the string data, which a single common compiled binary dynamically loads for the right language at runtime (except when just using the embedded English, of course, with the gettext system).

Compare this to the manual, where a different copy is made for each language, and the content of relevance to translation forms almost the entirety of the source for generating "built" artifacts. That brings into question the model of trying to split the content out in the same way, when it might be much simpler to generate the outputs directly from the files containing translated content, permitted to contain only a tiny portion of layout if necessary.

Of course PO is not perfect. You may be interested in Project Fluent, if you've not come across it before: https://projectfluent.org/

With regard to the manual, breaking up the content of the pages into individual headings, bullet points, paragraphs, etc. in POT/PO files really does not add much over just having a copy of the original files, very little of which is not pure content.

Compare about_manual.ssi and about_manual.ssi.pot. There is no benefit to PO here in terms of focussing on the job at hand; all of the small bits of markup within sentences/paragraphs are carried across, and even things like "code{" get grabbed as possibly translatable strings.

Compare user_basics.ssi and user_basics.ssi.pot.
The first thing in there is "code{". Since the PO file tries to avoid duplicate copies of a string, it appears nowhere else in the file, which means that you cannot just go through the PO file, translating, and understand context as well as you would with the original. PO can thus potentially even be a hindrance to translation of these files.

For instance, take this string from the POT: " # apt-get install xorriso\n". It is not necessarily immediately obvious to a translator that this is a shell command, and that the word "install", if not all of the words it contains, should not be translated. The surrounding code block markers having been removed potentially damages understanding of context.

Of course I'm forgetting thus far that po files have each original string alongside each translation string as an "ID", which may be very helpful for comparison. An alternative that might work perfectly well for these documentation files would be for translators to view the original and the translation side by side, or in a split two-file view in an editor.

Of course perhaps po4a actually works with HTML and markdown files; I'm not sure. I just did a brief google search and did not get a clear answer. If it does, then the question of moving away from po4a is rather moot, is it not, provided it is considered worth keeping for the translation workflow over the side-by-side / split-dual-view comparison just described. I repeat, though, that these files are almost entirely pure content, and po in some cases damages the view of that content and thus the understanding of important context.

The other issue of course is translators identifying portions of text that have changed, where updating the translation is necessary. If keeping po then this aspect is irrelevant.
If moving away from it, then translators would either rely upon side-by-side / split-dual-view comparison, going through line by line; review the commits to the repo and update following the changes made in them; or get a diff of the English version from when the translation was last updated up to the present, and work from that.

> > Just at the minute I'm taking a brief pause in my other work to consider the possibility of whether there is a better, more modern, lightweight, etc tool that could replace use of SiSU. Something markdown/commonmark based perhaps. Or maybe if we just used plain HTML directly, with a conversion tool to produce PDFs (if we want to be able to make them).
> >
> > I don't think we need care about the database backed "search" mechanism SiSU provides.
>
> Back when Alioth was still running, the database backend functioned as a kind of search engine.

I'm not sure that I follow. I presumed that the SiSU search component was a feature for generating a search facility for searching within the content of a SiSU formatted set of documents. So the database generates and stores a pre-compiled set of data with which to respond to searches. I am familiar with a similar thing being a part of documentation generated for Rust projects via `cargo doc`, only I believe it just compiles information into javascript, rather than using a DBMS.

I do not consider such a search facility to be of value to the live-build manual. It does not consist of many pages, and you can easily (a) use the in-page search facility of your web browser, in either multi-page or single-page view; (b) use the search facility in your PDF viewer if using the PDF; or (c) use a search engine to find the right page in multi-page view.

> > And I expect we don't need the same translation stuff we use for code; translations can just be copies of the english files, translated...
>
> I disagree here.
> Having a standard translation framework in place is important to me. For me, translated files should take over the structure and as much of the markup from the original file as possible, and preferably only replace the English content.

Again, as above, _very_ little of what is in the files, now or under the alternate formats, is anything other than content or markup that already naturally finds its way into the translation files. I do not expect any notable difference under markdown/HTML if continuing to use po.

> > Anyway, I just wanted to know whether you were already working on anything in this area or just rewriting the text. Also I was wondering how near completion you might be, as I'd not want to start any effort (if I was to do it myself) to convert to a different markup and such until your substantial rewrite was available to do it on, otherwise it would just create a lot of extra work for one of us to rework things on top of the other's changes obviously.
>
> I would rather ask whether any of the team members would be willing to do a review if all files will be touched.

As in thoroughly compare old and new side by side, word for word?

Personally I would not be overly concerned about review, provided that a discussion and agreement takes place beforehand on the direction of the work regarding selection of format, and use of templating and translation. Considering all the benefits, not least moving away from relying upon an essentially obsolete build tool, I think there's a good chance they'd agree.

In terms of what the reviewer might do in their review of such a change, I would not necessarily expect them to bother to look all that closely at it, at least if it were me submitting it rather than someone new here. I have to some degree built up a working relationship with Raphaël and Luca.
I do not know whether that relationship will ever extend to team membership, but I've been working with them over the past few weeks getting a substantial amount of work merged into live-build. Furthermore, (1) we're just talking about chunks of documentation here, not code where bugs, including security vulnerabilities, can easily be introduced by the smallest flaw (unintentional or deliberate); and (2) it's not as if they'd think it likely that I, having established a good working relationship with them, would suddenly jeopardise that by playing some prank and submitting a vandalised work, or that I'd screw things up royally in performing such a format conversion, such as by losing a paragraph.

I would thus not expect them to spend time doing a word-by-word comparison. I'd expect little more than a cursory look at how I'd implemented the new format, and a quick check that the build/translation functionality works as before, trusting me that the content is unchanged.

Of course I do appreciate that they already have their work cut out reviewing all of my live-build contribution work as it is. But then this would not necessarily need to get reviewed and merged immediately; it could sit in salsa pending merge for a little while until their time frees up enough (or they give me access and freedom to do it myself, assuming they agree with the nature of the change in principle).

> My proposal:
> * First gather consensus from the team whether a change is needed
> * When so, decide on a solution that matches the requirements

Of course. As I said above, I was just seeking out some info on the nature of your work at this stage, prior to making any possible enquiry/exploration of format change at a later date.
> Let me try to summarise the current state:
>
> As-is state:
> * The documentation is written in SiSU
> * The output is available in PDF (A4 and letter, each both in portrait and landscape), HTML, epub, odf and plain text

`markdown` can convert from markdown to HTML. I'm not sure if it can do more. `pandoc` can convert from markdown to HTML, PDF (using LaTeX), ODT, docx, RTF, plaintext, EPUB, and others.

I don't have any experience of using these; they're just a couple of tools I came across doing a brief bit of research yesterday. So I do not know about portrait vs. landscape and A4 vs. letter, though I don't see why we'd care about anything other than portrait A4...

> * Translations in all document formats are generated for 9 languages (ca, de, es, fr, it, ja, pl, pt_BR, ro)

po4a may well work just fine for markdown/HTML, in which case there is no difference; it perhaps just requires a bit of effort to migrate the existing translations. (If we care to; perhaps they're so out of date by now that the existing copies should just be ditched?)

> * Dead links can be found using linkchecker on the html output

We're getting HTML output in any case. Whether we go with markdown or actual HTML, we will always want HTML output for the web hosted copy.

> * sisu-complete pulls in many packages
> * The latest build from the git branch master is available on https://live-team.pages.debian.net/live-manual/
> * live-manual is packaged in Debian as live-manual, live-manual-epub, live-manual-html, live-manual-odf and live-manual-pdf

Interesting, I actually had not noticed the existence of all of those packages. I'm not quite sure whether I'd agree that this is the ideal means of distributing the manual. I mean, in terms of the PDF form, live-manual-pdf dumps 40 files into /usr/share/doc/live-manual/pdf/, with different orientations and different languages, and compressed into archives. Is that really a suitable/desirable/useful way to distribute the manual?
> Positive notes:
> * Sisu is well-supported with syntax highlighting

As mentioned above, no highlighting in my editor - gedit. Being little used is not going to encourage package maintenance.

> * Proof-reading the English text is done by 'make PROOF=1', which takes only about 8 seconds on my computer.
>
> Negative notes:
> * Sisu has a few reverse build dependencies
> * sisu-complete brings a large list of dependencies, but can be replaced by a smaller list
>
> Personally, I see no immediate need to have this large transition/rewrite right now.
> A task within the live team, that I see will be more pressing, will be the generation of the standard live images (about which I shortly wrote on 2020-03-21T17:27). Current Debian Stable images are built with live-wrapper, which uses vmdebootstrap under the hood. vmdebootstrap depends on Python 2, and will not be present in the next version of Debian. I am not aware of something that provides a 1-to-1 replacement that will work on Debian Testing (and therefore the next release of Debian).

If you mean bullseye, I expect that it won't be released until some time next year, so I don't see a pressing need to focus on a move away from live-wrapper there.

Also, the live images were previously produced with live-build, until live-wrapper came into existence a few years back and they moved to that, as far as I understand it. With live-wrapper and vmdebootstrap being abandoned, I would expect that it should not be a big deal for them to just move back to live-build...

Furthermore, live-manual is currently live-build focussed only, and although live-wrapper and live-build both have the maintainer listed as the Debian live team, effectively they have different groups of people working on them.
Since the original author of live-build left, live-build has been in low maintenance mode, with Raphaël Hertzog holding ownership and sharing upload responsibility with Luca Boccassi, as I understand it. It is these two who have been reviewing and merging my contributions these past few weeks. I personally have effectively been working full time just now on live-build, improving many deficiencies in the codebase and fixing various bugs, though I am not on the team myself.

With live-wrapper, if you check out debian/control and review the commit history, it is Iain Learmonth and Steve McIntyre who are the main two who have been developing it, and the two of them along with Jonathan Carter who hold upload responsibility.

So although all of them are grouped under the banner of the Debian live team, and are members with owner or maintainer status in its salsa project, it's two different sets of people developing and maintaining the two different tools.

Furthermore, the original author of live-build was just the developer of that tool; I do not believe he was ever a member of the Debian team responsible for releasing official images. I've lost touch with who the team members are, if I ever knew. Perhaps those involved in live-wrapper belong to it? Perhaps not. I recall Steve being a notable member of the project from past contribution work I did for Debian, but I don't know if releasing images is one of his areas of responsibility.

So, live-manual is currently live-build oriented only. If a discussion of expanding it to cover live-wrapper ever took place, I'm not aware of it, and in any case it would be pointless now that live-wrapper is looking dead.
With live-manual being live-build only, and with the live-build, live-wrapper, and debian-image-team (whatever we should call them) being separate groups of people (or at least the live-build people being separate), I don't expect that the effort of which you speak regarding migrating away from live-wrapper for official images is of any concern to the workload of those working on live-build and its documentation in live-manual.