Re: Linux HTML mail agent with RTL and LTR paragraph explicit support
On Mon, Jun 25, 2012 at 8:06 AM, Shachar Shemesh wrote: > I disagree completely. The embedding control characters are designed for, > well, embedding. Correct. As plain text has no concept of a paragraph, using \n, \n\n, \r\n, \r\n\r\n, or any other convention for a paragraph is arbitrary. So if any arbitrary part of the text is to be RTL (no matter if the user calls it a paragraph or not) then it is to be marked as an embedded RTL section. > What the standard[1] suggests, but does not require, is the > use of the first strong directional character in the paragraph. The reasons > this does not work for email are: > > It is not required by the standard. It is suggested as a way to determine > paragraph directionality, but this suggestion is incomplete. For example, > the standard says nothing about what to do with a paragraph with no strong > directional character at all. > This suggestion is non-normative. The standard explicitly states that a > "higher level protocol" can be used to determine this property. > HTML has chosen the "higher level property" as the BiDi directionality path. > Unless certain discussions currently in effect become standard, HTML will > not guess the directionality of a paragraph ever, no matter how much you > want it to. There are some discussions about adding a "direction: auto" > property to CSS. > The only standard way to provide paragraph directionality in email is by > sending it as HTML > > A few takeaways. There is no standard I'm aware of that states you SHOULDN'T > use the first character in a paragraph to determine paragraph direction in > plain-text emails. I think that is a perfectly reasonable approach. However, > most of the world uses various MS based email readers. Those don't do it, > and they do not violate any standard by not doing it. As a result, if you > want your email to be legible by any recipient, HTML mail is the way to go > if you are writing in Hebrew. 
Complaining to your recipient (or sender) that > they are not doing it properly is both impolite and, which I feel many > people here will see as worse, technically incorrect. > > > I know many people on this list don't like this standard, but this extra > email did nothing to change it (not that I, personally, think that changing > it is the right thing to do). > I agree with you completely in regards to interoperating with defacto-standard software. >> Are you referring to me, in regard to the discussion that we had in >> which I think that the LTR- and RTL-Embedding characters should be >> available in the Hebrew keyboard layout? > > No. I am referring to all those who complain so violently when HTML mail is > sent to the list. > I see. I'm glad that I know how to configure my email client properly not to notice it, and that I have the disk space to spare for some markup. I wonder how loud those folks would scream if they noticed that Hebrew UTF-8 characters are _two_ bytes long! >> That doesn't mean that I >> dislike the idea of using HTML. Actually, I don't like HTML mail but >> not for that reason, rather a personal preference with no root in >> ideology nor technical reason. > > Okay, so maybe I was referring to you after all :-) > No, I'm not a complainer. I don't like _sending_ HTML mail, but I'll happily receive the mail in any format that standard email clients support (NOT Word!). -- Dotan Cohen http://gibberish.co.il http://what-is-what.com ___ Linux-il mailing list Linux-il@cs.huji.ac.il http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
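The remark above about Hebrew characters taking two bytes in UTF-8 is easy to check. A minimal sketch in Python follows; the helper name utf8_lengths is made up for this illustration and is not from the thread.

```python
# Report how many bytes each character of a string occupies in UTF-8.
# utf8_lengths is a hypothetical helper name used only for this illustration.
def utf8_lengths(text):
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


if __name__ == "__main__":
    sample = "a\u05d0\u05e9"  # Latin 'a', Hebrew alef, Hebrew shin
    for ch, size in utf8_lengths(sample):
        print(f"U+{ord(ch):04X}: {size} byte(s)")
```

Code points in the Hebrew block (U+0590 to U+05FF) all fall in the two-byte range of UTF-8, while ASCII stays at one byte.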
Re: Linux HTML mail agent with RTL and LTR paragraph explicit support
On 06/25/2012 01:42 PM, Dotan Cohen wrote: > On Mon, Jun 25, 2012 at 8:06 AM, Shachar Shemesh wrote: >> I disagree completely. The embedding control characters are designed for, >> well, embedding. > Correct. Good. But > As plain text has no concept of a paragraph, Well, that really depends on what you mean by "plain text". RLE/PDF are defined by the UBA (Unicode BiDi Algorithm), and it, clearly, does have a concept of a paragraph. > using \n, \n\n, > \r\n, \r\n\r\n, or any other convention for a paragraph is arbitrary. Technically true, but both irrelevant and misleading. Misleading because the choice of \n or \r\n was arbitrary, but is now standard. Irrelevant because we are talking about the UBA, not "plain text" (whatever that means). > So if any arbitrary part of the text is to be RTL (no matter if the > user calls it a paragraph or not) then it is to be marked as an > embedded RTL section. This is incorrect. It does not matter much what the user calls a paragraph, but if the /text editor/ calls a certain run a paragraph, then that is the case. You make it sound as if, in the sequence "something more something \n even more ", the third part, saying " even more" will have an RTL level. That will simply not be the case with any UBA conforming text editor, as UBA specifically says that any embedding levels are reset when the paragraph is terminated. This is because the embedding controls are *embedded* in the paragraph. In other words, a paragraph is a paragraph, with BiDi direction, and embedding is embedding. The two are not the same. Shachar -- Shachar Shemesh Lingnu Open Source Consulting Ltd. http://www.lingnu.com ___ Linux-il mailing list Linux-il@cs.huji.ac.il http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
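Shachar's point that open embeddings do not survive the end of a paragraph can be sketched roughly as follows. This is not the UBA's level resolution, only a simplified depth counter written under the assumption that tracking "how many embeddings or overrides are open" is enough to show the reset; the name embedding_depths is hypothetical.

```python
import unicodedata

# Bidi classes of the explicit embedding and override initiators.
OPENERS = {"LRE", "RLE", "LRO", "RLO"}


def embedding_depths(text):
    """Return (char, depth) pairs, where depth counts open embeddings/overrides.

    Simplified sketch: real UBA levels also depend on the base direction and
    on odd/even level rules, which are deliberately ignored here.
    """
    depth = 0
    result = []
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in OPENERS:
            depth += 1
        elif bidi_class == "PDF":
            depth = max(0, depth - 1)
        elif bidi_class == "B":
            # Paragraph separator: every open embedding is terminated here.
            depth = 0
        result.append((ch, depth))
    return result


if __name__ == "__main__":
    # An RLE that is never closed with PDF, followed by a second paragraph.
    for ch, depth in embedding_depths("\u202bab\ncd"):
        print(f"U+{ord(ch):04X} depth={depth}")
```

In the sample, "ab" sits inside the unclosed RLE, while "cd" after the newline is back at depth zero, which is the behaviour described above.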
Re: Linux HTML mail agent with RTL and LTR paragraph explicit support
Shachar, before addressing the issue at hand, I would like to state
an observation. When I reply to your mail, all text is at the same
quote level. That is, there is a single > at the beginning of each
line, whether it is a line that you wrote or a line that I wrote.
Obviously, I am replying to the HTML portion of your multipart
message, not the plain text portion. My mailer (Gmail) does not know
that blockquote type="cite" means that the text is a quote. Why should
it? Is that a standard (I don't know, it might be)?

This is a good argument against HTML mail. You can tell me that my
mailer (Gmail) is broken. But remember that Gmail is now no less a
de facto standard mailer than Outlook once was, and that you advocate
compatibility with Outlook based on its de facto standard status.

I've manually fixed the nesting below:

On Mon, Jun 25, 2012 at 5:19 PM, Shachar Shemesh wrote:
> On 06/25/2012 01:42 PM, Dotan Cohen wrote:
>> On Mon, Jun 25, 2012 at 8:06 AM, Shachar Shemesh wrote:
>>> I disagree completely. The embedding control characters are designed for,
>>> well, embedding.
>>
>> Correct.
>
> Good. But
>
>> As plain text has no concept of a paragraph,
>
> Well, that really depends on what you mean by "plain text".

Plain text is a sequence of bytes in a standard encoding, which may or
may not begin with a BOM, and which is designed to be read in a text
editor. A text editor is a program that reads a sequence of bytes and,
using a table commonly referred to as an encoding, displays character
glyphs on the screen according to that sequence of bytes.

> RLE/PDF are
> defined by the UBA (Unicode BiDi Algorithm), and it, clearly, does have a
> concept of a paragraph.

Are you referring to the use of linefeeds to designate the end of an
embedded section?

>> using \n, \n\n,
>> \r\n, \r\n\r\n, or any other convention for a paragraph is arbitrary.
>
> Technically true, but both irrelevant and misleading. Misleading because the
> choice of \n or \r\n was arbitrary, but is now standard. Irrelevant because
> we are talking about the UBA, not "plain text" (whatever that means).

I see that you are. Fine, I was unaware that they did call that a
paragraph, and I do know that embedded sections end at newlines.
Whatever, let us agree then that sections of text separated by
newlines are paragraphs, as that is how the embedded sections end.

>> So if any arbitrary part of the text is to be RTL (no matter if the
>> user calls it a paragraph or not) then it is to be marked as an
>> embedded RTL section.
>
> This is incorrect. It does not matter much what the user calls a paragraph,
> but if the text editor calls a certain run a paragraph, then that is the
> case.

Alright.

> You make it sound as if, in the sequence "something more something \n
> even more ", the third part, saying " even more" will have an RTL
> level. That will simply not be the case with any UBA conforming text editor,
> as UBA specifically says that any embedding levels are reset when the
> paragraph is terminated. This is because the embedding controls are embedded
> in the paragraph.
>
> In other words, a paragraph is a paragraph, with BiDi direction, and
> embedding is embedding. The two are not the same.

So we have established that sections of text separated by newlines are
paragraphs. Let us return to the issue. In a plain text file, as
defined above, there does exist a method by which the author of the
file may specify that a paragraph is to be RTL. Therefore there is no
need for HTML to send RTL emails, nor is there technical need for the
email client to guess. However, I agree that there is practical need
for the email client to guess, as many users may not mark RTL
paragraphs as RTL (be they plain text or HTML).

Have I forgotten anything?

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
Re: Linux HTML mail agent with RTL and LTR paragraph explicit support
> Date: Mon, 25 Jun 2012 13:42:20 +0300
> From: Dotan Cohen
> Cc: linux-il@cs.huji.ac.il
>
> [...] plain text has no concept of a paragraph [...]

That's not true. The UBA explicitly defines a paragraph as a chunk of
text delimited by paragraph separator characters:

  Paragraphs are divided by the Paragraph Separator or appropriate
  Newline Function [...]

And therefore this conclusion is incorrect:

> So if any arbitrary part of the text is to be RTL (no matter if the
> user calls it a paragraph or not) then it is to be marked as an
> embedded RTL section.
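The paragraph definition quoted above can be sketched in a few lines of Python. The sketch assumes that "Paragraph Separator or appropriate Newline Function" can be approximated by the characters whose bidi class is B in the Unicode character database, and that CR LF counts as one separator; split_paragraphs is a hypothetical name, not something from the thread or from the standard.

```python
import unicodedata


def split_paragraphs(text):
    """Split text at characters of bidi class B (paragraph separators).

    Sketch only: CR followed by LF is treated as a single separator, and
    the separators themselves are dropped from the returned paragraphs.
    """
    paragraphs = []
    current = []
    i = 0
    while i < len(text):
        ch = text[i]
        if unicodedata.bidirectional(ch) == "B":
            if ch == "\r" and text[i + 1:i + 2] == "\n":
                i += 1  # swallow the LF of a CR LF pair
            paragraphs.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if current:
        paragraphs.append("".join(current))
    return paragraphs


if __name__ == "__main__":
    print(split_paragraphs("one\r\ntwo\u2029three"))
```

Under this reading every single newline ends a paragraph, which is exactly why an unclosed embedding cannot carry over to the next line.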
Re: Linux HTML mail agent with RTL and LTR paragraph explicit support
On 06/25/2012 06:21 PM, Dotan Cohen wrote:
> Shachar, before addressing the issue at hand, I would like to state
> an observation. When I reply to your mail, all text is at the same
> quote level. That is, there is a single > at the beginning of each
> line, whether it is a line that you wrote or a line that I wrote.

Until I debug this, I'm replying as plain text only.

> Are you referring to the use of linefeeds to designate the end of an
> embedded section?

No. I'm referring to paragraph terminators. Dotan, may I suggest you go
read the standard before making claims about what it says?

From the standard (section 3), the UBA[1] is applied using the
following four steps:

 - Separation into paragraphs
 - Initialization
 - Resolution of the embedding levels
 - Reordering

Paragraphs are resolved in steps 1 and 2, RLEs in step 3. They are
simply not the same thing. BD5 defines "paragraph direction".

> So we have established that sections of text separated by newlines are
> paragraphs. Let us return to the issue. In a plain text file, as
> defined above, there does exist a method by which the author of the
> file may specify that a paragraph is to be RTL.

There exist many. Specifically, the standard, which I urge you to read,
offers one, and then specifically says that others are also okay (i.e.,
not in violation of the standard). These are mentioned in the text
right after P3, and again at HL1. It seems to me you are trying to
force your agenda.

> Therefore there is no
> need for HTML to send RTL emails, nor is there technical need for the
> email client to guess.

Except there is no standard, de-facto or otherwise (as far as I'm
aware), on whether HL1 is being applied by email clients for plain text
emails, and the HTML standard is that HL1 is being applied, and
paragraph direction must be set.

> Have I forgotten anything?

Yes. To substantiate your claims.

Shachar

[1] http://unicode.org/reports/tr9/

--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com
Re: Linux HTML mail agent with RTL and LTR paragraph explicit support
> Date: Mon, 25 Jun 2012 17:19:01 +0300
> From: Shachar Shemesh
> Cc: linux-il@cs.huji.ac.il
>
> > \r\n, \r\n\r\n, or any other convention for a paragraph is arbitrary.
> Technically true, but both irrelevant and misleading. Misleading because
> the choice of \n or \r\n was arbitrary, but is now standard. Irrelevant
> because we are talking about the UBA, not "plain text" (whatever that
> means).

The UBA _is_ about plain text.
Re: Linux HTML mail agent with RTL and LTR paragraph explicit support
> Date: Mon, 25 Jun 2012 18:21:25 +0300
> From: Dotan Cohen
> Cc: linux-il@cs.huji.ac.il
>
> So we have established that sections of text separated by newlines are
> paragraphs. Let us return to the issue. In a plain text file, as
> defined above, there does exist a method by which the author of the
> file may specify that a paragraph is to be RTL. Therefore there is no
> need for HTML to send RTL emails, nor is there technical need for the
> email client to guess. However, I agree that there is practical need
> for the email client to guess as many users may not mark RTL
> paragraphs as RTL (be they plain text or HTML).
>
> Have I forgotten anything?

Yes, a few things:

 . Base paragraph direction in plain text is determined by the first
   strong directional character in the paragraph.

 . A UBA-compliant MUA will determine the base direction in plain text
   as the UBA specifies, and will display an RTL paragraph starting at
   the right margin of the window.

 . A UBA-compliant MUA _may_ decide that a paragraph is defined by
   something other than a single newline etc. (e.g., Emacs uses the
   high-level protocols clause to define a paragraph as text delimited
   by empty lines, i.e. two newlines in a row). But whatever the
   definition of a paragraph, once a paragraph _is_ detected that
   starts with a strong RTL character, a compliant MUA will display it
   as RTL.

 . You can start the paragraph with the RLM or LRM character to force
   its base direction to be RTL or LTR, respectively, because these two
   have strong directionality.
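The first-strong-character rule and the RLM/LRM trick listed above can be illustrated with Python's standard unicodedata module. This is a minimal sketch under the assumption that only the bidi classes L, R and AL count as strong (as in the 2012 text of the UBA); the function name base_level is made up for the example.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, bidi class R
LRM = "\u200e"  # LEFT-TO-RIGHT MARK, bidi class L


def base_level(paragraph):
    """Return 1 (RTL) or 0 (LTR) from the first strong character.

    Sketch of rules P2/P3: R or AL gives level 1, L gives level 0, and a
    paragraph with no strong character at all also gets level 0.
    """
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return 1
        if bidi_class == "L":
            return 0
    return 0


if __name__ == "__main__":
    samples = {
        "Hebrew first": "\u05e9\u05dc\u05d5\u05dd world",
        "Latin first": "hello \u05e9\u05dc\u05d5\u05dd",
        "no strong character": "123 ...",
        "RLM prefix": RLM + "hello",
        "LRM prefix": LRM + "\u05e9\u05dc\u05d5\u05dd",
    }
    for label, text in samples.items():
        print(label, "->", "RTL" if base_level(text) else "LTR")
```

The last two samples show why a leading RLM or LRM works: the mark is invisible, but it is the first strong character the scan meets.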
Re: Linux HTML mail agent with RTL and LTR paragraph explicit support
> Date: Mon, 25 Jun 2012 20:02:48 +0300
> From: Shachar Shemesh
> Cc: linux-il@cs.huji.ac.il
>
> > Therefore there is no
> > need for HTML to send RTL emails, nor is there technical need for the
> > email client to guess.
> Except there is no standard, de-facto or otherwise (as far as I'm aware)
> on whether HL1 is being applied by email clients for plain text emails,

Yes, there is such a standard: the UBA. It explicitly applies to
plain text, in the absence of any high-level protocols. And there can
be no high-level protocols that disable determination of base
direction of a paragraph altogether.

> and the HTML standard is that HL1 is being applied, and paragraph
> direction must be set.

True.
Re: Linux HTML mail agent with RTL and LTR paragraph explicit support
> Date: Mon, 25 Jun 2012 08:06:14 +0300
> From: Shachar Shemesh
> Cc: linux-il@cs.huji.ac.il
>
> What the standard[1] suggests, but does not
> require, is the use of the first strong directional character in the
> paragraph.

You are wrong, it does require that. See below.

> the standard says nothing about what to do with a paragraph with
> no strong directional character at all.

Not true:

  P3. If a character is found in P2 and it is of type AL or R, then
  set the paragraph embedding level to one; otherwise, set it to zero.

The last part means that a paragraph without any strong directional
character has L2R base direction.

> 2. This suggestion is non-normative. The standard explicitly states
> that a "higher level protocol" can be used to determine this property.

That is a strange interpretation of the UBA's language. The higher
level protocols clause is clearly meant for _improving_ the display by
tailoring it to the needs of the application and its users. It
certainly isn't meant to allow bypassing the determination of the
paragraph direction altogether; that simply makes no sense.

> 4. The only standard way to provide paragraph directionality in email
> is by sending it as HTML

I think this assertion is rather extreme. Using a bidi-aware mail
client, you can certainly do that in plain text as well.

Please don't forget that Unicode in general and the UBA in particular
are first and foremost about plain text. Marked-up text, such as
HTML, has other means to specify directionality, and thus doesn't need
the bidi formatting control characters at all. It only needs the
implicit levels and the corresponding reordering.

> most of the world uses various MS based email
> readers. Those don't do it, and they do not violate any standard by not
> doing it. As a result, if you want your email to be legible by any
> recipient, HTML mail is the way to go if you are writing in Hebrew.

This is a strange thing to read in a forum dedicated to advancing
GNU/Linux systems. Since when do we in the Free Software movement bow
to badly implemented standards in MS software and take example from
there?

Btw, Outlook does support directional control characters; e.g., you
can use LRO..PDF to get a string of Hebrew characters displayed in
strict logical order. It's just that it doesn't implement the
paragraph direction part of the UBA, no doubt because the
corresponding Windows text widget doesn't (I see the same problem in
Notepad).

> No. I am referring to all those who complain so violently when HTML mail
> is sent to the list.

I'm not going to complain, certainly not "violently", but please note
that using HTML does have its disadvantages. E.g., look at one of your
messages as archived by the server for this forum:

  http://www.mail-archive.com/linux-il@cs.huji.ac.il/msg63031.html

All your careful formatting is gone, and the Hebrew text reads
awkwardly (at least in my Mozilla Firefox v13.0.1).

So I think using plain text and insisting on MUAs to support the UBA
does have its merits. At least in Emacs 24, which I use for email (as
well as for many other things), I see no problems with that.
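The LRO..PDF construct mentioned above is just a pair of control characters placed around the text. A minimal Python sketch, assuming the goal is only to build such a string (how a given client renders it is outside what the code can show); force_logical_order is a hypothetical helper name.

```python
import unicodedata

LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def force_logical_order(text):
    """Wrap text in LRO..PDF so a UBA-aware renderer lays it out strictly LTR."""
    return LRO + text + PDF


if __name__ == "__main__":
    wrapped = force_logical_order("\u05d0\u05d1\u05d2")  # alef, bet, gimel
    for ch in wrapped:
        print(f"U+{ord(ch):04X} bidi class {unicodedata.bidirectional(ch)}")
```

The override affects character ordering inside the wrapped run only; it does not set the base direction of the paragraph, which is the part Outlook is said above not to implement.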
Re: Linux HTML mail agent with RTL and LTR paragraph explicit support
On 06/25/2012 08:13 PM, Eli Zaretskii wrote: >> Date: Mon, 25 Jun 2012 20:02:48 +0300 >> From: Shachar Shemesh >> Cc: linux-il@cs.huji.ac.il >> >>> Therefore there is no >>> need for HTML to send RTL emails, nor is there technical need for the >>> email client to guess. >> Except there so no standard, de-facto or otherwise (as far as I'm aware) >> on whether HL1 is being applied be email clients for plain text emails, > Yes, there is such a standard: the UBA. HL1 is part of the UBA, even if you, personally, don't like it. Two implementations, opting to use HL1 one and the other not, can both conform to the UBA. There is no standard on whether HL1 should be applied or not for plain text email clients. > It explicitly applies to > plain text, in the absence of any high-level protocols. Outlook employs a higher level protocol. It is "all paragraphs are LTR, unless the user presses CTRL+RIGHT SHIFT, in which case all paragraphs are RTL". It is a valid, standard conforming protocol, even if Eli Zaretskii doesn't approve. > And there can > be no high-level protocols that disable determination of base > direction of a paragraph altogether. HL1 states the exact opposite. Last I checked (half an hour ago), it was still part of the standard. Shachar -- Shachar Shemesh Lingnu Open Source Consulting Ltd. http://www.lingnu.com ___ Linux-il mailing list Linux-il@cs.huji.ac.il http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
Re: Linux HTML mail agent with RTL and LTR paragraph explicit support
> Date: Mon, 25 Jun 2012 21:00:07 +0300 > From: Shachar Shemesh > CC: linux-il@cs.huji.ac.il > > On 06/25/2012 08:13 PM, Eli Zaretskii wrote: > >> Date: Mon, 25 Jun 2012 20:02:48 +0300 > >> From: Shachar Shemesh > >> Cc: linux-il@cs.huji.ac.il > >> > >>> Therefore there is no > >>> need for HTML to send RTL emails, nor is there technical need for the > >>> email client to guess. > >> Except there so no standard, de-facto or otherwise (as far as I'm aware) > >> on whether HL1 is being applied be email clients for plain text emails, > > Yes, there is such a standard: the UBA. > HL1 is part of the UBA, even if you, personally, don't like it. Actually, I do like it. > Two implementations, opting to use HL1 one and the other not, can > both conform to the UBA. True. > There is no standard on whether HL1 should be applied or not for plain > text email clients. True again. > Outlook employs a higher level protocol. It is "all paragraphs are LTR, > unless the user presses CTRL+RIGHT SHIFT, in which case all paragraphs > are RTL". It is a valid, standard conforming protocol Again, I think such an interpretation is against the spirit of HL1. Here's the full text: HL1. Override P3, and set the paragraph embedding level explicitly. . A higher-level protocol may set any paragraph level. This can be done on the basis of the context, such as on a table cell, paragraph, document, or system level. (P2 may be skipped if P3 is overridden). Note that this does not allow a higher-level protocol to override the limit specified in BD2. . A higher-level protocol may apply rules equivalent to P2 and P3 but default to level 1 (RTL) rather than 0 (LTR) to match overall RTL context. . A higher-level protocol may use an entirely different algorithm that heuristically auto-detects the paragraph embedding level based on the paragraph text and its context. For example, it could base it on whether there are more RTL characters in the text than LTR. 
As another example, when the paragraph contains no strong characters, its direction could be determined by the levels of the paragraphs before and after. This gives examples when a paragraph can be considered RTL even if it formally doesn't fit the conditions of P3. I don't understand how this can be interpreted to mean "all paragraphs are LTR". But whatever the interpretation, ... > even if Eli Zaretskii doesn't approve. ... there's no need to get personal. We can agree to disagree, you know, and still be able to conduct a civilized discussion, devoid of ad hominem. > > And there can > > be no high-level protocols that disable determination of base > > direction of a paragraph altogether. > HL1 states the exact opposite. I disagree. > Last I checked (half an hour ago), it was still part of the > standard. True. ___ Linux-il mailing list Linux-il@cs.huji.ac.il http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
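The two heuristics that the quoted HL1 text offers as examples (majority of RTL versus LTR characters, and borrowing the direction of neighbouring paragraphs when there are no strong characters) can be sketched as follows. The function names, the choice to look at the preceding paragraph before the following one, and the LTR fallback are all assumptions made for this illustration; HL1 itself does not prescribe them.

```python
import unicodedata


def count_strong(paragraph):
    """Count strong RTL (R, AL) and strong LTR (L) characters."""
    rtl = ltr = 0
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            rtl += 1
        elif bidi_class == "L":
            ltr += 1
    return rtl, ltr


def heuristic_levels(paragraphs, default=0):
    """Assign level 1 (RTL) or 0 (LTR) to each paragraph.

    Majority vote per paragraph; a paragraph without strong characters
    takes the level of the nearest resolved paragraph before it, else the
    nearest one after it, else the default.
    """
    levels = []
    for paragraph in paragraphs:
        rtl, ltr = count_strong(paragraph)
        if rtl == 0 and ltr == 0:
            levels.append(None)  # undecided for now
        else:
            levels.append(1 if rtl > ltr else 0)

    resolved = list(levels)
    for i, level in enumerate(levels):
        if level is not None:
            continue
        before = next((resolved[j] for j in range(i - 1, -1, -1)
                       if resolved[j] is not None), None)
        after = next((levels[j] for j in range(i + 1, len(levels))
                      if levels[j] is not None), None)
        if before is not None:
            resolved[i] = before
        elif after is not None:
            resolved[i] = after
        else:
            resolved[i] = default
    return resolved


if __name__ == "__main__":
    sample = ["abc \u05d0", "\u05d0\u05d1\u05d2 x", "123", "hello"]
    print(heuristic_levels(sample))
```

Note that the second sample paragraph is resolved as RTL by majority, and would also be RTL under P2/P3; the first one starts with Latin text and is LTR under both rules.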
Web gallery software
Hi

Does anybody know/use any good, open source software for hosting a
gallery on a web server?
Ideally it should be:

 - indexed
 - searchable
 - easy to browse/navigate
 - have author pages
 - links to the same artwork in several sizes
 - and of course have different functionality for authors and people
   browsing.

Thanks.
Re: Web gallery software
On Mon, Jun 25, 2012 at 10:49 PM, Mordechai Behar <mordecha.be...@mail.huji.ac.il> wrote:
> Hi
> Does anybody know/use any good, open source software for hosting a gallery
> on a web server?
> Ideally it should be:
>
>  - indexed
>  - searchable
>  - easy to browse/navigate
>  - have author pages
>  - links to the same artwork in several sizes
>  - and of course have different functionality for authors and people
>    browsing.
>
> Thanks.

There's of course http://gallery.menalto.com/ - not sure about author
pages, though I think it does everything else and more...

-- Shimi
Re: Linux HTML mail agent with RTL and LTR paragraph explicit support
On Mon, Jun 25, 2012, Shachar Shemesh wrote about "Re: Linux HTML mail agent with RTL and LTR paragraph explicit support": > I disagree completely. The embedding control characters are designed > for, well, embedding. What the standard[1] suggests, but does not > require, is the use of the first strong directional character in the > paragraph. The reasons this does not work for email are: I remember how 11 years ago, when I wrote "bidiv", a simple command-line tool to display Hebrew text files and emails (using the bidi algorithm from fribidi), I had exactly the problems you described. While the standard *does*, if I remember correctly, specify how the base direction of each paragraph is determined (using the first character with a strong direction), no standard really specified what in a text file is a "paragraph". I ended up implementing several different algorithms, but my favorite (and bidi's default) became splitting up the text file into paragraphs on empty lines. At the time, there was really no other tool for displaying bidi plain text, so I hoped that this convention would be adopted by others. I don't know if it ever was - I'm still hoping it is, or will be. I certainly haven't seen a different convention. But my biggest fear is Shachar's claim that: > 4. The only standard way to provide paragraph directionality in email > is by sending it as HTML I still believe that there's merit to plain text - a document format that is guaranteed to contain nothing but text - no icons, no fonts, no bold, no underlines, nothing, just text. I honestly think that the best format for emails (and also for instant-messaging, SMSs, tweets, and various other types of textual messages) is plain text. I refuse to admit that while plain text can still exist for English text, it cannot be used for Hebrew text. If we're missing conventions on how plain text is supposed to work in Hebrew, then by all means - let's try to define these conventions. 
That's what I thought I did 11 years ago with bidiv (see
http://dev.man-online.org/man1/bidiv/)

> However, most of the world uses various MS based email
> readers.

Is this actually true nowadays? Honestly, *nobody* I know uses any
MS-based email readers nowadays. Most home users are using some sort of
web mail. Others I know use Thunderbird, the mail client in iOS or
Android, Lotus Notes(!), and other stuff (some on this list use crazy
things like mutt and emacs ;-)).

> Those don't do it, and they do not violate any standard by not
> doing it. As a result, if you want your email to be legible by any
> recipient, HTML mail is the way to go if you are writing in Hebrew.
> Complaining to your recipient (or sender) that they are not doing it
> properly is both impolite and, which I feel many people here will see as
> worse, technically incorrect.

What do all these mail clients do about paragraphs et al. in Hebrew
plain-text mails? I have to admit - I don't know. It's worth checking.
I don't know if you yourself have actually checked them all...

Some of these mail clients are open source or based on open source, by
the way, so it wouldn't be too contrived to consider changing them to
better show Hebrew.

> No. I am referring to all those who complain so violently when HTML mail
> is sent to the list.

I also hate HTML email, but it has nothing to do with Hebrew. I equally
hate HTML email in English ;-)

--
Nadav Har'El                        | Monday, Jun 25 2012, 6 Tammuz 5772
n...@math.technion.ac.il            |-
Phone +972-523-790466, ICQ 13349191 | Long periods of drought are always
http://nadav.harel.org.il           | followed by rain.
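The convention Nadav describes, splitting a text file into paragraphs on empty lines and then taking each paragraph's direction from its first strong character, might look roughly like this. It is a sketch of the idea only, not bidiv's actual code; the function names and the regular expression are assumptions made for the example.

```python
import re
import unicodedata


def split_on_empty_lines(text):
    """Split text into paragraphs wherever one or more empty lines occur."""
    return [p for p in re.split(r"\n\s*\n", text) if p.strip()]


def first_strong_level(paragraph):
    """Return 1 (RTL) or 0 (LTR) based on the first strong character."""
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return 1
        if bidi_class == "L":
            return 0
    return 0  # no strong character: fall back to LTR


def paragraph_directions(text):
    """Pair each empty-line-delimited paragraph with 'RTL' or 'LTR'."""
    return [(p, "RTL" if first_strong_level(p) else "LTR")
            for p in split_on_empty_lines(text)]


if __name__ == "__main__":
    mail = "hello\nworld\n\n\u05e9\u05dc\u05d5\u05dd\nabc def"
    for paragraph, direction in paragraph_directions(mail):
        print(direction, repr(paragraph))
```

The second paragraph of the sample shows the practical difference from treating every newline as a paragraph break: the hard-wrapped line "abc def" stays part of the RTL paragraph instead of becoming an LTR paragraph of its own, which matters for pre-wrapped email text.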
Re: Linux HTML mail agent with RTL and LTR paragraph explicit support
On 06/25/2012 09:56 PM, Eli Zaretskii wrote: >> Outlook employs a higher level protocol. It is "all paragraphs are LTR, >> unless the user presses CTRL+RIGHT SHIFT, in which case all paragraphs >> are RTL". It is a valid, standard conforming protocol > Again, I think such an interpretation is against the spirit of HL1. I'm not sure what "the spirit" of a standard is. Standards are specifications. You are either conforming, or non conforming. This is by design, and part of the reason that standards are formed by large committees. There is an attempt to prevent threads like this one. More to the point, I think you are reading HL1 wrong. Usually, when writing standards, certain words have a meaning that has special stress when used in standard. These words are explained in RFC 2119. Now, I will readily grant you that RFC 2119 does recommend that standards that use the words with this meaning (which is, more or less, the usual English meaning of the words) refer back to RFC 2119 and capitalize those words. The UBA does neither. Further, it uses the word "can", which is not defined by RFC 2119. Still, when trying to figure out what can or cannot be written in conformance to a standard, I find these words useful. In order to bring us to a more strict agreement on what HL1 says, I've allowed myself to modify the text of the paragraph to be in conformance to RFC 2119. I've mapped "can", which is not defined by RFC 2119, to "MAY", which has, more or less, the same meaning. I've capitalized the key words I changed, so you can check whether you agree with my modifications (and because RFC 2119 suggests it): > Here's the full text: > > HL1. Override P3, and set the paragraph embedding level explicitly. > > . A higher-level protocol MAY set any paragraph level. Okay. We may. We may not. Both are okay. > This MAY be > done on the basis of the context, such as on a table cell, > paragraph, document, or system level. These are nested options. 
I read it to say that even if you chose to override the paragraph levels, you may still choose to use context or not. Also, notice the "document" and "system level" options. These clearly encompass the Outlook/Notepad use I mentioned in my previous email, which you found to contradict HL1. Lastly, note that the original verb, "can", is actually weaker than "may". In plain English, "may" is fairly neutral, while "can" suggest that the default action is not to. > (P2 MAY be skipped if P3 is > overridden). P2 is only used to produce data used by P3, so going through the motions of P2 if P3 is not going to be used makes no sense. I would actually recommend this changed to "SHOULD". > Note that this does not allow a higher-level protocol > to override the limit specified in BD2. So, if you do decide to explicitly set a paragraph embedding level, you may only choose one between 0 and 61 inclusive. I think we're good. > > . A higher-level protocol MAY apply rules equivalent to P2 and P3 > but default to level 1 (RTL) rather than 0 (LTR) to match overall > RTL context. > > . A higher-level protocol MAY use an entirely different algorithm > that heuristically auto-detects the paragraph embedding level > based on the paragraph text and its context. For example, it could > base it on whether there are more RTL characters in the text than > LTR. As another example, when the paragraph contains no strong > characters, its direction could be determined by the levels of the > paragraphs before and after. I think it is fairly clear that these two bullets are alternatives to the first one (and, indeed, each other). For one thing, it is technically impossible for an implementation to implement all three; they are contradicting. Since the first one covers our use, the fact that these two don't does not matter. All in all, I find it hard to see ANY policy for setting the paragraph direction that would violate HL1. 
Even if so, however, please notice you can achieve the exact same effect using only HL3, which is not limited in its use in any way, shape, or form (personally, I think that neither is HL1, but I can see where your disagreement comes from). Also note how the standard clearly states that HL1 and HL3 are contained in what HL4 and HL5 allow you to do, and these two, again and unsurprisingly, allow you to achieve the exact same effect. Again, these two are not limited in any way in how they may be used. Personally, I see no way to read this standard but as allowing arbitrary determination of paragraph direction with no restrictions on how.

> But whatever the interpretation, ...
>
>> even if Eli Zaretskii doesn't approve.
> ... there's no need to get personal. We can agree to disagree, you
> know, and still be able to conduct a civilized discussion, devoid of
> ad hominem.

I'm sorry about that. In my defense, you made it extremely difficult to answer to the point. You simultaneously sent out seven (!!!) replies to differ
Re: Linux HTML mail agent with RTL and LTR paragraph explicit support
On 06/25/2012 11:22 PM, Nadav Har'El wrote:
> On Mon, Jun 25, 2012, Shachar Shemesh wrote about "Re: Linux HTML mail agent
> with RTL and LTR paragraph explicit support":
>> I disagree completely. The embedding control characters are designed
>> for, well, embedding. What the standard[1] suggests, but does not
>> require, is the use of the first strong directional character in the
>> paragraph. The reasons this does not work for email are:
> I remember how 11 years ago, when I wrote "bidiv", a simple command-line
> tool to display Hebrew text files and emails (using the bidi algorithm
> from fribidi), I had exactly the problems you described. While the
> standard *does*, if I remember correctly, specify how the base direction
> of each paragraph is determined

I would use "recommends" rather than "specify".

> no standard really specified what in a text file is a
> "paragraph".

And lucky for you that they don't. Even with the simple case of a plain text file, a paragraph is defined differently depending on whether the display is expected to do line wrapping or not. Had the standard said one thing, in all likelihood your implementation would be non-conforming.

> At the time, there was really no other tool for linux
> for displaying bidi plain
> text, so I hoped that this convention would be adopted by others.
> I don't know if it ever was - I'm still hoping it is, or will be.
> I certainly haven't seen a different convention. But my biggest fear
> is Shachar's claim that:
>
>> 4. The only standard way to provide paragraph directionality in email
>> is by sending it as HTML
> I still believe that there's merit to plain text

I agree. There is a lot of merit to plain text. However, displaying BiDi with plain text is difficult, and each implementation does it differently. The problem is further compounded for text that has already been broken into lines. If you want your Hebrew email to appear as you have written it, you need to send it in HTML.
Shachar -- Shachar Shemesh Lingnu Open Source Consulting Ltd. http://www.lingnu.com ___ Linux-il mailing list Linux-il@cs.huji.ac.il http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
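As a rough illustration of why plain-text implementations can disagree, here is a minimal Python sketch comparing two of the heuristics that the HL1 text quoted earlier in this thread permits: the first strong character (the P2/P3 suggestion) and a majority count of RTL versus LTR characters. The function names and the sample string are made up for this example, and the sketch ignores embedding control characters entirely; it is not any mail client's actual logic.

```python
import unicodedata

# Bidi classes treated as "strong" here: L is left-to-right,
# R and AL (Arabic letter) are right-to-left.
LTR_CLASSES = {"L"}
RTL_CLASSES = {"R", "AL"}


def first_strong(text):
    """Direction of the first strong character, or None if there is none."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in LTR_CLASSES:
            return "ltr"
        if cls in RTL_CLASSES:
            return "rtl"
    return None


def majority(text):
    """Direction held by most strong characters, or None on a tie."""
    rtl = sum(1 for ch in text if unicodedata.bidirectional(ch) in RTL_CLASSES)
    ltr = sum(1 for ch in text if unicodedata.bidirectional(ch) in LTR_CLASSES)
    if rtl == ltr:
        return None
    return "rtl" if rtl > ltr else "ltr"


# Hypothetical sample: one Latin word followed by two Hebrew words.
sample = "Linux \u05e9\u05dc\u05d5\u05dd \u05e2\u05d5\u05dc\u05dd"

if __name__ == "__main__":
    print("first strong:", first_strong(sample))
    print("majority:    ", majority(sample))
```

For a paragraph that opens with a Latin word but is mostly Hebrew, the two heuristics pick opposite base directions, and a paragraph made only of digits and punctuation gives neither of them anything to work with.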
Re: Linux HTML mail agent with RTL and LTR paragraph explicit support
> Date: Mon, 25 Jun 2012 23:22:03 +0300
> From: Nadav Har'El
>
> On Mon, Jun 25, 2012, Shachar Shemesh wrote about "Re: Linux HTML mail agent
> with RTL and LTR paragraph explicit support":
> > I disagree completely. The embedding control characters are designed
> > for, well, embedding. What the standard[1] suggests, but does not
> > require, is the use of the first strong directional character in the
> > paragraph. The reasons this does not work for email are:
>
> While the standard *does*, if I remember correctly, specify how the
> base direction of each paragraph is determined (using the first
> character with a strong direction), no standard really specified
> what in a text file is a "paragraph".

I showed earlier a citation from the UBA where it does define that.

> I ended up implementing
> several different algorithms, but my favorite (and bidi's default)
> became splitting up the text file into paragraphs on empty lines.

Right, and that's what Emacs 24 does, invoking the high-level protocols clause of the UBA.
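The convention discussed in this message (paragraphs separated by empty lines, each taking its base direction from its first strong character) could be sketched roughly as follows. This is an assumption-laden illustration, not the code of bidiv or Emacs: the function names are invented, the fallback to LTR for paragraphs without strong characters is just one possible choice, and only "\n" line endings are handled.

```python
import re
import unicodedata


def split_paragraphs(text):
    """Split text into paragraphs on one or more empty (or blank) lines."""
    return [p for p in re.split(r"\n\s*\n", text) if p.strip()]


def base_direction(paragraph, default="ltr"):
    """Base direction from the first strong character.

    `default` is used when the paragraph has no strong character at all;
    choosing LTR here is an assumption of this sketch.
    """
    for ch in paragraph:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):
            return "rtl"
    return default


if __name__ == "__main__":
    # Hypothetical message body: an English paragraph, a Hebrew one,
    # and one containing only digits.
    body = "Hello\nworld\n\n\u05e9\u05dc\u05d5\u05dd\n\n\n123"
    for para in split_paragraphs(body):
        print(base_direction(para), repr(para))
```

Note that a single newline does not end a paragraph in this sketch, so hard-wrapped lines inside one paragraph share a base direction.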
Re: Linux HTML mail agent with RTL and LTR paragraph explicit support
> Date: Tue, 26 Jun 2012 04:12:01 +0300
> From: Shachar Shemesh
> CC: linux-il@cs.huji.ac.il
>
> On 06/25/2012 09:56 PM, Eli Zaretskii wrote:
> >> Outlook employs a higher level protocol. It is "all paragraphs are LTR,
> >> unless the user presses CTRL+RIGHT SHIFT, in which case all paragraphs
> >> are RTL". It is a valid, standard conforming protocol
> > Again, I think such an interpretation is against the spirit of HL1.
> I'm not sure what "the spirit" of a standard is. Standards are
> specifications. You are either conforming, or non conforming. This is by
> design, and part of the reason that standards are formed by large
> committees. There is an attempt to prevent threads like this one.
>
> More to the point, I think you are reading HL1 wrong.

Well, can we at least agree that rendering Hebrew paragraphs as RTL makes them display better than what one sees in Outlook?
Re: Linux HTML mail agent with RTL and LTR paragraph explicit support
> Date: Tue, 26 Jun 2012 04:28:33 +0300
> From: Shachar Shemesh
> Cc: linux-il@cs.huji.ac.il
>
> > no standard really specified what in a text file is a
> > "paragraph".
> And lucky for you that they don't. Even with the simple case of a plain
> text file, a paragraph is defined differently depending on whether the
> display is expected to do line wrapping or not.

No, line wrapping is on a different level, and the UBA discusses that briefly. I don't think there's an issue here.