Re: Linux HTML mail agent with RTL and LTR paragraph explicit support

2012-06-25 Thread Dotan Cohen
On Mon, Jun 25, 2012 at 8:06 AM, Shachar Shemesh  wrote:
> I disagree completely. The embedding control characters are designed for,
> well, embedding.

Correct. As plain text has no concept of a paragraph, using \n, \n\n,
\r\n, \r\n\r\n, or any other convention for a paragraph is arbitrary.
So if any arbitrary part of the text is to be RTL (no matter if the
user calls it a paragraph or not) then it is to be marked as an
embedded RTL section.


> What the standard[1] suggests, but does not require, is the
> use of the first strong directional character in the paragraph. The reasons
> this does not work for email are:
>
>  1. It is not required by the standard. It is suggested as a way to
>     determine paragraph directionality, but this suggestion is incomplete.
>     For example, the standard says nothing about what to do with a
>     paragraph with no strong directional character at all.
>  2. This suggestion is non-normative. The standard explicitly states that
>     a "higher level protocol" can be used to determine this property.
>  3. HTML has chosen the "higher level protocol" as the BiDi directionality
>     path. Unless certain discussions currently in effect become standard,
>     HTML will not guess the directionality of a paragraph ever, no matter
>     how much you want it to. There are some discussions about adding a
>     "direction: auto" property to CSS.
>  4. The only standard way to provide paragraph directionality in email is
>     by sending it as HTML.
>
> A few takeaways. There is no standard I'm aware of that states you SHOULDN'T
> use the first character in a paragraph to determine paragraph direction in
> plain-text emails. I think that is a perfectly reasonable approach. However,
> most of the world uses various MS based email readers. Those don't do it,
> and they do not violate any standard by not doing it. As a result, if you
> want your email to be legible by any recipient, HTML mail is the way to go
> if you are writing in Hebrew. Complaining to your recipient (or sender) that
> they are not doing it properly is both impolite and, which I feel many
> people here will see as worse, technically incorrect.
>
>
> I know many people on this list don't like this standard, but this extra
> email did nothing to change it (not that I, personally, think that changing
> it is the right thing to do).
>

I agree with you completely with regard to interoperating with
de facto standard software.


>> Are you referring to me, in regard to the discussion that we had in
>> which I think that the LTR- and RTL-Embedding characters should be
>> available in the Hebrew keyboard layout?
>
> No. I am referring to all those who complain so violently when HTML mail is
> sent to the list.
>

I see. I'm glad that I know how to configure my email client properly
not to notice it, and that I have the disk space to spare for some
markup. I wonder how loud those folks would scream if they noticed
that Hebrew UTF-8 characters are _two_ bytes long!


>>  That doesn't mean that I
>> dislike the idea of using HTML. Actually, I don't like HTML mail but
>> not for that reason, rather a personal preference with no root in
>> ideology nor technical reason.
>
> Okay, so maybe I was referring to you after all :-)
>

No, I'm not a complainer. I don't like _sending_ HTML mail, but I'll
happily receive the mail in any format that standard email clients
support (NOT Word!).



-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com

___
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il


Re: Linux HTML mail agent with RTL and LTR paragraph explicit support

2012-06-25 Thread Shachar Shemesh
On 06/25/2012 01:42 PM, Dotan Cohen wrote:
> On Mon, Jun 25, 2012 at 8:06 AM, Shachar Shemesh  wrote:
>> I disagree completely. The embedding control characters are designed for,
>> well, embedding.
> Correct.
Good. But
>  As plain text has no concept of a paragraph,
Well, that really depends on what you mean by "plain text". RLE/PDF are
defined by the UBA (Unicode BiDi Algorithm), and it, clearly, does have
a concept of a paragraph.
>  using \n, \n\n,
> \r\n, \r\n\r\n, or any other convention for a paragraph is arbitrary.
Technically true, but both irrelevant and misleading. Misleading because
the choice of \n or \r\n was arbitrary, but is now standard. Irrelevant
because we are talking about the UBA, not "plain text" (whatever that
means).
> So if any arbitrary part of the text is to be RTL (no matter if the
> user calls it a paragraph or not) then it is to be marked as an
> embedded RTL section.
This is incorrect. It does not matter much what the user calls a
paragraph, but if the /text editor/ calls a certain run a paragraph,
then that is the case.

You make it sound as if, in the sequence "something [RLE] more something
\n even more [PDF]", the third part, saying " even more", will have an
RTL level. That will simply not be the case with any UBA-conforming text
editor, as UBA specifically says that any embedding levels are reset
when the paragraph is terminated. This is because the embedding controls
are *embedded* in the paragraph.

In other words, a paragraph is a paragraph, with BiDi direction, and
embedding is embedding. The two are not the same.
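
To make the distinction concrete, here is a minimal sketch (not a conforming UBA implementation) of how explicit embeddings are scoped. It assumes a single base level shared by all paragraphs, handles only RLE, LRE and PDF (no overrides, no isolates, no BD2 depth limit), and discards the embedding stack whenever it meets a character of bidi class B, which is what "embedding levels are reset when the paragraph is terminated" amounts to. The function name `explicit_levels` is hypothetical.

```python
import unicodedata

RLE, LRE, PDF = "\u202b", "\u202a", "\u202c"


def explicit_levels(text, base_level=0):
    """Return an embedding level per character (None for RLE/LRE/PDF).

    Simplified sketch: only RLE/LRE/PDF are handled, and every paragraph
    is assumed to share the same base level.
    """
    stack = [base_level]
    levels = []
    for ch in text:
        if ch == RLE:
            # Next odd level above the current one.
            stack.append(stack[-1] + 1 if stack[-1] % 2 == 0 else stack[-1] + 2)
            levels.append(None)
        elif ch == LRE:
            # Next even level above the current one.
            stack.append(stack[-1] + 2 if stack[-1] % 2 == 0 else stack[-1] + 1)
            levels.append(None)
        elif ch == PDF:
            # A PDF with no open embedding is simply ignored.
            if len(stack) > 1:
                stack.pop()
            levels.append(None)
        elif unicodedata.bidirectional(ch) == "B":
            # Paragraph separator: it gets the paragraph level, and every
            # open embedding ends together with the paragraph.
            levels.append(base_level)
            stack = [base_level]
        else:
            levels.append(stack[-1])
    return levels


print(explicit_levels("ab" + RLE + "cd\nef" + PDF))
```

Here "cd" is raised to level 1 by the RLE, but "ef" after the newline is back at level 0 even though its PDF only comes later.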

Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com



Re: Linux HTML mail agent with RTL and LTR paragraph explicit support

2012-06-25 Thread Dotan Cohen
Shachar, before addressing the issue at hand, I would like to state
an observation. When I reply to your mail, all text is at the same
quote level. That is, there is a single > at the beginning of each
line, whether it is a line that you wrote or a line that I wrote.
Obviously, I am replying to the HTML portion of your multipart
message, not the plain text portion. My mailer (Gmail) does not know
that blockquote type="cite" means that the text is a quote. Why should
it? Is that a standard? (I don't know; it might be.) This is a good
argument against HTML mail.
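
For illustration, a mailer could map cite blockquotes to conventional ">" prefixes when building a plain-text reply. The sketch below uses Python's standard `html.parser`; the class and function names are hypothetical, and it deliberately ignores all other markup, keeping only non-blank text lines.

```python
from html.parser import HTMLParser


class CiteQuoteConverter(HTMLParser):
    """Turn nested <blockquote type="cite"> into '>' quote prefixes."""

    def __init__(self):
        super().__init__()
        self.depth = 0            # current cite-quote nesting level
        self.lines = []           # output plain-text lines
        self._cite_stack = []     # whether each open blockquote is a cite

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            is_cite = dict(attrs).get("type") == "cite"
            self._cite_stack.append(is_cite)
            if is_cite:
                self.depth += 1

    def handle_endtag(self, tag):
        if tag == "blockquote" and self._cite_stack:
            if self._cite_stack.pop():
                self.depth -= 1

    def handle_data(self, data):
        for line in data.splitlines():
            if line.strip():
                prefix = ">" * self.depth
                text = line.strip()
                self.lines.append(f"{prefix} {text}" if prefix else text)


def html_quotes_to_plain(html):
    converter = CiteQuoteConverter()
    converter.feed(html)
    converter.close()
    return "\n".join(converter.lines)


print(html_quotes_to_plain(
    '<p>reply</p><blockquote type="cite">quoted'
    '<blockquote type="cite">older</blockquote></blockquote>'))
```

Whether any given webmail does something like this is a separate question; the point is only that the markup carries enough information to recover the quote levels.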

You can tell me that my mailer (Gmail) is broken. But remember that
Gmail is now no less a de facto standard mailer than Outlook once
was, and that you advocate compatibility with Outlook based on its
de facto standard status.

I've manually fixed the nesting below:

On Mon, Jun 25, 2012 at 5:19 PM, Shachar Shemesh  wrote:
>> On 06/25/2012 01:42 PM, Dotan Cohen wrote:
>>> On Mon, Jun 25, 2012 at 8:06 AM, Shachar Shemesh 
>>> wrote:
>>>
>>> I disagree completely. The embedding control characters are designed for,
> well, embedding.
>
>> Correct.
>
> Good. But
>
>>  As plain text has no concept of a paragraph,
>
> Well, that really depends on what you mean by "plain text".

Plain text is a sequence of bytes in a standard encoding, which may or
may not begin with a BOM, and is designed to be read in a text editor.
A text editor is a program that reads a sequence of bytes and, using a
table commonly referred to as an encoding, displays character glyphs
on the screen according to that sequence of bytes.
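
The definition above can be expressed in a few lines, assuming the "standard encoding" is UTF-8 (the definition itself does not name one). Python's `utf-8-sig` codec drops a leading UTF-8 BOM if there is one and otherwise decodes as ordinary UTF-8. The byte count also bears out the earlier remark that Hebrew letters take two bytes each in UTF-8. The function name is hypothetical.

```python
def decode_plain_text(data):
    """Decode bytes as UTF-8 text, tolerating an optional leading BOM.

    Assumes UTF-8; any other "standard encoding" would need its own codec.
    """
    return data.decode("utf-8-sig")


hebrew = "שלום"
encoded = hebrew.encode("utf-8")
print(len(hebrew), len(encoded))  # 4 characters, 8 bytes
print(decode_plain_text(b"\xef\xbb\xbf" + encoded) == hebrew)  # True
```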


> RLE/PDF are
> defined by the UBA (Unicode BiDi Algorithm), and it, clearly, does have a
> concept of a paragraph.
>

Are you referring to the use of linefeeds to designate the end of an
embedded section?


>>  using \n, \n\n,
>> \r\n, \r\n\r\n, or any other convention for a paragraph is arbitrary.
>
> Technically true, but both irrelevant and misleading. Misleading because the
> choice of \n or \r\n was arbitrary, but is now standard. Irrelevant because
> we are talking about the UBA, not "plain text" (whatever that means).
>

I see that you are. Fine, I was unaware that they did call that a
paragraph and I do know that embedded sections do end at newlines.
Whatever, let us agree then that sections of text separated by
newlines are paragraphs as that is how the embedded sections end.


>> So if any arbitrary part of the text is to be RTL (no matter if the
>> user calls it a paragraph or not) then it is to be marked as an
>> embedded RTL section.
>
> This is incorrect. It does not matter much what the user calls a paragraph,
> but if the text editor calls a certain run a paragraph, then that is the
> case.
>

Alright.


> You make it sound as if, in the sequence "something [RLE] more something \n
> even more [PDF]", the third part, saying " even more", will have an RTL
> level. That will simply not be the case with any UBA-conforming text editor,
> as UBA specifically says that any embedding levels are reset when the
> paragraph is terminated. This is because the embedding controls are embedded
> in the paragraph.
>
> In other words, a paragraph is a paragraph, with BiDi direction, and
> embedding is embedding. The two are not the same.
>

So we have established that sections of text separated by newlines are
paragraphs. Let us return to the issue. In a plain text file, as
defined above, there does exist a method by which the author of the
file may specify that a paragraph is to be RTL. Therefore there is no
need for HTML to send RTL emails, nor is there technical need for the
email client to guess. However, I agree that there is practical need
for the email client to guess as many users may not mark RTL
paragraphs as RTL (be they plain text or HTML).
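
As a sketch of what such marking could look like in plain text, a composer could prefix each paragraph the author wants RTL with U+200F RIGHT-TO-LEFT MARK, and each LTR one with U+200E LEFT-TO-RIGHT MARK. Both marks are strong characters, so a reader that takes the paragraph direction from the first strong character will take it from the mark. This assumes one paragraph per line and a reader that follows that rule; the function name is hypothetical.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, bidi class R
LRM = "\u200e"  # LEFT-TO-RIGHT MARK, bidi class L


def mark_paragraphs(paragraphs, rtl_flags):
    """Prefix each paragraph with RLM or LRM and join them with newlines."""
    marked = [(RLM if rtl else LRM) + p for p, rtl in zip(paragraphs, rtl_flags)]
    return "\n".join(marked)


# The first paragraph starts with a Latin word, so without the mark its
# first strong character would be L and it would be shown as LTR.
body = mark_paragraphs(["Linux הוא חופשי", "See you"], [True, False])
print([unicodedata.bidirectional(line[0]) for line in body.split("\n")])  # ['R', 'L']
```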

Have I forgotten anything?

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com



Re: Linux HTML mail agent with RTL and LTR paragraph explicit support

2012-06-25 Thread Eli Zaretskii
> Date: Mon, 25 Jun 2012 13:42:20 +0300
> From: Dotan Cohen 
> Cc: linux-il@cs.huji.ac.il
> 
> [...] plain text has no concept of a paragraph [...]

That's not true.  The UBA explicitly defines a paragraph as a chunk of
text delimited by paragraph separator characters:

  Paragraphs are divided by the Paragraph Separator or appropriate
  Newline Function [...]

And therefore this conclusion is incorrect:

> So if any arbitrary part of the text is to be RTL (no matter if the
> user calls it a paragraph or not) then it is to be marked as an
> embedded RTL section.
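
The paragraph division quoted above corresponds to rule P1 of the UBA: split the text at paragraph separators and keep each separator with the paragraph it ends. A rough sketch, treating every character whose bidi class is B as a separator and CR LF as a single one. It implements only this plain-text default, not any higher-level protocol's notion of a paragraph, and the function name is an assumption.

```python
import unicodedata


def split_paragraphs(text):
    """Split text after each paragraph separator (bidi class B)."""
    paragraphs = []
    start = 0
    i = 0
    while i < len(text):
        if unicodedata.bidirectional(text[i]) == "B":
            # Treat CR LF as one newline function.
            if text[i] == "\r" and i + 1 < len(text) and text[i + 1] == "\n":
                i += 1
            # The separator stays with the paragraph it terminates.
            paragraphs.append(text[start:i + 1])
            start = i + 1
        i += 1
    if start < len(text):
        paragraphs.append(text[start:])
    return paragraphs


print(split_paragraphs("a\r\nb\nc"))  # ['a\r\n', 'b\n', 'c']
```

Note that U+2028 LINE SEPARATOR has class WS, not B, so it does not end a paragraph.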



Re: Linux HTML mail agent with RTL and LTR paragraph explicit support

2012-06-25 Thread Shachar Shemesh
On 06/25/2012 06:21 PM, Dotan Cohen wrote:
> Shachar, before addressing the issue at hand, I would like to state
> an observation. When I reply to your mail, all text is of the same
> quote level. That is, there is a single > at the beginning of each
> line, whether it is a line that you wrote or a line that I wrote.
Until I debug this, I'm replying as plain text only.
> Are you referring to the use of linefeeds to designate the end of an
> embedded section?
No. I'm referring to paragraph terminators.

Dotan, may I suggest you go read the standard before making claims on
what it is saying?

From the standard (section 3), the UBA[1] is applied by using the
following four steps:
- Separation into paragraphs
- Initialization
- Resolution of the embedding levels
- Reordering

Paragraphs are resolved in steps 1 and 2, RLEs in step 3. They are simply
not the same thing. BD5 defines "paragraph direction".
> So we have established that sections of text separated by newlines are
> paragraphs. Let us return to the issue. In a plain text file, as
> defined above, there does exist a method by which the author of the
> file may specify that a paragraph is to be RTL.
There exist many. Specifically, the standard, which I urge you to read,
offers one, and then specifically says that others are also okay (i.e.,
not in violation of the standard). These are mentioned in the text right
after P3, and again at HL1.

It seems to me you are trying to force your agenda.

>  Therefore there is no
> need for HTML to send RTL emails, nor is there technical need for the
> email client to guess.
Except there is no standard, de facto or otherwise (as far as I'm aware),
on whether HL1 is being applied by email clients for plain text emails,
and the HTML standard is that HL1 is being applied, and paragraph
direction must be set.

> Have I forgotten anything?
Yes. To substantiate your claims.

Shachar

1- http://unicode.org/reports/tr9/

-- 
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com




Re: Linux HTML mail agent with RTL and LTR paragraph explicit support

2012-06-25 Thread Eli Zaretskii
> Date: Mon, 25 Jun 2012 17:19:01 +0300
> From: Shachar Shemesh 
> Cc: linux-il@cs.huji.ac.il
> 
> > \r\n, \r\n\r\n, or any other convention for a paragraph is arbitrary.
> Technically true, but both irrelevant and misleading. Misleading because
> the choice of \n or \r\n was arbitrary, but is now standard. Irrelevant
> because we are talking about the UBA, not "plain text" (whatever that
> means).

The UBA _is_ about plain text.



Re: Linux HTML mail agent with RTL and LTR paragraph explicit support

2012-06-25 Thread Eli Zaretskii
> Date: Mon, 25 Jun 2012 18:21:25 +0300
> From: Dotan Cohen 
> Cc: linux-il@cs.huji.ac.il
> 
> So we have established that sections of text separated by newlines are
> paragraphs. Let us return to the issue. In a plain text file, as
> defined above, there does exist a method by which the author of the
> file may specify that a paragraph is to be RTL. Therefore there is no
> need for HTML to send RTL emails, nor is there technical need for the
> email client to guess. However, I agree that there is practical need
> for the email client to guess as many users may not mark RTL
> paragraphs as RTL (be they plain text or HTML).
> 
> Have I forgotten anything?

Yes, a few things:

 . Base paragraph direction in plain text is determined by the first
   strong directional character in the paragraph.

 . A UBA-compliant MUA will determine the base direction in plain text
   as the UBA specifies, and will display an RTL paragraph starting at
   the right margin of the window.

 . A UBA-compliant MUA _may_ decide that a paragraph is defined by
   something other than a single newline etc. (e.g., Emacs uses the
   high level protocols clause to define a paragraph as text delimited
   by empty lines, i.e. two newlines in a row).  But whatever the
   definition of a paragraph, once a paragraph _is_ detected that
   starts with a strong RTL character, a compliant MUA will display it
   as RTL.

 . You can start the paragraph with the RLM or LRM character to force
   its base direction to be RTL or LTR, respectively, because these two
   characters have strong directionality.
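
The first two bullets describe rules P2 and P3. A minimal sketch using the bidi classes from Python's `unicodedata`: it returns the paragraph embedding level (0 for LTR, 1 for RTL) from the first character of class L, R or AL, falling back to 0 when there is none, as P3 says. It ignores the isolate handling that later versions of the UBA add to P2, and the function name is hypothetical. Because RLM and LRM have classes R and L, a leading mark decides the result, as the last bullet says.

```python
import unicodedata


def paragraph_embedding_level(paragraph):
    """Simplified P2/P3: 1 if the first strong character is R or AL, else 0."""
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return 1
        if bidi_class == "L":
            return 0
    # No strong character at all: P3 sets the level to zero (LTR).
    return 0


print(paragraph_embedding_level("שלום, world"))       # 1
print(paragraph_embedding_level("Hello, שלום"))       # 0
print(paragraph_embedding_level("123 456!"))          # 0
print(paragraph_embedding_level("\u200fHello, שלום"))  # 1 (leading RLM)
```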



Re: Linux HTML mail agent with RTL and LTR paragraph explicit support

2012-06-25 Thread Eli Zaretskii
> Date: Mon, 25 Jun 2012 20:02:48 +0300
> From: Shachar Shemesh 
> Cc: linux-il@cs.huji.ac.il
> 
> >  Therefore there is no
> > need for HTML to send RTL emails, nor is there technical need for the
> > email client to guess.
> Except there is no standard, de facto or otherwise (as far as I'm aware),
> on whether HL1 is being applied by email clients for plain text emails,

Yes, there is such a standard: the UBA.  It explicitly applies to
plain text, in the absence of any high-level protocols.  And there can
be no high-level protocols that disable determination of base
direction of a paragraph altogether.

> and the HTML standard is that HL1 is being applied, and paragraph
> direction must be set.

True.



Re: Linux HTML mail agent with RTL and LTR paragraph explicit support

2012-06-25 Thread Eli Zaretskii
> Date: Mon, 25 Jun 2012 08:06:14 +0300
> From: Shachar Shemesh 
> Cc: linux-il@cs.huji.ac.il
> 
> What the standard[1] suggests, but does not
> require, is the use of the first strong directional character in the
> paragraph.

You are wrong, it does require that.  See below.

> the standard says nothing about what to do with a paragraph with
> no strong directional character at all.

Not true:

  P3. If a character is found in P2 and it is of type AL or R, then
  set the paragraph embedding level to one; otherwise, set it to
  zero.

The last part means that a paragraph without any strong directional
character has L2R base direction.

>  2. This suggestion is non-normative. The standard explicitly states
> that a "higher level protocol" can be used to determine this property.

That is a strange interpretation of the UBA's language.  The higher
level protocols clause is clearly meant for _improving_ the display
by tailoring it to the needs of the application and its users.  It
certainly isn't meant to allow bypassing the determination of the
paragraph direction altogether; that simply makes no sense.

>  4. The only standard way to provide paragraph directionality in email
> is by sending it as HTML

I think this assertion is rather extreme.  Using a bidi-aware mail
client, you can certainly do that in plain text as well.  Please don't
forget that Unicode in general and the UBA in particular are first and
foremost about plain text.  Marked-up text, such as HTML, has other
means to specify directionality, and thus doesn't need the bidi
formatting control characters at all.  It only needs the implicit
levels and the corresponding reordering.

> most of the world uses various MS based email
> readers. Those don't do it, and they do not violate any standard by not
> doing it. As a result, if you want your email to be legible by any
> recipient, HTML mail is the way to go if you are writing in Hebrew.

This is a strange thing to read in a forum dedicated to advancing
GNU/Linux systems.  Since when do we in the Free Software movement bow
to badly implemented standards in MS software and take example from
there?

Btw, Outlook does support directional control characters; e.g., you
can use LRO..PDF to get a string of Hebrew characters displayed in
strict logical order.  It's just that it doesn't implement the
paragraph direction part of the UBA, no doubt because the
corresponding Windows text widget doesn't (I see the same problem in
Notepad).

> No. I am referring to all those who complain so violently when HTML mail
> is sent to the list.

I'm not going to complain, certainly not "violently", but please note
that using HTML does have its disadvantages.  E.g., look at one of your
messages as archived by the server for this forum:

  http://www.mail-archive.com/linux-il@cs.huji.ac.il/msg63031.html

All your careful formatting is gone, and the Hebrew text reads
awkwardly (at least in my Mozilla Firefox v13.0.1).

So I think using plain text and insisting on MUAs to support the UBA
does have its merits.  At least in Emacs 24, which I use for email (as
well as for many other things), I see no problems with that.



Re: Linux HTML mail agent with RTL and LTR paragraph explicit support

2012-06-25 Thread Shachar Shemesh
On 06/25/2012 08:13 PM, Eli Zaretskii wrote:
>> Date: Mon, 25 Jun 2012 20:02:48 +0300
>> From: Shachar Shemesh 
>> Cc: linux-il@cs.huji.ac.il
>>
>>>  Therefore there is no
>>> need for HTML to send RTL emails, nor is there technical need for the
>>> email client to guess.
>> Except there is no standard, de facto or otherwise (as far as I'm aware),
>> on whether HL1 is being applied by email clients for plain text emails,
> Yes, there is such a standard: the UBA.
HL1 is part of the UBA, even if you, personally, don't like it. Two
implementations, one opting to use HL1 and the other not, can both
conform to the UBA.

There is no standard on whether HL1 should be applied or not for plain
text email clients.
>   It explicitly applies to
> plain text, in the absence of any high-level protocols.
Outlook employs a higher level protocol. It is "all paragraphs are LTR,
unless the user presses CTRL+RIGHT SHIFT, in which case all paragraphs
are RTL". It is a valid, standard conforming protocol, even if Eli
Zaretskii doesn't approve.
>   And there can
> be no high-level protocols that disable determination of base
> direction of a paragraph altogether.
HL1 states the exact opposite. Last I checked (half an hour ago), it was
still part of the standard.

Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com




Re: Linux HTML mail agent with RTL and LTR paragraph explicit support

2012-06-25 Thread Eli Zaretskii
> Date: Mon, 25 Jun 2012 21:00:07 +0300
> From: Shachar Shemesh 
> CC: linux-il@cs.huji.ac.il
> 
> On 06/25/2012 08:13 PM, Eli Zaretskii wrote:
> >> Date: Mon, 25 Jun 2012 20:02:48 +0300
> >> From: Shachar Shemesh 
> >> Cc: linux-il@cs.huji.ac.il
> >>
> >>>  Therefore there is no
> >>> need for HTML to send RTL emails, nor is there technical need for the
> >>> email client to guess.
> >> Except there is no standard, de facto or otherwise (as far as I'm aware),
> >> on whether HL1 is being applied by email clients for plain text emails,
> > Yes, there is such a standard: the UBA.
> HL1 is part of the UBA, even if you, personally, don't like it.

Actually, I do like it.

> Two implementations, one opting to use HL1 and the other not, can
> both conform to the UBA.

True.

> There is no standard on whether HL1 should be applied or not for plain
> text email clients.

True again.

> Outlook employs a higher level protocol. It is "all paragraphs are LTR,
> unless the user presses CTRL+RIGHT SHIFT, in which case all paragraphs
> are RTL". It is a valid, standard conforming protocol

Again, I think such an interpretation is against the spirit of HL1.
Here's the full text:

  HL1. Override P3, and set the paragraph embedding level explicitly.

  . A higher-level protocol may set any paragraph level. This can be
    done on the basis of the context, such as on a table cell,
    paragraph, document, or system level. (P2 may be skipped if P3 is
    overridden). Note that this does not allow a higher-level protocol
    to override the limit specified in BD2.

  . A higher-level protocol may apply rules equivalent to P2 and P3
    but default to level 1 (RTL) rather than 0 (LTR) to match overall
    RTL context.

  . A higher-level protocol may use an entirely different algorithm
    that heuristically auto-detects the paragraph embedding level
    based on the paragraph text and its context. For example, it could
    base it on whether there are more RTL characters in the text than
    LTR. As another example, when the paragraph contains no strong
    characters, its direction could be determined by the levels of the
    paragraphs before and after.

This gives examples when a paragraph can be considered RTL even if it
formally doesn't fit the conditions of P3.  I don't understand how
this can be interpreted to mean "all paragraphs are LTR".

But whatever the interpretation, ...

> even if Eli Zaretskii doesn't approve.

... there's no need to get personal.  We can agree to disagree, you
know, and still be able to conduct a civilized discussion, devoid of
ad hominem.

> >   And there can
> > be no high-level protocols that disable determination of base
> > direction of a paragraph altogether.
> HL1 states the exact opposite.

I disagree.

> Last I checked (half an hour ago), it was still part of the
> standard.

True.
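
To make the options under discussion concrete, here is a sketch of a paragraph-level function a higher-level protocol might use, with one switch per HL1 bullet: an explicit level (for example a document-wide setting, which is how Shachar describes Outlook's behaviour), P2/P3 with an RTL default, or the "more RTL characters than LTR" heuristic that HL1 gives as an example. The names, the `policy` parameter, and the fallback to `default` on a tie are assumptions; whether the explicit-level case alone conforms is exactly what is being argued here.

```python
import unicodedata
from typing import Optional

RTL_CLASSES = ("R", "AL")


def _strong_classes(paragraph):
    """Bidi classes of the strong characters (L, R, AL) in logical order."""
    classes = (unicodedata.bidirectional(ch) for ch in paragraph)
    return [c for c in classes if c in ("L", "R", "AL")]


def paragraph_level(paragraph, explicit: Optional[int] = None,
                    policy="first-strong", default=0):
    # HL1, first bullet: the higher-level protocol sets the level outright.
    if explicit is not None:
        return explicit
    strong = _strong_classes(paragraph)
    if policy == "majority":
        # HL1, third bullet (its example): compare RTL and LTR counts.
        # Ties, including "no strong characters", fall back to default.
        rtl = sum(c in RTL_CLASSES for c in strong)
        ltr = len(strong) - rtl
        if rtl == ltr:
            return default
        return 1 if rtl > ltr else 0
    # P2/P3 (default=0), or HL1's second bullet (default=1).
    if strong:
        return 1 if strong[0] in RTL_CLASSES else 0
    return default


print(paragraph_level("abc", explicit=1))             # 1
print(paragraph_level("123", default=1))              # 1
print(paragraph_level("a שלום"))                      # 0
print(paragraph_level("a שלום", policy="majority"))   # 1
```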



Web gallery software

2012-06-25 Thread Mordechai Behar
Hi
Does anybody know/use any good, open source software for hosting a gallery
on a web server?
Ideally it should be:

   - indexed
   - searchable
   - easy to browse/navigate
   - have author pages
   - links to the same artwork in several sizes
   - and of course have different functionality for authors and people
   browsing.

Thanks.


Re: Web gallery software

2012-06-25 Thread shimi
On Mon, Jun 25, 2012 at 10:49 PM, Mordechai Behar <
mordecha.be...@mail.huji.ac.il> wrote:

> Hi
> Does anybody know/use any good, open source software for hosting a gallery
> on a web server?
> Ideally it should be:
>
>- indexed
>- searchable
>- easy to browse/navigate
>- have author pages
>- links to the same artwork in several sizes
>- and of course have different functionality for authors and people
>browsing.
>
> Thanks.
>
>
There's of course http://gallery.menalto.com/ - not sure about author pages
though

I think it does everything else and more...


-- Shimi


Re: Linux HTML mail agent with RTL and LTR paragraph explicit support

2012-06-25 Thread Nadav Har'El
On Mon, Jun 25, 2012, Shachar Shemesh wrote about "Re: Linux HTML mail agent 
with RTL and LTR paragraph explicit support":
> I disagree completely. The embedding control characters are designed
> for, well, embedding. What the standard[1] suggests, but does not
> require, is the use of the first strong directional character in the
> paragraph. The reasons this does not work for email are:

I remember how 11 years ago, when I wrote "bidiv", a simple command-line
tool to display Hebrew text files and emails (using the bidi algorithm
from fribidi), I had exactly the problems you described. While the
standard *does*, if I remember correctly, specify how the base direction
of each paragraph is determined (using the first character with a
strong direction), no standard really specified what in a text file is a
"paragraph". I ended up implementing several different algorithms, but
my favorite (and bidiv's default) became splitting up the text file into
paragraphs on empty lines.
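
A convention like the one described here, where paragraphs are separated by empty lines and each takes its direction from its first strong character, fits in a few lines. This is a sketch of the convention, not bidiv's actual code; treating whitespace-only lines as empty is an assumption, and the function names are hypothetical.

```python
import re
import unicodedata


def first_strong_direction(paragraph):
    """'rtl' if the first strong character is R or AL, otherwise 'ltr'."""
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    return "ltr"  # no strong character


def blank_line_paragraphs(text):
    """Split on one or more empty lines; pair each paragraph with its direction."""
    chunks = re.split(r"\n(?:[ \t]*\n)+", text)
    return [(c, first_strong_direction(c)) for c in chunks if c.strip()]


text = "שלום\nעולם\n\nHello\nworld\n\n\n123"
for paragraph, direction in blank_line_paragraphs(text):
    print(direction, repr(paragraph))
```

Under this convention the two Hebrew lines form one RTL paragraph, whereas the UBA's plain-text default would treat each line as a paragraph of its own.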

At the time, there was really no other tool for displaying bidi plain
text, so I hoped that this convention would be adopted by others.
I don't know if it ever was - I'm still hoping it is, or will be.
I certainly haven't seen a different convention. But my biggest fear
is Shachar's claim that:

>  4. The only standard way to provide paragraph directionality in email
> is by sending it as HTML

I still believe that there's merit to plain text - a document format
that is guaranteed to contain nothing but text - no icons, no fonts, no
bold, no underlines, nothing, just text. I honestly think that the best
format for emails (and also for instant-messaging, SMSs, tweets, and
various other types of textual messages) is plain text.

I refuse to admit that while plain text can still exist for English
text, it cannot be used for Hebrew text. If we're missing conventions on
how plain text is supposed to work in Hebrew, then by all means - let's
try to define these conventions. That's what I thought I did 11 years
ago with bidiv (see http://dev.man-online.org/man1/bidiv/)

> However, most of the world uses various MS based email
> readers.

Is this actually true nowadays? Honestly *nobody* I know uses any
MS-based email readers nowadays. Most home users are using some sort of
web mail. Others I know use Thunderbird, the mail client in iOS or Android,
Lotus Notes(!), and other stuff (some on this list use crazy things like
mutt and emacs ;-)).

> Those don't do it, and they do not violate any standard by not
> doing it. As a result, if you want your email to be legible by any
> recipient, HTML mail is the way to go if you are writing in Hebrew.
> Complaining to your recipient (or sender) that they are not doing it
> properly is both impolite and, which I feel many people here will see as
> worse, technically incorrect.

What do all these mail clients do about paragraphs et al. in Hebrew
plain-text mails? I have to admit - I don't know. It's worth checking.
I don't know if you yourself have actually checked them all...
Some of these mail clients are open source or based on open source, by the
way, so it wouldn't be too contrived to consider changing them to better
show Hebrew.

> No. I am referring to all those who complain so violently when HTML mail
> is sent to the list.

I also hate HTML email, but it has nothing to do with Hebrew. I equally
hate HTML email in English ;-)


-- 
Nadav Har'El|   Monday, Jun 25 2012, 6 Tammuz 5772
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |Long periods of drought are always
http://nadav.harel.org.il   |followed by rain.



Re: Linux HTML mail agent with RTL and LTR paragraph explicit support

2012-06-25 Thread Shachar Shemesh
On 06/25/2012 09:56 PM, Eli Zaretskii wrote:
>> Outlook employs a higher level protocol. It is "all paragraphs are LTR,
>> unless the user presses CTRL+RIGHT SHIFT, in which case all paragraphs
>> are RTL". It is a valid, standard conforming protocol
> Again, I think such an interpretation is against the spirit of HL1.
I'm not sure what "the spirit" of a standard is. Standards are
specifications. You are either conforming or non-conforming. This is by
design, and part of the reason that standards are formed by large
committees. It is an attempt to prevent threads like this one.

More to the point, I think you are reading HL1 wrong.

Usually, when writing standards, certain words have a meaning that has
special stress when used in standard. These words are explained in RFC
2119. Now, I will readily grant you that RFC 2119 does recommend that
standards that use the words with this meaning (which is, more or less,
the usual English meaning of the words) refer back to RFC 2119 and
capitalize those words. The UBA does neither. Further, it uses the word
"can", which is not defined by RFC 2119. Still, when trying to figure
out what can or cannot be written in conformance to a standard, I find
these words useful. In order to bring us to a more strict agreement on
what HL1 says, I've allowed myself to modify the text of the paragraph
to be in conformance to RFC 2119. I've mapped "can", which is not
defined by RFC 2119, to "MAY", which has, more or less, the same
meaning. I've capitalized the key words I changed, so you can check
whether you agree with my modifications (and because RFC 2119 suggests it):
> Here's the full text:
>
>   HL1. Override P3, and set the paragraph embedding level explicitly. 
>   
>   . A higher-level protocol MAY set any paragraph level.
Okay. We may. We may not. Both are okay.
>  This MAY be
> done on the basis of the context, such as on a table cell,
> paragraph, document, or system level.
These are nested options. I read it to say that even if you chose to
override the paragraph levels, you may still choose to use context or
not. Also, notice the "document" and "system level" options. These
clearly encompass the Outlook/Notepad use I mentioned in my previous
email, which you found to contradict HL1.

Lastly, note that the original verb, "can", is actually weaker than
"may". In plain English, "may" is fairly neutral, while "can" suggests
that the default action is not to.
>  (P2 MAY be skipped if P3 is
> overridden).
P2 is only used to produce data used by P3, so going through the motions
of P2 if P3 is not going to be used makes no sense. I would actually
recommend changing this to "SHOULD".
>  Note that this does not allow a higher-level protocol
> to override the limit specified in BD2.
So, if you do decide to explicitly set a paragraph embedding level, you
may only choose one between 0 and 61 inclusive. I think we're good.
>
>   . A higher-level protocol MAY apply rules equivalent to P2 and P3
> but default to level 1 (RTL) rather than 0 (LTR) to match overall
> RTL context.
>   
>   . A higher-level protocol MAY use an entirely different algorithm
> that heuristically auto-detects the paragraph embedding level
> based on the paragraph text and its context. For example, it could
> base it on whether there are more RTL characters in the text than
> LTR. As another example, when the paragraph contains no strong
> characters, its direction could be determined by the levels of the
> paragraphs before and after.
I think it is fairly clear that these two bullets are alternatives to
the first one (and, indeed, each other). For one thing, it is
technically impossible for an implementation to do all three at once;
they contradict each other. Since the first one covers our use, the fact
that these two don't does not matter.
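For comparison, the two alternative bullets can also be sketched. The Python below is a hypothetical illustration of each option: P2/P3 with an RTL default, a majority count of strong characters, and borrowing the level of neighboring paragraphs when a paragraph has no strong characters. The tie-breaking rule (ties go to LTR) and the neighbor rule (nearest preceding paragraph with a strong character, then the nearest following one, else LTR) are my own assumptions; HL1 does not specify them.

```python
import unicodedata
from typing import List, Optional


def first_strong_level(text: str) -> Optional[int]:
    # P2/P3: 0 for the first L character, 1 for the first R/AL, None if none.
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return 0
        if bidi_class in ("R", "AL"):
            return 1
    return None


def level_default_rtl(text: str) -> int:
    # Second bullet: same as P2/P3, but default to level 1 (RTL).
    level = first_strong_level(text)
    return 1 if level is None else level


def level_by_majority(text: str) -> int:
    # Third bullet, first example: more RTL than LTR strong characters -> RTL.
    ltr = sum(1 for ch in text if unicodedata.bidirectional(ch) == "L")
    rtl = sum(1 for ch in text if unicodedata.bidirectional(ch) in ("R", "AL"))
    return 1 if rtl > ltr else 0  # assumption: ties go to LTR


def levels_from_neighbors(paragraphs: List[str]) -> List[int]:
    # Third bullet, second example: a paragraph with no strong characters
    # takes the level of the nearest preceding strong paragraph, else the
    # nearest following one, else 0 (assumed policy).
    own = [first_strong_level(p) for p in paragraphs]
    resolved = []
    for i, level in enumerate(own):
        if level is None:
            before = [lv for lv in own[:i] if lv is not None]
            after = [lv for lv in own[i + 1:] if lv is not None]
            if before:
                level = before[-1]
            elif after:
                level = after[0]
            else:
                level = 0
        resolved.append(level)
    return resolved


mixed = "Linux-il שלום לכולם"
print(level_default_rtl("42"))                               # 1
print(first_strong_level(mixed), level_by_majority(mixed))   # 0 1
print(levels_from_neighbors(["שלום", "123", "hello"]))        # [1, 1, 0]
```

Note that the same mixed sentence gets different levels under P2/P3 and under the majority count, which is why the options cannot all be applied at once.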

All in all, I find it hard to see ANY policy for setting the paragraph
direction that would violate HL1.

Even if one did, however, please note that you can achieve the exact same
effect using only HL3, which is not limited in its use in any way, shape,
or form (personally, I think that neither is HL1, but I can see where
your disagreement comes from). Also note that the standard clearly states
that HL1 and HL3 are contained in what HL4 and HL5 allow you to do, and
these two, again unsurprisingly, allow you to achieve the exact same
effect. Again, nothing restricts how these two may be used.

Personally, I see no way to read this standard but as allowing arbitrary
determination of paragraph direction with no restrictions on how.
> But whatever the interpretation, ...
>
>> even if Eli Zaretskii doesn't approve.
> ... there's no need to get personal.  We can agree to disagree, you
> know, and still be able to conduct a civilized discussion, devoid of
> ad hominem.
I'm sorry about that. In my defense, you made it extremely difficult to
answer to the point. You simultaneously sent out seven (!!!) replies to
differ

Re: Linux HTML mail agent with RTL and LTR paragraph explicit support

2012-06-25 Thread Shachar Shemesh
On 06/25/2012 11:22 PM, Nadav Har'El wrote:
> On Mon, Jun 25, 2012, Shachar Shemesh wrote about "Re: Linux HTML mail agent 
> with RTL and LTR paragraph explicit support":
>> I disagree completely. The embedding control characters are designed
>> for, well, embedding. What the standard[1] suggests, but does not
>> require, is the use of the first strong directional character in the
>> paragraph. The reasons this does not work for email are:
> I remember how 11 years ago, when I wrote "bidiv", a simple command-line
> tool to display Hebrew text files and emails (using the bidi algorithm
> from fribidi), I had exactly the problems you described. While the
> standard *does*, if I remember correctly, specify how the base direction
> of each paragraph is determined
I would use "recommends" rather than "specify".
>  no standard really specified what in a text file is a
> "paragraph".
And lucky for you that they don't. Even with the simple case of a plain
text file, a paragraph is defined differently depending on whether the
display is expected to do line wrapping or not. Had one of them mandated
a single definition, your implementation would, in all likelihood, be
non-conforming.
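A minimal Python sketch of why wrapping matters: UBA rule P1 splits text at characters of bidi class B, and LF is one of them, so a hard-wrapped email turns into one "paragraph" per line. The sample text is made up, and the first-strong helper is a simplified P2/P3 without isolate handling.

```python
import unicodedata
from typing import List


def split_p1(text: str) -> List[str]:
    """UBA rule P1: split at paragraph separators (bidi class B, which
    includes LF); each separator stays with the preceding paragraph."""
    paragraphs, start = [], 0
    for i, ch in enumerate(text):
        if unicodedata.bidirectional(ch) == "B":
            paragraphs.append(text[start:i + 1])
            start = i + 1
    if start < len(text):
        paragraphs.append(text[start:])
    return paragraphs


def first_strong_level(text: str) -> int:
    # Simplified P2/P3: level of the first strong character, default 0.
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return 0
        if bidi_class in ("R", "AL"):
            return 1
    return 0


# The same Hebrew sentence, once left for the display to wrap and once
# hard-wrapped by the sender so that the second line starts with Latin text.
soft = "שלום, ראיתי את Linux-il אתמול\n"
hard = "שלום, ראיתי את\nLinux-il אתמול\n"

print([first_strong_level(p) for p in split_p1(soft)])  # [1]
print([first_strong_level(p) for p in split_p1(hard)])  # [1, 0]
```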
> At the time, there was really no other tool
for linux
>  for displaying bidi plain
> text, so I hoped that this convention would be adopted by others.
> I don't know if it ever was - I'm still hoping it is, or will be.
> I certainly haven't seen a different convention. But my biggest fear
> is Shachar's claim that:
>
>>  4. The only standard way to provide paragraph directionality in email
>> is by sending it as HTML
> I still believe that there's merit to plain text
I agree. There is a lot of merit to plain text. However, displaying BiDi
with plain text is difficult, and each implementation does it
differently. The problem is compounded further for text that was
line-broken before sending.

If you want your Hebrew email to appear as you have written it, you need
to send it in HTML.

Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com


___
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il


Re: Linux HTML mail agent with RTL and LTR paragraph explicit support

2012-06-25 Thread Eli Zaretskii
> Date: Mon, 25 Jun 2012 23:22:03 +0300
> From: Nadav Har'El 
> 
> On Mon, Jun 25, 2012, Shachar Shemesh wrote about "Re: Linux HTML mail agent 
> with RTL and LTR paragraph explicit support":
> > I disagree completely. The embedding control characters are designed
> > for, well, embedding. What the standard[1] suggests, but does not
> > require, is the use of the first strong directional character in the
> > paragraph. The reasons this does not work for email are:
> 
> While the standard *does*, if I remember correctly, specify how the
> base direction of each paragraph is determined (using the first
> character with a strong direction), no standard really specified
> what in a text file is a "paragraph".

Earlier I showed a citation from the UBA that does define that.

> I ended up implementing
> several different algorithms, but my favorite (and bidiv's default)
> became splitting up the text file into paragraphs on empty lines.

Right, and that's what Emacs 24 does, invoking the higher-level
protocols clause of the UBA.
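A higher-level protocol of the kind described here (paragraphs separated by empty lines, with the first strong character deciding each one's direction) might be sketched in Python as below. This is a hypothetical illustration, not the actual bidiv or Emacs code. Treating whitespace-only lines as empty is my assumption, and when the rest of the UBA runs, the newlines inside such a paragraph would presumably have to be treated as something other than paragraph separators (for example, as spaces).

```python
import unicodedata
from typing import List, Tuple


def split_on_blank_lines(text: str) -> List[str]:
    # Hypothetical protocol: a paragraph ends at an empty (or
    # whitespace-only) line; a single newline does not end a paragraph.
    paragraphs, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            paragraphs.append("\n".join(current))
            current = []
    if current:
        paragraphs.append("\n".join(current))
    return paragraphs


def first_strong_level(text: str) -> int:
    # Simplified P2/P3: level of the first strong character, default 0.
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return 0
        if bidi_class in ("R", "AL"):
            return 1
    return 0


def base_directions(text: str) -> List[Tuple[str, int]]:
    return [(p, first_strong_level(p)) for p in split_on_blank_lines(text)]


email = "שלום, ראיתי את\nLinux-il אתמול\n\nThanks\n"
for paragraph, level in base_directions(email):
    print("RTL" if level else "LTR", repr(paragraph))
```

With this protocol the hard-wrapped Hebrew paragraph stays RTL as a whole, even though its second line starts with Latin text.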



Re: Linux HTML mail agent with RTL and LTR paragraph explicit support

2012-06-25 Thread Eli Zaretskii
> Date: Tue, 26 Jun 2012 04:12:01 +0300
> From: Shachar Shemesh 
> CC: linux-il@cs.huji.ac.il
> 
> On 06/25/2012 09:56 PM, Eli Zaretskii wrote:
> >> Outlook employs a higher level protocol. It is "all paragraphs are LTR,
> >> unless the user presses CTRL+RIGHT SHIFT, in which case all paragraphs
> >> are RTL". It is a valid, standard conforming protocol
> > Again, I think such an interpretation is against the spirit of HL1.
> I'm not sure what "the spirit" of a standard is. Standards are
> specifications. You are either conforming or non-conforming. This is by
> design, and part of the reason that standards are formed by large
> committees. There is an attempt to prevent threads like this one.
> 
> More to the point, I think you are reading HL1 wrong.

Well, can we at least agree that rendering Hebrew paragraphs as RTL
makes them display better than what one sees in Outlook?



Re: Linux HTML mail agent with RTL and LTR paragraph explicit support

2012-06-25 Thread Eli Zaretskii
> Date: Tue, 26 Jun 2012 04:28:33 +0300
> From: Shachar Shemesh 
> Cc: linux-il@cs.huji.ac.il
> 
> >  no standard really specified what in a text file is a
> > "paragraph".
> And lucky for you that they don't. Even with the simple case of a plain
> text file, a paragraph is defined differently depending on whether the
> display is expected to do line wrapping or not.

No, line wrapping is on a different level, and the UBA discusses that
briefly.  I don't think there's an issue here.
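The point that line wrapping sits on a different level can be illustrated with UBA rule L2: embedding levels are resolved once for the whole paragraph, the paragraph is then broken into lines, and only then is each line reordered. The Python sketch below assumes the resolved levels are already known (computing them is the bulk of the UBA and is not shown), uses a naive fixed-width line break, skips rule L1, and follows the UBA examples' convention of uppercase letters standing in for RTL characters. The function names are hypothetical.

```python
from typing import List, Sequence


def reorder_line(text: str, levels: Sequence[int]) -> str:
    """UBA rule L2 for one line: from the highest level down to the lowest
    odd level, reverse every maximal run of characters at that level or higher."""
    if not text:
        return ""
    order = list(range(len(text)))  # logical indices, in visual order
    lowest = min(levels)
    lowest_odd = lowest if lowest % 2 == 1 else lowest + 1
    for level in range(max(levels), lowest_odd - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] >= level:
                j = i
                while j < len(order) and levels[order[j]] >= level:
                    j += 1
                order[i:j] = reversed(order[i:j])
                i = j
            else:
                i += 1
    return "".join(text[k] for k in order)


def wrap_and_reorder(text: str, levels: Sequence[int], width: int) -> List[str]:
    # Levels were resolved for the whole paragraph; breaking into lines only
    # slices them, so the paragraph level is the same on every line.
    return [reorder_line(text[i:i + width], levels[i:i + width])
            for i in range(0, len(text), width)]


# RTL paragraph (level 1): "ABC" and "GHI" are RTL letters, "def" is an
# LTR word embedded at level 2, and the spaces resolve to level 1.
text = "ABC def GHI"
levels = [1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1]
print(reorder_line(text, levels))         # IHG def CBA
print(wrap_and_reorder(text, levels, 8))  # [' def CBA', 'IHG']
```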
