On Dec 22, 2007 3:38 AM, Dermot <[EMAIL PROTECTED]> wrote:
>
>
>
> On 22/12/2007, Jay Savage <[EMAIL PROTECTED]> wrote:
> >
> > On Dec 20, 2007 3:54 AM, Dr.Ruud <[EMAIL PROTECTED]> wrote:
> > > Rob Dixon schreef:
> > > > Dr.Ruud wrote:
> > > >> Jay Savage schreef:
> > > >>> Corin Lawson wrote:
> >
> >
> > I also think you may be confusing logical punctuation with typography
> > and character encodings. Any use of two consecutive quotation marks
> > is, by definition, a double quote. Whether that is represented by one
> > or more characters in a given encoding, and whether the visual and/or
> > programmatic representation of those marks is, e.g. ',`,<,&#8216;,
> > will, of course, depend on your locale and encoding. Some encodings
> > and markups do provide shortcuts for common doublings that one should
> > be aware of. For instance, HTML provides characters &#8220; and
> > &#8221;, ASCII provides character 0x22, and Perl itself has the qq//
> > operator. The existence of these typographical and programmatic
> > conventions and shortcuts, though, doesn't mean that e.g. "``" is in
> > any way less of a double quote than e.g. """. This is precisely why
> > languages like LaTeX separate out the logical quote from the
> > typographical representation.
>
> I hope I am not putting my size 9s in it here but I want to make sure I am
> getting the point correct.
>
> Is it correct then that 2 x ' is the same as 1 x " when looked at from a
> pattern-matching point of view? Put another way, is a single ASCII octal value
> 42 the same as 2 ASCII values 47 in the context of the Perl regex engine?
> Or is it the other way round: because it's possible to encode one way or the
> other, does the encoding dictate what's to be searched for?
>
> Sorry to labour the point, it just roused my curiosity.
> Dp.

No, searching for qq/\x22word\x22/ will require a different regex from
searching for qq/\x27\x27word\x27\x27/. I was just defending my
decision to refer to \x27\x27 as a "double quote." The existence of
042 is just a holdover from the handpress era, when combining
frequently-used combinations of characters enabled typesetters to set
texts more efficiently.
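
To make the difference concrete, here is a minimal sketch, assuming the
quoted text is a single run of word characters. The variable names are
made up for illustration. It shows one pattern per spelling of the
quotes, plus one pattern that accepts either spelling as long as the
opening and closing marks agree:

```perl
use strict;
use warnings;

# Two strings that can look alike on screen but differ byte for byte.
my $one_char  = qq/\x22word\x22/;          # one 0x22 on each side
my $two_chars = qq/\x27\x27word\x27\x27/;  # two 0x27 on each side

# Each pattern matches only its own spelling of the quotes.
my $re_one = qr/\A\x22(\w+)\x22\z/;
my $re_two = qr/\A\x27\x27(\w+)\x27\x27\z/;

# One pattern for both spellings: \1 must repeat whatever opened the quote.
my $re_either = qr/\A(\x22|\x27\x27)(\w+)\1\z/;

for my $s ($one_char, $two_chars) {
    printf "%-10s one:%d two:%d either:%d\n",
        $s,
        ($s =~ $re_one)    ? 1 : 0,
        ($s =~ $re_two)    ? 1 : 0,
        ($s =~ $re_either) ? 1 : 0;
}
```

Writing the marks as \x22 and \x27 escapes keeps the pattern readable
in fonts where the two spellings are indistinguishable.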

Unfortunately, in most fonts there isn't any way to visually
differentiate between a single character 042 and two characters 047,
which is a problem. '' and " look exactly the same in most
variable-width fonts, and that is the point. ASCII 042 is a character.
So are Unicode code points 8220 and 8221. "Double quote," though, isn't
a character, it's a function.
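
If the font won't show the difference, Perl will. A rough sketch
(variable names are only for illustration) that prints the length and
code points of each candidate, using the character with octal code 042,
a pair of 047 characters, and the curly quotes that HTML writes as
&#8220; and &#8221;:

```perl
use strict;
use warnings;

my $single_char = "\x22";      # one character, ASCII octal 042
my $two_chars   = "\x27\x27";  # two characters, each ASCII octal 047
my $curly_open  = "\x{201C}";  # decimal 8220, HTML &#8220;
my $curly_close = "\x{201D}";  # decimal 8221, HTML &#8221;

# length() counts characters; ord() gives the code point of one character.
for my $s ($single_char, $two_chars, $curly_open, $curly_close) {
    my @points = map { sprintf('U+%04X', ord($_)) } split(//, $s);
    printf "length=%d code points=%s\n", length($s), join(' ', @points);
}
```

Only the hex code points are printed, never the characters themselves,
so the output reads the same whatever the terminal's font or encoding.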


HTH,

-- jay
--------------------------------------------------
This email and attachment(s): [  ] blogable; [ x ] ask first; [  ]
private and confidential

daggerquill [at] gmail [dot] com
http://www.tuaw.com  http://www.downloadsquad.com  http://www.engatiki.org

values of β will give rise to dom!
