File type misclassification

2007-03-20 Thread David Kastrup

Please write in English if possible, because the Emacs maintainers
usually do not have translators to read other languages for them.

Your bug report will be posted to the emacs-pretest-bug@gnu.org mailing list.

Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:

Hi,

opening the following file in emacs-snapshot from Ubuntu Edgy
(sorry, I don't have a fresher CVS Emacs at work) will throw the
buffer into PostScript mode, presumably because it starts with "%!".
This seems rather like overkill.

Maybe it is already fixed in CVS: no idea.

%!TEX encoding = UTF-8 Unicode
%
% Example of Greek input
%
\documentclass[a4paper,greek]{europecv}
%\usepackage[T1]{fontenc}
\usepackage[10pt]{type1ec} % To use CB-Greek
\usepackage[greek,english]{babel} % This is mandatory
\usepackage{graphicx}
\ecvlastname{Επώνυμο}
\ecvfirstname{Όνομα}
\ecvaddress{Οδός, αριθμός, ταχυδρομικός κωδικός, πόλη, χώρα (Προαιρετικά, βλ. οδηγίες)}
\ecvtelephone{(Προαιρετικά, βλ. οδηγίες)}
\ecvfax{(Προαιρετικά, βλ. οδηγίες)}
\ecvemail{(Προαιρετικά, βλ. οδηγίες)}
\ecvnationality{(Προαιρετικά, βλ. οδηγίες)}
\ecvgender{(Προαιρετικά, βλ. οδηγίες)}
\ecvdateofbirth{\foreignlanguage{english}{You can use the Latin alphabet.}}

\begin{document}
\selectlanguage{greek}
  \begin{europecv}
  \ecvpersonalinfo
  \end{europecv}
\end{document} 

If emacs crashed, and you have the emacs process in the gdb debugger,
please include the output from the following gdb commands:
`bt full' and `xbacktrace'.
If you would like to further debug the crash, please read the file
/usr/share/emacs/22.0.50/etc/DEBUG for instructions.


In GNU Emacs 22.0.50.1 (i486-pc-linux-gnu, GTK+ Version 2.10.3)
 of 2006-09-19 on rothera, modified by Debian
 (Debian emacs-snapshot package, version 1:20060915-1)
X server distributor `The X.Org Foundation', version 11.0.70101000
configured using `configure  '--build' 'i486-linux-gnu' '--host' 
'i486-linux-gnu' '--prefix=/usr' '--sharedstatedir=/var/lib' 
'--libexecdir=/usr/lib' '--localstatedir=/var' '--infodir=/usr/share/info' 
'--mandir=/usr/share/man' '--with-pop=yes' 
'--enable-locallisppath=/etc/emacs-snapshot:/etc/emacs:/usr/local/share/emacs/22.0.50/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/22.0.50/site-lisp:/usr/share/emacs/site-lisp:/usr/share/emacs/22.0.50/leim'
 '--with-x=yes' '--with-x-toolkit=gtk' 'build_alias=i486-linux-gnu' 
'host_alias=i486-linux-gnu' 'CFLAGS=-DDEBIAN -DSITELOAD_PURESIZE_EXTRA=5000 -g 
-O2''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8
  default-enable-multibyte-characters: t

Major mode: PostScript

Minor modes in effect:
  server-buffer-clients: (server <*5*>)
  shell-dirtrack-mode: t
  TeX-PDF-mode: t
  server-mode: t
  desktop-save-mode: t
  tooltip-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  unify-8859-on-encoding-mode: t
  utf-translate-cjk-mode: t
  auto-compression-mode: t
  line-number-mode: t

Recent input:
   B  y e s  
q g M-x g n u  C-g C-x C-v  n o  
C-c C-n M-x r e v e r t - b u   y e s 
 M-x r e p o r t - e m  

Recent messages:
nnml: Reading incoming mail from file...
nnml: Reading incoming mail (no new mail)...done
Opening nndoc server on /tmp/bastmail...done
Checking new news...done
Auto-saving...
Loading ps-mode...done
When done with a buffer, type C-x #
Quit
find-alternate-file: Aborted
Loading emacsbug...done

-- 
David Kastrup


Re: File type misclassification

2007-03-20 Thread David Kastrup
[EMAIL PROTECTED] (Kim F. Storm) writes:

> David Kastrup <[EMAIL PROTECTED]> writes:
>
>>> 1) Restrict the magic for PostScript files to %!PS
>>>
>>>  ("%!PS" . ps-mode)
>>
>> And probably EPS?
>
> Dunno.
>
>>>
>>> 2) Recognize the specific case of TEX
>>>
>>>  ("%![^VT]" . ps-mode)
>
>> Sigh.  Seems like a magic string for the "TeXshop" TeX editor.  But I
>> think just ruling out [VT] is still asking for trouble.
>
> So maybe add this to the magic-mode-alist before the ps rule:
>
>   ("%!TEX" . tex-mode)

That makes "%!" even less discriminatory than your last proposal.  The
PostScript magic is _far_ too lenient.

I find strings like the following:

%!PS-Adobe-2.0

The Ghostscript example files (created manually, apparently) start
with

%!

and nothing else.  I think it perfectly feasible to detect their
document type on the file name alone.

EPS files seem to start with something like

%!PS-Adobe-2.0 EPSF-2.0

So I'd propose making the magic string for PostScript at least
%!PS

While it may be conceivable to also allow %! on a line of its own, I
would judge that too weak for content-based detection.
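
To make the comparison concrete, here is a minimal sketch in Python (not Emacs Lisp) that runs each magic pattern proposed in this thread against the first lines quoted above. It assumes the patterns are matched at the very start of the file and that these simple patterns behave the same under Python's `re` as under Emacs regexps; the dictionary labels and function names are made up for illustration.

```python
import re

# Candidate magic patterns proposed in the thread.
CANDIDATES = ["%!", "%![^V]", "%![^VT]", "%!PS"]

# First lines quoted in the thread; the keys are illustrative labels only.
FIRST_LINES = {
    "adobe-ps": "%!PS-Adobe-2.0\n",
    "eps": "%!PS-Adobe-2.0 EPSF-2.0\n",
    "bare-ps": "%!\n",
    "tex": "%!TEX encoding = UTF-8 Unicode\n",
}

def matches(pattern, text):
    """Return True if pattern matches at the very start of text."""
    return re.match(pattern, text) is not None

def classified_as_ps(pattern):
    """Labels of the sample first lines that pattern would send to ps-mode."""
    return [label for label, line in FIRST_LINES.items() if matches(pattern, line)]

if __name__ == "__main__":
    for pattern in CANDIDATES:
        print(f"{pattern!r:12} -> {classified_as_ps(pattern)}")
```

Under these assumptions only "%!PS" leaves both the TeX file and the bare "%!" files to the file name, which is the trade-off proposed above.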

-- 
David Kastrup


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: File type misclassification

2007-03-20 Thread David Kastrup
[EMAIL PROTECTED] (Kim F. Storm) writes:

> "Juanma Barranquero" <[EMAIL PROTECTED]> writes:
>
>> On 3/20/07, David Kastrup <[EMAIL PROTECTED]> wrote:
>>
>>> opening the following file in emacs-snapshot from Ubuntu Edgy
>>> (sorry, I don't have a fresher CVS Emacs at work) will throw the
>>> buffer into PostScript mode, presumably because it starts with "%!".
>>> This seems rather like overkill.
>>
>> Yep. It's magic-mode-alist's doing:
>>
>> ("%![^V]" . ps-mode)
>
> First line of the file reads:
>
> %!TEX encoding = UTF-8 Unicode
>
>
> I see three fixes:
>
>
> 1) Restrict the magic for PostScript files to %!PS
>
>  ("%!PS" . ps-mode)

And probably EPS?

>
> 2) Recognize the specific case of TEX
>
>  ("%![^VT]" . ps-mode)

I don't think that there is a special case here: it would be my guess
that the author just picked that string by chance.

[Google]

Sigh.  Seems like a magic string for the "TeXshop" TeX editor.  But I
think just ruling out [VT] is still asking for trouble.

> 3) Remove it from magic-mode-alist.

Also an option in my book.  But I think we should start by making the
string much more discriminatory.  There is no harm if we overdo it: in
general, the file extension will catch what we don't, effectively
giving us option 3) for those cases.
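
The fallback argument can be sketched as follows, in Python rather than Emacs Lisp. This is only a rough model of "content magic first, file extension second", not the actual Emacs lookup code; the rule tables, the `guess_mode` name, the "fundamental-mode" default and the file name `out.001` are assumptions made for the example.

```python
import os
import re

# Strict content-based rule, checked first (mirrors the proposed "%!PS").
MAGIC_RULES = [(re.compile(r"%!PS"), "ps-mode")]

# Extension-based rules, used only when no magic rule matched.
# This small table is illustrative, not a copy of auto-mode-alist.
EXTENSION_RULES = {".ps": "ps-mode", ".eps": "ps-mode", ".tex": "tex-mode"}

def guess_mode(filename, head):
    """Pick a mode from the start of the file, else from the extension."""
    for pattern, mode in MAGIC_RULES:
        if pattern.match(head):
            return mode
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_RULES.get(extension, "fundamental-mode")

if __name__ == "__main__":
    # Hand-written PostScript with a bare "%!": caught by the extension.
    print(guess_mode("vasarely.ps", "%!\n"))
    # The TeX file from the report: no longer hijacked by the magic rule.
    print(guess_mode("greek-utf8.tex", "%!TEX encoding = UTF-8 Unicode\n"))
    # No telling extension: the strict magic rule still recognizes it.
    print(guess_mode("out.001", "%!PS-Adobe-2.0\n"))
```

With a strict magic rule, a miss costs nothing as long as the file name is informative, and the magic rule remains useful for files whose names say nothing.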

-- 
David Kastrup





Re: File type misclassification

2007-03-20 Thread David Kastrup
Stefan Monnier <[EMAIL PROTECTED]> writes:

>>>> Sigh.  Seems like a magic string for the "TeXshop" TeX editor.  But I
>>>> think just ruling out [VT] is still asking for trouble.
>>> I think a bug report to the TeXshop is in order.
>> Uh, you people are joking, right?
>
> Nope!
>
>> It is not a bug in TeXshop if Emacs' magic-mode-alist goes out of control
>> and calls everything "PostScript".
>
> The %! thingy is not Emacs's invention.  It's how postscript was
> specified.

The only relevant standard I can find starts off with "%!PS-Adobe".
In contrast, %! is far too generic to be useful.  It may be a
heuristic for a PostScript interpreter to decide whether it is getting
fed PostScript on stdin.  But it does not sound like a useful
heuristic for a text editor to decide whether a named file contains
PostScript code or anything else.

> And for that reason `file greek-utf8.tex' agrees with Emacs.
>
> This said, I'd be happy to see the %! entry removed from
> magic-mode-alist, because I think magic-mode-alist should really be
> kept to its absolute strictest minimum.

I don't think that "%!PS" has comparable potential to do accidental
harm.  Whether it does noticeable good is a different question
altogether.

However, dvips -i produces PostScript files where the extension is
replaced by a serial number.  Those will not be recognized as
PostScript without magic number detection.  "%!PS" is completely
sufficient for that purpose, however.

I think that little except hand-crafted PostScript would ever start
with "%!" alone, and hand-crafted PostScript will have a proper file
name.

Even if one uses
dvips -N
(which disables structured comments) the file starts with
%!PS (but not EPSF; comments have been disabled)

So I think that "%!PS" _does_ have some usefulness, and it is clearly
not as overboard as "%!".

-- 
David Kastrup





Re: File type misclassification

2007-03-21 Thread David Kastrup
Stefan Monnier <[EMAIL PROTECTED]> writes:

>>>>> opening the following file in emacs-snapshot from Ubuntu Edgy
>>>>> (sorry, I don't have a fresher CVS Emacs at work) will throw the
>>>>> buffer into PostScript mode, presumably because it starts with
>>>>> "%!".  This seems rather like overkill.
>
> Same old problem where the file's content and the file's extension do not
> agree on what the file actually contains.  And once again, the file
> extension is the better predictor whereas Emacs uses magic-mode-alist in
> preference to auto-mode-alist.
>
>> Sigh.  Seems like a magic string for the "TeXshop" TeX editor.  But I
>> think just ruling out [VT] is still asking for trouble.
>
> I think a bug report to the TeXshop is in order.

Uh, you people are joking, right?  It is not a bug in TeXshop if
Emacs' magic-mode-alist goes out of control and calls everything
"PostScript".

>>> 3) Remove it from magic-mode-alist.
>> Also an option in my book.
>
> Agreed, a very good option I'd say.  Especially since editing
> postscript is rather uncommon.

Since I don't seem too good at explaining what appears as common sense
to me, I'll fix the magic expression myself to "%!PS".  That's still a
less drastic change than removing it altogether, and most people seem
to agree that the latter option would be quite feasible.

This won't catch "%!\n" which seems to be used in some hand-crafted PS
files, but then the hand-crafted files (vasarely.ps, for example)
should be discernible by file extension, and one would have to
actually use some generic line-ending recognizer instead of "\n",
anyway, since PostScript has no fixed line-ending convention.

If others find they want even the "%!PS" gone (which I don't really
see as a problem), or want to add some form of "%!lineend", feel free
to discuss and fix.
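
For anyone who does want the "magic followed by a line end" variant, here is a minimal sketch of a generic line-ending recognizer for the bare two-character PostScript marker `%!`, written in Python for illustration. It assumes CR, LF and CR LF are the line endings worth accepting; the names `BARE_MAGIC`, `STRICT_MAGIC` and `looks_like_postscript` are hypothetical.

```python
import re

# "%!" immediately followed by a line end. Matching a single CR or LF is
# enough: it covers lone CR, lone LF, and the CR of a CR LF pair.
BARE_MAGIC = re.compile(r"%![\r\n]")

# The strict header discussed in the thread.
STRICT_MAGIC = re.compile(r"%!PS")

def looks_like_postscript(head):
    """True if the start of the file carries either form of the marker."""
    return bool(STRICT_MAGIC.match(head) or BARE_MAGIC.match(head))

if __name__ == "__main__":
    for head in ["%!\n", "%!\r", "%!\r\n", "%!PS-Adobe-2.0\r",
                 "%!TEX encoding = UTF-8 Unicode\n"]:
        print(repr(head), looks_like_postscript(head))
```

A file consisting of "%!" alone with no line end at all would still be missed by this sketch; whether that case matters is left open here.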

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

