From below, I believe that either step two (allowing an input of system
properties) or step three (fixing the include bug) would solve the problem.
I am not sure why the spec doesn't allow system properties to be added to
the Tomcat startup scripts (e.g., -Dfile.encoding="UTF-8"), but I
trust you.

Thanks and have some Happy Holidays,

Nathan
 

-----Original Message-----
From: Pierre Delisle
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: 12/21/00 3:42 PM
Subject: Bug #55: Default for included files is 8859_1, with no option to
set otherwise

Trying to close a few Jasper bugs before the holiday break.
I'd appreciate at least another pair of eyes to review what I believe
should be done on that one...  -- Pierre

-----
Bug #55

   -----
   Synopsis:
     Default for included files is 8859_1, with no option to set
     otherwise.

   Report Description:
     The default for reading an included file is ISO_8859_1. We can,
     of course, set pageContent to read UTF-8 (which is what we need it
     to be to support international code). Unfortunately, when there
     are two or more levels of encoding (or the pageContent type isn't
     set), the encoding that the JspReader gets is set to a hard-coded
     "ISO_8859_1", and it doesn't allow this to be set to anything else
     via the runtime system properties. In
     org.apache.jasper.compiler.JspReader (JspReader.java, line 158),
     encoding ALWAYS defaults to 8859_1, and file.encoding, when set
     from the System properties, is ignored. This is an easy fix: set
     encoding to
       encoding = System.getProperty("file.encoding", "8859_1");
     The result, typically, is that the file will flake out and
     convert all of the non-UTF-8 characters to US-ASCII, @%, etc.
     -----

I'm not sure I fully understand what's described there,
so here is what I believe should be done. 

The "encoding" for a JSP file is currently handled as follows:

1. In Compiler.java, we create a JspReader for the top-level
   ("including") jsp file using the 8859_1 encoding.

2. Using that JspReader, we check if there is a page directive
   with 'contentType' specified. If there is, then 
   a new JspReader for the page is created with the encoding set to the 
   "charset" specified in the contentType value of the page
   directive; otherwise we stick with the default 8859_1 encoding.

3. When a page is included, JspReader.pushFile() is called,
   and the encoding passed as argument appears to always
   be null. No encoding attribute can be specified in
   the "include" directive, so reading 'encoding' off of the
   attributes appears to be a bug in JspParseEventListener.
   Because it is null, it always defaults to 8859_1.
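
To make the three steps above concrete, here is a rough standalone sketch of
the current behavior as I read it. It is not the actual Jasper code: the
class name and the helpers charsetOf(), pageEncoding() and
includedFileEncoding() are made up for illustration, and the parsing of the
contentType value is only an assumption about how the "charset" is pulled
out.

```java
import java.util.Locale;

public class EncodingSketch {
    // Default used when nothing else is known (steps 1 and 3).
    static final String DEFAULT_ENCODING = "8859_1";

    // Hypothetical helper: extract the charset from a contentType value
    // such as "text/html; charset=UTF-8". Returns null if none is given.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // Step 2: use the charset of the page directive if there is one,
    // otherwise stick with the default.
    static String pageEncoding(String contentType) {
        String charset = charsetOf(contentType);
        return charset != null ? charset : DEFAULT_ENCODING;
    }

    // Step 3 as it behaves today: the argument is effectively always null
    // for an include, so the included file is read as 8859_1.
    static String includedFileEncoding(String encodingArg) {
        return encodingArg != null ? encodingArg : DEFAULT_ENCODING;
    }
}
```

With this sketch, a top-level page declaring "text/html; charset=UTF-8" is
read as UTF-8, but any file it includes goes through
includedFileEncoding(null) and falls back to 8859_1.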

If I understand the intent of the bug report correctly, we'd need the
following modifications:

- In step 2, if contentType is not specified in the "including" page,
  set the encoding to be:

     encoding = System.getProperty("file.encoding", "8859_1");

  This means that the default encoding of all JSP files at a site could
  be defined globally using system property "file.encoding".
  I don't think this is spec-compliant, and would be reluctant
  to make that change. 

- In step 3, use the encoding of the "including" page.

  This would fix what I believe is a bug in the current implementation.
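
Roughly, the two modifications could look like the sketch below. Again this
is only an illustration under my own assumptions, not a patch: the class and
method names are hypothetical, and the real change would live in
Compiler.java and JspReader.pushFile().

```java
public class ProposedEncodingSketch {
    static final String DEFAULT_ENCODING = "8859_1";

    // First modification (step 2): if the including page has no charset
    // in its contentType, fall back to the "file.encoding" system
    // property, and only then to 8859_1. This is the variant I doubt is
    // spec-compliant.
    static String topLevelEncoding(String charsetFromContentType) {
        if (charsetFromContentType != null) {
            return charsetFromContentType;
        }
        return System.getProperty("file.encoding", DEFAULT_ENCODING);
    }

    // Second modification (step 3): an included file inherits the
    // encoding of the including page instead of always getting 8859_1.
    // The null check is a defensive assumption of this sketch.
    static String includedFileEncoding(String includingPageEncoding) {
        return includingPageEncoding != null
                ? includingPageEncoding
                : DEFAULT_ENCODING;
    }
}
```

With the second modification alone, a page read as UTF-8 would have its
included files read as UTF-8 as well, without relying on any system
property.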


Comments?

    -- Pierre
