Thorsten,

On 5/2/25 2:49 PM, Thorsten Heit wrote:
please excuse the long delay in answering (unplanned holidays...)

Tomcat is never going to figure out what MIME type should be used for a request like "/my/servlet/app?version=!!1.22.32-4-g8a3c060!!"

So I think Mark is probably right (well, he's right like 99.999% of the time, so...) about this being related to https://bz.apache.org/bugzilla/show_bug.cgi?id=69623 but I suspect your servlet is not explicitly setting a content-type.

It took quite some time debugging into our application to find out the place where the output difference happens between Tomcat 10.1.39 and 10.1.40:

We're using our own template engine that renders the HTML output, depending on the actual state of the application for the (logged-in) user. Basically the code is working the same way as this minimal reproducer:

Thanks for the reproducer. I can confirm that on my Tomcat/10.1.41-dev environment I am receiving this response:

$ curl -v http://localhost:8080/test/hello
* Host localhost:8080 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8080...
* Connected to localhost (::1) port 8080
> GET /test/hello HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 200
< Content-Type: content/unknown;charset=UTF-8
< Content-Length: 226
< Date: Mon, 05 May 2025 15:48:59 GMT
<
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" >
    <title>Hello World!</title>
  </head>
  <body>
    <p>Hello World!</p>
  </body>
</html>

* Connection #0 to host localhost left intact

Note that your call to getResource("HelloWorld.html") is technically illegal as your string must begin with a / but it worked once I got HelloWorld.html into the right place :)

If I make the following change, I get the expected text/html content-type coming back:

    var output = baos.toByteArray();

    response.setContentLength(length);
-    response.setContentType(contentType);
+    //response.setContentType(contentType);
    try (var os = response.getOutputStream()) {
      os.write(output, 0, output.length);

That is, if you don't try to change the content-type a second time.

So it seems that URLConnection.getContentType is returning an unexpected content type.

ServletContext.getResource().openConnection() is returning an instance of org.apache.catalina.webresources.CachedResource$CachedResourceURLConnection

The code for CachedResourceURLConnection.getContentType is:

        @Override
        public String getContentType() {
            // "content/unknown" is the value used by sun.net.www.URLConnection. It
            // is used here for consistency.
            return Objects.requireNonNullElse(getResource().getMimeType(), "content/unknown");
        }

So I think this is "expected behavior", though it may not be expected by you.
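To illustrate why a caller-side null check stops firing once the method never returns null, here is a minimal standalone sketch of the same pattern. The class FakeResourceConnection and its nullable mimeType field are hypothetical stand-ins for Tomcat's cached resource, not its real API; only Objects.requireNonNullElse is the actual JDK call.

```java
import java.util.Objects;

public class NullDefaultDemo {
    // Hypothetical stand-in for a resource whose MIME type may be unknown (null).
    static final class FakeResourceConnection {
        private final String mimeType;

        FakeResourceConnection(String mimeType) {
            this.mimeType = mimeType;
        }

        // Same pattern as the method quoted above: never returns null.
        String getContentType() {
            return Objects.requireNonNullElse(mimeType, "content/unknown");
        }
    }

    // Caller-side logic as in the reproducer: only a null result triggers the fallback.
    static String callerContentType(FakeResourceConnection connection) {
        String contentType = connection.getContentType();
        if (null == contentType) {
            contentType = "text/html";
        }
        return contentType;
    }

    public static void main(String[] args) {
        // Unknown MIME type: the null check is bypassed and the sentinel leaks through.
        System.out.println(callerContentType(new FakeResourceConnection(null)));
        // Known MIME type: passed through unchanged.
        System.out.println(callerContentType(new FakeResourceConnection("text/css")));
    }
}
```

With an unknown MIME type the caller ends up with "content/unknown" instead of its intended "text/html" default, which matches the header seen in the curl output.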

My assertion that returning content/unknown was a Tomcat bug was based upon the idea that Tomcat was either providing the wrong default (as per HTTP spec) MIME type for an HTTP response, or that Tomcat was somehow ignoring the content type you explicitly requested, but neither is the case.

The cached resource is not providing the correct MIME type, but that content type is being used without re-checking it in any way.

Instead of using the content type coming from the URLConnection, perhaps you want to use name-based MIME type detection?

contentType = request.getServletContext().getMimeType("HelloWorld.html");
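A slightly fuller sketch of that idea, written so it runs without a servlet container: the lookup is passed in as a function, which in a real servlet could be request.getServletContext()::getMimeType (that method returns null for unknown names). The class name, the demoLookup helper, and its two-entry extension table are hypothetical and only there to make the example self-contained.

```java
import java.util.Map;
import java.util.function.Function;

public class NameBasedContentType {
    // Resolves a MIME type by file name, using the fallback when the lookup returns null.
    static String resolve(Function<String, String> mimeLookup, String fileName, String fallback) {
        String byName = mimeLookup.apply(fileName);
        return byName != null ? byName : fallback;
    }

    // Hypothetical, tiny extension table standing in for the container's MIME registry.
    static final Map<String, String> DEMO_TYPES = Map.of("html", "text/html", "css", "text/css");

    // Returns the MIME type for the file's extension, or null if there is none or it is unknown.
    static String demoLookup(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? null : DEMO_TYPES.get(fileName.substring(dot + 1));
    }

    public static void main(String[] args) {
        System.out.println(resolve(NameBasedContentType::demoLookup, "HelloWorld.html", "application/octet-stream"));
        System.out.println(resolve(NameBasedContentType::demoLookup, "README", "application/octet-stream"));
    }
}
```

Keeping the lookup as a parameter means the same helper works whether the type comes from the container's registry or from some other source.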

But the change that did it was 84608e11906b4d56e74a3ea2f5a4df0b9e8ee09a:

+
+        @Override
+        public String getContentType() {
+            // "content/unknown" is the value used by sun.net.www.URLConnection. It
+            // is used here for consistency.
+            return Objects.requireNonNullElse(getResource().getMimeType(), "content/unknown");
+        }

This was added to fix this bug:
https://bz.apache.org/bugzilla/show_bug.cgi?id=69623

It seems that returning a null MIME type here causes problems downstream. I might change your code to this:

-    if (null == contentType) {
+    if (null == contentType || contentType.equals("content/unknown")) {
      contentType = "text/html";
    }

This will make the code compatible with Sun's URLConnection class when it can't determine the content-type of a resource.
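If the same check is needed in more than one place, it could be pulled into a small helper along these lines. This is only a sketch; the class and method names are made up for illustration. It treats both null (what a connection may return when it has no type) and the "content/unknown" sentinel as "no usable type", so the calling code does not depend on which of the two the connection reports.

```java
public class ContentTypeFallback {
    // Sentinel reported when the content type of a resource cannot be determined.
    static final String UNKNOWN = "content/unknown";

    // Returns the fallback when the reported type is missing or the sentinel.
    static String orDefault(String contentType, String fallback) {
        if (contentType == null || contentType.equals(UNKNOWN)) {
            return fallback;
        }
        return contentType;
    }

    public static void main(String[] args) {
        System.out.println(orDefault(null, "text/html"));              // no type reported
        System.out.println(orDefault("content/unknown", "text/html")); // sentinel reported
        System.out.println(orDefault("image/png", "text/html"));       // real type kept
    }
}
```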

I think an argument might be made for attempting to use Tomcat's existing filename-based MIME registry from within this class, though.

-chris


package com.example;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

import jakarta.servlet.ServletException;
import jakarta.servlet.annotation.WebServlet;
import jakarta.servlet.http.HttpServlet;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;

@WebServlet("/HelloWorld")
public class HelloWorld extends HttpServlet {
   @Override
   protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
     response.setContentType("text/html; charset=UTF-8");
     var resource = request.getServletContext().getResource("HelloWorld.html");
     var connection = resource.openConnection();

     String contentType = connection.getContentType();
     if (null == contentType) {
       contentType = "text/html";
     }
     var length = connection.getContentLength();
     var baos = new ByteArrayOutputStream();
     try (var is = connection.getInputStream()) {
       is.transferTo(baos);
     }
     var output = baos.toByteArray();

     response.setContentLength(length);
     response.setContentType(contentType);
     try (var os = response.getOutputStream()) {
       os.write(output, 0, output.length);
     }
   }
}


web.xml:

<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
                       http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd"
   version="3.1" metadata-complete="true">
   <display-name>Hello World Web Application</display-name>
   <servlet>
     <servlet-name>HelloWorld</servlet-name>
     <servlet-class>com.example.HelloWorld</servlet-class>
   </servlet>
   <servlet-mapping>
     <servlet-name>HelloWorld</servlet-name>
     <url-pattern>/*</url-pattern>
   </servlet-mapping>
</web-app>


HelloWorld.html:

<!DOCTYPE html>
<html>
   <head>
     <meta charset="UTF-8">
     <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" >
     <title>Hello World!</title>
   </head>
   <body>
     <p>Hello World!</p>
   </body>
</html>



With Tomcat 10.1.39 I'm getting the following result with curl:

%> curl -v http://localhost:8080/hello

(...)
* Request completely sent off
< HTTP/1.1 200
< Content-Type: text/html;charset=UTF-8
< Content-Length: 212
(...)


With Tomcat 10.1.40:

(...)
* Request completely sent off
< HTTP/1.1 200
< Content-Type: content/unknown;charset=UTF-8
< Content-Length: 212
(...)


The reason is exactly what you assumed and the change that Mark mentioned:
Since 10.1.40 the class CachedResource$CachedResourceURLConnection now has a new method "public String getContentType()" that is causing this difference...


OK, we could change our code so that when the content type is "content/unknown" we replace it with "text/html". OTOH, with respect to our customers this isn't really a good solution: on the one hand, some of them run older releases that would have to be patched. On the other hand, we normally don't have control over their environment; we could only give advice, in this case not to upgrade to Tomcat 10.1.40...


WDYT?

Thorsten

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org


