On 28/04/2023 15:06, Nicolas George wrote:
Max Nikulin (12023-04-28):
So URI comparison is not a trivial task.

It is an impossible task unless you have specific information about the
workings of the website.

However some steps toward URL normalization should still be tried.
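To make "some steps" concrete, here is a minimal Python sketch that uses only the standard library. It applies the syntax-based normalization of RFC 3986 section 6.2.2: lowercase scheme and host, uppercase percent-encoding hex digits, decode percent-encoded unreserved characters, and remove dot segments. It also applies two scheme-based steps from section 6.2.3: drop the default port, and turn an empty path into "/". Dropping the fragment is my own assumption for comparing downloads, and the names normalize_url and DEFAULT_PORTS are hypothetical. Equal output suggests the same resource. Different output proves nothing, which is the point made above.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Scheme-based default ports (RFC 3986, section 6.2.3); hypothetical, extend as needed.
DEFAULT_PORTS = {"http": 80, "https": 443}
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def _normalize_percent(s):
    """Decode %XX for unreserved characters; uppercase the hex digits of the rest."""
    def repl(m):
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else "%" + m.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, s)


def _remove_dot_segments(path):
    """RFC 3986, section 5.2.4: resolve "." and ".." segments."""
    inp, out = path, []
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        elif inp.startswith("/./"):
            inp = inp[2:]
        elif inp == "/.":
            inp = "/"
        elif inp.startswith("/../") or inp == "/..":
            inp = "/" + inp[4:]
            if out:
                out.pop()
        elif inp in (".", ".."):
            inp = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            idx = inp.find("/", 1 if inp.startswith("/") else 0)
            idx = len(inp) if idx == -1 else idx
            out.append(inp[:idx])
            inp = inp[idx:]
    return "".join(out)


def normalize_url(url):
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urllib already lowercases the host
    if ":" in host:  # IPv6 literal: restore the brackets
        host = f"[{host}]"
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc += f":{parts.port}"
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo += ":" + parts.password
        netloc = userinfo + "@" + netloc
    path = _remove_dot_segments(_normalize_percent(parts.path))
    if not path and netloc:
        path = "/"
    query = _normalize_percent(parts.query)
    # Assumption: the fragment is dropped, since it is never sent to the server.
    return urlunsplit((scheme, netloc, path, query, ""))
```

For example, "HTTP://Example.COM:80/a/./b/../%7euser/%2f?q=%7e#frag" becomes "http://example.com/a/~user/%2F?q=~". Note that "%2F" stays encoded, because decoding it would change the path structure.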

And you will quickly face servers that send an incorrect Content-Type, or
that intentionally set application/octet-stream with a nosniff header to force
the browser to save the file instead of opening it, e.g. in the built-in PDF reader.

So what?

Usually I would trust libmagic/file(1) more than the Content-Type header, because an HTTP server may choose the header from the file extension alone. Of course, there are cases where the information provided by libmagic should be refined by Content-Type or by a file suffix (in the URI path or in a download file name hint in HTTP headers, such as the Content-Disposition filename): XPI browser extensions are ZIP files, and a plain text file may contain Markdown or reStructuredText markup. You regret the absence of a standard way to store the file type, but an incorrect value may be intentionally specified there too. I consider heuristics unavoidable, with or without a standardized place for the type.
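A rough Python sketch of the heuristic I mean follows, under loud assumptions. The tiny signature table in sniff() is a hypothetical stand-in for libmagic, which recognizes far more formats. The refinement table and the fallback order are my own choices, not a standard. Content sniffing wins over the header. A generic result (ZIP, plain text) may be refined by the file suffix or by a text/* Content-Type. The header is used only when sniffing fails.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical stand-in for libmagic: only a few well-known magic numbers.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\x1f\x8b", "application/gzip"),
]

# Illustrative refinements of generic container types by file suffix.
REFINEMENTS = {
    ("application/zip", ".xpi"): "application/x-xpinstall",
    ("text/plain", ".md"): "text/markdown",
    ("text/plain", ".rst"): "text/x-rst",
}


def sniff(data: bytes) -> Optional[str]:
    """Guess a type from content only; None if nothing matches."""
    for signature, mime in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return mime
    if b"\x00" in data:  # NUL bytes: treat as unknown binary
        return None
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "text/plain"


def guess_type(data: bytes, content_type: Optional[str], filename: str) -> str:
    # filename may come from the URI path or a Content-Disposition hint.
    suffix = PurePosixPath(filename).suffix.lower()
    header = (content_type or "").split(";")[0].strip().lower()
    sniffed = sniff(data)
    if sniffed is not None:
        refined = REFINEMENTS.get((sniffed, suffix))
        if refined:
            return refined
        # A text/* header may add detail sniffing cannot see (e.g. text/markdown).
        if sniffed == "text/plain" and header.startswith("text/"):
            return header
        return sniffed
    # Sniffing failed: fall back to the header unless it is the generic one.
    if header and header != "application/octet-stream":
        return header
    return "application/octet-stream"
```

For instance, a PDF served as application/octet-stream is still reported as application/pdf, and a ZIP named addon.xpi becomes application/x-xpinstall. The trade-off is deliberate: a lying header cannot override clear magic bytes, but it can still add detail where the bytes are ambiguous.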

