On Tue, Nov 19, 2013 at 10:32 AM, Bill Moseley <[email protected]> wrote:

> Anyone aware of a good, portable way in Perl to encode the filename in a
> Content-Disposition header? I would like to support UTF8 filenames, but
> support in browsers is unclear (if not changing).
>
> Is this complexity something that the Catalyst framework should handle?
> It's one of those areas where it's easy to get wrong (I can see many
> different approaches in our own code).
>
> http://greenbytes.de/tech/tc2231/
>
>
> http://stackoverflow.com/questions/93551/how-to-encode-the-filename-parameter-of-content-disposition-header-in-http
>

I have no idea what the client can accept or what its OS uses as a
path-separator, and I don't want to go down the client-sniffing path,
anyway.

I have a user-supplied character string that I want to use as the filename.
Since it's user-supplied data, I have to assume it can contain any Unicode
character.

From my limited tests, it seems most modern browsers support the
"filename*" extension. Each browser does some special handling of its own
(like replacing the path-separator, or adding a file extension based on
content-type if the filename has no extension).


All I want to do is make valid HTTP headers and let the client decide how
to handle it, but also provide a usable filename (not just underscores, for
example).


So, all I'm after is to generate a valid header like this:

$c->res->header( content_disposition =>
        qq[attachment; filename="$ascii_file"; filename*=UTF-8''$utf8_file]
);



The filename* is easy, I'm finding:

my $utf8_file = uri_escape( Encode::encode( 'UTF-8' => $filename ) );
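For anyone who wants to see what that value looks like, here is a minimal sketch. It is not the code above: it swaps uri_escape for a plain substitution so that it runs with core modules only, and it escapes everything outside a conservative set of characters (letters, digits, "-", ".", "_", "~"), all of which RFC 5987 permits unescaped in the filename* value. The helper name ext_filename_value is hypothetical.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: build the value that follows "filename*=".
sub ext_filename_value {
    my ($filename) = @_;

    # Character string -> UTF-8 octets.
    my $bytes = Encode::encode( 'UTF-8', $filename );

    # Percent-encode every octet outside the conservative safe set.
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;

    return "UTF-8''" . $bytes;
}

# "resume 2013.pdf" with U+00E9 (e acute) in place of each plain "e"
print ext_filename_value("r\x{e9}sum\x{e9} 2013.pdf"), "\n";
# prints: UTF-8''r%C3%A9sum%C3%A9%202013.pdf
```

With an explicit unsafe set, uri_escape( $bytes, q{^A-Za-z0-9\-._~} ) should give the same result.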



But the $ascii_file is a bit more work. Percent-encoding doesn't work
there, so I have to do a bit of filtering.


Does anyone see an easier, cleaner, or more correct approach? When I see
this much code I tend to think it's the wrong approach.


# Convert to ASCII using underscore as replacement

my $ascii_file = Encode::encode( ascii => $filename, sub { '_' } );

# Remove quotes as we want to use quoted form of "filename" and preserve
# whitespace.

$ascii_file =~ s/"/_/g;

# Replace non-printable characters with underscore, and collapse dups

$ascii_file =~ s/[^[:print:]]/_/g;
$ascii_file =~ s/_{2,}/_/g;

# Split off the extension so we can check the length of the filename
# without it. Of course, $ext could end up as dot + underscore.

my ( $base, $ext ) = split /(\.\w+)$/, $ascii_file;

# Use default filename if we don't have more than three "meaningful"
# characters. Very subjective.

$base = 'your_file' unless ( () = $base =~ /[A-Za-z0-9]/g ) > 3;

# Stuff the extension back on.

$ascii_file = $base;
$ascii_file .= $ext if defined $ext;
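Pulled together, the steps above might be wrapped up roughly as follows. This is only a sketch under a few assumptions: the function name content_disposition_for is made up, the percent-encoding is done with a plain substitution instead of uri_escape so that only core modules are needed, and a backslash is replaced along with the double quote (an addition to the steps above, since backslash is the escape character inside a quoted-string).

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper; not part of Catalyst or any CPAN module.
sub content_disposition_for {
    my ($filename) = @_;

    # filename*: percent-encode the UTF-8 octets of the original name.
    my $utf8_file = Encode::encode( 'UTF-8', $filename );
    $utf8_file =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;

    # filename: ASCII fallback, underscore for anything unmappable.
    my $ascii_file = Encode::encode( ascii => $filename, sub { '_' } );
    $ascii_file =~ s/["\\]/_/g;          # special inside a quoted-string
    $ascii_file =~ s/[^[:print:]]/_/g;   # non-printable characters
    $ascii_file =~ s/_{2,}/_/g;          # collapse runs of underscores

    # Fall back to a default base name unless more than three
    # alphanumeric characters survived the filtering.
    my ( $base, $ext ) = split /(\.\w+)$/, $ascii_file;
    $base = '' unless defined $base;
    $base = 'your_file' unless ( () = $base =~ /[A-Za-z0-9]/g ) > 3;
    $ascii_file = $base . ( defined $ext ? $ext : '' );

    return qq[attachment; filename="$ascii_file"; filename*=UTF-8''$utf8_file];
}

print content_disposition_for("r\x{e9}sum\x{e9} 2013.pdf"), "\n";
```

In a Catalyst action the return value would be passed as the content_disposition header, as in the $c->res->header call earlier.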



Again, "filename*" support is good, and I'm not trying to prevent buggy
clients from doing something stupid (e.g. filename=/etc/passwd), but want
to provide a reasonable fallback to "filename".

Perhaps the simple solution is to always use "filename=your_file" and hope
most clients use the filename* extension.


-- 
Bill Moseley
[email protected]
_______________________________________________
List: [email protected]
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/[email protected]/
Dev site: http://dev.catalyst.perl.org/