On 05/23/2007 04:16 PM, Michael Higgins wrote:
-----Original Message-----
From: Michael Higgins [mailto:[EMAIL PROTECTED]
Hello, List-ers --
I've come across a problem, unsure where to ask, so
subscribed here. I upload a file through a browser. It's a
'.txt' file and it comes as text/html.
However, I've found that some hyphen- and single-quote-like
characters in this text file are from a higher
codepoint... or something. What _seems_ to happen is that the
browser is stripping them, so my script isn't getting all the
info to dump into my database.
[8<]
-----Original Message-----
From: Scott Statland [mailto:[EMAIL PROTECTED]
The characters that you are describing may need to be
escaped or have their codes entered.
It sounds like they may have special meanings in either
the scripting language or the HTML output.
Hmm.
I guess my question wasn't clear. The issue is a file upload that is tagged
as text/html but has wide characters in it. The file doesn't make it out of
the browser right AFAICT. (If this is obviously incorrect, please post the
correction!)
A little more pain and research led me to find this:
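open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
while (my $line = <$fh>) {
    # Replace every non-ASCII character with a numeric HTML entity
    $line =~ s/([^\x00-\x7f])/sprintf('&#%d;', ord($1))/ge;
    print $line;
}
close $fh;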
... helpful code snippet, which applied to my files before they are uploaded
gives me a new text file with lines like: "Regarding the box – the
driver wouldn’t".
The cool part is that it is uploaded fully and when viewed in a browser the
characters are displayed correctly. Duh.
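
One caveat with a snippet like that: without an encoding layer Perl reads
bytes, so ord() returns byte values rather than Unicode code points. Below is
a minimal sketch that decodes first. The to_entities() helper is hypothetical,
and 'cp1252' is only an assumption about how a Windows-authored .txt file
might be encoded; the thread does not say what the real encoding is.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper, not from the thread. The caller must supply the
# source encoding; 'cp1252' below is an assumption.
sub to_entities {
    my ($bytes, $encoding) = @_;
    my $text = decode($encoding, $bytes);   # raw bytes -> Perl characters
    # Replace every non-ASCII character with a numeric HTML entity
    $text =~ s/([^\x00-\x7f])/sprintf('&#%d;', ord($1))/ge;
    return $text;
}

# \x96 and \x92 are the cp1252 bytes for an en dash and a right single quote.
print to_entities("box \x96 the driver wouldn\x92t\n", 'cp1252');
# prints: box &#8211; the driver wouldn&#8217;t
```

If the file turns out to be UTF-8 instead, passing 'UTF-8' as the second
argument should give one entity per character rather than one per byte.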
Now, if I could only get the browser to fix it up like this when sending...
rather than what it was doing. Since it's going to a *nix box, I don't care
about the text/binary thing, right? I guess I could test from a *nix Firefox
and see if the behaviour is different.
Anyone have a thought on what is happening that the browser upload fails to
accommodate text with wide chars? I don't know how it determines ... maybe
if the first char was wide, it'd go up as a different mimetype?
Cheers,
Michael Higgins
What browser is creating the problem? What O/S is that browser running on?
MSIE reportedly performs some file type heuristics, so I suspect the
browser is MSIE.
Evidently the .txt file looks like an HTML file. If it truly is an HTML
file, you might be able to fix the problem by specifying the character
set in a META tag.
Or you could compress the file with gzip or zip to convince MSIE to
leave the file alone when uploading it.
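
For the gzip route, here is a rough sketch using IO::Compress::Gzip. The
gzip_file() helper and the file name in the usage comment are made up for
illustration, and the receiving script would then have to decompress the
upload itself (IO::Uncompress::Gunzip can do that).

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical helper: write a gzipped copy next to the original file and
# return the new file's name.
sub gzip_file {
    my ($path) = @_;
    my $out = "$path.gz";
    # With plain scalars, gzip() treats both arguments as file names.
    gzip($path => $out) or die "gzip failed: $GzipError";
    return $out;
}

# Usage (example file name only):
# my $gz = gzip_file('notes.txt');
```

The compressed bytes are opaque to the browser, so whatever content sniffing
it does on text should no longer apply. That is an expectation, not something
tested here.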