On Thu, Oct 05, 2006 at 10:35:09AM -0500, Mumia W. wrote:
> On 10/05/2006 09:48 AM, Chad Perrin wrote:
> >On Thu, Oct 05, 2006 at 09:06:11AM -0500, Mumia W. wrote:
> >>>
> >>Perhaps you could look at the problem in reverse. Strip out all
> >>characters that are not in a certain set; e.g., you might take anything
> >>that is not a digit, space, tab, alphanumeric character, period, or
> >>comma and delete it.
> >
> >That won't work so well for characters that are garbage versions of good
> >characters that are actually needed. Generally, quotes are there for a
> >reason, for instance -- so just throwing away "smart quotes" rather than
> >replacing them with standard vertical ASCII quotes might not be
> >desirable.
>
> You're right and figuring out what is truly garbage and what are garbled
> bytes that need to be converted is not trivial. Maybe there's a module
> on CPAN...

If so, that'd definitely be the way to go. If not, there's potential for
a new module out there.

--
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]
Ben Franklin: "As we enjoy great Advantages from the Inventions of
others we should be glad of an Opportunity to serve others by any
Invention of ours, and this we should do freely and generously."
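
The two ideas discussed in this thread (deleting everything outside an allowed set, and converting "smart quotes" to plain ASCII quotes rather than discarding them) can be combined by converting first and stripping second. What follows is only a rough sketch in Python, not a tested solution or an existing module. The names SMART_QUOTES, NOT_ALLOWED and clean_text are made up for illustration, and both the quote mapping and the allowed character set are assumptions that would need adjusting for real data. It also assumes the input has already been decoded to text correctly.

```python
import re

# Hypothetical mapping from typographic quotes to their ASCII equivalents.
# Extend it for whatever other garbled-but-needed characters turn up.
SMART_QUOTES = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})

# Allowed set (an assumption): ASCII letters, digits, space, tab, newline,
# period and comma, plus the ASCII quotes produced by the mapping above.
NOT_ALLOWED = re.compile(r"[^A-Za-z0-9 \t\n.,'\"]")


def clean_text(text: str) -> str:
    """Convert known smart quotes first, then delete anything not allowed."""
    converted = text.translate(SMART_QUOTES)
    return NOT_ALLOWED.sub("", converted)


if __name__ == "__main__":
    sample = "\u201cHello\u201d, it\u2019s 5\u00a9."
    print(clean_text(sample))
```

The order matters: stripping first would throw the smart quotes away before they could be replaced, which is exactly the objection raised above. Anything that is in neither the mapping nor the allowed set is silently dropped, so the hard part remains deciding what belongs in each.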