Chas. Owens wrote:
> On Wed, Jan 7, 2009 at 07:41, Anže Vidmar <anz...@gmail.com> wrote:
>> hello!
>>
>> I have a nasty non-ASCII character in some files that contain PHP code
>> (somewhere in my SVN branch). What I want to do is to recursively find
>> all the files that contain a specific non-ASCII character. Most
>> importantly, I need to know the names of the files containing it.
>>
>> So far, I found a script that looks into a file for non-ASCII characters and
>> prints these characters in hex:
>>
>> while (<>) {
>>    s/([\x80-\xff])/sprintf "\\x{%02x}",ord($1)/eg;
>>    print;
>> }
>>
>> OK, this is good. The non-ASCII sequence (in hex) that I'm looking for is:
>>
>> \x{ef}\x{bb}\x{bf}
>>
>> The problem here is that I can't get this script to run recursively, and I
>> don't get the name of the file that actually contains these characters.
>>
>> I've tried with bash, but since the script only writes to standard output,
>> I can't get any result from this. Here is what I've tried:
>>
>> find |xargs /usr/local/bin/check_for_non-ascii_characters.sh  |grep -l
>> 'x{ef}\\x{bb}\\x{bf}'
>>
>> So, I need a way to recursively find non-ascii characters (a specific
>> pattern, mentioned before) in all files and I need the name of the files
>> containing it.
>>
>> It would be enough if I could just see which files contain this
>> character sequence.
>>
>> Thanks
> 
> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> 
> use File::Find;
> 
> File::Find::find(
>     sub {
>         return unless -f;
>         #refine further with a return unless /\.php$/ if desired
>         open my $fh, "<", $_
>             or die "could not open $File::Find::name: $!";
>         while (<$fh>) {
>             my $offset = 0;
>             for my $char (split //) {
>                 if (ord $char > 127) {
>                     printf "non-ascii char (%04x) in file %s on line %d position %d:\n%s\n",
>                         ord($char), $File::Find::name, $., $offset, $_;
>                 }
>                 $offset++;
>             }
>         }
>     },
>     @ARGV
> );

File::Find exports find() by default, so there is no need to call it by its
fully qualified name. It is better either to use the imported name or to
prevent the import altogether with

  use File::Find ();

in the first place.
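
As a rough sketch of the first option applied to the original question, the
following relies on the default import of find() and collects only the names
of files that contain the bytes EF BB BF. The helper name files_with_bom is
made up for this illustration, and reading each file whole in raw mode is an
assumption that the files are small enough to slurp.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use File::Find;    # find() is imported by default

# Hypothetical helper: return the paths of all plain files under @dirs
# that contain the byte sequence EF BB BF anywhere in their contents.
sub files_with_bom {
    my @dirs = @_;
    my @hits;

    find(
        sub {
            return unless -f;

            # Read raw bytes so no encoding layer alters the data.
            open my $fh, '<:raw', $_
                or do { warn "could not open $File::Find::name: $!\n"; return };

            local $/;    # slurp mode: read the whole file in one go
            my $data = <$fh>;

            push @hits, $File::Find::name
                if defined $data && $data =~ /\xEF\xBB\xBF/;
        },
        @dirs
    );

    return @hits;
}

# Print one matching file name per line for the directories given.
if (@ARGV) {
    print "$_\n" for files_with_bom(@ARGV);
}
```

Saved under a name of your choosing, it would be run with one or more
directories as arguments. With "use File::Find ();" instead, the call inside
the helper would have to be written as File::Find::find(...).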

Rob

