Hmmm - http://search.cpan.org/~dankogai/Encode-2.39/lib/Encode/Guess.pm

It says right at the bottom that the method below won't work to guess the
encoding. :(
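
For reference, here is a minimal sketch of how Encode::Guess appears to be meant to be used: with a short, explicit suspect list instead of every known encoding. The helper name decode_guessed and the sample bytes are made up for illustration. The only assumptions about the module are that guess_encoding() is exported by default and that it returns an error string, not an object, when the guess fails or is ambiguous.

```perl
use strict;
use warnings;
use Encode;
use Encode::Guess;    # default suspects: ascii, utf8 and BOM-marked UTF-16/32

# Hypothetical helper: decode $bytes by guessing among a short suspect list.
sub decode_guessed {
    my ($bytes, @suspects) = @_;

    my $decoder = guess_encoding($bytes, @suspects);

    # On failure guess_encoding() returns an error message (a plain string).
    die "Cannot guess encoding: $decoder\n" unless ref $decoder;

    return $decoder->decode($bytes);
}

# Sample input (made up): "café" plus a newline, as UTF-8 encoded bytes.
my $bytes = "caf\xc3\xa9\n";

my $text = decode_guessed($bytes);      # Perl character string
my $utf8 = encode("UTF-8", $text);      # back to UTF-8 encoded bytes
```

Extra suspects can be passed as further arguments, e.g. decode_guessed($bytes, qw/euc-jp shiftjis 7bit-jis/), but going by the caveats on that page, the more suspects are listed (especially several single-byte encodings), the more likely the guess is to fail as ambiguous.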

Cheers,
Parag




On Sun, Jan 3, 2010 at 10:23 PM, Parag Kalra <paragka...@gmail.com> wrote:

> Thanks a bunch Shlomi.
>
> Using your snippet, I am now able to create even a 1 GB file. Previously it
> was throwing an 'Out of Memory' message. :)
>
> OK, coming to the UTF discussion, will the following work:
>
> use Encode;
> my @all_encodings = Encode->encodings(":all");
> use Encode::Guess @all_encodings;
>
>
> while(<$sample_file_fh>){
>
>     # Encoding into utf data
>     $utf_internal = decode("Guess",$_);
>     $utf_data = encode("utf8", $utf_internal);
>
>     $data_string = $data_string.$utf_data;
> }
>
>
> And then the snippet suggested by Shlomi.
>
> Cheers,
> Parag
>
>
>
>
>
> On Sun, Jan 3, 2010 at 9:12 PM, Shlomi Fish <shlo...@iglu.org.il> wrote:
>
>> On Sunday 03 Jan 2010 16:25:09 Parag Kalra wrote:
>> > I am curious to know more about UTF and understand related issues that may
>> > creep into my algorithm. Could someone please shed some light on it?
>> >
>> > Can I use the following:
>> >
>> > use Encode;
>> >
>>
>> Make sure you add "use strict;" and "use warnings;".
>>
>> > while(<$sample_file_fh>){
>> >
>> >     # Encoding into utf data
>> >     $utf_data = encode("utf8", $_);
>> >     $data_string = $data_string.$utf_data;
>> > }
>> >
>> >
>> > # Checking the current length of the string
>> > while(length($data_string)<$total_size){
>> >     $data_string = $data_string.$data_string;
>> > }
>>
>> This snippet:
>>
>> 1. Will double the size of $data_string on each iteration (exponential growth).
>>
>> 2. Will create a very large buffer in memory.
>>
>> 3. Can be better written as "$data_string .= $data_string;"
>>
>> A better snippet would be (untested):
>>
>> <<<<<<<<<<<<
>> {
>>        open my $out_fh, ">", $out_filename
>>                or die "Could not open $out_filename - $!";
>>
>>        my $length_so_far = 0;
>>
>>        while ($length_so_far < $total_size)
>>        {
>>                print {$out_fh} $data_string;
>>
>>                $length_so_far += length($data_string);
>>        }
>>
>>        close($out_fh);
>> }
>> >>>>>>>>>>>>
>>
>> Regards,
>>
>>        Shlomi Fish
>>
>> --
>> -----------------------------------------------------------------
>> Shlomi Fish       http://www.shlomifish.org/
>> Funny Anti-Terrorism Story - http://shlom.in/enemy
>>
>> Bzr is slower than Subversion in combination with Sourceforge.
>> ( By: http://dazjorz.com/ )
>>
>
>
