Re: Script to create huge sample files

Parag Kalra Sun, 03 Jan 2010 08:54:43 -0800

Thanks a bunch, Shlomi.

Using your snippet, I am now able to create even a 1 GB file. Previously it was
throwing an 'Out of Memory' message. :)


OK, coming to the UTF discussion: will the following work?

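use strict;
use warnings;
use Encode;
use Encode::Guess;

# Set the suspects at run time: "use Encode::Guess @all_encodings;" is
# executed at compile time, before @all_encodings has been filled in.
my @all_encodings = Encode->encodings(":all");
Encode::Guess->set_suspects(@all_encodings);

my $data_string = '';
while (my $line = <$sample_file_fh>) {    # filehandle opened earlier

    # Decode using the guessed encoding, then re-encode as UTF-8 bytes
    my $utf_internal = decode("Guess", $line);
    $data_string .= encode("UTF-8", $utf_internal);
}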


And then I would use the snippet suggested by Shlomi.

Cheers,
Parag
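On whether the guessing approach will work: the Encode::Guess documentation warns that ISO-8859 and other single-byte encodings do not work well as suspects, and Encode->encodings(":all") includes many of them, so decode("Guess", ...) is likely to die with an ambiguity error. It also guesses each line independently, so different lines could end up decoded differently. One simpler alternative, sketched below and not using Encode::Guess at all, assumes the sample is either valid UTF-8 or one known legacy encoding (latin1 here, a hypothetical choice) and tries strict UTF-8 first. The helper name to_utf8_bytes() is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: returns UTF-8 encoded bytes for $raw_bytes.
# Assumption: the input is either valid UTF-8 or the single fallback
# encoding named by the caller (e.g. 'latin1'); no real guessing is done.
sub to_utf8_bytes {
    my ($raw_bytes, $fallback) = @_;

    # With FB_CROAK, decode() dies on malformed UTF-8. It may also consume
    # its input argument, so it is given a copy.
    my $copy  = $raw_bytes;
    my $chars = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };

    # Not valid UTF-8: interpret the original bytes in the fallback encoding.
    $chars = decode($fallback, $raw_bytes) unless defined $chars;

    return encode('UTF-8', $chars);
}

my $from_latin1 = to_utf8_bytes("caf\xe9", 'latin1');        # latin1 bytes
my $from_utf8   = to_utf8_bytes("caf\xc3\xa9", 'latin1');    # UTF-8 bytes
print $from_latin1 eq $from_utf8 ? "same\n" : "different\n";
```

The trade-off is that the fallback encoding has to be known in advance, but the result no longer depends on an ambiguous guess. The whole sample should be converted once, before it is written out repeatedly.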




On Sun, Jan 3, 2010 at 9:12 PM, Shlomi Fish <shlo...@iglu.org.il> wrote:

> On Sunday 03 Jan 2010 16:25:09 Parag Kalra wrote:
> > I am curious to know more about UTF and to understand related issues that may
> > creep into my algorithm. Could someone please shed some light on it?
> >
> > Can I use the following:
> >
> > use Encode;
> >
>
> Make sure you add "use strict;" and "use warnings;".
>
> > while(<$sample_file_fh>){
> >
> >     # Encoding into utf data
> >     $utf_data = encode("utf8", $_);
> >     $data_string = $data_string.$utf_data;
> > }
> >
> >
> > # Checking the current length of the string
> > while(length($data_string)<$total_size){
> >     $data_string = $data_string.$data_string;
> > }
>
> This snippet:
>
> 1. Will double the size of $data_string on every iteration (exponential growth).
>
> 2. Will build the whole buffer in memory, up to nearly twice $total_size.
>
> 3. Can be written more concisely as "$data_string .= $data_string;"
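To put rough numbers on points 1 and 2, here is a small sketch that only tracks lengths (no real buffer is built). The 1000-byte seed and the 1 GiB target are hypothetical values, and final_doubled_length() is a hypothetical helper name.

```perl
use strict;
use warnings;

# Tracks only the length the doubling loop would reach; no real buffer
# is allocated. Seed length and target size are hypothetical inputs.
sub final_doubled_length {
    my ($seed_length, $total_size) = @_;
    my $length    = $seed_length;
    my $doublings = 0;
    while ($length < $total_size) {
        $length *= 2;    # what "$data_string .= $data_string" does to the length
        $doublings++;
    }
    return ($length, $doublings);
}

# 1000-byte seed, 1 GiB (2**30 bytes) target
my ($final_length, $doublings) = final_doubled_length(1000, 2**30);
printf "%d doublings, final buffer %d bytes\n", $doublings, $final_length;
```

With those inputs it reports 21 doublings and a final buffer of 2,097,152,000 bytes, nearly twice the 1,073,741,824-byte target, all of which the original loop would keep in a single string.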
>
> A better snippet would be (untested):
>
> <<<<<<<<<<<<
> {
>        open my $out_fh, ">", $out_filename
>                or die "Could not open $out_filename - $!";
>
>        my $length_so_far = 0;
>
>        while ($length_so_far < $total_size)
>        {
>                print {$out_fh} $data_string;
>
>                $length_so_far += length($data_string);
>        }
>
>        close($out_fh);
> }
> >>>>>>>>>>>>
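A self-contained sketch of the chunked-write approach, wrapped in a hypothetical write_sample_file() helper. Two details are additions not present in the snippet: the ':raw' layer, so that newline translation (e.g. on Windows) cannot make the file size differ from the counted length, and an empty-seed check, since an empty $data_string would otherwise loop forever. It assumes $data_string already holds encoded bytes, so length() counts bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: writes $data_string repeatedly until at least
# $total_size bytes have been written; returns the number of bytes written.
# Assumes $data_string holds bytes (e.g. UTF-8 output), not wide characters.
sub write_sample_file {
    my ($out_filename, $data_string, $total_size) = @_;
    die "Seed string must not be empty\n" unless length $data_string;

    # ':raw' disables newline translation, so length() matches bytes on disk.
    open my $out_fh, '>:raw', $out_filename
        or die "Could not open $out_filename - $!";

    # Only one copy of the seed is ever held in memory.
    my $length_so_far = 0;
    while ($length_so_far < $total_size) {
        print {$out_fh} $data_string;
        $length_so_far += length($data_string);
    }

    close($out_fh) or die "Could not close $out_filename - $!";
    return $length_so_far;
}

my $written = write_sample_file('sample.txt', "hello world\n", 100);
print "wrote $written bytes\n";
```

The file can overshoot $total_size by up to one seed length minus one byte: with the 12-byte seed and 100-byte target above, it writes 108 bytes.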
>
> Regards,
>
>        Shlomi Fish
>
> --
> -----------------------------------------------------------------
> Shlomi Fish       http://www.shlomifish.org/
> Funny Anti-Terrorism Story - http://shlom.in/enemy
>
> Bzr is slower than Subversion in combination with Sourceforge.
> ( By: http://dazjorz.com/ )
>
