Re: sending marc records into a script that uses MARC::Batch

2014-05-30 Thread Stefano Bargioni
If I'm not wrong, 
$batch->strict_off();
should keep your loop from printing warnings and then stopping.
HTH. Stefano

On 29 May 2014, at 23:13, John E Guillory wrote:

> Thanks Timothy for your help.
>  
> When processing about 5 million records I would expect some crazy records. 
> The new script (incorporating Timothy’s suggestions) exited prematurely on 
> record 85,877 with: “Warnings detected: Entirely empty subfield found in tag 
> 260”. I know 260 is publication stuff but it’s not “required”.  I’m 
> deliberately printing warnings but again the script exited prematurely.
>  
> Thanks for assistance.
> John
>  
> From: Timothy Prettyman [mailto:timo...@umich.edu] 
> Sent: Thursday, May 29, 2014 11:23 AM
> To: John E Guillory
> Cc: perl4lib@perl.org
> Subject: Re: sending marc records into a script that uses MARC::Batch
>  
> For your first question, instead of:
>  
>  $batch = MARC::Batch->new('USMARC',<STDIN>);
>  
> use:
>  
>  $batch = MARC::Batch->new('USMARC',STDIN);
>  
> For your second, the error is likely caused when a field you're using 
> as_string() on doesn't exist in the record.  
>  
> So, you could do something like the following:
>  
> $field = $record->field('008');
> $field or do {                          # check for existence of field
>     print "no 008 field for record\n";  # no field
>     next;                               # skip the field (or whatever)
> };
> $field_008 = $field->as_string();
>  
> Hope this helps
>  
> -Tim
>  
> Timothy Prettyman
> LIT/Library Systems
> University of Michigan
>  
> 
> On Thu, May 29, 2014 at 12:08 PM, John E Guillory wrote:
> Hello,
> Two questions please:
>  
> 1.  I’ve written a script that opens a marc file for reading using this 
> syntax:
> 
>  
> $file = $ARGV[0];
> $batch = MARC::Batch->new('USMARC',$file);
>  
> It then loops thru the records using this syntax:
> while ( $record = $batch->next()) {
>  …..check position 6, 7 of leader and position 23 of 008 and make 
> some changes
> }
>  
> This works great. However, instead of accessing the file this way, I want to 
> pipe the output of a previously run marc dump command directly into this 
> script via the pipe.  
> I understand that this can be done using this syntax:
>  
> while ($line = <STDIN>) { … }
>  
> but I don’t understand how to use that STDIN with 
> “MARC::Batch->new('USMARC',$file);”. This does not work:
>  
> $batch = MARC::Batch->new('USMARC',<STDIN>);
>  
> 2.  My current script successfully reads and processes a marc file of 
> over 5 gigs, but exits entirely on record 160,585 with the error from 
> MARC::Batch, “Can't call method "as_string" on an undefined value at 
> ./marc_batch.pl”.  Documentation on using MARC::Batch says that to tell it to 
> continue processing even when errors are encountered one should use 
> strict_off(), then print/report warnings at the bottom of the script. I don’t 
> think my particular error is being handled by the strict_off() setting. 
> Does anybody know what causes, or how to fix, the “Can’t call method 
> as_string” error? Full script below; it’s pretty short, thanks to MARC::Batch.
> 
>  
> Thanks for insights! 
>  
>  
> use MARC::Batch;
>  
> $file = $ARGV[0];
> chomp($file);
>  
> $batch = MARC::Batch->new('USMARC',$file);
> $batch->strict_off();  # otherwise script exits when encounters errors
>  
> open(OUT,'>new_marc');
>  
> while ( $record = $batch->next()) {
>     $leader        = $record->leader();
>     $leader_pos_6  = substr($leader,6,1);
>     $leader_pos_7  = substr($leader,7,1);
>  
>     $field = $record->field('008');
>     $field_008 = $field->as_string();
>     $field_008_position_23 = substr($field_008,23,1);
>  
>     if ( ($leader_pos_6 eq "a") && ($leader_pos_7 eq "m") && 
>          ($field_008_position_23 eq "o") || ($field_008_position_23 eq "s") ) {
>  
>         $control_num = $record->field('001');
>         $control_num = $control_num->as_string();
>  
>         print "008 position 23: $field_008_position_23 \n";
>         print "OLD leader: $leader \n";
>         $old_leader = $leader;
>         substr($leader,6,1) = 'm';
>         print "NEW leader: $leader \n";
>  
>         print OUT $record->as_usmarc();
>         print "$control_num|$old_leader|$leader|$field_008\n";
>  
>     } else {  # not a match so just print this one unchanged…
>         print OUT $record->as_usmarc();
>     }
>  
> }
>  
> # handles errors:
> if (@warnings = $batch->warnings()) {
>     print "\n Warnings detected: \n", @warnings;
> }
>  
> close(OUT);
> close(LOG);
>  
>  
>  
> John Guillory
> Louisiana Library Network
> 225.578.3758
>  
>  




Re: sending marc records into a script that uses MARC::Batch

2014-05-30 Thread Timothy Prettyman
I think you have to check for warnings as you read each record, so try
moving your error handling code right after the $batch->next() call.  But
Robin's suggestion is good advice, and is probably a more robust way to
handle the crud that can show up in a file of marc records.
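
Roughly what I have in mind is below. Treat it as an untested sketch rather
than a drop-in fix. The sub name copy_with_warnings and the three counters it
returns are my own invention for illustration; next(), warnings(), field()
and as_usmarc() are the same calls your script already uses. It assumes the
batch was created with strict_off(), so that next() keeps returning records
after a warning.

```perl
use strict;
use warnings;

# Copy every record from $batch to $out, reporting warnings as each
# record is read instead of once at the end of the run.
# Returns (records copied, records with warnings, records lacking an 008).
sub copy_with_warnings {
    my ($batch, $out) = @_;
    my ($copied, $flagged, $no_008) = (0, 0, 0);

    while (my $record = $batch->next()) {
        my $n = $copied + 1;    # position of this record in the input

        # warnings() returns what has accumulated and clears the list,
        # so calling it here reports on the record just read
        if (my @warnings = $batch->warnings()) {
            print STDERR "record $n: $_\n" for @warnings;
            $flagged++;
        }

        # field() gives back nothing when the tag is absent; check it
        # before calling as_string() on the result
        $no_008++ unless $record->field('008');

        print {$out} $record->as_usmarc();
        $copied++;
    }
    return ($copied, $flagged, $no_008);
}

# Typical use (needs MARC::Batch from CPAN):
#   use MARC::Batch;
#   my $batch = MARC::Batch->new('USMARC', \*STDIN);
#   $batch->strict_off();
#   copy_with_warnings($batch, \*STDOUT);
```

Passing \*STDIN rather than the bareword STDIN is only there so the call
still compiles under "use strict". Your leader and 008 checks would go inside
the loop, after the test for a missing 008.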

-Tim


On Fri, May 30, 2014 at 5:20 AM, Stefano Bargioni wrote:

> If I'm not wrong,
> $batch->strict_off();
> should keep your loop from printing warnings and then stopping.
> HTH. Stefano