Hi,
I'm running Pig 0.10.0 in local mode on some small text files; there is
no intention to run it on Hadoop at all. We have a job that runs every 5
minutes, and about 3% of the time it fails with the error below. The
failure occurs at random places within the Pig script.
2012-10-19 14:15:37,719 [Thread-15] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0004
java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:286)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:158)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:360)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:330)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:165)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:271)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
In the Pig log, I get:
ERROR 2244: Job failed, hadoop does not return any error message
org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, hadoop does not return any error message
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:555)
    at org.apache.pig.Main.main(Main.java:111)
================================================================================
The Pig script is attached.
Any help gratefully received.
Thanks
Malc
-- Load data from input file
indata = LOAD '$input' USING PigStorage(',') AS (
    utc_ts:chararray, local_ts:chararray, timezone:chararray, region:chararray,
    hostname:chararray, stat_type:chararray, stat_key:chararray, stat_value:long);
/**********************************************************************************
* Output: ats_stat
*
* Description: Generate output file of ATS data to load into the ats_stat_tbl
*
**********************************************************************************/
ats_total_errors = FILTER indata BY stat_key == 'IntegraStatistics/TotalErrors';
ats_total_txns = FILTER indata BY stat_key == 'IntegraStatistics/TotalTransactions';
ats_resp_time = FILTER indata BY stat_key == 'IntegraStatistics/UWMAResponseTime';
ats_join_data = JOIN ats_total_errors BY (utc_ts,local_ts,timezone,region,hostname),
                     ats_total_txns BY (utc_ts,local_ts,timezone,region,hostname),
                     ats_resp_time BY (utc_ts,local_ts,timezone,region,hostname);
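-- Fields 0-4 are the join keys; fields 7, 15 and 23 are stat_value from each
-- of the three joined relations (8 fields apiece).
ats_out_data = FOREACH ats_join_data GENERATE $0,$1,$2,$3,$4,$7,$15,$23;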
STORE ats_out_data INTO '$outdir/ats_stat.dat.$uniq_id' USING PigStorage(',');
/**********************************************************************************
* Output: ldap_stat
*
* Description: Generate output file of LDAP data to load into the ldap_stat_tbl
*
**********************************************************************************/
ldap_total_errors = FILTER indata BY stat_key == 'LDAPStatistics/FailedRequests';
ldap_total_txns = FILTER indata BY stat_key == 'LDAPStatistics/TotalRequests';
ldap_resp_time = FILTER indata BY stat_key == 'LDAPStatistics/UWMAResponseTime';
ldap_join_data = JOIN ldap_total_errors BY (utc_ts,local_ts,timezone,region,hostname),
                      ldap_total_txns BY (utc_ts,local_ts,timezone,region,hostname),
                      ldap_resp_time BY (utc_ts,local_ts,timezone,region,hostname);
ldap_out_data = FOREACH ldap_join_data GENERATE $0,$1,$2,$3,$4,$7,$15,$23;
STORE ldap_out_data INTO '$outdir/ldap_stat.dat.$uniq_id' USING PigStorage(',');
/**********************************************************************************
* Output: pcrf_stat
*
* Description: Generate output file of PCRF data to load into the pcrf_stat_tbl
*
**********************************************************************************/
pcrf_total_errors = FILTER indata BY stat_key == 'PcrfStatistics/TotalErrors';
pcrf_total_txns = FILTER indata BY stat_key == 'PcrfStatistics/TotalRequestsSent';
pcrf_resp_time = FILTER indata BY stat_key == 'PcrfStatistics/UWMAResponseTime';
pcrf_join_data = JOIN pcrf_total_errors BY (utc_ts,local_ts,timezone,region,hostname),
                      pcrf_total_txns BY (utc_ts,local_ts,timezone,region,hostname),
                      pcrf_resp_time BY (utc_ts,local_ts,timezone,region,hostname);
pcrf_out_data = FOREACH pcrf_join_data GENERATE $0,$1,$2,$3,$4,$7,$15,$23;
STORE pcrf_out_data INTO '$outdir/pcrf_stat.dat.$uniq_id' USING PigStorage(',');
/**********************************************************************************
* Output: sess_stat
*
* Description: Generate output file of Session Counts data to load into the
*              sess_stat_tbl
*
**********************************************************************************/
sess_active = FILTER indata BY stat_key == 'SessionStatistics/ActiveSessions';
sess_total = FILTER indata BY stat_key == 'SessionStatistics/TotalSessions';
sess_duration = FILTER indata BY stat_key == 'SessionStatistics/UWMASessionLength';
sess_join_data = JOIN sess_active BY (utc_ts,local_ts,timezone,region,hostname),
                      sess_total BY (utc_ts,local_ts,timezone,region,hostname),
                      sess_duration BY (utc_ts,local_ts,timezone,region,hostname);
sess_out_data = FOREACH sess_join_data GENERATE $0,$1,$2,$3,$4,$7,$15,$23;
STORE sess_out_data INTO '$outdir/sess_stat.dat.$uniq_id' USING PigStorage(',');
/**********************************************************************************
* Output: radius_tps
*
* Description: Generate output file of Radius TPS data to load into the
*              radius_tps_tbl
*
**********************************************************************************/
radius_tps_total_interims = FILTER indata BY stat_key == 'RadiusStatistics/RadiusInterims';
radius_tps_total_starts = FILTER indata BY stat_key == 'RadiusStatistics/RadiusStarts';
radius_tps_total_stops = FILTER indata BY stat_key == 'RadiusStatistics/RadiusStops';
radius_tps_join_data = JOIN radius_tps_total_interims BY (utc_ts,local_ts,timezone,region,hostname),
                            radius_tps_total_starts BY (utc_ts,local_ts,timezone,region,hostname),
                            radius_tps_total_stops BY (utc_ts,local_ts,timezone,region,hostname);
radius_tps_out_data = FOREACH radius_tps_join_data GENERATE $0,$1,$2,$3,$4,$7,$15,$23;
STORE radius_tps_out_data INTO '$outdir/radius_tps.dat.$uniq_id' USING PigStorage(',');
/**********************************************************************************
* Output: radius_bcast
*
* Description: Generate output file of Radius Broadcast data to load into
*              the radius_bcast_tbl
*
**********************************************************************************/
radius_bcast_total_errors = FILTER indata BY stat_key == 'RadiusBroadcast/TotalErrors';
radius_bcast_total_txns = FILTER indata BY stat_key == 'RadiusBroadcast/TotalTransactions';
radius_bcast_resp_time = FILTER indata BY stat_key == 'RadiusBroadcast/ResponseTime';
radius_bcast_join_data = JOIN radius_bcast_total_errors BY (utc_ts,local_ts,timezone,region,hostname),
                              radius_bcast_total_txns BY (utc_ts,local_ts,timezone,region,hostname),
                              radius_bcast_resp_time BY (utc_ts,local_ts,timezone,region,hostname);
radius_bcast_out_data = FOREACH radius_bcast_join_data GENERATE $0,$1,$2,$3,$4,$7,$15,$23;
STORE radius_bcast_out_data INTO '$outdir/radius_bcast.dat.$uniq_id' USING PigStorage(',');