[
https://issues.apache.org/jira/browse/PIG-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sergey resolved PIG-3401.
-------------------------
Resolution: Won't Fix
Sorry, It was 4 AM, I spent 4 hours trying to fix it and I was tired.
The problem was:
Relation A had atom branch_id:null (type hasn't been explicitly defined).
Ration B didn' have such field. I did UNION ONSCHEMA and expected that
branch_id:null should be interpreted correctly. It's out of UNION scope. There
is no need to do someting special with this atom and it's type.
I've explicitly defined type for this atom and now UNION ONSCHEMA works.
I'm going to create an improvement for UNION statement. It was really hard to
guess where the problem was.
Sorry for wasting your time.
> UNION on schema throws ExecException: ERROR 2055
> -------------------------------------------------
>
> Key: PIG-3401
> URL: https://issues.apache.org/jira/browse/PIG-3401
> Project: Pig
> Issue Type: Bug
> Components: grunt
> Affects Versions: 0.11
> Environment: local
> Reporter: Sergey
>
> Hi, I get strange exception when trying to union two relations by schema.
> It works when one of relations doesn't have any records.
> It breaks when both relations are not empty.
> Here is a part of the code:
> {code}
> lastEndPoints24h = LOAD '$lastEndPoints24h' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage();
> describe lastEndPoints24h;
> dump lastEndPoints24h;
> lastEndPoints24hProj = FOREACH lastEndPoints24h GENERATE msisdn, ts,
> center_lon,
> center_lat,
> lac, cid, lon,
> lat, cell_type, is_active, azimuth, hpbw, max_dist,
> tile_id,
> zone_col, zone_row,
> is_end_point,
> end_point_type;
> describe lastEndPoints24hProj;
> dump lastEndPoints24hProj;
> unionOfPivotsAndLastEndPoints = UNION ONSCHEMA validPivotsProj,
> lastEndPoints24hProj;
> describe unionOfPivotsAndLastEndPoints;
> --dump unionOfPivotsAndLastEndPoints;
> groupedValidPivots = GROUP unionOfPivotsAndLastEndPoints BY msisdn;
> dump groupedValidPivots;
> {code}
> Something bad happens when I try to access union result in relation
> unionOfPivotsAndLastEndPoints.
> I can say for sure that relation lastEndPoints24h is correctly opened.
> Here is a proof:
> {code}
> 2013-07-29 03:34:18,833 [main] INFO
> org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
> HadoopVersion PigVersion UserId StartedAt FinishedAt Features
> 2.0.0-cdh4.3.0 0.11.0-cdh4.3.0 ssa 2013-07-29 03:34:13
> 2013-07-29 03:34:18 UNKNOWN
> Success!
> Job Stats (time in seconds):
> JobId Alias Feature Outputs
> job_local634744752_0006 lastEndPoints24h MAP_ONLY
> file:/tmp/temp-1898051886/tmp-1962855781,
> Input(s):
> Successfully read records from:
> "/home/ssa/devel/lololabs/analyt/some_analyt_case/src/test/resources/pig/route_pivot_preparator/test_2013_07_23/lastEndPoints24h.avro"
> Output(s):
> Successfully stored records in: "file:/tmp/temp-1898051886/tmp-1962855781"
> Job DAG:
> job_local634744752_0006
> {code}
> And here is schema and dump for it's projection lastEndPoints24hProj:
> {code}
> (79263332100,1374521131,37.553441893272755,55.880436657140294,7712,24316,37.5473,55.8792,OUTDOOR,true,75,60,1102,49646,469,410,true,JITTER_START)
> lastEndPoints24hProj: {msisdn: long,ts: long,center_lon: double,center_lat:
> double,lac: int,cid: int,lon: double,lat: double,cell_type:
> chararray,is_active: boolean,azimuth: int,hpbw: int,max_dist: int,tile_id:
> int,zone_col: int,zone_row: int,is_end_point: boolean,end_point_type:
> chararray}
> {code}
> When this file is empty (one of test cases), script works correctly.
> When this file is not empty I do get
> {code}
> 2013-07-29 03:34:47,898 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
> 1066: Unable to open iterator for alias groupedValidPivots
> Details at logfile:
> /home/ssa/devel/lololabs/analyt/some_analyt_case/src/main/resources/pig/pig_1375054429131.log
> {code}
> An exception from log file
> {code}
> Pig Stack Trace
> ---------------
> ERROR 1066: Unable to open iterator for alias groupedValidPivots
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to
> open iterator for alias groupedValidPivots
> at org.apache.pig.PigServer.openIterator(PigServer.java:838)
> at
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
> at
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
> at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
> at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
> at org.apache.pig.Main.run(Main.java:604)
> at org.apache.pig.Main.main(Main.java:157)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> Caused by: java.io.IOException: Job terminated with anomalous status FAILED
> at org.apache.pig.PigServer.openIterator(PigServer.java:830)
> ... 12 more
> ================================================================================
> {code}
> Any "touch" of union gives an error with test: "unable to open iterator for
> alias ..."
> Schemas are fully defined, field names do match. What's the problem?
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira