[ https://issues.apache.org/jira/browse/FLINK-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Flink Jira Bot updated FLINK-13292:
-----------------------------------
      Labels: auto-deprioritized-major auto-deprioritized-minor  (was: auto-deprioritized-major stale-minor)
    Priority: Not a Priority  (was: Minor)

This issue was labeled "stale-minor" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Minor, please raise the priority and ask a committer to assign you the issue or revive the public discussion.

> NullPointerException when reading a string field in a nested struct from an Orc file.
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-13292
>                 URL: https://issues.apache.org/jira/browse/FLINK-13292
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / ORC
>    Affects Versions: 1.8.0
>            Reporter: Alejandro Sellero
>            Priority: Not a Priority
>              Labels: auto-deprioritized-major, auto-deprioritized-minor
>         Attachments: LinkField.png, one_row.json, output.orc
>
>
> When I try to read an ORC file using flink-orc, a NullPointerException is thrown.
> I think this issue could be related to this closed issue:
> https://issues.apache.org/jira/browse/FLINK-8230
> This happens when trying to read the string fields in a nested struct.
> This is my schema:
> {code:java}
> "struct<" +
>   "operation:int," +
>   "originalTransaction:bigInt," +
>   "bucket:int," +
>   "rowId:bigInt," +
>   "currentTransaction:bigInt," +
>   "row:struct<" +
>     "id:int," +
>     "headline:string," +
>     "user_id:int," +
>     "company_id:int," +
>     "created_at:timestamp," +
>     "updated_at:timestamp," +
>     "link:string," +
>     "is_html:tinyint," +
>     "source:string," +
>     "company_feed_id:int," +
>     "editable:tinyint," +
>     "body_clean:string," +
>     "activitystream_activity_id:bigint," +
>     "uniqueness_checksum:string," +
>     "rating:string," +
>     "review_id:int," +
>     "soft_deleted:tinyint," +
>     "type:string," +
>     "metadata:string," +
>     "url:string," +
>     "imagecache_uuid:string," +
>     "video_id:int" +
> ">>",{code}
> The exception is:
> {code:java}
> [error] Caused by: java.lang.NullPointerException
> [error]   at java.lang.String.checkBounds(String.java:384)
> [error]   at java.lang.String.<init>(String.java:462)
> [error]   at org.apache.flink.orc.OrcBatchReader.readString(OrcBatchReader.java:1216)
> [error]   at org.apache.flink.orc.OrcBatchReader.readNonNullBytesColumnAsString(OrcBatchReader.java:328)
> [error]   at org.apache.flink.orc.OrcBatchReader.readField(OrcBatchReader.java:215)
> [error]   at org.apache.flink.orc.OrcBatchReader.readNonNullStructColumn(OrcBatchReader.java:453)
> [error]   at org.apache.flink.orc.OrcBatchReader.readField(OrcBatchReader.java:250)
> [error]   at org.apache.flink.orc.OrcBatchReader.fillRows(OrcBatchReader.java:143)
> [error]   at org.apache.flink.orc.OrcRowInputFormat.ensureBatch(OrcRowInputFormat.java:333)
> [error]   at org.apache.flink.orc.OrcRowInputFormat.reachedEnd(OrcRowInputFormat.java:313)
> [error]   at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:190)
> [error]   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
> [error]   at java.lang.Thread.run(Thread.java:748){code}
> Instead of using the Table API, I am trying to read the ORC files in batch mode as follows:
> {code:java}
> env
>   .readFile(
>     new OrcRowInputFormat(
>       "",
>       "SCHEMA_GIVEN_BEFORE",
>       new HadoopConfiguration()
>     ),
>     "PATH_TO_FOLDER"
>   )
>   .writeAsText("file:///tmp/test/fromOrc")
> {code}
> Thanks for your support



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
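
Editor's note on the stack trace: the two topmost frames (String.checkBounds called from the String constructor) are what the JDK produces when a byte-slice constructor such as new String(bytes, start, length, charset) receives a null byte array. The sketch below only illustrates that JDK behavior. It does not reproduce the Flink bug, and whether OrcBatchReader.readString is actually handed a null byte array for the nested string column is an assumption, not something the report confirms. The class NullBytesDemo and its methods decode and throwsNpe are hypothetical names used for illustration; they are not part of Flink or ORC.

```java
import java.nio.charset.StandardCharsets;

public class NullBytesDemo {

    // Hypothetical helper: converts a slice of a byte array to a String,
    // the same kind of call that appears at the top of the reported trace.
    static String decode(byte[] bytes, int start, int length) {
        return new String(bytes, start, length, StandardCharsets.UTF_8);
    }

    // Returns true if decoding the given slice throws NullPointerException.
    static boolean throwsNpe(byte[] bytes, int start, int length) {
        try {
            decode(bytes, start, length);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] data = "headline".getBytes(StandardCharsets.UTF_8);

        // A valid slice decodes normally.
        System.out.println(decode(data, 0, 4));      // prints "head"

        // A null array fails inside the constructor's bounds check,
        // before any decoding happens.
        System.out.println(throwsNpe(null, 0, 0));   // prints "true"
    }
}
```

If the assumption holds, the failure would come from the source array being null for some rows of the nested struct, not from the string contents themselves. The attached output.orc would be needed to confirm this.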