[ https://issues.apache.org/jira/browse/HIVE-20580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16787922#comment-16787922 ]
Peter Vary commented on HIVE-20580:
-----------------------------------

Based on the description, I suspect that the following methods should be checked:
{code:java}
public static boolean isOriginal(Reader file) {
  return !file.hasMetadataValue(OrcRecordUpdater.ACID_KEY_INDEX_NAME);
}

public static boolean isOriginal(Footer footer) {
  for(OrcProto.UserMetadataItem item: footer.getMetadataList()) {
    if (item.hasName() && item.getName().equals(OrcRecordUpdater.ACID_KEY_INDEX_NAME)) {
      return true;
    }
  }
  return false;
}
{code}

The funny thing is that the first method (with the Reader as a parameter) returns {{true}} if we do *not find* {{hive.acid.key.index}} in the metadata list, while the second method returns {{true}} if we *do find* {{hive.acid.key.index}} :) :)
I think the original intention (pun intended :)) was to return {{true}} for a non-ACID file and {{false}} for an ACID one. The second method is used only to set {{org.apache.hadoop.hive.llap.io.metadata.OrcFileMetadata.isOriginalFormat}}, which is not read anywhere in the code (at least I was not able to find any usage), so I think we should keep the original meaning of isOriginal and fix the second method.

Tested the first part of the change (the Reader-based check only) with the following commands:
{code:java|title=Non ACID}
0: jdbc:hive2://localhost:10003/default> load data inpath 'original.orc' into table acid;
[..]
INFO : Completed executing command(queryId=petervary_20190308140915_3e1ee5ef-22ec-4cd5-9353-7b00f0702e4d); Time taken: 10.706 seconds
{code}
{code:java|title=ACID}
0: jdbc:hive2://localhost:10003/default> load data inpath 'acid.orc' into table acid;
Error: Error while compiling statement: FAILED: SemanticException [Error 10413]: "acid.orc" was created by Acid write - it cannot be loaded into anther Acid table (state=42000,code=10413)
{code}

I also wrote a small program to run both checks on specific files:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
import org.apache.hadoop.hive.ql.io.orc.Reader;
import org.apache.orc.OrcProto;

import java.io.IOException;

public class IsOriginalCheck {
  public static void main(String[] args) throws IOException {
    // String path = "/Users/petervary/tmp/orc_split_elim.orc"; // Non-ACID file
    // Use the path from the command line if given, otherwise the ACID sample file
    String path = args.length > 0 ? args[0] : "/Users/petervary/tmp/bucket_00000"; // ACID file

    // Open the ORC file and read its footer
    Reader reader = OrcFile.createReader(new Path(path), OrcFile.readerOptions(new Configuration()));
    OrcProto.Footer footer = reader.getFileTail().getFooter();

    // Run both isOriginal overloads; they should agree for the same file
    boolean result1 = OrcInputFormat.isOriginal(reader);
    boolean result2 = OrcInputFormat.isOriginal(footer);
    System.out.println("IsOriginal: " + result1 + " " + result2);
  }
}
{code}

[~vgumashta], [~ashutoshc]: Is there an easy way to write a unit test? I think the best option would be to have three test files in the {{/data/files/}} directory:
* Non-ACID ORC file
* ACID v1 file
* ACID v2 file

The test code above could then be used to check the result of the isOriginal methods. Shall I create the test files myself, or do you know of files that are already there that I could use?
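To illustrate the intended semantics, here is a minimal sketch (not the actual patch) in which the footer-based check is inverted so that both overloads agree: {{true}} for a non-ACID file, {{false}} for an ACID one. It models the footer's user metadata as a plain list of key names instead of the real {{Reader}} / {{OrcProto.Footer}} types; the class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical, simplified model: footer user metadata is represented as a
// list of key names instead of OrcProto.UserMetadataItem objects.
public class IsOriginalSketch {
    // Footer key written by OrcRecordUpdater (OrcRecordUpdater.ACID_KEY_INDEX_NAME).
    static final String ACID_KEY_INDEX_NAME = "hive.acid.key.index";

    // Mirrors the Reader-based check: "original" means the key is absent.
    public static boolean isOriginalFromReaderKeys(List<String> metadataKeys) {
        return !metadataKeys.contains(ACID_KEY_INDEX_NAME);
    }

    // Footer-based check with the return values swapped, so that it
    // agrees with the Reader-based check.
    public static boolean isOriginalFromFooterKeys(List<String> metadataKeys) {
        for (String name : metadataKeys) {
            if (ACID_KEY_INDEX_NAME.equals(name)) {
                return false; // key index found: ACID file, not original
            }
        }
        return true; // no key index: original (non-ACID) file
    }

    public static void main(String[] args) {
        List<String> nonAcid = List.of();
        List<String> acid = List.of(ACID_KEY_INDEX_NAME);
        // prints "true true"
        System.out.println(isOriginalFromReaderKeys(nonAcid) + " " + isOriginalFromFooterKeys(nonAcid));
        // prints "false false"
        System.out.println(isOriginalFromReaderKeys(acid) + " " + isOriginalFromFooterKeys(acid));
    }
}
```

With the fix in place, the {{IsOriginal: result1 result2}} line printed by the test program above should show two identical values for any file.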
Thanks,
Peter

> OrcInputFormat.isOriginal() should not rely on hive.acid.key.index
> ------------------------------------------------------------------
>
>                 Key: HIVE-20580
>                 URL: https://issues.apache.org/jira/browse/HIVE-20580
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions
>    Affects Versions: 3.1.0
>            Reporter: Eugene Koifman
>            Assignee: Peter Vary
>            Priority: Major
>
> {{org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.isOriginal()}} checks
> for the presence of {{hive.acid.key.index}} in the footer. This key is only created
> when the file is written by {{OrcRecordUpdater}}. It should instead check
> for the presence of the Acid metadata columns, so that a file can be produced by
> something other than {{OrcRecordUpdater}}.
> Also, {{hive.acid.key.index}} counts the number of events of each type, which
> is not really useful for Acid V2 (as of Hive 3), since each file only has one
> type of event.
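The issue description suggests deciding "original vs. ACID" from the Acid metadata columns rather than from the footer key. A minimal sketch of that idea, assuming the caller has already extracted the top-level field names of the file schema (for example via {{TypeDescription.getFieldNames()}} on the ORC schema) and assuming the six column names below are the full-ACID row layout written by {{OrcRecordUpdater}}; the class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical sketch: classify a file by its top-level schema instead of
// by the hive.acid.key.index footer key.
public class AcidSchemaCheckSketch {
    // Assumed top-level columns of the full-ACID row layout, in order.
    static final List<String> ACID_COLUMNS = List.of(
        "operation", "originalTransaction", "bucket",
        "rowId", "currentTransaction", "row");

    // A file is "original" unless its top-level struct is exactly the ACID
    // layout, regardless of which writer produced the file.
    public static boolean isOriginal(List<String> topLevelFieldNames) {
        return !ACID_COLUMNS.equals(topLevelFieldNames);
    }

    public static void main(String[] args) {
        System.out.println(isOriginal(List.of("id", "name"))); // prints "true"
        System.out.println(isOriginal(ACID_COLUMNS));          // prints "false"
    }
}
```

Matching the exact, ordered column list (rather than just looking for one of the names) avoids misclassifying a non-ACID table that happens to have a user column called, say, {{bucket}}.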