[ https://issues.apache.org/jira/browse/HIVE-11118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617271#comment-14617271 ]
Sushanth Sowmyan commented on HIVE-11118:
-----------------------------------------

I have a question here. I will open another bug if need be, but if it's just a misunderstanding on my part, it won't matter.

From the patch, I see the following bit:

{code}
private void ensureFileFormatsMatch(TableSpec ts, URI fromURI) throws SemanticException {
  Class<? extends InputFormat> destInputFormat = ts.tableHandle.getInputFormatClass();
  // Other file formats should do similar check to make sure file formats match
  // when doing LOAD DATA .. INTO TABLE
  if (OrcInputFormat.class.equals(destInputFormat)) {
    Path inputFilePath = new Path(fromURI);
    try {
      FileSystem fs = FileSystem.get(fromURI, conf);
      // just creating orc reader is going to do sanity checks to make sure it's a valid ORC file
      OrcFile.createReader(fs, inputFilePath);
    } catch (FileFormatException e) {
      throw new SemanticException(ErrorMsg.INVALID_FILE_FORMAT_IN_LOAD.getMsg("Destination" +
          " table is stored as ORC but the file being loaded is not a valid ORC file."));
    } catch (IOException e) {
      throw new SemanticException("Unable to load data to destination table." +
          " Error: " + e.getMessage());
    }
  }
}
{code}

Now, it's entirely possible that the table in question is an ORC table while the partition being loaded is in another format, such as Text, because Hive supports tables whose partitions use mixed formats. In fact, this is a likely scenario when replicating a table that used to be Text but has since been converted to ORC, so that all new partitions are ORC. In that case the destination table will be a MANAGED_TABLE and an "orc" table, but import will try to load a Text partition into it. Shouldn't this check use the partition spec's input format rather than the table's, so that it works in that scenario?
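To make the question concrete, here is a small self-contained Java model of the partition-aware lookup being suggested. It is only a sketch under stated assumptions: PartitionFormatCheck, TableInfo, FileFormat and expectedFormat are hypothetical names, not Hive classes, and partitions are keyed by a plain spec string such as ds=2015-01-01. An existing partition keeps its own format; only a new partition (or an unpartitioned load) falls back to the table-level format.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Hive code): decide which file format a LOAD must match
 * by looking at the target partition first and at the table only as a fallback.
 */
public class PartitionFormatCheck {

    /** Illustrative stand-in for a storage format; not a Hive type. */
    enum FileFormat { TEXT, ORC }

    /** Illustrative table: a table-level format plus the formats of existing partitions. */
    static final class TableInfo {
        final FileFormat tableFormat;
        final Map<String, FileFormat> partitionFormats = new HashMap<>();

        TableInfo(FileFormat tableFormat) {
            this.tableFormat = tableFormat;
        }
    }

    /**
     * Returns the format the loaded file must match. An existing partition keeps its
     * own format (e.g. an old TEXT partition of a table later converted to ORC); a new
     * partition, or an unpartitioned load (partitionSpec == null), uses the table format.
     */
    static FileFormat expectedFormat(TableInfo table, String partitionSpec) {
        if (partitionSpec == null) {
            return table.tableFormat;
        }
        return table.partitionFormats.getOrDefault(partitionSpec, table.tableFormat);
    }

    public static void main(String[] args) {
        // Table converted from TEXT to ORC; the older partition is still TEXT.
        TableInfo t = new TableInfo(FileFormat.ORC);
        t.partitionFormats.put("ds=2015-01-01", FileFormat.TEXT);

        System.out.println(expectedFormat(t, "ds=2015-01-01")); // TEXT
        System.out.println(expectedFormat(t, "ds=2015-07-01")); // ORC (new partition)
        System.out.println(expectedFormat(t, null));            // ORC (unpartitioned)
    }
}
```

With a lookup like this, the ORC validation in ensureFileFormatsMatch would only run when the expected format is ORC, so loading a Text file into an existing Text partition of a converted table would presumably no longer be rejected.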
> Load data query should validate file formats with destination tables
> --------------------------------------------------------------------
>
>                 Key: HIVE-11118
>                 URL: https://issues.apache.org/jira/browse/HIVE-11118
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>             Fix For: 1.3.0, 2.0.0
>
>         Attachments: HIVE-11118.2.patch, HIVE-11118.3.patch, HIVE-11118.4.patch, HIVE-11118.patch
>
>
> LOAD DATA LOCAL INPATH queries do not do any validation with respect to file format. If
> the destination table is ORC and we try to load files that are not ORC,
> the load will succeed, but querying such tables will result in runtime
> exceptions. We can do some simple sanity checks to prevent loading files
> that do not match the destination table's file format.
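The "simple sanity checks" mentioned in the description can be illustrated without Hive. The following minimal Java sketch relies only on the ORC file format specification's statement that an ORC file begins with the three ASCII bytes "ORC"; OrcMagicCheck and its methods are hypothetical names. It is a swapped-in, much cheaper check than the OrcFile.createReader call used in the patch, which also validates the file tail, so a text file that happens to start with "ORC" would still pass here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch of a cheap pre-LOAD sanity check: an ORC file starts with the
 * 3-byte magic "ORC". Weaker than a full reader, it only rejects obvious mismatches.
 */
public class OrcMagicCheck {
    private static final byte[] ORC_MAGIC = "ORC".getBytes(StandardCharsets.US_ASCII);

    /** True if the given leading bytes begin with the ORC magic. */
    static boolean startsWithOrcMagic(byte[] firstBytes) {
        return firstBytes.length >= ORC_MAGIC.length
                && Arrays.equals(Arrays.copyOf(firstBytes, ORC_MAGIC.length), ORC_MAGIC);
    }

    /** Reads up to the first three bytes of a stream and checks them (needs Java 9+). */
    static boolean hasOrcHeader(InputStream in) throws IOException {
        byte[] head = new byte[ORC_MAGIC.length];
        int n = in.readNBytes(head, 0, head.length); // fewer than 3 bytes if the file is short
        return startsWithOrcMagic(Arrays.copyOf(head, n));
    }

    public static void main(String[] args) throws IOException {
        byte[] orcLike = "ORC-stub-bytes".getBytes(StandardCharsets.US_ASCII);
        byte[] textRow = "1,hello\n".getBytes(StandardCharsets.US_ASCII);

        System.out.println(hasOrcHeader(new ByteArrayInputStream(orcLike))); // true
        System.out.println(hasOrcHeader(new ByteArrayInputStream(textRow))); // false
    }
}
```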