Eugene Koifman created HIVE-5105:
------------------------------------
Summary: HCatSchema.remove(HCatFieldSchema hcatFieldSchema) does
not clean up fieldPositionMap
Key: HIVE-5105
URL: https://issues.apache.org/jira/browse/HIVE-5105
Project: Hive
Issue Type: Bug
Components: HCatalog
Affects Versions: 0.12.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
Fix For: 0.12.0
org.apache.hcatalog.data.schema.HCatSchema.remove(HCatFieldSchema
hcatFieldSchema) makes the following call:
fieldPositionMap.remove(hcatFieldSchema);
but fieldPositionMap is of type Map<String, Integer>, so an HCatFieldSchema
argument never matches any of its String keys and the entry is not removed.
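A minimal sketch of why this call compiles yet removes nothing: Map.remove
takes an Object, so a key of the wrong type is accepted and simply matches no
entry. The Field class below is a hypothetical stand-in for HCatFieldSchema
(not the real class), and removing by name is shown only as one plausible
correction, not necessarily the committed fix.

```java
import java.util.HashMap;
import java.util.Map;

class WrongKeyTypeDemo {
    // Hypothetical stand-in for HCatFieldSchema; only the name matters here.
    static final class Field {
        private final String name;
        Field(String name) { this.name = name; }
        String getName() { return name; }
    }

    // Mirrors the reported call: the map is keyed by String, the argument is a Field.
    static int sizeAfterRemoveByObject() {
        Map<String, Integer> fieldPositionMap = new HashMap<String, Integer>();
        fieldPositionMap.put("ds", 0);
        Field field = new Field("ds");
        fieldPositionMap.remove(field); // compiles (remove(Object)) but matches no key
        return fieldPositionMap.size();
    }

    // Assumed correction: remove using the field's name, which is the map's key type.
    static int sizeAfterRemoveByName() {
        Map<String, Integer> fieldPositionMap = new HashMap<String, Integer>();
        fieldPositionMap.put("ds", 0);
        Field field = new Field("ds");
        fieldPositionMap.remove(field.getName());
        return fieldPositionMap.size();
    }

    public static void main(String[] args) {
        System.out.println("remove(field):           size = " + sizeAfterRemoveByObject());
        System.out.println("remove(field.getName()): size = " + sizeAfterRemoveByName());
    }
}
```

In this sketch the first variant leaves the entry in place (size 1), while the
second removes it (size 0).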
Here's a detailed comment from [~sushanth]:
The result is that the name will not be removed from fieldPositionMap.
This results in two things:
a) If anyone tries to append a field to an HCatSchema after having removed that
field, the append shouldn't fail, but it will.
b) If anyone asks for the position of the removed field by name, the schema
will still return the old position.
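Both symptoms can be reproduced with a toy model. This is a sketch under
stated assumptions: MiniSchema and Field are hypothetical stand-ins, and the
duplicate-name check in append is assumed to consult the position map, which
is what consequence (a) implies. It is not the actual HCatSchema source.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StalePositionMapDemo {
    // Hypothetical stand-in for HCatFieldSchema.
    static final class Field {
        final String name;
        Field(String name) { this.name = name; }
    }

    // Toy schema: an ordered field list plus a name -> position index.
    static final class MiniSchema {
        private final List<Field> fields = new ArrayList<Field>();
        private final Map<String, Integer> fieldPositionMap = new HashMap<String, Integer>();

        // Assumption: duplicate names are rejected by checking the position map.
        void append(Field f) {
            if (fieldPositionMap.containsKey(f.name)) {
                throw new IllegalStateException("duplicate field name: " + f.name);
            }
            fields.add(f);
            fieldPositionMap.put(f.name, fields.size() - 1);
        }

        // Reproduces the reported bug: the Field is passed where a String key is needed.
        void removeBuggy(Field f) {
            fields.remove(f);
            fieldPositionMap.remove(f); // no-op: no String key equals a Field
        }

        Integer getPosition(String name) {
            return fieldPositionMap.get(name);
        }
    }

    private static MiniSchema schemaWithDsRemoved() {
        MiniSchema schema = new MiniSchema();
        Field ds = new Field("ds");
        schema.append(new Field("id"));
        schema.append(ds);
        schema.removeBuggy(ds);
        return schema;
    }

    // (a) Re-appending the removed field is rejected as a duplicate.
    static boolean appendFailsAfterRemove() {
        MiniSchema schema = schemaWithDsRemoved();
        try {
            schema.append(new Field("ds"));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // (b) The removed field's position is still reported.
    static Integer positionAfterRemove() {
        return schemaWithDsRemoved().getPosition("ds");
    }

    public static void main(String[] args) {
        System.out.println("(a) append fails after remove: " + appendFailsAfterRemove());
        System.out.println("(b) position of removed field: " + positionAfterRemove());
    }
}
```

In this model, re-appending "ds" is rejected and its old position (1) is still
returned, even though the field is gone from the field list.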
Now, there is only one place in hcat code where we remove a field, and that is
called from HCatOutputFormat.setSchema, where we try to detect whether the user
specified partition column names in the schema when they shouldn't have, and if
they did, we remove them. Normally, people do not specify these, so this check
tends to be superfluous.
Once we do this, we wind up serializing that new object (after performing some
validations), and the stale entry does appear to survive serialization (and
eventual deserialization), which is very worrying.
However, we are luckily saved by the fact that we do not append that field to
it at any time (all appends in hcat code are done on newly initialized
HCatSchema objects which have had no removes done on them), and we don't ask
for the position of something we do not expect to be there (harder to verify
for certain, but this seems to be the case on inspection).
The main part that worries me is that HCatSchema is part of our public
interface for HCat, in that M/R programs that use HCat can use it, and thus
they might have more interesting usage patterns that hit this bug.
I can't think of any currently open bugs that are caused by this, because of
the rarity of the situation, but it is nevertheless something we should fix
immediately.