BELUGA BEHR created HIVE-18956:
----------------------------------

             Summary: AvroSerDe Race Condition
                 Key: HIVE-18956
                 URL: https://issues.apache.org/jira/browse/HIVE-18956
             Project: Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
    Affects Versions: 2.3.2, 3.0.0
            Reporter: BELUGA BEHR

{code}
  @Override
  public Writable serialize(Object o, ObjectInspector objectInspector) throws SerDeException {
    if(badSchema) {
      throw new BadSchemaException();
    }
    return getSerializer().serialize(o, objectInspector, columnNames, columnTypes, schema);
  }

  @Override
  public Object deserialize(Writable writable) throws SerDeException {
    if(badSchema) {
      throw new BadSchemaException();
    }
    return getDeserializer().deserialize(columnNames, columnTypes, writable, schema);
  }

  ...

  private AvroDeserializer getDeserializer() {
    if(avroDeserializer == null) {
      avroDeserializer = new AvroDeserializer();
    }

    return avroDeserializer;
  }

  private AvroSerializer getSerializer() {
    if(avroSerializer == null) {
      avroSerializer = new AvroSerializer();
    }

    return avroSerializer;
  }
{code}

The {{getDeserializer}} and {{getSerializer}} methods are not thread-safe, so neither are the {{deserialize}} and {{serialize}} methods. This probably did not matter with MapReduce, but now that we have Spark/Tez, it may be an issue.

For example, three threads could all enter {{getSerializer}}, all see that {{avroSerializer}} is _null_, and each create its own instance. They would then race to assign their new object to the {{avroSerializer}} variable.
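
One possible way to close the race described above is double-checked locking on a volatile field. The following is a minimal sketch under stated assumptions, not the fix applied in Hive: {{FakeSerializer}}, {{LazySerDeSketch}} and {{constructionsUnderContention}} are hypothetical stand-ins, and the real {{AvroSerDe}} fields and constructors are not used here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for AvroSerializer (assumption, not the Hive class).
final class FakeSerializer {
}

// Hypothetical holder that mirrors the lazy getter from the report.
final class LazySerDeSketch {
    // volatile is required for double-checked locking to be safe
    // under the Java memory model.
    private volatile FakeSerializer serializer;

    // Counts how many serializers this holder has constructed.
    private final AtomicInteger constructed = new AtomicInteger();

    FakeSerializer getSerializer() {
        FakeSerializer local = serializer;   // one volatile read on the fast path
        if (local == null) {
            synchronized (this) {
                local = serializer;          // re-check while holding the lock
                if (local == null) {
                    local = new FakeSerializer();
                    constructed.incrementAndGet();
                    serializer = local;
                }
            }
        }
        return local;
    }

    // Starts `threads` threads that call getSerializer() at the same time
    // and returns how many serializer instances were constructed.
    static int constructionsUnderContention(int threads) {
        LazySerDeSketch serDe = new LazySerDeSketch();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();           // hold every thread at the gate
                    serDe.getSerializer();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        start.countDown();                   // release all threads together
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return serDe.constructed.get();
    }

    public static void main(String[] args) {
        // Only one instance is built regardless of how many threads race.
        System.out.println(constructionsUnderContention(8)); // prints 1
    }
}
```

If the serializer and deserializer are cheap to construct, a simpler alternative would be to create them eagerly and store them in {{final}} fields, which removes the lazy check entirely. Either approach only covers the initialization race; whether a single shared instance is itself safe to call from several threads is a separate question.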