BELUGA BEHR created HIVE-18956:
----------------------------------

             Summary: AvroSerDe Race Condition
                 Key: HIVE-18956
                 URL: https://issues.apache.org/jira/browse/HIVE-18956
             Project: Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
    Affects Versions: 2.3.2, 3.0.0
            Reporter: BELUGA BEHR


{code}
  @Override
  public Writable serialize(Object o, ObjectInspector objectInspector) throws SerDeException {
    if(badSchema) {
      throw new BadSchemaException();
    }
    return getSerializer().serialize(o, objectInspector, columnNames, columnTypes, schema);
  }

  @Override
  public Object deserialize(Writable writable) throws SerDeException {
    if(badSchema) {
      throw new BadSchemaException();
    }
    return getDeserializer().deserialize(columnNames, columnTypes, writable, schema);
  }

...

  private AvroDeserializer getDeserializer() {
    if(avroDeserializer == null) {
      avroDeserializer = new AvroDeserializer();
    }

    return avroDeserializer;
  }

  private AvroSerializer getSerializer() {
    if(avroSerializer == null) {
      avroSerializer = new AvroSerializer();
    }

    return avroSerializer;
  }
{code}

The {{getDeserializer}} and {{getSerializer}} methods are not thread-safe, so neither are the {{deserialize}} and {{serialize}} methods that call them. This probably did not matter with MapReduce, but now that we have Spark and Tez, it may be an issue if a single SerDe instance is used by several threads.
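One common way to make a lazy getter like these thread-safe is double-checked locking on a {{volatile}} field. The sketch below is a minimal, self-contained illustration, not the actual Hive fix; {{StubSerializer}}, {{SafeLazySerDe}} and {{SafeLazyInitDemo}} are hypothetical stand-ins for {{AvroSerializer}} and {{AvroSerDe}}.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Hive's AvroSerializer; it only counts constructions.
class StubSerializer {
    static final AtomicInteger CREATED = new AtomicInteger();

    StubSerializer() {
        CREATED.incrementAndGet();
    }
}

// Hypothetical SerDe fragment showing double-checked locking on a volatile field.
class SafeLazySerDe {
    // volatile: a thread that reads a non-null value also sees the fully constructed object
    private volatile StubSerializer avroSerializer;

    StubSerializer getSerializer() {
        StubSerializer local = avroSerializer;      // fast path: no lock once initialized
        if (local == null) {
            synchronized (this) {
                local = avroSerializer;             // re-check: another thread may have won
                if (local == null) {
                    local = new StubSerializer();
                    avroSerializer = local;
                }
            }
        }
        return local;
    }
}

public class SafeLazyInitDemo {
    public static void main(String[] args) throws Exception {
        int before = StubSerializer.CREATED.get();
        SafeLazySerDe serde = new SafeLazySerDe();
        Callable<StubSerializer> task = serde::getSerializer;

        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Future<StubSerializer>> results = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            results.add(pool.submit(task));
        }

        // StubSerializer does not override equals, so the set counts distinct objects
        Set<StubSerializer> distinct = new HashSet<>();
        for (Future<StubSerializer> f : results) {
            distinct.add(f.get());
        }
        pool.shutdown();

        System.out.println("instances created: " + (StubSerializer.CREATED.get() - before));
        System.out.println("distinct instances returned: " + distinct.size());
    }
}
```

Across 100 concurrent calls, {{main}} should report one construction and one distinct instance. If constructing the serializer and deserializer is cheap, a simpler option is to create both eagerly (for example in the constructor) and keep them in {{final}} fields, which removes the lazy check altogether.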

Consider a scenario in which three threads enter {{getSerializer}} at the same time. Each sees that {{avroSerializer}} is _null_, each creates its own instance, and the threads then race to assign their new object to the {{avroSerializer}} field.
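The interleaving described above can be forced deterministically for illustration. In this sketch, the hypothetical {{SlowSerializer}} constructor waits on a {{CyclicBarrier}}, so no thread can assign the field until every thread has passed the _null_ check; {{UnsafeLazySerDe}} mirrors the unsynchronized getter quoted above. All class names are stand-ins, not Hive classes.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for AvroSerializer. Its constructor blocks at a barrier, so
// every thread must finish the null check before any thread can assign the field.
class SlowSerializer {
    static final AtomicInteger CREATED = new AtomicInteger();

    SlowSerializer(CyclicBarrier barrier) {
        CREATED.incrementAndGet();
        try {
            barrier.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Same shape as the getter in the issue: plain field, no lock, check-then-act.
class UnsafeLazySerDe {
    private final CyclicBarrier barrier;
    private SlowSerializer avroSerializer;

    UnsafeLazySerDe(CyclicBarrier barrier) {
        this.barrier = barrier;
    }

    SlowSerializer getSerializer() {
        if (avroSerializer == null) {
            avroSerializer = new SlowSerializer(barrier);
        }
        return avroSerializer;
    }
}

public class LazyInitRaceDemo {
    // Starts `threads` threads that each call getSerializer once; returns how many
    // SlowSerializer instances were constructed during this run.
    static int run(int threads) throws InterruptedException {
        int before = SlowSerializer.CREATED.get();
        UnsafeLazySerDe serde = new UnsafeLazySerDe(new CyclicBarrier(threads));
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(serde::getSerializer);
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        return SlowSerializer.CREATED.get() - before;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("instances created by 3 threads: " + run(3));
    }
}
```

Here {{run(3)}} returns 3: three serializers are built and the last assignment wins. In a real SerDe the window between the check and the assignment is much narrower, so the race would be intermittent rather than guaranteed.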



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
