Hi,

While running ADAM [https://github.com/bigdatagenomics/adam], which 
uses Avro, on Apache Spark, we discovered that threads (tasks, in Spark terms) 
block inside Avro while executing the getDefaultValue(Field field) method 
in GenericData.java 
[https://github.com/apache/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java]

Cause: use of SynchronizedMap
Resolution: use ConcurrentHashMap instead
Affected versions: 1.7.4 and above
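
To illustrate the kind of change involved, here is a minimal sketch. It is 
not the actual Avro code or the patch: the class name, the String keys, and 
the computeDefault helper are hypothetical stand-ins, and computeIfAbsent 
assumes Java 8 or later. With a synchronized map, every lookup takes the same 
lock, so concurrent tasks queue up behind each other; ConcurrentHashMap reads 
do not take a map-wide lock.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DefaultValueCacheSketch {
    // Before (sketch): every get/put synchronizes on one shared monitor,
    // so threads doing lookups at the same time block each other.
    static final Map<String, Object> SYNCHRONIZED_CACHE =
        Collections.synchronizedMap(new HashMap<String, Object>());

    // After (sketch): reads proceed without a map-wide lock.
    static final Map<String, Object> CONCURRENT_CACHE =
        new ConcurrentHashMap<String, Object>();

    // Hypothetical stand-in for the work of resolving a field's default value.
    static Object computeDefault(String fieldName) {
        return "default-for-" + fieldName;
    }

    static Object lookupSynchronized(String fieldName) {
        Object value = SYNCHRONIZED_CACHE.get(fieldName);   // takes the lock
        if (value == null) {
            value = computeDefault(fieldName);
            SYNCHRONIZED_CACHE.put(fieldName, value);       // takes the lock again
        }
        return value;
    }

    static Object lookupConcurrent(String fieldName) {
        // Atomic per key: the value is computed once and then reused.
        return CONCURRENT_CACHE.computeIfAbsent(
            fieldName, DefaultValueCacheSketch::computeDefault);
    }

    public static void main(String[] args) {
        System.out.println(lookupSynchronized("name"));
        System.out.println(lookupConcurrent("name"));
    }
}
```

Both lookups return the same value for a given key; the difference is only 
in how much the callers contend with each other under load.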

The issue and an associated patch are already logged in JIRA 
(https://issues.apache.org/jira/browse/AVRO-1760). After applying the patch, we 
saw a 3.4x performance improvement.

For details, see the attachment.

Thank you,
Mulugeta
