[ https://issues.apache.org/jira/browse/TIKA-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17935136#comment-17935136 ]
Tim Allison edited comment on TIKA-4393 at 3/13/25 10:41 AM: ------------------------------------------------------------- Thank you for opening this issue. I haven't worked much in this area of the codebase. -On a quick review, it looks like even with a synchronized method, the underlying map's contents would not be threadsafe -- AbstractConverter maintains and modifies instance data. For example, in MSOfficeXMLConverter, the call to {{process(...)}} calls {{super.setMetadata(metadata)}}.- -It looks like we need to create new converters for each call to convert or use threadlocal, which I really want to avoid.- Wait, no, I'm wrong. The new classes are constructed per instance in getConverter(). The Map stores the mimeType and the classes for the converters NOT instances for the converter. So, y, the {{synchronized}} keyword should work. We should add multithreaded tests on the same document type to confirm my above suspicions from the code review. WDYT? was (Author: talli...@mitre.org): Thank you for opening this issue. I haven't worked much in this area of the codebase. On a quick review, it looks like even with a synchronized method, the underlying map's contents would not be threadsafe -- AbstractConverter maintains and modifies instance data. For example, in MSOfficeXMLConverter, the call to {{process(...)}} calls {{super.setMetadata(metadata)}}. It looks like we need to create new converters for each call to convert or use threadlocal, which I really want to avoid. We should add multithreaded tests on the same document type to confirm my above suspicions from the code review. WDYT? > Thread-safety issue in TikaToXMP.getConverterMap() > -------------------------------------------------- > > Key: TIKA-4393 > URL: https://issues.apache.org/jira/browse/TIKA-4393 > Project: Tika > Issue Type: Bug > Components: tika-app > Affects Versions: 2.9.3, 3.1.0 > Environment: java 8, fedora 3.9, tika 2.9.3 > Reporter: Ghiles OUAREZKI > Priority: Minor > > h2. Description: > We execute TikaToXmp conversion in a multithreaded environment where we call > multiple conversion at the same time. According to new parser contribution > guide, there is nothing about thread-safety. During our test, a thread-safety > issue has been found in the getConverterMap() method of the TikaToXMP class > in the tika-xmp module. > > We are currently using the 2.9.3 version. We checked in the latest version > and we think the behavior is the same. > h2. Problem: > The method manipulates the static class variable converterMap, but it is not > thread-safe. Since there is no synchronization mechanism, multiple threads > can enter getConverterMap() at the same time and create different instances > of converterMap, leading to potential race conditions. > Current code: > {code:java} > private static Map<MediaType, Class<? extends ITikaToXMPConverter>> > converterMap; > private static Map<MediaType, Class<? extends ITikaToXMPConverter>> > getConverterMap() { > if (converterMap == null) { > converterMap = new HashMap<>(); > initialize(); > } > return converterMap; > } > {code} > If multiple threads call getConverterMap() simultaneously when converterMap > is still null, they might create multiple instances of converterMap, which > can lead to unexpected behavior. > h2. > Potential solution: > To ensure thread safety, we offer a simple patch : to synchronize > getConverterMap() so that only one thread can initialize converterMap at a > time. The fix consists of adding the *synchronized* keyword to the method: > {code:java} > private static synchronized Map<MediaType, Class<? extends > ITikaToXMPConverter>> getConverterMap() { > if (converterMap == null) { > converterMap = new HashMap<>(); > initialize(); > } > return converterMap; > }{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)