[ 
https://issues.apache.org/jira/browse/FLINK-32896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757683#comment-17757683
 ] 

Marcono1234 commented on FLINK-32896:
-------------------------------------

{quote}
Marcono1234 do you have capacity to work on this one?
{quote}
The main problem is that I am not familiar with Flink, neither the source code 
nor the project itself. It would therefore be great if one of you could do 
this, because it will most likely be much easier and way more efficient than me 
trying to get familiar with the project and how to build it. Sorry if this is 
causing trouble for you.
I came across these issues while [writing a custom CodeQL 
query|https://github.com/Marcono1234/codeql-java-queries/blob/master/codeql-custom-queries-java/queries/likely-bugs/method-ref-creating-collection-with-capacity.ql]
 and running it against popular Java projects on GitHub to verify that it works 
correctly. The query results included those listed above for Flink, and I 
thought it might be useful for you if I notified you about these issues.

> Incorrect `Map.computeIfAbsent(..., ...::new)` usage which misinterprets key 
> as initial capacity
> ------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-32896
>                 URL: https://issues.apache.org/jira/browse/FLINK-32896
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.18.0, 1.17.1
>            Reporter: Marcono1234
>            Priority: Minor
>              Labels: starter
>
> The are multiple cases in the code which look like this:
> {code}
> map.computeIfAbsent(..., ArrayList::new)
> {code}
> Not only does this create a new collection (here an {{ArrayList}}), but 
> {{computeIfAbsent}} also passes the map key as argument to the mapping 
> function, so instead of calling the no-args constructor such as {{new 
> ArrayList<>()}}, this actually calls the constructor with {{int}} initial 
> capacity parameter, such as {{new ArrayList<>(initialCapacity)}}.
> For some cases where this occurs in the Flink code I am not sure if it is 
> intended, but in same cases this does not seem to be intended. Here are the 
> cases I found:
> - 
> [{{HsFileDataIndexImpl:163}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsFileDataIndexImpl.java#L163C70-L163C84]
> - 
> [{{HsSpillingStrategy:128}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsSpillingStrategy.java#L128C68-L128C82]
> - 
> [{{HsSpillingStrategy:134}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsSpillingStrategy.java#L134C63-L134C77]
> - 
> [{{HsSpillingStrategy:140}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsSpillingStrategy.java#L140C63-L140C77]
> - 
> [{{HsSpillingStrategy:145}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsSpillingStrategy.java#L145C70-L145C84]
> - 
> [{{HsSpillingStrategy:151}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsSpillingStrategy.java#L151C65-L151C79]
> - 
> [{{HsSpillingStrategy:157}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsSpillingStrategy.java#L157C65-L157C79]
> - 
> [{{HsSpillingStrategyUtils:76}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsSpillingStrategyUtils.java#L76C74-L76C88]
> - 
> [{{ProducerMergedPartitionFileIndex:171}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/tiered/file/ProducerMergedPartitionFileIndex.java#L171C75-L171C89]
> - 
> [{{TestingSpillingInfoProvider:201}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/test/java/org/apache/flink/runtime/io/network/partition/hybrid/TestingSpillingInfoProvider.java#L201C56-L201C70]
> - 
> [{{TestingSpillingInfoProvider:208}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/test/java/org/apache/flink/runtime/io/network/partition/hybrid/TestingSpillingInfoProvider.java#L208C54-L208C66]
> - 
> [{{TestingSpillingInfoProvider:216}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/test/java/org/apache/flink/runtime/io/network/partition/hybrid/TestingSpillingInfoProvider.java#L216C54-L216C66]
> - 
> [{{SourceCoordinatorConcurrentAttemptsTest:269}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/test/java/org/apache/flink/runtime/source/coordinator/SourceCoordinatorConcurrentAttemptsTest.java#L269C53-L269C65]
> This can lead either to runtime exceptions in case the map key is negative, 
> since the collection constructors reject negative initial capacity values, or 
> it can lead to bad performance if the key (which is misinterpreted as initial 
> capacity) is pretty low, such as 0, or is pretty large and therefore 
> pre-allocates a lot of memory for the collection.
> Regardless of whether or not these cases are intended it might be good to 
> replace them with lambda expressions to make this more explicit:
> {code}
> map.computeIfAbsent(..., capacity -> new ArrayList<>(capacity))
> {code}
> or
> {code}
> map.computeIfAbsent(..., k -> new ArrayList<>())
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to