On Wed, 11 Jan 2023 21:51:45 GMT, Maurizio Cimadamore <mcimadam...@openjdk.org> 
wrote:

> So, in this example though you are calling an instance method before the 
> object is initialized, which would seem to me like a leak

D'oh, you're right. But if you made `returnMe()` static or private then the 
argument would still hold.

> And, if the method is static, same story - you are passing ref3 somewhere 
> else, and ref3 potentially contains this.

Not true... static methods are safe to "invoke" because they can't be 
overridden. When the analyzer "invokes" the method, it will see that all it 
does with its parameter is return it.

In other words, any method (or constructor) in the current compilation unit 
that can't be overridden is never considered "somewhere else".

> While I know that this is not perfect, and bound to generate false positives, 
> I get the spirit of my question is, really: is there a (much) simpler scheme 
> we can get away with, which has bounded complexity, and which has the 
> property we care about (which seems to be no false negative). I'm less 
> worried about contrived cases emitting false positives, as it's an optional 
> warning that can be shut down - but I'd like perhaps to move the discussion 
> from trying to detect precisely if the leak happens to try to detect if 
> potentially a leak can happen, and see if there's some simpler analysis that 
> can be used to get there (e.g. one which doesn't require flooding). It's 
> possible that you have already considered all these options and the analysis 
> you have here is the best trade off between complexity and precision - but 
> I'd like to have a better understanding of what the trade offs are, and, more 
> importantly, what happens to real code when we tweak the analysis this or 
> that way (as this is a
  problem where I feel it's easy to get in the land of diminishing returns).

I can only report from my own experience. I thought about this a bit and tried 
a few different things and what I ended up was the best I could come up with in 
terms of that trade-off. Of course there was a good bit of intuition and 
SWAG'ing in that and also a lot of thought experiments.

Of course if you have a clever idea for how to do this in a simpler way that 
achieves basically the same result, I'm all ears :)

But I don't really have an analysis of all the trade-offs. Really, at a certain 
point I stumbled on what I thought was a fairly straightforward way to achieve 
a good false negative rate and a very low false positive rate, without a lot of 
work. And frankly at that point I stopped caring about the question you're 
asking, although I agree it's certainly interesting from an academic point of 
view (I'm a practical person).

A lot of my initial ideas were too simple for my taste, because I could easily 
find real-world false negatives.

An example of "too simple": at first I was not trying to track an outer 'this'. 
But I found a lot of constructors out there that instantiate nested classes and 
then do things with them. Without tracking outer 'this' these would all be 
missed.

To take a random example of that:

public class FileChooserDemo extends JPanel implements ActionListener {
    ...
    public FileChooserDemo() {
        ...
        // create a radio listener to listen to option changes
        OptionListener optionListener = new OptionListener();
 
         // Create options
        openRadioButton = new JRadioButton("Open");
        openRadioButton.setSelected(true);
        openRadioButton.addActionListener(optionListener);  // <- LEAK HERE
        ...
    }

    private class OptionListener implements ActionListener {
        ...
    }
}

That particular leak is (probably) innocuous, but it still qualifies as a leak 
and should be reported. Note that failing to track an outer 'this' causes false 
negatives, which are a bigger problem than false positives.

On the flip side, I started out trying to explicitly track field references, 
but that seemed like more complexity than needed, at least for phase 1, so that 
was left out.

The "flooding" aspect didn't really worry me because the reference set is 
"append only" and there is small, finite set of possible references at each 
scope, so it can't really get that out of hand. Testing indeed shows it's not a 
problem. By the way recursion also "floods" until convergence, just like 
looping.

I would have been pleased to find "a much simpler scheme". But the more I 
thought about those options, the more I came up with easy misses. And after a 
certain point, it actually became easier to simply "execute" the code by 
scanning the AST while carrying around a `Set<Ref>` to track possible 
references. It's really that simple. Instead of trying to be "smart" we just 
let the code tell us what happens.

Apologies if this is not a very good answer to your question.

In the big picture, the false positive rate is traded-off against code 
complexity, and we want the best possible trade-off, right?

But how are you defining "complexity"?

If you really mean performance, then I don't see a problem... the JDK build 
times are essentially unchanged.

If you mean lines of code or whatever, then it doesn't seem inordinate. And the 
algorithm is more or less just an AST scan, like lots of other examples in the 
compiler code. So I don't see a problem there either.

Yes, there may be a much simpler way that's just as good, but if I could have 
thought of it I would have already. Perhaps you or someone else has better 
insight.

-------------

PR: https://git.openjdk.org/jdk/pull/11874

Reply via email to