Incoming PRs related to 'tree' enhancements

Bear Giles Sun, 30 Mar 2025 10:06:17 -0700

I've taken the time to look around the other 'tree' classes, or more
precisely the '*DependencyNodeVisitor" classes, and noticed that most if
not all of them follow the anti-pattern of doing more than one thing in
each method. More precisely they mix logic ('what to include') and
presentation ('what to send to the writer). This makes code difficult to
understand, to test, to extend by a third party, etc.


This is why template engines were developed! :-)

Fortunately there's already a dependency on 'velocity', although it appears
to only be used during the extensive testing. However it's still there in
the 'compile' scope so using it won't require adding new dependencies.

That's a very clean solution to the problem identified above. For most
formats the velocity template won't need anything beyond what's already
included in the WrappedNode (DependencyNode). If a format can benefit from
adding a tiny bit of additional information it can be added to the
WrappedNode since that class is exclusive to the 'tree' package. In more
complex cases the individual class can maintain it's own data structures
and add them to the Context provided to velocity.

This should work with all of the existing classes although I'm only
planning to modify one at this time. It should make the classes much easier
to maintain since they can focus on the logic of how to convert the data
provided by the DependencyNode to the data usable by the Velocity engine. I
know other template engines, e.g., jinjava, have a way to provide
user-defined functions but I don't think velocity does - that's unfortunate
since it will require a modest amount of pre-computation. Fortunately that
can be handled separately, immediately before calling the Velocity 'merge'
method.

This will only require a few classes - basically just a wrapper to set up
the velocity engine. It will also require a standard place to put the
templates and I recommend 'templates/{format}/{name} and
'templates/{format}/macros/{name}. The 'format' would be 'dot', 'json',
etc., and the 'macros' allows velocity macros to be kept separately from
the templates since they're much more likely to be reused. (Hmm... perhaps
also 'templates/macros/' for macros that are format-agnostic.

-----------------------------

The second PR will probably be a bit more controversial since there's
multiple ways to implement it and comments in some of the 'upstream'
methods suggest there's already WIP that may affect how the data is
provided to the *DependencyNodeVisitor classes.

At the moment these classes only include two methods - visit(node) and
endVisit(node). It is the responsibility of the endVisit(node) method to
determine that all data has been received. I think that's an obvious
omission but the DependencyNodeVisitor interface is defined in a different
project. It would be trivial to add to the existing implementations, and
java8 supports optional methods so there would be no impact on others if
this method is added to the interface with an empty optional method.

That's not a big problem if your visitor is able to run in a 'streaming'
mode, but some formats can benefit from using a 'document' mode instead.
In most cases this doesn't matter since the results are not intended for
human consumption - they need to be a standard format accepted by other
applications and these formats are usually designed to work in a streaming
fashion.

However the dot format, and potentially other, are intended for human
consumption. The file itself isn't human-friendly but there are widely
available tools to convert it into human-friendly images and even
interactive formats like svg. In these cases it makes much more sense  to
use a document mode when creating the final document since it is often
appropriate to provide the same information, or at least related
information, in multiple places in the document.

So there's clearly a benefit to a new method that is called immediately
after the final endVisit(), one that provides the complete dependency tree.
The implementations should be able to ignore visit() and endVisit(). (And
in fact they can be added to the abstract class if not the
DependencyNodeVisitor interface)

The question is how to implement this, esp. when looking at other potential
enhancements to the class.

For instance it's possible to have DependencyNodeVisitor implement
Consumer<DependencyNode> and the 'accept()' method calls visit(). It may
not be practical since many implementations will need the nesting that
endVisit() provides.

Another possibility is to have DependencyNodeVisitor implement
Function<DependencyNode, Boolean> that's called after the final endVisit()
call. Again the idea is to simplify integration into streams but there's no
reason you couldn't just use the (object::function) notation.

I think this ultimately comes down to what's best with future improvements
to the upstream code that creates the data sent to these classes. There are
notes that indicate the desire to shift to a streaming pattern (while
maintaining legacy interfaces, of course) and it would make sense to use a
consistent approach here - even if it means reformatting the data
immediately before calling the new method.

For now I'll just add a new method to the abstract class that all of the
implementations extend but I consider it a stopgap measure since this
functionality would be useful to third party developers, etc.

(Sidenote - it could even quietly replace some of the existing logic since
the final method could easily perform a recursive descent of rootNode and
call visit() and endVisit() itself. I think this could be put into the
DependencyNodeVisitor interface as a default method so there would be no
risk to removing the existing code. The primary drawback would be that the
current approach allows the implementation to bail out early if it hits a
problem while this approach would require the entire rootNode structure to
be generated first.)

Bear

Incoming PRs related to 'tree' enhancements

Reply via email to