Hi Niklas, I haven't read the whole proposal as I'm short of time. But Alan Zimmerman is doing a lot of work on integrating HaRe with the GHC API [1]. He is alanz on freenode and a regular in #hspec.
I haven't looked at the code, but maybe it's of interest to you. Cheers, Simon [1] https://github.com/alanz/HaRe/tree/ghc-api On Mon, Apr 29, 2013 at 02:00:23PM +0800, Niklas Hambüchen wrote: > I would like to propose the development of source code refactoring tool > that operates on Haskell source code ASTs and lets you formulate rewrite > rules written in Haskell. > > Objective > --------- > > The goal is to make refactorings easier and allow global code changes > that might be incredibly tedious to do in a non-automated way. > By making these transformations convenient, we can make it easier to > maintain clean code, add new features or clean up leftovers faster, and > reduce the fear and effort to upgrade to newer versions of packages and > APIs. > > > Transformations > --------------- > > First, here are a few operations you would use this tool for. Some of > them are common operations you would also do in other programming > languages, some are more specific to Haskell. > > * Changing all occurrences of "import Prelude hiding (catch)" to "import > qualified Control.Exception as E" > > * Replacing all uses of a function with that function being imported > qualified or the other way around > > * Adding a field to data constructor a record, setting user-supplied > defaults for construction and destruction: > > -- Suppose you want to change one of these > data User = User { name :: String, age :: Int } > data User = User String Int > > -- into one of these > data User = User { name :: String, age :: Int, active :: Bool } > data User = User String Int Bool > > -- the refactoring tool could perform, in all relevant locations: > show (User name age) = ... > show (User name age _) = ... > > -- and also this transformation: > ... u { name = "deleted" } ... > ... u { name = "deleted", active = False } ... > > -- or equivalently with records. > > -- Special cases could be taken care of as specified, such as > -- "whenever an object of [this User type] has > -- of its records passed into some function 'email', do this > -- now only if the user is active, so modify all relevant code > -- email (name u) > -- to > -- if (active u) then email (name u) else return () > > -- Other examples include adding a position counter to attoparsec. > > * Adding a type parameter to a type > > -- This happens a lot on monad transformer stacks, e.g. > newtype MyMonad a b c = MyMonad (ReaderT a (WriterT b ... > > -- and as you would probably agree on, this is not the most > -- comfortable change to make; in big project this can mean > -- hour-long grinding. > > -- It has also recently happened in the basic underlying types > -- of packages like conduit and pipes. > > * Adding a new transformer around a monad > > * Addressing problems like mentioned in > http://blog.ezyang.com/2012/01/modelling-io/: > "There is one last problem with this approach: once the primitives > have been selected, huge swaths of the standard library have to be > redefined by “copy pasting” their definitions ..." > > * Extracting a value into a let or where clause > > * Renaming a variable, and all its occurrences that are semantically > same variable (based on its scope) > > * Changing the way things are done, such as: > > * Replacing uses of fmap with <$>, also taking care of the > corresponding import, and such cases were partial application > is involved > > * Replacing uses of "when (isJust)" to "forM_" > > * Making imports clearer by adding all functions used to the file to the > import list of the module that gets them in scope > > * Finding all places where an exported function does not have all its > arguments haddock-documented. > > * Performing whole-project refactorings instead of operating on single > files only, allowing operations like > > "Find me all functions of this type, e.g. > Maybe a -> (a -> m a) -> m a > in the project and extract them into this new module, > with the name 'onJust'." > > > Some of the problems above can be tried to address using regex-based > search and replace, but this already fails in the simplest case of > "import Prelude hiding (catch)" in case there is more than that imported > from Prelude or newlines involved in the import list. > > Transformation on the AST are much more powerful, and can guarantee that > the result is, at least syntactically, valid. No text base tool can do that. > > > Other uses > ---------- > > In addition to being able to perform transformations as mentioned above, > the refactoring tool as a library can be leveraged to: > > * Support or be the base of code formatting tools such as > haskell-stylish, linters, style/convention checkers, static analyzers, > test coverage tools etc. > > * Implement automatic API upgrades. > > Imagine the author of a library you use deprecates some functions, > introduces replacements, adds type parameters. In these cases, it is > very clear and often well-documented which code has to be replaced by > what. The library author could, along with the new release, publish > "upgrade transformations" that you can apply to your code base to save > most of the manual work. > > These upgrade transformations could be either parts of the packages > themselves or be separately maintained and refined on the feedback of > users applying them to their code bases. > This could allow the Haskell community to keep up their fast pace > while making API breakage a non-problem. > > Automation is what makes us deal well with test suites, source > control, error checking. Taking the pain out of this will increase > people's incentive to upgrade their code an to keep it well-maintained > by easy refactorings. > > This concept of automatic code upgrades would, to my knowledge, be > quite unique in the programming language world, and the possibility to > have this is, as many things, the result of Haskell's excellent type system. > In comparison to dynamic and "unsafe" languages, we actually have a > lot of information around that we can use to write powerful and > expressive tools. > > > Splitting into GSoC tasks > ------------------------- > > I think that the project is too large for one summer and should be split > into two parts: > > 1. Implementation of a "full-source" transformation-enabling AST > 2. The transformation engine and API / DSL > > > About 1: Creating a full-source AST > > Existing parsers for Haskell tend to throw away information that is not > necessary for compiling the code. Of course this is not good for a > code-to-code transformation tool, stripping comments or whitespace the > programmer cares about is not an option. > > haskell-src-exts has gained support for comments around version 1.1 (as > another GSoC project?), but still comments are treated as foreigners, > given to you in a list detached from the AST; their positions are > specified with integers, which means that when you modify the AST, you > have to take care to adjust the comments as well. > > The problem seems to be that haskell-source-exts is made for parsing, > its AST is made for transformations. I have yet to see a tool that > actually *modifies* haskell-src-exts's AST - most of them use it for > parsing, and then apply some form of pretty-printing to get back to code. > > We need an AST that contains *all* information about the original source > code, which means render . parse = id; this is why I called it "full > source" AST. > Also, this AST should have all elements available in a way that > encourages modification and does not discriminate against operations > like re-indentation or alignment. > > I like to think of this as something that could eventually be part of > ghc, in a pipeline like > > GHC's parser > -> full-source AST > -> reduced AST that strips parts irrelevant for compilation > -> usual compilation pipeline > > I am thinking of this being in a GHC environment because it would > guarantee that it is up-to-date with the currently supported language > extensions etc. > However, I also see that this might slow down development and make > things harder, so at least for the beginning it might be better to do it > as as a normal library; I would love to get feedback on this. > > It might be a good idea to start with GHC's or haskell-source-exts > parser and modify them to obtain the full-source AST. I believe though > that it would be beneficial if the AST would be to a certain extent be > decoupled from the parser, such that other people could write other > parsers (e.g. using other parsing libraries like Parsec, attoparsec, > uu-parsinglib, trifecta) that produce the same compatible AST type. > > I believe that creating this AST makes a GSoC project for itself, and > that it would be beneficial for all efforts processing Haskell as a > source language. > > > About 2: Transformation engine and how to write transformations > > Transformations should be Haskell code that operates on the AST. > > They would be similar to how you write TemplateHaskell in a way, yet > much easier and more intuitive to use in most cases. I suggest the > creation of a monadic DSL that allows you to select and match those > parts of the code you are interested in, perform some case analysis and > computation to determine how you want to transform it, and then give you > convenient ways to express your changes in that DSL. > > Functions in this DSL would roughly belong to one of the kinds > * finding/matching > * transforming/rewriting. > > I have not thought about how this DSL would exactly look like, but I > could imagine myself writing a high-level transformation like this: > > is <- getImports > let matching = filter (importsFunction "catch") > . filter (hasModuleName "Prelude") > mapM_ (rewrite . removeImportFunction "catch") matching > > One of the priority goals of this GSoC project is making this API > convenient; transformations should read more naturally and be much > shorter than equivalent TemplateHaskell. > > This extensive, convenience oriented monadic DSL should aim to have a > reasonably large set of functions, and "do things for you" even if you > could build them yourself with three or four combinators. > It should base itself on a minimal set of core transformations. > I think that the Haskell community has a lot of expertise, especially > from parsing libraries, in creating this core API + convenience > combinators duo, that could aid the GSoC student to get this right. > > Along with this way to specify transformations, a transformation engine > has to be constructed which can perform them somewhat efficiently over > large code-bases. > > While speed should should not be a main target for this project and > functionality and expressiveness are the main goals, the transformation > engine should be designed at least with speed in mind, which can be > worked on at a later stage outside of the GSoC project. > > This second part of the proposal should result in a library to perform > transformations and an executable that can apply them to a folder of > source files. > > > Possible extensions and follow-up projects > ------------------------------------------ > > * Interactive refactoring tool > > The refactoring tool could be driven by an interactive application that > lets you write some transformations, apply them to your code base, view > the diffs and build the project; on build failures or otherwise > not-complete transformations it would allow to refine your > transformations, or write custom special cases for selected files, line > ranges or functions. > This way you could quickly upgrade your code base in a fast > write-check-repeat cycle. > > * An API upgrade infrastructure > > As mentioned above, transformations could be published along with new > software packages that allow their dependants to easily upgrade their > code to the latest API version with minimal manual effort. > An infrastructure or standard way of way of doing this could be established. > For now, I think that package maintainers would publish their upgrade > transformations in their projects' or separate code repositories, and > users would apply them with the refactoring tool as needed. > I can also imagine a more dedicated infrastructure though, where API > upgrades are stored on Hackage or a similar database. > > > Summary > ------- > > This proposal contains two GSoC projects that I believe to be reasonably > sized for one summer each. > > I think that they make good projects according to GSoC standards as they > are focused, feasible, and aimed at creating real-world code that bring > a clearly visible benefit. > > The entry barrier is not too high since knowledge of compiler or runtime > internals is not required. However, applicants should probably have a > fairly good understanding of parsers and how they are dealt with in > Haskell, and be somewhat familiar with the Haskell community in order to > find sources of good feedback and for being able to judge whether the > tools being created would be convenient for the community to use. > > > Discussion > ---------- > > Now I would be glad to get some responses on this proposal; I have > written a bit of text, but it is still a very rough idea and I would > love to hear your thoughts about it. > > _______________________________________________ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/mailman/listinfo/haskell-cafe _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe