Yes, that would definitely be one productive way forward. One concern
is that Language.C is BSD-licensed (and it would be nice to keep it
that way), and cpphs is LGPL. However, if cpphs remained a separate
program, producing C + extra stuff as output, and the Language.C
parser understood the extra stuff, this could accomplish what I'm
interested in. It would be interesting, even, to just extend the
Language.C parser to support comments, and to tell cpphs to leave them
in.
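As a rough illustration of what "supporting comments" could mean at the lexical level, here is a minimal sketch in plain Haskell. It does not use Language.C or cpphs; the names Chunk, scan, breakOn and comments are hypothetical, and string and character literals are deliberately not handled, so a "/*" inside a string literal would be misread.

```haskell
import Data.List (isPrefixOf)

-- A piece of C source: ordinary code, or a comment with its delimiters kept.
data Chunk = Code String | Comment String
  deriving (Eq, Show)

-- Split source text into code and comment chunks without dropping a character.
-- Simplification (assumption): string and character literals are not recognised.
scan :: String -> [Chunk]
scan = go ""
  where
    -- acc holds the code seen so far, in reverse order
    go acc [] = flush acc []
    go acc s@(c:cs)
      | "/*" `isPrefixOf` s =
          let (body, rest) = breakOn "*/" (drop 2 s)
          in flush acc (Comment ("/*" ++ body) : go "" rest)
      | "//" `isPrefixOf` s =
          let (body, rest) = break (== '\n') s
          in flush acc (Comment body : go "" rest)
      | otherwise = go (c : acc) cs

    -- Emit pending code, if any, before the chunks that follow it.
    flush []  rest = rest
    flush acc rest = Code (reverse acc) : rest

-- Take everything up to and including the first occurrence of the needle.
-- If the needle never occurs, the whole input becomes the first component.
breakOn :: String -> String -> (String, String)
breakOn needle = go
  where
    go [] = ([], [])
    go s@(c:cs)
      | needle `isPrefixOf` s = (needle, drop (length needle) s)
      | otherwise             = let (a, b) = go cs in (c : a, b)

-- The comments alone, e.g. for an analysis that reads annotations from them.
comments :: [Chunk] -> [String]
comments cs = [c | Comment c <- cs]
```

A parser built on such a stream could attach each Comment to the neighbouring declaration rather than discarding it, which is the kind of extension discussed above.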
There's also another pre-processor, mcpp [1], that is quite featureful
and robust, and which supports an output mode with special syntax
describing the origin of the code resulting from macro expansion.
Aaron
[1] http://mcpp.sourceforge.net/
On Mar 30, 2010, at 12:14 PM, austin seipp wrote:
(sorry for the dupe aaron! forgot to add haskell-cafe to senders
list!)
Perhaps the best course of action would be to try and extend cpphs to
do things like this? From the looks of the interface, it can already
do some of these things e.g. do not strip comments from a file:
http://hackage.haskell.org/packages/archive/cpphs/1.11/doc/html/Language-Preprocessor-Cpphs.html#t%3ABoolOptions
Malcolm would have to attest to how complete it is w.r.t. say, gcc's
preprocessor, but if this were to be a SOC project, extending cpphs to
include needed functionality would probably be much more realistic
than writing a new one.
On Tue, Mar 30, 2010 at 12:30 PM, Aaron Tomb <at...@galois.com> wrote:
Hello,
I'm wondering whether there's anyone on the list with an interest in
doing additional work on the Language.C library for the Summer of Code.
There are a few enhancements that I'd be very interested in seeing, and
I'd love to be a mentor for such a project if there's a student
interested in working on them.
The first is to integrate preprocessing into the library. Currently, the
library calls out to GCC to preprocess source files before parsing them.
This has some unfortunate consequences, however, because comments and
macro information are lost. A number of program analyses could benefit
from metadata encoded in comments, because C doesn't have any sort of
formal annotation mechanism, but in the current state we have to resort
to ugly hacks (at best) to get at the contents of comments. Also,
effective diagnostic messages need to be closely tied to original source
code. In the presence of pre-processed macros, column number information
is unreliable, so it can be difficult to describe to a user exactly what
portion of a program a particular analysis refers to. An integrated
preprocessor could retain comments and remember information about
macros, eliminating both of these problems.
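One way to picture "remembering information about macros" is a token type that carries its position in the original file together with the macro it was expanded from. The following is only a sketch under simplifying assumptions: tokens are separated by single spaces, only object-like macros exist, and replacement lists are not rescanned. The names Pos, Tok, Macros, wordsWithCols and expand are hypothetical and are not part of Language.C, GCC, or cpphs.

```haskell
-- Line and column in the original, unpreprocessed file (both 1-based).
data Pos = Pos { posLine :: Int, posCol :: Int }
  deriving (Eq, Show)

-- A token that remembers where it came from.
data Tok = Tok
  { tokText :: String        -- the token text after expansion
  , tokPos  :: Pos           -- where the user wrote it (or wrote the macro name)
  , tokFrom :: Maybe String  -- Just m if produced by expanding macro m
  } deriving (Eq, Show)

-- Object-like macros only: a name and its replacement tokens.
type Macros = [(String, [String])]

-- Split one line into space-separated words paired with their start columns.
wordsWithCols :: String -> [(Int, String)]
wordsWithCols = go 1
  where
    go _ [] = []
    go col s@(c:cs)
      | c == ' '  = go (col + 1) cs
      | otherwise =
          let (w, rest) = break (== ' ') s
          in (col, w) : go (col + length w) rest

-- Expand macros, keeping the original position on every resulting token.
-- Replacement lists are not rescanned, so macros do not nest in this sketch.
expand :: Macros -> String -> [Tok]
expand ms src =
  [ tok
  | (ln, line) <- zip [1 ..] (lines src)
  , (col, w)   <- wordsWithCols line
  , tok        <- expandWord (Pos ln col) w
  ]
  where
    expandWord p w = case lookup w ms of
      Nothing   -> [Tok w p Nothing]
      Just body -> [Tok t p (Just w) | t <- body]
```

With this representation a diagnostic about any token produced by a macro can point at the column where the macro name appears in the source, instead of a column in the expanded text that the user never saw.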
The second possible project is to create a nicer interface for
traversals over Language.C ASTs. Currently, the symbol table is built to
include only information about global declarations and those other
declarations currently in scope. Therefore, when performing multiple
traversals over an AST, each traversal must re-analyze all global
declarations and the entire AST of the function of interest. A better
solution might be to build a traversal that creates a single symbol
table describing all declarations in a translation unit (including
function- and block-scoped variables), for easy reference during further
traversals. It may also be valuable to have this traversal produce a
slightly-simplified AST in the process. I'm not thinking of anything as
radical as the simplifications performed by something like CIL, however.
It might simply be enough to transform variable references into a form
suitable for easy lookup in a complete symbol table like I've just
described. Other simple transformations such as making all implicit
casts explicit, or normalizing compound initializers, could also be
good.
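The idea of a single complete symbol table plus resolved variable references can be sketched on a toy statement type. This is not the Language.C AST or its existing analysis API; Stmt, RStmt, SymTab and resolve are hypothetical names, and a real table would store types and source locations rather than just names.

```haskell
-- A toy statement language standing in for a C function body.
data Stmt
  = Decl String          -- declare a variable in the current scope
  | Use String           -- refer to a variable by name
  | Block [Stmt]         -- a nested scope
  deriving (Eq, Show)

-- The same tree after resolution: every reference carries the unique
-- identifier of the declaration it refers to.
data RStmt
  = RDecl Int String
  | RUse (Maybe Int) String   -- Nothing: no declaration is in scope
  | RBlock [RStmt]
  deriving (Eq, Show)

-- Complete table: one entry per declaration, whatever its scope.
type SymTab = [(Int, String)]

-- One pass that builds the complete table and rewrites references.
resolve :: [Stmt] -> (SymTab, [RStmt])
resolve stmts = (reverse table, out)
  where
    ((_, table), out) = block [] (0, []) stmts

    -- Process the statements of one scope, left to right. The environment
    -- (innermost declaration first) grows as declarations are seen and is
    -- dropped when the scope ends; the table only ever grows.
    block _   st []       = (st, [])
    block env st (s : ss) =
      let (env', st', r) = stmt env st s
          (st'', rs)     = block env' st' ss
      in (st'', r : rs)

    stmt env (n, t) (Decl x)   = ((x, n) : env, (n + 1, (n, x) : t), RDecl n x)
    stmt env st     (Use x)    = (env, st, RUse (lookup x env) x)
    stmt env st     (Block ss) =
      let (st', rs) = block env st ss
      in (env, st', RBlock rs)
```

Later traversals can then look up any RUse identifier directly in the table, including identifiers of block-scoped variables whose scope has already ended, without re-analyzing the declarations.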
A third possibility, which would probably depend on the integrated
preprocessor, would be to create an exact pretty-printer. That is, a
pretty-printing function such that pretty . parse is the identity.
Currently, parse . pretty should be the identity, but it's not true the
other way around. An exact pretty-printer would be very useful in
creating rich presentations of C source code: think LXR on steroids.
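The law "pretty . parse is the identity" can be shown on a deliberately tiny example in which the "parser" only splits text into whitespace-separated lexemes but keeps the whitespace it skips. This is a sketch of the property, not of a C parser: Lexeme, Doc, parse and pretty are hypothetical names, and comments are treated as ordinary words here.

```haskell
import Data.Char (isSpace)

-- A lexeme together with the whitespace that preceded it in the source.
data Lexeme = Lexeme { leading :: String, text :: String }
  deriving (Eq, Show)

-- A parsed file: its lexemes plus whatever whitespace follows the last one.
data Doc = Doc [Lexeme] String
  deriving (Eq, Show)

-- Split the input into lexemes, recording all layout instead of dropping it.
parse :: String -> Doc
parse s =
  case span isSpace s of
    (ws, [])   -> Doc [] ws
    (ws, rest) ->
      let (tok, rest') = break isSpace rest
          Doc ls end   = parse rest'
      in Doc (Lexeme ws tok : ls) end

-- Print exactly what was read: every lexeme with its original layout.
pretty :: Doc -> String
pretty (Doc ls end) = concatMap (\l -> leading l ++ text l) ls ++ end
```

Because every character of the input ends up either in a leading field, a text field, or the trailing string, pretty (parse s) == s holds for every s. An exact pretty-printer for C would need the AST to carry the same kind of layout, comment, and macro information on its nodes.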
If you're interested in any combination of these, or anything similar,
let me know. The deadline is approaching quickly, but I'd be happy to
work together with a student to flesh any of these out into a full
proposal.
Thanks,
Aaron
--
Aaron Tomb
Galois, Inc. (http://www.galois.com)
at...@galois.com
Phone: (503) 808-7206
Fax: (503) 350-0833
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
--
- Austin