Hi Dodji Seketeli:

Is it possible to gcc to accept libcpp.patch and plugin.patch?

I recently rewrite my doc.txt which mainly add a new section <Macro
Expansion Overview>, it's focused on pfile.context usage linking to
macro. I think it's important to use cb_macro_start/end callbacks
because most users only care about the outest macro expansion, test
whether pfile.context.prev == NULL, however if he hasn't known macro
cascaded case, his code will crashed.

Sincerely
                                                                     Yunfeng
// vim: foldmarker=<([{,}])> foldmethod=marker
                         Gcc symbol database (symdb)
                             zyf.zer...@gmail.com
                              November 24, 2009
                           revised on May 24, 2012

// Purpose <([{
The file is used to record the idea I got -- collecting gcc internal data
(definition, file-dependence etc.) and outputting them into database for
further usage.  Have you knowed cscope? but I think it's more appropriate that
symbols should be collected by gcc itself. Later sections can be cataloged
into two genres

For user (here user is IDE-like develop tools, not final user)
1) Need to know what symdb can do, goto section <Feature List>.
2) Goto section <User Manual> for how to using symdb.
3) <Multiple results> and <Tested cases> have more about the plugin.
For gcc internal developer
1) Section <New Token Type> defines some new token types used in my symdb.
2) Sections <Gcc XXXX Macro Expansion> shows some complex cases linking to
macro expansion, I list calling sequence from plugin-side and stack snapshot
from gcc-side in every section, read them carefully, it's the key to
understand so.c:class mo.  test/testplan.txt and test/macro have the
testcases.
3) Section <Patch Overview> makes focus on which files and how are changed in
the patch.

Before we go, let's clear up some terminology or abbreviation used in my symdb
1) cpp abbreviates from c preprocess (follows gcc intern convention); however,
cxx represents c++.
2) In gcc/c-ppoutput.c, gcc defines compilation unit as compiling a file, new
noun `compilation session' means compiling all files of a project.
// }])>

For User
// Feature List <([{
1) The plugin only works on C not C++.
2) Plugin can collect all extern definitins and dump them to database.
3) As the convention, the members of an extern enum are collected and dumped
to database.
4) Funtion call relationship are collected too, just like cscope.
5) You can use table FileDependence of database to reconstruct file dependence
relationship.
6) Not in cscope, you can use `gs addsym/rmsym' to re-edit the database,
remove the duplicate results and append new symbols to the database, see
section <Multiple-results ...>.
7) I finished a vim script `helper.vim' to help you using the database in vim.
8) My plugin is better than cscope in any cases, since I can catch definition
after macro expansion (such as tell you where `sys_open' is defined in linux
source) and skip `#ifdef/#if'.
// }])>

// User Manual <([{
Note: Using my plugin on correct code, buggy code maybe cause my plugin
infinite-loop.

Prepare stage (patch on gcc-4.6.3):
1) cp gcc.patches/* gcc.src/patches
2) quilt push -a
3) make # as usual
More about compilation suite (such as crosstool-ng-1.13.2):
1) Since gcc plugin is implemented as shared library, so disable compiling
static toolchain option.
2) add `--enable-plugin' to your gcc configure line, or append
`CT_CC_GCC_ENABLE_PLUGINS=y' to your crossng.config.
3) See section <Tested cases> for a sample command line on gcc-4.6.3.

Compiling source by patched gcc (cd myplugin.src/):
1) make
2) cp gs helper.vim init.sql target.src/ && cd target.src/
3) ./gs initdb ./ # Initialize database. If you want to custom the plugin,
update plugin-control-fields of database:ProjectOverview.
4) Append `-fplugin=/path/to/symdb.so
-fplugin-arg-symdb-dbfile=/target.src/gccsym.db' to your CFLAGS.
5) Since sqlite uses file-lock to synchronize database, so use `make -j1' to
compile source.
6) It will cost more time to compile your project, because my plugin need
compare whether a token has been inserted into database and multi-core can't
help you -- see previous steps, do it overnight.
7) ./gs vacuumdb ./ # Rearrange and defrag your database.
Of course, you can use some short-cuts to compile your projects without any
modification.
    alias gcc='gccplugin -fplugin=symdb.so -fplugin-arg-symdb-dbfile=gccsym.db'
    alias make='make -j1'

Working with new database:
1) cd /target.src
2) vi
3) execute `:source helper.vim'
4) Using `CTRL-]' to search a definition.
5) Using `CTRL-[' to search which functions calls the function.
6) Using `CTRL-T' to jump back.
7) Using `Gs def yoursymbol' to search a definition.
8) Using `Gs callee yourfunction' to search function call relationship.

Vim quickref:
Since my database stores the file-offset of every token, so
1) Using `:go fileoffset' to jump to the token.
2) Using `g<CTRL-g>' on the char to get the file-offset.

Testing my plugin:
1) cd test && ./run.sh
// }])>

// Multiple-results from my database and my solution <([{
Things are not always perfect, here I list some cases which makes my plugin
return multiple-resuls when search a definition.

    int i;
    int i;
    extern int i;
The syntax is acceped by gcc, so to my plugin, by the way, my plugin doesn't
store the third line to database.

    typedef abc abc;
    struct abc {
        ...
    };
The syntax is correct. `gs' will return two results, one is DEF_TYPEDEF,
another is DEF_STRUCT. In gcc-4.6.2, search `cpp_reader' will see the case, by
the way, `helper.vim' will search definition from current file and its
dependence, then all files, so in libcpp/internal.h searching `cpp_reader'
will return one result.

    #define X a
    #include "afile"
    #undef X
    #define X b
    #include "afile"
    #undef X
Search `X' will return multiple results, gcc/tree.c includes several
gcc/tree.def and definite symbol `DEFTREECODE' several times.

    #ifdef X
    int x;
    #else
    char x;
    #endif
Sometime `x' is returned two results.
glibc-2.13/include/shlib-compat.h:compat_symbol is the case. The reason is
every file is compiled two times in glibc internal
    gcc .. -DSHARED .. -o x.os
    gcc .. -o x.o
It causes the strange case.

Fortunately, I add two commands (not in cscope :)
    gs addsym/rmsym filename definition position
So you can remove the item you doesn't like by them. `gs addsym' is also
useful since you can use it to append the symbols which are invisible from my
plugin, such as `gs addsym arch/mips/kernel/vmlinux.lds jiffies 6601'.
// }])>

// Tested projects (gcc, glibc and linux) <([{
Prepare gcc support packages:
sqlite-autoconf-3070900:
    ./configure --prefix=/home/zyf/root
    make install
gmp-5.0.1
    ./configure --prefix=/home/zyf/root/ --enable-shared=no && make install
mpfr-3.0.1
    ./configure --prefix=/home/zyf/root/ --with-gmp=/home/zyf/root/
    --enable-shared=no && make install
mpc-0.9
    ./configure --prefix=/home/zyf/root/ --with-mpfr=/home/zyf/root/
    --with-gmp=/home/zyf/root/ --enable-shared=no && make install
gcc-4.6.3 (x86):
    *) ./configure --prefix=/home/zyf/root/ --with-mpc=/home/zyf/root/
    --with-gmp=/home/zyf/root/ --with-mpfr=/home/zyf/root/
    *) # After quilt my patches.
    *) make STAGE1_CFLAGS="-ggdb" all-stage1
    *) # prepare database.
    *) make STAGE2_CFLAGS="-fplugin=/home/zyf/src/symdb.gcc/symdb.so
    -fplugin-arg-symdb-dbfile=/home/zyf/src/gcc-4.6.3/gccsym.db" all-stage2
    *) # rearrange database.
glibc-2.13 (mips):
    *) Change glibc.src/Makeconfig: `override CFLAGS = '.
linux-2.6.35 (mips):
    *) Change linux.src/Makefile: `KBUILD_CFLAGS := '.
// }])>

For Developer
// New Token Type <([{
In this section, I'll define some new token type which is available everywhere
in my symdb, but before we go, we need go further into gcc internal on how gcc
compiles a file, consider the case

    ----------- a.c -----------
    #define FOO \
        = 2;
    // a line comment only.
    ...
    int x FOO;

Now do `gcc -save-temps --verbose a.c', you will get all intermediate files
and gcc call hierarchy -- main, preprocess, compiling, assemble and linkage
stage. Comparing a.c and a.i, we find some tokens are erased, some are
substituted, so my new token types are
1) EXPANDED_TOKEN -- is macro-expanded or substituted (`FOO' of the last
line).
2) ERASED_TOKEN -- erased during preprocess stage (The first 3 lines).
3) COMMON_TOKEN -- exist in both .c and .i.
4) MACRO_TOKEN -- macro result (sample is `=' and `2').

And
*) Original .c/.h include the first 3 types -- also called chToken.
*) .i includes the last 2 types -- called iToken.
*) To function-like macro, such as, `x', `(', `a', `)', all four tokens are
EXPANDED_TOKEN, and the first is also called leader EXPANDED_TOKEN.

By the way, preprocess stage also combines all soft carriage line -- tailing
with `\' into a line, so the first 2 lines are combined, and `\' itself isn't
cpp token.
// }])>

// Macro Expansion Overview, based on gcc-4.6.3 <([{
The fold records gcc macro expansion internal flow and my implementation,
generally when macro expansion occurs, new context is pushed into
pfile.context and its context.macro != NULL, however later lines show there're
other cases using pfile.context too and context.macro = NULL. Note we ignore
pragma, conditional macro, error handling, traditional macro lines.
Terminology:
    1) paste case: `#define A x ## y'. To `x' token.flags & PASTE_LEFT.
    2) stringify case: `#define A(x) #x'. To `x' token.flags & STRINGIFY_ARG.

cpp_get_token:
    _cpp_lex_token
    if token.flags == PASTE_LEFT, paste_all_tokens. context.macro = NULL.
    _cpp_pop_context, cb_macro_end is broadcasted.
    cb_macro_start is broadcasted, enter_macro_context

enter_macro_context:
    If fun_like:
        funlike_invocation_p (>> collect_args), and when macro cascaded, it
            pushes a context.macro = NULL.
        replace_args, see below.
    If !fun_like:
        _cpp_push_token_context, context.macro != NULL.

// replace_args: tags paste tokens and handles stringify cases <([{
Iterate macro tokens and handle CPP_MACRO_ARG.
    stringify case: call stringify_arg.
    paste case: do nothing.
    macro case: call expand_arg which context.macro = NULL.
Do replacement:
    token literally copy from user token and continue.
    otherwise CPP_MACRO_ARG, do later
    1) insert padding token before the arg unless it's the first token.
    `!(src[-1].flags & PASTE_LEFT)'.
    2) copy arg result.
    3) insert padding token after args. The token is pfile.avoid_paste
    (CPP_PADDING). `if !(src->flags & PASTE_LEFT)'.
Call push_ptoken_context, context.macro != NULL.

So to case 1:
    #define xglue(x, y) x y
    `x', CPP_PADDING (post-padding), CPP_PADDING (pre-padding), `y',
    CPP_PADDING (post-padding).
So to case 2:
    #define xglue(x, y) x ## y
    `x', `y', CPP_PADDING (post-padding).
    And `x' token.flags = PASTE_LEFT.
// }])>

// Context conclusion <([{
cpp_get_token calls _cpp_pop_context.
paste_all_tokens calls _cpp_push_token_context, context.macro = NULL.
enter_macro_context calls _cpp_push_token_context if !funlike.
funlike_invocation_p calls _cpp_push_token_context if macro cascaded.
replace_args calls push_ptoken_context to contain macro result.
expand_arg calls push_ptoken_context/_cpp_pop_context, context.macro = NULL.
// }])>

Keep it in mind when cb_macro_start is called, new macro context has NOT
pushed, to cb_macro_end context has been poped. To my plugin, only the outest
macro expansion is cared about, which means, to the callbacks, we should
detect whether pfile.context.prev == NULL, however section <GCC cascaded macro
expansion> shows there's a trap in it. Read replace_args carefully, it will
help to understand mo_maybe_cascaded.
// }])>

// GCC Cancel Macro Expansion <([{
Consider the case
    #define Z(a) a
    int Z = 3;

in fact, gcc doesn't complain the last line. After prefetch two tokens `Z' and
`=', gcc realizes that `Z' shouldn't be treated as macro, so it cancels macro
expansion and return `Z' and `=' as COMMON_TOKEN not EXPANDED_TOKEN. However
it makes my code flow become more complex, I place a bool mo.cancel to solve
it.

Calling sequence (Note, cb_macro_start and cb_macro_end are always matched
even macro expansion is canceled):
    cb_macro_start(Z)
    cb_end_arg(cancel = true)
    cb_macro_end(Z, prev = NULL)
    symdb_cpp_token('Z')
    symdb_cpp_token('=')

By the way, my plugin only cares about 1-level macro cancel.
// }])>

// GCC Cascaded Macro Expansion <([{
Consider the case
    #define Z(a) a
    #define Y Z
    #define X Z(1)
    Y(1);
    X;

The case is special due to the expansion process of X and Y is converted to
the expansion process of a fun-like macro finally.

To the fourth line, the sequence is
    cb_macro_start(Y)
    cb_macro_start(Z)
# enter_macro_context(Z) >> funlike_invocation_p >> cpp_get_token, Y macro is
# popped from pfile.context, so
    cb_macro_end(Y, prev = NULL)
    cb_end_arg(Z)
# enter_macro_context(Z) >> replace_args. Z macro is pushed to pfile.context.
    symdb_cpp_token(a)
    cb_macro_end(Z, prev = NULL)

To the fifth line, the sequence is
    cb_macro_start(X)
    cb_macro_start(Z)
    cb_end_arg(Z)
    cb_macro_end(Z)
    cb_macro_end(X, prev = NULL)

The fourth line is also breaking cb_macro_start/cb_macro_end pair rule. It
makes my code a little complex. I use mo_maybe_cascaded() to solve it.

Note: macro cascaded means the tail of a macro expansion tokens is another
funlike macro. So later macro definition is also belongs to macro cascade.
    #define Y2 1 + Z
But later isn't
    #define Y2 1 + Z + 1

To 3-level or higher cascaded cases, it's obvious all immediate macroes can't
be funlike macro. Here only list the first sequence
    cb_macro_start(Y)
    cb_macro_start(Y2)
    cb_macro_start(Y3)
    cb_macro_start(Z)
    ...
// }])>

// GCC Cancel + Cascaded Macro Expansion <([{
Consider the case
    #define Z(p) a
    #define Y Z
    #define X Z = 1
    int Y = 1;
    int X;

When cancel encounters cascaded, thing will go even worse.

To the fourth line, the sequence is
    cb_macro_start(Y)
    cb_macro_start(Z)
# enter_macro_context(Z) >> funlike_invocation_p >> cpp_get_token, Y macro is
# popped from pfile.context, so
    cb_macro_end(Y, prev = NULL)
# Then Z is canceled, funlike_invocation_p >> _cpp_push_token_context which
# pushes a context to pfile.context, its context.macro = NULL.
    cb_end_arg(cancel = true)
# So in my cb_macro_end, prev != NULL, however its context.macro = NULL, a
# strange case.
    cb_macro_end(Z, match-pair to cb_end_arg, prev != NULL)
    symdb_cpp_token(`Z' is returned, which is the result of Y macro)
    cb_macro_end(prev = NULL)
# context.macro = NULL is poped.
    symdb_cpp_token(=)

To the fifth line, the sequence is
    cb_macro_start(X)
    cb_macro_start(Z)
    cb_end_arg(cancel = true)
    cb_macro_end(Z, match-pair to cb_end_arg)
    symdb_cpp_token(Z)
    symdb_cpp_token(=)
    symdb_cpp_token(1)
    cb_macro_end(X, prev = NULL)

To the fourth line, two cb_macro_start and three cb_macro_end are called.
// }])>

// Patch Overview <([{
// symdb_enhance_libcpp <([{
Several new callbacks are appended into libpp/include/cpplib.h:cpp_callbacks:
void (*macro_start_expand) (...);
void (*macro_end_arg) (..., bool cancel);
void (*macro_end_expand) (...);
    are used to collect EXPANDED_TOKEN and MACRO_TOKEN.
Note:
1) macro_end_arg is callbacked when a function-like macro ends to collect
its arguments.
2) See section <GCC Cancel Macro Expansion> for more about parameter cancel of
macro_end_arg.
3) macro_{start, end}_expand can be called several times if the macro includes
more macroes in its define clause.
4) Even macro expansion cancel, a macro_end_expand is called too.
void (*start_directive) (...);
void (*end_directive) (...);
void (*directive_token) (...);
    are used to collect ERASED_TOKEN. Meanwhile, directive_token is much
powerful than cpp_get_token since cpp_get_token doesn't output ERASED_TOKEN.
Most code in libcpp directory is surrounding how to implement the callbacks.

A new field -- file_offset is added into cpplib.h:cpp_token, the field is used
to mark every chToken exclusively, it's just like line_map + source_location,
but simpler, and to show line/column to user new script offset2lc(filename,
fileoffset, &line, &column) should be used. internal.h:_cpp_line_note is also
changed to fit with the purpose.
// }])>

// symdb_enhance_plugin <([{
New events are added into current plugin architecture
PLUGIN_CPP_TOKEN
PLUGIN_C_TOKEN
PLUGIN_EXTERN_DECL
PLUGIN_EXTERN_FUNC_OLD_PARAM
PLUGIN_EXTERN_FUNC
PLUGIN_EXTERN_VAR
PLUGIN_EXTERN_DECLSPECS
PLUGIN_CALL_FUNCTION
PLUGIN_ENUM_SPECIFIER
The first two events are used to cache iToken, the remain are used to collect
definition.
// }])>
// }])>

// Database (init.sql) <([{
*) User should use the fields of ProjectOverview to control the plugin.
*) User should use view Helper to search file, definition and position.
*) Using `gs initdb/vacuumdb' to initialize database and arrange it.

init.sql has been organized by vim fold feature. Meanwhile table chFile is
the root.
// }])>

// Misc. <([{
Gcc defines a default cpp_callbacks::file_change, to listen the callback in my
patch to monitor file depedence, I replace the value in symdb_unit_init and
call the original value in cb_file_change.

New field cpp_token::file_offset breaks the fact the size of cpp_token should
be fit with a cacheline. See section <Patch Overview> for the solution.
// }])>

Reply via email to