Re: Question about past research in detecting compiler used to create executable binary

2008-01-23 Thread Tim Josling
On Wed, 2008-01-23 at 16:48 -0600, Stephen Torri wrote:
> GCC Community,
> 
> I am a PhD candidate at Auburn University in Alabama investigating
> automated compiler detection for reverse engineering.  The reason I am
> contacting this mailing list is to see if anyone knows of research done
> to discover the compiler used to create a binary executable.
> 
> Sincerely,
> 
> Stephen Torri
> PhD Candidate
> Auburn University
> Department of Computer Science and Software Engineering
> [EMAIL PROTECTED]
> 
> 

If GCC is any guide, this will often be trivial. GCC embeds lots of data
about the source system and compiler in the executable.

> file temp.x
temp.x: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), for
GNU/Linux 2.6.0, dynamically linked (uses shared libs), not stripped

Also, in the same file:

GCC: (GNU) 4.1.2 (Ubuntu 4.1.2-0ubuntu4)

If this is a reverse engineering project, your adversary will probably
have stripped as much of this kind of thing as possible though.

Tim Josling



Re: Rant about ChangeLog entries and commit messages - better to do something than just complain

2008-02-23 Thread Tim Josling
On the principle that it's better to do something than just complain...

I monitored the time I spent looking for the emails associated with a
given patch and I found it takes high single digit minutes to find them.
Sometimes you can't find them (which takes a lot longer). I do this a
lot. 

I wrote a little proof-of-concept script to take the mailing list
archives and the ChangeLog files and annotate the ChangeLog files with
the URLs of the probable email containing the patch.

Sample output is here (annotation of the current ChangeLog file). 

http://cobolforgcc.cvs.sourceforge.net/cobolforgcc/gcc/gcc/gcb/gcc_ChangeLog.txt?revision=1.1&view=markup
Or http://tinyurl.com/2v824o
Or http://preview.tinyurl.com/2v824o

The program is here (not much internal documentation at all). Testing
has been limited - in any case, with processing of text written by
people, perfection is not possible.

http://cobolforgcc.cvs.sourceforge.net/cobolforgcc/gcc/gcc/gcb/gcc_mailscan.rb?revision=1.1&view=markup
Or http://tinyurl.com/2yem2u 
Or http://preview.tinyurl.com/2yem2u

It runs in about 25 minutes on my system and uses a few hundred MB of
storage.

Things I learned:

1. There is a lot of data. It's a good thing Ruby 1.9 is a lot faster
than Ruby 1.8.

There are over 100 ChangeLog files in the GCC source, with over 600,000
lines in total. The gcc patches mailing list archives are over 2 GB in
size, and take a considerable time to download.

2. Most patches to ChangeLog have an identifiable email in the archive.
Things get spotty with branches in some cases and as you go back in
time; there is also a large gap in the email archives from a while
back.

3. I think this may be a useful thing. If a place could be found to put
the 30MB of files I would be happy to maintain them on a weekly basis or
so. Alternatively I could update the ChangeLog files themselves but I
have reason to suspect that may not be popular.

If nothing else happens I will keep it up-to-date for my own use.

Tim Josling

On Tue, 2007-12-04 at 08:05 -0500, Richard Kenner wrote:
> > I didn't say you cannot or should not use these tools.  But a good comment
> > on a piece of code sure beats a good commit message, which must be looked
> > at separately, and can be fragmented over multiple commits, etc.
> 
> I don't see one as "beating" the other because they have very different
> purposes.  Sometimes you need one and sometimes you need the other.
> 
> The purpose of COMMENTS is to help somebody understand the code as it
> stands at some point in time.  In most cases, that means saying WHAT the
> code does and WHY (at some level) it does what it does.  Once in a while,
> it also means saying why it DOESN'T do something, for example, if it might
> appear that there's a simpler way of doing what the code is doing now but
> it doesn't work for some subtle reason.  But it's NOT appropriate to put
> into comments the historical remark that this code used to have a typo
> which caused a miscompilation at some specific place.  However, the commit
> log IS the place for that sort of note.
> 
> My view is that, in general, the comments are usually the most appropriate
> place to put information about how the code currently works and the commit
> log is generally the best place for information that contrasts how the code
> currently works with how it used to work and provides the motivation for
> making the change.  But there are exceptions to both of those generalizations.



Getting host and target size and alignment information at build time?

2008-04-11 Thread Tim Josling
I need to find out the alignment and size information for the standard
integral types and pointer types at GCC build time. 

The information is needed to work out the sizes of data structures so
that warnings about size mismatches can be produced.

The information is needed at build time because the parser and validator
do not have access to the gcc back end code when the compiler runs. So
this information needs to be worked out earlier and generated as Lisp
code, i.e. in the build phase.

I have found tm.h, and also bconfig.h, config.h and tconfig.h. The sizes
are more or less OK as there are macros for sizes, apart from pointer
sizes in some cases. The alignment is the main problem; the alignments
for i386 are not constants but function calls and vary in certain
scenarios.

My current attempt at doing this is below. I fully acknowledge that it
is not correct. That is the reason for this posting. Does anyone have
any suggestions about how to get this information at build time?

Apart from some simple solution which I hope someone will come up with,
I have two other possible avenues to solve the problem:

1. Have some sort of install process which compiles and runs a small
program that extracts the information from the target compiler. E.g. the
program could print out the __alignof__ values and sizeof values for
various data items. This information then gets stored somewhere it can
be used by the Lisp code.

2. Turn GCC into a libgccbackend and call it from my lisp code at run
time using a foreign function interface. This would make it unnecessary
to get the information at build time because the Lisp code could get it
from the compiler back end when compiling the program. This would be a
last resort at this stage due to possibilities for misuse of a
libgccbackend and also the foreign function interface overheads.

Tim Josling

/* -*- C -*- */

/*

Copyright ...

FILE: Generate target-info file - data item attributes for the target.

Output goes to standard output.

*/
#include <stdio.h>
#include <stdlib.h>
#define IN_GCC
#include "tconfig.h"
#include "system.h"
#include "coretypes.h"
#include "tm.h"

/* We don't want fancy_abort */
#undef abort

#ifndef BIGGEST_FIELD_ALIGNMENT
#define BIGGEST_FIELD_ALIGNMENT 32
#endif

#ifndef BITS_PER_UNIT
#define BITS_PER_UNIT 8
#endif

#ifndef POINTER_SIZE
#ifdef TARGET_64BIT
#define POINTER_SIZE 64
#else
#define POINTER_SIZE 32
#endif
#endif

/* Fake because some macro needs it. */
int ix86_isa_flags = 0;

static int
maxint (int a, int b)
{
  return (a>b?a:b);
}

static void
print_one_item (char *name, char *actual_usage, char *basic_type,
                int size_bits)
{
  printf ("(defconstant %s-attributes\n"
  "   (gcb:make-usage-attributes\n"
  "   :usage cbt:%s\n"
  "   :basic-type cbt:%s\n"
  "   :size %d\n"
  "   :default-alignment %d\n"
  "   :sync-alignment %d))\n",
  name, actual_usage, basic_type, size_bits/BITS_PER_UNIT, 1,
  maxint (size_bits/BITS_PER_UNIT, 
/* This alignment is all wrong but there doesn't seem to be any way
   to get the true figure out of GCC short of doing a cross-build
   and then running a program on the target machine. */
  BIGGEST_FIELD_ALIGNMENT/BITS_PER_UNIT));
}

int
main (int argc, char **argv)
{
  fprintf (stderr, "TARGET_64BIT %d\n", TARGET_64BIT);
  fprintf (stderr, "POINTER_SIZE %d\n", POINTER_SIZE);
  if (argc != 1)
{
  fprintf (stderr, "Unexpected number of parameters - should be none\n");
  abort ();
}
  printf ("...file header stuff");

  print_one_item ("char", "binary-char", "binary", BITS_PER_UNIT);
  print_one_item ("short", "binary-short", "binary", SHORT_TYPE_SIZE);
  print_one_item ("int", "binary-int", "binary", INT_TYPE_SIZE);
  print_one_item ("long", "binary-long", "binary", LONG_TYPE_SIZE);
  print_one_item ("long-long", "binary-long-long", "binary",
                  LONG_LONG_TYPE_SIZE);
  print_one_item ("sizet", "binary-size", "binary", POINTER_SIZE);
  print_one_item ("ptr", "binary-ptr", "binary", POINTER_SIZE);
  print_one_item ("ptr-diff", "binary-ptr-diff", "binary", POINTER_SIZE);
  print_one_item ("display", "display", "display", BITS_PER_UNIT);
  print_one_item ("binary", "binary", "binary", INT_TYPE_SIZE);
  print_one_item ("binary1", "binary1", "binary", 1 * BITS_PER_UNIT);
  print_one_item ("binary2", "binary2", "binary", 2 * BITS_PER_UNIT);
  print_one_item ("binary4", "binary4",

Re: Getting host and target size and alignment information at build time?

2008-04-11 Thread Tim Josling
On Fri, 2008-04-11 at 09:07 -0400, Daniel Jacobowitz wrote:
> Please don't reply to an existing thread to post a new question.

Sorry, I didn't realize that would cause a problem.

> Simply put, you can't do this.  All of these things can depend on
> command line options.

It does seem you can only get this information in the context of an
actual compile on the target machine.

> 
> Why not get it out of GCC later?  You don't need to hack up GCC to do
> that.

Later is too late. I need to make decisions before the GCC back end gets
involved (the back end is in a separate OS process). For example "Is
this literal too long for this group data item?". Or "Is a redefine
larger than the original (which is not allowed)?". If the literal is too
long I need to truncate it and give an error message; if a redefine is
too large I need to extend the original and give an error message.

While this can all be done, it means I am duplicating more logic into
the C code, and this has a 4X negative productivity impact versus Lisp.
It also makes it extremely difficult to output error messages in line#
sorted order, because they are issued by different processes.

Still if that's how GCC operates I will need to find some way to deal
with it. Maybe a cut down libgccbackend that doesn't generate code, it
just gives me the information I want.

Tim Josling



Re: Getting host and target size and alignment information at build time?

2008-04-12 Thread Tim Josling
On Fri, 2008-04-11 at 17:05 -0400, Daniel Jacobowitz wrote:
> On Sat, Apr 12, 2008 at 06:59:28AM +1000, Tim Josling wrote:
> > > Why not get it out of GCC later?  You don't need to hack up GCC to do
> > > that.

> That's not what I meant.  You don't need it _during the GCC build
> process_.  You can fork GCC and run it and have it tell you the answer
> based on the current command line arguments, read its output, and
> go on with what you were doing.  Which presumably involves further
> compilation.
> 

You're right... That's more or less what I think I will do. I'm working
on a proof of concept at the moment.

> (You didn't say what you are trying to do, so I'm guessing at the
> context a bit.)
> 

Here is some more explanation of what I am trying to do. 

My COBOL compiler was going to look like this:

(1). The gcc driver calls a lisp program which does the preprocessing
(lex/parse/copy-includes/replaces), lex, parse, cross checking,
simplification, and creates an output file in a simple binary format.
This Lisp program does not have direct access to any GCC code including
headers.

(2). The gcc driver passes the output file to another program (cob1)
which would be similar to cc1 except that its input file is a simple
binary format that does not need to be lexed and parsed. This program
will be driven by toplev.c and will generate, via the gcc back end, the
assembler output to be assembled and linked by subsequent programs
called from the gcc driver. It will have access to all the gcc middle
and back end code.

My initial intention was that the program (1) should know as little
about gcc as possible. I then realised that it would need some target
information such as type sizes and alignment information from gcc. I
thought I could get this information by writing a small program that
would pull in some headers and could then output a Lisp program that
could be compiled into program (1).

This didn't work out very well because the information is only available
within the compiler at run time on the target system, and it is dynamic
and option-dependent.

So I will add an option to the compiler "-fget-types". This will trigger
the output on standard output of all the information I need. So the flow
will be:

(0) cob1 -fget-types -> stdout passed as a parameter to (1)
(1) Lisp pgm ->binary file
(2) cob1 Main toplev.c compilation taking binary file as input.

For various reasons I have to run the Lisp program via a shell script. I
can readily include the -fget-types run in that, something like this:

lisp  --user-parms -o 
 -ftypes=`cob1 -fget-types` 

The stdout from cob1 -fget-types will get passed to the Lisp program via
the shell back-quote facility, which substitutes a command's stdout into
the command line where the back quotes appear. This technique is used
elsewhere in gcc.

Regards,
Tim Josling





Re: Getting host and target size and alignment information at build time?

2008-04-16 Thread Tim Josling
On Sat, 2008-04-12 at 18:16 +1000, Tim Josling wrote:
> On Fri, 2008-04-11 at 17:05 -0400, Daniel Jacobowitz wrote:
> > On Sat, Apr 12, 2008 at 06:59:28AM +1000, Tim Josling wrote:
> > > > Why not get it out of GCC later?  You don't need to hack up GCC to do
> > > > that.

> You're right... That's more or less what I think I will do. I'm working
> on a proof of concept at the moment.

Here is the proof of concept for getting the type information out of the
gcc back end. It was not as hard as I expected in the end.

cob2.c:
http://cobolforgcc.cvs.sourceforge.net/cobolforgcc/gcc/gcb/cob2.c?revision=1.1&view=markup
See get_build_types () and get_target_types ()

Called from script cob1.sh:
http://cobolforgcc.cvs.sourceforge.net/cobolforgcc/gcc/gcb/cob1.sh?revision=1.1

Used by type-info.lisp:
http://cobolforgcc.cvs.sourceforge.net/cobolforgcc/gcc/gcb/type-info.lisp?revision=1.1&view=markup
See defun init-type-info

Any comments or suggestions welcome. Thanks for your ideas Daniel.

Tim Josling



Some questions about writing a front end

2008-04-16 Thread Tim Josling
BACKGROUND (optional)

I've now reached the point of writing the GCC middle/back end interface
for my Cobol compiler. See
 
http://cobolforgcc.sourceforge.net/

Previously I wrote two front ends but that was a while ago. These were
the original iteration of cobolforgcc 1998-2003, and the now defunct
treelang of similar vintage. I also translated and updated the "how to
write a front end document", now sadly out of date
http://cobolforgcc.sourceforge.net/cobol_14.html

But that was all a while ago and a lot has happened. I read the GCC
Summit papers and the GCC Wiki but a few questions remain and there are
some things I'm not quite sure about.


QUESTIONS

1. Sample front-end: Given treelang no longer exists and "is not a good
example anyway" what would be the best front end to use as a model and
to plagiarize code?

I have found that the Ada front end, while large, is quite easy to
follow and I've been using that. C/C++ seem to have the back end
interface very enmeshed in the hand-coded parsers. The Java front end is
reasonably small (only handles class files?) but the back end (BE)
interface is spread among 30 files. The fortran Front End (FE) has 58
files with BE interfaces. Objective C/++ are mostly just add-ons to C.

What I don't know is how up-to-date the various front ends are and how
good an example they are.

2. Most-Gimplified front-end: Allied to Q1, which front ends have been
most thoroughly converted to GIMPLE?

3. LANG_HOOKS: There has been some discussion about LANG_HOOKS being
removed in the future. From memory this was in the context of the
link-time optimization (LTO) projects. Is there a replacement I
should use now, or is there anything I should do to prepare for the
replacement?

4. What does Gimple cover: What is the scope of GIMPLE? Most of the
discussion is about procedural code. Does it also cover variable
definition, function prototype definition, etc.?

5. What is deprecated: Is there any time-effective way to identify
constructs, header files, macros, variables and functions that are
"deprecated"?

6. Tuples: I am a bit confused about tuples. Tuples seem to be really
just structs by another name, unless I have missed the point. The idea
is not a bad one - I went through the same process in the Lisp code in
the front end where initially I stored everything in arrays and later
switched to structs/tuples. In lisp this provided the advantages of
run-time type-checking and the ability to use mnemonic names. 

The first email about tuples that I can find seems to assume a
reasonable amount of background on the part of the reader:
http://www.mailinglistarchive.com/gcc@gcc.gnu.org/msg01669.html

Some clarification about what the tuples project is trying to do, and in
particular how I should position for the advent of tuples would be very
useful. I have read the material in the Wiki and from the GCC summit.

7. Should I target GENERIC, High Gimple or Low Gimple? Would I miss
optimizations if I went straight to a Gimple representation? Is one
interface more likely to change radically in the future? The assumption
here is that the front end will be using an entirely different
representation so there is no question of using one of these in the
Front End. It is just a question of which format to convert into.

Thank you all for any help you can provide,
Tim Josling



Re: [tuples] New requirement for new patches

2008-04-17 Thread Tim Josling
> - The C front end is bootstrapping.  The failure rate in the
> testsuites is in the 2-4% range.

I've been trying to do a C-only bootstrap of the tuples branch for a
couple of days  on "Linux tim-gcc 2.6.20-15-generic #2 SMP Sun Apr 15
06:17:24 UTC 2007 x86_64 GNU/Linux" and I get

/../libdecnumber
-I/home2/gcc-gimple-tuples-branch/gcc/gcc/../libdecnumber/bid
-I../libdecnumber  /home2/gcc-gimple-tuples-branch/gcc/gcc/tree-optimize.c -o 
tree-optimize.o
/home2/gcc-gimple-tuples-branch/gcc/gcc/tree-data-ref.c: In function
'compute_all_dependences':
/home2/gcc-gimple-tuples-branch/gcc/gcc/tree-data-ref.c:3930: internal
compiler error: in avail_expr_eq, at tree-ssa-dom.c:2482
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
make[3]: *** [tree-data-ref.o] Error 1
make[3]: *** Waiting for unfinished jobs

Also, this one is looping at the time the other one crashes. From
"ps aux | grep cc1":

tim   4500 71.9  1.6  82840 67128 pts/0RN+  11:08
3:02 /home2/gcc-gimple-tuples-branch/objdir/./prev-gcc/cc1 -quiet -I.
-I. -I/home2/gcc-gimple-tuples-branch/gcc/gcc
-I/home2/gcc-gimple-tuples-branch/gcc/gcc/.
-I/home2/gcc-gimple-tuples-branch/gcc/gcc/../include
-I/home2/gcc-gimple-tuples-branch/gcc/gcc/../libcpp/include
-I/usr/local/include
-I/home2/gcc-gimple-tuples-branch/gcc/gcc/../libdecnumber
-I/home2/gcc-gimple-tuples-branch/gcc/gcc/../libdecnumber/bid
-I../libdecnumber
-iprefix 
/home2/gcc-gimple-tuples-branch/objdir/prev-gcc/../lib/gcc/x86_64-unknown-linux-gnu/4.4.0/
 -isystem /home2/gcc-gimple-tuples-branch/objdir/./prev-gcc/include -isystem 
/home2/gcc-gimple-tuples-branch/objdir/./prev-gcc/include-fixed -DIN_GCC 
-DHAVE_CONFIG_H insn-attrtab.c -quiet -dumpbase insn-attrtab.c -mtune=generic 
-auxbase-strip insn-attrtab.o -g -O2 -W -Wall -Wwrite-strings 
-Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -pedantic 
-Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror 
-Wno-return-type -Wno-format -Wno-missing-format-attribute -o /tmp/ccHig7nY.s

Tim Josling

On Thu, 2008-04-17 at 17:19 -0700, Diego Novillo wrote:
> Please notice that the wiki page for tuples has new rules for patches. 
>  From now on, every patch needs to have been tested with a C-only bootstrap.
> 
> 
> Thanks.  Diego.



Re: Some questions about writing a front end

2008-04-30 Thread Tim Josling
On Thu, 2008-04-17 at 10:24 -0700, Ian Lance Taylor wrote:
> Tim Josling <[EMAIL PROTECTED]> writes:

> > 5. What is deprecated: Is there any time-effective way to identify
> > constructs, header files, macros, variables and functions that are
> > "deprecated"?
> 
> Not really.  We try not to leave deprecated stuff around for too long.
> 

Good, but I was wondering then what I should avoid so that my front end
does not end up being "not a very good example" (like treelang).

Tim Josling



Re: Rant about ChangeLog entries and commit messages

2007-12-03 Thread Tim Josling
On Mon, 2007-12-03 at 13:58 -0500, Diego Novillo wrote:
> On 12/03/07 13:50, Richard Kenner wrote:
> >> I guess that could work, but that wouldn't give a way into the history 
> >> for the change.  Several times there is a post-mortem discussion on the 
> >> patch, leading to more patches.
> > 
> > How about both?
> 
> Sure.
> 
> 
> Diego.

Quite a few people are worried about verbose descriptions of changes
cluttering up the ChangeLog. Others (like me) would like a way easily to
find the discussions about the change, and would like a brief indication
in the ChangeLog of the context of the change. The FSF also has good
reasons for keeping solid records of who made what change.

So, how about this:

1. For a PR fix, continue to record the PR number and category.
Like this: 
  PR tree-optimization/32694

2. For all changes, a one-line record giving the context, plus the URL
of a key message in the email message trail, unless the intent is
plainly obvious such as bumping the version number.
Like this:
  Gimplification of Fortran front end. 
  http://gcc.gnu.org/ml/gcc-patches/2007-12/msg00072.html   

3. Continue to record "who made what change".
Like this: 
   * config/xtensa/xtensa.c (xtensa_expand_prologue): Put a
REG_FRAME_RELATED_EXPR note on the last insn that sets up the stack
pointer or frame pointer.

This should satisfy everyone's needs.

This would by no means be the largest divergence from the FSF standards
by the GCC project. The use of languages other than C in the Ada front
end is non-compliant by my reading. The compliance of the rest of the
code to the FSF standards is spotty at times eg the garbage collection
code.

While this is a divergence from the FSF standards, it is a positive
change and no information is being lost.

It would be interesting to ask someone who was around at the time why
the guidelines were written as they were. The rationale may no longer
be relevant.

Tim Josling




Re: [RFC] WHOPR - A whole program optimizer framework for GCC

2007-12-12 Thread Tim Josling
On Wed, 2007-12-12 at 15:06 -0500, Diego Novillo wrote:
> Over the last few weeks we (Google) have been discussing ideas on how to
> leverage the LTO work to implement a whole program optimizer that is
> both fast and scalable.
> 
> While we do not have everything thought out in detail, we think we have
> enough to start doing some implementation work.  I tried attaching the 
> document, but the mailing list rejected it.  I've uploaded it to
> http://airs.com/dnovillo/pub/whopr.pdf

A few questions:

Do you have any thoughts on how this approach would be able to use
profiling information, which is a very powerful source of information
for producing good optimisations?

Would there be much duplication of code between this and normal GCC
processing or would it be possible to share a common code base?

A few years back there were various suggestions about having files
containing intermediate representations and this was criticised because
it could make it possible for people for subvert the GPL by connecting
to the optimisation phases via such an intermediate file. Arguably the
language front end is then a different program and not covered by the
GPL. It might be worth thinking about this aspect. 

This also triggers the thought that if you have this intermediate
representation, and it is somewhat robust to GCC patchlevels, you do not
actually need source code of proprietary libraries to optimize into
them. You only need the intermediate files, which may be easier to get
than source code.

Tim Josling



Re: [RFC] WHOPR - A whole program optimizer framework for GCC

2007-12-19 Thread Tim Josling
On Thu, 2007-12-13 at 08:27 -0500, Diego Novillo wrote:
> On 12/13/07 2:39 AM, Ollie Wild wrote:
> 
> > The lto branch is already doing this, so presumably that discussion
> > was resolved (Maybe someone in the know should pipe up.).
> 
> Yes, streaming the IL to/from disk is a resolved issue.
> ...
> 
> Diego.

I found this thread
http://gcc.gnu.org/ml/gcc/2005-11/msg00735.html

>> From: Mark Mitchell 
>> To: gcc mailing list 
>> Date: Wed, 16 Nov 2005 14:26:28 -0800
>> Subject: Link-time optimization


>> The GCC community has talked about link-time optimization for some time.
>> ...
>> We would prefer not to have this thread devolve into a discussion about
>> legal and "political" issues relating to reading and writing GCC's
>> internal representation.  I've said publicly for a couple of years that
>> GCC would need to have this ability, and, more constructively, David
>> Edelsohn has talked with the FSF (both RMS and Eben Moglen) about it.
>> The FSF has indicated that GCC now can explore adding this feature,
>> although there are still some legal details to resolve.

>> ...
>>  http://gcc.gnu.org/projects/lto/lto.pdf
>> ... 

Was there any more about this?

I have restarted work on my COBOL front end. Based on my previous
experiences writing a GCC front end I want to have as little code as
possible in the same process as the GCC back end. 

This means passing over a file. So I would like to understand how to
avoid getting into political/legal trouble when doing this.

Thanks,
Tim Josling



Re: Rant about ChangeLog entries and commit messages

2007-12-25 Thread Tim Josling
On Sat, 2007-12-15 at 20:54 -0200, Alexandre Oliva wrote:
> On Dec  3, 2007, [EMAIL PROTECTED] (Richard Kenner) wrote:
> 
> > In my view, ChangeLog is mostly "write-only" from a developer's
> > perspective.  It's a document that the GNU project requires us to
> produce
> > for
> 
> ... a good example of compliance with the GPL:
> 
>   5. Conveying Modified Source Versions.
> 
> a) The work must carry prominent notices stating that you modified
> it, and giving a relevant date.
> 

(Minor quibble) As copyright owner of GCC, the FSF is not bound by the
conditions of the licence it grants in the same way as licencees are
bound. So I don't think this provision in itself would mandate that
those who have copyright assignments to the FSF record their changes.

I don't hear anyone arguing that people should not record what they
changed and when. The question is whether it is sufficient.

I just started using git locally, and I keep thinking it would be really
great to have something like "git blame" for gcc. The command "git
blame" gives you a listing of who changed each line of the file and
when, and also gives the commit id. From that all can be revealed.

> 
> FWIW, I've used ChangeLogs to find problems a number of times in my 14
> years of work in GCC, and I find them very useful.  When I need more
> details, web-searching for the author of the patch and some relevant
> keywords in the ChangeLog will often point at the relevant e-mail, so
> burdening people with adding a direct URL seems pointless to me.  It's
> pessimizing the common case for a small optimization in far less
> common cases.
> 

This may work when the mailing list entries exist and are
accessible.

However they are only available AFAIK from 1998. GCC has been going for
2-3 times as long as that. And there is at least one significant gap:
February 2004 up to and including this message
http://gcc.gnu.org/ml/gcc-patches/2004-02/msg02288.html.

In my experience, when documentation is not stored with the source code,
it often gets lost.

When a person is offline the mailing list HTML pages are not available.

I have an idea to resolve this that I am working on... more in due
course if it comes to anything.

Tim Josling