This is interesting. Is this related to FOSSology
(http://www.fossology.org/projects/fossology) in any way?
mike

On Tue, Apr 17, 2012 at 6:35 AM, Silvio Cesare <silvio.ces...@gmail.com> wrote:

> The Debian package clonewise-core (currently in the mentors archive):
> http://mentors.debian.net/package/clonewise-core
> http://www.foocodechu.com/downloads/clonewise
> --
>
> Clonewise is a tool for detecting code reuse in Debian packages, also
> known as detecting embedded code copies. Debian's security tracker
> maintains a database of packages that embed code. Clonewise is a tool to
> automate and supplement the manual tracking of these packages.
>
> It is primarily intended for the security team, who may identify a
> vulnerability in a library and want to know whether that library is
> reused and embedded in any other Debian packages.
>
> -- QUICK GUIDE
>
> You might want to install the Clonewise database instead of generating it
> (which can take several days when you first run Clonewise).
>
> Download it from http://www.foocodechu.com/downloads/clonewise/
>
> Example usage to discover if the source package libpng is reused in other
> Debian packages is as follows:
>
> $ Clonewise -vv libpng
> libpng CLONED_IN_SOURCE afterstep (18.457640)
>                MATCH png.c (5.605583) (33.000000)
>                MATCH pngtrans.c (6.409078) (57.000000)
>                MATCH pngwtran.c (6.442979) (80.000000)
>        libpng CLONED_IN_PACKAGE libafterimage-dev
>        libpng CLONED_IN_PACKAGE afterstep
>        libpng CLONED_IN_PACKAGE afterstep-data
>        libpng CLONED_IN_PACKAGE libafterimage0
>        libpng CLONED_IN_PACKAGE afterstep-dbg
>        libpng CLONED_IN_PACKAGE libafterstep1
> libpng CLONED_IN_SOURCE fltk1.1 (44.336105)
>                MATCH png.c (5.605583) (58.000000)
>                MATCH pngerror.c (6.442979) (57.000000)
>                MATCH pngmem.c (6.442979) (85.000000)
>                MATCH pngpread.c (6.514438) (52.000000)
>                MATCH pngrio.c (6.478071) (77.000000)
>                MATCH pngtrans.c (6.409078) (63.000000)
>                MATCH pngwtran.c (6.442979) (80.000000)
>        libpng CLONED_IN_PACKAGE fltk1.1-doc
>        libpng CLONED_IN_PACKAGE fltk1.1-games
>        libpng CLONED_IN_PACKAGE libfltk1.1
>        libpng CLONED_IN_PACKAGE libfltk1.1-dbg
>        libpng CLONED_IN_PACKAGE libfltk1.1-dev
> [ snip ]
>
> So libpng is embedded in the source packages afterstep and fltk1.1.
> Looking at my version of the embedded-code-copies file on the security
> tracker, I can see that fltk1.1 is actually referenced as libfltk1.1 and
> was fixed a while ago. The security tracker is meant to report the
> source package name, so this should probably be fixed. Clonewise
> otherwise ignores embedded code copies that have been fixed (according
> to the security tracker). I can't see afterstep in the tracker, so again,
> we might need to make an update. We don't know whether afterstep has
> been patched to use a system library, so we need to investigate further,
> for example by checking whether libpng is a dependency of the afterstep
> package. In real usage, if libpng is buggy, it's probably important to
> do this and check whether the afterstep package is vulnerable to the
> libpng bug.
>
> Each matching file has two numbers: a weight, which represents the
> significance of the filename in the repository, and a similarity score,
> which represents how similar the file is between the two packages.
>
> CLONED_IN_SOURCE lines name the source packages.
> CLONED_IN_PACKAGE lines name the binary packages built from the source
> package.
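If you want to post-process the -vv output in a script, a minimal Python sketch like the following could group it by source package. It assumes the exact line layout shown in the libpng example (a CLONED_IN_SOURCE header followed by indented MATCH and CLONED_IN_PACKAGE lines) and filenames without spaces; `parse_clonewise_output` and `SourceClone` are hypothetical names, not part of Clonewise.

```python
import re
from dataclasses import dataclass, field

# Line layouts as they appear in the `Clonewise -vv` example (assumed stable).
SOURCE_RE = re.compile(r"^(\S+) CLONED_IN_SOURCE (\S+) \(([\d.]+)\)$")
MATCH_RE = re.compile(r"^MATCH (\S+) \(([\d.]+)\) \(([\d.]+)\)$")
PACKAGE_RE = re.compile(r"^(\S+) CLONED_IN_PACKAGE (\S+)$")


@dataclass
class SourceClone:
    library: str   # package that was searched for, e.g. libpng
    source: str    # source package that embeds it
    score: float   # summed weight that is compared against the threshold
    matches: list = field(default_factory=list)   # (filename, weight, similarity)
    binaries: list = field(default_factory=list)  # binary packages built from `source`


def parse_clonewise_output(text):
    """Group -vv output lines under the CLONED_IN_SOURCE line they follow."""
    results = []
    for raw in text.splitlines():
        line = raw.strip()
        if m := SOURCE_RE.match(line):
            results.append(SourceClone(m[1], m[2], float(m[3])))
        elif results and (m := MATCH_RE.match(line)):
            results[-1].matches.append((m[1], float(m[2]), float(m[3])))
        elif results and (m := PACKAGE_RE.match(line)):
            results[-1].binaries.append(m[2])
        # Anything else (e.g. "[ snip ]") is ignored.
    return results


sample = """libpng CLONED_IN_SOURCE afterstep (18.457640)
               MATCH png.c (5.605583) (33.000000)
               MATCH pngtrans.c (6.409078) (57.000000)
       libpng CLONED_IN_PACKAGE libafterimage-dev
"""
for clone in parse_clonewise_output(sample):
    print(clone.source, clone.score, [f for f, _, _ in clone.matches], clone.binaries)
```

If Clonewise's output layout differs from the example, the regular expressions are the only part that should need adjusting.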
>
> -- BUILDING THE DATABASE
>
> If you don't install clonewise-database, then the database of the package
> repository will probably need to be built the first time you run Clonewise.
> You will need to be the superuser to do this and in all likelihood it will
> take several days to complete.
>
> Clonewise will run Clonewise-BuildDatabase when the database has not been
> built. It will download the entire Debian source repository, unpack the
> packages and generate signatures for each package.
>
> -- CONFIGURATION FILES
>
> There are a number of configuration files in Clonewise.
>
> /var/lib/Clonewise/extensions - contains a list of filename extensions
> that are used to identify source code. Clonewise ignores all reuse of
> non-program code in package contents, and this list is how it tells
> program code apart from everything else.
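A minimal sketch of how such an extension list could be applied to filenames. The file layout is an assumption (one extension per line, leading dot optional, '#' comments allowed); the real format of /var/lib/Clonewise/extensions may differ, and `load_extensions` / `is_program_code` are hypothetical helpers.

```python
import os


def load_extensions(lines):
    """Parse an extensions list.

    Assumption (hypothetical format): one extension per line, leading dot
    optional, blank lines and lines starting with '#' skipped.
    """
    exts = set()
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        name = entry.lstrip(".")
        if name:  # a bare "." would otherwise match extensionless files
            exts.add(name)
    return exts


def is_program_code(filename, exts):
    """True if the filename's extension is in the list (case-sensitive)."""
    _, ext = os.path.splitext(filename)
    return ext.lstrip(".") in exts


exts = load_extensions(["c", "h", ".cpp", "# comment", ""])
print([f for f in ["png.c", "png.h", "README", "Makefile.am"] if is_program_code(f, exts)])
```

On a real system you could pass an open file handle for the extensions file as `lines`, provided it really uses this layout. Matching is kept case-sensitive because, for example, `.C` and `.c` can mean different languages.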
>
> /var/lib/Clonewise/threshold - is the default threshold for the amount
> of code reuse that needs to occur before Clonewise reports it. If you get
> too many false positives, then increase this number. You can also
> override this threshold on the command line with Clonewise -C <threshold>.
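The precedence described here (a -C value on the command line wins over the configured default) can be sketched in Python as below. It assumes the threshold file contains a single number; `read_threshold` and `effective_threshold` are hypothetical names, and only the -C flag is modelled.

```python
import argparse


def read_threshold(text):
    # Assumption: the threshold file holds a single number such as "10".
    return float(text.strip())


def effective_threshold(file_text, argv):
    """Return the -C value if given on the command line, else the file's value."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-C", dest="threshold", type=float)
    # Every other argument (-vv, -o xml, package names, ...) is left unparsed.
    args, _unknown = parser.parse_known_args(argv)
    if args.threshold is not None:
        return args.threshold
    return read_threshold(file_text)


print(effective_threshold("10\n", ["-vv", "libpng"]))
print(effective_threshold("10\n", ["-C", "25", "-vv", "libpng"]))
```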
>
> /var/lib/Clonewise/ignore-these-fixed - is a list of package pairs from
> the embedded-code-copies file maintained in the Debian security tracker
> for which it has been reported that the packages in question have been
> modified to use system-wide libraries, so there is no embedded code in
> the build.
>
> /var/lib/Clonewise/ignore-these-false-positives - is a list of package
> pairs that should not be reported as having code reuse. This file is
> intended to contain known false positives.
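A minimal sketch of suppressing reports with the two ignore lists. The file format is an assumption (two whitespace-separated package names per line), and treating (a, b) and (b, a) as the same pair is a design choice of this sketch, not documented Clonewise behaviour; `load_pairs` and `drop_ignored` are hypothetical.

```python
def load_pairs(lines):
    """Parse package pairs.

    Assumption (hypothetical format): two whitespace-separated package names
    per line; '#' comments and blank lines are skipped. Pairs are stored as
    frozensets so (a, b) and (b, a) count as the same pair.
    """
    pairs = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and not fields[0].startswith("#"):
            pairs.add(frozenset(fields[:2]))
    return pairs


def drop_ignored(reports, fixed, false_positives):
    """Remove reported (library, package) pairs listed in either ignore file."""
    ignored = fixed | false_positives
    return [(lib, pkg) for lib, pkg in reports if frozenset((lib, pkg)) not in ignored]


fixed = load_pairs(["libpng fltk1.1"])
false_positives = load_pairs(["# none yet"])
reports = [("libpng", "afterstep"), ("libpng", "fltk1.1")]
print(drop_ignored(reports, fixed, false_positives))
```

Because matching is by exact name, an entry recorded under a binary package name (such as libfltk1.1) would not suppress a report for the source package fltk1.1, which is why the tracker should list source package names.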
>
> -- HELPER UTILITIES
>
> Clonewise-ParseDatabase is a program to parse Debian's embedded-code-copies
> file maintained in the security tracker. Probably the main use of it is to
> generate the content for the ignore-these-fixed configuration file.
>
> To list the package pairs of embedded code that are reported to have been
> "fixed", run this command:
>
> $ Clonewise-ParseDatabase -f <embedded-code-copies-file>
>
> The output of that command can go directly into the ignore-these-fixed
> configuration file. For example:
>
> # Clonewise-ParseDatabase -f <embedded-code-copies> > /var/lib/Clonewise/ignore-these-fixed
>
> You might want to run that command whenever the upstream version of the
> embedded-code-copies file changes to reflect that a package has been
> fixed to avoid an embedded code copy.
>
> The -u option is for identifying unfixed embedded code copies. Run
> without any options, the command prints all embedded code copies in the
> Clonewise native format.
>
> Another utility which is probably only useful for developers is:
>
> $ Clonewise-RunTests
>
> This is useful for comparing Clonewise's results against Debian's manually
> created embedded-code-copies file maintained in the security tracker.
>
> -- COMMAND LINE OPTIONS
>
> The command line options for Clonewise are:
>
> -e              Report all internal errors.
>
> -o xml          Output in XML.
>
> -C <threshold>  Override the threshold configuration on how much code
>                 reuse needs to occur before reporting.
>
> -v              Verbose - show more information.
>
> -vv             Really verbose - show why packages are reported as
>                 reusing code. This is the option most people want.
>
> -vvv            Show scores for all packages. Not really useful for
>                 non-developers.
>
> -a              Run analysis over the entire database and show all
>                 embedded code copies. When using this option, no package
>                 name argument is required on the command line.
>
> -s              Don't use ssdeep to do a fuzzy check of similar content.
>                 This will increase the false positive rate, but can also
>                 increase the true positive rate. Probably not useful for
>                 non-developers.
>
> -t              Don't use filename extensions when comparing packages.
>                 This is useful if you are looking for reuse of a
>                 package's contents that is not based on program code.
>
> -- EXTENDED DESCRIPTION OF THE NUMBERS IN THE OUTPUT
>
> What are the numbers in the output of Clonewise? They represent weights
> and scores.
>
> $ Clonewise -vv libpng
> libpng CLONED_IN_SOURCE afterstep (18.457640)
>                MATCH png.c (5.605583) (33.000000)
>                MATCH pngtrans.c (6.409078) (57.000000)
>                MATCH pngwtran.c (6.442979) (80.000000)
> [ snip ]
>
> png.c has a weight of 5.605583. The more frequently the filename png.c
> occurs across packages in the Debian source repository, the lower its
> weight. For example, if extensions were not used and README was matched,
> the weight would be very low because the filename README occurs in
> almost every package.
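To make the "more frequent means lower weight" rule concrete, here is a minimal Python sketch of an inverse-document-frequency weight. The formula ln(N / df) is an assumption chosen for illustration; Clonewise's exact formula (log base, smoothing) may differ, and `filename_weights` plus the toy repository are hypothetical.

```python
import math
from collections import Counter


def filename_weights(packages):
    """Inverse-document-frequency weight for every filename.

    packages maps a source package name to the filenames in its tree.
    Assumption: weight = ln(N / df), where N is the number of packages and
    df is the number of packages containing the filename.
    """
    n = len(packages)
    df = Counter()
    for names in packages.values():
        df.update(set(names))  # count each filename once per package
    return {name: math.log(n / count) for name, count in df.items()}


repo = {  # toy data, not real Debian contents
    "libpng": {"README", "png.c", "pngtrans.c"},
    "afterstep": {"README", "png.c", "pngtrans.c", "afterstep.c"},
    "fltk1.1": {"README", "png.c"},
    "hello": {"README", "hello.c"},
}
w = filename_weights(repo)
print(sorted(w.items(), key=lambda kv: kv[1]))
```

README appears in every toy package, so its weight is ln(1) = 0, while a filename found in a single package gets the largest weight.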
>
> png.c has a similarity of 33.000000. This means that ssdeep identified a
> similarity of 33% between png.c in the afterstep and libpng packages.
> Because it is greater than 0, the two files probably derive from the
> same source in some earlier version of libpng.
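Clonewise uses ssdeep for this check. As a dependency-free stand-in, the Python sketch below produces a 0 to 100 similarity with the standard library's difflib instead. This is a different technique (sequence matching, not ssdeep's context-triggered piecewise hashing), so its numbers are not comparable with Clonewise's; `similarity_percent` is a hypothetical helper.

```python
from difflib import SequenceMatcher


def similarity_percent(a: str, b: str) -> int:
    """0..100 similarity between two file contents.

    Stand-in only: difflib's SequenceMatcher ratio, NOT ssdeep's fuzzy
    hash comparison.
    """
    return round(SequenceMatcher(None, a, b).ratio() * 100)


original = "png_set_bgr(png_structp png_ptr)\n{\n   png_ptr->transformations |= PNG_BGR;\n}\n"
modified = original.replace("png_ptr", "p")  # simulate a lightly edited copy
print(similarity_percent(original, original), similarity_percent(original, modified))
```

One caution: difflib reports a non-zero score for almost any two text files that share characters, so the "greater than 0" rule of thumb only makes sense with ssdeep; with this stand-in a tuned cut-off would be needed.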
>
> The score of 18.457640 is the sum of the weights of the matching files
> (5.605583 + 6.409078 + 6.442979). This score is what the Clonewise
> threshold is compared against. If this score is greater than the
> threshold, Clonewise reports that code reuse has occurred. The higher
> this number, the more believable it is that code reuse has occurred.
>
> -- HOW DOES IT WORK?
>
> It's a simple idea really. If two packages' source trees share the same
> filenames, and the content looks similar according to a fuzzy hash, then
> they share code.
>
> Each filename has a weight based on the inverse document frequency. This
> is a fancy way of saying that if the same filename is common to lots of
> packages, then it has a lower weight.
>
> The weights of all matching files are added up. If the sum exceeds a
> threshold, Clonewise will report it.
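Putting the pieces of this description together, a minimal Python sketch of the scoring loop could look like this: find filenames shared with the target package, optionally keep only those whose contents pass a fuzzy-similarity check, sum their weights, and report packages whose sum exceeds the threshold. It is an illustration under stated assumptions (basenames only, weights supplied by the caller), not Clonewise's implementation; `find_clones` and its parameters are hypothetical.

```python
def find_clones(target, repo, weights, threshold, similar=None):
    """Score every other package by the weights of filenames it shares with target.

    repo maps package -> {filename: content}; weights maps filename -> weight
    (for example an inverse-document-frequency weight). `similar(a, b)` is an
    optional fuzzy-content check on the two copies of a file; if omitted,
    matching filenames alone count. All names here are hypothetical.
    """
    target_files = repo[target]
    reports = []
    for other, files in repo.items():
        if other == target:
            continue
        shared = sorted(
            name for name in target_files.keys() & files.keys()
            if similar is None or similar(target_files[name], files[name])
        )
        score = sum(weights.get(name, 0.0) for name in shared)
        if score > threshold:  # report only when the summed weight exceeds the threshold
            reports.append((other, score, shared))
    return sorted(reports, key=lambda r: r[1], reverse=True)


repo = {  # toy file contents
    "libpng": {"png.c": "A", "pngtrans.c": "B", "README": "x"},
    "afterstep": {"png.c": "A'", "pngtrans.c": "B", "README": "y", "afterstep.c": "z"},
    "hello": {"README": "x", "hello.c": "w"},
}
# png.c and pngtrans.c reuse the weights from the example output; the rest are made up.
weights = {"png.c": 5.605583, "pngtrans.c": 6.409078, "README": 0.01,
           "afterstep.c": 7.0, "hello.c": 7.0}
for pkg, score, files in find_clones("libpng", repo, weights, threshold=10.0):
    print(pkg, round(score, 6), files)
```

Passing a stricter `similar` check lowers scores, because shared filenames whose contents do not look alike stop contributing their weight.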
>
>  -- Silvio Cesare <silvio.ces...@gmail.com>
>
>
> --
> Archive:
> http://lists.debian.org/ca+ygn1ja3dpdnjfyzy_bzje2iurvhuhmy9rxshy3kfbe3p...@mail.gmail.com
>
>


-- 
James Michael DuPont
Member of Free Libre Open Source Software Kosova http://flossk.org
