Forwarding as requested. - Sam Ruby
---------- Forwarded message ----------
From: Christian Lohmaier <lohmaier+ooofut...@googlemail.com>
Date: Mon, Jun 6, 2011 at 5:45 AM
Subject: [tdf-discuss] Code covered by the Oracle grant (was: Proposal to join Apache OpenOffice)
To: disc...@documentfoundation.org

Hi Sam, *,

please forward this also to the apache-list, where I'm not subscribed (I suggest only Sam does, in order to prevent 50 people forwarding the very same mail :-D)

On Sun, Jun 5, 2011 at 1:32 AM, Sam Ruby <ru...@apache.org> wrote:
> On Sat, Jun 4, 2011 at 7:03 PM, Christian Lohmaier
> <lohmaier+ooofut...@googlemail.com> wrote:
>>
>> As far as I know, there is only the "intent" of Oracle to
>> donate it under the Apache License, but no clear statement has been
>> made as to what exact sourcecode this will cover.
>
> The ASF has a signed software grant with a specific list of source files.
>
>> It's not even clear whether it will be the current codebase or some
>> older version IBM is basing their version on.
>
> It is the codebase on openoffice.org. The intent is to move the full
> version history. The mechanics of this have yet to be worked out.

As a link to that "list of source files" has been provided on the apache list, and there have been claims that this list covers the whole source, I had a deeper look myself.

First of all: it doesn't include any history data/mercurial database files, so how this point is covered is not clear to me at all. But on to my analysis of the Oracle-provided filelist that was made available here:
http://people.apache.org/~rubys/openoffice.files.txt

1st observation: Some filepaths are split.
The lines are split at various line lengths, and not at "word limits" like the dot before the filename extension or the slash that delimits directories, but in the middle of the string. See http://libreoffice.pastebin.ca/2075460 for a patch to fix those.

2nd observation: The file is not sorted alphabetically (at least it differs from sort's output, which is what the comm tool used later expects), so sort it:

    $ sort openoffice.files.txt > sorted_ooo.lst

In order to do the comparison, clone the current repo

    $ hg clone http://hg.services.openoffice.org/DEV300/

and create a filelist, excluding the repository's data:

    $ find DEV300/ -type f -not -path 'DEV300/.hg/*' | cut -c 8- | sort > repo.lst

Raw numbers:

    $ wc -l repo.lst sorted_ooo.lst
    69076 repo.lst
    39616 sorted_ooo.lst

So calling this "seems to include the full repo", and that even twice, is either malicious intent or cluelessness. Christian Lippka really should know better, but has stated this at least twice. Close to 30000 files gone, who cares, "source seems complete"...
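The sort-then-compare approach can be tried on toy data before running it on the real lists. The following is only a minimal sketch: the file names and paths in it are made up for illustration and are not taken from either list, and forcing LC_ALL=C is an assumption on my side to make sure sort and comm agree on the collation order.

```shell
#!/bin/sh
# Toy stand-ins for the two file lists (hypothetical paths, not from the real lists).
tmp=$(mktemp -d)
printf '%s\n' sw/b.cxx sw/a.cxx vcl/icon.png > "$tmp/repo.txt"
printf '%s\n' sw/a.cxx sw/old.cxx sw/b.cxx > "$tmp/grant.txt"

# Sort both lists under the same locale; comm expects the order that sort produced.
LC_ALL=C sort "$tmp/repo.txt"  > "$tmp/repo.lst"
LC_ALL=C sort "$tmp/grant.txt" > "$tmp/grant.lst"

# -2 -3: keep only lines unique to the first file (in repo, not in grant list).
only_repo=$(LC_ALL=C comm -2 -3 "$tmp/repo.lst" "$tmp/grant.lst")
# -1 -3: keep only lines unique to the second file (in grant list, not in repo).
only_grant=$(LC_ALL=C comm -1 -3 "$tmp/repo.lst" "$tmp/grant.lst")

echo "only in repo:       $only_repo"
echo "only in grant list: $only_grant"
rm -rf "$tmp"
```

If the two lists were sorted under different locales, comm may complain about unsorted input or report wrong differences, which is why both steps use the same setting here.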
Now to the interesting numbers:

Files in Oracle's list, but not in the repo list (= files most likely moved by refactoring the code (gbuildification of modules and similar) = indication of when the snapshot was taken):

    $ comm -1 -3 repo.lst sorted_ooo.lst | wc -l
    455

Digging in hg's history shows that the snapshot of the sources must have been taken before 2011-03-21, as those files were [re]moved in the following cws:

    276288 2011-03-21 CWS-TOOLING: integrate CWS dr78
    276552 2011-03-29 CWS-TOOLING: integrate CWS ka102
    276583 2011-03-29 CWS-TOOLING: integrate CWS vcl2gnumake
    276711 2011-04-01 CWS-TOOLING: integrate CWS solaris11
    276673 2011-04-01 CWS-TOOLING: integrate CWS calcvba
    276692 2011-04-01 CWS-TOOLING: integrate CWS mav60

So one can clearly say that those changes are not part of the sources, and hence the code is at most in the state of m103 (but of course that doesn't exclude that the codebase is older than that). The changes of at least 27 CWS (+3 masterfix ones) that have been integrated into the OOo code in the meantime are definitely missing.

Files in the repo, but not in Oracle's list:

    $ comm -2 -3 repo.lst sorted_ooo.lst | wc -l
    29915

sdf files = translation files: Those are not included in either list. The sdf files that are in the repo are for testcases/gsicheck; the translations have been split off into a separate repository, http://hg.services.openoffice.org/master_l10n/DEV300/ So those don't even account for the difference!

    $ grep -c sdf$ repo.lst sorted_ooo.lst
    repo.lst:10
    sorted_ooo.lst:0

Image files = binary files

    $ egrep -c '(bmp|png|gif|jpe?g)$' repo.lst sorted_ooo.lst
    repo.lst:12352
    sorted_ooo.lst:0

So this is one big chunk: all toolbar icons for the different themes, cursors, artwork for the installers, etc. But what are the remaining 17563 files?
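The counts quoted so far can be cross-checked against each other. This is just a sketch of the arithmetic: the variable names are mine, and it assumes that neither list contains duplicate lines.

```shell
#!/bin/sh
repo=69076        # lines in repo.lst
grant=39616       # lines in sorted_ooo.lst
only_grant=455    # comm -1 -3: in Oracle's list, not in the repo
only_repo=29915   # comm -2 -3: in the repo, not in Oracle's list
images=12352      # image files in repo.lst (none are in sorted_ooo.lst)

# The number of files common to both lists must come out the same from either side.
common_a=$((repo - only_repo))
common_b=$((grant - only_grant))
# Files missing from Oracle's list that are not images.
rest=$((only_repo - images))

echo "common (from repo side):  $common_a"
echo "common (from list side):  $common_b"
echo "non-image files missing:  $rest"
```

Both ways of counting the common files give 39161, and the non-image remainder is 17563, so the numbers agree with each other.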
Shell-fu will give a hint:

    $ comm -2 -3 repo.lst sorted_ooo.lst | egrep -v '(bmp|png|gif|jpe?g)$' | sed -n -e 's/.*\.\([^./]*\)$/\1/p' | sort | uniq -c | sort -rn | head
       1716 ott
       1329 xml
       1140 xlb
        813 xcu
        749 cfg
        710 csv
        588 txt
        555 h
        472 css
        459 java

OK, the user will not get any templates either, too bad, but the next ones are interesting. No configuration schemas, no configuration data either. Let's have a closer look:

    $ comm -2 -3 repo.lst sorted_ooo.lst | grep xcu$ | awk -F/ '{print $1}' | sort | uniq -c
         32 dictionaries
          4 extensions
        716 filter
          3 lingucomponent
          2 mysqlc
         21 odk
         16 officecfg
          1 pyuno
          3 scripting
          7 sdext
          5 sfx2
          3 testautomation

Want to load documents? Too bad, Apache won't know about the filters. Want to save? Hah, that's a good one, apache-OOo doesn't know about export filters either. Spellchecking? Ha, dream on… (but that is understandable, as dictionaries are mostly third-party stuff, so that one is excused)

Leaving aside the other binary files (various OOo documents, also some MS-Office documents, the palettes, icon/wav (for gallery)), the interesting ones include tons of xml:

    $ comm -2 -3 repo.lst sorted_ooo.lst | grep xml$ | awk -F/ '{print $1}' | sort | uniq -c | sort -nr | head
        235 sw
        201 i18npool
        154 sc
        129 sd
        112 testautomation
         64 dictionaries
         51 toolkit
         45 desktop
         34 scripting
         29 svx

Didn't look into that closer, but

    $ comm -2 -3 repo.lst sorted_ooo.lst | grep xml$ | grep toolbar | wc -l
    392

So want to use toolbar buttons? Too bad, the corresponding definitions are not included, you won't get any/most toolbars. Good luck starting from scratch defining your own.

But leave aside that boring "non-code" stuff. 134 patches are missing (for the external modules). (OK, that's arguable, as the external modules won't be part of apache-OOo in the long run anyway.)

You want to actually build this thing?
Well, too bad - the build.lst files that define the inter-module & directory dependencies, and the d.lst files that list the module's files to be exported for use by other modules, are not included either:

    $ grep -c d.lst repo.lst sorted_ooo.lst
    repo.lst:425
    sorted_ooo.lst:0

Similar: 302 *.mk files are only in the repo, amongst them the solenv/inc/_tg_*.mk ones, the templates that define the very basic target rules used throughout the build (and that are expanded by mkunroll to produce the makefiles that are then included by the actual build).

So with this snapshot, Apache-OOo is far from being able to deliver something that is even close to OOo as it is now. It is missing all translations, all artwork, the build-dependency definitions that are absolutely needed for doing a build, the toolbar definitions and the filter configurations.

Apart from the systematic omission of images, random source files are missing as well, probably because they don't carry the default copyright header, for example binfilter/inc/bf_svx/svxslots.hxx

So calling this list "complete" or stating something along the lines of "looks like a straight dump from hg" is a joke.

So Oracle definitely needs to revise that list, and include at least the translations, the artwork, the configuration data/xml-files, the randomly omitted files, etc. And while they're at it, they could base their list on the current m106 milestone.

ciao
Christian