I used CPD, the copy/paste detector in PMD, to analyze code duplication in the Python source code. I found that Python 3.0 contains more duplicated code than previous versions. CPD is far from perfect, but I still think the analysis tells us something.
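CPD works on token streams rather than raw text. Roughly speaking, it reports runs of at least N consecutive tokens that occur more than once, where N is its minimum token count (60 or 30 in the numbers that follow). To illustrate what the Dup and Rate columns measure, here is a toy Python sketch. It is not CPD's algorithm. The regex tokenizer, the function names `tokenize`, `duplicated_lines` and `duplication_rate`, and the NLOC approximation (any line with at least one token, comments not stripped) are all hypothetical simplifications. It also only searches within a single source string, whereas CPD compares across files.

```python
import re
from collections import defaultdict

# Hypothetical crude tokenizer: identifiers/numbers, or single punctuation chars.
# CPD uses a real language-specific lexer; this only approximates one.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(source):
    """Return (token, line_number) pairs; line numbers start at 1."""
    tokens = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for tok in TOKEN_RE.findall(line):
            tokens.append((tok, lineno))
    return tokens


def duplicated_lines(source, min_tokens):
    """Line numbers covered by a run of min_tokens tokens that occurs more than once."""
    tokens = tokenize(source)
    texts = [tok for tok, _ in tokens]
    # Map every window of min_tokens consecutive tokens to its start positions.
    windows = defaultdict(list)
    for start in range(len(texts) - min_tokens + 1):
        windows[tuple(texts[start:start + min_tokens])].append(start)
    dup = set()
    for starts in windows.values():
        if len(starts) > 1:  # the same token run appears at least twice
            for start in starts:  # every copy counts, not just the second one
                for _, lineno in tokens[start:start + min_tokens]:
                    dup.add(lineno)
    return dup


def duplication_rate(source, min_tokens):
    """Duplicated lines divided by an approximate NLOC (lines with any token)."""
    nloc = len({lineno for _, lineno in tokenize(source)})
    return len(duplicated_lines(source, min_tokens)) / nloc if nloc else 0.0


snippet = """int f(int x) { return x * 2 + 1; }
int g(int y) { return y - 3; }
int h(int x) { return x * 2 + 1; }
"""
print(sorted(duplicated_lines(snippet, 10)))  # -> [1, 3]
print(round(duplication_rate(snippet, 10), 3))  # -> 0.667
```

Lines 1 and 3 share a 13-token run, so a 10-token threshold flags both copies, while a 14-token threshold flags nothing. This is why the 30-token rates in the table are higher than the 60-token ones: a lower threshold catches shorter clones.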
| Source code | NLOC | Dup60 | Dup30 | Rate60 | Rate30 |
|---|---|---|---|---|---|
| Python 1.5 (core) | 19418 | 1072 | 3023 | 6% | 16% |
| Python 2.5 (core) | 35797 | 1656 | 6441 | 5% | 18% |
| Python 3.0 (core) | 40737 | 3460 | 9076 | 8% | 22% |
| Apache (server) | 18693 | 1114 | 2553 | 6% | 14% |

NLOC: net lines of code
Dup60: lines of code belonging to a run of 60 consecutive tokens that is duplicated elsewhere in the code (every copy is counted, so a duplicated line counts twice or more)
Dup30: the same, with a run of 30 tokens
Rate60: Dup60 / NLOC
Rate30: Dup30 / NLOC

The duplication rate is fairly stable across versions and comparable to Apache's, but Python 3.0's is somewhat higher. NLOC grew only modestly from 2.5 to 3.0 (35797 to 40737, about 14%), while Dup60 more than doubled (1656 to 3460), so Python 3.0's duplication rate might be too high. Does that say something about the code quality of Python 3.0?

-- 
http://mail.python.org/mailman/listinfo/python-list