Hi, a newish conversion with cvs2git is available to check here:
git://www.bluegap.ch/ (it's not incremental and will only stay for a few days)

For everybody interested, please check the committer names and emails. I'm missing the names and email addresses for these committers:

    'barry' : ('barry??', ''),
    'dennis' : ('Dennis??', ''),
    'inoue' : ('inoue??', ''),
    'jurka' : ('jurka??', ''),
    'pjw' : ('pjw??', ''),

And I'm guessing that 'peter' is the same as 'petere':

    'peter' : ('Peter Eisentraut (?)', 'pete...@gmx.net'),

I've compared all branch heads and all tags with a cvs checkout. The only differences are keyword expansion errors. Most commonly the RCS version "1.1" is used in the resulting git repository, instead of version "1.1.1.1". This also leads to getting dates wrong ($Date keyword).

I'm unsure how to test Tom's requirement that every commit and its log message is included in the resulting git repository.

Feel free to clone and inspect the mentioned git repository and propose improvements on the cvs2git options used.

Aidan Van Dyk wrote:
> Yes, but the point is you want an exact replica of CVS right? Your
> git repo should have $PostgreSQL$ and the cvs export/checkout (you do
> use -kk right) should also have $PostgreSQL$.

No, I'm testing against cvs checkout, as that's what everybody is used to.

> But it's important, because on *some* files you *do* want expanded
> "keywords" (like the $OpenBSD ... Exp $). One of the reasons pg CVS went
> to the $PostgreSQL$ keyword (I'm guessing) was so they could explicitly
> de-couple them from other keywords that they didn't want munging on.

I don't care half as much about the keyword expansion stuff - that's doomed to disappear anyway. What I'm much more interested in is correctness WRT historic contents, i.e. that git log, git blame, etc. deliver correct results. That's certainly harder to check. In my experience, cvs2svn (or cvs2git) does a pretty decent job at that, even in case of some corruptions. Plus it offers lots of options to fine-tune the conversion, see the attached configuration I've used.

> So, I wouldn't consider any conversion good unless it had all these:
>
> As well as stuff like:
> parsecvs-master:src/backend/access/index/genam.c: * $PostgreSQL$

I disagree here and find it more convenient for the git repository to keep the "old" RCS versions - as in the source tarballs that got (and still get) shipped. Just before switching over to git one can (and should, IMO) remove these tags to avoid confusion.

Regards

Markus Wanner
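The comparison described above (git branch heads and tags versus a cvs checkout, where only keyword expansion differs) could be automated roughly as follows. This is a minimal sketch and not the procedure actually used for this conversion: the keyword list, the function names, and the choice to skip `CVS/` administrative directories are all assumptions made for illustration. It is written for Python 3 and collapses every expanded keyword such as `$Date: ... $` to its bare form `$Date$` before comparing, so a "1.1" versus "1.1.1.1" mismatch inside a keyword no longer counts as a difference.

```python
import os
import re

# Keywords to neutralize. This list is an assumption; extend it as needed.
KEYWORDS = ("Author", "Date", "Header", "Id", "Locker", "Log", "Name",
            "RCSfile", "Revision", "Source", "State", "PostgreSQL")

# Matches "$Keyword$" as well as "$Keyword: any expansion $" on one line.
_EXPANDED = re.compile(
    rb"\$(" + "|".join(KEYWORDS).encode("ascii") + rb")(?::[^$\n]*)?\$")


def collapse_keywords(data):
    """Replace every (expanded) keyword with its bare $Keyword$ form."""
    return _EXPANDED.sub(rb"$\1$", data)


def same_ignoring_keywords(a, b):
    """True if two file contents differ at most in keyword expansion."""
    return collapse_keywords(a) == collapse_keywords(b)


def differing_files(git_tree, cvs_tree):
    """Return sorted relative paths from cvs_tree that are missing in
    git_tree or differ from it after collapsing keywords.

    Files that exist only in git_tree are not reported by this sketch.
    """
    diffs = []
    for root, dirs, files in os.walk(cvs_tree):
        # Skip CVS administrative directories in the checkout.
        dirs[:] = [d for d in dirs if d != "CVS"]
        for name in files:
            cvs_path = os.path.join(root, name)
            rel = os.path.relpath(cvs_path, cvs_tree)
            git_path = os.path.join(git_tree, rel)
            if not os.path.isfile(git_path):
                diffs.append(rel)
                continue
            with open(cvs_path, "rb") as f_cvs, open(git_path, "rb") as f_git:
                if not same_ignoring_keywords(f_cvs.read(), f_git.read()):
                    diffs.append(rel)
    return sorted(diffs)
```

Calling `differing_files()` with a git working tree and a cvs checkout of the same branch or tag would then list only the files whose real contents differ. Anything it reports deserves a closer look, since keyword noise has already been filtered out.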
# (Be in -*- mode: python; coding: utf-8 -*- mode.)

import re

from cvs2svn_lib import config
from cvs2svn_lib import changeset_database
from cvs2svn_lib.common import CVSTextDecoder
from cvs2svn_lib.log import Log
from cvs2svn_lib.project import Project
from cvs2svn_lib.git_revision_recorder import GitRevisionRecorder
from cvs2svn_lib.git_output_option import GitRevisionMarkWriter
from cvs2svn_lib.git_output_option import GitOutputOption
from cvs2svn_lib.revision_manager import NullRevisionRecorder
from cvs2svn_lib.revision_manager import NullRevisionExcluder
from cvs2svn_lib.fulltext_revision_recorder \
     import SimpleFulltextRevisionRecorderAdapter
from cvs2svn_lib.rcs_revision_manager import RCSRevisionReader
from cvs2svn_lib.cvs_revision_manager import CVSRevisionReader
from cvs2svn_lib.checkout_internal import InternalRevisionRecorder
from cvs2svn_lib.checkout_internal import InternalRevisionExcluder
from cvs2svn_lib.checkout_internal import InternalRevisionReader
from cvs2svn_lib.symbol_strategy import AllBranchRule
from cvs2svn_lib.symbol_strategy import AllTagRule
from cvs2svn_lib.symbol_strategy import BranchIfCommitsRule
from cvs2svn_lib.symbol_strategy import ExcludeRegexpStrategyRule
from cvs2svn_lib.symbol_strategy import ForceBranchRegexpStrategyRule
from cvs2svn_lib.symbol_strategy import ForceTagRegexpStrategyRule
from cvs2svn_lib.symbol_strategy import ExcludeTrivialImportBranchRule
from cvs2svn_lib.symbol_strategy import ExcludeVendorBranchRule
from cvs2svn_lib.symbol_strategy import HeuristicStrategyRule
from cvs2svn_lib.symbol_strategy import UnambiguousUsageRule
from cvs2svn_lib.symbol_strategy import HeuristicPreferredParentRule
from cvs2svn_lib.symbol_strategy import SymbolHintsFileRule
from cvs2svn_lib.symbol_transform import ReplaceSubstringsSymbolTransform
from cvs2svn_lib.symbol_transform import RegexpSymbolTransform
from cvs2svn_lib.symbol_transform import IgnoreSymbolTransform
from cvs2svn_lib.symbol_transform import NormalizePathsSymbolTransform
from cvs2svn_lib.property_setters import AutoPropsPropertySetter
from cvs2svn_lib.property_setters import CVSBinaryFileDefaultMimeTypeSetter
from cvs2svn_lib.property_setters import CVSBinaryFileEOLStyleSetter
from cvs2svn_lib.property_setters import CVSRevisionNumberSetter
from cvs2svn_lib.property_setters import DefaultEOLStyleSetter
from cvs2svn_lib.property_setters import EOLStyleFromMimeTypeSetter
from cvs2svn_lib.property_setters import ExecutablePropertySetter
from cvs2svn_lib.property_setters import KeywordsPropertySetter
from cvs2svn_lib.property_setters import MimeMapper
from cvs2svn_lib.property_setters import SVNBinaryFileKeywordsPropertySetter

Log().log_level = Log.NORMAL

ctx.revision_recorder = SimpleFulltextRevisionRecorderAdapter(
    CVSRevisionReader(cvs_executable=r'cvs'),
    GitRevisionRecorder('cvs2git-tmp/git-blob.dat'),
    )

ctx.revision_excluder = NullRevisionExcluder()

ctx.revision_reader = None

ctx.sort_executable = r'sort'

ctx.trunk_only = False

ctx.cvs_author_decoder = CVSTextDecoder(
    ['ascii', 'latin1'],
    )
ctx.cvs_log_decoder = CVSTextDecoder(
    ['ascii', 'latin1'],
    )
ctx.cvs_filename_decoder = CVSTextDecoder(
    ['ascii', 'latin1'],
    )

ctx.initial_project_commit_message = (
    'Standard project directories initialized by cvs2git.'
    )

ctx.post_commit_message = (
    'This commit was generated by cvs2git to track changes on a CVS '
    'vendor branch.'
    )

ctx.symbol_commit_message = (
    "This commit was manufactured by cvs2git to create %(symbol_type)s "
    "'%(symbol_name)s'."
    )

ctx.decode_apple_single = False

ctx.symbol_info_filename = None

global_symbol_strategy_rules = [
    ExcludeTrivialImportBranchRule(),

    UnambiguousUsageRule(),

    BranchIfCommitsRule(),

    HeuristicStrategyRule(),

    # Convert all ambiguous symbols as branches:
    AllBranchRule(),
    # Convert all ambiguous symbols as tags:
    AllTagRule(),

    # The last rule is here to choose the preferred parent of branches
    # and tags, that is, the line of development from which the symbol
    # sprouts.
    HeuristicPreferredParentRule(),
    ]

ctx.username = 'cvs2git'

ctx.svn_property_setters.extend([
    CVSBinaryFileEOLStyleSetter(),
    CVSBinaryFileDefaultMimeTypeSetter(),
    DefaultEOLStyleSetter(None),
    SVNBinaryFileKeywordsPropertySetter(),
    KeywordsPropertySetter(config.SVN_KEYWORDS_VALUE),
    ExecutablePropertySetter(),
    ])

ctx.tmpdir = r'cvs2git-tmp'

ctx.cross_project_commits = False

ctx.cross_branch_commits = False

ctx.keep_cvsignore = True

ctx.retain_conflicting_attic_files = True

# Maps CVS usernames to (name, email) for the git author field.
# Entries marked with '??' or '(?)' are still unconfirmed.
author_transforms={
    'adunstan' : ('Andrew Dunstan', 'and...@dunslane.net'),
    'alvherre' : ('Alvaro Herrera', 'alvhe...@commandprompt.com'),
    'barry' : ('barry??', ''),
    'bryanh' : ('Bryan Henderson', 'bry...@giraffe.netgate.net'),
    'darcy' : ('D\'Arcy J.M. Cain', 'da...@druid.net'),
    'dennis' : ('Dennis??', ''),
    'heikki' : ('Heikki Linnakangas', 'heikki.linnakan...@enterprisedb.com'),
    'inoue' : ('inoue??', ''),
    'ishii' : ('Tatsuo Ishii', 'is...@sraoss.co.jp'),
    'joe' : ('Joe Conway', 'm...@joeconway.com'),
    'jurka' : ('jurka??', ''),
    'meskes' : ('Michael Meskes', 'mes...@postgresql.org'),
    'mha' : ('Magnus Hagander', 'mag...@hagander.net'),
    'momjian' : ('Bruce Momjian', 'br...@momjian.us'),
    'neilc' : ('Neil Conway', 'neil.con...@gmail.com'),
    'petere' : ('Peter Eisentraut', 'pete...@gmx.net'),
    'peter' : ('Peter Eisentraut (?)', 'pete...@gmx.net'),
    'pjw' : ('pjw??', ''),
    'scrappy' : ('Marc G. Fournier', 'scra...@postgresql.org'),
    'teodor' : ('Teodor Sigaev', 'teo...@sigaev.ru'),
    'tgl' : ('Tom Lane', 't...@sss.pgh.pa.us'),
    'vadim' : ('Vadim B. Mikheev', 'vadi...@yahoo.com'),
    'wieck' : ('Jan Wieck', 'janwi...@yahoo.com'),

    'cvs2git' : ('cvs2git', 'ad...@postgresql.org'),
    }

# This is the main option that causes cvs2svn to output to git rather
# than Subversion:
ctx.output_option = GitOutputOption(
    'cvs2git-tmp/git-dump.dat',
    GitRevisionMarkWriter(),
    max_merges=None,
    author_transforms=author_transforms,
    )

run_options.profiling = False

changeset_database.use_mmap_for_cvs_item_to_changeset_table = True

run_options.set_project(
    r'../postgresql.org/pgsql',

    symbol_transforms=[
        ReplaceSubstringsSymbolTransform('\\','/'),
        NormalizePathsSymbolTransform(),
        ],

    symbol_strategy_rules=global_symbol_strategy_rules,
    )
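To keep track of which committers still need attention, the `author_transforms` mapping can be scanned for placeholder entries. The following is only an illustrative sketch: the helper name `incomplete_authors` is hypothetical, and the rule that an entry is incomplete when its email is empty or its name contains a `?` is an assumption based on the placeholders used in the options file. The sample dictionary is a small excerpt of the mapping, with the email addresses exactly as they appear there.

```python
def incomplete_authors(author_transforms):
    """Return the sorted CVS usernames whose entry has an empty email
    or a question mark in the name (both treated as placeholders)."""
    return sorted(
        user
        for user, (name, email) in author_transforms.items()
        if not email or "?" in name
    )


# Excerpt of the mapping from the options file, for demonstration only.
sample = {
    'barry': ('barry??', ''),
    'peter': ('Peter Eisentraut (?)', 'pete...@gmx.net'),
    'tgl': ('Tom Lane', 't...@sss.pgh.pa.us'),
}

print(incomplete_authors(sample))  # prints ['barry', 'peter']
```

Run against the full mapping, this would list the entries that still need a confirmed name or address before the final conversion.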
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers