Re: [PATCHv1 0/3] git-p4: fixing --changes-block-size support
Great work. For curiosity's sake, the -m solution has been observed to work on at least one Perforce installation. However clearly it doesn't work on others, so the batch ranges approach looks like it will be better. Based on what has been seen so far, the Perforce maxscanrows setting must be applying the low-level database queries that Perforce uses internally in its implementation. That makes the precise effect on external queries rather hard to predict. It likely also depends on the version of Perforce. Lex Spoon -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv1 1/3] git-p4: additional testing of --changes-block-size
I'll add in reviews since I touched similar code, but I don't know whether it's sufficient given I don't know the code very well. Anyway, these tests LGTM. Having a smaller test repository is fine, and the new tests for files outside the client spec are a great idea. -Lex -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv1 2/3] git-p4: test with limited p4 server results
LGTM. That's great adding a user with the appropriate restrictions on it to really exercise the functionality. -Lex -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv1 3/3] git-p4: fixing --changes-block-size handling
The implementation looks fine, especially given the test cases that back it up. I am only curious why the block size is set to a default of None. To put it as contcretely as possible: is there any expected configuration where None would work but 500 would not? We know there are many cases of the other way around, and those cases are going to send users to StackOverflow to find the right workaround. Dropping the option would also simplify the code in several places. The complex logic around get_another_block could be removed, and instead there could be a loop from start to mostRecentCommit by block_size. Several places that check "if not block_size" could just choose the other branch. Lex Spoon -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 3/3] git-p4: fixing --changes-block-size handling
Unless I am reading something wrong, the "new_changes" variable could be dropped now. It was needed for the -m version for detecting the smallest change number that was returned. Otherwise it looks good to me. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 3/3] git-p4: fixing --changes-block-size handling
Precisely, Junio, that's what I had in mind. The patch with the two lines deleted LGTM. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv3 0/4] git-p4: fixing --changes-block-size handling
The latest patch series LGTM. It's a pity about the more complicated structure with two different ways to query the changes list, but it does look hard to make it any simpler. Lex Spoon -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] git-p4: Use -m when running p4 changes
(resending with accidental HTML removed) Great, I'm glad it looks like a good approach! I'll add a test case for it and to support the test case, an option for the block size. I guess the block-size option will go on "sync", "clone", and "fetch". Alternatively, maybe someone has a better suggestion of how to configure the block size. Lex Spoon -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] git-p4: Use -m when running p4 changes
>From 9cc607667a20317c837afd90d50c078da659b72f Mon Sep 17 00:00:00 2001 From: Lex Spoon Date: Sat, 11 Apr 2015 10:01:15 -0400 Subject: [PATCH] git-p4: Use -m when running p4 changes Signed-off-by: Lex Spoon --- Updated to include a test case git-p4.py | 51 ++- t/t9818-git-p4-block.sh | 64 + 2 files changed, 104 insertions(+), 11 deletions(-) create mode 100755 t/t9818-git-p4-block.sh diff --git a/git-p4.py b/git-p4.py index 549022e..2fc8d9c 100755 --- a/git-p4.py +++ b/git-p4.py @@ -740,17 +740,43 @@ def createOrUpdateBranchesFromOrigin(localRefPrefix = "refs/remotes/p4/", silent def originP4BranchesExist(): return gitBranchExists("origin") or gitBranchExists("origin/p4") or gitBranchExists("origin/p4/master") -def p4ChangesForPaths(depotPaths, changeRange): +def p4ChangesForPaths(depotPaths, changeRange, block_size): assert depotPaths -cmd = ['changes'] -for p in depotPaths: -cmd += ["%s...%s" % (p, changeRange)] -output = p4_read_pipe_lines(cmd) +assert block_size + +# Parse the change range into start and end +if changeRange is None or changeRange == '': +changeStart = '@1' +changeEnd = '#head' +else: +parts = changeRange.split(',') +assert len(parts) == 2 +changeStart = parts[0] +changeEnd = parts[1] +# Accumulate change numbers in a dictionary to avoid duplicates changes = {} -for line in output: -changeNum = int(line.split(" ")[1]) -changes[changeNum] = True + +for p in depotPaths: +# Retrieve changes a block at a time, to prevent running +# into a MaxScanRows error from the server. +start = changeStart +end = changeEnd +get_another_block = True +while get_another_block: +new_changes = [] +cmd = ['changes'] +cmd += ['-m', str(block_size)] +cmd += ["%s...%s,%s" % (p, start, end)] +for line in p4_read_pipe_lines(cmd): +changeNum = int(line.split(" ")[1]) +new_changes.append(changeNum) +changes[changeNum] = True +if len(new_changes) == block_size: +get_another_block = True +end = '@' + str(min(new_changes)) +else: +get_another_block = False changelist = changes.keys() changelist.sort() @@ -1912,6 +1938,8 @@ class P4Sync(Command, P4UserMap): optparse.make_option("--import-local", dest="importIntoRemotes", action="store_false", help="Import into refs/heads/ , not refs/remotes"), optparse.make_option("--max-changes", dest="maxChanges"), +optparse.make_option("--changes-block-size", dest="changes_block_size", type="int", + help="Block size for calling p4 changes"), optparse.make_option("--keep-path", dest="keepRepoPath", action='store_true', help="Keep entire BRANCH/DIR/SUBDIR prefix during import"), optparse.make_option("--use-client-spec", dest="useClientSpec", action='store_true', @@ -1940,6 +1968,7 @@ class P4Sync(Command, P4UserMap): self.syncWithOrigin = True self.importIntoRemotes = True self.maxChanges = "" +self.changes_block_size = 500 self.keepRepoPath = False self.depotPaths = None self.p4BranchesInGit = [] @@ -2578,7 +2607,7 @@ class P4Sync(Command, P4UserMap): return "" -def importNewBranch(self, branch, maxChange): +def importNewBranch(self, branch, maxChange, changes_block_size): # make fast-import flush all changes to disk and update the refs using the checkpoint # command so that we can try to find the branch parent in the git history self.gitStream.write("checkpoint\n\n"); @@ -2586,7 +2615,7 @@ class P4Sync(Command, P4UserMap): branchPrefix = self.depotPaths[0] + branch + "/" range = "@1,%s" % maxChange #print "prefix" + branchPrefix -changes = p4ChangesForPaths([branchPrefix], range) +changes = p4ChangesForPaths([branchPrefix], range, changes_block_size) if len(changes) <= 0: return False firstChange = changes[0] @@ -3002,7 +3031,7 @@ class P4Sync(Command, P4UserMap): if self.verbose: print "Getting p4 changes for
Re: [PATCH] git-p4: Use -m when running p4 changes
Thanks, all. I will update the patch as requested and resend a [PATCH v3]. This time without the redundant headers. I will also make an extra effort to make sure that the raw tabs do not get converted to spaces this time. Oof, I am really out of practice at programming with raw tabs, much less getting them to make it through email software. Thank you for your patience. test_seq is a neat utility. Also, I don't know why I didn't think to update the document page. Certainly it needs to be updated. Lex Spoon -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3] git-p4: Use -m when running p4 changes
Simply running "p4 changes" on a large branch can result in a "too many rows scanned" error from the Perforce server. It is better to use a sequence of smaller calls to "p4 changes", using the "-m" option to limit the size of each call. Signed-off-by: Lex Spoon Reviewed-by: Junio C Hamano Reviewed-by: Luke Diamand --- Updated as suggested: - documentation added - avoided touch(1) - used test_seq - used || exit for test commands inside for loops - more tabs - fewer line breaks - expanded commit message Documentation/git-p4.txt | 17 ++--- git-p4.py| 54 +++- t/t9818-git-p4-block.sh | 64 3 files changed, 120 insertions(+), 15 deletions(-) create mode 100755 t/t9818-git-p4-block.sh diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt index a1664b9..82aa5d6 100644 --- a/Documentation/git-p4.txt +++ b/Documentation/git-p4.txt @@ -225,9 +225,20 @@ Git repository: they can find the p4 branches in refs/heads. --max-changes :: - Limit the number of imported changes to 'n'. Useful to - limit the amount of history when using the '@all' p4 revision - specifier. + Import at most 'n' changes, rather than the entire range of + changes included in the given revision specifier. A typical + usage would be use '@all' as the revision specifier, but then + to use '--max-changes 1000' to import only the last 1000 + revisions rather than the entire revision history. + +--changes-block-size :: + The internal block size to use when converting a revision + specifier such as '@all' into a list of specific change + numbers. Instead of using a single call to 'p4 changes' to + find the full list of changes for the conversion, there are a + sequence of calls to 'p4 changes -m', each of which requests + one block of changes of the given size. The default block size + is 500, which should usually be suitable. --keep-path:: The mapping of file names from the p4 depot path to Git, by diff --git a/git-p4.py b/git-p4.py index 549022e..1fba3aa 100755 --- a/git-p4.py +++ b/git-p4.py @@ -740,17 +740,43 @@ def createOrUpdateBranchesFromOrigin(localRefPrefix = "refs/remotes/p4/", silent def originP4BranchesExist(): return gitBranchExists("origin") or gitBranchExists("origin/p4") or gitBranchExists("origin/p4/master") -def p4ChangesForPaths(depotPaths, changeRange): +def p4ChangesForPaths(depotPaths, changeRange, block_size): assert depotPaths -cmd = ['changes'] -for p in depotPaths: -cmd += ["%s...%s" % (p, changeRange)] -output = p4_read_pipe_lines(cmd) +assert block_size + +# Parse the change range into start and end +if changeRange is None or changeRange == '': +changeStart = '@1' +changeEnd = '#head' +else: +parts = changeRange.split(',') +assert len(parts) == 2 +changeStart = parts[0] +changeEnd = parts[1] +# Accumulate change numbers in a dictionary to avoid duplicates changes = {} -for line in output: -changeNum = int(line.split(" ")[1]) -changes[changeNum] = True + +for p in depotPaths: +# Retrieve changes a block at a time, to prevent running +# into a MaxScanRows error from the server. +start = changeStart +end = changeEnd +get_another_block = True +while get_another_block: +new_changes = [] +cmd = ['changes'] +cmd += ['-m', str(block_size)] +cmd += ["%s...%s,%s" % (p, start, end)] +for line in p4_read_pipe_lines(cmd): +changeNum = int(line.split(" ")[1]) +new_changes.append(changeNum) +changes[changeNum] = True +if len(new_changes) == block_size: +get_another_block = True +end = '@' + str(min(new_changes)) +else: +get_another_block = False changelist = changes.keys() changelist.sort() @@ -1911,7 +1937,10 @@ class P4Sync(Command, P4UserMap): optparse.make_option("--import-labels", dest="importLabels", action="store_true"), optparse.make_option("--import-local", dest="importIntoRemotes", action="store_false", help="Import into refs/heads/ , not refs/remotes"), -optparse.make_option("--max-changes", dest="maxChanges"), +optparse.make_option("--max-changes&
Re: [PATCH v3] git-p4: Use -m when running p4 changes
On Mon, Apr 20, 2015 at 5:53 AM, Luke Diamand wrote: > I could be wrong about this, but it looks like importNewBranches() is taking > an extra argument, but that isn't reflected in the place where it gets > called. I think it just got missed. > > As a result, t9801-git-p4-branch.sh fails with this error: Oh dear, definitely. The argument can in fact be dropped, because it's already already available via a field of the same object. I post an update with that change. -Lex -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4] git-p4: Use -m when running p4 changes
Simply running "p4 changes" on a large branch can result in a "too many rows scanned" error from the Perforce server. It is better to use a sequence of smaller calls to "p4 changes", using the "-m" option to limit the size of each call. Signed-off-by: Lex Spoon Reviewed-by: Junio C Hamano Reviewed-by: Luke Diamand --- Updated to avoid the crash Luke pointed out. All t98* tests pass now except for t9814, which is already failing on master for some reason. Documentation/git-p4.txt | 17 ++--- git-p4.py| 52 ++- t/t9818-git-p4-block.sh | 64 3 files changed, 119 insertions(+), 14 deletions(-) create mode 100755 t/t9818-git-p4-block.sh diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt index a1664b9..82aa5d6 100644 --- a/Documentation/git-p4.txt +++ b/Documentation/git-p4.txt @@ -225,9 +225,20 @@ Git repository: they can find the p4 branches in refs/heads. --max-changes :: - Limit the number of imported changes to 'n'. Useful to - limit the amount of history when using the '@all' p4 revision - specifier. + Import at most 'n' changes, rather than the entire range of + changes included in the given revision specifier. A typical + usage would be use '@all' as the revision specifier, but then + to use '--max-changes 1000' to import only the last 1000 + revisions rather than the entire revision history. + +--changes-block-size :: + The internal block size to use when converting a revision + specifier such as '@all' into a list of specific change + numbers. Instead of using a single call to 'p4 changes' to + find the full list of changes for the conversion, there are a + sequence of calls to 'p4 changes -m', each of which requests + one block of changes of the given size. The default block size + is 500, which should usually be suitable. --keep-path:: The mapping of file names from the p4 depot path to Git, by diff --git a/git-p4.py b/git-p4.py index 549022e..e28033f 100755 --- a/git-p4.py +++ b/git-p4.py @@ -740,17 +740,43 @@ def createOrUpdateBranchesFromOrigin(localRefPrefix = "refs/remotes/p4/", silent def originP4BranchesExist(): return gitBranchExists("origin") or gitBranchExists("origin/p4") or gitBranchExists("origin/p4/master") -def p4ChangesForPaths(depotPaths, changeRange): +def p4ChangesForPaths(depotPaths, changeRange, block_size): assert depotPaths -cmd = ['changes'] -for p in depotPaths: -cmd += ["%s...%s" % (p, changeRange)] -output = p4_read_pipe_lines(cmd) +assert block_size + +# Parse the change range into start and end +if changeRange is None or changeRange == '': +changeStart = '@1' +changeEnd = '#head' +else: +parts = changeRange.split(',') +assert len(parts) == 2 +changeStart = parts[0] +changeEnd = parts[1] +# Accumulate change numbers in a dictionary to avoid duplicates changes = {} -for line in output: -changeNum = int(line.split(" ")[1]) -changes[changeNum] = True + +for p in depotPaths: +# Retrieve changes a block at a time, to prevent running +# into a MaxScanRows error from the server. +start = changeStart +end = changeEnd +get_another_block = True +while get_another_block: +new_changes = [] +cmd = ['changes'] +cmd += ['-m', str(block_size)] +cmd += ["%s...%s,%s" % (p, start, end)] +for line in p4_read_pipe_lines(cmd): +changeNum = int(line.split(" ")[1]) +new_changes.append(changeNum) +changes[changeNum] = True +if len(new_changes) == block_size: +get_another_block = True +end = '@' + str(min(new_changes)) +else: +get_another_block = False changelist = changes.keys() changelist.sort() @@ -1911,7 +1937,10 @@ class P4Sync(Command, P4UserMap): optparse.make_option("--import-labels", dest="importLabels", action="store_true"), optparse.make_option("--import-local", dest="importIntoRemotes", action="store_false", help="Import into refs/heads/ , not refs/remotes"), -optparse.make_option("--max-changes", dest="maxChanges"), +optparse.make_option("--max-changes", dest="maxChanges",
Re: [PATCH v4] git-p4: Use -m when running p4 changes
On Mon, Apr 20, 2015 at 11:15 AM, Luke Diamand wrote: > Sorry - could you resubmit your patch (PATCHv4 it will be) with this > change squashed in please? It will make life much easier, especially > for Junio! The message you just responded is already the squashed version. It's a single patch that includes all changes so far discussed. The subject line says "PATCH v4", although since it's in the same thread, not all email clients will show the subject change. Let me know if I can do more to make the process go smoothly. Lex Spoon -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] git-p4: add failing tests for case-folding p4d
The last comment in the test took me a minute to decipher. I would suggest "no repo path called LC" instead of "no repo called LC". Also, it would have helped me to either have a little comment on the "UC" version of the test, or to make the previous comment a little more neutral so that it will apply to both test cases. Otherwise, while I am not a regular maintainer of this code, the patch does LGTM. Certainly it's good to have more test coverage. For the underlying problem, I haven't thought about it very much, but it looks like a plausible first step might be to simply probe the given file name and see if it comes back the same way. If it comes back differently, then maybe the command should abort? What a tough problem all around... Lex -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html