Re: [PATCHv1 0/3] git-p4: fixing --changes-block-size support

2015-06-07 Thread Lex Spoon
Great work.

For curiosity's sake, the -m solution has been observed to work on at
least one Perforce installation. However clearly it doesn't work on
others, so the batch ranges approach looks like it will be better.

Based on what has been seen so far, the Perforce maxscanrows setting
must be applying the low-level database queries that Perforce uses
internally in its implementation. That makes the precise effect on
external queries rather hard to predict. It likely also depends on the
version of Perforce.

Lex Spoon
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv1 1/3] git-p4: additional testing of --changes-block-size

2015-06-07 Thread Lex Spoon
I'll add in reviews since I touched similar code, but I don't know
whether it's sufficient given I don't know the code very well.

Anyway, these tests LGTM. Having a smaller test repository is fine,
and the new tests for files outside the client spec are a great idea.
-Lex
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv1 2/3] git-p4: test with limited p4 server results

2015-06-07 Thread Lex Spoon
LGTM. That's great adding a user with the appropriate restrictions on
it to really exercise the functionality.  -Lex
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv1 3/3] git-p4: fixing --changes-block-size handling

2015-06-07 Thread Lex Spoon
The implementation looks fine, especially given the test cases that
back it up. I am only curious why the block size is set to a default
of None. To put it as contcretely as possible: is there any expected
configuration where None would work but 500 would not? We know there
are many cases of the other way around, and those cases are going to
send users to StackOverflow to find the right workaround.

Dropping the option would also simplify the code in several places.
The complex logic around get_another_block could be removed, and
instead there could be a loop from start to mostRecentCommit by
block_size. Several places that check "if not block_size" could just
choose the other branch.

Lex Spoon
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 3/3] git-p4: fixing --changes-block-size handling

2015-06-07 Thread Lex Spoon
Unless I am reading something wrong, the "new_changes" variable could
be dropped now. It was needed for the -m version for detecting the
smallest change number that was returned. Otherwise it looks good to
me.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 3/3] git-p4: fixing --changes-block-size handling

2015-06-08 Thread Lex Spoon
Precisely, Junio, that's what I had in mind. The patch with the two
lines deleted LGTM.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3 0/4] git-p4: fixing --changes-block-size handling

2015-06-12 Thread Lex Spoon
The latest patch series LGTM. It's a pity about the more complicated
structure with two different ways to query the changes list, but it
does look hard to make it any simpler.

Lex Spoon
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] git-p4: Use -m when running p4 changes

2015-04-14 Thread Lex Spoon
(resending with accidental HTML removed)


Great, I'm glad it looks like a good approach!

I'll add a test case for it and to support the test case, an
option for the block size. I guess the block-size option will go on
"sync", "clone", and "fetch". Alternatively, maybe someone has a
better suggestion of how to configure the block size.

Lex Spoon
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] git-p4: Use -m when running p4 changes

2015-04-14 Thread Lex Spoon
>From 9cc607667a20317c837afd90d50c078da659b72f Mon Sep 17 00:00:00 2001
From: Lex Spoon 
Date: Sat, 11 Apr 2015 10:01:15 -0400
Subject: [PATCH] git-p4: Use -m when running p4 changes

Signed-off-by: Lex Spoon 
---
Updated to include a test case

 git-p4.py   | 51 ++-
 t/t9818-git-p4-block.sh | 64 +
 2 files changed, 104 insertions(+), 11 deletions(-)
 create mode 100755 t/t9818-git-p4-block.sh

diff --git a/git-p4.py b/git-p4.py
index 549022e..2fc8d9c 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -740,17 +740,43 @@ def
createOrUpdateBranchesFromOrigin(localRefPrefix = "refs/remotes/p4/",
silent
 def originP4BranchesExist():
 return gitBranchExists("origin") or
gitBranchExists("origin/p4") or gitBranchExists("origin/p4/master")

-def p4ChangesForPaths(depotPaths, changeRange):
+def p4ChangesForPaths(depotPaths, changeRange, block_size):
 assert depotPaths
-cmd = ['changes']
-for p in depotPaths:
-cmd += ["%s...%s" % (p, changeRange)]
-output = p4_read_pipe_lines(cmd)
+assert block_size
+
+# Parse the change range into start and end
+if changeRange is None or changeRange == '':
+changeStart = '@1'
+changeEnd = '#head'
+else:
+parts = changeRange.split(',')
+assert len(parts) == 2
+changeStart = parts[0]
+changeEnd = parts[1]

+# Accumulate change numbers in a dictionary to avoid duplicates
 changes = {}
-for line in output:
-changeNum = int(line.split(" ")[1])
-changes[changeNum] = True
+
+for p in depotPaths:
+# Retrieve changes a block at a time, to prevent running
+# into a MaxScanRows error from the server.
+start = changeStart
+end = changeEnd
+get_another_block = True
+while get_another_block:
+new_changes = []
+cmd = ['changes']
+cmd += ['-m', str(block_size)]
+cmd += ["%s...%s,%s" % (p, start, end)]
+for line in p4_read_pipe_lines(cmd):
+changeNum = int(line.split(" ")[1])
+new_changes.append(changeNum)
+changes[changeNum] = True
+if len(new_changes) == block_size:
+get_another_block = True
+end = '@' + str(min(new_changes))
+else:
+get_another_block = False

 changelist = changes.keys()
 changelist.sort()
@@ -1912,6 +1938,8 @@ class P4Sync(Command, P4UserMap):
 optparse.make_option("--import-local",
dest="importIntoRemotes", action="store_false",
  help="Import into refs/heads/ ,
not refs/remotes"),
 optparse.make_option("--max-changes", dest="maxChanges"),
+optparse.make_option("--changes-block-size",
dest="changes_block_size", type="int",
+ help="Block size for calling p4 changes"),
 optparse.make_option("--keep-path",
dest="keepRepoPath", action='store_true',
  help="Keep entire
BRANCH/DIR/SUBDIR prefix during import"),
 optparse.make_option("--use-client-spec",
dest="useClientSpec", action='store_true',
@@ -1940,6 +1968,7 @@ class P4Sync(Command, P4UserMap):
 self.syncWithOrigin = True
 self.importIntoRemotes = True
 self.maxChanges = ""
+self.changes_block_size = 500
 self.keepRepoPath = False
 self.depotPaths = None
 self.p4BranchesInGit = []
@@ -2578,7 +2607,7 @@ class P4Sync(Command, P4UserMap):

 return ""

-def importNewBranch(self, branch, maxChange):
+def importNewBranch(self, branch, maxChange, changes_block_size):
 # make fast-import flush all changes to disk and update the
refs using the checkpoint
 # command so that we can try to find the branch parent in the
git history
 self.gitStream.write("checkpoint\n\n");
@@ -2586,7 +2615,7 @@ class P4Sync(Command, P4UserMap):
 branchPrefix = self.depotPaths[0] + branch + "/"
 range = "@1,%s" % maxChange
 #print "prefix" + branchPrefix
-changes = p4ChangesForPaths([branchPrefix], range)
+changes = p4ChangesForPaths([branchPrefix], range, changes_block_size)
 if len(changes) <= 0:
 return False
 firstChange = changes[0]
@@ -3002,7 +3031,7 @@ class P4Sync(Command, P4UserMap):
 if self.verbose:
 print "Getting p4 changes for

Re: [PATCH] git-p4: Use -m when running p4 changes

2015-04-17 Thread Lex Spoon
Thanks, all. I will update the patch as requested and resend a [PATCH
v3]. This time without the redundant headers. I will also make an
extra effort to make sure that the raw tabs do not get converted to
spaces this time. Oof, I am really out of practice at programming with
raw tabs, much less getting them to make it through email software.
Thank you for your patience.

test_seq is a neat utility. Also, I don't know why I didn't think to
update the document page. Certainly it needs to be updated.


Lex Spoon
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3] git-p4: Use -m when running p4 changes

2015-04-17 Thread Lex Spoon
Simply running "p4 changes" on a large branch can
result in a "too many rows scanned" error from the
Perforce server. It is better to use a sequence
of smaller calls to "p4 changes", using the "-m"
option to limit the size of each call.

Signed-off-by: Lex Spoon 
Reviewed-by: Junio C Hamano 
Reviewed-by: Luke Diamand 
---
Updated as suggested:
- documentation added
- avoided touch(1)
- used test_seq
- used || exit for test commands inside for loops
- more tabs
- fewer line breaks
- expanded commit message

 Documentation/git-p4.txt | 17 ++---
 git-p4.py| 54 +++-
 t/t9818-git-p4-block.sh  | 64 
 3 files changed, 120 insertions(+), 15 deletions(-)
 create mode 100755 t/t9818-git-p4-block.sh

diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
index a1664b9..82aa5d6 100644
--- a/Documentation/git-p4.txt
+++ b/Documentation/git-p4.txt
@@ -225,9 +225,20 @@ Git repository:
they can find the p4 branches in refs/heads.
 
 --max-changes ::
-   Limit the number of imported changes to 'n'.  Useful to
-   limit the amount of history when using the '@all' p4 revision
-   specifier.
+   Import at most 'n' changes, rather than the entire range of
+   changes included in the given revision specifier. A typical
+   usage would be use '@all' as the revision specifier, but then
+   to use '--max-changes 1000' to import only the last 1000
+   revisions rather than the entire revision history.
+
+--changes-block-size ::
+   The internal block size to use when converting a revision
+   specifier such as '@all' into a list of specific change
+   numbers. Instead of using a single call to 'p4 changes' to
+   find the full list of changes for the conversion, there are a
+   sequence of calls to 'p4 changes -m', each of which requests
+   one block of changes of the given size. The default block size
+   is 500, which should usually be suitable.
 
 --keep-path::
The mapping of file names from the p4 depot path to Git, by
diff --git a/git-p4.py b/git-p4.py
index 549022e..1fba3aa 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -740,17 +740,43 @@ def createOrUpdateBranchesFromOrigin(localRefPrefix = 
"refs/remotes/p4/", silent
 def originP4BranchesExist():
 return gitBranchExists("origin") or gitBranchExists("origin/p4") or 
gitBranchExists("origin/p4/master")
 
-def p4ChangesForPaths(depotPaths, changeRange):
+def p4ChangesForPaths(depotPaths, changeRange, block_size):
 assert depotPaths
-cmd = ['changes']
-for p in depotPaths:
-cmd += ["%s...%s" % (p, changeRange)]
-output = p4_read_pipe_lines(cmd)
+assert block_size
+
+# Parse the change range into start and end
+if changeRange is None or changeRange == '':
+changeStart = '@1'
+changeEnd = '#head'
+else:
+parts = changeRange.split(',')
+assert len(parts) == 2
+changeStart = parts[0]
+changeEnd = parts[1]
 
+# Accumulate change numbers in a dictionary to avoid duplicates
 changes = {}
-for line in output:
-changeNum = int(line.split(" ")[1])
-changes[changeNum] = True
+
+for p in depotPaths:
+# Retrieve changes a block at a time, to prevent running
+# into a MaxScanRows error from the server.
+start = changeStart
+end = changeEnd
+get_another_block = True
+while get_another_block:
+new_changes = []
+cmd = ['changes']
+cmd += ['-m', str(block_size)]
+cmd += ["%s...%s,%s" % (p, start, end)]
+for line in p4_read_pipe_lines(cmd):
+changeNum = int(line.split(" ")[1])
+new_changes.append(changeNum)
+changes[changeNum] = True
+if len(new_changes) == block_size:
+get_another_block = True
+end = '@' + str(min(new_changes))
+else:
+get_another_block = False
 
 changelist = changes.keys()
 changelist.sort()
@@ -1911,7 +1937,10 @@ class P4Sync(Command, P4UserMap):
 optparse.make_option("--import-labels", dest="importLabels", 
action="store_true"),
 optparse.make_option("--import-local", 
dest="importIntoRemotes", action="store_false",
  help="Import into refs/heads/ , not 
refs/remotes"),
-optparse.make_option("--max-changes", dest="maxChanges"),
+optparse.make_option("--max-changes&

Re: [PATCH v3] git-p4: Use -m when running p4 changes

2015-04-20 Thread Lex Spoon
On Mon, Apr 20, 2015 at 5:53 AM, Luke Diamand  wrote:
> I could be wrong about this, but it looks like importNewBranches() is taking
> an extra argument, but that isn't reflected in the place where it gets
> called. I think it just got missed.
>
> As a result, t9801-git-p4-branch.sh fails with this error:

Oh dear, definitely. The argument can in fact be dropped, because it's
already already available via a field of the same object. I post an
update with that change.  -Lex
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4] git-p4: Use -m when running p4 changes

2015-04-20 Thread Lex Spoon
Simply running "p4 changes" on a large branch can
result in a "too many rows scanned" error from the
Perforce server. It is better to use a sequence
of smaller calls to "p4 changes", using the "-m"
option to limit the size of each call.

Signed-off-by: Lex Spoon 
Reviewed-by: Junio C Hamano 
Reviewed-by: Luke Diamand 
---
Updated to avoid the crash Luke pointed out.
All t98* tests pass now except for t9814,
which is already failing on master for some reason.

 Documentation/git-p4.txt | 17 ++---
 git-p4.py| 52 ++-
 t/t9818-git-p4-block.sh  | 64 
 3 files changed, 119 insertions(+), 14 deletions(-)
 create mode 100755 t/t9818-git-p4-block.sh

diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
index a1664b9..82aa5d6 100644
--- a/Documentation/git-p4.txt
+++ b/Documentation/git-p4.txt
@@ -225,9 +225,20 @@ Git repository:
they can find the p4 branches in refs/heads.
 
 --max-changes ::
-   Limit the number of imported changes to 'n'.  Useful to
-   limit the amount of history when using the '@all' p4 revision
-   specifier.
+   Import at most 'n' changes, rather than the entire range of
+   changes included in the given revision specifier. A typical
+   usage would be use '@all' as the revision specifier, but then
+   to use '--max-changes 1000' to import only the last 1000
+   revisions rather than the entire revision history.
+
+--changes-block-size ::
+   The internal block size to use when converting a revision
+   specifier such as '@all' into a list of specific change
+   numbers. Instead of using a single call to 'p4 changes' to
+   find the full list of changes for the conversion, there are a
+   sequence of calls to 'p4 changes -m', each of which requests
+   one block of changes of the given size. The default block size
+   is 500, which should usually be suitable.
 
 --keep-path::
The mapping of file names from the p4 depot path to Git, by
diff --git a/git-p4.py b/git-p4.py
index 549022e..e28033f 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -740,17 +740,43 @@ def createOrUpdateBranchesFromOrigin(localRefPrefix = 
"refs/remotes/p4/", silent
 def originP4BranchesExist():
 return gitBranchExists("origin") or gitBranchExists("origin/p4") or 
gitBranchExists("origin/p4/master")
 
-def p4ChangesForPaths(depotPaths, changeRange):
+def p4ChangesForPaths(depotPaths, changeRange, block_size):
 assert depotPaths
-cmd = ['changes']
-for p in depotPaths:
-cmd += ["%s...%s" % (p, changeRange)]
-output = p4_read_pipe_lines(cmd)
+assert block_size
+
+# Parse the change range into start and end
+if changeRange is None or changeRange == '':
+changeStart = '@1'
+changeEnd = '#head'
+else:
+parts = changeRange.split(',')
+assert len(parts) == 2
+changeStart = parts[0]
+changeEnd = parts[1]
 
+# Accumulate change numbers in a dictionary to avoid duplicates
 changes = {}
-for line in output:
-changeNum = int(line.split(" ")[1])
-changes[changeNum] = True
+
+for p in depotPaths:
+# Retrieve changes a block at a time, to prevent running
+# into a MaxScanRows error from the server.
+start = changeStart
+end = changeEnd
+get_another_block = True
+while get_another_block:
+new_changes = []
+cmd = ['changes']
+cmd += ['-m', str(block_size)]
+cmd += ["%s...%s,%s" % (p, start, end)]
+for line in p4_read_pipe_lines(cmd):
+changeNum = int(line.split(" ")[1])
+new_changes.append(changeNum)
+changes[changeNum] = True
+if len(new_changes) == block_size:
+get_another_block = True
+end = '@' + str(min(new_changes))
+else:
+get_another_block = False
 
 changelist = changes.keys()
 changelist.sort()
@@ -1911,7 +1937,10 @@ class P4Sync(Command, P4UserMap):
 optparse.make_option("--import-labels", dest="importLabels", 
action="store_true"),
 optparse.make_option("--import-local", 
dest="importIntoRemotes", action="store_false",
  help="Import into refs/heads/ , not 
refs/remotes"),
-optparse.make_option("--max-changes", dest="maxChanges"),
+optparse.make_option("--max-changes", dest="maxChanges",

Re: [PATCH v4] git-p4: Use -m when running p4 changes

2015-04-20 Thread Lex Spoon
On Mon, Apr 20, 2015 at 11:15 AM, Luke Diamand  wrote:
> Sorry - could you resubmit your patch (PATCHv4 it will be) with this
> change squashed in please? It will make life much easier, especially
> for Junio!

The message you just responded is already the squashed version. It's a
single patch that includes all changes so far discussed. The subject
line says "PATCH v4", although since it's in the same thread, not all
email clients will show the subject change.

Let me know if I can do more to make the process go smoothly.

Lex Spoon
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] git-p4: add failing tests for case-folding p4d

2015-04-28 Thread Lex Spoon
The last comment in the test took me a minute to decipher. I would
suggest "no repo path called LC" instead of "no repo called LC". Also,
it would have helped me to either have a little comment on the "UC"
version of the test, or to make the previous comment a little more
neutral so that it will apply to both test cases.

Otherwise, while I am not a regular maintainer of this code, the patch
does LGTM. Certainly it's good to have more test coverage.

For the underlying problem, I haven't thought about it very much, but
it looks like a plausible first step might be to simply probe the
given file name and see if it comes back the same way. If it comes
back differently, then maybe the command should abort?


What a tough problem all around...

Lex
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html