On Thu, Oct 11, 2018 at 08:33:58AM -0400, Derrick Stolee wrote:
> > I don't know if this is a fruitful path at all or not. I was mostly just
> > satisfying my own curiosity on the bitmap encoding question. But I'll
> > post the patches, just to show my work. The first one is the same
> > initial p
On 10/9/2018 7:12 PM, Jeff King wrote:
On Tue, Oct 09, 2018 at 05:14:50PM -0400, Jeff King wrote:
Hmph. It really sounds like we could do better with a custom RLE
solution. But that makes me feel like I'm missing something, because
surely I can't invent something better than the state of the ar
On Tue, Oct 09, 2018 at 05:14:50PM -0400, Jeff King wrote:
> Hmph. It really sounds like we could do better with a custom RLE
> solution. But that makes me feel like I'm missing something, because
> surely I can't invent something better than the state of the art in a
> simple thought experiment,
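
As a rough illustration of the run-length idea raised above, here is a minimal sketch. It is not the encoding from the posted patches, nor EWAH or Roaring; the names `rle_encode` and `rle_decode` and the sample bitmap are hypothetical and chosen only for this example.

```python
from typing import List, Tuple


def rle_encode(bits: List[int]) -> List[Tuple[int, int]]:
    """Collapse a list of 0/1 values into (bit, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            # Same value as the previous bit: extend the current run.
            runs[-1] = (bit, runs[-1][1] + 1)
        else:
            # Value changed (or first bit): start a new run.
            runs.append((bit, 1))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (bit, run_length) pairs back into the original bit list."""
    bits: List[int] = []
    for bit, length in runs:
        bits.extend([bit] * length)
    return bits


# Hypothetical sparse bitmap: three set bits among 3503 positions.
sparse = [0] * 1000 + [1] + [0] * 2000 + [1, 1] + [0] * 500
runs = rle_encode(sparse)
print(len(sparse), "bits stored as", len(runs), "runs")
```

For this input the 3503 bits collapse into five runs, which is the behaviour that makes run-length schemes attractive for very sparse bitmaps. A real on-disk format would also need a compact integer encoding for the run lengths.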
On Tue, Oct 09, 2018 at 03:03:08PM -0400, Derrick Stolee wrote:
> > I wonder if Roaring does better here.
>
> In these sparse cases, usually Roaring will organize the data as "array
> chunks" which are simply lists of the values. The thing that makes this
> still compressible is that we store two
On 10/9/2018 2:46 PM, Jeff King wrote:
On Tue, Oct 09, 2018 at 09:48:20AM -0400, Derrick Stolee wrote:
[I snipped all of the parts about bloom filters that seemed entirely
reasonable to me ;) ]
Imagine we have that list. Is a bloom filter still the best data
structure for each commit? At
On Tue, Oct 09, 2018 at 09:48:20AM -0400, Derrick Stolee wrote:
> [I snipped all of the parts about bloom filters that seemed entirely
> reasonable to me ;) ]
> > Imagine we have that list. Is a bloom filter still the best data
> > structure for each commit? At the poin
On Tue, Oct 09 2018, Derrick Stolee wrote:
> The filter needs to store every path that would be considered "not
> TREESAME". It can't store wildcards, so you would need to evaluate the
> wildcard and test all of those paths individually (not a good idea).
If full paths are stored, yes. But have
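
The point about wildcards can be made concrete with a small sketch. It assumes a toy Bloom filter over full path strings; the class `BloomFilter`, its parameters, the hashing scheme, and the sample paths are all hypothetical and are not Git's implementation. A filter can only answer "might this exact key be present?", so a pattern has to be expanded against a known path list before each candidate is tested.

```python
import fnmatch
import hashlib
from typing import Iterator


class BloomFilter:
    """Toy Bloom filter; sizes and hashing are illustrative assumptions."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 7) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


# Paths changed by one hypothetical commit.
changed = BloomFilter()
changed.add("src/main.c")
changed.add("docs/readme.txt")

# An exact path can be tested directly.
print(changed.might_contain("src/main.c"))

# A wildcard cannot: it must first be expanded against a list of known paths.
all_paths = ["src/main.c", "src/util.c", "docs/readme.txt"]
candidates = fnmatch.filter(all_paths, "src/*.c")
maybe_changed = any(changed.might_contain(p) for p in candidates)
print(candidates, maybe_changed)
```

The expansion step is where the cost lies: every path matching the pattern needs its own lookup, and each lookup can return a false positive.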
"not
TREESAME". It can't store wildcards, so you would need to evaluate the
wildcard and test all of those paths individually (not a good idea).
At least not by itself. If we imagine that the commit-graph also had an
alphabetized list of every path in every tree, then it's easy:
Hi Kuba,
On Mon, 14 May 2018, Jakub Narebski wrote:
> [... lots and lots of discussions...]
>
> All right.
>
> Here is my [allegedly] improved version, which assumes that we always
> want to start from commit with maximum generation number (there may be
> more than one such commit).
>
> Let's a
ect behavior with shallow clones, replace-objects, and grafts
So the goal of the current v1.0 phase is to introduce generation numbers,
use them for better performance ("low hanging fruit"), and ensure that the
feature is automatic and safe -- thus usable for an ordinary user.
>
> Commi
* 'verify' and fsck/gc integration
* correct behavior with shallow clones, replace-objects, and grafts
Commit-graph v1.1:
* Place commit-graph storage in the_repository
* 'git tag --merged' use generation numbers
* 'git log --graph' use generation numbers
Commit-graph
still perform synthetic tests: how many fewer
commits we walk when checking that A can reach B on real commit graphs
(like I did in mentioned Google Colaboratory notebook [3]). This
assumes that the cost of accessing commit data (and possibly also
indexes data) dominates, and the cost of using rea
rge
contributor summit notes" [2] is already present in VSTS (Visual Studio
Team Services - the server counterpart of GVFS: Git Virtual File System)
at Microsoft:
AV> - VSTS adds bloom filters to know which paths have changed on the commit
AV> - tree-same check in the bloom filter is
you could skip straight to the merge base
> and keep walking.
Another solution that I thought of is to use the same mechanism that
commit-graph file uses for storing merges: store Bloom filters for first
two parents, and if there are more parents (octopus merge), store Bloom
filters for the remai
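
The parent-storage mechanism referred to above can be sketched as follows. This is an illustration of the fixed two-slot layout with an overflow list, as I understand the commit-graph format; the constant values and the helper names `encode_parents` and `decode_parents` are assumptions for this example and should be checked against the format documentation. It does not show how Bloom filters themselves would be laid out.

```python
from typing import List, Tuple

NO_PARENT = 0x70000000    # assumed sentinel meaning "slot is empty"
EXTRA_EDGES = 0x80000000  # assumed flag: second slot indexes the overflow list
LAST_EDGE = 0x80000000    # assumed flag: final entry of an overflow run
INDEX_MASK = 0x7FFFFFFF   # low 31 bits carry the position


def encode_parents(parents: List[int], extra_edges: List[int]) -> Tuple[int, int]:
    """Return two fixed-width slots; octopus parents spill into extra_edges."""
    if not parents:
        return NO_PARENT, NO_PARENT
    if len(parents) == 1:
        return parents[0], NO_PARENT
    if len(parents) == 2:
        return parents[0], parents[1]
    # More than two parents: everything after the first goes to the overflow
    # list, and the last overflow entry is flagged so readers know where to stop.
    start = len(extra_edges)
    rest = parents[1:]
    extra_edges.extend(rest[:-1])
    extra_edges.append(rest[-1] | LAST_EDGE)
    return parents[0], EXTRA_EDGES | start


def decode_parents(slot1: int, slot2: int, extra_edges: List[int]) -> List[int]:
    """Recover the parent positions from the two slots and the overflow list."""
    if slot1 == NO_PARENT:
        return []
    parents = [slot1]
    if slot2 == NO_PARENT:
        return parents
    if not slot2 & EXTRA_EDGES:
        parents.append(slot2)
        return parents
    i = slot2 & INDEX_MASK
    while True:
        entry = extra_edges[i]
        parents.append(entry & INDEX_MASK)
        if entry & LAST_EDGE:
            return parents
        i += 1


overflow: List[int] = []
slots = encode_parents([5, 6, 7, 8], overflow)  # hypothetical octopus merge
print(slots, overflow, decode_parents(*slots, overflow))
```

The attraction of this layout is that every commit record stays the same width, so the common one- and two-parent cases need no indirection and only octopus merges pay for a second lookup.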
On Fri, May 04 2018, Jakub Narebski wrote:
(Just off the cuff here and I'm surely about to be corrected by
Derrick...)
> * What to do about merge commits, and octopus merges in particular?
> Should Bloom filter be stored for each of the parents? How to ensure
> fast access then (fixed-width
On Fri, May 04 2018, Jakub Narebski wrote:
> With early parts of commit-graph feature (ds/commit-graph and
> ds/lazy-load-trees) close to being merged into "master", see
> https://public-inbox.org/git/xmqq4ljtz87g@gitster-ct.c.googlers.com/
> I think it would be a good idea to think about what other
ng if the file or directory
was changed in given commit, for queries such as "git log -- <path>" or
"git blame <path>". This is something that according to "Git Merge
contributor summit notes" [2] is already present in VSTS (Visual Studio
Team Services - the server counterpart of GVFS:
On 09/12/2015 07:16 AM, Shawn Pearce wrote:
> On Fri, Sep 11, 2015 at 2:13 PM, Michael Haggerty wrote:
>> I have been thinking about Wilhelm Bierbaum's talk at the last GitMerge
>> conference [1] in which he describes a scheme for using Bloom filters to
>>
On Sat, Sep 12, 2015 at 12:01 PM, Junio C Hamano wrote:
> Shawn Pearce writes:
>
>> The worst case is due to a bug in the negotiation. With nothing
>> common, the client just goes on forever until it reaches roots
>> (something is wrong with MAX_IN_VAIN). We saw 56,318 have lines ... a
>> 2.6 MiB
Shawn Pearce writes:
> The worst case is due to a bug in the negotiation. With nothing
> common, the client just goes on forever until it reaches roots
> (something is wrong with MAX_IN_VAIN). We saw 56,318 have lines ... a
> 2.6 MiB section. But smart HTTP gzips, so this may be only 1.3 MiB on
>
On Fri, Sep 11, 2015 at 2:13 PM, Michael Haggerty wrote:
> I have been thinking about Wilhelm Bierbaum's talk at the last GitMerge
> conference [1] in which he describes a scheme for using Bloom filters to
> make the initial reference advertisement less expensive.
...
> But i
Michael Haggerty writes:
> 1. The server advertises the references that it has in the way that it
> is currently done.
> 2. The client advertises the objects that it has (or some subset of
> them; see below) via a Bloom filter.
> 3. The server sends the client the packfile that results from assum
I have been thinking about Wilhelm Bierbaum's talk at the last GitMerge
conference [1] in which he describes a scheme for using Bloom filters to
make the initial reference advertisement less expensive.
In his scheme (if I understand correctly) the client starts off the
conversation by passin