On Fri, Mar 6, 2020 at 11:00 AM Andres Freund wrote:
> > Pushed.
>
> Congrats!
Thanks Andres!
--
Peter Geoghegan
On 2020-02-26 14:43:27 -0800, Peter Geoghegan wrote:
> On Mon, Feb 24, 2020 at 4:54 PM Peter Geoghegan wrote:
> > Attached is v34, which has this change. My plan is to commit something
> > very close to this on Wednesday morning (barring any objections).
>
> Pushed.
Congrats!
On Wed, Feb 26, 2020 at 10:03 PM Fujii Masao
wrote:
> Thanks for committing this nice feature!
You're welcome!
> Here is one minor comment.
>
> + deduplicate_items
> + storage parameter
>
> This should be
>
> deduplicate_items storage parameter
I pushed a fix for this.
Thanks
--
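For context on the doc fix above: deduplicate_items is the B-Tree index storage parameter this commit added (PostgreSQL 13). A minimal usage sketch — the table and index names are made up:

```sql
-- deduplicate_items defaults to on for B-Tree indexes in PostgreSQL 13+;
-- it can be disabled per index at creation time...
CREATE INDEX orders_status_idx ON orders (status)
    WITH (deduplicate_items = off);

-- ...or toggled later (affects future insertions; existing posting
-- lists are not rewritten immediately):
ALTER INDEX orders_status_idx SET (deduplicate_items = on);
```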
On 2020/02/27 7:43, Peter Geoghegan wrote:
On Mon, Feb 24, 2020 at 4:54 PM Peter Geoghegan wrote:
Attached is v34, which has this change. My plan is to commit something
very close to this on Wednesday morning (barring any objections).
Pushed.
Thanks for committing this nice feature!
Here is one minor comment.
On Mon, Feb 24, 2020 at 4:54 PM Peter Geoghegan wrote:
> Attached is v34, which has this change. My plan is to commit something
> very close to this on Wednesday morning (barring any objections).
Pushed.
I'm going to delay committing the pageinspect patch until tomorrow,
since I haven't thought
On Thu, Feb 20, 2020 at 10:58 AM Peter Geoghegan wrote:
> I think that there is a good chance that it just won't matter. The
> number of indexes that won't be able to support deduplication will be
> very small in practice. The important exceptions are INCLUDE indexes
> and nondeterministic collations.
On Thu, Feb 20, 2020 at 7:38 AM Anastasia Lubennikova
wrote:
> I don't think this patch really needs more nitpicking )
But when has that ever stopped it? :-)
> User can discover this with a complex query to pg_index and pg_opclass.
> To simplify this, we can probably wrap this into function or
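A sketch of the kind of catalog query alluded to here (illustrative only — the join details are assumptions, and this does not reproduce the exact opclass safety check the patch performs internally):

```sql
-- List each B-Tree index key column with its operator class, as a
-- starting point for spotting indexes that may not support
-- deduplication (e.g. INCLUDE indexes, nondeterministic collations).
SELECT ic.relname   AS index_name,
       a.attnum     AS column_number,
       opc.opcname  AS opclass_name
FROM pg_index i
JOIN pg_class ic    ON ic.oid = i.indexrelid
JOIN pg_attribute a ON a.attrelid = i.indexrelid
JOIN pg_opclass opc ON opc.oid = i.indclass[a.attnum - 1]
WHERE a.attnum <= i.indnkeyatts   -- key columns only, not INCLUDE columns
ORDER BY ic.relname, a.attnum;
```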
On 19.02.2020 22:16, Peter Geoghegan wrote:
On Wed, Feb 19, 2020 at 8:14 AM Anastasia Lubennikova
wrote:
Thank you for this work. I've looked through the patches and they seem
to be ready for commit.
I haven't yet read recent documentation and readme changes, so maybe
I'll send some more feedback tomorrow.
On Wed, Feb 19, 2020 at 11:16 AM Peter Geoghegan wrote:
> On Wed, Feb 19, 2020 at 8:14 AM Anastasia Lubennikova
> wrote:
> > Thank you for this work. I've looked through the patches and they seem
> > to be ready for commit.
> > I haven't yet read recent documentation and readme changes, so maybe
On Wed, Feb 19, 2020 at 8:14 AM Anastasia Lubennikova
wrote:
> Thank you for this work. I've looked through the patches and they seem
> to be ready for commit.
> I haven't yet read recent documentation and readme changes, so maybe
> I'll send some more feedback tomorrow.
Great.
> Is there any sp
On 14.02.2020 05:57, Peter Geoghegan wrote:
Attached is v33, which adds the last piece we need: opclass
infrastructure that tells nbtree whether or not deduplication can be
applied safely. This is based on work by Anastasia that was shared
with me privately.
Thank you for this work. I've looked
On Fri, Jan 10, 2020 at 1:36 PM Peter Geoghegan wrote:
> * HEIKKI: Do we only generate one posting list in one WAL record? I
> would assume it's better to deduplicate everything on the page, since
> we're modifying it anyway.
Still thinking about this one.
> * HEIKKI: Does xl_btree_vacuum WAL re
On Wed, Jan 8, 2020 at 5:56 AM Heikki Linnakangas wrote:
> On 04/01/2020 03:47, Peter Geoghegan wrote:
> > Attached is v28, which fixes bitrot from my recent commits to refactor
> > VACUUM-related code in nbtpage.c.
>
> I started to read through this gigantic patch.
Oh come on, it's not that big.
On Thu, Dec 12, 2019 at 6:21 PM Peter Geoghegan wrote:
> Still waiting for some review of the first patch, to get it out of the
> way. Anastasia?
I plan to commit this first patch [1] in the next day or two, barring
any objections.
It's clear that the nbtree "pin scan" VACUUM code is totally
unnecessary
On Tue, Dec 17, 2019 at 5:18 PM Bruce Momjian wrote:
> On Tue, Dec 17, 2019 at 03:30:33PM -0800, Peter Geoghegan wrote:
> > With many real world unique indexes, the true reason behind most or
> > all B-Tree page splits is "version churn". I view these page splits as
> > a permanent solution to a t
On Tue, Dec 17, 2019 at 03:30:33PM -0800, Peter Geoghegan wrote:
> With many real world unique indexes, the true reason behind most or
> all B-Tree page splits is "version churn". I view these page splits as
> a permanent solution to a temporary problem -- we *permanently*
> degrade the index struc
On Tue, Dec 17, 2019 at 1:58 PM Bruce Momjian wrote:
> > Attached is v26, which adds this new criteria/heuristic for unique
> > indexes. We now seem to consistently get good results with unique
> > indexes.
>
> In the past we tried to increase the number of cases where HOT updates
> can happen but
On Thu, Dec 12, 2019 at 06:21:20PM -0800, Peter Geoghegan wrote:
> On Tue, Dec 3, 2019 at 12:13 PM Peter Geoghegan wrote:
> > The new criteria/heuristic for unique indexes is very simple: If a
> > unique index has an existing item that is a duplicate on the incoming
> > item at the point that we m
On Tue, Nov 12, 2019 at 3:21 PM Peter Geoghegan wrote:
> * Decided to go back to turning deduplication on by default with
> non-unique indexes, and off by default using unique indexes.
>
> The unique index stuff was regressed enough with INSERT-heavy
> workloads that I was put off, despite my init
On Mon, Nov 18, 2019 at 05:26:37PM -0800, Peter Geoghegan wrote:
> Attached is v24. This revision doesn't fix the problem with
> xl_btree_insert record bloat, but it does fix the bitrot against the
> master branch that was caused by commit 50d22de9. (This patch has had
> a surprisingly large number
Moin,
On 2019-11-16 01:04, Peter Geoghegan wrote:
On Fri, Nov 15, 2019 at 5:16 AM Robert Haas
wrote:
Hmm. Well, maybe I'm just behind the times. But that same wikipedia
article also says that deduplication works on large chunks "such as
entire files or large sections of files" thus differentia
On Fri, Nov 15, 2019 at 5:43 PM Mark Dilger wrote:
> On 11/13/19 11:51 AM, Peter Geoghegan wrote:
> > Can you suggest an alternative?
>
> Dupression
This suggestion makes me feel better about "deduplication".
--
Peter Geoghegan
On 11/13/19 11:51 AM, Peter Geoghegan wrote:
Can you suggest an alternative?
Dupression
--
Mark Dilger
On Wed, Sep 11, 2019 at 2:04 PM Peter Geoghegan wrote:
> > I haven't measured how these changes affect WAL size yet.
> > Do you have any suggestions on how to automate testing of new WAL records?
> > Is there any suitable place in regression tests?
>
> I don't know about the regression tests (I do
On Fri, Nov 15, 2019 at 5:16 AM Robert Haas wrote:
> Hmm. Well, maybe I'm just behind the times. But that same wikipedia
> article also says that deduplication works on large chunks "such as
> entire files or large sections of files" thus differentiating it from
> compression algorithms which work
On Wed, Nov 13, 2019 at 2:51 PM Peter Geoghegan wrote:
> "Deduplication" never means that you get rid of duplicates. According
> to Wikipedia's deduplication article: "Whereas compression algorithms
> identify redundant data inside individual files and encodes this
> redundant data more efficientl
On Wed, Nov 13, 2019 at 11:33 AM Robert Haas wrote:
> On Tue, Nov 12, 2019 at 6:22 PM Peter Geoghegan wrote:
> > * Disabled deduplication in system catalog indexes by deeming it
> > generally unsafe.
>
> I (continue to) think that deduplication is a terrible name, because
> you're not getting rid
On Tue, Nov 12, 2019 at 6:22 PM Peter Geoghegan wrote:
> * Disabled deduplication in system catalog indexes by deeming it
> generally unsafe.
I (continue to) think that deduplication is a terrible name, because
you're not getting rid of the duplicates. You are using a compressed
representation of
On Mon, Sep 30, 2019 at 7:39 PM Peter Geoghegan wrote:
> I've found that my "regular pgbench, but with a low cardinality index
> on pgbench_accounts(abalance)" benchmark works best with the specific
> heuristics used in the patch, especially over many hours.
I ran pgbench without the pgbench_acco
On Fri, Sep 27, 2019 at 9:43 AM Anastasia Lubennikova
wrote:
> Attached is v19.
Cool.
> * By default deduplication is on for non-unique indexes and off for
> unique ones.
I think that it makes sense to enable deduplication by default -- even
with unique indexes. It looks like deduplication can
On Wed, Sep 25, 2019 at 8:05 AM Anastasia Lubennikova
wrote:
> Attached is v18. In this version bt_dedup_one_page() is refactored so that:
> - no temp page is used, all updates are applied to the original page.
> - each posting tuple is WAL-logged separately.
> This also allowed us to simplify btree_xlo
24.09.2019 3:13, Peter Geoghegan wrote:
On Wed, Sep 18, 2019 at 7:25 PM Peter Geoghegan wrote:
I attach version 16. This revision merges your recent work on WAL
logging with my recent work on simplifying _bt_dedup_one_page(). See
my e-mail from earlier today for details.
I attach version 17. T
On Mon, Sep 23, 2019 at 5:13 PM Peter Geoghegan wrote:
> I attach version 17.
I attach a patch that applies on top of v17. It adds support for
deduplication within unique indexes. Actually, this is a snippet of
code that appeared in my prototype from August 5 (we need very little
extra code for t
On Wed, Sep 18, 2019 at 10:43 AM Peter Geoghegan wrote:
> This also suggests that making _bt_dedup_one_page() do raw page adds
> and page deletes to the page in shared_buffers (i.e. don't use a temp
> buffer page) could pay off. As I went into at the start of this
> e-mail, unnecessarily doing exp
On Tue, Sep 17, 2019 at 9:43 AM Anastasia Lubennikova
wrote:
> > 3. Third, there is incremental writing of the page itself -- avoiding
> > using a temp buffer. Not sure where I stand on this.
>
> I think it's a good idea. memmove must be much faster than copying
> items tuple by tuple.
> I'll sen
16.09.2019 21:58, Peter Geoghegan wrote:
On Mon, Sep 16, 2019 at 8:48 AM Anastasia Lubennikova
wrote:
I tested patch with nbtree_wal_test, and found out that the real issue is
not the dedup WAL records themselves, but the full page writes that they
trigger.
Here are test results (config is sta
On Mon, Sep 16, 2019 at 8:48 AM Anastasia Lubennikova
wrote:
> Attached is v14 based on v12 (v13 changes are not merged).
>
> In this version, I fixed the bug you mentioned and also fixed nbtinsert,
> so that it doesn't save newposting in xlog record anymore.
Cool.
> I tested patch with nbtree_w
13.09.2019 4:04, Peter Geoghegan wrote:
On Wed, Sep 11, 2019 at 2:04 PM Peter Geoghegan wrote:
I think that the new WAL record has to be created once per posting
list that is generated, not once per page that is deduplicated --
that's the only way that I can see that avoids a huge increase in
t
On Wed, Sep 11, 2019 at 3:09 PM Peter Geoghegan wrote:
> Hmm. So v12 seems to have some problems with the WAL logging for
> posting list splits. With wal_debug = on and
> wal_consistency_checking='all', I can get a replica to fail
> consistency checking very quickly when "make installcheck" is run
On Wed, Sep 11, 2019 at 5:38 AM Anastasia Lubennikova
wrote:
> Attached is v12, which contains WAL optimizations for posting split and
> page
> deduplication.
Hmm. So v12 seems to have some problems with the WAL logging for
posting list splits. With wal_debug = on and
wal_consistency_checking='al
On Wed, Sep 11, 2019 at 5:38 AM Anastasia Lubennikova
wrote:
> I reviewed them and everything looks good except the idea of not
> splitting dead posting tuples.
> According to comments to scan->ignore_killed_tuples in genam.c:107,
> it may lead to incorrect tuple order on a replica.
> I'm not sure
09.09.2019 22:54, Peter Geoghegan wrote:
Attached is v11, which makes the kill_prior_tuple optimization work
with posting list tuples. The only catch is that it can only work when
all "logical tuples" within a posting list are known-dead, since of
course there is only one LP_DEAD bit available f
On Thu, Aug 29, 2019 at 10:10 PM Peter Geoghegan wrote:
> I see some Valgrind errors on v9, all of which look like the following
> two sample errors I go into below.
I've found a fix for these Valgrind issues. It's a matter of making
sure that _bt_truncate() sizes new pivot tuples properly, which
On Thu, Aug 29, 2019 at 5:07 PM Peter Geoghegan wrote:
> I agree that v9 might be ever so slightly more space efficient than v5
> was, on balance.
I see some Valgrind errors on v9, all of which look like the following
two sample errors I go into below.
First one:
==11193== VALGRINDERROR-BEGIN
=
On Thu, Aug 29, 2019 at 5:13 AM Anastasia Lubennikova
wrote:
> Your explanation helped me to understand that this approach can be
> extended to
> the case of insertion into posting list, that doesn't trigger posting
> split,
> and that nbtsplitloc indeed doesn't need to know about posting tuples
>
28.08.2019 6:19, Peter Geoghegan wrote:
On Fri, Aug 16, 2019 at 8:56 AM Anastasia Lubennikova
wrote:
Now the algorithm is the following:
- In case page split is needed, pass both tuples to _bt_split().
_bt_findsplitloc() is now aware of upcoming replacement of origtup with
neworigtup, so it
On Fri, Aug 16, 2019 at 8:56 AM Anastasia Lubennikova
wrote:
> Now the algorithm is the following:
> - In case page split is needed, pass both tuples to _bt_split().
> _bt_findsplitloc() is now aware of upcoming replacement of origtup with
> neworigtup, so it uses correct item size where needed.
23.08.2019 7:33, Peter Geoghegan wrote:
On Wed, Aug 21, 2019 at 10:19 AM Anastasia Lubennikova
wrote:
I'm going to look through the patch once more to update nbtxlog
comments, where needed and
answer to your remarks that are still left in the comments.
Have you been using amcheck's rootdescend
On Wed, Aug 21, 2019 at 10:19 AM Anastasia Lubennikova
wrote:
> I'm going to look through the patch once more to update nbtxlog
> comments, where needed and
> answer to your remarks that are still left in the comments.
Have you been using amcheck's rootdescend verification? I see this
problem wit
20.08.2019 4:04, Peter Geoghegan wrote:
On Fri, Aug 16, 2019 at 8:56 AM Anastasia Lubennikova
wrote:
It seems that now all replace operations are crash-safe. The new patch passes
all regression tests, so I think it's ready for review again.
I'm looking at it now. I'm going to spend a signific
On Fri, Aug 16, 2019 at 8:56 AM Anastasia Lubennikova
wrote:
> Now the algorithm is the following:
>
> - If bt_findinsertloc() found out that tuple belongs to existing posting
> tuple's
> TID interval, it sets 'in_posting_offset' variable and passes it to
> _bt_insertonpg()
>
> - If 'in_posting_o
On Tue, Aug 13, 2019 at 8:45 AM Anastasia Lubennikova
wrote:
> > I still need to think about the exact details of alignment within
> > _bt_insertonpg_in_posting(). I'm worried about boundary cases there. I
> > could be wrong.
> Could you explain more about these cases?
> Now I don't understand the
13.08.2019 18:45, Anastasia Lubennikova wrote:
I also added a nearby FIXME comment to
_bt_insertonpg_in_posting() -- I don't think think that the code for
splitting a posting list in two is currently crash-safe.
Good catch. It seems that I need to rearrange the code.
I'll send updated patch
06.08.2019 4:28, Peter Geoghegan wrote:
Attached is v5, which is based on your v4. The three main differences
between this and v4 are:
* Removed BT_COMPRESS_THRESHOLD stuff, for the reasons explained in my
July 24 e-mail. We can always add something like this back during
performance validation o
On Wed, Jul 31, 2019 at 9:23 AM Anastasia Lubennikova
wrote:
> > * Included my own pageinspect hack to visualize the minimum TIDs in
> > posting lists. It's broken out into a separate patch file. The code is
> > very rough, but it might help someone else, so I thought I'd include
> > it.
> Cool, I
24.07.2019 4:22, Peter Geoghegan wrote:
Attached is a revised version of your v2 that fixes this issue -- I'll
call this v3. In general, my goal for the revision was to make sure
that all of my old tests from the v12 work passed, and to make sure
that amcheck can detect almost any possible probl
On Thu, 25 Jul 2019 at 05:49, Peter Geoghegan wrote:
>
> On Wed, Jul 24, 2019 at 3:06 PM Peter Geoghegan wrote:
> > There seems to be a kind of "synergy" between the nbtsplitloc.c
> > handling of pages that have lots of duplicates and posting list
> > compression. It seems as if the former mechan
On Wed, Jul 24, 2019 at 3:06 PM Peter Geoghegan wrote:
> There seems to be a kind of "synergy" between the nbtsplitloc.c
> handling of pages that have lots of duplicates and posting list
> compression. It seems as if the former mechanism "sets up the bowling
> pins", while the latter mechanism "kn
On Tue, Jul 23, 2019 at 6:22 PM Peter Geoghegan wrote:
> Attached is a revised version of your v2 that fixes this issue -- I'll
> call this v3.
Remember that index that I said was 5.5x smaller with the patch
applied, following retail insertions (a single big INSERT ... SELECT
...)? Well, it's 6.5
On Fri, Jul 19, 2019 at 7:24 PM Peter Geoghegan wrote:
> Hmm. So, the attached test case fails amcheck verification for me with
> the latest version of the patch:
Attached is a revised version of your v2 that fixes this issue -- I'll
call this v3. In general, my goal for the revision was to make
On Fri, Jul 19, 2019 at 12:32 PM Peter Geoghegan wrote:
> On Fri, Jul 19, 2019 at 10:53 AM Anastasia Lubennikova
> wrote:
> > Patch 0002 (must be applied on top of 0001) implements preserving of
> > correct TID order
> > inside posting list when inserting new tuples.
> > This version passes all r
On Fri, Jul 19, 2019 at 10:53 AM Anastasia Lubennikova
wrote:
> Patch 0002 (must be applied on top of 0001) implements preserving of
> correct TID order
> inside posting list when inserting new tuples.
> This version passes all regression tests including amcheck test.
> I also used following scrip
17.07.2019 19:36, Anastasia Lubennikova:
There is one major issue left - preserving TID order in posting lists.
For a start, I added a sort into BTreeFormPostingTuple function.
It turned out to be not very helpful, because we cannot check this
invariant lazily.
Now I work on patching _bt_bins
11.07.2019 21:19, Peter Geoghegan wrote:
On Thu, Jul 11, 2019 at 8:34 AM Rafia Sabih wrote:
Hi,
Peter, Rafia, thanks for the review. New version is attached.
+ elog(DEBUG4, "insert_itupprev_to_page. compressState->ntuples %d
IndexTupleSize %zu free %zu",
+ compressState->ntuples, IndexTupleS
On Thu, Jul 11, 2019 at 10:42 AM Peter Geoghegan wrote:
> > I think unique indexes may benefit from deduplication not only because
> > of NULL values. Non-HOT updates produce duplicates of non-NULL values
> > in unique indexes. And those duplicates can take significant space.
>
> I agree that we
On Thu, Jul 11, 2019 at 8:34 AM Rafia Sabih wrote:
> I tried this patch and found the improvements impressive. However,
> when I tried with multi-column indexes it wasn't giving any
> improvement, is it the known limitation of the patch?
It'll only deduplicate full duplicates. It works with multi
On Thu, Jul 11, 2019 at 8:09 AM Alexander Korotkov
wrote:
> BTW, I think deduplication could cause some small performance
> degradation in some particular cases, because page-level locks became
> more coarse grained once pages hold more tuples. However, this
> doesn't seem like something we shoul
On Thu, Jul 11, 2019 at 8:02 AM Alexander Korotkov
wrote:
> Could you please elaborate more on preserving the logical contents? I
> can understand it as following: "B-Tree should have the same structure
> and invariants as if each TID in posting list be a separate tuple".
That's exactly what I m
On Thu, Jul 11, 2019 at 7:30 AM Bruce Momjian wrote:
> Wow, I never thought of that. The only things I know we lock until
> transaction end are rows we update (against concurrent updates), and
> additions to unique indexes. By definition, indexes with many
> duplicates are not unique, so that do
On Sun, 7 Jul 2019 at 01:08, Peter Geoghegan wrote:
> * Maybe we could do compression with unique indexes when inserting
> values with NULLs? Note that we now treat an insertion of a tuple with
+1
I tried this patch and found the improvements impressive. However,
when I tried with multi-column i
On Thu, Jul 11, 2019 at 7:53 AM Peter Geoghegan wrote:
> Anyway, I think that *hundreds* or even *thousands* of rows are
> effectively locked all at once when a bitmap index needs to be updated
> in these other systems -- and I mean a heavyweight lock that lasts
> until the xact commits or aborts,
Hi Peter,
Thank you very much for your attention to this patch. Let me comment
some points of your message.
On Sun, Jul 7, 2019 at 2:09 AM Peter Geoghegan wrote:
> On Thu, Jul 4, 2019 at 5:06 AM Anastasia Lubennikova
> wrote:
> > The new version of the patch is attached.
> > This version is eve
On Wed, Jul 10, 2019 at 09:53:04PM -0700, Peter Geoghegan wrote:
> Anyway, I think that *hundreds* or even *thousands* of rows are
> effectively locked all at once when a bitmap index needs to be updated
> in these other systems -- and I mean a heavyweight lock that lasts
> until the xact commits o
On Sat, Jul 6, 2019 at 4:08 PM Peter Geoghegan wrote:
> I took a closer look at this patch, and have some general thoughts on
> its design, and specific feedback on the implementation.
I have some high level concerns about how the patch might increase
contention, which could make queries slower.
On Thu, Jul 4, 2019 at 5:06 AM Anastasia Lubennikova
wrote:
> The new version of the patch is attached.
> This version is even simpler than the previous one,
> thanks to the recent btree design changes and all the feedback I received.
> I consider it ready for review and testing.
I took a closer
On Thu, Jul 4, 2019 at 05:06:09PM -0700, Peter Geoghegan wrote:
> This result is very impressive. We'll need to revisit what the right
> trade-off is for the compression scheme, which Heikki had some
> thoughts on when we left off 3 years ago, but that should be a lot
> easier now. I am very encou
On Thu, Jul 4, 2019 at 10:38 AM Peter Geoghegan wrote:
> I tried this on my own "UK land registry" test data [1], which was
> originally used for the v12 nbtree work. My test case has a low
> cardinality, multi-column text index. I chose this test case because
> it was convenient for me.
>
> On v1
On Thu, Jul 4, 2019 at 5:06 AM Anastasia Lubennikova
wrote:
> i - number of distinct values in the index.
> So i=1 means that all rows have the same key,
> and i=1000 means that all keys are different.
>
> i / old size (MB) / new size (MB)
> 121588
> 100021590
> 100
The new version of the patch is attached.
This version is even simpler than the previous one,
thanks to the recent btree design changes and all the feedback I received.
I consider it ready for review and testing.
[feature overview]
This patch implements the deduplication of btree non-pivot tuples
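The space savings reported throughout this thread can be reproduced with a small experiment along these lines (a sketch assuming PostgreSQL 13+ with the patch applied; all object names are made up):

```sql
-- Low-cardinality data: many duplicates per key, the best case
-- for B-Tree deduplication.
CREATE TABLE demo (x int);
INSERT INTO demo SELECT g % 10 FROM generate_series(1, 1000000) g;

CREATE INDEX demo_dedup   ON demo (x) WITH (deduplicate_items = on);
CREATE INDEX demo_nodedup ON demo (x) WITH (deduplicate_items = off);

-- The deduplicated index should come out several times smaller.
SELECT pg_size_pretty(pg_relation_size('demo_dedup'))   AS dedup,
       pg_size_pretty(pg_relation_size('demo_nodedup')) AS no_dedup;
```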