t that
still doesn't quite capture ICU's more complex definition of word
boundaries.
Or, we could remove those unused functions for now, and figure out if
there's a reason to add them back later. They are probably adding more
confusion than anything.
Regards,
Jeff Davis
From ff
ge.
I tried that in v2-0003, but I think it ended up worse. Most
pg_wc_xyz() functions don't care if it's the default collation or not,
so there are a lot of duplicate cases.
The previous approach is still there as v2-0002.
Regards,
Jeff Davis
From 9724181f715ce3468e9342763fad
) = 'I';
 ?column? | ?column? | ?column? | ?column?
----------+----------+----------+----------
 t        | t        | f        | f
That behavior goes back a long way, so I'm not suggesting that we
change it.
Regards,
Jeff Davis
From e8a68f42f5802d138ba04043b25b7d
On Wed, 2025-04-02 at 17:58 +0530, Shlok Kyal wrote:
> I reviewed the patch and I have a comment:
Thank you and vignesh for the feedback. This patch didn't quite make it
for v18, but I will address it for the next CF.
Regards,
Jeff Davis
On Wed, 2025-03-19 at 15:17 -0700, Jeff Davis wrote:
> On Sat, 2025-03-15 at 21:37 -0400, Corey Huinker wrote:
> > > 0001 - no changes, but the longer I go the more I'm certain this
> > > is
> > > something we want to do.
>
> This replaces regclassin
to
fetch the next batch), and have a single static variable that points to
that.
Also in 0003, the "next_te" variable is a bit confusing, because it's
actually the last TocEntry, until it's advanced to point to the current
one.
Other than that, looks good to me.
Regards,
Jeff Davis
der parallelism, which might
defeat the batching work that we're trying to do.
Regards,
Jeff Davis
uld
use the same $src_dump for both restoration and comparison, but it
looks like you wanted coverage of the --create option. (Aside: why
parallel restore there? Is that just for test coverage or was there a
performance reason?)
Regards,
Jeff Davis
by
> a
> previous call). Does that sound like a strong enough check?
Again, I'd just be practical here and do the check if it feels natural,
and if not, improve the comments so that someone modifying the code
would know where to look.
Regards,
Jeff Davis
"? Isn't
that already implied by "JOIN unnest($1, $2) ... s.tablename =
u.tablename"?
Regards,
Jeff Davis
ke it in, or
waiting for beta reports, may yield some new information that could
change minds.
Mid-beta might be too long, but let's wait for the final CF to settle
and give people the chance to respond to a top-level thread?
Regards,
Jeff Davis
make the decision now for some reason?
Regards,
Jeff Davis
ore and after dumps, and if the
"before" version is 17, then it will not have the relallfrozen argument
to pg_restore_relation_stats. We might need a filtering step in
adjust_new_dumpfile?
Attached new v11j-0001
Regards,
Jeff Davis
From 154b8b5c10ec330c26ccd9006c434a7db1feef04
to unblock
your work.
Regards,
Jeff Davis
suite for me.
Are you saying that the tests don't work for you even when v2j-0003 is
applied? Or are you saying that your tests are failing on master, and
that v2j-0002 should be committed?
Regards,
Jeff Davis
From 6fc3b98dc9a2589b9943e075b492b4c31044c14e Mon Sep 17 00:00:00 2001
Fro
e can wait until beta to see what kinds of
problems people encounter.
Regards,
Jeff Davis
On Sat, 2025-03-22 at 09:39 -0700, Jeff Davis wrote:
> For some reason I'm getting a decline of about 3% in the c.sql test
> that seems to be associated with the accessor functions, even when
> inlined. I'm also not seeing as much benefit from the inlining of the
> MemoryCont
le, you get what you asked for.
> >
>
>
> They *asked for* that because they didn't have the mechanism to say
> "hold the mayo" or "everything except pickles". That's reducing their
> choice, and then blaming them for their choice.
Can we reach a decision here and move forward?
Regards,
Jeff Davis
On Tue, 2025-03-04 at 17:28 -0800, Jeff Davis wrote:
> My results (with wide tables):
>
>                     GROUP BY   EXCEPT
> master:             2151       1732
> entire v8 series:   2054       1740
I'm not sure what I did with the EXCEPT test,
less risky than not updating: if you don't update Unicode,
then the code points could end up in the database treated as
unassigned, and then cause a problem for future updates.
Regards,
Jeff Davis
might* be DDL
happening while I'm trying to do a simple SELECT query. But probably
not, so let's make it the responsibility of DDL to warn others that
it's doing something, rather than the responsibility of the SELECT
query.
Regards,
Jeff Davis
any other multi-lib work (which I am not promising to do) might
slip to PG20, which users will see at the end of 2027. Ugh.
Regards,
Jeff Davis
lems with newly-assigned code
points?
And, if possible, how we might extend this user experience to libc or
ICU updates?
Regards,
Jeff Davis
ExplicitNamespace().
Regards,
Jeff Davis
at need fixing,
and reindex just those few tuples. In theory, it should be possible:
there are a finite number of codepoints that change each Unicode
version, and we can just search for them in the data and fix up derived
structures.
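For illustration only, here is a minimal sketch of the "search for them
in the data" step, written in Python rather than PostgreSQL's C. The set
CHANGED_CODEPOINTS and both function names are hypothetical; a real list
would have to be derived from the Unicode data files for the two
versions, and nothing here reflects an actual patch.

```python
from typing import Iterable, List, Set

# Hypothetical placeholder values, not taken from any real Unicode diff.
CHANGED_CODEPOINTS: Set[int] = {0x0363, 0x1E9E}


def contains_changed_codepoint(value: str, changed: Set[int]) -> bool:
    """Return True if any character of value is in the changed set."""
    return any(ord(ch) in changed for ch in value)


def rows_needing_fixup(rows: Iterable[str], changed: Set[int]) -> List[int]:
    """Return the positions of rows that contain a changed code point."""
    return [i for i, value in enumerate(rows)
            if contains_changed_codepoint(value, changed)]
```

Under these assumptions, only the rows reported by rows_needing_fixup()
would need their index entries or other derived structures rebuilt.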
Regards,
Jeff Davis
an actual problem, etc. If you disagree, I'd like to hear more.
Regards,
Jeff Davis
the concerns raised in this thread,
but I'd like others to understand that what they are asking for is a
lot of work, and that the builtin collation provider solves 99% of it
already. All this effort is to solve that last 1%.
Regards,
Jeff Davis
else does it need?
It's an upgrade-time check rather than a GUC, but it basically seems to
match what you want. See:
https://www.postgresql.org/message-id/16c4e37d4c89e63623b009de9ad6fb90e7456ed8.ca...@j-davis.com
Regards,
Jeff Davis
choice remains to remain on the older one.
What do you think of Tom's argument that waiting to update Unicode is
what creates the problem in the first place?
"by then they might well have instances of the newly-assigned code
points in their database"[1]
Regards,
Jeff Davi
+static const pg_wchar * const casekind_map[NCaseKind] =
Fixed also (except pgindent had a slightly different opinion about
spaces).
Was this a general suggestion, or did you see something in particular
that would make it more optimizable this way?
Regards,
Jeff Davis
n:
U&'\0363' ~ '[[:alpha:]]' COLLATE PG_C_UTF8
from false to true, even though U+0363 is assigned in both Unicode
15.1.0 and 16.0.0. That might plausibly matter, but such cases would be
more obscure than case folding.
Regards,
Jeff Davis
[1] https://commitfest.postgresql.org/patch/4876/
e's no "withdrawn
-- duplicate", so it might send the wrong message.
Regards,
Jeff Davis
ng about ways we can express the right dependencies,
and I may be making some proposals along those lines.
Regards,
Jeff Davis
On Sat, Mar 15, 2025 at 1:11 PM Tom Lane wrote:
> Jeff Davis writes:
> > Committed. Thank you!
>
> crake doesn't like your perl style:
>
> ./src/common/unicode/generate-unicode_case_table.pl: Loop iterator is not
> lexical at line 638, column 2. See page 108 o
e possible to
simplify branch() a bit, but I'm fine with the way it's done.
When looking around, I didn't find a lot of material discussing this
generated-branches approach. While it's mentioned a few places, I
couldn't even find a consistent name for it. If you know o
that it's
> faster
> than ICU?
It doesn't break primary keys.
Also, it's stable within a major version, we can document and test its
behavior, it solves 99% of the upgrade problem, and the problems that
remain are much more manageable.
And yes, collation is way, way faster than ICU.
Regards,
Jeff Davis
On Sat, 2025-03-15 at 12:15 -0400, Tom Lane wrote:
> On the other hand, if we keep up with the Joneses by updating the
> Unicode data, we can hopefully put those behavioral changes into
> effect *before* they'd affect any real data.
That's a good point.
Regards,
Jeff Davis
On Fri, 2025-03-14 at 13:16 +0200, Heikki Linnakangas wrote:
> Attached are fixes for those and some other minor things.
Thank you, I agree and I have applied your changes.
Regards,
Jeff Davis
mposition table remains the same, getting used for the binary
search in the frontend code, where we care more about the size of the
libraries like libpq over performance..."
>
Regards,
Jeff Davis
From ed4d2803aa32add7c05726286b94e78e49bb1257 Mon Sep 17 00:00:00 2001
From: Jeff Davis
Date
lines.
Did you collect performance results for 0004?
Regards,
Jeff Davis
sier than I expected, so I'm open to other suggestions.
The reason is that materialized view data is also pushed to
RESTORE_PASS_POST_ACL, so we need to do the same for the statistics
(otherwise the dependency is just ignored).
Regards,
Jeff Davis
From 4e84889cb890e5e89191e6ca8d1
dency on the relation and a boundary dependency on
the postDataBound (unless it's an index, or an MV that got pushed to
SECTION_POST_DATA).
I suspect what we need here is a dependency on the MV *data*, because
that's doing a heap swap, which resets the stats. Looking into it.
Regards,
Jeff Davis
e fixes the cross-version-upgrade
> failure in local testing, and pushed it.
Ah, thank you.
Regards,
Jeff Davis
ema-only and --data-only
* --include overrides any default
is that right?
Thoughts on how we should document when/how to use --section vs
--include? Granted, that might be a point of confusion regardless of the
options we offer.
Regards,
Jeff Davis
ng any
exclusions.
But I agree the previous code was hard to read in one place, and
redundant in another, so I will commit a fixup.
Regards,
Jeff Davis
clude=data,statistics <=> --data-only --statistics
--include=schema,data <=> --no-statistics
Not sure which approach is better.
Regards,
Jeff Davis
OWER(t) < U&'\')
RETURNING *
) INSERT INTO tpart SELECT * FROM d;
COMMIT;
The order of operations should be to fix indexes, unique constraints,
and check constraints first; and then to fix partitioned tables. That
way the above partitioned table queries get correc
s mean include statistics also or statistics only? Can
you explicitly request that data be included but rely on the default
for statistics? What options would it override or conflict with?
Regards,
Jeff Davis
> Could we, instead of having --with-$foo, just use --$foo?
That creates a conflict with the existing --schema option, which is a
namespace filter.
Another idea: we could use --definitions/--data/--statistics.
Regards,
Jeff Davis
clined to commit it after I verify
that it improves performance.
Regards,
Jeff Davis
[1]
https://www.postgresql.org/message-id/714295.1741286...@sss.pgh.pa.us
[2] https://www.postgresql.org/message-id/716907.1741288...@sss.pgh.pa.us
schema|data|stats} and
--{schema|data|stats}-only
I suggest we adjust the options now with something resembling the
attached patch and decide on changing the default sometime during beta.
Regards,
Jeff Davis
From c47fc9e570ddd083097f4bfc708465cf644f48c2 Mon Sep 17 00:00:00 2001
From: Jeff
ould it be appropriate to create a temp table? I wouldn't normally
expect pg_dump to create temp tables, but I can't think of a major
reason not to.
If not, did you have in mind a CTE with a large VALUES expression, or
just a giant IN() list?
Regards,
Jeff Davis
ents we can make to memory usage
and performance, and take it from there.
Regards,
Jeff Davis
tovacuum=off might be ok. It's not documented either way, so
> we could change that behaviour later if we find it troublesome.
Sounds good. I will commit something like the v2 patch then soon, and
if we need a different condition we can change it later.
Regards,
Jeff Davis
There can be false positives, because even if such an expression index
exists, it's often not an actual problem. Do we want to stop an upgrade
from happening in that case? I doubt it, but if so, we'd need some kind
of option to bypass it.
Regards,
Jeff Davis
col, except that Execute doesn't cause ReadyForQuery or
RowDescription to be issued."
Each result set needs a RowDescription, so I think you're right that it
breaks the extended protocol. I missed that the first time.
Regards,
Jeff Davis
ipeline.
Here the bits aren't changing, so we're only talking about mask-and-
test, right? My intuition is that wouldn't cause much of a problem.
Regards,
Jeff Davis
ows
> back to the client, and then writes out the plan to some local
> memory.
That's another idea, but I am starting to think returning two result
sets from EXPLAIN ANALYZE would be generally useful.
Regards,
Jeff Davis
[1]
https://www.postgresql.org/message-id/CA%
g we're doing in particular?
Even if we don't have answers, it might be worth adding a brief comment
that we empirically determined that booleans are faster than bitfields
or flags. In the future, maybe compilers mostly get this right, and we
want to change to bitfields.
Regards,
Jeff Davis
e of the feature so that I can
stop accidentally pushing it in some direction by asking questions
about out-of-scope use cases.
Regards,
Jeff Davis
at, even if there is a collision,
and it happened to work, it's as likely to be a feature as a bug.
I didn't look into the technical details to see what might be required
to allow that kind of collaboration, and I am not suggesting you
redesign the entire feature around that idea.
Regards,
Jeff Davis
onal size and have heap_form_minimal_tuple() allocate that
> much extra memory?
I assume we wouldn't want to actually add a field to TupleDescData,
right?
When I reworked the ExecCopySlotMinimalTupleExtra() API to place the
extra memory before the tuple, it worked out to be a bit clean
from extension B?
Maybe we should just allow multiple extensions to use the same option
name?
Regards,
Jeff Davis
conclusion? Is
it worth adding a comment about why we use independent booleans, even
if we don't have a complete answer?
Regards,
Jeff Davis
uld also stay in SECTION_DATA. But then we have a mess, so
we might as well just put all stats in SECTION_POST_DATA.
But I don't see it as a dependency problem. When I look at the above
SQL, it reads nicely to me and there's no obvious problem with it.
If we want stats to be stable, we need some kind of mode to tell the
server not to apply these kinds of helpful optimizations, otherwise the
issue will resurface in some form no matter what we do with pg_dump. We
could invent a new mode, but autovacuum=off seems close enough to me.
Regards,
Jeff Davis
detail.
Regards,
Jeff Davis
alls with their multi-line formats
> (not 100% sure we don't have that anywhere, as things like "SELECT
> setval" and "SELECT set_config" are single line, but there may be
> existing things)
That's an interesting point. What tools are currently trying to parse
pg_dump output?
Regards,
Jeff Davis
On Fri, 2025-02-28 at 14:56 -0600, Nathan Bossart wrote:
> On Fri, Feb 28, 2025 at 12:54:03PM -0800, Jeff Davis wrote:
> > (Aside: I assume everyone here agrees that pg_upgrade should
> > transfer
> > the stats by default.)
>
> That feels like a safe assumption to me..
On Mon, 2024-12-16 at 20:05 -0800, Jeff Davis wrote:
> On Wed, 2024-10-30 at 08:08 -0700, Jeff Davis wrote:
>
Rebased v14.
The approach has changed multiple times. It started off with more
in-core code, but in response to review feedback, has become more
decoupled from core and more coup
and restored database - which may have other effects like
> change in plans.
Then let's just address that concern directly: disable updating stats
implicitly if autovacuum is off. If autovacuum is on, the user
shouldn't have an expectation of stable stats anyway. Patch attached.
o we will need to weigh the costs and benefits.
Unless there's a consensus to change it, I'm inclined to keep it the
default at least into beta, so that we can get feedback from users and
make a more informed decision.
(Aside: I assume everyone here agrees that pg_upgrade should transfer
o dump first with
--no-statistics, and then with --statistics-only, and restore the two
SQL files in order.
Alternatively, we could put stats into SECTION_POST_DATA, which was
already discussed[*], and we decided against it (though there was not a
clear consensus).
Regards,
Jeff Davis
*:
https:/
rovide some nice benefits, but would
introduce this behavior change, which makes it slightly more than a
refactoring.
It sounds like the behavior change would be desirable or at least
neutral. I will have to try it out and see if the refactoring is a net
improvement or turns into a mess.
Regards,
Jeff Davis
omething similar for hash_mem_multiplier, too.
Regards,
Jeff Davis
anything as complex or
specific as ExecAssignWorkMem(). If we just add it at the time the Path
is created, and then propagate it to the plan with
copy_generic_path_info(), that would be a lot less code. What am I
missing?
Regards,
Jeff Davis
o know that there is a
general consensus that we don't want to use in-place updates for non-
critical things like stats (and perhaps eliminate them entirely). In
other words, the inconsistency likely won't last forever.
Regards,
Jeff Davis
for me) secondarily about churn on pg_class. The bloat was
never terrible.
With that in mind, should we remove the in-place updates from ANALYZE
as well?
Regards,
Jeff Davis
On Wed, 2025-02-26 at 13:06 -0500, Tom Lane wrote:
> Jeff Davis writes:
> > I ran a quick measurement and it appears within the noise of the
> > numbers I posted here:
> > https://www.postgresql.org/message-id/6af48508a32499a8be3398cafffd29fb6188c44b.ca...@j-davis.com
>
measurement and it appears within the noise of the
numbers I posted here:
https://www.postgresql.org/message-id/6af48508a32499a8be3398cafffd29fb6188c44b.ca...@j-davis.com
Regards,
Jeff Davis
rds,
Jeff Davis
em.
Regards,
Jeff Davis
v3j-0002:1.7s
I plan to commit the patches soon.
Regards,
Jeff Davis
From d617fb142158e0ca964e5bc8bb3351d993de6062 Mon Sep 17 00:00:00 2001
From: Corey Huinker
Date: Fri, 21 Feb 2025 23:31:04 -0500
Subject: [PATCH v3j 1/2] Avoid unnecessary relation stats query in pg_dump.
T
(parts 1 and 2)
> * scalars can't have elem_count_histogram
> * cannot set most_common_elems for range type
Done.
And committed.
Regards,
Jeff Davis
do so. The docs and tests required substantial
rework, but I think it's for the better now that we aren't trying to do
in-place updates.
Regards,
Jeff Davis
From ea413ee48b10299530bafc3102395285b5ea8ce3 Mon Sep 17 00:00:00 2001
From: Jeff Davis
Date: Mon, 24 Feb 2025 17:24:05 -08
king rules still apply when updating pg_class or
pg_database even if the current caller is not performing an in-place
update. It might be better to point instead to
check_lock_if_inplace_updateable_rel()?
Regards,
Jeff Davis
ct a Plan node, and then reference the
plan_work_mem instead of the GUC directly.
Can you give a bit more context about why we need so many changes,
including test changes?
Regards,
Jeff Davis
e that code.
We should probably add a comment somewhere, though.
Regards,
Jeff Davis
's doing the
conversions to/from float4?
Regards,
Jeff Davis
what ANALYZE does), then I'm OK removing it. Which makes
me wonder why ANALYZE does it with inplace updates?
Regards,
Jeff Davis
think those are major concerns for v1, so in principle I'm fine
removing it. But the problem is that it affects the documented
semantics, so it would be hard to change later, and we'd be stuck with
the bloating behavior forever.
Regards,
Jeff Davis
On Mon, 2025-02-24 at 15:03 -0500, Tom Lane wrote:
> Jeff Davis writes:
> > But you have a point in that float4in() does slightly more work
> > than
> > strtof() to handle platform differences about NaN/Inf. I'm not sure
> > how
> > much to weigh that conce
ng (a) why we're using
32 bytes to store something where the natural representation is 4
bytes; and (b) whether that memory adds up to anything worth worrying
about. I'm sure we could analyze that and write an explanatory comment,
but that has non-zero cognitive overhead, as well.
Regards,
Jeff Davis
working reasonably
well for cases where the user might be doing something more interesting
than a binary upgrade, as you point out. But attribute numbers for
indexes seem much more reliable: an index with a different attribute
order is a fundamentally different index.
Regards,
Jeff Davis
just have the pg_set variants set elevel=ERROR and
inplace=false, and otherwise be identical to the pg_restore variants?
Regards,
Jeff Davis
ernal view that exposes only starelid/staattnum, and
pg_stats could just be a simple join on top of that?
There's another annoyance, which is that pg_stats doesn't expose any
custom stakinds, so we lose those, but I'm not sure if that's worth
trying to fix.
Regards,
Jeff Davis
e place.
> Conversions to and from other data types introduce the possibility,
> though very remote, of the converted-and-then-unconverted value being
> cosmetically different from what we got from the server, and if down
> the road we're dealing with more complex data types, those
> conversions might become significant.
That's a good point but let's avoid excessive redundancy in the
structures. Adding a few fields to RelStatsInfo should be enough.
Regards,
Jeff Davis
but it
wasn't helping much in that function anyway, because it was no longer a
loop.
I didn't measure any performance difference between your version and
mine, but avoiding a few allocations couldn't hurt. It seems to save
just under 20% on an unoptimized buil
t
> be updated.
Changed.
Regards,
Jeff Davis
On Wed, 2025-02-12 at 19:00 -0800, Jeff Davis wrote:
> I'm still reviewing v48, but I intend to commit something soon.
Committed with some revisions on top of v48:
* removed the short option -X, leaving the long option
"--statistics-only" with the same meaning.
* removed th
On Wed, 2025-02-19 at 01:54 +0300, Alexander Borisov wrote:
> In proposing the patch for v3, I struck a balance between improving
> performance and reducing binary size, without sacrificing code
> clarity.
Fair enough. I will continue reviewing v3.
Regards,
Jeff Davis