Binary encoding of timetz type
Hi folks, Apologies if it's the wrong place to ask. I have a question for postgres developers. I wrote parser for COPY binary encoding. Everything is great but one thing worries me. It seems like the time offset field has reversed digit sign. I'm referring to this function https://github.com/postgres/postgres/blob/REL_13_STABLE/src/backend/utils/adt/date.c#L2150-L2159 COPY response for value *'13:37:00-0300'::timetz* gives me binary 0x000b69d1d7002a30. And value *'13:37:00+0300'::timetz* is encoded as 0x000b69d1d700d5d0. 0xd5d0 is -10800 for signed integer and 0x2a30 is 10800. I reverse the sign in my code but feeling that I'm missing something is nagging me. Please advise.
Reproducing incorrect order with order by in a subquery
Hello, For example I have a query: SELECT main.*, count(*) OVER () FROM (SELECT DISTINCT ... ORDER BY X) main; So the `ORDER BY` clause ended up in a subquery. Most of the time ordering works until it doesn't. Can you help me create a set of test tables with some data to reproduce this problem more repeatedly? I just want to write a regression test to make sure it doesn't happen again. -- Best regards, Ruslan.
Re: Reproducing incorrect order with order by in a subquery
On Tue, Jun 13, 2023 at 1:26 PM Thomas Kellerer wrote: > Ruslan Zakirov schrieb am 13.06.2023 um 09:49: > > For example I have a query: > > > > SELECT main.*, count(*) OVER () FROM (SELECT DISTINCT ... ORDER BY X) > > main; > > > > So the `ORDER BY` clause ended up in a subquery. Most of the time > > ordering works until it doesn't. > > > > Can you help me create a set of test tables with some data to > > reproduce this problem more repeatedly? I just want to write a > > regression test to make sure it doesn't happen again. > Your final/overall query has no ORDER BY, so Postgres is free to return > the result in any order it likes. > > You will have to add an ORDER BY to the "main" part to get a guaranteed > sort order > I know how to fix the problem and I know that ORDER BY should be in the outermost select. However, I want to write a test case that shows that the old code is wrong, but can not create minimal set of tables to reproduce it. With this I'm looking for help. -- Best regards, Ruslan.
Re: Reproducing incorrect order with order by in a subquery
On Tue, Jun 13, 2023 at 6:06 PM Tom Lane wrote: > Ruslan Zakirov writes: > > I know how to fix the problem and I know that ORDER BY should be in the > > outermost select. > > > However, I want to write a test case that shows that the old code is > wrong, > > but can not create > > minimal set of tables to reproduce it. With this I'm looking for help. > > The ORDER BY in the sub-select will be honored at the output of the > sub-select. To have a different ordering at the final output, you > need the upper query to do something that would re-order the rows. > Joining the sub-select to something else might make that happen, > or you could apply DISTINCT or some other non-trivial processing > in the upper query. > > regards, tom lane > Hello Tom, Thanks for replying. Maybe I'm just wrong in my assumption. A user reports incorrect order in the following query: SELECT main.*, COUNT(main.id) OVER() AS search_builder_count_all FROM ( SELECT DISTINCT main.* FROM Tickets main LEFT JOIN Groups Groups_2 ON ( Groups_2.Domain = 'RT::Ticket-Role' ) AND ( Groups_2.Instance = main.id ) JOIN Queues Queues_1 ON ( Queues_1.id = main.Queue ) LEFT JOIN CachedGroupMembers CachedGroupMembers_3 ON ( CachedGroupMembers_3.Disabled = '0' ) AND ( CachedGroupMembers_3.MemberId IN ('38', '837', ... , '987', '58468') ) AND ( CachedGroupMembers_3.GroupId = Groups_2.id ) WHERE ( ( main.Queue IN ('1', ... , '20') OR ( CachedGroupMembers_3.MemberId IS NOT NULL AND LOWER(Groups_2.Name) IN ('cc', 'requestor') ) OR ( main.Owner = '38' ) ) AND (main.IsMerged IS NULL) AND (main.Status != 'deleted') AND (main.Type = 'ticket') AND (main.Owner = '6' AND ( ( Queues_1.Lifecycle = 'default' AND main.Status IN ('new', 'open', 'stalled') ) OR ( Queues_1.Lifecycle = 'approvals' AND main.Status IN ('new', 'open', 'stalled') ) ) ) ORDER BY main.Created DESC ) main LIMIT 50 We have an option in our product that makes this query simpler, no joins in the subquery. The user reports that using this option helps with order. This is a too complex query to build a test on. Tried simpler scenarios and failed. -- Best regards, Ruslan.
Re: Reproducing incorrect order with order by in a subquery
On Tue, Jun 13, 2023 at 8:55 PM Ruslan Zakirov wrote: > On Tue, Jun 13, 2023 at 6:06 PM Tom Lane wrote: > >> Ruslan Zakirov writes: >> > I know how to fix the problem and I know that ORDER BY should be in the >> > outermost select. >> >> > However, I want to write a test case that shows that the old code is >> wrong, >> > but can not create >> > minimal set of tables to reproduce it. With this I'm looking for help. >> >> The ORDER BY in the sub-select will be honored at the output of the >> sub-select. To have a different ordering at the final output, you >> need the upper query to do something that would re-order the rows. >> Joining the sub-select to something else might make that happen, >> or you could apply DISTINCT or some other non-trivial processing >> in the upper query. >> >> regards, tom lane >> > > Hello Tom, > > Thanks for replying. Maybe I'm just wrong in my assumption. A user reports > incorrect order in the following query: > > SELECT main.*, COUNT(main.id) OVER() AS search_builder_count_all FROM ( > SELECT DISTINCT main.* FROM Tickets main > LEFT JOIN Groups Groups_2 ON ( Groups_2.Domain = 'RT::Ticket-Role' ) AND > ( Groups_2.Instance = main.id ) > JOIN Queues Queues_1 ON ( Queues_1.id = main.Queue ) > LEFT JOIN CachedGroupMembers CachedGroupMembers_3 ON ( > CachedGroupMembers_3.Disabled = '0' ) AND ( CachedGroupMembers_3.MemberId > IN ('38', '837', ... , '987', '58468') ) AND ( CachedGroupMembers_3.GroupId > = Groups_2.id ) > WHERE ( ( main.Queue IN ('1', ... , '20') OR ( > CachedGroupMembers_3.MemberId IS NOT NULL AND LOWER(Groups_2.Name) IN > ('cc', 'requestor') ) OR ( main.Owner = '38' ) ) >AND (main.IsMerged IS NULL) >AND (main.Status != 'deleted') >AND (main.Type = 'ticket') >AND (main.Owner = '6' AND ( ( Queues_1.Lifecycle = 'default' AND > main.Status IN ('new', 'open', 'stalled') ) OR ( Queues_1.Lifecycle = > 'approvals' AND main.Status IN ('new', 'open', 'stalled') ) ) > ) ORDER BY main.Created DESC ) main LIMIT 50 > > We have an option in our product that makes this query simpler, no joins > in the subquery. The user reports that using this option helps with order. > > This is a too complex query to build a test on. Tried simpler scenarios > and failed. > Hello, First of all I want to apologize. We work with multiple RDBMS systems. This particular user is using mysql. So it's not clear if it works ok or not on Pg. Anyway, yesterday I tried my simplified case on Pg latest, Pg 11 and on mysql latest. Had no luck. Either my test case is too simple or I can not find the correct distribution of data between two tables. Spent too much time on this. Going to work on the query builder and move the "order by" clause out of the subquery. Just to be on the safe side. Most probably it will fix the issue for the user.
Re: Reproducing incorrect order with order by in a subquery
On Wed, Jun 14, 2023 at 3:50 PM Karsten Hilbert wrote: > IOW neither is help to be expected on this list nor can any > testing (on PG) help with anything to be expected on MySQL ? > Don't expect any help on mysql part. As to the question: since the outer query does not have an > ORDER BY it can return results in any order INCLUDING the one > produced by the subquery. Which renders impossible any *proving* that it can return rows in orders different from > Well, in *theory* it can return rows in a different order that doesn't match the order produced by the subquery. As far as I know no RDBMS state in its documentation that ordering between subquery and its outer query is preserved. Some explicitly state the opposite: https://mariadb.com/kb/en/why-is-order-by-in-a-from-subquery-ignored/ MS SQL server: "The ORDER BY clause is not valid in views, inline functions, derived tables, and subqueries, unless either the TOP or OFFSET and FETCH clauses are also specified. When ORDER BY is used in these objects, the clause is used only to determine the rows returned by the TOP clause or OFFSET and FETCH clauses. The ORDER BY clause does not guarantee ordered results when these constructs are queried, unless ORDER BY is also specified in the query itself." My goal was to find a small dataset that demonstrates this ordering mismatch. Failed to reach my goal. Ended up with a code change with a lot of explanations, comments and links to documentation. No prove in tests that the old code was wrong and a new one fixes it. > the subquery *unless* one forces a different order on the > outer query. Which in turn would defeat the purpose as then > the outer query *does* have an explicit ordering... > -- Best regards, Ruslan.
Re: Helping planner to chose sequential scan when it improves performance
On Tue, Jun 13, 2023 at 10:28 PM Patrick O'Toole wrote: > Hi all, > > Questions: > >1. In Plan A, what factors are causing the planner to select a >substantially slower plan despite having recent stats about number of rows? > > Estimated overall cost. For Plan A it is ~200k. For plans B/C (haven't noticed any differences in these two) it is ~250k. The planner uses a less expensive plan. Also, in the plans you can see that Pg estimates the number of rows correctly. > >1. Is there a substantial difference between the on-the-fly hash done >in Plan B and Plan C compared to the hash-index used in Plan A? Can I >assume they are essentially the same? Perhaps there are there differences >in how they're applied? > > I don't see any difference in plans B and C, but you report timing changes. To me this looks like just a fluctuation in measurements. So I wouldn't trust any measurements for plan A either. I'm not a big expert, but can not say that plan A and B are essentially the same. Plan A: DB scans item_text table and for every record looks into the index of conversation_item table, then looks into the table itself. Plan B/C: DB scans conversation_item table without looking into its indexes building a hash table on the fly. >1. Is it common to see values for random_page_cost set as high as 8.0? >We would of course need to investigate whether we see a net positive or net >negative impact on other queries, to adopt this as a general setting, but >is it a proposal we should actually consider? > > No idea. > >1. Maybe we are barking up the wrong tree with the previous questions. >Are there other configuration parameters we should consider first to >improve performance in situations like the one illustrated? > > Recheck your numbers. > >1. Are there other problems with our schema, query, or plans shown >here? Other approaches (or tools/analyses) we should consider? > > You can try the following index: CREATE INDEX conversation_item_ruz1 ON conversation_item(item_uuid, conversation_uuid, tenant_id); I believe this index would allow Pg to use "index only scan" as variation of Plan A and avoid touching the conversation_item table completely. -- Best regards, Ruslan.