Binary encoding of timetz type

2021-03-16 Thread Ruslan
Hi folks,

Apologies if it's the wrong place to ask. I have a question for postgres
developers.

I wrote parser for COPY binary encoding. Everything is great but one thing
worries me. It seems like the time offset field has reversed digit sign.
I'm referring to this function

https://github.com/postgres/postgres/blob/REL_13_STABLE/src/backend/utils/adt/date.c#L2150-L2159

COPY response for value *'13:37:00-0300'::timetz* gives me binary
0x000b69d1d7002a30. And value *'13:37:00+0300'::timetz* is encoded
as 0x000b69d1d700d5d0.

0xd5d0 is -10800 for signed integer and 0x2a30 is 10800.

I reverse the sign in my code but feeling that I'm missing something is
nagging me. Please advise.


Reproducing incorrect order with order by in a subquery

2023-06-13 Thread Ruslan Zakirov
Hello,

For example I have a query:

SELECT main.*, count(*) OVER () FROM (SELECT DISTINCT ... ORDER BY X) main;

So the `ORDER BY` clause ended up in a subquery. Most of the time ordering
works until it doesn't.

Can you help me create a set of test tables with some data to reproduce
this problem more repeatedly? I just want to write a regression test to
make sure it doesn't happen again.

-- 
Best regards, Ruslan.


Re: Reproducing incorrect order with order by in a subquery

2023-06-13 Thread Ruslan Zakirov
On Tue, Jun 13, 2023 at 1:26 PM Thomas Kellerer  wrote:

> Ruslan Zakirov schrieb am 13.06.2023 um 09:49:
> > For example I have a query:
> >
> > SELECT main.*, count(*) OVER () FROM (SELECT DISTINCT ... ORDER BY X)
> > main;
> >
> > So the `ORDER BY` clause ended up in a subquery. Most of the time
> > ordering works until it doesn't.
> >
> > Can you help me create a set of test tables with some data to
> > reproduce this problem more repeatedly? I just want to write a
> > regression test to make sure it doesn't happen again.
> Your final/overall query has no ORDER BY, so Postgres is free to return
> the result in any order it likes.
>
> You will have to add an ORDER BY to the "main" part to get a guaranteed
> sort order
>

I know how to fix the problem and I know that ORDER BY should be in the
outermost select.

However, I want to write a test case that shows that the old code is wrong,
but can not create
minimal set of tables to reproduce it. With this I'm looking for help.

-- 
Best regards, Ruslan.


Re: Reproducing incorrect order with order by in a subquery

2023-06-13 Thread Ruslan Zakirov
On Tue, Jun 13, 2023 at 6:06 PM Tom Lane  wrote:

> Ruslan Zakirov  writes:
> > I know how to fix the problem and I know that ORDER BY should be in the
> > outermost select.
>
> > However, I want to write a test case that shows that the old code is
> wrong,
> > but can not create
> > minimal set of tables to reproduce it. With this I'm looking for help.
>
> The ORDER BY in the sub-select will be honored at the output of the
> sub-select.  To have a different ordering at the final output, you
> need the upper query to do something that would re-order the rows.
> Joining the sub-select to something else might make that happen,
> or you could apply DISTINCT or some other non-trivial processing
> in the upper query.
>
> regards, tom lane
>

Hello Tom,

Thanks for replying. Maybe I'm just wrong in my assumption. A user reports
incorrect order in the following query:

SELECT main.*, COUNT(main.id) OVER() AS search_builder_count_all FROM (
  SELECT DISTINCT main.* FROM Tickets main
  LEFT JOIN Groups Groups_2 ON ( Groups_2.Domain = 'RT::Ticket-Role' ) AND
( Groups_2.Instance = main.id )
  JOIN Queues Queues_1 ON ( Queues_1.id = main.Queue )
  LEFT JOIN CachedGroupMembers CachedGroupMembers_3 ON (
CachedGroupMembers_3.Disabled = '0' ) AND ( CachedGroupMembers_3.MemberId
IN ('38', '837', ... , '987', '58468') ) AND ( CachedGroupMembers_3.GroupId
= Groups_2.id )
WHERE ( ( main.Queue IN ('1', ... , '20') OR (
CachedGroupMembers_3.MemberId IS NOT NULL AND LOWER(Groups_2.Name) IN
('cc', 'requestor') ) OR ( main.Owner = '38' ) )
   AND (main.IsMerged IS NULL)
   AND (main.Status != 'deleted')
   AND (main.Type = 'ticket')
   AND (main.Owner = '6' AND ( ( Queues_1.Lifecycle = 'default' AND
main.Status IN ('new', 'open', 'stalled') ) OR ( Queues_1.Lifecycle =
'approvals' AND main.Status IN ('new', 'open', 'stalled') ) )
) ORDER BY main.Created DESC ) main LIMIT 50

We have an option in our product that makes this query simpler, no joins in
the subquery. The user reports that using this option helps with order.

This is a too complex query to build a test on. Tried simpler scenarios and
failed.

-- 
Best regards, Ruslan.


Re: Reproducing incorrect order with order by in a subquery

2023-06-14 Thread Ruslan Zakirov
On Tue, Jun 13, 2023 at 8:55 PM Ruslan Zakirov 
wrote:

> On Tue, Jun 13, 2023 at 6:06 PM Tom Lane  wrote:
>
>> Ruslan Zakirov  writes:
>> > I know how to fix the problem and I know that ORDER BY should be in the
>> > outermost select.
>>
>> > However, I want to write a test case that shows that the old code is
>> wrong,
>> > but can not create
>> > minimal set of tables to reproduce it. With this I'm looking for help.
>>
>> The ORDER BY in the sub-select will be honored at the output of the
>> sub-select.  To have a different ordering at the final output, you
>> need the upper query to do something that would re-order the rows.
>> Joining the sub-select to something else might make that happen,
>> or you could apply DISTINCT or some other non-trivial processing
>> in the upper query.
>>
>> regards, tom lane
>>
>
> Hello Tom,
>
> Thanks for replying. Maybe I'm just wrong in my assumption. A user reports
> incorrect order in the following query:
>
> SELECT main.*, COUNT(main.id) OVER() AS search_builder_count_all FROM (
>   SELECT DISTINCT main.* FROM Tickets main
>   LEFT JOIN Groups Groups_2 ON ( Groups_2.Domain = 'RT::Ticket-Role' ) AND
> ( Groups_2.Instance = main.id )
>   JOIN Queues Queues_1 ON ( Queues_1.id = main.Queue )
>   LEFT JOIN CachedGroupMembers CachedGroupMembers_3 ON (
> CachedGroupMembers_3.Disabled = '0' ) AND ( CachedGroupMembers_3.MemberId
> IN ('38', '837', ... , '987', '58468') ) AND ( CachedGroupMembers_3.GroupId
> = Groups_2.id )
> WHERE ( ( main.Queue IN ('1', ... , '20') OR (
> CachedGroupMembers_3.MemberId IS NOT NULL AND LOWER(Groups_2.Name) IN
> ('cc', 'requestor') ) OR ( main.Owner = '38' ) )
>AND (main.IsMerged IS NULL)
>AND (main.Status != 'deleted')
>AND (main.Type = 'ticket')
>AND (main.Owner = '6' AND ( ( Queues_1.Lifecycle = 'default' AND
> main.Status IN ('new', 'open', 'stalled') ) OR ( Queues_1.Lifecycle =
> 'approvals' AND main.Status IN ('new', 'open', 'stalled') ) )
> ) ORDER BY main.Created DESC ) main LIMIT 50
>
> We have an option in our product that makes this query simpler, no joins
> in the subquery. The user reports that using this option helps with order.
>
> This is a too complex query to build a test on. Tried simpler scenarios
> and failed.
>

Hello,

First of all I want to apologize. We work with multiple RDBMS systems. This
particular user is using mysql. So it's not clear if it works ok or not on
Pg.

Anyway, yesterday I tried my simplified case on Pg latest, Pg 11 and on
mysql latest. Had no luck. Either my test case is too simple or I can not
find the correct distribution of data between two tables.

Spent too much time on this. Going to work on the query builder and move
the "order by" clause out of the subquery. Just to be on the safe side.
Most probably it will fix the issue for the user.


Re: Reproducing incorrect order with order by in a subquery

2023-06-14 Thread Ruslan Zakirov
On Wed, Jun 14, 2023 at 3:50 PM Karsten Hilbert 
wrote:

> IOW neither is help to be expected on this list nor can any
> testing (on PG) help with anything to be expected on MySQL ?
>

Don't expect any help on mysql part.

As to the question: since the outer query does not have an
> ORDER BY it can return results in any order INCLUDING the one
> produced by the subquery. Which renders impossible any

*proving* that it can return rows in orders different from
>

Well, in *theory* it can return rows in a different order that doesn't
match the order produced
by the subquery. As far as I know no RDBMS state in its documentation that
ordering between
subquery and its outer query is preserved. Some explicitly state the
opposite:

https://mariadb.com/kb/en/why-is-order-by-in-a-from-subquery-ignored/

MS SQL server:

"The ORDER BY clause is not valid in views, inline functions, derived
tables, and subqueries, unless either the TOP or OFFSET and FETCH clauses
are also specified. When ORDER BY is used in these objects, the clause is
used only to determine the rows returned by the TOP clause or OFFSET and
FETCH clauses. The ORDER BY clause does not guarantee ordered results when
these constructs are queried, unless ORDER BY is also specified in the
query itself."

My goal was to find a small dataset that demonstrates this ordering
mismatch.

Failed to reach my goal. Ended up with a code change with a lot of
explanations,
comments and links to documentation. No prove in tests that the old code
was wrong
and a new one fixes it.


> the subquery *unless* one forces a different order on the
> outer query. Which in turn would defeat the purpose as then
> the outer query *does* have an explicit ordering...
>

-- 
Best regards, Ruslan.


Re: Helping planner to chose sequential scan when it improves performance

2023-06-14 Thread Ruslan Zakirov
On Tue, Jun 13, 2023 at 10:28 PM Patrick O'Toole 
wrote:

> Hi all,
>



> Questions:
>
>1. In Plan A, what factors are causing the planner to select a
>substantially slower plan despite having recent stats about number of rows?
>
> Estimated overall cost. For Plan A it is ~200k. For plans B/C (haven't
noticed any differences in these two) it is ~250k. The planner uses a less
expensive plan.

Also, in the plans you can see that Pg estimates the number of rows
correctly.

>
>1. Is there a substantial difference between the on-the-fly hash done
>in Plan B and Plan C compared to the hash-index used in Plan A? Can I
>assume they are essentially the same? Perhaps there are there differences
>in how they're applied?
>
> I don't see any difference in plans B and C, but you report timing
changes. To me this looks like just a fluctuation in measurements. So I
wouldn't trust any measurements for plan A either.

I'm not a big expert, but can not say that plan A and B are essentially the
same.

Plan A: DB scans item_text table and for every record looks into the index
of conversation_item table, then looks into the table itself.

Plan B/C: DB scans conversation_item table without looking into its indexes
building a hash table on the fly.



>1. Is it common to see values for random_page_cost set as high as 8.0?
>We would of course need to investigate whether we see a net positive or net
>negative impact on other queries, to adopt this as a general setting, but
>is it a proposal we should actually consider?
>
> No idea.

>
>1. Maybe we are barking up the wrong tree with the previous questions.
>Are there other configuration parameters we should consider first to
>improve performance in situations like the one illustrated?
>
> Recheck your numbers.

>
>1. Are there other problems with our schema, query, or plans shown
>here? Other approaches (or tools/analyses) we should consider?
>
> You can try the following index:

CREATE INDEX conversation_item_ruz1 ON conversation_item(item_uuid,
conversation_uuid, tenant_id);

I believe this index would allow Pg to use "index only scan" as variation
of Plan A and avoid touching the conversation_item table completely.

-- 
Best regards, Ruslan.