Re: Index scan is not pushed down to union all subquery

2023-10-09 Thread Lauri Kajan
On Thu, Oct 5, 2023 at 4:25 PM Lauri Kajan  wrote:

> In my dreams the plan would be something like this:
> Nested Loop
>   ->  Index Scan using dealers_pkey on dealers
> Index Cond: (id = ANY ('{54,12,456,315,468}'::integer[]))
>   ->  Append
> ->  Bitmap Heap Scan on bikes
>   Recheck Cond: (dealer_name = dealers.name)
>   Filter: (frame_size = 52)
>   Rows Removed by Filter: 91
>   ->  Bitmap Index Scan on bikes_dealer_name_idx
> Index Cond: (dealer_name = dealers.name)
> ->  Bitmap Heap Scan on cars
>   Recheck Cond: (dealer_name = dealers.name)
>   ->  Bitmap Index Scan on cars_dealer_name_idx
> Index Cond: (dealer_name = dealers.name)
>


OK,  I'm getting pretty close.
With the following query where I select the filtered column in the union
all and add the where clause to the top level query I get exactly the query
plan I want.

EXPLAIN (ANALYZE)
WITH
  targets as (
select 'bike' vehicle, id, dealer_name, frame_size as filter FROM bikes
union all
select 'car' vehicle, id, dealer_name, null as filter FROM cars
-- In the real use case I have here dozens of tables
  )
SELECT
  dealers.name dealer,
  targets.vehicle,
  targets.id
FROM
  dealers
  JOIN targets
ON dealers.name = targets.dealer_name
WHERE
  dealers.id in (54,12,456,887,468)
  and (filter is null or filter = 52)

But! This is not quite usable since it is tedious to write the query when
there are filters in multiple tables and all different columns must be
added to all the subqueries.

Regardless of that this kind of proves that the desired plan is possible to
run with Postgres. So maybe this is just a missing feature in the Optimizer.


-Lauri


RE: Ask about Foreign Table Plug-in on Windows Server.

2023-10-09 Thread Anuwat Sagulmontreechai
Tyvvm, will try


Best Regards,
Anuwat Sagulmontreechai (Max)
Soft Square Group of Companies
Tel. 02-997-2000 ext.2305

-Original Message-
From: Laurenz Albe  
Sent: Friday, October 6, 2023 2:43 AM
To: Anuwat Sagulmontreechai ; 
pgsql-general@lists.postgresql.org
Cc: jakari...@softsquaregroup.com; phasi...@softsquaregroup.com
Subject: Re: Ask about Foreign Table Plug-in on Windows Server.

On Thu, 2023-10-05 at 10:39 +, Anuwat Sagulmontreechai wrote:
> Our customer install PG on Windows Server 2022 and need to use Foreign 
> Table to connect to SQL Server on WS2022 too, so we tried to find the 
> plug-in but unfortunately we found only Linux version. Could you please 
> advise us?

There are no official releases of tds_fdw for Windows.  But in this issue
(https://github.com/tds-fdw/tds_fdw/issues/53) somebody claims to have build 
the extension on Windows.  You could try the same.

Yours,
Laurenz Albe



GIST index and ORDER BY

2023-10-09 Thread Michał Kłeczek
Hi All,

Maybe somebody could help come up with an idea on how to resolve this.

The situation is as follows:

1. We have a huge (10 billion rows) table with customer operations
2. It needs to be searched by several criteria at the same time - one of them 
is text similarity (and we decided to try pg_trgm)
3. For the purposes of this question the important columns are:
- customer account id (customer can have multiple accounts)
- operation date
- operation pk (autonumber)

4. We want to partition the table by range on operation date for easier 
archiving
5. The queries are always in the form of TOP N ordered by operation date DESC
(Ie. latest operations meeting search criteria)

I’ve decided to have a single “universal” index that will cover all search 
criteria: multicolumn GIST with btree_gist extension.
(Multiple indexes don’t work well because selectivity of subsets of columns is 
too low)

To workaround the lack of GIST support for ORDER BY and implement paging (ie. 
TOP n) I use the following trick:

SELECT
…
WHERE
 customer_id = $cust AND
 operation_date = $date AND operation_id < $id
 AND …other criteria
ORDER BY (‘2200-01-01' <-> operation_date), (MAX_BIGINT <-> operation_id)

UNION ALL

SELECT
…
WHERE
 customer_id = $cust AND
 operation_date < $date
 AND …other criteria
ORDER BY (‘2200-01-01' <-> operation_date), (MAX_BIGINT <-> operation_id)

LIMIT $limit


It all works really well for a single partition - it is a straight forward 
index scan with limit.
The problem is with partition pruning based on ORDER BY because
sort criteria are different than partition criteria - so the plan ends up with 
scanning all partitions and merge joining them.

I tried to partition by (‘2200-01-01' <-> operation_date) expression instead 
and it helps with partition pruning for order by
but disables partition pruning based on search criteria.
Artificially adding this expression to search criteria does not help because 
this criteria is not covered by index and triggers
additional filter in index scan.

I also tried to - instead of indxing operation_date - index (‘2200-01-01' <-> 
operation_date). But it actually does not solve the problem
because sort criteria the becomes: ( MAX_INT <-> indexed_expression )
And the issue is actually the same.

The only solution I could come up with is manual sorting by partition ranges 
using a series of UNION ALL:

SELECT * FROM part_100 … ORDER BY ...
UNION ALL
SELECT * FROM part_99 … ORDER BY …
…

Does anyone have any other idea?

The problem seems to be fundamentally caused by the fact that btree_gist 
requires different expressions for searching and ordering.

Possibly worth posting it to pgsql-hackers to find out if there are fundamental 
reasons why GIST does not support ORDER BY
_even_ for btree_gist (where you could implement the above trick internally).


Thanks for help!

—
Michal

Proposal to Compile a 256-Byte Identifier Length Version Alongside the Current 64-Byte Version

2023-10-09 Thread David HJ
Hello PostgreSQL Community,

I am writing to propose an alternative compilation of PostgreSQL that
allows for a 256-byte identifier length limit, alongside the existing
64-byte version.

*Problem:*
The current limit of 63 bytes can be quite restrictive, especially for
databases that use multi-byte character sets like UTF-8. In such cases, a
Chinese character takes up 3 bytes, limiting the name to just 21 characters.

*Use-Case:*
In specific use-cases, table names like
"能源消耗统计_全球主要国家石油与天然气使用量_年度碳排放与可再生能源比例表" and
"气候变化研究_全球主要地区极端天气事件频率_年度灾害损失与应对措施表" can far exceed the current limit.

*Proposed Solution:*
I propose that we offer an alternative compilation of PostgreSQL that
increases the NAMEDATALEN constant to allow for 256-byte identifiers. This
would be particularly useful for databases that make extensive use of
multi-byte character sets like UTF-8.

*Dual Compilation:*
To maintain backward compatibility and to offer more flexibility, I suggest
compiling this 256-byte version alongside the existing 64-byte version.
Users could then choose which version best suits their needs.

*Backward Compatibility:*
I understand that this change could affect backward compatibility. Offering
this as an alternative compilation could be a way to mitigate those
concerns. I am open to suggestions on how this could be implemented to
minimize disruptions.

I look forward to your feedback and hope we can make PostgreSQL even more
adaptable to various needs.

Best regards,
Chuxiong


Re: Proposal to Compile a 256-Byte Identifier Length Version Alongside the Current 64-Byte Version

2023-10-09 Thread Laurenz Albe
On Tue, 2023-10-10 at 11:49 +0800, David HJ wrote:
> I am writing to propose an alternative compilation of PostgreSQL that allows
> for a 256-byte identifier length limit, alongside the existing 64-byte 
> version.
> 
> Problem:
> The current limit of 63 bytes can be quite restrictive, especially for 
> databases
> that use multi-byte character sets like UTF-8. In such cases, a Chinese 
> character
> takes up 3 bytes, limiting the name to just 21 characters.
> 
> Use-Case:
> In specific use-cases, table names like
> "能源消耗统计_全球主要国家石油与天然气使用量_年度碳排放与可再生能源比例表" and
> "气候变化研究_全球主要地区极端天气事件频率_年度灾害损失与应对措施表" can far exceed
> the current limit.

I have little understanding for people who name their tables
energy_consumption_statistics_oil_and_natural_gas_usage_in_major_countries_around_the_world_annual_carbon_emissions_and_renewable_energy_ratio_table.
The table could just as well be named "能源统计排放比".  The exact description could
go in the table comment.  Apart from that, it is a good idea to use table names
that are standard SQL identifiers, so that you don't have to double quote them
all the time.

Anyway, you are not the first person to hit the limit, so there is clearly a
pain that some people feel.

> Proposed Solution:
> I propose that we offer an alternative compilation of PostgreSQL that 
> increases
> the NAMEDATALEN constant to allow for 256-byte identifiers. This would be
> particularly useful for databases that make extensive use of multi-byte 
> character
> sets like UTF-8.

Wouldn't it be a good solution if we promote the #define to a configure option,
like "./configure --identifier-length-limit=256"?

Yours,
Laurenz Albe




Re: Proposal to Compile a 256-Byte Identifier Length Version Alongside the Current 64-Byte Version

2023-10-09 Thread Julien Rouhaud
On Tue, Oct 10, 2023 at 08:22:55AM +0200, Laurenz Albe wrote:
> On Tue, 2023-10-10 at 11:49 +0800, David HJ wrote:
> > I am writing to propose an alternative compilation of PostgreSQL that allows
> > for a 256-byte identifier length limit, alongside the existing 64-byte 
> > version.
> >
> > Problem:
> > The current limit of 63 bytes can be quite restrictive, especially for 
> > databases
> > that use multi-byte character sets like UTF-8. In such cases, a Chinese 
> > character
> > takes up 3 bytes, limiting the name to just 21 characters.
>
> Anyway, you are not the first person to hit the limit, so there is clearly a
> pain that some people feel.
>
> > Proposed Solution:
> > I propose that we offer an alternative compilation of PostgreSQL that 
> > increases
> > the NAMEDATALEN constant to allow for 256-byte identifiers. This would be
> > particularly useful for databases that make extensive use of multi-byte 
> > character
> > sets like UTF-8.
>
> Wouldn't it be a good solution if we promote the #define to a configure 
> option,
> like "./configure --identifier-length-limit=256"?

Note that there was some thread recently [1] where the possibility of having
some kind of compilation matrix to generate multiple set of binaries with
various compile-time values was discussed, so I guess it could fit well with
that approach.

[1]
https://www.postgresql.org/message-id/20230630211153.kbysulcjedxa5...@awork3.anarazel.de
and following messages




Re: Proposal to Compile a 256-Byte Identifier Length Version Alongside the Current 64-Byte Version

2023-10-09 Thread Tatsuo Ishii
> Hello PostgreSQL Community,
> 
> I am writing to propose an alternative compilation of PostgreSQL that
> allows for a 256-byte identifier length limit, alongside the existing
> 64-byte version.
> 
> *Problem:*
> The current limit of 63 bytes can be quite restrictive, especially for
> databases that use multi-byte character sets like UTF-8. In such cases, a
> Chinese character takes up 3 bytes, limiting the name to just 21 characters.
> 
> *Use-Case:*
> In specific use-cases, table names like
> "能源消耗统计_全球主要国家石油与天然气使用量_年度碳排放与可再生能源比例表" and
> "气候变化研究_全球主要地区极端天气事件频率_年度灾害损失与应对措施表" can far exceed the current limit.
> 
> *Proposed Solution:*
> I propose that we offer an alternative compilation of PostgreSQL that
> increases the NAMEDATALEN constant to allow for 256-byte identifiers. This
> would be particularly useful for databases that make extensive use of
> multi-byte character sets like UTF-8.

Another solution would be, letting the meaning of NAMEDATALEN to be
number of *characters*, not the number of bytes. This way, you can use
up to 64 UTF-8 characters. In my understanding MySQL already does this
way. I know this requires non trivial code modifications to PostgreSQL
but would be better than to make binaries with random NAMEDATALEN
values.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp