Re: Can we go beyond the standard to make Postgres radically better?

2022-02-13 Thread Peter J. Holzer
On 2022-02-13 01:11:16 +0100, Andreas 'ads' Scherbaum wrote:
> On 12/02/2022 22:34, Peter J. Holzer wrote:
> > On 2022-02-12 22:09:25 +0100, Andreas 'ads' Scherbaum wrote:
> > > On 12/02/2022 20:50, Peter J. Holzer wrote:
> > > > On 2022-02-12 01:18:04 +0100, Andreas 'ads' Scherbaum wrote:
> > > > > On 10/02/2022 18:22, Peter J. Holzer wrote:
> > > > > > On 2022-02-09 21:14:39 -0800, Guyren Howe wrote:
> > > > > > > Examples of small things Postgres could have:
> > > > > > > 
> > > > > > >  • SELECT * - b.a_id from a natural join b
> > [...]
> > > > > Maybe for this specific use case it's easier to teach psql how to do 
> > > > > that,
> > [...]
> > > > I think the easiest way to get the columns would be to EXPLAIN(verbose)
> > > > the query. Otherwise psql (or whatever your shell is) would have to
> > > > completely parse the SQL statement to find the columns.
[...]
> > > I was thinking about this problem for a while, and it's not easy to solve.
> > > Hence I came up with the idea that psql could - once the table is known
> > > and very specific psql syntax is there (\- as example) replace the * with
> > > the actual columns. All of this before the query is run, and as a user you
> > > can edit the column list further.
> > Yeah, but the problem is that it isn't that easy for psql to figure out
> > which table is involved.
> 
> The complaint is not about complex queries, or CTEs, or Joins. This is
> about simple queries where a user wants to discover - surf - the database
> and look into specific tables, but exclude certain columns.

If you look back through this thread you will notice that it was me who
brought up that specific scenario. You might trust me to know what I was
thinking about when I wrote it ;-).

> More specifically, this is when the user types in interactive queries.

I do write quite complex queries interactively. Even when "surfing",
joins are common because in a normalized schema many columns just
contain meaningless foreign keys. But I also do some data analysis
interactively (which is where CTEs usually come in) and I like to
develop queries interactively before putting them into (Python) programs.

But even if I didn't do that - if you add a feature like that to psql,
it should work for all queries, not just a tiny subset.

> Today psql tries to do autocomplete for certain scenarios, this too does
> not work in complex queries, but nevertheless is a useful help if someone
> tries to run simple, interactive queries.

Autocomplete doesn't even work well for simple queries. It's better than
nothing, but I don't think that it should serve as a model for new
features.

hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at        |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"




Re: Can we go beyond the standard to make Postgres radically better?

2022-02-13 Thread Peter J. Holzer
On 2022-02-12 20:12:02 -0500, Mladen Gogala wrote:
> On 2/12/22 19:11, Andreas 'ads' Scherbaum wrote:
> 
> The complaint is not about complex queries, or CTEs, or Joins. This is
> about simple queries where a user wants to discover - surf - the database
> and look into specific tables, but exclude certain columns. More
> specifically,
> this is when the user types in interactive queries.
> 
> There is already something very similar to what you are describing:
> 
> https://www.psycopg.org/docs/cursor.html

I'm not sure whether the PEP 249 notion of a cursor is relevant here.
That's quite Python specific and at least one step removed from the SQL
concept of a cursor.
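
For contrast, a minimal sketch of the SQL-level concept, with a
hypothetical table name:

    BEGIN;
    DECLARE c CURSOR FOR SELECT * FROM t1;  -- server-side cursor bound to a query
    FETCH 10 FROM c;                        -- rows are pulled on demand
    CLOSE c;
    COMMIT;

The PEP 249 object wraps query execution and result traversal in one
interface; the SQL cursor covers only the latter.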


> Each cursor has its description, which consists of the column descriptions.

Not really. While description is a property of the cursor object in
Python, it always describes the last query executed within that cursor:

% python3
Python 3.8.10 (default, Nov 26 2021, 20:14:08)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import psycopg2
>>> db = psycopg2.connect("")
>>> csr = db.cursor()
>>> csr.description

(no output)

>>> csr.execute("select * from t1 natural join t2")
>>> csr.description
(Column(name='a', type_code=23), Column(name='b', type_code=23), 
Column(name='c', type_code=23))
>>> csr.fetchall()
[(1, 2, 11), (1, 2, 10)]
>>> csr.description
(Column(name='a', type_code=23), Column(name='b', type_code=23), 
Column(name='c', type_code=23))

(we can still refer to the description even after fetching all the data)

>>> csr.execute("select x from t1 natural join t2")
Traceback (most recent call last):
  File "", line 1, in 
psycopg2.errors.UndefinedColumn: column "x" does not exist
LINE 1: select x from t1 natural join t2

>>> csr.description

(no output again after a failed query)


> Basically, it's like doing \d on a cursor. Unfortunately, it's not
> interactive, one has to do some Python programming in order to do that.
> Unfortunately, it is not possible to just "describe the cursor"; the
> description becomes available only after the "execute" call.

Yup, as demonstrated above. Which means that you have to actually
execute the query. Which is something that should not happen as a side
effect of editing the query.
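
For comparison, the EXPLAIN (VERBOSE) approach mentioned earlier in the
thread yields the output columns without running the query - a sketch
against the t1/t2 tables from the session above (the plan shape will
vary; the relevant part is the top node's Output line):

    => EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 NATURAL JOIN t2;
                 QUERY PLAN
    ------------------------------------
     Hash Join
       Output: t1.a, t1.b, t2.c
       ...

At the protocol level, libpq's PQdescribePrepared() likewise returns the
result metadata of a prepared statement without executing it.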

hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at        |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"




Re: Can we go beyond the standard to make Postgres radically better?

2022-02-13 Thread Pavel Stehule
On Sun, Feb 13, 2022 at 9:29 AM, Peter J. Holzer  wrote:

> On 2022-02-13 01:11:16 +0100, Andreas 'ads' Scherbaum wrote:
> > On 12/02/2022 22:34, Peter J. Holzer wrote:
> > > On 2022-02-12 22:09:25 +0100, Andreas 'ads' Scherbaum wrote:
> > > > On 12/02/2022 20:50, Peter J. Holzer wrote:
> > > > > On 2022-02-12 01:18:04 +0100, Andreas 'ads' Scherbaum wrote:
> > > > > > On 10/02/2022 18:22, Peter J. Holzer wrote:
> > > > > > > On 2022-02-09 21:14:39 -0800, Guyren Howe wrote:
> > > > > > > > Examples of small things Postgres could have:
> > > > > > > >
> > > > > > > >  • SELECT * - b.a_id from a natural join b
> > > [...]
> > > > > > Maybe for this specific use case it's easier to teach psql how to do that,
> > > [...]
> > > > > I think the easiest way to get the columns would be to EXPLAIN(verbose)
> > > > > the query. Otherwise psql (or whatever your shell is) would have to
> > > > > completely parse the SQL statement to find the columns.
> [...]
> > > > I was thinking about this problem for a while, and it's not easy to solve.
> > > > Hence I came up with the idea that psql could - once the table is known
> > > > and very specific psql syntax is there (\- as example) replace the * with
> > > > the actual columns. All of this before the query is run, and as a user you
> > > > can edit the column list further.
> > > Yeah, but the problem is that it isn't that easy for psql to figure out
> > > which table is involved.
> >
> > The complaint is not about complex queries, or CTEs, or Joins. This is
> > about simple queries where a user wants to discover - surf - the database
> > and look into specific tables, but exclude certain columns.
>
> If you look back through this thread you will notice that it was me who
> brought up that specific scenario. You might trust me to know what I was
> thinking about when I wrote it ;-).
>
> > More specifically, this is when the user types in interactive queries.
>
> I do write quite complex queries interactively. Even when "surfing",
> joins are common because in a normalized schema many columns just
> contain meaningless foreign keys. But I also do some data analysis
> interactively (which is where CTEs usually come in) and I like to
> develop queries interactively before putting them into (Python) programs.
>
> But even if I didn't do that - if you add a feature like that to psql,
> it should work for all queries, not just a tiny subset.
>
> > Today psql tries to do autocomplete for certain scenarios, this too does
> > not work in complex queries, but nevertheless is a useful help if someone
> > tries to run simple, interactive queries.
>
> Autocomplete doesn't even work well for simple queries. It's better than
> nothing, but I don't think that it should serve as a model for new
> features.
>

The MySQL autocomplete is designed without context filtering. Maybe we can
have this implementation too (as an alternative):

use all column names + all table names + alias.column names (when we
know the defined alias).

Another idea, about column excluding: any implementation on the client side
is very complex, because you need to parse SQL. But maybe we can enhance
SQL with some syntax.

SELECT ..., with a final optional clause that can be HIDE, EXCLUDE NAME, or
EXCLUDE TYPE:

SELECT * FROM postgistab HIDE TYPE LIKE 'wkb%'
SELECT * FROM postgistab ORDER BY ... EXCLUDE COLUMN LIKE 'shape%'

WITH x AS (SELECT * FROM xx)
SELECT * FROM x EXCLUDE COLUMN x1,x2

The column excluding should be a separate *last* clause.

With this syntax there is less possibility of collision with ANSI SQL, and
the implementation on the server side can then be pretty easy.
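
For the simple single-table case, a client can already build the expanded
column list from the catalog - a rough sketch, reusing the postgistab table
and the 'shape%' pattern from the examples above:

    -- build the explicit column list for a single known table,
    -- excluding columns by name pattern:
    SELECT string_agg(quote_ident(attname), ', ' ORDER BY attnum) AS column_list
      FROM pg_attribute
     WHERE attrelid = 'postgistab'::regclass
       AND attnum > 0
       AND NOT attisdropped
       AND attname NOT LIKE 'shape%';

Anything beyond a single known table runs into exactly the SQL-parsing
problem described above, which is the argument for a server-side clause.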

What do you think about it?

Regards

Pavel










>
> hp
>
> --
>    _  | Peter J. Holzer    | Story must make more sense than reality.
> |_|_) |                    |
> | |   | h...@hjp.at        |    -- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |       challenge!"
>


Re: Can we go beyond the standard to make Postgres radically better?

2022-02-13 Thread Guyren Howe

> > The MySQL autocomplete is designed without context filtering. Maybe we can 
> > have this implementation too (as alternative)
> >
> > so using all column names + all table names + aliases.column names (when we 
> > know defined alias)
> >
> > Another idea about column excluding. Any implementation on the client side 
> > is very complex, because you need to parse sql. But maybe we can enhance 
> > SQL with some syntax.
> >
> > SELECT ... and LAST optional clause can be HIDE, OR EXCLUDE NAME or EXCLUDE 
> > TYPE
> >
> > SELECT * FROM postgistab HIDE TYPE LIKE 'wkb%'
> > SELECT * FROM postgistab ORDER BY ... EXCLUDE COLUMN LIKE 'shape%'
> >
> > WITH x AS (SELECT * FROM xx)
> > SELECT * FROM x EXCLUDE COLUMN x1,x2
> >
> > The column excluding should be a separate *last* clause.
> >
> > With this syntax there is less possibility of collision with ANSI SQL
Not against this. It seems somewhere in here there might be a nice quality
of life change.

Still.

I originally suggested that SQL is terrible and we should fearlessly either 
replace it or migrate it toward something better. And the thread winds up with 
a debate about a minor change to a development tool.

I’m back to just having no earthly idea why anyone who finds relations to be a 
productive tool for building a model would think that SQL being the only means 
to do that is Okay.

SMH.


Re: Can we go beyond the standard to make Postgres radically better?

2022-02-13 Thread Pavel Stehule
On Sun, Feb 13, 2022 at 10:45 AM, Guyren Howe  wrote:

>
> The MySQL autocomplete is designed without context filtering. Maybe we can
> have this implementation too (as alternative)
>
> so using all column names + all table names + aliases.column names (when
> we know defined alias)
>
> Another idea about column excluding. Any implementation on the client side
> is very complex, because you need to parse sql. But maybe we can enhance
> SQL with some syntax.
>
> SELECT ... and LAST optional clause can be HIDE, OR EXCLUDE NAME or
> EXCLUDE TYPE
>
> SELECT * FROM postgistab HIDE TYPE LIKE 'wkb%'
> SELECT * FROM postgistab ORDER BY ... EXCLUDE COLUMN LIKE 'shape%'
>
> WITH x AS (SELECT * FROM xx)
> SELECT * FROM x EXCLUDE COLUMN x1,x2
>
> The column excluding should be a separate *last* clause.
>
> With this syntax there is less possibility of collision with ANSI SQL
>
> Not against this. Seems somewhere in here might be a nice quality of life
> change.
>
> Still.
>
> I originally suggested that SQL is terrible and we should fearlessly
> either replace it or migrate it toward something better. And the thread
> winds up with a debate about a minor change to a development tool.
>
> I’m back to just having no earthly idea why anyone who finds relations to
> be a productive tool for building a model would think that SQL being the
> only means to do that is Okay.
>

I think the rating of data languages is very subjective, and I am happier
with SQL than with special languages like D or Quel, or others. I know SQL
has a lot of disadvantages, but it was designed for humans and it works for
me.

I don't think using a special query language needs some special interface
in Postgres. You can use COBOL with SQL databases today; the overhead of
some middle layers should be low.

But there can be a valid second question - it could be nice to let
extensions define their own communication protocol. Postgres has
a special protocol for replication and for backup. With this possibility
you could do what you need without the necessity of an external
application server.

Regards

Pavel



>
> SMH.
>


How to split normal and overtime hours

2022-02-13 Thread Andrus

Hi!

Hours table contains working hours for jobs:

    create table hours (
    jobid integer primary key, -- job done, unique for person
    personid char(10) not null, -- person who did job
    hours numeric(5,2) not null -- hours worked for job
    )

Hours over 120 (per person) are overtime hours.

How to split regular and overtime hours into different columns, using a
running total ordered by job id and partitioned by person id?


For example, if John did 3 jobs (1, 2, 3) with 90, 50 and 40 hours
respectively (180 hours total), the result table should be:


    personid    jobid    normal_hours    overtime_hours
    john            1              90                 0
    john            2              30                20
    john            3               0                40

The sum of the normal_hours column must not be greater than 120 per person.

The sum of the normal_hours and overtime_hours columns must equal the sum
of the hours column in the hours table for every person.


Note that since the running total of hours becomes greater than 120 during
job 2, job 2's hours should appear in both columns.


Maybe window functions can be used.

Andrus.


Re: How to split normal and overtime hours

2022-02-13 Thread Torsten Förtsch
something like

SELECT *
     , least(sum(hours) OVER w, 120)        AS regular   -- running total, capped at 120
     , greatest(sum(hours) OVER w - 120, 0) AS overtime  -- running total beyond 120
  FROM hours
WINDOW w AS (PARTITION BY person ORDER BY job_id);

 job_id | person | hours | regular | overtime
--------+--------+-------+---------+----------
      2 | bill   |    10 |      10 |        0
      5 | bill   |    40 |      50 |        0
      8 | bill   |    10 |      60 |        0
     10 | bill   |    70 |     120 |       10
     11 | bill   |    30 |     120 |       40
     13 | bill   |    40 |     120 |       80
     15 | bill   |    10 |     120 |       90
      4 | hugo   |    70 |      70 |        0
      7 | hugo   |   130 |     120 |       80
      1 | john   |    10 |      10 |        0
      3 | john   |    50 |      60 |        0
      6 | john   |    30 |      90 |        0
      9 | john   |    50 |     120 |       20
     12 | john   |    30 |     120 |       50
     14 | john   |    50 |     120 |      100
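
For reference, test data consistent with the output above - a
reconstruction, using this reply's column names (person, job_id) rather
than the personid/jobid of the original schema:

    CREATE TABLE hours (
        job_id integer PRIMARY KEY,
        person text NOT NULL,
        hours  numeric(5,2) NOT NULL
    );

    INSERT INTO hours (job_id, person, hours) VALUES
        ( 1, 'john', 10), ( 2, 'bill', 10), ( 3, 'john',  50), ( 4, 'hugo', 70),
        ( 5, 'bill', 40), ( 6, 'john', 30), ( 7, 'hugo', 130), ( 8, 'bill', 10),
        ( 9, 'john', 50), (10, 'bill', 70), (11, 'bill',  30), (12, 'john', 30),
        (13, 'bill', 40), (14, 'john', 50), (15, 'bill',  10);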


On Sun, Feb 13, 2022 at 12:47 PM Andrus  wrote:

> Hi!
>
> Hours table contains working hours for jobs:
>
> create table hours (
> jobid integer primary key, -- job done, unique for person
> personid char(10) not null, -- person who did job
> hours numeric(5,2) not null -- hours worked for job
> )
>
> Hours more than 120 are overtime hours.
>
> How to split regular and overtime hours into different columns using
> running total by job id and partition by person id?
>
> For example, if John did 3 jobs 1, 2,3 with 90, 50 and 40 hours (total 180
> hours) for each job correspondingly, result table should be:
>
> personidjobid  normal_hours   overtime_hours
> john1 90   0
> john2 30  20
> john3  0  40
>
> sum on normal_hours column should not be greater than 120 per person.
>
> sum of normal_hours and overtime_hours columns must be same as sum of
> hours column in hours table for every person.
>
> Note that since hours running total becomes greater than 120 in job 2, job
> 2 hours should appear in both hours columns.
>
> Maybe window functions can used.
>
> Andrus.
>


Re: How to split normal and overtime hours

2022-02-13 Thread Torsten Förtsch
WITH x AS (
    SELECT *
         , sum(hours) OVER w AS s   -- running total of hours per person
      FROM hours
    WINDOW w AS (PARTITION BY person ORDER BY job_id)
)
SELECT *
     -- the part of this job's hours that still fits under the 120-hour cap:
     , greatest(least(s, 120) - coalesce(lag(s, 1) OVER w, 0), 0) AS regular
     -- whatever is left of this job's hours is overtime:
     , hours - greatest(least(s, 120) - coalesce(lag(s, 1) OVER w, 0), 0) AS overtime
  FROM x
WINDOW w AS (PARTITION BY person ORDER BY job_id);
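
Checked against the example from the original post (John's jobs of 90, 50
and 40 hours against the 120-hour cap), this produces the requested split;
s is the running total:

     job_id | person | hours |  s  | regular | overtime
    --------+--------+-------+-----+---------+----------
          1 | john   |    90 |  90 |      90 |        0
          2 | john   |    50 | 140 |      30 |       20
          3 | john   |    40 | 180 |       0 |       40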


On Sun, Feb 13, 2022 at 1:57 PM Andrus  wrote:

> Hi!
>
> Thank you. In this result, regular and overtime columns contain running
> totals.
>
> How to fix this so that those columns contain just hours for each job?
>
> The sum of the regular column should not be greater than 120 per person.
>
> The sum of the regular and overtime columns must be the same as the sum of
> the hours column in the hours table for every person.
>
> Andrus.
> On 13.02.2022 14:46, Torsten Förtsch wrote:
>
> something like
>
> SELECT *
>      , least(sum(hours) OVER w, 120)        AS regular
>      , greatest(sum(hours) OVER w - 120, 0) AS overtime
>   FROM hours
> WINDOW w AS (PARTITION BY person ORDER BY job_id);
>
>  job_id | person | hours | regular | overtime
> --------+--------+-------+---------+----------
>       2 | bill   |    10 |      10 |        0
>       5 | bill   |    40 |      50 |        0
>       8 | bill   |    10 |      60 |        0
>      10 | bill   |    70 |     120 |       10
>      11 | bill   |    30 |     120 |       40
>      13 | bill   |    40 |     120 |       80
>      15 | bill   |    10 |     120 |       90
>       4 | hugo   |    70 |      70 |        0
>       7 | hugo   |   130 |     120 |       80
>       1 | john   |    10 |      10 |        0
>       3 | john   |    50 |      60 |        0
>       6 | john   |    30 |      90 |        0
>       9 | john   |    50 |     120 |       20
>      12 | john   |    30 |     120 |       50
>      14 | john   |    50 |     120 |      100
>
>
> On Sun, Feb 13, 2022 at 12:47 PM Andrus  wrote:
>
>> Hi!
>>
>> Hours table contains working hours for jobs:
>>
>> create table hours (
>> jobid integer primary key, -- job done, unique for person
>> personid char(10) not null, -- person who did job
>> hours numeric(5,2) not null -- hours worked for job
>> )
>>
>> Hours more than 120 are overtime hours.
>>
>> How to split regular and overtime hours into different columns using
>> running total by job id and partition by person id?
>>
>> For example, if John did 3 jobs 1, 2,3 with 90, 50 and 40 hours (total
>> 180 hours) for each job correspondingly, result table should be:
>>
>> personidjobid  normal_hours   overtime_hours
>> john1 90   0
>> john2 30  20
>> john3  0  40
>>
>> sum on normal_hours column should not be greater than 120 per person.
>>
>> sum of normal_hours and overtime_hours columns must be same as sum of
>> hours column in hours table for every person.
>>
>> Note that since hours running total becomes greater than 120 in job 2,
>> job 2 hours should appear in both hours columns.
>>
>> Maybe window functions can used.
>>
>> Andrus.
>>
>


Re: Can we go beyond the standard to make Postgres radically better?

2022-02-13 Thread Mladen Gogala

On 2/13/22 05:00, Pavel Stehule wrote:
But there can be a valid second question - it could be nice to let
extensions define their own communication protocol. Postgres has
a special protocol for replication and for backup. With this possibility
you could do what you need without the necessity of an external
application server.


Regards


And here we are back at the Babelfish question. Babelfish defines its
own protocol (TDS = Tabular Data Stream), but it requires building
PostgreSQL from source, with some hooks for the additional protocol.
A built-in mechanism to do that without rebuilding would be nice.
Babelfish Postgres is based on version 13.5, currently the most prevalent
version in the data centers. A library to implement foreign communication
protocol primitives would be a very nice thing indeed.


--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com


FDW error on remote view

2022-02-13 Thread Michael Mauger
I'm working on building a demonstration of this error, but I figured
I'd ask the question first.

BACKGROUND

I've got two databases, stage and app. There is a postgres_fdw
connection between them so that stage can see objects in the app
database. The app database contains complex metadata that is used to
describe data and the tables/views in stage used to load the data. I
have built a view that generates the SQL for creating multiple staging
tables and views.  I have used PG ENUM types in several places in the
metadata that are used in the generation of the stage objects.

PROBLEM

When I query the view in the app schema, the appropriate SQL is
generated.  If I save the SQL and run it in the stage database it is
successful and creates the correct objects.

However, after I create a FOREIGN TABLE in the stage database that
references the app view, I get an error when I query that table.  The
error indicates a problem executing the view in the app database. The
error highlights one of the app ENUM types not being defined. There is
no direct reference to the ENUM in the view itself, but there is a
reference in one of the nested functions deep within the view.

QUESTION

Why is there an error in the app context only when I invoke the view
through a FOREIGN TABLE? And more importantly, how do I get around it?

DETAILS

I am trying to create a demonstration of this error outside of this
application so that all of the details can be exposed and debugged.

  [I've tried to anonymize the naming, so there may be some small
   issues but the error messages and basic structure is correct.]

On the app database:

    app=> \dv meta.stage_objects
              List of relations
     Schema |     Name      | Type |  Owner
    --------+---------------+------+----------
     meta   | stage_objects | view | postgres
    (1 row)

    app=> \dv+ meta.stage_objects
                                    List of relations
     Schema |     Name      | Type |  Owner   | Persistence |  Size   | Description
    --------+---------------+------+----------+-------------+---------+-------------
     meta   | stage_objects | view | postgres | permanent   | 0 bytes |
    (1 row)

    app=> \d+ meta.stage_objects
                             View "meta.stage_objects"
     Column | Type | Collation | Nullable | Default | Storage  | Description
    --------+------+-----------+----------+---------+----------+-------------
     name   | name |           |          |         | plain    |
     sql    | text | C         |          |         | extended |
    View definition:
     SELECT x.name,
        meta.gen_stage_objects(x.name) AS sql
       FROM dsource x;

    app=> TABLE meta.stage_objects ;
     name |   sql
    ------+--------
      ...
    (2 rows)

And then, on the stage database:

    stage=> \det meta.stage_objects
           List of foreign tables
     Schema |     Table     | Server
    --------+---------------+--------
     meta   | stage_objects | app_db
    (1 row)

    stage=> \d meta.*
                        Foreign table "meta.stage_objects"
     Column | Type | Collation | Nullable | Default |     FDW options
    --------+------+-----------+----------+---------+----------------------
     name   | name |           |          |         | (column_name 'name')
     sql    | text | C         |          |         | (column_name 'sql')
    Server: app_db
    FDW options: (schema_name 'meta', table_name 'stage_objects')

    stage=> TABLE meta.stage_objects ;
    ERROR:  42704: type "xyzxx_t" does not exist
    CONTEXT:  PL/pgSQL function meta.gen_stage_object_xyzxx(public.xyzxx_t,public.abcaa_t,public.id) line 10 at EXECUTE
    SQL function "gen_stage_object_detail" statement 1
    remote SQL command: SELECT name, sql FROM meta.stage_objects
    LOCATION:  pgfdw_report_error, connection.c:813

The types "xyzxx_t" and "abcaa_t" both exist in schema "public" in the
app database.  These do not exist and are not needed in the stage
database.

--
mich...@mauger.com // FSF and EFF member // GNU Emacs sql.el maintainer




Re: FDW error on remote view

2022-02-13 Thread Tom Lane
Michael Mauger  writes:
> Why is there an error in the app context only when I invoke the view
> through a FOREIGN TABLE? And more importantly, how do I get around it?

It's hard to be sure with this amount of detail, but I think
what is happening is that your view invokes a plpgsql function
and that function is sensitive to the active search_path.
postgres_fdw runs the remote session with minimal search path
(i.e., search_path = 'pg_catalog'), so a function that tries
to reference something in 'public' is not gonna work unless
it provides an explicit schema-qualification or a function-local
setting of search_path.
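
For illustration, the function-local setting could look like this - a
sketch, with the name and signature copied from the error message above
(the real signature may differ):

    ALTER FUNCTION meta.gen_stage_object_xyzxx(public.xyzxx_t, public.abcaa_t, public.id)
        SET search_path = public, pg_catalog;

The alternative is to schema-qualify every type and table reference
inside the function body.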

regards, tom lane