Re: [HACKERS] raw output from copy

2016-03-03 Thread Corey Huinker
On Sat, Feb 27, 2016 at 2:26 AM, Pavel Stehule 
wrote:

> Hi
>
> 2015-08-06 10:37 GMT+02:00 Pavel Stehule :
>
>> Hi,
>>
>> A psql-based implementation needs new infrastructure (more than a few lines)
>>
>> Missing:
>>
>> * binary mode support
>> * parametrized query support,
>>
>> I am not against it, but I proposed both points, and both were rejected.
>>
>> So why not use the current infrastructure? Raw copy is a trivial patch.
>>
>
> I was asked by Daniel Verite about reopening this patch in the open
> commitfest.
>
> I am sending a rebased patch.
>
> Regards
>
> Pavel
>
>
>
Since this patch does something I need for my own work, I've signed up as a
reviewer.

From a design standpoint, I feel that COPY is the preferred means of
dealing with data from sources too transient to justify setting up a
foreign data wrapper, and too simple to justify writing application code.
So, for me, RAW is the right solution, or at least *a* right solution.

My first pass of reading the code changes and the regression tests is
complete, and I found the changes to be clear and fairly straightforward.
This shouldn't surprise anyone, as the previous reviewers had only minor
quibbles with the code. So far, so good.

The regression tests seem to adequately cover all new functionality, though
I wonder if we should add some cases that highlight situations where BINARY
mode is insufficient.

Before I give my approval, I want to read it again more closely to make
sure that no cases were skipped with regard to the (binary || raw) and
(binary || !raw) tests. Also, I want to use it on some of my problematic
files. Maybe I'll find a good edge case. Probably not.

I hope to find time for those things in the next few days.


Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-03-04 Thread Corey Huinker
>
>
> I feel rather uneasy about simply removing the 'infinity' checks. Is there
> a way to differentiate those two cases, i.e. when the generate_series is
> called in target list and in the FROM part? If yes, we could do the check
> only in the FROM part, which is the case that does not work (and consumes
> arbitrary amounts of memory).
>
>
It would be simple enough to remove the infinity test on the "stop" and
leave it on the "start". Or yank both. Just waiting for others to agree
which checks should remain.


Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-03-08 Thread Corey Huinker
>
> > It would be simple enough to remove the infinity test on the "stop" and
> > leave it on the "start". Or yank both. Just waiting for others to agree
> > which checks should remain.
>
> Let's yank 'em.  This is a minor issue which is distracting us from
> the main point of this patch, and I don't think it's worth getting
> distracted.
>

+1. It leaves this function consistent with the others, and if we want to
add checks later we can do them all at the same time.


>
> + 
> +
> generate_series(start,
> stop, step
> integer)
> +  date
> +  setof date
> +  
> +   Generate a series of values, from start
> to stop
> +   with a step size of step
>
> I think this should be followed by the word "days" and a period.
>
>
No objections. I just followed the pattern of the other generate_series()
docs.


> +   else
> +   /* do when there is no more left */
> +   SRF_RETURN_DONE(funcctx);
>
> I think we should drop the "else" and unindent the next two lines.
> That's the style I have seen elsewhere.  Plus less indentation equals
> more happiness.
>

No objections here either. I just followed the pattern of generate_series()
for int there.


>
> I'm pretty meh about the whole idea of this function, though,
> actually, and I don't see a single clear +1 vote for this
> functionality upthread.  (Apologies if I've missed one.)  In the
> absence of a few of those, I recommend we reject this.
>

Just David and Vik so far. The rest were either against (Simon) or meh
(Robert), or +1'd/-1'd the backpatch, leaving their thoughts on the
function itself unspoken.

Happy to make the changes above if we're moving forward with it.


Re: [HACKERS] Declarative partitioning

2016-03-08 Thread Corey Huinker
>
> Sorry for replying so late.
>

No worries! We have jobs to do aside from this.


>
> Everything seemed to go dandy until I tried FOR VALUES (blah , blah],
> where psql wouldn't send the command string without accepting the closing
> parenthesis, :(.  So maybe I should try to put the whole thing in '', that
> is, accept the full range_spec in a string, but then we are back to
> requiring full-blown range parse function which I was trying to avoid by
> using the aforementioned grammar.  So, I decided to move ahead with the
> following grammar for time being:
>
> START (lower-bound) [ EXCLUSIVE ]
> | END (upper-bound) [ INCLUSIVE ]
> | START (lower-bound) [ EXCLUSIVE ] END (upper-bound) [ INCLUSIVE ]
>
> Where,
>
> *-bound: a_expr
>  | *-bound ',' a_expr
>
> Note that in the absence of explicit specification, lower-bound is
> inclusive and upper-bound is exclusive.
>

Thanks for trying. I agree that it would require a full-blown range parser,
and I'm not yet advanced enough to help you with that.

So presently partitions that are unbounded on the lower end aren't
possible, but that's a creation syntax issue, not an infrastructure issue.
Correct?


> Okay, perhaps I should not presume a certain usage.  However, as you know,
> the usage like yours requires some mechanism of data redistribution (also
> not without some syntax), which I am not targeting with the initial patch.
>

I'm quite fine with limitations in this initial patch, especially if they
don't limit what's possible in the future.


> > Question: I haven't dove into the code, but I was curious about your
> tuple
> > routing algorithm. Is there any way for the algorithm to begin its scan
> of
> > candidate partitions based on the destination of the last row inserted
> this
> > statement? I ask because most use cases (that I am aware of) have data
> that
> > would naturally cluster in the same partition.
>
> No.  Actually the tuple-routing function starts afresh for each row.  For
> range partitions, it's binary search over an array of upper bounds.  There
> is no row-to-row state caching in the partition module itself.
>
>
bsearch should be fine, that's what I've used in my own custom partitioning
schemes.
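For readers unfamiliar with the approach: routing a row by binary search over the partitions' upper bounds can be sketched in plain C. This is an illustrative sketch with hypothetical names (`route_to_partition` is not the patch's actual API), using int keys in place of real partition bound datums:

```c
#include <assert.h>

/*
 * Illustrative sketch of range-partition tuple routing: given a sorted
 * array of exclusive upper bounds, return the index of the first
 * partition whose upper bound exceeds the key.  A result equal to
 * nparts means no partition accepts the key.
 */
static int
route_to_partition(const int *upper_bounds, int nparts, int key)
{
	int			lo = 0;
	int			hi = nparts;

	while (lo < hi)
	{
		int			mid = lo + (hi - lo) / 2;

		if (key < upper_bounds[mid])
			hi = mid;			/* key fits in partition mid or earlier */
		else
			lo = mid + 1;		/* key is at or past this upper bound */
	}
	return lo;
}
```

With bounds {10, 20, 30}, a key of 5 routes to partition 0, 10 to partition 1 (bounds are exclusive), and 30 falls off the end. The search restarts from scratch for every row, which matches the "no row-to-row state caching" behavior described above.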

Was there a new patch, and if so, is it the one you want me to kick the
tires on?


Re: [HACKERS] Declarative partitioning

2016-03-08 Thread Corey Huinker
>
> > So presently partitions that are unbounded on the lower end aren't
> > possible, but that's a creation syntax issue, not an infrastructure
> issue.
> > Correct?
>
> In case it wasn't apparent, you can create those:
>
> FOR VALUES END (upper-bound) [INCLUSIVE]
>
> which is equivalent to:
>
> FOR VALUES '(, upper-bound)' or FOR VALUES '(, upper-bound]'
>
>
Thanks for clarifying. My BNF-fu is weak.




Re: [HACKERS] raw output from copy

2016-03-09 Thread Corey Huinker
>
>
>> The regression tests seem to adequately cover all new functionality,
>> though I wonder if we should add some cases that highlight situations where
>> BINARY mode is insufficient.
>>
>>
One thing I tried to test RAW was to load an existing json file.

My own personal test was to load an existing .json file into a one-row,
one-column bytea table, which worked. From there I was able to run
select encode(col_name,'escape')::text::jsonb from test_table
and the json was correctly converted.

A similar test copying binary failed.

A write up of the test looks like this:


\copy (select '{"foo": "bar"}') to '/tmp/raw_test.jsonb' (format raw);
COPY 1
create temporary table raw_byte (b bytea);
CREATE TABLE
create temporary table raw_text (t text);
CREATE TABLE
\copy raw_jsonb from '/tmp/raw_test.blob' (format raw);
psql:/home/ubuntu/raw_test.sql:9: ERROR:  relation "raw_jsonb" does not
exist
\copy raw_byte from '/tmp/raw_test.blob' (format raw);
COPY 1
select encode(b,'escape')::text::json from raw_byte;
     encode
----------------
 {"foo": "bar"}
(1 row)

\copy raw_text from '/tmp/raw_test.blob' (format raw);
COPY 1
select t::jsonb from raw_text;
       t
----------------
 {"foo": "bar"}
(1 row)

create temporary table binary_byte (b bytea);
CREATE TABLE
create temporary table binary_text (t text);
CREATE TABLE
\copy binary_byte from '/tmp/raw_test.blob' (format binary);
psql:/home/ubuntu/raw_test.sql:22: ERROR:  COPY file signature not
recognized
select encode(b,'escape')::jsonb from binary_byte;
 encode
--------
(0 rows)

\copy binary_text from '/tmp/raw_test.blob' (format binary);
psql:/home/ubuntu/raw_test.sql:26: ERROR:  COPY file signature not
recognized
select t::jsonb from binary_text;
 t
---
(0 rows)


So, *if* we want to add a regression test to demonstrate to posterity why
we need RAW for cases that BINARY can't handle, I offer the attached file.

Does anyone else see value in adding that to the regression tests?



> Before I give my approval, I want to read it again more closely to make
>> sure that no cases were skipped with regard to the  (binary || raw) and
>> (binary || !raw) tests. Also, I want to use it on some of my problematic
>> files. Maybe I'll find a good edge case. Probably not.
>>
>
I don't know why I thought this, but when I looked at the patch, I assumed
that the ( binary || raw ) tests were part of a large if/elseif/else
waterfall. They are not. They stand alone. There are no edge cases to find.

Review complete and passed. I can re-review if we want to add the
additional test.


raw_test.sql
Description: application/sql

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2016-03-09 Thread Corey Huinker
>
>
> I think we're converging on a good syntax, but I don't think the
> choice of nothingness to represent an open range is a good idea, both
> because it will probably create grammar conflicts now or later and
> also because it actually is sort of confusing and unintuitive to read
> given the rest of our syntax.  I suggest using UNBOUNDED instead.
>
>
As much as it reminds me of the window syntax I loathe (ROWS BETWEEN
UNBOUNDED gah), I'm inclined to agree with Robert here.

It also probably helps for code forensics in the sense that it's easier to
text search for a something than a nothing.


Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-03-10 Thread Corey Huinker
On Thu, Mar 10, 2016 at 10:58 AM, Robert Haas  wrote:

> On Thu, Mar 10, 2016 at 10:30 AM, Simon Riggs 
> wrote:
> > On 10 March 2016 at 06:53, Michael Paquier 
> > wrote:
> >>
> >> On Wed, Mar 9, 2016 at 12:13 AM, Alvaro Herrera
> >>  wrote:
> >> > Robert Haas wrote:
> >> >> I'm pretty meh about the whole idea of this function, though,
> >> >> actually, and I don't see a single clear +1 vote for this
> >> >> functionality upthread.  (Apologies if I've missed one.)  In the
> >> >> absence of a few of those, I recommend we reject this.
> >> >
> >> > +1
> >>
> >> I'm meh for this patch.
> >
> >
> > "meh" == +1
> >
> > I thought it meant -1
>
> In my case it meant, like, -0.5.  I don't really like adding lots of
> utility functions like this to the default install, because I'm not
> sure how widely they get used and it gradually increases the size of
> the code, system catalogs, etc.  But I also don't want to block
> genuinely useful things.  So basically, I'm not excited about this
> patch, but I don't want to fight about it either.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

New patch for Alvaro's consideration.

Very minor changes since the last time, the explanations below are
literally longer than the changes:
- Rebased, though I don't think any of the files had changed in the
meantime
- Removed infinity checks/errors and the test cases to match
- Amended documentation to add 'days' after 'step' as suggested
- Did not add a period as suggested, to remain consistent with other
descriptions in the same sgml table
- Altered test case and documentation of 7 day step example so that the
generated dates do not land evenly on the end date. A reader might
incorrectly infer that the end date must be in the result set, when it
doesn't have to be.
- Removed unnecessary indentation that existed purely due to following of
other generate_series implementations
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4b5ee81..0a8c280 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14700,6 +14700,26 @@ AND
   
  
 
+ 
+  generate_series(start, 
stop)
+  date
+  setof date
+  
+   Generate a series of values, from start to 
stop
+   with a step size of one day
+  
+ 
+
+ 
+  generate_series(start, 
stop, step 
integer)
+  date
+  setof date
+  
+   Generate a series of values, from start to 
stop
+   with a step size of step days
+  
+ 
+
 

   
@@ -14764,6 +14784,26 @@ SELECT * FROM generate_series('2008-03-01 
00:00'::timestamp,
  2008-03-03 22:00:00
  2008-03-04 08:00:00
 (9 rows)
+
+SELECT d.date_val FROM generate_series('1991-09-24'::date,'1991-10-01'::date) 
as d(date_val);
+  date_val  
+
+ 1991-09-24
+ 1991-09-25
+ 1991-09-26
+ 1991-09-27
+ 1991-09-28
+ 1991-09-29
+ 1991-09-30
+ 1991-10-01
+(8 rows)
+
+SELECT d.date_val FROM 
generate_series('1991-09-24'::date,'1991-10-02'::date,7) as d(date_val);
+  date_val  
+
+ 1991-09-24
+ 1991-10-01
+(2 rows)
 
   
 
diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
index 332db7e..af4000d 100644
--- a/src/backend/utils/adt/date.c
+++ b/src/backend/utils/adt/date.c
@@ -30,6 +30,7 @@
 #include "utils/datetime.h"
 #include "utils/nabstime.h"
 #include "utils/sortsupport.h"
+#include "funcapi.h"
 
 /*
  * gcc's -ffast-math switch breaks routines that expect exact results from
@@ -2811,3 +2812,92 @@ timetz_izone(PG_FUNCTION_ARGS)
 
PG_RETURN_TIMETZADT_P(result);
 }
+
+/* Corey BEGIN */
+typedef struct
+{
+   DateADT current;
+   DateADT stop;
+   int32   step;
+} generate_series_date_fctx;
+
+
+/* generate_series_date()
+ * Generate the set of dates from start to stop by step
+ */
+Datum
+generate_series_date(PG_FUNCTION_ARGS)
+{
+   return generate_series_step_date(fcinfo);
+}
+
+Datum
+generate_series_step_date(PG_FUNCTION_ARGS)
+{
+   FuncCallContext *funcctx;
+   generate_series_date_fctx *fctx;
+   DateADT result;
+
+   /* stuff done only on the first call of the function */
+   if (SRF_IS_FIRSTCALL())
+   {
+   DateADT start = PG_GETARG_DATEADT(0);
+   DateADT stop = PG_GETARG_DATEADT(1);
+   int32   step = 1;
+   MemoryContext oldcontext;
+
+   /* see if we were given an explicit step size */
+   if (PG_NARGS() == 3)
+   {
+   step = PG_GETARG_INT32(2);
+   if (step == 0)
+   ereport(ERROR,
+   
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("step size cannot equal 
zero")));
+   }
+
+   /* create a function context for cross-call persistence */
+   funcctx = SRF_FIRSTCALL_INI
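The stepping rule the function implements can be sketched in plain C, using int day ordinals in place of DateADT and a flat output array in place of the SRF multi-call machinery (`count_series` is an illustrative helper under those assumptions, not part of the patch):

```c
#include <assert.h>

/*
 * Illustrative sketch of the series rule: emit values starting at
 * start, advancing by step, while the current value has not passed
 * stop in the direction of travel.  Returns the number of values
 * emitted, or -1 for a zero step (the patch raises an error instead,
 * since a zero step would never terminate).
 */
static int
count_series(int start, int stop, int step, int *out, int maxlen)
{
	int			n = 0;
	int			current = start;

	if (step == 0)
		return -1;

	while (n < maxlen &&
		   ((step > 0 && current <= stop) ||
			(step < 0 && current >= stop)))
	{
		out[n++] = current;
		current += step;
	}
	return n;
}
```

Under this rule a span of 8 days with a step of 7 yields exactly two values, mirroring the 1991-09-24 .. 1991-10-02 example above: the stop date bounds the series but need not appear in it.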

Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-03-10 Thread Corey Huinker
removed leftover development comment

On Thu, Mar 10, 2016 at 11:02 AM, Corey Huinker 
wrote:

> On Thu, Mar 10, 2016 at 10:58 AM, Robert Haas 
> wrote:
>
>> On Thu, Mar 10, 2016 at 10:30 AM, Simon Riggs 
>> wrote:
>> > On 10 March 2016 at 06:53, Michael Paquier 
>> > wrote:
>> >>
>> >> On Wed, Mar 9, 2016 at 12:13 AM, Alvaro Herrera
>> >>  wrote:
>> >> > Robert Haas wrote:
>> >> >> I'm pretty meh about the whole idea of this function, though,
>> >> >> actually, and I don't see a single clear +1 vote for this
>> >> >> functionality upthread.  (Apologies if I've missed one.)  In the
>> >> >> absence of a few of those, I recommend we reject this.
>> >> >
>> >> > +1
>> >>
>> >> I'm meh for this patch.
>> >
>> >
>> > "meh" == +1
>> >
>> > I thought it meant -1
>>
>> In my case it meant, like, -0.5.  I don't really like adding lots of
>> utility functions like this to the default install, because I'm not
>> sure how widely they get used and it gradually increases the size of
>> the code, system catalogs, etc.  But I also don't want to block
>> genuinely useful things.  So basically, I'm not excited about this
>> patch, but I don't want to fight about it either.
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>
> New patch for Alvaro's consideration.
>
> Very minor changes since the last time, the explanations below are
> literally longer than the changes:
> - Rebased, though I don't think any of the files had changed in the
> meantime
> - Removed infinity checks/errors and the test cases to match
> - Amended documentation to add 'days' after 'step' as suggested
> - Did not add a period as suggested, to remain consistent with other
> descriptions in the same sgml table
> - Altered test case and documentation of 7 day step example so that the
> generated dates do not land evenly on the end date. A reader might
> incorrectly infer that the end date must be in the result set, when it
> doesn't have to be.
> - Removed unnecessary indentation that existed purely due to following of
> other generate_series implementations
>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4b5ee81..0a8c280 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14700,6 +14700,26 @@ AND
   
  
 
+ 
+  generate_series(start, 
stop)
+  date
+  setof date
+  
+   Generate a series of values, from start to 
stop
+   with a step size of one day
+  
+ 
+
+ 
+  generate_series(start, 
stop, step 
integer)
+  date
+  setof date
+  
+   Generate a series of values, from start to 
stop
+   with a step size of step days
+  
+ 
+
 

   
@@ -14764,6 +14784,26 @@ SELECT * FROM generate_series('2008-03-01 
00:00'::timestamp,
  2008-03-03 22:00:00
  2008-03-04 08:00:00
 (9 rows)
+
+SELECT d.date_val FROM generate_series('1991-09-24'::date,'1991-10-01'::date) 
as d(date_val);
+  date_val  
+
+ 1991-09-24
+ 1991-09-25
+ 1991-09-26
+ 1991-09-27
+ 1991-09-28
+ 1991-09-29
+ 1991-09-30
+ 1991-10-01
+(8 rows)
+
+SELECT d.date_val FROM 
generate_series('1991-09-24'::date,'1991-10-02'::date,7) as d(date_val);
+  date_val  
+
+ 1991-09-24
+ 1991-10-01
+(2 rows)
 
   
 
diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
index 332db7e..af4000d 100644
--- a/src/backend/utils/adt/date.c
+++ b/src/backend/utils/adt/date.c
@@ -30,6 +30,7 @@
 #include "utils/datetime.h"
 #include "utils/nabstime.h"
 #include "utils/sortsupport.h"
+#include "funcapi.h"
 
 /*
  * gcc's -ffast-math switch breaks routines that expect exact results from
@@ -2811,3 +2812,92 @@ timetz_izone(PG_FUNCTION_ARGS)
 
PG_RETURN_TIMETZADT_P(result);
 }
+
+typedef struct
+{
+   DateADT current;
+   DateADT stop;
+   int32   step;
+} generate_series_date_fctx;
+
+
+/* generate_series_date()
+ * Generate the set of dates from start to stop by step
+ */
+Datum
+generate_series_date(PG_FUNCTION_ARGS)
+{
+   return generate_series_step_date(fcinfo);
+}
+
+Datum
+generate_series_step_date(PG_FUNCTION_ARGS)
+{
+   FuncCallContext *funcctx;
+   generate_series_date_fctx *fctx;
+   DateADT result;
+
+   /* stuff done only on the first call of the function */
+   if (SRF_IS_FIRSTCALL())
+   {
+   DateADT start = PG_GETARG_DATEADT(0

Re: [HACKERS] psql metaqueries with \gexec

2016-03-14 Thread Corey Huinker
>
>
> I'm getting a warning from this patch:
>
> 1 warning generated.
>

Fixed that one.


> (note that I'm using CC='ccache clang -Qunused-arguments
> -fcolor-diagnostics')
>
> for (r = 0; r < nrows; r++)
>> {
>> for (c = 0; c < ncolumns; c++)
>> {
>>
> etc...
>
> Normally we don't use gratuitous {'s, and I don't think it's helping
> anything in this case. But I'll let whoever commits this decide.
>


Good to know in the future. I can remove them, or leave that to the committer.


> diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
>> index 5f27120..0f87f29 100644
>> --- a/src/bin/psql/tab-complete.c
>> +++ b/src/bin/psql/tab-complete.c
>> @@ -1280,8 +1280,8 @@ psql_completion(const char *text, int start, int
>> end)
>> "\\dm", "\\dn", "\\do", "\\dO", "\\dp", "\\drds", "\\ds",
>> "\\dS",
>> "\\dt", "\\dT", "\\dv", "\\du", "\\dx", "\\dy",
>> "\\e", "\\echo", "\\ef", "\\encoding", "\\ev",
>> -   "\\f", "\\g", "\\gset", "\\h", "\\help", "\\H", "\\i",
>> "\\ir", "\\l",
>> -   "\\lo_import", "\\lo_export", "\\lo_list", "\\lo_unlink",
>> +   "\\f", "\\g", "\\gexec", "\\gset", "\\h", "\\help",
>> "\\H", "\\i", "\\ir",
>> +   "\\l", "\\lo_import", "\\lo_export", "\\lo_list",
>> "\\lo_unlink",
>>
>
> FWIW, it's generally better to leave that kind of re-wrapping to the next
> pg_indent run.
>

Good to know in the future. Not much point in undoing it now, I suppose.


>
> I added tests for ON_ERROR_STOP. New patch attached.
>

I was wondering if ON_ERROR_STOP tests were verboten because you only get
to kick the tires on one feature...


>
> The patch still needs to document this feature in the psql docs (and maybe
> the manpage? not sure how that's generated...)


doc/src/sgml/ref/psql-ref.sgml is the source for both the html docs and the man page.

I'm on it. I didn't expect the name "gexec" to survive first contact with
the community.

Patch attached. Changes are thus:
- proper assignment of success var
- added documentation to psql manpage/html with examples pulled from
regression tests.

Not changed are:
- exuberant braces, can remove if someone wants me to
- attempt at line-wrapping the enumerated slash commands, leave that to
pg_indent
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 8a85804..acb0eb7 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1753,6 +1753,91 @@ Tue Oct 26 21:40:57 CEST 1999
   
 
   
+\gexec
+
+
+
+ Sends the current query input buffer to the server and treats
+ every column of every row of query output (if any) as a separate
+ SQL statement to be immediately executed. For example:
+
+=> SELECT 'select 1 as ones', 'select x.y, x.y*2 as double from 
generate_series(1,4) as x(y)'
+-> UNION ALL
+-> SELECT 'select true as is_true', 'select ''2000-01-01''::date 
as party_over'
+-> \gexec
+ones
+
+   1
+(1 row)
+
+y double
+- --
+1  2
+2  4
+3  6
+4  8
+(4 rows)
+
+is_true
+---
+t
+(1 row)
+
+party_over
+--
+01-01-2000
+(1 row)
+
+
+
+The secondary queries are executed in top-to-bottom, left-to-right 
order, so the command
+above is the equivalent of:
+
+=> select 1 as ones;
+=> select x.y, x.y*2 as double from generate_series(1,4) as 
x(y);
+=> select true as is_true;
+=> select '2000-01-01'::date as party_over;
+
+
+
+If the query returns no rows, no error is raised, but no secondary 
query 
+is executed, either.
+
+=> SELECT 'select 1 as expect_zero_rows' where false
+-> \gexec
+
+
+
+
+Results that are not valid SQL will of course fail, and the execution 
of further
+secondary statements is subject to the current \ON_ERROR_STOP setting.
+
+=> SELECT 'a', 'select 1', 'b'
+-> \gexec
+ERROR:  syntax error at or near "a"
+LINE 1: a
+^
+?column?
+
+   1
+(1 row)
+ERROR:  syntax error at or near "b"
+LINE 1: b
+^
+=> \set ON_ERROR_STOP 1
+=> SELECT 'a', 'select 1', 'b'
+-> \gexec
+ERROR:  syntax error at or near "a"
+LINE 1: a
+^
+
+
+The results of the main query are sent directly to the server, without
+evaluation by psql. Therefore, they cannot contain psql vars or \ 
commands.
+
+
+  
+  
 \gset [ prefix ]
 
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 9750a5b..5ca769f 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -849,6 +849,13 @@ exec_command(const char *cmd,
status = PSQL_CMD_SEND;
}
 
+   /* \gexec -- send query and treat every result cell as a query to be 
executed */
+   else if (strcmp(cmd, "gexec") == 0)
+   {
+   pset.gexec_flag = true;
+   status = PSQL_CMD_SEND;
+   }
+
/

[HACKERS] Soliciting Feedback on Improving Server-Side Programming Documentation

2016-03-15 Thread Corey Huinker
Over the past few months, I've been familiarizing myself with postgres
server side programming in C.

My attempts to educate myself were slow and halting. The existing server
side programming documentation has some examples, but those examples didn't
show me how do what I wanted to do, and my research-via-google was highly
circular, almost always pointing back to the documentation I had already
found lacking, or a copy of it.

Most of what I have learned I have culled from asking people on IRC, or
bugging people I've met through user groups and PgConf. In all cases,
people have been extremely helpful. However, this method is inefficient,
because we're using two people's time, one of whom has to tolerate my
incessant questions and slow learning pace.

Furthermore, the helpful suggestions I received boiled down to:
1. The function/macro/var you're looking for is PG_FOO, git grep PG_FOO
2. Look in blah.c which does something like what you're trying to do
3. The comments in blah.h do a good job of listing and explaining this
macro or that

#1 git grep is a helpful reflex for discovering examples on my own, but it
requires that I have a term to search on in the first place, and too often
I don't know what I don't know.

#2 is the gold standard in terms of correctness (the code had to have
worked at least up to the last checkin date), and in terms of
discoverability it often gave me names of new macros to search for, coding
patterns, etc. However, I was always left with the questions: How would I
have figured this out on my own? How is the next person going to figure it
out? Why doesn't anybody document this?

#3 Often answers the last question in #2: It *is* documented, but that
documentation is not easily discoverable by conventional means.

So what I'd like to do is migrate some of the helpful information in the
header files into pages of web searchable documentation, and also to revamp
the existing documentation to be more relevant.

Along the way, I collected a list of things I wished I'd had from the start:

   - A list of all the GETARG_* macros. It would have been especially great
   if this were in table form:   Your Parameter Is A / Use This Macro / Which
   Gives This Result Type / Working example.
   - A list/table of the DatumGet* macros. I'm aware that many of them
   overlap/duplicate others. That'd be good to know too.
   - The table at
   http://www.postgresql.org/docs/9.1/static/errcodes-appendix.html has the
   numeric codes and PL/PGSQL constants enumerated. It'd be nice if it had the
   C #define as well
   - The SPI documentation mentions most/all of the SPI functions, but I
   couldn't find documentation on the SPI variables like SPI_processed and
   SPI_tuptable.
   - Examples and explanation of how PG_TRY()/PG_CATCH work. How to add
   context callbacks.
   - Direct Function Calls
   - A comparison of the two modes of writing SRF functions (Materialize vs
   multi-call)
   - Less explanation of how to write V0-style functions. That was
   called the "old style" back in version 7.1. Why is that information up
   front in the documentation when so much else is sequestered in header files?

Some of these things may seem obvious/trivial to you. I would argue that
they're only obvious in retrospect, and the more obvious-to-you things we
robustly document, the quicker we accumulate programmers who are capable of
agreeing that it's obvious, and that's good for the community.

I'm aware that some of these APIs change frequently. In those cases, I
suggest that we make note of that on the same page.

Because I'm still going through the learning curve, I'm probably the least
qualified to write the actual documentation. However, I have a clear memory
of what was hard to learn and I have the motivation to make it easier on
the next person. That makes me a good focal point for gathering,
formatting, and submitting the documentation in patch form. I'm
volunteering to do so. What I need from you is:

   - Citations of existing documentation in header files that could/should
   be exposed in our more formal documentation.
   - Explanations of any of the things above, which I can then reformat
   into proposed documentation.
   - A willingness to review the proposed new documentation
   - Reasoned explanations for why this is a fool's errand

You supply the expertise, I'll write the patch.

Thanks in advance.


Re: [HACKERS] Soliciting Feedback on Improving Server-Side Programming Documentation

2016-03-15 Thread Corey Huinker
>
>
> I think this is all great. You may find some automated assistance from
> doxygen.postgresql.org .
>
> Sincerely,
>
> JD


doxygen is great as far as it goes, but it does a great job of separating
function definition from the comment explaining the function, so I have to
drill into the raw source anyway.

Also, doxygen isn't very helpful with macros, and a lot of functions in the
internals are actually macros.


Re: [HACKERS] Soliciting Feedback on Improving Server-Side Programming Documentation

2016-03-15 Thread Corey Huinker
On Tue, Mar 15, 2016 at 1:19 PM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

> There's also a good deal of README files in the source tree, so I would
> add:
>
> 4. find src -name 'README*'
>

That too. But READMEs don't show up (easily) in a google search, so they
elude discovery. We should want to make discovery easy for the uninitiated.


Re: [HACKERS] Soliciting Feedback on Improving Server-Side Programming Documentation

2016-03-15 Thread Corey Huinker
On Tue, Mar 15, 2016 at 1:35 PM, Joshua D. Drake 
wrote:

> On 03/15/2016 10:30 AM, Corey Huinker wrote:
>
>>
>> On Tue, Mar 15, 2016 at 1:19 PM, Shulgin, Oleksandr
>> mailto:oleksandr.shul...@zalando.de>>
>> wrote:
>>
>> There's also a good deal of README files in the source tree, so I
>> would add:
>>
>> 4. find src -name 'README*'
>>
>>
>> That too. But README's don't show up (easily) in a google search, so
>> they elude discovery. We should want to make discovery easy to the
>> uninitiated.
>>
>
> I don't think anyone is arguing with you. I think we are trying to point
> you to sources for your project.
>
>
I didn't mean to imply that anyone was arguing. All responses so far have
been positive.


Re: [HACKERS] Soliciting Feedback on Improving Server-Side Programming Documentation

2016-03-15 Thread Corey Huinker
On Tue, Mar 15, 2016 at 4:38 PM, Álvaro Hernández Tortosa 
wrote:

>
> I started a similar thread with probably similar concerns:
> http://www.postgresql.org/message-id/56d1a6aa.6080...@8kdata.com
>
> I believe this effort should be done. I added to my TODO list to
> compile a list of used functions in a selection of picked extensions to use
> that as a starting point of an "API".
>
> Regards,
>
> Álvaro
>
>
Clearly we have the same goal in mind. I don't know how I missed seeing
your thread.


Re: [HACKERS] Re: Add generate_series(date,date) and generate_series(date,date,integer)

2016-03-19 Thread Corey Huinker
On Thu, Mar 17, 2016 at 10:00 AM, David Steele  wrote:

> On 3/17/16 4:49 AM, Dean Rasheed wrote:
>
> > On 16 March 2016 at 23:32, David Steele  wrote:
> >
> >>
> >> I think in this case it comes down to a committer's judgement so I have
> >> marked this "ready for committer" and passed the buck on to Álvaro.
> >
> > So I was pretty much "meh" on this patch too, because I'm not
> > convinced it actually saves much typing, if any.
> >
> > However, I now realise that it introduces a backwards-compatibility
> > breakage. Today it is possible to type
> >
> > SELECT * FROM generate_series('01-01-2000'::date, '01-04-2000', '7
> days');
>
> It can also be broken as below and this is even scarier to me:
>
> postgres=# SELECT * FROM generate_series('01-01-2000'::date,
> '01-04-2000'::date, '7 days');
> ERROR:  invalid input syntax for integer: "7 days"
> LINE 1: ...te_series('01-01-2000'::date, '01-04-2000'::date, '7 days');
>
> And only works when:
>
> postgres=# SELECT * FROM generate_series('01-01-2000'::date,
> '01-04-2000'::date, '7 days'::interval);
> generate_series
> ------------------------
>  2000-01-01 00:00:00+00
> (1 row)
>
> One might argue that string constants for dates in actual code might be
> rare but I think it's safe to say that string constants for intervals
> are pretty common.  I also think it unlikely that they are all
> explicitly cast.
>
> I marked this "waiting for author" so Corey can respond.  I actually
> tested with the v3 patch since the v4 patch seems to be broken with git
> apply or patch:
>
> $ patch -p1 < ../other/generate_series_date.v4.diff
> patching file doc/src/sgml/func.sgml
> Hunk #1 succeeded at 14787 (offset 87 lines).
> Hunk #2 succeeded at 14871 (offset 87 lines).
> patching file src/backend/utils/adt/date.c
> patch:  malformed patch at line 163: diff --git
> a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
>
> $ git apply -3 ../other/generate_series_date.v4.diff
> fatal: corrupt patch at line 163
>
> --
> -David
> da...@pgmasters.net
>

Not sure what's wrong with the patch, but I'll get a clean one to you pending
the outcome of the design discussion. v4 just ripped out the infinity
tests, so v3 is valid for the issue you found.

So the function clobbers the case where the user meant a timestamp range
but instead gave two dates, and the case where the user meant an interval
but gave a string. I'm curious how generate_series() for numeric avoided
the same issue with floats.

I see two ways around this:

1. Drop the step parameter entirely. My own use cases only ever require the
step values 1 and -1, and which one the user wants can be inferred from
(start < end). This would still change the output type where a person
wanted timestamps, but instead input two dates.
2. Rename the function date_series() or generate_series_date()
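
To illustrate option 1 (hypothetical behavior; no step parameter, with the
direction inferred from the bounds):

```sql
-- Hypothetical: with no step parameter, direction comes from (start < end)
SELECT * FROM generate_series('2000-01-03'::date, '2000-01-01'::date);
-- would yield 2000-01-03, 2000-01-02, 2000-01-01
```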

I still think this is an important function because at the last several
places I've worked, I've found myself manufacturing a query where some
event data is likely to have date gaps, but we want to see the date gaps or
at least have the 0 values contribute to a later average aggregate.

Like this:

select d.dt, sum(s.v1), sum(s2.v2), sum(t.v1), sum(t2.v2)
from   date_series_function(input1,input2) as d(dt),
   some_dimensional_characteristic c
left join sparse_table s on s.event_date = d.dt and s.c1 = c.cval
left join sparse_table s2 on s2.event_date = d.dt and s2.c2 = c.cval
left join other_sparse t on t.event_date = d.dt and t.c1 = c.cval
left join other_sparse t2 on t2.event_date = d.dt and t2.c2 = c.cval
group by d.dt
order by d.dt

This gets cumbersome when you have to cast every usage of your date column.

Furthermore, this is 90%+ of my usage of generate_series(), the remaining
10% being the integer case to populate sample data for tests.

Any thoughts on how to proceed?


Re: [HACKERS] Re: Add generate_series(date,date) and generate_series(date,date,integer)

2016-03-19 Thread Corey Huinker
On Thu, Mar 17, 2016 at 11:30 AM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> ​I'd call it "generate_dates(...)" and be done with it.
>

Sold. Hope to have a revised patch for you today or tomorrow.


Re: [HACKERS] Re: Add generate_series(date,date) and generate_series(date,date,integer)

2016-03-19 Thread Corey Huinker
On Thu, Mar 17, 2016 at 12:04 PM, Tom Lane  wrote:

> David Steele  writes:
> > On 3/17/16 11:30 AM, David G. Johnston wrote:
> >> ​I'd call it "generate_dates(...)" and be done with it.
> >> We would then have:
> >> generate_series()
> >> generate_subscripts()
> >> generate_dates()
>
> > To me this completely negates the idea of this "just working" which is
> > why it got a +1 from me in the first place.  If I have to remember to
> > use a different function name then I'd prefer to just cast on the
> > timestamp version of generate_series().
>
> Yeah, this point greatly weakens the desirability of this function IMO.
> I've also gone from "don't care" to "-1".
>
> regards, tom lane
>

Since that diminishes the already moderate support for this patch, I'll
withdraw it.


Re: [HACKERS] Declarative partitioning

2016-01-16 Thread Corey Huinker
>
>
> If we have a CREATE statement for each partition, how do we generalize
> that to partitions at different levels? For example, if we use something
> like the following to create a partition of parent_name:
>
> CREATE PARTITION partition_name OF parent_name FOR VALUES ...
> WITH ... TABLESPACE ...
>
> Do we then say:
>
> CREATE PARTITION subpartition_name OF partition_name ...
>


That's how I'd want it for partitions created after the initial partitioned
table is created.

I'd like to be able to identify the parent partition by its own
partitioning parameters rather than by name, the way we can derive the
name of an index in ON CONFLICT. But I see no clean way to do that, and if
one did come up, we'd simply allow the user to replace

with
table_name PARTITION partition_spec [...PARTITION partition_spec [
...PARTITION turtles_all_the_way_down]]).

Again, totally fine with forcing the maintenance script to know or discover
the name of the partition to be subpartitioned...for now.



> to create a level 2 partition (sub-partition) of parent_name? Obviously,
> as is readily apparent from the command, it is still a direct partition of
> partition_name for all internal purposes (consider partition list caching
> in relcache, recursive tuple routing, etc.) save some others.
>
> I ask that also because it's related to the choice of syntax to use to
> declare the partition key for the multi-level case. I'm considering the
> SUBPARTITION BY notation and perhaps we could generalize it to more than
> just 2 levels. So, for the above case, parent_name would have been created
> as:
>
> CREATE TABLE parent_name PARTITION BY ... SUBPARTITION BY ...


> Needless to say, when subpartition_name is created with the command we saw
> a moment ago, the root partitioned table would be locked. In fact, adding
> a partition anywhere in the hierarchy needs an exclusive lock on the root
> table. Also, partition rule (the FOR VALUES clause) would be validated
> against PARTITION BY or SUBPARTITION BY clause at the respective level.
>
> Although, I must admit I feel a little uneasy about the inherent asymmetry
> in using SUBPARTITION BY for key declaration whereas piggybacking CREATE
> PARTITION for creating sub-partitions. Is there a better way?
>

Provided that the syntax allows for N levels of partitioning, I don't care
if it's
PARTITION BY.., PARTITION BY..., PARTITION BY ...
or
PARTITION BY.., SUBPARTITION BY..., SUBPARTITION BY ...

The first is probably better for meta-coding purposes, but the second makes
it clear which partition layer is first.


>
> > As for the convenience syntax (if at all), how about:
> >
> > CREATE TABLE foo (
> >   ...
> > )
> > PARTITION BY ... ON (...)
> > SUBPARTITION BY ... ON (...)
> > opt_partition_list;
> >
> > where opt_partition_list is:
> >
> > PARTITIONS (
> >   partname FOR VALUES ... [WITH] [TABLESPACE] opt_subpart_list
> >   [, ...]
> > )
> >
> > where opt_subpart_list is:
> >
> > SUBPARTITIONS (
> >   subpartname FOR VALUES ... [WITH] [ TABLESPACE]
> >   [, ...]
> > )
>
> Do we want this at all? It seems difficult to generalize this to
> multi-level hierarchy of more than 2 levels.
>

I want this.

Granted, the syntax for a 3+ level partitioning scheme would be cumbersome,
but it is what the user wanted, and it follows the nested
PARTITION/SUBPARTITION structure. In those
cases, the user might opt to not create more than the default first
subpartition to keep the syntax sane, or we might auto-generate default
partitions (with a VALUES clause of whatever "all values" is for that
datatype...again, this is an area where leveraging range types would be
most valuable).



> On one hand, I think to keep treating "partition hierarchies" as
> "inheritance hierachies" might have some issues. I am afraid that
> documented inheritance semantics may not be what we want to keep using for
> the new partitioned tables. By that, I mean all the user-facing behaviors
> where inheritance has some bearing. Should it also affect new partitioned
> tables? Consider whether inheritance semantics would render infeasible
> some of the things that we'd like to introduce for the new partitioned
> tables such as automatic tuple routing, or keep us from improving planner
> smarts and executor capabilities for partitioned tables over what we
> already have.
>

I feel that automatic tuple routing should be considered the key benefit
of "real" partitions over inherited tables. Trigger maintenance is most of
the work in custom partitioning schemes, at least the ones I've written.

There's a great chance that not everyone cares right now about this part
> of the new partitioning but just want to put it out there. There are more
> contentious issues like the syntax, partitioning maintenance commands that
> we plan to support (now or later) and such.
>

What I've read so far addresses most of my concerns.

Still somewhat on my mind:

1. ability to describe partition bounds via range types, regardless of
whether the Automatic T

Re: [HACKERS] Advices on custom data type and extension development

2016-01-19 Thread Corey Huinker
>
> Seriously, you should consider having a primary key with two
> columns, of type date and int.  It would take exactly the same
> space as your current plan, and performance should be very close to
> what you propose.  As long as you aren't using some ORM that is too
> dumb to deal with this, it should be far easier than creating the
> custom type.
>

+1

Most ORMs cannot handle ENUMs, let alone user defined composite types.

That, or they *flood* the database with SELECT * FROM pg_type WHERE ...
queries. I'm looking at you, Cake.

You're far better off trying a (date,integer) key as Kevin said.

If the ORM doesn't allow that, I'd suggest a custom function that encodes
the date bit-shifted to the high 4 bytes, and then adds in the four bytes
from a cycling sequence. At least then you've got a shot at partitioning,
though the lower/upper bounds of the partitions would not make sense to the
casual observer.
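
A minimal sketch of that encoding (all names hypothetical; assumes a 4-byte
day count relative to a fixed epoch):

```sql
-- Hypothetical: pack days-since-epoch into the high 4 bytes of a bigint,
-- with a cycling 4-byte sequence in the low bytes.
CREATE SEQUENCE row_seq CYCLE MAXVALUE 4294967295;

CREATE FUNCTION make_dated_id(d date) RETURNS bigint AS $$
  SELECT ((d - date '2000-01-01')::bigint << 32) | nextval('row_seq')
$$ LANGUAGE sql;
```

All ids generated on a given date then fall in a contiguous bigint range,
which is what makes range partitioning on the id possible.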


Re: [HACKERS] Declarative partitioning

2016-01-22 Thread Corey Huinker
>
>
> So for now, you create an empty partitioned table specifying all the
> partition keys without being able to  define any partitions in the same
> statement. Partitions (and partitions thereof, if any) will be created
> using CREATE PARTITION statements, one for each.
>

...and I would assume that any attempt to insert into a partitioned table
with no partitions (or lacking partitions at a defined level) would be an
error? If so, I'd be ok with that.


> Specifying range partitioning bound as PARTITION FOR RANGE 
> sounds like it offers some flexibility, which can be seen as a good thing.
> But it tends to make internal logic slightly complicated.
>
> Whereas, saying PARTITION FOR VALUES LESS THAN (max1, max2, ...) is
> notationally simpler still and easier to work with internally. Also, there
> will be no confusion about exclusivity of the bound if we document it so.


I understand wanting the internal rules to be simple. Oracle clearly went
with VALUES LESS THAN waterfalls for that reason.

What I'm hoping to avoid is:
- having to identify my "year2014" partition by VALUES LESS THAN
'2015-01-01', a bit of cognitive dissonance defining data by what it's not.
- and then hoping that there is a year2013 partition created by someone
with similar sensibilities, the partition definition being incomplete
outside of the context of other partition definitions.
- and then further hoping that nobody drops the year2013 partition, thus
causing new 2013 rows to fall into the year2014 partition, a side effect of
an operation that did not mention the year2014 partition.

Range types avoid all of that. And if we're concerned about range-type
overhead, we're only dealing with the ranges at DDL time; we can break the
ATR rules down into a more easily digestible form once the partition is
modified.

Range continuity can be tested with -|-, but we'd only need to test for
overlaps: gaps in ranges are sometimes a feature, not a bug (ex: I don't
want any rows from future dates and we weren't in business before 1997).

Also, VALUES LESS THAN forces us to use discrete values. There is no way
to express with VALUES LESS THAN partitions that have float bounds, such
as these for temperature:
ice (,0.0), water [0.0,212.0], steam (212.0,3000.0], plasma (3000.0,).

Yes, I can calculate the day after the last day in a year, I can use
212.01, I can write code to rigorously check that all partitions
are in place. I'd just rather not.
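
For what it's worth, the overlap and continuity tests above are one-liners
with range types (using the temperature bounds from the example):

```sql
SELECT numrange(0.0, 212.0, '[]') && numrange(212.0, 3000.0, '(]');   -- overlap? false
SELECT numrange(0.0, 212.0, '[]') -|- numrange(212.0, 3000.0, '(]');  -- adjacent? true
```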

p.s. I'm really excited about what this will bring to Postgres in general
and my organization in particular. This feature alone will help chip away
at our needs for Vertica and Redshift clusters. Let me know if there's
anything I can do to help.


[HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-01-24 Thread Corey Huinker
This patch addresses a personal need: nearly every time I use
generate_series for timestamps, I end up casting the result into date or
the ISO string thereof. Like such:

SELECT d.dt::date as dt
FROM generate_series('2015-01-01'::date,
 '2016-01-04'::date,
 interval '1 day') AS d(dt);


That's less than elegant.

With this patch, we can do this:


SELECT d.date_val FROM
generate_series('1991-09-24'::date,'1991-10-01'::date) as d(date_val);
  date_val
------------
 1991-09-24
 1991-09-25
 1991-09-26
 1991-09-27
 1991-09-28
 1991-09-29
 1991-09-30
 1991-10-01
(8 rows)

SELECT d.date_val FROM
generate_series('1991-09-24'::date,'1991-10-01'::date,7) as d(date_val);
  date_val
------------
 1991-09-24
 1991-10-01
(2 rows)

SELECT d.date_val FROM
generate_series('1999-12-31'::date,'1999-12-29'::date,-1) as d(date_val);
  date_val
------------
 1999-12-31
 1999-12-30
 1999-12-29
(3 rows)


One thing I discovered in doing this patch is that if you do a timestamp
generate_series involving infinity, it tries to do it. I didn't wait to
see if it finished.

For the date series, I put in checks to return an empty set:

SELECT d.date_val FROM
generate_series('-infinity'::date,'1999-12-29'::date) as d(date_val);
 date_val
----------
(0 rows)

SELECT d.date_val FROM generate_series('1991-09-24'::date,'infinity'::date)
as d(date_val);
 date_val
----------
(0 rows)



Notes:
- I borrowed the int4 implementation's check for a step size of 0 for POLA
reasons. However, it occurred to me that the function might be leakproof if
the behavior were changed to instead return an empty set. I'm not sure
that leakproof is a goal in and of itself.

First attempt at this patch attached. The examples above are copied from
the new test cases.
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9c143b2..15ebe47 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14657,6 +14657,26 @@ AND
   
  
 
+ 
+  generate_series(start, stop)
+  date
+  setof date
+  
+   Generate a series of values, from start to stop
+   with a step size of one day
+  
+ 
+
+ 
+  generate_series(start, stop, step integer)
+  date
+  setof date
+  
+   Generate a series of values, from start to stop
+   with a step size of step
+  
+ 
+
 

   
@@ -14721,6 +14741,26 @@ SELECT * FROM generate_series('2008-03-01 00:00'::timestamp,
  2008-03-03 22:00:00
  2008-03-04 08:00:00
 (9 rows)
+
+SELECT d.date_val FROM generate_series('1991-09-24'::date,'1991-10-01'::date) as d(date_val);
+  date_val  
+------------
+ 1991-09-24
+ 1991-09-25
+ 1991-09-26
+ 1991-09-27
+ 1991-09-28
+ 1991-09-29
+ 1991-09-30
+ 1991-10-01
+(8 rows)
+
+SELECT d.date_val FROM generate_series('1991-09-24'::date,'1991-10-01'::date,7) as d(date_val);
+  date_val  
+------------
+ 1991-09-24
+ 1991-10-01
+(2 rows)
 
   
 
diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
index 332db7e..7404a2f 100644
--- a/src/backend/utils/adt/date.c
+++ b/src/backend/utils/adt/date.c
@@ -30,6 +30,7 @@
 #include "utils/datetime.h"
 #include "utils/nabstime.h"
 #include "utils/sortsupport.h"
+#include "funcapi.h"
 
 /*
  * gcc's -ffast-math switch breaks routines that expect exact results from
@@ -2811,3 +2812,97 @@ timetz_izone(PG_FUNCTION_ARGS)
 
PG_RETURN_TIMETZADT_P(result);
 }
+
+/* Corey BEGIN */
+typedef struct
+{
+   DateADT current;
+   DateADT finish;
+   int32   step;
+} generate_series_date_fctx;
+
+
+/* generate_series_date()
+ * Generate the set of dates from start to finish by step
+ */
+Datum
+generate_series_date(PG_FUNCTION_ARGS)
+{
+   return generate_series_step_date(fcinfo);
+}
+
+Datum
+generate_series_step_date(PG_FUNCTION_ARGS)
+{
+   FuncCallContext *funcctx;
+   generate_series_date_fctx *fctx;
+   DateADT result;
+
+   /* stuff done only on the first call of the function */
+   if (SRF_IS_FIRSTCALL())
+   {
+   DateADT start = PG_GETARG_DATEADT(0);
+   DateADT finish = PG_GETARG_DATEADT(1);
+   int32   step = 1;
+
+   /* see if we were given an explicit step size */
+   if (PG_NARGS() == 3)
+   step = PG_GETARG_INT32(2);
+   if (step == 0)
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+    errmsg("step size cannot equal zero")));
+
+   MemoryContext oldcontext;
+
+   /* create a function context for cross-call persistence */
+   funcctx = SRF_FIRSTCALL_INIT();
+
+   /*
+    * switch to memory context appropriate for multiple function calls
+    */
+   oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+

Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-01-24 Thread Corey Huinker
>
>
> If it didn't respond to SIGINT, that would be an issue, but otherwise
> this doesn't seem much more exciting than any other way to create a
> query that will run longer than you want to wait.
>
> regards, tom lane
>

It responded to SIGINT, so yeah, meh.

I can see value in aligning the behavior of infinity queries between date
and timestamp, but I have no strong opinion about which behavior is better:
it's either set step = 0 or an ereport(), no biggie if we want to handle
the condition, I rip out the DATE_NOT_FINITE() checks.

Incidentally, is there a reason behind the tendency of internal functions
to avoid parameter defaults in favor of multiple pg_proc entries? I copied
the existing behavior of the int4 generate_series, but having one entry
with the defaults seemed more self-documenting.


Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-01-25 Thread Corey Huinker
On Mon, Jan 25, 2016 at 2:07 AM, Tom Lane  wrote:

> Corey Huinker  writes:
> > Incidentally, is there a reason behind the tendency of internal functions
> > to avoid parameter defaults in favor of multiple pg_proc entries?
>
> Yes: you can't specify defaults in pg_proc.h.
>
> We work around that where absolutely necessary, see the last part of
> system_views.sql.  But it's messy enough, and bug-prone enough, to be
> better avoided --- for example, it's very easy for the redeclaration
> in system_view.sql to forget a STRICT or other similar marking.
>
> Personally I'd say that every one of the existing cases that simply has
> a default for the last argument was a bad idea, and would better have
> been done in the traditional way with two pg_proc.h entries.
>
> regards, tom lane
>

...which answers my follow-up question about where make_interval was
getting the defaults I saw in the table but not in the .h file. Thanks!
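
For anyone following along, the workaround Tom describes amounts to a
redeclaration along these lines (sketch only; the signature and markings
shown are hypothetical for the date case):

```sql
-- pg_proc.h declares the 3-arg form; system_views.sql then re-creates it
-- with a default, and must remember to repeat markings such as STRICT.
CREATE OR REPLACE FUNCTION
  pg_catalog.generate_series(start date, stop date, step int DEFAULT 1)
  RETURNS SETOF date
  LANGUAGE internal STRICT IMMUTABLE
  AS 'generate_series_step_date';
```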


Re: [HACKERS] [POC] FETCH limited by bytes.

2016-01-25 Thread Corey Huinker
>
>
> Review:
>
> - There is an established idiom in postgres_fdw for how to extract
> data from lists of DefElems, and this patch does something else
> instead.  Please make it look like postgresIsForeignRelUpdatable,
> postgresGetForeignRelSize, and postgresImportForeignSchema instead of
> inventing something new.  (Note that your approach would require
> multiple passes over the list if you needed information on multiple
> options, whereas the existing approach does not.)
>

I will look into that. The original patch pre-dates import foreign schema,
so I'm not surprised that I missed the established pattern.


>
> - I think the comment in InitPgFdwOptions() could be reduced to a
> one-line comment similar to those already present.
>

Aye.


> - I would reduce the debug level on the fetch command to something
> lower than DEBUG1, or drop it entirely.  If you keep it, you need to
> fix the formatting so that the line breaks look like other ereport()
> calls.
>

As I mentioned, I plan to drop it entirely.  It was only there to show a
reviewer that it works as advertised. There's not much to see without it.


> - We could consider folding fetch_size into "Remote Execution
> Options", but maybe that's too clever.
>

If you care to explain, I'm listening. Otherwise I'm going forward with the
other suggestions you've made.


>
> I would not worry about trying to get this into the regression tests.
>
>
Happy to hear that.


Re: [HACKERS] [POC] FETCH limited by bytes.

2016-01-25 Thread Corey Huinker
On Mon, Jan 25, 2016 at 3:22 PM, Robert Haas  wrote:

> On Mon, Jan 25, 2016 at 1:21 PM, Corey Huinker 
> wrote:
> >> - We could consider folding fetch_size into "Remote Execution
> >> Options", but maybe that's too clever.
> >
> > If you care to explain, I'm listening. Otherwise I'm going forward with
> the
> > other suggestions you've made.
>
> It's just a little unfortunate to have multiple sections with only a
> single option in each.  It would be nice to avoid that somehow.
>
>
Revised in patch v3:
* get_option() and get_fetch_size() removed, fetch_size searches added to
existing loops.
* Move fetch_size <= 0 tests into postgres_fdw_validator() routine in
option.c
* DEBUG1 message removed, never intended that to live beyond the proof of
concept.
* Missing regression test mentioned in makefile de-mentioned, as there's
nothing to see without the DEBUG1 message.
* Multi-line comment shrunk

(There's a v2 patch that is prior to the change to postgres_fdw_validator()
in option.c, but in retrospect that's not interesting to you).

I'm not too keen on having *no* new regression tests, but defer to your
judgement.

Still not sure what you mean by remote execution options. But it might be
simpler now that the patch is closer to your expectations.


v3.Make-fetch_size-settable-per-server-and-table.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-01-26 Thread Corey Huinker
On Tue, Jan 26, 2016 at 7:53 AM, Michael Paquier 
wrote:

> On Tue, Jan 26, 2016 at 7:00 PM, Torsten Zuehlsdorff
>  wrote:
> >
> > On 26.01.2016 07:52, Simon Riggs wrote:
> >
> >>> Imagine for example a script that in some rare cases passes happens to
> >>> pass infinity into generate_series() - in that case I'd much rather
> error
> >>> out than wait till the end of the universe.
> >>>
> >>> So +1 from me to checking for infinity.
> >>>
> >>
> >> +1
> >>
> >> ERROR infinite result sets are not supported, yet
> >
> >
> > Maybe we should skip the "yet". Or do we really plan to support them in
> > (infinite) future? ;)
> >
> > +1 from me to check infinity also.
>
> Something like the patch attached would be fine? This wins a backpatch
> because the query continuously running eats memory, no?
> --
> Michael
>
>
>
I'll modify my generate_series_date to give similar errors.

I have no opinion on backpatch.

+1 for flippant references to infinity.


Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-01-26 Thread Corey Huinker
Revised patch:

Differences in this patch vs my first one:
- infinite bounds generate errors identical to Michael's timestamp patch
(though I did like Simon's highly optimistic error message).
- Bounds checking moved before memory context allocation
- arg variable "finish" renamed "stop" to match parameter name in
documentation and error message

On Tue, Jan 26, 2016 at 12:47 PM, Corey Huinker 
wrote:

> On Tue, Jan 26, 2016 at 7:53 AM, Michael Paquier <
> michael.paqu...@gmail.com> wrote:
>
>> On Tue, Jan 26, 2016 at 7:00 PM, Torsten Zuehlsdorff
>>  wrote:
>> >
>> > On 26.01.2016 07:52, Simon Riggs wrote:
>> >
>> >>> Imagine for example a script that in some rare cases passes happens to
>> >>> pass infinity into generate_series() - in that case I'd much rather
>> error
>> >>> out than wait till the end of the universe.
>> >>>
>> >>> So +1 from me to checking for infinity.
>> >>>
>> >>
>> >> +1
>> >>
>> >> ERROR infinite result sets are not supported, yet
>> >
>> >
>> > Maybe we should skip the "yet". Or do we really plan to support them in
>> > (infinite) future? ;)
>> >
>> > +1 from me to check infinity also.
>>
>> Something like the patch attached would be fine? This wins a backpatch
>> because the query continuously running eats memory, no?
>> --
>> Michael
>>
>>
>>
> I'll modify my generate_series_date to give similar errors.
>
> I have no opinion on backpatch.
>
> +1 for flippant references to infinity.
>
>


v2.generate_series_date.patch
Description: Binary data



Re: [HACKERS] [POC] FETCH limited by bytes.

2016-01-27 Thread Corey Huinker
>
>
> Looks pretty good.  You seem to have added a blank line at the end of
> postgres_fdw.c, which should be stripped back out.
>

Done.


>
> > I'm not too keen on having *no* new regression tests, but defer to your
> > judgement.
>
> So... I'm not *opposed* to regression tests.  I just don't see a
> reasonable way to write one.  If you've got an idea, I'm all ears.
>

The possible tests might be:
- creating a server/table with fetch_size set
- altering an existing server/table to have fetch_size set
- verifying that the value exists in pg_foreign_server.srvoptions and
pg_foreign_table.ftoptions.
- attempting to set fetch_size to a non-integer value

None of which are very interesting, and none of which actually test what
fetch_size was actually used.

But I'm happy to add any of those that seem worthwhile.
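
In sketch form, those cases would look something like this (server and
table names hypothetical):

```sql
CREATE SERVER loopback2 FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (fetch_size '200');
ALTER SERVER loopback2 OPTIONS (SET fetch_size '500');
CREATE FOREIGN TABLE ft1 (c1 int) SERVER loopback2
  OPTIONS (fetch_size '101');
SELECT srvoptions FROM pg_foreign_server WHERE srvname = 'loopback2';
SELECT ftoptions FROM pg_foreign_table WHERE ftrelid = 'ft1'::regclass;
ALTER SERVER loopback2 OPTIONS (SET fetch_size 'nonsense');  -- expect error
```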


> > Still not sure what you mean by remote execution options. But it might be
> > simpler now that the patch is closer to your expectations.
>
> I'm talking about the documentation portion of the patch, which
> regrettably seems to have disappeared between v2 and v3.
>

Ah, didn't realize you were speaking about the documentation, and since
that section was new, I wasn't familiar with the name. Moved.

...and not sure why the doc change didn't make it into the last patch, but
it's in this one.

I also moved the paragraph starting "When using the extensions option, *it
is the user's responsibility* that..." such that it is in the same
varlistentry as "extensions", though maybe that change should be delegated
to the patch that created the extensions option.

Enjoy.
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 86a5789..f89de2f 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -131,6 +131,17 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 			/* check list syntax, warn about uninstalled extensions */
 			(void) ExtractExtensionList(defGetString(def), true);
 		}
+		else if (strcmp(def->defname, "fetch_size") == 0)
+		{
+			int		fetch_size;
+
+			fetch_size = strtol(defGetString(def), NULL, 10);
+			if (fetch_size <= 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("%s requires a positive integer value",
+								def->defname)));
+		}
 	}
 
 	PG_RETURN_VOID();
@@ -162,6 +173,9 @@ InitPgFdwOptions(void)
 		/* updatable is available on both server and table */
 		{"updatable", ForeignServerRelationId, false},
 		{"updatable", ForeignTableRelationId, false},
+		/* fetch_size is available on both server and table */
+		{"fetch_size", ForeignServerRelationId, false},
+		{"fetch_size", ForeignTableRelationId, false},
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 374faf5..f71bf61 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -68,7 +68,9 @@ enum FdwScanPrivateIndex
 	/* SQL statement to execute remotely (as a String node) */
 	FdwScanPrivateSelectSql,
 	/* Integer list of attribute numbers retrieved by the SELECT */
-	FdwScanPrivateRetrievedAttrs
+	FdwScanPrivateRetrievedAttrs,
+	/* Integer representing the desired fetch_size */
+	FdwScanPrivateFetchSize
 };
 
 /*
@@ -126,6 +128,8 @@ typedef struct PgFdwScanState
 	/* working memory contexts */
 	MemoryContext batch_cxt;	/* context holding current batch of tuples */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
+
+	int			fetch_size;		/* number of tuples per fetch */
 } PgFdwScanState;
 
 /*
@@ -380,6 +384,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	fpinfo->fdw_startup_cost = DEFAULT_FDW_STARTUP_COST;
 	fpinfo->fdw_tuple_cost = DEFAULT_FDW_TUPLE_COST;
 	fpinfo->shippable_extensions = NIL;
+	fpinfo->fetch_size = 100;
 
 	foreach(lc, fpinfo->server->options)
 	{
@@ -394,16 +399,17 @@ postgresGetForeignRelSize(PlannerInfo *root,
 		else if (strcmp(def->defname, "extensions") == 0)
 			fpinfo->shippable_extensions =
 ExtractExtensionList(defGetString(def), false);
+		else if (strcmp(def->defname, "fetch_size") == 0)
+			fpinfo->fetch_size = strtol(defGetString(def), NULL, 10);
 	}
 	foreach(lc, fpinfo->table->options)
 	{
 		DefElem*def = (DefElem *) lfirst(lc);
 
 		if (strcmp(def->defname, "use_remote_estimate") == 0)
-		{
 			fpinfo->use_remote_estimate = defGetBoolean(def);
-			break;/* only need the one value */
-		}
+		else if (strcmp(def->defname, "fetch_size") == 0)
+			fpinfo->fetch_size = strtol(defGetString(def), NULL, 10);
 	}
 
 	/*
@@ -1069,6 +1075,7 @@ postgresGetForeignPlan(PlannerInfo *root,
 	 */
-	fdw_private = list_make2(makeString(sql.data),
-							 retrieved_attrs);
+	fdw_private = list_make3(makeString(sql.data),
+							 retrieved_attrs,
+							 makeInteger(fpinfo->fetch_size));
 
 	/*
 	 * Create the ForeignScan node from target list, filtering expressions,
@@ -1147,6 +1156,8 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 FdwScanPrivateSelectSql));
 

Re: [HACKERS] Re: Add generate_series(date, date) and generate_series(date, date, integer)

2016-02-02 Thread Corey Huinker
Doh, I left that comment to myself in there. :)

The corresponding structs in timestamp.c and int.c have no comment, so
suggestions of what should be there are welcome. In the interim I put in
this:

/* state for generate_series_date(date,date,[step]) */


Extra linefeed after struct removed.

Do you have any insight as to why the documentation test failed?

In the mean time, here's the updated patch.


On Tue, Feb 2, 2016 at 11:41 AM, David Steele  wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   tested, passed
> Documentation:tested, failed
>
> Everything looks good except for two minor issues:
>
> 1) There should be an informative comment for this struct:
>
> +/* Corey BEGIN */
> +typedef struct
> +{
> +   DateADT current;
> +   DateADT stop;
> +   int32   step;
> +} generate_series_date_fctx;
>
> 2) There's an extra linefeed after the struct.  Needed?
>
> Regards,
> -David
>
> The new status of this patch is: Waiting on Author
>
>


v3.generate_series_date.patch
Description: Binary data



Re: [HACKERS] Re: Add generate_series(date, date) and generate_series(date, date, integer)

2016-02-02 Thread Corey Huinker
>
>
> > Do you have any insight as to why the documentation test failed?
> >
> > In the mean time, here's the updated patch.
>
> I didn't pass the docs on account of the wonky comment since I consider
> code comments to be part of the docs.  The sgml docs build fine and look
> good to me.
>
>
Ah, understood. I rebuilt the docs thinking there was a make check failure
I missed, then viewed the html in links (I develop on an AWS box) and saw
nothing untoward.


> I'll retest and update the review accordingly.
>

Thanks!


Re: [HACKERS] [POC] FETCH limited by bytes.

2016-02-02 Thread Corey Huinker
>
>
> postgres_fdw.c:2642:16: error: use of undeclared identifier 'server'
> foreach(lc, server->options)
>

Odd. Compiled for me. Maybe I messed up the diff. Will get back to you on
that with the tests suggested.



> ^
> ../../src/include/nodes/pg_list.h:153:26: note: expanded from macro
> 'foreach'
> for ((cell) = list_head(l); (cell) != NULL; (cell) = lnext(cell))
> ^


Didn't modify this file on this or any other work of mine. Possible garbage
from a git pull. Will investigate.


Re: [HACKERS] [POC] FETCH limited by bytes.

2016-02-02 Thread Corey Huinker
>
>
> I don't see how.  There really is no declaration in there for a
> variable called server.
>

Absolutely correct. My only guess is that it was from failing to make clean
after a checkout/re-checkout. A good reason to have even boring regression
tests.

Regression tests added.
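For review purposes, the option precedence the patch implements can be modelled as follows — a Python sketch rather than the actual C, with made-up option lists standing in for the server's and table's DefElem lists: the hard-coded default is 100, a server-level fetch_size overrides it, and a table-level fetch_size overrides both.

```python
def resolve_fetch_size(server_options, table_options, default=100):
    """Model of the lookup order in postgresGetForeignRelSize: start
    from the default, let a server-level fetch_size override it, then
    let a table-level fetch_size override that."""
    size = default
    for options in (server_options, table_options):
        for name, value in options:
            if name == "fetch_size":
                size = int(value)
    return size
```

So a table created with fetch_size '3' under a server with fetch_size '101' fetches 3 rows at a time, which is what the regression tests below exercise.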
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2390e61..e28cf77 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3951,3 +3951,63 @@ QUERY:  CREATE FOREIGN TABLE t5 (
 OPTIONS (schema_name 'import_source', table_name 't5');
 CONTEXT:  importing foreign table "t5"
 ROLLBACK;
+BEGIN;
+CREATE SERVER fetch101 FOREIGN DATA WRAPPER postgres_fdw OPTIONS( fetch_size '101' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'fetch101'
+AND srvoptions @> array['fetch_size=101'];
+ count 
+---
+ 1
+(1 row)
+
+ALTER SERVER fetch101 OPTIONS( SET fetch_size '202' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'fetch101'
+AND srvoptions @> array['fetch_size=101'];
+ count 
+---
+ 0
+(1 row)
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'fetch101'
+AND srvoptions @> array['fetch_size=202'];
+ count 
+---
+ 1
+(1 row)
+
+CREATE FOREIGN TABLE table3 ( x int ) SERVER fetch101 OPTIONS ( fetch_size '3' );
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table3'::regclass
+AND ftoptions @> array['fetch_size=3'];
+ count 
+---
+ 1
+(1 row)
+
+ALTER FOREIGN TABLE table3 OPTIONS ( SET fetch_size '6');
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table3'::regclass
+AND ftoptions @> array['fetch_size=3'];
+ count 
+---
+ 0
+(1 row)
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table3'::regclass
+AND ftoptions @> array['fetch_size=6'];
+ count 
+---
+ 1
+(1 row)
+
+ROLLBACK;
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 86a5789..f89de2f 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -131,6 +131,17 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 			/* check list syntax, warn about uninstalled extensions */
 			(void) ExtractExtensionList(defGetString(def), true);
 		}
+		else if (strcmp(def->defname, "fetch_size") == 0)
+		{
+			int		fetch_size;
+
+			fetch_size = strtol(defGetString(def), NULL,10);
+			if (fetch_size <= 0)
+ereport(ERROR,
+		(errcode(ERRCODE_SYNTAX_ERROR),
+		 errmsg("%s requires a positive integer value",
+def->defname)));
+		}
 	}
 
 	PG_RETURN_VOID();
@@ -162,6 +173,9 @@ InitPgFdwOptions(void)
 		/* updatable is available on both server and table */
 		{"updatable", ForeignServerRelationId, false},
 		{"updatable", ForeignTableRelationId, false},
+		/* fetch_size is available on both server and table */
+		{"fetch_size", ForeignServerRelationId, false},
+		{"fetch_size", ForeignTableRelationId, false},
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2ab85f6..8f72ff7 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -68,7 +68,9 @@ enum FdwScanPrivateIndex
 	/* SQL statement to execute remotely (as a String node) */
 	FdwScanPrivateSelectSql,
 	/* Integer list of attribute numbers retrieved by the SELECT */
-	FdwScanPrivateRetrievedAttrs
+	FdwScanPrivateRetrievedAttrs,
+	/* Integer representing the desired fetch_size */
+	FdwScanPrivateFetchSize
 };
 
 /*
@@ -126,6 +128,8 @@ typedef struct PgFdwScanState
 	/* working memory contexts */
 	MemoryContext batch_cxt;	/* context holding current batch of tuples */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
+
+	int			fetch_size;		/* number of tuples per fetch */
 } PgFdwScanState;
 
 /*
@@ -380,6 +384,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	fpinfo->fdw_startup_cost = DEFAULT_FDW_STARTUP_COST;
 	fpinfo->fdw_tuple_cost = DEFAULT_FDW_TUPLE_COST;
 	fpinfo->shippable_extensions = NIL;
+	fpinfo->fetch_size = 100;
 
 	foreach(lc, fpinfo->server->options)
 	{
@@ -394,16 +399,17 @@ postgresGetForeignRelSize(PlannerInfo *root,
 		else if (strcmp(def->defname, "extensions") == 0)
 			fpinfo->shippable_extensions =
 ExtractExtensionList(defGetString(def), false);
+		else if (strcmp(def->defname, "fetch_size") == 0)
+			fpinfo->fetch_size = strtol(defGetString(def), NULL,10);
 	}
 	foreach(lc, fpinfo->table->options)
 	{
 		DefElem*def = (DefElem *) lfirst(lc);
 
 		if (strcmp(def->defname, "use_remote_estimate") == 0)
-		{
 			fpinfo->use_remote_estimate = defGetBoolean(def);
-			break;/* only need the one value */
-		}
+		else if (strcmp(def->defname, "fetch_size") == 0)
+			fpinfo->fetch_size = strtol(defGetString(def), NULL,10);
 	}
 
 	/*
@@ -1012,6 +1018,9 @@ postgresGetForeignPlan(PlannerInfo *root,
 	 */
 	fdw_private = list_make2(makeString(sql.data

Re: [HACKERS] enable parallel query by default?

2016-02-08 Thread Corey Huinker
>
> I think that's an argument to enable it till at least beta1. Let's
>> change the default, and add an item to the open items list to reconsider
>> then.
>>
>>
>
+1 during the beta, +0.95 for default thereafter.

I think that most databases in the past have defaulted to single-core
unless otherwise stated because machines with multiple cores were
uncommon, and queries that could intelligently use parallelism were even
more uncommon. So for them, the name of the game was "plan stability".

Machines are architected to be multicore now, and that will be the norm
going forward, as will larger workloads that can easily overwhelm a single
CPU. So I think Postgres should just enable parallel out of the box.

Having said that, it seems like the sort of thing I'd want set-able on a
per-user basis.

ALTER ROLE overly_needy_web_client SET max_parallel_degree = 1;
ALTER ROLE moar_powarrr SET max_parallel_degree = 32;

And of course this is my chance to re-ask that we not block the possibility
of one day being able to set this value relative to the number of cores
available on the machine, i.e. this user can have a parallel degree 2x the
number of CPUs, this one can only have 0.25x as many CPUs. It would be nice
to have our configurations adapt with the hardware.
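To illustrate what I mean (purely a sketch — no such multiplier syntax exists in PostgreSQL today, and the function name is invented), a relative setting might resolve like this:

```python
import os

def resolve_parallel_degree(setting, ncpu=None):
    """Hypothetical resolution of a per-role parallel degree: an
    integer is taken as-is, while a string like '2x' or '0.25x' is
    scaled by the number of CPUs on the machine."""
    if ncpu is None:
        ncpu = os.cpu_count() or 1
    if isinstance(setting, int):
        return max(0, setting)
    if isinstance(setting, str) and setting.endswith("x"):
        return max(1, int(float(setting[:-1]) * ncpu))
    raise ValueError("unrecognized parallel degree setting: %r" % (setting,))
```

That way a hypothetical ALTER ROLE moar_powarrr SET max_parallel_degree = '2x' would track hardware upgrades without a config change.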


Re: [HACKERS] proposal: schema PL session variables

2016-02-09 Thread Corey Huinker
On Tue, Feb 9, 2016 at 9:58 AM, Pavel Stehule 
wrote:

>
>
> 2016-02-09 15:32 GMT+01:00 Marko Tiikkaja :
>
>> On 08/02/16 14:16, Pavel Stehule wrote:
>>
>>> 2016-02-08 13:53 GMT+01:00 Marko Tiikkaja :
>>>

 Yeah, and that's exactly what I don't want, because that means that
 CREATE
 SCHEMA VARIABLE suddenly breaks existing code.


>>> theoretically yes, but this conflict can be 100% detected - so no quiet
>>> bug
>>> is possible, and plpgsql_check can find this issue well. If you don't use
>>> schema variable, then your code will be correct. You have to explicitly
>>> create the variable, and if there will be any problem, then the problem
>>> will be only in PL functions in one schema. And you can identify it by
>>> statical analyse.
>>>
>>
>> I'm sorry, but I think you've got your priorities completely backwards.
>> You're saying that it's OK to add a footgun because blown-off pieces of
>> feet can be found by using a third party static analyzer barely anyone
>> uses.  And at best, that footgun is only a very minor convenience (though
>> I'd argue that omitting it actually hurts readability).
>>
>
> I am not blocking the integration of plpgsql_check upstream. I spent
> hundreds of hours on it.
>
> Can we look on this problem with different side? What I can do it for safe
> using proposed schema variables.
>
> The possible ways:
>
> 1. requirement prefix like : or @. I don't prefer it because a) hard to
> find an agreement - Oracle fans like ":", MSSQL like @, others maybe $, b)
> with any currently unsupported syntax I have to fix SQL lexer, parser
>
> 2. requirement to use qualified name everywhere - it can works, but I
> don't prefer it, because it can sometimes be awkward to write long qualified
> identifiers. There are not aliases on schema in PLpgSQL. Possible solved by
> variable aliases. But it requires alias.
>
> 3. plpgsql GUC where schema variables are: a) disabled, b) enabled, c)
> only qualified names are allowed - it is similar to #variable_conflict
> option
>
> I prefer @3 with "c" as default, but I can live with @2, and dislike @1
> due mentioned reasons.
>
> Can you be satisfied by any mentioned variant?
>
> Regards
>
> Pavel
>
>
>>
>> That makes absolutely no sense to me at all.
>>
>>
>> .m
>>
>
>
Would it make sense to explicitly import variables in function definitions?

CREATE SESSION VARIABLE foo integer;
CREATE SESSION VARIABLE my_schema.bar text;
SET SESSION VARIABLE foo to 4;
SET SESSION VARIABLE my_schema.bar to 'hi mom';

CREATE FUNCTION my_func (p_param text) returns boolean
LANGUAGE SQL
IMPORT g_foo integer FROM foo,
IMPORT g_textval IN OUT text FROM my_schema.bar

AS $$

SELECT COUNT(*) > 1
FROM my_table
WHERE id = g_foo
AND name = g_textval;
$$;


The IMPORT clause would be something like:

IMPORT local_var_name [IN] [OUT] type FROM [session variable | expression ]


And obviously it would reject importing an expression as an OUT type.
Importing an expression would largely serve the purpose of compile-time
macro, or allowing us to pass parameters into anonymous blocks, something
we've wanted for a while now.

With something like this, the session variables are seen as parameters
inside the function regardless of language and with no new prefix, :, @, or
otherwise.

Oh, and I suggest we call them SESSION variables rather than SCHEMA
variables, to reinforce the idea of how long the values in the variables
live. A session variable is in a sense a 1x1 temp table, whose definition
persists across sessions but whose value does not.

Of course, if they do persist across sessions, then yeah, SCHEMA makes more
sense. But every package variable in Oracle PL/SQL was initialized when the
package was first loaded into the session.
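The lifetime semantics I have in mind can be sketched as follows (Python standing in for the backend, with invented names): the catalog entry for a variable persists, but each new session re-initializes its value.

```python
class VariableCatalog:
    """Persistent definitions, analogous to CREATE SESSION VARIABLE:
    the definition survives across sessions."""
    def __init__(self):
        self.defs = {}            # name -> default value

    def create(self, name, default=None):
        self.defs[name] = default


class Session:
    """Each session starts from the catalog defaults; a value set in
    one session is never visible in another."""
    def __init__(self, catalog):
        self.values = dict(catalog.defs)

    def set(self, name, value):
        if name not in self.values:
            raise KeyError("no such session variable: %s" % name)
        self.values[name] = value

    def get(self, name):
        return self.values[name]
```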


Re: [HACKERS] proposal: schema PL session variables

2016-02-09 Thread Corey Huinker
On Tue, Feb 9, 2016 at 2:55 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Tue, Feb 9, 2016 at 11:32 AM, Corey Huinker 
> wrote:
>
>>
>> Oh, and I suggest we call them SESSION variables rather than SCHEMA
>> variables, to reinforce the idea of how long the values in the variables
>> live. A session variable is in a sense a 1x1 temp table, whose definition
>> persists across sessions but whose value does not.
>>
>> Of course, if they do persist across sessions, then yeah, SCHEMA makes
>> more sense. But every package variable in Oracle PL/SQL was initialized
>> when the package was first loaded into the session.
>>
>>
> ​The key distinction for SCHEMA was that all functions in the schema would
> be able to see them (and only those in the schema).
>
> I am a bit partial, with little deep thought, to the IMPORT mechanic.
> Changing them to actual session variables would be doable and you could
> allow for the IMPORT specification to use search_path or explicit means to
> locate said variables regardless of which schema​
>
> ​they exist in.
>
> However, part of the goal is to blend into the broader database community
> and thus increase porting capabilities.  I'm not sure how well this would
> help fulfill that goal.
>
>
We're not going to get source compatibility without implementing packages,
and there's no enthusiasm for that. It's been stated a few times before by
some that the only value they see in packages is the package/session
variables. Pavel's idea gives us that.

I forgot to mention that if we're FROM-phobic the syntax could also be

IMPORT my_schema.bar AS g_localtext IN OUT text

Either way, you get the idea: the function defines what external globals
it's willing to see, and gives an alias for them, and it's the same
regardless of what the function language is.


Re: [HACKERS] Declarative partitioning

2016-02-15 Thread Corey Huinker
>
>
> The individual patches have commit messages that describe code changes.
> This is registered in the upcoming CF. Feedback and review is greatly
> welcomed!
>
> Thanks,
> Amit
>
>
Our current system is a mix of tables, each range-partitioned into
approximately 15 partitions (using the pgxn range_partitioning extension),
and those partitions are themselves date-series partitioned via
pg_partman. The largest table ingests about 100M rows per
day in a single ETL. I will try this patch out and see how well it compares
in handling the workload. Do you have any areas of interest or concern that
I should monitor?


Re: [HACKERS] Declarative partitioning

2016-02-15 Thread Corey Huinker
>
> Also, you won't see any optimizer and executor changes. Queries will still
> use the same plans as existing inheritance-based partitioned tables,
> although as I mentioned, constraint exclusion won't yet kick in. That will
> be fixed very shortly.
>
> And of course, comments on syntax are welcome as well.
>
> Thanks,
> Amit
>
>
>
Good to know the current limitations/expectations.

Our ETL has a great number of workers that do something like this:
1. grab a file
2. based on some metadata of that file, determine the partition that
would receive ALL of the rows in that file. It's actually multiple tables,
all of which are partitioned, all of which fully expect the file data to
fit in exactly one partition.
3. \copy into a temp table
4. Transform the data and insert the relevant bits into each of the target
partitions derived in #2.

So while ATR is a *major* feature of true partitioning, it's not something
we'd actually need in our current production environment, but I can
certainly code it that way to benchmark ATR vs "know the destination
partition ahead of time" vs "insane layered range_partitioning trigger +
pg_partman trigger".

Currently we don't do anything like table swapping, but I've done that
enough in the past that I could probably concoct a test of that too, once
it's implemented.

As for the syntax, I'm not quite sure your patch addresses the concern I
voiced earlier: specifically whether VALUES IN works for RANGE as well as
LIST, but I figured that would become clearer once I tried to actually use
it. Currently we have partitioning on C-collated text ranges (no, they
don't ship with postgres, I had to make a custom type) something like this:

part0: (,BIG_CLIENT)
part1: [BIG_CLIENT,BIG_CLIENT]
part2: (BIG_CLIENT,L)
part3: [L,MONSTROUSLY_BIG_CLIENT)
part4: [MONSTROUSLY_BIG_CLIENT,MONSTROUSLY_BIG_CLIENT]
part5: (MONSTROUSLY_BIG_CLIENT,RANDOM_CLIENT_LATE_IN_ALPHABET]
part6: (RANDOM_CLIENT_LATE_IN_ALPHABET,)


I can't implement that with a simple VALUES LESS THAN clause, unless I
happen to know 'x' in 'BIG_CLIENTx', where 'x' is the exact first character
in the collation sequence, which has to be something unprintable, and that
would make those who later read my code to say something unprintable. So
yeah, I'm hoping there's some way to cleanly represent such ranges.
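For concreteness, here is roughly what that routing looks like (a Python sketch, not our production trigger; Python's default string comparison approximates the C collation for ASCII keys, and the client names are the placeholders above):

```python
# (name, lower bound, lower inclusive, upper bound, upper inclusive);
# None means unbounded in that direction
PARTS = [
    ("part0", None, False, "BIG_CLIENT", False),
    ("part1", "BIG_CLIENT", True, "BIG_CLIENT", True),
    ("part2", "BIG_CLIENT", False, "L", False),
    ("part3", "L", True, "MONSTROUSLY_BIG_CLIENT", False),
    ("part4", "MONSTROUSLY_BIG_CLIENT", True, "MONSTROUSLY_BIG_CLIENT", True),
    ("part5", "MONSTROUSLY_BIG_CLIENT", False, "RANDOM_CLIENT_LATE_IN_ALPHABET", True),
    ("part6", "RANDOM_CLIENT_LATE_IN_ALPHABET", False, None, False),
]

def route(key):
    """Find the partition whose range contains key."""
    for name, lo, lo_incl, hi, hi_incl in PARTS:
        above = lo is None or (key >= lo if lo_incl else key > lo)
        below = hi is None or (key <= hi if hi_incl else key < hi)
        if above and below:
            return name
    raise LookupError(key)
```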


Re: [HACKERS] Declarative partitioning

2016-02-19 Thread Corey Huinker
On Thu, Feb 18, 2016 at 12:41 AM, Amit Langote <
langote_amit...@lab.ntt.co.jp> wrote:

> START [ EXCL ] (startval) END [ INCL ] (endval)
>
> That is, in range type notation, '[startval, endval)' is the default
> behavior. So for each partition, there is at least the following pieces of
> metadata:
>

This is really close, and if it is what we ended up with we would be able
to use it.

I suggest that the range notation can be used even when no suitable range
type exists.

I assume the code for parsing a range spec regardless of data type already
exists, but in case it doesn't, take a range spec of unknown type:

[x,y)

x and y are either going to be raw strings or doublequoted strings with
possible doublequote escapes, each of which would be coercible into the the
type of the partition column.

In other words, if your string values were 'blah , blah ' and 'fabizzle',
the [) range spec would be ["blah , blah ",fabizzle).

Using regular range specs syntax also allows for the range to be unbounded
in either or both directions, which is a possibility, especially in newer
tables where the expected distribution of data is unknown.
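A sketch of that parse, assuming the quoting rules above (a double quote delimits a value, a doubled double quote is the escape) rather than reusing any existing backend code:

```python
def split_bounds(body):
    """Split the interior of a range spec on the first-level commas,
    honoring double-quoted values with "" as an embedded-quote escape."""
    parts, cur, in_q = [], [], False
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == '"':
            if in_q and body[i + 1:i + 2] == '"':
                cur.append('"')          # "" inside quotes -> literal "
                i += 2
                continue
            in_q = not in_q              # the quote mark itself is dropped
        elif ch == ',' and not in_q:
            parts.append(''.join(cur))
            cur = []
        else:
            cur.append(ch)
        i += 1
    parts.append(''.join(cur))
    return parts

def parse_range_spec(spec):
    """Parse '[x,y)'-style notation into (lo_inclusive, lo, hi,
    hi_inclusive); None for a bound means unbounded."""
    lo_incl, hi_incl = spec[0] == '[', spec[-1] == ']'
    lo, hi = split_bounds(spec[1:-1])
    return (lo_incl, lo or None, hi or None, hi_incl)
```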

p.s. Sorry I haven't been able to kick the tires just yet. We have a very
good use case for this, it's just a matter of getting a machine and the
time to devote to it.


[HACKERS] psql metaqueries with \gexec

2016-02-19 Thread Corey Huinker
Often, I'm faced with a long .sql script that builds some objects, then
builds things on top of them.

This means that some of the queries I wish to run are dependent on the
state of things that are unknown at the time of writing the script.

I could give up, and make a python script that mostly just strings together
SQL statements. That's ugly and cumbersome.

I could do some wizardry like this:

$ create table foo( a integer, b text, c date);
$ select coalesce( ( select string_agg(format('create index
on foo(%I);',attname),E'\n')
   from pg_attribute
   where attrelid = 'foo'::regclass
   and attnum > 0 order by attnum),
 '') as sql_statements
\gset
:sql_statements


For those of you not willing to parse that, that's a dictionary query with
a 1-column result set formatted into sql with a ';' appended, string
aggregated with a newline delimiter, with the final result set coalesced
with an empty string because \gset will error on an empty result set. I
then immediately put that psql variable back into the command buffer, where
I hope that I meta-wrote valid SQL. If it hurt to read, you can imagine
what it was like to write.

I could use \g and pipe the results to another psql session...but that will
happen in another transaction where my objects might not exist yet.

I would also like the log to show what commands were run.

For that reason, I created the psql command \gexec

It is like \g and \gset in the sense that it executes the query currently
in the buffer. However, it treats every cell in the result set as a query
which itself should be immediately executed.

$ create temporary table gexec_temp( a int, b text, c date, d float);
CREATE TABLE
$ select format('create index on gexec_temp(%I)',attname)
from pg_attribute
where attrelid = 'gexec_temp'::regclass
and attnum > 0
order by attnum

\gexec

create index on gexec_temp(a)
CREATE INDEX
create index on gexec_temp(b)
CREATE INDEX
create index on gexec_temp(c)
CREATE INDEX
create index on gexec_temp(d)
CREATE INDEX



Execution order of the statements is top to bottom, left to right.
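In pseudocode (a Python model, not the actual implementation), the traversal is simply:

```python
def gexec_order(result_rows):
    """Yield the cells of a result set in the order \\gexec executes
    them: row by row, left to right, skipping NULLs."""
    for row in result_rows:
        for cell in row:
            if cell is not None:      # NULL cells are ignored
                yield cell
```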

$ select 'select 1 as ones', 'select x.y, x.y*2 as double from
generate_series(1,4) as x(y)'
union all
select 'select true as is_true', 'select ''2000-01-01''::date as party_over'
\gexec
ones

   1
(1 row)

y double
- --
1  2
2  4
3  6
4  8
(4 rows)

is_true
---
t
(1 row)

party_over
--
01-01-2000
(1 row)



Empty result sets do nothing:

$ select 'select 1 as expect_zero_rows ' where false
\gexec


The results are just strings which are sent to SendQuery(), where they
succeed or fail on their own merits

$ select 'do $$ begin raise notice ''plpgsql block executed''; end;$$' as
block
from generate_series(1,2)

\gexec

do $$ begin raise notice 'plpgsql block executed'; end;$$
NOTICE:  plpgsql block executed
DO
do $$ begin raise notice 'plpgsql block executed'; end;$$
NOTICE:  plpgsql block executed
DO


I am not sure that "gexec" is the right name for this command. Others
considered were \execute_each, \meta, \gmeta, \geach, as well as adding a
"<" parameter to the \g command.

Many thanks to Pavel Stěhule for giving me some direction in this endeavor,
though he might not agree with the design.
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 9750a5b..5ca769f 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -849,6 +849,13 @@ exec_command(const char *cmd,
status = PSQL_CMD_SEND;
}
 
+   /* \gexec -- send query and treat every result cell as a query to be 
executed */
+   else if (strcmp(cmd, "gexec") == 0)
+   {
+   pset.gexec_flag = true;
+   status = PSQL_CMD_SEND;
+   }
+
/* \gset [prefix] -- send query and store result into variables */
else if (strcmp(cmd, "gset") == 0)
{
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 2cb2e9b..35bbeb9 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -710,6 +710,39 @@ StoreQueryTuple(const PGresult *result)
return success;
 }
 
+/*
+ * ExecQueryTuples: assuming query result is OK, execute every query
+ * result as its own statement
+ *
+ * Returns true if successful, false otherwise.
+ */
+static bool
+ExecQueryTuples(const PGresult *result)
+{
+   boolsuccess = true;
+   int nrows = PQntuples(result);
+   int ncolumns = PQnfields(result);
+   int r, c;
+
+   for (r = 0; r < nrows; r++)
+   {
+   for (c = 0; c < ncolumns; c++)
+   {
+   if (! PQgetisnull(result, r, c))
+   {
+   if ( ! SendQuery(PQgetvalue(result, r, c)) )
+   {
+   success = false;
+   }
+   }
+   }
+  

Re: [HACKERS] Re: Add generate_series(date, date) and generate_series(date, date, integer)

2016-02-21 Thread Corey Huinker
>
>
> [step] is in days, but is not documented as such.
>
>
It is in days, and is not documented as such, but since a day is the
smallest unit of time for a date, I felt there was no other interpretation.
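For reference, the intended semantics match this sketch (a Python model of the behavior, not the C implementation): both endpoints are inclusive where the step lands on them, and a negative step counts down.

```python
from datetime import date, timedelta

def generate_series_date(start, stop, step=1):
    """Model of generate_series(date, date [, step]): step is a count
    of days, defaulting to 1."""
    if step == 0:
        raise ValueError("step size cannot equal zero")
    d = start
    while (d <= stop) if step > 0 else (d >= stop):
        yield d
        d += timedelta(days=step)
```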



> My understanding is you want to replace this
>
> SELECT d.dt::date as dt
> FROM generate_series('2015-01-01'::date,
>  '2016-01-04'::date,
>  interval '1 day') AS d(dt);
>
>
> with this
>
> SELECT d.dt
> FROM generate_series('2015-01-01'::date,
>  '2016-01-04'::date,
>  7) as d(dt);
>
>

I'd also like to be able to join the values of d.dt without typecasting
them.
To me it's as awkward as (select 1::float8 + 1::float8)::integer


>
> Personally, I think writing  INTERVAL '7 days' to be clearer than just
> typing 7.
>

Well, nearly all my use cases involve the step being 1 (and thus omitted)
or -1.

Maybe this example will appeal to you

SELECT d.birth_date, COUNT(r.kiddo_name)
FROM generate_series('2016-01-01'::date,'2016-01-10'::date) as d(birth_date)
LEFT OUTER JOIN birth_records r ON r.birth_date = d.birth_date
GROUP BY 1
ORDER BY 1;



> Other than that, the only difference is the ::date part. Is it really
> worth adding extra code just for that? I would say not.
>

I would argue it belongs for the sake of completeness.
We added generate_series for numerics when generate_series for floats
already existed.

No comments on the patch itself, which seems to do the job, so apologies to
> give this opinion on your work, I do hope it doesn't put you off further
> contributions.
>

Thanks. I appreciate that.


Re: [HACKERS] psql metaqueries with \gexec

2016-02-22 Thread Corey Huinker
>
> I like what you've proposed, though I am wondering if you considered doing
> something server-side instead? It seems a shame to do all this work and
> exclude all other tools.
>

I have, but my solutions closely mirror the one you mention in the next
paragraph.


> I frequently find myself creating a function that is just a wrapper on
> EXECUTE for this purpose, but obviously that has transactional limitations.
>

...and query text visibility, and result visibility, and error handling,
etc. In this case, we're leveraging the psql environment we'd already set
up, and if there's an error, \set ECHO queries shows us the errant SQL as
if we typed it ourselves.


>
> FWIW, I also wish we had something better than format() for this stuff. I
> did create [1] towards that end, but it currently depends on some C code,
> which is cumbersome.


For the most part, I'm pretty thrilled with format(), though:
- I'll admit to being grumpy about the %1$s notation, but I have no better
suggestion.
- I'd also like it if there were a %I variant that accepted schema
qualified names and %I-ed both, though I see the inability to tell the
difference between a schema dot and a really-named-that dot.
- I'd love it if there were a %C format that took a pg_class oid and
formatted the qualified schema name. As it is i just use %s and cast the
parameter as ::regclass


>
> [1]
> https://github.com/decibel/trunklet-format/blob/master/doc/trunklet-format.asc


That's intense. I'll ask you about that in an off-list thread.


Re: [HACKERS] psql metaqueries with \gexec

2016-02-22 Thread Corey Huinker
On Mon, Feb 22, 2016 at 10:08 AM, Daniel Verite 
wrote:

>     Corey Huinker wrote:
>
> > ...and query text visibility, and result visibility, and error handling,
> > etc. In this case, we're leveraging the psql environment we'd already set
> > up, and if there's an error, \set ECHO queries shows us the errant SQL as
> > if we typed it ourselves..
>
> BTW, about error handling, shouldn't it honor ON_ERROR_STOP ?
>
> With the patch when trying this:
>
> => set ON_ERROR_STOP on
> => select * from (values ('select 1/0', 'select 1/0')) AS n \gexec
>
> it produces two errors:
> ERROR:  division by zero
> ERROR:  division by zero
>
> I'd rather have the execution stop immediately after the first error,
> like it's the case with successive queries entered normally via the
> query buffer:
>
> => \set ON_ERROR_STOP on
> => select 1/0; select 1/0;
> ERROR:  division by zero
>
> as opposed to:
>
> => \set ON_ERROR_STOP off
> => select 1/0; select 1/0;
> ERROR:  division by zero
> ERROR:  division by zero
>
>
Yes, I would like it to honor ON_ERROR_STOP. I'll look into that.


Re: [HACKERS] psql metaqueries with \gexec

2016-02-22 Thread Corey Huinker
On Mon, Feb 22, 2016 at 11:30 AM, Corey Huinker 
wrote:

> On Mon, Feb 22, 2016 at 10:08 AM, Daniel Verite 
> wrote:
>
>>     Corey Huinker wrote:
>>
>> > ...and query text visibility, and result visibility, and error handling,
>> > etc. In this case, we're leveraging the psql environment we'd already
>> set
>> > up, and if there's an error, \set ECHO queries shows us the errant SQL
>> as
>> > if we typed it ourselves..
>>
>> BTW, about error handling, shouldn't it honor ON_ERROR_STOP ?
>>
>> With the patch when trying this:
>>
>> => set ON_ERROR_STOP on
>> => select * from (values ('select 1/0', 'select 1/0')) AS n \gexec
>>
>> it produces two errors:
>> ERROR:  division by zero
>> ERROR:  division by zero
>>
>> I'd rather have the execution stop immediately after the first error,
>> like it's the case with successive queries entered normally via the
>> query buffer:
>>
>> => \set ON_ERROR_STOP on
>> => select 1/0; select 1/0;
>> ERROR:  division by zero
>>
>> as opposed to:
>>
>> => \set ON_ERROR_STOP off
>> => select 1/0; select 1/0;
>> ERROR:  division by zero
>> ERROR:  division by zero
>>
>>
> Yes, I would like it to honor ON_ERROR_STOP. I'll look into that.
>
>
Well, that was easy enough. Turns out that pset.on_error_stop is checked in
MainLoop, whereas the other pset.on_* vars are checked in SendQuery().

My original idea had been to push each cell into an in-memory temp file
handle and call MainLoop() on each. Pavel suggested that temp files of any
sort were a bad idea, hence using SendQuery instead. It's probably for the
best.
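The behavior of the revised loop, modelled in Python (with SendQuery stubbed out as a callable that returns false on error):

```python
def exec_query_tuples(result_rows, send_query, on_error_stop=False):
    """Mirror of ExecQueryTuples in the patch: run every non-NULL cell
    as a query; report failure if any query failed, and with
    ON_ERROR_STOP set, stop at the first failure."""
    success = True
    for row in result_rows:
        for cell in row:
            if cell is None:
                continue
            if not send_query(cell):
                if on_error_stop:
                    return False       # bail out immediately
                success = False        # note the failure, keep going
    return success
```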


# select 'select 1,2,3', 'select 1/0', 'select 4,5,6'
... # \gexec
 ?column? | ?column? | ?column?
--+--+--
1 |2 |3
(1 row)

Time: 0.151 ms
ERROR:  22012: division by zero
LOCATION:  int4div, int.c:719
Time: 0.528 ms
 ?column? | ?column? | ?column?
--+--+--
4 |5 |6
(1 row)

Time: 0.139 ms
Time: 0.595 ms
# \set ON_ERROR_STOP 1
# select 'select 1,2,3', 'select 1/0', 'select 4,5,6' \gexec
 ?column? | ?column? | ?column?
--+--+--
1 |2 |3
(1 row)

Time: 0.137 ms
ERROR:  22012: division by zero
LOCATION:  int4div, int.c:719
Time: 0.165 ms
Time: 0.284 ms


Does \set ON_ERROR_STOP mess up regression tests? If not, I'll add the test
above (minus the \set VERBOSITY verbose-isms) to the regression.

In the mean time, update patch attached.


Re: [HACKERS] psql metaqueries with \gexec

2016-02-22 Thread Corey Huinker
>
> In the mean time, update patch attached.
>
>
Really attached this time.
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 9750a5b..5ca769f 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -849,6 +849,13 @@ exec_command(const char *cmd,
status = PSQL_CMD_SEND;
}
 
+   /* \gexec -- send query and treat every result cell as a query to be 
executed */
+   else if (strcmp(cmd, "gexec") == 0)
+   {
+   pset.gexec_flag = true;
+   status = PSQL_CMD_SEND;
+   }
+
/* \gset [prefix] -- send query and store result into variables */
else if (strcmp(cmd, "gset") == 0)
{
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 2cb2e9b..54b7790 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -710,6 +710,46 @@ StoreQueryTuple(const PGresult *result)
return success;
 }
 
+/*
+ * ExecQueryTuples: assuming query result is OK, execute every query
+ * result as its own statement
+ *
+ * Returns true if successful, false otherwise.
+ */
+static bool
+ExecQueryTuples(const PGresult *result)
+{
+   boolsuccess = true;
+   int nrows = PQntuples(result);
+   int ncolumns = PQnfields(result);
+   int r, c;
+
+   for (r = 0; r < nrows; r++)
+   {
+   for (c = 0; c < ncolumns; c++)
+   {
+   if (! PQgetisnull(result, r, c))
+   {
+   if ( ! SendQuery(PQgetvalue(result, r, c)) )
+   {
+   if (pset.on_error_stop)
+   {
+   return false;
+   }
+   else
+   {
+   success = false;
+   }
+   }
+   }
+   }
+   }
+
+   /* Return true if all queries were successful */
+   return success;
+}
+
+
 
 /*
  * ProcessResult: utility function for use by SendQuery() only
@@ -903,8 +943,14 @@ PrintQueryResults(PGresult *results)
switch (PQresultStatus(results))
{
case PGRES_TUPLES_OK:
-   /* store or print the data ... */
-   if (pset.gset_prefix)
+   /* execute or store or print the data ... */
+   if (pset.gexec_flag)
+   {
+   /* Turn off gexec_flag to avoid infinite loop */
+   pset.gexec_flag = false;
+   ExecQueryTuples(results);
+   }
+   else if (pset.gset_prefix)
success = StoreQueryTuple(results);
else
success = PrintQueryTuples(results);
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index 59f6f25..251dd1e 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -173,6 +173,7 @@ slashUsage(unsigned short int pager)
fprintf(output, _("General\n"));
fprintf(output, _("  \\copyright show PostgreSQL usage and 
distribution terms\n"));
fprintf(output, _("  \\g [FILE] or ; execute query (and send 
results to file or |pipe)\n"));
+   fprintf(output, _("  \\gexec execute query and treat 
every result cell as a query to be executed\n"));
fprintf(output, _("  \\gset [PREFIX] execute query and store 
results in psql variables\n"));
fprintf(output, _("  \\q quit psql\n"));
fprintf(output, _("  \\watch [SEC]   execute query every SEC 
seconds\n"));
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index 20a6470..9f1e94b 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -91,6 +91,9 @@ typedef struct _psqlSettings
char   *gfname; /* one-shot file output 
argument for \g */
char   *gset_prefix;/* one-shot prefix argument for \gset */
 
+   boolgexec_flag; /* true if query results are to 
be treated as
+* queries to 
be executed. Set by \gexec */
+
boolnotty;  /* stdin or stdout is not a tty 
(as determined
 * on startup) 
*/
enum trivalue getPassword;  /* prompt the user for a username and 
password */
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5f27120..0f87f29 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
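The loop the patch adds is small but worth spelling out: run every non-null
cell of the query result as its own statement, honoring on_error_stop. A
rough Python sketch of the same control flow (illustrative only; `send_query`
stands in for psql's SendQuery, which is C):

```python
def exec_query_tuples(result, send_query, on_error_stop=False):
    """Run every non-null cell of a query result as its own statement.

    `result` is a list of rows (cell values or None); `send_query`
    returns True on success. Mirrors the patch's ExecQueryTuples():
    stop early only when on_error_stop is set, otherwise keep going
    and report overall success at the end.
    """
    success = True
    for row in result:
        for cell in row:
            if cell is not None and not send_query(cell):
                if on_error_stop:
                    return False
                success = False
    return success
```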

Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-02-22 Thread Corey Huinker
>
> >
> > Given that counterexample, I think we not only shouldn't back-patch such
> a
> > change but should reject it altogether.
>
> Ouch, good point. The overflows are a different problem that we had
> better address though (still on my own TODO list)...
>

So I should remove the bounds checking from
generate_series(date,date,[int]) altogether?


format() changes discussion (was: Re: [HACKERS] psql metaqueries with \gexec)

2016-02-22 Thread Corey Huinker
>
> (One thing I had to come up with was processing of arrays, which you
> also see in that example JSON -- it's the specifiers that have a colon
> inside the {}.  The part after the colon is used as separator between
> the array elements, and each element is expanded separately.)
>
>
I'm splitting the subject line because it seems like two very different
patches may come out of this.


Re: [HACKERS] Declarative partitioning

2016-02-24 Thread Corey Huinker
>
> Hm, I see.  How about multi-column keys?  Do we care enough about that use
> case?  I don't see a nice-enough-looking range literal for such keys.
> Consider for instance,
>
> With the partitioned table defined as:
>
> CREATE TABLE foo(c1 char(2), c2 char(2)) PARTITION BY RANGE (c1, c2);
>

Good question! I would assume that we'd use a syntax that presumes c1 and
c2 are a hypothetical composite type. But what does that look like?

To answer it, I tried this:

# create type duple as (a text, b text);
CREATE TYPE
# create type duplerange as range (subtype = duple);
CREATE TYPE
# select '(beebop,alula)'::duple;
     duple
----------------
 (beebop,alula)
(1 row)

# select '("hey ho","letsgo")'::duple;
       duple
-------------------
 ("hey ho",letsgo)
(1 row)

analytics=# select duplerange('(beebop,alula)','("hey ho","letsgo")','(]');
                duplerange
-------------------------------------------
 ("(beebop,alula)","(""hey ho"",letsgo)"]
(1 row)

So I would assume that we'd use a syntax that presumed the columns were in
a composite range type.

Which means your creates would look like (following Robert Haas's implied
suggestion that we leave off the string literal quotes):

CREATE TABLE foo_p1 PARTITION OF foo FOR VALUES ( , (b,2) );
CREATE TABLE foo_p2 PARTITION OF foo FOR VALUES [ (b,2), (b,3) );
CREATE TABLE foo_p3 PARTITION OF foo FOR VALUES [ (b,3), (b,4) );
CREATE TABLE foo_p4 PARTITION OF foo FOR VALUES [ (b,4), (c,2) );
CREATE TABLE foo_p5 PARTITION OF foo FOR VALUES [ (c,2), (c,3) );
CREATE TABLE foo_p6 PARTITION OF foo FOR VALUES [ (c,3), (c,4) );

That's not terrible looking.

We would want to also think about what subset of many permutations of this
> syntax to accept range specs for new partitions.  Mostly to preserve the
> non-overlapping invariant and I think it would also be nice to prevent
> gaps.
>

Gaps *might* be intentional. I can certainly see where we'd want to set up
warnings for discontinuity, or perhaps some utility functions:
pg_partitions_ranges_are_continuous('master_table_name')
pg_partitions_are_adjacent('master_table_name','p1','p2')

But for the most part, range partitions evolve from splits when one
partition grows too big, so that won't be such a problem.


Consider that once we create:
>
> PARTITION FOR VALUES [current_date,);
>
> Now to create a new partition starting at later date, we have to have a
> "split partition" feature which would scan the above partition to
> distribute the existing data rows appropriately to the resulting two
> partitions. Right?
>

Correct. And depending on policy, that might be desirable and might be not.
If the table were for death records, we'd very much want to reject rows in
the future, if only to avoid upsetting the person.
If the table were of movie release dates, we'd *expect* that only dates
(,current_date] would be entered, but if someone chose to leak a release
date, we'd want to capture that and deal with it later.
So yeah, we're going to (eventually) need a SPLIT PARTITION that migrates
rows to a new partition.


> IOW, one shouldn't create an unbounded partition if more partitions in the
> unbounded direction are expected to be created.  It would be OK for
> unbounded partitions to be on the lower end most of the times.
>

On this I'll have to disagree. My own use case where I use my
range_partitioning extension starts off with a single partition () and all
new partitions are splits of that. The ranges evolve over time as
partitions grow and slow down. It's nice because we're not trying to
predict where growth will be, we split where growth is.


>
> > p.s. Sorry I haven't been able to kick the tires just yet. We have a very
> > good use case for this, it's just a matter of getting a machine and the
> > time to devote to it.
>
> I would appreciate it.  You could wait a little more for my next
> submission which will contain some revisions to the tuple routing code.
>
>
Ok, I'll wait a bit. In the meantime I can tell you a bit about the
existing production system I'm hoping to replicate with true partitioning.
It looks like this:

Big Master Table:
 Range partition by C collated text
   Date Range
   Date Range
   ...
 Range partition by C collated text
   Date Range
   Date Range
   ...
...

Currently this is accomplished through my range_partitioning module, and
then using pg_partman on those partitions. It works, but it's a lot of
moving parts.

The machine will be a 32 core AWS box. As per usual with AWS, it will
have ample memory and CPU, and be somewhat starved for I/O.

Question: I haven't dived into the code, but I was curious about your tuple
routing algorithm. Is there any way for the algorithm to begin its scan of
candidate partitions based on the destination of the last row inserted by
this statement? I ask because most use cases (that I am aware of) have data
that would naturally cluster in the same partition.


Re: [HACKERS] anyelement -> anyrange

2016-08-18 Thread Corey Huinker
On Tue, Aug 16, 2016 at 9:29 PM, Tom Lane  wrote:

> Jim Nasby  writes:
> > I can't think of any reason you'd want two different range types on a
> > single element type.
>
> We would not have built it that way if there were not clear use-cases.
> An easy example is you might want both a continuous timestamp range
> and one that is quantized to hour boundaries.  Primarily what the
> range type brings in besides the element type is a canonicalization
> function; and we can't guess which one you want.
>
> regards, tom lane
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Jim,

I wrote a routine that fishes in the dictionary for a suitable range type:
https://github.com/moat/range_partitioning/blob/master/sql/range_partitioning.sql#L459-L485

Obviously, it has problems when the number of suitable range types <> 1, as
mentioned by Tom.

You might also find some gleanable gems in:
https://github.com/moat/range_type_functions/blob/master/doc/range_type_functions.md


Re: [HACKERS] anyelement -> anyrange

2016-08-18 Thread Corey Huinker
I'd be happy to roll your code into the extension, and make it marked more
stable.

On Thu, Aug 18, 2016 at 2:49 PM, Jim Nasby  wrote:

> On 8/18/16 1:06 PM, Corey Huinker wrote:
>
>> You might also find some gleanable gems in:
>> https://github.com/moat/range_type_functions/blob/master/doc
>> /range_type_functions.md
>>
>
> Well crap, I searched for range stuff on PGXN before creating
> http://pgxn.org/dist/range_tools/ and the only thing that came up was
> your range_partitioning stuff, which AFAICT is unrelated.
> http://pgxn.org/dist/range_type_functions/ still doesn't show up in
> search, maybe because it's marked unstable?
>
> Rather frustrating that I've spent time creating an extension that
> duplicates your work. :(
>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
>


Re: [HACKERS] anyelement -> anyrange

2016-08-19 Thread Corey Huinker
On Fri, Aug 19, 2016 at 11:40 AM, Jim Nasby 
wrote:

> On 8/18/16 6:02 PM, Corey Huinker wrote:
>
>> I'd be happy to roll your code into the extension, and make it marked
>> more stable.
>>
>
> Yeah, I've been meaning to look at submitting a pull request; hopefully
> will get to it today.
>
>
No rush, I'm on vacation. Though I really do appreciate other eyes on the
code and other people using it.


Re: [HACKERS] \timing interval

2016-08-24 Thread Corey Huinker
On Wed, Aug 24, 2016 at 10:36 PM, Gerdan Santos  wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   tested, passed
> Documentation:tested, passed
>
> Sorry, my mistake!
>
> I could not find a way to disable this functionality , I see that the
> impact can be big as it is changed the output structure \timing without a
> mode to disable it. I even finding great functionality but need a way to
> set to default.
>
>
>
Thanks for reviewing! I'm not really sure how to proceed, so I'll try to
summarize where it stands. Apologies if I
mischaracterize/misattribute/misremember someone's position.

Generally speaking, people disliked the third mode for \timing, and were
generally fine with AndrewG's idea of printing the timing in both raw
milliseconds and a more human-digestible format, which means that we can:

1. keep the format exactly as is, ignoring locale issues
 + It's already done
 + lightweight
 + TomL believes there will be no confusion
 - others disagree
2. we fish out the proper locale-specific abbreviations for
days/hours/minutes/seconds
 + no additional settings
 + locale stuff can't be that hard
 - I've never dealt with it (American, surprise)
3. ignore locales and fall back to a left-trimmed DDD HH:MM:SS.mmm format
 + Easy to revert to that code
 + My original format and one PeterE advocated
 - others disliked
4. we have a \pset that sets fixed scale (seconds, minutes, hours, days),
sliding scale (what's displayed now), or interval
 + some flexibility with some easy config values
 - still have the locale issue
 - likely will miss a format somebody wanted
5. The \pset option is a time format string like "%d %h:%m:%s".
 + maximum flexibility
 + sidesteps the locale issue by putting it in the user's hands
 - those format strings are sometimes tough for users to grok
 - ISO 8601 isn't much of a help as it doesn't handle milliseconds
 - additional config variable
 - documentation changes
 - debate about what the default should be. GOTO 1.

I personally would be happy with any of these options, so I think we should
get some more feedback and see if a consensus emerges. It's a tiny patch and
trivial to test, so we have time(ing) on our side.


Re: [HACKERS] proposal: psql \setfileref

2016-08-31 Thread Corey Huinker
On Wed, Aug 31, 2016 at 11:32 AM, Pavel Stehule 
wrote:

> Hi
>
> I propose a new type of psql variables - file references. The content of
> file reference is specified by referenced file. It allows simple inserting
> large data without necessity of manual escaping or using LO api.
>
> When I wrote the patch, I used parametrized queries for these data instead
> escaped strings - the code is not much bigger, and the error messages are
> much more friendly if query is not bloated by bigger content. The text mode
> is used only - when escaping is not required, then content is implicitly
> transformed to bytea. By default the content of file is bytea. When use
> requires escaping, then he enforces text escaping - because it has sense
> only for text type.
>
> postgres=# \setfileref a ~/test2.xml
> postgres=# \setfileref b ~/avatar.gif
> postgres=# insert into test values(convert_from(:a, 'latin2')::xml, :b);
> -- xml is passed as bytea
> postgres=# insert into test values(:'a', :b); -- xml is passed via unknown
> text value
>
> The content of file reference variables is not persistent in memory.
>
> Comments, notes?
>
> Regards
>
> Pavel
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>
Clearly jumping ahead on this one, but if the fileref is essentially a pipe
to "cat /path/to/file.name", is there anything stopping us from setting
pipes?
My interest is primarily in ways that COPY could use this.


Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-31 Thread Corey Huinker
On Wed, Aug 24, 2016 at 12:39 PM, Andres Freund  wrote:

>
>
> On August 24, 2016 9:32:48 AM PDT, Tomas Vondra <
> tomas.von...@2ndquadrant.com> wrote:
> >
> >
> >On 08/24/2016 12:20 AM, Andres Freund wrote:
> >> On 2016-08-23 19:18:04 -0300, Claudio Freire wrote:
> >>> On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra
> >>>  wrote:
>  Could someone please explain how the unlogged tables are supposed
> >to fix the
>  catalog bloat problem, as stated in the initial patch submission?
> >We'd still
>  need to insert/delete the catalog rows when creating/dropping the
> >temporary
>  tables, causing the bloat. Or is there something I'm missing?
> >>
> >> Beats me.
> >>
> >
> >Are you puzzled just like me, or are you puzzled why I'm puzzled?
>
> Like you. I don't think this addresses the problem to a significant enough
> degree to care.
>
> Andres
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Ok, here's a wild idea, and it probably depends on having native
partitioning implemented.

Propagate relpersistence, or a boolean flag on (relpersistence = 't') from
pg_class into the child pg_attribute records.

Partition the tables pg_class and pg_attribute first by relpersistence, and
then by oid.

The partitions holding data on persistent objects would basically stay
as-is, but the partition wouldn't have much activity and no temp-table
churn.

The temporary ones, however, would fall into essentially a rotating set of
partitions. Pick enough partitions such that the active transactions only
cover some of the partitions. The rest can be safely truncated by vacuum.

It would mitigate the bloat, existing dictionary queries would still work,
but the additional lookup cost might not be worth it.


Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-31 Thread Corey Huinker
On Wed, Aug 31, 2016 at 6:07 PM, Andres Freund  wrote:

>
> In my experience pg attribute is usually the worst affected. Many tech
> takes won't even have stays entries...
>
>
Mine too. One database currently has a 400GB pg_attribute table, because we
chew through temp tables like popcorn.


Re: [HACKERS] \timing interval

2016-09-01 Thread Corey Huinker
On Thu, Sep 1, 2016 at 2:17 PM, Tom Lane  wrote:

> I wrote:
> > So for clarity's sake: first suitable format among these:
>
> > Time: 59.999 ms
> > Time: 121.999 ms (2:01.999)
> > Time: 10921.999 ms (3:02:01.999)
> > Time: 356521.999 ms (4 3:02:01.999)
>
> Sorry, that probably added no clarity at all, I was confusing
> seconds with milliseconds in the example values :-(
>
> regards, tom lane
>

I didn't scan your examples for correctness, but the parentheticals matched
my own idea for what "left-trimmed" meant.

I'm going to hold off a bit to see if anybody else chimes in, and if not
I'm going to deliver a patch.


Re: [HACKERS] \timing interval

2016-09-01 Thread Corey Huinker
On Thu, Sep 1, 2016 at 2:43 PM, Tom Lane  wrote:

> I wrote:
> > Sorry, that probably added no clarity at all, I was confusing
> > seconds with milliseconds in the example values :-(
>
> After a bit of further fooling with sample values, I propose this
> progression:
>
> Time: 0.100 ms
> Time: 1.200 ms
> Time: 1001.200 ms (0:01.001)
> Time: 12001.200 ms (0:12.001)
> Time: 60001.200 ms (1:00.001)
> Time: 720001.200 ms (12:00.001)
> Time: 3660001.200 ms (1:01:00.001)
> Time: 43920001.200 ms (12:12:00.001)
> Time: 176460001.200 ms (2 01:01:00.001)
> Time: 216720001.200 ms (2 12:12:00.001)
> Time: 10000000000000000000.000 ms (115740740740 17:46:40.000)
>
> Note that times from 1 second to 1 hour all get the nn:nn.nnn
> treatment.  I experimented with these variants for sub-minute times:
>
> Time: 1001.200 ms (1.001)
> Time: 12001.200 ms (12.001)
> Time: 1001.200 ms (1.001 s)
> Time: 12001.200 ms (12.001 s)
>
> but it seems like the first variant is not terribly intelligible and
> the second variant is inconsistent with what happens for longer times.
> Adding a zero minutes field is a subtler way of cueing the reader that
> it's mm:ss.
>
> regards, tom lane
>

Well, if we're looking to be consistent, here's what interval does now:

# select '1 second 1 millisecond'::interval, '1 minute 2
milliseconds'::interval, '1 hour 30 milliseconds'::interval, '1 day 1 hour
999 milliseconds'::interval, '1 week 1 day 1 hour'::interval;
   interval   |   interval   |  interval   |      interval      |    interval
--------------+--------------+-------------+--------------------+-----------------
 00:00:01.001 | 00:01:00.002 | 01:00:00.03 | 1 day 01:00:00.999 | 8 days 01:00:00


Should we just plug into whatever code that uses? It's slightly different:

# select interval '1000.001 milliseconds'::interval;
ERROR:  interval field value out of range: "1000.001
milliseconds"
LINE 1: select interval '1000.001 milliseconds'::int...
^
# select interval '216720001.200 milliseconds';
   interval
---------------
 60:12:00.0012
(1 row)

# select interval '176460001.200 ms';
   interval
---------------
 49:01:00.0012
(1 row)


Re: [HACKERS] \timing interval

2016-09-01 Thread Corey Huinker
On Thu, Sep 1, 2016 at 3:01 PM, Tom Lane  wrote:

> Corey Huinker  writes:
> > On Thu, Sep 1, 2016 at 2:43 PM, Tom Lane  wrote:
> >> Note that times from 1 second to 1 hour all get the nn:nn.nnn
> >> treatment.  I experimented with these variants for sub-minute times:
> >> ...
> >> but it seems like the first variant is not terribly intelligible and
> >> the second variant is inconsistent with what happens for longer times.
>
> > Well, if we're looking to be consistent, here's what interval does now:
> > ...
> > Should we just plug into whatever code that uses?
>
> Well, that code's on the backend side so we're not going to just call it
> in any case.  And I think we don't want to be quite so verbose as to go up
> to hh:mm:ss.fff as soon as we get past 1 second.  However, comparing that
> output to what I had suggests that maybe it's better to keep a leading
> zero in two-digit fields, that is render times like "00:01.234",
> "01:23.456", or "01:23:45.678" rather than suppressing the initial zero as
> I had in my examples.  It's an extra character but I think it reinforces
> the meaning.
>
> regards, tom lane
>

+1
The larger jump in width from MM:SS to HH:MM:SS is a good visual cue.
Jumping from MM:SS to H:MM:SS to HH:MM:SS would be more subtle and possibly
confusing.


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-06 Thread Corey Huinker
On Fri, Sep 2, 2016 at 5:07 AM, Amit Langote 
wrote:

>
> I am not familiar with win32 stuff too, so I don't have much to say about
> that.  Maybe someone else can chime in to help with that.
>

The regression tests basically *can't* test this, because we'd need a shell
command that we know works on every supported architecture.


>
> About the patch:
>
> * applies cleanly, compiles fine
> * basic functionality seems to work (have not done any extensive tests
> though)
>
>
> - Specifies the file to be read.  Required.  Must be an absolute path
> name.
> + Specifies the file to be read.  Must be an absolute path name.
> + Either filename or program
> must be
> + specified.  They are mutually exclusive.
> +
> +   
> +  
>
> The note about filename and program being mutually exclusive could be
> placed at the end of list of options.  Or perhaps mention this note as
> part of the description of program option, because filename is already
> defined by that point.
>
> +   
> +
> + Specifies the command to executed.
>
> s/to executed/to be executed/g
>

Correct. I will fix that when other issues below are resolved.


>
> + Either program or filename
> must be
> + specified.  They are mutually exclusive.
>  
>
> Oh I see this has already been mentioned in program option description.
> Did you intend to specify the same in both filename and program
> descriptions?
>

It's been a while since I repackaged Adam's code, but generally I'm in
favor of some redundancy if the two mutually exclusive things are
documented far enough apart.



>
> @@ -97,8 +99,9 @@ typedef struct FileFdwPlanState
>  typedef struct FileFdwExecutionState
>  {
>  char   *filename;   /* file to read */
> -List   *options;/* merged COPY options, excluding
> filename */
> -CopyState   cstate; /* state of reading file */
> +char   *program;/* program to read output from */
> +List   *options;/* merged COPY options, excluding
> filename and program */
> +CopyState   cstate; /* state of reading file or program */
>
> Have you considered a Boolean flag is_program instead of char **program
> similar to how copy.c does it?  (See a related comment further below)
>

Considered it yes, but decided against it when I started to write my
version. When Adam delivered his version of the patch, I noticed he had
made the same design decision. Only one of them will be initialized, and
the boolean will byte align to 4 bytes, so it's the same storage allocated
either way.

Either we're fine with two variables, or we think file_name is poorly
named. I have only a slight preference for the two variables, and defer to
the group for a preference.


>
> - * This is because the filename is one of those options, and we don't
> want
> - * non-superusers to be able to determine which file gets read.
> + * This is because the filename or program string are two of those
> + * options, and we don't want non-superusers to be able to determine
> which
> + * file gets read or what command is run.
>
> I'm no native English speaker, but I wonder if this should read: filename
> or program string *is* one of those options ... OR filename *and* program
> are two of those options ... ?  Also, "are two of those options" sounds a
> bit odd to me because I don't see that used as often as "one of those
> whatever".  I guess that's quite a bit of nitpicking about a C comment, ;)
>

Given that C comments constitute a large portion of our documentation, I
fully support making them as clear as possible.

I don't remember if this is Adam's comment or mine. Adam - if you're out
there, chime in.

The original paragraph was:

Changing table-level options requires superuser privileges, for security
reasons: only a superuser should be able to determine which file is read.
In principle non-superusers could be allowed to change the other options,
but that's not supported at present.


How does this paragraph sound?:

Changing table-level options requires superuser privileges, for security
reasons: only a superuser should be able to determine which file is read,
or which program is run. In principle non-superusers could be allowed to
change the other options, but that's not supported at present.




> @@ -257,9 +264,26 @@ file_fdw_validator(PG_FUNCTION_ARGS)
>  ereport(ERROR,
>  (errcode(ERRCODE_SYNTAX_ERROR),
>   errmsg("conflicting or redundant options")));
> +if (program)
> +ereport(ERROR,
> +(errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("conflicting or redundant options")));
>  filename = defGetString(def);
>  }
>
> +else if (strcmp(def->defname, "program") == 0)
> +{
> +if (filename)
> +ereport(ERROR,
> +(errcode(ERRCODE_SYNTAX_ERROR),
> + 

Re: [HACKERS] \timing interval

2016-09-06 Thread Corey Huinker
On Sun, Sep 4, 2016 at 7:05 PM, Jim Nasby  wrote:

> On 9/3/16 2:35 PM, Tom Lane wrote:
>
>> I pushed the patch using this:
>>
>> Time: 176460001.200 ms (2 d 01:01:00.001)
>>
>> and all else as before.
>>
>
> I'd find this useful in the final output of EXPLAIN ANALYZE as well; any
> objections to adding it?


It's sorta out of my hands now, but what Tom said earlier is that because
this is client-side code, it wouldn't use existing interval code.

EXPLAIN *is* server-side, so we couldn't use this code, but we could leverage
existing interval code there to achieve a similar concept.

I have another thing I'd like to add to EXPLAIN output: the server version
number. So maybe we can pick those up in another thread.
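For reference, the display rule that ultimately got pushed ("Time:
176460001.200 ms (2 d 01:01:00.001)") can be approximated in a few lines.
This is an illustrative Python sketch of the formatting logic, not the
committed C code in psql:

```python
def format_timing(ms: float) -> str:
    """Render elapsed milliseconds psql-style: raw ms always, plus a
    human-readable D HH:MM:SS.mmm breakdown once past one second.
    Follows the leading-zero variant agreed upthread (00:01.001)."""
    if ms < 1000.0:
        return f"Time: {ms:.3f} ms"
    total_seconds = ms / 1000.0
    days, rem = divmod(total_seconds, 86400.0)
    hours, rem = divmod(rem, 3600.0)
    minutes, seconds = divmod(rem, 60.0)
    if days >= 1:
        human = f"{int(days)} d {int(hours):02d}:{int(minutes):02d}:{seconds:06.3f}"
    elif hours >= 1:
        human = f"{int(hours):02d}:{int(minutes):02d}:{seconds:06.3f}"
    else:
        # sub-hour times still show a zero minutes field to cue mm:ss
        human = f"{int(minutes):02d}:{seconds:06.3f}"
    return f"Time: {ms:.3f} ms ({human})"
```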


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-06 Thread Corey Huinker
On Tue, Sep 6, 2016 at 6:53 PM, Craig Ringer 
wrote:

> On 7 Sep. 2016 02:14, "Corey Huinker"  wrote:
> >
>
> > Having regression tests for this is extremely problematic, because the
> program invoked would need to be an invokable command on any architecture
> supported by postgres. I'm pretty sure no such command exists.
>
> Your best bet will be using the TAP framework. There you can use Perl
> logic.
>
> I'm not sure where to put such a test though. It doesn't really make sense
> in src/test/recovery/ .
>

And the TAP test would detect the operating system and know to create an
FDW that has the PROGRAM value 'cat test_data.csv' on Unix, 'type
test_data.csv' on windows, and 'type test_data.csv;1' on VMS?


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-06 Thread Corey Huinker
On Tue, Sep 6, 2016 at 9:46 PM, Amit Langote 
wrote:

> On 2016/09/07 3:12, Corey Huinker wrote:
> > On Fri, Sep 2, 2016 at 5:07 AM, Amit Langote wrote:
> >> I am not familiar with win32 stuff too, so I don't have much to say
> about
> >> that.  Maybe someone else can chime in to help with that.
> >
> > The regressions basically *can't* test this because we'd need a shell
> > command we know works on any architecture.
>
> OK.
>

Well...maybe not, depending on what Craig and others can do to educate me
about the TAP tests.


> >
> > Changing table-level options requires superuser privileges, for security
> > reasons: only a superuser should be able to determine which file is read,
> > or which program is run. In principle non-superusers could be allowed to
> > change the other options, but that's not supported at present.
>
> Hmm, just a little modification would make it better for me:
>
> ... for security reasons.  For example, only superuser should be able to
> determine which file read or which program is run.
>
> Although that could be just me.
>

In this case "determine" leaves it unclear whether a non-superuser can set
the program to be run, or merely learn which program is set to be run by
the FDW.

We may want some more opinions on which wording is clearest.


>
> >> @@ -632,11 +671,18 @@ fileBeginForeignScan(ForeignScanState *node, int
> >> eflags)
> >>   * Create CopyState from FDW options.  We always acquire all
> columns,
> >> so
> >>   * as to match the expected ScanTupleSlot signature.
> >>   */
> >> -cstate = BeginCopyFrom(node->ss.ss_currentRelation,
> >> -   filename,
> >> -   false,
> >> -   NIL,
> >> -   options);
> >> +if(filename)
> >> +cstate = BeginCopyFrom(node->ss.ss_currentRelation,
> >> +   filename,
> >> +   false,
> >> +   NIL,
> >> +   options);
> >> +else
> >> +cstate = BeginCopyFrom(node->ss.ss_currentRelation,
> >> +   program,
> >> +   true,
> >> +   NIL,
> >> +   options)
> >>
> >> Like I mentioned above, if there was a is_program Boolean flag instead
> of
> >> separate filename and program, this would be just:
> >>
> >> +cstate = BeginCopyFrom(node->ss.ss_currentRelation,
> >> +   filename,
> >> +   is_program,
> >> +   NIL,
> >> +   options);
> >>
> >> That would get rid of if-else blocks here and a couple of other places.
> >
> > It would, pushing the complexity out to the user. Which could be fine,
> but
> > if we do that then "filename" is a misnomer.
>
> This is an internal state so I'm not sure how this would be pushing
> complexity out to the user.  Am I perhaps misreading what you said?
>

Indeed, it is internal state. Maybe we should rename the variable to
file_command or something.


>
> What a user sees is that there are two separate foreign table options -
> filename and program.  That we handle them internally using a string to
> identify either file or program and a Boolean flag to show which of the
> two is just an internal implementation detail.
>
> COPY does it that way internally and I just saw that psql's \copy does it
> the same way too.  In the latter's case, following is the options struct
> (src/bin/psql/copy.c):
>
> struct copy_options
> {
> char   *before_tofrom;  /* COPY string before TO/FROM */
> char   *after_tofrom;   /* COPY string after TO/FROM filename */
> char   *file;   /* NULL = stdin/stdout */
> boolprogram;/* is 'file' a program to popen? */
> boolpsql_inout; /* true = use psql stdin/stdout */
> boolfrom;   /* true = FROM, false = TO */
> };
>
> But as you said above, this could be deferred to the committer.
>

Yeah, and that made for zero storage savings: a char pointer which is never
assigned a string takes up as much space as a 4-byte-aligned boolean. And
the result is that "file" really means program, which I found slightly
awkward.


> >> diff --git a/contrib/file_fdw/input/file_fdw.source

Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-06 Thread Corey Huinker
On Tue, Sep 6, 2016 at 11:24 PM, Craig Ringer 
wrote:

> On 7 September 2016 at 11:21, Corey Huinker 
> wrote:
> > On Tue, Sep 6, 2016 at 6:53 PM, Craig Ringer <
> craig.rin...@2ndquadrant.com>
>
> > And the TAP test would detect the operating system and know to create an
> FDW
> > that has the PROGRAM value 'cat test_data.csv' on Unix, 'type
> test_data.csv'
> > on windows, and 'type test_data.csv;1' on VMS?
>
> Right. Or just "perl emit_test_data.pl" that works for all of them,
> since TAP is perl so you can safely assume you have Perl.
>

Thanks. I was mentally locked into more basic OS commands. Am I right in
thinking perl is about the *only* OS command you can be sure is on every
architecture?

The platforms page says we support S/390 but no mention of VM/MVS/CMS. Did
we do an OS/400 port yet? ::ducks::


Re: [HACKERS] \timing interval

2016-09-07 Thread Corey Huinker
>
> ... and it would probably greatly reduce the amount of mailing list
> traffic asking for version if nothing else.


That was the major reason for wanting it.
The second is that if an explain were posted to a forum like stackexchange,
the reader wouldn't have to wonder what version produced the plan.


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-08 Thread Corey Huinker
On Tue, Sep 6, 2016 at 11:44 PM, Craig Ringer 
wrote:

> On 7 September 2016 at 11:37, Corey Huinker 
> wrote:
> > On Tue, Sep 6, 2016 at 11:24 PM, Craig Ringer <
> craig.rin...@2ndquadrant.com>
> > wrote:
> >>
> >> On 7 September 2016 at 11:21, Corey Huinker 
> >> wrote:
> >> > On Tue, Sep 6, 2016 at 6:53 PM, Craig Ringer
> >> > 
> >>
> >> > And the TAP test would detect the operating system and know to create
> an
> >> > FDW
> >> > that has the PROGRAM value 'cat test_data.csv' on Unix, 'type
> >> > test_data.csv'
> >> > on windows, and 'type test_data.csv;1' on VMS?
> >>
> >> Right. Or just "perl emit_test_data.pl" that works for all of them,
> >> since TAP is perl so you can safely assume you have Perl.
> >
> >
> > Thanks. I was mentally locked in more basic OS commands. Am I right in
> > thinking perl is about the *only* OS command you can be sure is on every
> > architecture?
>
> Probably, there's a lot of crazy out there.
>
> TAP tests can be conditionally run based on architecture, but
> something like this is probably worth testing as widely as possible.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Stylistically, would a separate .pl file for the emitter be preferable to
something inline like

perl -e 'print "a\tb\tcc\t4\n"; print "b\tc\tdd\t5\n"'


?

I'm inclined to go inline to cut down on the number of moving parts, but I
can see where perl's readability is a barrier to some, and either way I
want to follow established patterns.


[*] For those who don't perl, the command prints:

a   b   cc  4
b   c   dd  5


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-08 Thread Corey Huinker
On Thu, Sep 8, 2016 at 6:59 PM, Craig Ringer 
wrote:

> On 9 Sep. 2016 03:45, "Corey Huinker"  wrote:
> >
> >
>
> > Stylistically, would a separate .pl file for the emitter be preferable
> to something inline like
> >
> >> perl -e 'print "a\tb\tcc\t4\n"; print "b\tc\tdd\t5\n"'
>
> I'd be fine with that and a suitable comment. Just be careful with
> different platforms' shell escaping rules.
>
Do perl command switches on windows/VMS use /e instead of -e?  If so,
that'd be a great argument for doing just "perl filename".


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-10 Thread Corey Huinker
V2 of this patch:

Changes:
* rebased to most recent master
* removed non-tap test that assumed the existence of Unix sed program
* added non-tap test that assumes the existence of perl
* switched from filename/program to filename/is_program to more closely
follow patterns in copy.c
* slight wording change in C comments


On Thu, Sep 8, 2016 at 6:59 PM, Craig Ringer 
wrote:

> On 9 Sep. 2016 03:45, "Corey Huinker"  wrote:
> >
> >
>
> > Stylistically, would a separate .pl file for the emitter be preferable
> to something inline like
> >
> >> perl -e 'print "a\tb\tcc\t4\n"; print "b\tc\tdd\t5\n"'
>
> I'd be fine with that and a suitable comment. Just be careful with
> different platforms' shell escaping rules.
>
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index b471991..bf9753a 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -59,6 +59,7 @@ struct FileFdwOption
 static const struct FileFdwOption valid_options[] = {
/* File options */
{"filename", ForeignTableRelationId},
+   {"program", ForeignTableRelationId},
 
/* Format options */
/* oids option is not supported */
@@ -85,10 +86,11 @@ static const struct FileFdwOption valid_options[] = {
  */
 typedef struct FileFdwPlanState
 {
-   char   *filename;   /* file to read */
-   List   *options;/* merged COPY options, excluding 
filename */
-   BlockNumber pages;  /* estimate of file's physical 
size */
-   double  ntuples;/* estimate of number of rows 
in file */
+   char   *filename;   /* file/program to read */
+   boolis_program; /* true if filename represents 
an OS command */
+   List   *options;/* merged COPY options, excluding 
filename and program */
+   BlockNumber pages;  /* estimate of file or program 
output's physical size */
+   double  ntuples;/* estimate of number of rows 
in file/program output */
 } FileFdwPlanState;
 
 /*
@@ -96,9 +98,10 @@ typedef struct FileFdwPlanState
  */
 typedef struct FileFdwExecutionState
 {
-   char   *filename;   /* file to read */
-   List   *options;/* merged COPY options, excluding 
filename */
-   CopyState   cstate; /* state of reading file */
+   char   *filename;   /* file/program to read */
+   boolis_program; /* true if filename represents 
an OS command */
+   List   *options;/* merged COPY options, excluding 
filename and is_program */
+   CopyState   cstate; /* state of reading file or 
program */
 } FileFdwExecutionState;
 
 /*
@@ -139,7 +142,9 @@ static bool fileIsForeignScanParallelSafe(PlannerInfo 
*root, RelOptInfo *rel,
  */
 static bool is_valid_option(const char *option, Oid context);
 static void fileGetOptions(Oid foreigntableid,
-  char **filename, List **other_options);
+  char **filename,
+  bool *is_program,
+  List **other_options);
 static List *get_file_fdw_attribute_options(Oid relid);
 static bool check_selective_binary_conversion(RelOptInfo *baserel,
  Oid 
foreigntableid,
@@ -196,16 +201,17 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 
/*
 * Only superusers are allowed to set options of a file_fdw foreign 
table.
-* This is because the filename is one of those options, and we don't 
want
-* non-superusers to be able to determine which file gets read.
+* The reason for this is to prevent non-superusers from changing the 
+* definition to access an arbitrary file not visible to that user
+* or to run programs not accessible to that user.
 *
 * Putting this sort of permissions check in a validator is a bit of a
 * crock, but there doesn't seem to be any other place that can enforce
 * the check more cleanly.
 *
-* Note that the valid_options[] array disallows setting filename at any
-* options level other than foreign table --- otherwise there'd still 
be a
-* security hole.
+* Note that the valid_options[] array disallows setting filename and
+* program at any options level other than foreign table --- otherwise
+* there'd still be a security hole.
 */
if (catalog == ForeignTableRelationId && !superuser())
ereport(ERROR,
@@ -247,11 +253,11 @@ file_fdw_validator(PG_FUNCTION_ARGS)
   

Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-12 Thread Corey Huinker
Thanks for the review!

I agree with all the code cleanups suggested and have made them in the
attached patch, to save the committer some time.

Also in this patch, I changed sgml para to
 
  Changing table-level options requires superuser privileges, for security
  reasons: only a superuser should be able to determine which file is read
  or which program is run.  In principle non-superusers could be allowed to
  change the other options, but that's not supported at present.
 

"Determine" is an odd word in this context. I understand it to mean
"set/change", but I can see where a less familiar reader would take the
meaning to be "has permission to see the value already set". Either way, it
now mentions program as an option in addition to filename.


On Mon, Sep 12, 2016 at 1:59 AM, Amit Langote  wrote:

> On 2016/09/11 8:04, Corey Huinker wrote:
> > V2 of this patch:
> >
> > Changes:
> > * rebased to most recent master
> > * removed non-tap test that assumed the existence of Unix sed program
> > * added non-tap test that assumes the existence of perl
> > * switched from filename/program to filename/is_program to more closely
> > follow patterns in copy.c
> > * slight wording change in C comments
>
> This version looks mostly good to me.  Except some whitespace noise in
> some hunks:
>
> @@ -139,7 +142,9 @@ static bool fileIsForeignScanParallelSafe(PlannerInfo
> *root, RelOptInfo *rel,
>   */
>  static bool is_valid_option(const char *option, Oid context);
>  static void fileGetOptions(Oid foreigntableid,
> -   char **filename, List **other_options);
> +   char **filename,
> +   bool *is_program,
>
> Space after "is_program,"
>
> @@ -196,16 +201,17 @@ file_fdw_validator(PG_FUNCTION_ARGS)
>
>  /*
>   * Only superusers are allowed to set options of a file_fdw foreign
> table.
> - * This is because the filename is one of those options, and we don't
> want
> - * non-superusers to be able to determine which file gets read.
> + * The reason for this is to prevent non-superusers from changing the
>
> Space after "the"
>
> -if (stat(filename, &stat_buf) == 0)
> +if ((! is_program) && (stat(filename, &stat_buf) == 0)))
>
> Space between ! and is_program.
>
>
> -if (strcmp(def->defname, "filename") == 0)
> +if ((strcmp(def->defname, "filename") == 0) ||
> (strcmp(def->defname, "program") == 0))
>
> I think the usual style would be to split the if statement into two lines
> as follows to keep within 80 characters per line [1]:
>
> +if ((strcmp(def->defname, "filename") == 0) ||
> +(strcmp(def->defname, "program") == 0))
>
> And likewise for:
>
> -   &fdw_private->filename, &fdw_private->options);
> +   &fdw_private->filename, &fdw_private->is_program,
> &fdw_private->options);
>
> By the way, doesn't the following paragraph in file-fdw.sgml need an
> update?
>
>  
>   Changing table-level options requires superuser privileges, for security
>   reasons: only a superuser should be able to determine which file is read.
>   In principle non-superusers could be allowed to change the other options,
>   but that's not supported at present.
>  
>
>
> I would like to mark this now as "Ready for Committer".
>
> Thanks,
> Amit
>
> [1] https://www.postgresql.org/docs/devel/static/source-format.html
>
>
>
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index b471991..7f534b1 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -59,6 +59,7 @@ struct FileFdwOption
 static const struct FileFdwOption valid_options[] = {
/* File options */
{"filename", ForeignTableRelationId},
+   {"program", ForeignTableRelationId},
 
/* Format options */
/* oids option is not supported */
@@ -85,10 +86,11 @@ static const struct FileFdwOption valid_options[] = {
  */
 typedef struct FileFdwPlanState
 {
-   char   *filename;   /* file to read */
-   List   *options;/* merged COPY options, excluding 
filename */
-   BlockNumber pages;  /* estimate of file's physical 
size */
-   double  ntuples;/* estimate of number of rows 
in file */
+   char   *filename;   /* file/program to read */
+   boolis_program; /* true if filename represents 
an OS command */
+   List   *options;   

Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-12 Thread Corey Huinker
On Mon, Sep 12, 2016 at 1:59 AM, Amit Langote  wrote:

> On 2016/09/11 8:04, Corey Huinker wrote:
> > V2 of this patch:
> >
> > Changes:
> > * rebased to most recent master
> > * removed non-tap test that assumed the existence of Unix sed program
> > * added non-tap test that assumes the existence of perl
> > * switched from filename/program to filename/is_program to more closely
> > follow patterns in copy.c
> > * slight wording change in C comments
>
> This version looks mostly good to me.  Except some whitespace noise in
> some hunks:
>
> @@ -139,7 +142,9 @@ static bool fileIsForeignScanParallelSafe(PlannerInfo
> *root, RelOptInfo *rel,
>   */
>  static bool is_valid_option(const char *option, Oid context);
>  static void fileGetOptions(Oid foreigntableid,
> -   char **filename, List **other_options);
> +   char **filename,
> +   bool *is_program,
>
> Space after "is_program,"
>
> @@ -196,16 +201,17 @@ file_fdw_validator(PG_FUNCTION_ARGS)
>
>  /*
>   * Only superusers are allowed to set options of a file_fdw foreign
> table.
> - * This is because the filename is one of those options, and we don't
> want
> - * non-superusers to be able to determine which file gets read.
> + * The reason for this is to prevent non-superusers from changing the
>
> Space after "the"
>
> -if (stat(filename, &stat_buf) == 0)
> +if ((! is_program) && (stat(filename, &stat_buf) == 0)))
>
> Space between ! and is_program.
>
>
> -if (strcmp(def->defname, "filename") == 0)
> +if ((strcmp(def->defname, "filename") == 0) ||
> (strcmp(def->defname, "program") == 0))
>
> I think the usual style would be to split the if statement into two lines
> as follows to keep within 80 characters per line [1]:
>
> +if ((strcmp(def->defname, "filename") == 0) ||
> +(strcmp(def->defname, "program") == 0))
>
> And likewise for:
>
> -   &fdw_private->filename, &fdw_private->options);
> +   &fdw_private->filename, &fdw_private->is_program,
> &fdw_private->options);
>
> By the way, doesn't the following paragraph in file-fdw.sgml need an
> update?
>
>  
>   Changing table-level options requires superuser privileges, for security
>   reasons: only a superuser should be able to determine which file is read.
>   In principle non-superusers could be allowed to change the other options,
>   but that's not supported at present.
>  
>
>
> I would like to mark this now as "Ready for Committer".
>
> Thanks,
> Amit
>
> [1] https://www.postgresql.org/docs/devel/static/source-format.html
>
>
>
(reposting non-top-posted...sorry)

Thanks for the review!

I agree with all the code cleanups suggested and have made them in the
attached patch, to save the committer some time.

Also in this patch, I changed sgml para to
 
  Changing table-level options requires superuser privileges, for security
  reasons: only a superuser should be able to determine which file is read
  or which program is run.  In principle non-superusers could be allowed to
  change the other options, but that's not supported at present.
 

"Determine" is an odd word in this context. I understand it to mean
"set/change", but I can see where a less familiar reader would take the
meaning to be "has permission to see the value already set". Either way, it
now mentions program as an option in addition to filename.
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index b471991..7f534b1 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -59,6 +59,7 @@ struct FileFdwOption
 static const struct FileFdwOption valid_options[] = {
/* File options */
{"filename", ForeignTableRelationId},
+   {"program", ForeignTableRelationId},
 
/* Format options */
/* oids option is not supported */
@@ -85,10 +86,11 @@ static const struct FileFdwOption valid_options[] = {
  */
 typedef struct FileFdwPlanState
 {
-   char   *filename;   /* file to read */
-   List   *options;/* merged COPY options, excluding 
filename */
-   BlockNumber pages;  /* estimate of file's physical 
size */
-   double  ntuples;/* estimate of number of rows 
in file */
+   char   *filename;   /* file/program to read */
+   boolis_program; /* true if filename represents 
an OS command */
+   

[HACKERS] COPY as a set returning function

2016-09-30 Thread Corey Huinker
Attached is a _very_ rough patch implementing a proof-of-concept function
copy_srf();

It allows you to do things like this:

# select a,c,e from copy_srf('echo 12,345,67,89,2016-01-01',true) as t(a
integer, b text, c text, d text, e date);
 a  | c  | e
++
 12 | 67 | 2016-01-01
(1 row)



Uses for this include:
- avoiding the pattern of creating a temp table just to select all the rows
back out and then discard the table (avoidable disk activity, avoidable oid
churn)
- avoiding the pattern of creating a file_fdw table in pg_temp just to drop
it after one select (avoidable oid churn)
- filtering file/program input by the columns that are relevant to the
user's needs.

This experiment arose from my realization that file_fdw just plugs into the
externally visible portions of copy.c to do all of its work. So why
couldn't we do the same for a set returning function? Of course it wasn't
as simple as that. The existing Begin/NextCopyFrom functions require the
->rel to be a valid Oid...which we won't have in this context, so I had to
bypass that code and use CopyFromRawFields() directly...which isn't
externally visible, hence this being a patch to core rather than an
extension.

Currently the function only accepts two parameters, "filename" and
"is_program". Header is always false and csv mode is always true. Obviously
if we go forward on this, we'll want to add that functionality back in, but
I'm holding off for now to keep the example simple and wait for consensus
on future direction.

As for that future direction, we could either have:
- a robust function named something like copy_srf(), with parameters for
all of the relevant options found in the COPY command
- a function that accepts an options string and parse that
- we could alter the grammar to make COPY RETURNING col1, col3, col5 FROM
'filename' a legit CTE.

Regardless of the path forward, I'm going to need help in getting there,
hence this email. Thank you for your consideration.


copy_as_a_srf.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixing inheritance merge behavior in ALTER TABLE ADD CONSTRAINT

2016-10-07 Thread Corey Huinker
On Fri, Oct 7, 2016 at 5:09 PM, Tom Lane  wrote:

> What seems like a saner answer to me is to change the backend so that it
> will accept these ALTER commands in either order, with the same end state.
> The reason it throws an error now, IMO, is just so that blindly issuing
> the same ALTER ADD CONSTRAINT twice will fail.  But we could deal with
> that by saying that it's okay as long as the initially-targeted constraint
> doesn't already have conislocal = true.
>


+1. Been bitten by this one myself.


Re: [HACKERS] proposal: psql \setfileref

2016-10-09 Thread Corey Huinker
>
> here is new update - some mentioned issues are fixed + regress tests and
>> docs
>>
>
>
>
This might tie into some work I'm doing. Is there any way these filerefs
could be used as the inline portion of a COPY command?
i.e. like this:

COPY my_table FROM STDIN
:file_ref
\.


Re: [HACKERS] proposal: psql \setfileref

2016-10-09 Thread Corey Huinker
On Sun, Oct 9, 2016 at 12:12 PM, Pavel Stehule 
wrote:

>
>
> 2016-10-09 17:27 GMT+02:00 Corey Huinker :
>
>> here is new update - some mentioned issues are fixed + regress tests and
>>>> docs
>>>>
>>>
>>>
>>>
>> This might tie into some work I'm doing. Is there any way these filerefs
>> could be used as the inline portion of a COPY command?
>> i.e. like this:
>>
>> COPY my_table FROM STDIN
>> :file_ref
>> \.
>>
>>
>>
> I understand, but I am not sure how difficult implementation it is. This
> part (COPY input) doesn't support parametrization - and parametrization can
> have a negative performance impact.
>
> Regards
>


I may not understand your response. I was thinking we'd want :file_ref to
simply 'cat' the file (or program) in place. The fact that the output was
consumed by COPY was coincidental. Does that change your response?


Re: [HACKERS] proposal: psql \setfileref

2016-10-09 Thread Corey Huinker
>
> look to code - some parts allows psql session variables, some not - I can
> use variables in statements - not in data block.
>
> More, there is ambiguity - :xxx should not be part of SQL statement (and
> then psql variables are possible), but : should be part of data.
>
> Regards
>
> Pavel
>

Makes sense, thanks for clearing it up.


Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-09 Thread Corey Huinker
>
>
> There's actually another use case here that's potentially extremely
> valuable for warehousing and other "big data": compact representation of a
> default value.
>
>
I too would benefit from tables having either a default value in the event
of a NOT-NULL column being flagged null, or a flat-out constant.

This would be a big win in partitioned tables where the partition can only
hold one value of the partitioning column.

I guess a constant would be a pg_type where the sole value is encoded, and
the column itself is stored like an empty string.


Re: [HACKERS] proposal: psql \setfileref

2016-10-09 Thread Corey Huinker
On Sun, Oct 9, 2016 at 11:26 PM, Jim Nasby  wrote:

> On 10/9/16 9:54 PM, Bruce Momjian wrote:
>
>> I understand, but I am not sure how difficult implementation it is. This
>>> part
>>> > (COPY input) doesn't support parametrization - and parametrization can
>>> have a
>>> > negative performance impact.
>>>
>> And it would need to be \:file_ref in COPY so real data doesn't trigger
>> it.
>>
>
> ISTM it'd be much saner to just restrict file ref's to only working with
> psql's \COPY, and not server-side COPY.


Which is a great discussion for the thread on "COPY as a set returning
function". I didn't mean to hijack this thread with a misguided tie-in.


Re: [HACKERS] COPY as a set returning function

2016-10-14 Thread Corey Huinker
>
> > That sounds fantastic. It'd help this copy variant retain feature parity
> > with normal copy. And it'd bring us closer to being able to FETCH in non
> > queries.
>
> On second thought, though, this couldn't exactly duplicate the existing
> COPY syntax, because COPY relies heavily on the rowtype of the named
> target table to tell it what it's copying.  You'd need some new syntax
> to provide the list of column names and types, which puts a bit of
> a hole in the "syntax we already know" argument.  A SRF-returning-record
> would have a leg up on that, because we do have existing syntax for
> defining the concrete rowtype that any particular call returns.
>
> regards, tom lane
>


I would like to make COPY itself a SRF. That's a bit beyond my
capabilities, so if that is the route we want to go, I will need help.

The syntax would probably look like this (new bits in bold):

WITH my_copy  AS (
COPY FROM 'example.csv' TO *RESULT SET(c1 text, c2 integer, dummy1
text, dummy2 text, c5 date)* WITH (FORMAT CSV)
*RETURNING c1, c2, c3*
)
SELECT ...
FROM my_copy
LEFT OUTER JOIN ref_table ...


The RESULT SET (colspecs) bit would be the rsinfo currently used by
copy_srf(). It would be nice if the CTE declaration could take types, but
it can't.

The RETURNING clause here doesn't return all the columns made available
from the COPY. That would be nice, but not required because the same
filtration could be done when the CTE is referenced. So if we required
RETURNING * to be the only RETURNING option, I'd be fine with that.

If we're ok with adding a function like copy_srf() to the core, will we
still be happy with it when COPY does get a RETURNING clause?


Somewhat off-topic: here's some other observations of a n00b who spent a
fair amount of time looking at the copy.c code.

1. BeginCopyFrom and NextCopyFrom pull attlist/tupdesc info from ->rel,
repeatedly. If we were going to try to leverage that code we'd need to
store those things in a separate cstate member so that we add complexity
only in the initialization of the copy state data struct, pulling the
result structure from rsinfo rather than a relation. There's probably a
minor performance gain to be had in keeping that info around. Refactoring
those two procs to allow for a pre-set attlist/tupdesc would help.

2. NextCopyFrom() checks every single time to see if it's row 0 and if it
should skip this header row. I know a single (row_num == 0 && has_header)
isn't much extra processing, but shouldn't we digest and discard headers
before going into the per-row loop?

3. All the code that handles indexes, constraints, buffering, etc, simply
doesn't apply in the SRF context.

4. The code somewhat needlessly mixes code for the COPY FROM and COPY TO
cases. There's probably a good reason for this, but it made for a lot of
clutter in achieving my very narrow goal.
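Observation 2 above amounts to hoisting the header check out of the per-row
loop: consume the header once before iterating, instead of testing
(row_num == 0 && has_header) for every tuple. A minimal sketch of that shape
(Python for brevity; names are illustrative, this is not the copy.c code):

```python
def next_copy_from(lines, has_header):
    """Yield parsed rows from an iterable of raw text lines.

    The header, if any, is digested and discarded exactly once,
    before the per-row loop ever starts.
    """
    it = iter(lines)
    if has_header:
        next(it, None)          # skip the header row up front
    for line in it:
        yield line.rstrip('\n').split('\t')
```

The per-row cost of the flag test is tiny, but the restructured version also
makes the control flow clearer: header handling is initialization, not row
processing.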


Re: [HACKERS] COPY as a set returning function

2016-10-17 Thread Corey Huinker
On Sun, Oct 16, 2016 at 9:01 AM, Craig Ringer 
wrote:

> On 15 Oct. 2016 04:56, "Corey Huinker"  wrote:
>
> > I would like to make COPY itself a SRF. That's a bit beyond my
> capabilities, so if that is the route we want to go, I will need help.
> >
> > The syntax would probably look like this (new bits in bold):
> >
> >> WITH my_copy  AS (
> >> COPY FROM 'example.csv' TO RESULT SET(c1 text, c2 integer, dummy1
> text, dummy2 text, c5 date) WITH (FORMAT CSV)
> >> RETURNING c1, c2, c3
> >> )
>
> Strong -1 from me on this approach. Our CTE implementation materializes
> everything so this is no better than COPYing to a temp table.
>
> Not unless you plan to fix that (and figure out the backward compatibility
> issues since the bug is documented as a feature) or implement RETURNING in
> subqueries... I'd go for the function.
>

Well, it saves burning the oid and the pg_attribute rows. A few long
running transactions can cause pg_attribute to bloat to 400GB on one of our
systems - hence my wanting something like this function.

If it does stay a function, we only need to implement 8 of the 12 options
as parameters (FREEZE and FORCE* options don't apply). My guess is that
future options added to COPY will be more about handling output or
optimizing table inserts, neither of which mean more options for this
proposed function.

Would the best approach be to build in a core srf-returning function that
might be deprecated once COPY is set-returning AND CTEs don't have to
materialize, or to refactor what's in copy.c such that a contrib module can
easily plug into it, and have copy_srf live there?


Re: [HACKERS] Multiple psql history files

2016-10-18 Thread Corey Huinker
On Tue, Oct 18, 2016 at 12:26 PM, Jonathan Jacobson 
wrote:

> The .psql_history file is naturally used by different DB connections
> (distinguished by a different combination of host + port + database + user).
> At least in my multi-database working environment, this leads sometimes to
> frustration because there are commands that cannot or should not be used by
> different connections.
> To solve this, psql could keep a separate command history file for each
> connection.
> I will be happy to make this my first contribution to PostgreSQL's code.
> What do you say?
>
> Regards,
> Jonathan
>

That's settable with HISTFILE

\set HISTFILE ~/.psql_history- :DBNAME

There's also :PORT and a few other vars.

(example borrowed from
http://stackoverflow.com/questions/17924906/where-is-psql-client-history-kept-psql-history-non-existant
)


Re: [HACKERS] COPY as a set returning function

2016-10-31 Thread Corey Huinker
Attached is a patch that implements copy_srf().

The function signature reflects cstate more than it reflects the COPY
options (filename+is_program instead of  FILENAME or PROGRAM, etc)

CREATE OR REPLACE FUNCTION copy_srf(
   filename    text DEFAULT null,
   is_program  boolean DEFAULT false,
   format  text DEFAULT 'text',  /* accepts text, csv, binary */
   delimiter   text DEFAULT null,
   null_string text DEFAULT E'\\N',
   header  boolean DEFAULT false,
   quote   text DEFAULT null,
   escape  text DEFAULT null,
   encodingtext DEFAULT null)
RETURNS SETOF RECORD

The major utility for this (at least for me) will be in ETLs that currently
make a lot of use of temp tables, and wish to either reduce I/O or avoid
pg_attribute bloat.

I have not yet implemented STDIN mode, but it's a start.

$ psql test
psql (10devel)
Type "help" for help.

# select * from copy_srf('echo 1,2; echo 4,5',true,'csv') as t(x text, y
text);
 x | y
---+---
 1 | 2
 4 | 5
(2 rows)

Time: 1.456 ms
# select * from copy_srf('/tmp/small_file.txt'::text,false,'text') as
t(x text, y text);
  x  |  y
-+-
 1   | 2
 a   | b
 cde | fgh
(3 rows)


On Mon, Oct 17, 2016 at 2:33 PM, Merlin Moncure  wrote:

> On Fri, Sep 30, 2016 at 9:56 PM, Tom Lane  wrote:
> > Craig Ringer  writes:
> >> On 1 Oct. 2016 05:20, "Tom Lane"  wrote:
> >>> I think the last of those suggestions has come up before.  It has the
> >>> large advantage that you don't have to remember a different syntax for
> >>> copy-as-a-function.
> >
> >> That sounds fantastic. It'd help this copy variant retain feature parity
> >> with normal copy. And it'd bring us closer to being able to FETCH in non
> >> queries.
> >
> > On second thought, though, this couldn't exactly duplicate the existing
> > COPY syntax, because COPY relies heavily on the rowtype of the named
> > target table to tell it what it's copying.  You'd need some new syntax
> > to provide the list of column names and types, which puts a bit of
> > a hole in the "syntax we already know" argument.  A SRF-returning-record
> > would have a leg up on that, because we do have existing syntax for
> > defining the concrete rowtype that any particular call returns.
>
> One big disadvantage of SRF-returning-record syntax is that functions
> are basically unwrappable with generic wrappers sans major gymnastics
> such as dynamically generating the query and executing it.  This is a
> major disadvantage relative to the null::type hack we use in the
> populate_record style functions and perhaps ought to make this
> (SRF-returning-record syntax) style of use discouraged for useful
> library functions.  If there were a way to handle wrapping I'd
> withdraw this minor objection -- this has come up in dblink
> discussions a few times).
>
> merlin
>
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index ada2142..0876ee1 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1006,6 +1006,21 @@ LANGUAGE INTERNAL
 STRICT IMMUTABLE PARALLEL SAFE
 AS 'jsonb_insert';
 
+CREATE OR REPLACE FUNCTION copy_srf(
+   IN filename text DEFAULT null, 
+   IN is_program boolean DEFAULT false,
+   IN format text DEFAULT 'text',
+   IN delimiter text DEFAULT null,
+   IN null_string text DEFAULT E'\\N',
+   IN header boolean DEFAULT false,
+   IN quote text DEFAULT null,
+   IN escape text DEFAULT null,
+   IN encoding text DEFAULT null)
+RETURNS SETOF RECORD
+LANGUAGE INTERNAL
+VOLATILE ROWS 1000 COST 1000 CALLED ON NULL INPUT
+AS 'copy_srf';
+
 -- The default permissions for functions mean that anyone can execute them.
 -- A number of functions shouldn't be executable by just anyone, but rather
 -- than use explicit 'superuser()' checks in those functions, we use the GRANT
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b4140eb..90ed2c5 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -30,6 +30,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -555,7 +556,6 @@ CopyGetData(CopyState cstate, void *databuf, int minread, 
int maxread)
 errmsg("could not read from 
COPY file: %m")));
break;
case COPY_OLD_FE:
-
/*
 * We cannot read more than minread bytes (which in 
practice is 1)
 * because old protocol doesn't have any clear way of 
separating
@@ -4555,3 +4555,377 @@ CreateCopyDestReceiver(void)
 
return (DestReceiver *) self;
 }
+
+Datum
+copy_srf(PG_FUNCTION_ARGS)
+{
+   ReturnSetInfo   *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+   TupleDesc   tupdesc;
+   Tuplestorestate *tupstore = NULL;
+   Memory

Re: [HACKERS] sequential scan result order vs performance

2016-10-31 Thread Corey Huinker
On Sun, Oct 30, 2016 at 11:37 PM, Jim Nasby 
wrote:

> BTW, I've sometimes wished for a mode where queries would silently have
> result ordering intentionally futzed, to eliminate any possibility of
> dependence on tuple ordering (as well as having sequences start at some
> random value). I guess with the hooks that are in place today it wouldn't
> be hard to stick a ORDER BY random() in if there wasn't already a Sort node
> at the top level...


+1
In Oracle, we sorta had that feature by adding a parallel hint to a query
even if it didn't need it. It came in handy.
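To make the idea concrete outside the database: the proposed mode is just a
controlled shuffle applied to a result set with no explicit ordering, which
is enough to expose callers that silently depend on tuple order. A toy
illustration (Python; all names here are hypothetical, nothing below is psql
or backend code):

```python
import random

def fetch_unordered():
    # Stand-in for a query result that happens to come back in
    # insertion order even though no ORDER BY was given.
    return [(1, 'a'), (2, 'b'), (3, 'c')]

def futz(rows, seed=0):
    # Deliberately permute the rows so code that assumes a stable
    # order fails fast instead of working by accident.
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    return rows

# Order-sensitive consumers break under futzing; consumers that
# sort explicitly see the same data either way.
assert sorted(futz(fetch_unordered())) == fetch_unordered()
```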


Re: [HACKERS] Declarative partitioning - another take

2016-11-01 Thread Corey Huinker
On Tue, Nov 1, 2016 at 12:57 PM, Robert Haas  wrote:

> For strings and numeric types that are not integers, there is in
> theory a loss of power.  If you want a partition that allows every
> value starting with 'a' plus the string 'b' but not anything after
> that, you are out of luck.  START ('a') END ('b') INCLUSIVE would have
> done exactly what you want, but now you need to store the first string
> that you *don't* want to include in that partition, and what's that?
> Dunno.  Or similarly if you want to store everything from 1.0 up to
> and including 2.0 but nothing higher, you can't, really.
>
>
Exactly. This is especially true for date ranges. There's a lot of
cognitive dissonance in defining the "2014" partition as < '2015-01-01', as
was the case in Oracle waterfall-style partitioning. That was my reasoning
for pushing for range-ish syntax as well as form.


> But who wants that?  People who are doing prefix-based partitioning of
> their text keys are going to want all of the 'a' things together, and
> all of the 'b' things in another category.  Same for ranges of
> floating-point numbers, which are also probably an unlikely candidate
> for a partitioning key anyway.
>

/me raises hand.  We have tables with a taxonomy in them where the even
data splits don't fall on single letter boundaries, and often the single
string values have more rows than entire letters. In those situations,
being able to express ['XYZ','XYZ'] is important.  ['XYZ','XZ') would let
'XYZ1' bleed into the partition and ['XYZ','XYZ1') lets in other values,
and so I go chasing down the non-discrete set rabbit hole.

If we're worried about keywords, maybe a BOUNDED '[]' clause?
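The bleed described above follows directly from lexicographic ordering. A
minimal sketch in Python, assuming plain byte-wise comparison (C collation)
rather than a locale-aware collation:

```python
# Half-open interval membership under byte-wise string ordering.
def in_range(value, lo, hi):
    """True if value falls in the half-open interval [lo, hi)."""
    return lo <= value < hi

# ['XYZ','XZ') lets 'XYZ1' bleed into the partition:
print(in_range("XYZ1", "XYZ", "XZ"))    # True (unwanted)

# ['XYZ','XYZ1') still admits values other than 'XYZ' itself:
print(in_range("XYZ0", "XYZ", "XYZ1"))  # True (also unwanted)

# ...while the value actually wanted is included either way:
print(in_range("XYZ", "XYZ", "XZ"))     # True
```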


Re: [HACKERS] Declarative partitioning - another take

2016-11-01 Thread Corey Huinker
On Tue, Nov 1, 2016 at 2:01 PM, Robert Haas  wrote:

> Yeah.  That syntax has some big advantages, though.  If we define that
> partition as START ('2014-01-01') INCLUSIVE END ('2014-12-31')
> INCLUSIVE, there's no way for the system to tell that there's no
> gap between that ending bound and the starting bound of the 2015
> partition, because the system has no domain-specific knowledge that
> there is no daylight between 2014-12-31 and 2015-01-01.  So if we
> allow things to be specified that way, then people will use that
> syntax and then complain when it doesn't perform quite as well as
> START ('2014-01-01') END ('2015-01-01').  Maybe the difference isn't
> material and maybe we don't care; what do you think?
>

It was a fight I didn't expect to win, and even if we don't get
[x,x]-expressible partitions, at least we're not in the Oracle
context-waterfall, where the lower bound of your partition is determined by
the upper bound of the NEXT partition.

(I really don't want to get tied up adding a system for adding and
> subtracting one to and from arbitrary data types.  Life is too short.
> If that requires that users cope with a bit of cognitive dissonance,
> well, it's not the first time something like that will have happened.
> I have some cognitive dissonance about the fact that creat(2) has no
> trailing "e" but truncate(2) does, and moreover the latter can be used
> to make a file longer rather than shorter.  But, hey, that's what you
> get for choosing a career in computer science.)
>

That noise you heard was the sound of my dream dying.


Re: [HACKERS] Declarative partitioning - another take

2016-11-01 Thread Corey Huinker
>
> OTOH I've seen a lot of people bitten by [2014-01-01,2014-12-31] on
> TIMESTAMP intervals.
>

No argument there.


> Everybody remembers december has 31 days, but when we have to do
> MONTHLY partitions if you use closed intervals someone always miskeys
> the number of days, or forgets wheter a particular year is leap or
> not, and when doing it automatically I always have to code it as start
> + 1 month - 1day. In my experience having the non-significant part of
> the dates ( days in monthly case, months too in yearly cases ) both 1
> and equal in start and end makes it easier to check and identify, and
> less error prone.
>

Being able to define partitions by expressions based on the values I want
would be a good thing.


> You just do the classical ( I've had to do it ) closed end || minimum
> char ( "XYZ","XYZ\0" in this case ). It is not that difficult as
> strings have a global order, the next string to any one is always that
> plus the \0, or whatever your minimum is.
>

I've thought about doing that in the past, but wasn't sure it would
actually work correctly. If you have experience with it doing so, that
would be encouraging. How does that *look* when you print your partition
layout though?
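The trick is easy to sanity-check offline. A minimal Python sketch, assuming
plain byte-wise (C collation) ordering; note this is only the idealized
picture, since PostgreSQL text values cannot actually contain NUL bytes and
locale-aware collations may sort differently:

```python
# The "closed end || minimum char" trick: the closed interval
# ['XYZ','XYZ'] becomes the half-open ['XYZ', 'XYZ' + '\0'),
# because s + '\0' is the immediate successor of s in byte-wise order.
lo = "XYZ"
hi = lo + "\0"   # smallest string strictly greater than lo

def in_half_open(value):
    return lo <= value < hi

print(in_half_open("XYZ"))   # True  -- the single value we want
print(in_half_open("XYZ1"))  # False -- '1' sorts after '\0'
print(in_half_open("XY"))    # False -- below the lower bound
```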


Re: [HACKERS] Declarative partitioning - another take

2016-11-01 Thread Corey Huinker
On Tue, Nov 1, 2016 at 2:24 PM, Robert Haas  wrote:

> Well, I'm not sure we've exactly reached consensus here, and you're
> making me feel like I just kicked a puppy.
>

It was hyperbole. I hope you found it as funny to read as I did to write.
This is a great feature and I'm not going to make "perfect" the enemy of
"good".


Re: [HACKERS] Making table reloading easier

2016-11-03 Thread Corey Huinker
>
>
>   ALTER TABLE my_table
>   DISABLE INDEX ALL;
>

+1
This very thing came up in a conversation with PeterG early last year. I
was in favor then and I was surprised that the only thing standing in the
way was a lack of ALTER TABLE syntax.
Creating temporary data structures to mimic existing metadata structures is
a pain.


>
> It'd be even better to also add a REINDEX flag to COPY, where it
> disables indexes and re-creates them after it finishes. But that could
> be done separately.
>

I'm iffy on the COPY change. If we add index rebuilding, why not disabling
as well? If the COPY fails, what state do we leave the indexes in?

I'm not sure I can tackle this in the current dev cycle,


I may have some spare cycles to devote to this, but it's unfamiliar
territory. I'm happy to do the grunt work if I had some higher level
guidance.


[HACKERS] Copying Permissions

2016-11-03 Thread Corey Huinker
Craig's post yesterday about exposing syntax for disabling indexes reminded
me of another feature I think we're lacking in areas where we have to do
table management.

The issue is to create a *something* that has the exact permissions of
another *something*. Usually it's creating a table related to (but not yet
inheriting) a parent, but it could also be to drop and recreate a
*something*, making sure it has the same permissions it had before.

BEGIN;

CREATE VIEW dummy AS SELECT 1::text as dummy;

UPDATE pg_class
SET relacl = ( SELECT relacl FROM pg_class
   WHERE oid = 'foo'::regclass)
WHERE oid = 'dummy'::regclass;

DROP VIEW foo;

CREATE VIEW foo AS ;

UPDATE pg_class
SET relacl = ( SELECT relacl FROM pg_class
   WHERE oid = 'dummy'::regclass)
WHERE oid = 'foo'::regclass;

END;

I suppose I could have learned how to store a relacl as a scalar and just
saved it to a variable, but even then I'd have to twiddle with (and have
the permissions to twiddle with) pg_class.

So it'd be nice to:
ALTER TABLE bar SET PERMISSIONS FROM foo;
or maybe even
GRANT SAME PERMISSIONS ON VIEW bar FROM foo;

Thoughts?


Re: [HACKERS] Copying Permissions

2016-11-09 Thread Corey Huinker
>
> > SET relacl = ( SELECT relacl FROM pg_class
> >WHERE oid = 'foo'::regclass)
> > WHERE oid = 'dummy'::regclass;
>
> Yikes, let's not suggest folks go updating catalog tables, ever, please.
> :)
>

Glad you find that as icky as I do.


> Should we have a way for users to define an ACL ala the ALTER DEFAULT
> PERMISSIONS approach where the ACL is not attached to any particular
> object and is, instead, something which can be assigned to a table.
> Further, if we support that, what happens if that is changed later?  I
> could see use-cases for both having that change impact all objects and
> for not having that happen.
>

I think allowing users to receive and send serialized relacl values (which
is what I *think* you're asking about here) is only slightly less icky, and
presents a backward compatibility issue. Those issues go away if the ACL is
contained in an existing object, or exists only for the life of a
statement. In which case I think you're suggesting something like this:


BEGIN;
 GATHER TEMPORARY DEFAULT PRIVILEGES FROM view_name;
 DROP VIEW view_name;
 CREATE VIEW view_name as ...;

COMMIT;


Which would solve the problem provided I don't want to drop dependent
objects with different permissions. Once I have to do a DROP a;DROP
b;CREATE b;CREATE a; and the permissions of A and B don't match, I'm sunk.


Second, as always, what's the syntax going to actually be?  I don't
> think GRANT SAME PERMISSIONS is going to work out too well in the
> parser, and it seems a bit grotty to me anyway.  I do think this should
> be associated with GRANT rather than ALTER TABLE- GRANT is what we use
> for managing privileges on an object.
>

So GRANT / REVOKE are a bit weird in this case, because they operate on an
object as it pertains to 1+ roles. Here are adding in a reference to
another like-typed object, and the roles aren't even mentioned.

Moreover, the operation itself would potentially do both GRANTing and
REVOKEing, depending on what the target objects permissions were relative
to the source object. So there's situations where an object could end up
with fewer permissions after a GRANT than it had before.

Or...we could instead decide that the GRANT only adds permissions, never
revokes, and if the user wants an exact copy then it's up to them to first
revoke all privs on the new object before the GRANT. Either way, the syntax
might be:

BEGIN;
 CREATE TEMPORARY VIEW dummy AS SELECT 1 AS dummy_col;
 GRANT ALL PRIVILEGES ON VIEW dummy FROM my_view;
 DROP VIEW my_view;
 CREATE VIEW my_view ...;
 REVOKE ALL PRIVILEGES on my_view FROM public ; /* repeat for every other
role you can think of ... ick */
 GRANT ALL PRIVILEGES ON VIEW my_view FROM dummy;
COMMIT;

That's still clumsy, but at least we've avoided having a user touch
pg_class.relacl.

So after all that wrangling, i got around to where Tom got rather quickly:
ALTER TABLE x COPY PERMISSIONS FROM y;

If we're worried about the ALTER-person's authority to GRANT things already
granted to table y, then I suppose the best thing to do would be this:

1. Strip all permissions from x (empty relacl), so the ALTER-ing person
pretty much has to be the owner.
2. Iterate over the permissions in the relacl of y, and attempt to grant
them (as the ALTER-person) one by one, issuing NOTICE or WARNING whenever a
grant fails.
3. The operation is judged to have succeeded if at least one permission is
granted, or NO grants failed (i.e. there was nothing to grant).

I can see obvious problems with copying grants from one user to another on
an existing object not of the user's creation, but in this case the
ALTER-ing person already has ownership (or close) of the object, they're
not compromising any previously existing object. Still, I'm sure somebody
could dream up a priv escalation somehow, hence my
iterate-and-spaghetti-test the individual grants approach.
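Control flow of those three steps, sketched in Python purely for
illustration (the real implementation would be C inside ALTER TABLE;
`grant_fn` and `toy_grant` are hypothetical stand-ins for attempting one
GRANT as the ALTER-ing role):

```python
def copy_permissions(source_acl, grant_fn):
    """Sketch of the proposed iterate-and-test ACL copy.

    Step 1: start from an empty ACL (all permissions stripped).
    Step 2: attempt each grant from the source ACL individually,
            warning rather than aborting on failure.
    Step 3: succeed if at least one grant worked, or nothing failed.
    """
    target_acl = []                      # step 1
    failures = 0
    for entry in source_acl:             # step 2
        try:
            grant_fn(entry)
            target_acl.append(entry)
        except PermissionError as e:
            failures += 1                # would be a NOTICE/WARNING
            print(f"WARNING: could not grant {entry}: {e}")
    succeeded = len(target_acl) > 0 or failures == 0   # step 3
    return target_acl, succeeded

# toy grant function: pretend the ALTER-er may not re-grant bob's privileges
def toy_grant(entry):
    if entry == "bob=arwdDxt":
        raise PermissionError("no grant option")

acl, ok = copy_permissions(["alice=r", "bob=arwdDxt"], toy_grant)
print(acl, ok)   # ['alice=r'] True
```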


On Wed, Nov 9, 2016 at 1:35 PM, Tom Lane  wrote:

> Robert Haas  writes:
> > On Tue, Nov 8, 2016 at 9:48 AM, Stephen Frost 
> wrote:
> >> Second, as always, what's the syntax going to actually be?  I don't
> >> think GRANT SAME PERMISSIONS is going to work out too well in the
> >> parser, and it seems a bit grotty to me anyway.  I do think this should
> >> be associated with GRANT rather than ALTER TABLE- GRANT is what we use
> >> for managing privileges on an object.
>
> > One thing to think about is that GRANT sort of implies adding
> > privileges, but this operation would both add and remove privileges as
> > necessary.
>
> Other things to think about:
>
> 1. If you can GRANT x, that generally implies that you can REVOKE x.
> What would REVOKE SAME PERMISSIONS mean?
>
> 2. The GRANT/REVOKE syntax is largely governed by the SQL standard.
> We risk getting boxed in by picking something that will conflict
> with future spec extensions in this area.
>
> On the whole, I suspect some sort of "ALTER TABLE x COPY PERMISSIONS
> FROM y" syntax would be better.
>
> BTW, please specify what the grantor 

Re: [HACKERS] Do we need use more meaningful variables to replace 0 in catalog head files?

2016-11-10 Thread Corey Huinker
On Wed, Nov 9, 2016 at 10:47 AM, Tom Lane  wrote:

> Yeah, that's the thread I remembered.  I think the basic conclusion was
> that we needed a Perl script that would suck up a bunch of data from some
> representation that's more edit-friendly than the DATA lines, expand
> symbolic representations (regprocedure etc) into numeric OIDs, and write
> out the .bki script from that.  I thought some people had volunteered to
> work on that, but we've seen no results ...
>

If there are no barriers to adding it to our toolchain, could that
more-edit-friendly representation be a SQLite database?

I'm not suggesting we store a .sqlite file in our repo. I'm suggesting that
we store the dump-restore script in our repo, and the program that
generates the .bki script would query the generated SQLite db.

From that initial dump, any changes to pg_proc.h would be appended to the
dumped script

...

/* add new frombozulation feature */

ALTER TABLE pg_proc_template ADD frombozulator text;
/* bubbly frombozulation is the default for volatile functions */
UPDATE pg_proc_template SET frombozulator = 'bubbly' WHERE provolatile =
'v';

/* proposed new function */

INSERT INTO pg_proc_template(proname,proleakproof) VALUES ("new_func",'f');



That'd communicate the meaning of our changes rather nicely. A way to eat
our own conceptual dogfood.

Eventually it'd get cluttered and we'd replace the populate script with a
fresh ".dump". Maybe we do that as often as we reformat our C code.

I think Stephen Frost suggested something like this a while back, but I
couldn't find it after a short search.


Re: [HACKERS] Do we need use more meaningful variables to replace 0 in catalog head files?

2016-11-10 Thread Corey Huinker
On Thu, Nov 10, 2016 at 6:41 PM, Tom Lane  wrote:

> I think you've fundamentally missed the point here.  A data dump from a
> table would be semantically indistinguishable from the lots-o-DATA-lines
> representation we have now.  What we want is something that isn't that.
> In particular I don't see how that would let us have any extra level of
> abstraction that's not present in the finished form of the catalog tables.
>

I was thinking several tables, with the central table having column values
which we find semantically descriptive, and having lookup tables to map
those semantically descriptive keys to the value we actually want in the
pg_proc column. It'd be a tradeoff of macros for entries in lookup tables.


> I'm not very impressed with the suggestion of making a competing product
> part of our build dependencies, either.
>

I don't see the products as competing, nor did the presenter of
https://www.pgcon.org/2014/schedule/events/736.en.html (title: SQLite:
Protégé of PostgreSQL). That talk made the case that SQLite's goal is to be
the foundation of file formats, not an RDBMS. I do understand wanting to
minimize build dependencies.


> If we wanted to get into build
> dependency circularities, we could consider using a PG database in this
> way ... but I prefer to leave such headaches to compiler authors for whom
> it comes with the territory.
>

Agreed, bootstrapping builds aren't fun. This suggestion was a way to have
a self-contained format that uses concepts (joining a central table to
lookup tables) already well understood in our community.


Re: [HACKERS] proposal: psql \setfileref

2016-11-15 Thread Corey Huinker
On Sun, Nov 13, 2016 at 1:46 PM, Pavel Stehule 
wrote:

> r - read file without any modification
> rq - read file, escape as literal and use outer quotes
> rb - read binary file - transform to hex code string
> rbq - read binary file, transform to hex code string and use outer quotes
>
> Usage:


I am not asking for this feature now, but do you see any barriers to later
adding x/xq/xb/xbq equivalents for executing a shell command?


[HACKERS] dblink get_connect_string() passes FDW option "updatable" to the connect string, connection fails.

2016-11-20 Thread Corey Huinker
I ran into this today:

select current_database() as current_db \gset
CREATE EXTENSION postgres_fdw;
CREATE EXTENSION
CREATE EXTENSION dblink;
CREATE EXTENSION
CREATE ROLE bob LOGIN PASSWORD 'bob';
CREATE ROLE
CREATE SERVER bob_srv FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( host
'localhost', dbname :'current_db' );
CREATE SERVER
CREATE USER MAPPING FOR public SERVER bob_srv OPTIONS ( user 'bob',
password 'bob' );
CREATE USER MAPPING
SELECT *
FROM dblink('bob_srv','SELECT 1') as t(x integer);
 x
---
 1
(1 row)

ALTER SERVER bob_srv OPTIONS (updatable 'true');
ALTER SERVER
SELECT *
FROM dblink('bob_srv','SELECT 1') as t(x integer);
psql:bug_example.sql:18: ERROR:  could not establish connection
DETAIL:  invalid connection option "updatable"


Is this something we want to fix?
If so, are there any other fdw/server/user-mapping options that we don't
want to pass along to the connect string?

Steps to re-create:

bug_example.sh:#!/bin/bash
dropdb bug_example
dropuser bob
createdb bug_example
psql bug_example -f bug_example.sql


bug_example.sql:

\set ECHO all

select current_database() as current_db \gset

CREATE EXTENSION postgres_fdw;
CREATE EXTENSION dblink;
CREATE ROLE bob LOGIN PASSWORD 'bob';
CREATE SERVER bob_srv FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( host
'localhost', dbname :'current_db' );
CREATE USER MAPPING FOR public SERVER bob_srv OPTIONS ( user 'bob',
password 'bob' );

SELECT *
FROM dblink('bob_srv','SELECT 1') as t(x integer);

ALTER SERVER bob_srv OPTIONS (updatable 'true');

SELECT *
FROM dblink('bob_srv','SELECT 1') as t(x integer);


Re: [HACKERS] dblink get_connect_string() passes FDW option "updatable" to the connect string, connection fails.

2016-11-21 Thread Corey Huinker
>
> The dblink docs recommend using dblink_fdw as the FDW for this purpose,
> which would only accept legal connstr options.  However, I can see the
> point of using a postgres_fdw server instead, and considering that
> dblink isn't actually enforcing use of any particular FDW type, it seems
> like the onus should be on it to be more wary of what the options are.
>

I have to admit, this was the first I'd heard of dblink_fdw. I'm glad it
exists, though. And yes, I'd like to be able to use postgres_fdw entries
because there's value in knowing that the connection for your ad-hoc SQL
exactly matches the connection used for other FDW tables.


> It looks like this might be fairly easy to fix by having
> get_connect_string() use is_valid_dblink_option() to check each
> option name, and silently ignore options that are inappropriate.
>

From what I can tell, it is very straightforward; the context oids are set
up just a few lines above where the new is_valid_dblink_option() calls
would be.

I'm happy to write the patch, for both v10 and any back-patches we feel are
necessary. However, I suspect with a patch this trivial that reviewing
another person's patch might be more work for a committer than just doing
it themselves. If that's not the case, let me know and I'll get started.


Re: [HACKERS] COPY IN/BOTH vs. extended query mode

2017-02-19 Thread Corey Huinker
On Sun, Feb 19, 2017 at 9:04 AM, Robert Haas  wrote:

If you tried to write an SQL-callable function that internally started
> and ended a copy from the client, then I think you would run into this
> problem, and probably some others.
>
>
That's it. I had a PoC patch submitted that allowed someone to do this

insert into some_table(id, total_sum) select id, sum(a_numeric_metric) from
copy_srf('/a/file/name') group by 1

or

insert into some_table(id, total_sum) select id, sum(a_numeric_metric) from
copy_srf('/a/program/name arg1 arg2',true) group by 1


and those worked just fine; however, attempts to use STDIN

insert into some_table(id, total_sum) select id, sum(a_numeric_metric) from
copy_srf(null) group by 1

failed, because as it was explained to me, the order of such events would
be:

1. start query
2. send result set format to client
3. start copy which implies that query result set is done
4. finish copy
5. emit query results to client, but the defining result format is gone,
thus error.

I'm just putting this here for future reference in case there is a protocol
change in the works.


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-02-22 Thread Corey Huinker
>
>  but if you think that it should be somewhere else, please advise Corey
> about where to put it.


Just a reminder that I'm standing by for advice.

The issue at hand is whether the if-state should be a part of the
PsqlScanState, or if it should be a separate state variable owned by
MainLoop() and passed to HandleSlashCommands(), ... or some other solution.


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-02-22 Thread Corey Huinker
On Wed, Feb 22, 2017 at 4:00 PM, Tom Lane  wrote:

> Corey Huinker  writes:
> >> but if you think that it should be somewhere else, please advise Corey
> >> about where to put it.
>
> > Just a reminder that I'm standing by for advice.
>
> Sorry, I'd lost track of this thread.
>

Judging by the volume of the 50-or-so active threads on this list, I
figured you were busy.


> > The issue at hand is whether the if-state should be a part of the
> > PsqlScanState, or if it should be a separate state variable owned by
> > MainLoop() and passed to HandleSlashCommands(), ... or some other
> solution.
>
> My reaction to putting it in PsqlScanState is pretty much "over my dead
> body".  That's just trashing any pretense of an arms-length separation
> between psql and the lexer proper.  We only recently got done sweating
> blood to create that separation, why would we toss it away already?
>

Good to know that history.


>
> If you're concerned about the notational overhead of passing two arguments
> rather than one, my druthers would be to invent a new struct type, perhaps
> named along the lines of PsqlFileState or PsqlInputState, and pass that
> around.  One of its fields would be a PsqlScanState pointer, the rest
> would be for if-state and whatever else we think we need in per-input-file
> state.
>

I wasn't, my reviewer was. I thought about the super-state structure like
you described, and decided I was happy with two state params.


> However, that way is doubling down on the assumption that the if-state is
> exactly one-to-one with input file levels, isn't it?  We might be better
> off to just live with the separate arguments to preserve some flexibility
> in that regard.  The v12 patch doesn't look that awful in terms of what
> it's adding to argument lists.
>

The rationale for tying if-state to file levels is not so much about
anything if-then related as about the mess we'd create for whatever poor
soul decided to undertake \while loops down the road, and the difficulties
they'd have trying to unwind/rewind jump points in file(s)...keeping it
inside one file makes things simpler for the coding and the coder.


> One thing I'm wondering is why the "active_branch" bool is in "pset"
> and not in the conditional stack.  That seems, at best, pretty grotty.
> _psqlSettings is meant for reasonably persistent state.
>

With the if-stack moved to MainLoop(), nearly all the active_branch checks
could be against a variable that lives in MainLoop(), with two big
exceptions: GetVariable() needs to know when NOT to expand a variable
because it's in a false-block, and get_prompt will need to know when it's
in a false block for printing the '@' prompt hint or equivalent, and pset
is the only global around I know of to do that. I can move nearly all the
is-this-branch-active checks to structures inside of MainLoop(), and that
does strike me as cleaner, but there will still have to be that gross bit
where we update pset.active_branch so that the prompt and GetVariable() are
clued in.


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-02-22 Thread Corey Huinker
>
>
> Dunno, that sounds a lot like an "if the only tool I have is a hammer,
> then this must be a nail" argument.


More of a "don't rock the boat more than absolutely necessary", but knowing
that adding another global struct might be welcomed is good to know.


> reasonable interpretation of what it's for.  Inventing a PsqlFileState or
> similar struct might be a good idea to help pull some of that cruft out of
> pset and get it back to having a reasonably clearly defined purpose of
> holding "current settings".
>

+1

> command was ignored" warning messages).  I'm failing to follow why
> GetVariable would need to care.

It took me a second to find the post, written by Daniel Verite on Jan 26,
quoting:

> Revised patch

A comment about control flow and variables:
in branches that are not taken, variables are expanded
nonetheless, in a way that can be surprising.
Case in point:

\set var 'ab''cd'
-- select :var;
\if false
  select :var ;
\else
  select 1;
\endif

The 2nd reference to :var has a quoting hazard, but since it's within
an "\if false" branch, at a glance it seems like this script might work.
In fact it barfs with:
  psql:script.sql:0: found EOF before closing \endif(s)

AFAICS what happens is that :var gets injected and starts a
runaway string, so that as far as the parser is concerned
the \else ..\endif block slips into the untaken branch, as a part of
that unfinished string.



So that was the reasoning behind requiring GetVariable to know whether or
not the statement was being ignored.


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-02-22 Thread Corey Huinker
On Wed, Feb 22, 2017 at 5:11 PM, Corey Huinker 
wrote:

> Dunno, that sounds a lot like an "if the only tool I have is a hammer,
>> then this must be a nail" argument.
>
>
> More of a "don't rock the boat more than absolutely necessary", but
> knowing that adding another global struct might be welcomed is good to know.
>
>

After some research, GetVariable is called by psql_get_variable, which is
one of the callback functions passed to psql_scan_create(). So passing a
state variable around probably isn't going to work and PsqlFileState now
looks like the best option.


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-02-22 Thread Corey Huinker
On Wed, Feb 22, 2017 at 5:59 PM, Tom Lane  wrote:

> Ah, I see why *that* wants to know about it ... I think.  I suppose you're
> arguing that variable expansion shouldn't be able to insert, say, an \else
> in a non-active branch?  Maybe, but if it can insert an \else in an active
> branch, then why not non-active too?  Seems a bit inconsistent.
>

The major reason was avoiding situations like what Daniel showed: where
value of a variable that is meaningless/undefined in the current
false-block context gets expanded anyway, and thus code inside a false
block has effects outside of that block. Granted, his example was
contrived. I'm open to removing that feature and seeing what breaks in the
test cases.


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-02-22 Thread Corey Huinker
On Wed, Feb 22, 2017 at 6:15 PM, Corey Huinker 
wrote:

> On Wed, Feb 22, 2017 at 5:59 PM, Tom Lane  wrote:
>
>> Ah, I see why *that* wants to know about it ... I think.  I suppose you're
>> arguing that variable expansion shouldn't be able to insert, say, an \else
>> in a non-active branch?  Maybe, but if it can insert an \else in an active
>> branch, then why not non-active too?  Seems a bit inconsistent.
>>
>
> The major reason was avoiding situations like what Daniel showed: where
> value of a variable that is meaningless/undefined in the current
> false-block context gets expanded anyway, and thus code inside a false
> block has effects outside of that block. Granted, his example was
> contrived. I'm open to removing that feature and seeing what breaks in the
> test cases.
>


Welcome to v15, highlights:
- all conditional data structure management moved to conditional.h and
conditional.c
- conditional state lives in mainloop.c and is passed to
HandleSlashCommands, exec_command and get_prompt as needed
- no more pset.active_branch, uses conditional_active(conditional_stack)
instead
- PsqlScanState no longer has branching state
- Implements the %R '@' prompt on false branches.
- Variable expansion is never suppressed even in false blocks, regression
test edited to reflect this.
- ConditionalStack could morph into PsqlFileState without too much work.
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index ae58708..dac8e37 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2035,6 +2035,79 @@ hello 10
 
   
 
+  
+\if expr
+\elif expr
+\else
+\endif
+
+
+This group of commands implements nestable conditional blocks, like
+this:
+
+
+-- set ON_ERROR_STOP in case the variables are not valid boolean expressions
+\set ON_ERROR_STOP on
+SELECT
+EXISTS(SELECT 1 FROM customer WHERE customer_id = 123) as is_customer,
+EXISTS(SELECT 1 FROM employee WHERE employee_id = 456) as is_employee
+\gset
+\if :is_customer
+SELECT * FROM customer WHERE customer_id = 123;
+\elif :is_employee
+\echo 'is not a customer but is an employee'
+SELECT * FROM employee WHERE employee_id = 456;
+\else
+\if yes
+\echo 'not a customer or employee'
+\else
+\echo 'this should never print'
+\endif
+\endif
+
+
+Conditional blocks must begin with a \if and end
+with an \endif, and the pairs must be found in
+the same source file. If an EOF is reached on the main file or an
+\include-ed file before all local
+\if-\endif are matched, then
+psql will raise an error.
+
+
+A conditional block can have any number of 
+\elif clauses, which may optionally be followed by a
+single \else clause.
+
+
+The \if and \elif commands
+read the rest of the line and evaluate it as a boolean expression.
+Currently, expressions are limited to a single unquoted string
+which are evaluated like other on/off options, so the valid values
+are any unambiguous case insensitive matches for one of:
+true, false, 1,
+0, on, off,
+yes, no.  So t, T, and tR
+will all match true.
+
+Expressions that do not properly evaluate to true or false will
+generate an error and cause the \if or
+\elif command to fail.  Because that behavior may
+change branching context in undesirable ways (executing code which
+was intended to be skipped, causing \elif,
+\else, and \endif commands to
+pair with the wrong \if, etc), it is
+recommended that scripts which use conditionals also set
+ON_ERROR_STOP.
+
+
+Lines within false branches are not evaluated in any way: queries are
+not sent to the server, non-conditional commands are not evaluated but
+bluntly ignored, nested if-expressions in such branches are also not
+evaluated but are tallied to check for proper nesting.
+
+
+  
 
   
 \ir or \include_relative filename
diff --git a/src/bin/psql/Makefile b/src/bin/psql/Makefile
index c53733f..1492b66 100644
--- a/src/bin/psql/Makefile
+++ b/src/bin/psql/Makefile
@@ -21,7 +21,7 @@ REFDOCDIR= $(top_srcdir)/doc/src/sgml/ref
 override CPPFLAGS := -I. -I$(srcdir) -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS += -L$(top_builddir)/src/fe_utils -lpgfeutils -lpq
 
-OBJS=  command.o common.o help.o input.o stringutils.o mainloop.o copy.o \
-OBJS=  command.o common.o help.o input.o stringutils.o mainloop.o copy.o \
+OBJS=  command.o common.o conditional.o help.o input.o stringutils.o mainloop.o copy.o \
startup.o prompt.o variables.o large_obj.o describe.o \
crosstabview.o tab-complete.o \
sql_help.o psqls
