Re: [GENERAL] Dumb question involving to_tsvector and a view

2013-02-23 Thread Tom Lane
Kevin Grittner  writes:
> Raymond C. Rodgers  wrote:
>> As I went to add a tsvector column, it occurred to me that it
>> might be possible to add a dynamic tsvector column through the
>> use of a view, so I created a temporary view with a command along
>> the lines of:
>> 
>>  CREATE TEMPORARY VIEW ftstest AS SELECT id, field1, field2,
>> TO_TSVECTOR(COALESCE(field1,'') || ' ' ||
>> COALESCE(field2,'')) AS txtsrch FROM mytable;
>> 
>> To my surprise, it worked. Now, I'm sitting here thinking about
>> the performance impact that doing this would have.

> I had a similar situation and benchmarked it both ways.  For my
> situation I came out ahead writing the extra column for inserts and
> updates than generating the tsvector values on the fly each time it
> was queried.  YMMV.  It probably depends mostly on the ratio of
> inserts and updates to selects.

A "virtual" tsvector like that is probably going to be useless for
searching as soon as you get a meaningful amount of data, because the
only way the DB can implement a search is to compute the tsvector value
for each table row and then examine it for the target word(s).

What you want is a GIST or GIN index on the contents of the tsvector.
You can either realize the tsvector as a table column and put a regular
index on it, or you can build a functional index on the to_tsvector()
expression.  The latter is kind of like your idea in that the tsvector
as a whole isn't stored anywhere --- but there's an index containing all
the words, which is what you need for searching.

I think there are examples of both ways in the "text search" chapter of
the manual.  (If not, there should be ...)
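[Editor's sketch of the two approaches described above; the table and column names are taken from Raymond's example, everything else is illustrative. Note that an expression index requires the two-argument, IMMUTABLE form of to_tsvector().]

```sql
-- Option 1: materialize the tsvector as a real column and index it.
-- (A trigger would also be needed to keep it current on INSERT/UPDATE.)
ALTER TABLE mytable ADD COLUMN txtsrch tsvector;
UPDATE mytable
   SET txtsrch = to_tsvector('english',
        coalesce(field1,'') || ' ' || coalesce(field2,''));
CREATE INDEX mytable_txtsrch_idx ON mytable USING gin (txtsrch);

-- Option 2: a functional (expression) index; the tsvector as a whole
-- is never stored, but the index contains all the words.
CREATE INDEX mytable_fts_idx ON mytable
    USING gin (to_tsvector('english',
        coalesce(field1,'') || ' ' || coalesce(field2,'')));

-- With option 2, queries must repeat the indexed expression exactly
-- for the planner to use the index:
SELECT id
  FROM mytable
 WHERE to_tsvector('english',
        coalesce(field1,'') || ' ' || coalesce(field2,''))
       @@ to_tsquery('english', 'target');
```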

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] autoanalyze criteria

2013-02-23 Thread Stefan Andreatta

On 02/22/2013 06:27 PM, Jeff Janes wrote:
On Fri, Feb 22, 2013 at 5:57 AM, Stefan Andreatta
<s.andrea...@synedra.com> wrote:


Hi,

If I understand
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html
correctly, the autovacuum threshold could be estimated like
this in PostgreSQL 9.1:

SELECT pg_stat_user_tables.relname,
       pg_stat_user_tables.n_dead_tup,
       CAST(current_setting('autovacuum_vacuum_threshold') AS bigint)
         + (CAST(current_setting('autovacuum_vacuum_scale_factor') AS numeric)
            * pg_class.reltuples) AS av_threshold
  FROM pg_stat_user_tables
  JOIN pg_class ON pg_stat_user_tables.relid = pg_class.oid
 ORDER BY 1;

If pg_stat_user_tables.n_dead_tup exceeds av_threshold, autovacuum
should kick in. Obviously, that relies on up-to-date
statistics. Is that how it is actually done?


Pretty much, yes.  With the caveat that table storage settings can 
override the global settings.



2nd question: because pg_stat_user_tables.n_dead_tup is itself
estimated by ANALYZE, it cannot be used as a criterion for the next
autoanalyze run, I think.



n_dead_tup is updated by the stats collector, not by ANALYZE.

Thanks Jeff, that helped a lot (as did a careful rereading of 
http://www.postgresql.org/docs/9.1/static/monitoring-stats.html and 
http://www.postgresql.org/docs/9.1/static/catalog-pg-class.html ;-)


However, to estimate whether autoanalyze should be triggered, I am still 
missing something: the analyze threshold is compared to the "total 
number of tuples inserted, updated, or deleted since the last ANALYZE." 
(according to 
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html).


pg_stat_user_tables.n_live_tup - pg_class.reltuples should give 
something like the number of rows inserted minus rows deleted since the 
last ANALYZE. But according to the documentation we would need the sum 
of those values. And we are still missing a number for rows updated 
since the last ANALYZE. pg_stat_user_tables.n_dead_tup, on the other 
hand, is only reset by a successful VACUUM. Autoanalyzing a table with 
more than 10% dead rows would therefore keep autoanalyze in a loop until 
the ratio rises beyond 20% (default configuration) and autovacuum kicks 
in. So that wouldn't make a lot of sense.
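[Editor's sketch of the estimate described above, written as a query against the 9.1 statistics views; the arithmetic is only the rough "inserted minus deleted" approximation discussed in the text.]

```sql
-- n_live_tup is maintained by the stats collector; reltuples is the
-- row count estimated at the last ANALYZE (or VACUUM). Their difference
-- approximates inserts minus deletes since then, but says nothing
-- about updates.
SELECT s.relname,
       s.n_live_tup - c.reltuples AS approx_inserted_minus_deleted,
       s.n_dead_tup
  FROM pg_stat_user_tables s
  JOIN pg_class c ON s.relid = c.oid
 ORDER BY 1;
```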


Regards,
Stefan


Re: [GENERAL] autoanalyze criteria

2013-02-23 Thread Jeff Janes
On Saturday, February 23, 2013, Stefan Andreatta wrote:

>
>
> Thanks Jeff, that helped a lot (as did a careful rereading of
> http://www.postgresql.org/docs/9.1/static/monitoring-stats.html and
> http://www.postgresql.org/docs/9.1/static/catalog-pg-class.html ;-)
>
> However, to estimate whether autoanalyze should be triggered, I am still
> missing something: the analyze threshold is compared to the "total number
> of tuples inserted, updated, or deleted since the last ANALYZE."
> (according to
> http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html).
>
> pg_stat_user_tables.n_live_tup - pg_class.reltuples should give something
> like the number of rows inserted minus rows deleted since the last ANALYZE.
> But according to the documentation we would need the sum of those values.
> And we are still missing a number for rows updated since the last ANALYZE.
> pg_stat_user_tables.n_dead_tup, on the other hand, is only reset by a
> successful VACUUM. Autoanalyzing a table with more than 10% dead rows would
> therefore keep autoanalyze in a loop until the ratio rises beyond 20%
> (default configuration) and autovacuum kicks in. So that wouldn't make a
> lot of sense.
>
>
Hi Stefan,

Sorry, I got tunnel vision about how the threshold was computed, and
forgot about the thing it is compared to.  There is a "secret" data point
in the stats collector called changes_since_analyze.  This is not exposed
in pg_stat_user_tables, but I think it should be, as I have often
wanted to see it.
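[Editor's sketch of the check this counter would enable. Assumption: a release where it is exposed; it appears in pg_stat_user_tables as n_mod_since_analyze from PostgreSQL 9.4 on, not in the 9.1 discussed here.]

```sql
-- Mirrors the vacuum-threshold query earlier in the thread, but for
-- the analyze threshold, compared against the collector's count of
-- tuples changed since the last ANALYZE.
SELECT s.relname,
       s.n_mod_since_analyze,
       current_setting('autovacuum_analyze_threshold')::bigint
         + current_setting('autovacuum_analyze_scale_factor')::numeric
           * c.reltuples AS analyze_threshold
  FROM pg_stat_user_tables s
  JOIN pg_class c ON s.relid = c.oid
 ORDER BY 1;
```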


Cheers,

Jeff


Re: [GENERAL] autoanalyze criteria

2013-02-23 Thread Stefan Andreatta


On 02/23/2013 05:10 PM, Jeff Janes wrote:

On Saturday, February 23, 2013, Stefan Andreatta wrote:




Thanks Jeff, that helped a lot (as did a careful rereading of
http://www.postgresql.org/docs/9.1/static/monitoring-stats.html
and
http://www.postgresql.org/docs/9.1/static/catalog-pg-class.html ;-)

However, to estimate whether autoanalyze should be triggered, I am
still missing something: the analyze threshold is compared to the
"total number of tuples inserted, updated, or deleted since the
last ANALYZE." (according to
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html).

pg_stat_user_tables.n_live_tup - pg_class.reltuples should give
something like the number of rows inserted minus rows deleted since
the last ANALYZE. But according to the documentation we would need
the sum of those values. And we are still missing a number for
rows updated since the last ANALYZE. pg_stat_user_tables.n_dead_tup,
on the other hand, is only reset by a successful
VACUUM. Autoanalyzing a table with more than 10% dead rows would
therefore keep autoanalyze in a loop until the ratio rises beyond
20% (default configuration) and autovacuum kicks in. So that
wouldn't make a lot of sense.


Hi Stefan,

Sorry, I got tunnel vision about how the threshold was computed, 
and forgot about the thing it is compared to.  There is a "secret" 
data point in the stats collector called changes_since_analyze.  This 
is not exposed in pg_stat_user_tables, but I think it should be, 
as I have often wanted to see it.



Cheers,

Jeff


Sounds like a very good idea to me - any way I could help to make such a 
thing happen?


Stefan


Re: [GENERAL] Dumb question involving to_tsvector and a view

2013-02-23 Thread Raymond C. Rodgers

On 02/23/2013 05:26 AM, Tom Lane wrote:
> A "virtual" tsvector like that is probably going to be useless for
> searching as soon as you get a meaningful amount of data, because the
> only way the DB can implement a search is to compute the tsvector
> value for each table row and then examine it for the target word(s).
>
> What you want is a GIST or GIN index on the contents of the tsvector.
> You can either realize the tsvector as a table column and put a
> regular index on it, or you can build a functional index on the
> to_tsvector() expression. The latter is kind of like your idea in that
> the tsvector as a whole isn't stored anywhere --- but there's an index
> containing all the words, which is what you need for searching.
>
> I think there are examples of both ways in the "text search" chapter of
> the manual. (If not, there should be ...)
>
> regards, tom lane
I think the only real advantage to using something like this would be a 
space savings from not storing the tsvector data, but I don't see 
that being a significant enough reason to go ahead and use this idea in 
a production situation. As mentioned [by pretty much all of us], once 
the table grows sufficiently large there would be a performance 
penalty from to_tsvector being executed on every record in the table. (If 
I'm not mistaken, with the way I wrote that "create view", every record 
in "mytable" would be subject to the function call, and any narrowing 
parameters in the WHERE clause would be applied afterwards.)


Anyway, like I said originally, it was a dumb question. It might be OK 
to use that in a situation where the table size is known to be small, 
but there's little to no reason to do it in a production situation.


Thanks!
Raymond




[GENERAL] stored procedure code

2013-02-23 Thread Derek Perak


Hi, I would like to learn about how stored procedures are handled in Postgres.
In particular, I'd like to learn how plpgsql procedures are compiled and
stored, and how they (both plpgsql and C procedures) interact with the
optimizer during planning. I would appreciate it if someone could point out
some links to the source code for me to check out.

Thanks!
Derek

Re: [GENERAL] stored procedure code

2013-02-23 Thread Adrian Klaver

On 02/23/2013 02:49 PM, Derek Perak wrote:

Hi, I would like to learn about how stored procedures are handled in
postgres. In particular, I'd like to learn how plpgsql procedures are
compiled and stored, and how they (both plpgsql and C procedures)
interact with the optimizer during planning. Would appreciate if someone
can point out some links to the source code for me to check out.


A brief intro:

http://www.postgresql.org/docs/9.2/interactive/plhandler.html

A graphical look at the source code:

http://www.postgresql.org/developer/backend/

If you click on the index link you get:

http://wiki.postgresql.org/wiki/Backend_flowchart
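[Editor's sketch, as a concrete starting point in the catalogs: plpgsql function bodies are stored as plain source text in pg_proc.prosrc and compiled to an internal tree on first call within a session. Nothing here is plpgsql-specific except the filter.]

```sql
-- List all plpgsql functions with their stored source text.
-- For C-language functions, prosrc instead names the symbol to call.
SELECT p.proname, l.lanname, p.prosrc
  FROM pg_proc p
  JOIN pg_language l ON p.prolang = l.oid
 WHERE l.lanname = 'plpgsql';
```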



Thanks!
Derek



--
Adrian Klaver
adrian.kla...@gmail.com




Re: [GENERAL] [JDBC] can't access through SSL

2013-02-23 Thread Maz Mohammadi
I still can't access my SSL-enabled server!

Is root.crt supposed to be an exact copy of the server.crt file that I use in my 
client's keystore?

I have another observation: as I start the coordinator node, I don't see any 
file access to the server.key or server.crt file. Aren't these files supposed 
to be read at startup time, or at least when I try to make a connection from my 
Java application?

Every time I try to create a datasource on Tomcat, I get the following error on 
the client and server consoles...

FATAL:  connection requires a valid client certificate.

Am I missing something?

-maz

From: pgsql-jdbc-ow...@postgresql.org [mailto:pgsql-jdbc-ow...@postgresql.org] 
On Behalf Of Maz Mohammadi
Sent: Friday, February 22, 2013 4:33 PM
To: pgsql-j...@postgresql.org
Subject: Re: [JDBC] can't access through SSL

Correction...

After double checking the path to java's keystore file, and correcting 
it...this is the new error.

FATAL:  connection requires a valid client certificate.

Any idea would be greatly appreciated.

-maz

From: Maz Mohammadi
Sent: Friday, February 22, 2013 3:51 PM
To: 'pgsql-j...@postgresql.org'
Subject: RE: [JDBC] can't access through SSL

Hello,

I regenerated some new keys for my postgres server.  I've placed them under 
/var/lib/coord and shared them with the datanodes as well.

After adding the certificates to the keystore for my tomcat java application, I 
get the following error on my server.

LOG:  could not accept SSL connection:  sslv3 alert certificate unknown.

I thought I had to use JDBC 3 for this.

Any ideas?

-maz

From: Maz Mohammadi
Sent: Friday, February 22, 2013 3:45 PM
To: pgsql-j...@postgresql.org
Subject: RE: [JDBC] can't access through SSL

Thx,  one step closer.

pgsql-j...@postgresql.org


From: Vitalii Tymchyshyn [mailto:tiv...@gmail.com]
Sent: Friday, February 22, 2013 12:56 PM
To: Maz Mohammadi
Cc: pgsql-j...@postgresql.org
Subject: Re: [JDBC] can't access through SSL

Try jdbc:postgresql://localhost:5432/testdb?ssl=true
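[Editor's note on why this fixes it: in a JDBC URL, `?` begins the parameter string; with `&` instead, the driver treats everything after the slash as the database name, which is exactly what the FATAL error quoted below shows (database "testdb&ssl=true").]

```
# Incorrect: the driver parses the database name as "testdb&ssl=true",
# and no ssl parameter is seen at all
jdbc:postgresql://localhost:5432/testdb&ssl=true

# Correct: database is "testdb", and ssl=true is a connection parameter
jdbc:postgresql://localhost:5432/testdb?ssl=true
```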

2013/2/22 Maz Mohammadi mailto:mmohamm...@pentaho.com>>
Hello all,

I'm trying to access a postgres database through a java application (tomcat).  
This is the only entry I have in pg_hba.conf

# TYPE  DATABASE  USER  ADDRESS       METHOD
hostssl all       all   127.0.0.1/32  cert

and put the certificate (from /var/lib/postre./coord/server.crt) in the 
cacerts under $JAVA_HOME/...

This is my jdbc URL
jdbc:postgresql://localhost:5432/testdb&ssl=true

But When I try to create a datasource on tomcat, I get the following error...
"Connection attempt failed: FATAL: no pg_hba.conf entry for host "127.0.0.1", 
user "progres-xc", database "testdb&ssl=true", SSL off"

Any help is greatly appreciated.

-maz



--
Best regards,
 Vitalii Tymchyshyn


Re: [GENERAL] Dumb question involving to_tsvector and a view

2013-02-23 Thread Jasen Betts
On 2013-02-23, Raymond C. Rodgers  wrote:
> On 02/23/2013 05:26 AM, Tom Lane wrote:
>> A "virtual" tsvector like that is probably going to be useless for 
>> searching as soon as you get a meaningful amount of data, because the 
>> only way the DB can implement a search is to compute the tsvector 
>> value for each table row and then examine it for the target word(s). 
>> What you want is a GIST or GIN index on the contents of the tsvector.

> I think the only real advantage to using something like this would be a 
> space savings in terms of storing the tsvector data, but I don't see 
> that being a significant enough reason to go ahead and use this idea in 
> a production situation. As mentioned [by pretty much all of us], once 
> the table size is sufficiently large there would be a performance 
> penalty by to_tsvector being executed on every record in the table.

Unless the plan comes out as a table scan, the index will be used
instead of to_tsvector().

When there is a table scan, to_tsvector() will be used instead of reading
the stored value from disk. I don't know how fast to_tsvector() is compared
to disk, but usually computing a result is faster than reading it from disk.

Storing the tsvector in the table is likely to be faster only when a
table scan is done and the table is fully cached in RAM.
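[Editor's sketch of how to check which case applies, using EXPLAIN. Assumptions: the functional index Tom described exists and uses the two-argument to_tsvector() form; the query must repeat the indexed expression exactly for the planner to match it.]

```sql
EXPLAIN
SELECT id
  FROM mytable
 WHERE to_tsvector('english',
        coalesce(field1,'') || ' ' || coalesce(field2,''))
       @@ to_tsquery('english', 'target');
-- A Bitmap Index Scan on the GIN index means the tsvector is not being
-- recomputed per row; a Seq Scan means to_tsvector() runs for every row.
```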

-- 
⚂⚃ 100% natural


