[GENERAL] Question on Trigram GIST indexes

2013-01-05 Thread ERR ORR
@Moderators: I am reposting this because the original from 22 December
apparently didn't arrive on the list.

I was trying to make Postgresql use a trigram gist index on a varchar
field, but to no avail.

Specifically, I was trying to replicate what is done in this blog post:
http://www.postgresonline.com/journal/archives/212-PostgreSQL-9.1-Trigrams-teaching-LIKE-and-ILIKE-new-tricks.html


I use Postgresql 9.1.7 on Linux FC17 64bit, my locale is UTF8.

My full table definition is

CREATE TABLE "TEST"
(
  "RECID" bigint NOT NULL DEFAULT next_id(),
  "TST_PAYLOAD" character varying(255),
  CONSTRAINT "PK_TEST" PRIMARY KEY ("RECID")
  USING INDEX TABLESPACE local
)
WITH (
  OIDS=FALSE
);

CREATE INDEX "TEST_PAYLOAD_PATTERN_1_IDX"
  ON "TEST"
  USING btree
  ("TST_PAYLOAD" COLLATE pg_catalog."default" varchar_pattern_ops)
TABLESPACE local;

CREATE INDEX "TEST_PAYLOAD_TRIGRAM_GIST_1_IDX"
  ON "TEST"
  USING gist
  ("TST_PAYLOAD" COLLATE pg_catalog."default" gist_trgm_ops)
TABLESPACE local;

CREATE INDEX "TEST_PAYLOAD_TRIGRAM_GIN_1_IDX"
  ON "TEST"
  USING gin
  ("TST_PAYLOAD" COLLATE pg_catalog."default" gin_trgm_ops)
TABLESPACE local;


The COLLATE pg_catalog."default" clause is inserted by the DB (default is
"Unicode"). I also tried to define the Trigram index with COLLATE
pg_catalog."C"  but the behavior did not change. I did vacuum and analyze
after creating each index.

The field "TST_PAYLOAD" contains 26389 names of cities, all in uppercase.

I have pg_tgrm installed - actually all extensions are present.

Queries which use "WHERE "TST_PAYLOAD" LIKE 'SEAT%'" go to the btree index
as it should.
Queries which use "WHERE "TST_PAYLOAD" LIKE '%EAT%'" *should* use the GIST
index but do a full table scan instead.
(I am looking for names like 'SEATTLE' in this example)

I also tried dropping the btree index but that has no influence on the
behavior.

I have texts/strings in different languages/charsets, so UTF8 looked like
the best decision to me, instead of, say, ISO-8859-15, which is limited to
just some European charsets. Specifically I am storing strings in European
languages (corresponding to the ISO-8859 series) including diacrites line
äöüñáéíóú ..., Russian, Arabic, Chinese etc. in one column instead of
making different columns/tables and using them via a view because that's my
use case and UTF8 should accommodate that IMHO (or is that an abuse of the
DB?)

Would it help to `ALTER DATABASE set lc_collate = 'C'`,supposing that is
possible? (Oracle doesn't allow that iirc).

Thanks for any insights, pointers ...

I'd be grateful if anybody could explain to me what I am doing wrong.

Thanks in advance.


[GENERAL] Counterintuitive locking behavior

2013-01-05 Thread Chris Travers
Hi everyone;

I recently discovered that subselects in update statements don't assume
that the select is for update of the updating table.

For example, if I do this:

CREATE TABLE foo (
   test int primary key,
);

INSERT INTO foo VALUES (1);

then in one session:

BEGIN;
UPDATE foo SET test = 2 WHERE test in (select test from foo where test = 1);

and then in the other session

BEGIN;
UPDATE foo SET test = 3 WHERE test in (select test from foo where test = 1);

When I commit both transactions, the second one chronologically always
takes precedence.  In other words, the locks takes effect after the
subselect but before the rows are updated.  This strikes me as quite error
prone and quite a bit more error prone than a rule which says that unless
stated otherwise subselects of the updated table are to be selected for
update.

This may strike some as a "do what I mean" kind of feature, but the way I
am looking at it is that a SQL statement is usually written as a
declarative block, and an assumption that the SQL statement is to be
evaluated atomically is a good one for predicability of software (in other
words, locks apply to the whole statement).

Is there a reason why we don't do locking this way?  (i.e. where on UPDATE
foo, all rows selected from foo during the update are locked unless the
subselect specifically states otherwise.)

Best Wishes,
Chris Travers


Re: [GENERAL] Counterintuitive locking behavior

2013-01-05 Thread Amit kapila
On Sunday, January 06, 2013 7:48 AM Chris Travers wrote:

> I recently discovered that subselects in update statements don't assume that 
> the select is for update of the updating table.


> For example, if I do this:


> CREATE TABLE foo (
>   test int primary key,
> );


> INSERT INTO foo VALUES (1);


> then in one session:


> BEGIN;
> UPDATE foo SET test = 2 WHERE test in (select test from foo where test = 1);


> and then in the other session


> BEGIN;
> UPDATE foo SET test = 3 WHERE test in (select test from foo where test = 1);

The behavior will be same even for UPDATE foo SET test = 3 WHERE test =1;



> Is there a reason why we don't do locking this way?  (i.e. where on UPDATE 
> foo, all rows selected from foo during the 
> update are locked unless the subselect specifically states otherwise.)

The reason for this behavior is if it locks all rows during select, then it can 
so happen that Update will actually not happen on the row but it will be locked.
For example 

UPDATE foo SET test = 3 WHERE test in (select test from foo where test = 1) and 
FALSE;

Now in this case if it locks the rows during subselect, then the rows will be 
locked during whole transaction
irrespective of the fact that they will not be updated.

With Regards,
Amit Kapila.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Counterintuitive locking behavior

2013-01-05 Thread Amit kapila

On Sunday, January 06, 2013 11:10 AM Amit kapila wrote:
On Sunday, January 06, 2013 7:48 AM Chris Travers wrote:




>> Is there a reason why we don't do locking this way?  (i.e. where on UPDATE 
>> foo, all rows selected from foo during the
>> update are locked unless the subselect specifically states otherwise.)

>The reason for this behavior is if it locks all rows during select, then it 
>can so happen that Update will actually not happen on the row but it will be 
>locked.
> For example

> UPDATE foo SET test = 3 WHERE test in (select test from foo where test = 1) 
> and FALSE;

> Now in this case if it locks the rows during subselect, then the rows will be 
> locked during whole transaction
> irrespective of the fact that they will not be updated.

In the above example and FALSE, I mean to say any other subquery which will 
yield the overall condition to not return any row.
Similarly there can be many more scenarios where only half of the selected rows 
(by one of the conds. ) will be actual candidates of Update.

With Regards,
Amit Kapila.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Counterintuitive locking behavior

2013-01-05 Thread Boszormenyi Zoltan

2013-01-06 03:18 keltezéssel, Chris Travers írta:

Hi everyone;

I recently discovered that subselects in update statements don't assume that the select 
is for update of the updating table.


For example, if I do this:

CREATE TABLE foo (
   test int primary key,
);

INSERT INTO foo VALUES (1);

then in one session:

BEGIN;
UPDATE foo SET test = 2 WHERE test in (select test from foo where test = 1);

and then in the other session

BEGIN;
UPDATE foo SET test = 3 WHERE test in (select test from foo where test = 1);

When I commit both transactions, the second one chronologically always takes precedence. 
 In other words, the locks takes effect after the subselect but before the rows are 
updated.  This strikes me as quite error prone and quite a bit more error prone than a 
rule which says that unless stated otherwise subselects of the updated table are to be 
selected for update.


This may strike some as a "do what I mean" kind of feature, but the way I am looking at 
it is that a SQL statement is usually written as a declarative block, and an assumption 
that the SQL statement is to be evaluated atomically is a good one for predicability of 
software (in other words, locks apply to the whole statement).


Is there a reason why we don't do locking this way?  (i.e. where on UPDATE foo, all rows 
selected from foo during the update are locked unless the subselect specifically states 
otherwise.)


What you are seeing is the MVCC behaviour of PostgreSQL
in READ COMMITTED mode. You can use REPEATABLE READ mode
in 9.2.x or SERIALIZABLE mode in earlier generations.
Please read the "Concurrency control" section of the documentation.
http://www.postgresql.org/docs/9.2/interactive/mvcc.html

With this mode:

session 1:

zozo=> create table foo (test int primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "foo_pkey" for table 
"foo"
CREATE TABLE
zozo=> insert into foo values (1);
INSERT 0 1
zozo=> begin isolation level repeatable read;
BEGIN
zozo=> update foo set test = 2 where test = (select test from foo where test = 
1);
UPDATE 1

session 2:
zozo=> begin isolation level repeatable read;
BEGIN
zozo=> update foo set test = 2 where test = (select test from foo where test = 
1);
(session 2 is waiting for the lock on the row at this point)

session 1:

zozo=> commit;
COMMIT

session 2 threw an error after session 1 committed:

ERROR:  could not serialize access due to concurrent update

But updating a row blindly is quite silly even in this small example,
you can use SELECT ... FOR UPDATE for explicit locking.

session 1:

zozo=> begin;
BEGIN
zozo=> select * from foo where test = 1 for update;
 test
--
1
(1 row)

session 2:

zozo=> begin;
BEGIN
zozo=> select * from foo where test = 1 for update;
session 2 waits...

session 1:

zozo=> update foo set test = 2 where test = 1;
UPDATE 1
zozo=> commit;
COMMIT

session 2 now gives the result:

zozo=> select * from foo where test = 1 for update;
 test
--
(0 rows)

and your application can warn the user that the row is not there anymore,
so no point in the subsequent UPDATE. You should do a new query to find
the row you wanted.

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general