Re: [GENERAL] grep -f keyword data query

2015-12-29 Thread Hiroyuki Sato
Hello Tom.

Thank you for replying.

This is the GIN index result.

It is slow too.

Best regards.

--
Hiroyuki Sato

1, create.sql

drop table if exists url_lists4;
create table url_lists4 (
  id int not null primary key,
  url text not null
);
--create index ix_url_url_lists4 on url_lists4(url);
create index ix_url_url_lists4 on url_lists4 using gin(url
gin_trgm_ops);

drop table if exists keywords4;
create table keywords4 (
  id int not null primary key,
  name varchar(40) not null,
  url text not null
);

create index ix_url_keywords4 on keywords4(url);
create index ix_name_keywords4 on keywords4(name);


\copy url_lists4(id,url) from 'sample.txt' with delimiter ',';
\copy keywords4(id,name,url) from 'keyword.txt' with delimiter ',';
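
Note: the gin_trgm_ops operator class used above comes from the pg_trgm extension, so the script assumes the extension has already been installed in the database:

```sql
-- one-time setup per database, required before gin_trgm_ops can be used
create extension if not exists pg_trgm;
```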


2, EXPLAIN

                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Nested Loop  (cost=22.55..433522.66 rows=1250 width=57)
   ->  Seq Scan on keywords4 k  (cost=0.00..104.50 rows=5000 width=28)
         Filter: ((name)::text = 'esc_url'::text)
   ->  Bitmap Heap Scan on url_lists4 u  (cost=22.55..61.68 rows=2500 width=57)
         Recheck Cond: (url ~~ k.url)
         ->  Bitmap Index Scan on ix_url_url_lists4  (cost=0.00..21.92 rows=2500 width=0)
               Index Cond: (url ~~ k.url)
(7 rows)

3, EXPLAIN ANALYZE

                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Nested Loop  (cost=22.55..433522.66 rows=1250 width=57) (actual time=7227.210..1753163.751 rows=4850 loops=1)
   ->  Seq Scan on keywords4 k  (cost=0.00..104.50 rows=5000 width=28) (actual time=0.035..16.577 rows=5000 loops=1)
         Filter: ((name)::text = 'esc_url'::text)
   ->  Bitmap Heap Scan on url_lists4 u  (cost=22.55..61.68 rows=2500 width=57) (actual time=350.625..350.626 rows=1 loops=5000)
         Recheck Cond: (url ~~ k.url)
         Rows Removed by Index Recheck: 0
         Heap Blocks: exact=159
         ->  Bitmap Index Scan on ix_url_url_lists4  (cost=0.00..21.92 rows=2500 width=0) (actual time=350.618..350.618 rows=1 loops=5000)
               Index Cond: (url ~~ k.url)
 Planning time: 0.169 ms
 Execution time: 1753165.329 ms
(11 rows)



On Tue, Dec 29, 2015 at 2:34, Tom Lane wrote:

> Hiroyuki Sato  writes:
> > I re-created the index with pg_trgm.
> > Execution time is 210 sec.
> > Yes, it is faster than the btree index, but still slow.
> > Is it possible to improve this query speed?
> > Should I use another query or index?
>
> Did you try a GIN index?
>
> regards, tom lane
>


Re: [GENERAL] grep -f keyword data query

2015-12-29 Thread Hiroyuki Sato
Hello Jeff

Thank you for replying.

On Tue, Dec 29, 2015 at 4:35, Jeff Janes wrote:

> On Sun, Dec 27, 2015 at 3:53 PM, Hiroyuki Sato 
> wrote:
> > Hello Andreas and Tom
> >
> > Thank you for replying.
> >
> > Sorry, I re-created my questions. I was mis-pasted query log on previous
> > question.
> > (@~ operator is PGroonga extension (http://pgroonga.github.io))
> > Please ignore it.
> >
> > Best regards.
> >
> > 1, Problem.
> >   (1) The following query is extremely slow. (478 sec)
> > SELECT
> >   u.url
> > FROM
> >   url_lists4 u,
> >   keywords4 k
> > WHERE
> >   u.url like k.url
> > AND
> >   k.name = 'esc_url';
> >
> >
> >   (2) grep -f kwd.txt sample.txt (exec time under 1sec)
>
> Certainly not in my hands.  The best I can get is 9 seconds.
>
> >
> > 2, Questions
> >
> >   (1) Is it possible to improve this query like the command ``grep -f
> > keyword data``?
>
> You will not get a general tool to match a specialized tool in the
> specialized tool's own domain.  fgrep is good at what fgrep does.
>

This is real data; it is a forward proxy log.
There is no specialized domain here; it contains about 20,000,000 log entries per day.


>
>
> Since your queries all have constant text strings at the beginning,
> they could use the index.  But if you are not using the C collation,
> then you need to build a special index:
>
> create index on url_lists4 (url text_pattern_ops);
>

Currently I just use C collation (ASCII) only.


>
> But, the planner refuses to use this index for your query anyway,
> because it can't see that the patterns are all left-anchored.
>
> Really, your best bet is to refactor your url data so it is stored with a
> url_prefix and url_suffix column.  Then you can do exact matching
> rather than pattern matching.
>

I see, exact matching is faster than pattern matching.
But I need pattern matching in the path part
(i.e., http://www.yahoo.com/a/b/c/...);
I would like to pattern match the '/a/b/c' part.

That's why I asked this question.

If it is impossible to improve the join speed, I will dump the data once and
match it with grep or similar tools.
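
Jeff's prefix/suffix refactoring could be sketched roughly as follows (the column names and the split rule here are assumptions, and keywords4 would need the same columns):

```sql
-- hypothetical refactor: store host and path separately
alter table url_lists4
  add column url_host text,   -- e.g. 'http://www.yahoo.com'
  add column url_path text;   -- e.g. '/a/b/c'

update url_lists4
   set url_host = substring(url from '^[a-z]+://[^/]*'),
       url_path = substring(url from '^[a-z]+://[^/]*(/.*)$');

create index on url_lists4 (url_host, url_path text_pattern_ops);

-- exact match on the host, pattern match only on the shorter path part,
-- assuming keywords4 gains matching url_host/url_path columns
select u.url
from url_lists4 u
join keywords4 k on u.url_host = k.url_host
where k.name = 'esc_url'
  and u.url_path like k.url_path;
```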

Thanks.



>
> Cheers,
>
> Jeff
>


[GENERAL] Regex help again (sorry, I am bad at these)

2015-12-29 Thread Christopher Molnar
Hello all!

Sorry to have to ask the experts here for some regex assistance again. I am
admittedly awful with these and could use some help.

Have the following string (this is an example) that needs to be changed.
Need to use a search and replace because the links across over 200K records
are similar but not the same.

'Complete the attached lab and submit via dropbox\r<a href="https://owncloud.porterchester.edu/HVACR/PCI_GasHeat/GasElectrical/HVACR1114_LAB_13A.pdf" title="Lab 13A">Lab 13A<\a>'


Need the final string to separate the "LAB_13A.pdf" from the rest of the
URL by inserting a "&file=" in front of it. The final string should look
like:


'Complete the attached lab and submit via dropbox\r<a href="https://owncloud.porterchester.edu/HVACR/PCI_GasHeat/GasElectrical/&file=HVACR1114_LAB_13A.pdf" title="Lab 13A">Lab 13A<\a>'


I have tried something like:

 update pcilms_assign set intro=regexp_replace(intro, '/([^/]*)\" title=$',
'&files=\1') where intro like '%https://owncloud.porterchester.edu%' and
course=18 and id=55413;

and the result puts the &file= in the wrong place (at the end of the whole
string).


Any suggestions?


Thanks!

-Chris


Re: [GENERAL] Regex help again (sorry, I am bad at these)

2015-12-29 Thread Melvin Davidson
Chris,

First, even though it may or may not apply in this instance, it is ALWAYS a
good idea (and good manners) to provide the PostgreSQL version and your O/S
when posting to this board.

I have also previously replied that the following should work:

UPDATE pcilms_assign
   SET intro = REPLACE (intro, 'HVACR1114_LAB_13A.pdf',
'&file=HVACR1114_LAB_13A.pdf')
 WHERE intro like 'https://owncloud.porterchester.edu%'
AND course=18 and id=55413;

NOTE: DO NOT use % in front of 'https':
it is not needed and will slow query performance.

Your mileage may vary because you have not stated your VERSION of
PostgreSQL or your O/S.


On Mon, Dec 28, 2015 at 2:08 PM, Christopher Molnar 
wrote:

> Hello all!
>
> Sorry to have to ask the experts here for some regex assistance again. I
> am admittedly awful with these and could use some help.
>
> Have the following string (this is an example) that needs to be changed.
> Need to use a search and replace because the links across over 200K records
> are similar but not the same.
>
> 'Complete the attached lab and submit via dropbox\r<a href="https://owncloud.porterchester.edu/HVACR/PCI_GasHeat/GasElectrical/HVACR1114_LAB_13A.pdf" title="Lab 13A">Lab 13A<\a>'
>
>
> Need the final string to separate the "LAB_13A.pdf" from the rest of the
> URL by inserting a "&file=" in front of it. The final string should look
> like:
>
>
> 'Complete the attached lab and submit via dropbox\r<a href="https://owncloud.porterchester.edu/HVACR/PCI_GasHeat/GasElectrical/&file=HVACR1114_LAB_13A.pdf" title="Lab 13A">Lab 13A<\a>'
>
>
> I have tried something like:
>
>  update pcilms_assign set intro=regexp_replace(intro, '/([^/]*)\"
> title=$', '&files=\1') where intro like '%
> https://owncloud.porterchester.edu%' and course=18 and id=55413;
>
> and the result puts the &file= in the wrong place (at the end of the whole
> string).
>
>
> Any suggestions?
>
>
> Thanks!
>
> -Chris
>
>
>


-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] Regex help again (sorry, I am bad at these)

2015-12-29 Thread Christoph Moench-Tegeder
## Christopher Molnar (cmolna...@gmail.com):

> I have tried something like:
> 
>  update pcilms_assign set intro=regexp_replace(intro, '/([^/]*)\" title=$',
> '&files=\1') where intro like '%https://owncloud.porterchester.edu%' and
> course=18 and id=55413;

http://blog.codinghorror.com/parsing-html-the-cthulhu-way/
But you were close:
  regexp_replace(intro, '/([^/]*)" title=', '/&file=\1" title=')
looks like it's doing the right thing on your example.
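
Applied to a trimmed-down version of the sample string, the call behaves like this (runnable as-is in psql):

```sql
select regexp_replace(
  'https://owncloud.porterchester.edu/HVACR/PCI_GasHeat/GasElectrical/HVACR1114_LAB_13A.pdf" title="Lab 13A"',
  '/([^/]*)" title=',
  '/&file=\1" title=');
-- ...GasElectrical/&file=HVACR1114_LAB_13A.pdf" title="Lab 13A"
```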

Regards,
Christoph

-- 
Spare Space


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Regex help again (sorry, I am bad at these)

2015-12-29 Thread Christoph Moench-Tegeder
## Melvin Davidson (melvin6...@gmail.com):

> UPDATE pcilms_assign
>SET intro = REPLACE (intro, 'HVACR1114_LAB_13A.pdf',
> '&file=HVACR1114_LAB_13A.pdf')
>  WHERE intro like 'https://owncloud.porterchester.edu%'
> AND course=18 and id=55413;

Unfortunately, that tries to do the right thing for that example
only but does not account for the 200k documents not named
HVACR1114_LAB_13A.pdf.

> NOTE: DO NOT use % in front of 'https: 
> It is not needed and will slow query performance.

And that will plainly not work if the string does not start with the
URL - as in the example given.
It won't work regardless of version and OS, either.

Regards,
Christoph

-- 
Spare Space




Re: [GENERAL] grep -f keyword data query

2015-12-29 Thread David Rowley
On 30 December 2015 at 04:21, Hiroyuki Sato  wrote:

> On Tue, Dec 29, 2015 at 4:35, Jeff Janes wrote:
>
>>
>>
>
>> But, the planner refuses to use this index for your query anyway,
>> because it can't see that the patterns are all left-anchored.
>>
>> Really, your best bet is to refactor your url data so it is stored with a
>> url_prefix and url_suffix column.  Then you can do exact matching
>> rather than pattern matching.
>>
>
> I see, exact matching is faster than pattern matching.
> But I need pattern matching in the path part
> (i.e., http://www.yahoo.com/a/b/c/...);
> I would like to pattern match the '/a/b/c' part.
>

If your pattern matching is as simple as that, then why not split the
/a/b/c/ part out as mentioned by Jeff? Alternatively you could just write a
function which splits that out for you and returns it, then index that
function, and then just include a call to that function in the join
condition matching with the equality operator. That'll allow hash and merge
joins to be possible again.
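
A rough sketch of that alternative (the url_path helper and its regexp are assumptions, not tested code):

```sql
-- hypothetical helper: strip the scheme and host, keep only the path
create function url_path(u text) returns text as $$
  select substring(u from '^[a-z]+://[^/]*(/.*)$')
$$ language sql immutable;

-- index the extracted path so the function call is cheap to probe
create index ix_url_lists4_path on url_lists4 (url_path(url));

-- joining on equality of the extracted paths allows hash and merge joins
select u.url
from url_lists4 u
join keywords4 k on url_path(u.url) = url_path(k.url)
where k.name = 'esc_url';
```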

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [GENERAL] Regex help again (sorry, I am bad at these)

2015-12-29 Thread Michael Nolan
On Mon, Dec 28, 2015 at 2:08 PM, Christopher Molnar 
wrote:

> Hello all!
>
> Sorry to have to ask the experts here for some regex assistance again. I
> am admittedly awful with these and could use some help.
>
> Any suggestions?
>

I have found over the years that it is far easier to write a short PHP or
Perl program to do tasks like this.  It is much easier to debug, and the
speed advantage of using SQL is not important for 200,000 records.
--
Mike Nolan


Re: [GENERAL] Regex help again (sorry, I am bad at these)

2015-12-29 Thread David G. Johnston
On Tue, Dec 29, 2015 at 2:26 PM, Michael Nolan  wrote:

> On Mon, Dec 28, 2015 at 2:08 PM, Christopher Molnar 
> wrote:
>
>> Hello all!
>>
>> Sorry to have to ask the experts here for some regex assistance again. I
>> am admittedly awful with these and could use some help.
>>
>> Any suggestions?
>>
>
> I have found over the years that it is far easier to write a short PHP or
> Perl program to do tasks like this.  It is much easier to debug, and the
> speed advantage of using SQL is not important for 200,000 records.
>
>
If all you end up doing is using regular expressions in Perl, then I'm not
sure how that solves the "inexperienced at regular expressions"
problem... so what kind (if any) of non-regex-based solution would
you implement to accomplish this goal?  I'll admit that, being familiar
with regular expressions, I probably tend to resort to them by default now
when other solutions - if I stopped to think of them - would be less
matrix-y.

Obviously a simple find-and-replace is unlikely to work well, though some form
of "split-and-rearrange" could work - but URLs seem to want the flexibility,
since defining split points in one seems challenging.

David J.


[GENERAL] cannot get stable function to use index

2015-12-29 Thread Andy Colson

Hi all,

I seem to be missing something.

I'm using PG 9.3.9 on Slackware64.

My table:


create table search (
gid integer,
descr text,
search_vec tsvector
);
create index search_key on search using gin(search_vec);

I've put a bunch of data in it, and using to_tsquery uses the index fine:

explain analyze
select *
from search
where search_vec @@ to_tsquery('213 & E & 13 & ST & N');

                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on search  (cost=16.00..17.02 rows=1 width=69) (actual time=87.493..87.494 rows=1 loops=1)
   Recheck Cond: (search_vec @@ to_tsquery('213 & E & 13 & ST & N'::text))
   ->  Bitmap Index Scan on search_key  (cost=0.00..16.00 rows=1 width=0) (actual time=87.478..87.478 rows=1 loops=1)
         Index Cond: (search_vec @@ to_tsquery('213 & E & 13 & ST & N'::text))
 Total runtime: 87.554 ms
(5 rows)


Here is the problem, I'd like to use this function from
http://workshops.boundlessgeo.com/tutorial-autocomplete/


CREATE OR REPLACE FUNCTION public.to_tsquery_partial(text)
  RETURNS tsquery AS $$
SELECT to_tsquery(
   array_to_string(
   regexp_split_to_array(
   trim($1),E'\\s+'),' & ') ||
   CASE WHEN $1 ~ ' $' THEN '' ELSE ':*' END)
  $$ LANGUAGE 'sql' STABLE STRICT;



Originally it didn't have "STABLE STRICT", but I added it.  That doesn't
seem to matter though.  I cannot get this SQL to use the index:


explain analyze
select *
from search
where search_vec @@ to_tsquery_partial('213 E 13 ST N')

------------------------------------------------------------------------------------------------------------------
 Seq Scan on search  (cost=0.00..2526.56 rows=1 width=69) (actual time=68.033..677.490 rows=1 loops=1)
   Filter: (search_vec @@ to_tsquery((array_to_string('{213,E,13,ST,N}'::text[], ' & '::text) || ':*'::text)))
   Rows Removed by Filter: 76427
 Total runtime: 677.548 ms
(4 rows)


to_tsquery_partial() calls to_tsquery() and array_to_string(), both of 
which I checked, and all of them are marked as stable.


Any hints why this is happening?

Thanks,

-Andy




Re: [GENERAL] cannot get stable function to use index

2015-12-29 Thread David G. Johnston
On Tue, Dec 29, 2015 at 3:52 PM, Andy Colson  wrote:

> [...]
>
> Originally it didn't have "STABLE STRICT", but I added it.  Doesn't seem
> to matter though.  I cannot get this sql to use the index:
>
> explain analyze
> select *
> from search
> where search_vec @@ to_tsquery_partial('213 E 13 ST N')
>
> ------------------------------------------------------------------------------------------------------------------
> Seq Scan on search  (cost=0.00..2526.56 rows=1 width=69) (actual time=68.033..677.490 rows=1 loops=1)
>    Filter: (search_vec @@ to_tsquery((array_to_string('{213,E,13,ST,N}'::text[], ' & '::text) || ':*'::text)))
>    Rows Removed by Filter: 76427
>  Total runtime: 677.548 ms
> (4 rows)
>
>
> to_tsquery_partial() calls to_tsquery() and array_to_string(), both of
> which I checked, and all of them are marked as stable.
>

Neither STABLE nor VOLATILE functions are candidates for indexing; only
IMMUTABLE ones are.  The default for functions is VOLATILE.

I haven't the time to provide a solution to your problem - I'm just
pointing out "cannot get stable function to use index" is working as
designed and as is logically required.  An index must not rely upon outside
information, most typically time, since there exists no means for an index
to update itself based upon changes in the environment.  The only type of
function guaranteed to not rely upon the external environment is an
immutable one.  And no, you shouldn't lie by marking a function immutable
to get this to work.  The system does not check that the stated volatility
and the actual implementation match.
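
The distinction can be seen directly in CREATE INDEX (a sketch; the function and table names here are made up):

```sql
-- IMMUTABLE: the result depends only on the arguments,
-- so an expression index on it is accepted
create function double_it(x int) returns int
  language sql immutable as $$ select x * 2 $$;

create table t (n int);
create index on t (double_it(n));              -- accepted

-- to_tsquery is only STABLE (its result can change with
-- default_text_search_config), so this is rejected:
create index on t ((to_tsquery(n::text)::text));
-- fails: functions in index expression must be marked IMMUTABLE
```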

David J.


Re: [GENERAL] cannot get stable function to use index

2015-12-29 Thread David G. Johnston
On Tue, Dec 29, 2015 at 4:13 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Tue, Dec 29, 2015 at 3:52 PM, Andy Colson  wrote:
>
>> [...]
>>
>> Originally it didn't have "STABLE STRICT", but I added it.  Doesn't seem
>> to matter though.  I cannot get this sql to use the index:
>>
>> explain analyze
>> select *
>> from search
>> where search_vec @@ to_tsquery_partial('213 E 13 ST N')
>>
>> ------------------------------------------------------------------------------------------------------------------
>> Seq Scan on search  (cost=0.00..2526.56 rows=1 width=69) (actual time=68.033..677.490 rows=1 loops=1)
>>    Filter: (search_vec @@ to_tsquery((array_to_string('{213,E,13,ST,N}'::text[], ' & '::text) || ':*'::text)))
>>    Rows Removed by Filter: 76427
>>  Total runtime: 677.548 ms
>> (4 rows)
>>
>>
>> to_tsquery_partial() calls to_tsquery() and array_to_string(), both of
>> which I checked, and all of them are marked as stable.
>>
>
> Neither STABLE nor VOLATILE functions are candidates for indexing; only
> IMMUTABLE ones are.  The default for functions is VOLATILE.
>
> I haven't the time to provide a solution to your problem - I'm just
> pointing out "cannot get stable function to use index" is working as
> designed and as is logically required.  An index must not rely upon outside
> information, most typically time, since there exists no means for an index
> to update itself based upon changes in the environment.  The only type of
> function guaranteed to not rely upon the external environment is an
> immutable one.  And no, you shouldn't lie by marking a function immutable
> to get this to work.  The system does not check that the stated volatility
> and the actual implementation match.
>
>
So while the above is all true, I apparently misunderstood your
question... :(

I'm going to wait for someone thinking more clearly to answer...but it
seems that given an inability to prove that the result of the function call
is meaningfully selective the system would default to choosing a sequential
scan plan over an index.  You happen to choose a value that only returns a
single row but nothing prevents you from picking one that returns the
entire table.  There may be other factors involved as I am not that
familiar with the full text search capabilities of PostgreSQL.

David J.


Re: [GENERAL] cannot get stable function to use index

2015-12-29 Thread Jim Nasby

On 12/29/15 5:21 PM, David G. Johnston wrote:

Neither STABLE nor VOLATILE functions are candidates for indexing;
only IMMUTABLE ones are.  The default for functions is VOLATILE.

I haven't the time to provide a solution to your problem - I'm just
pointing out "cannot get stable function to use index" is working as
designed and as is logically required.  An index must not rely upon
outside information, most typically time, since there exists no
means for an index to update itself based upon changes in the
environment. The only type of function guaranteed to not rely upon
the external environment is an immutable one.  And no, you shouldn't
lie by marking a function immutable to get this to work.  The system
does not check that the stated volatility and the actual
implementation match.


So while the above is all true, I apparently misunderstood your
question... :(


I'm not an expert on text search, but presumably the tsvector knows what 
config was used, which means as long as the config is part of the index 
it should be OK.


Marking to_tsquery_partial as immutable would be wrong and potentially 
dangerous, as you can change default_text_search_config at any time, 
which would change the output of to_tsquery (as well as to_tsvector).



I'm going to wait for someone thinking more clearly to answer...but it
seems that given an inability to prove that the result of the function
call is meaningfully selective the system would default to choosing a
sequential scan plan over an index.  You happen to choose a value that
only returns a single row but nothing prevents you from picking one that
returns the entire table.  There may be other factors involved as I am
not that familiar with the full text search capabilities of PostgreSQL.


If I'm reading EXPLAIN ANALYZE correctly, to_tsquery_partial is being 
simplified out of the query entirely:


   Filter: (search_vec @@ to_tsquery((array_to_string('{213,E,13,ST,N}'::text[], ' & '::text) || ':*'::text)))


Part of this could well be that you're not feeding the same data to 
to_tsquery. Your hard-coded example is


where search_vec @@ to_tsquery('213 & E & 13 & ST & N');

but your second query becomes '213 & E & 13 & ST & N:*'. Have you tried 
that as a hard-coded value?
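
That is, the hard-coded equivalent of what to_tsquery_partial('213 E 13 ST N') builds would be:

```sql
explain analyze
select *
from search
where search_vec @@ to_tsquery('213 & E & 13 & ST & N:*');
```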

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [GENERAL] efficient math vector operations on arrays

2015-12-29 Thread Jim Nasby

On 12/27/15 2:00 AM, Jony Cohen wrote:

Hi, don't know if it's exactly what you're looking for, but the MADLib
package has utility functions for matrix and vector operations.
see: http://doc.madlib.net/latest/group__grp__array.html


Apply an operator to all elements of an array or pair of arrays: 
http://theplateisbad.blogspot.com/2015/12/the-arraymath-extension-vs-plpgsql.html, 
https://github.com/pramsey/pgsql-arraymath.


See also 
http://theplateisbad.blogspot.com/2015/12/more-fortran-90-like-vector-operations.html.


BTW, if you want to simply apply a function to all elements in an array 
there is an internal C function array_map that can do it. There's no SQL 
interface to it, but it shouldn't be hard to add one.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [GENERAL] cannot get stable function to use index

2015-12-29 Thread Tom Lane
Andy Colson  writes:
> I cannot get this sql to use the index:

> explain analyze
> select *
> from search
> where search_vec @@ to_tsquery_partial('213 E 13 ST N')

> --------------------------------------------------------------------------
> Seq Scan on search  (cost=0.00..2526.56 rows=1 width=69) (actual time=68.033..677.490 rows=1 loops=1)
>    Filter: (search_vec @@ to_tsquery((array_to_string('{213,E,13,ST,N}'::text[], ' & '::text) || ':*'::text)))
> Rows Removed by Filter: 76427
>   Total runtime: 677.548 ms
> (4 rows)

If you force it with enable_seqscan = off, you'll soon see that it's
capable of picking the indexscan plan, but it doesn't want to because it
estimates that the cost will be much higher, which seems to be a
consequence of the ":*" in the query.  (Even though the functions involved
are only stable, the planner is capable of seeing through them to look at
the pattern that will be fed to the GIN index search.)  You get the same
results if you use the resulting tsquery without any function at all.
For example (with dummy data), I get

regression=# explain select * from search
where search_vec @@ '213 & e & 13 & st & n:*'::tsquery;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Seq Scan on search  (cost=0.00..3774.01 rows=1 width=21)
   Filter: (search_vec @@ '''213'' & ''e'' & ''13'' & ''st'' & ''n'':*'::tsquery)
(2 rows)

regression=# set enable_seqscan TO 0;
SET
regression=# explain select * from search
where search_vec @@ '213 & e & 13 & st & n:*'::tsquery;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Bitmap Heap Scan on search  (cost=10.00..104448.01 rows=1 width=21)
   Recheck Cond: (search_vec @@ '''213'' & ''e'' & ''13'' & ''st'' & ''n'':*'::tsquery)
   ->  Bitmap Index Scan on search_key  (cost=0.00..10.00 rows=1 width=0)
         Index Cond: (search_vec @@ '''213'' & ''e'' & ''13'' & ''st'' & ''n'':*'::tsquery)
(4 rows)

but for comparison, with a pattern without ':*', I get

regression=# explain select * from search
where search_vec @@ '213 & e & 13 & st & n'::tsquery;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Bitmap Heap Scan on search  (cost=44.00..48.01 rows=1 width=21)
   Recheck Cond: (search_vec @@ '''213'' & ''e'' & ''13'' & ''st'' & ''n'''::tsquery)
   ->  Bitmap Index Scan on search_key  (cost=0.00..44.00 rows=1 width=0)
         Index Cond: (search_vec @@ '''213'' & ''e'' & ''13'' & ''st'' & ''n'''::tsquery)
(4 rows)

I'm inclined to think this is a bug in the estimator; it seems to be
charging for many more "entry page" fetches than there are pages in
the index.  But maybe it's right and there will be lots of repeated
work involved.  It would be interesting to see EXPLAIN ANALYZE results
from your data for these examples.

regards, tom lane




Re: [GENERAL] efficient math vector operations on arrays

2015-12-29 Thread Tom Lane
Jim Nasby  writes:
> BTW, if you want to simply apply a function to all elements in an array 
> there is an internal C function array_map that can do it. There's no SQL 
> interface to it, but it shouldn't be hard to add one.

That wouldn't be useful for the example given originally, since it
iterates over just one array not two arrays in parallel.  But you could
imagine writing something similar that would iterate over two arrays and
call a two-argument function.

Whether it's worth a SQL interface is debatable though.  Whatever
efficiency you might gain from using this would probably be eaten by the
overhead of calling a SQL or PL function for each pair of array elements.
You'd probably end up in the same ballpark performance-wise as the UNNEST
solution given earlier.
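
For context, the two-array UNNEST approach mentioned can be sketched like this (multi-argument unnest requires PostgreSQL 9.4+):

```sql
-- elementwise product of two arrays, preserving element order
select array_agg(a * b order by ord) as product
from unnest(array[1.5, 2.0, 3.0],
            array[2.0, 0.5, 4.0]) with ordinality as t(a, b, ord);
```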

regards, tom lane




Re: [GENERAL] grep -f keyword data query

2015-12-29 Thread Hiroyuki Sato
Hello David

Thank you for replying.

On Wed, Dec 30, 2015 at 6:04, David Rowley wrote:

> On 30 December 2015 at 04:21, Hiroyuki Sato  wrote:
>
>> On Tue, Dec 29, 2015 at 4:35, Jeff Janes wrote:
>>
>>>
>>>
>>> But, the planner refuses to use this index for your query anyway,
>>> because it can't see that the patterns are all left-anchored.
>>>
>>> Really, your best bet is to refactor your url data so it is stored with a
>>> url_prefix and url_suffix column.  Then you can do exact matching
>>> rather than pattern matching.
>>>
>> I see, exact matching is faster than pattern matching.
>> But I need pattern matching in the path part
>> (i.e., http://www.yahoo.com/a/b/c/...);
>> I would like to pattern match the '/a/b/c' part.
>>
>
> If your pattern matching is as simple as that, then why not split the
> /a/b/c/ part out as mentioned by Jeff? Alternatively you could just write a
> function which splits that out for you and returns it, then index that
> function, and then just include a call to that function in the join
> condition matching with the equality operator. That'll allow hash and merge
> joins to be possible again.
>

Could you tell me more detail about the alternative approach?

It is a good idea to split the host and path.
I'll try it.

My matching patterns are the following:
1, http://www.yahoo.com/a/b/% (host equal, path like)
2, http://%.yahoo.com/a/b/%   (host and path like)

Can I use the equality operator in those cases?

Best regards.


>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [GENERAL] grep -f keyword data query

2015-12-29 Thread David Rowley
On 30 December 2015 at 13:56, Hiroyuki Sato  wrote:

> On Wed, Dec 30, 2015 at 6:04, David Rowley wrote:
>
>> On 30 December 2015 at 04:21, Hiroyuki Sato  wrote:
>>
>>> On Tue, Dec 29, 2015 at 4:35, Jeff Janes wrote:
>>>


>>>> But, the planner refuses to use this index for your query anyway,
>>>> because it can't see that the patterns are all left-anchored.
>>>>
>>>> Really, your best bet is to refactor your url data so it is stored with a
>>>> url_prefix and url_suffix column.  Then you can do exact matching
>>>> rather than pattern matching.

>>> I see, exact matching is faster than pattern matching.
>>> But I need pattern matching in the path part
>>> (i.e., http://www.yahoo.com/a/b/c/...);
>>> I would like to pattern match the '/a/b/c' part.
>>>
>>
>> If your pattern matching is as simple as that, then why not split the
>> /a/b/c/ part out as mentioned by Jeff? Alternatively you could just write a
>> function which splits that out for you and returns it, then index that
>> function, and then just include a call to that function in the join
>> condition matching with the equality operator. That'll allow hash and merge
>> joins to be possible again.
>>
>
> Could you tell me more detail about the alternative approach?
>
> It is a good idea to split the host and path.
> I'll try it.
>
> My matching patterns are the following:
> 1, http://www.yahoo.com/a/b/% (host equal, path like)
> 2, http://%.yahoo.com/a/b/%   (host and path like)
>

It seems I misunderstood your pattern matching. The example you supplied
earlier indicated you just needed to match the document part (/a/b/c/) and
just ignore the protocol://host part, in which case you could have written
a function which took a text parameter, say "http://www.yahoo.com/a/b/c/",
and returned "/a/b/c", then performed: create index on yourtable
(thatfunction(yourcolumn)); However, that method won't help you, as it seems
your pattern matching is more complex than the previous example that you
supplied.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [GENERAL] efficient math vector operations on arrays

2015-12-29 Thread Jim Nasby

On 12/29/15 6:50 PM, Tom Lane wrote:

Jim Nasby  writes:

>BTW, if you want to simply apply a function to all elements in an array
>there is an internal C function array_map that can do it. There's no SQL
>interface to it, but it shouldn't be hard to add one.

That wouldn't be useful for the example given originally, since it
iterates over just one array not two arrays in parallel.  But you could
imagine writing something similar that would iterate over two arrays and
call a two-argument function.


Actually, I suspect you could pretty easily do array_map(regprocedure, 
VARIADIC anyarray).



Whether it's worth a SQL interface is debatable though.  Whatever
efficiency you might gain from using this would probably be eaten by the
overhead of calling a SQL or PL function for each pair of array elements.
You'd probably end up in the same ballpark performance-wise as the UNNEST
solution given earlier.


Take a look at [1]; using a rough equivalent to array_map is 6% faster 
than unnest().


The array op array version is 30% faster than plpgsql, which based on
the code at [2] I assume is doing


 explain analyze select array(select a*b from unnest(array(select random() from generate_series(1,100)), array(select random() from generate_series(1,100))) u(a,b));


The syntactic sugar of r := array_map('function(a, b)', in1, in2) (let 
alone r := in1 * in2;) is appealing too.


[1] 
http://theplateisbad.blogspot.com/2015/12/the-arraymath-extension-vs-plpgsql.html
[2] 
http://theplateisbad.blogspot.com/2015/12/more-fortran-90-like-vector-operations.html

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [GENERAL] Transfer db from one port to another

2015-12-29 Thread Killian Driscoll
On 24 December 2015 at 18:33, Adrian Klaver 
wrote:

> On 12/24/2015 12:03 AM, Killian Driscoll wrote:
>
>
>>
>> yeah, this one from Adrian, at 7:02am PST (Z-0800) this
>> morning
>>
>> Per previous posts you want, whenever possible, to us a
>> newer version
>> of pg_dump to move a database from an older version(9.3) to
>> a newer
>> one(9.4). Therefore you should do your dump and restore
>> using the
>> pg_dump.exe and pg_restore.exe from the Bitanami bin
>> directory. I
>> would cd to the above directory and do:
>>
>> pg_dump -V
>> pg_restore -V
>>
>> to make sure the programs are found and are the 9.4 versions.
>>
>> Then do:
>>
>> pg_dump -Fc -p 5432 -U postgres -f irll_project.out
>> irll_project
>>
>> pg_restore -U postgres -p 5532 irll_project.out
>>
>>
>> that last needs to have -d newdbname where newdbname has
>> already
>> been created, for instance, by...
>>
>>
>> Aah, my mistake. Yes you need to specify the database to get the
>> restore to work properly. Also explains why there is nothing in the
>> logs.
>>
>>
>> OK - with the inclusion of stating the dbname the restore works, but not
>> correctly: what is restored is 24 of 48 tables and 1 of 22 views from
>> one schema and no tables from the other schema.
>>
>> A log appeared at 0:08 last night (I'm at GMT +1), which I've attached.
>> Plus, I did the dump and restore again this morning and have attached
>> the text from the windows shell if that helps
>>
>>
>
> Well, the one from this morning shows (I did not look through the whole
> thing) that you are restoring over existing database objects. I would say at
> this point the best thing you can do is get to a known state on the 9.4
> cluster you want to restore to. I am assuming you are not doing anything
> with the database irll_project on the 9.4 server at this point, correct?
>
> If so, for the below keep track of exactly what you do and the order you
> do it, in case you need to post back here.
>
> 1) On the 9.4 server, while logged into another database on the server,
> say postgres do:
>
> DROP DATABASE irll_project;
>
> 2) Using the 9.4 version of pg_dump dump the 9.3 version of irll_project.
>
> 3) Using the 9.4 version of pg_restore restore irll_project to the 9.4
> server.
>

It worked - thank you very much for your time.

Regarding the file format: I had used pg_dump with .sql, but you
suggested .out. Is there a particular reason to use .out instead of .sql
when backing up?
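
For reference, the extension itself is only a convention; what matters is the dump format. -Fc writes a custom-format archive (often named .out or .dump) that is read back with pg_restore and supports selective and parallel restore, while a plain dump (the default, often named .sql) is a text script replayed with psql. A sketch of the two styles, reusing the ports and names from the earlier commands:

```shell
# custom-format archive: compressed, restored (even selectively) with pg_restore
pg_dump -Fc -p 5432 -U postgres -f irll_project.out irll_project
pg_restore -p 5532 -U postgres -d irll_project irll_project.out

# plain SQL script: human-readable, replayed with psql
pg_dump -p 5432 -U postgres -f irll_project.sql irll_project
psql -p 5532 -U postgres -d irll_project -f irll_project.sql
```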

>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>