Re: [HACKERS] Collect frequency statistics for arrays

2012-03-12 Thread Alexander Korotkov
On Thu, Mar 8, 2012 at 4:51 AM, Tom Lane wrote:
> Alexander Korotkov writes:
> > True. If (max count - min count + 1) is small, enumerating frequencies
> > is both a more compact and a more precise representation. Simultaneously,
> > if (max count - min count + 1) is large, we can run out of
> >

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-08 Thread Noah Misch
On Thu, Mar 08, 2012 at 11:30:52AM -0500, Tom Lane wrote:
> Noah Misch writes:
> > On Wed, Mar 07, 2012 at 07:51:42PM -0500, Tom Lane wrote:
> >> On reflection my idea above is wrong; for example assume that we have a
> >> column with 900 arrays of length 1 and 100 arrays of length 2. Going by
>

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-08 Thread Tom Lane
Noah Misch writes:
> On Wed, Mar 07, 2012 at 07:51:42PM -0500, Tom Lane wrote:
>> On reflection my idea above is wrong; for example assume that we have a
>> column with 900 arrays of length 1 and 100 arrays of length 2. Going by
>> what I said, we'd reduce the histogram to {1,2}, which might accu
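The arithmetic behind the 900/100 example can be sketched in a few lines of Python. The preview is cut off, so the reading used here (that the bounds {1,2} on their own say nothing about the 900/100 split) is an assumption, and `count_frequencies` and `uniform_guess` are hypothetical helper names, not functions from the patch.

```python
from collections import Counter


def count_frequencies(counts):
    """Return {count: fraction of rows having that count}, sorted by count."""
    total = len(counts)
    return {value: n / total for value, n in sorted(Counter(counts).items())}


def uniform_guess(counts):
    """What a reader of the bare bounds might assume: every distinct
    count is equally likely (an illustrative assumption, not planner code)."""
    distinct = sorted(set(counts))
    return {value: 1 / len(distinct) for value in distinct}


# Hypothetical column: 900 arrays of length 1 and 100 arrays of length 2.
counts = [1] * 900 + [2] * 100

print(count_frequencies(counts))  # explicit per-count frequencies
print(uniform_guess(counts))      # what the bounds {1, 2} alone suggest
```

Under these assumptions the enumerated frequencies keep the 90/10 split, while the two bounds alone cannot distinguish it from a 50/50 split.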

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-08 Thread Noah Misch
On Wed, Mar 07, 2012 at 07:51:42PM -0500, Tom Lane wrote:
> Alexander Korotkov writes:
> > True. If (max count - min count + 1) is small, enumerating frequencies
> > is both a more compact and a more precise representation. Simultaneously,
> > if (max count - min count + 1) is large, we can run out

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-07 Thread Tom Lane
Alexander Korotkov writes:
> On Mon, Mar 5, 2012 at 1:11 AM, Tom Lane wrote:
>> Couldn't we reduce the histogram size when there aren't many
>> different counts?
>>
>> It seems fairly obvious to me that we could bound the histogram
>> size with (max count - min count + 1), but maybe something ev

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-05 Thread Alexander Korotkov
On Mon, Mar 5, 2012 at 1:11 AM, Tom Lane wrote:
> BTW, one other thing about the count histogram: seems like we are
> frequently generating uselessly large ones. For instance, do ANALYZE
> in the regression database and then run
>
> select tablename,attname,elem_count_histogram from pg_stats
>

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-04 Thread Tom Lane
BTW, one other thing about the count histogram: seems like we are frequently generating uselessly large ones. For instance, do ANALYZE in the regression database and then run
    select tablename,attname,elem_count_histogram from pg_stats
      where elem_count_histogram is not null;
You get lots of ent

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-04 Thread Tom Lane
Alexander Korotkov writes:
> On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane wrote:
>> 1. I'm still unhappy about the loop that fills the count histogram,
>> as I noted earlier today. It at least needs a decent comment and some
>> overflow protection, and I'm not entirely convinced that it doesn't have

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-04 Thread Tom Lane
Alexander Korotkov writes:
> On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane wrote:
>> 2. The tests in the above-mentioned message show that in most cases
>> where mcelem_array_contained_selec falls through to the "rough
>> estimate", the resulting rowcount estimate is just 1, ie we are coming
>> out wi

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-04 Thread Alexander Korotkov
On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane wrote:
> 2. The tests in the above-mentioned message show that in most cases
> where mcelem_array_contained_selec falls through to the "rough
> estimate", the resulting rowcount estimate is just 1, ie we are coming
> out with very small selectivities. Alt

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-04 Thread Alexander Korotkov
On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane wrote:
> 1. I'm still unhappy about the loop that fills the count histogram,
> as I noted earlier today. It at least needs a decent comment and some
> overflow protection, and I'm not entirely convinced that it doesn't have
> more bugs than the overflow i

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-03 Thread Tom Lane
Alexander Korotkov writes:
> [ array statistics patch ]
I've committed this after a fair amount of editorialization. There are still some loose ends to deal with, but I felt it was ready to go into the tree for wider testing. The main thing I changed that wasn't in the nature of cleanup/bugfixi

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-03 Thread Tom Lane
... BTW, could you explain exactly how that "Fill histogram by hashtab" loop works? It's way too magic for my taste, and does in fact have bugs in the currently submitted patch. I've reworked it to this:
    /* Fill histogram by hashtab. */
    delta = analyzed_rows - 1;
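For readers unfamiliar with the general idea, here is a minimal Python sketch of filling an equi-depth histogram from a table of (distinct-element count, number of rows) pairs. It is not the reworked loop, which is cut off in this preview; `equi_depth_histogram`, `count_rows`, and `num_bins` are hypothetical names, and only the use of `total - 1` as the rank span loosely echoes the quoted `delta = analyzed_rows - 1` line.

```python
def equi_depth_histogram(count_rows, num_bins):
    """Return num_bins + 1 histogram bounds.

    count_rows maps a distinct-element count to the number of sampled
    rows that have it. Bound i is the count found at 0-based rank
    i * (total - 1) // num_bins in the sorted list of all rows.
    """
    items = sorted(count_rows.items())
    total = sum(rows for _, rows in items)
    if total == 0:
        return []

    bounds = []
    idx = 0              # current position in the sorted (count, rows) list
    seen = items[0][1]   # rows covered by items[0..idx]
    for i in range(num_bins + 1):
        # Integer arithmetic only; Python ints cannot overflow, whereas a
        # C version would need to guard this multiplication.
        rank = i * (total - 1) // num_bins
        while rank >= seen:
            idx += 1
            seen += items[idx][1]
        bounds.append(items[idx][0])
    return bounds


# Hypothetical sample: 900 rows with 1 distinct element, 100 rows with 2.
print(equi_depth_histogram({1: 900, 2: 100}, 10))
```

With that sample, ten of the eleven bounds land on the same value, which is the kind of repetitive output the thread elsewhere calls "uselessly large".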

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-02 Thread Tom Lane
I wrote:
> ... So my preference is to align the two
> definitions of STATISTIC_KIND_MCELEM by adding a null-element frequency
> to tsvector's usage (where it'll always be zero) and getting rid of the
> average distinct element count here.
Actually, there's a way we can do this without code changes

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-02 Thread Tom Lane
Still working through this patch ... there are some things that bother me about the entries being made in pg_statistic: 1. You re-used STATISTIC_KIND_MCELEM for something that, while similar to tsvector's usage, is not the same. In particular, tsvector adds two extra elements to the stanumbers ar

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Nathan Boley
[ sorry Tom, reply all this time... ]
> What do you mean by "storing sequences as arrays"?
So, a simple example is, for transcripts ( sequences of DNA that are turned into proteins ), we store each of the connected components as an array of the form: exon_type in [1,6] splice_type = [1,3] and t

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Tom Lane
Alvaro Herrera writes:
> Excerpts from Tom Lane's message of Thu Mar 01 18:51:38 -0300 2012:
>> How would we make it optional? There's noplace I can think of to stick
>> such a knob ...
> Uhm, attoptions?
Oh, I had forgotten we had that mechanism already. Yeah, that might work. I'm a bit temp

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Tom Lane
Nathan Boley writes:
> Maybe this is bad design, but I've gotten in the habit of storing
> sequences as arrays and I commonly join on them. I looked through my
> code this morning, and I only have one 'range' query ( of the form
> described up-thread ), but there are tons of the form
> SELECT att

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Nathan Boley
>> What about MCV's? Will those be removed as well?
>
> Sure.  Those seem even less useful.
Ya, this will destroy the performance of several queries without some heavy tweaking. Maybe this is bad design, but I've gotten in the habit of storing sequences as arrays and I commonly join on them. I lo

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Alvaro Herrera
Excerpts from Tom Lane's message of Thu Mar 01 18:51:38 -0300 2012:
>
> Alvaro Herrera writes:
> > Excerpts from Robert Haas's message of Thu Mar 01 12:00:08 -0300 2012:
> >> On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane wrote:
> >> I confess I am nervous about ripping this out. I am pretty sure w

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Tom Lane
Alvaro Herrera writes:
> Excerpts from Robert Haas's message of Thu Mar 01 12:00:08 -0300 2012:
>> On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane wrote:
>> I confess I am nervous about ripping this out. I am pretty sure we
>> will get complaints about it. Performance optimizations that benefit
>> gr

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Thu Mar 01 12:00:08 -0300 2012:
> On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane wrote:
> > No, just that we'd no longer have statistics relevant to that, and would
> > have to fall back on default selectivity assumptions.  Do you think that
> > such application

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Robert Haas
On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane wrote:
> Nathan Boley writes:
>> On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane wrote:
>>> I am starting to look at this patch now.  I'm wondering exactly why the
>>> decision was made to continue storing btree-style statistics for arrays,
>>> in addition to

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Alexander Korotkov
On Thu, Mar 1, 2012 at 1:19 AM, Alexander Korotkov wrote:
> On Thu, Mar 1, 2012 at 1:09 AM, Tom Lane wrote:
>
>> That seems like a pretty narrow, uncommon use-case. Also, to get
>> accurate stats for such queries that way, you'd need really enormous
>> histograms. I doubt that the existing para

Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Tom Lane
Nathan Boley writes:
> On Wed, Feb 29, 2012 at 2:43 PM, Tom Lane wrote:
>> Nathan Boley writes:
>>> If I understand your suggestion, queries of the form
>>> SELECT * FROM rel
>>> WHERE ARRAY[ 1,2,3,4 ] <= x
>>>      AND x <= ARRAY[ 1, 2, 3, 1000];
>>> would no longer use an index. Is that corre
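To illustrate what such a range predicate selects, the sketch below uses Python list comparison as a stand-in for btree ordering of arrays. This assumes one-dimensional integer arrays without NULLs, where ordering is element by element with a shorter array sorting before a longer one that extends it; `array_between` is a hypothetical helper and nothing here models index usage or planner statistics.

```python
def array_between(x, low, high):
    """True if low <= x <= high under lexicographic (element-wise) order.

    Python lists compare element by element; when one list is a prefix
    of the other, the shorter list sorts first.
    """
    return low <= x <= high


LOW = [1, 2, 3, 4]
HIGH = [1, 2, 3, 1000]

# Candidate values of the hypothetical column x from the quoted query.
for x in ([1, 2, 3, 500], [1, 2, 3, 4, 0], [1, 2, 3], [1, 2, 4]):
    print(x, array_between(x, LOW, HIGH))
```

The point of the example is that the range is governed mostly by the first differing element, which is why a histogram over whole-array values would be needed to estimate such predicates.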

Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Nathan Boley
On Wed, Feb 29, 2012 at 2:43 PM, Tom Lane wrote:
> Nathan Boley writes:
>> On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane wrote:
>>> I am starting to look at this patch now.  I'm wondering exactly why the
>>> decision was made to continue storing btree-style statistics for arrays,
>>> in addition to

Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Tom Lane
Nathan Boley writes:
> On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane wrote:
>> I am starting to look at this patch now.  I'm wondering exactly why the
>> decision was made to continue storing btree-style statistics for arrays,
>> in addition to the new stuff.
> If I understand your suggestion, qu

Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Nathan Boley
On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane wrote:
> Alexander Korotkov writes:
>> On Mon, Jan 23, 2012 at 7:58 PM, Noah Misch wrote:
>>> I've attached a new version that includes the UINT64_FMT fix, some edits of
>>> your newest comments, and a rerun of pgindent on the new files.  I see no
>>> o

Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Alexander Korotkov
On Thu, Mar 1, 2012 at 1:09 AM, Tom Lane wrote:
> Alexander Korotkov writes:
> > On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane wrote:
> >> I am starting to look at this patch now. I'm wondering exactly why the
> >> decision was made to continue storing btree-style statistics for arrays,
>
> > Prob

Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Tom Lane
Alexander Korotkov writes:
> On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane wrote:
>> I am starting to look at this patch now. I'm wondering exactly why the
>> decision was made to continue storing btree-style statistics for arrays,
> Probably, btree statistics really do matter for some sort of ar

Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Alexander Korotkov
On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane wrote:
> I am starting to look at this patch now. I'm wondering exactly why the
> decision was made to continue storing btree-style statistics for arrays,
> in addition to the new stuff. The pg_statistic rows for array columns
> tend to be unreasonably

Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Tom Lane
Alexander Korotkov writes:
> On Mon, Jan 23, 2012 at 7:58 PM, Noah Misch wrote:
>> I've attached a new version that includes the UINT64_FMT fix, some edits of
>> your newest comments, and a rerun of pgindent on the new files. I see no
>> other issues precluding commit, so I am marking the patch

Re: [HACKERS] Collect frequency statistics for arrays

2012-01-23 Thread Alexander Korotkov
On Mon, Jan 23, 2012 at 7:58 PM, Noah Misch wrote:
> > + /* Take care about events with low probabilities. */
> > + if (rest > DEFAULT_CONTAIN_SEL)
> > + {
>
> Why the change from "rest > 0" to this in the latest version?
>
Earlier, adding the "rest" distribution required O(m) time. Now

Re: [HACKERS] Collect frequency statistics for arrays

2012-01-23 Thread Noah Misch
On Mon, Jan 23, 2012 at 01:21:20AM +0400, Alexander Korotkov wrote:
> An updated patch is attached. I've updated the comment
> of mcelem_array_contained_selec with a more detailed description of
> the probability distribution assumption. Also, I found that the "rest" behaviour
> is better described by a Poisso

Re: [HACKERS] Collect frequency statistics for arrays

2012-01-22 Thread Alexander Korotkov
Hi! An updated patch is attached. I've updated the comment of mcelem_array_contained_selec with a more detailed description of the probability distribution assumption. Also, I found that the "rest" behaviour is better described by a Poisson distribution; the relevant changes were made. On Tue, Jan 17, 2012 at 2:
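For reference, the Poisson distribution mentioned here can be computed as in the sketch below. How the patch applies it is not visible in this preview, so reading `mean` as the expected number of "rest" (not individually tracked) elements per array is an assumption, and `poisson_pmf` and `poisson_distribution` are hypothetical names.

```python
import math


def poisson_pmf(k, mean):
    """P(X = k) for a Poisson-distributed X with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def poisson_distribution(mean, max_k):
    """Probabilities of 0..max_k occurrences, as a list indexed by k."""
    return [poisson_pmf(k, mean) for k in range(max_k + 1)]


# Hypothetical: on average 0.5 "rest" elements per array.
for k, p in enumerate(poisson_distribution(0.5, 4)):
    print(k, round(p, 4))
```

Because the probabilities fall off quickly for small means, a caller can stop at a modest `max_k` and treat the remaining tail as negligible, which is one plausible motivation for thresholding low-probability events.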

Re: [HACKERS] Collect frequency statistics for arrays

2012-01-17 Thread Noah Misch
On Tue, Jan 17, 2012 at 12:04:06PM +0400, Alexander Korotkov wrote:
> Thanks for your fixes to the patch. They look correct to me. I did some
> fixes in the patch. The proof of some concepts is still needed. I'm going
> to provide it in a few days.
Your further fixes look good. Could you also an

Re: [HACKERS] Collect frequency statistics for arrays

2012-01-17 Thread Alexander Korotkov
Hi! Thanks for your fixes to the patch. They look correct to me. I did some fixes in the patch. The proof of some concepts is still needed. I'm going to provide it in a few days.
On Thu, Jan 12, 2012 at 3:06 PM, Noah Misch wrote:
> > I'm not sure about shared lossy counting module, because par

Re: [HACKERS] Collect frequency statistics for arrays

2012-01-07 Thread Alexander Korotkov
Hi! A patch that fixes most of the issues is attached.
On Thu, Dec 29, 2011 at 8:35 PM, Noah Misch wrote:
> I find distressing the thought of having two copies of the lossy sampling
> code, each implementing the algorithm with different variable names and levels
> of generality. We might so
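As background for the "lossy sampling code" under discussion, here is a minimal Python sketch of textbook lossy counting over a stream of items. It is a generic illustration under that assumption, not either of the two copies being debated; `lossy_count` and `epsilon` are hypothetical names.

```python
import math


def lossy_count(stream, epsilon):
    """Approximate frequency counts with error at most epsilon * N.

    The stream is processed in buckets of width ceil(1 / epsilon). Each
    tracked item stores [count, max_undercount]; at every bucket boundary,
    items whose count + max_undercount does not exceed the current bucket
    number are dropped.
    """
    width = math.ceil(1 / epsilon)
    entries = {}   # item -> [count, max_undercount]
    bucket = 1     # number of the bucket currently being filled
    for n, item in enumerate(stream, start=1):
        if item in entries:
            entries[item][0] += 1
        else:
            # The item may have been seen and pruned in earlier buckets.
            entries[item] = [1, bucket - 1]
        if n % width == 0:
            entries = {
                key: value
                for key, value in entries.items()
                if value[0] + value[1] > bucket
            }
            bucket += 1
    return {key: value[0] for key, value in entries.items()}


# Hypothetical stream: "a" alternates with 50 one-off integers.
stream = []
for i in range(50):
    stream.extend(["a", i])
print(lossy_count(stream, 0.25))
```

A shared module would presumably parameterize the item type and hashing; that design question is what the quoted review comment raises, and this sketch takes no position on it.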

Re: [HACKERS] Collect frequency statistics for arrays

2012-01-06 Thread Noah Misch
Corrections:
On Thu, Dec 29, 2011 at 11:35:00AM -0500, Noah Misch wrote:
> On Wed, Nov 09, 2011 at 08:49:35PM +0400, Alexander Korotkov wrote:
> > + *We set s to be the estimated frequency of the K'th element in a natural
> > + *language's frequency table, where K is the tar

Re: [HACKERS] Collect frequency statistics for arrays

2012-01-03 Thread Alexander Korotkov
On Wed, Jan 4, 2012 at 12:33 AM, Noah Misch wrote:
> On Wed, Jan 04, 2012 at 12:09:16AM +0400, Alexander Korotkov wrote:
> > Thanks for your great work on reviewing this patch. Now I'm trying to find
> > the memory corruption bug. Unfortunately it doesn't appear on my system. Can
> > you check if

Re: [HACKERS] Collect frequency statistics for arrays

2012-01-03 Thread Noah Misch
On Wed, Jan 04, 2012 at 12:09:16AM +0400, Alexander Korotkov wrote:
> Thanks for your great work on reviewing this patch. Now I'm trying to find
> the memory corruption bug. Unfortunately it doesn't appear on my system. Can
> you check whether this bug remains in the attached version of the patch? If so, please
>

Re: [HACKERS] Collect frequency statistics for arrays

2012-01-03 Thread Alexander Korotkov
Hi! Thanks for your great work on reviewing this patch. Now I'm trying to find the memory corruption bug. Unfortunately it doesn't appear on my system. Can you check whether this bug remains in the attached version of the patch? If so, please provide me with information about the system you're running (processor, OS etc.)

Re: [HACKERS] Collect frequency statistics for arrays

2011-12-27 Thread Noah Misch
On Tue, Dec 20, 2011 at 04:37:37PM +0400, Alexander Korotkov wrote:
> On Wed, Nov 16, 2011 at 1:43 AM, Nathan Boley wrote:
>
> > FYI, I've added myself as the reviewer for the current commitfest.
> >
> How is the review going?
I will examine this patch within the week.
-- Sent via pgsql-hacker

Re: [HACKERS] Collect frequency statistics for arrays

2011-12-20 Thread Alexander Korotkov
Hi!
On Wed, Nov 16, 2011 at 1:43 AM, Nathan Boley wrote:
> FYI, I've added myself as the reviewer for the current commitfest.
>
How is the review going?
-- With best regards, Alexander Korotkov.

Re: [HACKERS] Collect frequency statistics for arrays

2011-11-15 Thread Nathan Boley
> Rebased with head.
FYI, I've added myself as the reviewer for the current commitfest.
Best, Nathan Boley
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Collect frequency statistics for arrays

2011-11-09 Thread Alexander Korotkov
Rebased with head. -- With best regards, Alexander Korotkov. arrayanalyze-0.7.patch.gz Description: GNU Zip compressed data