On Thu, Mar 8, 2012 at 4:51 AM, Tom Lane wrote:
> Alexander Korotkov writes:
> > True. If (max count - min count + 1) is small, enumerating the frequencies
> > is both a more compact and a more precise representation. Conversely,
> > if (max count - min count + 1) is large, we can run out of
> >
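To make that trade-off concrete, here is a minimal sketch of the decision being discussed: enumerate exact per-count frequencies when the span of distinct counts fits within the statistics target, otherwise fall back to a bounded histogram. The names and the cutoff below are illustrative assumptions, not code from the patch.

#include <stdio.h>

/*
 * Illustrative only: choose between storing one exact frequency per
 * possible distinct-element count and storing a fixed number of
 * histogram bounds, based on how many distinct counts could exist.
 */
typedef enum
{
	REP_EXACT_FREQUENCIES,		/* one frequency per possible count */
	REP_BOUNDED_HISTOGRAM		/* fixed number of histogram bounds */
} CountStatsRepresentation;

static CountStatsRepresentation
choose_count_representation(int min_count, int max_count, int stats_target)
{
	int		span = max_count - min_count + 1;

	/* Small span: enumerating frequencies is both smaller and exact. */
	if (span <= stats_target)
		return REP_EXACT_FREQUENCIES;

	/* Large span: cap the space used by storing histogram bounds instead. */
	return REP_BOUNDED_HISTOGRAM;
}

int
main(void)
{
	/* 900 arrays of length 1 and 100 of length 2: the span is only 2. */
	printf("span 2     -> %d\n", choose_count_representation(1, 2, 100));

	/* Distinct counts ranging from 1 to 10000: enumeration blows up. */
	printf("span 10000 -> %d\n", choose_count_representation(1, 10000, 100));
	return 0;
}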
On Thu, Mar 08, 2012 at 11:30:52AM -0500, Tom Lane wrote:
> Noah Misch writes:
> > On Wed, Mar 07, 2012 at 07:51:42PM -0500, Tom Lane wrote:
> >> On reflection my idea above is wrong; for example assume that we have a
> >> column with 900 arrays of length 1 and 100 arrays of length 2. Going by
>
Noah Misch writes:
> On Wed, Mar 07, 2012 at 07:51:42PM -0500, Tom Lane wrote:
>> On reflection my idea above is wrong; for example assume that we have a
>> column with 900 arrays of length 1 and 100 arrays of length 2. Going by
>> what I said, we'd reduce the histogram to {1,2}, which might accu
On Wed, Mar 07, 2012 at 07:51:42PM -0500, Tom Lane wrote:
> Alexander Korotkov writes:
> > True. If (max count - min count + 1) is small, enumerating the frequencies
> > is both a more compact and a more precise representation. Conversely,
> > if (max count - min count + 1) is large, we can run out
Alexander Korotkov writes:
> On Mon, Mar 5, 2012 at 1:11 AM, Tom Lane wrote:
>> Couldn't we reduce the histogram size when there aren't many
>> different counts?
>>
>> It seems fairly obvious to me that we could bound the histogram
>> size with (max count - min count + 1), but maybe something ev
On Mon, Mar 5, 2012 at 1:11 AM, Tom Lane wrote:
> BTW, one other thing about the count histogram: seems like we are
> frequently generating uselessly large ones. For instance, do ANALYZE
> in the regression database and then run
>
> select tablename,attname,elem_count_histogram from pg_stats
>
BTW, one other thing about the count histogram: seems like we are
frequently generating uselessly large ones. For instance, do ANALYZE
in the regression database and then run
select tablename,attname,elem_count_histogram from pg_stats
where elem_count_histogram is not null;
You get lots of ent
Alexander Korotkov writes:
> On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane wrote:
>> 1. I'm still unhappy about the loop that fills the count histogram,
>> as I noted earlier today. It at least needs a decent comment and some
>> overflow protection, and I'm not entirely convinced that it doesn't have
Alexander Korotkov writes:
> On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane wrote:
>> 2. The tests in the above-mentioned message show that in most cases
>> where mcelem_array_contained_selec falls through to the "rough
>> estimate", the resulting rowcount estimate is just 1, ie we are coming
>> out wi
On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane wrote:
> 2. The tests in the above-mentioned message show that in most cases
> where mcelem_array_contained_selec falls through to the "rough
> estimate", the resulting rowcount estimate is just 1, ie we are coming
> out with very small selectivities. Alt
On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane wrote:
> 1. I'm still unhappy about the loop that fills the count histogram,
> as I noted earlier today. It at least needs a decent comment and some
> overflow protection, and I'm not entirely convinced that it doesn't have
> more bugs than the overflow i
Alexander Korotkov writes:
> [ array statistics patch ]
I've committed this after a fair amount of editorialization. There are
still some loose ends to deal with, but I felt it was ready to go into
the tree for wider testing.
The main thing I changed that wasn't in the nature of cleanup/bugfixi
... BTW, could you explain exactly how that "Fill histogram by hashtab"
loop works? It's way too magic for my taste, and does in fact have bugs
in the currently submitted patch. I've reworked it to this:
/* Fill histogram by hashtab. */
delta = analyzed_rows - 1;
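(The preview cuts off there.) As a separate illustration of the general technique that loop implements, and not the code that was committed, here is a self-contained sketch with invented names (CountItem, fill_count_histogram): walk the distinct (count, frequency) pairs, sorted by count, and emit a histogram bound at each of num_hist evenly spaced positions in cumulative row frequency.

#include <stdio.h>

/* One distinct-element count and how many sampled rows had it. */
typedef struct
{
	int		count;			/* number of distinct elements in an array */
	int		frequency;		/* number of analyzed rows with that count */
} CountItem;

/*
 * Fill hist[0..num_hist-1] with bounds taken from "items", which must be
 * sorted by count ascending and cover total_rows rows in all.  Bound i is
 * the count found at cumulative-frequency position
 * i * (total_rows - 1) / (num_hist - 1), so the bounds are evenly spaced
 * over the sampled rows.  Assumes num_hist >= 2 and nitems >= 1.
 */
static void
fill_count_histogram(const CountItem *items, int nitems,
					 int total_rows, int *hist, int num_hist)
{
	int		item = 0;
	long	covered = items[0].frequency;	/* rows covered so far */
	int		i;

	for (i = 0; i < num_hist; i++)
	{
		/* Target cumulative position of this bound. */
		long	target = (long) i * (total_rows - 1) / (num_hist - 1);

		/* Advance until the current item covers the target position. */
		while (covered <= target && item < nitems - 1)
			covered += items[++item].frequency;

		hist[i] = items[item].count;
	}
}

int
main(void)
{
	/* 900 rows whose arrays have 1 distinct element, 100 rows with 2. */
	CountItem	items[] = {{1, 900}, {2, 100}};
	int			hist[11];
	int			i;

	fill_count_histogram(items, 2, 1000, hist, 11);
	for (i = 0; i < 11; i++)
		printf("%d ", hist[i]);		/* expect ten 1s and a trailing 2 */
	printf("\n");
	return 0;
}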
I wrote:
> ... So my preference is to align the two
> definitions of STATISTIC_KIND_MCELEM by adding a null-element frequency
> to tsvector's usage (where it'll always be zero) and getting rid of the
> average distinct element count here.
Actually, there's a way we can do this without code changes
Still working through this patch ... there are some things that bother
me about the entries being made in pg_statistic:
1. You re-used STATISTIC_KIND_MCELEM for something that, while similar
to tsvector's usage, is not the same. In particular, tsvector adds two
extra elements to the stanumbers ar
[ sorry Tom, reply all this time... ]
> What do you mean by "storing sequences as arrays"?
So, a simple example: for transcripts (sequences of DNA that are
turned into proteins), we store each of the connected components as
an array of the form:
exon_type in [1,6]
splice_type = [1,3]
and t
Alvaro Herrera writes:
> Excerpts from Tom Lane's message of Thu Mar 01 18:51:38 -0300 2012:
>> How would we make it optional? There's noplace I can think of to stick
>> such a knob ...
> Uhm, attoptions?
Oh, I had forgotten we had that mechanism already. Yeah, that might
work. I'm a bit temp
Nathan Boley writes:
> Maybe this is bad design, but I've gotten in the habit of storing
> sequences as arrays and I commonly join on them. I looked through my
> code this morning, and I only have one 'range' query ( of the form
> described up-thread ), but there are tons of the form
> SELECT att
>> What about MCV's? Will those be removed as well?
>
> Sure. Those seem even less useful.
Ya, this will destroy the performance of several queries without some
heavy tweaking.
Maybe this is bad design, but I've gotten in the habit of storing
sequences as arrays and I commonly join on them. I lo
Excerpts from Tom Lane's message of Thu Mar 01 18:51:38 -0300 2012:
>
> Alvaro Herrera writes:
> > Excerpts from Robert Haas's message of Thu Mar 01 12:00:08 -0300 2012:
> >> On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane wrote:
> >> I confess I am nervous about ripping this out. I am pretty sure w
Alvaro Herrera writes:
> Excerpts from Robert Haas's message of Thu Mar 01 12:00:08 -0300 2012:
>> On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane wrote:
>> I confess I am nervous about ripping this out. I am pretty sure we
>> will get complaints about it. Performance optimizations that benefit
>> gr
Excerpts from Robert Haas's message of Thu Mar 01 12:00:08 -0300 2012:
> On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane wrote:
> > No, just that we'd no longer have statistics relevant to that, and would
> > have to fall back on default selectivity assumptions. Do you think that
> > such application
On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane wrote:
> Nathan Boley writes:
>> On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane wrote:
>>> I am starting to look at this patch now. I'm wondering exactly why the
>>> decision was made to continue storing btree-style statistics for arrays,
>>> in addition to
On Thu, Mar 1, 2012 at 1:19 AM, Alexander Korotkov wrote:
> On Thu, Mar 1, 2012 at 1:09 AM, Tom Lane wrote:
>
>> That seems like a pretty narrow, uncommon use-case. Also, to get
>> accurate stats for such queries that way, you'd need really enormous
>> histograms. I doubt that the existing para
Nathan Boley writes:
> On Wed, Feb 29, 2012 at 2:43 PM, Tom Lane wrote:
>> Nathan Boley writes:
>>> If I understand your suggestion, queries of the form
>>> SELECT * FROM rel
>>> WHERE ARRAY[ 1,2,3,4 ] <= x
>>> AND x <= ARRAY[ 1, 2, 3, 1000];
>>> would no longer use an index. Is that corre
On Wed, Feb 29, 2012 at 2:43 PM, Tom Lane wrote:
> Nathan Boley writes:
>> On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane wrote:
>>> I am starting to look at this patch now. I'm wondering exactly why the
>>> decision was made to continue storing btree-style statistics for arrays,
>>> in addition to
Nathan Boley writes:
> On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane wrote:
>> I am starting to look at this patch now. I'm wondering exactly why the
>> decision was made to continue storing btree-style statistics for arrays,
>> in addition to the new stuff.
> If I understand your suggestion, qu
On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane wrote:
> Alexander Korotkov writes:
>> On Mon, Jan 23, 2012 at 7:58 PM, Noah Misch wrote:
>>> I've attached a new version that includes the UINT64_FMT fix, some edits of
>>> your newest comments, and a rerun of pgindent on the new files. I see no
>>> o
On Thu, Mar 1, 2012 at 1:09 AM, Tom Lane wrote:
> Alexander Korotkov writes:
> > On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane wrote:
> >> I am starting to look at this patch now. I'm wondering exactly why the
> >> decision was made to continue storing btree-style statistics for arrays,
>
> > Prob
Alexander Korotkov writes:
> On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane wrote:
>> I am starting to look at this patch now. I'm wondering exactly why the
>> decision was made to continue storing btree-style statistics for arrays,
> Probably, btree statistics really do matter for some sort of ar
On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane wrote:
> I am starting to look at this patch now. I'm wondering exactly why the
> decision was made to continue storing btree-style statistics for arrays,
> in addition to the new stuff. The pg_statistic rows for array columns
> tend to be unreasonably
Alexander Korotkov writes:
> On Mon, Jan 23, 2012 at 7:58 PM, Noah Misch wrote:
>> I've attached a new version that includes the UINT64_FMT fix, some edits of
>> your newest comments, and a rerun of pgindent on the new files. I see no
>> other issues precluding commit, so I am marking the patch
On Mon, Jan 23, 2012 at 7:58 PM, Noah Misch wrote:
> > + /* Take care about events with low probabilities. */
> > + if (rest > DEFAULT_CONTAIN_SEL)
> > + {
>
> Why the change from "rest > 0" to this in the latest version?
>
Earlier, the addition of the "rest" distribution required O(m) time. Now
On Mon, Jan 23, 2012 at 01:21:20AM +0400, Alexander Korotkov wrote:
> The updated patch is attached. I've updated the comment
> on mcelem_array_contained_selec with a more detailed description of the
> probability distribution assumption. Also, I found that the "rest" behaviour
> should be better described by a Poisso
Hi!
The updated patch is attached. I've updated the comment
on mcelem_array_contained_selec with a more detailed description of the
probability distribution assumption. Also, I found that the "rest" behaviour
should be better described by a Poisson distribution, and the relevant
changes were made.
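As background on the Poisson remark (the standard form only, not a quote of the formula the patch uses): if the number of occurrences of an element is modeled as Poisson with mean \lambda, then

    P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \qquad P(X \ge 1) = 1 - e^{-\lambda},

so the probability that an element appears at least once depends only on its expected number of occurrences.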
On Tue, Jan 17, 2012 at 2:
On Tue, Jan 17, 2012 at 12:04:06PM +0400, Alexander Korotkov wrote:
> Thanks for your fixes to the patch. They look correct to me. I made some
> further fixes to the patch. A proof of some of the concepts is still needed;
> I'm going to provide it in a few days.
Your further fixes look good. Could you also an
Hi!
Thanks for your fixes to the patch. They look correct to me. I made some
further fixes to the patch. A proof of some of the concepts is still needed;
I'm going to provide it in a few days.
On Thu, Jan 12, 2012 at 3:06 PM, Noah Misch wrote:
> > I'm not sure about the shared lossy counting module, because par
Hi!
A patch in which most of the issues are fixed is attached.
On Thu, Dec 29, 2011 at 8:35 PM, Noah Misch wrote:
> I find distressing the thought of having two copies of the lossy sampling
> code, each implementing the algorithm with different variable names and
> levels of generality. We might so
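For context, the two call sites being discussed both implement the Manku-Motwani lossy counting scheme. The sketch below shows the core of that algorithm with invented names (LossyCounter, lc_add, and so on); it is not the proposed shared module and not the ts_typanalyze code, just a minimal reference implementation of the technique.

#include <stdio.h>
#include <stdlib.h>

/*
 * Minimal lossy counting sketch: track elements with a count and a delta
 * (the maximum possible undercount at insertion time).  At every bucket
 * boundary, prune entries whose count + delta no longer exceeds the bucket
 * number; surviving counts are accurate to within epsilon * N.
 */
typedef struct
{
	int		value;
	long	count;
	long	delta;
} LCEntry;

typedef struct
{
	LCEntry *entries;
	int		nentries;
	int		capacity;
	long	bucket_width;	/* ceil(1 / epsilon) */
	long	nseen;			/* elements processed so far */
} LossyCounter;

static LossyCounter *
lc_create(double epsilon)
{
	LossyCounter *lc = malloc(sizeof(LossyCounter));

	lc->bucket_width = (long) (1.0 / epsilon) + 1;
	lc->capacity = 10 * (int) lc->bucket_width;	/* ample for this sketch */
	lc->entries = malloc(lc->capacity * sizeof(LCEntry));
	lc->nentries = 0;
	lc->nseen = 0;
	return lc;
}

static void
lc_add(LossyCounter *lc, int value)
{
	long	bucket;
	int		i;

	lc->nseen++;
	bucket = (lc->nseen - 1) / lc->bucket_width + 1;

	/* Bump the count if we already track this value, else add an entry. */
	for (i = 0; i < lc->nentries; i++)
		if (lc->entries[i].value == value)
			break;
	if (i < lc->nentries)
		lc->entries[i].count++;
	else if (lc->nentries < lc->capacity)
	{
		lc->entries[lc->nentries].value = value;
		lc->entries[lc->nentries].count = 1;
		lc->entries[lc->nentries].delta = bucket - 1;	/* possible undercount */
		lc->nentries++;
	}

	/* At each bucket boundary, drop entries that cannot be frequent. */
	if (lc->nseen % lc->bucket_width == 0)
	{
		int		kept = 0;

		for (i = 0; i < lc->nentries; i++)
			if (lc->entries[i].count + lc->entries[i].delta > bucket)
				lc->entries[kept++] = lc->entries[i];
		lc->nentries = kept;
	}
}

int
main(void)
{
	LossyCounter *lc = lc_create(0.01);		/* 1% error bound */
	int		i;

	/* Skewed stream: value 7 is common, everything else appears once. */
	for (i = 0; i < 10000; i++)
		lc_add(lc, (i % 3 == 0) ? 7 : 1000 + i);

	for (i = 0; i < lc->nentries; i++)
		if (lc->entries[i].count > 100)
			printf("value %d was seen at least %ld times\n",
				   lc->entries[i].value, lc->entries[i].count);

	free(lc->entries);
	free(lc);
	return 0;
}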
Corrections:
On Thu, Dec 29, 2011 at 11:35:00AM -0500, Noah Misch wrote:
> On Wed, Nov 09, 2011 at 08:49:35PM +0400, Alexander Korotkov wrote:
> > + * We set s to be the estimated frequency of the K'th element in a natural
> > + * language's frequency table, where K is the tar
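For readers unfamiliar with the estimate that comment appeals to (the standard Zipf's-law form, given as background rather than quoted from the patch): under a Zipfian distribution with exponent 1 over W distinct elements, the K'th most common element has frequency roughly

    f(K) \approx \frac{1}{K \, H_W}, \qquad H_W = \sum_{i=1}^{W} \frac{1}{i},

which is the sort of value the s in that comment stands for.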
On Wed, Jan 4, 2012 at 12:33 AM, Noah Misch wrote:
> On Wed, Jan 04, 2012 at 12:09:16AM +0400, Alexander Korotkov wrote:
> > Thanks for your great work on reviewing this patch. Now I'm trying to
> > find the memory corruption bug. Unfortunately it doesn't appear on my
> > system. Can you check if
On Wed, Jan 04, 2012 at 12:09:16AM +0400, Alexander Korotkov wrote:
> Thanks for your great work on reviewing this patch. Now I'm trying to find
> the memory corruption bug. Unfortunately it doesn't appear on my system. Can
> you check if this bug remains in the attached version of the patch. If so, please
>
Hi!
Thanks for your great work on reviewing this patch. Now I'm trying to find
the memory corruption bug. Unfortunately it doesn't appear on my system. Can
you check whether this bug remains in the attached version of the patch? If so,
please provide me with information about the system you're running (processor, OS, etc.)
On Tue, Dec 20, 2011 at 04:37:37PM +0400, Alexander Korotkov wrote:
> On Wed, Nov 16, 2011 at 1:43 AM, Nathan Boley wrote:
>
> > FYI, I've added myself as the reviewer for the current commitfest.
> >
> How is the review going now?
I will examine this patch within the week.
Hi!
On Wed, Nov 16, 2011 at 1:43 AM, Nathan Boley wrote:
> FYI, I've added myself as the reviewer for the current commitfest.
>
How is the review going now?
--
With best regards,
Alexander Korotkov.
> Rebased with head.
FYI, I've added myself as the reviewer for the current commitfest.
Best,
Nathan Boley
Rebased with head.
--
With best regards,
Alexander Korotkov.
arrayanalyze-0.7.patch.gz (GNU Zip compressed data)