Re: [PERFORM] Speed / Server

2009-10-07 Thread richard.henwood

> >
> > The problem is, this next year we're anticipating significant growth,
> > where we may be adding more like 20 million rows per month (roughly
> 15GB
> > of data).
> >
> > A row of data might have:
> >  The system identifier (int)
> >  Date/Time read (timestamp)
> >  Sensor identifier (int)
> >  Data Type (int)
> >  Data Value (double)
> 
> One approach that can sometimes help is to use arrays to pack data.
> Arrays may or may not work for the data you are collecting: they work
> best when you always pull the entire array for analysis and not a
> particular element of the array.  Arrays work well because they pack
> more data into index fetches and you get to skip the 20 byte tuple
> header.  That said, they are an 'optimization trade off'...you are
> making one type of query fast at the expense of others.
> 

I recently used arrays for a 'long and thin' table very like those
described here. The tuple header became increasingly significant in our
case. There are some details in my post:

http://www.nabble.com/optimizing-for-temporal-data-behind-a-view-td25490818.html

As Merlin points out: one considerable side-effect of using arrays 
is that it reduces the sorts of queries we could perform - 
i.e. querying individual elements inside an array becomes costly. 
So we needed to make sure our user scenarios (requirements) 
were well understood.
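To make the trade-off concrete, here is a sketch of the packing Merlin describes. All table and column names are hypothetical, loosely following the sensor schema quoted above:

```sql
-- One row per sample: the per-tuple header is paid for every reading.
CREATE TABLE readings (
    system_id int,
    read_at   timestamp,
    sensor_id int,
    data_type int,
    value     float8
);

-- Packed alternative: one row per system/sensor/hour, with that hour's
-- samples in an array, amortizing the header over many readings.
CREATE TABLE readings_packed (
    system_id int,
    sensor_id int,
    data_type int,
    hour      timestamp,
    samples   float8[]
);
```

The packed form is fast when you always fetch whole arrays; pulling out a single sample requires unpacking, which is exactly the query cost described above.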

richard


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query plan for NOT IN

2009-10-07 Thread Grzegorz Jaśkiewicz
On Mon, Oct 5, 2009 at 8:35 PM, Guy Rouillier wrote:

> Grzegorz Jaśkiewicz wrote:
>
>>
>> well, as a rule of thumb - unless you can't think of a default value for
>> the column - don't use nulls. So using null as a default 'idunno' is bad
>> practice, but everybody's opinion on that differs.
>>
>
> I don't understand this point of view.  The concept of null was introduced
> into the SQL vernacular by Codd and Date expressly to represent unknown
> values.
>
Yes, unknown. So as long as you know the default value of a field, you
should set it to such.

For instance, if by default your account balance is 0, you should set it to
0, not leave it as null. Another example: if a client doesn't have a
description, leave it as a blank '' string instead of null.

On the other hand, if you want to denote that the value wasn't set - use
null, but use it wisely. Hence, I personally think that DEFAULT value (in
create table) should be compulsory, and 'DEFAULT NULL' an option, that you
would have to choose.
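As a sketch of the convention being argued for (hypothetical table, following the account-balance and description examples above):

```sql
CREATE TABLE account (
    id          serial PRIMARY KEY,
    balance     numeric NOT NULL DEFAULT 0,  -- known starting value
    description text    NOT NULL DEFAULT ''  -- "no description" is known
);
```

Here every column with a knowable default declares it, and NULL would have to be chosen explicitly.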

Not to mention the other (valid in this case) argument: you would mostly
use IN/EXISTS and/or join keys on fields that are either PKs or at least
NOT NULL - hence using JOIN instead of IN/EXISTS most of the time.
One of my few personal wishes, ever since I started to use PostgreSQL, is
that it could rewrite IN/EXISTS into a JOIN when possible (that is, when
the columns are NOT NULL).
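For illustration, the rewrite being wished for might look like this (hypothetical tables; the equivalence only holds because the join key is NOT NULL):

```sql
-- With b.a_id declared NOT NULL, this...
SELECT a.* FROM a WHERE a.id IN (SELECT b.a_id FROM b);

-- ...can be answered as a join (DISTINCT guards against duplicates
-- when several b rows reference the same a row):
SELECT DISTINCT a.* FROM a JOIN b ON b.a_id = a.id;
```

If b.a_id could be NULL the two forms would still agree for IN, but the corresponding NOT IN rewrite would not, which is why nullability matters to the optimizer.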


-- 
GJ


Re: [PERFORM] Query plan for NOT IN

2009-10-07 Thread Kevin Grittner
Grzegorz Jaśkiewicz wrote:
> Guy Rouillier wrote:
>> Grzegorz Jaśkiewicz wrote:
 
>>> using nulls as default 'idunno' - is a bad practice
 
>> I don't understand this point of view.  The concept of null was
>> introduced into the SQL vernacular by Codd and Date expressly to
>> represent unknown values.
 
> if by default your account balance is 0, you should set it to 0, not
> leave it as null
 
If your business rules are that a new account is created with a zero
balance and then deposits are made, sure -- insert the account row
with a zero balance, *because you know it to be zero*.  It's been rare
that I've seen anyone err on the side of using NULL in place of a
default for such cases.  Much more common is using, for example, 'NMI'
in the middle name column to denote "No Middle Initial".  Such "magic
values" can cause no end of trouble.
 
A failing of the SQL standard is that it uses the same mark (NULL) to
show the absence of a value because it is unknown as for the case
where it is known that no value exists (not applicable).  Codd argued
for a distinction there, but it hasn't come to pass, at least in the
standard.  If anyone could suggest a way to support standard syntax
and semantics and add extensions to support this distinction, it might
be another advance that would distinguish PostgreSQL from "less
evolved" products.   :-)
 
None of that changes the requirement that NOT IN must result in
UNKNOWN if any of the values involved are NULL.  You can't say that my
birthday is not in the set of birthdays for other subscribers to this
list without knowing the birthdays of all subscribers.  This
definition of the operator makes it hard to optimize, but setting
unknown birthdays to some date far in the past or future, to avoid
using NULL, would just result in bogus results for this query as well
as, for example, queries attempting to calculate aggregates on age.
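Kevin's point can be seen directly (assuming a hypothetical subscribers table with a nullable birthday column):

```sql
-- If any other subscriber's birthday is NULL, the NOT IN predicate is
-- UNKNOWN for every candidate row, so this returns no rows at all:
SELECT *
FROM subscribers s
WHERE s.birthday NOT IN (SELECT birthday FROM subscribers
                         WHERE id <> s.id);

-- NOT EXISTS treats NULL as "no match" and is usually what is meant:
SELECT *
FROM subscribers s
WHERE NOT EXISTS (SELECT 1 FROM subscribers o
                  WHERE o.id <> s.id AND o.birthday = s.birthday);
```

The second form is also the one the planner can turn into an efficient anti-join.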
 
-Kevin



Re: [PERFORM] Query plan for NOT IN

2009-10-07 Thread Guy Rouillier

Kevin Grittner wrote:

Grzegorz Jaśkiewicz wrote:



A failing of the SQL standard is that it uses the same mark (NULL) to
show the absence of a value because it is unknown as for the case
where it is known that no value exists (not applicable).  Codd argued
for a distinction there, but it hasn't come to pass, at least in the
standard.  If anyone could suggest a way to support standard syntax
and semantics and add extensions to support this distinction, it might
be another advance that would distinguish PostgreSQL from "less
evolved" products.   :-)


Theoretically, the distinction already exists.  If you don't know a 
person's middle initial, then set it to null; if you know the person 
doesn't have one, set it to the empty string.


But from a practical point of view, that wouldn't go very far.  Most 
*people* equate an empty string to mean the same as null.  When I wrote 
my own data access layer years ago, I expressly checked for empty 
strings on input and changed them to null.  I did this because empty 
strings had a nasty way of creeping into our databases; writing queries 
to produce predictable results got to be very messy.


--
Guy Rouillier



Re: [PERFORM] Query plan for NOT IN

2009-10-07 Thread Kevin Grittner
Guy Rouillier  wrote: 
> Kevin Grittner wrote:
 
>> A failing of the SQL standard is that it uses the same mark (NULL)
>> to show the absence of a value because it is unknown as for the
>> case where it is known that no value exists (not applicable).  Codd
>> argued for a distinction there, but it hasn't come to pass, at
>> least in the standard.  If anyone could suggest a way to support
>> standard syntax and semantics and add extensions to support this
>> distinction, it might be another advance that would distinguish
>> PostgreSQL from "less evolved" products.   :-)
> 
> Theoretically, the distinction already exists.  If you don't know a 
> person's middle initial, then set it to null; if you know the
> person doesn't have one, set it to the empty string.
 
Well, it is arguable whether an empty string is the proper way to
indicate that a character string based column is not applicable to a
given row, but it certainly falls flat for any other types, such as
dates or numbers; and I think there's value in having a consistent way
to handle this. 
 
> But from a practical point of view, that wouldn't go very far. 
> Most *people* equate an empty string to mean the same as null.  When
> I wrote my own data access layer years ago, I expressly checked for
> empty strings on input and changed them to null.  I did this because
> empty strings had a nasty way of creeping into our databases;
> writing queries to produce predictable results got to be very messy.
 
Yeah, there's that, too.
 
Which leaves the issue open -- a flexible way to flag the *reason* (or
*reasons*) for the absence of a value could be a nice enhancement, if
someone could invent a good implementation.  Of course, one could
always add a column to indicate the reason for a NULL; and perhaps
that would be as good as any scheme to attach reason flags to NULL. 
You'd just have to make sure the reason column was null capable for
those rows where there *was* a value, which would make the reason "not
applicable".
 
-Kevin



Re: [PERFORM] Query plan for NOT IN

2009-10-07 Thread Craig James

Kevin Grittner wrote:

Which leaves the issue open -- a flexible way to flag the *reason* (or
*reasons*) for the absence of a value could be a nice enhancement, if
someone could invent a good implementation.  Of course, one could
always add a column to indicate the reason for a NULL; and perhaps
that would be as good as any scheme to attach reason flags to NULL. 
You'd just have to make sure the reason column was null capable for
those rows where there *was* a value, which would make the reason "not
applicable".


I'd argue that this is just a special case of a broader problem of metadata: Data about the data.  
For example, I could have a temperature, 40 degrees, and an error bounds, +/- 0.25 degrees.  Nobody 
would think twice about making these separate columns.  I don't see how this is any different from 
a person's middle initial of NULL, plus a separate column indicating "not known" versus 
"doesn't have one" if that distinction is important.  There are many examples like this, 
where a simple value in one column isn't sufficient, so another column contains metadata that 
qualifies or clarifies the information.  NULL is just one such case.

But, this should probably be on an SQL discussion board, not PG performance...

Craig



Re: [PERFORM] Query plan for NOT IN

2009-10-07 Thread Guy Rouillier

Craig James wrote:

Kevin Grittner wrote:

Which leaves the issue open -- a flexible way to flag the *reason* (or
*reasons*) for the absence of a value could be a nice enhancement, if
someone could invent a good implementation.  Of course, one could
always add a column to indicate the reason for a NULL; and perhaps
that would be as good as any scheme to attach reason flags to NULL. 
You'd just have to make sure the reason column was null capable for
those rows where there *was* a value, which would make the reason "not
applicable".


I'd argue that this is just a special case of a broader problem of 
metadata: Data about the data.  For example, I could have a temperature, 
40 degrees, and an error bounds, +/- 0.25 degrees.  Nobody would think 
twice about making these separate columns.  I don't see how this is any 
different from a person's middle initial of NULL, plus a separate column 
indicating "not known" versus "doesn't have one" if that distinction is 
important.  There are many examples like this, where a simple value in 
one column isn't sufficient, so another column contains metadata that 
qualifies or clarifies the information.  NULL is just one such case.


But, this should probably be on an SQL discussion board, not PG 
performance...


Most DBMSs I'm aware of use a null *byte* attached to a nullable column 
to indicate whether the column is null or not.  Yes/no takes one *bit*, 
which leaves 255 other possible values to describe the state of the 
column.  That seems preferable to adding an additional column for every 
nullable column.


But as you say, that would have to be taken up with the SQL 
standardization bodies, and not PostgreSQL.


--
Guy Rouillier



Re: [PERFORM] dump time increase by 1h with new kernel

2009-10-07 Thread Greg Smith

On Fri, 2 Oct 2009, Justin Pryzby wrote:


When we upgraded from linux-2.6.24 to linux-2.6.27, our pg_dump
duration increased by 20% from 5 hours to 6.


Why 2.6.27 of all versions?  It's one of the versions I skipped altogether 
as looking like a mess; after CFS broke everything in 2.6.23, I went right 
from 2.6.22 to 2.6.28 before I found things usable again.  The first thing 
you're going to hear if you try to report this in kernel land is "is it 
still slow on 2.6.[last stable|head]?"


If you can try both kernel versions, the other thing you really should do 
is collect data from "vmstat 1" during the pg_dump period.  It would help 
narrow down which area is slower.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



[PERFORM] Regarding mulitple rows insert in one shot using ADO .net connected to postgres

2009-10-07 Thread keshav upadhyaya
Hi,

I want to insert multiple rows in one shot to improve my performance.

From C# code I am using ADO.NET to connect to Postgres.
Currently I am pasting code which is not for Postgres, but I am doing
similar things with Postgres in my dev environment.

MySqlConnection mySql = new MySqlConnection();
mySql.CreateConn();
mySql.Command = mySql.Connection.CreateCommand();
mySql.Command.CommandText = "INSERT INTO dbo.table1 (textBox1, textBox2) VALUES (@textBox1, @textBox2)";

mySql.Command.Parameters.Add("@textBox1", SqlDbType.VarChar);
mySql.Command.Parameters["@textBox1"].Value = TextBox1.Text;
mySql.Command.Parameters.Add("@textBox2", SqlDbType.VarChar);
mySql.Command.Parameters["@textBox2"].Value = TextBox2.Text;

mySql.Command.ExecuteNonQuery();

mySql.Command.Dispose();
mySql.Connection.Close();
mySql.CloseConn();


The lines I want to ask about are the CommandText assignment through
ExecuteNonQuery().

Currently I am inserting one row at a time and then executing the query,
so with this approach I need to execute it once per row. For very large
amounts of data this gives poor database performance.

What I want is to insert multiple rows and then execute only once, so
that it will be faster.
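One option, sketched in plain SQL (PostgreSQL has supported multi-row VALUES lists since 8.2; the table and column names here just mirror the example above):

```sql
INSERT INTO table1 (textbox1, textbox2)
VALUES ('row1-a', 'row1-b'),
       ('row2-a', 'row2-b'),
       ('row3-a', 'row3-b');
```

From ADO.NET you could build such a statement with one parameter pair per row; for very large bulk loads, PostgreSQL's COPY is usually faster still.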

Please help me out in this regard.

-- 
Thanks,
Keshav Upadhyaya