Re: [PERFORM] Speed / Server
> -----Original Message-----
> >
> > The problem is, this next year we're anticipating significant growth,
> > where we may be adding more like 20 million rows per month (roughly
> > 15GB of data).
> >
> > A row of data might have:
> >   The system identifier (int)
> >   Date/Time read (timestamp)
> >   Sensor identifier (int)
> >   Data Type (int)
> >   Data Value (double)
>
> One approach that can sometimes help is to use arrays to pack data.
> Arrays may or may not work for the data you are collecting: they work
> best when you always pull the entire array for analysis and not a
> particular element of the array. Arrays work well because they pack
> more data into index fetches and you get to skip the 20-byte tuple
> header. That said, they are an 'optimization trade off'... you are
> making one type of query fast at the expense of others.

I recently used arrays for a 'long and thin' table very like those described here. The tuple header became increasingly significant in our case. There are some details in my post:

http://www.nabble.com/optimizing-for-temporal-data-behind-a-view-td25490818.html

As Merlin points out, one considerable side effect of using arrays is that it reduces the sorts of queries we can perform - i.e. querying data that is in an array becomes costly. So we needed to make sure our user scenarios (requirements) were well understood.

richard

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
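The packing Merlin describes can be sketched roughly like this (table and column names are hypothetical, modeled on the row layout quoted above):

```sql
-- One row per reading: easy to filter and index on any column, but
-- every 8-byte value pays the per-row tuple header overhead the post
-- describes.
CREATE TABLE sensor_reading (
    system_id int,
    read_at   timestamp,
    sensor_id int,
    data_type int,
    value     double precision
);

-- One row per sensor per day: far fewer tuple headers and index
-- entries, but an individual reading can no longer be indexed or
-- filtered directly without unpacking the array.
CREATE TABLE sensor_reading_packed (
    system_id int,
    read_day  date,
    sensor_id int,
    data_type int,
    readings  double precision[]  -- all of that day's values
);
```

Pulling individual elements back out (e.g. via `unnest(readings)`) is where the trade-off bites, which is the query-flexibility cost Richard mentions above.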
Re: [PERFORM] Query plan for NOT IN
On Mon, Oct 5, 2009 at 8:35 PM, Guy Rouillier wrote:
> Grzegorz Jaśkiewicz wrote:
>> well, as a rule of thumb - unless you can't think of a default value
>> for a column - don't use nulls. So using null as a default 'idunno'
>> is bad practice, but everybody's opinion on that differs.
>
> I don't understand this point of view. The concept of null was
> introduced into the SQL vernacular by Codd and Date expressly to
> represent unknown values.

Yes, unknown. So as long as you know the default value of a field, you should set it to that. For instance, if by default your account balance is 0, you should set it to 0, not leave it as null. Another example: if a client doesn't have a description, leave it as a blank '' string instead of null. On the other hand, if you want to denote that the value wasn't set, use null - but use it wisely.

Hence, I personally think that a DEFAULT value (in CREATE TABLE) should be compulsory, and 'DEFAULT NULL' an option that you would have to choose explicitly.

Not to mention the other (valid in this case) argument that you would mostly use IN/EXISTS and/or join keys on fields that are either the PK or at least NOT NULL - hence using JOIN instead of IN/EXISTS most of the time. One of my few personal wishes, ever since I started to use PostgreSQL, is that it could rewrite IN/EXISTS into a JOIN when possible (that is, when the columns are NOT NULL).

--
GJ
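A minimal illustration of why NULLs block that rewrite (tables are hypothetical): a single NULL in the subquery changes the result of NOT IN entirely, while NOT EXISTS behaves like the anti-join one usually wants:

```sql
CREATE TABLE a (id int NOT NULL);
CREATE TABLE b (id int);          -- nullable

INSERT INTO a VALUES (1), (2);
INSERT INTO b VALUES (1), (NULL);

-- Returns no rows: 2 NOT IN (1, NULL) evaluates to UNKNOWN under
-- SQL's three-valued logic, so the planner cannot treat this as a
-- plain anti-join.
SELECT * FROM a WHERE id NOT IN (SELECT id FROM b);

-- Returns the row with id = 2: NOT EXISTS is unaffected by the NULL.
-- If b.id were declared NOT NULL, the two forms would be equivalent
-- and the rewrite Grzegorz wishes for would be safe.
SELECT * FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id);
```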
Re: [PERFORM] Query plan for NOT IN
Grzegorz Jaśkiewicz wrote:
> Guy Rouillier wrote:
>> Grzegorz Jaśkiewicz wrote:
>>> using nulls as default 'idunno' - is a bad practice
>> I don't understand this point of view. The concept of null was
>> introduced into the SQL vernacular by Codd and Date expressly to
>> represent unknown values.
> if by default your account balance is 0, you should set it to 0, not
> leave it as null

If your business rules are that a new account is created with a zero balance and then deposits are made, sure -- insert the account row with a zero balance, *because you know it to be zero*. It's been rare that I've seen anyone err on the side of using NULL in place of a default for such cases. Much more common is using, for example, 'NMI' in the middle name column to denote "No Middle Initial". Such "magic values" can cause no end of trouble.

A failing of the SQL standard is that it uses the same mark (NULL) to show the absence of a value because it is unknown as for the case where it is known that no value exists (not applicable). Codd argued for a distinction there, but it hasn't come to pass, at least in the standard. If anyone could suggest a way to support standard syntax and semantics and add extensions to support this distinction, it might be another advance that would distinguish PostgreSQL from "less evolved" products. :-)

None of that changes the requirement that NOT IN must result in UNKNOWN if any of the values involved are NULL. You can't say that my birthday is not in the set of birthdays of other subscribers to this list without knowing the birthdays of all subscribers. This definition of the operator makes it hard to optimize, but setting unknown birthdays to some date far in the past or future, to avoid using NULL, would just produce bogus results for this query, as well as for queries attempting to calculate aggregates on age, for example.
-Kevin
Re: [PERFORM] Query plan for NOT IN
Kevin Grittner wrote:
> A failing of the SQL standard is that it uses the same mark (NULL) to
> show the absence of a value because it is unknown as for the case
> where it is known that no value exists (not applicable). Codd argued
> for a distinction there, but it hasn't come to pass, at least in the
> standard. If anyone could suggest a way to support standard syntax
> and semantics and add extensions to support this distinction, it
> might be another advance that would distinguish PostgreSQL from
> "less evolved" products. :-)

Theoretically, the distinction already exists. If you don't know a person's middle initial, then set it to null; if you know the person doesn't have one, set it to the empty string.

But from a practical point of view, that wouldn't go very far. Most *people* equate an empty string to mean the same as null. When I wrote my own data access layer years ago, I expressly checked for empty strings on input and changed them to null. I did this because empty strings had a nasty way of creeping into our databases; writing queries to produce predictable results got to be very messy.

--
Guy Rouillier
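The normalization Guy describes doing in his access layer could also be pushed into the database itself, so empty strings never creep in regardless of the client; a sketch (table, column, and function names are hypothetical):

```sql
-- Convert empty strings to NULL at the database boundary, so queries
-- only ever see one representation of "no value".
CREATE OR REPLACE FUNCTION blank_to_null() RETURNS trigger AS $$
BEGIN
    IF NEW.middle_initial = '' THEN
        NEW.middle_initial := NULL;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER person_blank_to_null
    BEFORE INSERT OR UPDATE ON person
    FOR EACH ROW EXECUTE PROCEDURE blank_to_null();
```

Note this deliberately collapses the "known to have none" case into NULL, which is the opposite of the theoretical distinction Guy mentions -- the practical choice he argues for.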
Re: [PERFORM] Query plan for NOT IN
Guy Rouillier wrote:
> Kevin Grittner wrote:
>> A failing of the SQL standard is that it uses the same mark (NULL)
>> to show the absence of a value because it is unknown as for the
>> case where it is known that no value exists (not applicable). Codd
>> argued for a distinction there, but it hasn't come to pass, at
>> least in the standard. If anyone could suggest a way to support
>> standard syntax and semantics and add extensions to support this
>> distinction, it might be another advance that would distinguish
>> PostgreSQL from "less evolved" products. :-)
>
> Theoretically, the distinction already exists. If you don't know a
> person's middle initial, then set it to null; if you know the
> person doesn't have one, set it to the empty string.

Well, it is arguable whether an empty string is the proper way to indicate that a character-string column is not applicable to a given row, but it certainly falls flat for any other type, such as dates or numbers; and I think there's value in having a consistent way to handle this.

> But from a practical point of view, that wouldn't go very far.
> Most *people* equate an empty string to mean the same as null. When
> I wrote my own data access layer years ago, I expressly checked for
> empty strings on input and changed them to null. I did this because
> empty strings had a nasty way of creeping into our databases;
> writing queries to produce predictable results got to be very messy.

Yeah, there's that, too. Which leaves the issue open -- a flexible way to flag the *reason* (or *reasons*) for the absence of a value could be a nice enhancement, if someone could invent a good implementation. Of course, one could always add a column to indicate the reason for a NULL; and perhaps that would be as good as any scheme to attach reason flags to NULL. You'd just have to make sure the reason column was NULL-capable for those rows where there *was* a value, which would make the reason "not applicable".

-Kevin
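The add-a-reason-column idea, including the "not applicable" twist at the end, can be sketched with a CHECK constraint tying the reason to the NULL (all names here are hypothetical):

```sql
CREATE TABLE person (
    id             int PRIMARY KEY,
    middle_initial text,
    -- Why middle_initial is NULL; itself NULL ("not applicable")
    -- whenever a real value is present.
    mi_reason      text CHECK (mi_reason IN ('unknown', 'none')),
    -- Exactly one of the two columns is NULL in every row.
    CHECK ((middle_initial IS NULL) = (mi_reason IS NOT NULL))
);

INSERT INTO person VALUES (1, 'Q', NULL);        -- has an initial
INSERT INTO person VALUES (2, NULL, 'unknown');  -- not yet known
INSERT INTO person VALUES (3, NULL, 'none');     -- known to have none
```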
Re: [PERFORM] Query plan for NOT IN
Kevin Grittner wrote:
> Which leaves the issue open -- a flexible way to flag the *reason*
> (or *reasons*) for the absence of a value could be a nice
> enhancement, if someone could invent a good implementation. Of
> course, one could always add a column to indicate the reason for a
> NULL; and perhaps that would be as good as any scheme to attach
> reason flags to NULL. You'd just have to make sure the reason
> column was NULL-capable for those rows where there *was* a value,
> which would make the reason "not applicable".

I'd argue that this is just a special case of a broader problem of metadata: data about the data. For example, I could have a temperature, 40 degrees, and an error bounds, +/- 0.25 degrees. Nobody would think twice about making these separate columns. I don't see how this is any different from a person's middle initial of NULL, plus a separate column indicating "not known" versus "doesn't have one" if that distinction is important.

There are many examples like this, where a simple value in one column isn't sufficient, so another column contains metadata that qualifies or clarifies the information. NULL is just one such case.

But this should probably be on an SQL discussion board, not PG performance...

Craig
Re: [PERFORM] Query plan for NOT IN
Craig James wrote:
> I'd argue that this is just a special case of a broader problem of
> metadata: data about the data. For example, I could have a
> temperature, 40 degrees, and an error bounds, +/- 0.25 degrees.
> Nobody would think twice about making these separate columns. I
> don't see how this is any different from a person's middle initial
> of NULL, plus a separate column indicating "not known" versus
> "doesn't have one" if that distinction is important.
>
> There are many examples like this, where a simple value in one
> column isn't sufficient, so another column contains metadata that
> qualifies or clarifies the information. NULL is just one such case.

Most DBMSs I'm aware of use a null *byte* attached to a nullable column to indicate whether the column is null or not. Yes/no takes one *bit*; that leaves 255 other possible values to describe the state of the column. That seems preferable to adding an additional column alongside every nullable column. But as you say, that would have to be taken up with the SQL standardization bodies, not PostgreSQL.

--
Guy Rouillier
Re: [PERFORM] dump time increase by 1h with new kernel
On Fri, 2 Oct 2009, Justin Pryzby wrote:
> When we upgraded from linux-2.6.24 to linux-2.6.27, our pg_dump
> duration increased by 20% from 5 hours to 6.

Why 2.6.27 of all versions? It's one of the versions I skipped altogether as looking like a mess; after CFS broke everything in 2.6.23, I went right from 2.6.22 to 2.6.28 before I found things usable again. The first thing you're going to hear if you try to report this in kernel land is "is it still slow on 2.6.[last stable|head]?"

If you can try both kernel versions, the other thing you really should do is collect data from "vmstat 1" during the pg_dump period. It would help narrow down which area is slower.

--
* Greg Smith  gsm...@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
[PERFORM] Regarding mulitple rows insert in one shot using ADO .net connected to postgres
Hi,

I want to insert multiple rows in one shot to improve my performance. From C# code I am using ADO.NET to connect to Postgres. The code I am pasting below is not for Postgres, but in my dev environment I am doing similar things with Postgres:

MySqlConnection mySql = new MySqlConnection();
mySql.CreateConn();
mySql.Command = mySql.Connection.CreateCommand();

// This is the line I wanted to ask about:
mySql.Command.CommandText = "INSERT INTO dbo.table1 (textBox1, textBox2) VALUES (@textBox1, @textBox2)";

mySql.Command.Parameters.Add("@textBox1", SqlDbType.VarChar);
mySql.Command.Parameters["@textBox1"].Value = TextBox1.Text;
mySql.Command.Parameters.Add("@textBox2", SqlDbType.VarChar);
mySql.Command.Parameters["@textBox2"].Value = TextBox2.Text;
mySql.Command.ExecuteNonQuery();

mySql.Command.Dispose();
mySql.Connection.Close();
mySql.CloseConn();

Currently I am inserting one row at a time and then executing the query, so for multiple rows I have to execute it many times; because of this, the database performs poorly on very large data sets. What I want is to insert multiple rows and then execute only once, so that it will be faster.

Please help me out in this regard.

--
Thanks,
Keshav Upadhyaya
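One common approach, independent of the data provider, is to send a single multi-row VALUES statement instead of one INSERT per row. A sketch, reusing the parameter naming pattern from the C# snippet above (the table and parameter names are illustrative):

```sql
-- One round trip to the server inserts several rows; extend the
-- VALUES list (and bind the matching parameters) for larger batches.
INSERT INTO table1 (textbox1, textbox2) VALUES
    (@textBox1_0, @textBox2_0),
    (@textBox1_1, @textBox2_1),
    (@textBox1_2, @textBox2_2);
```

Wrapping many single-row INSERTs in one explicit transaction also helps, since it avoids a commit per row; for bulk loads, PostgreSQL's COPY is usually faster still.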