Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Tom Lane wrote:
> Okay, I spent some time googling this question, and I can't find any
> suggestion that any ARM variant uses non-IEEE-compliant float format.
> What *is* real clear is that depending on ARM model and a run time (!)
> CPU endianness flag, there are three or four different possibilities
> for the endianness of the data, including a PDP-endian-like alternative
> in which the order of the high and low words is at variance with the
> order of bytes within the words. (Pardon me while I go vomit...)

Welcome to the wonderful world of embedded CPUs. These buggers will do ANYTHING, and I do mean anything, to squeeze out a little more performance with a little less power consumption, while keeping the end price tag under $10. The ARM9, for example, can switch on the fly between 32-bit and 16-bit machine language in order to save a few bytes in code size and gain a few MIPS in execution speed.

As an amusing side note, I have heard a claim that the only reason we need endianity at all is that the Europeans didn't understand that Arabic is written from right to left. In Arabic you read "17" as "seven and ten", which means that it is already little endian. Just one request: please don't quote this story without also mentioning that it is wrong, and that 1234 is said, in Arabic, as "one thousand two hundred four and thirty".

Mixed endianity is usually a relic of a 16-bit processor that was enhanced to 32 bits. The parts that were atomic before remain big endian, but the parts that the old CPU had to handle in separate operations are stored low to high.

> So I would concur with a patch that ensures that this is what happens
> on the different ARM variants ... though I'll still be interested
> to see how you make that happen given the rather poor visibility
> into which model and endianness we are running on.

You do it semantically.
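The word-swapped ("PDP-endian-like") layout described above can be undone without knowing anything else about the FPU: the byte order within each 32-bit word is normal, so exchanging the two words restores the plain little-endian layout. A minimal sketch of my own (the helper name is illustrative, not from any patch):

```c
#include <string.h>

/* Illustrative helper, not from any actual patch: old ARM FPA-style
 * "mixed endian" doubles store the high and low 32-bit words swapped
 * relative to plain little endian, while the bytes inside each word
 * are in normal order. Swapping the two words converts between the
 * two layouts (the operation is its own inverse). */
void swap_double_words(unsigned char buf[8])
{
    unsigned char tmp[4];

    memcpy(tmp, buf, 4);          /* save the first word        */
    memcpy(buf, buf + 4, 4);      /* move the second word down  */
    memcpy(buf + 4, tmp, 4);      /* put the first word on top  */
}
```

Applying it twice returns the original bytes, which makes it usable in both the import and export directions.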
Attached is the outline for the code (I can form a patch only after we agree where it should go). I should note a few things:

On IEEE platforms, the code will, of course, translate to/from the same format. This can be verified by the dump at the end. I have tested the code on several numbers, and it does work for normal and for denormalized numbers. I have not tested whether the detection of which form we should generate actually works, so there may be an off-by-one there.

There are a few corner cases that are not yet handled. Two are documented (underflow and rounding on denormalized numbers). There is one undocumented case, overflow. The IEEE -> native code is not yet written, but I think it should be fairly obvious how it will look once the other direction is done.

There is also a function in the code called "calcsize". It's the beginning of a function to calculate the parameters for the current platform, again without knowing the native format. I was thinking of putting it in the "configure" test, except, of course, the platforms we refer to are typically ones for which you cross compile. See below.

Comments welcome.

> PS: Of course this does not resolve the generic issue of what to do
> with platforms that have outright non-IEEE-format floats. But at the
> moment I don't see evidence that we need reach that issue for ARM.

The code above does detect when the float isn't being precisely represented by the IEEE float. We could have another format for those cases, and distinguish between the cases on import by testing its size.

> PPS: I'm sort of wondering if the PDP-endian business doesn't afflict
> int8 too on this platform.

It's likely. I would say that a configure test would be the best way to test it, but I suspect that most programs for ARM are cross compiled. I'm not sure how to resolve that. Maybe if there's a way to automatically test what gets into memory when you let the compiler create the constant 0123456789abcdef.
At least for smaller than 8 bytes, the "hton" functions SHOULD do the right thing always. I COULD go back to my source (he's on vacation until Sunday anyway), but I'll throw in a guess. Since the ARMs (at least the 7 and the 9) are not 64-bit native, it's compiler dependent. There are two main compilers for the ARM, with one of them being gcc. That's, more or less, where my insights into this end.

Shachar

#include <stdio.h>

// What type would we be working on?
#if 1
// Double
#define TYPE double
#define FRAC_BITS 52
#define EXP_BITS 11
#define EXP_BIAS 1023
#else
// Float
#define TYPE float
#define FRAC_BITS 23
#define EXP_BITS 8
#define EXP_BIAS 127
#endif

union fp {
    TYPE flt;
    struct {
        unsigned long low;
        unsigned long high;
    } i;
    unsigned long long l;
    struct {
        unsigned long long int frac:FRAC_BITS;
        unsigned long long int exp:EXP_BITS;
        unsigned long long int sign:1;
    } fp;
};

void dumpnum( TYPE n )
{
    union fp val;

    val.flt=n;
    val.fp.sign=0;
    val.fp.exp=0x7ff;
    val.fp.frac=12;
    printf("%g %08x%08x\n", val.flt, val.i.high, val.i.low );
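For what "doing it semantically" can look like, here is a sketch of my own (not the posted outline, and normal numbers only): extract sign, exponent, and fraction with portable arithmetic via frexp()/ldexp(), then pack them into the IEEE-754 bit layout, so the native byte and word order never enter into it:

```c
#include <math.h>
#include <stdint.h>

/* Sketch (not the actual patch): build the IEEE-754 double bit
 * pattern from a native double using only arithmetic, so it works no
 * matter how the native format lays out its bytes or words. Handles
 * normal numbers and zero only; denormals, infinities, and NaN would
 * need the extra cases the posted outline mentions. */
uint64_t double_to_ieee_bits(double d)
{
    uint64_t sign = 0;
    uint64_t mant, biased;
    double frac;
    int exp2;

    if (d < 0 || (d == 0 && 1.0 / d < 0))
    {
        sign = 1;
        d = -d;
    }
    if (d == 0)
        return sign << 63;              /* +/- zero */

    frac = frexp(d, &exp2);             /* d = frac * 2^exp2, 0.5 <= frac < 1 */
    /* IEEE stores 1.xxx * 2^(exp2-1): materialize 53 mantissa bits,
     * then strip the implied leading bit. */
    mant = (uint64_t) ldexp(frac, 53);
    mant &= (((uint64_t) 1 << 52) - 1);
    biased = (uint64_t) (exp2 - 1 + 1023);

    return (sign << 63) | (biased << 52) | mant;
}
```

On a machine whose native doubles are already IEEE, the result simply matches the in-memory bits modulo byte order, which gives an easy self-check.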
Do we need a TODO? (was Re: [HACKERS] Concurrently updating an updatable view)
Florian G. Pflug wrote:
> Is there consensus what the correct behaviour should be for
> self-referential updates in read-committed mode? Does the SQL Spec
> have anything to say about this?

This seems to have gone all quiet. Do we need a TODO to keep a note of it? Just "correct behaviour for self-referential updates".

Hiroshi originally noted the problem in one of his views here: http://archives.postgresql.org/pgsql-hackers/2007-05/msg00507.php

-- 
Richard Huxton
Archonet Ltd

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [BUGS] Inconsistant SQL results - Suspected error with query planing or query optimisation.
adam terrey <[EMAIL PROTECTED]> writes at http://archives.postgresql.org/pgsql-bugs/2007-05/msg00187.php
> Anyway, my bug: I have a test SELECT statement (Listing A - see "sql
> listings.txt") which produces different results under two similar setups
> (Listing B and Listing C). Each setup should produce the same result for
> the given SELECT statement.

The problem here is that 8.2 is incorrectly concluding that it can rearrange the order of the two LEFT JOIN steps in the query:

SELECT a.id
FROM items a
LEFT JOIN (
    SELECT b.id
    FROM items b
    LEFT JOIN (
        SELECT c.id FROM items c WHERE number = 1
    ) AS moded_items USING (id)
    WHERE moded_items.id IS NULL
) AS sub_items USING (id)
WHERE sub_items.id IS NULL;

The plan it comes up with is:

Nested Loop Left Join  (cost=469.00..1063.39 rows=1 width=4) (actual time=288.962..288.962 rows=0 loops=1)
  Filter: (c.id IS NULL)
  ->  Hash Left Join  (cost=469.00..1063.00 rows=1 width=8) (actual time=288.946..288.946 rows=0 loops=1)
        Hash Cond: (a.id = b.id)
        Filter: (b.id IS NULL)
        ->  Seq Scan on items a  (cost=0.00..344.00 rows=1 width=4) (actual time=0.080..50.973 rows=1 loops=1)
        ->  Hash  (cost=344.00..344.00 rows=1 width=4) (actual time=140.880..140.880 rows=1 loops=1)
              ->  Seq Scan on items b  (cost=0.00..344.00 rows=1 width=4) (actual time=0.046..69.395 rows=1 loops=1)
  ->  Index Scan using items_pkey on items c  (cost=0.00..0.38 rows=1 width=4) (never executed)
        Index Cond: (b.id = c.id)
        Filter: (c.number = 1)

After reducing join_collapse_limit to 1, we get the right join order and the right answers:

Hash Left Join  (cost=750.54..1132.05 rows=1 width=4) (actual time=409.712..409.740 rows=2 loops=1)
  Hash Cond: (a.id = b.id)
  Filter: (b.id IS NULL)
  ->  Seq Scan on items a  (cost=0.00..344.00 rows=1 width=4) (actual time=0.100..51.052 rows=1 loops=1)
  ->  Hash  (cost=750.52..750.52 rows=1 width=4) (actual time=264.978..264.978 rows=9998 loops=1)
        ->  Hash Left Join  (cost=369.01..750.52 rows=1 width=4) (actual time=30.074..192.023 rows=9998 loops=1)
              Hash Cond: (b.id = c.id)
              Filter: (c.id IS NULL)
              ->  Seq Scan on items b  (cost=0.00..344.00 rows=1 width=4) (actual time=0.030..50.913 rows=1 loops=1)
              ->  Hash  (cost=369.00..369.00 rows=1 width=4) (actual time=29.976..29.976 rows=2 loops=1)
                    ->  Seq Scan on items c  (cost=0.00..369.00 rows=1 width=4) (actual time=29.896..29.916 rows=2 loops=1)
                          Filter: (number = 1)

So there is something wrong with the rule used for deciding whether two LEFT JOINs can commute. Per the planner README:

: The planner's treatment of outer join reordering is based on the following
: identities:
:
: 1. (A leftjoin B on (Pab)) innerjoin C on (Pac)
:    = (A innerjoin C on (Pac)) leftjoin B on (Pab)
:
: where Pac is a predicate referencing A and C, etc (in this case, clearly
: Pac cannot reference B, or the transformation is nonsensical).
:
: 2. (A leftjoin B on (Pab)) leftjoin C on (Pac)
:    = (A leftjoin C on (Pac)) leftjoin B on (Pab)
:
: 3. (A leftjoin B on (Pab)) leftjoin C on (Pbc)
:    = A leftjoin (B leftjoin C on (Pbc)) on (Pab)
:
: Identity 3 only holds if predicate Pbc must fail for all-null B rows
: (that is, Pbc is strict for at least one column of B). If Pbc is not
: strict, the first form might produce some rows with nonnull C columns
: where the second form would make those entries null.

What we have here is an invocation of rule 3 in a situation where it's not appropriate. The difficulty is that the code is only paying attention to the syntactic JOIN/ON clauses and has neglected the intermediate-level WHERE clause. After a bit of reflection it seems that a WHERE that is semantically just below a left join's right side can be treated as if it were part of that left join's ON clause. It will have the same effect as if it had been written there: any rows rejected by the WHERE will fail to be joined to the left side and will contribute nothing to the result.
Had we been following this rule, we'd have concluded that "c.id IS NULL" is part of the upper join qual, and therefore that it has a predicate Pabc, not just Pab, and cannot be commuted with the lower join.

Teaching initsplan.c to do things this way seems possible but less than trivial. Before I start worrying about that, does anyone see any flaws in the reasoning at this level of detail?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] like/ilike improvements
Starting from a review of a patch from Itagaki Takahiro to improve LIKE performance for UTF8-encoded databases, I have been working on improving both the efficiency of the LIKE/ILIKE code and the code quality.

The main efficiency improvement comes from some fairly tricky analysis and discussion on -patches. Essentially there are two calls that we make to advance the text and pattern cursors: NextByte and NextChar. In the case of single byte charsets these are in fact the same thing, but in multibyte charsets they are obviously not, and in that case NextChar is a lot more expensive. It turns out (according to the analysis) that the only time we actually need to use NextChar is when we are matching an "_" in a like/ilike pattern. It also turns out that there are some comparison tests that we can hoist out of a loop and thus avoid repeating over and over. Also, some calls can be marked "inline" to improve efficiency.

Finally, the special case of computing lower(x) on the fly for ILIKE comparisons on single byte charset strings turns out to have the potential to call lower() O(n^2) times, so it has been removed and we now treat foo ILIKE bar as lower(foo) LIKE lower(bar) for all charsets uniformly. There will be cases where this approach wins and cases where it loses, but the wins are potentially dramatic, whereas the losses should be mild.

The current state of this work is at http://archives.postgresql.org/pgsql-patches/2007-05/msg00385.php

I've been testing it using a set of 5m rows of random Latin1 data - each row is between 100 and 400 chars long, and 20% of them (roughly) have the string "foo" randomly located within them. The test platform is gcc/fc6/AMD64. I have loaded the data into both Latin1 and UTF8 encoded databases. (I'm not sure if there are other multibyte charsets that are compatible with Latin1 client encoding.)
My test is essentially:

select count(*) from footable where t like '%_foo%';
select count(*) from footable where t ilike '%_foo%';
select count(*) from footable where t like '%foo%';
select count(*) from footable where t ilike '%foo%';

Note that the "%_" case is probably the worst for these changes, since it involves lots of calls to NextChar() (see above). The multibyte results show significant improvement. The results are about flat or a slight improvement for the singlebyte cases. I'll post some numbers on this shortly.

But before I commit this I'd appreciate seeing some more testing, both for correctness and performance.

cheers

andrew
Re: [HACKERS] like/ilike improvements
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> ... It turns out (according to the analysis) that the
> only time we actually need to use NextChar is when we are matching an
> "_" in a like/ilike pattern.

I thought we'd determined that advancing bytewise for "%" was also risky, in two cases:

1. Multibyte character set that is not UTF8 (more specifically, does not have a guarantee that first bytes and not-first bytes are distinct)

2. "_" immediately follows the "%".

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] like/ilike improvements
Tom Lane wrote:
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> ... It turns out (according to the analysis) that the only time we
>> actually need to use NextChar is when we are matching an "_" in a
>> like/ilike pattern.
>
> I thought we'd determined that advancing bytewise for "%" was also
> risky, in two cases:
>
> 1. Multibyte character set that is not UTF8 (more specifically, does not
> have a guarantee that first bytes and not-first bytes are distinct)

I will review - I thought we had ruled that out. Which non-UTF8 multibyte charset would be best to test with?

> 2. "_" immediately follows the "%".

The patch in fact calls NextChar in this case.

cheers

andrew

---(end of broadcast)---
TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] like/ilike improvements
Andrew Dunstan wrote:
> Tom Lane wrote:
>> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>>> ... It turns out (according to the analysis) that the only time we
>>> actually need to use NextChar is when we are matching an "_" in a
>>> like/ilike pattern.
>>
>> I thought we'd determined that advancing bytewise for "%" was also
>> risky, in two cases:
>>
>> 1. Multibyte character set that is not UTF8 (more specifically, does not
>> have a guarantee that first bytes and not-first bytes are distinct)

I thought we disposed of the idea that there was a problem with charsets that didn't do first byte special. And Dennis said:

> Tom Lane skrev:
>> You could imagine trying to do % a byte at a time (and indeed that's
>> what I'd been thinking it did) but that gets you out of sync which
>> breaks the _ case.
>
> It is only when you have a pattern like '%_' when this is a problem and
> we could detect this and do byte by byte when it's not. Now we check
> (*p == '\\') || (*p == '_') in each iteration when we scan over
> characters for '%', and we could do it once and have different loops
> for the two cases.

That's pretty much what the patch does now - it never tries to match a single byte when it sees "_", whether or not preceded by "%".

cheers

andrew

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
On Tue, May 22, 2007 at 05:14:54PM +0300, Shachar Shemesh wrote:
> As an amusing side note, I have heard a claim that the only reason we
> need endianity at all is because the Europeans didn't understand that
> Arabic is written from right to left. In Arabic you read "17" as "seven
> and ten", which means that it is already little endian. Just one
> request, please don't quote this story without also mentioning that this
> story is wrong, and that 1234 is said, in Arabic, as "one thousand two
> hundred four and thirty".

For the record, Dutch works like that too, which leads to a fascinating way of reading phone numbers. 345678 becomes: four and thirty, six and fifty, eight and seventy. Takes a while to get used to that...

Have a nice day,
-- 
Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.
Re: [HACKERS] like/ilike improvements
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> I thought we'd determined that advancing bytewise for "%" was also
>> risky, in two cases:
>>
>> 1. Multibyte character set that is not UTF8 (more specifically, does not
>> have a guarantee that first bytes and not-first bytes are distinct)

> I thought we disposed of the idea that there was a problem with charsets
> that didn't do first byte special.

We disposed of that in connection with a version of the patch that had "%" advancing in NextChar units, so that comparison of ordinary characters was always safely char-aligned. Consider 2-byte characters represented as {AB} etc:

DATA:    x{AB}{CD}y
PATTERN: %{BC}%

If "%" advances by bytes then this will find a spurious match. The only thing that prevents it is if "B" can't be both a leading and a trailing byte of validly-encoded MB characters.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
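The {AB}{CD} example above is easy to reproduce with any byte-oriented substring search: treating the bytes as characters, the misaligned pair "BC" matches even though no character boundary does. A sketch (strstr standing in for a "%" that advances by bytes):

```c
#include <string.h>

/* A byte-oriented search, as a stand-in for "%" advancing by bytes.
 * With two-byte characters {AB} and {CD}, the data "xABCDy" contains
 * the byte sequence "BC" spanning a character boundary, so a bytewise
 * matcher reports a spurious hit that a character-aligned matcher
 * would not. */
int bytewise_contains(const char *data, const char *pattern)
{
    return strstr(data, pattern) != NULL;
}
```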
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Shachar Shemesh <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> I would concur with a patch that ensures that this is what happens
>> on the different ARM variants ... though I'll still be interested
>> to see how you make that happen given the rather poor visibility
>> into which model and endianness we are running on.
>
> You do it semantically. Attached is the outline for the code (I can
> form a patch only after we agree where it should go)

Cross-compile situations make life interesting. [ hold your nose before reading further... ]

After studying how AC_C_BIGENDIAN does it, I propose that the best answer might be to compile a test program that contains carefully-chosen "double" constants, then grep the object file for the expected patterns. This works as long as the compiler knows what format it's supposed to emit (and if it doesn't, lots of other stuff will fall over).

The only alternative that would work reliably is to run the test once when the result is first needed, which is kind of unfortunate because it involves continuing runtime overhead (at least a "switch" on every conversion). We in fact did things that way for integer endianness awhile back, but since we are now depending on AC_C_BIGENDIAN to get it right, I'd feel more comfortable using a similar solution for float endianness.

regards, tom lane
Re: [HACKERS] like/ilike improvements
On 5/22/07, Andrew Dunstan <[EMAIL PROTECTED]> wrote:
> But before I commit this I'd appreciate seeing some more testing, both
> for correctness and performance.

Any chance the patch applies cleanly on an 8.2 code base? I can test it on a real life 8.2 db but I won't have the time to load the data in a CVS HEAD one. If there is no obvious reason for it to fail on 8.2, I'll try to see if I can apply it.

Thanks.

-- 
Guillaume

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] like/ilike improvements
On 2007-05-22, Tom Lane <[EMAIL PROTECTED]> wrote:
> If "%" advances by bytes then this will find a spurious match. The
> only thing that prevents it is if "B" can't be both a leading and a
> trailing byte of validly-encoded MB characters.

Which is (by design) true in UTF8, but is not true of most other multibyte charsets.

The %_ case is also trivially handled in UTF8 by simply ensuring that _ doesn't match a non-initial octet. This allows % to advance by bytes without danger of losing sync.

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
Re: [HACKERS] like/ilike improvements
Andrew - Supernews <[EMAIL PROTECTED]> writes:
> On 2007-05-22, Tom Lane <[EMAIL PROTECTED]> wrote:
>> If "%" advances by bytes then this will find a spurious match. The
>> only thing that prevents it is if "B" can't be both a leading and a
>> trailing byte of validly-encoded MB characters.

> Which is (by design) true in UTF8, but is not true of most other
> multibyte charsets.

> The %_ case is also trivially handled in UTF8 by simply ensuring that
> _ doesn't match a non-initial octet. This allows % to advance by bytes
> without danger of losing sync.

Yeah. It seems we need three comparison functions after all:

1. Single-byte character set: needs NextByte and ByteEq only.

2. Generic multi-byte character set: both % and _ must advance by characters to ensure we never try an out-of-alignment character comparison. But simple character comparison works bytewise given that. So primitives are NextChar, NextByte, ByteEq.

3. UTF8: % can advance bytewise. _ must check it is on a first byte (else return match failure) and if so do NextChar. So primitives are NextChar, NextByte, ByteEq, IsFirstByte.

In no case do we need CharEq. I'd be inclined to drop ByteEq as a macro and just use "==", too.

regards, tom lane
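The IsFirstByte primitive that case 3 relies on is cheap because UTF8 continuation bytes always have the form 10xxxxxx; a one-line sketch:

```c
/* In UTF8 every non-first byte of a multibyte character has its top
 * two bits set to "10", and no first byte does. That is the property
 * case 3 depends on: "_" can refuse to match when the text cursor is
 * sitting in the middle of a character, so "%" is free to advance
 * bytewise without losing sync. */
int utf8_is_first_byte(unsigned char b)
{
    return (b & 0xC0) != 0x80;
}
```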
Re: [HACKERS] like/ilike improvements
On Tue, May 22, 2007 at 12:12:51PM -0400, Tom Lane wrote:
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> ... It turns out (according to the analysis) that the
>> only time we actually need to use NextChar is when we are matching an
>> "_" in a like/ilike pattern.
> I thought we'd determined that advancing bytewise for "%" was also risky,
> in two cases:
> 1. Multibyte character set that is not UTF8 (more specifically, does not
> have a guarantee that first bytes and not-first bytes are distinct)
> 2. "_" immediately follows the "%".

Have you considered a two pass approach? First pass - match on bytes. Only if you find a match with the first pass, start a second pass to do a 'safe' check?

Are there optimizations to recognize whether the index was created as lower(field) or upper(field), and translate ILIKE to the appropriate one?

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED]
Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them...

http://mark.mielke.cc/

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [HACKERS] like/ilike improvements
Tom Lane wrote:
> Yeah. It seems we need three comparison functions after all:

Yeah, that was my confusion. I thought we had concluded that we didn't, but clearly we do.

> 1. Single-byte character set: needs NextByte and ByteEq only.
>
> 2. Generic multi-byte character set: both % and _ must advance by
> characters to ensure we never try an out-of-alignment character
> comparison. But simple character comparison works bytewise given that.
> So primitives are NextChar, NextByte, ByteEq.
>
> 3. UTF8: % can advance bytewise. _ must check it is on a first byte
> (else return match failure) and if so do NextChar. So primitives are
> NextChar, NextByte, ByteEq, IsFirstByte.
>
> In no case do we need CharEq. I'd be inclined to drop ByteEq as a
> macro and just use "==", too.

I'll work this up. I think it will be easier if I marry cases 1 and 2, with NextChar being the same as NextByte in the single byte case.

cheers

andrew
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
On 5/23/07, Martijn van Oosterhout <[EMAIL PROTECTED]> wrote:
>> As an amusing side note, I have heard a claim that the only reason we
>> need endianity at all is because the Europeans didn't understand that
>> Arabic is written from right to left. In Arabic you read "17" as "seven
>> and ten", which means that it is already little endian. Just one
>> request, please don't quote this story without also mentioning that this
>> story is wrong, and that 1234 is said, in Arabic, as "one thousand two
>> hundred four and thirty".
>
> For the record, dutch works like too,

Same for German and Slovene. "Ein tausend zwei hundert vier und dreissig." "Tisoch dvesto shtiri in trideset." (sorry, can't produce the s and c with the hacek trivially here, replaced them with sh and ch respectively ...).

Cheers,
Andrej
Re: [HACKERS] MSVC build failure not exiting with proper error status
Andrew Dunstan wrote:
> mastodon and skylark just failed at the make stage due to a thinko on my
> part (now fixed). However, this is not correctly caught by the buildfarm
> script, meaning that the process invoked at this stage ('build 2>&1') is
> not exiting properly with a non-zero status on error. That needs to be
> fixed.

I was just checking this, and I'm not sure what the problem is. I tried updating to the broken version of solution.pm (the one missing the quotes around --with-pgport), and it works for me, insofar as I get errorlevel 255 set when exiting the process. Both if I run "perl mkvcbuild.pl" and if I run "build" (yes, also for build 2>&1). The error given is "Can't modify constant item in predecrement" and then a compile error.

Am I testing the wrong thing? Could it be that the buildfarm script is somehow not picking up error code 255? (In all cases where it's errors from the vc++ tools, I think it's always errorcode 1 or 2.)

//Magnus
Re: [HACKERS] MSVC build failure not exiting with proper error status
Magnus Hagander wrote:
> Andrew Dunstan wrote:
>> mastodon and skylark just failed at the make stage due to a thinko on
>> my part (now fixed). However, this is not correctly caught by the
>> buildfarm script, meaning that the process invoked at this stage
>> ('build 2>&1') is not exiting properly with a non-zero status on
>> error. That needs to be fixed.
>
> I was just checking this, and I'm not sure what the problem is. I tried
> updating to the broken version of solution.pm (the one missing the
> quotes around --with-pgport), and it works for me. Insofar as I get
> errorlevel 255 set when exiting the process. Both if I run "perl
> mkvcbuild.pl" and if I run "build" (yes, also for build 2>&1). The
> error given is "Can't modify constant item in predecrement" and then a
> compile error.
>
> Am I testing the wrong thing? Could it be that the buildfarm script is
> somehow not picking up error code 255? (in all cases where it's errors
> from the vc++ tools, I think it's always errorcode 1 or 2)

The code executed is:

    chdir "$pgsql/src/tools/msvc";
    @makeout = `build 2>&1`;
    chdir $branch_root;
    my $status = $? >> 8;

The perl docs say this about $?:

    The status returned by the last pipe close, backtick (``) command,
    successful call to wait() or waitpid(), or from the system()
    operator. This is just the 16-bit status word returned by the
    wait() system call (or else is made up to look like it). Thus, the
    exit value of the subprocess is really ("$? >> 8"), and "$? & 127"
    gives which signal, if any, the process died from, and "$? & 128"
    reports whether there was a core dump.

cheers

andrew
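The "$? >> 8" above is Perl's way of extracting the child's exit code from the 16-bit wait status. For comparison, a POSIX C sketch of the same extraction (the function name is mine; note that Windows' system() reports status differently, which may matter for the MSVC buildfarm case):

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Run a command through the shell and return its exit code - the C
 * equivalent of Perl's `cmd` backticks followed by $? >> 8. Returns
 * -1 if the command could not be started at all. POSIX-only sketch. */
int run_and_get_exit_code(const char *cmd)
{
    int status = system(cmd);

    if (status == -1)
        return -1;                  /* fork/exec failure */
    return WEXITSTATUS(status);     /* same as Perl's $? >> 8 */
}
```

An exit code of 255 from mkvcbuild.pl would come back here as 255, so if the script sees 0 instead, the loss is happening somewhere between the child and the wait status.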
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Please note - I'm not trying to pick a fight.

Tom Lane wrote:
> Your source appears fairly ignorant of things-float.

That is possible, and even likely. However:

> If they really are using decimal FP, it's easy to demonstrate that a
> lossless conversion to/from binary representation of similar size is
> impossible. The set of exactly representable values is simply
> different.

When I originally read this statement my initial response was *d'oh*. After having had time to sleep on it, however, I'm no longer as certain as I was.

Before you explode at me (again :), I'm not arguing that you can do binary based calculations of decimal numbers without having rounding errors that come back to bite you. I know you can't. What I'm saying is that we have two cases to consider. In one of them the above is irrelevant, and in the other I'm not so sure it's true.

The first case to consider is that of the client getting a number from the server and doing calculations on it. Since the client works in base 2, the inaccuracies are built into the model no matter what we do and how we export the actual number. As such, I don't think we need worry about it. If the client also works in base 10, see the second case.

The second case is of a number being exported from the server, stored in binary (excuse the pun) format on the client, and then resent back to the server, where it is translated from base 2 to base 10 again. You will notice that no actual calculation is performed on the number while in base 2. The only question is whether the number, when translated to base 2 and then back to base 10, is guaranteed to maintain its original value.

I don't have a definite answer to that, but I did calculate the difference in representation. A 64 bit IEEE floating point number has 1 bit of sign, 52 bits of mantissa and 11 bits of exponent. The number actually has 53 bits of mantissa for non-denormalized numbers, as there is another, implied "1" at the beginning.
I'm going to assume, however, that all binary numbers are denormalized, and only use 52 bits. I'm allowed to assume that for two reasons. The first is that it decreases the accuracy of the base 2 representation, and thus makes my own argument harder to prove. If I can prove it under this assumption, it's obvious that it will still hold true with an extra bit of accuracy. The second reason is that I don't see how we can have "normalized" numbers under the base 10 representation. The implied "1" is there because a base 2 number has to have a leading "1" somewhere, and having it at the start gives the best accuracy. The moment the leading digit can be 1-9, it is no longer possible to assume it. In other words, I don't see how a base 10 representation can assume that bit, and it is thus losing it. Since this assumption may be wrong, I am "penalizing" the base 2 representation as well to compensate.

To recap, then. With base 2 we have 52 bits of mantissa, which gets us 4,503,599,627,370,496 combinations. These have an effective exponent range (not including denormalized numbers) of 2,048 different combinations, which can get us (let's assume no fractions in both bases) as high as 2^2048, or about 616.5 decimal digits.

With decimal representation, each 4 bits are one digit, so the same 52 bits account for 13 digits, giving 10,000,000,000,000 possible mantissas, with an exponent range of 11 bits, but raised to the power of 10, resulting in a range of 2048 decimal digits. Of course, we have no use for such a huge exponent range with such a small mantissa, so we are likely to move bits from the exponent to the mantissa. Since we have no use for fractions of a decimal digit, we will move the bits in multiples of 4. I'm now going to make an extreme assumption: we move 8 bits from the exponent to the mantissa.
This leaves us with only three bits of exponent, which will only cover 8 decimal digits, but it gives us 60 bits, or 15 decimal digits, in the mantissa - a range of 1,000,000,000,000,000 numbers. Please note that the base 2 representation can still represent 4.5 times more mantissas using only 52 bits.

So what have we got so far? A 64 bit decimal based floating point can give up almost all of its exponent in order to create a mantissa that has roughly the same range as the base 2 one, and still be outnumbered by 2.17 bits' worth, ASSUMING WE DON'T USE THE IMPLIED BIT IN THE BASE 2 REPRESENTATION.

Now, I suggest that even with "just" 2.17 bits extra, the binary representation is accurate enough to hold the approximation of the decimal number to such precision that the back and forth translation will reliably produce the original number. Of course, if we do use the extra bit, it's 3.17 bits extra. If we give up not 8 but only 4 bits from the exponent, we have 6.49 bits extra (5.49 under the above assumption), while having an exponent range of only 128 decimal digits (as opposed to 616 with IEEE). Now,
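The round-trip argument can be probed empirically: C guarantees that DBL_DIG (15 for IEEE binary64) significant decimal digits survive a decimal -> double -> decimal conversion, which matches the 15-digit mantissa budget arrived at above. A sketch of my own, under the assumption that the server renders its decimal values with at most 15 significant digits:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <float.h>

/* Parse a decimal string into a double, then print it back with
 * DBL_DIG significant digits. If the text matches, the detour through
 * binary preserved the decimal value exactly - the "export, store,
 * re-import" scenario described above. */
int decimal_round_trips(const char *dec)
{
    char back[64];
    double d = strtod(dec, NULL);

    snprintf(back, sizeof back, "%.*g", DBL_DIG, d);
    return strcmp(dec, back) == 0;
}
```

This only demonstrates the guarantee for strings printf would produce in the same form; exponent notation and trailing zeros would need normalization before comparing.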