Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Tom Lane wrote:
> Okay, I spent some time googling this question, and I can't find any
> suggestion that any ARM variant uses non-IEEE-compliant float format.
> What *is* real clear is that depending on ARM model and a run time (!)
> CPU endianness flag, there are three or four different possibilities
> for the endianness of the data, including a PDP-endian-like alternative
> in which the order of the high and low words is at variance with the
> order of bytes within the words. (Pardon me while I go vomit...)

Welcome to the wonderful world of embedded CPUs. These buggers will do ANYTHING, and I do mean anything, to squeeze out a little more performance with a little less power consumption, while keeping the end price tag under $10. The ARM9, for example, can switch on the fly between 32-bit and 16-bit machine language in order to save a few bytes in code size and gain a few MIPS in execution speed.

As an amusing side note, I have heard a claim that the only reason we need endianity at all is that the Europeans didn't understand that Arabic is written from right to left. In Arabic you read "17" as "seven and ten", which means that it is already little endian. Just one request: please don't quote this story without also mentioning that it is wrong, and that 1234 is said, in Arabic, as "one thousand two hundred four and thirty".

Mixed endianity is usually a relic of a 16-bit processor that was enhanced to 32 bits. The parts that were atomic before remain big endian, but the parts that the old CPU had to handle in separate operations are stored low to high.

> So I would concur with a patch that ensures that this is what happens
> on the different ARM variants ... though I'll still be interested
> to see how you make that happen given the rather poor visibility
> into which model and endianness we are running on.

You do it semantically.
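The word-swapped ("PDP-endian-like") layout described above can be undone without knowing anything else about the FPU: the byte order within each 32-bit word is normal, so exchanging the two words restores the plain little-endian layout. A minimal sketch of my own (the helper name is illustrative, not from any patch):

```c
#include <string.h>

/* Illustrative helper, not from any actual patch: old ARM FPA-style
 * "mixed endian" doubles store the high and low 32-bit words swapped
 * relative to plain little endian, while the bytes inside each word
 * are in normal order. Swapping the two words converts between the
 * two layouts (the operation is its own inverse). */
void swap_double_words(unsigned char buf[8])
{
    unsigned char tmp[4];

    memcpy(tmp, buf, 4);          /* save the first word        */
    memcpy(buf, buf + 4, 4);      /* move the second word down  */
    memcpy(buf + 4, tmp, 4);      /* put the first word on top  */
}
```

Applying it twice returns the original bytes, which makes it usable in both the import and export directions.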
Attached is the outline for the code (I can form a patch only after we agree where it should go). I should note a few things:

On IEEE platforms, the code will, of course, translate to/from the same format. This can be verified by the dump at the end. I have tested the code on several numbers, and it does work for normal and for denormalized numbers. I have not tested whether the detection of which form we should generate actually works, so there may be an off-by-one there.

There are a few corner cases that are not yet handled. Two are documented (underflow and rounding on denormalized numbers). There is one undocumented case, overflow. The IEEE -> native code is not yet written, but I think it should be fairly obvious how it will look once the other direction is done.

There is also a function in the code called "calcsize". It's the beginning of a function to calculate the parameters for the current platform, again without knowing the native format. I was thinking of putting it in the "configure" test, except, of course, the platforms we refer to are typically ones for which you cross compile. See below.

Comments welcome.

> PS: Of course this does not resolve the generic issue of what to do
> with platforms that have outright non-IEEE-format floats. But at the
> moment I don't see evidence that we need reach that issue for ARM.

The code above does detect when the float isn't being precisely represented by the IEEE float. We could have another format for those cases, and distinguish between the cases on import by testing its size.

> PPS: I'm sort of wondering if the PDP-endian business doesn't afflict
> int8 too on this platform.

It's likely. I would say that a configure test would be the best way to test it, but I suspect that most programs for ARM are cross compiled. I'm not sure how to resolve that. Maybe if there's a way to automatically test what gets into memory when you let the compiler create the constant 0123456789abcdef.
At least for smaller than 8 bytes, the "hton" functions SHOULD do the right thing always. I COULD go back to my source (he's on vacation until Sunday anyway), but I'll throw in a guess. Since the ARMs (at least the 7 and the 9) are not 64-bit native, it's compiler dependent. There are two main compilers for the ARM, with one of them being gcc. That's, more or less, where my insights into this end.

Shachar

#include <stdio.h>

// What type would we be working on?
#if 1
// Double
#define TYPE double
#define FRAC_BITS 52
#define EXP_BITS 11
#define EXP_BIAS 1023
#else
// Float
#define TYPE float
#define FRAC_BITS 23
#define EXP_BITS 8
#define EXP_BIAS 127
#endif

union fp {
    TYPE flt;
    struct {
        unsigned long low;
        unsigned long high;
    } i;
    unsigned long long l;
    struct {
        unsigned long long int frac:FRAC_BITS;
        unsigned long long int exp:EXP_BITS;
        unsigned long long int sign:1;
    } fp;
};

void dumpnum( TYPE n )
{
    union fp val;

    val.flt=n;
    val.fp.sign=0;
    val.fp.exp=0x7ff;
    val.fp.frac=12;
    printf("%g %08x%08x\n", val.flt, val.i.high, val.i.low );
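For what "doing it semantically" can look like, here is a sketch of my own (not the posted outline, and normal numbers only): extract sign, exponent, and fraction with portable arithmetic via frexp()/ldexp(), then pack them into the IEEE-754 bit layout, so the native byte and word order never enter into it:

```c
#include <math.h>
#include <stdint.h>

/* Sketch (not the actual patch): build the IEEE-754 double bit
 * pattern from a native double using only arithmetic, so it works no
 * matter how the native format lays out its bytes or words. Handles
 * normal numbers and zero only; denormals, infinities, and NaN would
 * need the extra cases the posted outline mentions. */
uint64_t double_to_ieee_bits(double d)
{
    uint64_t sign = 0;
    uint64_t mant, biased;
    double frac;
    int exp2;

    if (d < 0 || (d == 0 && 1.0 / d < 0))
    {
        sign = 1;
        d = -d;
    }
    if (d == 0)
        return sign << 63;              /* +/- zero */

    frac = frexp(d, &exp2);             /* d = frac * 2^exp2, 0.5 <= frac < 1 */
    /* IEEE stores 1.xxx * 2^(exp2-1): materialize 53 mantissa bits,
     * then strip the implied leading bit. */
    mant = (uint64_t) ldexp(frac, 53);
    mant &= (((uint64_t) 1 << 52) - 1);
    biased = (uint64_t) (exp2 - 1 + 1023);

    return (sign << 63) | (biased << 52) | mant;
}
```

On a machine whose native doubles are already IEEE, the result simply matches the in-memory bits modulo byte order, which gives an easy self-check.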
Do we need a TODO? (was Re: [HACKERS] Concurrently updating an updatable view)
Florian G. Pflug wrote:
> Is there consensus what the correct behaviour should be for
> self-referential updates in read-committed mode? Does the SQL Spec
> have anything to say about this?

This seems to have gone all quiet. Do we need a TODO to keep a note of it? Just "correct behaviour for self-referential updates".

Hiroshi originally noted the problem in one of his views here: http://archives.postgresql.org/pgsql-hackers/2007-05/msg00507.php

-- 
Richard Huxton
Archonet Ltd

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [BUGS] Inconsistant SQL results - Suspected error with query planing or query optimisation.
adam terrey <[EMAIL PROTECTED]> writes at http://archives.postgresql.org/pgsql-bugs/2007-05/msg00187.php
> Anyway, my bug: I have a test SELECT statement (Listing A - see "sql
> listings.txt") which produces different results under two similar setups
> (Listing B and Listing C). Each setup should produce the same result for
> the given SELECT statement.

The problem here is that 8.2 is incorrectly concluding that it can rearrange the order of the two LEFT JOIN steps in the query:

SELECT a.id
FROM items a
LEFT JOIN (
    SELECT b.id
    FROM items b
    LEFT JOIN (
        SELECT c.id FROM items c WHERE number = 1
    ) AS moded_items USING (id)
    WHERE moded_items.id IS NULL
) AS sub_items USING (id)
WHERE sub_items.id IS NULL;

The plan it comes up with is:

Nested Loop Left Join  (cost=469.00..1063.39 rows=1 width=4) (actual time=288.962..288.962 rows=0 loops=1)
  Filter: (c.id IS NULL)
  ->  Hash Left Join  (cost=469.00..1063.00 rows=1 width=8) (actual time=288.946..288.946 rows=0 loops=1)
        Hash Cond: (a.id = b.id)
        Filter: (b.id IS NULL)
        ->  Seq Scan on items a  (cost=0.00..344.00 rows=1 width=4) (actual time=0.080..50.973 rows=1 loops=1)
        ->  Hash  (cost=344.00..344.00 rows=1 width=4) (actual time=140.880..140.880 rows=1 loops=1)
              ->  Seq Scan on items b  (cost=0.00..344.00 rows=1 width=4) (actual time=0.046..69.395 rows=1 loops=1)
  ->  Index Scan using items_pkey on items c  (cost=0.00..0.38 rows=1 width=4) (never executed)
        Index Cond: (b.id = c.id)
        Filter: (c.number = 1)

After reducing join_collapse_limit to 1, we get the right join order and the right answers:

Hash Left Join  (cost=750.54..1132.05 rows=1 width=4) (actual time=409.712..409.740 rows=2 loops=1)
  Hash Cond: (a.id = b.id)
  Filter: (b.id IS NULL)
  ->  Seq Scan on items a  (cost=0.00..344.00 rows=1 width=4) (actual time=0.100..51.052 rows=1 loops=1)
  ->  Hash  (cost=750.52..750.52 rows=1 width=4) (actual time=264.978..264.978 rows=9998 loops=1)
        ->  Hash Left Join  (cost=369.01..750.52 rows=1 width=4) (actual time=30.074..192.023 rows=9998 loops=1)
              Hash Cond: (b.id = c.id)
              Filter: (c.id IS NULL)
              ->  Seq Scan on items b  (cost=0.00..344.00 rows=1 width=4) (actual time=0.030..50.913 rows=1 loops=1)
              ->  Hash  (cost=369.00..369.00 rows=1 width=4) (actual time=29.976..29.976 rows=2 loops=1)
                    ->  Seq Scan on items c  (cost=0.00..369.00 rows=1 width=4) (actual time=29.896..29.916 rows=2 loops=1)
                          Filter: (number = 1)

So there is something wrong with the rule used for deciding whether two LEFT JOINs can commute. Per the planner README:

: The planner's treatment of outer join reordering is based on the following
: identities:
:
: 1. (A leftjoin B on (Pab)) innerjoin C on (Pac)
:    = (A innerjoin C on (Pac)) leftjoin B on (Pab)
:
: where Pac is a predicate referencing A and C, etc (in this case, clearly
: Pac cannot reference B, or the transformation is nonsensical).
:
: 2. (A leftjoin B on (Pab)) leftjoin C on (Pac)
:    = (A leftjoin C on (Pac)) leftjoin B on (Pab)
:
: 3. (A leftjoin B on (Pab)) leftjoin C on (Pbc)
:    = A leftjoin (B leftjoin C on (Pbc)) on (Pab)
:
: Identity 3 only holds if predicate Pbc must fail for all-null B rows
: (that is, Pbc is strict for at least one column of B). If Pbc is not
: strict, the first form might produce some rows with nonnull C columns
: where the second form would make those entries null.

What we have here is an invocation of rule 3 in a situation where it's not appropriate. The difficulty is that the code is only paying attention to the syntactic JOIN/ON clauses and has neglected the intermediate-level WHERE clause. After a bit of reflection it seems that a WHERE that is semantically just below a left join's right side can be treated as if it were part of that left join's ON clause. It will have the same effect as if it had been written there: any rows rejected by the WHERE will fail to be joined to the left side and will contribute nothing to the result.
Had we been following this rule, we'd have concluded that "c.id IS NULL" is part of the upper join qual, and therefore that it has a predicate Pabc, not just Pab, and cannot be commuted with the lower join.

Teaching initsplan.c to do things this way seems possible but less than trivial. Before I start worrying about that, does anyone see any flaws in the reasoning at this level of detail?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] like/ilike improvements
Starting from a review of a patch from Itagaki Takahiro to improve LIKE performance for UTF8-encoded databases, I have been working on improving both the efficiency of the LIKE/ILIKE code and the code quality.

The main efficiency improvement comes from some fairly tricky analysis and discussion on -patches. Essentially there are two calls that we make to advance the text and pattern cursors: NextByte and NextChar. In the case of single byte charsets these are in fact the same thing, but in multibyte charsets they are obviously not, and in that case NextChar is a lot more expensive. It turns out (according to the analysis) that the only time we actually need to use NextChar is when we are matching an "_" in a like/ilike pattern. It also turns out that there are some comparison tests that we can hoist out of a loop and thus avoid repeating over and over. Also, some calls can be marked "inline" to improve efficiency.

Finally, the special case of computing lower(x) on the fly for ILIKE comparisons on single byte charset strings turns out to have the potential to call lower() O(n^2) times, so it has been removed and we now treat foo ILIKE bar as lower(foo) LIKE lower(bar) for all charsets uniformly. There will be cases where this approach wins and cases where it loses, but the wins are potentially dramatic, whereas the losses should be mild.

The current state of this work is at http://archives.postgresql.org/pgsql-patches/2007-05/msg00385.php

I've been testing it using a set of 5m rows of random Latin1 data - each row is between 100 and 400 chars long, and 20% of them (roughly) have the string "foo" randomly located within them. The test platform is gcc/fc6/AMD64. I have loaded the data into both Latin1 and UTF8 encoded databases. (I'm not sure if there are other multibyte charsets that are compatible with Latin1 client encoding.)
My test is essentially:

select count(*) from footable where t like '%_foo%';
select count(*) from footable where t ilike '%_foo%';
select count(*) from footable where t like '%foo%';
select count(*) from footable where t ilike '%foo%';

Note that the "%_" case is probably the worst for these changes, since it involves lots of calls to NextChar() (see above). The multibyte results show significant improvement. The results are about flat or a slight improvement for the singlebyte cases. I'll post some numbers on this shortly.

But before I commit this I'd appreciate seeing some more testing, both for correctness and performance.

cheers

andrew
Re: [HACKERS] like/ilike improvements
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> ... It turns out (according to the analysis) that the
> only time we actually need to use NextChar is when we are matching an
> "_" in a like/ilike pattern.

I thought we'd determined that advancing bytewise for "%" was also risky, in two cases:

1. Multibyte character set that is not UTF8 (more specifically, does not have a guarantee that first bytes and not-first bytes are distinct)

2. "_" immediately follows the "%".

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] like/ilike improvements
Tom Lane wrote:
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> ... It turns out (according to the analysis) that the only time we
>> actually need to use NextChar is when we are matching an "_" in a
>> like/ilike pattern.
>
> I thought we'd determined that advancing bytewise for "%" was also
> risky, in two cases:
>
> 1. Multibyte character set that is not UTF8 (more specifically, does not
> have a guarantee that first bytes and not-first bytes are distinct)

I will review - I thought we had ruled that out. Which non-UTF8 multibyte charset would be best to test with?

> 2. "_" immediately follows the "%".

The patch in fact calls NextChar in this case.

cheers

andrew

---(end of broadcast)---
TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] like/ilike improvements
Andrew Dunstan wrote:
> Tom Lane wrote:
>> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>>> ... It turns out (according to the analysis) that the only time we
>>> actually need to use NextChar is when we are matching an "_" in a
>>> like/ilike pattern.
>>
>> I thought we'd determined that advancing bytewise for "%" was also
>> risky, in two cases:
>>
>> 1. Multibyte character set that is not UTF8 (more specifically, does not
>> have a guarantee that first bytes and not-first bytes are distinct)

I thought we disposed of the idea that there was a problem with charsets that didn't do first byte special. And Dennis said:

> Tom Lane skrev:
>> You could imagine trying to do % a byte at a time (and indeed that's
>> what I'd been thinking it did) but that gets you out of sync which
>> breaks the _ case.
>
> It is only when you have a pattern like '%_' when this is a problem and
> we could detect this and do byte by byte when it's not. Now we check
> (*p == '\\') || (*p == '_') in each iteration when we scan over
> characters for '%', and we could do it once and have different loops
> for the two cases.

That's pretty much what the patch does now - it never tries to match a single byte when it sees "_", whether or not preceded by "%".

cheers

andrew

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
On Tue, May 22, 2007 at 05:14:54PM +0300, Shachar Shemesh wrote:
> As an amusing side note, I have heard a claim that the only reason we
> need endianity at all is because the Europeans didn't understand that
> Arabic is written from right to left. In Arabic you read "17" as "seven
> and ten", which means that it is already little endian. Just one
> request, please don't quote this story without also mentioning that this
> story is wrong, and that 1234 is said, in Arabic, as "one thousand two
> hundred four and thirty".

For the record, Dutch works like that too, which leads to a fascinating way of reading phone numbers. 345678 becomes: four and thirty, six and fifty, eight and seventy. Takes a while to get used to that...

Have a nice day,
-- 
Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.
Re: [HACKERS] like/ilike improvements
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> I thought we'd determined that advancing bytewise for "%" was also
>> risky, in two cases:
>>
>> 1. Multibyte character set that is not UTF8 (more specifically, does not
>> have a guarantee that first bytes and not-first bytes are distinct)

> I thought we disposed of the idea that there was a problem with charsets
> that didn't do first byte special.

We disposed of that in connection with a version of the patch that had "%" advancing in NextChar units, so that comparison of ordinary characters was always safely char-aligned. Consider 2-byte characters represented as {AB} etc:

DATA:    x{AB}{CD}y
PATTERN: %{BC}%

If "%" advances by bytes then this will find a spurious match. The only thing that prevents it is if "B" can't be both a leading and a trailing byte of validly-encoded MB characters.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
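The {AB}{CD} example above is easy to reproduce with any byte-oriented substring search: treating the bytes as characters, the misaligned pair "BC" matches even though no character boundary does. A sketch (strstr standing in for a "%" that advances by bytes):

```c
#include <string.h>

/* A byte-oriented search, as a stand-in for "%" advancing by bytes.
 * With two-byte characters {AB} and {CD}, the data "xABCDy" contains
 * the byte sequence "BC" spanning a character boundary, so a bytewise
 * matcher reports a spurious hit that a character-aligned matcher
 * would not. */
int bytewise_contains(const char *data, const char *pattern)
{
    return strstr(data, pattern) != NULL;
}
```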
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Shachar Shemesh <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> I would concur with a patch that ensures that this is what happens
>> on the different ARM variants ... though I'll still be interested
>> to see how you make that happen given the rather poor visibility
>> into which model and endianness we are running on.
>
> You do it semantically. Attached is the outline for the code (I can
> form a patch only after we agree where it should go)

Cross-compile situations make life interesting. [ hold your nose before reading further... ]

After studying how AC_C_BIGENDIAN does it, I propose that the best answer might be to compile a test program that contains carefully-chosen "double" constants, then grep the object file for the expected patterns. This works as long as the compiler knows what format it's supposed to emit (and if it doesn't, lots of other stuff will fall over).

The only alternative that would work reliably is to run the test once when the result is first needed, which is kind of unfortunate because it involves continuing runtime overhead (at least a "switch" on every conversion). We in fact did things that way for integer endianness awhile back, but since we are now depending on AC_C_BIGENDIAN to get it right, I'd feel more comfortable using a similar solution for float endianness.

regards, tom lane
Re: [HACKERS] like/ilike improvements
On 5/22/07, Andrew Dunstan <[EMAIL PROTECTED]> wrote:
> But before I commit this I'd appreciate seeing some more testing, both
> for correctness and performance.

Any chance the patch applies cleanly on an 8.2 code base? I can test it on a real life 8.2 db but I won't have the time to load the data in a CVS HEAD one. If there is no obvious reason for it to fail on 8.2, I'll try to see if I can apply it.

Thanks.

-- 
Guillaume

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] like/ilike improvements
On 2007-05-22, Tom Lane <[EMAIL PROTECTED]> wrote:
> If "%" advances by bytes then this will find a spurious match. The
> only thing that prevents it is if "B" can't be both a leading and a
> trailing byte of validly-encoded MB characters.

Which is (by design) true in UTF8, but is not true of most other multibyte charsets.

The %_ case is also trivially handled in UTF8 by simply ensuring that _ doesn't match a non-initial octet. This allows % to advance by bytes without danger of losing sync.

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
Re: [HACKERS] like/ilike improvements
Andrew - Supernews <[EMAIL PROTECTED]> writes:
> On 2007-05-22, Tom Lane <[EMAIL PROTECTED]> wrote:
>> If "%" advances by bytes then this will find a spurious match. The
>> only thing that prevents it is if "B" can't be both a leading and a
>> trailing byte of validly-encoded MB characters.

> Which is (by design) true in UTF8, but is not true of most other
> multibyte charsets.

> The %_ case is also trivially handled in UTF8 by simply ensuring that
> _ doesn't match a non-initial octet. This allows % to advance by bytes
> without danger of losing sync.

Yeah. It seems we need three comparison functions after all:

1. Single-byte character set: needs NextByte and ByteEq only.

2. Generic multi-byte character set: both % and _ must advance by characters to ensure we never try an out-of-alignment character comparison. But simple character comparison works bytewise given that. So primitives are NextChar, NextByte, ByteEq.

3. UTF8: % can advance bytewise. _ must check it is on a first byte (else return match failure) and if so do NextChar. So primitives are NextChar, NextByte, ByteEq, IsFirstByte.

In no case do we need CharEq. I'd be inclined to drop ByteEq as a macro and just use "==", too.

regards, tom lane
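The IsFirstByte primitive that case 3 relies on is cheap because UTF8 continuation bytes always have the form 10xxxxxx; a one-line sketch:

```c
/* In UTF8 every non-first byte of a multibyte character has its top
 * two bits set to "10", and no first byte does. That is the property
 * case 3 depends on: "_" can refuse to match when the text cursor is
 * sitting in the middle of a character, so "%" is free to advance
 * bytewise without losing sync. */
int utf8_is_first_byte(unsigned char b)
{
    return (b & 0xC0) != 0x80;
}
```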
Re: [HACKERS] like/ilike improvements
On Tue, May 22, 2007 at 12:12:51PM -0400, Tom Lane wrote:
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> ... It turns out (according to the analysis) that the
>> only time we actually need to use NextChar is when we are matching an
>> "_" in a like/ilike pattern.
> I thought we'd determined that advancing bytewise for "%" was also risky,
> in two cases:
> 1. Multibyte character set that is not UTF8 (more specifically, does not
> have a guarantee that first bytes and not-first bytes are distinct)
> 2. "_" immediately follows the "%".

Have you considered a two pass approach? First pass - match on bytes. Only if you find a match with the first pass, start a second pass to do a 'safe' check?

Are there optimizations to recognize whether the index was created as lower(field) or upper(field), and translate ILIKE to the appropriate one?

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED]
Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them...

http://mark.mielke.cc/

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [HACKERS] like/ilike improvements
Tom Lane wrote:
> Yeah. It seems we need three comparison functions after all:

Yeah, that was my confusion. I thought we had concluded that we didn't, but clearly we do.

> 1. Single-byte character set: needs NextByte and ByteEq only.
>
> 2. Generic multi-byte character set: both % and _ must advance by
> characters to ensure we never try an out-of-alignment character
> comparison. But simple character comparison works bytewise given that.
> So primitives are NextChar, NextByte, ByteEq.
>
> 3. UTF8: % can advance bytewise. _ must check it is on a first byte
> (else return match failure) and if so do NextChar. So primitives are
> NextChar, NextByte, ByteEq, IsFirstByte.
>
> In no case do we need CharEq. I'd be inclined to drop ByteEq as a
> macro and just use "==", too.

I'll work this up. I think it will be easier if I marry cases 1 and 2, with NextChar being the same as NextByte in the single byte case.

cheers

andrew
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
On 5/23/07, Martijn van Oosterhout <[EMAIL PROTECTED]> wrote:
>> As an amusing side note, I have heard a claim that the only reason we
>> need endianity at all is because the Europeans didn't understand that
>> Arabic is written from right to left. In Arabic you read "17" as "seven
>> and ten", which means that it is already little endian. Just one
>> request, please don't quote this story without also mentioning that this
>> story is wrong, and that 1234 is said, in Arabic, as "one thousand two
>> hundred four and thirty".
>
> For the record, dutch works like too,

Same for German and Slovene. "Ein tausend zwei hundert vier und dreissig." "Tisoch dvesto shtiri in trideset." (sorry, can't produce the s and c with the hacek trivially here, replaced them with sh and ch respectively ...).

Cheers,
Andrej
Re: [HACKERS] MSVC build failure not exiting with proper error status
Andrew Dunstan wrote:
> mastodon and skylark just failed at the make stage due to a thinko on my
> part (now fixed). However, this is not correctly caught by the buildfarm
> script, meaning that the process invoked at this stage ('build 2>&1') is
> not exiting properly with a non-zero status on error. That needs to be
> fixed.

I was just checking this, and I'm not sure what the problem is. I tried updating to the broken version of solution.pm (the one missing the quotes around --with-pgport), and it works for me, insofar as I get errorlevel 255 set when exiting the process. Both if I run "perl mkvcbuild.pl" and if I run "build" (yes, also for build 2>&1). The error given is "Can't modify constant item in predecrement" and then a compile error.

Am I testing the wrong thing? Could it be that the buildfarm script is somehow not picking up error code 255? (In all cases where it's errors from the vc++ tools, I think it's always errorcode 1 or 2.)

//Magnus
Re: [HACKERS] MSVC build failure not exiting with proper error status
Magnus Hagander wrote:
> Andrew Dunstan wrote:
>> mastodon and skylark just failed at the make stage due to a thinko on
>> my part (now fixed). However, this is not correctly caught by the
>> buildfarm script, meaning that the process invoked at this stage
>> ('build 2>&1') is not exiting properly with a non-zero status on
>> error. That needs to be fixed.
>
> I was just checking this, and I'm not sure what the problem is. I tried
> updating to the broken version of solution.pm (the one missing the
> quotes around --with-pgport), and it works for me. Insofar as I get
> errorlevel 255 set when exiting the process. Both if I run "perl
> mkvcbuild.pl" and if I run "build" (yes, also for build 2>&1). The
> error given is "Can't modify constant item in predecrement" and then a
> compile error.
>
> Am I testing the wrong thing? Could it be that the buildfarm script is
> somehow not picking up error code 255? (in all cases where it's errors
> from the vc++ tools, I think it's always errorcode 1 or 2)

The code executed is:

    chdir "$pgsql/src/tools/msvc";
    @makeout = `build 2>&1`;
    chdir $branch_root;
    my $status = $? >> 8;

The perl docs say this about $?:

    The status returned by the last pipe close, backtick (``) command,
    successful call to wait() or waitpid(), or from the system()
    operator. This is just the 16-bit status word returned by the
    wait() system call (or else is made up to look like it). Thus, the
    exit value of the subprocess is really ("$? >> 8"), and "$? & 127"
    gives which signal, if any, the process died from, and "$? & 128"
    reports whether there was a core dump.

cheers

andrew
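The "$? >> 8" above is Perl's way of extracting the child's exit code from the 16-bit wait status. For comparison, a POSIX C sketch of the same extraction (the function name is mine; note that Windows' system() reports status differently, which may matter for the MSVC buildfarm case):

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Run a command through the shell and return its exit code - the C
 * equivalent of Perl's `cmd` backticks followed by $? >> 8. Returns
 * -1 if the command could not be started at all. POSIX-only sketch. */
int run_and_get_exit_code(const char *cmd)
{
    int status = system(cmd);

    if (status == -1)
        return -1;                  /* fork/exec failure */
    return WEXITSTATUS(status);     /* same as Perl's $? >> 8 */
}
```

An exit code of 255 from mkvcbuild.pl would come back here as 255, so if the script sees 0 instead, the loss is happening somewhere between the child and the wait status.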
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Please note - I'm not trying to pick a fight.

Tom Lane wrote:
> Your source appears fairly ignorant of things-float.

That is possible, and even likely. However:

> If they really are using decimal FP, it's easy to demonstrate that a
> lossless conversion to/from binary representation of similar size is
> impossible. The set of exactly representable values is simply
> different.

When I originally read this statement my initial response was *d'oh*. After having had time to sleep on it, however, I'm no longer as certain as I was.

Before you explode at me (again :), I'm not arguing that you can do binary based calculations of decimal numbers without having rounding errors that come back to bite you. I know you can't. What I'm saying is that we have two cases to consider. In one of them the above is irrelevant, and in the other I'm not so sure it's true.

The first case to consider is that of the client getting a number from the server and doing calculations on it. Since the client works in base 2, the inaccuracies are built into the model no matter what we do and how we export the actual number. As such, I don't think we need worry about it. If the client also works in base 10, see the second case.

The second case is of a number being exported from the server, stored in binary (excuse the pun) format on the client, and then resent back to the server, where it is translated from base 2 to base 10 again. You will notice that no actual calculation is performed on the number while in base 2. The only question is whether the number, when translated to base 2 and then back to base 10, is guaranteed to maintain its original value.

I don't have a definite answer to that, but I did calculate the difference in representation. A 64 bit IEEE floating point number has 1 bit of sign, 52 bits of mantissa and 11 bits of exponent. The number actually has 53 bits of mantissa for non-denormalized numbers, as there is another, implied "1" at the beginning.
I'm going to assume, however, that all binary numbers are denormalized, and only use 52 bits. I'm allowed to assume that for two reasons. The first is that it decreases the accuracy of the base 2 representation, and thus makes my own argument harder to prove. If I can prove it under this assumption, it's obvious that it will still hold true with an extra bit of accuracy. The second reason is that I don't see how we can have "normalized" numbers under the base 10 representation. The implied "1" is there because a base 2 number has to have a leading "1" somewhere, and having it at the start gives the best accuracy. The moment the leading digit can be 1-9, it is no longer possible to assume it. In other words, I don't see how a base 10 representation can assume that bit, and it is thus losing it. Since this assumption may be wrong, I am "penalizing" the base 2 representation as well to compensate.

To recap, then. With base 2 we have 52 bits of mantissa, which gets us 4,503,599,627,370,496 combinations. These have an effective exponent range (not including denormalized numbers) of 2,048 different combinations, which can get us (let's assume no fractions in both bases) as high as 2^2048, or about 616.5 decimal digits.

With decimal representation, each 4 bits are one digit, so the same 52 bits account for 13 digits, giving 10,000,000,000,000 possible mantissas, with an exponent range of 11 bits, but raised to the power of 10, resulting in a range of 2048 decimal digits. Of course, we have no use for such a huge exponent range with such a small mantissa, so we are likely to move bits from the exponent to the mantissa. Since we have no use for fractions of a decimal digit, we will move the bits in multiples of 4. I'm now going to make an extreme assumption: we move 8 bits from the exponent to the mantissa.
This leaves us with only three bits of exponent, which will only cover 8 decimal digits, but it gives us 60 bits, or 15 decimal digits, in the mantissa - a range of 1,000,000,000,000,000 numbers. Please note that the base 2 representation can still represent 4.5 times more mantissas using only 52 bits.

So what have we got so far? A 64 bit decimal based floating point can give up almost all of its exponent in order to create a mantissa that has roughly the same range as the base 2 one, and still be outnumbered by 2.17 bits' worth, ASSUMING WE DON'T USE THE IMPLIED BIT IN THE BASE 2 REPRESENTATION.

Now, I suggest that even with "just" 2.17 bits extra, the binary representation is accurate enough to hold the approximation of the decimal number to such precision that the back and forth translation will reliably produce the original number. Of course, if we do use the extra bit, it's 3.17 bits extra. If we give up not 8 but only 4 bits from the exponent, we have 6.49 bits extra (5.49 under the above assumption), while having an exponent range of only 128 decimal digits (as opposed to 616 with IEEE). Now,
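The round-trip argument can be probed empirically: C guarantees that DBL_DIG (15 for IEEE binary64) significant decimal digits survive a decimal -> double -> decimal conversion, which matches the 15-digit mantissa budget arrived at above. A sketch of my own, under the assumption that the server renders its decimal values with at most 15 significant digits:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <float.h>

/* Parse a decimal string into a double, then print it back with
 * DBL_DIG significant digits. If the text matches, the detour through
 * binary preserved the decimal value exactly - the "export, store,
 * re-import" scenario described above. */
int decimal_round_trips(const char *dec)
{
    char back[64];
    double d = strtod(dec, NULL);

    snprintf(back, sizeof back, "%.*g", DBL_DIG, d);
    return strcmp(dec, back) == 0;
}
```

This only demonstrates the guarantee for strings printf would produce in the same form; exponent notation and trailing zeros would need normalization before comparing.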