Simon Riggs <[EMAIL PROTECTED]> writes:
> On Sun, 2005-11-06 at 11:26 -0500, Tom Lane wrote:
>> Really? After I woke up a bit more I realized there was only one bit
>> and change to spare, not two, so I don't see how it would work.
> Not sure why you think that. Seems to fit
[ counts on fing
On Sun, 2005-11-06 at 11:26 -0500, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Thu, 2005-11-03 at 10:32 -0500, Tom Lane wrote:
> >> I think we could make it go by cramming the sign and
> >> the high-order dscale bit into the first NumericDigit --- the
> >> digit itself can only
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Thu, 2005-11-03 at 10:32 -0500, Tom Lane wrote:
>> I think we could make it go by cramming the sign and
>> the high-order dscale bit into the first NumericDigit --- the
>> digit itself can only be 0.. so there are a couple of bits
>> to spare.
> I'v
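Tom's bit-cramming idea can be sketched roughly like this. All of the layout choices are hypothetical: the first NumericDigit is handled as an unsigned 16-bit word, a base-10000 digit (0..9999) occupies the low 14 bits, and bits 15 and 14 carry the sign and the high-order dscale bit. Tom's follow-up counts less spare room ("one bit and change"), so treat the exact bit budget as unsettled; the sketch only shows the mask-and-test mechanics.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of the first digit word:
 *   bits 13..0  base-10000 digit (0..9999 fits in 14 bits)
 *   bit 14      high-order dscale bit
 *   bit 15      sign (set = negative) */
#define DIGIT_MASK      0x3FFFu
#define DSCALE_HIGH_BIT 0x4000u
#define SIGN_BIT        0x8000u

static uint16_t pack_first_digit(uint16_t digit, bool negative, bool dscale_high)
{
    uint16_t word = (uint16_t)(digit & DIGIT_MASK); /* caller passes 0..9999 */
    if (negative)
        word = (uint16_t)(word | SIGN_BIT);
    if (dscale_high)
        word = (uint16_t)(word | DSCALE_HIGH_BIT);
    return word;
}

static uint16_t first_digit_value(uint16_t word) { return (uint16_t)(word & DIGIT_MASK); }
static bool first_digit_negative(uint16_t word) { return (word & SIGN_BIT) != 0; }
static bool first_digit_dscale_high(uint16_t word) { return (word & DSCALE_HIGH_BIT) != 0; }
```

One catch raised elsewhere in the thread: a zero value stores no digits at all, so there is no first digit to carry these bits.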
On Thu, 2005-11-03 at 10:32 -0500, Tom Lane wrote:
> I'd feel a lot happier about this if we could keep the dynamic range
> up to, say, 10^512 so that it's still true that NUMERIC can be a
> universal parse-time representation. That would also make it even
> more unlikely that anyone would complai
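Rough arithmetic for the dynamic-range question, assuming (as Tom states later in the thread) that the weight counts base-10000 digits, so a value d.ddd * 10^e has weight floor(e/4). The helper name is hypothetical.

```c
#include <float.h>

/* Weight of the leading base-10000 digit of a value whose decimal
 * exponent is dexp (value = d.ddd * 10^dexp). Floor division, so
 * negative exponents round toward minus infinity. */
static int weight_for_decimal_exponent(int dexp)
{
    return (dexp >= 0) ? dexp / 4 : -((-dexp + 3) / 4);
}
```

Under these assumptions a signed 8-bit weight (-128..127) reaches exactly to values below 10^512, float8's largest finite value (DBL_MAX_10_EXP is 308 for IEEE doubles) needs weight 77, and a 7-bit weight (-64..+63) stops below 10^256, which falls short of float8's range.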
In article <[EMAIL PROTECTED]>,
Gregory Maxwell <[EMAIL PROTECTED]> writes:
> On 11/4/05, Martijn van Oosterhout wrote:
>> Yeah, and while one way of removing that dependance is to use ICU, that
>> library wants everything in UTF-16. So we replace "copying to add NULL
>> to string" with "converti
On Fri, Nov 04, 2005 at 07:15:22PM -0500, Gregory Maxwell wrote:
> Other lame aspects of using unicode encodings other than UTF-8
> internally is that it's harder to figure out what is text in GDB
> output and such.. can make debugging more difficult.
Yeah, that's one of the reasons I think UTF-16
On 11/4/05, Martijn van Oosterhout wrote:
[snip]
> : ICU does not use UCS-2. UCS-2 is a subset of UTF-16. UCS-2 does not
> : support surrogates, and UTF-16 does support surrogates. This means
> : that UCS-2 only supports UTF-16's Base Multilingual Plane (BMP). The
> : notion of UCS-2 is deprecated
On Fri, Nov 04, 2005 at 02:58:05PM -0500, Gregory Maxwell wrote:
> The correct question to ask is something like "Does it support non-bmp
> characters?" or "Does it really support UTF-16 or just UCS2?"
>
> UTF-16 is (now) a variable width encoding which is a strict superset
> of UCS2 which allows
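To make the UCS-2 versus UTF-16 distinction concrete: a UCS-2-only consumer counts every 16-bit unit as a character, while a UTF-16 consumer must pair a high surrogate (U+D800..U+DBFF) with a following low surrogate (U+DC00..U+DFFF). The helper below (a hypothetical name, generic code rather than ICU's) counts code points in a UTF-16 buffer. For BMP-only text the unit count and character count agree; they diverge as soon as a non-BMP character appears, which is also why UTF-16 is still a variable-width encoding.

```c
#include <stddef.h>
#include <stdint.h>

static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Count code points in a UTF-16 buffer of n units. A well-formed
 * surrogate pair counts as one character; an unpaired surrogate is
 * counted as one (a strict validator would reject it instead). */
static size_t utf16_count_code_points(const uint16_t *units, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_high_surrogate(units[i]) && i + 1 < n && is_low_surrogate(units[i + 1]))
            i++;                /* consume the low half of the pair */
        count++;
    }
    return count;
}
```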
On Fri, Nov 04, 2005 at 04:30:27PM -0500, Tom Lane wrote:
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > On Thu, Nov 03, 2005 at 10:32:03AM -0500, Tom Lane wrote:
> >> I'd feel a lot happier about this if we could keep the dynamic range
> >> up to, say, 10^512 so that it's still true that NUMERIC
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> On Thu, Nov 03, 2005 at 10:32:03AM -0500, Tom Lane wrote:
>> I'd feel a lot happier about this if we could keep the dynamic range
>> up to, say, 10^512 so that it's still true that NUMERIC can be a
>> universal parse-time representation. That would also
On Thu, Nov 03, 2005 at 10:32:03AM -0500, Tom Lane wrote:
> I'd feel a lot happier about this if we could keep the dynamic range
> up to, say, 10^512 so that it's still true that NUMERIC can be a
> universal parse-time representation. That would also make it even
> more unlikely that anyone would
On Thu, Nov 03, 2005 at 04:07:41PM +0100, Marcus Engene wrote:
> Simon Riggs wrote:
> >On Thu, 2005-11-03 at 11:13 -0300, Alvaro Herrera wrote:
> >
> >>Simon Riggs wrote:
> >>
> >>>On PostgreSQL, CHAR(12) is a bpchar datatype with all instantiations of
> >>>that datatype having a 4 byte varlena hea
Martijn van Oosterhout writes:
> Yeah, and while one way of removing that dependance is to use ICU, that
> library wants everything in UTF-16.
Really? Can't it do UCS4 (UTF-32)? There's a nontrivial population
of our users that isn't satisfied with UTF-16 anyway, so if that really
is a restrict
On 11/4/05, Tom Lane <[EMAIL PROTECTED]> wrote:
> Martijn van Oosterhout writes:
> > Yeah, and while one way of removing that dependance is to use ICU, that
> > library wants everything in UTF-16.
>
> Really? Can't it do UCS4 (UTF-32)? There's a nontrivial population
> of our users that isn't sa
On 11/4/05, Martijn van Oosterhout wrote:
> Yeah, and while one way of removing that dependance is to use ICU, that
> library wants everything in UTF-16. So we replace "copying to add NULL
> to string" with "converting UTF-8 to UTF-16 on each call. Ugh! The
> argument for UTF-16 is that if you're
On Fri, Nov 04, 2005 at 01:54:04PM -0500, Tom Lane wrote:
> [EMAIL PROTECTED] writes:
> > I read "the backend is by and large an ASCII, null-terminated-string
> > engine" with "we use UTF-8 [for varlena strings?]" as, a lot of the
> > code assumes varlena strings are '\0' terminated, and an assumpt
[EMAIL PROTECTED] writes:
> I read "the backend is by and large an ASCII, null-terminated-string
> engine" with "we use UTF-8 [for varlena strings?]" as, a lot of the
> code assumes varlena strings are '\0' terminated, and an assumption
> on my part, that the varlena strings are not stored in the b
On Fri, Nov 04, 2005 at 04:13:29PM +0100, Martijn van Oosterhout wrote:
> On Fri, Nov 04, 2005 at 08:38:38AM -0500, [EMAIL PROTECTED] wrote:
> > On Thu, Nov 03, 2005 at 09:17:43PM -0500, Tom Lane wrote:
> > > Actually, the real reason we use UTF-8 and not any of the
> > > sorta-fixed-size represent
On Fri, Nov 04, 2005 at 08:38:38AM -0500, [EMAIL PROTECTED] wrote:
> On Thu, Nov 03, 2005 at 09:17:43PM -0500, Tom Lane wrote:
> > Actually, the real reason we use UTF-8 and not any of the
> > sorta-fixed-size representations of Unicode is that the backend is by
> > and large an ASCII, null-termina
On Thu, Nov 03, 2005 at 09:17:43PM -0500, Tom Lane wrote:
> Gregory Maxwell <[EMAIL PROTECTED]> writes:
> > Another way to look at this is in the context of compression: With
> > unicode, characters are really 32bit values... But only a small range
> > of these values is common. So we store and wo
Gregory Maxwell <[EMAIL PROTECTED]> writes:
> Another way to look at this is in the context of compression: With
> unicode, characters are really 32bit values... But only a small range
> of these values is common. So we store and work with them in a
> compressed format, UTF-8.
> As such it might
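Gregory's "compressed format" point is visible in a plain UTF-8 encoder: code points below 0x80 (ASCII) cost one byte, and only larger values spend two, three or four. This is a generic sketch of standard UTF-8 encoding, not PostgreSQL's conversion code; the function name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value (U+0000..U+10FFFF, no surrogates)
 * as UTF-8. Writes 1-4 bytes to out and returns how many were written. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx + 2 continuation bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18)); /* 11110xxx + 3 continuation bytes */
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```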
On 11/3/05, Martijn van Oosterhout wrote:
> That's called UTF-16 and is currently not supported by PostgreSQL at
> all. That may change, since the locale library ICU requires UTF-16 for
> everything.
UTF-16 doesn't get us out of the variable length character game, for
that we need UTF-32... Unles
On Thu, 2005-11-03 at 10:32 -0500, Tom Lane wrote:
> Bruce Momjian writes:
> >> On Thu, 3 Nov 2005, Simon Riggs wrote:
> >>> At the moment we've established we can do this fairly much for free.
>
> > Agreed. With the proposal, we are saving perhaps 5% storage space for
> > numeric fields, but ar
On 2005-11-03, Martijn van Oosterhout wrote:
>> For "other databases", the column could be encoded as 2 byte characters
>> or 4 byte characters, allowing it to be fixed. I find myself doubting
>> that ASCII characters could be encoded more efficiently in such formats,
>> than the inclusion of a le
On Thu, Nov 03, 2005 at 12:28:02PM -0500, [EMAIL PROTECTED] wrote:
> It's unfortunate that the length is encoded multiple times. In UTF-8,
> for instance, each character has its length encoded in the most
> significant bits. Complicated to extract, however, the data is encoded
> twice. 1 in the hea
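The "length in the most significant bits" observation, sketched: the lead byte alone says how many bytes the character occupies. These are generic helpers with hypothetical names, and they do not reject every invalid sequence (for example, the overlong lead bytes 0xC0/0xC1 are accepted).

```c
#include <stddef.h>

/* Length of the UTF-8 sequence introduced by lead byte b, read from its
 * high bits; 0 for a continuation byte (10xxxxxx) or bytes 0xF8..0xFF. */
static size_t utf8_sequence_length(unsigned char b)
{
    if (b < 0x80)           return 1;   /* 0xxxxxxx */
    if ((b & 0xE0) == 0xC0) return 2;   /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;   /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;   /* 11110xxx */
    return 0;
}

/* Count characters in a UTF-8 buffer using only the lead-byte lengths;
 * assumes valid input, but steps over stray bytes one at a time. */
static size_t utf8_char_count(const unsigned char *s, size_t nbytes)
{
    size_t chars = 0;
    for (size_t i = 0; i < nbytes; chars++) {
        size_t len = utf8_sequence_length(s[i]);
        i += (len == 0) ? 1 : len;
    }
    return chars;
}
```

This is also why a CHAR(12) holding twelve UTF-8 characters can occupy anywhere from 12 to 48 bytes, so its byte length still has to be stored or computed.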
On Thu, 2005-11-03 at 11:36 -0500, Andrew Dunstan wrote:
> Well, it could also be argued that DW apps could often get away with
> using floating point types, even where the primary source needs to be in
> fixed point for accuracy, and that could generate lots of savings in
> speed and space. B
On Thu, Nov 03, 2005 at 03:09:26PM +0100, Martijn van Oosterhout wrote:
> On Thu, Nov 03, 2005 at 01:49:46PM +, Simon Riggs wrote:
> > In other databases, CHAR(12) and NUMERIC(12) are fixed length datatypes.
> > In PostgreSQL, they are dynamically varying datatypes.
> Please explain how a CHAR(
Simon Riggs wrote:
On Wed, 2005-11-02 at 19:12 -0500, Tom Lane wrote:
Andrew Dunstan <[EMAIL PROTECTED]> writes:
Could someone please quantify how much bang we might get for what seems
like quite a lot of bucks?
I appreciate the need for speed, but the saving here strikes me as
marg
On Thu, 2005-11-03 at 07:03 -0800, Stephan Szabo wrote:
> I don't believe the above is safe to say, yet. AFAICS, this has been
> discussed only on hackers (and patches) in this discussion, whereas this
> sort of change should probably be brought up on general as well to get a
> greater understandi
Bruce Momjian writes:
>> On Thu, 3 Nov 2005, Simon Riggs wrote:
>>> At the moment we've established we can do this fairly much for free.
> Agreed. With the proposal, we are saving perhaps 5% storage space for
> numeric fields, but are adding code complexity and reducing its possible
> precision.
Stephan Szabo wrote:
>
> On Thu, 3 Nov 2005, Simon Riggs wrote:
>
> > On Wed, 2005-11-02 at 19:12 -0500, Tom Lane wrote:
> > > Andrew Dunstan <[EMAIL PROTECTED]> writes:
> > > > Could someone please quantify how much bang we might get for what seems
> > > > like quite a lot of bucks?
> > > > I ap
Simon Riggs wrote:
On Thu, 2005-11-03 at 11:13 -0300, Alvaro Herrera wrote:
Simon Riggs wrote:
On PostgreSQL, CHAR(12) is a bpchar datatype with all instantiations of
that datatype having a 4 byte varlena header. In this example, all of
those instantiations having the varlena header set to 12
On Thu, 3 Nov 2005, Simon Riggs wrote:
> On Wed, 2005-11-02 at 19:12 -0500, Tom Lane wrote:
> > Andrew Dunstan <[EMAIL PROTECTED]> writes:
> > > Could someone please quantify how much bang we might get for what seems
> > > like quite a lot of bucks?
> > > I appreciate the need for speed, but the
On Thu, 2005-11-03 at 11:13 -0300, Alvaro Herrera wrote:
> Simon Riggs wrote:
> > On PostgreSQL, CHAR(12) is a bpchar datatype with all instantiations of
> > that datatype having a 4 byte varlena header. In this example, all of
> > those instantiations having the varlena header set to 12, so essent
Simon Riggs wrote:
> On PostgreSQL, CHAR(12) is a bpchar datatype with all instantiations of
> that datatype having a 4 byte varlena header. In this example, all of
> those instantiations having the varlena header set to 12, so essentially
> wasting the 4 byte header.
We need the length word beca
On Thu, Nov 03, 2005 at 01:49:46PM +, Simon Riggs wrote:
> In other databases, CHAR(12) and NUMERIC(12) are fixed length datatypes.
> In PostgreSQL, they are dynamically varying datatypes.
Please explain how a CHAR(12) can store 12 UTF-8 characters when each
character may be 1 to 4 bytes, unle
On Thu, 2005-11-03 at 08:27 +, Simon Riggs wrote:
> On Wed, 2005-11-02 at 19:12 -0500, Tom Lane wrote:
> > If we were willing to invent the "varlena2" datum format then we could
> > save four bytes per numeric, plus reduce numeric's alignment requirement
> > from int to short which would probab
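A sketch of the header-size and alignment difference. NumericHeaderV4 mirrors the field layout from the original post; NumericHeaderV2 is purely hypothetical and assumes the four bytes come from a 2-byte length word plus the separate proposal to squeeze weight, sign and dscale into 2 bytes. C struct padding stands in for PostgreSQL's own alignment rules, and the sizes are what mainstream ABIs produce rather than anything C guarantees.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout from the original post: 4-byte varlena length, then weight and
 * sign+dscale (the digits would follow). */
typedef struct
{
    int32_t  varlen;
    int16_t  n_weight;
    uint16_t n_sign_dscale;
} NumericHeaderV4;

/* Hypothetical "varlena2" header: 2-byte length plus one 16-bit word
 * holding weight, sign and dscale. Not an actual PostgreSQL structure. */
typedef struct
{
    uint16_t varlen2;
    uint16_t packed_weight_sign_dscale;
} NumericHeaderV2;

/* Model a NUMERIC that follows a 1-byte column in a row. */
typedef struct { char flag; NumericHeaderV4 num; } AfterByteV4;
typedef struct { char flag; NumericHeaderV2 num; } AfterByteV2;
```

With int alignment the NUMERIC after a 1-byte column starts at offset 4; with short alignment it starts at offset 2, one example of the padding saving on top of the smaller header.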
On Wed, 2005-11-02 at 19:12 -0500, Tom Lane wrote:
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
> > Could someone please quantify how much bang we might get for what seems
> > like quite a lot of bucks?
> > I appreciate the need for speed, but the saving here strikes me as
> > marginal at best, u
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Could someone please quantify how much bang we might get for what seems
> like quite a lot of bucks?
> I appreciate the need for speed, but the saving here strikes me as
> marginal at best, unless my instincts are all wrong (quite possible)
Two bytes
[patches removed]
Tom Lane wrote:
Simon Riggs <[EMAIL PROTECTED]> writes:
It seems straightforward enough to put in an additional test, similar to
the ones already there so that if it's too big for a decimal we make it a
float straight away - only a float can be that big in that case. After
On Wed, Nov 02, 2005 at 06:12:37PM -0500, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > It seems straightforward enough to put in an additional test, similar to
> > the ones already there so that if it's too big for a decimal we make it a
> > float straight away - only a float can be
Simon Riggs <[EMAIL PROTECTED]> writes:
> It seems straightforward enough to put in an additional test, similar to
> the ones already there so that if it's too big for a decimal we make it a
> float straight away - only a float can be that big in that case. After
> that I can't really see what the p
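Simon's "make it a float straight away" test could look roughly like this. Everything here is hypothetical: the function names, the limit of 511 (taken from the 10^512 range Tom mentions), and the restriction to unsigned literals without a leading sign. As Tom notes elsewhere in the thread, the deeper obstacle is that the parser assumes NUMERIC can represent any FLOAT8 value, which this check alone does not address; a complete version would also need a limit for very small magnitudes.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_NUMERIC_DEC_EXP 511     /* hypothetical: values below 10^512 */

typedef enum { LITERAL_NUMERIC, LITERAL_FLOAT8 } LiteralType;

/* Decimal exponent of the first nonzero digit of an unsigned literal such
 * as "0.00123" or "12.5e600" (value = d.ddd * 10^result). Returns false
 * for zero or anything that is not a plain unsigned literal. */
static bool literal_decimal_exponent(const char *s, long *dexp)
{
    long int_len = 0, ndigits = 0, first_nz = -1;
    bool seen_point = false;
    const char *p = s;

    for (; *p != '\0' && *p != 'e' && *p != 'E'; p++) {
        if (*p == '.' && !seen_point) {
            seen_point = true;
            continue;
        }
        if (!isdigit((unsigned char)*p))
            return false;
        if (first_nz < 0 && *p != '0')
            first_nz = ndigits;           /* position of first significant digit */
        ndigits++;
        if (!seen_point)
            int_len++;                    /* digits before the decimal point */
    }
    if (first_nz < 0)
        return false;
    *dexp = int_len - 1 - first_nz;
    if (*p == 'e' || *p == 'E')
        *dexp += strtol(p + 1, NULL, 10); /* explicit exponent part */
    return true;
}

/* Too big for the reduced NUMERIC range: go straight to FLOAT8. Zero and
 * malformed input are left on the normal NUMERIC path. */
static LiteralType classify_literal(const char *s)
{
    long dexp;
    if (literal_decimal_exponent(s, &dexp) && dexp > MAX_NUMERIC_DEC_EXP)
        return LITERAL_FLOAT8;
    return LITERAL_NUMERIC;
}
```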
On Wed, 2005-11-02 at 15:09 -0500, Tom Lane wrote:
> [ thinks for a moment... ] Actually, neither proposal is going to get
> off the ground, because the parser's handling of numeric constants is
> predicated on the assumption that type NUMERIC can handle any valid
> value of FLOAT8, and so we can
On Wed, 2005-11-02 at 15:09 -0500, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > I wasn't trying to claim the bit assignment made sense. My point was
> > that the work to mangle the two fields together to make it make sense
> > looked like it would take more CPU (since the standard
On Wed, Nov 02, 2005 at 12:53:07PM -0600, Jim C. Nasby wrote:
> > This is one of those issues where we need to run tests and take input.
> > We cannot decide this sort of thing just by debate alone. So, I'll leave
> > this as a less potentially fruitful line of enquiry.
>
> Is it worth coming up
Simon Riggs <[EMAIL PROTECTED]> writes:
> I wasn't trying to claim the bit assignment made sense. My point was
> that the work to mangle the two fields together to make it make sense
> looked like it would take more CPU (since the standard representation of
> signed integers is different for +ve an
On Wed, 2005-11-02 at 13:46 -0500, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Tue, 2005-11-01 at 17:55 -0500, Tom Lane wrote:
> >> I don't think it'd be worth having 2 types. Remember that the weight is
> >> measured in base-10k digits. Suppose for instance
> >>sign
On Wed, Nov 02, 2005 at 08:48:25AM +, Simon Riggs wrote:
> On Tue, 2005-11-01 at 18:15 -0500, Tom Lane wrote:
> > Simon Riggs <[EMAIL PROTECTED]> writes:
> > > Anybody like to work out a piece of SQL to perform data profiling and
> > > derive the distribution of values with trailing zeroes?
> >
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Tue, 2005-11-01 at 17:55 -0500, Tom Lane wrote:
>> I don't think it'd be worth having 2 types. Remember that the weight is
>> measured in base-10k digits. Suppose for instance
>> sign1 bit
>> weight 7 bits (-64 .. +63)
>>
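A sketch of the packed 16-bit layout Tom begins to describe, showing what recovering a signed 7-bit field costs. The sign and weight positions follow his sketch; giving the remaining 8 bits to dscale is an assumption, since the rest of his layout is cut off, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout:
 *   bit 15      sign
 *   bits 14..8  weight, 7-bit two's complement (-64 .. +63)
 *   bits 7..0   dscale (0 .. 255), assumed */
static uint16_t pack_header(bool negative, int weight, unsigned dscale)
{
    /* caller guarantees -64 <= weight <= 63 and dscale <= 255 */
    unsigned w = (unsigned)weight & 0x7Fu;          /* low 7 bits of two's complement */
    return (uint16_t)((negative ? 0x8000u : 0u) | (w << 8) | (dscale & 0xFFu));
}

static bool header_negative(uint16_t h) { return (h & 0x8000u) != 0; }
static unsigned header_dscale(uint16_t h) { return h & 0xFFu; }

/* Extract the weight and sign-extend it from 7 bits back to an int. */
static int header_weight(uint16_t h)
{
    int w = (h >> 8) & 0x7F;
    return (w & 0x40) ? w - 0x80 : w;
}
```

Unpacking the weight takes a mask, a shift and one conditional for sign extension; whether that is measurable next to the rest of numeric arithmetic is the kind of question Simon suggests settling by testing rather than debate.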
On Tue, 2005-11-01 at 17:55 -0500, Tom Lane wrote:
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > FWIW, most databases I've used limit NUMERIC to 38 digits, presumably to
> > fit length info into 1 or 2 bytes. So there's something to be said for a
> > small numeric type that has less overhead and
I am not able to quickly find your numeric format, so I'll just throw
this in. MaxDB (I only mention this because the format and algorithms
are now under the GPL, so they can be reviewed by the public) uses a
nifty number format that allows the use of memcmp to compare two numbers
when they are in t
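The snippet does not show MaxDB's actual format, so the following only illustrates the general technique of an order-preserving byte encoding, here for 64-bit signed integers: flip the sign bit and store the value most significant byte first, after which a plain byte-wise memcmp orders the encodings the same way as the numbers. A decimal type needs more work (for example sign, then exponent, then digits, with bytes inverted for negatives), but the principle is the same. Function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Encode v so that memcmp() on encodings orders like the integers:
 * flipping the sign bit puts negatives below non-negatives, and
 * big-endian byte order makes byte-wise comparison numeric. */
static void encode_ordered_i64(int64_t v, unsigned char out[8])
{
    uint64_t u = (uint64_t)v ^ UINT64_C(0x8000000000000000);
    for (int i = 7; i >= 0; i--) {
        out[i] = (unsigned char)(u & 0xFF);
        u >>= 8;
    }
}

/* Compare two integers purely through their byte encodings. */
static int compare_via_memcmp(int64_t a, int64_t b)
{
    unsigned char ea[8], eb[8];
    encode_ordered_i64(a, ea);
    encode_ordered_i64(b, eb);
    return memcmp(ea, eb, sizeof ea);
}
```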
On 11/2/05, Simon Riggs <[EMAIL PROTECTED]> wrote:
> On Tue, 2005-11-01 at 18:15 -0500, Tom Lane wrote:
> > Simon Riggs <[EMAIL PROTECTED]> writes:
> > > Anybody like to work out a piece of SQL to perform data profiling and
> > > derive the distribution of values with trailing zeroes?
> >
> > Don't
On Tue, 2005-11-01 at 18:15 -0500, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > Anybody like to work out a piece of SQL to perform data profiling and
> > derive the distribution of values with trailing zeroes?
>
> Don't forget leading zeroes. And all-zero (we omit digits entirely
On 11/1/05 2:38 PM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:
>
> FWIW, most databases I've used limit NUMERIC to 38 digits, presumably to
> fit length info into 1 or 2 bytes. So there's something to be said for a
> small numeric type that has less overhead and a large numeric (what we
> have toda
Simon Riggs <[EMAIL PROTECTED]> writes:
> Anybody like to work out a piece of SQL to perform data profiling and
> derive the distribution of values with trailing zeroes?
Don't forget leading zeroes. And all-zero (we omit digits entirely in
that case). I don't think you can claim that zero isn't
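For the profiling question, a helper like the following (hypothetical name) counts how many base-10000 digits survive once leading and trailing zero digits are dropped, the behaviour Tom describes; an all-zero value stores none. It assumes the value has already been split into base-10000 digits, most significant first.

```c
#include <stddef.h>
#include <stdint.h>

/* Digits that would actually be stored after stripping leading and
 * trailing zero digits; weight and dscale carry the position instead. */
static size_t stored_digit_count(const int16_t *digits, size_t n)
{
    size_t start = 0, end = n;
    while (start < end && digits[start] == 0)
        start++;
    while (end > start && digits[end - 1] == 0)
        end--;
    return end - start;
}
```

At 2 bytes per stored digit plus the 8-byte overhead from the original post, tallying this over a column's values gives its digit-storage distribution.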
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> On Tue, Nov 01, 2005 at 05:40:35PM -0500, Tom Lane wrote:
>> Maybe if we had a few other datatypes that could also use the feature.
>> [ thinks... ] inet/cidr comes to mind but I don't see any others.
>> The case seems a bit weak :-(
> Would varchar(25
On Tue, 2005-11-01 at 23:16 +0100, Martijn van Oosterhout wrote:
lots of useful things, thank you.
> > So, assuming I have this all correct, means we could reduce the on disk
> > storage for NUMERIC datatypes to the following struct. This gives an
> > overhead of just 2.5 bytes, plus the loss of
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> FWIW, most databases I've used limit NUMERIC to 38 digits, presumably to
> fit length info into 1 or 2 bytes. So there's something to be said for a
> small numeric type that has less overhead and a large numeric (what we
> have today).
I don't think it'
On Tue, Nov 01, 2005 at 05:40:35PM -0500, Tom Lane wrote:
> Martijn van Oosterhout writes:
> > You are proposing a fourth type, say VARLENA2 which looks a lot like a
> > varlena but it's not. I think the sheer volume of code that would need
> > to be checked is huge. Also, things like pg_attribute
On Tue, 2005-11-01 at 16:54 -0500, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > varlen is int32 to match the standard varlena header. However, the max
> > number of digits of the datatype is less than the threshold at which
> > values get toasted. So no NUMERIC values ever get toas
On Tue, Nov 01, 2005 at 11:16:58PM +0100, Martijn van Oosterhout wrote:
> Consider the algorithm: A number is stored as base + exponent. To
> multiply two numbers you can multiply the bases and add the exponents.
> OTOH, if you store the decimal inside the data, now you have to extract
> it again b
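Martijn's point in miniature: with a separate coefficient and exponent, multiplication is one integer multiply and one addition. ToyDecimal is a made-up type for illustration, not PostgreSQL's NUMERIC, and the sketch ignores coefficient overflow.

```c
#include <stdint.h>

/* Toy decimal: value = coef * 10^exp. */
typedef struct
{
    int64_t coef;
    int32_t exp;
} ToyDecimal;

/* Multiply the coefficients and add the exponents (no overflow check). */
static ToyDecimal toy_mul(ToyDecimal a, ToyDecimal b)
{
    ToyDecimal r = { a.coef * b.coef, a.exp + b.exp };
    return r;
}
```

If the exponent were instead packed into a word alongside other fields, each operation would first have to extract it and afterwards pack the result back, which is the extra step Martijn describes.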
Martijn van Oosterhout writes:
> You are proposing a fourth type, say VARLENA2 which looks a lot like a
> varlena but it's not. I think the sheer volume of code that would need
> to be checked is huge. Also, things like pg_attribute would need
> changing because you have to represent this new stat
On Tue, Nov 01, 2005 at 04:54:11PM -0500, Tom Lane wrote:
> It might be reasonable to restrict the range of NUMERIC to the point
> that we could fit the weight/sign/dscale into 2 bytes instead of 4,
> thereby saving 2 bytes per NUMERIC. I'm not excited about the other
> aspects of this, though.
F
On Tue, Nov 01, 2005 at 09:22:17PM +, Simon Riggs wrote:
> varlen is int32 to match the standard varlena header. However, the max
> number of digits of the datatype is less than the threshold at which
> values get toasted. So no NUMERIC values ever get toasted - in which
> case, why worry about
Simon Riggs <[EMAIL PROTECTED]> writes:
> varlen is int32 to match the standard varlena header. However, the max
> number of digits of the datatype is less than the threshold at which
> values get toasted. So no NUMERIC values ever get toasted - in which
> case, why worry about matching the size of
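Simon's claim can be checked with rough arithmetic. The inputs are assumptions: a declared-precision limit of 1000 decimal digits (NUMERIC_MAX_PRECISION in the sources of that era, to the best of my knowledge), 2 bytes per base-10000 digit with at most one extra digit when the decimal point splits a 4-digit group, the 8-byte overhead stated in the original post, and a TOAST threshold of roughly 2 kB on 8 kB pages. This only bounds columns declared with a precision; an unconstrained NUMERIC is not covered.

```c
#include <stddef.h>

#define NUMERIC_OVERHEAD_BYTES   8      /* from the original post */
#define BYTES_PER_NUMERIC_DIGIT  2      /* one base-10000 digit */
#define ASSUMED_MAX_PRECISION    1000   /* declared-precision limit (assumed) */
#define ASSUMED_TOAST_THRESHOLD  2000   /* roughly BLCKSZ/4 for 8 kB pages (assumed) */

/* Upper bound on the on-disk size of a value with ndigits decimal digits,
 * allowing one extra base-10000 digit for decimal-point misalignment. */
static size_t numeric_max_bytes(size_t ndigits)
{
    size_t base10k_digits = (ndigits + 3) / 4 + 1;
    return NUMERIC_OVERHEAD_BYTES + BYTES_PER_NUMERIC_DIGIT * base10k_digits;
}
```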
Currently, the overhead of NUMERIC datatype is 8 bytes. Each value is
stored on disk as
typedef struct NumericData
{
int32 varlen; /* Variable size (std varlena header) */
int16 n_weight; /* Weight of 1st digit */
uint16 n_sign_dscale; /* Sign + display