Re: Generating valid identifiers

2012-07-27 Thread Laszlo Nagy
> With 16 ** 10 possible digests, the probability of collision hits 50% at 1234605 tables. Actually, I'm using base64 encoding, so it is 64**10. I guess using 6 characters will be enough. -- http://mail.python.org/mailman/listinfo/python-list
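
The 50% figures in this thread come from the usual birthday-bound approximation: with N equally likely digests, a collision becomes more likely than not after roughly sqrt(2 N ln 2) names. The sketch below compares the digest spaces mentioned here. It assumes the truncated digests behave like uniformly random values, and the helper name `approx_50pct_threshold` is hypothetical.

```python
import math


def approx_50pct_threshold(space: int) -> float:
    """Approximate number of names at which P(collision) reaches 50%.

    Uses the birthday bound n ~ sqrt(2 * N * ln 2) and assumes every
    truncated digest is an independent, uniformly random value in [0, N).
    """
    return math.sqrt(2 * space * math.log(2))


# Digest spaces discussed in the thread (labels are illustrative).
spaces = {
    "10 hex digits (16**10)": 16 ** 10,
    "10 base64 chars (64**10)": 64 ** 10,
    "6 base64 chars (64**6)": 64 ** 6,
}

for label, space in spaces.items():
    bits = math.log2(space)
    print(f"{label}: {bits:.0f} bits, ~{approx_50pct_threshold(space):,.0f} names for 50%")
```

Each base64 character carries 6 bits and each hex digit 4. That makes 6 base64 characters (2**36 values) a smaller space than 10 hex digits (2**40). For 64**6 the 50% point lands at roughly 309,000 names.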

Re: Generating valid identifiers

2012-07-27 Thread Laszlo Nagy
Unless an attacker can select the field names, in which case they may be able to improve those odds significantly. In the case of MD5, they can possibly improve those odds to 1 in 1, since MD5 is vulnerable to collision attacks. Not so for some (all?) of the SHA hashes, at least not yet, but the

Re: Generating valid identifiers

2012-07-27 Thread Laszlo Nagy
As a side note, the odds of having at least one hash collision among multiple tables are known as the birthday problem. At 4 hex digits there are 65536 possible digests, and it turns out that at 302 tables there is a >50% chance that at least one pair of those names have the same 4-digit digest
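
The birthday-problem figure can also be checked exactly instead of through the square-root approximation, by multiplying the probabilities that each new name avoids all earlier digests. The sketch below assumes uniformly random digests, and its function names are hypothetical.

```python
import math


def collision_probability(n: int, space: int) -> float:
    """Exact probability that n uniformly random digests drawn from
    `space` possible values contain at least one duplicate."""
    # log P(no collision) = sum_{k=0}^{n-1} log(1 - k/space)
    log_no_collision = sum(math.log1p(-k / space) for k in range(n))
    return -math.expm1(log_no_collision)  # 1 - P(no collision)


def smallest_n_over_half(space: int) -> int:
    """Smallest number of names whose collision probability exceeds 50%."""
    log_no_collision = 0.0
    n = 0
    # Add one name at a time until P(no collision) drops below 1/2.
    while log_no_collision >= -math.log(2):
        log_no_collision += math.log1p(-n / space)
        n += 1
    return n


print(smallest_n_over_half(16 ** 4))  # 4 hex digits -> 302
```

With 4 hex digits this returns 302, and `smallest_n_over_half(16 ** 10)` returns 1234605, the figure quoted in this thread for 10 hex digits. The loop is linear in the answer, so it is only practical for digest spaces of this size.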

Re: Generating valid identifiers

2012-07-26 Thread Steven D'Aprano
On Thu, 26 Jul 2012 13:28:26 -0600, Ian Kelly wrote: > The odds of a given pair of identifiers having the same digest to 10 hex > digits are 1 in 16^10, or approximately 1 in a trillion. Unless an attacker can select the field names, in which case they may be able to improve those odds significa

Re: Generating valid identifiers

2012-07-26 Thread Ian Kelly
On Thu, Jul 26, 2012 at 1:28 PM, Ian Kelly wrote: > The odds of a given pair of identifiers having the same digest to 10 > hex digits are 1 in 16^10, or approximately 1 in a trillion. If you > bought one lottery ticket a day at those odds, you would win > approximately once every 3 billion years.
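
The arithmetic behind the lottery comparison fits in a few lines. This quick check assumes an average year of 365.25 days.

```python
# Expected wait for a 1-in-16**10 event at one trial (ticket) per day.
odds = 16 ** 10            # possible 10-hex-digit digests
days_per_year = 365.25     # assumption: average Julian-year length
years = odds / days_per_year
print(f"1 in {odds:,}; about {years:.2e} years at one ticket per day")
```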

Re: Generating valid identifiers

2012-07-26 Thread Ian Kelly
On Thu, Jul 26, 2012 at 9:30 AM, Steven D'Aprano wrote: > What happens if you get a collision? > > That is, you have two different long identifiers: > > a.b.c.d...something > a.b.c.d...anotherthing > > which by bad luck both hash to the same value: > > a.b.c.d.$AABB99 > a.b.c.d.$AABB99 > > (or wha
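
One way to answer "what happens if you get a collision?" is to record every short name handed out and fail loudly when a second, different long name maps to the same one. This is a sketch under assumptions, not the approach anyone in the thread settled on. The class `IdentifierRegistry`, the `h_` prefix and SHA-1 as the hash are all hypothetical choices.

```python
import hashlib


class IdentifierRegistry:
    """Hypothetical helper: maps long dotted names to short ones and
    refuses to reuse a short name for a different long name."""

    def __init__(self, digest_len: int = 6) -> None:
        self.digest_len = digest_len
        self._by_short = {}  # short name -> long name it was issued for

    def short_name(self, long_name: str) -> str:
        # SHA-1 hex prefix; the "h_" prefix keeps the result from starting with a digit.
        digest = hashlib.sha1(long_name.encode("utf-8")).hexdigest()
        short = "h_" + digest[: self.digest_len]
        existing = self._by_short.get(short)
        if existing is not None and existing != long_name:
            raise ValueError(
                f"digest collision: {long_name!r} and {existing!r} both map to {short!r}"
            )
        self._by_short[short] = long_name
        return short


registry = IdentifierRegistry()
print(registry.short_name("a.b.c.d.something"))
print(registry.short_name("a.b.c.d.anotherthing"))
```

Raising is only one option. A registry could instead lengthen the digest or append a counter for the second name, at the cost of short names that depend on the order in which objects are created.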

Re: Generating valid identifiers

2012-07-26 Thread Laszlo Nagy
* Would it be a problem to use CRC32 instead of SHA? (Since security is not a problem, and CRC32 is faster.) What happens if you get a collision? That is, you have two different long identifiers: a.b.c.d...something a.b.c.d...anotherthing which by bad luck both hash to the same value: a.b.c
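
The sketch below compares the two options from the question. CRC32 is an error-detecting checksum with only 32 bits of output, so it caps the suffix at 8 hex digits and reaches a 50% collision chance at roughly 77,000 names. A truncated SHA-1 digest can be made as long as needed. Timings depend on the machine, so none are claimed here. The helper names are hypothetical.

```python
import hashlib
import timeit
import zlib


def crc32_suffix(name: str) -> str:
    # In Python 3, zlib.crc32 returns an unsigned 32-bit int: at most 8 hex digits.
    return format(zlib.crc32(name.encode("utf-8")), "08x")


def sha1_suffix(name: str, length: int = 10) -> str:
    # SHA-1 yields 160 bits (40 hex digits); keep only the first `length`.
    return hashlib.sha1(name.encode("utf-8")).hexdigest()[:length]


if __name__ == "__main__":
    name = "a.b.c.d.something"  # example long name from the thread
    print(crc32_suffix(name), sha1_suffix(name))
    for fn in (crc32_suffix, sha1_suffix):
        seconds = timeit.timeit(lambda: fn(name), number=100_000)
        print(f"{fn.__name__}: {seconds:.3f} s per 100,000 calls")
```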

Re: Generating valid identifiers

2012-07-26 Thread Emile van Sebille
On 7/26/2012 5:26 AM Laszlo Nagy said... I have a program that creates various database objects in PostgreSQL. There is a DOM, and for each element in the DOM, a database object is created (schema, table, field, index and tablespace). I do not want this program to generate very long identifiers.

Re: Generating valid identifiers

2012-07-26 Thread Peter Otten
Laszlo Nagy wrote: > I do not want this program to generate very long identifiers. It would > increase SQL parsing time, and wouldn't look good. Let's just say that the > limit should be 32 characters. But I also want to recognize the > identifiers when I look at their modified/truncated names. Real
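
The requirement (at most 32 characters, yet still recognizable) is usually met by keeping a readable prefix and replacing the tail with a short digest. This sketch follows the `$` plus digest style used as an example in the thread. The function name, the use of SHA-1 and the 10-digit suffix are assumptions. PostgreSQL itself truncates identifiers longer than 63 bytes by default (NAMEDATALEN - 1), so the 32-character limit is the poster's own choice.

```python
import hashlib

MAX_LEN = 32      # the self-imposed limit from the thread, not PostgreSQL's
DIGEST_LEN = 10   # hex digits of SHA-1 kept as a disambiguating suffix


def shorten_identifier(name: str, max_len: int = MAX_LEN,
                       digest_len: int = DIGEST_LEN) -> str:
    """Return `name` unchanged if it fits, else a readable prefix plus a digest.

    Length is counted in characters, so this assumes ASCII names
    (PostgreSQL's own limit is in bytes).
    """
    if len(name) <= max_len:
        return name
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()[:digest_len]
    # Reserve room for "$" + digest; keep as much of the readable prefix as fits.
    prefix = name[: max_len - digest_len - 1]
    return f"{prefix}${digest}"


print(shorten_identifier("customer_orders_shipping_address_line_two"))
```

The digest is computed from the full original name, so two names that share the same 21-character prefix still get different suffixes, with the collision odds discussed elsewhere in this thread.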

Re: Generating valid identifiers

2012-07-26 Thread Steven D'Aprano
On Thu, 26 Jul 2012 14:26:16 +0200, Laszlo Nagy wrote: > I do not want this program to generate very long identifiers. It would > increase SQL parsing time, Will that increase in SQL parsing time be more, or less, than the time it takes to generate CRC32 or SHA hashsums and append them to a trun

Re: Generating valid identifiers

2012-07-26 Thread Terry Reedy
On 7/26/2012 8:26 AM, Laszlo Nagy wrote: I have a program that creates various database objects in PostgreSQL. There is a DOM, and for each element in the DOM, a database object is created (schema, table, field, index and tablespace). I do not want this program to generate very long identifiers.

Re: Generating valid identifiers

2012-07-26 Thread Arnaud Delobelle
On 26 July 2012 13:26, Laszlo Nagy wrote: [...] > I do not want this program to generate very long identifiers. It would > increase SQL parsing time, and wouldn't look good. Let's just say that the > limit should be 32 characters. But I also want to recognize the identifiers > when I look at their mo

Generating valid identifiers

2012-07-26 Thread Laszlo Nagy
I have a program that creates various database objects in PostgreSQL. There is a DOM, and for each element in the DOM, a database object is created (schema, table, field, index and tablespace). I do not want this program to generate very long identifiers. It would increase SQL parsing time, an