Re: [PERFORM] Indexes for hashes

2016-06-17 Thread julyanto SUTANDANG
Crc32 is great because it is supported by Intel Hardware, unfortunatelly you have to code something like this: http://stackoverflow.com/questions/31184201/how-to-implement-crc32-taking-advantage-of-intel-specific-instructions int32_t sse42_crc32(const uint8_t *bytes, size_t len){ uint32_t hash

Re: [PERFORM] Indexes for hashes

2016-06-17 Thread Ivan Voras
And in any case, there's no crc32 in the built-in pgcrypto module. On 17 June 2016 at 06:18, Claudio Freire wrote: > On Fri, Jun 17, 2016 at 1:09 AM, julyanto SUTANDANG > wrote: > > This way is doing faster using crc32(data) than hashtext since crc32 is > > hardware accelerated in intel (and o

Re: [PERFORM] Indexes for hashes

2016-06-16 Thread Claudio Freire
On Fri, Jun 17, 2016 at 1:09 AM, julyanto SUTANDANG wrote: > This way is doing faster using crc32(data) than hashtext since crc32 is > hardware accelerated in intel (and others perhaps) > this way (crc32) is no way the same as hash, much way faster than others... > > Regards, Sure, but I've had

Re: [PERFORM] Indexes for hashes

2016-06-16 Thread julyanto SUTANDANG
This way is doing faster using crc32(data) than hashtext since crc32 is hardware accelerated in intel (and others perhaps) this way (crc32) is no way the same as hash, much way faster than others... Regards, On Fri, Jun 17, 2016 at 10:51 AM, Claudio Freire wrote: > On Wed, Jun 15, 2016 at 6:3

Re: [PERFORM] Indexes for hashes

2016-06-16 Thread Claudio Freire
On Wed, Jun 15, 2016 at 6:34 AM, Ivan Voras wrote: > > I have an application which stores a large amounts of hex-encoded hash > strings (nearly 100 GB of them), which means: > > The number of distinct characters (alphabet) is limited to 16 > Each string is of the same length, 64 characters > The s

Re: [PERFORM] Indexes for hashes

2016-06-15 Thread Jeff Janes
On Wed, Jun 15, 2016 at 6:16 AM, k...@rice.edu wrote: > On Wed, Jun 15, 2016 at 03:09:04PM +0200, Ivan Voras wrote: >> On 15 June 2016 at 15:03, k...@rice.edu wrote: >> >> >> I don't suppose there's an effort in progress to make hash indexes use WAL? >> :D > > Hi Ivan, > > Several people have loo

Re: [PERFORM] Indexes for hashes

2016-06-15 Thread k...@rice.edu
On Wed, Jun 15, 2016 at 04:20:46PM +0200, Ivan Voras wrote: > Hi, > > Just for testing... is there a fast (i.e. written in C) crc32 or a similar > small hash function for PostgreSQL? > Hi Ivan, Here is an extension that provides a number of different hash functions, including a version of the v

Re: [PERFORM] Indexes for hashes

2016-06-15 Thread Joshua D. Drake
On 06/15/2016 07:20 AM, Ivan Voras wrote: Hi, Just for testing... is there a fast (i.e. written in C) crc32 or a similar small hash function for PostgreSQL? https://www.postgresql.org/docs/9.5/static/pgcrypto.html We also have a builtin md5(). JD -- Command Prompt, Inc. http

Re: [PERFORM] Indexes for hashes

2016-06-15 Thread Ivan Voras
Hi, Just for testing... is there a fast (i.e. written in C) crc32 or a similar small hash function for PostgreSQL? On 15 June 2016 at 16:00, Ivan Voras wrote: > Hi, > > This idea is similar to the substring one, and while it does give > excellent performance and small size, it requires applica

Re: [PERFORM] Indexes for hashes

2016-06-15 Thread Ivan Voras
Hi, This idea is similar to the substring one, and while it does give excellent performance and small size, it requires application code modifications, so it's out. On 15 June 2016 at 15:58, julyanto SUTANDANG wrote: > Hi Ivan, > > How about using crc32 ? and then index the integer as the resu

Re: [PERFORM] Indexes for hashes

2016-06-15 Thread julyanto SUTANDANG
Hi Ivan, How about using crc32 ? and then index the integer as the result of crc32 function? you can split the hash into 2 part and do crc32 2x ? and then create composite index on both integer (the crc32 result) instead of using 64 char, you only employ 2 integer as index key. Regards, Jul On

Re: [PERFORM] Indexes for hashes

2016-06-15 Thread Ivan Voras
Hi, I understand your idea, and have also been thinking about it. Basically, existing applications would need to be modified, however slightly, and that wouldn't be good. On 15 June 2016 at 15:38, hubert depesz lubaczewski wrote: > On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote:

Re: [PERFORM] Indexes for hashes

2016-06-15 Thread hubert depesz lubaczewski
On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote: > I have an application which stores a large amounts of hex-encoded hash > strings (nearly 100 GB of them), which means: Why do you keep them hex encoded, and not use bytea? I made a sample table with 1 million rows, looking like this:

Re: [PERFORM] Indexes for hashes

2016-06-15 Thread k...@rice.edu
On Wed, Jun 15, 2016 at 03:09:04PM +0200, Ivan Voras wrote: > On 15 June 2016 at 15:03, k...@rice.edu wrote: > > > On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote: > > > Hi, > > > > > > I have an application which stores a large amounts of hex-encoded hash > > > strings (nearly 100 GB

Re: [PERFORM] Indexes for hashes

2016-06-15 Thread Ivan Voras
On 15 June 2016 at 15:03, k...@rice.edu wrote: > On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote: > > Hi, > > > > I have an application which stores a large amounts of hex-encoded hash > > strings (nearly 100 GB of them), which means: > > > >- The number of distinct characters (alp

Re: [PERFORM] Indexes for hashes

2016-06-15 Thread k...@rice.edu
On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote: > Hi, > > I have an application which stores a large amounts of hex-encoded hash > strings (nearly 100 GB of them), which means: > >- The number of distinct characters (alphabet) is limited to 16 >- Each string is of the same len

Re: [PERFORM] Indexes for hashes

2016-06-15 Thread Torsten Zuehlsdorff
Hello Ivan, I have an application which stores a large amounts of hex-encoded hash strings (nearly 100 GB of them), which means: * The number of distinct characters (alphabet) is limited to 16 * Each string is of the same length, 64 characters * The strings are essentially random Creatin

[PERFORM] Indexes for hashes

2016-06-15 Thread Ivan Voras
Hi, I have an application which stores a large amounts of hex-encoded hash strings (nearly 100 GB of them), which means: - The number of distinct characters (alphabet) is limited to 16 - Each string is of the same length, 64 characters - The strings are essentially random Creating a B-T