Re: General purpose hashing func in pgbench

2018-03-21 Thread Teodor Sigaev
I finally managed to perform this test on a SPARC v9 machine, which is a 64-bit big-endian architecture. I ran the pgbench script (see previous message) with default_seed=123 on both x86-64 and SPARC machines and visualized the results. You can find them in the attached chart. Both images showed the same d

Re: General purpose hashing func in pgbench

2018-03-14 Thread Ildar Musin
Hello Teodor, On 07.03.2018 16:21, Ildar Musin wrote: It turned out that the only big-endian machine I could run the test on is out of order. I finally managed to perform this test on a SPARC v9 machine, which is a 64-bit big-endian architecture. I ran the pgbench script (see previous message) with default_se

Re: General purpose hashing func in pgbench

2018-03-07 Thread Ildar Musin
Hello Teodor, 1) It seems like a good idea to credit Austin Appleby in the comments. Done. Also rebased to the latest master. I think that both points refer to the fact that the original algorithm accepts a byte string as input, slices it into 8-byte chunks and forms unsigned integer values from them.
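
To illustrate the byte-string point above, here is a minimal sketch in C. It is not code from the patch; the function names are hypothetical. It contrasts three ways of obtaining the 64-bit value that the mixing steps operate on: copying 8 raw bytes in host order (which depends on endianness), assembling them in an explicit byte order, and simply taking an integer argument by value (which involves no byte order at all).

```c
#include <stdint.h>
#include <string.h>

/* Byte-string style: copy 8 raw bytes in host order.
 * The resulting value differs between little- and big-endian hosts. */
static uint64_t chunk_from_bytes_native(const unsigned char *p)
{
    uint64_t k;

    memcpy(&k, p, sizeof k);
    return k;
}

/* Byte-string style with an explicit little-endian layout:
 * p[0] is the least significant byte on every host. */
static uint64_t chunk_from_bytes_le(const unsigned char *p)
{
    uint64_t k = 0;

    for (int i = 7; i >= 0; i--)
        k = (k << 8) | p[i];
    return k;
}

/* Integer style: the argument is already a number, so its value
 * (not its memory layout) is what gets hashed. */
static uint64_t chunk_from_int(int64_t val)
{
    return (uint64_t) val;
}
```

If a hash function takes the integer-style route, the same input and seed should give the same result on x86-64 and on a big-endian machine, which is the kind of property a cross-platform comparison can check.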

Re: General purpose hashing func in pgbench

2018-03-06 Thread Ildar Musin
Hello Teodor, Thank you for reviewing this patch. On 06.03.2018 15:53, Teodor Sigaev wrote: Patch applies, compiles, pgbench & global "make check" ok, doc built ok. Agree. If I understand upthread correctly, the implementation of the Murmur hash algorithm is based on Austin Appleby's work https://github.

Re: General purpose hashing func in pgbench

2018-03-06 Thread Teodor Sigaev
Patch applies, compiles, pgbench & global "make check" ok, doc built ok. Agree. If I understand upthread correctly, the implementation of the Murmur hash algorithm is based on Austin Appleby's work https://github.com/aappleby/smhasher/blob/master/src/MurmurHash2.cpp If so, I have a remark and some objections:

Re: General purpose hashing func in pgbench

2018-01-29 Thread Ildar Musin
On 29.01.2018 15:03, Fabien COELHO wrote: Patch applies, compiles, pgbench & global "make check" ok, doc built ok. Ok for me, switched to "Ready". Thank you for the thorough review! -- Ildar Musin i.mu...@postgrespro.ru

Re: General purpose hashing func in pgbench

2018-01-29 Thread Fabien COELHO
Hello Ildar, Fixed the doc, attached the patch. Thanks! Patch applies, compiles, pgbench & global "make check" ok, doc built ok. Ok for me, switched to "Ready". -- Fabien.

Re: General purpose hashing func in pgbench

2018-01-29 Thread Ildar Musin
Hi Fabien, On 28.01.2018 11:10, Fabien COELHO wrote: Hello Ildar, I did everything you mentioned here and attached a new version of the patch. Patch applies, compiles, runs ok. Alas, I still have a few more very minor comments about the doc, sorry again: No problem : ) +defaul

Re: General purpose hashing func in pgbench

2018-01-28 Thread Fabien COELHO
Hello Ildar, I did everything you mentioned here and attached a new version of the patch. Patch applies, compiles, runs ok. Alas, I still have a few more very minor comments about the doc, sorry again: +default_seed + random seed used in hash functions by default s/rando

Re: General purpose hashing func in pgbench

2018-01-27 Thread Ildar Musin
Hello Fabien, 26/01/2018 09:28, Fabien COELHO wrote: > > Hello Ildar, > > Applies, compiles, runs. > > I still have a few very minor comments, sorry for this (hopefully) > last iteration on my part. I'm kind of iterative... > > The XML documentation source should avoid a paragraph on one very l

Re: General purpose hashing func in pgbench

2018-01-25 Thread Fabien COELHO
Hello Ildar, Applies, compiles, runs. I still have a few very minor comments, sorry for this (hopefully) last iteration on my part. I'm kind of iterative... The XML documentation source should avoid a paragraph on one very long line, but rather be indented like other sections. I'd propo

Re: General purpose hashing func in pgbench

2018-01-25 Thread Ildar Musin
Hello Fabien, On 18.01.2018 12:06, Fabien COELHO wrote: My intention was to introduce seed variable which potentially could be used in different contexts, not only for hash functions. Yes, good point. I'd need it for the pseudo-random permutation function. I renamed it to 'hash_seed' for no

Re: General purpose hashing func in pgbench

2018-01-18 Thread Fabien COELHO
Hello Ildar, Patch v8 applies cleanly and compiles. Global and local "make check ok". Doc build ok. For me "random seed" is what is passed to srandom, so the variable should rather be named hash_seed because there could also be a random seed (actually, there is in another parallel patch:-). Mo

Re: General purpose hashing func in pgbench

2018-01-17 Thread Ildar Musin
Hello Fabien, 17/01/2018 10:52, Fabien COELHO wrote: >> Here is a new version of the patch. I've split it into two parts. The >> first one is almost the same as v4 from [1] with some refactoring. >> The second part introduces the random_seed variable as you proposed. > > Patch 1 applies. Compilation

Re: General purpose hashing func in pgbench

2018-01-16 Thread Fabien COELHO
Hello Ildar, Here is a new version of the patch. I've split it into two parts. The first one is almost the same as v4 from [1] with some refactoring. The second part introduces the random_seed variable as you proposed. Patch 1 applies. Compilation fails, there are two "hash_seed" declared in "p

Re: General purpose hashing func in pgbench

2018-01-16 Thread Ildar Musin
Hi Fabien, 13/01/2018 11:16, Fabien COELHO wrote: > > Hello Ildar, > >>> so that different instances of hash function within one script would >>> have different seeds. Yes, that is a good idea, I can do that. >>> >> Added this feature in the attached patch. But on second thought this could >> be so

Re: General purpose hashing func in pgbench

2018-01-13 Thread Fabien COELHO
Hello Ildar, so that different instances of hash function within one script would have different seeds. Yes, that is a good idea, I can do that. Added this feature in the attached patch. But on second thought this could be something that the user won't expect. For example, they may want to run pgbe

Re: General purpose hashing func in pgbench

2018-01-12 Thread Fabien COELHO
Hmm. I do not think that we should want a shared seed value. The seed should be different for each call so as to avoid undesired correlations. If wanted, correlation could be obtained by using an explicit identical seed. ISTM that the best way to add the seed is to call random() when the second

Re: General purpose hashing func in pgbench

2018-01-12 Thread Fabien COELHO
Hello Ildar, Hmm. I do not think that we should want a shared seed value. The seed should be different for each call so as to avoid undesired correlations. If wanted, correlation could be obtained by using an explicit identical seed. Probably I'm missing something but I cannot see the point.

Re: General purpose hashing func in pgbench

2018-01-12 Thread Ildar Musin
Hello Fabien, 11/01/2018 19:21, Ildar Musin wrote: > > 10/01/2018 21:42, Fabien COELHO wrote: >> Hmm. I do not think that we should want a shared seed value. The seed >> should be different for each call so as to avoid undesired >> correlations. If wanted, correlation could be obtained by using an

Re: General purpose hashing func in pgbench

2018-01-11 Thread Ildar Musin
10/01/2018 21:42, Fabien COELHO wrote: > > Hmm. I do not think that we should want a shared seed value. The seed > should be different for each call so as to avoid undesired > correlations. If wanted, correlation could be obtained by using an > explicit identical seed. > > ISTM that the best way

Re: General purpose hashing func in pgbench

2018-01-11 Thread Ildar Musin
Hello Fabien, 10/01/2018 21:42, Fabien COELHO wrote: Should we probably add some infrastructure for optional arguments? >>> >>> You can look at the handling of "CASE" which may or may not have an >>> "ELSE" clause. >>> >>> I'd suggest you use a new negative argument with the special meaning

Re: General purpose hashing func in pgbench

2018-01-10 Thread Fabien COELHO
Patch needs a rebase after Teodor's push for a set of pgbench functions. Done. Congratulations on your patch finally being committed : ) I forgot: please provide a doc & some coverage tests as well! -- Fabien.

Re: General purpose hashing func in pgbench

2018-01-10 Thread Fabien COELHO
Hello Ildar, Patch needs a rebase after Teodor's push for a set of pgbench functions. Done. Congratulations on your patch finally being committed : ) Over 21 months... I hope that pgbench will have hash functions sooner:-) Should we probably add some infrastructure for optional arguments?

Re: General purpose hashing func in pgbench

2018-01-10 Thread Ildar Musin
10/01/2018 16:35, Ildar Musin wrote: > 09/01/2018 23:11, Fabien COELHO wrote: >> Hello Ildar, >> >>> Sorry for the long delay. I've added hash() function which is just an >>> alias for murmur2. I've also utilized variable arguments feature from >>> least()/greatest() functions to make optional seed p

Re: General purpose hashing func in pgbench

2018-01-10 Thread Ildar Musin
09/01/2018 23:11, Fabien COELHO wrote: > > Hello Ildar, > >> Sorry for the long delay. I've added hash() function which is just an >> alias for murmur2. I've also utilized variable arguments feature from >> least()/greatest() functions to make optional seed parameter, but I >> consider this as a hack

Re: General purpose hashing func in pgbench

2018-01-09 Thread Fabien COELHO
Hello Ildar, Sorry for the long delay. I've added hash() function which is just an alias for murmur2. I've also utilized variable arguments feature from least()/greatest() functions to make optional seed parameter, but I consider this as a hack. Patch needs a rebase after Teodor's push for a set o

Re: General purpose hashing func in pgbench

2018-01-09 Thread Ildar Musin
Hello Fabien, 25/12/2017 19:17, Fabien COELHO wrote: > >>> However, the key can be used if controlled so that different values do >>> not have the same randomization in different parts of the script, so as >>> to avoid using the same patterns for different things if not desirable. >> >> Original m

Re: General purpose hashing func in pgbench

2018-01-06 Thread Stephen Frost
Greetings Ildar, * Fabien COELHO (coe...@cri.ensmp.fr) wrote: > >>I noticed from the source of all human knowledge (aka Wikipedia:-) > >>that there seems to be a murmur3 successor. Have you considered it? > >>One good reason to skip it would be that the implementation is long > >>and complex. I'm

Re: General purpose hashing func in pgbench

2017-12-26 Thread Fabien COELHO
Bonjour Daniel, Most "permutation" functions are really cryptographic cyphers which are quite expensive, and require powers of two, which is not what is needed. ISTM that there are some constructs to deal with arbitrary sizes based on cryptographic functions, but that would make it too expensiv

Re: General purpose hashing func in pgbench

2017-12-26 Thread Daniel Verite
Fabien COELHO wrote: > Most "permutation" functions are really cryptographic cyphers which are > quite expensive, and require powers of two, which is not what is needed. > ISTM that there are some constructs to deal with arbitrary sizes based on > cryptographic functions, but that would

Re: General purpose hashing func in pgbench

2017-12-25 Thread Fabien COELHO
Hello, However, the key can be used if controlled so that different values do not have the same randomization in different parts of the script, so as to avoid using the same patterns for different things if not desirable. The original murmur algorithm accepts a seed as a parameter, which can be used

Re: General purpose hashing func in pgbench

2017-12-25 Thread Ildar Musin
25/12/2017 17:12, Fabien COELHO wrote: > > However, the key can be used if controlled so that different values do > not have the same randomization in different parts of the script, so as > to avoid using the same patterns for different things if not desirable. The original murmur algorithm accepts a see

Re: General purpose hashing func in pgbench

2017-12-25 Thread Fabien COELHO
Hello, I noticed from the source of all human knowledge (aka Wikipedia:-) that there seems to be a murmur3 successor. Have you considered it? One good reason to skip it would be that the implementation is long and complex. I'm not sure about an 8-byte input simplified version. Murmur2 naturall

Re: General purpose hashing func in pgbench

2017-12-25 Thread Ildar Musin
Hello Fabien, 24/12/2017 11:12, Fabien COELHO wrote: > > Yep. The ugliness is significantly linked to the choice of name. With > MM2_MUL and MM2_ROT ISTM that it is more readable: > >>     k *= MM2_MUL; >>     k ^= k >> MM2_ROT; >>     k *= MM2_MUL; >>     result ^= k; >>     result *= MM2_MUL; O
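
For context, the mixing lines quoted above can be placed into a complete function roughly as follows. This is a minimal sketch, not the patch itself: the function name murmur2_int64 is hypothetical, and the constant values and the initialization and finalization steps are assumed from the 64-bit variant in Austin Appleby's MurmurHash2.cpp, specialized to a single 8-byte input. Only the names MM2_MUL and MM2_ROT and the five quoted statements come from the thread.

```c
#include <stdint.h>

/* Assumed values: multiplier and shift of the 64-bit MurmurHash2 variant. */
#define MM2_MUL UINT64_C(0xc6a4a7935bd1e995)
#define MM2_ROT 47

/* Hash one 64-bit integer with a seed (hypothetical helper name). */
static uint64_t murmur2_int64(int64_t val, uint64_t seed)
{
    /* The input length is fixed at 8 bytes for a single int64. */
    uint64_t result = seed ^ (UINT64_C(8) * MM2_MUL);
    uint64_t k = (uint64_t) val;

    /* Mix the single 8-byte chunk (the statements quoted in the thread). */
    k *= MM2_MUL;
    k ^= k >> MM2_ROT;
    k *= MM2_MUL;

    result ^= k;
    result *= MM2_MUL;

    /* Final avalanche. */
    result ^= result >> MM2_ROT;
    result *= MM2_MUL;
    result ^= result >> MM2_ROT;

    return result;
}
```

Every step here is invertible on 64-bit values (XOR, multiplication by an odd constant, xor-shift), so for a fixed seed distinct inputs give distinct 64-bit outputs; collisions only appear once the result is reduced to a smaller range.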

Re: General purpose hashing func in pgbench

2017-12-24 Thread Fabien COELHO
Hello Ildar, Actually the "bad" one appears in YCSB. Fine. Then it must be kept, whatever its quality. But if we had to choose only one I would stick to murmur too given it provides better results while having similar computational complexity. No. Keep both as there is a justificatio

Re: General purpose hashing func in pgbench

2017-12-22 Thread Ildar Musin
21/12/2017 18:26, Fabien COELHO wrote: > >> I think it is not commitfest ready yet -- I need to add some >> documentation and tests first. > > Yes, doc & test are missing. > > From your figures, the murmur2 algorithm output looks way better. I'm > wondering whether it makes sense to provide a bad

Re: General purpose hashing func in pgbench

2017-12-21 Thread Fabien COELHO
I think it is not commitfest ready yet -- I need to add some documentation and tests first. Yes, doc & test are missing. From your figures, the murmur2 algorithm output looks way better. I'm wondering whether it makes sense to provide a bad hash function if a good/better one is available, u

Re: General purpose hashing func in pgbench

2017-12-21 Thread Ildar Musin
21/12/2017 15:44, Fabien COELHO wrote: > >>> Add the submission to the next CF? >> I think it is not commitfest ready yet -- I need to add some >> documentation and tests first. > > It just helps that the thread is referenced, and the review process > has started anyway. > You are right, I've su

Re: General purpose hashing func in pgbench

2017-12-21 Thread Fabien COELHO
Add the submission to the next CF? I think it is not commitfest ready yet -- I need to add some documentation and tests first. It just helps that the thread is referenced, and the review process has started anyway. -- Fabien.

Re: General purpose hashing func in pgbench

2017-12-19 Thread Fabien COELHO
Hello Ildar, Following up the recent discussion on zipfian distribution I was trying to reproduce some YCSB-like workloads. As this paper [1] describes, YCSB uses zipfian distribution to generate keys in order to simulate intensive load on a small number of records, as happens in real world applic
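
One way to picture why a hash function is useful for such workloads: a zipfian generator produces small ranks most often, so without further processing the hot records would all sit at the start of the key space. Hashing the rank before using it as a key spreads the hot records around. The sketch below is an illustration only, not code from the patch; hash64 and scatter_key are hypothetical names, and the mixer is the murmur2-style one assumed from MurmurHash2.cpp.

```c
#include <assert.h>
#include <stdint.h>

/* Murmur2-style mixer for one 64-bit value (assumed constants). */
static uint64_t hash64(uint64_t k, uint64_t seed)
{
    const uint64_t mul = UINT64_C(0xc6a4a7935bd1e995);
    uint64_t result = seed ^ (UINT64_C(8) * mul);

    k *= mul;
    k ^= k >> 47;
    k *= mul;

    result ^= k;
    result *= mul;
    result ^= result >> 47;
    result *= mul;
    result ^= result >> 47;

    return result;
}

/* Map a popularity rank (1 = hottest) to a key in [1, nkeys]. */
static uint64_t scatter_key(uint64_t rank, uint64_t nkeys, uint64_t seed)
{
    assert(nkeys > 0);
    return hash64(rank, seed) % nkeys + 1;
}
```

Note that reducing the hash modulo nkeys is not a permutation: two different ranks may land on the same key. An explicit seed gives a reproducible layout, while different seeds give different layouts of the hot keys.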