Hello,

Today I was reading a blog post from a coworker (http://www.depesz.com/index.php/2010/12/11/waiting-for-9-1-knngist/) and started experimenting with the trigram contrib package (pg_trgm) for Postgres, using word dictionaries for English and German. I wanted to see how well particular queries would perform if SIGLENINT in trgm.h were set to the average word length of a given dictionary:
English (http://packages.ubuntu.com/dapper/wamerican):

compling=# SELECT AVG(LENGTH(CAST(word AS bytea), 'UTF8')) FROM english_words;
        avg
--------------------
 8.4498980409662267

German (http://packages.ubuntu.com/dapper/wngerman):

compling=# SELECT AVG(LENGTH(CAST(word AS bytea), 'UTF8')) FROM words; -- German
         avg
---------------------
 11.9518056504365566

(Unsurprisingly, German words are on average longer than English ones.)

Effectively, I want the trigram package to behave like an n-gram implementation in which I set N explicitly at build time. However, I am not very proficient in C, and I doubt that SIGLENINT is the only change needed to convert the trigram contrib module to n-grams: after changing SIGLENINT to 12 in trgm.h, I still get trigram results from show_trgm().

I was hoping someone familiar with the code could point me toward the changes needed to make the trigram implementation behave as an n-gram implementation.

Thanks for your time; I appreciate any advice anyone can give me.

Anthony Gentile
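
P.S. To make the goal concrete, here is a rough sketch in Python (not pg_trgm code) of the behaviour I am after. The function name show_ngrams and its parameter n are hypothetical, and the padding rule (n - 1 spaces in front, one space behind) is my own generalisation of what show_trgm() appears to do for n = 3. It handles a single word only and does no word splitting.

```python
def show_ngrams(word, n=3):
    """Return the sorted, de-duplicated n-grams of a single word.

    Hypothetical helper, not part of pg_trgm. The padding is an
    assumption generalised from the trigram case: n - 1 spaces are
    added in front of the word and one space behind it.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    padded = " " * (n - 1) + word.lower() + " "
    # Slide a window of width n across the padded word.
    return sorted({padded[i:i + n] for i in range(len(padded) - n + 1)})


print(show_ngrams("cat"))       # ['  c', ' ca', 'at ', 'cat']
print(show_ngrams("cat", n=4))  # ['   c', '  ca', ' cat', 'cat ']
```

With n = 3 this gives the same four trigrams that show_trgm('cat') returns; what I would like is for the contrib module to do the equivalent for whatever N it was built with.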