Re: Lucene and Google Web 1T 5 Gram

2008-04-24 Thread Karl Wettin
Rafael Turk wrote: *Great idea! Berkeley DB is definitely worth a try: simple and effective, but I'll have to preprocess the data first. JDBM has a more appealing license if you ask the ASF. karl

Re: Lucene and Google Web 1T 5 Gram

2008-04-24 Thread Mathieu Lecarme
Rafael Turk wrote: Hi Mathieu, *What do you want to do?* A spell checker and related keyword suggestions. Here is a spell checker which I am trying to finalize: https://admin.garambrogne.net/projets/revuedepresse/browser/trunk/src/java If you want an ngram => popularity map, just use a berkl
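To make the spell-checker idea concrete, here is a minimal sketch of ranking correction candidates by n-gram popularity. It is not the code behind the link above; it uses the well-known "all strings one edit away" candidate generation, and the names `UNIGRAM_COUNTS`, `edits1` and `suggest` as well as the counts are made up for illustration.

```python
import string

# Hypothetical unigram counts, invented for this example only.
UNIGRAM_COUNTS = {"lucene": 500, "license": 900, "index": 1500}


def edits1(word):
    """Return every string one delete, swap, replace or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    swaps = [left + right[1] + right[0] + right[2:]
             for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + swaps + replaces + inserts)


def suggest(word, counts, limit=3):
    """Suggest known words for a possibly misspelled word, most popular first."""
    if word in counts:
        # A word that already appears in the corpus is left alone.
        return [word]
    candidates = [w for w in edits1(word) if w in counts]
    # Sort by descending count, then alphabetically to keep ties stable.
    return sorted(candidates, key=lambda w: (-counts[w], w))[:limit]
```

With real data the `counts` argument would be backed by whatever store holds the Web 1T unigram counts rather than an in-memory dict.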

Re: Lucene and Google Web 1T 5 Gram

2008-04-23 Thread Rafael Turk
Hi Mathieu, *What do you want to do?* A spell checker and related keyword suggestions. If you want an ngram => popularity map, just use a Berkeley DB, and use this information in your Lucene application. Lucene is an inverted index, Berkeley DB an index. *Great idea! Berkeley DB is definitely a t
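As a rough illustration of the "ngram => popularity map" suggestion, the sketch below stores counts in an on-disk key-value database. It uses Python's standard `dbm` module as a stand-in, not Berkeley DB itself; the function names, the database path and the sample counts are all assumptions made for the example.

```python
import dbm
import os
import tempfile


def build_popularity_map(path, rows):
    """Store (n-gram, count) pairs in an on-disk key-value database."""
    with dbm.open(path, "c") as db:
        for ngram, count in rows:
            # dbm keys and values are bytes, so encode both sides.
            db[ngram.encode("utf-8")] = str(count).encode("utf-8")


def popularity(path, ngram):
    """Return the stored count for an n-gram, or 0 if it is not in the map."""
    key = ngram.encode("utf-8")
    with dbm.open(path, "r") as db:
        if key in db:
            return int(db[key])
        return 0


# Hypothetical sample rows, invented for this example only.
SAMPLE_ROWS = [("new york", 1200), ("new yrok", 3), ("search engine", 870)]
DB_PATH = os.path.join(tempfile.mkdtemp(), "ngrams")
build_popularity_map(DB_PATH, SAMPLE_ROWS)
```

A Lucene application could then call something like `popularity(DB_PATH, "new york")` at query time to weight suggestions, keeping the counts outside the Lucene index.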

Re: Lucene and Google Web 1T 5 Gram

2008-04-23 Thread Rafael Turk
Thanks Julien, I'll definitely give it a try!!! []s Rafael On Wed, Apr 23, 2008 at 8:38 AM, Julien Nioche < [EMAIL PROTECTED]> wrote: > Hi Raphael, > > We initially tried to do the same but ended up developing our own API for > querying the Web 1T. You can find more details on > http://digita

Re: Lucene and Google Web 1T 5 Gram

2008-04-23 Thread Mathieu Lecarme
Rafael Turk wrote: Hi Folks, I'm trying to load the Google Web 1T 5-gram corpus into Lucene. (This corpus contains English word n-grams and their observed frequency counts. The length of the n-grams ranges from unigrams (single words) to five-grams.) I'm loading each n-gram (each row is an n-gram) as an

Re: Lucene and Google Web 1T 5 Gram

2008-04-23 Thread Julien Nioche
Hi Raphael, We initially tried to do the same but ended up developing our own API for querying the Web 1T. You can find more details on http://digitalpebble.com/resources.html There could be a way to reuse elements from Lucene e.g. the Term index only but I could not find an obvious way to achieve
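One way to picture a "term index only" lookup is a sorted list of n-grams searched by bisection, which supports both exact and prefix queries. The sketch below is only that picture: it is neither Lucene's term dictionary nor the API mentioned above, and the names `TERMS`, `TERM_COUNTS`, `lookup` and `with_prefix` and all counts are made up.

```python
import bisect

# Hypothetical entries, invented for this example; TERMS must stay sorted.
TERMS = ["new", "new york", "new york city", "new zealand", "news"]
TERM_COUNTS = [9000, 1200, 300, 800, 4000]


def lookup(ngram):
    """Return the count of an exact n-gram, or 0 if it is not listed."""
    i = bisect.bisect_left(TERMS, ngram)
    if i < len(TERMS) and TERMS[i] == ngram:
        return TERM_COUNTS[i]
    return 0


def with_prefix(prefix):
    """Return (n-gram, count) pairs for every n-gram starting with prefix."""
    start = bisect.bisect_left(TERMS, prefix)
    matches = []
    for i in range(start, len(TERMS)):
        # Entries sharing a prefix are contiguous in sorted order.
        if not TERMS[i].startswith(prefix):
            break
        matches.append((TERMS[i], TERM_COUNTS[i]))
    return matches
```

Both operations cost a binary search plus the size of the result, which is the property that makes a sorted term list attractive for a read-only corpus of this kind.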

Lucene and Google Web 1T 5 Gram

2008-04-23 Thread Rafael Turk
Hi Folks, I'm trying to load the Google Web 1T 5-gram corpus into Lucene. (This corpus contains English word n-grams and their observed frequency counts. The length of the n-grams ranges from unigrams (single words) to five-grams.) I'm loading each n-gram (each row is an n-gram) as an individual Document. Th
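For readers following the "each row is an n-gram" step, here is a small sketch of turning corpus rows into one record per n-gram before indexing. It assumes each row is the space-separated n-gram, a tab, then the count; `NgramRecord` and `parse_rows` are hypothetical names, and no Lucene calls are shown.

```python
from typing import Iterable, Iterator, NamedTuple


class NgramRecord(NamedTuple):
    ngram: str   # the words of the n-gram, separated by single spaces
    order: int   # number of words: 1 for unigrams up to 5 for five-grams
    count: int   # observed frequency count


def parse_rows(lines: Iterable[str]) -> Iterator[NgramRecord]:
    """Yield one record per row of the assumed 'n-gram<TAB>count' format."""
    for line in lines:
        line = line.rstrip("\n")
        ngram, sep, count = line.rpartition("\t")
        if not sep or not ngram:
            # Skip blank rows and rows without a tab-separated count.
            continue
        yield NgramRecord(ngram, len(ngram.split(" ")), int(count))
```

Each yielded record maps naturally onto one indexed document, with `ngram`, `order` and `count` as its fields.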