Hi Mathieu,

> What do you want to do?

A spell checker and related keyword suggestion.

> If you want an ngram => popularity map, just use a Berkeley DB, and use
> this information in your Lucene application. Lucene is an inverted index,
> Berkeley DB an index.

Great idea! Berkeley DB is definitely worth a try, simple and effective, but
I'll have to preprocess the data first. I was hoping to take advantage of
Lucene's built-in features.
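For the ngram => count map, I'm picturing something like the rough sketch
below with Berkeley DB Java Edition (com.sleepycat.je). It's untested, and the
class name, environment directory and database name are just placeholders, but
it shows the layout I have in mind: the ngram string as the key and its
observed frequency as the value.

    import java.io.File;

    import com.sleepycat.bind.tuple.LongBinding;
    import com.sleepycat.je.Database;
    import com.sleepycat.je.DatabaseConfig;
    import com.sleepycat.je.DatabaseEntry;
    import com.sleepycat.je.Environment;
    import com.sleepycat.je.EnvironmentConfig;
    import com.sleepycat.je.OperationStatus;

    // Minimal ngram => frequency map on top of Berkeley DB JE (untested sketch).
    public class NgramCounts {

        private final Environment env;
        private final Database db;

        public NgramCounts(File dir) throws Exception {
            EnvironmentConfig envCfg = new EnvironmentConfig();
            envCfg.setAllowCreate(true);
            env = new Environment(dir, envCfg);

            DatabaseConfig dbCfg = new DatabaseConfig();
            dbCfg.setAllowCreate(true);
            db = env.openDatabase(null, "ngrams", dbCfg);
        }

        // Store one "ngram <TAB> count" row from the corpus.
        public void put(String ngram, long count) throws Exception {
            DatabaseEntry key = new DatabaseEntry(ngram.getBytes("UTF-8"));
            DatabaseEntry value = new DatabaseEntry();
            LongBinding.longToEntry(count, value);
            db.put(null, key, value);
        }

        // Return the observed frequency of an ngram, or -1 if it is absent.
        public long count(String ngram) throws Exception {
            DatabaseEntry key = new DatabaseEntry(ngram.getBytes("UTF-8"));
            DatabaseEntry value = new DatabaseEntry();
            if (db.get(null, key, value, null) == OperationStatus.SUCCESS) {
                return LongBinding.entryToLong(value);
            }
            return -1;
        }

        public void close() throws Exception {
            db.close();
            env.close();
        }
    }

The Lucene index would then stay a normal inverted index over my documents,
and the application would only consult this map for frequencies when doing
spell checking or suggestions.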
[]s

On Wed, Apr 23, 2008 at 10:16 AM, Mathieu Lecarme <[EMAIL PROTECTED]> wrote:

> Rafael Turk wrote:
> > Hi Folks,
> >
> > I'm trying to load Google Web 1T 5 Gram into Lucene. (This corpus contains
> > English word n-grams and their observed frequency counts. The length of
> > the n-grams ranges from unigrams (single words) to five-grams.)
> >
> > I'm loading each ngram (each row is an ngram) as an individual Document.
> > This way I'll be able to search for each ngram separately, but I'm ending
> > up with huge indexes, which makes them very hard to load and read.
> >
> > Is there a better way to load and read ngrams into a Lucene index? Maybe
> > using a lower level API?
> >
> > More info about the Google Web 1T 5 Gram corpus at:
> > <http://www.ldc.upenn.edu/Catalog/docs/LDC2006T13/readme.txt>
> >
> > Thanks,
> >
> > Rafael
> >
>
> What do you want to do?
> If you want an ngram => popularity map, just use a Berkeley DB, and use
> this information in your Lucene application. Lucene is an inverted index,
> Berkeley DB an index.
>
> M.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>