Thanks Julien, I'll definitely give it a try!
[]s
Rafael

On Wed, Apr 23, 2008 at 8:38 AM, Julien Nioche <[EMAIL PROTECTED]> wrote:
> Hi Rafael,
>
> We initially tried to do the same but ended up developing our own API for
> querying the Web 1T. You can find more details at
> http://digitalpebble.com/resources.html
> There could be a way to reuse elements from Lucene, e.g. the term index
> only, but I could not find an obvious way to achieve that.
>
> Best,
>
> Julien
>
> --
> DigitalPebble Ltd
> http://www.digitalpebble.com
>
>
> On 23/04/2008, Rafael Turk <[EMAIL PROTECTED]> wrote:
> >
> > Hi Folks,
> >
> > I'm trying to load the Google Web 1T 5-gram corpus into Lucene. (This
> > corpus contains English word n-grams and their observed frequency
> > counts. The length of the n-grams ranges from unigrams (single words)
> > to five-grams.)
> >
> > I'm loading each n-gram (each row is one n-gram) as an individual
> > Document. This way I'll be able to search for each n-gram separately,
> > but I'm ending up with huge indexes, which makes them very hard to
> > load and read.
> >
> > Is there a better way to load and read n-grams in a Lucene index?
> > Maybe using a lower-level API?
> >
> >
> > More info about the Google Web 1T 5-gram corpus at:
> > <http://www.ldc.upenn.edu/Catalog/docs/LDC2006T13/readme.txt>
> >
> > Thanks,
> >
> >
> > Rafael
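
For reference, here is a minimal sketch of the one-Document-per-n-gram
approach Rafael describes, written against a recent Lucene API (the field
names, the tab-separated row format, and the index path are assumptions,
and the API has changed considerably since the 2.x releases current in
2008):

    import org.apache.lucene.analysis.core.KeywordAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StoredField;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.nio.file.Paths;

    public class NgramIndexer {
        public static void main(String[] args) throws Exception {
            // Each corpus row is "<ngram>\t<count>", e.g.
            // "ceramics collectables collectibles\t55"
            try (FSDirectory dir = FSDirectory.open(Paths.get("ngram-index"));
                 IndexWriter writer = new IndexWriter(dir,
                         new IndexWriterConfig(new KeywordAnalyzer()));
                 BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
                String line;
                while ((line = in.readLine()) != null) {
                    int tab = line.lastIndexOf('\t');
                    Document doc = new Document();
                    // StringField indexes the whole n-gram as a single,
                    // untokenized term, so each row is searchable as-is
                    doc.add(new StringField("ngram",
                            line.substring(0, tab), Field.Store.YES));
                    // The frequency count is stored for retrieval only,
                    // not indexed
                    doc.add(new StoredField("count",
                            Long.parseLong(line.substring(tab + 1))));
                    writer.addDocument(doc);
                }
            }
        }
    }

One Document per row is exactly what produces the very large indexes Rafael
mentions, since the corpus holds on the order of a billion n-grams; that is
the trade-off that motivates Julien's suggestion of a dedicated API, or of
reusing only Lucene's term index rather than full Documents.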