50 million HTML pages (part of the ClueWeb09 dataset for TREC) were indexed using Hadoop into 56 indexes. The 56 indexes were then merged into a single index. The analyzer is StandardAnalyzer.
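
For reference, here is a minimal sketch of the position arithmetic a two-term sloppy phrase such as "Apache Jakarta"~10 relies on. It is not Lucene code and does not use the Lucene API: `tokenize` and `within_slop` are hypothetical helpers, and the tokenizer is only a rough stand-in for an analyzer (lowercase, split on non-alphanumerics, no stop-word handling). It assumes the usual description of slop as the number of position moves needed to line the terms up, so terms in query order cost one move per intervening token and a reversed adjacent pair costs two.

```python
import re


def tokenize(text):
    # Rough stand-in for an analyzer (an assumption, not StandardAnalyzer):
    # lowercase the text and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def within_slop(tokens, first, second, slop):
    """Return True if some occurrence of `first` and `second` (two different
    terms) is close enough to satisfy a two-term phrase with the given slop."""
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [i for i, t in enumerate(tokens) if t == second]
    for a in first_positions:
        for b in second_positions:
            # An exact phrase has `second` at position a + 1, so the number
            # of moves needed is the distance of b from that ideal position.
            moves = abs(b - (a + 1))
            if moves <= slop:
                return True
    return False


if __name__ == "__main__":
    sample = "The Apache Software Foundation hosts the Jakarta project"
    print(within_slop(tokenize(sample), "apache", "jakarta", 10))
```

Running a check like this over the text of a page that is expected to match can help separate "the terms are simply too far apart" from a problem in how the index was built or merged.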

On Thu, Jul 16, 2009 at 6:07 PM, Anshum <ansh...@gmail.com> wrote:
> Hi Prashant,
>
> What did you index? how did you index? what analyzer did you use? without
> all of these, perhaps it'd be difficult to figure out the issue.
>
> --
> Anshum Gupta
> Naukri Labs!
> http://ai-cafe.blogspot.com
>
> The facts expressed here belong to everybody, the opinions to me. The
> distinction is yours to draw............
>
>
> On Thu, Jul 16, 2009 at 6:04 PM, prashant ullegaddi <
> prashullega...@gmail.com> wrote:
>
> > Hi,
> >
> > I tried searching:
> > "Apache Jakarta"~10
> >
> > Nothing was returned. What might be wrong?
> >
> > Regards,
> > Prashant.