Re: Exploiting a whole lot of memory

2013-10-09 Thread Benson Margulies
On Wed, Oct 9, 2013 at 7:18 PM, Michael McCandless <luc...@mikemccandless.com> wrote: > On Wed, Oct 9, 2013 at 7:13 PM, Benson Margulies > wrote: > > On Tue, Oct 8, 2013 at 5:50 PM, Michael McCandless < > > luc...@mikemccandless.com> wrote: > > > >> DirectPostingsFormat? > >> > >> It stores all

Re: Exploiting a whole lot of memory

2013-10-09 Thread Michael McCandless
On Wed, Oct 9, 2013 at 7:13 PM, Benson Margulies wrote: > On Tue, Oct 8, 2013 at 5:50 PM, Michael McCandless < > luc...@mikemccandless.com> wrote: > >> DirectPostingsFormat? >> >> It stores all terms + postings as simple java arrays, uncompressed. >> > > This definitely speeded things up in my ben

Re: Exploiting a whole lot of memory

2013-10-09 Thread Benson Margulies
On Tue, Oct 8, 2013 at 5:50 PM, Michael McCandless <luc...@mikemccandless.com> wrote: > DirectPostingsFormat? > > It stores all terms + postings as simple java arrays, uncompressed. > This definitely speeded things up in my benchmark, but I'm greedy for more. I just made a codec that returns it
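The quoted description of DirectPostingsFormat ("all terms + postings as simple java arrays, uncompressed") can be illustrated with a toy, plain-JDK sketch. This is not Lucene's implementation: the class `ArrayPostings` is hypothetical, it stores only doc IDs (no frequencies, positions or payloads), and it tokenizes on whitespace. It shows why such a layout is fast: a lookup is one binary search over a sorted `String[]` followed by a direct read of an `int[]`, with no block decoding, at the cost of far more heap than a compressed format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical toy index: terms and postings kept as plain, uncompressed Java arrays. */
public class ArrayPostings {
    private final String[] terms;   // sorted term dictionary
    private final int[][] postings; // postings[i] = increasing doc IDs containing terms[i]

    public ArrayPostings(String[] docs) {
        // TreeMap keeps terms sorted, so the final array supports binary search.
        TreeMap<String, List<Integer>> inverted = new TreeMap<>();
        for (int docId = 0; docId < docs.length; docId++) {
            for (String token : docs[docId].toLowerCase().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                List<Integer> ids = inverted.computeIfAbsent(token, k -> new ArrayList<>());
                // Doc IDs arrive in increasing order; record each doc only once per term.
                if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) {
                    ids.add(docId);
                }
            }
        }
        terms = inverted.keySet().toArray(new String[0]);
        postings = new int[terms.length][];
        int i = 0;
        for (List<Integer> ids : inverted.values()) {
            // Copy into a raw int[]: no delta encoding, no compression.
            postings[i++] = ids.stream().mapToInt(Integer::intValue).toArray();
        }
    }

    /** Doc IDs for a term (empty if absent): one binary search, nothing to decode. */
    public int[] docs(String term) {
        int idx = Arrays.binarySearch(terms, term);
        return idx >= 0 ? postings[idx] : new int[0];
    }

    public static void main(String[] args) {
        ArrayPostings index = new ArrayPostings(new String[] {
            "lucene keeps postings", "postings in memory", "memory mapped files"
        });
        System.out.println(Arrays.toString(index.docs("postings"))); // [0, 1]
        System.out.println(Arrays.toString(index.docs("memory")));   // [1, 2]
    }
}
```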

Re: Lucene in-memory index

2013-10-09 Thread Vitaly Funstein
I don't think you want to load indexes of this size into a RAMDirectory. The reasons have been listed multiple times here... in short, just use MMapDirectory. On Wed, Oct 9, 2013 at 3:17 PM, Igor Shalyminov wrote: > Hello! > > I need to perform an experiment of loading the entire index in RAM an
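For context, Lucene's MMapDirectory builds on the JDK's memory-mapping API (`FileChannel.map`), so index bytes are served from the operating system's page cache instead of being copied onto the Java heap the way RAMDirectory does. The plain-JDK sketch below shows only that underlying mechanism, not Lucene code; the class name `MmapSketch` and the temporary file standing in for an index file are hypothetical.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of the JDK memory mapping that MMapDirectory relies on. */
public class MmapSketch {
    /** Maps a whole file read-only; the OS page cache, not the Java heap, holds the bytes. */
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            // A single MappedByteBuffer cannot exceed Integer.MAX_VALUE bytes, so a real
            // implementation has to map large files in several chunks.
            if (size > Integer.MAX_VALUE) {
                throw new IOException("file too large for a single mapping: " + size);
            }
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for an index file.
        Path tmp = Files.createTempFile("segment", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {10, 20, 30, 40});
        MappedByteBuffer buf = mapReadOnly(tmp);
        // Random access without first reading the file into a heap byte[].
        System.out.println(buf.get(2));     // 30
        System.out.println(buf.capacity()); // 4
    }
}
```

Whether pages stay resident is up to the OS, which is why the usual advice is to leave enough free RAM for the page cache rather than enlarging the JVM heap.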

Re: Wildcard question

2013-10-09 Thread Jack Krupansky
You get to decide: class QueryParser extends QueryParserBase: /** * Set to true to allow leading wildcard characters. * * When set, * or ? are allowed as * the first character of a PrefixQuery and WildcardQuery. * Note that this can produce very slow * queries on big indexes. * * Default: fals
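The Javadoc's warning that leading wildcards "can produce very slow queries on big indexes" follows from how a sorted term dictionary is searched. In the simplified model below (plain JDK, hypothetical class `WildcardCost`, with a regex standing in for Lucene's automaton-based matching), a prefix pattern such as `ban*` can seek to the first candidate term and stop once terms no longer share the prefix, while `*ana` has no fixed prefix and must test every term. In Lucene 4.x the option itself is enabled with `setAllowLeadingWildcard(true)` on the query parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Simplified, hypothetical model of prefix vs. leading-wildcard term lookup. */
public class WildcardCost {
    /** Matching terms plus how many dictionary entries were examined to find them. */
    static final class Result {
        final List<String> matches;
        final int examined;

        Result(List<String> matches, int examined) {
            this.matches = matches;
            this.examined = examined;
        }
    }

    /** Prefix query (e.g. "ban*"): seek with binary search, stop at the first non-match. */
    static Result prefixMatches(String[] sortedTerms, String prefix) {
        int start = Arrays.binarySearch(sortedTerms, prefix);
        if (start < 0) {
            start = -start - 1; // insertion point = first term >= prefix
        }
        List<String> out = new ArrayList<>();
        int examined = 0;
        for (int i = start; i < sortedTerms.length; i++) {
            examined++;
            if (!sortedTerms[i].startsWith(prefix)) {
                break; // sorted order: no later term can share the prefix
            }
            out.add(sortedTerms[i]);
        }
        return new Result(out, examined);
    }

    /** Leading wildcard (e.g. "*ana"): no fixed prefix, so every term is tested. */
    static Result wildcardMatches(String[] sortedTerms, String pattern) {
        // Translate * and ? into regex; quote every other character literally.
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        Pattern compiled = Pattern.compile(regex.toString());
        List<String> out = new ArrayList<>();
        int examined = 0;
        for (String term : sortedTerms) {
            examined++;
            if (compiled.matcher(term).matches()) {
                out.add(term);
            }
        }
        return new Result(out, examined);
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "banana", "bandana", "cabana", "manage", "mango"};
        Result prefix = prefixMatches(terms, "ban");
        Result leading = wildcardMatches(terms, "*ana");
        System.out.println(prefix.matches + " examined=" + prefix.examined);   // [banana, bandana] examined=3
        System.out.println(leading.matches + " examined=" + leading.examined); // [banana, bandana, cabana] examined=6
    }
}
```

With six terms the gap is only 3 versus 6 examined entries; with millions of unique terms in a field, the leading-wildcard case in this model still has to visit all of them.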

Wildcard question

2013-10-09 Thread Carlos de Luna Saenz
I've used Lucene 2, 3 and now 4... I used to believe that a * wildcard at the beginning was accepted since 3 (but I never used it), and reviewing the documentation, it says "Note: You cannot use a * or ? symbol as the first character of a search." Is that correct, or is it an outdated note on the http://lucene.apache.or

Lucene in-memory index

2013-10-09 Thread Igor Shalyminov
Hello! I need to perform an experiment of loading the entire index in RAM and seeing how the search performance changes. My index has TermVectors with payload and position info, StoredFields, and DocValues. It takes ~30GB on disk (the server has 48). _indexDirectoryReader = DirectoryReader.open

Re: Synonym Search in Lucene..

2013-10-09 Thread Koji Sekiguchi
Hi Vignesh, I'm not sure it will satisfy you, but there seems to be a global WordNet project: http://globalwordnet.org/ koji (13/10/09 21:40), VIGNESH S wrote: Hi Koji, I got your idea. It's awesome. But my problem is the dictionary corpus itself. If I use WordNet, it can create a dictionary index for only

Re: Synonym Search in Lucene..

2013-10-09 Thread VIGNESH S
Hi Koji, I got your idea. It's awesome. But my problem is the dictionary corpus itself. If I use WordNet, it can create a dictionary index only for English. I need to create a dictionary index for all languages. I want to know whether there is anything like WordNet that I can readily plug into my application..

Re: Synonym Search in Lucene..

2013-10-09 Thread Koji Sekiguchi
Hi VIGNESH, The heart of my idea in the article is that if you have a dictionary (corpus) in a Lucene index, my program can extract synonym data from that index. Wikipedia was the concrete example I used in the description. Please see the figure in the article for the system architecture. koji (13

Re: Synonym Search in Lucene..

2013-10-09 Thread VIGNESH S
Hi Koji, Thanks for your reply and guidance. I have read the article below, and it is really helpful for getting the relevant synonyms. But how are you getting the synonyms from Wikipedia? Does Wikipedia expose an API, or is there a ready-made dictionary file that Wikipedia provides for all languages?