extractterms Output
Hi all - thanks in advance for any help...

I have an app that aggregates keyword performance across incoming messages. A message comes in, I index it, search the index, and then output the performance. The two approaches I'm playing with are parsing the output of searcher.explain() or iterating through term frequencies.

A typical query is:

"Chef's knife" OR basil OR banana OR "frying pan"

Explain gets me what I need, but the output would have to be parsed to get the relevant bits. I see the weight and occurrences for all three of the above.

If I go through term frequencies, extractTerms splits the terms into Chef's, knife, basil, banana, frying, pan. So I can get an accurate hit, but term counts are registered individually.

I had heard that using explain could be slow when things start to scale up, so I'd rather not have to build a parser to get what I want (or hack the Explanation class).

Why does extractTerms do that, even though the search worked on the compound terms?

-David-

--
View this message in context: http://lucene.472066.n3.nabble.com/extractterms-Output-tp3654833p3654833.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
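One way to read the question above: a phrase query is built from individual terms plus their positions, so listing the terms of a query naturally yields the single words. If per-keyword counts are the goal, the phrases have to be matched as word sequences. The following is a minimal plain-Java sketch of that idea, not Lucene code. The class `KeywordCounter`, its method names, and the lowercase/whitespace tokenization are assumptions made for illustration; a real analyzer would tokenize differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: counts how often each keyword
// (a single word or a multi-word phrase) occurs in a message.
class KeywordCounter {

    // Naive tokenizer (an assumption): lowercase, then split on whitespace.
    static List<String> tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return new ArrayList<String>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Counts positions where the keyword's tokens appear consecutively
    // in the message tokens.
    static int countOccurrences(List<String> message, List<String> keyword) {
        if (keyword.isEmpty()) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i + keyword.size() <= message.size(); i++) {
            if (message.subList(i, i + keyword.size()).equals(keyword)) {
                count++;
            }
        }
        return count;
    }

    // Returns one count per keyword, keyed by the keyword as given.
    static Map<String, Integer> countAll(String message, List<String> keywords) {
        List<String> tokens = tokenize(message);
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String keyword : keywords) {
            counts.put(keyword, countOccurrences(tokens, tokenize(keyword)));
        }
        return counts;
    }
}
```

For the message `a chef's knife and a frying pan`, `countAll` with the four keywords from the query reports one hit each for "Chef's knife" and "frying pan" and zero for basil and banana, so the phrases are counted as units rather than as separate words.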
Re: Build RAMDirectory on FSDirectory, and then synchronzing the two
Maybe you could explain why you are doing this? Someone could suggest alternative approaches.

Regards,
Sanne

On Jan 12, 2012 4:02 AM, "dyzc" <1393975...@qq.com> wrote:
> That lies in that my apps add indexes to those in RAM rather than update
> them. So the size doubled. Seem not related to the OpenMode.CREATE option.
>
> -- Original --
> From: "Ian Lea";
> Date: Wed, Jan 11, 2012 05:20 PM
> To: "java-user";
> Subject: Re: Build RAMDirectory on FSDirectory, and then synchronzing the two
>
> > I tried IndexWriterConfig.OpenMode CREATE, and the size is doubled.
>
> Prove it.
>
> --
> Ian.
>
> > The only way that is effective is the writer's deleteAll() methods.
> >
> > On Mon, Jan 9, 2012 at 5:23 AM, Ian Lea wrote:
> >
> >> If you load an existing disk index into a RAMDirectory, make some
> >> changes in RAM and call addIndexes to add the contents of the
> >> RAMDirectory to the original disk index, you are likely to end up with
> >> duplicate data on disk. Depending of course on what you've done to
> >> the RAM index.
> >>
> >> Sounds you want to call addIndexes using a writer on a new, empty,
> >> index or overwrite the original. IndexWriterConfig.OpenMode CREATE.
> >>
> >> --
> >> Ian.
> >>
> >> On Mon, Jan 9, 2012 at 4:29 AM, dyzc <1393975...@qq.com> wrote:
> >> > I'd better provide a snapshot of my code for people to understand my issues:
> >> >
> >> > File file=new File("c:/index_files");
> >> > FSDirectory fsDir=new FSDirectory(file);
> >> > RAMDirectory ramDir=new RAMDirectory(fsDir, new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer());
> >> >
> >> > IndexWriter iw = new IndexWriter(ramDir, iwc);
> >> >
> >> > ..DO something here with iw (associated with ramDir).
> >> >
> >> > Now I am trying to synchronize ramDir with fsDir:
> >> >
> >> > //close iw prior to synchronization
> >> > iw.close();
> >> >
> >> > // synchronize RAM with FS
> >> > IndexWriter writer = new IndexWriter(fsDir, new IndexWriterConfig(Version.LUCENE_35, ik));
> >> > writer.addIndexes(ramDir);
> >> > writer.close();
> >> > ramDir.close();
> >> >
> >> > Now I end up with duplicate copies of index files in c:/index_files
> >> >
> >> > Is there something that I miss here?
> >> >
> >> > -- Original --
> >> > From: "zhoucheng2008";
> >> > Date: Mon, Jan 9, 2012 12:04 PM
> >> > To: "java-user";
> >> > Subject: Build RAMDirectory on FSDirectory, and then synchronzing the two
> >> >
> >> > Hi,
> >> >
> >> > I new a RAMDirectory based upon a FSDirectory. After a few
> >> > modifications, I would like to synchronize the two.
> >> >
> >> > Some on the mailing list provided a solution that uses addIndex() function.
> >> >
> >> > However, the FSDirectory simply combines with the RAMDirectory, and the
> >> > size doubled.
> >> >
> >> > How can I do a real synchronization?
> >> >
> >> > Thanks
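Ian's point about addIndexes in the quoted thread can be illustrated with a toy model. Adding one index to another appends its documents; nothing is matched against what the target already holds. If the RAM index started life as a copy of the disk index, adding it back therefore stores every unchanged document a second time, whereas a target that is emptied first (the effect the thread attributes to OpenMode.CREATE or deleteAll()) ends up as a plain copy of the RAM index. The sketch below is plain Java and uses no Lucene classes; `ToyIndexSync` and its methods are hypothetical names, and an "index" is simply modelled as a list of document strings.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: an "index" is a list of document strings. None of these
// names are Lucene classes; they mimic the append-versus-overwrite
// behaviour discussed in the thread.
class ToyIndexSync {

    // Models loading a disk index into RAM: an independent copy.
    static List<String> loadIntoRam(List<String> disk) {
        return new ArrayList<String>(disk);
    }

    // Models adding the RAM index to the existing disk index:
    // every RAM document is appended to what is already on disk.
    static List<String> appendInto(List<String> disk, List<String> ram) {
        List<String> result = new ArrayList<String>(disk);
        result.addAll(ram);
        return result;
    }

    // Models writing into an emptied target: the old disk contents are
    // discarded (the disk argument is deliberately ignored) and only the
    // RAM documents remain.
    static List<String> overwriteWith(List<String> disk, List<String> ram) {
        return new ArrayList<String>(ram);
    }
}
```

With a two-document disk index and one document added in RAM, `appendInto` yields five documents (the two originals twice, plus the new one), while `overwriteWith` yields the three documents of the RAM index.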
Re: Build RAMDirectory on FSDirectory, and then synchronzing the two
The reason is I have indexes on hard drive but want to load them into ram for faster searching, adding, deleting, etc. Using RAMDirectory can help achieve this goal.

On Thu, Jan 12, 2012 at 6:36 PM, Sanne Grinovero wrote:
> Maybe you could explain why you are doing this? Someone could suggest
> alternative approaches.
>
> Regards,
> Sanne
10 million entities and 100 million related information
I have 10 million entities, and for each of them I will index 10-20 fields. I will also have to index 100 million pieces of related information about those entities, and each piece will have to go through an Analyzer.

I have a few questions:

1) Can I use just one index folder for all the data?
2) If I have to segment the data, how large can each segment be such that real-time search is still achievable?

Thanks