extractterms Output

2012-01-12 Thread David Olson
Hi all - thanks in advance for any help...

I have an app that aggregates keyword performance from incoming messages.
A message comes in, I index it, search the index, and then output the
performance. The two approaches I'm playing with are parsing the output
of searcher.explain() or iterating through term frequencies.

Typical query is: "Chef's knife" OR basil OR banana OR "frying pan" 

Explain gets me what I need, but the output would have to be parsed to get
the relevant bits. I see the weight and occurrences for all three of the
above.

If I go through term frequencies, extractTerms() splits the terms into Chef's,
knife, basil, banana, frying, pan. So I can get an accurate hit, but term
counts are registered individually.

I had heard that using explain could be slow when things start to scale up,
so I'd rather not have to build a parser to get what I want (or hack the
explanation class).

Why does extractTerms() do that, even though the search worked on the compound
terms?

-David-
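
A possible explanation, offered with some hedging: a quoted phrase is typically parsed into a PhraseQuery, which is itself built from individual Term objects, so extractTerms() reports the per-word terms rather than the phrase. One way to get per-keyword counts without parsing explain() output is to count consecutive token matches yourself. The sketch below is plain Java with no Lucene calls; KeywordCounter, tokenize, countPhrase and countKeywords are hypothetical names, and the simple lowercase tokenizer only stands in for whatever Analyzer the index really uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class KeywordCounter {

    // Stand-in for an Analyzer (assumption): lowercase the text and split on
    // anything that is not a letter, digit, or apostrophe.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Counts how often the tokens of `phrase` appear consecutively in `message`.
    static int countPhrase(List<String> message, List<String> phrase) {
        if (phrase.isEmpty()) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i + phrase.size() <= message.size(); i++) {
            if (message.subList(i, i + phrase.size()).equals(phrase)) {
                count++;
            }
        }
        return count;
    }

    // One count per keyword; multi-word keywords are counted as whole phrases.
    static Map<String, Integer> countKeywords(String message, List<String> keywords) {
        List<String> tokens = tokenize(message);
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String keyword : keywords) {
            counts.put(keyword, countPhrase(tokens, tokenize(keyword)));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> keywords =
            Arrays.asList("Chef's knife", "basil", "banana", "frying pan");
        String message =
            "Chop the basil with a chef's knife, then heat the frying pan. More basil!";
        // prints {Chef's knife=1, basil=2, banana=0, frying pan=1}
        System.out.println(countKeywords(message, keywords));
    }
}
```

The counts stay attached to the original keyword strings, so "frying pan" is reported once as a phrase instead of as separate hits for "frying" and "pan".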

--
View this message in context: 
http://lucene.472066.n3.nabble.com/extractterms-Output-tp3654833p3654833.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Build RAMDirectory on FSDirectory, and then synchronizing the two

2012-01-12 Thread Sanne Grinovero
Maybe you could explain why you are doing this? Someone could suggest
alternative approaches.

Regards,
Sanne
On Jan 12, 2012 4:02 AM, "dyzc" <1393975...@qq.com> wrote:

> The cause is that my app adds indexes to those in RAM rather than updating
> them, so the size doubled. It seems unrelated to the OpenMode.CREATE option.
>
>
> -- Original --
> From:  "Ian Lea";
> Date:  Wed, Jan 11, 2012 05:20 PM
> To:  "java-user";
>
> Subject:  Re: Build RAMDirectory on FSDirectory, and then synchronizing the two
>
>
> > I tried IndexWriterConfig.OpenMode.CREATE, and the size doubled.
>
> Prove it.
>
>
> --
> Ian.
>
> > The only approach that works is the writer's deleteAll() method.
> >
> > On Mon, Jan 9, 2012 at 5:23 AM, Ian Lea wrote:
> >
> >> If you load an existing disk index into a RAMDirectory, make some
> >> changes in RAM and call addIndexes to add the contents of the
> >> RAMDirectory to the original disk index, you are likely to end up with
> >> duplicate data on disk.  Depending of course on what you've done to
> >> the RAM index.
> >>
> >> Sounds like you want to call addIndexes using a writer on a new, empty
> >> index, or overwrite the original: IndexWriterConfig.OpenMode.CREATE.
> >>
> >>
> >> --
> >> Ian.
> >>
> >>
> >> On Mon, Jan 9, 2012 at 4:29 AM, dyzc <1393975...@qq.com> wrote:
> >> > I'd better provide a snapshot of my code so people can understand my issue:
> >> >
> >> >
> >> > File file = new File("c:/index_files");
> >> > FSDirectory fsDir = FSDirectory.open(file);
> >> > RAMDirectory ramDir = new RAMDirectory(fsDir);
> >> > IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_35,
> >> >     new StandardAnalyzer(Version.LUCENE_35));
> >> >
> >> >
> >> > IndexWriter iw = new IndexWriter(ramDir, iwc);
> >> >
> >> >
> >> > ..DO something here with iw (associated with ramDir).
> >> >
> >> >
> >> > Now I am trying to synchronize ramDir with fsDir:
> >> >
> >> >
> >> > //close iw prior to synchronization
> >> > iw.close();
> >> >
> >> >
> >> > // synchronize RAM with FS
> >> > IndexWriter writer = new IndexWriter(fsDir,
> >> >     new IndexWriterConfig(Version.LUCENE_35, ik));
> >> > writer.addIndexes(ramDir);
> >> > writer.close();
> >> > ramDir.close();
> >> >
> >> >
> >> >
> >> > Now I end up with duplicate copies of index files in c:/index_files
> >> >
> >> >
> >> > Is there something I'm missing here?
> >> >
> >> >
> >> > -- Original --
> >> > From:  "zhoucheng2008";
> >> > Date:  Mon, Jan 9, 2012 12:04 PM
> >> > To:  "java-user";
> >> >
> >> > Subject:  Build RAMDirectory on FSDirectory, and then synchronizing the two
> >> >
> >> >
> >> > Hi,
> >> >
> >> > I created a RAMDirectory based on an FSDirectory. After a few
> >> > modifications, I would like to synchronize the two.
> >> >
> >> >
> >> > Someone on the mailing list suggested a solution that uses the
> >> > addIndexes() function.
> >> >
> >> >
> >> > However, the FSDirectory simply combines with the RAMDirectory, and
> >> > the size doubled.
> >> >
> >> >
> >> > How can I do a real synchronization?
> >> >
> >> >
> >> > Thanks


Re: Build RAMDirectory on FSDirectory, and then synchronizing the two

2012-01-12 Thread Cheng
The reason is that I have indexes on the hard drive but want to load them
into RAM for faster searching, adding, deleting, etc.

Using RAMDirectory can help achieve this goal.
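
To make the doubling discussed in this thread concrete, here is a toy model in plain Java with no Lucene calls. ToyIndex, copyOf, sync and the create flag are hypothetical stand-ins: the list models the documents held in a directory, addIndexes appends the way Ian describes, and create = true models opening the disk writer with OpenMode.CREATE (or calling deleteAll() first) so the old contents are discarded before the RAM copy is added. It is a sketch of the reasoning only, not of how Lucene works internally.

```java
import java.util.ArrayList;
import java.util.List;

class ToyIndex {
    final List<String> docs = new ArrayList<>();

    // Models new RAMDirectory(fsDir): the copy starts with every document on disk.
    static ToyIndex copyOf(ToyIndex source) {
        ToyIndex copy = new ToyIndex();
        copy.docs.addAll(source.docs);
        return copy;
    }

    // Models writer.addIndexes(other): it appends and never replaces.
    void addIndexes(ToyIndex other) {
        docs.addAll(other.docs);
    }

    // Models writing the RAM copy back to disk.
    // create = true stands in for OpenMode.CREATE: old disk contents are dropped first.
    static void sync(ToyIndex disk, ToyIndex ram, boolean create) {
        if (create) {
            disk.docs.clear();
        }
        disk.addIndexes(ram);
    }

    public static void main(String[] args) {
        ToyIndex disk = new ToyIndex();
        disk.docs.add("doc1");
        disk.docs.add("doc2");

        ToyIndex ram = copyOf(disk);
        ram.docs.add("doc3"); // a change made in RAM

        ToyIndex appended = copyOf(disk);
        sync(appended, ram, false);
        System.out.println(appended.docs); // [doc1, doc2, doc1, doc2, doc3]

        ToyIndex replaced = copyOf(disk);
        sync(replaced, ram, true);
        System.out.println(replaced.docs); // [doc1, doc2, doc3]
    }
}
```

In this model the duplicates come from the RAM copy already containing everything on disk, so appending it to the same disk index stores the original documents twice. Replacing the disk contents is only safe if the RAM copy really holds the complete index.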



10 million entities and 100 million related information

2012-01-12 Thread Cheng
I have 10MM entities, for each of which I will index 10-20 fields. I will
also have to index 100MM pieces of related information about the entities,
and each piece will have to go through an Analyzer.

I have a few questions:

1) Can I use just one index folder for all the data?

2) If I have to segment the data, how large can each segment be such
that real-time search is still achievable?

Thanks
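
On question 1, a rough back-of-the-envelope check may help. It assumes each entity and each piece of related information becomes one document, which the message does not actually state. Lucene document IDs are Java ints, so a single index cannot hold more than roughly Integer.MAX_VALUE documents. IndexSizing, totalDocs and fitsInOneIndex are hypothetical names used only for this arithmetic.

```java
class IndexSizing {

    // Assumption: one document per entity plus one per related item.
    static long totalDocs(long entities, long relatedItems) {
        return entities + relatedItems;
    }

    // Document IDs are ints, so the count must stay below Integer.MAX_VALUE.
    static boolean fitsInOneIndex(long totalDocs) {
        return totalDocs < Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long total = totalDocs(10_000_000L, 100_000_000L);
        System.out.println(total);                    // 110000000
        System.out.println(fitsInOneIndex(total));    // true
        // Integer division: how many times the data could grow before hitting the ceiling.
        System.out.println(Integer.MAX_VALUE / total); // 19
    }
}
```

Under that assumption the document count alone does not force more than one index folder. The check says nothing about search latency, which depends on hardware, field sizes and query patterns, so question 2 still needs measurement.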