Re: Securing stored data using Lucene

2013-07-03 Thread VIGNESH S
Hi Rafaela, Look at Lucene Transform. It might help to encrypt Lucene documents. https://code.google.com/p/lucenetransform/ On Wed, Jun 26, 2013 at 2:36 PM, Rafaela Voiculescu < rafaela.voicule...@gmail.com> wrote: > Hello, > > Thank you all for your help and the suggestions. They are very usef

Re: handling nonexistent fields in an index

2013-07-03 Thread David Carlton
Oh, thanks for the pointer! I'll look to see if I can use that. On Wed, Jul 3, 2013 at 1:53 PM, Jack Krupansky wrote: > There is a Lucene filter that you can use to check efficiently for whether > a field has a value or not. > > new ConstantScoreQuery(new FieldValueFilter(String field, boolean n

Re: handling nonexistent fields in an index

2013-07-03 Thread Jack Krupansky
There is a Lucene filter that you can use to check efficiently for whether a field has a value or not. new ConstantScoreQuery(new FieldValueFilter(String field, boolean negate)) -- Jack Krupansky -----Original Message----- From: David Carlton Sent: Wednesday, July 03, 2013 4:27 PM To: java-u

Re: Issues with SortedSetDocValuesAccumulator when index has multiple segments?

2013-07-03 Thread Michael McCandless
I opened https://issues.apache.org/jira/browse/LUCENE-5090 Kaze, if you could try out that patch and see if it throws a better exception in your case that would be great ... Mike McCandless http://blog.mikemccandless.com On Wed, Jul 3, 2013 at 4:16 PM, Michael McCandless wrote: > Hmm, not goo

handling nonexistent fields in an index

2013-07-03 Thread David Carlton
I have a bunch of Lucene indices lying around, and I want to start adding a new field to documents in new indices that I'm generating. So, for a given index, either every document in the index will have that field or no document will have that field. The new field has a default value; and I would

Re: Issues with SortedSetDocValuesAccumulator when index has multiple segments?

2013-07-03 Thread Michael McCandless
Hmm, not good. One trickiness with SSDVA is that you must create a new SortedSetDocValuesReaderState every time you open a new IndexReader. If you don't do this correctly, e.g. you use the SSDVReaderState from an old reader, then it can lead to exceptions like this. Is it possible that's happeni

Issues with SortedSetDocValuesAccumulator when index has multiple segments?

2013-07-03 Thread Kaze
Hello, I'm a novice Lucene user and just started using it to do some prototyping for my project. I noticed that SortedSetDocValues, introduced in 4.3.0, allows faceted search without a dedicated taxonomy index. I've successfully used it to perform faceting on a small index (~3000 documents, ~4

Re: Facets ordering

2013-07-03 Thread Shai Erera
What's maxCount? What I mean is that if you create a FacetRequest with numResults = 5*K (for example), then you get the top-5K categories and can choose the best top-K of those, by their label. Yes, this will hurt top-K computation the least, but is not guaranteed to return the correct top-K. The
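The over-fetch idea in the message above can be sketched without any Lucene classes. This is only an illustration under assumptions: `Category`, `topKByLabel`, and the `factor` parameter are hypothetical names standing in for facet results and for the "5" in numResults = 5*K; it is not the Lucene facet API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OverFetchTopK {
    /** Hypothetical stand-in for one facet result: a label and its count. */
    public static final class Category {
        public final String label;
        public final int count;

        public Category(String label, int count) {
            this.label = label;
            this.count = count;
        }
    }

    /**
     * Takes the top (k * factor) categories by count, then returns the
     * first k of those candidates ordered by label.
     */
    public static List<String> topKByLabel(List<Category> all, int k, int factor) {
        // Step 1: order by count, highest first (what a count-based top-N gives you).
        List<Category> byCount = new ArrayList<>(all);
        byCount.sort(Comparator.comparingInt((Category c) -> c.count).reversed());

        // Step 2: keep only the over-fetched candidate window.
        int window = Math.min(k * factor, byCount.size());
        List<Category> candidates = new ArrayList<>(byCount.subList(0, window));

        // Step 3: re-order the candidates by label and keep the first k.
        candidates.sort(Comparator.comparing((Category c) -> c.label));
        List<String> labels = new ArrayList<>();
        for (Category c : candidates.subList(0, Math.min(k, candidates.size()))) {
            labels.add(c.label);
        }
        return labels;
    }
}
```

A category with a low count never enters the candidate window, so it is missed even if its label would sort first. That is why the result is not guaranteed to be the correct top-K by label.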

Re: Accumulating facets over a MultiReader

2013-07-03 Thread Shai Erera
What do you mean by addDocument()? Do you re-index it? In that case, when you re-index it, just make sure to use FacetFields.addFacets() on it, so its facets are re-indexed too. Shai On Wed, Jul 3, 2013 at 8:52 PM, Peng Gao wrote: > Shai, > Thanks. > > I went with option #3 since the temp indexes ar

RE: Accumulating facets over a MultiReader

2013-07-03 Thread Peng Gao
Shai, Thanks. I went with option #3 since the temp indexes are actually created in separate processes in my case. It works. Now one more complication. I have a case where I need to merge only unique docs in the temp indexes into the master index. I have a unique key for each doc. Before facets,

Help with document design for indexing/searching

2013-07-03 Thread gtkesh
Hi everyone! This is my first post here and I'm new to Lucene, so I would appreciate your ideas on the design of the Lucene document I came up with. *What is my goal* I'm trying to index a collection of XML documents that all have the same structure, like this: Each tag can itself have tag which

Re: Accumulating facets over a MultiReader

2013-07-03 Thread Shai Erera
Hi, There are a couple of ways you can address that: Don't create an index per thread, but rather update the global index from all threads. IndexWriter and TaxoWriter support multiple threads. -- Or, if you need to build an index per-thread -- Use a single TaxonomyWriter instance and share between a

RE: Accumulating facets over a MultiReader

2013-07-03 Thread Peng Gao
Hi Shai, Thanks for the reply. Yes, I used a single TaxonomyReader instance. I am adding facets to an existing app, which maintains two indexes, one for indexing system tools, and the other for indexing user data in folders. The system tool index contains docs describing the tool usage, etc.,

Re: Questions about doing a full text search with numeric values

2013-07-03 Thread Ivan Krišto
On 07/01/2013 12:22 PM, Erick Erickson wrote: > WordDelimiterFilter(Factory if you're experimenting with > Solr as Jack suggests) will fix a number of your cases since > it splits on case change and numeric/alpha changes. If WordDelimiterFilter doesn't help, maybe you could take a look at n-gram t
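For readers unfamiliar with the n-gram approach mentioned above, here is a minimal plain-Java sketch of character n-grams. It assumes nothing about Lucene's own n-gram classes; `CharNGrams`, `ngrams`, `minGram`, and `maxGram` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CharNGrams {
    /**
     * Returns every substring of text whose length is between minGram and
     * maxGram (inclusive), grouped by start position.
     */
    public static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            // Emit each allowed length that still fits inside the text.
            for (int n = minGram; n <= maxGram && start + n <= text.length(); n++) {
                grams.add(text.substring(start, start + n));
            }
        }
        return grams;
    }
}
```

Indexing such grams lets a query for a numeric fragment match inside a longer mixed token, since the fragment itself becomes an indexed term, at the cost of a larger index.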