Re: Multiple faceting in lucene

2013-01-31 Thread Ramprakash Ramamoorthy
On Fri, Jan 25, 2013 at 6:23 PM, Shai Erera wrote: > Hi > > Are the values of 'a' and 'b' known in advance? Is it a limited set of > values? Are you always interested in a table which covers all values? > > If so, one way to do that is to each value of 'a' against all values of > 'b'. Of course,
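The table discussed here, with every value of 'a' paired against every value of 'b', is a cross-tabulation. The following is only a sketch in plain Java, assuming both value sets are small and known in advance. It does not use the Lucene facet module, and the class `FacetCrossTab`, its `count` method, and the sample values are hypothetical names made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FacetCrossTab {
    /**
     * Builds a table with one row per 'a' value and one column per 'b' value.
     * Each doc is a two-element array {aValue, bValue}; docs whose values
     * are not in the known sets are ignored.
     */
    static Map<String, Map<String, Integer>> count(
            List<String> aValues, List<String> bValues, List<String[]> docs) {
        Map<String, Map<String, Integer>> table = new LinkedHashMap<>();
        // Pre-fill every (a, b) cell with zero so the table covers all values.
        for (String a : aValues) {
            Map<String, Integer> row = new LinkedHashMap<>();
            for (String b : bValues) {
                row.put(b, 0);
            }
            table.put(a, row);
        }
        // Increment the cell each document falls into.
        for (String[] doc : docs) {
            Map<String, Integer> row = table.get(doc[0]);
            if (row != null && row.containsKey(doc[1])) {
                row.merge(doc[1], 1, Integer::sum);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<String[]> docs = Arrays.asList(
                new String[] {"red", "small"},
                new String[] {"red", "large"},
                new String[] {"red", "small"},
                new String[] {"blue", "large"});
        System.out.println(count(
                Arrays.asList("red", "blue"), Arrays.asList("small", "large"), docs));
    }
}
```

Pre-filling the cells is what makes the table cover all combinations, including those with a count of zero.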

Re: How to properly use updatedocument in lucene.

2013-01-31 Thread VIGNESH S
Hi Mike, Thanks for your reply. My scenario is that I am creating a Lucene index with two fields: 1. Filename 2. File Contents. For example, I initially added the field FileName (say LuceneInAction.pdf), which is not analysed, and the field FileContents (the content of the book), which is analysed using a custom analyzer. Now what is t

Re: CompressingStoredFieldsFormat doesn't show improvement

2013-01-31 Thread arun k
Hi, Random data was indexed. I wanted to see the worst case, where little data is the same across documents, which is true for most of my cases. So I guess in these scenarios compression becomes an overhead. Arun On Thu, Jan 31, 2013 at 8:00 PM, Robert Muir wrote: > The top method here is your ra

Re: How to get field names and types from an IndexSearcher

2013-01-31 Thread Michael McCandless
On Thu, Jan 31, 2013 at 7:31 AM, Rolf Veen wrote: > Thank you, Mike. > > I didn't state why I need this. I want to be able to send > a query to some QueryParser that understands "field:1" > regardless if 'field' was added as StringField or LongField, > for example. I do not want to rely on schema

Re: How to properly use updatedocument in lucene.

2013-01-31 Thread Michael McCandless
On Thu, Jan 31, 2013 at 7:56 AM, Trejkaz wrote: > On Thu, Jan 31, 2013 at 11:05 PM, Michael McCandless > wrote: >> It's confusing, but you should never try to re-index a document you >> retrieved from a searcher, because certain index-time details (eg, >> whether a field was tokenized) are not pr

Re: Migrating from using doc IDs to using application IDs from the FieldCache

2013-01-31 Thread Michael McCandless
Unfortunately, it's not possible/easy to just add one new field to all existing docs ... there are several issues open to do this, eg see https://issues.apache.org/jira/browse/LUCENE-4258 and LUCENE-3837 and LUCENE-4272. Mike McCandless http://blog.mikemccandless.com On Thu, Jan 31, 2013 at 8:00

Re: IndexWriterConfig.OpenMode.CREATE vs OpenMode.APPEND (index files)

2013-01-31 Thread saisantoshi
Is it by design? The older API (2.4) does not have this problem. Let's say I have 100 updates or so; then it will create 100 versions of those files in the index. This would increase the number of files in the index directory and might run into some file issues? It would be good to just have th

Re: Questions about FuzzyQuery in Lucene 4.x

2013-01-31 Thread Michael McCandless
On Thu, Jan 31, 2013 at 2:52 PM, George Kelvin wrote: > Thank you! That is the problem! I changed the maxExpansions to 100 and the > results are found. Phew! > About my second question, the ranking of wildcard fuzzy search, can you > also give some suggestions? Thanks! This is tricky, eg see h

Re: IndexWriterConfig.OpenMode.CREATE vs OpenMode.APPEND (index files)

2013-01-31 Thread Michael McCandless
Then those files are expected. Your 2nd open was with APPEND, which means newly indexed documents are written into a new set of files. Lucene is segment based, so your first batch of documents is in segment _0, while your second batch is in _1 and _2. Mike McCandless http://blog.mikemccandless
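The segment behaviour described in this reply can be pictured with a toy model. This is only a sketch in plain Java, not Lucene code: the class `ToySegmentedIndex` and its methods are hypothetical, it tracks segment names only (a real segment consists of several files), and it ignores merging, which can later combine segments.

```java
import java.util.ArrayList;
import java.util.List;

final class ToySegmentedIndex {
    enum OpenMode { CREATE, APPEND }

    private final List<String> segments = new ArrayList<>();
    private long nextSegment = 0;

    /** CREATE discards the segments written so far; APPEND keeps them. */
    void open(OpenMode mode) {
        if (mode == OpenMode.CREATE) {
            segments.clear();
        }
    }

    /** Flushing a batch never rewrites old segments; it adds a new one. */
    String flushBatch() {
        // Segment names are an underscore plus a base-36 counter: _0, _1, ...
        String name = "_" + Long.toString(nextSegment++, Character.MAX_RADIX);
        segments.add(name);
        return name;
    }

    /** Returns a copy of the current segment names, oldest first. */
    List<String> segmentNames() {
        return new ArrayList<>(segments);
    }

    public static void main(String[] args) {
        ToySegmentedIndex index = new ToySegmentedIndex();
        index.open(OpenMode.CREATE);
        index.flushBatch();                 // first batch
        index.open(OpenMode.APPEND);
        index.flushBatch();                 // second batch, flushed twice
        index.flushBatch();
        System.out.println(index.segmentNames()); // prints [_0, _1, _2]
    }
}
```

In this model, as in the explanation above, appending leaves the files of segment _0 untouched and only adds files for the new segments.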

Re: IndexWriterConfig.OpenMode.CREATE vs OpenMode.APPEND (index files)

2013-01-31 Thread saisantoshi
It's _0.si (typo). For the second update, create = "false". Thanks, Sai.

Re: Questions about FuzzyQuery in Lucene 4.x

2013-01-31 Thread George Kelvin
Hi Jack, sorry for confusing you. I understand that it would be great if a minimal data set could be provided to repro the problem, but I was unable to do that. Hi Michael, Thank you! That is the problem! I changed the maxExpansions to 100 and the results are found. About my second question, the

Re: IndexWriterConfig.OpenMode.CREATE vs OpenMode.APPEND (index files)

2013-01-31 Thread Michael McCandless
I don't know what _0.csi is ... was that supposed to be _0.si? Did you pass create=true or false for the 2nd update? Mike McCandless http://blog.mikemccandless.com On Thu, Jan 31, 2013 at 1:39 PM, saisantoshi wrote: > I am using the following below for creating the IndexWriter (for my > indexi

IndexWriterConfig.OpenMode.CREATE vs OpenMode.APPEND (index files)

2013-01-31 Thread saisantoshi
I am using the following for creating the IndexWriter (for my indexing): IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_40, new LimitTokenCountAnalyzer(analyzer, MAX_FIELD_SCAN_LENGTH)); if (create) { // create will be true for indexing

Getting the number of all hits for the SpanQuery

2013-01-31 Thread Igor Shalyminov
Hello! I want to perform a SpanQuery and get the precise overall number of all hits throughout the entire index (i.e. if the query words combination appears multiple times in a document, I need that number counted). I've found a method called SpanQuery.getSpans, but the way of using it in the s
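The question distinguishes the total number of matches from the number of matching documents. The toy below illustrates only that distinction, in plain Java on pre-tokenized arrays. It does not call SpanQuery.getSpans, its slop rule is a simplification and not the matching semantics of Lucene's span queries, and the names `ToySpanCounter`, `countAllHits`, and `countMatchingDocs` are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ToySpanCounter {
    /**
     * Counts every (first, second) pair in which `second` appears after
     * `first` with at most `slop` other tokens in between, summed over all
     * documents. A document that matches three times contributes three.
     */
    static int countAllHits(List<String[]> docs, String first, String second, int slop) {
        int hits = 0;
        for (String[] tokens : docs) {
            for (int i = 0; i < tokens.length; i++) {
                if (!tokens[i].equals(first)) {
                    continue;
                }
                int end = Math.min(tokens.length - 1, i + 1 + slop);
                for (int j = i + 1; j <= end; j++) {
                    if (tokens[j].equals(second)) {
                        hits++;
                    }
                }
            }
        }
        return hits;
    }

    /** Counts documents with at least one hit, for comparison. */
    static int countMatchingDocs(List<String[]> docs, String first, String second, int slop) {
        int matching = 0;
        for (String[] tokens : docs) {
            if (countAllHits(Collections.singletonList(tokens), first, second, slop) > 0) {
                matching++;
            }
        }
        return matching;
    }

    public static void main(String[] args) {
        List<String[]> docs = Arrays.asList(
                new String[] {"a", "b", "a", "b"},
                new String[] {"a", "x", "b"});
        System.out.println(countAllHits(docs, "a", "b", 0));      // prints 2
        System.out.println(countMatchingDocs(docs, "a", "b", 0)); // prints 1
    }
}
```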

Re: CompressingStoredFieldsFormat doesn't show improvement

2013-01-31 Thread Robert Muir
The top method here is your random string generation. are you indexing random data? On Thu, Jan 31, 2013 at 12:46 AM, arun k wrote: > Hi, > > Please find the snapshots here. > http://picpaste.com/Lucene3.0.2-G00Z5FfX.png > http://picpaste.com/Lucene4.1-LsxpcQk0.png > > Arun > > > On Wed, Jan 30,

Re: How to find related words ?

2013-01-31 Thread Jack Krupansky
Oh, so you wanted "similar" words! You should have said so... your inquiry said you were looking for "related" words. So, which is it? More specifically, what exactly are you looking for, in terms of the semantics? In any case, "find similar" (MoreLikeThis) is about the best you can do out of t

Re: How to find related words ?

2013-01-31 Thread Andrew Gilmartin
wgggfiy wrote: en, it seems nice, but I'm puzzled by you and Andrew Gilmartin above, what's the difference between you guys ? The difference is that similar documents do not give you similar terms. Similar documents can show a correlation of terms -- ie, wherever Lucene is mentioned so is So

Migrating from using doc IDs to using application IDs from the FieldCache

2013-01-31 Thread Trejkaz
Hi all. We have an application which has been around for so long that it's still using doc IDs to key to an external database. Obviously this won't work forever (even in Lucene 3.x we had to use a custom merge policy to keep it working) so we want to introduce application IDs eventually. We have

Re: How to properly use updatedocument in lucene.

2013-01-31 Thread Trejkaz
On Thu, Jan 31, 2013 at 11:05 PM, Michael McCandless wrote: > It's confusing, but you should never try to re-index a document you > retrieved from a searcher, because certain index-time details (eg, > whether a field was tokenized) are not preserved in the stored > document. > > Instead, you shoul

Re: How to get field names and types from an IndexSearcher

2013-01-31 Thread Rolf Veen
Thank you, Mike. I didn't state why I need this. I want to be able to send a query to some QueryParser that understands "field:1" regardless if 'field' was added as StringField or LongField, for example. I do not want to rely on schema information if I can avoid it, and rather use a smart QueryPar

Re: Real-time Get and Atomic Updates for SolrJ

2013-01-31 Thread Erick Erickson
I haven't used it myself, but I did find this for atomic updates: http://www.mumuio.com/solrj-4-0-0-alpha-atomic-updates/ Don't know if there really is need for specific support in SolrJ for RTG, isn't that all over on the Solr side and automagic? Best Erick On Wed, Jan 30, 2013 at 5:47 PM, Dye

Re: MMapDirectory performance - Are searchable field values contiguously stored in FS block?

2013-01-31 Thread Michael McCandless
On Thu, Jan 31, 2013 at 7:07 AM, Gili Nachum wrote: > So, when loading the results I want to return (say 10 documents), if not > all docs fit in RAM, I would incur up to 10 individual disk seek > operations. Which will kill my performance. Is that correct? Yes, 10 seeks, and that may or may not

Re: How to get field names and types from an IndexSearcher

2013-01-31 Thread Michael McCandless
Getting the FieldInfos from each AtomicReader is the right approach! But, FieldInfos won't tell you which XXXField class was used for the indexing: that information is not fully preserved ... Mike McCandless http://blog.mikemccandless.com On Thu, Jan 31, 2013 at 6:33 AM, Rolf Veen wrote: > Hel

Re: MMapDirectory performance - Are searchable field values contiguously stored in FS block?

2013-01-31 Thread Gili Nachum
Hi Mike, So, when loading the results I want to return (say 10 documents), if not all docs fit in RAM, I would incur up to 10 individual disk seek operations. Which will kill my performance. Is that correct? Considering what are my alternatives: 1. Create another separate lean index that would f

Re: How to properly use updatedocument in lucene.

2013-01-31 Thread Michael McCandless
It's confusing, but you should never try to re-index a document you retrieved from a searcher, because certain index-time details (eg, whether a field was tokenized) are not preserved in the stored document. Instead, you should re-build the document yourself, setting the right details per-Field, a

How to properly use updatedocument in lucene.

2013-01-31 Thread VIGNESH S
Hi All, I have a basic question. I am trying to update a Lucene document field with a new value. Below is my code. It does not give any errors, but it also does not update the document's field. Document d = searcher.doc(docId); writer1 = new IndexWriter(csDirectory, new IndexWriterC