Re: Trying to store Offsets. Dont know the exact meaning of some terms.

2013-08-14 Thread rizwan patel
Thanks Mike, this clarifies my understanding as well.
Regards, Rizwan
On Wed, Aug 14, 2013 at 7:56 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> I think you just need to add fieldType.setStoreTermVectors(true) as well.
>
> However, I see you are also indexing offsets into the postin

Re: BlockJoinQuery

2013-08-14 Thread vonPuh fonPuhendorf
I want to use the infix suggester, but instead of creating a different index for every postid (which was my initial idea, and not a very wise one :)) I want to create one index containing all the information about the postids and every comment, and to suggest matches only for a specific postid.

Re: BlockJoinQuery

2013-08-14 Thread vonPuh fonPuhendorf
But if I want to make this a suggester, that will be a hard case, won't it?

Re: BlockJoinQuery

2013-08-14 Thread Michael McCandless
You have to re-index the parent + all children (and delete the previous parent + all its children) whenever you want to add a new child doc. If you want to delete just children then you can do that w/o reindexing the full block.
Mike McCandless
http://blog.mikemccandless.com
On Wed, Aug 14, 20

Re: BlockJoinQuery

2013-08-14 Thread vonPuh fonPuhendorf
And can documents be added dynamically, i.e. can new comments be indexed and added to the parent doc? Or do I have to rebuild the index?
2013/8/14 Michael McCandless
> Yes.
>
> Mike McCandless
> http://blog.mikemccandless.com
>
> On Wed, Aug 14, 2013 at 11:50 AM, vonPuh fonPuhendorf wrote:
> > hi,

Re: Assistance for Unified Index Proces

2013-08-14 Thread Ian Lea
Have one big index holding everything, with a "folder" indexed field that you can use for filtering?
-- Ian.
On Wed, Aug 14, 2013 at 10:03 AM, Mark Jason B. Nacional wrote:
> Hi Lucene Developers:
>
> I just want to ask some help regarding our new implementation of indexing
> process.
>
> We

Assistance for Unified Index Proces

2013-08-14 Thread Mark Jason B. Nacional
Hi Lucene Developers: I just want to ask for some help regarding our new implementation of the indexing process. We use this API for searching a number of documents in a review platform (e-discovery). In our search module, we index a document per folder. The problem is that a user can create his own

Re: BlockJoinQuery

2013-08-14 Thread Michael McCandless
Yes.
Mike McCandless
http://blog.mikemccandless.com
On Wed, Aug 14, 2013 at 11:50 AM, vonPuh fonPuhendorf wrote:
> hi, can i use BlockJoinQuery to search only relative content i.e a parrent
> will be post id and the children will be all the comments in the
> threads(userids)
>
> and return res

BlockJoinQuery

2013-08-14 Thread vonPuh fonPuhendorf
Hi, can I use BlockJoinQuery to search only related content, i.e. a parent will be the post id and the children will be all the comments in the thread (userids), and return results only for comments from the exact postid? I.e. user 222 searches for "blah" in postid "1234" and results from only that postid wi

Re: Boolean Query when indexing each line as a document.

2013-08-14 Thread Ian Lea
If you're using StandardAnalyzer what's the reference to CustomAnalyzerForCaseSensitive all about? Someone else with more patience or better diagnostic skill may well spot your problem but I can't. My final suggestion is that you build and post the smallest possible self-contained program, using

Re: Boolean Query when indexing each line as a document.

2013-08-14 Thread Ankit Murarka
Hello. I gave the complete code sample so that anyone can try it and let me know. This is because this issue is really taking a toll on me. I am so close yet so far. Yes, I am using an analyzer to index the document. The Analyzer is StandardAnalyzer but I have commented out the LowerCaseFilter code from

Re: Boolean Query when indexing each line as a document.

2013-08-14 Thread Ian Lea
I was rather hoping for something smaller! One suggestion from a glance is that you're using some analyzer somewhere but building a BooleanQuery out of a TermQuery or two. Are you sure (test it and prove it) that the strings you pass to the TermQuery are EXACTLY what has been indexed? -- Ian.
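Ian's point about TermQuery can be illustrated without Lucene at all. The sketch below is plain Java and only an analogy: `indexTerms` is a hypothetical whitespace tokenizer with an optional lowercasing step (it is not StandardAnalyzer), and `termMatches` stands in for the exact, unanalyzed lookup a TermQuery performs. The sample line is borrowed from elsewhere in this thread.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class ExactTermMatchSketch {

    /**
     * Toy "analysis": split on whitespace and optionally lowercase.
     * This is an assumption for illustration, not a real Lucene analyzer.
     */
    public static Set<String> indexTerms(String line, boolean lowercase) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // skip the empty token produced by a blank line
            }
            terms.add(lowercase ? token.toLowerCase(Locale.ROOT) : token);
        }
        return terms;
    }

    /** Exact lookup: the query string is compared as-is, with no analysis. */
    public static boolean termMatches(Set<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        Set<String> lowered = indexTerms("INSIDE POST OF SERVER", true);
        System.out.println(termMatches(lowered, "SERVER")); // false: index holds "server"
        System.out.println(termMatches(lowered, "server")); // true

        Set<String> asIs = indexTerms("INSIDE POST OF SERVER", false);
        System.out.println(termMatches(asIs, "SERVER"));    // true: case was preserved
    }
}
```

The same kind of mismatch can come from any analysis step (case, punctuation, tokenization), which is why checking the actual indexed terms, for example in Luke, is the useful test.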

Re: Boolean Query when indexing each line as a document.

2013-08-14 Thread Ankit Murarka
Hello. The problem is as follows: I have a document containing information in lines, so I am indexing all files line by line. So if, say, in my document I have, INSIDE POST OF SERVER\ and in my index file created I have, INSIDE POST OF SERVER\ and I fire a boolean qu

Re: Trying to store Offsets. Dont know the exact meaning of some terms.

2013-08-14 Thread Michael McCandless
I think you just need to add fieldType.setStoreTermVectors(true) as well. However, I see you are also indexing offsets into the postings, which is wasteful because now you've indexed offsets twice in your index. Usually only one place is needed, i.e. if you will use PostingsHighlighter, only inde

Re: IllegalStateException in SpanTermQuery

2013-08-14 Thread Michael McCandless
OK I see why your test in 3.5 was passing: if you just run SpanTermQuery alone, the hit count will be correct, because it never needs to access positions (I suspect?). Ie, a SpanTermQuery alone is like running TermQuery. It's when SpanTermQuery is used inside other SpanQuerys that positions will

Re: Boolean Query when indexing each line as a document.

2013-08-14 Thread Ian Lea
Well, you have supplied a bit more info - good - but I still can't spot the problem. Unless someone else can, I suggest you post a very small self-contained program that demonstrates the problem.
-- Ian.
On Wed, Aug 14, 2013 at 2:50 PM, Ankit Murarka wrote:
> Hello.
> The problem does

RE: Fuzzy Searching on Lucene / Solr

2013-08-14 Thread Uwe Schindler
Hi Michael,
It is also a size constraint! The FSA would be horribly huge. FYI: the different fuzzy distances are not implemented by a simple "parameter" to some algorithm. For every fuzzy distance there is a separate (automatically generated) Java class with huge FSA matrices that handles the fuz

Re: Fuzzy Searching on Lucene / Solr

2013-08-14 Thread Jack Krupansky
The limit of 2 is hard-coded precisely because good performance for edit distances above 2 cannot be guaranteed.
-- Jack Krupansky
-Original Message-
From: Michael Tobias
Sent: Wednesday, August 14, 2013 1:00 AM
To: java-user@lucene.apache.org
Subject: Fuzzy Searching on Lucene / S
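For readers unsure what an "edit distance of 2" means, here is a minimal plain-Java sketch of the classic dynamic-programming Levenshtein distance. It is not how Lucene evaluates fuzzy queries (as noted above, Lucene uses generated automata classes); the class name, method names, and the `maxEdits` parameter are hypothetical and exist only for this illustration.

```java
public class EditDistanceSketch {

    /**
     * Classic Levenshtein distance: the minimum number of single-character
     * insertions, deletions, and substitutions needed to turn a into b.
     * Uses two rolling rows instead of a full matrix.
     */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                        prev[j - 1] + cost);                    // substitute or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** True if b is within maxEdits edits of a (maxEdits is a hypothetical knob). */
    public static boolean withinLimit(String a, String b, int maxEdits) {
        return levenshtein(a, b) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("lucene", "lucine"));      // 1
        System.out.println(levenshtein("kitten", "sitting"));     // 3
        System.out.println(withinLimit("kitten", "sitting", 2));  // false
    }
}
```

With a limit of 2, "lucine" would count as a fuzzy match for "lucene", while "sitting" would not match "kitten".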

Re: Boolean Query when indexing each line as a document.

2013-08-14 Thread Ankit Murarka
Hello. The problem does not seem to be getting solved. As mentioned, I am indexing each line of each file. The sample text present inside LUKE is \ \ java.lang.Thread.run(Thread.java:619) >>Size of list array::0\ at java.lang.reflect.Method.invoke(Method.java:597) org.com.dummy,INFO,<<

Re: IllegalStateException in SpanTermQuery

2013-08-14 Thread Erick Erickson
As Mike said, this is an intended change. The test passed in 3.5 because there was no check if Span queries were working on a field that supported them. In 4.x this is checked and an error is thrown.
Best
Erick
On Wed, Aug 14, 2013 at 12:22 AM, Yonghui Zhao wrote:
> In our old code, we create t

Re: Trying to store Offsets. Dont know the exact meaning of some terms.

2013-08-14 Thread rizwan patel
Ankit, a Term Vector is the informational guide for getting the details about your indexed information. TermOffset: provides the details about where the term occurs in the given data value, e.g. "lucene is smart". Here the terms are: lucene, is, smart. TermVectorOffset would be position of the terms
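The "lucene is smart" example can be made concrete with a small plain-Java sketch. It does not use Lucene; `TermOffsetsSketch`, `TermInfo`, and `analyze` are hypothetical names, and the tokenizer simply splits on whitespace. It records, for each term, its position (token index) and its character offsets, using the start-inclusive, end-exclusive convention.

```java
import java.util.ArrayList;
import java.util.List;

public class TermOffsetsSketch {

    /** One term with its token position and character offsets in the text. */
    public static final class TermInfo {
        public final String term;
        public final int position;    // 0-based token index
        public final int startOffset; // index of the term's first character
        public final int endOffset;   // one past the term's last character

        TermInfo(String term, int position, int startOffset, int endOffset) {
            this.term = term;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return term + " pos=" + position + " offsets=[" + startOffset + "," + endOffset + ")";
        }
    }

    /** Whitespace tokenizer that keeps track of where each token came from. */
    public static List<TermInfo> analyze(String text) {
        List<TermInfo> out = new ArrayList<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            // skip whitespace between tokens
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i >= text.length()) {
                break;
            }
            int start = i;
            // consume one token
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            out.add(new TermInfo(text.substring(start, i), position++, start, i));
        }
        return out;
    }

    public static void main(String[] args) {
        for (TermInfo t : analyze("lucene is smart")) {
            System.out.println(t);
        }
        // lucene pos=0 offsets=[0,6)
        // is pos=1 offsets=[7,9)
        // smart pos=2 offsets=[10,15)
    }
}
```

Positions count tokens and are what phrase and span queries rely on; offsets count characters and are what a highlighter needs to mark up the original text.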