RE: SpanNearQuery - inOrder parameter
Anyone else able to reply to this?

Thanks
Greg

-----Original Message-----
From: Gregory Tarr
Sent: 13 May 2011 15:46
To: 'java-user@lucene.apache.org'
Subject: RE: SpanNearQuery - inOrder parameter

Chris, and others

Thanks for your reply. In effect what you are saying is that SpanNearQuery works as expected, and I should set inOrder=true to obtain the behaviour I require, even though I don't care about the order?

Thanks
Greg

-----Original Message-----
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: 11 May 2011 00:32
To: java-user@lucene.apache.org
Subject: RE: SpanNearQuery - inOrder parameter

: I attach a junit test which shows strange behaviour of the inOrder
: parameter on the SpanNearQuery constructor, using Lucene 2.9.4.
:
: My understanding of this parameter is that true forces the order and
: false doesn't care about the order.
:
: Using true always works. However using false works fine when the terms
: in the query are distinct, but if they are equivalent, e.g. searching
: for "john john", I do not get the expected results. The workaround seems
: to be to always use true for queries with repeated terms.

I don't think the situation of "overlapping spans" has changed much since this thread...

http://search.lucidimagination.com/search/document/ee23395e5a93c525/non_overlapping_span_queries#868b3a3ec6431afc

the crux of the issue (as i recall) is that there is really no conceptual reason why a query for "'john' near 'john', in any order, with slop of Z" shouldn't match a doc that contains only one instance of "john" ... the first SpanTermQuery says "i found a match at position X", the second SpanTermQuery says "i found a match at position Y", and the SpanNearQuery says "the difference between X and Y is less than Z", therefore i have a match. (The SpanNearQuery can't fail just because X and Y are the same -- they might be two distinct term instances, with different payloads perhaps, that just happen to have the same position.)

However: the true==inOrder case works because the SpanNearQuery enforces that "X must be less than Y", so the same term can't ever match twice.

-Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
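
The reasoning in Hoss's reply can be illustrated with a small, self-contained model. This is only a sketch of the positional argument, not Lucene's span-matching code: the class name, the `near` method and the `maxDistance` parameter are hypothetical, and `maxDistance` is a plain positional distance rather than Lucene's exact slop arithmetic.

```java
class SpanNearSketch {
    /**
     * Simplified positional check: is there a position x of the first term
     * and a position y of the second term with |x - y| <= maxDistance?
     * With inOrder, x must additionally come strictly before y.
     */
    static boolean near(int[] first, int[] second, int maxDistance, boolean inOrder) {
        for (int x : first) {
            for (int y : second) {
                if (inOrder && x >= y) {
                    continue; // ordered matching rejects x == y and x > y
                }
                if (Math.abs(x - y) <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] oneJohn = {0};     // "john smith": a single "john" at position 0
        int[] twoJohns = {0, 2}; // "john paul john": "john" at positions 0 and 2

        // Unordered: the single occurrence pairs with itself (x == y).
        System.out.println(near(oneJohn, oneJohn, 2, false));  // true
        // Ordered: x < y is impossible with only one occurrence.
        System.out.println(near(oneJohn, oneJohn, 2, true));   // false
        // Ordered with two real occurrences: position 0 comes before position 2.
        System.out.println(near(twoJohns, twoJohns, 2, true)); // true
    }
}
```

In this model the unordered check accepts a document with a single "john" because nothing distinguishes X from Y, which is the behaviour described in the thread; requiring X < Y is what rules it out.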
how to create a range query with string parameters
Hi there :)

I would like to perform a range query on a Lucene index. I'm using the Lucene 3.1 API. I looked at the javadoc and found a rangeQueryNode, but I'm not sure how to use it.

I've got a field "article" in my index, which is indexed this way:

    entry.add(new Field("article", article, Field.Store.YES, Field.Index.ANALYZED));

Now I would like to create a query such as:

    +article:[L. 140-1 TO L.145-2]

I didn't manage to find a code sample on the web. Could someone give me a hand, please?

Regards :)
RE: how to create a range query with string parameters
Hi,

    Query q = new TermRangeQuery(...)

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: G.Long [mailto:jde...@gmail.com]
> Sent: Tuesday, May 17, 2011 1:53 PM
> To: java-user@lucene.apache.org
> Subject: how to create a range query with string parameters
>
> Hi there :)
>
> I would like to perform a range query on a lucene index. I'm using lucene 3.1
> api.
> I looked at the javadoc and found a rangeQueryNode but i'm not sure how to
> use it.
>
> I've got a field "article" in my index which is indexed this way :
>
> entry.add(new Field("article", article, Field.Store.YES,
> Field.Index.ANALYZED));
>
> Now I would like to create a query such as :
>
> +article:[L. 140-1 TO L.145-2]
>
> I didn't manage to find code sample on the web. Could someone give me a
> hand please?
>
> Regards :)
Re: Rewriting an index without losing 'hidden' data
Hi,

I know it is too late to answer the question (sorry Chris), but I thought it could be useful to share things (even late). I was just going through the mails and found that we did this a few months back.

*Objective: To add a new field to an existing index without re-writing the whole index.*

We have an index ("primary index") to which we want to add a new field, say "tags". The source of the data is a database.

I am adding pseudo code here:

Create an index "index 2" with just two fields, "id" (which is also a unique identifier in the main index) and "tags" (keep it stored), from the database (source of data).
Open a new IndexWriter ("index 3").
Now run a loop over all the documents of the "primary index" in increasing order of doc-id:
    Get the document of the current doc-id (starting from zero).
    Find the value of the "id" field.
    Search for this value in the secondary index ("index 2") in the same ("id") field (or directly get the document through IndexReader and termVector). You should get only one document.
    If the document is found:
        Add this document to "index 3".
    If the document is not found:
        Add a blank document to "index 3" (to maintain the doc-id order).
(After the loop is finished, the doc-ids and fields of "primary index" and "index 3" will be in order, i.e. the document at doc id 5 in "index 3" and in "primary index" represents the same document of the database with different fields.)
Open a *ParallelReader* (this is the key :-) ) and add both indexes ("primary index" and "index 3") one by one.
Open an IndexWriter and use addIndexes(IndexReader) to create a single index.

The final index will contain the primary index with the "tags" field. :-)

I request the list to comment if there could be any issue with that.

My question follows then: I tried this on NumericField (as "tags") but this didn't work. My guess (excuse me for guessing without deeper investigation) is that this is because NumericField is not a Field. It is an AbstractField.

Irrespective of the correctness of my guess, can someone give me a hint or point me to something that would help me do the same process successfully for NumericField as well?

I hope to hear from learned people.

On Fri, Apr 8, 2011 at 9:38 PM, Michael McCandless <luc...@mikemccandless.com> wrote:

> Unfortunately, updateDocument replaces the *entire* previous document
> with the new one.
>
> The ability to update a single indexed field (either replace that
> field entirely, or, change only certain token occurrences within it),
> while leaving all other indexed fields in the document unaffected, has
> been a long requested big missing feature in Lucene. We call it
> "incremental field updates".
>
> There have been some healthy discussions on the dev list, that have
> worked out a good rough design (eg see
> http://markmail.org/thread/lsfjhpiblzymkfcn). Also, recent
> improvements in how buffered deletes are handled should make it a lot
> easier for updates to "piggyback" using that same packet stream
> approach. So... I think there is hope some day that we'll get this
> into Lucene.
>
> Mike
>
> http://blog.mikemccandless.com
>
> On Fri, Apr 8, 2011 at 11:00 AM, Ian Lea wrote:
> > Unfortunately you just can't do this. Might be possible if all fields
> > were stored but evidently they are not in your index. For unstored
> > fields, the Document object will not contain the data that was passed
> > in when the doc was originally added.
> >
> > I believe there might be a way of recreating some of the missing data
> > via TermFreqVector but that has always sounded dodgy and lossy to me.
> >
> > The safest way is to reindex, however painful it might be. Maybe you
> > could take the opportunity to upgrade lucene at the same time!
> >
> >
> > --
> > Ian.
> >
> >
> > On Fri, Apr 8, 2011 at 3:44 PM, Chris Bamford wrote:
> >> Hi,
> >>
> >> I recently discovered that I need to add a single field to every
> >> document in an existing (very large) index. Reindexing from scratch is not
> >> an option I want to consider right now, so I wrote a utility to add the
> >> field by rewriting the index - but this seemed to lose some of the fields
> >> (indexed, but not stored?). In fact, it shrunk a 12Gb index down to 4.2Gb -
> >> clearly not what I wanted. :-)
> >> What am I doing wrong?
> >>
> >> My technique was:
> >>
> >> Analyzer analyser = new StandardAnalyzer();
> >> IndexSearcher searcher = new IndexSearcher(indexPath);
> >> IndexWriter indexWriter = new IndexWriter(indexPath, analyser);
> >> Hits hits = matchAllDocumentsFromIndex(searcher);
> >>
> >> for (int i=0; i < hits.length(); i++) {
> >>     Document doc = hits.doc(i);
> >>     String id = doc.get("unique-id");
> >>     doc.add(new Field("newField", newValue, Field.Store.YES, Field.Index.UN_TOKENIZED));
> >>     indexWriter.updateDocument(new Term("unique-id", id), doc);
> >> }
> >>
> >> searcher.close();
> >> indexWriter.optimize();
> >> indexWriter.close();
> >>
> >> Note that my matchAllDocumentsFromIndex() does get the right
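
The alignment loop in the pseudo code at the top of this message (one entry per primary doc-id, with a blank placeholder where no "tags" exist) can be sketched without any Lucene classes. This is only a model under stated assumptions: the list stands in for the primary index in doc-id order, the map stands in for the "id" lookup in the secondary index, and the names `AlignSketch` and `align` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AlignSketch {
    /**
     * Walks the primary ids in doc-id order and emits exactly one entry per id:
     * the tags if the lookup finds them, otherwise an empty placeholder.
     * The output position therefore equals the primary doc-id.
     */
    static List<String> align(List<String> primaryIdsInDocOrder, Map<String, String> tagsById) {
        List<String> aligned = new ArrayList<>();
        for (String id : primaryIdsInDocOrder) {
            aligned.add(tagsById.getOrDefault(id, "")); // "" models the blank document
        }
        return aligned;
    }

    public static void main(String[] args) {
        List<String> primaryIds = Arrays.asList("a", "b", "c");
        Map<String, String> tags = new HashMap<>();
        tags.put("a", "red");
        tags.put("c", "blue");

        // Entry 1 stays blank so that "c" keeps position 2, as in the primary index.
        System.out.println(align(primaryIds, tags)); // [red, , blue]
    }
}
```

The placeholder is what keeps both sides the same length and in the same order, which is the property the ParallelReader step in the description depends on.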
QueryParser/StopAnalyzer question
Hi,

Let's say we have an index with a few documents indexed using StopAnalyzer.ENGLISH_STOP_WORDS_SET. The user issues two queries:

1) foo:bar
2) baz:"there is"

Let's assume that the first query yields some results because there are documents matching that query. The second query contains two stopwords ("there" and "is") and yields 0 results. The reason is that when baz:"there is" is parsed, it ends up as a void query, as both "there" and "is" are stopwords (technically speaking, it is converted to an empty BooleanQuery with no clauses). So far so good.

However, any of the following combined queries

+foo:bar +baz:"there is"
foo:bar AND baz:"there is"

behaves exactly the same way as the query +foo:bar, that is, brings back some results. The second AND part, which is supposed to yield no results, is completely ignored.

One might argue that when ANDing, both conditions have to be met, and since baz:"there is" yields 0 results when issued separately, the combined query should yield 0 results as well. It seems contradictory that an atomic query component has a different impact on the overall query depending on the context.

Is there any logical explanation for this? Can this be addressed in any way, preferably without writing our own QueryAnalyzer?

If this makes any difference, the observed behaviour happens under Lucene v3.0.2.

Regards,
Mindaugas
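
The first half of the observation, that the phrase "there is" leaves no terms once stopwords are removed, can be reproduced with plain Java. This is a sketch only: the stop list is a small illustrative subset rather than the full StopAnalyzer.ENGLISH_STOP_WORDS_SET, the tokenisation is simple whitespace splitting, and the names `StopwordSketch` and `surviving` are hypothetical. An application could use a check of this kind to detect clauses that analyse to nothing before combining them; that is an assumption, not something the thread confirms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopwordSketch {
    // Illustrative subset only; NOT the complete Lucene English stop set.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("there", "is", "the", "a", "and"));

    /** Lower-cases, splits on whitespace and drops stopwords. */
    static List<String> surviving(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(surviving("bar"));      // [bar]
        System.out.println(surviving("there is")); // []
        // A clause with no surviving terms has nothing left to match against.
        System.out.println(surviving("there is").isEmpty()); // true
    }
}
```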
Re: how to create a range query with string parameters
Hi Uwe :)

Thank you for your answer! Now I have another problem. Here is the code I use to query the index:

    ScoreDoc[] hits = null;
    TopFieldCollector collector = TopFieldCollector.create(new Sort(SortField.FIELD_DOC), 20, true, false, false, false);
    Directory directory = FSDirectory.open(new File("/home/user/index"));

    IndexSearcher isearcher = new IndexSearcher(directory);
    Query tQueryCode = new TermQuery(new Term(FIELD_CODE, "CCOM"));
    Query tQueryCodeRef = new TermQuery(new Term(FIELD_CODE_REF, "CCOM"));
    Query rQuery = new TermRangeQuery(FIELD_ARTICLE, "l110-1", "l146-4", true, true);

    BooleanQuery bQuery = new BooleanQuery();
    bQuery.add(tQueryCode, Occur.MUST);
    bQuery.add(tQueryCodeRef, Occur.MUST);
    bQuery.add(rQuery, Occur.MUST);

    System.out.println(bQuery.toString());

    isearcher.search(bQuery, collector);
    hits = collector.topDocs().scoreDocs;

    System.out.println(hits.length);

The query is: +code:CCOM +codeRef:CCOM +article:[l110-1 TO l146-4]

The hits array is empty, although there should be hits. I'm using a program called lukeall 3.1, which provides a GUI to query a Lucene index. When I copy the query into this program and run it, it returns a lot of results =o

So I guess I'm missing something. I thought about a missing analyzer, but I'm not sure...

Regards,
Gary

On 17/05/2011 14:02, Uwe Schindler wrote:
> Hi,
>
> Query q = new TermRangeQuery(...)
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
Re: how to create a range query with string parameters
It's likely to have something to do with analyzers. That is usually the first thing to come to mind if queries hold upper or mixed case terms. Maybe Luke is using an analyzer that matches the one you used when you indexed your documents. You can use Luke to see what is being stored in the index.

See also http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F

Something that looks OK here but might bite you in the future is if your article fields aren't always in the same format and of the same length. The comparison is a simple string-based one, and if you had, say, l123-1, l1-123, l1-999, the range matching might not give you what you expected.

--
Ian.

On Tue, May 17, 2011 at 3:41 PM, G.Long wrote:
> Hi Uwe :)
>
> Thank you for your answer ! Now I have another problem. Here is the code I
> use to query the index :
>
> ScoreDoc[] hits = null;
> TopFieldCollector collector = TopFieldCollector.create(new
> Sort(SortField.FIELD_DOC), 20, true, false, false, false);
> Directory directory = FSDirectory.open(new File("/home/user/index"));
>
> IndexSearcher isearcher = new IndexSearcher(directory);
> Query tQueryCode = new TermQuery(new Term(FIELD_CODE, "CCOM"));
> Query tQueryCodeRef = new TermQuery(new Term(FIELD_CODE_REF,
> "CCOM"));
> Query rQuery = new TermRangeQuery(FIELD_ARTICLE, "l110-1", "l146-4",
> true, true);
>
> BooleanQuery bQuery = new BooleanQuery();
> bQuery.add(tQueryCode, Occur.MUST);
> bQuery.add(tQueryCodeRef, Occur.MUST);
> bQuery.add(rQuery, Occur.MUST);
>
> System.out.println(bQuery.toString());
>
> isearcher.search(bQuery, collector);
> hits = collector.topDocs().scoreDocs;
>
> System.out.println(hits.length);
>
> The query is : +code:CCOM +codeRef:CCOM +article:[l110-1 TO l146-4]
>
> The hits[] is equal to Zero although there should be hits. I'm using a
> program called lukeall 3.1 which provide
> a GUI to query a lucene index. When I copy the query into this program and
> run it, it return a lot of results =o
>
> So I guess I'm missing something. I thought about a missing analyzer but I'm
> not sure...
>
> Regards,
> Gary
>
> Le 17/05/2011 14:02, Uwe Schindler a écrit :
>>
>> Hi,
>>
>> Query q = new TermRangeQuery(...)
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
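
Ian's warning about mixed-length article values can be checked with ordinary Java string comparison, which orders strings character by character much as a term range does. This is a minimal sketch: the `inRange` helper and the class name are hypothetical, and the value "l2-5" is added here purely for illustration alongside the three values from the message.

```java
import java.util.Arrays;

class StringRangeSketch {
    /** Inclusive range test using plain lexicographic String comparison. */
    static boolean inRange(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String[] articles = {"l123-1", "l1-123", "l1-999", "l2-5"};
        Arrays.sort(articles);
        // '2' sorts after '1', so "l2-5" ends up after "l123-1".
        System.out.println(Arrays.toString(articles)); // [l1-123, l1-999, l123-1, l2-5]

        // Read numerically, article 2 lies between 1 and 123; as a string it does not.
        System.out.println(inRange("l2-5", "l1-123", "l123-1"));   // false
        System.out.println(inRange("l1-999", "l1-123", "l123-1")); // true
    }
}
```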
Re: how to create a range query with string parameters
I added a standard analyzer and a QueryParser to parse each boolean clause of my query, and I got some results :)
But now there are some strange behaviors.

Each of the following queries:

+code:CCOM +article:"l123-12"
+code:CCOM +article:"l123-13"
+code:CCOM +article:"l123-14"

returns one result.

However, the following query:

+code:CCOM +article[l123-12 TO l123-14]

returns nothing =(

With other parameters, the range query works almost fine, but some results are missing.
What could be the problem? Could it have something to do with the way the documents are indexed (the use of Field.Index.ANALYZED, for example)?

Thank you for your help :)
Re: how to create a range query with string parameters
Could it be as simple as a missing colon after article in +code:CCOM +article[l123-12 TO l123-14]?

If not, double check analyzers, see what Luke shows as indexed terms for that field, work through the FAQ info posted earlier. And play with quotes - sometimes you show your article values quoted, sometimes not.

--
Ian.

On Tue, May 17, 2011 at 5:01 PM, G.Long wrote:
> I added a standard analyzer and a Query Parser to parse each boolean clause
> of my query and i got some results :)
> But now there are some strange behaviors.
>
> the following queries :
>
> +code:CCOM +article:"l123-12"
> +code:CCOM +article:"l123-13"
> +code:CCOM +article:"l123-14"
>
> return one result.
>
> However, the following query :
>
> +code:CCOM +article[l123-12 TO l123-14]
>
> return nothing =(
>
> With other parameters, the range query works almost fine but some results
> are missing.
> What could be the problem? Could it have something to do with the way the
> documents are indexed?
> (the use of Field.Index.ANALYZED for example)
>
> Thank you for your help :)
Re: how to create a range query with string parameters
I set the field article to NOT_ANALYZED and I didn't quote the article values in the range part of the query, and it looks like it works better now.

However, some results are still missing. For example, sometimes a range like [l220-2 TO l220-10] will not return any results (although I'm sure there are results for this range).

At the beginning I thought that was because the range was between 220 and 220, but I double checked a range like [a710-4 TO a710-10] and it returned results... :/

So it looks like there is another problem. I have to investigate more =)

Thank you for your help :)

Regards,

Gary

On 17/05/2011 19:00, Ian Lea wrote:
> Could it be as simple as a missing colon after article in +code:CCOM
> +article[l123-12 TO l123-14]?
>
> If not, double check analyzers, see what Luke shows as indexed terms
> for that field, work through the FAQ info posted earlier. And play
> with quotes - sometimes you show your article values quoted, sometimes
> not.
>
>
> --
> Ian.
Re: how to create a range query with string parameters
Actually, there are no results in the range [l220-2 TO l220-10].

This is basically a string comparison, and l220-2 > l220-10, so this range would never match.

Best
Erick

On Tue, May 17, 2011 at 1:51 PM, G.Long wrote:
> I set the field article to NOT_ANALYZED and I didn't quoted the article
> values in the range part of the query and it looks like it works better now.
>
> However, some results are still missing. For exemple, sometimes a range like
> [l220-2 TO l220-10] will not return any results (although i'm sure there are
> results for this range).
>
> At the beginning I thought that was because the range was between 220 and
> 220 but I double checked a range like [a710-4 TO a710-10] and it returned
> results... :/
>
> So it looks like there is another problem. I have to investigate more =)
>
> Thank you for your help :)
>
> Regards,
>
> Gary
>
> Le 17/05/2011 19:00, Ian Lea a écrit :
>>
>> Could it be as simple as a missing colon after article in +code:CCOM
>> +article[l123-12 TO l123-14]?
>>
>> If not, double check analyzers, see what Luke shows as indexed terms
>> for that field, work through the FAQ info posted earlier. And play
>> with quotes - sometimes you show your article values quoted, sometimes
>> not.
>>
>>
>> --
>> Ian.
>>
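
Erick's point is easy to confirm in plain Java, and one common way around it (not proposed anywhere in this thread, so treat it as an assumption) is to store a zero-padded form of the article number so that string order follows numeric order. The class name, the `sortableKey` helper and the padding width of 4 are all hypothetical.

```java
class ArticleKeySketch {
    /** Left-pads every run of digits to `width` digits, e.g. "l220-2" becomes "l0220-0002". */
    static String sortableKey(String article, int width) {
        StringBuilder out = new StringBuilder();
        StringBuilder digits = new StringBuilder();
        for (char c : article.toCharArray()) {
            if (Character.isDigit(c)) {
                digits.append(c);
            } else {
                flush(out, digits, width);
                out.append(c);
            }
        }
        flush(out, digits, width);
        return out.toString();
    }

    /** Writes the pending digit run, padded with leading zeros, and clears it. */
    private static void flush(StringBuilder out, StringBuilder digits, int width) {
        if (digits.length() == 0) {
            return;
        }
        for (int i = digits.length(); i < width; i++) {
            out.append('0');
        }
        out.append(digits);
        digits.setLength(0);
    }

    public static void main(String[] args) {
        // Raw values: the lower bound sorts AFTER the upper bound, so the range is empty.
        System.out.println("l220-2".compareTo("l220-10") > 0); // true

        String lower = sortableKey("l220-2", 4);
        String upper = sortableKey("l220-10", 4);
        System.out.println(lower + " " + upper);        // l0220-0002 l0220-0010
        System.out.println(lower.compareTo(upper) < 0); // true
    }
}
```

For this to help, the same key would have to be used both when the field is indexed and when the range bounds are built, and the width must be at least as large as the longest number that can occur.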
I need an available solr lucene consultant
Hi,

I am looking for an experienced and skilled Solr & Lucene developer/consultant to work on a software project incorporating natural language processing and machine learning algorithms.

As part of a larger NLP/AI project that is under way, we need someone to install, refine and optimize Solr and Lucene for our website. The data being analyzed will be from user-generated textual discussions around a multitude of topics that will continuously be updated.

You must be able to work in a LAMP environment with other developers, be smart, reliable, and a self-starter with excellent problem solving and analytical abilities. You must have a solid grasp of English, written and verbal.

Please note that I am a start-up and I am not going to be able to pay what a large established company can pay.

Thank you,
Lance