RE: SpanNearQuery - inOrder parameter

2011-05-17 Thread Gregory Tarr
Anyone else able to reply to this?

Thanks

Greg

-----Original Message-----
From: Gregory Tarr 
Sent: 13 May 2011 15:46
To: 'java-user@lucene.apache.org'
Subject: RE: SpanNearQuery - inOrder parameter

Chris, and others

Thanks for your reply. In effect what you are saying is that
SpanNearQuery works as expected, and I should set inOrder=true to obtain
the behaviour I require, even though I don't care about the order?

Thanks

Greg 

-----Original Message-----
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: 11 May 2011 00:32
To: java-user@lucene.apache.org
Subject: RE: SpanNearQuery - inOrder parameter



: I attach a junit test which shows strange behaviour of the inOrder
: parameter on the SpanNearQuery constructor, using Lucene 2.9.4.
: 
: My understanding of this parameter is that true forces the order and
: false doesn't care about the order. 
: 
: Using true always works. However using false works fine when the terms
: in the query are distinct, but if they are equivalent, e.g. searching
: for "john john", I do not get the expected results. The workaround
: seems to be to always use true for queries with repeated terms.

I don't think the situation of "overlapping spans" has changed much
since this thread...

http://search.lucidimagination.com/search/document/ee23395e5a93c525/non_overlapping_span_queries#868b3a3ec6431afc

The crux of the issue (as I recall) is that there is really no
conceptual reason why a query for "'john' near 'john', in any order,
with slop of Z" shouldn't match a doc that contains only one instance of
"john" ... the first SpanTermQuery says "I found a match at position X",
the second SpanTermQuery says "I found a match at position Y", and the
SpanNearQuery says "the difference between X and Y is less than Z",
therefore I have a match.  (The SpanNearQuery can't fail just because X
and Y are the same -- they might be two distinct term instances, with
different payloads perhaps, that just happen to have the same position.)

However, the inOrder==true case works because the SpanNearQuery enforces
that "X must be less than Y", so the same term occurrence can't ever match twice.
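A small standalone Java model can make this explanation concrete. It is a simplified sketch, not Lucene's actual span-matching code: each clause is reduced to the positions where its term occurs, and the gap formula (number of positions between two matches) is an assumption chosen for illustration.

```java
/**
 * Simplified model of a two-clause SpanNearQuery over single-term clauses.
 * NOT Lucene's real implementation; it only illustrates why "john" near
 * "john" behaves differently depending on inOrder.
 */
public class SpanNearModel {

    /** Positions strictly between two single-position matches (assumed formula). */
    static int gap(int x, int y) {
        return Math.abs(x - y) - 1;
    }

    /** inOrder == false: any pair within the slop matches, including x == y. */
    static boolean unorderedMatch(int[] first, int[] second, int slop) {
        for (int x : first) {
            for (int y : second) {
                if (gap(x, y) <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    /** inOrder == true: the first clause must match strictly before the second. */
    static boolean orderedMatch(int[] first, int[] second, int slop) {
        for (int x : first) {
            for (int y : second) {
                if (x < y && gap(x, y) <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] oneJohn = {3};     // hypothetical doc: "john" once, at position 3
        int[] twoJohns = {3, 5}; // hypothetical doc: "john" at positions 3 and 5

        System.out.println(unorderedMatch(oneJohn, oneJohn, 2)); // true
        System.out.println(orderedMatch(oneJohn, oneJohn, 2));   // false
        System.out.println(orderedMatch(twoJohns, twoJohns, 2)); // true
    }
}
```

In the unordered case a single occurrence can satisfy both clauses at the same position, while the ordered case needs two distinct occurrences, which is why inOrder=true gives the expected behaviour for repeated terms.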



-Hoss

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




how to create a range query with string parameters

2011-05-17 Thread G.Long

Hi there :)

I would like to perform a range query on a Lucene index. I'm using the
Lucene 3.1 API.
I looked at the javadoc and found RangeQueryNode, but I'm not sure how
to use it.


I've got a field "article" in my index which is indexed this way:

entry.add(new Field("article", article, Field.Store.YES, Field.Index.ANALYZED));


Now I would like to create a query such as:

+article:[L. 140-1 TO L.145-2]

I didn't manage to find a code sample on the web. Could someone give me a
hand, please?


Regards :)





RE: how to create a range query with string parameters

2011-05-17 Thread Uwe Schindler
Hi,

Query q = new TermRangeQuery(...)

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
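For reference, in Lucene 3.x the five-argument constructor is TermRangeQuery(String field, String lowerTerm, String upperTerm, boolean includeLower, boolean includeUpper). Conceptually, the query selects every indexed term of the field that sorts between the two bounds. The plain-Java model below (no Lucene dependency; the term list is hypothetical) mimics that selection with a sorted set. It compares terms with String.compareTo, which gives the same order Lucene uses for plain ASCII terms like these, and it only finds terms exactly as they were indexed, i.e. after analysis.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

/** Simplified model of the terms a TermRangeQuery selects (not Lucene code). */
public class TermRangeModel {

    static NavigableSet<String> range(NavigableSet<String> termDictionary,
                                      String lowerTerm, String upperTerm,
                                      boolean includeLower, boolean includeUpper) {
        // An inverted range matches nothing; TreeSet.subSet would throw instead.
        if (lowerTerm.compareTo(upperTerm) > 0) {
            return new TreeSet<>();
        }
        return termDictionary.subSet(lowerTerm, includeLower, upperTerm, includeUpper);
    }

    public static void main(String[] args) {
        // Hypothetical terms of an "article" field indexed as NOT_ANALYZED.
        NavigableSet<String> terms = new TreeSet<>();
        terms.add("l110-1");
        terms.add("l123-12");
        terms.add("l146-4");
        terms.add("l150-2");

        // Mirrors new TermRangeQuery("article", "l110-1", "l146-4", true, true)
        System.out.println(range(terms, "l110-1", "l146-4", true, true));
        // prints [l110-1, l123-12, l146-4]
    }
}
```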





Re: Rewriting an index without losing 'hidden' data

2011-05-17 Thread Samarendra Pratap
Hi, I know it is rather late to answer this question (sorry, Chris), but I
thought it could still be useful to share things.
I was just going through the mails and found that we did this a few
months back.

*Objective: To add a new field to an existing index without rewriting the
whole index.*

We have an index ("primary index") to which we want to add a new field, say
"tags".
The source of the data is a database.

Here is the pseudo code:

Create an index "index 2" with just two fields, "id" (which is also a unique
identifier in the main index) and "tags" (keep it stored), from the database
(the source of the data).

Open a new IndexWriter ("index 3").

Now run a loop over all the documents of the "primary index" in increasing
order of doc-id:
    Get the document for the current doc-id (starting from zero).
    Find the value of its "id" field.
    Search for this value in the same ("id") field of the secondary index
    ("index 2"), or get the document directly through IndexReader and
    termVector. You should get only one document.
    If the document is found:
        Add this document to "index 3".
    If the document is not found:
        Add a blank document to "index 3" (to maintain the doc-id order).

(After the loop is finished, the doc-ids of "primary index" and "index 3"
will be aligned, i.e. the document at doc-id 5 in "index 3" and the one at
doc-id 5 in "primary index" represent the same database record, with
different fields.)

Open a *ParallelReader* (this is the key :-) ) and add both indexes
("primary index" and "index 3") one by one.
Open an IndexWriter and use addIndexes(IndexReader) to create a single
index.
The final index will contain the primary index plus the "tags" field. :-)
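The alignment loop above can be modeled in plain Java. This is a sketch only: lists stand in for the indexes, a list position stands in for a doc-id, and it assumes the primary index has no deleted documents. It shows why the blank documents matter: "index 3" ends up with exactly one entry per primary doc-id, so a ParallelReader-style view can pair the two sides by position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** In-memory model of building "index 3" (no Lucene; names are illustrative). */
public class AlignedSideIndex {

    /**
     * primaryIds.get(docId) is the "id" value of the primary-index document
     * with that doc-id; tagsById plays the role of "index 2". The result is
     * "index 3": entry i holds the tags for primary doc i, or null where a
     * blank document keeps the doc-ids aligned.
     */
    static List<String> buildIndex3(List<String> primaryIds, Map<String, String> tagsById) {
        List<String> index3 = new ArrayList<>(primaryIds.size());
        for (String id : primaryIds) {    // increasing doc-id order
            index3.add(tagsById.get(id)); // null means "add a blank document"
        }
        return index3;
    }

    /** What a ParallelReader-style view sees for one doc-id: fields from both sides. */
    static String parallelDoc(List<String> primaryIds, List<String> index3, int docId) {
        String tags = index3.get(docId);
        return "id=" + primaryIds.get(docId) + (tags == null ? "" : ", tags=" + tags);
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        List<String> primaryIds = List.of("a17", "b42", "c99");
        Map<String, String> tagsById = Map.of("a17", "red", "c99", "blue");

        List<String> index3 = buildIndex3(primaryIds, tagsById);
        System.out.println(index3);                             // [red, null, blue]
        System.out.println(parallelDoc(primaryIds, index3, 2)); // id=c99, tags=blue
    }
}
```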


I'd welcome comments from the list on any issues there could be with this.


My question, then:
I tried this with a NumericField (as "tags"), but it didn't work.
My guess (excuse me for guessing without deeper investigation) is that this
is because NumericField is not a Field; it is an AbstractField.

Regardless of whether my guess is correct, can someone give me a hint or
point me to something that can help me do the same process successfully
for NumericField as well?

I look forward to hearing from more experienced people.


On Fri, Apr 8, 2011 at 9:38 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> Unfortunately, updateDocument replaces the *entire* previous document
> with the new one.
>
> The ability to update a single indexed field (either replace that
> field entirely, or, change only certain token occurrences within it),
> while leaving all other indexed fields in the document unaffected, has
> been a long requested big missing feature in Lucene.  We call it
> "incremental field updates".
>
> There have been some healthy discussions on the dev list, that have
> worked out a good rough design (eg see
> http://markmail.org/thread/lsfjhpiblzymkfcn).  Also, recent
> improvements in how buffered deletes are handled should make it a lot
> easier for updates to "piggyback" using that same packet stream
> approach.  So... I think there is hope some day that we'll get this
> into Lucene.
>
> Mike
>
> http://blog.mikemccandless.com
>
> On Fri, Apr 8, 2011 at 11:00 AM, Ian Lea  wrote:
> > Unfortunately you just can't do this.  Might be possible if all fields
> > were stored but evidently they are not in your index.  For unstored
> > fields, the Document object will not contain the data that was passed
> > in when the doc was originally added.
> >
> > I believe there might be a way of recreating some of the missing data
> > via TermFreqVector but that has always sounded dodgy and lossy to me.
> >
> > The safest way is to reindex, however painful it might be.  Maybe you
> > could take the opportunity to upgrade lucene at the same time!
> >
> >
> > --
> > Ian.
> >
> >
> > On Fri, Apr 8, 2011 at 3:44 PM, Chris Bamford
> >  wrote:
> >> Hi,
> >>
> >> I recently discovered that I need to add a single field to every
> >> document in an existing (very large) index.  Reindexing from scratch is
> >> not an option I want to consider right now, so I wrote a utility to add
> >> the field by rewriting the index - but this seemed to lose some of the
> >> fields (indexed, but not stored?).  In fact, it shrunk a 12Gb index down
> >> to 4.2Gb - clearly not what I wanted.  :-)
> >> What am I doing wrong?
> >>
> >> My technique was:
> >>
> >>  Analyzer analyser = new StandardAnalyzer();
> >>  IndexSearcher searcher = new IndexSearcher(indexPath);
> >>  IndexWriter indexWriter = new IndexWriter(indexPath, analyser);
> >>  Hits hits = matchAllDocumentsFromIndex(searcher);
> >>
> >>  for (int i=0; i < hits.length(); i++) {
> >>  Document doc = hits.doc(i);
> >>  String id = doc.get("unique-id");
> >>  doc.add(new Field("newField", newValue, Field.Store.YES,
> >>          Field.Index.UN_TOKENIZED));
> >>  indexWriter.updateDocument(new Term("unique-id", id), doc);
> >>  }
> >>
> >>  searcher.close();
> >>  indexWriter.optimize();
> >>  indexWriter.close();
> >>
> >> Note that my matchAllDocumentsFromIndex() does get the right 

QueryParser/StopAnalyzer question

2011-05-17 Thread Mindaugas Žakšauskas
Hi,

Let's say we have an index with a few documents indexed using
StopAnalyzer.ENGLISH_STOP_WORDS_SET. The user issues two queries:
1) foo:bar
2) baz:"there is"

Let's assume that the first query yields some results because there
are documents matching that query.

The second query contains two stopwords ("there" and "is") and yields
0 results. This is because when baz:"there is" is parsed, it ends up
as a void query, as both "there" and "is" are stopwords (technically
speaking, it is converted to an empty BooleanQuery with no clauses).
So far so good.

However, each of the following combined queries

+foo:bar +baz:"there is"
foo:bar AND baz:"there is"

behaves exactly like the query +foo:bar, that is, it brings back
some results. The second, ANDed part, which is supposed to yield no
results, is completely ignored.

One might argue that when ANDing, both conditions have to be met, that
is, documents having foo=bar and baz being empty have to be retrieved,
since when issued separately, baz:"there is" yields 0 results.

It seems contradictory that an atomic query component has a different
impact on the overall query depending on the context. Is there any
logical explanation for this? Can this be addressed in any way,
preferably without writing my own QueryAnalyzer?

If it makes any difference, the observed behaviour is under Lucene v3.0.2.
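A likely mechanism, stated as an assumption about the 3.x QueryParser rather than something confirmed in this thread: when the analyzer removes every token from a clause, the parser builds no query for it (null), and the step that adds clauses to the enclosing BooleanQuery skips null clauses. Parsed on its own, the empty result comes back as an empty BooleanQuery, which matches nothing; inside a larger query, the clause simply disappears. The plain-Java model below mimics that; the tiny stop-word set is hypothetical, not StopAnalyzer.ENGLISH_STOP_WORDS_SET.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Simplified model of a parser silently dropping an all-stopword clause. */
public class StopwordClauseModel {

    // Hypothetical, deliberately tiny stop-word set.
    static final Set<String> STOP_WORDS = Set.of("there", "is", "the", "a");

    /** Lowercase, split on whitespace, and drop stop words. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Required clauses that survive analysis; clauses with no tokens are skipped. */
    static List<List<String>> requiredClauses(String... clauseTexts) {
        List<List<String>> clauses = new ArrayList<>();
        for (String text : clauseTexts) {
            List<String> tokens = analyze(text);
            if (tokens.isEmpty()) {
                continue; // analogous to the parser ignoring a null clause
            }
            clauses.add(tokens);
        }
        return clauses;
    }

    public static void main(String[] args) {
        // +foo:bar +baz:"there is" : only the foo:bar clause survives
        System.out.println(requiredClauses("bar", "there is")); // [[bar]]
        // baz:"there is" alone : no clauses at all, i.e. an empty query
        System.out.println(requiredClauses("there is"));        // []
    }
}
```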

Regards,
Mindaugas




Re: how to create a range query with string parameters

2011-05-17 Thread G.Long

Hi Uwe :)

Thank you for your answer! Now I have another problem. Here is the code
I use to query the index:


ScoreDoc[] hits = null;
TopFieldCollector collector = TopFieldCollector.create(
        new Sort(SortField.FIELD_DOC), 20, true, false, false, false);
Directory directory = FSDirectory.open(new File("/home/user/index"));

IndexSearcher isearcher = new IndexSearcher(directory);
Query tQueryCode = new TermQuery(new Term(FIELD_CODE, "CCOM"));
Query tQueryCodeRef = new TermQuery(new Term(FIELD_CODE_REF, "CCOM"));
Query rQuery = new TermRangeQuery(FIELD_ARTICLE, "l110-1", "l146-4", true, true);

BooleanQuery bQuery = new BooleanQuery();
bQuery.add(tQueryCode, Occur.MUST);
bQuery.add(tQueryCodeRef, Occur.MUST);
bQuery.add(rQuery, Occur.MUST);

System.out.println(bQuery.toString());

isearcher.search(bQuery, collector);
hits = collector.topDocs().scoreDocs;

System.out.println(hits.length);

The query is: +code:CCOM +codeRef:CCOM +article:[l110-1 TO l146-4]

hits.length is zero, although there should be hits. I'm using a
program called lukeall 3.1, which provides a GUI to query a Lucene index.
When I copy the query into this program and run it, it returns a lot of
results =o


So I guess I'm missing something. I thought about a missing analyzer, but
I'm not sure...


Regards,
Gary




Re: how to create a range query with string parameters

2011-05-17 Thread Ian Lea
It's likely to have something to do with analyzers.  That is usually
the first thing to come to mind if queries hold upper or mixed case
terms.  Maybe Luke is using an analyzer that matches the one you used
when you indexed your documents.
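The case-sensitivity trap can be sketched without Lucene. Assumption: the "code" field was indexed with an analyzer that lowercases tokens (StandardAnalyzer does), whereas a hand-built TermQuery looks up its text verbatim, with no analysis. The helper names are hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Simplified model of an index-time/query-time analyzer mismatch. */
public class AnalyzerMismatchModel {

    /** Stand-in for index-time analysis of a one-token value: lowercase it. */
    static String indexTimeAnalyze(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    /** A hand-built TermQuery is not analyzed: its text is looked up verbatim. */
    static boolean termQueryMatches(Set<String> indexedTerms, String queryText) {
        return indexedTerms.contains(queryText);
    }

    public static void main(String[] args) {
        Set<String> indexedTerms = new HashSet<>();
        indexedTerms.add(indexTimeAnalyze("CCOM")); // the index holds "ccom"

        System.out.println(termQueryMatches(indexedTerms, "CCOM"));                   // false
        System.out.println(termQueryMatches(indexedTerms, indexTimeAnalyze("CCOM"))); // true
    }
}
```

Running the query text through the same analyzer used at index time (for example via a QueryParser built with that analyzer) is what makes the two sides agree.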

You can use Luke to see what is being stored in the index.   See also
http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F


Something that looks OK here but might bite you in the future is if
your article fields aren't always in the same format and of the same
length.  The comparison is a simple string-based one, and if you had,
say, l123-1, l1-123, l1-999 the range matching might not give you
what you expected.
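To see the string-comparison pitfall, here is a plain-Java sketch that sorts the three article values mentioned above and tests an inclusive range with String.compareTo. The value "l12-5" and the range bounds are hypothetical examples.

```java
import java.util.Arrays;

/** Plain-Java illustration of string ordering for article values (no Lucene). */
public class ArticleOrderDemo {

    /** Inclusive range test using plain String comparison. */
    static boolean inRange(String term, String lower, String upper) {
        return lower.compareTo(term) <= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String[] articles = {"l123-1", "l1-123", "l1-999"};
        Arrays.sort(articles);
        // '-' (U+002D) sorts before '2' (U+0032), so "l1-999" precedes "l123-1".
        System.out.println(Arrays.toString(articles)); // [l1-123, l1-999, l123-1]

        // Article 12 falls inside a range meant for articles 100 to 130,
        // because "l12-5" is compared character by character with "l100-1".
        System.out.println(inRange("l12-5", "l100-1", "l130-1")); // true
    }
}
```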

--
Ian.





Re: how to create a range query with string parameters

2011-05-17 Thread G.Long
I added a standard analyzer and a QueryParser to parse each boolean
clause of my query, and I got some results :)

But now there is some strange behavior.

The following queries:

+code:CCOM +article:"l123-12"
+code:CCOM +article:"l123-13"
+code:CCOM +article:"l123-14"

return one result.

However, the following query:

+code:CCOM +article[l123-12 TO l123-14]

returns nothing =(

With other parameters, the range query works almost fine, but some
results are missing.
What could be the problem? Could it have something to do with the way
the documents are indexed (the use of Field.Index.ANALYZED, for example)?

Thank you for your help :)




Re: how to create a range query with string parameters

2011-05-17 Thread Ian Lea
Could it be as simple as a missing colon after article in +code:CCOM
+article[l123-12 TO l123-14]?

If not, double-check the analyzers, see what Luke shows as indexed terms
for that field, and work through the FAQ info posted earlier.  And play
with quotes - sometimes you show your article values quoted, sometimes
not.


--
Ian.





Re: how to create a range query with string parameters

2011-05-17 Thread G.Long
I set the article field to NOT_ANALYZED and stopped quoting the article
values in the range part of the query, and it looks like it works better now.


However, some results are still missing. For example, sometimes a range
like [l220-2 TO l220-10] will not return any results (although I'm sure
there are results for this range).


At first I thought that was because both bounds of the range start with
220, but I double-checked a range like [a710-4 TO a710-10] and it
returned results... :/


So it looks like there is another problem. I have to investigate more =)

Thank you for your help :)

Regards,

Gary




Re: how to create a range query with string parameters

2011-05-17 Thread Erick Erickson
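Actually, there are no results in the range [l220-2 TO l220-10].

This is basically a string comparison: the two bounds first differ at the
character after "l220-", where '2' sorts after '1', so l220-2 > l220-10.
The lower bound is above the upper bound, so this range would never match.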
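One common workaround, not proposed in this thread and shown only as a hedged sketch, is to normalize article values so that string order matches numeric order: zero-pad every run of digits to a fixed width, both when indexing the NOT_ANALYZED field and when building the range bounds. The normalize helper and the width of 5 are hypothetical choices.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical normalizer making string order agree with numeric order. */
public class ArticleKeyNormalizer {

    private static final Pattern DIGITS = Pattern.compile("\\d+");
    // Assumed fixed width; it must be at least the longest number expected.
    private static final int WIDTH = 5;

    /** Left-pads every run of digits with zeros, e.g. "l220-2" becomes "l00220-00002". */
    static String normalize(String article) {
        Matcher m = DIGITS.matcher(article);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String padded = String.format("%" + WIDTH + "s", m.group()).replace(' ', '0');
            m.appendReplacement(sb, padded);
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Raw bounds are inverted: '2' > '1' at the first differing character.
        System.out.println("l220-2".compareTo("l220-10") > 0); // true

        String lower = normalize("l220-2");  // l00220-00002
        String upper = normalize("l220-10"); // l00220-00010
        String middle = normalize("l220-5"); // l00220-00005
        System.out.println(lower.compareTo(middle) < 0 && middle.compareTo(upper) < 0); // true
    }
}
```

The same normalize call would have to be applied to the stored terms and to both bounds passed to TermRangeQuery; mixing normalized and raw values would bring the ordering problem back.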

Best
Erick




I need an available solr lucene consultant

2011-05-17 Thread Lance
Hi,

I am looking for an experienced and skilled Solr & Lucene developer/consultant 
to work on a software project incorporating natural language processing and 
machine learning algorithms. As part of a larger NLP/AI project that is under 
way, we need someone to install, refine and optimize Solr and Lucene for our 
website. The data being analyzed will be from user-generated textual 
discussions around a multitude of topics that will continuously be updated.
You must be able to work in a LAMP environment with other developers, be smart, 
reliable, and a self-starter with excellent problem solving and analytical 
abilities. You must have a solid grasp of English – written and verbal. 

Please note that I am a start-up and I am not going to be able to pay what a 
large established company can pay.

Thank you,

Lance 

-
Lance