Re: QueryParser, double quotes and wildcard inside the double quotes

2012-07-04 Thread Jochen Hebbrecht
Thanks Ian, I'll give it a try!

2012/7/3 Ian Lea 

> You can use the QueryParser proximity feature e.g. "foo test"~n where
> n is the max distance you want them to be apart.  Or look at the
> SpanQuery stuff e.g. SpanNearQuery.
>
>
> --
> Ian.
>
>
> On Tue, Jul 3, 2012 at 4:59 PM, Jochen Hebbrecht wrote:
> > Hi all,
> >
> > Imagine you have the following books which are indexed using Lucene
> >
> > book1 -> title: "foo bar test"
> > book2 -> title: "foo barrr test"
> > book3 -> title: "foo bar bar"
> >
> > I want to find book1 and book2 using the following query: "foo * test".
> > But if I pass this string to the QueryParser, the QueryParser seems to be
> > searching for a literal '*' character.
> > Any ideas how to fix this?
> >
> > Thanks!
> > Jochen
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
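Ian's proximity suggestion can be illustrated without Lucene. The sketch below is a plain-Java model, not Lucene code, and the class and method names are hypothetical. A title matches when "foo" is followed by "test" with at most `slop` tokens in between. That is roughly what one would expect from parsing "\"foo test\"~1" with QueryParser, or from a SpanNearQuery over two SpanTermQuery clauses with slop 1 and inOrder set to true. With slop 1, the model accepts book1 and book2 and rejects book3. It also accepts "foo test" with nothing in between, so it is looser than a literal "foo * test". Unlike a sloppy phrase with a large enough slop, the model only accepts the two terms in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative, Lucene-free model (hypothetical names, not Lucene API) of an
// in-order "near" match between two terms, as in "foo test"~1 or an ordered
// SpanNearQuery with slop 1.
final class OrderedNear {

    /** Splits text on whitespace and lowercases it; list index = token position. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** True if `second` follows `first` with at most `slop` tokens in between. */
    static boolean matches(List<String> tokens, String first, String second, int slop) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(first)) {
                continue;
            }
            // Positions i+1 .. i+1+slop may hold the second term.
            for (int j = i + 1; j <= i + 1 + slop && j < tokens.size(); j++) {
                if (tokens.get(j).equals(second)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Returns the titles that match, in input order. */
    static List<String> search(List<String> titles, String first, String second, int slop) {
        List<String> hits = new ArrayList<>();
        for (String title : titles) {
            if (matches(tokenize(title), first, second, slop)) {
                hits.add(title);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> books = Arrays.asList("foo bar test", "foo barrr test", "foo bar bar");
        System.out.println(search(books, "foo", "test", 1)); // [foo bar test, foo barrr test]
    }
}
```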


Re: Starts with Query - Return like search

2012-07-04 Thread Ian Lea
Where exactly are you using these double-quoted strings? QueryParser?
It would help if you showed a code snippet.

Assuming your real data is more complex and the strings you are
searching for aren't necessarily at the start of the text, you'll need
some mix of wildcard and proximity searching. I don't think that
"foo ba*"~n will work, but I'm sure you'll be able to do it with a
SpanQuery or six. SpanNearQuery lets you specify the slop and whether
you care if matches are in order or not.

See http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/ for
info on spans.

See also 
http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
for good tips on figuring out why things aren't doing what you want.

Good luck.


--
Ian.
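The difference between Hiren's `(+foo* +ba*)` and what he wants can be modelled in a few lines of plain Java. The names are hypothetical, not Lucene API. `booleanPrefixes` behaves like the boolean query: each prefix must match some token anywhere in the document, so "bar india foo" matches too. `nearPrefix` behaves like the in-order SpanNearQuery Ian suggests, with the second clause matching by prefix. In Lucene 3.x, such a clause would presumably be a PrefixQuery wrapped in SpanMultiTermQueryWrapper; check that against the javadocs of the version in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative, Lucene-free comparison (hypothetical names): boolean prefix
// matching versus in-order "near" matching with a prefix on the second word.
final class PrefixNear {

    /** Splits text on whitespace and lowercases it; list index = token position. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Like (+p1* +p2* ...): every prefix must match some token, position ignored. */
    static boolean booleanPrefixes(List<String> tokens, String... prefixes) {
        for (String prefix : prefixes) {
            boolean found = false;
            for (String token : tokens) {
                if (token.startsWith(prefix)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    /** `term` followed, at most `slop` tokens later, by a token starting with `prefix`. */
    static boolean nearPrefix(List<String> tokens, String term, String prefix, int slop) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(term)) {
                continue;
            }
            for (int j = i + 1; j <= i + 1 + slop && j < tokens.size(); j++) {
                if (tokens.get(j).startsWith(prefix)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String doc : Arrays.asList("foo bag test", "foo bar test", "bar india foo")) {
            List<String> tokens = tokenize(doc);
            System.out.println(doc + " | boolean: " + booleanPrefixes(tokens, "foo", "ba")
                    + " | near: " + nearPrefix(tokens, "foo", "ba", 0));
        }
    }
}
```

With slop 0, this is exactly "foo" immediately followed by a word starting with "ba". A larger slop tolerates words in between, which matters when, as Ian says, the strings are not at the start of the text.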


On Wed, Jul 4, 2012 at 7:11 AM, Hiren Shah  wrote:
> I have used StandardAnalyzer to index the data as ANALYZED.
>
> The data is as below:
>
>1. foo bag test
>2. foo bar test
>3. bar india foo
>
>
> When I search for foo ba, I get all three results if I use (+foo* +ba*).
>
>1. I tried "foo ba" (with double quotes), but no results come back, as
>it searches for the exact words.
>2. I tried "foo ba*" (with double quotes), but no results come back, as
>it searches for the exact words.
>3. I tried "foo bar" (with double quotes); then the 2nd result comes
>back, as both words are complete.
>
> What should be done to get results 1 and 2 when the user types
> "foo ba*"? I don't want the 3rd result, only the first 2.
> Please help.
>
> Thanks
> Hiren




Re: Starts with Query - Return like search

2012-07-04 Thread Ian Lea
In fact, there is an FAQ entry, 'Can I combine wildcard and phrase
search, e.g. "foo ba*"?', at
http://wiki.apache.org/lucene-java/LuceneFAQ#Can_I_combine_wildcard_and_phrase_search.2C_e.g._.22foo_ba.2A.22.3F
which suggests extending the QueryParser to build a MultiPhraseQuery.
There's also ComplexPhraseQueryParser, which looks interesting.


--
Ian.
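The FAQ's approach can be sketched in plain Java. This is not Lucene code, and the names are hypothetical. The sketch expands the prefix "ba" against the sorted term dictionary, then requires "foo" at one position and any of the expanded terms at the next. That is the shape of query MultiPhraseQuery.add(Term[]) produces, where any of the terms added at a position may match. A real implementation would read the terms from the index rather than from the documents, and a short prefix can expand to a great many terms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative, Lucene-free model (hypothetical names) of "extend the parser to
// build a MultiPhraseQuery": expand the trailing prefix against the sorted term
// dictionary, then allow any expanded term at the last phrase position.
final class PrefixPhrase {

    /** Splits text on whitespace and lowercases it; list index = token position. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** All dictionary terms that start with `prefix`, in sorted order. */
    static List<String> expand(TreeSet<String> dictionary, String prefix) {
        List<String> out = new ArrayList<>();
        for (String term : dictionary.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can match
            }
            out.add(term);
        }
        return out;
    }

    /** Exact phrase in which position k may be any term in alternatives.get(k). */
    static boolean multiPhrase(List<String> tokens, List<Set<String>> alternatives) {
        int n = alternatives.size();
        for (int start = 0; start + n <= tokens.size(); start++) {
            boolean ok = true;
            for (int k = 0; k < n && ok; k++) {
                ok = alternatives.get(k).contains(tokens.get(start + k));
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    /** Runs "firstTerm lastPrefix*" over docs; returns 1-based numbers of matching docs. */
    static List<Integer> search(List<String> docs, String firstTerm, String lastPrefix) {
        TreeSet<String> dictionary = new TreeSet<>();
        List<List<String>> tokenized = new ArrayList<>();
        for (String doc : docs) {
            List<String> tokens = tokenize(doc);
            tokenized.add(tokens);
            dictionary.addAll(tokens);
        }
        List<Set<String>> phrase = new ArrayList<>();
        phrase.add(Collections.singleton(firstTerm));
        phrase.add(new TreeSet<>(expand(dictionary, lastPrefix)));
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < tokenized.size(); i++) {
            if (multiPhrase(tokenized.get(i), phrase)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }
}
```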





Re: Starts with Query - Return like search

2012-07-04 Thread Hiren Shah
Please find the code below:

package org.lucenesample;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.Field.TermVector;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class ExactPhrasesearchUsingStandardAnalyser {

    public static void main(String[] args) throws Exception {
        // In-memory index analyzed with StandardAnalyzer (Lucene 3.5 API).
        Directory directory = new RAMDirectory();
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
        IndexWriter writer = new IndexWriter(directory,
                new IndexWriterConfig(Version.LUCENE_35, analyzer));
        writer.addDocument(createDocument1("1", "foo bar baz blue"));
        writer.addDocument(createDocument1("2", "red green blue"));
        writer.addDocument(createDocument1("3", "test panda foo & bar testt"));
        writer.addDocument(createDocument1("4", " bar test test foo in panda  red blue "));
        writer.addDocument(createDocument1("5", "test")); // was a duplicate id "4"
        writer.close();

        IndexReader reader = IndexReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(reader);
        QueryParser qp = new QueryParser(Version.LUCENE_35, "contents", analyzer);
        //qp.setDefaultOperator(QueryParser.Operator.AND);

        String[] queryStrings = {
            "test foo in panda re*",                  // "contains" query: any term may match
            "+red +green +blu*",                      // all words required, last word as a prefix
            "(+red +green +blu*) AND \"red* green\"", // AND, not '&'; '*' inside quotes is not a wildcard
            "\"air quality\"~10"                      // proximity (sloppy phrase) query
        };

        System.out.println("**Searching Code starts**");
        for (String queryString : queryStrings) {
            Query query = qp.parse(queryString);
            System.out.println("Query: " + query);
            TopDocs topDocs = searcher.search(query, 10);
            for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
                Document doc = searcher.doc(scoreDoc.doc);
                System.out.println("  " + doc.get("id") + ": " + doc.get("contents"));
            }
        }
        searcher.close();
        reader.close();
    }

    private static Document createDocument1(String id, String content) {
        Document doc = new Document();
        doc.add(new Field("id", id, Store.YES, Index.NOT_ANALYZED));
        doc.add(new Field("contents", content, Store.YES, Index.ANALYZED,
                TermVector.WITH_POSITIONS_OFFSETS));
        System.out.println(content);
        return doc;
    }
}


Also, please refer to the post below:
http://stackoverflow.com/questions/10828825/incremental-search-using-lucene

Re: Starts with Query - Return like search

2012-07-04 Thread Jack Krupansky
You might also consider using the EdgeNGram filter for your documents, since
it would index "bar" as both "ba" and "bar" at the same position,
eliminating the need for wildcards. It makes the index bigger,
but avoids the performance cost of wildcard queries. It isn't great for
all situations, but it may work well for your case.


-- Jack Krupansky
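Jack's suggestion moves the work to index time. The plain-Java sketch below uses hypothetical names and is not the Lucene filter itself. It generates front edge n-grams for each token and builds a small inverted index, so that afterwards the partial word "ba" is an ordinary term lookup rather than a wildcard expansion. The minimum gram size of 2 is an assumption chosen to match Jack's "ba"/"bar" example. For Hiren's three documents, the dictionary grows from 5 distinct words to 12 terms, which is the "index bigger" trade-off in miniature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative, Lucene-free model (hypothetical names) of indexing front edge
// n-grams, so that a partial word becomes a plain term lookup at search time.
final class EdgeGrams {

    /** Front edge n-grams of `token` with lengths minGram..maxGram (capped at the token length). */
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    /** Inverted index: gram -> 1-based numbers of the docs containing it. */
    static Map<String, TreeSet<Integer>> index(List<String> docs, int minGram, int maxGram) {
        Map<String, TreeSet<Integer>> postings = new TreeMap<>();
        for (int d = 0; d < docs.size(); d++) {
            for (String token : docs.get(d).toLowerCase().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                for (String gram : edgeNGrams(token, minGram, maxGram)) {
                    postings.computeIfAbsent(gram, k -> new TreeSet<>()).add(d + 1);
                }
            }
        }
        return postings;
    }
}
```

Looking up "ba" in this index returns documents 1, 2 and 3. On their own, grams ignore word order, so they still need to be combined with positions (for example, a phrase query) to exclude "bar india foo".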




Re: Starts with Query - Return like search

2012-07-04 Thread Hiren Shah
Hi Jack,
Does this need to be taken care of while indexing? Where can I get the code for
edge n-gram indexing and then searching?

-Hiren



Re: Starts with Query - Return like search

2012-07-04 Thread Jack Krupansky

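Here's a Solr field type that supports edge n-grams:

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="..."/>
    <filter class="solr.EdgeNGramFilterFactory" ... maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="..."/>
  </analyzer>
</fieldType>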

In Lucene, you would use the EdgeNGramFilter.

This is for Lucene/Solr 3.6.

-- Jack Krupansky
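For Hiren's "foo ba" case, what matters is where the grams sit and what happens to the query. The plain-Java sketch below uses hypothetical names and is not Lucene code. It makes two stated assumptions:

1. Grams are indexed at the same position as their source token, as Jack describes; check this for the filter version in use.
2. Only the index side is n-grammed, so the query "foo ba" stays two whole words.

The minimum gram size of 2 is also an assumption. Under these assumptions, a plain exact-phrase match is enough.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative, Lucene-free model (hypothetical names). Assumptions: edge n-grams
// are indexed at the same position as their source token, and the query side
// is NOT n-grammed, so query words are matched as whole terms.
final class EdgeGramPhrase {

    /** One set per token position, holding that token's front grams (minGram..maxGram). */
    static List<Set<String>> indexDoc(String text, int minGram, int maxGram) {
        List<Set<String>> positions = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            Set<String> grams = new HashSet<>();
            for (int len = minGram; len <= Math.min(maxGram, token.length()); len++) {
                grams.add(token.substring(0, len));
            }
            positions.add(grams);
        }
        return positions;
    }

    /** Exact phrase: query word k must be indexed at position start + k. */
    static boolean phraseMatches(List<Set<String>> doc, String query) {
        String[] words = query.toLowerCase().trim().split("\\s+");
        for (int start = 0; start + words.length <= doc.size(); start++) {
            boolean ok = true;
            for (int k = 0; k < words.length && ok; k++) {
                ok = doc.get(start + k).contains(words[k]);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }
}
```

In this model, "foo ba" matches "foo bag test" and "foo bar test" but not "bar india foo", which is the result Hiren asked for. Also in this model, a whole word longer than maxGram can no longer be matched, because only its grams are indexed.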




Re: Starts with Query - Return like search

2012-07-04 Thread Jack Krupansky

Oops... that's EdgeNGramTokenFilter in Lucene.

-- Jack Krupansky




Re: [ANNOUNCE] Apache Lucene 4.0-alpha released.

2012-07-04 Thread Bill Bell
Hey, how do we use the MemoryCodec in Solr?


On Jul 3, 2012, at 7:09 AM, Robert Muir  wrote:

> 3 July 2012, Apache Lucene 4.0-alpha available
> The Lucene PMC is pleased to announce the release of Apache Lucene 4.0-alpha
> 
> Apache Lucene is a high-performance, full-featured text search engine
> library written entirely in Java. It is a technology suitable for nearly
> any application that requires full-text search, especially cross-platform.
> 
> This release contains numerous bug fixes, optimizations, and
> improvements, some of which are highlighted below.  The release
> is available for immediate download at:
>   http://lucene.apache.org/core/mirrors-core-latest-redir.html?ver=4.0a
> 
> See the CHANGES.txt file included with the release for a full list of
> details.
> 
> Lucene 4.0-alpha Release Highlights:
> 
> * The index formats for terms, postings lists, stored fields, term
>   vectors, etc. are pluggable via the Codec API. You can select from
>   the provided implementations or customize the index format with your
>   own Codec to meet your needs.
>
> * Similarity has been decoupled from the vector space model (TF/IDF).
>   Additional models such as BM25, Divergence from Randomness, Language
>   Models, and Information-based models are provided (see
>   http://www.lucidimagination.com/blog/2011/09/12/flexible-ranking-in-lucene-4).
> 
> * Added support for per-document values (DocValues). DocValues can be
>   used for custom scoring factors (accessible via Similarity), for
>   pre-sorted Sort values, and more.
>
> * When indexing via multiple threads, each IndexWriter thread now
>   flushes its own segment to disk concurrently, resulting in
>   substantial performance improvements (see
>   http://blog.mikemccandless.com/2011/05/265-indexing-speedup-with-lucenes.html).
>
> * Per-document normalization factors ("norms") are no longer limited
>   to a single byte. Similarity implementations can use any DocValues
>   type to store norms.
>
> * Added index statistics such as the number of tokens for a term or
>   field, number of postings for a field, and number of documents with
>   a posting for a field: these support additional scoring models (see
>   http://blog.mikemccandless.com/2012/03/new-index-statistics-in-lucene-40.html).
> 
> * Implemented a new default term dictionary/index (BlockTree) that
>   indexes shared prefixes instead of every n'th term. This is not only
>   more time- and space-efficient, but can also sometimes avoid going to
>   disk at all for terms that do not exist. Alternative term dictionary
>   implementations are provided and pluggable via the Codec API.
>
> * Indexed terms are no longer UTF-16 char sequences; instead, terms can
>   be any binary value encoded as byte arrays. By default, text terms are
>   now encoded as UTF-8 bytes. Sort order of terms is now defined by
>   their binary value, which is identical to UTF-8 sort order.
> 
> * Substantially faster performance when using a Filter during searching.
> 
> * File-system based directories can rate-limit the IO (MB/sec) of merge
>   threads, to reduce IO contention between merging and searching threads.
> 
> * Added a number of alternative Codecs and components for different
>   use-cases: "Appending" works with append-only filesystems (such as
>   Hadoop DFS), "Memory" writes the entire terms+postings as an FST read
>   into RAM (see
>   http://blog.mikemccandless.com/2011/06/primary-key-lookups-are-28x-faster-with.html),
>   "Pulsing" inlines the postings for low-frequency terms into the term
>   dictionary (see
>   http://blog.mikemccandless.com/2010/06/lucenes-pulsingcodec-on-primary-key.html),
>   "SimpleText" writes all files in plain-text for easy
>   debugging/transparency (see
>   http://blog.mikemccandless.com/2010/10/lucenes-simpletext-codec.html),
>   among others.
>
> * Term offsets can be optionally encoded into the postings lists and
>   can be retrieved per-position.
>
> * A new AutomatonQuery returns all documents containing any term
>   matching a provided finite-state automaton (see
>   http://www.slideshare.net/otisg/finite-state-queries-in-lucene).
> 
> * FuzzyQuery is 100-200 times faster than in past releases (see
>   http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html).
>
> * A new spell checker, DirectSpellChecker, finds possible corrections
>   directly against the main search index without requiring a separate
>   index.
>
> * Various in-memory data structures such as the term dictionary and
>   FieldCache are represented more efficiently with less object overhead
>   (see http://blog.mikemccandless.com/2010/07/lucenes-ram-usage-for-searching.html).
>
> * All search logic is now required to work per segment; IndexReader
>   was therefore refactored to differentiate between atomic and
>   composite readers (see
>   http://blog.thetaphi.de/2012/02/is-your-indexreader-atomic-major.html).
> 
> * L