Re: Escaping Special Characters

2011-07-05 Thread govind bhardwaj
Hi,
I don't follow. The escape() method takes a String as its argument.

The code snippet I am using is as follows:

String escaped = MultiFieldQueryParser.escape(queryString);
Query query1 = parser.parse(escaped);
TopDocs results = searcher.search(query1);

Please point out where I may be going wrong.

Govind

On Mon, Jul 4, 2011 at 10:16 AM, Adriano Crestani wrote:

> Hi Govind,
>
> The escape() method should only be used to escape a term, not the query
> itself. If the user is entering the query, it's his responsibility to
> escape the query.
>
> On Mon, Jul 4, 2011 at 4:21 AM, govind bhardwaj wrote:
>
> > Hi,
> >
> > I am using Lucene version 3.1.
> > Previously I had trouble with special characters: when I entered
> > "---" as my input, it gave the following error:
> >
> > Caused by: org.apache.lucene.queryParser.ParseException: Encountered " "-" "- "" at line 1, column 1.
> > Was expecting one of:
> >     "(" ...
> >     "*" ...
> >     ...
> >     ...
> >     ...
> >     ...
> >     "[" ...
> >     "{" ...
> >     ...
> >     ...
> >     "*" ...
> >
> > To overcome this, I used the escape() method of the QueryParser and it
> > worked fine. But now, unlike previously, the search for "item*" yields no
> > results, because I guess it escaped the asterisk character too. How should
> > I go about preventing this from happening? I am using
> > MultiFieldQueryParser.
> >
> > Govind
> >
> > --
> > No trees were harmed in the creation of this message, but several
> > thousand electrons were mildly inconvenienced.
> >
>





Re: full text searching in cloud for minor enterprises

2011-07-05 Thread Joe Scanlon
Look at searchblox

On Monday, July 4, 2011, Li Li wrote:
> Hi all,
>     I want to provide full-text search for some "small" websites.
> Cloud computing seems popular now, and it would save costs because
> no engineer needs to be employed to maintain the machines.
>     There are many services, such as Amazon S3, Google App Engine and
> MS Azure, but I am not familiar with cloud computing. Can anyone
> give me some direction or advice? Thanks.
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>

-- 
Joe Scanlon

jscan...@element115.net

Mobile: 603 459 3242
Office:  312 445 0018




Re: [ANNOUNCE] Apache Lucene 3.3

2011-07-05 Thread Jan Engler
Hi,

Does anyone know where I can find the class "ChainedFilter" in Lucene
3.3? Before our upgrade (from 3.0.2 to 3.3.3) it was located in
lucene-misc, but I cannot find it there anymore...

Thx for your help,
 Jan

On 01.07.2011 07:56, Robert Muir wrote:
> July 2011, Apache Lucene™ 3.3 available
> The Lucene PMC is pleased to announce the release of Apache Lucene 3.3.
> 
> Apache Lucene is a high-performance, full-featured text search engine
> library written entirely in Java. It is a technology suitable for nearly
> any application that requires full-text search, especially cross-platform.
> 
> This release contains numerous bug fixes, optimizations, and
> improvements, some of which are highlighted below.  The release
> is available for immediate download at:
>    http://www.apache.org/dyn/closer.cgi/lucene/java (see note below).
> 
> See the CHANGES.txt file included with the release for a full list of details.
> 
> Lucene 3.3 Release Highlights:
> 
>  * The spellchecker module now includes suggest/auto-complete functionality,
>    with three implementations: Jaspell, Ternary Trie, and Finite State.
> 
>  * Support for merging results from multiple shards, for both "normal"
>    search results (TopDocs.merge) as well as grouped results using the
>    grouping module (SearchGroup.merge, TopGroups.merge).
> 
>  * An optimized implementation of KStem, a less aggressive stemmer
>    for English.
> 
>  * Single-pass grouping implementation based on block document indexing.
> 
>  * Improvements to MMapDirectory (now also the default implementation
>    returned by FSDirectory.open on 64-bit Linux).
> 
>  * NRTManager simplifies handling near-real-time search with multiple
>    search threads, allowing the application to control which indexing
>    changes must be visible to which search requests.
> 
>  * TwoPhaseCommitTool facilitates performing a multi-resource
>    two-phased commit, including IndexWriter.
> 
>  * The default merge policy, TieredMergePolicy, has a new method
>    (set/getReclaimDeletesWeight) to control how aggressively it
>    targets segments with deletions, and is now more aggressive than
>    before by default.
> 
>  * PKIndexSplitter tool splits an index by a mid-point term.
> 
> Note: The Apache Software Foundation uses an extensive mirroring network for
> distributing releases.  It is possible that the mirror you are using may not
> have replicated the release yet.  If that is the case, please try another
> mirror.  This also goes for Maven access.
> 
> Thanks,
> Apache Lucene Developers
> 



Re: [ANNOUNCE] Apache Lucene 3.3

2011-07-05 Thread Robert Muir
Hi Jan,

 * LUCENE-2323: Moved contrib/regex into contrib/queries. Moved the
   queryparsers under contrib/misc and contrib/surround into
   contrib/queryparser.
   Moved contrib/fast-vector-highlighter into contrib/highlighter.
   Moved ChainedFilter from contrib/misc to contrib/queries. contrib/spatial now
   depends on contrib/queries instead of contrib/misc.

For future reference, you can find this information in CHANGES.txt and
contrib/CHANGES.txt:
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/CHANGES.txt


On Tue, Jul 5, 2011 at 8:30 AM, Jan Engler wrote:
> Hi,
>
> Does anyone know where I can find the class "ChainedFilter" in Lucene
> 3.3? Before our upgrade (from 3.0.2 to 3.3.3) it was located in
> lucene-misc, but I cannot find it there anymore...
>
> Thx for your help,
>  Jan
>



Re: [ANNOUNCE] Apache Lucene 3.3

2011-07-05 Thread Jan Engler
Hi Robert,

thanks a lot, found the right one ;-)

Thx again,
 Jan

On 05.07.2011 14:34, Robert Muir wrote:
> Hi Jan,
> 
>  * LUCENE-2323: Moved contrib/regex into contrib/queries. Moved the
>    queryparsers under contrib/misc and contrib/surround into
>    contrib/queryparser.
>    Moved contrib/fast-vector-highlighter into contrib/highlighter.
>    Moved ChainedFilter from contrib/misc to contrib/queries. contrib/spatial
>    now depends on contrib/queries instead of contrib/misc.
> 
> For future reference, you can find this information in CHANGES.txt and
> contrib/CHANGES.txt:
> http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/CHANGES.txt
> 
> 



deleting 8,000,000 indexes takes forever!!!! any solution to this...

2011-07-05 Thread Hiller, Dean x66079
We are using a sort of NoSQL environment. Deleting 200 GB from the database
on one machine is fast, but when we then delete the 5 GB of indexes that
were created, it takes forever.

Is there any option in Lucene to make it use fewer, LARGER files, so that an
index is easier to maintain and much faster to wipe out?

Thanks,
Dean

This message and any attachments are intended only for the use of the addressee 
and may contain information that is privileged and confidential. If the reader 
of the message is not the intended recipient or an authorized representative of 
the intended recipient, you are hereby notified that any dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please notify us immediately by e-mail and delete the message and any 
attachments from your system.



Re: Escaping Special Characters

2011-07-05 Thread Adriano Crestani
Hi Govind,

I think the mistake is assuming that escape() should receive a whole query
string as its parameter. It is meant to receive a single term, which is then
used to build a query. See the example below:

// this is bad, since +, &, * and - will all be escaped
String query = "+lucene&solr -apache*";
String escapedQuery = MultiFieldQueryParser.escape(query);
Query q = queryparser.parse(escapedQuery,...);

// this is good, the operators +, * and - will not be escaped
String term1 = "lucene&solr";
String term2 = "apache";
String term1Escaped = MultiFieldQueryParser.escape(term1);
String term2Escaped = MultiFieldQueryParser.escape(term2);
String escapedQuery = "+" + term1Escaped + " -" + term2Escaped + "*";
Query q = queryparser.parse(escapedQuery,...);

As the second example shows, the escape() method should be used to escape
terms, not the entire query. Its only job is to make sure syntax characters
are escaped and not treated as query operators, so it's wrong to assume
escape() will leave the * in the query term* unescaped.
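
For the original "item*" case, one possible approach is to treat a single
trailing * as the prefix operator and escape everything else. The sketch
below is self-contained and does not call Lucene: escapeTerm() re-implements
the character set that QueryParser.escape() escapes in Lucene 3.x (an
assumption worth checking against your version), and TermEscapeSketch and
toQueryString() are hypothetical names. In real code, call the library's
escape() on the stem instead.

```java
// Sketch only: escapeTerm() mirrors the character set that Lucene 3.x
// QueryParser.escape() is assumed to escape; prefer the library method.
final class TermEscapeSketch {
    // Assumed query-syntax characters: \ + - ! ( ) : ^ [ ] " { } ~ * ? | &
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    /** Backslash-escapes every query-syntax character in a single term. */
    static String escapeTerm(String term) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /**
     * Hypothetical helper: keeps one trailing '*' as a prefix operator and
     * escapes the rest of the input, so "item*" stays a prefix query while
     * "---" is passed to the parser as escaped literal characters.
     */
    static String toQueryString(String userInput) {
        if (userInput.length() > 1 && userInput.endsWith("*")) {
            String stem = userInput.substring(0, userInput.length() - 1);
            return escapeTerm(stem) + "*";
        }
        return escapeTerm(userInput);
    }

    public static void main(String[] args) {
        System.out.println(toQueryString("item*"));
        System.out.println(toQueryString("---"));
        System.out.println(toQueryString("lucene&solr"));
    }
}
```

The result of toQueryString() would then be handed to parser.parse(). Whether
an input such as "---" produces any hits still depends on the analyzer in use.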

On Tue, Jul 5, 2011 at 3:12 AM, govind bhardwaj wrote:

> Hi,
> I don't follow. The escape() method takes a String as its argument.
>
> The code snippet I am using is as follows:
>
> String escaped = MultiFieldQueryParser.escape(queryString);
> Query query1 = parser.parse(escaped);
> TopDocs results = searcher.search(query1);
>
> Please point out where I may be going wrong.
>
> Govind
>


Re: deleting 8,000,000 indexes takes forever!!!! any solution to this...

2011-07-05 Thread Shai Erera
Hi Dean

Could you share a little more information about those indexes (and your
problem in general), such as:
* Is there one index, or 8M indexes?
* How many files do those indexes contain? Do you use compound file format?
* What is the command/API you use to delete the indexes?
* Lucene version, IndexWriter settings etc.

Shai

On Tue, Jul 5, 2011 at 6:50 PM, Hiller, Dean x66079 <
dean.hil...@broadridge.com> wrote:

> We are using a sort of NoSQL environment. Deleting 200 GB from the database
> on one machine is fast, but when we then delete the 5 GB of indexes that
> were created, it takes forever.
>
> Is there any option in Lucene to make it use fewer, LARGER files, so that an
> index is easier to maintain and much faster to wipe out?
>
> Thanks,
> Dean


Index statistics

2011-07-05 Thread Andres Taylor
Hi there,

I work with Neo4j, a NoSQL graph
database tightly coupled with Lucene. I am now working on an optimizing
execution engine. To do this well, I would like to know more about the
existing Lucene indices. Ideally, I'd like to be able to ask a Lucene index
how many hits a query might give me, before I actually run the query. The
answer will probably just be an estimation, but that's fine.

Is this possible today?

Best regards,

Andrés


Re: Index statistics

2011-07-05 Thread Michael McCandless
This API doesn't exist today.

Lucene has long needed query implementations to do this, so that we can
properly plan/optimize how the query is run. E.g., an AND query would
use this to pick the more restrictive clause to drive the
intersection.
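
The clause-selection idea can be sketched without Lucene. In the example
below, sorted int arrays of unique docIDs stand in for the posting lists of
two AND clauses, and the array length plays the role of the estimated hit
count. ConjunctionSketch and intersect() are hypothetical names, not Lucene
API; this is only an illustration of why knowing the smaller clause helps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ConjunctionSketch {
    /**
     * Intersects two sorted arrays of unique docIDs. The shorter array
     * (the more restrictive clause) drives the loop; the longer one is
     * only probed with a binary search from the last position reached.
     */
    static List<Integer> intersect(int[] a, int[] b) {
        int[] lead = a.length <= b.length ? a : b;
        int[] other = (lead == a) ? b : a;
        List<Integer> hits = new ArrayList<Integer>();
        int from = 0;
        for (int doc : lead) {
            int pos = Arrays.binarySearch(other, from, other.length, doc);
            if (pos >= 0) {
                hits.add(doc);
                from = pos + 1;
            } else {
                from = -pos - 1; // insertion point: first entry greater than doc
                if (from >= other.length) {
                    break; // the longer list is exhausted
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] rare = {1, 5, 9, 12};
        int[] common = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        System.out.println(intersect(rare, common));
    }
}
```

The work done is proportional to the length of the driving list times the
cost of a probe, which is why picking the more restrictive clause matters.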

For TermQuery you could just call IndexReader.docFreq? (It doesn't take
deletions into account, so it will always be an upper bound.)

For other queries... you could pull the scorer, iterate over some
number of docs, and then "guesstimate" how many matches there would be for
the full index, based on what docID you got up to vs. how many docs you
asked for? This would assume matches are uniformly distributed
throughout the index (e.g., that docs are indexed in random order), which
is definitely not the case typically in practice.
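
A minimal sketch of that extrapolation, under stated assumptions: a sorted
int array of docIDs stands in for what a scorer would return, and
HitCountEstimateSketch and estimate() are hypothetical names rather than
Lucene API. It consumes up to sampleSize matches, notes the last docID
reached, and scales the sample up to the whole index.

```java
final class HitCountEstimateSketch {
    /**
     * @param matches    sorted docIDs a scorer would return (stand-in only)
     * @param maxDoc     total number of documents in the index
     * @param sampleSize how many matches to consume before extrapolating
     * @return the exact count if fewer than sampleSize matches exist,
     *         otherwise an estimate that assumes uniformly spread matches
     */
    static long estimate(int[] matches, int maxDoc, int sampleSize) {
        if (sampleSize <= 0) {
            throw new IllegalArgumentException("sampleSize must be positive");
        }
        int seen = 0;
        int lastDoc = -1;
        while (seen < sampleSize && seen < matches.length) {
            lastDoc = matches[seen];
            seen++;
        }
        if (seen < sampleSize) {
            return seen; // ran out of matches first, so the count is exact
        }
        // 'seen' matches were found among the first lastDoc + 1 documents.
        return Math.round((double) seen * maxDoc / (lastDoc + 1));
    }

    public static void main(String[] args) {
        int[] clustered = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        // All ten real matches sit at the start of a 1000-doc index, so a
        // sample of five is scaled up far beyond the true count.
        System.out.println(estimate(clustered, 1000, 5));
    }
}
```

The clustered case in main() illustrates the caveat above: when matches are
not spread uniformly, the estimate can be far off.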

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jul 5, 2011 at 2:19 PM, Andres Taylor wrote:
> Hi there,
>
> I work with Neo4j, a NoSQL graph
> database tightly coupled with Lucene. I am now working on an optimizing
> execution engine. To do this well, I would like to know more about the
> existing Lucene indices. Ideally, I'd like to be able to ask a Lucene index
> how many hits a query might give me, before I actually run the query. The
> answer will probably just be an estimation, but that's fine.
>
> Is this possible today?
>
> Best regards,
>
> Andrés
>




Lucene 3.3: Self referring deprecation use insteads in LowerCaseTokenizer

2011-07-05 Thread Rene Hackl-Sommer

Hi,

I just noticed that the deprecation "use ... instead" notes in
LowerCaseTokenizer (Lucene 3.3) refer to themselves instead of to the new
constructors that take a Version argument.


E.g. @deprecated use {@link #LowerCaseTokenizer(Reader)} instead.

should be #LowerCaseTokenizer(Version, Reader). Same for the two other
constructors.
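
To illustrate the intended Javadoc pattern, here is a small self-contained
sketch. SketchTokenizer and SketchVersion are hypothetical stand-ins, not the
real Lucene classes, and the choice of V_30 as the legacy default is an
assumption made only for the example. The point is that the @deprecated note
on the old constructor links to the overload with the extra argument, not to
itself.

```java
import java.io.Reader;

/** Stand-in for a version constant; not the real Lucene Version class. */
enum SketchVersion { V_30, V_31 }

/** Hypothetical tokenizer used only to show the Javadoc pattern. */
final class SketchTokenizer {
    final SketchVersion matchVersion;
    final Reader input;

    /** Preferred constructor: the caller states which behaviour it wants. */
    SketchTokenizer(SketchVersion matchVersion, Reader input) {
        this.matchVersion = matchVersion;
        this.input = input;
    }

    /**
     * Legacy constructor, kept for backwards compatibility.
     *
     * @deprecated use {@link #SketchTokenizer(SketchVersion, Reader)} instead.
     */
    @Deprecated
    SketchTokenizer(Reader input) {
        // Assumed legacy default for this sketch.
        this(SketchVersion.V_30, input);
    }
}
```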


Thanks,
Rene


Re: full text searching in cloud for minor enterprises

2011-07-05 Thread Li Li
  Sounds great, but I'd like to do it myself.
  searchblox does provide a cloud hosting service, and it seems to
implement the cloud infrastructure itself rather than using the
common services provided by Google, Amazon, etc.

On Tue, Jul 5, 2011 at 7:02 PM, Joe Scanlon  wrote:
> Look at searchblox
>