RE: Number to printable sortable string, including floating point numbers and negatives

2008-09-11 Thread Jimi Hullegård
> : Do you have any suggestions on how to solve this in a "neat" way? And is > > Have you looked at the NumberTools class? > > As I recall it generates strings that are always printable, but as a result (of using fewer characters) are also always longer than the corresponding value from Solr'

Caching Filters and docIds when using MultiSearcher/IndexSearcher(MultiReader)...

2008-09-11 Thread Antony Bowesman
Up to now I have only needed to search a single index, but now I will have many index shards to search across. My existing search maintained cached filters for the index as well as a cache of my own unique ID fields in the index, keyed by Lucene DocId. Now I need to search multiple indices, I

Re: Case Sensitivity

2008-09-11 Thread Anthony Urso
On Thu, Aug 28, 2008 at 11:16 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Yonik Seeley wrote: >> >> I wasn't originally going to add a Field.Index at all for omitNorms, >> but Doug suggested it. >> The problem with this type-safe way of doing things is the >> combinatorial explosion. > >

Re: query to return docs that has a certain field

2008-09-11 Thread Chris Hostetter
: Is there a way? I could make the documents containing field A have : another field B equals some flag, and then query for that flag, but : that would be kind of inefficient. There is space efficiency and there is time efficiency. If there was only one value for field B (ie: "true" or "yes") t

Re: Number to printable sortable string, including floating point numbers and negatives

2008-09-11 Thread Chris Hostetter
: Do you have any suggestions on how to solve this in a "neat" way? And is Have you looked at the NumberTools class? As I recall it generates strings that are always printable, but as a result (of using fewer characters) are also always longer than the corresponding value from Solr's NumberUt
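
The general idea behind a printable, sortable encoding can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of NumberTools or of Solr's utilities: the class name SortableNumbers, the method names, and the choice of base 36 with a fixed width of 13 characters are all hypothetical. It flips the sign bit so that negative values sort before positive ones, and it remaps the bits of a double so that plain string comparison follows numeric order.

```java
// Hypothetical helper, not part of Lucene or Solr.
final class SortableNumbers {
    // The largest unsigned 64-bit value needs 13 digits in base 36.
    private static final int WIDTH = 13;

    private SortableNumbers() {}

    // Encodes a long as a fixed-width string of [0-9a-z] whose
    // lexicographic order matches the numeric order of the input.
    static String encodeLong(long value) {
        // Flipping the sign bit maps signed order onto unsigned order.
        long unsigned = value ^ Long.MIN_VALUE;
        String digits = Long.toUnsignedString(unsigned, 36);
        StringBuilder sb = new StringBuilder(WIDTH);
        for (int i = digits.length(); i < WIDTH; i++) {
            sb.append('0'); // left-pad so every string has the same width
        }
        return sb.append(digits).toString();
    }

    // Encodes a double by first turning its IEEE 754 bits into a long
    // that sorts the same way as the double, then reusing encodeLong.
    static String encodeDouble(double value) {
        long bits = Double.doubleToLongBits(value);
        // For negative values, invert the 63 non-sign bits so that a larger
        // magnitude yields a smaller long; positive values are unchanged.
        bits ^= (bits >> 63) & Long.MAX_VALUE;
        return encodeLong(bits);
    }
}
```

Because every encoded string has the same width and uses only digits and lowercase letters, comparing two encoded values with String.compareTo gives the same result as comparing the original numbers.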

query to return docs that has a certain field

2008-09-11 Thread Cam Bazz
Hello, Let's say we have different document types, and one type of document only contains field A. How can I make a query so that I get all the documents that only have field A? There is a get all documents query, but that would get all the documents whether they contain field A or not. Is there

segment exists in external directory yet the MergeScheduler executed the merge in a separate thread

2008-09-11 Thread Anthony Urso
I have implemented a MapReduce job to merge a bunch of Lucene 2.3.2 indices together, but the reducers randomly fail with the following unchecked exception after thousands of successful merges: org.apache.lucene.index.MergePolicy$MergeException: segment "_0 exists in external directory yet the Mer

Spawn an indexing thread on every update

2008-09-11 Thread nobody
Hi, In our application, I want users to be able to search for the updates they make almost immediately. Hence, whenever they update, I spawn a thread immediately to index. However, when the load on the application is very high the number of threads spawned increases, and this results in "cannot

Re: Terms with different boosts

2008-09-11 Thread Otis Gospodnetic
Guy, ulimit -n is your friend. As is the compound index format. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Guy Gavriely <[EMAIL PROTECTED]> > To: "java-user@lucene.apache.org" > Sent: Thursday, September 11, 2008 10:28:34 AM > Subjec

RE: Problems when changing stoplist file

2008-09-11 Thread Steven A Rowe
Hi Marie, On 09/11/2008 at 4:03 AM, Marie-Christine Plogmann wrote: > I am currently using the demo class IndexFiles to index some > corpus. I have replaced the Standard by a GermanAnalyzer. > Here, indexing works fine. > But if I specify a different stopword list that should be > used, the tokeni

HitCollector - Remote-ability

2008-09-11 Thread Dino Korah
Hi All, I vaguely remember discussions on the remote-ability of HitCollector-based search() in Lucene. As far as I remember, it is not possible if I use HitCollectors. In Lucene 3, we are doing away with a lot of search() variants, including the ones that return Hits. I would like to know which one o

AW: AW: AW: Search with multiple wildcards

2008-09-11 Thread Sertic Mirko, Bedag
OK, I see the problems. I will talk to my customer about this requirement. Perhaps he doesn't need it anymore. Again, thanks a lot to all, you saved my day! Regards Mirko -Original Message- From: Matthew Hall [mailto:[EMAIL PROTECTED] Sent: Thursday, 11 September 2008 16:

Re: Terms with different boosts

2008-09-11 Thread Erick Erickson
How many fields are you winding up with for each document? One for each term? And what is the higher-level task you're trying to accomplish? What distinguishes *why* a certain term in a certain document should boost a particular document? Perhaps if you explained the higher level task someone woul

Re: Terms with different boosts

2008-09-11 Thread Markus Lux
Hi Guy, I think that isn't a problem related to fields. I experienced this kind of error caused by a limitation of the underlying file system. The problem was that I had too many InputStreams open that had never been closed. Please check that in your code and tell us if it worked. Markus. 2008

Re: Searching substring starting at a fixed position

2008-09-11 Thread luther blisset
Many thanks, Karsten and Ian Lea!! You gave me very useful solutions. I'm going to try Karsten's last one: "Because you can easily use Lucene with 1 field and 365 different tokens (20080101, 20080102, ...20081231)", even if Ian Lea's solution seems to be a very good one and I'll try it

Re: Searching substring starting at a fixed position

2008-09-11 Thread xinxin zhou
that's ok.

Re: Searching substring starting at a fixed position

2008-09-11 Thread Ian Lea
Luther, RegexQuery might work, or how about splitting the digit string out into dates and searching for them, e.g. 11000 could be stored as "avail: jan03 jan04 jan05" and a search for +avail:jan03 +avail:jan04 +avail:jan05 would get a hit. -- Ian. On Thu, Sep 11, 2008 at 12:34 PM, luther bliss
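
The splitting step suggested above could look roughly like this. It is only a sketch under assumptions taken from the example in the message: a '0' digit is read as "available" (so 11000 yields jan03 jan04 jan05), the string starts on 1 January, and the year has 365 days. The class and method names are hypothetical, and the indexing and query side is left out.

```java
// Hypothetical helper that turns an availability digit string into date tokens.
final class AvailabilityTokens {
    private static final String[] MONTHS = {
        "jan", "feb", "mar", "apr", "may", "jun",
        "jul", "aug", "sep", "oct", "nov", "dec"
    };
    // Days per month, assuming a non-leap year (365 digits in total).
    private static final int[] DAYS_IN_MONTH = {
        31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
    };

    private AvailabilityTokens() {}

    // Returns a space-separated list such as "jan03 jan04 jan05",
    // with one token for every position that holds a '0' (assumed: available).
    static String toTokens(String digits) {
        if (digits.length() > 365) {
            throw new IllegalArgumentException("expected at most 365 digits");
        }
        StringBuilder tokens = new StringBuilder();
        int month = 0;
        int day = 1;
        for (int i = 0; i < digits.length(); i++) {
            if (digits.charAt(i) == '0') {
                if (tokens.length() > 0) {
                    tokens.append(' ');
                }
                tokens.append(MONTHS[month]);
                if (day < 10) {
                    tokens.append('0'); // zero-pad the day to two digits
                }
                tokens.append(day);
            }
            // Advance the calendar position by one day.
            day++;
            if (day > DAYS_IN_MONTH[month]) {
                day = 1;
                month++;
            }
        }
        return tokens.toString();
    }
}
```

The resulting string could then be indexed in a whitespace-tokenized field, so that a query requiring several tokens at once matches only rooms that are free on all of those days.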

Re: AW: AW: Search with multiple wildcards

2008-09-11 Thread Matthew Hall
Ah.. that's a darn good point.. Though, that second bit of code you have there could be used at display time for him to get the functionality that he wants. You could also modify it somewhat, and apply it against the displayable part of the hit he's getting back rather than the individual tok

Re: AW: AW: Search with multiple wildcards

2008-09-11 Thread mark harwood
>>That should give you the functionality you are looking for. If I understand your suggestion correctly, it won't. The Highlighter uses a tokenized version of the document text. Simplistically it does the following pseudocode: for all tokens in documentTokenStream, if(queryTermsSet.contains
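
The behaviour described in that pseudocode can be illustrated with a small stand-alone sketch. It is not the Lucene Highlighter: the class name, the whitespace tokenization, and the <B> markup are assumptions made for illustration. It shows why passing the bare fragment "ll" highlights nothing, since whole tokens are compared against the query term set.

```java
import java.util.Set;

// Hypothetical, simplified stand-in for token-based highlighting.
final class TokenHighlightSketch {
    private TokenHighlightSketch() {}

    // Wraps every whitespace-separated token that is contained in
    // queryTerms (compared in lower case) in <B>...</B>.
    static String highlight(String text, Set<String> queryTerms) {
        StringBuilder out = new StringBuilder();
        for (String token : text.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            if (queryTerms.contains(token.toLowerCase())) {
                out.append("<B>").append(token).append("</B>");
            } else {
                out.append(token);
            }
        }
        return out.toString();
    }
}
```

With the expanded terms "hallo" and "alle", the text "hallo an alle" has both words wrapped in full; with the single term "ll", no token equals the term and the text comes back unchanged.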

Re: AW: AW: Search with multiple wildcards

2008-09-11 Thread Matthew Hall
Well, you could certainly manipulate your search string, removing the wildcard punctuations, and then use that for what you pass to the highlighter. That should give you the functionality you are looking for. -Matt mark harwood wrote: Is this possible? Not currently, the highlighter

Terms with different boosts

2008-09-11 Thread Guy Gavriely
Hi, I have to index terms with different boosts, meaning that if the word A appears in two documents one document will be ranked higher. I've tried to index them by putting them in different fields and giving the fields different boosts but I ran into too many files (caused by too many fields I g

Re: AW: AW: Search with multiple wildcards

2008-09-11 Thread mark harwood
>> Is this possible? Not currently, the highlighter works with a list of words (or words AND phrases using the new span support) and highlights those. To do anything else would require the highlighter to faithfully re-implement much of the logic in all of the different query types (fuzzy, wildcar

Re: Searching substring starting at a fixed position

2008-09-11 Thread xinxin zhou
It's a question of math and arithmetic, not a question about Lucene. There are other good ways to deal with it.

Re: Searching substring starting at a fixed position

2008-09-11 Thread Karsten F.
Hi Luther, your question: "Is there a way to ask Lucene to search starting from a fixed position?" The answer: no, not by standard search. But you don't want to use your field for scoring. So this is a field to filter results. You could easily change RangeFilter for this purpose but the new filt

Searching substring starting at a fixed position

2008-09-11 Thread luther blisset
Hi folks, I'm new to Lucene and I'm looking for a way to search a substring that starts at a fixed position. It isn't a classical substring search because it's a bit weird. I indexed a field that represents the availability of a room in a hostel during 1 year. The field is composed of 365 digits and

AW: AW: Search with multiple wildcards

2008-09-11 Thread Sertic Mirko, Bedag
OK, one final question: If I query for "*ll*", the query is expanded to ("hallo" or "alle" or ...), so the Highlighter will highlight the words "hallo" or "alle". But how can I highlight only the original query, so only the "ll"? Is this possible? Thanks a lot Mirko -Original Message

Re: C++ Bindings for Lucene?

2008-09-11 Thread xinxin zhou
Just try it and you will find the answer.

Searcher - search() & Hits Deprecation

2008-09-11 Thread Dino Korah
Hi All, In my project I use Hits from Searcher.search() for my query results. If I am to move to Lucene 3's ways, I will have to use TopDocs I presume. It'll be great if someone could guide me with some sort of skeleton code. Also is it possible to cache the results like I do with Hits? Anoth

Re: AW: Search with multiple wildcards

2008-09-11 Thread mark harwood
You need to call rewrite on the query to expand it then give that version to the highlighter - see the package javadocs. http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/highlight/package-summary.html#package_description Cheers Mark - Original Message From: "Sertic M

AW: Search with multiple wildcards

2008-09-11 Thread Sertic Mirko, Bedag
OK, I gave it a try, but I ran into this TooManyClauses Exception. I see that wildcard queries are expanded before they are processed, and I see that I can set the clauses count to Integer.MAX_VALUE, and queries can consume a lot of memory, but one final thing is still open: does a wildcard query

Problems when changing stoplist file

2008-09-11 Thread Marie-Christine Plogmann
Hi, I am currently using the demo class IndexFiles to index some corpus. I have replaced the Standard by a GermanAnalyzer. Here, indexing works fine. But if I specify a different stopword list that should be used, the tokenization doesn't seem to work properly. Mostly some letters are missing at