Re: Scale out design patterns

2011-02-03 Thread Toke Eskildsen
On Fri, 2011-02-04 at 05:54 +0100, Ganesh wrote: > 2. Consider a scenario I am sharding based on the User, I am having single > search server and It is handling 1000 members. Now as the memory consumption > is high, I have added one more search server. New users could access the > second server

Re: outlook MSG file text extraction tool?

2011-02-03 Thread findbestopensource
Here are a few projects tagged text-extraction: http://www.findbestopensource.com/tagged/text-extraction I am not sure if any of them actually extracts content from MSG files, but take a look. On Fri, Feb 4, 2011 at 5:33 AM, Zhang, Lisheng < lisheng.zh...@broadvision.com> wrote: > > Hi, > > Do you

Re: Scale out design patterns

2011-02-03 Thread Ganesh
I have the same idea. I could shard based on a field, but there are two practical difficulties. 1. If a normal user logs in, the results can be fetched from the corresponding search server, but if an Admin user logs in, he may need to see all data. The query should be issued across
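The routing problem described in this message can be illustrated with a minimal sketch in plain Java. It is not taken from the thread and uses no Lucene API: the class `ShardRouter`, its methods, and the server names are all hypothetical. It assumes each user is pinned to one search server when first seen, that new users go to the most recently added server, and that an Admin query is simply fanned out to every server.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical router for user-based sharding; not part of Lucene. */
final class ShardRouter {
    // Search servers in the order they were added.
    private final List<String> servers = new ArrayList<>();
    // Each user is pinned to exactly one server once assigned.
    private final Map<String, String> userToServer = new HashMap<>();

    void addServer(String server) {
        servers.add(server);
    }

    /** Assigns a new user to the newest server; an existing user keeps their server. */
    String assign(String user) {
        if (servers.isEmpty()) {
            throw new IllegalStateException("no search servers registered");
        }
        String current = userToServer.get(user);
        if (current == null) {
            current = servers.get(servers.size() - 1);
            userToServer.put(user, current);
        }
        return current;
    }

    /** Servers a query must be sent to: one for a normal user, all for an Admin. */
    List<String> targets(String user, boolean isAdmin) {
        if (isAdmin) {
            return new ArrayList<>(servers);
        }
        String server = userToServer.get(user);
        if (server == null) {
            throw new IllegalArgumentException("unknown user: " + user);
        }
        return Collections.singletonList(server);
    }
}
```

With this layout, adding a second server does not move existing users, but every Admin query grows with the number of servers, which is the difficulty the message raises.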

outlook MSG file text extraction tool?

2011-02-03 Thread Zhang, Lisheng
Hi, Do you know of any good open source tool to extract text from MS Outlook MSG files? 1) Apache Tika does not seem to support *.msg yet. 2) Apache POI recently started to support *.msg (3.7 10/2010), but I ran into several problems (it cannot process Japanese well, null pointer exception ..). Thank

Re: BooleanQuery / multiple indexes - Lucene 3.0.3

2011-02-03 Thread Robert Muir
On Thu, Feb 3, 2011 at 5:57 PM, Phil Herold wrote: > Hi, > > > > I'm getting incorrect search results when I use a MultiSearcher across > multiple indexes with a Boolean query, specifically, foo AND !bar (using > QueryParser). For example, with two indexes, I have a single document that > satisfie

BooleanQuery / multiple indexes - Lucene 3.0.3

2011-02-03 Thread Phil Herold
Hi, I'm getting incorrect search results when I use a MultiSearcher across multiple indexes with a Boolean query, specifically, foo AND !bar (using QueryParser). For example, with two indexes, I have a single document that satisfies both "foo" and "bar", so it should be excluded from the search

Re: Storing payloads without term-position and frequency

2011-02-03 Thread Alex
Hello Grant, I am currently storing the first term instance only because I just index each token for an article once. What I want to achieve is an index for versioned document collections like wikipedia (See this paper http://www.cis.poly.edu/suel/papers/archive.pdf). In detail I create on the f

Re: Using different field when overriding computeNorm

2011-02-03 Thread Robert Muir
On Thu, Feb 3, 2011 at 3:27 PM, Ryan Aylward wrote: > This is great. Is there a target of when 4.0 will be released? > Unfortunately I think it's quite a ways away: there are branches for major features such as per-document payloads, realtime search, modern index compression algorithms, and a vari

RE: Using different field when overriding computeNorm

2011-02-03 Thread Ryan Aylward
This is great. Is there a target of when 4.0 will be released? -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Tuesday, February 01, 2011 11:10 AM To: java-user@lucene.apache.org Subject: Re: Using different field when overriding computeNorm On Tue, Feb 1, 2011 at 1:

Re: Regarding Search Performance

2011-02-03 Thread Simon Willnauer
You should really provide us with more info. Maybe some code too. Useful information includes, for example: - how big is your index? - what does the query look like? - are you searching from a local file system, a RAM dir, or a remote FS? - how fast is the second search? - which version of lucene are you u

Regarding Search Performance

2011-02-03 Thread madhuri_1820
Hi, I am searching fields from multiple indexes. I am using a Boolean Query. An index search takes nearly 20 sec for one query. I have read that Query Filter has a feature for caching the inner query's search results. I am not sure which is more useful, Query Filter or Boolean Query?

Re: Some Problem with Lucene in Java

2011-02-03 Thread Felipe Lobo
If I understand your question right, you want to generate the snippet for the result documents. You can do something like the code below: QueryScorer scorer = new QueryScorer(query); Highlighter highlighter = new Highlighter(scorer); highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer));

Re: Fwd: Lucene Problems

2011-02-03 Thread Dawn Zoë Raison
We use the contrib package 'Highlighter' to do exactly that on our PDF newspaper website. Dawn On 03/02/2011 17:31, Gong Li wrote: Hi, I am developing an advanced pdf search engine in java by using pdfbox and lucene. And I must display the context of each keyword in the user interface, but i

Re: Lucene Problems

2011-02-03 Thread Ian Lea
The Lucene highlighter sounds just what you need. http://hrycan.com/2009/10/25/lucene-highlighter-howto/ talks about using it on an index of PDFs. Google will find lots of other info. -- Ian. On Thu, Feb 3, 2011 at 5:31 PM, Gong Li wrote: > Hi, > > I am developing an advanced pdf search engin

Some Problem with Lucene in Java

2011-02-03 Thread Cescy
Hi, I am developing an advanced pdf search engine in java by using pdfbox and lucene. And I must display the context of each keyword in the user interface, but i cannot find a method to do so. Most of the methods provided are used to deal with documents with whole content in the specified fiel

Re: Storing payloads without term-position and frequency

2011-02-03 Thread Grant Ingersoll
Payloads only make sense in terms of specific positions in the index, so I don't think there is a way to hack Lucene for it. You could, I suppose, just store the payload for the first instance of the term. Also, what's the use case you are trying to solve here? Why store term frequency as a p

Fwd: Lucene Problems

2011-02-03 Thread Gong Li
Hi, I am developing an advanced pdf search engine in java by using pdfbox and lucene. And I must display the context of each keyword in the user interface, but i cannot find a method to do so. Most of the methods provided are used to deal with documents with whole content in the specified field, a

RE: Syntax for Numeric Range

2011-02-03 Thread Uwe Schindler
Hi Anuj, You have to subclass QueryParser and override newRangeQuery() to parse yourself. Automatic parsing is impossible, because QueryParser does not know (in contrast to Apache Solr) which fields have which type (Lucene has no field schema). Example how to do this: http://mail-archives.apache

Syntax for Numeric Range

2011-02-03 Thread Anuj Shah
Is there a query syntax for specifying a numeric range for a field indexed as a NumericField? I've tried numericfield:[0 TO 10], but it is parsed as a TermRangeQuery and not a NumericRangeQuery. Many thanks Anuj
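The reply in this thread suggests subclassing QueryParser, overriding newRangeQuery(), and parsing the bounds yourself, because the parser has no field schema. The sketch below shows only that parsing step in plain Java, without any Lucene dependency. The class `NumericRangeBounds` and its methods are hypothetical, and it assumes the application keeps its own list of numeric field names and that the bounds are whole numbers that fit in a long.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: decides which fields are numeric and parses their range bounds. */
final class NumericRangeBounds {
    // The application's own "schema": names of fields indexed as numbers.
    private final Set<String> numericFields = new HashSet<>();

    NumericRangeBounds(String... fields) {
        for (String field : fields) {
            numericFields.add(field);
        }
    }

    boolean isNumeric(String field) {
        return numericFields.contains(field);
    }

    /**
     * Converts the textual bounds of a range such as [0 TO 10] into longs.
     * Returns {lower, upper}, or null when the field is not registered as numeric,
     * in which case the caller would fall back to the default term range handling.
     */
    long[] parse(String field, String lower, String upper) {
        if (!isNumeric(field)) {
            return null;
        }
        long lo = Long.parseLong(lower.trim());
        long hi = Long.parseLong(upper.trim());
        return new long[] { lo, hi };
    }
}
```

Inside an overridden newRangeQuery(), a non-null result would presumably be used to build a NumericRangeQuery, while a null result would defer to the superclass; that wiring is left out here because it depends on the Lucene version in use.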

Re: Storing an ID alongside a document

2011-02-03 Thread Jason Rutherglen
> there is an entire RAM-resident part and an Iterator API that reads / > streams data directly from disk. > look at DocValuesEnum vs. Source Nice, thanks! On Thu, Feb 3, 2011 at 12:20 AM, Simon Willnauer wrote: > On Thu, Feb 3, 2011 at 3:23 AM, Jason Rutherglen > wrote: >> Is it?  I thought it w

Re: Storing an ID alongside a document

2011-02-03 Thread Simon Willnauer
On Thu, Feb 3, 2011 at 3:23 AM, Jason Rutherglen wrote: > Is it?  I thought it would load the values into heap RAM like the > field cache and in addition save the values to disk?  Does it also > read the values directly from disk? there is an entire RAM-resident part and an Iterator API that reads