On Fri, 2011-02-04 at 05:54 +0100, Ganesh wrote:
> 2. Consider a scenario where I am sharding based on the user: I have a single
> search server handling 1000 members. Now, as memory consumption is high, I
> have added one more search server. New users could access the
> second server
Here are a few projects tagged text-extraction:
http://www.findbestopensource.com/tagged/text-extraction I am not sure if
any of them actually extracts content from .msg files, but take a look.
On Fri, Feb 4, 2011 at 5:33 AM, Zhang, Lisheng <
lisheng.zh...@broadvision.com> wrote:
>
> Hi,
>
> Do you
I am thinking along the same lines. I could shard based on the field, but there
are two practical difficulties.
1. If a normal user is logged in, the results can be fetched from the
corresponding search server, but if an Admin user is logged in, he may need to
see all the data. The query should be issued across
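For the admin case, one option (a minimal sketch, assuming a Lucene 3.x setup; the shard paths, the "content" field, and the analyzer are illustrative assumptions) is to open every shard and search them as one logical index via MultiReader:

```java
// Hypothetical sketch: an admin search fanned out over per-user shard
// indexes using MultiReader (Lucene 3.x era API).
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import java.io.File;

public class AdminShardSearch {
    public static TopDocs searchAllShards(String[] shardPaths, String queryText)
            throws Exception {
        IndexReader[] readers = new IndexReader[shardPaths.length];
        for (int i = 0; i < shardPaths.length; i++) {
            readers[i] = IndexReader.open(FSDirectory.open(new File(shardPaths[i])));
        }
        // MultiReader presents the shards as one logical index, so scoring
        // and paging behave as if there were a single server.
        IndexSearcher searcher = new IndexSearcher(new MultiReader(readers));
        Query q = new QueryParser(Version.LUCENE_30, "content",
                new StandardAnalyzer(Version.LUCENE_30)).parse(queryText);
        try {
            return searcher.search(q, 10);
        } finally {
            searcher.close();
        }
    }
}
```

A normal user would skip the fan-out and open only their own shard.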
Hi,
Do you know of any good open source tool to extract text from MS Outlook MSG
files?
1) Apache Tika does not seem to support *.msg yet.
2) Apache POI recently started to support *.msg (3.7, 10/2010), but I ran into
several problems (it cannot process Japanese well, null pointer exceptions ...).
Thanks
On Thu, Feb 3, 2011 at 5:57 PM, Phil Herold wrote:
> Hi,
>
>
>
> I'm getting incorrect search results when I use a MultiSearcher across
> multiple indexes with a Boolean query, specifically, foo AND !bar (using
> QueryParser). For example, with two indexes, I have a single document that
> satisfie
Hi,
I'm getting incorrect search results when I use a MultiSearcher across
multiple indexes with a Boolean query, specifically, foo AND !bar (using
QueryParser). For example, with two indexes, I have a single document that
satisfies both "foo" and "bar", so it should be excluded from the search
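As a cross-check (a sketch that takes QueryParser out of the picture; the "content" field name is an illustrative assumption), the same `foo AND !bar` query can be built programmatically:

```java
// Sketch: the query "foo AND !bar" built directly with BooleanQuery
// (Lucene 3.x API). A document matching both terms hits the MUST_NOT
// clause and must be excluded from the results.
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class NotQueryExample {
    public static BooleanQuery fooAndNotBar() {
        BooleanQuery bq = new BooleanQuery();
        bq.add(new TermQuery(new Term("content", "foo")), BooleanClause.Occur.MUST);
        bq.add(new TermQuery(new Term("content", "bar")), BooleanClause.Occur.MUST_NOT);
        return bq;
    }
}
```

If this programmatic form behaves correctly on a single IndexSearcher but not through MultiSearcher, that points at MultiSearcher's query rewriting rather than at QueryParser.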
Hello Grant,
I am currently storing only the first term instance, because I index each
token of an article just once. What I want to achieve is an index for
versioned document collections like Wikipedia (see this paper:
http://www.cis.poly.edu/suel/papers/archive.pdf).
In detail I create on the f
On Thu, Feb 3, 2011 at 3:27 PM, Ryan Aylward wrote:
> This is great. Is there a target of when 4.0 will be released?
>
Unfortunately I think it's quite a ways away: there are branches for
major features such as per-document payloads, realtime search, modern
index compression algorithms, and a vari
This is great. Is there a target of when 4.0 will be released?
-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Tuesday, February 01, 2011 11:10 AM
To: java-user@lucene.apache.org
Subject: Re: Using different field when overriding computeNorm
On Tue, Feb 1, 2011 at 1:
You should really provide us with more info, and maybe some code too.
Useful information includes, for example:
- how big is your index?
- what does the query look like?
- are you searching a local file system, a RAM directory, or a remote FS?
- how fast is the second search?
- which version of Lucene are you u
Hi,
I am searching fields across multiple indexes using a BooleanQuery, and the
index search is taking nearly 20 seconds for one query.
I have read that a QueryFilter can cache the inner query's search results. I am
not sure which is more useful here: a QueryFilter or a plain BooleanQuery?
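As a sketch of the filter-caching idea (Lucene 3.x API; the "type" field and its term are illustrative assumptions), the part of the query that never changes between searches can be wrapped in a CachingWrapperFilter:

```java
// Sketch: wrap the invariant clause of the query in a cached filter so
// repeated searches reuse the matching-docs bitset instead of re-scoring.
import org.apache.lucene.index.Term;
import org.apache.lucene.search.CachingWrapperFilter;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

public class CachedFilterSearch {
    // Build once and keep a reference: the cache lives inside this object.
    static final Filter TYPE_FILTER = new CachingWrapperFilter(
            new QueryWrapperFilter(new TermQuery(new Term("type", "article"))));

    public static TopDocs search(IndexSearcher searcher, Query userQuery)
            throws Exception {
        // The filter restricts results without contributing to scoring;
        // after the first search the cached bitset is reused.
        return searcher.search(userQuery, TYPE_FILTER, 10);
    }
}
```

A filter only helps for the clauses that repeat across queries; the per-user part of the query should stay a normal BooleanQuery clause.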
If I understand your question right, you want to generate snippets for the
result documents.
You can do something like the code below:
QueryScorer scorer = new QueryScorer(query);
Highlighter highlighter = new Highlighter(scorer);
highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer));
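To then pull the actual snippet out of a stored field, something like the following should work (a sketch assuming the Lucene 3.x contrib highlighter, with an illustrative "content" field and StandardAnalyzer):

```java
// Sketch: produce a highlighted fragment for one result document.
// The field name and analyzer are illustrative assumptions; the field
// must have been stored at index time so its text is retrievable.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleSpanFragmenter;
import org.apache.lucene.util.Version;

public class SnippetExample {
    public static String snippetFor(Query query, String fieldText) throws Exception {
        QueryScorer scorer = new QueryScorer(query);
        Highlighter highlighter = new Highlighter(scorer);
        highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer));
        // getBestFragment re-analyzes the stored text and returns the
        // best-scoring fragment, with matches wrapped in <B>...</B> by
        // the default formatter.
        return highlighter.getBestFragment(
                new StandardAnalyzer(Version.LUCENE_30), "content", fieldText);
    }
}
```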
We use the contrib package 'Highlighter' to do exactly that on our PDF
newspaper website.
Dawn
On 03/02/2011 17:31, Gong Li wrote:
Hi,
I am developing an advanced PDF search engine in Java using PDFBox and
Lucene. I must display the context of each keyword in the user
interface, but I
The Lucene highlighter sounds just what you need.
http://hrycan.com/2009/10/25/lucene-highlighter-howto/ talks about
using it on an index of PDFs. Google will find lots of other info.
--
Ian.
On Thu, Feb 3, 2011 at 5:31 PM, Gong Li wrote:
> Hi,
>
> I am developing an advanced pdf search engin
Hi,
I am developing an advanced PDF search engine in Java using PDFBox and Lucene.
I must display the context of each keyword in the user interface, but I cannot
find a method to do so. Most of the methods provided deal with documents whose
whole content is in the specified fiel
Payloads only make sense in terms of specific positions in the index, so I
don't think there is a way to hack Lucene for it. You could, I suppose, just
store the payload for the first instance of the term.
Also, what's the use case you are trying to solve here? Why store term
frequency as a p
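To illustrate how payloads ride on individual positions (a sketch against the Lucene 3.x analysis API; the filter name and the 4-byte int encoding are assumptions), a TokenFilter can attach a payload to each token it passes through:

```java
// Sketch: a TokenFilter that stores a 4-byte int payload on every token
// position it emits (Lucene 3.x API). This is why a payload is tied to
// a position: it is written as part of the token stream.
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.index.Payload;
import java.io.IOException;

public final class IntPayloadFilter extends TokenFilter {
    private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);
    private final int value;

    public IntPayloadFilter(TokenStream in, int value) {
        super(in);
        this.value = value;
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) return false;
        byte[] bytes = new byte[] {
                (byte) (value >>> 24), (byte) (value >>> 16),
                (byte) (value >>> 8),  (byte) value };
        payloadAtt.setPayload(new Payload(bytes)); // payload rides on this position
        return true;
    }
}
```

Storing a value only on the first occurrence of a term, as suggested above, would mean emitting the payload for the first position and leaving later positions without one.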
Hi,
I am developing an advanced PDF search engine in Java using PDFBox and
Lucene. I must display the context of each keyword in the user
interface, but I cannot find a method to do so. Most of the methods provided
deal with documents whose whole content is in the specified field,
a
Hi Anuj,
You have to subclass QueryParser and override newRangeQuery() to do the
parsing yourself. Automatic parsing is impossible, because QueryParser does not
know (in contrast to Apache Solr) which fields have which type (Lucene has no
field schema).
Here is an example of how to do this:
http://mail-archives.apache
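A minimal sketch of such a subclass (Lucene 3.x API; the hard-coded "price" field is an illustrative assumption, since without a schema the parser must be told which fields are numeric):

```java
// Sketch: a QueryParser that builds a NumericRangeQuery for known
// numeric fields and falls back to the default TermRangeQuery otherwise.
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class NumericAwareQueryParser extends QueryParser {
    public NumericAwareQueryParser(String defaultField, Analyzer analyzer) {
        super(Version.LUCENE_30, defaultField, analyzer);
    }

    @Override
    protected Query newRangeQuery(String field, String part1, String part2,
                                  boolean inclusive) {
        if ("price".equals(field)) { // Lucene has no schema, so we hard-code it
            return NumericRangeQuery.newIntRange(field,
                    Integer.valueOf(part1), Integer.valueOf(part2),
                    inclusive, inclusive);
        }
        return super.newRangeQuery(field, part1, part2, inclusive);
    }
}
```

With this in place, `price:[0 TO 10]` parses to a NumericRangeQuery while ranges on other fields keep the default behavior.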
Is there a query syntax for specifying a numeric range for a field indexed
as a NumericField?
I've tried
numericfield:[0 TO 10]
But it is parsed as a TermRangeQuery and not a NumericRangeQuery.
Many thanks
Anuj
> there is an entire RAM-resident part and an Iterator API that reads /
> streams data directly from disk.
> Look at DocValuesEnum vs. Source
Nice, thanks!
On Thu, Feb 3, 2011 at 12:20 AM, Simon Willnauer
wrote:
> On Thu, Feb 3, 2011 at 3:23 AM, Jason Rutherglen
> wrote:
>> Is it? I thought it w
On Thu, Feb 3, 2011 at 3:23 AM, Jason Rutherglen
wrote:
> Is it? I thought it would load the values into heap RAM like the
> field cache and in addition save the values to disk? Does it also
> read the values directly from disk?
there is an entire RAM-resident part and an Iterator API that reads