Hello, I have an index of locations for example. I'm indexing one field
using SimpleAnalyzer.
doc1: albany ny
doc2: hudson ny
doc3: new york ny
doc4: new york mills ny
When I search for "new york ny", the first result returned is always "new
york mills ny". Am I doing something incorrect?
than
On 7/31/06, Simon Willnauer <[EMAIL PROTECTED]> wrote:
Hello,
I have a question about whether fields with empty content should be added
to the document / index or not. I have an index schema, which
defines all fields a document can have. If one of the real documents
has no content for a certain fiel
Hi everyone,
If you're in Seattle for SIGIR, come to this meeting of FOHLNs (Friends
Of Hadoop, Lucene and Nutch). We'll talk about search and get something
to eat and drink.
Please RSVP via the Evite below so I can get a bigger venue if necessary.
http://www.evite.com/app/publicUrl/[EMAIL PRO
Hi Simon,
You can index an empty field ("" value), but there is no point in doing that,
really.
If you index an empty string, you will not be able to find documents that had
that field empty.
You will not be able to do a WHERE foo IS NULL type of query, unless you detect
an empty field during i
Hi Otis,
Well, if I have to run such a query I need a "dummy" value. To explain
that in a bit more detail...
An XML document has "n" mandatory elements described by a schema or
DTD. Some of them could have empty values. Would it make any difference
to the index / searching if I just index an empty st
Hi Simon,
If you want to be able to run a "give me all documents that have an empty field
F" query, then you'll actually have to stuff a "dummy" value when no real value for
field F is present. If you have an index schema, perhaps that's a good place
to add a 'defaultValue'-type attribute with that
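The "dummy" value idea discussed in this thread can be sketched in plain Java, independent of Lucene. This is only an illustration under assumptions: the class `EmptyFieldDefaults`, the method `withDefaults`, and the marker string `__empty__` are hypothetical names, not anything from the thread or from Lucene. The helper walks the schema's field list and substitutes the marker wherever a document has no real value, so that a later term query on the marker can play the role of "WHERE foo IS NULL".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: fills in a sentinel token for
// missing or blank field values before the document is handed to the indexer.
final class EmptyFieldDefaults {
    // Assumed sentinel; choose a token that cannot occur in real data.
    static final String EMPTY_MARKER = "__empty__";

    // Returns one entry per schema field, in schema order. Fields that are
    // absent from rawValues, or whose value is blank, get EMPTY_MARKER.
    static Map<String, String> withDefaults(Iterable<String> schemaFields,
                                            Map<String, String> rawValues) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String field : schemaFields) {
            String value = rawValues.get(field);
            boolean empty = value == null || value.trim().isEmpty();
            out.put(field, empty ? EMPTY_MARKER : value);
        }
        return out;
    }
}
```

One caveat with this sketch: a marker containing punctuation may be altered by the analyzer at index time, so the marker field would probably need to be indexed untokenized, or the marker chosen so that it survives analysis unchanged.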
Steven Rowe wrote:
Michael J. Prichard wrote:
Hey Otis,
Sure I would love to! Can you ping me at [EMAIL PROTECTED] and
let me know what I need to do? Do I just post it to JIRA?
Thanks,
Michael
Otis Gospodnetic wrote:
A good place for that is JIRA. Could you put it there? We ha
Michael J. Prichard wrote:
> Hey Otis,
>
> Sure I would love to! Can you ping me at [EMAIL PROTECTED] and
> let me know what I need to do? Do I just post it to JIRA?
>
> Thanks,
> Michael
>
> Otis Gospodnetic wrote:
>
>> A good place for that is JIRA. Could you put it there? We have a
>> b
Hi All,
I thought our technology might interest the group.
Cypher is one of the first software programs available which generates
metadata representations of natural language input. The program outputs RDF
graph and SeRQL query representations of sentences, clauses, and phrases.
The Cypher framewo
Of course, another approach doesn't occur to me until the weekend. But,
even if building a filter is a time-consuming process, you could always
build them as a warm-up when your searcher starts, and cache them *then*.
That way, the user doesn't see a long pause when the filter is built the
fir
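The warm-up idea in this message can be sketched in plain Java. This is a minimal illustration under assumptions, not Lucene code: `WarmFilterCache` and its methods are hypothetical names, and a `java.util.BitSet` simply stands in for whatever a filter computes. All expensive filters are built once at startup, so later lookups only read the cache.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical cache, not part of Lucene: builds every known filter eagerly
// so that no user request has to pay the construction cost.
final class WarmFilterCache {
    private final Map<String, BitSet> cache = new ConcurrentHashMap<>();

    // Call once when the searcher starts. Each supplier performs the
    // (possibly slow) filter construction exactly once.
    void warmUp(Map<String, Supplier<BitSet>> builders) {
        for (Map.Entry<String, Supplier<BitSet>> e : builders.entrySet()) {
            cache.put(e.getKey(), e.getValue().get());
        }
    }

    // Returns the prebuilt bits, or null if the filter was never warmed.
    BitSet get(String name) {
        return cache.get(name);
    }
}
```

The trade-off is a slower startup and the memory held by the cached bit sets, in exchange for avoiding a pause on the first query that needs each filter.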
Thank you for the reply, Doran! You are exactly right about the SQL
count(*). I need the equivalent of group by, and count().
We have considered a 'joined' index where we would have a document for
each permutation. We discarded it (possibly prematurely) based on the
rapid explosion in the number
Thank you for the reply.
I am certainly open to different ways of organizing / indexing our
documents. However, the example I provided was simplified for the sake
of the discussion. In truth, what I was calling a category may be an
arbitrary set of movie ids (determined by a previous query). Th
Thanks for all the responses. I am going to investigate the JavaMail API
along with Michael's "Email Analyzer" code that was posted in this group.
thanks,
suba suresh.
John Haxby wrote:
Andrzej Bialecki wrote:
Just for the record - I've been using javamail POP and IMAP providers
in the past, and
I would like to use the email analyzer code. I am thinking of using it
along with the JavaMail API. I have two different projects. In one I have
to parse the emails sent and extract the subject and the email address.
In the other project I have to parse the emails and index them in Lucene for later
search and re
This is more of a design question. I have a ton of email that is
indexed. I need to search based on a date range so I use a RangeQuery
added to a BooleanQuery to search. This works. Now I need to include
another clause that will narrow the result even more. AND on top of
that I will need s
Hey Otis,
Sure I would love to! Can you ping me at [EMAIL PROTECTED] and
let me know what I need to do? Do I just post it to JIRA?
Thanks,
Michael
Otis Gospodnetic wrote:
A good place for that is JIRA. Could you put it there? We have a bunch of
analyzers in Lucene's contrib, so if you
Awesome! Thanks!
Otis Gospodnetic wrote:
Or simpler:
wr = new IndexWriter(indexDir, aWrapper, !IndexReader.indexExists(indexDir));
----- Original Message -----
From: Michael J. Prichard <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Sunday, July 30, 2006 1:35:29 PM
Subject: Re: PerF
You should build your own performance test cases to see what works for your
data. That being said, here are some numbers from a similar test I ran:
I did the following:
1) run a single term query which resulted in about half of the total set of
documents being returned. (~36,000)
2) built a Bo
Hello,
I have a question about whether fields with empty content should be added
to the document / index or not. I have an index schema, which
defines all fields a document can have. If one of the real documents
has no content for a certain field, should that field be added to the
index or not?
Would
Ref 1: I was just about to show you a link at Sun but I realise that it was
my misread! OK, so the maximum heap is 2G on a 32-bit Linux platform, which
doubles the numbers, and yes indeed 64 bits seems like a good idea, if
having sort indexes in RAM is a good use of resources. But there must be a
b
On Mon, 2006-07-31 at 11:54 +0200, Andrzej Bialecki wrote:
> Chris Hostetter wrote:
> > 1) I didn't know there were any JVMs that limited the heap size to 1GB ...
> > a 32bit address space would impose a hard limit of 4GB, and I've heard
> > that Windows limits process to 2GB, but I don't know of a
Chris Hostetter wrote:
1) I didn't know there were any JVMs that limited the heap size to 1GB ...
a 32bit address space would impose a hard limit of 4GB, and I've heard
that Windows limits process to 2GB, but I don't know of any JVMs that have
1GB limits.
I believe all Win32 JVM-s have a lim
1) I didn't know there were any JVMs that limited the heap size to 1GB ...
a 32bit address space would impose a hard limit of 4GB, and I've heard
that Windows limits process to 2GB, but I don't know of any JVMs that have
1GB limits.
If you really need to deal with indexes big enough for that to m
What JVM are you using?
Can you post a small sample program (or better yet: jUnit test) that
causes this problem ?
: Date: Sun, 30 Jul 2006 07:31:55 -0700
: From: Alan Ezust <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: java.lang.Illegal
it would certainly be possible to get a score that was a simple count of
the number of matching clauses of a boolean query -- probably just with a
modified Similarity (no coord, 1/0 tf, no idf, no norms) but you *might*
need a slightly modified TermScorer to do that.
In general though, i think yo
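What a score that is "a simple count of the number of matching clauses" amounts to can be shown in plain Java, outside of Lucene's scoring classes. This is only a sketch under assumptions: `ClauseCountScore` is a hypothetical name, documents are whitespace-split lowercase text, and each query clause is a single term. Every matching clause contributes exactly 1, with no term frequency beyond presence, no idf, no norms, and no coord factor.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical scorer, not part of Lucene: the score is the number of
// query terms that occur at least once in the document.
final class ClauseCountScore {
    static int score(String docText, List<String> queryTerms) {
        // Presence only: repeated words in the document do not raise the score.
        Set<String> docTerms =
            new HashSet<>(Arrays.asList(docText.toLowerCase().split("\\s+")));
        int matches = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term.toLowerCase())) {
                matches++;
            }
        }
        return matches;
    }
}
```

With the query terms `new`, `york`, `ny`, this gives 3 for both "new york ny" and "new york mills ny", and 1 for "albany ny", since document length plays no part in a pure clause count.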