Hello,
I used an Analyzer which removes stopwords when indexing, then I wanted
to do an AND search using MultiFieldQueryParser. So I did this:
word1 AND stopword AND word2
I thought the stopword would be ignored by the searcher (I use the same
Analyzer to index and search). But instead, I
Yes, it's quite possible.
1. You need to create a Term for the word you want to search.
e.g.
Term term = new Term("yourfield", "yourword");
2. Then create a TermDocs enumeration. TermDocs provides an interface for
enumerating <document, frequency> pairs for a term.
TermDocs termDocs = IndexReader.open("yourindex").termDocs(term);
You might want to look at TermPositionVector. For it to work I think the
term vectors themselves have to be stored with option TermVector.YES.
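Putting those steps together, a minimal sketch (against the Lucene 2.x API of the time; the index path, field, and term are placeholders) that walks the TermDocs enumeration to count a term's occurrences:

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

public class TermFreqCounter {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("yourindex"); // placeholder path
        Term term = new Term("yourfield", "yourword");
        TermDocs termDocs = reader.termDocs(term);
        int docCount = 0;   // documents containing the term
        long totalFreq = 0; // total occurrences across those documents
        while (termDocs.next()) {
            docCount++;
            totalFreq += termDocs.freq(); // within-document frequency
        }
        termDocs.close();
        reader.close();
        System.out.println(docCount + " docs, " + totalFreq + " occurrences");
    }
}
```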
regards,
Dipesh
On Thu, Nov 13, 2008 at 4:26 AM, Sven <[EMAIL PROTECTED]> wrote:
> Hi everyone,
>
> I have a term "foo" and I want to count all the o
Also, the OS is the one reason we can think of now, but that doesn't
mean there aren't other reasons.
EG, who knows -- maybe for small indexes NIO doesn't help but for
large ones it does (just an example) and so you'd want non-static
choice.
Mike
Yonik Seeley wrote:
On Wed, Nov 12, 20
Good!
In fact now we see similar slowness with nio-thread vs nio-shared as
we see for RAM-thread vs RAM-shared. I.e., for both RAM and NIO you get
better performance sharing a single reader than reader-per-thread.
This is odd -- I would have expected that with infinite RAM reader-per-
thr
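For reference, a rough sketch of what "shared" means in these runs: a single searcher opened once and reused by every thread, rather than one reader per thread (Lucene 2.x API; class name and index path are placeholders):

```java
import org.apache.lucene.search.IndexSearcher;

public class SharedSearcherDemo {
    public static void main(String[] args) throws Exception {
        // "shared": one IndexSearcher reused by all threads.
        // IndexSearcher is thread-safe, so no per-thread copies are needed.
        final IndexSearcher shared = new IndexSearcher("/path/to/index");
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    // run queries against `shared` here
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        shared.close();
    }
}
```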
On Wed, Nov 12, 2008 at 5:00 PM, Chris Hostetter
<[EMAIL PROTECTED]> wrote:
> since the choice of FSDirectory variant is largely going to be based on OS,
> I can't think of any reason why a static setter method wouldn't be good
> enough in this particular case.
https://issues.apache.org/jira/browse
: From the user perspective: a public constructor would be the most
: obvious, and would be consistent with RAMDirectory.
A lot of the cases where system properties are currently used can't
really be solved this way because the client isn't the one constructing
the object. SegmentReader's IMP
From the user perspective: a public constructor would be the most
obvious, and would be consistent with RAMDirectory.
Dmitri
On Wed, Nov 12, 2008 at 4:50 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
>
> I think we really should open up a non-static way to choose a different
> FSDirectory im
Nice!
At 8 threads nio-shared catches up with ram-shared. Here's the complete table:
        fs-thread  nio-thread  ram-thread  fs-shared  nio-shared  ram-shared
1       71877      70461       54739       73986      72155       61595
2       34949      34945       26735       43719      33019       28935
3
>
> Right, sounds like you have it spot on. That second * from 3 looks like a
> possible tricky part.
I agree that it will be the tricky part but I think as long as I'm careful
with counting as I iterate through it should be ok (I probably just doomed
myself by saying that...)
Right...you'd do i
Hello:
I am new to Lucene and I am testing some aspects of it. I can retrieve
the number of documents which satisfy a query, but I can't find how to
obtain the number of terms which match it.
For example, if I search for the word "house", I want to obtain the
number of times the word occurs (
Hi everyone,
I have a term "foo" and I want to count all the occurrences of all the
terms that are within 5 words of "foo" in all the documents which
contain "foo". For simplicity's sake, this is only for a single field.
So if I have 3 documents (each with a single field) that look like this:
Onc
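While the Lucene side of the answer involves term positions, the counting itself can be sketched in plain Java over one tokenized document (class and method names here are made up for illustration):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NearCounter {
    /** Count every token occurrence whose position is within `window`
     *  positions of some occurrence of `target`; the target itself is
     *  excluded, and each nearby occurrence is counted once. */
    public static Map<String, Integer> countWithinWindow(
            String[] tokens, String target, int window) {
        // collect the positions of the target term
        List<Integer> targetPositions = new ArrayList<Integer>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(target)) targetPositions.add(i);
        }
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(target)) continue; // skip the target itself
            for (int p : targetPositions) {
                if (Math.abs(i - p) <= window) {
                    Integer c = counts.get(tokens[i]);
                    counts.put(tokens[i], c == null ? 1 : c + 1);
                    break; // count each token occurrence at most once
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] doc = "once upon a time foo was here and foo again".split(" ");
        System.out.println(countWithinWindow(doc, "foo", 2));
    }
}
```

Over a real index you would drive the same arithmetic from the stored term positions rather than a raw token array.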
Greg Shackles wrote:
Thanks! This all actually sounds promising, I just want to make sure I'm
thinking about this correctly. Does this make sense?
Indexing process:
1) Get list of all words for a page and their attributes, stored in some
sort of data structure
2) Concatenate the text from tho
Thanks! This all actually sounds promising, I just want to make sure I'm
thinking about this correctly. Does this make sense?
Indexing process:
1) Get list of all words for a page and their attributes, stored in some
sort of data structure
2) Concatenate the text from those words (space separat
Hi everyone,
I have a term "foo" and I want to count all the occurrences of all the
terms that are within 5 words of "foo" in all the documents which
contain "foo". For simplicity's sake, this is only for a single field.
So if I have 3 documents (each with a single field) that look like this:
Here is a great power point on payloads from Michael Busch:
www.us.apachecon.com/us2007/downloads/AdvancedIndexingLucene.ppt
Essentially, you can store metadata at each term position, so it's an
excellent place to store attributes of the term - they are very fast to
load, efficient, etc.
Yo
Hey Mark,
This sounds very interesting. Is there any documentation or examples I
could see? I did a quick search but didn't really find much. It might just
be that I don't know how payloads work in Lucene, but I'm not sure how I
would see this actually doing what I need. My reasoning is this..
If you're new to Lucene, this might be a little much (and maybe I don't
fully understand the problem), but you might try:
Add the attributes to the words in a payload with a PayloadAnalyzer. Do
searching as normal. Use the new PayloadSpanUtil class to get the
payloads for the matching words. (T
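A rough sketch of the first step, assuming the Lucene 2.4-era Token/Payload API (the class name and the encoded attribute are invented for illustration; a real analyzer would serialize the word's actual attributes into the payload bytes):

```java
import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.index.Payload;

public class AttributePayloadFilter extends TokenFilter {
    public AttributePayloadFilter(TokenStream in) {
        super(in);
    }

    public Token next(Token reusableToken) throws IOException {
        Token token = input.next(reusableToken);
        if (token != null) {
            // Encode whatever per-word attributes you have (position on
            // the page, font, ...) as bytes; this toy example just stores
            // the term length in a single byte.
            byte[] data = new byte[] { (byte) token.termLength() };
            token.setPayload(new Payload(data));
        }
        return token;
    }
}
```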
Hi Erick,
Thanks for the response, sorry that I was somewhat vague in the reasoning
for my implementation in the first post. I should have mentioned that the
word details are not details of the Lucene document, but are attributes
about the word that I am storing. Some examples are position on th
If I may suggest, could you expand upon what you're trying to
accomplish? Why do you care about the detailed information
about each word? The reason I'm suggesting this is "the XY
problem". That is, people often ask for details about a specific
approach when what they really need is a different app
I hope this isn't a dumb question or anything, I'm fairly new to Lucene so
I've been picking it up as I go pretty much. Without going into too much
detail, I need to store pages of text, and for each word on each page, store
detailed information about it. To do this, I have 2 indexes:
1) pages:
Note that the SpanQuery family are Querys, so they can
be used as clauses of a BooleanQuery just fine.
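For example (Lucene 2.x; field and term names are placeholders), a SpanNearQuery can be AND-ed with an ordinary TermQuery:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class SpanInBooleanDemo {
    public static BooleanQuery build() {
        // "term1" within 5 positions of "term2", in any order...
        SpanQuery near = new SpanNearQuery(
            new SpanQuery[] {
                new SpanTermQuery(new Term("body", "term1")),
                new SpanTermQuery(new Term("body", "term2")) },
            5,      // slop: at most 5 positions apart
            false); // order does not matter
        // ...AND-ed with a plain term clause
        BooleanQuery query = new BooleanQuery();
        query.add(near, BooleanClause.Occur.MUST);
        query.add(new TermQuery(new Term("body", "term3")),
                  BooleanClause.Occur.MUST);
        return query;
    }
}
```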
Making this work will be exciting...
I'm having trouble understanding the use case. I don't
understand how the user can make sense of this, but then
it may well be unique to your problem sp
Hello Erick,
thank you very much for this interesting idea - but I'm not sure that
SpanQuery will cover every aspect I'm searching for.
I think the limitation is that in the case of a PhraseQuery (and I think also
in the case of SpanQuery, but I'm not sure about that yet), every term must
appear inside th
Or Tika, Lucene's cousin: http://incubator.apache.org/tika/
(which uses POI under the hood, but goes beyond MS Word parsing)
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
From: Donna L Gresh <[EMAIL PROTECTED]>
To: java-user@lucene.apache.or
Christian,
If I understand your situation correctly, you should look at sloppy phrases and
at the Span family of queries.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
From: Christian Reuschling <[EMAIL PROTECTED]>
To: java-user@lucene.apache
But this is not the same - Lucene makes it transparent for you whether
you have one or several field entries for one attribute.
The behaviour will be the same in both of these cases:
Lucene document entry:
attName: "term1 term2"
attName: "term3 term4"
or
attName: "term1 term2 term3 term4"
For th
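A sketch of the two equivalent forms (Lucene 2.x API; field name and term values are placeholders):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class MultiValuedFieldDemo {
    public static Document build() {
        Document doc = new Document();
        // Two entries for the same field name...
        doc.add(new Field("attName", "term1 term2",
                          Field.Store.NO, Field.Index.TOKENIZED));
        doc.add(new Field("attName", "term3 term4",
                          Field.Store.NO, Field.Index.TOKENIZED));
        // ...index essentially like the single entry
        // "term1 term2 term3 term4"; the analyzer's
        // getPositionIncrementGap() controls whether phrase/span
        // matches can cross the boundary between the two entries.
        return doc;
    }
}
```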
It's entirely unclear to me whether facets could help since I haven't used
them; I've seen them mentioned on the Solr user list, so it may bear
investigating.
To expand on Stefan's point: I think his solution will work for you quite
well, but there are a couple of tricks.
The first thing to und
On Wednesday 12 November 2008 14:58:53 Christian Reuschling wrote:
> In order to offer some simple 1:n matching, currently we create
> several, counted attributes and expand our queries that we search
> inside each attribute, e.g.:
I use one attribute (Field) multiple times.
Stefan
-
Hello Friends,
In order to offer some simple 1:n matching, currently we create several, counted
attributes and expand our queries that we search inside each attribute, e.g.:
Query 'attName:myTerm' => Query 'attName1:myTerm attName2:myTerm'
This is not the fastest way, and sometimes not easy to
Check out POI; that's what I use
http://poi.apache.org/
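A minimal sketch of extracting .doc text with POI's HWPF WordExtractor (the file name is a placeholder; assumes the POI scratchpad jar is on the classpath):

```java
import java.io.FileInputStream;
import org.apache.poi.hwpf.extractor.WordExtractor;

public class DocTextExtractor {
    public static void main(String[] args) throws Exception {
        // Pull the plain text out of a Word .doc file...
        WordExtractor extractor =
            new WordExtractor(new FileInputStream("report.doc"));
        String text = extractor.getText();
        // ...then hand `text` to Lucene, e.g. as the value of a
        // tokenized "contents" Field.
        System.out.println(text);
    }
}
```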
"Sertic Mirko, Bedag" <[EMAIL PROTECTED]> wrote on 11/12/2008 03:25:47
AM:
> Hi
>
> You can also use a tool called "antiword" to extract the text from a
> .doc file, and then
> give the text to lucene.
>
> See here : http://en.wikipedia
I'm thinking about it, so if someone else doesn't get something together
before I have some free time...
It's just not clear to me at the moment how best to do it.
Michael McCandless wrote:
Any takers for pulling a patch together...?
Mike
Mark Miller wrote:
+1
- Mark
On Nov 12, 2008, at
Any takers for pulling a patch together...?
Mike
Mark Miller wrote:
+1
- Mark
On Nov 12, 2008, at 4:50 AM, Michael McCandless <[EMAIL PROTECTED]
> wrote:
I think we really should open up a non-static way to choose a
different FSDirectory impl? EG maybe add optional Class to
FSDir
+1
- Mark
On Nov 12, 2008, at 4:50 AM, Michael McCandless <[EMAIL PROTECTED]
> wrote:
I think we really should open up a non-static way to choose a
different FSDirectory impl? EG maybe add optional Class to
FSDirectory.getDirectory? Or maybe give NIOFSDirectory a public
ctor? Or s
I think we really should open up a non-static way to choose a
different FSDirectory impl? EG maybe add optional Class to
FSDirectory.getDirectory? Or maybe give NIOFSDirectory a public
ctor? Or something?
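For context, the only non-default selection mechanism at the moment is a JVM-wide system property, which is what makes the choice static (sketch; the property name and class assume the 2.x codebase):

```java
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class DirectorySelectionDemo {
    public static void main(String[] args) throws Exception {
        // Today: a global system property decides the impl for every
        // FSDirectory.getDirectory call in the JVM.
        System.setProperty("org.apache.lucene.FSDirectory.class",
                           "org.apache.lucene.store.NIOFSDirectory");
        Directory dir = FSDirectory.getDirectory("/path/to/index");
        // The proposal above: a per-call choice instead, e.g. a public
        // NIOFSDirectory constructor or an extra Class argument.
        dir.close();
    }
}
```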
Mike
Mark Miller wrote:
Mark Miller wrote:
Thats a good point, and points out
Antiword would be hard to integrate into Nutch as it is not Java based; it
would require native calls.
Alexander
2008/11/12 Sertic Mirko, Bedag <[EMAIL PROTECTED]>
> Hi
>
> You can also use a tool called "antiword" to extract the text from a .doc
> file, and then
> give the text to lucene.
>
> See he
Hi
You can also use a tool called "antiword" to extract the text from a .doc file,
and then
give the text to lucene.
See here : http://en.wikipedia.org/wiki/Antiword
Regards
Mirko
-----Original Message-----
From: dipesh [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 12 November 2008 04: