Thanks Ian, and Mike -- the code below was the result of badly copying
the Javadocs in exasperation and panic: all points taken with gratitude.
Cheers
Lee
On 04/03/2011 16:40, Ian Lea wrote:
Looks basically OK to me. I wonder if you need the isCurrent() check
as well as if (newReader
Hello list,
Does this look correct? I am told it is not functioning, in that new
entries to the index are not being picked up?
Thanks
Lee
try {
if (! reader.isCurrent()){
IndexReader newReader = reader.reopen();
if (newReader != reader
Hi Java-user mailing list,
Please add me to the ContributorsGroup wiki page so I can edit the wiki
wikiuser: leehinman
Thanks!
;; Lee
Clarification question:
If I don't store term vectors, then I:
-- won't have information on the position of matching terms
-- I don't have the term frequency vector
-- but I should still have the frequency of terms per document in the .frq
file, right?
So what's the difference between the term f
ley wrote:
>
> On Thu, Aug 21, 2008 at 7:20 PM, David Lee <[EMAIL PROTECTED]> wrote:
>>
>>> Clarification question:
>>>
>>> If I don't store term vectors, then I:
>>> -- won't have information on the position of matching terms
>>>
So from what I understand, is it true that if mergeFactor is 10, then when I
index my first 9 documents, I have 9 separate segments, each containing 1
document? And when searching, it will search through every segment?
Thanks!
David
; and
> #optimize(getMergeFactor())
> (btw #optimize() is equal to optimize(1) ).
>
>
> Best regards
> Karsten
>
> p.s. and yes, searching goes through every segment.
>
>
> David Lee-26 wrote:
> >
> > So from what I understand, is it true that if
Hi,
I was wondering when lucene queries two or more terms, does that mean the
time it takes will be twice as long? For example if I search +lucene
+apache, then does lucene get all the documents that match 'lucene' and all
the documents that match 'apache', and then combine them together? Or can it
ROTECTED]> wrote:
> On Thu, Sep 25, 2008 at 1:39 PM, David Lee <[EMAIL PROTECTED]> wrote:
> > I was wondering when lucene queries two or more terms, does that mean the
> > time it takes will be twice as long? For example if I search +lucene
> > +apache, then does lucene ge
t of these
projects are associated to lucene, someone might know.
David Lee
Hi Paul,
The clone() in SegmentInfos is correct. The best practice of clone is to
delegate the clone to the super class (if you look at the source code for
Vector, it too delegates to its super class, which is the Object) to create a
shallow copy, and then do a cloning of each of its mutable field
i think, very likely, you have another copy of java.util.Vector loaded, and
this one tries to be too clever with its implementation of clone (instantiate a
new Vector instance) instead of delegating to its super class (Object).
HTH,
Edwin
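The clone() practice described above (delegate to the super class for a shallow copy, then copy each mutable field) can be sketched as follows. This is only an illustration under stated assumptions: SegmentList and its fields are hypothetical names, not Lucene's SegmentInfos or its actual implementation.

```java
import java.util.ArrayList;

// Hypothetical class for illustration only; this is not Lucene's SegmentInfos.
final class SegmentList implements Cloneable {
    private int version;                                        // primitive: copied by super.clone()
    private ArrayList<StringBuilder> names = new ArrayList<>(); // mutable: needs its own copy

    void add(String name) { names.add(new StringBuilder(name)); version++; }
    void rename(int i, String suffix) { names.get(i).append(suffix); }
    String get(int i) { return names.get(i).toString(); }
    int size() { return names.size(); }

    @Override
    public SegmentList clone() {
        try {
            // Delegate to Object.clone() to get a shallow copy of the right runtime type.
            SegmentList copy = (SegmentList) super.clone();
            // Then give the copy its own instance of every mutable field.
            copy.names = new ArrayList<>(names.size());
            for (StringBuilder sb : names) {
                copy.names.add(new StringBuilder(sb));
            }
            return copy;
        } catch (CloneNotSupportedException e) {
            // Cannot happen here because the class implements Cloneable.
            throw new AssertionError("clone not supported", e);
        }
    }
}
```

With this shape, changes made to the copy (adding or renaming entries) do not show up in the original, which is the property a clone() that builds a fresh instance by hand can easily get wrong in subclasses.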
--- Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
> :
Hi,
Probably off-topic, but just like to plug a bit on my blog post here:
http://tinyurl.com/4vytcc :p (incidentally, Java GC is one of my favourite
topics) It's not very detailed, but i would like to think it's a good place to
start reading...
Just like to point out a couple of things:
1. If yo
> >package that I have downloaded (some sort of incompatibility?). I will
> try
> >to recompile the lucene package in my own environment and see if I can
> fix
> >the problem.
> >
> >
> > On Sat, Oct 4, 2008 at 2:21 AM, Edwin Lee <[EMAIL PROTE
tells me that there is something wrong with Lucene build file that
> > causes this problem, but I have no idea what it could be. In Lucene's
> > common-build.xml, I changed the 1.4 properties to 1.6 properties but still
> > to no avail.
> >
> >
I need to know a document's top n raw scores and terms.
For example,
if a document has the terms {apple, banana, coconut}, I need the top 2 scores
in that document.
The simple way is to search for every term in the document and sort the scores,
as below.
First, search for the 'apple' term, then write the score
Hi all,
i'm using Lucene 2.3.1. What i'm trying to do seems straightforward enough (to
me), but i just can't find the method to do so.
Let's say i'm doing a PhraseQuery of the phrase "apples and oranges" with a
non-zero slop value, and it returns, e.g., 20 Hits. Because i'm using non-zero
slo
Thanks,
Edwin
> Date: Sat, 19 Apr 2008 22:01:17 +0200
> From: [EMAIL PROTECTED]
> To: java-user@lucene.apache.org
> Subject: Re: How to Retrieve Found Term?
>
> Edwin Lee wrote:
>> Hi all,
>>
>> i'm using Lucene 2.3.1. What i'm trying to
Hi Karl,
Thanks for the suggestions, i would be glad to contribute back to the project.
i'm not too familiar with the inner workings of Lucene though; how does such a
functionality feature in a Query implementation?
My naive interpretation, when i first got hold of Lucene, is that Query is wha
If I'm using a computer that has multiple cores, or if I want to use several
computers to speed up the indexing process, how should I do that? Is there
some kind of support for that in the API?
David Lee
Is it possible to do nested proximity searches with lucene?
i.e. can I say I want a to be within 1 word of b and then that group to be
within 4 words of c? The syntax ""a b"~1" c"~4 doesn't seem to work (since
it treats the first two quotes as a pair and the later 2 as another pair).
iling list for simple questions like this?
I tried googling, but didn't seem to get the information I wanted. Thanks!
David Lee
Hi,
I want to use Lucene/Nutch to index my MySQL
database. I am thinking of using JDBC; is it a good idea?
I searched all over the web, but all the examples are
non-lucene/Nutch related. Would you guys give me
pointers or websites or examples on how to use JDBC on
Lucene/Nutch to index mysql databas
Hi,
I just found an open source project called Compass
that works with Lucene to index databases like MySQL.
Has anyone used it? If so, please let us know what
you think about Compass.
Many thanks.
p://www.theserverside.com/tss?service=direct/0/NewsThread/threadViewer.markNoisy.link&sp=l35679&sp=l180646
>
> Chris
>
> Lucene Search On Any Database
> http://www.dbsight.net
>
> On 10/26/05, Sam Lee <[EMAIL PROTECTED]>
> wr
I have a situation where I want to search for individual words in a
phrase as well as the phrase itself. For example, if the user enters
["classical music"] (with quotes) I want to find documents that
contain "classical music" (the phrase) *and* the individual words
"classical" and "music"
On Oct 28, 2005, at 10:38 AM, Erik Hatcher wrote:
So in this case a matching document must have both terms? Or could
it just have one or the other? If it must have both, you could try
a PhraseQuery with a slop of Integer.MAX_VALUE. PhraseQuery scores
closer matches higher.
Good to know,
On Oct 28, 2005, at 8:17 PM, Chris Hostetter wrote:
One thing to keep in mind is that if you have things you are adding to the
query to restrict the results, but you don't want them to contribute to
the score, then try using a Filter instead. If you can't find an easy way
to replace a query
Hi,
How do I use Nutch to crawl an internal database
instead of a web server? Does Nutch even support this
option?
Many thanks.
On Nov 3, 2005, at 9:37 AM, Oren Shir wrote:
If I understand correctly, when sorting by Sort.INDEXORDER the oldest
documents that were added to the index will be returned first. I want the
reverse, because I'm more interested in newer documents.
Looking at the source, I see that Sort.INDEXOR
On Nov 3, 2005, at 10:22 AM, Oren Shir wrote:
There is no constructor for Sort(SortField, boolean) in the Lucene API.
Which version are you using?
I think 1.9rc1. I have a pretty recent svn checkout -- maybe this
constructor is new.
--Andy
Hi,
I am going to use mysql db to store some data, use
lucene(java) to index these data, and use Hibernate to
map them. I was originally thinking of using PHP to
input the data the visitors enter into the mysql db.
But if I use PHP and use mysql statement directly, it
may defeat the part of the pur
Hi,
I know that there are several ports of Lucene, like
cLucene, pLucene, etc. Are there other ports of Nutch
besides java?
Many thanks.
Hi,
I use PHP and MySQL. Visitors enter data
through the web and the data is stored in the
database. I want to make portions of that data
searchable using Lucene.
I am thinking of giving that data to Lucene for
indexing at the same time as inputting that same data
into the database
I forgot to mention that if I use php-java-bridge to
use Lucene to index at the same time I input the data
into the mysql db, I don't even need to use JDBC. If
I index inside the business logic layer which is
java, then I will have to use JDBC.
--- Victor Lee <[EMAIL PROTECTED]> w
Hi,
I use Lucene to index data that changes very often but doesn't need to be
visible to searchers in real time; e.g. the search results can change a couple of times per
minute, but I only need to show the changes every 5 minutes or so. Is it a good
idea to save the search result to a database like m
Sorry, actually I meant all search results, not just frequent results. And
there is only one search term per search; it's the stuff belonging to the
search terms that changes often.
Victor Lee <[EMAIL PROTECTED]> wrote: Hi,
I use Lucene to index stuff that are changed very ofte
re
searcher.search(Query) using a basic query type like TermQuery, I
very seriously doubt you'd beat MySQL performance. What kind of
Query are you using for your searches?
Erik
On 24 Nov 2005, at 17:54, Victor Lee wrote:
> Sorry, actually I meant all search results, not just frequent
I'd put my money on Lucene beating MySQL in
the TermQuery scenario you described (e=hello) ;) But you'd be wise
to step out of design mode and get some real-world tests going. And
even if there is a performance difference, we're talking milliseconds
most likely.
Erik
On 24 No
Hi,
I am using MemoryIndex as described here:
http://dsd.lbl.gov/nux/api/org/apache/lucene/index/memory/MemoryIndex.html .
I am using it to match lots(10 thousands) of queries with one document. Then
I want to rank them based on score and some other variables. I want to know if
there i
time it will need to be queued in some way to make
sure it happens after the re-indexing.
I was just wondering if anyone had any pointers for doing this kind of
thing. Any help would be gratefully appreciated.
Many thanks
Lee
Lee Turner | Java Developer | Oyster Partners
D. +44 (0)20
hread pausing the queue to stop
it processing while the re-indexing takes place. I will also take a look at
quartz.
Your input is very much appreciated
Many thanks
Lee
-Original Message-
From: Jens Kraemer [mailto:[EMAIL PROTECTED]
Sent: 05 April 2005 09:30
To: java-user@lucene.apach
eadLocals hanging around ?
Any help would be greatly appreciated.
Many thanks
Lee
Lee Turner | Java Developer | Oyster Partners
D. +44 (0)20 74461418
T. +44 (0)20 7446 7500
www.oyster.com
The API for BooleanQuery only seems to allow adding clauses. The
nearest way I can see to *remove* a clause is by laboriously
constructing a new BooleanQuery (assuming you aren't absolutely tied
to the original instance) and adding all the clauses from the
original query except the one you
Oops, I'm confusing libraries. I meant I want to remove a Nutch
Clause from a Nutch Query.
--Andy
On Oct 13, 2005, at 4:45 PM, Andy Lee wrote:
The API for BooleanQuery only seems to allow adding clauses. The
nearest way I can see to *remove* a clause is by laboriously
construct
Hi,
I am implementing a Google AdWords-like text ad system.
In AdWords, advertisers enter keywords and phrases in
their ads. When a visitor visits a webpage with
potential Google text ads, I want to know how they
link the webpage to the actual text ads. Linking those
text ads to the webpage is easy, th
ypically describe
> pages of that class
> - it then automatically creates the pattern
> descriptors it will use
> against other pages.
>
> Hope this helps,
>
> Simon
>
> Sam Lee wrote:
>
> >Hi,
> >I am implementing a Google Adwords-like Text Ad
>
Hi,
Do you have good recommendations for websites
that explain in detail how to use Lucene?
If they have source examples too, that would be great.
I have already read the book Lucene in Action.
Many thanks.
Hi,
Normally, lucene or Nutch can match query "nike shoe
-blue" with "red nike shoe".
But what about matching "red nike shoe" with query
"nike shoe -blue"? It is the other way around. Can I
do it with a combination of API calls?
Many thanks.
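As a rough illustration of the "other way around" matching asked about above (testing one piece of text against a stored query), here is a minimal plain-Java sketch. It does not use Lucene or Nutch. StoredQuery is a hypothetical name, and the sketch assumes every term without a leading '-' is required and that words are separated by whitespace, which is simpler than what a real query parser and analyzer do.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of Lucene or Nutch.
final class StoredQuery {
    private final Set<String> required = new HashSet<>();
    private final Set<String> prohibited = new HashSet<>();

    // Parses a whitespace-separated query; a leading '-' marks a prohibited term.
    // Assumption: all other terms are treated as required.
    StoredQuery(String query) {
        for (String tok : query.toLowerCase().trim().split("\\s+")) {
            if (tok.isEmpty()) {
                continue;
            }
            if (tok.startsWith("-") && tok.length() > 1) {
                prohibited.add(tok.substring(1));
            } else {
                required.add(tok);
            }
        }
    }

    // True if the text contains every required term and no prohibited term.
    boolean matches(String text) {
        Set<String> words =
            new HashSet<>(Arrays.asList(text.toLowerCase().trim().split("\\s+")));
        if (!words.containsAll(required)) {
            return false;
        }
        for (String p : prohibited) {
            if (words.contains(p)) {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumptions, new StoredQuery("nike shoe -blue").matches("red nike shoe") is true, while the same query rejects "blue nike shoe". For many stored queries, each one would be checked against the text in turn.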
s it? Please
> elaborate more on what you're after. Maybe what
> you're looking for
> is the contrib/memory and the MemoryIndex within
> that Subversion area.
>
> Erik
>
>
> On 22 Oct 2005, at 18:54, Sam Lee wrote:
>
> > Hi,
> >
m your page (ajax), remove stop
> words, build a
> query from the page words by connecting the words with
> OR and you will
> find the best matching ad.
> You may need to limit the words per page or set the
> maximum clauses
> to a much higher number.
> HTH
> Stefan
>
d positive one called
> negative
> your query has to look something like this:
> positive: (keyword1 keywordN) AND NOT
> negative:(keyword1 keywordN)
>
> On 23.10.2005 at 20:50, Sam Lee wrote:
>
> > Yes, I thought of that. But since the ads have
> > negative keywor
d the MemoryIndex within
> that Subversion area.
>
> Erik
>
>
> On 22 Oct 2005, at 18:54, Sam Lee wrote:
>
> > Hi,
> > Normally, lucene or Nutch can match query "nike
> shoe
> > -blue" with "red nike shoe".
> >
> > But
Hi,
Someone suggested that I should use MemoryIndex to
match content to a large # of queries. e.g. "nike red
shoes" --match--> "nike shoes -blue" and --match-->
"nike shoes -black"... What if I have 10 of these
queries for each piece of content, and there may be 100 of
these pieces of content?
But how f
s ( eg +/-
> operators) which may
> cause them to fail when run as queries against the
> MemoryIndexed subject
> doc which is why the first "query the queries"
> search is insufficient to
> find the matches.
>
> Cheers,
> Mark
>
>
> Sam Lee wro
Hi,
My network is designed to let a bunch of advertisers
enter their ads with keywords. I am thinking of using
MySQL to store those, and then using Lucene and part of
Nutch to index them from the MySQL db, so that the
websites can find and show the ads. But how do I
integrate Lucene/Nutch with MySQL?
> <http://issues.apache.org/jira/browse/LUCENE-434>
>
> For the record, Derby is the Apache open source
> database. It's a
> full-featured relational database backed by an
> active open source
> community: http://db.apache.org/derby/.
>
> Cheers,
> -Rick
>
available
> as open source...
> On 25.10.2005 at 09:14, Sam Lee wrote:
>
> > Hi,
> > My network is designed to have a bunch of
> advertisers
> > to enter their ads with keywords. I think of
> using
> > mysql to store those, and then use lucene and part
>
Hi,
I am wondering if I can use Lucene as a substitute
for a real database like MySQL. I know that many people
use Lucene only to index a MySQL db because of the inferior
full-text index of MySQL.
Can Lucene be used in place of MySQL so that
website visitors can input data that will in turn
inserting
> On Tuesday, 25 October 2005 22:37, Sam Lee wrote:
>
> > Can Lucene to be used in place of mysql so that
> > website visitors can input data that will in turn
> > inserting row into Lucene just like mysql db?
>
> That's a bad idea. Lucene lacks a real update (you
Hi,
I would like to know what the performance of indexing and searching is like
on large index files.
Is it also possible to create multiple index files and search across them?
If so, may I know how it could be done?
Thanks a lot.
gth());
for (int c = 0; c < hits.length(); c++) {
    Document doc = hits.doc(c);
    System.out.println("Query found in file: " + doc.get("path"));
    System.out.println("Content: " + doc.get("text"));
}
Regards,
Lee Li Bin
omcat for Chinese
search using Lucene
2. Do we need to use JSP meta / page encoding? What is the encoding
for JSP?
Regards,
Lee Li Bin
-Original Message-
From: Chris Lu [mailto:[EMAIL PROTECTED]
Sent: Monday, June 18, 2007 2:10 AM
To: java-user@lucene.apache.org
Subjec
hieu Lecarme [mailto:[EMAIL PROTECTED]
Sent: Monday, June 18, 2007 8:58 PM
To: java-user@lucene.apache.org
Subject: Re: Lucene for chinese search
Lee Li Bin wrote:
> Hi,
>
> I still have a problem searching for Chinese words.
> XML file which is the datasource and analyzer has already
stant Scalable Full-Text Search On Any Database/Application
> > site: http://www.dbsight.net
> > demo: http://search.dbsight.com
> > Lucene Database Search in 3 minutes:
> > http://wiki.dbsight.com/index.php?
> > title=Create_Lucene_Database_Search_in_3_minutes
Hi,
May I know how to store a TermVector?
When I set the last parameter to true, isn't that setting storeTermVector to
true?
But I get a null value for TermFreqVector.
BTW, I'm using Lucene 1.4.3 and
don't intend to upgrade to 2.0.
docAll.add(Field.Text("contentText", new StringReader(allCo
Hi,
Does anyone know how to do pagination on a JSP page using the number of hits
returned? Or are there any other solutions?
Please provide some sample code if possible, or a step-by-step guide.
Sorry if I'm asking too much; I'm new to Lucene.
Thanks
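The arithmetic behind paging through a hit count can be sketched in plain Java as below. This is only one possible approach, not an answer taken from the thread: Pager and its method names are hypothetical, pages are assumed to be 1-based, and nothing here depends on Lucene or JSP.

```java
// Hypothetical helper for illustration; pages are 1-based.
final class Pager {
    // Number of pages needed to show totalHits results, pageSize per page.
    static int pageCount(int totalHits, int pageSize) {
        return (totalHits + pageSize - 1) / pageSize;
    }

    // Index of the first hit on the given page.
    static int start(int page, int pageSize) {
        return (page - 1) * pageSize;
    }

    // One past the index of the last hit on the given page, clamped to totalHits.
    static int end(int page, int pageSize, int totalHits) {
        return Math.min(start(page, pageSize) + pageSize, totalHits);
    }
}
```

A page would then loop from Pager.start(page, pageSize) up to, but not including, Pager.end(page, pageSize, totalHits), fetching each document by index. The page number itself would typically arrive as a request parameter, and the search is simply run again for each page request.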
Hi,
I still have no idea how to get it done. Can you give me some details?
The web application is in JSP, btw.
Thanks a lot.
Regards,
Lee Li Bin
-Original Message-
From: Chris Lu [mailto:[EMAIL PROTECTED]
Sent: Saturday, June 30, 2007 2:21 AM
To: java-user@lucene.apache.org
Subject
Hi,
Thanks Mark!
I do have the same question as Alixandre. How do I get the content of the
document instead of the document id?
Thanks.
Regards,
Lee Li Bin
-Original Message-
From: Alixandre Santana [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 03, 2007 12:55 AM
To: java-user
Hi Mark,
How do I display results on the second page?
I managed to display results on one page using your code.
Regards,
Lee Li Bin
-Original Message-
From: Alixandre Santana [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 03, 2007 12:55 AM
To: java-user@lucene.apache.org
Subject: Re
Hi,
Does anyone know how to highlight Chinese characters? When I highlight, it
tends to highlight the whole sentence instead of the keywords.
For Chinese highlighting, do I need to use the TermVector in order to
highlight the correct keywords?
Thanks
ought would have
been trimmed off with the higher threshold. With a threshold of 0.15
they would score 0.17, and with a threshold of 0.30 they are scoring
something like 0.33. Can anybody explain this? My trimming is coming
post-index-searching, so this is pretty confusing.
Thanks in
to the synonym index and let Lucene perform all
the work of synonym lookup / replacement?
Thanks in advance,
Andrew Lee
synonyms question
Hi Andrew,
There is nothing built into Lucene for synonyms, but you can grab the code from
Lucene in Action to see how they can be handled (plus:
http://www.lucenebook.com/search?query=synonyms for some context)
Otis
- Original Message
From: "Lee, Andrew J (CA - To