: Unfortunately this is not that easy, because I must be able to retrieve
: only one article. If I index all the content in one document, then the
: whole document will be retrieved instead of the single article.
I didn't say you had to *only* index the article contents in "group"
documents ... y
On Thu, 2006-07-27 at 08:59 +0200, Björn Ekengren wrote:
> > > When I close my application containing index writers the
> > > lock files are left in the temp directory, causing a "Lock obtain
> > > timed out" error upon the next restart.
> >
> > My guess is that you keep a writer open even though
: I looked at the implementation of 'read(int[], int[])' in
: 'SegmentTermDocs' and saw that it did the following things:
: - check if the document has a frequency higher than 1, and if so read
: it;
: - check if the document has been deleted, and if so don't add it to the
: result;
: - store the
> I don't think it really matters whether you do deletes on the same
> IndexReader -- what matters is whether any deletes have been done to
> the index, prior to opening the reader, since it was last
> optimized. The reason
> being that deleting a document just causes a record of the
> deletion
Erick Erickson wrote:
As Miles said, use the DateTools (lucene) class with a DAY resolution.
That'll give you a yyyyMMdd format, which won't blow up your query with a
"TooManyClauses" exception...
Remember that Lucene deals with strings, so you want to store things in
easily-manipulated string
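To make that concrete, here is a minimal sketch of the DAY-resolution
approach, assuming Lucene 1.9/2.0-era APIs; the "date" field name and
the date range are made up for illustration:

import java.util.Date;
import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.RangeQuery;

// Indexing: store the date at DAY resolution, e.g. "20060727".
Document doc = new Document();
String day = DateTools.dateToString(new Date(), DateTools.Resolution.DAY);
doc.add(new Field("date", day, Field.Store.YES, Field.Index.UN_TOKENIZED));

// Searching: a range over day strings expands to one term per day
// instead of one per millisecond, so no TooManyClauses.
RangeQuery query = new RangeQuery(new Term("date", "20060713"),
                                  new Term("date", "20060727"),
                                  true); // inclusive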
Hi everyone,
I am just developing an application using Lucene, and I know how to get the
term frequency via the IndexReader for the whole corpus. But I wonder if I can
get the term frequency statistics just inside the query results, like the
hot words in just the recent two weeks added into the Lucene indic
Thanks everybody for the feedback. I now rewrote my app like this:

synchronized (searcher.getWriteLock()) {
    IndexReader reader = searcher.getIndexSearcher().getIndexReader();
    try {
        reader.deleteDocuments(new Term("id", id));
    } finally {
        reader.close(); // commits the deletions and releases the lock
    }
}
On Thu, 2006-07-27 at 11:06 +0200, Björn Ekengren wrote:
> Thanks everybody for the feedback. I now rewrote my app like this:
>
> synchronized (searcher.getWriteLock()){
> IndexReader reader = searcher.getIndexSearcher().getIndexReader();
> try {
>
I didn't describe the context fully. The app is a server that receives updates
randomly a couple of hundred times a day, and I want the index to be up to date
at all times. If I received several updates at once I could batch them, but that
is quite unlikely.
--
Björn Ekengren
Bankaktiebol
I met this problem: while searching, I add documents to the index. Although I
instantiate a new IndexSearcher, I can't retrieve the newly added
documents. I have to close the program and start it again, then it will
be ok.
The platform is Win XP. Is it the fault of XP?
Thank you in advance.
I met this problem: while searching, I add documents to the index. Although I
instantiate a new IndexSearcher, I can't retrieve the newly added
documents. I have to close the program and start it again, then it will
be ok.
Did you close your IndexWriter (so it flushes all changes to disk)
be
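A minimal sketch of the pattern Michael is hinting at, assuming a plain
filesystem index; the path and field values are made up:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;

IndexWriter writer = new IndexWriter("/tmp/index", new StandardAnalyzer(), false);
Document doc = new Document();
doc.add(new Field("id", "42", Field.Store.YES, Field.Index.UN_TOKENIZED));
writer.addDocument(doc);
writer.close(); // flushes buffered documents to disk and releases the lock

// An IndexSearcher only sees the index as it was when it was opened,
// so open a fresh one *after* the writer has been closed.
IndexSearcher searcher = new IndexSearcher("/tmp/index");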
Hi John,
> Just for the record - I've been using javamail POP and IMAP providers in
> the past, and they were prone to hanging with some servers, and resource
> intensive. I've been also using Outlook (proper, not Outlook Express -
> this is AFAIK impossible to work with) via a Java-COM bridge suc
You could store Term Vectors for your documents, and then look up the
individual document vectors based on the query results. If you need
help w/ Term Vectors, check out Lucene in Action, search this list,
or http://www.cnlp.org/apachecon2005
-Grant
On Jul 27, 2006, at 4:52 AM, Jia Mi wr
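A rough sketch of Grant's term-vector suggestion, assuming the field was
indexed with term vectors enabled; the "body" field name and the doc
number are illustrative:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;

IndexReader reader = IndexReader.open("/tmp/index");
int docId = 0; // substitute a doc number taken from your query's Hits
// Returns null if term vectors were not stored for this field.
TermFreqVector tfv = reader.getTermFreqVector(docId, "body");
String[] terms = tfv.getTerms();
int[] freqs = tfv.getTermFrequencies();
for (int i = 0; i < terms.length; i++)
    System.out.println(terms[i] + " " + freqs[i]);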
Ok, I just tested it.
So consider:

String string = "word -foo";
String[] fields = { "title", "body" };

For the MultiFieldQueryParser I have:

MultiFieldQueryParser qp = new MultiFieldQueryParser(fields,
    SearchEngine.ANALYZER);
Query fieldsQuery = qp.parse(string);
System.out.println(fieldsQuery);
I am curious about the potential use of document scoring as a means to
extract additional data from an index. Specifically, I would like the
score to be a count of how many times a particular field matched a set
of terms.
For example, I am indexing movie-stars (Each document is a movie-star).
A
I built an indexer that runs through email and its attachments, rips out
content and what not and then creates a Document and adds it to an
index. It works w/ no problem. The issue is that it takes around 3-5
seconds per email and I have seen up to 10-15 seconds for email w/
attachments. I n
Is this the W3 Ent collection you are indexing?
MC
Yes - parallelizing works great - we built a share-nothing JavaSpaces-based
system at X1, and on an 11-way cluster were able to index 350 office documents
per second - this included the binary-to-text conversion, using Stellent INSO
libraries. The trick is to create separate indexes and, if you do no
Michael,
Certainly parallelizing on a set of servers would work (hmm... hadoop?), but if
you want to do this on a single machine you should tune some of the IndexWriter
params. You didn't mention them, so I assume you didn't tune anything yet. If
you have Lucene in Action, check out section 2.7.1.
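For what it's worth, a sketch of the kind of knobs Otis means; the values
are illustrative, not recommendations:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

IndexWriter writer = new IndexWriter("/tmp/index", new StandardAnalyzer(), true);
writer.setMaxBufferedDocs(1000); // buffer more docs in RAM before each segment flush
writer.setMergeFactor(30);       // merge less often: faster indexing, slower searches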
Otis,
You mentioned the hadoop project. I checked it out not long ago, and
I read something saying it did not support the Lucene index. Is it possible to
index and then search in HDFS?
[]s
Rossini
On 7/27/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
Michael,
Certainly paralleli
Hi,
I'm going to attempt to output several thousand documents from a 3+ million
document collection into a csv file.
What is the most efficient method of retrieving all the text from the fields of
each document one by one? Please help!
Thanks,
Malcolm
I know there has been a lot of discussion on distributed search...I am
looking for a cross platform solution, which seems to kill solr's
approach...Everyone seems to have implemented this, but only as
proprietary code...it would seem that just using the RMI searcher would
allow a simple solutio
I think:
- Get the number of documents from IndexReader (maxDoc()).
- Go from 0 to that number.
- If reader.isDeleted(docId) == false:
  - get the doc
  - output the doc fields' content
Otis
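A quick sketch of those steps, assuming the fields you need were stored at
indexing time; the CSV quoting here is deliberately naive:

import java.io.FileWriter;
import java.io.PrintWriter;
import java.util.Enumeration;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;

IndexReader reader = IndexReader.open("/tmp/index");
PrintWriter out = new PrintWriter(new FileWriter("docs.csv"));
for (int docId = 0; docId < reader.maxDoc(); docId++) {
    if (reader.isDeleted(docId)) continue;
    Document doc = reader.document(docId); // only *stored* fields come back
    StringBuffer line = new StringBuffer();
    for (Enumeration e = doc.fields(); e.hasMoreElements();) {
        Field f = (Field) e.nextElement();
        if (line.length() > 0) line.append(',');
        line.append('"').append(f.stringValue()).append('"');
    }
    out.println(line);
}
out.close();
reader.close();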
- Original Message
From: MALCOLM CLARK <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, July 27, 200
I think we have an RMI example in Lucene in Action.
You could also look at how Nutch does it. I think the code is in
org.apache.nutch.ipc package.
I'm not sure why the cross-platform requirement rules out Solr; I would think
it would be exactly the opposite.
As for 10m limit, it depends. It depends on
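For the archives, a bare-bones sketch of the RMI route using Lucene's own
RemoteSearchable; the host names and paths are invented:

import java.rmi.Naming;
import java.rmi.registry.LocateRegistry;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.RemoteSearchable;
import org.apache.lucene.search.Searchable;

// Server: export a local searcher over RMI.
LocateRegistry.createRegistry(1099);
IndexSearcher local = new IndexSearcher("/data/index");
Naming.rebind("//localhost/Searchable", new RemoteSearchable(local));

// Client (any platform with a JVM): look it up and search.
Searchable remote = (Searchable) Naming.lookup("//server/Searchable");
MultiSearcher searcher = new MultiSearcher(new Searchable[] { remote });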
Rossini,
I think what you read might have been that searching a Lucene index
that lives in HDFS would be slow. As far as I understand things, the thing
to do is to copy the index to a local disk, out of HDFS, and then search it
with Lucene from there.
Otis
- Original Mes
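A hedged sketch of that copy-out-then-search idea; I'm assuming the Hadoop
FileSystem API here, and the paths are made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.lucene.search.IndexSearcher;

// Pull the index directory out of HDFS onto local disk...
FileSystem fs = FileSystem.get(new Configuration());
fs.copyToLocalFile(new Path("/hdfs/indexes/current"), new Path("/tmp/index"));

// ...then search it with an ordinary filesystem-based searcher.
IndexSearcher searcher = new IndexSearcher("/tmp/index");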
Otis Gospodnetic wrote:
I think we have an RMI example in Lucene in Action.
You could also look at how Nutch does it. I think the code is in
org.apache.nutch.ipc package.
I'm not sure why the cross-platform requirement rules out Solr; I would think
it would be exactly the opposite.
As for 10m limit,
On 7/27/06, Mark Miller <[EMAIL PROTECTED]> wrote:
I thought I read that solr requires an OS that
supports hard links and thought that Windows only supports soft links.
For the default index distribution method from master to searcher,
yes, hard-links are currently needed.
The distribution mec
Yes, I have closed the IndexWriter, but it doesn't work.
2006/7/27, Michael McCandless <[EMAIL PROTECTED]>:
> I met this problem: while searching, I add documents to the index. Although
> I instantiate a new IndexSearcher, I can't retrieve the newly added
> documents. I have to close the program an
Thank you, Grant, that really helps me :P
On 7/27/06, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
You could store Term Vectors for your documents, and then look up the
individual document vectors based on the query results. If you need
help w/ Term Vectors, check out Lucene in Action, search this li
Hi Mark -
Having gone down this path for the past year, I echo comments from others
that scalability/availability/failover is a lot of work. We migrated away
from a custom system based on Lucene running on Windows to Solr running on
Linux. It took us 6 months to get our system to a solid five-n