I am far from perfect at this PDF text extraction; however, I noticed
something in your code that you may want to check to clear up the reason
for this failure. See below.
"Shivani Sawhney" <[EMAIL PROTECTED]> wrote on 12/10/2006
22:54:07:
> Hi All,
>
> I am facing a peculiar problem.
>
> I am try
Hi All,
I am facing a peculiar problem.
I am trying to index a file, and the indexing code executes without any error,
but when I try to close the indexer I get the following error. The error
comes up very rarely, but when it does, none of the document-indexing code works and I
finally have to delete al
Gecko? ;)
My advice: stay away from EJBs as much as you can. They are too complicated
and too heavy for most systems. Servlet containers like Jetty, Tomcat, or
Resin are often perfectly suitable for the job and a lot simpler.
Otis
- Original Message
From: "Chenini, Mohamed " <[EMAIL
Lots of memory will help a lot. I have a DBSight customer who is
using an Intel Core Duo and has configured everything to run in memory. The index
size is about 700M. When I checked his system's average response time,
it was 12ms! I guess you can estimate what you will get from your beefy
machine.
So it ma
I think a standalone J2EE application will be good and will give you looser
coupling than EJB. You can separate memory, disk, and CPU resources
from your main application. You can send results back in XML, JSON, or
other formats.
Chris Lu
-
Instant Full-Text Search On Any Database
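To make the standalone-search-service idea above concrete, here is a minimal sketch of a servlet that runs a Lucene query and writes the hits back as JSON, so the main application only talks to it over HTTP. The index path, the "contents" and "id" field names, and the hand-rolled JSON formatting are assumptions for illustration, not anything from the original thread.

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

// Hypothetical search-only web app: deploy it in Tomcat/Jetty/Resin, keep the
// index on its own box, and let the main application call it over HTTP.
public class SearchServlet extends HttpServlet {
    private IndexSearcher searcher;   // shared, opened once per deployment

    public void init() throws ServletException {
        try {
            searcher = new IndexSearcher("/path/to/index");   // assumed index location
        } catch (IOException e) {
            throw new ServletException(e);
        }
    }

    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String q = req.getParameter("q");
        if (q == null) {
            resp.sendError(HttpServletResponse.SC_BAD_REQUEST, "missing q parameter");
            return;
        }
        resp.setContentType("application/json");
        PrintWriter out = resp.getWriter();
        try {
            QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
            Query query = parser.parse(q);
            Hits hits = searcher.search(query);
            out.print("{\"total\":" + hits.length() + ",\"hits\":[");
            int n = Math.min(hits.length(), 10);               // first page only
            for (int i = 0; i < n; i++) {
                if (i > 0) out.print(",");
                String id = hits.doc(i).get("id");             // assumes a stored "id" field
                out.print("{\"id\":\"" + id + "\",\"score\":" + hits.score(i) + "}");
            }
            out.print("]}");
        } catch (ParseException e) {
            resp.sendError(HttpServletResponse.SC_BAD_REQUEST, "bad query");
        }
    }
}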
Thanks, Erik, for the pointer to Solr.
Since the document index is added to frequently, creating new IndexSearchers is
required anyway. We plan to age out already-created IndexSearchers and create
new ones every few minutes. Solr's cache regeneration would be useful in this
scenario.
Does the
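Not part of the original message, but a minimal sketch of the "age out" approach described above, assuming a filesystem index path: one shared IndexSearcher is handed to all queries and swapped for a fresh one once it is older than a few minutes.

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;

// Hypothetical helper: callers always go through current(). Old searchers are
// closed once the new one is in place; real code would also wait for in-flight
// searches to finish before closing.
public class AgingSearcherHolder {
    private final String indexPath;
    private final long maxAgeMillis;
    private IndexSearcher current;
    private long openedAt;

    public AgingSearcherHolder(String indexPath, long maxAgeMillis) throws IOException {
        this.indexPath = indexPath;
        this.maxAgeMillis = maxAgeMillis;
        this.current = new IndexSearcher(indexPath);
        this.openedAt = System.currentTimeMillis();
    }

    public synchronized IndexSearcher current() throws IOException {
        if (System.currentTimeMillis() - openedAt > maxAgeMillis) {
            IndexSearcher old = current;
            current = new IndexSearcher(indexPath);   // sees documents added since the last open
            openedAt = System.currentTimeMillis();
            old.close();   // NOTE: unsafe if another thread is still searching on it
        }
        return current;
    }
}

Solr's newSearcher warming does essentially this swap for you, regenerating caches against the incoming searcher before it is put into service.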
On Oct 12, 2006, at 7:11 PM, Renaud Waldura wrote:
I'm developing an application used by scientists -- people who have
a pretty good idea of what logic is -- and they were shocked to
find out that these queries do not all return the same results:
1- banana AND apple OR orange
2- banana AND
Renaud Waldura wrote:
While we are also developing a query-building UI, users must be able to
enter text queries as well. What do other folks do? I mean, this is
pretty bad. I can hardly go back to my scientists and tell them Lucene
is unable to handle two boolean operators, and that they should pare
There is also the Surround Query Parser in contrib, by the way... I would bet
that Paul will tell you that it does not have these issues. I can't wait to
see the replies on this one... I didn't realize that the QueryParser had
these problems and am a bit skeptical... unfortunately I am away from home
I'm developing an application used by scientists -- people who have a pretty
good idea of what logic is -- and they were shocked to find out that these
queries do not all return the same results:
1- banana AND apple OR orange
2- banana AND (apple OR orange)
3- (banana AND apple) OR orange
I'd
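One quick way to see exactly what the stock QueryParser does with these three strings is to print the parsed queries; a small sketch, assuming a field named "contents" and the default operator settings (the exact output depends on your Lucene version):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

// Prints the Lucene representation of each query string so the AND/OR
// handling (and the effect of parentheses) is visible.
public class PrecedenceCheck {
    public static void main(String[] args) throws Exception {
        String[] queries = {
            "banana AND apple OR orange",
            "banana AND (apple OR orange)",
            "(banana AND apple) OR orange"
        };
        for (int i = 0; i < queries.length; i++) {
            QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
            Query q = parser.parse(queries[i]);
            // toString("contents") shows required (+) versus optional clauses
            System.out.println(queries[i] + "  ->  " + q.toString("contents"));
        }
    }
}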
"Scott Smith" <[EMAIL PROTECTED]> wrote on 12/10/2006 14:14:57:
> Suppose I want to index 500,000 documents (average document size is
> 4 KB). Let's assume I create a single index and that the index is
> static (I'm not going to add any new documents to it). I would guess
> the index would be a
Suppose I want to index 500,000 documents (average document size is
4 KB). Let's assume I create a single index and that the index is
static (I'm not going to add any new documents to it). I would guess
the index would be around 2GB.
Now, I do searches against this on a somewhat beefy mach
You really should be using the same IndexSearcher for successive
searches. Sorting works best when done with a "warm" searcher. Have
a look at Solr's warming strategy, and consider adopting that in some
way.
Erik
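Not part of Erik's message, but a minimal sketch of the reuse-plus-warming idea; the index path, the sort field, and the throwaway warm-up query below are assumptions:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TermQuery;

public class WarmSearcherExample {
    public static void main(String[] args) throws Exception {
        // Open once and keep it; do NOT open a new IndexSearcher per request.
        IndexSearcher searcher = new IndexSearcher("/path/to/index");   // assumed path

        // Warm-up: sorting on a field loads a per-field cache on first use, so
        // pay that cost now with a throwaway query instead of on the first user search.
        Sort sort = new Sort("timestamp");                              // assumed sort field
        searcher.search(new TermQuery(new Term("contents", "warmup")), sort);

        // ... hand `searcher` to the request-handling code and reuse it
        // until the index changes enough to justify reopening.
        Hits hits = searcher.search(new TermQuery(new Term("contents", "lucene")), sort);
        System.out.println("hits: " + hits.length());

        searcher.close();
    }
}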
On Oct 12, 2006, at 3:04 PM, <[EMAIL PROTECTED]> wrote:
Hi folks
Hi folks,
I am using Lucene 2.0
In our application, I am indexing a stream of documents. Each document is
fairly small (< 1 KB), but there can be tens of millions of documents. Each
document has a Timestamp field. Users can enter free-form searches and a
date/time range. They are most interest
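Not from the original post, but one common way to handle this with the Lucene 2.0-era API (the field names and minute resolution are assumptions): encode the timestamp with DateTools so it orders lexicographically, then apply the user's date/time range as a RangeFilter on top of the free-form query.

import java.util.Date;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.RangeFilter;

public class TimestampExample {
    public static void main(String[] args) throws Exception {
        // Indexing: encode the timestamp as a sortable string (minute resolution
        // keeps the number of distinct terms down for range filtering).
        IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
        Document doc = new Document();
        doc.add(new Field("contents", "small document body", Field.Store.NO, Field.Index.TOKENIZED));
        doc.add(new Field("timestamp",
                DateTools.dateToString(new Date(), DateTools.Resolution.MINUTE),
                Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.addDocument(doc);
        writer.close();

        // Searching: free-form query from the user, date range applied as a filter.
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        Query query = new QueryParser("contents", new StandardAnalyzer()).parse("small body");
        RangeFilter range = new RangeFilter("timestamp",
                DateTools.dateToString(new Date(0L), DateTools.Resolution.MINUTE),
                DateTools.dateToString(new Date(), DateTools.Resolution.MINUTE),
                true, true);
        Hits hits = searcher.search(query, range);
        System.out.println("matches in range: " + hits.length());
        searcher.close();
    }
}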
For example, in the following statement
doc.add(new Field("contents", parser.getReader(), Field.TermVector.YES));
the reader is causing the IOException when the internal invertDocument()
method is called, where the TokenStream is generated from the reader. I am not
worried if the document info is corrupt
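Not from the original message, but a sketch of one way to isolate a failing Reader (the stand-in readers and field names are assumptions): wrap each addDocument call so the bad document is skipped and logged while the writer keeps going. Whether the skipped document leaves partial data behind depends on the Lucene version, so it is worth testing against your own build.

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class SkipBadDocuments {
    public static void main(String[] args) throws IOException {
        IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);

        // Stand-ins for the real parser output; the second Reader fails on read,
        // which surfaces as an IOException from addDocument() during inversion.
        Reader[] readers = {
            new StringReader("a perfectly normal document"),
            new Reader() {
                public int read(char[] buf, int off, int len) throws IOException {
                    throw new IOException("simulated parser failure");
                }
                public void close() {}
            }
        };

        for (int i = 0; i < readers.length; i++) {
            Document doc = new Document();
            doc.add(new Field("contents", readers[i], Field.TermVector.YES));
            try {
                writer.addDocument(doc);
            } catch (IOException e) {
                // Skip just this document and keep indexing the rest of the stream.
                System.err.println("skipping document " + i + ": " + e.getMessage());
            }
        }
        writer.close();
    }
}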
IN THEORY, EJB containers are better able than Tomcat to spread
incoming requests over a multitude of servers. There was considerable
discussion some time ago about index search speed on a single
processor. I do not remember the details, but there was some
information about how fast a search
On Oct 12, 2006, at 10:17 AM, Apache Lucene wrote:
When I am adding a document to the Lucene index, if the method throws an
IOException and I continue adding other documents, ignoring the exception,
will the index be corrupted? What happens to the fields which are
already written to t
The EJB specification explicitly precludes you from accessing files, including via third-party
libraries such as Lucene.
http://java.sun.com/blueprints/qanda/ejb_tier/restrictions.html
In practice you may be able to get away with it, but I see no particular reason
why using an EJB server should offer any benefit
go to http://briefcase.yahoo.com/pickupartistmistry
click on login
enter user pickupartistmistry
password: chotachetan
the document should be there
-tom
Bill Taylor wrote:
When I went there, I got a message that there were no shared folders
in the briefcase.
It never gave me an opportunity t
Hello,
This is a design question: for Lucene to be able to process a million
documents, and for the search application to be scalable
and still have a good response time, do we need to use an EJB container
such as WebLogic, or is a servlet container such as Tomcat sufficient to
do the
When I went there, I got a message that there were no shared folders in
the briefcase.
It never gave me an opportunity to enter the password.
Thanks.
Bill Taylor
On Oct 12, 2006, at 6:34 AM, sachin wrote:
Hello,
I have got a lot of personal emails asking me to share the "Lucene
Investigation"
docu
When I am adding a document to the Lucene index, if the method throws an
IOException and I continue adding other documents, ignoring the exception,
will the index be corrupted? What happens to the fields which are
already written to the index?
Did someone delete the shared doc?
[EMAIL PROTECTED] wrote:
Hello,
I have got a lot of personal emails asking me to share the "Lucene Investigation"
document. It is not possible to reply to each of the emails, so I am putting
this document in my briefcase. Anyone interested, please go to the following
sit
Hello,
I have got a lot of personal emails asking me to share the "Lucene Investigation"
document. It is not possible to reply to each of the emails, so I am putting
this document in my briefcase. Anyone interested, please go to the following
site and get the document.
http://briefcase.yahoo.com/pickupartistm