Sincerely,
Sithu D Sudarsan
Grant Ingersoll wrote:
> Where do you get your Lucene/Solr downloads from?
>
> [x] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
>
> [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [x] I/we build them from
Hi Tsadok,
In Lucene 3.1:
"MultiSearcher is deprecated; ParallelMultiSearcher has been absorbed directly
into IndexSearcher "
-Sithu
-Original Message-
From: Israel Tsadok [mailto:itsa...@gmail.com]
Sent: Thursday, June 16, 2011 1:35 AM
To: java-user@lucene.apache.org
Subject: RemoteS
Hi All,
I'm new to Lucene.
1. Could you please tell me where we can see old emails (even
one day old), not as an archived file but as a mailing list?
2. Where do we look for sample code or detailed tutorials?
3. I found one at LuceneTutorial.com, but it is only for command line.
Not
@lucene.apache.org
Subject: Re: Lucene sample code and api documentation
Sithu,
Old emails: markmail.org
Sample code: Lucene in Action has free downloadable code --
manning.com/hatcher2
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: "Sudarsan,
Hi Prabina,
The way you are specifying the path E:\... is not correct.
Use something like /prabina/lucene-2.4demo/src
Hope this helps,
Sincerely,
Sithu Sudarsan
-Original Message-
From: prabina pattanayak [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 15, 2008 1:12 AM
To: java-user@l
Hi,
I'm using Lucene 2.3.2, and have had no problems so far on Windows.
One issue to check is whether your index directory has write
permission. Your index folder may be read-only.
Sincerely,
Sithu Sudarsan
Graduate Research Assistant, UALR
& Visiting Researcher, CDRH/OSEL
[EMAI
Hi,
We are trying to index a large collection of PDF documents, with sizes
varying from a few KB to a few GB, using Lucene 2.3.2 with JDK 1.6.0_01
(with PDFBox for text extraction), on Windows as well as CentOS Linux.
We used the java -Xms and -Xmx options, both at 1080m, even though we
have 4GB on Windows and
32 GB
Hi Glen, Mike, Grant & Mark
Thank you for the quick responses.
1. Yes, I'm looking now at ThreadPoolExecutor, and looking for sample
code to improve the multi-threaded code.
2. We'll try using as many Indexwriters as the number of cores, first
(which is 2cpu x 4 core = 8).
3. Yes, PDFBox except
Eskildsen [mailto:[EMAIL PROTECTED]
Sent: Friday, October 24, 2008 10:43 AM
To: java-user@lucene.apache.org
Subject: RE: Multi -threaded indexing of large number of PDF documents
On Fri, 2008-10-24 at 16:01 +0200, Sudarsan, Sithu D. wrote:
> 4. We've tried using larger JVM space by defin
Hi All,
Based on your valuable inputs, we tried a few experiments with the number
of threads. The observation is: if the number of threads is one less than
the number of cores (we have 'main' as a separate thread; essentially,
including 'main', the number of threads equals the number of cores), the
indexi
Our experience is that if the number of cores equals the number of active
threads, it performs optimally using a single JVM.
Both on Windows XP and CentOS 5.2, with Lucene 2.3.2
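The setup described above (as many worker threads as cores) can be sketched with plain java.util.concurrent. This is a hypothetical skeleton in modern Java, with the actual per-batch IndexWriter work replaced by generic Runnable tasks:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class IndexingPool {
    // Runs one indexing task per document batch, using as many worker
    // threads as there are cores (the configuration that performed
    // optimally in the tests described above). Returns the number of
    // tasks that completed successfully.
    public static int indexAll(List<Runnable> tasks) {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> futures = new ArrayList<>();
        for (Runnable t : tasks) {
            futures.add(pool.submit(t));
        }
        int done = 0;
        for (Future<?> f : futures) {
            try {
                f.get();  // wait for completion, surface task errors
                done++;
            } catch (Exception e) {
                // a failed batch is counted as not done
            }
        }
        pool.shutdown();
        return done;
    }
}
```

In a real indexer, each Runnable would feed documents to its own IndexWriter (or a shared one), as discussed earlier in the thread.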
Sincerely,
Sithu D Sudarsan
[EMAIL PROTECTED]
[EMAIL PROTECTED]
-Original Message-
From: Glen Newton [mailto:[EMAIL
You can use PDFBox.
http://kalanir.blogspot.com/2008/08/indexing-pdf-documents-with-lucene.h
tml
Sincerely,
Sithu D Sudarsan
sithu.sudar...@fda.hhs.gov
sdsudar...@ualr.edu
-Original Message-
From: maxmil [mailto:m...@alwayssunny.com]
Sent: Friday, December 12, 2008 3:34 AM
To: java-
Hi All:
Is there any study/research on using scanned paper documents as images
(maybe PDF), then applying OCR or another technique to extract text, and
the resulting index quality?
Thanks in advance,
Sithu D Sudarsan
sithu.sudar...@fda.hhs.gov
sdsudar...@ualr.edu
Hi Tuztuz,
Please visit the book's website and its forum; most of your questions
will be answered there.
Sincerely,
Sithu D Sudarsan
-Original Message-
From: Tuztuz T [mailto:tuztu...@yahoo.com]
Sent: Thursday, March 05, 2009 9:24 AM
To: java-user@lucene.apache.org
Subject: Learning Lucene
dear a
Hi All,
We're using Lucene 2.3.2 on Windows. When we try to generate an index
for WordNet 2.0 using the Syns2Index class, the following error is
thrown while indexing:
java.lang.NoSuchMethodError:
org.apache.lucene.document.Field.UnIndexed(Ljava/lang/String;Ljava/lang/String;)Lorg/apache/lucene/documen
't exist any more).
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: "Sudarsan, Sithu D."
> To: java-user@lucene.apache.org
> Sent: Wednesday, April 8, 2009 7:01:16 PM
> Subject: Wordnet indexing error
>
> Hi A
> What is the best way to handle this sort of situation? My inclination is
> to build a new Search Server (with fast HDDs and lots of memory for
> Tomcat) and leave the indexer on the old server connected via NFS.
- Our current development is on similar lines: almost no deletes, but
lots of ADDs
http://www.simpy.com/user/otis/search/wordnet
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: "Sudarsan, Sithu D."
> To: java-user@lucene.apache.org
> Sent: Friday, April 10, 2009 9:51:39 AM
> Subject: RE: Wordnet indexin
Hi,
While trying to parse XML documents of about 50MB in size, we run into
OutOfMemoryError due to Java heap space. Increasing the JVM to use close
to 2GB (the max) does not help. Is there any API that could be used to
handle such large single XML files?
If Lucene is not the right place, please l
arser? I recommend against using an in-memory
parser.
On Thu, May 21, 2009 at 3:42 PM, Sudarsan, Sithu D. <
sithu.sudar...@fda.hhs.gov> wrote:
>
> Hi,
>
> While trying to parse xml documents of about 50MB size, we run into
> OutOfMemoryError due to java heap space. Incre
Thanks everyone for your useful suggestions/links.
Lucene uses DOM, and we tried SAX.
XML Pull and vtd-xml, as well as Piccolo, seem good.
However, for now, we've broken the file into smaller chunks and are
parsing those.
When we get some time, we'd like to refactor with the suggested parsers.
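For reference, a streaming SAX pass keeps memory roughly flat regardless of file size, since no DOM tree is built. A minimal stdlib-only sketch (a hypothetical handler, not the actual code we used):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class ElementCounter extends DefaultHandler {
    private int count = 0;

    @Override
    public void startElement(String uri, String local, String qName,
                             Attributes atts) {
        // Each element is seen once as a callback and then discarded;
        // memory use does not grow with document size.
        count++;
    }

    public static int countElements(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            ElementCounter handler = new ElementCounter();
            InputStream in = new ByteArrayInputStream(
                    xml.getBytes(StandardCharsets.UTF_8));
            parser.parse(in, handler);
            return handler.count;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

For a real 50MB file you would pass a FileInputStream instead of the in-memory stream, and collect field text in the handler callbacks rather than counting elements.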
Er
Hi Matt,
We use a 32-bit JVM. Though it is supposed to address up to 4GB, any
assignment above 2GB on Windows XP fails. The machine has dual
quad-core processors.
On Linux we're able to use 4GB, though!
If there is any setting that will let us use 4GB, do let me know.
Thanks,
Sithu D Sudarsan
-O
Do you use stopword filtering?
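For context on why stopword filtering matters here: stopwords are removed at index time, so a query term that is a stopword can never match. A tiny stdlib illustration (the word list is an assumed sample; in Lucene, StopFilter does this inside the analyzer chain):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class StopWords {
    // Tiny sample list for illustration only; Lucene's analyzers ship
    // their own English stopword sets.
    private static final Set<String> STOP =
            Set.of("a", "an", "and", "of", "the", "to");

    // Lowercases tokens and drops stopwords, mimicking what an
    // analyzer chain does to both documents and queries.
    public static List<String> filter(List<String> tokens) {
        return tokens.stream()
                .map(String::toLowerCase)
                .filter(t -> !STOP.contains(t))
                .collect(Collectors.toList());
    }
}
```

If the same analyzer is used at index and query time, a stopword in the query simply disappears; if different analyzers are used, the mismatch itself can cause "no hits".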
Sincerely,
Sithu D Sudarsan
-Original Message-
From: vanshi [mailto:nilu.tha...@gmail.com]
Sent: Monday, June 01, 2009 11:39 AM
To: java-user@lucene.apache.org
Subject: Re: No hits while searching!
Thanks Erick, I was able to get this work...as you sai
Hi Stefan,
Are you using 32-bit Windows? If so, this can sometimes happen when the
index file, before optimization, grows beyond your JVM memory settings
(say, 512MB).
Increase the JVM memory settings if that is the case.
Sincerely,
Sithu D Sudarsan
Off: 301-796-2587
"the index file before
optimizations crosses your jvm memory usage settings (if say 512MB)" ?
Could you please further explain this ?
Stefan
-Original Message-
From: Sudarsan, Sithu D. [mailto:sithu.sudar...@fda.hhs.gov]
Sent: Wed 24.06.2009 15:55
To: java-user@lucene.
: Sudarsan, Sithu D. [mailto:sithu.sudar...@fda.hhs.gov]
Sent: Wed 24.06.2009 16:18
To: java-user@lucene.apache.org
Subject: RE: OutOfMemoryError using IndexWriter
When the segments are merged but not optimized. It happened to our
program at 1.8GB; we now develop and test on Win32 but run the
I agree. Using Lucene 2.4.1, doc.getFields() returns fields in
alphabetical order, not the order in which they were added.
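If insertion order matters to the application, one Lucene-independent workaround is to track the order yourself with an insertion-ordered map. A hypothetical sketch, not a Lucene API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OrderedFields {
    // LinkedHashMap preserves the order in which entries were put,
    // unlike a result that comes back alphabetically sorted.
    private final Map<String, String> fields = new LinkedHashMap<>();

    public void add(String name, String value) {
        fields.put(name, value);
    }

    // Field names in the exact order they were added.
    public List<String> namesInInsertionOrder() {
        return new ArrayList<>(fields.keySet());
    }
}
```

The same idea works as a sidecar next to a Lucene Document: add to both, and use the map whenever display order matters.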
Sincerely,
Sithu D Sudarsan
-Original Message-
From: Matt Turner [mailto:m4tt_tur...@hotmail.com]
Sent: Thursday, June 25, 2009 4:33 PM
To: java-user@lucene.apache.org
Subje
Hi Joel,
With approx. 100K doc size, on a dual quad-core (3.0GHz) Windows
machine, we average 1000 docs/sec. This includes text
extraction from the PDF docs.
Hope this helps.
Sincerely,
Sithu D Sudarsan
-Original Message-
From: Joel Halbert [mailto:j...@su3analytics.co
-
From: Sudarsan, Sithu D. [mailto:sithu.sudar...@fda.hhs.gov]
Sent: Thursday, September 24, 2009 1:11 PM
To: java-user@lucene.apache.org
Subject: RE: metrics for index ~100M docs
Hi Joel,
With approx. 100K doc size, on dual-quad core machine, (3.0Ghz) -
Windows platform, we have an average
Hi,
Is there whitespace after the comma?
Sincerely,
Sithu D Sudarsan
-Original Message-
From: Wei Ho [mailto:we...@princeton.edu]
Sent: Thursday, April 29, 2010 3:51 PM
To: java-user@lucene.apache.org
Subject: Lucene QueryParser and Analyzer
Hello,
I'm using Lucene to index and s
and Input2? If
that is not the case, what do I need to change?
Thanks,
Wei Ho
Original Message
Subject: Re: Lucene QueryParser and Analyzer
From: Sudarsan, Sithu D.
To: java-user@lucene.apache.org
Date: 4/29/2010 3:54 PM
> Hi,
>
> Is there a whitespace after
ke to be sure that QueryParser is
using the analyzer the way I expect it to.
Thanks,
Wei
Original Message
Subject: Re: Lucene QueryParser and Analyzer
From: Sudarsan, Sithu D.
To: java-user@lucene.apache.org
Date: 4/29/2010 4:08 PM
>
> If so,
>
> Input1: c1c2c3
query? That is, force Lucene
to create Query2 for both Input1 and Input2.
Thanks,
Wei
Original Message --------
Subject: Re: Lucene QueryParser and Analyzer
From: Sudarsan, Sithu D.
To: java-user@lucene.apache.org
Date: 4/29/2010 4:54 PM
>
> ---sample code-
>
&