[POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Sudarsan, Sithu D.
Sincerely, Sithu D Sudarsan Grant Ingersoll wrote: > Where do you get your Lucene/Solr downloads from? > > [x] ASF Mirrors (linked in our release announcements or via the Lucene > website) > > [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) > > [x] I/we build them from

RE: RemoteSearchable deprecated. What to replace it with?

2011-06-16 Thread Sudarsan, Sithu D.
Hi Tsadok, In Lucene 3.1: "MultiSearcher is deprecated; ParallelMultiSearcher has been absorbed directly into IndexSearcher" -Sithu -Original Message- From: Israel Tsadok [mailto:itsa...@gmail.com] Sent: Thursday, June 16, 2011 1:35 AM To: java-user@lucene.apache.org Subject: RemoteS

Lucene sample code and api documentation

2008-08-27 Thread Sudarsan, Sithu D.
Hi All, I'm new to Lucene. 1. Could you please tell me where we can see old emails (even one day old), not as an archived file but as a mailing list? 2. Where can we find sample code or detailed tutorials? 3. I found one at LuceneTutorial.com, but it is only for the command line. Not

RE: Lucene sample code and api documentation

2008-08-28 Thread Sudarsan, Sithu D.
@lucene.apache.org Subject: Re: Lucene sample code and api documentation Sithu, Old emails: markmail.org Sample code: Lucene in Action has free downloadable code -- manning.com/hatcher2 Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: "Sudarsan,

RE: I am not able to run Lucene 2.4 Demo

2008-10-15 Thread Sudarsan, Sithu D.
Hi Prabina, The way you are specifying the path E:\... is not correct. Use something like /prabina/lucene-2.4demo/src instead. Hope this helps, Sincerely, Sithu Sudarsan -Original Message- From: prabina pattanayak [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 15, 2008 1:12 AM To: java-user@l

RE: I am not able to run Lucene 2.4 Demo

2008-10-20 Thread Sudarsan, Sithu D.
Hi, I'm using Lucene 2.3.2 and have had no problems so far on Windows. One thing to check is whether your index directory is writable; possibly your index folder is read-only. Sincerely, Sithu Sudarsan Graduate Research Assistant, UALR & Visiting Researcher, CDRH/OSEL [EMAI
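A minimal, JDK-only sketch of the check suggested above, run before pointing Lucene at an index folder. The class name `IndexDirCheck` and the default path `index` are hypothetical. `File.canWrite()` is only a first check: it may not reflect every permission restriction (for example, Windows ACLs), so the definitive test is still whether Lucene can create its files there.

```java
import java.io.File;

/**
 * Hypothetical pre-flight check for a Lucene index directory: does it exist,
 * is it a directory, and does the JVM report that it can write to it?
 */
final class IndexDirCheck {

    /** Returns a description of the problem, or null if the directory looks usable. */
    static String problemWith(File indexDir) {
        if (!indexDir.exists()) {
            return "does not exist: " + indexDir.getAbsolutePath();
        }
        if (!indexDir.isDirectory()) {
            return "not a directory: " + indexDir.getAbsolutePath();
        }
        if (!indexDir.canWrite()) {
            // Typical cause: the folder (or its drive/share) is read-only.
            return "not writable: " + indexDir.getAbsolutePath();
        }
        return null;
    }

    public static void main(String[] args) {
        // "index" is a placeholder default; pass the real index path as an argument.
        File dir = new File(args.length > 0 ? args[0] : "index");
        String problem = problemWith(dir);
        System.out.println(problem == null ? "OK: " + dir.getAbsolutePath() : "Problem: " + problem);
    }
}
```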

Multi -threaded indexing of large number of PDF documents

2008-10-23 Thread Sudarsan, Sithu D.
Hi, We are trying to index a large collection of PDF documents, ranging in size from a few KB to a few GB. We use Lucene 2.3.2 with JDK 1.6.0_01 (with PDFBox for text extraction) on both Windows and CentOS Linux. We set the java -Xms and -Xmx options both to 1080m, even though we have 4GB on Windows and 32 GB

RE: Multi -threaded indexing of large number of PDF documents

2008-10-24 Thread Sudarsan, Sithu D.
Hi Glen, Mike, Grant & Mark Thank you for the quick responses. 1. Yes, I'm now looking at ThreadPoolExecutor and searching for sample code to improve our multi-threaded code. 2. First, we'll try using as many IndexWriters as there are cores (2 CPUs x 4 cores = 8). 3. Yes, PDFBox except

RE: Multi -threaded indexing of large number of PDF documents

2008-10-24 Thread Sudarsan, Sithu D.
Eskildsen [mailto:[EMAIL PROTECTED] Sent: Friday, October 24, 2008 10:43 AM To: java-user@lucene.apache.org Subject: RE: Multi -threaded indexing of large number of PDF documents On Fri, 2008-10-24 at 16:01 +0200, Sudarsan, Sithu D. wrote: > 4. We've tried using larger JVM space by defin

RE: Multi -threaded indexing of large number of PDF documents

2008-11-14 Thread Sudarsan, Sithu D.
Hi All, Based on your valuable input, we ran a few experiments varying the number of threads. We observed that if the number of worker threads is one less than the number of cores (we have 'main' as a separate thread, so, including 'main', the number of threads equals the number of cores), the indexi
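The core-count rule of thumb from this thread can be sketched with `ThreadPoolExecutor`, as a minimal example assuming one worker per core minus one for `main`. `CoreSizedIndexing` and `indexDocument` are hypothetical names; `indexDocument` stands in for the real PDFBox extraction plus `IndexWriter` call and just returns the path length so the code runs. The best worker count should still be measured on your own hardware.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class CoreSizedIndexing {

    /** One worker per core, minus one core left for 'main' (never fewer than one). */
    static int workerCount(int cores) {
        return Math.max(1, cores - 1);
    }

    /** Hypothetical placeholder for "extract text and add it to an IndexWriter". */
    static int indexDocument(String path) {
        return path.length();
    }

    /** Submits every path to a fixed-size pool and sums the per-document results. */
    static int indexAll(List<String> paths, int workers) {
        // Fixed-size pool: core size == max size == workers, unbounded work queue.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (String path : paths) {
                results.add(pool.submit(() -> indexDocument(path)));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get(); // waits for that document; rethrows its failure
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("indexing failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int workers = workerCount(Runtime.getRuntime().availableProcessors());
        // Placeholder file names; a real run would list the PDF files to index.
        List<String> paths = Arrays.asList("a.pdf", "b.pdf", "c.pdf");
        System.out.println("workers=" + workers + ", total=" + indexAll(paths, workers));
    }
}
```

On the 2 CPU x 4 core machine mentioned earlier, `workerCount(8)` gives 7 workers, matching the "threads including main equal cores" observation.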

RE: Merging indexes & multicore/multithreading

2008-12-02 Thread Sudarsan, Sithu D.
In our experience, performance is optimal in a single JVM when the number of active threads equals the number of cores, on both Windows XP and CentOS 5.2 with Lucene 2.3.2. Sincerely, Sithu D Sudarsan [EMAIL PROTECTED] [EMAIL PROTECTED] -Original Message- From: Glen Newton [mailto:[EMAIL

RE: Beginner: Best way to index and display orginal text of pdfs in search results

2008-12-12 Thread Sudarsan, Sithu D.
You can use PDFBox: http://kalanir.blogspot.com/2008/08/indexing-pdf-documents-with-lucene.html Sincerely, Sithu D Sudarsan sithu.sudar...@fda.hhs.gov sdsudar...@ualr.edu -Original Message- From: maxmil [mailto:m...@alwayssunny.com] Sent: Friday, December 12, 2008 3:34 AM To: java-

Use of scanned documents for text extraction and indexing

2009-02-26 Thread Sudarsan, Sithu D.
Hi All: Has any study or research been done on using scanned paper documents as images (possibly PDFs), extracting their text with OCR or another technique, and assessing the resulting index quality? Thanks in advance, Sithu D Sudarsan sithu.sudar...@fda.hhs.gov sdsudar...@ualr.edu

RE: Learning Lucene

2009-03-05 Thread Sudarsan, Sithu D.
Hi Tuztuz, Please visit the book's website and its forum; most of your questions should be answered there. Sincerely, Sithu D Sudarsan -Original Message- From: Tuztuz T [mailto:tuztu...@yahoo.com] Sent: Thursday, March 05, 2009 9:24 AM To: java-user@lucene.apache.org Subject: Learning Lucene dear a

Wordnet indexing error

2009-04-08 Thread Sudarsan, Sithu D.
Hi All, We're using Lucene 2.3.2 on Windows. When we try to generate an index for WordNet 2.0 using the Syns2Index class, the following error is thrown during indexing: java.lang.NoSuchMethodError: org.apache.lucene.document.Field.UnIndexed(Ljava/lang/String;Ljava/lang/String;)Lorg/apache/lucene/documen

RE: Wordnet indexing error

2009-04-10 Thread Sudarsan, Sithu D.
't exist any more). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: "Sudarsan, Sithu D." > To: java-user@lucene.apache.org > Sent: Wednesday, April 8, 2009 7:01:16 PM > Subject: Wordnet indexing error > > Hi A

RE: Yet another NFS Question...

2009-04-27 Thread Sudarsan, Sithu D.
>What is the best way to handle this sort of situation? My inclination is > build a new Search Server (with fast HDDs and lots of Memory for tomcat) > and leave the indexer on the old server connected via NFS. - Our current development is along similar lines. We have almost no deletes, only lots of ADDD

RE: Wordnet indexing error

2009-04-27 Thread Sudarsan, Sithu D.
http://www.simpy.com/user/otis/search/wordnet Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: "Sudarsan, Sithu D." > To: java-user@lucene.apache.org > Sent: Friday, April 10, 2009 9:51:39 AM > Subject: RE: Wordnet indexin

Parsing large xml files

2009-05-21 Thread Sudarsan, Sithu D.
Hi, While trying to parse XML documents of about 50MB, we run into an OutOfMemoryError due to Java heap space. Increasing the JVM heap to close to 2GB (that is the maximum) does not help. Is there an API that could be used to handle such large single XML files? If Lucene is not the right place, please l

RE: Parsing large xml files

2009-05-21 Thread Sudarsan, Sithu D.
arser? I recommend against using an in-memory parser. On Thu, May 21, 2009 at 3:42 PM, Sudarsan, Sithu D. < sithu.sudar...@fda.hhs.gov> wrote: > > Hi, > > While trying to parse xml documents of about 50MB size, we run into > OutOfMemoryError due to java heap space. Incre

RE: Parsing large xml files

2009-05-22 Thread Sudarsan, Sithu D.
Thanks, everyone, for your useful suggestions and links. Lucene uses DOM, and we tried SAX. XML Pull and vtd-xml, as well as Piccolo, seem good. For now, however, we've broken the file into smaller chunks and are parsing those. When we get some time, we'd like to refactor using the suggested parsers. Er
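One way to avoid building a DOM for a single 50MB file is a streaming (pull) parser. The sketch below swaps in StAX (`javax.xml.stream`), which ships with Java 6 and later; it is not one of the parsers named in the thread. `StreamingXml` is a hypothetical name, and `recordElement` is a placeholder for whatever element delimits one indexable unit in your file; the `Consumer` could, for example, build and add one Lucene document per record.

```java
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StreamingXml {

    /**
     * Streams through the XML and hands the text of each <recordElement> to the
     * sink as soon as that element closes. Apart from the parser's own buffer,
     * only the current record's text is held in memory (a DOM parser would build
     * the whole tree first). The caller remains responsible for closing 'in'.
     */
    static void forEachRecord(InputStream in, String recordElement, Consumer<String> sink)
            throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        StringBuilder current = null; // non-null only while inside a record
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                switch (event) {
                    case XMLStreamConstants.START_ELEMENT:
                        if (reader.getLocalName().equals(recordElement)) {
                            current = new StringBuilder();
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        if (current != null) {
                            current.append(reader.getText()); // text may arrive in pieces
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (current != null && reader.getLocalName().equals(recordElement)) {
                            sink.accept(current.toString().trim());
                            current = null;
                        }
                        break;
                    default:
                        break; // comments, processing instructions, etc. are ignored
                }
            }
        } finally {
            reader.close(); // frees parser resources; does not close 'in'
        }
    }
}
```

Because each record is passed on and then dropped, memory use depends on the largest record rather than on the file size, which is the same goal the chunking workaround above achieves by hand.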

RE: Parsing large xml files

2009-05-22 Thread Sudarsan, Sithu D.
Hi Matt, We use a 32-bit JVM. Though it is supposed to support up to 4GB, any heap setting above 2GB fails on Windows XP. The machine has dual quad-core processors. On Linux, though, we're able to use 4GB! If there is any setting that will let us use 4GB, do let me know. Thanks, Sithu D Sudarsan -O
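To confirm what heap the JVM actually received (for example, when a 32-bit JVM on Windows XP rejects settings above 2GB), a small JDK-only report like the one below can be run with the same `-Xmx` flag as the indexer. `HeapReport` is a hypothetical name. `sun.arch.data.model` is a HotSpot-specific property and may be absent on other JVMs, and `Runtime.maxMemory()` returns `Long.MAX_VALUE` when there is no inherent limit.

```java
/** Prints what the running JVM reports about its architecture and heap limit. */
final class HeapReport {

    /** Maximum heap the JVM will attempt to use, in megabytes (from Runtime.maxMemory). */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("os.arch       = " + System.getProperty("os.arch"));
        // HotSpot-specific property ("32" or "64"); may be missing on other JVMs.
        System.out.println("data model    = " + System.getProperty("sun.arch.data.model", "unknown"));
        System.out.println("max heap (MB) = " + maxHeapMegabytes());
    }
}
```

For example, `java -Xmx1500m HeapReport`. If the requested heap cannot be reserved, HotSpot typically refuses to start rather than silently running with a smaller heap.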

RE: No hits while searching!

2009-06-01 Thread Sudarsan, Sithu D.
Do you use stopword filtering? Sincerely, Sithu D Sudarsan -Original Message- From: vanshi [mailto:nilu.tha...@gmail.com] Sent: Monday, June 01, 2009 11:39 AM To: java-user@lucene.apache.org Subject: Re: No hits while searching! Thanks Erick, I was able to get this work...as you sai
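Why stopwords matter here: if a stop filter runs at query time, a query made only of stopwords is analyzed to zero terms and cannot match anything; if index-time and query-time analysis differ, terms may not line up either. The sketch below is a hypothetical, JDK-only imitation of a tokenizer + lowercase + stop-filter chain (not Lucene's own `StopFilter`), with a made-up stopword list, just to make the effect visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class StopwordDemo {

    // Hypothetical stopword list; real analyzers ship their own (often larger) lists.
    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to", "is"));

    /** Lowercases, splits on runs of non-letter/non-digit characters, drops stopwords. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            // split() can yield a leading empty string; skip it along with stopwords.
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Lucene index"));
        System.out.println(analyze("The"));
    }
}
```

Here `analyze("The")` is empty, so a search built from it has nothing to look up.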

RE: OutOfMemoryError using IndexWriter

2009-06-24 Thread Sudarsan, Sithu D.
Hi Stefan, Are you using 32-bit Windows? If so, this can sometimes happen if the index file before optimizations crosses your jvm memory usage settings (if say 512MB). If that is the case, increase the JVM memory settings. Sincerely, Sithu D Sudarsan Off: 301-796-2587

RE: OutOfMemoryError using IndexWriter

2009-06-24 Thread Sudarsan, Sithu D.
"the index file before optimizations crosses your jvm memory usage settings (if say 512MB)"? Could you please explain this further? Stefan -Original Message- From: Sudarsan, Sithu D. [mailto:sithu.sudar...@fda.hhs.gov] Sent: Wed 24.06.2009 15:55 To: java-user@lucene.

RE: OutOfMemoryError using IndexWriter

2009-06-24 Thread Sudarsan, Sithu D.
: Sudarsan, Sithu D. [mailto:sithu.sudar...@fda.hhs.gov] Sent: Wed 24.06.2009 16:18 To: java-user@lucene.apache.org Subject: RE: OutOfMemoryError using IndexWriter It happens when the segments are merged but not optimized. It happened to our program at 1.8GB, and now we develop and test on Win32 but run the

RE: Order of fields within a Document in Lucene 2.4+

2009-06-25 Thread Sudarsan, Sithu D.
I agree. With Lucene 2.4.1, doc.getFields() returns the fields in alphabetical order, not in the order in which they were added. Sincerely, Sithu D Sudarsan -Original Message- From: Matt Turner [mailto:m4tt_tur...@hotmail.com] Sent: Thursday, June 25, 2009 4:33 PM To: java-user@lucene.apache.org Subje

RE: metrics for index ~100M docs

2009-09-24 Thread Sudarsan, Sithu D.
Hi Joel, With an approx. 100K document size, on a dual quad-core machine (3.0GHz) running Windows, we average 1000 docs/sec. This includes text extraction from PDF documents. Hope this helps. Sincerely, Sithu D Sudarsan -Original Message- From: Joel Halbert [mailto:j...@su3analytics.co

RE: metrics for index ~100M docs ... Correction

2009-09-24 Thread Sudarsan, Sithu D.
- From: Sudarsan, Sithu D. [mailto:sithu.sudar...@fda.hhs.gov] Sent: Thursday, September 24, 2009 1:11 PM To: java-user@lucene.apache.org Subject: RE: metrics for index ~100M docs Hi Joel, With approx. 100K doc size, on dual-quad core machine, (3.0Ghz) - Windows platform, we have an average

RE: Lucene QueryParser and Analyzer

2010-04-29 Thread Sudarsan, Sithu D.
Hi, Is there a whitespace after the comma? Sincerely, Sithu D Sudarsan -Original Message- From: Wei Ho [mailto:we...@princeton.edu] Sent: Thursday, April 29, 2010 3:51 PM To: java-user@lucene.apache.org Subject: Lucene QueryParser and Analyzer Hello, I'm using Lucene to index and s

RE: Lucene QueryParser and Analyzer

2010-04-29 Thread Sudarsan, Sithu D.
and Input2? If that is not the case, what do I need to change? Thanks, Wei Ho Original Message Subject: Re: Lucene QueryParser and Analyzer From: Sudarsan, Sithu D. To: java-user@lucene.apache.org Date: 4/29/2010 3:54 PM > Hi, > > Is there a whitespace after

RE: Lucene QueryParser and Analyzer

2010-04-29 Thread Sudarsan, Sithu D.
ke to be sure that QueryParser is using the analyzer the way I expect it to. Thanks, Wei Original Message Subject: Re: Lucene QueryParser and Analyzer From: Sudarsan, Sithu D. To: java-user@lucene.apache.org Date: 4/29/2010 4:08 PM > > If so, > > Input1: c1c2c3

RE: Lucene QueryParser and Analyzer

2010-04-30 Thread Sudarsan, Sithu D.
query? That is, force Lucene to create Query2 for both Input1 and Input2. Thanks, Wei Original Message -------- Subject: Re: Lucene QueryParser and Analyzer From: Sudarsan, Sithu D. To: java-user@lucene.apache.org Date: 4/29/2010 4:54 PM > > ---sample code- > &