RE: Customizing indexing of large files

2012-02-27 Thread Prakash Reddy Bande
Rowe [mailto:sar...@syr.edu] Sent: Monday, February 27, 2012 2:16 PM To: java-user@lucene.apache.org Subject: RE: Customizing indexing of large files PatternReplaceCharFilter would probably work, or maybe a custom CharFilter? *CharFilter has the advantage of preserving original text offsets, for

RE: Customizing indexing of large files

2012-02-27 Thread Steven A Rowe
12:57 PM > To: java-user@lucene.apache.org > Subject: Re: Customizing indexing of large files > > Hi, > > Understood. > Write a custom FileReader that filters out the text you do not want. > This will do it streaming. > > Glen > > On Mon, Feb 27, 2012 at 1
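
A minimal sketch of the streaming filter suggested above, using only java.io. It wraps any Reader and drops everything between two marker lines, so the result can be handed to whatever code currently reads the file. Assumptions, since the thread does not show them: the class name SectionSkippingReader and the filter helper are hypothetical, and the end marker DATA_END is a guess (the sample file is cut off after DATA_BEGIN).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Streams text line by line, dropping everything between two marker lines (markers included). */
class SectionSkippingReader extends Reader {
    private final BufferedReader in;
    private final String beginMarker;
    private final String endMarker;   // assumed; the thread only shows DATA_BEGIN
    private String pending = "";      // kept text not yet handed to the caller
    private int pos = 0;
    private boolean skipping = false;

    SectionSkippingReader(Reader source, String beginMarker, String endMarker) {
        this.in = new BufferedReader(source);
        this.beginMarker = beginMarker;
        this.endMarker = endMarker;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        while (pos >= pending.length()) {
            if (!fillPending()) return -1;   // end of input
        }
        int n = Math.min(len, pending.length() - pos);
        pending.getChars(pos, pos + n, buf, off);
        pos += n;
        return n;
    }

    // Loads the next kept line into 'pending'; returns false at end of input.
    private boolean fillPending() throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            String trimmed = line.trim();
            if (skipping) {
                if (trimmed.equals(endMarker)) skipping = false;
                continue;
            }
            if (trimmed.equals(beginMarker)) {
                skipping = true;
                continue;
            }
            pending = line + "\n";   // line endings are normalized to \n
            pos = 0;
            return true;
        }
        return false;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    // Convenience helper for trying the filter on an in-memory string.
    static String filter(String text, String begin, String end) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new SectionSkippingReader(new StringReader(text), begin, end)) {
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) out.append(buf, 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Only one line is held in memory at a time, so file size should not matter. Note that dropping text this way shifts character offsets relative to the original file, which is the trade-off the CharFilter suggestion elsewhere in the thread is meant to avoid.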

Re: Customizing indexing of large files

2012-02-27 Thread Glen Newton
wton [mailto:glen.new...@gmail.com] > Sent: Monday, February 27, 2012 12:05 PM > To: java-user@lucene.apache.org > Subject: Re: Customizing indexing of large files > > I'd suggest writing a perl script or > insert-favourite-scripting-language-here script to pre-filter this > co

RE: Customizing indexing of large files

2012-02-27 Thread Prakash Reddy Bande
ewton [mailto:glen.new...@gmail.com] Sent: Monday, February 27, 2012 12:05 PM To: java-user@lucene.apache.org Subject: Re: Customizing indexing of large files I'd suggest writing a perl script or insert-favourite-scripting-language-here script to pre-filter this content out of the files before it gets

Re: Customizing indexing of large files

2012-02-27 Thread Glen Newton
I'd suggest writing a Perl script (or insert-favourite-scripting-language-here script) to pre-filter this content out of the files before it gets to Lucene/Solr. Or you could just grep for "Data" and "Description" (or is "Description" multi-line?). -Glen Newton On Mon, Feb 27, 2012 at 11:55 AM, Prakas
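
The grep idea above could look roughly like this in Java, for anyone who prefers to keep the pre-filter in the same codebase. This is a sketch, not something from the thread: the class name KeywordLineFilter and its keepLines methods are hypothetical, matching is a case-sensitive substring test (as with plain grep), and it assumes each wanted item fits on a single line, which the message itself leaves open for "Description".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Keeps only the lines that contain at least one of the given keywords. */
class KeywordLineFilter {

    // Reads the source line by line; only the kept lines are held in memory.
    static List<String> keepLines(Reader source, String... keywords) throws IOException {
        List<String> kept = new ArrayList<>();
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            for (String keyword : keywords) {
                if (line.contains(keyword)) {   // case-sensitive, like plain grep
                    kept.add(line);
                    break;                      // avoid adding the same line twice
                }
            }
        }
        return kept;
    }

    // Convenience overload for in-memory text.
    static List<String> keepLines(String text, String... keywords) {
        try {
            return keepLines(new StringReader(text), keywords);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the match is case-sensitive, a line such as DATA_BEGIN is not kept by the keyword "Data". The kept lines would then be written to a new file, or joined into a string, before being passed on for indexing.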

Customizing indexing of large files

2012-02-27 Thread Prakash Reddy Bande
Hi, I want to customize the indexing of a specific kind of file I have. I am using 2.9.3, but upgrading is possible. This is how my file's data looks: * Data for 2010 Description: This section has a general description of the data. DATA_BEGIN Month P1