TermDocs

2010-05-12 Thread roy-lucene-user
Hi guys,

I've had this code for some time but am just now questioning whether it works.

I have a custom filter that I've been using from Lucene 1.4 through Lucene 2.2.0,
and it essentially builds up a BitSet like so:

for ( int x = 0; x < fields.length; x++ ) {
    for ( int y = 0; y < values.length; y++ ) {
        TermDocs termDocs = reader.termDocs( new Term( fields[x], values[y] ) );
        try {
            while ( termDocs.next() ) {
                int doc = termDocs.doc();
                bits.set( doc );
            }
        }
        finally {
            termDocs.close();
        }
    }
}

I notice that it grabs all the TermDocs for the first field and value but
nothing after that.  I know the other values exist in the index, yet I don't
get any TermDocs for them.
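For what it's worth, the behavior I *expect* from that loop — every matching doc id from every (field, value) pair OR-ed into one BitSet — can be shown with a self-contained sketch (plain java.util only; the hard-coded postings map and the "author:..." terms are made up stand-ins for reader.termDocs, not real Lucene calls):

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class BitSetUnion {
    // Hypothetical stand-in for the index: "field:value" -> matching doc ids.
    static final Map<String, int[]> POSTINGS = new HashMap<String, int[]>();
    static {
        POSTINGS.put("author:roy", new int[] { 0, 2 });
        POSTINGS.put("author:bob", new int[] { 1 });
    }

    // Mirrors the nested loop above: every doc id seen for any term is
    // OR-ed into the same BitSet.
    static BitSet buildBits(String[] terms) {
        BitSet bits = new BitSet();
        for (String term : terms) {
            int[] docs = POSTINGS.get(term);
            if (docs == null) continue; // an unknown term matches nothing
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        System.out.println(buildBits(new String[] { "author:roy", "author:bob" }));
    }
}
```

If the real filter only ever sets bits for the first term, the first thing I'd double-check is that fields and values actually line up the way the nested loop assumes.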

Do I need to reopen the IndexReader each time?

Regards,
Roy

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: getting started

2008-08-01 Thread roy-lucene-user
Hello Brittany,

I think the easiest thing for you to do is make each line a Document.  You
might want a FileName and LineNumber field on top of a "Text" field, this
way if you need to gather all the lines of your File back together again you
can do a search on the FileName.

So in your case:

Document 1
  FileName: [the file]
  LineNumber: 1
  Text: I like apples
Document 2
  ...etc
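In plain Java (no Lucene calls — just to illustrate the record layout; toDocuments is a made-up helper name), the per-line split might look like:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LineDocs {
    // One record per line, mirroring the FileName/LineNumber/Text fields above.
    static List<Map<String, String>> toDocuments(String fileName, String contents) {
        List<Map<String, String>> docs = new ArrayList<Map<String, String>>();
        String[] lines = contents.split("\n");
        for (int i = 0; i < lines.length; i++) {
            Map<String, String> doc = new HashMap<String, String>();
            doc.put("FileName", fileName);
            doc.put("LineNumber", String.valueOf(i + 1)); // 1-based, like an editor
            doc.put("Text", lines[i]);
            docs.add(doc);
        }
        return docs;
    }
}
```

Each of those records would then become one Lucene Document, so a search on Text matches individual lines, and a search on FileName gathers the whole file back.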

Regards,
Roy

On Fri, Aug 1, 2008 at 9:28 AM, Brittany Jacobs <[EMAIL PROTECTED]> wrote:

> Just trying to grasp the concept.
>
>
>
> I want to search a text file where each line is a separate item to be
> searched.  When text is entered by the user, I want to return all the lines
> in which that text appears.
>
> For example, if the text file has:
>
> I like apples.
>
> I went to the store.
>
> I bought an apple.
>
>
>
> If the user searches "apple", I want it to return the first and third
> sentences.
>
>
>
> Is each sentence a Token?  Is the user input going to be a QueryParser?  How
> should I read in the file so that each line of text is a token to search?
>
>
> Brittany Jacobs
>
> Java Developer
>
> JBManagement, Inc.
>
> 12 Christopher Way, Suite 103
>
> Eatontown, NJ 07724
>
> ph: 732-542-9200 ext. 229
>
> fax: 732-380-0678
>
> email:   [EMAIL PROTECTED]
>
>
>
>


Re: getting started

2008-08-01 Thread roy-lucene-user
That certainly works if the intent is to grab the entire file.  If all you
want returned from the search is the particular matching line, though, that's
not going to work.

Let's say the file was made up of a million lines and the text was stored
in the index (I know, absurd).

When grabbing the Document from a search, you don't necessarily want to grab
all the lines.

Also when you get the document, how do you know which Field contained the
line you wanted?

Roy

On Fri, Aug 1, 2008 at 9:59 AM, ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S) <
[EMAIL PROTECTED]> wrote:

>
> Why should each line be a Document?  If there is a single document having
> each line as a Field, then the search would result in a single Document as a
> 'hit', not the individual lines matching it.  Is this right?
>
> Nagesh
>
> On Fri, Aug 1, 2008 at 7:21 PM, <[EMAIL PROTECTED]> wrote:
>
> > Hello Brittany,
> >
> > I think the easiest thing for you to do is make each line a Document.  You
> > might want a FileName and LineNumber field on top of a "Text" field, this
> > way if you need to gather all the lines of your File back together again
> > you can do a search on the FileName.
> >
> > So in your case:
> >
> > Document 1
> >  FileName: [the file]
> >  LineNumber: 1
> >  Text: I like apples
> > Document 2
> >  ...etc
> >
> > Regards,
> > Roy
> >
> > On Fri, Aug 1, 2008 at 9:28 AM, Brittany Jacobs <[EMAIL PROTECTED]> wrote:
> >
> > > Just trying to grasp the concept.
> > >
> > >
> > >
> > > I want to search a text file where each line is a separate item to be
> > > searched.  When text is entered by the user, I want to return all the
> > > lines in which that text appears.
> > >
> > > For example, if the text file has:
> > >
> > > I like apples.
> > >
> > > I went to the store.
> > >
> > > I bought an apple.
> > >
> > >
> > >
> > > If the user searches "apple", I want it to return the first and third
> > > sentences.
> > >
> > >
> > >
> > > Is each sentence a Token?  Is the user input going to be a QueryParser?
> > > How should I read in the file so that each line of text is a token to
> > > search?
> > >
> > > Brittany Jacobs
> > >
> > > Java Developer
> > >
> > > JBManagement, Inc.
> > >
> > > 12 Christopher Way, Suite 103
> > >
> > > Eatontown, NJ 07724
> > >
> > > ph: 732-542-9200 ext. 229
> > >
> > > fax: 732-380-0678
> > >
> > > email:   [EMAIL PROTECTED]
> > >
> > >
> > >
> > >
> >
>


new added documents not showing

2005-03-17 Thread roy-lucene-user
Hi guys,

We were noticing some odd behavior recently with our indexes.

We have a process that adds new documents to our indexes.  When we iterate
through all the documents with an IndexReader, we don't see the new documents,
and they don't show up when we run a search either.

However, after optimizing, suddenly those new documents appear.  It's almost
as if the new segments are not being read by the IndexReader.

Any thoughts?

We're running on Windows NT, jdk 1.4.2 and lucene 1.4.

Roy.




Re: new added documents not showing

2005-03-18 Thread roy-lucene-user
> > However, after optimizing, suddenly those new documents
> > appear.  Its almost as if the new segments are not being read
> > by the IndexReader.
> 
> You need to close the IndexWriter before opening the IndexReader, or reopen
> the IndexReader.
> 
> See TestIndexReader.java:: private void deleteReaderWriterConflict(boolean
> optimize) throws IOException
> for more info.

I don't think that is the problem, since I'm opening a completely new
IndexReader in a new JVM after the indexing process has completed.

But reading the TestIndexReader I found this:
// REQUEST OPTIMIZATION
// This causes a new segment to become current for all subsequent
// searchers. Because of this, deletions made via a previously open
// reader, which would be applied to that reader's segment, are lost
// for subsequent searchers/readers

Does this mean I have to optimize for my new segments to be recognized?

Roy.




Re: new added documents not showing

2005-03-18 Thread roy-lucene-user
Hi guys,

Just trying to understand where problems can occur.

Maybe I need to describe our indexing process some more.

We create new indexes as "index parts" with Documents that are supposed to 
contain unique ID key fields.

These "index parts" get merged into two separate indexes: a main index and a 
message index.  The main index does not "store" any text (for faster search 
retrieval), so we create new Documents based on the old Documents and add them 
with the IndexWriter.addDocument method.

There is also the possibility that multiple copies of the same document exist 
(updated versions) in the main index and the "index parts".  Also multiple 
copies of the same document can exist in separate "index parts" as well.

So what we do is build a unique list of Documents based on the unique ID field 
by iterating through every Document in every "index part".  This entails 
opening and closing an IndexReader for each "index part".
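Just to make the "unique list" idea concrete, here's a sketch with plain maps (illustrative only — a Map stands in for a Document, the class and method names are made up, and it assumes the parts are visited newest first so the first copy seen wins):

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UniqueDocs {
    // Keep only the first (newest) document seen for each unique ID.
    static Map<String, Map<String, String>> unique(List<Map<String, String>> docs) {
        Map<String, Map<String, String>> byId =
                new LinkedHashMap<String, Map<String, String>>();
        for (Map<String, String> doc : docs) {
            String id = doc.get("ID");
            if (!byId.containsKey(id)) {
                byId.put(id, doc); // older copies of the same ID are skipped
            }
        }
        return byId;
    }

    // Small helper to build a fake "Document" for illustration.
    static Map<String, String> doc(String id, String text) {
        Map<String, String> d = new HashMap<String, String>();
        d.put("ID", id);
        d.put("Text", text);
        return d;
    }
}
```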

We then make sure that none of these documents exist in the main index.  We
open another IndexReader on the main index, loop through all the Documents,
and run an IndexReader.delete( new Term( "ID", id ) ) for each.  The
IndexReader is then closed.

We then open an IndexWriter on the main index, loop through all the Documents
again, this time calling IndexWriter.addDocument for each one, and then close
the index.

We do not optimize until the weekend.

There is also another JVM running that handles searches.  It could be running
a search on the main index at any given moment; however, it closes its
IndexSearcher after every search.

Are there areas here that could cause the problem?

Roy.

On Thu, 17 Mar 2005 17:52:17 -0500, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> We were noticing some odd behavior recently with our indexes.
> 
> We have a process that adds new documents into our indexes.  When we iterate 
> through all the documents through an IndexReader, we're not seeing the new 
> documents and we're not seeing the new documents when we run a search.
> 
> However, after optimizing, suddenly those new documents appear.  Its almost 
> as if the new segments are not being read by the IndexReader.
> 
> Any thoughts?
> 
> We're running on Windows NT, jdk 1.4.2 and lucene 1.4.




Re: new added documents not showing

2005-03-21 Thread roy-lucene-user
On Sat, 19 Mar 2005 22:43:44 +0300, Pasha Bizhan <[EMAIL PROTECTED]> wrote:
> Could you provide the code snippets for your process?
> 

Sure (thanx for helping, btw)

I just realized that the way I described our process was off a little bit.

Here's the process again:

1.  grab all index Directorys (index parts)
2.  loop newest to oldest and make documents unique (by deleting older 
documents)
3.  get list of documents from index parts to delete from our main index
4.  delete documents from main index
5.  add all documents from index parts into the main index

I apologize for the amount of code below.

Here is the code that loops through all the index parts, from newest to oldest, 
and then deletes the documents from any older index parts.

The unique ID we use as a Key Field is "ReceivedDate".

IndexReader reader = null;
IndexReader reader2 = null;

try {
    /*
     *-
     * Loop backwards (latest to oldest) through parts
     *-
     */
    for ( int i = ( directories.length - 1 ); i >= 0; i-- ) {
        reader = IndexReader.open( FSDirectory.getDirectory( directories[i], false ) );
        int numDocuments = reader.numDocs();

        /*
         *-
         * Loop forward (oldest to latest) up to the current part
         * being looked at.
         * Delete any messages from the older parts that exist in the
         * current part.
         *-
         */
        for ( int x = 0; x < i; x++ ) {
            String partName = directories[x].getName();
            reader2 = IndexReader.open( FSDirectory.getDirectory( directories[x], false ) );

            for ( int h = 0; h < numDocuments; h++ ) {
                if ( !reader.isDeleted( h ) ) {
                    Document d = reader.document( h );
                    String receivedDate = d.get( "ReceivedDate" );
                    Term term = new Term( "ReceivedDate", receivedDate );
                    int num = reader2.delete( term );
                }
            }

            reader2.close();
            reader2 = null;
        }

        reader.close();
        reader = null;
    }
}
catch ( Exception e ) {
    // log error
}
finally {
    try {
        if ( reader != null ) reader.close();
        if ( reader2 != null ) reader2.close();
    }
    catch ( IOException e ) {
        // log error
    }
}

Here we build up a list of ReceivedDates to help us delete from the main
index.  I just realized that we could build this list from the previous
section.

List list = new ArrayList();
for ( int i = 0; i < directories.length; i++ ) {
    IndexReader r = null;
    try {
        r = IndexReader.open( directories[i] );
        int num = r.numDocs();

        for ( int x = 0; x < num; x++ ) {
            if ( !r.isDeleted( x ) ) {
                Map map = new HashMap();
                Document d = r.document( x );
                map.put( "ReceivedDate", d.get( "ReceivedDate" ) );
                list.add( map );
            }
        }
    }
    catch ( Exception e ) {
        e.printStackTrace();
    }
    finally {
        if ( r != null ) try { r.close(); } catch ( Exception e ) {}
    }
}
return list;
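Side note: since only the ReceivedDate string is used later, the List of single-entry Maps could be a plain Set of strings, which also drops duplicate dates across parts (a sketch; the class and method names are made up):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ReceivedDates {
    // Distinct ReceivedDate values, kept in first-seen order.
    static Set<String> uniqueDates(List<String> datesFromAllParts) {
        return new LinkedHashSet<String>(datesFromAllParts);
    }
}
```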

Here we actually go through and delete the documents from the main index.

IndexReader reader = null;

Map message;
try {
    reader = IndexReader.open( mainindex );
    Iterator it = indexList.iterator(); // returned from previous section

    /*
     *-
     * Loop through messages to clear from the index
     *-
     */
    while ( it.hasNext() ) {
        message = (Map)it.next();

        /*
         *-
         * Delete based on received date
         *-
         */
        String receivedDate = (String)message.get( "ReceivedDate" );
        Term term = new Term( "ReceivedDate", receivedDate );
        int num = reader.delete( term );

Re: new added documents not showing

2005-03-21 Thread roy-lucene-user
> When do you open the index writer? Where is the code?

Ah, sorry.  That last section is in a method that gets called in a loop.

IndexWriter writer = null;
try {
    writer = new IndexWriter( mainindex, new StandardAnalyzer(), false );
    for ( int i = 0; i < directories.length; i++ ) {
        moveDocumentsOver( writer, directories[i] );
        // delete dir
    }
}
catch ( Exception e ) {
    // log error
}
finally {
    if ( writer != null ) try { writer.close(); } catch ( Exception e ) {}
}

Roy.




Re: new added documents not showing

2005-03-21 Thread roy-lucene-user
Correct, we also can't see the new documents when we open an IndexReader on
the main index.

Roy.





Re: new added documents not showing

2005-03-22 Thread roy-lucene-user
Pasha, in short, that is all I'm trying to do.  Wasn't an issue really before.

Otis, not sure what Luke is.  But the documents appear after we optimize.

Roy.


On Mon, 21 Mar 2005 18:20:32 -0800 (PST), Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> 
> Hello,
> 
> Sorry if this is stating the obvious, but have you used Luke to verify
> that the new documents were indexed in the first place?  Sorry if
> you've already mentioned this.
> 
> Otis
> 
> 
> --- [EMAIL PROTECTED] wrote:
> > > When do you open the index writer? Where is the code?
> >
> > Ah, sorry.  That last section is in a method that gets called in a
> > loop.
> >
> > IndexWriter writer = null;
> > try {
> > writer = new IndexWriter( mainindex, new
> > StandardAnalyzer(), false );
> > for ( int i = 0; i < directories.length; i++ ) {
> > moveDocumentsOver( writer, directories[i] );
> > // delete dir
> > }
> > }
> > catch ( Exception e ) {
> > // log error
> > }
> > finally {
> > if ( writer != null ) try { writer.close(); } catch (
> > Exception e ) {}
> > }
> >
> > Roy.
> >
> >
> >
> 
> 
>







Re: Lucene faster on JDK 1.5?

2005-07-08 Thread roy-lucene-user
This might be a good time to ask another question: are there any advantages
to Lucene using the java.nio package?
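For context, the kind of access java.nio enables is memory-mapped file reads. A sketch (in later-Java syntax for brevity; this is what nio offers in general, not necessarily anything Lucene did at the time):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class NioRead {
    // Read a whole file through a memory-mapped buffer instead of a stream.
    // The OS pages data in on demand, which can cut copying and syscalls
    // for large, randomly accessed files like index segments.
    static byte[] readMapped(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        }
    }
}
```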

Roy

On 7/8/05, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> 
> 
> Nothing significant, but I've been using 1.5 on Simpy.com (lots of
> Lucene behind it) for over a year now, and I'm happy with it.
> 
> Otis
> 
> 
> --- [EMAIL PROTECTED] wrote:
> 
> > Are people seeing a significant performance improvement with Lucene when
> > they upgrade to JDK 1.5?
> >
> >
> >
> 
> 
> 
>