Writing and searching at the same time

2006-11-29 Thread Java Programmer

Hello,
I am having trouble writing to and searching a Lucene index at the same
time. All I have done so far is write a class which has 2 methods:
private String indexLocation;

// Opens a writer on the existing index, adds one document, and closes the writer again.
public void addDocument(int id, String title, String body) throws IOException {
    IndexWriter indexWriter = new IndexWriter(indexLocation, new SimpleAnalyzer(), false);
    Document doc = new Document();
    doc.add(new Field("id", Integer.toString(id), Store.YES, Index.NO));
    doc.add(new Field("title", title, Store.NO, Index.TOKENIZED));
    doc.add(new Field("body", body, Store.NO, Index.TOKENIZED));
    indexWriter.addDocument(doc);
    indexWriter.close();
}

// Opens a new searcher for every query and returns the stored ids of all hits.
public List search(String query) throws IOException, ParseException {
    IndexSearcher indexSearcher = new IndexSearcher(indexLocation);
    MultiFieldQueryParser queryparser = new MultiFieldQueryParser(
        new String[]{"title", "body"}, new SimpleAnalyzer());
    Query q = queryparser.parse(query);
    Hits hits = indexSearcher.search(q);
    Iterator it = hits.iterator();
    List output = new ArrayList();
    while (it.hasNext()) {
        output.add(Integer.parseInt(((Hit) it.next()).getDocument().get("id")));
    }
    indexSearcher.close();
    return output;
}
What I don't like is that each method opens its own IndexWriter or
IndexSearcher. I tried to open them once and keep them open throughout
the whole lifecycle of the application (which will be very long, because
it will be a news search running as a web service), but when I did not
close the IndexWriter, the IndexSearcher did not see any new documents in
the index. The next step was to keep the IndexWriter open and reopen only
the IndexSearcher, but in that case the IndexSearcher also saw the old
index without the new documents. So my final version is the one above.
Could it be done better, without closing the IndexWriter after each
addition and opening an IndexSearcher before each search query? What is
the best pattern for building such systems?
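
One arrangement that might fit is to share a single searcher and reopen it only after a write has actually been written out to the index. The sketch below shows just that bookkeeping and runs without Lucene: `Opener` and `ReopeningHolder` are hypothetical names, not Lucene API, and in real code `open()` would construct an IndexSearcher on indexLocation. It assumes that new documents become visible only to a searcher opened after the writer has written them out, which matches the behaviour described above.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for "open a fresh searcher on the index". */
interface Opener<S> {
    S open();
}

/** Hands out one shared searcher and reopens it only after the index changed. */
final class ReopeningHolder<S> {
    private final Opener<S> opener;
    private S current;            // shared instance, null until first use
    private boolean stale = true; // true when the index changed since the last open

    ReopeningHolder(Opener<S> opener) {
        this.opener = opener;
    }

    /** Call after the writer has written new documents out (e.g. after close()). */
    synchronized void markChanged() {
        stale = true;
    }

    /** Returns the shared searcher, reopening it first if a write happened. */
    synchronized S get() {
        if (stale) {
            // Real code would also have to close the old instance once no
            // search is still using it; that part is left out of this sketch.
            current = opener.open();
            stale = false;
        }
        return current;
    }

    public static void main(String[] args) {
        final AtomicInteger opens = new AtomicInteger();
        ReopeningHolder<String> holder = new ReopeningHolder<String>(new Opener<String>() {
            public String open() {
                return "searcher#" + opens.incrementAndGet();
            }
        });
        System.out.println(holder.get()); // prints searcher#1
        System.out.println(holder.get()); // prints searcher#1 (reused)
        holder.markChanged();
        System.out.println(holder.get()); // prints searcher#2 (reopened)
    }
}
```

With this shape, searches between two writes share one open searcher, and the cost of reopening is paid once per change rather than once per query.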

Another question: do I need to provide any synchronization around the
indexWriter.addDocument(doc) method? I see that it isn't synchronized,
so maybe the programmer needs to do it himself?

Best regards,
Adr

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Lucene changes field values to wrong ones when indexing

2006-12-14 Thread Java Programmer

Hello,
I have a problem with my search code: I try to index some data while
searching simultaneously. Everything goes fine until a certain amount of
data has been indexed; then my fields become corrupted.
E.g. I have a field with the title indexed as "Nowitzki führt "Mavs" zum
ersten Heimsieg" and the internal id "15" (not the doc id, just a field
called id). By the end of indexing this field has disappeared, and some
other values appear in the id field. Below is the full listing of my
program, without the AXIS part which is responsible for transmitting the
data. If you can look at my code, maybe the locking mechanism is wrong
somewhere, or there is some other bug. Please help me if you can.

import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import javax.servlet.ServletContext;
import javax.servlet.http.HttpServlet;

import org.apache.axis.MessageContext;
import org.apache.axis.transport.http.HTTPConstants;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexModifier;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.RangeQuery;
import org.apache.lucene.search.Sort;

public class SNewsSearch {

    private static SNewsSearch search = new SNewsSearch();

    private static IndexSearcher indexSearcher;

    private String indexLocation;

    private String adminLogin;

    private String adminPass;

    private SNewsSearch(){
        System.out.println("Constructor SNewsSearch");

        Properties props = new Properties();
        HttpServlet srv = (HttpServlet)
            MessageContext.getCurrentContext().getProperty(HTTPConstants.MC_HTTP_SERVLET);
        ServletContext context = srv.getServletContext();
        InputStream is =
            context.getResourceAsStream("/WEB-INF/lucene.properties");
        try {
            props.load(is);
            System.out.println("Using index: " +
                props.getProperty("index.location"));
            this.indexLocation = props.getProperty("index.location");
            this.adminLogin = props.getProperty("admin.login");
            this.adminPass = props.getProperty("admin.pass");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static SNewsSearch getInstance(){
        return search;
    }

    public void setIndexLocation(String indexLocation) {
        this.indexLocation = indexLocation;
    }

    private static Object lock = new Object();

    public void addDocument(int id, String title, String body, int pubDate)
            throws IOException{
        Document doc = new Document();
        doc.add(new Field("id",Integer.toString(id),Store.YES,Index.UN_TOKENIZED));
        doc.add(new Field("title",title,Store.NO,Index.TOKENIZED));
        doc.add(new Field("body",body,Store.NO,Index.TOKENIZED));
        doc.add(new Field("pubDate",Integer.toString(pubDate),Store.NO,Index.UN_TOKENIZED));
        synchronized(lock){
            IndexWriter indexWriter = new IndexWriter(indexLocation,
                new SimpleAnalyzer(), false);
            indexWriter.addDocument(doc);
            indexWriter.close();
            if(indexSearcher!=null)
                indexSearcher.close();
            indexSearcher = null;
        }
    }

    public Map search(String query, int pubDate, int page,
            int results, boolean reverse) throws IOException, ParseException{
        MultiFieldQueryParser queryparser = new MultiFieldQueryParser(
            new String[]{"title","body"}, new SimpleAnalyzer());
        Query q; // = queryparser.parse(query);
        if(pubDate!=0){
            BooleanQuery bq = new BooleanQuery();
            bq.add(queryparser.parse(query),BooleanClause.Occur.MUST);
            bq.add(new RangeQuery(
                new Term("pubDate",Integer.toString(pubDate)),
                new Term("pubDate",""),
                true),BooleanClause.

Speedup indexing process

2006-02-17 Thread Java Programmer
Hi,
Maybe this question is trivial, but I need to ask it. I have a problem with
indexing a large number of documents, and I am looking for a better solution.
The task is to index about 33 GB of CSV text data (each record is about
30 kB). It is of course possible to index this data, but I'm not very happy
with the timing (about 26 hours), so I want to know how I can speed up the
process. My first idea is to split the CSV file into smaller ones, e.g. 5 GB
each, and index them on 6 indexing computers. Now my question: can I join
such parts into one index after the indexing jobs on each computer have
finished? I saw an example with a RAMDirectory which could be merged with an
FSDirectory, but that example used the same IndexWriter; in my case I need
separate IndexWriters on several computers. So is this possible with
Lucene?

Thx in advance for hints,
Adrian


Hitmaps of results (number of results for category/filter or grouping results)

2006-02-24 Thread Java Programmer
Hello,
I'm quite new to Lucene, but I'm pretty amazed by its abilities. My question
concerns something commonly called a "hitmap": is it possible to build one
in Lucene? Maybe someone has done it already, or maybe it is even built in?
By the term hitmap I mean e.g. the ability to group results by category. For
example, when searching eBay you also get a hitmap of the results per
category (Computers [10], Sports [111], etc.). So is it possible to do this
in Lucene without huge CPU/time consumption?
I would appreciate any hints on this problem,
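
To make the idea concrete, here is a minimal sketch of the counting itself, assuming the category of every hit is already available as a plain string. `HitmapSketch` and `countByCategory` are hypothetical names and nothing in it is Lucene API; how to obtain the categories cheaply from the index is exactly the open question.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HitmapSketch {
    /** Counts how many results fall into each category, in first-seen order. */
    static Map<String, Integer> countByCategory(List<String> categoriesOfHits) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String category : categoriesOfHits) {
            Integer seen = counts.get(category);
            counts.put(category, seen == null ? 1 : seen + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // One entry per hit: the category that hit belongs to.
        List<String> hits = Arrays.asList(
            "Computers", "Sports", "Computers", "Sports", "Sports");
        System.out.println(countByCategory(hits)); // prints {Computers=2, Sports=3}
    }
}
```

This walks every hit once, so its cost grows with the size of the result set.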

Best Regards
Adrian


Searching in paths

2006-03-14 Thread Java Programmer
Hello,
I have a problem with indexing/querying paths. E.g. I put
"/home/users/apache/txt/qqq__docu.txt" in a field called "path". I wanted to
submit a query to find all documents which belong to my user apache, so
I tried to query Lucene with AND path:/home/users/* but no results were found
by such a query. If I query any other field without a /, results are
returned, e.g. AND title natio*.
What am I doing wrong? What can I do to query for paths (and everything
below them)?

Best Regards,
Adr


Re: Searching in paths

2006-03-15 Thread Java Programmer
On 3/14/06, Mordo, Aviran (EXP N-NANNATEK) <[EMAIL PROTECTED]> wrote:
> You need to index the field as a keyword, or use an analyzer that will
> not strip the / from the string
>
> Aviran
> http://www.aviransplace.com

The field is indexed as Keyword. I was using StandardAnalyzer(), but
currently I am trying to send queries directly via the API, so I use this
code to do the job:
PrefixQuery query1 = null;
if(cat != null && cat.length() > 0){  // compare content; != "" compares references
    Term term = new Term("category",cat);
    query1 = new PrefixQuery(term);
    Hits hits = is.search(query1);
}

The variable cat takes path-like strings as its value, so e.g. it gets:
Top/World/Poland/
which is translated into: category:Top/World/Poland/*

Everything started to work, but I get this exception:
org.apache.lucene.search.BooleanQuery$TooManyClauses
org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:79)
org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:71)
org.apache.lucene.search.PrefixQuery.rewrite(PrefixQuery.java:50)
org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:166)
org.apache.lucene.search.Query.weight(Query.java:84)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85)
org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
org.apache.lucene.search.Hits.<init>(Hits.java:43)
org.apache.lucene.search.Searcher.search(Searcher.java:33)
org.apache.lucene.search.Searcher.search(Searcher.java:27)

I get it when I shorten the path. In other words, I don't get the
exception when the query is Top/World/Poland/ but I start to get it when
the query is Top/World/. I also have getMaxClauseCount() = 1024.
Lucene probably tries to find all keywords which match Top/World/* and
puts them into a boolean query, am I right? If so, I have a problem,
because below the root category there are about 3-4kk child categories,
which I also want to search (and show each category as a URL, so the
user can go deeper into the category tree if he wants).
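
If that guess is right, the number of clauses equals the number of distinct indexed terms that start with the prefix. The sketch below is a simplified model of that expansion, not Lucene's code: `PrefixExpansionSketch` and `clausesNeeded` are hypothetical names, and the term list is made-up sample data standing in for the values of the category field.

```java
import java.util.SortedSet;
import java.util.TreeSet;

final class PrefixExpansionSketch {
    /** Counts the terms that start with prefix: one boolean clause would be needed per term. */
    static int clausesNeeded(SortedSet<String> terms, String prefix) {
        int count = 0;
        // tailSet jumps to the first term >= prefix; in sorted order all
        // terms sharing the prefix follow contiguously from there.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample terms: 1990 categories elsewhere, 10 under Poland.
        SortedSet<String> terms = new TreeSet<String>();
        for (int i = 0; i < 1990; i++) {
            terms.add("Top/World/Other/" + i);
        }
        for (int i = 0; i < 10; i++) {
            terms.add("Top/World/Poland/" + i);
        }
        int maxClauseCount = 1024; // the limit reported above
        System.out.println(clausesNeeded(terms, "Top/World/Poland/"));           // prints 10
        System.out.println(clausesNeeded(terms, "Top/World/") > maxClauseCount); // prints true
    }
}
```

Under this model the narrow prefix stays far below the limit while the broad one exceeds it, which is consistent with the exception appearing only for the shorter path.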

What can I do about this, besides setting a higher value for maxClauseCount?

Best Regards,
Adr




Re: Searching in paths

2006-03-15 Thread Java Programmer
Replying to myself, which I hate :(

What about this solution:
Split the path-like string into smaller tokens and index them as separate
words, e.g.:
#Top/World/Poland/# #Top/World/# #Top/#
so if I ask for the word #Top/# I will get all the results for this
category, without building so many boolean clauses.
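
A minimal sketch of that splitting step, assuming the path always ends with a slash as in the example. `PathTokens` and `ancestorTokens` are hypothetical names; the code only produces the token strings and does not touch the index.

```java
import java.util.ArrayList;
import java.util.List;

final class PathTokens {
    /**
     * Returns one token per ancestor category, shortest first, each wrapped
     * in '#' markers. Anything after the last '/' is ignored.
     */
    static List<String> ancestorTokens(String path) {
        List<String> tokens = new ArrayList<String>();
        int slash = path.indexOf('/');
        while (slash >= 0) {
            tokens.add("#" + path.substring(0, slash + 1) + "#");
            slash = path.indexOf('/', slash + 1);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(ancestorTokens("Top/World/Poland/"));
        // prints [#Top/#, #Top/World/#, #Top/World/Poland/#]
    }
}
```

Each document then carries one token per level of its path, so a lookup for a whole subtree becomes an exact match on a single token instead of a prefix expansion.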

Is there a better solution for my problem?

Best Regards,
Adr




Grouping results by chosen field

2006-03-17 Thread Java Programmer
Hello,
I tried to search for a solution myself, but without any good result, so I
want to ask the group.
My problem concerns result grouping. The best example is Google search,
where the results are sorted by relevance and also grouped by domain
(grouped results have a small indent/margin). In my project I want to get
similar functionality without very high CPU consumption.

Can you share any helpful hints?

Best Regards,
Adr


Re: Grouping results by chosen field

2006-03-21 Thread Java Programmer
On 3/17/06, Java Programmer <[EMAIL PROTECTED]> wrote:
>
> Hello,
> I tried to search for a solution myself, but without any good result, so I
> want to ask the group.
> My problem concerns result grouping. The best example is Google search,
> where the results are sorted by relevance and also grouped by domain
> (grouped results have a small indent/margin). In my project I want to get
> similar functionality without very high CPU consumption.
>
> Can you share any helpful hints?
>
>  Best Regards,
>  Adr
>

Hello,
I have written some code to do the sorting for me (it's not perfect,
maybe it's even a very poor solution, but I'm still learning). So if you
have time, please take a look:

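long sort_start = new Date().getTime();

// domain -> positions of its hits, kept in score order
Map<String, List<Integer>> domains = new HashMap<String, List<Integer>>();
// domains in the order of their best-scoring hit
List<String> results = new ArrayList<String>();

int i = 0;

// NOTE: the loop condition and the lookup of url were lost from the archived
// message; the "hits" variable and the "url" field name are assumptions.
while(i < hits.length()){
    String url = hits.doc(i).get("url");
    if(!domains.containsKey(url)){
        domains.put(url, new ArrayList<Integer>());
        results.add(url);
    }
    domains.get(url).add(i);
    i++;
}

long sort_end = new Date().getTime();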

So I'm grouping the results for each domain in Lists to preserve the
score order. These ordered groups I put into a Map, and the keys of that
Map I put into another List to preserve the order of the highest-scoring
domains, so as a result I get:
- domain A score 1.0
-- domain A score 0.6
- domain B score 0.9
etc.
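
The same grouping can be sketched without Lucene, assuming the hits arrive already sorted by descending score and each hit's domain is known as a string. `DomainGroupingSketch` and `groupByDomain` are hypothetical names. This variant swaps the HashMap plus separate List for a single LinkedHashMap, which remembers insertion order by itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DomainGroupingSketch {
    /**
     * Groups hit positions by domain. Domains appear in the order of their
     * best-scoring hit; positions inside a group stay in score order.
     */
    static Map<String, List<Integer>> groupByDomain(List<String> domainOfHit) {
        Map<String, List<Integer>> groups = new LinkedHashMap<String, List<Integer>>();
        for (int i = 0; i < domainOfHit.size(); i++) {
            String domain = domainOfHit.get(i);
            List<Integer> positions = groups.get(domain);
            if (positions == null) {
                positions = new ArrayList<Integer>();
                groups.put(domain, positions);
            }
            positions.add(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hits sorted by score: A (1.0), B (0.9), A (0.6)
        List<String> domains = Arrays.asList("A", "B", "A");
        System.out.println(groupByDomain(domains)); // prints {A=[0, 2], B=[1]}
    }
}
```

The loop itself is a single pass over the hits with one map lookup per hit.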

I put this code into a servlet (Tomcat 5.5) and it works, but ...
when I make the first query it takes a long time to run the whole sorting
process, e.g. 4900 ms, but when I run the same query again (e.g. with
paging) it runs very quickly, e.g. 40 ms. Why does this happen? Is there
any optimization in Lucene, or some kind of caching? After restarting
the servlet, queries which were already asked return results at once, but
new queries always take a long time to process.

Maybe I missed something when reading the documentation. Can someone give
me an explanation?

Best regards,
Adr




Return all distinct values

2006-03-30 Thread Java Programmer
Hello,
I created a small Lucene application which stores a lot of information
about my users. One piece of it is the zip code in numeric format, e.g.
50501, 63601. Zip codes are stored in Text fields so they are fully
searchable. What I want to do now is get all the unique zip codes which
have been stored so far, something like SQL: select distinct zipcode from ...
Is it possible?

Best regards,
Adr
