Similarity

2005-12-19 Thread Klaus
Hi all,

I'm new to Lucene and have some questions about the system as a whole.

I) What exactly is written to the index? Is the index just an inverted list?
Are term weights stored?

II) How does the retrieval process work? My guess is:

1)   Get all the documents from the index via the inverted list.

2)   Compute the score of every document for the query with the
Similarity class. As far as I can see, the similarity is just based on
tf-idf weighting? Is no cosine measure or similar used to compare the
document vector and the query vector?

Thanks a lot,

Klaus
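For comparison with the cosine measure asked about above, here is a minimal plain-Java sketch of the textbook vector-space cosine between two sparse term-weight vectors (for example tf-idf weights). It does not use Lucene; the class name is hypothetical and the weights are assumed to be computed beforehand.

```java
import java.util.Map;

/** Cosine similarity between sparse term-weight vectors (term -> weight); a hypothetical helper, not a Lucene class. */
final class CosineSimilarity {
    private CosineSimilarity() {}

    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        // Dot product: iterate over the smaller vector, only shared terms contribute.
        Map<String, Double> smaller = a.size() <= b.size() ? a : b;
        Map<String, Double> larger = (smaller == a) ? b : a;
        double dot = 0.0;
        for (Map.Entry<String, Double> e : smaller.entrySet()) {
            Double other = larger.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty or all-zero vector is treated as dissimilar to everything
        }
        return dot / (normA * normB);
    }

    /** Euclidean length of a vector. */
    private static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }
}
```

Identical weight vectors give 1, vectors without shared terms give 0, and everything else lies in between.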



Re: Lucene parsing for PDF

2005-12-29 Thread Klaus

Hi,

I think the easiest way is to exclude the pages while you are parsing the
PDF document, so that you pass only the necessary pages to Lucene.
Another solution is to create a separate document for each page, with a
field such as "pagenumber"; then you can delete individual pages from the
index.
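The one-document-per-page idea can be sketched in plain Java: every page becomes its own record with a "pagenumber" field, and excluded pages are simply never emitted. The class name and field layout are hypothetical, and the page texts are assumed to have been extracted from the PDF already; with Lucene, each map would become one Document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: turns one PDF into one field map per page. */
final class PerPageDocuments {
    private PerPageDocuments() {}

    /** Page numbers start at 1; pages listed in excludedPages are skipped entirely. */
    static List<Map<String, String>> split(String pdfId, List<String> pageTexts, Set<Integer> excludedPages) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            int pageNumber = i + 1;
            if (excludedPages.contains(pageNumber)) {
                continue; // excluded while "parsing", so it never reaches the index
            }
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("id", pdfId + "#" + pageNumber); // unique id, usable later to delete just this page
            doc.put("pagenumber", Integer.toString(pageNumber));
            doc.put("contents", pageTexts.get(i));
            docs.add(doc);
        }
        return docs;
    }
}
```

A page indexed earlier could later be removed by deleting on its unique id term instead of reindexing the whole PDF.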

Peace


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




Finding similar documents

2006-01-09 Thread Klaus
Hi,

is there a built-in method for finding documents similar to a given
document?

Thx,

Klaus





RF and IDF

2006-01-11 Thread Klaus
Hi all,

do you know how the tf and idf values are computed by the default
similarity? I mean the exact mathematical equations.

Thx,

Klaus
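For reference, the classic DefaultSimilarity (the default before Lucene 6.0, which switched to BM25) computes tf as the square root of the raw frequency and idf as 1 + ln(numDocs / (docFreq + 1)). The plain-Java sketch below restates those formulas, plus the field length norm and the coord factor, outside Lucene so they can be experimented with; it is not the Lucene class itself, and it ignores index-time boosts and the one-byte encoding Lucene applies to norms.

```java
/** Plain-Java restatement of the classic DefaultSimilarity formulas (no Lucene dependency). */
final class ClassicFormulas {
    private ClassicFormulas() {}

    /** Term frequency factor: square root of the raw frequency of the term in the document. */
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    /** Inverse document frequency: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(long docFreq, long numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    /** Field length normalisation: 1 / sqrt(number of terms in the field). */
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /** Coordination factor: fraction of the query terms that the document matched. */
    static double coord(int overlap, int maxOverlap) {
        return overlap / (double) maxOverlap;
    }
}
```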







Re: RF and IDF

2006-01-11 Thread Klaus
Thx, but where can I find these classes?

>If you really want to understand how scoring works, I'd suggest also
>looking at TermWeight/TermScorer.





Boolean Query

2006-01-11 Thread Klaus
Hi,

I have another question... How do I construct a BooleanQuery in which the
terms of the query are connected with OR?

I have a list of terms representing the highest-scored terms in a document.
Here is my code:

BooleanQuery query = new BooleanQuery();
for (Term t : terms)
{
    query.add(new TermQuery(t), false, false); // is this wrong?
}

If I construct the query as a string like "A a OR B b OR C", I get many more
results. I assume that the BooleanQuery uses an AND operator. How can I
change that?

And I'm wondering what happens if I boost a TermQuery with a value smaller
than one. I'm asking because I would like to boost each TermQuery with the
tf*idf value of the term in the original document. From my point of view,
this should lead to better precision, but at first glance the results
are worse.

THX,

Klaus
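Conceptually, clauses added with required=false and prohibited=false (BooleanClause.Occur.SHOULD in later Lucene versions) behave like OR: a document matches if any clause matches, whereas required clauses behave like AND. The plain-Java sketch below shows the difference on toy postings lists (term -> set of document ids); the data structure and class name are hypothetical and not how Lucene stores postings.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy illustration of OR vs AND matching over term -> document-id postings. */
final class BooleanMatching {
    private BooleanMatching() {}

    /** OR semantics (optional clauses): documents containing at least one of the terms. */
    static Set<Integer> matchAny(Map<String, Set<Integer>> postings, Collection<String> terms) {
        Set<Integer> result = new TreeSet<>();
        for (String t : terms) {
            result.addAll(postings.getOrDefault(t, Set.of()));
        }
        return result;
    }

    /** AND semantics (required clauses): documents containing every one of the terms. */
    static Set<Integer> matchAll(Map<String, Set<Integer>> postings, Collection<String> terms) {
        Set<Integer> result = null;
        for (String t : terms) {
            Set<Integer> docs = postings.getOrDefault(t, Set.of());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs); // intersect with the next term's postings
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

If a hand-built OR query still returns fewer hits than the parsed one, the difference usually lies in analysis rather than in the operator, as Hoss explains in the reply.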






Re: Boolean Query

2006-01-12 Thread Klaus
Hi,

I have tried to study Lucene's scoring in the default similarity. Can
anyone explain to me how this similarity was designed? I have read a lot of IR
literature, but I have never seen an equation like the one used in Lucene.
Why is this better than the normal cosine measure?

Thanks,

Klaus
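For reference, the Similarity javadoc of the Lucene 2.x/3.x era describes the practical scoring function roughly as below and relates it to the cosine it is derived from. Treat this as a summary to check against the javadoc of the version in use (it ignores the query-level boost), not as an exact specification.

```latex
\cos(q,d) = \frac{\vec{V}(q) \cdot \vec{V}(d)}{\lVert \vec{V}(q) \rVert \, \lVert \vec{V}(d) \rVert}

\mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot \sum_{t \in q} \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^{2} \cdot \mathrm{boost}(t) \cdot \mathrm{norm}(t,d)

\mathrm{tf}(t \in d) = \sqrt{\mathrm{freq}(t,d)}, \qquad
\mathrm{idf}(t) = 1 + \ln \frac{\mathrm{numDocs}}{\mathrm{docFreq}(t) + 1}, \qquad
\mathrm{queryNorm}(q) = \frac{1}{\sqrt{\sum_{t \in q} \big(\mathrm{idf}(t) \cdot \mathrm{boost}(t)\big)^{2}}}
```

Read against the cosine: idf appears squared because it weights both the query vector and the document vector; queryNorm plays the role of the query length and scales every score equally, so it does not change the ranking; norm(t,d) carries the index-time boosts and the length norm 1/sqrt(numTerms), a cheap stand-in for the document length; and coord additionally rewards documents that match more of the query terms.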


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Chris Hostetter
Sent: Wednesday, January 11, 2006 20:55
To: java-user@lucene.apache.org
Subject: Re: Boolean Query

: BooleanQuery query = new BooleanQuery();
: for (Term t : terms)
: {
:   query.add(new TermQuery(t), false, false); // is this wrong?
: }
:
: If I construct the query as a string like "A a OR B b OR C", I get many
: more results. I assume that the BooleanQuery uses an AND operator. How can I
: change that?

The "false, false" you pass when you add the subclauses should give you the
"OR" behavior, but more than likely the problem you are running into has to do
with the analyzer being used by your QueryParser when it parses your
string; when you build the query up by hand, no analyzer is used, so if
the analyzer used at indexing time did any lowercasing or stemming you'll
miss a lot of matches.

A quick thing you should try is comparing the toString() output of each of the
queries you are comparing (the one QueryParser built, and the one you
built by hand).  You should also look at this wiki entry, and pick up a
copy of Lucene in Action and read chapter 4.
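The analysis mismatch described above can be illustrated with a toy analyzer: if index-time analysis lowercases and stems, a hand-built term that skips analysis may match nothing, while the same text run through the analyzer (as QueryParser does) matches. The analyzer here is deliberately simplistic (whitespace split, lowercase, strip one trailing "s") and hypothetical; real Lucene analyzers are far more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Toy demonstration of why un-analyzed, hand-built terms can miss analyzed index terms. */
final class AnalysisMismatch {
    private AnalysisMismatch() {}

    /** Hypothetical analyzer: whitespace split, lowercase, strip one trailing "s" from longer tokens. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            String t = raw.toLowerCase(Locale.ROOT);
            if (t.length() > 3 && t.endsWith("s")) {
                t = t.substring(0, t.length() - 1);
            }
            tokens.add(t);
        }
        return tokens;
    }

    /** Terms stored in the index for a document, produced by the analyzer at index time. */
    static Set<String> indexTerms(String documentText) {
        return new TreeSet<>(analyze(documentText));
    }

    /** A hand-built term query: the raw term must equal an indexed term exactly. */
    static boolean rawTermMatches(Set<String> indexed, String rawTerm) {
        return indexed.contains(rawTerm);
    }

    /** The query text is analyzed first, the way QueryParser does it. */
    static boolean analyzedTermMatches(Set<String> indexed, String queryText) {
        for (String t : analyze(queryText)) {
            if (indexed.contains(t)) {
                return true;
            }
        }
        return false;
    }
}
```

Running hand-built terms through the same analyzer before creating each TermQuery avoids the mismatch.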

: And I'm wondering what happens if I boost a TermQuery with a value smaller
: than one. I'm asking because I would like to boost each TermQuery with the
: tf*idf value of the term in the original document. From my point of view,
: this should lead to better precision, but at first glance the results
: are worse.

Before you try this, make sure you understand the existing score
calculation ... look at the explain info for each document against your
query and see what it's already doing.


-Hoss





Re: Use the lucene for searching in the Semantic Web.

2006-01-17 Thread Klaus
Hi Jiang,

I'm currently facing a similar problem. So far I have to use a graph-matching
algorithm for the semantic query, but the full-text search in the
Semantic Web is performed by Lucene.
First, I wrote the whole text into one index. Each document contains one
field for the unique ID and one for the whole text. For the semantic markup
I use a separate index. Every RDF triple results in a document with the
following fields: id, and predicate + subject + object. Every query is
executed on both indexes. I use a separate index for the RDF data because
this results in a higher score for the documents. You might argue that this
would distort the results, but from my point of view explicit metadata
should be scored higher than terms in the document body.

Cheers,

Klaus
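The message does not say how the hits from the full-text index and the RDF index are combined. One simple option, sketched here purely as an assumption, is to merge both hit lists by document id and add the scores, optionally multiplying the metadata scores by a weight so that documents matched by explicit metadata move up. The class name and the metadataWeight parameter are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical merge of hits from two indexes (id -> score) into one ranking. */
final class TwoIndexMerge {
    private TwoIndexMerge() {}

    /** Sums per-id scores; RDF scores are multiplied by metadataWeight. Returns ids, best first. */
    static List<String> merge(Map<String, Double> fullTextHits, Map<String, Double> rdfHits, double metadataWeight) {
        Map<String, Double> combined = new HashMap<>(fullTextHits);
        rdfHits.forEach((id, score) -> combined.merge(id, score * metadataWeight, Double::sum));
        return combined.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Double>comparingByKey())) // ties broken by id
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Note that raw scores from two different indexes are not strictly comparable, so the weight would have to be tuned empirically.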

-Original Message-
From: jason [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 17, 2006 15:35
To: java-user@lucene.apache.org
Subject: Use the lucene for searching in the Semantic Web.

Hi friends,

What do you think about using Lucene for searching in the Semantic Web? I am
trying to use Lucene for searching documents with ontological annotations,
but I have not found a good model for combining the keyword information and
the ontological information.

regards
jiang xing





Re: Use the lucene for searching in the Semantic Web.

2006-01-19 Thread Klaus
Hi,

>Actually, my problem is that, for instance, for a document d, its feature
>vector may consist of keywords and concepts.

What exactly do you mean by feature vector? You are referring to the
predicate-object pairs connected to one subject node, aren't you?
 
>I don't know how to weight the two
>items. Right now, I use a stupid method: given a document d, I can obtain
>a rank D based on the keyword method. It is also annotated with a concept c
>(the simplest example). People can have a rank C of these concepts in
>the domain ontology, where the most relevant concepts are at the top
>of this concept list. Finally, the document's rank is decided by the sum
>C + D.

I'm going to implement something like a PageRank algorithm for my search
engine. In contrast to the Google approach, I don't have to just count the
edges of a node: because the semantics are known, I can weight them. Of
course this requires knowledge of the domain ontology. For instance, if there
is a predicate "cited_in_document", I could rank a document higher if it is
often cited. But I'm not sure about the results...

Klaus
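A PageRank-style computation in which each edge is weighted by its predicate, as described above, might look like the following sketch. The predicate weights (for example for "cited_in_document") are hypothetical inputs that would come from the ontology metadata; the damping factor is a parameter (0.85 is the conventional PageRank choice), and nodes without weighted outgoing edges spread their rank uniformly over all nodes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical weighted PageRank over a graph of RDF-style labelled edges. */
final class WeightedPageRank {
    private WeightedPageRank() {}

    /** A directed edge labelled with its predicate, e.g. (paperA, paperB, "cited_in_document"). */
    static final class Edge {
        final int from;
        final int to;
        final String predicate;

        Edge(int from, int to, String predicate) {
            this.from = from;
            this.to = to;
            this.predicate = predicate;
        }
    }

    /** Power iteration; an edge's share of its source's rank is proportional to its predicate weight. */
    static double[] rank(int nodeCount, List<Edge> edges, Map<String, Double> predicateWeights,
                         double damping, int iterations) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        double[] outWeight = new double[nodeCount];
        for (Edge e : edges) {
            outWeight[e.from] += predicateWeights.getOrDefault(e.predicate, 1.0); // unknown predicates weigh 1
        }
        double[] rank = new double[nodeCount];
        Arrays.fill(rank, 1.0 / nodeCount);
        for (int it = 0; it < iterations; it++) {
            double dangling = 0.0;
            for (int i = 0; i < nodeCount; i++) {
                if (outWeight[i] == 0.0) {
                    dangling += rank[i]; // rank of nodes with no weighted out-edges
                }
            }
            double[] next = new double[nodeCount];
            Arrays.fill(next, (1.0 - damping) / nodeCount + damping * dangling / nodeCount);
            for (Edge e : edges) {
                if (outWeight[e.from] > 0.0) {
                    double w = predicateWeights.getOrDefault(e.predicate, 1.0);
                    next[e.to] += damping * rank[e.from] * w / outWeight[e.from];
                }
            }
            rank = next;
        }
        return rank;
    }
}
```

The ranks always sum to 1, so the result can be combined with a keyword score after normalising both.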





Analyzer

2006-01-19 Thread Klaus
Hi,

Is there a way to get the unstemmed term out of the Lucene index, or do I
have to change the analyzer to store both the original term and the stemmed one?

Thanks,

Klaus
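If changing the analysis is acceptable, one option is to record which surface forms produced each stem while the documents are analyzed, so that stems can be mapped back to readable words. This plain-Java sketch uses a toy stemmer (lowercase, strip one trailing "s") as a hypothetical stand-in for the real one. Another common option is to index the text twice, once stemmed and once unstemmed, in two separate fields.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical side dictionary mapping each stem to the original words that produced it. */
final class StemDictionary {
    private final Map<String, Set<String>> originalsByStem = new TreeMap<>();

    /** Toy stemmer standing in for the analyzer's real one. */
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        return (w.length() > 3 && w.endsWith("s")) ? w.substring(0, w.length() - 1) : w;
    }

    /** Analyzes text and records every (lowercased) original word under its stem. */
    void add(String text) {
        for (String token : text.split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            String original = token.toLowerCase(Locale.ROOT);
            originalsByStem.computeIfAbsent(stem(original), k -> new TreeSet<>()).add(original);
        }
    }

    /** Unstemmed forms seen for a stem, or an empty set if the stem is unknown. */
    Set<String> originals(String stem) {
        return originalsByStem.getOrDefault(stem, Set.of());
    }
}
```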





Re: Use the lucene for searching in the Semantic Web.

2006-01-20 Thread Klaus
>The feature vector may be bigger than the object-predicate pairs. In my
>application, each document may be annotated with several concepts to say
>this document contains an instance of a class.
How do you do that? I have to re-engineer the ontology in my application, but
I'm not sure how to express that a document belongs to one or more concepts.
Would you mind sending me your ontology?


>I am very interested in your approach. You can see the PageRank-like
>method used in SWOOGLE. But they used only some simple
>relations, such as "import" (used in OWL files). If we can use
>semantic-level relations, it should be better. But I am not sure it can
>succeed, as it requires knowing how to weight the relations.

Yes. I will have to provide some meta information about the ontology. You
can store this information as an OWL annotation or in an extra file. I will
start to implement this over the weekend. I think it will be hard to find
the right weights for the predicates; I will keep you informed.

Cheers,

Klaus





Re: Document similarity

2006-01-20 Thread Klaus

>In my case, I need to filter similar documents in search results and
>therefore determine document similarity during the indexing process using
>term vectors. Obviously, I can't compare the document currently being
>indexed with all documents in my collection.

Yes, you can. Right after indexing the new document, fetch its term vector
from the index. Compute some kind of weight for each term,
and construct a BooleanQuery from all terms. You can use the term weights to
boost the individual TermQuery clauses. The hits will be scored, and this score
is a measure of the similarity between the documents.

peace 
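The procedure described above (take the new document's term vector, weight each term, build an OR query boosted by those weights, and read the hit scores as similarity) can be simulated in plain Java. In this sketch the term weights are tf * idf with the classic idf formula, only the top k terms are kept, and each candidate is scored by summing the boosts of the query terms it contains; the class name and the simplified scoring are assumptions, not Lucene's actual scoring.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical "more like this" simulation over term-frequency vectors. */
final class SimilarDocuments {
    private SimilarDocuments() {}

    /** Picks the k terms with the highest tf * idf; the weights become the clause boosts. */
    static Map<String, Double> topTerms(Map<String, Integer> termFreqs, Map<String, Integer> docFreqs,
                                        int numDocs, int k) {
        List<Map.Entry<String, Double>> weighted = new ArrayList<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            double idf = Math.log(numDocs / (double) (df + 1)) + 1.0;
            weighted.add(Map.entry(e.getKey(), e.getValue() * idf));
        }
        weighted.sort(Map.Entry.<String, Double>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Double>comparingByKey()));
        Map<String, Double> boosts = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : weighted.subList(0, Math.min(k, weighted.size()))) {
            boosts.put(e.getKey(), e.getValue());
        }
        return boosts;
    }

    /** OR semantics: a candidate scores the summed boosts of the query terms it contains. */
    static Map<String, Double> score(Map<String, Double> boostedQuery, Map<String, Set<String>> candidates) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Set<String>> c : candidates.entrySet()) {
            double s = 0.0;
            for (Map.Entry<String, Double> q : boostedQuery.entrySet()) {
                if (c.getValue().contains(q.getKey())) {
                    s += q.getValue();
                }
            }
            if (s > 0.0) {
                scores.put(c.getKey(), s); // documents matching no query term are not hits
            }
        }
        return scores;
    }
}
```

In a real index, the newly added document will be its own best match, so its id should be excluded from the hits.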





Re: Related searches

2006-01-31 Thread Klaus
Hi Leon,

have you tried the WordNet add-on? You can easily expand the query with
synonyms.
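Query expansion with synonyms, whether from WordNet or from a hand-built domain dictionary such as "automobile = car", amounts to OR-ing each query word with its synonyms. A minimal plain-Java sketch, with a hypothetical dictionary and output in QueryParser-like syntax (requiring every word group to match is a design choice of this sketch):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical synonym-based query expansion. */
final class SynonymExpansion {
    private SynonymExpansion() {}

    /** For each query word, returns the word plus its synonyms; each group is meant to be OR-ed. */
    static List<Set<String>> expand(String query, Map<String, List<String>> synonyms) {
        List<Set<String>> groups = new ArrayList<>();
        for (String word : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            Set<String> group = new LinkedHashSet<>(); // keeps the original word first
            group.add(word);
            group.addAll(synonyms.getOrDefault(word, List.of()));
            groups.add(group);
        }
        return groups;
    }

    /** Renders the groups as a query string, requiring every group to match. */
    static String toQueryString(List<Set<String>> groups) {
        List<String> parts = new ArrayList<>();
        for (Set<String> g : groups) {
            parts.add(g.size() == 1 ? g.iterator().next() : "(" + String.join(" OR ", g) + ")");
        }
        return String.join(" AND ", parts);
    }
}
```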

-Original Message-
From: xing jiang [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 31, 2006 19:03
To: java-user@lucene.apache.org
Subject: Re: Related searches

I think you should build a kind of domain-specific dictionary first, stating,
for instance, "automobile = car". This approach can satisfy your
requirement.

On 1/30/06, Leon Chaddock <[EMAIL PROTECTED]> wrote:
>
> Hi,
> Does anyone know if it is possible to show related searches with Lucene?
> For example, if someone searched for "car insurance", you could bring back
> the results and related searches like these:
>
>
> Automobile Insurance
> Car Insurance Quote
> Car Insurance Quotes
> Auto Insurance
> Cheap Car Insurance
> Car Insurance Company
> Car Insurance Companies
> Health Insurance
> Car Insurance Rates
> Car Insurance Rate
> Car Insurance Rental
> Insurance Quote
> Online Car Insurance Quote
> Home Insurance
>
> Thanks
>
> Leon
>



--
Regards

Jiang Xing





Re: two problems of using the lucene.

2006-02-05 Thread Klaus
Hi, 

you have to write your own Similarity class and set it on both your
IndexWriter and your Searcher (via setSimilarity).

http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html

Cheers,

Klaus
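To score with plain term frequency instead of tf-idf, a custom Similarity would essentially override its tf and idf methods so that tf returns the raw frequency and idf returns a constant. The plain-Java sketch below mirrors that idea outside Lucene so the effect on a score can be compared; the interface and classes are hypothetical simplifications, not Lucene's Similarity API, and they ignore norms and coord.

```java
import java.util.Map;

/** Hypothetical, simplified stand-in for a similarity in which only tf and idf are pluggable. */
interface TermScoring {
    double tf(int freq);

    double idf(int docFreq, int numDocs);

    /** Sums tf * idf over the query terms found in the document. */
    default double score(Iterable<String> queryTerms, Map<String, Integer> docTermFreqs,
                         Map<String, Integer> docFreqs, int numDocs) {
        double total = 0.0;
        for (String t : queryTerms) {
            int freq = docTermFreqs.getOrDefault(t, 0);
            if (freq > 0) {
                total += tf(freq) * idf(docFreqs.getOrDefault(t, 1), numDocs);
            }
        }
        return total;
    }
}

/** Classic-style weighting: sqrt(tf) and log-scaled idf. */
final class TfIdfScoring implements TermScoring {
    public double tf(int freq) {
        return Math.sqrt(freq);
    }

    public double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }
}

/** Raw term frequency only: idf is constant, so rare and common terms count the same. */
final class RawTfScoring implements TermScoring {
    public double tf(int freq) {
        return freq;
    }

    public double idf(int docFreq, int numDocs) {
        return 1.0;
    }
}
```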
-Original Message-
From: xing jiang [mailto:[EMAIL PROTECTED]
Sent: Sunday, February 5, 2006 04:27
To: java-user@lucene.apache.org
Subject: two problems of using the lucene.

Hi,

I have two problems using Lucene and may need your help.

1. For each word, how does Lucene calculate its weight? I only know that each
word in the document is weighted by its tf/idf values.

2. Can I modify Lucene so that I use the term frequency instead of the
tf/idf value to calculate the similarity between documents and queries?

--
Regards

Jiang Xing





Re: Reindexing

2006-02-08 Thread Klaus
Hi,

You have to index all objects already contained in the database? Then there
is no other way than fetching all objects from the database and indexing them.
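Fetching every existing object and indexing it is usually done page by page, so the whole table never has to sit in memory, with one commit per page rather than per object. The sketch below is plain Java; DataSource and Indexer are hypothetical interfaces standing in for the Hibernate query and the Lucene IndexWriter.

```java
import java.util.List;

/** Hypothetical bulk (re)indexer for objects that already exist in the database. */
final class BulkReindexer {
    /** Source of persisted objects, read page by page (e.g. an offset/limit query). */
    interface DataSource<T> {
        List<T> fetchPage(int offset, int limit);
    }

    /** Index sink; with Lucene this would wrap a single IndexWriter. */
    interface Indexer<T> {
        void add(T object);

        void commit();
    }

    private BulkReindexer() {}

    /** Indexes every object from the source, committing once per page; returns the number indexed. */
    static <T> int reindexAll(DataSource<T> source, Indexer<T> indexer, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int offset = 0;
        int total = 0;
        while (true) {
            List<T> page = source.fetchPage(offset, pageSize);
            if (page.isEmpty()) {
                break; // no more rows
            }
            for (T object : page) {
                indexer.add(object);
                total++;
            }
            indexer.commit(); // one commit per page, not per object
            offset += page.size();
        }
        return total;
    }
}
```

After this one-off run, new or changed objects can be indexed individually from the persistence lifecycle hooks.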


On Feb 8, 2006, at 1:18 AM, Raul Raja Martinez wrote:

> Hi Eric, I'm in the same situation, I wouldn't normally ask  
> something related to hibernate here but I posted something similar  
> in the hibernate forums on Jan 16th but still haven't got any  
> response.
>
> http://forum.hibernate.org/viewtopic.php?t=954137&highlight=lucene
>
> It is really obvious that if they offer lucene indexing out of the  
> box with the hibernate release, people would have to index all  
> their persistent objects that were already in the database before.
>
> Any hint is highly appreciated.
>
> Erik Hatcher wrote:
>> You may likely get better response by posting in the Hibernate list.
>> Erik
>> On Feb 7, 2006, at 7:58 AM, revati joshi wrote:
>>> Hello lucene members,
>>>  I'm a silent member of
>>> this group. Last week I sent a query regarding
>>> reindexing, but I didn't receive any reply from anyone. I'm still
>>> stuck with the same problem of reindexing.
>>>   I have completed the reindexing code using the Hibernate
>>> Lifecycle class, but I don't know where and when to call this
>>> class for reindexing during the update or creation of
>>> any file in your system.
>>>   I just want to know the precise procedure or method for this.
>>>   So please suggest some solution to this as early as possible.
>>>   Thanks for your cooperation.
>>>   Bye for now.
>>>



Re: Suggesting refine searches with Lucene

2006-02-13 Thread Klaus
A simple approach is to count the most common words in the result set and
present them in combination with the original query. If you have any meta
information, you could use it to refine the query.
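Counting the most common words in the result set and combining them with the original query can be sketched in plain Java like this. The tokenization, the stopword set, and the class name are hypothetical; in Lucene the words could come from stored fields or term vectors of the top hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical refinement suggester based on word counts in the current hits. */
final class RefinementSuggester {
    private RefinementSuggester() {}

    /** Returns up to k refined queries: the original query plus one frequent word from the hits. */
    static List<String> suggest(String query, List<String> hitTexts, Set<String> stopwords, int k) {
        Set<String> queryWords = new HashSet<>(Arrays.asList(query.toLowerCase(Locale.ROOT).trim().split("\\s+")));
        Map<String, Integer> counts = new HashMap<>();
        for (String text : hitTexts) {
            Set<String> seen = new HashSet<>(); // count each word once per hit
            for (String w : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!w.isEmpty() && !queryWords.contains(w) && !stopwords.contains(w) && seen.add(w)) {
                    counts.merge(w, 1, Integer::sum);
                }
            }
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        List<String> suggestions = new ArrayList<>();
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            suggestions.add(query.trim() + " " + ranked.get(i).getKey());
        }
        return suggestions;
    }
}
```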

-Original Message-
From: Chun Wei Ho [mailto:[EMAIL PROTECTED]
Sent: Monday, February 13, 2006 10:35
To: java-user@lucene.apache.org
Subject: Suggesting refine searches with Lucene

Hi,

I am trying to suggest refined searches for my Lucene search. For
example, if a search returned too many results, it would list a
number of document title subsequences that occurred frequently in the
results of the previous search, as possible candidates for refining
the search.

Does anyone know the right/any approach to implementing this in a
Lucene-based search app?

Thanks.

CW




Re: Suggesting refine searches with Lucene

2006-02-13 Thread Klaus
>And next time if it is a refined search I will merge current query with  

How do you recognize a refined query? And how are the queries refined?

Cheers,

klaus





Lucene in multithreaded enviroment

2006-02-20 Thread Klaus

Hi,
I'm using Lucene in a web application. Every time a new object is added to
the system, the index is updated. Could there be any problems if two
objects are created at the same moment? I know Lucene has some locking
mechanism.

Thx

klaus
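Only one IndexWriter at a time can hold an index's write lock, and IndexWriter is documented as thread-safe (at least from Lucene 2.x on), so the usual pattern in a web application is a single writer created at startup and shared by all request threads, rather than one writer per request. The sketch below shows only that sharing pattern in plain Java; SharedIndex is a hypothetical wrapper whose internal list stands in for the real IndexWriter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical holder: create ONE instance at startup and share it with all request threads. */
final class SharedIndex {
    // Stand-in for the single IndexWriter; a synchronized list keeps this toy version thread-safe.
    private final List<String> added = Collections.synchronizedList(new ArrayList<>());

    /** Called concurrently by request threads; with Lucene this would delegate to the shared writer. */
    void add(String objectId) {
        added.add(objectId);
    }

    int size() {
        return added.size();
    }

    /** Simulates many objects being created at the same moment; returns how many were recorded. */
    static int simulate(int threads, int perThread) {
        SharedIndex index = new SharedIndex();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int thread = t;
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    index.add(thread + ":" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("indexing threads did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return index.size();
    }
}
```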

-Original Message-
From: Amany Moussa [mailto:[EMAIL PROTECTED]
Sent: Monday, February 20, 2006 21:22
To: java-user@lucene.apache.org
Subject: Re: Lucene CPU Utilization

Thank you so much for your reply.

I know that you answered this question before. I just
wanted to post the question to receive more feedback
and share the information.

Thanks again.

Amany M.

--- Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> I think I answered that question just the other
> day privately...
> No, there is nothing in Lucene to help you with CPU
> utilization.
> However, if you are running this on a UNIX box of
> some kind, you can (re)nice the process and thus
> lower its priority, giving other processes more time
> with the CPU.  Windows may have something similar.
> 
> Otis
> 
> - Original Message 
> From: Amany Moussa <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Monday, February 20, 2006 9:50:57 AM
> Subject: Lucene CPU Utilization
> 
> 
> Hello,
> 
> I am building a Lucene index with over a million
> documents retrieved from a database. I am running the
> application on Unix, and I am getting 100% CPU
> utilization the moment the application starts.
> The application creates a list of small indices in a
> temp directory then merge them all in the main index
> file.   
>  
> Is there any way I can tune the indexing process and
> reduce the CPU utilization?
> Thanks much.  
> 
> Amany M.
> 





Re: RE: Stemming and Wildcard - or fire and water

2013-01-04 Thread Klaus Nesbigall
I've encountered the same problem and tried to use your workaround, but
overriding the parser hasn't done the job.

I do not understand why the stemming is done anyway.
Uwe wrote:
> This is a well-known problem: Wildcards cannot be analyzed by the query 
> parser, because the analysis would destroy the wildcard characters; 
> also stemming of parts of terms will never work. 
> ...

The current behavior doesn't work either.
The English word families will not be found if the user types the query
familie*.
So why solve the problem by declaring one opinion right and the other
wrong?
A simple flag which enables or suppresses the stemming would solve everyone's
problem. All who need no change can keep the old behavior; everyone else can
set the appropriate flag.
If this problem is so well known, there clearly is a need for a clean
solution to it.


> A possible workaround could be to modify search terms with wildcard 
> tokens by stemming them manually and creating a new search string.
> Searches for hersen* would be modified to hers* and return what you 
> expect.
> The downside, of course, is that you search for more than you specified.
> 
> Lars-Erik
> 
> -Original Message-
> From: Bayer Dennis [mailto:dennis.ba...@cursor.de]
> Sent: Tuesday, December 11, 2012 10:50 AM
> To: java-user@lucene.apache.org
> Subject: Stemming and Wildcard - or fire and water
> 
> Hello there,
> my colleague and I ran into an example which didn't return the result
> size we were expecting. We discovered that there is a mismatch
> in handling terms while indexing and searching. This issue has already
> been discussed several times on the internet, as we found out later on,
> but from our point of view it's buggy behavior, at least when using a
> German stemmer.
> 
> TL;DR: a JUnit test case is available (http://pastebin.com/AdeFdW1k)
> 
> Setup:
> * Lucene 4.0.0
> * Use the GermanAnalyzer which internally uses a GermanStemmer
> 
> Issue:
> * Create an index for "Hersener", which has a common ending in German
> -> the string is shortened to "hers"
> * Search for "Hers" -> a result is found
> * Search for "Hersen" -> a result is found because the input token is 
> also stemmed to "hers"
> * Search for "Hers*" -> a result is found
> * Search for "Hersen*" -> nothing is found because the analyzer does 
> not run
> 
> Similar examples can easily be constructed if umlauts are involved.
> 
> Conclusion:
> The search query which contains a wildcard should also be run through
> the analyzer, because there are a lot of queries which would return
> nothing. The Lucene FAQ already has a topic related to this issue:
> http://wiki.apache.org/lucene-java/LuceneFAQ#Are_Wildcard.2C_Prefix.2C_and_Fuzzy_queries_case_sensitive.3F
> 
> The example with "dog" and "dogs" works as long as only one character
> is stemmed, which may hold for the majority of English words. But if
> more characters are involved, Lucene does not return anything instead
> of returning a few additional items. Just consider "families", which is
> stemmed to "famili".
> Searching for "familie*" wouldn't return any item.
> 
> To find an ending for this initial post ;) :
> Could this behavior be made configurable in the standard? If not:
> a) Why are the stemmers used by default if they can lead to wrong results?
> b) What can be done manually to stem queries containing wildcards, e.g.
> overriding some parser?
> 
> Best regards
> Dennis
> 
> 
> 
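The workaround Lars-Erik describes (stem the part before the wildcard, then search for the stemmed prefix) can be sketched as a query-term rewrite. The stemmer below is a toy suffix stripper chosen so that the "Hersener" / "hersen*" example from the thread works; it is hypothetical and not Lucene's GermanStemmer, which is what a real setup would call so that query and index stems agree. As noted in the thread, the price is that the rewritten prefix matches more than the user typed.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical rewrite of trailing-wildcard terms so the prefix is stemmed like indexed terms. */
final class WildcardStemRewrite {
    private WildcardStemRewrite() {}

    // Toy German-like suffixes, longest first; a real setup would reuse the analyzer's stemmer.
    private static final List<String> SUFFIXES = List.of("ern", "er", "en", "em", "es", "e", "s", "n");
    private static final int MIN_STEM = 4;

    /** Repeatedly strips a known suffix while at least MIN_STEM characters remain. */
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String suffix : SUFFIXES) {
                if (w.endsWith(suffix) && w.length() - suffix.length() >= MIN_STEM) {
                    w = w.substring(0, w.length() - suffix.length());
                    changed = true;
                    break;
                }
            }
        }
        return w;
    }

    /** "Hersen*" becomes "hers*"; terms without a trailing wildcard are simply stemmed. */
    static String rewrite(String term) {
        if (term.endsWith("*")) {
            return stem(term.substring(0, term.length() - 1)) + "*";
        }
        return stem(term);
    }
}
```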



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



spatial searches

2010-05-11 Thread Klaus Malorny


Hi all,

I hope someone can enlighten me. I am trying to figure out how spatial searches 
are to be implemented with Lucene. From walking through mailing lists and 
various web pages, looking at the JavaDoc and source code, I understand how the 
tiers work and how the search is limited by a special term query containing the 
ID(s) of the relevant grid cells.


However, it still puzzles me how, where and when the final distance filtering 
takes place. I see three possibilities: the "Filter" class, the 
"ValueSourceQuery" or the use of a subclass of "Collector". With my limited 
understanding of the inner workings of Lucene, it seems to me that the first two 
ways more or less operate on the whole document set, i.e. prior to the moment 
where the term query for the tiers comes into effect, rendering it useless. The 
"Collector" approach seems to be much more appropriate, but in addition to 
deciding whether the document meets the distance condition or not, I would like 
to have different scores depending on the distance (lower score for larger 
distances). Originally I thought that the solution would be some kind of 
subclass of "Query", but I haven't seen any hints pointing in this direction and I 
don't know whether I am able to implement that on my own. I fear that I 
completely misunderstand something. Thanks in advance for any hints.


Regards,

Klaus
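The final distance step asked about here (drop documents beyond the radius and give closer ones a higher score) can be written independently of where it is plugged in (Filter, Collector, or a custom score). This plain-Java sketch uses the haversine great-circle distance with a mean Earth radius of 6371 km and a linear decay; the decay function and the class name are assumptions, not code from Lucene's spatial contrib.

```java
/** Hypothetical distance filter and distance-decay score for spatial hits. */
final class DistanceScoring {
    private static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    private DistanceScoring() {}

    /** Great-circle distance in km between two points in decimal degrees (haversine formula). */
    static double haversineKm(double lat1, double lng1, double lat2, double lng2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** 0 outside the radius (the document is filtered out), else falling linearly from 1 at the centre. */
    static double distanceScore(double centerLat, double centerLng, double lat, double lng, double radiusKm) {
        double d = haversineKm(centerLat, centerLng, lat, lng);
        if (d > radiusKm) {
            return 0.0;
        }
        return 1.0 - d / radiusKm;
    }
}
```

Applied only to the documents that survive the tier/grid term query, this avoids computing distances for the whole document set.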




Re: spatial searches

2010-06-02 Thread Klaus Malorny

On 22/05/10 08:45, Julian Atkinson wrote:

Hi Klaus,

I suggest you take a look at the code in TestCartesian.java for
working examples of the search and as a starting point to trace
through.

In more depth: if you look at DistanceQueryBuilder.java, you'll see that two
filters are being set up.

The first pass filter is created by CartesianPolyFilterBuilder and
this makes sure you only consider documents near to the area you are
searching by looking in the right tier and pulling out the relevant
grid cells.

The second filter depends on which method you are using, Lat/Lng
or Geohash; this is where the more precise filtering is done based on
the calculated distance.

The use of the second pass filter is optional and driven by a boolean.

If you want custom scoring, there is an example in
TestCartesian.java using CustomScoreQuery.

Hope this helps,
Julian
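The two-pass scheme described above (a cheap first pass that keeps only documents in the relevant grid cells, then an optional precise distance check) can be illustrated with a deliberately simplified grid. The fixed cell size in degrees and the cell-id encoding are hypothetical; Lucene's CartesianTierPlotter uses its own tiered projection and ids, and in the index the first pass is a term lookup on the cell-id field rather than a loop. The precise check is passed in as a predicate, and the boolean mirrors the optional second pass.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical two-pass spatial filter over a fixed lat/lng grid. */
final class TwoPassGridFilter {
    private TwoPassGridFilter() {}

    /** A document with an id and a location in decimal degrees. */
    static final class Doc {
        final String id;
        final double lat;
        final double lng;

        Doc(String id, double lat, double lng) {
            this.id = id;
            this.lat = lat;
            this.lng = lng;
        }
    }

    /** Cell id in a grid of cellDeg x cellDeg degree cells (hypothetical encoding). */
    static long cellId(double lat, double lng, double cellDeg) {
        long cols = (long) Math.ceil(360.0 / cellDeg);
        long row = (long) Math.floor((lat + 90.0) / cellDeg);
        long col = (long) Math.floor((lng + 180.0) / cellDeg);
        return row * cols + col;
    }

    /** All cells overlapping a bounding box (no date-line wrap-around, for simplicity). */
    static Set<Long> cellsForBox(double minLat, double maxLat, double minLng, double maxLng, double cellDeg) {
        long cols = (long) Math.ceil(360.0 / cellDeg);
        long minRow = (long) Math.floor((minLat + 90.0) / cellDeg);
        long maxRow = (long) Math.floor((maxLat + 90.0) / cellDeg);
        long minCol = (long) Math.floor((minLng + 180.0) / cellDeg);
        long maxCol = (long) Math.floor((maxLng + 180.0) / cellDeg);
        Set<Long> cells = new HashSet<>();
        for (long r = minRow; r <= maxRow; r++) {
            for (long c = minCol; c <= maxCol; c++) {
                cells.add(r * cols + c);
            }
        }
        return cells;
    }

    /** First pass: cell membership. Optional second pass: the precise distance predicate. */
    static List<String> search(List<Doc> docs, Set<Long> searchCells, double cellDeg,
                               boolean preciseFilter, Predicate<Doc> withinDistance) {
        List<String> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (!searchCells.contains(cellId(d.lat, d.lng, cellDeg))) {
                continue; // wrong grid cell: rejected without computing a distance
            }
            if (preciseFilter && !withinDistance.test(d)) {
                continue; // inside a candidate cell but outside the precise radius
            }
            hits.add(d.id);
        }
        return hits;
    }
}
```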


Hi Julian,

Sorry for not thanking you earlier; unfortunately, I had a family tragedy.

I missed that CustomScoreQuery can be used without ValueSourceQuery instances. 
So I will try to use a term query as a subquery to preselect the documents in 
the geographic vicinity and then calculate the exact distances using my 
own implementation of CustomScoreProvider.


Greetings,

Klaus







Slow Index Writes

2014-01-03 Thread Klaus Schaefers
Hi,

I am trying to use Lucene as a kind of key-value store, but I have run into
some bad performance issues. When I add my data as documents to the
index, I get an average write rate of 3 documents/second! This seems
ridiculously slow to me, and I guess I must have an error somewhere. Please
have a look at my code:



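import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.NIOFSDirectory;
import org.apache.lucene.util.Version;

Directory dir = new NIOFSDirectory(file);
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_45);
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_45,
        analyzer);
IndexWriter writer = new IndexWriter(dir, config);

int eventCount = 1000;
for (int i = 0; i < eventCount; i++) {
    Document doc = new Document();
    doc.add(new StringField("id", i + "id", Store.YES));
    doc.add(new StoredField("b", buildVector())); // buildVector() is the poster's own helper
    writer.addDocument(doc);
    writer.commit(); // commits after every single document
}
writer.close(); // close the writer before its Directory
dir.close();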


Not calling commit() seems to fix the issue, but I guess this
would then cause issues if I want to read values in the meantime. My
normal use case would be to read something from the index, maybe alter it,
and then write it back. So roughly 50% of the operations would be reads.

I also tried an embedded version of Elasticsearch, and it manages
2000 documents per second. As it's based on Lucene as well, I guess I am doing
something wrong in my code.


THX for the help,

Klaus


-- 

-- 

Klaus Schaefers
Senior Optimization Manager

Ligatus GmbH
Hohenstaufenring 30-32
D-50674 Köln

Tel.:  +49 (0) 221 / 56939 -784
Fax:  +49 (0) 221 / 56 939 - 599
E-Mail: klaus.schaef...@ligatus.com
Web: www.ligatus.de

HRB Köln 56003
Managing Directors:
Dipl.-Kaufmann Lars Hasselbach, Dipl.-Kaufmann Klaus Ludemann,
Dipl.-Wirtschaftsingenieur Arne Wolter
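Each commit() makes Lucene flush and sync the index to stable storage, which explains rates of a few documents per second when committing after every document; the replies below suggest committing once per batch, or using near-real-time readers for reads in between. The batching logic can be sketched in plain Java; BatchingWriter and its Sink callback are hypothetical stand-ins for IndexWriter.addDocument and IndexWriter.commit, and the batch size is just a parameter (the reply mentions 1000 as an example).

```java
/** Hypothetical wrapper that buffers additions and commits once every batchSize documents. */
final class BatchingWriter<D> {
    /** Stand-ins for IndexWriter.addDocument(...) and IndexWriter.commit(). */
    interface Sink<T> {
        void add(T doc);

        void commit();
    }

    private final Sink<D> sink;
    private final int batchSize;
    private int uncommitted = 0;

    BatchingWriter(Sink<D> sink, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.sink = sink;
        this.batchSize = batchSize;
    }

    /** Adds a document and commits only when a full batch has accumulated. */
    void add(D doc) {
        sink.add(doc);
        if (++uncommitted >= batchSize) {
            flushCommit();
        }
    }

    /** Commits any pending documents, e.g. on shutdown or from a timer. */
    void flushCommit() {
        if (uncommitted > 0) {
            sink.commit();
            uncommitted = 0;
        }
    }
}
```

Reads between commits do not need a commit: a near-real-time reader opened from the writer (and refreshed with DirectoryReader.openIfChanged, as mentioned further down) sees documents that have been added but not yet committed.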


Re: Slow Index Writes

2014-01-07 Thread Klaus Schaefers
Hi,


I was looking for some examples, but I only found some using an NRTManager
class. In Lucene 4.5 I cannot find that class (missing a Maven dependency?).
Can anyone point me to a working example?

Cheers,

Klaus



On Fri, Jan 3, 2014 at 11:49 AM, Ian Lea  wrote:

> You will indeed get poor performance if you commit for every doc.  Can
> you compromise and commit every, say, 1000 docs, or once every few
> minutes, or whatever makes sense for your app.
>
> Or look at lucene's near-real-time search features.  Google "Lucene
> NRT" for info.
>
> Or use Elastic Search.
>
>
> --
> Ian.
>




Re: Slow Index Writes

2014-01-08 Thread Klaus Schaefers
THX!


On Wed, Jan 8, 2014 at 10:10 AM, Michael McCandless <luc...@mikemccandless.com> wrote:

> NRTManager was renamed to ControlledRealTimeReopenThread at some point.
>
> But likely simple NRT readers (as Ian described, using
> .openIfChanged()) will fit your usage.
>
> ControlledRealTimeReopenThread is only necessary if you require
> certain searches to be real-time, e.g. you just indexed a document and
> then want to run a search that you know reflects that document.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Jan 7, 2014 at 8:41 AM, Klaus Schaefers
>  wrote:
> > Hi,
> >
> >
> > I was looking for some examples but I just found some using an NRTManager
> > class? In Lucene 4.5 I cannot find the class (missing a maven dependency?).
> > Can anyone point me to a working example?
> >
> > Cheers,
> >
> > Klaus
> >
> >
> >
> > On Fri, Jan 3, 2014 at 11:49 AM, Ian Lea  wrote:
> >
> >> You will indeed get poor performance if you commit for every doc.  Can
> >> you compromise and commit every, say, 1000 docs, or once every few
> >> minutes, or whatever makes sense for your app.
> >>
> >> Or look at lucene's near-real-time search features.  Google "Lucene
> >> NRT" for info.
> >>
> >> Or use Elastic Search.
> >>
> >>
> >> --
> >> Ian.
> >>
> >>
> >> On Fri, Jan 3, 2014 at 10:21 AM, Klaus Schaefers
> >>  wrote:
> >> > Hi,
> >> >
> >> > I am trying to use a lucene as a kind of key value store, but I
> >> encountered
> >> > some bad performance issues. When I try to add my data as documents to
> >> the
> >> > index I get an average write rate of 3 documents / second!! This
> seems to
> >> > me ridiculously slow and I guess I must have somewhere an error.
> Please
> >> > have a look at my code:
> >> >
> >> >
> >> >
> >> > Directory dir = new NIOFSDirectory(file);
> >> > Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_45);
> >> > IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_45,
> >> >         analyzer);
> >> > IndexWriter writer = new IndexWriter(dir, config);
> >> >
> >> > int eventCount = 1000;
> >> > for (int i = 0; i < eventCount; i++) {
> >> >     Document doc = new Document();
> >> >     doc.add(new StringField("id", i + "id", Store.YES));
> >> >     doc.add(new StoredField("b", buildVector()));
> >> >     writer.addDocument(doc);
> >> >     writer.commit(); // one commit (flush + fsync) per document
> >> > }
> >> > writer.close(); // close the writer before the Directory it writes to
> >> > dir.close();
> >> >
> >> >
> >> > Not calling the commit function seems to fix the issue, but I guess
> >> > this would then cause problems if I want to read values in the
> >> > meantime. My normal use case would be to read something from the index,
> >> > maybe alter it, and then write it back. So roughly 50% of my operations
> >> > would be reads.
> >> >
> >> > I also tried an embedded version of Elasticsearch and it manages to
> >> > reach 2000 documents per second. As it is based on Lucene as well, I
> >> > guess I am doing something wrong in my code.
> >> >
> >> >
> >> > THX for the help,
> >> >
> >> > Klaus
> >> >
> >> >
> >> > --
> >> >
> >> > --
> >> >
> >> > Klaus Schaefers
> >> > Senior Optimization Manager
> >> >
> >> > Ligatus GmbH
> >> > Hohenstaufenring 30-32
> >> > D-50674 Köln
> >> >
> >> > Tel.:  +49 (0) 221 / 56939 -784
> >> > Fax:  +49 (0) 221 / 56 939 - 599
> >> > E-Mail: klaus.schaef...@ligatus.com
> >> > Web: www.ligatus.de
> >> >
> >> > HRB Köln 56003
> >> > Geschäftsführung:
> >> > Dipl.-Kaufmann Lars Hasselbach, Dipl.-Kaufmann Klaus Ludemann,
> >> > Dipl.-Wirtschaftsingenieur Arne Wolter
> >>
> >>
> >>
> >
> >
>
>
>


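The advice in this thread boils down to committing in batches rather than once per document. The sketch below is a plain-Java model of such a policy, not Lucene code: the `Committer` interface is a hypothetical stand-in for `IndexWriter.commit()`, and the batch size and time budget are assumptions to tune per application. For reads that must see documents before they are committed, Lucene 4.x offers near-real-time readers (`DirectoryReader.open(writer, true)`, `SearcherManager`, and the `ControlledRealTimeReopenThread` mentioned above, which takes over the role of the older NRTManager), so durability and visibility need not be tied to the same commit.

```java
/**
 * Plain-Java model of a batched-commit policy (not Lucene code).
 * Committer is a hypothetical stand-in for IndexWriter.commit().
 */
public class BatchedCommitDemo {

    /** Hypothetical stand-in for IndexWriter.commit(). */
    interface Committer {
        void commit();
    }

    /** Commits after maxDocs additions or maxMillis elapsed, whichever comes first. */
    static final class BatchingPolicy {
        private final int maxDocs;
        private final long maxMillis;
        private int pending = 0;
        private long lastCommitMillis;

        BatchingPolicy(int maxDocs, long maxMillis, long nowMillis) {
            this.maxDocs = maxDocs;
            this.maxMillis = maxMillis;
            this.lastCommitMillis = nowMillis;
        }

        /** Call after each addDocument(); returns true if a commit was issued. */
        boolean onDocumentAdded(long nowMillis, Committer committer) {
            pending++;
            if (pending >= maxDocs || nowMillis - lastCommitMillis >= maxMillis) {
                committer.commit();
                pending = 0;
                lastCommitMillis = nowMillis;
                return true;
            }
            return false;
        }

        /** Commits any leftover documents, e.g. just before closing the writer. */
        void finish(Committer committer) {
            if (pending > 0) {
                committer.commit();
                pending = 0;
            }
        }
    }

    /** Counts the commits triggered by `docs` additions with a frozen clock. */
    static int countCommits(int docs, int batchSize) {
        int[] commits = {0};
        Committer counter = () -> commits[0]++;
        BatchingPolicy policy = new BatchingPolicy(batchSize, Long.MAX_VALUE, 0L);
        for (int i = 0; i < docs; i++) {
            policy.onDocumentAdded(0L, counter);
        }
        policy.finish(counter);
        return commits[0];
    }

    public static void main(String[] args) {
        System.out.println("per-document: " + countCommits(1000, 1));   // 1000 commits
        System.out.println("batched:      " + countCommits(2500, 1000)); // 3 commits
    }
}
```

The time budget matters for a read-modify-write workload like the one described: it bounds how long a trickle of updates can sit uncommitted even when the document threshold is never reached.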


Alternative scoring of BooleanQuery

2009-07-07 Thread Klaus Malorny



Hi all,

sorry if this is a FAQ or has been answered on the list before, but unfortunately 
I did not find a decent way to search the archive (maybe a job for Lucene ;-) )


For some reason, I had to split my document into multiple fields. For the 
search, I create a query with two subqueries, one per field, for the same term, 
and combine them via a BooleanQuery with Occur.SHOULD. If a term happens to appear 
in both fields, the scores are added (and scaled, if disableCoord is false). In my 
context this is not really what I want. I would prefer a simple "maximum" 
function over the scores of the subqueries. Since I do not consider myself an 
expert in the internal workings of Lucene, is there an easy way to achieve this, 
or do I have to reimplement the whole BooleanQuery class?


Thanks for any advice.

Regards,
Klaus
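Lucene's usual answer to this requirement is `DisjunctionMaxQuery`, which scores a document by its best-matching subquery plus an optional `tieBreakerMultiplier` times the scores of the other matching subqueries. The plain-Java sketch below only models the two combination rules on precomputed clause scores, so the difference is visible without an index. The coord factor follows the classic `overlap / maxOverlap` definition, and treating a clause score of 0 as "did not match" is a simplifying assumption of the sketch.

```java
/**
 * Models two ways of combining per-field clause scores for one document.
 * Not Lucene code: the clause scores are given as plain numbers.
 */
public class MaxVsSumScoring {

    /**
     * BooleanQuery with SHOULD clauses (classic scoring): sum of the matching
     * clause scores, multiplied by coord = matching / total unless disabled.
     */
    static double booleanShould(double[] clauseScores, boolean disableCoord) {
        double sum = 0.0;
        int matching = 0;
        for (double s : clauseScores) {
            if (s > 0) {  // assumption: a score of 0 means "clause did not match"
                sum += s;
                matching++;
            }
        }
        if (disableCoord || clauseScores.length == 0) {
            return sum;
        }
        return sum * matching / clauseScores.length;
    }

    /**
     * DisjunctionMaxQuery-style combination: best clause score plus
     * tieBreaker times the scores of the other matching clauses.
     * A tieBreaker of 0 gives a pure maximum.
     */
    static double disjunctionMax(double[] clauseScores, double tieBreaker) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : clauseScores) {
            if (s > 0) {
                sum += s;
                max = Math.max(max, s);
            }
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        double[] bothFields = {0.8, 0.5};  // term found in both fields
        double[] oneField = {0.8, 0.0};    // term found in one field only
        System.out.println(booleanShould(bothFields, false)); // summed
        System.out.println(booleanShould(oneField, false));   // halved by coord
        System.out.println(disjunctionMax(bothFields, 0.0));  // best clause only
        System.out.println(disjunctionMax(oneField, 0.0));    // same best clause
    }
}
```

With a tie breaker of 0, a document matching in both fields no longer outranks one that matches equally well in a single field; a small positive value lets multi-field matches win otherwise-equal cases.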





Concurrent Indexing and Searching

2009-09-25 Thread Klaus Teller
Hi,

I've read that it is possible to update the index while another thread has a 
reader open. 

Now let's say the reader is trying to reopen the index (using its reopen 
method) and at the very same time, the writer is committing its 500MB of changes 
to the index. What happens in this situation? Which version of the index does 
the reader end up with if it tries to open the index while the writer is 
modifying it?

Any feedback will be much appreciated,

Klaus.
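The question turns on how a commit becomes visible. In Lucene, the writer writes its new segment files first and only then writes the new commit point (the `segments_N` file), so a reader that opens or reopens the index in effect sees either the previous commit or the new one, never a half-written mix. The plain-Java sketch below models that publication step with an `AtomicReference`; the `CommitPoint` class and the segment names are illustrative assumptions rather than Lucene types, and it assumes a single writer, which Lucene enforces with its write lock.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Plain-Java model of point-in-time commits; not Lucene's implementation. */
public class CommitPointModel {

    /** Immutable snapshot, playing the role of one segments_N commit point. */
    static final class CommitPoint {
        final long generation;
        final List<String> segments;

        CommitPoint(long generation, List<String> segments) {
            this.generation = generation;
            this.segments = Collections.unmodifiableList(new ArrayList<>(segments));
        }
    }

    private final AtomicReference<CommitPoint> latest =
            new AtomicReference<>(new CommitPoint(0, Collections.<String>emptyList()));

    /** Writer side (single writer assumed): stage new segments, then publish once. */
    void commit(List<String> newSegments) {
        CommitPoint previous = latest.get();
        List<String> all = new ArrayList<>(previous.segments);
        all.addAll(newSegments);  // the expensive writing happens before publication
        latest.set(new CommitPoint(previous.generation + 1, all)); // one atomic switch
    }

    /** Reader side: (re)opening returns whichever commit point is current. */
    CommitPoint reopen() {
        return latest.get();
    }

    public static void main(String[] args) {
        CommitPointModel index = new CommitPointModel();
        index.commit(Collections.singletonList("_0"));
        CommitPoint reader = index.reopen();          // opened before the big commit
        index.commit(Arrays.asList("_1", "_2"));      // a later, larger commit
        System.out.println(reader.generation + " " + reader.segments);     // prints 1 [_0]
        CommitPoint reopened = index.reopen();
        System.out.println(reopened.generation + " " + reopened.segments); // prints 2 [_0, _1, _2]
    }
}
```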



fast Result Count

2010-02-09 Thread Klaus Teller
Hi Guys,

Is there a way to speed up counting documents that satisfy a search query, other 
than by using TopDocCollector.getTotalHits()? 

For instance, if there are 100 documents satisfying my search query, how 
can I count them without loading them all into memory?

Thanks,
Klaus.
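If only the count is needed, the collector does not have to keep any hits at all; newer Lucene releases ship a `TotalHitCountCollector` for exactly this (`searcher.search(query, collector)`, then `collector.getTotalHits()`). The plain-Java sketch below models the idea only: the `HitCollector` interface and the boolean `matches` array are hypothetical stand-ins for Lucene's `Collector` callback and a query's matching documents.

```java
/** Plain-Java model of counting hits without collecting them. */
public class CountOnlyDemo {

    /** Hypothetical stand-in for Lucene's Collector callback. */
    interface HitCollector {
        void collect(int docId);
    }

    /** Keeps nothing but a counter, so memory use is constant. */
    static final class CountingCollector implements HitCollector {
        int totalHits = 0;

        @Override
        public void collect(int docId) {
            totalHits++;
        }
    }

    /** Stand-in for a search: calls collect() for every matching doc id. */
    static void search(boolean[] matches, HitCollector collector) {
        for (int docId = 0; docId < matches.length; docId++) {
            if (matches[docId]) {
                collector.collect(docId);
            }
        }
    }

    public static void main(String[] args) {
        boolean[] matches = new boolean[1_000];
        for (int i = 0; i < matches.length; i += 10) {
            matches[i] = true;  // every 10th document matches
        }
        CountingCollector counter = new CountingCollector();
        search(matches, counter);
        System.out.println(counter.totalHits); // prints 100
    }
}
```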



boosting results with a field from the index

2006-01-03 Thread Klaus Hubert
Hi and a Happy New Year!
 
I created a Lucene index with 2 fields (text and importance). The text field contains 
the real text, and importance is a field where I manually assign a number between 
1 and 5 to the related document. When I search the index, I get the documents 
with the highest relevancy as weighted automatically by Lucene. I'm wondering 
whether I can boost the results with the importance field I already have stored in 
the index. As a result, I expect the same search results, just weighted 
differently: something like relevancy multiplied by importance.
 
Thank you so much,
 
  Klaus
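Multiplying relevancy by importance changes the order only partially, whereas sorting by the importance field lets importance dominate completely. The plain-Java sketch below contrasts the two on a few made-up hits; the `Hit` class and its numbers are hypothetical, and in practice the relevancy scores would come from Lucene. An index-time document boost (`Document#setBoost`) would fold importance into the score multiplicatively, much like the first ranking here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Compares "relevancy x importance" ranking with sorting by importance. */
public class ImportanceRanking {

    /** Hypothetical search hit: a relevancy score plus the stored importance (1..5). */
    static final class Hit {
        final String id;
        final double relevancy;
        final int importance;

        Hit(String id, double relevancy, int importance) {
            this.id = id;
            this.relevancy = relevancy;
            this.importance = importance;
        }

        double combined() {
            return relevancy * importance;
        }
    }

    /** Ranks by relevancy multiplied by importance, highest first. */
    static List<String> byCombinedScore(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(Comparator.comparingDouble(Hit::combined).reversed());
        return ids(copy);
    }

    /** Ranks by importance first; relevancy only breaks ties. */
    static List<String> byImportanceThenRelevancy(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(Comparator.comparingInt((Hit h) -> h.importance)
                .thenComparingDouble(h -> h.relevancy)
                .reversed());
        return ids(copy);
    }

    private static List<String> ids(List<Hit> hits) {
        List<String> out = new ArrayList<>();
        for (Hit h : hits) {
            out.add(h.id);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("a", 0.9, 1));  // very relevant, unimportant
        hits.add(new Hit("b", 0.2, 5));  // barely relevant, very important
        hits.add(new Hit("c", 0.5, 3));
        System.out.println(byCombinedScore(hits));           // prints [c, b, a]
        System.out.println(byImportanceThenRelevancy(hits)); // prints [b, c, a]
    }
}
```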



RE: boosting results with a field from the index

2006-01-03 Thread Klaus Hubert
Wow, that was fast :-)

Right, why didn't I come up with the idea of just sorting the results by
importance... Lol...

OK, I will test both solutions and see which I like best. Such a great piece
of software...

-Original Message-
From: Grant Ingersoll [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, January 03, 2006 5:26 PM
To: java-user@lucene.apache.org
Subject: Re: boosting results with a field from the index

Hi Klaus,

You might want to just set the boost value of the Document using your
importance number, then Lucene will factor that in automatically when
scoring.  See the Document#setBoost javadoc for info.

You could also sort on the field, I think, so that the more important
docs come to the top.

-Grant


-- 
---

Grant Ingersoll 
Sr. Software Engineer 
Center for Natural Language Processing 
Syracuse University 
School of Information Studies 
337 Hinds Hall 
Syracuse, NY 13244 

http://www.cnlp.org 
Voice:  315-443-5484 
Fax: 315-443-6886 





RE: indexReader close method

2005-03-03 Thread Klaus Moysich



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Hostetter
Sent: Monday, December 6, 2004 21:32
To: Lucene Users List
Subject: Re: indexReader close method


: Do you know why I can't close the IndexReader  explicitly under some
: circumstances and why, when I do manage to close it I can still call
: methods on the reader?

1) I tried to create a test case that demonstrated your bug based on the
code outline you provided, and I couldn't (see below).  That implies to me
that something else is going on.  If you can create a completely self-contained
program that demonstrates your bug and mail it to the list, that
would help us help you.

2) the documentation for IndexReader.close() says...

Closes files associated with this index. Also saves any new deletions to
disk. No other methods should be called after this has been called.

...note the word "should".  It doesn't say what the other methods will do
if you try to call them, just that you shouldn't try.  In some cases they
may generate exceptions; in other cases they may just be able to return
data based on state internal to the object, which is unaffected by the
fact that the files have all been closed.

-Hoss

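import java.io.IOException;
import java.util.Random;

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Written against the Lucene 1.4 API (Field.Text, Hits, FSDirectory.getDirectory).
public class ReaderCloseTest {

    public static void main(String argv[]) throws IOException {

        /* create a fresh, randomly named index directory under java.io.tmpdir */
        String d = System.getProperty("java.io.tmpdir", "tmp")
            + System.getProperty("file.separator")
            + "index-dir-" + (new Random()).nextInt(1000);
        Directory trash = FSDirectory.getDirectory(d, true);

        /* build a one-document index */
        Document doc;
        IndexWriter w = new IndexWriter(d, new SimpleAnalyzer(), true);
        doc = new Document();
        doc.add(Field.Text("words", "apple emu"));
        w.addDocument(doc);
        w.optimize();
        w.close();

        /* search the index, then close both the searcher and the reader */
        IndexReader r = IndexReader.open(d);
        IndexSearcher s = new IndexSearcher(r);
        Hits h = s.search(new TermQuery(new Term("words", "apple")));

        s.close();
        r.close();

        /* calling a method after close(): per point 2), this may still answer */
        System.out.println("Reader? - " + r.maxDoc());
    }
}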
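Point 2) says that calling methods after `close()` is simply undefined. The plain-Java sketch below models the two behaviours described there: one method answers from in-memory state regardless, while another first calls an `ensureOpen()` guard. The class is hypothetical; later Lucene releases added a similar guard to `IndexReader` that throws `AlreadyClosedException`, whereas this sketch uses `IllegalStateException` to stay dependency-free.

```java
/** Hypothetical reader showing guarded vs. unguarded methods after close(). */
public class ClosedReaderDemo {

    static final class ModelReader implements AutoCloseable {
        private final int maxDoc;  // cached in memory when the reader opens
        private boolean closed = false;

        ModelReader(int maxDoc) {
            this.maxDoc = maxDoc;
        }

        /** Unguarded: answers from internal state even after close(). */
        int maxDoc() {
            return maxDoc;
        }

        /** Guarded: refuses to run once the reader has been closed. */
        int numDocs() {
            ensureOpen();
            return maxDoc;  // a real reader would subtract deleted documents
        }

        private void ensureOpen() {
            if (closed) {
                throw new IllegalStateException("this reader is closed");
            }
        }

        @Override
        public void close() {
            closed = true;  // a real reader would also release its files here
        }
    }

    public static void main(String[] args) {
        ModelReader r = new ModelReader(1);
        r.close();
        System.out.println("maxDoc after close: " + r.maxDoc()); // still prints 1
        try {
            r.numDocs();
        } catch (IllegalStateException e) {
            System.out.println("numDocs after close: " + e.getMessage());
        }
    }
}
```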









SIMPLE Lucene / MySQL Indexer

2005-07-12 Thread Klaus Hubert
Hi,

I played with several search engines to replace a MySQL
FULLTEXT index and hope that Lucene is the best
solution for that.

I am reading Manning's book Lucene in Action, and Lucene
seems to be the most powerful search engine I have found so
far.

I'm stuck on a problem and need help from you
experts. I managed to create an index as described in
the examples. I also managed to read a MySQL database
in Java.

My question is whether anybody here has a SIMPLE
example that does this in one step. I am good at PHP
and Visual Basic, but very new to Java. Maybe I'm
using the wrong tools (NetBeans IDE and JCreator), but
I can't manage to create a Lucene index on 3
database fields.

I appreciate any help.

Thank you so much,

  Klaus




RE: SIMPLE Lucene / MySQL Indexer

2005-07-13 Thread Klaus Hubert
Hi Chris,

this is indeed a cool application, but I just need to
create the index. I will definitely look into your file and see if it
makes my life easier. Can you give any details on how long it took to
create such a huge index? What is your experience with the slowest
searches? Do they take over 1 second? (I know it depends on the hardware,
but I'm just wondering.)

Thanks,

  Klaus

-Original Message-
From: Chris Lu [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 13, 2005 5:04 AM
To: java-user@lucene.apache.org
Subject: Re: SIMPLE Lucene / MySQL Indexer

Please allow me to introduce DBSight.
It's based on Lucene and oriented toward searching any database.

Most things are done through a web UI. No coding is needed to create your
search.
Check out this demo: http://search.dbsight.com

It's free to download and test, and free for developer edition and
non-profit usage.

Chris Lu
---
Full-Text Search on Any Database
http://www.dbsight.net



--
Chris Lu
--
Free-Text Search on Any Database
http://www.dbsight.net





RE: SIMPLE Lucene / MySQL Indexer

2005-07-13 Thread Klaus Hubert
Hi Nader,

I downloaded Eclipse and also the Hibernate plugin, and I really like this
IDE. It seems to have lots of power. What I haven't found so far is a
debugger where I can go line by line through the code to find errors. It
runs, and I get error messages at the line where the problem arises. But I
cannot step through the code as I was used to when programming in Visual
Basic, PHP or Perl.

Thanks,

  Klaus 

-Original Message-
From: Nader Henein [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 13, 2005 10:42 AM
To: java-user@lucene.apache.org
Subject: Re: SIMPLE Lucene / MySQL Indexer

Also Hibernate: you can use Eclipse as an IDE, with the Hibernator plugin to
create objects cleanly from your MySQL database; then a few lines will
fetch an object, which you could then pass to Lucene for indexing.

Nader Henein


-- 

Nader S. Henein
Senior Applications Architect

Bayt.com





RE: SIMPLE Lucene / MySQL Indexer

2005-07-13 Thread Klaus Hubert
Hi Ian,

That's what I'm looking for: simple source code that reads a
database and adds the fields to the index. Another solution I've found so
far is at
http://www-128.ibm.com/developerworks/java/library/j-lucene/.
The first step there is to export my MySQL database to simple XML and go
from there. It is just an additional step, and I would stick with it if I
don't find another method to do it all at once.

Thanks,

  Klaus 

-Original Message-
From: Ian Lea [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 13, 2005 10:19 AM
To: java-user@lucene.apache.org
Subject: Re: SIMPLE Lucene / MySQL Indexer

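Something like this?

// Lucene 1.4-era API (Field.Text / Field.UnStored / Field.Keyword).
IndexWriter iw = whatever;  // an IndexWriter you have already opened
ResultSet rs = whatever;    // the result of your SELECT on the MySQL table

while (rs.next()) {
   Document ldoc = new Document();
   ldoc.add(Field.Text("f1", rs.getString("f1")));      // stored, indexed, tokenized
   ldoc.add(Field.UnStored("f2", rs.getString("f2")));  // indexed, tokenized, not stored
   ldoc.add(Field.Keyword("f3", rs.getString("f3")));   // stored, indexed, not tokenized
   ...
   iw.addDocument(ldoc);
}

rs.close();
iw.close();


On the IDE front, most people seem to use Eclipse nowadays.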


--
Ian.
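The three factory methods in the loop above differ only in whether a column's value is stored, indexed and tokenized. As a quick reference, the plain-Java sketch below encodes those Lucene 1.x semantics in a small enum and maps one database row (given as a `Map`) to field specs. The `FieldKind` enum, the column names and the column-to-kind choices are hypothetical, picked only to mirror the f1/f2/f3 example; the sketch also skips SQL NULLs, since `rs.getString()` returns `null` for them and a null field value is best avoided.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Reference model of the Lucene 1.x Field factories used in the loop above. */
public class RowToFields {

    /** Hypothetical enum mirroring Field.Text, Field.UnStored and Field.Keyword. */
    enum FieldKind {
        TEXT(true, true, true),       // Field.Text: stored, indexed, tokenized
        UNSTORED(false, true, true),  // Field.UnStored: indexed, tokenized, not stored
        KEYWORD(true, true, false);   // Field.Keyword: stored, indexed, not tokenized

        final boolean stored;
        final boolean indexed;
        final boolean tokenized;

        FieldKind(boolean stored, boolean indexed, boolean tokenized) {
            this.stored = stored;
            this.indexed = indexed;
            this.tokenized = tokenized;
        }
    }

    /** Turns one row into "name=kind" specs in schema order, skipping NULL columns. */
    static List<String> toFieldSpecs(Map<String, String> row, Map<String, FieldKind> schema) {
        List<String> specs = new ArrayList<>();
        for (Map.Entry<String, FieldKind> column : schema.entrySet()) {
            String value = row.get(column.getKey());
            if (value != null) {  // rs.getString() returns null for SQL NULL
                specs.add(column.getKey() + "=" + column.getValue());
            }
        }
        return specs;
    }

    public static void main(String[] args) {
        Map<String, FieldKind> schema = new LinkedHashMap<>();
        schema.put("f1", FieldKind.TEXT);
        schema.put("f2", FieldKind.UNSTORED);
        schema.put("f3", FieldKind.KEYWORD);

        Map<String, String> row = new LinkedHashMap<>();
        row.put("f1", "Lucene in Action");
        row.put("f2", "long body text");
        row.put("f3", null);  // e.g. a NULL column
        System.out.println(toFieldSpecs(row, schema)); // prints [f1=TEXT, f2=UNSTORED]
    }
}
```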





RE: SIMPLE Lucene / MySQL Indexer

2005-07-13 Thread Klaus Hubert
Hi Xing,

I have the book, and as I wrote in my initial message I managed to create
the sample index and also managed to read MySQL. But I don't seem to be able
to combine those programs :-( I'm very new to Java and I haven't found a
nice debugger so far to step through my code. I will try all day today to
get this fixed. I know it shouldn't be too difficult.

Thank you,

  Klaus 

-Original Message-
From: Xing Li [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 13, 2005 2:15 PM
To: java-user@lucene.apache.org
Subject: RE: SIMPLE Lucene / MySQL Indexer

Don't make the mistake of complicating the task. Just read straight from
MySQL into Lucene via Java. There is no benefit in exporting data to XML
just to read the data back into Lucene.

Get the Lucene in Action book if you haven't, because all the samples there
are practical and real-world. All you need to add is 10 lines of MySQL-type
Java/JDBC code and you are ready to create your first index. Download Luke,
a GUI testing tool for Lucene, so you can browse the index, perform searches,
validate/test search performance bottlenecks, dissect queries, etc.





RE: SIMPLE Lucene / MySQL Indexer

2005-07-13 Thread Klaus Hubert
Hi,

Thank you all so much for the crash course in Java for beginners. Indeed,
the last time I used Java was 1996... Lol. But I'm getting very close now.
It is all about the right class declarations and imports in the correct
place. I have almost done it. I will publish my code to the community if
somebody is interested.

Bye,

  Klaus

-Original Message-
From: Xing Li [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 13, 2005 2:38 PM
To: java-user@lucene.apache.org
Subject: RE: SIMPLE Lucene / MySQL Indexer

Klaus,

Just a few days ago I couldn't even remember how to compile Java code. The
last time I touched Java was around 2001. Don't worry, Lucene is extremely
easy once you know a bit of fundamental Java. It's no different from any
other language; it's just syntax. I recommend the Java book by Deitel &
Deitel. I fell in love with their practical writing style back in college.

Below is what I whipped up quickly to test MySQL connections...

Just add the following to a Lucene book sample. You need to download the
Connector/J JDBC driver from the MySQL site and put the jar file on your
classpath.

my_db db = new my_db();
db.connect();

ResultSet rs = db.query("select * from mytable limit 100");

while (rs.next()) {
    ... = rs.getString("mysqltablefieldname"); // String value of this MySQL row/column
    ...copy code from lucene...
}


import java.sql.*;

public class my_db {

    public Connection conn = null;
    public Statement stmt = null;
    public boolean loaded = false;

    /** Registers the MySQL Connector/J driver with DriverManager. */
    public boolean load() {
        try {
            Class.forName("com.mysql.jdbc.Driver");
            return true;
        } catch (Exception ex) {
            // handle the error: the driver jar is probably not on the classpath
            System.out.println("Cannot load mysql driver.");
            return false;
        }
    }

    public boolean connect() {
        if (loaded == false) {
            loaded = load();
        }

        if (loaded == false) {
            System.out.println("Can't load driver.");
            return false;
        }
        try {
            conn = DriverManager.getConnection(
                "jdbc:mysql://ip:port/dbname?user=user&password=pass");

            stmt = conn.createStatement();
            // SET NAMES returns no result set, so use execute() rather than executeQuery()
            stmt.execute("SET NAMES 'utf8'");

            return true;
        } catch (SQLException ex) {
            // handle any errors
            System.out.println("SQLException: " + ex.getMessage());
            System.out.println("SQLState: " + ex.getSQLState());
            System.out.println("VendorError: " + ex.getErrorCode());
            return false;
        }
    }

    public ResultSet query(String sql) {
        try {
            return stmt.executeQuery(sql);
        } catch (SQLException ex) {
            // handle any errors
            System.out.println("SQLException: " + ex.getMessage());
            System.out.println("SQLState: " + ex.getSQLState());
            System.out.println("VendorError: " + ex.getErrorCode());
            return null;
        }
    }
}




RE: SIMPLE Lucene / MySQL Indexer

2005-07-13 Thread Klaus Hubert
Yes, it works with breakpoints and so on, but the current line is never
highlighted. All I see is the line number in the debug window. But you are
right, this is not a Java forum, and I apologize for the beginner's
questions.

-Original Message-
From: Karthik N S [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 13, 2005 2:41 PM
To: java-user@lucene.apache.org
Subject: RE: SIMPLE Lucene / MySQL Indexer


Hi,

Apologies, but this is not the forum to discuss how to debug with
Eclipse.

So I suggest you use the Help tab in the Eclipse IDE.

Hint: first set the breakpoint in the code, and then use the Debug
tab under Run.

This is a Lucene forum, guys.


Karthik




RE: SIMPLE Lucene / MySQL Indexer

2005-07-13 Thread Klaus Hubert
Hi Chris,

I hadn't thought about that. I'm almost done with my program, and I will
give yours a try as well, as suggested. I have the latest (recommended) JDBC
driver, 3.1.10. But I still have to download and install Tomcat or similar
to run your .war file. I think 5-24h is not that bad, since you can update
the Lucene index in the future and not go through this long build time
again. Your demo looks really nice and it's fast. Congratulations!

Bye,

  Klaus 

-Original Message-
From: Chris Lu [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 13, 2005 5:47 PM
To: java-user@lucene.apache.org
Subject: Re: SIMPLE Lucene / MySQL Indexer

Hi, Klaus, thanks.

You can simply use DBSight to create the index. It's in Lucene's standard
format.
And you can control the index field types, analyzers, how to select data
from the database, the number of Java threads, etc., just through the web
UI. No coding is needed. We have a user who didn't know Lucene at all and
had 3 database searches up and running within one week.

Building a huge index, say 1 million records, may take 5 to 24 hours,
depending on the record size, the hardware, etc. Most of the time is
actually spent pulling the data over JDBC.

Special warning: MySQL's JDBC driver has a bug leading to OutOfMemory if you
do a select with lots of rows. You must download the latest JDBC driver (dev
version) and use setFetchSize().

Chris
---
Full-Text Search on Any Database
http://www.dbsight.net

On 7/13/05, Klaus Hubert <[EMAIL PROTECTED]>
wrote:
> Hi Chris,
> 
> this is indeed a cool application, but I need just
to create the 
> index. I definitely will look into your file and see
if it makes my 
> life easier. Can you tell any details how long it
took to create such 
> a huge index? What experiences you have with the
slowest search? Does 
> it go over 1 second? (I know, it depends on the
hardware, but I'm just
> wondering)
> 
> Thanks,
> 
>   Klaus
> 
> -Original Message-
> From: Chris Lu [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, July 13, 2005 5:04 AM
> To: java-user@lucene.apache.org
> Subject: Re: SIMPLE Lucene / MySQL Indexer
> 
> Please allow me to introduce DBSight.
> It's based on Lucene and oriented toward searching any database.
> 
> Most things are done through the web UI. No coding is needed to
> create your search.
> Check out this demo: http://search.dbsight.com
> 
> It's free to download and test, and free for the developer edition
> and non-profit use.
> 
> Chris Lu
> 
> Klaus Hubert wrote:
> 
> >Hi,
> >
> >I have experimented with several search engines to replace the MySQL
> >FULLTEXT index and hope that Lucene is the best solution for that.
> >
> >I am reading Manning's book Lucene in Action, and Lucene seems to be
> >the most powerful search engine I have found so far.
> >
> >I'm stuck on a problem and need help from you experts. I managed to
> >create an index as described in the examples. I also managed to read
> >a MySQL database in Java.
> >
> >My question is whether anybody here has a SIMPLE example that does
> >this in one step. I am good at PHP and Visual Basic, but very new to
> >Java. Maybe I'm using the wrong tools (NetBeans IDE and JCreator), but
> >I can't manage to create a Lucene index on 3 database fields.
> >
> >I appreciate any help.
> >
> >Thank you so much,
> >
> >  Klaus
> >
> 


