Gentlemen,
A join-like operation between Lucene indexes can be done with
(at least) reasonable performance by using a few standard
methods from RDBs: sort before going to disk, and cache
whenever possible. The steps are:
- query the first Lucene index with the low level search API to get the
Lu
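The step list is cut off in this archive, so what follows is not the poster's procedure. It is only a minimal sketch of the two ideas named above (sort the keys before doing lookups, and cache each lookup), written in plain Java without any Lucene calls. The class name, the method name and the `lookup` function are hypothetical stand-ins for whatever fetches the matching record from the second index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Function;

class SortedCachedJoinSketch {
    /**
     * Looks up every distinct key exactly once, in sorted order, and returns
     * the looked-up values in the order of the original key list.
     * "lookup" is a hypothetical stand-in for the expensive second-index access.
     */
    static <K extends Comparable<K>, V> List<V> join(List<K> keys, Function<K, V> lookup) {
        // Sort and de-duplicate first, so the expensive lookups run in key order.
        List<K> distinctSorted = new ArrayList<>(new TreeSet<>(keys));
        // Cache each result so repeated keys cost nothing extra.
        Map<K, V> cache = new HashMap<>();
        for (K key : distinctSorted) {
            cache.put(key, lookup.apply(key));
        }
        // Emit values in the caller's original order.
        List<V> joined = new ArrayList<>();
        for (K key : keys) {
            joined.add(cache.get(key));
        }
        return joined;
    }
}
```

With keys `c, a, c, b` the lookup function would be called three times, for `a`, `b`, `c` in that order, and the result list would still line up with the original four keys.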
Hello everybody.
We are building a complex automatic classification system using Lucene.
We need to manage normalized Tf/Idf (Term Frequency / Inverse Document
Frequency).
We understood that Lucene can give us Tf and Df and we are using these
values to calculate the normalized Tf/Idf but we would l
The Term Vector code can be used to get the term frequencies from a
specific document. Search this list, see the Lucene In Action book or
look at http://www.cnlp.org/apachecon2005 for examples on how to use
Term Vectors
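For the arithmetic itself, here is a minimal sketch in plain Java, assuming the common weighting tf * ln(N / df) followed by scaling the document vector to unit length. Which normalization the original poster means is not stated, so that choice is an assumption, and the class and method names are hypothetical. The term and document frequencies are passed in as maps; reading them out of Lucene is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TfIdfSketch {
    /**
     * Weights each term as tf * ln(numDocs / df), then divides every weight
     * by the Euclidean length of the vector (cosine normalization).
     * Assumption: this is one common definition, not necessarily the poster's.
     */
    static Map<String, Double> normalizedTfIdf(Map<String, Integer> termFreqs,
                                               Map<String, Integer> docFreqs,
                                               int numDocs) {
        Map<String, Double> weights = new LinkedHashMap<>();
        double sumOfSquares = 0.0;
        for (Map.Entry<String, Integer> entry : termFreqs.entrySet()) {
            Integer df = docFreqs.get(entry.getKey());
            if (df == null || df == 0) {
                continue; // term has no document frequency, skip it
            }
            double weight = entry.getValue() * Math.log((double) numDocs / df);
            weights.put(entry.getKey(), weight);
            sumOfSquares += weight * weight;
        }
        double length = Math.sqrt(sumOfSquares);
        if (length > 0.0) {
            for (Map.Entry<String, Double> entry : weights.entrySet()) {
                entry.setValue(entry.getValue() / length);
            }
        }
        return weights;
    }
}
```

A term that occurs in every document gets ln(1) = 0 and so contributes nothing, which is the usual intent of the Idf factor.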
Danilo Cicognani wrote:
Hello everybody.
We are building a complex autom
Hello list,
I want to know if a human written query passed through the
QueryParser is "clean" from fields, boolean clauses and query
indicators. The easy way out would of course be to add a boolean that resets
at ReInit(), but maybe there is a smart way to do it. Perhaps it is
possible to treat
Hello,
We are using Lucene to facilitate searching of our application's log files. I
am noticing some inconsistencies in result sets when searching on certain
fields.
One field we index is the file path. I am using a simple query like
"location:Z:\logs\someLogFile.log". However, I can never get pat
It would be helpful to download Luke (http://www.getopt.org/luke/) and
analyze what's getting indexed. Have you tried that?
On 4/14/06, Bill Snyder <[EMAIL PROTECTED]> wrote:
>
> Hello,
>
> We are using Lucene to facilitate searching of our application's log files.
> I
> am noticing some inconsistenc
14 apr 2006 kl. 16.37 skrev Bill Snyder:
One field we index is the file path. I am using a simple query like
"location:Z:\logs\someLogFile.log". However, I can never get path searches
like this to come back with any results. Tried escaping the backslashes and
colon. Nothing seems to work.
Oh, cool. Look at that. A neat tool made with thinlets. I had not heard of
this...I'll see if it helps me figure out what's going on.
--Bill
On 4/14/06, Rajesh Munavalli <[EMAIL PROTECTED]> wrote:
>
> It would be helpful to download Luke (http://www.getopt.org/luke/) and
> analyze whats getting in
AHA! I am using the Search tab and have entered the query:
location:Z:\install\logs\archive.log.D20060406.T141958
the query details says the query was parsed to
location:z
so if I escape the colon I see the new parsed query as
location:"z installlogsarchive.log.d20060406.t141958"
So, lucenc
14 apr 2006 kl. 17.11 skrev Bill Snyder:
so if I escape the colon I see the new parsed query as
location:"z installlogsarchive.log.d20060406.t141958"
So, Lucene does not store the file path exactly?! It converts it all to lower
case! Is there some property I should turn on?
It is the Anal
On 4/14/06, Bill Snyder <[EMAIL PROTECTED]> wrote:
>
> AHA! I am using the Search tab and have entered the query:
>
> location:Z:\install\logs\archive.log.D20060406.T141958
>
> the query details says the query was parsed to
>
> location:z
>
> so if I escape the colon I see the new parsed query as
I would like to store all in my application rather than using the
Lucene persistency mechanism for tokens. I only want the search
mechanism. I do not need the IndexReader and IndexWriter as that will
be a natural part of my application. I only want to use the Searchable.
So I looked at exte
Thanks! OK, how do I get the file separator to be part of the term? Luke
shows the parsed query as ignoring the file separator.
so location:Z\:\\/install/logs\\jetspeedservices.log
becomes location:"z install logs jetspeedservices.log"
--Bill
On 4/14/06, Rajesh Munavalli <[EMAIL PROTECTED]> w
tried MultiFieldQueryParser?
Chris Lu
---
Full-Text Lucene Search on Any Databases
http://www.dbsight.net
Faster to Setup than reading marketing materials!
On 4/14/06, karl wettin <[EMAIL PROTECTED]> wrote:
> Hello list,
>
> I want to know if a human written qu
14 apr 2006 kl. 17.22 skrev karl wettin:
It is the Analyzer that does that. Try creating your IndexSearcher
with a KeywordAnalyzer (I think).
err
It is the Analyzer that does that. Try using a KeywordAnalyzer (I think).
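To make the difference concrete, here is a small standalone illustration in plain Java. It does not call Lucene. The tokenizing rule (lowercase, split on anything that is not a letter or digit) is an assumption chosen to resemble what the thread observed in Luke, and real Lucene analyzers differ in detail. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class AnalyzerEffectSketch {
    /** Analyzer-like behaviour: lowercase, and split on any non-letter, non-digit character. */
    static List<String> tokenizing(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // separator ends the current token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Keyword-like behaviour: the whole value becomes a single term, unchanged. */
    static List<String> keyword(String text) {
        List<String> single = new ArrayList<>();
        single.add(text);
        return single;
    }
}
```

Under this rule `Z:\logs\someLogFile.log` becomes the four terms `z`, `logs`, `somelogfile`, `log`, while the keyword treatment keeps the path as one exact term. An exact-path query can only match if the field was indexed and queried with the same treatment.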
14 apr 2006 kl. 17.41 skrev Chris Lu:
On 4/14/06, karl wettin <[EMAIL PROTECTED]> wrote:
I want to know if a human written query passed through the
QueryParser is "clean" from fields, boolean clauses and query
indicators. The easy way out would of course be to add a boolean that resets
at ReInit(),
use Store.NO when creating Field
Chris Lu
On 4/14/06, karl wettin <[EMAIL PROTECTED]> wrote:
> I would like to store all in my application rat
14 apr 2006 kl. 17.46 skrev Chris Lu:
On 4/14/06, karl wettin <[EMAIL PROTECTED]> wrote:
I would like to store all in my application rather than using the
Lucene persistency mechanism for tokens. I only want the search
mechanism. I do not need the IndexReader and IndexWriter as that will
be a
On 14 Apr 2006, at 08:51, karl wettin wrote:
You misunderstand all my questions.
I must admit I was not sure I understood your question, either. In
order to search, Lucene needs an index. That index is maintained by
the IndexReader and IndexWriter classes. Are you contemplating
having
14 apr 2006 kl. 17.51 skrev karl wettin:
14 apr 2006 kl. 17.46 skrev Chris Lu:
On 4/14/06, karl wettin <[EMAIL PROTECTED]> wrote:
I would like to store all in my application rather than using the
Lucene persistency mechanism for tokens. I only want the search
mechanism. I do not need the In
14 apr 2006 kl. 17.51 skrev Christophe:
Are you contemplating having your own index and index format? In
that case, it's not clear to me how much leverage you will be
getting using Lucene at all. Could you explain in more detail what
you are trying to do?
I want to use the parts of Luc
oops, thought that you were just referring to the lowercase...
:)
On 4/14/06, karl wettin <[EMAIL PROTECTED]> wrote:
>
>
> 14 apr 2006 kl. 17.22 skrev karl wettin:
> >
> > It is the Analyzer that does that. Try creating your IndexSearcher
> > with a KeywordAnalyzer (I think).
>
> err
>
> It is th
On 14 Apr 2006, at 08:55, karl wettin wrote:
I don't want to use Lucene for persistence. I do not want to store
tokens nor field text in a FSDirectory or in a RAMDirectory. I want
to store the tokens in my application.
If I understand your question, I think that the first answer wa
Sorry, really misunderstood you. And you already know Lucene a lot. :)
Basically you want to restore the original query from the Query
object. But it may have already passed a lot of composition, like
Boolean, Span, Wildcard.
I don't feel it's possible to reconstruct the original human query.
Ch
14 apr 2006 kl. 17.56 skrev karl wettin:
14 apr 2006 kl. 17.51 skrev Christophe:
Are you contemplating having your own index and index format? In
that case, it's not clear to me how much leverage you will be
getting using Lucene at all. Could you explain in more detail
what you are tr
Thanks, Christophe.
Hi, Kevin,
I think your question means you want to store the Analyzed tokens yourself?
If so, you can use Analyzer to directly process the text, and save the
analyzed results in your application, maybe later use it in some
RDBMS? or Berkeley DB?
Chris Lu
---
14 apr 2006 kl. 18.01 skrev Christophe:
On 14 Apr 2006, at 08:55, karl wettin wrote:
I don't want to use Lucene for persistence. I do not want to store
tokens nor field text in a FSDirectory or in a RAMDirectory. I
want to store the tokens in my application.
If I understand your
Hi all,
I came across an old mail list item from 2003 exploring the possibilities of a
more probabilistic approach to using Lucene. Do the online experts know if
anyone achieved this since?
Thanks for any advice,
Malc
karl wettin wrote:
I would like to store all in my application rather than using the
Lucene persistency mechanism for tokens. I only want the search
mechanism. I do not need the IndexReader and IndexWriter as that will
be a natural part of my application. I only want to use the Searchable.
Wow, I finally found out why I was getting results in the wrong order
- I got the results in the correct order from the Lucene index. I
got the explanation of each of the results along with their database
id and found the ordering mismatch.
The problem is in the database call. I am calling
14 apr 2006 kl. 18.31 skrev Doug Cutting:
karl wettin wrote:
I would like to store all in my application rather than using the
Lucene persistency mechanism for tokens. I only want the search
mechanism. I do not need the IndexReader and IndexWriter as that
will be a natural part of my a
We tried two approaches:
1) Pull data from the db in arbitrary order and then sort in the application
AFTER the retrieve. This will require two passes over the results.
2) Add an order by clause to the select. In Oracle, you could do something
like "order by decode(444,1,333,2,555,3,888,4,
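Approach 1 can be sketched in plain Java as follows, assuming the Lucene hits have already been reduced to a ranked list of database ids and the rows come back from the database in arbitrary order. The class and method names are hypothetical, and rows are represented by their ids only to keep the example short.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RankOrderSketch {
    /**
     * Reorders fetchedIds to follow the order of rankedIds.
     * Ids that do not appear in rankedIds are placed at the end.
     */
    static List<Integer> reorder(List<Integer> rankedIds, List<Integer> fetchedIds) {
        // First pass: remember the rank position of every id.
        Map<Integer, Integer> position = new HashMap<>();
        for (int i = 0; i < rankedIds.size(); i++) {
            position.putIfAbsent(rankedIds.get(i), i);
        }
        // Second step: sort the fetched rows by that position.
        List<Integer> result = new ArrayList<>(fetchedIds);
        result.sort(Comparator.comparingInt(
                id -> position.getOrDefault(id, Integer.MAX_VALUE)));
        return result;
    }
}
```

With the ranking `444, 333, 555, 888` and rows fetched as `333, 444, 888, 555`, the result is `444, 333, 555, 888`.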
On 4/14/06, karl wettin <[EMAIL PROTECTED]> wrote:
> Do I have to worry about passing a null Directory to the default
> constructor?
That's not an easy road you are trying to take, but it should be doable.
There are some final methods you can't override, but just set
directoryOwner=false and close
Something that took me a while to get was that the analyzer is important
BOTH in the indexing phase and in the searching phase (assuming you're using
the QueryParser). For your experiment, you probably want to use the
WhitespaceAnalyzer. See page 119 of "Lucene in Action".
The other three most-comm
On 4/14/06, Erick Erickson <[EMAIL PROTECTED]> wrote:
>
> Something that took me a while to get was that the analyzer is important
> BOTH in the indexing phase and in the searching phase (assuming you're
> using
> the QueryParser). For your experiment, you probably want to use the
> WhitespaceAnalyz
I would use a database function to force the ordering like the one
you provided that works in Oracle, but it doesn't look like MySQL 5
supports that. If anyone else knows of a way to force the ordering
using MySQL 5 queries, please respond. I think I'll just re-sort them
when they get bac
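One option on the MySQL side is the FIELD() string function, which returns the 1-based position of its first argument within the remaining arguments, so it can be used in an ORDER BY. The sketch below only builds that clause as a string in plain Java. The class name, method name and the column name `id` are hypothetical, and the column name is assumed to come from trusted code rather than user input.

```java
import java.util.List;

class OrderByFieldSketch {
    /**
     * Builds "ORDER BY FIELD(column, id1, id2, ...)" from ids in rank order.
     * Assumption: column is a trusted identifier; the ids are plain integers.
     */
    static String orderByField(String column, List<Integer> rankedIds) {
        if (rankedIds.isEmpty()) {
            throw new IllegalArgumentException("need at least one id");
        }
        StringBuilder clause = new StringBuilder("ORDER BY FIELD(").append(column);
        for (Integer id : rankedIds) {
            clause.append(", ").append(id);
        }
        return clause.append(")").toString();
    }
}
```

For the ids `444, 333, 555, 888` this yields `ORDER BY FIELD(id, 444, 333, 555, 888)`. FIELD() returns 0 for a value that is not in the list, so such rows would sort first; restricting the select to the same ids with an IN clause avoids that.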
karl wettin wrote:
Do I have to worry about passing a null Directory to the default
constructor?
A null Directory should not cause you problems.
Doug
Jeremy Hanna wrote:
I would use a database function to force the ordering like the one you
provided that works in Oracle, but it doesn't look like MySQL 5
supports that. If anyone else knows of a way to force the ordering
using MySQL 5 queries, please respond. I think I'll just re-sort th
I still have a similar problem with the boost factor. I change the
name to have the AND operator and set that query's boost to a very
high value in relation to the others. I also have a regular OR based
name so that it doesn't rule those out. However whenever I change
the boost values wi
I'm the co-worker who suggested this to Ananth (I think we have been debating
this for 3 days now; from the post it seems he is winning :)...)
Anyway, as Ananth stated, I suggested this because I am wondering if Lucene
could solve a bottleneck query that is taking a deathly long time to
complete(rea
Hi Lucene Users,
I would like to catch BooleanQuery.TooManyClauses exception for certain
wildcard searches and display a 'subset' of results. I have used the
WildcardTermEnum to give me the first X documents matching the wildcard
query. Below is the code I use to implement the solution.
Witho