I don't want to advertise DBSight and pollute the mailing list, but
this Lucene database question comes up again and again, so here is
the reply. Tony, you can contact us directly with questions like
these later.
What you asked is exactly what DBSight does. And all you need to do to
get it up and r
Chris,
Thanks for your reply. I'm currently evaluating different products for full
text search. I visited the DBSight web site briefly. Basically, I have a main
table which has a number of fields for Ids from look-up tables, for example
category_id, brand_id, product_id etc. Main table also has a
Wildcard stuff gets fun and interesting. Have a look at
SpanRegexQuery in the contrib/regex codebase. It is a more
generalized version of WildcardQuery, but within the SpanQuery family.
Erik
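For anyone wondering why a regex query is the "more generalized" form of a wildcard, here is a minimal sketch in plain Java. It does not touch Lucene at all: WildcardAsRegex, toRegex and matchingTerms are hypothetical names made up for illustration, and the term list merely stands in for an index's term dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: shows that any wildcard
// pattern can be rewritten as a regular expression.
public class WildcardAsRegex {

    // '*' matches any run of characters, '?' matches exactly one character;
    // every other character is quoted so it matches literally.
    static String toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return sb.toString();
    }

    // Return the terms that the wildcard pattern matches in full.
    static List<String> matchingTerms(String wildcard, List<String> terms) {
        Pattern p = Pattern.compile(toRegex(wildcard));
        List<String> out = new ArrayList<String>();
        for (String t : terms) {
            if (p.matcher(t).matches()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for the terms of one field in an index (assumed data).
        List<String> terms = Arrays.asList("produce", "producer", "product", "director");
        System.out.println(matchingTerms("produc*", terms)); // [produce, producer, product]
        System.out.println(matchingTerms("produc?", terms)); // [produce, product]
    }
}
```

The reverse does not hold: a regex such as `produc(e|t)` has no single wildcard equivalent, which is the extra expressiveness a regex-based query buys.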
On Apr 13, 2006, at 11:22 AM, Erick Erickson wrote:
More of the same
So, now all I
On Apr 13, 2006, at 8:55 PM, Jeremy Hanna wrote:
Looking at the results, the first document in the results should
hopefully be near the bottom and the Explanation for this document
has a Description/Details (using the toString() on the Explanation)
of:
product of:
0.0 = sum of:
0.0 =
Thanks for the tip. I'm trying to decipher what the explanation
tells me right now.
Btw, here is the code that I'm currently running:
QueryParser nameParser = new QueryParser("name", analyzer);
QueryParser categoryParser = new QueryParser("category", analyzer);
QueryParser descripti
The best recommendation is to have a look at the Explanation returned
from IndexSearcher.explain() for a specific query and document to
trace how things are being scored. Is it possible you're boosting
all documents by the same amount?
Erik
On Apr 13, 2006, at 6:29 PM, Jeremy Han
I have a situation where I'm indexing database entries and have
fields such as:
name
sku
model
category name
description
features
specifications
I am trying to set a priority higher for the name, category name, and
description.
I've tried setting the fields' boost values as I've indexed th
Erick,
Don't get me wrong. I agree with you 100 percent on everything you just
said, and have been advocating what you are saying. I turned to the forum to
get other people's thoughts on the issue, feeling that my perspective may be
a little warped, and wanted to see what the community thinks. I thi
On 4/13/06, Ananth T. Sarathy <[EMAIL PROTECTED]> wrote:
>
> No we do have drop downs selects that would allow for the substitution,
> but
> we also have a free text fields to allow the user to search. That solution
> would I think work for the DB query replacement, but you would need a
> regular n
It depends on what you are trying to search. Let's say your table A is
the table with the "real" content, and B and C are your lookup tables.
In this case you should build one index, selecting A's content
together with the lookup values from B and C into the documents.
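A minimal sketch of that flattening step, in plain Java with no Lucene or JDBC calls. The class name, the field names (content, b_value, c_value) and the in-memory maps standing in for tables B and C are all assumptions for illustration; in practice the rows would come from a SQL join and each map entry would become a field on an indexed document.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: resolves lookup ids to their values so that one
// row of table A becomes one flat, self-contained document.
public class FlattenLookups {

    static Map<String, String> toDocument(String content, int bId, int cId,
                                          Map<Integer, String> tableB,
                                          Map<Integer, String> tableC) {
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("content", content);
        // Store the looked-up text, not the id, so it is searchable directly.
        // An unknown id becomes an empty value rather than a null.
        doc.put("b_value", tableB.containsKey(bId) ? tableB.get(bId) : "");
        doc.put("c_value", tableC.containsKey(cId) ? tableC.get(cId) : "");
        return doc;
    }

    public static void main(String[] args) {
        // Assumed sample data standing in for lookup tables B and C.
        Map<Integer, String> tableB = new HashMap<Integer, String>();
        tableB.put(1, "Footwear");
        Map<Integer, String> tableC = new HashMap<Integer, String>();
        tableC.put(7, "Acme");

        System.out.println(toDocument("Red running shoe", 1, 7, tableB, tableC));
    }
}
```

The trade-off is that a change to a lookup value means re-indexing every document that carries it, which is usually acceptable when lookup tables change rarely.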
I strongly recommend you take a look
: Also we need to address the Join Between A and B and C, which I don't know
: see how with out taking out values out of the hit list.
When discussing index structure strategies, speaking in generalities like
A, B and C is hard, because there is no 100% generic solution for how
to "join" X an
No, we do have drop-down selects that would allow for the substitution, but
we also have free-text fields to allow the user to search. That solution
would, I think, work for the DB query replacement, but you would need a
regular non-underscored field to allow for free text.
On 4/13/06, Erick Erick
OK, we've defined the problem away for the present. Which is probably a very
good thing. Scratch that, there's no doubt that it's a very good thing.
I'm still interested in anything you have to say, of course, but for now I'm
off and running
Thanks again for all your help
Erick
Well, that's a problem you must already be solving. Somewhere, you have to
construct your DB query and recognize what constitutes a "term". From your
previous mail, you imply you can construct this query...
select count(distinct Crew_ID) from Crew_TItles where Title="Producer"
Where did you get "
Hi,
> From: Ananth T. Sarathy [mailto:[EMAIL PROTECTED]
>
> How would a second index solve the 1:N relationship issue?
> For the record I agree that Lucene is Document Centric.
Why should Lucene solve this problem? What is your search document?
Or what is your search result? What are you goin
How would that work if someone on a form
types Assistant Producer? How would that match the indexed field if the value
added is Assistant_Producer?
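One possible way to handle this, sketched under the assumption that titles are indexed as single underscore-joined tokens: run whatever the user types through the same normalization that was applied at index time, so both sides produce the same key. TitleKey and toKey are hypothetical names, not part of Lucene, and the lowercasing is an extra assumption.

```java
import java.util.Locale;

// Hypothetical helper: one normalization shared by indexing and searching.
public class TitleKey {

    // Trim, treat underscores and whitespace runs alike, and lowercase,
    // so "Assistant Producer" and "Assistant_Producer" yield the same key.
    static String toKey(String title) {
        return title.trim()
                    .replaceAll("[\\s_]+", "_")
                    .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String indexed = toKey("Assistant_Producer"); // value stored at index time
        String typed = toKey("Assistant Producer");   // value typed into the form
        System.out.println(indexed.equals(typed));    // true
    }
}
```

This only covers exact title matches; free-text search across part of a title would still need a normally tokenized field alongside the key.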
On 4/13/06, Erick Erickson <[EMAIL PROTECTED]> wrote:
>
> Warning: I'm quite new to lucene, so this may not be very accurate
>
> What analyzers are yo
We also need to address the join between A, B and C, which I don't
see how to do without taking values out of the hit list.
On 4/13/06, Satuluri, Venu_Madhav <[EMAIL PROTECTED]> wrote:
>
> I am not sure having an index each for each table solves the problem.
>
> (Going by the schema I pu
Warning: I'm quite new to lucene, so this may not be very accurate
What analyzers are you using for indexing and searching? StandardAnalyzer
(like in most of the examples)? Because it looks like you're having a
tokenizer problem. That is, when you index "Assistant producer", you
actually index
I am not sure having one index per table solves the problem.
(Going by the schema I put in the earlier mail)
You have one index each for tables A, B and C. What is the
Lucene equivalent of the DB query
A.field1 == value1 and B.field2 == value2 and C.field3 == value3?
You can't use MultiSe
How would a second index solve the 1:N relationship issue? For the record I
agree that Lucene is Document Centric.
On 4/13/06, Chris Lu <[EMAIL PROTECTED]> wrote:
>
> I agree with Jelda.
>
> Lucene is more document-centric. Storing the relationship is not a
> good idea. It's better to simply have
I agree with Jelda.
Lucene is more document-centric. Storing the relationship is not a
good idea. It's better to simply have 2 indexes. Usually when users
search, they can choose which index they want.
Of course, building the indexes will take more time to process the data.
Lucene can not replace re
No, I don't see your solution being performant.
If each Lucene Document corresponds to a row in 'A join B', then the index
explodes: index size increases drastically.
Why not create two indexes, A and B, then search A and use the
information from the A documents obtained to search B?
It seems to me
Yes, this is pretty much what I was trying to do. I am not sure how you
solve this without the maintenance and duplicate-hit removal logic
taking up so much time that any performance benefit from querying Lucene is
lost.
On 4/13/06, Satuluri, Venu_Madhav <[EMAIL PROTECTED]> wrote:
>
> I th
I think you are asking if we can retain 1:n relationships in lucene.
Ok, I'll go out on a limb and give my solution. Say you have a table A
and table B with B having multiple rows associated to each row in A.
Also your documents are centered around A, i.e. all your queries return
some row(s) of A,
Sorry, I hit submit mid-email.
Ok,
Some of the stuff makes some sense. I was a little loopy from lack of
sleep and some of these solutions don't really cover my concerns
Let's take this movie example. If each member of a production Crew can have
multiple titles that come from a lookup tabl
Ok,
Some of the stuff makes some sense. I was a little loopy from lack of
sleep and some of these solutions don't really cover my concerns
Let's take this movie example. If each member of a production Crew can have
multiple titles that come from a lookup table of Distinct Jobs
Titles
Assis
More of the same
So, now all I want to do is a SpanNearQuery on a wildcard. No problem if I
can use a wildcard query, but I can't because of the "TooManyClauses" issue.
We have two types of wildcards, truncation (which I can handle by indexing
successively shorter terms and a special characte
I really, really recommend buying a copy of "Lucene In Action", especially
if you don't already have a good grasp of what indexing and searching are all
about. It's well worth the effort to read. Sure, you'll have a ton of
questions after you're done with it (I certainly had/do), but at least
you'll
Hi Kostas,
I am assuming you need to find all of the text files in the directory.
Assuming you're using JDK1.5 you need to do something like this:
File directory = new File("c:\\my\\directory");
FilenameFilter findTextFiles = new FilenameFilter() {
public boolean accept(File dir, String nam
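The snippet above is cut off in the archive. As a rough, self-contained sketch of the same idea (not necessarily what the original poster wrote), the following lists the .txt files in a directory with a FilenameFilter. The class name ListTextFiles, the case-insensitive extension check and the fallback to the current directory are assumptions for illustration.

```java
import java.io.File;
import java.io.FilenameFilter;

// Hypothetical example class: finds the text files in one directory.
public class ListTextFiles {

    // Accept any file name ending in ".txt", ignoring case.
    static final FilenameFilter TEXT_FILES = new FilenameFilter() {
        public boolean accept(File dir, String name) {
            return name.toLowerCase().endsWith(".txt");
        }
    };

    // listFiles returns null when the path is not a readable directory,
    // so map that case to an empty array for the caller.
    static File[] textFilesIn(File directory) {
        File[] found = directory.listFiles(TEXT_FILES);
        return found == null ? new File[0] : found;
    }

    public static void main(String[] args) {
        // Directory comes from the command line; "." is an assumed default.
        File directory = new File(args.length > 0 ? args[0] : ".");
        for (File f : textFilesIn(directory)) {
            System.out.println(f.getPath());
        }
    }
}
```

Each returned File can then be opened and handed to the indexing code in a loop, rather than naming the files one by one.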
Hi, I have a problem that has to do more with java than with lucene.
I have a folder that has about 524 text files (.txt) that I want to index.
I have made a program that works very well. It does indexing, searching,
etc.
...
IndexWriter writer;
GreekAnalyzer anal = ne
You can use text extractors for the document formats you mentioned.
Lucene as such does not deal with this text extraction process.
Following are the extractors we generally use:
PDF -> PDFBox: Java API to read PDF documents
http://www.pdfbox.org.
WORD -> Antiword: http://www
Hi all,
I am new to Lucene. I want to index PDF, Word and txt files. Can anyone
tell me how to do indexing with Lucene? Please give me some information.
Thanking you
shaik