Hi, I am using Lucene for the very first time and want to manipulate the
results by adding some more factors to them. Which file should I edit to
manipulate the search results?
Thanks
Sumit Tyagi
Hi all,
I am using Hibernate Search (http://www.hibernate.org/410.html), which is a
wrapper around Lucene, for performing search over info stored in the DB. I
have questions related to Lucene boosting vs. sorting:
Is index-time boosting of documents and fields better than specifying sorting
parameters
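For concreteness, the two things being compared usually look like this in
plain Lucene (a rough sketch only; the field names, boost values, and paths
are illustrative assumptions, not anything from Hibernate Search itself):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;

    // Index time: bias relevance scoring toward a document or one of its fields.
    Document doc = new Document();
    Field title = new Field("title", titleText, Field.Store.YES, Field.Index.TOKENIZED);
    title.setBoost(2.0f);   // field-level boost
    doc.add(title);
    doc.setBoost(1.5f);     // document-level boost
    writer.addDocument(doc);

    // Query time: ignore relevance ordering entirely and sort on a field instead.
    IndexSearcher searcher = new IndexSearcher("/path/to/index");
    Hits hits = searcher.search(query, new Sort(new SortField("date", SortField.STRING, true)));

Boosting changes how scores are computed; a Sort replaces score ordering
altogether, so the two are not interchangeable.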
Gotcha. Well, if you want to check a doc at a time you could use
getSpans for a SpanNearQuery and just count how many you get. No ideas
off the top of my head if you want the result like a score, where you
get it for each hit in a search of a whole corpus.
- Mark
Jeff wrote:
If I am not m
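A minimal sketch of the counting Mark describes, assuming the pre-2.9 spans
API (SpanNearQuery.getSpans(IndexReader)); the field name "body" and the
index path are assumptions:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;
    import org.apache.lucene.search.spans.Spans;

    // Count how many times the phrase "quick brown" occurs in each document.
    IndexReader reader = IndexReader.open("/path/to/index");
    SpanNearQuery phrase = new SpanNearQuery(
        new SpanQuery[] {
            new SpanTermQuery(new Term("body", "quick")),
            new SpanTermQuery(new Term("body", "brown"))
        },
        0,      // slop 0: the terms must be adjacent
        true);  // and in order

    Spans spans = phrase.getSpans(reader);
    Map counts = new HashMap();   // doc id -> number of phrase matches
    while (spans.next()) {
        Integer doc = new Integer(spans.doc());
        Integer prev = (Integer) counts.get(doc);
        counts.put(doc, prev == null ? new Integer(1) : new Integer(prev.intValue() + 1));
    }
    reader.close();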
I actually hadn't implemented the TokenFilter solution before deciding not
to go with it, so I didn't have any benchmark.
But eventually I took care of this problem with a different variation of
your quick-and-dirty solution: I captured the character
'@' in FastCharStream.java,
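For anyone who does want the TokenFilter route instead, a rough sketch,
assuming the pre-2.9 TokenStream API (Token next()) and at most one '@' per
token; the class name here is made up:

    import java.io.IOException;
    import java.util.LinkedList;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    // Splits tokens such as "user@host.com" into "user" and "host.com".
    public class AtSignSplitFilter extends TokenFilter {
        private final LinkedList pending = new LinkedList();

        public AtSignSplitFilter(TokenStream input) {
            super(input);
        }

        public Token next() throws IOException {
            if (!pending.isEmpty()) {
                return (Token) pending.removeFirst();
            }
            Token t = input.next();
            if (t == null) {
                return null;
            }
            String text = t.termText();
            int at = text.indexOf('@');
            if (at < 0) {
                return t;   // no '@': pass the token through untouched
            }
            int start = t.startOffset();
            // Queue the host part, return the user part first.
            pending.add(new Token(text.substring(at + 1), start + at + 1, t.endOffset()));
            return new Token(text.substring(0, at), start, start + at);
        }
    }

Wrapped around StandardAnalyzer's token stream in a custom Analyzer, this
splits the EMAIL tokens without touching the generated tokenizer sources.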
I think you need to back up and think about what you're trying to
accomplish. Just throwing the file into a single document in
your index doesn't seem very useful.
Of course you can pre-process the input and index only what
you want. The examples in the Lucene demo just show
you how to index entire
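For the large log files discussed elsewhere in this digest, that
pre-processing can be as simple as one Lucene document per log line (or per
log entry); a sketch, where the paths and field names are assumptions:

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    File logFile = new File("/var/log/app.log");
    IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
    BufferedReader in = new BufferedReader(new FileReader(logFile));
    String line;
    int lineNo = 0;
    while ((line = in.readLine()) != null) {
        lineNo++;
        Document doc = new Document();
        doc.add(new Field("file", logFile.getName(), Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.add(new Field("lineNo", String.valueOf(lineNo), Field.Store.YES, Field.Index.NO));
        doc.add(new Field("body", line, Field.Store.NO, Field.Index.TOKENIZED));
        writer.addDocument(doc);
    }
    in.close();
    writer.optimize();
    writer.close();

Each hit then points back to a specific file and line instead of a 2 GB blob,
and the 10,000-token-per-field limit mentioned below stops being an issue.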
On 20 Dec 2007, at 22:32, [EMAIL PROTECTED] wrote:
In fact I had previously located the grammar in StandardTokenizer.jj
(just wasn't sure if that was the one you were talking about) and had
commented out EMAIL entries from all the following files:
StandardTokenizer.java
StandardTokenizer.jj
Stand
If I am not mistaken, that is for a term. Is it possible for a query? In
the example below, I don't want to know how many times "brown" is in the
document; I want to know how many times "quick brown" is in the document.
Thanks,
Jeff
On Dec 20, 2007 3:03 PM, Mark Miller <[EMAIL PROTECTED]> wrote:
>
Interesting. I am trying to make our logs searchable and thought of
trying Lucene. I am talking about several (around 50-60) 2 GB files to index.
Would it scale? How can I index a portion of a document? Also, as with any
log, there is a pattern and most of the stuff in there is redundant. Can I
discard
Lucene, by default, only indexes the first 10,000 tokens and throws
the rest away. You can change this via IndexWriter.setMaxFieldLength.
2 GB is a huge file. Are you indexing all of that, or are you indexing only
portions?
Erick
On Dec 20, 2007 5:20 PM, Baljeet Dhaliwal <[EMAIL PROTECTED]> wrote:
>
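For reference, the setting Erick mentions is per IndexWriter; a minimal
sketch (Integer.MAX_VALUE is just an example value, and raising the limit
trades memory for completeness):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
    // The default is 10,000 tokens per field; raise it if the whole file really
    // needs to be indexed.
    writer.setMaxFieldLength(Integer.MAX_VALUE);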
Hi Erick,
Thanks. I found something interesting. I was indexing huge text files (> 2 GB)
and the search was not returning escape characters. However, when I moved
the line to a smaller file (20 MB), it worked fine. Is there a limit on the
file size Lucene can search, or would you know how escape character
Karl,
I should have mentioned before: I am on Lucene 1.9.1.
In fact I had previously located the grammar in StandardTokenizer.jj (just
wasn't sure if that was the one you were talking about) and had commented
out EMAIL entries from all the following files:
StandardTokenizer.java
StandardTokenizer.jj
You can override the scoring system and only score by term frequency
(use a 1, or whatever creates a no-op, for the other factors). If you have
indexed with norms then you will have to use a Reader that ignores them
to do this.
- Mark
Jeff wrote:
I don't care about score, but I do care about t
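A sketch of the override Mark describes, extending DefaultSimilarity so that
only raw term frequency contributes; set it on the IndexSearcher and, if you
re-index, on the IndexWriter as well (norms still need handling, as he notes):

    import org.apache.lucene.search.DefaultSimilarity;

    // Neutralize every scoring factor except raw term frequency.
    public class FreqOnlySimilarity extends DefaultSimilarity {
        public float tf(float freq) { return freq; }   // raw frequency, no sqrt
        public float idf(int docFreq, int numDocs) { return 1.0f; }
        public float lengthNorm(String fieldName, int numTokens) { return 1.0f; }
        public float queryNorm(float sumOfSquaredWeights) { return 1.0f; }
        public float coord(int overlap, int maxOverlap) { return 1.0f; }
        public float sloppyFreq(int distance) { return 1.0f; }
    }

    // Usage:
    //   searcher.setSimilarity(new FreqOnlySimilarity());
    //   writer.setSimilarity(new FreqOnlySimilarity());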
I don't care about the score, but I do care about the number of times a query
was hit within a document. Example:
the quick brown fox jumped over the lazy dog
the quick brown fox jumped over the lazy dog
the quick brown fox jumped over the lazy dog
the quick brown fox jumped over the lazy dog
the slow b
On 20 Dec 2007, at 20:21, [EMAIL PROTECTED] wrote:
I would rather like to modify the lexer grammar. But where exactly is it
defined? After having a quick look, it seems like
StandardTokenizerTokenManager.java may be where it is being done.
http://svn.apache.org/repos/asf/lucene/java/trunk/src/java
Thanks Karl,
I would rather like to modify the lexer grammar. But where exactly is it
defined? After having a quick look, it seems like
StandardTokenizerTokenManager.java may be where it is being done.
The ampersand having a decimal value of 38, I was assuming that the
following step is taken when face
On 20 Dec 2007, at 18:43, [EMAIL PROTECTED] wrote:
I am using StandardAnalyzer for my indexes. Now I don't want to only be
able to search whole email addresses, and want to consider '@' as
punctuation too, because my users would rather be able to search for the
user id and/or the host name to return
I am using StandardAnalyzer for my indexes. Now I don't want to only be able
to search whole email addresses, and want to consider '@' as punctuation
too, because my users would rather be able to search for the user id and/or
the host name to return all the email addresses than searching by the
whole address
Use Luke. Google "Lucene Luke" and you'll find it.
You can use it to examine the contents of your index
in many different ways. It's invaluable when exploring
the different analyzers and making sure that your index
has what you *think* it has.
Erick
On Dec 19, 2007 10:48 PM, Baljeet Dhaliwal <
Have a look at the FunctionQuery capabilities in Lucene, whereby you
can use the value of a Field as a scoring factor. So, your
FunctionQuery would just do a simple calculation between the current
time and whatever date is in the document.
-Grant
On Dec 20, 2007, at 8:03 AM, prabin meitei
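Roughly along these lines, assuming a Lucene version that has the
org.apache.lucene.search.function package (CustomScoreQuery / FieldScoreQuery);
the "days" field, the decay formula, and the query itself are all assumptions:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.function.CustomScoreQuery;
    import org.apache.lucene.search.function.FieldScoreQuery;
    import org.apache.lucene.search.function.ValueSourceQuery;

    Query userQuery = new QueryParser("body", new StandardAnalyzer()).parse("quick brown");

    // "days" is a hypothetical numeric field holding the document date as days
    // since the epoch.
    ValueSourceQuery dateValues = new FieldScoreQuery("days", FieldScoreQuery.Type.FLOAT);
    final float nowInDays = System.currentTimeMillis() / 86400000f;

    Query boosted = new CustomScoreQuery(userQuery, dateValues) {
        public float customScore(int doc, float subQueryScore, float valSrcScore) {
            float ageInDays = nowInDays - valSrcScore;
            // Decay the text score smoothly with document age; the one-year
            // scale is arbitrary.
            return subQueryScore * (1.0f / (1.0f + ageInDays / 365.0f));
        }
    };

Because the calculation happens at search time, the factor never goes stale
the way an index-time boost does.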
Hi,
Looking into older threads and some googling, I came across some code
where boosting is done at indexing time using the time gap from the 'epoch'
or a *base time*. With this approach, what I am afraid of is that over a
period of time the boosting factor may go up and I may lose the relevance
f
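For what it's worth, the index-time variant found in those older threads
looks roughly like this (the timestamp source and the scale factor are
assumptions); the boost is frozen at index time and keeps growing with the
epoch offset, which is exactly the concern raised above:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    // docTimestampMillis: the document's own date in ms since the epoch
    // (assumed to be available); text and writer as usual.
    Document doc = new Document();
    doc.add(new Field("body", text, Field.Store.NO, Field.Index.TOKENIZED));
    float daysSinceEpoch = docTimestampMillis / 86400000f;
    doc.setBoost(daysSinceEpoch / 10000f);   // arbitrary scale; the value only grows
    writer.addDocument(doc);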
On Dec 20, 2007 8:31 AM, Tushar B <[EMAIL PROTECTED]> wrote:
> Hi Doron,
>
> Just filed an issue in JIRA.
Thanks!
>
>
> Here are the requested stats:
> Index size-> around 11 million documents
> Query -> fieldname:[009 TO 999] (using CSRQ)
ConstantScoreRangeQuery, right?
>
> Result
Brian,
Can you simply describe the method you tried? I am very interested in
that.
Jackson
2007/12/20, Brian Grimal <[EMAIL PROTECTED]>:
>
> I would love to revisit this one. I implemented pseudo date boosting in
> an overly simplistic manner in my app, which I know can be improved
> upon.
I would love to revisit this one. I implemented pseudo date boosting in an
overly simplistic manner in my app, which I know can be improved upon. Might
it be useful to reopen a thread on the topic?
Brian
-Original Message-
From: prabin meitei <[EMAIL PROTECTED]>
Sent: Wednesday, Dece