--
Thanks and Regards,
Prashant Ullegaddi,
Search and Information Extraction Lab,
IIIT-Hyderabad, India.
tried using Explanation for each document, but found it very slow. I believe
there has to be a faster alternative that achieves the same thing.
If you want to change the way Lucene scores documents, you probably need to
extend the Similarity class (for example, by subclassing DefaultSimilarity)
and provide your own implementation. Take a look at:
http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/DefaultSimilarity.html
http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/
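To see what a Similarity subclass actually changes, here is a plain-Java sketch (no Lucene dependency) of the per-term factors documented for DefaultSimilarity in Lucene 2.4: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)) and lengthNorm = 1 / sqrt(numTerms). SimilarityModel, FlatTfSimilarity and SimilarityDemo are hypothetical names for illustration only; in real code you would subclass org.apache.lucene.search.DefaultSimilarity and override the corresponding methods. The sketch leaves out queryNorm, coord, boosts and the lossy one-byte encoding of norms.

```java
// Hypothetical plain-Java model of the per-term factors in Lucene 2.4's DefaultSimilarity.
// It is not Lucene's API; real code would subclass org.apache.lucene.search.DefaultSimilarity.
class SimilarityModel {
    // tf: square root of the raw term frequency in the document.
    float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // idf: 1 + ln(numDocs / (docFreq + 1)), so rare terms weigh more.
    float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // lengthNorm: 1 / sqrt(number of terms in the field), so short fields score higher.
    float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // One term's contribution, tf * idf^2 * lengthNorm, ignoring queryNorm, coord and boosts.
    float termScore(float freq, int docFreq, int numDocs, int numTerms) {
        float idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * lengthNorm(numTerms);
    }
}

// Hypothetical custom model: counts a term once per document, however often it repeats.
class FlatTfSimilarity extends SimilarityModel {
    @Override
    float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}

class SimilarityDemo {
    public static void main(String[] args) {
        SimilarityModel standard = new SimilarityModel();
        SimilarityModel flat = new FlatTfSimilarity();
        // Same statistics (term occurs 4 times, docFreq 9 of 1000 docs, 16-term field),
        // but the flat model no longer rewards repetition.
        System.out.println(standard.termScore(4f, 9, 1000, 16));
        System.out.println(flat.termScore(4f, 9, 1000, 16));
    }
}
```

Overriding only tf() leaves idf and length normalization untouched, which is the usual pattern: change one factor at a time and compare rankings.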
Hi,
How can I normalize the Lucene score to the range [0, 1]?
Thanks,
Prashant.
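One common approach, sketched below under the assumption that all scores are non-negative (as they are with the default scoring), is to divide every hit's score by the top hit's score, so the best hit gets 1.0. The result is only relative to that one query: normalized scores from different queries are not comparable. ScoreNormalizer is a hypothetical helper, not part of Lucene.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Lucene): rescales one query's hit scores into [0, 1].
final class ScoreNormalizer {
    // Divides every score by the highest one, so the top hit gets exactly 1.0.
    // Assumes scores are non-negative, as they are with Lucene's default scoring.
    static float[] normalizeByMax(float[] scores) {
        float max = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        float[] normalized = new float[scores.length];
        if (max <= 0f) {
            return normalized; // no hits, or all scores zero: leave everything at 0
        }
        for (int i = 0; i < scores.length; i++) {
            normalized[i] = scores[i] / max;
        }
        return normalized;
    }

    public static void main(String[] args) {
        float[] scores = {2.0f, 1.0f, 0.5f}; // made-up raw scores for three hits
        System.out.println(Arrays.toString(normalizeByMax(scores))); // [1.0, 0.5, 0.25]
    }
}
```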
Hi,
I have several indexes. As you all know, each one has these files:
_0.fdt _0.fdx _hqy.fnm _hqy.frq _hqy.nrm _hqy.prx _hqy.tii _hqy.tis
segments_2 segments.gen
Once I merge those indexes into a single index (using IndexWriter's
addIndexes()), the merged index has only 3 files:
_0.cfs segments_2 se
want
> to remove unnecessary stored fields from the index and move them to a
> relational db to squeeze out better performance.
>
>
> Shashi
>
>
> On Tue, Aug 4, 2009 at 3:18 AM, prashant
> ullegaddi wrote:
> > I did that as well. Actually, we had 32 indexes init
> The facts expressed here belong to everybody, the opinions to me. The
> distinction is yours to draw
>
>
> On Tue, Aug 4, 2009 at 10:08 AM, prashant ullegaddi <
> prashullega...@gmail.com> wrote:
>
> > I'm running it on Quadcore, 2.4GHz each, 4GB R
rs, you really ought to tell us about your
> hardware, types of queries, etc.
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
>
> ----- Original Message
>
Hi,
I have a single 87GB index containing around 50M documents. The best search
time I have observed for any query is 8 seconds, and when the query is
expanded with synonyms, the search takes minutes (~2-3 min). Is there a
better way to search so that the overall search time is reduced?
Thanks,
Prashant
eives the HOST token type and breaks it further into
> its components (e.g., extracts "en", "wikipedia" and "org"). You can also
> return the original HOST token along with its components.
>
> I hope this helps.
>
> Shai
>
> On Sun, Aug 2, 2009 at
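A plain-string sketch of the filter logic Shai describes, assuming the host has already been recognized as a single token (e.g. "en.wikipedia.org"). HostSplitter is a hypothetical helper, not Lucene's TokenFilter API; a real filter would also have to choose position increments for the extra tokens (for example, whether the components stack on the original token's position).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper mimicking the filter logic described above, on plain strings.
// It is not Lucene's TokenFilter API and ignores token types and position increments.
final class HostSplitter {
    // Splits a host token such as "en.wikipedia.org" on dots. If keepOriginal is true,
    // the unsplit host is emitted first, followed by its components.
    static List<String> expand(String host, boolean keepOriginal) {
        List<String> tokens = new ArrayList<>();
        if (keepOriginal) {
            tokens.add(host);
        }
        for (String part : host.split("\\.")) {
            if (!part.isEmpty()) {
                tokens.add(part); // skip empty pieces caused by stray or doubled dots
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(expand("en.wikipedia.org", true)); // [en.wikipedia.org, en, wikipedia, org]
    }
}
```

Keeping the original token means url:"en.wikipedia.org" still matches, while the components make url:wikipedia match as well.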
d work...
> +title:"rahul dravid" +url:"en.wikipedia.org"
>
> Thanks,
> Phil
>
> On Sun, Aug 2, 2009 at 10:14 AM, prashant
> ullegaddi wrote:
> > Yes, I'm sure that title:"Rahul Dravid" is extracted properly, and there is
> >
; field?
>
> You can read about Luke here: http://www.getopt.org/luke/.
>
> Can you do System.out.println(document.toString()) before you add it to the
> index, and paste the output here?
>
> Shai
>
> On Sun, Aug 2, 2009 at 4:47 PM, prashant ullegaddi <
> prashullega...@gmail.com
>
l Dravid" since you index it under
> "url" and not "title".
> 2) url:"wiki/Rahul_Dravid" works, since it looks for a phrase that exists in
> the index (look at the last 3 tokens produced by the Analyzer in the output
> above).
> 3) ur:"
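To make point 2 concrete, here is a rough plain-Java model of tokenizing the URL and matching a phrase against it. The tokenizer is an assumption that only approximates StandardAnalyzer: it lowercases, keeps dotted hosts such as "en.wikipedia.org" as one token, and splits everything else on non-alphanumeric ASCII characters. It is not StandardAnalyzer's actual grammar, and UrlPhraseDemo is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified stand-in for analyzing a "url" field and running a phrase query.
final class UrlPhraseDemo {
    // Lowercases, then splits on anything that is not an ASCII letter, digit or dot,
    // so a dotted host like "en.wikipedia.org" survives as a single token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^a-z0-9.]+")) {
            String t = piece.replaceAll("^\\.+|\\.+$", ""); // drop leading/trailing dots
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Exact phrase match: the query tokens must appear consecutively in the field tokens.
    static boolean containsPhrase(List<String> field, List<String> phrase) {
        if (phrase.isEmpty()) {
            return false;
        }
        for (int start = 0; start + phrase.size() <= field.size(); start++) {
            if (field.subList(start, start + phrase.size()).equals(phrase)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> url = tokenize("http://en.wikipedia.org/wiki/Rahul_Dravid");
        System.out.println(url); // [http, en.wikipedia.org, wiki, rahul, dravid]
        System.out.println(containsPhrase(url, tokenize("wiki/Rahul_Dravid"))); // true
        System.out.println(containsPhrase(url, tokenize("wikipedia")));         // false
    }
}
```

In this model "wiki/Rahul_Dravid" becomes the phrase wiki rahul dravid, which occurs consecutively, while "wikipedia" alone finds nothing because the host stays a single token.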
Hi,
I've indexed some 50 million documents. I indexed the target URL of each
document as the "url" field, using StandardAnalyzer with Field.Index.ANALYZED.
Suppose there is a Wikipedia page with title:"Rahul Dravid" and
url: http://en.wikipedia.org/wiki/Rahul_Dravid.
But when I search for +title:"Rahu
With MultiFieldQueryParser, you can specify several fields of the document
to be searched. For example, if you index different parts of the content as
separate fields such as URL, BOLD and ITALIC, you can search over all of them.
You can also boost one field over the others.
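Roughly, MultiFieldQueryParser turns each query term into a group of clauses, one per field, with the field boosts attached. The sketch below only builds a query string of that shape to illustrate the expansion; MultiFieldSketch is hypothetical, the field names url/bold/italic and the boost of 2.0 are made-up examples, and the exact string Lucene's Query.toString() prints may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a multi-field expansion; it builds a string, not a Lucene Query.
final class MultiFieldSketch {
    // For each term, ORs it across every field (in map order), adding "^boost" when the
    // boost is not 1.0. Groups are space-separated, i.e. optional under the default OR operator.
    static String expand(List<String> terms, Map<String, Float> fieldBoosts) {
        List<String> groups = new ArrayList<>();
        for (String term : terms) {
            List<String> clauses = new ArrayList<>();
            for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
                String clause = e.getKey() + ":" + term;
                if (e.getValue() != 1.0f) {
                    clause += "^" + e.getValue();
                }
                clauses.add(clause);
            }
            groups.add("(" + String.join(" ", clauses) + ")");
        }
        return String.join(" ", groups);
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("url", 2.0f); // made-up example boost
        boosts.put("bold", 1.0f);
        boosts.put("italic", 1.0f);
        System.out.println(expand(Arrays.asList("tall", "fat"), boosts));
        // (url:tall^2.0 bold:tall italic:tall) (url:fat^2.0 bold:fat italic:fat)
    }
}
```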
It might be because there are hardly any documents containing both words.
Try an exact (phrase) search: "\"tall fat\""
On Fri, Jul 31, 2009 at 3:31 PM, bourne71 wrote:
>
> Hi, new here.
>
> I recently started using Lucene and have encountered a problem. I crawl and
> index a number of documents.
> When i pe
Thanks Ahmet. This answers my question.
On Fri, Jul 31, 2009 at 1:30 PM, AHMET ARSLAN wrote:
>
>
> > Given a term say "apache", I want to look up the lucene index
> > programmatically to find out its frequency in the corpus.
>
> I think you are asking collection frequency of a term. Term Frequen
Given a term say "apache", I want to look up the lucene index
programmatically to find out its frequency in the corpus.
On Fri, Jul 31, 2009 at 12:23 AM, wrote:
>
> prashant ullegaddi wrote:
> > How to get the number of times a term occurs in the Lucene
How can I get the number of times a term occurs in the Lucene index?
Regards,
Prashant.
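Ahmet's distinction between document frequency (how many documents contain the term) and collection frequency (total occurrences across the corpus) can be shown with a small in-memory corpus. TermCounts is a hypothetical helper; against a real Lucene 2.4 index, IndexReader.docFreq(term) gives the first number, and the second can be obtained by iterating IndexReader.termDocs(term) and summing TermDocs.freq().

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper over an in-memory corpus: each document is its list of tokens.
final class TermCounts {
    // Document frequency: number of documents containing the term at least once.
    static int docFreq(List<List<String>> corpus, String term) {
        int df = 0;
        for (List<String> doc : corpus) {
            if (doc.contains(term)) {
                df++;
            }
        }
        return df;
    }

    // Collection frequency: total number of occurrences of the term in all documents.
    static long collectionFreq(List<List<String>> corpus, String term) {
        long cf = 0;
        for (List<String> doc : corpus) {
            for (String token : doc) {
                if (token.equals(term)) {
                    cf++;
                }
            }
        }
        return cf;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("apache", "lucene", "apache"),
                Arrays.asList("apache", "hadoop"),
                Arrays.asList("solr"));
        System.out.println(docFreq(corpus, "apache"));        // 2
        System.out.println(collectionFreq(corpus, "apache")); // 3
    }
}
```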
>
>
> On Jul 19, 2009, at 7:55 AM, prashant ullegaddi wrote:
>
> Hi,
>>
>> We have some 50M pages, and we also have computed PageRanks of those
>> pages.
>> What's the best way to combine lucene's score with PageRank?
>>
>> Regards,
>>
Yes, you can use Hadoop with Lucene. You can borrow some code from Nutch; look at
org.apache.nutch.indexer.IndexerMapReduce and org.apache.nutch.indexer.Indexer.
Prashant.
On Wed, Jul 22, 2009 at 2:00 PM, m.harig wrote:
>
> Thanks Shai
>
> So there won't be problem when searching that kind of
Hi,
We have some 50M pages, and we also have computed PageRanks of those pages.
What's the best way to combine lucene's score with PageRank?
Regards,
Prashant.
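One simple way to combine them, sketched below, is a weighted linear blend: scale the Lucene scores to [0, 1] by dividing by the top score, assume the PageRank values are already scaled to [0, 1], and re-rank the hits by alpha * text + (1 - alpha) * pageRank. RankBlender and the weight alpha are hypothetical and would need tuning on your data; this is one option among several (an index-time document boost is another), not a built-in Lucene feature.

```java
import java.util.Arrays;

// Hypothetical helper: blends Lucene relevance with a precomputed PageRank per hit.
final class RankBlender {
    // Weighted linear blend; both inputs are assumed to be scaled to [0, 1].
    // alpha = 1 ranks purely by text relevance, alpha = 0 purely by PageRank.
    static double blend(double textScore, double pageRank, double alpha) {
        if (alpha < 0 || alpha > 1) {
            throw new IllegalArgumentException("alpha must be in [0, 1]");
        }
        return alpha * textScore + (1 - alpha) * pageRank;
    }

    // Returns hit indices ordered by blended score, best first. textScores are raw
    // Lucene scores (rescaled here by the top score); pageRanks[i] belongs to hit i.
    static Integer[] rerank(float[] textScores, double[] pageRanks, double alpha) {
        float max = 0f;
        for (float s : textScores) {
            max = Math.max(max, s);
        }
        double[] blended = new double[textScores.length];
        for (int i = 0; i < textScores.length; i++) {
            double text = max > 0f ? textScores[i] / max : 0.0;
            blended[i] = blend(text, pageRanks[i], alpha);
        }
        Integer[] order = new Integer[textScores.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(blended[b], blended[a]));
        return order;
    }

    public static void main(String[] args) {
        float[] lucene = {4.0f, 2.0f, 1.0f}; // made-up raw scores
        double[] pageRank = {0.1, 0.9, 0.5}; // made-up, already scaled to [0, 1]
        System.out.println(Arrays.toString(rerank(lucene, pageRank, 0.5))); // [1, 0, 2]
    }
}
```

Because the Lucene part is rescaled per query, alpha keeps the same meaning across queries; only re-ranking the top N hits keeps the extra cost small on a 50M-document index.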
t's there. Nothing in your e-mails indicates that you
>> *should* get any hits. Although I admit not getting jakarta lucene in
>> 50M pages seems unlikely.
>>
>> But Ian's suggestion that you start with a smaller index is spot on.
>>
>> Best
>> E
>
> On Thu, Jul 16, 2009 at 9:23 PM, prashant ullegaddi <
> prashullega...@gmail.com> wrote:
>
> > Hi
> >
> > I'm unable to find this class in lucene-core-2.4.1.jar. Is there other
> jar
> > file I need to
> > download to get this?
> >
> > Regards,
> > Prashant.
> >
>
Hi
I'm unable to find this class in lucene-core-2.4.1.jar. Is there another jar
file I need to download to get it?
Regards,
Prashant.
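If it is unclear which jar a class lives in, the jar's entries can be listed with the JDK's java.util.jar.JarFile. JarClassFinder is a hypothetical helper; you would run it against lucene-core-2.4.1.jar and any contrib jars you have, passing the simple name of the class you are looking for.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: finds entries like "org/apache/.../Foo.class" for a simple name "Foo".
final class JarClassFinder {
    static List<String> find(File jar, String simpleName) throws IOException {
        List<String> hits = new ArrayList<>();
        String suffix = "/" + simpleName + ".class";
        try (JarFile jf = new JarFile(jar)) { // closes the jar even if reading fails
            Enumeration<JarEntry> entries = jf.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                // Match classes in any package, or in the default package.
                if (name.endsWith(suffix) || name.equals(simpleName + ".class")) {
                    hits.add(name);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java JarClassFinder <jar file> <simple class name>");
            return;
        }
        for (String hit : find(new File(args[0]), args[1])) {
            System.out.println(hit);
        }
    }
}
```

An empty result for lucene-core-2.4.1.jar suggests the class ships in a separate (for example contrib) jar.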
>
>
> On Thu, Jul 16, 2009 at 6:04 PM, prashant ullegaddi <
> prashullega...@gmail.com> wrote:
>
> > Hi,
> >
> > I tried searching:
> > "Apache Jakarta"~10
> >
> > Nothing was returned. What might be wrong?
> >
> > Regards,
> > Prashant.
> >
>
Sorry, the subject should have been: Unable to do proximity search.
Also, how do I do an exact search in Lucene?
~
Prashant
On Thu, Jul 16, 2009 at 6:04 PM, prashant ullegaddi <
prashullega...@gmail.com> wrote:
> Hi,
>
> I tried searching:
> "Apache Jakarta"~10
>
> N
Hi,
I tried searching:
"Apache Jakarta"~10
Nothing was returned. What might be wrong?
Regards,
Prashant.
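A simplified model of how the slop in "Apache Jakarta"~10 is measured, as I understand Lucene's sloppy phrase matching: each term's position is shifted back by its offset in the phrase, and the document matches if the shifted positions are within slop of each other. Slop 0 is an exact phrase search, and the same two words in reversed order need a slop of at least 2. SloppyPhrase is a hypothetical two-term sketch, not Lucene's implementation; it assumes the field was indexed with positions and that the tokens are already lowercased, as StandardAnalyzer would leave them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical two-term model of a sloppy phrase query such as "apache jakarta"~10.
// Not Lucene's implementation; assumes `tokens` are the field's analyzed (lowercased)
// tokens in position order.
final class SloppyPhrase {
    static boolean matches(List<String> tokens, String first, String second, int slop) {
        String a = first.toLowerCase(Locale.ROOT);
        String b = second.toLowerCase(Locale.ROOT);
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(a)) {
                continue;
            }
            for (int j = 0; j < tokens.size(); j++) {
                if (j == i || !tokens.get(j).equals(b)) {
                    continue;
                }
                // Shift the second term back by its offset (1) in the phrase; the distance
                // between shifted positions must not exceed the slop. Slop 0 = exact phrase.
                if (Math.abs((j - 1) - i) <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("the", "apache", "software", "foundation", "runs", "jakarta");
        System.out.println(matches(doc, "Apache", "Jakarta", 0));  // false
        System.out.println(matches(doc, "Apache", "Jakarta", 10)); // true
        List<String> reversed = Arrays.asList("jakarta", "apache");
        System.out.println(matches(reversed, "Apache", "Jakarta", 2)); // true: reversed order costs 2
    }
}
```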