Re: Scaling Lucene to 1bln docs

2010-08-10 Thread prashant ullegaddi
-- Thanks and Regards, Prashant Ullegaddi, Search and Information Extraction Lab, IIIT-Hyderabad, India.

Retrieving field information for each hit when using "MultiFieldQueryParser"

2010-02-03 Thread prashant ullegaddi
tried using Explanation for each document, but found it very slow. I believe there has to be a faster alternative for achieving the same thing. -- Thanks and Regards, Prashant Ullegaddi, Search and Information Extraction Lab, IIIT-Hyderabad, INDIA.

Re: How to give a score for all documents?

2009-08-21 Thread prashant ullegaddi
If you want to modify the way Lucene scores documents, I guess you need to extend the Similarity class and provide your own implementation. Take a look at: http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/DefaultSimilarity.html http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/
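
For orientation: in Lucene 2.4, DefaultSimilarity uses tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)) and lengthNorm = 1 / sqrt(numTerms), and idf enters a term's score squared (once via the query weight, once via the document side). The stdlib-only sketch below mirrors those formulas in a plain class (it is not Lucene's API; queryNorm, coord and boosts are left out) and shows a hypothetical override that makes tf binary. In real code you would instead subclass org.apache.lucene.search.DefaultSimilarity, override the methods you care about, and install it with Searcher.setSimilarity() (and on the IndexWriter if norms should change too).

```java
/** Plain-Java stand-in for Lucene's Similarity hooks (illustrative only, not the Lucene API). */
public class SimilaritySketch {

    /** Mirrors the DefaultSimilarity formulas of Lucene 2.x. */
    static class DefaultLike {
        double tf(int freq) {
            return Math.sqrt(freq);
        }

        double idf(int docFreq, int numDocs) {
            return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
        }

        double lengthNorm(int numTerms) {
            return 1.0 / Math.sqrt(numTerms);
        }

        /** One term's contribution in one document, ignoring queryNorm, coord and boosts. */
        double termScore(int freq, int docFreq, int numDocs, int numTerms) {
            double idf = idf(docFreq, numDocs);
            return tf(freq) * idf * idf * lengthNorm(numTerms);
        }
    }

    /** Hypothetical custom scoring: any number of occurrences counts the same as one. */
    static class FlatTf extends DefaultLike {
        @Override
        double tf(int freq) {
            return freq > 0 ? 1.0 : 0.0;
        }
    }

    public static void main(String[] args) {
        DefaultLike def = new DefaultLike();
        DefaultLike flat = new FlatTf();
        // A term occurring 4 times in a 16-term document and present in 9 of 1000 documents.
        System.out.println(def.termScore(4, 9, 1000, 16));
        System.out.println(flat.termScore(4, 9, 1000, 16));
    }
}
```

Overriding only tf() leaves idf and length normalization untouched, so the two scores above differ exactly by the tf factor (2 versus 1).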

How to normalize Lucene score?

2009-08-16 Thread prashant ullegaddi
Hi, how can I normalize the Lucene score to the range [0, 1]? Thanks, Prashant.
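
A common approach (an assumption here, not something stated in the thread) is to rescale the scores of one result list after the search: either divide by the top score, so the best hit becomes 1.0, or apply min-max scaling. The sketch below does both with plain arrays; either way the normalized values are only comparable within a single query's results, not across queries.

```java
import java.util.Arrays;

public class ScoreNormalizer {

    /** Divides every score by the maximum, so the top hit maps to 1.0. */
    static double[] divideByMax(double[] scores) {
        double max = Arrays.stream(scores).max().orElse(0.0);
        if (max <= 0.0) {
            return new double[scores.length]; // nothing positive to scale: all zeros
        }
        return Arrays.stream(scores).map(s -> s / max).toArray();
    }

    /** Min-max scaling: the lowest hit maps to 0.0 and the highest to 1.0. */
    static double[] minMax(double[] scores) {
        double min = Arrays.stream(scores).min().orElse(0.0);
        double max = Arrays.stream(scores).max().orElse(0.0);
        double range = max - min;
        if (range == 0.0) {
            double[] ones = new double[scores.length];
            Arrays.fill(ones, 1.0); // every hit ties, so treat them all as top hits
            return ones;
        }
        return Arrays.stream(scores).map(s -> (s - min) / range).toArray();
    }

    public static void main(String[] args) {
        double[] raw = {3.2, 1.6, 0.8};
        System.out.println(Arrays.toString(divideByMax(raw))); // [1.0, 0.5, 0.25]
        System.out.println(Arrays.toString(minMax(raw)));
    }
}
```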

What happens after merging?

2009-08-05 Thread prashant ullegaddi
Hi, I have several indexes. As you all know, each one has these files: _0.fdt _0.fdx _hqy.fnm _hqy.frq _hqy.nrm _hqy.prx _hqy.tii _hqy.tis segments_2 segments.gen. Once I merge those indexes into a single index (with IndexWriter's addIndexes()), the merged index has only 3 files: _0.cfs segments_2 se

Re: How to improve search time?

2009-08-04 Thread prashant ullegaddi
want > to remove unnecessary stored fields from the index and move them to a > relational db to squeeze out better performance. > > > Shashi > > > On Tue, Aug 4, 2009 at 3:18 AM, prashant > ullegaddi wrote: > > I did that as well. Actually, we had 32 indexes init

Re: How to improve search time?

2009-08-04 Thread prashant ullegaddi
; The facts expressed here belong to everybody, the opinions to me. The > distinction is yours to draw > > > On Tue, Aug 4, 2009 at 10:08 AM, prashant ullegaddi < > prashullega...@gmail.com> wrote: > > > I'm running it on Quadcore, 2.4GHz each, 4GB R

Re: How to improve search time?

2009-08-03 Thread prashant ullegaddi
rs, you really ought to tell us about your > hardware, types of queries, etc. > > Otis > -- > Sematext is hiring -- http://sematext.com/about/jobs.html?mls > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > > > > ----- Original Message >

How to improve search time?

2009-08-02 Thread prashant ullegaddi
Hi, I have a single 87 GB index containing around 50M documents. For any query, the best search time I have observed is 8 seconds, and when the query is expanded with synonyms, the search takes minutes (~2-3 minutes). Is there a better way to search so that the overall search time is reduced? Thanks, Prashant

Re: Weird behaviour

2009-08-02 Thread prashant ullegaddi
eives the HOST token type, and breaks it further to > its > components (e.g., extract "en", "wikipedia" and "org"). You can also return > the original HOST token and its components. > > I hope this helps. > > Shai > > On Sun, Aug 2, 2009 at
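
Shai's suggestion can be sketched without Lucene as a function that takes one token and its type: HOST tokens are split on dots, optionally emitting the original host first, and every other token passes through unchanged. The type name "<HOST>" is assumed to match the StandardTokenizer type string; in Lucene 2.4 this logic would live in a custom TokenFilter that checks each Token's type().

```java
import java.util.ArrayList;
import java.util.List;

public class HostSplitter {

    /** Token type name assumed to match StandardTokenizer's HOST type. */
    static final String HOST_TYPE = "<HOST>";

    /**
     * Expands one token: HOST tokens are split on '.', other tokens pass through.
     * If keepOriginal is true, the full host is emitted before its components.
     */
    static List<String> expand(String text, String type, boolean keepOriginal) {
        List<String> out = new ArrayList<>();
        if (!HOST_TYPE.equals(type)) {
            out.add(text);
            return out;
        }
        if (keepOriginal) {
            out.add(text);
        }
        for (String part : text.split("\\.")) {
            if (!part.isEmpty()) {
                out.add(part);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("en.wikipedia.org", HOST_TYPE, true));
        // [en.wikipedia.org, en, wikipedia, org]
        System.out.println(expand("dravid", "<ALPHANUM>", true));
        // [dravid]
    }
}
```

A real filter would also have to choose position increments for the emitted parts, which determines how phrase queries over the url field match them.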

Re: Weird behaviour

2009-08-02 Thread prashant ullegaddi
d work... > +title:"rahul dravid" +url:"en.wikipedia.org" > > Thanks, > Phil > > On Sun, Aug 2, 2009 at 10:14 AM, prashant > ullegaddi wrote: > > Yes, I'm sure that title:"Rahul Dravid" is extracted properly, and there > is > >

Re: Weird behaviour

2009-08-02 Thread prashant ullegaddi
; field? > > You can read about Luke here: http://www.getopt.org/luke/. > > Can you do System.out.println(document.toString()) before you add it to the > index, and paste the output here? > > Shai > > On Sun, Aug 2, 2009 at 4:47 PM, prashant ullegaddi < > prashullega...@gmail.com >

Re: Weird behaviour

2009-08-02 Thread prashant ullegaddi
l Dravid" since you index it under > "url" and not "title". > 2) url:"wiki/Rahul_Dravid" works, since it looks for a phrase that exists > in > the index (look at the last 3 tokens produced by the Analyzer, in the > output > above). > 3) ur:"

Weird behaviour

2009-08-02 Thread prashant ullegaddi
Hi, I have indexed some 50 million documents. I indexed the target URL of each document as the "url" field, using StandardAnalyzer with Field.Index.ANALYZED. Suppose there is a Wikipedia page with title:"Rahul Dravid" and url: http://en.wikipedia.org/wiki/Rahul_Dravid. But when I search for +title:"Rahu

Re: Is there any difference between using QueryParser and MultiFieldQueryParser when have single default search field ?

2009-07-31 Thread prashant ullegaddi
With MultiFieldQueryParser, you can specify several fields of the document to be searched. E.g., if you index the contents of a document as different fields such as URL, BOLD, and ITALIC, you can search over all of them at once. Additionally, you can boost one field over the others.
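
Roughly, MultiFieldQueryParser rewrites each query term into an OR over that term in every listed field, applying per-field boosts when they are supplied (the 2.4 constructor accepts a Map of boosts). The string-building sketch below imitates that expansion for plain, already-analyzed single terms only; the field names and boost values are hypothetical, and the query Lucene actually builds (phrases, operators, analysis) can differ in detail.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class MultiFieldExpansion {

    /** Expands one term into "(f1:term^b1 f2:term^b2 ...)"; a boost of 1.0 is omitted. */
    static String expandTerm(String term, Map<String, Float> fieldBoosts) {
        StringJoiner clauses = new StringJoiner(" ", "(", ")");
        for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
            String clause = e.getKey() + ":" + term;
            if (e.getValue() != 1.0f) {
                clause += "^" + e.getValue();
            }
            clauses.add(clause);
        }
        return clauses.toString();
    }

    /** Expands every whitespace-separated term; groups are OR-ed (default operator). */
    static String expandQuery(String query, Map<String, Float> fieldBoosts) {
        StringJoiner groups = new StringJoiner(" ");
        for (String term : query.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                groups.add(expandTerm(term, fieldBoosts));
            }
        }
        return groups.toString();
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>(); // hypothetical fields and boosts
        boosts.put("url", 2.0f);
        boosts.put("bold", 1.5f);
        boosts.put("italic", 1.0f);
        System.out.println(expandQuery("rahul dravid", boosts));
        // (url:rahul^2.0 bold:rahul^1.5 italic:rahul) (url:dravid^2.0 bold:dravid^1.5 italic:dravid)
    }
}
```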

Re: Boosting Search Results

2009-07-31 Thread prashant ullegaddi
It might be because there are hardly any documents containing both words. Try an exact phrase search: "\"tall fat\"" On Fri, Jul 31, 2009 at 3:31 PM, bourne71 wrote: > > Hi, new here. > > I recently started using lucene and had encounter a problem.I crawl and > index a number of documents. > When i pe

Re: Term's frequency

2009-07-31 Thread prashant ullegaddi
Thanks Ahmet. This answers my question. On Fri, Jul 31, 2009 at 1:30 PM, AHMET ARSLAN wrote: > > > > Given a term say "apache", I want to look up the lucene index > > programmatically to find out its frequency in the corpus. > > I think you are asking collection frequency of a term. Term Frequen
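
To make the distinction concrete: document frequency counts how many documents contain a term, while collection frequency sums the term's occurrences across all documents. With the Lucene 2.4 API, the former is IndexReader.docFreq(term), and the latter can be computed by iterating IndexReader.termDocs(term) and summing TermDocs.freq(). The stdlib-only sketch below computes both over a toy in-memory corpus; the corpus and the lowercase whitespace tokenization are assumptions made for illustration.

```java
import java.util.List;

public class TermStats {

    /** Number of documents containing the (lowercase) term at least once. */
    static int docFreq(List<String> docs, String term) {
        int df = 0;
        for (String doc : docs) {
            for (String tok : doc.toLowerCase().split("\\s+")) {
                if (tok.equals(term)) {
                    df++;
                    break; // count each document only once
                }
            }
        }
        return df;
    }

    /** Total number of occurrences of the (lowercase) term across all documents. */
    static long collectionFreq(List<String> docs, String term) {
        long cf = 0;
        for (String doc : docs) {
            for (String tok : doc.toLowerCase().split("\\s+")) {
                if (tok.equals(term)) {
                    cf++;
                }
            }
        }
        return cf;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "apache lucene is an apache project",
                "hadoop is also from apache",
                "nutch builds on lucene");
        System.out.println(docFreq(docs, "apache"));        // 2
        System.out.println(collectionFreq(docs, "apache")); // 3
    }
}
```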

Re: Term's frequency

2009-07-31 Thread prashant ullegaddi
Given a term say "apache", I want to look up the lucene index programmatically to find out its frequency in the corpus. On Fri, Jul 31, 2009 at 12:23 AM, wrote: > > prashant ullegaddi wrote: > > How to get the number of times a term occurs in the Lucene

Term's frequency

2009-07-30 Thread prashant ullegaddi
How can I get the number of times a term occurs in the Lucene index? Regards, Prashant.

Re: PageRanking with Lucene

2009-07-22 Thread prashant ullegaddi
> > > On Jul 19, 2009, at 7:55 AM, prashant ullegaddi wrote: > > Hi, >> >> We have some 50M pages, and we also have computed PageRanks of those >> pages. >> What's the best way to combine lucene's score with PageRank? >> >> Regards, >>

Re: indexing 100GB of data

2009-07-22 Thread prashant ullegaddi
Yes, you can use Hadoop with Lucene. You can borrow some code from Nutch: look at org.apache.nutch.indexer.IndexerMapReduce and org.apache.nutch.indexer.Indexer. Prashant. On Wed, Jul 22, 2009 at 2:00 PM, m.harig wrote: > > Thanks Shai > > So there won't be problem when searching that kind of

PageRanking with Lucene

2009-07-19 Thread prashant ullegaddi
Hi, we have some 50M pages, and we have also computed the PageRanks of those pages. What's the best way to combine Lucene's score with PageRank? Regards, Prashant.
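
One simple option (an assumption, not an answer taken from the thread) is to rerank the top Lucene hits by a weighted mix of the text score and a log-damped PageRank, both scaled to [0, 1] within the result list; the weight alpha and the log1p damping are hypothetical tuning choices. Within Lucene itself, a coarser alternative would be an index-time document boost (Document.setBoost) derived from PageRank.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PageRankRerank {

    /** One search hit: an id, its Lucene score, and a PageRank computed offline. */
    static final class Hit {
        final String id;
        final double luceneScore;
        final double pageRank;

        Hit(String id, double luceneScore, double pageRank) {
            this.id = id;
            this.luceneScore = luceneScore;
            this.pageRank = pageRank;
        }
    }

    /** alpha * (score / maxScore) + (1 - alpha) * (log(1 + pr) / maxLogPr). */
    static double combined(Hit h, double alpha, double maxScore, double maxLogPr) {
        double text = maxScore > 0 ? h.luceneScore / maxScore : 0.0;
        double link = maxLogPr > 0 ? Math.log1p(h.pageRank) / maxLogPr : 0.0;
        return alpha * text + (1 - alpha) * link;
    }

    /** Returns a copy of the hits sorted by the blended score, best first. */
    static List<Hit> rerank(List<Hit> hits, double alpha) {
        double maxScore = hits.stream().mapToDouble(h -> h.luceneScore).max().orElse(0.0);
        double maxLogPr = hits.stream().mapToDouble(h -> Math.log1p(h.pageRank)).max().orElse(0.0);
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble(
                (Hit h) -> combined(h, alpha, maxScore, maxLogPr)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("a", 2.0, 0.001),
                new Hit("b", 1.8, 5.0),
                new Hit("c", 0.5, 0.01));
        for (Hit h : rerank(hits, 0.7)) {
            System.out.println(h.id); // prints b, a, c (one per line)
        }
    }
}
```

With alpha = 1.0 the order falls back to the pure Lucene ranking; lowering alpha lets strongly linked pages such as "b" overtake slightly better text matches.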

Re: Unable to do exact search with Lucene.

2009-07-17 Thread prashant ullegaddi
t's there. Nothing in your e-mails indicates that you >> *should* get any hits. Although I admit not getting jakarta lucene in >> 50M pages seems unlikely. >> >> But Ian's suggestion that you start with a smaller index is spot on. >> >> Best >> E

Re: Unable to find: org.apache.lucene.index.memory.AnalyzerUtil

2009-07-17 Thread prashant ullegaddi
> > On Thu, Jul 16, 2009 at 9:23 PM, prashant ullegaddi < > prashullega...@gmail.com> wrote: > > > Hi > > > > I'm unable to find this class in lucene-core-2.4.1.jar. Is there other > jar > > file I need to > > download to get this? > > > > Regards, > > Prashant. > > >

Unable to find: org.apache.lucene.index.memory.AnalyzerUtil

2009-07-16 Thread prashant ullegaddi
Hi, I'm unable to find this class in lucene-core-2.4.1.jar. Is there another jar file I need to download to get it? Regards, Prashant.

Re: Unable to do exact search with Lucene.

2009-07-16 Thread prashant ullegaddi
to draw.... > > > On Thu, Jul 16, 2009 at 6:04 PM, prashant ullegaddi < > prashullega...@gmail.com> wrote: > > > Hi, > > > > I tried searching: > > "Apache Jakarta"~10 > > > > Nothing was returned. What might be wrong? > > > > Regards, > > Prashant. > > >

Re: Unable to do exact search with Lucene.

2009-07-16 Thread prashant ullegaddi
Sorry, the subject should have been "Unable to do proximity search". Also, how can I do an exact search in Lucene? ~ Prashant On Thu, Jul 16, 2009 at 6:04 PM, prashant ullegaddi < prashullega...@gmail.com> wrote: > Hi, > > I tried searching: > "Apache Jakarta"~10 > > N

Unable to do exact search with Lucene.

2009-07-16 Thread prashant ullegaddi
Hi, I tried searching: "Apache Jakarta"~10 Nothing was returned. What might be wrong? Regards, Prashant.
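
As a rough mental model of the ~10 syntax (a simplification, not Lucene's actual matching code): for a two-term phrase "A B", the slop needed is how far B's position is from "immediately after A", i.e. |pB - pA - 1|, so an exact adjacent match needs 0 and the swapped pair "B A" needs 2. The sketch below applies that rule to lowercase whitespace-tokenized text; the tokenization and example strings are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SlopSketch {

    /** 0-based positions of a term in lowercased, whitespace-tokenized text. */
    static List<Integer> positions(String text, String term) {
        List<Integer> out = new ArrayList<>();
        String[] tokens = text.toLowerCase().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                out.add(i);
            }
        }
        return out;
    }

    /**
     * Smallest slop at which the phrase "first second" matches, or -1 if a term is missing.
     * Uses |pb - pa - 1|; assumes first and second are different terms.
     */
    static int minSlop(String text, String first, String second) {
        int best = -1;
        for (int pa : positions(text, first)) {
            for (int pb : positions(text, second)) {
                int slop = Math.abs(pb - pa - 1);
                if (best < 0 || slop < best) {
                    best = slop;
                }
            }
        }
        return best;
    }

    /** True if "first second"~slop would match under this model. */
    static boolean matches(String text, String first, String second, int slop) {
        int needed = minSlop(text, first, second);
        return needed >= 0 && needed <= slop;
    }

    public static void main(String[] args) {
        String doc = "the apache software foundation runs the jakarta project";
        System.out.println(minSlop(doc, "apache", "jakarta"));              // 4
        System.out.println(matches(doc, "apache", "jakarta", 10));          // true
        System.out.println(minSlop("jakarta apache", "apache", "jakarta")); // 2
    }
}
```

An exact phrase search is the same query with slop 0, i.e. "Apache Jakarta" without the ~10.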