Re: Lucene in Action

2007-10-09 Thread Peter W.
Query, facets, boosting, duplicates, span queries and highlighting stuff. Even Luke and the other Lucene related projects. Thanks, Peter W. Otis Gospodnetic wrote: Peter - LIA2 is in progress! :) LIA2IP? - To unsubscribe, e

Lucene in Action

2007-10-09 Thread Peter W.
Hello, How is progress on the new Lucene in Action coming? Thanks, Peter W. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: How to keep user search history and how to turn it into information?

2007-08-14 Thread Peter W.
Lukas, One last thing, be sure to log only when a user clicks on a result and in Hadoop document_id will be a key in the map phase. Lucene related steps are the same. Best, Peter W. On Aug 14, 2007, at 1:28 PM, Peter W. wrote: When users perform a search, log the unique document_id, IP
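The click-logging idea above can be sketched in plain Java without Hadoop or Lucene. This is only an illustration: the tab-separated log layout and the trailing query column are assumptions, and `ClickCounter` is a hypothetical helper, not something from the thread. It treats document_id as the key, as the map phase would, and sums clicks per key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: counts clicks per document_id from click-log lines.
// Assumed line layout (not from the thread): document_id<TAB>ip<TAB>query
class ClickCounter {
    static Map<String, Integer> countClicks(List<String> logLines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : logLines) {
            if (line.isEmpty()) {
                continue; // ignore blank lines in the log
            }
            // "Map" step: the first column, document_id, is the key.
            String documentId = line.split("\t", 2)[0];
            // "Reduce" step: sum one click per occurrence of the key.
            counts.merge(documentId, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "doc42\t10.0.0.1\tlucene paging",
            "doc7\t10.0.0.2\trange filter",
            "doc42\t10.0.0.3\tlucene paging");
        System.out.println(countClicks(log));
    }
}
```

The resulting counts could then be written back into the index as a per-document field, which is the Lucene-related step the message refers to.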

Re: How to keep user search history and how to turn it into information?

2007-08-14 Thread Peter W.
rt results (reverse order) by score field. A more advanced version could store previous result positions as Payloads but I don't understand this new Lucene concept. Regards, Peter W. On Aug 10, 2007, at 5:56 AM, Lukas Vlcek wrote: Enis, Thanks for your time. I gave a quick glance at Pig an

Re: In memory MultiSearcher

2007-05-22 Thread Peter W.
)) }; MultiReader mr=new MultiReader(indexr_a); IndexSearcher is=new IndexSearcher(mr); Regards, Peter W. On May 22, 2007, at 1:10 AM, Chris Hostetter wrote: ...and if you are "Multi Searching" over a bunch of local directories anyway, then use a single IndexSea

Re: In memory MultiSearcher

2007-05-22 Thread Peter W.
a try in the servlet init() method! Regards, Peter W. On May 21, 2007, at 2:46 PM, Erick Erickson wrote: Why are you doing this in the first place? Do you actually have evidence that the default Lucene behavior (caching, etc) is inadequate for your needs? I'd *strongly* recommend, if y

In memory MultiSearcher

2007-05-21 Thread Peter W.
(searcher_a); ... } catch(Exception e) { System.out.println(e); } For example, one of several indexes is 768MB. Is there possibly a better way to do this? Regards, Peter W.

Re: Design question

2007-04-13 Thread Peter W.
which Lucene index user data is written to. Another option to consider is Solr. Regards, Peter W. On Apr 13, 2007, at 1:39 AM, Dan Wiggin wrote: But so often, when a developer searches for how to work with Lucene, they normally find the same code for the same problems. I think it will be useful creat

Re: Range search in numeric fields

2007-04-04 Thread Peter W.
thn_f,morethn_f}; Filter rf=new ChainedFilter(fa,ChainedFilter.AND); return rf; } It's more expensive at index time, has a bigger storage requirement and is slower than in-memory but should give the desired functionality. Regards, Peter W. On Apr 3, 2007, at 10:59 AM, Andy Liu wrote:

Re: precision double sortable String

2007-04-02 Thread Peter W.
One more thing... It could optionally be indexed and stored as a String then contents of the Hits object could be placed into a Collection with a comparator that sorts double values in reverse order. Regards, Peter W. On Apr 2, 2007, at 12:02 PM, "Peter W." <[EMAIL PROTECT
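A minimal sketch of that comparator idea, assuming the double values have already been read back out of the stored field as Strings (reading them from the Hits object is not shown). `ReverseDoubleSort` is a hypothetical name used only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: orders string-encoded doubles from largest to smallest.
class ReverseDoubleSort {
    static List<String> sortDescending(List<String> storedValues) {
        // Copy first so the caller's list is left untouched.
        List<String> sorted = new ArrayList<>(storedValues);
        // Compare by numeric value rather than by characters, then reverse.
        sorted.sort(Comparator.comparingDouble((String s) -> Double.parseDouble(s)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // prints [10.0, 2.5, -1.0]
        System.out.println(sortDescending(List.of("2.5", "10.0", "-1.0")));
    }
}
```

Sorting in memory like this only makes sense when the result set is small enough to load in full.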

Re: Contextual text-link ads

2007-03-29 Thread Peter W.
ayments and pay-per-action (conversions) would be gravy. Beyond just cloning what's out there, the collective experience of the Lucene community could take leadership in paid search. Best, Peter W. On Mar 27, 2007, at 12:45 PM, Doron Cohen wrote: Assuming you don't mean UI design - ho

Contextual text-link ads

2007-03-27 Thread Peter W.
Howdy, Does anyone have any design considerations for implementing a contextual text-link advertising system using Lucene? The emphasis would be strictly on monetizing search results with light, non-intrusive behavior (query terms match sponsored results). Thanks, Peter W

Re: How to customize scoring using user feedback?

2007-03-24 Thread Peter W.
are unique for each query. Utilizing "user feedback to improve search results" with clickstream data could be a sub-project in itself. It moves into future areas of personalization and would be a cool add-on to Lucene. Hope that helps, Peter W. Because scoring The way it appears to On Ma

Re: Sort Performance Question

2007-03-20 Thread Peter W.
ilter would be constructed using two RangeFilters setting upper and lower date boundaries (Strings) combined using NumberTools and ChainedFilter. With a subset of your matching results sorting should be much faster. Regards, Peter W. On Mar 20, 2007, at 12:39 PM, David Seltzer wrote: Hi All,

Re: How to customize scoring using user feedback?

2007-03-15 Thread Peter W.
ing a Sort Object and passing in an array of SortFields with "votes" as type SortField.STRING first. Precedence of sort order kicks in and your docs with more clicks rank higher. If everything goes well you will have results ordered by user generated scoring. Regards, Peter W. On Mar

Re: Soliciting Design Thoughts on Date Searching

2007-02-28 Thread Peter W.
StringBuffer delimited by commas, then make one long String (holding all your dates) and add to the Lucene doc as one Field.Text. You might be able to set that Field to indexed, but not stored to save space. Regards, Peter W. On Feb 28, 2007, at 11:22 AM, Aigner, Thomas wrote: Walt, I am no

Re: Using Lucene - Design Question

2007-02-22 Thread Peter W.
r. For someone trying to get work done, use incremental updates to one local index first. Then explore writing to multiple indexes and reading them using MultiSearcher. Afterward, use HTTP-based updates/requests with Solr to scale out. Hope that helps. Peter W. On Feb 20, 2007, at 5:29 PM, ori

Re: pagination

2007-02-22 Thread Peter W.
ariable to keep track of which page you are on and a static method which returns min/max values to be included in your iteration loop. You can also see my previous attempt at solving this: http://www.gossamer-threads.com/lists/lucene/java-user/43595 Regards, Peter W. On Feb 21, 2007, at
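The static method described above might look roughly like the following. This is a sketch under assumptions, not the code from the linked post: pages are taken to be 1-based, and the names `PageBounds`, `range`, `hitsPerPage` and `totalHits` are made up for the example.

```java
// Hypothetical helper: computes which hit positions belong on a given page.
class PageBounds {
    // Returns {start (inclusive), end (exclusive)} for a 1-based page number.
    static int[] range(int page, int hitsPerPage, int totalHits) {
        if (page < 1 || hitsPerPage < 1 || totalHits < 0) {
            throw new IllegalArgumentException("page and hitsPerPage must be >= 1, totalHits >= 0");
        }
        // Use long arithmetic so very large page numbers cannot overflow.
        long start = (long) (page - 1) * hitsPerPage;
        if (start >= totalHits) {
            return new int[] {totalHits, totalHits}; // page past the end: empty range
        }
        int end = (int) Math.min(start + hitsPerPage, totalHits);
        return new int[] {(int) start, end};
    }

    public static void main(String[] args) {
        int[] bounds = range(3, 10, 25);
        // prints 20..25
        System.out.println(bounds[0] + ".." + bounds[1]);
    }
}
```

The iteration loop then runs from `bounds[0]` up to, but not including, `bounds[1]`, so the last page is simply shorter when the hit count is not a multiple of the page size.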

Re: How to not tokenize HTML tag from input string

2007-02-08 Thread Peter W.
Hello, Using a parser to get text out of HTML, XML (including RSS, ATOM) is only easy if you control the source documents. HTML pages in the wild are much different, generating exceptions you must catch and deal with. For most projects you can probably use java.util.regex to obtain keywo
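As a rough illustration of the java.util.regex approach, the sketch below drops script and style blocks, strips the remaining tags, and collects the words that are left. The patterns and the `HtmlKeywords` name are assumptions made for this example. It is deliberately crude: it does not decode entities such as `&amp;` and it will not cope with every malformed page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls lowercase keywords out of HTML using regexes only.
class HtmlKeywords {
    // Whole <script>...</script> and <style>...</style> blocks, case-insensitive.
    private static final Pattern SCRIPT_OR_STYLE =
        Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    // Any remaining tag, including ones that span several lines.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    // A keyword is a run of letters or digits.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    static List<String> extract(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group().toLowerCase(Locale.ROOT));
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [hello, lucene, world]
        System.out.println(extract("<p>Hello <b>Lucene</b> world</p><script>var x=1;</script>"));
    }
}
```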

remote index update question

2007-02-02 Thread Peter W.
outside searching need to be turned off during updates? Also, assuming this runs hourly, and if I need to close then open each time, how can a seamless user experience (no frozen queries, minimal delays) be achieved? Thanks. Peter W

Re: Lucene Internals question

2007-01-22 Thread Peter W.
ath, those who are can find an explanation of the latter here: http://www.ams.org/featurecolumn/archive/pagerank.html Regards, Peter W. On Jan 22, 2007, at 12:00 PM, Mark Miller wrote: Well first Lucene checks all of the other documents in the world for any that refer to the document

Re: lucene scalability questions

2007-01-04 Thread Peter W.
est 2.0 version release the Lucene in Action book provides good background on combining separate indexes. Regards, Peter W. On Jan 4, 2007, at 7:51 AM, Mark Mei wrote: So this question has two parts: 1. How does Lucene scale, exactly? Do we distribute the index to multiple servers somehow? Or

Re: Clustering Lucene with 40 Servers

2007-01-02 Thread Peter W.
separated data files would be exposed thru a web service where load balanced remote boxes access them using servlets. They connect in rotation downloading batched index updates. Heck, start splitting up big files using Hadoop's HDFS and make it a party! Re

Re: Paging Lucene Results

2006-12-28 Thread Peter W.
g*hpp); else // few results ri=hc; } // inner if else ri=hc; } // else return ri; } Also, is there an available sample of using TopDocs .search()? Peter W. On Dec 27, 2006, at 10:33 PM,

Paging Lucene Results

2006-12-27 Thread Peter W.
Hello, I'm trying to iterate or page through Lucene document hits results. Before reinventing this, is there an existing solution out there or in Solr? Thanks in advance, Peter

Re: help finding docs, creating analyzer objects

2006-12-26 Thread Peter W.
Hello, I just got this working in three or four steps: 1. go to http://www.apache.org/dyn/closer.cgi/lucene/java/ 2. click on any of the mirrors and download "lucene-2.0.0.zip" 3. unzip into preferred directory (step not shown), then use jar to look at snowball items: jar tvf /opt/lucene-2.0.

Re: BooleanQuery.TooManyClauses exception

2006-10-17 Thread Peter W.
Another solution is to work with plain Java dates and calendar objects, convert them into Lucene strings using DateTools (day resolution), then query this field with two RangeFilters combined using ChainedFilter. You will never get the BooleanQuery.TooManyClauses error. Peter On Oct 17, 2006, at 10:57 AM, Bushey, John wrote
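The reason day-resolution strings work for range checks can be shown without Lucene. In this sketch dates are written as fixed-width yyyyMMdd strings, which I believe is the shape DateTools produces at day resolution (treat that as an assumption), and the two range boundaries become plain string comparisons. `DayRange` is a hypothetical helper, and `java.time` stands in for the Date and Calendar objects mentioned above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: day-resolution date strings and an inclusive range check.
class DayRange {
    // Formats a date as a fixed-width yyyyMMdd string, e.g. 20061017.
    static String toDayString(LocalDate date) {
        return date.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    // True when lower <= day <= upper; string order matches date order
    // because every value has the same width and most-significant-first layout.
    static boolean inRange(String day, String lower, String upper) {
        return day.compareTo(lower) >= 0 && day.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String day = toDayString(LocalDate.of(2006, 10, 17));
        // prints 20061017 true
        System.out.println(day + " " + inRange(day, "20061001", "20061031"));
    }
}
```

The lower and upper checks correspond to the two RangeFilters, and combining them with a logical AND is the role ChainedFilter plays in the message.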

Re: search with RangeFilter.Less

2006-06-28 Thread Peter W.
undary: Filter filter=RangeFilter.Less("num",NumberTools.longToString(10L)); // field num < 10 ... FilteredQuery fq = new FilteredQuery(query,filter); The NumberTools.longToString() method is supposed to replace manual padding with leading 0's, eliminating string comparison issues. Hopefully, so
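The string comparison issue is easy to reproduce in plain Java. Compared as text, "5" sorts after "10" because '5' is greater than '1', so a range check on unpadded numbers gives the wrong answer. The sketch below uses simple zero-padding to a fixed width. NumberTools uses its own fixed-width encoding, so this shows the principle rather than that class's actual output, and `SortableNumbers` is a hypothetical helper.

```java
import java.util.Locale;

// Hypothetical helper: makes non-negative longs compare correctly as strings.
class SortableNumbers {
    // Zero-pads to 19 digits, the width of Long.MAX_VALUE.
    static String pad(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        return String.format(Locale.ROOT, "%019d", value);
    }

    public static void main(String[] args) {
        // Unpadded: "5" compares greater than "10", so 5 < 10 fails as text.
        System.out.println("5".compareTo("10") > 0);        // prints true
        // Padded: text order now agrees with numeric order.
        System.out.println(pad(5).compareTo(pad(10)) < 0);  // prints true
    }
}
```

The same encoding has to be applied both when the field is indexed and when the filter boundary is built, otherwise the two sides are compared in different formats.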

search with RangeFilter.Less

2006-06-28 Thread Peter W.
eld.Index.TOKENIZED) ); writer.addDocument(doc); writer.optimize(); writer.close(); } } Since five is less than ten, why doesn't it work? Thanks. Peter W.