[OT] San Fran. Lucene/Solr Hack Night

2013-01-30 Thread Grant Ingersoll
heers, Grant Grant Ingersoll http://www.lucidworks.com

Re: Reg Lucene Naive Bayesian classifier.

2013-01-14 Thread Grant Ingersoll
-- >>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>> >>> >> Grant Ingersoll http://www.lucidworks.com

Re: Lucene for a linguistic corpus

2013-01-08 Thread Grant Ingersoll
Hi Igor, On Jan 5, 2013, at 7:36 AM, Igor Shalyminov wrote: > Hello! > > I'm considering Lucene as an engine for linguistic corpus search. > > There's a feature in this search: each word is treated as ambiguous - i.e., > it has got multiple sets of grammatical annotations (there's a fixed maxi

[JOB] Lucid Imagination is hiring

2011-12-05 Thread Grant Ingersoll
Hi All, If you've wanted a full time job working on Lucene or Solr, we have two positions open that just might be of interest. The job descriptions are below. Interested candidates should submit their resumes off list to care...@lucidimagination.com. You can learn more on our website: ht

Re: Bet you didn't know Lucene can...

2011-10-25 Thread Grant Ingersoll
l as it was not embedded but even using a > remoted Lucene call I get significantly better performance (avg 0.5ms lookup > vs MySQL 10ms) > > > Cheers > Mark > > > > - Original Message - > From: Grant Ingersoll > To: java-user@lucene.apache.org > C

Re: Bet you didn't know Lucene can...

2011-10-22 Thread Grant Ingersoll
On Oct 22, 2011, at 6:03 PM, Sujit Pal wrote: > Hi Grant, > > Not sure if this qualifies as a "bet you didn't know", but one could use > Lucene term vectors to construct document vectors for similarity, > clustering and classification tasks. I found this out recently (although > I am probably no

Bet you didn't know Lucene can...

2011-10-22 Thread Grant Ingersoll
onference and also see if I can't inject more ideas beyond the ones I have. I don't need deep technical details, but just high level use case and the basic insight that led you to believe Lucene could solve the problem. Thanks in advance, Grant ---- Grant Ingersoll http://www.lucidimagination.com

Re: Need Help: Business Scenario to lucene implementation

2011-09-01 Thread Grant Ingersoll
can not be directly > converted into a percentage match (as the score value changes based on many > factors) how can this requirement be satisfied? > > Thanks > > Saurabh Grant Ingersoll http://www.lucidimagination.com Lucene Eurocon 2011: http://www.lucene-eurocon.com

Re: LSI

2011-08-29 Thread Grant Ingersoll
s indexed with lucene, I don't know > how, please help me > > thanks, ------ Grant Ingersoll http://www.lucidimagination.com Lucene Eurocon 2011: http://www.lucene-eurocon.com

Re: What kind of System Resources are required to index 625 million row table...???

2011-08-15 Thread Grant Ingersoll
build > jvmap6460-20090215_29883 >(i.e. 64 bit Java 6) > OS: AIX 6.1 > Platform: PPC (IBM P520) > cores: 2 > Memory: 8 GB > jvm memory: ` -Xm

Re: Adding Encryption to lucene indexes

2011-08-14 Thread Grant Ingersoll
> >>> Thanks, >>> Chris. >>> GSOC intern with OpenMRS >>> >> >> - >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> Grant Ingersoll - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org

[Help Wanted] Graphics and other help for new Lucene/Solr website

2011-08-10 Thread Grant Ingersoll
Hi, We are in the process of putting up a new Lucene/Solr/PyLucene/OpenRelevance website. You can see a preview at http://lucene.staging.apache.org/lucene/. It is more or less a look and feel copy of Mahout and Open For Biz websites. This new site, IMO, both looks better than the old one and

Re: Text Categorization with Lucene (N-Gram technique)

2011-07-26 Thread Grant Ingersoll
gram with > the known fingerprint of the category. > > I wanted to know if Lucene already has any contribution done in this regards > that I can find in the contrib directory or is there any example that I can > look at else where. > > Saurabh

Lucene Rev Stump the Chump

2011-04-25 Thread Grant Ingersoll
Hey everyone, As you no doubt by now know, Lucene Revolution, the second annual Lucene/Solr conference sponsored by Lucid Imagination, is happening out in San Francisco at the end of May. There are a lot of really great talks and speakers from across the spectrum (check out lucenerevolution.o

Re: "Umlaute" getting lost

2011-04-23 Thread Grant Ingersoll
ilter is applied. What is the Analyzer for the Main Index? What is the tokenizer and token filters used? Out of curiosity, what is the problem you are trying to solve? -- Grant Ingersoll http://www.lucidimagination.com/

Re: some basic questions on how Lucene/search engines work

2011-04-13 Thread Grant Ingersoll
e > in action" I'd start w/ Lucene in Action 2nd ed. Brin and Page paper is good. As is the Manning book, Baeza Yates, Grossman, etc. I believe we have a resources page on our Wiki that lists out a lot of books and talks. I would recommend, however,

Apache Lucene 3.1.0

2011-03-31 Thread Grant Ingersoll
ccess. -- Grant Ingersoll Lucene Revolution -- Lucene and Solr User Conference May 25-26 in San Francisco www.lucenerevolution.org

Apache Lucene 3.1.0 is available

2011-03-31 Thread Grant Ingersoll
March 2011, Apache Lucene 3.1 available The Lucene PMC is pleased to announce the release of Apache Lucene 3.1. This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at http://www.apache.org

Re: Definition Extraction

2011-03-29 Thread Grant Ingersoll
system > for Amharic Language - using machine learning technique (Version Space > learning). Can anyone suggest me some java codes to start with? > Thank You > Henok > ------ Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem docs us

Re: Grouping...

2011-03-23 Thread Grant Ingersoll
Any thoughts appreciated. Have you looked at Solr and date faceting capabilities? Also, it has result grouping, but I think you are just describing faceting/filtering. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecos

Re: Detecting duplicates

2011-03-10 Thread Grant Ingersoll
ch on 1.4.1 and it was terribly slow. > > On 3/5/11 4:43 AM, Grant Ingersoll wrote: >> See http://wiki.apache.org/solr/Deduplication. Should be fairly easy to >> pull out if you are doing just Lucene. >> >> On Mar 5, 2011, at 1:49 AM, Mark wrote: >> >>

Re: Detecting duplicates

2011-03-05 Thread Grant Ingersoll
> > Can this be easily accomplished? > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > -- Grant Ingersoll http

Fwd: [Announce] Now Open: Call for Participation for ApacheCon North America

2011-03-03 Thread Grant Ingersoll
Begin forwarded message: > From: Grant Ingersoll > Date: March 3, 2011 3:52:05 PM EST > To: u...@mahout.apache.org, solr-u...@lucene.apache.org, > java-user@lucene.apache.org, opennlp-u...@incubator.apache.org > Subject: Fwd: [Announce] Now Open: Call for Participation for

Fwd: [Announce] Now Open: Call for Participation for ApacheCon North America

2011-03-03 Thread Grant Ingersoll
Begin forwarded message: > From: Sally Khudairi > Date: March 3, 2011 3:10:17 PM EST > To: annou...@apachecon.com > Subject: [Announce] Now Open: Call for Participation for ApacheCon North > America > Reply-To: s...@apache.org > > Call for Participation > ApacheCon North America 2011 > 7-11

Free Webcast/Technical Case Study: How Bazaarvoice moved to Solr to implement Search Strategies for Social and eCommerce

2011-02-24 Thread Grant Ingersoll
I thought you might be interested in a technical webcast on Solr/Lucene and e-commerce/social media that we are sponsoring, featuring RC Johnson of Bazaarvoice. It's Wednesday, March 2, 2011 at 11:00am PST/2:00pm EST/19:00 GMT. RC has been leading efforts at Bazaarvoice to build out their Solr sea

Re: Storing payloads without term-position and frequency

2011-02-03 Thread Grant Ingersoll
> Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apac

Re: Payloads API and support

2011-02-02 Thread Grant Ingersoll
ost. > I think the better solution is to use the first approach, but to use the FieldCache on your metrics instead of stored documents and combine that w/ a custom Collector. -- Grant Ingersoll http://www.lucidimagination.com/

Re: Lucene , hits per document

2011-01-25 Thread Grant Ingersoll
> CDU Systems & Process Tools > Software Developer I > ANSYS INC. -- Grant Ingersoll http://www.lucidimagination.com

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Grant Ingersoll
And here's mine: On Jan 18, 2011, at 4:04 PM, Grant Ingersoll wrote: > > Where do you get your Lucene/Solr downloads from? > > [] ASF Mirrors (linked in our release announcements or via the Lucene website) > > [x] Maven repository (whether you use Maven, Ant+Ivy, Buil

[POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Grant Ingersoll
As devs of Lucene/Solr, due to the way ASF mirrors, etc. works, we really don't have a good sense of how people get Lucene and Solr for use in their application. Because of this, there has been some talk of dropping Maven support for Lucene artifacts (or at least make them external). Before we

Re: Using Lucene to search live, being-edited documents

2011-01-03 Thread Grant Ingersoll
dvisable / practical to use Lucene as the basis of a >>> live >>> document search capability? By "live document" I mean a largish document >>> such as a word processor might be able to handle which is being edited >>> currently. Examples would be Word documents of some siz

[ANN] General Availability of LucidWorks Enterprise

2010-12-15 Thread Grant Ingersoll
Access LucidWorks Enterprise whitepapers and tutorials: www.lucidimagination.com/lwe/whitepapers Read further commentary on the Lucid Imagination blog Cheers, Grant -- Grant Ingersoll http://www.lucidimagination.com

Re: Custom scoring for searching geographic objects

2010-12-15 Thread Grant Ingersoll
ddress, and the results > should appear the most relevant results. > > Thanks. > -- > Pavel Minchenkov -- Grant Ingersoll http://www.lucidimagination.com

Re: Lucene index exchange format?

2010-11-09 Thread Grant Ingersoll
mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > -- Grant Ingersoll http://www.lucidimagination.com

ApacheCon Atlanta next week

2010-10-25 Thread Grant Ingersoll
Hi All, Just a couple of notes about ApacheCon next week for those who either are attending or are thinking of attending. 1. There will be Lucene and Solr 2 day trainings done by Erik Hatcher (Solr) and me (Lucene). It's not too late to sign up. See http://na.apachecon.com/c/acna2010/schedul

Re: Using a TermFreqVector to get counts of all words in a document

2010-10-22 Thread Grant Ingersoll
from the Directory and then you can massage the data as you see fit. On Oct 21, 2010, at 7:47 AM, app...@dsl.pipex.com wrote: > Would you have an example of this or be able to point me in the direction of > an example at all? > > Quoting Grant Ingersoll : > >> >>

Re: Using a TermFreqVector to get counts of all words in a document

2010-10-21 Thread Grant Ingersoll
oment so further investigation is inevitable. > > I expect that a combination of MySQL database storage and Lucene indexing is > going to be the end result. I'd likely take the TermVectorMapper approach, but otherwise, yeah, I think you are on the right track. > > >

Re: Using a TermFreqVector to get counts of all words in a document

2010-10-20 Thread Grant Ingersoll
ild your data structures on the fly instead of having to serialize them into two parallel arrays and then loop over those arrays to create some other structure. -- Grant Ingersoll http://www.lucidimagination.com

ApacheCon Meetup in Atlanta

2010-10-18 Thread Grant Ingersoll
Is there interest in having a Meetup at ApacheCon? Who's going? Would anyone like to present? We could do something less formal, too, and just have drinks and Q&A/networking. Thoughts? -Grant

Re: Use of Lucene to store data from RSS feeds

2010-10-14 Thread Grant Ingersoll
um, etc.) to get at the frequencies. You might also need to do some stuff with Spans and SpanQueries to properly incorporate your length of time requirement. -Grant -- Grant Ingersoll http://www.lucidimagination.com ---

Re: determining the type of a term - retrieving a payload

2010-10-14 Thread Grant Ingersoll
n e) { > ... What does your Analysis process look like? Many of Lucene's analysis pieces don't bother setting type. Have you looked at the index with Luke? That should show you the payloads. Also, have a look at the SpanTermQuery. You can use the Spans

Re: Is it a bug in Lucene?

2010-09-27 Thread Grant Ingersoll
lia","Brasilândia","Braslândia", "São Paulo", "São > Roque", "Salvador"}; > ======= > > >>> Using StandardAnalyzer Using BrazilianAnalyzer >> JUnit -- Grant Ingersoll http://lucenerevolution.org

Lucene Revolution Update

2010-08-31 Thread Grant Ingersoll
ister now. Also please be aware that the early bird rate expires September 10. Hope to see you there. Cheers, Grant -- Grant Ingersoll http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8

RTP Apache Lucene/Solr Meetup Sept. 21

2010-08-30 Thread Grant Ingersoll
pe to see you there, Grant -- Grant Ingersoll http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8

Re: Span Query/Slop distance

2010-08-30 Thread Grant Ingersoll
ible. When you walk the Spans, the doc() method will tell you what doc you are on. ------ Grant Ingersoll http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8

Re: Calculate Term Co-occurrence Matrix

2010-08-20 Thread Grant Ingersoll
ating the term co-occurrence matrix for a given text corpus. > > Thanks! > > -- > Ahmed -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http:/

Re: asking about incremental update

2010-08-20 Thread Grant Ingersoll
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.co

Re: cluster documents based on fields' values

2010-08-17 Thread Grant Ingersoll
nd the brute force approach >means that I need to be doing tens of millions of searches just to >group on one field. Also I most likely will blow my heap up if I try to >load all of the values in memory all at once. > > ----

Fwd: Please Forward - Apache Retreat in Hursley, UK - 17-19th September

2010-08-06 Thread Grant Ingersoll
FYI Begin forwarded message: > From: "Mattmann, Chris A (388J)" > Date: August 5, 2010 5:24:00 PM EDT > To: "d...@tika.apache.org" > Subject: FW: Please Forward - Apache Retreat in Hursley, UK - 17-19th > September > Reply-To: d...@tika.apache.org > > > === > From: Nick Burch > To: retr

Re: Different ranking results

2010-07-27 Thread Grant Ingersoll
> > 2.) > Query q = parser.parse(TITLE:lucene OR BOOK:lucene); > > Regards, >philippe > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > -- Grant Ing

Re: Get all terms of a specific field

2010-07-27 Thread Grant Ingersoll
On Jul 27, 2010, at 8:50 AM, Philippe wrote: > Hi, > > what would be the fastest way to get all terms for all documents matching a > specific query? > > So far I: > > 1.) Query the index > 2.) Retrieve all scoreDocs > 3.) Iterate the scoreDocs and retrieve all terms using the getValues method

Re: How to get word importance in lucene index

2010-07-23 Thread Grant Ingersoll
wiki.apache.org/confluence/display/MAHOUT/Collocations for one way of doing that. -Grant - Grant Ingersoll http://www.lucidimagination.com

Re: Reverse Lucene queries

2010-07-23 Thread Grant Ingersoll
On Jul 23, 2010, at 5:06 AM, Karl Wettin wrote: > > 23 jul 2010 kl. 08.30 skrev sk...@sloan.mit.edu: > >> Hi all, I have an interesting problem...instead of going from a query >> to a document collection, is it possible to come up with the best fit >> query for a given document collection (resu

Re: Lucene Scoring

2010-07-05 Thread Grant Ingersoll
On Jul 5, 2010, at 5:02 AM, manjula wijewickrema wrote: > Hi, > > In my application, I input only a single-term query (at one time) and get back > the corresponding scores for those queries. But I am struggling a little to > understand Lucene scoring. I have referred to > http://lucene.apache.org/

Re: search for a string which begins with a '$' character

2010-07-02 Thread Grant Ingersoll
What analyzer are you using? Did you check that it is making it through your analyzer? -Grant On Jul 1, 2010, at 2:56 PM, Nathaniel Auvil wrote: > i am trying to search for a value which begins with a '$' or even sometimes > '$$'. '$' is not listed as a special character and no matter what i

Re: example of processing terms in query results?

2010-06-30 Thread Grant Ingersoll
the assistance, > Peter > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > -- Grant Ingersoll http://www.lucidimagination.com/ Search the Luc

Re: arguments in favour of lucene over commercial competition

2010-06-25 Thread Grant Ingersoll
a lot of case studies over at http://www.lucidimagination.com/, including several that highlight replacements of the commercial players. -Grant -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosyst

Last Call: Lucene Revolution CFP Closes Tomorrow Wednesday, June 23, 2010, 12 Midnight PDT

2010-06-22 Thread Grant Ingersoll
Lucene Revolution Call For Participation - Boston, Massachusetts October 7 & 8, 2010 The first US conference dedicated to Lucene and Solr is coming to Boston, October 7 & 8, 2010. The conference is sponsored by Lucid Imagination with additional support from community and other commercial co‐sp

Re: CFP for Lucene Revolution Conference, Boston, MA October 7 & 8 2010

2010-06-01 Thread Grant Ingersoll
Sorry for the noise, but thought I would send out a reminder to get your talks in... On May 17, 2010, at 8:43 AM, Grant Ingersoll wrote: > Lucene Revolution Call For Participation - Boston, Massachusetts October 7 & > 8, 2010 > > The first US conference dedicated to Apache Lu

Re: vector model usage

2010-06-01 Thread Grant Ingersoll
) which you could easily back with a TermFreqVector. > > This is the use case behind the question: retrieve some documents from the > index, cluster them, and store the vector space representations of the > clusters back to the index. > > Dionisis -- Grant Ingers

ApacheCon CFP Closes on Friday

2010-05-26 Thread Grant Ingersoll
If you are planning on submitting for ApacheCon, you have until Friday to do so. See the CFP at http://blogs.apache.org/conferences/date/20100428

Re: CFP for Lucene Revolution Conference, Boston, MA October 7 & 8 2010

2010-05-24 Thread Grant Ingersoll
I should add that talks on Mahout, Tika, Nutch, etc. are also encouraged. -Grant On May 17, 2010, at 8:43 AM, Grant Ingersoll wrote: > Lucene Revolution Call For Participation - Boston, Massachusetts October 7 & > 8, 2010 > > The first US conference dedicated to Apache Lu

Re: Arrange terms[i]

2010-05-24 Thread Grant Ingersoll
On May 20, 2010, at 5:15 AM, manjula wijewickrema wrote: > Hi, > > I wrote a program to get the frequencies and terms of an indexed document. > The output comes as follows; > > > If I print : +tfv[0] > > Output: > > array terms are:{title: capabl/1, code/2, frequenc/1, lucen/4, over/1, > samp

Re: About loading lazily

2010-05-24 Thread Grant Ingersoll
I'd also add that the Document keeps a pointer to the spot in storage where that value can be loaded from. It can result in a performance saving in the typical search use case where one is displaying just "metadata" fields on a page, but not the full content. In this case, the full content pag

Re: Problem of getTermFrequencies()

2010-05-17 Thread Grant Ingersoll
Note, depending on your downstream use, you may consider using a TermVectorMapper that allows you to construct your own data structures as needed. -Grant On May 17, 2010, at 3:16 PM, Ian Lea wrote: > terms and freqs are arrays. Try terms[i] and freqs[i]. > > > -- > Ian. > > > On Mon, May
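
The advice quoted above ("terms and freqs are arrays. Try terms[i] and freqs[i].") can be sketched in plain Java. This is only an illustration under stated assumptions: the two arrays stand in for what a term vector's getTerms() and getTermFrequencies() would hand back, the class and method names (TermFrequencyPairs, pair) are hypothetical, and the sample values are made up for the example. No Lucene classes are used, so it runs on a bare JDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: pairs up two parallel arrays.
final class TermFrequencyPairs {

    // Index i in terms corresponds to index i in freqs, so one loop
    // over the shared index is enough to read both values together.
    static Map<String, Integer> pair(String[] terms, int[] freqs) {
        if (terms.length != freqs.length) {
            throw new IllegalArgumentException("terms and freqs must have the same length");
        }
        // LinkedHashMap keeps the original term order for printing.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < terms.length; i++) {
            counts.put(terms[i], freqs[i]);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-ins for the arrays a term vector would return (assumed values).
        String[] terms = {"capabl", "code", "frequenc", "lucen", "over"};
        int[] freqs = {1, 2, 1, 4, 1};
        pair(terms, freqs).forEach((term, freq) -> System.out.println(term + " -> " + freq));
    }
}
```

Building the map in a single pass is the same "loop over two parallel arrays" step that a custom mapper would let you skip by filling your own structure directly.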

CFP for Lucene Revolution Conference, Boston, MA October 7 & 8 2010

2010-05-17 Thread Grant Ingersoll
Lucene Revolution Call For Participation - Boston, Massachusetts October 7 & 8, 2010 The first US conference dedicated to Apache Lucene and Solr is coming to Boston, October 7 & 8, 2010. The conference is sponsored by Lucid Imagination with additional support from community and other commercia

Fwd: [Travel Assistance] - Applications Open for ApacheCon NA 2010

2010-05-17 Thread Grant Ingersoll
Begin forwarded message: > The Travel Assistance Committee is now taking in applications for those > wanting to attend ApacheCon North America (NA) 2010, which is taking place > between the 1st and 5th November in Atlanta. > > The Travel Assistance Committee is looking for people who would like

Re: TermDocs

2010-05-14 Thread Grant Ingersoll
tor(); while (docIdSetIterator.nextDoc() != DocIdSetIterator.NO_MORE_DOCS){ System.out.println("Doc: " + docIdSetIterator.docID()); } -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimaginati

Re: Relevancy Practices

2010-05-05 Thread Grant Ingersoll
er in many, many situations. > Most of our relevance tuning has occurred after deployment to production. > > Peter > > On Thu, Apr 29, 2010 at 10:14 AM, Grant Ingersoll wrote: > >> I'm putting on a talk at Lucene Eurocon ( >> http://lucene-eurocon.org/sessions

Re: Relevancy Practices

2010-05-05 Thread Grant Ingersoll
On May 2, 2010, at 5:50 AM, Avi Rosenschein wrote: > On 4/30/10, Grant Ingersoll wrote: >> >> On Apr 30, 2010, at 8:00 AM, Avi Rosenschein wrote: >>> Also, tuning the algorithms to the users can be very important. For >>> instance, we have found that in

[OT] Lucene Boot Camp Training in Europe

2010-04-30 Thread Grant Ingersoll
I will be once again providing Lucene training in Europe this year as part of Lucene EuroCon (in place of the usual ApacheCon venue). This time it is in the beautiful city of Prague starting on May 18th. Registration is open. For more info, check out http://lucene-eurocon.org/training.html C

Re: Relevancy Practices

2010-04-30 Thread Grant Ingersoll
On Apr 30, 2010, at 8:00 AM, Avi Rosenschein wrote: > Also, tuning the algorithms to the users can be very important. For > instance, we have found that in a basic search functionality, the default > query parser operator OR works very well. But on a page for advanced users, > who want to very pre

Relevancy Practices

2010-04-29 Thread Grant Ingersoll
I'm putting on a talk at Lucene Eurocon (http://lucene-eurocon.org/sessions-track1-day2.html#1) on "Practical Relevance" and I'm curious as to what people put in practice for testing and improving relevance. I have my own inclinations, but I don't want to muddy the water just yet. So, if you

Re: Call for Participation: Technical Talks -- ApacheCon North America 2010

2010-04-28 Thread Grant Ingersoll
On Apr 28, 2010, at 1:53 PM, Grant Ingersoll wrote: > > > Begin forwarded message: > >> From: Sally Khudairi >> Date: April 28, 2010 1:48:57 PM EDT >> To: annou...@apachecon.com >> Subject: Call for Participation: Technical Talks -- ApacheCon N

Fwd: Call for Participation: Technical Talks -- ApacheCon North America 2010

2010-04-28 Thread Grant Ingersoll
Begin forwarded message: > From: Sally Khudairi > Date: April 28, 2010 1:48:57 PM EDT > To: annou...@apachecon.com > Subject: Call for Participation: Technical Talks -- ApacheCon North America > 2010 > Reply-To: s...@apache.org > > ApacheCon North America 2010 > 1-5 November 2010 -- Westin Pe

Re: Big problem with solr in an official server.

2010-04-20 Thread Grant Ingersoll
ver. > Could you help me please ??? > Thanks in advance. > Regards > Ariel -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search -

Re: Exception, field is not stored

2010-04-13 Thread Grant Ingersoll
iter.addDocument(luceneDocument); > > You need to use the Field constructor that takes in a String and not a Reader in order to use storage. -- Grant Ingersoll http://www.lucidimagination.com/ S

Re: Apache Lucene EuroCon Call For Participation: Prague, Czech Republic May 20 & 21, 2010

2010-04-05 Thread Grant Ingersoll
Just a reminder, just over one week left open on the CFP. Some great talks entered already. Keep it up! On Mar 24, 2010, at 8:03 PM, Grant Ingersoll wrote: > Apache Lucene EuroCon Call For Participation - Prague, Czech Republic May 20 > & 21, 2010 > > All submissions mus

Re: Lucene Challenge - sum, count, avg, etc.

2010-04-02 Thread Grant Ingersoll
On Apr 1, 2010, at 11:13 PM, Michel Nadeau wrote: > My big question is how do you loop 1M records, sum up field(s), and then > sort on that field... all in memory (could use too much ram) ? In a > temporary index (could take a while to re-write a lot of documents in a new > index) ? > You're g

Re: Lucene Challenge - sum, count, avg, etc.

2010-04-01 Thread Grant Ingersoll
that would be quite crazy too as the date > range possibilities quickly become endless. > > So - is there any known way to efficiently do SUM(), COUNT() (and even AVG() > ) using Lucene/Solr/others? I also checked Bobo Browse but it doesn't seem > to offer what I need either. >

Re: Lucene Spatial

2010-03-30 Thread Grant Ingersoll
-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> > > -- > Guillermo Payet > L O C A L H A R V E S T > http://www.localharvest.org > http://twitter.com/localharvestorg > > ---

Registration is now open for Apache Lucene EuroCon - Prague, Czech Republic, 18-21 May, 2010.

2010-03-30 Thread Grant Ingersoll
gain practical hands-on experience with Lucene & Solr and the know-how to develop killer search code. Lastly, a reminder: the Call for Participation is still open, accepting submissions until April 13th. Hope to see you there! Grant Ingersoll Apache Lucene EuroCon Program Chair ww

Apache Lucene EuroCon Call For Participation: Prague, Czech Republic May 20 & 21, 2010

2010-03-29 Thread Grant Ingersoll
Apache Lucene EuroCon Call For Participation - Prague, Czech Republic May 20 & 21, 2010 All submissions must be received by Tuesday, April 13, 2010, 12 Midnight CET/6 PM US EDT The first European conference dedicated to Lucene and Solr is coming to Prague from May 18-21, 2010. Apache Lucene Eu

Re: adapting lucene's practical scoring function

2010-03-29 Thread Grant Ingersoll
possibly, Lucene is a bit of overkill here other than using it to get IDF values. Can't you just create a big matrix (maybe w/ Hadoop and HBase or something similar) of your precomputed similarities and then just do lookups on the document? --

Apache Lucene EuroCon Call For Participation: Prague, Czech Republic May 20 & 21, 2010

2010-03-24 Thread Grant Ingersoll
March 2010: Call For Participation Open 13 April 2010: Call For Participation Closes 16 April 2010: Speaker Acceptance/Rejection Notification 18-19 May 2010 Lucene and Solr Pre-conference Training Sessions 20-21 May 2010: Apache Lucene EuroCon We look forward to seeing you in Prague! Grant In

Re: Garbage Collection performance on 2.9.2

2010-03-24 Thread Grant Ingersoll
-- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: ht

Re: Lucene query with long strings

2010-03-24 Thread Grant Ingersoll
th. > > Finally, if you wish to be very precise, you can loop through the hits > collector and use a string comparison algorithm like Jaro-Winkler, > Levenshtein etc. for a second-level filter. Note, this approach will be slow. -- Grant Ingersoll http://www
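
The second-level filter suggested in the quoted text could look roughly like the sketch below. It is a minimal illustration, not the poster's code: it assumes the hit strings have already been collected into a plain list, uses edit distance (one of the algorithms named in the quote) as the comparison, and the names EditDistanceFilter, levenshtein, and secondLevelFilter are hypothetical. It also shows why the reply warns about speed: every hit costs a full dynamic-programming pass over both strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: post-filters hits by edit distance.
final class EditDistanceFilter {

    // Two-row dynamic-programming edit distance:
    // O(|a| * |b|) time, O(|b|) extra space.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into b[0..j) takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Keeps only the hits within maxDistance edits of the query string.
    static List<String> secondLevelFilter(String query, List<String> hits, int maxDistance) {
        List<String> kept = new ArrayList<>();
        for (String hit : hits) {
            if (levenshtein(query, hit) <= maxDistance) {
                kept.add(hit);
            }
        }
        return kept;
    }
}
```

In practice one would run this only over the top hits returned by the search rather than the whole result set, which keeps the per-hit cost bounded.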

Re: BooleanQuery and SpanQuery : how to get « combined » spans?

2010-03-23 Thread Grant Ingersoll
On Mar 23, 2010, at 12:58 AM, Benoit Mercier wrote: > Hi, > > I would like to write a query composed of a BooleanQuery (several clauses) > and a SpanQuery (SpanNearQuery), where both are mandatory. Sounds simple > but I have to work on spans returned by this query. > > I know that I could u

Re: access payload from HitCollector.collect()

2010-03-22 Thread Grant Ingersoll
On Mar 22, 2010, at 8:56 AM, prasenjit mukherjee wrote: > I am trying to implement oracle's aggregation like SQL's ( e.g. > SUM(col3) where col1='foo' and col2='bar' ) using lucene's payload > feature. > > I can add the integer_value ( of col3 ) as a payload to my searchable > fields ( col1 and

Re: Dealing with special cases in analyser

2010-03-17 Thread Grant Ingersoll
On Mar 17, 2010, at 11:34 AM, Paul Taylor wrote: > Grant Ingersoll wrote: >> What's your current chain of TokenFilters? How many exceptions do you >> expect? That is, could you enumerate them? >> > Very few, yes I could enumerate them, but not sure what exactly

Re: Dealing with special cases in analyser

2010-03-17 Thread Grant Ingersoll
ng > this, is there something else I should be doing. > > thanks Paul > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > -- Grant

Re: recovering payload from fields

2010-03-05 Thread Grant Ingersoll
--- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > -- Grant Ingersoll http://www.lucidimagination.com/ Searc

Re: If you could have one feature in Lucene...

2010-02-25 Thread Grant Ingersoll
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > -- Grant Ingersoll http://www.lucidimagination.com/ Search the L

Re: If you could have one feature in Lucene...

2010-02-25 Thread Grant Ingersoll
On Feb 24, 2010, at 4:22 PM, Paul Libbrecht wrote: > I would wish a highlighting feature that's fully integrated. That's what Solr does. Lucene is still, at the end of the day, a library of APIs for people to build things. Solr/Nutch are the Lucene TLP way of expressing these sentiments. ---

Re: If you could have one feature in Lucene...

2010-02-25 Thread Grant Ingersoll
On Feb 25, 2010, at 12:41 AM, Ganesh wrote: > > 1. Payload per document which could be updated without a need to update the > entire document. > Usecase: The state of our indexed content will change based on the User > action (Created/ Viewed/Deleted etc) and we are using Lucene as our databa

If you could have one feature in Lucene...

2010-02-24 Thread Grant Ingersoll
What would it be?

Re: PayloadNearSpanScorer explain method

2010-02-16 Thread Grant Ingersoll
That sounds reasonable. Patch? On Feb 15, 2010, at 10:29 AM, Peter Keegan wrote: > The 'explain' method in PayloadNearSpanScorer assumes the > AveragePayloadFunction was used. I don't see an easy way to override this > because 'payloadsSeen' and 'payloadScore' are private/protected. It seems > l

Re: read more tokens during analysis

2010-02-10 Thread Grant Ingersoll
hod. captureState() and restoreState() are the new versions in 3.0. There are several examples of how they work in contrib/analyzers. ------ Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimag

Re: Match span of capitalized words

2010-02-05 Thread Grant Ingersoll
On Feb 3, 2010, at 8:57 PM, Max Lynch wrote: > Hi, > I would like to do a search for "Microsoft Windows" as a span, but not match > if words before or after "Microsoft Windows" are upper cased. > > For example, I want this to match: another crash for Microsoft Windows today > But not this: anoth

Re: Average Precision - TREC-3

2010-01-28 Thread Grant Ingersoll
On Jan 28, 2010, at 11:00 AM, Robert Muir wrote: > in addition to what Grant said, even if your documents are similar, what > about queries? > > For example, if only a few trec queries contain proper names, acronyms, > abbreviations, or whatever, but your users frequently input things like > thi
