Re: Lucene for a linguistic corpus

2013-01-08 Thread Grant Ingersoll
Hi Igor, On Jan 5, 2013, at 7:36 AM, Igor Shalyminov wrote: > Hello! > > I'm considering Lucene as an engine for linguistic corpus search. > > There's a feature in this search: each word is treated as ambiguous - i.e., > it has got multiple sets of grammatical annotations (there's a fixed maxi

Re: Reg Lucene Naive Bayesian classifier.

2013-01-14 Thread Grant Ingersoll
-- >>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>> >>> >> Grant Ingersoll http://www.lucidworks.com

[OT] San Fran. Lucene/Solr Hack Night

2013-01-30 Thread Grant Ingersoll
heers, Grant Grant Ingersoll http://www.lucidworks.com

Re: Relevancy Practices

2010-05-05 Thread Grant Ingersoll
On May 2, 2010, at 5:50 AM, Avi Rosenschein wrote: > On 4/30/10, Grant Ingersoll wrote: >> >> On Apr 30, 2010, at 8:00 AM, Avi Rosenschein wrote: >>> Also, tuning the algorithms to the users can be very important. For >>> instance, we have found that in

Re: Relevancy Practices

2010-05-05 Thread Grant Ingersoll
er in many, many situations. > Most of our relevance tuning has occurred after deployment to production. > > Peter > > On Thu, Apr 29, 2010 at 10:14 AM, Grant Ingersoll wrote: > >> I'm putting on a talk at Lucene Eurocon ( >> http://lucene-eurocon.org/sessions

Re: TermDocs

2010-05-14 Thread Grant Ingersoll
tor(); while (docIdSetIterator.nextDoc() != DocIdSetIterator.NO_MORE_DOCS){ System.out.println("Doc: " + docIdSetIterator.docID()); } -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimaginati

Fwd: [Travel Assistance] - Applications Open for ApacheCon NA 2010

2010-05-17 Thread Grant Ingersoll
Begin forwarded message: > he Travel Assistance Committee is now taking in applications for those > wanting to attend ApacheCon North America (NA) 2010, which is taking place > between the 1st and 5th November in Atlanta. > > The Travel Assistance Committee is looking for people who would like

CFP for Lucene Revolution Conference, Boston, MA October 7 & 8 2010

2010-05-17 Thread Grant Ingersoll
Lucene Revolution Call For Participation - Boston, Massachusetts October 7 & 8, 2010 The first US conference dedicated to Apache Lucene and Solr is coming to Boston, October 7 & 8, 2010. The conference is sponsored by Lucid Imagination with additional support from community and other commercia

Re: Problem of getTermFrequencies()

2010-05-17 Thread Grant Ingersoll
Note, depending on your downstream use, you may consider using a TermVectorMapper that allows you to construct your own data structures as needed. -Grant On May 17, 2010, at 3:16 PM, Ian Lea wrote: > terms and freqs are arrays. Try terms[i] and freqs[i]. > > > -- > Ian. > > > On Mon, May

Re: About loading lazily

2010-05-24 Thread Grant Ingersoll
I'd also add that the Document keeps a pointer to the spot in storage where that value can be loaded from. It can result in a performance saving in the typical search use case where one is displaying just "metadata" fields on a page, but not the full content. In this case, the full content pag

Re: Arrange terms[i]

2010-05-24 Thread Grant Ingersoll
On May 20, 2010, at 5:15 AM, manjula wijewickrema wrote: > Hi, > > I wrote a program to get the frequencies and terms of an indexed document. > The output comes as follows; > > > If I print : +tfv[0] > > Output: > > array terms are:{title: capabl/1, code/2, frequenc/1, lucen/4, over/1, > samp

Re: CFP for Lucene Revolution Conference, Boston, MA October 7 & 8 2010

2010-05-24 Thread Grant Ingersoll
I should add that talks on Mahout, Tika, Nutch, etc. are also encouraged. -Grant On May 17, 2010, at 8:43 AM, Grant Ingersoll wrote: > Lucene Revolution Call For Participation - Boston, Massachusetts October 7 & > 8, 2010 > > The first US conference dedicated to Apache Lu

ApacheCon CFP Closes on Friday

2010-05-26 Thread Grant Ingersoll
If you are planning on submitting for ApacheCon, you have until Friday to do so. See the CFP at http://blogs.apache.org/conferences/date/20100428 - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional comm

Re: vector model usage

2010-06-01 Thread Grant Ingersoll
) which you could easily back with a TermFreqVector. > > This is the use case behind the question: retrieve some documents from the > index, cluster them, and store the vector space representations of the > clusters back to the index. > > Dionisis -- Grant Ingers

Re: CFP for Lucene Revolution Conference, Boston, MA October 7 & 8 2010

2010-06-01 Thread Grant Ingersoll
Sorry for the noise, but thought I would send out a reminder to get your talks in... On May 17, 2010, at 8:43 AM, Grant Ingersoll wrote: > Lucene Revolution Call For Participation - Boston, Massachusetts October 7 & > 8, 2010 > > The first US conference dedicated to Apache Lu

Last Call: Lucene Revolution CFP Closes Tomorrow Wednesday, June 23, 2010, 12 Midnight PDT

2010-06-22 Thread Grant Ingersoll
Lucene Revolution Call For Participation - Boston, Massachusetts October 7 & 8, 2010 The first US conference dedicated to Lucene and Solr is coming to Boston, October 7 & 8, 2010. The conference is sponsored by Lucid Imagination with additional support from community and other commercial co‐sp

Re: arguments in favour of lucene over commercial competition

2010-06-25 Thread Grant Ingersoll
a lot of case studies over at http://www.lucidimagination.com/, including several that highlight replacements of the commercial players. -Grant -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosyst

Re: example of processing terms in query results?

2010-06-30 Thread Grant Ingersoll
the assistance, > Peter > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > -- Grant Ingersoll http://www.lucidimagination.com/ Search the Luc

Re: search for a string which begins with a '$' character

2010-07-02 Thread Grant Ingersoll
What analyzer are you using? Did you check that it is making it through your analyzer? -Grant On Jul 1, 2010, at 2:56 PM, Nathaniel Auvil wrote: > i am trying to search for a value which begins with a '$' or even sometimes > '$$'. '$' is not listed as a special character and no matter what i

Re: Lucene Scoring

2010-07-05 Thread Grant Ingersoll
On Jul 5, 2010, at 5:02 AM, manjula wijewickrema wrote: > Hi, > > In my application, I input only single term query (at one time) and get back > the corresponding scorings for those queries. But I am struggling a little with > understanding Lucene scoring. I have referred to > http://lucene.apache.org/

Re: Reverse Lucene queries

2010-07-23 Thread Grant Ingersoll
On Jul 23, 2010, at 5:06 AM, Karl Wettin wrote: > > 23 jul 2010 kl. 08.30 skrev sk...@sloan.mit.edu: > >> Hi all, I have an interesting problem...instead of going from a query >> to a document collection, is it possible to come up with the best fit >> query for a given document collection (resu

Re: Hot to get word importance in lucene index

2010-07-23 Thread Grant Ingersoll
wiki.apache.org/confluence/display/MAHOUT/Collocations for one way of doing that. -Grant - Grant Ingersoll http://www.lucidimagination.com

Re: Get all terms of a specific field

2010-07-27 Thread Grant Ingersoll
On Jul 27, 2010, at 8:50 AM, Philippe wrote: > Hi, > > what would be the fastest way to get all terms for all documents matching a > specific query? > > Sofar I: > > 1.) Query the index > 2.) Retrieve all scoreDocs > 3.) Iterate the scoreDocs and retrieve all terms using the getValues method

Re: Different ranking results

2010-07-27 Thread Grant Ingersoll
> > 2.) > Query q = parser.parse(TITLE:lucene OR BOOK:lucene); > > Regards, >philippe > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > -- Grant Ing

Fwd: Please Forward - Apache Retreat in Hursley, UK - 17-19th September

2010-08-06 Thread Grant Ingersoll
FYI Begin forwarded message: > From: "Mattmann, Chris A (388J)" > Date: August 5, 2010 5:24:00 PM EDT > To: "d...@tika.apache.org" > Subject: FW: Please Forward - Apache Retreat in Hursley, UK - 17-19th > September > Reply-To: d...@tika.apache.org > > > === > From: Nick Burch > To: retr

Re: cluster documents based on fields' values

2010-08-17 Thread Grant Ingersoll
nd the brute force approach >means that I need to be doing tens of millions of searches just to >group on one field. Also I most likely will blow my heap up if I try to >load all of the values in memory all at once. > > ----

Re: asking about incremental update

2010-08-20 Thread Grant Ingersoll
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.co

Re: Calculate Term Co-occurrence Matrix

2010-08-20 Thread Grant Ingersoll
ating the term co-occurrence matrix for a given text corpus. > > Thanks! > > -- > Ahmed -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http:/

Re: Span Query/Slop distance

2010-08-30 Thread Grant Ingersoll
ible. When you walk the Spans, the doc() method will tell you what doc you are on. ------ Grant Ingersoll http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8 - To unsubscribe, e-

RTP Apache Lucene/Solr Meetup Sept. 21

2010-08-30 Thread Grant Ingersoll
pe to see you there, Grant -- Grant Ingersoll http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8

Lucene Revolution Update

2010-08-31 Thread Grant Ingersoll
ister now. Also please be aware that the early bird rate expires September 10. Hope to see you there. Cheers, Grant -- Grant Ingersoll http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8

Re: Is it a bug in Lucene?

2010-09-27 Thread Grant Ingersoll
lia","Brasilândia","Braslândia", "São Paulo", "São > Roque", "Salvador"}; > ======= > > >>> Using StandardAnalyzer Using BrazilianAnalyzer >> JUnit -- Grant Ingersoll http://lucenerevolution.org

Re: determining the type of a term - retrieving a payload

2010-10-14 Thread Grant Ingersoll
n e) { > ... What does your Analysis process look like? Many of Lucene's analysis pieces don't bother setting type. Have you looked at the index with Luke? That should show you the payloads. Also, have a look at the SpanTermQuery. You can use the Spans

Re: Use of Lucene to store data from RSS feeds

2010-10-14 Thread Grant Ingersoll
um, etc.) to get at the frequencies. You might also need to do some stuff with Spans and SpanQueries to properly incorporate your length of time requirement. -Grant -- Grant Ingersoll http://www.lucidimagination.com ---

ApacheCon Meetup in Atlanta

2010-10-18 Thread Grant Ingersoll
Is there interest in having a Meetup at ApacheCon? Who's going? Would anyone like to present? We could do something less formal, too, and just have drinks and Q&A/networking. Thoughts? -Grant - To unsubscribe, e-mail: java

Re: Using a TermFreqVector to get counts of all words in a document

2010-10-20 Thread Grant Ingersoll
ild your data structures on the fly instead of having to serialize them into two parallel arrays and then loop over those arrays to create some other structure. -- Grant Ingersoll http://www.lucidimagination.com

Re: Using a TermFreqVector to get counts of all words in a document

2010-10-21 Thread Grant Ingersoll
oment so further investigation is inevitable. > > I expect that a combination of MySQL database storage and Lucene indexing is > going to be the end result. I'd likely take the TermVectorMapper approach, but otherwise, yeah, I think you are on the right track. > > >

Re: Using a TermFreqVector to get counts of all words in a document

2010-10-22 Thread Grant Ingersoll
from the Directory and then you can massage the data as you see fit. On Oct 21, 2010, at 7:47 AM, app...@dsl.pipex.com wrote: > Would you have an example of this or be able to point me in the direction of > an example at all? > > Quoting Grant Ingersoll : > >> >>

ApacheCon Atlanta next week

2010-10-25 Thread Grant Ingersoll
Hi All, Just a couple of notes about ApacheCon next week for those who either are attending or are thinking of attending. 1. There will be Lucene and Solr 2 day trainings done by Erik Hatcher (Solr) and me (Lucene). It's not too late to sign up. See http://na.apachecon.com/c/acna2010/schedul

Re: Lucene index exchange format?

2010-11-09 Thread Grant Ingersoll
mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > -- Grant Ingersoll http://www.lucidimagination.com - To unsubscribe, e-mail: java-u

Re: Custom scoring for searhing geographic objects

2010-12-15 Thread Grant Ingersoll
ddress, and the results > should appear the most relevant results. > > Thanks. > -- > Pavel Minchenkov -- Grant Ingersoll http://www.lucidimagination.com

[ANN] General Availability of LucidWorks Enterprise

2010-12-15 Thread Grant Ingersoll
Access LucidWorks Enterprise whitepapers and tutorials: www.lucidimagination.com/lwe/whitepapers Read further commentary on the Lucid Imagination blog Cheers, Grant -- Grant Ingersoll http://www.lucidimagination.com

Re: Using Lucene to search live, being-edited documents

2011-01-03 Thread Grant Ingersoll
dvisable / practical to use Lucene as the basis of a >>> live >>> document search capability? By "live document" I mean a largish document >>> such as a word processor might be able to handle which is being edited >>> currently. Examples would be Word documents of some siz

[POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Grant Ingersoll
As devs of Lucene/Solr, due to the way ASF mirrors, etc. works, we really don't have a good sense of how people get Lucene and Solr for use in their application. Because of this, there has been some talk of dropping Maven support for Lucene artifacts (or at least make them external). Before we

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Grant Ingersoll
And here's mine: On Jan 18, 2011, at 4:04 PM, Grant Ingersoll wrote: > > Where do you get your Lucene/Solr downloads from? > > [] ASF Mirrors (linked in our release announcements or via the Lucene website) > > [x] Maven repository (whether you use Maven, Ant+Ivy, Buil

Re: Lucene , hits per document

2011-01-25 Thread Grant Ingersoll
> CDU Systems & Process Tools > Software Developer I > ANSYS INC. -- Grant Ingersoll http://www.lucidimagination.com

Re: Payloads API and support

2011-02-02 Thread Grant Ingersoll
ost. > I think the better solution is to use the first approach, but to use the FieldCache on your metrics instead of stored documents and combine that w/ a custom Collector. -- Grant Ingersoll http://www.lucidimagination.com/

Re: Storing payloads without term-position and frequency

2011-02-03 Thread Grant Ingersoll
> Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apac

Free Webcast/Technical Case Study: How Bazaarvoice moved to Solr to implement Search Strategies for Social and eCommerce

2011-02-24 Thread Grant Ingersoll
I thought you might be interested in a technical webcast on Solr/Lucene and e-commerce/social media that we are sponsoring, featuring RC Johnson of Bazaarvoice. It's Wednesday, March 2, 2011 at 11:00am PST/2:00pm EST/19:00 GMT. RC has been leading efforts at Bazaarvoice to build out their Solr sea

Fwd: [Announce] Now Open: Call for Participation for ApacheCon North America

2011-03-03 Thread Grant Ingersoll
Begin forwarded message: > From: Sally Khudairi > Date: March 3, 2011 3:10:17 PM EST > To: annou...@apachecon.com > Subject: [Announce] Now Open: Call for Participation for ApacheCon North > America > Reply-To: s...@apache.org > > Call for Participation > ApacheCon North America 2011 > 7-11

Fwd: [Announce] Now Open: Call for Participation for ApacheCon North America

2011-03-03 Thread Grant Ingersoll
Begin forwarded message: > From: Grant Ingersoll > Date: March 3, 2011 3:52:05 PM EST > To: u...@mahout.apache.org, solr-u...@lucene.apache.org, > java-user@lucene.apache.org, opennlp-u...@incubator.apache.org > Subject: Fwd: [Announce] Now Open: Call for Participation for

Re: Detecting duplicates

2011-03-05 Thread Grant Ingersoll
> > Can this be easily accomplished? > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > -- Grant Ingersoll http

Re: Detecting duplicates

2011-03-10 Thread Grant Ingersoll
ch on 1.4.1 and it was terribly slow. > > On 3/5/11 4:43 AM, Grant Ingersoll wrote: >> See http://wiki.apache.org/solr/Deduplication. Should be fairly easy to >> pull out if you are doing just Lucene. >> >> On Mar 5, 2011, at 1:49 AM, Mark wrote: >> >>

Re: Grouping...

2011-03-23 Thread Grant Ingersoll
Any thoughts appreciated. Have you looked at Solr and date faceting capabilities? Also, it has result grouping, but I think you are just describing faceting/filtering. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecos

Re: Definition Extraction

2011-03-29 Thread Grant Ingersoll
system > for Amharic Language - using machine learning technique (Version Space > learning). Can anyone suggest me some java codes to start with? > Thank You > Henok > ------ Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem docs us

Apache Lucene 3.1.0 is available

2011-03-31 Thread Grant Ingersoll
March 2011, Apache Lucene 3.1 available The Lucene PMC is pleased to announce the release of Apache Lucene 3.1. This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at http://www.apache.org

Apache Lucene 3.1.0

2011-03-31 Thread Grant Ingersoll
ccess. -- Grant Ingersoll Lucene Revolution -- Lucene and Solr User Conference May 25-26 in San Francisco www.lucenerevolution.org

Re: some basic questions on how Lucene/search engines work

2011-04-13 Thread Grant Ingersoll
e > in action" I'd start w/ Lucene in Action 2nd ed. Brin and Page paper is good. As is the Manning book, Baeza Yates, Grossman, etc. I believe we have a resources page on our Wiki that lists out a lot of books and talks. I would recommend, however,

Re: "Umlaute" getting lost

2011-04-23 Thread Grant Ingersoll
ilter is applied. What is the Analyzer for the Main Index? What is the tokenizer and token filters used? Out of curiosity, what is the problem you are trying to solve? -- Grant Ingersoll http://www.lucidimagination.com/

Lucene Rev Stump the Chump

2011-04-25 Thread Grant Ingersoll
Hey everyone, As you no doubt by now know, Lucene Revolution, the second annual Lucene/Solr conference sponsored by Lucid Imagination, is happening out in San Francisco at the end of May. There are a lot of really great talks and speakers from across the spectrum (check out lucenerevolution.o

Re: Text Categorization with Lucene (N-Gram technique)

2011-07-26 Thread Grant Ingersoll
gram with > the known fingerprint of the category. > > I wanted to know if Lucene already has any contribution done in this regards > that I can find in the contrib directory or is there any example that I can > look at else where. > > Saurabh

[Help Wanted] Graphics and other help for new Lucene/Solr website

2011-08-10 Thread Grant Ingersoll
Hi, We are in the process of putting up a new Lucene/Solr/PyLucene/OpenRelevance website. You can see a preview at http://lucene.staging.apache.org/lucene/. It is more or less a look and feel copy of Mahout and Open For Biz websites. This new site, IMO, both looks better than the old one and

Re: Adding Encryption to lucene indexes

2011-08-14 Thread Grant Ingersoll
>>> Thanks, >>> Chris. >>> GSOC intern with OpenMRS >>> >> >> Grant Ingersoll

Re: What kind of System Resources are required to index 625 million row table...???

2011-08-15 Thread Grant Ingersoll
build > jvmap6460-20090215_29883 >(i.e. 64 bit Java 6) > OS: AIX 6.1 > Platform: PPC (IBM P520) > cores: 2 > Memory: 8 GB > jvm memory: ` -Xm

Re: LSI

2011-08-29 Thread Grant Ingersoll
s indexed with lucene, I dont know > how, plz help me > > thanks, ------ Grant Ingersoll http://www.lucidimagination.com Lucene Eurocon 2011: http://www.lucene-eurocon.com - To unsubscribe, e-m

Re: Need Help: Business Scenario to lucene implementation

2011-09-01 Thread Grant Ingersoll
can not be directly > converted into a percentage match (as the score value changes based on many > factors) how can this requirement be satisfied? > > Thanks > > Saurabh Grant Ingersoll http://www.lucidimagination.com Lucene Eurocon 2011: http://www.lucene-eurocon.com

Bet you didn't know Lucene can...

2011-10-22 Thread Grant Ingersoll
onference and also see if I can't inject more ideas beyond the ones I have. I don't need deep technical details, but just high level use case and the basic insight that led you to believe Lucene could solve the problem. Thanks in advance, Grant ---- Grant Ingersoll http://www.lucidimagination.com

Re: Bet you didn't know Lucene can...

2011-10-22 Thread Grant Ingersoll
On Oct 22, 2011, at 6:03 PM, Sujit Pal wrote: > Hi Grant, > > Not sure if this qualifies as a "bet you didn't know", but one could use > Lucene term vectors to construct document vectors for similarity, > clustering and classification tasks. I found this out recently (although > I am probably no

Re: Bet you didn't know Lucene can...

2011-10-25 Thread Grant Ingersoll
l as it was not embedded but even using a > remoted Lucene call I get significantly better performance (avg 0.5ms lookup > vs MySQL 10ms) > > > Cheers > Mark > > > > - Original Message - > From: Grant Ingersoll > To: java-user@lucene.apache.org > C

[JOB] Lucid Imagination is hiring

2011-12-05 Thread Grant Ingersoll
Hi All, If you've wanted a full time job working on Lucene or Solr, we have two positions open that just might be of interest. The job descriptions are below. Interested candidates should submit their resumes off list to care...@lucidimagination.com. You can learn more on our website: ht

Re: Storing special characters in Lucene

2008-08-21 Thread Grant Ingersoll
will contain "ni�os" Looking at the index with Luke it shows me "ni�os" but when I want to see the full text (by right clicking) it shows me ni�os. I know Lucene is supposed to store fields in UTF8, but then, how can I make sure I store something and get it back just as

Re: Score Boosting

2008-08-22 Thread Grant Ingersoll
Normalization is done on a field by field basis, as is most scoring. It doesn't factor all fields in, b/c someone might not be querying all fields. The field it does use is based on the query. On Aug 18, 2008, at 10:44 PM, blazingwolf7 wrote: Hi, I am currently working on the calculatio

Re: How do TeeTokenizer and SinkTokenizer work?

2008-08-22 Thread Grant Ingersoll
On Aug 22, 2008, at 3:47 PM, Teruhiko Kurosaka wrote: Hello, I'm interested in knowing how these tokenizers work together. The API doc for TeeTokenizer http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/analysis/TeeTokenFilter.html has this sample code: SinkTokenizer sink1 = new SinkTok

Re: How do TeeTokenizer and SinkTokenizer work?

2008-08-26 Thread Grant Ingersoll
On Aug 25, 2008, at 7:29 PM, Teruhiko Kurosaka wrote: Thank you, Grant and (Koji) Sekiguchi-san. but I don't understand how the input from reader1 and reader2 are mixed together. Will sink1 first return the reader1 text, and reader2? It depends on the order the fields are added. If so

Re: lucene 3.0 feature list?

2008-08-27 Thread Grant Ingersoll
On Aug 26, 2008, at 6:59 PM, Karl Wettin wrote: 27 aug 2008 kl. 00.52 skrev Darren Govoni: Hi, Sorry if I missed this somewhere or maybe its not released yet, but I was anxiously curious about lucene 3.0's expected features/ improvements. Is there a list yet? If everything goes as planne

Re: lucene 3.0 feature list?

2008-08-27 Thread Grant Ingersoll
See http://wiki.apache.org/lucene-java/BackwardsCompatibility Generally speaking 3.0 will be the same as 2.9 minus the deprecated methods (and, in this case, the upgrade to JDK 1.5). That is not to say that the file formats, etc. under the hood won't change and that there won't be new meth

Re: lucene 3.0 feature list?

2008-08-28 Thread Grant Ingersoll
We haven't even begun working on 3.0 other than the planning to say it will be on JDK 1.5. There may be a few tickets in JIRA that are marked as 3.0, though, but that doesn't even mean they will make it. And, the API will not necessarily be 2.4 compatible. That is not in our back compat.

Re: Clarity: Is there a Query boosting 50-50 over 1000-1 ?

2008-08-28 Thread Grant Ingersoll
On Aug 27, 2008, at 7:34 PM, Shi Hui Liu wrote: Hi, I think I should clarify my question a little bit. I'm using BooleanQuery to combine TermQuery(A) and TermQuery(B). But I'm not satisfied with its scoring algorithm. Is there other queries can boost up the documents with 50 of A and 50

Re: Clarity: Is there a Query boosting 50-50 over 1000-1 ?

2008-08-28 Thread Grant Ingersoll
uery doing same thing since I'm still a new user to Lucene. Thank you in advance, Shi Hui -Original Message- From: Grant Ingersoll [mailto:[EMAIL PROTECTED] Sent: Thursday, August 28, 2008 5:57 AM To: java-user@lucene.apache.org Subject: Re: Clarity: Is there a Query boosting 50-50 ov

Re: Clarity: Is there a Query boosting 50-50 over 1000-1 ?

2008-08-29 Thread Grant Ingersoll
On Aug 29, 2008, at 7:53 AM, Sébastien Rainville wrote: I'm curious... what do you mean by "It's not perfect (there is no such thing) but it works pretty well in most cases, and works great if you spend a little time figuring out the right length normalization factors." ? Can you plz elabor

Re: Lucene Memory Leak

2008-09-02 Thread Grant Ingersoll
-- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] -- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/Luce

Re: concise definition of Lucene score?

2008-09-03 Thread Grant Ingersoll
heir contents. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] -- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfP

Re: Pre-filtering for expensive query

2008-09-03 Thread Grant Ingersoll
On Aug 30, 2008, at 3:14 PM, Andrzej Bialecki wrote: Matt Ronge wrote: Hi all, I am working on implementing a new Query, Weight and Scorer that is expensive to run. I'd like to limit the number of documents I run this query on by first building a candidate set of documents with a boolean

Re: Lucene Index

2008-09-09 Thread Grant Ingersoll
Term frequency information is kept in the index. On Sep 9, 2008, at 11:54 AM, Marie-Christine Plogmann wrote: Hi all, I am currently using a (slightly modified) version of the IndexFiles demo class of Lucene to index a corpus. As I understand it, the index lists for each term the documents

Re: memory leak during Lucene Search

2008-09-09 Thread Grant Ingersoll
Just chipping in that I recall there being a number of discussions on java-dev about ThreadLocal and web containers and how they should be handled. Not sure if it pertains here or not, but you might find http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal helpful. You might a

Re: TopDocCollector & Paging

2008-09-17 Thread Grant Ingersoll
On Sep 17, 2008, at 11:51 AM, Cam Bazz wrote: And how about queries that need starting position, like hits between 100 and 200? could we pass something to the collector that will count between 0 to 100 and then get the next 100 records? The collector uses a Priority Queue to store doc ids a

Re: TopDocCollector & Paging

2008-09-17 Thread Grant Ingersoll
2008/9/17 Grant Ingersoll <[EMAIL PROTECTED]> On Sep 17, 2008, at 11:51 AM, Cam Bazz wrote: And how about queries that need starting position, like hits between 100 and 200? could we pass something to the collector that will count between 0 to 100 and then get the next 100 reco

Re: TopDocCollector & Paging

2008-09-17 Thread Grant Ingersoll
On Sep 17, 2008, at 6:53 PM, Dino Korah wrote: Thanks Grant.. Please see my comments/response below. 2008/9/17 Grant Ingersoll <[EMAIL PROTECTED]> On Sep 17, 2008, at 4:39 PM, Dino Korah wrote: I know in applications where we search for a words or phrases and expect the result

Re: Multi Field search without Multifieldqueryparser

2008-09-23 Thread Grant Ingersoll
[EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] -- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance

Re: Multi Field search without Multifieldqueryparser

2008-09-23 Thread Grant Ingersoll
ry will become very inefficient as there can be thousands of fields. I think it should clarify my point. On Tue, Sep 23, 2008 at 1:58 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: So, the piece I'm missing is how do you know what field for which terms. In other words how do you know

Re: Index time Document Boosting and Query Time Sorts

2008-09-24 Thread Grant Ingersoll
:2008010200, depth- factor: 1/2 Folder1/Folder2/Folder3/Folder4/text_file3.txt, mod-time: 2008010100, depth-factor: 1/4 Folder1/Folder2/Folder3/Folder4/Folder5/text_file4.txt, mod-time:2008010500, depth-factor: 1/5 Many thanks. Dino -- Grant

Re: CorruptIndexException workaround in 2.3-SNAPSHOT? (Attn: Michael McCandless)

2008-09-26 Thread Grant Ingersoll
On Sep 26, 2008, at 6:30 AM, Michael McCandless wrote: Ari Miller wrote: According to https://issues.apache.org/jira/browse/LUCENE-1282?focusedCommentId=12596949 #action_12596949 (Sun hotspot compiler bug in 1.6.0_04/05 affects Lucene), a workaround for the bug which causes the CorruptInd

Re: 2.4 release candidate 2

2008-09-26 Thread Grant Ingersoll
Looks good. On Sep 25, 2008, at 11:11 AM, Michael McCandless wrote: Hi, I just created the second release candidate for Lucene 2.4, here: http://people.apache.org/~mikemccand/staging-area/lucene2.4rc2 These are the fixes since RC1: * Issues with CheckIndex (LUCENE-1402) * Removed new y

Re: How to restore corrupted index

2008-09-26 Thread Grant Ingersoll
, nrm Remains there in index directory. Is there any way that such issue does not occur at all or if it happens we can recover the index data again? It would be a great help, if some one can. Regards, Chaula -- Grant Ingersoll http://www.lucidimagination.com L

ApacheCon US promo

2008-09-26 Thread Grant Ingersoll
Cross-posting... Just wanted to let everyone know that there will be a number of Lucene/ Solr/Mahout/Tika related talks, training sessions, and Birds of a Feather (BOF) gatherings at ApacheCon New Orleans this fall. Details: When: November 3-7 Where: Sheraton, New Orleans, USA URL: http://u

Re: advice on using Lucene for sorting based on payloads

2008-10-07 Thread Grant Ingersoll
you could do this, then you could implement a Function Query to do so. You might look at Solr's function query capabilities as well. Alex -- Grant Ingersoll Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java

Re: Is lucene right for us

2008-10-12 Thread Grant Ingersoll
PROTECTED] ------ Grant Ingersoll Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans. http://www.lucenebootcamp.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java

Re: Retrieving Top Terms for a subset of the index (or for all results of a query)

2008-10-12 Thread Grant Ingersoll
Aleksander -- Aleksander M. Stensby Senior Software Developer Integrasco A/S +47 41 22 82 72 [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --

Re: Access Scoring Values of Lucene for Post-Processing

2008-10-12 Thread Grant Ingersoll
PROTECTED] ------ Grant Ingersoll Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans. http://www.lucenebootcamp.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java

Re: About TermQuery

2008-10-20 Thread Grant Ingersoll
ls. MultiReader is not the solution for me :). Thanks! Carlos -- Grant Ingersoll Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans. http://www.lucenebootcamp.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.ap
