ssary, but the book never loses sight of its
goal of providing a practical introduction. In that way,
it’s like the Manning "in Action" series.
About the author: Manu Konchady has a home page/blog on Amazon:
http://www.amazon.com/gp/blog/A2TWRNMTU6T9TW/ref=cm_blog_dp_artist_blog
- Bob Carpenter
Gathering more data like that
from Amazon, CNET, etc. should be easy. That's what everyone's
doing for evaluations.
But these are all at the review level, not at the sentence
level. We've actually had customers annotating at the sentence
level.
Both LingPipe and Kea are able to find significant
phrases, which is useful for query refinement or
summarizing sets of search results, but not so
useful for individual documents. It can be a huge
help to add part-of-speech information to these
kinds of approaches.
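For instance, here's a minimal sketch of the kind of POS filtering meant here (the tag names and the pre-tagged candidates are made up; any tagger's output would do):

    import java.util.*;

    // Keep only candidate phrases whose POS pattern is phrase-like,
    // e.g. adjective-noun or noun-noun bigrams.
    public class PosFilter {
        static boolean phraseLike(String[] tags) {
            if (tags.length != 2) return false;
            return (tags[0].equals("ADJ") || tags[0].equals("NOUN"))
                && tags[1].equals("NOUN");
        }

        public static void main(String[] args) {
            Map<String,String[]> candidates = new LinkedHashMap<String,String[]>();
            candidates.put("query refinement", new String[] { "NOUN", "NOUN" });
            candidates.put("is useful", new String[] { "VERB", "ADJ" });
            for (Map.Entry<String,String[]> e : candidates.entrySet())
                if (phraseLike(e.getValue()))
                    System.out.println(e.getKey());  // prints: query refinement
        }
    }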
-
A lot of research in this area is
coming out of Kathy McKeown's group at Columbia,
not to mention the horde of students she's graduated
over the last ten years, such as Drago Radev, the
author of the second tutorial and software above.
- Bob Carpenter
Alias-i
---
Here's a blog entry comparing our hypothesis-testing
approach to a standard mutual-information-based
method (discussed by Matthew Hurst, when he was
at Nielsen BuzzMetrics):
http://www.alias-i.com/blog/?p=14
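The rough contrast: pointwise mutual information scores a bigram by log(p(xy) / (p(x)p(y))), while a hypothesis test such as a binomial z-score also rewards the amount of evidence behind the estimate. A toy sketch (the counts in main() are invented, and this is only the general shape of such a test, not the exact formula from the blog entry):

    // PMI vs. a binomial z-score for a bigram "x y", given counts
    // over n bigram tokens.
    public class BigramScores {
        static double pmi(long xy, long x, long y, long n) {
            return Math.log(((double) xy / n)
                / (((double) x / n) * ((double) y / n)));
        }

        static double zScore(long xy, long x, long y, long n) {
            double p0 = ((double) x / n) * ((double) y / n); // independence hypothesis
            double pHat = (double) xy / n;                   // observed rate
            return (pHat - p0) / Math.sqrt(p0 * (1.0 - p0) / n);
        }

        public static void main(String[] args) {
            long n = 1000000L;
            System.out.println(pmi(500, 2000, 1500, n));     // ignores data volume
            System.out.println(zScore(500, 2000, 1500, n));  // grows with evidence
        }
    }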
- Bob Carpenter
Alias-i
-
input length (chars)   accuracy
          1             22.59%
          2             34.82%
          4             58.55%
          8             81.17%
         16             92.45%
         32             97.33%
         64             98.99%
        128             99.67%
The end of the tutorial has references to other
popular language ID packages online (e.g. TextCat,
which is Gertjan van Noord's Perl package). And it
also has
and maximum n-gram length. You
might want to put them in different fields
if you want weighting between them to be
easy.
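For example, with Lucene's pre-3.0 Document/Field API you can index each n-gram length into its own field and set a per-field boost, so the relative weighting is just a boost value (the field names, texts, and the 2.0f boost are made up):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class NGramFields {
        public static Document build(String unigramText, String bigramText) {
            Document doc = new Document();
            Field unigrams = new Field("unigrams", unigramText,
                                       Field.Store.NO, Field.Index.TOKENIZED);
            Field bigrams = new Field("bigrams", bigramText,
                                      Field.Store.NO, Field.Index.TOKENIZED);
            bigrams.setBoost(2.0f); // weight bigram matches twice as heavily
            doc.add(unigrams);
            doc.add(bigrams);
            return doc;
        }
    }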
- Bob Carpenter
Alias-i
-
For those kinds of problems, you
might want to check out Weka.
- Bob Carpenter
Alias-i
-
Check out our tutorial at:
http://www.alias-i.com/lingpipe/demos/tutorial/langid/read-me.html
Accuracy depends on the pair of languages (some are
more confusable than others), as well as the length of
the input (it's very hard with only one or two words,
especially if it's a name).
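The character n-gram idea behind the tutorial fits in a few lines; this toy version uses add-one-smoothed character bigrams rather than LingPipe's actual models:

    import java.util.*;

    // Toy character-bigram language ID: score text against per-language
    // bigram counts with add-one smoothing; highest log-probability wins.
    public class LangId {
        private final Map<String, Map<String, Integer>> counts
            = new HashMap<String, Map<String, Integer>>();

        public void train(String lang, String text) {
            Map<String, Integer> m = counts.get(lang);
            if (m == null) counts.put(lang, m = new HashMap<String, Integer>());
            for (int i = 0; i + 2 <= text.length(); ++i) {
                String gram = text.substring(i, i + 2);
                Integer c = m.get(gram);
                m.put(gram, c == null ? 1 : c + 1);
            }
        }

        public String classify(String text) {
            String best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
                long total = 0;
                for (int c : e.getValue().values()) total += c;
                double score = 0.0;
                for (int i = 0; i + 2 <= text.length(); ++i) {
                    Integer c = e.getValue().get(text.substring(i, i + 2));
                    // add-one smoothing over a nominal space of 10000 bigrams
                    score += Math.log(((c == null ? 0 : c) + 1.0) / (total + 10000.0));
                }
                if (score > bestScore) { bestScore = score; best = e.getKey(); }
            }
            return best;
        }

        public static void main(String[] args) {
            LangId id = new LangId();
            id.train("en", "the quick brown fox jumps over the lazy dog");
            id.train("nl", "de snelle bruine vos springt over de luie hond");
            System.out.println(id.classify("the quick dog"));  // -> en with these toy counts
        }
    }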
recall is much easier than fine-grained
linguistic morphology.
Often the best solution is to combine a
best guess based on linguistic rules, statistical
models, or heuristics with weaker substring
measures, as in the sketch below.
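A minimal sketch of that combination (the lexicon, the threshold of 3, and the match policy are all invented): try the exact best-guess lookup first, then fall back to a weaker edit-distance measure.

    import java.util.*;

    // Exact lookup first; fall back to a fuzzy edit-distance match.
    public class FuzzyLookup {
        static int editDistance(String a, String b) {
            int[][] d = new int[a.length() + 1][b.length() + 1];
            for (int i = 0; i <= a.length(); ++i) d[i][0] = i;
            for (int j = 0; j <= b.length(); ++j) d[0][j] = j;
            for (int i = 1; i <= a.length(); ++i)
                for (int j = 1; j <= b.length(); ++j)
                    d[i][j] = Math.min(
                        d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1),
                        Math.min(d[i - 1][j], d[i][j - 1]) + 1);
            return d[a.length()][b.length()];
        }

        public static String match(String token, Set<String> lexicon) {
            if (lexicon.contains(token)) return token;  // best guess: exact hit
            String best = null;
            int bestDist = 3;                           // fuzzy fallback, distance < 3
            for (String entry : lexicon) {
                int dist = editDistance(token, entry);
                if (dist < bestDist) { bestDist = dist; best = entry; }
            }
            return best;
        }

        public static void main(String[] args) {
            Set<String> lex = new HashSet<String>(Arrays.asList("morphology", "recall"));
            System.out.println(match("morfology", lex));  // -> morphology
        }
    }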
For better solutions that would cover fuzzy errors, contact Bob Carpenter at
Alias-i.
because the doc vectors
remain stable as new docs are added.
Then, in general:
score(doc', doc) > score(doc, doc)
where doc' is doc with each term weight scaled by 1/IDF. That is,
the inversely IDF-scaled query matches a document better than
the document itself does.
- Bob Carpenter
Alias-i
-
iness is a testament to how hard
this problem is in general.
- Bob Carpenter
Alias-i
I'm looking at a problem and I can't figure out how to "easily" solve it...
Basically, I'm trying to figure out if there's a way to use Lucene/Nutch
with some form of pattern matching
nt("t") / collectionSize
collectionCount("t") = count of term "t" in the collection
collectionSize = number of term instances (not types) in the collection
- Bob Carpenter
Alias-i
Andrzej Bialecki wrote:
Nader Akhnoukh wrote:
Yes, Chris is correct, the goal is to determine
"t1")*probFG("t2")
to both find things that are new and that are
phrase-like.
I'm going to be writing this all up in a bit longer
form in a case study for the revised Lucene in Action,
with explanations of how to find the significant
terms relative to a query, like Scirus.com does.
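In the same notation, a hedged sketch of that scoring (the smoothing constant and toy counts are invented): the ratio probFG("t1 t2") / (probFG("t1") * probFG("t2")) flags phrase-like bigrams, and probFG("t1 t2") / probBG("t1 t2") flags new ones.

    // Score a bigram two ways, following the probabilities above.
    public class SignificantTerms {
        static double prob(long count, long size) {
            return (count + 0.5) / (size + 1.0);  // small smoothing constant (assumed)
        }

        // Phrase-likeness: observed bigram rate in the foreground vs.
        // the rate predicted by unigram independence.
        static double phraseScore(long fgT1T2, long fgT1, long fgT2, long fgSize) {
            return prob(fgT1T2, fgSize)
                / (prob(fgT1, fgSize) * prob(fgT2, fgSize));
        }

        // Newness: foreground rate (e.g. a query's result set) vs.
        // background rate (the whole collection).
        static double newnessScore(long fgT1T2, long fgSize, long bgT1T2, long bgSize) {
            return prob(fgT1T2, fgSize) / prob(bgT1T2, bgSize);
        }

        public static void main(String[] args) {
            // Toy counts: a bigram frequent in the foreground but rare
            // in the background.
            System.out.println(phraseScore(40, 120, 90, 10000));
            System.out.println(newnessScore(40, 10000, 60, 5000000));
        }
    }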
I used to get seg-faults in the JVM until I replaced my memory with ECC
memory a couple of years ago, and haven't seg-faulted since.
- Bob Carpenter
Ross Rankin wrote:
We keep getting JVM crashes on 1.4.3. I found in the archive that setting a
JVM parameter solved the problem for a few users. We've tried that a
et al. have a lot of problems with false positives (correcting things
that were right) and false negatives (missing corrections). This is
especially obvious once you drop into a specialized domain that's
not computer science (which is proportionally over-represented
on the web), or a language that