Hello,
Quick poll for those who have an opinion about what index size monitoring
should report in terms of the number of documents in the index.
Poll: http://blog.sematext.com/2012/02/13/poll-solr-index-size-monitoring/
For example, imagine that in some 5-minute time period (say 10:00 AM to 10:
Hi,
Maybe https://github.com/sematext/ActionGenerator could be of help?
We use it to produce query load for Solr and ElasticSearch and the whole thing
is extensible, so you could easily add support for talking directly to Lucene.
Oh, and there is the benchmark in Lucene:
http://lucene.apache.or
Hi,
When Lucene scores matching documents, what is the order in which
documents are processed/scored and can that be changed? I'm guessing
it scores matches in whichever order they are stored in the index/on
disk, which means by increasing docIDs?
I do see that some out-of-order scoring is possible...
Hi,
Have a look at http://www.youtube.com/watch?v=13yQbaW2V4Y . I'd say
it's easier than Mahout, especially if you already have and know your
way around Solr.
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
On Fri, Jun 28, 2013 at
Hi,
It doesn't have to be one or the other. In the past I've built a news
recommender engine based on CF (Mahout) and combined it with Content
Similarity-based engine (wasn't Solr/Lucene, but something custom that
worked with ngrams, but it may have as well been Lucene/Solr/ES). It
worked well.
I think what Tomislav was trying to ask is:
Can filters replace only strictly boolean clauses (i.e. only MUST and
MUST_NOT, such as +gender:F or -rating:xxx)?
Or can filters also replace SHOULD clauses, such as food:banana (which is
neither absolutely required nor strictly prohibited)?
Otis
--
I think others will have more thoughts on this, esp. for Numeric* questions...
but I'll try answering...
- Original Message
> From: Tomislav Poljak
> To: java-user@lucene.apache.org
> Sent: Fri, May 7, 2010 2:34:46 PM
> Subject: Filter vs. TermQuery performance
>
> Hi,
> when is it w
Pasa,
Maybe Field Collapsing (Solr) can help? See SOLR-236 in JIRA
http://search-lucene.com/?q=field+collapsing&fc_project=Lucene&fc_project=Solr
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message --
I think those doc-oriented DBs tend to be distributed, with replication
built-in and such, but yes, in some way the schemaless DB with docs and fields
(whether they are pumped in as JSON or XML or Java objects) feels the same. I
saw something from Grant about 2 months ago how Lucene is "nosql-i
VL,
Solr (not Lucene, but you can embed Solr) has JsonUpdateRequestHandler, which
lets you send docs to Solr for indexing in JSON (instead of the usual XML):
http://search-lucene.com/c/Solr:/src/java/org/apache/solr/handler/JsonUpdateRequestHandler.java
And you can get Solr to respond with JSON
Hi Pablo,
This question comes up every once in a while. You'll find some previous
discussions and answers here:
http://search-lucene.com/?q=terms+closer+together+score
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
-
Li Li:
Then best to go to the source.
Here's one version with syntax highlighting and line numbers, should you have
questions about specific parts of that class:
http://search-lucene.com/c/Lucene:/src/java/org/apache/lucene/search/PhraseQuery.java
Otis
Sematext :: http://sematext.com/ ::
Btw. folks, http://search-lucene.com/ has a really handy source code search
with auto-completion for Lucene, Solr, etc. For example, I typed in: numDel -
and immediately found those methods. Use it. :)
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search
Other than iostat, vmstat and such?
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
> From: Jason Rutherglen
> To: java-user@lucene.apache.org
> Sent: Thu, June 3, 2010 2:13:17 PM
> Subject: Mo
Ah, there is another one I came across several months back -
http://wiki.sdn.sap.com/wiki/display/Java/JPicus.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
> From: Otis Gospodnetic
Lucene/Solr choice typically means:
* lower cost of ownership (think about various crazy licensing models some of
the commercial search vendors have: per doc, per server, per query, per
year)
* faster implementation (just think about the duration of the sales/negotiation
phase for commerci
Off the top of my head:
FAST
Endeca
Coveo
Attivio
Vivisimo
Google Search Appliance
(tell me when to stop)
Dieselpoint
IBM OmniFind
Exalead
Autonomy
dtSearch
ISYS
Oracle
...
...
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com
nd Lucene...
And I personally wouldn't count full text search solutions such as
Oracle's.
Itamar.
> -----Original Message-
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>
On Wed, Jun 23, 2010 at 11:41 PM, Otis Gospodnetic
<otis_gospodne...@yahoo.com> wrote:
> Off the top of my head:
>
> FAST
>
> Endeca
> Co
too, to show how it has improved in the last versions (not that it was bad
before). Does anyone have a link to a nice page with numbers/graphs?
On Thu, Jun 24, 2010 at 7:43 AM, Otis Gospodnetic
<otis_gospodne...@yahoo.co
Igor,
You can treat that question as the query and use it to search the index where
you've indexed other questions.
More Like This is another option.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
Utku, you should ask via comments on
https://issues.apache.org/jira/browse/LUCENE-2453.
What happened with Lucandra?
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
> From: Utku Can Topçu
> To
Manning, the Lucene in Action publisher, frequently offers 30-50% off on a
number of their books, including LIA2.
See http://twitter.com/ManningBooks
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
Hello Luan,
I think you are looking for facets and faceted search. In short, it means
storing the category for a document (web page) in a Field of the Document in
the Lucene index. Then, at search time, you count how many matches were in which
category. You can implement this yourself or you can use
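
The "implement this yourself" option boils down to tallying one stored category value per matching document. A minimal sketch of that counting step, assuming the category of each match has already been read out of the index; the class name, method name, and sample categories are made up for illustration and are not Lucene or Solr APIs:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: counts matches per category.
class FacetCountSketch {

    // Input: the category value of every matching document, in hit order.
    // Output: category -> number of matches, in first-seen order.
    static Map<String, Integer> countByCategory(List<String> categoriesOfMatches) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String category : categoriesOfMatches) {
            counts.merge(category, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented sample data standing in for the category field of each hit.
        List<String> matches =
                Arrays.asList("sports", "news", "sports", "tech", "news", "sports");
        System.out.println(countByCategory(matches));
    }
}
```

In a real application the list would come from the category field of each hit, which can get slow for large result sets; that is the part a ready-made faceting implementation takes care of.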
There is also a non-Mahout Key Phrase Extractor for Collocations, SIPs, and a
few other things: http://sematext.com/products/key-phrase-extractor/index.html
One of the demos that uses news data is at
http://sematext.com/demo/kpe/index.html
Otis
Sematext :: http://sematext.com/ :: Solr - Lu
Hi,
Are you actually talking about Solr? Sounds like it. Check solr-u...@lucene
list.
Maybe you need to treat those words as protected words? See the protwords.txt
file in the conf dir.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://se
s.searchenginewatch.com/showthread.php?t=48>.
> I hope to find some code that given a text corpus, generate all the words
> pairs with their probability of occurring together.
>
>
> On Sat, Aug 21, 2010 at 1:46 AM, Otis Gospodnetic <
> otis_gospodne...@yahoo.com> wro
Hello,
You can use LuSQL to index DB content into Lucene. Solr (the "Lucene Server")
has DataImportHandler for indexing data from DBs:
http://search-lucene.com/?q=dataimporthandler
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-luce
Hello,
Of course, if you actually want the rolling last-7-days effect and not this
week vs. previous week, then you could go with smaller indices, say daily ones.
Then you'd always add new docs to the latest index and remove the oldest
index completely every 24 hours.
You could go hourly
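
The rolling scheme described here is just a fixed-size queue of indices: new documents go to the newest one, and the oldest one is dropped whole once the window is full. A rough sketch of that bookkeeping, with index names as plain strings; the class and method names are invented for illustration, and actually creating, searching, and deleting the indices is left out:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical bookkeeping for a rolling window of per-period indices.
class RollingIndexSketch {
    private final int maxIndices;                       // e.g. 7 for daily indices
    private final Deque<String> indexNames = new ArrayDeque<>();

    RollingIndexSketch(int maxIndices) {
        this.maxIndices = maxIndices;
    }

    // Call once per period (every 24 hours for daily indices).
    // Returns the name of the index to delete, or null if the window is not full yet.
    String roll(String newIndexName) {
        indexNames.addLast(newIndexName);
        if (indexNames.size() > maxIndices) {
            return indexNames.removeFirst();
        }
        return null;
    }

    // The index that receives new documents.
    String current() {
        return indexNames.peekLast();
    }

    // All indices that a search should cover, oldest first.
    List<String> searchable() {
        return new ArrayList<>(indexNames);
    }

    public static void main(String[] args) {
        RollingIndexSketch window = new RollingIndexSketch(7);
        for (int day = 1; day <= 9; day++) {
            String dropped = window.roll("index-day-" + day);
            System.out.println("writing to " + window.current() + ", dropped " + dropped);
        }
    }
}
```

Going hourly only changes how often roll() is called and how large the window is (168 instead of 7 for the same 7 days).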
Hi Clemens,
If you will be searching individual languages, go with language-specific
indices. Wunder likes to give an example of "die" in German vs. English. :)
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Orig
> [X] ASF Mirrors (linked in our release announcements or via the Lucene
>website)
>
> [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [X] I/we build them from source via an SVN/Git checkout.
>
> [] Other (someone in your company mirrors them internally or via a
> d
Hi Ganesh,
You could probably use replication scripts from Solr.
But why not just use Solr?
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
> From: Ganesh
> To: java-user@lucene.apache.org
> S
Mark,
Keep in mind that there are actually multiple patches for this. SOLR-236 and
SOLR-1086 IIRC.
Also, I just noticed this is java-user@lucene. You may want to continue on
solr-user@lucene.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http:/
Hi Chris,
Yes, people have done classification with Lucene before. Have a look at
http://search-lucene.com/?q=classifier&fc_project=Lucene for some discussions
and actual code (in old JIRA issues)
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: ht
I think what's being described here is a lot like what I *think* ElasticSearch
does, where there is no single master and index changes made to any node get
propagated to N-1 other nodes (N=number of index replicas). I'm not sure how
it deals with situations where "incompatible" index changes a
Hi,
I'm looking at some code that uses MemoryIndex (Lucene 3.1) and that's
exhibiting a strange behaviour - it slows down over time.
The MemoryIndex contains 1 doc, of course, and executes a set of a few thousand
queries against it. The set of queries does not change - the same set of
queries
y ArrayUtils.mergeSort()
> and see if problem is still there?
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -Original Message-
> > From: Otis Gospodnetic [mailto:otis
s and stack overflow. In Lucene 3.0 this used
> > > stock java sort (which is mergesort), maybe replace the
> > > ArrayUtils.quickSort by ArrayUtils.mergeSort() and see if problem is
> > > still there?
> > >
> > > Uwe
> > >
> > > -
> > >
at (nearly) full speed and once
> you hit the breakpoint, inspect the stack, variables, etc...
>
> Dawid
>
> On Fri, Apr 29, 2011 at 1:40 PM, Otis Gospodnetic <
> otis_gospodne...@yahoo.com> wrote:
>
> > Hi,
> >
> > OK, so it looks like it's not
Hi,
Is there any reason why one would *not* want to reuse Query instances?
I'm using MemoryIndex with a fixed set of queries and I'm executing them all on
each new document that comes in. Because each document needs to have many tens
of thousands of queries executed against it, I thought I'd j
Hi,
I'd like to solicit your thoughts about Search Analytics if you are doing any
sort of analysis/reporting of search logs or click stream or anything related.
* Which information or reports do you find the most useful and why?
* Which reports would you like to have, but don't have for whatever
Hi,
I think this describes what's going on:
10 load N stored queries
20 parse N stored queries, keep them in some List forever
30 for each incoming document create a new MemoryIndex instance "mi"
40 for query 1 to N do mi.search(query)
Over time this step 40 takes longer and longer and longer --
Hi,
I didn't read this thread closely, but just in case:
* Is this something you can handle with synonyms?
* If this is for English and you are trying to handle typos, there is a list of
common English misspellings out there that you could use for this perhaps.
* Have you considered n-gramming yo
k that just "n-grams" the docs/fields.
>
> class SimpleNGramAnalyzer extends Analyzer
> {
> @Override
> public TokenStream tokenStream ( String fieldName, Reader reader )
> {
>EdgeNGramTokenFilter... ???
> }
> }
>
> > -Ursprüngliche Nachric
eld content) as it is...
>
> > -Ursprüngliche Nachricht-
> > Von: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> > Gesendet: Dienstag, 3. Mai 2011 21:31
> > An: java-user@lucene.apache.org
> > Betreff: Re: AW: AW: "fuzzy prefix" search
ne - Nutch Lucene
> > > > ecosystem search :: http://search-lucene.com/
> > > >
> > > >
> > > >
> > > > ----- Original Message
> > > > > From: Clemens Wyss
> > > > > To: "java-user@lucene.apache.org"
If only you were using Solr
http://wiki.apache.org/solr/DisMaxQParserPlugin#bf_.28Boost_Functions.29
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
> From: Johnbin Wang
> To: java-user@l
We've used Hadoop MapReduce with Solr to parallelize indexing for a customer
and that brought their multi-hour indexing process down to a couple of
minutes. There is/was also Lucene-level contrib in Hadoop that makes use of
MapReduce to parallelize indexing.
Otis
Sematext :: http://
Hello,
I saw mentions of something called "Caste" a while back, but only now looked at
what it is, and it sounds like something that's potentially interesting/useful
(performance-wise) for Lucene/Solr.
See http://twitter.com/#!/otisg/status/109768673467699200
Has anyone tried it with Lucene/S
Hello folks,
Do you ever use http://search-lucene.com (SL) or http://search-hadoop.com (SH)?
If you do, I'd like to ask you for a small favour:
We are at Lucene Eurocon in Barcelona and we are about to show the Search
Analytics [1] and Performance Monitoring [2] tools/services we've built and
t
Bok Tamara,
You didn't say what -Xmx value you are using. Try a little higher value. Note
that loading field values (and it looks like this one may be big because it is
compressed) from a lot of hits is not recommended.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene e
Have a look at http://search-lucene.com/ where you can search Lucene mailing
list archives (user, dev, common), its web site, wiki, source code, jira, etc.
as well as the same types of data for Solr, Nutch, and so on.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene eco
Hi,
Logstash is the piece that first touches your logs, filters them, and then
outputs them somewhere.
People often use it with ElasticSearch. Once logs are in ES, they look at them
with Kibana.
Note: somebody should write a Logstash output for Solr!
In Solr world there is Flume, which has a
Hi,
(cross-posting to both Solr and Lucene user lists because while this is a
Lucene-level question, I suspect a lot of people who know about this or are
interested in this subject are actually on the Solr list)
I have a large append-only index and I looked at merge policies hoping to
identify one
Thanks Mike(s) & Co.
Added https://issues.apache.org/jira/browse/LUCENE-5419
Sounds like a killer feature :)
Otis
On Wed, Jan 8, 2014 at 4:17 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> On Mon, Jan 6, 2014 at 3:42 PM, Michael Sokolov
> wrote:
> > I think the key optimization
Hello,
We have what I think is a great opening at Sematext. Ideal candidate would
be in New York, but that's not an absolute must. More info below + on
http://sematext.com/about/jobs.html in job-ad-speak, but I'd be happy to
describe what we are looking for, what we do, and what types of companie
Hi,
I spotted Uwe's comment in JIRA the other day "BTRFS, which might also
bring some cool things for Lucene.".
Has anyone tried Lucene (or Solr or Elasticsearch) with BTRFS and seen some
(performance) benefits over ext3/4 or xfs for example?
Thanks,
Otis
--
Monitoring * Alerting * Anomaly D
Heiko,
It's most likely because that B case has a purely negative query. Perhaps you
can combine it with MatchAllDocs query?
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Heiko <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sen
}
> return false;
> }
>
> For this simplified call:
>
> public boolean next() {
> return (id++ < maxId);
> }
>
> This change doesn't validate deleted documents, in my implementation it was
> not a problem, so, it's possible that this
Giovanni,
You could try the approach you described - one index per user. When I built
Simpy (see http://simpy.com ) a few years ago I chose the same approach and I
never regretted it. The hardware behind Simpy is very modest, usage is high,
and I never had problems with too many indices open
Dino, you lost me half-way through your email :(
NO_NORMS does not mean the field is not tokenized.
UN_TOKENIZED does mean the field is not tokenized.
Otis--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Dino Korah <[EMAIL PROTECTED]>
> To: java
Dino,
If a field is not tokenized then it is indexed as is.
For example: "Dino Korah" would get indexed just like that. It would not get
split into multiple tokens, it would not be lowercased, it would not have any
stop words removed from it, etc.
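
To make the difference concrete, here is a small stand-alone illustration of what ends up as indexed terms in each case. It is only a sketch: the "tokenized" path is a crude whitespace/lowercase/stop-word stand-in that I made up, not one of Lucene's analyzers, and the stop word list is arbitrary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: mimics the effect of tokenized vs. untokenized indexing.
class TokenizationSketch {
    // Arbitrary stop words chosen for the example.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of"));

    // Untokenized: the whole field value becomes a single term, unchanged.
    static List<String> untokenized(String value) {
        return Collections.singletonList(value);
    }

    // Tokenized (simplified): split on whitespace, lowercase, drop stop words.
    static List<String> tokenized(String value) {
        List<String> tokens = new ArrayList<>();
        for (String raw : value.trim().split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(untokenized("Dino Korah")); // one term: the exact value
        System.out.println(tokenized("Dino Korah"));   // two lowercased terms
    }
}
```

One consequence is that a search for the untokenized value has to match it exactly, including case.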
Otis
--
Sematext -- http://sematext.com/ -- Lu
> Field.Index.UN_TOKENIZED plus field.setOmitNorms(true).
>
> Probably we should rename it to Field.Index.UN_TOKENIZED_NO_NORMS?
>
> Mike
>
> Otis Gospodnetic wrote:
>
> > Dino, you lost me half-way through your email :(
> >
> > NO_NORMS does not me
Hi,
You may want to ask on the java-user list (more subscribers), which I'm CC-ing,
so we can continue discussion there.
I think you will have to implement your own logic that runs on A and does
something like this:
- stop adding new docs
- call commit on the IndexWriter
- copy the index
- res
Sithu,
Old emails: markmail.org
Sample code: Lucene in Action has free downloadable code -- manning.com/hatcher2
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: "Sudarsan, Sithu D." <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Se
dex every certain amount of time on A.
>
> -copy the index
> Copying the whole index every time?
>
> Currently I am investigating how I can make use of SOLR replication scripts
> to achieve this.
>
>
> Is there anyone who did this without SOLR before?
>
>
> Tha
So in other words, it *is* possible to have the field both tokenized and its
norms omitted?
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Karl Wettin <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Thursday, August 28, 200
> >
> > The snapinstaller runs on the slave after a snapshot has
> > been pulled from
> > the master. This signals the local Solr server to open a
> > new index reader,
> > then auto-warming of the cache(s) begins (in the new
> > reader), while ot
e.org
> Sent: Thursday, August 28, 2008 1:39:21 PM
> Subject: Re: Case Sensitivity
>
> Otis Gospodnetic wrote:
> > So in other words, it *is* possible to have the field both tokenized and
> > its norms omitted?
>
> Yes. Probably this is an unintended side-ef
This actually sounds bugish to me, but you removed the text from your original
email, so I don't know what context this was in.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: gaz77 <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Se
Joe,
CLucene is slightly behind Java Lucene, but I believe CLucene developers are
working on a 2.3.2 port. I think that's the only C++ option.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Joseph Kovacic <[EMAIL PROTECTED]>
> To: "java-us
Guy, ulimit -n is your friend. As is the compound index format.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Guy Gavriely <[EMAIL PROTECTED]>
> To: "java-user@lucene.apache.org"
> Sent: Thursday, September 11, 2008 10:28:34 AM
> Subjec
Hi,
Check the Hits javadoc:
* @deprecated Hits will be removed in Lucene 3.0.
 * Instead e. g. {@link TopDocCollector} and {@link TopDocs} can be used:
*
* TopDocCollector collector = new TopDocCollector(hitsPerPage);
* searcher.search(query, collector);
* Scor
Yes, probably out of sync with the 2.3.2 code. Have you tried applying it to
the trunk?
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Cam Bazz <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Monday, September 15, 2008 11:14
I don't think the "exists vs. doesn't exist" part matters (but I should really
try it and see) as much as using Sort vs. not using it, because sorting
requires FieldCache loading.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> Fr
I think Daniel was suggesting you write your own HitCollector with its own "int
hits" counter var.
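
A sketch of that idea follows. To keep it runnable without Lucene on the classpath, it uses a local stand-in class with the same collect(int doc, float score) callback shape as Lucene 2.x's HitCollector; the stand-in, the class names, and the fake doc IDs are all assumptions for illustration. With the real library you would extend org.apache.lucene.search.HitCollector instead and pass the collector to searcher.search(query, collector).

```java
// Local stand-in for Lucene 2.x's HitCollector (assumption: same callback shape).
abstract class HitCollectorStandIn {
    abstract void collect(int doc, float score);
}

// Collector that only counts matches and keeps nothing else.
class CountingCollector extends HitCollectorStandIn {
    int hits; // the "int hits" counter

    @Override
    void collect(int doc, float score) {
        hits++; // called once per matching document
    }
}

class HitCountSketch {
    // Simulates a searcher calling collect() once for every matching doc ID.
    static int countHits(int[] matchingDocIds) {
        CountingCollector collector = new CountingCollector();
        for (int doc : matchingDocIds) {
            collector.collect(doc, 1.0f); // the score is ignored by this collector
        }
        return collector.hits;
    }

    public static void main(String[] args) {
        System.out.println(countHits(new int[] {3, 7, 42}));
    }
}
```

Because the collector sees every match and not just the top N, the counter ends up holding the total number of hits.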
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Cam Bazz <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Monday, September 15,
Tobias,
That's the approach I took with Simpy.com and it's been working well for
several years now. You'll have to keep track of searchers and close them when
appropriate, of course.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Tobias
Are the terms stopwords?
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Cam Bazz <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Tuesday, September 16, 2008 1:33:48 AM
> Subject: Phrase Query
>
> Hello,
>
> Lets say I have
If your index is increasing in size so fast, you should start thinking about
sharding your index (breaking it into multiple smaller indices that each fits
on its server) and searching across them (aka distributed search).
Yes, Lucene can handle millions of records if run on adequate hardware and
t.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Otis Gospodnetic <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Tuesday, August 12, 2008 3:37:00 PM
> Subject: Case studies for Lucene in Action 2nd edition
>
> Hello,
>
> We are work
Hi,
Wrong list. :) I answered your question on solr-user.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: rahul_k123 <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Tuesday, September 23, 2008 11:00:02 PM
> Subject: Rsync cau
Gregor,
You could loop through the results or collect them using a custom HitCollector.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Friday, September 26,
I think somebody provided a patch (might have been a whole new IndexReader
impl?) many moons ago (2005?), but it never attracted enough
interest to get committed.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Michael Wechner <
bject: RE: Getting all found document ids from a search result
>
> Hi,
>
> Do I really get all results if I use a custom hitcollector?
> This would be great :-)
>
> Regards,
> Gregor
>
>
>
> -Original Message-
> From: Otis Gospodnetic [mailto:[EMAIL PR
David, this is not really a Lucene issue.
Here is some Perl code that you could either use or rewrite in Java if you need
it in Java:
http://search.cpan.org/dist/Date-Extract/
Tika won't help with this, and I believe UIMA itself will not help either,
although there may be components for date ex
Hello,
Very quick comments.
- Original Message
> From: Justus Pendleton <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Sunday, November 2, 2008 10:42:52 PM
> Subject: Performance of never optimizing
>
> Howdy,
>
> I have a couple of questions regarding some Lucene ben
Yes, I think it is. I think the only catch will be those log timestamps, how
fine you really need them to be, and if you want them very fine what happens
when you do range queries on timestamps. If you have a pile of log files lying
around, it should be pretty easy to get them indexed. You do
Christian,
If I understand your situation correctly, you should look at sloppy phrases and
at Span family of queries.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
From: Christian Reuschling <[EMAIL PROTECTED]>
To: java-user@lucene.apache
Or Tika, Lucene's cousin: http://incubator.apache.org/tika/
(which uses POI under the hood, but goes beyond MS Word parsing)
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
From: Donna L Gresh <[EMAIL PROTECTED]>
To: java-user@lucene.apache.or
Mario,
Does this help:
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/index/TermFreqVector.html
Plus:
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/index/IndexReader.html#method_summary
(look for "getTerm.Freq...")
Otis
--
Se
The more Documents you have to look at the slower it will be, but it may still
be fast enough - it's impossible to tell without considering index size, query
volume, hardware, number of hits/Docs, etc.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
___
Hi Mayur,
Solr has built-in support for facets. I don't understand what you mean by
scoped searches. Could you please give a concrete example?
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
From: "Bapat, Mayur" <[EMAIL PROTECTED]>
To: ja
There is CLucene. It's not a part of Apache, but lives on SourceForge, I
think.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Ariel <[EMAIL PROTECTED]>
> To: lucene user
> Sent: Tuesday, December 2, 2008 2:13:08 PM
> Subject: I wou
Tim (and we should move this to java-dev if it gains traction),
Perhaps you can come up with a mechanism to perform scoring in two passes
instead of one:
- first pass is cheap and fast
- second pass is more expensive and slower
Currently, there is no choice - Lucene does 2). But perhaps you can
Yeah, I think we'll have to start paying the commission fee! ;)
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Erick Erickson <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Friday, December 5, 2008 8:37:20 AM
> Subject: Re:
If Hoss is referring to synonym expansion, allow me to point out that freely
downloadable code from Lucene in Action (first edition) has code for that, if
you'd like to have a look, OP.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Chri
Let me add to that that I clearly recall having a hard time getting the tests
for that particular section of LIA1 to clearly and consistently show that using
the RAMDirectory buffering approach instead of vanilla IndexWriter yields
faster indexing. Even back then IndexWriter buffered indexed da
Mark,
This is simple enough that it should be easy to put together. If you search
the ML archives you'll see that one of the common "tricks" is to "flip" host
name parts (e.g. com.sematext.www). The details of this have been discussed
before, so have a look.
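
For reference, the "flip" itself is only a few lines. A minimal sketch, assuming plain dot-separated host names with no port or scheme; the class and method names are made up, and where the flipped value gets indexed is left to the earlier list discussions:

```java
// Hypothetical helper: reverses the dot-separated parts of a host name.
class HostFlipSketch {

    // "www.sematext.com" becomes "com.sematext.www".
    static String flipHost(String host) {
        String[] parts = host.split("\\.");
        StringBuilder flipped = new StringBuilder();
        for (int i = parts.length - 1; i >= 0; i--) {
            flipped.append(parts[i]);
            if (i > 0) {
                flipped.append('.');
            }
        }
        return flipped.toString();
    }

    public static void main(String[] args) {
        System.out.println(flipHost("www.sematext.com"));
    }
}
```

With the parts flipped, all hosts under one domain share a common prefix (here com.sematext.), so they sort together and can be matched with a prefix-style query.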
Otis --
Sematext -- http://semat
Christian,
You can certainly purge old documents on a daily basis in order to keep the
corpus from growing, but note that 3M*90=270M 2K docs may be a bit too much for
a single index unless you really have lots of RAM or you don't need queries to
be quick. In other words, you may have to spread
.
> can you give me an idea what in your opinion would mean "don't need
> queries to be quick" ...
> I have no idea in what timeframe it could be handled if it is not
> completely in RAM.
>
> regards chris
>
>
>
> On Mon, Dec 22, 2008 at 4:41 AM, Oti