A merge will also change docIds.
All segments' docIds begin at 0.
2011/3/30 Trejkaz :
> On Tue, Mar 29, 2011 at 11:21 PM, Erick Erickson
> wrote:
>> I'm always skeptical of storing the doc IDs since they can
>> change out from underneath you (just delete even a single
>> document and optimize).
>
> We never delete documents.
On Tue, Mar 29, 2011 at 11:21 PM, Erick Erickson
wrote:
> I'm always skeptical of storing the doc IDs since they can
> change out from underneath you (just delete even a single
> document and optimize).
We never delete documents. Even when a feature request came in to
update documents (i.e. dele
On Tue, Mar 29, 2011 at 6:56 PM, Christopher Condit wrote:
> Ideally I'd like to have the parser use the
> custom analyzer for everything unless it's going to parse a clause into
> a PhraseQuery or a MultiPhraseQuery, in which case it uses the
> SimpleAnalyzer and looks in the _exact field - but
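A minimal sketch of the parser behaviour described above, assuming a Lucene
3.x-style QueryParser (the exact override signature and package names differ
between versions); the "_exact" field suffix, the class name, and the Version
constant are illustrative, not from the thread:

import java.util.Collections;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

// Quoted clauses are parsed against the un-stemmed "_exact" field; the
// PerFieldAnalyzerWrapper makes sure that field is analyzed with
// SimpleAnalyzer while everything else goes through the custom analyzer.
public class ExactAwareQueryParser extends QueryParser {

    public static final String EXACT_SUFFIX = "_exact";

    public ExactAwareQueryParser(String defaultField, Analyzer customAnalyzer) {
        super(Version.LUCENE_31, defaultField,
              new PerFieldAnalyzerWrapper(customAnalyzer,
                  Collections.<String, Analyzer>singletonMap(
                      defaultField + EXACT_SUFFIX,
                      new SimpleAnalyzer(Version.LUCENE_31))));
    }

    @Override
    protected Query getFieldQuery(String field, String queryText, boolean quoted)
            throws ParseException {
        // Phrases ("...") go to the exact field; everything else is unchanged.
        String target = quoted ? field + EXACT_SUFFIX : field;
        return super.getFieldQuery(target, queryText, quoted);
    }
}

Only the default field is wired to an exact counterpart here; a real
implementation would map every searchable field the same way.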
I have Lucene indexes built using a shingled, stemmed custom analyzer.
I have a new requirement that exact searches match correctly.
i.e.: bar AND "nachos"
should only fetch results with the plural "nachos". Right now, with the
stemming, singular nacho results are returned as well. I realize that
I'm going t
One last thing: how do I check if the random document does not contain the
term?
In other words, I cannot just pass the TermsFilter; I need to check whether
the retrieved random document is valid or not, to know if I have enough.
Any code example is appreciated... so far I have this one, to retrieve
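One way to validate a candidate docid against the term, sketched with the
Lucene 3.x IndexReader/TermDocs API (the class name is made up; in 4.x the
equivalent lives on DocsEnum):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

public class TermCheck {
    // True if the document with the given internal docId contains the term.
    public static boolean containsTerm(IndexReader reader, int docId, Term term)
            throws IOException {
        TermDocs td = reader.termDocs(term);
        try {
            // skipTo() positions on the first doc >= docId that has the term.
            return td.skipTo(docId) && td.doc() == docId;
        } finally {
            td.close();
        }
    }
}

A docid drawn with Random.nextInt(reader.maxDoc()) can then be rejected when
containsTerm(...) returns true, or when reader.isDeleted(docId) is true.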
> Plan A sounds better because I don't want to consider the entire collection
> and then remove results from it.
Fine, your choice.
> However, the same code has to work with 2 different collections. The first
> one has 30,000 docs, the other 90,000.
No problem. The number of docs is irreleva
Plan A sounds better because I don't want to consider the entire collection
and then remove results from it.
However, the same code has to work with 2 different collections. The first
one has 30,000 docs, the other 90,000.
How can I get the total number of docs from a collection?
thanks
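On the count question, IndexReader exposes both figures directly; a tiny
sketch, with the index path as a placeholder:

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.FSDirectory;

public class CountDocs {
    public static void main(String[] args) throws Exception {
        // "/path/to/index" is a placeholder for the collection's index directory.
        IndexReader reader = IndexReader.open(FSDirectory.open(new File("/path/to/index")));
        System.out.println("live docs = " + reader.numDocs()   // not counting deletions
                         + ", maxDoc = " + reader.maxDoc());   // includes deleted slots
        reader.close();
    }
}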
Here are a couple of ideas.
Plan A.
Pick a multiplier, say 10: retrieve n * 10 docids in your search and
then loop round java.util.Random.nextInt(n * 10) until you've got
enough.
Plan B.
Reverse your MUST NOT search to get a list of docids that you don't
want, then loop round Random.nextInt(ind
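A rough sketch of Plan A, assuming a Lucene 3.x-style IndexSearcher; the
oversampling factor of 10 is the example figure from above and the class
name is made up:

import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class RandomSampler {
    // Plan A: over-fetch n * 10 hits for the query, then pick n at random.
    public static int[] sample(IndexSearcher searcher, Query query, int n)
            throws IOException {
        TopDocs top = searcher.search(query, n * 10);
        ScoreDoc[] hits = top.scoreDocs;
        Random rnd = new Random();
        Set<Integer> picked = new LinkedHashSet<Integer>();
        while (picked.size() < n && picked.size() < hits.length) {
            picked.add(hits[rnd.nextInt(hits.length)].doc);  // duplicates are dropped by the set
        }
        int[] result = new int[picked.size()];
        int i = 0;
        for (int doc : picked) result[i++] = doc;
        return result;
    }
}

If the query matches fewer than n documents, the sample is simply smaller.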
It is in the zip/tar.gz file from Hudson under contrib! Alternatively load
the maven artifacts from Lucene's Maven Build in Hudson:
https://builds.apache.org/hudson/job/Lucene-Solr-Maven-trunk/lastSuccessfulBuild/artifact/maven_artifacts/org/apache/lucene/
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
Ok I've solved the first part of the problem. I'm now selecting all
documents that do not contain a given term with a BooleanFilter
and FilterClause, MUST NOT.
I still have to understand how to retrieve random documents and limit the
number of retrieved docs to N.
thanks
On 29 March 2011 20:40,
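For reference, the filter construction described above might look roughly
like this with the contrib-queries classes (Lucene 3.x package names; the
"text"/"pizza" field and term are just the example from this thread):

import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanFilter;   // contrib queries (lucene-queries.jar)
import org.apache.lucene.search.FilterClause;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TermsFilter;
import org.apache.lucene.search.TopDocs;

public class WithoutTermSearch {
    // Return up to maxHits docs that do NOT contain the given term.
    public static TopDocs docsWithoutTerm(IndexSearcher searcher, Term term, int maxHits)
            throws IOException {
        TermsFilter withTerm = new TermsFilter();
        withTerm.addTerm(term);                  // e.g. new Term("text", "pizza")

        BooleanFilter withoutTerm = new BooleanFilter();
        withoutTerm.add(new FilterClause(withTerm, BooleanClause.Occur.MUST_NOT));

        // Match everything, then let the filter drop the docs we don't want.
        return searcher.search(new MatchAllDocsQuery(), withoutTerm, maxHits);
    }
}

Capping the result at maxHits is not yet random selection; that part is
handled by the sampling discussed above.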
Is there a Filter to get a limited number of random docs from the
index which DO NOT contain a specific term?
i.e. term="pizza"
I want to run the query against 10 random documents of the collection that
do not contain the term "pizza".
thanks
Never mind, I've compiled it using ant. Solved, thanks.
On 29 March 2011 17:41, Patrick Diviacco wrote:
> Ok, in the svn repository I can only find the source files. Should I build the
> jar myself, or is there a packaged jar to download?
>
> thanks
>
>
> On 29 March 2011 16:00, Uwe Schindler wrot
Ok, in the svn repository I can only find the source files. Should I build the
jar myself, or is there a packaged jar to download?
thanks
On 29 March 2011 16:00, Uwe Schindler wrote:
> Hi,
>
> The TermsFilter is not in Lucene Core. You have to use one of the contrib
> JARS (I think it is contri
I've written an "Understanding Lucene" refcard that has just been published at
DZone. See here for details:
http://www.lucidimagination.com/blog/2011/03/28/understanding-lucene-by-erik-hatcher-free-dzone-refcard-now-available/
If you're new to Lucene or Solr, this refcard will be a nice gro
Hi,
The TermsFilter is not in Lucene Core. You have to use one of the contrib
JARS (I think it is contrib-queries, so should be lucene-queries.jar).
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: P
If you are using machine learning techniques for concept learning, then
you can use the Mahout library. Mahout has plenty of clustering
algorithms (https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms)
which are useful for concept learning.
Thanks
Vineet Yadav
On Tue, Mar 29, 2011 at 12:42 PM, h
Java code to do what? You might ask machine learning questions over on
u...@mahout.apache.org, but please provide more details on what you are
doing.
On Mar 29, 2011, at 3:12 AM, henok sahilu wrote:
> Hello All
> Recently, I am trying to develop an automatic definition extraction system
>
I get that in response to this:
import org.apache.lucene.search.TermsFilter;
well I'm only using this jar: lucene-core-4.0-20110304.141738-1.jar
and for example this line of my code compiles correctly:
booleanQuery.add(new QueryParser(org.apache.lucene.util.Version.LUCENE_40,
"tags", new
Whitesp
You get this in response to doing what? Are you sure you've unpacked
the nightly build and aren't inadvertently getting older jars?
Best
Erick
On Tue, Mar 29, 2011 at 7:21 AM, Patrick Diviacco
wrote:
> I've downloaded the nightly build of Lucene (TRUNK) and I'm referring to the
> following doc
I'm always skeptical of storing the doc IDs since they can
change out from underneath you (just delete even a single
document and optimize). What is it you're doing with
the doc ID that you couldn't do with the guid? If your "guid list"
were ordered, I can imagine building filters quite quickly fro
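A sketch of the "build filters from the guid list" idea mentioned above,
assuming the guids are indexed untokenized in a field called "guid" (the
field and class names are illustrative) and the contrib TermsFilter is on
the classpath:

import java.util.List;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermsFilter;   // contrib queries (lucene-queries.jar)

// Restricts a search to the documents whose "guid" field matches one of
// the given ids; internal docids never need to be stored anywhere.
public class GuidFilterBuilder {
    public static TermsFilter fromGuids(List<String> guids) {
        TermsFilter filter = new TermsFilter();
        for (String guid : guids) {
            filter.addTerm(new Term("guid", guid));
        }
        return filter;
    }
}

Passing the resulting filter to IndexSearcher.search(query, filter, n)
restricts the search to those documents regardless of merges or optimizes.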
I've downloaded the nightly build of Lucene (TRUNK) and I'm referring to the
following documentation:
https://hudson.apache.org/hudson/view/G-L/view/Lucene/job/Lucene-trunk/javadoc/all/index.html
But I get:
cannot find symbol
symbol : class TermsFilter
location: package org.apache.lucene.search
http://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_restrict_searches_to_only_return_results_from_a_limited_subset_of_documents_in_the_index_.28e.g._for_privacy_reasons.29.3F_What_is_the_best_way_to_approach_this.3F
--
Ian.
On Tue, Mar 29, 2011 at 11:54 AM, Patrick Diviacco
wrote:
> hi,
>
>
hi,
Can I run a query against just a few specific docs of the collection?
Can I filter the built collection according to the content of document fields?
For example, I would like to query only over documents having field2 = "abc".
thanks
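As the FAQ entry linked in the reply suggests, a filter handles both cases;
a minimal sketch using only core classes, with the field name and term taken
from the example above (the class name is made up):

import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

public class RestrictedSearch {
    // Run "query" only over documents whose field2 contains the term "abc".
    public static TopDocs searchWithinSubset(IndexSearcher searcher, Query query)
            throws IOException {
        Filter onlyAbc = new QueryWrapperFilter(new TermQuery(new Term("field2", "abc")));
        return searcher.search(query, onlyAbc, 10);
    }
}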
> 1 - I'm using Commons Digester as the XML parser; how can I find the
> bottleneck? Should I run the code and comment out the Lucene queries part
> and just leave the XML parsing?
That is what I was suggesting.
> 2 - I actually also wanted to know the following: how long does it take to
> run a 10
1 - I'm using Commons Digester as the XML parser; how can I find the
bottleneck? Should I run the code and comment out the Lucene queries part
and just leave the XML parsing?
2 - I actually also wanted to know the following: how long does it take to
run a 100MB queries text file against each single
My machine is a dual-core Intel with 4GB of RAM... is there something wrong
here?
On 29 March 2011 11:22, Patrick Diviacco wrote:
> hi,
>
> I'm performing multiple queries (stored in a 100MB XML file) against a
> collection (indexed with Lucene; it was previously stored in a 100MB XML
> file).
>
>
You need to figure out what is taking the time, for example by reading
the XML file without making any lucene queries. What XML parsing
process are you using? Some are faster than others. A google search
should find loads of info.
If it turns out that it is lucene searching taking most of the t
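A crude way to split the measurement as suggested above; parseQueries() and
runQuery() are placeholders standing in for the Digester parsing and the
Lucene search from the original program:

import java.util.Collections;
import java.util.List;

// Crude bottleneck check: time XML parsing and Lucene searching separately.
public class BottleneckCheck {
    public static void main(String[] args) throws Exception {
        long t0 = System.nanoTime();
        List<String> queries = parseQueries("queries.xml");   // Digester pass only
        long afterParse = System.nanoTime();

        for (String q : queries) {
            runQuery(q);                                       // Lucene pass only
        }
        long afterSearch = System.nanoTime();

        System.out.printf("parsing: %.1fs, searching: %.1fs%n",
                (afterParse - t0) / 1e9, (afterSearch - afterParse) / 1e9);
    }

    // Placeholder stubs; replace with the real Digester and searcher code.
    static List<String> parseQueries(String file) { return Collections.emptyList(); }
    static void runQuery(String q) { }
}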
hi,
I'm performing multiple queries (stored in a 100MB XML file) against a
collection (indexed with Lucene; it was previously stored in a 100MB XML
file).
The process seems pretty long on my machine (more than 2 hours), so I was
wondering if importing the 100MB queries XML file into a mysql dataset
hey Uwe, so from your last answer I understand I'm done: no need to do
anything, I can already compare the queries.
However, there is actually a misunderstanding: my BooleanQueries have a
variable number of boolean clauses, because the fields are fixed but the
terms per field are not. So, for exampl
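On the coord part of the discussion: the BooleanQuery constructor (in the
3.x/4.x API) takes a disableCoord flag, so queries with different numbers of
clauses are not additionally scaled by how many clauses matched; a minimal
sketch with made-up field and class names:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class NoCoordQuery {
    // Passing "true" disables the coord() factor, so queries with different
    // numbers of clauses are not scaled by how many clauses matched.
    public static Query build(String field, String... terms) {
        BooleanQuery q = new BooleanQuery(true);
        for (String t : terms) {
            q.add(new TermQuery(new Term(field, t)), BooleanClause.Occur.SHOULD);
        }
        return q;
    }
}

Even with coord disabled, scores from different queries remain only roughly
comparable, which is the caveat being raised here.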
> thanks for your reply. I thought I'd solved the issue: according to Uwe, the
> queries without the coord function were reasonably comparable, but now you
> have actually reopened it.
>
> So, I need to be sure I'm making them comparable, and I would like to ask the
> following.
>
> My BooleanQueries have
hey Hoss,
thanks for your reply. I thought I'd solved the issue: according to Uwe, the
queries without the coord function were reasonably comparable, but now you
have actually reopened it.
So, I need to be sure I'm making them comparable, and I would like to ask the
following.
My BooleanQueries have simil
Hello All
Recently I have been trying to develop an automatic definition extraction system
for the Amharic language, using a machine learning technique (version space learning).
Can anyone suggest some Java code to start with?
Thank You
Henok