Hi
Suppose we have two Hits objects; now we need the documents that exist in both of
them, ignoring the others.
Is there any workaround?
thanks
Regards
Mohammad
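One way to do this, sketched against the Lucene 2.1-era API: run each
query with a HitCollector that marks document numbers in a BitSet, then
AND the two sets. (If both Hits come from queries on the same index, a
BooleanQuery with two MUST clauses may be simpler.) The class and method
names here are mine:

    import java.util.BitSet;
    import org.apache.lucene.search.HitCollector;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    public class Intersection {
        static BitSet collect(IndexSearcher searcher, Query query)
                throws java.io.IOException {
            final BitSet bits = new BitSet(searcher.getIndexReader().maxDoc());
            searcher.search(query, new HitCollector() {
                public void collect(int doc, float score) {
                    bits.set(doc); // doc is the index-internal document number
                }
            });
            return bits;
        }

        static BitSet intersect(IndexSearcher searcher, Query q1, Query q2)
                throws java.io.IOException {
            BitSet a = collect(searcher, q1);
            a.and(collect(searcher, q2)); // keep only docs matching both queries
            return a;
        }
    }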
I registered on Nabble, but to post a message you have to subscribe to the Lucene
mailing list, and if you subscribe to the mailing list your inbox becomes
full of messages. This is very bad!
On 3/28/07, John Haxby <[EMAIL PROTECTED]> wrote:
Grant Ingersoll wrote:
> I like the mailing list approach much better.
Hello
Regarding the collect(int doc, float score) method: which ID does the
doc argument refer to, the original document ID in the index or the
position of the document in the search results?
I ask because I implemented a HitCollector that collects the IDs in a
BitSet and it was ver
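(A quick way to check, sketched against the same 2.1-era API: the number
passed to collect() can be resolved back through the IndexReader, which
shows that, for a plain IndexSearcher, it is the index-internal document
number rather than a position in the result list. The field name "id" is
hypothetical:)

    searcher.search(query, new HitCollector() {
        public void collect(int doc, float score) {
            try {
                // doc is usable directly with the IndexReader
                System.out.println(searcher.getIndexReader().document(doc).get("id"));
            } catch (java.io.IOException e) {
                throw new RuntimeException(e);
            }
        }
    });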
Thank you very much for your time. Here is a sample of a vector of terms:
v1 = {"sad", "john", "intelligent", "news", "USA", "disneyland", "MIT",
"cambridge", "marry", ...}
I'll try out your method.
Best regards,
Sengly
On 3/28/07, karl wettin <[EMAIL PROTECTED]> wrote:
On 28 Mar 2007, at
On 3/28/07, Scott Oshima <[EMAIL PROTECTED]> wrote:
So I assumed a linear decay of performance as an index got bigger.
For some reason, going from an index size of 1.89 to 1.95 GB
dramatically increased CPU across all of our servers.
I was thinking of splitting the 1.95 GB index into 2 separate indexes and
using a MultiSearcher on those parts
Well, if you're adding data, you must be doing something with
it later. Are you sure the problem is in the index growing and not in
how you use the data afterwards?
The reason I ask is that I found almost a 10-fold increase in
my app's performance when I used FieldSelector (Lucene 2.1)
to only load t
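(For reference, a minimal FieldSelector sketch, assuming the Lucene 2.1
API; the field name "title" and the index path are made up:)

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.FieldSelector;
    import org.apache.lucene.document.FieldSelectorResult;
    import org.apache.lucene.index.IndexReader;

    // Load only the "title" field instead of every stored field.
    FieldSelector onlyTitle = new FieldSelector() {
        public FieldSelectorResult accept(String fieldName) {
            return "title".equals(fieldName) ? FieldSelectorResult.LOAD
                                             : FieldSelectorResult.NO_LOAD;
        }
    };
    IndexReader reader = IndexReader.open("/path/to/index");
    Document doc = reader.document(0, onlyTitle);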
Thanks Erik. For our purposes it seems more generally useful to use the
original start/end offsets.
Antony
Erik Hatcher wrote:
They aren't used implicitly by anything in Lucene, but can be very handy
for efficient highlighting. Where you set the offsets really all
depends on how you plan
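(For context, a short sketch of where such offsets come from, assuming
the Lucene 2.1 Field API; the field name and variables are placeholders:)

    // Store term vectors with positions and offsets at indexing time, so a
    // highlighter can reuse them instead of re-analyzing the stored text.
    doc.add(new Field("body", text, Field.Store.YES, Field.Index.TOKENIZED,
                      Field.TermVector.WITH_POSITIONS_OFFSETS));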
Yeah, it might be a hardware issue; with a slightly smaller index with
less stored data, the performance is what we want it to be. Just adding
5% more stored data (unindexed, of course) pushes us over some sort of
threshold, causing performance to tank.
-Original Message-
From: Erik Hatc
You need to use JAMA combined with Lucene, using the vectors that are built
by Lucene to compute the SVD with JAMA:
http://math.nist.gov/javanumerics/jama/
Best
jose
José Ramón Pérez Agüera
Dept. of Software Engineering and Artificial Intelligence
Office 411, tel. 913947599
Facultad de Inform
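(A sketch of how those pieces might fit together, assuming a Lucene
2.1-era index whose field was indexed with term vectors enabled; the
field name, the vocab map, and the class name are mine, not from this
thread:)

    import java.util.Map;
    import Jama.Matrix;
    import Jama.SingularValueDecomposition;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.TermFreqVector;

    public class LsiSketch {
        // vocab maps each term to a row index; building it is left out here.
        static SingularValueDecomposition svd(IndexReader reader,
                                              Map<String, Integer> vocab,
                                              String field)
                throws java.io.IOException {
            double[][] cells = new double[vocab.size()][reader.maxDoc()];
            for (int d = 0; d < reader.maxDoc(); d++) {
                TermFreqVector tfv = reader.getTermFreqVector(d, field);
                if (tfv == null) continue; // no term vector stored for this doc
                String[] terms = tfv.getTerms();
                int[] freqs = tfv.getTermFrequencies();
                for (int t = 0; t < terms.length; t++) {
                    cells[vocab.get(terms[t])][d] = freqs[t];
                }
            }
            // Note: JAMA's SVD expects rows >= columns; transpose first if
            // you have fewer terms than documents.
            return new Matrix(cells).svd();
        }
    }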
I've just built a 9.3G index (admittedly tons of stored data in
there, 3.3M documents) and performance is amazing (through Solr).
Erik
On Mar 28, 2007, at 3:11 PM, Erick Erickson wrote:
This surprises me, I'm currently working with a 4G index, and the
improvement from when it was an 8G index was only 10% or so.
Hi,
I have a research project where I want to implement the LSI technique.
The scenario is as follows: search the news sites for local, event-based
news, and cluster similar news items together, for example a hurricane in
New York City.
We want to apply basic LSI a
This surprises me, I'm currently working with a 4G index, and the
improvement from when it was an 8G index was only 10% or so.
And it's plenty speedy.
Are you hitting hardware limitations and perhaps swapping like
crazy? If so, unless you split things across several
machines, I doubt it w
So I assumed a linear decay of performance as an index got bigger.
For some reason, going from an index size of 1.89 to 1.95 GB
dramatically increased CPU across all of our servers.
I was thinking of splitting the 1.95 GB index into 2 separate indexes and
using a MultiSearcher on those parts
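(A minimal sketch of that split, assuming the Lucene 2.1-era
MultiSearcher API; the index paths are hypothetical:)

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MultiSearcher;
    import org.apache.lucene.search.Searchable;

    // Search two half-indexes as if they were one.
    Searchable[] parts = {
        new IndexSearcher("/indexes/part1"),
        new IndexSearcher("/indexes/part2"),
    };
    MultiSearcher searcher = new MultiSearcher(parts);
    // searcher.search(query) now merges hits from both parts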
I have integrated Lucene to search within files. I tried it on the
jsp-examples webapp, like this:
1st step: index the files of jsp-examples:
C:\Program Files\Apache Software Foundation\Tomcat
5.5\webapps\luceneweb\WEB-INF
\lib>java org.apache.lucene.demo.IndexFiles "C:\Program Files\Apache
Software
thx a
2007/3/28, Arun M.A. <[EMAIL PROTECTED]>:
I used to have this classpath issue when I set the classpath for Java. Try
using a DOS-style directory path instead of the Windows name,
e.g. instead of "Documents and Settings" use "DOCUME~1".
Use "dir /x" to list the DOS names for directories.
On 3/28/07,
OK, gotcha. I now see what you mean. StandardAnalyzer uses the
StandardTokenizer, whereas StopAnalyzer uses the LowerCaseTokenizer,
which divides text at non-letters. What you most likely will need to
do is create a Tokenizer that outputs the original token, and outputs
the parts of it
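(A sketch of that idea as a TokenFilter, under the Lucene 2.1-era
TokenStream API; the class name and the split pattern are mine:)

    import java.io.IOException;
    import java.util.LinkedList;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    public class PartsFilter extends TokenFilter {
        private final LinkedList<Token> pending = new LinkedList<Token>();

        public PartsFilter(TokenStream in) { super(in); }

        public Token next() throws IOException {
            if (!pending.isEmpty()) return pending.removeFirst();
            Token token = input.next();
            if (token == null) return null;
            // Queue the parts, e.g. "john@example.com" -> john, example, com,
            // keeping the original token's offsets and stacking the parts at
            // the same position as the original.
            for (String part : token.termText().split("[^A-Za-z0-9]+")) {
                if (part.length() == 0) continue;
                Token t = new Token(part, token.startOffset(), token.endOffset());
                t.setPositionIncrement(0);
                pending.add(t);
            }
            return token; // the original token comes out first
        }
    }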
Grant,
Thanks for your reply and the pointer to the custom code sample. I will be
checking into that today. I did delve into the src for the OOTB analyzers
and was aware of what they did. Still, the StandardAnalyzer does not do what
I want. The real difference between my needs and the results of t
On 28 Mar 2007, at 15.24, Sengly Heng wrote:
Thank you, but I still have no clue how to do that using Weka
after taking a look at its API. Let me reformulate my problem:
I have a collection of vectors of terms (actually each vector of terms
represents the list of tokens extracted from
On 28 Mar 2007, at 13.22, mark harwood wrote:
Odd. I'm sure it used to have a getReader method somewhere.
Still, you can use MemoryIndex.createSearcher().getIndexReader()
I've wrapped MemoryIndex in the unison index facade of LUCENE-550,
just as I did with all the other index implementations
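(In code, assuming the contrib class
org.apache.lucene.index.memory.MemoryIndex; the field name and the
analyzer choice are mine:)

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.memory.MemoryIndex;

    MemoryIndex index = new MemoryIndex();
    index.addField("content", "some text to match against", new StandardAnalyzer());
    // MemoryIndex has no public getReader(), but its searcher exposes one:
    IndexReader reader = index.createSearcher().getIndexReader();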
Grant Ingersoll wrote:
I like the mailing list approach much better. With a good set of
rules and folders in place (which takes about 15 minutes to setup),
one can easily manage large volumes of mail w/o batting an eye,
whereas forums require large amounts of navigation, IMO.
Glad I'm not th
Thank you, but I still have no clue how to do that using Weka
after taking a look at its API. Let me reformulate my problem:
I have a collection of vectors of terms (actually each vector of terms
represents the list of tokens extracted from a file) and I do not have the
original files.
On 28 Mar 2007, at 10.36, Sengly Heng wrote:
Does any one of you know of a Java API that directly handles this
problem, or do I have to implement it from scratch?
You can also try
weka.filters.unsupervised.attribute.StringToWordVector; it has many
neat features you might be interested in. And if app
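(A rough sketch of feeding such data to that filter, assuming the classic
Weka 3.x API; the relation name, attribute name, and sample string are
mine:)

    import weka.core.Attribute;
    import weka.core.FastVector;
    import weka.core.Instance;
    import weka.core.Instances;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.StringToWordVector;

    // One string attribute; each term vector becomes one space-joined string.
    FastVector attrs = new FastVector(1);
    attrs.addElement(new Attribute("text", (FastVector) null));
    Instances data = new Instances("termVectors", attrs, 0);

    Instance inst = new Instance(1);
    inst.setValue((Attribute) attrs.elementAt(0), "sad john intelligent news usa");
    data.add(inst);

    StringToWordVector filter = new StringToWordVector();
    filter.setInputFormat(data);
    Instances wordVectors = Filter.useFilter(data, filter); // word-count attributes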
On Mar 28, 2007, at 3:51 AM, Antony Bowesman wrote:
I'm fiddling with custom analyzers to analyze email addresses to
store the full email address and the component parts. It's based
on Solr's analyzer framework, so I have a StandardTokenizerFactory
followed by an EmailFilterFactory. It p
I like the mailing list approach much better. With a good set of
rules and folders in place (which takes about 15 minutes to setup),
one can easily manage large volumes of mail w/o batting an eye,
whereas forums require large amounts of navigation, IMO.
On Mar 28, 2007, at 7:53 AM, Ted Hu
You can pass in a String or a Reader to Field when indexing. There
is nothing file-specific about Lucene when it comes to indexing.
Take a look at the Field class for the various constructors.
On Mar 28, 2007, at 8:20 AM, Sengly Heng wrote:
Thanks but in my case I do not have the files. W
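(For example, a sketch assuming the Lucene 2.1 API, with a made-up field
name and an already-open IndexWriter; a vector of terms can simply be
joined into one string:)

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    Document doc = new Document();
    // "sad john intelligent news usa" stands in for one joined term vector
    doc.add(new Field("terms", "sad john intelligent news usa",
                      Field.Store.NO, Field.Index.TOKENIZED));
    writer.addDocument(doc); // writer: an open IndexWriter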
Since there are so many ASF projects, launching new infrastructure
initiatives can be difficult. Infrastructure wants everything to
scale, so that everyone can use it. We also want to be sure that there
are volunteers with the interest and ability to keep the product up
and running.
To get produc
Thanks, but in my case I do not have the files. What I have is just a
collection of vectors of terms.
Does Lucene provide any means to index each vector of terms as a file? Or
is there a better way to do this?
Thanks everyone, once again.
Regards,
Sengly
On 3/28/07, thomas arni <[EMAIL PROTECTED]> wr
Odd. I'm sure it used to have a getReader method somewhere.
Still, you can use MemoryIndex.createSearcher().getIndexReader()
- Original Message
From: Melanie Langlois <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, 28 March, 2007 8:38:24 AM
Subject: RE: Reverse searc
I used to have this classpath issue when I set the classpath for Java. Try
using a DOS-style directory path instead of the Windows name,
e.g. instead of "Documents and Settings" use "DOCUME~1".
Use "dir /x" to list the DOS names for directories.
On 3/28/07, mohamed hadj taieb <[EMAIL PROTECTED]> wrote:
Hi
Hi,
I have downloaded the lucene-2.1.0.zip file
and tried to integrate it into my web application.
I added the 2 jar files to the classpath in the environment variables,
like this:
classpath:
.;C:\Program Files\Apache Software Foundation\Tomcat
5.5\webapps\luceneweb\WEB-INF\lib\lucene-demos-2.1.
karl wettin wrote:
The way I see it (and probably many others do), mailing lists are superior
in many ways, especially when following multiple forums.
It's true. Any forum that I need to subscribe to, I find an RSS feed
for so that I can get mail messages. Forums are a pain in the neck
once you'
Daniel Noll wrote:
The only screenshots I can see look like plain text to me, and I'm
currently working on something which needs to convert Word to HTML,
which is why I ask.
wvWare, which I mentioned earlier, can convert Word to HTML and does a
pretty good job of maintaining formatting. abiwor
Have a look at the TermDocs interface in the API.
You can get the term frequency with an open IndexReader:

    TermDocs termDocs = reader.termDocs(term);
    while (termDocs.next()) {          // advance to the next matching document
        int freq = termDocs.freq();    // frequency of the term in that document
    }

where "term" represents the current Term. Note that you have to call
next() before freq() so the enumeration is positioned on a document.
For th
Hello Luceners,
I have a collection of vectors of terms (tokens) that I extracted from files.
I am looking for ways to calculate the TF/IDF of each term.
I wanted to use Lucene to do this, but Lucene is made for collections of
files, and in my case I have already extracted those files into vectors of
te
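(Once the vectors are indexed as documents, a classic TF-IDF can be
computed from an open IndexReader; a hedged sketch, where the log(N/df)
weighting is just one common variant and the path, field, and term are
made up:)

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;

    IndexReader reader = IndexReader.open("/path/to/index");
    Term term = new Term("terms", "disneyland");
    int n = reader.numDocs();        // total documents
    int df = reader.docFreq(term);   // documents containing the term
    TermDocs td = reader.termDocs(term);
    while (td.next()) {
        double tfidf = td.freq() * Math.log((double) n / df);
        System.out.println("doc " + td.doc() + ": tf-idf = " + tfidf);
    }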