Hi Srinivas,
Perhaps what you need here is query-formation logic that assigns the
right keywords to the right fields. Let me know in case I got it wrong. One
way to do that could be to use index-time boosts for fields and then
run a query, so that a particular field is preferred over the others.
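As a hedged illustration of the idea (plain Java, not Lucene API code), per-field boosting can be sketched by building a query string that weights some fields over others. The field names `title`/`body` and the boost values here are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: build a Lucene-style query string that prefers some fields
// over others via per-field boosts. Field names and boost values are
// hypothetical; adapt them to your own schema.
public class BoostedQueryBuilder {
    public static String build(String keyword, Map<String, Float> fieldBoosts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
            if (sb.length() > 0) sb.append(" OR ");
            sb.append(e.getKey()).append(':').append(keyword)
              .append('^').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("title", 4.0f);
        boosts.put("body", 1.0f);
        System.out.println(build("lucene", boosts));
        // title is preferred over body for the same keyword
    }
}
```

The same effect can also be achieved at index time or via query objects rather than query strings; this only shows the routing idea.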
Hello all
1)
Which is better to use: the Snowball analyzer or the Lucene contrib analyzers? Is
there no built-in stop word list for the Snowball analyzer?
2)
Are Analyzer and QueryParser thread-safe? Can they be created once
and used in as many threads as needed?
3)
I am using Snowball Analyzer to do index a
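On question 2 above, a general pattern (not specific Lucene advice): when a class is not documented as thread-safe, one common approach is to keep a per-thread instance via ThreadLocal. The `Parser` class below is a hypothetical stand-in, not a Lucene class:

```java
// Sketch: if a parser/analyzer class is not documented as thread-safe,
// a per-thread instance via ThreadLocal avoids shared mutable state.
// "Parser" is a placeholder for whatever non-thread-safe class you use.
public class PerThreadInstance {
    static class Parser {
        int parse(String s) { return s.length(); } // placeholder work
    }

    private static final ThreadLocal<Parser> PARSER =
        ThreadLocal.withInitial(Parser::new);

    public static int parseOnThisThread(String s) {
        return PARSER.get().parse(s);
    }

    public static void main(String[] args) {
        System.out.println(parseOnThisThread("hello"));
    }
}
```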
Thank you Ian,
> If you want a direct suggestion: use PerFieldAnalyzerWrapper,
> specifying a different analyzer for field B.
>
>
> --
> Ian.
this makes a lot of sense.
-John
I am trying to evaluate as to whether Lucene is the right candidate for the
problem at hand.
Say I have 3 indexes:
Index 1 has street names.
Index 2 has business names.
Index 3 has area names.
All these names can be a single word or a combination of words, like woodward
street or marks and spencer.
Hello gopi,
My comments.
if (textFiles[i].isFile() > textFiles[i].getName().endsWith(".txt")) {
&& should be used here instead of >.
document.add(Field.Text("content", textReader));
should be
document.add(new Field("content", textReader));
Likewise for
document.add(Field.Text("path", textFiles[i].getPath()));
document.ad
Hi all,
I am getting an error when running this code. Can somebody please tell me what
the problem is? The code is given below. The bold lines were giving the error
*cannot find symbol*.
import java.io.File;
import java.io.FileReader;
import java.io.Reader;
import java.util.Date;
import org.apache.
So, I have a (small) Lucene index, all fine; I use it a bit, and then (on app
shutdown) want to delete its files and the containing directory (the index is
intended as a temp object). At some earlier time this was working just fine,
using java.io.File.delete(). Now however, some of the files get
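For what it's worth, two common causes here: `File.delete()` refuses to delete a non-empty directory, and on Windows it also fails while any process still holds one of the files open (e.g. an unclosed IndexReader/IndexWriter). A minimal recursive-delete sketch in plain Java, assuming all readers/writers on the index are closed first:

```java
import java.io.File;

// Sketch: delete a directory tree bottom-up, since File.delete()
// only removes empty directories. Assumes no process (such as an
// unclosed IndexReader/IndexWriter) still holds the files open.
public class DeleteTree {
    public static boolean deleteRecursively(File f) {
        File[] children = f.listFiles(); // null if f is not a directory
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        return f.delete();
    }
}
```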
Uhhhm, this is the Lucene users' list, not a general Java programming
forum, so unless this has something to do with Lucene I doubt
you'll get much help.
I'd suggest one of the Java programming language lists rather than
this one.
Best
Erick
On Thu, Mar 5, 2009 at 6:32 PM, futurpc wrote:
>
>
: That being said, I could see maybe determining a delta value such that if the
: distance between any two scores is more than the delta, you cut off the rest
: of the docs. This takes into account the relative state of scores and is not
: some arbitrary value (although, the delta is, of course)
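The delta-cutoff idea described above can be sketched in plain Java (the helper name is hypothetical): walk the descending scores and stop at the first gap larger than the chosen delta.

```java
// Sketch of the delta-cutoff idea: given scores sorted descending,
// keep results only up to the first gap larger than `delta`.
public class ScoreCutoff {
    public static int cutoffIndex(float[] scores, float delta) {
        for (int i = 1; i < scores.length; i++) {
            if (scores[i - 1] - scores[i] > delta) {
                return i; // keep scores[0..i-1]
            }
        }
        return scores.length; // no large gap: keep everything
    }
}
```

With scores {0.9, 0.85, 0.2, 0.1} and delta 0.3, the first big gap is between 0.85 and 0.2, so only the top two documents survive.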
: > Hmm, bugzilla has moved to JIRA. I'm not sure where the mapping is
: > anymore. There used to be a Bugzilla Id in JIRA, I think. Sorry.
FYI...
by default the jira homepage has a form for searching by legacy
bugzilla ID...
https://issues.apache.org/jira/
...if you create a Jira account
hello.
i have data files on a web server that contain some values (i need to build
a chart from them).
i made an applet that reads the information from the file and builds the chart.
but when i upload the applet to the server, it doesn't find the files.
can you please suggest how i can make a java program that will be execu
: I can now create indexes with Nutch, and see them in Luke.. this is
: fantastic news, well for me it is beyond fantastic..
: Now I would like to (need to) query them, and to that end I wrote the
: following code segment.
:
: int maxHits = 1000;
: NutchBean nutchBean
Sounds like your most difficult part will be the question parser using POS.
This is kind of old school, but use something like the AliceBot AIML library
http://en.wikipedia.org/wiki/AIML
where the subjective terms can be extracted from the questions and indexed
separately.
Or as Grant and others
Hi Seid,
Do you have a reference for the article? I've done some QA in my day,
but don't recall reading that one.
At any rate, I do think it is possible to do what you are after. See
below.
On Mar 5, 2009, at 9:49 AM, Seid Mohammed wrote:
For my work, I have read an article stating th
Yes, it is good to learn that Yonik, Erik et al are also human-beings. :-)
Thanks for all your contributions to Lucene/Solr, this list and the OSS
community in general.
Best,
Shashi
On Thu, Mar 5, 2009 at 11:36 AM, Erick Erickson wrote:
> Let's see, you guys generously contributed your time and
Hi,
Given that you are trying to answer factoid questions to start
with, it is better to use OpenNLP components to identify
named entities (NER) in the document and use those tags as part
of your indexing process.
Regards
Vasu
On Thu, Mar 5, 2009 at 8:19 PM, Seid Mohamm
Hello,
I would like to be able to instantiate a RAMDirectory from a directory
that an IndexWriter in another process might currently be modifying.
Ideally, I would like to do this without any synchronizing or locking.
Kind-of like the way in which an IndexReader can open an index in a
direct
Let's see, you guys generously contributed your time and saved
my butt way more than once. I *think* I can stand an inadvertent
message or two ...
Best
Erick
On Thu, Mar 5, 2009 at 10:12 AM, Glen Newton wrote:
> Yonik,
>
> Thank-you for your email. I appreciated and accept your apology.
>
> Ind
On 8/10/07, Askar Zaidi wrote:
Hey Guys,
I am trying to do something similar: make the content searchable as soon as
it is added to the website. The way it can work in my scenario is that I
create the index for every new user account created.
Then, whenever a new document is
Hi
I think that the SimpleAnalyzer you are passing to the query parser
will be downcasing the X. You can fix it using an analyzer that
doesn't convert to lower case, creating the query directly in code, or
by using PerFieldAnalyzerWrapper, and no doubt other ways too.
If you want a direct sugge
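To see why the downcasing matters, here is a plain-Java sketch of the mismatch (this mimics the effect, not the analyzer's internals): the indexed term is lowercased, so an exact lookup with the raw uppercase query term misses.

```java
// Sketch: a lowercasing analyzer stores "X" as the term "x", so an
// exact term lookup for "X" finds nothing unless the query side is
// run through the same lowercasing step.
public class CaseMismatch {
    // Stand-in for an analyzer that lowercases tokens at index time.
    static String analyzeLowercase(String term) {
        return term.toLowerCase();
    }

    // Exact term lookup: true only if query term equals the indexed term.
    static boolean termMatches(String indexedTerm, String queryTerm) {
        return indexedTerm.equals(queryTerm);
    }

    public static void main(String[] args) {
        String indexed = analyzeLowercase("X"); // stored as "x"
        System.out.println(termMatches(indexed, "X")); // raw query misses
        System.out.println(termMatches(indexed, analyzeLowercase("X"))); // matches
    }
}
```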
Hi all,
I'm not able to see what's wrong in the following sample code.
I'm indexing a document with 5 fields, using five different indexing strategies.
I'm fine with the results for 4 of them, but field B is causing me some
trouble in understanding what's going on.
The value of field B is X (upper
Yonik,
Thank-you for your email. I appreciated and accept your apology.
Indeed the spam was annoying, but I think that you and your colleagues
have significant social capital in the Lucene and Solr communities, so
this minor but unfortunate incident should have minimal impact.
That said, you and
On Mar 5, 2009, at 9:24 AM, Tuztuz T wrote:
dear all
I am really new to lucene.
Is there anyone who can guide me in learning lucene?
I have Lucene in Action, the old book, but I have a hard time
understanding the syntax in the book and the new lucene release (2.4).
Can anyone give me a copy of the new lu
For my work, I have read an article stating that " Answer type can be
automatically constructed by Indexing Different Questions and Answer
types. Later, when an unseen question appears, the answer type for this
question will be found with the help of 'similarity function'
computation"
so I am clear wit
This morning, an apparently over-zealous marketing firm, on behalf of
the company I work for, sent out a marketing email to a large number
of subscribers of the Lucene email lists. This was done without my
knowledge or approval, and I can assure you that I'll make all efforts
to prevent it from ha
Hi Tuztuz,
Please visit the book's website and its forum; most of your queries
will be answered there.
Sincerely,
Sithu D Sudarsan
-Original Message-
From: Tuztuz T [mailto:tuztu...@yahoo.com]
Sent: Thursday, March 05, 2009 9:24 AM
To: java-user@lucene.apache.org
Subject: Learning Lucene
dear a
dear all
I am really new to lucene.
Is there anyone who can guide me in learning lucene?
I have Lucene in Action, the old book, but I have a hard time understanding the
syntax in the book and the new lucene release (2.4).
Can anyone give me a copy of the new Lucene in Action book or any other material
that i
I think your root problem is that you're indexing UN_TOKENIZED, which
means that the tokens you're adding to your index are NOT run through
the analyzer.
So your terms are exactly "111", "222 333" and "111 222 333", none of which
match "222". I expect you wanted your tokens to be "111", "222", and
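The difference can be sketched in plain Java (this mimics the effect, not Lucene's internals): an UN_TOKENIZED field stores the whole value as one exact term, while a tokenized field splits it so each token is searchable on its own.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: with UN_TOKENIZED indexing the whole value is a single term,
// so a search for "222" cannot match the term "111 222 333". Tokenized
// indexing splits on whitespace, making each token matchable.
public class TokenizedVsUntokenized {
    static boolean untokenizedMatch(String indexedValue, String query) {
        return indexedValue.equals(query); // one exact term
    }

    static boolean tokenizedMatch(String indexedValue, String query) {
        List<String> tokens = Arrays.asList(indexedValue.split("\\s+"));
        return tokens.contains(query);
    }
}
```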
Hi,
I would like to do a search that will return documents that contain a given
word.
For example, I created the following index:
IndexWriter writer = new IndexWriter("C:/TryIndex", new StandardAnalyzer());
Document doc = new Document();
doc.add(new Field(WordIndex.FIELD_WORLDS, "111 222 333", F
That's interesting.
I've been working in python recently, not crawling though.
But, as ever, the more you get into it the more curious you get.
Did you come up with a solution to a node error?
Are you really talking about a broken link, or are you just saying the
bottom of the tree has been reached
Hi,
I think it might be a case of the open-files limit at the OS level. Try
setting a higher ulimit and run the program again. Also, what GC
parameters have you set on the JVM?
Regards
Varun Dhussa
Product Architect
CE InfoSystems (P) Ltd
http://www.mapmyindia.com
damu_verse wrote:
Hi Than