Hi guys,
I want to do faceting with facet.query and have it support the Tagging and
Excluding Filters style
(https://cwiki.apache.org/confluence/display/solr/Faceting) that
facet.field has. How can I do that? Please guide me!
Thanks,
Andy
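For anyone finding this later: tagging and excluding works for facet.query the same way it does for facet.field. Tag the filter with a local param and exclude that tag on each facet.query. A sketch of the request parameters (the field name and tag are made up):

```
q=*:*&facet=true
&fq={!tag=pricetag}price:[0 TO 100]
&facet.query={!ex=pricetag}price:[0 TO 100]
&facet.query={!ex=pricetag}price:[100 TO *]
```

With the `ex` exclusion in place, each facet.query is counted as if the tagged filter were not applied.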
Hi Uwe,
Thanks a lot, I will try that.
Uwe Schindler wrote
> Hi andy,
>
> unfortunately, that is not easy to show with one simple code. You have to
> change the Similarity used.
>
> Before starting to do this, you should be sure that this affects your
> users. Th
lable in the Solr
> server. But Andy uses Lucene directly. In his case he should use
> IndexSearcher's explain functionalities to retrieve a structured output of
> how the documents are scored for this query for debugging:
>
> http://lucene.apache.org/core/4_6_0/core/org/a
Thanks for your reply, Erick. This is the case, but how can I keep the
precision of the fields' length?
Hi guys,
As the subject says, it seems that the length of the field does not affect the
doc score accurately for the Chinese analyzer in my source code.
Index source code:
private static Directory DIRECTORY;
@BeforeClass
public static void before() throws IOException {
DIRECTORY = new RAMDire
t compareDocToValue(int arg0, Object arg1)
throws IOException {
// TODO Auto-generated method stub
return 0;
}
}
}
}
and solrconfig.xml configuration is
mySortComponent
Andy
eak your code down into a simple standalone program
> and post that if it still doesn't work.
>
>
> --
> Ian.
>
> On Thu, Nov 29, 2012 at 4:20 AM, Andy Yu wrote:
> > I revise the code to
> >
> > SortField sortField[] = {new Sor
NaN I think you'll need
> to use a TopFieldCollector. See for example
> http://www.gossamer-threads.com/lists/lucene/java-user/86309
>
>
> --
> Ian.
>
>
> On Tue, Nov 27, 2012 at 3:51 AM, Andy Yu wrote:
> > Hi All,
> >
> >
> > Now I want to sor
My question should really be on "fuzzy search". Is there a minimum
length requirement for fuzzy search to start? For example, would
"an~0.8" kick off fuzzy search?
Thanks,
Andy
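To make the length question concrete: classic FuzzyQuery accepts a candidate roughly when 1 - editDistance/termLength >= minimumSimilarity (this is a simplification; the real FuzzyTermEnum also accounts for a configurable prefix length and the candidate term's length). A tiny sketch of that arithmetic:

```java
public class FuzzySketch {
    // Simplified similarity as used by classic FuzzyQuery/FuzzyTermEnum.
    static double similarity(int editDistance, int termLength) {
        return 1.0 - (double) editDistance / termLength;
    }

    static boolean passes(int editDistance, int termLength, double minimumSimilarity) {
        return similarity(editDistance, termLength) >= minimumSimilarity;
    }

    public static void main(String[] args) {
        // "an" has length 2: a single edit drops similarity to 0.5, well under
        // 0.8, so "an~0.8" effectively degenerates to an exact term match.
        System.out.println(passes(1, 2, 0.8));   // false
        // A 10-character term tolerates one edit at 0.8 (similarity 0.9).
        System.out.println(passes(1, 10, 0.8));  // true
    }
}
```

So there is no hard minimum length, but for very short terms a high minimum similarity leaves no room for any edits at all, which is effectively the same thing.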
On Wed, Mar 30, 2011 at 4:02 PM, Erick Erickson wrote:
> Uhhhm, doesn't "term1 term2&
~2
term2~2". I am wondering if we should skip short words if it is not
done automatically by the engine.
Thanks,
Andy
On Wed, Mar 30, 2011 at 4:02 PM, Erick Erickson wrote:
> Uhhhm, doesn't "term1 term2"~5 work? If not, why not?
>
> You might get some use from
> htt
Is there a minimum string length requirement for proximity search? For
example, would "a~" or "an~" trigger proximity search? The result
would be horrible if there is no such requirement.
Thanks,
Andy
Congrats!
A couple questions:
1) Which version of Solr is this based on?
2) How is LWE different from standard Solr? How should one choose between the
two?
Thanks.
--- On Wed, 12/15/10, Grant Ingersoll wrote:
> From: Grant Ingersoll
> Subject: [ANN] General Availability of LucidWorks Enterp
I would like to use MultiFieldQueryParser to search multiple fields; then, in
each field, I want to use fuzzy search. How can that be done? Any example
will be appreciated.
Thanks,
Andy
That works, and now that I re-test my original code, it also works.
> Date: Mon, 19 Apr 2010 10:52:45 -0700
> From: iori...@yahoo.com
> Subject: Re: How to search by numbers
> To: java-user@lucene.apache.org
>
>
> > Hi, I have indexed the following two fields:
> > org_id - NOT_ANALYZED, org_name
Hi, I have indexed the following two fields:
org_id - NOT_ANALYZED, org_name - ANALYZED
However when I try to search by org_id, for example, 12345, I get no hits.
I am using the StandardAnalyzer to index and search.
And I am using: Query query = queryParser.parse("org_id:12345");
Any ideas? Th
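A note for anyone hitting the same thing: since org_id was indexed NOT_ANALYZED, the safest query bypasses the query parser and its analyzer entirely with a TermQuery, which matches the indexed token byte-for-byte. A sketch (searcher assumed already open; Hits is the old pre-2.9 API):

```java
// Exact, analyzer-free lookup against the NOT_ANALYZED field.
Query q = new TermQuery(new Term("org_id", "12345"));
Hits hits = searcher.search(q);
System.out.println("hits: " + hits.length());
```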
o: java-user@lucene.apache.org
>
> Why are you locked into using MultiFieldQueryParser? The simpler approach is
> just send something like +title:abc +desc:123 through the regular query
> parser
>
> HTH
> Erick
>
> On Thu, Apr 15, 2010 at 6:34 PM, Andy wrote
Hi, I am trying to use the MultiFieldQueryParser to search "title" and "desc"
fields. However the Lucene API appears to only let me provide a single search
term. Is it possible to use multiple search terms (one for each field)?
For example, the SQL equivalent would be:
select *
from luce
Thanks
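For the record, one way to get "one term per field" (as in the SQL analogy), without MultiFieldQueryParser, is to build a BooleanQuery by hand; a per-field fuzzy term slots in the same way. A sketch against the 2010-era Lucene API, with the field names and values taken from the example:

```java
// Both clauses required, mirroring "WHERE title = 'abc' AND desc = '123'".
BooleanQuery bq = new BooleanQuery();
bq.add(new TermQuery(new Term("title", "abc")), BooleanClause.Occur.MUST);
// A fuzzy match on one field works the same way (0.8f = minimum similarity).
bq.add(new FuzzyQuery(new Term("desc", "123"), 0.8f), BooleanClause.Occur.MUST);
```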
hello,
I wrote code with Lucene to handle the search on my site ... the
articles indexed are those stored in a database, then I do a search with
"lucene.queryparser" on the field "code" of various objects (a "code" is a
word of 3 to 6 characters) ...
My problem is the fact that when I search, I
cause the text format is
tripping up the tokenizing.
I am trying to figure out whether using Lucene to implement this is a good
thing or whether I should just try to implement my own search logic.
Andy Faibishenko
rches, but overall I only
spent about a week on the project, and got a 60x speed improvement on the
target set. (from minutes to seconds) YMMV however, since the app requires
the collection of the complete set of results for analysis.
- andy g
On Mon, Jun 29, 2009 at 12:47 AM, Marcus Herou
wrot
--- On Sat, 4/25/09, andykan1...@yahoo.com wrote:
From: andykan1...@yahoo.com
Subject: Piece of coded needed
To: java-user@lucene.apache.org
Date: Saturday, April 25, 2009, 1:37 AM
Hi everybody,
I know it may seem stupid, but I'm in the middle of a research project and I need a
piece of code in luc
Is there a way to have Lucene write its index to a txt file?
solve?
On Apr 9, 2009, at 2:33 AM, Andy wrote:
> Hello all,
>
> I'm trying to implement a vector space model using lucene. I need to have a
> file (or on memory) with TF/IDF weight of each term in each document. (in
> fact that is a matrix with documents presented
Hello all,
I'm new to Lucene and trying to implement a vector space model using Lucene. I
need to have a file (or in memory) with the TF/IDF weight of each term in each
document. (In fact that is a matrix with documents represented as vectors, in
which the elements of each vector are the TF weights ...)
Hello all,
I'm trying to implement a vector space model using Lucene. I need to have a
file (or in memory) with the TF/IDF weight of each term in each document. (In
fact that is a matrix with documents represented as vectors, in which the
elements of each vector are the TF weights ...)
Please please h
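Since this question comes up repeatedly: the matrix itself is easy to build once you can enumerate terms with their frequencies (in 1.4/1.9-era Lucene, IndexReader.terms(), TermEnum.docFreq() and TermDocs.freq() give exactly that, or you can store term vectors at index time). Below is a self-contained toy version over in-memory token lists; the weight formula tf * ln(N/df) is a plain textbook choice, not Lucene's own (Lucene uses sqrt(tf) and 1 + ln(N/(df+1))):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class VectorSpaceSketch {

    // Build a term -> (weight per document) matrix from tokenized docs.
    // Weight = tf * ln(N / df): a textbook formula, NOT Lucene's.
    static Map<String, double[]> buildMatrix(List<String[]> docs) {
        int n = docs.size();
        // document frequency of every term
        Map<String, Integer> df = new TreeMap<>();
        for (String[] doc : docs) {
            for (String term : new HashSet<>(Arrays.asList(doc))) {
                df.merge(term, 1, Integer::sum);
            }
        }
        Map<String, double[]> matrix = new TreeMap<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            double[] row = new double[n];
            for (int d = 0; d < n; d++) {
                int tf = 0;  // raw term frequency in document d
                for (String term : docs.get(d)) {
                    if (term.equals(e.getKey())) tf++;
                }
                row[d] = tf * Math.log((double) n / e.getValue());
            }
            matrix.put(e.getKey(), row);
        }
        return matrix;
    }

    public static void main(String[] args) {
        List<String[]> docs = Arrays.asList(
                "apache lucene search".split(" "),
                "lucene index".split(" "));
        // "lucene" occurs in every doc, so idf = ln(2/2) = 0 in both columns.
        System.out.println(Arrays.toString(buildMatrix(docs).get("lucene")));
        // prints [0.0, 0.0]
    }
}
```

Writing each row out to a text file (or keeping the Map in memory) then gives exactly the document-as-vector matrix the question asks for.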
whichever is chosen.
Just a huge thank you for making this tool available!
Great tool!
//andy
On Thu, Oct 30, 2008 at 4:06 AM, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> Many people ask me when the next version of Luke becomes available. It's
> almost
brown" to require a 4 instead of a 3,
two to transpose brown and fox, two to transpose quick and fox. Why
is this only 3?
- andy g
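A guess at the mismatch being discussed: plain Levenshtein distance has no transposition operation, so swapping two adjacent words costs two substitutions, not one (Damerau-Levenshtein would count it as one). A token-level sketch to check counts by hand:

```java
public class EditDistanceSketch {

    // Plain Levenshtein (insert/delete/substitute, each cost 1) over tokens.
    // With no dedicated transposition operation, swapping two adjacent tokens
    // costs 2 (one substitution per token).
    static int levenshtein(String[] a, String[] b) {
        int[] prev = new int[b.length + 1];
        int[] cur = new int[b.length + 1];
        for (int j = 0; j <= b.length; j++) prev[j] = j;
        for (int i = 1; i <= a.length; i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length; j++) {
                int cost = a[i - 1].equals(b[j - 1]) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1),
                                  prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length];
    }

    public static void main(String[] args) {
        String[] x = "the quick brown fox".split(" ");
        String[] y = "the brown quick fox".split(" ");
        System.out.println(levenshtein(x, y)); // prints 2: a swap = 2 substitutions
    }
}
```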
a "best practice" to treat
Lucene as described.
//andy
On Fri, Aug 8, 2008 at 2:39 PM, Cam Bazz <[EMAIL PROTECTED]> wrote:
> hello,
>
> what would happen if I modified the class IndexWriter, and made the delete
> by id method public?
>
> I have two fields in my d
be merged and data ends up
getting copied over again at certain points. So if you're running a batch
process with a lot of inserts, you might get better throughput with BDB as
opposed to Lucene, but, of course, benchmark to confirm ;)
Andy
On Thu, Jul 31, 2008 at 9:12 AM, Karsten F.
&l
ch other as the same NGrams in the search string. I'm hoping NGrams
would avoid the need for a whole index scan. Does Lucene already factor
this into its hit score, or would I need to do some custom work?
- Andy
Grant Ingersoll wrote:
I believe there were some posts on this about a year
or similar name.
Based on the little I know of Lucene, I'm thinking an NGram algorithm
(based on characters, not words) would work best... but, I'm not sure if
Lucene takes proximity or edit distances into account? For example, say
you have these two names:
Andrew John
John Andrew
I
My firm uses a parser based on javax.xml.stream.XMLStreamReader to
break (english and nonenglish) wikipedia xml dumps into lucene-style
"documents and fields." We use wikipedia to test our
language-specific code, so we've probably indexed 20 wikipedia dumps.
- andy g
On Dec 1
n the hits are
gathered. The only way I can see of doing this is by over-riding
Similarity, which seems like an incredibly complex procedure. What am I
missing?
- andy g.
heap dump, and will start an http
listener on port 7000 by default.
Interesting statistics can be found at the bottom of the front page. These
will enable you to discover whether it is a memory leak in the java runtime
or in the lucene library.
- andy g
On 4/5/07, Craig W Conway <[EM
chine you can load balance 2 search servers and take one
out of the cluster when the index is being copied. Alternatively, if it's
possible, you can copy the index at an offpeak hour.
Andy
On 4/3/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
How fast are your disks? Perhaps they ar
nding on your data.
Andy
On 4/3/07, Ivan Vasilev <[EMAIL PROTECTED]> wrote:
Hi All,
I have the following problem:
I have to implement range search for fields that contain numbers. For
example the field size that contains file size. The problem is that the
numbers are not kept in strings
r
field4 is a field that would be updated frequently and as real-time as
possible. However, once I update field4, the docId's are no longer
synchronized, and ParallelReader fails.
Andy
On 3/6/07, Alexey Lef <[EMAIL PROTECTED]> wrote:
We use MultiSearcher for a similar scenario. This
is issue on the list, but nothing pointing to a
solution. Can somebody help me out?
Andy
I'm just not seeing?
Andy
On 9/18/06, Paul Elschot <[EMAIL PROTECTED]> wrote:
On Monday 18 September 2006 23:08, Andy Liu wrote:
> For multi-word queries, I would like to reward documents that contain a
more
> even distribution of each word and penalize documents that have a skewe
would implement
this?
Thanks,
Andy
4. Search for records with a filter.
If the filter returns a lot of ids, it won't be fast.
Recently I ran a test. I customized a filter which gets a list of ids from a
MySQL database table of size 5000. Then I invoked search(query, filter,
hitcollector); it took me more than 40s to retrieve th
filter.
Can you give me some advice?
2006/8/8, Ryan O'Hara <[EMAIL PROTECTED]>:
Hey Andy,
If you have enough RAM, try using FieldCache:
String[] fieldYouWant = FieldCache.DEFAULT.getStrings
(searcher.getIndexReader(), "fieldYouWant");
searcher.search(query, new HitColle
, then use the list to check whether the Lucene
search results should be returned.
Can you give me some suggestions?
Also, can you show me how you use the filter?
2006/8/8, Simon Willnauer <[EMAIL PROTECTED]>:
Hey Andy,
i don't know how you determinate whether a document has to be
displ
document to determine whether I
should return the document. The total number of documents is about two
hundred thousand. So I'm afraid the
performance
2006/8/7, Martin Braun <[EMAIL PROTECTED]>:
hi andy,
> How can I use HitCollector to iterate over every returned document
How can I use HitCollector to iterate over every returned document?
Thank you in advance.
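As a sketch of the mechanics (1.9/2.0-era Lucene API; query and searcher assumed to exist already): Searcher.search(Query, HitCollector) invokes collect() once for every matching document, so any per-document logic goes there:

```java
// collect() fires once per matching doc id, in index order.
searcher.search(query, new HitCollector() {
    public void collect(int doc, float score) {
        // Avoid searcher.doc(doc) here if possible: loading stored fields
        // per hit is slow. A FieldCache lookup keyed by doc id is cheaper.
    }
});
```

Note that loading the full Document inside collect() for two hundred thousand candidates is exactly the slow path; a cached field array is the usual workaround.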
Thank you
This code is written in C#. There is a C# version of Lucene 1.9, which
can be downloaded from http://www.dotlucene.net
This implements the indexing:
public void CreateIndex()
{
try
{
AddDirectory(directory);
writer.Optimize();
Yes, I have closed the IndexWriter, but it doesn't work.
2006/7/27, Michael McCandless <[EMAIL PROTECTED]>:
> I met this problem: when searching, I add documents to index. Although
I
> instantiates a new IndexSearcher, I can't retrieve the newly added
> documents. I have to close the program an
I met this problem: while searching, I add documents to the index. Although I
instantiate a new IndexSearcher, I can't retrieve the newly added
documents. I have to close the program and reopen it; then it works.
The platform is Win XP. Is it the fault of XP?
Thank you in advance.
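This is expected behavior rather than an XP issue: an IndexSearcher only sees the index as of the moment its underlying IndexReader was opened. A minimal sketch of the usual fix (2.0-era API; writer, searcher, doc and dir assumed to exist):

```java
writer.addDocument(doc);
writer.close();                     // flush the new segment, release the lock
// An existing searcher keeps serving the old snapshot; open a fresh one.
searcher.close();
searcher = new IndexSearcher(dir);  // this searcher sees the new documents
```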
stribute it to you. I am glad you understand
Chinese. How should I deliver it to you? Because the API includes a Chinese
lexicon which is nearly 10 MB in size. Maybe I can mail it to you.
2006/5/30, Erik Hatcher <[EMAIL PROTECTED]>:
On May 29, 2006, at 6:34 AM, hu andy wrote:
> I indexed
2006/5/29, hu andy <[EMAIL PROTECTED]>:
I indexed a collection of Chinese documents. I use a special segmentation
api to do the analysis, because the segmentation of Chinese is different
from English.
A strange thing happened. With lucene 1.4 or lucene 2.0, it will be all
right to re
I indexed a collection of Chinese documents. I use a special segmentation
api to do the analysis, because the segmentation of Chinese is different
from English.
A strange thing happened. With lucene 1.4 or lucene 2.0, it will be all
right to retrieve the corresponding documents given the terms
Hi, I have an application that needs to mark the retrieved documents which have
been read, so that the next time I needn't read the marked documents again.
My idea is to add a particular field to the indexed
document. But as Lucene has no update method, I have to delete that
document, and
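The delete-then-re-add dance sketched below was the standard "update" idiom in that era of Lucene (deleteDocuments(Term) is the 1.9 name; 1.4 used delete(Term)). The "id" key field is an assumption: you need some unique term to delete by:

```java
// Delete the old copy by its unique key...
IndexReader reader = IndexReader.open(dir);
reader.deleteDocuments(new Term("id", docId));
reader.close();
// ...then re-add the updated document (e.g. with the "read" marker field set).
IndexWriter writer = new IndexWriter(dir, analyzer, false);
writer.addDocument(updatedDoc);
writer.close();
```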
For my application we have several hundred indexes, different subsets
of which are searched depending on the situation. Aside from not
upgrading to lucene 1.9, or making a big index for every possible
subset, do you have any ideas for how can we maintain fast
performance?
- andy g
On 4/26/06
I have seen in some documents that there are three kinds of retrieval models
which are used often: Boolean, vector space, and probabilistic.
So I want to know which one is used by Lucene. Thank you in advance.
IndexReader.delete(int docNum) or IndexReader.delete(Term term)
2006/4/1, Don Vaillancourt <[EMAIL PROTECTED]>:
>
> Hi All,
>
> I need to implement the ability to update one document within a Lucene
> collection.
>
> I haven't been able to find anything in the API. Is there a way to
> update one
Hi, everyone. I have a large amount of XML files, about 1 GB in size. I use Lucene
(the .NET edition) to index them. There are 8 fields per document, with 4 keyword
fields and 4 unstored fields. I have set minMergeDocs to 1 and
mergeFactor to 100. It took about 2.5 hours (main memory 3 GB, CPU P4). I
a
Do you mean you pack the index files into the file *.luc? If that is the case,
Lucene can't read it.
If you put the index files and *.luc together under some directory, that's OK;
Lucene knows how to find these files.
2006/3/14, Aditya Liviandi <[EMAIL PROTECTED]>:
>
> Hi all,
>
>
>
> If I want to embed
Because I will delete the indexed documents periodically, the index files
must be cleaned up after that. If I just want to delete from the index some
documents added before some past day, how should I do it?
Thank you in advance.
I see there are seven different files with extensions .fnm, .tis, etc. I
just can't figure out how the lookup works in the .tis file. Does Lucene use
binary search to locate the term?
etween that and search described below:
>
> TermQuery termQuery = new TermQuery(
> BooleanQuery bq = ..
> bq.add(termQuery,true,false);
> bq.add(query,true,false);
> hits = Searcher.search(bq,queryFilter);
>
>
>
> -Original Message-
> From: hu andy [mailto:[
2006/3/7, Anton Potehin <[EMAIL PROTECTED]>:
>
> Is it possible to make search among results of previous search?
>
> For example: I made search:
>
> Searcher searcher = ...
> Query query = ...
> Hits hits =
> hits = Searcher.search(query);
>
> After it
Jira it is not clear to me what version of lucene
I need to include a fix.
Has version 1.4.3 been fixed up beyond the latest official binary dated
29-Nov-2004?
Should I be getting and building from the repository?
Any help appreciated,
Regards
Andy
On Nov 3, 2005, at 10:22 AM, Oren Shir wrote:
There is no constructor for Sort(SortField, boolean) in Lucene API.
Which
version are you using?
I think 1.9rc1. I have a pretty recent svn checkout -- maybe this
constructor is new.
--Andy
erever you were using Sort.INDEXORDER.
--Andy
case the IndexSearcher classes. All I could find
was the *Nutch* IndexSearcher's getExplanation() method, which I see
sends toHtml() rather than toString() to its internal Lucene
IndexSearcher.
--Andy
it seems to be only HTML now. Finally I wrote a
convenience method that dumps the HTML to a file, which I view in a
browser.
Thanks, Chris and Erik!
--Andy
oosts, and if
so, can someone explain (at least roughly) how to achieve the desired
result?
--Andy
Oops, I'm confusing libraries. I meant I want to remove a Nutch
Clause from a Nutch Query.
--Andy
On Oct 13, 2005, at 4:45 PM, Andy Lee wrote:
The API for BooleanQuery only seems to allow adding clauses. The
nearest way I can see to *remove* a clause is by laboriously
construct
me to want to remove clauses from a query. Is
there some reasonable way of doing this that I'm missing?
--Andy
that i
could use that in this case?
thanks,
andy g
"long", "verylong", depending on how
granular you need it. Then at query time you can specify the field and a
given boost value, i.e.
civil war docLength:verylong^5 docLength:long^3
Andy
On 9/28/05, Dawid Weiss <[EMAIL PROTECTED]> wrote:
>
>
> Hi.
>
> I
e than one term in the search query. Also, there is
obviously going to be some duplication of hits, so you could use a HashMap
when iterating over the Hits to ensure you get unique hits when the queries are
collated.
Andy
hieve more
generally, we can confirm that you don't need to mess with explicit indexing
of indexing.
Andy
On Monday 13 Jun 2005 14:52, Markus Wiederkehr wrote:
> On 6/13/05, Andy Roberts <[EMAIL PROTECTED]> wrote:
> > On Monday 13 Jun 2005 13:18, Markus Wiederkehr wrote:
> > > I see, the list of exceptions makes this a lot more complicated than I
> > > thought... Tha
a hyphen, you can manipulate
the buffer to merge the hyphenated tokens.
Andy
f(int freq)
It would be nice to have something like:
float tf(int freq, String fieldName, int numTerms)
If this isn't available out of the box, how difficult would it be to
hack up Lucene to allow for this?
Thanks,
Andy
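There is no tf(freq, fieldName, numTerms) hook out of the box, but field length is already routed through Similarity.lengthNorm(String, int), which does receive the field name, so subclassing DefaultSimilarity is usually enough. A hedged sketch (the "body" field and the formula are placeholders, not a recommendation):

```java
// Per-field length handling via lengthNorm rather than tf.
public class FieldAwareSimilarity extends DefaultSimilarity {
    public float lengthNorm(String fieldName, int numTerms) {
        if ("body".equals(fieldName)) {          // hypothetical field name
            return 1.0f / (float) Math.sqrt(numTerms);
        }
        return super.lengthNorm(fieldName, numTerms);
    }
}
```

Install it on both sides so indexing norms and query scoring agree, e.g. writer.setSimilarity(new FieldAwareSimilarity()) and searcher.setSimilarity(new FieldAwareSimilarity()).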
atin1 encoding doesn't support such characters. You need to specify Big5
yourself. Read the info on InputStreamReaders:
http://java.sun.com/j2se/1.5.0/docs/api/java/io/InputStreamReader.html
Andy
>
> Btw, I did try running the lucene demo (web template) to index the HTML
> files af
termFreq += tp.freq();
}
System.out.println(currentTerm.text() + "(" + termFreq
+ "|" + te.docFreq()
+ ")");
}
reader.close();
}
HTH,
Andy
---
t I've been hacking at it for a while with little
fun. Any suggestions?
Thanks,
Andy
public class DigesterTest {
private Digester dig;
public DigesterTest(File inFile) throws IOException, SAXException {
dig = new Digester();
nce of their indexes! (And I'm
sure they'd prefer it that way too)
Andy
On Wednesday 20 Apr 2005 08:27, Maik Schreiber wrote:
> > As the index is rather critical to my program, I just wanted to make it
> > really robust, and able to cope should a problem occur with the index
> > itself. Otherwise, the user will be left with a non-functioning program
> > with no explana
, I just wanted to make it really
robust, and able to cope should a problem occur with the index itself.
Otherwise, the user will be left with a non-functioning program with no
explanation. That's my reasoning anyway.
Andy
> Handle
> catch/throw/finally correctly and it should not p
stions, or will removing any file from the
directory be sufficient?
Many thanks,
Andy
TermDocs docs = reader.termDocs(currentTerm);
int docCounter = 1;
while (docs.next()) {
System.out.println(currentTerm.text() + ", doc" + docCounter + ", " + docs.freq());
docCounter++;
}
}
HTH,
Andy
--
. However,
it's clear that you can't really accommodate multi-language documents. It
would be much easier to ensure all docs were in a single language before
indexing.
Andy
ser to specify their input language because otherwise,
results will be poor.
Andy Roberts
> -MB
>
> On Apr 11, 2005, at 6:02 AM, Andy Roberts wrote:
> > Can you not provide the user with a option list to specify their input
> > language?
> >
> > Language identificat
5 in the field
"contents" of the index ir.
HTH,
Andy Roberts
On Sunday 10 Apr 2005 15:52, Patricio Galeas wrote:
> Hello,
> I am new with Lucene. I have following problem.
> When I execute a search I receive the list of document Hits.
> I get without problem the
uages was to build a
model based on character bigrams (that is, sequences of two letters) [1]
At the end of the day, Lucene cannot help you in choosing the correct language
as it doesn't know, and so it'll be up to you to add the necessary logic to
tell Lucene which Analyzers to utilis
m the book
(http://www.lucenebook.com/LuceneInAction.zip). If you unzip this file you
will find a directory called "LuceneInAction/src/lia/analysis" and in there
is a class called AnalyzerDemo (which depends on AnalyzerUtils). Compile this
and run to see how the Analysers work. Put in your hyp
successfully built the code in the lucene-1.4.2-dev
branch,
but that doesn't contain that class either!
Any hints? Google didn't shed any light, btw.
Cheers,
Andy Roberts
Hi,
I've been using Lucene for a few months now, although not in a typical
"building a search engine" kind of way*. Basically, I have some large
documents. I would like a system whereby I search for a term, and then I
receive a hit for each match, with its context, e.g., ten words either side