I think that all I needed to create the components is:
@Override
protected Analyzer.TokenStreamComponents createComponents( String fieldName, Reader reader ) {
    Analyzer.TokenStreamComponents tsc = new Analyzer.TokenStreamComponents(
        getTokenFilterChain( reader
hi Hoss -- thank you for your time. it looks like you're right (and it
makes sense that if the reader is advanced in two places at the same
time it will cause a problem).
I'll try to figure out how to create an Analyzer out of the Tokenizer.
that's what I was trying to do there and obviously
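For reference, a minimal sketch of what wrapping a Tokenizer in an Analyzer can look like (Lucene 4.x API; the tokenizer and filter chosen here are placeholders, not the poster's actual chain):

```java
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

// A minimal custom Analyzer: the Tokenizer is created inside
// createComponents so that Lucene owns the Reader, instead of the
// same Reader being advanced in two places at once.
public final class MyAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new WhitespaceTokenizer(Version.LUCENE_40, reader);
        TokenStream filtered = new LowerCaseFilter(Version.LUCENE_40, source);
        return new TokenStreamComponents(source, filtered);
    }
}
```

The key point is that the Tokenizer is constructed from the Reader that Lucene passes in, and both the source Tokenizer and the end of the filter chain are handed to TokenStreamComponents.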
: thanks for your reply. please see attached. I tried to maintain the
: structure of the code that I need to use in the library I'm building. I think
: it should work for you as long as you remove the package declaration at the
: top.
I can't currently try your code, but skimming through it I'
thanks for your reply. please see attached. I tried to maintain the
structure of the code that I need to use in the library I'm building. I
think it should work for you as long as you remove the package
declaration at the top.
when I run the attached file I get the following output:
debug:
: I keep getting an NPE when trying to add a Doc to an IndexWriter. I've
: minimized my code to very basic code. what am I doing wrong? pseudo-code:
can you post a full test that other people can run to try and reproduce?
it doesn't even have to be a junit test -- just some complete java code
I keep getting an NPE when trying to add a Doc to an IndexWriter. I've
minimized my code to very basic code. what am I doing wrong? pseudo-code:
Document doc = new Document();
TextField ft;
ft = new TextField( "desc1", "word1", Field.Store.YES );
doc.add( ft );
ft = new TextField( "desc2", "
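For anyone trying to reproduce, here is a complete, self-contained version of that pseudo-code (Lucene 4.0 API; the second field's value and the in-memory directory are my assumptions -- and note the poster later reports the NPE was caused by jar packaging, not by this code):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Minimal end-to-end example: create an IndexWriter, add one
// Document with two TextFields, and close the writer.
public class AddDocExample {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriterConfig cfg = new IndexWriterConfig(
                Version.LUCENE_40, new StandardAnalyzer(Version.LUCENE_40));
        IndexWriter writer = new IndexWriter(dir, cfg);

        Document doc = new Document();
        doc.add(new TextField("desc1", "word1", Field.Store.YES));
        doc.add(new TextField("desc2", "word2", Field.Store.YES));
        writer.addDocument(doc);

        writer.close();
        System.out.println("indexed 1 document");
    }
}
```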
Of course you're free to do as you like - who will stop you? :)
The problem is the lack of a single place to look for detailed guidance on
handling a long-distance upgrade like that.
But it's difficult to generalize here: the possible range in the level of
difficulty involved is vast, depending
I am in the process of upgrading LuSql from 2.x to 4.x and I am first
going to 3.6 as the jump to 4.x was too big.
I would suggest this to you. I think it is less work.
Of course I am also able to offer LuSql to 3.6 users, so this is
slightly different from your case.
-Glen
On Wed, Jan 9, 2013 a
Are there any best practices that we can follow? We want to get to the latest
version and are wondering if we can go directly from 2.4.0 to 4.x (as opposed
to 2.x - 3.x and then 3.x - 4.x), so that it will save not only time but also
a testing cycle at each migration hop.
Are there any limitations in dire
as mentioned before -- I'm not an expert on Lucene (far from it) -- but
it seems to me like each migration step will take an almost equal amount
of work, so if I were you I'd rethink this plan and consider migrating directly to 4.0
Igal
On 1/9/2013 1:08 PM, saisantoshi wrote:
Is there any migration g
I don't think there is a migration guide from 2.X to 3.X, other than the
specific information in the release notes.
If you start reading CHANGES.txt at version 3.0.0, and then each later
release's notes after that, especially the sections "Changes in backwards
compatibility policy", e.g. for 3.
Is there any migration guide from 2.x to 3.x? (As per the suggestion, I
would like to upgrade first from 2.4.0 to 2.9.0 and from 2.9.0 to 3.6, and
later we can decide if we want to upgrade from 3.6 to the 4.x version.)
--
View this message in context:
http://lucene.472066.n3.nabble.com/Upgrade-Lucene-t
Sai,
For the transition from 2.X to 3.X, I recommend compiling your code against the
latest 2.9.X version (2.9.4), looking at the deprecation messages, and making
changes until these are all addressed and compilation no longer produces
deprecation messages. Once that's done, your code should c
My guess is that upgrading to 3.6 to cover the _mostly_ upward compatible
changes to that point (Fieldable vs. Field) might make a worthwhile
intermediate step.
Then test that to make sure it is working, using whatever you have to test
with. Then work out the "real" changes to 4.0.
That is only a thought
I cannot elaborate much myself, as there are many changes and I'm not an
expert on Lucene.
I can tell you, though, that many signatures have changed, as well as
package names.
There were many API changes even between 3.6 and 4.0
--
typos, misspels, and other weird words brought to you courtesy of
Thanks. Could you please elaborate on what is needed other than replacing the
jars? Are the jars listed the only ones required, or are additional jars needed?
Is the API not backward compatible? I mean to say, is whatever API calls we
are using in 2.4.0 not supported by 4.0? Has the signature modifi
We recently went through the same process. We upgraded our indexing service
from 1.9.1 to 3.6.1. Unfortunately, the process is not as easy as you
thought. Besides replacing the jar files, you also need to change your code
to adapt to the new API. There are many changes; the most important parts are
in
the API has changed much over time so I suspect that it will take more
than replacing the jars.
On 1/9/2013 11:04 AM, saisantoshi wrote:
We have an existing application which uses Lucene 2.4.0. We are
thinking of upgrading it to the latest version (4.0). I am not sure of the
process involved
We have an existing application which uses Lucene 2.4.0. We are
thinking of upgrading it to the latest version (4.0). I am not sure of the
process involved in upgrading to the latest version. Is it just copying the
jars? If yes, what are all the jars that we need to copy over? Will it be backward
hi everybody,
I figured it out. the problem was that I was using a "custom" jar to
deploy this along with other libs that I use in my application. so at
the end of my build.xml I create a jar file with all the required libs.
the problem was that I was adding lucene-core.jar with a filter of
There is often the possibility to put another tokenizer in the chain to create
a variant analyzer. This is NOT very hard at all in either Lucene or
ElasticSearch.
Extra tokenizers can often be used to tweak the overall processing to add a
late tokenization to overcome an overlooked tokenization (
Thanks for all the responses. From the above, it sounds like there are two
options.
1. Use ICUTokenizer (is it in Lucene 4.0 or 4.1?). If it's in 4.1, then we
cannot use it at this time, as it has not been released yet.
2. Write a custom analyzer by extending ( StandardAnalyzer) and add filters
for addition
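A sketch of what option 2 can look like (note that StandardAnalyzer is final in 4.x, so rather than extending it, a custom Analyzer typically rebuilds its filter chain and appends extra filters; the specific filters below are illustrative):

```java
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

// Rebuild StandardAnalyzer's chain (tokenizer + standard filter +
// lowercase + stopwords) and append custom filters at the end.
public final class CustomAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        StandardTokenizer source = new StandardTokenizer(Version.LUCENE_40, reader);
        TokenStream chain = new StandardFilter(Version.LUCENE_40, source);
        chain = new LowerCaseFilter(Version.LUCENE_40, chain);
        chain = new StopFilter(Version.LUCENE_40, chain,
                StandardAnalyzer.STOP_WORDS_SET);
        // additional custom filters would be appended here
        return new TokenStreamComponents(source, chain);
    }
}
```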
attaching the second screenshot of live recorded objects.
thanks again
Alon
On Wed, Jan 9, 2013 at 7:34 PM, Alon Muchnick wrote:
> hi ,
> after upgrading to Lucene 3.6.2 I noticed there are extensive minor
> garbage collection operations, once or twice a second, and the amount of
> memor
hi,
after upgrading to Lucene 3.6.2 I noticed there are extensive minor
garbage collection operations, once or twice a second, and the amount of
memory being freed is about 600 MB each time, for a load of 60 searches per
second:
2013-01-09T18:57:24.350+0200: 174200.121: [GC [PSYoungGen:
630064
FWIW, new FuzzyQuery(term, 2 ,0) is the same as new FuzzyQuery(term), given
the current values of defaultMaxEdits (2) and defaultPrefixLength (0).
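A minimal sketch of that equivalence, assuming the 4.0 constructor signature FuzzyQuery(Term term, int maxEdits, int prefixLength):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;

public class FuzzyDefaults {
    public static void main(String[] args) {
        Term term = new Term("desc", "word");
        // Explicit maxEdits = 2 and prefixLength = 0 ...
        FuzzyQuery explicit = new FuzzyQuery(term, 2, 0);
        // ... are the same values the one-argument constructor uses
        // (FuzzyQuery.defaultMaxEdits and FuzzyQuery.defaultPrefixLength).
        FuzzyQuery implicit = new FuzzyQuery(term);
        System.out.println(explicit.equals(implicit));
    }
}
```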
-- Jack Krupansky
-----Original Message-----
From: Ian Lea
Sent: Wednesday, January 09, 2013 9:44 AM
To: java-user@lucene.apache.org
Subject: Re:
>> For an example, in the phrase "A man saw a elephant" "saw" has annotations as
>> follows (we also say that its position in index is 1234):
>>
>> {lemma: see, pos: verb, tense: past}, {lemma: saw, pos: noun, number:
>> singular}
>>
>> I think, it would be more effective to insert parse index in
See the javadocs for FuzzyQuery to see what the parameters are. I
can't tell you what the comment means. Possible values to try maybe?
--
Ian.
On Wed, Jan 9, 2013 at 2:34 PM, algebra wrote:
> It's true Ian, the code is good.
>
> The only thing that I don't understand is this line:
>
> Query query =
It's true Ian, the code is good.
The only thing that I don't understand is this line:
Query query = new FuzzyQuery(term, 2 ,0); //0-2
What does 0 to 2 mean?
--
View this message in context:
http://lucene.472066.n3.nabble.com/FuzzyQuery-in-lucene-4-0-tp4031871p4031879.html
Sent from the Lucene - Java Us
What adjustments did you make? One of them might be to blame.
But at a glance the code looks fine to me. In what way is it not
working? Care to provide any input/output/details of what
does/doesn't work?
--
Ian.
On Wed, Jan 9, 2013 at 2:03 PM, algebra wrote:
> I was using lucene 3.6 and my
I was using Lucene 3.6 and my function worked well. After I changed the
version of Lucene to 4.0 and made some adjustments, my function stopped
working. Can someone tell me what I'm doing wrong?
public List fuzzyLuceneList(List list, String s) throws
CorruptIndexException, LockObtainFai
Great! I'll look into that.
Thanks!
2013/1/9 김한규
> Try SpanTermQuery's getSpans() function. It returns a Spans object which you
> can iterate through to find the position of every hit in every document.
>
> http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/spans/SpanTermQuery.html
>
Try SpanTermQuery's getSpans() function. It returns a Spans object which you
can iterate through to find the position of every hit in every document.
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/spans/SpanTermQuery.html
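A sketch of iterating term positions via Spans with the Lucene 4.0 API (the field name "body" and the term are placeholders; in 4.0, getSpans takes the per-segment context, live docs, and a TermContext map):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermContext;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.search.spans.Spans;

// Walk every segment and print doc id plus start/end position
// of each match of the term.
public class SpanPositions {
    public static void printPositions(IndexReader reader) throws Exception {
        SpanTermQuery query = new SpanTermQuery(new Term("body", "saw"));
        Map<Term, TermContext> termContexts = new HashMap<Term, TermContext>();
        for (AtomicReaderContext ctx : reader.leaves()) {
            Spans spans = query.getSpans(ctx, ctx.reader().getLiveDocs(), termContexts);
            while (spans.next()) {
                System.out.println("doc=" + spans.doc()
                        + " start=" + spans.start() + " end=" + spans.end());
            }
        }
    }
}
```

start() is the position of the first token in the span and end() is one past the last, so for a single-term span end() is start() + 1.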
2013/1/9 Itai Peleg
> Hi,
>
> I'm new to Lucene, and I'm hav
The index lib must be saved to the hard disk.
Sent from Huawei Mobile
Ian Lea wrote:
>What do you mean by lucene blocksize? What version of lucene are you using?
>
>A good general principle is to start with the defaults and only worry
>if there is a problem.
>
>
>--
>Ian.
>
>
>On Wed, Jan 9, 2013 at
The index lib must be saved on the hard disk. When one hard disk cannot hold a
large index lib, we will use a disk array. The disk array must have a stripe
size set, so I want to know: when the index lib is saved on the disk array, which
stripe size should be set? When the index is saved in the file system, ho
Hi Jack, thanks for your ideas, I've added some comments to your
questions, maybe you can throw some more light on this...
On 01/08/2013 11:34 PM, Jack Krupansky wrote:
The term "arv" is on the first list, but not the second. Maybe its
document frequency fell below the setting for minimum
Hi,
you can have a look at the (early stage) Lucene classification module on
trunk [1], see also a brief introduction given at last ApacheCon EU [2].
Hope this helps,
Tommaso
[1] :
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/classification/
[2] :
http://www.slideshare.net/teofili/tex
What do you mean by lucene blocksize? What version of lucene are you using?
A good general principle is to start with the defaults and only worry
if there is a problem.
--
Ian.
On Wed, Jan 9, 2013 at 8:51 AM, seacathello wrote:
> now I index very many email files, about 50M, and every email f
http://www.slideshare.net/teofili/text-categorization-with-lucene-and-solr
On Wed, Jan 9, 2013 at 5:46 AM, VIGNESH S wrote:
> Hi,
>
> can anyone suggest me how can i use lucene for text classification.
>
> --
> Thanks and Regards
> Vignesh Srinivasan
>
> -
Hi,
can anyone suggest me how can i use lucene for text classification.
--
Thanks and Regards
Vignesh Srinivasan
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@
Thanks, I'll do that.
p.s. -- that was http://getrailo.org -- 'auto-correct' messed it up ;-)
--
typos, misspels, and other weird words brought to you courtesy of my mobile
device.
On Jan 9, 2013 2:08 AM, "Nick Burch" wrote:
> On Wed, 9 Jan 2013, Igal Sapir wrote:
>
>> The syntax is CFML / CFSc
On Wed, Jan 9, 2013 at 5:25 PM, Steve Rowe wrote:
> Dude. Go look. It allows for per-script specialization, with (non-UAX#29)
> specializations by default for Thai, Lao, Myanmar and Hebrew. See
> DefaultICUTokenizerConfig. It's filled with exactly the opposite of what you
> were describing
On Wed, 9 Jan 2013, Igal Sapir wrote:
The syntax is CFML / CFScript (ColdFusion Script). Railo is an open
source, high performance, ColdFusion server. http://getrailo.arg/
I will re-download the Lucene jars and try again. I'll let you know
what I find.
It may be worth double-checking that
The syntax is CFML / CFScript (ColdFusion Script). Railo is an open
source, high performance, ColdFusion server. http://getrailo.arg/
I will re-download the Lucene jars and try again. I'll let you know what I
find.
Thanks,
Igal
--
typos, misspels, and other weird words brought to you courtes
The index lib size is about 1TB and it has only one segment.
--
View this message in context:
http://lucene.472066.n3.nabble.com/how-much-blocksize-is-set-in-lucene-tp4031796p4031797.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
-
Now I index very many email files, about 50M, and every email file is about
4-50k in size.
The index lib size is about 1TB, and there is only one segment.
For this index lib, which blocksize should I choose?
4k or 512k, which choice is better?
Thanks very much!
--
View this message in context:
http://lucene.4
> indexWriterConfig = createObject( "java",
> "org.apache.lucene.index.IndexWriterConfig" ).init( Lucene.Version,
> this.indexAnalyzer );
What syntax is that? I have never seen that before!
> where Lucene.Version is an object of Lucene.VERSION_40 and
> this.indexAnalyzer is an Analyzer objec
hi Uwe,
thank you for answering. I believe that this is the complete stack
trace, no? (pasted again below)
I'm actually not trying to do anything fancy with codecs etc. I'm
trying to do something very basic: create an object of type
indexWriterConfig. the CFML (Railo) code is as follows: