Re: NPE when adding a Document to an IndexWriter

2013-01-09 Thread Igal @ getRailo.org
I think that all I needed to create the components is: @Override protected Analyzer.TokenStreamComponents createComponents( String fieldName, Reader reader ) { Analyzer.TokenStreamComponents tsc = new Analyzer.TokenStreamComponents( getTokenFilterChain( reader

Re: NPE when adding a Document to an IndexWriter

2013-01-09 Thread Igal @ getRailo.org
hi Hoss -- thank you for your time. it looks like you're right (and it makes sense if the reader is advanced in two places at the same time that it will cause a problem). I'll try to figure out how to create an Analyzer out of the Tokenizer. that's what I was trying to do there and obviously
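
The point about a reader being advanced in two places can be shown with plain java.io, without any Lucene classes. This is only a minimal sketch: the class and method names (SharedReaderDemo, readChars, interleavedShared) are made up for illustration and are not the code from the attachment. Two consumers that pull from the same Reader each see only part of the input.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class, not taken from the thread's attachment.
class SharedReaderDemo {

    // Reads up to n characters from the reader and returns them as a String.
    static String readChars(Reader reader, int n) {
        StringBuilder sb = new StringBuilder();
        try {
            for (int i = 0; i < n; i++) {
                int c = reader.read();
                if (c == -1) break; // end of stream
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Two consumers alternate single-character reads on ONE shared reader.
    // Returns what consumer A saw at index 0 and what consumer B saw at index 1.
    static String[] interleavedShared(String text) {
        Reader shared = new StringReader(text);
        StringBuilder a = new StringBuilder();
        StringBuilder b = new StringBuilder();
        while (true) {
            String x = readChars(shared, 1);
            if (x.isEmpty()) break;
            a.append(x);
            String y = readChars(shared, 1);
            if (y.isEmpty()) break;
            b.append(y);
        }
        return new String[] { a.toString(), b.toString() };
    }

    public static void main(String[] args) {
        String[] parts = interleavedShared("word1");
        System.out.println("consumer A saw: " + parts[0]);
        System.out.println("consumer B saw: " + parts[1]);
        // Giving each consumer its own reader avoids the problem.
        System.out.println("own reader:     " + readChars(new StringReader("word1"), 5));
    }
}
```

Neither consumer sees the whole text, which is the general hazard when two components each advance the same reader.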

Re: NPE when adding a Document to an IndexWriter

2013-01-09 Thread Chris Hostetter
: thanks for your reply. please see attached. I tried to maintain the : structure of the code that I need to use in the library I'm building. I think : it should work for you as long as you remove the package declaration at the : top. I can't currently try your code, but skimming through it i'

Re: NPE when adding a Document to an IndexWriter

2013-01-09 Thread Igal @ getRailo.org
thanks for your reply. please see attached. I tried to maintain the structure of the code that I need to use in the library I'm building. I think it should work for you as long as you remove the package declaration at the top. when I run the attached file I get the following output: debug:

Re: NPE when adding a Document to an IndexWriter

2013-01-09 Thread Chris Hostetter
: I keep getting an NPE when trying to add a Doc to an IndexWriter. I've : minimized my code to very basic code. what am I doing wrong? pseudo-code: can you post a full test that other people can run to try and reproduce? it doesn't even have to be a JUnit test -- just some complete Java code

NPE when adding a Document to an IndexWriter

2013-01-09 Thread Igal @ getRailo.org
I keep getting an NPE when trying to add a Doc to an IndexWriter. I've minimized my code to very basic code. what am I doing wrong? pseudo-code: Document doc = new Document(); TextField ft; ft = new TextField( "desc1", "word1", Field.Store.YES ); doc.add( ft ); ft = new TextField( "desc2", "

Re: Upgrade Lucene to latest version (4.0) from 2.4.0

2013-01-09 Thread Steve Rowe
Of course you're free to do as you like - who will stop you? :) The problem is the lack of a single place to look for detailed guidance on handling a long-distance upgrade like that. But it's difficult to generalize here: the possible range in the level of difficulty involved is vast, depending

Re: Upgrade Lucene to latest version (4.0) from 2.4.0

2013-01-09 Thread Glen Newton
I am in the process of upgrading LuSql from 2.x to 4.x and I am first going to 3.6 as the jump to 4.x was too big. I would suggest this to you. I think it is less work. Of course I am also able to offer LuSql to 3.6 users, so this is slightly different from your case. -Glen On Wed, Jan 9, 2013 a

Re: Upgrade Lucene to latest version (4.0) from 2.4.0

2013-01-09 Thread saisantoshi
Are there any best practices that we can follow? We want to get to the latest version and are wondering whether we can go directly from 2.4.0 to 4.x (as opposed to 2.x - 3.x and 3.x - 4.x), so that it will save not only time but also a testing cycle at each migration hop. Are there any limitations in dire

Re: Upgrade Lucene to latest version (4.0) from 2.4.0

2013-01-09 Thread Igal @ getRailo.org
as mentioned before -- I'm not an expert on Lucene (far from it) -- but it seems to me like each migration version will take almost equal amount of work so if I were you I'd rethink this plan and consider migration to 4.0 Igal On 1/9/2013 1:08 PM, saisantoshi wrote: Is there any migration g

Re: Upgrade Lucene to latest version (4.0) from 2.4.0

2013-01-09 Thread Steve Rowe
I don't think there is a migration guide from 2.X to 3.X, other than the specific information in the release notes. If you start reading CHANGES.txt at version 3.0.0, and then each later release's notes after that, especially the sections "Changes in backwards compatibility policy", e.g. for 3.

Re: Upgrade Lucene to latest version (4.0) from 2.4.0

2013-01-09 Thread saisantoshi
Is there any migration guide from 2.x to 3.x? ( as per the suggestion, i would like to upgrade first from 2.4.0 to 2.9.0 and from 2.9.0 to 3.6) and later we decide if we want to upgrade from 3.6 to 4.x version? -- View this message in context: http://lucene.472066.n3.nabble.com/Upgrade-Lucene-t

Re: Upgrade Lucene to latest version (4.0) from 2.4.0

2013-01-09 Thread Steve Rowe
Sai, For the transition from 2.X to 3.X, I recommend compiling your code against the latest 2.9.X version (2.9.4), looking at the deprecation messages, and making changes until these are all addressed and compilation no longer produces deprecation messages. Once that's done, your code should c

RE: Upgrade Lucene to latest version (4.0) from 2.4.0

2013-01-09 Thread Paul Hill
My guess is that upgrading to 3.6 to cover the _mostly_ upward compatible changes to that point (Fieldable vs. Field) might make a worthwhile intermediate step. Then test that to make sure it is working using whatever you have to test. Then work out the "real" changes to 4.0. That is only a thought

Re: Upgrade Lucene to latest version (4.0) from 2.4.0

2013-01-09 Thread Igal Sapir
I cannot elaborate much myself as there are many changes and I'm not an expert on Lucene. I can tell you though that many signatures have changed as well as package names. There were many API changes even between 3.6 and 4.0 -- typos, misspels, and other weird words brought to you courtesy of

Re: Upgrade Lucene to latest version (4.0) from 2.4.0

2013-01-09 Thread saisantoshi
Thanks. Could you please elaborate on what is needed other than replacing the jars? Are the jars listed the only jars, or are any additional jars required? Is the API not backward compatible? I mean to say, are the API calls we are using in 2.4.0 not supported by 4.0? Has the signature modifi

Re: Upgrade Lucene to latest version (4.0) from 2.4.0

2013-01-09 Thread Bin Lan
We recently went through the same process. We upgraded our indexing service from 1.9.1 to 3.6.1. Unfortunately, the process is not as easy as you thought. Besides replacing the jar files, you also need to change your code to adapt to the new API. There are many changes; the most important parts are in

Re: Upgrade Lucene to latest version (4.0) from 2.4.0

2013-01-09 Thread Igal @ getRailo.org
the API has changed much over time so I suspect that it will take more than replacing the jars. On 1/9/2013 11:04 AM, saisantoshi wrote: We have an existing application which uses Lucene 2.4.0 version. We are thinking of upgrading it to the latest version (4.0). I am not sure the process involved

Upgrade Lucene to latest version (4.0) from 2.4.0

2013-01-09 Thread saisantoshi
We have an existing application which uses Lucene version 2.4.0. We are thinking of upgrading it to the latest version (4.0). I am not sure of the process involved in upgrading to the latest version. Is it just copying the jars? If yes, which jars do we need to copy over? Will it be backward

Re: Cannot instantiate SPI class

2013-01-09 Thread Igal @ getRailo.org
hi everybody, I figured it out. the problem was that I was using a "custom" jar to deploy this along with other libs that I use in my application. so at the end of my build.xml I create a jar file with all the required libs. the problem was that I was adding lucene-core.jar with a filter of

RE: Is StandardAnalyzer good enough for multi languages...

2013-01-09 Thread Paul Hill
There is often the possibility to put another tokenizer in the chain to create a variant analyzer. This is NOT very hard at all in either Lucene or ElasticSearch. Extra tokenizers can often be used to tweak the overall processing to add a late tokenization to overcome an overlooked tokenization (

Re: Is StandardAnalyzer good enough for multi languages...

2013-01-09 Thread saisantoshi
Thanks for all the responses. From the above, it sounds like there are two options. 1. Use ICUTokenizer (is it in Lucene 4.0 or 4.1?). If it's in 4.1, then we cannot use it at this time as it has not been released yet. 2. Write a custom analyzer by extending (StandardAnalyzer) and add filters for addition

Re: extensive minor garbage collection when using RAMDirectory on Lucene 3.6.2

2013-01-09 Thread Alon Muchnick
Attaching the second screenshot of live recorded objects. Thanks again, Alon On Wed, Jan 9, 2013 at 7:34 PM, Alon Muchnick wrote: > hi, > after upgrading to Lucene 3.6.2 I noticed extensive minor > garbage collection operations, once or twice a second, and the amount of > memor

Fwd: extensive minor garbage collection when using RAMDirectory on Lucene 3.6.2

2013-01-09 Thread Alon Muchnick
hi, after upgrading to Lucene 3.6.2 I noticed extensive minor garbage collection operations, once or twice a second, and the amount of memory being freed is about 600 MB each time for a load of 60 searches per second: 2013-01-09T18:57:24.350+0200: 174200.121: [GC [PSYoungGen: 630064
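
As a rough back-of-the-envelope reading of the figures in this post (about 600 MB freed per minor collection, one or two collections per second, 60 searches per second), the garbage produced per search can be estimated as below. This is only a sketch: it assumes that all of the freed memory comes from search requests, which the post does not state, and the class and method names are made up for illustration.

```java
// Hypothetical helper for the arithmetic; not part of Lucene or the JVM.
class GcAllocationEstimate {

    // Estimated MB of short-lived garbage per search, assuming every freed
    // byte was allocated while serving searches (an assumption, not a fact
    // from the post).
    static double mbPerSearch(double mbFreedPerGc, double gcsPerSecond, double searchesPerSecond) {
        return mbFreedPerGc * gcsPerSecond / searchesPerSecond;
    }

    public static void main(String[] args) {
        // One minor GC per second versus two per second, at 60 searches/s.
        System.out.println("low estimate  (MB/search): " + mbPerSearch(600, 1, 60));
        System.out.println("high estimate (MB/search): " + mbPerSearch(600, 2, 60));
    }
}
```

Under that assumption each search would be producing roughly 10 to 20 MB of short-lived objects, which is the number to compare before and after the upgrade.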

Re: FuzzyQuery in lucene 4.0

2013-01-09 Thread Jack Krupansky
FWIW, new FuzzyQuery(term, 2 ,0) is the same as new FuzzyQuery(term), given the current values of defaultMaxEdits (2) and defaultPrefixLength (0). -- Jack Krupansky -Original Message- From: Ian Lea Sent: Wednesday, January 09, 2013 9:44 AM To: java-user@lucene.apache.org Subject: Re:
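
To make the two numbers concrete, here is a small plain-Java sketch of what a maximum edit count of 2 and a prefix length of 0 mean for a single candidate term. It is not Lucene's implementation: Lucene matches against the term dictionary with an automaton and may treat transpositions differently, while this sketch uses the classic Levenshtein distance. FuzzyMatchSketch, editDistance and matches are made-up names.

```java
// Hypothetical illustration of maxEdits and prefixLength; not Lucene code.
class FuzzyMatchSketch {

    // Classic dynamic-programming Levenshtein distance
    // (insertions, deletions and substitutions each cost 1).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // A candidate matches if its first prefixLength characters equal the
    // query's and the whole strings are within maxEdits edits of each other.
    // Terms shorter than prefixLength are rejected (a choice of this sketch).
    static boolean matches(String query, String candidate, int maxEdits, int prefixLength) {
        if (prefixLength > 0) {
            if (query.length() < prefixLength || candidate.length() < prefixLength) {
                return false;
            }
            if (!query.regionMatches(0, candidate, 0, prefixLength)) {
                return false;
            }
        }
        return editDistance(query, candidate) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(matches("lucene", "lucine", 2, 0));  // one substitution
        System.out.println(matches("kitten", "sitting", 2, 0)); // three edits needed
        System.out.println(matches("lucene", "mucene", 2, 1));  // first character differs
    }
}
```

With a prefix length of 0 no leading characters are required to match exactly, so only the edit limit restricts which terms qualify.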

Re: Lucene for a linguistic corpus

2013-01-09 Thread Wu, Stephen T., Ph.D.
>> For an example, in the phrase "A man saw a elephant" "saw" has annotations as >> follows (we also say that its position in index is 1234): >> >> {lemma: see, pos: verb, tense: past}, {lemma: saw, pos: noun, number: >> singular} >> >> I think, it would be more effective to insert parse index in

Re: FuzzyQuery in lucene 4.0

2013-01-09 Thread Ian Lea
See the javadocs for FuzzyQuery to see what the parameters are. I can't tell you what the comment means. Possible values to try maybe? -- Ian. On Wed, Jan 9, 2013 at 2:34 PM, algebra wrote: > That's true Ian, the code is good. > > The only thing that I don't understand is this line: > > Query query =

Re: FuzzyQuery in lucene 4.0

2013-01-09 Thread algebra
That's true Ian, the code is good. The only thing that I don't understand is this line: Query query = new FuzzyQuery(term, 2 ,0); //0-2 What does 0 to 2 mean?

Re: FuzzyQuery in lucene 4.0

2013-01-09 Thread Ian Lea
What adjustments did you make? One of them might be to blame. But at a glance the code looks fine to me. In what way is it not working? Care to provide any input/output/details of what does/doesn't work? -- Ian. On Wed, Jan 9, 2013 at 2:03 PM, algebra wrote: > I was using lucene 3.6 and my

FuzzyQuery in lucene 4.0

2013-01-09 Thread algebra
I was using lucene 3.6 and my function worked well. Then I changed the version of lucene to 4.0, made some adjustments, and now my function is not working. Can someone tell me what I'm doing wrong? public List fuzzyLuceneList(List list, String s) throws CorruptIndexException, LockObtainFai

Re: getting the offset of hits in a search

2013-01-09 Thread Itai Peleg
Great! I'll look into that. Thanks! 2013/1/9 김한규 > Try the getSpans() function of SpanTermQuery. It returns a Spans object which you > can iterate through to find the position of every hit in every document. > > http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/spans/SpanTermQuery.html >

Re: getting the offset of hits in a search

2013-01-09 Thread 김한규
Try the getSpans() function of SpanTermQuery. It returns a Spans object which you can iterate through to find the position of every hit in every document. http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/spans/SpanTermQuery.html 2013/1/9 Itai Peleg > Hi, > > I'm new to Lucene, and I'm hav

Re: how much blocksize is set in lucene.

2013-01-09 Thread hujing
The index lib must be saved to the hard disk. Sent from Huawei Mobile Ian Lea wrote: >What do you mean by lucene blocksize? What version of lucene are you using? > >A good general principle is to start with the defaults and only worry >if there is a problem. > > >-- >Ian. > > >On Wed, Jan 9, 2013 at

Re: how much blocksize is set in lucene.

2013-01-09 Thread hujing
The index lib must be saved on the hard disk. When the hard disk cannot hold a large index lib, we will use a disk array. The disk array must have a stripe size set, so I want to know which stripe size should be set when the index lib is saved on the disk array. When the index is saved in the file system, ho

Re: Differences in MLT Query Terms Question

2013-01-09 Thread Peter Lavin
Hi Jack, thanks for your ideas, I've added some comments to your questions, maybe you can throw some more light on this... On 01/08/2013 11:34 PM, Jack Krupansky wrote: The term "arv" is on the first list, but not the second. Maybe its document frequency fell below the setting for minimum

Re: Help needed Regarding classification of Text Data using Lucene..

2013-01-09 Thread Tommaso Teofili
Hi, you can have a look at the (early stage) Lucene classification module on trunk [1], see also a brief introduction given at last ApacheCon EU [2]. Hope this helps, Tommaso [1] : http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/classification/ [2] : http://www.slideshare.net/teofili/tex

Re: how much blocksize is set in lucene.

2013-01-09 Thread Ian Lea
What do you mean by lucene blocksize? What version of lucene are you using? A good general principle is to start with the defaults and only worry if there is a problem. -- Ian. On Wed, Jan 9, 2013 at 8:51 AM, seacathello wrote: > now I index very many email files, about 50m, and every email f

Re: Help needed Regarding classification of Text Data using Lucene..

2013-01-09 Thread Shashi Kant
http://www.slideshare.net/teofili/text-categorization-with-lucene-and-solr On Wed, Jan 9, 2013 at 5:46 AM, VIGNESH S wrote: > Hi, > > can anyone suggest how I can use Lucene for text classification? > > -- > Thanks and Regards > Vignesh Srinivasan > > -

Help needed Regarding classification of Text Data using Lucene..

2013-01-09 Thread VIGNESH S
Hi, can anyone suggest how I can use Lucene for text classification? -- Thanks and Regards Vignesh Srinivasan

RE: Cannot instantiate SPI class

2013-01-09 Thread Igal Sapir
Thanks, I'll do that. p.s. -- that was http://getrailo.org -- 'auto-correct' messed it up ;-) -- typos, misspels, and other weird words brought to you courtesy of my mobile device. On Jan 9, 2013 2:08 AM, "Nick Burch" wrote: > On Wed, 9 Jan 2013, Igal Sapir wrote: > >> The syntax is CFML / CFSc

Re: Is StandardAnalyzer good enough for multi languages...

2013-01-09 Thread Trejkaz
On Wed, Jan 9, 2013 at 5:25 PM, Steve Rowe wrote: > Dude. Go look. It allows for per-script specialization, with (non-UAX#29) > specializations by default for Thai, Lao, Myanmar and Hebrew. See > DefaultICUTokenizerConfig. It's filled with exactly the opposite of what you > were describing

RE: Cannot instantiate SPI class

2013-01-09 Thread Nick Burch
On Wed, 9 Jan 2013, Igal Sapir wrote: The syntax is CFML / CFScript (ColdFusion Script). Railo is an open source, high performance, ColdFusion server. http://getrailo.arg/ I will re-download the Lucene jars and try again. I'll let you know what I find. It may be worth double-checking that

RE: Cannot instantiate SPI class

2013-01-09 Thread Igal Sapir
The syntax is CFML / CFScript (ColdFusion Script). Railo is an open source, high performance, ColdFusion server. http://getrailo.arg/ I will re-download the Lucene jars and try again. I'll let you know what I find. Thanks, Igal -- typos, misspels, and other weird words brought to you courtes

Re: how much blocksize is set in lucene.

2013-01-09 Thread seacathello
The index lib size is about 1TB and it has only one segment.

how much blocksize is set in lucene.

2013-01-09 Thread seacathello
Now I am indexing a very large number of email files, about 50m, and each email file is about 4-50k in size. The index lib size is about 1TB, with only one segment. For this index lib, which blocksize should I choose, 4k or 512k? Which choice is better? Thanks very much.

RE: Cannot instantiate SPI class

2013-01-09 Thread Uwe Schindler
> indexWriterConfig = createObject( "java", > "org.apache.lucene.index.IndexWriterConfig" ).init( Lucene.Version, > this.indexAnalyzer ); What syntax is that, I have never seen that before! > where Lucene.Version is an object of Lucene.VERSION_40 and > this.indexAnalyzer is an Analyzer objec

Re: Cannot instantiate SPI class

2013-01-09 Thread Igal @ getRailo.org
hi Uwe, thank you for answering. I believe that this is the complete stack trace, no (pasted again below)? I'm actually not trying to do anything fancy with codecs etc. I'm trying to do something very basic: create an object of type indexWriterConfig. the CFML (Railo) code is as follows: