Re: korean and lucene

2005-10-26 Thread Youngho Cho
Hello, OK, I've attached my test code for Korean, which is a slightly modified version of Koji's code. Just put it into the lia.analysis.i18n package in LuceneInAction and run ant. Hopefully it helps someone. build.xml - Japanese Test... Korea

Re: Segments file format

2005-10-26 Thread Yonik Seeley
There is a currently undocumented extra int32. Here's the code for writing the segment file:
    output.writeInt(FORMAT); // write FORMAT
    output.writeLong(++version); // every write changes the index
    output.writeInt(counter); // write counter
    output.writeInt(size()); // write infos
    for (int i = 0; i <
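For anyone following along with a hex editor, the four header fields written by the code quoted in this message can be read back with plain java.io. This is a minimal sketch, assuming the layout is exactly what that code writes (int32 format, int64 version, int32 counter, int32 segment count) and that values are stored high-order byte first, which is what DataInputStream expects. The class name, field names, and sample bytes are illustrative only and are not part of Lucene; the sample bytes are modeled on the hex dump posted in this thread, and reading the third int as the counter is an interpretation, not something the thread confirms.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper (not part of Lucene): reads the four leading fields of a segments file. */
final class SegmentsHeader {
    final int format;        // written by output.writeInt(FORMAT)
    final long version;      // written by output.writeLong(++version)
    final int counter;       // written by output.writeInt(counter)
    final int segmentCount;  // written by output.writeInt(size())

    private SegmentsHeader(int format, long version, int counter, int segmentCount) {
        this.format = format;
        this.version = version;
        this.counter = counter;
        this.segmentCount = segmentCount;
    }

    /** Parses the first 20 bytes; fails with UncheckedIOException if the input is shorter. */
    static SegmentsHeader parse(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int format = in.readInt();
            long version = in.readLong();
            int counter = in.readInt();
            int segmentCount = in.readInt();
            return new SegmentsHeader(format, version, counter, segmentCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Illustrative bytes: FF FF FF FF | 00 .. 00 28 | 00 00 00 4E | 00 00 00 04
        byte[] sample = {
            (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF,
            0, 0, 0, 0, 0, 0, 0, 0x28,
            0, 0, 0, 0x4E,
            0, 0, 0, 0x04
        };
        SegmentsHeader h = parse(sample);
        System.out.println("format=" + h.format + " version=" + h.version
                + " counter=" + h.counter + " segments=" + h.segmentCount);
    }
}
```

The per-segment entries that follow the header are not parsed here, because the quoted code is cut off before it shows how they are written.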

Re: Segments file format

2005-10-26 Thread Yonik Seeley
Hi Bill, I can't seem to correctly parse it either... Format = FF FF FF FF Version = 00 00 00 00 00 00 00 28 SegCount = 00 00 00 4E = 00 00 00 04 -Yonik Now hiring -- http://forms.cnet.com/slink?231706 On 10/26/05, Bill Tschumy <[EMAIL PROTECTED]> wrote: > > I have been trying to reconstitu

Re: korean and lucene

2005-10-26 Thread Youngho Cho
Hello all, Please forgive my previous stupid message. [echo] Running lia.analysis.i18n.KoreanDemo... [java] [경] [기] analyzer = org.apache.lucene.analysis.standard.StandardAnalyzer [java] phrase = 경기 [java] query = "경 기" I got a good result. When I compile I just rename ol

Re: korean and lucene

2005-10-26 Thread Youngho Cho
Hello Koji, Here is the test result. Japanese is OK! Maybe ant clean had some effect. Anyway, please refer to the result using 1.9 [echo] Running lia.analysis.i18n.JapaneseDemo... [java] [ラ] [メ] [ン] [屋] analyzer = org.apache.lucene.analysis.standard.StandardAnalyzer [java] phrase =

Segments file format

2005-10-26 Thread Bill Tschumy
I have been trying to reconstitute a corrupted index. I have been looking at the segments file with a hex-editor and its format doesn't seem to quite agree with the description found at: It indicates the segments file looks like this: Seg

RE: korean and lucene

2005-10-26 Thread Koji Sekiguchi
Hello Youngho, I don't understand why you couldn't get hits in Japanese, but you had better check why the query was empty with the Korean data: > For Korean > [echo] Running lia.analysis.i18n.KoreanDemo... > [java] phrase = 경 > [java] query = The last line should be query

Re: korean and lucene

2005-10-26 Thread Youngho Cho
Hello Koji, Thanks for your kind reply. Yes, I used QueryParser. Normally I use the Query = QueryParser.parse( ) method. I put your sample code into the lia.analysis.i18n package in LuceneAction and ran JapaneseDemo using 1.4 and 1.9; the results are [echo] Running lia.analysis.i18n.JapaneseDemo...

Re: Review of Using Compass With Lucene to Index Database?

2005-10-26 Thread Erik Hatcher
On 26 Oct 2005, at 20:15, Sam Lee wrote: Are Compass Framework and DBsight the only 2 options for indexing DB so far? Or probably 10 lines of JDBC and Lucene API code (just to index, of course). Erik - To unsubscrib

RE: korean and lucene

2005-10-26 Thread Koji Sekiguchi
Hi Youngho, With regard to Japanese, using StandardAnalyzer, I can search for a word/phrase. Did you use QueryParser? StandardAnalyzer tokenizes CJK characters into a stream of single characters. Use QueryParser to get a PhraseQuery and search with that query. Please see the following sample code. Replace J
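The behaviour described here (one token per CJK character, then a phrase built from those tokens) can be illustrated without Lucene on the classpath. This is a rough sketch in plain Java, assuming that "CJK character" means a code point in the Han, Hiragana, Katakana, or Hangul script. It is not StandardAnalyzer's actual grammar, and the class and method names are invented for illustration.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

/** Illustration only: mimics "one token per CJK character"; not Lucene code. */
final class CjkUnigrams {

    /** Assumption: treat Han, Hiragana, Katakana, and Hangul code points as CJK. */
    static boolean isCjk(int codePoint) {
        UnicodeScript script = UnicodeScript.of(codePoint);
        return script == UnicodeScript.HAN
                || script == UnicodeScript.HIRAGANA
                || script == UnicodeScript.KATAKANA
                || script == UnicodeScript.HANGUL;
    }

    /** Emits each CJK code point as its own token and skips everything else. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
                .filter(CjkUnigrams::isCjk)
                .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    /** Joins the tokens into a quoted phrase, the form a phrase query is printed in. */
    static String toPhrase(List<String> tokens) {
        return "\"" + String.join(" ", tokens) + "\"";
    }

    public static void main(String[] args) {
        // "\uACBD\uAE30" is the two-syllable Korean word used in the thread's KoreanDemo output.
        List<String> tokens = tokenize("\uACBD\uAE30");
        System.out.println(tokens + " -> " + toPhrase(tokens));
    }
}
```

Because every character becomes its own term, a multi-character word only matches as a whole when the query is a phrase over the single-character terms, which is why going through QueryParser matters in this thread.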

Re: Review of Using Compass With Lucene to Index Database?

2005-10-26 Thread Sam Lee
Are Compass Framework and DBsight the only 2 options for indexing DB so far? --- Chris Lu <[EMAIL PROTECTED]> wrote: > It's done by Shay Banon. He should be on the list > also. > Lucene works via a hook through Hibernate when CRUD > operates on MySQL. > > http://www.theserverside.com/tss?serv

Re: Document number

2005-10-26 Thread Grant Ingersoll
Yep. hits.id() should do it. Gusenbauer Stefan wrote: Gusenbauer Stefan wrote: I've been searching through the archives, but is there a way to get the document number for a specific document? I need it for the getTermFreqVector method of IndexReader. For deleting, I've saved a unique ID Fie

Re: Review of Using Compass With Lucene to Index Database?

2005-10-26 Thread Chris Lu
It's done by Shay Banon. He should be on the list also. Lucene works via a hook through Hibernate when CRUD operates on MySQL. http://www.theserverside.com/tss?service=direct/0/NewsThread/threadViewer.markNoisy.link&sp=l35679&sp=l180646 Chris Lucene Search On A

Re: korean and lucene

2005-10-26 Thread Youngho Cho
Hello Cheolgoo, I have now updated my Lucene version to 1.9 to use StandardAnalyzer for Korean, and tested your patch, which is already adopted in 1.9 http://issues.apache.org/jira/browse/LUCENE-444 But I still don't get good results with Korean compared with CJKAnalyzer. Single character is good ma

Review of Using Compass With Lucene to Index Database?

2005-10-26 Thread Sam Lee
Hi, I just found an open source project called Compass that works with Lucene to index databases like MySQL. Has anyone used it? If so, please let us know what you think about Compass. Many thanks.

Re: Document number

2005-10-26 Thread Gusenbauer Stefan
Gusenbauer Stefan wrote: >I've been searching through the archives, but is there a way to get the >document number for a specific document? I need it for the >getTermFreqVector method of IndexReader. For deleting, I've saved a unique ID >Field to delete the documents, but how do I get the document numbe

Document number

2005-10-26 Thread Gusenbauer Stefan
I've been searching through the archives, but is there a way to get the document number for a specific document? I need it for the getTermFreqVector method of IndexReader. For deleting, I've saved a unique ID Field to delete the documents, but how do I get the document number? Thanks, Stefan

Re: AW: Java heap space ...after index process

2005-10-26 Thread Gusenbauer Stefan
Patricio Galeas wrote: >Hello Ben, >It happens when one of the documents [4.95 MB] is indexed. >I use the framework to index office documents from the book "Lucene In >Action". I think the PDDocument objects are closed correctly. > >I'll look for more information about increasing the heap size. >

AW: Java heap space ...after index process

2005-10-26 Thread Patricio Galeas
Hello Ben, It happens when one of the documents [4.95 MB] is indexed. I use the framework to index office documents from the book "Lucene In Action". I think the PDDocument objects are closed correctly. I'll look for more information about increasing the heap size. PDFBox version = 0.7.2 Thank Y

RE: Help with Search Java Code set up

2005-10-26 Thread Kevin L. Cobb
Yeah. The last post got me reading more about BooleanQuery and this opened up the floodgates. A question on the heels of this one though. I have documents indexed in multiple fields. Let's say Name, Synonym, and Definition. Let's say the search phrase is "big green cat". What I'm building using

Would Someone Give Me Pointer On How to Index Database?

2005-10-26 Thread Sam Lee
Hi, I want to use Lucene/Nutch to index my MySQL database. I'm thinking of using JDBC; is that a good idea? I searched all over the web, but none of the examples are Lucene/Nutch related. Would you guys give me pointers or websites or examples on how to use JDBC on Lucene/Nutch to index mysql databas

Re: Java heap space ...after index process

2005-10-26 Thread Ben Litchfield
Is this only after the entire indexing process is finished or do you mean it happens on one of the documents you are extracting text from? Are you closing the PDDocument objects when you are done with them? What heap size are you using and have you tried increasing it? What version of PDFBox?
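To answer the heap-size question, the limit the JVM is actually running with can be printed from inside the indexing program. This is a small sketch using only java.lang.Runtime; the class and method names are illustrative. The limit itself is raised with the standard -Xmx JVM option (for example -Xmx512m, a value chosen here purely as an illustration).

```java
/** Illustration only: reports the heap limit and current usage of the running JVM. */
final class HeapInfo {
    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently in use (allocated minus free), in MiB. */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB):  " + maxHeapMiB());
        System.out.println("used heap (MiB): " + usedHeapMiB());
    }
}
```

Logging usedHeapMiB() before and after each document is one way to see whether a single large file or a slow leak across many files is responsible.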

Java heap space ...after index process

2005-10-26 Thread Patricio Galeas
Hello All, I am trying to index some PDF documents using PDFBox. It apparently works normally, but when the index process ends, I get the following message: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space Do you have any idea? Thanks Patricio

RE: Help with Search Java Code set up

2005-10-26 Thread Otis Gospodnetic
Answer to 4: yes, you can. QueryParser will give you a Query instance. You can use that to make your own BooleanQuery. Otis --- "Kevin L. Cobb" <[EMAIL PROTECTED]> wrote: > It well could be that I'm lacking in setting up my queries. Here's > the > gist of what I'm trying to do, in a little pse

Re: Help with Search Java Code set up

2005-10-26 Thread Otis Gospodnetic
Are you simply looking to use multiple terms in your search? In that case, simply use BooleanQuery instead of TermQuery. QueryParser will recognize strings like "foo AND bar" or "+foo +bar" and turn that into a BooleanQuery for you. Otis --- "Kevin L. Cobb" <[EMAIL PROTECTED]> wrote: >

[Fwd: Re: SQLDirectory]

2005-10-26 Thread Rick Hillegas
I can't find SQLDirectory and friends checked into Lucene proper or the sandbox. I'm not a lawyer but I feel the licensing issues here are gray. Fortunately, DbDirectory lives in the Lucene sandbox and so seems legally safer. Anyone else have an opinion on this? -Rick --- Begin Message --- ---

Re: FilteredQuery usage

2005-10-26 Thread Chris Hostetter
: filter, I need to query all the index checking if the value of field : name "path" is a prefix or not. : : There's a way to do that query without having to retrieve all the : Document instances from the index? Yep, you're definitely on the right track. You don't need to retrieve any documents a

RE: Help with Search Java Code set up

2005-10-26 Thread Kevin L. Cobb
It well could be that I'm lacking in setting up my queries. Here's the gist of what I'm trying to do, in a little pseudocode. 1. inputs: 1) termToSearch 2) keywordField 2. Use MultiFieldQueryParser to build the query for the termToSearch in the searchable fields 3. Use QueryParser to build the qu

Re: Help with Search Java Code set up

2005-10-26 Thread Jeff Rodenburg
Kevin - Maybe I'm misunderstanding, but how is this not a BooleanQuery with two clauses? - j On 10/26/05, Kevin L. Cobb <[EMAIL PROTECTED]> wrote: > > I've been using Lucene happily for a couple of years now. But, this new > search functionality I'm trying to add is somewhat different than what

Re: Help with Search Java Code set up

2005-10-26 Thread Yonik Seeley
Hi Kevin, Your description is rather generic... could you be more specific why you couldn't just create a BooleanQuery that searches across the searchable and keyword fields at the same time? -Yonik On 10/26/05, Kevin L. Cobb <[EMAIL PROTECTED]> w

RE: score formula in Similarity javadoc

2005-10-26 Thread Koji Sekiguchi
Hi Yonik, I had checked TermQuery, TermScorer and TermWeight before sending the previous mail. But after getting your reply, I double-checked and I understand that you are correct. So, should the formula in LIA be corrected again? :) Scoring formula figure omission http://www.lucenebook.com/blog/errata/

Re: Bad explanations

2005-10-26 Thread Olivier Jaquemet
LOL, have you ever seen a man humiliating himself in public in a mailing list? cause I just did ;) thank you yonik ... :) Yonik Seeley wrote: To be more literal, I actually meant "explain(query,hits.id(i))" On 10/26/05, Yonik Seeley <[EMAIL PROTECTED]> wrote: Typo... try explain(query,doc

Re: Bad explanations

2005-10-26 Thread Yonik Seeley
To be more literal, I actually meant "explain(query,hits.id(i))" On 10/26/05, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > Typo... try explain(query,doc) instead of (query,i) > :-) >

Re: Bad explanations

2005-10-26 Thread Yonik Seeley
Typo... try explain(query,doc) instead of (query,i) :-) -Yonik On 10/26/05, Olivier Jaquemet <[EMAIL PROTECTED]> wrote: > > Hi everyone, > > I am encountering a really weird problem, I'm doing a query which gives > me perfectly good results, with

Help with Search Java Code set up

2005-10-26 Thread Kevin L. Cobb
I've been using Lucene happily for a couple of years now. But this new search functionality I'm trying to add is somewhat different than what I'm used to doing. It would help if the smart folks on this list would point me in the right direction. I have several "searchable" fields and one keyword fi

Bad explanations

2005-10-26 Thread Olivier Jaquemet
Hi everyone, I am encountering a really weird problem. I'm doing a query which gives me perfectly good results, with scores which look about right too. I wanted to display an explanation of some of my results just to check for something, and ALL hits output this explanation 0.0 = produ

Re: score formula in Similarity javadoc

2005-10-26 Thread Yonik Seeley
With respect to different terms in a boolean query, they will contribute to the total score proportional to idf^2, so I think the javadoc as it exists now is probably more correct. A single TermQuery will have a final score with a single idf factor in it, but that's because of the queryweight fact

Re: MaxFieldLength or MaxFields?

2005-10-26 Thread Jeff Rodenburg
thanks Erik On 10/26/05, Erik Hatcher <[EMAIL PROTECTED]> wrote: > > > On 26 Oct 2005, at 02:50, Jeff Rodenburg wrote: > > I'm considering building out an index that will flatten a data > > structure, > > such that some Document "A" will have Fields 1,2 and 3. > > Fields 1 and 2 are indexed/tokeni

Re: Iterate through all entries

2005-10-26 Thread Yonik Seeley
To loop through all documents, simply go from 0 to maxDoc() and check isDeleted() on each. -Yonik On 10/26/05, Dirk Hennig < [EMAIL PROTECTED]> wrote: > > Hallo, > > I want to iterate through all documents in my index (to add some new > fields). >
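The loop described in this reply can be sketched as follows. To keep the example runnable without Lucene on the classpath, it is written against a small interface, DocNumbers, which is a hypothetical stand-in for the two IndexReader methods named in the reply (maxDoc() and isDeleted(int)). The helper class name is also invented. The loop stops before maxDoc() because document numbers start at 0 and maxDoc() is one greater than the largest possible document number.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the two IndexReader methods named in the reply. */
interface DocNumbers {
    int maxDoc();                      // one greater than the largest document number
    boolean isDeleted(int docNumber);  // true if that slot holds a deleted document
}

/** Illustration only: collects the numbers of all documents that are not deleted. */
final class LiveDocs {
    static List<Integer> liveDocNumbers(DocNumbers reader) {
        List<Integer> live = new ArrayList<>();
        for (int i = 0; i < reader.maxDoc(); i++) {
            if (!reader.isDeleted(i)) {
                live.add(i);  // with a real IndexReader, reader.document(i) would be fetched here
            }
        }
        return live;
    }

    public static void main(String[] args) {
        // Fake index with five slots, two of which are marked deleted.
        DocNumbers fake = new DocNumbers() {
            public int maxDoc() { return 5; }
            public boolean isDeleted(int n) { return n == 1 || n == 3; }
        };
        System.out.println(liveDocNumbers(fake));
    }
}
```

Skipping deleted slots matters because asking a reader for a deleted document is an error, so the isDeleted() check has to come before any document access.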

RE: Database File Store (SQLDirectory?)

2005-10-26 Thread Steven Pannell
Hi, Seems the JDBCDirectory did exist, but sadly the link is no longer working: http://ppinew.mnis.com/jdbcdirectory/ newsgroup link - http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg10689.html But with the SQLDirectory and the DBDirectory I am sure I can knock something up now

Re[3]: Cross-field multi-word and query

2005-10-26 Thread Maxim Patramanskij
Hello Chris, Thanks a lot for the helping hand. I plugged in MaxDisjunctQuery and it is working so far, but I need to check its accuracy. The next problem I met is the highlighter, which must be adapted to understand MaxDisjunctQuery (because it now stops highlighting anything due to the unknown new Query t

Re: Database File Store (SQLDirectory?)

2005-10-26 Thread Volodymyr Bychkoviak
Hi. Try to search for JDBCDirectory... Steven Pannell wrote: Hi, I'm looking to try and store my index into an oracle database. Has anyone done this before? I did find something in the archive referring to the SQLDirectory.java file by Marc Kramis. But there does not seem to be any download o

Re: Database File Store (SQLDirectory?)

2005-10-26 Thread Rick Hillegas
Thanks, Tobias. It turns out that email address is dead. I've googled up another address for him and have posted him a question about the licensing of this code. When I get an answer, I'll let the list know. Cheers, -Rick Tobias Lütticke wrote: Hi, as follow-up: google found the post for m

Re: Database File Store (SQLDirectory?)

2005-10-26 Thread Tobias Lütticke
Hi, as follow-up: google found the post for me: http://mail-archives.apache.org/mod_mbox/lucene-java-user/200112.mbox/[EMAIL PROTECTED] The sources are attached. As for one question I received: I have no idea what licence the code is intended to have. I guess it is kind of free since it was

RE: Database File Store (SQLDirectory?)

2005-10-26 Thread Simon Middleton
This works a bit, but I had to tweak it a bit to get it just right. Regards, Simon http://issues.apache.org/jira/browse/LUCENE-150 Hi, I'm looking to try and store my index into an oracle database. Has anyone done this before? I did find something in the archive referring to the SQLDirectory

Re: Database File Store (SQLDirectory?)

2005-10-26 Thread Rick Hillegas
Hi Tobias, I would be interested in the SQLDirectory class and DDL scripts if you could send them. Thanks, -Rick Tobias Lütticke wrote: Hi, > SQLDirectory.java file by Marc Kramis. But there does not seem to be any > download or further references to this implementation. Does anyone hav

FilteredQuery usage

2005-10-26 Thread Ricardo Borillo Domenech
Hi all!! I'm using PrefixQuery in my search application and I get TooManyClauses. I have found a lot of information about this problem and the solution seems to be to use a FilteredQuery. Now, I'm trying to write my Filter ... Well, the problem is that when I try to write the "bits" function of

Re: MaxFieldLength or MaxFields?

2005-10-26 Thread Erik Hatcher
On 26 Oct 2005, at 02:50, Jeff Rodenburg wrote: I'm considering building out an index that will flatten a data structure, such that some Document "A" will have Fields 1,2 and 3. Fields 1 and 2 are indexed/tokenized field. Field 3 is indexed, and will contain many discrete values (up to poss

Re: Non-scoring fields

2005-10-26 Thread Maik Schreiber
> There is nothing intrinsic in the way Filters work that make them slower > than Queries -- in the case of RangeQuery vs RangeFilter, a RangeFilter is > just about always faster than a RangeQuery. (or more specifically: I've > never seen a case in which a RangeQuery is faster) > > [...] Nice, t

Iterate through all entries

2005-10-26 Thread Dirk Hennig
Hallo, I want to iterate through all documents in my index (to add some new fields). What is the best / fastest way to do this? Greetings, Dirk Hennig

Re: Database File Store (SQLDirectory?)

2005-10-26 Thread Tobias Lütticke
Hi, > SQLDirectory.java file by Marc Kramis. But there does not seem to be any > download or further references to this implementation. Does anyone have it? > or are there perhaps any other implementations to get me started? It took me a while to find it, but I finally discovered it in an old m

Re: Another index corruption problem

2005-10-26 Thread Andrzej Bialecki
Bill Tschumy wrote: I hate to plead, but I really need to do my best to recover my customer's data. Does anyone have any pointers for how to manually (or programmatically) repair this corrupted index? On Oct 24, 2005, at 11:23 PM, Bill Tschumy wrote: Many months ago I wrote this list abou

Re: Another index corruption problem

2005-10-26 Thread Andrzej Bialecki
Daniel Naber wrote: On Wednesday 26 October 2005 04:09, Bill Tschumy wrote: I hate to plead, but I really need to do my best to recover my customer's data. Does anyone have any pointers for how to manually (or programmatically) repair this corrupted index? You could try to fix the segments

Re: Another index corruption problem

2005-10-26 Thread Daniel Naber
On Wednesday 26 October 2005 04:09, Bill Tschumy wrote: > I hate to plead, but I really need to do my best to recover my > customer's data. Does anyone have any pointers for how to manually > (or programmatically) repair this corrupted index? You could try to fix the segments file (remove the fil

Database File Store (SQLDirectory?)

2005-10-26 Thread Steven Pannell
Hi, I'm looking to try and store my index into an oracle database. Has anyone done this before? I did find something in the archive referring to the SQLDirectory.java file by Marc Kramis. But there does not seem to be any download or further references to this implementation. Does anyone have i