Incorrect Token Offset when using multiple Fieldable instances
Hi,

I currently use multiple Fieldable instances to index the sentences of a document. When there is only a single Fieldable instance, the token offsets generated in DocumentWriter are correct. The problem appears when there are two or more Fieldable instances.

In the DocumentWriter$FieldData#invertField method, if the field is tokenized, the offset attribute is not advanced by stringValue.length() (as it is when the field is not tokenized, line 1458). Instead, it is set to the end offset of the last token (line 1503: offset = offsetEnd+1;). As a consequence, if the last tokens of a value have been filtered out (for example a stopword, a dot, a space, etc.), the offset attribute is updated with the end offset of the last token that was not filtered. The stored offset is then too small (it is shifted back), and the offsets of all subsequent Fieldable instances are shifted back as well.

Is this a bug? Or is it the desired behavior (and in that case, why)?

Regards.
--
Renaud Delbru, E.C.S., Ph.D. Student, Semantic Information Systems and Language Engineering Group (SmILE), Digital Enterprise Research Institute, National University of Ireland, Galway. http://smile.deri.ie/

To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
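To make the reported drift concrete, here is a minimal sketch in plain Java. It is not Lucene's code: the class name `OffsetDriftSketch`, the stop set, the space-based tokenization, and the one-character separator added between values are all assumptions made for illustration. It compares the two ways of advancing the base offset that the message describes: by the end of the last kept token, or by the full length of the string value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of offset bookkeeping across several values of one field.
// Hypothetical illustration only; this is not the Lucene implementation.
class OffsetDriftSketch {
    // Assumed stop set: tokens in it are dropped, as a stop filter would do.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("it", "the", "."));

    // Returns the global start offset of every kept token.
    // advanceByLastToken = true  : next base = base + end of last kept token + 1
    // advanceByLastToken = false : next base = base + full value length + 1
    static List<Integer> startOffsets(String[] values, boolean advanceByLastToken) {
        List<Integer> starts = new ArrayList<>();
        int base = 0;
        for (String value : values) {
            int lastKeptEnd = 0; // end offset (within this value) of the last kept token
            int cursor = 0;      // where to resume searching inside the value
            for (String token : value.split(" ")) {
                if (token.isEmpty()) {
                    continue; // skip artifacts of repeated spaces
                }
                int start = value.indexOf(token, cursor);
                int end = start + token.length();
                cursor = end;
                if (!STOP_WORDS.contains(token)) {
                    starts.add(base + start);
                    lastKeptEnd = end;
                }
            }
            base += advanceByLastToken ? lastKeptEnd + 1 : value.length() + 1;
        }
        return starts;
    }
}
```

With the values "search it" and "offsets", and "it" treated as a stop word, the first strategy places "offsets" at offset 7, while the length-based strategy places it at offset 10. The difference of 3 is exactly the filtered " it" at the end of the first value.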
Why indexing database is necessary? (RE: indexing database)
Could anyone provide any insight into why someone would use Nutch/Lucene, or any other search engine, to index relational databases? With use cases, if possible? Shouldn't the database's own indexing mechanism be used, since it is more efficient?

If there is a need to index database content with a search engine, what would be the best approach other than de-normalizing the database?

Thanks a lot in advance!

ND

-----Original Message-----
From: payo [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 04, 2008 12:36 PM
To: [EMAIL PROTECTED]
Subject: indexing database

Hi all,

Can I index a database with Nutch? I am using Nutch 0.8.1.

Thanks

--
View this message in context: http://www.nabble.com/indexing-database-tp15832696p15832696.html
Sent from the Nutch - User mailing list archive at Nabble.com.
RE: Why indexing database is necessary? (RE: indexing database)
Reasons to index with Lucene/Nutch on top of, or instead of, DB indexing:

1) relevance scoring
2) alias searching (e.g. a large number of aliases, such as first names)
3) highlighting
4) cross-datasource searching (multiple DBs, DB + XML files, etc.)

As for the best approach to indexing externally, I do not have any direct pointers. I would recommend looking at an ETL tool that can be extended for this purpose (I started writing a plugin for Pentaho, but got pulled off and haven't finished it -- and that was for Solr, not Lucene/Nutch).

-D
RE: Why indexing database is necessary? (RE: indexing database)
Don't forget the number one reason: speed. For certain types of queries, a search engine can return results orders of magnitude faster than a database. I've seen search engines return hits in hundreds of milliseconds when the equivalent database query took hours or even days. That's not to say that a search engine is always better, just that it often is when the inputs and outputs are carefully defined.

- will
C++ as token in StandardAnalyzer?
I saw some discussion in the archives some time ago about the fact that C++ is tokenized as C by the StandardAnalyzer, and this still seems to be the case. Is there a simple way for me to get the behavior I want for C++ in particular (that it is tokenized as C++), and perhaps for other, more idiosyncratic terms I may have in my own application?

Thanks,
Donna
RE: Why indexing database is necessary? (RE: indexing database)
Hmm, I guess that's because a database query returns a list of records, whereas a search engine returns only the links, not the actual content. So a search engine works only in the index space, whereas a database query engine has to work in both the index and the content space...

ND
RE: Why indexing database is necessary? (RE: indexing database)
Not necessarily. Many of the high-traffic search sites on the market today, for everything from yellow pages to job boards to e-commerce sites, use search engines exclusively to search *and* retrieve/serve content. The key is that they don't have to return all matching rows, only the 'best' ones, which are probably the ones you would want anyway.

- will
Re: More IndexDeletionPolicy questions
The bigger picture here is NFS-safety. When I run a search, I hand off the search results to another thread so that they can be processed as necessary -- in particular so that they can be JOINed with a SQL DB -- but I don't want to completely lock the index from writes while doing a bunch of SQL calls. Using the commit point tracking, I can make sure the appropriate snapshot stays around until I'm completely done with it, even if I'm using an NFS-mounted filesystem that doesn't have delete-on-last-close semantics.

--tim

> > It seems like it should be pretty simple -- keep a list of open
> > IndexReaders, track what Segment files they're pointing to, and in
> > onCommit don't delete those segments.
>
> This implies you have multiple readers in a single JVM? If so, you
> should not need to make a custom deletion policy to handle this case
> -- the OS should be properly protecting open files from deletion.
> Can you shed more light on the bigger picture here?
>
> > Unfortunately it ends up being very difficult to directly determine
> > what Segment an IndexReader is pointing to. Is there some
> > straightforward way that I'm missing? All I've managed to do so
> > far is to remember the most recent one from onCommit/onInit and use
> > that one. That works OK, but makes bootstrapping a pain if you
> > try to open a Reader before you've opened the writer once.
> >
> > Also, when I use IndexReader.reopen(), can I assume that the newly
> > returned reader is pointing at the "most recent" segment? I think
> > so...
>
> Yes, except you have a synchronization challenge: if the writer is in
> the process of committing just as your reader opens, you can't be
> certain whether the reader got the new commit or the previous one.
> If you have external synchronization to ensure the reader only re-opens
> after the writer has fully committed, then this isn't an issue.
>
> > Here's a sequence of steps that happen in my app:
> >
> > 0) Open writer; onInit tells me that seg_1 is the most recent segment
> >
> > 1) Open reader; assume it is pointing to seg_1 from (0)
> >
> > 2) New write commits into seg_2; don't delete seg_1 b/c of (1)
> >
> > 3) Call reader.reopen() on the reader from (1); the new reader is
> > pointing to seg_2 now?
> >
> > 4) seg_1 stays around until the next time I open or commit a
> > writer, then it is removed.
> >
> > Does that seem reasonable?
> >
> > --tim
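The bookkeeping discussed in this thread (keep every commit point that an open reader still uses, delete the rest at the next commit) can be sketched in plain Java. This is only a model under stated assumptions: `CommitTrackerSketch` and its methods are hypothetical names, commit points are represented by plain strings such as "seg_1", and nothing here calls the real IndexDeletionPolicy API. In a real policy the same decision would be made inside onInit/onCommit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of reference-counted commit points; not Lucene code.
class CommitTrackerSketch {
    private final List<String> liveCommits = new ArrayList<>();      // oldest first
    private final Map<String, Integer> readerCounts = new HashMap<>(); // commit -> open readers

    // Called when the writer commits. Returns the commits that may now be deleted.
    synchronized List<String> onCommit(String newCommit) {
        liveCommits.add(newCommit);
        List<String> deleted = new ArrayList<>();
        liveCommits.removeIf(commit -> {
            // Keep the newest commit and every commit a reader still holds.
            boolean deletable = !commit.equals(newCommit) && !readerCounts.containsKey(commit);
            if (deletable) {
                deleted.add(commit);
            }
            return deletable;
        });
        return deleted;
    }

    // Called when a reader is opened or reopened: pins the newest commit.
    // Assumes at least one commit exists; otherwise this throws.
    synchronized String acquireLatest() {
        String latest = liveCommits.get(liveCommits.size() - 1);
        readerCounts.merge(latest, 1, Integer::sum);
        return latest;
    }

    // Called when a reader is completely done with its snapshot.
    synchronized void release(String commit) {
        readerCounts.computeIfPresent(commit, (key, count) -> count > 1 ? count - 1 : null);
    }
}
```

Because acquiring and committing are both synchronized on the tracker, a reader cannot pin a commit halfway through a prune, which is one way to get the external synchronization mentioned in the quoted reply. Walking through steps 0 to 4 above: seg_1 survives the commit of seg_2 while a reader holds it, and it is only reported as deletable at the first commit after that reader has released it.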
Looking for an example of Using Position Increment Gap
Fellows,

I'm working on a project where we are trying to use our Lucene indexes to return concrete objects. One of the things we want to be able to match on is the vocabulary terms annotated to an object, as well as all of the child vocabulary terms of each annotated term.

So, what I was thinking of doing is extending the index that returns objects of that type to include a new field, say "sub_term". In this field I would put the text of all of these vocabulary sub-terms together, and introduce phrase boundaries using some of the techniques described in the Javadoc of the analysis package (basically, writing a custom analyzer that introduces a position increment gap between phrases).

I am, however, curious whether an example of such a usage exists somewhere that I could use as a basis for the analyzer I'm going to have to write. Does anyone know of a good example?

Matt
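As a rough illustration of what a position increment gap achieves, the following plain-Java sketch assigns token positions across several values of one field and then checks exact phrase matches. It does not use Lucene: the class `PositionGapSketch`, the whitespace tokenization, and the sample terms are assumptions for illustration. In Lucene itself, to my knowledge, the corresponding hook is overriding `Analyzer.getPositionIncrementGap(String fieldName)` and adding each sub-term as a separate value of the same field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of token positions with a gap between field values.
class PositionGapSketch {
    private final List<String> terms = new ArrayList<>();
    private final List<Integer> positions = new ArrayList<>();

    // Each value is tokenized on whitespace; before every value except the
    // first, the position counter is advanced by `gap` extra positions.
    PositionGapSketch(String[] values, int gap) {
        int position = -1;
        for (int v = 0; v < values.length; v++) {
            if (v > 0) {
                position += gap;
            }
            for (String term : values[v].trim().toLowerCase().split("\\s+")) {
                position++;
                terms.add(term);
                positions.add(position);
            }
        }
    }

    // Exact phrase match: the words must occupy consecutive positions.
    boolean matchesPhrase(String phrase) {
        String[] words = phrase.trim().toLowerCase().split("\\s+");
        for (int i = 0; i + words.length <= terms.size(); i++) {
            boolean matches = true;
            for (int j = 0; j < words.length && matches; j++) {
                matches = terms.get(i + j).equals(words[j])
                        && positions.get(i + j) == positions.get(i) + j;
            }
            if (matches) {
                return true;
            }
        }
        return false;
    }
}
```

With the two values "heart disease" and "attack rate" and a gap of 100, the phrase "heart disease" matches but "disease attack" does not, because the two words end up far apart. With a gap of 0 the values run together and "disease attack" matches across the boundary, which is the effect the gap is meant to prevent.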
Re: C++ as token in StandardAnalyzer?
Almost by definition, you have to write your own analyzer. This may be as simple as chaining another filter into one of the regular analyzers, or as complex as defining your own grammar.

As far as I know, there's no "keep word" list. But that would be an interesting addition: a variety of analyzer to which you pass not only a list of stop words, but also a list of "keep words", i.e. words that should NOT be massaged at all. I can imagine that this would get pretty tricky for, say, StandardAnalyzer, but something like this in the chain WhitespaceTokenizer >> LowerCaseFilter >> KeepwordFilter might be useful...

All of this is right off the top of my head, without much thought.

Best
Erick
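The "keep word" idea can be sketched without any Lucene classes. The code below is a hypothetical stand-in for the proposed chain, not an existing filter: `KeepWordSketch` splits on whitespace, lowercases each token, and then strips punctuation from every token that is not in the keep-word set. The punctuation-stripping step is an assumption standing in for whatever "massaging" a real analyzer would do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of whitespace tokenizing + lowercasing + keep words.
class KeepWordSketch {
    static List<String> analyze(String text, Set<String> keepWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!keepWords.contains(token)) {
                // Not protected: drop everything except letters and digits.
                token = token.replaceAll("[^\\p{L}\\p{Nd}]", "");
            }
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

With "c++" in the keep-word set, the input "I know C++ and Java." yields the tokens i, know, c++, and, java. With an empty set, "C++" is reduced to "c", which mirrors the behavior described in the question. One limitation of this sketch: a keep word with punctuation attached (for example "C++,") would not be recognized, since the lookup happens on the whole whitespace-delimited token.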
Re: Why indexing database is necessary? (RE: indexing database)
Hi, Nick,

A Lucene index is, in a sense, just another kind of database index; the difference is that it is inverted (it maps each term to the documents that contain it).

If we ask why we need many database indexes, the answer is: different query execution paths. The same holds for a Lucene index, which is faster for term matching.

A Lucene index can actually do more. One example is faceted search, which tells you how many matches fall into each category (facet), in addition to returning the matched results. This makes it more convenient for websites to display results and to provide additional links that let users narrow down the results.

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding!
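For readers unfamiliar with the terms used in this message, here is a toy inverted index with facet counts in plain Java. It only illustrates the data structure; the class `InvertedIndexSketch`, its methods, and the sample categories are assumptions, and none of this reflects how Lucene stores or scores anything.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: term -> ids of the documents containing it.
class InvertedIndexSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Map<Integer, String> categoryOf = new HashMap<>();

    // Index one document: record its category and add it to each term's posting set.
    void add(int docId, String category, String text) {
        categoryOf.put(docId, category);
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, key -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Term matching is a single map lookup, not a scan over all documents.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    // Facet counts: number of matching documents per category.
    Map<String, Integer> facetCounts(String term) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int docId : search(term)) {
            counts.merge(categoryOf.get(docId), 1, Integer::sum);
        }
        return counts;
    }
}
```

After adding "Lucene in Action" (category books, id 1), "Tuning Lucene" (blogs, id 2), and "Java basics" (books, id 3), a search for "lucene" returns documents 1 and 2, and the facet counts are one match in blogs and one in books.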
Re: Why indexing database is necessary? (RE: indexing database)
And one other point. You probably *don't* need a search engine for your database *if* you don't have much textual data, that is, if your database consists of "classical" tables with columns like "firstname", "lastname", etc.

But if your database has columns containing, say, a page of text, then searching that text is a real pain. *That's* where a search engine shines. Searching a large DB text field for a single word becomes...er...awkward.

That said, there's a long thread on the Lucene list, which I didn't understand at all, about embedding Lucene in Oracle. You might try searching the Lucene list archives for it...

Best
Erick
NO_NORMS and TOKENIZED
Hi,

I am quite new to the Lucene API, and I find the Field constructor unintuitive. Maybe I have misunderstood it. Let's find out...

It can be used either as:
new Field("field", "data", Store.NO, TOKENIZED)
or as:
new Field("field", "data", Store.NO, NO_NORMS)

As I understand it, NO_NORMS and TOKENIZED are not settings along a single dimension of behaviour; on the contrary, they are rather orthogonal. I.e., it is quite likely that I would want _both_ TOKENIZED and NO_NORMS. This is especially true for fields that are short and of approximately equal length across all documents.

- Am I right in my reasoning (which would mean that the API is a bit unclear)? Or
- Have I misunderstood something fundamental about TOKENIZED and NO_NORMS?

Thankful for any feedback on this,
Tobias