Thanks Erick,

I'll give a representation (in XML) of the data structure that I am trying to index. It represents a relational data structure, in which all the Persons for a given Place (e.g. Kazakhstan) are grouped together, and so on.

   <Example>
       <Place name="United States of America">
          <PlaceAlias>USA</PlaceAlias>
          <PlaceAlias>U.S.A</PlaceAlias>
          <PlaceAlias>US</PlaceAlias>
          <Person>
             <Name>George W Bush</Name>
             <Occupation>Demolition</Occupation>
             <Hobby alias="Funny">Comedy</Hobby>
             <Hobby alias="Pretend">Mime</Hobby>
             <Hobby>Ant Farms</Hobby>
          </Person>
          <Person>
             <Name>Bill Clinton</Name>
             <Occupation>Retired</Occupation>
             <Hobby>Smoking Cigars</Hobby>
          </Person>
          <!-- many more persons here... -->
       </Place>
       <Place name="kazakhstan">
          <PlaceAlias>kazak</PlaceAlias>
          <PlaceAlias>kazzi</PlaceAlias>
          <PlaceAlias>kzh</PlaceAlias>
          <Person>
             <Name>Borat</Name>
             <Occupation>TV Reporter</Occupation>
             <Hobby alias="Boogie">Dancing</Hobby>
             <Hobby alias="Soccer">Football</Hobby>
             <Hobby>Swimming</Hobby>
             <!-- many more hobbies in here (or even none), with or without aliases -->
          </Person>
          <!-- many more persons here... -->
       </Place>
       <!-- many more places, persons and hobbies here... -->
   </Example>


I am expecting someone to say that this relational/3NF structure should simply be placed into a flat index: the index replaces the one-to-many relational approach by indexing all "documents" with the same "Place" together, or at least makes searching fast enough to be a usable solution.

   Place        Person_Name    Person_Occupation  Hobby
   ===========================================================================
   USA          George W Bush  Demolition         Comedy
   USA          Bill Clinton   Retired            Smoking Cigars
   Kazakhstan   Borat          TV Reporter        Dancing
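
For what it's worth, the flattening step itself could be sketched roughly like this in Python. It is only an illustration, not Lucene code: the `flatten` function, the lower-case field names and the trimmed-down `SAMPLE` string are all made up for the example. It walks the XML, copies the parent Place name onto every Person, and keeps just the first Hobby, as in the table above.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the example XML (illustrative only).
SAMPLE = """
<Example>
  <Place name="United States of America">
    <PlaceAlias>USA</PlaceAlias>
    <Person>
      <Name>Bill Clinton</Name>
      <Occupation>Retired</Occupation>
      <Hobby>Smoking Cigars</Hobby>
    </Person>
  </Place>
  <Place name="kazakhstan">
    <PlaceAlias>kazak</PlaceAlias>
    <Person>
      <Name>Borat</Name>
      <Occupation>TV Reporter</Occupation>
      <Hobby alias="Boogie">Dancing</Hobby>
      <Hobby alias="Soccer">Football</Hobby>
    </Person>
  </Place>
</Example>
"""


def flatten(xml_text):
    """Return one flat dict per <Person>, copying the parent Place onto it."""
    rows = []
    root = ET.fromstring(xml_text.strip())
    for place in root.findall("Place"):
        for person in place.findall("Person"):
            rows.append({
                "place": place.get("name"),
                "person_name": person.findtext("Name"),
                "person_occupation": person.findtext("Occupation"),
                # First hobby only; None when the person has no hobbies.
                "hobby": person.findtext("Hobby"),
            })
    return rows


rows = flatten(SAMPLE)
for row in rows:
    print(row)
```

Each dict would then correspond to one "document" in the flat index.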



I do, however, ask how one would handle repeated fields such as the hobbies below. Should these simply be a single tokenized field in the Lucene index? Or should everything be *duplicated*, like this? (I have ignored aliases for simplicity.)


   Place        Person_Name    Person_Occupation  Hobby
   ===========================================================================
   USA          George W Bush  Demolition         Comedy
   USA          George W Bush  Demolition         Mime
   USA          George W Bush  Demolition         Ant Farms
   USA          Bill Clinton   Retired            Smoking Cigars
   Kazakhstan   Borat          TV Reporter        Dancing
   Kazakhstan   Borat          TV Reporter        Football
   Kazakhstan   Borat          TV Reporter        Swimming

   OR

   Place        Person_Name    Person_Occupation  Hobby
   ===========================================================================
   USA          George W Bush  Demolition         Comedy + Mime + Ant Farms
   USA          Bill Clinton   Retired            Smoking Cigars
   Kazakhstan   Borat          TV Reporter        Dancing + Football + Swimming
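
To make the two layouts concrete, here is a small Python sketch of both. It is plain Python rather than Lucene, and `PEOPLE`, `duplicated_rows` and `multi_valued_docs` are names invented for the illustration. As far as I understand it, a Lucene Document can carry several fields with the same name, which is roughly what the second layout models with a list.

```python
# Hypothetical in-memory copy of the example data.
PEOPLE = [
    {"place": "USA", "name": "George W Bush", "occupation": "Demolition",
     "hobbies": ["Comedy", "Mime", "Ant Farms"]},
    {"place": "USA", "name": "Bill Clinton", "occupation": "Retired",
     "hobbies": ["Smoking Cigars"]},
    {"place": "Kazakhstan", "name": "Borat", "occupation": "TV Reporter",
     "hobbies": ["Dancing", "Football", "Swimming"]},
]


def duplicated_rows(people):
    """Layout 1: repeat the whole person once per hobby."""
    return [
        {"place": p["place"], "name": p["name"],
         "occupation": p["occupation"], "hobby": hobby}
        for p in people
        for hobby in p["hobbies"]
    ]


def multi_valued_docs(people):
    """Layout 2: one record per person, with all hobbies under one field."""
    return [
        {"place": p["place"], "name": p["name"],
         "occupation": p["occupation"], "hobby": list(p["hobbies"])}
        for p in people
    ]


dup = duplicated_rows(PEOPLE)
multi = multi_valued_docs(PEOPLE)

# The same "everyone in the USA" lookup against both layouts.
usa_dup = [r["name"] for r in dup if r["place"] == "USA"]
usa_multi = [d["name"] for d in multi if d["place"] == "USA"]
print(usa_dup)
print(usa_multi)
```

With the duplicated layout, George W Bush comes back once per hobby and the caller has to de-duplicate; a person with no hobbies would produce no rows at all. The one-record-per-person layout avoids both problems.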


I guess my final question, which is really what I am trying to achieve, is this: I want to search for all Persons in the "~United States of America" whose name is like "~Klinton" and who enjoy "~Smoking" as a hobby. An important part of this is that I won't know which token is to be matched to which field, just as when you use an internet search engine. So do I tokenize all fields from the XML, put them into a single field in the index, and query that with tokens?
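
A toy version of that "everything in one field" idea, in plain Python rather than Lucene, might look like the following. The `DOCS` list, `tokenize`, `fuzzy_hit` and `search` are all hypothetical names, and `difflib.SequenceMatcher` with a 0.8 threshold is only a stand-in for the "~" fuzzy match; Lucene's own fuzzy queries are based on edit distance, so real scores would differ.

```python
import re
from difflib import SequenceMatcher

# One catch-all text per person: place, aliases, name, occupation, hobbies.
DOCS = [
    {"name": "George W Bush",
     "text": "United States of America USA U.S.A US George W Bush "
             "Demolition Comedy Funny Mime Pretend Ant Farms"},
    {"name": "Bill Clinton",
     "text": "United States of America USA U.S.A US Bill Clinton "
             "Retired Smoking Cigars"},
    {"name": "Borat",
     "text": "kazakhstan kazak kazzi kzh Borat TV Reporter "
             "Dancing Boogie Football Soccer Swimming"},
]


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fuzzy_hit(query_token, doc_tokens, threshold=0.8):
    """True if any document token is similar enough to the query token."""
    return any(
        SequenceMatcher(None, query_token, token).ratio() >= threshold
        for token in doc_tokens
    )


def search(query, docs):
    """Rank documents by how many query tokens fuzzily match the catch-all text."""
    query_tokens = tokenize(query)
    results = []
    for doc in docs:
        doc_tokens = set(tokenize(doc["text"]))
        score = sum(fuzzy_hit(q, doc_tokens) for q in query_tokens)
        if score:
            results.append((doc["name"], score))
    return sorted(results, key=lambda item: item[1], reverse=True)


results = search("America Klinton Smoking", DOCS)
print(results)
```

The query never says which token belongs to which field; a document simply ranks higher the more query tokens it matches somewhere in its combined text.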


I realize that I'm posting lots of complicated questions, and I am probably just looking at the equivalent of an HTML indexing/search implementation.



Many Thanks....

--AH




Erick Erickson wrote:
Tell us more about the problem you are trying to solve. Lucene is designed for large text searching, not relations. Trying to "index a data structure" seems like mis-application of Lucene. Without some idea of what you are trying to accomplish, any advice you get is irrelevant at best...


Best
Erick

On 12/13/06, Andrew Hughes <[EMAIL PROTECTED]> wrote:

Hey All,

I am very interested in indexing a 3NF Data Structure. Is there any
advice that someone can provide with this? From what I have seen Lucene
is typically a flat "First Normal Form" (Flat) data structure.... The
only way I can see to combine the relational links between multiple
indexes is to compare documents.


Any Help is Appreciated.





---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





