Thanks Erick,
Here is a representation (in XML) of the data structure that I am trying to
index. It is relational: every Person belongs to exactly one Place, so all
the Persons of a Place (e.g. Kazakhstan) are grouped together under it, and
each Person can have any number of Hobbies.
<Example>
  <Place name="United States of America">
    <PlaceAlias>USA</PlaceAlias>
    <PlaceAlias>U.S.A</PlaceAlias>
    <PlaceAlias>US</PlaceAlias>
    <Person>
      <Name>George W Bush</Name>
      <Occupation>Demolition</Occupation>
      <Hobby alias="Funny">Comedy</Hobby>
      <Hobby alias="Pretend">Mime</Hobby>
      <Hobby>Ant Farms</Hobby>
    </Person>
    <Person>
      <Name>Bill Clinton</Name>
      <Occupation>Retired</Occupation>
      <Hobby>Smoking Cigars</Hobby>
    </Person>
    <!-- many more persons here.... -->
  </Place>
  <Place name="kazakhstan">
    <PlaceAlias>kazak</PlaceAlias>
    <PlaceAlias>kazzi</PlaceAlias>
    <PlaceAlias>kzh</PlaceAlias>
    <Person>
      <Name>Borat</Name>
      <Occupation>TV Reporter</Occupation>
      <Hobby alias="Boogie">Dancing</Hobby>
      <Hobby alias="Soccer">Football</Hobby>
      <Hobby>Swimming</Hobby>
      <!-- many more hobbies in here (or even none), with or without aliases -->
    </Person>
    <!-- many more persons here.... -->
  </Place>
  <!-- many more places, persons and hobbies here.... -->
</Example>
I am expecting someone to say that this relational/3NF structure should
simply be flattened into a single index: the index replaces the one-to-many
relational approach by indexing every "document" together with its "Place",
or at least makes searching fast enough to give a usable solution. For
example:
Place       Person_Name    Person_Occupation  Hobby
===========================================================================
USA         George W Bush  Demolition         Comedy
USA         Bill Clinton   Retired            Smoking Cigars
Kazakhstan  Borat          TV Reporter        Dancing
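
To make the flattening idea concrete, here is a rough sketch in Python (plain standard library, not Lucene code) that walks a trimmed copy of the XML and copies the parent Place onto every Person, giving one flat record per person. The function name `flatten` and the record keys are made up for illustration, and hobbies are left out on purpose because how to handle repeated values is a separate question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example XML (hobbies omitted in this sketch).
XML = """
<Example>
  <Place name="United States of America">
    <PlaceAlias>USA</PlaceAlias>
    <PlaceAlias>US</PlaceAlias>
    <Person>
      <Name>George W Bush</Name>
      <Occupation>Demolition</Occupation>
    </Person>
    <Person>
      <Name>Bill Clinton</Name>
      <Occupation>Retired</Occupation>
    </Person>
  </Place>
  <Place name="kazakhstan">
    <PlaceAlias>kazak</PlaceAlias>
    <Person>
      <Name>Borat</Name>
      <Occupation>TV Reporter</Occupation>
    </Person>
  </Place>
</Example>
"""

def flatten(xml_text):
    """Return one flat dict per Person, with the parent Place copied onto it."""
    records = []
    root = ET.fromstring(xml_text)
    for place in root.findall("Place"):
        aliases = [a.text for a in place.findall("PlaceAlias")]
        for person in place.findall("Person"):
            records.append({
                "place": place.get("name"),        # denormalized parent value
                "place_aliases": aliases,          # hypothetical key name
                "name": person.findtext("Name"),
                "occupation": person.findtext("Occupation"),
            })
    return records

records = flatten(XML)
for r in records:
    print(r["place"], "|", r["name"], "|", r["occupation"])
```

Each record would then correspond to one "document" in a flat index, with the Place information repeated on every person that belongs to it.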
I do however ask: how should one handle repeated fields, such as the
hobbies? Should they simply be a single tokenized field in the Lucene
index, or should everything be *duplicated*, one row per hobby? Like this
(I have ignored aliases for simplicity):
Place       Person_Name    Person_Occupation  Hobby
===========================================================================
USA         George W Bush  Demolition         Comedy
USA         George W Bush  Demolition         Mime
USA         George W Bush  Demolition         Ant Farms
USA         Bill Clinton   Retired            Smoking Cigars
Kazakhstan  Borat          TV Reporter        Dancing
Kazakhstan  Borat          TV Reporter        Football
Kazakhstan  Borat          TV Reporter        Swimming
OR
Place       Person_Name    Person_Occupation  Hobby
===========================================================================
USA         George W Bush  Demolition         Comedy + Mime + Ant Farms
USA         Bill Clinton   Retired            Smoking Cigars
Kazakhstan  Borat          TV Reporter        Dancing + Football + Swimming
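
The two layouts can be compared with a small Python sketch (again plain standard library, not Lucene code). The data is taken from the tables, while the function names and dictionary keys are assumptions made for illustration only.

```python
# Illustrative data taken from the tables; key names are assumptions.
people = [
    {"place": "USA", "name": "George W Bush", "occupation": "Demolition",
     "hobbies": ["Comedy", "Mime", "Ant Farms"]},
    {"place": "USA", "name": "Bill Clinton", "occupation": "Retired",
     "hobbies": ["Smoking Cigars"]},
    {"place": "Kazakhstan", "name": "Borat", "occupation": "TV Reporter",
     "hobbies": ["Dancing", "Football", "Swimming"]},
]

def one_doc_per_hobby(people):
    """Layout A: repeat place/name/occupation once for every hobby."""
    docs = []
    for p in people:
        # A person without hobbies still gets one document (hobby=None).
        for hobby in p["hobbies"] or [None]:
            docs.append({"place": p["place"], "name": p["name"],
                         "occupation": p["occupation"], "hobby": hobby})
    return docs

def one_doc_per_person(people):
    """Layout B: one document per person, hobbies kept as a multi-valued field."""
    return [{"place": p["place"], "name": p["name"],
             "occupation": p["occupation"], "hobbies": list(p["hobbies"])}
            for p in people]

per_hobby = one_doc_per_hobby(people)
per_person = one_doc_per_person(people)

# "Who is in the USA?" Layout A returns a person once per hobby,
# so the caller has to de-duplicate; layout B returns each person once.
usa_a = [d["name"] for d in per_hobby if d["place"] == "USA"]
usa_b = [d["name"] for d in per_person if d["place"] == "USA"]
print(len(per_hobby), "documents in layout A,", len(per_person), "in layout B")
print(usa_a)
print(usa_b)
```

Layout B seems closer to how Lucene is normally used: as far as I know, a Lucene Document can hold several fields with the same name, so each hobby could be added as its own "hobby" field on the person's document instead of duplicating the whole row.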
I guess my final question, which is really what I am trying to achieve,
is this: I want to search for all Persons in the "~United States of
America" whose name is like "~Klinton" and who enjoy "~Smoking" as a
hobby. An important part of this is that I won't know which token is to
be matched against which field, just like when you use an internet search
engine. So do I tokenize all the fields from the XML, put them into a
single field in the index, and query that with tokens?
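
As a rough illustration of the "single catch-all field" idea, the Python sketch below pools the tokens of every field of a record into one bag and requires each query token to approximately match some token in that bag. It uses `difflib` from the standard library as a stand-in for fuzzy matching; it is not Lucene's fuzzy query, and the names `catch_all` and `search` as well as the 0.7 cutoff are assumptions chosen for this example.

```python
import difflib

# Illustrative records; key names are assumptions.
records = [
    {"place": "United States of America", "name": "George W Bush",
     "occupation": "Demolition", "hobbies": ["Comedy", "Mime", "Ant Farms"]},
    {"place": "United States of America", "name": "Bill Clinton",
     "occupation": "Retired", "hobbies": ["Smoking Cigars"]},
    {"place": "Kazakhstan", "name": "Borat",
     "occupation": "TV Reporter", "hobbies": ["Dancing", "Football", "Swimming"]},
]

def tokens(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def catch_all(record):
    """Pool the tokens of every field into one list, forgetting the field names."""
    values = [record["place"], record["name"], record["occupation"]]
    values.extend(record["hobbies"])
    return sorted({t for v in values for t in tokens(v)})

def search(query, records, cutoff=0.7):
    """Return names of records where every query token fuzzily matches some token."""
    hits = []
    for r in records:
        bag = catch_all(r)
        if all(difflib.get_close_matches(q, bag, n=1, cutoff=cutoff)
               for q in tokens(query)):
            hits.append(r["name"])
    return hits

print(search("united states klinton smoking", records))
```

The trade-off of a catch-all field is visible here: the query no longer needs to know which token belongs to which field, but a token such as "retired" would match whether it appears as an occupation, a hobby or part of a name.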
I realize that I'm posting lots of complicated questions, and I am
probably just looking at the equivalent of an HTML indexing/search
implementation.
Many thanks,
--AH
Erick Erickson wrote:
Tell us more about the problem you are trying to solve. Lucene is designed
for large text searching, not relations. Trying to "index a data structure"
seems like a misapplication of Lucene. Without some idea of what you are
trying to accomplish, any advice you get is irrelevant at best...
Best
Erick
On 12/13/06, Andrew Hughes <[EMAIL PROTECTED]> wrote:
Hey All,
I am very interested in indexing a 3NF data structure. Can anyone offer
advice on this? From what I have seen, a Lucene index is typically a flat
("First Normal Form") data structure. The only way I can see to combine the
relational links between multiple indexes is to compare documents.
Any Help is Appreciated.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]