Re: [Ankur-core] XML standard for Ankur's Abhidhan

Deepayan Sarkar Wed, 13 May 2009 11:07:56 -0700

On 5/12/09, Salahuddin Pasha <salahuddi...@gmail.com> wrote:
> Dear all,
>
>  I have been working on XML support for অভিধান (Abhidhan), so that
>  various applications and tools can use our dictionary.
>
>  The basic work is already done, but we need to define a standard XML
>  format (as an XML DTD or XML Schema).
>
>  Any suggestions or comments?


Back in 2003, the bengalinux dictionary list had a discussion on this.
Nothing ever came out of it, and when Golam first started on anubadok,
his emphasis was more specialized. In any case, that discussion may
provide some suggestions.

You can get it from the list archives, and I'm also attaching a
cleaned up and edited version of the thread here:

<thread from May 2003>

----
                
[Ankur-dictionary] dictionary.dtd
From: Kaushik Ghose <kgh...@wa...> -    2003-05-14 04:17
                                                                                
                        

  Hi,
  here is the descriptor file.
  I'm new to XML and DTDs so please go over the semantics as well as the
  syntax and see if this serves our purpose...


  <?xml version="1.0"?>
  <!ELEMENT entry*(word_bn, info_bn*)>
  <!ELEMENT word_bn (#CDATA)>
  <!ELEMENT info_bn (english, pronounciation_bn,meaning_bn)>
  <!ELEMENT english  (#CDATA)>
  <!ELEMENT pronounciation_bn  (#CDATA)>
  <!ELEMENT meaning_bn  (#CDATA)>

  thanks
  -kg                                           
                                                
----                                                            
        
From: Kaushik Ghose <kgh...@wa...> -    2003-05-14 05:12
                                                                                
                        
  OK, a small correction; Qt's DOM class seems to parse this correctly:

  dictionary.dtd

  <?xml version="1.0" encoding="UTF-8"?>
  <!ELEMENT dictionary (entry*)>
  <!ELEMENT entry (word_bn, info_bn*)>
  <!ELEMENT word_bn (#PCDATA)>
  <!ELEMENT info_bn (english?, pronounciation_bn?, meaning_bn?)>
  <!ELEMENT english (#PCDATA)>
  <!ELEMENT pronounciation_bn (#PCDATA)>
  <!ELEMENT meaning_bn (#PCDATA)>


  test.xml

  <?xml version="1.0"?>
  <!DOCTYPE dictionary SYSTEM "dictionary.dtd">
  <dictionary>
  <entry>
      <word_bn>?????????????????????   ???????????????</word_bn>
      <info_bn>
          <english>seedling</english>
          <pronounciation_bn>ankur</pronounciation_bn>
          <meaning_bn>???????????????????   ???????????
  ????????????????????????    ??????????????????
  ??????????????????</meaning_bn>
      </info_bn>
  </entry>

  <entry>
      <word_bn>?????????????????????   ?????????</word_bn>
      <info_bn>
          <english>bangla</english>
          <pronounciation_bn>bangla</pronounciation_bn>
          <meaning_bn>???????????????????   ?????????????????
  ????????????????????????,    ?????????????????????????   ???????????
  ?????????????????????????   ?????</meaning_bn>
      </info_bn>
      <info_bn>
          <english>bengali</english>
      </info_bn>
  </entry>
  </dictionary>

  thanks
  -kg

        
----
                
From: Deepayan Sarkar <deepa...@st...> -        2003-05-14 07:03
                                                                                
                        
  Ha! A friend of mine once corrected me on this, now I can correct
  someone else :) 'pronounciation' should be spelled
  'pronunciation'.

  I'm not an expert on DTDs (though I know someone who knows much
  more, whom I can ask after we make some progress). I find it
  very difficult to understand DTDs, and much easier to understand
  examples of what the final thing would look like. Let's work that
  way, and we can write out the DTD once we decide on the 'look'.

  I don't know if you know this, but there's something called
  attributes which might be useful. For instance, with multiple
  meanings as different parts of speech.  Here's an example (I'm using
  slightly different tags) --- 'pos' is part of speech, 'plural' is
  whether the word has a plural form, etc.:

  <entry>
        <word>chhaanaa</word>
        <info pos="noun" plural="false" origin="deshi">
                <meaning>dudh theke toiri ek dhoroner ...</meaning>
                <synonyms>...</synonyms>
                <antonyms>...</antonyms>    <!-- ??? -->
                <translation lang="en">cottage cheese (?)</translation>
                <pronunciation>chhaanaa</pronunciation>
        </info>
        <info pos="noun" origin="tatbhabo">  <!-- it's probably not, but... -->
                <meaning>shishu, bachchaa</meaning>
                <translation lang="en">child, young</translation>  <!-- comma separated -->
                <translation lang="hn">bachcha</translation>  <!-- hindi is hn ? not sure -->
                <pronunciation>chhaanaa</pronunciation>
                <derivative form="the">chhaanaaTaa, chhaanaaTi</derivative>
                <derivative form="of" num="singular">chhaanaaTir</derivative>
                <derivative form="of" num="plural">chhaanaader</derivative>
        </info>
  </entry>

  (I've used romanized bengali in place of what should be bengali, but
  you get the idea.)
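
As a rough sketch of how an application might read an entry of this shape, assuming Python's standard xml.etree.ElementTree module (the thread does not settle on any tool), the helper below collects each <info> block with its attributes and groups the translations by their lang attribute. The name read_senses and the trimmed sample are illustrative only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example entry above (romanized text, as in the example).
SAMPLE = """
<entry>
  <word>chhaanaa</word>
  <info pos="noun" plural="false" origin="deshi">
    <meaning>dudh theke toiri ek dhoroner ...</meaning>
    <translation lang="en">cottage cheese (?)</translation>
  </info>
  <info pos="noun" origin="tatbhabo">
    <meaning>shishu, bachchaa</meaning>
    <translation lang="en">child, young</translation>
    <translation lang="hn">bachcha</translation>
  </info>
</entry>
"""


def read_senses(entry_xml):
    """Return (word, senses); each sense is a dict built from one <info>."""
    entry = ET.fromstring(entry_xml.strip())
    word = entry.findtext("word")
    senses = []
    for info in entry.findall("info"):
        senses.append({
            # Attributes come back as strings, or None when absent.
            "pos": info.get("pos"),
            "origin": info.get("origin"),
            "meaning": info.findtext("meaning"),
            # Zero or more <translation> children, keyed by language code.
            "translations": {
                t.get("lang"): t.text for t in info.findall("translation")
            },
        })
    return word, senses


word, senses = read_senses(SAMPLE)
print(word, len(senses))
```

Keeping part of speech and origin in attributes means each meaning stays a self-contained <info> block, so code like this never has to guess which translation belongs to which sense.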

  I think we should handle derivative words here and not have
  separate entries for them; they can be generated from this.
  Sanskrit has very systematic rules for 'shabdarup'. Bengali
  isn't as systematic, but there are still quite general rules. We can
  formulate some rules and list only the derivative words that are
  exceptions to those rules. We have the standard forms:

  to, by, for, from, of and in

  plus maybe plurals, the, a --- anything else ?
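
The "general rule plus listed exceptions" idea could be sketched roughly as follows. The suffix table is read off the chhaanaa example purely for illustration and is not a real description of Bengali grammar; the function name derivative and the exception format are likewise assumptions.

```python
# Illustrative suffixes only, taken from the chhaanaa example above.
# They are NOT a real account of Bengali morphology.
DEFAULT_SUFFIXES = {
    ("the", "singular"): "Taa",
    ("of", "singular"): "Tir",
    ("of", "plural"): "der",
}


def derivative(word, form, num, exceptions=None):
    """Return the derived form, preferring a listed exception over the rule."""
    exceptions = exceptions or {}
    if (form, num) in exceptions:
        return exceptions[(form, num)]
    return word + DEFAULT_SUFFIXES[(form, num)]


# Regular case: generated from the rule, nothing stored in the entry.
print(derivative("chhaanaa", "of", "plural"))

# Irregular case: the entry lists the form explicitly (placeholder value).
print(derivative("chhaanaa", "the", "singular",
                 exceptions={("the", "singular"): "listed-form"}))
```

Only the exceptions would then need <derivative> elements in the XML; everything else is computed when the word lists are generated.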

  Also, Bengali (unlike English) often has many words which mean
  exactly the same thing. We might try to think of a way to have a
  single entry for all of them.

  Can anyone (preferably with a dictionary at hand) think of anything else ?

  This is not very important right now, but what's a good format to store
  pronunciation ?

        
----

                
From: Taneem Ahmed <tan...@ey...> -     2003-05-14 08:33
                                                                                
                        
  On Wed, 14 May 2003, Kaushik Ghose wrote:

  > Hi,
  > here is the descriptor file.
  > I'm new to XML and DTDs so please go over the semantics as well as the
  > syntax and see if this serves our purpose...
  >
  >
  > <?xml version="1.0"?>
  > <!ELEMENT entry*(word_bn, info_bn*)>
  > <!ELEMENT word_bn (#CDATA)>
  > <!ELEMENT info_bn (english, pronounciation_bn,meaning_bn)>
  > <!ELEMENT english  (#CDATA)>
  > <!ELEMENT pronounciation_bn  (#CDATA)>
  > <!ELEMENT meaning_bn  (#CDATA)>

  I remember someone mentioned something about multiple language support. Is
  it possible to have a general element instead of "english" so that it'll
  be easier to expand for other languages?

  Taneem                                                
                                                
----

From: Taneem Ahmed <tan...@ey...> -     2003-05-14 08:37
                                                                                
                        
  Sorry I didn't see Deepayan's mail when I sent my previous e-mail. His
  example is what I was talking about :)

  Taneem

  On Wed, 14 May 2003, Deepayan Sarkar wrote:

        
----

                
From: Kaushik Ghose <kgh...@wa...> -    2003-05-14 20:54
                                                                                
                        
  hi,

  On Wed, 14 May 2003, Deepayan Sarkar wrote:

  >
  > Ha! A friend of mine once corrected me on this, now I can correct someone
  > else :) 'pronounciation' should be spelled 'pronunciation'.
  >

  Okay :), so the new tag for this is <pron> >:D


  > I'm not an expert on DTDs (though I know someone who knows much more, whom I
  > can ask after we make some progress). I find it very difficult to
  > understand DTDs, and much easier to understand examples of what the final
  > thing would look like. Let's work that way, and we can write out the DTD
  > once we decide on the 'look'.

  Sure, I think I've got the hang of elementary DTDs (i.e. at the level I set
  out), so I can handle that. Qt's happy, so am I...

  > I don't know if you know this, but there's something called attributes which
  > might be useful. For instance, with multiple meanings as different parts of
  > speech.  Here's an example (I'm using slightly different tags) --- 'pos' is
  > part of speech, 'plural' is whether the word has a plural form, etc.:
  >
  > <entry>
  >     <word>chhaanaa</word>
  >     <info pos="noun" plural="false" origin="deshi">
  >             <meaning>dudh theke toiri ek dhoroner ...</meaning>
  >             <synonyms>...</synonyms>
  >             <antonyms>...</antonyms>    <!-- ??? -->
  >             <translation lang="en">cottage cheese (?)</translation>
  >             <pronunciation>chhaanaa</pronunciation>
  >     </info>
  >     <info pos="noun" origin="tatbhabo">  <!-- it's probably not, but... -->
  >             <meaning>shishu, bachchaa</meaning>
  >             <translation lang="en">child, young</translation>  <!-- comma separated -->
  >             <translation lang="hn">bachcha</translation>  <!-- hindi is hn ? not sure -->
  >             <pronunciation>chhaanaa</pronunciation>
  >             <derivative form="the">chhaanaaTaa, chhaanaaTi</derivative>
  >             <derivative form="of" num="singular">chhaanaaTir</derivative>
  >             <derivative form="of" num="plural">chhaanaader</derivative>
  >     </info>
  > </entry>

  I would suggest only putting in the English synonym, or closest word;
  this is a question of size and interfacing. If we have a set of
  English synonyms we can then use that to link to an English-German
  dict, say, or an English-Thai dict, to get a Bangla-Thai dict, for example.
  If we start to put in translations for additional languages I think
  the file will become very large and slow to load.

  As it is, with the Bangla word, Bangla synonyms, antonyms, meanings
  and English synonyms I think we are going to deal with pretty large
  files for each Bangla alphabet.

  Another issue to deal with is what we do with words that have no
  direct one word english equivalent.

  I couldn't get what "origin" means ?  By plural="false" do you mean
  it doesn't have a plural form ?

  > I think we should handle derivative words here (and not have separate
  > entries for them. They can be generated from this). Sanskrit has very
  > systematic rules for 'shabdarup'. Bengali isn't as systematic, but there
  > are still quite general rules. We can formulate some rules and list down
  > only derivative words that are exceptions to that rule. We have the
  > standard forms:
  >
  > to, by, for, from, of and in
  >
  > plus maybe plurals, the, a --- anything else ?

  This is fine,

  > Also, Bengali (unlike English) often has many words which mean exactly the
  > same thing. We might try to think of a way to have a single entry for all of
  > them.

  I would rather not. I'd say link it to the required word by putting that
  in the synonym, and in the <meaning> tag put in something like "see blah"

  >
  > Can anyone (preferably with a dictionary at hand) think of anything else ?
  >
  >
  > This is not very important right now, but what's a good format to store
  > pronunciation ?
  >
  Unicode should do fine; there's a provision for the International Phonetic
  Alphabet:
  http://www.unicode.org/charts/PDF/U0250.pdf
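
For illustration, the chart linked above covers the IPA Extensions block, U+0250 to U+02AF. A small check like the one below (the helper name in_ipa_extensions is hypothetical) shows that such characters are ordinary Unicode text and could sit directly inside a <pron> element. Note that IPA transcriptions also use plain Latin letters, which fall outside this block.

```python
import unicodedata

# IPA Extensions block, as in the linked chart: U+0250..U+02AF inclusive.
IPA_EXTENSIONS = range(0x0250, 0x02B0)


def in_ipa_extensions(ch):
    """True if the single character ch lies in the IPA Extensions block."""
    return ord(ch) in IPA_EXTENSIONS


schwa = "\u0259"
print(unicodedata.name(schwa), in_ipa_extensions(schwa))
print(in_ipa_extensions("a"))  # plain Latin letters are outside the block
```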

  so the next draft layout...


  <dictionary>
  <entry>
        <word_bn> chanaa </word_bn>
        <info pos="noun" plural="true" origin="??">
                <pron>....</pron>
                <meaning_bn> baccha </meaning_bn>
                <synonym_bn>...</synonym_bn>
                <synonym_bn>...</synonym_bn>
                <antonym_bn>...</antonym_bn>
                <synonym_en>...</synonym_en>
                <synonym_en>...</synonym_en>

                <grammar>
                        <derivative form="the">chhaanaaTaa,chhaanaaTi</derivative>
                        <derivative form="of" num="singular">chhaanaaTir</derivative>
                        <derivative form="of" num="plural">chhaanaader</derivative>
                </grammar>
        </info>
        <info pos="noun" plural="false" origin="??">
                <pron>...</pron>
                <meaning_bn> khabar... </meaning_bn>
        </info>
  </entry>
  </dictionary>


  -kg                                           
                                                
        
----
                
From: Deepayan Sarkar <deepa...@st...> -        2003-05-14 23:25
                                                                                
                        
  On Wednesday 14 May 2003 15:53, Kaushik Ghose wrote:

  > I would suggest only putting in the english synonym, or closest word -
  > this is a question of size and interfacing. If we have a set of english
  > synonyms we can then use that to link to an English-German dict say, or
  > an English-Thai dict to have a bangla-thai dict for ex.
  > If we start to put in translations for additional languages I think the
  > file will become very large and slow to load.

  Before we go any further, we need to decide how we are eventually planning to
  use the XML files.

  I don't think XML is a good format for use in any real application. For
  example, for a spell-checker to load the XML files directly would be very
  inefficient.

  Instead, the XML could be a repository of all possible information we might
  ever want to have. For a spell checker we could generate something that would
  contain only the words and nothing else (that could be a plain text file, or
  a database, could be in various different encodings and formats). Generating
  this from the XML may take a while, but if we do this once every two months
  or so, it shouldn't matter. Similarly for speech synthesis, we could extract
  only the actual word and its pronunciation, and leave everything else out.
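
A minimal sketch of that extraction step, assuming the headword lives in a <word> child of each <entry> (the tag name was still under discussion at this point) and using Python's standard xml.etree.ElementTree. The function name extract_words and the sample document are made up for the example.

```python
import xml.etree.ElementTree as ET


def extract_words(xml_text):
    """Return the sorted, de-duplicated headwords of a <dictionary> document."""
    root = ET.fromstring(xml_text)
    # Everything except the headword is ignored; surrounding spaces are trimmed.
    words = {(entry.findtext("word") or "").strip() for entry in root.iter("entry")}
    words.discard("")  # skip entries without a usable <word>
    return sorted(words)


SAMPLE = (
    "<dictionary>"
    "<entry><word> chanaa </word><info pos='noun'/></entry>"
    "<entry><word>ankur</word></entry>"
    "</dictionary>"
)

# One word per line: the kind of plain-text list a spell checker could load.
print("\n".join(extract_words(SAMPLE)))
```

A job like this only needs to run when the derived files are regenerated, so its speed matters much less than the speed of the application that loads the result.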

  From that perspective, I don't think it should matter if the XML files become
  large. And of course we don't need to have a single file for each alphabet,
  we could split them as much as we want (maybe the first 3 letters identify
  each file) as long as given a word it's possible to identify which file that
  word belongs to.
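
One possible way to identify the file from the word, sketched under the assumption that "first 3 letters" simply means the first three characters of the string. For real Bengali text a "letter" may span several Unicode code points (vowel signs, conjuncts), so the slicing rule would need more thought. The name file_for_word and the .xml naming are assumptions.

```python
def file_for_word(word, prefix_len=3):
    """Return the name of the file that should hold the given word."""
    word = word.strip()
    if not word:
        raise ValueError("cannot place an empty word")
    # Words shorter than prefix_len simply use the whole word as the prefix.
    return word[:prefix_len] + ".xml"


print(file_for_word("chhaanaa"))
print(file_for_word("ab"))
```

Because the mapping depends only on the word itself, an editor or generator can find the right file without consulting any index.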

  As for the translation, I'm not saying that we have to list translations into
  all possible languages. But there's no harm in keeping the option. In fact,
  initially we won't even have english translations for the words that we
  already have. And as you point out, not all words will even have an English
  translation. All this wouldn't matter if we allow an arbitrary number
  (including 0) of instances of the <translation> tag for each word.

  The English->other language idea may not always be the best because there
  might be some words which have no proper english version, but could have,
  say, hindi versions. We could make it policy to include a non-english
  translation only when this is the case. But explicitly ruling out that option
  is not a good idea, I think.

  > As it is, with the bangla word, bangla synonyms, antonyms, meanings and
  > english synonyms I think we are going to deal with pretty large files for
  > each bangla alphabet.
  >
  > Another issue to deal with is what we do with words that have no direct
  > one word english equivalent.
  >
  > I couldn't get what "origin" means ?

  Basically tot-somo, tot-bhobo, deshi, bideshi, that sort of stuff.

  > By plural="false" do you mean it doesn't have a plural form ?

  Yes.

  > > I think we should handle derivative words here (and not have separate
  > > entries for them. They can be generated from this). Sanskrit has very
  > > systematic rules for 'shabdarup'. Bengali isn't as systematic, but there
  > > are still quite general rules. We can formulate some rules and list down
  > > only derivative words that are exceptions to that rule. We have the
  > > standard forms:
  > >
  > > to, by, for, from, of and in
  > >
  > > plus maybe plurals, the, a --- anything else ?
  >
  > This is fine,
  >
  > > Also, Bengali (unlike English) often has many words which mean exactly
  > > the same thing. We might try to think of a way to have a single entry for
  > > all of them.
  >
  > I would rather not. I'd say link it to the required word by putting that
  > in the synonym, and in the <meaning> tag put in something like "see blah"

  Yes, that should be good enough. Maybe in those cases

  <word_bn>gabAkSha</word_bn>
  <info ...>
      <meaning_bn type="refer">jAnalA</meaning_bn>
  </info>

  > > Can anyone (preferably with a dictionary at hand) think of anything else
  > > ?
  > >
  > >
  > > This is not very important right now, but what's a good format to store
  > > pronunciation ?
  >
  > unicode should do fine, there's a provision for the international phonetic
  > alphabet
  > http://www.unicode.org/charts/PDF/U0250.pdf

  Cool. Does there exist a speech synthesizer which can work from this? That
  way we could confirm that we enter the correct pronunciation.

  > so the next draft layout...
  >
  >
  > <dictionary>
  > <entry>
  >     <word_bn> chanaa </word_bn>
  >     <info pos="noun" plural="true" origin="??">

  Since most words would have plural="true", we could omit that (the
  default would be "true").
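
On the reading side, treating a missing attribute as "true" is a one-line fallback. A small sketch, assuming xml.etree.ElementTree and a hypothetical helper named has_plural:

```python
import xml.etree.ElementTree as ET


def has_plural(info):
    """Treat a missing plural attribute as "true", the proposed default."""
    return info.get("plural", "true") == "true"


omitted = ET.fromstring('<info pos="noun"/>')
explicit = ET.fromstring('<info pos="noun" plural="false"/>')
print(has_plural(omitted), has_plural(explicit))
```

A DTD can also declare the default for an enumerated attribute, in which case a validating parser supplies the value itself; the fallback above covers parsers that do not read the DTD.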

  >             <pron>....</pron>
  >             <meaning_bn> baccha </meaning_bn>
  >             <synonym_bn>...</synonym_bn>
  >             <synonym_bn>...</synonym_bn>

  Any problem with giving multiple synonyms comma separated ?

  >             <antonym_bn>...</antonym_bn>
  >             <synonym_en>...</synonym_en>
  >             <synonym_en>...</synonym_en>

  I still think a translation tag with a language attribute would be more
  appropriate.

  >             <grammar>
  >                <derivative form="the">chhaanaaTaa,chhaanaaTi</derivative>
  >                  <derivative form="of"
  > num="singular">chhaanaaTir</derivative>
  >                <derivative form="of"
  > num="plural">chhaanaader</derivative>
  >               </grammar>
  >     </info>
  >     <info pos="noun" plural="false" origin="??">
  >             <pron>...</pron>
  >             <meaning_bn> khabar... </meaning_bn>
  >         </info>
  > </entry>
  > </dictionary>

  Otherwise looks OK (maybe an optional comment tag for each word),
  unless someone else can think of something.

  BTW, what's the use of the extra _bn for the tags (not that it matters) ?

  Deepayan
        
----

                
From: Kaushik Ghose <kgh...@wa...> -    2003-05-15 02:57
                                                                                
                        
  Hiya,

  On Wed, 14 May 2003, Deepayan Sarkar wrote:

  > Before we go any further, we need to decide how we are eventually planning
  > to use the XML files.
  >
  > I don't think XML is a good format for use in any real application. For
  > example, for a spell-checker to load the XML files directly would be very
  > inefficient.
  >
  > Instead, the XML could be a repository of all possible information we might
  > ever want to have. For a spell checker we could generate something that
  > would contain only the words and nothing else (that could be a plain text
  > file, or a database, could be in various different encodings and formats).
  > Generating this from the XML may take a while, but if we do this once every
  > two months or so, it shouldn't matter. Similarly for speech synthesis, we
  > could extract only the actual word and its pronunciation, and leave
  > everything else out.
  >
  > From that perspective, I don't think it should matter if the XML files
  > become large. And of course we don't need to have a single file for each
  > alphabet, we could split them as much as we want (maybe the first 3 letters
  > identify each file) as long as given a word it's possible to identify which
  > file that word belongs to.
  >
  > As for the translation, I'm not saying that we have to list translations
  > into all possible languages. But there's no harm in keeping the option. In
  > fact, initially we won't even have english translations for the words that
  > we already have. And as you point out, not all words will even have an
  > English translation. All this wouldn't matter if we allow an arbitrary
  > number (including 0) of instances of the <translation> tag for each word.
  >

  OK, that seems fine. The size of the files will matter for the GUI that
  does the dicto editing and any online collaboration tool we come up with
  for creating the dicto, but yes, we'll have automated tools to create
  (like you said, maybe on the first of every two months) separate file
  clusters for spell checkers, thesauri etc., which can be more compact.


  Now, for the translation. Are we looking to put in one word that can link
  this bangla word to a word in some other dicto ? Or are we looking to give
  a translation of it ? For that we can probably end up with two sets of
  tags.

  <synonym lang ="">...</synonym>
  <meaning lang ="">...</meaning>

  where synonym is the one word thingy, meaning is well a paragraph or so.


  > Yes, that should be good enough. Maybe in those cases
  >
  > <word_bn>gabAkSha</word_bn>
  > <info ...>
  >     <meaning_bn type="refer">jAnalA</meaning_bn>
  > </info>

  Yes, good idea. I'd prefer a separate tag <refer> which would do this job.
  We could do it via synonyms too, maybe everything...

  > Cool. Does there exist a speech synthesizer which can work from this ? That
  > way we could confirm that we enter the correct pronunciation.

  Didn't go much through it but here's a promising site
  http://www.vorde.org/prodVordeTech/documents/vorde/split/node28.html

  > > so the next draft layout...
  > >
  > >
  > > <dictionary>
  > > <entry>
  > >   <word_bn> chanaa </word_bn>
  > >   <info pos="noun" plural="true" origin="??">
  >
  > Since most words would have plural="true", we could omit that (the default
  > would be "true").
  >
  > >           <pron>....</pron>
  > >           <meaning_bn> baccha </meaning_bn>
  > >           <synonym_bn>...</synonym_bn>
  > >           <synonym_bn>...</synonym_bn>
  >
  > Any problem with giving multiple synonyms comma separated ?
  >
  > >           <antonym_bn>...</antonym_bn>
  > >           <synonym_en>...</synonym_en>
  > >           <synonym_en>...</synonym_en>

  Yeah, I couldn't figure out if commas would tell the parser these are
  separate instances, or just one big glob of text, so I played it safe...

  > I still think a translation tag with a language attribute would be more
  > appropriate.

  Yes.

  > >           <grammar>
  > >              <derivative form="the">chhaanaaTaa,chhaanaaTi</derivative>
  > >                  <derivative form="of"
  > > num="singular">chhaanaaTir</derivative>
  > >              <derivative form="of"
  > > num="plural">chhaanaader</derivative>
  > >               </grammar>
  > >   </info>
  > >   <info pos="noun" plural="false" origin="??">
  > >           <pron>...</pron>
  > >           <meaning_bn> khabar... </meaning_bn>
  > >         </info>
  > > </entry>
  > > </dictionary>
  >
  > Otherwise looks OK (maybe an optional comment tag for each word), unless
  > someone else can think of something.
  >
  > BTW, what's the use of the extra _bn for the tags (not that it matters) ?

  Yeah, that should get replaced by the lang attribute.

  so here it is (hopefully I remembered everything)

  <dictionary>
  <entry>
        <word>...</word>
        <info pos="noun" plural="false" origin="." date=".">
                <pron>...</pron>
                <synonym lang="bn">...</synonym>
                <synonym lang="bn">...</synonym>
                <antonym lang="bn">...</antonym>
                <synonym lang="en">...</synonym>
                <meaning lang="bn">...</meaning>
                <meaning lang="en">...</meaning>
                <grammar>
                        <derivative form="the" num="singular">...</derivative>
                </grammar>
        </info>
  </entry>
  </dictionary>

  I'll make a DTD and see if I can make a GUI for it...

  -kg                                           
                                                

----

                
From: Deepayan Sarkar <deepa...@st...> -        2003-05-15 04:13
                                                                                
                        
  On Wednesday 14 May 2003 21:56, Kaushik Ghose wrote:

  > Ok, that seems fine. The size of the files will matter for the GUI that
  > does the dicto editing and any online collaboration tool we come up with
  > for creating the dicto, but yes, we'll have automated tools to create
  > (like you, may be on the first of every two months) separate file clusters
  > for spell checkers, theasauri etc. which can be more compacted.

  Yes, we do need to plan ahead so that individual files don't get very big.
  Since the main purpose of the GUI is to enter new words and edit existing
  words, the only requirement is that given a word we should be able to figure out
  which file it should be in. That way, if the file doesn't exist, the program
  could create a blank instance of the XML document object, and if it does
  exist, parse it and read it into memory.

  As for the file structure, we could consider a separate directory for each
  starting character, then one file for each combination of first 3 letters
  (I'm not sure what the best way to name these files would be). But we may
  need to adjust this depending on how many files per directory and how many
  words per file this would make. Could you run through the existing words and
  get an estimate (basically count combinations of first 3 characters) ?
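
The estimate could be produced with something like the following sketch. The word list here is a made-up stand-in for the real one, the prefix is taken as the first three characters of each string, and the name prefix_stats is hypothetical.

```python
from collections import Counter


def prefix_stats(words, prefix_len=3):
    """Count words per prefix file and prefix files per first-character directory."""
    words_per_file = Counter(w[:prefix_len] for w in words if w)
    # Each distinct prefix becomes one file in the directory of its first character.
    files_per_dir = Counter(prefix[0] for prefix in words_per_file)
    return words_per_file, files_per_dir


# Stand-in data only; the real input would be the existing word list.
words = ["chhaanaa", "chhaataa", "chaa", "ankur"]
per_file, per_dir = prefix_stats(words)

print("largest files:", per_file.most_common(3))
print("files per directory:", dict(per_dir))
```

If the largest counts turn out too big or too small, the same function can be re-run with a different prefix_len before any layout is fixed.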


  > Now, for the translation. Are we looking to put in one word that can link
  > this bangla word to a word in some other dicto ? Or are we looking to give
  > a translation of it ? For that we can probably end up with two sets of
  > tags.
  >
  > <synonym lang ="">...</synonym>
  > <meaning lang ="">...</meaning>
  >
  > where synonym is the one word thingy, meaning is well a paragraph or so.

  Again, no harm in keeping the option (that way, we could potentially have a
  bengali to english dictionary as well as a bengali to bengali).

  > > Yes, that should be good enough. Maybe in those cases
  > >
  > > <word_bn>gabAkSha</word_bn>
  > > <info ...>
  > >     <meaning_bn type="refer">jAnalA</meaning_bn>
  > > </info>
  >
  > Yes, good idea, I'd prefer a separate tag <refer> which would do this job.
  > we could do it via synonyms too, may be everything...

  OK.

  > > Any problem with giving multiple synonyms comma separated ?
  > >
  > > >                 <antonym_bn>...</antonym_bn>
  > > >                 <synonym_en>...</synonym_en>
  > > >                 <synonym_en>...</synonym_en>
  >
  > Yeah, I couldn't figure out if commas would tell the parser these are
  > separate instances, or just one big glob of text, so I played it safe...

  The comma is not special in XML, so it would be interpreted as a single long
  string. But we could always interpret them correctly inside applications.
  Anyway, it's not that important.
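
Interpreting the commas inside an application could look roughly like this (the helper name split_list is hypothetical). The XML parser hands back one string, and the splitting is entirely the application's job.

```python
import xml.etree.ElementTree as ET


def split_list(text):
    """Split a comma-separated element value into trimmed, non-empty items."""
    return [item.strip() for item in (text or "").split(",") if item.strip()]


element = ET.fromstring('<derivative form="the">chhaanaaTaa, chhaanaaTi</derivative>')
print(repr(element.text))        # the parser sees a single string
print(split_list(element.text))  # the application sees two forms
```

The cost of this convention is that a value can never contain a literal comma, which repeated elements would not restrict.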


  > so here it is (hopefully I remembered everything)
  >
  > <dictionary>
  > <entry>
  >     <word>...</word>
  >     <info pos="noun" plural="false" origin="." date=".">

  What's date ? The last modification time ?

  >             <pron>...</pron>
  >             <synonym lang="bn">...</synonym>
  >             <synonym lang="bn">...</synonym>
  >             <antonym lang="bn">...</antonym>
  >             <synonym lang="en">...</synonym>
  >             <meaning lang="bn">...</meaning>
  >             <meaning lang="en">...</meaning>
  >             <grammar>
  >                     <derivative form="the"
  > num="singular">...</derivative>
  >             </grammar>
  >     </info>
  > </entry>
  > </dictionary>
  >
  > I'll make a DTD and see if I can make a GUI for it...

  Great. I have done this sort of programming in Python, but not C++. I might be
  able to help once you get something going. I think it might be useful to
  start by writing a class to represent a single XML file, with methods to add
  and modify tags (rather than directly accessing the XML document object all
  the time). That way, if there are minor changes in the DTD, we just need to
  modify this class.
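
A rough Python sketch of such a wrapper class, assuming the <dictionary>/<entry>/<word>/<info>/<meaning> layout from the draft above. The class name DictionaryFile and its methods are invented for illustration; the actual GUI was to be written in C++ with Qt.

```python
import xml.etree.ElementTree as ET


class DictionaryFile:
    """Thin wrapper around one dictionary XML document."""

    def __init__(self, xml_text=None):
        # Start from an empty <dictionary> when there is no existing content.
        self.root = ET.fromstring(xml_text) if xml_text else ET.Element("dictionary")

    def find_entry(self, word):
        for entry in self.root.findall("entry"):
            if entry.findtext("word") == word:
                return entry
        return None

    def add_entry(self, word):
        """Return the entry for word, creating it if necessary."""
        entry = self.find_entry(word)
        if entry is None:
            entry = ET.SubElement(self.root, "entry")
            ET.SubElement(entry, "word").text = word
        return entry

    def add_info(self, word, pos, meaning, lang="bn"):
        """Append one <info> block (one sense) to the entry for word."""
        info = ET.SubElement(self.add_entry(word), "info", {"pos": pos})
        ET.SubElement(info, "meaning", {"lang": lang}).text = meaning
        return info

    def to_xml(self):
        return ET.tostring(self.root, encoding="unicode")


doc = DictionaryFile()
doc.add_info("gabAkSha", "noun", "jAnalA")
print(doc.to_xml())
```

Callers only ever see add_entry and add_info, so renaming a tag or moving an attribute later means touching this class and nothing else.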

  Deepayan

----
                
From: Kaushik Ghose <kgh...@wa...> -    2003-05-16 15:07


  <?xml version="1.0" encoding="UTF-8"?>
  <!ELEMENT dictionary (entry*)>
  <!ELEMENT entry (word, info*)>
  <!ELEMENT word (#PCDATA)>
  <!ELEMENT info (refer?, pron?, synonym?, antonym?, meaning?, grammar?)>
  <!ATTLIST info pos (n|adj|v|adv) "n"
                 plural (true|false) "false"
                 origin CDATA "????????????"
                 date CDATA #IMPLIED>
  <!ELEMENT refer (#PCDATA)>
  <!ELEMENT pron (#PCDATA)>
  <!ELEMENT synonym (#PCDATA)>
  <!ATTLIST synonym lang CDATA "bn">
  <!ELEMENT antonym (#PCDATA)>
  <!ATTLIST antonym lang CDATA "bn">
  <!ELEMENT meaning (#PCDATA)>
  <!ATTLIST meaning lang CDATA "bn">
  <!ELEMENT grammar (derivative?)>
  <!ELEMENT derivative (#PCDATA)>
  <!ATTLIST derivative form (the|of) "the" num (singular|plural) "singular">


  Also, to answer Deepayan's question: by date I was thinking of the date of
  origin, first use, etc.

  Will potter with Qt.

  Right now, I'm going to hardcode the DTD structure. I can't think of a
  simple way of creating an editor that will parse the DTD and configure the
  GUI on the fly; fixed boxes for all the elements will be quicker for a DTD
  of this size.

  PS. try the perl tool at
  http://www.sagehill.net/livedtd/download.html

  -kg                                           
                                                

</thread>

