Re: Turning debian/upstream into BibTeX (Was: Tasks pages (close to) fixed; Bibref does not seem to be updated automatically)

Andreas Tille Tue, 21 Feb 2012 00:07:43 -0800

[Laszlo explicitly in CC because I do not know whether you followed these
 longish mails]

Hi,

On Mon, Feb 20, 2012 at 10:08:20PM -0500, Yaroslav Halchenko wrote:
> > The alternative approach could perfectly be to seek for files matching
> >     /usr/share/doc/*/upstream
> > and do the BibTeX generation afterwards.
> 
> yeap -- that was the idea behind dbib_collect -- to gather from
> all possible places, but if we converge on /upstream, and it would (or
> does already?) allow multiple entries, trigger-generated snippets-cat-er
> might be preferable eliminating the need for an additional user-land
> tool.

Currently debian/upstream explicitly allows only one entry, and
Laszlo expressed some criticism of this fact.  So I would like to
address this point here explicitly.  For me the question is how to
handle multiple entries sanely, and how and where we use them.
Currently by design tasks pages display only one reference entry.
If you put multiple entries into a tasks file the last one will
win.  Even worse, if you tried something like

Published-Authors: Alois Schlögl, Clemens Brunner
Published-DOI: 10.1109/MC.2008.407
Published-In: Computer, 41(10): 44-50
Published-Title: BioSig: A Free and Open Source Software Library for BCI Research
Published-URL: http://pub.ist.ac.at/~schloegl/publications/Schloegl2007_BCI_Software.pdf
Published-Year: 2008
Published-Authors: Some other Author
Published-Title: Some stupid title

you would end up with

  Author: Some other Author
  Title: Some stupid title
  DOI: 10.1109/MC.2008.407
  In: Computer, 41(10): 44-50
  URL: http://pub.ist.ac.at/~schloegl/publications/Schloegl2007_BCI_Software.pd
  Year: 2008
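
The merged result above can be reproduced with a minimal sketch.  The
parser below is a hypothetical stand-in, not the code the tasks pages
actually use; it only assumes that fields end up in a mapping keyed by
field name, so a repeated name overwrites the earlier value.  The
function name parse_fields is made up for illustration.

```python
def parse_fields(text):
    """Parse 'Name: value' lines into a dict; a repeated name overwrites."""
    fields = {}
    last = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and last is not None:
            # Continuation line: append to the previous field's value.
            fields[last] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a field line; ignored in this sketch
        last = name.strip()
        fields[last] = value.strip()  # last entry with this name wins
    return fields


sample = """\
Published-Authors: Alois Schlögl, Clemens Brunner
Published-DOI: 10.1109/MC.2008.407
Published-Title: BioSig: A Free and Open Source Software Library for BCI Research
Published-Year: 2008
Published-Authors: Some other Author
Published-Title: Some stupid title
"""

fields = parse_fields(sample)
for name, value in fields.items():
    print(f"{name}: {value}")
```

The DOI and year of the first reference survive while its authors and
title are silently replaced, which is the mixed-up entry shown above.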

This is by design of the RFC 822 parser, where the last entry with a
given name wins.  So in tasks pages we do not have any reasonable
means to specify more than one reference.  Based on this "feature"
I suggested designing the bibref table like

CREATE TABLE bibref (
        package text NOT NULL,
        key     text NOT NULL,
        value   text NOT NULL,
        PRIMARY KEY (package,key)
);

to explicitly prevent duplicate keys and so ensure data integrity
(we had at least one case in the past where some package,key pair
occurred twice and broke my attempt to create the tasks pages).
When keeping this design of a flexible package,key,value table which can
easily adapt to new keys, the only chance I see would be to do

CREATE TABLE bibref (
        package text NOT NULL,
        key     text NOT NULL,
        value   text NOT NULL,
        rank    int NOT NULL,
        PRIMARY KEY (package,key,rank)
);
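
To make the difference between the two table designs concrete, here is a
small sketch.  It uses an in-memory SQLite database purely so the example
is self-contained; UDD itself is not SQLite, and the table names
bibref_single and bibref_ranked, the package name "examplepkg" and the
titles are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original design: at most one value per (package, key).
# "key" is quoted because KEY is also an SQL keyword.
conn.execute("""
    CREATE TABLE bibref_single (
        package text NOT NULL,
        "key"   text NOT NULL,
        value   text NOT NULL,
        PRIMARY KEY (package, "key")
    )""")

# Extended design: rank distinguishes several references per package.
conn.execute("""
    CREATE TABLE bibref_ranked (
        package text NOT NULL,
        "key"   text NOT NULL,
        value   text NOT NULL,
        rank    int NOT NULL,
        PRIMARY KEY (package, "key", rank)
    )""")

conn.execute("INSERT INTO bibref_single VALUES (?, ?, ?)",
             ("examplepkg", "title", "First title"))
try:
    # A second title for the same package violates the primary key.
    conn.execute("INSERT INTO bibref_single VALUES (?, ?, ?)",
                 ("examplepkg", "title", "Second title"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# With rank in the primary key both titles can coexist.
conn.executemany(
    "INSERT INTO bibref_ranked VALUES (?, ?, ?, ?)",
    [("examplepkg", "title", "First title", 1),
     ("examplepkg", "title", "Second title", 2)])

titles = [value for (value,) in conn.execute(
    'SELECT value FROM bibref_ranked '
    'WHERE package = ? AND "key" = ? ORDER BY rank',
    ("examplepkg", "title"))]
print(duplicate_rejected, titles)
```

The first table refuses the duplicate pair outright, while the second
keeps both entries and returns them in rank order.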

From the tasks pages point of view this would require some changes but
I'd regard it as doable.  Remark:  Currently the algorithm for parsing
the references is:  Take references from the tasks file only if you did
not find references in UDD.  From tasks files I see no chance to specify
more than one reference, so the only way to specify more than one is
debian/upstream via UDD.
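
The fallback rule in the remark can be sketched as follows.  This is an
illustration only, not the tasks pages code; the function name
select_references and the dict layout of a reference are assumptions.

```python
def select_references(udd_references, tasks_reference):
    """Return the references to display for one package.

    udd_references:  list of reference dicts found in UDD (may be empty)
    tasks_reference: the single reference dict from the tasks file, or None
    """
    if udd_references:
        # UDD takes precedence and may carry several references.
        return list(udd_references)
    if tasks_reference is not None:
        # A tasks file can only ever contribute one reference.
        return [tasks_reference]
    return []


from_udd = [{"title": "First title"}, {"title": "Second title"}]
from_tasks = {"title": "Title from tasks file"}

print(select_references(from_udd, from_tasks))
print(select_references([], from_tasks))
```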

In short: I see a chance for implementing multiple references via
debian/upstream - UDD - tasks pages when defining some ranking.

However, handling multiple references is asking for additional trouble
in other use cases as well.  For instance, think about finding a key
for the BibTeX database.  This would have been pretty simple for only
one reference - just take the package name as the key and be done with
it.  For multiple references you somehow need to invent a key and for
the moment I do not see any handy way to do this.  We could somehow rely
on the sequence in which the references are given, which could also
serve as the rank value for the UDD table.  We could also use a key like
<package><rank> based on this sequence, but I'd consider all of this a
bit hackish - better suggestions are welcome.
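
For illustration, the <package><rank> idea could look like the sketch
below.  The function name bibtex_keys is hypothetical, and keeping the
bare package name when there is only one reference is my assumption, not
something that has been agreed on.

```python
def bibtex_keys(package, count):
    """Return one BibTeX key per reference, in the order they are given."""
    if count == 1:
        # Single reference: the package name alone is a sufficient key.
        return [package]
    # Several references: append the 1-based rank to the package name.
    return [f"{package}{rank}" for rank in range(1, count + 1)]


print(bibtex_keys("examplepkg", 1))
print(bibtex_keys("examplepkg", 3))
```

One reason this stays hackish: a package name that already ends in a
digit yields keys where the name and the rank can no longer be told
apart by eye.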

We also need to make sure that the different references are properly
separated inside the yaml file.  I have no experience with yaml files
but I have seen Laszlo inserting '-' signs in front of the first entry
of each separate reference.  I guess the usual yaml parsers will just do
the right thing, so I simply assume that this will work flawlessly.

In short: If we really want to support multiple references we need to
clarify the use cases and the implementation details first.  I'm
personally not really convinced that we could not go with one major
reference per package, nor that the trouble we need to deal with is
worth the effort for some exceptions.  However, I do not consider myself
an end user of those references, and if users raise sound arguments
I'd be easily convinced to help implement this.

> > packages with renamed files).  If you are asking: "Why, this should be
> > installed?" I would say:  "You are right, probably nobody has really
> > thought about this."  I would fully agree that it could add extra
> > information to the doc inside a binary package - so why not installing
> > it.
> 
> ;-)

BTW, every developer is free to list debian/upstream in debian/docs
for the moment - we just have not done this so far.

> > Despite this my plan should work with or without the installation of
> > the files.  I would like to do something like this:
> 
> > debian/control:
> > Build-Depends: upstream-to-bibref-helper
> > #  for sure the package needs a better name
> 
> e.g. debian-bibliography-tools... ? ;-)
> or may be the whole debian-bibliography could be abbreviated as
> debbib..., then debbib-tools

I'm not great at inventing names - so any sane suggestion which is not
too longish (just to save the energy in pressing keys :-)) would be
fine.  The only thing we should decide when choosing a name is
whether we want to restrict it explicitly to bibliography or whether
we rather stick to a more generic "upstream" name which leaves more
flexibility in case we need to handle some other upstream data.

> > database.  Please note that this is just a scetch which should be
> > enhanced and perhaps / probably the debian.bib file should end up at a
> > better place where bibtex files will be automatically searched for etc -
> > but these are implemantation details.
> 
> IIRC I have looked for such a place and there were no suitable one (I
> could be wrong), so in that preliminary debian-bibliography
> package we placed /usr/share/bib/debian.bib  with the intent to seek
> adding /usr/share/bib into default BIBINPUTS.

I do not consider /usr/share the right place to put autogenerated
data, and the method I described generates its output automatically.

> Here debian.bib 
> http://anonscm.debian.org/gitweb/?p=pkg-exppsy/debian-bibliography.git;a=blob;hb=HEAD;f=bib/debian.bib
> is not a compilation of software bib references but
> rather ready-to-use entries for debian documents (e.g. papers) and
> some wiki pages.  With http://wiki.debian.org/CategoryPublication
> we can now extend it automatically each "release" with relevant
> publication entries (script yet TODO).
> ...

So this is rather a manually compiled list of references and is
something else than what we discussed before about fetching the
bibliographic data from debian/upstream, right?  Or do you want to freeze
the bibliographic data at a certain point in time inside a source
package and then upload this package with the references?  I'd regard
this method as a possible alternative with the drawback of not being
perfectly up to date - just to make sure I understand you correctly.

Kind regards

        Andreas. 

-- 
http://fam-tille.de

