I don't have any concrete suggestions, but have some random personal 
thoughts on the matter.

* I personally don't mind Pari being cited if flint actually performs a 
computation. What I mean is, I'm not sure it is important to look all the 
way down the decision tree to see which package actually got called. It 
may matter more to see which packages are in general responsible for the 
computations being performed (even if they aren't actually used in a 
particular computation) and to cite them all.

(I'm not saying that I wouldn't personally find it very helpful to be able 
to easily tell which package is carrying out a given calculation involving 
a given function for a given input. That would be very useful for other 
reasons, just not necessarily for citation. In particular I might want to look 
at their source code to see how they did the calculation so 
fast/slow/correctly/incorrectly.)

* Perhaps it is easier to notionally divide Sage up into discrete domains 
and to document which packages are ultimately responsible for those 
domains. Power series over QQ: Packages X, Y and Z, Calculus: Packages S 
and T, Plotting: Packages P and Q, Something else: Just Sage itself.

* The hardest thing is meaningfully citing packages for a bunch of fairly 
low level stuff, e.g. integer or polynomial arithmetic, computations over 
the reals or finite fields, or big multifaceted computations that draw on 
bits and pieces from many areas. Even in a single area this could be hard. 
E.g. in doing some 
algebraic number theory, you might have used Pari, flint, GMP/MPIR, MPFR, 
NTL, Linbox, IML, Sage, and so on. At what point do you draw the line? Do 
you include GCC, Cython, Python, autotools, m4, etc.? My personal view is 
one should cite whichever mathematical packages constituted a critical part 
of your computation. If you could have used almost anything to do your 
computation, it probably doesn't warrant a citation. If you used the 
algebraic number theory in Sage precisely because it has features X, Y and 
Z that almost nothing else offers, then it would be useful to figure out 
why Sage can do X, Y and Z and to specifically cite any packages that 
provide that functionality through Sage.

The unfortunate side effect of doing this is packages like GMP/MPIR won't 
get widely cited because they are dependencies of other libraries and not 
often used directly in critical computations. But I'm also not sure how 
helpful it is for MPIR to get cited along with a dozen other packages for 
some algebraic number theory computation as opposed to being cited by 
someone working on a new FFT who compares it against the very fast FFT in 
MPIR.

* At the end of the day, one of the crucial reasons for citation is not to 
bring recognition and prestige to the people who wrote the packages, but to 
aid researchers who are trying to track down prior work in the literature. 
For example, I might be trying to work out how to compute X, and in reading 
your paper on X - epsilon I might note that you cite package Y as being 
critical to your work. I might then look into the code for package Y and 
see that they have solved part of the problem, and learn something about 
how they did so. This kind of scientific citation has to be balanced 
against the prestige motivation, and is surely to be preferred, 
scientifically speaking.

I'm encouraged by your efforts to work on this. I guess in summary, my 
personal opinion is that it might be easier to start with a pragmatic 
approach which doesn't attempt to do things at such a fine level and which 
still relies on the researcher to use a good deal of discretion and 
understanding when citing packages written by others.
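
The coarse, domain-level idea from the second bullet above could be 
sketched roughly as follows. This is only an illustration, not anything 
that exists in Sage: the table name DOMAIN_PACKAGES, the function 
packages_to_cite and the placeholder package names X, Y, Z, S, T, P, Q are 
all made up, and the fallback to "Sage" for an unlisted domain is an 
assumption.

```python
# Hypothetical domain -> packages table. The letters are the placeholders
# used in the post above, not real package names.
DOMAIN_PACKAGES = {
    "power series over QQ": ["X", "Y", "Z"],
    "calculus": ["S", "T"],
    "plotting": ["P", "Q"],
}


def packages_to_cite(domains):
    """Return the sorted, de-duplicated packages for the given domains."""
    cited = set()
    for domain in domains:
        # Assumption: a domain that is not listed is handled by Sage itself.
        cited.update(DOMAIN_PACKAGES.get(domain, ["Sage"]))
    return sorted(cited)
```

With such a table, packages_to_cite(["calculus", "plotting"]) would return 
['P', 'Q', 'S', 'T'], and the researcher still has to decide which of those 
were actually critical to the work.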

Every year or so we do a search to see who has cited flint and MPIR. It's 
disappointing just how few citations we receive. Either people are just not 
using flint and MPIR in any way that is critical to their work (definitely 
a possibility), or writing highly performant C libraries is just not the 
way to get citations (also very possible). On the other hand, the situation 
might be improved for us if we spent more time writing papers on new, 
groundbreaking algorithms being implemented in flint and MPIR.
 
Sorry that was a bit rambling. It takes a lot of time and effort to write 
short, succinct posts. Perhaps my garbled thoughts above will trigger some 
better thoughts from others who have thought more about the citation 
problem than me.

Bill.

On Friday, 5 September 2014 15:07:25 UTC+2, Martin Raum wrote:
>
> Dear Sage developers,
>
> I'm writing to get an impression of the community's opinion on how 
> citation management should be implemented.  As background, I should say 
> that I have taken it into my head to modernize citation management in Sage. 
> I personally find this very important, as it signals respect to the 
> projects we wrap.  More objectively, I figure such facilities can be a 
> certain plus when writing the European Sage grant, as many such projects 
> (Pari, Gap, Singular, FLINT, etc.) are developed in Europe.
>
> Current status in Sage
> ======================
>
> Mike Hansen implemented citation facilities in sage.misc.citation.  This 
> is all we have.
>
> sage: from sage.misc.citation import get_systems
> sage: get_systems("integrate(x^2,x,0,1)")
> ['ginac', 'Maxima']
>
> His implementation uses profiling:
> 1) run the given code under control of the profiler.
> 2) parse the list of functions called, extracting the list of modules 
> called.  For example, sage.libs.pari.
> 3) Match this list against a certain list of projects, given in 
> sage/misc/citation.pyx
>
> Problems with the current implementation
> ========================================
>
> I'm not trying to put Mike's code down. Actually, I'm really glad he 
> implemented what we currently have. I'm just pointing out where we can 
> improve.
>
> 1) Use of profiling implies that the code runs much slower.  Tracing 
> citations for a toy computation may result in failure to pick the right 
> ones.
> 2) For technical reasons, we miss functions written in Cython.
> 3) The subsystems themselves don't tell the user how to cite them.
> 4) The user is not being made aware of current functionality.
> 5) The naming scheme could be improved. The interface is not user friendly.
>
> Two solutions available
> =======================
>
> We have three tickets dealing with this.  At #3317, there is old code by Niels 
> Ranosch, Michael Brickenstein, Burcin Erocal.  It tries to take a 
> completely different approach.  At #16777 and #16854, I have provided 
> improved versions of the current method.
>
> The issue
> =========
>
> Burcin has correctly argued at #16854 that the profiling approach is not 
> capable of tracking decision trees inside a function. I.e., if a function 
> decides according to some parameter to either call Pari or FLINT, we can't 
> see this in the profiling.
>
> On the other hand, #3317 uses decorators, which have to be applied to 
> every function that requires citation management.  Alternatively, one can 
> achieve the same by calling a certain function.  In any event, this means 
> there will be a slight slowdown of Sage in general.
>
> The implementation at #3317 is really fast already, but not optimal. If we go 
> for the decorators approach, I would speed it up.
>
> Question
> ========
>
> So, what does the community think?  Should we prefer the profiling or the 
> decorator approach?  I'm calling for a vote, because I plan to get this 
> into Sage by, say, the end of this year.
>
> Best,
> Martin
>
> PS: My personal vote is +1 decorators
>
>
