[sage-devel] Re: Tracking Citations; How?

Bill Hart Sat, 06 Sep 2014 09:45:11 -0700

One additional technical point.

The profiling approach need not slow Sage down.


The same approach is used in another project I'm aware of (not for citation 
though). It takes samples at regular points during the computation, figures 
out which function is currently being called and tallies the results.

In my experience this is not noticeably slower than running the computation 
without profiling...

Actually, I just checked an example, and it slowed down a complex computation 
by about 15%. It took around 2400 stack traces over a 2.4s period. These 
intercepted calls into around 44 different high-level functions at around 
260 different function call points, calls into around 122 distinct C 
functions in C libraries compiled with symbols on, and another 500 or so 
interceptions in C libraries with symbols off (probably far fewer distinct 
functions). This resulted in a couple of hundred thousand pieces of data, 
since each stack trace includes a complete backtrace.

One trick they use is to separate the actual collection of samples from the 
processing. The latter happens after the profiling itself stops.
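
The sampling idea described above can be sketched in plain Python. This is 
only a rough illustration under stated assumptions, not the code of the other 
project or of Sage: the names StackSampler, tally and busy are hypothetical, 
and the sampler only sees Python frames. While the computation runs it stores 
raw stack snapshots and nothing else; the tallying happens after sampling has 
stopped.

```python
import collections
import sys
import threading
import time


class StackSampler:
    """Hypothetical helper: snapshot one thread's Python stack at intervals."""

    def __init__(self, interval=0.001):
        self.interval = interval
        self.samples = []  # raw snapshots only; nothing is analysed while running
        self._stop = threading.Event()
        self._thread = None
        self._target = None

    def _run(self):
        while not self._stop.is_set():
            frame = sys._current_frames().get(self._target)
            stack = []
            while frame is not None:
                # Record (module name, function name) for every frame.
                stack.append((frame.f_globals.get("__name__", "?"),
                              frame.f_code.co_name))
                frame = frame.f_back
            if stack:
                self.samples.append(tuple(stack))
            self._stop.wait(self.interval)

    def __enter__(self):
        self._target = threading.get_ident()  # sample the calling thread
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._stop.set()
        self._thread.join()
        return False


def tally(samples):
    """Post-processing step, run only after sampling has stopped."""
    functions = collections.Counter()
    modules = collections.Counter()
    for stack in samples:
        functions.update(set(stack))                     # once per sample
        modules.update({module for module, _ in stack})  # once per sample
    return functions, modules


def busy(seconds):
    """Stand-in for a long computation."""
    deadline = time.perf_counter() + seconds
    total = 0
    while time.perf_counter() < deadline:
        total += sum(i * i for i in range(200))
    return total


if __name__ == "__main__":
    with StackSampler() as sampler:
        busy(0.3)
    functions, modules = tally(sampler.samples)
    print(len(sampler.samples), "samples")
    for (module, name), count in functions.most_common(5):
        print(f"{count:5d}  {module}.{name}")
```

For citation purposes the per-module counts would be the interesting part, 
since they could be matched against a list of wrapped projects. Catching 
calls into C libraries, as in the example I measured, would need a native 
sampler rather than this pure-Python one.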

Bill.

On Saturday, 6 September 2014 18:01:33 UTC+2, Bill Hart wrote:
>
> I don't have any concrete suggestions, but have some random personal 
> thoughts on the matter.
>
> * I personally don't mind Pari being cited if flint actually performs a 
> computation. What I mean is, I'm not sure how important it is to look all 
> the way down the decision tree to see which package actually got called, 
> but rather to see which packages are in general responsible for the 
> computations being performed (even if they aren't actually used in a 
> particular computation) and to cite them all.
>
> (I'm not saying that I wouldn't personally find it very helpful to be able 
> to easily tell which package is carrying out a given calculation involving 
> a given function for a given input. That would be very useful for other 
> reasons. Just not necessarily citation. In particular I might want to look 
> at their source code to see how they did the calculation so 
> fast/slow/correctly/incorrectly.)
>
> * Perhaps it is easier to notionally divide Sage up into discrete domains 
> and to document which packages are ultimately responsible for those 
> domains. Power series over QQ: Packages X, Y and Z, Calculus: Packages S 
> and T, Plotting: Packages P and Q, Something else: Just Sage itself.
>
> * The hardest thing is meaningfully citing packages for a bunch of fairly 
> low level stuff, e.g. integer or polynomial arithmetic, or some 
> computations over the reals or finite fields or big multifaceted 
> computations which cover a multitude of bits and pieces covering many 
> areas. Even in a single area this could be hard. E.g. in doing some 
> algebraic number theory, you might have used Pari, flint, GMP/MPIR, MPFR, 
> NTL, Linbox, IML, Sage, and so on. At what point do you draw the line? Do 
> you include GCC, Cython, Python, autotools, m4, etc? My personal view is 
> one should cite whichever mathematical packages constituted a critical part 
> of your computation. If you could have used almost anything to do your 
> computation, it probably doesn't warrant a citation. If you used the 
> algebraic number theory in Sage precisely because it has features X, Y and 
> Z that you could not have found just about anywhere else, then it would be 
> useful to figure out why Sage can do X, Y and Z and specifically cite any 
> packages that have provided that functionality through Sage.
>
> The unfortunate side effect of doing this is packages like GMP/MPIR won't 
> get widely cited because they are dependencies of other libraries and not 
> often used directly in critical computations. But I'm also not sure how 
> helpful it is for MPIR to get cited along with a dozen other packages for 
> some algebraic number theory computation as opposed to being cited by 
> someone working on a new FFT who compares it against the very fast FFT in 
> MPIR.
>
> * At the end of the day, one of the crucial reasons for citation is not to 
> bring recognition and prestige to the people who wrote the packages, but to 
> aid researchers who are trying to track down prior work in the literature. 
> For example, I might be trying to work out how to compute X, and in reading 
> your paper on X - epsilon I might note that you cite package Y as being 
> critical to your work. I might then look into the code for package Y and 
> see that they have solved part of the problem, and learn something about 
> how they did so. This kind of scientific citation has to be balanced 
> against the prestige motivation, and is surely preferable, scientifically 
> speaking.
>
> I'm encouraged by your efforts to work on this. I guess in summary, my 
> personal opinion is that it might be easier to start with a pragmatic 
> approach which doesn't attempt to do things at such a fine level and which 
> still relies on the researcher to use a good deal of discretion and 
> understanding when citing packages written by others.
>
> Every year or so we do a search to see who has cited flint and MPIR. It's 
> disappointing just how few citations we receive. Either people are just not 
> using flint and MPIR in any way that is critical to their work (definitely 
> a possibility), or writing highly performant C libraries is just not the 
> way to get citations (also very possible). On the other hand, the situation 
> might be improved for us if we spent more time writing papers on new, 
> groundbreaking algorithms being implemented in flint and MPIR.
>  
> Sorry that was a bit rambling. It takes a lot of time and effort to write 
> short, succinct posts. Perhaps my garbled thoughts above will trigger some 
> better thoughts from others who have thought more about the citation 
> problem than me.
>
> Bill.
>
> On Friday, 5 September 2014 15:07:25 UTC+2, Martin Raum wrote:
>>
>> Dear Sage developers,
>>
>> I'm writing to get an impression of the community's opinion on how 
>> citation management should be implemented.  As background, I should say 
>> that I have taken it into my head to modernize citation management in Sage. 
>> I personally find this very important, as it signals respect for the 
>> projects we wrap.  More objectively, I figure such facilities can be a 
>> certain plus when writing the European Sage grant, as many such projects 
>> (Pari, Gap, Singular, FLINT, etc.) are developed in Europe.
>>
>> Current status in Sage
>> ======================
>>
>> Mike Hansen implemented citation facilities in sage.misc.citation.  This 
>> is all we have.
>>
>> sage: from sage.misc.citation import get_systems
>> sage: get_systems("integrate(x^2,x,0,1)")
>> ['ginac', 'Maxima']
>>
>> His implementation uses profiling:
>> 1) Run the given code under control of the profiler.
>> 2) Parse the list of functions called, extracting the list of modules 
>> called.  For example, sage.libs.pari.
>> 3) Match this list against a certain list of projects, given in 
>> sage/misc/citation.pyx.
>>
>> Problems with the current implementation
>> ========================================
>>
>> I'm not trying to put Mike's code down. Actually, I'm really glad he 
>> implemented what we currently have. I'm just saying where we can improve.
>>
>> 1) Use of profiling implies that the code runs much slower.  Tracing 
>> citations for a toy computation may result in failure to pick the right 
>> ones.
>> 2) For technical reasons, we miss functions written in Cython.
>> 3) The subsystems themselves don't tell the user how to cite them.
>> 4) The user is not being made aware of current functionality.
>> 5) The naming scheme could be improved. The interface is not 
>> user-friendly.
>>
>> Two solutions available
>> =======================
>>
>> We have three tickets dealing with this.  At #3317, there is old code by Niels 
>> Ranosch, Michael Brickenstein, Burcin Erocal.  It tries to take a 
>> completely different approach.  At #16777 and #16854, I have provided 
>> improved versions of the current method.
>>
>> The issue
>> =========
>>
>> Burcin has correctly argued at #16854 that the profiling approach is not 
>> capable of tracking decision trees inside a function. I.e., if a function 
>> decides according to some parameter to either call Pari or FLINT, we can't 
>> see this in the profiling.
>>
>> On the other hand, #3317 uses decorators, which have to be applied to 
>> every function that requires citation management.  Alternatively, one can 
>> achieve the same by calling a certain function.  In any event, this means 
>> there will be a slight slowdown of Sage in general.
>>
>> The implementation at #3317 is already really fast, but not optimal. If we 
>> go for the decorator approach, I would speed it up.
>>
>> Question
>> ========
>>
>> So, what does the community think?  Should we prefer the profiling or the 
>> decorator approach?  I'm calling for a vote, because I plan to get this 
>> into Sage by, say, the end of this year.
>>
>> Best,
>> Martin
>>
>> PS: My personal vote is +1 decorators
>>
>>
