On Sun, 23 Jan 2022 07:34:26 -0800, Tobiah <t...@tobiah.org> declaimed the following:

	I'm going to do a little rearranging of your paragraphs, since most of
them are domain specific, whereas the last (original) paragraph actually
gets to a core...

	Caveat: I've not written anything making use of either package, so my
only basis for commenting is what I've read on web sites (like the pandas
documentation site)

>
>It seems like both libraries are possible choices.  Would one
>be the obvious choice for me?
>

	pandas USES numpy internally but expands on it...

https://www.geeksforgeeks.org/difference-between-pandas-vs-numpy/

"""
A numpy array is a grid of values (of the same type) that are indexed by a
tuple of positive integers,
"""
Pandas provide high performance, fast, easy to use data structures and data
analysis tools for manipulating numeric data and time series.
"""

	Pandas, I believe, might get closer to what one might find in
statistical packages (like R) in that it supports tables/data-frames in
which each column may be a different data type. I don't know if it actually
has the statistics concept of "factors" (eg: a column containing
"male"/"female" is not really a text column but closer to an enumeration
type).

>I need to compose large (hundreds, thousands, maybe millions) lists
>and be able to do math on, or possibly sort by various columns, among other
>operations.  A common requirement would be to do the same math operation
>on each value in a column, or redistribute the values according to an
>exponential curve, etc.

	En masse operations should be supported; not sure about the
"redistribute" -- if you can define a function that takes one input
parameter (the existing value) and returns the redistributed value, I'd
think it should be feasible.

>
>One wrinkle is that the first column of a Csound score is actually a
>single character.  I was thinking if the data types all had to be the
>same, then I'd make a translation table or just use the ascii value
>of the character, but if I could mix types that might be a smidge better.
>

	Based upon the comparison I linked, pandas should be applicable for
this. For pure numpy, you'd likely be better off maintaining a separate
list (though sorting will require some tricks to keep the numpy array in
sync with the character list).

	Note that the comparison warns that /indexing/ in pandas can be slow.
If your manipulation is always "apply operationX to columnY" it should be
okay -- but "apply operationX to the nth row of columnY", and repeat for
other rows, is going to be slow.


-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
	wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/
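
	As a rough sketch of the two approaches discussed above (not checked
against a real Csound score): the column names, the sample values, and the
redistribute() helper are all made up for illustration, and the exponential
curve is just one plausible reading of "redistribute". The first half does
"apply operationX to columnY" and a multi-column sort with pandas; the
second half shows the kind of trick pure numpy needs to keep a separate
character list in sync when sorting.

```python
import numpy as np
import pandas as pd

# Hypothetical score data: names and values are invented for this sketch.
score = pd.DataFrame({
    "op": ["i", "i", "f", "i"],      # single-character first column
    "start": [2.0, 0.0, 0.0, 1.0],
    "dur": [1.0, 0.5, 0.0, 2.0],
    "amp": [0.2, 0.8, 0.0, 0.5],
})

# Same math operation on every value in a column, with no explicit loop.
score["start"] = score["start"] * 2.0


def redistribute(x):
    """Hypothetical remapping of values in 0..1 onto an exponential curve."""
    return np.expm1(3.0 * x) / np.expm1(3.0)


# A function of one input works on the whole column when it is built from
# numpy operations; the result is again a column.
score["amp"] = redistribute(score["amp"])

# Sort by several columns; the character column travels with its row.
score = score.sort_values(["start", "dur"]).reset_index(drop=True)

# Pure numpy alternative: numbers in an array, characters in a separate
# list, and one index array from argsort used to reorder both.
ops = ["i", "i", "f", "i"]
data = np.array([
    [2.0, 1.0],
    [0.0, 0.5],
    [0.0, 0.0],
    [1.0, 2.0],
])
order = np.argsort(data[:, 0], kind="stable")  # sort by the first column
data_sorted = data[order]
ops_sorted = [ops[k] for k in order]
```

	On the "factors" question: pandas does have a categorical dtype
(score["op"].astype("category")), which is roughly the same idea as an
enumeration-like column.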