On Thursday, 15 October 2015 at 07:57:51 UTC, Russel Winder wrote:
On Thu, 2015-10-15 at 06:48 +0000, data pulverizer via Digitalmars-d-learn wrote:
[…]
A journey of a thousand miles ...
Exactly.
I tried to start creating a data table type object by investigating variantArray:
http://forum.dlang.org/thread/hhzavwrkbrkjzfohc...@forum.dlang.org
but hit the snag that D is a statically typed language and may not allow the kind of dynamic behaviour you need in data-table-like objects.
I envisage such an object as being composed of an array of vectors, where each vector represents a column in the table, as in R; that makes model matrix creation easier. Some people believe you should instead work with arrays of tuple rows, which may be more big-data friendly (both layouts are sketched just after this quote). I am not overly wedded to either approach.
Anyway, it seems I have hit an inherent limitation of the language. Correct me if I am wrong. The data frame needs dynamic behaviour: bind rows and columns, return parts of itself as a data table, and so on; since D is a static language, we cannot do this.
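For concreteness, here is a minimal sketch in D of the two layouts being weighed; ColumnTable and Row are illustrative names, not an existing library:

import std.typecons : Tuple;

// Column-oriented, as in R: one typed array per column.
// Convenient for building a model matrix column by column.
struct ColumnTable
{
    double[] price;
    string[] ticker;
}

// Row-oriented: an array of named tuples, one per record.
// Rows stream naturally, which is what makes this layout
// more "big data friendly".
alias Row = Tuple!(double, "price", string, "ticker");
alias RowTable = Row[];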
Just because D doesn't have this now doesn't mean it cannot. C doesn't have such a capability, but R and Python do, even though the R and CPython interpreters are themselves written in C.
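In fact, the dynamic behaviour asked for above can already be approximated in static D. A minimal sketch using std.variant.Variant, where each column is a typed array held behind a runtime-typed handle (the column names are made up):

import std.variant : Variant;
import std.stdio : writeln;

void main()
{
    // Each column is a typed array stored behind a Variant,
    // so columns of different types share one container.
    Variant[string] columns;
    columns["id"]   = [1, 2, 3];
    columns["name"] = ["a", "b", "c"];

    // "Binding" a new column is a runtime operation.
    columns["score"] = [0.5, 0.7, 0.9];

    // Recover the static type before computing with a column.
    auto ids = columns["id"].get!(int[]);
    writeln(ids); // [1, 2, 3]
}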
Pandas data structures rely on the NumPy n-dimensional array implementation; it is not beyond the bounds of possibility that that data structure could be realized as a D module.
Is R's data.table written in R or in C? In either case, it is
not beyond the bounds of possibility that that data structure
could be realized as a D module.
The core issue is to have a seriously efficient n-dimensional
array that is amenable to data parallelism and is extensible.
As far as I am aware currently (I will investigate more), the NumPy array is a good native-code array, but it has some issues with data parallelism, and Pandas has to do quite a lot of work to get the extensibility. I wonder how the R data.table works.
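As a hedged sketch of what data parallelism over such an array can look like in D today, using std.parallelism (the flat buffer and the squaring operation are arbitrary stand-ins):

import std.array : array;
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;

void main()
{
    // A flat buffer standing in for the storage of an
    // n-dimensional array.
    auto data = iota(0L, 10_000_000L).array;

    // Element-wise map, spread across all cores.
    auto squared = taskPool.amap!(x => x * x)(data);

    writeln(squared[0 .. 5]); // [0, 1, 4, 9, 16]
}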
I have this nagging feeling that, like NumPy, data.table seems a lot better than it actually is. From small experiments, D (and Chapel even more so) is hugely faster than Python/NumPy at things Python people think NumPy is brilliant for. Expectations of Python programmers are set by the scale of Python performance, so NumPy seems brilliant. Compared to the scale set by D and Chapel, NumPy is very disappointing. I bet the same is true of R (I have never really used R).
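The sort of small experiment meant here is easy to reproduce. A sketch of timing a simple reduction in D (array size and operation chosen arbitrarily), to be compared against the equivalent NumPy expression on the same machine:

import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writeln;

void main()
{
    auto data = new double[](10_000_000);
    data[] = 1.5;

    // Time a plain element-wise reduction.
    auto sw = StopWatch(AutoStart.yes);
    double sum = 0;
    foreach (x; data)
        sum += x * x;
    sw.stop();

    writeln("sum = ", sum, " in ", sw.peek.total!"msecs", " ms");
}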
This is therefore an opportunity for D to step in. However, it is a journey of a thousand miles to get something production-worthy. Python/NumPy/Pandas have had a very large number of programmer hours expended on them. Doing this poorly as a D module is likely worse than not doing it at all.
I think it's much better to start, which means solving your own
problems in a way that is acceptable to you rather than letting
perfection be the enemy of the good. It's always easier to do
something a second time too, as you learn from successes and
mistakes and you have a better idea about what you want. Of course it's better to put some thought into design early on, but that shouldn't end in analysis paralysis. John Colvin and others are putting quite a lot of thought into dlang science, it seems to me, and John in particular is getting stuff done. Running D in a Jupyter notebook is something very useful. It doesn't matter that it's cosmetically imperfect at this stage, and it won't stay that way. And that's just a small step towards the bigger goal.