Thank you, sir, for the clarification.

On Sat, Jun 19, 2021 at 7:24 PM Konrad Hinsen <konrad.hin...@cnrs.fr> wrote:

> Dear Balaji,
>
> > I am working on implementing the DataFrame>>dtypes feature, which
> > checks the datatypes of the columns in a DataFrame, as part of my GSOC
> > project. I have tried to explain the theoretical work I have done so
> > far in this blog post. Please kindly go through it, as I need advice on
> > what could be the optimal way to implement this feature. Any kind of
> > input and discussion is most welcome.
>
> Your post looks like an overall accurate description of the current
> state of everything - with one exception, and that is Pandas. You say
> you didn't look at the Pandas code yet, so that's not surprising.
>
> You seem to assume that Pandas stores Python objects as elements of
> DataFrames, but that isn't true. Pandas uses NumPy arrays instead. And
> NumPy arrays are very different from standard Python objects, because
> their internal data layout is by design the same as used in C or
> Fortran. For a full description, see
>
>   https://numpy.org/doc/stable/user/basics.rec.html
>
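To make the difference in data layout concrete, here is a minimal sketch. It assumes the standard-library array module as a stand-in for a NumPy array (it is not NumPy itself, only a similar fixed-type buffer), and the helper name try_store_string is made up for illustration.

```python
import array
import sys

# Stand-in for a NumPy integer array (an assumption, not NumPy itself):
# one contiguous buffer of fixed-size C integers.
typed = array.array("q", [1, 2, 3])   # "q" = signed C long long

# A plain list stores references to separate Python int objects instead.
boxed = [1, 2, 3]

# Raw payload of the typed buffer: element size times element count.
packed_bytes = typed.itemsize * len(typed)

# Size of the three int objects, not counting the list that refers to them.
boxed_bytes = sum(sys.getsizeof(x) for x in boxed)


def try_store_string(column):
    """Hypothetical helper: return the name of the exception raised
    when a string is stored at index 0, or None if it is accepted."""
    try:
        column[0] = "abc"
    except TypeError as exc:
        return type(exc).__name__
    return None


typed_result = try_store_string(typed)   # the typed buffer refuses the string
boxed_result = try_store_string(boxed)   # the list accepts any object
```

The element type of the typed buffer is fixed by the type code given at creation, which is why the string is rejected, while the list simply replaces one object reference with another.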
> However, I am not sure you need to understand this in all detail, as I
> am pretty sure that you do not want to copy this approach in Pharo.
>
> The one point that does matter for you is where NumPy and Pandas take a
> column's dtype from. The answer is that it's defined when the underlying
> array is created, and a NumPy array's dtype cannot be changed afterwards.
> If an array is "integer", it will remain "integer" forever. If you try
> to assign a non-numeric string to an element of such an array, you get
> an error message. (Pandas itself can be more lenient, depending on the
> version: it may convert the whole column to the generic "object" dtype
> instead, and astype() builds an explicitly converted copy.) When you
> create a DataFrame from existing data, e.g. by reading a CSV file,
> Pandas scans the data and determines a suitable dtype, much in the same
> way as V1.0 in Pharo/PolyMath did. But since the dtype is then stored
> with the column rather than recomputed on every query, there is no
> serious performance issue.
>
> So that's an option you can add to your list: define the dtypes once and
> for all when the DataFrame is created. The main drawback is that you
> would have to change the API for DataFrame creation to make this work.
>
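The "define the dtypes once at creation" option could be sketched roughly as follows. This is written in Python only for illustration; the names TypedColumn and infer_dtype and the inference rule are assumptions, not the Pandas implementation or an existing Pharo DataFrame API.

```python
from typing import Any, List, Sequence


def infer_dtype(values: Sequence[Any]) -> type:
    """Hypothetical rule: if every value has exactly the same type,
    that type is the dtype; otherwise fall back to the generic object."""
    kinds = {type(v) for v in values}
    return kinds.pop() if len(kinds) == 1 else object


class TypedColumn:
    """Hypothetical column whose dtype is decided at creation and kept."""

    def __init__(self, values: Sequence[Any]) -> None:
        self.dtype = infer_dtype(values)      # the only full scan of the data
        self._values: List[Any] = list(values)

    def __len__(self) -> int:
        return len(self._values)

    def __getitem__(self, index: int) -> Any:
        return self._values[index]

    def __setitem__(self, index: int, value: Any) -> None:
        # Constant-time check against the stored dtype; no rescan needed.
        if self.dtype is not object and type(value) is not self.dtype:
            raise TypeError(
                f"cannot store {type(value).__name__} "
                f"in a {self.dtype.__name__} column"
            )
        self._values[index] = value
```

With this sketch, TypedColumn([31, 45, 27]).dtype is int, storing 46 at any index succeeds, and storing "old" raises TypeError and leaves the column unchanged. Asking for the dtype never rescans the data, at the price of a stricter creation and assignment API.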
> Cheers,
>   Konrad
> --
> ---------------------------------------------------------------------
> Konrad Hinsen
> Centre de Biophysique Moléculaire, CNRS Orléans
> Synchrotron Soleil - Division Expériences
> Saint Aubin - BP 48
> 91192 Gif sur Yvette Cedex, France
> Tel. +33-1 69 35 97 15
> E-Mail: konrad DOT hinsen AT cnrs DOT fr
> http://dirac.cnrs-orleans.fr/~hinsen/
> ORCID: https://orcid.org/0000-0003-0330-9428
> Twitter: @khinsen
> ---------------------------------------------------------------------
>
