Column bind in R is a column-wise concatenation in pandas: https://www.datasciencemadesimple.com/append-concatenate-columns-python-pandas-column-bind/
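[Editorial note: a minimal runnable sketch of that column bind, shown with plain pandas; the pandas API on Spark exposes the same call as `ps.concat`, so the snippet carries over by swapping the import. The frame names are illustrative only.]

```python
import pandas as pd  # on Spark: from pyspark import pandas as ps

# R's cbind() is a column-wise concatenation in pandas:
# axis=1 places the frames side by side, aligning rows on the index.
df1 = pd.DataFrame({"A": [1, 2, 3]})
df2 = pd.DataFrame({"B": [4, 5, 6]})

bound = pd.concat([df1, df2], axis=1)
print(list(bound.columns))  # ['A', 'B']
print(bound.shape)          # (3, 2)
```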
Please start a new thread for each question.

On Tue, 15 Mar 2022 at 22:59, Andrew Davidson <aedav...@ucsc.edu> wrote:

> Many, many thanks!
>
> I have been looking for a pyspark dataframe column_bind() solution for several months. Hopefully pyspark.pandas works. The only other solution I was aware of was to use spark.dataframe.join(), which does not scale, for obvious reasons.
>
> Andy
>
> From: Bjørn Jørgensen <bjornjorgen...@gmail.com>
> Date: Tuesday, March 15, 2022 at 2:19 PM
> To: Andrew Davidson <aedav...@ucsc.edu>
> Cc: Mich Talebzadeh <mich.talebza...@gmail.com>, "user @spark" <user@spark.apache.org>
> Subject: Re: pivoting panda dataframe
>
> Hi Andrew. Mich asked, and I answered, about transpose():
> https://spark.apache.org/docs/latest/api/python/reference/pyspark.pandas/api/pyspark.pandas.DataFrame.transpose.html
>
> And now you are asking in the same thread about the pandas API on Spark and transform().
>
> Apache Spark has a pandas API on Spark. This means that Spark has API calls for pandas functions; when you use the pandas API on Spark, it is Spark you are using.
>
> Add this line to your imports:
>
> from pyspark import pandas as ps
>
> Now you can pass your dataframe back and forth to the pandas API on Spark by using:
>
> pf01 = f01.to_pandas_on_spark()
> f01 = pf01.to_spark()
>
> Note that I have changed pd to ps here.
>
> df = ps.DataFrame({'A': range(3), 'B': range(1, 4)})
> df.transform(lambda x: x + 1)
>
> You will now see that all numbers are incremented by 1.
>
> You can find more information about transform in the pandas API on Spark at
> https://spark.apache.org/docs/latest/api/python/reference/pyspark.pandas/api/pyspark.pandas.DataFrame.transform.html?highlight=pyspark%20pandas%20dataframe%20transform#pyspark.pandas.DataFrame.transform
> or in your notebook with:
>
> df.transform?
> Signature:
> df.transform(
>     func: Callable[..., ForwardRef('Series')],
>     axis: Union[int, str] = 0,
>     *args: Any,
>     **kwargs: Any,
> ) -> 'DataFrame'
> Docstring:
> Call ``func`` on self producing a Series with transformed values and that has the same length as its input.
>
> See also `Transform and apply a function <https://koalas.readthedocs.io/en/latest/user_guide/transform_apply.html>`_.
>
> .. note:: this API executes the function once to infer the type, which is potentially expensive, for instance, when the dataset is created after aggregations or sorting.
>
>     To avoid this, specify the return type in ``func``, for instance, as below:
>
>     >>> def square(x) -> ps.Series[np.int32]:
>     ...     return x ** 2
>
>     pandas-on-Spark uses the return type hint and does not try to infer the type.
>
> .. note:: the series within ``func`` is actually multiple pandas series as the segments of the whole pandas-on-Spark series; therefore, the length of each series is not guaranteed. As an example, an aggregation against each series works not as a global aggregation but as an aggregation of each segment. See below:
>
>     >>> def func(x) -> ps.Series[np.int32]:
>     ...     return x + sum(x)
>
> Parameters
> ----------
> func : function
>     Function to use for transforming the data. It must work when a pandas Series is passed.
> axis : int, default 0 or 'index'
>     Can only be set to 0 at the moment.
> *args
>     Positional arguments to pass to func.
> **kwargs
>     Keyword arguments to pass to func.
>
> Returns
> -------
> DataFrame
>     A DataFrame that must have the same length as self.
>
> Raises
> ------
> Exception : If the returned DataFrame has a different length than self.
>
> See Also
> --------
> DataFrame.aggregate : Only perform aggregating type operations.
> DataFrame.apply : Invoke function on DataFrame.
> Series.transform : The equivalent function for Series.
>
> Examples
> --------
> >>> df = ps.DataFrame({'A': range(3), 'B': range(1, 4)}, columns=['A', 'B'])
> >>> df
>    A  B
> 0  0  1
> 1  1  2
> 2  2  3
>
> >>> def square(x) -> ps.Series[np.int32]:
> ...     return x ** 2
> >>> df.transform(square)
>    A  B
> 0  0  1
> 1  1  4
> 2  4  9
>
> You can omit the type hint and let pandas-on-Spark infer its type.
>
> >>> df.transform(lambda x: x ** 2)
>    A  B
> 0  0  1
> 1  1  4
> 2  4  9
>
> For multi-index columns:
>
> >>> df.columns = [('X', 'A'), ('X', 'B')]
> >>> df.transform(square)  # doctest: +NORMALIZE_WHITESPACE
>    X
>    A  B
> 0  0  1
> 1  1  4
> 2  4  9
>
> >>> (df * -1).transform(abs)  # doctest: +NORMALIZE_WHITESPACE
>    X
>    A  B
> 0  0  1
> 1  1  2
> 2  2  3
>
> You can also specify extra arguments.
>
> >>> def calculation(x, y, z) -> ps.Series[int]:
> ...     return x ** y + z
> >>> df.transform(calculation, y=10, z=20)  # doctest: +NORMALIZE_WHITESPACE
>       X
>       A      B
> 0    20     21
> 1    21   1044
> 2  1044  59069
>
> File: /opt/spark/python/pyspark/pandas/frame.py
> Type: method
>
> On Tue, 15 Mar 2022 at 19:33, Andrew Davidson <aedav...@ucsc.edu> wrote:
>
> Hi Bjørn,
>
> I have been looking for a Spark transform for a while. Can you send me a link to the pyspark function?
>
> I assume pandas transform is not really an option. I think it will try to pull the entire dataframe into the driver's memory.
>
> Kind regards
>
> Andy
>
> p.s. My real problem is that Spark does not allow you to bind columns. You can use union() to bind rows.
> I could get the equivalent of cbind() using union().transform().
>
> From: Bjørn Jørgensen <bjornjorgen...@gmail.com>
> Date: Tuesday, March 15, 2022 at 10:37 AM
> To: Mich Talebzadeh <mich.talebza...@gmail.com>
> Cc: "user @spark" <user@spark.apache.org>
> Subject: Re: pivoting panda dataframe
>
> https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.transpose.html
> We have that transpose in the pandas API for Spark too.
>
> You also have stack() and multilevel reshaping:
> https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html
>
> On Tue, 15 Mar 2022 at 17:50, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Hi,
>
> Is it possible to pivot a pandas dataframe by making a row the column headings?
>
> Thanks
>
> View my LinkedIn profile: https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>
> --
> Bjørn Jørgensen
> Vestre Aspehaug 4, 6010 Ålesund
> Norge
> +47 480 94 297
>
> --
> Bjørn Jørgensen
> Vestre Aspehaug 4, 6010 Ålesund
> Norge
> +47 480 94 297
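[Editorial note: the transform() examples quoted from the docstring above share their API shape with plain pandas, so they can be tried locally before moving to a cluster. A minimal sketch with plain pandas follows; on Spark you would add the return-type hint (e.g. `-> ps.Series[np.int32]`) so pandas-on-Spark can skip the type-inference pass the docstring warns about.]

```python
import pandas as pd  # on Spark: from pyspark import pandas as ps

df = pd.DataFrame({'A': range(3), 'B': range(1, 4)})

# Element-wise square, as in the docstring example.
squared = df.transform(lambda x: x ** 2)
print(squared['B'].tolist())  # [1, 4, 9]

# Extra arguments are forwarded to func, as in the docstring's
# calculation(x, y, z) example: each element becomes x ** y + z.
def calculation(x, y, z):
    return x ** y + z

result = df.transform(calculation, y=10, z=20)
print(result['A'].tolist())  # [20, 21, 1044]
```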