Column bind in R is a column-wise concatenation in pandas: https://www.datasciencemadesimple.com/append-concatenate-columns-python-pandas-column-bind/
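[Editorial note: a minimal runnable sketch of that column bind, shown with plain pandas; the pandas API on Spark exposes the same call as `ps.concat`, so the snippet carries over by swapping the import. The frame names are illustrative only.]

```python
import pandas as pd  # on Spark: from pyspark import pandas as ps

# R's cbind() is a column-wise concatenation in pandas:
# axis=1 places the frames side by side, aligning rows on the index.
df1 = pd.DataFrame({"A": [1, 2, 3]})
df2 = pd.DataFrame({"B": [4, 5, 6]})

bound = pd.concat([df1, df2], axis=1)
print(list(bound.columns))  # ['A', 'B']
print(bound.shape)          # (3, 2)
```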
Please start a new thread for each question.

On Tue, 15 Mar 2022 at 22:59, Andrew Davidson <aedav...@ucsc.edu> wrote:

> Many, many thanks!
>
> I have been looking for a pyspark dataframe column_bind() solution for several months. Hopefully pyspark.pandas works. The only other solution I was aware of was to use spark.dataframe.join(), which does not scale, for obvious reasons.
>
> Andy
>
> From: Bjørn Jørgensen <bjornjorgen...@gmail.com>
> Date: Tuesday, March 15, 2022 at 2:19 PM
> To: Andrew Davidson <aedav...@ucsc.edu>
> Cc: Mich Talebzadeh <mich.talebza...@gmail.com>, "user @spark" <user@spark.apache.org>
> Subject: Re: pivoting panda dataframe
>
> Hi Andrew. Mich asked, and I answered, about transpose():
> https://spark.apache.org/docs/latest/api/python/reference/pyspark.pandas/api/pyspark.pandas.DataFrame.transpose.html
>
> And now you are asking in the same thread about the pandas API on Spark and transform().
>
> Apache Spark has a pandas API on Spark. This means that Spark has API calls for pandas functions; when you use the pandas API on Spark, it is Spark you are using.
>
> Add this line to your imports:
>
> from pyspark import pandas as ps
>
> Now you can pass your dataframe back and forth to the pandas API on Spark by using:
>
> pf01 = f01.to_pandas_on_spark()
> f01 = pf01.to_spark()
>
> Note that I have changed pd to ps here.
>
> df = ps.DataFrame({'A': range(3), 'B': range(1, 4)})
> df.transform(lambda x: x + 1)
>
> You will now see that all numbers are incremented by 1.
>
> You can find more information about transform in the pandas API on Spark at
> https://spark.apache.org/docs/latest/api/python/reference/pyspark.pandas/api/pyspark.pandas.DataFrame.transform.html?highlight=pyspark%20pandas%20dataframe%20transform#pyspark.pandas.DataFrame.transform
> or in your notebook with:
>
> df.transform?
> Signature:
> df.transform(
>     func: Callable[..., ForwardRef('Series')],
>     axis: Union[int, str] = 0,
>     *args: Any,
>     **kwargs: Any,
> ) -> 'DataFrame'
> Docstring:
> Call ``func`` on self producing a Series with transformed values and that has the same length as its input.
>
> See also `Transform and apply a function <https://koalas.readthedocs.io/en/latest/user_guide/transform_apply.html>`_.
>
> .. note:: this API executes the function once to infer the type, which is potentially expensive, for instance, when the dataset is created after aggregations or sorting.
>
>     To avoid this, specify the return type in ``func``, for instance, as below:
>
>     >>> def square(x) -> ps.Series[np.int32]:
>     ...     return x ** 2
>
>     pandas-on-Spark uses the return type hint and does not try to infer the type.
>
> .. note:: the series within ``func`` is actually multiple pandas series as the segments of the whole pandas-on-Spark series; therefore, the length of each series is not guaranteed. As an example, an aggregation against each series works not as a global aggregation but as an aggregation of each segment. See below:
>
>     >>> def func(x) -> ps.Series[np.int32]:
>     ...     return x + sum(x)
>
> Parameters
> ----------
> func : function
>     Function to use for transforming the data. It must work when a pandas Series is passed.
> axis : int, default 0 or 'index'
>     Can only be set to 0 at the moment.
> *args
>     Positional arguments to pass to func.
> **kwargs
>     Keyword arguments to pass to func.
>
> Returns
> -------
> DataFrame
>     A DataFrame that must have the same length as self.
>
> Raises
> ------
> Exception : If the returned DataFrame has a different length than self.
>
> See Also
> --------
> DataFrame.aggregate : Only perform aggregating type operations.
> DataFrame.apply : Invoke function on DataFrame.
> Series.transform : The equivalent function for Series.
>
> Examples
> --------
> >>> df = ps.DataFrame({'A': range(3), 'B': range(1, 4)}, columns=['A', 'B'])
> >>> df
>    A  B
> 0  0  1
> 1  1  2
> 2  2  3
>
> >>> def square(x) -> ps.Series[np.int32]:
> ...     return x ** 2
> >>> df.transform(square)
>    A  B
> 0  0  1
> 1  1  4
> 2  4  9
>
> You can omit the type hint and let pandas-on-Spark infer its type.
>
> >>> df.transform(lambda x: x ** 2)
>    A  B
> 0  0  1
> 1  1  4
> 2  4  9
>
> For multi-index columns:
>
> >>> df.columns = [('X', 'A'), ('X', 'B')]
> >>> df.transform(square)  # doctest: +NORMALIZE_WHITESPACE
>    X
>    A  B
> 0  0  1
> 1  1  4
> 2  4  9
>
> >>> (df * -1).transform(abs)  # doctest: +NORMALIZE_WHITESPACE
>    X
>    A  B
> 0  0  1
> 1  1  2
> 2  2  3
>
> You can also specify extra arguments.
>
> >>> def calculation(x, y, z) -> ps.Series[int]:
> ...     return x ** y + z
> >>> df.transform(calculation, y=10, z=20)  # doctest: +NORMALIZE_WHITESPACE
>       X
>       A      B
> 0    20     21
> 1    21   1044
> 2  1044  59069
>
> File: /opt/spark/python/pyspark/pandas/frame.py
> Type: method
>
> On Tue, 15 Mar 2022 at 19:33, Andrew Davidson <aedav...@ucsc.edu> wrote:
>
> Hi Bjørn,
>
> I have been looking for a Spark transform for a while. Can you send me a link to the pyspark function?
>
> I assume pandas transform is not really an option. I think it will try to pull the entire dataframe into the driver's memory.
>
> Kind regards
>
> Andy
>
> p.s. My real problem is that Spark does not allow you to bind columns. You can use union() to bind rows.
> I could get the equivalent of cbind() using union().transform().
>
> From: Bjørn Jørgensen <bjornjorgen...@gmail.com>
> Date: Tuesday, March 15, 2022 at 10:37 AM
> To: Mich Talebzadeh <mich.talebza...@gmail.com>
> Cc: "user @spark" <user@spark.apache.org>
> Subject: Re: pivoting panda dataframe
>
> https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.transpose.html
> We have that transpose in the pandas API for Spark too.
>
> You also have stack() and multilevel reshaping:
> https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html
>
> On Tue, 15 Mar 2022 at 17:50, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Hi,
>
> Is it possible to pivot a pandas dataframe by making a row the column headings?
>
> Thanks
>
> View my LinkedIn profile: https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>
> --
> Bjørn Jørgensen
> Vestre Aspehaug 4, 6010 Ålesund
> Norge
> +47 480 94 297
>
> --
> Bjørn Jørgensen
> Vestre Aspehaug 4, 6010 Ålesund
> Norge
> +47 480 94 297
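[Editorial note: the transform() examples quoted from the docstring above share their API shape with plain pandas, so they can be tried locally before moving to a cluster. A minimal sketch with plain pandas follows; on Spark you would add the return-type hint (e.g. `-> ps.Series[np.int32]`) so pandas-on-Spark can skip the type-inference pass the docstring warns about.]

```python
import pandas as pd  # on Spark: from pyspark import pandas as ps

df = pd.DataFrame({'A': range(3), 'B': range(1, 4)})

# Element-wise square, as in the docstring example.
squared = df.transform(lambda x: x ** 2)
print(squared['B'].tolist())  # [1, 4, 9]

# Extra arguments are forwarded to func, as in the docstring's
# calculation(x, y, z) example: each element becomes x ** y + z.
def calculation(x, y, z):
    return x ** y + z

result = df.transform(calculation, y=10, z=20)
print(result['A'].tolist())  # [20, 21, 1044]
```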