Re: Suggestion on Spark 2.4.7 vs Spark 3 for Kubernetes

2021-01-05 Thread Prashant Sharma
A lot of developers may have already moved to 3.0.x. FYI, 3.1.0 is just around the corner (hopefully in a few days) and has a lot of improvements to Spark on K8s, including the transition from experimental to GA in this release. See: https://issues.apache.org/jira/browse/SPARK-33005 Than

Re: Spark DF does not rename the column

2021-01-05 Thread Mich Talebzadeh
Yes, many thanks German. Jayesh kindly reminded me about it. It is amazing how one at times overlooks these typos and assumes that code which is not working calls for a more sophisticated investigation. Mich LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

A question on extrapolation of a nonlinear curve fit beyond x value

2021-01-05 Thread Mich Talebzadeh
Hi, I am not sure the Spark forum is the correct avenue for this question. I am using PySpark with matplotlib to get the best fit for data using the Lorentzian model. This curve uses 2010-2020 data points (11 on the x-axis). I need to predict the prices for years 2021-2025 based on this fit. So

Re: A question on extrapolation of a nonlinear curve fit beyond x value

2021-01-05 Thread Sean Owen
If your data set is 11 points, surely this is not a distributed problem? or are you asking how to build tens of thousands of those projections in parallel? On Tue, Jan 5, 2021 at 6:04 AM Mich Talebzadeh wrote: > Hi, > > I am not sure Spark forum is the correct avenue for this question. > > I am

Re: A question on extrapolation of a nonlinear curve fit beyond x value

2021-01-05 Thread Mich Talebzadeh
Thanks Sean. This is the gist of the case: I have data points for the x-axis from 2010 to 2020 and values for the y-axis. I am using PySpark, pandas and matplotlib. Data is read into PySpark from the underlying database and a pandas DataFrame is buil

Re: A question on extrapolation of a nonlinear curve fit beyond x value

2021-01-05 Thread Sean Owen
You will need to use matplotlib on the driver to plot in any event. If this is a single extrapolation, over 11 data points, you can just use Spark to do the aggregation, call .toPandas, and do whatever you want in the Python ecosystem to fit and plot that result. On Tue, Jan 5, 2021 at 9:18 AM Mic

Re: A question on extrapolation of a nonlinear curve fit beyond x value

2021-01-05 Thread Mich Talebzadeh
Thanks again. Just to clarify, I want to see the average price for the years 2021, 2022, etc. based on the best fit. So naively, if someone asked what the average price will be in 2022, I should be able to make some predictions. I can of course crudely use pen and pencil like shown in the attac

Re: A question on extrapolation of a nonlinear curve fit beyond x value

2021-01-05 Thread Sean Owen
You need to fit a curve to those points using your chosen model. It sounds like you want scipy's curve_fit maybe? matplotlib is for plotting, not curve fitting. But that and the plotting have nothing to do with Spark here. Spark gives you the data as pandas so you can use all these tools as you like
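
The curve_fit suggestion above can be sketched roughly as follows. This is a minimal illustration, not the code from the thread: the four-parameter Lorentzian form, the synthetic prices, the initial guess p0, and the names lorentzian, years, prices and true_params are all assumptions made for the example. In practice the x and y arrays would come from the pandas DataFrame obtained from Spark.

```python
import numpy as np
from scipy.optimize import curve_fit


def lorentzian(x, amplitude, center, width, offset):
    """Lorentzian peak with half-width `width` on top of a constant offset (assumed form)."""
    return amplitude * width**2 / ((x - center) ** 2 + width**2) + offset


# Synthetic yearly averages for 2010-2020 (hypothetical values, not the thread's data).
years = np.arange(2010, 2021, dtype=float)
true_params = (200.0, 2017.0, 4.0, 300.0)
prices = lorentzian(years, *true_params)

# Shift x to start at 0 so the optimizer works with small numbers.
x0 = years.min()
# Rough initial guess: peak height, peak position, a few years of width, baseline.
p0 = (prices.max() - prices.min(), years[np.argmax(prices)] - x0, 3.0, prices.min())
params, _ = curve_fit(lorentzian, years - x0, prices, p0=p0)

# Evaluate the fitted curve beyond the observed range (extrapolation).
future_years = np.arange(2021, 2026, dtype=float)
predicted = lorentzian(future_years - x0, *params)
for year, price in zip(future_years, predicted):
    print(f"{int(year)}: {price:.2f}")
```

With only 11 points, values extrapolated five years past the data depend heavily on the chosen model, so they are probably best treated as rough indications rather than forecasts.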

Re: A question on extrapolation of a nonlinear curve fit beyond x value

2021-01-05 Thread Mich Talebzadeh
OK, will try it, thanks. On Tue, 5 Jan 2021 at 15:42, Sean Owen wrote: > You need to fit a curve to those points using your chosen model. It sounds > like you want scipy's curve_fit maybe? matplotlib is for plotting, not > curve fitting. > But that and the plotting are nothing to do with Spark he

Re: Suggestion on Spark 2.4.7 vs Spark 3 for Kubernetes

2021-01-05 Thread Sachit Murarka
Thanks for the link Prashant. Regards Sachit On Tue, 5 Jan 2021, 15:08 Prashant Sharma, wrote: > A lot of developers may have already moved to 3.0.x, FYI 3.1.0 is just > around the corner hopefully(in a few days) and has a lot of improvements to > spark on K8s, including it will be transitioni

Extending GraphFrames without running into serialization issues

2021-01-05 Thread Michal Monselise
Hi, I am trying to extend GraphFrames and create my own class that has some additional graph functionality. To simplify for this example, I have created a class that doesn't contain any functions. All it does is just extend GraphFrames: import org.apache.spark.sql.DataFrame import org.graphframes

Re: Extending GraphFrames without running into serialization issues

2021-01-05 Thread Sean Owen
It's because this calls the no-arg superclass constructor that sets _vertices and _edges in the actual GraphFrame class to null. That yields the error. Normally you'd just show you want to call the two-arg superclass constructor with "extends GraphFrame(_vertices, _edges)" but that constructor is p