Re: A question on extrapolation of a nonlinear curve fit beyond x value

Sean Owen Tue, 05 Jan 2021 07:23:36 -0800

You will need to use matplotlib on the driver to plot in any event. If this
is a single extrapolation over 11 data points, you can just use Spark to
do the aggregation, call .toPandas(), and do whatever you want in the Python
ecosystem to fit and plot that result.
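Extrapolating a fitted curve does not need anything from matplotlib. It is just the fitted model function evaluated at x values beyond the data. With lmfit, calling `result.eval(x=...)` on the `ModelResult` with the new years should return those values, and `result.eval_uncertainty(x=...)` an uncertainty band. The stdlib sketch below makes the arithmetic explicit: it plugs the best-fit `amplitude`, `center` and `sigma` from the fit report quoted further down into the Lorentzian formula that lmfit's `LorentzianModel` uses. The helper names `lorentzian` and `extrapolate` are illustrative only, not part of any library.

```python
import math

# Best-fit parameters copied from the lmfit report for AVGFlatPricePerYear
# (Model(lorentzian), 11 yearly points, 2010-2020).
AMPLITUDE = 31107480.0
CENTER = 2016.75722
SIGMA = 8.37428353


def lorentzian(x, amplitude, center, sigma):
    """Lorentzian in the parameterisation used by lmfit's LorentzianModel:
    amplitude / (pi * sigma) / (1 + ((x - center) / sigma) ** 2).
    Its peak value, amplitude / (pi * sigma), is the report's 'height'.
    """
    return amplitude / (math.pi * sigma) / (1.0 + ((x - center) / sigma) ** 2)


def extrapolate(years, amplitude=AMPLITUDE, center=CENTER, sigma=SIGMA):
    """Evaluate the fitted curve at each year, including years after 2020."""
    return {year: lorentzian(year, amplitude, center, sigma) for year in years}


if __name__ == "__main__":
    # Projected yearly averages for 2021-2025, the range asked about.
    for year, price in extrapolate(range(2021, 2026)).items():
        print(f"{year}: {price:,.0f}")
```

Because the fitted `center` is about 2016.76, every year from 2021 on lies on the falling side of the peak, so this model can only project declining prices. That is a property of the Lorentzian shape, not evidence about the market.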


On Tue, Jan 5, 2021 at 9:18 AM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Thanks Sean.
>
> This is the gist of the case
>
> <https://stackoverflow.com/posts/65570917/timeline>
>
> I have x-axis data points from 2010 to 2020 and corresponding y-axis values.
> I am using PySpark, pandas and matplotlib. Data is read into PySpark from the
> underlying database and a pandas DataFrame is built from it. The data is
> aggregated per year; however, the underlying prices are provided monthly in
> a CSV file that has been loaded into a Hive table.
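The monthly-to-yearly step described above can be pictured with a small stdlib sketch. It assumes hypothetical `(year, month, price)` rows, since the actual CSV/Hive schema is not shown. In Spark the same result would come from a `GROUP BY` on the year with `AVG(...)`, which is presumably what the `yearlyhouseprices` table queried below holds.

```python
from collections import defaultdict
from statistics import mean


def yearly_averages(monthly_rows):
    """Collapse monthly prices into one average per year.

    monthly_rows: iterable of (year, month, price) tuples, a hypothetical
    layout standing in for the monthly CSV data.
    Returns a dict {year: mean price}, ordered by year.
    """
    by_year = defaultdict(list)
    for year, _month, price in monthly_rows:
        by_year[int(year)].append(float(price))
    return {year: mean(prices) for year, prices in sorted(by_year.items())}


if __name__ == "__main__":
    rows = [(2010, 1, 100.0), (2010, 2, 200.0), (2011, 1, 300.0)]
    print(yearly_averages(rows))  # {2010: 150.0, 2011: 300.0}
```

Each year contributes a single point, which is why the fit sees only 11 x values for 2010-2020.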
>
> ```python
> import matplotlib.pyplot as plt
> from lmfit.models import LorentzianModel
> from pyspark.sql.functions import col
>
> # spark, v (with DSDB and font), start_date, end_date and regionname are
> # assumed to be defined elsewhere in the application.
> # Assumed from the report's Model(lorentzian): lmfit's built-in LorentzianModel.
> model = LorentzianModel()
>
> summary_df = spark.sql(f"""SELECT cast(Year as int) as year,
>     AVGFlatPricePerYear, AVGTerracedPricePerYear, AVGSemiDetachedPricePerYear,
>     AVGDetachedPricePerYear FROM {v.DSDB}.yearlyhouseprices""")
>
> df_10 = summary_df.filter(col("year").between(f'{start_date}', f'{end_date}'))
>
> p_dfm = df_10.toPandas()  # convert the Spark DataFrame to a pandas DataFrame
>
> for vcolumn in p_dfm.columns:
>     if vcolumn == 'year':  # year (integer) is the x axis, not a series to fit
>         continue
>     print(vcolumn)
>     # Initial parameter guesses, then a least-squares fit of price against year
>     params = model.guess(p_dfm[vcolumn], x=p_dfm['year'])
>     result = model.fit(p_dfm[vcolumn], params, x=p_dfm['year'])
>     result.plot_fit()
>     if vcolumn == "AVGFlatPricePerYear":
>         plt.xlabel("Year", fontdict=v.font)
>         plt.ylabel("Flat house prices in millions/GBP", fontdict=v.font)
>         plt.title(f"Flat price fluctuations in {regionname} for the past 10 years",
>                   fontdict=v.font)
>         plt.text(0.35, 0.45, "Best-fit based on Non-Linear Lorentzian Model",
>                  transform=plt.gca().transAxes, color="grey", fontsize=10)
>         print(result.fit_report())
>         plt.xlim(left=2009, right=2022)
>         plt.show()
>         plt.close()
> ```
>
> So far so good. I get a best-fit plot using the Lorentzian model, and the
> following model fit report:
>
> [[Model]]
>     Model(lorentzian)
> [[Fit Statistics]]
>     # fitting method   = leastsq
>     # function evals   = 25
>     # data points      = 11
>     # variables        = 3
>     chi-square         = 8.4155e+09
>     reduced chi-square = 1.0519e+09
>     Akaike info crit   = 231.009958
>     Bayesian info crit = 232.203644
> [[Variables]]
>     amplitude:  31107480.0 +/- 1471033.33 (4.73%) (init = 6106104)
>     center:     2016.75722 +/- 0.18632315 (0.01%) (init = 2016.5)
>     sigma:      8.37428353 +/- 0.45979189 (5.49%) (init = 3.5)
>     fwhm:       16.7485671 +/- 0.91958379 (5.49%) == '2.0000000*sigma'
>     height:     1182407.88 +/- 15681.8211 (1.33%) == '0.3183099*amplitude/max(2.220446049250313e-16, sigma)'
> [[Correlations]] (unreported correlations are < 0.100)
>     C(amplitude, sigma)  =  0.977
>     C(amplitude, center) =  0.644
>     C(center, sigma)     =  0.603
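For reference, `Model(lorentzian)` in this report is lmfit's Lorentzian line shape. Writing A = `amplitude`, μ = `center` and σ = `sigma`, the fitted curve and the two derived quantities are as follows. The constants agree with the constraint expressions in the report, since 1/π ≈ 0.3183099, and the half maximum is reached at x = μ ± σ:

```latex
f(x; A, \mu, \sigma) = \frac{A}{\pi}\,\frac{\sigma}{(x-\mu)^2 + \sigma^2}

\text{height} = f(\mu; A, \mu, \sigma) = \frac{A}{\pi\sigma} \approx 0.3183099\,\frac{A}{\sigma},
\qquad
\text{fwhm} = (\mu + \sigma) - (\mu - \sigma) = 2\sigma
```

Any prediction for 2021 and later is f evaluated at those x values with the three fitted parameters held fixed.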
>
> Now I need to predict the prices for 2021-2022 based on this fit. Is
> there a way to use plt (matplotlib) functions to produce extrapolated values
> for 2021 and beyond?
>
>
> Thanks
>
>
>
>
>
> On Tue, 5 Jan 2021 at 14:43, Sean Owen <sro...@gmail.com> wrote:
>
>> If your data set is 11 points, surely this is not a distributed problem?
>> Or are you asking how to build tens of thousands of those projections in
>> parallel?
>>
>> On Tue, Jan 5, 2021 at 6:04 AM Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am not sure the Spark forum is the correct avenue for this question.
>>>
>>> I am using PySpark with matplotlib to get the best fit for data using
>>> the Lorentzian model. This curve uses 2010-2020 data points (11 on the
>>> x-axis). I need to predict the prices for years 2021-2025 based on this
>>> fit. Could someone advise? If so, I can post the details.
>>>
>>> Thanks
>>>
>>
