Re: [R] Help needed! Pre-processing the dataset before splitting - model building - model tuning - performance evaluation

Rui Barradas Wed, 25 Sep 2024 01:01:10 -0700

At 06:04 on 24/09/2024, Bekzod Akhmuratov wrote:

Below is the link to the dataset in question. I want to split the dataset
into training and test sets, use the training set to build and tune the
model, and use the test set to evaluate performance. But before doing that
I want to make sure that the original dataset has no noise, no collinearity
to address, and no major outliers; otherwise I would have to transform the
data using techniques like Box-Cox and look at VIF to eliminate highly
correlated predictors.


https://www.kaggle.com/datasets/joaofilipemarques/google-advanced-data-analytics-waze-user-data

When I fit the original dataset to a regression model in Minitab, I get the
attached result for the residuals. They do not look normal. Does this mean
there is high correlation, or that the response and predictors are
nonlinearly related? How should I approach this? What would my strategy be
in Python, Minitab, and R? An explanation for all three would be
appreciated, if possible.


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

R-Help is a list for questions and answers about R code, not for suggesting
analysis strategies. Anyhow, I suggest that you read the Python notebook at
the bottom of the Kaggle page; it addresses your main question. If you have
doubts translating the Python code to R code, ask us more specific questions
on those doubts.
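
For orientation only, the workflow described in the question (split, check
collinearity with VIF, pick a Box-Cox transform) might be sketched in plain
Python as below. This is not the Kaggle notebook's code. The data are
synthetic and made up for illustration, every function name is hypothetical,
and doing the split first so that the test set does not influence the
diagnostics is an assumption rather than something stated in this thread.
The VIF shortcut 1 / (1 - r^2) holds only for a model with exactly two
predictors.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def box_cox_loglik(values, lam):
    """Profile log-likelihood of lambda under a normal model."""
    n = len(values)
    t = box_cox(values, lam)
    mean = sum(t) / n
    var = sum((v - mean) ** 2 for v in t) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in values)


def best_lambda(values, grid):
    """Grid search for the lambda with the highest log-likelihood."""
    return max(grid, key=lambda lam: box_cox_loglik(values, lam))


# Synthetic data (assumption): x2 is nearly collinear with x1,
# and y is positive and right-skewed.
rng = random.Random(42)
rows = []
for _ in range(200):
    x1 = rng.uniform(1, 10)
    x2 = 2 * x1 + rng.gauss(0, 0.5)
    y = math.exp(0.3 * x1 + rng.gauss(0, 0.2))
    rows.append((x1, x2, y))

train, test = train_test_split(rows, test_fraction=0.2, seed=1)
x1_tr = [r[0] for r in train]
x2_tr = [r[1] for r in train]
y_tr = [r[2] for r in train]
y_te = [r[2] for r in test]

# Diagnostics use the training set only.
vif = vif_two_predictors(x1_tr, x2_tr)
grid = [i / 10 for i in range(-20, 21)]
lam = best_lambda(y_tr, grid)

# The lambda chosen on the training set is reused for the test set.
y_tr_t = box_cox(y_tr, lam)
y_te_t = box_cox(y_te, lam)

print("VIF:", round(vif, 1), "lambda:", lam)
```

A large VIF here flags x1 and x2 as redundant, so one of them would be a
candidate for removal. With more than two predictors each VIF needs its own
auxiliary regression, which a statistics library would normally handle.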


Hope this helps,

Rui Barradas

