At 06:04 on 24/09/2024, Bekzod Akhmuratov wrote:
Below is the link to the dataset in question. I want to split the dataset
into a training set and a test set, use the training set to build and tune
the model, and use the test set to evaluate performance. But before doing
that I want to check whether the original dataset has noise, collinearity,
or major outliers that I need to address, for example by transforming the
data with techniques like Box-Cox and by looking at VIF to eliminate highly
correlated predictors.
https://www.kaggle.com/datasets/joaofilipemarques/google-advanced-data-analytics-waze-user-data
When I fit a regression model to the original dataset in Minitab, I get the
attached result for the residuals. They don't look normal. Does that mean
there is high correlation, or that the relationship between the response
and the predictors is nonlinear? How should I approach this? What would my
strategy be in Python, Minitab, and R? An explanation for all three
software packages would be appreciated, if possible.
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hello,
R-Help is a list for questions and answers about R code, not for suggesting
analysis strategies. Anyhow, I suggest that you read the Python notebook
at the bottom of the Kaggle page; it addresses your main question. If you
have doubts translating the Python code to R code, ask us more specific
questions about those doubts.
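
As a rough starting point for two of the steps mentioned in the question
(hold-out split and VIF), here is a minimal sketch. It is not taken from the
Kaggle notebook and does not use the Waze data: the predictors x1, x2, x3 are
synthetic and hypothetical, and the 80/20 split is an assumption. VIF is
computed with NumPy only, as the diagonal of the inverse correlation matrix.
Note that VIF only measures collinearity among predictors; it says nothing
about the shape of the residuals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic predictors (hypothetical stand-ins, NOT the Waze columns):
# x2 is almost a copy of x1, x3 is independent of both.
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# 1) Hold out a test set first (assumed 80/20 split) so that the
#    diagnostics below only look at training rows.
idx = rng.permutation(n)
n_test = int(0.2 * n)
test_idx, train_idx = idx[:n_test], idx[n_test:]
X_train, X_test = X[train_idx], X[test_idx]


def vif(X):
    """Return the VIF of each column of X.

    The j-th diagonal entry of the inverse correlation matrix equals
    1 / (1 - R_j^2), where R_j^2 comes from regressing column j on
    the remaining columns.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


# 2) Collinearity check on the training predictors.
vifs = vif(X_train)
for name, value in zip(["x1", "x2", "x3"], vifs):
    print(f"{name}: VIF = {value:.1f}")
```

In this toy setup x1 and x2 should show very large VIFs and x3 a value near 1.
Library routines exist for the same jobs, for example
variance_inflation_factor in statsmodels and scipy.stats.boxcox for the
Box-Cox transformation (which requires strictly positive data).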
Hope this helps,
Rui Barradas