loader image

The next step after exploratory data analysis is model selection, building, and testing. In this step, the analytical approach is put together and tested.

A few considerations will help select one or more appropriate statistical or machine learning models:

  • What are the data types? Categorical, ordered, continuous, or mixed.
  • Is there a time index to consider?
  • Is the response multivariate?
  • Are there rules and constraints that need to be incorporated into the model?
  • What models have others used for similar problems?

With a few candidate models selected, the next step is model building, testing, and tuning. In this step the models are configured, validated, and fine-tuned to get better accuracy.

For model validation, a very popular approach is to train the model on one set of data and then, using the trained or fitted model, evaluate its predictive ability on a separate set of data. Through the train-validate-test approach, the best performing models and configurations can be selected.