The next step after exploratory data analysis is model selection, building, and testing. In this step, the analytical approach is put together and tested.
A few considerations will help select one or more appropriate statistical or machine learning models:
- What are the data types: categorical, ordered, continuous, or mixed?
- Is there a time index to consider?
- Is the response multivariate?
- Are there rules and constraints that need to be incorporated into the model?
- What models have others used for similar problems?
With a few candidate models selected, the next step is model building, testing, and tuning. In this step, the models are configured, validated, and fine-tuned to improve their predictive performance.
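For model validation, a popular approach is to train the model on one set of data and then evaluate the fitted model's predictive ability on a separate set that was not used for fitting. In the train-validate-test approach, the data are split into three parts: the training set fits each model, the validation set is used to compare models and tune their configurations, and the test set is held out for a final estimate of how the selected model performs on unseen data.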
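A train-validate-test workflow can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the synthetic data, the 60/20/20 split, the simple nearest-neighbour regressor, and all function names (`split_indices`, `knn_predict`, `mse`) are hypothetical choices made for the example, not a prescribed method.

```python
import random


def split_indices(n, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle 0..n-1 and cut the result into train / validation / test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def knn_predict(train_x, train_y, x, k):
    """Predict the response at x as the mean of the k nearest training responses."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of k-NN predictions on the points (xs, ys)."""
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


def take(idx, values):
    """Select the entries of values at the given indices."""
    return [values[i] for i in idx]


# Synthetic data (an assumption for illustration): y = x^2 plus Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(-3, 3) for _ in range(200)]
ys = [x * x + rng.gauss(0, 0.5) for x in xs]

train_idx, val_idx, test_idx = split_indices(len(xs))
x_tr, y_tr = take(train_idx, xs), take(train_idx, ys)
x_val, y_val = take(val_idx, xs), take(val_idx, ys)
x_te, y_te = take(test_idx, xs), take(test_idx, ys)

# Tune the configuration (here, the number of neighbours k) on the validation set.
candidates = [1, 3, 5, 10, 25, 50]
val_scores = {k: mse(x_tr, y_tr, x_val, y_val, k) for k in candidates}
best_k = min(val_scores, key=val_scores.get)

# Touch the test set only once, to estimate performance of the selected model.
test_mse = mse(x_tr, y_tr, x_te, y_te, best_k)
print(f"best k = {best_k}, validation MSE = {val_scores[best_k]:.3f}, test MSE = {test_mse:.3f}")
```

The key design point is that the test set plays no part in choosing `best_k`, so the test error is not biased by the tuning step. The same pattern applies when the candidates are different model families rather than settings of one model.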