January 12, 2022
Picking an algorithm for a machine learning project can be a confusing business. Most data scientists will tell you that there is rarely a single perfect answer to the question “What algorithm or model should I use?” In this series, we break down the important details that go into making that decision.
In this article we will look at the two main subsets of supervised learning: regression and classification. Because both are subsets of supervised learning, we must be working with labeled data, as we discussed when introducing supervised learning. The simplest way to understand the difference between regression and classification is to look at your target variable. Ask yourself, “Am I predicting a continuous number, or is it a category?” Regression models predict numbers, and classification models predict categories.
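The question about the target variable can be sketched as a small check. This is only an illustration: the function name `suggest_task` and the sample values are hypothetical, and the rule is deliberately naive (categories stored as numeric codes would be mistaken for a regression target).

```python
from numbers import Real


def suggest_task(target_values):
    """Suggest 'regression' for numeric targets, 'classification' otherwise."""
    is_numeric = all(
        isinstance(value, Real) and not isinstance(value, bool)
        for value in target_values
    )
    return "regression" if is_numeric else "classification"


# Hypothetical target columns, made up for illustration
sale_prices = [212000.0, 189500.0, 305250.0]     # continuous numbers
species = ["setosa", "virginica", "versicolor"]  # categories

price_task = suggest_task(sale_prices)
species_task = suggest_task(species)
print(price_task)
print(species_task)
```

In practice the decision comes from knowing what the target means, not only from its data type, so a check like this is a starting point at best.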
Regression: A classic regression problem in data science is Boston housing price prediction, where the goal is to predict the sale price of a house (a continuous numeric target) using many features of the property. These features include crime statistics, age of the structure, tax rate, and number of rooms, among others. Since it is a supervised learning problem, you need to train the regression model on known sale prices before it can start predicting sale prices from new feature sets.
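As a rough sketch of that train-then-predict cycle, the snippet below fits a straight line to a single feature using ordinary least squares. The room counts and prices are hypothetical values chosen for illustration, not rows from the Boston housing data, and a real model would use many features rather than one.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new feature value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical labeled data: number of rooms -> known sale price (in $1000s)
rooms = [4, 5, 6, 7, 8]
prices = [150.0, 180.0, 210.0, 240.0, 270.0]

model = fit_line(rooms, prices)          # training on known sale prices
predicted_price = predict(model, 6.5)    # predicting a continuous number
print(round(predicted_price, 1))
```

The output is a number on a continuous scale, which is what makes this a regression problem.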
Classification: A classic classification problem in data science is the Iris dataset, where the goal is to predict the species of an Iris flower (Iris setosa, Iris virginica, or Iris versicolor) based on measurements of the flower. Using petal length, petal width, sepal length, and sepal width, the classification algorithm attempts to determine which species of Iris (which category, or class) the flower belongs to. As with regression, the algorithm must first be trained on labeled data, here flowers of known species, before it can predict the species from a given set of features.
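A minimal sketch of this idea is a one-nearest-neighbor classifier, which labels a new flower with the species of the most similar known flower. This is one simple technique chosen for illustration, not necessarily the algorithm used in any particular product, and the measurements below are illustrative values rather than rows copied from the Iris dataset.

```python
import math


def predict_species(training_rows, measurements):
    """1-nearest-neighbor: return the label of the closest training flower."""
    closest = min(
        training_rows,
        key=lambda row: math.dist(row[0], measurements),
    )
    return closest[1]


# Illustrative labeled data:
# (petal length, petal width, sepal length, sepal width) in cm, then species
training_rows = [
    ((1.4, 0.2, 5.1, 3.5), "setosa"),
    ((1.5, 0.2, 5.0, 3.4), "setosa"),
    ((4.5, 1.5, 6.4, 3.2), "versicolor"),
    ((4.0, 1.3, 5.5, 2.3), "versicolor"),
    ((6.0, 2.5, 6.3, 3.3), "virginica"),
    ((5.5, 1.8, 6.5, 3.0), "virginica"),
]

small_flower = predict_species(training_rows, (1.6, 0.3, 4.9, 3.1))
large_flower = predict_species(training_rows, (5.8, 2.2, 6.4, 3.1))
print(small_flower)
print(large_flower)
```

Here the output is one of a fixed set of labels rather than a number, which is the defining trait of classification.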
Below is an example of a classification algorithm in Infor Coleman ML and the results it produced. You can see that each flower has an actual category and a predicted category.
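Comparing the actual and predicted categories row by row gives a simple accuracy score. The sketch below uses a hypothetical results table, not the output produced by Infor Coleman ML.

```python
def accuracy(actual, predicted):
    """Fraction of rows where the predicted category matches the actual one."""
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)


# Hypothetical results: one actual and one predicted category per flower
actual = ["setosa", "versicolor", "virginica", "versicolor", "setosa"]
predicted = ["setosa", "versicolor", "versicolor", "versicolor", "setosa"]

score = accuracy(actual, predicted)
print(f"{score:.0%}")
```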
One Infor Coleman ML customer in the aquatic farming industry had a problem: the measurement techniques used to establish the weight and gender of their fish were long, costly, and manual. By implementing an Infor Coleman ML solution, they were able to use both regression and classification models to obtain the length, weight, and gender of their fish from pictures taken automatically. Their regression models take information from the image to predict length and weight, which are continuous numeric variables. The same images are used in classification models to sort the fish into categories, in this case male and female. With model accuracy upwards of 92%, this customer avoided much of the cost of measuring each fish individually.
We will continue this blog series in the coming weeks to further understand the world of machine learning.
Infor Coleman ML is part of the Infor technology platform. If you would like to learn more about how Coleman can benefit your business and about the industry-specific machine learning models Infor can deploy, don’t hesitate to contact us.