Data Science 101 — Regression Analytics; A powerful tool for businesses

Introduction

Y = mx + b is the first regression equation most people learn. It is the equation of a straight line, and you can use it to make simple predictions. For example, if you have a pay-per-use cell phone plan, you can predict how much you will have to pay from how much you use: m is the rate per unit of usage, x is your usage, and b is any fixed fee.
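As a minimal sketch of the cell phone example, the snippet below treats the bill as a straight line. The rate of $0.10 per minute and the $5.00 base fee are made-up numbers used only for illustration, and the function name is hypothetical.

```python
def predict_bill(minutes_used, rate_per_minute=0.10, base_fee=5.00):
    """Predict a pay-per-use bill with y = m*x + b.

    m = rate_per_minute (assumed), x = minutes_used, b = base_fee (assumed).
    """
    return rate_per_minute * minutes_used + base_fee


# Print the predicted bill for a few usage levels.
for minutes in (0, 100, 250):
    print(f"{minutes} minutes -> ${predict_bill(minutes):.2f}")
```

Changing the rate changes the slope of the line, and changing the base fee shifts the whole line up or down.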

Basic outline of what a Regression algorithm is and how it works to take a set of input data and produce a final output/prediction. A regression can be mapped on an x-y axis.
Image by Cuemath

As you dig deeper into data science, you can extend this equation into more robust regression models that solve more complex real-world problems.

What is Regression?

Regression is one of the most common types of machine learning models. It estimates a numeric value from the relationship between one or more input variables and the outcome you want to predict.

Image by Towards Data Science

At Desa Analytics, we design, build, test, and implement regression models for our clients. Regressions are practical algorithms that can provide valuable insights for businesses. In addition to building these models, we incorporate Approachable AI into them to provide a full understanding of how these powerful algorithms work.

Applications of Regression

Regression is an important machine learning algorithm that can solve many real-world problems with numeric data. Some of these applications include:

  • Predicting financial metrics like stock prices and housing prices
  • Forecasting the weather
  • Estimating customer spend

Depending on the data, context, and timeline, you can effectively apply Regression to solve problems.

How do companies use it?

Regression is the baseline algorithm many businesses use due to its simplicity and effectiveness. Companies use regression to model areas like sales volume, customer spend, and employee retention.

As a small business, you can use regression to understand why customer service calls dropped last month, predict sales over the next four months, or decide which product features to prioritize.

Types of Regression

There are many types of regression, some quite complex, but the two most common and widely used are linear and logistic regression. Keep in mind that a more complex model is not necessarily better for solving your particular problem. Linear and logistic regression models are also easy to build and easy to interpret.

Basic outline of what a Regression algorithm is and the components that are required to build a regression model.
Image by Author

Linear Regression

Linear regression is an approach for modelling the relationship between two or more variables. The simplest way to think about it is drawing a straight line through plotted data points. The goal is to find the line of best fit, meaning the line that minimizes the overall distance between itself and the data points on the graph (typically the sum of squared vertical distances).
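The best-fit idea can be sketched with ordinary least squares, one common way to choose the line. The data below is hypothetical (extra opening hours per day against extra weekly sales in $1,000s), and the function name is an illustrative choice, not part of any library.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, for illustration only.
extra_hours = [1, 2, 3, 4]
extra_sales = [3, 5, 7, 10]

slope, intercept = fit_line(extra_hours, extra_sales)
print(f"extra_sales = {slope:.2f} * extra_hours + {intercept:.2f}")
print(f"Predicted extra sales at 5 extra hours: {slope * 5 + intercept:.2f}")
```

The slope is the estimated change in the outcome for each one-unit change in the input, which is what makes a fitted line easy to interpret.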

Linear regression can be used in business for predictive analytics. Any forecasting question for which you have data is a candidate for predictive analysis. For example, you can predict demand for your products based on the quantity sold and the price of the product over time.

You can also predict the number of shoppers who will visit your shop. Insurance companies use regression on policyholder data, such as credit standing, to estimate the likely number of claims in a given period. With linear regression, you can measure whether extending your business hours increases sales and make decisions accordingly. You can compare operating expenses against working hours to optimize costs. Being able to test your hypotheses allows you to make better decisions for your business and uncover new insights.

Logistic Regression

Logistic regression is similar to linear regression but is applied to a different kind of problem. When the outcome is an event like pass/fail, win/lose, healthy/sick, or dead/alive, it is very hard to model with linear regression, yet these problems are common in the real world. What if you need to analyze whether a customer is going to stop coming to your store under certain conditions? Then you would use logistic regression.

Logistic regression assigns each of the two possible outcomes a probability between 0 and 1, given the input variables, and the two probabilities sum to 1. For a more technical audience, it fits a logistic (S-shaped) function to a binary dependent variable.
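A short sketch of the logistic function shows where the probabilities come from. The intercept and slope below are hypothetical values chosen only for illustration; in practice they would be estimated from data.

```python
import math


def logistic(z):
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def outcome_probabilities(x, intercept, slope):
    """Return (P(event), P(no event)) for a single input value x."""
    p_event = logistic(intercept + slope * x)
    return p_event, 1.0 - p_event


# Hypothetical coefficients, not fitted to any real data.
p_yes, p_no = outcome_probabilities(x=2.0, intercept=-1.0, slope=0.5)
print(f"P(yes) = {p_yes:.3f}, P(no) = {p_no:.3f}")
```

Because the second probability is defined as one minus the first, the two outcomes always sum to 1.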

Logistic regressions have binary outcomes (1/0 or yes/no) and are well suited to solving classification problems.
Image by Wikipedia

For this example, we have a pass or fail for students and we have data on the number of hours that each student has studied. After creating this model, we can look at each student, given their number of hours studying, and make a good prediction of whether that student will pass or fail with a certain probability. If the student studies for three hours, they would have a 63% probability of passing.
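The pass/fail idea can be sketched by fitting a one-variable logistic regression with plain gradient descent. The hours and outcomes below are made up for illustration, so the probabilities printed will not match the figure quoted above. The learning rate and step count are also assumptions, and a real project would more likely use an established statistics or machine learning library.

```python
import math


def sigmoid(z):
    """Logistic function: maps a real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, steps=20000):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_b0 = 0.0
        grad_b1 = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(b0 + b1 * x) - y  # predicted minus actual
            grad_b0 += error
            grad_b1 += error * x
        b0 -= learning_rate * grad_b0 / n
        b1 -= learning_rate * grad_b1 / n
    return b0, b1


# Hypothetical data: hours studied and whether the student passed (1) or failed (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(hours, passed)
pass_probability = {h: sigmoid(b0 + b1 * h) for h in (1, 2, 3, 4, 5)}
for h, p in pass_probability.items():
    print(f"{h} hours studied -> {p:.0%} estimated chance of passing")
```

The fitted probabilities rise with the number of hours studied, which is the S-shaped pattern the graph illustrates.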

With the same methodology and interpretation, we can apply this to many business problems. For example, you can predict whether a machine part will fail depending on how long the material has been stored in inventory. The difference between the two is that linear regression is for continuous outcomes, while logistic regression is for outcomes with a limited set of values or categories, such as yes or no, or A, B, C, D.

Image by HBR.ORG

Common mistakes businesses make using Regression

When creating these graphs, it is easy to assume that an input variable directly explains an outcome. But remember that correlation doesn’t mean causation. A correlation might be enough to inform a decision, but it is not enough to draw a conclusion that applies to all situations. You also need good background knowledge of the question you are trying to answer; otherwise, you run the risk of looking for insights in the wrong places.

Not all data can be fitted well by linear or logistic regression, because the underlying relationship may be more complex. Forcing data into a model because it is easy can cause major prediction errors. Robust testing is required to find the best-fitting model for your data so you can optimize your business decisions.
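To illustrate the risk of forcing a straight line onto curved data, the sketch below fits two models to a deliberately curved, made-up dataset and compares their average squared error. The data, the helper names, and the choice of a squared input as the alternative model are all assumptions for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between the actual values and the fitted line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical curved data: y grows with the square of x.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

# Model 1: a straight line fitted directly to x.
slope, intercept = fit_line(xs, ys)
mse_line = mean_squared_error(xs, ys, slope, intercept)

# Model 2: the same fitting routine applied to x squared.
xs_squared = [x ** 2 for x in xs]
slope_sq, intercept_sq = fit_line(xs_squared, ys)
mse_curve = mean_squared_error(xs_squared, ys, slope_sq, intercept_sq)

print(f"Straight-line error: {mse_line:.2f}")
print(f"Curved-model error:  {mse_curve:.2f}")
```

Comparing error measures like this across candidate models, ideally on data held back from fitting, is one simple form of the testing described above.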

Highlighting the visual differences between linear and non-linear regressions.
Image by DataDrivenInvestor

Conclusion

Regression is a simple and easy way to make predictions and analyses of different variables for your business. It is a great way to understand the business in different ways and verify your hypotheses about the business with data and not just intuition. Desa Analytics can provide regression techniques for business problems coupled with high-powered approachable AI.

References

https://www.datarobot.com/wiki/regression/

https://hbr.org/2015/11/a-refresher-on-regression-analysis

https://towardsdatascience.com/5-types-of-regression-and-their-properties-c5e1fa12d55e

https://www.listendata.com/2018/03/regression-analysis.html

https://favtutor.com/blogs/types-of-regression

https://www.newgenapps.com/blogs/business-applications-uses-regression-analysis-advantages/amp/