Data Science 101 — Decision Trees and why they are good for businesses

Introduction

In our day-to-day lives, we all make decisions based on circumstances and events. To make a decision, we choose between multiple alternatives which lead us down different paths to the expected outcomes of that decision. Decision tree algorithms work the same way and can be built to replicate this decision-making process. This can be applied to building financial models and to mapping out the chains of actions and events behind a business strategy.

As a business owner, you may want to acquire new customers, but your budget is limited. You need to optimize for the right target market to convert your marketing spend into new customers. So how do you identify these people? You need a classification algorithm, and this is where decision trees come in handy. You can use your current customer data to build a decision tree. The data should contain customer attributes (features) and a target outcome (label) indicating whether each customer converted.
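
As a rough sketch of what that looks like in practice (assuming scikit-learn and pandas; the file name and the customer attributes below are placeholders, not a prescription):

```python
# A minimal sketch of training a decision tree on historical customer data.
# "customers.csv" and the column names are placeholders -- substitute your
# own customer attributes (features) and conversion label.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

customers = pd.read_csv("customers.csv")
X = customers[["age", "visits_per_month", "opened_email"]]  # features
y = customers["converted"]                                  # label: 1 = became a customer

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

tree = DecisionTreeClassifier(max_depth=4, random_state=42)
tree.fit(X_train, y_train)

print("Accuracy on held-out customers:", tree.score(X_test, y_test))
```

From there, the fitted tree can score new prospects so that marketing spend goes to the people most likely to convert.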

What are Decision Trees?

A decision tree divides your dataset into smaller and smaller subsets based on its descriptive features until every data point ends up under its appropriate label. In general, a decision tree asks a question and then classifies the outcome based on the answer.

Figure: basic outline of how a decision tree algorithm splits data into more precise decision sets (Image by Author)

This decision tree is based on a “yes/no” question. You can also build a tree with numeric data.

A decision tree can also be based on ranked data, where, say, a one means super hungry and a two means moderately hungry.

Figure: a decision tree built on ranked data (Image by Author)

If a person is super hungry, they need to eat. If the person is moderately hungry, then they need a snack. And if they are not hungry at all, then there is no need to eat.

The classification can be categorical or numeric, and the same final classification can appear in more than one leaf. For the most part, decision trees are pretty intuitive to work with. You start at the top and work your way down until you can’t go any further, and that is how you classify a sample.

You can turn almost any data into the variables that feed the nodes. The model is trained on this data, and at each node the split it chooses, both the path and the boundary, is the one that results in the most information gain.
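
To make “information gain” concrete, here is a minimal sketch in plain Python (the labels reuse the toy hunger example from above, not real data) that scores one candidate yes/no split by how much it reduces entropy:

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    probs = [labels.count(c) / total for c in set(labels)]
    return -sum(p * log2(p) for p in probs)

def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting `parent` into `left` and `right`."""
    w_left = len(left) / len(parent)
    w_right = len(right) / len(parent)
    return entropy(parent) - (w_left * entropy(left) + w_right * entropy(right))

# Toy split: "Is the person super hungry?"
parent = ["eat", "eat", "snack", "snack", "nothing", "nothing"]
left   = ["eat", "eat"]                             # answered "yes"
right  = ["snack", "snack", "nothing", "nothing"]   # answered "no"

print(information_gain(parent, left, right))  # higher gain = better split
```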

Components of a decision tree

The very top of the tree is called the Root Node. Internal nodes have arrows pointing to them and arrows pointing away from them. Lastly, the leaf nodes have arrows pointing to them, but no arrows are pointing away from them.

Figure: the root node, internal nodes, and leaf nodes of a decision tree (Image by Author)

Any feature of your dataset can become the root node, and each outcome becomes a leaf node. For a dataset with lots of features, you need to identify which features go into which branches to form a good decision tree. For example, a classification tree can classify whether a passenger survived or died in a plane crash. There is also the regression tree, which can predict continuous variables like the price of a stock option.
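
As a small illustration of that distinction (assuming scikit-learn; the numbers are made up), a classification tree predicts a category while a regression tree predicts a continuous value:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification: predict a category (1 = survived, 0 = died)
X_cls = [[22, 1], [38, 0], [26, 1], [35, 0]]   # e.g. [age, travelled_alone]
y_cls = [0, 1, 1, 0]
clf = DecisionTreeClassifier().fit(X_cls, y_cls)
print(clf.predict([[30, 1]]))                  # -> a class label

# Regression: predict a continuous value (e.g. an option price)
X_reg = [[100], [105], [110], [120]]           # e.g. underlying stock price
y_reg = [2.5, 4.0, 6.5, 11.0]
reg = DecisionTreeRegressor().fit(X_reg, y_reg)
print(reg.predict([[108]]))                    # -> a number
```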

Decision trees split the data on different variables into outcomes, for example, whether a customer buys or does not buy a product. Each split is intended to remove impurity and reduce entropy in the population so that you end up with subsets of customers carrying the appropriate labels. At each node, you calculate an impurity score (such as Gini impurity or entropy) to decide which split to move forward with to further divide your data.
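
In scikit-learn, for example, the impurity score is chosen with the criterion parameter, and after fitting you can inspect which splits the tree picked and how much each feature reduced impurity (the tiny dataset below is purely illustrative):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[25, 0], [40, 1], [30, 1], [55, 0], [35, 1], [60, 0]]  # [age, saw_ad]
y = [0, 1, 1, 0, 1, 0]                                      # 1 = bought the product

# criterion="gini" (the default) or "entropy" sets the impurity score used at each node
tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)

print(export_text(tree, feature_names=["age", "saw_ad"]))  # the chosen splits
print(tree.feature_importances_)  # each feature's share of the total impurity reduction
```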

At Desa Analytics, we can design, build, test, and implement decision trees for our clients. Decision trees are powerful algorithms that provide extremely useful insights for businesses. In addition to building these decision trees, we also incorporate Explainable AI into our models to provide a full understanding of how these powerful algorithms work.

Example of a more complicated decision tree

Figures: examples of more complex decision trees applied to business decisions (Images by Harvard Business School and Investopedia)

Pros of Decision Trees

Decision trees have high interpretability and are easy to understand and explain. You don’t need tons of data preparation to build one, and there is no need to normalize the data.

Cons of Decision Trees

Decision trees have a risk of overfitting to outliers in the data. The deeper the tree grows, the higher the probability of overfitting to those outliers, so decision trees work best when kept relatively simple. To alleviate this, we can use pruning to reduce overfitting.
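
As a hedged sketch of what pruning looks like in scikit-learn (the thresholds are illustrative, not recommendations), you can either pre-prune by limiting the tree’s growth or post-prune with cost-complexity pruning:

```python
from sklearn.tree import DecisionTreeClassifier

# Pre-pruning: stop the tree from growing deep enough to memorize outliers
shallow_tree = DecisionTreeClassifier(
    max_depth=4,          # cap the number of questions asked
    min_samples_leaf=20,  # every leaf must describe at least 20 customers
)

# Post-pruning: grow the full tree, then collapse branches that add little value
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.01)  # cost-complexity pruning

# shallow_tree.fit(X_train, y_train); pruned_tree.fit(X_train, y_train)
# In practice, choose max_depth / ccp_alpha with cross-validation.
```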

What problems do decision trees solve?

Decision trees are great for solving classification problems where results are often binary, such as true or false. You can use this to learn whether or not a client will churn on your subscription-based product. Decision trees can also solve multi-class classification problems such as: ‘churned customer’, ‘converted customer’, ‘convert if they see relatable advertisements’, or ‘will never buy the product’.
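
The same classifier handles the multi-class case without changes; a minimal sketch, assuming scikit-learn and invented features:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy multi-class example: each row is [months_as_customer, ads_clicked]
X = [[24, 0], [3, 5], [1, 9], [0, 0], [18, 1], [2, 7]]
y = [
    "churned customer",
    "converted customer",
    "convert if they see relatable advertisements",
    "will never buy the product",
    "churned customer",
    "converted customer",
]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict([[5, 6]]))  # predicts one of the four classes
```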

There are four steps a business owner should take to make a qualified decision when applying decision trees:

  1. Recognize the issue and acquire relevant data
  2. Recognize the constraints
  3. Identify and evaluate possibilities to choose the best option
  4. Put the answer to work on the problem and keep an eye on the results

Risks versus opportunities

Risks

We are faced with decisions and uncertainty daily, and decision trees can help businesses make effective decisions efficiently and accurately. Incorrect decisions can be costly if not addressed properly, which is why proper use of decision trees matters. But decision trees also come with risks: they can be hindered by overly large datasets, overfitting, and biases in the data.

A decision tree relies on precise input to provide the user with a reliable outcome. The model, however, can be sensitive to small changes in the data, and this instability must be accounted for in the model-building process. Getting reliable data can also be a challenge for businesses: inaccurate and unreliable data can lead to incorrect outcomes from the decision tree and increase the cost to the business.

Decision trees need to be kept simple in order to optimize their accuracy and efficiency. Selecting only high-quality, relevant data is essential, which makes the human-in-the-loop an integral part of the model-building process.

Before making a choice, a business must first identify the risks. Because the best judgments are often based on data rather than experience alone, data must be gathered early in the project. Having the information early will reduce the chances of an unforeseen incident occurring. To be used effectively, all data must be processed and organized into the appropriate categories.

Conclusion

Decision trees can be used to aid the risk mitigation process in business decision-making. They can be more effective than many other decision-making tools because they are easy to understand and utilize. Proper use of decision trees can localize and solve problems effectively.

References

https://hbr.org/1964/07/decision-trees-for-decision-making

https://www.investopedia.com/articles/financial-theory/11/decisions-trees-finance.asp

https://towardsdatascience.com/decision-trees-in-machine-learning-641b9c4e8052

https://medium.com/smb-lite/what-is-a-decision-tree-algorithm-4531749d2a17

http://apppm.man.dtu.dk/index.php/Decision_Tree:_Risk_%26_Opportunities