Making Predictions on Data

Understanding the patterns within data and being able to create predictive rules lets you gain insight into your data quickly. However, the true benefit of the many predictive rules the EmcienPatterns engine produces is using them to predict the outcome of new data. This article covers how to use the UI to load and predict on new data, and how to interpret the results from the tool. In this article we will be using the laptop failure diagnostic log available in our Sample Data Sets.

To make predictions on a test file, click the Predict link in the Outcome column for an analysis on the home page. The predictions will be based on the rules generated in that specific analysis.

The New Prediction screen will be displayed. At the top of the screen is the rule set that will be used for the prediction. It automatically selects the rule set from the analysis you chose, but you can use any rule set you'd like. In the middle of the page, you can select the test file that you'd like to make predictions on. In this example we are using the prediction rules from an analysis called "banded laptop failure" and the test set banded_Laptop_Test.csv.

After clicking "Make Prediction", the EmcienPatterns engine will apply all of the rules in the selected rule set to predict every transaction in the test file, regardless of whether the outcome category is populated in the test file. However, if your test file does not have the outcome category populated, EmcienPatterns will not be able to create a confusion matrix.
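A quick way to know in advance whether a confusion matrix will be possible is to check that the outcome column is fully populated in the test file. The sketch below illustrates this in plain Python; the file contents and the Laptop_Failure column name are made up for illustration and are not EmcienPatterns specifics:

```python
import csv
import io

# Hypothetical test file: the second row is missing its outcome value.
test_csv = """Reboot_Time,Cache,Laptop_Failure
45s,2MB,Yes
12s,8MB,
"""

rows = list(csv.DictReader(io.StringIO(test_csv)))

# A confusion matrix can only be built when every outcome value is filled in.
outcome_filled = all(r["Laptop_Failure"].strip() for r in rows)
# outcome_filled is False here, so no confusion matrix would be produced.
```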

Once the predictions are ready, clicking "View Predictions" will take you to the Prediction Dashboard. The Prediction Dashboard shows how accurately the rule set was able to describe the test set.

The Prediction Dashboard displays a confusion matrix as well as several metrics. Correct predictions are those where the Predicted Outcome equals the Actual Outcome, i.e., the diagonal running from the top left of the matrix. Additional metrics include:

Strong Signals: Correct predictions made with very high confidence. These can be considered the "best predictions" within the data set and are useful for seeing the strong indicators and drivers within your data.

Mixed Signals: Incorrect predictions made with high confidence, illustrating powerful signals in the training data that turned out to be 'wrong' in the test set.

Weak Signals: Predictions made with very low confidence, either because no strong data points gave the engine confidence in the answer, or because multiple strong data points contradicted each other.
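The signal categories above amount to classifying each prediction by two things: whether it was correct, and how confident the engine was. The sketch below illustrates that idea in plain Python; the records, field names, and confidence cutoffs are invented for illustration and do not reflect EmcienPatterns' actual thresholds or internals:

```python
# Hypothetical prediction records: predicted vs. actual outcome plus a confidence score.
predictions = [
    {"predicted": "Yes", "actual": "Yes", "confidence": 0.95},
    {"predicted": "No",  "actual": "Yes", "confidence": 0.40},
    {"predicted": "Yes", "actual": "No",  "confidence": 0.90},
    {"predicted": "No",  "actual": "No",  "confidence": 0.97},
]

HIGH, LOW = 0.85, 0.50  # illustrative confidence cutoffs, not product defaults

def signal(p):
    correct = p["predicted"] == p["actual"]
    if p["confidence"] < LOW:
        return "Weak"                       # too little confidence either way
    if p["confidence"] >= HIGH:
        return "Strong" if correct else "Mixed"  # very confident: right or wrong
    return "Other"

labels = [signal(p) for p in predictions]
# labels == ['Strong', 'Weak', 'Mixed', 'Strong']

# Overall accuracy is the diagonal of the confusion matrix over its total.
accuracy = sum(p["predicted"] == p["actual"] for p in predictions) / len(predictions)
# accuracy == 0.5
```

Note that a Mixed Signal is not merely an error; it is a confident error, which is why the dashboard calls it out separately as a signal worth investigating.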

While seeing how accurate the predictions are is helpful for understanding how the rules describe the data set, what sets EmcienPatterns apart from other prediction solutions is its ability to give understandable reasons behind its predictions.

To begin understanding how the predictions were made, and how the rules are used in the test set, click on a specific outcome, such as "Laptop Failure: Yes", and scroll down. This takes you to the Outcome Details area, which outlines the five most important rules used in predicting the selected outcome. In this example, the most important rules in this data set came from the laptop's reboot time and cache. This indicates that a user who wants to reduce laptop failures should start by examining the reboot time and cache to find out what is making them lead to failure.

Going a step further, EmcienPatterns gives prescriptive rules for every single row or transaction in the test set. To see the reasons behind each prediction, click "View All Predictions for this outcome" and select a transaction.

Here we see how EmcienPatterns applied all of the rules it found for each transaction, and how it determined the outcome it selected. You can use these reasons to see exactly how your data points become rules and, in turn, predictions.

Using the UI for predictions is incredibly useful for understanding and predicting your data, but if you have stable predictions and would like to automate the solution, see our article on Creating a Repeatable Process Using API's.

After making predictions on incoming data, the engine makes the results available via APIs and .csv files containing metadata about each prediction. By including a field named "Transaction_id" in the first column of the wide or tagged file, you can join the prediction results back to the source data that was sent for prediction. This is important when storing results or matching specific predictions back to their source information after an analysis.
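Such a join can be sketched in plain Python. The file contents and all column names other than Transaction_id below are hypothetical; the only detail taken from the text above is that both files carry a Transaction_id in their first column:

```python
import csv
import io

# Hypothetical source data sent for prediction, keyed by Transaction_id.
source_csv = """Transaction_id,Reboot_Time,Cache
T001,45s,2MB
T002,12s,8MB
"""

# Hypothetical prediction output for the same transactions.
results_csv = """Transaction_id,Predicted_Outcome,Confidence
T001,Laptop Failure: Yes,0.95
T002,Laptop Failure: No,0.88
"""

# Index the source rows by Transaction_id for O(1) lookup.
source = {row["Transaction_id"]: row for row in csv.DictReader(io.StringIO(source_csv))}

# Join each prediction back to its source row on Transaction_id.
joined = []
for row in csv.DictReader(io.StringIO(results_csv)):
    merged = {**source[row["Transaction_id"]], **row}
    joined.append(merged)

# joined[0] now holds both the source fields and the prediction metadata for T001.
```

The same join could equally be done in a database or with a spreadsheet VLOOKUP; the key point is that Transaction_id is the shared key between the two files.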