Recently, I was presenting to a senior executive at a large IT company. He stopped me at the beginning of my presentation with the following words:
"My sales people don't even enter data into the CRM system, and when they do, it is garbage. I don't trust it for reporting. So why should I trust it for predictions? Garbage in, garbage out."
I have heard this argument many times from companies in the high-tech industry. Based on the outcomes from each and every one of those companies, I will go out on a limb and say that there is enough quality data in most CRM systems to make predictions with 80%+ accuracy.
The primary reason for this high level of accuracy is that DxContinuum uses a patented design to automatically overcome most of the data quality challenges that typically hamper predictive analytics solutions. When people say they have a data quality problem, they typically mean one of two things. Either data is missing ("My sales people don’t even enter data into the CRM system"), or data is incorrect ("...and when they do, it is garbage.").
Missing Data
When creating a predictive model, some techniques cope with missing data better than others. Since DxContinuum’s platform considers multiple techniques in parallel before selecting the right one for the data, it picks a technique that can handle missing values. The platform also screens out sparse columns: we do not consider columns that have fewer than 20% of their rows populated.
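To make the 20% rule concrete, here is a minimal sketch of that kind of column screen. It is an illustration only, not DxContinuum's patented implementation: the function name populated_columns, the min_fill parameter, and the sample rows are all made up for this example, and treating None and empty strings as "missing" is an assumption.

```python
def populated_columns(rows, min_fill=0.20):
    """Return the column names populated in at least min_fill of the rows."""
    if not rows:
        return []
    columns = sorted({key for row in rows for key in row})
    kept = []
    for col in columns:
        # Assumption: None and the empty string both count as missing.
        filled = sum(1 for row in rows if row.get(col) not in (None, ""))
        if filled / len(rows) >= min_fill:
            kept.append(col)
    return kept


# Hypothetical CRM export: "competitor" is filled in only 1 row out of 6.
rows = [
    {"amount": 1000, "industry": "retail", "competitor": None},
    {"amount": None, "industry": "retail", "competitor": None},
    {"amount": 5000, "industry": "", "competitor": None},
    {"amount": 2500, "industry": "tech", "competitor": None},
    {"amount": 700, "industry": "tech", "competitor": "Acme"},
    {"amount": 900, "industry": None, "competitor": None},
]
usable = populated_columns(rows)
```

In this sample, "competitor" falls below the 20% bar and is dropped, while "amount" and "industry" survive despite their gaps.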
DxContinuum’s platform automatically generates not just one model but a set of models, each built using a subset of the columns in the original data set. Therefore, if the value of a critical variable (say, the amount of the opportunity) is unavailable when predicting a particular opportunity, DxContinuum uses a model that was built without taking amount into consideration.
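The fallback idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the platform's actual selection logic: pick_model is a hypothetical helper, the "models" are placeholder functions with made-up scores rather than trained models, and preferring the model that uses the most available columns is just one plausible rule.

```python
def pick_model(models, record):
    """Pick the model using the most columns among those the record can satisfy."""
    # Assumption: None and the empty string both count as missing.
    available = {name for name, value in record.items() if value not in (None, "")}
    usable = [cols for cols in models if cols <= available]
    if not usable:
        raise LookupError("no model matches the populated fields")
    best = max(usable, key=len)
    return best, models[best]


# Placeholder scorers standing in for trained models (made-up numbers).
models = {
    frozenset({"stage", "amount"}): lambda r: 0.7 if r["amount"] > 10000 else 0.4,
    frozenset({"stage"}): lambda r: 0.5,
}

# The amount is missing, so the stage-only model is chosen.
opportunity = {"stage": "negotiation", "amount": None}
columns, model = pick_model(models, opportunity)
score = model(opportunity)
```

When the amount is present, the same call returns the richer two-column model instead.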
Incorrect Data
If some of the data in the training set used for model creation is incorrect, the situation is a bit more complex. At a macro level, two broad kinds of data are used: continuous (integers or decimals) and categorical (a list of discrete values). For continuous values, using the log of the number instead of the number itself, or binning the values, can compensate for the inaccuracy. For amounts, for example, the predictions are reasonable as long as the order of magnitude is right.
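A small sketch shows why these transformations are forgiving. The helper names log_amount and magnitude_bin are invented for this illustration, and binning by powers of ten is only one possible binning scheme, not necessarily the one the platform uses.

```python
import math


def log_amount(amount):
    """Log-scale an amount; log1p keeps zero well-defined."""
    return math.log1p(max(amount, 0.0))


def magnitude_bin(amount):
    """Bin by order of magnitude: 0 for under 10, 1 for tens, 2 for hundreds, ..."""
    if amount < 1:
        return 0
    return int(math.floor(math.log10(amount)))


# A rep enters 12,000 for a deal that is really worth 18,000.
entered, actual = 12000, 18000
same_bin = magnitude_bin(entered) == magnitude_bin(actual)
log_gap = abs(log_amount(actual) - log_amount(entered))
```

The two amounts differ by 6,000 in raw terms, but they land in the same bin and are close on the log scale, so a model trained on either representation sees nearly the same input.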
With categorical variables, it is different. While we have solved the problem of identifying “Walmart” and “Walmrt” as the same entity for some of our customers, the general problem is harder to solve. Incorrect categorical data does affect the quality of the model and its predictions, and a master data repository can reduce that impact.
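For the simple misspelling case, a rough sketch using Python's standard difflib module gives the flavor. This is not how DxContinuum solved it for customers; it is a generic string-similarity approach. The canonical_name helper, the 0.85 cutoff, and the list of known names (a stand-in for a master data repository) are all assumptions made for the example.

```python
import difflib


def canonical_name(raw, known_names, cutoff=0.85):
    """Map a raw account name to its closest known name, or None if nothing is close."""
    lookup = {name.lower(): name for name in known_names}
    matches = difflib.get_close_matches(
        raw.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None


# Stand-in for a master data repository of account names.
known = ["Walmart", "Target", "Costco"]
fixed = canonical_name("Walmrt", known)
unknown = canonical_name("Initech", known)
```

Here "Walmrt" resolves to "Walmart", while a name with no close match returns None so it can be reviewed rather than silently merged. Pure string similarity will not catch abbreviations, subsidiaries, or renamed companies, which is part of why the general problem is harder.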
Every company is different, but chances are good that your CRM has enough data to make quality predictions. To do so, you need the proper kind of operational predictive analytics platform.
As for the IT executive who interrupted my presentation? He ended up letting us use his historical data to create a model and test it on opportunities whose outcomes his team knew but withheld from us. I'm happy to report that our out-of-the-box solution used a mesh of 62 individual models to meet the 80% accuracy threshold, just as we said it would.
If you're intrigued by what predictive analytics can bring to your sales organization, don't let fears over data quality hold you back. Contact DxContinuum today and get started with our simple FunnelVision test of your data to see what might be possible. Like that IT executive, you'll be very glad you did.