Most companies today are trying to become data driven; data-driven marketing and sales solutions, for example, are everywhere, which creates a large pool of potential customers for B2B data products. For a data-driven product to succeed, prospects and customers must trust what we are building, and that means giving them clear information about the accuracy of the solution they are about to buy or are actively using. When designing a data solution, we therefore need to build in the ability to generate reports that show how well the solution performed for each customer.
In this blog post I will walk through three key moments in your customer's journey where you can earn their trust in your data solution. It pays to think about these scenarios ahead of time, so that you can provide accuracy reports proactively rather than on demand.
Let’s start at the beginning. For many potential customers, the first time they see your data solution is during a demo. In most cases, the demo is built on demo company X, where your data solution achieves a precision of Y and a recall of Z. Often the prospect will respond with, “This looks nice, but what does it look like with my data? Can I expect the same performance on the models you will build for me?”
If your product predicts, for example, sales for the next year, you probably don’t want to give it away for free for a year and let the prospect decide at the end whether to buy. On the other hand, your prospect probably doesn’t want to pay for a product they have only seen validated on a demo company. Therefore, you need to be able to simulate how your product would have performed for the prospect last year, based on their historical data: generate last year’s predictions retroactively and compare them with the values you have today.
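To make this concrete, here is a minimal backtest sketch. It assumes a hypothetical sales model with the usual fit/predict interface and a prospect's historical data in a pandas DataFrame; the column names and the error metric are illustrative, not a prescription.

```python
# A minimal backtest sketch: train on data up to a cutoff date,
# predict the following period, and compare with what actually happened.
# `model`, the "date"/"sales" columns, and MAPE as the metric are assumptions.
import pandas as pd
from sklearn.metrics import mean_absolute_percentage_error


def backtest_last_year(model, history: pd.DataFrame, cutoff: str) -> float:
    """Simulate last year's predictions on the prospect's own history."""
    train = history[history["date"] < cutoff]
    holdout = history[history["date"] >= cutoff]

    model.fit(train.drop(columns=["sales"]), train["sales"])
    predicted = model.predict(holdout.drop(columns=["sales"]))

    # One number the prospect can relate to: how far off were we, on average?
    return mean_absolute_percentage_error(holdout["sales"], predicted)
```

The point is not the specific metric, but that the prospect sees the comparison on their own data rather than on demo company X.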
This validation and simulation mode doesn’t need to live in your production environment. The key is to isolate the predictive components of your software so that they can easily run on historical data from your client. It is also important to have a well-defined process for extracting sample data from the client, and clear agreements about what happens to that sample data once the evaluation ends, whether it succeeds or not.
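One way to keep the predictive component isolated is to put a small data-source interface in front of it, so the same code can run against production data or a client's sample extract. The names below (`DataSource`, `SampleFileSource`, `run_validation`) are hypothetical, just a sketch of the idea.

```python
# Sketch: the predictive component only depends on a DataSource interface,
# so it runs unchanged on production data or on a client's evaluation sample.
from typing import Protocol
import pandas as pd


class DataSource(Protocol):
    def load(self) -> pd.DataFrame: ...


class SampleFileSource:
    """Reads the historical sample a prospect provided for the evaluation."""
    def __init__(self, path: str):
        self.path = path

    def load(self) -> pd.DataFrame:
        return pd.read_csv(self.path)


def run_validation(model, source: DataSource) -> pd.DataFrame:
    """Run the same predictive component on whatever data source is plugged in."""
    data = source.load()
    features = data.drop(columns=["actual"], errors="ignore")
    data["prediction"] = model.predict(features)
    return data
```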
Next, let’s say the prospect was happy with the numbers you generated in the previous scenario and has become a customer who is happily using your product. Once the period we made predictions for has arrived, both we as data scientists and our customers want to assess how the models actually performed.
Generating performance reports can be tedious and time-consuming. A good practice is to start with reports in Excel or Google Sheets and gather plenty of feedback from your internal stakeholders. Once they are satisfied with the metrics being tracked, you can automate the report generation; options range from macros in Excel to dedicated reporting solutions like Domo. Send the performance results not only to your customers but also to internal stakeholders, and add alerting thresholds that notify you when the performance of a model is dropping, so that you can react to it.
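As a rough illustration of what the automated step might look like once the spreadsheet phase is done, here is a sketch of a per-customer report with an alert threshold. The metrics, the threshold value, and the notification channel are all assumptions to be replaced by your own.

```python
# A minimal sketch of automated performance reporting with an alert threshold.
from sklearn.metrics import precision_score, recall_score

ALERT_PRECISION = 0.80  # illustrative: notify the team if precision drops below this


def performance_report(y_true, y_pred, customer: str) -> dict:
    """Compute the tracked metrics for one customer and alert on a drop."""
    report = {
        "customer": customer,
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
    }
    if report["precision"] < ALERT_PRECISION:
        # Replace with your own channel: email, Slack webhook, PagerDuty, etc.
        print(f"ALERT: precision for {customer} dropped to {report['precision']:.2f}")
    return report
```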
This brings us to the last scenario: you keep working hard on your model and manage to improve its precision by 1%. You deployed the update last week, and today you get a phone call from a worried customer saying that the product is not behaving as expected. It is important to understand the customer’s use case and why the model is not behaving the way it should. Add these cases as extra data to your test set, so that you know how each model update affects your customer’s data.
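In practice, this can be as simple as turning the customer's reported cases into a regression check that runs before every release. The file name, the `expected` column, and the comparison below are hypothetical; the point is that the effect of an update on those cases is measured explicitly.

```python
# Sketch: evaluate a model on a customer's reported cases, so every update
# is compared against the previous version on exactly those examples.
import pandas as pd
from sklearn.metrics import precision_score


def evaluate_on_customer_cases(model, cases_path: str) -> float:
    cases = pd.read_csv(cases_path)  # the customer's reported examples
    predictions = model.predict(cases.drop(columns=["expected"]))
    return precision_score(cases["expected"], predictions)


# Before shipping an update, run it for both the old and the new model:
# old_p = evaluate_on_customer_cases(previous_model, "customer_x_cases.csv")
# new_p = evaluate_on_customer_cases(candidate_model, "customer_x_cases.csv")
# assert new_p >= old_p, "Update regresses on customer X's cases"
```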
When you get this kind of request from a customer, make sure you can test their cases against previous versions of your models; you can reuse a setup similar to the one described in the first scenario. You should also be able to revert to a previous model if needed, which means keeping all the parameters and model files required to do so. If this becomes a frequent scenario, engage a larger group of customers to check whether your test set still represents their typical use cases. This is also a good time to grow your gold test set and, if needed, improve your algorithm.
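Keeping those parameters and model files does not have to be elaborate. A minimal sketch, assuming joblib serialization and a local `models/` directory (both are assumptions; a model registry or object storage works just as well):

```python
# A minimal sketch of versioned model artifacts, so old versions can be
# re-tested against customer cases and reverted to if needed.
import json
from pathlib import Path

import joblib

MODEL_DIR = Path("models")


def save_version(model, params: dict, version: str) -> None:
    """Persist the model file and its parameters under an explicit version tag."""
    target = MODEL_DIR / version
    target.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, target / "model.joblib")
    (target / "params.json").write_text(json.dumps(params, indent=2))


def load_version(version: str):
    """Reload an earlier version, e.g. to reproduce last week's behaviour."""
    target = MODEL_DIR / version
    model = joblib.load(target / "model.joblib")
    params = json.loads((target / "params.json").read_text())
    return model, params
```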
Continuous reporting on the performance of a predictive model is easy to overlook when designing a data solution, yet it is one of our customers’ biggest concerns. It is therefore important to think proactively about how to incorporate it into your architecture, so that you can respond quickly to customer questions.
Written by Mary Loubele, Data Growth Coach
Mary Loubele is the Analytics Dev Manager at MappedIn. Prior to joining MappedIn, she was a Senior Data Engineer at TalkIQ, which was acquired by Dialpad in May 2018.
Before TalkIQ, she was the Director of Data Science and Engineering at FunnelCake, and before that she held positions as a Data Scientist at D2L and as an NLP Software Developer at Maluuba, now a Microsoft company. She holds a PhD in Medical Image Computing and a Master’s degree in Computer Engineering from KU Leuven. She also organizes several meetups in the Waterloo region, including KW Intersections and Waterloo Data Science and Engineering.