This is the second part of our three-part article series. You can refer to Part I here.
Garbage in, garbage out. One of the most quoted statements in computer science, it can be interpreted in multiple contexts. Your AI initiative and strategy are only as good as the data you're collecting. Just as good fuel helps an engine perform better, good data powers your AI initiative to deliver exceptional value. Most business leaders are far removed from this step, which leads them to believe that almost all the data they currently collect can be used to predict any outcome. Many leaders end up trying to predict desirable outcomes from unnecessary or irrelevant features, with their data science teams optimising for the wrong north-star metric, which leads to poor performance masked by high accuracy.
The process of collecting relevant data begins when you first create your data map and continues through every initiative you build thereafter. This allows for better interconnectivity between the features and initiatives of a product and creates an inherent path within your data. Once the initiative reaches a certain level of maturity, data centralisation is a great way to manage data from multiple sources. For instance, if we try to predict the likelihood of an item being purchased on an e-commerce website, centralised data lets us look at many more variables beyond explicit indicators. Has the user made any purchases before, spent enough time on the product, or added balance to their wallet?
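To make the e-commerce example a little more concrete, here is a minimal sketch of pulling signals from several centralised sources into one feature row for a user and product. The source names (`orders`, `page_views`, `wallet_topups`), their field names and the `build_features` helper are all hypothetical, chosen only to mirror the three questions above.

```python
def build_features(user_id, product_id, orders, page_views, wallet_topups):
    """Combine signals from several (hypothetical) centralised sources.

    orders:        list of {"user_id", "product_id"}
    page_views:    list of {"user_id", "product_id", "seconds"}
    wallet_topups: list of {"user_id", "amount"}
    """
    # Has the user made any purchases before?
    prior_purchases = sum(1 for o in orders if o["user_id"] == user_id)
    # How long has the user spent on this product's page in total?
    seconds_on_product = sum(
        v["seconds"]
        for v in page_views
        if v["user_id"] == user_id and v["product_id"] == product_id
    )
    # Has the user added balance to their wallet?
    topped_up_wallet = any(t["user_id"] == user_id for t in wallet_topups)
    return {
        "prior_purchases": prior_purchases,
        "seconds_on_product": seconds_on_product,
        "topped_up_wallet": topped_up_wallet,
    }


orders = [{"user_id": "u1", "product_id": "p9"}, {"user_id": "u2", "product_id": "p1"}]
page_views = [
    {"user_id": "u1", "product_id": "p1", "seconds": 40},
    {"user_id": "u1", "product_id": "p1", "seconds": 25},
    {"user_id": "u1", "product_id": "p2", "seconds": 10},
]
wallet_topups = [{"user_id": "u1", "amount": 500}]

print(build_features("u1", "p1", orders, page_views, wallet_topups))
# {'prior_purchases': 1, 'seconds_on_product': 65, 'topped_up_wallet': True}
```

The point is less the code than the shape: none of these three signals lives in the same place as the purchase event itself, which is why centralisation matters before modelling.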
Detailed interconnectivity of data also helps when you're investigating cause and effect. If multiple aspects of a product all flow into one global action, such as a final purchase, tracking relevant data at each step lets you go granular with your analytics and pinpoint what happens at each step. This should also help you track the right metrics for the problem your initiative sets out to solve.
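As a rough illustration of step-level tracking, the sketch below takes the set of users who reached each step of a funnel ending in a purchase and computes conversion from the previous step, so the weakest step stands out. The step names, user IDs and the `step_conversion` helper are hypothetical.

```python
def step_conversion(funnel):
    """funnel: ordered list of (step_name, set_of_user_ids_reaching_step).

    Returns (step_name, conversion_from_previous_step) for every step after
    the first. Only users who also reached the previous step are counted,
    so the rate reflects drop-off along the path.
    """
    rates = []
    for (_, prev_users), (name, users) in zip(funnel, funnel[1:]):
        carried_over = users & prev_users
        rate = len(carried_over) / len(prev_users) if prev_users else 0.0
        rates.append((name, rate))
    return rates


funnel = [
    ("viewed_product", {"u1", "u2", "u3", "u4"}),
    ("added_to_cart", {"u1", "u2", "u3"}),
    ("purchased", {"u1"}),
]
for name, rate in step_conversion(funnel):
    print(f"{name}: {rate:.0%}")
# added_to_cart: 75%
# purchased: 33%
```

Here the drop between cart and purchase is where you would go granular first.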
For AI initiatives to run their course and create long-term impact, a strong link to the product is necessary. As the saying goes in statistics, statistical significance does not equal practical importance; likewise, a great deal of R&D does not translate into direct business impact. While most conceptualisation begins within data science teams, tying outcomes to the product and getting real value out of experiments is something the data science and product teams do best in unison. Several things work well in a cross-functional team. For starters, there is a breadth of knowledge in the room, with varied viewpoints that can approach the same problem in different ways. More importantly, the right domain experts for the right feature sets are available to get everyone on the same page. Does your feature involve psychological domain expertise that your data may be overlooking? You're in luck; your product owner is in the room with you.
More often than not, domain experts bring radical ideas to the table. As connoisseurs of their fields, they usually have the drive to build and deploy features that are unparalleled in the industry and unique in their proposition. Some of these ideas never see the light of day because they are either too far-fetched and futuristic or have minimal business impact. For instance, my team and I were once trying to develop an upgrade to our sentiment analysis using state-of-the-art algorithms that would incorporate each user's context: both the current state of their organisation and the question asked. This was undoubtedly an incredible idea and might have had a significant long-term impact on our business, but during our impact analysis and prioritisation sessions, we decided against it and ranked it at the bottom. Why? Depending on the scale of your current business, there are always more urgent problems (low-hanging fruit) that could have a significant short-term impact without taking as long as this one would have taken us. Is the best solution to trash these longer-term ideas? I advise against it. If it's a great idea that could be useful down the line, add it to the "revisit soon" bucket.
When identified and bucketed in time, "revisit soon" ideas can make for excellent long-term project plans with targeted outcomes. Otherwise, they are prone to getting caught in endless R&D loops if attempted earlier or later than they should be.
Given the growth and scale of AI adoption in the past five years, most of us may have already experienced our first AI initiative. Chances are that we have experienced, directly or indirectly, one complete workflow of how an initiative is deployed to end-users. A large part of ensuring an AI initiative's adoption is arriving at an acceptable degree of anthropomorphism as perceived by its end-users. Humanise your initiative too much, and each mistake is watched and critiqued. Humanise it too little, and it may lose adoption.
Two areas need watching when you start your initial deployment:
- Workflow Adaptability
- Metrics Impact
End-users of your initiative, on either end of the spectrum, could be new to the change and play around with it to get the hang of it: the novelty effect.[i] This can distort the statistics of your rollouts too, but more importantly, end-users encounter a new element in their pre-existing workflow, which may take time to navigate. For example, a line-worker in a phone factory may be accustomed to a particular workflow, or an HRBP in an organisation may be well versed in gathering feedback through year-long surveys or pulse triggers. Shifting how they run their standard workflow requires both training and time for exploration, so that we can identify whether they work around the initiative or use it to get maximum value. Often, you will find that a fair proportion of your end-users do not adapt well to an initiative without its ancillary deployments. An HRBP who usually sends out pulse surveys manually and extracts data through Excel downloads may initially be excited about automated pulses, but may not use the new initiative extensively because their analysis still has to happen the way it did before. For cultural adoption and adaptability, the ancillary deployments play a significant role in moving the needle. You can only identify the parts of your workflow that remain incomplete by starting small and moving slowly. A brand-new workflow may be too hard for the user to interact with, giving you misleading data with which to evaluate your experiment.
Another set of users you will identify in this process are those who go the extra mile to "deal with" their ecosystem. This is another false signal that the rollout has succeeded: in reality, these people may just be (silently, not joyfully) doing more than they used to. This may not affect your initiative in the short run but can later show up as lower satisfaction scores. A controlled rollout can help you find and mitigate the additional workflow step by tracking guardrail metrics alongside your north star metric. If your end-users are deriving more value at the expense of their time, the question is whether the trade-off between value derived and time spent is appropriate for their use case.
Besides the novelty effect I mentioned earlier, there are a few similar effects, and they all share one primary trait: they show you impact where there is none. You could be rolling out a remarkable feature to your end-users and see an immediate impact on your north star. As you speed up your rollout, you may overlook whether that growth came from initial excitement or from actual impact.
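One simple, admittedly rough way to probe for this is to compare the lift in an early window of the rollout against the lift in a later window; a lift that decays sharply suggests initial excitement rather than lasting impact. The sketch below is a heuristic, not a statistical test, and the 7-day early window, the 50% decay threshold and the `novelty_check` helper are assumptions for illustration.

```python
from statistics import mean


def lift(treatment, control):
    """Relative lift of the treatment mean over the control mean."""
    return (mean(treatment) - mean(control)) / mean(control)


def novelty_check(treatment_daily, control_daily, early_days=7, max_decay=0.5):
    """Compare lift in the first `early_days` against the remaining days.

    treatment_daily / control_daily: daily north-star values, aligned by day.
    Flags a likely novelty effect when the later lift has fallen by more than
    `max_decay` (as a fraction) of the early lift. Heuristic only.
    """
    early = lift(treatment_daily[:early_days], control_daily[:early_days])
    later = lift(treatment_daily[early_days:], control_daily[early_days:])
    decayed = early > 0 and later < early * (1 - max_decay)
    return early, later, decayed


control = [100] * 14
treatment = [130] * 7 + [105] * 7  # big early jump that fades
early, later, decayed = novelty_check(treatment, control)
print(f"early lift {early:.0%}, later lift {later:.0%}, novelty suspected: {decayed}")
# early lift 30%, later lift 5%, novelty suspected: True
```

In practice you would also want confidence intervals around each window's lift before acting on the flag.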
Without delving into the mathematical intricacies, this is also a step where your north star metric could fail your strategy. A north star metric is sacrosanct to the initiative: the guiding metric for your success or failure. For sales, it's the dollar value. For customer success, it could be renewals. For engineering, it could be fewer bugs in production. This metric is set at the beginning, and initiatives are built around tracking its growth. If you move too fast, your north star could shoot up and show significant impact due to many external factors; what protects you from more extensive damage are your guardrail metrics. They show you the larger picture even while your north star remains bullish. For example, as an e-learning company, you could track hours in the classroom as the north star for an initiative that brings people back to the platform to study, and track graduation rates as a guardrail to ensure that their learning has actually been successful. To understand what your north star and guardrail metrics should be, a glance at the current workflow or user journey for completing a critical task should suffice. Are you adding a step to the workflow, removing one, or creating one that aids the rest? Depending on these decisions, your north star and guardrail metrics will change and adapt to track your initiative better and make the rollout smoother.
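A minimal sketch of how guardrails might gate a controlled ramp, borrowing the e-learning example: expand exposure only if the north star improves and no guardrail drops beyond its tolerance, and roll back if one does. The ramp steps, metric names, tolerances and the `next_ramp` helper are all hypothetical choices, not a prescribed policy.

```python
RAMP_STEPS = [0.01, 0.05, 0.20, 0.50, 1.00]  # hypothetical share of users exposed


def next_ramp(current, north_star_lift, guardrail_changes, tolerances):
    """Decide the next exposure level for a controlled ramp.

    north_star_lift:   relative change in the north star (e.g. classroom hours).
    guardrail_changes: {metric: relative change}, e.g. {"graduation_rate": -0.01}.
    tolerances:        {metric: largest acceptable drop}; metrics without an
                       entry get zero tolerance for any drop.
    Returns (next_share, reason).
    """
    breached = [
        m for m, change in guardrail_changes.items()
        if change < -tolerances.get(m, 0.0)
    ]
    if breached:
        # A guardrail fell too far: shrink back to the smallest step and investigate.
        return RAMP_STEPS[0], f"rollback: guardrail breached {breached}"
    if north_star_lift <= 0:
        return current, "hold: no north-star improvement yet"
    larger = [s for s in RAMP_STEPS if s > current]
    return (larger[0] if larger else current), "expand"


tol = {"graduation_rate": 0.02}
print(next_ramp(0.05, 0.12, {"graduation_rate": -0.01}, tol))
# (0.2, 'expand')
print(next_ramp(0.05, 0.12, {"graduation_rate": -0.04}, tol))
# (0.01, "rollback: guardrail breached ['graduation_rate']")
```

Note that the second call has exactly the same north-star lift as the first; only the guardrail tells the two situations apart.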
While these two areas cover cultural and statistical nuances from an end-user's standpoint, a controlled ramp has another benefit: it helps you spot errors early, rectify them, and adapt your initiative based on initial user behaviour.
Early in my tenure at inFeedo, when we deployed our intelligent text-bucketing algorithm[ii], we missed one crucial step in the race to reach the customer: ramping. We started with eight distinct buckets, such as salary, relationship and manager, but found that one strong use case had been left out: letting users bucket their text answers in a way that suited them. This meant giving users the flexibility to create custom buckets beyond the eight the product allowed. We identified this missing piece through a variety of customer conversations and internal feedback sessions. Since it was a new launch, the novelty effect helped us ride the wave, allowing us to foresee more requirements and build the feature before the gap drew criticism. The luxury of novelty we enjoyed isn't typical, so ensure controlled ramps to the end-user. There was so much more we could have done had we ramped better, as we learned in later experiments: fixing bugs, closing the loop with the end-user, and envisioning new use cases.
The most crucial checkbox to tick before formulating a rollout strategy is adapting to your domain of deployment. In sensitive domains, deploying below benchmark performance, without waiting to reach an acceptable level of workability, can cause a lot of harm. I believe data scientists and algorithm makers should merely be in the loop when deciding the "acceptable" level of workability of a solution in niche domains, with the primary decision-making left to policymakers or domain experts. I say this with two domains in particular in mind, healthcare and attrition prediction, but there are undoubtedly many more that can directly or indirectly influence the fate of individuals or end-users of the initiative. Others to consider include self-driving or flying cars and loan-default prediction.
Your initiative will be at cultural odds when it uses black-box algorithms to directly influence the fate of a child going through surgery, or that of an extraordinarily passionate and career-oriented individual whom the machine thinks is highly likely to attrite. In domains or use cases of such high sensitivity, even small performance gains can add a lot of value to the overall perception of your efforts. Going out too fast may damage trust irreparably, and regardless of how much you improve in the coming weeks or months, broken trust is hard to win back.
This can feel like a never-ending wait to deployment since, as some argue, there's always room for more improvement. A great way I've seen firms go to market earlier, with room for improvement still on the cards, is psychological priming: a technique in which exposure to one stimulus influences the response to a subsequent stimulus. It is used widely for black-box initiatives that may be running at sub-par capacity (or even at full capacity). It helps you avoid flak in the early stages for bold claims that may not be as close to reality as expected. At inFeedo, we used it by adding an icon that informed users of the limits of our attrition-prediction capabilities and assured them that they still held the upper hand in feedback and career decision-making, with our product being just a tool to aid them. Different initiatives apply this simple technique in different ways to assure end-users that human judgement is still primary and isn't being replaced, at least not by sub-par performance.
In domains that make high-value decisions based on forecasts, it's even more important to be transparent about how a conclusion was reached, or to call out its limitations, so that you can trace back what went right or wrong and either recreate or avoid a given decision in the future. Future irregularities are outliers relative to the past and may not follow any existing trend. It is inadvisable to hand off decision-making in these domains entirely to automated systems, since countless variables may have contributed to an outcome or prediction, and you may not be using them all. If you're generating the forecast, setting these expectations from the start will help you gain trust and support safer decision-making; if you're on the receiving end, be mindful of what you base your decisions on.
Building on workflow adaptability, it is essential to go a step further and understand how far your end-users will be derailed from their current normal, to see whether a fast ramp pushes them too far to function efficiently. This could happen if your initiative is the first "intelligent" one they come across in their workflow. Such a quick shift triggers a cultural change, prompting them to wonder about aspects of your motivation that you may not want to surface. A rapid change could trigger a spiral effect that reduces your initiative's NPS (Net Promoter Score). Often, this effect starts at the top, and the move is scrutinised before it has a chance to trickle down. On other occasions, where your end-users may not be decision-makers, it could take slightly longer to reach a similar final state, while reducing adoption and pushing people back to the ways they're more familiar with. This is where your guardrail metrics can help before the real NPS drop happens, signalling lower adoption and other warning signs before your north star is affected.
You can go on to read Part III here.