In 1998 Netflix launched with $2.5 million of startup cash. Twenty years later, it’s a streaming video empire and independent TV studio with a market cap close to $160 billion. How did a mail-delivery DVD rental service become one of the most dominant forces in media today?
The answer is simple: machine learning models. From the time it launched streaming video, Netflix has integrated models into every aspect of its business, from its famous recommendation engine to the backend processes that keep its streams coming through crystal clear. As a result, it has a leg up on the competition. Legacy entertainment companies from Disney to NBC have struggled to launch streaming services that are as reliable, efficient, and popular with viewers.
In this, Netflix isn’t alone. Models power the success of today’s most innovative (and most profitable) companies, from Allstate to Amazon. These companies belong to an elite group: only 5 percent of companies have extensively integrated models into their processes, according to a 2017 MIT Sloan study. The other 95 percent may hire data scientists in hopes of getting in the game, but they often find themselves plagued by staggering inefficiencies, frustrated and unsure why their returns aren’t living up to the promise.
What do the 5 percent have in common? They don’t fall for the Model Myth: the false assumption that models can be managed in the same way as hardware or software. They adapt their project management practices, IT stacks, and other infrastructure to suit the unique needs of data scientists. Here are three concrete ways that your company can bust the Model Myth and harness the power of models to stay on the cutting edge.
Keep Research Processes Flexible
Scientific discoveries don’t follow a set schedule. In fact, some major breakthroughs in scientific history – like the discovery of penicillin – happened completely by mistake. In the same way, the process of developing a great model rarely follows a straight line. In software development, a client defines a set of requirements and a software engineer writes code to fit them; data science, by contrast, is a trial-and-error process of inquiry and discovery.
To managers who are used to the straightforward world of software, that process can seem incredibly inefficient. In a model-driven organization, however, managers must understand that what looks like a stalled project may be a necessary step as the researcher works through a problem, or may lead to an unexpected insight on another problem altogether. For one former Bell Labs data scientist, for example, “following the thread” of an incongruity in performance data for the telephone network led to millions of dollars of savings for the company in a totally different sector. If managers are overly rigid, they risk missing out on such opportunities.
To make your organization truly model-driven, build some room for experimentation into your data science team’s processes. For example, while the scope of a software product can often be reduced in the interest of meeting a deadline, adjusting the scope of a model may alter the project entirely. Managers should set deadlines with that in mind, rather than counting on scope cuts to hit a fixed date.
Deploy Models Where You Need Reasonable Probability, Not Perfect Accuracy
“All models are wrong,” wrote statistician George Box. “But some are useful.”
When software is wrong, it’s because of a bug that engineers need to go back and fix. By contrast, a model being wrong occasionally doesn’t necessarily mean it’s badly made.
It’s true that some incorrect predictions can result from problems such as concept drift, which happens when the relationship between a model’s inputs and the outcome it predicts changes over time, making a model trained on earlier data less accurate. Your data science team should watch for and counteract those very real issues when they arise. However, no model is ever 100% accurate, for the simple reason that models, like much of science, are probabilistic. They tell us the likelihood of an outcome, not whether that outcome will certainly happen.
This should be an important consideration whenever a business is deciding whether to employ a model to make a business decision. Leadership should ask: how often is that model going to be wrong? What will we do when it is wrong? Does the benefit we get from using the model outweigh the cost of all the times the model is wrong? Rather than assuming an incorrect prediction means a bad model that needs to be fixed, they should understand that probability is the nature of the tool. If an operation is so vital to the business that it must be done exactly right every time, that may not be an appropriate place to integrate a model.
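Leadership’s questions can be made concrete with a back-of-the-envelope expected-value calculation. The sketch below is a minimal illustration, not a prescribed method: it assumes each prediction is simply right or wrong, and that a single average gain and a single average cost can be attached to those two outcomes. With accuracy p, gain G, and cost C, the expected net value per decision is p·G − (1 − p)·C, and the model breaks even at p = C / (G + C). The names `ModelDecision`, `expected_value_per_decision`, and `break_even_accuracy` are hypothetical, and real decisions usually call for a richer cost model.

```python
from dataclasses import dataclass


@dataclass
class ModelDecision:
    """Hypothetical inputs for one model-driven business decision.

    Assumption: every prediction is either right or wrong, and each outcome
    carries one fixed average value.
    """
    accuracy: float       # fraction of predictions the model gets right (0 to 1)
    gain_if_right: float  # value gained each time the prediction is correct
    cost_if_wrong: float  # loss incurred each time the prediction is wrong


def expected_value_per_decision(d: ModelDecision) -> float:
    """Expected net value of acting on a single prediction."""
    if not 0.0 <= d.accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    # Weight the gain by how often the model is right and the cost by how often it is wrong.
    return d.accuracy * d.gain_if_right - (1.0 - d.accuracy) * d.cost_if_wrong


def break_even_accuracy(gain_if_right: float, cost_if_wrong: float) -> float:
    """Lowest accuracy at which the expected net value is not negative."""
    # Solve p * gain - (1 - p) * cost = 0 for p.
    return cost_if_wrong / (gain_if_right + cost_if_wrong)


if __name__ == "__main__":
    # Routine decision: errors cost twice as much as correct calls earn.
    routine = ModelDecision(accuracy=0.75, gain_if_right=10.0, cost_if_wrong=20.0)
    print(expected_value_per_decision(routine))  # 2.5

    # Critical decision: a single error wipes out a thousand correct calls.
    critical = ModelDecision(accuracy=0.99, gain_if_right=1.0, cost_if_wrong=1000.0)
    print(expected_value_per_decision(critical) < 0)  # True
    print(round(break_even_accuracy(1.0, 1000.0), 3))  # 0.999
```

In the first case, a model that is wrong one time in four still pays off on average. In the second, even 99 percent accuracy loses money because each error is so expensive, which is exactly the kind of operation where integrating a model may not be appropriate.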
Adapt IT Infrastructure to Data Scientists’ Needs
Compared to data scientists, software developers’ technology use is highly predictable. They work in a standard development environment and can plan in advance when they’ll need specific computing tools, based on coding deadlines. Most companies’ technology foundations are built for this predictability: they are designed to spool or store limited amounts of data, or to run large processing jobs scheduled in advance.
Data science is different and requires a different approach to IT. Data scientists’ work varies greatly from day to day, and it is difficult for them to schedule their use of heavy-duty computing power very far in advance. Each data scientist’s stack is also unique and draws on an ecosystem that evolves incredibly rapidly. To see how fast-paced this field truly is, consider TensorFlow, one of the most widely used data science tools today: it was only released as open source in late 2015.
In a management system designed for software or hardware development, red tape can deny data scientists the flexible access to tools and technology they need. That can hold up projects for weeks – or even months – at a time. For a company to realize the full potential of models, IT managers must ensure an open, scalable, and instantly elastic technology foundation. They must speed up package acquisition and approval processes to avoid months-long waits, ensure easy access to scalable compute and specialized hardware like GPUs, and bolster infrastructure necessary for heavy-duty data processing. This isn’t a bonus – it’s a key organizational capability.
The Time Is Now to Bust the Model Myth
Failing to make your business model-driven is no longer just a question of missing out on profits. Companies that can’t keep up face an existential threat. If they don’t want to be outpaced by the likes of Netflix and Amazon, today’s companies have to be willing to rid themselves of old policies that stifle innovation.
By appreciating the “science” of data science, companies can take inspiration from fields like biotech, which already understand that groundbreaking research requires flexibility and the willingness to learn from mistakes. Bust the Model Myth now, and your company can build a management framework that lets you benefit from the incredible potential of models – before time runs out.