Deep learning, and machine learning more generally, is a particular paradigm of programming. Rather than explicitly telling the computer each operation it must perform, you tell the computer how to learn the operations it must perform. Despite what some people might say, this is still programming, and many of the best coding practices still apply. Nonetheless, it’s a sufficiently significant change to lead otherwise successful software companies astray.
In this post, I will describe a framework for thinking about deep learning, and I will use that term throughout the article, although many of the ideas also apply to machine learning more generally. The framework describes 4 streams of work related to deep learning. Many companies fail to derive value from deep learning because they focus on only one or two of the following streams.
This post is for anyone who works in or around deep learning. This framework is useful whether you’re a project manager, a data scientist, a machine learning engineer, or a team leader wondering why deep learning isn’t providing the expected return on investment.
The 4 streams of deep learning are:
- Models
- Model integration
- Data
- Infrastructure
If one of these streams is lacking in your deep learning strategy then you’re probably not as effective as you could be.
Models Stream

Models are the most obvious aspect of deep learning: they are the learning elements of the deep learning ecosystem. The models stream’s objective is to implement and train deep learning algorithms to achieve good performance on a very specific task. For instance, if you were working in the models stream, your role would be to optimise specific performance metrics (precision/recall, IoU, RMSE) on a well-defined test set. Other important elements of this stream include framing your problem in terms of input and output (or input and loss), selecting deep learning algorithms and model architectures, and hyperparameter tuning.
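As a concrete illustration of models-stream work, here is a minimal sketch of evaluating a classifier against fixed metrics on a well-defined test set. The labels and predictions are hypothetical stand-ins for a real model’s output:

```python
# Minimal sketch of models-stream evaluation: computing precision and
# recall on a held-out test set. In practice a library such as
# scikit-learn would provide these metrics; the point is the workflow.

def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Held-out labels and a hypothetical model's predictions.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

p, r = precision_recall(y_true, y_pred)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.75
```

Optimising numbers like these on a fixed test set is the entire scope of the models stream; everything downstream belongs to the other three.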
These are usually the skills taught in an introductory deep learning course at university or online. Most people claiming to be machine learning engineers are competent in this area: they know how to select, train and evaluate models. Companies usually understand the importance of this stream and hire accordingly. However, the rise of frameworks such as TensorFlow and PyTorch, as well as models pretrained on large open source datasets, has arguably made this stream less critical to the success of deep learning projects. Unless your team is working on a very specific supervised learning problem or using a less mature deep learning method such as reinforcement learning, your models stream is probably not your biggest weakness. This isn’t to say that there aren’t some very concrete challenges in modelling, but that’s a discussion for another time.
Key takeaway: A model that performs correctly on a well-defined dataset is a necessity. Nevertheless, huge advances in open source machine learning software and datasets have massively reduced the amount of R&D you will need in this stream. Before spending all your resources chasing a minimal gain in performance, make sure you are up to speed in the other streams.
Model Integration Stream
The model integration stream’s objective is to take a model that performs correctly on a well-defined deep learning dataset and make it useful in the real world. This stream is essential for companies and should be at the core of any deep learning initiative. Despite the huge improvements of end-to-end models in areas such as natural language processing and computer vision, most deep learning models do not solve real-world problems on their own. They must be integrated into larger applications composed of more traditional methods, such as signal processing, rule-based decision-making systems, etc.
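To make the integration point concrete, the sketch below wraps a stand-in model score in a pipeline alongside traditional techniques. All names, thresholds and the scoring function are illustrative placeholders, not a real system:

```python
# Hypothetical integration pipeline: the deep learning model is one
# stage among classical signal processing and rule-based decisions.

from statistics import mean

def model_score(window):
    # Placeholder for a trained model's per-window score in [0, 1].
    return mean(window)

def smooth(scores, k=3):
    """Moving average over the last k scores — a traditional
    signal-processing step applied to the model's raw output."""
    return [mean(scores[max(0, i - k + 1):i + 1]) for i in range(len(scores))]

def decide(score, threshold=0.5, min_quality=0.2):
    """Rule-based layer: business rules gate the model's output."""
    if score < min_quality:
        return "reject"  # too weak a signal to trust the model
    return "alert" if score > threshold else "ok"

windows = [[0.1, 0.2], [0.6, 0.8], [0.7, 0.9], [0.3, 0.2]]
decisions = [decide(s) for s in smooth([model_score(w) for w in windows])]
print(decisions)
```

The application’s value comes from the whole pipeline; a better `model_score` only matters insofar as it changes the final decisions.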
When evaluating the value of a deep learning model, many people stop at its performance on the well-defined labeled dataset used to train it. Far more important is the value the model provides to the application into which it is integrated, and that evaluation can be a lot less obvious. Imagine you work for a robotics company and have trained an object detection system that performs 3% better than your old one. That is probably huge from a modelling perspective, but how important is it for the robot as a whole?
Companies are more or less susceptible to overlooking the importance of the model integration stream depending on their type of business. Companies whose core business is not machine or deep learning are very likely to overlook it. They might hire one or two data scientists who will focus on the specific task of getting a model to perform very well on a labeled dataset (i.e., the models stream). However, when it comes to deploying and integrating the model, they will likely lack the software development skills, vision and support needed for their work to impact the business as a whole. On the other hand, companies that were founded on advances in deep learning are more likely to understand how the performance of individual models affects the business as a whole. They are also likely to be more technologically oriented and will have an easier time with the software side of the integration.
Key takeaway: Never forget that deep learning is simply a method to achieve a broader objective. If deep learning doesn’t help with your end goal or isn’t more effective than simpler, more traditional methods, it’s not worth it.
Data Stream

Data are likely the second most obvious aspect of deep learning. When people think of data in the context of deep learning, they usually imagine a static set of labeled data. For instance, they might think of an open source image dataset such as COCO or ImageNet, or a dataset with labels provided by a wearable device. They view the data as a fixed asset that does not evolve. But business objectives change, so for deep learning to promote business objectives, the data that supports it must be evolving and dynamic. Nowadays, no one would ever assume that a software application is a monolithic block that never changes. Data is the same.
For your data to be considered dynamic, it must be updated with feedback from the system in production. This allows you to continually re-evaluate the impact of the deep learning models on the end goal and answer questions such as: “Is the model still performing as expected? Or has there been a shift in the input data that is affecting performance?”. This feedback from the field is also likely to provide raw material for new training data. For instance, people using a face recognition app would constantly be providing raw images that can be labeled to produce more training data. This data, once labeled, might even become more valuable than the initial datasets since it represents the most recent data from the field.
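One way to picture this feedback loop is a logger that records every production prediction alongside its input and metadata, leaving room for a label to be added later. The in-memory store and field names below are purely illustrative stand-ins for real production infrastructure:

```python
# Illustrative sketch of closing the data loop: production predictions
# are captured with their inputs so they can later be reviewed,
# labeled, and folded back into the training set.

import time

class PredictionLogger:
    def __init__(self):
        self.records = []  # stand-in for a real database or event stream

    def log(self, model_version, features, prediction):
        self.records.append({
            "timestamp": time.time(),
            "model_version": model_version,
            "features": features,
            "prediction": prediction,
            "label": None,  # filled in later by human annotation
        })

    def unlabeled(self):
        """Records awaiting annotation — raw material for new training data."""
        return [r for r in self.records if r["label"] is None]

logger = PredictionLogger()
logger.log("v1.3", {"pixels_mean": 0.42}, "cat")
logger.log("v1.3", {"pixels_mean": 0.11}, "dog")
print(len(logger.unlabeled()))  # 2
```

Recording the model version with each prediction is what makes later questions like “did performance shift after the v1.3 rollout?” answerable at all.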
Another characteristic of dynamic data is the ability to search and join it with other data sources to provide new views and perspectives. Valuable data has metadata, and is indexed to help with searching and joining.
Key takeaway: For data to be valuable, it must be dynamic. A labeled dataset with little in common with the real world is only useful to bootstrap model development. Make sure you are collecting and using data from your production system to better understand how deep learning is helping and how your models can be improved. Collecting data alone is not enough: the data must be indexed and searchable to provide value.
Infrastructure Stream

The infrastructure stream must support the development of the models and data streams by providing appropriate data storage and computing. In this context, infrastructure also includes the software that supports data flows and the tasks one might call data engineering. Depending on the type of business and its data requirements, this might mean providing the other streams with the right cloud infrastructure, the necessary hardware (e.g., GPUs/TPUs) and data APIs. Embedded deep learning systems might impose additional constraints.
Infrastructure is particularly relevant for the data stream. Data cannot be dynamic if it is stored in CSVs on a hard drive. For data to be dynamic it must be stored on systems providing fast search capabilities and easy access. The infrastructure stream must select databases adapted to the type of data and the right tools to query and search it.
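As a toy contrast with the CSVs-on-a-hard-drive scenario, the sketch below stores labeled samples in an indexed SQLite table so ad-hoc questions become single queries. The table and column names are invented for illustration:

```python
# Minimal sketch of queryable, indexed sample storage — the opposite
# of static CSV files. SQLite stands in for whatever database fits
# the data; the point is indexed, searchable access.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE samples (
        id INTEGER PRIMARY KEY,
        source TEXT,          -- e.g. which device or site produced it
        label TEXT,
        collected_at TEXT
    )
""")
# Index the column engineers actually filter on.
conn.execute("CREATE INDEX idx_samples_label ON samples(label)")

rows = [
    ("camera_a", "pedestrian", "2024-01-10"),
    ("camera_a", "vehicle", "2024-01-11"),
    ("camera_b", "pedestrian", "2024-01-12"),
]
conn.executemany(
    "INSERT INTO samples (source, label, collected_at) VALUES (?, ?, ?)", rows
)

# An ad-hoc question answered in one query, not a script over CSVs.
count = conn.execute(
    "SELECT COUNT(*) FROM samples WHERE label = ?", ("pedestrian",)
).fetchone()[0]
print(count)  # 2
```

The same setup makes joining against other sources (deployment logs, annotation queues) a query rather than a bespoke data-wrangling project.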
For small deep learning projects, in which datasets are homogeneous and mostly static, infrastructure might not be a huge concern. But as the impact of deep learning grows in the organisation, the supporting infrastructure becomes fundamental. I have seen deep learning initiatives fall apart and creativity slow to a halt due to a lack of flexibility in how the data could be viewed and analysed. For deep learning solutions to emerge, there must be minimal friction between an idea and the dataset needed to test that idea.
Key takeaway: The infrastructure is the backbone of any large deep learning initiative. Without a well designed infrastructure, deep learning engineers lack the flexibility to test new ideas and be creative.
The importance of focusing on all four of the deep learning streams cannot be overstated. One of the clearest demonstrations of the value of going beyond the models stream and its monolithic datasets is provided by companies such as Google, Facebook and Microsoft. These companies have open sourced huge labeled deep learning datasets and made incredibly powerful frameworks for developing neural networks freely available. They would never have done so if they thought open sourcing these resources was a threat to their business. These companies fundamentally believe that the value of deep learning comes not from the models themselves, but from the ability to integrate them into their core business and to constantly improve them with real-world data.
I hope the 4 Streams of Deep Learning will help you plan your deep learning strategy. Deep learning is an extremely powerful tool. All you need is to know how to use it.