Systems Engineering for Machine Learning

Machine Learning systems developed in a structured way
By Morten Werther
Machine Learning development has until now been a fairly experimental endeavour. By applying Systems Engineering practices, for example by defining its development lifecycle, it can be brought closer to traditional engineering disciplines.


Machine Learning has seen a near-exponential uptake in products in the last decade. Models mimicking brain-like structures (neurons and synapses) have been around since the days of the early computers but were for a long time mainly adopted by the academic community. Over the years the field evolved, incorporating methods from traditional statistical analysis, and it is now booming in the age of Big Data, where there are few competing alternatives for describing complex dependencies within large data sets. One of the main drivers has been the rapid increase of computing power based on a technological development that follows Moore’s law. This has enabled processing of vast amounts of data with highly complex Machine Learning models (many nodes and layers), e.g. Deep Learning.

However, even today the development of Machine Learning models requires a fairly experimental way of working and relies on people with specific competences, although Big Data and Data Analytics require similar skill sets for roles like Data Scientist and Data Engineer. Lately there have been efforts to bring Machine Learning development closer to traditional engineering disciplines by defining its development lifecycle, competence needs, tools and best practices. In other words, Systems Engineering (loosely described by the quote below) is being extended to also cover Machine Learning (ML). This is not only to support those who develop ML systems, but also to give the rest of the organization a chance to:
- Plan, develop and maintain products containing ML components
- Design ML components so that they fit with the other parts of a product
- Identify where a project is in the development lifecycle, and how to track and support development
Systems Engineering adds a system view, which is vital for ML solutions that are getting increasingly complex and require a holistic understanding.

“Systems engineering is the art and science of developing an operable system capable of meeting requirements within often opposed constraints. Systems engineering is a holistic, integrative discipline, wherein the contributions of … engineers … and many more disciplines are evaluated and balanced, one against another, to produce a coherent whole that is not dominated by the perspective of a single discipline.”
(NASA, 2019)

Machine Learning Development

Developing an ML system often resembles research more than traditional development. The solution – if one exists – is not known beforehand, and results are reached through experimentation and iteration. The most distinguishing feature of ML is, however, that the solution logic is not defined or coded but extracted from data. Thus, data is absolutely essential to an ML solution, and so is the ability to:
- Gather, store, clean (from e.g. noise or bias) and pre-process data
- Understand the data at hand, e.g. does it represent the problem being studied well enough?
- Monitor data over time and context, since changes in patterns should be expected
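Cleaning and pre-processing can be made concrete with a minimal, hypothetical sketch (plain Python; the function name and the median/MAD outlier rule are this example's own choices, not prescribed by the article or any particular framework):

```python
import statistics

def clean_and_scale(values, k=10.0):
    """Drop outliers via a robust median/MAD rule, then min-max scale to [0, 1]."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    # If MAD is zero (all values identical), keep everything.
    kept = [v for v in values if abs(v - med) <= k * mad] if mad else list(values)
    lo, hi = min(kept), max(kept)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in kept]

# One obvious sensor glitch (98.0) is removed before scaling.
scaled = clean_and_scale([4.1, 3.9, 4.0, 4.2, 98.0, 3.8])
```

The median/MAD rule is chosen here because a single extreme value inflates the mean and standard deviation, whereas the median is robust against it; real pipelines would of course use domain-specific cleaning rules.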

Activities like these have been simplified through a substantial increase in available tools and frameworks (many of them open source) that can be used both for data analysis and for the creation of almost arbitrarily complex ML models. Thus, practitioners are in a much better situation now than 10-15 years ago. However, it is still important to have a good understanding of the underlying models in order to account for their possible limitations and strengths, and to some extent understand what a model does (to provide better “explainability”, meaning the extent to which the internal mechanics of an ML system can be explained in human terms).

Figure 1. The Machine Learning development process.

After choosing a specific ML model, a “training” step follows to find patterns and features in the data provided to the model. In a world where vast quantities of data are easily available, it is tempting to use more data as brute force to get models to converge. However, as a rule, good (high-quality) data and clever pre-processing are preferred. Hence, it is reasonable to expect to spend more time on data collection and preparation than on the actual construction of ML models.
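At its core, training means finding model parameters that fit the data. A deliberately tiny sketch (a closed-form least-squares line fit, invented for illustration and far simpler than the models discussed in the text) shows the principle, and also why clean data matters more than volume: with noise-free samples, very few are needed to recover the underlying rule.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y ~ a*x + b: training in its simplest form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Clean data: four samples suffice to recover the underlying rule y = 2x + 1.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

A Deep Learning model replaces the two parameters here with millions and the closed-form solution with iterative optimization, but the relationship between data quality and convergence is the same in spirit.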

Systems Architecture and Machine Learning

Typically, the ML model code constitutes a very small fraction of the complete system. Most of the development that is done can be described as “plumbing” between components that perform other functions (see Figure 2 below).

Figure 2. A conceptual view of an ML system as presented in [1]. The boxes describe tasks performed by different parts of the system, and the size of each box represents the effort spent in each area.

Like any system, ML systems may be afflicted with a considerable amount of technical debt. ML systems not only suffer from all the traditional problems of maintaining code over time but also from problems uniquely associated with the fact that their behaviour is driven by data. It is not enough to trust traditional approaches like static and dynamic code analysis, modularization and APIs, since data flows across and affects the complete system.

Examples of data-related technical debt that may arise if ML components are regarded as a “black box” are:

- Training data dependencies that call for careful consideration before reusing or linking models. When used in a new context, a model (or linked models) generally needs to be re-optimized.
- Hidden assumptions built into glue code, or complicated data collection structures caused by an experimental way of working (so-called pipeline jungles), make the system difficult to analyse and maintain.
- Different models that have been trained on the same original dataset will be connected in ways that can be very difficult to analyse.
- The environment that data is collected from is generally a dynamic system (the world around us) that is constantly changing, and this may invalidate an ML model if it is not monitored and updated over time. When a model is in operation it may also influence its own future training data in complex feedback loops.

To overcome issues like these, a systems thinking methodology is needed. Keeping ML models as simple as possible, with a higher level of “explainability”, helps, but more generally a proper systems analysis should be made. On the system level it thus makes sense to produce a design that aims to circumvent the pitfalls of data technical debt, i.e. an ML systems architecture.

The Machine Learning Development Lifecycle

A good starting point for a non-practitioner to learn how a Machine Learning model is developed is to understand the lifecycle of an ML component. At Microsoft the process is mapped as in Figure 3 below (note: alternative but similar process descriptions exist).

Figure 3. The Machine Learning development lifecycle process according to Microsoft [2].

First, the requirements for the model and the data it depends on are set. This is followed by the data steps: 1) identifying and collecting the data needed by the model, 2) cleaning the data from noise and bias, and 3) labelling the data so it can be identified and traced. Feature engineering basically entails pre-processing the data in a way that fits the ML model that has been chosen. Model training means using (parts of) the pre-processed data to build and converge an ML model. If no solution is found, it may be necessary to go back to a previous step. In the model evaluation, data not previously used (collected, constructed or both) is used to check the validity of the model. If validity is too low, there is a need to loop back to one of the earlier steps – often the first ones. Finally, the model is integrated and deployed in a product and monitored for the rest of its lifetime to secure validity and consistency.
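The training and evaluation steps can be sketched end to end with a deliberately tiny model (a one-feature nearest-centroid classifier; the function names and the data are invented for illustration and stand in for real lifecycle tooling):

```python
import statistics

def train_centroids(samples):
    """'Model training': store the per-class mean of a single feature."""
    by_class = {}
    for x, label in samples:
        by_class.setdefault(label, []).append(x)
    return {label: statistics.mean(xs) for label, xs in by_class.items()}

def predict(model, x):
    """Classify x as the class whose centroid it is closest to."""
    return min(model, key=lambda label: abs(x - model[label]))

def evaluate(model, held_out):
    """'Model evaluation': accuracy on data not used for training."""
    hits = sum(predict(model, x) == label for x, label in held_out)
    return hits / len(held_out)

data = [(1.0, "low"), (1.2, "low"), (0.9, "low"),
        (5.1, "high"), (4.8, "high"), (5.3, "high")]
train, test = data[:4], data[4:]   # naive split; real splits should be randomized
model = train_centroids(train)
accuracy = evaluate(model, test)
```

If `accuracy` fell below a required threshold here, one would loop back to data collection or feature engineering, exactly as the lifecycle prescribes.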

Since there are several data-oriented process steps in the development lifecycle, it is reasonable to assume that many of the challenges lie in managing data (collecting, cleaning, versioning/labelling and monitoring) which is very different from traditional software development with its focus on coding. As mentioned earlier, data is at the centre of an ML solution and an ML component therefore cannot be treated in the same way as a regular software component. This affects how adaptation, customization and reuse can be done.

From a tooling perspective, the different steps in the ML lifecycle all require their own specific tools. The lack of tool integration across the lifecycle has previously been identified as a challenge, but lately a lot of effort has been put into creating unified tool chains to overcome this. This is an area where a systems engineering view can provide support, and the process steps themselves also include areas such as requirements engineering and verification and validation that are well known to the systems engineering community.

Requirements Engineering for Machine Learning

On one hand, Machine Learning can be looked at agnostically as one technology among others. On the other hand, it is clear that ML differs fundamentally from conventional software development, since its function is not a result of codified rules but of learning from data, controlled by a fitness function that defines what good and bad outcomes are and hence drives learning in the right direction. Normally, decisions on data selection and the design of the fitness function are left to the experts working with the ML model, but those decisions should preferably be made based on an understanding of the underlying business and stakeholder context, to avoid definitions that are technically correct but difficult to interpret. This suggests that they could at least partially fall within the responsibility of traditional requirements engineers working in collaboration with the team developing the ML model [3].

Typical requirements that should be derived for ML systems are functional requirements, “explainability” requirements (to what level the model needs to be understood), requirements on bias and discrimination, legal and regulatory requirements (not least including privacy and safety) and data requirements (e.g. quantity and quality).
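Data requirements of the kind mentioned (quantity and quality) become most useful when they are machine-checkable. A minimal, hypothetical sketch (the function name and both thresholds are this example's own assumptions, not values from the article):

```python
def check_data_requirements(rows, min_rows=1000, max_missing_rate=0.05):
    """Verify two assumed data requirements: enough samples, few missing values."""
    missing = sum(1 for row in rows if any(v is None for v in row))
    issues = []
    if len(rows) < min_rows:
        issues.append(f"only {len(rows)} rows, {min_rows} required")
    if rows and missing / len(rows) > max_missing_rate:
        issues.append(f"missing-value rate {missing / len(rows):.1%} "
                      f"exceeds {max_missing_rate:.0%}")
    return issues  # an empty list means the requirements are met
```

A requirements engineer would supply the actual thresholds from the business and regulatory context; the point is that such data requirements can gate the lifecycle just like functional ones.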

There is also a strong connection between requirements and verification and validation. An example in ML stems from the dependency on the data used for model training, where it is important that the data properly describes the context the model will be used in. Since this context can change over time, the requirements also need to be continuously validated. In this case, a requirements engineer can specify when and how often re-training of the model is needed and define conditions that could help to identify anomalies in models that have been operationally deployed.
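One way to express such a re-training condition is to monitor how far live input data has drifted from the training data. A minimal sketch (the mean-shift metric and the three-sigma threshold are assumed examples; production systems use richer drift tests):

```python
import statistics

def drift_score(train_values, live_values):
    """Shift of the live mean, measured in training standard deviations."""
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values) or 1.0
    return abs(statistics.mean(live_values) - mu) / sigma

def needs_retraining(train_values, live_values, threshold=3.0):
    """Flag the model for re-training when inputs have drifted past the threshold."""
    return drift_score(train_values, live_values) > threshold
```

A condition like this can be written directly into the requirements for an operationally deployed model, turning "monitor for validity" into a testable statement.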

On a more fundamental level, setting requirements on the expected or needed quality of the output of a model also makes it possible to check if the developed ML system produces adequate results or if alternative (simpler/cheaper) approaches could be good enough for a certain application.

Testing Machine Learning Components

The standard way of verifying the validity, quality and performance of an ML model is to test it with a data set that has not been used in the learning phase. Lately it has become more common to also verify the model by checking how it responds to unexpected data or intentionally bad data. Additionally, it is now commonly acknowledged that in many applications of ML technology, both security and safety aspects need to be covered.

As an example, security considerations raise several concerns. Given that the logic of an ML model depends on training data, it is important to test and safeguard against attack vectors that originate from data, like:
- Poisoned data sets. How can deliberately invalid examples inserted in the training data be identified?
- Adversarial data. Data that has been manipulated based on assumptions about the ML model and that will cause erroneous responses. This can for instance be done by adding subtle levels of noise that are difficult to detect.
- Reverse engineering of input data. With specific knowledge about an ML model, it may be possible to re-create data that was used to train it, which leads to privacy issues.
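Robustness against manipulated input can be probed by perturbing inputs and checking whether predictions stay stable. The sketch below is purely illustrative: the toy threshold classifier, noise level and function names are invented, and real adversarial testing uses far stronger, model-aware attacks.

```python
import random

def predict(x, boundary=5.0):
    """A toy one-feature classifier standing in for a trained ML model."""
    return "high" if x >= boundary else "low"

def robustness_rate(inputs, noise=0.1, trials=100, seed=42):
    """Fraction of predictions that survive small random perturbations of the input."""
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    stable, total = 0, 0
    for x in inputs:
        baseline = predict(x)
        for _ in range(trials):
            total += 1
            if predict(x + rng.uniform(-noise, noise)) == baseline:
                stable += 1
    return stable / total
```

Inputs far from the decision boundary are stable under noise, while inputs near it flip easily, which is exactly the kind of fragility adversarial data exploits.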

At Microsoft, a collaboration between traditional systems engineers and ML engineers has led to the Secure Development Lifecycle now also covering the development of ML-based systems. On a general level, it is reasonable to assume that experience gained from verification and validation of conventional systems is also relevant in an ML context, if adapted to the specific conditions that apply there.

Bridging the Competence Gap - Machine Learning Literacy

Increasing Machine Learning competence should not only be about further strengthening the knowledge of practitioners but also about educating everyone else in the organization, so that a better shared understanding can be reached of the benefits, limitations and pre-conditions of sound development of ML-based systems.

A common slogan in this context is “Democratization of Machine Learning”. In its extreme interpretation this implies that working with ML should be made so simple that anyone can do it, but a more practical view is that anyone should have a good enough understanding to partake in development activities and decision making around it – much like software development has transformed from an obscure practice for the few into something commonplace and well known.

As a way to understand ML literacy, Microsoft developed an assessment scheme to measure ML capabilities [2]. One of the findings from assessing the development teams was that the technical competence level was generally ranked very high, whereas the knowledge of how to integrate ML components into larger systems, or how to share ML findings, was ranked lower.


Data Science and Machine Learning components are integrated into an increasing number of products. It is time to standardize not only the tools that are used but also the development methodology, not least by incorporating it into the general body of Systems Engineering knowledge. This will benefit the experts, who can then focus on the truly challenging ML issues, but it will also benefit everyone else in the development organization who needs to understand possibilities and limitations, define requirements, test, and lead the development of ML components as parts of complete systems. At the end of the day, it will provide better ML-powered products. As ML becomes increasingly mainstream, we all need to participate and take ownership.


[1] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, J.-F. Crespo, and D. Dennison, “Hidden technical debt in machine learning systems,” in Advances in Neural Information Processing Systems, 2015, pp. 2503–2511.

[2] S. Amershi, A. Begel, C. Bird, R. DeLine, H. Gall, E. Kamar, N. Nagappan, B. Nushi, and T. Zimmermann, “Software engineering for machine learning: a case study”. In Proceedings of the 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP '19), IEEE Press, 2019, pp. 291–300

[3] A. Vogelsang and M. Borg, “Requirements Engineering for Machine Learning: Perspectives from Data Scientists”, in Proceedings of the IEEE 27th International Requirements Engineering Conference Workshops, 2019, pp. 245–251.

[4] A. Arpteg, B. Brinne, L. Crnkovic-Friis and J. Bosch, “Software Engineering Challenges of Deep Learning”, in Proceedings of the 44th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), 2018, pp. 50–59.

Morten Werther