Quantifying the environmental footprint #
Central question
- What is the ecological footprint of AI applications, and why do we need to know it?
-> We need it to mitigate the ecological impact of AI, which is only possible if that impact can be quantified.
Why it is important to know the footprint #
A fundamental understanding of the ecological impact of AI systems is a prerequisite for the development of effective mitigation strategies. For example, the replacement of existing hardware with new, more energy-efficient models may paradoxically result in an increased overall CO2 footprint when considering the emissions associated with the production of new hardware and the disposal of the old. Therefore, before implementing optimizations, it is essential to fully comprehend the potential environmental benefits, which can only be achieved by a comprehensive quantification of the entire system footprint.
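The hardware-replacement example can be made concrete with a small break-even calculation. The sketch below is illustrative only: the function name, its parameters, and all numbers are hypothetical assumptions rather than measured values, and it assumes a constant grid carbon intensity, which the text above does not specify.

```python
def breakeven_years(embodied_new_kg, disposal_old_kg,
                    old_kwh_per_year, new_kwh_per_year,
                    grid_kg_co2_per_kwh):
    """Years of operation until energy savings offset the one-off emissions.

    All inputs are assumptions supplied by the caller:
    - embodied_new_kg: CO2 emitted to produce the new hardware
    - disposal_old_kg: CO2 emitted to dispose of the old hardware
    - *_kwh_per_year: yearly energy use of the old and new hardware
    - grid_kg_co2_per_kwh: assumed constant carbon intensity of electricity
    """
    one_off_kg = embodied_new_kg + disposal_old_kg
    yearly_saving_kg = (old_kwh_per_year - new_kwh_per_year) * grid_kg_co2_per_kwh
    if yearly_saving_kg <= 0:
        # The new hardware never pays back its production emissions.
        return float("inf")
    return one_off_kg / yearly_saving_kg


# Hypothetical numbers, chosen only to illustrate the calculation.
years = breakeven_years(
    embodied_new_kg=1200.0,
    disposal_old_kg=50.0,
    old_kwh_per_year=3000.0,
    new_kwh_per_year=2000.0,
    grid_kg_co2_per_kwh=0.4,
)
print(f"Break-even after about {years:.2f} years")
```

If the replacement hardware is expected to be retired before the break-even point, the swap increases the overall footprint under these assumptions.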
What are the environmental influences and their sources? #
AI-based systems are complex, and so are their environmental influences. The complete footprint of such a system encompasses a multitude of factors, including its execution on hardware (energy consumption) and the infrastructure required for that execution (e.g., the data centers, but also the offices for workers). Infrastructure-related impacts are primarily embodied emissions, but also include, for example, the water used for cooling the servers. Finally, the workers must be considered, including those involved in the development, marketing, and other aspects of the AI-based system. Life cycle assessment (LCA) provides an effective framework for evaluating these emissions.
Related work: [Morand2024] provides MLCA, a tool for machine learning life cycle assessment.
How can those impacts be measured? #
One of the most challenging aspects of measuring these emissions is obtaining accurate data. The energy consumption of AI-based systems can be measured with a variety of methods, each with its own advantages and disadvantages. The most precise approach is to measure the energy consumption of the system directly with a power meter. However, access to the machine or to a suitable power meter is not always available. In these cases, the energy consumption can be estimated with software tools such as RAPL (Running Average Power Limit) or nvidia-smi, which are more accessible and easier to use but less accurate. If the hardware is incompatible with these tools, the energy consumption can be estimated by combining the execution time and system utilization with the thermal design power (TDP) of the system. Some large cloud providers have begun to grant access to the energy consumption of individual projects, but where they do not, the load or TDP of the underlying system may be unknown. In such cases, estimation from assumed values is the only viable option.
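The two software-side options described above can be sketched in a few lines. The first function integrates sampled power readings (for example, readings logged from a power meter or a software tool) into energy; the second applies the TDP-based fallback. This is a minimal sketch under stated assumptions: the function names are hypothetical, the sample values are invented, and the TDP estimate assumes that power scales linearly with utilization, which ignores idle power and is only a coarse approximation.

```python
def energy_from_power_samples(samples):
    """Integrate (timestamp_s, power_w) samples into joules (trapezoidal rule)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between two consecutive readings.
        joules += 0.5 * (p0 + p1) * (t1 - t0)
    return joules


def estimate_energy_from_tdp(runtime_s, utilization, tdp_w):
    """Fallback estimate in joules: runtime x utilization x TDP.

    Assumes power draw is proportional to utilization (0.0 to 1.0).
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return runtime_s * utilization * tdp_w


def joules_to_kwh(joules):
    """Convert joules to kilowatt-hours (1 kWh = 3.6e6 J)."""
    return joules / 3.6e6


# Invented readings: 100 W at t=0 s, then 200 W at t=10 s and t=20 s.
measured_j = energy_from_power_samples([(0, 100.0), (10, 200.0), (20, 200.0)])
print(f"Measured: {measured_j:.0f} J")

# Invented job: 2 hours at 50 % utilization on a 300 W TDP device.
estimated_j = estimate_energy_from_tdp(runtime_s=7200, utilization=0.5, tdp_w=300.0)
print(f"Estimated: {joules_to_kwh(estimated_j):.2f} kWh")
```

When both a measurement and a TDP-based estimate are available for the same run, comparing them gives a rough sense of how far the estimate can deviate on that hardware.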
Depending on the hardware and the objectives, additional resources may need to be measured to gain a comprehensive understanding of the environmental impact of AI systems. Typical measures include execution time and CPU, GPU/TPU, and RAM usage. Derived metrics, such as resource and energy efficiency, may point to further optimization approaches. Examining additional system components may also bring new insights. For instance, the water consumption of data centers that use water cooling can be considered. Regardless of the location where the AI system is deployed, it requires hardware that is associated with a CO2 footprint and additional environmental impacts throughout its production, transportation, and end-of-life (disposal, recycling, or reuse). These impacts currently cannot be measured directly and must be taken from estimates, where available.
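One common way to work with such estimates is to attribute a share of the hardware's embodied footprint to a workload in proportion to how long the workload occupies the hardware. The sketch below illustrates this time-based allocation; it is an assumption introduced here, not a method prescribed by the text, and the function name, parameters, and numbers are hypothetical.

```python
def amortized_embodied_kg(embodied_total_kg, lifetime_hours, used_hours,
                          hardware_share=1.0):
    """Share of embodied CO2 (kg) attributed to one workload.

    Assumptions (all supplied by the caller):
    - embodied_total_kg: estimated production, transport, and end-of-life CO2
    - lifetime_hours: expected service life of the hardware
    - used_hours: time the workload occupied the hardware
    - hardware_share: fraction of the machine the workload used (0.0 to 1.0)
    """
    if lifetime_hours <= 0:
        raise ValueError("lifetime_hours must be positive")
    if not 0.0 <= hardware_share <= 1.0:
        raise ValueError("hardware_share must be between 0 and 1")
    return embodied_total_kg * used_hours / lifetime_hours * hardware_share


# Invented example: 1500 kg embodied, 30,000 h lifetime, 100 h training run.
share_kg = amortized_embodied_kg(1500.0, 30000.0, 100.0)
print(f"Embodied share: {share_kg:.1f} kg CO2")
```

The result is only as reliable as the embodied estimate and the assumed lifetime, so both should be reported alongside the number.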
Further details on the measurement of software-induced resource and energy consumption, including methods, tutorials, guidelines, metrics, and tools, can be found in the article [Guldner2024] and the associated Green Software Measurement Model (GSMM) repository.
How do we report our results? #
Once the system has been measured, the resulting data must be reported. It is important to report the right figures to the right audience. For instance, reporting the CO2 footprint of a trained model to the general public is useful if the footprints of other models are reported in the same way, because customers can then select the more environmentally friendly model. For a scientific audience, however, this single figure may not be sufficient. Researchers may also wish to compare architectural approaches, which requires information on the hardware used and the resulting energy consumption.
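The idea of reporting the same measurement at different levels of detail can be sketched as a single record with one view per audience. The class, field names, and example values below are hypothetical and serve only as an illustration; they do not follow any established reporting standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FootprintReport:
    """One measured or estimated footprint for a trained model."""
    model_name: str
    co2_kg: float
    energy_kwh: float
    hardware: str
    runtime_hours: float


def public_summary(report):
    """Single headline figure, suitable for comparing models at a glance."""
    return f"{report.model_name}: {report.co2_kg:.1f} kg CO2"


def scientific_summary(report):
    """Adds hardware and energy so that approaches can be compared."""
    return (
        f"{report.model_name}: {report.co2_kg:.1f} kg CO2, "
        f"{report.energy_kwh:.1f} kWh on {report.hardware} "
        f"over {report.runtime_hours:.1f} h"
    )


# Invented values for illustration only.
report = FootprintReport("demo-model", 12.34, 30.0, "1x hypothetical GPU", 8.0)
print(public_summary(report))
print(scientific_summary(report))
```

Keeping both views derived from one record helps ensure that the public figure and the detailed figures never drift apart.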
References #
- [Guldner2024]
Guldner, A., Bender, R., Calero, C., Fernando, G. S., Funke, M., Gröger, J., Hilty, L. M., Hörnschemeyer, J., Hoffmann, G.-D., Junger, D., Kennes, T., Kreten, S., Lago, P., Mai, F., Malavolta, I., Murach, J., Obergöker, K., Schmidt, B., Tarara, A., … Naumann, S. (2024). Development and evaluation of a reference measurement model for assessing the resource and energy efficiency of software products and components—Green Software Measurement Model (GSMM). In Future Generation Computer Systems (Vol. 155, pp. 402–418). Elsevier BV.
- [Morand2024]
Morand, C., Ligozat, A.-L. & Névéol, A. (2024). MLCA: a tool for Machine Learning Life Cycle Assessment. In International ICT4S Conference. https://conf.researchr.org/details/ict4s-2024/ict4s-2024-research-papers/13/MLCA-a-tool-for-Machine-Learning-Life-Cycle-Assessment