Telemetry Data and Open Source

Open source software projects frequently want to better understand how end users actually use their software. Most project communities rely heavily on the bug reports and feature requests that users submit to their issue tracking systems. However, some details of how the software is used in practice can only be obtained by observing that use more directly.

The term telemetry data is frequently used to refer to data about how software (whether open source or otherwise) is used or how it is performing. This data is often collected through a “phone home” mechanism built directly into the software.

The collection and use of telemetry data involves a careful balancing of interests. On the one hand, telemetry data can be extraordinarily useful to open source project contributors and maintainers. It helps them understand how end users actually use their tools and correct any mistaken assumptions they hold about that use.

At the same time, any collection of telemetry data should be done carefully and conscientiously. An open source community’s collection of telemetry data can raise at least four concerns relevant to end users:

  1. individual data privacy: Does the telemetry data make it possible to track or uniquely identify the user? Even if it doesn’t, does it include some form of personal information that is subject to laws and regulations, or that the user simply doesn’t realize is being shared?

  2. data confidentiality: Does the telemetry data cause any potentially business-sensitive information to be sent to the project community? Does the business realize that the open source software is sharing this data? Even if a staff member clicked to consent, were they authorized to enable data sharing on behalf of their employer?

  3. awareness of collection: Does the software ensure that all relevant users and installers are aware of the telemetry data collection before it is enabled? Is it opt-out or opt-in? Can notices or consents be inadvertently bypassed when the software is installed through automated means?

  4. security of collection mechanism: Does the “phone home” functionality introduce any inadvertent security vulnerabilities? Could those vulnerabilities be present even for users who decline to enable telemetry data collection?

The Linux Foundation and its hosted projects have implemented policies and procedures for telemetry data collection and usage. These are intended to guide LF-hosted projects that want to collect telemetry data from end users of the project’s software. They establish a process in which the project maintainers answer questions about how they intend to collect and use telemetry data. The LF’s legal counsel then works with those maintainers to align the collection with end user expectations and best practices.

For more details, and to view the review forms and discussions for previously reviewed projects, please see the Telemetry Data Collection and Usage policies for projects organized under The Linux Foundation and LF Projects, LLC.