Keeping track of freedom while managing packages
To start, let us define some concepts that will help in understanding this article. If you are already familiar with how package managers work, you can skip to the next section.
A programming language is an agreed upon standard to write machine instructions collaboratively. A package is a collection of programs, code, or files bundled together. A dependency is a requirement for a package to work. A package manager is able to download, update, or remove packages and resolve dependencies. A repository or "repo" is a place where packages are stored. A software license grants or restricts rights and specifies the use and distribution terms of the code.
Everyone that has used apt on Debian based systems, pacman on Arch based systems, dnf on Red Hat based systems, or an app store, has had experience using a package manager with a repository.
What is a programming-language-specific package manager?
A programming-language-specific package manager is a package manager built to aid or extend the functionality of a given programming language by aggregating programs and modules that are specifically written for a programming language. Nearly every modern, popular programming language has at least one package manager and repository available. Unfortunately, it can be difficult to track the mix of free and nonfree licenses for dependencies when using these package managers.
To give a few examples, see this non-exhaustive list of programming-language-specific package managers and their repositories:
- C/C++'s Conan and ConanCenter repository
- NodeJS's npm and npm Registry repository
- PHP's composer and Packagist repository
- Python's pip and PyPI repository
- Rust's cargo and crates.io repository (note: lib.rs is a free frontend to crates.io)
- Go commonly has a built-in package manager and uses the pkg.go.dev/ repository
- Haskell's cabal-install and Hackage repository
- R commonly has a built-in package manager and uses the CRAN repository
- Ruby's gem and RubyGems repository
Why do we have all of these package managers when GNU/Linux systems usually include a package manager already? Many programming language packages are maintained by GNU/Linux distributions and found in the operating system's repository, but the number of packages that are found in the repository is a small subset of the total number for each programming language. This is often a matter of labor; in order for a package to live in the distribution repository, someone must be willing to create a system package for an operating system and commit to maintaining it over time. Additionally, stable distributions need to maintain security patches for each release version. It is much less work for a programmer to release a package into a programming-language-specific repository that is accessible to any operating system that supports the programming language, and not only GNU/Linux distributions.
A licensing issue
Ideally all software would be free. We should be able to easily identify any nonfree packages that are widely used and organize efforts to either get them freed, or replace them with equivalent free packages. When someone installs software using a package manager, the package manager does not currently verify whether the software license is free or not. Manually checking the license of one package is not very difficult, but some NodeJS and Rust packages can have 100+ dependencies, each of which may pull in dependencies of its own. Completing a due diligence license check by hand quickly becomes impractical without automation. The Rust community has developed two free software tools, cargo-license and cargo-lichking, to help automate license compliance for dependencies. Most repositories do keep track of the licenses of the software stored within them, but the quality of repository license data varies.
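To make the scale of the problem concrete, here is a minimal sketch of what an automated check has to do: walk every direct and transitive dependency and flag any package whose declared license is missing or not on a list of free licenses. The registry data, the package names, and the allowlist below are all invented for illustration; a real tool would read this metadata from the repository and use a reviewed list of free licenses.

```python
from collections import deque

# Hypothetical registry metadata: name -> (declared license, direct dependencies).
# Every name and license here is invented for the example.
REGISTRY = {
    "webapp": ("GPL-3.0-or-later", ["http-lib", "logger"]),
    "http-lib": ("Apache-2.0", ["tls-lib", "logger"]),
    "logger": ("MIT", []),
    "tls-lib": ("LicenseRef-Proprietary", []),
    "orphan": (None, []),
}

# Assumed allowlist; not an authoritative list of free licenses.
FREE_LICENSES = {"GPL-3.0-or-later", "Apache-2.0", "MIT"}


def audit(root, registry=REGISTRY, free=FREE_LICENSES):
    """Breadth-first walk of the dependency graph starting at `root`.

    Returns a dict mapping each flagged package to its declared license
    (None when the license is undeclared or the package is unknown).
    """
    flagged = {}
    seen = {root}
    queue = deque([root])
    while queue:
        name = queue.popleft()
        # Packages missing from the registry are treated as undeclared.
        license_id, deps = registry.get(name, (None, []))
        if license_id not in free:
            flagged[name] = license_id
        for dep in deps:
            if dep not in seen:  # visit shared dependencies only once
                seen.add(dep)
                queue.append(dep)
    return flagged


print(audit("webapp"))  # {'tls-lib': 'LicenseRef-Proprietary'}
```

Note that the nonfree package in this example is two levels down from the package the user asked for, which is exactly the case that a manual check of the top-level license would miss.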
Fully free GNU/Linux distributions that follow the Free System Distribution Guidelines (GNU FSDG) handle this issue by using their own system repositories that only contain free software and by removing most additional programming-language-specific package managers from their repositories until a solution is worked out. When additional programming-language-specific packages are required, the user will be faced with a broken workflow that will need to be manually resolved, either by installing the package manager outside of their main package manager or by using a different operating system that does not remove packages. Users who are not familiar with this issue will likely choose the latter, which leads to using more nonfree software. Instructions for manual installation usually involve piping a curl command directly into bash, which is a bad security practice.
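One common alternative to piping a download straight into a shell is to save the installer first, compare it against a checksum published by the project, and only then decide whether to run it. The sketch below shows just the comparison step. It assumes the project publishes a SHA-256 digest through a channel you trust more than the download itself, which not every project does; the function name and sample data are hypothetical.

```python
import hashlib
import hmac


def matches_checksum(script_bytes, expected_sha256_hex):
    """Return True only if the SHA-256 of `script_bytes` equals the expected hex digest."""
    actual = hashlib.sha256(script_bytes).hexdigest()
    expected = expected_sha256_hex.strip().lower()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(actual, expected)


# Illustrative data only: in practice the bytes come from the downloaded file
# and the digest from the project's published checksum.
script = b"#!/bin/sh\necho installing\n"
published = hashlib.sha256(script).hexdigest()

print(matches_checksum(script, published))                       # True
print(matches_checksum(script + b"echo tampered\n", published))  # False
```

A matching checksum only shows that the file is the one the project published. It says nothing about whether the installer, or the packages it will later fetch, are free software.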
The business and ethical needs for filtering licenses
Businesses and organizations care about the license compliance of the code that runs on their machines, as violations could end up costing lots of money in the future. The popularity of commercially available but nonfree solutions for automated license compliance shows that this is a big issue.
Users who care about using free software from an ethical perspective want the four essential freedoms in their software. The four essential freedoms give the software user the opportunity to know and understand the tools they run, edit the software to their tastes, contribute when there is an issue, and share their changes with the community.
It goes deeper
Stable distributions such as Trisquel, Debian, or Ubuntu lock their package versions at the time of release and maintain those packages only with security patches. The distribution's version of a programming language may therefore be up to five years old, patched for security but otherwise unchanged, and users who run that language's package manager against it should expect to run into unmet dependency requirements. The common solution is to use programming language version managers, which can install different versions of a programming language environment side by side.
To give a few examples, see this non-exhaustive list of programming language version managers:
- asdf handles many languages with an extensible plugin system
- Go's gvm
- NodeJS's nvm
- Python's pyenv
- Ruby's rvm
When a version manager installs a new version of a programming language, each installed instance is expected to include a programming-language-specific package manager built for that version. Manually modifying package managers and repositories for all of these additional programming language installations would be difficult.
I will propose a few ways in which we can approach solutions for the issue, but we really need a community effort if we are to improve the situation. The best way to handle macro issues at scale is to work upstream and convince package managers and repository maintainers that this is an issue, and most importantly to offer help building and maintaining solutions:
The package manager side:
- Configuration: Package managers should have the configuration option to exclude packages that were deemed to be nonfree based on license data.
- Fork: If upstream package managers are not interested in merging this functionality, forks could be maintained that have this feature. If a fork is the solution, I would propose the fully free GNU/Linux distributions band together to build and maintain such a tool. The version managers would also need the ability to install the fork.
The repository side:
- Self-reporting licenses: At the very least, repositories should require reporting the license of a package in order to submit a new entry. Most repositories do mandatory license self-reporting at this point which is an important first step.
- Automated license compliance scanning: The SPDX project keeps an exhaustive list of license text and standard license headers that can be leveraged by license compliance software to better scan projects for license compliance and verify license information kept by repositories. The Free Software Directory teams use FOSSology, Licenseutils, and ScanCode Toolkit to help scan repositories for license compliance. These tools do not make licensing distinctions by themselves and require manual interpretation. Developing interpretative automation for these free software tools would help the licensing community.
- Community review: If tools are not built to aid the repositories in automatic license compliance, repository maintainers are unlikely to change systemically. A large number of volunteers could manually review repositories and submit corrections.
- Alternative repositories: If repositories are unwilling or unable to implement the changes, alternative repositories could be maintained by members of the free software community.
The third-party solutions would take time and create a lot of work: the number of volunteers with a license compliance skill set is small, and package managers and version managers would still need to be configured to point to an alternative.
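As a rough illustration of how the configuration option and the SPDX-based license data described above could fit together, the sketch below evaluates a simplified SPDX license expression against an allowlist and uses the result to filter install candidates. It is not how any existing package manager works. The allowlist, the `allow_nonfree` option, and the function names are assumptions made for the example, and the parser handles only identifiers, AND, OR, and parentheses (not the WITH operator or the "+" suffix).

```python
import re

# Assumed allowlist of SPDX identifiers treated as free; illustrative only.
FREE_LICENSES = {"MIT", "Apache-2.0", "GPL-3.0-or-later", "BSD-3-Clause"}

# Tokens are parentheses or runs of non-space, non-parenthesis characters.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def is_free(expression, free=FREE_LICENSES):
    """Evaluate a simplified SPDX expression against the allowlist.

    "A OR B" passes if either side is free (the user may choose a side);
    "A AND B" passes only if both sides are free. AND binds tighter than OR.
    """
    tokens = TOKEN.findall(expression)
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "OR":
            pos += 1
            right = parse_and()  # always parse before combining
            result = result or right
        return result

    def parse_and():
        nonlocal pos
        result = parse_atom()
        while pos < len(tokens) and tokens[pos] == "AND":
            pos += 1
            right = parse_atom()
            result = result and right
        return result

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if token == ")":
            raise ValueError("unexpected closing parenthesis")
        return token in free

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result


def select_installable(candidates, allow_nonfree=False):
    """Filter (name, license expression) pairs as a hypothetical package manager might."""
    return [name for name, expr in candidates if allow_nonfree or is_free(expr)]


candidates = [("a", "MIT"), ("b", "LicenseRef-Proprietary"), ("c", "MIT OR LicenseRef-Proprietary")]
print(select_installable(candidates))  # ['a', 'c']
```

A filter like this is only as good as the license data it is given, which is why the repository-side work matters: a self-reported identifier that is wrong or missing will be filtered wrongly no matter how the package manager is configured.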
Beware... Here Be Dragons
This issue of differentiating between free and nonfree licenses in an ever-growing chain of dependencies springs up in many places outside of programming-language-specific package managers as well. In containerization, Docker, Podman, and Kubernetes use the Docker Hub repository. In configuration management, Ansible uses the Ansible Galaxy repository. Distribution-independent package managers that bundle dependencies also exist, such as snaps using the Snapcraft repository, Flatpak using the Flathub repository, and AppImages using the AppImageHub repository.
There are alternative repositories for apt on Debian based systems, called Personal Package Archives (PPAs), that could contain anything. I would never recommend using them unless you know and trust the maintainer.
The unknowable mix of free and nonfree software is an issue almost anytime there is a package manager and an associated repository. While all of these additional examples need to be addressed, we need to start somewhere.
All of these layers of abstraction were built to make things simpler, but their long-term effect has been to make it difficult for people to know and understand the software that they use. We have a track record of working through major issues, and I am optimistic that if we jump on the issue now, we can solve it together.
Image by Michael McMahon. Copyright © 2021 Free Software Foundation, Inc., licensed under Creative Commons Attribution 4.0 International license.
Copyright © 2021 Free Software Foundation, Inc. This article is individually licensed under the Creative Commons Attribution-ShareAlike 4.0 International license.