init
commit
b03965b764
52 changed files with 3576 additions and 0 deletions
593
index.qmd
Normal file
|
@ -0,0 +1,593 @@
|
|||
---
|
||||
title: Reproducibility in functional package management
|
||||
date: 2025-2-11
|
||||
date-format: long
|
||||
lightbox: true
|
||||
logo: telecom.png
|
||||
margin-top: "0px"
|
||||
author:
|
||||
- name:
|
||||
given: Julien
|
||||
family: Malka
|
||||
url: https://luj.fr
|
||||
email: julien.malka@telecom-paris.fr
|
||||
orcid: 0009-0008-9845-6300
|
||||
roles:
|
||||
- conceptualization
|
||||
- investigation
|
||||
- writing – original draft
|
||||
affiliations:
|
||||
- id: telecom
|
||||
name: Télécom Paris, Institut Polytechnique de Paris
|
||||
fig-align: center
|
||||
code-overflow: wrap
|
||||
code-line-numbers: false
|
||||
css: styles.css
|
||||
|
||||
format:
|
||||
metropolis-beamer-revealjs:
|
||||
theme: slide.scss
|
||||
toc: false
|
||||
toc-title: Plan
|
||||
toc-depth: 2
|
||||
slide-level: 2
|
||||
slide-number: true
|
||||
---
|
||||
|
||||
|
||||
|
||||
## Research topics
|
||||
|
||||
**Main topics:** Cybersecurity & Software engineering
|
||||
|
||||
*How can one trust that the software installed on one’s system is not malicious?*
|
||||
|
||||
- What if we make the assumption that the software is **open source**?
|
||||
|
||||
## Software supply chain
|
||||
|
||||
::: {.r-fit-text}
|
||||
|
||||
**Definition:** All the **components**, **tools** and **processes** used to **produce**, **compile** and **distribute** software.
|
||||
|
||||
- An increasing number of attacks target the software supply chain instead of the software itself, for example *SolarWinds* (2020).
|
||||
- There is a will to establish security standards for the software supply chain (*USA Executive Order on Improving the Nation’s Cybersecurity* / *EU Cyber Resilience Act*).
|
||||
:::
|
||||
|
||||
|
||||
::: {#fig-eval-build}
|
||||
{.r-stretch}
|
||||
|
||||
Software supply chain overview ([slsa.dev](https://slsa.dev))
|
||||
|
||||
:::
|
||||
|
||||
## Main PhD research question
|
||||
|
||||
How to increase trust in the Open Source Software Supply Chain with **functional package managers** and **reproducible builds**?
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
## Functional package managers {auto-animate="true"}
|
||||
|
||||
|
||||
New software deployment model (of which **Nix** was the first example).
|
||||
|
||||
|
||||
```nix
|
||||
{ stdenv, lib, fetchFromGitHub }:
|
||||
```
|
||||
|
||||
## Functional package managers {auto-animate="true"}
|
||||
|
||||
|
||||
New software deployment model (of which **Nix** was the first example).
|
||||
|
||||
```nix
|
||||
{ stdenv, lib, fetchFromGitHub }:
|
||||
|
||||
stdenv.mkDerivation rec {
|
||||
version = "1.3.7";
|
||||
pname = "htpdate";
|
||||
|
||||
src = fetchFromGitHub {
|
||||
owner = "twekkel";
|
||||
repo = pname;
|
||||
rev = "v${version}";
|
||||
sha256 = "sha256-X7r95Uc4oGB0eVum5D7pC4tebZIyyz73g6Q/D0cjuFM=";
|
||||
};
|
||||
|
||||
```
|
||||
|
||||
## Functional package managers {auto-animate="true"}
|
||||
|
||||
|
||||
New software deployment model (of which **Nix** was the first example).
|
||||
|
||||
```nix
|
||||
{ stdenv, lib, fetchFromGitHub }:
|
||||
|
||||
stdenv.mkDerivation rec {
|
||||
version = "1.3.7";
|
||||
pname = "htpdate";
|
||||
|
||||
src = fetchFromGitHub {
|
||||
owner = "twekkel";
|
||||
repo = pname;
|
||||
rev = "v${version}";
|
||||
sha256 = "sha256-X7r95Uc4oGB0eVum5D7pC4tebZIyyz73g6Q/D0cjuFM=";
|
||||
};
|
||||
|
||||
makeFlags = [
|
||||
"prefix=$(out)"
|
||||
];
|
||||
|
||||
```
|
||||
|
||||
## Functional package managers {auto-animate="true"}
|
||||
|
||||
::: {.r-fit-text}
|
||||
|
||||
New software deployment model (of which **Nix** was the first example).
|
||||
|
||||
```nix
|
||||
{ stdenv, lib, fetchFromGitHub }:
|
||||
|
||||
stdenv.mkDerivation rec {
|
||||
version = "1.3.7";
|
||||
pname = "htpdate";
|
||||
|
||||
src = fetchFromGitHub {
|
||||
owner = "twekkel";
|
||||
repo = pname;
|
||||
rev = "v${version}";
|
||||
sha256 = "sha256-X7r95Uc4oGB0eVum5D7pC4tebZIyyz73g6Q/D0cjuFM=";
|
||||
};
|
||||
|
||||
makeFlags = [
|
||||
"prefix=$(out)"
|
||||
];
|
||||
|
||||
meta = with lib; {
|
||||
description = "Utility to fetch time and set the system clock over HTTP";
|
||||
platforms = platforms.linux;
|
||||
license = licenses.gpl2Plus;
|
||||
maintainers = with maintainers; [ julienmalka ];
|
||||
};
|
||||
}
|
||||
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
|
||||
## Evaluation->Build pipeline
|
||||
|
||||
|
||||
::: {#fig-eval-build}
|
||||
{.r-stretch}
|
||||
|
||||
Eval-build pipeline
|
||||
|
||||
:::
|
||||
|
||||
|
||||
## Functional package managers for SSC security
|
||||
|
||||
Functional package managers also have properties that are of interest for software supply chain security:
|
||||
|
||||
- Builds from source;
|
||||
- Sandboxed compilation.
|
||||
|
||||
|
||||
|
||||
## Functional package managers for SSC security
|
||||
|
||||
- Installed packages form a static graph structure (a Merkle tree) that can be analysed to find known vulnerabilities in the dependencies.
|
||||
|
||||
|
||||
::: {#fig-eval-build}
|
||||
{.r-stretch}
|
||||
|
||||
Example of a package dependency graph
|
||||
|
||||
:::
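
As a rough illustration of the Merkle-tree idea above, the sketch below hashes each package together with the hashes of its dependencies, then lists the packages whose closure contains a known-vulnerable dependency. The graph content and the function names (`merkle_hash`, `closure`, `affected_by`) are hypothetical and only serve the example; this is not how Nix computes store paths.

```python
import hashlib

# Hypothetical dependency graph: package -> list of direct dependencies.
GRAPH = {
    "htpdate": ["glibc", "openssl"],
    "openssl": ["glibc"],
    "glibc": [],
}

def merkle_hash(package, graph, cache=None):
    """Return a hex digest covering the package and its whole dependency closure."""
    if cache is None:
        cache = {}
    if package not in cache:
        digest = hashlib.sha256(package.encode())
        # Sorting makes the digest independent of declaration order.
        for dep in sorted(graph[package]):
            digest.update(merkle_hash(dep, graph, cache).encode())
        cache[package] = digest.hexdigest()
    return cache[package]

def closure(package, graph):
    """Return the set of all transitive dependencies of a package."""
    seen = set()
    stack = list(graph[package])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph[dep])
    return seen

def affected_by(vulnerable, graph):
    """List packages whose closure contains a known-vulnerable dependency."""
    return sorted(p for p in graph if vulnerable in closure(p, graph))
```

Because a node's digest depends on the digests below it, any change to a dependency changes the digest of everything that depends on it.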
|
||||
|
||||
## Binary distribution
|
||||
|
||||
- It is not always reasonable to compile all the software a user wants to install on their own machine, hence the need for binary caches;
|
||||
- **But** binary caches make us lose some of the interesting security properties of functional package managers.
|
||||
|
||||
|
||||
|
||||
|
||||
## Reproducible builds {.lol}
|
||||
|
||||
{height='4em' fig-align="center"}
|
||||
|
||||
A build is **reproducible** if given the same source code, build environment and build instructions, any party can recreate bit-by-bit identical copies of all specified artifacts.
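
A minimal sketch of what "bit-by-bit identical" means in practice: two independently built artifacts are compared through their SHA-256 digests. The helper names `sha256_of` and `is_bitwise_identical` are assumptions made for this example.

```python
import hashlib

def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_bitwise_identical(artifact_a, artifact_b):
    """True when two build outputs have the same content hash."""
    return sha256_of(artifact_a) == sha256_of(artifact_b)
```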
|
||||
|
||||
|
||||
## Why is build reproducibility important?
|
||||
|
||||
::: {#fig-rb}
|
||||
|
||||
{.r-stretch fig-align="center"}
|
||||
|
||||
Leveraging reproducible builds to increase trust in distributed artifacts.
|
||||
|
||||
:::
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
## Research questions {auto-animate="true"}
|
||||
|
||||
|
||||
**How reproducible is software in the functional package management model?**
|
||||
|
||||
- Is Nix evaluation reproducible? Can we reproduce *build environments* of Nix packages?
|
||||
- Does functional package management enable **bitwise build reproducibility**?
|
||||
|
||||
|
||||
|
||||
## Reproducibility of build environments {auto-animate="true"}
|
||||
|
||||
|
||||
- Is Nix evaluation reproducible? Can we reproduce *build environments* of Nix packages?
|
||||
|
||||
:::: {.columns}
|
||||
|
||||
::: {.column width="50%"}
|
||||
|
||||
{height='14em'}
|
||||
|
||||
:::
|
||||
|
||||
::: {.column width="50%"}
|
||||
|
||||
**"Reproducibility of Build Environments through Space and Time"**, ICSE 2024 (New Ideas and Emerging Results track), *J. Malka, S. Zacchiroli, T. Zimmermann*.
|
||||
|
||||
|
||||
:::
|
||||
|
||||
::::
|
||||
|
||||
|
||||
|
||||
|
||||
## Reproducibility of build environments
|
||||
|
||||
|
||||
We say that two build environments are identical if they contain the **exact same set of executables, up to their specific versions**.
|
||||
|
||||

|
||||
|
||||
|
||||
## Reproducibility in Space
|
||||
|
||||
|
||||
::: {#fig-rb}
|
||||

|
||||
|
||||
Reproducibility of build environments in Space
|
||||
|
||||
:::
|
||||
|
||||
## Reproducibility in Time
|
||||
|
||||
|
||||
|
||||
::: {#fig-rb}
|
||||
{.r-stretch}
|
||||
|
||||
Reproducibility of build environments in Time
|
||||
|
||||
:::
|
||||
|
||||
|
||||
## Research questions
|
||||
|
||||
|
||||
::: {.incremental}
|
||||
- **RQ1:** Is space and time reproducibility of build environments achievable with Nix?
|
||||
- **RQ2:** Does it allow rebuilding past software versions?
|
||||
:::
|
||||
|
||||
## Experimental protocol
|
||||
|
||||
::: {.incremental}
|
||||
|
||||
1) Sample 200 revisions of the Nix software repository, picked from 2017 to 2023;
|
||||
2) For each sampled revision, perform the **evaluation** of each package and compare with the historical truth (historical CI results);
|
||||
3) For the *oldest revision* of our samples, perform the **build** of each package and compare with the historical truth.
|
||||
|
||||
:::
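
The protocol above could be sketched as follows. The even spacing of the sample and the representation of evaluation results as a `package -> value` mapping are assumptions made for illustration; the slides do not specify how the 200 revisions were picked.

```python
def sample_evenly(revisions, count):
    """Pick `count` revisions spread evenly over a chronologically ordered list."""
    if count <= 0:
        return []
    if count >= len(revisions):
        return list(revisions)
    if count == 1:
        return [revisions[0]]
    step = (len(revisions) - 1) / (count - 1)
    return [revisions[round(i * step)] for i in range(count)]

def matching_ratio(reproduced, historical):
    """Fraction of historical packages whose reproduced result is identical."""
    if not historical:
        return 1.0
    same = sum(
        1 for name, value in historical.items() if reproduced.get(name) == value
    )
    return same / len(historical)
```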
|
||||
|
||||
|
||||
## Results {auto-animate="true"}
|
||||
|
||||
|
||||
**RQ1:** *Reproducibility of build environments*
|
||||
|
||||
- We were able to **reproduce the build environment of 99.99% of the packages** we tested;
|
||||
- Discrepancies we found were due to the (unfortunate) use of some of Nix’s impure builtins.
|
||||
|
||||
## Results {auto-animate="true"}
|
||||
|
||||
|
||||
**RQ1:** *Reproducibility of build environments*
|
||||
|
||||
- We were able to **reproduce the build environment of 99.99% of the packages** we tested;
|
||||
- Discrepancies we found were due to the (unfortunate) use of some of Nix’s impure builtins.
|
||||
|
||||
**RQ2:** *Rebuilding past software versions*
|
||||
|
||||
- We were able to **successfully build 14 233 out of the 14 242 (99.94%) packages that were built successfully by CI in 2017**;
|
||||
- Discrepancies we found were due to leaks in the Nix build sandbox, which we wish to investigate further.
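
The RQ2 percentage can be rechecked from the two counts given above. The helper `success_rate` is a hypothetical name for this example.

```python
def success_rate(successes, total):
    """Return the success percentage, rounded to two decimals."""
    return round(100 * successes / total, 2)

rebuilt, built_by_ci = 14233, 14242
print(success_rate(rebuilt, built_by_ci))  # 99.94
print(built_by_ci - rebuilt)               # 9 packages failed to rebuild
```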
|
||||
|
||||
|
||||
|
||||
|
||||
## Research questions {auto-animate="true"}
|
||||
|
||||
|
||||
**How reproducible is software in the functional package management model?**
|
||||
|
||||
- Is Nix evaluation reproducible? Can we reproduce *build environments* of Nix packages?
|
||||
- Does functional package management enable **bitwise build reproducibility**?
|
||||
|
||||
|
||||
|
||||
## Reproducibility of build environments {auto-animate="true"}
|
||||
|
||||
|
||||
- Does functional package management enable **bitwise build reproducibility**?
|
||||
|
||||
:::: {.columns}
|
||||
|
||||
::: {.column width="50%"}
|
||||
|
||||
{height='14em'}
|
||||
|
||||
:::
|
||||
|
||||
::: {.column width="50%"}
|
||||
|
||||
**"Does Functional Package Management Enable Reproducible Builds at Scale? Yes."**, MSR 2025, *J. Malka, S. Zacchiroli, T. Zimmermann*.
|
||||
|
||||
|
||||
:::
|
||||
|
||||
::::
|
||||
|
||||
|
||||
|
||||
## Nix **does not** guarantee reproducible builds!
|
||||
|
||||
```nix
|
||||
let
|
||||
pkgs = import <nixpkgs> { };
|
||||
in
|
||||
pkgs.runCommand "random" { } ''
|
||||
echo $RANDOM > $out
|
||||
''
|
||||
```
|
||||
|
||||
```{mermaid}
|
||||
flowchart TD
|
||||
A[nix-build]
|
||||
A --run 1--> B[12505]
|
||||
A --run 2--> C[29217]
|
||||
```
|
||||
|
||||
→ Will produce an artifact with a different number at each run!
|
||||
|
||||
|
||||
|
||||
|
||||
## So how reproducible are the packages of the Nix distribution?
|
||||
|
||||
|
||||
::: {#fig-monitoring}
|
||||
|
||||

|
||||
|
||||
[https://reproducible.nixos.org](https://reproducible.nixos.org)
|
||||
|
||||
:::
|
||||
|
||||
## So how reproducible are the packages of the Nix distribution?
|
||||
|
||||
|
||||
::: {#fig-diffoscope}
|
||||
|
||||

|
||||
|
||||
Example of diffoscope output.
|
||||
|
||||
:::
|
||||
|
||||
**Problem:**
|
||||
|
||||
- Only monitors a small subset of `nixpkgs` (~1300 packages for the GNOME image runtime closure)
|
||||
|
||||
|
||||
|
||||
## Research questions
|
||||
|
||||
- **RQ1:** What is the evolution of bitwise reproducible packages in `nixpkgs` between 2017 and 2023?
|
||||
- **RQ2:** What are the unreproducible packages?
|
||||
- **RQ3:** Why are packages unreproducible?
|
||||
- **RQ4:** How are unreproducibilities fixed?
|
||||
|
||||
|
||||
|
||||
## Research methodology
|
||||
|
||||
::: {#fig-methodology}
|
||||
{fig-align="center"}
|
||||
|
||||
Pipeline summarizing our research methodology.
|
||||
:::
|
||||
|
||||
|
||||
## A few figures
|
||||
|
||||
::::{.columns}
|
||||
|
||||
::: {.column width="60%"}
|
||||
|
||||
|
||||
|
||||
::: {#fig-diffoscope}
|
||||

|
||||
|
||||
Evolution of the size of the nine most popular software ecosystems in `nixpkgs`.
|
||||
|
||||
:::
|
||||
|
||||
:::
|
||||
|
||||
|
||||
|
||||
::: {.column width="40%"}
|
||||
- 709 816 packages built;
|
||||
- 14 296 total build hours;
|
||||
- 548 390 tracked by name and corresponding to <span style="white-space: nowrap;">59 103</span> unique packages associated with a specific software ecosystem.
|
||||
|
||||
:::
|
||||
|
||||
::::
|
||||
|
||||
|
||||
|
||||
## RQ1: Evolution of bitwise reproducible packages
|
||||
|
||||
::: {#fig-overall}
|
||||
|
||||
{.r-stretch fig-align="center"}
|
||||
|
||||
Proportion of reproducible, rebuildable and non-rebuildable packages over time.
|
||||
|
||||
:::
|
||||
|
||||
|
||||
## RQ1: Evolution of bitwise reproducible packages
|
||||
|
||||
::: {#fig-overall2}
|
||||
|
||||
{.r-stretch fig-align="center"}
|
||||
|
||||
Absolute numbers of reproducible, rebuildable and non-rebuildable packages over time.
|
||||
|
||||
:::
|
||||
|
||||
|
||||
|
||||
## RQ1: Evolution of bitwise reproducible packages
|
||||
|
||||
|
||||
::: {#fig-overall2-reg}
|
||||
|
||||
|
||||
{.r-stretch fig-align="center"}
|
||||
|
||||
Reproducibility regression around June 2020.
|
||||
|
||||
:::
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
## RQ2: What are the unreproducible packages?
|
||||
|
||||
::: {#fig-diff}
|
||||
|
||||
{.r-stretch fig-align="center"}
|
||||
|
||||
Proportion of reproducible packages belonging to the three most popular ecosystems and the base namespace of nixpkgs.
|
||||
:::
|
||||
|
||||
## RQ3: Why are packages unreproducible?
|
||||
|
||||
|
||||
::: {#fig-diff}
|
||||
|
||||
{.r-stretch fig-align="center"}
|
||||
|
||||
Evolution of the number of packages that are matched by each of our heuristics, over time.
|
||||
:::
|
||||
|
||||
|
||||
|
||||
## RQ4: How are unreproducibilities fixed?
|
||||
|
||||
|
||||
- Sampled 100 fixes in our dataset of reproducibility fixes (obtained by bisection of the `nixpkgs` repository):
|
||||
|
||||
→ **In 93 instances, "reproducibility" was not mentioned in the pull request / commit message.**
|
||||
|
||||
→ **In 75 cases the fix was merely a package update.**
|
||||
|
||||
<br>
|
||||
|
||||
- Studied the 15 most impactful fixes (from 3052 to 27 packages fixed):
|
||||
|
||||
→ **In 8/15 instances, the reproducibility issue being fixed is documented.**
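
The bisection mentioned above can be sketched as a binary search over an ordered commit range. This is a generic illustration, not the actual tooling used in the study; `is_reproducible` is a hypothetical callback that would build the package at a given commit and check it for bitwise reproducibility.

```python
def first_fixed_commit(commits, is_reproducible):
    """Binary-search the first commit where the package becomes reproducible.

    Assumes commits are in chronological order, the first one is
    unreproducible, the last one is reproducible, and the status flips
    only once in between.
    """
    low, high = 0, len(commits) - 1  # low: known bad, high: known good
    while high - low > 1:
        mid = (low + high) // 2
        if is_reproducible(commits[mid]):
            high = mid
        else:
            low = mid
    return commits[high]
```

Each step halves the candidate range, so locating a fix among *n* commits needs about log2(*n*) rebuilds.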
|
||||
|
||||
|
||||
|
||||
## Conclusion
|
||||
|
||||
- Bitwise reproducibility in `nixpkgs` as of 2023: 91%;
|
||||
- This justifies investing resources/conducting research on distributed cache solutions relying on build reproducibility.
|
||||
|
||||
|
||||
## Thank you for your attention!
|
||||
|
||||
|
||||
<h3><ins>My socials:</ins></h3>
|
||||
|
||||
|
||||
<div style="margin-bottom: 40px;"></div>
|
||||
{{< bi mastodon >}} luj@chaos.social
|
||||
|
||||
{{< bi envelope >}} julien.malka@telecom-paris.fr
|
||||
|
||||
|
||||
:::: {.columns}
|
||||
|
||||
::: {.column width="50%"}
|
||||
|
||||
|
||||
{height='11em'}
|
||||
:::
|
||||
|
||||
|
||||
::: {.column width="50%"}
|
||||
|
||||
{height='11em'}
|
||||
:::
|
||||
|
||||
::::
|
||||
|
||||
## RQ1: Evolution of bitwise reproducible packages
|
||||
|
||||
|
||||
::: {#fig-overall2-reg}
|
||||
|
||||
|
||||
{.r-stretch fig-align="center"}
|
||||
|
||||
Sankey graph of the average flow of packages between two revisions, excluding the revision from June 2020, considered as an outlier.
|
||||
|
||||
:::
|
||||
|
||||
|
||||
|