bead: a provenance tool
RSECon25
You need to:
Team has been improving data cleaning since then. Some team members left.
Different statistical methods now.
First question: How exactly was Figure 1 produced?
Figure 1 = code(data)
Results depend on both algorithms and data.
Code under version control (Git) ✓
Tagged commit at submission ✓
But what about the data?
data₁ = code₂(data₂)
Data transformed by wrangling/cleaning steps
You want to reconstruct the chain of data provenance.
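One way to picture the chain is as a small graph in which every artifact records the code that produced it and the data that code consumed. The sketch below is only an illustration of that idea in Python: the record layout and the names `figure_1`, `data_1`, `data_2`, `code_1`, `code_2` are hypothetical and mirror the notation above, not any tool's storage format.

```python
# Hypothetical provenance records: each artifact names the code that
# produced it and the artifacts that code consumed (code None = raw source).
PROVENANCE = {
    "figure_1": {"code": "code_1", "inputs": ["data_1"]},
    "data_1": {"code": "code_2", "inputs": ["data_2"]},
    "data_2": {"code": None, "inputs": []},
}


def provenance_chain(artifact, records):
    """Return (artifact, code) pairs, walking from the artifact back to raw data."""
    chain = []
    seen = set()
    stack = [artifact]
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # an artifact shared by several steps is listed once
        seen.add(name)
        record = records[name]
        chain.append((name, record["code"]))
        stack.extend(record["inputs"])
    return chain


chain = provenance_chain("figure_1", PROVENANCE)
for name, code in chain:
    print(f"{name} <- {code or 'raw data'}")
```

The walk only works if every step has written down its inputs, which is exactly the information that goes missing when data is cleaned ad hoc.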
| Challenge | Description |
| --- | --- |
| Frequent changes | Code and data both evolve |
| Complex pipelines | Many steps, multiple datasets |
| Tool heterogeneity | Python, R, SQL, DuckDB all in one project |
| Team dynamics | People join, leave, change roles |

| Existing tool | Limitation |
| --- | --- |
| Version Control (Git) | Not suitable for large binary data. |
| Data Version Control (DVC) | Only versions data. |
| Orchestration Tools (Airflow, dbt, KNIME) | Language specific and too complex. |
A command-line tool that packages code and data together in immutable snapshots, with all data dependencies declared explicitly.
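To make the idea concrete, here is a minimal Python sketch of "code and output in one write-once archive, inputs declared by content hash". It is an assumption-laden illustration, not bead's actual archive format: the `manifest.json` layout, the function names, and the choice to leave `input/` and `temp/` out of the archive are all invented for this example.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def sha256_of(path):
    """Content hash used to pin an input to one exact version."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def save_snapshot(workspace, archive, inputs):
    """Pack code and output into a zip, plus a manifest of declared inputs.

    `inputs` maps an input name to the snapshot archive it was loaded from.
    The manifest layout is hypothetical.
    """
    workspace = Path(workspace)
    manifest = {
        "inputs": {name: sha256_of(src) for name, src in sorted(inputs.items())}
    }
    # Mode "x" refuses to overwrite, so an existing snapshot is never modified.
    with zipfile.ZipFile(archive, "x") as zf:
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
        for path in sorted(workspace.rglob("*")):
            rel = path.relative_to(workspace)
            # Inputs are referenced by hash, scratch files are skipped.
            if path.is_file() and rel.parts[0] not in ("input", "temp"):
                zf.write(path, rel.as_posix())
    return manifest
```

Recording inputs by hash rather than by path is what lets a later reader tell exactly which upstream version a result was computed from.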
Conference session classification by title
Inputs: sessions.csv, theme_aliases.csv
Output: classified_sessions.csv
/demo/projects

/demo/projects
/demo/bead-box

/demo/projects
└── sessions
    ├── input
    ├── output
    └── temp
/demo/bead-box

/demo/projects
└── sessions
    ├── input
    ├── output
    │   └── sessions.csv
    └── temp
/demo/bead-box

/demo/projects
└── sessions
    ├── input
    ├── output
    │   └── sessions.csv
    └── temp
/demo/bead-box
└── sessions_20250910T150420306964+0000.zip

/demo/projects
/demo/bead-box
└── sessions_20250910T150420306964+0000.zip

/demo/projects
└── theme-aliases
    ├── input
    ├── output
    └── temp
/demo/bead-box
└── sessions_20250910T150420306964+0000.zip

/demo/projects
└── theme-aliases
    ├── input
    ├── output
    │   └── theme_aliases.csv
    └── temp
/demo/bead-box
└── sessions_20250910T150420306964+0000.zip

/demo/projects
└── theme-aliases
    ├── input
    ├── output
    │   └── theme_aliases.csv
    └── temp
/demo/bead-box
├── sessions_20250910T150420306964+0000.zip
└── theme-aliases_20250910T150620911398+0000.zip

/demo/projects
/demo/bead-box
├── sessions_20250910T150420306964+0000.zip
└── theme-aliases_20250910T150620911398+0000.zip

/demo/projects
└── possible-tracks
    ├── input
    ├── output
    └── temp
/demo/bead-box
├── sessions_20250910T150420306964+0000.zip
└── theme-aliases_20250910T150620911398+0000.zip

/demo/projects
└── possible-tracks
    ├── input
    │   └── sessions
    │       └── sessions.csv
    ├── output
    └── temp
/demo/bead-box
├── sessions_20250910T150420306964+0000.zip
└── theme-aliases_20250910T150620911398+0000.zip

/demo/projects
└── possible-tracks
    ├── input
    │   ├── sessions
    │   │   └── sessions.csv
    │   └── theme-aliases
    │       └── theme_aliases.csv
    ├── output
    └── temp
/demo/bead-box
├── sessions_20250910T150420306964+0000.zip
└── theme-aliases_20250910T150620911398+0000.zip

/demo/projects
└── possible-tracks
    ├── input
    │   ├── sessions
    │   │   └── sessions.csv
    │   └── theme-aliases
    │       └── theme_aliases.csv
    ├── output
    ├── run.sh
    └── temp
/demo/bead-box
├── sessions_20250910T150420306964+0000.zip
└── theme-aliases_20250910T150620911398+0000.zip

/demo/projects
└── possible-tracks
    ├── input
    │   ├── sessions
    │   │   └── sessions.csv
    │   └── theme-aliases
    │       └── theme_aliases.csv
    ├── output
    │   └── classified_sessions.csv
    ├── run.sh
    └── temp
/demo/bead-box
├── sessions_20250910T150420306964+0000.zip
└── theme-aliases_20250910T150620911398+0000.zip

/demo/projects
└── possible-tracks
    ├── input
    │   ├── sessions
    │   │   └── sessions.csv
    │   └── theme-aliases
    │       └── theme_aliases.csv
    ├── output
    │   └── classified_sessions.csv
    ├── run.sh
    └── temp
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── sessions_20250910T150420306964+0000.zip
└── theme-aliases_20250910T150620911398+0000.zip

/demo/projects
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── sessions_20250910T150420306964+0000.zip
└── theme-aliases_20250910T150620911398+0000.zip

/demo/projects
└── theme-aliases
    ├── input
    ├── output
    │   └── theme_aliases.csv
    └── temp
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── sessions_20250910T150420306964+0000.zip
└── theme-aliases_20250910T150620911398+0000.zip

/demo/projects
└── theme-aliases
    ├── input
    ├── output
    │   └── theme_aliases.csv
    └── temp
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── sessions_20250910T150420306964+0000.zip
├── theme-aliases_20250910T150620911398+0000.zip
└── theme-aliases_20250910T151300589642+0000.zip

/demo/projects
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── sessions_20250910T150420306964+0000.zip
├── theme-aliases_20250910T150620911398+0000.zip
└── theme-aliases_20250910T151300589642+0000.zip

/demo/projects
└── possible-tracks
    ├── input
    ├── output
    ├── run.sh
    └── temp
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── sessions_20250910T150420306964+0000.zip
├── theme-aliases_20250910T150620911398+0000.zip
└── theme-aliases_20250910T151300589642+0000.zip

/demo/projects
└── possible-tracks
    ├── input
    │   └── theme-aliases
    │       └── theme_aliases.csv
    ├── output
    ├── run.sh
    └── temp
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── sessions_20250910T150420306964+0000.zip
├── theme-aliases_20250910T150620911398+0000.zip
└── theme-aliases_20250910T151300589642+0000.zip

/demo/projects
└── possible-tracks
    ├── input
    │   ├── sessions
    │   │   └── sessions.csv
    │   └── theme-aliases
    │       └── theme_aliases.csv
    ├── output
    ├── run.sh
    └── temp
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── sessions_20250910T150420306964+0000.zip
├── theme-aliases_20250910T150620911398+0000.zip
└── theme-aliases_20250910T151300589642+0000.zip

/demo/projects
└── possible-tracks
    ├── input
    │   ├── sessions
    │   │   └── sessions.csv
    │   └── theme-aliases
    │       └── theme_aliases.csv
    ├── output
    │   └── classified_sessions.csv
    ├── run.sh
    └── temp
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── sessions_20250910T150420306964+0000.zip
├── theme-aliases_20250910T150620911398+0000.zip
└── theme-aliases_20250910T151300589642+0000.zip

/demo/projects
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── sessions_20250910T150420306964+0000.zip
├── theme-aliases_20250910T150620911398+0000.zip
└── theme-aliases_20250910T151300589642+0000.zip

/demo/projects
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── theme-aliases_20250910T150620911398+0000.zip
└── theme-aliases_20250910T151300589642+0000.zip

/demo/projects
└── session-themes.svg
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── theme-aliases_20250910T150620911398+0000.zip
└── theme-aliases_20250910T151300589642+0000.zip
CEU MicroData and MACROMANAGERS.eu are economics research labs studying competitiveness, firm dynamics, trade and growth.
Variety of data: administrative registers, surveys, web-scraped data, commercial datasets. Typically not large, but complex.
Diverse and evolving team: 7 senior researchers, 4 research fellows, 20+ recent and 40+ past student affiliates.
Used since 2017, both internally and when sharing data with others.
More than 600 beads saved, with two versions each on average.
Interquartile range of bead sizes: 10 to 500 MB, largest is 23 GB.
Median time between saving new versions: 51 days.
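A statistic like the time between versions could, in principle, be derived from archive names alone. The Python sketch below assumes that the part after the last underscore is a timestamp with microseconds and a UTC offset, which is what the names in the demo listing look like; the function names are hypothetical and this is not bead's own code.

```python
from datetime import datetime
from pathlib import Path

# Archive names copied from the demo listing.
ARCHIVES = [
    "possible-tracks_20250910T151020873645+0000.zip",
    "sessions_20250910T150420306964+0000.zip",
    "theme-aliases_20250910T150620911398+0000.zip",
    "theme-aliases_20250910T151300589642+0000.zip",
]


def parse_archive_name(filename):
    """Split '<name>_<timestamp>.zip' into (name, timezone-aware datetime)."""
    stem = Path(filename).stem  # drops the trailing ".zip"
    name, _, stamp = stem.rpartition("_")
    return name, datetime.strptime(stamp, "%Y%m%dT%H%M%S%f%z")


def gaps_between_versions(filenames):
    """Map each name to the time gaps between its consecutive versions."""
    versions = {}
    for filename in filenames:
        name, saved_at = parse_archive_name(filename)
        versions.setdefault(name, []).append(saved_at)
    gaps = {}
    for name, times in versions.items():
        times.sort()
        gaps[name] = [later - earlier for earlier, later in zip(times, times[1:])]
    return gaps


gaps = gaps_between_versions(ARCHIVES)
print(gaps["theme-aliases"])
```

For the demo archives this gives a single gap of 6 minutes 39.678244 seconds between the two `theme-aliases` versions; names with only one version get an empty list.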
Everything is a bead: raw data, intermediate data, analysis sample, research results.
Never load data directly from outside a bead.
We rarely recompute everything, but it is nice to know we could.
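Inside analysis code, the "never load from outside" rule could be backed by a small guard. This Python sketch assumes the workspace layout seen in the demo (`input/<name>/<file>`); the helper `input_file` is hypothetical and not part of bead.

```python
from pathlib import Path


def input_file(workspace, input_name, filename):
    """Return a path under <workspace>/input/<input_name>, or raise.

    Directory names follow the demo layout (input/, output/, temp/).
    """
    root = (Path(workspace) / "input").resolve()
    path = (root / input_name / filename).resolve()
    # Reject anything that escapes the input directory, e.g. via "..".
    if root not in path.parents:
        raise ValueError(f"{path} is outside the declared inputs in {root}")
    return path
```

A script would then open `input_file(".", "sessions", "sessions.csv")` instead of a hard-coded path, so any data it reads must first have been declared as an input.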
Website: bead.zip
Installation: bead.zip/install
GitHub: github.com/e3krisztian/bead
Slides: bead.zip/rsecon25
bead