a provenance tool
RSECon25
You need to:
Team has been improving data cleaning since then. Some team members left.
Different statistical methods now.
First question: How exactly was Figure 1 produced?
Figure 1 = code(data)
Results depend on both algorithms and data.
Code under version control (Git) ✓
Tagged commit at submission ✓
But what about the data?
data₁ = code₂(data₂)
Data transformed by wrangling/cleaning steps
You want to reconstruct the chain of data provenance.
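The recursion data₁ = code₂(data₂) can be read as a small graph walk. The sketch below is only an illustration of that idea, not how any particular tool stores provenance; the `Artifact` record and all example names (`raw_survey`, `cleaning_v2`, and so on) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """A dataset or result, the code that produced it, and its inputs."""
    name: str
    code: str  # hypothetical label, e.g. a script name or Git tag
    inputs: tuple["Artifact", ...] = ()


def provenance_chain(artifact: Artifact, depth: int = 0) -> list[str]:
    """Return one indented line per artifact, walking inputs depth-first."""
    lines = [f"{'  ' * depth}{artifact.name} = {artifact.code}(...)"]
    for upstream in artifact.inputs:
        lines.extend(provenance_chain(upstream, depth + 1))
    return lines


# Hypothetical three-step chain: raw data -> cleaned data -> figure.
raw = Artifact("raw_survey", code="download")
clean = Artifact("clean_survey", code="cleaning_v2", inputs=(raw,))
figure = Artifact("figure_1", code="analysis_v5", inputs=(clean,))

print("\n".join(provenance_chain(figure)))
```

Answering "how exactly was Figure 1 produced?" then amounts to printing this chain for the figure, provided every step recorded its code and inputs when it ran.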
Frequent changes | Code and data both evolve
Complex pipelines | Many steps, multiple datasets
Tool heterogeneity | Python, R, SQL, DuckDB all in one project
Team dynamics | People join, leave, change roles
Version Control (Git) | Not suitable for large binary data
Data Version Control (DVC) | Only versions data
Orchestration Tools (Airflow, dbt, KNIME) | Language specific and too complex
A command-line tool that packages code and data together in immutable snapshots, with all data dependencies declared explicitly.
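To make "immutable snapshot with declared dependencies" concrete, here is a minimal sketch in Python. It is not bead's implementation or archive format: the `manifest.json` layout, the `save_snapshot` helper, and the choice to leave `input` and `temp` out of the archive are all assumptions made for illustration.

```python
import hashlib
import json
import zipfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Content hash used to pin an input to one exact version."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def save_snapshot(workspace: Path, box: Path, inputs: dict[str, Path]) -> Path:
    """Zip code and outputs, plus a manifest naming every input by hash."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%f%z")
    archive = box / f"{workspace.name}_{stamp}.zip"
    manifest = {
        "name": workspace.name,
        # Assumption: inputs are referenced by hash, not copied in.
        "inputs": {name: sha256_of(p) for name, p in sorted(inputs.items())},
    }
    # Mode "x" refuses to overwrite an existing archive.
    with zipfile.ZipFile(archive, "x") as zf:
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
        for path in sorted(workspace.rglob("*")):
            rel = path.relative_to(workspace)
            if path.is_file() and rel.parts[0] not in {"input", "temp"}:
                zf.write(path, rel.as_posix())
    archive.chmod(0o444)  # read-only: a snapshot is never edited in place
    return archive
```

Because each snapshot records the hashes of the snapshots it was built from, the chain of provenance can be followed backwards from any result.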
Conference session classification by title
Inputs: sessions.csv, theme_aliases.csv
Output: classified_sessions.csv
/demo/projects

/demo/projects
/demo/bead-box

/demo/projects
└── sessions
    ├── input
    ├── output
    └── temp
/demo/bead-box

/demo/projects
└── sessions
    ├── input
    ├── output
    │   └── sessions.csv
    └── temp
/demo/bead-box

/demo/projects
└── sessions
    ├── input
    ├── output
    │   └── sessions.csv
    └── temp
/demo/bead-box
└── sessions_20250910T150420306964+0000.zip

/demo/projects
/demo/bead-box
└── sessions_20250910T150420306964+0000.zip

/demo/projects
└── theme-aliases
    ├── input
    ├── output
    └── temp
/demo/bead-box
└── sessions_20250910T150420306964+0000.zip

/demo/projects
└── theme-aliases
    ├── input
    ├── output
    │   └── theme_aliases.csv
    └── temp
/demo/bead-box
└── sessions_20250910T150420306964+0000.zip

/demo/projects
└── theme-aliases
    ├── input
    ├── output
    │   └── theme_aliases.csv
    └── temp
/demo/bead-box
├── sessions_20250910T150420306964+0000.zip
└── theme-aliases_20250910T150620911398+0000.zip

/demo/projects
/demo/bead-box
├── sessions_20250910T150420306964+0000.zip
└── theme-aliases_20250910T150620911398+0000.zip

/demo/projects
└── possible-tracks
    ├── input
    ├── output
    └── temp
/demo/bead-box
├── sessions_20250910T150420306964+0000.zip
└── theme-aliases_20250910T150620911398+0000.zip

/demo/projects
└── possible-tracks
    ├── input
    │   └── sessions
    │       └── sessions.csv
    ├── output
    └── temp
/demo/bead-box
├── sessions_20250910T150420306964+0000.zip
└── theme-aliases_20250910T150620911398+0000.zip

/demo/projects
└── possible-tracks
    ├── input
    │   ├── sessions
    │   │   └── sessions.csv
    │   └── theme-aliases
    │       └── theme_aliases.csv
    ├── output
    └── temp
/demo/bead-box
├── sessions_20250910T150420306964+0000.zip
└── theme-aliases_20250910T150620911398+0000.zip

/demo/projects
└── possible-tracks
    ├── input
    │   ├── sessions
    │   │   └── sessions.csv
    │   └── theme-aliases
    │       └── theme_aliases.csv
    ├── output
    ├── run.sh
    └── temp
/demo/bead-box
├── sessions_20250910T150420306964+0000.zip
└── theme-aliases_20250910T150620911398+0000.zip

/demo/projects
└── possible-tracks
    ├── input
    │   ├── sessions
    │   │   └── sessions.csv
    │   └── theme-aliases
    │       └── theme_aliases.csv
    ├── output
    │   └── classified_sessions.csv
    ├── run.sh
    └── temp
/demo/bead-box
├── sessions_20250910T150420306964+0000.zip
└── theme-aliases_20250910T150620911398+0000.zip

/demo/projects
└── possible-tracks
    ├── input
    │   ├── sessions
    │   │   └── sessions.csv
    │   └── theme-aliases
    │       └── theme_aliases.csv
    ├── output
    │   └── classified_sessions.csv
    ├── run.sh
    └── temp
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── sessions_20250910T150420306964+0000.zip
└── theme-aliases_20250910T150620911398+0000.zip

/demo/projects
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── sessions_20250910T150420306964+0000.zip
└── theme-aliases_20250910T150620911398+0000.zip

/demo/projects
└── theme-aliases
    ├── input
    ├── output
    │   └── theme_aliases.csv
    └── temp
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── sessions_20250910T150420306964+0000.zip
└── theme-aliases_20250910T150620911398+0000.zip

/demo/projects
└── theme-aliases
    ├── input
    ├── output
    │   └── theme_aliases.csv
    └── temp
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── sessions_20250910T150420306964+0000.zip
├── theme-aliases_20250910T150620911398+0000.zip
└── theme-aliases_20250910T151300589642+0000.zip

/demo/projects
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── sessions_20250910T150420306964+0000.zip
├── theme-aliases_20250910T150620911398+0000.zip
└── theme-aliases_20250910T151300589642+0000.zip

/demo/projects
└── possible-tracks
    ├── input
    ├── output
    ├── run.sh
    └── temp
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── sessions_20250910T150420306964+0000.zip
├── theme-aliases_20250910T150620911398+0000.zip
└── theme-aliases_20250910T151300589642+0000.zip

/demo/projects
└── possible-tracks
    ├── input
    │   └── theme-aliases
    │       └── theme_aliases.csv
    ├── output
    ├── run.sh
    └── temp
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── sessions_20250910T150420306964+0000.zip
├── theme-aliases_20250910T150620911398+0000.zip
└── theme-aliases_20250910T151300589642+0000.zip

/demo/projects
└── possible-tracks
    ├── input
    │   ├── sessions
    │   │   └── sessions.csv
    │   └── theme-aliases
    │       └── theme_aliases.csv
    ├── output
    ├── run.sh
    └── temp
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── sessions_20250910T150420306964+0000.zip
├── theme-aliases_20250910T150620911398+0000.zip
└── theme-aliases_20250910T151300589642+0000.zip

/demo/projects
└── possible-tracks
    ├── input
    │   ├── sessions
    │   │   └── sessions.csv
    │   └── theme-aliases
    │       └── theme_aliases.csv
    ├── output
    │   └── classified_sessions.csv
    ├── run.sh
    └── temp
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── sessions_20250910T150420306964+0000.zip
├── theme-aliases_20250910T150620911398+0000.zip
└── theme-aliases_20250910T151300589642+0000.zip

/demo/projects
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── sessions_20250910T150420306964+0000.zip
├── theme-aliases_20250910T150620911398+0000.zip
└── theme-aliases_20250910T151300589642+0000.zip

/demo/projects
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── theme-aliases_20250910T150620911398+0000.zip
└── theme-aliases_20250910T151300589642+0000.zip

/demo/projects
└── session-themes.svg
/demo/bead-box
├── possible-tracks_20250910T151020873645+0000.zip
├── theme-aliases_20250910T150620911398+0000.zip
└── theme-aliases_20250910T151300589642+0000.zip
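The archive names in the listing follow a visible pattern: a name, an underscore, and a UTC timestamp with microseconds. As a sketch of how versions could be told apart from file names alone, the Python below parses that pattern and picks the newest archive per name. The regular expression and the helper names are assumptions inferred from the listing, not part of the tool's documented interface.

```python
import re
from datetime import datetime

# Inferred from names like sessions_20250910T150420306964+0000.zip
ARCHIVE = re.compile(r"^(?P<name>.+)_(?P<stamp>\d{8}T\d{12}[+-]\d{4})\.zip$")


def parse_archive(filename: str) -> tuple[str, datetime]:
    """Split an archive file name into its name and save time."""
    match = ARCHIVE.match(filename)
    if match is None:
        raise ValueError(f"unexpected archive name: {filename}")
    stamp = datetime.strptime(match["stamp"], "%Y%m%dT%H%M%S%f%z")
    return match["name"], stamp


def latest_versions(filenames: list[str]) -> dict[str, str]:
    """Map each name to the file name of its most recent version."""
    newest: dict[str, tuple[datetime, str]] = {}
    for filename in filenames:
        name, stamp = parse_archive(filename)
        if name not in newest or stamp > newest[name][0]:
            newest[name] = (stamp, filename)
    return {name: filename for name, (_, filename) in newest.items()}


box = [
    "possible-tracks_20250910T151020873645+0000.zip",
    "sessions_20250910T150420306964+0000.zip",
    "theme-aliases_20250910T150620911398+0000.zip",
    "theme-aliases_20250910T151300589642+0000.zip",
]
print(latest_versions(box)["theme-aliases"])
```

Older versions stay in the box next to newer ones, so a result built against the first `theme-aliases` archive can still be traced to exactly that file.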
CEU MicroData and MACROMANAGERS.eu are economics research labs studying competitiveness, firm dynamics, trade and growth.
Variety of data: administrative registers, surveys, web-scraped data, commercial datasets. Typically not large, but complex.
Diverse and evolving team: 7 senior researchers, 4 research fellows, 20+ recent and 40+ past student affiliates.
Used since 2017 internally and when sharing data with others.
Saved 600+ beads, with two versions each on average.
Interquartile range of bead sizes: 10 to 500 MB, largest is 23 GB.
Median time between saving new versions: 51 days.
Everything is a bead: raw data, intermediate data, analysis sample, research results.
Never load data directly from outside a bead.
We rarely recompute everything, but it is good to know we could.
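One way to enforce the "never load data from outside" rule inside analysis code is a small path guard. This is a hypothetical helper written for illustration, assuming the workspace layout shown in the demo (an `input` directory per workspace); it is not something the tool provides.

```python
from pathlib import Path


def input_path(workspace: Path, relative: str) -> Path:
    """Resolve a data file, refusing anything outside workspace/input."""
    root = (workspace / "input").resolve()
    candidate = (root / relative).resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise ValueError(f"{relative} is outside {root}")
    return candidate


workspace = Path("/demo/projects/possible-tracks")
print(input_path(workspace, "sessions/sessions.csv"))
```

A call such as `input_path(workspace, "../../shared/raw.csv")` raises `ValueError`, so an undeclared dependency fails loudly instead of slipping into the results.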
Website: bead.zip
Installation: bead.zip/install
GitHub: github.com/e3krisztian/bead
Slides: bead.zip/rsecon25
bead