pkgjam: Moving Away from the One True Tree
James K. Lowden
FreeTDS maintainer & pkgsrc whiner
What is pkgjam?
- A new install tree structure, better than /usr/pkg
- A new metadata store, better than /var/db/pkg
- A new way to capture options and dependencies, better than Makefiles and mk.conf
- A new build tool, better than make(1) for managing the overall build
Organization of Talk
- Motivation
- Welcome to pkgjam
- Dependency Independence
- Description of databases
- New build tool
- Examples
- Questions and Refutations
Motivation
- Tear-down-to-upgrade is broken by design
- pkgsrc's complexity discourages potential maintainers
- make(1) and db(1) not best tools to manage metadata
- No new licences introduced to use the build system
“As a novice at package creation, I found the guide adequate for dealing with ordinary situations, but lacking on how to deal with the weirder stuff.
Worse yet, there's almost always something weird.”
— user-pkg@ 21 April 2007
Welcome to pkgjam
- Dependency Independence: New install tree allows upgrades without teardowns
- Three Databases to manage build & run dependencies and options
- New build tool — pkg
Dependency Independence
Objective
- Install multiple versions of a package (e.g. N & N+1)
- Upgrade a shared library without disturbing installed dependents
Means
- Each package installed in its own directory, /usr/pkg/pkgname
- RPATH is /usr/pkg/pkgname/so, and all needed shared objects are hard-linked to that location.
- Executables linked to /usr/pkg/bin for convenient access by the shell
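The layout above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pkg tool itself: install_package is a hypothetical helper, the per-package bin subdirectory is an assumed name, and hard links are assumed for the executables (the slide only says "linked"). A temporary directory stands in for /usr/pkg.

```python
import os
import tempfile

def install_package(root, pkgname, needed_libs, executables):
    """Lay out one package under root/pkgname (hypothetical helper).

    needed_libs: paths to shared objects that are already installed
    executables: {name: bytes}, standing in for freshly built binaries
    """
    pkgdir = os.path.join(root, pkgname)
    sodir = os.path.join(pkgdir, "so")        # the package's RPATH directory
    bindir = os.path.join(pkgdir, "bin")      # assumed subdirectory name
    shared_bin = os.path.join(root, "bin")    # stands in for /usr/pkg/bin
    for d in (sodir, bindir, shared_bin):
        os.makedirs(d, exist_ok=True)

    # Hard-link every needed shared object into the package's own so/ directory.
    for lib in needed_libs:
        os.link(lib, os.path.join(sodir, os.path.basename(lib)))

    # Write each executable, then link it into the shared bin directory.
    for name, content in executables.items():
        exe = os.path.join(bindir, name)
        with open(exe, "wb") as f:
            f.write(content)
        os.chmod(exe, 0o755)
        os.link(exe, os.path.join(shared_bin, name))  # assumption: hard link
    return pkgdir

root = tempfile.mkdtemp()                     # stands in for /usr/pkg
lib = os.path.join(root, "libxml2", "lib", "libxml2.so")
os.makedirs(os.path.dirname(lib))
with open(lib, "wb") as f:
    f.write(b"fake shared object")

pkgdir = install_package(root, "xmlto", [lib], {"xmlto": b"#!/bin/sh\n"})
```

After the call, xmlto/so/libxml2.so and libxml2/lib/libxml2.so are two names for the same file.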
Dependency Independence (2)
Why hard links?
- Can rm -f /usr/pkg/pkgname without hurting installed applications
- Can (given time) reconstruct who-uses-what from the inodes
- Filesystem does our reference counting
- Garbage collection: filesystem frees space when link count is zero
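A small experiment makes the reference-counting argument concrete. The sketch below uses a temporary directory in place of /usr/pkg and invented file names; users_by_inode is a hypothetical helper showing one way to reconstruct who-uses-what by grouping paths that share an inode.

```python
import os
import shutil
import tempfile
from collections import defaultdict

def users_by_inode(root):
    """Return groups of paths under root that are hard links to the same file."""
    groups = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            groups[(st.st_dev, st.st_ino)].append(os.path.relpath(path, root))
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}

root = tempfile.mkdtemp()                      # stands in for /usr/pkg

# libxml2 installs its shared object in its own directory.
lib = os.path.join(root, "libxml2", "lib", "libxml2.so")
os.makedirs(os.path.dirname(lib))
with open(lib, "wb") as f:
    f.write(b"fake shared object")

# libxslt hard-links the library into its own so/ directory.
dep = os.path.join(root, "libxslt", "so", "libxml2.so")
os.makedirs(os.path.dirname(dep))
os.link(lib, dep)

links_before = os.stat(dep).st_nlink           # two names, one file
shared = users_by_inode(root)

# Removing the provider's directory does not disturb the dependent.
shutil.rmtree(os.path.join(root, "libxml2"))
links_after = os.stat(dep).st_nlink
with open(dep, "rb") as f:
    still_readable = f.read() == b"fake shared object"
```

The filesystem frees the data only when the last link is removed, so no separate bookkeeping is needed to decide when a library version is garbage.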
Database Diagram
Complexity is inherent in the problem domain. We manage this complexity now, but without the benefit of a diagram or a relational database.
Interpolated strings are not a complexity management tool.
Three Databases
- bin describes installed packages
- src describes all pkgsrc packages, including
  - dependencies
  - options
  - distfiles & URLs
  - scripts
- site captures any site-specific preferences
  - replaces mk.conf
  - allows individual packages (perhaps OS-provided) to be installed outside /usr/pkg
bin and site are locally managed. src is built from CVS-controlled sources and is downloaded by users.
The build tool, pkg, consults all three databases to determine what to build and how to build it. (Dependencies may be satisfied from bin, and options are driven by site.)
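One way to picture a single tool consulting three databases is SQLite's ATTACH DATABASE, which lets one connection query several database files at once. The sketch below is an assumption-laden illustration: the table and column layouts, the sample rows, and the build_plan helper are all hypothetical, and in-memory databases stand in for the three files.

```python
import sqlite3

con = sqlite3.connect(":memory:")
for name in ("src", "bin", "site"):
    # Each ATTACH of ':memory:' creates a separate in-memory database.
    con.execute(f"ATTACH DATABASE ':memory:' AS {name}")

# Hypothetical schemas and sample rows, for illustration only.
con.executescript("""
CREATE TABLE src.Packages     (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE src.Dependencies (id INTEGER, pkgname TEXT);
CREATE TABLE bin.Packages     (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE site.Options     (id INTEGER, option TEXT, value TEXT);

INSERT INTO src.Packages VALUES (1, 'libxml2'), (2, 'libxslt'), (3, 'xmlto');
INSERT INTO src.Dependencies VALUES (2, 'libxml2'), (3, 'libxml2'), (3, 'libxslt');
INSERT INTO bin.Packages VALUES (1, 'libxml2');
INSERT INTO site.Options VALUES (0, 'inet6', 'yes'), (3, 'docs', 'no');
""")

def build_plan(con, pkgname):
    """Return dependencies not yet installed and the site options that apply."""
    (pkg_id,) = con.execute(
        "SELECT id FROM src.Packages WHERE name = ?", (pkgname,)).fetchone()
    # Dependencies listed in src that have no matching row in bin.
    missing = [row[0] for row in con.execute("""
        SELECT d.pkgname
        FROM src.Dependencies d
        LEFT JOIN bin.Packages b ON b.name = d.pkgname
        WHERE d.id = ? AND b.id IS NULL
        ORDER BY d.pkgname""", (pkg_id,))]
    # Site options recorded under id 0 or under this package's id.
    options = dict(con.execute(
        "SELECT option, value FROM site.Options WHERE id IN (0, ?) ORDER BY id",
        (pkg_id,)))
    return {"build_first": missing, "options": options}

plan = build_plan(con, "xmlto")
```

Here libxml2 is already recorded in bin, so only libxslt remains to be built before xmlto.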
Interesting Database Columns
- id
  - integer representing a version of a package, managed by a central registry
- option
  - user-settable package choice akin to PKG_OPTIONS
- canover
  - canonical version, a lexically sortable package version number
- knob
  - any build-time package setting, large or small
bin Database
- Locally managed
- Replaces /var/db/pkg
- Maintained on machine by pkgjam
- Queried via new pkg info or directly with SQL
site Database
- Locally managed
- Holds user-settable options
- “Outside” packages can replace regular ones (Packages.location)
Three kinds of Options
- Global options that apply to many packages use id 0
- Global options affecting a single package use that package's id
- Package-specific options
Users can discover options by querying the database.
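As a rough sketch of what such a query might look like, the example below lists the options that apply to one package and labels each by scope. The Options and Packages layouts and the sample rows are assumptions for illustration; the only convention taken from the slide is that id 0 marks options applying to many packages.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical tables and rows, for illustration only.
con.executescript("""
CREATE TABLE Packages (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Options  (id INTEGER, option TEXT, value TEXT);
INSERT INTO Packages VALUES (1, 'libxml2'), (2, 'xmlto');
INSERT INTO Options VALUES (0, 'inet6', 'yes'), (1, 'python', 'no'), (2, 'docs', 'yes');
""")

def options_for(con, pkgname):
    """List (option, value, scope) rows that apply to the named package."""
    return con.execute("""
        SELECT o.option, o.value,
               CASE WHEN o.id = 0 THEN 'global' ELSE 'package' END AS scope
        FROM Options o
        WHERE o.id = 0
           OR o.id IN (SELECT id FROM Packages WHERE name = ?)
        ORDER BY scope, o.option
    """, (pkgname,)).fetchall()

libxml2_options = options_for(con, "libxml2")
```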
src Database
Main tables
- Packages
  - one id per name+version
- Dependencies
  - define a “package mask”, a range of package versions that would satisfy the package's requirement
- Alternatives
  - describe cases in which any one of several options satisfies a dependency
- Knobs
  - are any build-time setting, including e.g. PREFIX
- Options
  - control knobs, sets of knobs, and other options
Canover
- How to convert any package's version to a uniform, lexically sortable string?
- Required feature for queries to report dependencies accurately
- Impossible
- We thought so too
- 12 lines of lex proved us wrong
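The talk's 12-line lex scanner is not reproduced here. As a stand-in that shows the general idea, the sketch below zero-pads every run of digits so that plain string comparison agrees with numeric comparison. The function body and the padding width of 5 are assumptions, and the sketch makes no attempt to order suffixes such as release-candidate tags correctly.

```python
import re

def canover(version, width=5):
    """Zero-pad each run of digits so string order matches numeric order."""
    return re.sub(r"\d+", lambda m: m.group().zfill(width), version)

versions = ["1.10.2", "1.9", "1.2.10"]
ordered = sorted(versions, key=canover)
```

Without the padding, "1.10.2" would sort before "1.9" as a string; with it, "00001.00009" sorts before "00001.00010.00002".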
Describing Dependencies as Relations
- id always refers to a specific version of a package
- Options.pkgname refers to a package's name
- Options.canover_min and Options.canover_max define the first and last+1 versions that satisfy the dependency
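Because the upper bound is last+1, the range is half-open: a candidate version satisfies the dependency when canover_min <= canover < canover_max. The sketch below shows that comparison as a join. The table layout is an assumption: the columns are placed on a hypothetical Dependencies table, matching the d.pkgname join used on the Exempli Gratia slide, and the canover strings and sample rows are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schema and rows; canover values are zero-padded for illustration.
con.executescript("""
CREATE TABLE Packages (id INTEGER PRIMARY KEY, name TEXT, canover TEXT);
CREATE TABLE Dependencies (id INTEGER, pkgname TEXT,
                           canover_min TEXT, canover_max TEXT);

INSERT INTO Packages VALUES
  (1, 'libxml2', '00002.00005.00011'),
  (2, 'libxml2', '00002.00006.00027'),
  (3, 'libxml2', '00003.00000.00000'),
  (4, 'libxslt', '00001.00001.00020');

-- Package 4 needs a libxml2 at or after 2.6.0 and strictly before 3.0.0.
INSERT INTO Dependencies VALUES
  (4, 'libxml2', '00002.00006.00000', '00003.00000.00000');
""")

def satisfying_ids(con, pkg_id):
    """Return ids of package versions that satisfy pkg_id's dependencies."""
    rows = con.execute("""
        SELECT c.id
        FROM Dependencies d
        JOIN Packages c
          ON c.name = d.pkgname
         AND c.canover >= d.canover_min
         AND c.canover <  d.canover_max
        WHERE d.id = ?
        ORDER BY c.canover
    """, (pkg_id,))
    return [row[0] for row in rows]

matches = satisfying_ids(con, 4)
```

Only the middle libxml2 row matches: the first is below the minimum, and the third equals the exclusive upper bound.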
Loading src Database
- src is built from CVS-controlled sources and is downloaded by users. pkgjam users never see the development tree.
- Each package has a Makefile.jam with targets named after each src table. Each target produces a file suitable for import into its table.
- How each file is produced is up to the maintainer. It may rely on additional scripts/files.
- SQLite supports a variety of import formats. Most import files are tab delimited, but scripts use CSV.
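Loading a tab-delimited import file takes little code. The sketch below is one possible loader, not the project's actual procedure: load_table is a hypothetical helper, the Packages columns and sample rows are assumptions, and an in-memory string stands in for the file a Makefile.jam target would produce.

```python
import csv
import io
import sqlite3

def load_table(con, table, columns, tsv_file):
    """Insert every tab-delimited row of tsv_file into table; return row count.

    table and columns are interpolated into the SQL text, so they must come
    from trusted code, never from user input.
    """
    reader = csv.reader(tsv_file, delimiter="\t")
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    cur = con.executemany(sql, reader)
    con.commit()
    return cur.rowcount

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Packages (id INTEGER, name TEXT, version TEXT)")

# Stands in for the file produced by a package's "Packages" target.
import_file = io.StringIO("1\tlibxml2\t2.6.27\n2\tlibxslt\t1.1.20\n")
loaded = load_table(con, "Packages", ["id", "name", "version"], import_file)
rows = con.execute("SELECT id, name, version FROM Packages ORDER BY id").fetchall()
```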
src Database Benefits
The database instantly answers ad hoc queries that today either cannot be answered at all or take overnight to run. For example:
- Which packages depend on libxml2?
- Which options are available for Package X?
- Which packages are affected by site-wide option Y?
Note: no src.Plist. Instead, we derive a plist (for bin.Plist) by watching what files are installed by the package into its directory.
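Since each package installs into its own directory, a plist can be derived by listing that directory once the install step finishes. The sketch below takes that simpler route (a walk after the fact rather than live monitoring); derive_plist is a hypothetical helper and the file names are invented.

```python
import os
import tempfile

def derive_plist(pkgdir):
    """Return sorted paths, relative to pkgdir, of every file installed there."""
    plist = []
    for dirpath, _dirs, files in os.walk(pkgdir):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), pkgdir)
            plist.append(rel.replace(os.sep, "/"))
    return sorted(plist)

# Simulate a package install into its own directory.
pkgdir = os.path.join(tempfile.mkdtemp(), "xmlto")
for rel in ("bin/xmlto", "man/man1/xmlto.1", "share/xmlto/format/docbook/html"):
    path = os.path.join(pkgdir, *rel.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write("placeholder\n")

plist = derive_plist(pkgdir)
```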
Exempli Gratia
sqlite> select p.name as pkg, dp.name as 'depends on'
from Packages p
join Dependencies d on p.id = d.id
join Packages dp on d.pkgname = dp.name
where dp.name = 'libxml2';
pkg         depends on
----------  ----------
libxslt     libxml2
py24-libxm  libxml2
xmlto       libxml2
SQLite
We chose SQLite because:
- Licensing — it is in the public domain
- Simplicity — the whole database is distributed as a single file
- Lightweight — requires a binary 2/3 the size of ls(1)
Other features:
- No important SQL gaps
- Flexible import/export
- Good isql-like tool (sqlite3)
- Fast (if keys are defined)
Build tool: pkg
- Queries databases to determine dependencies and site options
- Has no package-specific knowledge
- Generates simple Makefiles with minimal reliance on macros (and no magic)
- Executes (any) make(1) to build packages
- Installs/deinstalls packages
- Updates bin database to reflect packages added/removed
Johnny Lam discusses pkg later.
Status
- Evolving rapidly
- Database design holds all pkgsrc dependencies & options, as far as we know
- Loaded src tables from make show-vars VARNAMES=[...]
  - Packages
  - Dependencies
  - Descriptions
- pkg build tool prototype installs in new tree structure
Coming Attractions
- Load src.Options and src.Knobs
- bin proof of concept loaded from pkg_info output
- Implement pkg_info as shell script around SQLite
- “Stored procedure” module
Choose pkgjam
- Dependency Independence!
- Push complexity into the database, where it can be managed
- Describe, don't prescribe
- Better infrastructure invites
  - better tools
  - new maintainers
  - happier users
- Less work for everyone, except me