Inter-project dependencies

Most projects today do not start in a vacuum; they rely heavily on third-party projects and source code. Unless those dependencies are carefully managed, contributors quickly end up spending their effort troubleshooting the installation of third-party packages on development machines instead of working on the project itself. Package managers, configure scripts and the make utility are all useful pieces of an edit/compile/run cycle, though none of them is sufficient by itself. A complete workspace management strategy integrates all of these pieces. Contributors to open source projects split their time between the project and the rest of their lives. Some contributors disappear and new ones appear at random. The time it takes to bring a new contributor into the edit/compile/run cycle is a direct measure of the effectiveness of the workspace management strategy in place.

At a minimum, a project depends on a compiler toolchain. Most of the time, it also requires third-party tools and libraries. Configure scripts based on autoconf are usually used to analyze a local machine for prerequisites. Unfortunately, once a prerequisite is identified as missing, very little help is given on how to fulfill that dependency.

Most open source distributions use the notion of a package, together with tools called package managers, to resolve runtime dependencies. Package managers ultimately guarantee that a whole subsystem of projects will run together. Because they focus on runtime stability, the files installed on a local machine often fall out of date with the prerequisites of a project's head, leaving developers to fall back on compiling third-party dependencies from source and duplicating most of the effort that goes into building distribution packages.

When development requires changes in multiple projects, there is also no way to rebuild those projects in topological order of their dependencies. In most of these cases, custom shell scripts and/or recursive makes are put together as needed.
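
To make "topological order" concrete, here is a minimal sketch in Python (3.9 or later, for the standard graphlib module). It is not how dws is implemented; the project names and the shape of the graph are hypothetical and only illustrate the ordering a rebuild across several projects needs.

```python
from graphlib import TopologicalSorter

# Hypothetical workspace: each project maps to the projects it depends on.
prerequisites = {
    "app": {"libnet", "libutil"},
    "libnet": {"libutil"},
    "libutil": set(),
}


def build_order(graph):
    """Return projects ordered so every prerequisite precedes its dependents."""
    return list(TopologicalSorter(graph).static_order())


order = build_order(prerequisites)
print(order)
```

A circular dependency makes static_order() raise graphlib.CycleError, which is usually the desired behavior: such a set of projects has no valid build order.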

A workspace concept and a workspace manager tool (dws) have been put together to reduce the time each contributor spends dealing with dependency troubles.

A workspace relies on environment variables (srcTop, buildTop, etc.) and a project dependency graph.

The environment variables are set in a workspace configuration file (dws.mk) which is also interpreted as a makefile fragment. The workspace tool populates the configuration file as necessary at the time a variable value is required. The environment variables consist mostly of directory paths where source files are checked out (srcTop), where object files are built (buildTop), etc.
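
The "populate on demand" behavior can be sketched as follows. This is an illustration only: it assumes the configuration file holds plain NAME=VALUE lines (which make also accepts as variable assignments), and the helper names read_vars and get_var, as well as the example path, are hypothetical rather than part of dws.

```python
import tempfile
from pathlib import Path


def read_vars(path):
    """Parse simple NAME=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    if path.exists():
        for line in path.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                name, value = line.split("=", 1)
                values[name.strip()] = value.strip()
    return values


def get_var(path, name, ask):
    """Return a variable; ask for it and record it only when it is absent."""
    values = read_vars(path)
    if name not in values:
        values[name] = ask(name)
        with path.open("a") as config:
            config.write(f"{name}={values[name]}\n")
    return values[name]


# Demonstration in a throw-away directory (the srcTop value is made up).
with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "dws.mk"
    first = get_var(config_file, "srcTop", lambda name: "/home/dev/workspace/src")
    # The second lookup finds the recorded value and never calls `ask`.
    second = get_var(config_file, "srcTop", lambda name: "unused")
    recorded = config_file.read_text()
```

The value is written once, the first time it is needed, so later runs (and any makefile that includes the file) see the same setting without prompting again.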

The dependency graph contains information typically found in a package manager (yum, apt, etc.) as well as information usually found in configure scripts (autoconf). In the same way Linux distributions gather package dependencies into a global dependency database, project dependency information is aggregated in a single index file (dws collect). A master copy of the project dependency graph is stored on a remote server and cached on the local development machine.

The workspace tool extracts a subset of the dependency graph from a set of projects. Every project in the extracted subset that is neither present as a directory in srcTop nor found pre-installed on the local machine (through its generated executables, headers, libraries, etc.) is flagged as missing.
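
A rough sketch of that selection, under simplifying assumptions: the dependency graph is a plain dictionary, and the contents of srcTop and of the local system are given as sets of project names instead of being discovered on disk. All names below are hypothetical.

```python
def prerequisite_closure(graph, roots):
    """Collect the roots plus everything they transitively depend on."""
    seen, stack = set(), list(roots)
    while stack:
        project = stack.pop()
        if project not in seen:
            seen.add(project)
            stack.extend(graph.get(project, ()))
    return seen


def find_missing(graph, roots, checked_out, installed):
    """Flag projects that are neither checked out in srcTop nor pre-installed."""
    subset = prerequisite_closure(graph, roots)
    return sorted(p for p in subset if p not in checked_out and p not in installed)


# Hypothetical index: project name -> prerequisites.
graph = {"app": ["libnet", "zlib"], "libnet": ["openssl"], "docs": ["sphinx"]}

missing = find_missing(
    graph,
    roots=["app"],
    checked_out={"app", "libnet"},  # directories present in srcTop
    installed={"zlib"},             # found on the local system
)
print(missing)
```

Only projects reachable from the requested roots are considered, so "docs" and "sphinx" are never examined here.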

A project does not build as long as prerequisites are missing. Missing prerequisites are classified as either development or stable. A development prerequisite is based on a source repository, because a developer pulling down a prerequisite project as a source repository expects to run edit/build/run cycles on it. With a stable prerequisite, a developer expects headers and libraries to be present on the local system (i.e. in either /usr or /usr/local) in order to compile development projects, and nothing more. This difference is embodied in the dws build and dws install commands in conjunction with the repository, patch and package tags in the project index file.

  • Development

    The dws build command does a deep traversal of the dependency graph and re-makes every project that has a repository tag. Those projects are considered part of the development workspace and are checked out as source-controlled directories in srcTop.

  • Stable

    When the goal is to set up a local system with a set of applications and libraries rather than to develop them, the dws install command is used. It does a shallow traversal of the dependency graph, installing packages only as necessary to bring the local system in sync with the definitions in the index file.

    If the prerequisites for a project in the project index file cannot be found, a binary or source package will be installed to satisfy them. Ideally, all third-party prerequisites could be installed through the local package manager (apt-get, yum, etc.). In practice this is not the case, for a variety of reasons. For example:

    • The third-party prerequisite is not available in the official package repo
    • The third-party prerequisite in the official package repo is older than the required version
    • The third-party prerequisite in the official package repo was compiled with inadequate configuration options

    Under such circumstances, it is to a project's advantage to provide substitute packages in either binary or source form. These should be considered a temporary stop-gap while the official distribution catches up with upstream projects. dws will thus first try to install a prerequisite through the local package manager (apt-get, yum, etc.), then try to install it from custom-made packages, and finally politely indicate that the contributor must resort to a manual install.
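
    The fallback order described above can be sketched as an ordered list of installers tried one after the other. This is an illustration, not the dws code: the two lambdas merely stand in for a call to the local package manager and for a lookup in a set of custom-made packages, and every package name is made up.

```python
def install_prerequisite(name, installers):
    """Try each (label, installer) pair in order; return the label that worked."""
    for label, installer in installers:
        if installer(name):
            return label
    return None


# Hypothetical stand-ins for what each source is able to provide.
official_repo = {"zlib"}
custom_packages = {"libfoo"}

installers = [
    ("package manager", lambda name: name in official_repo),
    ("custom package", lambda name: name in custom_packages),
]


def report(name):
    """Describe how a prerequisite was satisfied, or ask for a manual install."""
    source = install_prerequisite(name, installers)
    if source is None:
        return f"{name}: please install manually"
    return f"{name}: installed from {source}"


for prerequisite in ("zlib", "libfoo", "libbar"):
    print(report(prerequisite))
```

    Keeping the sources in a list makes the priority explicit: the official distribution is always preferred, and substitute packages are only consulted when it falls short.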

  • Rebuilding the inter-project dependency database

    When a project's local index.xml file has been modified, a collect command needs to be run in order to integrate the changes into the inter-project dependency database used by everyone. The following commands rebuild the inter-project dependency databases currently available on the fortylines server.

    
    
    # 1. Excludes 'test[0-9]' such that the unit tests for the drop project itself 
    # do not end-up as first-class projects.
    # 2. Excludes 'machines' such that the scripts used to configure different 
    # machines in fortylines IT infrastructure do not end-up being executed
    # and repurpose a build machine into a release machine.
    dws --exclude 'test[0-9]' --exclude machines collect
    scp siteTop/db.xml fortylines.com:/var/fortylines
        
    

    As we tried to separate the public and private repositories and to create an efficient way to publish changes to the website, we ended up with three build index files related to fortylines code. The first one, fortylines.com:/var/www/fortylines.com/resources/dws.xml, is used to build all of the fortylines public projects. The second one, fortylines.com:/var/private/resources/private-fortylines.xml, is used to build both the public projects and the private projects supporting the business model (tests, etc.). The third one, fortylines.com:/var/private/resources/website-fortylines.xml, contains only the projects necessary to publish the fortylines public website. It is the only build index file intended to be executed specifically on the server machine. Unfortunately, due to permissions and a few other constraints, the publication of the website, for now, still requires running the following command.

    Distribution Packages

    Most operating systems distribute software in the form of packages. At a high level, each package defines prerequisite packages that need to be installed on the system before it can itself be used. The operating system uses a package manager to analyze the package dependency graph and determine how to update the local system based on an end-user selection.

    Software is in an ever-evolving state. In order to keep a local system up to date with the latest available features and fixes, software is tested, built, packaged and stored on remote update servers. Each package comes with a small specification file that primarily contains information about the package's prerequisites. The remote server aggregates all specification files into a small index file that contains the package dependency graph. On a local system update, the local package manager downloads the index file and, by analyzing the package dependency graph, decides which packages to download and install on the local system.
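
    As a rough illustration of that update step, the sketch below walks a downloaded index and lists what has to be fetched. It assumes, for simplicity, that versions are tuples of integers and that the index maps each package name to a (version, prerequisites) pair; real package managers use richer version schemes and constraints, and all package names here are invented.

```python
def plan_update(index, installed, selection):
    """List (name, version) pairs to download, prerequisites first."""
    plan, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        version, prerequisites = index[name]
        for prerequisite in prerequisites:
            visit(prerequisite)
        # A package that is absent compares as older than any version.
        if installed.get(name, ()) < version:
            plan.append((name, version))

    for name in selection:
        visit(name)
    return plan


# Hypothetical index as aggregated on the remote server.
index = {
    "editor": ((2, 1), ["libgui", "libc"]),
    "libgui": ((1, 4), ["libc"]),
    "libc": ((3, 0), []),
}
# Hypothetical state of the local system.
installed = {"libc": (3, 0), "libgui": (1, 2)}

plan = plan_update(index, installed, ["editor"])
print(plan)
```

    Here libc is already current and is skipped, libgui is upgraded, and editor is installed last because its prerequisites must be in place first.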

    Packaging-style Examples

                        Debian          Redhat             FreeBSD
    local tools         apt-*           yum                port
    package files       .deb            .rpm               .tar.bz2 + patches
    index database      Packages.bz2    SQLite database    Set of Portfiles
    building packages   debuild         rpmbuild           built on local

    The inter-project dependencies tool (dws) extends the ideas of distribution packages to projects and the development cycle. It is thus natural to build and install binary packages through dws while relying on the local system package manager as much as possible.

    Information specific to OSes used at fortylines

    Ubuntu

    Ubuntu is a derivative of the Debian family and thus uses most of the same tools and file formats for package distribution.

    The Debian Policy Manual is a must-read reference manual. "Compatibility levels and compatibility codes" is a good explanation of exactly what its title says.

    The apt-* utilities are the original suite of tools for installing packages on Ubuntu, even though aptitude is also very popular nowadays.

    Distribution packages come in the form of .deb files.

    The index database of package dependencies is a compressed text file called Packages.bz2.

    To build a package, Makefiles are added to the upstream source tree in a debian/ subdirectory and the command cd %name && debuild is invoked from the top of the source tree. "Rolling your own Debian package" and "Rolling your own Ubuntu package" are good introductions to building .deb files.

    Information about creating a Debian repository shows how to distribute those packages to a community.

    Fedora

    Fedora is a Redhat-sponsored distribution and uses most of the same tools and file formats for package distribution as Redhat Enterprise.

    yum is used for package management on Fedora and Redhat Enterprise.

    Packages come in the form of .rpm files.

    The index database of package dependencies is stored as a SQLite database under the repodata/ directory.

    Fedora control files, %{name}.spec, are standalone shell-like scripts used to build packages out of an upstream source archive with a command such as rpmbuild -bb --clean %name.spec. "RedHat Package Manager", "Building Fedora RPM Packages" and "Fedora Packaging guidelines" provide introductory documentation on building .rpm files.

    Creating a remote repository is explained at length in "RedHat Package Manager Tips" and "How to run your own yum repository". In modern Fedora distributions, yum-arch has been replaced by createrepo.

    OSX

    There is no official update tool for all software deployed on an OSX platform, even though packaging projects as disk images is a common practice. Because the internals of OSX (aka Darwin) derive from the BSD family, it is also common to find open source software distributed through the MacPorts project.

    Simple applications on OSX come as a .app that needs to be dragged into the /Applications folder, while complex applications come as a .pkg directory. Both typically come bundled as a .dmg disk image.

    The Apple Website and macports are the two main repositories to search for OSX packages.

    The documentation on packaging OSX software is sparse and most often presented as a sequence of clicks through the Xcode GUI, which is far from convenient for automation. See "Packaging and Distribution Software" by Apple, the packagemaker man pages and the dbldpkg.py python script, which is a good starting point for understanding the available command line tools.

    Windows

    There is no official update tool for all software deployed on a Windows platform. Cygwin has become a de facto manager for open source packages, and cyg-get.py is a very convenient script to update packages from the command line.

    Cyg-apt turned out to be quite old and unreliable with current cygwin repositories. Cyg-get.py uses optparse so the command line syntax is quite strange for a package manager tool but it worked very well.

    Most of the time, recent packages cannot be found in cygwin but are available in cygwinports. Since both sites use the same distribution model, cyg-get.py can be used to download from either one.

    All software compiled for Windows using cygwin needs to be delivered with cygwin1.dll. MinGW, on the other hand, uses native win32 calls directly.

    Distribution packages come in the form of .tar.bz2 files.

    The index database is a simply formatted text file called either setup.ini or setup-2.ini.

    The Cygwin Package Contributor's Guide describes at least three methods to package software for cygwin. A setup.hint file and a cygport %name all command will look very familiar to Fedora packagers.

    Solaris

    Information for Solaris package management is also available. The update tool is called pkg-get.

    apt-*

    Looking at Project-Builder.org might be a good step to find more detailed information. These perl scripts aim to automate packaging for different distributions.

    configure

    index xml file

    dws.xml contains package dependency file info and configure info. create links.

    Related tools

    Source/project configuration