Source code conventions

The quality of a software system relies fundamentally on dedicated and inventive contributors supported by appropriate procedures and tools. The purpose of conventions is to help developers use pre-existing general knowledge and contribute to projects they might be unfamiliar with.

As projects grow in size, it becomes cumbersome to enforce conventions manually. As a result, the only conventions that truly exist are the ones that can be checked and enforced by mechanical tools that run automatically as code is pushed into shared repositories, deployed to users, etc.

Text formatting

Unicode is a native standard on almost all systems dealing with text representations today, yet most source code remains firmly anchored in the ASCII subset. Semantically, whether in variable names or comments, source code is strongly rooted in the English language.

In a text editor, letter keys on the keyboard, when pressed, insert a readable representation of that character on the display and store its encoding in the associated file. Interpretation is straightforward. This is often not the case for the end-of-line or tab keys. Deciding what is best often ends up in religious wars, as do discussions about the 80-columns rule and CamelCase versus underscore.

End of line markers

Historically, two characters have been used to mark the end of a line: carriage return (commonly referred to as CR, '\r', 0x0D or 13) and line feed (commonly referred to as LF, '\n', 0x0A or 10).

System       Marker
Unix         LF
DOS          CR+LF
Older Mac    CR
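
As an illustration of how a mechanical tool could tell these conventions apart, here is a minimal sketch in Python. The function names are hypothetical and the sketch assumes the file content is read as raw bytes, so that no newline translation happens before counting.

```python
def count_line_endings(data: bytes) -> dict:
    """Count each end-of-line convention found in raw file content."""
    crlf = data.count(b"\r\n")
    # Subtract the CR+LF pairs so lone CR and lone LF are not counted twice.
    return {
        "CR+LF": crlf,
        "LF": data.count(b"\n") - crlf,
        "CR": data.count(b"\r") - crlf,
    }


def has_mixed_line_endings(data: bytes) -> bool:
    """Return True when more than one convention appears in the content."""
    counts = count_line_endings(data)
    return sum(1 for count in counts.values() if count > 0) > 1
```

A check like this would typically reject files for which has_mixed_line_endings returns True, whichever single convention the team has settled on.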

Spaces, tabs and indentation

The tab key is a powerful function key that does a lot of different things depending on the editor and context. The tab key can be used to jump from column to column in a spreadsheet text file (.csv). The tab key can also be used to indent the current line of a source code file to the appropriate scope level. Finally the tab key can be used, well, to insert a tab character in a file. That is when things get complicated. What should an editor do when it sees a tab character in a file? It could treat it as

  • a separator between spreadsheet columns

  • indentation of the following code statement on the same line

  • alignment on a modulo-N boundary when using fixed-size fonts

  • a short encoding for N space characters

As a result, most teams decide on a no-tab-characters-in-source-files policy (i.e. only spaces are allowed). Blindly enforcing that policy is not always advisable. For example, Makefiles rely on the tab character as a special marker for commands.

A slightly less restrictive, and cleverer, policy is to allow tab characters solely at the beginning of a line but nowhere after the first non-tab character on a line.
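
The leading-tabs-only policy is simple enough to check mechanically. The following is a minimal sketch in Python; the function name is hypothetical, and the sketch only reports offending line numbers rather than fixing anything.

```python
def misplaced_tabs(text: str) -> list:
    """Return the 1-based numbers of lines with a tab after the leading tabs."""
    offending = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Strip the run of tabs at the start; any tab left over breaks the rule.
        if "\t" in line.lstrip("\t"):
            offending.append(number)
    return offending
```

With this rule, a Makefile command line that starts with a tab passes, while a tab used to align a trailing comment is reported.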

Eighty columns rule

Historically, the eighty columns rule was meant to make printing easier. These days, with larger screens, URLs that regularly blow past eighty characters, and complex HTML layouts that require deep node hierarchies, that line width limit looks too restrictive.

Nonetheless, the rationale behind a hard limit on line width remains. Think about that book that is just an inch taller than the others in your library and does not fit on the shelf. How many times have you cursed the "clever" publisher?

The majority of the programming community has settled on an eighty columns limit. Unless you can reliably prove that changing it improves productivity by a factor of ten, stick with the eighty columns rule.
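
A width check is equally easy to automate. This is a minimal sketch in Python under two assumptions: lines of exactly eighty characters are accepted, and width is measured in characters as counted by len(), which is not the same as display columns for tabs or wide characters. The function name is hypothetical.

```python
def overlong_lines(text: str, limit: int = 80) -> list:
    """Return (line number, width) pairs for lines wider than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]
```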

CamelCase versus _underscore_

There does not seem to be any definitive proof of either CamelCase or _underscore_ being more readable than the other. It is only agreed that mixing both styles is distasteful.

In general, because complex products involve various programming languages, a cross-organization policy will run counter to at least one community and thus is not advisable.

The pragmatic approach is to follow the style conventions set by the standard library of each language. It also has the benefit of preparing your projects for acceptance by the wider community.

License Headers

The value of software is tied to Intellectual Property. The first few files created have a copyright statement. A year passes by; dates do not get updated. The company is renamed or sold. Old copyright statements remain.

New files are created. Developers are unsure which copyright statement to use.

The only way to solve those conflicts in a project is to make a clear choice by edict early on and ruthlessly enforce a zero-tolerance policy through mechanical tools.
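
A zero-tolerance check on license headers can be as small as the following Python sketch. The marker string, the number of lines inspected and the function name are all assumptions made for illustration; a real project would use the exact statement chosen by edict.

```python
def starts_with_license(text: str, marker: str = "Copyright (c)",
                        max_lines: int = 5) -> bool:
    """Check that `marker` appears within the first `max_lines` lines."""
    # Looking at a few lines rather than only the first one leaves room
    # for a shebang or an encoding declaration above the statement.
    head = text.splitlines()[:max_lines]
    return any(marker in line for line in head)
```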

On a last note, there are often coding rules that state something akin to: "Class names start with an uppercase letter; variable names start with a lowercase letter." These rules open a Pandora's box and need to be carefully debated before a team decides to implement them. Here is how it goes:

Your most junior programmer writes a simple script in less than a day to check the formatting characters rule, the column-width rule, and that identifiers match a specific regular expression. The script is deployed as a git-hook on the same day and catches 100% of text formatting issues (no false positives). You never hear about tabs, spaces or any other formatting issues anymore.

You cleverly think it would be great that constant names are all uppercase, while variable names start with a lowercase letter, class names start with an uppercase letter, etc. Now you have tied the language semantics to typography and somehow need to integrate a proper parser to get it right... It is not as simple anymore. You postpone stylistic and text formatting checking for later. Religious wars go on. You don't even have a tool that checks each file starts with a license statement...
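
To make the cost of the second scenario concrete, here is a minimal sketch in Python of a naming check that depends on language semantics. It relies on the standard ast module as the "proper parser" and only covers Python class and function names; the two regular expressions and the function name are assumptions for illustration. A line-based regular expression alone cannot tell whether an identifier names a class, a function or a variable, which is why the parser is needed.

```python
import ast
import re

# Hypothetical naming rules: CamelCase classes, lowercase_underscore functions.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
FUNCTION_NAME = re.compile(r"^[a-z_][a-z0-9_]*$")


def naming_violations(source: str) -> list:
    """Return sorted (line, name) pairs for names that break the rules."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            pattern = CLASS_NAME
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            pattern = FUNCTION_NAME
        else:
            continue
        if not pattern.match(node.name):
            violations.append((node.lineno, node.name))
    return sorted(violations)
```

Even this small sketch only works for one language, which illustrates why such rules deserve debate before a team commits to enforcing them.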

Coding guidelines

While it is possible to enforce all text formatting conventions mechanically, the coding guidelines involve an important part of human judgement and have been devised based on experience as well as Djaodjin's business model.

General

Open source makes it possible to build businesses faster and cheaper. In terms of hiring and marketing, the economic advantages of being involved with open source are also undeniable. Of course, putting disparate pieces of open source software together and keeping machines running is no small feat. The Djaodjin Team thus makes the rational choice to open source, by default, all infrastructure Django apps we write as stated in the Djaodjin organizational guidelines.

Fonts, images and other large binary content are not stored in the repository.

When a project has a weak package management system, or when the sources of third-party libraries need to be modified before being deployed (ex: CSS and JavaScript), it is often preferable to include them in the application source control repository. By convention, third-party sources are committed under a vendor/ subdirectory. It is thus possible for statistics tools running on the repository to exclude those third-party sources from the metrics reported.

Directory structure

README.md
  • General README file with information on how to get started.

Makefile
  • Makefile used to build and install a project's files on the local system.

htdocs/
  • Files which are not stored in the source repository, either because they are significantly large binary data files or they are necessary cached files that can be recreated from code in the repository.

doc/
  • General documentation for the project which cannot be extracted from the source files.

Convention for URLs

XXX

C++ coding guidelines

No "#if 0" nor "#if 1" without a blessed XXX (todo) comment.

No debugging messages through std::cerr statements in code committed to a shared repository.

Code must compile with no warnings when using the -Wall command line flag.

-Weffc++ is a helpful g++ compiler flag. It warns of style violations with regard to the coding practices outlined in Scott Meyers' book "Effective C++".

Repository directory structure

include/
  • Public header files (.hh, .tcc) that are installed in includeDir (ex. /usr/local/include).

src/
  • Source code in the form of C++ (.cc) files.

Python coding guidelines

Follow the PEP 8 conventions.

We distinguish three groups of imports in Python. First come all the imports related to the standard library. Second, separated by a blank line, follow the imports related to third-party libraries (the ones showing up in requirements.txt). Third, separated by another blank line, the imports related to modules in the same repo conclude the list.
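
The ordering of the three groups can be checked with a short script. The following is a minimal sketch in Python, assuming the caller supplies the set of third-party package names (for example parsed from requirements.txt) and the set of packages local to the repo; every other module is treated as standard library. All names are hypothetical, and the sketch checks the order of the groups only, not the blank lines between them.

```python
import ast

STDLIB, THIRD_PARTY, LOCAL = 0, 1, 2


def import_group(module: str, third_party: set, local: set) -> int:
    """Classify a module by its top-level package name."""
    top_level = module.split(".")[0]
    if top_level in local:
        return LOCAL
    if top_level in third_party:
        return THIRD_PARTY
    return STDLIB  # assumption: anything not listed is standard library


def imports_in_order(source: str, third_party: set, local: set) -> bool:
    """Return True when top-level imports never go back to an earlier group."""
    previous = STDLIB
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports (level > 0) always refer to the same repo.
            modules = ["." if node.level else node.module]
        else:
            continue
        for module in modules:
            group = LOCAL if module == "." else import_group(
                module, third_party, local)
            if group < previous:
                return False
            previous = group
    return True
```

For example, a file that imports os, then django.db, then a relative module passes when called with third_party={"django"}, while a file that imports os after django.db does not.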

XXX No print statements, use LOGGER instead. Don't use print kwargs but print "XXX kwargs" instead.
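
As a minimal sketch of what using LOGGER instead of print could look like, the example below relies on the standard logging module. The module-level LOGGER name follows the note above; the charge function and its messages are hypothetical.

```python
import logging

LOGGER = logging.getLogger(__name__)


def charge(amount: int) -> int:
    """Hypothetical function that reports progress through LOGGER."""
    # Values are passed as arguments so that the message is only
    # formatted when the record is actually emitted.
    LOGGER.debug("charging amount=%d", amount)
    if amount < 0:
        LOGGER.error("negative amount=%d", amount)
        return 0
    return amount
```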

No "if None:" without a blessed XXX (todo) comment.

Django app coding guidelines

A Django app is a generic piece of software that can be used across many projects. It is not intended to be a full-fledged application in and of itself. The guidelines for writing Django apps thus revolve around two goals:

  • Make it easy to re-use the app logic.

  • Make it capital intensive to reproduce finished products.

Write enough HTML in templates to validate the logic but no more. For example, do not use:

  • i18n in templates

  • class attributes (ex. class=""...) that are purely for design purposes

By convention, templates:

  • {% extends "base.html" %}

  • override the {% block content %} defined in base.html

Do not decorate templates with CSS classes.

Use {% static %} and not {% assets %} in templates.

Do not use login_required method decorators.

Directory structure

app/
  • Subdirectory that contains the code of the Django app. Unless indicated otherwise, all files in that subdirectory follow established Django conventions (models.py, urls.py, views.py, etc.).

compat.py
  • Deals with differences between versions of Django and other prerequisites. Presents a consistent API to the rest of the app code.

settings.py
  • Settings exported by the application. Modules of the app should import from the app settings instead of directly from the main django.conf.settings module.

testsite/
  • Subdirectory that contains a sample project to test the Django app.
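
One way the app-level settings.py described above could be organized is sketched below in Python. This is an assumption about the pattern, not the actual Djaodjin code: the MYAPP namespace, the setting names and the load_app_settings helper are hypothetical, and a SimpleNamespace stands in for django.conf.settings so that the sketch runs without Django installed.

```python
from types import SimpleNamespace

# Hypothetical defaults exported by the app.
DEFAULTS = {
    "PAGE_SIZE": 25,
    "ACCOUNT_MODEL": "auth.User",
}


def load_app_settings(project_settings, namespace: str, defaults: dict) -> dict:
    """Merge project overrides found under `namespace` over the app defaults."""
    merged = dict(defaults)
    merged.update(getattr(project_settings, namespace, {}))
    return merged


# Stand-in for django.conf.settings; a real app would pass that object.
project = SimpleNamespace(MYAPP={"PAGE_SIZE": 50})
APP_SETTINGS = load_app_settings(project, "MYAPP", DEFAULTS)
```

The rest of the app then reads APP_SETTINGS from its own settings module, so defaults live in one place and a project only overrides what it needs.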

Django projects

XXX
  • rule: use block.super, css/js always in assets.
  • describe: copyright notice on top of files, order of imports.
  • write about how to organize tests (sunny, etc.)
  • don't change top level content block {% block content %}

A Django project holds the code for a web product. It is typically deployed within a gunicorn application container behind an nginx front-end. Project repositories are not open sourced.

Use djaodjin deploy utils

Use django assets

Use crispy_forms

Modify bootstrap less files, do not use text-danger for red.

Use urldecorators security in toplevel urls.py

Directory structure

project/
  • Subdirectory that contains the code of the Django project. Unless indicated otherwise, all files in that subdirectory follow established Django conventions (models.py, urls.py, views.py, etc.). This repository name should match the name of the project (for deployment).

.gitignore

    *.pyc
    *~
    *.xcodeproj
    .DS_Store
    /htdocs/static/img
    htdocs/static/.webassets-cache/
    htdocs/static/cache/
    htdocs/static/fonts/
    .timestamp

htdocs/
  • All files stored in htdocs/ will be served directly by the nginx front-end in production.

htdocs/static/
  • Files in htdocs/static/ are non-minified versions of css and javascript files served by the django runserver during development.

etc/credentials
  • Definition of settings related to credentials

etc/site.conf
  • Definition of settings related to the machine serving the webapp

etc/gunicorn.conf
  • Definition of settings related to gunicorn

vendor/bootstrap/less/
  • Less files specific to the project

Unit tests

The purpose of unit tests is first to ensure that functionality is implemented correctly. Later in the life cycle of a project, unit tests are often relabeled regression tests and ensure that changes to the source code did not break existing functionality.

Unit testing has benefits beyond functional correctness. Unit tests help make better use of the limited time and resources at integration. We can also use unit tests to support refactoring, especially in the context of dynamic languages.

Functional and regression tests

The terms functional test and regression test are very often used interchangeably because a functional test can serve as the base for a regression test as much as a regression test can be used to validate functionality.

For our purpose here, the major distinction between a functional and a regression test is how they were originally created. The developer of a functional test has explicitly thought about the passing and failing conditions and written explicit statements for them in the source code that is checked into the repository. The developer of a regression test typically used the build framework to trigger comparison of different runs of the same program.

As such, functional tests usually result in source code while regression tests result in makefile rules. Functional tests also typically result in self-contained executables while regression tests rely on external data sets usually stored outside the version controlled repository.

functional                  regression
explicit pass conditions    output comparison
source code                 makefiles
self-contained              rely on external data sets
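
The contrast summarized in the table can be illustrated with a minimal Python sketch. The word_count function under test is hypothetical, and the regression side is shown as a plain function for brevity even though the text above describes it as typically driven by makefile rules and external reference data.

```python
import difflib


def word_count(text: str) -> int:
    """Hypothetical function under test."""
    return len(text.split())


def functional_test() -> bool:
    """Functional style: pass conditions are written explicitly in the code."""
    return word_count("") == 0 and word_count("two words") == 2


def regression_test(current_output: str, reference_output: str) -> list:
    """Regression style: compare a run against the output of a previous run."""
    return list(difflib.unified_diff(
        reference_output.splitlines(), current_output.splitlines(),
        fromfile="reference", tofile="current", lineterm=""))
```

An empty list from regression_test means the behavior did not change between runs; it says nothing about whether the reference output was correct in the first place, which is exactly what the functional test states explicitly.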

Unit tests will typically start breaking and become deprecated as a project moves forward. As a result, relevant unit tests are the ones that can be run by an automated tool as part of the build process.

Regardless of the original specification of a unit test, the web presentation engine provides both a test functional view and a test regression view. Functional views are useful in the release process to validate that tests pass on all systems and configurations. A failing test in the context of a functional view usually indicates an unanticipated error that must be further investigated. Regression views tie a change in behavior to a specific source revision and/or system. Regression views will thus be most useful to contributors investigating why a previously passing test started to fail.

Projects and tests source repositories

There are usually two approaches to storing the tests associated with a project. Either they form a sub-directory in the project hierarchy and are managed as part of the same physical repository, or they form their own directory hierarchy and are managed in a logically separate repository. Choosing either approach depends on the distribution model.

In a typical two-party model made of a developer and a user, the developer provides a source package to the user; the user compiles the code provided, checks its stability and installs it on the system.

In a typical three-party model made of a developer, a distributor and a user, the developer provides a source package to the distributor; the distributor compiles the code provided, checks its stability and provides a binary package to the user; the user then updates the system and installs the binary package as a side effect.

    # Two-party model

    # developer
    make dist-src

    # user
    tar zxvf project.tar.gz
    cd project
    ./configure
    make
    make check
    make install

    # Three-party model

    # developer
    make dist-src

    # distributor
    cd project-test
    tar zxvf project.tar.gz
    make
    make dist

    # user (ex. Ubuntu)
    apt-get install project

In a two-party model, including tests as a sub-directory in the project repository seems more appropriate. It makes life easier for the developer, who deals with a single repository and distributes tests along with the project in a single package. The end-user can also rely on a single source package and a straightforward procedure to deploy a project on the system.

In a three-party model, the dual repositories approach for project and tests is better suited. The test repository can be relatively large compared to the project repository itself and of little use to anyone but the distributor. Furthermore, the responsibility for writing and running tests has been offloaded to the distributor, so it is usually more practical for the distributor to maintain its own tests repository, separate from the developer's project repository.

Fortylines uses the dual repositories approach. The tests for a project form their own repository in a separate directory hierarchy. The organization of a test project, though, follows the organization of a regular project directory consisting of Makefile, src/, data/, etc. Thus the difference between project and tests is purely one of distribution policy, and the workspace management tools are agnostic of that difference.