Posts for webapp

Continuous Integration for a Javascript-heavy Django Site

by Sebastien Mirolo on Sat, 25 May 2013

In the popular series on how technical decisions are made, we will see today why fortylines picked django-jenkins, phantomjs, casperjs and django-casper to build its continuous integration infrastructure.

Continuous integration for Django projects

Python is a great language. It makes it quick to get working code into production. Unfortunately, as with many dynamic programming languages, issues arise when you start to add features and rework the code base. With a statically type-checked language like C++, the compiler would have caught stupid mistakes such as a variable name that wasn't updated in all the places it is used, or a function parameter that used to be an integer and is now a list of integers.

Welcome to the world of extensive testing. If you are looking to deploy code changes at the pace required to run a webapp, continuous integration rapidly becomes an important aspect of the development cycle.

django-jenkins is a perfect start if, like us, you are running django and jenkins. The only hiccup so far is the absence of support for behave. That is unfortunate because we chose behave over lettuce for various reasons.

Since behave accepts a --junit command line flag, it is nonetheless possible to integrate behave and jenkins directly, by running behave as a subsequent command.

$ python manage.py jenkins
$ behave --junit --junit-directory ../reports
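
Jenkins then reads the JUnit XML reports produced by both commands. As a rough sketch of what such a report carries, the following script totals the counters found in one. The function name summarize_junit and the sample XML are illustrative assumptions, not output captured from behave or django-jenkins.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text):
    """Total the tests, failures and errors counters of a JUnit XML report."""
    root = ET.fromstring(xml_text)
    # A report is either a single <testsuite> or a <testsuites> wrapper.
    suites = [root] if root.tag == "testsuite" else root.iter("testsuite")
    totals = {"tests": 0, "failures": 0, "errors": 0}
    for suite in suites:
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    return totals


# Hand-written sample, only meant to show the shape of a report.
SAMPLE_REPORT = """<testsuite name="checkout" tests="3" failures="1" errors="0">
  <testcase classname="checkout" name="open url"/>
  <testcase classname="checkout" name="fill form"/>
  <testcase classname="checkout" name="click button">
    <failure message="button not found"/>
  </testcase>
</testsuite>"""

print(summarize_junit(SAMPLE_REPORT))
```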

SIDE NOTE: There is a django-behave project. Unfortunately, using it removes the ability to run standard django test suites or, more accurately from reading comments in the code, mixing django-behave with tests based on unittest hasn't been implemented yet. There has been no update to the repository in eight months as of writing this post.

Javascript content

After starting out with Amazon Payment System, then integrating with Paypal, we finally settled on stripe.

Stripe is great from a developer's perspective. The APIs and documentation are excellent. The one feature that is advertised as a benefit to developers is also the feature that threw off our django/behave/jenkins initiative: Javascript.

Until now we used mechanize to simulate a browser and check the generated HTML. With the introduction of Javascript in our stack, it was no longer possible to rely on mechanize alone. We needed to integrate a javascript engine in our test infrastructure.

The first intuition was to use python bindings to selenium, a browser automation framework.

$ pip install selenium --upgrade
# Download the webdriver for your browser
$ unzip ~/Download/chromedriver_mac_26.0.1383.0.zip
$ mv chromedriver /usr/local/bin
$ export PATH=$PATH:"/Applications/Google Chrome.app/Contents/MacOS"

Selenium is quite heavyweight. It needs to launch the browser executable, which makes GUI windows pop up on your screen. You might be able to install all the packages required to run an X11 server with no display attached on your continuous integration virtual machine, but that seems like overkill and a potential rat hole of package dependencies.

Welcome to phantomjs, the headless webkit engine.

Browsing around for phantomjs and BDD, it wasn't long before I stumbled on jasmine and djangojs. Jasmine is wonderful for unit testing javascript code, and djangojs helps integrate a javascript-heavy web UI into a django site. Both projects deliver as promised, and that is where the subtlety lies: neither is meant for what we were after. We needed something to drive end-to-end system tests, something that would help write tests at the level of "open url", "fill form", "click button", "check text in html page", etc.

We thus abandoned our first attempt at using phantomjs with jasmine and djangojs and started to look again for a better suited solution. That is how, a few searches later, we ended up on casperjs and django-casper. By itself casperjs generates junit xml output. You can thus use casperjs straight from the command line in your jenkins shell.

$ cat hellotest.js
casper.test.comment('this is an hello test');
casper.test.assertTrue(true, "YEP");
casper.test.done();
$ casperjs test --xunit=./test-result.xml hellotest.js

Once integrated into django through the django-casper wrapper, your tests look and behave like regular django tests. Hence they integrate perfectly with manage.py test and django-jenkins. Excellent!
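
To give an idea of how a casperjs script can be made to look like a regular Python test, here is a minimal sketch built on unittest and subprocess only. It is not django-casper's actual API: casper_command and HelloCasperTest are hypothetical names, and the sketch assumes casperjs exits with a non-zero status when a test fails.

```python
import shutil
import subprocess
import unittest


def casper_command(script, xunit_path=None, executable="casperjs"):
    """Build the argument list for `casperjs test`, as typed on the shell."""
    command = [executable, "test"]
    if xunit_path:
        command.append("--xunit=%s" % xunit_path)
    command.append(script)
    return command


class HelloCasperTest(unittest.TestCase):

    @unittest.skipUnless(shutil.which("casperjs"), "casperjs is not installed")
    def test_hello(self):
        # Assumption: a failed casperjs assertion yields a non-zero status.
        status = subprocess.call(casper_command("hellotest.js"))
        self.assertEqual(status, 0)
```

Because the class derives from unittest.TestCase, any runner that collects unittest cases would pick it up alongside the other tests.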

How we picked d3js to draw SaaS metrics

by Sebastien Mirolo on Fri, 1 Mar 2013

Only a few webapps can do without displaying nice looking charts, even more so when you are running a Software-as-a-Service (SaaS) website. If you believe we are living in a knowledge economy, as I previously described in Open source business models, this means we must search for, and are bound to find, ready-made solutions.

This post started as the hunt for an open source solution to draw nice looking charts within the fortylines django webapp but, after much googling and experimenting, it was better rewritten as an insight into how technical decisions are made. I hope you find the journey interesting.

First and foremost, fortylines' business model requires that its entire SaaS solution can be deployed on an air-gapped network. Most of fortylines' bigger clients prefer to pay the extra cost and retain physical control of the cluster machines. This is an important requirement that ruled out many of the Google Chart API wrappers out there.

For consistency and to avoid many headaches, we also favor projects with BSD-like licenses and written in Python or Javascript (the two languages we picked for server-side and client-side code respectively). These were the guidelines when we started the search. Besides picking a specific open source project to build on, two open questions had to be decided:

  • Should we do the rendering server-side or client-side?
  • Which format should the graphics be rendered as (PNG, SVG, Canvas)?

The server-side way

First, if we did all the rendering server-side, it would be a lot easier to serve charts through different media. Not only could we put the charts inside a web page, we could also embed them in a pdf, an email, etc.

Since charts belong to the graphics cluster more than the photography cluster as far as image processing is concerned, it made sense to focus on producing a vector format (SVG) over a pixel format (PNG).

Ideally we are looking for python code that would transform a data model into a nice looking SVG file that we can later send to a web browser. Of course, browser SVG support being what it is, it is conceivable that in practice we have to resort to sending PNG images in the end.

All python solutions seem to either rely on the Python Imaging Library (PIL) or PyCairo, both of which are mostly bindings to a native C implementation.

  • PIL: django charts
  • Cairo: pycha (tutorial), BeautifulCharts, cairoplot

Both pycha and BeautifulCharts are available through pip. A pip search for charts also shows svg.charts, an MIT-style licensed package which looks promising, though I couldn't figure out which prerequisites it uses for drawing the charts.

Since our search did not turn up any pure-python solution, it is not far-fetched to look for chart applications that can be invoked from a command-line shell. We serialize the python data model, then make an os.system call. If the quality of the charts is a lot better than with the C/Python implementations, that might be worth it, and it won't introduce more Python-to-native dependencies than we would otherwise have. Suddenly something like ChartSVG, a collection of XSLT scripts that creates SVG charts from an XML file, could fit the bill.
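
As a sketch of that serialize-then-shell-out approach, the following code writes a data series as XML and hands it to an XSLT processor. The element names (chart, point) are made up for illustration and are not ChartSVG's input format. Using xsltproc as the processor is likewise an assumption, and subprocess stands in for os.system.

```python
import subprocess
import xml.etree.ElementTree as ET


def series_to_xml(title, points):
    """Serialize (label, value) pairs into a small XML document."""
    root = ET.Element("chart", {"title": title})
    for label, value in points:
        ET.SubElement(root, "point", {"label": label, "value": str(value)})
    return ET.tostring(root, encoding="unicode")


def render_svg(xml_path, stylesheet_path, svg_path):
    """Run the XSLT processor on the serialized data (xsltproc assumed)."""
    subprocess.check_call(
        ["xsltproc", "-o", svg_path, stylesheet_path, xml_path])


print(series_to_xml("Signups", [("Jan", 10), ("Feb", 12)]))
```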

The client-side way

With the Google Chart API out of the equation, we were looking for full javascript libraries here. There is a surprisingly large pool of fully featured chart libraries written in javascript, though most of them come with a commercial license and various restrictions on how you can use them for free.

amCharts and HighCharts have both been packaged with fanstatic, a python framework to manage javascript dependencies, if that matters at some point. FusionCharts also looks really good.

d3js is not technically a chart library but it appears in many related searches. D3js deals with the much broader scope of data visualization (see here for pointers). Making charts using d3js can be quite complex, but a gallery of examples exists and d3js is released under a BSD license.

The choice

The visual quality of the charts produced by client-side javascript libraries appears to be a lot better than that of their server-side python counterparts. If we remain bent on generating the charts server-side because we care about caching, support for older browsers or simply using the same code to output monthly report PDFs, we will have to think about introducing nodejs in our back-end stack. Visual quality matters.

Fortylines builds a trace visualization tool not unlike GTKwave, though it runs in a web browser and supports the iPad touch interface. It is only the beginning, as richer and more interactive trace analysis tools will make their way into the web product. So sooner or later, we are bound to introduce an interactive data visualization library in our stack.

If we need a data visualization library at some point and all the best charting libraries come with restrictions, we might as well pick d3js. A side advantage is that we add a single dependency and only need to learn one API.

That is how we picked d3js, an unlikely candidate, to draw charts for fortylines SaaS webapp. Later we found a chart library based on d3js - just amazing.

Nginx, Gunicorn and Django

by Sebastien Mirolo on Fri, 22 Jun 2012

I decided today to bring up a new web stack consisting of nginx, gunicorn and django on a fedora 17 system. We are also throwing django-registration into the mix since the service requires authentication.

First things first, we need to install the packages on the local system.

$ yum install nginx python-gunicorn Django django-registration

We are developing a webapp written in an interpreted language (python) so a straightforward rsync should deploy the code to production, otherwise it weakens the rationale of using python for the job. Though production will run nginx, gunicorn and django, we still want to be able to debug the code on development machines with a simple manage.py runserver command. Hence thinking about file paths in advance is quite important. The following setup supports a flexible dev/prod approach.

*siteTop*/app                 # django project python code
*siteTop*/htdocs              # root for static html pages served by nginx
*siteTop*/htdocs/static       # root for static files served by nginx and django

The nginx configuration is simple and straightforward. Nginx redirects all pages to https and serves static content from htdocs.

upstream proxy_*domain* {
        server  127.0.0.1:8000;
}

server {
        listen          80;
        server_name     *domain*;

        location / {
                rewrite ^/(.*)$ https://$http_host/$1 redirect;
        }
}

server {
        listen       443;
        server_name  *domain*;

        client_max_body_size 4G;
        keepalive_timeout 5;

        ssl                  on;
        ssl_certificate      /etc/ssl/certs/*domain*.pem;
        ssl_certificate_key  /etc/ssl/private/*domain*.key;

        ssl_session_timeout  5m;

        ssl_protocols  SSLv3 TLSv1;
        ssl_ciphers  ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv3:+EXP;
        ssl_prefer_server_ciphers   on;

        # path for static files
        root /var/www/*domain*/htdocs;

        location / {
            # checks for static file, if not found proxy to app
            try_files $uri @forward_to_app;
        }

        location @forward_to_app {
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $http_host;
            # proxy_redirect default;
            proxy_redirect off;

            proxy_pass      http://proxy_*domain*;
        }

        # the error_page target must match the location declared below
        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
            root /var/www/*domain*/htdocs;
        }
}

The django settings.py is also straightforward. The only interesting bits are figuring out the APP_ROOT and paths to static files.

$ diff -u prev settings.py
+import os.path
+APP_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

STATICFILES_DIRS = (
+    APP_ROOT + '/htdocs/static',
)
TEMPLATE_DIRS = (
+    APP_ROOT + '/app/templates',
)
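
As a small sketch of how those paths resolve, the helper below derives the same directories from the location of settings.py. The name site_paths and the example.com path are hypothetical. Note the trailing commas, which keep the one-element settings as tuples rather than plain strings.

```python
import os.path


def site_paths(settings_file):
    """Derive siteTop and the static/template directories from settings.py."""
    # settings.py lives in siteTop/app, so siteTop is two levels up.
    app_root = os.path.dirname(os.path.dirname(os.path.abspath(settings_file)))
    return {
        "APP_ROOT": app_root,
        "STATICFILES_DIRS": (os.path.join(app_root, "htdocs", "static"),),
        "TEMPLATE_DIRS": (os.path.join(app_root, "app", "templates"),),
    }


paths = site_paths("/var/www/example.com/app/settings.py")
print(paths["APP_ROOT"])
print(paths["STATICFILES_DIRS"])
```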

Getting gunicorn hooked-up with django and behaving as expected was a lot more challenging.

First I tried to run gunicorn_django. I might have misread the documentation, but I tried to pass the directory where settings.py is located on the command line. Running in daemon mode, I saw both the gunicorn master and worker up through ps and no error in the log files, and yet I couldn't fetch a page. Only when I finally decided to run in non-daemon mode did it become obvious that gunicorn was running in an infinite loop.

Error: Can't find 'app' in your PYTHONPATH.
Error: Can't find 'app' in your PYTHONPATH.
...

Everything started to look fine when I passed the actual settings.py file on the command line; well, at least on Fedora 17. When I decided to run the same command on OSX, I got:

Could not import settings 'app/settings.py' (Is it on sys.path?)

That is a weird error, especially since ls shows the file is there and definitely in the PYTHONPATH. Digging through the django code, I found that an ImportError was caught and rewritten as this error message in django/conf/__init__.py. As it turns out, my OSX python complains that importlib cannot import a module by filename.
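
The difference between a file name and a module name can be reproduced with the standard library alone. In this sketch, module_name_from_path and can_import are hypothetical helpers, and os.path stands in for the settings module so the example runs anywhere.

```python
import importlib
import os.path


def module_name_from_path(path):
    """Turn a relative file path like app/settings.py into app.settings."""
    stem, _ = os.path.splitext(path)
    return stem.replace(os.sep, ".").replace("/", ".")


def can_import(name):
    """Report whether importlib accepts the given name."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


print(module_name_from_path("app/settings.py"))  # app.settings
print(can_import("os.path"))                     # True
print(can_import("os/path.py"))                  # False
```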

I thus decided to use the second way of running gunicorn I saw advertised, through manage.py.

$ pip install gunicorn
$ diff -u prev settings.py
INSTALLED_APPS = (
+    'gunicorn',
     ...
)
$ python ./manage.py run_gunicorn

That worked fine; still I couldn't seem to change the gunicorn process name despite all attempts. As it turns out, there was no error and no warning, just a silent failure because setproctitle wasn't installed on my system.

$ yum install python-setproctitle
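
The settings discussed above can also be collected in a gunicorn configuration file, which is plain Python. This is a sketch under assumptions: the file name, the worker count and the process name are made up, and it presumes a gunicorn version that loads such a file through its -c option.

```python
# gunicorn.conf.py (hypothetical file name), loaded with `-c gunicorn.conf.py`
bind = "127.0.0.1:8000"   # must match the server line of the nginx upstream
workers = 2               # arbitrary value for this sketch
proc_name = "webapp"      # only takes effect when setproctitle is installed
daemon = False            # stay in the foreground so errors are visible
```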

From that point on we could run the webapp, both in prod through nginx, gunicorn and django and directly through manage.py runserver in development.

Django and PayPal payment processing

by Sebastien Mirolo on Thu, 3 May 2012

I gave the paypal API another shot today. Since I am most interested in encrypted web payments, after signing up with a business account, I went through the steps of generating a private key and corresponding public certificate.

$ openssl genrsa -out hostname-paypal-priv.pem 1024
$ openssl req -new -key hostname-paypal-priv.pem -x509 \
  		  -days 365 -out hostname-paypal-pubcert.pem

# Useful command to check certificate expiration
$ openssl x509 -in hostname-paypal-pubcert.pem -noout -enddate

I then went to my paypal account and followed > Merchant Services > My Account > Profile > My Selling Tools > Encrypted Payment Settings > Add to upload hostname-paypal-pubcert.pem. Write down the CERT_ID; you will need it later on to create payment buttons.

The paypal API comes in many flavors but so far it is only important to understand PayPal Payments Standard vs. PayPal Payments Pro. When you use the first one, PayPal Payments Standard, a visitor will be redirected from your website to paypal's website for payment processing. When you use the second one, PayPal Payments Pro, a visitor can enter payment information directly through your site.

Paypal provides a sandbox to be used to develop and debug your code. Unfortunately the sandbox is quite broken. Some critical links, like "Merchant Services", branch out of the sandbox into the live paypal site. That makes it impossible to upload certificates in the sandbox and thus test your code there.

django-paypal

After uploading a certificate, I searched through django packages for an integrated payment solution. django-merchant supports multiple payment processors including paypal. The django-merchant paypal setup documentation deals with PayPal Payments Pro. I am not quite sure django-merchant supports PayPal Payments Standard. Either way, since it is mostly a wrapper around django-paypal as far as paypal support is concerned, I started there and configured django-paypal itself first.

In the source code of django-paypal, there is a reference to a post on using paypal with django, which relies on the m2crypto library for encryption.

# install prerequisites
$ apt-get install python-virtualenv python-m2crypto
$ virtualenv ~/payment
$ source ~/payment/bin/activate
$ pip install Django django-registration django-paypal django-merchant

# create a django example project 
$ django-admin startproject example
$ django-admin startapp charge
$ diff -u prev settings.py
INSTALLED_APPS = (
   ...
+    'paypal.standard.ipn',
)

+# django-paypal
+PAYPAL_NOTIFY_URL = "URL_ROOT/charge/difficult_to_guess"
+PAYPAL_RETURN_URL = "URL_ROOT/charge/return/"
+PAYPAL_CANCEL_URL = "URL_ROOT/charge/cancel/"
+# These are credentials and should be protected accordingly.
+PAYPAL_RECEIVER_EMAIL = ...
+PAYPAL_PRIVATE_CERT = ...
+PAYPAL_PUBLIC_CERT = ...
+# path to Paypal's own certificate
+PAYPAL_CERT = ...
+# code which Paypal assigns to the certificate when you upload it
+PAYPAL_CERT_ID = ...

$ diff -u prev urls.py
urlpatterns = patterns('',
+    # The order of these patterns matters: the catch-all charge/ entry must come last.
+    (r'^charge/difficult_to_guess/',
+     include('paypal.standard.ipn.urls')),
+    (r'^charge/cancel/', 'charge.views.payment_cancel'),
+    (r'^charge/return/', 'charge.views.payment_return'),
+    (r'^charge/', 'charge.views.paypal_charge'),

$ python manage.py syncdb

# Running the http server
$ python manage.py runserver
$ wget http://127.0.0.1:8000/charge/
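
The order of the url patterns matters because the first matching pattern wins, and the last entry matches anything under charge/. The sketch below mimics that first-match lookup with the re module only. It is an illustration, not django's resolver, and the view names are shortened.

```python
import re

# (pattern, view name) pairs in the same order as the urls.py above.
URLPATTERNS = [
    (r"^charge/difficult_to_guess/", "ipn"),
    (r"^charge/cancel/", "payment_cancel"),
    (r"^charge/return/", "payment_return"),
    (r"^charge/", "paypal_charge"),
]


def resolve(path, patterns=URLPATTERNS):
    """Return the view name of the first pattern matching the path."""
    for pattern, view in patterns:
        if re.search(pattern, path):
            return view
    return None


print(resolve("charge/cancel/"))                     # payment_cancel
# With the catch-all first, it shadows every other entry.
print(resolve("charge/cancel/", URLPATTERNS[::-1]))  # paypal_charge
```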

Testing IPNs

For each payment processing request, paypal asynchronously calls your web server with the status of that request. That is the second part of the payment pipeline that needs to be tested before going live.

I decided to give the Paypal Sandbox a second chance for IPN testing. I went through > Test Account > Create a pre-configured account > Business.

"> Test Tools > Instant Payment Notification (IPN) simulator" looked like a promising candidate, so I went ahead and entered my site's url for the ipn handler, selected "Express Checkout", left all default values and clicked "Send IPN". The result:

IPN delivery failed. Unable to connect to the specified URL. Please verify the URL and try again.

As it turns out, paypal will not connect to your web server over a plain text connection, and the error message does not make that obvious. I proxied the django test server through Apache to support https connections.

$ cd /etc/apache2/mods-enabled
$ ln -s ../mods-available/proxy.load
$ ln -s ../mods-available/proxy_http.load
$ ln -s ../mods-available/proxy.conf
$ diff -u prev proxy.conf
- 	   ProxyRequests Off
+      ProxyRequests On

        <Proxy *>
                AddDefaultCharset off
                Order deny,allow
                Deny from all
+               Allow from 127.0.0.1
        </Proxy>

+       ProxyVia On

$ diff -u prev ../sites-available/default-ssl
+       ProxyPass /charge/ http://127.0.0.1:8000/charge/
+       ProxyPassReverse /charge/ http://127.0.0.1:8000/charge/

+<Location /charge/>
+  Order allow,deny
+  Allow from all
+</Location>

That worked and I could see the paypal request in my apache and django logs. Though now I hit the following error:

IPN delivery failed. HTTP error code 403: Forbidden

This is a classic django error related to the csrf middleware; a little bit of csrf_exempt magic does the trick.

$ diff -u prev /usr/lib/python/site-packages/paypal/standard/ipn/views.py
+from django.views.decorators.csrf import csrf_exempt

+@csrf_exempt 
@require_POST
def ipn(request, item_check_callable=None): 

The IPN simulator is now showing a success.

Further notes

At some point django returned an HTTP 500 code without any log showing up. That happened because an import statement failed. What took me the longest was figuring out how to display the cause of the error. I finally did it like this.

$ diff -u prev settings.py
LOGGING = {
    'handlers': {
+        'logfile':{
+        'level':'DEBUG',
+        'class':'logging.handlers.WatchedFileHandler',
+        'filename': '/var/log/django.log',
+        'formatter': 'simple'
+        },
    'loggers': {
        'django.request': {
    ...
        },
+       # Might as well log any errors anywhere else in Django
+       'django': {
+           'handlers': ['logfile'],
+           'level': 'ERROR',
+           'propagate': False,
+       },
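
Since the diff above only shows fragments, here is a self-contained sketch of a complete dictionary in the same spirit. The build_logging helper and the format string of the simple formatter are assumptions. The handler and logger entries mirror the ones from the diff.

```python
import logging
import logging.config


def build_logging(logfile):
    """Return a dictConfig-style LOGGING setting that writes to logfile."""
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            # The format string is an assumption made for this sketch.
            "simple": {"format": "%(levelname)s %(asctime)s %(message)s"},
        },
        "handlers": {
            "logfile": {
                "level": "DEBUG",
                "class": "logging.handlers.WatchedFileHandler",
                "filename": logfile,
                "formatter": "simple",
            },
        },
        "loggers": {
            # Log errors raised anywhere in django to the file.
            "django": {
                "handlers": ["logfile"],
                "level": "ERROR",
                "propagate": False,
            },
        },
    }


# In settings.py this would read: LOGGING = build_logging('/var/log/django.log')
```

Outside of django, the same dictionary can be applied with logging.config.dictConfig to check that it is well-formed.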

I was interested to find out how django-paypal verifies that the IPN is actually coming from paypal. Looking through the source code, I traced the answer from paypal/standard/models.py:verify to paypal/standard/ipn/models.py:_postback. django-paypal posts the IPN back to paypal and checks the return code. Wow! I had better trust the DNS server I am using.

django-paypal uses django signals to trigger other code that should run on an IPN notification. It can be set up as follows:
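
The postback idea can be sketched with the standard library. This follows the IPN protocol as I understand it from paypal's documentation (send the notification back prefixed with cmd=_notify-validate, then expect VERIFIED or INVALID in the answer). It is not a copy of django-paypal's code: the function names are hypothetical and the endpoint is left as a parameter.

```python
import urllib.request


def postback_body(raw_ipn_body):
    """Prefix the received IPN body with the validation command."""
    return b"cmd=_notify-validate&" + raw_ipn_body


def is_verified(response_text):
    """The postback is answered with either VERIFIED or INVALID."""
    return response_text.strip() == "VERIFIED"


def verify_ipn(raw_ipn_body, endpoint):
    """Post the notification back to endpoint and check the answer."""
    request = urllib.request.Request(endpoint, data=postback_body(raw_ipn_body))
    with urllib.request.urlopen(request) as response:
        return is_verified(response.read().decode("ascii", "replace"))
```

The check is only as trustworthy as the name resolution and TLS verification behind the endpoint, which is the point of the DNS remark above.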

$ diff -u prev charge/models.py
+import logging
+from paypal.standard.ipn.signals import payment_was_successful

+def paypal_payment_was_successful(sender, **kwargs):
+    logging.error("!!! payment_was_successful for invoice %s", sender.invoice)
+payment_was_successful.connect(paypal_payment_was_successful)

With such code, models.py needs to be imported and executed before an IPN notification is triggered, otherwise the signal handler is never set. That's usually not a problem when you trigger the payment pipeline urls in order (charge, ipn). It is something to be aware of, though, when starting django and directly running the paypal IPN simulator: the signal handlers won't be connected and thus won't be triggered. Because of the csrf_exempt patch and the signal setup issue, it might be better to add a wrapper around paypal.standard.ipn.views.ipn inside the charge django app.

Some interesting documentation comes from Record Keeping with Pass-through Variables. You should note that the following variables are passed through paypal back to your website:
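
The ordering problem is easier to see with a toy model. The Signal class below is a bare-bones stand-in written for this illustration. It is not django.dispatch.Signal, and on_payment is a hypothetical handler.

```python
class Signal(object):
    """Bare-bones stand-in for a django signal (illustration only)."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def send(self, sender):
        # Call every connected receiver; with none connected, nothing runs.
        return [receiver(sender) for receiver in self.receivers]


payment_was_successful = Signal()


def on_payment(sender):
    return "paid invoice %s" % sender


# Sent before the module calling connect() was imported: no receiver runs.
print(payment_was_successful.send("42"))  # []
payment_was_successful.connect(on_payment)
print(payment_was_successful.send("42"))  # ['paid invoice 42']
```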

  • custom
  • item_number or item_number_X
  • invoice
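
As a sketch of how these variables can be picked out of a notification, the snippet below parses a form-encoded body with the standard library. The sample body is made up, pass_through_values is a hypothetical helper, and the numbered item_number_X variants are left out for brevity.

```python
from urllib.parse import parse_qs

PASS_THROUGH = ("custom", "item_number", "invoice")


def pass_through_values(ipn_body):
    """Pick the pass-through variables out of a form-encoded IPN body."""
    fields = parse_qs(ipn_body)
    return {name: fields[name][0] for name in PASS_THROUGH if name in fields}


# Made-up notification body, for illustration only.
sample = "payment_status=Completed&custom=sc-id-789&item_number=1234&invoice=42"
print(pass_through_values(sample))
```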

Originally, before using django-paypal, I looked through the Paypal Java SDK. The setup required downloading a crypto package from bouncycastle and exporting private keys in pkcs12 format.

# Compiling the code sample
$ curl -O http://www.bouncycastle.org/download/crypto-145.tar.gz
$ tar zxf crypto-145.tar.gz
$ export JAVA_CLASSPATH=~/crypto-145/jars/bcprov-jdk16-145.jar
$ export JAVA_CLASSPATH=$JAVA_CLASSPATH:~/crypto-145/jars/bcpg-jdk16-145.jar
$ export JAVA_CLASSPATH=$JAVA_CLASSPATH:~/crypto-145/jars/bctest-jdk16-145.jar
$ export JAVA_CLASSPATH=$JAVA_CLASSPATH:~/crypto-145/jars/bcmail-jdk16-145.jar
$ javac -g -classpath $JAVA_CLASSPATH ButtonEncryption.java \
  		com/paypal/crypto/sample/*.java

# Converting the private key (remember password for next command)
$ openssl pkcs12 -export -inkey hostname-paypal-priv.pem \
  		  -in hostname-paypal-pubcert.pem \
		  -out hostname-paypal-priv.p12

# Encrypting a paypal button
$ cat testbutton.txt
cert_id=Given when key uploaded to paypal website
cmd=_xclick  
business=sales@company.com
item_name=Handheld Computer  
item_number=1234
custom=sc-id-789  
amount=500.00
currency_code=USD
tax=41.25
shipping=20.00
address_override=1
address1=123 Main St
city=Austin
state=TX
zip=94085
country=US
no_note=1 
cancel_return=http://www.company.com/cancel.htm
$ java -classpath $JAVA_CLASSPATH ButtonEncryption \
  	   hostname-paypal-pubcert.pem \
	   hostname-paypal-priv.p12 \
	   paypal_cert_pem.txt \
	   pkcs12_password \
	   testbutton.txt testbutton.html

I have not completed this work yet, but here are the initial notes I currently have on using crypto++ to interface with the paypal processing system. Some background articles that turned out to be useful are Cryptographic Interoperability: Keys, Applied Crypto++: Block Ciphers, crypto++ CBC Mode and crypto++ key formats.

# Private key that can be loaded through crypto++
$ openssl pkcs8 -nocrypt -topk8 -in hostname-paypal-priv.pem \
		-out hostname-paypal-priv.der -outform DER