October 2009

HTML to reStructuredText in Python using Pandoc

While converting my blog from WordPress to a custom Django-based system, I wanted to move from HTML markup to reStructuredText (partly to make it easier to publish Sphinx documentation to my blog).

Converting reStructuredText to HTML is dead simple, but going the other way is more difficult. Luckily, Pandoc, the Swiss Army knife for converting between markup formats, does a nice job of converting HTML to reStructuredText.

I wrote a custom Django management command to parse a WordPress XML export file and store the blog entries. The code that converts HTML to reStructuredText is short: it makes a subprocess call to the pandoc command, passes the HTML on standard input, and reads the result from standard output. Make sure you have Pandoc installed (on Ubuntu, sudo apt-get install pandoc will work).

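import subprocess

def html2rst(html):
    """Convert an HTML string to reStructuredText by piping it through pandoc."""
    p = subprocess.Popen(['pandoc', '--from=html', '--to=rst'],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # On Python 3, communicate() works with bytes: encode going in, decode coming out.
    return p.communicate(html.encode('utf-8'))[0].decode('utf-8')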
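
For context, here is a rough sketch of how the export-parsing side might feed a converter like html2rst. It is not the actual management command: the function name iter_entries and the convert parameter are made up for illustration, and it assumes the export keeps each post body in a content:encoded element under the usual RSS content namespace. A real export also contains pages and attachments, which this sketch does not filter out.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for the full post body in a WordPress export (WXR) file.
CONTENT_NS = '{http://purl.org/rss/1.0/modules/content/}'


def iter_entries(wxr_xml, convert):
    """Yield (title, converted_body) for each <item> in a WXR document.

    `convert` is any callable that takes an HTML string, e.g. html2rst.
    """
    root = ET.fromstring(wxr_xml)
    for item in root.iter('item'):
        title = item.findtext('title', default='')
        html = item.findtext(CONTENT_NS + 'encoded', default='')
        yield title, convert(html)
```

Passing the converter in as an argument keeps the parsing step testable on a machine without Pandoc; in the real command you would pass html2rst and save each result to the model.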

reStructuredText Widget in Django Admin

While Django has great support for rendering standard markup languages, editing documents written in a markup language in the Django Admin can be difficult. Several others (1, 2, 3) show how easy it is to edit Markdown, Textile, and even HTML in the Django Admin using a WYSIWYG editor such as markItUp! or TinyMCE.

Unfortunately, reStructuredText is not well supported by most WYSIWYG editors. Nevertheless, we can improve the experience of editing reStructuredText in the Django Admin. One of the biggest improvements is switching the Textarea to a monospace font, which avoids issues caused by heading underlines being too short. We can also customize the size of the Textarea.
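
To make the underline problem concrete: reStructuredText expects a heading's underline to be at least as long as the heading text, and in a proportional font the two can look the same width when they are not. The helper below is only a rough heuristic for spotting such headings (the name short_underlines is made up, and it is not how docutils itself parses titles; for example, it ignores underscore underlines).

```python
import re

# A line made of one repeated punctuation character, e.g. "=====" or "-----".
_UNDERLINE = re.compile(r'^([^\w\s])\1+$')


def short_underlines(text):
    """Return (line_number, title) pairs whose underline is shorter than the title."""
    lines = text.splitlines()
    problems = []
    for number, (title, underline) in enumerate(zip(lines, lines[1:]), start=1):
        title = title.rstrip()
        underline = underline.rstrip()
        if not title or _UNDERLINE.match(title):
            continue  # blank line or an adornment line, not a title
        if _UNDERLINE.match(underline) and len(underline) < len(title):
            problems.append((number, title))
    return problems
```

A check like this could run before saving an entry, but the monospace font solves the same problem at the point where the mistake is made.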

In our app/admin.py file, we can add a ModelForm that overrides the field holding the reStructuredText content (in this case, description). We then use this ModelForm as the form in our subclass of ModelAdmin. Finally, we register the ModelAdmin subclass for the model that contains the reStructuredText content.

For the reStructuredText content field, description, we change the size to a width of 80 characters and the font to a monospace family. Additionally, we add a quick link to the reStructuredText Quick Reference.

from django import forms
from django.contrib import admin
from app.models import Entry

class EntryAdminForm(forms.ModelForm):
    # Override the reStructuredText field: 80 columns wide, monospace font.
    description = forms.CharField(
        widget=forms.Textarea(attrs={'rows': 30,
                                     'cols': 80,
                                     'style': 'font-family:monospace'}),
        help_text='<a href="http://docutils.sourceforge.net/docs/user/rst/quickref.html">reStructuredText Quick Reference</a>')

    class Meta:
        model = Entry
        fields = '__all__'  # Django 1.8 and later require fields or exclude

class EntryAdmin(admin.ModelAdmin):
    form = EntryAdminForm

admin.site.register(Entry, EntryAdmin)

Mocking Groovy's HTTPBuilder

I ran into a head-scratcher today when trying to unit test some Groovy code. The code under test interacts with an HTTP web service using Groovy's great HTTPBuilder, which wraps Apache's HttpClient. Obviously, I wanted to mock the interaction with the HTTP server to limit the scope of my tests.

Groovy makes it easy to create simple mocks using maps. To mock a class with a map, create a map keyed by the names of the methods to be mocked, with a closure holding each mock implementation as the value. For example, to mock HTTPBuilder, which has a "post" method, we can use the map defined by mapMock.

class HTTPBuilder {
    def post(...) { /* real implementation */ }
}


def mapMock = ["post": { /* mock implementation */ }]

This map-mock approach was working great for mocking out the post, put, and delete methods in HTTPBuilder, but the get method was giving me quite a bit of trouble. The closure in my get method mock was never executed.

After taking a step back, I realized that the map's own get method (the one that returns the value stored at a given key) was being called instead of the closure stored in the map under the key get. A real method on the map takes precedence over an entry with the same name.

The simple solution was to switch to an Expando mock instead of a map mock.

def expandoMock = new Expando()
expandoMock.get = { /* mock implementation */ }
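
The same name collision can be reproduced outside Groovy. The sketch below is a Python analogy, not Groovy code: MapMock is a made-up dict subclass that exposes its keys as attributes, standing in for the map mock, and SimpleNamespace stands in for Expando.

```python
from types import SimpleNamespace


class MapMock(dict):
    """A dict whose keys are also readable as attributes (hypothetical helper)."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


map_mock = MapMock(post=lambda url: 'mock post', get=lambda url: 'mock get')

post_result = map_mock.post('/entries')  # no real method named post, so the stored function runs
get_result = map_mock.get('/entries')    # dict.get wins and looks up the key '/entries'

# A plain attribute bag has no built-in get, so the mock is reached.
plain_mock = SimpleNamespace(get=lambda url: 'mock get')
plain_result = plain_mock.get('/entries')
```

Here post_result is the mock's return value, while get_result is None because the built-in lookup ran and found no such key. That is the same silent failure as the map mock whose closure never executed.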

I know I'm late to the train, but Groovy is a breath of fresh air compared to Java.