jsonpickle 0.3.1 Released

jsonpickle, the powerful library for serializing complex Python object graphs to JSON, reached a major milestone this week with the official release of 0.3.1, available on PyPI with documentation and full release notes at http://jsonpickle.github.com/. We have migrated from the Google Code site to the new GitHub site at http://github.com/jsonpickle/jsonpickle.

This release represents nearly a year of development from multiple contributors. Highlights include support for a wider variety of objects, support for the pickle protocol's __getstate__ and __setstate__ methods, and the ability to use or add any Python JSON backend (e.g. demjson, simplejson, django.utils.simplejson). Please be aware that backwards compatibility with JSON produced by 0.2.0 is not guaranteed.
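To make the pickle-protocol hooks concrete, here is a minimal sketch of how an object's state methods can drive a JSON round trip. It uses only the standard library; Connection, to_json, and from_json are hypothetical names for illustration and are not jsonpickle's API or implementation (jsonpickle's own entry points are jsonpickle.encode and jsonpickle.decode).

```python
import json


class Connection:
    """Example object with an attribute that should not be serialized."""

    def __init__(self, host):
        self.host = host
        self.socket = object()  # stand-in for a live, unserializable resource

    def __getstate__(self):
        # Pickle-protocol hook: return only the state worth saving.
        return {"host": self.host}

    def __setstate__(self, state):
        # Pickle-protocol hook: rebuild the object from the saved state.
        self.host = state["host"]
        self.socket = object()


def to_json(obj):
    """Hypothetical helper: serialize obj through its __getstate__ hook."""
    return json.dumps(obj.__getstate__())


def from_json(text, cls):
    """Hypothetical helper: restore an instance through its __setstate__ hook."""
    obj = cls.__new__(cls)  # bypass __init__, as pickle does
    obj.__setstate__(json.loads(text))
    return obj


encoded = to_json(Connection("example.org"))
restored = from_json(encoded, Connection)
```

The point of the hooks is that the class, not the serializer, decides which attributes are saved and how they are restored.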

In the past year jsonpickle has done well, with nearly 2500 downloads of the 0.2.0 release from Google Code, not including the bundled distribution of jsonpickle in tools such as FireLogger and git-cola. jsonpickle is currently available in the Gentoo repository and working its way through the Fedora and Debian repository processes.

I thank our contributors, including David Aguilar, Dan Buch, and Ian Schenck, for their massive improvements to jsonpickle. I also thank everyone who has submitted bug reports and shared thoughts on our mailing list. Finally, I thank the distribution managers who have worked to package jsonpickle for their various distributions.

Please try the new version, submit bug reports, and even fork the project on GitHub.


HTML to reStructuredText in Python using Pandoc

While converting my blog from WordPress to a custom Django-based system, I wanted to move from HTML markup to reStructuredText (partly to make it easier to publish Sphinx documentation to my blog).

While it is dead simple to convert reStructuredText to HTML, going the other way is more difficult. Luckily, Pandoc, the Swiss Army knife for converting between markup formats, does a nice job of converting HTML to reStructuredText.

I wrote a custom Django Command to parse a WordPress XML export file and store the blog entries. The relevant code to convert HTML to reStructuredText is short: it makes a subprocess call to the pandoc command and reads the command's output. Make sure you have Pandoc installed (in Ubuntu, sudo apt-get install pandoc will work).

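import subprocess


def html2rst(html):
    """Convert an HTML string to reStructuredText by piping it through pandoc."""
    p = subprocess.Popen(['pandoc', '--from=html', '--to=rst'],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # communicate() expects bytes on Python 3, so encode going in and decode coming out
    return p.communicate(html.encode('utf-8'))[0].decode('utf-8')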
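One thing the function above does not do is check whether pandoc actually succeeded. A slightly more defensive variant might look like the sketch below, assuming Python 3.6 or later. The names pipe_through and html2rst_checked are hypothetical, chosen for this example only.

```python
import subprocess


def pipe_through(command, text):
    """Feed text to an external command's stdin and return its stdout.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(
        command,
        input=text,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        encoding="utf-8",  # send and receive str rather than bytes
        check=True,
    )
    return result.stdout


def html2rst_checked(html):
    # Same pandoc flags as html2rst; requires pandoc on the PATH.
    return pipe_through(["pandoc", "--from=html", "--to=rst"], html)
```

Keeping the command as a parameter means the piping logic can be exercised with any filter program, not only pandoc.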

reStructuredText Widget in Django Admin

While Django has great support for rendering standard markup languages, it can be difficult to edit documents written in a markup language in the Django Admin. Several others (1, 2, 3) show how easy it is to edit Markdown, Textile, and even HTML in the Django Admin using a WYSIWYG editor, such as markItUp! or TinyMCE.

Unfortunately, reStructuredText is not well supported by most WYSIWYG editors. Nevertheless, we can improve the experience of editing reStructuredText in the Django Admin. One of the biggest improvements is switching the Textarea to a monospace font, which avoids issues caused by heading underlines being too short. We can also customize the size of the Textarea.
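To see why underline length matters: reStructuredText expects a section title's underline to be at least as long as the title text, which is hard to judge by eye in a proportional font. The sketch below is a rough standalone check for that condition. The function names are hypothetical, it compares character counts only, and it is not how docutils itself parses titles.

```python
import string


def is_adornment(line):
    """True if line is a non-empty run of one repeated punctuation character."""
    return (
        line != ""
        and line[0] in string.punctuation
        and line == line[0] * len(line)
    )


def short_underlines(text):
    """Return (line_number, title) pairs whose underline is shorter than the title."""
    lines = [line.rstrip() for line in text.splitlines()]
    problems = []
    for i in range(len(lines) - 1):
        title, underline = lines[i], lines[i + 1]
        if not title or is_adornment(title):
            continue  # blank lines and overlines are not titles
        if is_adornment(underline) and len(underline) < len(title):
            problems.append((i + 1, title))  # 1-based line number of the title
    return problems


sample = "Good\n====\n\nToo long title\n----\n"
issues = short_underlines(sample)
```

Here issues contains a single entry for "Too long title", whose four-character underline is too short.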

In our app/admin.py file, we can add a ModelForm which overrides the field that has the reStructuredText content (in this case, description). We then use this ModelForm as the form in our subclass of ModelAdmin. Finally, we indicate that the ModelAdmin subclass is associated with the specific model that contains the reStructuredText content.

For the reStructuredText content field, description, we change the size to a width of 80 characters and the font to a monospace family. Additionally, we add a quick link to the reStructuredText Quick Reference.

from django import forms
from django.contrib import admin
from app.models import Entry

class EntryAdminForm(forms.ModelForm):
    description = forms.CharField(widget=forms.Textarea(attrs={'rows':30,
                                                                'cols':80,
                                                                'style':'font-family:monospace'}),
                                  help_text='<a href="http://docutils.sourceforge.net/docs/user/rst/quickref.html">reStructuredText Quick Reference</a>')
    class Meta:
        model = Entry
        fields = '__all__'  # newer Django versions require an explicit field list

class EntryAdmin(admin.ModelAdmin):
    form = EntryAdminForm

admin.site.register(Entry, EntryAdmin)

Mocking Groovy's HTTPBuilder

I ran into a head-scratcher today when trying to unit test some Groovy code. The code under test interacts with an HTTP web service using Groovy's great HTTPBuilder, which wraps Apache's HttpClient. Obviously, I wanted to mock the interaction with the HTTP server to limit the scope of my tests.

Groovy makes it easy to create simple mocks using maps. To mock a class with a map, create a map whose keys are the names of the methods to be mocked and whose values are closures providing the mock implementations. For example, if we wish to mock out HTTPBuilder, which has a "post" method, we can do so using the map defined by mapMock.

class HTTPBuilder {
    def post(...) { /* real implementation */ }
}


def mapMock = ["post": { /* mock implementation */ }]

This map-mock approach was working great for mocking out the post, put, and delete methods in HTTPBuilder, but the get method was giving me quite a bit of trouble. The closure in my get method mock was never executed.

After taking a step back, I realized that the map's own get method (the one used to return the value at a specific key) was being called instead of the closure stored in the map under the key get.
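The same kind of shadowing can be reproduced in Python, which may make the cause easier to see. This is only an analogy, assuming a dict subclass that exposes its keys as attributes; it is not how Groovy dispatches methods, and MapMock is a hypothetical name.

```python
from types import SimpleNamespace


class MapMock(dict):
    """Dict whose keys can be called like methods, loosely like a map mock."""

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, so built-in
        # dict methods such as get() are found first and never get here.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


map_mock = MapMock(post=lambda url: "mock post", get=lambda url: "mock get")

# A plain attribute bag has no built-in get(), similar in spirit to Expando.
attr_mock = SimpleNamespace()
attr_mock.get = lambda url: "mock get"

post_result = map_mock.post("/items")      # falls through to the stored lambda
map_get_result = map_mock.get("/items")    # dict.get runs: looks up the key "/items"
attr_get_result = attr_mock.get("/items")  # the stored lambda runs
```

In this sketch map_get_result is None because dict.get finds no "/items" key, while the attribute bag calls the mock as intended.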

The simple solution was to switch to an Expando mock instead of a map mock.

def expandoMock = new Expando()
expandoMock.get = { /* mock implementation */ }

I know I'm late to the train, but Groovy is a breath of fresh air compared to Java.


Master's Thesis & Open-Source Tool

On July 15th, I successfully defended my Master's Thesis in Biomedical Informatics at Vanderbilt University. This defense was the culmination of 2 years of work. The thesis focuses on extracting organizational structure and relationships from the audit logs of clinician information systems. This work has potential applications in improving the delivery of care and the security of patients' private medical data.

As part of this work, I developed an open-source tool for analyzing audit logs. Licensed under an Apache 2.0 License, the Healthcare Organizational Relational Network Extraction Toolkit (HORNET) is a Python framework for plugins that analyze healthcare audit logs. The tool is fully functional, but is not yet polished enough for use by healthcare administrators.

The project is hosted on Google Code (http://code.google.com/p/hornet/). You can visit the project site as well as view the latest documentation.

I am writing a journal publication that describes this tool, its methods, and results from Vanderbilt University Medical Center. I will link to that publication when it is available, but until that time, I can release my thesis abstract.

A Framework for the Automatic Discovery of Policy from Healthcare Access Logs

by John M. Paulett

Healthcare organizations are often stymied in their efforts to prevent insider attacks that violate patient privacy. Numerous high-profile privacy breaches involving celebrities have brought this deficiency to the public's attention. In response, recent legislation aims to improve this situation by means of regulations and sanctions. While the public and government may demand more privacy safeguards, the current state-of-the-art tools in healthcare security, such as access control and auditing, will still be limited in their ability to solve the issue technically. These technologies are theoretically sound and tested in other industries, yet are suboptimal because no feasible methods exist for generating the policies these systems must act upon, due to the inherent complexities of modern healthcare organizations.

To address this shortcoming, we present a novel open-source framework, which mines low-level statistics of how users interact within the organization from the access logs of the organization's information systems. Our framework is scalable and capable of handling real world data integrity issues. We demonstrate the use of our tool by modeling the Vanderbilt University Medical Center. Additionally, we compare our framework's model to traditional experts who would attempt to manually generate a similar model.

