Everything I know about Python...

Python, sandman2, and Open Data

sandman2 (and its predecessor, sandman) has been far and away my most successful open source project. I fully attribute this to its genuine usefulness. It is most often used as a command line tool: you provide the connection details of a legacy database and run it, and in return it starts both a RESTful API service for your data and a web-based UI that allows you to add, delete, and edit rows directly. For many (especially in the enterprise), interacting with legacy databases is a pain at best and impossible at worst. The ability to access data via a simple REST API, then, is a godsend.

But what about outside the enterprise? Working at Enigma got me wondering: how could an organization wanting to open its data make use of sandman2? At first, the answer seems obvious: just run sandman2 as is, and let would-be consumers of the data access it via the API.

That would work. But it's clunky and time-consuming: each request returns at most twenty resources, so reading a table with one million rows takes 50,000 requests. More than anything, folks who want access to open data want access to all of it. The first thing they'll do, if it's a possibility, is take a dump of all of the data. They almost never interact with it programmatically at its source. Analysis is done elsewhere.

So forcing those folks to get twenty resources at a time is silly. They should be able to hit a single endpoint and get all of the data. And it should be in a format they're used to (like, say, CSV) rather than JSON.
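To make the format point concrete, here is a minimal sketch of turning a JSON list of flat records into CSV using only the standard library. The `records_to_csv` helper and the sample payload are made up for illustration; this is not sandman2's actual export code, and it assumes every record has the same keys as the first one.

```python
import csv
import io
import json


def records_to_csv(records):
    """Serialize a list of flat dicts to CSV text, header row first.

    Hypothetical helper: assumes all records share the keys of the first.
    """
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0].keys()),
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical JSON payload, standing in for what an API might return.
payload = json.loads('[{"id": 1, "name": "AC/DC"}, {"id": 2, "name": "Accept"}]')
csv_text = records_to_csv(payload)
```

The result opens directly in a spreadsheet or loads into most analysis tools, which is usually where this kind of data ends up.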

Now they can do exactly that.

With eleven lines of added code, sandman2 gained the ability to "export" a collection to CSV. Adding the ?export=true URL parameter triggers export mode, and all of the data is available in one go. It's fast, simple, and above all, just works.
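From the consumer's side, pulling an export might look roughly like the sketch below. Only the ?export=true parameter comes from the description above; the base URL, the `artist` collection name, the `/<collection>` path layout, the UTF-8 encoding, and the presence of a header row in the CSV are all assumptions made for illustration, and the helper names are hypothetical.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen


def export_url(base_url, collection):
    """Build the export URL for a collection (path layout is an assumption)."""
    query = urlencode({"export": "true"})
    return f"{base_url.rstrip('/')}/{collection}?{query}"


def parse_export(csv_text):
    """Parse CSV text into a list of dicts, assuming the first row is a header."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def fetch_export(base_url, collection):
    """Download a whole collection in one request and parse it."""
    with urlopen(export_url(base_url, collection)) as response:
        return parse_export(response.read().decode("utf-8"))


# Hypothetical usage against a locally running service:
# rows = fetch_export("http://localhost:5000", "artist")
```

One request replaces the page-by-page loop, and the rows arrive in a form that is ready to hand off to whatever analysis tool comes next.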

So if you're an organization looking to "free" your data and make it available to the world but don't have the tools to do so, well, you do now.

Simply download and run sandman2, point users at it, and check that pesky "give back to the data community" box off your to-do list. This trivial act may not make much difference to you, but there are whole communities out there that care a lot about this stuff, and you'll be doing them a favor and making yourself look good in the process.


I would be remiss if I didn't highlight just how important Enigma was in getting this functionality added. No one at the company suggested this to me, but just working there has brought data and its availability to the front of my mind. I'm mostly just ashamed it took me this long to realize there was something I could do to help out the folks in the data community. Regardless, if even a single data set is made public based on sandman2, I'll consider it a win. After all, eleven lines of code is still just eleven lines of code.
