Hackers Gonna Hack

On Hacking and Hustling

Python’s Hardest Problem

For more than a decade, no single issue has caused more frustration or curiosity for Python novices and experts alike than the Global Interpreter Lock.

An Open Question

Every field has one: a problem that has been written off as too difficult or too time-consuming. Merely mentioning an attempt to solve it raises eyebrows. Long after the community at large has moved on, it is taken up by those on the fringe. Novices attempt it for no other reason than the difficulty of the problem and the imagined accolades that would accompany a solution. In Computer Science, the open question of whether P = NP is such a problem. An answer in the affirmative could literally change the world, provided a “reasonable” polynomial-time algorithm is presented. Python’s hardest problem is, to be sure, less difficult than crafting a proof that P = NP. Nevertheless, it has not received a satisfactory solution to date, and the practical implications of a solution would be similarly transformative. Thus, it’s easy to see why so many in the Python community are interested in an answer to the question: “What can be done about the Global Interpreter Lock?”

From Memcached to Redis to Surpdb

In this post, I’ll describe my journey to find the perfect caching solution for my Django-based site linkrdr. After trying Memcached and Redis, I settled on surpdb. I guarantee you haven’t heard of surpdb before, because I just finished writing it.

Single Founder SEO: Building Your Personal Brand

As a single founder, I realize I am at a tremendous disadvantage when it comes to non-technical work. There are only so many hours in a day, and I have a full-time job. Work on linkrdr, therefore, must be prioritized. With so many interesting technical challenges to solve, a number of other useful activities fall by the wayside. Marketing is a big one. So how do I focus my efforts? By building my personal brand.

Django Memcached: Optimizing Django Through Caching

Caching is a subject near and dear to the heart of many performance-minded programmers. For those coming to web programming without other programming experience, caching may be a new topic. For programmers new to the web, using an external cache may be an approach not yet considered. In this post, I’ll describe how, through the use of Django’s caching support, I was able to reduce linkrdr’s page load time from over 3.5 seconds to 0.01 seconds.
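As a rough illustration of the underlying idea (not the post's actual code), here is a minimal cache-aside sketch in plain Python: check the cache first and only do the expensive work on a miss. `TTLCache` is a hypothetical in-process stand-in for Memcached, and `get_or_compute`, `expensive_page_render`, and the `user:42:links` key are illustrative names only. Django's own low-level cache API exposes `cache.get()` and `cache.set()` calls of a similar shape.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-process cache with per-entry expiry (a hypothetical stand-in for Memcached)."""

    def __init__(self, default_timeout: float = 300.0) -> None:
        self.default_timeout = default_timeout
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, default: Any = None) -> Any:
        entry = self._store.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Stale entry: evict it and report a miss.
            del self._store[key]
            return default
        return value

    def set(self, key: str, value: Any, timeout: Optional[float] = None) -> None:
        ttl = self.default_timeout if timeout is None else timeout
        self._store[key] = (time.monotonic() + ttl, value)


def get_or_compute(cache: TTLCache, key: str, compute: Callable[[], Any]) -> Any:
    """Cache-aside: serve from the cache, computing and storing the value on a miss."""
    missing = object()  # sentinel, so cached falsy values still count as hits
    value = cache.get(key, missing)
    if value is missing:
        value = compute()
        cache.set(key, value)
    return value


# Usage: the expensive function only runs on the first request.
calls = []


def expensive_page_render() -> str:
    calls.append(1)  # record each real computation
    return "<html>...</html>"


cache = TTLCache()
first = get_or_compute(cache, "user:42:links", expensive_page_render)
second = get_or_compute(cache, "user:42:links", expensive_page_render)
```

The second lookup is served from the cache, so the render function runs once. With a shared external cache such as Memcached, the same pattern lets every web process reuse one computed result.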

How Linkrdr Went Semi-viral

I was a bit under the weather this past weekend, but it turns out I wasn’t the only thing to go viral. Below is a brief story of how linkrdr enjoyed its first encounter with virality.

Background

Before Saturday, linkrdr had about 10 users, who had figured out a way to sign up without me really providing one. Sometime Thursday evening, I believe, I slapped a ‘Beta’ sticker on the front page, cleaned some stuff up, and declared linkrdr open for business. Come the weekend, I was under the weather and not feeling like working on anything, so I didn’t check any of my sites until Monday at 4pm. I cruised over to my Clicky dashboard and took a look at the stats for this blog. Then something caught my eye…

Optimizing Django Views With C++

In my previous post, I outlined the method by which one goes about profiling a Django application. I used a view from linkrdr as an example. That view is responsible for aggregating, ranking, and sorting all of the links in a user’s feeds (RSS, Atom, Twitter, etc.). The code from the post was an early, simplistic implementation of the view. I have, however, a much more robust scoring algorithm, written in Python, which I planned to use on the site.

You may have caught the word ‘planned’ in there. The algorithm turned out to be too slow. Rather, my Python implementation of the algorithm was slower than what I deemed acceptable. After thinking of various architectural changes that could be made to solve the problem, I settled on a somewhat radical solution for a Django developer: I implemented the view in C++.

I realize that not every Django developer knows C++, nor should they, but those that do should realize it’s a viable tool available when Python is just too slow. Eventually, you may get to a point where you can’t really optimize your Python code any more. In this case, profiling will show that most of your time is spent in Python library calls. Once you hit that point, you’ve either written a horribly inefficient algorithm or you’ve got a problem not suited for Python.

When I realized I had hit that point with my view code, I panicked. ‘What more is there to do?’ I wondered. Then I remembered a work project where I had written some C++ code that interfaced with Python. From a technical perspective, there was nothing stopping me from implementing some aspects of my Django app in C++ (besides the fact that, coming from Python, it’s excruciating to write). Since linkrdr is a single-person project, there are no teammates who need to grok the code. I’m free to implement it as I wish.

Profiling Django Applications: A Journey From 1300 to 2 Queries

In this post, I’ll discuss profiling Django applications through a case study in linkrdr’s code. Through the use of profiling tools, I was able to reduce the number of database queries a view was using from 1300 to 2.
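This excerpt doesn't show the actual fix, but a very common cause of a query count that grows with the data is the "N+1" pattern: one query for a list of rows, then one more query per row. Here is a minimal, self-contained sketch using Python's built-in `sqlite3` module rather than the Django ORM. The feed/link schema and function names are hypothetical, and the query counter relies on `Connection.set_trace_callback`, which reports each SQL statement SQLite executes. In Django, the usual remedies for this pattern are `select_related()` and `prefetch_related()`.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple

LinksByFeed = Dict[str, List[str]]


def make_db(num_feeds: int = 100, links_per_feed: int = 3) -> sqlite3.Connection:
    """Build a throwaway in-memory database with a hypothetical feed/link schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE feed (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE link (
            id INTEGER PRIMARY KEY,
            feed_id INTEGER NOT NULL REFERENCES feed(id),
            url TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO feed (id, title) VALUES (?, ?)",
        [(i, f"Feed {i}") for i in range(1, num_feeds + 1)],
    )
    conn.executemany(
        "INSERT INTO link (feed_id, url) VALUES (?, ?)",
        [
            (i, f"https://example.com/{i}/{n}")
            for i in range(1, num_feeds + 1)
            for n in range(links_per_feed)
        ],
    )
    conn.commit()
    return conn


def links_by_feed_naive(conn: sqlite3.Connection) -> LinksByFeed:
    """N+1: one query for the feeds, then one more query per feed."""
    result: LinksByFeed = {}
    feeds = conn.execute("SELECT id, title FROM feed ORDER BY id").fetchall()
    for feed_id, title in feeds:
        rows = conn.execute(
            "SELECT url FROM link WHERE feed_id = ? ORDER BY id", (feed_id,)
        ).fetchall()
        result[title] = [url for (url,) in rows]
    return result


def links_by_feed_joined(conn: sqlite3.Connection) -> LinksByFeed:
    """A single JOIN query; rows are grouped in Python instead of re-querying."""
    result: LinksByFeed = {}
    rows = conn.execute(
        "SELECT feed.title, link.url FROM feed "
        "JOIN link ON link.feed_id = feed.id ORDER BY feed.id, link.id"
    )
    for title, url in rows:
        result.setdefault(title, []).append(url)
    return result


def count_queries(
    conn: sqlite3.Connection, fn: Callable[[sqlite3.Connection], LinksByFeed]
) -> Tuple[LinksByFeed, int]:
    """Run fn(conn) and count the SQL statements SQLite executes meanwhile."""
    statements: List[str] = []
    conn.set_trace_callback(statements.append)
    try:
        result = fn(conn)
    finally:
        conn.set_trace_callback(None)
    return result, len(statements)


conn = make_db()
naive, naive_queries = count_queries(conn, links_by_feed_naive)
joined, joined_queries = count_queries(conn, links_by_feed_joined)
```

Both functions return the same data, but for 100 feeds the naive version issues 101 queries while the joined version issues one. (The inner JOIN skips feeds with no links; every feed has links in this sample data.)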

Introduction To Profiling

At some point in most Django projects, some part of the application becomes ‘slow’. This doesn’t have to be the case (more on that later), but it’s often the result of changes made without performance in mind.

In the beginning, this is actually a good thing: focus on making it work first, then focus on making it fast. Of course, you don’t want to code yourself into a corner by writing code that “works” but does so in a way that it will never be fast. Instead, you want to keep performance in the back of your mind while implementing a solution that makes sense.

Once you’ve proven your solution works through your automated tests (You are using automated tests, right?), the next step is to make sure its performance is acceptable. Note that I didn’t say ‘optimal’. Don’t waste time making something faster than it needs to be. This should be common sense but, once the optimization bug bites, it’s common for developers to go a bit off the deep end and keep trying to find optimizations long after it’s necessary.
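Finding the slow parts is the job of a profiler. Python's standard library includes `cProfile`, plus `pstats` for reading its results. Here is a minimal sketch of profiling a single function call; `slow_view_logic` and `score` are hypothetical stand-ins for the work a slow view does, and `run_profiled` is an illustrative helper name.

```python
import cProfile
import io
import pstats
from typing import Any, Callable, Tuple


def score(n: int) -> float:
    # Deliberately wasteful: recomputes a sum from scratch on every call.
    return sum(i * i for i in range(n)) / (n or 1)


def slow_view_logic() -> list:
    # Hypothetical stand-in for the body of a slow view.
    return sorted((score(n) for n in range(300)), reverse=True)


def run_profiled(
    func: Callable[..., Any], *args: Any, top: int = 5, **kwargs: Any
) -> Tuple[Any, str]:
    """Run func under cProfile; return its result and a report of the top entries."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # "cumulative" ranks by total time spent in a function, including its callees.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    result, report = run_profiled(slow_view_logic)
    print(report)
```

Sorting by cumulative time points at the call chain that dominates; sorting by `"tottime"` instead highlights functions that are expensive in their own right.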

Unit Testing in Django

As a follow-up to my post Starting a Django Project the Right Way, I wanted to talk about the importance of writing tests for Django applications. I previously mentioned that my first site, IllestRhyme, has no app-specific tests. This is both embarrassing and true. I’ve lost countless hours to fixing problems caused by new changes. I wasn’t going to make the same mistake with linkrdr. Having a set of unit tests that I can run in an automated fashion has made a world of difference.

The Django unittest framework (really the Python unittest framework) is both simple and powerful. Along with the test client (django.test.client.Client), there’s a lot you can do with Django right out of the box.
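Because Django's test classes build on Python's `unittest`, the basic shape of a test is the same either way. Here is a minimal sketch using plain `unittest`; `normalize_url` is a hypothetical function under test, not part of linkrdr or Django. In a Django app, the class would typically subclass `django.test.TestCase`, which also provides `self.client` (a test client instance) for requesting URLs.

```python
import unittest
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Hypothetical function under test: canonicalize a URL so duplicates compare equal."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    # Lowercase the scheme and host (paths are case-sensitive) and drop the fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


class NormalizeUrlTests(unittest.TestCase):
    def test_lowercases_scheme_and_host_but_not_path(self):
        self.assertEqual(normalize_url("HTTP://Example.com/Post"), "http://example.com/Post")

    def test_strips_trailing_slash_and_fragment(self):
        self.assertEqual(normalize_url("https://a.example/x/#comments"), "https://a.example/x")

    def test_root_path_is_kept(self):
        self.assertEqual(normalize_url("https://a.example"), "https://a.example/")
```

Saved as, say, a hypothetical test_urls.py, this runs with `python -m unittest test_urls`; inside a Django project, tests like these are picked up by `python manage.py test`.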

Setup

To start, we’ll want to create a dump of our database data to use during testing.

Dump our data
$ ./manage.py dumpdata --format=json > my/app/directory/initial_data.json

This will give us a JSON fixture that mimics the current state of our production database. Note that since this is a fixture for all of the installed apps, we’ve put it in a non-standard directory. To let the test runner find our fixture, we’ll need to add that directory to the FIXTURE_DIRS setting.
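In settings.py, that could look like the fragment below. It is a sketch: the directory is the placeholder path from the dumpdata command, resolved relative to the settings module, and `BASE_DIR` is a common convention rather than something Django requires. A test case can also name the fixtures it wants explicitly through the `fixtures` attribute of `django.test.TestCase`.

```python
# settings.py (fragment)
import os

# Directory containing this settings module (a common convention, not required).
BASE_DIR = os.path.dirname(os.path.abspath(__file__))

# FIXTURE_DIRS lists extra directories Django searches for fixture files.
# Placeholder path: point it at wherever you wrote initial_data.json.
FIXTURE_DIRS = [
    os.path.join(BASE_DIR, "my", "app", "directory"),
]
```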

Now that we have our data copied, let’s run whatever tests our installed apps have already:

Run our tests
$ python manage.py test

This hopefully gives us output like:

Test run output
.....................................................................................................................................................................................................................................................................................................................................................................
----------------------------------------------------------------------
Ran 357 tests in 30.025s

OK

This is also a good check of the integrity of your database, as Django will try to load a fixture representing all of your data. If you’ve been screwing around with the admin interface or the shell adding and deleting records, you may have integrity errors. If you do (like I did), you’ll have to fix them manually and re-dump your data.

Introducing Linkrdr

I started work on a new site on Monday: linkrdr. It’s the next-generation feed reader for people who subscribe to tens or hundreds of feeds. linkrdr aggregates your feeds but, more importantly, your links, and ranks those links according to a relevance formula. Purely chronological readers are just terrible at managing a mountain of links.

The idea for linkrdr came from a Hacker News post describing the very problem linkrdr solves. I realized I had exactly the same problem as the submitter: too many links, too little time. I don’t want to miss out on a quality post on a blog that doesn’t publish very often. At the same time, if there’s a link that’s showing up in a number of my feeds, it’s a good bet that it’s worth reading.
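linkrdr's actual relevance formula isn't given here. As a rough illustration of the two intuitions above, here is a hypothetical scoring sketch: a link earns at least 1.0 for each distinct feed it appears in, and feeds that publish rarely add a bonus. The weighting `1 + 1 / (1 + posts_per_week)`, the default of one post per week for unknown feeds, and all names are assumptions made purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class FeedItem:
    feed: str  # which feed the link appeared in
    url: str   # the link itself


def score_links(
    items: Iterable[FeedItem], posts_per_week: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (url, score) pairs, highest score first (hypothetical formula)."""
    feeds_per_url: Dict[str, Set[str]] = defaultdict(set)
    for item in items:
        # A set, so a feed mentioning the same link twice only counts once.
        feeds_per_url[item.url].add(item.feed)

    scores: Dict[str, float] = {}
    for url, feeds in feeds_per_url.items():
        # Every distinct feed adds at least 1.0, so widely linked URLs rise;
        # feeds that publish rarely add a bonus of up to 1.0 more.
        # Unknown feeds are assumed to post once a week.
        scores[url] = sum(
            1.0 + 1.0 / (1.0 + posts_per_week.get(feed, 1.0)) for feed in feeds
        )

    # Highest score first; ties broken by URL for a deterministic order.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


items = [
    FeedItem("busy-news", "https://x.example/story"),
    FeedItem("quiet-blog", "https://x.example/story"),
    FeedItem("busy-news", "https://y.example/other"),
    FeedItem("quiet-blog", "https://z.example/essay"),
]
ranked = score_links(items, {"busy-news": 50.0, "quiet-blog": 0.25})
```

Here the story linked from both feeds ranks first, and the quiet blog's essay outranks the busy site's one-off link, which is the behavior the paragraph above asks for.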

Starting a Django Project the Right Way

One of the things I wish I had known when starting my Django project for IllestRhyme was “How do I start a real Django project?” As in, one that’s actually going to be used and developed further, not the toy project from the (admittedly excellent) Django documentation.

Having just gone through this process again for my new site, I wanted to share the knowledge I’ve gained about how to properly start a project in Django. By the end of this post, you will have:

1. A fully functional Django project
2. All resources under source control (with git)
3. An environment-independent install of your project (using virtualenv)
4. Automated deployment and testing (using Fabric)
5. Automatic database migrations (using South)
6. A solid start to your new site

None of these steps, except perhaps the first, are covered in the official tutorial. They should be. If you’re looking to start a new, production-ready Django project, look no further.