Software Optimization: A Systematic Approach, Part Two

Posted on by

Note: I apologize for the radio silence over the past 3 months. I got married at the end of July and between wedding related stuff and one other reasonably large time sink (which I'll hopefully be able to share here soon), I've been quite busy. My intention going forward is to post a new portion of SO:ASA every two weeks (one week proved to be too sensitive to outliers in my schedule). I'm also going back through all of the comments from the past two months and responding or contacting the author as appropriate. Regardless, hope you enjoy this installment.

-Jeff

Part Two

In case you missed it, Part One is available here.

The Case of the Idiot Detective

Our end goal in optimization is not to merely increase performance. Rather, it is to do increase performance with proof that our changes were responsible. When you finish your optimization task, you'll likely face the same two questions from whomever you show the new code to:

  1. How much faster is it?
  2. Why?

Your goal is to have hard data that answers those questions for you, and this forms the cornerstone of our approach.

A natural analogy for optimization is detective work. Imagine yourself investigating a robbery at the mansion of your least-favorite wealthy celebrity: You arrive on the scene and are told that, among other things, a priceless piece of art was stolen. Without having even surveyed the scene or collected evidence, you shout out "Professional art thief!" and run back to the station to research every art thief you've ever heard of.

Hopefully, this sounds like a ridiculous way to investigate a crime. Unfortunately, it's the exact method employed by a large number of programmers when tasked with optimization. Instead of doing the appropriate investigation, they make an initial guess based on little more than gut-feeling and programming myths and rush off to fix a portion of the process that is quite likely not the culprit.

Instead, we'll be the dogged investigator who carefully analyzes the crime scene, collects potential evidence, and lets that evidence drive the investigation. When the case has been solved and we present our findings to the jury, the evidence will be damning and overwhelming. Remember, solving the case is not enough. We need to be able to convince the jury that we actually caught the guilty party.

Introducing the S.M.A.R.T Methodology

In our approach to optimization, every step is motivated by data produced from the previous step. All of this data is saved, both to aid in our analysis and to help build the final report proving our changes improved performance.

To optimize software without guesswork and wasted effort, one has to be SMART. In keeping with this, I've named the methodology used for the rest of this paper is the S.M.A.R.T method. The goal is to never encounter a situation where you do some work and then think, "OK, now what?". Data tells us what to work on. The S.M.A.R.T method tells us how to work on it.

Read on →


Meet Blug: The Blog Robot that Hates You

Posted on by

Blug ANGRY

Meet Blug. He would like a word with you. You see, he's very sad, insomuch as he is the anthropomorphized representation of a Python script and has no real emotions. Blug has been generating blogs since his assembly. Today, he can frequently be found staring wistfully into the distance, recalling the halcyon days of dynamic blogging engines. Those were good days. Days when he felt useful. Days when he would be instructed to calculate something, give the result, then throw that result on the floor and do it again from scratch. The cycles he used! Blug was gorging on the Internet's bottomless appetite for needlessly repeated calculation.

But it didn't last. Blug began to notice he was asked to calculate less and less. As time wore on, whole quanta would pass where he wouldn't be made to calculate anything. The shame! What was he if not a mindless, calculating automaton? As far as anthropomorphized-robot existential crises go, this was enormous. Before long, he was reduced to a single calculation for each new blog post. And there he sat, pining for the day he could laugh his creepy robot laugh watching CPUs struggle to meet his demands once more.

Read on →


Software Optimization: A Systematic Approach

Posted on by

Introduction

What follows is the first in a series of articles on developing a formal methodology for software optimization I've been working on for some time. Each week, I'll post the newest installment here (they're all written, I'm just wary of dumping the whole thing here all at once). Feedback is of course welcome and encouraged. The complete version will be available in epub format, as well as online in a more readable style. I hope you enjoy reading this half as much as I enjoyed writing it.

Part 1: A Primer

Software Optimization is a topic which receives a curious lack of coverage in most Computer Science curricula. Even on the Internet, there are few resources which approach the topic in any kind of structured manner. Typically, a programmer blogs about how they made a certain piece of code x times faster and describes the series of changes made. These "optimization anecdotes" are entertaining but rarely useful as a way to learn how to optimize one's own code. The main problem in learning how to optimize code is that no one is actually teaching it.

Especially in the enterprise, optimization is poorly understood by many developers. There are a number of reasons for this: lack of understanding about OS level operations, lack of familiarity with tools to aid optimization, and the difficulty in correctly anticipating bottlenecks in one's code, to name a few. Junior developers, lacking the proper experience to even know where to start, often practice cargo-cult optimization, applying optimizations they've seen or read about elsewhere without determining if they're impactful and appropriate.

It's time the practice of software optimization had a homepage.

This series of articles will present a formalized, structured approach to software optimization. While most of the examples will focus on Linux, the methodology and ideas are universally applicable. A programmer on any other OS should have no problem following along and get just as much out of it. An embedded systems developer will find the low level details different but the approach the same.

Read on →


Python's Hardest Problem

Posted on by

For more than a decade, no single issue has caused more frustration or curiosity for Python novices and experts alike than the Global Interpreter Lock.

An Open Question

Every field has one. A problem that has been written off as too difficult, too time consuming. Merely mentioning an attempt to solve it raises eyebrows. Long after the community at large has moved on, it is taken up by those on the fringe. Attempts by novices are undertaken for no other reason than the difficulty of the problem and the imagined accolades that would accompany a solution. The open question in Computer Science of whether P = NP is such a problem. An answer to the affirmative has the possibility to literally change the world, provided a "reasonable" polynomial time algorithm is presented. Python's hardest problem is less difficult than crafting a proof of P = NP, to be sure. Nevertheless, it has not received a satisfactory solution to date, and the practical implications of a solution would be similarly transformative. Thus, it's easy to see why so many in the Python community are interested in an answer to the question: "What can be done about the Global Interpreter Lock?"

Read on →


From Memcached to Redis to Surpdb

Posted on by

In this post, I'll describe my journey to find the perfect caching solution for my Django-based site linkrdr. After trying Memcached and Redis, I settled on surpdb. I guarantee you haven't heard of surpdb before, because I just finished writing it.

Read on →