Surprise!

When I tell folks I study Political Science, they’re usually surprised to learn that it is not only a research-oriented discipline but a necessarily mathematical one. No, we aren’t all talking heads in training, and yes, the graphs you see on CNN comparing Hillary Clinton’s ideological position to Ted Cruz’s rest on more than guesswork dressed up as knowledge… as crazy as that may sound. Although not every social science question can be designed for causal inference or analyzed quantitatively (and that’s a beautiful thing), I’d argue that Political Science places an emphasis on numerical analysis that is poorly understood even by some who set out to study it. Case in point: my family is under the impression I’m taking a few gap years to study the Presidents.

It’s not hard to imagine that many people who enter into Political Science research – or social science research, for that matter – are similarly surprised, only discovering the “mathophilic” nature of their field after it’s too late to turn back. Gary King summed up the sentiment in his praise for Jeff Gill’s book, Essential Mathematics for Political and Social Research:

Did you choose the social sciences because you thought they had relatively little mathematical content? Surprise! You’re now in a bizarre situation, in which many of us once found ourselves […]

A bizarre situation indeed, and one Dr. King captured well when the book was written in 2006. Today, in 2015, his words still ring true: a strong math background is a prerequisite for studying the social sciences. In the past decade, however, the list of prerequisites has grown. Fledgling Political Scientists must know how to do more than math. They must also know how to hack.

What’s Hacking? Isn’t it malicious?

Hacking used to mean the art of gaining access to off-limits systems through coding prowess, often against the wishes of the organization whose systems the hacker broke into. Breaking the rules meant that hackers had to be very good at writing code. Other people were good at coding, too, but being a “regular” programmer doesn’t come with the enviable edgy coolness of doing things one is not supposed to be doing; being a hacker, rather than a coder or developer, carries a certain tech cachet. It’s not hard to imagine that, because of the prestige attached to the term, regular programmers began using it to describe their own work. Hence the term’s current double meaning. At least, that’s one story.

In today’s tech circles, hacking simply means “writing code well,” as evidenced by the emergence of hackathons (where teams will compete to solve a problem for a sponsor organization) and programs like Code for America’s National Day of Civic Hacking (where coders “design processes to improve our communities and the governments that serve them”).

Is hacking malicious? The newer definition aside, it’s generally unclear whether hacking proper is a species of curiosity, ego, profit, or malice (among other things). The intent behind hacking can only be determined case by case, and whether a particular act of hacking is good or bad often depends on one’s point of view. See, e.g., Stuxnet. For our purposes, however, hacking is only malicious insofar as it can sometimes make your brain explode :). For the social scientist, hacking is a wonderful and very useful thing!

Hacking the Social Sciences

Hacking (computer programming; coding) has become a central component of social science research for several reasons. One is that it has allowed researchers to hold their projects to a higher standard of replicability. Take a quick look at the publication requirements of the top political science journals and you will see that many make publication conditional on submitting your replication files, usually written in R, Stata, or Python. The American Journal of Political Science (AJPS) goes further: under its recently established standards, a paper will not be published unless an independent statistical expert can reproduce the reported results from your data and code. (As an aside, a relevant debate is raging in the wake of a fudged political science study that made it through peer review at Science despite being fraudulent.) If you want to get published in a top journal, you need to know how to hack.
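What does a replication file actually look like? Each journal sets its own requirements, so treat this as a minimal sketch under assumptions: one Python script that holds the data, fixes the random seed, and prints the estimate, so that a reviewer who reruns the file gets exactly the same numbers. The data values and the function name `bootstrap_mean_ci` are hypothetical, made up for illustration.

```python
import random
import statistics

# Hypothetical turnout data: illustrative numbers only, not from any real study
turnout = [0.52, 0.61, 0.47, 0.58, 0.55, 0.49, 0.63, 0.51]


def bootstrap_mean_ci(data, reps=1000, seed=2015):
    """Percentile bootstrap 95% interval for the mean (hypothetical helper).

    Because the seed is fixed, anyone who reruns this script gets the
    same interval, which is the whole point of a replication file.
    """
    rng = random.Random(seed)  # private generator with a fixed seed
    means = []
    for _ in range(reps):
        resample = [rng.choice(data) for _ in data]  # sample with replacement
        means.append(statistics.mean(resample))
    cuts = statistics.quantiles(means, n=40)  # cut points every 2.5%
    return cuts[0], cuts[-1]                  # 2.5th and 97.5th percentiles


if __name__ == "__main__":
    estimate = statistics.mean(turnout)
    low, high = bootstrap_mean_ci(turnout)
    print(f"mean = {estimate:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Running the script twice prints the same line both times; change the seed and the interval shifts slightly, which is why the seed belongs in the file you submit.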

Another reason hacking is essential to the practice of political science, or any science for that matter, is that it has dramatically scaled up researchers’ ability to collect data and draw conclusions. Techniques like web scraping and services like Amazon Mechanical Turk let people enlist an army of either robots or single-task human workers at very little cost (allowing those of us with more… ahem… modest budgets to get some decent research done). W-NOMINATE, a method that places members of Congress in a policy space based on their roll-call votes, is a staple of the political science research corpus. Its widespread use would not have been possible without hacking. Even if you don’t view hacking as strictly necessary, it clearly makes your “personal research lab” a whole lot more productive.
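Web scraping sounds exotic, but the core idea is small: get a page’s HTML and pull out the pieces you care about. Here is a minimal sketch using only Python’s standard-library `html.parser`, assuming a hypothetical page that lists legislators in an HTML table. In practice you would download the HTML first (for example with `urllib.request.urlopen`) and check the site’s terms of use; the page contents and the `TableScraper` class are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical HTML standing in for a downloaded web page
PAGE = """
<table id="members">
  <tr><th>Name</th><th>State</th></tr>
  <tr><td>Jane Doe</td><td>OH</td></tr>
  <tr><td>John Roe</td><td>TX</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows
        self._row = None       # row currently being read
        self._in_cell = False  # are we inside a <td>?

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row:  # header rows have no <td>, so skip them
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and data.strip():
            self._row.append(data.strip())


scraper = TableScraper()
scraper.feed(PAGE)
for name, state in scraper.rows:
    print(name, state)
```

Point the same class at a thousand pages and you have a dataset that would take a research assistant weeks to type up by hand.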

Here’s one more cool benefit you’ll get from learning to hack: once you write a script, you can reuse it again and again. In other words, once you put in the initial investment of time and energy (which, by the way, is fun and rewarding), that investment keeps paying off. The experience carries over, too: learning one programming language is very much like learning another, so once you learn one, you’ve made learning the next much, much easier. Learning to hack isn’t like learning a new method that you use on one paper and then never again. It bends the learning curve for everything else in your favor.
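To make the reuse point concrete, here is a hypothetical helper written once and then applied to two different datasets without changing a line. The function name `summarize` and the numbers are made up for illustration.

```python
import statistics


def summarize(label, values):
    """Return a one-line summary of a list of numbers (hypothetical helper).

    Written once, it works on any list you hand it: this year's survey,
    next year's survey, or someone else's replication data.
    """
    return (f"{label}: n={len(values)}, "
            f"mean={statistics.mean(values):.2f}, "
            f"sd={statistics.stdev(values):.2f}")


# Two hypothetical datasets, one function
print(summarize("Survey A", [3, 4, 5, 4, 2]))  # Survey A: n=5, mean=3.60, sd=1.14
print(summarize("Survey B", [7, 6, 8, 9]))     # Survey B: n=4, mean=7.50, sd=1.29
```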

What programming languages are available to the budding hacker?

The list of programming languages available to learn is endless, ranging from the ancient to the absurd. For example, see figure 1 for a slice of C, which first appeared in 1972, and see figure 2 for a slice of LOLCODE, a hilariously useless (but functional) language. Unless you’re trying to trick a graphics processor into doing sophisticated calculations for you (in the case of C) or just flat out trying to troll a journal (in the case of LOLCODE), you will not need to learn these languages. (Actually, I kind of want to troll a journal with a working LOLCODE replication script now.)


Figure 1: An Example of a C Script (Ancient)

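#include <stdio.h>

/* Pre-ANSI C let you leave out "int"; modern compilers require it. */
long some_function(void) { return 1; }   /* hypothetical stub */
int other_function(void) { return 42; }  /* hypothetical stub */

int calling_function(void)
{
    long test1;
    register int test2;  /* "register": an old hint to keep this in a CPU register */

    test1 = some_function();
    if (test1 > 0)
        test2 = 0;
    else
        test2 = other_function();
    return test2;
}

int main(void)
{
    printf("%d\n", calling_function());
    return 0;
}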


Figure 2: An Example of a LOLCODE Script (Absurd)

HAI
CAN HAS STDIO?
PLZ OPEN FILE "LOLCATS.TXT"?
  AWSUM THX
    VISIBLE FILE
  O NOES
    INVISIBLE "ERROR!"
KTHXBYE


Most languages relevant to the burgeoning newbie (that’s us!) – the ones that help you do data collection, analysis, and visualization – are actually pretty intelligible. See, for example, the Python code in figure 3.


Figure 3: An Example of a Python Script (Awesome)

friends = ['john', 'pat', 'gary', 'michael']

# enumerate() pairs each name with its index, starting at 0
for number, name in enumerate(friends):
    print("iteration {iteration} is {name}".format(iteration=number, name=name))


It’s simple to understand what’s going on in this script. We have a list of your friends, and the program prints each friend’s name along with that friend’s position in the list (Python starts counting at 0, so john is iteration 0). When processing data, we would write a similar script. In its simplest form, we would have a list of data, and for each datum in that list, we would perform a function or a calculation. Not too bad, right?
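Here is that simplest data-processing form, written in the same style as figure 3: a list of data and a loop that performs one calculation per datum. The vote counts are hypothetical, and converting counts to percentages is just one example of the calculation you might run.

```python
# Hypothetical vote counts for one race: (candidate, votes)
results = [('candidate_a', 4500), ('candidate_b', 3900), ('candidate_c', 600)]

total = sum(votes for _, votes in results)  # 9000 votes in all

# For each datum in the list, perform the same calculation
shares = []
for name, votes in results:
    share = 100 * votes / total
    shares.append((name, share))
    print("{name} received {share:.1f}% of the vote".format(name=name, share=share))
```

Swap in a different list (or a column read from a spreadsheet) and the loop does not change.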

Started from the bottom…

Certainly, there are other benefits of hacking that I have not discussed here. There are also some costs to learning how to hack: the time investment, plus minor annoyances like syntax frustration (when something isn’t working because you are missing a comma) or code envy (when you look at someone’s code and get jealous because it is so good). The way hacking is usually taught, the learning curve is steep, which turns people off because they feel they can’t make the investment. Or it causes tears, because people feel they can never be good enough to “be one of those hacker guys.” Another fear is less about not being the best than about being bad at it: you’ve seen so many of your peers and colleagues spit out code like it’s a second language, and it’s scary to think that you just won’t be able to do it.

These are reservations I have felt, too. Here on American Oil Can, I’ll walk you through the first few steps you’ll take as a fledgling coder: no judgement, no pretense. Just simple-to-follow posts that will help you understand which skills you need to learn and how to use them. We will cover the ins and outs of the Mac Stack for data processing and analysis: UNIX, the Terminal, Bash, Homebrew, Python, R, Quandl, web scraping, and much more. (And let’s be honest: get a Mac or install Linux.) Let’s get lubridating!

Check back weekly for more posts in the Essential Hacking series. Subscribe via RSS.