Jumble of thoughts about Ruby and programming

I really like the Ruby language. I like it for its syntax, its idioms, and the programming culture. Recently I’ve been thinking about the things that I don’t like about the language. It’s a relatively short list.

Ruby: some things I don’t like about you

The slow speed. It really is bad, but we get away with it because reasons.

The symbol-string distinction. This seems to exist only to introduce silly bugs involving mismatches (and so Rails users can make jokes about HashWithIndifferentAccess). I know that symbols are singletons and that helps with memory usage, but 1) most Ruby programmers in my experience aren’t profiling memory that much and 2) that argument is less relevant now that there is a plan to treat string literals the same way.
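
To make the mismatch complaint concrete, here’s a minimal, made-up illustration (the hash and keys are invented for the example):

config = { timeout: 30, retries: 3 }

# Looking the value up with a string quietly returns nil, because
# :timeout and "timeout" are different objects.
config[:timeout]   # => 30
config["timeout"]  # => nil

# The memory argument: a symbol literal is always the same object,
# while each string literal creates a new one (absent frozen literals).
:timeout.object_id == :timeout.object_id    # => true
"timeout".object_id == "timeout".object_id  # => false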

The poor immutability story. I’m not sure that I buy 100% into FP (in part because I don’t know yet how to integrate it with OOP, which I do buy into) but I’ve become more and more disenchanted with mutation of objects. This was an actual pain point on a recent project, where I had a deeply nested hash that was being mutated from afar and I eventually used ice_nine to freeze the whole thing just to figure out where the mutation was taking place.
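
For a sense of what that looked like, here’s a contrived sketch (not the actual project code) using ice_nine’s IceNine.deep_freeze:

require "ice_nine"

settings = { api: { host: "example.com", options: { retries: 3 } } }

# A plain settings.freeze would only freeze the outer hash;
# deep_freeze recursively freezes everything nested inside it.
IceNine.deep_freeze(settings)

# The mysterious far-away mutation now raises immediately,
# with a backtrace pointing at the offending line:
settings[:api][:options][:retries] = 5
# => RuntimeError: can't modify frozen Hash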

The MRI codebase. The core interpreter and low-level libraries are implemented in C (also, the project uses Subversion for version control). I’ve tried subscribing in my RSS reader to the new commits on master on the Github mirror, but I eventually gave up because I never understood what I was looking at. On the other hand I was able to jump in and contribute to the Crystal standard libraries with a very minor PR just by glancing around the source code. I realize this isn’t necessarily fair, since this part of MRI’s standard library would probably be equally easy to hack on, but in general it’s relatively easy to understand what is happening even in the “low-level” Crystal AST parsing code. It won’t happen, but I wish Ruby were implemented in a higher-level and more OO language like Go, Rust, or even C++ (or Crystal!).

The poor auto-completion and static analysis. Obviously this is the cost we pay for having an untyped, dynamic, late-binding language. More generally I agree with Avdi Grimm’s point here that Ruby has bad development tools. Rubocop is a notable exception that, when integrated with tools like Syntastic and more recently Neomake, has dramatically improved my development experience (if only because I get instant warnings when I have an unused variable, usually meaning I have a typo).
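
As a hypothetical example of the kind of save this gives me, the typo below assigns a local variable that is never read, and Rubocop’s lint warning shows up in the editor immediately:

def greeting(user)
  # Typo: `frist_name` is assigned but never used, so Rubocop flags it
  # as an unused variable the moment the buffer is saved.
  frist_name = user.first_name
  "Hello, #{first_name}!"  # this would blow up at runtime instead
end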

Peering at the greener grass

So, recently I have been exploring other languages, in particular compiled languages (not for the first time, but my entire working career has been with dynamic languages so it’s most of my programming experience). Not that much; just a little puttering around here and there with Crystal, Rust, and Go. It’s a little too early for me to say anything meaningful, but here are my early reactions:

  • Something I already knew: good autocompletion is really nice. Static typing really pays off when you can see at a glance all the methods and properties on the object that you’re dealing with.
  • The code-compile-execute cycle is awkward (although to be fair, in Ruby we run our tests constantly, almost like a compile step).
  • I really miss REPL-style investigation with a debugger (actually I should mention Pry as another great Ruby tool). Right now I’m feeling this lack with a Go program, where I want to know what the struct instance I’m dealing with actually contains, and the suggestions on the web seem to be to marshal it to JSON and print it. I know that there are a couple of things going on here and I’m probably making noob mistakes, but my point is that a REPL is a really big part of productivity for me (see the Pry sketch after this list). Right now I have the joy of knowing all the types of my objects through static analysis, but I feel blind when it comes to runtime values.
  • You have to think about so many low-level details, like specifying the size and element types of arrays (or the maximum size of slices in Go). That’s the big productivity tradeoff: these low-level details might help you be more precise about performance, but they also distract from whatever domain you’re actually working in.
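
For contrast, here’s the sort of Pry session I keep reaching for (a made-up example; binding.pry is Pry’s usual entry point):

require "pry"

def apply_discount(order)
  total = order[:items].inject(0) { |sum, item| sum + item[:price] * item[:qty] }
  # Execution stops here: I can inspect `order` and `total`, call methods
  # on them, then type `exit` to let the program continue.
  binding.pry
  total * (1 - order[:discount])
end

apply_discount(items: [{ price: 10.0, qty: 2 }], discount: 0.1)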

Enfin

This is sort of the point of this disorganized jumble of incomplete thoughts. You will sometimes see sentiments like “Ruby is dying because everyone is going to go to X because it’s faster/better”. If X is Go or Rust you’re just talking about a different kind of thing. They’re really interesting and useful languages but you can’t get away from the fact that they impose a certain style and pace of programming that departs from what makes Ruby a great language.

Actually, I thought this guy expressed things pretty well:

I’m thinking Ruby is the most depressive programming language community. Every week there’s some Ruby figure handwringing in public.

— Joseph Method (@method3000) July 6, 2015

People freak out about what Node, Go, Haskell, or Elixir can do. People write diatribes about Ruby “dying”, about things “killing” Ruby.

— Joseph Method (@method3000) July 6, 2015

The bottom line is that Ruby has the best, most expressive syntax. It is a language for writing code that looks like pseudocode.

— Joseph Method (@method3000) July 6, 2015

Ruby optimizes for rapid prototyping and inter-developer communication. It has some inherent limitations and some accidental limitations.

— Joseph Method (@method3000) July 6, 2015

The accidental limitations are lack of concurrency and the GIL. The inherent limitation is that it can’t do certain compiler optimizations.

— Joseph Method (@method3000) July 6, 2015

So these things being said, pick your poison. What are you optimizing? Code accessibility and communicability? Performance over everything?

— Joseph Method (@method3000) July 6, 2015

I’m just saying, have perspective. If you tell me that Node.js is going to take over the world okay I’ll bite. But don’t tell me Haskell is.

— Joseph Method (@method3000) July 6, 2015

Haskell will be a useful tool in some people’s infrastructures but it’s never going to be a main driver because HAVE YOU SEEN HASKELL?

— Joseph Method (@method3000) July 6, 2015

Likewise, more people will code in Go or Rust for the performance. But don’t pretend everyone is going to do it. Because it’s low-level.

— Joseph Method (@method3000) July 6, 2015

Programming language use isn’t decided just on technical merits of the interpreter or compiler but on external factors like developer pool.

— Joseph Method (@method3000) July 6, 2015

Anyway this is why Ruby isn’t “dying”. So seriously just take your angst and use it to make Ruby better.

— Joseph Method (@method3000) July 6, 2015

I would just add a couple more nuances. I’m not saying that Ruby will never go out of fashion. I’m just dubious that it is currently “dying” or that it will “die”. Apart from the fact that major programming languages never actually die, there’s also the assumption that Ruby can’t improve and overcome many of its pain points. But there is still an incredible vibrancy and degree of communal care in the Ruby world that creates tools like Rubocop and that can mount efforts like Matz’s proposal to improve Ruby’s speed 3x.

Also I’m not saying that it would be unusual for someone to prefer another language, static or dynamic, over Ruby. It sounds like Elixir is really special, and the fact that it is dynamic, has Ruby-like syntax, AND is fast makes it really intriguing. And as I’ve mentioned before I’m especially interested in Crystal’s approach that combines Ruby-like syntax with compilation and inferred types (although the project is moving toward more explicit type declarations to improve compilation times). Really what I’m saying is that we should keep innovating to make the best programming language, which includes the concomitant development experience, with the caveat that you can’t actually make one language that everyone will prefer for every occasion and use case. And that’s just fine.

Bootstrapping

Crystal is a cool new language that I like. One thing that’s neat about it is that even the compiler (as opposed to the runtime libraries) is written in Crystal. That is, the compiler is written in the same high level language that the compiler is intended to compile! As an aside, this is really powerful and important for the future of the language, because of the way that it enables OOP programmers to contribute to the language core. Compare with MRI Ruby, which is written in C, so that you have a divide between core developers who can follow what’s going on in the C code and everyone else.

Anyway, I intend to write more about Crystal later, but here’s the quick point. A version of Crystal that has the compiler written in Crystal has to be compiled… by Crystal. So the way this works is that Crystal version 0.9.0 is compiled by a precompiled binary of Crystal version 0.8.0. If you install crystal-lang from Brew you will see this happening: it pulls down a tar file of 0.8.0 in order to compile 0.9.0.

Obviously this leads to a chicken-and-egg regress. What compiled 0.8.x? 0.7.x, and so on. So where did 0.1.0 come from? It came from Ruby! That is, Crystal’s compiler used to be written in Ruby, before Crystal learned how to bootstrap itself.

It might seem that the regress stops there, at the point before the Crystal bootstrap, but MRI Ruby’s C code has to be compiled by something as well, for example the gcc compiler. And how is a compiler like gcc compiled? It is bootstrapped in another regress that goes back to an assembler. Honestly I get fuzzy at this point, but my understanding is that at some point in the past you arrive at humans physically etching circuitry or manually feeding punch cards into machines. That is, humans have to start the sequence before you can get to programs that generate other programs and machines that build other machines. And of course humans bootstrap each other, down through the generations.

Anyways, I know I just recapitulated the concept of the historical regress, from which Aristotle inferred the existence of a Prime Mover. What’s striking for me though is how this account of bootstrapping conflicts with the dominant sense in computing of infinite, cheap, meaningless reproducibility. Every program and document is capable of being reproduced millions of times across the Internet. No particular bit or line of code residing on a particular computer is special. But bootstrapping shows how all code is indebted to a certain lineage, even as it is also potentially free from that legacy to develop in new directions. Well, all the parallels and analogies flow from there.

Slow Vim? Check your ~/.zshenv

This is probably a “duh” tip, but here goes: don’t put heavy initialization code in ~/.zshenv, because ~/.zshenv is sourced by every zsh instance, including the shells that a program like Vim spawns for external commands, which makes the program seem slow.

Like every single programmer I maintain a repo of my dotfiles that I constantly tinker with. At some point I was reorganizing my ~/.zshrc and I moved some things into ~/.zshenv according to my (weak) understanding of how these things should be separated, mainly based on the advice of this Stack Overflow page.

So I gathered that you should put important environment variables there, but as you can see I also decided that this is where you should put the initialization code for rbenv, direnv, etc.

Don’t do this. I thought that Vim (actually I use Neovim these days) had become horribly slow and was inclined to blame this on individual plugins or just Vimscript in general. As soon as I moved the offending code out of ~/.zshenv into ~/.zshrc my Vim sped up dramatically.

Raspberry Pi wifi+ssh connection disconnecting (Ourlink rtl8192cu)

I started playing around with a new Raspberry Pi 2 that arrived yesterday. I had used a Beaglebone before (always connected to ethernet) so I knew what to expect in terms of memory and CPU limitations. What surprised me was how flaky ssh connections were when connected over wifi using this USB wifi dongle (an Ourlink RTL8188CUS/RTL8192cu chipset) from Adafruit. I could connect for a second and then if I paused for even a moment the connection would drop and I would get  a “Host is down” response if I tried to reconnect. Usually I would need to power cycle the Pi to be able to ssh in again.

Googling for similar issues I found the following advice:

  • Some 5V power sources actually deliver less than 5 volts. Also some chargers don’t deliver enough current (A) to supply both the Raspberry Pi and the wifi dongle. To guarantee that this wasn’t the issue I added a powered USB hub to run the wifi dongle. This didn’t solve the issue.
  • You can enable an ssh option to send a “null packet” every X seconds. You can set this on the server-side or on the client-side. On the server in /etc/ssh/sshd_config:
    ClientAliveInterval 60

    On the client in ~/.ssh/config:

    ServerAliveInterval 60

    This helped out considerably. This made sense since I could stay connected by just running ping in the ssh session or typing ls constantly. I would still lose connection after a while and would have to wait for something in the Pi to reset.

  • The real solution (and the point of this post) is suggested here. Apparently the wifi dongle has a power conservation mode that is causing it to disconnect from wifi. In hindsight, this is kind of obvious (I could see the Pi disconnecting from the router) but it took me forever to find this tip. Anyway this fixed everything:
    echo "options 8192cu rtw_power_mgnt=0 rtw_enusbss=0" | sudo tee --append /etc/modprobe.d/8192cu.conf

Hopefully this will save someone else some time.


Meanwhile

I mean to write some longer pieces here soon. I’ve put some energy into various posts on forum.stupididea.com. I would especially recommend the links category, which is where I put interesting things that I find through my feed reader. I also write over at my company’s blog on software development topics, mostly Ruby-related:  http://wegowise.github.io/. When I write over there of course it’s in a certain voice and I have to be slightly less opinionated, so I’ll still talk about software development here.

Review of It’s Complicated, by danah boyd

http://www.amazon.com/Its-Complicated-Social-Lives-Networked/dp/0300166311

(This book is actually available for free download here: http://www.danah.org/itscomplicated/, though it would be good to support her work if you like it.)

The remarkable thing about It’s Complicated is that danah boyd actually talks to teenagers for her findings. As she describes the different popular attitudes and beliefs about teens and social media it’s striking how little of it is actually based on sound data, and how much of it is based on the selling of fear by news media, barely concealed inter-generational prejudice and sheer intellectual laziness.

In one telling story, boyd (like bell hooks, she doesn’t capitalize her name) talks about a girl who killed her mother and who had also been active on Myspace, and describes how the media translated this event into “Girl with Myspace kills mother”. boyd identifies this as a common theme: stories that would otherwise be about broken homes or social dysfunction become stories about social media, with the implication that social media creates the pathology. She argues that technology is such a locus for our hopes and anxieties that the realities of how technology is actually used by teenagers become distorted, resulting in bad public policy that often causes real harm.

For her own research she simply sits down with a wide array of kids and asks them questions about how and why they use social media. She uses a qualitative, ethnographic method, meaning that there’s no attempt to statistically verify the universality of her findings with surveys containing standardized questions and answers. Instead she identifies common themes from hundreds of interviews conducted over the last decade. This style of research strikes me as a good way to open up a field of social research, because it starts from reported phenomena rather than from aggregated data that already embeds assumptions and is subject to multiple interpretations. Many of her findings run directly counter to conventional wisdom, and others seem completely overlooked by popular media.

One common finding is that many kids either have little unstructured social time, because of over-programming by parents, or their parents don’t allow them to leave the house or meet with friends because of safety fears.  Social media becomes more important to teenagers as a result (the teens keep telling boyd that they would always rather hang out with their friends in person). Another finding is that teenagers use the different social networks in different ways (just like adults do) and struggle to maintain their privacy in the face of confusing settings and social circle boundary collapses (just like adults do). She talks about teens who are exasperated with parents who jump into Facebook conversations that are clearly not intended for them. She also talks about teens who employ clever tricks to avoid “drama” stemming from social networks, like deleting all comments on their wall every day (“whitewalling”) and a teen who temporarily suspends her account every day instead of logging out.

A more general theme is the conflict between parents’ concerns about teenagers and teenagers’ own perceived interests. Adults are concerned with ensuring that teenagers are doing their homework, not getting involved with bad kids, and keeping their digital records clean for future employers. Teenagers on the other hand are interested in entering into public life wherever it can be found. Since other kinds of ‘publics’ are withheld from teenagers, even to the extent that many teenagers are prevented from gathering,  ‘networked publics’ are elevated in importance for teenagers, and they are willing to make more trade-offs in terms of privacy or potentially negative representations.

What adults see as irresponsible or even “obsessive” interest in social media is for boyd a rational response to the developmental stage of being a teenager given existing social conditions. She attributes the special anxiety that parents have towards the Internet and social media to the fact that it allows teenagers to enter into various ‘publics’. boyd argues that teenagers need to be able to step out and operate in these publics in order to do their ‘identity work’, to develop into social adults. The conflict between teenagers and adults is thus a disagreement about how valuable that online social engagement is. To teenagers it feels very important but adults tend to discount it, often without compassion.

Another theme that runs throughout the book is the idea that technology by itself generally does not create social problems, nor does it offer solutions on its own. In one chapter boyd looks at the fears about sexual predators on the Internet. She finds that the risk is extremely low for teenagers overall, but that the teenagers who do get involved with adults through the Internet tend to come from troubled households and engage in risky behaviors in real life as well. Likewise, in a chapter on whether teenagers are “digital natives”, boyd points out what should be obvious: like other forms of literacy technological literacy tracks with income and social class. Similarly social media doesn’t necessarily promote equality or erase racism, but largely reproduces existing social networks and attitudes. In a chapter on online bullying, she allows that technological ‘affordances’ can help spread harassing messages farther and wider, but she reports that her informants claim that bullying is not a big issue for them (instead they talk about “drama”, which doesn’t involve a power differential). Her message is that while technology can amplify or alter behaviors online, it does not necessarily create the behaviors or the underlying non-technological conditions behind them.

The book is written in an accessible style with a minimum of jargon that clearly enunciates its arguments and findings. These points are so counter to popular views on teenagers and social media that the exposition can be forgiven for being somewhat cautious and repetitive. boyd does a good job of not assuming any academic background on the part of the reader, and gives a clear explanation of the few theoretical constructs that she needs to make her case (she does have the annoying-to-me academic tic of referring to things as “problematic”). Overall, this is a great model for how to communicate challenging ideas to a wide audience. I would recommend this book to parents, educators, policy makers, journalists–anyone who would like to understand teenagers, rather than just demonize them.

Webcomics

Somebody asked me to make a list of webcomics that I subscribe to (using Feedly these days) and I started writing little descriptions next to each one. I figured I’d just place it here in case anyone else is interested. These are the webcomics that have made the cut after having tried out and abandoned many more. Here they are, in no particular order (well, the order of the tabs in Chrome at the moment). I read all of these but some I recommend more whole-heartedly than others (recommended comics have a * next to them).

  • * Hark! A Vagrant — The panels require basic knowledge of literature and history, but it’s Kate Beaton’s art style that usually clinches the joke. She draws insanity and idiocy extremely well.
  • * Left Handed Toons — The conceit is that the right-handed artist draws with his left hand, but he’s been doing it so long that it doesn’t seem like much of a hindrance anymore. The comic relies on brutally dumb puns and literalisms, as well as a recurring cast of characters like Fridaynosaur, Whale, and General Twobabies.
  • Penny Arcade — Mainly of interest to people who play video games. Obscure references to recent video games and lots of gross-out humor.
  • PhD — Only occasionally funny. This is basically Cathy for academics. The jokes are all: graduate students are overworked; dissertations are stressful; advisors are clueless.
  • * Pictures for Sad Children — Grim strips that follow a depressive logic where events often take a surreal turn but nobody acts surprised. People end up inside dogs and idly discuss what to do, etc. Pretty great if you have the fortitude for this kind of thing. Not updated these days.
  • * Poorly Drawn Lines — Just started reading this, but so far the gags are good, absurd but not particularly dark.
  • *  Saturday Morning Breakfast Cereal — SMBC’s Zach Weiner is on par with Gary Larson and xkcd’s Randall Munroe in terms of raw creativity. He does both one-panel gags and longer multi-page stories. The comics generally have a science and science fiction bent with a solid grasp on related philosophical issues. Weiner is also a significantly better illustrator than most webcomic artists.
  • Sheldon — This is more of a traditional syndicated comic, but it occasionally has some interesting storylines.
  • * Subnormality — These are irregularly updated bizarrely dense comics that typically take up several screens. The comics are extremely dialogue-heavy, to the point where I frequently skip them not because I dislike the stories but simply because I don’t have the time to read them all. The stories typically take the form of externalized inner dialogues about insecurity, projection, prejudgment, etc.
  • * Gutters — One-off comics about the comic book industry by comic book artists.
  • The Trenches — Episodic strip about game testers.
  • * Wondermark — Hilarious comic assembled from old illustrations to create absurd hybrid 19th- and 21st-century jokes. Has an alt text joke.
  • * xkcd — Should need no introduction. Alt text joke.
  • * Girls with Slingshots — A well-drawn sitcom about two girls who do polyamory. Sexy but not explicit.
  • * Cyanide & Happiness — Incredibly crass jokes that could be fairly criticized as promoting all kinds of bad culture, but I would be lying if I said I didn’t find many of them to be amusing.
  • * Dinosaur Comics — Dinosaur Comics is what it is: goofy (secretly rather intelligent) rambling about language and culture overlaid on an invariant set of six panels.
  • * Dilbert — Just because Dilbert cartoons are a cubicle cliche doesn’t mean that they don’t generally speak a certain truth. It’s the same stuff over and over about pointy haired bosses and lazy coworkers, but the punchline is usually pretty fresh and often surprisingly edgy.
  • Diesel Sweeties — People seem to love this comic. It’s sufficiently interesting but doesn’t blow me away. It’s mainly jokes about killer robots and coffee.
  • * Crocodile in water tiger on land — Commentary about Indian society, delivered in a self-satirizing manner by a cast of Indian archetypes that I have to partially construct from context (e.g., the hipster, the religious zealot, the business fat cat, etc.) I don’t totally follow the issues being discussed, but it gives me some insight into Indian society and provides a valuable example of cultural self-criticism.
  • * Cat and Girl — I can’t say that Cat and Girl is usually or even often laugh out loud funny. In fact I can’t say that I generally 100% understand Dorothy’s comics. The strips tend to reward taking time to do an analysis of the words and symbolism to derive something like a thesis statement. The thesis statement is often about the nature of authenticity as an ineffable criterion for value in the context of postindustrial society and the age of digital reproduction and social networks. This will certainly not be everyone’s cup of tea, but it definitely is mine.
  • * Blaster Nation — This is another geeky narrative sitcom. Digging the story so far.
  • * Bad Machinery — Scooby-doo-style mystery stories set in England. The fun though is in the hilarious banter between the kids.
  • * Abstruse Goose — Similar to xkcd, jokes about being geeky with a focus on math and science.
  • * A Softer World — Three-image strips with some text that is generally a funny-sad statement about love and loss.
  • * Hijinks Ensue — Violent-gross jokes about geek dude culture.
  • * Scenes from a Multiverse — Strips rotate through several universes. The most popular ones get to come back.
  • * Perry Bible Fellowship — Wonderfully evil, beautifully drawn comic. Every punchline is designed to disturb you. Not updated these days.
  • * Achewood — I’m not going to bother describing this comic. It’s about some dog people and the writer Chris Onstad has a wonderful ear for spoken English. Sadly not updated for more than a year.

Helpful commands

As a follow-up to my last post here are some commands that I use throughout the day. They are admittedly nothing special but they help me out.

.bashrc aliases:

alias grc='git rebase --continue'
alias gca='git commit -a --amend'
alias gs='git status -sb'
alias gd='git diff'
alias gsn='git show --name-only'

The one worth explaining is gca. It stages and amends everything into the previous commit. I use this constantly to keep adding stuff to my WIP commits. One thing to watch out for: if you use it to finish resolving a merge conflict in the middle of a rebase, you’ll end up folding the conflict resolution into the previous commit instead of completing the rebase step. In that situation you want grc instead.

scripts:

force_push — I use this to automate the process of updating my remote branch and most importantly to prevent me from force pushing the one branch that I must NEVER force push.

#!/usr/bin/env bash
CURRENT_BRANCH=`git rev-parse --abbrev-ref HEAD`
if [ "$CURRENT_BRANCH" = "master" ]; then
  echo "YOU DO NOT WANT TO DO THAT"
  exit 1
fi

echo "git push origin $CURRENT_BRANCH --force"
read -p "Are you sure? [Yn] "
if [ "$REPLY" != "n" ]; then
  git push origin "$CURRENT_BRANCH" --force
fi

rebase_branch — There’s not really a lot to this, but I use it reflexively before I do anything.

#!/usr/bin/env bash
git fetch
git rebase -i origin/master

merge_to_master — I do this when I’m done with a branch. This makes sure that there will be a clean fast-forward push. Notice how it reuses rebase_branch.

#!/usr/bin/env bash
rebase_branch
CURRENT_BRANCH=`git rev-parse --abbrev-ref HEAD`
echo "git checkout master"
git checkout master
echo "git pull origin master"
git pull origin master
echo "git merge $CURRENT_BRANCH"
git merge "$CURRENT_BRANCH"

git-vim — this one is still a bit of a work in progress, but the idea is to grab the files you’ve changed in Git and open them in separate tabs inside Vim. You can then run it with git vim which I alias as gv.

#!/usr/bin/env ruby

# files with uncommitted changes (staged or unstaged) relative to HEAD
files = `git diff HEAD --name-only`.split("\n").reject(&:empty?)
if files.empty?
  # fall back to the files touched by the latest (WIP) commit;
  # --pretty=format: suppresses the commit header so only file names remain
  files = `git show --name-only --pretty=format:`.split("\n").reject(&:empty?)
end

# pass the files as separate arguments so names with spaces survive
system("vim", "-p", *files)

Of course, all these scripts need to be put somewhere in your executable path. I put them in ~/bin and include this location in my path.

So my workflow looks like this:

git checkout -b new_branch
# hack hack hack
git commit -a
# hack hack hack
gca
# hack hack hack
gca
# all done now
rebase_branch
# whoops a merge conflict
# resolve it
git add .
grc
# Time to get this code reviewed on Github
force_push
# Code accepted, gonna merge this
merge_to_master

Git workflow

In my last post I described how at my work we use code review feedback to iteratively improve code. I want to describe how Git fits into this process, because this is probably the biggest change I had to make to my preexisting workflow. Basically I had to relearn how to use Git. The new way of using it (that is, it was new to me) is extremely powerful and in a strange way extremely satisfying, but it does take a while to get used to.

Importance of rebasing

I would describe my old approach and understanding as “subversion, but with better merging” ((Not an insignificant improvement, since merging in Subversion sucks.)). I was also aware of the concept of rebasing from having submitted a pull request to an open source project at one point, but I didn’t use it very often for reasons I’ll discuss later. As it turns out understanding git rebase is the key to learning how to use Git as more than a ‘better subversion’.

For those who aren’t familiar with this command, git rebase <branch> takes the commits that are unique to your branch and places them “on top” of another branch. You typically want to do this with master, so that all your commits for your feature branch will appear together as the most recent commits when the feature branch is merged into master.

Here’s a short demonstration. Let’s say this is your feature branch, which you’ve been developing while other unrelated commits are being added to master:

Feature branch with ongoing mainline activity

If you merge without rebasing you’ll end up with a history like this:

History is all jacked up!

Here is the process with rebasing:

# We're on `feature_branch`
git rebase master # Put feature_branch's commits 'on top of' master's
git checkout master
git merge feature_branch

This results in a clean history:

Feature branch commits on top

Another benefit of having done a rebase before merging is that there’s no need for an explicit merge commit like you see at the top of the original history. This is because — and this is a key insight — the feature branch is exactly like the master branch but with more commits added on. In other words, when you merge it’s as though you had never branched in the first place. Because Git doesn’t have to ‘think’ about what it’s doing when it merges a rebased branch it performs what is called a fast forward. In this case it moved the HEAD ((Which I always think about as a hard drive head, which I in turn think about as a record player needle)) from 899bdb (More mainline activity) to 5b475e (Finished feature branch).

The above is the basic use case for git rebase. It’s a nice feature that keeps your commit history clean. The greater significance of git rebase is the way it makes you think about your commits, especially as you start to use the interactive rebase features discussed below.

Time travel

When you call git rebase with the interactive flag, e.g. git rebase -i master, git will open up a text file that you can edit to achieve certain effects:

Interactive rebase menu

As you can see there are several options besides just performing the rebase operation described above. Delete a line and you are telling Git to disappear that commit from your branch’s history. Change the order of the commit lines and you are asking Git to attempt to reorder the commits themselves. Change the word ‘pick’ to ‘squash’ and Git will squash that commit together with the commit on the preceding line. Most importantly, change the word ‘pick’ to ‘edit’ and Git will drop you just after the selected ref number.

I think of these abilities as time travel. They enable you to go back in the history of your branch and make code changes as well as reorganize code into different configurations of commits.

Let’s say you have a branch with several commits. When you started the branch out you thought you understood the feature well and created a bunch of code to implement it. When you opened up the pull request the first feedback you received was that the code should have tests, so you added another commit with the tests. The next round of feedback suggested that the implementation could benefit from a new requirement, so you added new code and tests in a third commit. Finally, you received feedback about the underlying software design that required you to create some new classes and rename some methods. So now you have 4 commits with commit messages like this:

A messy commit history:
  1. Implemented new feature
  2. Tests for new feature
  3. Add requirement x to new feature
  4. Changed code for new feature

This history is filled with useless information. Nobody is going to care in the future that the code had to be changed from the initial implementation in commit 4 and it’s just noise to have a separate commit for tests in commit 2. On the other hand it might be valuable to have a separate commit for the added requirement.

To get rid of the tests commit all you have to do is squash commit 2 into commit 1, resulting in:

  1. Implemented new feature
  2. Add requirement x to new feature
  3. Changed code for new feature

New commit 3 has some code that belongs in commit 1 and some code that belongs with commit 2. To keep things simple, the feature introduced in commit 1 was added to file1.rb and the new requirement was added to file2.rb. To handle this situation we’re going to have to do a little transplant surgery. First we need to extract the part of commit 3 that belongs in commit 1. Here is how I would do this:

# We are on HEAD, i.e. commit 3
git reset HEAD^ file1.rb
git commit --amend
git stash
git rebase -i master
# ... select commit 1 to edit
git stash apply
git commit -a --amend
git rebase --continue

It’s just that easy! But seriously, let’s go through each command to understand what’s happening.

  1. The first command, git reset, is notoriously hard to explain, especially because there’s another command, git checkout, which seems to do something similar. The diagram at the top of this Stack Overflow page is actually extremely helpful. The thing about Git to repeat like a mantra is that Git has a two-step commit process: staging file changes and then actually committing. Basically, when you run git reset REF on a file it stages the file as it looked at that ref. In the case of the first command, git reset HEAD^ file1.rb, we’re saying “stage the file as it looked before HEAD’s change”; in other words, revert the changes we made in the last commit.
  2. The second command, git commit --amend commits what we’ve staged into HEAD (commit 3). The two commands together (a reset followed by an amend) have the effect of uncommitting the part of HEAD’s commit that changed file1.rb.
  3. The changes that were made to file1.rb aren’t lost, however. They were merely uncommitted and unstaged. They are now sitting in the working directory as an unstaged diff, as if they’d never been part of HEAD. So just as you could do with any diff you can use git stash to store away the diff.
  4. Now I use interactive rebase to travel back in time to commit 1. Rebase drops me right after commit 1 (in other words, the temporary rebase HEAD is commit 1).
  5. I use git stash apply to get my diff back (you might get a merge conflict at this point depending on the code).
  6. Now I add the diff back into commit 1 with git commit --amend -a (-a automatically stages any modified changes, skipping the git add . step).

This is the basic procedure for revising your git history (at least the way I do it). There are a couple of other tricks that I’m not going to go into detail about here, but I’ll leave some hints. Let’s say the changes for the feature and the new requirement were both on the same file. Then you would need to use git add --patch file1.rb before step 2. What if you wanted to introduce a completely new commit after commit 1? Then you would use interactive rebase to travel to commit 1 and then add your commits as normal, and then run git rebase --continue to have the new commits inserted into the history.

Caveats

One of the reasons I wasn’t used to this workflow before this job was because I thought rebasing was only useful for the narrow case of making sure that the branch commits are grouped together after a merge to master. My understanding was that other kinds of history revision were to be avoided because of the problems that they cause for collaborators who pull from your public repos.  I don’t remember the specific blog post or mailing list message but I took away the message that once you’ve pushed something to a public repo (as opposed to what’s on your local machine) you are no longer able to touch that history.

Yes and no. Rebasing and changing the history of a branch that others are pulling from can cause a lot of problems. Basically, any time you amend a commit message, reorder commits, or alter a commit’s contents you actually create a new object with a new SHA reference. If someone else naively pulls from your branch after having pulled the pre-revised history they will get a weird set of duplicate code changes, and things will get worse from there. In general, if other people are pulling from your public (remote) repository you should not change the history out from under them without telling them. Linus’ guidelines about rebasing here are generally applicable.

On the other hand, in many Git workflows it’s not normal for other people to be pulling from your feature branch and if they are they shouldn’t be that surprised if the history changes.  In the Github-style workflow you will typically develop a feature branch on your personal repository and then submit that branch as a pull request to the canonical repository. You would probably be rebasing your branch on the canonical repository’s master anyway. In that sense even though your branch is public it’s not really intended for public consumption. If you have a collaborator on your branch you would just shoot them a message when you rebase and they would do a “hard reset” on their branch (sync their history to yours) using git reset --hard remote_repo/feature_branch. In practice, in my limited experience with a particular kind of workflow, it’s really not that big a deal.

Don’t worry

Some people are wary of rebase because it really does alter history. If you remove a commit you won’t see a note in the commit history that so-and-so removed that commit. The commit just disappears. Rebase seems like a really good way to destroy your own and other people’s work. In fact you can’t screw up too badly using rebase, because every Git repository keeps a log of the changes that have been made to the repository’s history, called the reflog. Using git reflog you can always back out misguided recent history changes by returning to a point before you made them.

Hope this was helpful!

Work

My new work is great ((New as in I’ve been there for ~6 months )). The main thing I like about it is the consummate professionalism of the team. Everyone I work with is interested in improving their craft and is eager to engage in discussions about software principles, code quality, development processes and tools. In general there is a noticeable group ethos that seems guided by the following set of principles:

Take the time to make your work as high-quality as possible given the scope of the task at hand. This means writing tests and undertaking refactorings to manage complexity in the normal course of development. It also means often having to partially or fully revisit a design based on new findings that come out during development. This may seem like a recipe for low productivity but our team is pretty consistent in its output: to use a decidedly fake and wobbly measurement, each team member seems capable of producing about two significant user-facing features per week. I know from personal experience that the alternative, programming to hard deadlines and arbitrary deployment cycles ((These deadlines and cycles often end up being Procrustean beds for feature requirements.)), may at times produce the superficial appearance of a greater amount of output (“With Monday’s release we closed ten stories and five bugs!”) but this is just a loan taken out against the day when you have to devote entire “sprints” to fixing the earlier rushed implementations. Taking the requisite time with each task is consonant with the recognition that growth in complexity in a system has to be managed, either all at once in a great “redesign” crusade, or in small thoughtful efforts as you go.

Even sufficiently good work can be improved dramatically through constructive criticism and iterative development. This was something I’ve had considerable trouble with at times, as a person who avoids conflict and criticism at all cost. The key to being able to accept the criticism (which normally comes in the form of code reviews on Github), I’ve found, is to focus on the improvement to the final product brought about by incorporating the criticism. During each round of code review feedback instead of fighting the suggestions I try to let go of my ego (not at all easy) and visualize how the code will look after I implement the changes; I’m generally much happier with the final code than I am with the initial submission. I also try to approach criticism as containing information about what I should do and what I should avoid in the future, so that even when I am receiving brutal criticism about the quality of my code I receive some comfort from the fact that I am learning how to avoid similar criticism in the future ((Did I mention that I don’t like criticism?)).

There’s more than one way to do it, but you should be able to articulate the reason for why you’re doing something a certain way, even if the articulation of that reason is “because that’s the way I learned it initially and there’s no compelling reason to change”. This of course opens you up to the possibility that somebody else has a positive argument for a certain practice, in which case it’s hard to argue for the original practice purely on the basis of apathy. Often the process of asking oneself why one prefers one practice to another ends up extracting motivations and values that are more interesting than the actual issue of practice in question. As a team we discuss many issues great and small, from high-level questions about design patterns and object composition to nitty-gritty questions about code block indentation. Sometimes these discussions result in some binding decision regarding style or best practice, but on other occasions we’ve articulated dueling reasons and concluded that the arguments for both are strong enough that both practices or styles are valid for different contexts.

Work to improve the product and support your teammates. I know this one sounds like a cheerlead-y cliché, but all I mean by this is that the primary motivator for working hard is to make the application and its codebase better, both because these support one’s self-interest of having the company succeed and remaining employed, but also because you see your teammates working hard and want to make a similar contribution. The codebase and app are a communal creation and as one works with it one begins to feel a sense of loyalty to the whole, as well as a sense of pride for the sections of code that you’ve played a major role in shaping. This is worth mentioning mainly because it stands in contrast to other motivational systems, like those that depend on fear of punitive action or explicit intra-team competition. There are still elements of fear and competition at work in our system, but they are sublimated into cooperative and high-productivity behaviors ((That is, people compete to be the most helpful!)).

This ethos doesn’t come from nowhere. It is the product of specific values that are held and promoted by my boss, and it is sustained through a combination of the personalities of the people he chose to hire and the practices (often enforced/encouraged by technological tools) he put in place. It is an impressive accomplishment that is easier to describe than it would be to replicate.

Anyway, it’s a pretty cool job.