Indenting with Tabs and Spaces
The oft-repeated “don’t mix tabs and spaces” is wrong.
In this article, I’ll explore what programmers use indentation for, the advantages and disadvantages of indenting with tabs or spaces alone, and why mixing them is the best, if not objectively correct, way to indent. I am certainly not the first to make this claim, and I’m sure I won’t be the last. But this point is far from moot; many, many, many, MANY people still advocate using a fixed number of spaces. This article will show why they are wrong.
Indentation: The Basics
The Two Goals of Indentation
Unless you’re using a language with whitespace-based syntax — the most notorious example of which is Python — your code will run just fine no matter how it’s indented. So why indent at all? When writing code, indentation is used to accomplish two goals:
- It makes the structure of your code, as a computer would see it, clear to a human as well; when code is indented correctly, the hierarchical relation between statements and scopes is visible on the screen. This way, you and the computer agree on the structure of the code. We will refer to indentation used this way as semantic indentation.
An example of semantic indentation is below. All examples in this article will be in JavaScript, but the principles apply to every language.[1]
// The initial indent level is 0
function f(x, y) {
    // We're inside the function, so we indent one level
    if (x) {
        // Now we're inside the `if-elseif-else`, so we indent one additional level
    } else if (y) {
        // Still inside `if-elseif-else`
        for (let i = 0; i < 10; i++) {
            // Another scope --> another indent
            console.log(`Logged ${i+1} times`);
        }
    } else {
        // Still inside...
    }
    // Now we're not, so we go back to the function-level indent
}
// And now we're outside the function again, so back to 0
- It is used for typesetting/alignment. This is most useful when items that are part of a single statement have to be displayed over multiple lines and you wish to horizontally align the lower items with the first item. We will refer to indentation used this way as aesthetic indentation.
Below is an example of this kind of indentation with two equivalent styles shown.
Style 1: operators at the start of the line
if (superLongConditionNumber1
    && superLongConditionNumber2
    && superLongConditionNumber3)
{
    /* Do stuff */
}
Style 2: operators at the end of the line
if (superLongConditionNumber1 &&
    superLongConditionNumber2 &&
    superLongConditionNumber3)
{
    /* Do stuff */
}
In both styles[2], the conditions in the if statement are all aligned to the same column, column 4. When everything is in the same place on each line, the code becomes much easier to parse quickly; it’s almost like you’re looking at a table of text instead of lines of text.
For alignment to work consistently, it’s imperative that you use a monospace font, that is, a font in which every character has the same width (as opposed to a variable-width font). Otherwise the alignment will be sensitive to the relative widths of the characters, and thus will break if you switch fonts. In this article, all the code blocks are written with a monospace font, a custom variant of Iosevka.
So, what happens when you need to mix both styles of indentation? What happens when, on a single line, you need to use both semantic indentation and aesthetic indentation? Can this situation even occur naturally? (Yes.) Before we talk about that, we should discuss what indentation actually is.
A Brief History of the Tab Key
The tab key, Tab, can trace its history back to the tab stop on typewriters. On a typewriter, tab stops were used to create fixed horizontal positions for the carriage (the part that holds the paper and moves horizontally as you type) to travel to. This was useful when typing tabular data — hence the name tab stop — as once you had placed tab stops at the start of each of the table’s columns, you could quickly advance from one column to the next by pressing the Tab key.[3] This was faster and more reliable than pressing the space key as many times as necessary to advance to the next column. (When you were done with the last column on the line, instead of pressing Tab you would pull the carriage return lever to move to the next line and return the carriage to the leftmost position.) When computers began to replace typewriters, the Tab key kept the same behavior: advancing to the next preset location on the line. The question was: what is the right way to implement this behavior in a virtual text field?
The Tab Character
Wite-Out notwithstanding, typewritten text is a write-once format, so when advancing the carriage there was no difference between using tab stops and repeatedly pressing the space bar; tab stops just made advancing the carriage more consistent and convenient.
But computers allow text to be edited (duh), so when people started using them, the question of how a computer should store the fact that the Tab key was pressed needed to be settled.
Should it simulate pressing space enough times to move over to the next virtual tab stop — say, some multiple of eight characters from the start of the line?
Well, if spaces were inserted, then if any edits were made to the previous text on the line, subsequent tab stops would become misaligned and the user would have to manually correct the number of spaces inserted.
This wouldn’t do; computers are supposed to automate this kind of boring work for us.
Instead, the answer was the tab character, U+0009, often represented in source code as \t.
Throughout this article, I will refer to this character as <TAB>.[4]
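As a quick check of the claim above, the following sketch confirms that the \t escape in JavaScript source denotes the single character U+0009. The constant name TAB is only an illustrative choice.

```javascript
// The \t escape and the Unicode escape \u0009 denote the same character.
const TAB = "\t";

console.log(TAB === "\u0009");   // true
console.log(TAB.codePointAt(0)); // 9
console.log("a\tb".length);      // 3: the tab is stored as one character
```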
Virtual Tab Stops
Throughout this article, within code blocks, ⤓ will represent tab stops, · will represent space between tab stops, ├╌┤ will represent a tab character (of variable width), and ␣ will represent a single space character.
Due to this representation of the <TAB> character, we will have to represent tabs of width 1 as ║, and tabs of width 0 as a single line behind the previous character, e.g., a.
The beauty of the <TAB> character, as opposed to using several spaces, is that the computer can dynamically set its width so that it always appears to fill the horizontal space to the next virtual tab stop.
When text to the left of a <TAB> character is edited, the computer can immediately resize the <TAB> character on the fly so that it is always flush with a tab stop.
In other words, if you use <TAB> to enter tabular data laid out in columns, the alignment of the columns will be preserved even as the content of the table changes.
For instance, if tab stops are set to multiples of eight characters, then the following shows how <TAB> will change widths to maintain the columns.
·······⤓·······⤓
├╌╌╌╌╌╌┤├╌╌╌╌╌╌┤
a├╌╌╌╌╌┤├╌╌╌╌╌╌┤
ab├╌╌╌╌┤x├╌╌╌╌╌┤
abc├╌╌╌┤xy├╌╌╌╌┤
abcd├╌╌┤xyz├╌╌╌┤
In this example, we chose a tab width of eight characters. This decision was arbitrary; let’s see what happens if we’d instead chosen a tab width of four characters.
···⤓···⤓
├╌╌┤├╌╌┤
a├╌┤├╌╌┤
ab├┤x├╌┤
abc║xy├┤
abcdxyz║
Looks good, although in the last row we ran out of space between the columns. What about a width of two characters?
·⤓·⤓·⤓·⤓
├┤├┤
a║├┤
abx║
abc║xy
abcdxyz║
Whoops! Our tab stops weren’t wide enough — they were only two spaces wide, but we tried to insert four characters in one column — so things got misaligned. But as long as the tab stop width is at least the width of the widest column of text entered within a column, everything works out.
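The resizing rule described above can be sketched in a few lines of JavaScript. This is only an illustration, not how any particular editor is implemented: expandTabs is a hypothetical helper, it assumes every character occupies one column, and it follows the common editor rule that a tab always advances at least one column.

```javascript
// Expand each tab into spaces up to the next tab stop.
// Assumption: one column per character; a tab always advances >= 1 column.
function expandTabs(line, tabWidth) {
  let out = "";
  for (const ch of line) {
    if (ch === "\t") {
      const pad = tabWidth - (out.length % tabWidth);
      out += " ".repeat(pad);
    } else {
      out += ch;
    }
  }
  return out;
}

// Three rows of two-column data, each cell followed by a tab.
const rows = ["a\tx\t|", "ab\txy\t|", "abc\txyz\t|"];
for (const width of [8, 4, 2]) {
  console.log(`tab width ${width}:`);
  for (const row of rows) console.log(expandTabs(row, width));
}
```

With widths of 8 and 4 the second column starts at the same position on every row; with a width of 2 the three-character cells overflow their column and the rows drift apart.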
So, if you’re entering data in columns, how wide should tab stops be? Eight characters? Four? Two? (One?) It depends on how wide you expect your columns to be.
But when indenting code, there is by definition no text within the indentation; it’s all whitespace. There is no minimum tab width; regardless of the tab width you choose, the (whitespace-only) columns will remain aligned. This means that the consideration of how wide to make the tab stops that comprise the indentation in your editor is immaterial. It is completely up to your personal preference. Which of the following styles do you prefer?
function f() {
        if (foo) {
                bar()
        }
}

function f() {
    if (foo) {
        bar()
    }
}
It’s entirely a matter of preference; both options perfectly preserve the alignment of the whitespace columns.
Camp 1: Only Tabs
Tabs for Semantic Indentation
Alice and Bob are working on the same codebase.
Having read the above, they decide that from here on out, they will only use <TAB> to indent their code.
That way they can each read each other’s code with the indent width that they prefer.
·······⤓·······⤓
// Alice prefers a tab width of 8
function f() {
├╌╌╌╌╌╌┤if (foo) {
├╌╌╌╌╌╌┤├╌╌╌╌╌╌┤bar()
├╌╌╌╌╌╌┤}
}
···⤓···⤓
// Bob prefers a tab width of 4
function f() {
├╌╌┤if (foo) {
├╌╌┤├╌╌┤bar()
├╌╌┤}
}
Looks great! Crucially, these two code snippets have the same underlying representation — they are byte-for-byte equal. Yet when Alice and Bob open this file on their own computer, they each get to see the code how they prefer. Same file, different appearance — that’s the magic of the tab character.
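To make the "same file, different appearance" point concrete, here is a small sketch. The render function and the sample source string are illustrative assumptions; the function only handles leading tabs, which is enough here because leading tabs start at column 0 and therefore each span exactly one tab width.

```javascript
// One stored file: indentation is written with tab characters only.
const source = "function f() {\n\tif (foo) {\n\t\tbar()\n\t}\n}";

// Show leading tabs at a viewer-chosen width without changing the stored text.
function render(text, tabWidth) {
  return text
    .split("\n")
    .map((line) => line.replace(/^\t+/, (tabs) => " ".repeat(tabs.length * tabWidth)))
    .join("\n");
}

console.log(render(source, 8)); // how Alice sees it
console.log(render(source, 4)); // how Bob sees it
```

Both calls read the same source string; only the displayed width of each indent level differs.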
Tabs for Aesthetic Indentation
Unfortunately, Alice and Bob run into a little bit of a wrinkle. Bob checks in the following code:
···⤓···⤓
function f() {
├╌╌┤if (superLongConditionNumber1
├╌╌┤├╌╌┤&& superLongConditionNumber2
├╌╌┤├╌╌┤&& superLongConditionNumber3)
├╌╌┤{
├╌╌┤├╌╌┤/* Do stuff */
├╌╌┤}
}
He is proud of how nicely aligned his code is; having the three condition lines aligned inside the if's parentheses looks great.
But soon afterward, Alice complains to him about how poorly aligned his code is.
Bob is confused — the code looked great on his computer.
But when he views his code on Alice’s computer, he sees the issue:
·······⤓·······⤓
function f() {
├╌╌╌╌╌╌┤if (superLongConditionNumber1
├╌╌╌╌╌╌┤├╌╌╌╌╌╌┤&& superLongConditionNumber2
├╌╌╌╌╌╌┤├╌╌╌╌╌╌┤&& superLongConditionNumber3)
├╌╌╌╌╌╌┤{
├╌╌╌╌╌╌┤├╌╌╌╌╌╌┤/* Do stuff */
├╌╌╌╌╌╌┤}
}
Whoops!
The if statement’s condition lines aren’t aligned anymore.
Unfortunately, Bob had meant to push over those condition lines by four characters, but he used <TAB>s to do it, and so the alignment he was so proud of only existed when using <TAB>s of width four.
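Bob's problem can be reproduced numerically. The sketch below uses a hypothetical startColumn helper to measure how far the continuation line is pushed over relative to the if line; the alignment only holds when that offset equals 4, the length of "if (".

```javascript
// Visual column at which the first non-whitespace character appears.
function startColumn(line, tabWidth) {
  let col = 0;
  for (const ch of line) {
    if (ch === "\t") col += tabWidth - (col % tabWidth);
    else if (ch === " ") col += 1;
    else break;
  }
  return col;
}

// Bob's lines: the continuation is pushed over with a second tab.
const ifLine = "\tif (superLongConditionNumber1";
const andLine = "\t\t&& superLongConditionNumber2";

for (const width of [4, 8]) {
  const offset = startColumn(andLine, width) - startColumn(ifLine, width);
  console.log(`tab width ${width}: continuation offset is ${offset} columns`);
}
```

The offset tracks the tab width, so it is 4 for Bob and 8 for Alice.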
Camp 2: Only Spaces
Spaces for Aesthetic Indentation
Alice’s and Bob’s coworker Simplicio[5] watches this unfold from a distance. He offers them a solution to their problem: “Just use spaces!”. Alice and Bob look at each other reluctantly, but Simplicio insists that Alice and Bob open his version of the file, shown below, on their computers.
·····⤓·····⤓
function f() {
␣␣␣␣␣␣if (superLongConditionNumber1
␣␣␣␣␣␣␣␣␣␣&& superLongConditionNumber2
␣␣␣␣␣␣␣␣␣␣&& superLongConditionNumber3)
␣␣␣␣␣␣{
␣␣␣␣␣␣␣␣␣␣␣␣/* Do stuff */
␣␣␣␣␣␣}
}
They all agree that no matter whose computer they view Simplicio’s version on, it looks the same. Reluctantly, Bob admits that this does fix his alignment problem. But Alice isn’t so sure about Simplicio’s solution.
Spaces for Semantic Indentation
“Wait”, says Alice, “this won’t do.”
“The code is aligned, is it not?”, asks Simplicio.
“Yes, but Bob and I are stuck with your ugly indentation! I like my indentation to be eight characters, and Bob likes his to be four.”
“I’m sorry”, says Simplicio, “but I just don’t see another way to solve this problem. If you want your code to be aligned, you’ll just need to indent with spaces, your preferences be damned.”
Sure enough, if they were to continue to use spaces for indentation, they would not each be able to use the indentation width they prefer. It seems that Simplicio’s “solution” has merely traded one problem for another.
Camp 0: (Responsibly) Mixing Tabs and Spaces
“Eureka!”, cries Alice. “I know the solution to this problem. I know how we can each use the indentation width we prefer while keeping our code nicely aligned!”
“Impossible”, retorts Simplicio.
“Let’s hear her out”, says Bob.
Her solution? Use the two kinds of whitespace characters where they each excel.
<TAB> for Semantic Indentation…
Because the <TAB> character is flexible, it is perfect for semantic indentation. Its width can be set by each user individually, and so they’ll all view the same file the way they each prefer.
And <SPACE> for Aesthetic Indentation
Because the <SPACE> character is inflexible, it is perfect for aesthetic indentation. Its width never changes, so the amount of space it adds to aligned text remains the same regardless of who is looking at the file.
To test her solution, Alice rewrites the file as such:
·······⤓·······⤓
function f() {
├╌╌╌╌╌╌┤if (superLongConditionNumber1
├╌╌╌╌╌╌┤␣␣␣␣&& superLongConditionNumber2
├╌╌╌╌╌╌┤␣␣␣␣&& superLongConditionNumber3)
├╌╌╌╌╌╌┤{
├╌╌╌╌╌╌┤├╌╌╌╌╌╌┤/* Do stuff */
├╌╌╌╌╌╌┤}
}
When Bob opens it on his computer, he sees this:
···⤓···⤓
function f() {
├╌╌┤if (superLongConditionNumber1
├╌╌┤␣␣␣␣&& superLongConditionNumber2
├╌╌┤␣␣␣␣&& superLongConditionNumber3)
├╌╌┤{
├╌╌┤├╌╌┤/* Do stuff */
├╌╌┤}
}
It looks like her solution works!
Out of spite, Simplicio sets his <TAB> width to 16.
Surely Alice’s solution won’t work then?
To his dismay, it does:
···············⤓···············⤓
function f() {
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤if (superLongConditionNumber1
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤␣␣␣␣&& superLongConditionNumber2
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤␣␣␣␣&& superLongConditionNumber3)
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤{
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤/* Do stuff */
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤}
}
Finally, in a vain attempt to outsmart Alice’s solution, Simplicio tries setting his <TAB> width to 1:
⤓⤓
function f() {
║if (superLongConditionNumber1
║␣␣␣␣&& superLongConditionNumber2
║␣␣␣␣&& superLongConditionNumber3)
║{
║║/* Do stuff */
║}
}
Unsurprisingly, Alice’s solution survives this attack, too.
Reluctantly, Simplicio concedes that using <TAB> for semantic indentation and <SPACE> for aesthetic indentation is the best option after all; it’s the only way to allow Alice, Bob, and anyone else (even Simplicio) to each have the indentation width they prefer without losing the ability to keep code nicely aligned.
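Why Alice's scheme survives every tab width can be seen in a short sketch. The helper names splitIndent and startColumn are illustrative assumptions; the code assumes leading whitespace is written as tabs first and spaces second, as in Alice's file.

```javascript
// Split leading whitespace into its semantic part (tabs) and its
// aesthetic part (the spaces that follow the tabs).
function splitIndent(line) {
  const [, tabs, spaces] = line.match(/^(\t*)( *)/);
  return { levels: tabs.length, alignment: spaces.length };
}

// Visual column of the first non-whitespace character for a given tab width.
function startColumn(line, tabWidth) {
  const { levels, alignment } = splitIndent(line);
  return levels * tabWidth + alignment;
}

// Alice's lines: one tab for the scope, four spaces for the alignment.
const ifLine = "\tif (superLongConditionNumber1";
const andLine = "\t    && superLongConditionNumber2";

for (const width of [1, 4, 8, 16]) {
  const offset = startColumn(andLine, width) - startColumn(ifLine, width);
  console.log(`tab width ${width}: continuation offset is ${offset} columns`);
}
```

Both lines carry the same number of tabs, so the tab width cancels out of the difference and the offset is always the four spaces.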
Editing in Practice
So, now that we know the right way to indent our code, how can we actually achieve this style of indentation?
The Very Dumb Way
Don’t! Just give up! Accept that the problem is too hard to solve, settle on only tabs or only spaces, and move on. This is what Python did from the outset[6], and it doesn’t seem to have hurt it much. In general, imposing your will on others without regard for their individual preferences is the simplest way to get what you want. Who cares if people whose preferences differ from your own aren’t happy with your decision? That’s their problem.
The Dumb Way
The “dumb” way is to manually indent your own code.
When you want to increase the scope depth, press Tab (and make sure your editor is set to insert <TAB> when you do so!).
When you want to align some code, press Space as many times as necessary.
A bit inconvenient, but it will work just fine.
Just make sure not to accidentally tell your editor to convert all indentation to tabs or spaces, as that will undo all your hard work.
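If you indent by hand, a small check can catch one common slip: a tab typed after a space. The following is a minimal sketch with a hypothetical function name, not a full linter. It cannot tell whether spaces were used where a tab was intended, so a file indented purely with spaces would pass.

```javascript
// Return the 1-based numbers of lines whose leading whitespace contains
// a tab after a space, which breaks the "tabs first, then spaces" order.
function findBadIndentation(text) {
  const bad = [];
  text.split("\n").forEach((line, i) => {
    const leading = line.match(/^[\t ]*/)[0];
    if (/ \t/.test(leading)) bad.push(i + 1);
  });
  return bad;
}

const sample = "\tif (a\n\t    && b)\n  \tfoo()";
console.log(findBadIndentation(sample)); // [ 3 ]
```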
The Smart Way
For every programming language, there is almost certainly some auto-formatter that will nicely format your code for you.[7] Many languages have several. Here are some examples:
- Python
- JavaScript
- Others: Go, Rust, Ruby, Java
If this is your first time hearing about auto-formatters, stop reading this article and go install one for a language you work in. You’ll never manually format your code again (which you shouldn’t; tedium is for computers, not humans).
Unfortunately, most of these indent with spaces (Camp 2) by default.
Gofmt is the only formatter on this list that indents with tabs and spaces (Camp 0) by default, although Rustfmt can also be configured to indent the right way with the non-default option hard_tabs = true.
So unfortunately, chances are that if you use an auto-formatter, you’ll be stuck with Simplicio’s indentation scheme; if you want to join Camp 0, you’ll need to format your code the dumb way.
To fix this, you can open an issue on the auto-formatter’s repo, or, preferably, submit a pull request that implements this behavior.
If you’re lazy, though, you can just post an article online and hope that enough people read it and agree with it that the idea of mixing tabs and spaces becomes mainstream.
Other Practical Considerations
Thus far, we’ve been focused primarily on the fact that mixed indentation lets programmers express their indentation width preferences while keeping things tidy.
But indenting with both <TAB>s and <SPACE>s has other advantages as well:
- Sure, some editors have “smart” behavior that allows them to treat multiple spaces as a single tab stop. (It’s dumb that they need a setting at all for something this simple.) But this takes configuring and varies on a per language/filetype basis. And can you guarantee that you’ll always be using your editor of choice? (Imagine having to SSH into a server where the only editor is a config-less Vim: not fun!) If you use <TAB>s for semantic indentation, everything will just work.
- If your cursor is within <TAB>-based indentation, the left and right arrow keys are guaranteed to navigate between tab stops; it’s impossible to move just part of the way to the next tab stop. This lets you move between tab stops more quickly (regardless of any editor settings). It also prevents your cursor from ever being located between tab stops, which would serve no purpose; if it’s between tab stops, the only useful thing for it to do is move somewhere else first.
- If your cursor is within <TAB>-based indentation and you attempt to delete a single indent, you… successfully delete that indent. (Amazing!) If you are using <SPACE>-based indentation, though, a number of things could happen, depending on your editor’s settings:
  - You might delete a tab stop’s worth of spaces.
  - You might delete just a single space.
  - If your cursor is not at a tab stop, you might delete back to the previous tab stop, which will be less than a full tab stop away.
  Why deal with this when there’s a character that was made to indent just begging to be used?
- Depending on how many <SPACE>s one uses to simulate a <TAB>, <TAB>s can take up quite a bit less space in a file. If you could shrink the sizes of all your files by 5% for no cost, wouldn’t you?
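The size argument is easy to estimate for any given file. The sketch below uses a hypothetical indentationSavings function and a made-up sample; it assumes ASCII text (so characters equal bytes) and treats all leading spaces as semantic indentation, which overstates the saving wherever some of those spaces are really alignment.

```javascript
// Count the bytes saved by replacing each run of `spacesPerLevel` leading
// spaces with a single tab character.
function indentationSavings(text, spacesPerLevel) {
  let saved = 0;
  for (const line of text.split("\n")) {
    const leadingSpaces = line.match(/^ */)[0].length;
    const levels = Math.floor(leadingSpaces / spacesPerLevel);
    saved += levels * (spacesPerLevel - 1);
  }
  return saved;
}

const sample = [
  "function f() {",
  "    if (foo) {",
  "        bar()",
  "    }",
  "}",
].join("\n");

const saved = indentationSavings(sample, 4);
console.log(`${saved} of ${sample.length} bytes saved`);
```

The actual percentage depends heavily on nesting depth and line length, so the figure for a real codebase may differ a lot from this toy sample.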
Other Thoughts
No Indentation At All
An unstated premise of this article was that one’s editor should faithfully render one’s files, including whitespace (or the lack thereof). However, auto-formatters are able to infer where whitespace belongs so that they can format the file correctly. Accordingly, it should also be possible for text editors to infer where whitespace belongs. This means that it should be possible, in theory, to insert no indentation whatsoever in one’s source code, and instead make it the responsibility of the text editor to display the file’s (unindented) contents with the correct indentation.
function f(x) {
if (x === 1) {
console.log("x was 1");
} else {
console.log(`x wasn't 1; it was ${x}`)
}
}
function f(x) {
    if (x === 1) {
        console.log("x was 1");
    } else {
        console.log(`x wasn't 1; it was ${x}`)
    }
}
This would make the whole argument of tabs-versus-spaces-versus-both moot, but would require changing both how editors render code and how people write it (in the event that they’re not using such an editor). And of course this would not be possible in whitespace-sensitive languages such as Python, nor would it work reliably if your file contained any syntax errors. One interesting consequence of this would be that when using version control, adding or removing a scope would only show up in a diff as just the opening and closing lines being added/removed; the enclosed code would remain unchanged.
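As a rough idea of what such an editor would have to do, here is a deliberately naive sketch that infers indentation from curly braces alone. The function name and sample are hypothetical, and the approach assumes that any braces inside strings, comments, or template literals are balanced within their line; a real implementation would need a proper parser.

```javascript
// Display unindented source with indentation inferred from brace depth.
function displayWithIndentation(text, indentUnit = "\t") {
  let depth = 0;
  return text
    .split("\n")
    .map((rawLine) => {
      const line = rawLine.trim();
      // A line that starts with a closing brace belongs to the outer level.
      const leadingCloser = line.startsWith("}");
      if (leadingCloser) depth = Math.max(0, depth - 1);
      const shown = line === "" ? "" : indentUnit.repeat(depth) + line;
      // The remaining braces on this line set the depth for the next line.
      const opens = (line.match(/\{/g) || []).length;
      const closes = (line.match(/\}/g) || []).length - (leadingCloser ? 1 : 0);
      depth = Math.max(0, depth + opens - closes);
      return shown;
    })
    .join("\n");
}

const unindented = [
  "function f(x) {",
  "if (x === 1) {",
  'console.log("x was 1");',
  "} else {",
  'console.log("x was not 1");',
  "}",
  "}",
].join("\n");

console.log(displayWithIndentation(unindented));
```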
Elastic Tabstops
Elastic Tabstops are somewhat tangential to the issue of how to indent properly, but are nonetheless worth mentioning because they’re another interpretation of the <TAB> character.
Rather than assign a fixed position for each tab stop, elastic tabstops automagically treat each tab as exactly as wide as it needs to be for the text to remain aligned, taking into account the text and <TAB>s on surrounding lines.
This allows one to use <TAB>s to align non-whitespace text while avoiding the issue we saw above where the tab stops had to be sufficiently far apart.
I won’t delve into it, but if you’re interested you can read more here.
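The core idea can be sketched as follows. This is a simplification rather than the actual elastic tabstops algorithm: it treats all the given lines as a single block, whereas the real scheme sizes each run of adjacent lines independently. The function name and the padding of two columns are assumptions made for illustration.

```javascript
// Align tab-separated cells so each column is as wide as its widest cell.
// Simplification: every line is treated as part of one block.
function alignColumns(lines, padding = 2) {
  const rows = lines.map((line) => line.split("\t"));
  const widths = [];
  for (const cells of rows) {
    // The last cell on a line is not followed by a tab, so it sets no width.
    cells.slice(0, -1).forEach((cell, i) => {
      widths[i] = Math.max(widths[i] || 0, cell.length + padding);
    });
  }
  return rows.map((cells) =>
    cells
      .map((cell, i) => (i < cells.length - 1 ? cell.padEnd(widths[i]) : cell))
      .join("")
  );
}

const aligned = alignColumns(["a\tx\t1", "abcd\txyz\t2", "ab\txy\t3"]);
console.log(aligned.join("\n"));
```

Unlike fixed tab stops, the column widths here grow with the content, so a four-character cell no longer breaks the alignment.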
Conclusion
When indenting, <TAB>s and <SPACE>s serve different purposes.
Indentation works best when both <TAB>s and <SPACE>s are used, the former for semantic indentation and the latter for aesthetic indentation.
Indenting with a fixed number of spaces takes us back to the days of the typewriter, when nothing was virtual, everything was physical, and horizontal space was horizontal space was horizontal space, alignment was alignment was alignment.
We are in a more advanced age now and we don’t need to tie semantics to aesthetics quite so strongly; we can make use of the virtual facilities provided by computers to write more comfortably, and in a more individualized manner, than was allowed by typewriters.
We listen to the music we like; we use the editors of our choosing; we use the syntax themes we like best; we assign keyboard shortcuts to the actions that make us most productive. But when it comes to the width of our indentation, we are told to all march in lockstep, not due to any technological limitation, but for the sake of some misbegotten notion of “consistency”. Even Guido van Rossum, dictator of Python and proponent of needless consistency,[8] could acknowledge that “A Foolish Consistency is the Hobgoblin of Little Minds”. I think it’s high time that the rest of us do so as well.
<TAB> when Tab is pressed (Tab is just another key, and they’re free to respond to it how they want), most “dumb” editors like Apple’s TextEdit and Microsoft’s Notepad will. If your editor doesn’t insert the <TAB> character when you press Tab, you might be able to force it to insert <TAB> by pressing Ctrl+I.