Thoughts on code style

22 December 2018

Recently there was a discussion on Twitter about formatting, which prompted me to summarise my thoughts on coding style and the style that I personally adhere to (thread). Here I recount and elaborate on the points made.

The tldr version of my style advice is that the code should be self-explanatory and easy to modify. Self-explanatory code has a logical control and data structure, and reads like a crisp essay or news article.

Although I focus on Haskell because it's my primary programming language, the advice here is mostly just general-purpose engineering hygiene.

Comprehension concerns

1. Make dataflow as syntactically explicit as possible.

Pointfree style is a great tool for this, to an extent. Use it whenever it makes the dataflow clearer.

Good functional style emphasises clear data flow.

Good imperative style emphasises clear control flow.

I have no idea what good object-oriented style is. Objects obscure both control and data flow by entangling mutable functions and mutable data into a big mess.

(But, I hear you object, what about Scala? Well, good Scala code is basically good functional code---they use the object system as a module system.)

2. Separate function definition and function application.

Use . over $.

This is functional---

h . g . f $ x

You are building a function h . g . f and then applying it.

This is imperative. This is bad. Now you have to care about the concrete variable in question when reading this.

h $ g $ f $ x

This comes into play when you have a large function. When reading grossImperativeFlow, you have to track the variable as it gets passed around. It takes more mental effort and causes you to think at a low level of abstraction.

grossImperativeFlow input =
  let stage1 = doMore $ doAnotherThing $ doSomething $ input
      stage2 = somethingElse $ evenMore $ stage1
  in stage2

When reading niceFunctionalFlow, you don't care about the variable---in fact, there is no variable! There are only the actions to concern yourself with. Your attention is called to the main stage2 . stage1 pipeline. The specifics of the stages are offset to the where binding, where you can turn your attention if you need the details.

niceFunctionalFlow = stage2 . stage1
  where
    stage1 = doMore . doAnotherThing . doSomething
    stage2 = somethingElse . evenMore

Naming your functions makes explicit data flow. Naming your data makes explicit control flow.
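To make the comparison concrete, here is a minimal runnable sketch of both versions. The stage functions (doSomething and friends) are hypothetical stand-ins over Int, chosen only so the code compiles; the post does not define them.

```haskell
-- Hypothetical stage functions over Int, chosen only so the example runs.
doSomething, doAnotherThing, doMore, evenMore, somethingElse :: Int -> Int
doSomething    = (+ 1)
doAnotherThing = (* 2)
doMore         = subtract 3
evenMore       = (* 10)
somethingElse  = negate

-- Variable-threading version: the reader follows input, stage1, stage2.
grossImperativeFlow :: Int -> Int
grossImperativeFlow input =
  let stage1 = doMore $ doAnotherThing $ doSomething $ input
      stage2 = somethingElse $ evenMore $ stage1
  in stage2

-- Composition version: the pipeline itself is the definition.
niceFunctionalFlow :: Int -> Int
niceFunctionalFlow = stage2 . stage1
  where
    stage1 = doMore . doAnotherThing . doSomething
    stage2 = somethingElse . evenMore
```

Both functions compute the same result for every input (for example, -90 for 5); the difference is purely in what the reader has to keep in their head.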

3. On a similar note, explicitly name intermediate steps only when it is absolutely necessary.

Each let-binder should refer to a meaningful unit of work. Excessive lets obscure data flow.

For example, this (source) is very difficult to read.

There are too many binders, verbosely naming unnecessary details that you shouldn't be caring about. The flow is obscured.

adjustLine :: Int -> Text -> Text
adjustLine desiredPrefixLength oldLine = newLine
  where
    (prefix, suffix) = Data.Text.breakOn "=" oldLine

    actualPrefixLength = Data.Text.length prefix

    additionalSpaces = desiredPrefixLength - actualPrefixLength

    spaces = Data.Text.replicate additionalSpaces " "

    newLine = Data.Text.concat [ prefix, spaces, suffix ]

I would write it like this. (See my stance on import names below).

adjustLine :: Int -> T.Text -> T.Text
adjustLine n = recombine . first adjust . T.breakOn "="
  where
    recombine = uncurry (<>)
    adjust    = (<>) <*> spaces
    spaces    = flip T.replicate " " . (n - ) . T.length

The high-level logical structure is clear, and the unnecessary details are hidden in the where. The processing is split into three clear steps, each of which is given a meaningful short name. You break the text in two, adjust the first thing, then put the pieces back together again.

At a glance, you know what steps the data[1] is flowing through, rather than knowing what the different intermediate data are but not what's actually going on.
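The one step in the pointfree adjustLine that may need unpacking is (<>) <*> spaces. It relies on the Applicative instance for functions, where (f <*> g) x = f x (g x), so adjust prefix is just prefix <> spaces prefix. The sketch below is a String analogue of the function (an assumption made so that it needs only base; the version above works on Text), next to a variant with that step written out. The name adjustLineExpanded is hypothetical.

```haskell
import Data.Bifunctor (first)

-- String analogue of the pointfree adjustLine (an assumption made so the
-- sketch needs only base; the version in the post works on Text).
adjustLine :: Int -> String -> String
adjustLine n = recombine . first adjust . break (== '=')
  where
    recombine = uncurry (<>)
    adjust    = (<>) <*> spaces
    spaces    = flip replicate ' ' . (n -) . length

-- The same function with the function-Applicative step written out:
-- ((<>) <*> spaces) prefix  ==  prefix <> spaces prefix
adjustLineExpanded :: Int -> String -> String
adjustLineExpanded n line = padded <> suffix
  where
    (prefix, suffix) = break (== '=') line
    padded           = prefix <> replicate (n - length prefix) ' '
```

Both pad the text before the first = out to n characters and leave the line untouched when the prefix is already at least that long, since replicate with a non-positive count yields the empty string.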

4. Do not use excessive whitespace or modularisation.

With regard to the logical organisation of code, a file or module is like a chapter, a function is like a section, and each newline-separated block in a function is like a paragraph.

The original adjustLine function above is an example of excessive whitespace. The logical organisation of the subroutines is obscured by the spacing.

5. Do not call attention to mundane things, like data type munging.

It’s fine (even preferable) to make such functions terse and pointfree and give them single character/short names like f, g, wrap, munge, etc. The point is that the reader doesn't care about these things and he shouldn't need to.

In fact, it is sometimes advantageous to obscure your code with pointfree style, to force the reader to ignore the irrelevant details.

6. Make your datatypes as generic as possible. Make the operations over the data types as generic as possible.

Traversal boilerplate is mundane code; do your best to avoid writing it. Abstract it with recursion schemes.
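As a rough illustration of what abstracting traversals with recursion schemes can look like, here is a hand-rolled sketch using only base. Fix and cata are defined locally rather than taken from a library, and the ExprF type is a hypothetical example, not something from this post.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Fixed point of a functor: ties the recursive knot once, for any shape.
newtype Fix f = Fix (f (Fix f))

-- Catamorphism: a generic bottom-up fold over any Fix f.
cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg (Fix layer) = alg (fmap (cata alg) layer)

-- Hypothetical expression type; r marks the recursive positions.
data ExprF r
  = Lit Int
  | Add r r
  | Mul r r
  deriving Functor

type Expr = Fix ExprF

-- Each operation only says what to do with one layer; no manual recursion.
eval :: Expr -> Int
eval = cata alg
  where
    alg (Lit n)   = n
    alg (Add a b) = a + b
    alg (Mul a b) = a * b

depth :: Expr -> Int
depth = cata alg
  where
    alg (Lit _)   = 1
    alg (Add a b) = 1 + max a b
    alg (Mul a b) = 1 + max a b

-- (1 + 2) * 3
example :: Expr
example = Fix (Mul (Fix (Add (Fix (Lit 1)) (Fix (Lit 2)))) (Fix (Lit 3)))
```

Here eval example is 9 and depth example is 3. Neither function spells out how to walk the tree; the derived Functor instance and cata do that once for every operation.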

7. Excessive type annotations or type applications can obscure dataflow and data structure.

If you have 6 nested fmaps, you absolutely do not care what those intermediate containers are.
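A small sketch of the difference, with four layers instead of six. The nested value and the names bump and bumpNoisy are hypothetical; the second definition uses type applications to spell out every intermediate container, which is exactly the noise being warned against.

```haskell
{-# LANGUAGE TypeApplications #-}

type Nested = [Maybe (String, Either String Int)]

-- Hypothetical nested value: list, Maybe, pair, Either.
nested :: Nested
nested = [Just ("a", Right 1), Nothing, Just ("b", Left "oops")]

-- Four layers deep, yet no intermediate container is named at the use site.
bump :: Nested -> Nested
bump = (fmap . fmap . fmap . fmap) (+ 1)

-- Same behaviour, but every layer is annotated; the dataflow is buried.
bumpNoisy :: Nested -> Nested
bumpNoisy =
  fmap @[] (fmap @Maybe (fmap @((,) String) (fmap @(Either String) (+ 1))))
```

Both definitions increment only the Int under Right and leave Nothing and Left values alone.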

Engineering concerns

I use neither IDEs nor debuggers, a practice which shapes the opinions that follow. I do believe, however, that codebases maintained by hackers with simple tools are kept in better shape than codebases generated by IDE monkeys, probably out of necessity. (Much like how those who commute by walking or cycling are on average in better shape than those who drive.)

8. Make it easy to grep for definitions.

Haskell is great for this because types and terms are kept separate, but not so separate that, as in ML, you need special plugins to jump between .ml and .mli files. The type ascription token :: marks the function definition and should always be kept on the same line as the function name.

With this in mind, my new preferred multiline formatting style for type signatures is:

myLongFunctionName ::
  a ->
  b ->
  ... ->
  c

I used to write multiline signatures like this:

myLongFunctionName :: a
                   -> b
                   -> ...
                   -> c

However, this style is quite bad: to keep things aligned, you have to reindent whenever you rename the function, which costs extra keystrokes and causes lots of diff noise.

The following style is even worse:

-- NO NO NO, don't ever do this
myLongFunctionName
  :: a
  -> b
  -> ...
  -> c

While everything has a nice, uniform offset, the :: token has been separated from the function name, neutering its utility when grepping through the code.

9. Make it easy to comment things out for debugging.

Format lists like this, so the first line is not special and can be commented out easily.

myList = map something
  [ 1
  , 2
  , 3
  ]

I'm a lot less strict about this, but it's sometimes convenient to split up long chains of function compositions:

doIt =
  id  -- leading identity, so every stage below can be commented out
  . function4
  . function3
  . function2
  . function1
  . function0

10. The more rare something is, the more qualified its import should be.

This is self-explanatory. The converse to this statement is that common things should be unqualified or have short imports. There is a special place in Enterprise Java Hell for you if you do this:

import qualified Control.Monad.Trans.Class
import qualified Control.Monad.IO.Class
import qualified Control.Monad.Reader
import qualified Data.Bifunctor

doSomething = do
  ctx <- Control.Monad.Reader.ask  -- wtf??

  let xy = Data.Bifunctor.bimap f g someBifunctor  -- seriously wtf is wrong with you??

  -- my fingers are hurting just typing this out
  Control.Monad.Trans.Class.lift (doSomething ctx)

  -- I have no idea what's going on anymore because of all the goddamn import noise
  Control.Monad.IO.Class.liftIO (putStrLn "done")
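By contrast, here is a sketch of the kind of import block this rule suggests: everyday names come in unqualified, and a rarer module gets a short qualified alias. The helpers shout and firstAndLast are hypothetical and exist only to exercise the imports; which modules count as "common" is a judgment call for your own codebase.

```haskell
import Data.Bifunctor (bimap)              -- common: imported unqualified
import Data.Char (toUpper)                 -- common: imported unqualified
import qualified Data.List.NonEmpty as NE  -- rarer: short qualified alias

-- Hypothetical helpers, only here to exercise the imports.
shout :: (String, Int) -> (String, Int)
shout = bimap (map toUpper) (+ 1)

firstAndLast :: NE.NonEmpty a -> (a, a)
firstAndLast xs = (NE.head xs, NE.last xs)
```

The call sites stay short, and the NE. prefix still tells the reader where the less familiar functions come from.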

11. Organise/abstract things so that it’s easy to make large changes over all files with find and sed.

Use type aliases, type constraints, and typeclasses liberally. Abstraction is the name of the game; that's why you're using Haskell!
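A small sketch of what this can look like in practice. All names here (UserId, Score, Reportable, report, topScore) are hypothetical; the point is only that the representation and the required classes are each stated in one place, so a later change is a one-line edit or a mechanical find-and-sed over a single name.

```haskell
{-# LANGUAGE ConstraintKinds #-}

import Data.List (sort)

-- Hypothetical aliases: changing the representation means editing these lines.
type UserId = Int
type Score  = Double

-- A constraint alias bundles the classes that "reportable" values need.
type Reportable a = (Show a, Ord a)

-- Render values in ascending order, one per line.
report :: Reportable a => [a] -> String
report = unlines . map show . sort

-- Pick the entry with the highest score, if there is one.
topScore :: [(UserId, Score)] -> Maybe (UserId, Score)
topScore [] = Nothing
topScore xs = Just (foldr1 pick xs)
  where
    pick a b = if snd a >= snd b then a else b
```

If Reportable later needs another class, only its definition changes; every signature that mentions it stays as it is.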

12. Make wrapper modules that abstract over libraries.

...and provide your own stable API (e.g. Time, Regex, Compression) so that you can swap the libraries out easily.

This is just good old-school engineering hygiene. But if you don't do this, please observe the above point and import the library consistently (for ease of sed)!

  1. You shouldn't be caring about the data anyhow.