Shell Scripting

In the last chapter, we looked at how you can customize your terminal experience with the .bashrc configuration file and some alternative shell and terminal programs. While the .bashrc file is a conventional place to put personal configuration, at its core, it's just a slightly special case of a shell script. Or, in other words, a program!

We call it "scripting" because you're writing a sequence of instructions for the shell program to follow, like a theater or movie script. Don't be too concerned with the fact it has a different (and more specific) name than "programming", however - it's still programming!

In this chapter, we'll mostly look at some best practices that will help you write clean, readable, and understandable programs. While I will use shell scripting, specifically bash, for the examples (this is a book on the terminal, after all!), you can assume they are generally good starting points for structuring larger programs in any language.

Reusable Code with Functions

You may recognize the word "function" from math education, where it more or less meant "an equation with some number of variables that you plug in to find the answer". You may remember the formula for a line: $$y = mx + b$$ Which you were likely taught is also written as: $$f(x) = mx + b$$ Where the $f(x)$ indicates a function that varies with respect to x. In other words, the result or output of the function (annotated as $y$ or $f(x)$) depends on the input (annotated as $x$).

To get a bit into the weeds here, mathematical functions like this are often called pure functions, because they always provide the same result if you provide the same input. There's an entire paradigm of programming that relies on this handy consistency called functional programming. But that's a topic for another book!

In programming, our functions often look like this in concept, but are written quite differently. One major commonality is the idea of function inputs and outputs, where the inputs may also be called arguments or parameters, and the outputs are also called return values.

Here are a few examples from several different languages:

// C++
int y(int x) {
  const int slope = 2;
  const int y_intercept = 5;
  return x * slope + y_intercept;
}

// JavaScript
function y(x) {
  const slope = 2;
  const y_intercept = 5;
  return x * slope + y_intercept;
}

# Python
def y(x):
  slope = 2
  y_intercept = 5
  return x * slope + y_intercept

# Elixir (inside a module)
def y(x) do
  slope = 2
  y_intercept = 5
  x * slope + y_intercept
end

// F#
let y x: int =
    let slope = 2
    let y_intercept = 5
    x * slope + y_intercept

# bash
function y {
  slope=2
  y_intercept=5
  echo "$1*$slope+$y_intercept" | bc
}

These functions all do the same thing, but different languages have different ways to specify or define a function.

When possible, make it easy to understand what each function does, even if how it does it is complicated. Here are some ways to write understandable functions.

Names

Give them good names for the context they exist in. In my examples, y isn't a very good function name in general, but since we were just looking at a function annotated as $y$, it's not too bad here. A better general name for the function might be something like linear_equation.
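As a small sketch of that renaming, here's the bash example again under a more descriptive name. The name linear_equation and the local variable x are just illustrative choices, and I've swapped bc for bash's built-in integer arithmetic, $(( ... )), so this version only handles whole numbers:

```bash
# Calculate y for the line y = 2x + 5.
# Uses bash's built-in integer arithmetic, so x must be a whole number.
function linear_equation {
  local x=$1
  local slope=2
  local y_intercept=5
  echo $(( x * slope + y_intercept ))
}

linear_equation 3   # prints 11
```

Anyone reading a call like linear_equation 3 gets a hint about what's going on without opening the function body.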

Short and Sweet

Psychology research generally suggests that humans can hold only about 4 to 7 pieces of information in working memory at once, so that's about the upper limit of how much "stuff" a function should do at once.

It's easy to interpret that as "4 to 7 lines of code", but I don't think that's very helpful, because not all lines are created equal. Instead, I think you can use it as a guideline of how many things a function is doing.

In the above example with the linear equation function, all of the example functions have 3 lines of inner code. But only 1 line is really important to knowing what the function does - the last one, where it calculates the result.

Maybe instead of just one linear equation, we want a function that calculates 2 linear equations and subtracts one from the other. Something like:

function distance_between_lines {
  slope_a=2
  slope_b=7
  y_intercept_a=5
  y_intercept_b=0
  echo "($1*$slope_a+$y_intercept_a) - ($1*$slope_b+$y_intercept_b)" | bc
}

We added two lines, but it's still not doing too much more conceptually.

Now maybe we want to calculate this for a range of values for x and then show that to the user:

function distances_between_lines_for_range {
  slope_a=2
  slope_b=7
  y_intercept_a=5
  y_intercept_b=0
  for i in {1..10}; do
    answer=$(echo "($i*$slope_a+$y_intercept_a) - ($i*$slope_b+$y_intercept_b)" | bc)
    echo "$i: $answer"
  done
}

# running this function gives us...

bsh ❯ distances_between_lines_for_range
1: 0
2: -5
3: -10
4: -15
5: -20
6: -25
7: -30
8: -35
9: -40
10: -45

The main work of this function takes place in only 2 of the 8 lines inside of it, but those 2 lines count as 2 separate main actions: calculating the difference and printing the result of that calculation.
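One way to keep each function to a single main action is to give each action its own function. Here's a rough sketch, assuming the same two lines as before. The names distance_between_lines and print_distances_for_range are my own picks for illustration, and it uses bash's built-in integer arithmetic rather than bc:

```bash
# Calculate the distance between the two lines at x ($1).
function distance_between_lines {
  local x=$1
  local slope_a=2 slope_b=7
  local y_intercept_a=5 y_intercept_b=0
  echo $(( (x * slope_a + y_intercept_a) - (x * slope_b + y_intercept_b) ))
}

# Print the distance for each x from 1 through 10.
function print_distances_for_range {
  local i
  for i in {1..10}; do
    echo "$i: $(distance_between_lines "$i")"
  done
}

print_distances_for_range
```

The printed output is the same as before, but now the calculation can be read, reused, and changed without touching the loop.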

Use Other Functions

In functions you write, you can use other functions! In bash, this looks like our 6th function line:

answer=$(echo "($i*$slope_a+$y_intercept_a) - ($i*$slope_b+$y_intercept_b)" | bc)

(Technically, this is a command, but in bash, using or calling a command looks like using or calling a function.)

Basically, you can wrap any function or command in $(...) to capture whatever it prints, then assign that to a variable to use the result elsewhere in your script.
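Here's a minimal sketch of that capture in action, once with a function and once with a command pipeline. The function double and the variable names are made up for this example:

```bash
# A tiny function that prints twice its argument.
function double {
  echo $(( $1 * 2 ))
}

doubled=$(double 21)                      # capture the output of a function
upper=$(echo "hello" | tr 'a-z' 'A-Z')    # capture the output of a pipeline

echo "$doubled $upper"   # prints 42 HELLO
```

Note that $(...) captures what was printed, which is why the function uses echo to hand back its result.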

In other languages, it's not quite as clunky as it is in bash. Here's a Python example:

def linear_equation_a(x):
  slope = 2
  y_intercept = 5
  return x * slope + y_intercept

def linear_equation_b(x):
  slope = 7
  y_intercept = 0
  return x * slope + y_intercept

# Calculate the difference between the two linear equations at `x`.
def linear_equation_difference(x):
  return linear_equation_a(x) - linear_equation_b(x)

Document Them!

Using comments is a great way to explain a function in plain human language. We could document the purpose of the distances_between_lines_for_range function like this:

# Calculate the distance between two lines for the range 1 through 10
# and print the result for each input value.
function distances_between_lines_for_range {
  ...
}

Bash and some other languages only allow single-line comments with #, while others allow multi-line comments for somewhat cleaner documentation comments. Here's an example of a Java documentation comment in the Javadoc style:

/**
 * Returns an Image object that can then be painted on the screen.
 * The url argument must specify an absolute {@link URL}. The name
 * argument is a specifier that is relative to the url argument.
 * <p>
 * This method always returns immediately, whether or not the
 * image exists. When this applet attempts to draw the image on
 * the screen, the data will be loaded. The graphics primitives
 * that draw the image will incrementally paint on the screen.
 *
 * @param  url  an absolute URL giving the base location of the image
 * @param  name the location of the image, relative to the url argument
 * @return      the image at the specified URL
 * @see         Image
 */
public Image getImage(URL url, String name) {
	try {
		return getImage(new URL(url, name));
	} catch (MalformedURLException e) {
		return null;
	}
}

Notice that this documentation comment has a few different components. There's the short description of the function, then a more detailed description that indicates the expected format of the parameters and some notes about its behavior. It also includes, as tags like @param and @return, the name and description of its inputs and outputs.
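Bash has no built-in equivalent of Javadoc, but you can write the same components by hand with # comments. The layout below (summary, details, Arguments, Outputs) is just one informal convention I'm assuming for illustration, and distances_between_lines_in_range is a hypothetical variant of the earlier function that takes the range as arguments:

```bash
# Print the distance between two lines for a range of x values.
#
# The lines are y = 2x + 5 and y = 7x + 0. Uses integer arithmetic,
# so both arguments must be whole numbers.
#
# Arguments:
#   $1  first x value in the range
#   $2  last x value in the range
# Outputs:
#   one "x: distance" line per value, written to standard output
function distances_between_lines_in_range {
  local x
  for (( x = $1; x <= $2; x++ )); do
    echo "$x: $(( (x * 2 + 5) - (x * 7 + 0) ))"
  done
}

distances_between_lines_in_range 1 3
```

Since bash arguments don't have names, spelling out what $1 and $2 mean in the comment does a lot of the work that @param does in Java.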

Structure of a Bash Function

Here's an example of a small function I wrote in my .bashrc:

function source__ {
  SOURCE_FILE=${1:-~/.bashrc}
  source "$SOURCE_FILE"
}

The first line has three important parts:

  • function tells the shell "I want to define a function",
  • source__ is the name of the function, and
  • the { curly brace is how we mark the beginning of the function body.

The second line only does one thing: if we provided an argument when we called this function, use that for the variable SOURCE_FILE; if we did not, default to using my .bashrc in my home directory. The syntax for this is a bit clunky, unfortunately.
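To see the default-value syntax on its own, here's a small sketch. The function show_source_file is a made-up name that only prints the value it would use, rather than sourcing anything:

```bash
# Print the file that would be sourced: the first argument if one
# was given, otherwise .bashrc in the home directory.
function show_source_file {
  local source_file=${1:-~/.bashrc}
  echo "$source_file"
}

show_source_file my_config_file.sh   # prints my_config_file.sh
show_source_file                     # prints the path to .bashrc in your home directory
```

The general shape is ${variable:-default}: use the variable if it is set and non-empty, otherwise use the default.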

The third line also only does one thing: run the command source with the SOURCE_FILE variable as its argument. The source command reads the file named by its argument and runs its contents in the current shell. In the case of my .bashrc, this defines functions like this one and modifies or sets any environment variables I want to change.
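Running in the current shell is what makes source useful for configuration. Here's a sketch that compares it with running the same file as a separate program. The variable GREETING and the temporary file are stand-ins I made up for the demonstration:

```bash
# Start from a clean slate, then create a throwaway file that sets a variable.
unset GREETING
config_file=$(mktemp)
echo 'GREETING="hello from the config file"' > "$config_file"

# Running the file as a separate program does not affect the current shell.
bash "$config_file"
echo "after bash:   '${GREETING:-}'"

# Sourcing runs the same line in the current shell, so the variable sticks.
source "$config_file"
echo "after source: '${GREETING:-}'"

rm "$config_file"
```

The first echo shows an empty value and the second shows the greeting, because only source changed the shell we're typing in.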

The fourth line is just a } right curly brace, telling the shell that the definition is complete.

When the shell reads these four lines, it doesn't run the commands inside yet. It simply stores the commands inside the curly braces, also called the function body, and gives me the ability to run something like this:

source__ my_config_file.sh

This executes the function body, with the first argument, $1, set to my_config_file.sh.
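You can watch the difference between defining and calling with a quick sketch. The function say_hello is hypothetical, and declare -F is used here only to confirm that the shell has stored it:

```bash
function say_hello {
  echo "Hello, ${1:-world}!"
}

# Nothing has been printed so far: the shell has only stored the body.
declare -F say_hello   # prints say_hello, confirming the function is defined

say_hello              # prints Hello, world!
say_hello terminal     # prints Hello, terminal!
```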

As an aside, you can also define a function like this:

source__() {
  # the inside is the same as before
}

Bash treats this first line exactly the same as function source__.

Splitting Code Across Files

Did you know that Google has more than 2 billion lines of code? That's a lot of typing!

Fortunately for their employees, not all 2 billion lines are in the same file. Projects big and small can benefit from logical separation of code into multiple files, and with shell scripting it's quite easy! In fact, you can use my source__ function from the last section, or just use the built-in source command. Here are the first few lines of my configuration file:

source ~/.gitbash

# Work specific bash stuff
if [ -f ~/.workbash ]; then
  source ~/.workbash
fi

Basically, I always have a separate file for setup and commands related to the git program that I call .gitbash. When I've been working somewhere for a bit, I'll probably have some setup that's only important for work, so I always put that in a file called .workbash. The if block says "if the file called .workbash exists, then source it". If I didn't have that, I would get an error like this:

bsh ❯ source .workbash
-bash: .workbash: No such file or directory

Running the if block like I have it instead gives no error:

bsh ❯ if [ -f ~/.workbash ]; then
∙ source ~/.workbash
∙ fi

bsh ❯

Nice!
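If you end up with several optional files, the check can be wrapped in a function of its own. This is only a sketch, and source_if_exists is a name I've invented for it:

```bash
# Source a file only if it exists; otherwise do nothing.
function source_if_exists {
  if [ -f "$1" ]; then
    source "$1"
  fi
}

source_if_exists ~/.gitbash
source_if_exists ~/.workbash
```

Each call is quiet when the file is missing, so the same configuration can be copied between machines that have different sets of files.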

The mechanism for making code available across different files is

Summary