5 Programs and Variables

In this chapter, we'll look at programs, their true nature, and how your computer knows about the programs we've used so far and all the others you can use. As part of this, we'll also start looking at the concept of variables, which are a key programming concept, and a very basic but practical way to use them.

Programs are Files

Well, that's quite a header! But indeed, it's true - programs, fundamentally, are also files. On Windows, they typically have the file extension .exe, but on Linux it's common for programs you run from the terminal to not have any extension.

Of course, not all files are programs - or at least, not valid programs. The details are a world all their own, and we won't get into them much here.

The important consequence of this fun detail is that programs aren't some special entity embedded in your computer. Since they're files, they exist somewhere on the file system. You can move them, rename them, download them, share them, and do anything you would do with any other file.
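If you'd like to see that for yourself, here is a minimal sketch. It builds a tiny program as a plain file in a throwaway directory, runs it, renames it, and runs it again. The names workdir, hello, and greet are made up for this illustration, and the mktemp, printf, chmod, and mv commands (plus the #!/bin/sh first line) are things this chapter doesn't cover, so treat it as a peek ahead rather than something you need to understand fully right now.

```shell
# Work in a throwaway directory so nothing on the real system is touched.
workdir=$(mktemp -d)

# Create a tiny program: a text file holding a single command.
printf '#!/bin/sh\necho "hello from a file"\n' > "$workdir/hello"
chmod +x "$workdir/hello"        # mark the file as something we may run

# Run it by giving the path to the file.
first_run=$("$workdir/hello")
echo "$first_run"

# Rename it like any other file; it still runs under the new name.
mv "$workdir/hello" "$workdir/greet"
second_run=$("$workdir/greet")
echo "$second_run"

# Clean up the throwaway directory.
rm -r "$workdir"
```

Both runs print the same message, because the program is just the file's contents, not its name.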

As an extra note, on Linux, nearly everything is represented as a file, or as something that behaves like one. Your USB devices like your mouse and keyboard, your screens, your internet connection, any other connections...they're all treated like files! The consequences of that are better suited for a book on Linux, but it's noted here just for information's sake.

Finding A Program's Location with `which`

Well, if our programs are somewhere on the file system, how do we find where they are? We can use the which command, with the program name as its argument:

[stephen@virtualbox Examples]$ which ls
/usr/bin/ls

This says that on my system, the ls program is in the /usr/bin directory. And indeed, if you run ls /usr/bin, it's full of programs! There are 2,802 files in that directory on my machine, to give you an idea.
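As a small sketch of where this leads, you can hand which more than one name at a time, and you can pass its answer straight to ls to look at the program as an ordinary file. The $(...) part is called command substitution, which we haven't covered; for now, read it as "run this command and paste its output here". The paths printed will depend on your system.

```shell
# Look up several programs at once; which prints one path per name it finds.
which ls pwd env

# Ask ls for details about the file that *is* the ls program.
ls -l "$(which ls)"
```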

In case you're curious, it's a common convention in the Linux world to use bin as a directory name for programs, either on a system or as part of a larger software project. "Bin" is short for "binary", which describes a file whose contents the computer can read but humans generally can't. In a different context, binary is also a number system of 1s and 0s!

Binary files are usually smaller than text files holding the same information, which is a major advantage for storing programs and other files that don't need to be read by humans. When we discuss programs as software developers, we may also call them "binaries", particularly when a program is a single file. The commands we've looked at so far could all be considered "binaries"!

Types of Data

Before moving further, this is a good opportunity to look at a foundational programming concept: all the data in your computer has a certain shape, and that shape is called its type.

Imagine you're at a party and the host is preparing an order for pizza. If you have 12 attendees, they'll probably make some basic calculation like "we should get 1 pizza for every 3 people", leading to an order of 4 pizzas. All the numbers involved here are one common data type, integers, or numbers without a fraction or decimal point. In most programming languages, an integer in fact cannot have a fractional part.

Now imagine you instead have 13 attendees at the same party. The same calculation will lead to a result of 4 and 1/3 pizzas. That is a real number, more commonly known in programming as a float, short for "floating-point number", or a double, short for "double-precision floating point number". Those terms have a more mathy origin that I won't get into here too much. Ultimately, it's hard to order a third of a pizza, though, so you probably round up to 5 pizzas or round down to 4 pizzas.

Now as the host is ordering, he has to direct the pizza delivery to his address. He'll say something like "42 Wallaby Way", which will turn into a string in the order system, so-called because it's a string of characters. A character, ultimately, is just a letter, number, symbol, or empty space. Put another way, for simplified practical purposes, it's the smallest thing you can highlight when you drag the mouse over text with the left button held down.

With the terminal, these are the most important data types. The shell program inherently interprets almost everything as a string, and if you need to do math, you can do it with dedicated tools. But there are many more types of data in programming, and you'll learn about them in due time!
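To make the pizza example concrete, here is a rough sketch in the shell. It leans on two features this chapter doesn't cover: $(( )), which asks the shell to do integer math, and ${#name}, which gives the number of characters in a string. The variable names are made up for the illustration.

```shell
# Integers: 12 people at 3 people per pizza.
pizzas_for_12=$(( 12 / 3 ))
echo "$pizzas_for_12"        # prints 4

# 13 people: integer division throws the fractional part away.
pizzas_for_13=$(( 13 / 3 ))
echo "$pizzas_for_13"        # prints 4, not 4 and 1/3

# Rounding up instead: add (3 - 1) before dividing.
pizzas_rounded_up=$(( (13 + 2) / 3 ))
echo "$pizzas_rounded_up"    # prints 5

# Strings: the address is just a run of characters.
address="42 Wallaby Way"
address_length=${#address}
echo "$address_length"       # prints 14
```

Notice there is no way to get 4 and 1/3 out of the integer math here; for floats you would need a different tool.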

Environment Variables with `env`

One question you may have involves that directory, /usr/bin. Is that a magical directory that I need to put all my programs in if I want to use them? The answer, thankfully, is no! That would get pretty gnarly pretty fast.

To understand how the terminal finds programs, we need to look at another command, env:

[stephen@virtualbox ~]$ env
SHELL=/bin/bash
PWD=/home/stephen
HOME=/home/stephen
USER=stephen
PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl

The actual output of this command was very long, so I've trimmed the unimportant stuff for now. This output is all of the environment variables in my current shell.

A variable in programming is, very broadly and imprecisely, a name attached to a value - the value could be a number, a string of data, or something more complex in fancier programming languages. In most shell programs, variables are limited to numbers and strings.

The environment is the context the program is running in. This involves details like what operating system am I on? How do I know with what format to print out times and dates? How do I know what currency or language I should show to the user?

Often, the answers to those questions are stored in the environment variables. Conveniently, the env command shows all of the current shell's environment variables!
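Since the full env output is long, it can help to pull out just one variable. The sketch below shows two common ways to do that. Both grep and printenv are separate programs that this chapter doesn't otherwise use, and the | symbol (a pipe, which feeds one command's output into another) hasn't been introduced yet, so consider this a preview.

```shell
# Print only the line for PATH instead of the whole list.
env | grep '^PATH='

# printenv prints just the value of the variable you name.
printenv PATH
```

The first form keeps the NAME=value shape you see in the env output, while the second gives you the bare value.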

In my trimmed example output, I left 5 variables of interest:

SHELL is the shell program, given as an absolute path
- You might notice this program lives in /bin, whereas ls lived in /usr/bin. /bin has a more direct relationship with the operating system, while /usr/bin is a bit more flexible. You still shouldn't change things in /usr/bin though :)
PWD is the working directory, which we can also see with the pwd command from earlier
HOME is the user's home directory. This is equivalent to the ~ as we saw earlier
USER is the current logged in user. That's me!
- But it's also possible to take on the perspective of other users if you have their password, or root access on the machine

The last one, PATH, is the important one for this lesson. The value of PATH is a list of directories separated by colons (:), which is how multiple directories can live in the same variable. When you type a command in the shell, it searches each directory, in the order shown, for the command you gave. When it finds a match, it runs the program and stops looking. If it doesn't find a match, you'll get a message like this:

[stephen@virtualbox ~]$ some_command
bash: some_command: command not found
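The search described above can be imitated with a short function. This is only a simplified sketch, not how bash actually implements it: the real shell also checks things like its own built-in commands first. The function name find_in_path and its variables are hypothetical, and it uses loops, tests, and the IFS variable, none of which we've covered yet.

```shell
# find_in_path is a hypothetical helper that imitates the shell's PATH search.
find_in_path() {
    wanted=$1
    saved_ifs=$IFS
    IFS=:                        # split the value of PATH on colons
    for dir in $PATH; do
        IFS=$saved_ifs           # restore normal splitting inside the loop
        # A match is a regular file in this directory that we may execute.
        if [ -f "$dir/$wanted" ] && [ -x "$dir/$wanted" ]; then
            echo "$dir/$wanted"
            return 0             # stop looking at the first match
        fi
    done
    IFS=$saved_ifs
    echo "$wanted: command not found" >&2
    return 1
}

# Search for ls the same way the shell would.
find_in_path ls
```

Because the loop stops at the first match, a directory that appears earlier in PATH wins over one that appears later. Giving the function a name that doesn't exist anywhere in PATH prints the "command not found" message instead.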

Adding Directories to `PATH`

One way to "add" a program to your shell is by moving it to a folder in PATH. While this works, there's another way that's more flexible and doesn't clutter your system folders. PATH is an environment variable, and in programming, we have a lot of control over variables! Most of the time, anyway...

Some Linux distributions, like Ubuntu, include ~/.local/bin as a PATH directory. My example doesn't, but we can add it a few different ways. For now, we'll just temporarily modify the environment variable, using the export command:

export PATH=$HOME/.local/bin:$PATH

Wow! What's going on here? Let's go left to right.

The export command in general makes variables carry over into any programs you run from the same shell session. When you open a new terminal window, that makes a new shell session - when you close a terminal window, that closes that specific shell session.

Next is the name of the variable we want to carry over, PATH. This could be any variable name, and we could even make a new variable if we wanted:

export MY_FAVORITE_FOOD=pizza

You may notice all of the environment variables so far have been all uppercase names. This is another convention, mostly to make it easy to distinguish variables from commands.
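To see the "carry over" effect of export for yourself, here is a small sketch. It starts a second shell with sh -c, which runs the quoted command as a separate program, and asks that program what it can see. The sh -c trick and the before and after variable names are my own additions for the illustration; the square brackets are only there to make an empty value visible.

```shell
# A plain assignment stays inside the current shell.
MY_FAVORITE_FOOD=pizza
before=$(sh -c 'echo "child sees: [$MY_FAVORITE_FOOD]"')
echo "$before"                   # prints: child sees: []

# After export, programs started from this shell receive the variable.
export MY_FAVORITE_FOOD
after=$(sh -c 'echo "child sees: [$MY_FAVORITE_FOOD]"')
echo "$after"                    # prints: child sees: [pizza]
```

The single quotes matter here: they stop the current shell from filling in the value itself, so it really is the second program doing the looking.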

Variable Expansion

After the name is an equals sign followed by the value we want to set for this variable. In this value, we include the values of two variables, marked by the dollar sign $. In order to see what value PATH ends up with afterwards, we can use the echo command:

[stephen@virtualbox ~]$ echo $PATH
/home/stephen/.local/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl

That looks quite a bit different than our export command! How did that happen?

When we send commands to the shell, the shell automatically takes the variable names we used in the command and replaces them with their value. This process is called variable expansion, because it "expands" the variable into its value.

So when we give this command to the shell:

export PATH=$HOME/.local/bin:$PATH

It takes the two variables, HOME and PATH, and replaces them with their values. In my case, HOME is /home/stephen, and PATH is...well, I'm lazy, so I'll just say it's the default environment path.
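Here is the same idea as a sketch with stand-in variables, so you can watch the expansion without touching your real PATH. DEMO_HOME and DEMO_PATH are hypothetical names I made up for this example, and their values are shortened versions of the ones from my machine.

```shell
# Stand-ins for HOME and PATH, so the real ones are left alone.
DEMO_HOME=/home/stephen
DEMO_PATH=/usr/local/bin:/usr/bin

# The shell swaps each $NAME for its value before doing the assignment.
DEMO_PATH=$DEMO_HOME/.local/bin:$DEMO_PATH

echo "$DEMO_PATH"    # prints /home/stephen/.local/bin:/usr/local/bin:/usr/bin
```

The new directory lands at the front of the list because we wrote it before the old value.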

Cleaner Variable Expansion

You might think it's a little confusing. How does the shell know where a variable ends and non-variable stuff starts? Better yet, how could you know if the variable names aren't quite as comprehensible as HOME or PATH?

A best practice for using variables in the shell is to surround them with curly brackets {}. This makes it clear what part is a variable and what part is not, and makes it easier to read. To use this in our previous example, we can do:

export PATH=${HOME}/.local/bin:${PATH}

When you write your variables like this, it's much easier to read it as a human! This has the added benefit of letting you easily expand variables that you intend to put in front of another string:

[stephen@virtualbox ~]$ echo $HOMEdirectory

[stephen@virtualbox ~]$ echo ${HOME}directory
/home/stephendirectory

In the first attempt, the shell tried to find the variable HOMEdirectory, which isn't defined. So it just gives us a blank line instead of adding "directory" to the end. By using curly brackets, the shell knows it's actually looking for the variable HOME, and then it should write "directory" right after the end of that variable's value.
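The same trap shows up whenever the text right after a variable could be part of a name, and underscores are a common culprit. A quick sketch, using a made-up PROJECT variable and assuming nothing named PROJECT_final is already defined on your system:

```shell
PROJECT=report

# Without brackets, the shell looks for a variable named PROJECT_final.
without_braces="$PROJECT_final.txt"
echo "$without_braces"    # prints .txt

# With brackets, the name clearly ends at PROJECT.
with_braces="${PROJECT}_final.txt"
echo "$with_braces"       # prints report_final.txt
```

The period ends the variable name in both cases, which is why ".txt" survives even in the broken version.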

A Gentle Introduction to the Terminal