5 Programs and Variables
In this chapter, we'll look at programs, their true nature, and how your computer knows about the programs we've used so far and all the others you can use. As part of this, we'll also start looking at the concept of variables, which are a key programming concept, and a very basic but practical way to use them.
Programs are Files
Well, that's quite a header! But indeed, it's true - programs, fundamentally, are also files. On Windows, they typically have the file extension .exe, but on Linux it's common for programs you run from the terminal to not have any extension.
Of course, not all files are programs - or at least, not valid programs. The details are a world all their own, and we won't get into them much here.
The important consequence of this fun detail is that programs aren't some special entity embedded in your computer. Since they're files, they exist somewhere on the file system. You can move them, rename them, download them, share them, and do anything you would do with any other file.
As an extra note, Linux represents almost everything as a file. USB devices like your mouse and keyboard, your disks, your terminal sessions...they all show up as files! The consequences of that are better suited for a book on Linux, so it's noted here just for information's sake.
Finding A Program's Location with which
Well, if our programs are somewhere on the file system, how do we find where they are? We can use the which command, with the program name as its argument:
[stephen@virtualbox Examples]$ which ls
/usr/bin/ls
This says that on my system, the ls program is in the /usr/bin directory.
And indeed, if you run ls /usr/bin, it's full of programs! There are 2,802 files in that directory on my machine, to give you an idea.
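Since programs are files, we can handle one the same way we'd handle any other file. What follows is only a sketch, not output from my own session: it assumes a bash-like shell, uses the shell's built-in command -v lookup in place of which (inside a script it should report the same path), and leans on cp, mktemp, and rm, which you may not have met yet. The names ls_path and scratch are made up for this example.

```shell
# Ask the shell where the ls program lives (similar to: which ls).
ls_path=$(command -v ls)
echo "ls lives at: ${ls_path}"

# Make a throwaway directory and copy the program into it, like any other file.
scratch=$(mktemp -d)
cp "${ls_path}" "${scratch}/ls"

# Run the copy by typing its full path instead of just its name.
"${scratch}/ls" "${scratch}"

# Clean up the throwaway directory.
rm -r "${scratch}"
```

The copy behaves like the original because it is the same file contents, just stored somewhere else.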
In case you're curious, it's common convention in the Linux world to use bin as a directory name for programs, either on a system or as part of a larger software project. "Bin" is short for "binary", which is a file format that humans generally can't read, but the computer can read. In a different context, it's also a number system of 1s and 0s!
Binary files are usually more compact than text files holding the same information, which is a major advantage for storing programs and other files that don't need to be read by humans. When software developers discuss programs, they may also call them "binaries", particularly when a program is a single file. The commands we've looked at so far could all be considered "binaries"!
Types of Data
Before moving further, this is a good opportunity to look at a foundational programming concept: all the data in your computer has a certain shape, and that shape is called its type.
Imagine you're at a party and the host is preparing an order for pizza. If you have 12 attendees, they'll probably make some basic calculation like "we should get 1 pizza for every 3 people", leading to an order of 4 pizzas. All the numbers involved here are one common data type, integers, or numbers without a fraction or decimal point. In most programming languages, integers in fact cannot have a fractional part.
Now imagine you instead have 13 attendees at the same party. The same calculation will lead to a result of 4 and 1/3 pizzas. That is a real number, more commonly known in programming as a float, short for "floating-point number", or a double, short for "double-precision floating point number". Those terms have a more mathy origin that I won't get into here too much. Ultimately, it's hard to order a third of a pizza, though, so you probably round up to 5 pizzas or round down to 4 pizzas.
Now as the host is ordering, he has to direct the pizza delivery to his address. He'll say something like "42 Wallaby Way", which will turn into a string in the order system, so-called because it's a string of characters. A character is ultimately just a letter, digit, symbol, or empty space. Put another way, for simplified practical purposes, it's the smallest thing you can highlight when you drag the mouse over text with the left button held down.
With the terminal, these are the most important data types. The shell interprets almost everything as a string, and when you need to do math, you hand those strings to specific programs that treat them as numbers. But there are many more types of data in programming, and you'll learn about them in due time!
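To make the pizza math concrete, here's a small sketch. It assumes the expr program is installed (it almost always is on Linux); expr is my pick for illustration, not the only program that can do math.

```shell
# echo treats everything it is given as text, so no math happens here.
echo 12 / 3

# expr is a program that reads those same strings as integers.
expr 12 / 3    # 12 attendees: prints 4
expr 13 / 3    # 13 attendees: still prints 4, the leftover third is dropped

# Rounding up instead: add 2 before dividing by 3.
expr \( 13 + 2 \) / 3    # prints 5
```

Notice that expr only works with integers, so the 4 and 1/3 pizzas from the party example never shows up. The backslashes in front of the parentheses keep the shell from treating them as its own syntax.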
Environment Variables with env
One question you may have involves that directory, /usr/bin. Is that a magical directory where I need to put all my programs if I want to use them? The answer, thankfully, is no! That would get pretty gnarly pretty fast.
To understand how the terminal finds programs, we need to look at another command, env:
[stephen@virtualbox ~]$ env
SHELL=/bin/bash
PWD=/home/stephen
HOME=/home/stephen
USER=stephen
PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl
The actual output of this command was very long, so I've trimmed the unimportant stuff for now. This output is all of the environment variables in my current shell.
A variable in programming is, very broadly and imprecisely, a named value - it could be a number, a string of data, or something more complex in fancier programming languages. In most shell programs, variables are limited to numbers and strings.
The environment is the context the program is running in. This involves details like what operating system am I on? How do I know with what format to print out times and dates? How do I know what currency or language I should show to the user?
Often, the answers to those questions are stored in the environment variables.
Conveniently, the env command shows all of the current shell's environment variables!
In my trimmed example output, I left 5 variables of interest:
- SHELL is the shell program, given as an absolute path
  - You might notice this program lives in /bin, whereas ls lived in /usr/bin. /bin has a more direct relationship with the operating system, while /usr/bin is a bit more flexible. You still shouldn't change things in /usr/bin though :)
- PWD is the working directory, which we can also see with the pwd command from earlier
- HOME is the user's home directory. This is equivalent to the ~ as we saw earlier
- USER is the current logged in user. That's me!
  - But it's also possible to take on the perspective of other users if you have their password, or root access on the machine
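If the full env output is too much to scan, you can ask for one variable at a time. This sketch assumes the printenv program is available (it ships with most Linux systems); home_value is a made-up name for the example.

```shell
# printenv with a name prints just that one environment variable's value.
printenv HOME
printenv PATH

# Save the value so we can use it in a sentence.
home_value=$(printenv HOME)
echo "home is: ${home_value}"
```

The values will be different on your machine, which is the whole point of the environment.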
The last one, PATH, is the important one for this lesson. The value of PATH is always a list of directories, separated by colons (:) so multiple directories can exist in the same variable. When you type a command in the shell, it searches each directory in the order shown for the command you gave. When it finds a match, it runs the program and stops looking. If it doesn't find a match, you'll get a message like this:
[stephen@virtualbox ~]$ some_command
bash: some_command: command not found
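The search just described can be imitated by hand. Treat this as a rough sketch only: the real shell does more (it also knows about built-in commands and remembers where it found things), and the names target, dir, found, and old_ifs are made up for illustration. It also uses a loop and an if test, which are more than we've covered so far.

```shell
# A rough imitation of the PATH search, looking for the ls program.
target=ls
found=""

# Temporarily split on colons so the loop sees one directory at a time.
old_ifs=$IFS
IFS=:
for dir in $PATH; do
    # A match is a regular file in this directory that we are allowed to run.
    if [ -f "${dir}/${target}" ] && [ -x "${dir}/${target}" ]; then
        found="${dir}/${target}"
        break    # stop at the first match, like the shell does
    fi
done
IFS=$old_ifs

echo "first match: ${found}"
```

Because the loop stops at the first match, a program in an earlier PATH directory wins over a program with the same name in a later one.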
Adding Directories to PATH
One way to "add" a program to your shell is by moving it to a folder in PATH. While this works, there's another way that's more flexible and doesn't clutter your system folders. PATH is an environment variable, and in programming, we have a lot of control over variables! Most of the time, anyway...
Some Linux distributions, like Ubuntu, include ~/.local/bin as a PATH directory. My example doesn't, but we can add it a few different ways. For now, we'll just temporarily modify the environment variable, using the export command:
export PATH=$HOME/.local/bin:$PATH
Wow! What's going on here? Let's go left to right.
The export command in general makes variables carry over into any programs you run from the same shell session. When you open a new terminal window, that makes a new shell session - when you close a terminal window, that closes that specific shell session. This is why the change is temporary: it lasts only as long as the session where you ran export.
Next is the name of the variable we want to carry over, PATH. This could be any variable name, and we could even make a new variable if we wanted:
export MY_FAVORITE_FOOD=pizza
You may notice all of the environment variables so far have been all uppercase names. This is another convention, mostly to make it easy to distinguish variables from commands.
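To see what "carry over" means in practice, here's a small sketch comparing an exported variable with a plain one. MY_DRINK is a made-up variable for this example, and sh -c is used here only as a convenient way to start a separate program that prints what it received.

```shell
# A plain shell variable (hypothetical name): stays inside this shell.
MY_DRINK=water

# An exported variable: handed to every program started from this shell.
export MY_FAVORITE_FOOD=pizza

# sh is a separate program, so it only receives the exported variable.
sh -c 'echo "food: ${MY_FAVORITE_FOOD}"'    # prints food: pizza
sh -c 'echo "drink: ${MY_DRINK}"'           # prints drink: with nothing after it
```

The current shell still knows both values. Only the programs it starts are limited to the exported one.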
Variable Expansion
After the name is an equals sign followed by the value we want to set for this variable. In this value, we include the values of two variables, marked by the dollar sign $. In order to see what value PATH ends up with afterwards, we can use the echo command:
[stephen@virtualbox ~]$ echo $PATH
/home/stephen/.local/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl
That looks quite a bit different than our export command! How did that happen?
When we send commands to the shell, the shell automatically takes the variable names we used in the command and replaces them with their value. This process is called variable expansion, because it "expands" the variable into its value.
So when we give this command to the shell:
export PATH=$HOME/.local/bin:$PATH
It takes the two variables, HOME and PATH, and replaces them with their values. In my case, HOME is /home/stephen, and PATH is...well, I'm lazy, so I'll just say it's the default environment path.
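If you'd like to watch the expansion happen on your own machine, this sketch saves the old value first and then compares. The name old_path is made up for the example, and the if test is a bit ahead of where we are in the book.

```shell
# Remember the old value so we can compare afterwards.
old_path=$PATH

# The shell swaps in the values of HOME and PATH before export ever runs.
export PATH=$HOME/.local/bin:$PATH

echo "HOME expands to: $HOME"
echo "PATH was:        $old_path"
echo "PATH is now:     $PATH"

# Writing the expected result out by hand gives the same string.
if [ "$PATH" = "$HOME/.local/bin:$old_path" ]; then
    echo "PATH is the new directory, a colon, then the old PATH"
fi
```

The export command itself never sees the dollar signs. By the time it runs, the shell has already replaced them with plain text.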
Cleaner Variable Expansion
You might think it's a little confusing. How does the shell know where a variable ends and non-variable stuff starts? Better yet, how could you know if the variable names aren't quite as comprehensible as HOME or PATH?
A best practice for using variables in the shell is to surround them with curly brackets {}. This makes it clear what part is a variable and what part is not, and makes it easier to read. To use this in our previous example, we can do:
export PATH=${HOME}/.local/bin:${PATH}
When you write your variables like this, it's much easier to read it as a human! This has the added benefit of letting you easily expand variables that you intend to put in front of another string:
[stephen@virtualbox ~]$ echo $HOMEdirectory

[stephen@virtualbox ~]$ echo ${HOME}directory
/home/stephendirectory
In the first attempt, the shell tried to find the variable HOMEdirectory, which isn't defined. So it just gives us a blank line instead of adding "directory" to the end of the home path. By using curly brackets, the shell knows it's actually looking for the variable HOME, and then it should write "directory" right after the end of that variable's value.
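The same thing happens with any variable name, not just HOME. Here's one more sketch using a made-up variable, FOOD, to show the difference side by side.

```shell
FOOD=pizza    # hypothetical variable for this example

# Without brackets, the shell looks for a variable named FOODs, which is not set.
echo "I ordered 4 $FOODs"

# With brackets, the shell expands FOOD and then adds the s.
echo "I ordered 4 ${FOOD}s"    # prints: I ordered 4 pizzas
```

The first line prints "I ordered 4 " with nothing after it, which is an easy mistake to miss when the output is long.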