Chapter 4 R Syntax
Welcome to the R Book! In this chapter, we will explore the basics of R, a powerful programming language used for statistical computing and graphics.
At its most fundamental level, R is a calculator capable of performing simple, and complex, mathematical operations. It can read and write data to and from files, manipulate the data, calculate summaries and plot visual representations of the data. Essentially, it is a programmatic version of a spreadsheet program.
However, R is much more than just a calculator. It is also a platform for conducting complex analyses, statistical evaluations, predictive inferencing, and machine learning. With R, you can explore and visualize data in a variety of ways, perform advanced statistical analyses, and build predictive models.
In this chapter, we will start by examining the simplest operations of R. We will cover basic arithmetic, working with variables, and creating basic plots. By the end of this chapter, you will have a solid understanding of the fundamentals of R and be ready to tackle more complex topics.
So, let’s get started!
At the end of this chapter you should be able to
|
4.1 Reserved Words
As we begin our journey, it’s important to keep in mind that there are certain reserved words that carry a special meaning and cannot be used as identifiers. These words have been set aside by the R programming language, and using them as variable names or function names could lead to errors in your code.
Therefore, before we dive too deeply into our R programming endeavors, let’s take a moment to familiarize ourselves with these reserved words. This will help us avoid potential issues down the road and ensure that our code runs smoothly.
Word | Use |
---|---|
if , else |
flow control, part of the if-then-else statement |
for , repeat , while , break , next |
flow control, part of the for-loop statement |
function |
basis for defining new algorithms |
TRUE , FALSE |
Boolean logic values |
NULL |
an undefined value |
Inf , -Inf |
an infinite value (eg. 1/0 ) |
NaN |
‘not a number’ |
NA |
a missing value indicator |
A Null results when a value is missing and could be a string or a numeric, where as a NA results when a known value, such as in a column of numbers, is missing. |
4.2 Types
Welcome to the R Book! Whether you’re just starting out or a seasoned pro, understanding the different components of R code is essential for writing high-quality, efficient R programs. In this section, we’ll take a deep dive into the various components of R code that you should be familiar with.
R input is composed of typed characters that represent different parts of a process or mathematical operation. These characters come together to form what we call R code. It’s important to note that R code is not just a random collection of characters - each character serves a specific purpose and contributes to the larger structure of the code. As such, understanding the different components of R code is key to writing effective and efficient R programs.
So, what are these different components of R code? Below, we’ve provided some examples to help you get started:
4.2.1 comments | # this is an important note |
4.2.2 strings | "letters" or "numbers" in quotes |
4.2.3 numbers | 1 integers or 1.000002 floats |
4.2.4 operators | + , - , / , * , … |
4.2.5 variables | var <- 2 containers for information |
4.2.6 statements | == exactly the same, != not the same |
4.2.7 functions | add(x, y) complex code in a convenient wrapper |
By understanding these different components of R code, you’ll be well on your way to writing effective and efficient R programs. So let’s dive in and get started!
# adding two numbers here and storing it as a variable
four <- 2 + 2
# using the function 'cat' to print out my variable along with some text
cat("my number is ", four)
## my number is 4
R does not have an line ending character such as ; in java, PHP or C++ |
4.2.2 Strings
Strings are essentially a sequence of characters, consisting of letters or numbers. They are commonly used in programming languages and are used to represent text-based data. A string can be as simple as a single character, such as “A”, or it can be a longer sequence of characters such as “Hello, World!”. Strings are often used to store data that requires text manipulation, such as usernames, passwords, and email addresses. In contrast to words, which are made up of a specific combination of letters to represent a linguistic term, strings do not follow any specific rules of composition and can be a random or semi-random sequence of characters.
# a string can be a word, this is a string variable
three <- 1 + 2
# or an abbreviation, this is a variable (thr) representing the string "three"
thr <- "three"
# a mass spec reference
peptide <- "QWERTK"
# or an abbreviated variable
pep <- "QWERTK"
When working with R programming language, it is essential to note that strings play a crucial role in the syntax used. Strings, which define text characters, are used to represent data in R, and they must be enclosed in quotes. Failure to do so will result in the interpreter assuming that you are referring to a variable that is not enclosed in quotes.
For instance, in the example above, the peptide
variable contains the string of letters representing the peptide amino acid sequence "QWERTK"
. However, it is essential to note that there are no strict rules for how strings and variables are composed, except that variables cannot start with a number.
There are however, conventions that you can follow when constructing variable names that aid in the readability of the code and convey information about the contents. This is especially useful in long code blocks, or, when the code becomes more complex and divested across several files. For example:
# a string containing a peptide sequence
str_pep <- "QWERTK"
# a data table of m/z values and their identifications
tbl_mz_ids <- read_csv("somefile.csv")
To learn more about and follow specific conventions, explore the following resources:
4.2.3 Numbers
Numbers are the foundation upon which all data analysis is built. Without numbers, we would not be able to perform calculations, identify patterns, or draw conclusions from our data. In the programming language R, there are two main types of numbers: integers
and floats
. An integer is a whole number with no decimal places, while a float is a number with decimal places. Understanding the difference between these two types of numbers is essential for accurate numerical analysis.
In R, integers are represented as whole numbers, such as 1, 2, 3, and so on, while floats are represented with a decimal point, such as 1.5, 2.75, and so on. It is important to note that integers occupy less space in memory than floats, which can be a consideration when working with large datasets. This means that when possible, it is generally better to use integers over floats in R, as they are more efficient and can improve the overall performance of your code.
Numbers are the foundation upon which all data analysis is built. Without numbers, we would not be able to perform calculations, identify patterns, or draw conclusions from our data. In the programming language R, there are two main types of numbers: integers
and floats
.
An integer is a whole number with no decimal places, while a float is a number with decimal places. In most programming languages, including R, integers are represented as whole numbers, such as 1, 2, 3, and so on, while floats are represented with a decimal point, such as 1.5, 2.75, and so on.
It is essential to understand the difference between these two types of numbers for accurate numerical analysis. While integers can only represent whole numbers, floats can represent fractions and decimals. Thus, if you need to represent a number that is not a whole number, you should use a float.
Moreover, it is important to note that integers occupy less space in memory than floats. This can be a consideration when working with large datasets, especially when the whole number is enough to represent the data. Therefore, when possible, it is generally better to use integers over floats in R, as they are more efficient and can improve the overall performance of your code.
4.2.4 Operators
Operators are fundamental components of programming that enable us to manipulate and process various data types. They are symbols that perform a specific action on one or more operands, which could be numeric values, variables, or even strings. Most commonly these symbols allow us to perform basic arithmetic operations such as addition, subtraction, multiplication, and division on numeric values, as well as more complex mathematical operations like exponentiation and modulus.
In addition to numeric values, operators can also manipulate string variables. For instance, we can use concatenation operators to join two or more strings together, which is particularly useful when working with text data. By utilizing operators, we can perform powerful operations that allow us to build complex programs and applications that can handle large amounts of data. Operators play a crucial role in programming, as they allow us to manipulate data in a way that would be difficult or impossible to achieve otherwise.
At their very basic, operators allow you to perform calculations ..
## [1] 3
## [1] 0.5
.. assign values to string variables ..
.. and compare values.
## [1] TRUE
## [1] FALSE
Here is a table summarizing of some common operators in R.
Operator | Name | Description | Example |
---|---|---|---|
<- |
assignmnet | assigns numerics and functions to variables | x <- 1 x now has the value of 1 |
+ |
addition | adds two numbers | 1 + 2 = 3 |
- |
subtraction | subtracts two numbers | 1 - 2 = -1 |
* |
multplication | multiplies two numbers | 1 * 2 = 2 |
/ |
division | divides two numbers | 1 / 2 = 0.5 |
^ |
power or exponent | raises one number to the power of the other | 1 ^ 2 = 1 |
= |
equals | also an assignment operator | x = 1 x now has the value of 1 |
== |
double equals | performs a comparison (exactly equal) | 1 == 1 = TRUE |
!= |
not equals | performs a negative comparison (not equal) | 1 != 2 = TRUE |
%% |
modulus | provides the remainder after division | 5 %% 2 = 1 |
Remember order of operations (PEMDAS): Parentheses, Exponents, Multiplication and Division (from left to right), Addition and Subtraction (from left to right). |
4.2.5 Variables
In programming, variables are essential elements used to store information that can in essence vary. They come in handy when we need to manipulate or retrieve the information stored in them.
Variables can be thought of as containers that can store any kind of information, such as letters, words, numbers, or text strings. They are flexible enough to hold different types of data, and we can use them to store all sorts of information.
One of the most significant advantages of using variables is that we can refer to them repeatedly to retrieve the information stored in them. We can also manipulate the information stored in them with an operation or replace it with an assignment. Variables are a powerful tool in programming that allows us to store and retrieve information, manipulate it, and perform various operations on it.
## [1] 4.14
R even has some intrinsic variables that come in handy, like pi.
## [1] 3.141593
In R it is easy to overwrite existing variables, either initialized by R or created by you, causing error and confusion. |
## [1] 9.876543
4.2.6 Statements
Using a comparison operator, you can make logical comparisons called statements.
Operator | Description | Example |
---|---|---|
| |
an either or comparison, TRUE if both are true FALSE if one is false. |
|
& |
a comparison where both must be TRUE |
|
There are also the double operators || and &&, these are intended to work as flow control operators and stop at the first condition met. In the most recent versions of R, the double operators will error out if a vector is applied. |
4.2.7 Functions
In programming, a function is a type of operator that performs a specific task and can accept additional information or parameters. Functions in the R programming language are fundamental building blocks used to encapsulate and execute a sequence of statements. They allow for modular, reusable, and efficient code development. Functions in R can perform a wide range of tasks, from simple operations like adding two numbers to complex data analyses and visualizations. The structure and behavior of functions in R are designed to support both built-in functions provided by R itself and user-defined functions created by programmers to meet specific needs.
Functions in R can do a wide range of tasks, such as perform a simple calculation and return a single variable. or a vector of variables. Functions can be used to clean, subset, merge, and transform data frames or lists. They can perform statistical modeling and analysis as well as create simple plots and complex, multi-layered graphics.
Moreover, R empowers users to define their own functions, allowing for the encapsulation of complex or repetitive tasks into single, reusable commands, enhancing the language’s flexibility and efficiency.
A function in R is defined using the function
keyword, followed by a set of parentheses that can contain any arguments (parameters) the function requires, and a body enclosed in curly braces {} that contains the code to be executed. Here’s the basic syntax:
## [1] 3
Note that this function requires the inputs for a
and b
as denoted in the parameters brackets ()
. The function then transfers those inputs into the main body of the function that performs the operation inside the curly brackets {}
. And while this single line function is compact and concise, it does not define default values, check any of the inputs, or explicitly return a value. And as a consequence we can get an error that can be confusing.
## Error in a + b: non-numeric argument to binary operator
## Error in add(1): argument "b" is missing, with no default
The following is a more robust function that can be reused at a later date that adds some readability to the process and explicitly returns, with the return()
function, the intended value. Note, that we used some functions, is.numeric()
, stop()
, and paste0
inside our function.
add <- function(
a = 1,
b = 2
) {
if(!is.numeric(a)) { stop('the first value is not a number') }
if(!is.numeric(b)) { stop(paste0('the second value "', b, '" is not a number')) }
answer <- a + b
return(answer)
}
add(1,2)
## [1] 3
## [1] 3
## Error in add(1, "two"): the second value "two" is not a number
4.3 Flow-Control
Sometimes in the course of data analysis we have to make decisions or branch decisions based on what is contained within the data.
The logic of how this done within the programming language is called flow control
. More generally, flow control
is an essential aspect of programming that allows you to control the order in which statements and functions are executed.
4.3.1 For Loop
In R, a loop is a programming construct that allows you to execute a block of code repeatedly. Loops are used when you want to perform a set of instructions repeatedly, such as when you want to iterate over a set of data and perform a particular operation on each element.
There are several types of loops available in R, including the for
loop, the while
loop, and the repeat
loop.
The for
loop is the most commonly used loop in R. It is used to iterate over a sequence of values, such as a vector or a list, and perform a particular operation on each element of the sequence. The basic syntax of a for loop in R is as follows:
Let’s look at a very simple for
loop:
# Create a vector of numbers
numbers <- c(1, 2, 3, 4, 5)
# Iterate over the vector using a for loop
for (num in numbers) {
print(num)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
This code first creates some example data, numbers 1 to 5 increasing by 1, then we go every number and print
the value. Translating the synatx to english we can say for
every num
in numbers
vector, print
the num
.
4.3.2 If-Else
When we have data that needs to be treated based on a condition of the data, we have a branching decision. In this case, our flow control is an If-Else statement. In plain english, if
a condition is met, we do something, else
we do something else. The brackets between these statements determine what is done.
Let’s take the previous example and print if the number is even or odd. You can see already that in how we formulate the programming question we already see the syntax. We say “if
the number is even…” if
is our flow control
# Create a vector of numbers
numbers <- c(1, 2, 3, 4, 5)
# Iterate over the vector using a for loop
for (num in numbers) {
if (num %% 2 == 0) {
print(paste(num, "is even"))
} else {
print(paste(num, "is odd"))
}
}
## [1] "1 is odd"
## [1] "2 is even"
## [1] "3 is odd"
## [1] "4 is even"
## [1] "5 is odd"
In this example, we’re using the %% operator to determine whether each number in the vector is even or odd. If the number is even, the code inside the if block is executed and the number is printed along with the message “is even”. If the number is odd, the code inside the else block is executed and the number is printed along with the message “is odd”. `paste is a handy function for combining
Exercises
|
- Calculate the sum of 2 and 3.
## [1] 5
- Evaluate if 0.5 is equal to 1 divided by 2.
## [1] TRUE
Define a variable that is is 98.6 degrees in Fahrenheit.
Construct an if-else statement to determine if the temperature indicates a fever (temperature greater than or equal to 100). If it does print “The temperature indicates a fever.” If the temperature is less than 100, print “The temperature does not indicate a fever.”
## [1] "The temperature indicates a fever."
- Create a function to test if a temperature is a fever called
fever_checker
the prints above, but can be reused
## [1] "The temperature indicates a fever."
## [1] "The temperature does not indicate a fever."
- Use similar logic to print if a temperature is the homeostatic range for human beings (97.7–99.5).
## [1] "Outside homeostatic range!"
## [1] "Outside homeostatic range!"
- Advanced exercise… add TRUE / FALSE returns to the functions and create a function that combines both called
temperature_check
. That gives us info on the what our temperature means (i.e., is it homeostatic, is it a fever, are we possibly dead (temperature way too low??)
4.2.1 Comments
Comments are essential parts of the code you will write. They help explain why you are taking a certain approach to the problem, either for you to remember at a later time or for a colleague. Comments in other coding languages, including R package development, can become quite expressive, representing parts and structures to a larger documentation effort. Here, however, comments are just simple text that gets ignored by the R interpreter. You can put anything you want in comments.