Chapter 4 R Syntax

Welcome to the R Book! In this chapter, we will explore the basics of R, a powerful programming language used for statistical computing and graphics.

At its most fundamental level, R is a calculator capable of performing simple, and complex, mathematical operations. It can read and write data to and from files, manipulate the data, calculate summaries and plot visual representations of the data. Essentially, it is a programmatic version of a spreadsheet program.

However, R is much more than just a calculator. It is also a platform for conducting complex analyses, statistical evaluations, predictive inferencing, and machine learning. With R, you can explore and visualize data in a variety of ways, perform advanced statistical analyses, and build predictive models.

In this chapter, we will start by examining the simplest operations of R. We will cover basic arithmetic, working with variables, and creating basic plots. By the end of this chapter, you will have a solid understanding of the fundamentals of R and be ready to tackle more complex topics.

So, let’s get started!


At the end of this chapter you should be able to

  • Understand R’s syntax, variables, operators and functions.

  • Create and edit a project in RStudio.


4.1 Reserved Words

As we begin our journey, it’s important to keep in mind that there are certain reserved words that carry a special meaning and cannot be used as identifiers. These words have been set aside by the R programming language, and using them as variable names or function names could lead to errors in your code.

Therefore, before we dive too deeply into our R programming endeavors, let’s take a moment to familiarize ourselves with these reserved words. This will help us avoid potential issues down the road and ensure that our code runs smoothly.

# to read more about them type
?reserved
Word                              Use
if, else                          flow control, part of the if-else statement
for, repeat, while, break, next   flow control, part of the loop statements
function                          basis for defining new algorithms
TRUE, FALSE                       Boolean logic values
NULL                              an undefined value
Inf, -Inf                         an infinite value (e.g. 1/0)
NaN                               ‘not a number’
NA                                a missing value indicator


NULL represents the absence of a value altogether: it is an empty object of length zero. NA, by contrast, marks a single missing value in a position where a value is expected, such as a missing entry in a column of numbers.
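The difference between NULL and NA is easier to see in practice. The following is a minimal sketch using only base R; the variable names and the measurement values are made up for illustration.

```r
# NULL is an empty object of length zero
nothing <- NULL
length(nothing)
## [1] 0

# NA is a placeholder for one missing value inside a vector
masses <- c(100.5, NA, 300.25)   # hypothetical measurements
length(masses)
## [1] 3
is.na(masses)
## [1] FALSE  TRUE FALSE

# many summary functions return NA unless told to skip missing values
mean(masses)
## [1] NA
mean(masses, na.rm = TRUE)
## [1] 200.375
```

Note that the NA still occupies a position in the vector, which is why the length is 3 rather than 2.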


4.2 Types

Whether you’re just starting out or a seasoned pro, understanding the different components of R code is essential for writing high-quality, efficient R programs. In this section, we’ll take a deep dive into the various components of R code that you should be familiar with.

R input is composed of typed characters that represent different parts of a process or mathematical operation. These characters come together to form what we call R code. It’s important to note that R code is not just a random collection of characters - each character serves a specific purpose and contributes to the larger structure of the code. As such, understanding the different components of R code is key to writing effective and efficient R programs.

So, what are these different components of R code? Below, we’ve provided some examples to help you get started:

4.2.1 comments     # this is an important note
4.2.2 strings      "letters" or "numbers" in quotes
4.2.3 numbers      1 integers or 1.000002 floats
4.2.4 operators    +, -, /, *, …
4.2.5 variables    var <- 2 containers for information
4.2.6 statements   == exactly the same, != not the same
4.2.7 functions    add(x, y) complex code in a convenient wrapper

By understanding these different components of R code, you’ll be well on your way to writing effective and efficient R programs. So let’s dive in and get started!

# adding two numbers here and storing it as a variable
four <- 2 + 2

# using the function 'cat' to print out my variable along with some text
cat("my number is ", four)
## my number is  4


R does not require a line-ending character such as the ; used in Java, PHP, or C++. The end of the line ends the statement.


4.2.1 Comments

Comments are essential parts of the code you will write. They explain why you are taking a certain approach to a problem, either for you to remember at a later time or for a colleague. In other settings, including R package development, comments can become quite expressive and form part of a larger documentation effort. Here, however, comments are simply text that the R interpreter ignores: everything from a # to the end of the line. You can put anything you want in comments.

oops, not a comment
# This is a comment

# and here a comment tag is used to ignore legitimate R code
# four <- 2 + 2 
four <- 2 * 2

4.2.2 Strings

Strings are sequences of characters, which can include letters, digits, spaces, and punctuation. They are used in programming languages to represent text-based data. A string can be as simple as a single character, such as “A”, or a longer sequence such as “Hello, World!”. Strings are often used to store data that requires text manipulation, such as usernames, passwords, and email addresses. In contrast to words, which are specific combinations of letters representing a linguistic term, strings do not follow any rules of composition and can be a random or semi-random sequence of characters.

# a variable name can be a word; here the variable three holds a number, not a string
three <- 1 + 2
# or an abbreviation; here the variable thr holds the string "three"
thr <- "three" 
# a mass spec reference
peptide <- "QWERTK"
# or an abbreviated variable
pep <- "QWERTK"

Strings play a crucial role in R syntax. They represent text data and must be enclosed in quotes. If you leave the quotes off, the interpreter assumes you are referring to a variable of that name rather than to a piece of text.

For instance, in the example above, the peptide variable contains the string of letters representing the peptide amino acid sequence "QWERTK". There are few strict rules for how strings and variable names are composed. A string can contain any characters, while a variable name may contain letters, digits, periods, and underscores, but cannot start with a number or an underscore.

# permitted
b4 <- 1 + 3
# not permitted
4b <- 1 + 3
## Error: unexpected symbol in "4b"
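The quoting and naming rules described above can be checked interactively. The sketch below uses only base R; is_valid_name() is a hypothetical helper written for this illustration, built on make.names(), which converts any string into a syntactically valid name.

```r
peptide <- "QWERTK"

# quoted: R treats this as a literal string
"peptide"
## [1] "peptide"

# unquoted: R looks up the variable and returns its contents
peptide
## [1] "QWERTK"

# a string that make.names() returns unchanged was already a valid name
is_valid_name <- function(x) make.names(x) == x   # hypothetical helper
is_valid_name("b4")
## [1] TRUE
is_valid_name("4b")
## [1] FALSE
```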

There are, however, conventions that you can follow when constructing variable names that aid the readability of the code and convey information about the contents. This is especially useful in long code blocks, or when the code becomes more complex and is spread across several files. For example:

# a string containing a peptide sequence
str_pep <- "QWERTK"

# a data table of m/z values and their identifications
# (read_csv() comes from the readr package, loaded with library(readr))
tbl_mz_ids <- read_csv("somefile.csv")

To learn more about and follow specific conventions, explore the following resources:

4.2.3 Numbers

Numbers are the foundation upon which all data analysis is built. Without numbers, we would not be able to perform calculations, identify patterns, or draw conclusions from our data. In R, there are two main types of numbers: integers and floats (which R calls doubles). An integer is a whole number with no decimal places, while a float is a number that can have decimal places.

R stores any number you type as a float by default, even a whole number such as 1, 2, or 3. To create a true integer, append the suffix L, as in 1L. An integer occupies 4 bytes of memory and a float 8 bytes, which can be a consideration when working with large datasets of whole numbers. Keep in mind, though, that much of R's arithmetic, including division, returns floats regardless of the input type.

# integers (the L suffix tells R to store the value as an integer)
c(1L, 12345L, -17L, 0L)

While integers can only represent whole numbers, floats can represent fractions and decimals. Thus, if you need to represent a number that is not a whole number, you should use a float. A float can be thought of as a significand scaled by a base raised to an exponent:

# floats
significand <- 12345
exponent <- -3
base <- 10

# 12.345 = 12345 * 10^-3
significand * base ^ exponent
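To see which type R has actually chosen for a number, the base function typeof() can be used. The following short sketch also illustrates why floats, being approximations, are best compared with a tolerance, here using the base function all.equal(). The values are arbitrary examples.

```r
typeof(1)      # a plain number is stored as a double (float)
## [1] "double"
typeof(1L)     # the L suffix creates an integer
## [1] "integer"
typeof(1.5)
## [1] "double"

# division returns a double, even for two integers
typeof(4L / 2L)
## [1] "double"

# %/% is integer division and keeps the integer type
typeof(5L %/% 2L)
## [1] "integer"

# floats are approximations, so exact comparison can surprise
0.1 + 0.2 == 0.3
## [1] FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))
## [1] TRUE
```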

4.2.4 Operators

Operators are fundamental components of programming that enable us to manipulate and process various data types. They are symbols that perform a specific action on one or more operands, which could be numeric values, variables, or even strings. Most commonly these symbols allow us to perform basic arithmetic operations such as addition, subtraction, multiplication, and division on numeric values, as well as more complex mathematical operations like exponentiation and modulus.

Operators are not limited to numeric values. Comparison operators work on strings as well, for example to test whether two strings are equal. Note that R has no operator for joining strings; concatenation is done with the functions paste() and paste0(). Operators play a crucial role in programming, as they allow us to manipulate data in a way that would be difficult or impossible to achieve otherwise.
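Base R does not define + for strings, so joining text is usually done with the functions paste() and paste0(). A minimal sketch, with made-up example strings:

```r
first <- "QWE"
second <- "RTK"

paste(first, second)              # joins with a space by default
## [1] "QWE RTK"
paste0(first, second)             # joins with no separator
## [1] "QWERTK"
paste(first, second, sep = "-")   # joins with a chosen separator
## [1] "QWE-RTK"

# comparison operators do work on strings
paste0(first, second) == "QWERTK"
## [1] TRUE
```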

At their most basic, operators allow you to perform calculations ..

1 + 2
## [1] 3
1 / 2
## [1] 0.5

.. assign values to variables ..

myvar <- 1

.. and compare values.

1 == myvar
## [1] TRUE
2 != myvar + myvar
## [1] FALSE

Here is a table summarizing some common operators in R.

Operator   Name                Description                                       Example
<-         assignment          assigns numerics and functions to variables       x <- 1 (x now has the value of 1)
+          addition            adds two numbers                                  1 + 2 = 3
-          subtraction         subtracts two numbers                             1 - 2 = -1
*          multiplication      multiplies two numbers                            1 * 2 = 2
/          division            divides two numbers                               1 / 2 = 0.5
^          power or exponent   raises one number to the power of the other       1 ^ 2 = 1
=          equals              also an assignment operator                       x = 1 (x now has the value of 1)
==         double equals       performs a comparison (exactly equal)             1 == 1 = TRUE
!=         not equals          performs a negative comparison (not equal)        1 != 2 = TRUE
%%         modulus             provides the remainder after division             5 %% 2 = 1


Remember order of operations (PEMDAS): Parentheses, Exponents, Multiplication and Division (from left to right), Addition and Subtraction (from left to right).
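A few quick calculations illustrate how R applies the order of operations. This is a small sketch with arbitrary numbers; the cases with the minus sign and the chained exponents are the ones that most often surprise.

```r
1 + 2 * 3      # multiplication before addition
## [1] 7
(1 + 2) * 3    # parentheses first
## [1] 9
8 / 4 / 2      # division runs left to right
## [1] 1
-2^2           # the exponent is applied before the minus sign
## [1] -4
(-2)^2
## [1] 4
2^3^2          # exponents run right to left: 2^(3^2)
## [1] 512
```

When in doubt, adding parentheses makes the intended order explicit.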


4.2.5 Variables

In programming, variables are essential elements used to store information that can in essence vary. They come in handy when we need to manipulate or retrieve the information stored in them.

Variables can be thought of as containers that can store any kind of information, such as letters, words, numbers, or text strings. They are flexible enough to hold different types of data, and we can use them to store all sorts of information.

One of the most significant advantages of using variables is that we can refer to them repeatedly to retrieve the information stored in them. We can also manipulate the information stored in them with an operation or replace it with an assignment. Variables are a powerful tool in programming that allows us to store and retrieve information, manipulate it, and perform various operations on it.

# create two variables and assign values to each
var_a <- 1
var_b <- 3.14

var_a + var_b
## [1] 4.14
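As described above, the contents of a variable can be updated with an operation or replaced outright by a new assignment. A minimal sketch, with variable names chosen only for illustration:

```r
count <- 1
count <- count + 1          # update the value using the old value
count
## [1] 2

label <- "sample"
label <- paste0(label, "_", count)
label
## [1] "sample_2"

# a new assignment can even change the type of the contents
count <- "two"
class(count)
## [1] "character"
```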

R even has some intrinsic variables that come in handy, like pi.

pi
## [1] 3.141593


In R it is easy to overwrite existing variables, whether initialized by R or created by you, which can cause errors and confusion.


pi <- 9.876543
pi
## [1] 9.876543
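The original value is not lost. Assigning to pi creates a new variable in your workspace that hides the built-in one. The sketch below, using only base R, shows two ways to get the built-in value back.

```r
pi <- 9.876543

# the built-in value can still be reached through the base package
base::pi
## [1] 3.141593

# removing your own variable uncovers the built-in one again
rm(pi)
pi
## [1] 3.141593
```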

4.2.6 Statements

Using a comparison operator, you can make logical comparisons called statements.

Operator   Description                                               Example
|          an either-or comparison, TRUE if at least one is TRUE,    1 == 1 | 1 != 2 = TRUE
           FALSE only if both are FALSE                              1 == 1 | 1 == 2 = TRUE
                                                                     1 != 1 | 1 == 2 = FALSE
&          a comparison where both must be TRUE                      1 == 1 & 1 != 2 = TRUE
                                                                     1 == 1 & 1 == 2 = FALSE


There are also the double operators || and &&. These are intended for flow control: they evaluate from left to right and stop as soon as the result is known. As of R 4.3.0, the double operators raise an error if either side is a vector of length greater than one.
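The difference between the single and double operators can be sketched as follows. The single operators compare vectors element by element, while the double operators work on single values and skip the right-hand side when the left-hand side already decides the result. The vectors a and b are arbitrary examples.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

# single operators work element by element
a & b
## [1]  TRUE FALSE FALSE
a | b
## [1] TRUE TRUE TRUE

# double operators stop early: stop() is never reached here
FALSE && stop("never evaluated")
## [1] FALSE
TRUE || stop("never evaluated")
## [1] TRUE
```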


4.2.7 Functions

In programming, a function is a type of operator that performs a specific task and can accept additional information or parameters. Functions in the R programming language are fundamental building blocks used to encapsulate and execute a sequence of statements. They allow for modular, reusable, and efficient code development. Functions in R can perform a wide range of tasks, from simple operations like adding two numbers to complex data analyses and visualizations. The structure and behavior of functions in R are designed to support both built-in functions provided by R itself and user-defined functions created by programmers to meet specific needs.

Functions can return a single value or a vector of values. They can be used to clean, subset, merge, and transform data frames or lists, to perform statistical modeling and analysis, and to create anything from simple plots to complex, multi-layered graphics.

Moreover, R empowers users to define their own functions, allowing for the encapsulation of complex or repetitive tasks into single, reusable commands, enhancing the language’s flexibility and efficiency.

A function in R is defined using the function keyword, followed by a set of parentheses that can contain any arguments (parameters) the function requires, and a body enclosed in curly braces {} that contains the code to be executed. Here’s the basic syntax:

add <- function(a,b) { a + b }
add(1,2)
## [1] 3

Note that this function requires inputs for a and b, as denoted in the parentheses (). The function passes those inputs to the body, which performs the operation inside the curly braces {}. While this single-line function is compact and concise, it does not define default values, check any of the inputs, or explicitly return a value. As a consequence, we can get errors that are confusing.

add(1,'two')
## Error in a + b: non-numeric argument to binary operator
add(1)
## Error in add(1): argument "b" is missing, with no default

The following is a more robust, reusable version that is easier to read, supplies default values, checks its inputs, and explicitly returns the intended value with the return() function. Note that we use the functions is.numeric(), stop(), and paste0() inside our function.

add <- function(
    a = 1,
    b = 2
) { 
  if(!is.numeric(a)) { stop('the first value is not a number') }
  if(!is.numeric(b)) { stop(paste0('the second value "', b, '" is not a number')) }
  answer <- a + b 
  return(answer)
}
add(1,2)
## [1] 3
add(1)
## [1] 3
add(1,'two')
## Error in add(1, "two"): the second value "two" is not a number
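Two further points may be worth trying with a function like this: arguments can be passed by name, in any order, and an error raised by stop() can be caught with the base function tryCatch() so that a script keeps running. The sketch below redefines a shortened version of add() so that it is self-contained; the shortened error messages are a simplification for this illustration.

```r
add <- function(a = 1, b = 2) {
  if (!is.numeric(a)) { stop("the first value is not a number") }
  if (!is.numeric(b)) { stop("the second value is not a number") }
  return(a + b)
}

# named arguments: a falls back to its default of 1
add(b = 10)
## [1] 11

# named arguments can be given in any order
add(b = 10, a = 5)
## [1] 15

# catch the error and keep its message instead of halting
result <- tryCatch(
  add(1, "two"),
  error = function(e) conditionMessage(e)
)
result
## [1] "the second value is not a number"
```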

4.3 Flow-Control

Sometimes in the course of data analysis we have to make decisions, or branch, based on what is contained within the data. The logic of how this is done within the programming language is called flow control. More generally, flow control is an essential aspect of programming that allows you to control the order in which statements and functions are executed.

4.3.1 For Loop

In R, a loop is a programming construct that allows you to execute a block of code repeatedly. Loops are used when you want to perform a set of instructions repeatedly, such as when you want to iterate over a set of data and perform a particular operation on each element.

There are several types of loops available in R, including the for loop, the while loop, and the repeat loop.

The for loop is the most commonly used loop in R. It is used to iterate over a sequence of values, such as a vector or a list, and perform a particular operation on each element of the sequence. The basic syntax of a for loop in R is as follows:

for (var in sequence) {
  # code to be executed
}

Let’s look at a very simple for loop:

# Create a vector of numbers
numbers <- c(1, 2, 3, 4, 5)

# Iterate over the vector using a for loop
for (num in numbers) {
  print(num)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5

This code first creates some example data, the numbers 1 to 5 increasing by 1, then goes through every number and prints its value. Translating the syntax to English, we can say: for every num in the numbers vector, print the num.
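The while and repeat loops mentioned earlier follow the same pattern. As a minimal sketch, both loops below print the numbers 1 to 3. The counters i and j are arbitrary names. A while loop checks its condition before each pass, whereas a repeat loop runs until it reaches break.

```r
# while: keep going as long as the condition is TRUE
i <- 1
while (i <= 3) {
  print(i)
  i <- i + 1
}
## [1] 1
## [1] 2
## [1] 3

# repeat: keep going until break is called
j <- 1
repeat {
  print(j)
  j <- j + 1
  if (j > 3) break
}
## [1] 1
## [1] 2
## [1] 3
```

Forgetting to update the counter, or to include a break, leaves either loop running indefinitely.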

4.3.2 If-Else

When data need to be treated differently depending on a condition, we have a branching decision. In this case, our flow control is an if-else statement. In plain English: if a condition is met, we do something, else we do something else. The curly braces that follow if and else enclose the code that is run in each case.

Let’s take the previous example and print whether each number is even or odd. Notice that the way we phrase the question already suggests the syntax: we say “if the number is even…”, and if is our flow control.

# Create a vector of numbers
numbers <- c(1, 2, 3, 4, 5)

# Iterate over the vector using a for loop
for (num in numbers) {
  if (num %% 2 == 0) {
    print(paste(num, "is even"))
  } else {
    print(paste(num, "is odd"))
  }
}
## [1] "1 is odd"
## [1] "2 is even"
## [1] "3 is odd"
## [1] "4 is even"
## [1] "5 is odd"

In this example, we’re using the %% operator to determine whether each number in the vector is even or odd. If the number is even, the code inside the if block is executed and the number is printed along with the message “is even”. If the number is odd, the code inside the else block is executed and the number is printed along with the message “is odd”. paste() is a handy function for combining values into a single string.
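When there are more than two possible outcomes, additional conditions can be chained with else if. The following sketch, with made-up example numbers, sorts each value into one of three cases. The conditions are tested from top to bottom and only the first match is run.

```r
# Create a vector of numbers
numbers <- c(-2, 0, 3)

for (num in numbers) {
  if (num < 0) {
    print(paste(num, "is negative"))
  } else if (num == 0) {
    print(paste(num, "is zero"))
  } else {
    print(paste(num, "is positive"))
  }
}
## [1] "-2 is negative"
## [1] "0 is zero"
## [1] "3 is positive"
```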

Exercises

  • Create a new RStudio Project and name it 002_basics.

  • Create a new R script, add your name and date at the top as comments.

# Your Name
# YYYY-MM-DD
# Institution
#
# Description
  1. Calculate the sum of 2 and 3.
## [1] 5
  2. Evaluate if 0.5 is equal to 1 divided by 2.
## [1] TRUE
  3. Test if 3 is an even number. Hint: use the round() or floor() functions and a comparison operator (e.g. if the number is even there will not be a remainder).
## [1] FALSE
  4. Create a function to test if a value is even, resulting in TRUE or FALSE.
even(3)
## [1] FALSE
  5. Construct an if-else statement to test if the number three is odd or even.
## [1] "odd"
  6. Create a function to test for even or odd by returning a string.
oddeven(3)
## [1] "odd"
  7. Construct a for-loop to test, using the oddeven() function from the previous exercise, if the numbers between 1 and 9 are odd or even, printing the number and string for each on a new line.
## 1     odd 
## 2     even 
## 3     odd 
## 4     even 
## 5     odd 
## 6     even 
## 7     odd 
## 8     even 
## 9     odd