Basic operations
# Addition
1 + 1
## [1] 2
# Subtraction
2 - 3
## [1] -1
# Multiplication
2 * 3
## [1] 6
# Division
6 / 4
## [1] 1.5
# Integer division
6 %/% 4
## [1] 1
# Remainder
6 %% 4
## [1] 2
# Exponentiation
2 ^ 3
## [1] 8
8 ^ (1 / 3)
## [1] 2
Objects in R
To create an object, we assign something, in this case a value, to a name using the assignment operator <- or =:
obj1 <- 5
obj2 = 5 + 2
Notice that both objects have been created and appear in the upper-right pane (Environment). We can print their values by executing the object name:
obj1
## [1] 5
or by printing the value explicitly with the print() function:
print(obj2)
## [1] 7
We can also modify an object by assigning a new value to it:
obj1 = obj2
obj1
## [1] 7
A very common way to update an object is to use its current value as the starting point:
obj1 = obj1 + 3
obj1
## [1] 10
This will become especially relevant when we work with loops.
We can inspect the object type using the class() function:
class(obj1)
## [1] "numeric"
Therefore, obj1 is a real number. There are five atomic object classes in R, that is, objects containing a single value:
character: textnumeric: real numberinteger: integercomplex: complex numberlogical: true/false (1 or 0)
num_real = 3
class(num_real)
## [1] "numeric"
num_inteiro = 3L # use the suffix L for integers
class(num_inteiro)
## [1] "integer"
texto = "Hi"
print(texto)
## [1] "Oi"
class(texto)
## [1] "character"
boolean = 2>1
print(boolean)
## [1] TRUE
class(boolean)
## [1] "logical"
boolean2 = T # alternatively, you could write TRUE
print(boolean2)
## [1] TRUE
boolean3 = F # alternatively, you could write FALSE
print(boolean3)
## [1] FALSE
Logical/Boolean expressions
These are expressions that return either TRUE or FALSE:
class(TRUE)
## [1] "logical"
class(FALSE)
## [1] "logical"
T + F + T + F + F # sum of 1s (TRUE) and 0s (FALSE)
## [1] 2
2 < 20
## [1] TRUE
19 >= 19
## [1] TRUE
100 == 10^2
## [1] TRUE
100 != 20*5
## [1] FALSE
We can write compound expressions using | (or) and & (and):
x = 20
x < 0 | x^2 > 100
## [1] TRUE
x < 0 & x^2 > 100
## [1] FALSE
Operator precedence table
- Level 6 - exponentiation:
^- Level 5 - multiplication/division:
*,/,%/%,%%- Level 4 - addition/subtraction:
+,-- Level 3 - relational operators:
==,!=,<=,>=,>,<- Level 2 - logical:
&(and)- Level 1 - logical:
|(or)
Vectors
After the five object classes introduced above, the most basic multi-element structures are vectors and lists. A vector requires all elements to belong to the same class. We can create a vector with the c() function by separating the elements with commas:
x = c(0.5, 0.6) # numeric
x = c(TRUE, FALSE) # logical
x = c("a", "b", "c") # character
x = 9:12 # integer (equivalent to c(9, 10, 11, 12))
x = c(1+0i, 2+4i) # complex
If we use the c() function with elements from different classes, R coerces the object into the most general class:
y = c(1.7, "a") # (numeric, character) -> character
class(y)
## [1] "character"
y = c(FALSE, 0.75) # (logical, numeric) -> numeric
class(y)
## [1] "numeric"
y = c("a", TRUE) # (character, logical) -> character
class(y)
## [1] "character"
Note:
character > complex > numeric > integer > logical
We can also force a conversion toward a less general class, which typically leads to:
- the more general elements becoming missing values (
NAs),
as.numeric(c(1.7, "a")) # (numeric, character)
## Warning: NAs introduced by coercion
## [1] 1.7 NA
as.logical(c("a", TRUE)) # (character, logical)
## [1] NA TRUE
- [exception] a character entry containing a number, such as
"9", can be coerced to numeric
as.numeric(c(1.7, "9")) # (numeric, character number)
## [1] 1.7 9.0
- [exception] a nonzero numeric value becomes TRUE when coerced to logical
- [exception] a zero-valued numeric entry becomes FALSE when coerced to logical
as.logical(c(FALSE, 0.75, -10)) # (logical, numeric > 0, numeric < 0)
## [1] FALSE TRUE TRUE
as.logical(c(TRUE, 0)) # (logical, numeric zero)
## [1] TRUE FALSE
- [exception] logical character values such as
"TRUE","T","FALSE", and"F"can be coerced to logical values, whereas"0"and"1"becomeNA
as.logical(c("T", "FALSE", "1", TRUE)) # (character TRUE/FALSE, logical)
## [1] TRUE FALSE NA TRUE
Building sequence vectors
- A convenient way to build a numeric sequence is to use the
seq()function.
seq(from = 1, to = 1, by = ((to - from)/(length.out - 1)),
length.out = NULL, ...)
from, to: the starting and (maximal) end values of the sequence. Of length 1 unless just from is supplied as an unnamed argument.
by: number, increment of the sequence.
length.out: desired length of the sequence. A non-negative number, which for seq and seq.int will be rounded up if fractional.
- Notice that the arguments already have default values, so sequences can be defined in different ways.
- Once
fromandtoare specified, we can either:- define
byto determine the step between consecutive elements, or - define
length.out(or simplylength) to determine how many elements the sequence should contain
- define
seq(from=0, to=10, by=2)
## [1] 0 2 4 6 8 10
seq(from=0, to=10, length=5)
## [1] 0.0 2.5 5.0 7.5 10.0
Building vectors with repeated elements
- We can build vectors with repeated elements using the
rep()function.
rep(x, times)
x: a vector (of any mode including a list) or a factor or (for rep only) a POSIXct or POSIXlt or Date object.
rep(0, 10) # repetition of 10 zeros
## [1] 0 0 0 0 0 0 0 0 0 0
rep(c("a", "b"), 2) # repeat the vector c("a", "b")
## [1] "a" "b" "a" "b"
Matrices
Matrices are vectors and therefore contain elements from the same class, but with a dimension attribute, that is, a number of rows by a number of columns. A matrix can be created with the matrix() function:
matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, ...)
data: an optional data vector (including a list or expression vector). Non-atomic classed R objects are coerced by as.vector and all attributes discarded.
nrow: the desired number of rows.
ncol: the desired number of columns.
byrow: logical. If FALSE (the default) the matrix is filled by columns, otherwise the matrix is filled by rows.
m = matrix(nrow=2, ncol=3)
m
## [,1] [,2] [,3]
## [1,] NA NA NA
## [2,] NA NA NA
We can create a filled matrix by supplying a vector with the required number of entries (rows \(\times\) columns). By default, the vector fills the matrix column by column (column-wise):
m = matrix(1:6, nrow=2, ncol=3)
m
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
Another way to create matrices is by binding vectors by columns (column-binding) or by rows (row-binding) using cbind() and rbind(), respectively:
cbind(...)
rbind(...)
... : (generalized) vectors or matrices. These can be given as named arguments. Other R objects may be coerced as appropriate, or S4 methods may be used: see sections ‘Details’ and ‘Value’. (For the "data.frame" method of cbind these can be further arguments to data.frame such as stringsAsFactors.)
x = 1:3
y = 10:12
cbind(x, y)
## x y
## [1,] 1 10
## [2,] 2 11
## [3,] 3 12
rbind(x, y)
## [,1] [,2] [,3]
## x 1 2 3
## y 10 11 12
Lists
In contrast, a list allows its elements to belong to different classes and may even contain a vector as one of its elements. Lists can be created with the list() function:
x = list(1, "a", TRUE, 1+4i, c(0.5, 0.6))
x
## [[1]]
## [1] 1
##
## [[2]]
## [1] "a"
##
## [[3]]
## [1] TRUE
##
## [[4]]
## [1] 1+4i
##
## [[5]]
## [1] 0.5 0.6
class(x)
## [1] "list"
Data frames
A data frame is a special type of list in which each element has the same length
Each list element can be interpreted as a column in a dataset
Unlike matrices, each element of a data frame may belong to a different class
It is often created automatically when we load a dataset from a
.txtor.csvfile withread.table()orread.csv()It can also be created manually with
data.frame()
data.frame(..., stringsAsFactors = FALSE)
... : these arguments are of either the form value or tag = value. Component names are created based on the tag (if present) or the deparsed argument itself.
stringsAsFactors: logical: should character vectors be converted to factors?.
x = data.frame(foo=1:4, bar=c(T, T, F, F))
x
## foo bar
## 1 1 TRUE
## 2 2 TRUE
## 3 3 FALSE
## 4 4 FALSE
nrow(x) # Number of rows in x
## [1] 4
ncol(x) # Number of columns in x
## [1] 2
Importing data frames
- Reading tabular data (John Hopkins/Coursera)
- For reading tabular data, the most widely used functions are
read.table()andread.csv() read.table()uses the following arguments, which also appear in other data import functions:file: file path, including its extensionheader:TRUEorFALSE, indicating whether the first row of the dataset is a headersep: indicates how the columns are separatedstringAsFactors:TRUEorFALSE, depending on whether text variables should be converted to factors.
data_txt = read.table("mtcars.txt") # also reads .csv files
data_csv = read.csv("mtcars.csv")
- If you want to test it, download the datasets mtcars.txt and mtcars.csv.
- If you have not defined a working directory, you need to provide the full path of the file you want to import:
data = read.table("C:/Users/Fabio/OneDrive/FEA-RP/mtcars.csv")
read.csv()is similar toread.table(), but it assumes that the separator is a comma by default (sep = ",").- To export a dataset, we typically use
write.csv()orwrite.table(), supplying a data frame and the desired file name including its extension.
Importing other formats
- To open Excel files in
.xlsor.xlsxformat, you need thexlsxpackage.
read.xlsx("mtcars.xlsx", sheetIndex=1) # Read the first worksheet of the Excel file
If you want to test it, download mtcars.xlsx.
- To open SPSS, Stata, and SAS files, use the
havenpackage and the functionsread_spss(),read_dta(), andread_sas(), respectively.
Note that R uses a period (
.) as the decimal separator, whereas Brazilian-formatted files often use a comma.This may lead to incorrect imports if the
.csvor.xlsxfile was generated with Brazilian formatting conventions.To handle this issue, use
read.csv2()andread.xlsx2()so the data are imported correctly under Brazilian formatting conventions. If you want to test it, download mtcars_br.csv and mtcars_br.xlsx.
Extracting subsets
- Subsetting - Basics (John Hopkins/Coursera)
- There are three basic operators for extracting subsets of objects in R:
[]: returns a sub-object of the same class as the original object[[]]: used to extract elements from a list or data frame, where the extracted object is not necessarily of the same class as the original object$: used to extract an element from a list or data frame by name
Subsetting vectors
x = c(1, 2, 3, 3, 4, 1)
x[1] # extract the first element of x
## [1] 1
x[1:4] # extract the first through fourth elements of x
## [1] 1 2 3 3
- Notice that a logical expression applied to a vector returns a logical vector.
x > 1
## [1] FALSE TRUE TRUE TRUE TRUE FALSE
- Using
[], we can extract a subset of the vectorxusing a condition. For example, we can extract only values greater than 1:
x[x > 1]
## [1] 2 3 3 4
x[c(F, T, T, T, T, F)] # Equivalent to x[x > 1] - keep only TRUE entries
## [1] 2 3 3 4
Subsetting lists
- Subsetting - Lists (John Hopkins/Coursera)
- Unlike vectors, extracting a specific value from a list requires
[[]]together with the position of the list element.
x = list(foo=1:4, bar=0.6)
x
## $foo
## [1] 1 2 3 4
##
## $bar
## [1] 0.6
x[1] # returns the list element foo
## $foo
## [1] 1 2 3 4
class(x[1])
## [1] "list"
x[[1]] # returns the vector foo
## [1] 1 2 3 4
class(x[[1]])
## [1] "integer"
- If you want to access an element inside that list element, you must follow it with
[].
x[[1]][2]
## [1] 2
x[[2]][1]
## [1] 0.6
- We can also use the element name to extract a subset of the object.
x[["foo"]]
## [1] 1 2 3 4
x$foo
## [1] 1 2 3 4
Subsetting matrices and data frames
- Subsetting - Matrices (John Hopkins/Coursera)
- To extract part of a matrix or data frame, we specify the rows and columns inside
[].
x = matrix(1:6, 2, 3)
x
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
x[1, 2] # row 1, column 2 of matrix x
## [1] 3
x[1:2, 2:3] # rows 1-2 and columns 2-3 of matrix x
## [,1] [,2]
## [1,] 3 5
## [2,] 4 6
- We can select rows or columns with a logical vector (
TRUEs andFALSEs) of the appropriate length:
x[, c(F, T, T)] # logical vector selecting the last two columns
## [,1] [,2]
## [1,] 3 5
## [2,] 4 6
- We can select entire rows or columns by leaving out the corresponding index:
x[1, ] # row 1 and all columns
## [1] 1 3 5
x[, 2] # all rows and column 2
## [1] 3 4
- Notice that when the subset is a single value or a vector, the returned object stops being a matrix. We can force R to keep it as a matrix with
drop = FALSE.
x[1, 2, drop = FALSE]
## [,1]
## [1,] 3
Removing missing values (NA)
- Subsetting - Removing missing values (John Hopkins/Coursera)
- Removing missing data is a common step in data manipulation.
- To check which entries are
NA, use theis.na()function.
x = c(1, 2, NA, 4, NA, NA)
is.na(x)
## [1] FALSE FALSE TRUE FALSE TRUE TRUE
sum(is.na(x)) # number of missing values
## [1] 3
- Recall that the
!operator negates an expression, so!is.na()returns the elements that are not missing.
x[ !is.na(x) ]
## [1] 1 2 4
Vector and matrix operations
- Vectorized operations (John Hopkins/Coursera)
- When standard arithmetic operations are applied to vectors, each element is combined with the element in the same position of the other vector.
x = 1:4
y = 6:9
x + y # add elements in the same position
## [1] 7 9 11 13
x + 2 # add the same scalar to each element
## [1] 3 4 5 6
x * y # multiply elements in the same position
## [1] 6 14 24 36
x / y # divide elements in the same position
## [1] 0.1666667 0.2857143 0.3750000 0.4444444
- The
%*%operator is used for inner and matrix products. By default, R treats the first vector as a row vector and the second as a column vector.
# Vector operations
x %*% y # x as row vector / y as column vector
## [,1]
## [1,] 80
x %*% t(y) # x as column vector / y as row vector (only the second is transposed)
## [,1] [,2] [,3] [,4]
## [1,] 6 7 8 9
## [2,] 12 14 16 18
## [3,] 18 21 24 27
## [4,] 24 28 32 36
- We can also explicitly force a vector to be represented as a row or column using
matrix().
# Turning vectors into matrix objects
x_col = matrix(x, ncol=1) # column vector
x_col
## [,1]
## [1,] 1
## [2,] 2
## [3,] 3
## [4,] 4
y_lin = matrix(y, nrow=1)
y_lin
## [,1] [,2] [,3] [,4]
## [1,] 6 7 8 9
# Vector operations
x_col %*% y_lin # x as column vector / y as row vector
## [,1] [,2] [,3] [,4]
## [1,] 6 7 8 9
## [2,] 12 14 16 18
## [3,] 18 21 24 27
## [4,] 24 28 32 36
t(x_col) %*% t(y_lin) # x as row vector / y as column vector
## [,1]
## [1,] 80
- The same logic applies to matrices:
x = matrix(1:4, nrow=2, ncol=2)
x
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
y = matrix(rep(10, 4), nrow=2, ncol=2)
y
## [,1] [,2]
## [1,] 10 10
## [2,] 10 10
x + y # add elements in the same position
## [,1] [,2]
## [1,] 11 13
## [2,] 12 14
x + 2 # add the same scalar to each matrix entry
## [,1] [,2]
## [1,] 3 5
## [2,] 4 6
x * y # multiply elements in the same position
## [,1] [,2]
## [1,] 10 30
## [2,] 20 40
x %*% y # matrix multiplication
## [,1] [,2]
## [1,] 40 40
## [2,] 60 60
Basic statistics for vectors and matrices
- Absolute values:
abs()
x = c(1, 4, -5, 2, 8, -2, 4, 7, 8, 0, 2, 3, -5, 7, 4, -4, 2, 5, 2, -3)
x
## [1] 1 4 -5 2 8 -2 4 7 8 0 2 3 -5 7 4 -4 2 5 2 -3
abs(x)
## [1] 1 4 5 2 8 2 4 7 8 0 2 3 5 7 4 4 2 5 2 3
- Sum:
sum(..., na.rm = FALSE)
sum(x)
## [1] 40
- Mean:
mean(x, trim = 0, na.rm = FALSE, ...)
mean(x)
## [1] 2
- Standard deviation:
sd(x, na.rm = FALSE)
sd(x)
## [1] 4.129483
- Quantiles:
quantile(x, probs = seq(0, 1, 0.25), na.rm = FALSE, ...)
# Minimum, first quartile, median, third quartile, and maximum
quantile(x, probs=c(0, .25, .5, .75, 1))
## 0% 25% 50% 75% 100%
## -5.00 -0.50 2.00 4.25 8.00
- Maximum and minimum:
max(..., na.rm = FALSE)andmin(..., na.rm = FALSE)
# Minimum, first quartile, median, third quartile, and maximum
max(x) # Maximum value
## [1] 8
min(x) # Minimum value
## [1] -5
- Maximum and minimum values can also be studied with
which.max()andwhich.min(), which return the index of the first element attaining the maximum or minimum in a numeric vector:
which.max(x) # first index of the maximum value
## [1] 5
which.min(x) # first index of the minimum value
## [1] 3
x[which.max(x)] # extract the maximum value from vector x
## [1] 8
- To obtain the indices of all maximum or minimum elements, we use
which(), which takes a logical vector (TRUEs andFALSEs) as input and returns a vector of indices:
which(x, ...)
x: a logical vector or array. NAs are allowed and omitted (treated as if FALSE).
x == max(x) # logical vector (TRUE entries mark the maxima)
## [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
which(x == max(x)) # vector of indices for elements attaining the maximum
## [1] 5 9
x[which(x == max(x))] # maximum values
## [1] 8 8
- Notice that if missing values (
NA) are present, the function returnsNAby default. To exclude missing values, setna.rm = TRUE:
x = c(1, 4, -5, 2, NA, -2, 4, 7, NA, 0, 2, 3, -5, NA, 4, -4, NA, 5, 2, NA)
mean(x) # without excluding missing values
## [1] NA
mean(x, na.rm=TRUE) # excluding missing values
## [1] 1.2
Example: univariate function optimization
- We want to find the value of $x$ that minimizes the univariate function $f(x) = x^2 + 4x - 4$, that is, $$ \text{arg} \min_x (x^2 + 4x - 4) $$
- You can also inspect the function in Wolfram.
- To solve it numerically, we can evaluate many candidate values of $x$ and keep the smallest objective value.
- First, we build a grid of values for $x$ over the interval $[-5, 5]$.
x_grid = seq(-5, 5, length=20)
x_grid
## [1] -5.0000000 -4.4736842 -3.9473684 -3.4210526 -2.8947368 -2.3684211
## [7] -1.8421053 -1.3157895 -0.7894737 -0.2631579 0.2631579 0.7894737
## [13] 1.3157895 1.8421053 2.3684211 2.8947368 3.4210526 3.9473684
## [19] 4.4736842 5.0000000
- Next, we evaluate $f(x)$ at each candidate value of $x$.
fx = x_grid^2 + 4*x_grid - 4
- Notice that each value computed in
fxcorresponds to the value of $x$ in the same position ofx_grid.
head( cbind(x=x_grid, fx=fx), 6) # show the first six values
## x fx
## [1,] -5.000000 1.000000
## [2,] -4.473684 -1.880886
## [3,] -3.947368 -4.207756
## [4,] -3.421053 -5.980609
## [5,] -2.894737 -7.199446
## [6,] -2.368421 -7.864266
- We can now inspect both the value and the position of the $x$ that minimizes the function:
min(fx) # minimum value of f(x)
## [1] -7.975069
argmin_index = which.min(fx) # index of x that minimizes the function
argmin_index
## [1] 7
- To recover the value of
$x$ that minimizes
$f(x)$, we use the index found above to access
x_grid:
x_grid[argmin_index]
## [1] -1.842105
- Accuracy can be improved by using a finer grid for
x_grid. In more computationally intensive problems, however, that may substantially increase running time.
Control structures
- Control structures in R determine the program flow.
Conditional statements (if)
x = 5
if (x > 10) {
y = 10
print("case x > 10")
} else if (x > 0) {
y = 5
print("case 10 >= x > 0")
} else {
y = 0
print("case x <= 0")
}
## [1] "case 10 >= x > 0"
y
## [1] 5
- The same structure can also be used to assign a value to an object.
x = 5
y = if (x > 10) {
10
} else if (x > 0) {
5
} else {
0
}
y
## [1] 5
for loops
- Control structures - For loops (John Hopkins/Coursera)
- A
forloop requires a vector and a name for the iteration variable.
for(i in 3:7) {
print(i)
}
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
- Above, we named the iteration variable
iand supplied the integer vector from 3 to 7. - At each iteration,
itakes one value from the vector3:7until all elements have been used. - Numeric sequences are especially useful inside
forloops.
sequencia = seq(1, 5, length.out=11)
sequencia
## [1] 1.0 1.4 1.8 2.2 2.6 3.0 3.4 3.8 4.2 4.6 5.0
for (val in sequencia) {
print(val^2)
}
## [1] 1
## [1] 1.96
## [1] 3.24
## [1] 4.84
## [1] 6.76
## [1] 9
## [1] 11.56
## [1] 14.44
## [1] 17.64
## [1] 21.16
## [1] 25
while loops
- Control structures - While loops (John Hopkins/Coursera)
- Unlike
for, awhileloop requires the indicator variable to be created before the loop starts. - This happens because each loop iteration, including the first one, only runs if the condition is true.
- Since
whiledoes not automatically move through a vector asfordoes, the indicator variable must be updated at each iteration to avoid an infinite loop. - A common way to use
whileis to define the indicator variable as a counter and stop after a predetermined number of iterations.
counter = 0
while (counter <= 10) {
print(counter)
counter = counter + 1
}
## [1] 0
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
- Another possibility is to use as the indicator variable a quantity that is repeatedly updated until convergence or until it crosses a threshold. In the example below, the variable
distanceis halved at each iteration and stops once it becomes smaller than\(10^{-3}\).
distance = 10
tolerance = 1e-3 # = 1 x 10^(-3) = 0.001
while (distance > tolerance) {
distance = distance / 2
}
distance
## [1] 0.0006103516
Example 1: multiplication table
- It is common to use one loop inside another loop (nested loops).
- As an example, we create an empty matrix and fill it with the multiplication table.
times_table = matrix(NA, 10, 10)
times_table
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] NA NA NA NA NA NA NA NA NA NA
## [2,] NA NA NA NA NA NA NA NA NA NA
## [3,] NA NA NA NA NA NA NA NA NA NA
## [4,] NA NA NA NA NA NA NA NA NA NA
## [5,] NA NA NA NA NA NA NA NA NA NA
## [6,] NA NA NA NA NA NA NA NA NA NA
## [7,] NA NA NA NA NA NA NA NA NA NA
## [8,] NA NA NA NA NA NA NA NA NA NA
## [9,] NA NA NA NA NA NA NA NA NA NA
## [10,] NA NA NA NA NA NA NA NA NA NA
# Fill the multiplication-table matrix
for (row in 1:10) {
# Given one row value, fill all columns
for (col in 1:10) {
times_table[row, col] = row * col
}
# After finishing the columns, move to the next row
}
times_table
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1 2 3 4 5 6 7 8 9 10
## [2,] 2 4 6 8 10 12 14 16 18 20
## [3,] 3 6 9 12 15 18 21 24 27 30
## [4,] 4 8 12 16 20 24 28 32 36 40
## [5,] 5 10 15 20 25 30 35 40 45 50
## [6,] 6 12 18 24 30 36 42 48 54 60
## [7,] 7 14 21 28 35 42 49 56 63 70
## [8,] 8 16 24 32 40 48 56 64 72 80
## [9,] 9 18 27 36 45 54 63 72 81 90
## [10,] 10 20 30 40 50 60 70 80 90 100
Example 2: bivariate function optimization
- We now want to find the pair $(x, z)$ that minimizes the bivariate function $f(x, z) = x^2 + 4z^2 - 4$, that is, $$ \text{arg} \min_{x, z} (x^2 + 4z^2 - 4) $$
- First, we create grids of candidate values for
\(x\)and\(z\).
x_grid = seq(-5, 5, length=11)
z_grid = seq(-6, 6, length=11)
- Next, we create a matrix where each row represents a value of $x$ and each column represents a value of $z$:
# Create the matrix to be filled
fxz = matrix(NA, length(x_grid), length(z_grid))
# Name rows and columns
rownames(fxz) = x_grid
colnames(fxz) = z_grid
fxz
## -6 -4.8 -3.6 -2.4 -1.2 0 1.2 2.4 3.6 4.8 6
## -5 NA NA NA NA NA NA NA NA NA NA NA
## -4 NA NA NA NA NA NA NA NA NA NA NA
## -3 NA NA NA NA NA NA NA NA NA NA NA
## -2 NA NA NA NA NA NA NA NA NA NA NA
## -1 NA NA NA NA NA NA NA NA NA NA NA
## 0 NA NA NA NA NA NA NA NA NA NA NA
## 1 NA NA NA NA NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA NA NA NA NA
## 3 NA NA NA NA NA NA NA NA NA NA NA
## 4 NA NA NA NA NA NA NA NA NA NA NA
## 5 NA NA NA NA NA NA NA NA NA NA NA
- We then fill in each possible combination using nested loops.
# Fill the fxz matrix
for (row_x in 1:length(x_grid)) {
for (col_z in 1:length(z_grid)) {
fxz[row_x, col_z] = x_grid[row_x]^2 + 4*z_grid[col_z]^2 -4
}
}
fxz
## -6 -4.8 -3.6 -2.4 -1.2 0 1.2 2.4 3.6 4.8 6
## -5 165 113.16 72.84 44.04 26.76 21 26.76 44.04 72.84 113.16 165
## -4 156 104.16 63.84 35.04 17.76 12 17.76 35.04 63.84 104.16 156
## -3 149 97.16 56.84 28.04 10.76 5 10.76 28.04 56.84 97.16 149
## -2 144 92.16 51.84 23.04 5.76 0 5.76 23.04 51.84 92.16 144
## -1 141 89.16 48.84 20.04 2.76 -3 2.76 20.04 48.84 89.16 141
## 0 140 88.16 47.84 19.04 1.76 -4 1.76 19.04 47.84 88.16 140
## 1 141 89.16 48.84 20.04 2.76 -3 2.76 20.04 48.84 89.16 141
## 2 144 92.16 51.84 23.04 5.76 0 5.76 23.04 51.84 92.16 144
## 3 149 97.16 56.84 28.04 10.76 5 10.76 28.04 56.84 97.16 149
## 4 156 104.16 63.84 35.04 17.76 12 17.76 35.04 63.84 104.16 156
## 5 165 113.16 72.84 44.04 26.76 21 26.76 44.04 72.84 113.16 165
- To recover the pair
$(x, z)$ that minimizes
$f(x, z)$, we use the location of the minimum with the argument
arr.ind = TRUE:
argmin_index = which(fxz==min(fxz), arr.ind = TRUE)
argmin_index
## row col
## 0 6 6
argmin_x = x_grid[argmin_index[1]] # apply the x index to x_grid
argmin_z = z_grid[argmin_index[2]] # apply the z index to z_grid
paste0("The pair (x = ", argmin_x, ", z = ", argmin_z, ") minimizes the function f(x,z).")
## [1] "The pair (x = 0, z = 0) minimizes the function f(x,z)."
Creating functions
- Your first R function (John Hopkins/Coursera)
- To create a function, we use
function(){}:- inside the parentheses
(), we include arbitrary variable names (arguments/inputs) that the function will use in calculations - inside the braces
{}, we use those variable names to perform calculations and return an output, namely the last value inside the braces
- inside the parentheses
- As an example, we create a function that takes two numbers as inputs and returns their sum.
sum_two = function(a, b) {
a + b
}
- Assigning the function to the object
sum_twodoes not by itself generate output. To evaluate it, we callsum_two()and provide two numbers as inputs:
sum_two(10, 4)
## [1] 14
- Note that the arbitrary variables
aandbexist only inside the function.
> a
Error: object 'a' not found
- We can also specify a default value for a function argument. For example, let us define a function that returns all elements of a vector greater than
n:
vector_x = 1:20
above = function(x, n = 10) {
x[x > n]
}
above(vector_x) # all values above the default threshold of 10
## [1] 11 12 13 14 15 16 17 18 19 20
above(vector_x, 14) # all values above 14
## [1] 15 16 17 18 19 20