Introduction to R

The work of a data scientist consists of many aspects:

  • working with domain experts to determine what questions need addressing
  • assembling a group of data into a dataset for analysis
  • examining the dataset to remove any unusable parts
  • analyzing the data to address the research question(s)
  • sharing the results with domain experts

Some of these steps are quite mechanical or cumbersome, and may repeat the same calculation again and again. Mistakes are costly: they can lead to wrong conclusions, and errors on the job can mean lost money. It would be a shame to botch a data science job because of simple mistakes.

However, to err is human. So how can you minimize the chance of those errors occurring? That’s where computer programming comes to the rescue.

A Fast History of Computing

A computer program is a sequence of characters that tells a computer how to perform its task. The history of computers is long. Early in the 1800s, Charles Babbage invented a machine for calculation, calling it the “Difference Engine”. Ada Lovelace, a daughter of English poet Lord Byron, collaborated with Babbage to develop ideas about what the engine could compute beyond addition and multiplication.

Attempts to build such mechanical calculating systems continued into the first half of the 1900s, when a series of innovations resulted in the invention of “computers”. The new inventions used electricity as the source of energy, vacuum tubes for regulating electrical currents, a separation between the “brain” responsible for computation and the storage that holds the final and intermediate results of computation, and punch cards for defining the calculation steps to follow.

Programming languages and scripting languages

In the early days of modern computing, programmers had to provide instructions to computers using languages specific to each computer. However, as new computers emerged and the number of computers in use grew, there arose a need for languages that programmers could use to describe instructions regardless of the machine.

That was the birth of modern, machine-independent programming languages. Allowing the use of machine-independent languages in writing programs requires the creation of a layer that sits between the programmers, who code in universal languages, and the computers that will execute the program but do not understand the languages the programmers use.

Early on, there was only one layer separating programmers from computers. Nowadays there are multiple layers between them. What these layers do is outside the scope of this text, and we will skip it entirely. The key point is that a programmer can use a specific language to write what she intends to do with a program.

Many types of programming languages exist. One type is the scripting language. In a scripting language, a programmer “enters” an environment specific to the language and then executes her program “line by line”.

A line of a program is a sequence of characters that is complete according to the syntax of the language. In a scripting language, a programmer finishes writing a line by pressing the return key (or an equivalent), and the environment immediately checks the line against the syntax of the language and performs any action the line expresses.

Programming in R

We will use the programming language R in this text. There are four major reasons to choose R as our language.

  • R is quite popular. There are many data scientists out there who enjoy conducting science using R.
  • A great deal of R code is available for public use. You can adapt such programs to your own needs. In turn, you can share your R code with others. Some popular code has been packaged into libraries, groups of R code that you can use for specific purposes. You can download such libraries for your own use with a simple line of code.
  • R comes with dozens of datasets that you can play with.
  • There exists an easy-to-install R programming environment, RStudio, where you can not only write code but also create notebooks in which R code and prose alternate.

First Steps: Expressions

In R, executable “lines” must conform to the syntax of the language. If a programmer enters a syntactically incorrect line, the environment usually issues an “error message” with some information about why the line failed.

Entering an incorrect line has no effect on the task you want to carry out, which means there is no need to be afraid of making mistakes!
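
As a small illustration of this point, the sketch below asks R to check a deliberately malformed line and captures the resulting error instead of stopping. The names bad_line and attempt are made up for this example, and try() and parse() are used only so that the code can keep running; in everyday use you would simply see the error message printed.

```r
# A deliberately malformed line: "+" followed directly by "*" is not valid R.
bad_line <- "3 +* 4"

# parse() checks the syntax; try() captures the error instead of halting.
attempt <- try(parse(text = bad_line), silent = TRUE)

# TRUE means the line was rejected.
inherits(attempt, "try-error")

# A well-formed line still works afterwards.
3 + 4
```

The failed line changes nothing in the session, so you can correct it and try again.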

A line of R code consists of what we call expressions. Several types of expressions exist, which serve as building blocks. We will introduce the types as they become necessary.

Let us begin with the simple ones: numerals and arithmetic expressions.

Numerical literals

Numerical literals are expressions of numbers with exact specifications of their values. There is no ambiguity in the expressions.

Here are some examples of numerical literals.

415
## [1] 415
-3.56
## [1] -3.56
1956.5436781
## [1] 1956.544

An important numerical literal type uses e to specify shifting the decimal point. You are probably already familiar with this idea from elementary math courses. It is called writing in scientific notation.

5.07e-2
## [1] 0.0507

which is equivalent to \(5.07 \times 10^{-2}\).
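
A quick way to convince yourself of this equivalence is to let R compare the two forms. The sketch below uses all.equal(), a base R function that compares numbers up to a small tolerance, and the comparison operator ==; neither has been introduced in this text yet, so treat them as assumptions of the example.

```r
# Scientific notation and the explicit power-of-ten form give the same number.
# all.equal() compares with a small tolerance, which suits decimal fractions.
all.equal(5.07e-2, 5.07 * 10^-2)

# A positive exponent shifts the decimal point to the right instead.
2e3 == 2000
```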

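A detail to keep in mind is that R prints only about seven significant digits by default, so the value displayed can be a rounded version of the value you typed. (Separately, each number is stored with a limited precision of roughly 15 to 16 significant digits.) Consider the following tiny value.

1.5454786e-10
## [1] 1.545479e-10

The printed value is rounded to seven significant digits; the digits you typed are still kept in the stored value.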
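
One way to check what R actually stored is to ask for more digits when printing. This is a minimal sketch: the name tiny is made up for the example, and the digits argument of print() is a base R feature not otherwise covered in this section.

```r
tiny <- 1.5454786e-10

# Default printing shows about 7 significant digits.
tiny

# Asking for more digits shows that the stored value kept the digits we typed.
print(tiny, digits = 10)

# The same applies to the earlier literal 1956.5436781.
print(1956.5436781, digits = 11)
```

So the difference seen in the default output comes from how the value is displayed, not from how it is stored.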

Arithmetic expressions

You can construct mathematical expressions combining numerical literals and mathematical operators +, -, *, and / as we use in mathematics. Also, ^ represents exponentiation.

The same order of operations as in mathematics applies when processing an expression with multiple types of operations in it. You can use parentheses, ( and ), to change the order. We often use other types of parentheses like [ and ] in writing mathematical expressions in English. However, these carry a different meaning in R and so cannot be used in writing mathematical expressions.

When you type an arithmetic expression in an R environment and press the return key, R attempts to evaluate the expression. For example:

3.4 * (1.7 + 2.3)
## [1] 13.6
2^5
## [1] 32

Instead of using ^ to mean exponentiation, you can also use **. Note that there should not be a space between the two *.

2 ** 5
## [1] 32

Another important operation is %%, which gives the remainder of a division.

10.0 %% 2.3
## [1] 0.8

After subtracting 2.3 from 10.0 four times, we arrive at 0.8, which is the remainder.
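
The count of "four times" can itself be computed. The sketch below uses the integer-division operator %/%, a base R operator that this text has not introduced, together with the made-up names quotient and remainder.

```r
quotient  <- 10.0 %/% 2.3   # integer division: how many whole times 2.3 fits
remainder <- 10.0 %% 2.3    # what is left over

quotient
remainder

# Reassembling the pieces gives back the original number.
quotient * 2.3 + remainder
```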

Below is a summary of the mathematical operations.

Operation | Operator | Example
Addition | + | 10 + 1
Subtraction | - | 10 - 1
Division | / | 10 / 1
Multiplication | * | 10 * 1
Remainder | %% | 10 %% 3
Exponentiation | ^ or ** | 2^3 or 2**3

Here is an example that shows the difference of having parentheses in the expression. Try it by hand first before seeing the answer!

2 + 3 * 4 * 5 / 6 ** 2
## [1] 3.666667
2 + (3 * 4 * 5 / 6) ** 2
## [1] 102

Giving names to results

When you enter a mathematical expression like the ones presented above, R responds by printing the value resulting from its evaluation of the expression, but the value is not kept anywhere for later use. For example, we can easily compute the value of the following expression.

3.4 + 4.2
## [1] 7.6

But we have no way of storing that result! The problem is overcome by assigning a name to the value R produces so that we can keep it for future use.

a <- 3.4 + 4.2

We can check its value by typing its name.

a
## [1] 7.6

Once you have executed it, the assignment will persist until you make another assignment to a or you terminate the R environment program you are running.

A common idiom when working in a notebook environment is to make an assignment on one line and then immediately show the result on the next line by simply typing its name. For instance:

a <- 5 * 15 # an assignment 
a # type its name to reveal its value
## [1] 75

Note how the value of a has changed from 7.6 to 75. We call such named entities that hold a value objects.

You may have noticed the leading [1] in front of the value of a. This involves the internal workings of R and is a technical detail that we will learn about later.

What happens when we try to refer to an object that currently does not exist?

barnie
## Error in eval(expr, envir, enclos): object 'barnie' not found

Our first programming error! The message reported is quite informative: it states that in the present session there is no assigned value to the name barnie.

There are some rules about what kinds of names you can give to an object. A name can use any letter from the alphabet, both uppercase and lowercase, numerals, and the underscore _. Underscores are useful when you want to include more than one word in a name.

a_tasty_pie <- 314159
a_tasty_pie
## [1] 314159

A name may not contain spaces, and it may not start with a numeral or an underscore.
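
If you are unsure whether a name is allowed, one possible check is sketched below. It relies on make.names(), a base R function that turns any text into a valid name and leaves already-valid names unchanged; the function and the == comparison are not covered in this text, so treat the approach as an assumption of the example.

```r
# make.names() returns its input unchanged when it is already a valid name,
# so comparing input and output acts as a quick validity check.
make.names("a_tasty_pie") == "a_tasty_pie"   # TRUE: valid
make.names("pie2") == "pie2"                 # TRUE: numeral not at the start
make.names("2pies") == "2pies"               # FALSE: starts with a numeral
make.names("a tasty pie") == "a tasty pie"   # FALSE: contains spaces
```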

Using objects in expressions

When an R environment evaluates an expression and encounters a name, it tries to see whether an object by the name exists and, if so, substitutes all of its occurrences in the expression with the present value of the object.

a_tasty_pie / 2
## [1] 157079.5

If the object cannot be found, the environment produces an error message.

a <- barnie * 3
## Error in eval(expr, envir, enclos): object 'barnie' not found

Expressions and R notebook gotchas

There are a few gotchas that you should be aware of when working with expressions in an R notebook environment. We illustrate these issues by example.

Consider the following code chunk that assigns the value 5 to a name a.

a <- 5 

Let us create another name, b, bound to the value 3.

b <- 3 

The name summed is the sum of a and b.

summed <- a + b
summed
## [1] 8

The name summed is bound to the number 8 even though it was created using the expression a + b. Names are bound to values, not mathematical expressions or “formulas”.

Now let us assign a to a different value. We will then print the value of summed again. Can you guess what the value of summed will be?

a <- 10
summed
## [1] 8

This may come as a surprise! Since we updated a to be 10 and summed is the addition of a and b, you might have expected summed to be 13! However, because names are bound to the value at the time the expression was evaluated, summed remains 8. The value does not update automatically later, as you might expect from spreadsheet software like Excel.

If it is important for summed to reflect the new value of a, we would need to amend the previous code or add another code chunk that performs the addition again:

summed <- a + b
summed
## [1] 13

Next, try to restart your R session by navigating to Session > Restart R. Confirm that the “Environment” pane has emptied. If there are names that still appear, revisit the instructions in Section 1.4.

Then try to print out these values:

a
b
summed

You should see an error! Because the session was restarted, R forgets that a, b, and summed ever existed, despite the code for their creation being immediately above this code chunk. If we wish to remedy the error, we would need to execute again all code chunks starting from the top of the notebook.
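
To see a similar effect without restarting, a minimal sketch is to remove the names by hand. Here rm() and exists() are base R functions that this section does not otherwise cover, so treat the example as an illustration rather than as part of the workflow described above.

```r
a <- 5
b <- 3
summed <- a + b

# Remove the three names; a restart does this to every name at once.
rm(a, b, summed)

# exists() reports whether a name is currently defined.
exists("summed")
```

After the removal, typing summed would produce the same "object not found" error as after a restart.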

This raises another danger when working with R notebooks: evaluating code chunks out-of-order. Let’s see an example to show why.

Re-define a as follows.

a <- 10

Then consider the following code chunk.

a <- a + 5
a
## [1] 15

This code chunk re-assigns a after adding 5 to its current value. We see that the result of the operation is 15.

If you were to execute this code chunk again (by pressing the green “play” button), the value of a becomes 20. If you did that again, the value is 25! Then 30, 35, and so on. So the value that a ultimately takes on is dependent on the number of times you executed this code chunk, even though the code responsible for making a some value greater than 15 is nowhere to be seen in the notebook.

Therefore, you should avoid executing code chunks out-of-order as much as possible. Being able to run chunks in any order is a convenience of notebook environments, but the unpredictability it introduces creates confusion and ultimately leads to mistakes. Worse, if you were to return to this notebook at a later time, you would likely have forgotten the steps you took to reach its current state. If ever in doubt, restart your R session and execute all code chunks again in a linear top-down fashion.

One benefit of knitting is that the knitting process always starts from a clean R environment and runs code chunks top-down. If the knitting process ever picks up an error, there are likely one or more code chunks that depend on out-of-order execution! Be careful with this!
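
One possible way to reduce this risk, offered as a suggestion rather than a rule from this text, is to store the result under a new name instead of overwriting the old one. The name a_plus_five is made up for this sketch.

```r
a <- 10

# Re-running this line gives 15 every time, because a itself never changes.
a_plus_five <- a + 5
a_plus_five
```

Because the chunk no longer modifies a, running it once or many times leaves the notebook in the same state.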

Function calls

Another type of expression is a function call. A function has the same meaning as the functions you learned in math class: something that takes some input, does something with it, and produces an output. In R, a function is an object which takes a certain number of objects/values for its execution.

Here is an example. This function returns the absolute value of a number.

abs(-20)
## [1] 20

Here are some more examples.

a <- -10.5
floor(a)
## [1] -11
ceiling(a)
## [1] -10
abs(a)
## [1] 10.5

R ships with many functions that we are familiar with from mathematics courses, e.g., sin, cos, tan, asin, acos, atan, log, and log10, which are sine, cosine, tangent, inverse sine, inverse cosine, inverse tangent, natural logarithm, and logarithm base 10, respectively. The unit of the argument is the radian for the first three (i.e., 180 degrees are equal to \(\pi\) radians).

sin(1)
## [1] 0.841471
cos(1)
## [1] 0.5403023
tan(1)
## [1] 1.557408
asin(-1)
## [1] -1.570796
acos(-1)
## [1] 3.141593
atan(1)
## [1] 0.7853982
log(10)
## [1] 2.302585
log10(10)
## [1] 1
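
Since the trigonometric functions expect radians, an angle given in degrees has to be converted first. The sketch below assumes the built-in constant pi and uses the made-up names angle_degrees and angle_radians.

```r
# Convert 30 degrees to radians, then take the sine (mathematically 1/2).
angle_degrees <- 30
angle_radians <- angle_degrees * pi / 180
sin(angle_radians)

# 180 degrees is pi radians, and the cosine of that angle is -1.
cos(pi)
```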

Functions can of course take more than one argument. The max and min functions are those that return the maximum and the minimum of the argument list, respectively.

max(4, 5, 3)
## [1] 5
min(1, 2, 3, 4, 5, 6)
## [1] 1

These functions accept any number of arguments.

Example: Pay your debt!

You may have heard from friends and family about the importance of paying off debt. You can use R to do a hypothetical calculation of how quickly your debt would accumulate if you do not pay it.

Suppose you borrow $1,000 from an agency with a monthly interest rate of 1%, which means that after each month, you owe not only the money you owed one month ago but also one percent of that amount.

For example, after the first month of the loan in question, you owe not only the $1,000 you borrowed but also 1 percent of it, $10. Altogether, at the end of the first month, you owe a total of $1,010.

If you fail to repay, the same series of actions will happen after another month, increasing your total debt to $1,020.10. If you fail to repay again, the same series of actions will happen.
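
The first two months can be traced step by step by re-assigning a name, as in the sketch below. The name owed is made up for this illustration.

```r
owed <- 1000          # the amount borrowed
owed <- owed * 1.01   # after month 1: the debt plus 1% of it
owed
owed <- owed * 1.01   # after month 2: the same step applied again
owed
```

Each re-assignment multiplies the current debt by 1.01, which is the step that repeats every month.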

Let us see how this process balloons your account if you DO NOT pay for one year.

1000 * 1.01 * 1.01 * 1.01 * 1.01 * 
  1.01 * 1.01 * 1.01 * 1.01 * 1.01 * 1.01 * 1.01 * 1.01
## [1] 1126.825

The loan accumulates a little over 126 dollars in interest.

The amount does not appear to be much. However, if the “principal amount” is large, the effect is substantial. For example, consider what happens in a loan of $200,000.

200000 * 1.01 * 1.01 * 1.01 * 1.01 * 
  1.01 * 1.01 * 1.01 * 1.01 * 1.01 * 1.01 * 1.01 * 1.01
## [1] 225365

That’s $25,365 in interest!

The expression we used had 12 repetitions of 1.01. Using exponentiation, we can simplify it.

200000 * 1.01 ^ 12
## [1] 225365
200000 * 1.01 ** 12
## [1] 225365

Usually when you borrow money from a bank or a mortgage company, you set up a repayment plan so that you pay the same amount throughout the loan. In other words, you pay a fixed amount at regular intervals (usually every month since the start). The bank or the mortgage company subtracts your payment from the amount of money you owe, and one month later applies the interest rate to the money you now owe. The monthly amount must be chosen so that you owe exactly nothing after you make the last payment.

There is a formula that gives the monthly payment.

\[ \mathrm{monthly} = \frac{\mathrm{principal} * \mathrm{rate}^{\mathrm{months}} * (\mathrm{rate} - 1)}{\mathrm{rate}^{\mathrm{months}}-1} \]

Here \(\mathrm{principal}\) is the money you borrow, \(\mathrm{rate}\) is one plus the monthly interest rate (1.01 for a monthly rate of 1%), and \(\mathrm{months}\) is the duration of the loan in terms of the number of months.

Let us make R do the computation for us.

principal <- 1000
rate <- 1.01
months <- 12
monthly <- principal * rate ^ months * (rate - 1) / (rate ^ months - 1)
total <- monthly * months

The objects principal, rate, months, and monthly represent the four pieces of information appearing in the equation and total represents the total payment, which is simply monthly * months.

What are the values of monthly and total?

monthly
## [1] 88.84879
total
## [1] 1066.185
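
As a sanity check on the formula, the sketch below replays the loan month by month and confirms that the balance reaches zero after the last payment. It assumes that interest is applied first and the payment is subtracted afterwards at each monthly anniversary. It also uses a for loop, a construct this text has not introduced yet, and the made-up names balance and month.

```r
principal <- 1000
rate <- 1.01
months <- 12
monthly <- principal * rate ^ months * (rate - 1) / (rate ^ months - 1)

balance <- principal
for (month in 1:months) {
  # Interest is applied first, then the monthly payment is subtracted.
  balance <- balance * rate - monthly
}

# Rounded to the cent, nothing is owed after the twelfth payment.
round(balance, 2)
```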

We talked about the possibility of not paying for one year. Let us use an object balloon to represent the amount.

balloon <- principal * rate ^ months
balloon
## [1] 1126.825

Suppose that, in the case of not paying for one year, you pay all the debt at the end of the year. What is the difference between the two options, and what is the proportion of the difference to the principal? Let us use objects diff and diff_percent to represent the quantities.

diff <- balloon - total
diff_percent <- diff / principal * 100
diff_percent
## [1] 6.063957

So you will have to pay more by a little over 6% of the principal.

In other words, we obtain the conclusion:

  • Assuming the rate is 1% per month and the loan is for 12 months, if you chose to pay everything after 12 months, paying nothing until then, you would pay slightly more than 6% of the principal on top of what you would pay with the standard monthly payment program.

6% does not sound like a significant amount if the principal is small, but what if the principal is 20,000 dollars? Try it out!

R Packages

The designers of R had statistical computing in mind. Because of this motivation, many mathematical functions are available, as we have seen above. There are also, as mentioned earlier, groups of code that you can use when necessary. We call these packages. The list of available R packages is quite long.

Using an R package comes in two stages.

In the first stage, you download the package onto your computer, to a place the R environment can see. When you download and install an R environment, it sets up the place to hold the packages, so there is no need to worry about remembering where they are. You do this by calling the function install.packages(...), where the part ... represents its argument, which is the name of the package.

In the second stage, you tell the R environment that you want to use an installed package. You do this by calling the function library(...), where the part ... represents its argument, which is the name of the package.

Here are the two steps in a nutshell: to install a package, you enter install.packages(PACKAGE_NAME); to use it, you enter library(PACKAGE_NAME).
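
The two stages might look like the sketch below. It uses tidyverse purely as an example package name, and it assumes you want to check whether the package is present before loading it. requireNamespace() is a base R function not discussed in this text, and the name is_installed is made up. The installation line is commented out because it needs an internet connection and only has to run once; note that install.packages() expects the package name in quotation marks.

```r
# Stage 1 (run once; needs an internet connection):
# install.packages("tidyverse")

# requireNamespace() reports whether a package is installed without attaching it.
is_installed <- requireNamespace("tidyverse", quietly = TRUE)
is_installed

# Stage 2: attach the package for the current session, only if it is available.
if (is_installed) {
  library(tidyverse)
}
```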

A package is a collection of objects. Each object has a name. Sometimes two packages contain objects having the same name. When that happens, the R environment tells you about the duplicate names and uses the most recently loaded one, ignoring the earlier ones. Also, if you attempt to load a package with the library function and the package has not been installed, the R environment reports an error. Once you have installed the package, the error will go away. You can also install a package as many times as you want.

The package we will most frequently use in this text is tidyverse, which is actually a suite of packages designed for doing data science. Once you have installed it, loading the package is as we stated earlier.

What is the tidyverse?

The tidyverse is a collection of packages, each having to do with data processing, transformation, and visualization. The packages in the suite include:

  • stringr for manipulating string attributes
  • readr for reading data from data files
  • tibble for creating tables and making simple modifications
  • dplyr for data manipulation (e.g., filtering, reordering, etc.)
  • tidyr for tidying up data tables
  • ggplot2 for visualization (e.g., creating graphs)
  • purrr for data mapping and processing nested data

The list seems quite long, but don’t worry; we will introduce them gradually throughout the text and only as needed. Each package should be thought of as a building block that “sticks” together and works in unison with the other members of the tidyverse. Together they build a complex data science pipeline, transforming raw data into insights and knowledge that can be communicated.

Note that once we have loaded the tidyverse, we can call any of the functions from the individual packages.
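
As a minimal sketch of this point, the code below loads the tidyverse and then calls str_to_upper(), a function from the stringr package, without loading stringr separately. The guard with requireNamespace() and the name shouted are assumptions of the example; the guard is there only so that the code does not fail on a machine where the tidyverse has not been installed.

```r
if (requireNamespace("tidyverse", quietly = TRUE)) {
  library(tidyverse)

  # str_to_upper() comes from stringr, one of the packages in the suite.
  shouted <- str_to_upper("tidy data")
  shouted
} else {
  message("tidyverse is not installed; run install.packages(\"tidyverse\") first")
}
```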

Exercises

Be sure to install and load the following packages into your R environment before beginning this exercise set.

Question 1 Suppose we manipulate two variables a and b to which we store the values 210 and 5, respectively.

a <- 210
b <- 5

We compute a series of values in the variables c1, c2, c3, and c4 as follows:

  • Give the value of a divided by b to c1.
  • Set c2 to the maximum of either the subtraction of b from c1 or the subtraction of c1 from b.
  • Set c3 by summing c2 and the result of raising the value of a to the power of 3.
  • The value of c3 is too large! Set c4 to the natural logarithm of c3.

Write the code for doing the calculation. After completing the assignments to all of c1, c2, c3, and c4, print the values of the four variables.

Question 2: Names, expressions, and error messages This question lists some R code chunks that produce error messages. Also given are some explanations of what’s wrong with the code. Choose the explanation that best explains each problem. Comment on your answer.

  • Question 2.1

    4 <- 2 + 1
    1. 4 is not 2 + 1 so R does not allow us to assign 2 + 1 to 4.
    2. The issue is that this needs to be rewritten as 2 + 1 <- 4 because
        the code is trying to say that 4 is the result of adding together
        the numbers 2 and 1.
    3. 4 is a number, and we cannot make a number a name for some quantity.
    4. <- is not valid here. We need to replace the <- symbol with =
        because we are trying to show that 4 is equivalent to the
        expression 2 + 1.
  • Question 2.2

    fifty <- 50
    ten <- 9
    eight <- fifty divides ten
    1. On the third line, the name fifty cannot be followed directly by
       another name.
    2. Fifty divides ten is five, not eight.
    3. The divides operation only applies to numbers; it does not work
       for the names “fifty” and “ten”.
    4. On the second line, the name “ten” can only be assigned to the
       number 10; it cannot be assigned to another number, e.g., 9.

Question 3: A Smoothie Difference Suppose you’d like to quantify how dissimilar two Smoothie Queen drinks (20 oz.) are, using three quantitative characteristics. We are told the following important traits about these smoothies (among many others).

Trait | Peanut Power Banana | Ironman Vanilla
Price ($) | 5.29 | 5.29
Added sugar amount (g) | 49 | 65
Protein (g) | 16 | 24

You decide to define the dissimilarity between two smoothies as the maximum of the absolute values of the 3 differences in their respective trait values. The built-in functions max() and abs() can help you.

  • Question 3.1 Using this method, compute the dissimilarity between Peanut Power Banana and Ironman Vanilla. Name the result dissimilarity_smoothie. Use a single expression (a single line of code) to compute the answer. Let R perform all arithmetic, e.g., 10 - 2, rather than simplifying the expression yourself, e.g., 8.

  • Question 3.2 The American Heart Association (AHA) suggests an added-sugar limit of no more than 36 grams of sugar for most men. If the added sugar amount in the table above was measured as a percentage of the added-sugar limit for men rather than an amount in grams, what would be the dissimilarity between these two smoothies? Assign either 1, 2, 3, or 4 to the name smoothies_q2 below.

        1. 0
        2. 8
        3. 16
        4. 141

Question 4: Signing a Loan Suppose you are signing a loan in the amount of $1,000 with a 3% monthly interest rate. On the day you received the borrowed money, the loan starts. At each monthly anniversary the money you owe is updated by multiplying it by 1.03. You may choose to pay money at that point, and, if so, your payment is subtracted from the updated amount. Naturally, if you choose to make no payments, the amount you owe keeps growing.

  • Suppose you skip payments for two full years. How much will you owe? Calculate your answer in a variable owed and print its value.

  • Compute also how much more the owed money is from the original money you are borrowing, store it in a variable pure_interest, and show its value.

Question 5: Half-life Radioactive elements decay over time by losing their radioactivity. Each such element has its own rate of decay. We use the term ‘half-life’ to refer to the speed. The half-life of an element is the time that it takes for a radioactive element to lose half of its radioactivity.

  • Question 5.1 Carbon-14 is a radioactive isotope of the element carbon. The most common form of carbon has a mass of 12, while Carbon-14 has a mass of 14. The natural abundance of Carbon-14 is one in a trillion. Its half-life is 5730 years. Store the half-life in a name half_life_years and then, assuming that one year has 365 days, calculate a whole number half_life_seconds representing how many seconds are equal to the half-life of Carbon-14.

  • Question 5.2 Continuing on the previous question, the radioactivity of Carbon-14 is reduced from 1.0 to how much in 100 thousand years? Note that at every half-life anniversary, the amount is reduced to a half of its previous value. So, with \(x\) repetitions of half-life, the amount is reduced to \((\frac{1}{2})^x\), where \(x\) can be a real number. Write a single expression that computes the amount in question and store the answer in a name how_much_100k.

  • Question 5.3 Continuing on the previous question, the radioactivity of Carbon-14 is reduced from 1.0 to how much in 2 thousand years? Write a single expression that computes the answer and store the result in the name how_much_2k.

  • Question 5.4 By reversing the calculation from the previous question, you can identify the age of a carbon compound from the proportion of Carbon-14. Let the number of half-lives \(y\) be \(\mathrm{age}\) divided by \(h\), where \(h\) is the half-life. The proportion \(r\) of the remaining Carbon-14 is \(0.5^y\), so \(y = \log_2(1/r)\). The age of the compound, \(\mathrm{age}\), can then be obtained as:

    \[ \mathrm{age} = \log_2(1/r) * h \]

    where \(\log_2\) is the logarithm base 2.

    Now, assuming that the proportion of Carbon-14 has been reduced to one-fiftieth of the naturally occurring proportion, how much do you estimate the age of the compound is? Give a single expression to compute the number and store this in a name age.

Question 6: Walk the Earth The Earth is almost a sphere. Its radius is approximately 3958.8 miles. Suppose two points A and B on the surface of the Earth are separated by an angle of \(d\) degrees. The surface distance between A and B is \(d/360\) of the circumference of the Earth. The circumference of a sphere with radius \(r\) is \(2\pi r\). After consulting with an app on your smartphone, you learn that the angular distance between the two points is 4.5 degrees, which is stored in the following name:

a_to_b_ang_dist <- 4.5

  • Question 6.1 Compute in a name c the circumference of the Earth using a single expression. In your expression, refer to the constant \(\pi\) without having to define it explicitly as the non-repeating number 3.141593...

  • Question 6.2 Write an expression to compute the surface distance between the two points and store the result in a name d1.

  • Question 6.3 A mile is approximately 1610 meters. How many meters are equal to this distance? Write an expression to compute the answer and store the value in a name m1.

  • Question 6.4 Suppose you walk 80 meters per minute. How long does it take for you to walk straight from A to B in minutes and in hours? Write an expression to compute this and store the result in the names minutes and hours, respectively.

  • Question 6.5 Suppose you plan to walk 10 hours on average per day to travel from A and B, finding some lodging on the way each night. How many days will it take for you to make the trip? Write an expression to find the answer. Also, round your answer up to turn it to a whole number (not by hand, use an R function!) and store the value in a name days.

  • Question 6.6 Assuming that your daily expense is $65, how much will the trip cost you? (The daily cost applies even to the last day of the trip.) Store the result in cost.

  • Question 6.7 Develop a single expression that computes the total expense of your trip from points A to B using as input the angular distance known between the two points, walking speed in meters per minute, average number of hours per day planned for travel by walking, and your daily expense.

  • Question 6.8 Frustrated with the total time your trip will take, you opt to rent an electric bike instead of walking. This increases your speed to a brisk 210 meters per minute, but also brings up your daily expenses to $160. If you are still willing to set aside 10 hours on average per day for travel, will the total expense of the bike trip end up being less than the walking itinerary? Use the expression you gave for Question 6.7 to help answer this. You should not need to make any auxiliary calculations.

Question 7: A mysterious sighting A group of scientists found a mysterious object suspended midair over Lake Inferior, which they suspect to be extra-terrestrial. The scientists want to pinpoint the location of the object without having to go near it. After some debate, they decide to send one of them to the northernmost point, \(N\), of the lake and another to the southernmost point, \(S\). The two points have a straight-line distance of 1,234 meters. Coincidentally, the drop point, \(C\), of the object, that is, the point on the surface of the lake where the object would land if it fell, happens to be on the straight line connecting the two points. The two scientists have each measured the view angle of the object, finding 40 degrees from \(N\) and 65 degrees from \(S\). Let us use \(O\) to refer to the location of the object. To summarize, the scientists have learned that:

  • the line \(NS\) has length 1,234 meters,
  • the point \(C\) is on the line \(NS\),
  • the angle \(ONC\) is 40 degrees,
  • the angle \(OSC\) is 65 degrees, and
  • the triangles \(ONC\) and \(OSC\) are right-angle triangles where their right angles appear at \(C\).

They record their findings using the following names:

NS <- 1234 # the line NS has length 1,234 meters
ONC <- 40 # the angle ONC is 40 degrees
OSC <- 65 # the angle OSC is 65 degrees

From the information, we can compute the distance \(OC\), which is the height of the object. We wish to compute the distances \(NO\) and \(SO\) by means of an R expression.

  • Question 7.1 Recall that the unit of angle in R is radian, where 180 degrees is equivalent to \(\pi\) radian. Let us first convert the two angles to their respective radian values. For an angle \(d\) in degrees, its radian is

    \[ radian\_d = (d / 180.0) * \pi \]

    Compute the two radian angles, radianN for the angle \(ONC\) and radianS for the angle \(OSC\).

  • Question 7.2 Let \(h\) be the height of the object (that is, the distance OC). The distance \(NC\), then, is:

    \[ \mathrm{NC} = h / \mathrm{tan}(\mathrm{radianN}) \]

    and the distance \(SC\) is:

    \[ \mathrm{SC} = h / \mathrm{tan}(\mathrm{radianS}) \]

    The sum of the two distances is equal to \(NS\), the length stored in the name NS. We thus have:

    \[ h(1/\mathrm{tan}(\mathrm{radianN}) + 1/\mathrm{tan}(\mathrm{radianS})) = \mathrm{NS} \]

    Solving this for \(h\), we get:

    \[ h = \frac{\mathrm{NS}}{(1/\mathrm{tan}(\mathrm{radianN}) + 1/\mathrm{tan}(\mathrm{radianS}))} \]

    Use this formula to obtain the value of h and store it in the name height.

  • Question 7.3 Now we are ready to compute the distances \(NO\) and \(SO\) as follows:

    \[ \mathrm{NO} = \mathrm{height} / \mathrm{sin(radianN)} \]

    \[ \mathrm{SO} = \mathrm{height} / \mathrm{sin(radianS)} \]

    Compute these quantities and store the results in the names distanceNO and distanceSO, respectively.

  • Question 7.4 Using the computations developed from the above parts, form a single R expression that computes the minimum view distance to the object. Hint: Your answer should use just one line of code!

  • Question 7.5 After several hours, the scientists find the object to be closer to falling in the lake. The same two scientists quickly remeasure the angles from before and find \(ONC\) to be 30 degrees and \(OSC\) 50 degrees. Adjust the single expression given for Question 7.4 accordingly. What is the minimum view distance to the object now?