# Google Charts in R Markdown

## Introduction

An excellent little post (Zoom, zoom googleVis) showed up recently on R-Bloggers. The author, Markus Gesmann, is the maintainer of the googleVis package, which links R to the Google Charts API. My first thought was: could I embed charts like those in R Markdown documents that knit to ioslides or other formats suitable for use in my elementary statistics classes?

A quick look at the documentation showed that it’s very easy indeed to do this sort of thing.

## Extrapolation

Suppose, for example, that you want to illustrate to students the risks associated with extrapolation. You begin by reminding them of the experience they had back in high school with their graphing calculators, when they zoomed in on a curve: zoom in close enough, and it looks like a straight line.

Then you point out that for the most part we live our lives from a “zoomed-in” perspective, at least where data is concerned. In situations where we are interested in a pair of numerical measurements on individuals, we usually possess $y$-values for only a fairly narrow range of $x$-values. Hence it is likely that a scatter plot we make from our “zoomed-in” data will show a roughly linear relationship, even though on a global scale the “real” relationship probably is some kind of a curve.

The app below (a slight modification of the example in Gesmann’s post) makes the point in a flash. Click and drag to establish a zoom region, right-click to reset:

All we needed was the following code (be sure to add the chunk option results='asis'):

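    library(googleVis)  # provides gvisScatterChart()

    # Simulate a noisy parabola: curved overall, nearly linear in a narrow window
    set.seed(2020)
    x <- seq(0, 100, by = 0.5)
    y <- (50 - x)^2 + rnorm(length(x), sd = 100)

    curvy <- data.frame(x, y)

    # Scatter chart with drag-to-zoom and right-click-to-reset enabled
    gvScat <- gvisScatterChart(curvy,
      options = list(
        explorer = "{actions: ['dragToZoom',
                               'rightClickToReset'],
                     maxZoomIn: 0.05}",
        chartArea = "{width:'85%',height:'80%'}",
        hAxis = "{title: 'Explanatory x',
                  titleTextStyle: {color: '#000000'}}",
        vAxis = "{title: 'Response y',
                  titleTextStyle: {color: '#000000'}}",
        title = "Curvilinear Relationship",
        width = 550, height = 500,
        legend = "none"),
      chartid = "ZoomZoom")

    # Emit only the chart's HTML/JavaScript (needs the chunk option results='asis')
    print(gvScat, 'chart')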
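
To put a number on the danger, here is a minimal base-R sketch (my own addition, not part of Gesmann's example) that fits a straight line to a zoomed-in window of the same simulated data and then extrapolates to x = 0. The window from 70 to 90 and the names `zoomed`, `predicted_at_0` and `true_mean_at_0` are arbitrary choices made for illustration.

```r
# Same simulated data as in the chart code
set.seed(2020)
x <- seq(0, 100, by = 0.5)
y <- (50 - x)^2 + rnorm(length(x), sd = 100)
curvy <- data.frame(x, y)

# "Zoom in": keep only a narrow range of x-values (window chosen arbitrarily)
zoomed <- subset(curvy, x >= 70 & x <= 90)

# Within the window a straight line is a reasonable description
fit <- lm(y ~ x, data = zoomed)

# Extrapolate far outside the window and compare with the true curve
predicted_at_0 <- unname(predict(fit, newdata = data.frame(x = 0)))
true_mean_at_0 <- (50 - 0)^2   # 2500, from the formula used to simulate y

c(predicted = predicted_at_0, truth = true_mean_at_0)
```

Inside the window the fitted slope is positive, so the line keeps falling as x decreases, and the prediction at x = 0 lands far below the true mean of 2500.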


The same approach works in any R Markdown document (including the source document for this Octopress-powered post). I will certainly take a closer look at googleVis: thanks, Markus!

# New Wine in an Old Bottle: R Markdown V2 and RStudio on the CentOS Server

## Introduction

R Markdown Version 2 is a boon to students: with a single click one can convert an R Markdown file to HTML, PDF or Word format. However, getting this feature to work fully in the RStudio Server environment may require a bit of work, especially if you are running the server on a CentOS distribution. Although I am sure that CentOS has many virtues, an up-to-date repository is not among them.

This post is the record of an arm wrasslin’ match with CentOS and RStudio Server version 0.98.932, from which I emerged more or less victorious. If your IT department hosts RStudio on CentOS, then perhaps the following remarks will make your life a bit easier. On the other hand, if you know your way around Linux better than I do, please feel free to offer quicker or better solutions in the Comments.

Log on to the server, perhaps through ssh (secure shell). Come armed with administrative privileges.

## New Pandoc

R Markdown v2 uses a newer version of the pandoc converter than the one available in the CentOS repository. Fortunately, RStudio comes bundled with the binaries of a sufficiently recent version of pandoc. You obtain access to these files by establishing symbolic links in the /usr/local/bin directory to the pandoc and pandoc-citeproc binaries:
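
The exact commands are not reproduced here, so what follows is only a sketch. It assumes that RStudio Server keeps its bundled pandoc under `/usr/lib/rstudio-server/bin/pandoc` and that the citation helper is named `pandoc-citeproc` there; check both on your own installation. The code prints the `ln -s` commands rather than running them, so that you can review them before pasting them into a shell with administrative privileges.

```r
# Assumed location of the pandoc binaries bundled with RStudio Server;
# verify this on your own machine before creating any links.
bundled_dir <- "/usr/lib/rstudio-server/bin/pandoc"
binaries <- c("pandoc", "pandoc-citeproc")

# Build one "ln -s <target> <link>" command per binary
link_cmds <- sprintf("sudo ln -s %s %s",
                     file.path(bundled_dir, binaries),
                     file.path("/usr/local/bin", binaries))

cat(link_cmds, sep = "\n")
```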

## Installing LaTeX Packages

You can get TeX Live from the CentOS repositories, but the release appears to date back to 2007. It therefore lacks a couple of packages needed by pandoc, namely ifluatex and framed.

Since you will download these packages from the Comprehensive TeX Archive Network (CTAN), you’ll want a web-fetch utility such as wget. If it’s not already installed on CentOS, you can get it with:

Now you can grab the relevant files with wget:

Turning first to ifluatex, we begin by unpacking the .dtx bundle. This is accomplished with a tex command:

Several files spill out into your Home directory. You care only about ifluatex.sty. Copy it as follows:

As for the framed package, you must first unzip the downloaded file into a directory:

Now copy the framed directory as follows:

Finally, you need to make tex aware of the existence of these new packages with texhash:
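
The individual commands for this section are not shown, so here is a hedged outline of the whole sequence, assembled in R and printed for review rather than executed. The TeX tree location in `texmf` is an assumption about the server, and the two download locations are left as placeholders to be filled in from CTAN; none of these specifics come from the original commands.

```r
# Assumptions for illustration only; adjust every path to your own server.
texmf <- "/usr/share/texmf/tex/latex"             # assumed TeX tree location
ifluatex_url <- "<URL of ifluatex.dtx on CTAN>"   # placeholder, not a real URL
framed_url   <- "<URL of framed.zip on CTAN>"     # placeholder, not a real URL

steps <- c(
  "sudo yum install wget",                          # web-fetch utility
  paste("wget", ifluatex_url),                      # download both packages
  paste("wget", framed_url),
  "tex ifluatex.dtx",                               # unpack the .dtx bundle
  sprintf("sudo mkdir -p %s/ifluatex", texmf),
  sprintf("sudo cp ifluatex.sty %s/ifluatex/", texmf),
  "unzip framed.zip",                               # unpacks into a directory
  sprintf("sudo cp -r framed %s/", texmf),
  "sudo texhash"                                    # refresh TeX's file database
)

cat(steps, sep = "\n")
```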

Now you may Knit to your heart’s content!

# Five Reasons to Teach Elementary Statistics With R: #3

## Introduction

This is the third in a projected five-part series of posts aimed at colleagues who teach elementary statistics. If you teach with R but hesitate to spring such a powerful and complex tool on unsuspecting introductory students (many of whom will have had no prior experience with the command line, much less with coding), then I hope these posts will give you some encouragement.

The previous post in this series described RStudio’s package manipulate and its applications in the easy authoring of instructional applets. Today we’ll look at shiny, a related RStudio project.

In order to try the ensuing examples, download an ancillary package that we use for our elementary course:

## Reason #3: RStudio’s shiny

Shiny appears to be intended primarily for data analysts working in industry or in academic or institutional research, but on the very day of its public release Victor Moreno pointed out its implications for statistics education (see his comment on this RStudio blog post). For statistics instructors Shiny offers essentially the same benefits as manipulate, but in addition comes pimped out with:

• options for dynamic user input;
• output formats that go well beyond manipulate’s home in the Plots pane;
• default Bootstrap styling.

## Examples

### “Slow” Simulation

At my College we believe that simulation is important to understanding probability concepts, but we also find that our students don’t easily grasp the import of a simulation when the computer simply generates, say, 3000 re-samples and summarizes the results, all in a flash. We feel the need for plenty of “one at a time” simulation experiences that serve as transitions to the analysis of large-scale simulation results, and we don’t always find apps on the web that cater to our needs in just the way we would like.

Suppose for example you are wondering whether a certain die is loaded. You don’t want to crack it open, so you roll it sixty times, getting the following results:

| Spots | One | Two | Three | Four | Five | Six |
|-------|-----|-----|-------|------|------|-----|
| Freq  | 8   | 18  | 11    | 7    | 9    | 7   |

This looks like an awful lot of two-spots, but we were not expecting this in advance. By this point in the course students have been made aware of the perils of “data snooping” and hence should be disinclined to employ an inferential procedure that is based specifically on a pattern that one happens to notice in one’s data. Therefore, rather than perform inferential procedures keyed to the “Two-spot” side of the die, we might turn instead to the chi-square statistic as a neutral measure of the difference between the observed results and what one would expect if the die were fair.

The situation is addressed in this Shiny app:

http://rstudio.georgetowncollege.edu:3838/SlowGoodness

After re-sampling for a few minutes, students are convinced that it’s not so unlikely after all to get results like the ones we observed, even if the die has been fair all along.
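
For readers without access to the app, here is a minimal base-R sketch of the same “one at a time” idea. It is not the app's code; the function names `chisq_stat` and `one_resample` are made up for this illustration.

```r
# Observed results of the 60 rolls
observed <- c(One = 8, Two = 18, Three = 11, Four = 7, Five = 9, Six = 7)
expected <- rep(sum(observed) / 6, 6)   # 10 per face if the die is fair

# Chi-square statistic: sum of (observed - expected)^2 / expected
chisq_stat <- function(counts) sum((counts - expected)^2 / expected)

obs_stat <- chisq_stat(observed)   # (4 + 64 + 1 + 9 + 1 + 9) / 10 = 8.8

# One "slow" resample: roll a fair die 60 times, tabulate, compute the statistic
one_resample <- function() {
  rolls <- sample(1:6, size = sum(observed), replace = TRUE)
  chisq_stat(tabulate(rolls, nbins = 6))
}

set.seed(1)
one_resample()   # run this line again and again, comparing each value with obs_stat
```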

Students are then prepared to understand a full-scale re-sampling simulation like the following one:

    ## Pearson's chi-squared test with simulated p-value
    ##   (based on 3000 resamples)
    ##
    ##   observed counts Expected by Null contribution to chisq statistic
    ## A               8               10                             0.4
    ## B              18               10                             6.4
    ## C              11               10                             0.1
    ## D               7               10                             0.9
    ## E               9               10                             0.1
    ## F               7               10                             0.9
    ##
    ##
    ## Chi-Square Statistic = 8.8
    ## Degrees of Freedom of the table = 5
    ## P-Value = 0.125


Sure enough, if the die is fair then there is a reasonably good chance—about 12.5%—of getting results at least as extreme as the ones we got in our 60 rolls.
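
A comparable full-scale simulation can be sketched with base R's `chisq.test()`. This is a stand-in for whatever function produced the output above, so the layout differs and the simulated P-value will vary a little with the seed (the seed here is arbitrary).

```r
observed <- c(8, 18, 11, 7, 9, 7)

set.seed(2014)   # arbitrary seed, only to make the run repeatable
res <- chisq.test(observed,
                  p = rep(1/6, 6),            # Null: all six faces equally likely
                  simulate.p.value = TRUE,    # resample instead of using the chi-square curve
                  B = 3000)                   # number of resamples

res$statistic   # chi-square statistic for the observed counts
res$p.value     # proportion of resamples at least as extreme
```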

Note: Shiny users know that the apps are liable to run more quickly if you run them locally. To run the foregoing app locally from an R session, pull it out of the tigerstats package:

### Understanding Model Assumptions

Students tend to be somewhat rigid in their handling of “safety checks”—the diagnostics they are instructed to perform in order to judge whether the statistical model underlying a given inferential procedure is appropriate to the data at hand. This rigidity stems partly from a lack of understanding of what the inferential procedure is intended to deliver (for example, that a method for making 95%-confidence intervals for a parameter should produce intervals that cover the parameter about 95% of the time in repeated sampling), and partly from a lack of experience with situations in which the mathematical assumptions of the model are not perfectly satisfied.

The Shiny app for this topic addresses:

• coverage properties of confidence intervals (e.g., what “95% confidence” means, from a frequentist point of view);
• the effect on coverage properties, at various sample sizes, of departures from normality assumptions in procedures based upon the t-statistic.

Both “slow” (one-at-a-time) simulation and large-scale simulation (5000 samples) are available to the student.
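
The large-scale version of this idea fits in a few lines of base R. The sketch below is not the app's code: the helper `coverage()`, the sample size of 10 and the choice of an exponential population as the skewed example are all assumptions made for illustration.

```r
# Estimate how often a nominal 95% t-interval covers the true mean
coverage <- function(rgen, mu, n, reps = 5000, level = 0.95) {
  hits <- replicate(reps, {
    s <- rgen(n)                                             # draw one sample
    half <- qt(1 - (1 - level) / 2, df = n - 1) * sd(s) / sqrt(n)
    abs(mean(s) - mu) <= half                                # does the interval cover mu?
  })
  mean(hits)
}

set.seed(1)
cov_normal <- coverage(function(n) rnorm(n, mean = 1), mu = 1, n = 10)
cov_skewed <- coverage(function(n) rexp(n, rate = 1),  mu = 1, n = 10)

c(normal = cov_normal, skewed = cov_skewed)
```

With a normal population the estimated coverage should sit close to 95%; with a strongly skewed population and a small sample it falls noticeably short, which is exactly what the safety checks are meant to warn about.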

### Types of Error

Simulation is also helpful in coming to understand such notions as the level of significance of a hypothesis test (i.e., the probability of rejecting a true Null Hypothesis in repeated sampling), as well as the notion of power. See the following app:
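
Independently of the app, both notions can be estimated with a short base-R simulation. This is only a sketch: the helper `reject_rate()`, the sample size of 25, and the alternative mean of 0.5 are choices made for illustration.

```r
# Proportion of simulated samples in which a one-sample t-test rejects H0: mu = 0
reject_rate <- function(true_mean, n = 25, reps = 2000, alpha = 0.05) {
  rejections <- replicate(reps, {
    s <- rnorm(n, mean = true_mean, sd = 1)
    t.test(s, mu = 0)$p.value < alpha
  })
  mean(rejections)
}

set.seed(3)
type1_rate <- reject_rate(true_mean = 0)     # Null is true: should be near alpha
power_est  <- reject_rate(true_mean = 0.5)   # Null is false: estimated power

c(type_I_error = type1_rate, power = power_est)
```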

### Illustrating Fine Points

Sometimes you want to have an app on hand, not because it addresses a major course objective, but simply in case students ask a particular question. For example, sometimes when the class is looking at a scatter plot (with regression line) of data that comes from a bivariate normal distribution, a student will remark that the regression line looks “too shallow”. The root of this question is a confusion, in the student’s mind, between two purposes that a line might serve:

• to provide a “linear summary” of the scatter plot;
• to provide linear predictions, based on the scatter plot, of y-values from x-values.

The so-called “SD line”—the line that runs through the point of averages and whose slope is the ratio of the standard deviation of the y-value to the standard deviation of the x-value—is well-suited to the former task, whereas the regression line is, of course, the right choice for the latter one. When many students first look at a scatter plot, they see an SD line in their mind’s eye; when they get around to producing the regression line, it can look like a misfire.

The following app helps clear things up for students. It is based on a discussion of the “shallow regression line” issue in Statistics, the classic text by Freedman, Pisani and Purves.
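
The relationship between the two lines is easy to check numerically. In the base-R sketch below (my own illustration, not the app's code), the correlation of 0.5 and the sample size of 500 are arbitrary assumptions. The regression slope equals r times the SD-line slope, so it is always the shallower of the two unless the points fall exactly on a line.

```r
set.seed(4)
n   <- 500
rho <- 0.5                                   # assumed correlation for the illustration
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)    # (x, y) is bivariate normal

r <- cor(x, y)
sd_slope  <- sign(r) * sd(y) / sd(x)         # slope of the SD line
reg_slope <- unname(coef(lm(y ~ x))[2])      # slope of the regression line

c(SD_line = sd_slope, regression = reg_slope, r_times_SD_slope = r * sd(y) / sd(x))

# Both lines pass through the point of averages
plot(x, y, col = "grey50")
abline(a = mean(y) - sd_slope * mean(x), b = sd_slope, lty = 2)   # SD line (dashed)
abline(lm(y ~ x))                                                 # regression line (solid)
```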

### Playing Games

Here is yet another of those “find the regression line” apps that you see all over the web:

You have the option to keep score. Your score is the sum of the number of times you have submitted a guess and the following “closeness measure”:

## Shiny vs. manipulate

You don’t need to know much at all about web development in order to program in Shiny, but for R users there is the extra requirement of becoming comfortable with the reactive programming paradigm. The hurdle is not all that high: as an intermediate-level R programmer, I was able to pick up Shiny over a weekend. The online Shiny tutorials and a few consultations with Stack Overflow provided almost everything I needed to know.

The pay-back for the extra learning is considerable. Shiny apps permit a much more flexible user-interface, as compared to manipulate. For example, it is easy to make input “dynamic”, in the sense that the requests that a user can make of the app can be easily made to depend upon previous choices that the user has made. It’s also easy to provide plenty of written explanation for the activity, as it proceeds: with manipulate apps this can be somewhat difficult.
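
To make “dynamic” input concrete, here is a minimal sketch of a Shiny app in which one control appears only after a particular choice has been made. It is not one of our course apps; the input names (`dist`, `rate`) and the example itself are invented for illustration, and the shiny package must be installed.

```r
library(shiny)

ui <- fluidPage(
  selectInput("dist", "Population shape", choices = c("normal", "skewed")),
  uiOutput("extra"),      # placeholder that the server fills in
  plotOutput("hist")
)

server <- function(input, output) {
  # Dynamic input: the rate slider exists only when "skewed" is selected
  output$extra <- renderUI({
    if (input$dist == "skewed") {
      sliderInput("rate", "Exponential rate", min = 0.5, max = 5, value = 1)
    }
  })

  # Reactive output: redrawn whenever an input it reads changes
  output$hist <- renderPlot({
    if (input$dist == "normal") {
      x <- rnorm(500)
    } else {
      req(input$rate)     # wait until the slider has been created
      x <- rexp(500, rate = input$rate)
    }
    hist(x, main = paste("Sample from a", input$dist, "population"))
  })
}

app <- shinyApp(ui, server)
# runApp(app)   # uncomment to launch the app in a browser
```

Written explanation for students can go straight into `fluidPage()` as additional elements, for example with `helpText()`.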

On the other hand, since manipulate apps run directly within RStudio, they can easily be programmed to work with any data frame that the user specifies. Shiny apps will allow you to upload a CSV file, but for elementary students this process is usually too much of a burden.

Show Me Shiny has some wonderful instructional apps.

Considering all of the buzz surrounding Shiny, I am baffled at how difficult it has been for me to find other up-to-date sites featuring Shiny apps for statistics instruction. Perhaps readers of this post could direct me to any that they know of. Eventually it would be nice to develop something like a ShinyTeachingTube, which could serve as a central hub for Shiny instructional applets.

# Course Management With the RStudio Server

## Introduction

At my institution we teach both elementary and upper-level undergraduate statistics using R, in the environment of the RStudio Linux server installed and configured on our campus network. Although students are made aware of the existence of the desktop version of RStudio and eventually are encouraged to install it on their personal machines, the default course environment is that of the server.

One reason for this choice is that the server allows us—instructors working in consultation with our sysadmin—to standardize the R environment (R version, installed packages, etc.) for all class members, so that if we add a feature or fix a problem we have some reasonable confidence that it will work for everyone.

Another reason, which constitutes the theme of this post, is that the server environment facilitates course management, especially in technical respects specific to a statistics course, where standard online content management systems such as Moodle or Blackboard may fall short. The aim of this post is to record, for colleagues at our institution and for folks at other institutions who are considering making the switch to R, the principal ways in which we have tweaked the server for course-management purposes. R and RStudio are wonderful free software, but like all free software, they come with a certain “cost of ownership”, and those costs can be considerable if (like me) you begin with little in the way of programming/hacking skills. I hope that the following information will reduce the ownership costs for others who choose to teach with R in a similar vein.

## Installation

I assume that you have persuaded your sysadmin to install and configure some version of the RStudio Linux server. My sysadmin chose to set up the CentOS version, and configured it so that all members of the campus community can access it by means of their username and password.

If your personal machine runs Linux (either the Ubuntu or CentOS distribution), it’s a good idea to brush up on (or acquire) some very basic command-line skills and to install the server on your own machine as well, so you can replicate some of the strategies described below. Just a little bit of knowledge of the innards (file permissions, etc.) pays off handsomely in being able to work with your sysadmin to diagnose and resolve quickly any problems that arise. I myself run Ubuntu, but have not found significant differences between how the server works for me and how it works on campus.

## Establishing a “Common Source” Folder

Ask your sysadmin to grant superuser privileges to you and other course instructors. Then one of you should create a folder in your Home directory on the server that will serve as a common source for course material. The sysadmin can create a symbolic link to the folder and can set permissions so that all users may read files in the folder but only you and fellow instructors can write to it. This folder serves as the repository for assignments, solutions, syllabi, etc.

If you are not the owner of the folder, you can get to it using the ellipses button in the upper right-hand corner of the Files pane. Simply enter the path-name as specified by your sysadmin. For one of our courses it is simply: /mat111.

From there you can navigate the directory structure in the Files pane, in the usual way. To reset the Files pane view back to your Home directory, push the Ellipses button again and enter: ~.

All of the foregoing will make sense to you once you have studied Unix-like directory structures.
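
If you would like to experiment with links and permissions without touching the real server, the following sketch does so inside a temporary directory on a Unix-like machine. Everything here is a stand-in: the folder names are hypothetical, and on the actual server your sysadmin would also assign the folder to an instructors' group, which this example does not attempt.

```r
# A temporary directory plays the role of the server's file system
root   <- tempfile("server_")
common <- file.path(root, "home", "instructor", "common_source")   # hypothetical folder
link   <- file.path(root, "mat111")                                # hypothetical short path
dir.create(common, recursive = TRUE)

# rwxrwxr-x: owner and group may write; everyone else may only read and list.
# use_umask = FALSE applies the mode exactly as written.
Sys.chmod(common, mode = "0775", use_umask = FALSE)

# Symbolic link that gives the folder a short, memorable path
file.symlink(from = common, to = link)

file.info(common)$mode   # octal permissions of the folder
Sys.readlink(link)       # where the link points
```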

## Automated Assignment Collection/Return

Once our elementary students have acquired some proficiency with R, we introduce them to R Markdown and require them to turn in certain homework and project assignments as R Markdown documents. We write comments into a copy of the assignment and return it to the student. One of the best arguments for teaching in the server environment is that this collection and return process can be automated. Here’s how we do it these days.

First of all, each instructor should create a text file consisting of the network usernames of the students in his or her course (or section thereof), one username per line, and name it something like students.txt.

Save the file in your Home directory on the server.

You are going to create some sub-directories in Home directories of your students, so for this you will need to act as a superuser. This action will in turn require you to provide your password to the computer. For security reasons, you don’t want to send out the password every time you perform a superuser action, so you need to encrypt your password and provide a key in its place. For this purpose our sysadmin has written the following Perl script:

The above script, and others to follow, are housed in /scripts. You will use it to create an encrypted version of your password that is stored in a new file in your Home directory. To run the script, issue the following (suitably modified) command in R:

system("perl /scripts/createpasswordfile.pl --password=<YourPassword> --key=<YourChosenKey> --file=</path/to/YourFavFileName.txt>")


After you run the script, clear your R History: you don’t want to leave your password hanging out in the open.
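
If your password contains spaces or shell metacharacters, pasting it directly into the command string can go wrong. One hedged way around this is to assemble the command with `shQuote()`. The helper name `build_password_cmd` and the sample values are hypothetical; the script path and flags are the ones shown above.

```r
# Assemble the command with each value quoted for the shell
build_password_cmd <- function(password, key, file) {
  sprintf("perl /scripts/createpasswordfile.pl --password=%s --key=%s --file=%s",
          shQuote(password),
          shQuote(key),
          shQuote(path.expand(file)))   # expand "~" first: the shell will not expand it inside quotes
}

cmd <- build_password_cmd("p@ss word!", "myKey", "~/password_file.txt")   # made-up values
cat(cmd, "\n")
# system(cmd)   # uncomment to run on the server, then clear your R History
```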

### Create Subdirectories

Here is the Perl script that we currently use to create submit and returned directories in the Home directory of each student in the class. Obviously your sysadmin will modify it to suit the file structure of your server.

To run the script, issue the following R command, suitably modified:

system("perl /scripts/createdirectories.pl --studentfile=<StudentFileName>")


There are options to receive an email report confirming the creation of the directories, and to set permissions for them as well. Currently we use the default settings.

### Collect Assignments

Students save an assignment into their submit directory, named according to some convention that you establish. Specifics vary, but the name must end with an underscore followed by the student username. For example: HW05_jdoe.Rmd is the fifth homework assignment, submitted by the student with username jdoe.
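
Before running a collection it can be handy to check which file names follow the convention. The sketch below is not part of our scripts; the usernames and file names are made up, and it assumes that every submission carries the `.Rmd` extension.

```r
roster <- c("jdoe", "asmith")   # hypothetical usernames, as they would appear in students.txt
files  <- c("HW05_jdoe.Rmd", "HW05_asmith.Rmd", "HW05-jdoe.Rmd", "HW05_zzz.Rmd")

# The username is whatever sits between the last underscore and the extension
pattern  <- "^.*_([^_]+)\\.Rmd$"
matches  <- grepl(pattern, files)
username <- ifelse(matches, sub(pattern, "\\1", files), NA_character_)

# A file is acceptable if it matches the pattern and names a student on the roster
check <- data.frame(file = files, username = username,
                    ok = matches & username %in% roster)
check
```

Here the third file fails because it uses a hyphen instead of an underscore, and the fourth because `zzz` is not on the roster.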

The Perl script for collection of assignments is as follows:

To run the script issue a command like the following:

system("perl /scripts/collecthomework.pl --instructor=<yourUsername> --assignment=<assignCode> --studentfile=students.txt")


If you would like to receive an email with a list of all students from whom you got an assignment, run this instead:

system("perl /scripts/collecthomework.pl --instructor=<yourUsername> --assignment=<assignCode> --studentfile=students.txt --email=<yourEmailAddress>")


You can run the collection script as often as you like: it will pick up newly-submitted assignments but will not overwrite assignments collected from other students in a previous run.
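
The “collect again without clobbering” behavior can be imitated in R with `file.copy(overwrite = FALSE)`. This is an analogue for illustration only, not a translation of the Perl script: temporary directories stand in for the students' submit folders and the instructor's homework folder, and the file names are invented.

```r
# Temporary stand-ins for a submit folder and the instructor's collection folder
submit    <- tempfile("submit_");    dir.create(submit)
collected <- tempfile("collected_"); dir.create(collected)

collect <- function() {
  files <- list.files(submit, pattern = "^HW05_", full.names = TRUE)
  # overwrite = FALSE leaves anything gathered in an earlier run untouched
  invisible(file.copy(files, collected, overwrite = FALSE))
}

writeLines("first draft", file.path(submit, "HW05_jdoe.Rmd"))
collect()                                    # first run picks up jdoe's file

# The instructor edits the collected copy; meanwhile another student submits
writeLines("instructor comments", file.path(collected, "HW05_jdoe.Rmd"))
writeLines("late submission", file.path(submit, "HW05_asmith.Rmd"))
collect()                                    # second run adds asmith's file only

list.files(collected)
readLines(file.path(collected, "HW05_jdoe.Rmd"))   # still the edited copy
```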

### Return Assignments

All of the assignments you collect appear in a homework folder in your Home directory, in sub-directories by assignment name and sub-sub-directories by student username. Navigate to the assignments one by one. For each assignment, open the R Markdown file and save it with an additional tag in the file name that will mark it out as the graded/commented copy to be returned to the student. We use _com as our tag, creating files like this: HW05_jdoe_com.Rmd.
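
If you prefer not to retype file names, the tag can be added programmatically. The helper `tag_commented` below is a made-up name, offered as a small sketch rather than as part of our workflow.

```r
# Insert a tag just before the file extension
tag_commented <- function(file, flag = "_com") {
  sub("(\\.[^.]+)$", paste0(flag, "\\1"), file)
}

tag_commented("HW05_jdoe.Rmd")   # "HW05_jdoe_com.Rmd"

# To create the copy you will comment on:
# file.copy("HW05_jdoe.Rmd", tag_commented("HW05_jdoe.Rmd"))
```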

For returning assignments, we have the following Perl script:

To run the script, you need the key for your encrypted password. Run a command like the following:

system("perl /scripts/returnhomework.pl --path=/path/to/yourUsername/homework/HW05/ --flag=_com --studentfile=/usr/local/sbin/YourUsername-YourCourse.txt --key=YourChosenKey --passwordfile=password_file_YourUsername.txt")


Note that the sysadmin has established, for each instructor, a file in /usr/local/sbin of student usernames for the instructor’s course. As students drop your course and you edit your local student file accordingly, the two files may fall out of sync, but the return script will still work correctly for students still enrolled in the course.

All in all, the server environment has proven to be quite useful for our courses. Nevertheless, there are a few complications and potential problems to keep in mind.

• Students can read from the Common Source directory, but cannot write to it. If a student wishes to perform a “knitting” type of action on a file in the Common Source directory (e.g., knitting an R Markdown document to HTML or previewing an R Presentation document), then she must save a copy into her Home directory and perform the knitting operations upon it. The same often goes for other instructors (default file permissions are still a bit unclear to us).
• Shiny apps are wonderful. We put them into the ancillary package that we use for our own elementary course, so that R users can run them locally once the package is installed, or run them locally after downloading them from the package’s GitHub repository. However, at many institutions the firewalls don’t permit execution of the Shiny scripts. If this is the case at your own institution and you want your students to work with Shiny apps, then you must either install and configure the Shiny server or deploy the apps yourself on a site hosted by RStudio, e.g., http://shinyapps.io/. We have experimented with both venues and are pleased with the results.
• A small percentage of users eventually experience mysterious problems (e.g., loss of ability to knit an R Markdown document more than once in a single server session) that we have not been able to diagnose and resolve completely. If the problem becomes sufficiently severe, a student could always use the desktop version, but this in itself creates a course management problem. Larger institutions than ours may wish to consider paying for the Enterprise version of the RStudio server, and the support that accompanies it.