I have recently completed a working beta of an R package for the regression discontinuity design with the help of Cyrus Samii. Moreover, it has been accepted to CRAN, so you can download it and mess around. I’ll briefly introduce the theory, and then continue with the some examples (there are also examples available as part of the package documentation, along with more full explanations of the available options). A more rigorous introduction to this software is currently being prepared for publication.

Most simply, rdd is a package for non-parametric estimation of sharp or fuzzy regression discontinuity designs. It works nearly identically to Austin Nichols’ `rd’ function for Stata. There are a number of things which should be noted, however.

Specification of the formula is a bit idiosyncratic, so be sure to look at the documentation. It is essentially specified as Outcome ~ Running_variable + Treatment_variable | Covariate_1 + Covariate_2, omitting terms that are unnecessary as the case may be.

Bandwidth can either be specified or it will be calculated automatically using the Imbens-Kalyanaraman optimal bandwidth procedure. The kernel used automatically is the triangular kernel, although other options are available. The cutpoint is assumed to be zero if nothing else is specified.

Justin McCrary’s routine for testing the density of the running variable for a discontinuity is also included (including a plot thereof).

Covariates are handled differently than in the Stata routine. Unlike Nichols, I chose to interact the covariates with treatment (in particular, a logical indicating above or below the cutpoint). This follows from some of the latest work done on the RD design. Interacting treatment with the covariates is likely to improve efficiency of estimation, and it requires no additional assumptions that aren’t already being made regarding smoothness of covariates across the cutpoint. If there is an outcry on this point, I am fully willing to make this choice an optional parameter. Although I feel that its likely that a strong design should be robust to this sort of thing.

There are many options for standard errors beyond the default Huber-White heteroskedastic robust ones. Most importantly, cluster robust SEs are included as an option (such as for use with a discrete running variable). Subsetting is handled, as are missing values. Internal model frames may be returned if desired by an option.

A useful addition is the plotting routine available in this package, which should automate the rather cumbersome process of plotting the discontinuity, particular in the case of dichotomous variables.

The workhorse function for estimation is RDestimate. Density tests can be performed with DCdensity. Methods are provided for summary and plot (for the RD object returned from RDestimate).

Further development will look to add non-parametric covariate adjustment as per Markus Frölich. This is likely still several months away, however. I am also open to any other additional features or improvements that might be desired. It is also possible that there may exist some bugs. Please report those to me, as well (that’s why it’s still labelled a beta)! Also, feel free to ask for clarification if the documentation doesn’t make something sufficiently clear.

Example analysis is available below the fold.

Example replication of Dal Bó, Dal Bó and Snyder’s “Political Dynasties’’ with Stata code and data available here.

require("foreign") #For Stata data
require("dummies")

setwd("C:/Users/drewdim/Dropbox/rd/paper/dalbodalbo")
d<-read.dta("PoliticalDynastiesData.dta")

#Set up data as per ``Political Dynasties''
reg <- data.frame(dummy(d$region))
dec <- data.frame(dummy(d$decade))
reg<-reg[,-1]
dec<-dec[,-c(1,10,20,21,22)]
nreg<-names(reg)
ndec<-names(dec)
d<-cbind(d,reg)
d<-cbind(d,dec)
rm(list=c("dec","reg"))

covs<-c(nreg,ndec,"democrat","republican","female","collegeatt",
        "outsider","preanyoffice","age","military","farmer",
        "lawyer","business")
sub<-d$year==d$yearenter & d$realbirthyear<=1910 & 
        d$diedinoffice==0 & d$prerelative==0

# Simple RD estimate
rd<-RDestimate(postrelative~marginvote+longterm,data=d)

# Full RD estimate
rd<-RDestimate(
  paste("postrelative~marginvote+longterm|",paste(covs,collapse="+"),sep=""),
  data=d,
  subset=sub)

summary(rd)
plot(rd)

# With different bandwidths
rd25 <- RDestimate(
  paste("postrelative~marginvote+longterm|",paste(covs,collapse="+"),sep=""),
  data=d,
  subset=sub,
  bw=2.5)

rd5 <- RDestimate(
  paste("postrelative~marginvote+longterm|",paste(covs,collapse="+"),sep=""),
  data=d,
  subset=sub,
  bw=5)

rd10 <- RDestimate(
  paste("postrelative~marginvote+longterm|",paste(covs,collapse="+"),sep=""),
  data=d,
  subset=sub,
  bw=10)

# Bind the output into a table
rbind(rd.default=rd[c language="("est","se","z","p")"][/c],
      rd2.5=rd25[c language="("est","se","z","p")"][/c],
      rd5=rd5[c language="("est","se","z","p")"][/c], 
      rd10=rd10[c language="("est","se","z","p")"][/c])

# Plot the discontinuity
par(pty="m")
pdf("PD_example_1.pdf",width=7.5,height=5)
plot(rd,which=1)
title(xlab="Vote Margin (%)",ylab="Proportion with Post-relative")
dev.off()

pdf("PD_example_2.pdf",width=7.5,height=5)
plot(rd,which=2)
title(xlab="Vote Margin (%)",ylab="Proportion Long term")
dev.off()

pdf("PD_example_3.pdf",width=7.5,height=5)
DCdensity(d$marginvote[sub])
title(xlab="Vote Margin (%)",ylab=''Density'')
dev.off()