 
	
  
I am a Canadian data scientist and software engineer currently living in Australia. I have extensive experience in applied statistics, machine learning, lean and agile methodologies, R, Bayesian analysis, and data pipelines.
For the last six years, I developed solutions for market research and business intelligence software that enabled customers to automate the cleaning, statistical analysis, visualization, and reporting of their survey data.
I am an experienced Scrum Master and Agile practitioner, certified as a SAFe Practice Consultant (SPC) and RTE. Please contact me if you are interested in training or coaching.
Previously, I was an academic researcher in applied statistics specializing in statistical computing, semiparametric and high-dimensional regression, time series analysis, and hierarchical Bayesian modelling. I am passionate about software development and love working with data from a diverse range of fields: market research, finance, nutrition, and many others.
I obtained a PhD in Operations Research with a concentration in Applied Probability and Statistics in 2013 from the School of ORIE at Cornell University under the supervision of David Ruppert and Giles Hooker. After Cornell, I completed postdocs with distinguished professors Ray Carroll at Texas A&M University and Matt Wand at University of Technology Sydney.
M. W. McLean, M. P. Wand. Variational Message Passing for Elaborate Response Regression Models. In: Bayesian Analysis 14.2 (Jun. 2019), pp. 371–398. [PDF]
K. Triff, M. W. McLean, E. Callaway, J. Goldsby, I. Ivanov, and R. S. Chapkin. Dietary Fat and Fiber Interact to Uniquely Modify Global Histone Post-Translational Epigenetic Programming in a Rat Colon Cancer Progression Model. In: International Journal of Cancer 143.6 (May 2018), pp. 1402–1415.
M. W. McLean (2017). RefManageR: Import and Manage BibTeX and BibLaTeX References in R. Journal of Open Source Software. R package also accepted to ROpenSci. [JOSS | GitHub | Reports]
K. Triff, M. W. McLean, K. Kranti, J. Pang, E. Callaway, B. Zhou, I. Ivanov, and R. S. Chapkin. Assessment of Histone Tail Modifications and Transcriptional Profiling During Colon Cancer Progression Reveals a Global Decrease in H3K4me3 Activity. In: BBA - Molecular Basis of Disease 1863.6 (Jun. 2017), pp. 1392–1402. [Elsevier]
M. W. McLean, G. Hooker, D. Ruppert. Restricted Likelihood Ratio Tests for Linearity in Scalar-on-Function Regression. In: Statistics and Computing 25.5 (Sep. 2015), pp. 997–1008. [arXiv | Springer]
M. W. McLean, G. Hooker, A.-M. Staicu, F. Scheipl, D. Ruppert. Functional Generalized Additive Models. In: Journal of Computational and Graphical Statistics 23.1 (Feb. 2014), pp. 249–269. [T & F Online | Supplementary Materials]
D. S. Matteson, M. W. McLean, D. B. Woodard, S. G. Henderson. Forecasting Emergency Medical Service Call Arrival Rates. In: Annals of Applied Statistics 5.2B (Jun. 2011), pp. 1379–1406. [ProjectEuclid | arXiv | Supplementary Materials]
M. W. McLean, C. J. Oates, M. P. Wand (2017). Real-Time Semiparametric Regression via Sequential Monte Carlo. [PDF]
M. W. McLean, F. Scheipl, G. Hooker, S. Greven, D. Ruppert (2017). Bayesian Functional Generalized Additive Models for Sparsely Observed Covariates. [arXiv]
M. W. McLean (2014). Straightforward Bibliography Management in R Using the RefManageR Package. [arXiv]
On Generalized Additive Models for Regression with Functional Data (2013). School of Operations Research and Information Engineering, Cornell University. [PDF]
Note: The HTML5 slides below may be printed in any decent browser by 'right-clicking' on any slide and selecting 'print'.
2014 WNAR/IMS Conference, Honolulu, HI (2014)
2014 ENAR Spring Meeting, Baltimore, MD (2014)
R  IAMCS Machine Learning and Applied Statistics Workshop Series, College Station, TX (2014) [Slides (HTML)] (These slides also contain a brief introduction to my R package RefManageR.)
R Programming and Authoring R Packages. IAMCS Machine Learning and Applied Statistics Workshop Series, College Station, TX (2014) [Slides (HTML)]
Department of Statistics, University of Auckland, Auckland, New Zealand, August, 2017
Annual Meeting of the German and Austrian Statistical Associations, Vienna, Austria (2012) [Slides (PDF)]
ORIE PhD Colloquium, Ithaca, NY (2012) [Slides (PDF)]
ORIE PhD Reunion / Jack Muckstadt Retirement Celebration, Ithaca, NY (2012) [Poster (PDF)]
Statistical Methods for Very Large Datasets Conference, Baltimore, MD (2011)
SAMSI Analysis of Object Data Closing Workshop, Research Triangle Park, NC (2011)
Imaging, Communications and Finance: Stochastic Modeling of Real-world Problems, New York, NY (2011) [Poster (PDF)]
My postdoctoral research focused on building models and algorithms for high-dimensional and streaming data applications, specifically approximations to fully Bayesian inference such as variational Bayesian methods. Additionally, I worked on developing semiparametric models to analyze longitudinal and functional data. I have a passion for statistical computing and reproducible research, and have contributed to a number of R packages on CRAN.
Previous areas of application have included diffusion tensor imaging, forecasting emergency medical service call arrival rates, differential expression analysis of RNA-Seq data, analyzing step counts from wearable activity trackers, measurement error modelling for dietary intake data, and financial time series. Below is an introduction to a couple of applied problems I have worked on.
We have data from a study comparing certain white matter tracts of multiple sclerosis (MS) patients with control subjects. MS is a central nervous system disorder that leads to lesions in the white matter of the brain which disrupts the ability of cells in the brain to communicate with each other. Below is a human brain with two major white matter tracts, the corpus callosum and corticospinal tracts in red and blue, respectively (thanks to Jeff Goldsmith).
 
Using a Magnetic Resonance Imaging technique called Diffusion Tensor Imaging, we obtain several measurements at each location in the white matter tracts. Here it makes sense to treat these measurements as coming from a continuous function X(t), where t is the location along the tract, and to use tools from functional data analysis. We wish to use the functions X(t) to predict a scalar health outcome Y. For example, Y might be the score on a cognitive test or disease status (case or control). The typical approach in functional data analysis is to use the Functional Linear Model (FLM)
    g[E(Y|X)] = θ0 + ∫ β(t) X(t) dt,
where θ0 is an intercept, β(t) is an unknown coefficient function that needs to be estimated, and g(•) is a known monotonic function called the link function. Below is a plot of some of the curves Xi(t) as well as a corresponding estimated β(t).
 
However, this model is often not flexible enough to capture the true underlying relationship between the response and the predictor. We propose a new model, the functional generalized additive model (FGAM),
    g[E(Y|X)] = θ0 + ∫ F(X(t), t) dt,
where θ0 is an intercept and F(x,t) is an unknown smooth bivariate function that we estimate using penalized regression splines. The model is much more general than the FLM, but retains the ease of interpretation. Below is an example estimated surface.
 
Here Y is 1 if the subject has MS and 0 if they do not, and g[E(Y|X)] gives the estimated probability that subject i has MS. The red plotted curve is X(t) for one subject with MS and the blue curve is for a subject without MS. We see that since the red curve traces out a higher path on the surface than the blue curve, the subject corresponding to the red curve is more likely to be classified as having MS.
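To make the difference between the two models concrete, here is a minimal numerical sketch. It is not the penalized-spline estimation described above: the coefficient function, the surfaces, the two curves, and the intercept name theta0 are all hypothetical, chosen only to show how each linear predictor is computed from a curve observed on a grid, and that the FGAM reduces to the FLM when F(x,t) = x β(t).

```python
import math


def trapezoid(values, grid):
    """Approximate an integral from function values on a grid (trapezoid rule)."""
    return sum(
        0.5 * (values[k] + values[k + 1]) * (grid[k + 1] - grid[k])
        for k in range(len(grid) - 1)
    )


def flm_predictor(x, grid, beta, theta0=0.0):
    """FLM linear predictor: theta0 + integral of beta(t) * X(t) dt."""
    return theta0 + trapezoid([beta(t) * xt for t, xt in zip(grid, x)], grid)


def fgam_predictor(x, grid, surface, theta0=0.0):
    """FGAM linear predictor: theta0 + integral of F(X(t), t) dt."""
    return theta0 + trapezoid([surface(xt, t) for t, xt in zip(grid, x)], grid)


def inv_logit(eta):
    """Inverse of the logit link, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical inputs: a grid on [0, 1] and two flat curves, one higher than the other.
grid = [k / 100 for k in range(101)]
x_high = [0.8 for _ in grid]
x_low = [0.2 for _ in grid]


def beta(t):
    """Hypothetical constant coefficient function."""
    return 2.0


def linear_surface(x, t):
    """F(x, t) = x * beta(t): with this surface the FGAM is exactly the FLM."""
    return x * beta(t)


def nonlinear_surface(x, t):
    """Hypothetical surface that is nonlinear in x, which the FLM cannot represent."""
    return 3.0 * x ** 2 * (1.0 + t)


flm_high = flm_predictor(x_high, grid, beta)
fgam_linear_high = fgam_predictor(x_high, grid, linear_surface)
p_high = inv_logit(fgam_predictor(x_high, grid, nonlinear_surface))
p_low = inv_logit(fgam_predictor(x_low, grid, nonlinear_surface))
```

With the nonlinear surface, the curve that sits higher on the surface yields the larger fitted probability, which mirrors the reading of the red and blue curves above.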
You may fit the FGAM and FLM yourself to several different data sets using a Shiny app I created. See the Software tab.
In this project we are interested in determining how to best deploy ambulances in a given city to minimize response times to emergency calls while keeping costs low. More specifically, we have data on every ambulance trip and every emergency call received by Toronto EMS between January 2007 and December 2008. The operations research models that address these deployment problems require accurate estimates of the number of calls that will be received during each time period. Below is a plot of the mean number of calls per hour for each day of the week.
 
 
Methods used in practice to obtain these estimates are quite ad hoc. Instead, we build a factor model with constraints. This provides dimension reduction, accounts for seasonal and intra-day patterns in the call arrival rate process, and lets us incorporate additional covariates. The factor model is estimated using penalized regression splines so that the factors and loadings vary smoothly over time. Using the factor model, we obtain smooth estimates of the arrival process for every day of 2008, which are plotted below. As is to be expected, there is a very noticeable difference between the estimates for weekdays and weekends.
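The core idea of a factor model for arrival rates can be sketched in a few lines. This is only an illustration of the low-rank structure, assuming two made-up hourly profiles and made-up loadings for two day types; it does not include the constraints, covariates, or penalized-spline estimation described above, and none of the numbers come from the Toronto data.

```python
import math

HOURS = range(24)

# Hypothetical factors: two smooth hourly profiles (calls per hour).
daytime = [10 + 6 * math.sin(math.pi * h / 24) ** 2 for h in HOURS]    # peaks at noon
nightlife = [4 + 8 * math.cos(math.pi * h / 24) ** 2 for h in HOURS]   # peaks at midnight
factors = [daytime, nightlife]

# Hypothetical loadings: how strongly each day type draws on each factor.
loadings = {
    "weekday": [1.0, 0.2],
    "weekend": [0.6, 1.0],
}


def arrival_rates(day_loadings, factors):
    """Hourly arrival rates as a loading-weighted sum of the factor profiles."""
    return [
        sum(weight * profile[h] for weight, profile in zip(day_loadings, factors))
        for h in range(len(factors[0]))
    ]


# Every day's 24 hourly rates are described by just two numbers per day type.
rates = {day: arrival_rates(weights, factors) for day, weights in loadings.items()}
```

The dimension reduction comes from the fact that each day is summarized by a short vector of loadings rather than by 24 separate hourly rates.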
 
        Finally, an integer-valued GARCH model is fit to the residual process to account for any remaining dependence. Below is a plot of the estimated number of calls for each hour of weeks 8 and 9 of 2007, as well as a plot of the estimated integer-valued GARCH model fit to the residual process.
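For readers unfamiliar with integer-valued GARCH models, the following sketch simulates the textbook Poisson INGARCH(1,1) recursion, in which the conditional mean of each count depends on the previous count and the previous conditional mean. It is an assumption-laden illustration: the parameter values are hypothetical, and the model here is applied directly to a count series rather than to the residual process of a factor model as in the project described above.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw a Poisson variate with Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_ingarch(omega, alpha, beta, n_periods, seed=0):
    """Simulate counts y_t ~ Poisson(lam_t) with lam_t = omega + alpha*y_{t-1} + beta*lam_{t-1}."""
    rng = random.Random(seed)
    lam = omega / (1.0 - alpha - beta)  # start at the stationary mean (needs alpha + beta < 1)
    counts, intensities = [], []
    for _ in range(n_periods):
        y = sample_poisson(rng, lam)
        counts.append(y)
        intensities.append(lam)
        lam = omega + alpha * y + beta * lam  # a busy hour raises the next hour's expected count
    return counts, intensities


# Hypothetical parameters chosen only for illustration.
counts, intensities = simulate_ingarch(omega=1.0, alpha=0.3, beta=0.5, n_periods=200)
```

The recursion is what captures remaining dependence: an unusually busy period pushes up the expected count for the periods that follow.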
 
		How do we check whether our model improves staffing levels and response times?
We simulate an M/M/s queueing system in which the servers represent ambulances, calls for emergency medical services in hour t arrive at rate λt, and each ambulance serves callers at rate ν. We use our model to estimate the number of calls in each period, and these estimates determine the number of ambulances to staff, say st, in each period. After fitting the model to 2007 data, we initialize the queueing system by assuming that the number of arrivals in each hour is the observed number of calls for the corresponding hour of 2008, and we simulate the arrival times within each hour and the service times for each caller.
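A stripped-down version of such a simulation is sketched below. It assumes a single constant arrival rate and a fixed number of servers rather than the hourly rates λt and staffing levels st used in the study, and the rates and server counts are hypothetical. Calls are served first come, first served by whichever ambulance frees up earliest.

```python
import heapq
import random


def simulate_mms(arrival_rate, service_rate, servers, n_calls, seed=0):
    """Simulate a first-come, first-served M/M/s queue and return each call's waiting time."""
    rng = random.Random(seed)
    free_at = [0.0] * servers          # min-heap of times at which each server is next free
    heapq.heapify(free_at)
    clock = 0.0
    waits = []
    for _ in range(n_calls):
        clock += rng.expovariate(arrival_rate)    # exponential gap until the next call
        service = rng.expovariate(service_rate)   # exponential service time for this call
        earliest = heapq.heappop(free_at)         # the server that becomes free first
        start = max(clock, earliest)              # wait only if every server is busy
        waits.append(start - clock)
        heapq.heappush(free_at, start + service)
    return waits


# Hypothetical comparison: the same stream of calls with too few and with ample ambulances.
understaffed = simulate_mms(arrival_rate=3.0, service_rate=1.0, servers=2, n_calls=2000)
well_staffed = simulate_mms(arrival_rate=3.0, service_rate=1.0, servers=10, n_calls=2000)
mean_wait_understaffed = sum(understaffed) / len(understaffed)
mean_wait_well_staffed = sum(well_staffed) / len(well_staffed)
```

Because both runs use the same seed and draw their random numbers in the same order, they see identical calls, so the difference in mean waiting time is due to staffing alone. Comparing staffing rules in this way is the kind of check the simulation above is meant to support.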
 
R Packages
I have authored and help maintain several of the R packages for Displayr. These include packages for data cleaning, regression, machine learning, and choice modeling. Most of the packages are publicly available on GitHub here.
RefManageR - a Reference Manager for R
RefManageR provides tools for importing and working with bibliographic references. It greatly enhances the bibentry class by providing a class BibEntry which stores BibTeX and BibLaTeX references, supports UTF-8 encoding, and can be easily searched by any field, by date ranges, and by various formats for name lists (author by last names, translator by full names, etc.). Entries can be updated, sorted, combined, printed in a number of styles, and exported. BibTeX and BibLaTeX .bib files can be read into R and converted to BibEntry objects. Interfaces to NCBI's Entrez, CrossRef, and Zotero are provided for importing references, and references can be created from locally stored .bib files or PDFs using Poppler. The package can also be used in RMarkdown and RHTML documents for including citations and printing a bibliography of all cited entries. Both the bibliography and the citations can include automatically generated hyperlinks. The package is available now on CRAN and a vignette is available here. The package was peer-reviewed and accepted by ROpenSci.
refund - REgression with FUNctional Data
The following functions were contributed to the refund package available on CRAN and GitHub:
fgam - a wrapper for the gam function in package mgcv to fit functional generalized additive models.
predict.fgam - wrapper for the predict.gam function in package mgcv for prediction with fgam fits.
vis.fgam - for visualizing estimated surfaces produced by fgam.
af - internal function for building fgam terms specified in model formulas passed to fgam.
lf - internal function for building functional linear model terms specified in model formulas passed to fgam.
In the latest versions of refund, the function fgam is deprecated in favour of the function pfr, which can fit several different scalar-on-function regression models including FGAM.
curvHDR
Provided assistance getting the curvHDR R package accepted to CRAN, including setting up package namespace and including compiled code.
Shiny Apps
Below is a Shiny app I created that performs estimation, visualization, inference, and prediction for the functional generalized additive model (FGAM) and the functional linear model (FLM). More information on these models can be found in the Research tab and in my papers from the Papers tab. Begin by selecting a data set and choosing model parameters and an estimation method (if you are familiar with penalized splines). You may also specify options for displaying the fits, conducting a hypothesis test of FLM vs. FGAM, and performing out-of-sample prediction. When you have selected the settings you want, you may fit the models by clicking the "Fit Models" button. Source code for the app is available on GitHub.
A change to the splines package in R means this code no longer runs in current versions of R. Those interested in fitting an FGAM should see the pfr function in the package refund.
	Below is a sampling of blog posts I wrote in a previous role. They aim to describe aspects of R, choice modelling, and using the software Q and Displayr for a general audience of non-programmers and data scientists.
A complete list of my posts with better styling can be found on the Displayr blog.