R provides a number of handy features for working with date-time data. However, the sheer number of options and packages available can make things seem overwhelming at first. More than 10 packages provide support for working with date-time data in R, in addition to the base as.Date() function for converting character data to dates. In this post, I will provide an introduction to the functionality R offers for converting strings to dates. Converting dates entered as strings into numeric dates in R is relatively simple for a single string, and for a vector of strings if the date information is consistent. It is far trickier if the date information is represented inconsistently. I'll discuss common pitfalls and give helpful tips to make working with dates in R less painful. Finally, I introduce some code that my colleagues and I wrote to make things a bit easier (the flipTime package).
When dates are provided in the format of year followed by month followed by day, such as 2017-12-02, you can use the as.Date function. This tells R to think of them as being calendar dates. For example:
months(as.Date("2017-12-02")) returns a value of December
weekdays(as.Date("2017-12-02")) returns a value of Saturday
as.Date("2017-06-09") - as.Date("2016-05-01") returns a value of 404 and prints on the screen Time difference of 404 days.
difftime(as.Date("2017-06-09"), as.Date("2016-05-01"), units = "hours") returns a value of 9696 and prints on the screen Time difference of 9696 hours.
format(as.Date("2017-01-02"), "%A, %d-%b. %Y") prints "Monday, 02-Jan. 2017"
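The same functions work on a whole vector of strings, provided they are all in the same year-month-day layout. A minimal sketch, using made-up dates rather than real data (the variable names are my own):

```r
# Illustrative dates only; all in year-month-day order.
dates <- as.Date(c("2017-12-02", "2016-05-01", "2017-06-09"))

sorted <- sort(dates)   # earliest to latest
gaps <- diff(sorted)    # days between consecutive dates
print(sorted)
print(gaps)

# Strip the difftime class to get plain numbers of days.
gap.days <- as.numeric(gaps)
print(gap.days)
```

Because every element follows the same layout, no format needs to be specified.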
When the date components are in any other order, as.Date with its default settings does not work. For example, as.Date("17-12-2010") returns 0017-12-20.
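One way around this in base R is to spell out the layout with the format argument of as.Date(). A small sketch (the second string is an invented example of input that does not match the stated layout):

```r
# Day-month-year input: tell as.Date() the layout explicitly.
d <- as.Date("17-12-2010", format = "%d-%m-%Y")
print(d)

# A string that does not match the stated layout gives NA
# instead of a silently wrong date.
bad <- as.Date("2010/12/17", format = "%d-%m-%Y")
print(bad)
```

This only helps when you already know the layout, which is exactly what the packages reviewed below try to work out for you.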
Things get even more complicated when input data contain times, as then we need to handle issues like time zones and leap seconds. R provides the classes POSIXct and POSIXlt for working with date-time data. POSIXct corresponds to the POSIX standard for calendar time (it stores a date-time as a number of seconds since the start of 1970) and POSIXlt corresponds to the POSIX standard for local time (it stores the components, such as year, month, and hour, separately). Because each value is a single number, the POSIXct class is more convenient for inclusion in R data frames.
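To make the difference concrete, here is a short base R sketch that stores one instant both ways. The timestamp and the variable names are invented for illustration:

```r
# One instant, stored two ways. The time zone is set explicitly so the
# result does not depend on the machine's settings.
ct <- as.POSIXct("2017-12-02 14:30:00", tz = "UTC")
lt <- as.POSIXlt(ct)

# POSIXct is a single number: seconds since 1970-01-01 00:00:00 UTC.
secs <- as.numeric(ct)

# POSIXlt is a list of components. Note the offsets: years count
# from 1900 and months count from 0.
year <- lt$year + 1900
month <- lt$mon + 1
hour <- lt$hour

# A POSIXct vector drops straight into a data frame column.
events <- data.frame(when = ct, label = "example")
str(events)
```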
I am now going to review some of the more useful packages. However, if you are a bit of a guru, just skip to the section on flipTime, where I have documented the stuff we have done.
lubridate
The lubridate package provides a number of useful functions for reading, manipulating, and doing arithmetic with dates in R. It provides the functions parse_date_time() and parse_date_time2(), which can be used to quickly convert strings to date-time objects. Their convenience stems from allowing the user to specify the order of the date components without having to specify the characters that separate them.
The parse_date_time() function allows the user to specify multiple orders at once and determines internally which is best for converting the input strings. It does this by training itself on a subset of the input strings and ranking the supplied orders by how often they successfully convert the strings in that subset. By contrast, the parse_date_time2() function does not allow multiple orders to be specified at once, and it supports fewer orders overall. However, it is faster when you need to convert a large number of strings.
For example, if we use:
parse_date_time("10-31/2010", orders = c("ymd", "dmy", "mdy"))
we get 2010-10-31 UTC
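To see several orders being offered for one vector, here is a small sketch with invented strings in mixed layouts. It assumes the lubridate package is installed, and I have left the output out since it is best checked on your own data:

```r
library(lubridate)

# Invented inputs: one year-first string and two day-first strings.
x <- c("2009-01-01", "02022010", "02-02-2010")

# Offer both orders and let parse_date_time() choose between them.
parsed <- parse_date_time(x, orders = c("dmY", "ymd"))
print(parsed)
```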
For shorter input vectors, lubridate can give strange results because it is so "aggressive" when performing the conversion. (To make things easier for skim reading, for the rest of the post I will put the output immediately beneath the code, with ## indicating it is the result of running the code.)
parse_date_time("July/1998", orders = c("bdy", "bY"))
## [1] "1998-07-19 UTC"
parse_date_time2("Jan 128", orders = "mdy")
## [1] "2008-01-12 UTC"
parse_date_time2("3.122", orders = "ymd")
## [1] "2003-12-02 UTC"
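If you would rather reject strings like these than have them converted, one option is a strict round-trip check in base R: parse with a fixed format, write the date back out in that same format, and keep it only if it matches the input exactly. The helper below, strict_date(), is a hypothetical function of my own and is not part of lubridate:

```r
# Hypothetical helper: accept a string only if it survives a round trip
# through the given format unchanged.
strict_date <- function(x, format = "%Y-%m-%d") {
  d <- as.Date(x, format = format)
  # NA parses are rejected, as are parses that print differently
  # from the input (for example, missing leading zeros).
  ok <- !is.na(d) & format(d, format) == x
  d[!ok] <- NA
  d
}

checked <- strict_date(c("2017-12-02", "3.122", "Jan 128", "2017-2-3"))
print(checked)
```

The trade-off is that this is deliberately inflexible: it accepts a single layout and nothing else.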
anytime
Another popular package for reading date strings into R is anytime, which uses the Boost date_time C++ library. It provides the functions anytime() and anydate() for date conversion. In addition to character strings, the package supports converting other R classes, such as integer and factor, to dates. The user does not need to specify any orders or formats, as anytime() and anydate() will guess the format (from a default list of supported formats). Furthermore, you can include additional formats using the addFormats() function.
As with lubridate, anytime can give strange results because of how aggressively it tries to convert strings. It also has some gaps: it does not support formats with two-digit years by default, and it does not support strings containing “AM/PM” indicators at all. In some situations it is inconvenient, and sometimes impossible, to specify whether a numeric month comes before or after the day in a date string.
Additionally, there may be situations where there is ambiguity (e.g., is “01/02” January 2nd or February 1st?). In these situations, we’d like to be able to tell the function whether the day comes before the month or not. Where we are not sure, it's helpful to have a warning. Unfortunately, we don't get that here.
library(anytime)
anydate("3.145")
## [1] "1400-03-14"
anydate(3.145)
## [1] "1970-01-04"
anytime(c("10-11-2011 5:30AM", "16-10-2011 10:10pm"))
## [1] "2011-10-10 23:00:00 AEDT" NA
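The ambiguity problem can be checked by hand in base R by parsing each string both ways and comparing the results. This is only a sketch: is_ambiguous() is a hypothetical helper of my own, it handles only slash-separated dates with four-digit years, and the inputs are invented:

```r
# Hypothetical helper: TRUE when a string is a valid date under both the
# month-first and day-first readings and the two readings disagree.
is_ambiguous <- function(x) {
  us <- as.Date(x, format = "%m/%d/%Y")    # month before day
  intl <- as.Date(x, format = "%d/%m/%Y")  # day before month
  !is.na(us) & !is.na(intl) & us != intl
}

flags <- is_ambiguous(c("01/02/2017", "13/02/2017", "03/03/2017"))
print(flags)
```

Only the first string is flagged: the second cannot be month-first (there is no month 13), and the third gives the same date either way.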
flipTime
The package flipTime provides utilities for working with time series and date-time data. The package can be installed from GitHub using:
require(devtools)
install_github("Displayr/flipTime")
I will discuss only two functions from the package in this post, AsDate() and AsDateTime(). These are used for the conversion of date and date-time strings, respectively. These functions build on the convenience and speed of the lubridate functions. Furthermore, the flipTime functions provide additional functionality that makes them easier to use. The functions are smart about identifying the proper format to use, so the user doesn't need to specify the format(s) as inputs. At the same time, both AsDate() and AsDateTime() are careful not to convert strings to dates when they are not formatted as dates. They also warn the user when the dates are in an ambiguous format.
AsDate() and AsDateTime() are very flexible with respect to what they permit as characters to separate the components of the date strings.
library(flipTime)
AsDate("Jan. 10, 2016")
## [1] "2016-01-10"
AsDate("Jan/10 - 2016")
## [1] "2016-01-10"
However, they are also careful to not convert strings to dates that are clearly not dates:
AsDate("Jan 128")
## Error in AsDate("Jan 128"): Could not parse "Jan 128" into a valid
## date in any format.
AsDate("3.122")
## Error in AsDate("3.122"): Could not parse "3.122" into a valid date in
## any format.
The above example also demonstrates the default behaviour of the functions, which is to throw an error when the date strings cannot be interpreted as dates. Both functions have an argument on.parse.failure, which is used to control this behaviour.
AsDate("foo", on.parse.failure = "warn")
## Warning in AsDate("foo", on.parse.failure = "warn"): Could not parse
## "foo" into a valid date in any format.
## [1] NA
AsDateTime("foo", on.parse.failure = "silent")
## [1] NA
Both functions provide an argument us.format, to allow the user to specify whether the date strings are in a U.S. or international format. U.S. format has the month coming before the day, such as Jan. 2, 1988. By contrast, international format has the day before the month, such as 21-10-1999. The default behaviour is to check both formats. If the format is ambiguous, the date strings will be converted according to the U.S. format, and the user will receive a warning.
AsDateTime("9/10/2010 10:20PM")
## Warning: Date formats are ambiguous, US format has been used.
## [1] "2010-09-10 22:20:00 UTC"
AsDateTime("9/10/2010 10:20PM", us.format = FALSE)
## [1] "2010-10-09 22:20:00 UTC"
We can also combine the flipTime functions with functions from lubridate.
library(lubridate)
dt <- AsDateTime("10/30/08 11:10AM")
dt + dminutes(6)
## [1] "2008-10-30 11:16:00 UTC"
birthday = "Dec. 8, 86"
days.alive = (AsDate(Sys.time()) - AsDate(birthday)) / ddays(1)
days.alive
## [1] 11322
The function AsDate() is also able to interpret date intervals or periods, which can be useful when working with aggregated data. If the function encounters date periods, it will convert the start of the period to a date and return it.
AsDate("10/20/2015-12/02/2016")
## [1] "2015-10-20"
AsDate("may 2017-september 2017")
## [1] "2017-05-01"
AsDate("Dec/Apr 16")
## [1] "2015-12-01"
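For comparison, here is roughly what taking the start of a period involves if done by hand in base R. This is a sketch and not how AsDate() is implemented: period_start() is a hypothetical helper of my own, and it handles only the single layout "MM/YYYY-MM/YYYY":

```r
# Hypothetical helper for strings like "01/2007-02/2007" only.
period_start <- function(x) {
  # Keep the text before the first hyphen, e.g. "01/2007".
  start <- sub("-.*$", "", x)
  # Prepend day 1 so the string becomes a complete date.
  as.Date(paste0("01/", start), format = "%d/%m/%Y")
}

starts <- period_start(c("01/2007-02/2007", "02/2007-03/2007"))
print(starts)
```

Every additional layout would need its own rule, which is the work AsDate() saves.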
The following example shows how AsDate() can be useful when working with dates inside a custom function. Say we have the following data on monthly returns of Apple and Yahoo. (A full copy of the data frame can be found on this Displayr page here).
head(df)
## YHOO AAPL
## 01/2007-02/2007 0.09516441 -0.006488240
## 02/2007-03/2007 0.09007418 -0.013061224
## 03/2007-04/2007 0.01393390 0.097601323
## 04/2007-05/2007 -0.10386705 0.074604371
## 05/2007-06/2007 0.02353780 0.213884993
## 06/2007-07/2007 -0.05470383 0.006932409
We can plot this data as time series with formatted axis labels. For instance, we might write the following function to produce such a plot.
PlotSeries = function(data, max.labels = 20, ...){
    n = nrow(data)
    xlabs = AsDate(rownames(data), on.parse.failure = "silent")
    if (any(is.na(xlabs))) # no dates present, use original rownames
        xlabs = rownames(data)[seq.int(1, n, length.out = max.labels)]
    else{ # dates, present; format for pretty labels
        xlabs = seq(xlabs[1L], xlabs[n], length.out = max.labels)
        xlabs = format(xlabs, "%b, %Y")
    }
    matplot(data, type = "l", xaxt = "n", ...)
    axis(1, labels = xlabs, at = seq.int(1, n, length.out = max.labels),
         las = 2)
    legend("bottomright", names(data), lty = 1:ncol(data),
           col = 1:ncol(data))
}
PlotSeries(df, lwd = 2, ylab = "Return")
The source code for flipTime can be viewed and downloaded here.
You can find more information about working with dates and times in R below. For more handy tips for using R in Displayr or to learn how to do something in R, check out the Displayr blog.
For lubridate, see Garrett Grolemund and Hadley Wickham: R for Data Science
For anytime, see Dirk Eddelbuettel's blog