Title: | The Research Data Warehouse of Miguel de Carvalho |
---|---|
Description: | Pulls together a collection of datasets from Miguel de Carvalho's research articles, including, for example: de Carvalho (2012) <doi:10.1016/j.jspi.2011.08.016>; de Carvalho et al. (2012) <doi:10.1080/03610926.2012.709905>; de Carvalho et al. (2012) <doi:10.1016/j.econlet.2011.09.007>; de Carvalho and Davison (2014) <doi:10.1080/01621459.2013.872651>; de Carvalho and Rua (2017) <doi:10.1016/j.ijforecast.2015.09.004>. |
Authors: | Miguel de Carvalho [aut, cre] |
Maintainer: | Miguel de Carvalho <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.1 |
Built: | 2024-10-31 19:46:22 UTC |
Source: | https://github.com/cran/DATAstudio |
DATAstudio is an add-on tool for R that pulls together
a collection of datasets used in Miguel de Carvalho's research.
For a complete list of datasets and documentation, type help.start()
and follow the link to DATAstudio on the Package Index.
Fundação para a Ciência e a Tecnologia (Portuguese NSF) grants:
PTDC/MAT-STA/28649/2017.
UID/MAT/00006/2019.
Miguel de Carvalho; School of Mathematics, University of Edinburgh.
The alps data consist of daily winter temperature minima and maxima measured at 2 m above ground surface at two sites in the Swiss Alps: Montana and Zermatt.
alps
The alps data frame contains the following columns:
date: date of measurements.
min_montana, min_zermatt: daily minimum temperature in ºC at Montana and Zermatt.
max_montana, max_zermatt: daily maximum temperature in ºC at Montana and Zermatt.
MeteoSwiss
Mhalla, L., de Carvalho, M., and Chavez-Demoulin, V. (2019) Regression type models for extremal dependence. Scandinavian Journal of Statistics, 46, 1141-1167.
## visualizing the data
data(alps)
oldpar <- par(pty = 's', mfrow = c(1, 2))
plot(alps$min_montana, alps$min_zermatt, pch = 20,
     xlab = "Montana", ylab = "Zermatt", main = "Daily Minimum")
plot(alps$max_montana, alps$max_zermatt, pch = 20,
     xlab = "Montana", ylab = "Zermatt", main = "Daily Maximum")
par(oldpar)

oldpar <- par(pty = 's', mfrow = c(1, 2))
plot(alps$min_montana, alps$max_montana, pch = 20,
     xlab = "Minimum", ylab = "Maximum", main = "Montana")
abline(a = 0, b = 1, col = "red", lty = 2)
plot(alps$min_zermatt, alps$max_zermatt, pch = 20,
     xlab = "Minimum", ylab = "Maximum", main = "Zermatt")
abline(a = 0, b = 1, col = "red", lty = 2)
par(oldpar)

## to download the NAO daily index in Mhalla et al (2019) use
## the R package data.table to access NOAA via ftp
link <- "ftp://ftp.cdc.noaa.gov/Public/gbates/teleconn/nao.reanalysis.t10trunc.1948-present.txt"
NAO.daily <- data.table::fread(link)
NAO.daily <- data.frame(NAO.daily)
colnames(NAO.daily) <- c("year", "month", "day", "NAO")
Preprocessed pairs of temperatures in unit Fréchet scale from Beatenberg forest, registered under forest cover and in the open field.
beatenberg
The beatenberg data frame has 2839 rows and 2 columns: x (forest cover) and y (open field).
Preprocessing is conducted as described in Ferrez et al. (2011). For applications of this dataset within the context of extreme value theory, see de Carvalho et al. (2013), de Carvalho and Davison (2014), and Castro and de Carvalho (2017).
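As a rough illustration of what "unit Fréchet scale" means, the sketch below maps simulated data to approximately unit Fréchet margins with the standard rank-based transform z = -1 / log(u), where u is the empirical distribution function evaluated at each observation. The helper name `to_unit_frechet` and the simulated input are hypothetical, and the actual preprocessing in Ferrez et al. (2011) may differ from this generic transform.

```r
# Hypothetical helper: map a numeric sample to (approximately) unit Frechet
# margins. The n + 1 denominator keeps u strictly inside (0, 1), so the
# transformed values stay finite.
to_unit_frechet <- function(x) {
  u <- rank(x) / (length(x) + 1)  # pseudo-observations in (0, 1)
  -1 / log(u)                     # unit Frechet quantile transform
}

set.seed(1)
x_raw <- rnorm(1000)              # simulated data, not the Beatenberg temperatures
z <- to_unit_frechet(x_raw)
summary(z)
```

The transform is monotone increasing, so it preserves the ordering of the raw observations while giving them a common heavy-tailed scale.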
Castro, D. & de Carvalho, M. (2017) Spectral density regression for bivariate extremes. Stochastic Environmental Research and Risk Assessment, 31, 1603-1613.
de Carvalho, M., Oumow, B., Segers, J. and Warchol, M. (2013) A Euclidean likelihood estimator for bivariate tail dependence. Communications in Statistics—Theory and Methods, 42, 1176-1192.
de Carvalho, M. & Davison, A. C. (2014) Spectral density ratio models for multivariate extremes. Journal of the American Statistical Association, 109, 764-776.
Ferrez, J., Davison, A. C. and Rebetez, M. (2011) Extreme temperature analysis under forest cover compared to an open field. Agricultural and Forest Meteorology, 151, 992-1001.
## de Carvalho et al (2013, Fig. 5)
data(beatenberg)
attach(beatenberg)
plot(x, y, log = "xy", pch = 20,
     xlab = "Forest Cover", ylab = "Open Field")

## Not run:
## install package extremis if not installed
if (!require("extremis")) install.packages("extremis")
## de Carvalho et al (2013, Fig. 7)
data(beatenberg)
fit <- bev.kernel(beatenberg, tau = 0.98, nu = 163, raw = FALSE)
plot(fit)
rug(fit$w)
## End(Not run)
The data consist of 267 polls conducted before the June 23 2016 EU referendum, which took place in the UK.
brexit
A data frame with 272 observations on six variables.
leave, stay, undecided: percentage in favor of each option.
date: date on which the poll was conducted.
pollster: institution conducting the poll.
size: number of polled subjects.
Financial Times (FT) Brexit poll tracker.
de Carvalho, M. and Martos, G. (2020). Brexit: Tracking and disentangling the sentiment towards leaving the EU. International Journal of Forecasting, 36, 1128-1137.
## Leave-stay plot (de Carvalho and Martos, 2018; Fig. 1)
data(brexit)
attach(brexit)
oldpar <- par(pty = "s")
plot(leave[(leave > stay)], stay[(leave > stay)],
     xlim = c(22, 66), ylim = c(22, 66), pch = 16, col = "red",
     xlab = "Leave", ylab = "Stay")
points(leave[(stay > leave)], stay[(stay > leave)],
       pch = 16, col = "blue")
points(leave[(stay == leave)], stay[(stay == leave)], pch = 24)
abline(a = 0, b = 1, lwd = 3)
par(oldpar)
Data on 23 flights of the space shuttle Challenger prior to the 1986 accident, wherein the shuttle blew up during takeoff.
challenger
A data frame with 23 observations on two variables, namely O-ring temperature (ºF) and oring state (1 = failure; 0 = success).
de Carvalho, M. (2012) A Generalization of the Solis-Wets method. Journal of Statistical Planning and Inference, 142, 633-644.
data(challenger)
attach(challenger)
boxplot(temperature ~ oring, xlab = "Failure", ylab = "Temperature")
Weekly number (in thousands) of unemployment insurance claims in the US from 7 Jan 1967 until 28 Nov 2009.
claims
A time series with 515 observations; the object is of class tis (time-indexed series).
United States Department of Labor—Employment & Training Administration.
de Carvalho, M., Turkman, K. F. and Rua, A. (2013) Dynamic threshold modelling and the US business cycle. Journal of the Royal Statistical Society, Ser. C, 62, 535-550.
https://www.maths.ed.ac.uk/~mdecarv/decarvalho2013ash.html
## de Carvalho et al (2013; Fig 1)
data(claims)
plot(time(claims), claims, type = "l",
     xlab = "Time", ylab = "Initial Claims (in Thousands)")
Axial brain slices gathered via magnetic resonance images (MRI) with 500 points on each outline, for 30 schizophrenia patients and 38 healthy controls.
cortical
The cortical list has the following variables:
cortical$age: age, in years.
cortical$group: control patient (Con) or schizophrenia patient (Scz).
cortical$sex: male (1) or female (2).
cortical$symm: symmetry score obtained from raw 3D brain surface.
cortical$x and cortical$y: x, y coordinates of the slice of the brain surface that intersects the AC (anterior commissure) and PC (posterior commissure).
cortical$r: 500 radii from angular polar coordinates.
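To make the polar representation concrete, the following sketch converts a vector of radii to Cartesian coordinates on the same equally spaced angular grid that the package example for this dataset uses, 2 * pi * (1:m) / m. The radii here are made up (a unit circle) and the names `r_toy`, `x_toy`, and `y_toy` are hypothetical; with the package loaded, one column of cortical$r would presumably take the place of `r_toy`.

```r
m <- 500
theta <- 2 * pi * (1:m) / m   # equally spaced angles, as in the package example
r_toy <- rep(1, m)            # made-up radii: a unit circle, not a brain outline

# Polar-to-Cartesian conversion
x_toy <- r_toy * cos(theta)
y_toy <- r_toy * sin(theta)

plot(x_toy, y_toy, type = "l", asp = 1)
```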
The data were gathered from a neuroscience study conducted at the University of British Columbia, Canada, and documented in Brignell et al. (2010) and Martos and de Carvalho (2018). Each brain was registered into the so-called Talairach space so that brains can be compared on the same three-dimensional referential coordinate space.
Brignell, C.J., Dryden, I.L., Gattone, S.A., Park, B., Leask, S., Browne, W.J. and Flynn, S. (2010) Surface shape analysis, with an application to brain surface asymmetry in schizophrenia. Biostatistics, 11, 609-630.
Martos, G. & de Carvalho, M. (2018) Discrimination surfaces with application to region-specific brain asymmetry analysis. Statistics in Medicine, 37, 1859-1873.
## Martos and de Carvalho (2018; Fig 1 a)
library(scales)
data(cortical)
m <- 500
n <- 68
plot(cortical$r[, 1] * cos(2 * pi * 1:m / m),
     cortical$r[, 1] * sin(2 * pi * 1:m / m),
     type = "l", col = alpha("gray", 1 / n),
     xlab = "z", ylab = "x")
for (i in 2:n)
  lines(cortical$r[, i] * cos(2 * pi * 1:m / m),
        cortical$r[, i] * sin(2 * pi * 1:m / m),
        type = "l", col = alpha("gray", i / n))
The diabetes data frame has 286 rows and 3 columns. The data were gathered from a population-based pilot survey of diabetes in Cairo, Egypt, in which postprandial blood glucose measurements were obtained from a fingerstick on 286 subjects. Based on the WHO (World Health Organization) criteria, 88 subjects were classified as diseased and 198 as healthy.
diabetes
The diabetes data frame contains the following columns:
marker: postprandial blood glucose measurements (mg/dl) obtained from a fingerstick.
status: disease status, with 1 identifying subjects diagnosed with diabetes.
age: age in years.
Inácio de Carvalho, V., de Carvalho, M. and Branscum, A. (2017) Nonparametric Bayesian covariate-adjusted estimation of the Youden index. Biometrics, 73, 1279-1288.
Inácio de Carvalho, V., Jara, A., Hanson, T. E. and de Carvalho, M. (2013) Bayesian nonparametric ROC regression modeling. Bayesian Analysis, 8, 623-646.
data(diabetes)
plot(diabetes, pch = 20, main = "Diabetes Data")
The Danish Fire Insurance Claims Database includes 2167 industrial fire losses gathered from the Copenhagen Reinsurance Company over the period 1980-1990.
fire
A data frame with 2167 observations on five variables, namely:
Positions: date.
building: loss to buildings.
contents: loss to contents.
profits: loss to profits.
total: total loss.
de Carvalho, M. & Marques, F. (2012) Jackknife Euclidean likelihood-based inference for Spearman's rho. North American Actuarial Journal, 16, 487-492.
https://www.maths.ed.ac.uk/~mdecarv/decarvalho2012bsh.html
data(fire)
attach(fire)
plot(building, contents, pch = 20,
     xlim = c(0, 95), ylim = c(0, 133),
     xlab = "Loss of Building", ylab = "Loss of Contents",
     main = "Danish Fire Insurance Claims")

## Not run:
## Confidence intervals for Spearman rho; install the package
## spearmanCI, if not installed
if (!require("spearmanCI")) install.packages("spearmanCI")
spearmanCI(building, contents)
## End(Not run)
US GDP (Gross Domestic Product) ranging from 1950 (Q1) to 2009 (Q4).
GDP
A time series with 268 observations on two variables. The object is of class ts.
de Carvalho, M., Rodrigues, P. and Rua, A. (2012) Tracking the US business cycle with a singular spectrum analysis. Economics Letters, 114, 32-35.
de Carvalho, M. and Rua, A. (2017) Real-time nowcasting the US output gap: Singular spectrum analysis at work. International Journal of Forecasting, 33, 185-198.
https://www.maths.ed.ac.uk/~mdecarv/decarvalho2012dsh.html
data(GDP)
plot(GDP, ylab = "Gross Domestic Product")

## Not run:
if (!require("ASSA")) install.packages("ASSA")
data(GDP)
fit <- bssa(log(GDP[, 1]))
plot(fit)
print(fit)
## End(Not run)
US GDP (Gross Domestic Product) and IP (Industrial Production) ranging from 1947 (Q1) to 2013 (Q4); the data correspond to a real-time vintage.
GDPIP
A bivariate time series with 268 observations on two variables: GDP and IP. The object is of class mts.
Federal Reserve Bank of Philadelphia.
de Carvalho, M. and Rua, A. (2017). Real-time nowcasting the US output gap: Singular spectrum analysis at work. International Journal of Forecasting, 33, 185-198.
https://www.maths.ed.ac.uk/~mdecarv/decarvalho2017sh.html
data(GDPIP)
plot(GDPIP)

## Plotting GDP against IP (de Carvalho and Rua, 2017; Fig. 4)
data(GDPIP)
oldpar <- par(mar = c(5, 4, 4, 5) + .1)
plot(GDPIP[, 1], type = "l", xlab = "Time",
     ylab = "Gross Domestic Product (GDP)", lwd = 3, col = "red",
     cex.lab = 1.4, cex.axis = 1.4)
par(new = TRUE)
plot(GDPIP[, 2], type = "l", xaxt = "n", yaxt = "n",
     xlab = "", ylab = "", lwd = 3, col = "blue", cex.axis = 1.4)
axis(4)
mtext("Industrial Production (IP)", side = 4, line = 3, cex = 1.4)
legend("topleft", col = c("red", "blue"), lty = 1, lwd = 3,
       legend = c("GDP", "IP"))
par(oldpar)

## Not run:
## Tracking the US Business Cycle (de Carvalho et al, 2017; Fig. 6)
## Install the package ASSA, if not installed
if (!require("ASSA")) install.packages("ASSA")
data(GDPIP)
fit <- bmssa(log(GDPIP))
plot(fit)
print(fit)
## End(Not run)
Prices at close from 26 selected stocks from the London stock exchange from 1989 to 2016.
lse
The lse data frame has 6894 rows and 27 columns.
Rubio, R., de Carvalho, M. and Huser, R. (2018) Similarity-based clustering of extreme losses from the London stock exchange.
The lungcancer data frame has 241 rows and 3 columns. The data were gathered from a case-control study, conducted at the Mayo Clinic in Rochester (Minnesota), which included 140 controls and 101 lung cancer cases; only women were enrolled in the study.
lungcancer
This data frame contains the following columns:
marker: square root of sEGFR levels (soluble isoform of the epidermal growth factor receptor).
status: disease status, with 1 identifying lung cancer cases and 0 identifying controls.
pre: premenopausal indicator, with 1 identifying premenopausal women.
age: age in years.
Inácio de Carvalho, V., Jara, A. and de Carvalho, M. (2015) Bayesian nonparametric approaches for ROC curve inference. In: Nonparametric Bayesian Methods in Biostatistics and Bioinformatics. Eds R. Mitra and P. Mueller. Cham: Springer.
Rainfall data from Madeira, Portugal, from January 1973 to June 2018.
madeira
The madeira data frame has 544 observations and 8 columns:
yearmonth: year and month.
prec: total monthly precipitation (in 0.01 inches).
amo: Atlantic multi-decadal oscillation.
nino34: El Niño-southern oscillation (ENSO), expressed by the NINO34 index.
np: North Pacific index (NPI).
pdo: Pacific decadal oscillation (PDO).
soi: Southern oscillation index (SOI).
nao: North Atlantic oscillation (NAO).
After eliminating the dry events (i.e., zero precipitation) and the missing precipitation data (two observations), one is left with a total of 532 observations; that is the version of the data analyzed in de Carvalho et al. (2022).
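The filtering step described above (dropping dry months and missing precipitation values) can be sketched as follows. The data frame `toy` is a made-up stand-in with invented values and an assumed yearmonth format; only the column names yearmonth and prec come from the documentation.

```r
# Toy stand-in for the madeira data frame (values are made up)
toy <- data.frame(
  yearmonth = c("1973-01", "1973-02", "1973-03", "1973-04", "1973-05"),
  prec      = c(120, 0, NA, 45, 0)
)

# Keep months with recorded, strictly positive precipitation
wet <- subset(toy, !is.na(prec) & prec > 0)
nrow(wet)
```

With the package loaded, applying the same condition to madeira would, by the description above, be expected to leave 532 rows.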
National Oceanic and Atmospheric Administration.
de Carvalho, M., Pereira, S., Pereira, S. and de Zea Bermudez, P. (2022, in press). An extreme value Bayesian lasso for the conditional left and right tails. Journal of Agricultural, Biological and Environmental Statistics.
Daily quotations at close of the NASDAQ and NYSE stock market indices from February 1971 till November 2021.
marketsUS
The marketsUS data frame has 12562 rows and 3 columns: date and quotation at close of the nasdaq and nyse indices.
de Carvalho, M., Kumukova, A. and dos Reis, G. (2022) Regression-type analysis for multivariate extreme values. Submitted.
## Not run:
## de Carvalho et al (2022; Fig 5.1)
data(marketsUS)
packages <- c("scales", "ggplot2")
sapply(packages, require, character.only = TRUE)
ggplot(data = marketsUS, aes(x = date, y = value, color = Indices)) +
  geom_line(aes(y = nasdaq, col = "NASDAQ"), alpha = 0.5,
            position = position_dodge(0.8), size = 1.1) +
  geom_line(aes(y = nyse, col = "NYSE"), alpha = 0.5,
            position = position_dodge(0.8), size = 1.1) +
  scale_y_continuous(breaks = seq(2000, 14000, by = 2000)) +
  scale_x_date(labels = date_format("%Y"),
               breaks = as.Date(c("1971-01-01", "1978-01-01",
                                  "1985-01-01", "1992-01-01",
                                  "1999-01-01", "2006-01-01",
                                  "2013-01-01", "2020-01-01"))) +
  scale_color_manual(values = c("red", "blue")) +
  labs(y = "Value (in USD)", x = "Time (in Years)")
## End(Not run)
Raw interval data series corresponding to weekly minimum and maximum values of the MERVAL index (Argentina stock market) ranging from January 1 2016 to September 30 2020, along with prices at open and prices at close.
merval
A data frame with 353 observations and 5 columns: dates, low, high, open, and close.
Yahoo Finance.
de Carvalho, M. and Martos, G. (2022). Modeling interval trendlines: Symbolic singular spectrum analysis for interval time series. Journal of Forecasting, 41, 167-180.
data(merval)
attach(merval)
head(merval, 3)
oldpar <- par(pty = 's')
plot(low, high, pch = 20)
abline(a = 0, b = 1, lty = 2, col = "gray")
par(oldpar)
The metsynd data include Gamma-Glutamyl Transferase (GGT) levels and curves of arterial oxygen saturation for samples of women with and without metabolic syndrome. The data were gathered from a population-based survey conducted in Galicia (NW Spain) and include 35 women suffering from metabolic syndrome and 80 women without metabolic syndrome.
metsynd
The data consist of a list with the following elements:
y0: GGT levels for women without metabolic syndrome.
y1: GGT levels for women suffering from metabolic syndrome.
X0: curves of arterial oxygen saturation (%) for women without metabolic syndrome (X0$data, X0$time).
X1: curves of arterial oxygen saturation (%) for women suffering from metabolic syndrome (X1$data, X1$time).
The curves of arterial oxygen saturation are stored in the matrices X0$data and X1$data, with each row representing a patient and the columns representing ordered measurements over time. Here X0$time and X1$time represent the times (in hours) at which measurements were made, i.e., every 20 seconds during three hours of sleep. Further details on these data can be found in the references below.
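A small arithmetic sketch of the sampling scheme just described: one measurement every 20 seconds over three hours gives 540 time points per curve. The variable names below are hypothetical, and whether the stored grid starts at zero or at the first 20-second mark is an assumption, so this grid need not match X0$time or X1$time exactly.

```r
# Assumed sampling scheme: one measurement every 20 seconds over three hours
step_sec  <- 20
total_sec <- 3 * 60 * 60
n_points  <- total_sec / step_sec   # number of measurements per curve

# Hypothetical time grid in hours, starting at the first 20-second mark
time_hours <- seq(step_sec, total_sec, by = step_sec) / 3600

n_points
range(time_hours)
```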
Inácio de Carvalho, V., de Carvalho, M., Alonzo, T. A. and González-Manteiga, W. (2016) Functional covariate-adjusted partial area under the specificity-ROC curve regression with an application to metabolic syndrome case study. Annals of Applied Statistics, 10, 1472-1495.
data(metsynd)
library(scales)
attach(metsynd)
## Inacio de Carvalho et al (2016; Fig 1)
oldpar <- par(mfrow = c(1, 2))
n0 <- length(y0)
n1 <- length(y1)
t <- X1$time
plot(t, X1$data[1, ], type = "l", lwd = 3, ylim = c(70, 100),
     xlab = "Time (in hours)", ylab = "Arterial oxygen saturation (%)",
     main = "Metabolic syndrome")
for (i in 2:n1)
  lines(t, X1$data[i, ], type = "l", lwd = 3,
        col = alpha("black", i / n1))
plot(t, X0$data[1, ], type = "l", lwd = 3, col = "gray",
     ylim = c(70, 100),
     xlab = "Time (in hours)", ylab = "Arterial oxygen saturation (%)",
     main = "No metabolic syndrome")
for (i in 1:n0)
  lines(t, X0$data[i, ], type = "l", lwd = 3,
        col = alpha("gray", i / n0))
par(oldpar)
Monthly number of passengers (in thousands) in a group of several international airline companies from January 1949 to December 1960.
passengers
A time series with 144 observations; the object is of class ts.
Brown, R.G. (1963) Smoothing, Forecasting and Prediction of Discrete Time Series. New Jersey: Prentice-Hall.
Rodrigues, P. C. and de Carvalho, M. (2013) Spectral modeling of time series with missing data. Applied Mathematical Modelling, 37, 4676-4684.
Longitudinal measurements of two Prostate Specific Antigen (PSA)-based biomarkers for 71 prostate cancer cases and 70 controls.
psa
The psa data frame has 683 rows and 6 columns:
id: patient id.
marker1: total PSA.
marker2: ratio of free to total PSA.
status: disease status of each subject, with 1 identifying subjects diagnosed with prostate cancer.
age: age in years.
t: time prior to diagnosis.
The data were gathered from the Beta-Carotene and Retinol Efficacy Trial (CARET), a lung cancer prevention trial conducted at the Fred Hutchinson Cancer Research Center. Further details on this study can be found in de Carvalho et al. (2020).
de Carvalho, M., Barney, B. and Page, G. L. (2020) Affinity-based measures of biomarker performance evaluation. Statistical Methods in Medical Research, 20, 837-853.
The data consist of average daily air temperatures in Santiago (Chile), in degrees Fahrenheit and rounded to the nearest integer, from April 1990 to March 2017.
santiago
A data frame with 10126 observations on one variable.
NOAA's National Centers for Environmental Information (NCEI).
Galasso, B., Zemel, Y. and de Carvalho, M. (2022). Bayesian semiparametric modelling of phase-varying point processes. Electronic Journal of Statistics, 16, 2518-2549.
Daily Standard and Poor’s index at close from 1988 till 2007.
sp500
The sp500 data frame has 5043 rows and 2 columns: date and price at close.
de Carvalho, M. (2016) Statistics of extremes: Challenges and opportunities. In: Handbook of EVT and its Applications to Finance and Insurance. Ed. F. Longin. Hoboken: Wiley.
Completion times in seconds for TMT (Trail Making Test), part A, for 245 patients with Parkinson's disease, along with the corresponding diagnosis of cognitive impairment.
tmt
The tmt data frame has 245 rows and 2 columns:
marker: completion times (in seconds).
status: disease status of each subject, with 1, 2, and 3 respectively denoting patients diagnosed as unimpaired, with mild cognitive impairment, and with dementia.
Inácio de Carvalho, V., de Carvalho, M., and Branscum, A. (2018) Bayesian bootstrap inference for the ROC surface. Stat, 7, e211.
US monthly unemployment rate from January 1967 to November 2009; the 515 monthly observations are seasonally adjusted.
unemployment
A time series with 515 observations; the object is of class ts.
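The stated length can be checked against the stated date range: a monthly series starting in January 1967 with 515 observations ends in November 2009. The sketch below builds a placeholder ts object with that time index; the values are NA, the name `u_toy` is hypothetical, and the start and frequency are taken from the description above rather than read from the package object.

```r
# Placeholder values; only the time index matters here
u_toy <- ts(rep(NA_real_, 515), start = c(1967, 1), frequency = 12)

start(u_toy)  # first (year, month) of the index
end(u_toy)    # last (year, month) of the index
```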
Bureau of Labor Statistics.
de Carvalho, M., Turkman, K. F. and Rua, A. (2013) Dynamic threshold modelling and the US business cycle. Journal of the Royal Statistical Society, Ser. C, 62, 535-550.
https://www.maths.ed.ac.uk/~mdecarv/decarvalho2013ash.html
## de Carvalho et al (2013; Fig. 1)
data(unemployment)
plot(unemployment, xlab = "Time", ylab = "Unemployment Rate")