Package 'DATAstudio' reference manual

Title:	The Research Data Warehouse of Miguel de Carvalho
Description:	Pulls together a collection of datasets from Miguel de Carvalho's research articles, including, for example: - de Carvalho (2012) <doi:10.1016/j.jspi.2011.08.016>; - de Carvalho et al (2012) <doi:10.1080/03610926.2012.709905>; - de Carvalho et al (2012) <doi:10.1016/j.econlet.2011.09.007>; - de Carvalho and Davison (2014) <doi:10.1080/01621459.2013.872651>; - de Carvalho and Rua (2017) <doi:10.1016/j.ijforecast.2015.09.004>.
Authors:	Miguel de Carvalho [aut, cre]
Maintainer:	Miguel de Carvalho <[email protected]>
License:	GPL (>= 3)
Version:	1.1
Built:	2025-01-29 04:54:10 UTC
Source:	https://github.com/cran/DATAstudio

The Research Data Warehouse of Miguel de Carvalho

Description

DATAstudio is an add-on tool for R that pulls together a collection of datasets used in Miguel de Carvalho's research. For a complete list of datasets and documentation, type help.start() and follow the link to DATAstudio on the Package Index.

Funding

Fundação para a Ciência e a Tecnologia (Portuguese NSF) grants:

PTDC/MAT-STA/28649/2017.
UID/MAT/00006/2019.

Author(s)

Miguel de Carvalho; School of Mathematics, University of Edinburgh.

Swiss Alps Temperature Data

Description

The alps data consist of daily winter temperature minima and maxima measured at 2m above ground surface at two sites in the Swiss Alps: Montana and Zermatt.

Usage

alps

Format

The alps data frame contains the following columns:

date: date of measurements.
min_montana, min_zermatt: daily minimum temperature in ºC at Montana and Zermatt.
max_montana, max_zermatt: daily maximum temperature in ºC at Montana and Zermatt.

Source

MeteoSwiss

References

Mhalla, L., de Carvalho, M., and Chavez-Demoulin, V. (2019) Regression type models for extremal dependence. Scandinavian Journal of Statistics, 46, 1141-1167.

Examples

## visualizing the data
data(alps)
oldpar <- par(pty = 's', mfrow = c(1, 2))
plot(alps$min_montana, alps$min_zermatt, pch = 20, 
     xlab = "Montana", ylab = "Zermatt", main = "Daily Minimum")
plot(alps$max_montana, alps$max_zermatt, pch = 20, 
     xlab = "Montana", ylab = "Zermatt", main = "Daily Maximum")
par(oldpar)

oldpar <- par(pty = 's', mfrow = c(1, 2))
plot(alps$min_montana, alps$max_montana, pch = 20, 
     xlab = "Minimum", ylab = "Maximum", main = "Montana")
abline(a = 0, b = 1, col = "red", lty = 2)
plot(alps$min_zermatt, alps$max_zermatt, pch = 20, 
     xlab = "Minimum", ylab = "Maximum", main = "Zermatt")
abline(a = 0, b = 1, col = "red", lty = 2)
par(oldpar)

## to download the NAO daily index in Mhalla et al (2019) use
## the R package data.table to access NOAA via ftp 
link <- "ftp://ftp.cdc.noaa.gov/Public/gbates/teleconn/nao.reanalysis.t10trunc.1948-present.txt"
NAO.daily <- data.table::fread(link)
NAO.daily <- data.frame(NAO.daily)
colnames(NAO.daily) <- c("year", "month", "day", "NAO")

Beatenberg Forest Temperature Data (In Unit Fréchet Scale)

Description

Preprocessed pairs of temperatures in unit Fréchet scale from Beatenberg forest, registered under forest cover and in the open field.

Usage

beatenberg

Format

The beatenberg data frame has 2839 rows and 2 columns: x (forest cover) and y (open field).

Details

Preprocessing is conducted as described in Ferrez et al (2011), and for applications of this dataset within the context of extreme value theory see de Carvalho et al. (2013), de Carvalho and Davison (2014) as well as Castro and de Carvalho (2017).

References

Castro, D. & de Carvalho, M. (2017) Spectral density regression for bivariate extremes. Stochastic Environmental Research and Risk Assessment, 31, 1603-1613.

de Carvalho, M., Oumow, B., Segers, J. and Warchol, M. (2013) A Euclidean likelihood estimator for bivariate tail dependence. Communications in Statistics—Theory and Methods, 42, 1176-1192.

de Carvalho, M. & Davison, A. C. (2014) Spectral density ratio models for multivariate extremes. Journal of the American Statistical Association, 109, 764-776.

Ferrez, J., Davison, A. C. and Rebetez, M. (2011) Extreme temperature analysis under forest cover compared to an open field. Agricultural and Forest Meteorology, 151, 992-1001.

Examples

## de Carvalho et al (2013, Fig. 5)
data(beatenberg)
attach(beatenberg)
plot(x, y, log = "xy", pch = 20, xlab = "Forest Cover", ylab = "Open Field")

## Not run: 
## install package extremis if not installed
if (!require("extremis")) install.packages("extremis")

## de Carvalho et al (2013, Fig. 7)
data(beatenberg)
fit <- bev.kernel(beatenberg, tau = 0.98, nu = 163, raw = FALSE)
plot(fit)
rug(fit$w)

## End(Not run)

Brexit Poll Tracker

Description

The data consist of 267 polls conducted before the June 23 2016 EU referendum, which took place in the UK.

Usage

brexit

Format

A dataframe with 272 observations on six variables.

leave, stay, undecided: percentage in favor of each option.
date: date on which the poll was conducted.
pollster: institution conducting the poll.
size: number of polled subjects.

Source

Financial Times (FT) Brexit poll tracker.

References

de Carvalho, M. and Martos, G. (2020). Brexit: Tracking and disentangling the sentiment towards leaving the EU. International Journal of Forecasting, 36, 1128-1137.

Examples

## Leave-stay plot (de Carvalho and Martos, 2020; Fig. 1)
data(brexit)
attach(brexit)
oldpar <- par(pty = "s")
plot(leave[(leave > stay)], stay[(leave > stay)],
     xlim = c(22, 66), ylim = c(22, 66), pch = 16, col = "red",
     xlab = "Leave", ylab = "Stay")
points(leave[(stay > leave)], stay[(stay > leave)],
       pch = 16, col = "blue")
points(leave[(stay == leave)], stay[(stay == leave)],
       pch = 24)
abline(a = 0, b = 1, lwd = 3)
par(oldpar)

Space Shuttle Challenger Data

Description

Data on 23 flights of the space shuttle Challenger prior to the 1986 accident, wherein the shuttle blew up during takeoff.

Usage

challenger

Format

A dataframe with 23 observations on two variables, namely O-ring temperature (ºF) and oring state (1 = failure; 0 = success).

References

de Carvalho, M. (2012) A Generalization of the Solis-Wets method. Journal of Statistical Planning and Inference, 142, 633-644.

Examples

data(challenger)
attach(challenger)
boxplot(temperature ~ oring, xlab = "Failure", ylab = "Temperature")

Initial Claims of Unemployment

Description

Weekly number (in thousands) of unemployment insurance claims in the US from 7 Jan 1967 until 28 Nov 2009.

Usage

claims

Format

A time series with 515 observations; the object is of class tis (time-indexed series).

Source

United States Department of Labor—Employment & Training Administration.

References

de Carvalho, M., Turkman, K. F. and Rua, A. (2013) Dynamic threshold modelling and the US business cycle. Journal of the Royal Statistical Society, Ser. C, 62, 535-550.

Examples

## de Carvalho et al (2013; Fig 1)
data(claims)
plot(time(claims), claims, type = "l",
     xlab = "Time", ylab = "Initial Claims (in Thousands)")

Brain Shape Data

Description

Axial brain slices gathered via magnetic resonance images (MRI) with 500 points on each outline, for 30 schizophrenia patients and 38 healthy controls.

Usage

cortical

Format

The cortical list has the following variables:

cortical$age: age, in years.
cortical$group: control patient (Con) or schizophrenia patient (Scz).
cortical$sex: male (1) or female (2).
cortical$symm: symmetry score obtained from raw 3D brain surface.
cortical$x and cortical$y: x, y coordinates of the slice from the brain surface that intersects the AC (anterior commissure) and PC (posterior commissure).
cortical$r: 500 radii from angular polar coordinates.

Details

The data were gathered from a neuroscience study conducted at the University of British Columbia, Canada, and documented in Brignell et al. (2010) and Martos and de Carvalho (2018). Each brain was registered into the so-called Talairach space so that brains can be compared on the same three-dimensional referential coordinate space.

References

Brignell, C.J., Dryden, I.L., Gattone, S.A., Park, B., Leask, S., Browne, W.J. and Flynn, S. (2010) Surface shape analysis, with an application to brain surface asymmetry in schizophrenia. Biostatistics, 11, 609-630.

Martos, G. & de Carvalho, M. (2018) Discrimination surfaces with application to region-specific brain asymmetry analysis. Statistics in Medicine, 37, 1859-1873.

Examples

  ## Martos and de Carvalho (2018; Fig 1 a)
  library(scales)
  data(cortical)
  m <- 500  
  n <- 68
  plot(cortical$r[,1] * cos(2 * pi * 1:m / m),
       cortical$r[,1] * sin(2 * pi * 1:m / m) , type = "l",
       col = alpha("gray", 1 / n), xlab = "z", ylab = "x")
  for(i in 2:n) 
  lines(cortical$r[, i] * cos(2 * pi * 1:m / m),
        cortical$r[, i] * sin(2 * pi * 1:m / m), type = "l",
        col = alpha("gray", i / n))

Diabetes Diagnosis Data

Description

The diabetes data frame has 286 rows and 3 columns. The data were gathered from a population-based pilot survey of diabetes in Cairo, Egypt, in which postprandial blood glucose measurements were obtained from a fingerstick on 286 subjects. Based on the WHO (World Health Organization) criteria, 88 subjects were classified as diseased and 198 as healthy.

Usage

diabetes

Format

The diabetes data frame contains the following columns:

marker: postprandial blood glucose measurements (mg/dl) obtained from a fingerstick.
status: disease status, with 1 identifying subjects diagnosed with diabetes.
age: age in years.

References

Inácio de Carvalho, V., de Carvalho, M. and Branscum, A. (2017) Nonparametric Bayesian covariate-adjusted estimation of the Youden index. Biometrics, 73, 1279-1288.

Inácio de Carvalho, V., Jara, A., Hanson, T. E. and de Carvalho, M. (2013) Bayesian nonparametric ROC regression modeling. Bayesian Analysis, 8, 623-646.

Examples

data(diabetes)
plot(diabetes, pch = 20, main = "Diabetes Data")

Danish Fire Insurance Claims Database

Description

The Danish Fire Insurance Claims Database includes 2167 industrial fire losses gathered from the Copenhagen Reinsurance Company over the period 1980-1990.

Usage

fire

Format

A dataframe with 2167 observations on five variables, namely:

Positions: date.
building: loss to buildings.
content: loss to content.
profits: loss to profits.
total: total loss.

References

de Carvalho, M. & Marques, F. (2012) Jackknife Euclidean likelihood-based inference for Spearman's rho. North American Actuarial Journal, 16, 487-492.

Examples

data(fire)
attach(fire)
plot(building, contents, pch = 20, xlim = c(0, 95), ylim = c(0, 133),
     xlab = "Loss of Building", ylab = "Loss of Contents",
     main = "Danish Fire Insurance Claims")

## Not run: 
## Confidence intervals for Spearman rho; install the package
## spearmanCI, if not installed
if (!require("spearmanCI")) install.packages("spearmanCI")
spearmanCI(building, contents)

## End(Not run)

GDP of the US Economy

Description

US GDP (Gross Domestic Product) ranging from 1950 (Q1) to 2009 (Q4).

Usage

GDP

Format

A time series with 268 observations on two variables. The object is of class ts.

Source

de Carvalho, M., Rodrigues, P. and Rua, A. (2012) Tracking the US business cycle with a singular spectrum analysis. Economics Letters, 114, 32-35.

References

de Carvalho, M. and Rua, A. (2017) Real-time nowcasting the US output gap: Singular spectrum analysis at work. International Journal of Forecasting, 33, 185-198.

Examples

data(GDP)
plot(GDP, ylab = "Gross Domestic Product")

## Not run: 
if (!require("ASSA")) install.packages("ASSA")
data(GDP)
fit <- bssa(log(GDP[, 1]))
plot(fit)
print(fit)

## End(Not run)

A Real-time Vintage of GDP and IP for the US Economy

Description

US GDP (Gross Domestic Product) and IP (Industrial Production) ranging from 1947 (Q1) to 2013 (Q4); the data correspond to a real-time vintage.

Usage

GDPIP

Format

A bivariate time series with 268 observations on two variables: GDP and IP. The object is of class mts.

Source

Federal Reserve Bank of Philadelphia.

References

de Carvalho, M. and Rua, A. (2017). Real-time nowcasting the US output gap: Singular spectrum analysis at work. International Journal of Forecasting, 33, 185-198.

Examples

data(GDPIP)
plot(GDPIP)

## Plotting GDP against IP (de Carvalho and Rua, 2017; Fig. 4)
data(GDPIP)
oldpar <- par(mar = c(5, 4, 4, 5) + .1)
plot(GDPIP[, 1], type = "l", 
     xlab = "Time", ylab = "Gross Domestic Product (GDP)",
     lwd = 3, col = "red", cex.lab = 1.4, cex.axis = 1.4)
par(new = TRUE)
plot(GDPIP[, 2], type = "l", xaxt = "n", yaxt = "n",
     xlab = "", ylab = "", lwd = 3, col = "blue", cex.axis = 1.4)
axis(4)
mtext("Industrial Production (IP)", side = 4, line = 3, cex = 1.4)
legend("topleft", col = c("red", "blue"),
       lty = 1, lwd = 3, legend = c("GDP", "IP"))
par(oldpar)

## Not run: 
    ## Tracking the US Business Cycle (de Carvalho et al, 2017; Fig. 6)
    ## Install the package ASSA, if not installed
    if (!require("ASSA")) install.packages("ASSA")
    data(GDPIP)
    fit <- bmssa(log(GDPIP))
    plot(fit)
    print(fit)

## End(Not run)

Selected Stocks from the London Stock Exchange

Description

Prices at close from 26 selected stocks from the London stock exchange from 1989 to 2016.

Usage

lse

Format

The lse data frame has 6894 rows and 27 columns.

References

Rubio, R., de Carvalho, M., and Huser (2018) Similarity-based clustering of extreme losses from the London stock exchange.

Lung Cancer Diagnosis

Description

The lungcancer data frame has 241 rows and 3 columns. The data were gathered from a case-control study, conducted at the Mayo Clinic in Rochester (Minnesota), which included 140 controls and 101 lung cancer cases; only women were enrolled in the study.

Usage

lungcancer

Format

This data frame contains the following columns:

marker: square root of sEGFR levels (soluble isoform of the epidermal growth factor receptor).
status: disease status, with 1 identifying lung cancer cases and 0 identifying controls.
pre: premenopausal indicator, with 1 identifying premenopausal women.
age: age in years.
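
Because marker is stored on the square-root scale, sEGFR levels on the original scale are recovered by squaring. The following is a minimal sketch of that back-transformation followed by a per-group summary; the toy data frame and the column name segfr are hypothetical and only mimic the documented marker and status columns.

```r
## Toy stand-in with the documented column names (values are made up)
toy <- data.frame(marker = c(4, 5, 6, 3),
                  status = c(0, 0, 1, 1))

## marker is the square root of sEGFR, so squaring recovers the original scale
toy$segfr <- toy$marker^2

## Mean sEGFR level for controls (0) and cases (1)
by_status <- tapply(toy$segfr, toy$status, mean)
by_status

## With the package loaded, the same steps would start from
## data(lungcancer); lungcancer$marker^2
```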

References

Inácio de Carvalho, V., Jara, A. and de Carvalho, M. (2015) Bayesian nonparametric approaches for ROC curve inference. In: Nonparametric Bayesian Methods in Biostatistics and Bioinformatics. Eds R. Mitra and P. Mueller. Cham: Springer.

Rainfall Data from Madeira, Portugal

Description

Rainfall data from Madeira, Portugal, from January 1973 to June 2018.

Usage

madeira

Format

The madeira data frame has 544 observations and 8 columns:

yearmonth: Year and month.
prec: Total monthly precipitation (.01 inches).
amo: Atlantic multi-decadal oscillation.
nino34: El Niño-southern oscillation (ENSO), expressed by NINO34 index.
np: North pacific index (NPI).
pdo: Pacific decadal oscillation (PDO).
soi: Southern oscillation index (SOI).
nao: North atlantic oscillation (NAO).

Details

After eliminating the dry events (i.e., zero precipitation) and the missing precipitation data (two observations), one is left with a total of 532 observations; this is the version of the data analyzed in de Carvalho et al (2022).
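
The filtering step described above can be sketched as follows. This is a minimal sketch, assuming that dry events are coded as prec equal to 0 and missing precipitation as NA; the helper drop_dry_and_missing and the toy data frame are hypothetical and only mimic the documented yearmonth and prec columns.

```r
## Hypothetical helper: keep only rows with observed, strictly positive
## precipitation (assumes dry events are 0 and missing values are NA)
drop_dry_and_missing <- function(df) {
  df[!is.na(df$prec) & df$prec > 0, , drop = FALSE]
}

## Toy stand-in with the documented column names (values are made up)
toy <- data.frame(yearmonth = c("1973-01", "1973-02", "1973-03", "1973-04"),
                  prec = c(120, 0, NA, 35))
wet <- drop_dry_and_missing(toy)
nrow(wet)

## With the package loaded, the same call would be
## data(madeira); nrow(drop_dry_and_missing(madeira))
```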

Source

National Oceanic and Atmospheric Administration.

References

de Carvalho, M., Pereira, S., Pereira, S. and de Zea Bermudez, P. (2022, in press). An extreme value Bayesian lasso for the conditional left and right tails. Journal of Agricultural, Biological and Environmental Statistics.

NASDAQ and NYSE Indices

Description

Daily quotations at close of the NASDAQ and NYSE stock market indices from February 1971 till November 2021.

Usage

marketsUS

Format

The marketsUS data frame has 12562 rows and 3 columns: date and quotation at close of the nasdaq and nyse indices.
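
Index levels such as these are often converted to daily log returns before any tail analysis. The following is a minimal sketch of that transformation; whether the referenced paper uses exactly this preprocessing is an assumption, and the helper log_returns and the toy data frame are hypothetical, mimicking the date, nasdaq and nyse columns.

```r
## Hypothetical helper: log return between consecutive closing values
log_returns <- function(price) diff(log(price))

## Toy stand-in with the documented column names (values are made up)
toy <- data.frame(date = as.Date("2021-11-01") + 0:3,
                  nasdaq = c(100, 110, 99, 99),
                  nyse = c(50, 50, 55, 44))

## One row fewer than the input, since the first day has no previous close
returns <- data.frame(date = toy$date[-1],
                      nasdaq = log_returns(toy$nasdaq),
                      nyse = log_returns(toy$nyse))
returns

## With the package loaded: data(marketsUS); log_returns(marketsUS$nasdaq)
```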

References

de Carvalho, M., Kumukova, A. and dos Reis, G. (2022) Regression-type analysis for multivariate extreme values. Submitted.

Examples

## Not run: 
## de Carvalho et al (2022; Fig 5.1)
data(marketsUS)
packages <- c("scales", "ggplot2")
sapply(packages, require, character.only = TRUE)
ggplot(data = marketsUS, aes(x = date, y = value, color = Indices)) + 
  geom_line(aes(y = nasdaq, col = "NASDAQ"), alpha = 0.5,
                position = position_dodge(0.8), size = 1.1) +
  geom_line(aes(y = nyse, col = "NYSE"), alpha = 0.5,
            position = position_dodge(0.8), size = 1.1) + 
  scale_y_continuous(breaks = seq(2000, 14000, by = 2000)) + 
  scale_x_date(labels = date_format("%Y"), 
               breaks = as.Date(c("1971-01-01", "1978-01-01",
                                  "1985-01-01", "1992-01-01",
                                  "1999-01-01", "2006-01-01",
                                  "2013-01-01", "2020-01-01"))) + 
  scale_color_manual(values = c("red", "blue")) +
  labs(y = "Value (in USD)", x = "Time (in Years)")

## End(Not run)

MERVAL Stock Market Data

Description

Raw interval data series corresponding to weekly minimum and maximum values of the MERVAL index (Argentina stock market) ranging from January 1 2016 to September 30 2020, along with prices at open and prices at close.

Usage

merval

Format

A dataframe with 353 observations and 5 columns: dates, low, high, open, and close.

Source

Yahoo Finance.

References

de Carvalho, M. and Martos, G. (2022). Modeling interval trendlines: Symbolic singular spectrum analysis for interval time series. Journal of Forecasting, 41, 167-180.

Examples

data(merval)
attach(merval)
head(merval, 3)
oldpar <- par(pty = 's')
plot(low, high, pch = 20)
abline(a = 0, b = 1, lty = 2, col = "gray")
par(oldpar)

Metabolic Syndrome Data

Description

The metsynd data includes Gamma-Glutamyl Transferase (GGT) levels and curves of arterial oxygen saturation, for samples of women suffering from metabolic syndrome and women without metabolic syndrome; the data were gathered from a population-based survey conducted in Galicia (NW Spain), and it includes 35 women suffering from metabolic syndrome and 80 women without metabolic syndrome.

Usage

metsynd

Format

The data consist of a list with the following elements:

y0: GGT levels for women without metabolic syndrome.
y1: GGT levels for women suffering from metabolic syndrome.
X0: Curves of arterial oxygen saturation (%) for women without metabolic syndrome (X0$data, X0$time).
X1: Curves of arterial oxygen saturation (%) for women suffering from metabolic syndrome (X1$data, X1$time).

Details

The curves of arterial oxygen saturation are included in the matrices X0$data and X1$data, with each row representing a patient, and with columns representing ordered measurements over time. Here X0$time and X1$time represent the time (in hours) at which measurements were made, i.e., every 20 seconds during three hours of sleep. Further details on these data can be found in the references below.

References

Inácio de Carvalho, V., de Carvalho, M., Alonzo, T. A. and González-Manteiga, W. (2016) Functional covariate-adjusted partial area under the specificity-ROC curve regression with an application to metabolic syndrome case study. Annals of Applied Statistics, 10, 1472-1495.

Examples

data(metsynd)
library(scales)
attach(metsynd)

## Inacio de Carvalho et al (2016; Fig 1)
oldpar <- par(mfrow = c(1,2))
n0 <- length(y0)
n1 <- length(y1)
t <- X1$time
plot(t, X1$data[1, ], type = "l", lwd = 3, ylim = c(70, 100), 
     xlab = "Time (in hours)", ylab = "Arterial oxygen saturation (%)", 
     main = "Metabolic syndrome")
for (i in 2:n1)
  lines(t, X1$data[i, ], type = "l", lwd = 3, col = alpha("black", i / n1))
plot(t, X0$data[1, ], type = "l", lwd = 3, col = "gray", ylim = c(70, 100), 
     xlab = "Time (in hours)", ylab = "Arterial oxygen saturation (%)", 
     main = "No metabolic syndrome")
for (i in 1:n0)
  lines(t, X0$data[i, ], type = "l", lwd = 3, col = alpha("gray", i / n0))
par(oldpar)

International Airline Traffic Data

Description

Monthly number of passengers (in thousands) in a group of several international airline companies from January 1949 to December 1960.

Usage

passengers

Format

A time series with 144 observations; the object is of class ts.
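
A monthly ts object of this kind can be aggregated to annual totals with aggregate(). The sketch below uses AirPassengers from the base R datasets package as a stand-in; that it matches passengers is an assumption, based only on it covering the same period with the same number of monthly observations.

```r
## AirPassengers ships with base R; using it in place of passengers
## is an assumption made for illustration only
x <- AirPassengers

## Monthly series: 12 observations per year over 1949-1960
length(x)
frequency(x)

## Annual totals: sum the 12 monthly values within each year
annual <- aggregate(x, nfrequency = 1, FUN = sum)
annual

## With the package loaded: data(passengers); aggregate(passengers, FUN = sum)
```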

References

Brown, R.G. (1963) Smoothing, Forecasting and Prediction of Discrete Time Series. New Jersey: Prentice-Hall.

Rodrigues, P. C. and de Carvalho, M. (2013) Spectral modeling of time series with missing data. Applied Mathematical Modelling, 37, 4676-4684.

Prostate Cancer Diagnosis Data

Description

Longitudinal measurements of two Prostate Specific Antigen (PSA)-based biomarkers for 71 prostate cancer cases and 70 controls.

Usage

psa

Format

The psa data frame has 683 rows and 6 columns:

id: patient id.
marker1: total PSA.
marker2: ratio of free to total PSA.
status: disease status of each subject, with 1 identifying subjects diagnosed with prostate cancer.
age: age in years.
t: time prior to diagnosis.
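
Since the data are longitudinal, each patient id appears on several rows. The following is a minimal sketch of how one might count measurements per patient and patients per disease status; the toy data frame is hypothetical and only mimics the documented id and status columns.

```r
## Toy stand-in with the documented column names (values are made up)
toy <- data.frame(id = c(1, 1, 1, 2, 2, 3),
                  status = c(1, 1, 1, 0, 0, 1))

## Number of longitudinal measurements recorded for each patient
visits <- table(toy$id)
visits

## One row per patient, then number of controls (0) and cases (1)
subjects <- unique(toy[, c("id", "status")])
per_status <- table(subjects$status)
per_status

## With the package loaded: data(psa); table(psa$id)
```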

Details

The data were gathered from the Beta-Carotene and Retinol Efficacy Trial (CARET), a lung cancer prevention trial conducted at the Fred Hutchinson Cancer Research Center. Further details on this study can be found in de Carvalho et al. (2020).

References

de Carvalho, M., Barney, B. and Page, G. L. (2020) Affinity-based measures of biomarker performance evaluation. Statistical Methods in Medical Research, 20, 837-853.

Santiago Temperature Data

Description

The data consist of average daily air temperatures in Fahrenheit scale—rounded to the nearest integer—of Santiago (Chile) from April 1990 to March 2017.
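
Because the temperatures are recorded in Fahrenheit, a conversion to Celsius may be convenient. The following is a minimal sketch of the standard conversion; the helper fahrenheit_to_celsius and the toy values are hypothetical, and the name of the single column in santiago is not assumed.

```r
## Hypothetical helper: standard Fahrenheit-to-Celsius conversion
fahrenheit_to_celsius <- function(f) (f - 32) * 5 / 9

## Toy temperatures in Fahrenheit (made-up values, not taken from santiago)
temp_f <- c(32, 50, 68, 86)
temp_c <- fahrenheit_to_celsius(temp_f)
temp_c

## With the package loaded, and assuming the single column holds the
## temperatures: data(santiago); fahrenheit_to_celsius(santiago[[1]])
```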

Usage

santiago

Format

A dataframe with 10126 observations on one variable.

Source

NOAA's National Centers for Environmental Information (NCEI).

References

Galasso, B., Zemel, Y. and de Carvalho, M. (2022). Bayesian semiparametric modelling of phase-varying point processes. Electronic Journal of Statistics, 16, 2518-2549.

Standard & Poor 500

Description

Daily Standard and Poor’s index at close from 1988 till 2007.

Usage

sp500

Format

The sp500 data frame has 5043 rows and 2 columns: date and price at close.

References

de Carvalho, M. (2016) Statistics of extremes: Challenges and opportunities. In: Handbook of EVT and its Applications to Finance and Insurance. Ed. F. Longin. Hoboken: Wiley.

Trail Making Test

Description

Completion times in seconds for TMT (Trail Making Test), part A, for 245 patients with Parkinson's disease, along with the corresponding diagnosis of cognitive impairment.

Usage

tmt

Format

The tmt data frame has 245 rows and 2 columns:

marker: completion times (in seconds)
status: disease status of each subject, with 1, 2, and 3 respectively denoting patients diagnosed as unimpaired, mild cognitive impairment, and dementia.
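
The numeric status codes can be turned into a labelled factor before summarizing completion times by group. The following is a minimal sketch; the helper label_status, the column name group, and the toy data frame are hypothetical, and only the coding 1, 2, 3 is taken from the description above.

```r
## Hypothetical helper: map the documented codes 1, 2, 3 to readable labels
label_status <- function(status) {
  factor(status, levels = 1:3,
         labels = c("unimpaired", "mild cognitive impairment", "dementia"))
}

## Toy stand-in with the documented column names (values are made up)
toy <- data.frame(marker = c(35, 48, 60, 95, 41),
                  status = c(1, 2, 2, 3, 1))
toy$group <- label_status(toy$status)

## Number of patients and median completion time in each group
counts <- table(toy$group)
medians <- tapply(toy$marker, toy$group, median)
counts
medians

## With the package loaded: data(tmt); table(label_status(tmt$status))
```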

References

Inácio de Carvalho, V., de Carvalho, M., and Branscum, A. (2018) Bayesian bootstrap inference for the ROC surface. Stat, 7, e211.

US Unemployment Rate

Description

US monthly unemployment rate from January 1967 to November 2009; the 515 monthly observations are seasonally adjusted.

Usage

unemployment

Format

A time series with 515 observations; the object is of class ts.

Source

Bureau of Labor Statistics.

References

de Carvalho, M., Turkman, K. F. and Rua, A. (2013) Dynamic threshold modelling and the US business cycle. Journal of the Royal Statistical Society, Ser. C, 62, 535-550.

Examples

## de Carvalho et al (2013; Fig. 1)
data(unemployment)
plot(unemployment, xlab = "Time", ylab = "Unemployment Rate")
