| Title: | Multiple Imputation with 'MIDAS2' Denoising Autoencoders |
|---|---|
| Description: | Fits 'MIDAS' denoising autoencoder models for multiple imputation of missing data, generates multiply-imputed datasets, computes imputation means, and runs Rubin's rules regression analysis. Wraps the 'MIDAS2' 'Python' engine via a local 'FastAPI' server over 'HTTP', so no 'reticulate' dependency is needed at runtime. Methods are described in Lall and Robinson (2022) <doi:10.1017/pan.2020.49> and Lall and Robinson (2023) <doi:10.18637/jss.v107.i09>. |
| Authors: | Thomas Robinson [aut, cre], Ranjit Lall [aut] |
| Maintainer: | Thomas Robinson <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.1 |
| Built: | 2026-05-16 09:29:20 UTC |
| Source: | https://github.com/cran/rMIDAS2 |
Runs a GLM across all stored imputations and combines the results using Rubin's combination rules for multiple imputation inference.

    combine(
      model_id,
      y,
      ind_vars = NULL,
      dof_adjust = TRUE,
      incl_constant = TRUE,
      ...
    )


| Argument | Description |
|---|---|
| `model_id` | A character model ID, or a fitted model object (list with a `model_id` element). |
| `y` | Character. Name of the outcome variable. |
| `ind_vars` | Character vector of independent variable names, or `NULL` (the default). |
| `dof_adjust` | Logical. Apply the Barnard-Rubin degrees-of-freedom adjustment (default `TRUE`). |
| `incl_constant` | Logical. Include an intercept (default `TRUE`). |
| `...` | Arguments forwarded to … |

A data frame with columns term, estimate, std.error,
statistic, df, and p.value.

    ## Not run:
    df <- data.frame(Y = rnorm(200), X1 = rnorm(200), X2 = rnorm(200))
    df$X1[sample(200, 40)] <- NA
    fit <- midas_fit(df, epochs = 10L)
    midas_transform(fit, m = 10)
    results <- combine(fit, y = "Y")
    results
    ## End(Not run)

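
The pooling step that combine() describes can be illustrated in base R. The sketch below is not the package's implementation: `pool_rubin()` is a hypothetical helper, the imputed datasets come from a crude resampling stand-in rather than a MIDAS model, and `lm()` stands in for the GLM. It applies Rubin's rules to a single coefficient and, when a complete-data degrees of freedom is supplied, the Barnard-Rubin adjustment.

```r
# Hypothetical helper: pool one coefficient across m imputed-data fits.
# est: m point estimates; se: their standard errors.
# df_complete: complete-data residual df (finite => Barnard-Rubin adjustment).
pool_rubin <- function(est, se, df_complete = Inf) {
  m <- length(est)
  q_bar <- mean(est)                  # pooled point estimate
  u_bar <- mean(se^2)                 # within-imputation variance
  b <- var(est)                       # between-imputation variance
  t_var <- u_bar + (1 + 1 / m) * b    # total variance
  lambda <- (1 + 1 / m) * b / t_var   # share of variance due to missingness
  df <- (m - 1) / lambda^2            # classical Rubin degrees of freedom
  if (is.finite(df_complete)) {
    # Barnard-Rubin small-sample adjustment
    df_obs <- (df_complete + 1) / (df_complete + 3) * df_complete * (1 - lambda)
    df <- 1 / (1 / df + 1 / df_obs)
  }
  statistic <- q_bar / sqrt(t_var)
  data.frame(
    estimate = q_bar,
    std.error = sqrt(t_var),
    statistic = statistic,
    df = df,
    p.value = 2 * pt(-abs(statistic), df)
  )
}

set.seed(1)
n <- 200
dat <- data.frame(X1 = rnorm(n))
dat$Y <- 1 + 2 * dat$X1 + rnorm(n)
miss <- sample(n, 40)

# Stand-in for imputed datasets (NOT MIDAS): resample observed X1 values.
imps <- lapply(1:5, function(i) {
  d <- dat
  d$X1[miss] <- sample(dat$X1[-miss], length(miss), replace = TRUE)
  d
})

# Fit the same model on each dataset, then pool the X1 coefficient.
fits <- lapply(imps, function(d) summary(lm(Y ~ X1, data = d))$coefficients)
est <- sapply(fits, function(cf) cf["X1", "Estimate"])
se <- sapply(fits, function(cf) cf["X1", "Std. Error"])
pooled <- pool_rubin(est, se, df_complete = n - 2)
print(pooled)
```

The pooled standard error is never smaller than the average within-imputation standard error, because the between-imputation variance is added on top of it.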
Starts the server if it is not already running. Called internally by every client function so users never have to manage the server manually.

    ensure_server(...)


| Argument | Description |
|---|---|
| `...` | Arguments forwarded to … |

Invisibly returns the base URL of the running server.

    ## Not run:
    ensure_server()
    ## End(Not run)

Calculates the element-wise mean across all stored imputations for a model.

    imp_mean(model_id, ...)


| Argument | Description |
|---|---|
| `model_id` | A character model ID, or a fitted model object (list with a `model_id` element). |
| `...` | Arguments forwarded to … |

A data frame with the mean imputed values.

    ## Not run:
    df <- data.frame(X1 = rnorm(200), X2 = rnorm(200))
    df$X1[sample(200, 40)] <- NA
    fit <- midas_fit(df, epochs = 10L)
    midas_transform(fit, m = 10)
    mean_df <- imp_mean(fit)
    ## End(Not run)

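
To make "element-wise mean across imputations" concrete, here is a minimal base R sketch. It assumes the imputations are available locally as a list of all-numeric data frames with identical shape. `elementwise_mean()` is a hypothetical helper, and how the package itself treats categorical columns is not covered here.

```r
# Three toy "imputations" of the same 2 x 2 dataset (illustrative values).
imps <- list(
  data.frame(X1 = c(1, 2), X2 = c(10, 20)),
  data.frame(X1 = c(3, 4), X2 = c(10, 20)),
  data.frame(X1 = c(5, 6), X2 = c(10, 20))
)

# Hypothetical helper: sum the data frames cell by cell, then divide by m.
elementwise_mean <- function(imps) {
  Reduce(`+`, imps) / length(imps)
}

mean_df <- elementwise_mean(imps)
print(mean_df)
```

Cells that were observed in the original data are identical in every imputation, so averaging leaves them unchanged; only the imputed cells are smoothed.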
Creates an isolated Python environment and installs the midasverse-midas-api
package (which pulls in midasverse-midas as a dependency).

    install_backend(
      method = c("pip", "conda", "uv"),
      envname = "midas2_env",
      package = "midasverse-midas-api"
    )

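
| Argument | Description |
|---|---|
| `method` | Character. One of `"pip"`, `"conda"`, or `"uv"`. |
| `envname` | Character. Name of the virtual environment to create (default `"midas2_env"`). |
| `package` | Character. Package specifier to install (default `"midasverse-midas-api"`). |
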
This is the only function in the package that uses reticulate, and
only for environment creation. It is never used at runtime.
No return value, called for side effects.

    ## Not run:
    install_backend()
    install_backend(method = "conda")
    ## End(Not run)

Convenience function that fits a MIDAS model and generates imputations
in a single call. Equivalent to calling midas_fit() followed by
midas_transform().

    midas(
      data,
      m = 5L,
      hidden_layers = c(256L, 128L, 64L),
      dropout_prob = 0.5,
      epochs = 75L,
      batch_size = 64L,
      lr = 0.001,
      corrupt_rate = 0.8,
      num_adj = 1,
      cat_adj = 1,
      bin_adj = 1,
      pos_adj = 1,
      omit_first = FALSE,
      seed = 89L,
      ...
    )


| Argument | Description |
|---|---|
| `data` | A data frame (may contain `NA` values). |
| `m` | Integer. Number of imputations (default 5). |
| `hidden_layers` | Integer vector of hidden layer sizes (default `c(256L, 128L, 64L)`). |
| `dropout_prob` | Numeric. Dropout probability (default 0.5). |
| `epochs` | Integer. Number of training epochs (default 75). |
| `batch_size` | Integer. Mini-batch size (default 64). |
| `lr` | Numeric. Learning rate (default 0.001). |
| `corrupt_rate` | Numeric. Corruption rate for denoising (default 0.8). |
| `num_adj` | Numeric. Loss multiplier for numeric columns (default 1). |
| `cat_adj` | Numeric. Loss multiplier for categorical columns (default 1). |
| `bin_adj` | Numeric. Loss multiplier for binary columns (default 1). |
| `pos_adj` | Numeric. Loss multiplier for positive columns (default 1). |
| `omit_first` | Logical. Omit the first column from the encoder input (default `FALSE`). |
| `seed` | Integer. Random seed (default 89). |
| `...` | Arguments forwarded to … |

A list with model_id and imputations (a list of data frames).

    ## Not run:
    df <- data.frame(X1 = rnorm(200), X2 = rnorm(200))
    df$X1[sample(200, 40)] <- NA
    result <- midas(df, m = 5, epochs = 10)
    head(result$imputations[[1]])
    ## End(Not run)

Sends data to the server and fits a MIDAS denoising autoencoder.

    midas_fit(
      data,
      hidden_layers = c(256L, 128L, 64L),
      dropout_prob = 0.5,
      epochs = 75L,
      batch_size = 64L,
      lr = 0.001,
      corrupt_rate = 0.8,
      num_adj = 1,
      cat_adj = 1,
      bin_adj = 1,
      pos_adj = 1,
      omit_first = FALSE,
      seed = 89L,
      ...
    )


| Argument | Description |
|---|---|
| `data` | A data frame (may contain `NA` values). |
| `hidden_layers` | Integer vector of hidden layer sizes (default `c(256L, 128L, 64L)`). |
| `dropout_prob` | Numeric. Dropout probability (default 0.5). |
| `epochs` | Integer. Number of training epochs (default 75). |
| `batch_size` | Integer. Mini-batch size (default 64). |
| `lr` | Numeric. Learning rate (default 0.001). |
| `corrupt_rate` | Numeric. Corruption rate for denoising (default 0.8). |
| `num_adj` | Numeric. Loss multiplier for numeric columns (default 1). |
| `cat_adj` | Numeric. Loss multiplier for categorical columns (default 1). |
| `bin_adj` | Numeric. Loss multiplier for binary columns (default 1). |
| `pos_adj` | Numeric. Loss multiplier for positive columns (default 1). |
| `omit_first` | Logical. Omit the first column from the encoder input (default `FALSE`). |
| `seed` | Integer. Random seed (default 89). |
| `...` | Arguments forwarded to … |

A list with model_id, n_rows, n_cols, col_types.

    ## Not run:
    df <- data.frame(X1 = rnorm(200), X2 = rnorm(200), X3 = rnorm(200))
    df$X2[sample(200, 40)] <- NA
    fit <- midas_fit(df, epochs = 10L)
    fit$model_id
    ## End(Not run)

Generates m imputed datasets from a fitted MIDAS model.

    midas_transform(model_id, m = 5L, ...)


| Argument | Description |
|---|---|
| `model_id` | A character model ID, or a fitted model object (list with a `model_id` element). |
| `m` | Integer. Number of imputations (default 5). |
| `...` | Arguments forwarded to … |

A list of m data frames, each with imputed values.

    ## Not run:
    df <- data.frame(X1 = rnorm(200), X2 = rnorm(200))
    df$X1[sample(200, 40)] <- NA
    fit <- midas_fit(df, epochs = 10L)
    imps <- midas_transform(fit, m = 10)
    head(imps[[1]])
    ## End(Not run)

Masks a fraction of observed values, re-imputes them, and computes RMSE to assess imputation quality.

    overimpute(model_id, mask_frac = 0.1, m = 5L, seed = NULL, ...)


| Argument | Description |
|---|---|
| `model_id` | A character model ID, or a fitted model object (list with a `model_id` element). |
| `mask_frac` | Numeric. Fraction of observed values to mask (default 0.1). |
| `m` | Integer. Number of imputations for the diagnostic (default 5). |
| `seed` | Integer or `NULL` (the default). |
| `...` | Arguments forwarded to … |

A list with rmse (named numeric vector) and mean_rmse.

    ## Not run:
    df <- data.frame(X1 = rnorm(200), X2 = rnorm(200))
    df$X1[sample(200, 40)] <- NA
    fit <- midas_fit(df, epochs = 10L)
    diag <- overimpute(fit, mask_frac = 0.1)
    diag$mean_rmse
    ## End(Not run)

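
The mask, re-impute, and score idea behind this diagnostic can be sketched in base R without the backend. Everything below is an assumption for illustration: `overimpute_sketch()` and `mean_impute()` are hypothetical names, a column-mean imputer stands in for the MIDAS model, and only a single imputation is scored. The package's own masking and averaging details may differ.

```r
# Hypothetical stand-in imputer: fill each column's NAs with its observed mean.
mean_impute <- function(d) {
  for (j in seq_along(d)) {
    d[[j]][is.na(d[[j]])] <- mean(d[[j]], na.rm = TRUE)
  }
  d
}

# Hypothetical helper: hide a fraction of observed cells, impute them back,
# and compute per-column RMSE on the hidden cells only.
overimpute_sketch <- function(data, impute_fn, mask_frac = 0.1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  x <- as.matrix(data)
  observed <- which(!is.na(x))                        # indices of known cells
  masked <- sample(observed, size = floor(mask_frac * length(observed)))
  x_masked <- x
  x_masked[masked] <- NA                              # hide the held-out cells
  x_hat <- as.matrix(impute_fn(as.data.frame(x_masked)))
  is_masked <- matrix(FALSE, nrow(x), ncol(x))
  is_masked[masked] <- TRUE
  rmse <- sapply(seq_len(ncol(x)), function(j) {
    rows <- is_masked[, j]
    sqrt(mean((x_hat[rows, j] - x[rows, j])^2))
  })
  names(rmse) <- colnames(x)
  list(rmse = rmse, mean_rmse = mean(rmse, na.rm = TRUE))
}

set.seed(42)
df <- data.frame(X1 = rnorm(200), X2 = rnorm(200))
df$X1[sample(200, 40)] <- NA
res <- overimpute_sketch(df, mean_impute, mask_frac = 0.1, seed = 7)
print(res$rmse)
print(res$mean_rmse)
```

Because the true values of the masked cells are known, the RMSE measures how well the imputer recovers them; a lower value indicates better recovery on this dataset.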
Launches python -m midas2_api as a background process and waits for the
/health endpoint to respond.

    start_server(python = "python3", port = NULL, venv = NULL, max_wait = 120L)


| Argument | Description |
|---|---|
| `python` | Path to the Python interpreter (default `"python3"`). |
| `port` | Port to bind to (default `NULL`). If … |
| `venv` | Path to a Python virtual environment (default `NULL`). If supplied, the interpreter is taken from … |
| `max_wait` | Maximum number of 0.5-second polling attempts (default 120, i.e. 60 seconds). The first launch may be slower due to Python import caching. |

Invisibly returns the port number.

    ## Not run:
    start_server()
    start_server(venv = "~/.virtualenvs/midas2_env")
    ## End(Not run)

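
The `max_wait` semantics (a fixed number of polling attempts spaced 0.5 seconds apart) can be illustrated with a small base R loop. This is a sketch under assumptions, not the package's code: `wait_until_ready()` is a hypothetical helper, and the health check is passed in as a function so the example runs without a server. In practice that function would query the /health endpoint.

```r
# Hypothetical helper: call is_ready() up to max_wait times, sleeping
# `interval` seconds between failed attempts. Errors count as "not ready".
wait_until_ready <- function(is_ready, max_wait = 120L, interval = 0.5) {
  for (attempt in seq_len(max_wait)) {
    ok <- tryCatch(is_ready(), error = function(e) FALSE)
    if (isTRUE(ok)) {
      return(invisible(attempt))   # number of attempts that were needed
    }
    Sys.sleep(interval)
  }
  stop("server did not respond after ", max_wait, " attempts")
}

# Fake health check that succeeds on the third call.
calls <- 0L
fake_health_check <- function() {
  calls <<- calls + 1L
  calls >= 3L
}

attempts <- wait_until_ready(fake_health_check, max_wait = 10L, interval = 0.01)
print(attempts)
```

With the defaults shown in the signature, 120 attempts at 0.5 seconds each give the 60-second ceiling described for `max_wait`.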
Kills the background Python process and clears the internal state.

    stop_server()

No return value, called for side effects.

    ## Not run:
    stop_server()
    ## End(Not run)

Stops the running server (if any), removes the Python environment created by
install_backend(), and clears the saved configuration.

    uninstall_backend(method = c("pip", "conda", "uv"), envname = "midas2_env")


| Argument | Description |
|---|---|
| `method` | Character. One of `"pip"`, `"conda"`, or `"uv"`. |
| `envname` | Character. Name of the virtual environment to remove (default `"midas2_env"`). |

No return value, called for side effects.

    ## Not run:
    uninstall_backend()
    uninstall_backend(method = "conda")
    ## End(Not run)

Upgrades the midasverse-midas-api package (and its dependencies) in the
existing Python environment. Stops the running server first so that the
new version is loaded on next use.

    update_backend(
      method = c("pip", "conda", "uv"),
      envname = "midas2_env",
      package = "midasverse-midas-api"
    )

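
| Argument | Description |
|---|---|
| `method` | Character. One of `"pip"`, `"conda"`, or `"uv"`. |
| `envname` | Character. Name of the virtual environment (default `"midas2_env"`). |
| `package` | Character. Package specifier to upgrade (default `"midasverse-midas-api"`). |
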
No return value, called for side effects.

    ## Not run:
    update_backend()
    ## End(Not run)