Package 'kssa'

Title: Known Sub-Sequence Algorithm
Description: Implements the Known Sub-Sequence Algorithm <doi:10.1016/j.aaf.2021.12.013>, which helps to automatically identify and validate the best method for missing data imputation in a time series. Supports the comparison of multiple state-of-the-art algorithms.
Authors: Iván Felipe Benavides [aut, cre, cph], Steffen Moritz [aut], Brayan-David Aroca-Gonzalez [aut], Jhoana Romero [aut], Marlon Santacruz [aut], John-Josephraj Selvaraj [aut]
Maintainer: Iván Felipe Benavides <[email protected]>
License: AGPL (>= 3)
Version: 0.0.1
Built: 2024-11-18 05:10:14 UTC
Source: https://github.com/pipeben/kssa

get_imputations function

Description

Function to get imputations from methods compared by kssa

Usage

get_imputations(x_ts, methods = "all", seed = 1234)

Arguments

x_ts

A ts object with missing data to be imputed

methods

A string or string vector indicating the imputation method or methods to apply. The default, "all", applies every available method

seed

Numeric. Random seed; any number can be used

Value

A list of imputed time series with the selected methods

Examples

# Get imputed values for airgap_na_ts with all available methods.
# First, create 20% random missing data in the tsAirgapComplete time series from imputeTS
set.seed(1234)
library("imputeTS")
library("kssa")
airgap_na <- missMethods::delete_MCAR(as.data.frame(tsAirgapComplete), 0.2)

# Convert airgap_na to time series object
# tsAirgapComplete is monthly, from January 1949 to December 1960
airgap_na_ts <- ts(airgap_na, start = c(1949, 1), end = c(1960, 12), frequency = 12)

my_imputations <- get_imputations(airgap_na_ts, methods = "all")

# my_imputations contains the imputed time series with all methods.
# Access it and choose the one from the best method for your purposes

my_imputations$seadec
plot.ts(my_imputations$seadec)

kssa Algorithm

Description

Run the Known Sub-Sequence Algorithm to compare the performance of imputation methods on a time series of interest

Usage

kssa(
  x_ts,
  start_methods,
  actual_methods,
  segments = 5,
  iterations = 10,
  percentmd = 0.2,
  seed = 1234
)

Arguments

x_ts

Time series object ts containing missing data (NA)

start_methods

String vector. The method or methods used to start the algorithm. Accepts the same values as actual_methods

actual_methods

The imputation methods to be compared and validated. A string vector containing either "all" or any of the 11 imputation methods listed below:

  • "all" - compare among all methods automatically - Default

  • "auto.arima" - State space representation of an ARIMA model

  • "StructTS" - State space representation of a structural model

  • "seadec" - Seasonal decomposition with Kalman smoothing

  • "linear_i" - Linear interpolation

  • "spline_i" - Spline interpolation

  • "stine_i" - Stineman interpolation

  • "simple_ma" - Simple moving average

  • "linear_ma" - Linear moving average

  • "exponential_ma" - Exponential moving average

  • "locf" - Last observation carried forward

  • "stl" - Seasonal and trend decomposition with Loess

For further details on these imputation methods, see the packages imputeTS and forecast.
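
As a rough illustration of what the two simplest methods in the list do, the following base R sketch fills the gaps in a toy vector by linear interpolation and by carrying the last observation forward. It is not the code that kssa or imputeTS runs: the data and variable names are hypothetical, and the sketch assumes the first value of the series is observed.

```r
# Toy series with gaps (hypothetical values, not taken from the package)
x <- c(10, NA, NA, 16, 18, NA, 22)

idx <- seq_along(x)
obs <- !is.na(x)

# Linear interpolation between the nearest observed neighbours
linear_fill <- approx(x = idx[obs], y = x[obs], xout = idx)$y

# Last observation carried forward: for each position, find the index of
# the most recent observed value and reuse that value
last_obs <- cummax(ifelse(obs, idx, 0L))
locf_fill <- x[last_obs]

print(linear_fill)
print(locf_fill)
```

The more elaborate methods in the list (Kalman smoothing, seasonal decomposition, moving averages) follow the same contract: a series with NA goes in and a series of equal length without NA comes out.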

segments

Integer. Number of segments into which the time series is divided

iterations

Integer. Number of iterations to run

percentmd

Numeric. Amount of missing data, expressed as a proportion (for example, 0.2 for 20%). Must match the true proportion of missing data in x_ts
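
Because this value has to agree with the series itself, one way to obtain it is to compute the share of NA values directly from x_ts rather than typing it by hand. A minimal base R sketch, using a hypothetical toy series in place of real data:

```r
# Toy monthly series with missing values (hypothetical data)
x_ts <- ts(c(5, NA, 7, 8, NA, 10, 11, 12, NA, 14, 15, 16),
           start = c(2020, 1), frequency = 12)

# Share of missing values on the 0-1 scale (3 of the 12 values are NA)
prop_missing <- mean(is.na(x_ts))
print(prop_missing)
```

The result could then be supplied as percentmd = prop_missing. How closely the two must agree is not specified on this page, so treat any rounding of this value as an assumption.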

seed

Numeric. Random seed to use, so that results are reproducible

Value

A list of results to be plotted with function kssa_plot for easy interpretation
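
The general idea behind this kind of validation can be sketched in base R: hide values whose truth is known, impute them with each candidate method, and score the candidates on the hidden positions only. The sketch below is an illustration under stated assumptions, not the package's implementation. It uses a single pass with no segments or iterations, simulated data, and simple base R stand-ins for two of the methods; all variable names are hypothetical.

```r
set.seed(1234)

# A complete toy monthly series (simulated: linear trend plus seasonality)
n <- 120
complete <- ts(50 + 0.3 * seq_len(n) + 10 * sin(2 * pi * seq_len(n) / 12),
               start = c(2010, 1), frequency = 12)

# Hide 20% of the interior points; their true values remain known
hidden <- sample(2:(n - 1), size = round(0.2 * n))
masked <- as.numeric(complete)
masked[hidden] <- NA
obs <- !is.na(masked)

# Candidate 1: linear interpolation between observed neighbours
linear_fill <- approx(which(obs), masked[obs], xout = seq_len(n))$y

# Candidate 2: last observation carried forward
locf_fill <- masked[cummax(ifelse(obs, seq_len(n), 0L))]

# Score each candidate only where values were hidden.
# The labels reuse names from the method list; the code is a stand-in.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
truth <- as.numeric(complete)[hidden]
scores <- c(linear_i = rmse(truth, linear_fill[hidden]),
            locf     = rmse(truth, locf_fill[hidden]))

# Lower error first, mirroring how the results are meant to be read
print(sort(scores))
```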

References

Benavides, I. F., Santacruz, M., Romero-Leiton, J. P., Barreto, C., & Selvaraj, J. J. (2022). Assessing methods for multiple imputation of systematic missing data in marine fisheries time series with a new validation algorithm. Aquaculture and Fisheries. <doi:10.1016/j.aaf.2021.12.013>

Examples

# Create 20% random missing data in tsAirgapComplete time series from imputeTS
set.seed(1234)
library("kssa")
library("imputeTS")
airgap_na <- missMethods::delete_MCAR(as.data.frame(tsAirgapComplete), 0.2)

# Convert airgap_na to time series object
# tsAirgapComplete is monthly, from January 1949 to December 1960
airgap_na_ts <- ts(airgap_na, start = c(1949, 1), end = c(1960, 12), frequency = 12)

# Apply the kssa algorithm with 5 segments,
# 10 iterations, 20% of missing data, and
# compare among all available methods in the package.
# Remember that percentmd must match
# the real proportion of missing data in the
# input airgap_na_ts time series

results_kssa <- kssa(airgap_na_ts,
  start_methods = "all",
  actual_methods = "all",
  segments = 5,
  iterations = 10,
  percentmd = 0.2
)

# Print and check results
results_kssa

# For an easy interpretation of kssa results
# please use function kssa_plot

kssa_plot function

Description

Function to plot the results of kssa for easy interpretation

Usage

kssa_plot(results, type, metric)

Arguments

results

An object with results produced with function kssa

type

A character value with the type of plot to show. It can be "summary" or "complete".

metric

A character value with the performance metric to be plotted. It can be "rmse", "mase", "smape", or "cor":

  • "rmse" - Root Mean Squared Error (default choice)

  • "mase" - Mean Absolute Scaled Error

  • "smape" - Symmetric Mean Absolute Percentage Error

  • "cor" - Pearson correlation coefficient

For further details on these metrics, see the package Metrics.
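
For orientation, the four metrics can be written out in a few lines of base R using their common textbook definitions. This is a sketch with hypothetical values; the exact formulas used by kssa or by the Metrics package (for example, the scaling of sMAPE or the naive forecast behind MASE) may differ in detail.

```r
# Hypothetical true values and imputed values at the same positions
actual    <- c(100, 120, 130, 150, 170)
predicted <- c(110, 115, 135, 140, 175)

err <- actual - predicted

# Root Mean Squared Error
rmse <- sqrt(mean(err^2))

# Mean Absolute Scaled Error: mean absolute error divided by the mean
# absolute error of a one-step naive forecast on the true values
mase <- mean(abs(err)) / mean(abs(diff(actual)))

# Symmetric Mean Absolute Percentage Error (one common form, range 0 to 2)
smape <- mean(2 * abs(err) / (abs(actual) + abs(predicted)))

# Pearson correlation between true and imputed values
pearson <- cor(actual, predicted)

print(c(rmse = rmse, mase = mase, smape = smape, cor = pearson))
```

Note that "rmse", "mase" and "smape" measure error, so lower is better, whereas for "cor" a higher value indicates closer agreement.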

Value

A plot of kssa results in which imputation methods are ordered from lower to higher (left to right) error.

Examples

# Plot the results obtained in the example from function kssa

# Create 20% random missing data in tsAirgapComplete time series from imputeTS
set.seed(1234)
library("kssa")
library("imputeTS")
airgap_na <- missMethods::delete_MCAR(as.data.frame(tsAirgapComplete), 0.2)

# Convert airgap_na to time series object
# tsAirgapComplete is monthly, from January 1949 to December 1960
airgap_na_ts <- ts(airgap_na, start = c(1949, 1), end = c(1960, 12), frequency = 12)

# Apply the kssa algorithm with 5 segments,
# 10 iterations, 20% of missing data, and
# compare among all available methods in the package.
# Remember that percentmd must match
# the real proportion of missing data in the
# input airgap_na_ts time series

results_kssa <- kssa(airgap_na_ts,
  start_methods = "all",
  actual_methods = "all",
  segments = 5,
  iterations = 10,
  percentmd = 0.2
)

kssa_plot(results_kssa, type = "complete", metric = "rmse")

# Conclusion: kssa_plot orders methods from lower to higher error
# (left to right), so the leftmost method (here 'linear_i') is the best to
# impute missing data in airgap_na_ts and the rightmost (here 'locf') is the worst.

# To obtain imputations with the best method, or any method of preference
# please use function get_imputations