Package 'kssa' reference manual

Title:	Known Sub-Sequence Algorithm
Description:	Implements the Known Sub-Sequence Algorithm <doi:10.1016/j.aaf.2021.12.013>, which helps to automatically identify and validate the best method for missing data imputation in a time series. Supports the comparison of multiple state-of-the-art algorithms.
Authors:	Iván Felipe Benavides [aut, cre, cph], Steffen Moritz [aut], Brayan-David Aroca-Gonzalez [aut], Jhoana Romero [aut], Marlon Santacruz [aut], John-Josephraj Selvaraj [aut]
Maintainer:	Iván Felipe Benavides
License:	AGPL (>= 3)
Version:	0.0.1
Built:	2025-02-16 04:31:55 UTC
Source:	https://github.com/pipeben/kssa

get_imputations function

Description

Function to get the imputations produced by the methods compared by kssa

Usage

get_imputations(x_ts, methods = "all", seed = 1234)

Arguments

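`x_ts`	A ts object with missing data to be imputed
`methods`	A string or string vector indicating the imputation method or methods to apply, e.g. "all" (default) or "seadec". See actual_methods in kssa for the available methods
`seed`	Numeric. Random seed for reproducibility; any number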

Value

A list of imputed time series, one for each selected method
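To make the shape of this return value concrete, here is a minimal base-R sketch that builds the same kind of named list for two simple methods. The helper `impute_sketch()` is hypothetical: it only mimics "linear_i" (via `stats::approx`) and "locf", it is not how get_imputations is implemented, and it assumes the list is named after the methods, as the `my_imputations$seadec` access in the example suggests.

```r
# Hypothetical helper, NOT the kssa implementation: returns a named list
# with one imputed copy of x_ts per method, mimicking "linear_i" and "locf".
impute_sketch <- function(x_ts, methods = c("linear_i", "locf")) {
  idx <- seq_along(x_ts)
  obs <- !is.na(x_ts)
  imputers <- list(
    # Linear interpolation; rule = 2 copies the nearest observed value
    # into leading or trailing gaps
    linear_i = function(x) approx(idx[obs], x[obs], xout = idx, rule = 2)$y,
    # Last observation carried forward; leading NAs take the first observed value
    locf = function(x) {
      out <- as.numeric(x)
      first <- which(obs)[1]
      out[seq_len(first)] <- out[first]
      for (i in seq_along(out)[-1]) {
        if (is.na(out[i])) out[i] <- out[i - 1]
      }
      out
    }
  )
  result <- lapply(methods, function(m) {
    # Re-wrap each imputed vector as a ts with the original time attributes
    ts(imputers[[m]](x_ts), start = start(x_ts), frequency = frequency(x_ts))
  })
  names(result) <- methods
  result
}

# Usage on a toy monthly series with three gaps
x <- ts(c(1, NA, 3, NA, NA, 6), start = c(2000, 1), frequency = 12)
imps <- impute_sketch(x)
imps$linear_i  # values 1 2 3 4 5 6
imps$locf      # values 1 1 3 3 3 6
```

Each element is a full ts object, so it can be passed straight to plot.ts() as in the example.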

Examples


# Get imputed values for airgap_na_ts with all the methods compared by kssa.
# First, create 20% random missing data in the tsAirgapComplete time series from imputeTS
set.seed(1234)
library("imputeTS")
library("kssa")
airgap_na <- missMethods::delete_MCAR(as.data.frame(tsAirgapComplete), 0.2)

# Convert airgap_na to a monthly time series object (1949-1960, like tsAirgapComplete)
airgap_na_ts <- ts(airgap_na[[1]], start = c(1949, 1), frequency = 12)

my_imputations <- get_imputations(airgap_na_ts, methods = "all")

# my_imputations contains one imputed time series per method.
# Select the one produced by the best method for your purposes

my_imputations$seadec
plot.ts(my_imputations$seadec)


kssa Algorithm

Description

Run the Known Sub-Sequence Algorithm to compare the performance of imputation methods on a time series of interest

Usage

kssa(
  x_ts,
  start_methods,
  actual_methods,
  segments = 5,
  iterations = 10,
  percentmd = 0.2,
  seed = 1234
)

Arguments

`x_ts`	A `ts` object containing missing data (NA)
`start_methods`	String vector. The imputation method or methods used to start the algorithm. Accepts the same values as actual_methods
`actual_methods`	String vector. The imputation methods to be compared and validated. It can be "all", to compare all methods automatically, or a string vector with any of the following 11 imputation methods:
	"auto.arima" - State space representation of an ARIMA model
	"StructTS" - State space representation of a structural model
	"seadec" - Seasonal decomposition with Kalman smoothing
	"linear_i" - Linear interpolation
	"spline_i" - Spline interpolation
	"stine_i" - Stineman interpolation
	"simple_ma" - Simple moving average
	"linear_ma" - Linear moving average
	"exponential_ma" - Exponential moving average
	"locf" - Last observation carried forward
	"stl" - Seasonal and trend decomposition with Loess
	For further details on these imputation methods, see packages `imputeTS` and `forecast`
`segments`	Integer. Number of segments into which the time series is divided
`iterations`	Integer. Number of iterations to run
`percentmd`	Numeric. Proportion of missing data, given as a fraction (e.g. 0.2 for 20%). Must match the actual proportion of missing data in x_ts
`seed`	Numeric. Random seed
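Because percentmd must match the actual proportion of missing data, it can help to compute that proportion before calling kssa. A minimal base-R sketch follows; the toy series and the 0.01 tolerance in the guard are our own illustrative choices, not part of kssa.

```r
# Proportion of missing values in a series, i.e. the value percentmd should match
x_ts <- ts(c(5, NA, 7, 8, NA, 10, 11, 12, 13, 14), start = c(2000, 1), frequency = 12)
n_missing <- sum(is.na(x_ts))     # 2 NAs
actual_pmd <- mean(is.na(x_ts))   # 2 / 10 = 0.2
actual_pmd

# Hypothetical guard before calling kssa(): stop early on a mismatch.
# The 0.01 tolerance is our own choice, not a documented kssa rule.
percentmd <- 0.2
if (abs(actual_pmd - percentmd) > 0.01) {
  stop("percentmd (", percentmd, ") does not match the actual proportion of NAs (",
       round(actual_pmd, 3), ")")
}
```

With the manual's example series, mean(is.na(airgap_na_ts)) should come out close to 0.2.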

Value

A list of results that can be plotted with function kssa_plot for easier interpretation
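The core validation idea is to score each method on values that are actually known. The base-R sketch below illustrates that idea under simplifying assumptions: a complete reference series is obtained with a start method, a random percentmd share of it is hidden in each iteration, each candidate method imputes the hidden values, and RMSE is computed against the known values. The helpers validate_sketch(), lin_interp() and spline_interp() are hypothetical; kssa's `segments` logic is not reproduced, and the real algorithm (see the reference below) may place gaps and aggregate errors differently. The demo uses AirPassengers from R's built-in datasets package (monthly airline passengers, 1949-1960).

```r
# Simplified known-data validation in the spirit of kssa. NOT the package's
# implementation: segments are ignored and the artificial gaps are random.
lin_interp <- function(x) {
  idx <- seq_along(x)
  ok <- !is.na(x)
  approx(idx[ok], x[ok], xout = idx, rule = 2)$y
}
spline_interp <- function(x) {
  idx <- seq_along(x)
  ok <- !is.na(x)
  spline(idx[ok], x[ok], xout = idx)$y
}

validate_sketch <- function(x_ts, start_method, methods, iterations = 10,
                            percentmd = 0.2, seed = 1234) {
  set.seed(seed)
  # 1. Fill the real gaps with the start method to get a complete reference
  reference <- start_method(x_ts)
  n <- length(reference)
  n_hidden <- round(percentmd * n)
  errors <- matrix(NA_real_, nrow = iterations, ncol = length(methods),
                   dimnames = list(NULL, names(methods)))
  for (it in seq_len(iterations)) {
    # 2. Hide a random subset of the (now known) reference values
    hidden <- sample(n, n_hidden)
    masked <- reference
    masked[hidden] <- NA
    # 3. Impute with each candidate and score it by RMSE on the hidden values
    for (m in names(methods)) {
      imputed <- methods[[m]](masked)
      errors[it, m] <- sqrt(mean((reference[hidden] - imputed[hidden])^2))
    }
  }
  errors  # iterations x methods matrix of RMSE values
}

# Usage on AirPassengers with 20% of its values removed
x <- AirPassengers
set.seed(1)
x[sample(length(x), round(0.2 * length(x)))] <- NA
err <- validate_sketch(x, start_method = lin_interp,
                       methods = list(linear_i = lin_interp, spline_i = spline_interp),
                       iterations = 5)
sort(colMeans(err))  # lower mean RMSE = better method
```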

References

Benavides, I. F., Santacruz, M., Romero-Leiton, J. P., Barreto, C., & Selvaraj, J. J. (2022). Assessing methods for multiple imputation of systematic missing data in marine fisheries time series with a new validation algorithm. Aquaculture and Fisheries. doi:10.1016/j.aaf.2021.12.013

Examples


# Create 20% random missing data in tsAirgapComplete time series from imputeTS
set.seed(1234)
library("kssa")
library("imputeTS")
airgap_na <- missMethods::delete_MCAR(as.data.frame(tsAirgapComplete), 0.2)

# Convert airgap_na to a monthly time series object (1949-1960, like tsAirgapComplete)
airgap_na_ts <- ts(airgap_na[[1]], start = c(1949, 1), frequency = 12)

# Apply the kssa algorithm with 5 segments,
# 10 iterations, 20% of missing data, and
# compare all methods available in the package.
# Remember that percentmd must match the
# actual proportion of missing data in the
# input airgap_na_ts time series

results_kssa <- kssa(airgap_na_ts,
  start_methods = "all",
  actual_methods = "all",
  segments = 5,
  iterations = 10,
  percentmd = 0.2
)

# Print and check results
results_kssa

# For easier interpretation of the kssa results,
# use function kssa_plot


kssa_plot function

Description

Function to plot the results of kssa for easy interpretation

Usage

kssa_plot(results, type, metric)

Arguments

results

An object with results produced with function kssa

type

A character value with the type of plot to show. It can be "summary" or "complete".

metric

A character value with the performance metric to be plotted. It can be "rmse", "mase", "cor", or "smape":

"rmse" - Root Mean Squared Error (default choice)
"mase" - Mean Absolute Scaled Error
"smape" - Symmetric Mean Absolute Percentage Error
"cor" - Pearson correlation coefficient

For further details on these metrics, see the Metrics package
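The four metrics follow standard definitions, sketched below as base-R functions. Two assumptions are worth flagging: MASE is scaled by the mean absolute one-step naive difference of the actual values, and sMAPE is returned as a fraction rather than a percentage. That is how we understand the Metrics package to compute them, but check its documentation before relying on exact values.

```r
# Base-R sketches of the four kssa_plot metrics (standard definitions)
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# MASE: mean absolute error scaled by the mean absolute one-step naive error
mase <- function(actual, predicted) {
  mean(abs(actual - predicted)) / mean(abs(diff(actual)))
}

# sMAPE as a fraction (0 = perfect, 2 = worst)
smape <- function(actual, predicted) {
  mean(2 * abs(actual - predicted) / (abs(actual) + abs(predicted)))
}

# Pearson correlation: higher is better, unlike the three error metrics
pearson <- function(actual, predicted) cor(actual, predicted, method = "pearson")

actual    <- c(1, 2, 3, 4)
predicted <- c(1, 2, 3, 5)
rmse(actual, predicted)     # 0.5
mase(actual, predicted)     # 0.25
smape(actual, predicted)    # about 0.0556 (1/18)
pearson(actual, predicted)  # about 0.98
```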

Value

A plot of the kssa results in which imputation methods are ordered from lowest to highest error (left to right).
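The left-to-right ordering can be reproduced from raw error values by sorting methods on their mean error. The sketch below assumes a hypothetical matrix of per-iteration errors (rows = iterations, columns = methods) filled with made-up numbers and generic method names; it is not the structure kssa actually returns, and kssa_plot may summarize errors differently.

```r
# Hypothetical helper: order methods from lowest to highest mean error,
# the left-to-right order kssa_plot is documented to use.
rank_methods <- function(errors) sort(colMeans(errors))

# Made-up toy errors (iterations x methods); not output from kssa
errors <- cbind(
  method_a = c(2.1, 1.9, 2.0),
  method_b = c(4.0, 4.4, 4.2),
  method_c = c(2.6, 2.4, 2.5)
)
ranking <- rank_methods(errors)
names(ranking)  # "method_a" "method_c" "method_b"
barplot(ranking, ylab = "Mean error")  # leftmost bar = best method

# For "cor", higher is better, so sort(colMeans(errors), decreasing = TRUE)
# would be the analogous ordering (an assumption: the manual only
# describes ordering by error).
```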

Examples


# Plot the results obtained in the example from function kssa

# Create 20% random missing data in tsAirgapComplete time series from imputeTS
set.seed(1234)
library("kssa")
library("imputeTS")
airgap_na <- missMethods::delete_MCAR(as.data.frame(tsAirgapComplete), 0.2)

# Convert airgap_na to a monthly time series object (1949-1960, like tsAirgapComplete)
airgap_na_ts <- ts(airgap_na[[1]], start = c(1949, 1), frequency = 12)

# Apply the kssa algorithm with 5 segments,
# 10 iterations, 20% of missing data, and
# compare all methods available in the package.
# Remember that percentmd must match the
# actual proportion of missing data in the
# input airgap_na_ts time series

results_kssa <- kssa(airgap_na_ts,
  start_methods = "all",
  actual_methods = "all",
  segments = 5,
  iterations = 10,
  percentmd = 0.2
)

kssa_plot(results_kssa, type = "complete", metric = "rmse")

# Conclusion: kssa_plot orders methods from lower to higher error
# (left to right), so the leftmost method is the best for imputing
# airgap_na_ts and the rightmost is the worst. In the documented run of
# this example, 'linear_i' ranked best and 'locf' worst.

# To obtain imputations with the best method, or any method of preference,
# use function get_imputations


