class: title-slide

<br><br>

<img src="https://grahamschool.uchicago.edu/themes/custom/ts_uchi/images/svgs/logo.svg" width="75%"/>

# .title-slide-h1[Beethoven's Moonlight Sonata]

### .title-slide-h2[Analyzing music with Time Series methodologies]

.title-slide-h3[Terry Wang, Rima Mittal, Joshua Goldberg]

.title-slide-h3[Time Series Analysis & Forecasting | Spring 2019]

---
class: text-slide, main-slide, center, middle

# The Goal

Use time series methodologies to learn and construct new music conforming to the style of Beethoven's Moonlight Sonata.

???

Predicting the next notes was not the focus of our analysis.

Disclaimer: time series might not be the best tool for this problem. Although music looks like a continuous time series on a superficial level, it is discrete rather than continuous, so something like an HMM might be a better way to tackle it. But again, if you have a hammer, everything looks like a nail :)

---
class: text-slide

# Moonlight Sonata as a time series (arpeggios)

Notice the 3-note arpeggio pattern.

.pull-left[
<img src="index_files/figure-html/unnamed-chunk-2-1.png" width="720" style="display: block; margin: auto;" />
]

.pull-right[<iframe width="372" height="204" src="https://www.youtube.com/embed/4Tr0otuiQuU" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>]

???

This is a stationary time series, but it does exhibit some patterns. There are repeating 3-note sequences, and differencing at lag 3 improves the model. Since our goal is not prediction but identifying the structure of the music and generating real music, the differencing helps.

---
class: text-slide

# Stationarity

```r
adf.test(ms)
```

```
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ms
## Dickey-Fuller = -3.4876, Lag order = 9, p-value = 0.0435
## alternative hypothesis: stationary
```

```r
kpss.test(ms)
```

```
## 
##  KPSS Test for Level Stationarity
## 
## data:  ms
## KPSS Level = 0.32934, Truncation lag parameter = 6, p-value = 0.1
```

Both tests point to stationarity: the ADF test rejects a unit root (p = 0.044) and the KPSS test fails to reject level stationarity (p = 0.1).

---
class: text-slide

# Stationarity

<img src="index_files/figure-html/unnamed-chunk-4-1.png" width="720" style="display: block; margin: auto;" />

The periodogram detects some seasonality: the frequency with the largest spectral density corresponds to a period of 3.

---
class: text-slide

# Stationarity

<img src="index_files/figure-html/unnamed-chunk-5-1.png" width="720" style="display: block; margin: auto;" />

???

At lags that are multiples of 3, the ACF shows an exponentially decreasing pattern, while at the other lags the correlations stay high and the series looks non-stationary. We therefore difference the time series at lag 3.

---
class: text-slide

# Moonlight Sonata lagged by 3

<img src="index_files/figure-html/unnamed-chunk-6-1.png" width="720" style="display: block; margin: auto;" />

---
class: text-slide, main-slide, center, middle

# Modeling

---
class: text-slide

# Baseline model: ARIMA

```r
aarima_fit <- auto.arima(ms_3)
summary(aarima_fit)
```

```
## Series: ms_3 
## ARIMA(1,0,4) with zero mean 
## 
## Coefficients:
##           ar1     ma1     ma2     ma3      ma4
##       -0.9562  1.3889  0.9542  0.1902  -0.1811
## s.e.   0.0148  0.0373  0.0572  0.0565   0.0365
## 
## sigma^2 estimated as 7.438:  log likelihood=-1931.47
## AIC=3874.94   AICc=3875.04   BIC=3903.03
## 
## Training set error measures:
##                       ME     RMSE     MAE MPE MAPE      MASE         ACF1
## Training set -0.04579418 2.718686 1.63699 NaN  Inf 0.6572701 -0.005812973
```
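The `ms_3` series fitted above is not constructed anywhere in the deck; presumably it is the lag-3 difference of the raw note series discussed on the previous slides, along the lines of this sketch:

```r
# Assumed preprocessing (not shown in the original deck): difference the
# note series at lag 3 to account for the repeating arpeggio structure.
ms_3 <- diff(ms, lag = 3)
```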
---
class: text-slide

# Residual inspection

<img src="index_files/figure-html/unnamed-chunk-8-1.png" width="720" style="display: block; margin: auto;" />

---
class: text-slide

# Testing residuals

The residuals are not white noise. Inspection shows both time-dependent changes in variance and autocorrelation.

```r
Box.test(aarima_fit$residuals, type = "Ljung-Box", lag = 10)
```

```
## 
##  Box-Ljung test
## 
## data:  aarima_fit$residuals
## X-squared = 35.255, df = 10, p-value = 0.000113
```

---
class: text-slide

# Sophisticated model 1: GARCH

A GARCH model deals with heteroscedasticity in the residuals, so we hope it is better suited to our data.

```r
garch_fit <- fGarch::garchFit(
  ~ arma(1, 4) + garch(1, 1),
  data = ms_3,
  trace = F)
```

The GARCH model has a log likelihood of 1585.07, compared to `auto.arima`'s -1931.47.

---
class: text-slide

# Residual issues persist...

<img src="index_files/figure-html/unnamed-chunk-11-1.png" width="720" style="display: block; margin: auto;" />

???

Even with the GARCH model, the residuals still show time-varying variance and autocorrelation.

---
class: text-slide

# Sophisticated model 2: vector autoregression

Given the song's structure of repeating 3-note arpeggio patterns, it might be useful to reshape the data into an `\(n\times3\)` matrix and use a vector autoregression (VAR) to tackle the problem.

```r
library(vars)

ms_3col <- matrix(ms, ncol = 3)
var_est <- VAR(ms_3col, p = 1, type = "const")
```

```r
summary(var_est)
```

---
class: text-slide

# Residuals still bad...

The VAR model is also problematic and fails to reduce the residuals to white noise:

<img src="index_files/figure-html/unnamed-chunk-14-1.png" width="720" style="display: block; margin: auto;" />

???

This method is roughly an autoregressive model that takes the structure of the data into account. As such, it did not add much to what we had already tried in terms of bringing the residuals under control.

---
class: text-slide

# Better model 1: simple Markov chain

This model treats each note as a discrete event and can therefore fit the data better. We estimate the transition matrix from each note to the note 3 steps into the future.

```r
ms_l3 <- data.frame(ms = ms[1:(length(ms) - 3)],
                    ms_l3 = ms[4:length(ms)])
t_mat <- as.matrix(prop.table(table(ms_l3$ms, ms_l3$ms_l3), 1))

# We remove the first column of the matrix because note 37 only
# appeared once, at the very end of the piece
t_mat <- t_mat[, -1]
```

---
class: text-slide

# Markov chain predictions

* To predict, we give the model a 3-note sequence encoded as one-hot vectors. The vectors must conform to the layout of the transition matrix, with a 1 marking the position of each note. For example, the three notes that open the piece (56, 61, 64, a C# minor chord) can be written as:

```
##  [1] 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [36] 0 0
```

```
##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [36] 0 0
```

```
##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [36] 0 0
```

* Note: the 1's mark the positions of the notes.

* We then multiply each one-hot vector by the transition matrix to get a probability distribution over the note 3 steps ahead, from which we sample (see the sketch below).
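A minimal sketch of that step for a single note, using `t_mat` as built above; the variable names here are illustrative and not from the original code:

```r
# Illustrative only: distribution of the note 3 steps after note 56,
# then one sampled future note.
one_hot_56 <- as.numeric(rownames(t_mat) == "56")  # 1 at the row for note 56
p_next     <- as.numeric(one_hot_56 %*% t_mat)     # transition probabilities
next_note  <- sample(colnames(t_mat), size = 1, prob = p_next)
```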
---
class: text-slide

# Better model 2: neural network

We used groups of 3 notes as input to predict the next 3 notes. The RNN model learned the 3-note arpeggio pattern.

<img src="image/lstm.png" style="display: block; margin: auto;" />

---
class: text-slide, main-slide, center, middle

# Prediction

---
class: text-slide

# Learning results

* To make predictions, we use each model's default `forecast` method to predict 48 notes down the line, or 4 measures into the future.

* We add some noise based on the model parameters.

* We round all predictions to the nearest whole number. Note that the time series models forecast the differenced series, so their predictions are anchored to the last observed note.

```r
aarima_pred <- round(forecast(aarima_fit, h = 48)$mean + ms[length(ms)] +
                       rnorm(n = 48, mean = 0, sd = sqrt(aarima_fit$sigma2)))

garch_pred <- round(
  fGarch::predict(garch_fit, 48)$meanForecast + ms[length(ms)] +
    rnorm(
      n = 48,
      mean = 0,
      sd = fGarch::predict(garch_fit, 48)$standardDeviation
    )
)
```

---
class: text-slide

# Prediction: vector autoregression

```r
pred_vars <- function(vars_obj, n.ahead = 12) {
  temp <- predict(vars_obj, n.ahead = n.ahead)

  # Residual standard errors of the three component regressions,
  # used to add noise to the point forecasts
  sigma_1 <- summary(vars_obj$varresult$y1)$sigma
  sigma_2 <- summary(vars_obj$varresult$y2)$sigma
  sigma_3 <- summary(vars_obj$varresult$y3)$sigma

  output <- data.frame(
    y1 = round(temp$fcst$y1[, 1] + rnorm(n = n.ahead, mean = 0, sd = sigma_1)),
    y2 = round(temp$fcst$y2[, 1] + rnorm(n = n.ahead, mean = 0, sd = sigma_2)),
    y3 = round(temp$fcst$y3[, 1] + rnorm(n = n.ahead, mean = 0, sd = sigma_3))
  )

  # Interleave the three columns back into a single note sequence
  return(as.vector(t(output)))
}
```

---
class: text-slide

# Prediction: Markov chain

```r
pred_markov <- function(init_vec, tm, n_ahead = 16) {
  output <- c()
  last_3notes <- init_vec
  for (i in 1:n_ahead) {
    # Sample each note of the next arpeggio from the transition
    # probabilities of the corresponding note 3 steps earlier
    note1 <- sample(
      colnames(tm), size = 1,
      prob = as.numeric(colnames(tm) == as.character(last_3notes[1])) %*% tm
    )
    note2 <- sample(
      colnames(tm), size = 1,
      prob = as.numeric(colnames(tm) == as.character(last_3notes[2])) %*% tm
    )
    note3 <- sample(
      colnames(tm), size = 1,
      prob = as.numeric(colnames(tm) == as.character(last_3notes[3])) %*% tm
    )
    last_3notes <- c(note1, note2, note3)
    output <- c(output, note1, note2, note3)
  }
  return(as.numeric(output))
}
```
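A hypothetical call of the two helpers above, seeding the Markov chain with the opening chord; `vars_pred` and `markov_pred` are illustrative names, not from the original analysis:

```r
# Illustrative usage: 48 notes from each model
vars_pred   <- pred_vars(var_est, n.ahead = 48)
markov_pred <- pred_markov(c(56, 61, 64), t_mat, n_ahead = 16)  # 16 arpeggios = 48 notes
```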
???

### Inputs:

__init_vec:__ vector of size 3 giving the initial notes; each must belong to colnames(tm)

__tm:__ transition matrix; must be square, with all rows summing to 1

__n_ahead:__ how many 3-note arpeggios to generate

### Output:

A numeric vector of size 3 * n_ahead

---
class: text-slide

# Prediction Results

Only the VAR model was able to replicate some of the structure in the music:

<img src="index_files/figure-html/unnamed-chunk-24-1.svg" width="720" style="display: block; margin: auto;" />

---
class: text-slide, main-slide, center, middle

# Appendix

---
class: text-slide

# DirRec

We also tried the DirRec method, but the error blows up substantially after just a few steps because of the noise added to the point estimates, so we abandoned the approach and attribute any musicality in its output to noise.

```r
# This is the DirRec approach - it is abandoned.
pred_arima <- function(ts_data, arima_obj, h = 48) {
  # Use the ms_3 series; reuse the (p, d, q) order of the fitted ARIMA model
  data <- ts_data
  output <- c()
  params <- unname(arimaorder(arima_obj))

  for (i in 1:h) {
    # Refit on the series extended with the previous predictions
    model_refit <- Arima(data, order = params, include.mean = F)
    fc <- as.vector(forecast(model_refit, h = 1)$mean) +
      data[length(data)] +
      rnorm(1, 0, sd = sqrt(model_refit$sigma2))  # Added error term
    output <- c(output, fc)
    # Add the point back to the data
    data <- c(data, fc)
  }
  return(output)
}
```
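For completeness, a hypothetical call mirroring how the other prediction helpers are used; `dirrec_pred` is an illustrative name, not from the original analysis:

```r
# Illustrative only: the abandoned DirRec helper applied to the differenced series
dirrec_pred <- round(pred_arima(ms_3, aarima_fit, h = 48))
```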