Time Series Forecasting using Keras-Tensorflow

Ahlemkaabi
6 min read · May 15, 2022

In this short blog tutorial, I’ll share a way to do time series forecasting. The dataset used is coinbaseUSD historical data from this link; you can download it and try it yourself. You’ll find the code in my GitHub account; the link is at the end of this blog.

An introduction to Time Series Forecasting

To better understand time series forecasting, let us first start from the big picture: what is a time series? “Time” contributes a temporal structure to the term, and a “series” is something successive, a sequence. So, to conclude: “A time series is a sequence taken at successive equally spaced points in time”, and “time series data have a natural temporal ordering”. It is data collected to give information about one or more specific features, recorded at every point in time. Need more detailed information? Here is this resource link.

Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future values based on previously observed values.

The goal of time series forecasting is to predict a future value or classification at a particular point in time.

Time Series Forecasting development:

  • Time series forecasting starts with a historical time series (referred to as sample data).
  • Analysts examine the historical data and check for patterns from time series decomposition (trends, seasonal patterns, cyclical patterns, and regularity).

Trend is a pattern in data that shows the movement of a series to relatively higher or lower values over a long period of time. In other words, a trend is observed when there is an increasing or decreasing slope in the time series. — link to resource

  • These patterns help inform data analysts and data scientists about which forecasting algorithms they should use for predictive modeling.

The Goal of the Model

The Model uses the past 24 hours of BTC data to predict the value of BTC at the close of the following hour (approximately how long the average transaction takes).

Preprocessing method and why choose it

Loading the data:

The Coinbase dataset is formatted such that every row represents a 60-second time window containing:

  • The start time of the time window in Unix time
  • The open price in USD at the start of the time window
  • The high price in USD within the time window
  • The low price in USD within the time window
  • The close price in USD at the end of the time window
  • The amount of BTC transacted in the time window
  • The amount of Currency (USD) transacted in the time window
  • The volume-weighted average price in USD for the time window
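The columns above can be loaded with pandas. A minimal sketch (the filename and exact column names are taken from the dataset description above; adjust them if your download differs):

```python
import pandas as pd

# Expected columns, one row per 60-second window.
COLUMNS = ["Timestamp", "Open", "High", "Low", "Close",
           "Volume_(BTC)", "Volume_(Currency)", "Weighted_Price"]

def load_dataset(path):
    """Load the raw Coinbase CSV and keep the expected columns in order."""
    df = pd.read_csv(path)
    return df[COLUMNS]
```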

To preprocess this data I had to answer these questions:

*Are all of the data points useful?

Once the dataframe is loaded, we can see that the data contains NaN values. Fitting a model on NaN values may have undesirable results, so we have to find a solution for this challenge.

In this case, I used pandas.DataFrame.ffill() (forward fill), which propagates the last valid observation forward to the next valid one.
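On a toy column standing in for the real dataframe, forward fill works like this:

```python
import numpy as np
import pandas as pd

# Toy Close column with a gap of missing values.
df = pd.DataFrame({"Close": [300.0, np.nan, np.nan, 310.0]})

# Forward fill: each NaN takes the last valid value seen before it.
filled = df.ffill()  # the two NaNs become 300.0
```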

*Are all of the data features useful?

Correlation is our answer to this question; a correlation plot of the columns makes it clear.

The “Open”, “High”, “Low”, “Close”, and “Weighted_Price” columns are highly correlated, so we can pick just one of them as a feature.
We get the final list of our features as [“Close”, “Volume_(BTC)”, “Volume_(Currency)”].
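The same check can be done numerically with `DataFrame.corr()`. Here is a sketch on synthetic data (a stand-in for the real dataframe) where the price columns track each other closely while volume does not:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: Open tracks Close with tiny noise; volume is unrelated.
rng = np.random.default_rng(0)
close = rng.normal(300, 10, 200).cumsum()
df = pd.DataFrame({
    "Open": close + rng.normal(0, 0.1, 200),
    "Close": close,
    "Volume_(BTC)": rng.random(200),
})

# Pairwise Pearson correlations between the columns.
corr = df.corr()
# Open and Close are near-perfectly correlated, so keeping only one suffices.
```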

*Is the current time window relevant?

Every row represents a 60-second time window; to be aligned with the goal of the model we are building, we need a 1-hour time window. There are many methods you can use to get there;
here are two:

1- Simply slice the data with a step of 60:

data = data[27::60]  # the data starts at minute 33, so the first 27 rows are skipped to align on the hour

2- Round the Timestamp column down to the hour; it is then easy to group by hour and apply the mean on the Close column.
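Both methods can be sketched in a few lines of pandas (the `offset` of 27 rows and the column names follow the description above):

```python
import pandas as pd

def hourly_by_slicing(df, offset):
    """Method 1: keep every 60th row, dropping the first `offset`
    rows so the series starts on an hour boundary."""
    return df.iloc[offset::60]

def hourly_by_grouping(df):
    """Method 2: floor each Unix timestamp to the hour, then take
    the mean of Close within each hour."""
    hours = df["Timestamp"] // 3600 * 3600
    return df.groupby(hours)["Close"].mean()
```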

Visualize data:

For this model, I decided to use only two columns [ “Timestamp”, “Close”].

*Should I rescale the data?

Since the model will forecast the Close value, we should rescale this feature by normalizing it.
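A standard way to do this is z-score normalization, using statistics computed on the training split only (a sketch; the helper name is mine):

```python
import numpy as np

def normalize(series, train):
    """Standardize with mean/std from the training split only, so no
    information from the test period leaks into preprocessing."""
    mean, std = train.mean(), train.std()
    return (series - mean) / std
```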

*How should I save this preprocessed data?

After preprocessing the dataframe, it will be saved as a tensorflow.data.Dataset

Preparing the dataset for the model

1- Method to split the data into a training dataset and test dataset.

2- Normalizing the dataset.

3- Build a dataset the model can consume as batches of 24-hour windows.
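Step 3 can be sketched with Keras’ windowing utility. This is one way to do it, not necessarily how the project does it; the toy series below stands in for the normalized hourly Close values:

```python
import numpy as np
import tensorflow as tf

# Hypothetical hourly Close series (already normalized).
series = np.arange(100, dtype="float32")

# Inputs: 24 consecutive hours; target: the Close 24 steps after each
# window's start, i.e. the hour right after the window ends.
ds = tf.keras.utils.timeseries_dataset_from_array(
    data=series[:-24],
    targets=series[24:],
    sequence_length=24,
    batch_size=32,
)
```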

RNN architecture

RNNs were designed to be able to remember important information about recent inputs, which they can then use to generate accurate forecasts.

A long short-term memory network (LSTM) is a type of RNN that is especially popular in the time series space. It has forget gates and feed-forward mechanisms that allow the network to retain information, forget extraneous inputs, and update the forecasting procedure, making it well suited to modeling and forecasting complex time series problems.
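A minimal sketch of such an architecture in Keras: one LSTM layer over 24 hourly steps of a single feature (Close), followed by a Dense layer producing the next hour’s forecast. The layer size here is illustrative, not necessarily what the project uses:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(24, 1)),  # 24 hours, 1 feature (Close)
    tf.keras.layers.LSTM(32),              # remembers patterns across the window
    tf.keras.layers.Dense(1),              # the forecast Close for the next hour
])
```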

Compilation and fitting
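A typical compile-and-fit call for this kind of regression task looks like the following. The optimizer, loss, and epoch count are assumed (common defaults), and the random arrays below only stand in for the real windowed dataset built earlier:

```python
import numpy as np
import tensorflow as tf

# Same shape of model as in the architecture section.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(24, 1)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])

# MSE loss for regression; MAE is the metric reported each epoch.
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# Tiny synthetic batch just to show the call; real training feeds the
# windowed tf.data.Dataset of 24-hour inputs and next-hour targets.
x = np.random.rand(64, 24, 1).astype("float32")
y = np.random.rand(64, 1).astype("float32")
history = model.fit(x, y, epochs=1, verbose=0)
```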

Model performance and results

Mean absolute error each epoch:

Model Performance:

GitHub Project Link:

Resources:
