Have you ever thought about how you could make smarter decisions when it comes to investing in the stock market? I’ve been on that very journey, and I’ve come across a remarkable tool called Live (Delayed) Stock Prices API that provides almost real-time information about any stock, delayed by 15 minutes. This discovery got me thinking, and I decided to create my own personal stock market assistant. With live stock data at your fingertips, you can make more informed choices without relying on expensive platforms. What’s more, you can develop your own investment strategy based on historical data to help you make the best decisions.

I explored various ways to quickly access up-to-date stock prices and trends, and I found that most solutions fell short of my needs, except for this API that offers live stock data.

Let’s take a closer look at what this API can do. Here is the response returned when you query the API for TSLA stock:

{'code': 'TSLA.US',
'timestamp': 1693941000,
'gmtoffset': 0,
'open': 245,
'high': 258,
'low': 244.86,
'close': 257.44,
'volume': 112942754,
'previousClose': 245.01,
'change': 12.43,
'change_p': 5.0733}

This output provides the core price and volume fields you need to start building an investment strategy. Now, let’s delve into how you can build a model to simplify your financial decision-making.

Register & Get Data

Python Implementation

1. Importing the Necessary Packages

Begin by importing some essential Python packages to support your project. These packages will assist you in handling data, training models, and more.

import pandas as pd
from eodhd import APIClient
import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.ensemble import IsolationForest
from sklearn.ensemble import RandomForestClassifier

Here’s what each package does:

Pandas: Helps with various data operations.

Numpy: Used for mathematical operations in Python.

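train_test_split: A scikit-learn helper (in sklearn.model_selection) that splits your data into training and test sets.

XGBoost, Random Forest, and Isolation Forest: XGBoost and Random Forest are the classification models used for our task; Isolation Forest is an unsupervised anomaly detector.

eodhd: The official EODHD library for accessing their APIs.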

2. API Key Activation

You need to register your EODHD API key with the package in order to use its functions. If you don’t have an EODHD API key, head over to their website, complete the registration process to create an EODHD account, and then navigate to the ‘Settings’ page, where you can find your secret EODHD API key. Make sure this secret API key is not revealed to anyone. You can activate the API key with the following code:

api_key = '<YOUR API KEY>'
client = APIClient(api_key)

The code is pretty simple. In the first line, we store the secret EODHD API key in the api_key variable, and in the second line, we use the APIClient class provided by the eodhd package to activate the API key, storing the response in the client variable.

Note that you need to replace <YOUR API KEY> with your secret EODHD API key. Instead of hardcoding the API key as text, you can improve security by, for example, reading it from an environment variable.
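As a minimal sketch of the environment-variable approach, the helper below reads the key at runtime instead of keeping it in the source file. The variable name EODHD_API_KEY and the function name load_api_key are assumptions chosen for illustration; they are not defined by the eodhd package.

```python
import os


def load_api_key(env_var="EODHD_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is a hypothetical choice for this sketch.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(f"Set the {env_var} environment variable before running this script.")
    return api_key
```

With the variable set in your shell, the activation step would become client = APIClient(load_api_key()), and the key never appears in the script itself.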

3. Loading Historical Data

Retrieve historical stock data for the period you’re interested in. This data will be the foundation for training your model. We can easily extract the historical data of stocks using EODHD’s historical market data API endpoint via the eodhd package.

def get_historical_data(ticker, start_date, end_date):
    json_resp = client.get_historical_data(symbol = ticker, period = '5m', from_date = start_date, to_date = end_date, order = 'a')
    df = pd.DataFrame(json_resp)
    df = df.set_index('date')
    df.index = pd.to_datetime(df.index)
    return df

TSLA = get_historical_data('TSLA', '2021-08-02', '2021-09-02')

In the above code, we are using the get_historical_data function provided by the eodhd package to extract the split-adjusted historical stock data of Tesla. The function takes the following parameters:

  • the symbol parameter, which takes the ticker of the stock whose data we want to extract (passed in through our wrapper’s ticker argument)
  • the period parameter, which sets the time interval between data points (5 minutes in our case)
  • the from_date and to_date parameters, which indicate the starting and ending dates of the data, in “YYYY-MM-DD” format
  • the order parameter, an optional parameter that sorts the data by date in ascending (a) or descending (d) order

4. Obtaining Live Data for Prediction

Use EODHD’s Live (Delayed) Stock Prices API via the eodhd package to access live stock data, which is crucial for your decision-making process.

def extract_intraday(symbol):
    raw_df = client.get_live_stock_prices(ticker = symbol)
    df = pd.DataFrame([raw_df])
    return df

tsla_intraday = extract_intraday('TSLA')

This function takes a stock code as input, fetches live stock information from the API, converts the response into a Pandas dataframe, and returns it.
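To see what the resulting one-row dataframe looks like without calling the API, here is a small offline sketch that uses the TSLA response shown earlier in this article as its input. It assumes the timestamp field is in Unix seconds, and the two "check" columns are added here purely as a sanity check; they are not part of the API response.

```python
import pandas as pd

# Sample response copied from the TSLA output shown earlier in this article
sample_quote = {
    "code": "TSLA.US",
    "timestamp": 1693941000,
    "gmtoffset": 0,
    "open": 245,
    "high": 258,
    "low": 244.86,
    "close": 257.44,
    "volume": 112942754,
    "previousClose": 245.01,
    "change": 12.43,
    "change_p": 5.0733,
}

# Wrapping the dict in a list produces a dataframe with exactly one row
df = pd.DataFrame([sample_quote])

# Assumption: the timestamp is in Unix seconds, so convert it to a UTC datetime
df["datetime"] = pd.to_datetime(df["timestamp"], unit="s", utc=True)

# Recompute the change fields from close and previousClose as a sanity check
df["change_check"] = (df["close"] - df["previousClose"]).round(2)
df["change_p_check"] = (df["change_check"] / df["previousClose"] * 100).round(4)

# Transpose so the single row prints as a readable column of fields
print(df.T)
```

The recomputed values match the change and change_p fields in the response, which confirms that both are derived from close and previousClose.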

5. Preprocessing the Data

Before you can train a model to predict stock prices, you need to prepare and clean the data. This step helps make your predictions as accurate as possible. Here’s what you should do:

Check for Class Imbalances: Sometimes, you might have more data for one class (e.g., “buy”) than another (e.g., “sell”). This can skew your model’s predictions. To fix this, you can use two techniques:

  • Oversampling: Creating more instances of the underrepresented class.
  • Undersampling: Reducing the number of instances in the overrepresented class.

Normalize the Data: Data normalization ensures that all your features have the same scale. This is important because some algorithms are sensitive to the scale of the input features. You can do this using techniques like:

  • Standard Scaler: It scales your data to have a mean of 0 and a standard deviation of 1.
  • MinMax Scaler: This scales your data to a specific range, usually between 0 and 1.

Drop Unnecessary Columns: Sometimes, you have columns in your data that aren’t relevant to your prediction task. It’s a good idea to remove them to simplify your model and improve its performance.

Here’s an example:

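# Historical data: remove the time columns (errors='ignore' skips any column that is absent)
dataF = TSLA.drop(['timestamp', 'gmtoffset', 'datetime'], axis=1, errors='ignore')
# Live data: keep only open, high, low, close, and volume
tsla_intraday = tsla_intraday.drop(['code', 'timestamp', 'gmtoffset', 'previousClose', 'change', 'change_p'], axis=1)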
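The class-balancing and normalization steps described above can be sketched as follows. This is only an illustration under stated assumptions: the data is randomly generated, the column names mirror the API fields, and the signal column stands in for the class labels created in the next section. It uses scikit-learn's resample for oversampling and its StandardScaler and MinMaxScaler for scaling.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.utils import resample

# Randomly generated placeholder data with an imbalanced label column
rng = np.random.default_rng(0)
frame = pd.DataFrame({
    "open": rng.uniform(240, 260, 12),
    "close": rng.uniform(240, 260, 12),
    "volume": rng.integers(1_000, 5_000, 12),
    "signal": [0] * 8 + [1] * 2 + [2] * 2,
})

# Oversampling: draw rows with replacement from each smaller class
# until every class is as large as the biggest one
target_size = frame["signal"].value_counts().max()
parts = []
for label, group in frame.groupby("signal"):
    if len(group) < target_size:
        group = resample(group, replace=True, n_samples=target_size, random_state=0)
    parts.append(group)
balanced = pd.concat(parts).reset_index(drop=True)

# Scaling is applied to the feature columns only, never to the labels
features = balanced.drop(columns="signal")
standardized = StandardScaler().fit_transform(features)  # mean 0, std 1 per column
minmaxed = MinMaxScaler().fit_transform(features)        # each column mapped to [0, 1]

print(balanced["signal"].value_counts())
```

In a real project you would normally balance and fit the scaler on the training split only, then apply the fitted scaler to the test and live data, so that no information leaks from the data you evaluate on.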

6. Forming a Strategy

Next, you’ll want to classify your training data based on your unique strategy. This helps your model understand how to make predictions. In my case, I used a simple strategy with three classes: “waiting” (0), “buying” (1), and “selling” (2).

Here’s how I did it:

def signal_generator(df):
    # Compare the latest candle (last row) with the one before it.
    # Column names are lowercase, matching the fields returned by the API.
    current_open = df['open'].iloc[-1]
    current_close = df['close'].iloc[-1]
    previous_open = df['open'].iloc[-2]
    previous_close = df['close'].iloc[-2]

    if (current_open > current_close and previous_open < previous_close and current_close < previous_open and current_open >= previous_close):
        return 1 # Buying
    elif (current_open < current_close and previous_open > previous_close and current_close > previous_open and current_open <= previous_close):
        return 2 # Selling
    else:
        return 0 # Waiting

signal = [0] # Initialize with "waiting"
for i in range(1, len(dataF)):
    df = dataF[i - 1:i + 1] # Two-row window: previous candle and current candle
    signal.append(signal_generator(df))

dataF["signal"] = signal
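For longer histories, the same labeling rule can be written without a Python loop. The sketch below is an alternative formulation, not the code used in this tutorial: it assumes lowercase open and close columns, the function name add_signals is made up for illustration, and the four candles are hand-made values chosen to trigger each class.

```python
import numpy as np
import pandas as pd


def add_signals(df):
    """Label each candle 0 (waiting), 1 (buying), or 2 (selling)."""
    # shift(1) lines up every row with the candle that came before it
    prev_open = df["open"].shift(1)
    prev_close = df["close"].shift(1)

    buying = (
        (df["open"] > df["close"]) & (prev_open < prev_close)
        & (df["close"] < prev_open) & (df["open"] >= prev_close)
    )
    selling = (
        (df["open"] < df["close"]) & (prev_open > prev_close)
        & (df["close"] > prev_open) & (df["open"] <= prev_close)
    )

    out = df.copy()
    # The first row has no previous candle, so both conditions are False there
    out["signal"] = np.select([buying, selling], [1, 2], default=0)
    return out


# Hand-made candles for illustration only
candles = pd.DataFrame({
    "open": [10.0, 11.5, 9.0, 12.0],
    "close": [11.0, 9.5, 12.0, 12.5],
})
labeled = add_signals(candles)
print(labeled["signal"].tolist())  # [0, 1, 2, 0]
```

The first candle has nothing to compare against and stays at "waiting", the second and third satisfy the buying and selling conditions respectively, and the fourth matches neither.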

7. Loading and Training the Models

Now comes the exciting part: selecting and training your model. In this tutorial, I used three different models: XGBoost, Isolation Forest, and Random Forest. Each has its unique strengths.
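The training snippets in this section rely on X_train, X_test, y_train, and y_test, which the tutorial does not define explicitly. One way to create them, sketched here under the assumption that the signal column holds the labels and every other column is a feature, is scikit-learn's train_test_split. The data below is randomly generated as a placeholder for the prepared dataframe.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Randomly generated placeholder for the prepared historical dataframe
rng = np.random.default_rng(42)
n_rows = 100
data = pd.DataFrame({
    "open": rng.uniform(240, 260, n_rows),
    "high": rng.uniform(260, 270, n_rows),
    "low": rng.uniform(230, 240, n_rows),
    "close": rng.uniform(240, 260, n_rows),
    "volume": rng.integers(1_000, 5_000, n_rows),
    "signal": rng.integers(0, 3, n_rows),
})

X = data.drop(columns="signal")  # features
y = data["signal"]               # labels: 0 waiting, 1 buying, 2 selling

# shuffle=False keeps the rows in time order, so the test set is the most
# recent 20% of the data and the model never trains on later candles
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

print(len(X_train), len(X_test))
```

Keeping the split chronological is a design choice for time series: a shuffled split would let the model see candles from after the period it is tested on.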

XGBoost: a powerful boosting algorithm that’s great at handling structured data like stock prices.

from xgboost import XGBClassifier
# A separate name keeps this model from being overwritten by the Random Forest below
xgb_model = XGBClassifier()
xgb_model.fit(X_train, y_train)

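Isolation Forest: an unsupervised model that detects anomalies by isolating unusual points with random binary trees. It does not learn the buy/sell labels; it only flags each row as normal (1) or anomalous (-1).

from sklearn.ensemble import IsolationForest

random_state = np.random.RandomState(42)
model = IsolationForest(n_estimators=100, max_samples='auto', contamination=0.2, random_state=random_state)

# Isolation Forest is unsupervised, so it is fitted on the features only
model.fit(X_train)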
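To make the Isolation Forest output concrete, here is a small sketch on randomly generated two-column data with one deliberately extreme row appended. The data and variable names are made up for illustration and have nothing to do with real stock prices.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# 50 ordinary rows around zero plus one deliberately extreme row
rng = np.random.default_rng(0)
normal_rows = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
outlier_row = np.array([[100.0, 100.0]])
X = np.vstack([normal_rows, outlier_row])

detector = IsolationForest(n_estimators=100, contamination=0.2, random_state=42)
detector.fit(X)

# predict returns 1 for rows considered normal and -1 for anomalies
flags = detector.predict(X)
print(flags[-1])
```

Because the output is a normal/anomaly flag rather than a buy, sell, or wait label, this model is better suited to spotting unusual candles than to producing trading signals directly.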

Random Forest: a machine-learning algorithm that uses multiple decision trees to make predictions or classifications.

from sklearn.ensemble import RandomForestClassifier

model1 = RandomForestClassifier()
model1.fit(X_train, y_train)

8. Prediction on Live Data for Suggestions

With your trained model, you can now make predictions on both test data and live data. Evaluate your predictions using metrics like precision, recall, accuracy, and F1 score. The F1 score is particularly useful when dealing with imbalanced data.

Here’s how you can do it:

# Make predictions for test data
y_pred1 = model1.predict(X_test)
predictions1 = [round(value) for value in y_pred1]

# Evaluate the predictions
from sklearn.metrics import confusion_matrix, recall_score, precision_score, f1_score, accuracy_score
cm = confusion_matrix(y_test, predictions1)

rf_Recall = recall_score(y_test, predictions1, average='macro')
rf_Precision = precision_score(y_test, predictions1, average='macro')
rf_f1 = f1_score(y_test, predictions1, average='macro')
rf_accuracy = accuracy_score(y_test, predictions1)
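The same trained model can then score the latest live quote. The sketch below is a self-contained illustration rather than the tutorial's exact pipeline: the training rows and their labels are randomly generated placeholders, the live values are taken from the TSLA response shown at the start of the article, and the names feature_cols and labels are assumptions made for this example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

feature_cols = ["open", "high", "low", "close", "volume"]
labels = {0: "waiting", 1: "buying", 2: "selling"}

# Randomly generated placeholder training data (labels are random too)
rng = np.random.default_rng(1)
n_rows = 60
train = pd.DataFrame({
    "open": rng.uniform(240, 260, n_rows),
    "high": rng.uniform(260, 270, n_rows),
    "low": rng.uniform(230, 240, n_rows),
    "close": rng.uniform(240, 260, n_rows),
    "volume": rng.integers(50_000_000, 150_000_000, n_rows),
    "signal": rng.integers(0, 3, n_rows),
})

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(train[feature_cols], train["signal"])

# Live quote after dropping the non-feature fields (values from the TSLA sample)
live_quote = {"open": 245, "high": 258, "low": 244.86, "close": 257.44, "volume": 112942754}

# Select the columns in the same order used for training
live = pd.DataFrame([live_quote])[feature_cols]

predicted = int(model.predict(live)[0])
suggestion = labels[predicted]
print(suggestion)
```

The key point is that the live row must have the same columns, in the same order and with the same preprocessing, as the data the model was trained on.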

Where to Go from Here

Now that you’ve embarked on this journey, the possibilities are endless. You can fine-tune your model, explore different machine-learning algorithms, and analyze various stocks. With access to live stock data using Live (Delayed) Stock Prices API and the knowledge from this tutorial, you’re well-equipped to make more informed decisions in the ever-changing world of stock trading.

Stay tuned for more insights and tips on improving your stock market strategies with data-driven solutions!