README
If you've ever heard that ESG investing is hard, that there aren't enough climate disclosures, that we don't know enough...
STOP
This project is here to make climate investing actionable. We are building both open source software and openly available data sets so that you can identify relative value trades, optimize portfolios, and structure benchmarks for climate-aligned investing.
Our first project is a multi-factor equity returns model which adds a climate factor, Brown Minus Green, to the popular Fama-French and Carhart models (a sketch of the extended regression follows the list below). This additional Brown Minus Green (BMG) return factor could be used for a variety of climate investing applications, including:
  • Calculate the market-implied carbon risk of a stock, investment portfolio, mutual fund, or bond based on historical returns
  • Determine the market reaction to the climate policies of a company
  • Optimize a portfolio to minimize carbon risk subject to other parameters, such as index tracking or growth-value-sector investment strategies.
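As a rough illustration of the extended model, the sketch below regresses a stock's excess returns on the market, size, value, and momentum factors plus a BMG series using statsmodels. The file and column names (factors_and_returns.csv, Mkt-RF, SMB, HML, WML, BMG, RF, ret) are assumptions for this example, not names fixed by the project.

```python
import pandas as pd
import statsmodels.api as sm

# Illustrative only: a CSV of aligned monthly returns with these (assumed) columns:
# "ret" = stock return, "RF" = risk-free rate, "Mkt-RF", "SMB", "HML", "WML" = the
# Fama-French/Carhart factors, and "BMG" = the Brown Minus Green series.
data = pd.read_csv("factors_and_returns.csv", index_col=0, parse_dates=True)

y = data["ret"] - data["RF"]  # excess return of the stock
X = sm.add_constant(data[["Mkt-RF", "SMB", "HML", "WML", "BMG"]])

model = sm.OLS(y, X, missing="drop").fit()
print(model.params["BMG"], model.pvalues["BMG"])  # BMG loading = market-implied carbon risk
```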

Running the Code

Install the required Python modules (use pip3 instead of pip, depending on your Python installation):
pip install -r requirements.txt

Using a Database

All python scripts have a --help option to show you the latest parameters available.
Init the Database using:
python setup_db.py -d
Then use get_stocks.py to get stock information for the stocks table and return history for the stock_data table:
  • python get_stocks.py -f some_ticker_file.csv for using a csv source file
  • python get_stocks.py -t ALB to load a single stock with ticker ALB
  • python get_stocks.py -s ALB shows whether there is data stored for the ticker ALB
If your stock ticker is a composite stock, it will calculate the historical returns using the weights in the stock_components table.
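As a rough sketch of that calculation (the weights, tickers, and file name below are made up for illustration; the actual data lives in the stock_components table):

```python
import pandas as pd

# Hypothetical component weights, as they might appear in the stock_components table.
weights = pd.Series({"XOM": 0.5, "CVX": 0.3, "COP": 0.2})

# Monthly return history of each component, one column per ticker (illustrative file).
component_returns = pd.read_csv("component_returns.csv", index_col=0, parse_dates=True)

# The composite's return in each period is the weighted sum of its components' returns.
composite_returns = (component_returns[weights.index] * weights).sum(axis=1)
```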
To run the regression, use get_regressions.py to save the output in the stock_stats table:
  • python get_regressions.py for running all the stocks in the database
  • python get_regressions.py -f some_ticker_file.csv for using a csv source file
  • python get_regressions.py -t ALB to run and store the regression for a given stock
It will use the stock returns in the database by default, or if none are found, get them first. It has some optional parameters:
  • -s YYYY-MM-DD to specify an optional start date, by default it will start at the earliest common date from the stocks and risk factors
  • -e YYYY-MM-DD to specify an optional end date, by default it will end at the latest common date from the stocks and risk factors
  • -i N for the regression interval in months (defaults to 60 months).
  • -n FACTOR_NAME to specify the BMG factor to use. If not specified, DEFAULT will be used.
  • -h to see all parameters available
To calculate a BMG series and store it in the database:
python bmg_series.py -n XOP-SMOG -g SMOG -b XOP
where
  • -n <series name> is the name of your BMG series
  • -b is the ticker of your Brown stock
  • -g is the ticker of your Green stock
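A simple version of such a series is just the return spread between the brown and green tickers each period; the exact calculation in bmg_series.py may differ. A minimal sketch of that idea (the price file is an assumption; the tickers match the example command above):

```python
import pandas as pd

# Illustrative: monthly adjusted closing prices for the brown (XOP) and green (SMOG) tickers.
prices = pd.read_csv("prices.csv", index_col=0, parse_dates=True)

brown_returns = prices["XOP"].pct_change()
green_returns = prices["SMOG"].pct_change()

# Brown Minus Green: positive in months when brown stocks outperform green stocks.
bmg = (brown_returns - green_returns).dropna().rename("XOP-SMOG")
```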

Viewing the Results

There is a React UI in the ui/ directory. It needs data in the database, including stocks and their regression results (see above). Once you've run get_regressions.py, you can use this UI to view the results.
To run it, start both the Node server and the React app (simultaneously in two terminal sessions):
cd ui/node-server
npm run start
and
cd ui/react
npm run start

Running Command Line Scripts

These scripts are deprecated but still available; they can be used to run regressions from the command line without the database:
python factor_regression.py
The inputs are:
  • Stock return data: Use stock_data.csv or enter a ticker
  • Carbon data: The BMG return history. By default use carbon_risk_factor.csv.
  • Fama-French factors: Use either ff_factors.csv, which contains the Fama/French Developed 3 Factors and Developed Momentum Factor (Mom), or ff_factors_north_american.csv, which contains the Fama/French North American 3 Factors and North American Momentum Factor (Mom) series, from the Ken French Data Library at Dartmouth. The original CARIMA project used the data from ff_factors.csv.
The output is a printout of the statsmodels results object and its coefficient summary, including the coefficients and p-values (replicating the results of the CARIMA paper).
stock_price_function.py adjusts this so that it returns an object (which is used later).
factor_regression.py loads the stock prices, the carbon risk factor, and the Fama-French factors; it prompts for the names of these CSVs. If you would like stock data to be downloaded instead, it will use stock_price_function.py to do so (a non-interactive sketch of these steps follows the checklist below). To run it:
  • Ensure that you have the relevant modules installed
  • Have stock_price_function.py in the same folder as factor_regression.py
  • Have your factor CSVs saved
  • Run factor_regression.py, follow the prompts, and enter the names of the CSVs when asked
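For orientation, the sketch below does roughly the same thing non-interactively: it loads the three CSVs named above, joins them on date, and runs the regression. The column layout is an assumption (here the first column of stock_data.csv is taken as the stock's return series), so adjust it to match your files.

```python
import pandas as pd
import statsmodels.api as sm

# Assumes each CSV is indexed by date; the column names/order below are assumptions.
stock = pd.read_csv("stock_data.csv", index_col=0, parse_dates=True)
bmg = pd.read_csv("carbon_risk_factor.csv", index_col=0, parse_dates=True)
ff = pd.read_csv("ff_factors.csv", index_col=0, parse_dates=True)

# Keep only the dates common to all three series.
data = stock.join(bmg, how="inner").join(ff, how="inner")

y = data.iloc[:, 0]                    # the stock's return series
X = sm.add_constant(data.iloc[:, 1:])  # BMG plus the Fama-French/momentum factors
results = sm.OLS(y, X, missing="drop").fit()
print(results.summary())               # coefficients, p-values, and diagnostics
```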

Understanding the Output

The model uses the coefficient on each factor as the stock's loading on that factor. If the loading is positive, it indicates that the stock's returns are positively linked to that factor (i.e. if that factor increases, the returns increase), and the inverse if it is negative.
To determine whether a loading is statistically significant, the t-statistic is calculated and the p-value provided. The null hypothesis is that the coefficient is zero, i.e. the factor has no effect. Thus, if the p-value is below a cutoff, the null hypothesis is rejected and the loading can be considered statistically significant. A commonly used cutoff is the 5% level: if the p-value is below 0.05, the loading is considered to be statistically significant.
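For example, with a fitted statsmodels results object such as the one from the sketch above (the factor name "BMG" here is whatever your BMG series is called):

```python
# `results` is a fitted statsmodels OLS results object, e.g. from the sketch above.
coefficient = results.params["BMG"]
p_value = results.pvalues["BMG"]
if p_value < 0.05:  # 5% cutoff: reject the null hypothesis that the loading is zero
    print(f"BMG loading of {coefficient:.3f} is statistically significant")
else:
    print(f"BMG loading of {coefficient:.3f} is not statistically significant")
```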
Ordinary least square regression is based on certain assumptions. There are a variety of statistics that are used to test these assumptions, some of which are presented in the output.
The Jarque-Bera statistic tests the assumption that the data is Gaussian distributed (also known as normally distributed). The null hypothesis in this case is that the distribution is no different from what would be expected if the data followed a Gaussian distribution. If the p-value is above the cutoff, the null hypothesis is not rejected and the assumption of a Gaussian distribution is not considered violated.
The Breusch-Pagan test checks for heteroskedasticity. Heteroskedasticity is the phenomenon where the variability of the random disturbance differs across values of the factors, i.e. the variability of the stock returns changes as the values of the factors change. The null hypothesis is that there is no heteroskedasticity, so if the p-value is above the cutoff, there is no evidence to suggest that the assumption of homoskedasticity is violated.
The Durbin-Watson test checks whether there is autocorrelation. Autocorrelation occurs when the errors are correlated with time (i.e. the unsystematic/stock-specific risk of the stock changes through time). A value between 1.5 and 2.5 is traditionally taken to indicate that there is no autocorrelation.
The R-squared is the percentage of the variation in the stock returns (dependent variable) that is explained by the factors (independent variables). The higher the percentage, the more of the stock's returns can be considered to be explained by the factor model.
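All of these statistics can be obtained from a fitted statsmodels model. The following is a small self-contained sketch on synthetic data, purely to show where each diagnostic comes from (the numbers themselves are meaningless):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson, jarque_bera

# Synthetic data purely for illustration: 120 "months" of 5 made-up factors.
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(120, 5)))
y = X @ np.array([0.0, 1.0, 0.2, 0.3, 0.1, -0.4]) + rng.normal(scale=0.05, size=120)
results = sm.OLS(y, X).fit()

jb_stat, jb_pvalue, _, _ = jarque_bera(results.resid)          # normality of the residuals
bp_stat, bp_pvalue, _, _ = het_breuschpagan(results.resid, X)  # heteroskedasticity
dw_stat = durbin_watson(results.resid)                         # autocorrelation (roughly 1.5-2.5 is fine)
print(jb_pvalue, bp_pvalue, dw_stat, results.rsquared)         # R-squared: share of variance explained
```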
An overview of ordinary least squares regression can be found on Wikipedia.

R Scripts

To use the R scripts and apps, please download the latest versions of R and RStudio. Open the script /R/requirements_r.R and run it. This will install all the required packages.
  • bulk_stock_return_downloader.R is a script to download returns for multiple stocks. On line 8, replace "stock_tickers.csv" with the list of tickers you wish to use, saved as a CSV in the data/ folder

The Book

A free book explains climate investing and how to use this project. You can also read it online on GitBook.

Data Sources

Data included in this project come from the following sources:

References
