Zipline Beginner Tutorial
Basics
Zipline is an open-source algorithmic trading simulator written in Python.
The source can be found at: https://github.com/quantopian/zipline
Some benefits include:
Realistic: slippage, transaction costs, order delays.
Stream-based: processes each event individually, which avoids look-ahead bias.
Batteries included: Common transforms (moving average) as well as common risk calculations (Sharpe).
Zipline is developed and continuously updated by Quantopian, which provides an easy-to-use web interface to Zipline, 10 years of minute-resolution historical US stock data, and live-trading capabilities. This tutorial is directed at users who wish to use Zipline without Quantopian. If you instead want to get started on Quantopian, see here.
This tutorial assumes that you have zipline correctly installed; see the installation instructions if you haven’t set up zipline yet.
Every zipline algorithm consists of two functions you have to define:
initialize(context)
handle_data(context, data)
Before the start of the algorithm, zipline calls the initialize() function and passes in a context variable. context is a persistent namespace for you to store variables you need to access from one algorithm iteration to the next.
After the algorithm has been initialized, zipline calls the handle_data() function once for each event. At every call, it passes the same context variable and an event-frame called data containing the current trading bar with open, high, low, and close (OHLC) prices as well as volume for each stock in your universe. For more information on these functions, see the relevant part of the Quantopian docs.
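To make this calling convention concrete, here is a minimal sketch of a toy driver loop in plain Python. It is not zipline's implementation: run_toy_backtest, the bar dictionaries, and the use of types.SimpleNamespace for context are illustrative assumptions that only mimic the order of calls described above (initialize() once, then handle_data() once per bar with the same context).

```python
from types import SimpleNamespace


def run_toy_backtest(initialize, handle_data, bars):
    """Hypothetical driver: call initialize() once, then handle_data() per bar."""
    context = SimpleNamespace()  # persistent namespace shared across calls
    initialize(context)
    for bar in bars:
        # The same context object is passed on every call; only `data` changes.
        handle_data(context, bar)
    return context


def initialize(context):
    context.bars_seen = 0
    context.closes = []


def handle_data(context, data):
    # `data` is a single bar here: a dict of OHLC prices and volume.
    context.bars_seen += 1
    context.closes.append(data['close'])


# Two made-up daily bars for a single stock.
bars = [
    {'open': 100.0, 'high': 101.0, 'low': 99.0, 'close': 100.5, 'volume': 1000},
    {'open': 100.5, 'high': 102.0, 'low': 100.0, 'close': 101.5, 'volume': 1200},
]
ctx = run_toy_backtest(initialize, handle_data, bars)
print(ctx.bars_seen, ctx.closes)  # 2 [100.5, 101.5]
```

Because context survives between calls, anything stored on it in one iteration (here a counter and a list of closes) is still there in the next.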
My First Algorithm
Let’s take a look at a very simple algorithm from the examples directory, buyapple.py:
from zipline.examples import buyapple
buyapple??
from zipline.api import order, record, symbol


def initialize(context):
    pass


def handle_data(context, data):
    order(symbol('AAPL'), 10)
    record(AAPL=data.current(symbol('AAPL'), 'price'))
As you can see, we first have to import some functions we would like to use. All functions commonly used in your algorithm can be found in zipline.api. Here we are using order(), which takes two arguments: a security object, and a number specifying how many shares you would like to order (if negative, order() will sell/short shares). In this case we want to order 10 shares of Apple at each iteration. For more documentation on order(), see the Quantopian docs.
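The sign convention of order() can be sketched with simple position bookkeeping. This is a toy illustration, not zipline's order book: apply_orders is a hypothetical helper, a plain string stands in for the security object, and every order is assumed to fill in full with no prices, commissions, or slippage involved.

```python
from collections import defaultdict


def apply_orders(orders):
    """Hypothetical bookkeeping: net (asset, amount) orders into positions.

    Positive amounts buy shares; negative amounts sell them, and selling more
    than you hold leaves a negative (short) position.
    """
    positions = defaultdict(int)
    for asset, amount in orders:
        positions[asset] += amount
    return dict(positions)


# Buy 10 AAPL shares on each of three iterations, then sell 40.
positions = apply_orders([('AAPL', 10)] * 3 + [('AAPL', -40)])
print(positions)  # {'AAPL': -10}
```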
Finally, the record() function allows you to save the value of a variable at each iteration. You provide it with a name for the variable together with the variable itself: varname=var. After the algorithm has finished running, you will have access to each variable value you tracked with record() under the name you provided (we will see this further below). You also see how we can access the current price data of the AAPL stock in the data event frame (for more information see here).
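The varname=var pattern maps naturally onto Python keyword arguments. The hypothetical ToyRecorder class below is only a sketch of the idea: it appends every recorded value to a list under its name, whereas in zipline the recorded values end up as columns of the performance DataFrame, one row per trading day.

```python
from collections import defaultdict


class ToyRecorder:
    """Hypothetical stand-in for record(): keeps every value under its name."""

    def __init__(self):
        self.history = defaultdict(list)

    def record(self, **values):
        # Each keyword argument arrives as varname=var.
        for name, value in values.items():
            self.history[name].append(value)


recorder = ToyRecorder()
# AAPL prices from the first three rows of the performance table further below.
for price in [105.35, 102.71, 100.70]:
    recorder.record(AAPL=price)

print(recorder.history['AAPL'])  # [105.35, 102.71, 100.7]
```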
Running the Algorithm
To test this algorithm on financial data, zipline provides three interfaces: a command-line interface, IPython Notebook magic, and run_algorithm().
Ingesting Data
If you haven’t ingested the data, then run:
$ zipline ingest [-b <bundle>]
where <bundle> is the name of the bundle to ingest, defaulting to quantopian-quandl.
You can check out the ingesting data section for more detail.
Command Line Interface
After installing zipline, you should be able to execute the following from your command line (e.g. cmd.exe on Windows, or the Terminal app on OSX):
$ zipline run --help
Usage: zipline run [OPTIONS]
  Run a backtest for the given algorithm.
Options:
  -f, --algofile FILENAME         The file that contains the algorithm to run.
  -t, --algotext TEXT             The algorithm script to run.
  -D, --define TEXT               Define a name to be bound in the namespace
                                  before executing the algotext. For example
                                  '-Dname=value'. The value may be any python
                                  expression. These are evaluated in order so
                                  they may refer to previously defined names.
  --data-frequency [daily|minute]
                                  The data frequency of the simulation.
                                  [default: daily]
  --capital-base FLOAT            The starting capital for the simulation.
                                  [default: 10000000.0]
  -b, --bundle BUNDLE-NAME        The data bundle to use for the simulation.
                                  [default: quandl]
  --bundle-timestamp TIMESTAMP    The date to lookup data on or before.
                                  [default: <current-time>]
  -s, --start DATE                The start date of the simulation.
  -e, --end DATE                  The end date of the simulation.
  -o, --output FILENAME           The location to write the perf data. If this
                                  is '-' the perf will be written to stdout.
                                  [default: -]
  --trading-calendar TRADING-CALENDAR
                                  The calendar you want to use e.g. LSE. NYSE
                                  is the default.
  --print-algo / --no-print-algo  Print the algorithm to stdout.
  --benchmark-file                The csv file that contains the benchmark
                                  returns (date, returns columns)
  --benchmark-symbol              The instrument's symbol to be used as
                                  a benchmark.
                                  (should exist in the ingested bundle)
  --benchmark-sid                 The sid of the instrument to be used as a
                                  benchmark.
                                  (should exist in the ingested bundle)
  --no-benchmark                  This flag is used to set the benchmark to
                                  zero. Alpha, beta and benchmark metrics
                                  are not calculated
  --help                          Show this message and exit.
As you can see, there are a couple of flags that specify where to find your algorithm (-f) as well as parameters specifying which data to use, defaulting to quandl. There are also arguments for the date range to run the algorithm over (--start and --end). To use a benchmark, you need to choose one of the benchmark options listed above. You can always use the --no-benchmark option, which uses zero returns as a benchmark (alpha, beta, and benchmark metrics are not calculated in this case).
Finally, you’ll want to save the performance metrics of your algorithm so that you can analyze how it performed. This is done via the --output flag, which causes zipline to write the performance DataFrame in the pickle Python file format.
Note that you can also define a configuration file with these parameters that you can then conveniently pass to the -c option, so that you don’t have to supply the command line args all the time (see the .conf files in the examples directory).
Thus, to execute our algorithm from above and save the results to buyapple_out.pickle, we call zipline run as follows:
$ zipline run -f ../zipline/examples/buyapple.py --start 2016-1-1 --end 2018-1-1 -o buyapple_out.pickle --no-benchmark
AAPL
[2018-01-03 04:30:51.843465] INFO: Performance: Simulated 503 trading days out of 503.
[2018-01-03 04:30:51.843598] INFO: Performance: first open: 2016-01-04 14:31:00+00:00
[2018-01-03 04:30:51.843672] INFO: Performance: last close: 2017-12-29 21:00:00+00:00
zipline run first calls the initialize() function, and then streams the historical stock price day by day through handle_data(). During each call to handle_data(), we instruct zipline to order 10 shares of AAPL. After the call of the order() function, zipline enters the ordered stock and amount in the order book. After the handle_data() function has finished, zipline looks for any open orders and tries to fill them on the next bar. If the trading volume is high enough for this stock, the order is executed after adding the commission and applying the slippage model, which models the influence of your order on the stock price, so your algorithm will be charged more than just the stock price * 10. (Note that you can also change the commission and slippage model that zipline uses; see the Quantopian docs for more information.)
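As a rough worked example of why the charge exceeds price * 10, the sketch below adds a per-share commission (with a minimum) and a fixed-fraction slippage to a buy order. The function estimate_buy_cost and all of its parameter values are hypothetical, not zipline's defaults; zipline's own commission and slippage models are configurable and more involved (its volume-based slippage, for instance, depends on the bar's traded volume). For comparison, in the command-line run shown below, capital_used on 2016-01-05 is -1028.1 against a position value of 1027.1, i.e. about $1 of costs on top of price * 10.

```python
def estimate_buy_cost(price, shares, commission_per_share=0.001,
                      min_commission=1.0, slippage_fraction=0.0005):
    """Total cash needed to buy `shares` under a hypothetical cost model.

    All default parameter values are assumptions for illustration only.
    Slippage raises the fill price by a fixed fraction, and the commission
    is charged per share but never less than `min_commission`.
    """
    fill_price = price * (1 + slippage_fraction)
    commission = max(shares * commission_per_share, min_commission)
    return fill_price * shares + commission


naive = 102.71 * 10  # price * 10, what you might naively expect to pay
total = estimate_buy_cost(102.71, 10)
print(round(naive, 2), round(total, 2))  # 1027.1 1028.61
```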
Let’s take a quick look at the performance DataFrame. For this, we use pandas from inside the IPython Notebook and print the first five rows. Note that zipline makes heavy use of pandas, especially for data input and output, so it’s worth spending some time to learn it.
import pandas as pd
perf = pd.read_pickle('buyapple_out.pickle') # read in perf DataFrame
perf.head()
| | AAPL | algo_volatility | algorithm_period_return | alpha | benchmark_period_return | benchmark_volatility | beta | capital_used | ending_cash | ending_exposure | ending_value | excess_return | gross_leverage | long_exposure | long_value | longs_count | max_drawdown | max_leverage | net_leverage | orders | period_close | period_label | period_open | pnl | portfolio_value | positions | returns | sharpe | short_exposure | short_value | shorts_count | sortino | starting_cash | starting_exposure | starting_value | trading_days | transactions | treasury_period_return |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2016-01-04 21:00:00+00:00 | 105.35 | NaN | 0.000000e+00 | NaN | -0.013983 | NaN | NaN | 0.0 | 10000000.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0 | 0.000000e+00 | 0.0 | 0.000000 | [{'dt': 2016-01-04 21:00:00+00:00, 'reason': N... | 2016-01-04 21:00:00+00:00 | 2016-01 | 2016-01-04 14:31:00+00:00 | 0.0 | 10000000.0 | [] | 0.000000e+00 | NaN | 0 | 0 | 0 | NaN | 10000000.0 | 0.0 | 0.0 | 1 | [] | 0.0 |
2016-01-05 21:00:00+00:00 | 102.71 | 0.000001 | -1.000000e-07 | -0.000022 | -0.012312 | 0.175994 | -0.000006 | -1028.1 | 9998971.9 | 1027.1 | 1027.1 | 0.0 | 0.000103 | 1027.1 | 1027.1 | 1 | -1.000000e-07 | 0.0 | 0.000103 | [{'dt': 2016-01-05 21:00:00+00:00, 'reason': N... | 2016-01-05 21:00:00+00:00 | 2016-01 | 2016-01-05 14:31:00+00:00 | -1.0 | 9999999.0 | [{'sid': Equity(8 [AAPL]), 'last_sale_price': ... | -1.000000e-07 | -11.224972 | 0 | 0 | 0 | -11.224972 | 10000000.0 | 0.0 | 0.0 | 2 | [{'order_id': '4011063b5c094e82a5391527044098b... | 0.0 |
2016-01-06 21:00:00+00:00 | 100.70 | 0.000019 | -2.210000e-06 | -0.000073 | -0.024771 | 0.137853 | 0.000054 | -1008.0 | 9997963.9 | 2014.0 | 2014.0 | 0.0 | 0.000201 | 2014.0 | 2014.0 | 1 | -2.210000e-06 | 0.0 | 0.000201 | [{'dt': 2016-01-06 21:00:00+00:00, 'reason': N... | 2016-01-06 21:00:00+00:00 | 2016-01 | 2016-01-06 14:31:00+00:00 | -21.1 | 9999977.9 | [{'sid': Equity(8 [AAPL]), 'last_sale_price': ... | -2.110000e-06 | -9.823839 | 0 | 0 | 0 | -9.588756 | 9998971.9 | 1027.1 | 1027.1 | 3 | [{'order_id': '3bf9fe20cc46468d99f741474226c03... | 0.0 |
2016-01-07 21:00:00+00:00 | 96.45 | 0.000064 | -1.081000e-05 | 0.000243 | -0.048168 | 0.167868 | 0.000300 | -965.5 | 9996998.4 | 2893.5 | 2893.5 | 0.0 | 0.000289 | 2893.5 | 2893.5 | 1 | -1.081000e-05 | 0.0 | 0.000289 | [{'dt': 2016-01-07 21:00:00+00:00, 'reason': N... | 2016-01-07 21:00:00+00:00 | 2016-01 | 2016-01-07 14:31:00+00:00 | -86.0 | 9999891.9 | [{'sid': Equity(8 [AAPL]), 'last_sale_price': ... | -8.600019e-06 | -10.592737 | 0 | 0 | 0 | -9.688947 | 9997963.9 | 2014.0 | 2014.0 | 4 | [{'order_id': '6af6aed9fbb44a6bba17e802051b94d... | 0.0 |
2016-01-08 21:00:00+00:00 | 96.96 | 0.000063 | -9.380000e-06 | 0.000466 | -0.058601 | 0.145654 | 0.000311 | -970.6 | 9996027.8 | 3878.4 | 3878.4 | 0.0 | 0.000388 | 3878.4 | 3878.4 | 1 | -1.081000e-05 | 0.0 | 0.000388 | [{'dt': 2016-01-08 21:00:00+00:00, 'reason': N... | 2016-01-08 21:00:00+00:00 | 2016-01 | 2016-01-08 14:31:00+00:00 | 14.3 | 9999906.2 | [{'sid': Equity(8 [AAPL]), 'last_sale_price': ... | 1.430015e-06 | -7.511729 | 0 | 0 | 0 | -7.519659 | 9996998.4 | 2893.5 | 2893.5 | 5 | [{'order_id': '18f64975732449a18fca06e9c69bf5c... | 0.0 |
As you can see, there is a row for each trading day, starting on the first business day of 2016. In the columns you can find various information about the state of your algorithm. The very first column, AAPL, was placed there by the record() function mentioned earlier and allows us to plot the price of Apple. For example, we could now easily examine how our portfolio value changed over time compared to the AAPL stock price.
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 12))
# Top panel: total portfolio value over time
ax1 = plt.subplot(211)
perf.portfolio_value.plot(ax=ax1)
ax1.set_ylabel('Portfolio Value')
# Bottom panel: the AAPL price stored with record(), sharing the x-axis
ax2 = plt.subplot(212, sharex=ax1)
perf.AAPL.plot(ax=ax2)
ax2.set_ylabel('AAPL Stock Price')
[Figure: portfolio value (top) and AAPL stock price (bottom) over 2016-2017]
As you can see, our algorithm performance as assessed by the portfolio_value closely matches that of the AAPL stock price. This is not surprising, as our algorithm only bought AAPL every chance it got.
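The columns of the performance DataFrame are also related to each other. Judging from the rows shown above, portfolio_value equals ending_cash plus ending_value, ending_value equals the shares held times the AAPL price, and pnl is the change in portfolio_value from the previous day. The snippet below checks this arithmetic on the 2016-01-06 row of the command-line run (20 shares after two filled orders of 10); these relationships are inferred from the numbers in the table rather than taken from zipline's documentation.

```python
# Values copied from the 2016-01-05 and 2016-01-06 rows of the
# command-line run's perf DataFrame.
aapl_price = 100.70                   # AAPL column, 2016-01-06
shares_held = 20                      # two filled orders of 10 shares each
ending_cash = 9997963.9               # ending_cash, 2016-01-06
previous_portfolio_value = 9999999.0  # portfolio_value, 2016-01-05

ending_value = shares_held * aapl_price
portfolio_value = ending_cash + ending_value
pnl = portfolio_value - previous_portfolio_value

print(round(ending_value, 2), round(portfolio_value, 2), round(pnl, 2))
# 2014.0 9999977.9 -21.1
```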
IPython Notebook
The IPython Notebook is a very powerful browser-based interface to a Python interpreter (this tutorial was written in it). As it is already the de facto interface for most quantitative researchers, zipline provides an easy way to run your algorithm inside the Notebook without requiring you to use the CLI.
To use it, you have to write your algorithm in a cell and let zipline know that it is supposed to run this algorithm. This is done via the %%zipline IPython magic command, which becomes available once zipline’s IPython extension has been loaded with %load_ext zipline. This magic takes the same arguments as the command line interface described above. Thus, to run the algorithm from above over the same date range, we first load the extension to register the magic and then execute the following cell (a cell magic such as %%zipline must be the first line of its own cell).
%load_ext zipline
%%zipline --start 2016-1-1 --end 2018-1-1
from zipline.api import symbol, order, record


def initialize(context):
    pass


def handle_data(context, data):
    # Order 10 shares of AAPL on every bar
    order(symbol('AAPL'), 10)
    # Store the current AAPL price under the name 'AAPL'
    record(AAPL=data.current(symbol('AAPL'), 'price'))
Note that we did not have to specify an input file as above, since the magic uses the contents of the cell and looks for your algorithm functions there. We also did not specify an output file: the magic returns the performance DataFrame we looked at above as the result of the cell, so in the next cell it is available through IPython’s _ variable.
_.head()
| | AAPL | algo_volatility | algorithm_period_return | alpha | benchmark_period_return | benchmark_volatility | beta | capital_used | ending_cash | ending_exposure | ending_value | excess_return | gross_leverage | long_exposure | long_value | longs_count | max_drawdown | max_leverage | net_leverage | orders | period_close | period_label | period_open | pnl | portfolio_value | positions | returns | sharpe | short_exposure | short_value | shorts_count | sortino | starting_cash | starting_exposure | starting_value | trading_days | transactions | treasury_period_return |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2016-01-04 21:00:00+00:00 | 105.35 | NaN | 0.000000e+00 | NaN | -0.013983 | NaN | NaN | 0.00 | 10000000.00 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0 | 0.000000e+00 | 0.0 | 0.000000 | [{'created': 2016-01-04 21:00:00+00:00, 'reaso... | 2016-01-04 21:00:00+00:00 | 2016-01 | 2016-01-04 14:31:00+00:00 | 0.00 | 10000000.00 | [] | 0.000000e+00 | NaN | 0 | 0 | 0 | NaN | 10000000.00 | 0.0 | 0.0 | 1 | [] | 0.0 |
2016-01-05 21:00:00+00:00 | 102.71 | 1.122497e-08 | -1.000000e-09 | -2.247510e-07 | -0.012312 | 0.175994 | -6.378047e-08 | -1027.11 | 9998972.89 | 1027.1 | 1027.1 | 0.0 | 0.000103 | 1027.1 | 1027.1 | 1 | -9.999999e-10 | 0.0 | 0.000103 | [{'created': 2016-01-04 21:00:00+00:00, 'reaso... | 2016-01-05 21:00:00+00:00 | 2016-01 | 2016-01-05 14:31:00+00:00 | -0.01 | 9999999.99 | [{'amount': 10, 'cost_basis': 102.711000000000... | -1.000000e-09 | -11.224972 | 0 | 0 | 0 | -11.224972 | 10000000.00 | 0.0 | 0.0 | 2 | [{'dt': 2016-01-05 21:00:00+00:00, 'order_id':... | 0.0 |
2016-01-06 21:00:00+00:00 | 100.70 | 1.842654e-05 | -2.012000e-06 | -4.883861e-05 | -0.024771 | 0.137853 | 5.744807e-05 | -1007.01 | 9997965.88 | 2014.0 | 2014.0 | 0.0 | 0.000201 | 2014.0 | 2014.0 | 1 | -2.012000e-06 | 0.0 | 0.000201 | [{'created': 2016-01-05 21:00:00+00:00, 'reaso... | 2016-01-06 21:00:00+00:00 | 2016-01 | 2016-01-06 14:31:00+00:00 | -20.11 | 9999979.88 | [{'amount': 20, 'cost_basis': 101.706000000000... | -2.011000e-06 | -9.171989 | 0 | 0 | 0 | -9.169708 | 9998972.89 | 1027.1 | 1027.1 | 3 | [{'dt': 2016-01-06 21:00:00+00:00, 'order_id':... | 0.0 |
2016-01-07 21:00:00+00:00 | 96.45 | 6.394658e-05 | -1.051300e-05 | 2.633450e-04 | -0.048168 | 0.167868 | 3.005102e-04 | -964.51 | 9997001.37 | 2893.5 | 2893.5 | 0.0 | 0.000289 | 2893.5 | 2893.5 | 1 | -1.051300e-05 | 0.0 | 0.000289 | [{'created': 2016-01-06 21:00:00+00:00, 'reaso... | 2016-01-07 21:00:00+00:00 | 2016-01 | 2016-01-07 14:31:00+00:00 | -85.01 | 9999894.87 | [{'amount': 30, 'cost_basis': 99.9543333333335... | -8.501017e-06 | -10.357397 | 0 | 0 | 0 | -9.552189 | 9997965.88 | 2014.0 | 2014.0 | 4 | [{'dt': 2016-01-07 21:00:00+00:00, 'order_id':... | 0.0 |
2016-01-08 21:00:00+00:00 | 96.96 | 6.275294e-05 | -8.984000e-06 | 4.879306e-04 | -0.058601 | 0.145654 | 3.118401e-04 | -969.61 | 9996031.76 | 3878.4 | 3878.4 | 0.0 | 0.000388 | 3878.4 | 3878.4 | 1 | -1.051300e-05 | 0.0 | 0.000388 | [{'created': 2016-01-07 21:00:00+00:00, 'reaso... | 2016-01-08 21:00:00+00:00 | 2016-01 | 2016-01-08 14:31:00+00:00 | 15.29 | 9999910.16 | [{'amount': 40, 'cost_basis': 99.2060000000002... | 1.529016e-06 | -7.215497 | 0 | 0 | 0 | -7.301134 | 9997001.37 | 2893.5 | 2893.5 | 5 | [{'dt': 2016-01-08 21:00:00+00:00, 'order_id':... | 0.0 |
Access to Previous Prices Using history
Working example: Dual Moving Average Cross-Over
The Dual Moving Average (DMA) is a classic momentum strategy. It’s probably not used by any serious trader anymore, but it is still very instructive. The basic idea is that we compute two rolling or moving averages (mavg): one with a longer window that is supposed to capture long-term trends, and one with a shorter window that is supposed to capture short-term trends. Once the short mavg crosses the long mavg from below, we assume that the stock price has upward momentum and go long the stock. If the short mavg crosses from above, we exit the position, as we assume the stock will go down further.
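The crossover rule can be sketched in plain Python before looking at the zipline version. Like the algorithm further below, this sketch compares the two averages on every bar instead of detecting the exact crossing moment; moving_average, dma_signal, the tiny 3- and 5-bar windows, and the price series are all illustrative assumptions (the zipline algorithm uses 100 and 300 daily bars).

```python
def moving_average(prices, window):
    """Mean of the last `window` prices, or None if there are too few."""
    if len(prices) < window:
        return None
    return sum(prices[-window:]) / window


def dma_signal(prices, short_window, long_window):
    """'long' if the short mavg is above the long mavg, 'exit' if below."""
    short_mavg = moving_average(prices, short_window)
    long_mavg = moving_average(prices, long_window)
    if short_mavg is None or long_mavg is None:
        return None  # not enough history for both windows yet
    if short_mavg > long_mavg:
        return 'long'
    if short_mavg < long_mavg:
        return 'exit'
    return None  # averages are equal: no signal


# Made-up prices: flat, a rally, then a sell-off.
prices = [10, 10, 10, 10, 10, 11, 12, 13, 12, 10, 8, 7]
signals = [dma_signal(prices[:i + 1], short_window=3, long_window=5)
           for i in range(len(prices))]
print(signals)
# [None, None, None, None, None, 'long', 'long', 'long', 'long', 'long', 'exit', 'exit']
```

Note how the short average reacts to the rally and the sell-off a few bars before the long one does; that lag is what the crossover exploits.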
As we need access to previous prices to implement this strategy, we need a new concept: history.
data.history() is a convenience function that returns a trailing window of data for you. Its first argument is the asset (or assets), the second is the field (for example 'price'), followed by the number of bars you want to collect (bar_count) and the bar frequency (either '1d' or '1m', but note that you need to have minute-level data for using '1m'). For a more detailed description of history()’s features, see the Quantopian docs.
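Conceptually, a trailing window behaves like a fixed-length queue of the most recent bars. The hypothetical TrailingWindow class below sketches that idea with collections.deque; it is not zipline's implementation. In particular, raising an error when too few bars are available is a choice made for this sketch, and it mirrors why the DMA algorithm waits for its first 300 bars before trading.

```python
from collections import deque


class TrailingWindow:
    """Hypothetical trailing window of the most recent prices (a sketch only)."""

    def __init__(self, max_bars):
        # A deque with maxlen silently drops the oldest bar when full.
        self._prices = deque(maxlen=max_bars)

    def update(self, price):
        """Append the newest bar's price; call once per bar."""
        self._prices.append(price)

    def history(self, bar_count):
        """Return the last `bar_count` prices, oldest first."""
        if bar_count > len(self._prices):
            raise ValueError('not enough bars yet; wait until the window is full')
        return list(self._prices)[-bar_count:]


window = TrailingWindow(max_bars=5)
for price in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]:
    window.update(price)

print(window.history(3))  # [4.0, 5.0, 6.0]
print(window.history(5))  # [2.0, 3.0, 4.0, 5.0, 6.0]
```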
Let’s look at the strategy, which should make this clear:
%%zipline --start 2014-1-1 --end 2018-1-1 -o dma.pickle
from zipline.api import order_target, record, symbol
import matplotlib.pyplot as plt


def initialize(context):
    context.i = 0
    context.asset = symbol('AAPL')


def handle_data(context, data):
    # Skip first 300 days to get full windows
    context.i += 1
    if context.i < 300:
        return

    # Compute averages
    # data.history() returns a pandas Series here because we ask for
    # a single asset and a single field.
    short_mavg = data.history(context.asset, 'price', bar_count=100, frequency="1d").mean()
    long_mavg = data.history(context.asset, 'price', bar_count=300, frequency="1d").mean()

    # Trading logic
    if short_mavg > long_mavg:
        # order_target orders as many shares as needed to
        # achieve the desired number of shares.
        order_target(context.asset, 100)
    elif short_mavg < long_mavg:
        order_target(context.asset, 0)

    # Save values for later inspection
    record(AAPL=data.current(context.asset, 'price'),
           short_mavg=short_mavg,
           long_mavg=long_mavg)


def analyze(context, perf):
    fig = plt.figure()
    ax1 = fig.add_subplot(211)
    perf.portfolio_value.plot(ax=ax1)
    ax1.set_ylabel('portfolio value in $')

    ax2 = fig.add_subplot(212)
    perf['AAPL'].plot(ax=ax2)
    perf[['short_mavg', 'long_mavg']].plot(ax=ax2)

    # Keep only the days with at least one transaction (.loc replaces the
    # .ix indexer, which has been removed from pandas).
    perf_trans = perf.loc[[t != [] for t in perf.transactions]]
    buys = perf_trans.loc[[t[0]['amount'] > 0 for t in perf_trans.transactions]]
    sells = perf_trans.loc[
        [t[0]['amount'] < 0 for t in perf_trans.transactions]]
    # Mark buys with magenta up-triangles and sells with black down-triangles
    ax2.plot(buys.index, perf.short_mavg.loc[buys.index],
             '^', markersize=10, color='m')
    ax2.plot(sells.index, perf.short_mavg.loc[sells.index],
             'v', markersize=10, color='k')
    ax2.set_ylabel('price in $')
    plt.legend(loc=0)
    plt.show()
Here we are explicitly defining an analyze() function that gets automatically called once the backtest is done (this is not possible on Quantopian currently).
Although it might not be directly apparent, the power of history() (pun intended) cannot be underestimated, as most algorithms make use of prior market developments in one form or another. You could easily devise a strategy that trains a classifier with scikit-learn to predict future market movements based on past prices. (Note that scikit-learn works with numpy.ndarrays; if a function does not accept a pandas.DataFrame directly, you can simply pass the DataFrame’s underlying ndarray via .values.)
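As a sketch of the first step of such a strategy, the hypothetical make_lagged_dataset function below turns a price history into a feature matrix of past returns and binary labels (did the next return rise?). The feature and label choices are illustrative assumptions. The lists are kept in plain Python so the example runs without third-party packages; in practice you would convert them to arrays and pass them to a scikit-learn classifier’s fit() method.

```python
def make_lagged_dataset(prices, n_lags):
    """Build (X, y) for a hypothetical 'will the next return be positive?' task.

    Each row of X holds the `n_lags` returns that precede the labelled
    return, so only past information is used for each prediction.
    """
    returns = [(b - a) / a for a, b in zip(prices[:-1], prices[1:])]
    X, y = [], []
    for t in range(n_lags, len(returns)):
        X.append(returns[t - n_lags:t])       # features: past returns only
        y.append(1 if returns[t] > 0 else 0)  # label: direction of return t
    return X, y


prices = [100.0, 101.0, 100.0, 102.0, 103.0, 101.0]  # made-up prices
X, y = make_lagged_dataset(prices, n_lags=2)
print(len(X), y)  # 3 [1, 1, 0]
```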
We also used the order_target() function above. This and other functions like it can make order management and portfolio rebalancing much easier. See the Quantopian documentation on order functions for more details.
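The idea behind order_target() can be sketched as ordering only the difference between the target and the current position. The hypothetical shares_to_order helper below replays the DMA targets (100 shares while the short average is above the long one, 0 otherwise); it assumes every order fills immediately and ignores open orders, commissions, and slippage.

```python
def shares_to_order(current_shares, target_shares):
    """Hypothetical helper: the order an order_target-style call would place."""
    return target_shares - current_shares


# Replay a sequence of DMA targets, one per bar.
position = 0
placed = []
for target in [100, 100, 100, 0, 0, 100]:
    amount = shares_to_order(position, target)
    if amount != 0:
        placed.append(amount)
        position += amount  # assume the order fills completely
print(placed, position)  # [100, -100, 100] 100
```

Repeating the same target on consecutive bars places no new orders, which is why the DMA algorithm can call order_target() on every bar without piling up shares.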
Conclusions
We hope that this tutorial gave you a little insight into the architecture, API, and features of zipline. For next steps, check out some of the examples.
Feel free to ask questions on our mailing list, report problems on our GitHub issue tracker, get involved, and check out Quantopian.