Skip to main content

Data Pipelining With Coinbase

··1210 words·6 mins

Introduction #

This is a quick post to illustrate how I collect and store crypto asset data from Coinbase. Essentially, the scripts below pull minute, hour, and daily data for the specified assets and if there is an existing data record, then the existing record is updated to include the most recent data. If there is not an existing data record, then the complete historical record from coinbase is pulled and stored.

Python Imports #

# Standard Library
import datetime
import os
import sys
import warnings

from datetime import datetime
from pathlib import Path

# Suppress warnings
warnings.filterwarnings("ignore")

Add Directories To Path #

# Add the source subdirectory to the system path to allow import config from settings.py
current_directory = Path(os.getcwd())
website_base_directory = current_directory.parent.parent.parent
src_directory = website_base_directory / "src"
sys.path.append(str(src_directory)) if str(src_directory) not in sys.path else None

# Import settings.py
from settings import config

# Add configured directories from config to path
SOURCE_DIR = config("SOURCE_DIR")
sys.path.append(str(Path(SOURCE_DIR))) if str(Path(SOURCE_DIR)) not in sys.path else None

# Add other configured directories
BASE_DIR = config("BASE_DIR")
CONTENT_DIR = config("CONTENT_DIR")
POSTS_DIR = config("POSTS_DIR")
PAGES_DIR = config("PAGES_DIR")
PUBLIC_DIR = config("PUBLIC_DIR")
SOURCE_DIR = config("SOURCE_DIR")
DATA_DIR = config("DATA_DIR")
DATA_MANUAL_DIR = config("DATA_MANUAL_DIR")

Python Functions #

Here are the functions needed for this project:

from coinbase_fetch_available_products import coinbase_fetch_available_products
from coinbase_fetch_full_history import coinbase_fetch_full_history
from coinbase_fetch_historical_candles import coinbase_fetch_historical_candles
from coinbase_pull_data import coinbase_pull_data

Function Usage #

Coinbase Fetch Available Products #

This script pulls the list of available assets based on the inputs for base and quote currency. Here’s an example:

df = coinbase_fetch_available_products(
    base_currency=None,
    quote_currency="USD",
    status="online",
)

In this example, the quote_currency is provided as “USD”. This script checks all available assets that are priced against USD and returns a dataframe listing all available assets:

display(df)

idbase_currencyquote_currencyquote_incrementbase_incrementdisplay_namemin_market_fundsmargin_enabledpost_onlylimit_onlycancel_onlystatusstatus_messagetrading_disabledfx_stablecoinmax_slippage_percentageauction_modehigh_bid_limit_percentage
2400-USD00USD0.00010.0100-USD1FalseFalseFalseFalseonlineFalseFalse0.03000000False
3421INCH-USD1INCHUSD0.0010.011INCH-USD1FalseFalseFalseFalseonlineFalseFalse0.03000000False
5342Z-USD2ZUSD0.000010.012Z/USD1FalseFalseFalseFalseonlineFalseFalse0.03000000False
713A8-USDA8USD0.00010.01A8/USD1FalseFalseFalseFalseonlineFalseFalse0.03000000False
115AAVE-USDAAVEUSD0.010.001AAVE-USD1FalseFalseFalseFalseonlineFalseFalse0.03000000False
.........................................................
442ZKC-USDZKCUSD0.00010.01ZKC/USD1FalseFalseFalseFalseonlineFalseFalse0.03000000False
352ZKP-USDZKPUSD0.000010.1ZKP/USD1FalseFalseFalseFalseonlineFalseFalse0.03000000False
88ZORA-USDZORAUSD0.000011ZORA/USD1FalseFalseFalseFalseonlineFalseFalse0.03000000False
467ZRO-USDZROUSD0.0010.01ZRO/USD1FalseFalseFalseFalseonlineFalseFalse0.03000000False
610ZRX-USDZRXUSD0.0000010.00001ZRX-USD1FalseFalseFalseFalseonlineFalseFalse0.03000000False

375 rows × 18 columns

Coinbase Fetch Historical Candles #

This script pulls the historical candles:

df = coinbase_fetch_historical_candles(
    product_id="BTC-USD",
    start=datetime(2025, 1, 1),
    end=datetime(2025, 1, 1),
    granularity=86_400,
)

Specifically, the date/time, open, high, low, close, and volume levels:

display(df)

timelowhighopenclosevolume
02025-01-0192743.6394960.9193347.5994383.596871.738482

Coinbase Fetch Full History #

This script pulls the full history for a specified asset:

df = coinbase_fetch_full_history(
    product_id="BTC-USD",
    start=datetime(2025, 1, 1),
    end=datetime(2025, 1, 31),
    granularity=86_400,
)

The example above pulls the daily data for 1 month, but can handle data ranges of years because it uses the coinbase_fetch_historical_candles to pull 300 candles at a time to ensure that the API is not overloaded and drops data. Here’s the results for the above:

display(df)

timelowhighopenclosevolume
02025-01-0192743.6394960.9193347.5994383.596871.738482
12025-01-0294177.0097776.9994383.5996903.1910912.473840
22025-01-0396016.6398969.9296905.4898136.519021.885382
32025-01-0497516.6598761.0298139.8598209.852742.089606
42025-01-0597250.0098814.0098209.8598345.332377.921759
52025-01-0697900.00102500.0098347.65102279.4115173.556068
62025-01-0796105.11102735.99102279.4196941.9816587.286922
72025-01-0892500.0097254.3596941.9895036.6314182.297395
82025-01-0991187.0095363.2695033.1892547.449712.378532
92025-01-1092209.2595862.9292547.4494701.1812634.034078
102025-01-1193804.0594983.6594701.4894565.022638.699568
112025-01-1293670.3095383.8494569.9194509.622025.816130
122025-01-1389028.6495900.0094507.2494506.4513094.863595
132025-01-1494311.3697353.2994507.3596534.9611210.742267
142025-01-1596400.00100716.4596534.97100510.2313610.747294
152025-01-1697277.58100880.00100504.2799981.7812312.373669
162025-01-1799937.81105970.0099981.46104107.0020518.309493
172025-01-18102233.45104933.15104107.00104435.007835.299918
182025-01-1999518.00106314.44104435.01101211.1313312.636856
192025-01-2099416.27109358.01101217.78102145.4332342.183113
202025-01-21100051.00107291.10102145.42106159.2619411.234890
212025-01-22103100.00106431.34106159.27103667.1110730.018962
222025-01-23101200.01106870.87103659.60103926.3625064.864999
232025-01-24102751.92107200.00103926.36104850.2712921.993614
242025-01-25104104.00105294.00104866.13104733.563404.853083
252025-01-26102452.24105478.80104729.92102563.004575.366115
262025-01-2797715.03103228.46102565.28102062.4223647.141119
272025-01-28100213.80103770.85102063.92101290.009488.534295
282025-01-29101275.60104829.64101290.01103747.2511403.202789
292025-01-30103289.74106484.77103747.25104742.6413061.348812
302025-01-31101506.00106090.00104742.63102411.2613313.681045

Coinbase Pull Data #

This script combines the above functions to perform the following:

  1. Attempt to read an existing pickle data file
  2. If a data file exists, then pull updated data
  3. Otherwise, pull all historical data available for that asset on Coinbase
  4. Store pickle and/or excel files of the data in the specified directories

Through the base_directory, source, and asset_class variables the script knows where in the local filesystem to look for an existing pickle file and the store the resulting updated pickle and/or excel files:

df = coinbase_pull_data(
    base_directory=DATA_DIR,
    source="Coinbase",
    asset_class="Cryptocurrencies",
    excel_export=False,
    pickle_export=True,
    output_confirmation=True,
    base_currency="BTC",
    quote_currency="USD",
    granularity=60, # 60=minute, 3600=hourly, 86400=daily
    status='online', # default status is 'online'
    start_date=datetime(current_year, current_month - 1, 1), # default start date
    end_date=datetime.now() - timedelta(days=1), # updates data through 1 day ago due to lag in data availability
)

By passing None as the base_currency and/or the quote_currency, the script will use the coinbase_fetch_available_products function to pull the list of all the available products, and then pulls data for all assets in that list. This functionality is incredibly useful, and makes acquiring data very straightforward, especially for a set of products for a specific base_currency or quote_currency.

The example above pulls the data for BTC-USD, and stores it in the following system directory:

DATA_DIR/Coinbase/Cryptocurrencies/Minute

And here is the filesystem tree output for the Coinbase directory:

$ tree Coinbase/
Coinbase/
└── Cryptocurrencies
    ├── Daily
    │   ├── BTC-USD.pkl
    │   ├── BTC-USD.xlsx
    │   ├── ETH-USD.pkl
    │   ├── ETH-USD.xlsx
    │   ├── SOL-USD.pkl
    │   ├── SOL-USD.xlsx
    │   ├── XRP-USD.pkl
    │   └── XRP-USD.xlsx
    ├── Hourly
    │   ├── BTC-USD.pkl
    │   ├── BTC-USD.xlsx
    │   ├── ETH-USD.pkl
    │   ├── ETH-USD.xlsx
    │   ├── SOL-USD.pkl
    │   ├── SOL-USD.xlsx
    │   ├── XRP-USD.pkl
    │   └── XRP-USD.xlsx
    └── Minute
        ├── BTC-USD.pkl
        ├── ETH-USD.pkl
        ├── SOL-USD.pkl
        └── XRP-USD.pkl

5 directories, 20 files

References #

  1. https://www.coinbase.com/

Code #

The Jupyter notebook with the functions and all other code is available here.
The HTML export of the jupyter notebook is available here.
The PDF export of the jupyter notebook is available here.