Introduction¶
This is a quick post to illustrate how I collect and store crypto asset data from Coinbase. Essentially, the scripts below pull minute, hour, and daily data for the specified assets and if there is an existing data record, then the existing record is updated to include the most recent data. If there is not an existing data record, then the complete historical record from coinbase is pulled and stored.
Python Imports¶
# Standard Library
import datetime
import os
import sys
import warnings
from datetime import datetime
from pathlib import Path
# Suppress warnings
warnings.filterwarnings("ignore")
Add Directories To Path¶
# Add the source subdirectory to the system path to allow import config from settings.py
current_directory = Path(os.getcwd())
website_base_directory = current_directory.parent.parent.parent
src_directory = website_base_directory / "src"
sys.path.append(str(src_directory)) if str(src_directory) not in sys.path else None
# Import settings.py
from settings import config
# Add configured directories from config to path
SOURCE_DIR = config("SOURCE_DIR")
sys.path.append(str(Path(SOURCE_DIR))) if str(Path(SOURCE_DIR)) not in sys.path else None
# Add other configured directories
BASE_DIR = config("BASE_DIR")
CONTENT_DIR = config("CONTENT_DIR")
POSTS_DIR = config("POSTS_DIR")
PAGES_DIR = config("PAGES_DIR")
PUBLIC_DIR = config("PUBLIC_DIR")
SOURCE_DIR = config("SOURCE_DIR")
DATA_DIR = config("DATA_DIR")
DATA_MANUAL_DIR = config("DATA_MANUAL_DIR")
Python Functions¶
Here are the functions needed for this project:
- coinbase_fetch_available_products: Fetch available products from Coinbase Exchange API.
- coinbase_fetch_full_history: Fetch full historical data for a given product from Coinbase Exchange API.
- coinbase_fetch_historical_candles: Fetch historical candle data for a given product from Coinbase Exchange API.
- coinbase_pull_data: Update existing record or pull full historical data for a given product from Coinbase Exchange API.
from coinbase_fetch_available_products import coinbase_fetch_available_products
from coinbase_fetch_full_history import coinbase_fetch_full_history
from coinbase_fetch_historical_candles import coinbase_fetch_historical_candles
from coinbase_pull_data import coinbase_pull_data
df = coinbase_fetch_available_products(
base_currency=None,
quote_currency="USD",
status="online",
)
In this example, the quote_currency is provided as "USD". This script checks all available assets that are priced against USD and returns a dataframe listing all available assets:
display(df)
| id | base_currency | quote_currency | quote_increment | base_increment | display_name | min_market_funds | margin_enabled | post_only | limit_only | cancel_only | status | status_message | trading_disabled | fx_stablecoin | max_slippage_percentage | auction_mode | high_bid_limit_percentage | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 24 | 00-USD | 00 | USD | 0.0001 | 0.01 | 00-USD | 1 | False | False | False | False | online | False | False | 0.03000000 | False | ||
| 342 | 1INCH-USD | 1INCH | USD | 0.001 | 0.01 | 1INCH-USD | 1 | False | False | False | False | online | False | False | 0.03000000 | False | ||
| 534 | 2Z-USD | 2Z | USD | 0.00001 | 0.01 | 2Z/USD | 1 | False | False | False | False | online | False | False | 0.03000000 | False | ||
| 713 | A8-USD | A8 | USD | 0.0001 | 0.01 | A8/USD | 1 | False | False | False | False | online | False | False | 0.03000000 | False | ||
| 115 | AAVE-USD | AAVE | USD | 0.01 | 0.001 | AAVE-USD | 1 | False | False | False | False | online | False | False | 0.03000000 | False | ||
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 442 | ZKC-USD | ZKC | USD | 0.0001 | 0.01 | ZKC/USD | 1 | False | False | False | False | online | False | False | 0.03000000 | False | ||
| 352 | ZKP-USD | ZKP | USD | 0.00001 | 0.1 | ZKP/USD | 1 | False | False | False | False | online | False | False | 0.03000000 | False | ||
| 88 | ZORA-USD | ZORA | USD | 0.00001 | 1 | ZORA/USD | 1 | False | False | False | False | online | False | False | 0.03000000 | False | ||
| 467 | ZRO-USD | ZRO | USD | 0.001 | 0.01 | ZRO/USD | 1 | False | False | False | False | online | False | False | 0.03000000 | False | ||
| 610 | ZRX-USD | ZRX | USD | 0.000001 | 0.00001 | ZRX-USD | 1 | False | False | False | False | online | False | False | 0.03000000 | False |
375 rows × 18 columns
Coinbase Fetch Historical Candles¶
This script pulls the historical candles:
df = coinbase_fetch_historical_candles(
product_id="BTC-USD",
start=datetime(2025, 1, 1),
end=datetime(2025, 1, 1),
granularity=86_400,
)
Specifically, the date/time, open, high, low, close, and volume levels:
display(df)
| time | low | high | open | close | volume | |
|---|---|---|---|---|---|---|
| 0 | 2025-01-01 | 92743.63 | 94960.91 | 93347.59 | 94383.59 | 6871.738482 |
Coinbase Fetch Full History¶
This script pulls the full history for a specified asset:
df = coinbase_fetch_full_history(
product_id="BTC-USD",
start=datetime(2025, 1, 1),
end=datetime(2025, 1, 31),
granularity=86_400,
)
The example above pulls the daily data for 1 month, but can handle data ranges of years because it uses the coinbase_fetch_historical_candles to pull 300 candles at a time to ensure that the API is not overloaded and drops data. Here's the results for the above:
display(df)
| time | low | high | open | close | volume | |
|---|---|---|---|---|---|---|
| 0 | 2025-01-01 | 92743.63 | 94960.91 | 93347.59 | 94383.59 | 6871.738482 |
| 1 | 2025-01-02 | 94177.00 | 97776.99 | 94383.59 | 96903.19 | 10912.473840 |
| 2 | 2025-01-03 | 96016.63 | 98969.92 | 96905.48 | 98136.51 | 9021.885382 |
| 3 | 2025-01-04 | 97516.65 | 98761.02 | 98139.85 | 98209.85 | 2742.089606 |
| 4 | 2025-01-05 | 97250.00 | 98814.00 | 98209.85 | 98345.33 | 2377.921759 |
| 5 | 2025-01-06 | 97900.00 | 102500.00 | 98347.65 | 102279.41 | 15173.556068 |
| 6 | 2025-01-07 | 96105.11 | 102735.99 | 102279.41 | 96941.98 | 16587.286922 |
| 7 | 2025-01-08 | 92500.00 | 97254.35 | 96941.98 | 95036.63 | 14182.297395 |
| 8 | 2025-01-09 | 91187.00 | 95363.26 | 95033.18 | 92547.44 | 9712.378532 |
| 9 | 2025-01-10 | 92209.25 | 95862.92 | 92547.44 | 94701.18 | 12634.034078 |
| 10 | 2025-01-11 | 93804.05 | 94983.65 | 94701.48 | 94565.02 | 2638.699568 |
| 11 | 2025-01-12 | 93670.30 | 95383.84 | 94569.91 | 94509.62 | 2025.816130 |
| 12 | 2025-01-13 | 89028.64 | 95900.00 | 94507.24 | 94506.45 | 13094.863595 |
| 13 | 2025-01-14 | 94311.36 | 97353.29 | 94507.35 | 96534.96 | 11210.742267 |
| 14 | 2025-01-15 | 96400.00 | 100716.45 | 96534.97 | 100510.23 | 13610.747294 |
| 15 | 2025-01-16 | 97277.58 | 100880.00 | 100504.27 | 99981.78 | 12312.373669 |
| 16 | 2025-01-17 | 99937.81 | 105970.00 | 99981.46 | 104107.00 | 20518.309493 |
| 17 | 2025-01-18 | 102233.45 | 104933.15 | 104107.00 | 104435.00 | 7835.299918 |
| 18 | 2025-01-19 | 99518.00 | 106314.44 | 104435.01 | 101211.13 | 13312.636856 |
| 19 | 2025-01-20 | 99416.27 | 109358.01 | 101217.78 | 102145.43 | 32342.183113 |
| 20 | 2025-01-21 | 100051.00 | 107291.10 | 102145.42 | 106159.26 | 19411.234890 |
| 21 | 2025-01-22 | 103100.00 | 106431.34 | 106159.27 | 103667.11 | 10730.018962 |
| 22 | 2025-01-23 | 101200.01 | 106870.87 | 103659.60 | 103926.36 | 25064.864999 |
| 23 | 2025-01-24 | 102751.92 | 107200.00 | 103926.36 | 104850.27 | 12921.993614 |
| 24 | 2025-01-25 | 104104.00 | 105294.00 | 104866.13 | 104733.56 | 3404.853083 |
| 25 | 2025-01-26 | 102452.24 | 105478.80 | 104729.92 | 102563.00 | 4575.366115 |
| 26 | 2025-01-27 | 97715.03 | 103228.46 | 102565.28 | 102062.42 | 23647.141119 |
| 27 | 2025-01-28 | 100213.80 | 103770.85 | 102063.92 | 101290.00 | 9488.534295 |
| 28 | 2025-01-29 | 101275.60 | 104829.64 | 101290.01 | 103747.25 | 11403.202789 |
| 29 | 2025-01-30 | 103289.74 | 106484.77 | 103747.25 | 104742.64 | 13061.348812 |
| 30 | 2025-01-31 | 101506.00 | 106090.00 | 104742.63 | 102411.26 | 13313.681045 |
Coinbase Pull Data¶
This script combines the above functions to perform the following:
- Attempt to read an existing pickle data file
- If a data file exists, then pull updated data
- Otherwise, pull all historical data available for that asset on Coinbase
- Store pickle and/or excel files of the data in the specified directories
Through the base_directory, source, and asset_class variables the script knows where in the local filesystem to look for an existing pickle file and the store the resulting updated pickle and/or excel files:
df = coinbase_pull_data(
base_directory=DATA_DIR,
source="Coinbase",
asset_class="Cryptocurrencies",
excel_export=False,
pickle_export=True,
output_confirmation=True,
base_currency="BTC",
quote_currency="USD",
granularity=60, # 60=minute, 3600=hourly, 86400=daily
status='online', # default status is 'online'
start_date=datetime(current_year, current_month - 1, 1), # default start date
end_date=datetime.now() - timedelta(days=1), # updates data through 1 day ago due to lag in data availability
)
By passing None as the base_currency and/or the quote_currency, the script will use the coinbase_fetch_available_products function to pull the list of all the available products, and then pulls data for all assets in that list. This functionality is incredibly useful, and makes acquiring data very straightforward, especially for a set of products for a specific base_currency or quote_currency.
The example above pulls the data for BTC-USD, and stores it in the following system directory:
DATA_DIR/Coinbase/Cryptocurrencies/Minute
And here is the filesystem tree output for the Coinbase directory:
$ tree Coinbase/
Coinbase/
└── Cryptocurrencies
├── Daily
│ ├── BTC-USD.pkl
│ ├── BTC-USD.xlsx
│ ├── ETH-USD.pkl
│ ├── ETH-USD.xlsx
│ ├── SOL-USD.pkl
│ ├── SOL-USD.xlsx
│ ├── XRP-USD.pkl
│ └── XRP-USD.xlsx
├── Hourly
│ ├── BTC-USD.pkl
│ ├── BTC-USD.xlsx
│ ├── ETH-USD.pkl
│ ├── ETH-USD.xlsx
│ ├── SOL-USD.pkl
│ ├── SOL-USD.xlsx
│ ├── XRP-USD.pkl
│ └── XRP-USD.xlsx
└── Minute
├── BTC-USD.pkl
├── ETH-USD.pkl
├── SOL-USD.pkl
└── XRP-USD.pkl
5 directories, 20 files