Cleaning A Bloomberg Data Excel Export

Table of Contents

Introduction #

In this tutorial, we will write a python function that imports an excel export from Bloomberg, removes ancillary rows and columns, and leaves the data in a format where it can then be used in time series analysis.

Example of a Bloomberg excel export #

We will use the SPX index data in this example. Exporting the data from Bloomberg using the excel Bloomberg add-on yields data in the following format:

Format of data in excel export from Bloomberg

Data modifications #

The above format isn’t horrible, but we want to perform the following modifications:

Remove the first six rows of the data
Convert the 7th row to become column headings
Rename column 2 to “Close” to represent the closing price
Remove column 3, as we are not concerned about volume
Export to excel and make the name of the excel worksheet “data”

Assumptions #

The remainder of this tutorial assumes the following:

Your excel file is named “SPX_Index.xlsx”
The worksheet in the excel file is named “Worksheet”
You have the pandas library installed
You have the OpenPyXL library installed

Python function to modify the data #

The following function will perform the modifications mentioned above:

# This function takes an excel export from Bloomberg and 
# removes all excess data leaving date and close columns

# Imports
import pandas as pd

# Function definition
def bb_data_updater(fund):

    # File name variable
    file = fund + "_Index.xlsx"
    
    # Import data from file as a pandas dataframe
    df = pd.read_excel(file, sheet_name = 'Worksheet', engine='openpyxl')
    
    # Set the column headings from row 5 (which is physically row 6)
    df.columns = df.iloc[5]
    
    # Set the column heading for the index to be "None"
    df.rename_axis(None, axis=1, inplace = True)
    
    # Drop the first 6 rows, 0 - 5
    df.drop(df.index[0:6], inplace=True)
    
    # Set the date column as the index
    df.set_index('Date', inplace = True)
    
    # Drop the volume column
    try:
        df.drop(columns = {'PX_VOLUME'}, inplace = True)
    except KeyError:
        pass
        
    # Rename column
    df.rename(columns = {'PX_LAST':'Close'}, inplace = True)
    
    # Sort by date
    df.sort_values(by=['Date'], inplace = True)
    
    # Export data to excel
    file = fund + ".xlsx"
    df.to_excel(file, sheet_name='data')
    
    # Output confirmation
    print(f"The last date of data for {fund} is: ")
    print(df[-1:])
    print(f"Bloomberg data conversion complete for {fund} data")
    return print(f"--------------------")

Let’s break this down line by line.

Imports #

First, we need to import pandas:

import pandas as pd

Import excel data file #

Then import the excel file as a pandas dataframe:

# File name variable
file = fund + "_Index.xlsx"

# Import data from file as a pandas dataframe
df = pd.read_excel(file, sheet_name = 'Worksheet', engine='openpyxl')

Running:

df.head(10)

Gives us:

Dataframe excel import

Set column headings #

Next, set the column heading:

# Set the column headings from row 5 (which is physically row 6)
df.columns = df.iloc[5]

Now, running:

df.head(10)

Gives us:

Set column headings

Remove index heading #

Next, remove the column heading from the index column:

# Set the column heading for the index to be "None"
df.rename_axis(None, axis=1, inplace = True)

Note: The axis=1 argument here specifies the column index.

Now, running:

df.head(10)

Gives us:

Remove index heading

Drop rows #

Next, we want to remove the first 6 rows that have unneeded data:

# Drop the first 6 rows, 0 - 5
df.drop(df.index[0:6], inplace=True)

Note: When dropping rows, the range to drop begins with row 0 and continues up to - but not including - row 6.

Now, running:

df.head(10)

Gives us:

Remove rows

Set index #

Next, we want to set the date column as the index:

# Set the date column as the index
df.set_index('Date', inplace = True)

Now, running:

df.head(10)

Gives us:

Set index

Drop the “PX_VOLUME” column #

Next, we want to drop the volume column:

# Drop the volume column
try:
    df.drop(columns = {'PX_VOLUME'}, inplace = True)
except KeyError:
    pass

For some data records, the volume column does not exist. Therefore, we try, and if it fails with a KeyError, then we assume the “PX_VOLUME” column does not exist, and just pass to move on.

Now, running:

df.head(10)

Gives us:

Drop volume

Rename the “PX_LAST” column #

Next, we want to rename the “PX_LAST” column as “Close”:

# Rename column
df.rename(columns = {'PX_LAST':'Close'}, inplace = True)

Now, running:

df.head(10)

Gives us:

Rename column

Sort data #

Next, we want to sort the data starting with the oldest date:

# Sort by date
df.sort_values(by=['Date'], inplace = True)

Now, running:

df.head(10)

Gives us:

Sort by date

Export data #

Next, we want to export the data to an excel file, for easy viewing and reference later:

# Export data to excel
file = fund + ".xlsx"
df.to_excel(file, sheet_name='data')

And verify the output is as expected:

Excel export

Output confirmation #

Finally, we want to print a confirmation that the process succeeded along withe last date we have for data:

# Output confirmation
print(f"The last date of data for {fund} is: ")
print(df[-1:])
print(f"Bloomberg data conversion complete for {fund} data")
print(f"--------------------")

And confirming the output:

Output confirmation

References #

https://www.bloomberg.com/professional/support/software-updates/