Cleaning A Bloomberg Data Excel Export

Table of Contents
Introduction #
In this tutorial, we will write a python function that imports an excel export from Bloomberg, removes ancillary rows and columns, and leaves the data in a format where it can then be used in time series analysis.
Example of a Bloomberg excel export #
We will use the SPX index data in this example. Exporting the data from Bloomberg using the excel Bloomberg add-on yields data in the following format:

Data modifications #
The above format isn’t horrible, but we want to perform the following modifications:
- Remove the first six rows of the data
- Convert the 7th row to become column headings
- Rename column 2 to “Close” to represent the closing price
- Remove column 3, as we are not concerned about volume
- Export to excel and make the name of the excel worksheet “data”
Assumptions #
The remainder of this tutorial assumes the following:
- Your excel file is named “SPX_Index.xlsx”
- The worksheet in the excel file is named “Worksheet”
- You have the pandas library installed
- You have the OpenPyXL library installed
Python function to modify the data #
The following function will perform the modifications mentioned above:
# This function takes an excel export from Bloomberg and
# removes all excess data leaving date and close columns
# Imports
import pandas as pd
# Function definition
def bb_data_updater(fund):
# File name variable
file = fund + "_Index.xlsx"
# Import data from file as a pandas dataframe
df = pd.read_excel(file, sheet_name = 'Worksheet', engine='openpyxl')
# Set the column headings from row 5 (which is physically row 6)
df.columns = df.iloc[5]
# Set the column heading for the index to be "None"
df.rename_axis(None, axis=1, inplace = True)
# Drop the first 6 rows, 0 - 5
df.drop(df.index[0:6], inplace=True)
# Set the date column as the index
df.set_index('Date', inplace = True)
# Drop the volume column
try:
df.drop(columns = {'PX_VOLUME'}, inplace = True)
except KeyError:
pass
# Rename column
df.rename(columns = {'PX_LAST':'Close'}, inplace = True)
# Sort by date
df.sort_values(by=['Date'], inplace = True)
# Export data to excel
file = fund + ".xlsx"
df.to_excel(file, sheet_name='data')
# Output confirmation
print(f"The last date of data for {fund} is: ")
print(df[-1:])
print(f"Bloomberg data conversion complete for {fund} data")
return print(f"--------------------")
Let’s break this down line by line.
Imports #
First, we need to import pandas:
import pandas as pd
Import excel data file #
Then import the excel file as a pandas dataframe:
# File name variable
file = fund + "_Index.xlsx"
# Import data from file as a pandas dataframe
df = pd.read_excel(file, sheet_name = 'Worksheet', engine='openpyxl')
Running:
df.head(10)
Gives us:

Set column headings #
Next, set the column heading:
# Set the column headings from row 5 (which is physically row 6)
df.columns = df.iloc[5]
Now, running:
df.head(10)
Gives us:

Remove index heading #
Next, remove the column heading from the index column:
# Set the column heading for the index to be "None"
df.rename_axis(None, axis=1, inplace = True)
Note: The axis=1 argument here specifies the column index.
Now, running:
df.head(10)
Gives us:

Drop rows #
Next, we want to remove the first 6 rows that have unneeded data:
# Drop the first 6 rows, 0 - 5
df.drop(df.index[0:6], inplace=True)
Note: When dropping rows, the range to drop begins with row 0 and continues up to - but not including - row 6.
Now, running:
df.head(10)
Gives us:

Set index #
Next, we want to set the date column as the index:
# Set the date column as the index
df.set_index('Date', inplace = True)
Now, running:
df.head(10)
Gives us:

Drop the “PX_VOLUME” column #
Next, we want to drop the volume column:
# Drop the volume column
try:
df.drop(columns = {'PX_VOLUME'}, inplace = True)
except KeyError:
pass
For some data records, the volume column does not exist. Therefore, we try, and if it fails with a KeyError, then we assume the “PX_VOLUME” column does not exist, and just pass to move on.
Now, running:
df.head(10)
Gives us:

Rename the “PX_LAST” column #
Next, we want to rename the “PX_LAST” column as “Close”:
# Rename column
df.rename(columns = {'PX_LAST':'Close'}, inplace = True)
Now, running:
df.head(10)
Gives us:

Sort data #
Next, we want to sort the data starting with the oldest date:
# Sort by date
df.sort_values(by=['Date'], inplace = True)
Now, running:
df.head(10)
Gives us:

Export data #
Next, we want to export the data to an excel file, for easy viewing and reference later:
# Export data to excel
file = fund + ".xlsx"
df.to_excel(file, sheet_name='data')
And verify the output is as expected:

Output confirmation #
Finally, we want to print a confirmation that the process succeeded along withe last date we have for data:
# Output confirmation
print(f"The last date of data for {fund} is: ")
print(df[-1:])
print(f"Bloomberg data conversion complete for {fund} data")
print(f"--------------------")
And confirming the output:
