Scraping Financial (ESG) Data with Python

Curt Beck
5 min read · Apr 19, 2020

It may no longer be enough to beat the market. ESG (Environmental, Social, and Governance) considerations are becoming ever more important in investment decision-making. In fact, for some investors, investing in sustainable companies may even be the top priority. Today, I want to focus on scraping ESG data to gauge how sustainable companies are.

The ultimate objective is to allow users to easily toggle through companies in the S&P 500 to see their ESG scores, allowing for a quick assessment of their sustainability.

Yahoo Finance lets you view a variety of financial information for a company. We’ll use Microsoft as our first example.

We’ll start by importing the following packages:
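The import list is not reproduced here, so the set below is an assumption based on the steps that follow: requests to fetch the page, BeautifulSoup (from the bs4 package) to parse it, and pandas to tabulate the scores.

```python
# Assumed package set for this walkthrough; install with
#   pip install requests beautifulsoup4 pandas
import requests                  # sends the HTTP request
from bs4 import BeautifulSoup    # parses the returned HTML
import pandas as pd              # holds the scores in a table
```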

To start web scraping, you send what’s called a request to the URL of the page you want to scrape. This lets Python read the page’s content:
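A minimal sketch of such a request follows. The URL pattern, the fetch_page helper name, and the browser-like User-Agent header are assumptions made for illustration; Yahoo Finance may change its URLs or block automated traffic.

```python
import requests

# Assumed URL pattern for Microsoft's Sustainability tab on Yahoo Finance.
URL = "https://finance.yahoo.com/quote/MSFT/sustainability"


def fetch_page(url, timeout=10):
    """Return the page's HTML as a string, or None if the request fails."""
    # Assumption: some sites reject the default requests User-Agent.
    headers = {"User-Agent": "Mozilla/5.0"}
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
        response.raise_for_status()  # treat 4xx/5xx responses as failures
    except requests.exceptions.RequestException as err:
        print(f"There is a problem: {err}")
        return None
    print("Success")
    return response.text
```

Calling fetch_page(URL) would then return the HTML to parse in the next step. Wrapping the call in a function keeps the error handling in one place once you move beyond a single ticker.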

The code above attempts to access the URL. If it succeeds, it prints a success message; otherwise, it lets you know that there is a problem.

The Sustainability tab displays several kinds of ESG scores, and we’ll first scrape Microsoft’s Total ESG Score. To scrape any piece of information from the web, you need to find its HTML tag. In practice, that means highlighting the value with your mouse, right-clicking it, and choosing “Inspect”:

The piece of information we’re after appears in the HTML panel as div class="Fz(36px) Fw(600) D(ib) Mend(5px)". To get Python to identify and extract it, you first need to parse the HTML it retrieved a moment ago, which makes the page searchable by tag and class:
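The parsing step might look like the sketch below. So that it runs without a network connection, it uses a one-line, hand-written stand-in for the downloaded page; the live markup may differ.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the downloaded page, not the real Yahoo markup.
html = '<html><body><div class="Fz(36px) Fw(600) D(ib) Mend(5px)">16</div></body></html>'

# "html.parser" is Python's built-in parser, so no extra install is needed.
soup = BeautifulSoup(html, "html.parser")

print(type(soup).__name__)  # BeautifulSoup
```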

Python is now in a position to extract the Total ESG score. Tell it to look for the div with the class Fz(36px) Fw(600) D(ib) Mend(5px):
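One way to express that lookup is Beautiful Soup’s find method, sketched here against the same kind of stand-in HTML. The constant name TOTAL_ESG_CLASS is introduced only for this example.

```python
from bs4 import BeautifulSoup

TOTAL_ESG_CLASS = "Fz(36px) Fw(600) D(ib) Mend(5px)"

# Stand-in for the parsed Sustainability page.
html = f'<div class="{TOTAL_ESG_CLASS}">16</div>'
soup = BeautifulSoup(html, "html.parser")

# A multi-word class_ string matches only when the tag's class attribute
# is exactly this string, in this order.
esg_score = soup.find("div", class_=TOTAL_ESG_CLASS)
print(esg_score)
```

If the page’s class names change, find returns None, so it is worth checking for that before going further.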

We can see that Python found what we are looking for, but we really just want the score itself (16), not all of the HTML. Typing esg_score.text accomplishes this:
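A short sketch of that last step, again on stand-in HTML. The conversion to a number at the end is an addition for convenience, not something the walkthrough requires.

```python
from bs4 import BeautifulSoup

html = '<div class="Fz(36px) Fw(600) D(ib) Mend(5px)">16</div>'
soup = BeautifulSoup(html, "html.parser")
esg_score = soup.find("div", class_="Fz(36px) Fw(600) D(ib) Mend(5px)")

# .text drops the tags and returns only the enclosed string.
print(esg_score.text)  # 16

# The value is still a string; convert it before doing any arithmetic.
total_score = float(esg_score.text)
```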

At this point, we’ve scraped the Total ESG score. Now let’s get the Environment, Social, and Governance scores:

The Environment, Social, and Governance scores are identified as div class="D(ib) Fz(23px) smartphone_Fz(22px) Fw(600)". As before, we tell Python to look for this tag, but since there is more than one of them, we need to find all of them and store them somewhere:
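A sketch of collecting every match with find_all follows. The three values in the stand-in HTML are placeholders, not Microsoft’s actual scores, and PILLAR_CLASS is a name made up for this example.

```python
from bs4 import BeautifulSoup

PILLAR_CLASS = "D(ib) Fz(23px) smartphone_Fz(22px) Fw(600)"

# Placeholder values, not real scores.
html = "".join(
    f'<div class="{PILLAR_CLASS}">{value}</div>' for value in ("1.0", "2.0", "3.0")
)
soup = BeautifulSoup(html, "html.parser")

elements = []  # will hold the Environment, Social, and Governance scores
for tag in soup.find_all("div", class_=PILLAR_CLASS):
    elements.append(tag.text)

print(elements)  # ['1.0', '2.0', '3.0']
```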

Python first creates an empty list, elements, to hold the rest of the ESG scores, then extracts each score and appends it to the list. Let’s view the results:

Let’s then get the Controversy Score, which follows the same process as the Total ESG score we retrieved initially:

Now that we’ve got all of our scores, let’s label each one by putting them in a DataFrame, which is pandas’ Excel-like table:
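The labeling step could be sketched as below. Only the total of 16 comes from the walkthrough; the other values are placeholders, and the row labels are my own choice.

```python
import pandas as pd

# 16 is the total quoted above; every other value is a placeholder.
total_score = "16"
elements = ["1.0", "2.0", "3.0"]  # Environment, Social, Governance
controversy = "3"

labels = ["Total ESG", "Environment", "Social", "Governance", "Controversy"]
values = [total_score, *elements, controversy]

# Scraped values are strings, so convert them to numbers on the way in.
df = pd.DataFrame({"Score": pd.to_numeric(values)}, index=labels)
print(df)
```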

We successfully scraped Microsoft’s ESG scores. But what if we wanted to look up any company’s scores programmatically? To do that, we’ll first create a function that gathers all of the tickers in the S&P 500:
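The source of the ticker list is not stated, so this sketch assumes Wikipedia’s “List of S&P 500 companies” page, where the constituents sit in a table with the id constituents and the symbol is the first cell of each row. The parse_tickers helper is hypothetical and is demonstrated on a tiny stand-in table.

```python
from bs4 import BeautifulSoup

# Assumption: tickers are taken from this Wikipedia page.
SP500_URL = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"


def parse_tickers(html):
    """Return the ticker found in the first cell of each constituents row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="constituents")
    if table is None:
        return []
    tickers = []
    for row in table.find_all("tr"):
        cell = row.find("td")  # header rows contain only <th>, so they are skipped
        if cell is not None:
            # Yahoo Finance writes share classes with a dash (BRK-B, not BRK.B).
            tickers.append(cell.text.strip().replace(".", "-"))
    return tickers


sample = """
<table id="constituents">
  <tr><th>Symbol</th><th>Security</th></tr>
  <tr><td>MSFT</td><td>Microsoft</td></tr>
  <tr><td>BRK.B</td><td>Berkshire Hathaway</td></tr>
</table>
"""
print(parse_tickers(sample))  # ['MSFT', 'BRK-B']
```

Against the live page, you would pass in the HTML returned by a request to SP500_URL instead of the sample.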

Next, we’ll create a function that scrapes ESG scores. It simply repeats the steps described above for whichever ticker it is given:
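A sketch of such a function, split into a parser (testable offline) and a thin fetcher. The URL pattern and User-Agent header are assumptions, and CONTROVERSY_CLASS is a hypothetical placeholder because the Controversy Score’s class is not given here; inspect the page to find the real one.

```python
import requests
from bs4 import BeautifulSoup

TOTAL_CLASS = "Fz(36px) Fw(600) D(ib) Mend(5px)"
PILLAR_CLASS = "D(ib) Fz(23px) smartphone_Fz(22px) Fw(600)"
CONTROVERSY_CLASS = "controversy-score"  # hypothetical placeholder


def parse_esg(html):
    """Pull the five scores out of a Sustainability page; missing ones are None."""
    soup = BeautifulSoup(html, "html.parser")

    def first_text(css_class):
        tag = soup.find("div", class_=css_class)
        return tag.text.strip() if tag is not None else None

    pillars = [t.text.strip() for t in soup.find_all("div", class_=PILLAR_CLASS)]
    pillars += [None] * (3 - len(pillars))  # pad if fewer than three were found
    return {
        "Total ESG": first_text(TOTAL_CLASS),
        "Environment": pillars[0],
        "Social": pillars[1],
        "Governance": pillars[2],
        "Controversy": first_text(CONTROVERSY_CLASS),
    }


def scrape_esg(ticker):
    """Fetch a ticker's Sustainability page (assumed URL pattern) and parse it."""
    url = f"https://finance.yahoo.com/quote/{ticker}/sustainability"
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()
    return parse_esg(response.text)
```

Returning None for a missing score, rather than raising, lets a loop over hundreds of tickers keep going when a company has no ESG data.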

The last function we’ll create classifies the severity of the Controversy Score, with 0 indicating No Controversy and 4 or 5 indicating Severe Controversy:
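A possible version of that classifier is sketched below. The labels for 0 and for 4 or 5 follow the description above; the labels for scores 1 through 3 are my own assumptions, since the intermediate levels are not spelled out.

```python
def classify_controversy(score):
    """Map a 0-5 Controversy Score to a severity label."""
    level = int(float(score))  # scraped values arrive as strings like "3" or "3.0"
    if not 0 <= level <= 5:
        raise ValueError(f"Controversy Score must be between 0 and 5, got {score!r}")
    if level == 0:
        return "No Controversy"
    if level >= 4:
        return "Severe Controversy"
    # Assumed labels: only the two extremes are defined in the text.
    middle = {
        1: "Low Controversy",
        2: "Moderate Controversy",
        3: "Significant Controversy",
    }
    return middle[level]


print(classify_controversy("0"))  # No Controversy
print(classify_controversy(5))    # Severe Controversy
```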

Finally, we create our interactive menu that allows us to view the ESG scores for any stock we want:
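How the menu is built is not shown, so the following is only one possible shape: a plain text prompt rather than a widget. The run_menu name and its read and show parameters are hypothetical; they default to input and print, and being able to swap them out makes the loop easy to test.

```python
def run_menu(tickers, lookup, read=input, show=print):
    """Prompt for tickers until the user types 'quit'; show scores for valid ones.

    lookup is any function that takes a ticker and returns a dict of scores.
    """
    valid = set(tickers)
    while True:
        choice = read("Ticker (or 'quit' to exit): ").strip().upper()
        if choice == "QUIT":
            break
        if choice not in valid:
            show(f"{choice} is not in the ticker list.")
            continue
        for label, value in lookup(choice).items():
            show(f"{label}: {value}")
```

In a live session you would pass the S&P 500 ticker list as tickers and your scraping function as lookup.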

You should now be able to easily view the ESG scores of any stock in the S&P 500. To my mind, though, the biggest takeaway from this lesson is that you are now in a position to build your own ESG dataset. That is, you can scrape and store ESG scores in a CSV or Excel file, creating a very important dataset from scratch, at no cost!
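Storing the results could be as simple as the sketch below, which writes a one-row table to CSV with pandas. It writes to an in-memory buffer so it runs anywhere; passing a file name such as esg_scores.csv (a made-up name) to to_csv would write to disk instead.

```python
import io

import pandas as pd

# One illustrative row; in practice there would be one per scraped ticker.
rows = [{"Ticker": "MSFT", "Total ESG": 16}]
df = pd.DataFrame(rows)

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False leaves out the row numbers
print(buffer.getvalue())
```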

Here is the code in its entirety:

Curt Beck

Stumbled into a data-centric role several years ago and have not looked back! Passionate about leveraging technology to uncover answers and improve the world.