Scraping Financial (ESG) Data with Python

It may no longer be enough to beat the market. ESG (Environment, Social, and Governance) considerations are becoming ever more important in investment decision-making. In fact, for some investors, investing in sustainable companies might even be the top priority. Today, I want to focus on scraping ESG data to gauge how sustainable companies are.

The ultimate objective is to allow users to easily toggle through companies in the S&P 500 to see their ESG scores, allowing for a quick assessment of their sustainability.

Yahoo Finance lets you view a variety of financial information for a company. We’ll use Microsoft as our first example.

We’ll start by importing the following packages:
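
A minimal sketch of the imports, assuming the walkthrough relies on requests, BeautifulSoup (the beautifulsoup4 package), and pandas. These three are my assumption; any HTTP client and HTML parser you prefer would do the same job.

```python
# Assumed package set for this walkthrough (not confirmed by the article).
import requests                # sends the HTTP request to Yahoo Finance
from bs4 import BeautifulSoup  # parses the HTML that comes back
import pandas as pd            # holds the scraped scores in a DataFrame
```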

To start web scraping, you need to give Python the URL of the site you want to scrape and have it make what’s called a request. This allows Python to read the website’s content.
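
Here is one way the request could look. It is a sketch rather than a definitive version: the URL pattern, the fetch_page helper, and the browser-like User-Agent header are all assumptions of mine.

```python
import requests

# Assumed URL pattern for Microsoft's Sustainability tab on Yahoo Finance.
URL = "https://finance.yahoo.com/quote/MSFT/sustainability"


def fetch_page(url, timeout=10):
    """Request the page and report whether it could be reached."""
    # Some sites reject the default client string, so send a browser-like one.
    headers = {"User-Agent": "Mozilla/5.0"}
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
    except requests.RequestException as exc:
        print(f"Problem: could not reach {url} ({exc})")
        return None
    if response.status_code == 200:
        print("Success")
        return response
    print(f"Problem: the server answered with status {response.status_code}")
    return None


# Typical use (needs network access):
# response = fetch_page(URL)
```

The function returns None on failure, so the later steps can check for that before trying to parse anything.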

The request attempts to access the URL. If it succeeds, the code prints a success message; otherwise, it lets you know that there is a problem.

The Sustainability tab displays several kinds of ESG scores, and we’ll first scrape Microsoft’s Total ESG Score. To scrape any piece of information from the web, you need to find its HTML tag, which boils down to highlighting that piece of information with your mouse, right-clicking it, and choosing “Inspect.”

The piece of information we’re after is identified in the HTML panel as div class=Fz(36px) Fw(600) D(ib) Mend(5px). To get Python to identify and extract this, you first need to tell it to parse the HTML of the page it read a moment ago. Parsing turns the raw text into a structure Python can search.
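
The parsing step might look like the sketch below, assuming BeautifulSoup is the parser. To keep it runnable offline, a one-line stand-in replaces the real page; in practice you would pass the text of the response instead.

```python
from bs4 import BeautifulSoup

# Stand-in for the downloaded page; the real HTML is far larger.
sample_html = """
<html><body>
  <div class="Fz(36px) Fw(600) D(ib) Mend(5px)">16</div>
</body></html>
"""

# Build a searchable tree from the raw HTML using Python's built-in parser.
soup = BeautifulSoup(sample_html, "html.parser")
print(type(soup).__name__)  # BeautifulSoup
```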

Python is now in a position to extract the Total ESG score, which is represented as div class=Fz(36px) Fw(600) D(ib) Mend(5px). Now you need to tell it to look for this tag.
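
A sketch of the lookup, again assuming BeautifulSoup and using a stand-in snippet of HTML. Passing the whole class string to find asks for a tag whose class attribute is exactly that string, in that order.

```python
from bs4 import BeautifulSoup

# Stand-in for the parsed Yahoo Finance page.
sample_html = '<div class="Fz(36px) Fw(600) D(ib) Mend(5px)">16</div>'
soup = BeautifulSoup(sample_html, "html.parser")

# find() returns the first matching tag, or None when nothing matches.
esg_score = soup.find("div", class_="Fz(36px) Fw(600) D(ib) Mend(5px)")
print(esg_score)  # shows the whole tag, HTML included
```

These class names are generated by the site’s styling and can change without notice, so if find returns None, the first thing to check is whether the class string still matches what the inspector shows.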

We can see that Python found what we are looking for, but we really just want the actual score (16), not all of the HTML. Typing esg_score.text accomplishes this.
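
Continuing with the same stand-in HTML, this is roughly what pulling out the text looks like. The None check and the strip call are my additions for safety.

```python
from bs4 import BeautifulSoup

sample_html = '<div class="Fz(36px) Fw(600) D(ib) Mend(5px)">16</div>'
soup = BeautifulSoup(sample_html, "html.parser")
esg_score = soup.find("div", class_="Fz(36px) Fw(600) D(ib) Mend(5px)")

# .text keeps only the characters between the opening and closing tags.
total_esg = esg_score.text.strip() if esg_score is not None else None
print(total_esg)  # 16
```

Note that the result is the string "16", not a number; convert it with int or float before doing any arithmetic.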

At this point, we’ve scraped the total ESG score. Now let’s get the Environment, Social, and Governance scores.

The Environment, Social, and Governance scores are identified as div class=D(ib) Fz(23px) smartphone_Fz(22px) Fw(600). Like we did before, we tell Python to look for this tag, but since there is more than one of these, we need to tell Python to find all of them and store them somewhere.
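
Collecting several matching tags could be sketched as follows, assuming BeautifulSoup’s find_all. The three values in the stand-in HTML are placeholders, not Microsoft’s real scores.

```python
from bs4 import BeautifulSoup

# Stand-in HTML; 1.0, 2.0, and 3.0 are placeholder values.
sample_html = """
<div class="D(ib) Fz(23px) smartphone_Fz(22px) Fw(600)">1.0</div>
<div class="D(ib) Fz(23px) smartphone_Fz(22px) Fw(600)">2.0</div>
<div class="D(ib) Fz(23px) smartphone_Fz(22px) Fw(600)">3.0</div>
"""
soup = BeautifulSoup(sample_html, "html.parser")
esg_class = "D(ib) Fz(23px) smartphone_Fz(22px) Fw(600)"

# find_all() returns every matching tag, in page order.
elements = []
for tag in soup.find_all("div", class_=esg_class):
    elements.append(tag.text.strip())

print(elements)  # ['1.0', '2.0', '3.0']
```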

Python first creates an empty list, elements, to store the rest of the ESG scores, then extracts each score and puts it in the list. Printing the list shows the results.

Let’s then get the Controversy Score, which follows the same process as the Total ESG score we retrieved initially.
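
A sketch of the same find-then-text pattern applied to the Controversy Score. The class string below is a hypothetical placeholder, since the article does not state the real one; inspect the score on the page and substitute what you see. The value 3 is also a placeholder.

```python
from bs4 import BeautifulSoup

# Hypothetical class string: replace it with the one the inspector shows.
CONTROVERSY_CLASS = "controversy-placeholder"

# Stand-in HTML with a placeholder score.
sample_html = f'<div class="{CONTROVERSY_CLASS}">3</div>'
soup = BeautifulSoup(sample_html, "html.parser")

controversy_tag = soup.find("div", class_=CONTROVERSY_CLASS)
controversy_score = (
    controversy_tag.text.strip() if controversy_tag is not None else None
)
print(controversy_score)  # 3
```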

Now that we’ve got all of our scores, let’s label each one by putting them in a DataFrame, an Excel-like table, if you will.
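
Labeling the scores might look like this, assuming pandas. The column names are my own choice, the 16 is the Total ESG Score mentioned earlier, and the other values are placeholders. I am also assuming the three component scores arrive in Environment, Social, Governance order.

```python
import pandas as pd

# Scraped text arrives as strings; all but the 16 are placeholder values.
total_esg = "16"
elements = ["1.0", "2.0", "3.0"]  # assumed order: Environment, Social, Governance
controversy_score = "3"

scores = pd.DataFrame(
    {
        "Total ESG Score": [total_esg],
        "Environment Score": [elements[0]],
        "Social Score": [elements[1]],
        "Governance Score": [elements[2]],
        "Controversy Score": [controversy_score],
    },
    index=["MSFT"],
)

# Convert every column from text to numbers so the scores can be compared.
scores = scores.apply(pd.to_numeric)
print(scores)
```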

We successfully scraped Microsoft’s ESG scores. However, what if we wanted to look up any company’s scores programmatically? To do that, we’ll first create a function that gathers all of the tickers in the S&P 500.
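
One possible version of the ticker function is sketched below. It assumes the list comes from Wikipedia’s “List of S&P 500 companies” page and that the first table on that page holds the constituents with the symbol in the first column. Both the source and the helper names are assumptions of mine.

```python
import requests
from bs4 import BeautifulSoup

# Assumed source for the S&P 500 constituents.
SP500_URL = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"


def parse_tickers(html):
    """Return the first-column text of every data row in the first table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    tickers = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if cells:  # header rows use <th>, so they have no <td> cells
            tickers.append(cells[0].text.strip())
    return tickers


def get_sp500_tickers(url=SP500_URL):
    """Download the constituents page and extract the tickers (needs network)."""
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()
    return parse_tickers(response.text)
```

Splitting the parsing from the download makes the parsing easy to check against a saved copy of the page. One thing to watch for: tickers containing a dot, such as BRK.B, are written with a hyphen on Yahoo Finance, so they may need converting before use.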

Next, we’ll create a function that scrapes ESG scores. It simply repeats the steps we described above for whichever ticker it is given.
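
The steps so far could be bundled roughly like this. It is a sketch under the same assumptions as before: the URL pattern is assumed, CONTROVERSY_CLASS is a hypothetical placeholder, and the function and key names are mine.

```python
import requests
from bs4 import BeautifulSoup

TOTAL_CLASS = "Fz(36px) Fw(600) D(ib) Mend(5px)"
ESG_CLASS = "D(ib) Fz(23px) smartphone_Fz(22px) Fw(600)"
CONTROVERSY_CLASS = "controversy-placeholder"  # hypothetical; inspect the page


def _text_or_none(tag):
    """Return the tag's stripped text, or None when the tag was not found."""
    return tag.text.strip() if tag is not None else None


def parse_esg_scores(html):
    """Extract the five scores from a Sustainability page's HTML."""
    soup = BeautifulSoup(html, "html.parser")
    parts = [t.text.strip() for t in soup.find_all("div", class_=ESG_CLASS)]
    parts += [None] * (3 - len(parts))  # pad when a score is missing
    return {
        "Total ESG Score": _text_or_none(soup.find("div", class_=TOTAL_CLASS)),
        "Environment Score": parts[0],
        "Social Score": parts[1],
        "Governance Score": parts[2],
        "Controversy Score": _text_or_none(
            soup.find("div", class_=CONTROVERSY_CLASS)
        ),
    }


def get_esg_scores(ticker):
    """Download a ticker's Sustainability page and parse it (needs network)."""
    url = f"https://finance.yahoo.com/quote/{ticker}/sustainability"  # assumed
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()
    return parse_esg_scores(response.text)
```

Missing scores come back as None rather than raising an error, which matters when looping over hundreds of tickers, since not every company is guaranteed to have every score.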

The last function we’ll create will classify the severity of the Controversy Score, with 0 indicating No Controversy and 4 or 5 indicating Severe Controversy.
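
A sketch of such a classifier follows. Only the labels for 0 and for 4 or 5 come from the description above; the labels for 1, 2, and 3 are illustrative assumptions of mine, so rename them to suit your own scale.

```python
def classify_controversy(score):
    """Map a 0-5 Controversy Score to a severity label."""
    score = int(score)  # scraped scores arrive as text
    if score == 0:
        return "No Controversy"
    if score in (4, 5):
        return "Severe Controversy"
    # Assumed labels for the middle of the scale.
    middle_labels = {
        1: "Low Controversy",
        2: "Moderate Controversy",
        3: "Significant Controversy",
    }
    if score in middle_labels:
        return middle_labels[score]
    raise ValueError(f"Controversy Score out of range: {score}")


print(classify_controversy("4"))  # Severe Controversy
```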

Finally, we create our interactive menu, which allows us to view the ESG scores for any stock we want.
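
The menu can take many forms. The sketch below is a plain console version built on input; a dropdown widget in a notebook would serve equally well. The choose_ticker name and its read and show parameters are my own, and they exist mainly so the function can be exercised without a keyboard.

```python
def choose_ticker(tickers, read=input, show=print):
    """Show a numbered menu and return the chosen ticker, or None to quit."""
    for number, ticker in enumerate(tickers, start=1):
        show(f"{number}. {ticker}")
    while True:
        answer = read("Pick a number (or q to quit): ").strip()
        if answer.lower() == "q":
            return None
        if answer.isdecimal() and 1 <= int(answer) <= len(tickers):
            return tickers[int(answer) - 1]
        show("Please enter one of the numbers shown.")


# Typical use (interactive):
# ticker = choose_ticker(["MSFT", "AAPL", "GOOGL"])
```

Once a ticker comes back, you would hand it to your scraping function and display the result.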

You should now be able to easily view the ESG scores of any stock in the S&P 500. To my mind, though, the biggest takeaway from this lesson is that it puts you in a position to build your own ESG dataset. That is, you can now scrape and store ESG scores in a CSV or Excel file, creating a very important dataset from scratch, at no cost!
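
Storing the scores could be as simple as the sketch below, which uses Python’s built-in csv module. The save_scores helper, the column names, and the file name in the usage comment are assumptions of mine.

```python
import csv

FIELDS = [
    "Ticker",
    "Total ESG Score",
    "Environment Score",
    "Social Score",
    "Governance Score",
    "Controversy Score",
]


def save_scores(rows, path):
    """Write one row per company to a CSV file, with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)  # keys missing from a row are left blank


# Typical use, with one dictionary per scraped company:
# save_scores([{"Ticker": "MSFT", "Total ESG Score": "16"}], "esg_scores.csv")
```

When scraping all 500 companies in a loop, it is worth pausing briefly between requests so you do not hammer the site.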

Here is the code in its entirety:

Stumbled into a data-centric role several years ago and have not looked back! Passionate about leveraging technology to uncover answers and improve the world.
