How to Scrape & Analyze Google Search Results with Python

  • September 6, 2023
  • AI, SEO

Ever spent hours analyzing Google search results and ended up more frustrated and confused than before?

Python hasn’t.

In this article, we’ll explore why Python is an ideal choice for Google search analysis and how it simplifies and automates an otherwise time-consuming task.

We’ll also perform an SEO analysis in Python from start to finish. And provide code for you to copy and use.

But first, some background.

Why Use Python for Google Search and Analysis

Python is known as a versatile, easy-to-learn programming language. And it really shines at working with Google search data.

Why?

Here are a few key reasons that point to Python as a top choice for scraping and analyzing Google search results:

Python Is Easy to Read and Use

Python is designed with simplicity in mind. So you can focus on analyzing Google search results instead of getting tangled up in complicated coding syntax.

It follows an easy-to-understand syntax and style. Which allows developers to write fewer lines of code compared to other languages.

an infographic showing the same example using Python, Java and C++

Python Has Well-Equipped Libraries

A Python library is a reusable chunk of code created by developers that you can reference in your scripts to provide extra functionality without having to write it from scratch.

And Python now has a wealth of libraries like:

  • Googlesearch, Requests, and Beautiful Soup for web scraping
  • Pandas and Matplotlib for data analysis

These libraries are powerful tools that make scraping and analyzing data from Google searches efficient.

Python Offers Support from a Large Community and ChatGPT

You’ll be well-supported in any Python project you undertake, including Google search analysis.

Because Python’s popularity has led to a large, active community of developers. And a wealth of tutorials, forums, guides, and third-party tools.

And when you can’t find pre-existing Python code for your search analysis project, chances are that ChatGPT will be able to help.

When using ChatGPT, we recommend prompting it to:

  • Act as a Python expert and
  • Help with a problem

Then, state:

  • The goal (“to query Google”) and
  • The desired output (“the simplest version of a query”)
an example of Python-based query in ChatGPT

Setting Up Your Python Environment

You’ll need to set up your Python environment before you can scrape and analyze Google search results using Python.

There are many ways to get Python up and running. But one of the fastest ways to start analyzing Google search engine results pages (SERPs) with Python is Google’s own notebook environment: Google Colab.

Here’s how easy it is to get started with Google Colab:

1. Access Google Colab: Open your web browser and go to Google Colab. If you have a Google account, sign in. If not, create a new account.

2. Create a new notebook: In Google Colab, click on “File” > “New Notebook” to create a new Python notebook.

open new notebook in Google Colab

3. Check installation: To make sure that Python is working correctly, run a simple test by entering and executing the code below. And Google Colab will show you the Python version that’s currently installed:

import sys
sys.version

Wasn’t that easy?
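If a script depends on newer language features, the same sys module can also guard against running on an older interpreter. The snippet below is a minimal sketch; the minimum version of 3.8 is a hypothetical value chosen for illustration, not a requirement stated in this article.

```python
import sys

# Hypothetical minimum version, chosen for illustration only.
MIN_VERSION = (3, 8)


def python_is_recent_enough(minimum=MIN_VERSION):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


print("Python version:", sys.version.split()[0])
print("Meets minimum:", python_is_recent_enough())
```

Comparing sys.version_info as a tuple avoids parsing the version string by hand.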

There’s just one more step before you can perform an actual Google search.

Importing the Python Googlesearch Module

Use the googlesearch-python package to scrape and analyze Google search results with Python. It provides a convenient way to perform Google searches programmatically.

Just run the following code in a code cell to access this Python Google search module:

from googlesearch import search
print("Googlesearch package installed successfully!")

One benefit of using Google Colab is that the googlesearch-python package is pre-installed. So, no need to do that first.

It’s ready to go once you see the message “Googlesearch package installed successfully!”
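Outside Colab, or if the import fails, a guarded import gives a clearer message than a traceback. This is a sketch rather than part of the original walkthrough. One assumption is worth flagging: the tld, stop, and pause arguments used throughout this article match the search function distributed in the PyPI package named google (which also imports as googlesearch), while the package named googlesearch-python exposes different parameters. Printing the signature shows which one your environment has.

```python
import inspect

try:
    from googlesearch import search
    googlesearch_available = True
except ImportError:
    # Covers the case where no googlesearch module is installed at all.
    search = None
    googlesearch_available = False

if googlesearch_available:
    # List the parameters that the installed search() actually accepts.
    params = list(inspect.signature(search).parameters)
    print("search() accepts:", params)
else:
    # Assumption: the `google` package on PyPI provides the API used here.
    print("googlesearch is not installed; in a notebook, try: !pip install google")
```

If the printed parameter list does not include tld, stop, and pause, the calls shown in this article will need to be adapted to the installed package.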

Now, we’ll explore how to use the module to perform Google searches. And extract valuable information from the search results.

How to Perform a Google Search with Python

To perform a Google search, write and run a few lines of code that specify your search query, how many results to display, and a few other details (more on this in the next section).

# set query to search for in Google
query = "long winter coat"
# execute query and store search results
results = search(query, tld="com", lang="en", stop=3, pause=2)
# iterate over all search results and print them
for result in results:
    print(result)

You’ll then see the top three Google search results for the query “long winter coat.”

Here’s what it looks like in the notebook:

top three Google search results for the query "long winter coat" in the notebook

To verify that the results are accurate, you can use Keyword Overview.

Open the tool, enter “long winter coat” into the search box, and make sure the location is set to “U.S.” Then click “Search.”

search for "long winter coat" in the US in Keyword Overview tool

Scroll down to the “SERP Analysis” table. And you should see the same (or very similar) URLs in the top three spots.

"SERP Analysis" table

Keyword Overview also shows you a lot of helpful data that Python has no access to. Like monthly search volume (globally and in your chosen location), Keyword Difficulty (a score that indicates how difficult it is to rank in the top 10 results for a given term), search intent (the reason behind a user’s query), and much more.

Understanding Your Google Search with Python

Let’s go through the code we just ran. So you can understand what each part means and how to make adjustments for your needs.

We’ll go over each part highlighted in the image below:

an image of the code ran above in Keyword Overview
  1. Query variable: The query variable stores the search query you want to execute on Google
  2. Search function: The search function provides various parameters that allow you to customize your search and retrieve specific results:
    1. query: Tells the search function what word or phrase to search for. This is the only required parameter, so the search function will return an error without it. All following parameters are optional.
    2. tld (short for top-level domain): Lets you determine which version of Google’s website you want to execute a search in. Setting this to “com” will search google.com; setting it to “fr” will search google.fr.
    3. lang: Allows you to specify the language of the search results. And accepts a two-letter language code (e.g., “en” for English).
    4. stop: Sets the number of search results for the search function to return. We’ve limited our search to the top three results, but you might want to set the value to “10.”
    5. pause: Specifies the time delay (in seconds) between consecutive requests sent to Google. Setting an appropriate pause value (we recommend at least 10) can help avoid being blocked by Google for sending too many requests too quickly.
  3. For loop sequence: This line of code tells the loop to iterate through each search result in the “results” collection one by one, assigning each search result URL to the variable “result”
  4. For loop action: This code block follows the for loop sequence (it’s indented) and contains the actions to be performed on each search result URL. In this case, they’re printed into the output area in Google Colab.
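The parameters above can be wrapped in a small helper so that the result limit and delay live in one place. The sketch below is an illustration, not the article’s own code: top_urls and fake_search are hypothetical names, and fake_search returns made-up example.com URLs so the snippet runs without sending any request to Google.

```python
import time


def top_urls(query, search_fn, limit=3, pause=2.0):
    """Collect up to `limit` result URLs for `query`.

    `search_fn` is any callable that takes a query string and returns
    an iterable of URLs.
    """
    urls = []
    for url in search_fn(query):
        urls.append(url)
        if len(urls) >= limit:
            break
    time.sleep(pause)  # wait before the caller sends the next query
    return urls


def fake_search(query):
    """Offline stand-in for a real search function (made-up URLs)."""
    slug = query.replace(" ", "-")
    return (f"https://example.com/{slug}/{i}" for i in range(1, 11))


print(top_urls("long winter coat", fake_search, limit=3, pause=0))
```

With the real module, the same helper could be called with a function such as lambda q: search(q, tld="com", lang="en", stop=3, pause=10), assuming the installed search function accepts those arguments.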

How to Analyze Google Search Results with Python

Once you’ve scraped Google search results using Python, you can use Python to analyze the data to extract valuable insights.

For example, you can determine which keywords’ SERPs are similar enough to be targeted with a single page. Meaning Python is doing the heavy lifting involved in keyword clustering.

Let’s stick to our query “long winter coat” as a starting point. Plugging that into Keyword Overview reveals over 3,000 keyword variations.

"Keyword Variations" table for "long winter coat" query shows 3.0K results

For the sake of simplicity, we’ll stick to the five keywords visible above. And have Python analyze and cluster them by creating and executing this code in a new code cell in our Google Colab notebook:

import pandas as pd
# Define the main query and list of queries
main_query = "long winter coat"
secondary_queries = ["long winter coat women", "womens long winter coats", "long winter coats for women", "long winter coats"]
# Execute the main query and store search results
main_results = search(main_query, tld="com", lang="en", stop=3, pause=2)
main_urls = set(main_results)
# Dictionary to store URL percentages for each query
url_percentages = {}
# Iterate over the queries
for secondary_query in secondary_queries:
    # Execute the query and store search results
    secondary_results = search(secondary_query, tld="com", lang="en", stop=3, pause=2)
    secondary_urls = set(secondary_results)
    # Compute the percentage of URLs that appear in the main query results
    percentage = (len(main_urls.intersection(secondary_urls)) / len(main_urls)) * 100
    url_percentages[secondary_query] = percentage
# Create a dataframe from the url_percentages dictionary
df_url_percentages = pd.DataFrame(url_percentages.items(), columns=['Secondary Query', 'Percentage'])
# Sort the dataframe by percentage in descending order
df_url_percentages = df_url_percentages.sort_values(by='Percentage', ascending=False)
# Print the sorted dataframe
df_url_percentages

With 14 lines of code and a dozen or so seconds of waiting for it to execute, we can now see that the top three results are the same for these queries:

  • “long winter coat”
  • “long winter coat women”
  • “womens long winter coats”
  • “long winter coats for women”
  • “long winter coats”

So, these queries can be targeted with the same page.

Also, you shouldn’t try to rank for “long winter coat” or “long winter coats” with a page offering coats for men.
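When the overlap is not identical for every query, the percentages can be split into a “same page” group and a “separate page” group. The sketch below assumes a dictionary shaped like url_percentages; the function name cluster_by_overlap, the 60% threshold, and the sample values are all hypothetical and not real search data.

```python
def cluster_by_overlap(url_percentages, threshold=100.0):
    """Split queries by whether their overlap reaches `threshold` percent."""
    same_page = [q for q, pct in url_percentages.items() if pct >= threshold]
    separate = [q for q, pct in url_percentages.items() if pct < threshold]
    return same_page, separate


# Made-up percentages for illustration; not real search data.
sample = {
    "long winter coats": 100.0,
    "womens long winter coats": 66.7,
    "long winter coat mens": 0.0,
}

same_page, separate = cluster_by_overlap(sample, threshold=60.0)
print("Target with the same page:", same_page)
print("Needs a separate page:", separate)
```

Where to set the threshold is a judgment call; a stricter value produces smaller, tighter clusters.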

Understanding Your Google Search Analysis with Python

Once again, let’s go through the code we’ve just executed. It’s a little more complex this time, but the insights we’ve just generated are much more useful, too.

an image of the code ran above in Keyword Overview

1. import pandas as pd: Imports the Pandas library and makes it callable by the abbreviation “pd.” We’ll use the Pandas library to create a “DataFrame,” which is essentially a table inside the Python output area.

2. main_query = "long winter coat": Defines the main query to search for on Google

3. secondary_queries = ["long winter coat women", "womens long winter coats", "long winter coats for women", "long winter coats"]: Creates a list of queries to be executed on Google. You can paste many more queries and have Python cluster hundreds of them for you.

4. main_results = search(main_query, tld="com", lang="en", stop=3, pause=2): Executes the main query and stores the search results in main_results. We limited the number of results to three (stop=3), because the top three URLs in Google’s search results often do the best job in terms of satisfying users’ search intent.

5. main_urls = set(main_results): Converts the search results of the main query into a set of URLs and stores them in main_urls

6. url_percentages = {}: Initializes an empty dictionary (a collection of key-value pairs) to store the URL percentages for each query

an image of the code described in this section

7. for secondary_query in secondary_queries: Starts a loop that iterates over each secondary query in the secondary queries list

8. secondary_results = search(secondary_query, tld="com", lang="en", stop=3, pause=2): Executes the current secondary query and stores the search results in secondary_results. We limited the number of results to three (stop=3) for the same reason we mentioned earlier.

9. secondary_urls = set(secondary_results): Converts the search results of the current secondary query into a set of URLs and stores them in secondary_urls

10. percentage = (len(main_urls.intersection(secondary_urls)) / len(main_urls)) * 100: Calculates the percentage of URLs from the main query results that also appear in the current secondary query results. The result is stored in the variable percentage.

11. url_percentages[secondary_query] = percentage: Stores the computed URL percentage in the url_percentages dictionary, with the current secondary query as the key

an image of the code described in this section

12. df_url_percentages = pd.DataFrame(url_percentages.items(), columns=['Secondary Query', 'Percentage']): Creates a Pandas DataFrame that holds the secondary queries in the first column and their overlap with the main query in the second column. The columns argument specifies the two column names for the DataFrame.

13. df_url_percentages = df_url_percentages.sort_values(by='Percentage', ascending=False): Sorts the DataFrame df_url_percentages based on the values in the Percentage column. By setting ascending=False, the DataFrame is sorted from the highest to the lowest values.

14. df_url_percentages: Shows the sorted DataFrame in the Google Colab output area. In most other Python environments you would have to use the print() function to display the DataFrame. But not in Google Colab. Plus, the table is interactive.

In short, this code performs a series of Google searches and shows the overlap between the top three search results for each secondary query and the main query.

The larger the overlap is, the more likely you can rank for a primary and secondary query with the same page.
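The overlap calculation at the heart of this analysis can be tried in isolation. The sketch below uses made-up example.com URLs instead of live search results, and the helper name overlap_percentage is hypothetical. It also returns 0.0 when the main query yields no URLs, a case where the division in the walkthrough would raise a ZeroDivisionError.

```python
def overlap_percentage(main_urls, other_urls):
    """Percentage of URLs in `main_urls` that also appear in `other_urls`."""
    main_set, other_set = set(main_urls), set(other_urls)
    if not main_set:
        # No main results (for example, a blocked request): avoid dividing by zero.
        return 0.0
    return len(main_set & other_set) / len(main_set) * 100


# Made-up URLs for illustration; not real search results.
main = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
other = ["https://example.com/b", "https://example.com/c", "https://example.com/d"]

print(round(overlap_percentage(main, other), 1))  # 66.7
```

Two of the three main URLs appear in the second list, so the overlap is about 66.7%.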

Visualizing Your Google Search Analysis Results

Visualizing the results of a Google search analysis can provide a clear and intuitive representation of the data. And enable you to easily interpret and communicate the findings.

Visualization is helpful when we apply our code for keyword clustering to no more than 20 or 30 queries.

Note: For larger query samples, the query labels in the bar chart we’re about to create will bleed into each other. Which makes the DataFrame created above more useful for clustering.

You can visualize your URL percentages as a bar chart using Python and Matplotlib with this code:

import matplotlib.pyplot as plt
sorted_percentages = sorted(url_percentages.items(), key=lambda x: x[1], reverse=True)
sorted_queries, sorted_percentages = zip(*sorted_percentages)
# Plotting the URL percentages with sorted x-axis
plt.bar(sorted_queries, sorted_percentages)
plt.xlabel("Queries")
plt.ylabel("URL Percentage")
plt.title("URL Percentage in Search Results")
plt.xticks(rotation=45)
plt.ylim(0, 100)
plt.tight_layout()
plt.show()

We’ll quickly run through the code again:

an image of the code described in this section

1. sorted_percentages = sorted(url_percentages.items(), key=lambda x: x[1], reverse=True): This sorts the items of the URL percentages dictionary (url_percentages) by value in descending order using the sorted() function. It creates a list of tuples (query and percentage pairs) sorted by the URL percentages.

2. sorted_queries, sorted_percentages = zip(*sorted_percentages): This unpacks the sorted list of tuples into two separate sequences (sorted_queries and sorted_percentages) using the zip() function and the * operator. The * operator in Python is a tool that lets you break down collections into their individual items. Note that zip() produces tuples rather than lists.

an image of the code section described above

3. plt.bar(sorted_queries, sorted_percentages): This creates a bar chart using plt.bar() from Matplotlib. The sorted queries are assigned to the x-axis (sorted_queries). And the corresponding URL percentages are assigned to the y-axis (sorted_percentages).

4. plt.xlabel("Queries"): This sets the label “Queries” for the x-axis

5. plt.ylabel("URL Percentage"): This sets the label “URL Percentage” for the y-axis

6. plt.title("URL Percentage in Search Results"): This sets the title of the chart to “URL Percentage in Search Results”

7. plt.xticks(rotation=45): This rotates the x-axis tick labels by 45 degrees using plt.xticks() for better readability

8. plt.ylim(0, 100): This sets the y-axis limits from 0 to 100 using plt.ylim() to ensure the chart displays the URL percentages correctly

9. plt.tight_layout(): This function adjusts the padding and spacing around the plot to improve the chart’s layout

10. plt.show(): This function is used to display the bar chart that visualizes your Google search results analysis
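The sorting and unpacking steps are the least obvious part of the chart code, so here they are on their own. This is a small sketch with made-up placeholder queries and percentages; no plotting library is needed to run it.

```python
# Made-up percentages for illustration; not real search data.
url_percentages = {"query a": 33.3, "query b": 100.0, "query c": 66.7}

# Sort the (query, percentage) pairs by percentage, highest first.
pairs = sorted(url_percentages.items(), key=lambda x: x[1], reverse=True)

# zip(*pairs) regroups the pairs into one tuple of queries and one of percentages.
queries, percentages = zip(*pairs)

print(queries)      # ('query b', 'query c', 'query a')
print(percentages)  # (100.0, 66.7, 33.3)
```

If url_percentages is empty, the unpacking line raises a ValueError, so it is worth checking that the dictionary has entries before plotting.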

And here’s what the resulting bar chart looks like:

"URL percentage in search results" graph

Master Google Search Using Python’s Analytical Power

Python offers incredible analytical capabilities that can be harnessed to effectively scrape and analyze Google search results.

We’ve looked at how to cluster keywords, but there are virtually limitless applications for Google search analysis using Python.

But even just to extend the keyword clustering we’ve just performed, you could:

  • Scrape the SERPs for all queries you plan to target with one page and extract all the featured snippet text to optimize for them
  • Scrape the questions and answers inside the People also ask box to adjust your content to show up in there

You’d need something more robust than the Googlesearch module. There are some great SERP application programming interfaces (APIs) out there that provide virtually all the information you find on a Google SERP itself, but you might find it simpler to get started using Keyword Overview.

This tool shows you all the SERP features for your target keywords. So you can study them and start optimizing your content.
