
Quick $300 Web Scraping Opportunity: A Step-by-Step Guide


Chapter 1: Discovering the Opportunity

While browsing Craigslist, I stumbled upon a request for someone to download files from the Texas Department of Transportation's website and extract data from them. The client was seeking a manual approach.

Screenshot of Txdot.gov

When I saw this, an idea struck me: I could use Python to automate the process and finish the task in roughly an hour. I replied by email, the client got back to me the following day, and we arranged a call. I assured him that I could automate the steps needed to retrieve the data efficiently. He initially suggested hourly pay, but I told him that I typically charge per project, and he accepted my proposal.

Section 1.1: Initial Steps

The first task was to download the files from the txdot.gov site. I adjusted an existing Jupyter notebook and accomplished this in about 15 minutes. For each year, I manually handled the data extraction. Tip: Sometimes, not every step needs automation. I merely adjusted a line of code as I navigated through different pages, streamlining the process without overthinking it.

To begin, I loaded the required libraries:

import os
import re
import urllib.request  # urlretrieve lives in the urllib.request submodule
from datetime import date, datetime, timedelta

import pandas as pd
import requests
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

Next, I identified the URL for the txdot.gov site, initiated the Firefox driver, and accessed the URL:

driver = webdriver.Firefox()  # ('geckodriver.exe')
driver.implicitly_wait(10)
driver.get(url)

Section 1.2: The Download Process

I took a somewhat relaxed approach to gathering the files. Knowing there were fewer than 100 URLs for any given year, I set the loop to run up to 1000 times, fully expecting it to raise an error once it ran past the last table row. That wasn't an issue, because every iteration before the error had already downloaded its file. The urllib.request.urlretrieve function handled the download using two parameters: the file's URL on the website, and the path and filename to save it under on my device.

for row in range(1, 1000):
    # Link in the first cell of the current table row
    location = driver.find_element(
        By.XPATH,
        f'/html/body/div[2]/div[3]/main/div[3]/div[1]/div[3]/div/div/div[7]/div/div/table/tbody/tr[{row}]/td[1]/a'
    ).get_attribute('href')
    print(location)
    # Save the file under its original name in the local download folder
    urllib.request.urlretrieve(location, 'c:/users/denni/downloads/txdot/' + location.split('/')[-1])
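If you would rather have the loop stop cleanly than end on an error, here is a minimal sketch of one way to do it. It is not the code I ran for this job. The names destination_path, download_rows, and get_href are hypothetical, and get_href is assumed to return None once a row no longer exists.

```python
import os
from urllib.parse import unquote, urlsplit
from urllib.request import urlretrieve


def destination_path(file_url, folder):
    """Build the local path: folder plus the last segment of the URL path."""
    name = os.path.basename(urlsplit(file_url).path)
    return os.path.join(folder, unquote(name))


def download_rows(get_href, folder, retrieve=urlretrieve, max_rows=1000):
    """Download one file per table row and stop at the first missing row.

    get_href(row) is a hypothetical callback that returns the link for a
    row, or None when the row does not exist.
    """
    saved = []
    for row in range(1, max_rows + 1):
        href = get_href(row)
        if href is None:
            break  # ran past the last row, so there is nothing left to fetch
        target = destination_path(href, folder)
        retrieve(href, target)  # same two arguments as urlretrieve
        saved.append(target)
    return saved
```

With Selenium, get_href could wrap driver.find_elements (plural), which returns an empty list instead of raising when nothing matches. The target folder has to exist before the first download.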

I still needed clarification from the client on how he wanted the data extracted. Having frequently worked with PDFs, I estimated that the initial extraction would take around 15 minutes, while the PDF parsing would likely take less than 45 minutes. Thus, I projected earning over $300 for this straightforward gig.

Chapter 2: Completing the Task

The remaining parts of the job were relatively straightforward. The client requested I compile the listings from the pages. Given there were only seven years of data, I quickly copied and pasted the information into Excel, taking less than 5 minutes. The final step involved exporting the results to Excel. Initially, I started coding in Python, but then I remembered the PDFElement Pro software I had purchased for PDF conversions, which turned out to be the ideal solution. What I thought would take 10 minutes to code ended up taking just about 3 minutes to execute. I then zipped the results and sent them to the client.
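The zipping step can also be done from Python with the standard library. This is a rough sketch rather than what I actually did, and the function name zip_folder and the folder names in the usage note are placeholders.

```python
import zipfile
from pathlib import Path


def zip_folder(folder, archive_path):
    """Write every file under folder into one compressed zip archive."""
    folder = Path(folder)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Store paths relative to the folder so the archive has no local prefix
                archive.write(path, arcname=path.relative_to(folder).as_posix())
    return archive_path
```

Calling zip_folder('results', 'results.zip') would bundle a results folder into a single file to send. Keep the archive outside the folder being zipped so it does not try to include itself.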

Total time spent: under 60 minutes. Compensation = $300.

Interestingly, the client later presented additional requests. I clarified that these changes would incur extra charges.

It's essential to show clients what automation makes possible. They are often willing to pay a premium for swift results: this client could have hired someone to do the work manually for $200 or less.

