Tag: Python

  • Make a Price Drop Notifier in Python

    In this guide, I will show you how to use the BeautifulSoup library to make a simple program that notifies you when a product on an online site drops in price.

This program runs in the background, scraping static e-commerce sites of your choice and notifying you when a product drops in price.

    Prerequisites

This guide assumes that you have Python installed and pip added to your system’s PATH, along with a basic understanding of Python and HTML.

    Installing Required Components

First, let’s install BeautifulSoup and Requests. The Requests library retrieves the page, while the BeautifulSoup library actually parses it.

    We can install those two required components by running the command below:

    BAT (Batchfile)
    pip install beautifulsoup4 requests

    Note that depending on what your system’s setup is, you might need to use pip3 instead of pip.

    Grabbing Our Sample: Price

    In this step, we will be telling BeautifulSoup what exactly to scrape. In this case, it’s the price. But we need to tell BeautifulSoup where the price is on the website.

To do this, navigate to the product you want to scrape. For this guide, I will be scraping an AV receiver I found on Amazon.

    Then, use your browser’s DevTools and navigate to the price. However, make sure that you have a very “unique” element selected. This is an element that shows the product’s price but is also very specifically identified within the HTML document. Ideally, choose an element with an id attribute, as there cannot be two elements with the same HTML ID. Try to get as much “uniqueness” as you can because this will make the parsing easier.

The element I have selected above is not the most “unique”, but it is the closest we can get: it has many classes that I can safely assume few other elements share in full.

    We also want to ensure that our web scraper stays as consistent as possible with website changes.

If you don’t have an element that is completely “unique” either, then I suggest using the Console tab and the JavaScript DOM to see how many other elements share those attributes.

In this case, for example, I am checking whether the element I selected is “unique” enough to be selected by its classes.

    In this case, there is only one other element that I need to worry about, which I think is good enough.

    Basic Scraping: Setup

    This section will detail the fundamentals of web scraping only. We will add more features as this guide goes on, building upon the code we will write now.

    First, we need to import the libraries we will be using.

    Python
    import requests as rq
    from bs4 import BeautifulSoup

    Then, we need to retrieve the content from our product. I will be using this AV receiver as an example.

    Python
    request = rq.get("https://www.amazon.com/Denon-AVR-X1700H-Channel-Receiver-Built/dp/B09HFN8T64/")

    If the content you want to scrape is locked behind a login screen, chances are you need to provide basic HTTP authentication to the site. Luckily, the Requests library has support for this. If you need authentication, add the auth parameter to the get method above, and make it a tuple that follows the format of ('username','password').

    For example, if Amazon required us to use HTTP basic authentication, we would declare our request variable like the one below:

    Python
    request = rq.get("https://www.amazon.com/Denon-AVR-X1700H-Channel-Receiver-Built/dp/B09HFN8T64/", auth=("replaceWithUsername","replaceWithPwd"))

    If that authentication type does not work, then the site may be using HTTP Digest authentication.

To authenticate with Digest, you will need to import HTTPDigestAuth from the requests.auth submodule. Then it’s as simple as passing that object into the auth parameter.

    Python
    from requests.auth import HTTPDigestAuth
    request = rq.get("https://www.amazon.com/Denon-AVR-X1700H-Channel-Receiver-Built/dp/B09HFN8T64/", auth=HTTPDigestAuth("replaceWithUsername","replaceWithPwd"))

    If the content you want to scrape requires a login other than basic HTTP authentication or Digest authentication, consult this guide for other types of authentications.

Amazon does not require any authentication here, so our code will work without providing any.

    Now, we need to create a BeautifulSoup object and pass in our website’s response to the object.

    Python
    parser = BeautifulSoup(request.content, 'html.parser')

When you want to print a response to the console, you will generally use request.text, which decodes the body into a string. Here we don’t need printable text, so it is considered better practice to hand BeautifulSoup the raw bytes with request.content and let it handle the decoding.
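If you want to see the difference for yourself, the two attributes differ only in how the body is represented:

Python
print(type(request.text))     # <class 'str'> – the decoded response body
print(type(request.content))  # <class 'bytes'> – the raw, undecoded bytes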

    Basic Scraping: Searching Elements

Now we can get to the fun part! We will find the price element using the sample we grabbed earlier.

I will cover the two most common scenarios: finding the price by its element’s ID (the simplest), and finding it by class names and sub-elements (a little more involved, but not too difficult, assuming you have a “unique” enough element).

To refer to an element by its ID with BeautifulSoup, use the find method. For example, to store the element with the ID pricevalue in a variable called priceElement, we would invoke find() with its id argument set to "pricevalue".

    Python
    priceElement = parser.find(id="pricevalue")

    We can even print our element to the console!

    Python
    print(priceElement.prettify())
    Expected Output (may vary)
    <div id="pricevalue"><p>$19.99</p></div>

The prettify function reformats (“pretty-prints”) the markup. Use it when you want to visualize the data, as it produces much more readable console output.

Now we get to the tougher part – making references to element(s) based on one or more class names. This is the method you will need to use for most major e-commerce sites like Amazon or eBay.

This time, we will be using the find_all function. It is meant for situations where it is possible to get multiple matches, such as searching by one or more classes, and it returns its output as a list of elements rather than a single element.

If you are not sure, know that you can use find_all even when your query returns only one result; you will just get a one-item list.
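For example, reusing the pricevalue ID from earlier, a query that can only match one element still comes back as a list:

Python
results = parser.find_all(id="pricevalue")  # an ID can only match one element
print(len(results))   # 1
priceElement = results[0]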

    The code below will return any elements with the classes of priceToPay or big-text.

    Python
    priceElements = parser.find_all(class_=["priceToPay","big-text"])

The select function works like find_all, except that instead of specifying attributes through function parameters, you simply pass in a CSS selector and get a list of matching element(s) back.
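For example, a quick sketch using illustrative class names:

Python
priceElements = parser.select(".price-value.main-color")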

    The code above selects all elements with the class of both price-value and main-color. Although many use the find or find_all functions, I prefer select as I am already familiar with CSS selectors.

If we want to filter by element type alone (not usually a good idea when pinpointing elements), we just call find_all with a single positional argument, the element’s type. So parser.find_all("p") will return a list of every single paragraph ("p") element.

    An element type is one of the broadest filters you can pass into the find_all function, so this only becomes useful when you combine it with another narrower filter, such as an id or class.

    Python
    parser.find_all("h1", id="title")

    That would return all h1 elements with an ID of title. But since each element needs to have its own unique ID, we can just use the find function. Let’s do something more realistic.

    Python
    parser.find_all("h1",class_="bigText")

    This code would return all h1 elements that had a class of bigText.

Below is a review of what we know so far, along with some other, rarer methods of finding elements.

    Python
    """
    Never recommended, but returns a list of ALL the elements that have type 'p'
    """
    typeMatch = parser.find_all("p")
    
    """
Finds the element with the ID of 'priceValue' using a CSS selector (select always returns a list)
    """
    idSelMatch = parser.select("#priceValue")
    
    """
Finds the element with the ID of 'priceValue', this time with the BeautifulSoup-native find function rather than a CSS selector
    """
    idMatch = parser.find(id="priceValue") # Same result as above, but returns the element directly rather than a one-item list
    
    
    """
    Extremely rare, but returns a list of elements containing an ID of 'priceValue' OR 'price'
    """
    orIdMatch = parser.find_all(id=["priceValue","price"])
    
    
    """
Returns a list of elements that have the class 'price' OR 'dollarsToPay'. With select, a selector group like '.price, .dollarsToPay' does the same
    """
    orClassMatch = parser.find_all(class_=['price','dollarsToPay'])
    
    
    """
Returns a list of elements that have the class 'price' AND 'dollarsToPay'. I do not know of a
    find_all argument that does the same
    """
    andClassMatch = parser.select(".price.dollarsToPay")
    
    """
Returns the element that has a class of 'v' INSIDE an element with the class 't'. This also works with ID attributes, but chaining like this only works when the first call is .find(...) or when you grab an element by index after calling .find_all(...). Because .find(...) only returns one element, it returns the first instance of that class name. The three lines below return the same thing, except that 'inMatch3' returns a list
    """
    inMatch = parser.find(class_="t").find(class_="v") # Most basic way to do it
    inMatch2 = parser.find_all(class_="t")[0].find_all(class_="v")[0] # The final '[0]' extracts the element from the one-item list; without it we would get a one-element list (see inMatch3)
    inMatch3 = parser.find_all(class_="t")[0].find_all(class_="v") # Returns a one-element list

    Now that we know how to search elements, we can finally implement this in our price drop notifier!

    Let’s see if our request is successful. We will be printing out the entire file to check.

    Python
    print(parser.find("html").prettify())

And it is not – Amazon has served us a CAPTCHA page instead of the product page.

    Hmmm, so we have to bypass Amazon’s CAPTCHA somehow, so let’s try adding headers that mimic a normal browser!

I will be adding headers to rq.get(). Make sure to replace my AV receiver link with the product you want to scrape.

    Replace “request=rq.get(…)”
    headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36","accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7","accept-encoding":"gzip, deflate, br","accept-language":"en-US,en;q=0.9","Sec-Ch-Ua":'"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',"Sec-Ch-Ua-Mobile":"?0","Sec-Ch-Ua-Platform":"\"Windows\""}
    
    request = rq.get("https://www.amazon.com/Denon-AVR-X1700H-Channel-Receiver-Built/dp/B09HFN8T64/",headers=headers)

    Let’s try now…

Nope. Still nothing. Well, time for plan B: ditching Requests completely and using Selenium.


    Basic Scraping: Implementation of Selenium

First, it is important to know that Selenium has its own methods for finding elements in an HTML document, but for the sake of this guide, we will simply pass the page’s source code to our BeautifulSoup parser.

    Think of Selenium as a browser running in the background with some selection abilities. Instead of sending the requests to the website by crafting our own headers, we can use Selenium to spin up an invisible browser that crafts the headers for us. We should no longer get a CAPTCHA screen because Amazon shouldn’t be suspicious that a robot is browsing the page – we are technically using a legitimate browser, but with parsing capabilities.

Selenium can be installed with the commands below. We will also install win10toast so you get a proper toast notification whenever a price drop is detected.

    BAT (Batchfile)
    pip install selenium
    pip install win10toast

    If you are looking for how you can uninstall Requests because you don’t need it anymore, think twice because Selenium depends on Requests anyways.

    Now, clear your entire Python file because we are going to need to do a short and quick rewrite of our code to use Selenium.

As always, we will start by importing the required modules. Make sure you replace chrome with the name of a browser you have installed on your system, preferably the most resource-efficient one.

    Python
    from selenium import webdriver
    from bs4 import BeautifulSoup
    from selenium.webdriver.chrome.options import Options # Imports the module we will use to change the settings for our browser
    import time # This is what we will use to set delays so we don't use too many system resources
    from win10toast import ToastNotifier # This is what we will use to notify if a price drop occurs.
    
notifier = ToastNotifier() # Create our notifier object

Then, we will need to set some preferences for the browser we are about to start. Let’s start by creating an Options object and using it to make the browser invisible, i.e. run it in “headless” mode. The settings below target specific browsers, but I would just apply them all, as I have not tested each one individually.

    Python
    browserOptions = Options()
    browserOptions.headless = True # Makes Firefox run headless
    browserOptions.add_argument("--headless=new") # Makes newer versions of Chrome run headless
    browserOptions.add_argument("--headless") # Makes older versions of Chrome run headless
    browserOptions.add_argument("--log-level=3") # Only log fatal errors

    Now, we will initiate the browser in the background. Again, make sure you replace Chrome with whichever browser you want to use for this project.

    Python
    browser = webdriver.Chrome(options=browserOptions)

    Now, we can navigate our browser to the page we want to scrape and get its source, which we can pass to BeautifulSoup.

    Python
    browser.get("https://www.amazon.com/Denon-AVR-X1700H-Channel-Receiver-Built/dp/B09HFN8T64/")
    parser = BeautifulSoup(browser.page_source, "html.parser")

    Then, we can use what we already know about BeautifulSoup to grab the price of our element. Remember to replace the code below with one tailored to your sample.

    Python
    price = parser.select(".a-price.aok-align-center.reinventPricePriceToPayMargin.priceToPay")[0].find_all(class_="a-offscreen")[0].text

    Next, let’s strip the $ symbol from the price and convert it into a floating-point decimal.

    Python
    price = float(price.strip("$"))
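Note that if the price on your product page includes a thousands separator (for example “$1,299.00”), float() will fail. In that case, a slight variation like the sketch below also strips out the commas first:

Python
price = float(price.strip("$").replace(",", ""))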

    Then, we can set a variable to compare with the current price.

    Python
    previousPrice = price

    Now, we loop infinitely to see whether the price changed.

    Python
    while True:

    Insert a new line and then indent the code we will write from this point forward.

    Now, every two minutes (120 seconds), we refresh the page and compare the price we just got to our previous price.

    Python (place each line indented inside while loop)
    browser.refresh() # Refreshes the browser
    
# Now that we may have a new price, we have to redefine our parser and price variables to adapt to the new page code
    parser = BeautifulSoup(browser.page_source, "html.parser")
    price = parser.select(".a-price.aok-align-center.reinventPricePriceToPayMargin.priceToPay")[0].find_all(class_="a-offscreen")[0].text
    price = float(price.strip("$"))
    
# Next, we compare the two prices. If we find a change, we alert the user and update our stored price. We will also be watching for price increases.
    if (price<previousPrice):
      print(f"Price DECREASED from ${previousPrice} to ${price}!")
      notifier.show_toast("Price Drop!", f"The price decreased from ${previousPrice} to ${price}!")
    elif (price>previousPrice):
      print(f"Price INCREASED from ${previousPrice} to ${price}!")
      notifier.show_toast(":(", f"The price increased from ${previousPrice} to ${price} :(")
    
    # Now, we can tell the user we refreshed
    print(f"Refreshed! Previous Price: ${previousPrice}, and new price ${price}")
    previousPrice = price
    
    # And then we wait for two minutes
    time.sleep(120)

And just like that, you are finished! I hope this project was useful to you!

• Tensor Dimensions and Basics in Python Artificial Intelligence and Machine Learning

    In PyTorch and TensorFlow, Tensors are a very popular way of storing large amounts of data in artificial intelligence projects. Here, I will show you what they are, and how they work.

    What makes a Tensor?

Tensors are made up of scalars, vectors, and matrixes. Scalars are single numbers, vectors are lines of numbers, and matrixes are, as the name suggests, tables of numbers.

Here is an example: if you are working with an image, you can think of a matrix as the image, scalars as pixels or dots, and vectors as rows. You can think of a tensor as a matrix that contains matrixes.

[Figure: an example tensor (yellow) containing two matrixes (red and cyan/light blue), each made up of vectors (orange) of scalars (green).]
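To make this concrete, here is a minimal sketch in PyTorch (assuming the torch package is installed) showing a scalar, a vector, a matrix, and a tensor built from two matrixes:

Python
import torch

scalar = torch.tensor(5)                   # a single number (0 dimensions)
vector = torch.tensor([1, 2, 3])           # a line of numbers (1 dimension)
matrix = torch.tensor([[1, 2],
                       [3, 4]])            # a table of numbers (2 dimensions)
tensor = torch.tensor([[[1, 2], [3, 4]],
                       [[5, 6], [7, 8]]])  # a "matrix of matrixes" (3 dimensions)

print(scalar.ndim, vector.ndim, matrix.ndim, tensor.ndim)  # 0 1 2 3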

    Matrix dimension

    Matrixes are tables of numbers, so the number of rows and columns in the matrix is the matrix dimension. Below is an example.

1 2
    3 4

    There are two rows and two columns in this table of numbers or matrix, so the dimensions of this matrix are two by two. Below is another example.

1 2 3 4
    1 2 3 4
    1 2 3 4
    1 2 3 4

The matrix above has four rows and four columns, so it is a four-by-four matrix.

    Tensor Dimension

    Tensor dimensions are made up of three things. Earlier in this post, I mentioned how a tensor is a matrix containing matrixes. The first dimension of a tensor is how many matrixes the tensor should have in it. The next two dimensions are the dimensions you want each matrix to have. For example,

1 2 3 4
    5 6 7 8
    9 10 11 12
    13 14 15 16

would be a 4×4 matrix. If you wanted four four-by-four matrixes, you would make the first dimension (the number of matrixes in the tensor, which, as I said, is a matrix full of matrixes) four. Then, since you want 4×4 matrixes, you would give the next two dimensions as 4 and 4, for a 4×4×4 tensor.
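As a minimal sketch in PyTorch (the same idea applies in TensorFlow), the shape below reads as “four matrixes, each with four rows and four columns”:

Python
import torch

# First dimension: number of matrixes; next two: each matrix's rows and columns
t = torch.zeros(4, 4, 4)
print(t.shape)  # torch.Size([4, 4, 4])

# A single 4x4 matrix on its own is just a 2-dimensional tensor
m = torch.arange(1, 17).reshape(4, 4)
print(m)
print(m.shape)  # torch.Size([4, 4])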

    Tips

    • If you do not input your first dimension (the number of matrixes in the tensor) into a tensor, the number defaults to 1.
    • Tensors are useful for storing mass amounts of data.
• One of the easiest ways to make a tensor with custom values is to run a loop over every scalar in the tensor, setting each scalar to a value you choose (see the sketch after this list).
• Tensors can be stored unevaluated. This means your actual data (typically the numbers you would store in a tensor) is not necessarily kept raw; instead, the framework keeps a more compact internal representation, which is much easier on the machine’s memory. This is part of what makes tensors so popular for storing mass amounts of data. If you want to see the actual, fully evaluated data of a tensor, you must evaluate it, which you can do with a simple function in both PyTorch and TensorFlow.
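Here is a minimal sketch of the loop idea from the tips above in PyTorch (assuming torch is installed), filling every scalar of a tensor with a value we choose:

Python
import torch

# Start with a tensor of two 3x3 matrixes, all zeros
t = torch.zeros(2, 3, 3)

# Visit every scalar and set it to a value of our choosing
for i in range(t.shape[0]):          # which matrix
    for j in range(t.shape[1]):      # which row
        for k in range(t.shape[2]):  # which column
            t[i, j, k] = i * 100 + j * 10 + k

print(t)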