
How to set up a proxy in Puppeteer for web scraping

Puppeteer is a widely used browser automation library, originally developed for JavaScript (Node.js), that controls Chrome or Chromium through the Chrome DevTools Protocol using a high-level API. “pyppeteer” is its unofficial Python port and offers similar functionality in a Python environment. An example application is a web crawler built on the Mimic browser, which presents spoofed browser fingerprints while it searches for and gathers data.

How to set up and use proxies in Puppeteer in Python

The steps below show how to set up a private proxy that requires authorization for web scraping with “pyppeteer”, the Python port of Puppeteer.

  1. Make sure “pip”, Python's package installer, is available, since it is needed to install “pyppeteer”. pip usually ships with Python; you can check with “python -m pip --version”, and if it is missing, install it from the command line with “python -m ensurepip --upgrade”.

  2. Install “pyppeteer” by running “pip install pyppeteer” on the command line.

  3. Install “pyppeteer-stealth” as well (“pip install pyppeteer-stealth”), then import both packages in your script: “launch” from “pyppeteer” and “stealth” from “pyppeteer_stealth”.

  4. In the script, replace the placeholders “http://your-proxy-ip:your-proxy-port”, “your-username”, and “your-password” with your actual proxy server details: IP address, port, username, and password. Also change the target page in the call “await page.goto("https://example.com")” to the website you want to scrape.
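The steps above can be sketched as a single script. This is a minimal illustration rather than the original script from the screenshots: the constant names and the helper functions “build_launch_options” and “fetch_page_html” are hypothetical, and it assumes both packages from steps 2 and 3 are installed. The proxy address is passed to Chromium through its “--proxy-server” flag, and the credentials are supplied with “page.authenticate”.

```python
import asyncio

# Placeholders from step 4: replace with your real proxy details.
PROXY_SERVER = "http://your-proxy-ip:your-proxy-port"
PROXY_USERNAME = "your-username"
PROXY_PASSWORD = "your-password"
TARGET_URL = "https://example.com"


def build_launch_options(proxy_server, headless=True):
    """Build the options passed to pyppeteer's launch().

    Hypothetical helper: Chromium receives the proxy address through
    its --proxy-server command-line flag.
    """
    return {
        "headless": headless,
        "args": [f"--proxy-server={proxy_server}"],
    }


async def fetch_page_html(url, proxy_server, username, password):
    """Open url through the authenticated proxy and return its HTML."""
    # Imported here so the helper above can be used without a browser.
    from pyppeteer import launch
    from pyppeteer_stealth import stealth

    browser = await launch(**build_launch_options(proxy_server))
    try:
        page = await browser.newPage()
        # Answer the proxy's authentication challenge.
        await page.authenticate({"username": username, "password": password})
        # Apply the stealth patches before the first navigation.
        await stealth(page)
        await page.goto(url)
        return await page.content()
    finally:
        # Close the browser even if navigation fails.
        await browser.close()


async def main():
    html = await fetch_page_html(
        TARGET_URL, PROXY_SERVER, PROXY_USERNAME, PROXY_PASSWORD
    )
    print(html[:200])
```

After replacing the placeholders, the script can be started with “asyncio.run(main())”. Keeping the launch options in a separate function makes it easy to check the proxy flag without starting a browser.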

With these steps, you're set to run a web scraping script through the configured proxy using “pyppeteer”. Routing traffic through a proxy conceals your real IP address from the target site and can help you get past IP-based restrictions, giving access to data that was previously unreachable.