Python example: fetching web page content

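Here is a simple Python example for fetching web page content. It uses the `requests` library to send the HTTP request and the `BeautifulSoup` library to parse the HTML. Make sure both libraries are installed; if they are not, install them with pip: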


pip install requests beautifulsoup4

The following Python code fetches a web page:


import requests
from bs4 import BeautifulSoup

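def fetch_webpage_content(url):
    """
    Fetch a web page and return its HTML.

    Args:
        url (str): URL of the page to fetch

    Returns:
        str: The page's HTML, pretty-printed, or an error message
            if the request fails
    """
    try:
        # Send an HTTP GET request; the timeout keeps the call from hanging forever
        response = requests.get(url, timeout=10)

        # Check whether the request succeeded
        if response.status_code == 200:
            # Parse the HTML with BeautifulSoup. Passing the raw bytes lets
            # BeautifulSoup detect the encoding itself; response.text can garble
            # non-ASCII pages when the server does not declare a charset
            soup = BeautifulSoup(response.content, 'html.parser')
            # This simply returns the whole page; use the soup object
            # to parse further and extract the content you need
            return soup.prettify()  # Formatted HTML string, easier to read
        else:
            return f"Failed to retrieve the webpage. Status code: {response.status_code}"
    except requests.RequestException as e:
        return f"Error occurred while fetching the webpage: {e}"

# Example: fetch the Baidu home page
url = 'https://www.baidu.com'
content = fetch_webpage_content(url)
print(content)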
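
The comment inside `fetch_webpage_content` mentions that the `soup` object can be used to extract specific content. A minimal sketch of that step follows, pulling the page title and all links out of an HTML string. The function name `extract_title_and_links` and the sample HTML are assumptions made for illustration, not part of the original example:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_title_and_links(html, base_url):
    """Return the page title and a list of (text, absolute_url) pairs."""
    soup = BeautifulSoup(html, "html.parser")

    # soup.title is None when the page has no <title> element
    title = soup.title.get_text(strip=True) if soup.title else None

    links = []
    # href=True skips <a> tags that have no href attribute
    for a in soup.find_all("a", href=True):
        # urljoin resolves relative hrefs against the page URL
        links.append((a.get_text(strip=True), urljoin(base_url, a["href"])))
    return title, links


# Hypothetical sample page, used so the sketch runs without network access
sample_html = """
<html>
  <head><title>Example page</title></head>
  <body>
    <a href="/news">News</a>
    <a href="https://example.com/about">About</a>
    <a>No link here</a>
  </body>
</html>
"""

title, links = extract_title_and_links(sample_html, "https://example.com/")
print(title)
for text, href in links:
    print(text, href)
```

For a real page, the HTML string would come from the request (for example `response.text` or `response.content`), and the tags and attributes to search for depend on that page's structure.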

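Note that `fetch_webpage_content` only shows how to send an HTTP request and retrieve a page's HTML. In practice, you will usually need to parse the page according to its structure and extract the data you need. Also, because web content may be protected by copyright, follow the applicable laws and regulations and the site's robots.txt rules when scraping.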
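
One way to honor robots.txt is to check a URL against the site's rules before requesting it. The sketch below uses the standard library's `urllib.robotparser` for that. The helper name `is_allowed`, the user agent string `my-crawler`, and the sample rules are assumptions made for illustration:

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration; a real site's file will differ
sample_robots = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(sample_robots, "my-crawler", "https://example.com/index.html"))
print(is_allowed(sample_robots, "my-crawler", "https://example.com/private/data"))
```

Against a live site, `RobotFileParser` can also download the file itself through its `set_url()` and `read()` methods. Parsing a string is used here only so the sketch runs offline.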