How can a Python crawler extract text wrapped in hyperlink tags on a web page? - 软件指南


Community user • 2025-04-11 16:01 • Tutorial • 27 views

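Python crawlers: extracting hyperlink text efficiently

When scraping web pages with a Python crawler, text that sits inside a nested tag often goes missing from the results. This article uses one concrete case to show why that happens and how the code can be changed so the nested text is captured.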

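Problem: the XPath expression //div[@class="f14 l24 news_content mt25zoom"]/p/text() fails to extract the target text “绿色发展” ("green development"), because that text is nested inside an <a> tag. The original code collects only the plain text directly under each <p> tag and skips the <a> tag together with its contents.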
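The behavior described above can be reproduced on a small scale. The sketch below is only an illustration: the HTML snippet and the `news_content` class are made-up stand-ins for the real page, chosen so the example runs without a network request. It shows that `p/text()` returns the text on either side of the link but not the link text itself, which has to be reached through the `<a>` element.

```python
from lxml import etree

# Hypothetical stand-in for the article page: the target phrase sits inside an <a>.
SAMPLE_HTML = """
<div class="news_content">
  <p>Commit to <a href="#">绿色发展</a> across the industry.</p>
</div>
"""

tree = etree.HTML(SAMPLE_HTML)

# text() after /p matches only text nodes that are direct children of <p>.
direct_text = tree.xpath('//div[@class="news_content"]/p/text()')

# The link text is a child of <a>, one level deeper.
link_text = tree.xpath('//div[@class="news_content"]/p/a/text()')

print(direct_text)  # text before and after the link, without the link text
print(link_text)    # the link text only
```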

Original code:


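    import requests
    from lxml import etree

    base_url = "https://www.solidwaste.com.cn/news/342864.html"
    resp = requests.get(url=base_url)
    html = etree.HTML(resp.text)

    # Read the charset from the first <meta> tag and re-decode the response with it.
    encod = html.xpath('//meta[1]/@content')
    if encod:
        encod = encod[0].split("=")[-1]
        resp.encoding = encod
        html = etree.HTML(resp.text)

    # text() here matches only text nodes directly under <p>,
    # so text inside a nested <a> tag is not returned.
    content = html.xpath('//div[@class="f14 l24 news_content mt25zoom"]/p/text()')
    print(content)

    # Strip each fragment and join the fragments one per line.
    content_deal = ""
    for i in content:
        da = i.strip() + "\n"
        content_deal += da
    print(content_deal)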
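One common way to pick up text nested in child tags, offered here as a sketch and not necessarily the fix the author had in mind, is to select each `<p>` element and evaluate `string(.)` on it, which concatenates all descendant text nodes in document order. The helper name `extract_paragraphs`, the sample HTML, and the `news_content` class are assumptions made for illustration, so the example runs offline.

```python
from lxml import etree

# Hypothetical sample markup standing in for the real page.
SAMPLE_HTML = """
<div class="news_content">
  <p>Commit to <a href="#">绿色发展</a> across the industry.</p>
  <p>Second paragraph without links.</p>
</div>
"""


def extract_paragraphs(page_source, container_xpath):
    """Return the full text of each <p> under the container, nested tags included."""
    tree = etree.HTML(page_source)
    paragraphs = []
    for p in tree.xpath(container_xpath + "/p"):
        # string(.) joins every descendant text node, so <a> text is kept in place.
        text = p.xpath("string(.)").strip()
        if text:
            paragraphs.append(text)
    return paragraphs


paragraphs = extract_paragraphs(SAMPLE_HTML, '//div[@class="news_content"]')
content_deal = "\n".join(paragraphs)
print(content_deal)
```

For the real page, the container expression would be the article's own `//div[@class="f14 l24 news_content mt25zoom"]`. Changing `/p/text()` to `/p//text()` is a smaller edit that also reaches the link text, but it returns separate fragments, so a sentence containing a link comes back split into pieces.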


If you repost this article, please credit the source: http://www.down96.com/tutorials/8723.html
