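Python Web Crawlers (Python网络爬虫)
Intended readers: This book can serve as a core course textbook for computer-related majors at higher vocational colleges, and as a reference for practitioners in computing-related fields.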
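This book explains how to develop web crawler programs with Python. Starting from the basic features of the Python language, it covers each aspect of Python web crawler development in detail, drawing on HTTP, HTML, JavaScript, regular expressions, natural language processing, data science, and other fields. The book has 10 chapters, with topics including Python fundamentals, website analysis, web page parsing, Python file reading and writing, Python and databases, AJAX, simulated login, text and data analysis, website testing, the Scrapy crawler framework, and crawler performance.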
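Geng Xinglong (耿兴隆) is chief expert at the Autodesk China Certification Examination Center, where he has overall responsibility for drawing up the syllabus of Autodesk's official certification examinations in China, building the question bank, technical consulting, and instructor training. Many of his textbooks have become leading, flagship works in China, and he holds a prominent position in book authorship in the related professional fields there.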
Table of Contents
Project 1: Python Fundamentals
  Task 1: Overview of Python
    1. Introduction to Python
    2. Installing Python
    3. Installing PyCharm
    4. Python Syntax Conventions
  Task 2: Components of Python Commands
    1. Basic Symbols
    2. Constants and Variables
    3. Data Types
    4. Functional Symbols
  Task 3: Program Structure
    1. Expression Statements
    2. Sequential Structure
    3. Selection Structure
    4. Loop Structure
    5. Conditional Expressions
    6. Program Flow Control
  Project Practice
    Practice: Printing the Baidu URL

Project 2: Web Crawler Fundamentals
  Task 1: Overview of Web Crawlers
    1. Basic Principles of Web Crawlers
    2. Web Crawler System Architecture
    3. Crawling Strategies
    4. Types of Web Crawlers
    5. Open-Source Web Crawler Frameworks and Projects
  Task 2: HTTP
    1. How HTTP Works
    2. The Urllib Module Library
    3. URL Definition
    4. URL Encoding Settings
  Task 3: The Web Page Request Process
    1. Sending the Request Message
    2. Returning the Response
    3. HTTP Messages
  Project Practice
    Practice 1: Searching for a Product URL
    Practice 2: Searching for a Food Price URL

Project 3: Using the Urllib Request Module Library
  Task 1: Sending Web Page Requests
    1. Basic HTTP Requests
    2. Network Requests with Request
    3. Setting Request Headers
    4. Sending Requests with Handler Methods
    5. Setting a Proxy IP
    6. Authentication
  Task 2: Downloading Web Pages
    1. Web Page Structure
    2. Writing Web Page Files
    3. Downloading Web Page Files
  Project Practice
    Practice 1: Downloading a Python Learning URL
    Practice 2: Downloading a Company Web Page's HTML File

Project 4: Installing the Urllib3 Request Module Library and Sending Requests
  Task 1: Installing the Urllib3 Request Module Library
    1. Installing Anaconda
    2. Installing the Urllib3 Module Library
  Task 2: Sending Requests
    1. Creating a Proxy Object
    2. Request Methods
    3. Defining Request Headers
    4. Setting a Proxy IP
    5. Automatic Retries
    6. Redirects
  Project Practice
    Practice: Sending a Request to Access Taobao

Project 5: Using the Requests Request Module Library
  Task 1: Web Page Requests
    1. Standard HTTP Requests
    2. Returned Response Messages
    3. JSON-Formatted Data
  Task 2: Request-Sending Methods
    1. Sending GET Requests
    2. Sending POST Requests
    3. Other Request Methods
  Task 3: Complex Network Requests
    1. Complex Request Headers
    2. Uploading Files
    3. Cookies Verification
    4. Session Persistence
  Task 4: Exception Handling
    1. The try-except Statement
    2. The Urllib Exception-Handling Module
    3. The Urllib3 Exception-Handling Module
    4. The request Exception-Handling Module
  Project Practice
    Practice: Crawling the URLs of Douban's Most Popular Film Reviews

Project 6: Parsing Web Pages
  Task 1: Parsing Web Pages with Regular Expressions
    1. Regular Expression Patterns
    2. Implementing Regular Expressions with the re Module
    3. String Searching
    4. String Replacement
    5. String Splitting
  Task 2: Parsing Web Pages with XPath
    1. Overview of XPath
    2. Web Page Parsing with XPath
    3. Retrieving Node Information
    4. Node Relationships
    5. Finding Node Information
    6. Attribute Nodes
    7. XPath Operators
    8. XML Node Axes
  Task 3: Parsing Web Pages with BeautifulSoup
    1. Installing BeautifulSoup
    2. Creating a BeautifulSoup Object
    3. Getting Node Content by Attribute
    4. Getting Nodes by Node Relationship
    5. Finding Node Content
    6. Finding Node Content with CSS Selectors
  Project Practice
    Practice 1: Retrieving the Postal Code and Area Code of Shijiazhuang, Hebei Province, from a Lookup Site
    Practice 2: Crawling the Titles of Best-Selling Books
    Practice 3: Downloading Images of Best-Selling Books

Project 7: The Scrapy Web Crawler Framework
  Task 1: Scrapy Web Crawler Framework Fundamentals
    1. Basics of the Scrapy Web Crawler Framework
    2. Common Scrapy Commands
    3. Creating a Scrapy Project
  Task 2: Creating Spider Files from Templates
    1. Commands for Creating Web Crawler Files
    2. Creating a basic Template File
    3. Creating a crawl Template File
    4. Creating a csvfeed Template File
    5. Creating an xmlfeed Template File
  Task 3: Scrapy Web Crawler Files
    1. The Spider Class
    2. Configuring the Web Crawler
    3. Starting the Web Crawler
    4. Extracting Data
  Project Practice
    Practice: Extracting Scenic Area Names