Web Scraping: requests
Requests
Installation: pip install requests
Official design principle: HTTP for Humans
I. Common methods
Saving an image locally
The URL is https://inews.gtimg.com/newsapp_bt/0/10186045426/1000

    import requests

    url = "https://inews.gtimg.com/newsapp_bt/0/10186045426/1000"
    headers = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)"}

    # .content is the raw response body as bytes
    res = requests.get(url=url, headers=headers).content

    # Write the bytes to a local file in binary mode
    with open("demo.jpg", "wb") as f:
        f.write(res)
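For larger files it may be preferable to check the status code and write the body in chunks instead of holding it all in memory. The following is a minimal sketch under that assumption; the helper name `save_image`, the 10-second timeout and the 8 KiB chunk size are illustrative choices, not part of the original notes.

```python
import requests


def save_image(url, path, headers=None, timeout=10):
    """Download url and write the body to path; return the bytes written.

    Hypothetical helper: name, timeout and chunk size are assumptions.
    """
    # stream=True defers downloading the body until it is iterated over
    with requests.get(url, headers=headers, timeout=timeout, stream=True) as res:
        # Raise requests.HTTPError for 4xx/5xx responses instead of saving an error page
        res.raise_for_status()
        written = 0
        with open(path, "wb") as f:
            for chunk in res.iter_content(chunk_size=8192):
                f.write(chunk)
                written += len(chunk)
    return written


# Example call (performs a real network request when uncommented):
# save_image("https://inews.gtimg.com/newsapp_bt/0/10186045426/1000", "demo.jpg",
#            headers={"User-Agent": "Mozilla/5.0"})
```

The function only defines behavior; nothing is downloaded until it is called.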
Query parameters are URL-encoded in the same way as with urllib.parse.urlencode()
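To make the encoding step concrete, here is a small standard-library sketch of what urllib.parse.urlencode() produces for a dictionary. The English value "taylor swift" is an illustrative stand-in chosen so the output is easy to read.

```python
from urllib.parse import parse_qs, urlencode

# Query parameters as a plain dictionary (values may be str or int)
params = {"kw": "taylor swift", "pn": 50}

# urlencode() percent-encodes keys and values and joins the pairs with "&";
# by default a space is written as "+"
query = urlencode(params)
print(query)  # kw=taylor+swift&pn=50

# Non-ASCII text is encoded as UTF-8 bytes and survives a round trip
encoded = urlencode({"kw": "泰勒·斯威夫特吧"})
decoded = parse_qs(encoded)
print(decoded["kw"][0])
```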
1. GET
requests.get()
1.1 params
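Query parameters
1.1.1 Parameter type
A dictionary; its key-value pairs are used as the query parameters.
params = {"kw": "泰勒·斯威夫特吧", "pn": 50}
No manual encoding is needed; the requests module encodes the dictionary when it builds the request.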
1.1.2 Usage
res = requests.get(url, params=params, headers=headers)
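1.1.3 Characteristics
- url is the base URL and does not include the query parameters
- The method automatically encodes the params dictionary and appends it to url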
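The two points above can be checked without sending anything over the network. As a sketch, the snippet below builds a request with `requests.Request(...).prepare()` and inspects the resulting URL; the base URL and parameters mirror the ones used in these notes.

```python
from urllib.parse import parse_qs, urlsplit

import requests

base_url = "http://www.baidu.com/s"
params = {"kw": "泰勒·斯威夫特吧", "pn": 50}

# Build the request without sending it, so the final URL can be inspected
prepared = requests.Request("GET", base_url, params=params).prepare()
print(prepared.url)

# Decode the query string to confirm both parameters were appended
query = parse_qs(urlsplit(prepared.url).query)
print(query)
```

The printed URL is the base URL followed by "?" and the percent-encoded parameters, which is what requests.get() would request.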
1.1.4 Example

    import requests

    # Base URL, without query parameters
    base_url = "http://www.baidu.com/s"

    # Query parameters
    params = {
        "kw": "泰勒·斯威夫特吧",
        "pn": 50
    }

    # Request headers
    headers = {
        "User-Agent": "Mozilla/5.0"
    }

    res = requests.get(url=base_url, params=params, headers=headers)
    print(res.content.decode("utf-8", "ignore"))
1.2 auth
Web client authentication
1.2.1 Characteristics
- For sites that require web client username and password authentication
auth = ('username','password')
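In requests, a (username, password) tuple passed as auth is shorthand for HTTP Basic authentication. The sketch below prepares a request without sending it and inspects the Authorization header that results; the URL and credentials are placeholders for illustration only.

```python
import base64

import requests

# Placeholder URL and credentials, for illustration only
url = "http://example.com/protected"
auth = ("username", "password")

# Prepare the request without sending it to inspect the generated header
prepared = requests.Request("GET", url, auth=auth).prepare()
header = prepared.headers["Authorization"]
print(header)

# The header is "Basic " followed by base64("username:password")
expected = "Basic " + base64.b64encode(b"username:password").decode("ascii")
print(header == expected)
```

Because Basic authentication only base64-encodes the credentials, it is normally paired with HTTPS.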
1.2.3 Usage

    import requests
    from config import *

    base_url = "http://code.tarena.com.cn/AIDCode/aid1903/12-spider/spider_day{}_note.zip"
    auth = (USERNAME, PASSWD)
    headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3895.5 Safari/537.36"}

    def down():
        for day in range(