Converting Chrome request headers directly into Python headers

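Python crawlers often need request headers copied from the browser. I used to handle this with batch find-and-replace in PyCharm. Today I went looking for a more convenient way and found something that exceeded my expectations: Chrome's Copy as cURL, combined with a curl-to-Python converter.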
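For comparison, the manual find-and-replace step amounts to splitting each `name: value` line into a dictionary entry. Below is a minimal sketch of that step, assuming the headers were copied from the browser as plain text, one per line. `headers_from_text` is a hypothetical helper name, not part of any library.

```python
def headers_from_text(raw: str) -> dict:
    """Turn pasted 'name: value' lines into a dict usable as request headers."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ':authority',
        # which are not regular header fields.
        if not line or line.startswith(":"):
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue  # no colon, so this is not a header line
        headers[name.strip()] = value.strip()
    return headers


# Sample input; the values are taken from the request shown later in this post.
raw = """
accept-language: zh-CN,zh;q=0.9
cache-control: max-age=0
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
"""
headers = headers_from_text(raw)
print(headers)
```

The resulting dict can be passed as `headers=` to `requests.get`. Cookies stay inside the `cookie` header in this sketch rather than being split out.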

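The screenshot below shows Copy as cURL. When scraping dynamic pages with Python, you often need to find the real API endpoint and then build the request from its parameters.

[Screenshot: Copy as cURL in Chrome]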

The example below uses the page https://www.wpbeginner.com/wp-tutorials/how-to-display-recently-registered-users-in-wordpress/

What gets copied is this blob:

curl 'https://www.wpbeginner.com/wp-tutorials/how-to-display-recently-registered-users-in-wordpress/' \
  -H 'authority: www.wpbeginner.com' \
  -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
  -H 'accept-language: zh-CN,zh;q=0.9' \
  -H 'cache-control: max-age=0' \
  -H 'cookie: _gcl_au=1.1.436465112.1662469028; _omappvp=epBIA5cd0ZWUxNENwH3BIB7xc0aYZhHXWJ8due9UpGESG981kQNkxcwclZNZXf1cwASDIMJA4EkKa90SXkXN9sMKM4ovHclf; PushSubscriberStatus=CLOSED; peclosed=true; omSeen-wswleymcr7lvrnemcb77=1662491983243; _omra=%7B%22wswleymcr7lvrnemcb77%22%3A%22view%22%7D; om-wswleymcr7lvrnemcb77=1662492524006; _gid=GA1.2.1414703957.1662651389; PHPSESSID=usov6qpk0v7b18l0n0u1vieuqu; _ga_YFDKLJ5Q0T=GS1.1.1662717815.8.1.1662719411.43.0.0; _ga=GA1.2.1930480014.1662469028' \
  -H 'if-modified-since: Fri, 09 Sep 2022 10:14:20 GMT' \
  -H 'sec-ch-ua: "Google Chrome";v="105", "Not)A;Brand";v="8", "Chromium";v="105"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "Windows"' \
  -H 'sec-fetch-dest: document' \
  -H 'sec-fetch-mode: navigate' \
  -H 'sec-fetch-site: none' \
  -H 'sec-fetch-user: ?1' \
  -H 'upgrade-insecure-requests: 1' \
  -H 'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36' \
  --compressed

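This is where https://curlconverter.com/ comes in: it converts a curl command directly into the equivalent code in a range of languages. For Python, you can choose output that uses the requests library.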

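Once the conversion is done, there is no need to tune the parameters by hand; the generated code can be used as-is, which is much more convenient. Postman's import feature can do the same thing, but in my test it did not convert this particular request correctly.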

import requests

cookies = {
    '_gcl_au': '1.1.436465112.1662469028',
    '_omappvp': 'epBIA5cd0ZWUxNENwH3BIB7xc0aYZhHXWJ8due9UpGESG981kQNkxcwclZNZXf1cwASDIMJA4EkKa90SXkXN9sMKM4ovHclf',
    'PushSubscriberStatus': 'CLOSED',
    'peclosed': 'true',
    'omSeen-wswleymcr7lvrnemcb77': '1662491983243',
    '_omra': '%7B%22wswleymcr7lvrnemcb77%22%3A%22view%22%7D',
    'om-wswleymcr7lvrnemcb77': '1662492524006',
    '_gid': 'GA1.2.1414703957.1662651389',
    'PHPSESSID': 'usov6qpk0v7b18l0n0u1vieuqu',
    '_ga_YFDKLJ5Q0T': 'GS1.1.1662717815.8.1.1662719411.43.0.0',
    '_ga': 'GA1.2.1930480014.1662469028',
}

headers = {
    'authority': 'www.wpbeginner.com',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-language': 'zh-CN,zh;q=0.9',
    'cache-control': 'max-age=0',
    # Requests sorts cookies= alphabetically
    # 'cookie': '_gcl_au=1.1.436465112.1662469028; _omappvp=epBIA5cd0ZWUxNENwH3BIB7xc0aYZhHXWJ8due9UpGESG981kQNkxcwclZNZXf1cwASDIMJA4EkKa90SXkXN9sMKM4ovHclf; PushSubscriberStatus=CLOSED; peclosed=true; omSeen-wswleymcr7lvrnemcb77=1662491983243; _omra=%7B%22wswleymcr7lvrnemcb77%22%3A%22view%22%7D; om-wswleymcr7lvrnemcb77=1662492524006; _gid=GA1.2.1414703957.1662651389; PHPSESSID=usov6qpk0v7b18l0n0u1vieuqu; _ga_YFDKLJ5Q0T=GS1.1.1662717815.8.1.1662719411.43.0.0; _ga=GA1.2.1930480014.1662469028',
    'if-modified-since': 'Fri, 09 Sep 2022 10:14:20 GMT',
    'sec-ch-ua': '"Google Chrome";v="105", "Not)A;Brand";v="8", "Chromium";v="105"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'none',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36',
}

response = requests.get('https://www.wpbeginner.com/wp-tutorials/how-to-display-recently-registered-users-in-wordpress/', cookies=cookies, headers=headers)
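If you prefer not to paste session cookies into a website, the core of the conversion can be approximated locally. The following is only a rough sketch of the idea, not how curlconverter itself works: it tokenizes a bash-style curl command with `shlex`, collects the `-H` headers, and splits the `cookie` header into a separate dict. `parse_curl` is a hypothetical helper, the URL is a placeholder, and options that take an argument other than `-H` (for example `--data-raw`) are not handled.

```python
import shlex


def parse_curl(command: str):
    """Extract (url, headers, cookies) from a bash-style curl command."""
    # Join backslash-continued lines, then tokenize using shell quoting rules.
    tokens = shlex.split(command.replace("\\\n", " "))
    url, headers, cookies = None, {}, {}
    i = 1  # skip the leading 'curl'
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-H", "--header"):
            name, _, value = tokens[i + 1].partition(":")
            name, value = name.strip(), value.strip()
            if name.lower() == "cookie":
                # 'a=1; b=2' becomes {'a': '1', 'b': '2'}
                for pair in value.split(";"):
                    key, _, val = pair.strip().partition("=")
                    if key:
                        cookies[key] = val
            else:
                headers[name] = value
            i += 2
        elif tok.startswith("-"):
            i += 1  # flags such as --compressed are ignored in this sketch
        else:
            if url is None:
                url = tok  # first bare argument is taken as the URL
            i += 1
    return url, headers, cookies


# Placeholder command; the header and cookie values are shortened from the example above.
command = r"""curl 'https://example.com/' \
  -H 'accept-language: zh-CN,zh;q=0.9' \
  -H 'cookie: peclosed=true; PushSubscriberStatus=CLOSED' \
  --compressed"""

url, headers, cookies = parse_curl(command)
print(url, headers, cookies)
# The result can then be used as:
# requests.get(url, headers=headers, cookies=cookies)
```

For anything beyond simple GET requests, the converter website is still the more complete tool; this sketch only covers the header and cookie handling discussed in this post.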