When we come across nice images online, downloading them one at a time is tedious if there are many of them. With Python we can batch-crawl the images and save them to a local folder. The scraping code is below: enter the starting and ending page IDs and it will download every gallery in that range.
```python
import os
import re
import time

import requests

# Browser-like headers so the site serves normal pages.
headers = {
    'User-Agent': "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
    'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    'Accept-Encoding': 'gzip',
    "Referer": "https://www.baidu.com/",
}

httpnum = int(input("Enter the starting page ID: "))
httpnum1 = int(input("Enter the ending page ID: "))

for i in range(httpnum, httpnum1 + 1):
    httpurl = "https://www.vmgirls.com/{0}.html".format(i)
    response = requests.get(httpurl, headers=headers)
    html = response.text
    # Pages that do not exist lack this marker in their HTML.
    if '<style></style><meta name=keywords content=' not in html:
        print("Page {0} does not exist".format(i))
        continue
    # Use the post title as the name of the download folder.
    dir_name = re.findall('<h1 class="post-title h1">(.*?)</h1>', html)[-1]
    if not os.path.exists(dir_name):
        os.mkdir(dir_name)
    # Extract all image links from the page.
    urls = re.findall('<a href="(.*?)" alt=".*?" title=".*?">', html)
    for url in urls:
        time.sleep(1)  # throttle requests to be polite to the server
        name = url.split('/')[-1]
        # The page uses protocol-relative URLs, so prepend the scheme.
        response = requests.get("https:" + url, headers=headers)
        print(name + " downloading...")
        with open(dir_name + '/' + name, 'wb') as f:
            f.write(response.content)
    print("Page {0} finished".format(i))
print("All downloads finished")
```
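The two regular expressions do the heavy lifting: one pulls the post title (used as the folder name) out of the `<h1>` tag, and the other collects the image links. Here is a minimal offline sketch of those extractions against a hypothetical HTML snippet shaped like the pages the script expects; `sample_html` and the example URLs are made up for illustration, and the real site's markup may differ:

```python
import re

# Hypothetical snippet in the shape the script's regexes expect.
sample_html = '''
<h1 class="post-title h1">Sample Gallery</h1>
<a href="//img.example.com/photos/001.jpeg" alt="pic" title="pic 1">
<a href="//img.example.com/photos/002.jpeg" alt="pic" title="pic 2">
'''

# Same patterns as in the script above.
dir_name = re.findall('<h1 class="post-title h1">(.*?)</h1>', sample_html)[-1]
urls = re.findall('<a href="(.*?)" alt=".*?" title=".*?">', sample_html)

print(dir_name)                            # Sample Gallery
print([u.split('/')[-1] for u in urls])    # ['001.jpeg', '002.jpeg']
```

Because the non-greedy `(.*?)` stops at the first `" alt=`, each match captures just the `href` value; splitting on `/` then yields the filename used when saving to disk.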