It's been about two hours since the 403, and I can access the site again!!
With the idea of building an API that serves random background images for my blog, I wrote my first ever web scraper. I only started learning Python scraping last week and haven't finished yet, so my skills are admittedly lacking: I ran a few too many test requests and got my IP banned!!
Saving the Bing daily images as image files
I looked at a number of Bing image APIs online, found a very generous webmaster's image site, and started my scraping journey there.
```python
import os

import requests
from bs4 import BeautifulSoup

HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36'}


def get_html(url):
    r = requests.get(url, headers=HEADERS)
    return r.content


def download(text):
    soup = BeautifulSoup(text, 'html.parser')
    items = soup.find_all('img')
    path = 'E:\\bing'
    if not os.path.exists(path):
        os.makedirs(path)
    for item in items:
        src = item.get('src')
        if not src:
            continue
        resp = requests.get(src, headers=HEADERS)
        # Use a fixed slice of the URL as the file name
        img_name = os.path.join(path, src[24:41] + '.png')
        with open(img_name, 'wb') as file:
            # the with-block flushes and closes the file automatically
            file.write(resp.content)


def main():
    for i in range(1, 124):
        url = 'https://bing.ioliu.cn/ranking?p={}'.format(i)
        text = get_html(url)
        download(text)
        print('Page {} downloaded'.format(i))


if __name__ == '__main__':
    main()
```
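The `[24:41]` slice only works if every image URL has exactly the same layout. A more robust way to derive a local file name (a sketch using only the standard library; the example URL below is hypothetical) is to take the last path component:

```python
import os
from urllib.parse import urlparse


def filename_from_url(src):
    """Derive a local file name from an image URL."""
    path = urlparse(src).path      # e.g. '/bing/Example_1920x1080.jpg'
    return os.path.basename(path)  # e.g. 'Example_1920x1080.jpg'


# Hypothetical URL in the style of the site being scraped
print(filename_from_url('https://h2.ioliu.cn/bing/Example_1920x1080.jpg?imageslim'))
```

This also drops the `?imageslim` query string automatically, since `urlparse` separates the query from the path.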
This code downloads everything as image files, but then it occurred to me that I would still have to upload them all to my server and fetch each image's URL one by one, which is far too cumbersome. So I...
Scraping each image's URL
```python
import requests
from bs4 import BeautifulSoup

HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36'}


def get_html(url):
    r = requests.get(url, headers=HEADERS)
    return r.content


def download(text):
    soup = BeautifulSoup(text, 'html.parser')
    items = soup.find_all('img')
    with open('imgurl.txt', 'a') as f:
        for item in items:
            src = item.get('src')
            if not src:
                continue
            # Swap the thumbnail suffix for the full-resolution one.
            # (str.strip() removes a set of characters, not a substring,
            # so use replace() here.)
            img_url = src.replace('640x480.jpg?imageslim', '1920x1080.jpg?imageslim')
            f.write(img_url + '\n')


def main():
    for i in range(1, 124):
        url = 'https://bing.ioliu.cn/ranking?p={}'.format(i)
        text = get_html(url)
        download(text)
        print('Page {} done'.format(i))


if __name__ == '__main__':
    main()
```
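The suffix rewrite is easy to get wrong: `str.strip()` treats its argument as a *set of characters* to trim from both ends, not a literal substring, so it can eat extra characters off the end of the URL. `str.replace()` swaps the exact substring. A minimal check (the URL is a made-up example in the site's style):

```python
thumb = 'https://h2.ioliu.cn/bing/Example_ZH-CN123_640x480.jpg?imageslim'

# str.strip() keeps trimming as long as the trailing character is ANY of
# '640x8.jpg?imageslim' - here it eats everything back to the underscore.
stripped = thumb.strip('640x480.jpg?imageslim')

# str.replace() substitutes the exact substring, which is what we want.
full = thumb.replace('640x480.jpg?imageslim', '1920x1080.jpg?imageslim')
print(stripped)
print(full)
```

This is why the corrected script above builds the full-resolution URL with `replace()` instead of `strip()` plus concatenation.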
With that, I had saved every image's URL. Yes, the header image of this post is one of the scraped results. Then, just as I was savoring the moment, my IP got banned. Fortunately the images themselves are still accessible, so the work wasn't wasted. My skills aren't quite there yet; still room to improve!
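One lesson from the ban: pace the requests instead of hammering all 123 pages back to back. A minimal sketch of a paced page loop (the `fetch_all_pages` helper and the delay value are my own additions, not part of the original scripts; the site's real rate limit is unknown):

```python
import time


def fetch_all_pages(fetch, pages=123, delay=2.0):
    """Call fetch(url) for each ranking page, sleeping between requests."""
    results = []
    for i in range(1, pages + 1):
        url = 'https://bing.ioliu.cn/ranking?p={}'.format(i)
        results.append(fetch(url))
        time.sleep(delay)  # be polite: wait before the next request
    return results


# Usage with a stand-in fetcher; in the real script, fetch would be get_html.
pages = fetch_all_pages(lambda url: url, pages=3, delay=0.0)
print(pages)
```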
Still, in the end I got the random-image API I wanted.
1474 images: plenty for me to play with.
http://www.wu555.ink/random