Basic usage of thread pools
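Principle: hand a thread pool the operations that block and take a long time, such as network requests and file I/O. In CPython the GIL keeps threads from speeding up CPU-bound Python code, but while one thread is waiting on I/O the others can keep working.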
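```python
from multiprocessing.dummy import Pool  # same API as multiprocessing.Pool, but backed by threads

pool = Pool(4)  # 4 worker threads
# Call get_content(url) for every URL concurrently; results come back in the order of urls
page_contents = pool.map(get_content, urls)
pool.close()
pool.join()
```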
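To see the principle above in action, here is a minimal sketch that uses `time.sleep` as a stand-in for a blocking network request. `fake_get_content`, `run_serial` and `run_pooled` are hypothetical names, not part of the original code. Four 0.5 s tasks should take about 2 s one after another but roughly 0.5 s on a four-thread pool, and `pool.map` returns the results in the same order as the input.

```python
import time
from multiprocessing.dummy import Pool  # thread-backed Pool


def fake_get_content(url):
    """Hypothetical stand-in for get_content: sleeps to mimic a blocking request."""
    time.sleep(0.5)
    return f"content of {url}"


def run_serial(urls):
    # One request after another: total time is the sum of all waits
    return [fake_get_content(u) for u in urls]


def run_pooled(urls, workers=4):
    pool = Pool(workers)
    try:
        # Threads wait in parallel; map blocks until all results are in
        return pool.map(fake_get_content, urls)
    finally:
        pool.close()  # accept no more tasks
        pool.join()   # wait for the worker threads to finish


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(4)]

    start = time.perf_counter()
    serial = run_serial(urls)
    serial_time = time.perf_counter() - start

    start = time.perf_counter()
    pooled = run_pooled(urls)
    pooled_time = time.perf_counter() - start

    print(f"serial: {serial_time:.2f}s, pooled: {pooled_time:.2f}s")
    print("same results, same order:", pooled == serial)
```

If `fake_get_content` did pure-Python computation instead of sleeping, the pooled version would not be meaningfully faster, which is why the pool is reserved for blocking work.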
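Example 1: Scraping Pear Video (梨视频). The detail pages are parsed one by one to collect the video addresses; the slow part, downloading the video files, is handed to the thread pool.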
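```python
import re
from urllib.parse import urljoin

import requests
from lxml import etree
from multiprocessing.dummy import Pool

headers = {'User-Agent': 'Mozilla/5.0'}  # assumed; the original does not show its headers

# page_text holds the HTML of the Pear Video list page (fetched earlier, not shown)
tree = etree.HTML(page_text)
li_list = tree.xpath('//ul[@id="listvideoListUl"]/li')
urls = []  # one {'name': ..., 'url': ...} dict per video
for li in li_list:
    # urljoin copes with hrefs with or without a leading slash
    detail_url = urljoin('https://www.pearvideo.com/', li.xpath('./div/a/@href')[0])
    name = li.xpath('./div/a/div[2]/text()')[0] + '.mp4'
    detail_page_text = requests.get(url=detail_url, headers=headers).text
    # This pattern depends on the detail page's inline script at the time of writing
    video_url = re.findall('src="(.*?)",vdoUrl', detail_page_text)[0]
    urls.append({'name': name, 'url': video_url})


def get_video_content(dic):
    """Download one video and save it as dic['name'] (blocking I/O, run in the pool)."""
    data = requests.get(url=dic['url'], headers=headers).content
    with open(dic['name'], 'wb') as f:
        f.write(data)


pool = Pool(4)
pool.map(get_video_content, urls)  # blocks until every download has finished
pool.close()  # no more tasks will be submitted
pool.join()   # wait for the worker threads to exit
```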
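One weakness of the download step is that if any single call raises, `pool.map` re-raises that exception in the main thread, so the caller gets no per-item report of which videos were saved. A minimal sketch, assuming you would rather record failures and keep going: `save_video` and `download_all` are hypothetical helpers, and `fetch` is any function that returns the bytes for a URL (in the scraper above, something like `lambda url: requests.get(url, headers=headers).content`). Passing the fetch function in keeps the sketch runnable offline.

```python
import os
import tempfile
from functools import partial
from multiprocessing.dummy import Pool


def save_video(dic, fetch):
    """Fetch dic['url'] and write it to dic['name'].

    Returns (name, None) on success or (name, error_repr) on failure, so one
    bad URL does not make pool.map raise and hide the other outcomes.
    """
    try:
        data = fetch(dic['url'])
        with open(dic['name'], 'wb') as f:
            f.write(data)
        return dic['name'], None
    except Exception as exc:  # report the failure instead of propagating it
        return dic['name'], repr(exc)


def download_all(items, fetch, workers=4):
    pool = Pool(workers)
    try:
        # Threads share memory, so partial objects and closures need no pickling
        return pool.map(partial(save_video, fetch=fetch), items)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    def fake_fetch(url):
        # Hypothetical offline stand-in for requests.get(url, headers=headers).content
        if url.endswith("bad"):
            raise ConnectionError("simulated failure")
        return b"video bytes from " + url.encode()

    outdir = tempfile.mkdtemp()
    items = [
        {'name': os.path.join(outdir, 'a.mp4'), 'url': 'https://example.com/a'},
        {'name': os.path.join(outdir, 'b.mp4'), 'url': 'https://example.com/bad'},
    ]
    for name, err in download_all(items, fake_fetch):
        print(os.path.basename(name), 'ok' if err is None else err)
```

The results list keeps the input order, so failed items can be collected and retried afterwards.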