Downloader Class
A Downloader downloads the images described by a list of ImageInfo.
You can import Downloader from image_crawler_utils.
- class image_crawler_utils.Downloader(image_info_list, crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, store_path='./', image_info_filter=True, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''))
Bases: object
Downloading images using threading method.
- Parameters:
crawler_settings (image_crawler_utils.CrawlerSettings) – The CrawlerSettings used in this Downloader.
image_info_list (list[image_crawler_utils.ImageInfo]) – A list of ImageInfo.
store_path (str) –
Path to store images, or a list of storage paths, one for each image.
Default is the current working directory.
If it is set to a list, its length should be the same as that of image_info_list.
image_info_filter (callable, bool) –
A callable used to filter the images in the list of ImageInfo.
The image_info_filter function should accept exactly one argument of ImageInfo type and return True (download this image) or False (do not download this image), like:
def filter_func(image_info: ImageInfo) -> bool:
    # Meets the conditions
    return True
    # Does not meet the conditions
    return False
If the function has other parameters, use lambda to bind them:
image_info_filter=lambda info: filter_func(info, param1, param2, ...)
If you want to download all images in the ImageInfo list, set image_info_filter to True.
TIPS: If you want to search for images with complex restrictions that the image station sites may not support (e.g. images with many keywords, or restrictions on the ratio between width and height), you can simplify the query to a few keywords, get all matching images with Parsers, and then filter them with your custom image_info_filter function.
cookies (image_crawler_utils.Cookies, str, dict, list, None) –
Cookies used to access images from a website.
None means no cookies and works the same as Cookies().
Leaving this parameter blank works the same as None / Cookies().
TIPS: You can add the corresponding cookies to the Downloader if some image URLs are only accessible with an account. For example, if you have saved Pixiv and Twitter / X cookies in Pixiv_cookies.json and Twitter_cookies.json respectively, you can use
cookies=Cookies.load_from_json("Pixiv_cookies.json") + Cookies.load_from_json("Twitter_cookies.json")
to add both sets of cookies to the Downloader.
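As a rough illustration of the image_info_filter contract described above, the sketch below filters images by aspect ratio and binds the extra parameters with a lambda. StandInImageInfo, ratio_filter, and the width / height keys inside info are assumptions made only for this example; the real ImageInfo may store dimensions differently, so check its fields before reusing the idea.

```python
from dataclasses import dataclass, field


@dataclass
class StandInImageInfo:
    """Hypothetical stand-in for ImageInfo so this sketch runs on its own."""
    url: str
    name: str
    info: dict = field(default_factory=dict)


def ratio_filter(image_info, min_ratio, max_ratio):
    """Return True (download) when width / height lies in [min_ratio, max_ratio]."""
    width = image_info.info.get("width")    # assumed key, not confirmed by the docs
    height = image_info.info.get("height")  # assumed key, not confirmed by the docs
    if not width or not height:
        return False  # skip images whose size is unknown
    return min_ratio <= width / height <= max_ratio


# The filter may take only one argument, so the extra parameters are bound here.
landscape_only = lambda info: ratio_filter(info, 1.3, 2.0)

images = [
    StandInImageInfo("https://example.com/a.png", "a.png", {"width": 1920, "height": 1080}),
    StandInImageInfo("https://example.com/b.png", "b.png", {"width": 800, "height": 1200}),
    StandInImageInfo("https://example.com/c.png", "c.png"),
]
kept = [image.name for image in images if landscape_only(image)]
print(kept)
```

With the library itself, a callable of this shape would be passed as image_info_filter=landscape_only.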
- classmethod load_from_pkl(pkl_file, log=<image_crawler_utils.log.Log object>)
Load a Downloader from a .pkl file.
- Parameters:
pkl_file (str, None) – Name of the pkl file.
log (image_crawler_utils.log.Log, None) – Logging config.
- Returns:
A Downloader loaded from the pkl file, or None if loading failed.
- Return type:
image_crawler_utils.Downloader, None
- display_all_configs()
Display all config info. Dataclasses will be displayed in a neater way.
- run()
Run the Threading Downloader Object.
- Returns:
(Total size of images downloaded, Succeeded ImageInfo list, Failed ImageInfo list, Skipped ImageInfo list)
Total size of images downloaded: An int denoting the total size (in bytes) of the images downloaded.
Succeeded ImageInfo list: A list of ImageInfo for images that were downloaded successfully.
Failed ImageInfo list: A list of ImageInfo for images that failed to download.
Images not downloaded because the capacity defined in image_crawler_utils.CrawlerSettings was reached are classified into this list.
Skipped ImageInfo list: A list of ImageInfo for images that were skipped.
Images filtered out by image_info_filter, images not downloaded due to the image_num restriction in image_crawler_utils.CrawlerSettings, and images skipped because they already exist while overwrite_images in DownloadConfig is set to False are classified into this list.
- Return type:
tuple[int, list[ImageInfo], list[ImageInfo], list[ImageInfo]]
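A small sketch of how the 4-tuple documented above might be unpacked and reported. summarize_run is a hypothetical helper, and the strings in example_result merely stand in for the ImageInfo objects that run() would return.

```python
def summarize_run(result):
    """Turn the 4-tuple documented for run() into a short report string."""
    total_bytes, succeeded, failed, skipped = result
    mib = total_bytes / (1024 * 1024)  # the first element is a size in bytes
    return (
        f"{mib:.2f} MiB downloaded; "
        f"{len(succeeded)} succeeded, {len(failed)} failed, {len(skipped)} skipped"
    )


# Hypothetical result: plain strings stand in for ImageInfo objects.
example_result = (3 * 1024 * 1024, ["a.png", "b.png"], ["c.png"], [])
report = summarize_run(example_result)
print(report)
```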
- save_to_pkl(pkl_file)
Save the Downloader with settings in a pkl file.
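The save / load contract described above (save to a pkl file, load it back, get None on failure) can be sketched with Python's pickle module. This is only an illustration under assumptions: SketchDownloader is a hypothetical stand-in, and the documentation does not state how the library actually serializes a Downloader.

```python
import os
import pickle
import tempfile


class SketchDownloader:
    """Hypothetical stand-in; not the library's Downloader."""

    def __init__(self, store_path="./"):
        self.store_path = store_path

    def save_to_pkl(self, pkl_file):
        # Store plain settings rather than the object itself (a choice made for this sketch).
        with open(pkl_file, "wb") as f:
            pickle.dump({"store_path": self.store_path}, f)

    @classmethod
    def load_from_pkl(cls, pkl_file):
        # Mirror the documented behavior: return None when loading fails.
        try:
            with open(pkl_file, "rb") as f:
                settings = pickle.load(f)
            return cls(**settings)
        except (OSError, pickle.UnpicklingError, EOFError, TypeError):
            return None


with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "downloader.pkl")
    SketchDownloader(store_path="images").save_to_pkl(pkl_path)
    restored = SketchDownloader.load_from_pkl(pkl_path)
    missing = SketchDownloader.load_from_pkl(os.path.join(tmp_dir, "absent.pkl"))

print(restored.store_path, missing)
```

Checking the loaded value against None before calling run() keeps a missing or corrupted file from raising later.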
Examples of Downloader
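In most cases, you only need to run:
from image_crawler_utils import Downloader

# defined_CrawlerSettings and list_of_ImageInfo are placeholders for your own objects
downloader = Downloader(
    crawler_settings=defined_CrawlerSettings,
    image_info_list=list_of_ImageInfo,
    # Set other parameters
)
downloader.run()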
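When store_path is given as a list, it needs one entry per image, in the same order as image_info_list. The sketch below builds such a list from image names; the names, the downloads folder, and the grouping rule are hypothetical, and it assumes each entry is a directory.

```python
import os

# Hypothetical image names standing in for the names carried by each ImageInfo.
image_names = ["cat_001.png", "cat_002.jpg", "dog_001.png"]

# One storage path per image, in the same order as the image list:
# here each image goes into a folder named after the prefix before "_".
store_paths = [os.path.join("downloads", name.split("_")[0]) for name in image_names]

# The list form of store_path must match the image list in length.
assert len(store_paths) == len(image_names)
print(store_paths)
```

A list built this way would then be passed as store_path=store_paths alongside the matching image_info_list.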
If you want to collect the lists of images that were downloaded successfully, failed to download, or were skipped, capture the return value of run():
download_traffic, succeeded_list, failed_list, skipped_list = downloader.run()
succeeded_list, failed_list and skipped_list can be saved with image_crawler_utils.save_image_infos() and loaded again with image_crawler_utils.load_image_infos() for future use.