Downloader Class

A Downloader downloads the images described by a list of ImageInfo.

You can import Downloader from image_crawler_utils.

class image_crawler_utils.Downloader(image_info_list, crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, store_path='./', image_info_filter=True, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''))[source]

Bases: object

Downloads images using multiple threads.

Parameters:
  • crawler_settings (image_crawler_utils.CrawlerSettings) – The CrawlerSettings used in this Downloader.

  • image_info_list (list[image_crawler_utils.ImageInfo]) – A list of ImageInfo describing the images to download.

  • store_path (str, Iterable[str]) –

    Path to store images, or a list of storage paths, one for each image.

    • Default is the current working directory.

    • If it is set to a list (or another iterable), its length must be the same as that of image_info_list.

  • image_info_filter (callable, bool) –

    A callable function used to filter the images in the list of ImageInfo.

    • The image_info_filter function should accept exactly one argument of type ImageInfo and return True (download this image) or False (do not download this image), like:

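      def filter_func(image_info: ImageInfo) -> bool:
          # Replace the placeholder with your own check on image_info
          meets_conditions = ...
          if meets_conditions:
              return True  # Meets the conditions: download this image
          return False  # Does not meet the conditions: do not download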
      
    • If the function has other parameters, use a lambda to bind them so that the resulting callable takes only the ImageInfo:

      image_info_filter=lambda info: filter_func(info, param1, param2, ...)
      
    • If you want to download all images in the ImageInfo list, set image_info_filter to True.

    • TIPS: If you want to search for images with complex restrictions that the image station sites may not support (e.g. many keywords combined with a restriction on the width-to-height ratio), you can simplify the query to a few keywords, fetch all matching images with Parsers, and then filter them with your custom image_info_filter function.

  • cookies (image_crawler_utils.Cookies, str, dict, list, None) –

    Cookies used to access images from a website.

    • None means no cookies and works the same as Cookies().

    • Leaving this parameter unset works the same as None / Cookies().

    • TIPS: You can add the corresponding cookies to the Downloader if some image URLs are only accessible with an account. For example, if you have saved Pixiv and Twitter / X cookies in Pixiv_cookies.json and Twitter_cookies.json respectively, you can use cookies=Cookies.load_from_json("Pixiv_cookies.json") + Cookies.load_from_json("Twitter_cookies.json") to add both sets of cookies to the Downloader.
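
As a hedged sketch of the image_info_filter tips above, the snippet below keeps only images whose width-to-height ratio reaches a minimum, and binds the extra parameter with a lambda. StandInImageInfo is a hypothetical stand-in for ImageInfo so that the example runs on its own, and the "width" and "height" keys in info are assumptions; the keys actually available depend on the site and Parser that produced the ImageInfo.

```python
from dataclasses import dataclass, field


@dataclass
class StandInImageInfo:
    """Hypothetical stand-in for ImageInfo, used only in this illustration."""
    url: str
    name: str
    info: dict = field(default_factory=dict)


def ratio_filter(image_info, min_ratio: float) -> bool:
    """Return True (download) if width / height is at least min_ratio."""
    # The "width" and "height" keys are assumed; adapt them to your data.
    width = image_info.info.get("width")
    height = image_info.info.get("height")
    if not width or not height:
        return False  # Missing size information: do not download
    return width / height >= min_ratio


# Bind the extra parameter so the callable accepts a single argument.
image_info_filter = lambda info: ratio_filter(info, 1.5)

wide = StandInImageInfo("https://example.com/a.png", "a.png", {"width": 1920, "height": 1080})
tall = StandInImageInfo("https://example.com/b.png", "b.png", {"width": 1080, "height": 1920})
print(image_info_filter(wide), image_info_filter(tall))
```

A callable built this way could then be passed as image_info_filter=image_info_filter when constructing a Downloader.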

classmethod load_from_pkl(pkl_file, log=<image_crawler_utils.log.Log object>)[source]

Load a Downloader from a .pkl file.

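Parameters:
  • pkl_file (str) – Name of the pkl file to load.

  • log (image_crawler_utils.log.Log) – The Log object used for logging messages while loading.

Returns: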

A Downloader loaded from the pkl file, or None if failed.

Return type:

Downloader | None

display_all_configs()[source]

Display all config info. Dataclasses will be displayed in a neater way.

run()[source]

Run the threaded Downloader.

Returns:

(Total size of images downloaded, Succeeded ImageInfo list, Failed ImageInfo list, Skipped ImageInfo list)

  • Total size of images downloaded: An int denoting the total size (in bytes) of the images downloaded.

  • Succeeded ImageInfo list: A list of ImageInfo for the images that were downloaded successfully.

  • Failed ImageInfo list: A list of ImageInfo for the images that failed to download.

  • Skipped ImageInfo list: A list of ImageInfo for the images that were skipped.

    • This list contains images filtered out by image_info_filter, images not downloaded because of the image_num restriction in image_crawler_utils.CrawlerSettings, and images skipped because they already exist while overwrite_images in DownloadConfig is set to False.

Return type:

tuple[int, list[ImageInfo], list[ImageInfo], list[ImageInfo]]
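
To make the layout of this return value concrete, here is a small sketch that unpacks the tuple and builds a one-line summary. summarize_run is a hypothetical helper, not part of image_crawler_utils; it only assumes the tuple layout described above, and the example data are placeholders rather than real ImageInfo objects.

```python
def summarize_run(result: tuple) -> str:
    """Format the 4-tuple returned by Downloader.run() as a short summary."""
    total_bytes, succeeded, failed, skipped = result
    return (
        f"{total_bytes / 1024 ** 2:.2f} MiB downloaded: "
        f"{len(succeeded)} succeeded, {len(failed)} failed, {len(skipped)} skipped"
    )


# Placeholder data in the same layout; real lists would contain ImageInfo objects.
example_result = (3 * 1024 ** 2, ["a", "b"], ["c"], [])
print(summarize_run(example_result))
```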

save_to_pkl(pkl_file)[source]

Save the Downloader and its settings to a pkl file.

Parameters:
  • path (str) – Path to save the pkl file. Default is saving to the current path.

  • pkl_file (str, None) – Name of the pkl file. (Suffix is optional.)

Returns:

(Saved file name, Absolute path of the saved file), or None if failed.

Return type:

tuple[str, str] | None

Examples of Downloader

In most cases, you only need to run:

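from image_crawler_utils import Downloader

downloader = Downloader(
    crawler_settings=defined_CrawlerSettings,  # A CrawlerSettings defined beforehand
    image_info_list=list_of_ImageInfo,  # A list of ImageInfo, e.g. from a Parser
    # Set other parameters
)
downloader.run()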

If you want to collect the lists of successfully downloaded, failed and skipped images, unpack the return value of run():

download_traffic, succeeded_list, failed_list, skipped_list = downloader.run()

succeeded_list, failed_list and skipped_list can be saved with image_crawler_utils.save_image_infos() and loaded again with image_crawler_utils.load_image_infos() for future use.
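
One possible use of failed_list is to retry the failed downloads. The sketch below is an assumption about how this could be arranged, not a feature of the library: make_downloader is a hypothetical factory that you supply (for instance, lambda infos: Downloader(crawler_settings=defined_CrawlerSettings, image_info_list=infos)), and FakeDownloader exists only so that the example runs without network access.

```python
def run_with_retries(make_downloader, image_infos, max_rounds=3):
    """Re-run downloads for failed items, for at most max_rounds rounds in total."""
    total_size, succeeded, skipped = 0, [], []
    pending = list(image_infos)
    for _ in range(max_rounds):
        if not pending:
            break
        size, ok, failed, skip = make_downloader(pending).run()
        total_size += size
        succeeded.extend(ok)
        skipped.extend(skip)
        pending = failed  # Only failed items are retried in the next round
    # Same layout as Downloader.run(): size, succeeded, still failed, skipped
    return total_size, succeeded, pending, skipped


class FakeDownloader:
    """Stand-in with the same run() return layout; fails "flaky" items once."""
    seen = set()

    def __init__(self, image_infos):
        self.image_infos = image_infos

    def run(self):
        ok, failed = [], []
        for item in self.image_infos:
            if item.startswith("flaky") and item not in FakeDownloader.seen:
                FakeDownloader.seen.add(item)  # Fail on the first attempt only
                failed.append(item)
            else:
                ok.append(item)
        return 100 * len(ok), ok, failed, []


result = run_with_retries(FakeDownloader, ["a", "flaky_b", "c"])
print(result)
```

Items that still fail after the last round remain in the third element of the returned tuple, so they can be saved for a later attempt.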