image_crawler_utils.image_downloader package

image_crawler_utils.image_downloader.download_image(url, image_name, download_config=DownloadConfig(headers=None, proxies=None, thread_delay=5, fail_delay=3, randomize_delay=True, thread_num=5, timeout=10, max_download_time=None, retry_times=5, overwrite_images=True), headers=None, proxies=None, log=<image_crawler_utils.log.Log object>, store_path='./', session=<requests.Session object>, progress_group=None, thread_id=0)

Core function for downloading an image from a URL.

Parameters:
  • url (str) – The URL of the image to download.

  • image_name (str) – Name under which the image is stored.

  • download_config (image_crawler_utils.configs.DownloadConfig) – Download configuration (headers, proxies, delays, thread count, timeout, retries, etc.).

  • headers (dict, callable, None) – Custom headers that override those in download_config.

  • proxies (dict, callable, None) – Custom proxies that override those in download_config.

  • log (image_crawler_utils.log.Log) – The logger.

  • store_path (str) – Directory in which the image is stored.

  • session (requests.Session) – A session that may contain cookies.

  • progress_group (image_crawler_utils.progress_bar.ProgressGroup) – Group of progress bars in which the download progress is displayed.

  • thread_id (int) – Index of the thread downloading this image.
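The headers and proxies parameters accept a dict, a callable, or None. The sketch below illustrates one plausible way such an override could be resolved. The helper resolve_override is hypothetical, and it assumes that an explicit argument replaces (rather than merges with) the download_config value and that a callable is invoked to produce the dict.

```python
from typing import Callable, Optional, Union

# Hypothetical alias: an override is a dict, a zero-argument callable that
# returns a dict, or None.
Override = Union[dict, Callable[[], dict], None]


def resolve_override(override: Override, config_value: Override) -> Optional[dict]:
    """Return the effective headers/proxies for one request (illustrative).

    Assumption: a non-None explicit argument replaces the download_config
    value entirely; a callable is called to produce a fresh dict.
    """
    chosen = override if override is not None else config_value
    if callable(chosen):
        return chosen()
    return chosen


config_headers = {"User-Agent": "config-agent"}
print(resolve_override({"User-Agent": "custom-agent"}, config_headers))  # {'User-Agent': 'custom-agent'}
print(resolve_override(None, config_headers))  # {'User-Agent': 'config-agent'}
print(resolve_override(lambda: {"Referer": "https://example.com/"}, None))  # {'Referer': 'https://example.com/'}
```

Accepting a callable lets each request obtain freshly generated values (for example, rotating proxies) without rebuilding the config.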

Returns:

(whether the download succeeded, size of the downloaded image in bytes)

Return type:

(bool, int)
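Because download_image() reports success and size separately, a caller handling many downloads can aggregate the tuples itself. A minimal sketch with a hypothetical summarize_results helper; it assumes the reported size is only meaningful when the download succeeded, and the sample tuples are illustrative, not real results.

```python
from typing import Iterable, Tuple


def summarize_results(results: Iterable[Tuple[bool, int]]) -> Tuple[int, int, int]:
    """Return (successes, failures, total_bytes) for a batch of download results."""
    successes = failures = total_bytes = 0
    for succeeded, size in results:
        if succeeded:
            successes += 1
            # Assumption: only count sizes of successful downloads.
            total_bytes += size
        else:
            failures += 1
    return successes, failures, total_bytes


# Illustrative tuples shaped like download_image()'s return value.
batch = [(True, 204_800), (False, 0), (True, 51_200)]
print(summarize_results(batch))  # (2, 1, 256000)
```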

image_crawler_utils.image_downloader.download_image_from_url(url, image_name, download_config=DownloadConfig(headers=None, proxies=None, thread_delay=5, fail_delay=3, randomize_delay=True, thread_num=5, timeout=10, max_download_time=None, retry_times=5, overwrite_images=True), log=<image_crawler_utils.log.Log object>, store_path='./', session=None, progress_group=None, thread_id=0, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''))

Download an image from a URL. Pixiv, Twitter and other site-specific image URLs are automatically detected and handled separately from ordinary URLs.
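How site-specific URLs are routed is not documented in detail here. As an illustration only, a dispatcher could classify URLs by hostname as below. The function classify_image_url and the host lists are assumptions, not the library's actual rules; only Pixiv and Twitter are covered, and direct CDN URLs such as pbs.twimg.com fall through to "generic" in this sketch.

```python
from urllib.parse import urlparse

# Hypothetical host lists; the library's real detection rules may differ.
PIXIV_HOSTS = ("pixiv.net", "pximg.net")
TWITTER_HOSTS = ("twitter.com", "x.com")


def _matches(host: str, domains: tuple) -> bool:
    # Match the domain itself or any subdomain of it (e.g. i.pximg.net).
    return any(host == d or host.endswith("." + d) for d in domains)


def classify_image_url(url: str) -> str:
    """Return 'pixiv', 'twitter' or 'generic' for a URL (illustrative sketch)."""
    host = (urlparse(url).hostname or "").lower()
    if _matches(host, PIXIV_HOSTS):
        return "pixiv"
    if _matches(host, TWITTER_HOSTS):
        return "twitter"
    return "generic"


print(classify_image_url("https://www.pixiv.net/artworks/12345678"))  # pixiv
print(classify_image_url("https://x.com/someone/status/1234567890"))  # twitter
print(classify_image_url("https://example.com/image.png"))  # generic
```

Matching on the parsed hostname rather than a substring of the whole URL avoids misrouting URLs that merely mention "pixiv" in their path or query.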

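Parameters:
  • url, image_name, download_config, log, store_path, session, progress_group, thread_id – Same as in download_image(). This function does not accept headers or proxies, and session defaults to None.

  • cookies (Cookies) – Cookies for the download (empty by default).

Returns: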

(the size of the downloaded image in bytes, thread_id)

Return type:

(float, int)
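Returning thread_id alongside the size lets a caller match each result back to its input when downloads finish out of order. The sketch below assumes a thread pool and uses a hypothetical fake_download stub in place of download_image_from_url() so that it runs without network access; the thread_num default of 5 simply mirrors DownloadConfig's default.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List, Tuple


def fake_download(url: str, thread_id: int) -> Tuple[float, int]:
    # Hypothetical stand-in for download_image_from_url(); returns
    # (size in bytes, thread_id) without touching the network.
    return float(len(url) * 1000), thread_id


def download_all(urls: List[str], thread_num: int = 5) -> Dict[int, float]:
    """Run downloads concurrently and map each thread_id to its reported size."""
    sizes: Dict[int, float] = {}
    with ThreadPoolExecutor(max_workers=thread_num) as pool:
        futures = [pool.submit(fake_download, url, i) for i, url in enumerate(urls)]
        # Results arrive in completion order; thread_id identifies the input.
        for future in as_completed(futures):
            size, thread_id = future.result()
            sizes[thread_id] = size
    return sizes


urls = ["https://example.com/a.png", "https://example.com/bb.png"]
result = download_all(urls, thread_num=2)
print(sum(result.values()))
```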

image_crawler_utils.image_downloader.pixiv_download_image_from_url(url, image_name, download_config=DownloadConfig(headers=None, proxies=None, thread_delay=5, fail_delay=3, randomize_delay=True, thread_num=5, timeout=10, max_download_time=None, retry_times=5, overwrite_images=True), log=<image_crawler_utils.log.Log object>, store_path='./', session=<requests.Session object>, progress_group=None, thread_id=0)

Download a Pixiv image from a URL. Both direct Pixiv image URLs and artwork URLs containing the artwork ID are supported.
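To make the two accepted Pixiv URL forms concrete, the sketch below tells them apart and extracts the artwork ID. The patterns assume the common www.pixiv.net/artworks/<id> page form and i.pximg.net image paths ending in <id>_p<page>; parse_pixiv_url is hypothetical and the library's own parsing may differ.

```python
import re
from typing import Optional, Tuple

# Assumed URL shapes (not taken from the library):
#   artwork page: https://www.pixiv.net/artworks/<id>  (optionally /en/artworks/<id>)
#   direct image: https://i.pximg.net/.../<id>_p<page>.<ext>
ARTWORK_RE = re.compile(r"^https?://(?:www\.)?pixiv\.net/(?:[a-z]{2}/)?artworks/(\d+)")
DIRECT_RE = re.compile(r"^https?://i\.pximg\.net/.*/(\d+)_p\d+[^/]*$")


def parse_pixiv_url(url: str) -> Optional[Tuple[str, str]]:
    """Return ('artwork', id) or ('direct', id), or None if unrecognized."""
    match = ARTWORK_RE.match(url)
    if match:
        return "artwork", match.group(1)
    match = DIRECT_RE.match(url)
    if match:
        return "direct", match.group(1)
    return None


print(parse_pixiv_url("https://www.pixiv.net/artworks/12345678"))  # ('artwork', '12345678')
```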

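Parameters:
  • url, image_name, download_config, log, store_path, session, progress_group, thread_id – Same as in download_image(). This function does not accept headers or proxies.

Returns: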

(the size of the downloaded image in bytes, thread_id)

Return type:

(float, int)

image_crawler_utils.image_downloader.twitter_download_image_from_status(url, image_name, download_config=DownloadConfig(headers=None, proxies=None, thread_delay=5, fail_delay=3, randomize_delay=True, thread_num=5, timeout=10, max_download_time=None, retry_times=5, overwrite_images=True), log=<image_crawler_utils.log.Log object>, store_path='./', session=<requests.Session object>, progress_group=None, thread_id=0)

Download an image from a Twitter status URL.
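A Twitter status URL carries the status ID in its path. The hypothetical parse_status_id below extracts it, assuming the https://twitter.com/<user>/status/<id> shape and its x.com equivalent; it is an illustration, not the library's implementation.

```python
import re
from typing import Optional

# Assumed status URL shape: https://twitter.com/<user>/status/<id> (also x.com).
STATUS_RE = re.compile(
    r"^https?://(?:www\.|mobile\.)?(?:twitter|x)\.com/[^/]+/status/(\d+)"
)


def parse_status_id(url: str) -> Optional[str]:
    """Return the numeric status ID as a string, or None if the URL does not match."""
    match = STATUS_RE.match(url)
    return match.group(1) if match else None


print(parse_status_id("https://twitter.com/someone/status/1234567890"))  # 1234567890
```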

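Parameters:
  • url, image_name, download_config, log, store_path, session, progress_group, thread_id – Same as in download_image(). This function does not accept headers or proxies.

Returns: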

(the size of the downloaded image in bytes, thread_id)

Return type:

(float, int)