image_crawler_utils.stations.booru package

class image_crawler_utils.stations.booru.DanbooruKeywordParser(station_url='https://danbooru.donmai.us/', crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, standard_keyword_string=None, keyword_string=None, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''), replace_url_with_source_level='None', use_keyword_include=False)[source]

Bases: KeywordParser

Parameters:
  • crawler_settings (image_crawler_utils.CrawlerSettings) – The CrawlerSettings used in this Parser.

  • station_url (str) –

    The URL of the main page of a website.

    • This parameter matters when several websites share the same structure. For example, https://yande.re/ and https://konachan.com/ are both built on Moebooru, so this parameter must be set to tell the Parser which site to use.

    • For websites like https://www.pixiv.net/, whose structure no other website uses, this parameter is already initialized and does not need to be set.

  • standard_keyword_string (str) – Query keyword string using standard syntax. Refer to the documentation for detailed instructions.

  • cookies (image_crawler_utils.Cookies, list, dict, str, None) –

    Cookies used in loading websites.

  • keyword_string (str, None) –

    To specify the search keywords directly, set keyword_string to a custom non-empty string. It will OVERRIDE standard_keyword_string.

    • For example, setting keyword_string to "kuon_(utawarerumono) rating:safe" in DanbooruKeywordParser searches Danbooru directly with this string; its standard keyword string equivalent is "kuon_(utawarerumono) AND rating:safe".

  • replace_url_with_source_level (str, must be one of "All", "File", and "None") –

    A level controlling whether the Parser will try to download from the source URL of images instead of from the current website.

    • Three levels are available; the default is “None”:
      • “All” or “all” (NOT SUGGESTED): As long as the image has a source URL, try to download from this URL first.

      • “File” or “file”: If the source URL looks like a file (e.g. https://foo.bar/image.png) or belongs to one of several special websites (e.g. Pixiv or Twitter / X status), try to download from this URL first.

      • “None” or “none”: Do not try to download from any source URL first.

    • Both source URLs and Danbooru URLs are stored in the ImageInfo class and will be used when downloading. This parameter only controls the priority of the URLs.

    • Setting a level other than “None” / “none” reduces the pressure on the Danbooru server but takes longer (as source URLs may not be directly accessible, or may be entirely unavailable).

  • use_keyword_include (bool) –

    If this parameter is set to True, KeywordParser will try to find the keyword / tag subgroup with the fewest keywords / tags (or a subgroup whose number of keywords / tags is below a threshold, such as 2 on Danbooru for users without an account) whose search results contain all results of the original query and span the fewest pages.

    • Only works when standard_keyword_string is used. When keyword_string is specified, this parameter is ignored.

    • For example, if standard_keyword_string is set to “kuon_(utawarerumono) AND rating:safe OR utawarerumono”, the Parser will check “kuon_(utawarerumono) OR utawarerumono” and “rating:safe OR utawarerumono” and use whichever group has the fewest result pages as the keyword string in later queries.

    • If no subgroup with fewer than 2 keywords / tags exists (e.g. “kuon_(utawarerumono) OR rating:safe OR utawarerumono”), the Parser will try to find the keyword / tag subgroups with the fewest keywords / tags. This may often CAUSE ERRORS, so check your keywords before setting this parameter to True.
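The subgroup search described for use_keyword_include can be pictured with a minimal sketch. It assumes the query has already been split into OR-connected groups of AND-connected tags and that a page count is available for each candidate. The helpers candidate_subgroups and pick_cheapest and the page counts are hypothetical; they are not part of image_crawler_utils and may differ from what the Parser actually does.

```python
from itertools import product


def candidate_subgroups(or_groups):
    """Yield OR-queries that keep one tag from every AND-group.

    Dropping AND-terms can only widen a search, so each candidate
    matches a superset of the original query's results.
    """
    for combo in product(*or_groups):
        yield " OR ".join(combo)


def pick_cheapest(or_groups, page_count):
    """Return the candidate whose search spans the fewest pages."""
    return min(candidate_subgroups(or_groups), key=page_count)


# "kuon_(utawarerumono) AND rating:safe OR utawarerumono" as OR-of-AND groups.
groups = [["kuon_(utawarerumono)", "rating:safe"], ["utawarerumono"]]

# Made-up page counts standing in for real queries to the site.
fake_pages = {
    "kuon_(utawarerumono) OR utawarerumono": 120,
    "rating:safe OR utawarerumono": 90000,
}
best = pick_cheapest(groups, fake_pages.__getitem__)
```

With these invented counts, best is the first candidate, because it spans fewer pages than the one built on rating:safe.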

generate_keyword_string()[source]
Return type:

str

generate_keyword_string_include(session=None)[source]
Parameters:

session (Session)

Return type:

str

Parameters:

session (Session)

Return type:

int

get_image_info_from_json(session=None)[source]
Parameters:

session (Session)

Return type:

list[ImageInfo]

get_json_page_num()[source]
Return type:

int

get_json_page_urls(session=None)[source]
Parameters:

session (Session)

Return type:

list[str]

run()[source]

The main function that runs the Parser and returns a list of image_crawler_utils.ImageInfo.

Return type:

list[ImageInfo]
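A minimal usage sketch for the class above, using only the constructor arguments and the run() method listed in this section. The helper names danbooru_parser_kwargs and crawl_danbooru are hypothetical, and the import is done lazily so the snippet loads even where the package is not installed. Calling crawl_danbooru performs real network requests.

```python
def danbooru_parser_kwargs(query, source_level="None"):
    """Collect keyword arguments named in the DanbooruKeywordParser signature."""
    return {
        "standard_keyword_string": query,
        "replace_url_with_source_level": source_level,
        "use_keyword_include": False,
    }


def crawl_danbooru(query):
    """Build the parser with default CrawlerSettings and run it."""
    # Lazy import: only needed when a crawl is actually started.
    from image_crawler_utils.stations.booru import DanbooruKeywordParser

    parser = DanbooruKeywordParser(**danbooru_parser_kwargs(query))
    return parser.run()  # list[ImageInfo], according to the signature above


kwargs = danbooru_parser_kwargs("kuon_(utawarerumono) AND rating:safe")
```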

class image_crawler_utils.stations.booru.GelbooruKeywordParser(station_url='https://gelbooru.com/', crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, standard_keyword_string=None, keyword_string=None, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''), use_api=False, replace_url_with_source_level='None', use_keyword_include=False, api_key=None, user_id=None)[source]

Bases: KeywordParser

Parameters:
  • crawler_settings (image_crawler_utils.CrawlerSettings) – The CrawlerSettings used in this Parser.

  • station_url (str) –

    The URL of the main page of a website.

    • This parameter matters when several websites share the same structure. For example, https://yande.re/ and https://konachan.com/ are both built on Moebooru, so this parameter must be set to tell the Parser which site to use.

    • For websites like https://www.pixiv.net/, whose structure no other website uses, this parameter is already initialized and does not need to be set.

  • standard_keyword_string (str) – Query keyword string using standard syntax. Refer to the documentation for detailed instructions.

  • cookies (image_crawler_utils.Cookies, list, dict, str, None) –

    Cookies used in loading websites.

  • keyword_string (str, None) –

    To specify the search keywords directly, set keyword_string to a custom non-empty string. It will OVERRIDE standard_keyword_string.

    • For example, setting keyword_string to "kuon_(utawarerumono) rating:safe" in DanbooruKeywordParser searches Danbooru directly with this string; its standard keyword string equivalent is "kuon_(utawarerumono) AND rating:safe".

  • use_api (bool) –

    Use the Gelbooru API page, like https://gelbooru.com/index.php?page=dapi&s=post&q=index&json=1&api_key=*********&user_id=*********.

    • Setting this to False parses image info directly from gallery pages, like https://yande.re/.

    • For some websites like konachan.com, the API is protected, and you need to set this parameter to False to ensure that the Parser works correctly.

  • replace_url_with_source_level (str, must be one of "All", "File", and "None") –

    A level controlling whether the Parser will try to download from the source URL of images instead of from the current website.

    • Three levels are available; the default is “None”:
      • “All” or “all” (NOT SUGGESTED): As long as the image has a source URL, try to download from this URL first.

      • “File” or “file”: If the source URL looks like a file (e.g. https://foo.bar/image.png) or belongs to one of several special websites (e.g. Pixiv or Twitter / X status), try to download from this URL first.

      • “None” or “none”: Do not try to download from any source URL first.

    • Both source URLs and Danbooru URLs are stored in the ImageInfo class and will be used when downloading. This parameter only controls the priority of the URLs.

    • Setting a level other than “None” / “none” reduces the pressure on the Danbooru server but takes longer (as source URLs may not be directly accessible, or may be entirely unavailable).

  • use_keyword_include (bool) –

    If this parameter is set to True, KeywordParser will try to find the keyword / tag subgroup with the fewest keywords / tags (or a subgroup whose number of keywords / tags is below a threshold, such as 2 on Danbooru for users without an account) whose search results contain all results of the original query and span the fewest pages.

    • Only works when standard_keyword_string is used. When keyword_string is specified, this parameter is ignored.

    • For example, if standard_keyword_string is set to “kuon_(utawarerumono) AND rating:safe OR utawarerumono”, the Parser will check “kuon_(utawarerumono) OR utawarerumono” and “rating:safe OR utawarerumono” and use whichever group has the fewest result pages as the keyword string in later queries.

    • If no subgroup with fewer than 2 keywords / tags exists (e.g. “kuon_(utawarerumono) OR rating:safe OR utawarerumono”), the Parser will try to find the keyword / tag subgroups with the fewest keywords / tags. This may often CAUSE ERRORS, so check your keywords before setting this parameter to True.

  • api_key (str) – The api_key used to access JSON-API. Can be acquired after logging in at https://gelbooru.com/index.php?page=account&s=options.

  • user_id (str) – The user_id used to access JSON-API. Can be acquired after logging in at https://gelbooru.com/index.php?page=account&s=options.
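To show how api_key and user_id fit into the API address quoted for use_api, the following sketch rebuilds that example URL with the standard library. The function gelbooru_api_url and the placeholder credentials are hypothetical; how the Parser assembles its requests internally is not shown in this section.

```python
from urllib.parse import urlencode


def gelbooru_api_url(api_key, user_id, station_url="https://gelbooru.com/"):
    """Rebuild the JSON-API address from the example under use_api."""
    params = {
        "page": "dapi",
        "s": "post",
        "q": "index",
        "json": 1,
        "api_key": api_key,
        "user_id": user_id,
    }
    # urlencode keeps the insertion order of the dict.
    return station_url.rstrip("/") + "/index.php?" + urlencode(params)


# Placeholder credentials; real ones come from the account options page.
url = gelbooru_api_url("MY_KEY", "12345")
```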

generate_keyword_string()[source]
Return type:

str

generate_keyword_string_include(session=None)[source]
Parameters:

session (Session)

Return type:

str

Return type:

list[str]

Parameters:

session (Session)

Return type:

list[ImageInfo]

get_image_info_from_json(session=None)[source]
Parameters:

session (Session)

Return type:

list[ImageInfo]

get_json_page_num()[source]
Return type:

int

get_json_page_urls()[source]
Return type:

list[str]

Parameters:

session (Session)

Return type:

int

get_total_image_num_json(session=None)[source]
Parameters:

session (Session)

Return type:

int

run()[source]

The main function that runs the Parser and returns a list of image_crawler_utils.ImageInfo.

Return type:

list[ImageInfo]

class image_crawler_utils.stations.booru.MoebooruKeywordParser(station_url, crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, standard_keyword_string=None, keyword_string=None, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''), use_api=True, image_num_per_gallery_page=1, image_num_per_json=10, replace_url_with_source_level='None', use_keyword_include=False, has_cloudflare=False)[source]

Bases: KeywordParser

Parameters:
  • crawler_settings (image_crawler_utils.CrawlerSettings) – The CrawlerSettings used in this Parser.

  • station_url (str) –

    The URL of the main page of a website.

    • This parameter matters when several websites share the same structure. For example, https://yande.re/ and https://konachan.com/ are both built on Moebooru, so this parameter must be set to tell the Parser which site to use.

    • For websites like https://www.pixiv.net/, whose structure no other website uses, this parameter is already initialized and does not need to be set.

  • standard_keyword_string (str) – Query keyword string using standard syntax. Refer to the documentation for detailed instructions.

  • cookies (image_crawler_utils.Cookies, list, dict, str, None) –

    Cookies used in loading websites.

  • use_api (bool) –

    Use the Moebooru API page, like https://yande.re/post.json?api_version=2.

    • Setting this to False parses image info directly from gallery pages, like https://yande.re/.

    • For some websites like konachan.com, the API is protected, and you need to set this parameter to False to ensure that the Parser works correctly.

  • image_num_per_gallery_page (int) –

    Denotes how many images are displayed on a gallery page.

    • When use_api is set to True, this parameter is used to estimate the total number of JSON pages (as only the total number of gallery pages can be read from a gallery page). Otherwise it is not used.

    • Several predefined constants are provided for this. You can import them from image_crawler_utils.stations.booru, like:

    from image_crawler_utils.stations.booru import (
        YANDERE_IMAGE_NUM_PER_GALLERY_PAGE,  # yande.re
        KONACHAN_COM_IMAGE_NUM_PER_GALLERY_PAGE,  # konachan.com
        KONACHAN_NET_IMAGE_NUM_PER_GALLERY_PAGE,  # konachan.net
    )
    

  • image_num_per_json (int) –

    When use_api is set to True, this parameter will control how many images are displayed on a JSON-API page.

    from image_crawler_utils.stations.booru import (
        YANDERE_IMAGE_NUM_PER_JSON,  # yande.re
        KONACHAN_NET_IMAGE_NUM_PER_JSON,  # konachan.net
        KONACHAN_COM_IMAGE_NUM_PER_JSON,  # konachan.com
    )
    

  • keyword_string (str, None) –

    To specify the search keywords directly, set keyword_string to a custom non-empty string. It will OVERRIDE standard_keyword_string.

    • For example, setting keyword_string to "kuon_(utawarerumono) rating:safe" in DanbooruKeywordParser searches Danbooru directly with this string; its standard keyword string equivalent is "kuon_(utawarerumono) AND rating:safe".

  • replace_url_with_source_level (str, must be one of "All", "File", and "None") –

    A level controlling whether the Parser will try to download from the source URL of images instead of from the current website.

    • Three levels are available; the default is “None”:
      • “All” or “all” (NOT SUGGESTED): As long as the image has a source URL, try to download from this URL first.

      • “File” or “file”: If the source URL looks like a file (e.g. https://foo.bar/image.png) or belongs to one of several special websites (e.g. Pixiv or Twitter / X status), try to download from this URL first.

      • “None” or “none”: Do not try to download from any source URL first.

    • Both source URLs and Danbooru URLs are stored in the ImageInfo class and will be used when downloading. This parameter only controls the priority of the URLs.

    • Setting a level other than “None” / “none” reduces the pressure on the Danbooru server but takes longer (as source URLs may not be directly accessible, or may be entirely unavailable).

  • use_keyword_include (bool) –

    Use a new keyword string whose search results contain all images that belong to the results of the original keyword string. Defaults to False.

    • Example: searching “A” returns all results of “A AND B”.

  • has_cloudflare (bool) – Whether the current website is protected by Cloudflare (e.g. konachan.com). If set to True, a browser window will open (and MANUAL operations will often be needed) to get the cookies required to bypass it.
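The estimate mentioned under image_num_per_gallery_page can be illustrated with a short calculation. This is only a sketch under the assumption that the gallery page count is multiplied by the images per gallery page and divided by the images per JSON page, rounding up; the function estimate_json_page_num and the numbers are hypothetical, and the Parser's actual formula is not shown in this section.

```python
import math


def estimate_json_page_num(gallery_page_num, image_num_per_gallery_page, image_num_per_json):
    """Upper-bound estimate of JSON pages from the gallery page count."""
    # The last gallery page may be partly empty, so this is a ceiling.
    max_images = gallery_page_num * image_num_per_gallery_page
    return math.ceil(max_images / image_num_per_json)


# Invented numbers: 7 gallery pages of 40 images, 100 images per JSON page.
pages = estimate_json_page_num(
    gallery_page_num=7,
    image_num_per_gallery_page=40,
    image_num_per_json=100,
)
```

An inaccurate image_num_per_gallery_page skews the estimate directly, which is presumably why the predefined per-site constants are provided.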

generate_keyword_string()[source]
Return type:

str

generate_keyword_string_include(session=None)[source]
Parameters:

session (Session)

Return type:

str

Parameters:

session (Session)

Return type:

int

Return type:

list[str]

Parameters:

session (Session)

get_image_info_from_json(session=None)[source]
Parameters:

session (Session)

Return type:

list[ImageInfo]

get_json_page_num(session=None)[source]
Parameters:

session (Session)

Return type:

int

get_json_page_urls()[source]
Return type:

list[str]

run()[source]

The main function that runs the Parser and returns a list of image_crawler_utils.ImageInfo.

Return type:

list[ImageInfo]
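A usage sketch for MoebooruKeywordParser, where station_url has no default and must be supplied. The per-site options follow the notes above for konachan.com (protected API, Cloudflare); the values for yande.re are assumptions based on the defaults in the signature. The helpers moebooru_options and crawl_moebooru are hypothetical, and running a crawl performs real network requests.

```python
def moebooru_options(site):
    """Per-site keyword arguments (assumed values, see the notes above)."""
    options = {
        "yande.re": {
            "station_url": "https://yande.re/",
            "use_api": True,
            "has_cloudflare": False,
        },
        "konachan.com": {
            "station_url": "https://konachan.com/",
            "use_api": False,  # the API is described as protected
            "has_cloudflare": True,
        },
    }
    return dict(options[site])


def crawl_moebooru(site, query):
    """Build a MoebooruKeywordParser for one site and run it."""
    # Lazy imports: only needed when a crawl is actually started.
    from image_crawler_utils.stations.booru import MoebooruKeywordParser

    kwargs = moebooru_options(site)
    if site == "yande.re":
        from image_crawler_utils.stations.booru import (
            YANDERE_IMAGE_NUM_PER_GALLERY_PAGE,
            YANDERE_IMAGE_NUM_PER_JSON,
        )
        kwargs["image_num_per_gallery_page"] = YANDERE_IMAGE_NUM_PER_GALLERY_PAGE
        kwargs["image_num_per_json"] = YANDERE_IMAGE_NUM_PER_JSON
    parser = MoebooruKeywordParser(standard_keyword_string=query, **kwargs)
    return parser.run()


konachan = moebooru_options("konachan.com")
```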

class image_crawler_utils.stations.booru.SafebooruKeywordParser(station_url='https://safebooru.org/', crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, standard_keyword_string=None, keyword_string=None, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''), replace_url_with_source_level='None', use_keyword_include=False)[source]

Bases: KeywordParser

Parameters:
  • crawler_settings (image_crawler_utils.CrawlerSettings) – The CrawlerSettings used in this Parser.

  • station_url (str) –

    The URL of the main page of a website.

    • This parameter matters when several websites share the same structure. For example, https://yande.re/ and https://konachan.com/ are both built on Moebooru, so this parameter must be set to tell the Parser which site to use.

    • For websites like https://www.pixiv.net/, whose structure no other website uses, this parameter is already initialized and does not need to be set.

  • standard_keyword_string (str) – Query keyword string using standard syntax. Refer to the documentation for detailed instructions.

  • cookies (image_crawler_utils.Cookies, list, dict, str, None) –

    Cookies used in loading websites.

  • keyword_string (str, None) –

    To specify the search keywords directly, set keyword_string to a custom non-empty string. It will OVERRIDE standard_keyword_string.

    • For example, setting keyword_string to “kuon_(utawarerumono) rating:safe” in DanbooruKeywordParser searches Danbooru directly with this string; its standard keyword string equivalent is “kuon_(utawarerumono) AND rating:safe”.

    • standard_keyword_string and keyword_string CANNOT both be None or empty (containing only spaces). Otherwise, a critical error will occur!

  • replace_url_with_source_level (str, must be one of "All", "File", and "None") –

    A level controlling whether the Parser will try to download from the source URL of images instead of from the current website.

    • Three levels are available; the default is “None”:
      • “All” or “all” (NOT SUGGESTED): As long as the image has a source URL, try to download from this URL first.

      • “File” or “file”: If the source URL looks like a file (e.g. https://foo.bar/image.png) or belongs to one of several special websites (e.g. Pixiv or Twitter / X status), try to download from this URL first.

      • “None” or “none”: Do not try to download from any source URL first.

    • Both source URLs and Danbooru URLs are stored in the ImageInfo class and will be used when downloading. This parameter only controls the priority of the URLs.

    • Setting a level other than “None” / “none” reduces the pressure on the Danbooru server but takes longer (as source URLs may not be directly accessible, or may be entirely unavailable).

  • use_keyword_include (bool) –

    If this parameter is set to True, KeywordParser will try to find the keyword / tag subgroup with the fewest keywords / tags (or a subgroup whose number of keywords / tags is below a threshold, such as 2 on Danbooru for users without an account) whose search results contain all results of the original query and span the fewest pages.

    • Only works when standard_keyword_string is used. When keyword_string is specified, this parameter is ignored.

    • For example, if standard_keyword_string is set to “kuon_(utawarerumono) AND rating:safe OR utawarerumono”, the Parser will check “kuon_(utawarerumono) OR utawarerumono” and “rating:safe OR utawarerumono” and use whichever group has the fewest result pages as the keyword string in later queries.

    • If no subgroup with fewer than 2 keywords / tags exists (e.g. “kuon_(utawarerumono) OR rating:safe OR utawarerumono”), the Parser will try to find the keyword / tag subgroups with the fewest keywords / tags. This may often CAUSE ERRORS, so check your keywords before setting this parameter to True.

  • station_url (str)
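The three levels of replace_url_with_source_level can be sketched as a small URL-ordering function. This is an illustration only: url_priority, looks_like_file, and the suffix list are hypothetical, and the special-website rule of the “File” level (e.g. Pixiv or Twitter / X status) is left out for brevity. Both URLs are always kept, in line with the note that the parameter only controls priority.

```python
from urllib.parse import urlsplit

# Assumed list of file suffixes; the library's actual check may differ.
FILE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".webp")


def looks_like_file(url):
    """True if the URL path ends with a known image suffix."""
    return urlsplit(url).path.lower().endswith(FILE_SUFFIXES)


def url_priority(site_url, source_url, level="None"):
    """Return the download URLs in the order they would be tried."""
    level = level.lower()
    if level not in {"all", "file", "none"}:
        raise ValueError(f"unknown level: {level!r}")
    if not source_url or level == "none":
        prefer_source = False
    elif level == "all":
        prefer_source = True
    else:  # "file"
        prefer_source = looks_like_file(source_url)
    urls = [site_url] + ([source_url] if source_url else [])
    return urls[::-1] if prefer_source else urls


site = "https://safebooru.org/images/1/example.jpg"  # made-up URL
order = url_priority(site, "https://foo.bar/image.png", "File")
```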

generate_keyword_string()[source]
Return type:

str

generate_keyword_string_include(session=None)[source]
Parameters:

session (Session)

Return type:

str

get_image_info_from_json(session=None)[source]
Parameters:

session (Session)

Return type:

list[ImageInfo]

get_json_page_num(session=None)[source]
Parameters:

session (Session)

Return type:

int

get_json_page_urls()[source]
Return type:

list[str]

get_total_image_num(session=None)[source]
Parameters:

session (Session)

Return type:

int

run()[source]

The main function that runs the Parser and returns a list of image_crawler_utils.ImageInfo.

Return type:

list[ImageInfo]

image_crawler_utils.stations.booru.filter_keyword_booru(image_info, standard_keyword_string)[source]

A keyword filter for xxxbooru-style image info.

It will check whether current tags match the standard_keyword_string query.

Parameters: