image_crawler_utils.stations.booru package

class image_crawler_utils.stations.booru.DanbooruKeywordParser(station_url='https://danbooru.donmai.us/', crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, standard_keyword_string=None, keyword_string=None, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''), replace_url_with_source_level='None', use_keyword_include=False)

Bases: KeywordParser

Parameters:

crawler_settings (image_crawler_utils.CrawlerSettings) – The CrawlerSettings used in this Parser.
station_url (str) –
The URL of the main page of a website.
- This parameter is useful when several websites share the same structure. For example, https://yande.re/ and https://konachan.com/ are both built on Moebooru, so this parameter must be set to tell them apart.
- For websites like https://www.pixiv.net/, whose structure no other website uses, this parameter already has a default value and does not need to be set.
standard_keyword_string (str) – Query keyword string using standard syntax. Refer to the documentation for detailed instructions.
cookies (image_crawler_utils.Cookies, list, dict, str, None) –
Cookies used when loading websites.
- Can be one of image_crawler_utils.Cookies, list, dict, str or None.
  - None means no cookies and works the same as Cookies().
  - Leaving this parameter blank is equivalent to None / Cookies().
keyword_string (str, None) –
If you want to specify the search keywords directly, set keyword_string to a custom non-empty string. It will OVERRIDE standard_keyword_string.
- For example, setting keyword_string to "kuon_(utawarerumono) rating:safe" in DanbooruKeywordParser searches Danbooru directly with this string; its standard keyword string equivalent is "kuon_(utawarerumono) AND rating:safe".
replace_url_with_source_level (str, must be one of "All", "File", and "None") –
A level controlling whether the Parser will try to download from the source URL of images instead of from the current website.
- It has 3 available levels; the default is “None”:
  - “All” or “all” (NOT SUGGESTED): If the image has a source URL, always try to download from it first.
  - “File” or “file”: Try to download from the source URL first only if it looks like a file (e.g. https://foo.bar/image.png) or belongs to one of several special websites (e.g. a Pixiv page or a Twitter / X status).
  - “None” or “none”: Never try to download from a source URL first.
- Both source URLs and Danbooru URLs are stored in the ImageInfo class and are used when downloading. This parameter only controls the priority of the URLs.
- Setting a level other than “None” / “none” reduces the load on the Danbooru server but may take longer, since source URLs may not be directly accessible or may be entirely unavailable.
use_keyword_include (bool) –
If set to True, the KeywordParser will try to find the keyword / tag subgroups with the fewest keywords / tags (or with no more keywords / tags than a threshold, e.g. 2 in Danbooru for users without an account) whose results contain all of the original search results, and will select the one with the fewest result pages.
- Only works when standard_keyword_string is used. When keyword_string is specified, this parameter is ignored.
- For example, if standard_keyword_string is set to “kuon_(utawarerumono) AND rating:safe OR utawarerumono”, the Parser will check “kuon_(utawarerumono) OR utawarerumono” and “rating:safe OR utawarerumono”, and use the one with fewer result pages as the keyword string in later queries.
- If no subgroup with at most 2 keywords / tags exists (e.g. “kuon_(utawarerumono) OR rating:safe OR utawarerumono”), the Parser will fall back to the subgroups with the fewest keywords / tags. This can often CAUSE ERRORS, so check your keywords before setting this parameter to True.
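To make the three levels of replace_url_with_source_level concrete, here is a minimal sketch of how a download-URL priority could be derived from them. The helper names (order_download_urls, looks_like_file, is_special_site), the extension list and the host list are hypothetical assumptions for illustration, not the library's actual implementation.

```python
from typing import List, Optional
from urllib.parse import urlparse

# Hypothetical: extensions treated as "looks like a file".
FILE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp")
# Hypothetical: hosts standing in for the "special websites" (simplified: any
# URL on these hosts counts, not only Pixiv artworks or Twitter / X statuses).
SPECIAL_HOSTS = ("pixiv.net", "twitter.com", "x.com")


def looks_like_file(url: str) -> bool:
    """True if the URL path ends with a known image extension."""
    return urlparse(url).path.lower().endswith(FILE_EXTENSIONS)


def is_special_site(url: str) -> bool:
    """True if the URL's host is (a subdomain of) one of SPECIAL_HOSTS."""
    host = urlparse(url).netloc.lower()
    return any(host == h or host.endswith("." + h) for h in SPECIAL_HOSTS)


def order_download_urls(site_url: str, source_url: Optional[str], level: str) -> List[str]:
    """Return all known URLs, ordered by the priority implied by `level`."""
    level = level.lower()
    if level not in ("all", "file", "none"):
        raise ValueError(f"level must be 'All', 'File' or 'None', got {level!r}")
    if not source_url:
        return [site_url]  # nothing to prioritize
    if level == "all" or (level == "file" and (looks_like_file(source_url) or is_special_site(source_url))):
        return [source_url, site_url]  # try the source first, fall back to the site
    return [site_url, source_url]  # site first; the source is still kept as a fallback
```

Note that no level ever drops a URL: as stated above, the level only changes the order in which the stored URLs are tried.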

generate_keyword_string()

Return type: str

generate_keyword_string_include(session=None)

Parameters: session (Session)
Return type: str

get_gallery_page_num(session=None)

Parameters: session (Session)
Return type: int

get_image_info_from_json(session=None)

Parameters: session (Session)
Return type: list[ImageInfo]

get_json_page_num()

Return type: int

get_json_page_urls(session=None)

Parameters: session (Session)
Return type: list[str]

run()

The main function that runs the Parser and returns a list of image_crawler_utils.ImageInfo.

Return type: list[ImageInfo]
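For orientation, the sketch below shows one way a list of JSON page URLs (the kind of value get_json_page_urls returns) could be built for Danbooru. It assumes Danbooru's public posts.json endpoint with tags, page and limit query parameters; the helper name and the limit of 200 are assumptions, and whether the Parser builds its URLs exactly this way is not confirmed here.

```python
from typing import List
from urllib.parse import urlencode, urljoin


def build_danbooru_json_page_urls(station_url: str, keyword_string: str,
                                  page_num: int, limit: int = 200) -> List[str]:
    """Build one posts.json URL per result page (Danbooru pages are 1-based)."""
    endpoint = urljoin(station_url, "posts.json")
    return [
        # urlencode escapes spaces as "+" and parentheses as %28 / %29
        endpoint + "?" + urlencode({"tags": keyword_string, "page": page, "limit": limit})
        for page in range(1, page_num + 1)
    ]
```

For example, build_danbooru_json_page_urls("https://danbooru.donmai.us/", "kuon_(utawarerumono) rating:safe", 3) yields three URLs whose tags parameter is kuon_%28utawarerumono%29+rating%3Asafe.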

class image_crawler_utils.stations.booru.GelbooruKeywordParser(station_url='https://gelbooru.com/', crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, standard_keyword_string=None, keyword_string=None, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''), use_api=False, replace_url_with_source_level='None', use_keyword_include=False, api_key=None, user_id=None)

Bases: KeywordParser

Parameters:

crawler_settings (image_crawler_utils.CrawlerSettings) – The CrawlerSettings used in this Parser.
station_url (str) –
The URL of the main page of a website.
- This parameter is useful when several websites share the same structure. For example, https://yande.re/ and https://konachan.com/ are both built on Moebooru, so this parameter must be set to tell them apart.
- For websites like https://www.pixiv.net/, whose structure no other website uses, this parameter already has a default value and does not need to be set.
standard_keyword_string (str) – Query keyword string using standard syntax. Refer to the documentation for detailed instructions.
cookies (image_crawler_utils.Cookies, list, dict, str, None) –
Cookies used when loading websites.
- Can be one of image_crawler_utils.Cookies, list, dict, str or None.
  - None means no cookies and works the same as Cookies().
  - Leaving this parameter blank is equivalent to None / Cookies().
keyword_string (str, None) –
If you want to specify the search keywords directly, set keyword_string to a custom non-empty string. It will OVERRIDE standard_keyword_string.
- For example, setting keyword_string to "kuon_(utawarerumono) rating:safe" in DanbooruKeywordParser searches Danbooru directly with this string; its standard keyword string equivalent is "kuon_(utawarerumono) AND rating:safe".
use_api (bool) –
Whether to use the Gelbooru JSON API, e.g. https://gelbooru.com/index.php?page=dapi&s=post&q=index&json=1&api_key=*********&user_id=*********.
- If set to False, image info is parsed from gallery pages visited directly, e.g. https://gelbooru.com/.
- For some websites, such as konachan.com, the API is protected; set this parameter to False there so that the Parser works correctly.
replace_url_with_source_level (str, must be one of "All", "File", and "None") –
A level controlling whether the Parser will try to download from the source URL of images instead of from the current website.
- It has 3 available levels; the default is “None”:
  - “All” or “all” (NOT SUGGESTED): If the image has a source URL, always try to download from it first.
  - “File” or “file”: Try to download from the source URL first only if it looks like a file (e.g. https://foo.bar/image.png) or belongs to one of several special websites (e.g. a Pixiv page or a Twitter / X status).
  - “None” or “none”: Never try to download from a source URL first.
- Both source URLs and Gelbooru URLs are stored in the ImageInfo class and are used when downloading. This parameter only controls the priority of the URLs.
- Setting a level other than “None” / “none” reduces the load on the Gelbooru server but may take longer, since source URLs may not be directly accessible or may be entirely unavailable.
use_keyword_include (bool) –
If set to True, the KeywordParser will try to find the keyword / tag subgroups with the fewest keywords / tags (or with no more keywords / tags than a threshold, e.g. 2 in Danbooru for users without an account) whose results contain all of the original search results, and will select the one with the fewest result pages.
- Only works when standard_keyword_string is used. When keyword_string is specified, this parameter is ignored.
- For example, if standard_keyword_string is set to “kuon_(utawarerumono) AND rating:safe OR utawarerumono”, the Parser will check “kuon_(utawarerumono) OR utawarerumono” and “rating:safe OR utawarerumono”, and use the one with fewer result pages as the keyword string in later queries.
- If no subgroup with at most 2 keywords / tags exists (e.g. “kuon_(utawarerumono) OR rating:safe OR utawarerumono”), the Parser will fall back to the subgroups with the fewest keywords / tags. This can often CAUSE ERRORS, so check your keywords before setting this parameter to True.
api_key (str) – The api_key used to access the JSON API. It can be obtained at https://gelbooru.com/index.php?page=account&s=options after logging in.
user_id (str) – The user_id used to access the JSON API. It can be obtained at https://gelbooru.com/index.php?page=account&s=options after logging in.
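The use_api description above gives the shape of a Gelbooru API URL. Below is a minimal sketch that builds such a URL for one result page. The page, s, q, json, api_key and user_id parameters come from the example URL above; tags, pid (page index) and limit are assumptions based on Gelbooru's public API documentation, and the helper name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode, urljoin


def build_gelbooru_api_url(station_url: str, tags: str, pid: int = 0, limit: int = 100,
                           api_key: Optional[str] = None, user_id: Optional[str] = None) -> str:
    """Build a Gelbooru-style dapi JSON URL for one result page."""
    params = {"page": "dapi", "s": "post", "q": "index", "json": 1,
              "tags": tags, "pid": pid, "limit": limit}
    if api_key and user_id:  # only send credentials when both are present
        params["api_key"] = api_key
        params["user_id"] = user_id
    return urljoin(station_url, "index.php") + "?" + urlencode(params)
```

Keeping credentials optional mirrors the documented defaults (api_key=None, user_id=None); how the Parser behaves when only one of them is given is not documented here.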

generate_keyword_string()

Return type: str

generate_keyword_string_include(session=None)

Parameters: session (Session)
Return type: str

get_gallery_page_urls()

Return type: list[str]

get_image_info_from_gallery(session=None)

Parameters: session (Session)
Return type: list[ImageInfo]

get_image_info_from_json(session=None)

Parameters: session (Session)
Return type: list[ImageInfo]

get_json_page_num()

Return type: int

get_json_page_urls()

Return type: list[str]

get_total_image_num_gallery(session=None)

Parameters: session (Session)
Return type: int

get_total_image_num_json(session=None)

Parameters: session (Session)
Return type: int

run()

The main function that runs the Parser and returns a list of image_crawler_utils.ImageInfo.

Return type: list[ImageInfo]
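Given a total image count (what get_total_image_num_gallery / get_total_image_num_json presumably return), the number of pages to fetch is a ceiling division. The sketch below also assumes, as a hypothesis to verify against the site, that Gelbooru gallery pages use a `pid` image offset with 42 images per page, whereas the API `pid` is a page index. Both helper names are hypothetical.

```python
from typing import List


def page_count(total_images: int, per_page: int) -> int:
    """Number of pages needed for total_images results (integer ceiling division)."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    return (total_images + per_page - 1) // per_page


def gallery_pid_offsets(total_images: int, per_page: int = 42) -> List[int]:
    """Offsets for gallery pages whose `pid` counts images rather than pages."""
    return [i * per_page for i in range(page_count(total_images, per_page))]
```

For example, 100 images at 42 per page need 3 gallery pages, with offsets 0, 42 and 84.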

class image_crawler_utils.stations.booru.MoebooruKeywordParser(station_url, crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, standard_keyword_string=None, keyword_string=None, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''), use_api=True, image_num_per_gallery_page=1, image_num_per_json=10, replace_url_with_source_level='None', use_keyword_include=False, has_cloudflare=False)

Bases: KeywordParser

Parameters:

crawler_settings (image_crawler_utils.CrawlerSettings) – The CrawlerSettings used in this Parser.
station_url (str) –
The URL of the main page of a website.
- This parameter is useful when several websites share the same structure. For example, https://yande.re/ and https://konachan.com/ are both built on Moebooru, so this parameter must be set to tell them apart.
- For websites like https://www.pixiv.net/, whose structure no other website uses, this parameter already has a default value and does not need to be set.
standard_keyword_string (str) – Query keyword string using standard syntax. Refer to the documentation for detailed instructions.
cookies (image_crawler_utils.Cookies, list, dict, str, None) –
Cookies used when loading websites.
- Can be one of image_crawler_utils.Cookies, list, dict, str or None.
  - None means no cookies and works the same as Cookies().
  - Leaving this parameter blank is equivalent to None / Cookies().
use_api (bool) –
Whether to use the Moebooru JSON API, e.g. https://yande.re/post.json?api_version=2.
- If set to False, image info is parsed from gallery pages visited directly, e.g. https://yande.re/.
- For some websites, such as konachan.com, the API is protected; set this parameter to False there so that the Parser works correctly.
image_num_per_gallery_page (int) –
The number of images displayed on one gallery page.
- When use_api is True, this parameter is used to estimate the total number of JSON pages, since only the total number of gallery pages can be read from a gallery page. Otherwise, it is not used.
- Several predefined constants are provided for this. You can import them from image_crawler_utils.stations.booru, like:
```
from image_crawler_utils.stations.booru import (
    YANDERE_IMAGE_NUM_PER_GALLERY_PAGE,  # yande.re
    KONACHAN_COM_IMAGE_NUM_PER_GALLERY_PAGE,  # konachan.com
    KONACHAN_NET_IMAGE_NUM_PER_GALLERY_PAGE,  # konachan.net
)
```
image_num_per_json (int) –
When use_api is True, this parameter controls how many images each JSON-API page contains.
- Several predefined constants are provided for this. You can import them from image_crawler_utils.stations.booru, like:
```
from image_crawler_utils.stations.booru import (
    YANDERE_IMAGE_NUM_PER_JSON,  # yande.re
    KONACHAN_COM_IMAGE_NUM_PER_JSON,  # konachan.com
    KONACHAN_NET_IMAGE_NUM_PER_JSON,  # konachan.net
)
```
keyword_string (str, None) –
If you want to specify the search keywords directly, set keyword_string to a custom non-empty string. It will OVERRIDE standard_keyword_string.
- For example, setting keyword_string to "kuon_(utawarerumono) rating:safe" in DanbooruKeywordParser searches Danbooru directly with this string; its standard keyword string equivalent is "kuon_(utawarerumono) AND rating:safe".
replace_url_with_source_level (str, must be one of "All", "File", and "None") –
A level controlling whether the Parser will try to download from the source URL of images instead of from the current website.
- It has 3 available levels; the default is “None”:
  - “All” or “all” (NOT SUGGESTED): If the image has a source URL, always try to download from it first.
  - “File” or “file”: Try to download from the source URL first only if it looks like a file (e.g. https://foo.bar/image.png) or belongs to one of several special websites (e.g. a Pixiv page or a Twitter / X status).
  - “None” or “none”: Never try to download from a source URL first.
- Both source URLs and URLs on the current website are stored in the ImageInfo class and are used when downloading. This parameter only controls the priority of the URLs.
- Setting a level other than “None” / “none” reduces the load on the current website's server but may take longer, since source URLs may not be directly accessible or may be entirely unavailable.
use_keyword_include (bool) –
If set to True, the Parser uses a new keyword string whose search results contain all images in the results of the original keyword string. Defaults to False.
- Example: searching “A” returns all results of “A AND B”.
has_cloudflare (bool) – Whether the current website is protected by Cloudflare (e.g. konachan.com). If set to True, a browser window will be opened (MANUAL operations are often needed) to obtain cookies that bypass the protection.
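The image_num_per_gallery_page description says the JSON page count is estimated from the gallery page count. A minimal sketch of that arithmetic follows; the function name is hypothetical and the example numbers (40 per gallery page, 100 per JSON page) are illustrative, not the values of the YANDERE_* or KONACHAN_* constants.

```python
def estimate_json_page_num(gallery_page_num: int, image_num_per_gallery_page: int,
                           image_num_per_json: int) -> int:
    """Upper-bound estimate of JSON-API pages from the gallery page count."""
    if image_num_per_gallery_page <= 0 or image_num_per_json <= 0:
        raise ValueError("per-page counts must be positive")
    # Every gallery page except possibly the last is full, so this may
    # overestimate by up to one gallery page's worth of images.
    max_images = gallery_page_num * image_num_per_gallery_page
    return -(-max_images // image_num_per_json)  # ceiling division
```

With 10 gallery pages of 40 images and 100 images per JSON page, the estimate is 4 JSON pages. Because the last gallery page may be partly empty, the estimate can include a trailing JSON page with no results.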

generate_keyword_string()

Return type: str

generate_keyword_string_include(session=None)

Parameters: session (Session)
Return type: str

get_gallery_page_num(session=None)

Parameters: session (Session)
Return type: int

get_gallery_page_urls()

Return type: list[str]

get_image_info_from_gallery_pages(session=None)

Parameters: session (Session)

get_image_info_from_json(session=None)

Parameters: session (Session)
Return type: list[ImageInfo]

get_json_page_num(session=None)

Parameters: session (Session)
Return type: int

get_json_page_urls()

Return type: list[str]

run()

The main function that runs the Parser and returns a list of image_crawler_utils.ImageInfo.

Return type: list[ImageInfo]

class image_crawler_utils.stations.booru.SafebooruKeywordParser(station_url='https://safebooru.org/', crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, standard_keyword_string=None, keyword_string=None, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''), replace_url_with_source_level='None', use_keyword_include=False)

Bases: KeywordParser

Parameters:

crawler_settings (image_crawler_utils.CrawlerSettings) – The CrawlerSettings used in this Parser.
station_url (str) –
The URL of the main page of a website.
- This parameter is useful when several websites share the same structure. For example, https://yande.re/ and https://konachan.com/ are both built on Moebooru, so this parameter must be set to tell them apart.
- For websites like https://www.pixiv.net/, whose structure no other website uses, this parameter already has a default value and does not need to be set.
standard_keyword_string (str) – Query keyword string using standard syntax. Refer to the documentation for detailed instructions.
cookies (image_crawler_utils.Cookies, list, dict, str, None) –
Cookies used when loading websites.
- Can be one of image_crawler_utils.Cookies, list, dict, str or None.
  - None means no cookies and works the same as Cookies().
  - Leaving this parameter blank is equivalent to None / Cookies().
keyword_string (str, None) –
If you want to specify the search keywords directly, set keyword_string to a custom non-empty string. It will OVERRIDE standard_keyword_string.
- For example, setting keyword_string to “kuon_(utawarerumono) rating:safe” in DanbooruKeywordParser searches Danbooru directly with this string; its standard keyword string equivalent is “kuon_(utawarerumono) AND rating:safe”.
- standard_keyword_string and keyword_string CANNOT both be None or empty (containing only spaces). Otherwise, a critical error will occur!
replace_url_with_source_level (str, must be one of "All", "File", and "None") –
A level controlling whether the Parser will try to download from the source URL of images instead of from the current website.
- It has 3 available levels; the default is “None”:
  - “All” or “all” (NOT SUGGESTED): If the image has a source URL, always try to download from it first.
  - “File” or “file”: Try to download from the source URL first only if it looks like a file (e.g. https://foo.bar/image.png) or belongs to one of several special websites (e.g. a Pixiv page or a Twitter / X status).
  - “None” or “none”: Never try to download from a source URL first.
- Both source URLs and Safebooru URLs are stored in the ImageInfo class and are used when downloading. This parameter only controls the priority of the URLs.
- Setting a level other than “None” / “none” reduces the load on the Safebooru server but may take longer, since source URLs may not be directly accessible or may be entirely unavailable.
use_keyword_include (bool) –
If set to True, the KeywordParser will try to find the keyword / tag subgroups with the fewest keywords / tags (or with no more keywords / tags than a threshold, e.g. 2 in Danbooru for users without an account) whose results contain all of the original search results, and will select the one with the fewest result pages.
- Only works when standard_keyword_string is used. When keyword_string is specified, this parameter is ignored.
- For example, if standard_keyword_string is set to “kuon_(utawarerumono) AND rating:safe OR utawarerumono”, the Parser will check “kuon_(utawarerumono) OR utawarerumono” and “rating:safe OR utawarerumono”, and use the one with fewer result pages as the keyword string in later queries.
- If no subgroup with at most 2 keywords / tags exists (e.g. “kuon_(utawarerumono) OR rating:safe OR utawarerumono”), the Parser will fall back to the subgroups with the fewest keywords / tags. This can often CAUSE ERRORS, so check your keywords before setting this parameter to True.

generate_keyword_string()

Return type: str

generate_keyword_string_include(session=None)

Parameters: session (Session)
Return type: str

get_image_info_from_json(session=None)

Parameters: session (Session)
Return type: list[ImageInfo]

get_json_page_num(session=None)

Parameters: session (Session)
Return type: int

get_json_page_urls()

Return type: list[str]

get_total_image_num(session=None)

Parameters: session (Session)
Return type: int

run()

The main function that runs the Parser and returns a list of image_crawler_utils.ImageInfo.

Return type: list[ImageInfo]
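One way to read the use_keyword_include description (and presumably what generate_keyword_string_include computes, though this is an assumption): write the standard keyword string as an OR of AND-terms, distribute it into OR-clauses, and search with the cheapest clause. Each OR-clause on its own matches every image the full query matches. For “kuon_(utawarerumono) AND rating:safe OR utawarerumono” this yields exactly the two candidates listed in the use_keyword_include description. The sketch assumes the query is already parsed into AND-terms, takes page counts from a caller-supplied function (standing in for real requests), and uses the Danbooru-style limit of 2 tags; none of these names belong to the library.

```python
from itertools import product
from typing import Callable, List


def superset_clauses(dnf_terms: List[List[str]]) -> List[frozenset]:
    """Distribute an OR of AND-terms into an AND of OR-clauses (CNF).

    Each resulting OR-clause, searched on its own, matches every image
    that the original expression matches.
    """
    clauses: List[frozenset] = []
    for combo in product(*dnf_terms):  # pick one tag from each AND-term
        clause = frozenset(combo)
        if clause not in clauses:
            clauses.append(clause)
    return clauses


def pick_include_clause(dnf_terms: List[List[str]],
                        page_num_of: Callable[[frozenset], int],
                        max_tags: int = 2) -> frozenset:
    """Choose the superset clause to search with (hypothetical helper)."""
    clauses = superset_clauses(dnf_terms)
    allowed = [c for c in clauses if len(c) <= max_tags]
    if not allowed:  # fallback: clauses with the fewest tags (may exceed the site limit)
        fewest = min(len(c) for c in clauses)
        allowed = [c for c in clauses if len(c) == fewest]
    return min(allowed, key=page_num_of)  # fewest result pages wins
```

The fallback branch corresponds to the case warned about above: when every clause exceeds the tag limit, the chosen clause may still be rejected by the site.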

image_crawler_utils.stations.booru.filter_keyword_booru(image_info, standard_keyword_string)

A keyword filter for image info from *booru-style websites.

It checks whether the image's tags match the standard_keyword_string query.

Parameters:

image_info (image_crawler_utils.ImageInfo) – The ImageInfo to be checked.
standard_keyword_string (str) – A standard-syntax keyword string.
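A minimal sketch of what such a tag filter might do, assuming the standard syntax consists of tags combined with uppercase AND, OR, NOT and parentheses, with AND binding tighter than OR (consistent with the use_keyword_include example above). tokenize and matches_keyword_query are hypothetical; the real filter_keyword_booru takes an ImageInfo rather than a tag list, and any further syntax it supports (e.g. wildcards) is not covered here.

```python
from typing import Iterable, List, Optional


def tokenize(query: str) -> List[str]:
    """Split a query into tags, AND / OR / NOT, and grouping parentheses.

    Parentheses inside tags such as kuon_(utawarerumono) are kept: a leading
    '(' or trailing ')' is only split off when it is unbalanced within the token.
    """
    tokens: List[str] = []
    for raw in query.split():
        while raw.startswith("(") and raw.count("(") > raw.count(")"):
            tokens.append("(")
            raw = raw[1:]
        closers = 0
        while raw.endswith(")") and raw.count(")") > raw.count("("):
            closers += 1
            raw = raw[:-1]
        if raw:
            tokens.append(raw)
        tokens.extend([")"] * closers)
    return tokens


class _QueryEvaluator:
    """Recursive-descent evaluator: NOT binds tighter than AND, AND tighter than OR."""

    def __init__(self, tokens: List[str], tags: Iterable[str]):
        self.tokens = tokens
        self.pos = 0
        self.tags = set(tags)

    def _peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def _take(self) -> Optional[str]:
        token = self._peek()
        self.pos += 1
        return token

    def parse_or(self) -> bool:
        value = self.parse_and()
        while self._peek() == "OR":
            self._take()
            rhs = self.parse_and()  # always parse, so tokens are consumed
            value = value or rhs
        return value

    def parse_and(self) -> bool:
        value = self.parse_not()
        while self._peek() == "AND":
            self._take()
            rhs = self.parse_not()
            value = value and rhs
        return value

    def parse_not(self) -> bool:
        if self._peek() == "NOT":
            self._take()
            return not self.parse_not()
        return self.parse_atom()

    def parse_atom(self) -> bool:
        token = self._take()
        if token == "(":
            value = self.parse_or()
            if self._take() != ")":
                raise ValueError("missing ')'")
            return value
        if token is None or token in ("AND", "OR", ")"):
            raise ValueError(f"unexpected token: {token!r}")
        return token in self.tags


def matches_keyword_query(tags: Iterable[str], query: str) -> bool:
    """True if `tags` satisfy the standard-syntax `query` (hypothetical helper)."""
    evaluator = _QueryEvaluator(tokenize(query), tags)
    result = evaluator.parse_or()
    if evaluator._peek() is not None:
        raise ValueError(f"unexpected trailing token: {evaluator._peek()!r}")
    return result
```

For example, an image tagged only utawarerumono matches “kuon_(utawarerumono) AND rating:safe OR utawarerumono”, because the AND-term is evaluated first and the OR then succeeds on its right-hand side.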