image_crawler_utils.stations.booru package
- class image_crawler_utils.stations.booru.DanbooruKeywordParser(station_url='https://danbooru.donmai.us/', crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, standard_keyword_string=None, keyword_string=None, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''), replace_url_with_source_level='None', use_keyword_include=False)[source]
Bases: KeywordParser
- Parameters:
crawler_settings (image_crawler_utils.CrawlerSettings) – The CrawlerSettings used in this Parser.
station_url (str) –
The URL of the main page of a website.
This parameter matters when several websites share the same structure. For example, https://yande.re/ and https://konachan.com/ are both built on Moebooru, so this parameter must be set to tell the Parser which site to use.
For websites like https://www.pixiv.net/, whose structure no other website uses, this parameter is already initialized and does not need to be set.
standard_keyword_string (str) – Query keyword string using standard syntax. Refer to the documentation for detailed instructions.
cookies (image_crawler_utils.Cookies, list, dict, str, None) –
Cookies used in loading websites.
keyword_string (str, None) –
If you want to directly specify the keywords used in searching, set keyword_string to a custom non-empty string. It will OVERWRITE standard_keyword_string.
For example, setting keyword_string to "kuon_(utawarerumono) rating:safe" in DanbooruKeywordParser means searching directly with this string in Danbooru; its standard keyword string equivalent is "kuon_(utawarerumono) AND rating:safe".
replace_url_with_source_level (str, must be one of "All", "File", and "None") –
A level controlling whether the Parser will try to download from the source URL of images instead of from the current website.
- It has 3 available levels, and the default is “None”:
“All” or “all” (NOT SUGGESTED): As long as the image has a source URL, try to download from this URL first.
“File” or “file”: If the source URL looks like a file (e.g. https://foo.bar/image.png) or it is one of several special websites (e.g. Pixiv or Twitter / X status), try to download from this URL first.
“None” or “none”: Do not try to download from any source URL first.
Both source URLs and Danbooru URLs are stored in the ImageInfo class and will be used when downloading. This parameter only controls the priority of the URLs.
Setting a level other than “None” / “none” reduces the pressure on the Danbooru server but takes longer (as source URLs may not be directly accessible, or may be unavailable altogether).
use_keyword_include (bool) –
If this parameter is set to True, KeywordParser will try to find the keyword / tag subgroups with the lowest number of keywords / tags (or subgroups whose number of keywords / tags is lower than a threshold, like 2 in Danbooru for those without an account) that contain all search results and have the fewest result pages.
Only works when standard_keyword_string is used. When keyword_string is specified, this parameter is ignored.
For example, if standard_keyword_string is set to “kuon_(utawarerumono) AND rating:safe OR utawarerumono”, then the Parser will check “kuon_(utawarerumono) OR utawarerumono” and “rating:safe OR utawarerumono” and select the group with the fewest result pages as the keyword string in later queries.
If no subgroup with fewer than 2 keywords / tags exists (e.g. “kuon_(utawarerumono) OR rating:safe OR utawarerumono”), the Parser will try to find the keyword / tag subgroups with the fewest keywords / tags. This may often CAUSE ERRORS, so make a quick check of your keywords before setting this parameter to True.
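The subgroup search described for use_keyword_include can be illustrated with a minimal sketch. It is not the library's implementation: it assumes that AND binds tighter than OR, that the query contains no NOT or parentheses, and the helper names (parse_groups, candidate_subgroups, pick_smallest) and the page counts are made up for illustration.

```python
from itertools import product


def parse_groups(standard_keyword_string):
    """Split 'A AND B OR C' into [['A', 'B'], ['C']] (assumes AND binds tighter than OR)."""
    groups = []
    for or_part in standard_keyword_string.split(" OR "):
        tags = [tag.strip() for tag in or_part.split(" AND ") if tag.strip()]
        if tags:
            groups.append(tags)
    return groups


def candidate_subgroups(standard_keyword_string):
    """Pick one tag from every AND-group; each pick gives a broader OR-only query."""
    groups = parse_groups(standard_keyword_string)
    return [" OR ".join(choice) for choice in product(*groups)]


def pick_smallest(candidates, page_counts):
    """Return the candidate with the fewest result pages (counts supplied by the caller)."""
    return min(candidates, key=lambda query: page_counts[query])


query = "kuon_(utawarerumono) AND rating:safe OR utawarerumono"
candidates = candidate_subgroups(query)
# candidates == ["kuon_(utawarerumono) OR utawarerumono", "rating:safe OR utawarerumono"]

# Hypothetical page counts; a real Parser would have to query the website for them.
fake_page_counts = {candidates[0]: 120, candidates[1]: 90000}
best = pick_smallest(candidates, fake_page_counts)
```

Every candidate is a superset of the original query, which is why the results still have to be filtered against the original standard_keyword_string afterwards.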
- run()[source]
The main function that runs the Parser and returns a list of image_crawler_utils.ImageInfo.
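The three levels of replace_url_with_source_level can be read as a rule for ordering two URLs. The sketch below is only an illustration of that rule: the extension list, the set of "special" hosts, and the function names are assumptions, and the checks the library really performs may differ.

```python
import posixpath
from urllib.parse import urlparse

# Assumed heuristics, chosen for illustration only.
FILE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}
SPECIAL_HOSTS = {"www.pixiv.net", "twitter.com", "x.com"}


def source_goes_first(source_url, level):
    """Return True if the source URL should be tried before the station URL."""
    level = level.lower()  # "All" / "all", "File" / "file", "None" / "none"
    if level not in {"all", "file", "none"}:
        raise ValueError("level must be one of 'All', 'File' and 'None'")
    if not source_url or level == "none":
        return False
    if level == "all":
        return True
    parsed = urlparse(source_url)
    extension = posixpath.splitext(parsed.path)[1].lower()
    return extension in FILE_EXTENSIONS or parsed.hostname in SPECIAL_HOSTS


def order_urls(station_url, source_url, level="None"):
    """Both URLs are kept; the level only decides which one is tried first."""
    if source_goes_first(source_url, level):
        return [source_url, station_url]
    return [station_url] + ([source_url] if source_url else [])
```

For instance, order_urls("https://danbooru.donmai.us/a.png", "https://foo.bar/image.png", "File") puts the source URL first, while the same call with "None" keeps the station URL first.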
- class image_crawler_utils.stations.booru.GelbooruKeywordParser(station_url='https://gelbooru.com/', crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, standard_keyword_string=None, keyword_string=None, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''), use_api=False, replace_url_with_source_level='None', use_keyword_include=False, api_key=None, user_id=None)[source]
Bases: KeywordParser
- Parameters:
crawler_settings (image_crawler_utils.CrawlerSettings) – The CrawlerSettings used in this Parser.
station_url (str) –
The URL of the main page of a website.
This parameter matters when several websites share the same structure. For example, https://yande.re/ and https://konachan.com/ are both built on Moebooru, so this parameter must be set to tell the Parser which site to use.
For websites like https://www.pixiv.net/, whose structure no other website uses, this parameter is already initialized and does not need to be set.
standard_keyword_string (str) – Query keyword string using standard syntax. Refer to the documentation for detailed instructions.
cookies (image_crawler_utils.Cookies, list, dict, str, None) –
Cookies used in loading websites.
keyword_string (str, None) –
If you want to directly specify the keywords used in searching, set keyword_string to a custom non-empty string. It will OVERWRITE standard_keyword_string.
For example, setting keyword_string to "kuon_(utawarerumono) rating:safe" in DanbooruKeywordParser means searching directly with this string in Danbooru; its standard keyword string equivalent is "kuon_(utawarerumono) AND rating:safe".
use_api (bool) –
Use the Gelbooru API page, like https://gelbooru.com/index.php?page=dapi&s=post&q=index&json=1&api_key=*********&user_id=*********.
Setting it to False will parse image info from directly visited gallery pages, like https://yande.re/.
For some websites like konachan.com, the API is protected, and you need to set this parameter to False to ensure that the Parser works correctly.
replace_url_with_source_level (str, must be one of "All", "File", and "None") –
A level controlling whether the Parser will try to download from the source URL of images instead of from the current website.
- It has 3 available levels, and the default is “None”:
“All” or “all” (NOT SUGGESTED): As long as the image has a source URL, try to download from this URL first.
“File” or “file”: If the source URL looks like a file (e.g. https://foo.bar/image.png) or it is one of several special websites (e.g. Pixiv or Twitter / X status), try to download from this URL first.
“None” or “none”: Do not try to download from any source URL first.
Both source URLs and Danbooru URLs are stored in the ImageInfo class and will be used when downloading. This parameter only controls the priority of the URLs.
Setting a level other than “None” / “none” reduces the pressure on the Danbooru server but takes longer (as source URLs may not be directly accessible, or may be unavailable altogether).
use_keyword_include (bool) –
If this parameter is set to True, KeywordParser will try to find the keyword / tag subgroups with the lowest number of keywords / tags (or subgroups whose number of keywords / tags is lower than a threshold, like 2 in Danbooru for those without an account) that contain all search results and have the fewest result pages.
Only works when standard_keyword_string is used. When keyword_string is specified, this parameter is ignored.
For example, if standard_keyword_string is set to “kuon_(utawarerumono) AND rating:safe OR utawarerumono”, then the Parser will check “kuon_(utawarerumono) OR utawarerumono” and “rating:safe OR utawarerumono” and select the group with the fewest result pages as the keyword string in later queries.
If no subgroup with fewer than 2 keywords / tags exists (e.g. “kuon_(utawarerumono) OR rating:safe OR utawarerumono”), the Parser will try to find the keyword / tag subgroups with the fewest keywords / tags. This may often CAUSE ERRORS, so make a quick check of your keywords before setting this parameter to True.
api_key (str) – The api_key used to access JSON-API. Can be acquired after logging in at https://gelbooru.com/index.php?page=account&s=options.
user_id (str) – The user_id used to access JSON-API. Can be acquired after logging in at https://gelbooru.com/index.php?page=account&s=options.
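To show how api_key and user_id fit into the API page URL quoted under use_api, here is a small sketch that assembles such a URL with the standard library. The function name is hypothetical, and the tags query parameter is taken from Gelbooru's public API rather than from this page, so treat it as an assumption; how the Parser builds its requests internally is not documented here.

```python
from urllib.parse import urlencode


def build_gelbooru_api_url(tags=None, api_key=None, user_id=None,
                           station_url="https://gelbooru.com/"):
    """Assemble a JSON-API URL shaped like the example in the use_api description."""
    params = {"page": "dapi", "s": "post", "q": "index", "json": 1}
    if tags:
        params["tags"] = tags  # assumed search parameter of the Gelbooru API
    if api_key is not None:
        params["api_key"] = api_key
    if user_id is not None:
        params["user_id"] = user_id
    # urlencode percent-encodes values, so tags with parentheses or colons are safe.
    return station_url.rstrip("/") + "/index.php?" + urlencode(params)


# Placeholder credentials; real ones come from the account options page.
url = build_gelbooru_api_url(api_key="YOUR_API_KEY", user_id="YOUR_USER_ID")
```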
- run()[source]
The main function that runs the Parser and returns a list of image_crawler_utils.ImageInfo.
- class image_crawler_utils.stations.booru.MoebooruKeywordParser(station_url, crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, standard_keyword_string=None, keyword_string=None, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''), use_api=True, image_num_per_gallery_page=1, image_num_per_json=10, replace_url_with_source_level='None', use_keyword_include=False, has_cloudflare=False)[source]
Bases: KeywordParser
- Parameters:
crawler_settings (image_crawler_utils.CrawlerSettings) – The CrawlerSettings used in this Parser.
station_url (str) –
The URL of the main page of a website.
This parameter matters when several websites share the same structure. For example, https://yande.re/ and https://konachan.com/ are both built on Moebooru, so this parameter must be set to tell the Parser which site to use.
For websites like https://www.pixiv.net/, whose structure no other website uses, this parameter is already initialized and does not need to be set.
standard_keyword_string (str) – Query keyword string using standard syntax. Refer to the documentation for detailed instructions.
cookies (image_crawler_utils.Cookies, list, dict, str, None) –
Cookies used in loading websites.
use_api (bool) –
Use the Moebooru API page, like https://yande.re/post.json?api_version=2.
Setting it to False will parse image info from directly visited gallery pages, like https://yande.re/.
For some websites like konachan.com, the API is protected, and you need to set this parameter to False to ensure that the Parser works correctly.
image_num_per_gallery_page (int) –
The number of images displayed on a gallery page.
When use_api is set to True, this parameter is used to estimate the total number of JSON pages (as only the total number of gallery pages can be read from a gallery page). Otherwise it is not used.
Several predefined constants are provided for this. You can import them from image_crawler_utils.stations.booru, like:
from image_crawler_utils.stations.booru import (
    YANDERE_IMAGE_NUM_PER_GALLERY_PAGE,  # yande.re
    KONACHAN_COM_IMAGE_NUM_PER_GALLERY_PAGE,  # konachan.com
    KONACHAN_NET_IMAGE_NUM_PER_GALLERY_PAGE,  # konachan.net
)
image_num_per_json (int) –
When use_api is set to True, this parameter controls how many images are displayed on a JSON-API page.
Several predefined constants are provided for this. You can import them from image_crawler_utils.stations.booru, like:
from image_crawler_utils.stations.booru import (
    YANDERE_IMAGE_NUM_PER_JSON,  # yande.re
    KONACHAN_NET_IMAGE_NUM_PER_JSON,  # konachan.net
    KONACHAN_COM_IMAGE_NUM_PER_JSON,  # konachan.com
)
keyword_string (str, None) –
If you want to directly specify the keywords used in searching, set keyword_string to a custom non-empty string. It will OVERWRITE standard_keyword_string.
For example, setting keyword_string to "kuon_(utawarerumono) rating:safe" in DanbooruKeywordParser means searching directly with this string in Danbooru; its standard keyword string equivalent is "kuon_(utawarerumono) AND rating:safe".
replace_url_with_source_level (str, must be one of "All", "File", and "None") –
A level controlling whether the Parser will try to download from the source URL of images instead of from the current website.
- It has 3 available levels, and the default is “None”:
“All” or “all” (NOT SUGGESTED): As long as the image has a source URL, try to download from this URL first.
“File” or “file”: If the source URL looks like a file (e.g. https://foo.bar/image.png) or it is one of several special websites (e.g. Pixiv or Twitter / X status), try to download from this URL first.
“None” or “none”: Do not try to download from any source URL first.
Both source URLs and Danbooru URLs are stored in the ImageInfo class and will be used when downloading. This parameter only controls the priority of the URLs.
Setting a level other than “None” / “none” reduces the pressure on the Danbooru server but takes longer (as source URLs may not be directly accessible, or may be unavailable altogether).
use_keyword_include (bool) –
Use a new keyword string whose search results contain all images that belong to the results of the original keyword string. Defaults to False.
Example: searching “A” returns all results of “A AND B”.
has_cloudflare (bool) – Whether the current website is protected by Cloudflare. Set to True if the site is protected by Cloudflare (e.g. konachan.com). A browser window will be opened (and MANUAL operations will often be needed) to get cookies in order to bypass it.
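The way image_num_per_gallery_page and image_num_per_json combine into a JSON page estimate can be sketched as simple arithmetic. This is an assumed reading of the description above, not the library's code; the function name and the numbers in the usage lines are made up.

```python
import math


def estimate_json_page_num(gallery_page_num, image_num_per_gallery_page, image_num_per_json):
    """Estimate how many JSON-API pages cover the images shown on the gallery pages."""
    # Upper bound on the image count: the last gallery page may be only partly filled.
    total_images = gallery_page_num * image_num_per_gallery_page
    return math.ceil(total_images / image_num_per_json)


# Made-up numbers: 5 gallery pages of 40 images each, 100 images per JSON page.
json_pages = estimate_json_page_num(5, 40, 100)  # 200 images -> 2 JSON pages
```

Because the estimate is an upper bound, the last estimated JSON page may turn out to be empty or partly filled.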
- run()[source]
The main function that runs the Parser and returns a list of image_crawler_utils.ImageInfo.
- class image_crawler_utils.stations.booru.SafebooruKeywordParser(station_url='https://safebooru.org/', crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, standard_keyword_string=None, keyword_string=None, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''), replace_url_with_source_level='None', use_keyword_include=False)[source]
Bases: KeywordParser
- Parameters:
crawler_settings (image_crawler_utils.CrawlerSettings) – The CrawlerSettings used in this Parser.
station_url (str) –
The URL of the main page of a website.
This parameter matters when several websites share the same structure. For example, https://yande.re/ and https://konachan.com/ are both built on Moebooru, so this parameter must be set to tell the Parser which site to use.
For websites like https://www.pixiv.net/, whose structure no other website uses, this parameter is already initialized and does not need to be set.
standard_keyword_string (str) – Query keyword string using standard syntax. Refer to the documentation for detailed instructions.
cookies (image_crawler_utils.Cookies, list, dict, str, None) –
Cookies used in loading websites.
keyword_string (str, None) –
If you want to directly specify the keywords used in searching, set keyword_string to a custom non-empty string. It will OVERWRITE standard_keyword_string.
For example, setting keyword_string to “kuon_(utawarerumono) rating:safe” in DanbooruKeywordParser means searching directly with this string in Danbooru; its standard keyword string equivalent is “kuon_(utawarerumono) AND rating:safe”.
standard_keyword_string and keyword_string CANNOT both be None or empty (containing only spaces) at the same time. Otherwise, a critical error will occur!
replace_url_with_source_level (str, must be one of "All", "File", and "None") –
A level controlling whether the Parser will try to download from the source URL of images instead of from the current website.
- It has 3 available levels, and the default is “None”:
“All” or “all” (NOT SUGGESTED): As long as the image has a source URL, try to download from this URL first.
“File” or “file”: If the source URL looks like a file (e.g. https://foo.bar/image.png) or it is one of several special websites (e.g. Pixiv or Twitter / X status), try to download from this URL first.
“None” or “none”: Do not try to download from any source URL first.
Both source URLs and Danbooru URLs are stored in the ImageInfo class and will be used when downloading. This parameter only controls the priority of the URLs.
Setting a level other than “None” / “none” reduces the pressure on the Danbooru server but takes longer (as source URLs may not be directly accessible, or may be unavailable altogether).
use_keyword_include (bool) –
If this parameter is set to True, KeywordParser will try to find the keyword / tag subgroups with the lowest number of keywords / tags (or subgroups whose number of keywords / tags is lower than a threshold, like 2 in Danbooru for those without an account) that contain all search results and have the fewest result pages.
Only works when standard_keyword_string is used. When keyword_string is specified, this parameter is ignored.
For example, if standard_keyword_string is set to “kuon_(utawarerumono) AND rating:safe OR utawarerumono”, then the Parser will check “kuon_(utawarerumono) OR utawarerumono” and “rating:safe OR utawarerumono” and select the group with the fewest result pages as the keyword string in later queries.
If no subgroup with fewer than 2 keywords / tags exists (e.g. “kuon_(utawarerumono) OR rating:safe OR utawarerumono”), the Parser will try to find the keyword / tag subgroups with the fewest keywords / tags. This may often CAUSE ERRORS, so make a quick check of your keywords before setting this parameter to True.
- run()[source]
The main function that runs the Parser and returns a list of image_crawler_utils.ImageInfo.
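The precedence between keyword_string and standard_keyword_string, including the rule that they cannot both be None or blank, can be summarized in a short sketch. The function resolve_query is hypothetical, and raising ValueError is an assumption; the documentation only says that a critical error will happen.

```python
def is_blank(value):
    """True for None and for strings that contain only whitespace."""
    return value is None or value.strip() == ""


def resolve_query(standard_keyword_string=None, keyword_string=None):
    """Return which string is used for searching, following the documented precedence."""
    if not is_blank(keyword_string):
        # A non-empty keyword_string overwrites standard_keyword_string.
        return ("keyword_string", keyword_string)
    if not is_blank(standard_keyword_string):
        return ("standard_keyword_string", standard_keyword_string)
    # Assumed error type; the real Parser reports a critical error in its own way.
    raise ValueError("standard_keyword_string and keyword_string cannot both be empty")


chosen = resolve_query(
    standard_keyword_string="kuon_(utawarerumono) AND rating:safe",
    keyword_string="kuon_(utawarerumono) rating:safe",
)
```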
- image_crawler_utils.stations.booru.filter_keyword_booru(image_info, standard_keyword_string)[source]
A keyword filter for xxxbooru-style image info.
It checks whether the tags of the image match the standard_keyword_string query.
- Parameters:
image_info (image_crawler_utils.ImageInfo) – list of ImageInfo
standard_keyword_string (str) – A standard-syntax keyword string.
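As a rough picture of what matching tags against a standard-syntax query involves, the sketch below checks a plain list of tags against a query made only of AND and OR. It is not filter_keyword_booru itself: it works on a tag list instead of an ImageInfo, assumes AND binds tighter than OR, ignores NOT, parentheses and wildcards, and the name matches_query is made up.

```python
def matches_query(tags, standard_keyword_string):
    """Return True if the tags satisfy at least one AND-group of the query."""
    tag_set = set(tags)
    for or_part in standard_keyword_string.split(" OR "):
        required = [tag.strip() for tag in or_part.split(" AND ") if tag.strip()]
        # One fully satisfied AND-group is enough for the whole query to match.
        if required and all(tag in tag_set for tag in required):
            return True
    return False


query = "kuon_(utawarerumono) AND rating:safe OR utawarerumono"
both_tags = matches_query(["kuon_(utawarerumono)", "rating:safe"], query)
one_tag = matches_query(["kuon_(utawarerumono)"], query)
```

A filter of this kind is what makes broader queries usable: images fetched with a superset query can be narrowed back down to the original standard_keyword_string.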