image_crawler_utils.stations.pixiv package
- class image_crawler_utils.stations.pixiv.PixivKeywordParser(station_url='https://www.pixiv.net/', crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, standard_keyword_string=None, keyword_string=None, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''), pixiv_search_settings=PixivSearchSettings(age_rating='all', order='newest', target_illust=True, target_manga=True, target_ugoira=True, tags_match_type='partial', display_ai=True, width_lowest=None, width_highest=None, height_lowest=None, height_highest=None, ratio=None, creation_tool='all', starting_date='', ending_date=''), use_keyword_include=False, quick_mode=False, info_page_batch_num=100, info_page_batch_delay=300)[source]
Bases: KeywordParser
- Parameters:
crawler_settings (image_crawler_utils.CrawlerSettings) – The CrawlerSettings used in this Parser.
station_url (str) –
The URL of the main page of a website.
This parameter matters when several websites share the same structure. For example, https://yande.re/ and https://konachan.com/ are both built on Moebooru, so this parameter must be set to tell the two sites apart.
For websites like https://www.pixiv.net/, as no other website uses its structure, this parameter has already been initialized and does not need to be filled.
standard_keyword_string (str) – Query keyword string using standard syntax. Refer to the documentation for detailed instructions.
pixiv_search_settings (image_crawler_utils.stations.pixiv.PixivSearchSettings) – A PixivSearchSettings class that contains extra options when searching.
keyword_string (str, None) –
If you want to directly specify the keywords used in searching, set keyword_string to a custom non-empty string. It will OVERWRITE standard_keyword_string.
For example, setting keyword_string to "kuon_(utawarerumono) rating:safe" in DanbooruKeywordParser means searching Danbooru directly with this string; its standard keyword string equivalent is "kuon_(utawarerumono) AND rating:safe".
use_keyword_include (bool) –
Search with a new keyword string whose results contain every image that belongs to the results of the original keyword string. Defaults to False.
Example: searching “A” contains all results of “A and B”.
cookies (image_crawler_utils.Cookies, str, dict, list, None) – Cookies containing login information.
thread_delay (float, Callable, None) – As Pixiv restricts the number of requests in a certain period, this argument defines the delay time (in seconds) before every downloading thread.
quick_mode (bool) –
Only collect the basic information.
Pixiv has a strict anti-crawling restriction on acquiring the pages containing information of images. Setting this parameter to True will skip requesting these pages and collect only the basic information of images needed for downloading.
- Different Parsers may have different structures of image information. Refer to the [ImageInfo Structure](#imageinfo-structure-4) chapter for the difference between results.
If set to False (get full information), then the thread_delay used when downloading information pages will be forced to be no lower than CrawlerSettings.download_config.thread_num * 1.0. Other pages are not affected.
info_page_batch_num (int) – After downloading info_page_batch_num image information pages, the crawler will wait for info_page_batch_delay seconds before continuing.
info_page_batch_delay (float, None) –
After downloading info_page_batch_num image information pages, the crawler will wait for info_page_batch_delay seconds before continuing.
If quick_mode is set to True, both info_page_batch_num and info_page_batch_delay will be ignored.
If you are not sure, leaving both info_page_batch_num and info_page_batch_delay blank (using their default values) is likely enough to prevent your account from being suspended.
info_page_batch_delay can be a function that will be called for every usage.
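The batching rule described for info_page_batch_num and info_page_batch_delay can be pictured with a small simulation. This is only a sketch of the documented behaviour, not the library's implementation: plan_batch_pauses and resolve_delay are hypothetical names, the function computes pause lengths without actually sleeping, and it assumes that None means "no pause" and that no pause follows the final page.

```python
from typing import Callable, List, Union

# A delay may be a number, a zero-argument function, or None (assumed: no pause).
Delay = Union[float, Callable[[], float], None]


def resolve_delay(delay: Delay) -> float:
    """Return the pause in seconds; a callable is evaluated on every use."""
    if delay is None:
        return 0.0
    return float(delay() if callable(delay) else delay)


def plan_batch_pauses(page_count: int, batch_num: int, batch_delay: Delay,
                      quick_mode: bool = False) -> List[float]:
    """List the pause taken after each information page (0.0 means none)."""
    # quick_mode requests no information pages, so both options are ignored.
    if quick_mode:
        return []
    pauses = []
    for index in range(1, page_count + 1):
        # Assumption: pause only between batches, not after the last page.
        is_batch_end = index % batch_num == 0 and index < page_count
        pauses.append(resolve_delay(batch_delay) if is_batch_end else 0.0)
    return pauses
```

With the default values (100 pages per batch, 300 seconds of delay), this model pauses once after every hundredth information page.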
- run()[source]
The main function that runs the Parser and returns a list of
image_crawler_utils.ImageInfo.
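A minimal usage sketch, built only from the constructor signature and run() documented above. The query "A AND B" is a placeholder rather than a real tag pair, build_parser_kwargs and run_pixiv_search are hypothetical helper names, and a real search will likely also need valid cookies passed through the cookies parameter.

```python
def build_parser_kwargs() -> dict:
    """Keyword arguments taken from the documented PixivKeywordParser signature."""
    return {
        "standard_keyword_string": "A AND B",  # placeholder query
        "quick_mode": True,                    # skip the rate-limited information pages
        "info_page_batch_num": 100,            # documented default
        "info_page_batch_delay": 300,          # documented default
    }


def run_pixiv_search():
    # Imported lazily so this snippet loads even where the library is not installed.
    from image_crawler_utils.stations.pixiv import PixivKeywordParser

    parser = PixivKeywordParser(**build_parser_kwargs())
    return parser.run()  # documented to return a list of ImageInfo
```

Because quick_mode is True here, the two batch options are ignored according to the parameter description; they are listed only to show where they would go.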
- class image_crawler_utils.stations.pixiv.PixivSearchSettings(age_rating='all', order='newest', target_illust=True, target_manga=True, target_ugoira=True, tags_match_type='partial', display_ai=True, width_lowest=None, width_highest=None, height_lowest=None, height_highest=None, ratio=None, creation_tool='all', starting_date='', ending_date='')[source]
Bases: object
Search settings for Pixiv.
- Parameters:
age_rating (str)
order (str)
target_illust (bool)
target_manga (bool)
target_ugoira (bool)
tags_match_type (str)
display_ai (bool)
width_lowest (int | None)
width_highest (int | None)
height_lowest (int | None)
height_highest (int | None)
ratio (float | None)
creation_tool (str)
starting_date (str)
ending_date (str)
- build_search_appending_str_json(keyword_string)[source]
Builds the search suffix to append for the Ajax API.
- Parameters:
keyword_string (str) – the constructed keyword string for Pixiv.
- build_search_appending_str_website(keyword_string)[source]
Builds the search suffix to append for the website.
- Parameters:
keyword_string (str) – the constructed keyword string for Pixiv.
- creation_tool: str = 'all'
Creation tool of images. Default is “all”.
Can be one of these strings:
'all' 'sai' 'photoshop' 'clip studio paint' 'illuststudio' 'comicstudio' 'pixia' 'azpainter2' 'painter' 'illustrator' 'gimp' 'firealpaca' 'oekaki bbs' 'azpainter' 'cgillust' 'oekaki chat' 'tegaki blog' 'ms_paint' 'pictbear' 'opencanvas' 'paintshoppro' 'edge' 'drawr' 'comicworks' 'azdrawing' 'sketchbookpro' 'photostudio' 'paintgraphic' 'medibang paint' 'nekopaint' 'inkscape' 'artrage' 'azdrawing2' 'fireworks' 'ibispaint' 'aftereffects' 'mdiapp' 'graphicsgale' 'krita' 'kokuban.in' 'retas studio' 'e-mote' '4thpaint' 'comilabo' 'pixiv sketch' 'pixelmator' 'procreate' 'expression' 'picturepublisher' 'processing' 'live2d' 'dotpict' 'aseprite' 'pastela' 'poser' 'metasequoia' 'blender' 'shade' '3dsmax' 'daz studio' 'zbrush' 'comi po!' 'maya' 'lightwave3d' 'hexagon king' 'vue' 'sketchup' 'cinema4d' 'xsi' 'carrara' 'bryce' 'strata' 'sculptris' 'modo' 'animationmaster' 'vistapro' 'sunny3d' '3d-coat' 'paint 3d' 'vroid studio' 'mechanical pencil' 'pencil' 'ballpoint pen' 'thin marker' 'colored pencil' 'copic marker' 'dip pen' 'watercolors' 'brush' 'calligraphy pen' 'felt-tip pen' 'magic marker' 'watercolor brush' 'paint' 'acrylic paint' 'fountain pen' 'pastels' 'airbrush' 'color ink' 'crayon' 'oil paint' 'coupy pencil' 'gansai' 'pastel crayons'
- ending_date: str = ''
Search images uploaded before this date. MUST be “YYYY-MM-DD”, “YYYY.MM.DD” or “YYYY/MM/DD” format.
- height_highest: int | None = None
Highest height (in pixels) of images. Default is None (no restrictions).
- height_lowest: int | None = None
Lowest height (in pixels) of images. Default is None (no restrictions).
- ratio: float | None = None
Ratio of images. Default is None (no restrictions).
Setting it to 0 selects only square images.
Setting it to a positive value selects horizontal images. For example, ratio=0.5 selects images with width / height >= 1 + 0.5 = 1.5
Setting it to a negative value selects vertical images. For example, ratio=-0.5 selects images with height / width >= 1 + 0.5 = 1.5
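The three cases of ratio can be written as a small predicate. This is a sketch of the rule as stated here, not the filter Pixiv or the library actually runs; matches_ratio is a hypothetical name.

```python
from typing import Optional


def matches_ratio(width: int, height: int, ratio: Optional[float]) -> bool:
    """Return True if an image of the given size passes the ratio setting."""
    if ratio is None:
        return True                       # no restriction
    if ratio == 0:
        return width == height            # square images only
    if ratio > 0:
        return width / height >= 1 + ratio        # horizontal images
    return height / width >= 1 + abs(ratio)       # vertical images
```

For instance, a 1500x1000 image passes ratio=0.5 because 1500 / 1000 equals exactly 1.5, while a 1400x1000 image does not.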
- starting_date: str = ''
Search images uploaded after this date. MUST be “YYYY-MM-DD”, “YYYY.MM.DD” or “YYYY/MM/DD” format.
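Before passing a value to starting_date or ending_date, it may help to check it against the three accepted formats. The helper below is a hypothetical pre-check, not part of the library, and it assumes that the empty-string default means "no date bound".

```python
from datetime import date, datetime
from typing import Optional

# The three formats named in the attribute descriptions.
ACCEPTED_FORMATS = ("%Y-%m-%d", "%Y.%m.%d", "%Y/%m/%d")


def parse_search_date(text: str) -> Optional[date]:
    """Parse a search date, or return None for the empty default."""
    if text == "":
        return None  # assumption: '' means no restriction
    for fmt in ACCEPTED_FORMATS:
        try:
            # Note: strptime also accepts unpadded months and days such as "2024-1-5".
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"date must be YYYY-MM-DD, YYYY.MM.DD or YYYY/MM/DD, got {text!r}")
```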
- tags_match_type: str = 'partial'
Matching type of tags. MUST be selected from “partial”, “perfect”, “title_caption”.
“partial”: Partially matched tags are accepted.
“perfect”: Tags must be perfectly matched.
“title_caption”: Searched keywords will be matched with titles and captions.
- target_ugoira: bool = True
Whether to include ugoiras (animations) in results.
Cannot set target_illust, target_manga and target_ugoira to False at the same time.
Cannot set only one of target_illust and target_ugoira to False with the rest set to True at the same time.
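Read literally, the two restrictions above rule out three combinations. The check below is a sketch of that reading only; check_targets is a hypothetical name, and the library may validate these flags differently.

```python
def check_targets(target_illust: bool, target_manga: bool, target_ugoira: bool) -> None:
    """Raise ValueError for the combinations described as not allowed."""
    if not (target_illust or target_manga or target_ugoira):
        raise ValueError("at least one target must be True")
    # Exactly one of illust/ugoira is False while everything else is True.
    if target_manga and target_illust != target_ugoira:
        raise ValueError(
            "cannot disable only one of target_illust and target_ugoira "
            "while the other targets stay True"
        )
```

Under this reading, searching manga only (False, True, False) or illustrations and ugoira only (True, False, True) is accepted.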
- class image_crawler_utils.stations.pixiv.PixivUserParser(member_id, station_url='https://www.pixiv.net/', crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''), thread_delay=0, quick_mode=False, info_page_batch_num=100, info_page_batch_delay=300)[source]
Bases: Parser
- Parameters:
crawler_settings (image_crawler_utils.CrawlerSettings) – The CrawlerSettings used in this Parser.
member_id (str) – Pixiv ID of the user.
station_url (str) –
The URL of the main page of a website.
This parameter matters when several websites share the same structure. For example, https://yande.re/ and https://konachan.com/ are both built on Moebooru, so this parameter must be set to tell the two sites apart.
For websites like https://www.pixiv.net/, as no other website uses its structure, this parameter has already been initialized and does not need to be filled.
use_keyword_include (bool) –
Search with a new keyword string whose results contain every image that belongs to the results of the original keyword string. Defaults to False.
Example: searching “A” contains all results of “A and B”.
cookies (image_crawler_utils.Cookies, str, dict, list, None) – Cookies containing login information.
quick_mode (bool) – DO NOT DOWNLOAD any image info. Will increase speed of downloading.
info_page_batch_num (int) – Batch size of images. After each batch finishes downloading, the crawler will wait for a rather long time.
info_page_batch_delay (float, None) – Delay time after each batch of images is downloaded.
- run()[source]
The main function that runs the Parser and returns a list of
image_crawler_utils.ImageInfo.
- image_crawler_utils.stations.pixiv.filter_keyword_pixiv(image_info, standard_keyword_string)[source]
A keyword filter for xxxbooru-style image info.
The “tags” should be accessed by info[“tags”].
- Parameters:
image_info (image_crawler_utils.ImageInfo) – list of ImageInfo
standard_keyword_string (str) – A standard-syntax keyword string.
- image_crawler_utils.stations.pixiv.get_pixiv_cookies(pixiv_id=None, password=None, proxies=None, timeout=30.0, headless=False, waiting_seconds=60.0, log=<image_crawler_utils.log.Log object>)[source]
Manually get cookies by logging in to Pixiv.
- Parameters:
pixiv_id (str, None) – Your Pixiv ID or mail address. Leave it as None to enter it manually.
password (str, None) – Your Pixiv password. Leave it as None to enter it manually.
proxies (dict, None) –
The proxies used in nodriver browser.
The pattern should be in a requests-acceptable form like:
HTTP type: {'http': '127.0.0.1:7890'}
HTTPS type: {'https': '127.0.0.1:7890'}, or {'https': '127.0.0.1:7890', 'http': '127.0.0.1:7890'}
SOCKS type: {'https': 'socks5://127.0.0.1:7890'}
timeout (float, None) – Timeout (seconds) for waiting elements. Default is 30.
headless (bool, None) – Use headless mode. Default is False.
waiting_seconds (float, None) – In headless mode, if the next step cannot be loaded in waiting_seconds, then an error will be raised. Default is 60.
log (image_crawler_utils.log.Log, None) – Logging config.
- Returns:
An image_crawler_utils.Cookies class.
- Return type:
Cookies | None
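The proxy forms listed for the proxies parameter can be produced from a single address. In this sketch, make_proxies and login_through_proxy are hypothetical helpers; only get_pixiv_cookies and its proxies argument come from the signature above, and the address 127.0.0.1:7890 is the example value used in the parameter description.

```python
from typing import Dict, Iterable


def make_proxies(address: str, schemes: Iterable[str] = ("https", "http"),
                 socks: bool = False) -> Dict[str, str]:
    """Build a requests-style proxies dict for the given schemes."""
    target = f"socks5://{address}" if socks else address
    return {scheme: target for scheme in schemes}


def login_through_proxy(address: str):
    # Imported lazily so this snippet loads even where the library is not installed.
    from image_crawler_utils.stations.pixiv import get_pixiv_cookies

    # pixiv_id and password are left as None, so they are entered manually.
    return get_pixiv_cookies(proxies=make_proxies(address))
```

The Return type above includes None, so the result of login_through_proxy should be checked before it is passed on as cookies.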