image_crawler_utils.stations.pixiv package

class image_crawler_utils.stations.pixiv.PixivKeywordParser(station_url='https://www.pixiv.net/', crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, standard_keyword_string=None, keyword_string=None, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''), pixiv_search_settings=PixivSearchSettings(age_rating='all', order='newest', target_illust=True, target_manga=True, target_ugoira=True, tags_match_type='partial', display_ai=True, width_lowest=None, width_highest=None, height_lowest=None, height_highest=None, ratio=None, creation_tool='all', starting_date='', ending_date=''), use_keyword_include=False, quick_mode=False, info_page_batch_num=100, info_page_batch_delay=300)[source]

Bases: KeywordParser

Parameters:
  • crawler_settings (image_crawler_utils.CrawlerSettings) – The CrawlerSettings used in this Parser.

  • station_url (str) –

    The URL of the main page of a website.

    • This parameter matters when several websites share the same structure. For example, https://yande.re/ and https://konachan.com/ are both built on Moebooru, so this parameter must be set to tell the parser which of these sites to use.

    • For websites like https://www.pixiv.net/, whose structure no other website shares, this parameter is already initialized and does not need to be set.

  • standard_keyword_string (str) – Query keyword string using standard syntax. Refer to the documentation for detailed instructions.

  • pixiv_search_settings (image_crawler_utils.stations.pixiv.PixivSearchSettings) – A PixivSearchSettings class that contains extra options when searching.

  • keyword_string (str, None) –

    To specify the search keywords directly, set keyword_string to a custom non-empty string. It will OVERRIDE standard_keyword_string.

    • For example, setting keyword_string to "kuon_(utawarerumono) rating:safe" in DanbooruKeywordParser searches Danbooru directly with this string; its standard keyword string equivalent is "kuon_(utawarerumono) AND rating:safe".

  • use_keyword_include (bool) –

    Search with a new, broader keyword string whose results contain every image matched by the original keyword string. Defaults to False.

    • Example: the results of searching “A” contain all results of “A AND B”.

  • cookies (image_crawler_utils.Cookies, str, dict, list, None) – Cookies containing login information.

  • thread_delay (float, Callable, None) – As Pixiv restricts the number of requests within a certain period, this argument sets the delay (in seconds) before each thread that downloads pages from the website.

  • quick_mode (bool) –

    Only collect the basic information.

    • Pixiv applies strict anti-crawling restrictions to the pages that contain image information. Setting this parameter to True skips these pages and collects only the basic image information needed for downloading.

    • Different Parsers may structure image information differently. Refer to the [ImageInfo Structure](#imageinfo-structure-4) chapter for the differences between results.

    • If set to False (get full information), the thread_delay used when downloading information pages is forced to be no lower than CrawlerSettings.download_config.thread_num * 1.0. Other pages are not affected.

  • info_page_batch_num (int) – After downloading info_page_batch_num image information pages, the crawler waits for info_page_batch_delay seconds before continuing.

  • info_page_batch_delay (float, None) –

    After downloading info_page_batch_num image information pages, the crawler waits for info_page_batch_delay seconds before continuing.

    • If quick_mode is set to True, both info_page_batch_num and info_page_batch_delay will be ignored.

    • If you are not sure, leaving both info_page_batch_num and info_page_batch_delay at their default values is likely enough to keep your account from being suspended.

    • info_page_batch_delay can be a function, which is called each time the delay is needed.
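
The batching rule described for info_page_batch_num and info_page_batch_delay can be sketched as follows. This is a minimal illustration and not the library's implementation: plan_batch_pauses is a hypothetical helper, and it assumes that a callable delay takes no arguments and that no pause is needed after the final page.

```python
from typing import Callable, List, Tuple, Union

# A delay is either a fixed number of seconds or a function returning one.
Delay = Union[float, Callable[[], float]]


def plan_batch_pauses(total_pages: int, batch_num: int, batch_delay: Delay) -> List[Tuple[int, float]]:
    """Return (pages_done, pause_seconds) for every pause between batches.

    Hypothetical helper: it only models the documented rule and never sleeps.
    """
    if batch_num <= 0:
        raise ValueError("batch_num must be positive")
    pauses = []
    # One pause after each full batch, except when the batch ends the run.
    for pages_done in range(batch_num, total_pages, batch_num):
        # Assumption: a callable delay takes no arguments and is re-evaluated per pause.
        seconds = batch_delay() if callable(batch_delay) else batch_delay
        pauses.append((pages_done, float(seconds)))
    return pauses


# 250 information pages with the default settings: pauses after page 100 and page 200.
print(plan_batch_pauses(250, 100, 300))
```

With quick_mode set to True no information pages are requested, so a schedule like this would not apply.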

generate_keyword_string()[source]
Return type:

str

generate_keyword_string_include()[source]
Return type:

str

get_image_basic_info(session=None)[source]
Parameters:

session (Session)

Return type:

dict

get_image_info_full(session=None)[source]
Parameters:

session (Session)

Return type:

list[ImageInfo]

get_image_info_quick(session=None)[source]
Parameters:

session (Session)

Return type:

list[ImageInfo]

get_json_page_num(session=None)[source]
Parameters:

session (Session)

Return type:

int

get_json_page_urls()[source]
Return type:

list[str]

run()[source]

The main function that runs the Parser and returns a list of image_crawler_utils.ImageInfo.

Return type:

list[ImageInfo]

class image_crawler_utils.stations.pixiv.PixivSearchSettings(age_rating='all', order='newest', target_illust=True, target_manga=True, target_ugoira=True, tags_match_type='partial', display_ai=True, width_lowest=None, width_highest=None, height_lowest=None, height_highest=None, ratio=None, creation_tool='all', starting_date='', ending_date='')[source]

Bases: object

Search settings for Pixiv.

Parameters:
  • age_rating (str)

  • order (str)

  • target_illust (bool)

  • target_manga (bool)

  • target_ugoira (bool)

  • tags_match_type (str)

  • display_ai (bool)

  • width_lowest (int | None)

  • width_highest (int | None)

  • height_lowest (int | None)

  • height_highest (int | None)

  • ratio (float | None)

  • creation_tool (str)

  • starting_date (str)

  • ending_date (str)

build_search_appending_str_json(keyword_string)[source]

Builds the search suffix to append for the AJAX API.

Parameters:

keyword_string (str) – the constructed keyword string for Pixiv.

build_search_appending_str_website(keyword_string)[source]

Builds the search suffix to append for the website.

Parameters:

keyword_string (str) – the constructed keyword string for Pixiv.

age_rating: str = 'all'

Age rating. MUST be selected from “all”, “safe” and “r18”.

creation_tool: str = 'all'

Creation tool of images. Default is “all”.

Can be one of these strings:

'all'
'sai'
'photoshop'
'clip studio paint'
'illuststudio'
'comicstudio'
'pixia'
'azpainter2'
'painter'
'illustrator'
'gimp'
'firealpaca'
'oekaki bbs'
'azpainter'
'cgillust'
'oekaki chat'
'tegaki blog'
'ms_paint'
'pictbear'
'opencanvas'
'paintshoppro'
'edge'
'drawr'
'comicworks'
'azdrawing'
'sketchbookpro'
'photostudio'
'paintgraphic'
'medibang paint'
'nekopaint'
'inkscape'
'artrage'
'azdrawing2'
'fireworks'
'ibispaint'
'aftereffects'
'mdiapp'
'graphicsgale'
'krita'
'kokuban.in'
'retas studio'
'e-mote'
'4thpaint'
'comilabo'
'pixiv sketch'
'pixelmator'
'procreate'
'expression'
'picturepublisher'
'processing'
'live2d'
'dotpict'
'aseprite'
'pastela'
'poser'
'metasequoia'
'blender'
'shade'
'3dsmax'
'daz studio'
'zbrush'
'comi po!'
'maya'
'lightwave3d'
'hexagon king'
'vue'
'sketchup'
'cinema4d'
'xsi'
'carrara'
'bryce'
'strata'
'sculptris'
'modo'
'animationmaster'
'vistapro'
'sunny3d'
'3d-coat'
'paint 3d'
'vroid studio'
'mechanical pencil'
'pencil'
'ballpoint pen'
'thin marker'
'colored pencil'
'copic marker'
'dip pen'
'watercolors'
'brush'
'calligraphy pen'
'felt-tip pen'
'magic marker'
'watercolor brush'
'paint'
'acrylic paint'
'fountain pen'
'pastels'
'airbrush'
'color ink'
'crayon'
'oil paint'
'coupy pencil'
'gansai'
'pastel crayons'

display_ai: bool = True

Whether to display AI-generated images.

ending_date: str = ''

Search images uploaded before this date. MUST be “YYYY-MM-DD”, “YYYY.MM.DD” or “YYYY/MM/DD” format.

height_highest: int | None = None

Highest height (in pixels) of images. Default is None (no restrictions).

height_lowest: int | None = None

Lowest height (in pixels) of images. Default is None (no restrictions).

order: str = 'newest'

Order of images. MUST be selected from “newest” and “oldest”.

ratio: float | None = None

Ratio of images. Default is None (no restrictions).

  • 0 selects only square images.

  • A positive value selects horizontal images. For example, ratio=0.5 selects images with width / height >= 1 + 0.5 = 1.5.

  • A negative value selects vertical images. For example, ratio=-0.5 selects images with height / width >= 1 + 0.5 = 1.5.
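
The three ratio rules above can be written out as a small check. This is a sketch of the documented meaning only: matches_ratio is a hypothetical helper that is not part of the package, and the real filtering is presumably applied by Pixiv's search rather than locally.

```python
from typing import Optional


def matches_ratio(width: int, height: int, ratio: Optional[float]) -> bool:
    """Check an image size against the documented meaning of `ratio`.

    Hypothetical helper for illustration; not part of image_crawler_utils.
    """
    if ratio is None:
        return True  # no restriction
    if ratio == 0:
        return width == height  # square images only
    if ratio > 0:
        return width / height >= 1 + ratio  # horizontal images
    return height / width >= 1 + abs(ratio)  # vertical images


# A 1500x1000 image satisfies ratio=0.5 because 1500 / 1000 = 1.5.
print(matches_ratio(1500, 1000, 0.5))
# The same image does not satisfy ratio=-0.5, which asks for tall images.
print(matches_ratio(1500, 1000, -0.5))
```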

starting_date: str = ''

Search images uploaded after this date. MUST be “YYYY-MM-DD”, “YYYY.MM.DD” or “YYYY/MM/DD” format.
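
To check a date string before passing it as starting_date or ending_date, the three accepted formats can be tried in turn. The snippet below is a minimal sketch using only the standard library; parse_search_date is a hypothetical helper, and how the package itself parses these strings is not shown in this reference.

```python
from datetime import date, datetime

# The three formats named in the starting_date / ending_date descriptions.
ACCEPTED_FORMATS = ("%Y-%m-%d", "%Y.%m.%d", "%Y/%m/%d")


def parse_search_date(text: str) -> date:
    """Parse a date string written in one of the accepted formats.

    Hypothetical helper for illustration; raises ValueError otherwise.
    """
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unsupported date format: {text!r}")


# All three spellings describe the same day.
print(parse_search_date("2024-01-05"))
print(parse_search_date("2024.01.05") == parse_search_date("2024/01/05"))
```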

tags_match_type: str = 'partial'

Matching type of tags. MUST be selected from “partial”, “perfect”, “title_caption”.

  • “partial”: Partially matched tags are accepted.

  • “perfect”: Tags must be perfectly matched.

  • “title_caption”: Searched keywords will be matched with titles and captions.

target_illust: bool = True

Whether to include illustrations in results.

target_manga: bool = True

Whether to include mangas in results.

target_ugoira: bool = True

Whether to include ugoiras (animations) in results.

  • target_illust, target_manga and target_ugoira cannot all be set to False at the same time.

  • Setting only one of target_illust and target_ugoira to False while the other two options remain True is not allowed.
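
The two restrictions above leave five valid combinations of the three flags. The sketch below spells the rules out; check_targets is a hypothetical helper written from the description, not the validation code used by PixivSearchSettings.

```python
def check_targets(target_illust: bool, target_manga: bool, target_ugoira: bool) -> None:
    """Raise ValueError for flag combinations the documentation rules out.

    Hypothetical helper for illustration; not part of image_crawler_utils.
    """
    # Rule 1: at least one kind of work must be requested.
    if not (target_illust or target_manga or target_ugoira):
        raise ValueError("at least one target flag must be True")
    # Rule 2: with manga enabled, illust and ugoira must be enabled or
    # disabled together (exactly one of them set to False is rejected).
    if target_manga and target_illust != target_ugoira:
        raise ValueError(
            "cannot disable only one of target_illust and target_ugoira "
            "while the other two flags are True"
        )


# Illustrations and ugoira without manga is an accepted combination.
check_targets(target_illust=True, target_manga=False, target_ugoira=True)
```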

width_highest: int | None = None

Highest width (in pixels) of images. Default is None (no restrictions).

width_lowest: int | None = None

Lowest width (in pixels) of images. Default is None (no restrictions).

class image_crawler_utils.stations.pixiv.PixivUserParser(member_id, station_url='https://www.pixiv.net/', crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''), thread_delay=0, quick_mode=False, info_page_batch_num=100, info_page_batch_delay=300)[source]

Bases: Parser

Parameters:
  • crawler_settings (image_crawler_utils.CrawlerSettings) – The CrawlerSettings used in this Parser.

  • member_id (str) – Pixiv ID of the user.

  • station_url (str) –

    The URL of the main page of a website.

    • This parameter matters when several websites share the same structure. For example, https://yande.re/ and https://konachan.com/ are both built on Moebooru, so this parameter must be set to tell the parser which of these sites to use.

    • For websites like https://www.pixiv.net/, whose structure no other website shares, this parameter is already initialized and does not need to be set.

  • use_keyword_include (bool) –

    Search with a new, broader keyword string whose results contain every image matched by the original keyword string. Defaults to False.

    • Example: the results of searching “A” contain all results of “A AND B”.

  • cookies (image_crawler_utils.Cookies, str, dict, list, None) – Cookies containing login information.

  • quick_mode (bool) – If True, image information pages are NOT downloaded, which speeds up the run.

  • info_page_batch_num (int) – Number of image information pages per batch. After each batch is downloaded, the crawler waits for a fairly long time (see info_page_batch_delay).

  • info_page_batch_delay (float, None) – Delay time after each batch of images is downloaded.

  • thread_delay (float | Callable)

get_image_ids(session=None)[source]
Parameters:

session (Session)

Return type:

list[str]

get_image_info_full(session=None)[source]
Parameters:

session (Session)

Return type:

list[ImageInfo]

get_image_info_quick(session=None)[source]
Parameters:

session (Session)

Return type:

list[ImageInfo]

run()[source]

The main function that runs the Parser and returns a list of image_crawler_utils.ImageInfo.

Return type:

list[ImageInfo]

image_crawler_utils.stations.pixiv.filter_keyword_pixiv(image_info, standard_keyword_string)[source]

A keyword filter for xxxbooru-style image info.

The “tags” should be accessed by info["tags"].

image_crawler_utils.stations.pixiv.get_pixiv_cookies(pixiv_id=None, password=None, proxies=None, timeout=30.0, headless=False, waiting_seconds=60.0, log=<image_crawler_utils.log.Log object>)[source]

Manually get cookies by logging in to Pixiv.

Parameters:
  • pixiv_id (str, None) – Your Pixiv ID or mail address. Leave it as None to enter it manually.

  • password (str, None) – Your Pixiv password. Leave it as None to enter it manually.

  • proxies (dict, None) –

    The proxies used in the nodriver browser.

    • The pattern should be in a requests-compatible form, such as:

      • HTTP type: {'http': '127.0.0.1:7890'}

      • HTTPS type: {'https': '127.0.0.1:7890'}, or {'https': '127.0.0.1:7890', 'http': '127.0.0.1:7890'}

      • SOCKS type: {'https': 'socks5://127.0.0.1:7890'}

  • timeout (float, None) – Timeout (seconds) for waiting elements. Default is 30.

  • headless (bool, None) – Use headless mode. Default is False.

  • waiting_seconds (float, None) – In headless mode, an error is raised if the next step cannot be loaded within waiting_seconds. Default is 60.

  • log (image_crawler_utils.log.Log, None) – Logging config.

Returns:

An image_crawler_utils.Cookies object.

Return type:

Cookies | None
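
The proxies forms listed for get_pixiv_cookies are plain dictionaries, so they can be built with a small helper. The sketch below is illustrative only: build_proxies is a hypothetical function, not part of the package, and the address is the sample value used in the parameter description.

```python
from typing import Dict


def build_proxies(address: str, socks5: bool = False) -> Dict[str, str]:
    """Build a requests-style proxies dict from a "host:port" address.

    Hypothetical helper for illustration; not part of image_crawler_utils.
    """
    if socks5:
        # SOCKS form: only the 'https' key, with a socks5:// prefix.
        return {"https": f"socks5://{address}"}
    # HTTPS form that also routes plain HTTP through the same proxy.
    return {"https": address, "http": address}


print(build_proxies("127.0.0.1:7890"))
print(build_proxies("127.0.0.1:7890", socks5=True))
```

The resulting dictionary would then be passed as the proxies argument.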