image_crawler_utils.stations.twitter package

class image_crawler_utils.stations.twitter.TwitterKeywordMediaParser(station_url='https://x.com/', crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, standard_keyword_string=None, keyword_string=None, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''), twitter_search_settings=TwitterSearchSettings(from_users=None, to_users=None, mentioned_users=None, including_replies=True, only_replies=False, including_links=True, only_links=False, including_media=True, only_media=False, min_reply_num=None, min_favorite_num=None, min_retweet_num=None, starting_date='', ending_date=''), reload_times=1, error_retry_delay=200, headless=True)

Bases: KeywordParser

Keyword Parser for Twitter / X. Fetches all media images from the search results of the given keywords.

Parameters:
  • crawler_settings (image_crawler_utils.CrawlerSettings) – The CrawlerSettings used in this Parser.

  • station_url (str) –

    The URL of the main page of a website.

    • This parameter matters when several websites share the same structure. For example, https://yande.re/ and https://konachan.com/ are both built with Moebooru, so this parameter must be set to tell the Parser which of these sites to handle.

    • For websites like https://www.pixiv.net/, whose structure no other website uses, this parameter already has a default value and does not need to be set.

  • standard_keyword_string (str) – Query keyword string using standard syntax. Refer to the documentation for detailed instructions.

  • keyword_string (str, None) –

    To specify the search keywords directly, set keyword_string to a custom non-empty string. It OVERRIDES standard_keyword_string.

    • For example, set keyword_string to "kuon_(utawarerumono) rating:safe" in DanbooruKeywordParser means searching directly with this string in Danbooru, and its standard keyword string equivalent is "kuon_(utawarerumono) AND rating:safe".

  • cookies (image_crawler_utils.Cookies, str, dict, list, None) – Cookies containing login information.

  • twitter_search_settings (image_crawler_utils.stations.twitter.TwitterSearchSettings) – A TwitterSearchSettings class that contains extra options when searching.

  • reload_times (int) – Reload the page reload_times times. Useful when some statuses (tweets) are not detected.

  • error_retry_delay (float) – When Twitter / X returns an error, the Parser will retry after error_retry_delay seconds.

  • headless (bool) – Do not display the browser window when a browser is started. Setting it to False shows browser windows.
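The cookies parameter accepts several shapes. As a rough illustration only (this is not the library's Cookies class), a hypothetical helper normalize_cookies could fold a cookie header string, a plain dict, or a Selenium-style list of {'name': ..., 'value': ...} dicts into one name-to-value dict:

```python
from typing import Union

CookieInput = Union[str, dict, list, None]


def normalize_cookies(cookies: CookieInput) -> dict[str, str]:
    """Hypothetical helper: fold common cookie formats into a name -> value dict."""
    if cookies is None:
        return {}
    if isinstance(cookies, dict):
        # Already a mapping; copy it and coerce values to strings.
        return {str(k): str(v) for k, v in cookies.items()}
    if isinstance(cookies, str):
        # Cookie header style: "name1=value1; name2=value2"
        result = {}
        for part in cookies.split(";"):
            part = part.strip()
            if not part or "=" not in part:
                continue
            name, _, value = part.partition("=")
            result[name.strip()] = value.strip()
        return result
    if isinstance(cookies, list):
        # Selenium's driver.get_cookies() returns dicts with "name" and "value" keys.
        return {str(c["name"]): str(c["value"]) for c in cookies}
    raise TypeError(f"Unsupported cookie type: {type(cookies).__name__}")
```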

generate_keyword_string()
Return type:

str
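A minimal sketch of the keyword precedence described for keyword_string, written as a hypothetical free function rather than the real method. It assumes the standard syntax uses only AND (which Twitter / X search expresses as a space) and that the TwitterSearchSettings suffix is appended after one space; the real conversion supports more of the standard syntax:

```python
from typing import Optional


def generate_keyword_string_sketch(
    standard_keyword_string: Optional[str],
    keyword_string: Optional[str],
    search_suffix: str = "",
) -> str:
    """Hypothetical illustration of keyword precedence; not the library's code."""
    if keyword_string:
        # A non-empty custom keyword_string overrides the standard syntax.
        base = keyword_string.strip()
    elif standard_keyword_string:
        # Assumption: only "AND" is handled; Twitter / X treats spaces as AND.
        terms = [t.strip() for t in standard_keyword_string.split(" AND ")]
        base = " ".join(t for t in terms if t)
    else:
        base = ""
    # Append the (hypothetical) search-settings suffix if there is one.
    return " ".join(part for part in (base, search_suffix.strip()) if part)
```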

get_status()
Return type:

list[TwitterStatus]

parse_images_from_status()
Return type:

list[ImageInfo]

run()

The main function that runs the Parser and returns a list of image_crawler_utils.ImageInfo.

Return type:

list[ImageInfo]

class image_crawler_utils.stations.twitter.TwitterSearchSettings(from_users=None, to_users=None, mentioned_users=None, including_replies=True, only_replies=False, including_links=True, only_links=False, including_media=True, only_media=False, min_reply_num=None, min_favorite_num=None, min_retweet_num=None, starting_date='', ending_date='')

Bases: object

TwitterSearchSettings controls advanced search settings. It appends a string to the keyword string according to the settings in this class.

Parameters:
  • from_users (list[str] | str | None)

  • to_users (list[str] | str | None)

  • mentioned_users (list[str] | str | None)

  • including_replies (bool)

  • only_replies (bool)

  • including_links (bool)

  • only_links (bool)

  • including_media (bool)

  • only_media (bool)

  • min_reply_num (int | None)

  • min_favorite_num (int | None)

  • min_retweet_num (int | None)

  • starting_date (str)

  • ending_date (str)

build_search_appending_str(keyword_string)

Build the suffix that is appended to the search keyword string.

Parameters:

keyword_string (str) – the constructed keyword string for Twitter.
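The suffix can be pictured with Twitter / X's own advanced-search operators (from:, to:, @, filter:, min_replies:, min_faves:, min_retweets:, since:, until:). The sketch below is a hypothetical re-implementation over a mirror dataclass with the same field names and defaults. The exact strings the library emits, and how it uses the keyword_string argument, are not documented here, so the sketch ignores keyword_string:

```python
from dataclasses import dataclass
from typing import Optional, Union

Users = Union[list[str], str, None]


@dataclass
class SearchSettingsSketch:
    """Hypothetical mirror of TwitterSearchSettings' fields (same names and defaults)."""
    from_users: Users = None
    to_users: Users = None
    mentioned_users: Users = None
    including_replies: bool = True
    only_replies: bool = False
    including_links: bool = True
    only_links: bool = False
    including_media: bool = True
    only_media: bool = False
    min_reply_num: Optional[int] = None
    min_favorite_num: Optional[int] = None
    min_retweet_num: Optional[int] = None
    starting_date: str = ""
    ending_date: str = ""


def _user_clause(prefix: str, users: Users) -> str:
    # Turn "a" or ["a", "b"] into "from:a" or "(from:a OR from:b)".
    if not users:
        return ""
    names = [users] if isinstance(users, str) else list(users)
    parts = [f"{prefix}{name.lstrip('@')}" for name in names]
    return parts[0] if len(parts) == 1 else "(" + " OR ".join(parts) + ")"


def _filter_clause(name: str, including: bool, only: bool) -> str:
    # Exclude the category when including is False; require it when only is True.
    if not including:
        return f"-filter:{name}"
    return f"filter:{name}" if only else ""


def build_suffix(s: SearchSettingsSketch) -> str:
    clauses = [
        _user_clause("from:", s.from_users),
        _user_clause("to:", s.to_users),
        _user_clause("@", s.mentioned_users),
        _filter_clause("replies", s.including_replies, s.only_replies),
        _filter_clause("links", s.including_links, s.only_links),
        _filter_clause("media", s.including_media, s.only_media),
    ]
    if s.min_reply_num is not None:
        clauses.append(f"min_replies:{s.min_reply_num}")
    if s.min_favorite_num is not None:
        clauses.append(f"min_faves:{s.min_favorite_num}")
    if s.min_retweet_num is not None:
        clauses.append(f"min_retweets:{s.min_retweet_num}")
    # Normalize "." and "/" separators to "-" for since:/until:.
    if s.starting_date:
        clauses.append(f"since:{s.starting_date.replace('.', '-').replace('/', '-')}")
    if s.ending_date:
        clauses.append(f"until:{s.ending_date.replace('.', '-').replace('/', '-')}")
    return " ".join(c for c in clauses if c)
```

In this sketch, excluding a category takes precedence over requiring it, which is one way to read the "Works only if ..." notes on the only_* attributes.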

ending_date: str = ''

Tweets before this date. Must be “YYYY-MM-DD”, “YYYY.MM.DD” or “YYYY/MM/DD” format.

from_users: list[str] | str | None = None

Select tweets sent by a certain user / a certain list of users.

including_links: bool = True

Including tweets that contain at least one link.

including_media: bool = True

Including tweets that contain at least one media.

including_replies: bool = True

Including reply tweets.

mentioned_users: list[str] | str | None = None

Select tweets that mention a certain user / a certain list of users.

min_favorite_num: int | None = None

Including tweets with more than min_favorite_num favorites.

min_reply_num: int | None = None

Including tweets with more than min_reply_num replies.

min_retweet_num: int | None = None

Including tweets with more than min_retweet_num retweets.

only_links: bool = False

Only including tweets that contain at least one link. Works only if including_links is set to True (default).

only_media: bool = False

Only including tweets that contain at least one media. Works only if including_media is set to True (default).

only_replies: bool = False

Only including reply tweets. Works only if including_replies is set to True (default).

starting_date: str = ''

Tweets after this date. Must be “YYYY-MM-DD”, “YYYY.MM.DD” or “YYYY/MM/DD” format.
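Both date attributes accept three separators. A hypothetical validator that accepts exactly those formats (requiring the same separator throughout) and normalizes them to the YYYY-MM-DD form used by the since: and until: search operators. An empty string means "no limit", so a caller would skip the check in that case:

```python
import re
from datetime import date

# Year, separator, month, day; \2 forces the same separator in both places.
_DATE_RE = re.compile(r"^(\d{4})([-./])(\d{2})\2(\d{2})$")


def normalize_twitter_date(value: str) -> str:
    """Hypothetical check: accept YYYY-MM-DD, YYYY.MM.DD or YYYY/MM/DD; return YYYY-MM-DD."""
    match = _DATE_RE.match(value.strip())
    if match is None:
        raise ValueError(f"Unsupported date format: {value!r}")
    year, _, month, day = match.groups()
    # date() raises ValueError for impossible dates such as 2024-02-30.
    return date(int(year), int(month), int(day)).isoformat()
```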

to_users: list[str] | str | None = None

Select tweets replying to a certain user / a certain list of users.

class image_crawler_utils.stations.twitter.TwitterStatus(status_url=None, status_id=None, user_id=None, user_name=None, time=None, reply_num=0, retweet_num=0, like_num=0, view_num=None, text=None, hashtags=<factory>, links=<factory>, media_list=<factory>)

Bases: object

Contains information about a tweet (Twitter / X status).

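Parameters:
  • status_url (str | None)

  • status_id (str | None)

  • user_id (str | None)

  • user_name (str | None)

  • time (str | None)

  • reply_num (int)

  • retweet_num (int)

  • like_num (int)

  • view_num (int | None)

  • text (str | None)

  • hashtags (Iterable[str])

  • links

  • media_list (Iterable[TwitterStatusMedia])
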
hashtags: Iterable[str]
like_num: int = 0
media_list: Iterable[TwitterStatusMedia]
reply_num: int = 0
retweet_num: int = 0
status_id: str | None = None
status_url: str | None = None
text: str | None = None
time: str | None = None
user_id: str | None = None
user_name: str | None = None
view_num: int | None = None
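The <factory> defaults in the TwitterStatus signature are how inspect renders a dataclass field declared with field(default_factory=...). A trimmed, hypothetical mirror (using list where the real annotations say Iterable) shows what that buys: every instance gets its own container rather than sharing one mutable default.

```python
import inspect
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StatusSketch:
    """Trimmed, hypothetical mirror of TwitterStatus; not the library's class."""
    status_url: Optional[str] = None
    status_id: Optional[str] = None
    like_num: int = 0
    view_num: Optional[int] = None
    # default_factory creates a fresh list for every instance.
    hashtags: list[str] = field(default_factory=list)
    links: list[str] = field(default_factory=list)


first = StatusSketch(status_id="1")
second = StatusSketch(status_id="2")
first.hashtags.append("art")  # mutates only first's list

# inspect shows factory-backed defaults as "<factory>", as in the signature above.
signature_text = str(inspect.signature(StatusSketch))
```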
class image_crawler_utils.stations.twitter.TwitterStatusMedia(link: str | None = None, image_source: str | None = None, image_id: str | None = None, image_name: str | None = None)

Bases: object

Parameters:
  • link (str | None)

  • image_source (str | None)

  • image_id (str | None)

  • image_name (str | None)

image_id: str | None = None
image_name: str | None = None
image_source: str | None = None
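parse_images_from_status() turns collected statuses into ImageInfo entries. The core walk can be sketched with hypothetical stand-ins for TwitterStatusMedia and TwitterStatus: visit every media item, skip items without an image_source, and drop repeated sources. Building real ImageInfo objects is left out because their constructor is not documented on this page, so the sketch returns (image_name, image_source, status_url) tuples:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MediaStub:
    """Hypothetical stand-in for TwitterStatusMedia."""
    link: Optional[str] = None
    image_source: Optional[str] = None
    image_id: Optional[str] = None
    image_name: Optional[str] = None


@dataclass
class StatusStub:
    """Hypothetical stand-in for TwitterStatus (only the fields used here)."""
    status_url: Optional[str] = None
    media_list: list[MediaStub] = field(default_factory=list)


def collect_images(statuses: list[StatusStub]) -> list[tuple[Optional[str], str, Optional[str]]]:
    seen: set[str] = set()
    images = []
    for status in statuses:
        for media in status.media_list:
            src = media.image_source
            # Skip media without a downloadable source and repeated sources.
            if not src or src in seen:
                continue
            seen.add(src)
            images.append((media.image_name, src, status.status_url))
    return images
```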
class image_crawler_utils.stations.twitter.TwitterUserMediaParser(user_id, station_url='https://x.com/', crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''), reload_times=1, error_retry_delay=200, interval_days=180, starting_date=None, ending_date=None, exit_when_empty=False, headless=True)

Bases: Parser

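Parameters:
  • user_id

  • station_url (str)

  • crawler_settings (image_crawler_utils.CrawlerSettings)

  • cookies (image_crawler_utils.Cookies, str, dict, list, None)

  • reload_times (int)

  • error_retry_delay (float)

  • interval_days

  • starting_date

  • ending_date

  • exit_when_empty

  • headless (bool)
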
generate_search_settings()
Return type:

list[TwitterSearchSettings]
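Given interval_days, starting_date and ending_date, one plausible reading of generate_search_settings() (an assumption, not confirmed by this page) is that the requested date range is cut into consecutive windows of at most interval_days days, each searched with its own TwitterSearchSettings. A sketch of that splitting step, using a hypothetical split_date_range helper:

```python
from datetime import date, timedelta


def split_date_range(start: date, end: date, interval_days: int = 180) -> list[tuple[str, str]]:
    """Hypothetical: cut [start, end) into consecutive (since, until) windows."""
    if interval_days <= 0:
        raise ValueError("interval_days must be positive")
    windows = []
    current = start
    while current < end:
        # The last window is shortened so it never runs past `end`.
        upper = min(current + timedelta(days=interval_days), end)
        windows.append((current.isoformat(), upper.isoformat()))
        current = upper
    return windows
```

Adjacent windows share a boundary date. Because Twitter / X's since: operator includes its date and until: excludes it, such windows neither overlap nor leave gaps. Each pair could then feed something like TwitterSearchSettings(from_users=..., starting_date=..., ending_date=...).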

get_status_from_urls()
Return type:

list[TwitterStatus]

parse_images_from_status()
Return type:

list[ImageInfo]

run()

The main function that runs the Parser and returns a list of image_crawler_utils.ImageInfo.

Return type:

list[ImageInfo]

async image_crawler_utils.stations.twitter.find_twitter_status(tab, log=<image_crawler_utils.log.Log object>)

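Find all Twitter / X statuses on the current search result page.

Parameters:
  • tab (nodriver.Tab) – Nodriver tab with a loaded search result page.

  • log (image_crawler_utils.log.Log, None) – Logging config.
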
Returns:

A list of image_crawler_utils.stations.twitter.TwitterStatus objects.

Return type:

list[TwitterStatus]

image_crawler_utils.stations.twitter.get_twitter_cookies(twitter_account=None, user_id=None, password=None, proxies=None, timeout=30.0, headless=False, waiting_seconds=60.0, log=<image_crawler_utils.log.Log object>)

Manually get cookies by logging in to Twitter / X.

Parameters:
  • twitter_account (str, None) – Your Twitter / X email address. Leave it as None to enter it manually.

  • user_id (str, None) – Your Twitter / X user ID (@user_id). Twitter / X sometimes asks for it to confirm your login. Leave it as None to enter it manually.

  • password (str, None) – Your Twitter / X password. Leave it as None to enter it manually.

  • proxies (dict, None) –

    The proxies used in the nodriver browser.

    • The pattern should be in a requests-acceptable form like:

      • HTTP type: {'http': '127.0.0.1:7890'}

      • HTTPS type: {'https': '127.0.0.1:7890'}, or {'https': '127.0.0.1:7890', 'http': '127.0.0.1:7890'}

      • SOCKS type: {'https': 'socks5://127.0.0.1:7890'}

  • timeout (float, None) – Timeout (seconds) for waiting for elements. Default is 30.

  • headless (bool, None) – Use headless mode. Default is False.

  • waiting_seconds (float, None) – In headless mode, if the next step cannot be loaded within waiting_seconds seconds, an error is raised. Default is 60.

  • log (image_crawler_utils.log.Log, None) – Logging config.

Returns:

An image_crawler_utils.Cookies object.

Return type:

Cookies | None
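The proxies argument uses the requests-style dict shown above, whereas a Chromium-based browser like the one nodriver drives takes a single proxy address (for example through Chromium's --proxy-server switch). A hypothetical to_browser_proxy conversion that prefers the 'https' entry and assumes an http:// scheme when none is given; the library's actual handling may differ:

```python
from typing import Optional


def to_browser_proxy(proxies: Optional[dict]) -> Optional[str]:
    """Hypothetical: pick one proxy from a requests-style dict for a browser flag."""
    if not proxies:
        return None
    address = proxies.get("https") or proxies.get("http")
    if not address:
        return None
    # Entries like "127.0.0.1:7890" carry no scheme; assume a plain HTTP proxy.
    if "://" not in address:
        address = "http://" + address
    return address
```

The result could then be handed to the browser as f"--proxy-server={address}".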

image_crawler_utils.stations.twitter.parse_twitter_status_element(status_html, log=<image_crawler_utils.log.Log object>)

Parse a Twitter / X status element (“<article …></article>”) from a search result page.

Parameters:
  • status_html

  • log (image_crawler_utils.log.Log, None) – Logging config.

Returns:

An image_crawler_utils.stations.twitter.TwitterStatus object.

Return type:

TwitterStatus | None
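Tweet permalinks follow the pattern https://x.com/<user>/status/<id> (older links use twitter.com). A minimal sketch, not the library's parser, that recovers what this page appears to call user_id (the @handle, as get_twitter_cookies uses the term) and status_id from such a URL or from a relative href found inside an <article> element:

```python
import re
from typing import Optional

# Optional absolute prefix, then "/<handle>/status/<numeric id>".
_STATUS_RE = re.compile(
    r"(?:https?://(?:www\.)?(?:x|twitter)\.com)?/(?P<user>[A-Za-z0-9_]+)/status/(?P<id>\d+)"
)


def parse_status_link(href: str) -> Optional[tuple[str, str]]:
    """Return (user_id, status_id) from a tweet link, or None if it doesn't match."""
    match = _STATUS_RE.search(href)
    if match is None:
        return None
    return match.group("user"), match.group("id")
```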

async image_crawler_utils.stations.twitter.scrolling_to_find_status(tab, tab_url, crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, reload_times=1, error_retry_delay=200, image_num_restriction=None, progress_group=None, transient=False)

Scroll to find all Twitter / X statuses on the current search result page.

Parameters:
  • crawler_settings (image_crawler_utils.CrawlerSettings) – The CrawlerSettings used in this Parser.

  • tab (nodriver.Tab) – nodriver.Tab with a loaded search result page.

  • reload_times (int) – To deal with (possibly) missing statuses, reload the page reload_times times to collect status results.

  • error_retry_delay (float) – When an error happens (especially when Twitter / X returns an error), sleep for error_retry_delay seconds before reloading.

  • progress_group (image_crawler_utils.progress_bar.ProgressGroup) – The Group of Progress bars to be displayed in.

  • transient (bool) – Hide Progress bars after finishing.

  • tab_url (str)

  • image_num_restriction (int | None)

Returns:

A list of image_crawler_utils.stations.twitter.TwitterStatus objects, sorted by status from largest to smallest.

Return type:

list[TwitterStatus]
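The reload and retry behaviour can be sketched without a browser. Here fetch_batch is a hypothetical callable standing in for one load-and-scroll pass (returning status IDs, or raising RuntimeError when Twitter / X returns an error), max_errors is an invented safety cap, and sleep is injectable so the loop can be tested. The sketch treats reload_times as the number of successful passes and assumes "largest to smallest" means numeric status IDs in descending order:

```python
import time
from typing import Callable, Iterable


def collect_status_ids(
    fetch_batch: Callable[[], Iterable[str]],
    reload_times: int = 1,
    error_retry_delay: float = 200,
    max_errors: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Hypothetical sketch: repeat passes, retry after errors, merge and sort IDs."""
    found: set[str] = set()
    passes_done = 0
    errors = 0
    while passes_done < reload_times:
        try:
            # Merging into a set drops statuses seen in earlier passes.
            found.update(fetch_batch())
        except RuntimeError:
            # Treat RuntimeError as "Twitter / X returned an error" (assumption).
            errors += 1
            if errors > max_errors:
                break
            sleep(error_retry_delay)
            continue
        passes_done += 1
    # Status IDs are numeric strings; compare them as integers.
    return sorted(found, key=int, reverse=True)
```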

async image_crawler_utils.stations.twitter.twitter_empty_check(tab)

Check if the result is empty.

Parameters:

tab (nodriver.Tab) – Nodriver tab with a loaded search result page.

Returns:

True if an empty-result element is found, otherwise False.

Return type:

str | None

async image_crawler_utils.stations.twitter.twitter_error_check(tab)

Check if there is an error in loading Twitter / X page.

Parameters:

tab (nodriver.Tab) – Nodriver tab with a loaded search result page.

Returns:

True if an error element is found, otherwise False.

Return type:

str | None