image_crawler_utils.stations.twitter package
- class image_crawler_utils.stations.twitter.TwitterKeywordMediaParser(station_url='https://x.com/', crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, standard_keyword_string=None, keyword_string=None, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''), twitter_search_settings=TwitterSearchSettings(from_users=None, to_users=None, mentioned_users=None, including_replies=True, only_replies=False, including_links=True, only_links=False, including_media=True, only_media=False, min_reply_num=None, min_favorite_num=None, min_retweet_num=None, starting_date='', ending_date=''), reload_times=1, error_retry_delay=200, headless=True)[source]
Bases: KeywordParser
Keyword Parser for Twitter. Fetches all media images from the search results for the given keywords.
- Parameters:
crawler_settings (image_crawler_utils.CrawlerSettings) – The CrawlerSettings used in this Parser.
station_url (str) –
The URL of the main page of a website.
This parameter is used when several websites share the same structure. For example, https://yande.re/ and https://konachan.com/ are both built on Moebooru, so this parameter must be set to tell these sites apart.
For websites like https://www.pixiv.net/, whose structure no other website shares, this parameter is already initialized and does not need to be set.
standard_keyword_string (str) – Query keyword string using standard syntax. Refer to the documentation for detailed instructions.
keyword_string (str, None) –
To specify the search keywords directly, set keyword_string to a custom non-empty string. It will OVERWRITE standard_keyword_string.
For example, setting keyword_string to "kuon_(utawarerumono) rating:safe" in DanbooruKeywordParser searches Danbooru directly with this string; its standard keyword string equivalent is "kuon_(utawarerumono) AND rating:safe".
cookies (image_crawler_utils.Cookies, str, dict, list, None) – Cookies containing login information.
twitter_search_settings (image_crawler_utils.stations.twitter.TwitterSearchSettings) – A TwitterSearchSettings class that contains extra options when searching.
reload_times (int) – Reload the page reload_times times. May be useful when some statuses (tweets) are not detected.
error_retry_delay (float) – When Twitter / X returns an error, the Parser will retry after error_retry_delay seconds.
headless (bool) – Do not display browser windows when a browser is started. Setting it to False will pop up browser windows.
- run()[source]
The main function that runs the Parser and returns a list of image_crawler_utils.ImageInfo.
- class image_crawler_utils.stations.twitter.TwitterSearchSettings(from_users=None, to_users=None, mentioned_users=None, including_replies=True, only_replies=False, including_links=True, only_links=False, including_media=True, only_media=False, min_reply_num=None, min_favorite_num=None, min_retweet_num=None, starting_date='', ending_date='')[source]
Bases: object
TwitterSearchSettings controls advanced search settings. It appends a string to the keyword string according to the settings in this class.
- Parameters:
- build_search_appending_str(keyword_string)[source]
Builds the suffix that is appended to the search keyword string.
- Parameters:
keyword_string (str) – the constructed keyword string for Twitter.
- ending_date: str = ''
Tweets before this date. Must be “YYYY-MM-DD”, “YYYY.MM.DD” or “YYYY/MM/DD” format.
- from_users: list[str] | str | None = None
Select tweets sent by a certain user / a certain list of users.
- mentioned_users: list[str] | str | None = None
Select tweets that mention a certain user / a certain list of users.
- only_links: bool = False
Only include tweets that contain at least one link. Works only if including_replies is set to True (default).
- only_media: bool = False
Only include tweets that contain at least one media item. Works only if including_replies is set to True (default).
- only_replies: bool = False
Only include reply tweets. Works only if including_replies is set to True (default).
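As an illustration of how settings like these could translate into a search suffix, here is a minimal sketch based on the search operators that Twitter / X accepts in its search box (`from:`, `@`, `filter:`, `since:`, `until:`). The helper names `normalize_date`, `as_list` and `build_suffix` are hypothetical, and the string actually produced by build_search_appending_str may differ.

```python
from datetime import datetime


def normalize_date(text):
    """Accept YYYY-MM-DD, YYYY.MM.DD or YYYY/MM/DD and return YYYY-MM-DD."""
    for fmt in ("%Y-%m-%d", "%Y.%m.%d", "%Y/%m/%d"):
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unsupported date format: {text!r}")


def as_list(users):
    """Turn None, a single user name, or a list of names into a list."""
    if users is None:
        return []
    if isinstance(users, str):
        return [users]
    return list(users)


def build_suffix(from_users=None, mentioned_users=None, only_replies=False,
                 only_links=False, only_media=False,
                 starting_date="", ending_date=""):
    """Hypothetical stand-in, NOT the library's implementation."""
    parts = []
    senders = as_list(from_users)
    if senders:
        parts.append("(" + " OR ".join(f"from:{u}" for u in senders) + ")")
    mentioned = as_list(mentioned_users)
    if mentioned:
        parts.append("(" + " OR ".join(f"@{u}" for u in mentioned) + ")")
    if only_replies:
        parts.append("filter:replies")
    if only_links:
        parts.append("filter:links")
    if only_media:
        parts.append("filter:media")
    if starting_date:
        parts.append("since:" + normalize_date(starting_date))
    if ending_date:
        parts.append("until:" + normalize_date(ending_date))
    return " ".join(parts)


print(build_suffix(from_users=["alice", "bob"], only_media=True,
                   ending_date="2024.03.01"))
```

Normalizing the three accepted date spellings to one form before building the suffix keeps the rest of the function independent of the input format.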
- class image_crawler_utils.stations.twitter.TwitterStatus(status_url=None, status_id=None, user_id=None, user_name=None, time=None, reply_num=0, retweet_num=0, like_num=0, view_num=None, text=None, hashtags=<factory>, links=<factory>, media_list=<factory>)[source]
Bases: object
Holds the information of a tweet (Twitter / X status).
- Parameters:
- media_list: Iterable[TwitterStatusMedia]
- class image_crawler_utils.stations.twitter.TwitterStatusMedia(link: str | None = None, image_source: str | None = None, image_id: str | None = None, image_name: str | None = None)[source]
Bases: object
- Parameters:
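The `<factory>` defaults in the TwitterStatus signature suggest dataclass fields built with `default_factory`. The sketch below mirrors a subset of the two signatures under that assumption; `StatusSketch` and `StatusMediaSketch` are hypothetical stand-ins, not the library's classes, and the field types other than those shown in the signatures are guesses.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StatusMediaSketch:
    # Same field names and defaults as the TwitterStatusMedia signature.
    link: Optional[str] = None
    image_source: Optional[str] = None
    image_id: Optional[str] = None
    image_name: Optional[str] = None


@dataclass
class StatusSketch:
    # A subset of the TwitterStatus fields; types are assumptions.
    status_url: Optional[str] = None
    status_id: Optional[str] = None
    reply_num: int = 0
    retweet_num: int = 0
    like_num: int = 0
    # default_factory gives every instance its own fresh list.
    hashtags: list[str] = field(default_factory=list)
    links: list[str] = field(default_factory=list)
    media_list: list[StatusMediaSketch] = field(default_factory=list)


first = StatusSketch(status_id="1")
second = StatusSketch(status_id="2")
first.media_list.append(StatusMediaSketch(image_name="a.jpg"))
print(len(first.media_list), len(second.media_list))
```

A factory default avoids the shared-mutable-default problem: appending media to one status leaves the others untouched.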
- class image_crawler_utils.stations.twitter.TwitterUserMediaParser(user_id, station_url='https://x.com/', crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, cookies=Cookies(cookies_nodriver=None, cookies_selenium=[], cookies_dict={}, cookies_string=''), reload_times=1, error_retry_delay=200, interval_days=180, starting_date=None, ending_date=None, exit_when_empty=False, headless=True)[source]
Bases: Parser
- Parameters:
- run()[source]
The main function that runs the Parser and returns a list of image_crawler_utils.ImageInfo.
- async image_crawler_utils.stations.twitter.find_twitter_status(tab, log=<image_crawler_utils.log.Log object>)[source]
Finds all Twitter / X statuses on the current search result page.
- Parameters:
tab (nodriver.Tab) – Nodriver tab with the search result page loaded.
log (image_crawler_utils.log.Log, None) – Logging config.
- Returns:
A list of image_crawler_utils.stations.twitter.TwitterStatus objects.
- Return type:
- image_crawler_utils.stations.twitter.get_twitter_cookies(twitter_account=None, user_id=None, password=None, proxies=None, timeout=30.0, headless=False, waiting_seconds=60.0, log=<image_crawler_utils.log.Log object>)[source]
Manually get cookies by logging in to Twitter / X.
- Parameters:
twitter_account (str, None) – Your Twitter / X mail address. Leave it as None to enter it manually.
user_id (str, None) – Your Twitter / X user id (@user_id). Sometimes Twitter / X requires it to confirm your login. Leave it as None to enter it manually.
password (str, None) – Your Twitter / X password. Leave it as None to enter it manually.
proxies (dict, None) –
The proxies used in the nodriver browser.
The pattern should be in a requests-acceptable form like:
HTTP type: {'http': '127.0.0.1:7890'}
HTTPS type: {'https': '127.0.0.1:7890'}, or {'https': '127.0.0.1:7890', 'http': '127.0.0.1:7890'}
SOCKS type: {'https': 'socks5://127.0.0.1:7890'}
timeout (float, None) – Timeout (seconds) for waiting elements. Default is 30.
headless (bool, None) – Use headless mode. Default is False.
waiting_seconds (float, None) – In headless mode, if the next step cannot be loaded within waiting_seconds seconds, an error will be raised. Default is 60.
log (image_crawler_utils.log.Log, None) – Logging config.
- Returns:
An image_crawler_utils.Cookies object.
- Return type:
Cookies | None
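To make the proxies forms described for this function concrete, the following minimal sketch splits each value into an optional scheme, a host and a port using only the standard library. `describe_proxies` is a hypothetical helper for illustration; how the library itself interprets the dictionary is not shown in this reference.

```python
from urllib.parse import urlsplit


def describe_proxies(proxies):
    """Split each proxy value into (scheme or None, host, port)."""
    described = {}
    for kind, address in proxies.items():
        # Values such as '127.0.0.1:7890' carry no scheme; prefix '//' so
        # that urlsplit reads them as a network location, not a path.
        target = address if "://" in address else "//" + address
        parts = urlsplit(target)
        described[kind] = (parts.scheme or None, parts.hostname, parts.port)
    return described


print(describe_proxies({"http": "127.0.0.1:7890"}))
print(describe_proxies({"https": "socks5://127.0.0.1:7890"}))
```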
- image_crawler_utils.stations.twitter.parse_twitter_status_element(status_html, log=<image_crawler_utils.log.Log object>)[source]
Parse Twitter / X status element from search result page: “<article …></article>”.
- Parameters:
status_html (str) – HTML string of status element “<article …></article>”.
log (image_crawler_utils.log.Log, None) – Logging config.
- Returns:
An image_crawler_utils.stations.twitter.TwitterStatus object.
- Return type:
TwitterStatus | None
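As a rough illustration of parsing a status element, the sketch below collects image sources from an `<article>` HTML string with the standard library's `html.parser`. It is not the library's implementation: the sample markup is invented, real Twitter / X markup is far more involved, and `ArticleImageCollector` and `collect_image_sources` are hypothetical names.

```python
from html.parser import HTMLParser


class ArticleImageCollector(HTMLParser):
    """Collect the src of every <img> nested inside an <article> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # how many <article> elements are currently open
        self.image_sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1
        elif tag == "img" and self.depth > 0:
            src = dict(attrs).get("src")
            if src:
                self.image_sources.append(src)

    def handle_endtag(self, tag):
        if tag == "article" and self.depth > 0:
            self.depth -= 1


def collect_image_sources(status_html):
    parser = ArticleImageCollector()
    parser.feed(status_html)
    parser.close()
    return parser.image_sources


# Invented sample markup, for illustration only.
sample = (
    '<article><div><img src="https://example.com/a.jpg">'
    '<img alt="no source"></div></article>'
    '<img src="https://example.com/outside.jpg">'
)
print(collect_image_sources(sample))  # ['https://example.com/a.jpg']
```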
- async image_crawler_utils.stations.twitter.scrolling_to_find_status(tab, tab_url, crawler_settings=<image_crawler_utils.classes.crawler_settings.CrawlerSettings object>, reload_times=1, error_retry_delay=200, image_num_restriction=None, progress_group=None, transient=False)[source]
Scrolls to find all Twitter / X statuses on the current search result page.
- Parameters:
crawler_settings (image_crawler_utils.CrawlerSettings) – The CrawlerSettings used in this Parser.
tab (nodriver.Tab) – nodriver.Tab with the search result page loaded.
reload_times (int) – To deal with (possibly) missing statuses, reload the page reload_times times to get status results.
error_retry_delay (float) – When an error happens (especially when Twitter / X returns an error), sleep for error_retry_delay seconds before reloading again.
progress_group (image_crawler_utils.progress_bar.ProgressGroup) – The Group of Progress bars to be displayed in.
transient (bool) – Hide Progress bars after finishing.
tab_url (str)
image_num_restriction (int | None)
- Returns:
A list of image_crawler_utils.stations.twitter.TwitterStatus objects, sorted by status from largest to smallest.
- Return type:
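Because the page may be reloaded several times, the same status can be seen more than once. A minimal sketch of merging several passes follows, assuming that "largest to smallest" refers to the numeric status ID (an assumption, not something this reference states). Plain dictionaries stand in for TwitterStatus objects, and `merge_statuses` is a hypothetical helper.

```python
def merge_statuses(*passes):
    """Merge scroll passes, drop duplicates by status_id, largest ID first."""
    merged = {}
    for statuses in passes:
        for status in statuses:
            # Keep the first occurrence of each status ID.
            merged.setdefault(status["status_id"], status)
    # Assumption: ordering is by numeric status ID, descending.
    return sorted(merged.values(),
                  key=lambda status: int(status["status_id"]),
                  reverse=True)


first_pass = [{"status_id": "105"}, {"status_id": "101"}]
second_pass = [{"status_id": "110"}, {"status_id": "105"}]
print([s["status_id"] for s in merge_statuses(first_pass, second_pass)])
# ['110', '105', '101']
```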
- async image_crawler_utils.stations.twitter.twitter_empty_check(tab)[source]
Check if the result is empty.
- Parameters:
tab (nodriver.Tab) – Nodriver tab with loaded searching result page.
tab_url (str) – URL of the tab.
log (image_crawler_utils.log.Log, None) – Logging config.
- Returns:
Returns True if an empty-result element is found, otherwise False.
- Return type:
str | None
- async image_crawler_utils.stations.twitter.twitter_error_check(tab)[source]
Check if there is an error in loading Twitter / X page.
- Parameters:
tab (nodriver.Tab) – Nodriver tab with loaded searching result page.
- Returns:
Returns True if an error element is found, otherwise False.
- Return type:
str | None