Core API

The library supports retrieving Anime, User Profile/Stats, and User-Anime Info.

Anime and Users are identified by id_ref (int) and user_id (str), respectively. You can enumerate through Anime by id_ref, but Users must be ‘discovered’.

mal_scraper.discover_users(requester=requests, use_cache=True, use_web=None)[source]

Return a set of user_ids usable by other user related library calls.

By default we will attempt to return any in our cache - clearing the cache in the process. If there are no users in the cache, we will attempt to find some on MAL but these will be biased towards recently active users.

The cache is built up as you make other API calls: users discovered on each retrieved web-page are added to it.

Parameters:
  • requester (requests-like, optional) – HTTP request maker. This allows us to control/limit/mock requests.
  • use_cache (bool, optional) – Use the cache that we have built up over time? True (default): return users from the cache, clearing it in the process. False: pretend the cache is empty (and do not clear it).
  • use_web (bool, optional) – Control whether to fall back to scraping. None (default) to make a network call only if the cache is empty. False to never make a network call. True to always make a network call.
Returns:

A set of user_ids which are strings.

Raises:

Network and Request Errors – See Requests library.

Examples

Get user_ids discovered from earlier uses of the library:

animes = mal_scraper.get_anime()
users_probably_from_cache = mal_scraper.discover_users()

Get user_ids if there are any in the cache, but don’t bother to make a network call just to find some:

users_from_cache = mal_scraper.discover_users(use_web=False)

Discover some users from the web, ignoring the cache:

users_from_web = mal_scraper.discover_users(use_cache=False)
mal_scraper.get_anime(id_ref=1, requester=requests)[source]

Return the information for a particular show.

You can simply enumerate through id_refs.

This raises an exception unless the web-page is fully retrieved and processed.

TODO: Genres (https://myanimelist.net/info.php?go=genre). Broadcast? Producers? Licensors? Studios? Source? Duration?

Parameters:
  • id_ref (int, optional) – Internal show identifier.
  • requester (requests-like, optional) – HTTP request maker. This allows us to control/limit/mock requests.
Returns:

Retrieved – with the attributes meta and data.

data:

{
    'name': str,
    'name_english': str,
    'format': mal_scraper.Format,
    'episodes': int, or None when MAL does not know,
    'airing_status': mal_scraper.AiringStatus,
    'airing_started': date, or None when MAL does not know,
    'airing_finished': date, or None when MAL does not know,
    'airing_premiere': tuple(Year (int), Season (mal_scraper.Season))
        or None (for films, OVAs, specials, ONAs, music, or
        if MAL does not know),
    'mal_age_rating': mal_scraper.AgeRating,
    'mal_score': float, or None when not yet aired/MAL does not know,
    'mal_scored_by': int (number of people),
    'mal_rank': int, or None when not yet aired/some R rated anime,
    'mal_popularity': int,
    'mal_members': int,
    'mal_favourites': int,
}

See also Format, AiringStatus, Season.

Raises:
  • Network and Request Errors – See Requests library.
  • ParseError – Upon processing the web-page including anything that does not meet expectations.

Examples

Retrieve the first anime and get the next anime to retrieve:

next_anime = 1

try:
    meta, data = mal_scraper.get_anime(next_anime)
except mal_scraper.ParseError as err:
    logger.error('Investigate page %s with error %d', err.url, err.code)
except NetworkAndRequestErrors:  # Pseudo-code (TODO: these docs)
    pass  # Retry?
else:
    mycode.save_data(data, when=meta['when'])
    next_anime = meta['id_ref'] + 1
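The returned data dict can be post-processed without further network calls. A minimal sketch, using an illustrative sample dict in place of a real result (real values use the library's enums and datetime.date objects, and the show name here is only an example):

```python
from datetime import date

# Illustrative stand-in for a `data` dict returned by mal_scraper.get_anime;
# only a subset of keys is shown.
sample = {
    'name': 'Cowboy Bebop',
    'episodes': 26,                    # int, or None when MAL does not know
    'airing_started': date(1998, 4, 3),
    'airing_finished': date(1999, 4, 24),
    'mal_score': 8.8,                  # float, or None when not yet aired
}

def describe(data):
    """Build a one-line summary, guarding the fields that may be None."""
    episodes = data['episodes'] if data['episodes'] is not None else '?'
    score = data['mal_score'] if data['mal_score'] is not None else 'unrated'
    return '%s (%s eps, score %s)' % (data['name'], episodes, score)

print(describe(sample))
```

Guarding every nullable field (episodes, scores, dates) in this way avoids TypeErrors on shows that have not yet aired.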
mal_scraper.get_user_anime_list(user_id, requester=requests)[source]

Return the anime listed by the user on their profile.

This will make multiple network requests (possibly > 10).

TODO: Return Meta

Parameters:
  • user_id (str) – The user identifier (i.e. the username).
  • requester (requests-like, optional) – HTTP request maker. This allows us to control/limit/mock requests.
Returns:

A list of anime-info where each anime-info is the following dict:

{
    'name': (str) name of the anime,
    'id_ref': (id_ref) can be used with mal_scraper.get_anime,
    'consumption_status': (mal_scraper.ConsumptionStatus),
    'is_rewatch': (bool),
    'score': (int) 0-10,
    'progress': (int) 0+ number of episodes watched,
    'tags': (set of str) user tags,
}

The following fields have been removed for now:

    'start_date': (date, or None) may be missing,
    'finish_date': (date, or None) may be missing or not finished,

See also ConsumptionStatus.

Raises:
  • Network and Request Errors – See Requests library.
  • RequestError – RequestError.Code.forbidden if the user’s info is private, or RequestError.Code.does_not_exist if the user_id is invalid. See RequestError.Code.
  • ParseError – Upon processing the web-page including anything that does not meet expectations.
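A sketch of aggregating the returned list offline. The entries below are illustrative stand-ins (real entries carry mal_scraper.ConsumptionStatus values rather than the plain strings used here):

```python
# Illustrative stand-in for a list returned by mal_scraper.get_user_anime_list.
anime_list = [
    {'name': 'A', 'id_ref': 1, 'consumption_status': 'completed', 'score': 8},
    {'name': 'B', 'id_ref': 2, 'consumption_status': 'watching', 'score': 0},
    {'name': 'C', 'id_ref': 3, 'consumption_status': 'completed', 'score': 6},
]

def mean_completed_score(entries):
    """Average score over completed shows, skipping unscored (0) entries."""
    scores = [e['score'] for e in entries
              if e['consumption_status'] == 'completed' and e['score'] > 0]
    return sum(scores) / len(scores) if scores else None

print(mean_completed_score(anime_list))
```

Treating a score of 0 as “unscored” matters here: MAL lists unrated shows with score 0, so averaging them in would skew the result downwards.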
mal_scraper.get_user_stats(user_id, requester=requests)[source]

Return statistics about a particular user.

TODO: Return Gender (“Male”/“Female”)
TODO: Return Birthday (“Nov”, “Jan 27, 1997”)
TODO: Return Location (“England”), e.g. https://myanimelist.net/profile/Sakana-san

Parameters:
  • user_id (str) – The username identifier of the MAL user.
  • requester (requests-like, optional) – HTTP request maker. This allows us to control/limit/mock requests.
Returns:

Retrieved – with the attributes meta and data.

data:

{
    'name': (str) user_id/username,
    'last_online': (datetime),
    'joined': (datetime),
    'num_anime_watching': (int),
    'num_anime_completed': (int),
    'num_anime_on_hold': (int),
    'num_anime_dropped': (int),
    'num_anime_plan_to_watch': (int),
}

Raises:
  • Network and Request Errors – See Requests library.
  • RequestError – RequestError.Code.does_not_exist if the user_id is invalid (i.e. the username does not exist). See RequestError.Code.
  • ParseError – Upon processing the web-page including anything that does not meet expectations.
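The five per-status counters in the returned data dict can be combined into an overall list size. A minimal sketch using illustrative sample values:

```python
# Illustrative stand-in for a `data` dict returned by mal_scraper.get_user_stats;
# datetime fields are omitted for brevity.
stats = {
    'name': 'Sakana-san',
    'num_anime_watching': 3,
    'num_anime_completed': 120,
    'num_anime_on_hold': 5,
    'num_anime_dropped': 7,
    'num_anime_plan_to_watch': 40,
}

def total_list_entries(data):
    """Sum the five per-status counters into one overall list size."""
    keys = ('num_anime_watching', 'num_anime_completed', 'num_anime_on_hold',
            'num_anime_dropped', 'num_anime_plan_to_watch')
    return sum(data[k] for k in keys)

print(total_list_entries(stats))
```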