Reddit allows requests of up to 100 items at a time. If you request 100 items or fewer, PRAW can serve the request in a single API call; for larger requests, PRAW breaks it into multiple API calls of up to 100 items each, separated by a small delay (about 2 seconds) to follow the API guidelines. So requesting 250 items requires 3 API calls and takes at least 2 × 2 = 4 seconds, since the delay applies between consecutive calls. PRAW makes these API calls lazily, i.e. it does not send the next call until you actually need the data, so the effective runtime per batch is roughly max(API delay, your code's execution time).
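The batching and laziness described above can be sketched with a plain-Python generator. This is a simulation of the behavior, not PRAW's actual implementation: fetch_batch and PAGE_SIZE are made-up stand-ins for a Reddit listing request.

```python
import math

PAGE_SIZE = 100  # Reddit's maximum items per listing request


def fetch_batch(start, count):
    """Stand-in for one Reddit API call returning up to `count` items."""
    return list(range(start, start + count))


def lazy_listing(limit):
    """Yield `limit` items, fetching one page per simulated API call.

    Because this is a generator, the next page is not fetched until the
    consumer actually asks for data past the current page (laziness).
    """
    fetched = 0
    while fetched < limit:
        count = min(PAGE_SIZE, limit - fetched)
        for item in fetch_batch(fetched, count):  # one API call per page
            yield item
        fetched += count


# Requesting 250 items costs ceil(250 / 100) = 3 API calls.
api_calls = math.ceil(250 / PAGE_SIZE)
items = list(lazy_listing(250))
```

If you only consume the first 50 items of the generator, only the first page (one call) is ever fetched, which is what makes the laziness pay off.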
Regarding comment extraction:
from praw.models import MoreComments

for top_level_comment in submission.comments:
    if isinstance(top_level_comment, MoreComments):
        continue
    print(top_level_comment.body)
In the above snippet, isinstance() is used to check whether an item in the comment list is a MoreComments object so that it can be skipped. But there is a better way: the CommentForest object has a method called replace_more(), which replaces or removes MoreComments objects from the forest.

Each replacement requires one network request, and its response may yield additional MoreComments instances. As a result, by default replace_more() only replaces at most 32 MoreComments instances; all other instances are simply removed. The maximum number of instances to replace can be configured via the limit parameter (limit=None replaces all of them, and limit=0 removes them all without any replacement).

— from the PRAW docs
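The replace-or-remove semantics can be sketched with a toy comment forest. This is a simulation of the documented behavior, not PRAW's real implementation: the ("MORE", [...]) tuples and the replace_more function below are made-up stand-ins for MoreComments objects and CommentForest.replace_more().

```python
# Toy forest: comments are strings, unresolved batches are ("MORE", [children]).
forest = [
    "c1",
    ("MORE", ["c2", ("MORE", ["c3"])]),  # resolving may reveal further MOREs
    "c4",
    ("MORE", ["c5"]),
]


def replace_more(forest, limit=32):
    """Replace up to `limit` MORE placeholders, one simulated network
    request each; any placeholders beyond the budget are removed."""
    requests = 0
    result = list(forest)
    i = 0
    while i < len(result):
        node = result[i]
        if isinstance(node, tuple) and node[0] == "MORE":
            if requests < limit:
                requests += 1              # one network request
                result[i:i + 1] = node[1]  # splice the children in place
                continue                   # re-check the same index
            del result[i]                  # over budget: just remove it
            continue
        i += 1
    return result, requests


comments, n = replace_more(forest, limit=2)  # budget of 2 requests: "c5" is dropped
all_comments, n_all = replace_more(forest)   # default budget resolves everything here
```

With limit=2 the second top-level MORE exceeds the budget and is removed, so "c5" never appears; with the default budget all three placeholders are resolved in three simulated requests.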