On a larger site, memory use and processor load will (necessarily) increase as the lists of pages crawled and links checked grow longer.
If the site is large enough, the app will eventually run out of memory and can't continue. This is likely to happen after several thousand pages or several tens of thousands of links (typically 80,000 to 100,000 links).
Crawling a larger site
- If the site breaks down into sections, you can crawl it in parts using the blacklist and whitelist. For example, to crawl everything under /engineering, start at mysite.com/engineering and type /engineering into the 'Only follow links containing' box.
- Make sure Integrity isn't going into a loop or crawling the same page multiple times because of a session id or date in a querystring. You can exclude these pages by blacklisting part of the URL or querystring, or by ignoring querystrings.
- Check whether you're crawling unnecessary pages, such as a message board. To Integrity and Scrutiny, a well-used message board can look like tens of thousands of unique pages, and the app will try to list and check all of them. Again, you can exclude these pages by blacklisting part of the URL or querystring, or by ignoring querystrings.
- 'Page titles are unique' can also help you avoid duplicating or crawling unnecessary pages, and makes the crawl quicker, but it only works if every page really does have a unique title.
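The effect of these settings can be illustrated with a short Python sketch. This is not how Integrity or Scrutiny are implemented. The function names (`filter_links`, `should_follow`, `normalise`), the plain substring matching and the example URLs are all hypothetical, chosen only to show how a whitelist term ('Only follow links containing'), blacklist terms and ignoring querystrings together cut a list of discovered links down to the pages worth crawling. It also shows why a session id in the querystring makes one page look like many.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise(url, ignore_querystrings):
    """Hypothetical 'ignore querystrings': drop the query (and fragment)."""
    if not ignore_querystrings:
        return url
    parts = urlsplit(url)
    # ?sessionid=123 and ?sessionid=456 now collapse to the same URL
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def should_follow(url, only_follow_containing, blacklist):
    """Hypothetical whitelist/blacklist check using plain substring matching."""
    if only_follow_containing and only_follow_containing not in url:
        return False
    return not any(term in url for term in blacklist)


def filter_links(links, only_follow_containing="", blacklist=(),
                 ignore_querystrings=False):
    """Return the unique URLs that would be crawled, in discovery order."""
    seen = set()
    kept = []
    for link in links:
        # Check the filters against the full URL, so blacklisting part of
        # a querystring still works even when querystrings are ignored.
        if not should_follow(link, only_follow_containing, blacklist):
            continue
        url = normalise(link, ignore_querystrings)
        if url in seen:
            continue  # already queued: don't crawl the same page twice
        seen.add(url)
        kept.append(url)
    return kept


# Example URLs (hypothetical)
links = [
    "https://mysite.com/engineering/a.html?sessionid=123",
    "https://mysite.com/engineering/a.html?sessionid=456",
    "https://mysite.com/engineering/forum/thread?id=9",
    "https://mysite.com/sales/b.html",
]

print(filter_links(links, "/engineering", ["/forum/"], ignore_querystrings=True))
```

With querystrings ignored, the two session-id variants of a.html reduce to one page, the message board is blacklisted and /sales is outside the whitelist, so only https://mysite.com/engineering/a.html remains. Without ignoring querystrings, both session-id variants would be kept and crawled separately.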