On a larger site, memory use and processor load increase as the lists of pages crawled and links checked grow longer.
If the site is large enough, the app will eventually run out of memory and can't continue. Version 4 handles larger sites, however, and should take two hundred thousand links in its stride.
Crawling a larger site
- Crawl the site in parts, if you can break it down into sections. You can use the blacklist and whitelist rules to limit the crawl; from version 4 onwards, Integrity and Scrutiny also limit themselves to the directory you start in (e.g., to crawl everything under /engineering, simply start at mysite.com/engineering).
- Make sure Integrity isn't going into a loop or crawling the same page multiple times because of a session ID or date in a querystring. You can exclude these pages by blacklisting part of the URL or querystring, or by ignoring querystrings.
- Check whether you're crawling unnecessary pages, such as a messageboard. To Integrity and Scrutiny, a well-used messageboard can look like tens of thousands of unique pages, and they will try to list and check all of them. Again, you can exclude these pages by blacklisting part of the URL or querystring, or by ignoring querystrings.
- The 'Page titles are unique' option can also help you avoid duplicate or unnecessary pages and makes the crawl quicker, but it only works if every page really does have a unique title.
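To illustrate why session IDs, dates in querystrings and messageboards inflate a crawl, here is a minimal Python sketch of the kind of URL filtering the tips above describe. It is not how Integrity or Scrutiny work internally: the setting names (`START_URL`, `BLACKLIST`, `DROP_PARAMS`), the example URLs and the rule of treating the starting path as a directory are all assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical settings for illustration only; these are NOT Integrity's or
# Scrutiny's real configuration names.
START_URL = "https://mysite.com/engineering"
BLACKLIST = ["/forum/", "/messageboard/"]  # skip URLs containing any of these
DROP_PARAMS = {"sessionid", "date"}        # querystring keys that only create duplicates


def in_scope(url, start=START_URL):
    """True if url is on the same host and at or below the starting directory.

    Assumption: the starting path is treated as a directory, so
    /engineering covers /engineering/... but not /engineering-old.
    """
    s, u = urlsplit(start), urlsplit(url)
    directory = s.path.rstrip("/") + "/"
    return u.netloc == s.netloc and (
        u.path == directory.rstrip("/") or u.path.startswith(directory)
    )


def should_crawl(url):
    """Apply the scope rule, then the blacklist (substring match on the whole URL)."""
    return in_scope(url) and not any(pattern in url for pattern in BLACKLIST)


def normalise(url, ignore_querystrings=False):
    """Drop the fragment, plus either the whole querystring or just the noisy keys."""
    parts = urlsplit(url)
    if ignore_querystrings:
        query = ""
    else:
        kept = [
            (k, v)
            for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in DROP_PARAMS
        ]
        query = urlencode(kept)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def plan_crawl(links, ignore_querystrings=False):
    """Return the unique, in-scope, non-blacklisted URLs in the order found."""
    seen, queue = set(), []
    for link in links:
        if not should_crawl(link):
            continue
        key = normalise(link, ignore_querystrings)
        if key not in seen:
            seen.add(key)
            queue.append(key)
    return queue


links = [
    "https://mysite.com/engineering/a.html?sessionid=123",
    "https://mysite.com/engineering/a.html?sessionid=456",
    "https://mysite.com/engineering/forum/thread?id=9",
    "https://mysite.com/sales/b.html",
    "https://mysite.com/engineering/c.html?page=2&date=2020-01-01",
]
print(plan_crawl(links))
# ['https://mysite.com/engineering/a.html', 'https://mysite.com/engineering/c.html?page=2']
print(plan_crawl(links, ignore_querystrings=True))
# ['https://mysite.com/engineering/a.html', 'https://mysite.com/engineering/c.html']
```

The two session-ID variants of a.html collapse into a single URL, and the forum and /sales/ links never enter the queue. Without those rules, each would be another page for the crawler to fetch and keep in memory.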