Integrity support

 

Before emailing your question, please take a quick look at the FAQs below to see whether it's already answered.

Also see Integrity's home page for full version history and other information.

Failing that, please email me at shiela@peacockmedia.co.uk. Don't forget to tell me the url that you're starting from.

What does the "page titles are unique" option do?

Choosing this option lets Integrity crawl your site more quickly and more accurately, but it only works if each of your pages has a different title.

After checking each internal link, Integrity then has to fetch the contents of the page, read through it and pull out the links from that page. That's how it crawls the site. It'll get a link like "index.html" lots of times (on every page perhaps), so before fetching the contents it has to decide whether it's done that page already. It compares the new link with the list of those it's already done.

Integrity used to use the url to determine this. However, it's often the case that the same page is referred to by a number of different urls. For example, peacockmedia.co.uk and peacockmedia.co.uk/index.html are the same page, but Integrity can't know that. Some content management systems can refer to the same page by quite a few different urls. That means that Integrity could do lots more work than it needed to, and over-report the number of links and pages.
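To make the over-reporting concrete, here is a minimal sketch in Python. It is not Integrity's own code: it assumes, as the option's name suggests, that with "page titles are unique" set a page counts as already done when its title has been seen before. The page list and the count_unique_pages helper are made up for illustration.

```python
def count_unique_pages(pages, key):
    """Count how many pages look distinct when compared by `key` ('url' or 'title')."""
    seen = set()
    for page in pages:
        seen.add(page[key])
    return len(seen)


# Hypothetical crawl: the first two urls serve the same home page.
pages = [
    {"url": "peacockmedia.co.uk", "title": "Home"},
    {"url": "peacockmedia.co.uk/index.html", "title": "Home"},
    {"url": "peacockmedia.co.uk/support.html", "title": "Support"},
]

by_url = count_unique_pages(pages, "url")      # the home page is counted twice
by_title = count_unique_pages(pages, "title")  # the home page is counted once
```

Comparing by url reports three pages here, while comparing by title reports two. If two genuinely different pages shared a title, comparing by title would wrongly merge them, which is why the option only works when every title is different.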

Should I set "ignore querystrings"?

The querystring is information within the url of a page. It follows a '?', for example www.mysite.co.uk/index.html?thisis=thequerystring. If you don't use querystrings on your site, then it won't matter whether you set this option. If your page is the same with or without the querystring (for example, if it contains a session id) then check 'ignore querystrings'. If the querystring determines which page appears (for example, if it contains the page id) then you shouldn't ignore querystrings, because Integrity won't crawl your site properly.
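The two cases can be sketched in Python. This is only an illustration of the idea, not how Integrity does it internally; the strip_querystring helper and the example urls (sessionid, pageid) are assumptions made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_querystring(url):
    """Return the url with everything after the '?' removed (the fragment is kept)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.fragment))


# Case 1: the querystring holds a session id, so both urls are the same page.
a = strip_querystring("http://www.mysite.co.uk/index.html?sessionid=abc123")
b = strip_querystring("http://www.mysite.co.uk/index.html?sessionid=xyz789")
same_page_merged = (a == b)  # ignoring the querystring is correct here

# Case 2: the querystring picks the page, so these are two different pages.
c = strip_querystring("http://www.mysite.co.uk/index.html?pageid=1")
d = strip_querystring("http://www.mysite.co.uk/index.html?pageid=2")
different_pages_merged = (c == d)  # ignoring the querystring loses a page here
```

In the first case, ignoring the querystring stops the same page being crawled once per session id. In the second, both urls collapse to one, so the second page would never be crawled.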

I need to make Integrity appear to be a 'real' browser

You can change Integrity's user-agent string to make it appear to the server to be a browser (known as 'spoofing').

Go to Integrity > Preferences and paste your chosen user-agent string into the box.

There is an incredibly comprehensive list of browser user-agent strings on this page: http://www.zytrax.com/tech/web/browser_ids.htm

If you would like to find the user-agent string of the browser you're using now, just hit this link:
What's my user-agent string?
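For anyone curious what the user-agent string actually does: it is sent to the server as the User-Agent header of each request. The Python sketch below builds (but does not send) a request carrying a browser-style string. It shows the general mechanism only, not Integrity's own code, and both the url and the Safari-style string are examples chosen for illustration.

```python
from urllib.request import Request

# Example browser-style user-agent string (an assumption; use whichever you prefer).
SPOOFED_USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15"
)

# Build the request with the custom header; nothing is sent over the network here.
request = Request(
    "http://www.mysite.co.uk/index.html",
    headers={"User-Agent": SPOOFED_USER_AGENT},
)

# urllib stores header names with only the first letter capitalised.
sent_user_agent = request.get_header("User-agent")
```

A server that treats unknown clients differently would see this request as coming from the browser named in the string.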