Integrity

If you've maintained a website for any length of time, you'll know that links very quickly become broken.
We all move, delete or change pages, and when we do, it not only results in our own internal links breaking, but other people's links to our website becoming broken. Similarly, when other people alter their pages, our own external links become broken.
A broken link on your site is a dead end for your visitors and will also be bad news for your search engine optimisation (SEO).
Unless you enjoy clicking every single link on your site and then the back button, you'll need a website crawler like Integrity!
Feed it your home page address (URL) and Integrity will follow all of your internal links to find your pages, checking the server response code for every internal and external link it finds.
Integrity is donationware, which means that it's available to personal users free of charge with no restrictions. I'm very grateful for donations and if you choose to donate, it will encourage further development of this and other OSX software.
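For the curious, the crawl described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not Integrity's own code: the names LinkExtractor, crawl and fetch are made up for the example, and fetch is a function you supply that returns a status code and the page's HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch):
    """Check every link reachable from start_url.

    fetch(url) is a caller-supplied function (an assumption of this sketch)
    returning (status_code, html_text). Returns {url: status_code}.
    """
    site = urlparse(start_url).netloc
    results = {}
    queue = [start_url]
    while queue:
        url = queue.pop(0)
        if url in results:
            continue  # already checked
        status, html = fetch(url)
        results[url] = status
        # External links are checked but not followed, and pages that
        # failed to load are not parsed for further links.
        if urlparse(url).netloc != site or not 200 <= status < 300:
            continue
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.hrefs:
            absolute, _fragment = urldefrag(urljoin(url, href))
            # Skip mailto:, javascript: and other non-web schemes.
            if urlparse(absolute).scheme in ("http", "https"):
                queue.append(absolute)
    return results
```

Passing fetch in as a parameter keeps the sketch independent of any particular HTTP library; a real one could be built on urllib.request or similar.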
System Requirements
Mac OSX 10.3 or higher. (Note that from v1.4 onwards, 10.2 is no longer supported.)
Mac OSX Download
Integrity v3.2 is here
Now checks your images and has an improved user interface. See the version history below for full details.
PC Version?
If you're of the Windows persuasion, use Xenu's Link Sleuth. The developer has made it clear that he's not interested in producing a specific Mac version. I've no connection with Tilman Hausherr (though he seems like a great guy), and this is no more than a personal recommendation to use Link Sleuth if you're a PC user.
Version History
Version 3.2
released November 2009
Changes to the user interface. Current url is displayed in a combo box along with the 'go' button at the top of the main window. The settings for the current url (previously called 'current config') are now displayed in the default tab of the main window. Flat and sortable views are now switched using tab buttons at the bottom of the main window.
Option for checking broken images added. Image urls are denoted by [img src] in the link text column.
Bug fix - alt text is now correctly shown (if it exists) in the link text column when the link contains an image rather than text. For example: [linked image]:NHS Direct
Some improvements to saving / deleting of settings for current site.
Auto-complete added to main url combo box. However, this only works if you type the 'http' or 'www' or however the saved url starts.
Progress indicator added for 'recheck this link'. Response time and time stamp are also correctly updated.
Help and donate links updated.
Automatic checking for updates. Checks for updates on startup. If a new version is available, informs user and invites visit to download page.
Version 3.1.2
released October 2009
Explicitly doesn't handle cookies (random behaviour previously).
Now picks up links within imagemap area tags.
Version 3.1.1
released March 2009
Fixes bug which stopped further crawling if initial page is redirected.
Small efficiency/speed improvement.
Fixes bug which could register incorrect links if a request is redirected more than once.
Version 3.1
released December 2008
Time stamp logged for each link checked
Views are now customisable - show or hide columns as you like. (Exported files reflect visible columns.)
"Redirected" no longer shows in status column as the information is available in its own column
New application icon with less transparency
Version 3.02
released December 2008
Fixes bug related to unquoted href's
Unique page titles option (was new with v3.0 - crawls site faster and more accurately if you set this option and if your page titles *are* unique) now defaults to off for existing configs; defaulting to on was causing confusion.
Version 3.01
released December 2008
Fixes bug preventing proper crawling of framesets
Fixes problem with pause/continue button
Fixes problem with About panel
Version 3
released December 2008
Adds 'Inspect Bad Links' to View menu (opens the first bad link in the link inspector)
Adds 'Next Bad Link' button to link inspector (moves the link inspector to the next bad link if there is one)
Adds two new tools to the toolbar for 'Inspect bad links' and 'Inspect selected link' and a 'Customise Toolbar...' menu item
Adds highlighting feature - double-click an 'On page' entry in the link inspector's list and Integrity will open the selected page and highlight the selected link with a coloured background or coloured border.
Adds drop-down lists to preferences allowing you to choose the style of the highlighting (border / background, style and width of border)
Adds 'Archive pages while crawling' checkbox to preferences (archives pages while crawling - asks you for a save location when crawl is finished).
Version 2.2.2
released September 2008
If the link is around an image rather than text, the 'link text' columns will display [img]: and the alt text of the image.
'Redirected to' column added to flat view.
Changes to button bar, including addition of export as html, csv and text (tdl) buttons. Now properly autosaves user customisation.
More information in the status display - now also shows how many bad links have been found
Version 2.2.1
released September 2008
Was generating the 'flat view' multiple times, giving the impression of 'hanging' after crawling large sites using lots of threads. Bug fixed, and progress bar added.
Version 2.2
released July 2008
Server response time is logged. This is the time taken between Integrity sending the request and receiving the first response. This may not reflect the actual server response time if Integrity is running a large number of threads, or if the internet connection is busy
When Integrity has finished running, a 'flat' view is available, that can be sorted by any of the columns
Global preferences and current config are now combined into one tabbed window
Standard customisable toolbar added and main window rearranged. Stop is now renamed 'Pause'
Version 2.1
released June 2008
Crawls local files (drag the file into the 'starting URL' box)
Version 2.0 (beta)
Architecture / logic changed. This fixes thread-safety issues (ie v1.x crashing on faster machines when using larger number of threads). Architecture change also makes v2 faster.
Now handles sites built using frames.
Max number of threads increased. This was limited in version 1.6.6 as a quick-fix to thread-safety issues. Max number of threads (when slider is in 'more' position) is now 29, was 7.
'Threads' are no longer really separate threads owned by Integrity, but simultaneous asynchronous requests.
Version 1.6.11
released May 2008
Fixes bug which was causing some links to be skipped on certain pages. Integrity's parser was sometimes getting confused by javascript on pages containing 'less than' and 'greater than' operators.
Other small fixes and efficiencies.
Version 1.6.10
released April 2008
Progress indicators added to export functions.
Link info window now shows all occurrences of a link alongside the link text for each occurrence.
Version 1.6.9
released April 2008
Fixes bug related to trimming which randomly prevented complete crawling of whole site.
Revised handling of incorrectly nested quotes - now correctly allows for apostrophes as part of url ( "/pdf/Educators'_Guide" ).
Help menu now links to support pages of peacockmedia.co.uk, 'Donate' menu option added.
Version 1.6.8
released April 2008
Routines for trimming whitespace, querystring etc rewritten in pure C, improving efficiency.
Better handling of incorrectly nested single/double quotes ( href = "http://..' )
Now correctly handles base href's which don't give a scheme (assumes http://)
Better trimming of whitespace, ie carriage returns and other control characters in unexpected places in the middle of <a ..> tags
Shows how many times a link occurs, not just how many pages it appears on (ie it may appear multiple times on same page).
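To illustrate the base href handling mentioned above, here is a minimal Python sketch. It is not Integrity's own code (the 1.6.8 routines were written in C); the function name resolve is hypothetical, and the sketch assumes a scheme-less base href starts with a host name, eg "www.example.com/docs/".

```python
from urllib.parse import urljoin, urlparse


def resolve(page_url, base_href, link_href):
    """Turn a link's href into an absolute URL.

    If the page declares a base href, links resolve against it;
    otherwise they resolve against the page's own URL.
    """
    if not base_href:
        return urljoin(page_url, link_href)
    base = base_href.strip()
    if not urlparse(base).scheme:
        # No scheme given: assume http:// (host-first base assumed).
        base = "http://" + base.lstrip("/")
    return urljoin(base, link_href)
```

For example, resolve("http://example.com/a/b.html", "www.example.com/docs/", "guide.html") gives "http://www.example.com/docs/guide.html".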
Version 1.6.7
released March 2008
Fixes bug which prevented links being found on a page if the end of a comment and an 'end script' tag were adjacent to each other ( --></script> )
Version 1.6.6
released March 2008
Sends user-agent string in header - default is "integrity/1.6" but this can be changed (see Preferences) if your site needs Integrity to appear to be a recognised browser.
Other fixes and efficiencies.
Version 1.6.5
released November 2007
'Whitelists' and 'blacklists' from the config are no longer case-sensitive.
Some problems with MCMS zref fixed. zrefs are now shown when good links are hidden.
Links which are not checked because they are in the blacklist are treated as good links. They are hidden when good links are hidden and are given no colour label.
"Hide good links" button has now become "Show bad links only". This subtle change means that links which have not been checked will not show and improves running.
Small fixes and efficiencies.
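As an illustration of the case-insensitive blacklist and the 'treated as good' rule above, here is a minimal Python sketch. It assumes blacklist entries are matched as plain substrings of the URL (as the 'don't check URLs containing' wording suggests); the function names and status strings are hypothetical, not Integrity's own.

```python
def is_blacklisted(url, blacklist):
    """Return True if any blacklist term occurs in the URL, ignoring case."""
    lowered = url.lower()
    return any(term.lower() in lowered for term in blacklist if term)


def effective_status(url, checked_status, blacklist):
    """Blacklisted links are never checked and are reported as good."""
    if is_blacklisted(url, blacklist):
        return "good"
    return checked_status
```

Lower-casing both sides before comparing means "Admin", "ADMIN" and "admin" in the config all behave the same way.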
Version 1.6.4
released October 2007
Fixes problem with tab-delimited file export
Both tab-delimited and comma-separated exports are 'flat', ie each 'on page url' has its own row
Fixes crashes or problems caused by carriage returns or whitespace present within a quoted href (yes, some html has really unexpected features)
Ignores Javascript (anything between <script> tags)
More object retention fixes and small efficiencies
Version 1.6.2
released September 2007
'on page url' will now recognise 'http://peacockmedia.co.uk' and 'http://peacockmedia.co.uk/' as the same link. Therefore a broken link may more correctly be reported on a lower number of pages and the whole application is a little more efficient.
Recognises and reports 'zref' links, a difficult-to-find link inserted by Microsoft Content Management Server
Other small efficiencies and fixes.
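The trailing-slash matching described above could be sketched like this in Python. It is an illustration only, and the function name normalise is made up for the example. The sketch only treats an empty path as "/"; it deliberately leaves 'http://example.com/a' and 'http://example.com/a/' distinct, since a server may treat those as different pages.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise(url):
    """Give a URL with no path the root path, so both forms compare equal."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )
```

With this, normalise('http://peacockmedia.co.uk') and normalise('http://peacockmedia.co.uk/') both return 'http://peacockmedia.co.uk/', so the two can share one entry in the results.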
Version 1.6.1
released July 2007
Some changes to improve stability
Version 1.6
released July 2007
Adds user-definable colour labels (see Preferences). A 'good link' is defined as server response code 2xx, redirected links include any 3xx code, a bad link is a 4xx code, and an 'error' is a 5xx server code or any other error.
Menu item added View > Info for Current Item (command-I), shows link inspector palette (previously only available via double-click in the main table).
Fixes bug causing crash if no internet connection.
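The four link categories defined above map onto status codes roughly as in this Python sketch. The function name classify and the returned strings are hypothetical, and passing None to mean 'no response at all' (eg a timeout) is an assumption made for the example.

```python
def classify(status_code):
    """Map an HTTP status code to one of the four link categories."""
    if status_code is None:
        return "error"  # no response received, eg a timeout
    if 200 <= status_code <= 299:
        return "good"
    if 300 <= status_code <= 399:
        return "redirected"
    if 400 <= status_code <= 499:
        return "bad"
    return "error"  # 5xx, or anything else unexpected
```

So a 404 is a bad link, a 301 is a redirect and a 503 counts as an error.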
Version 1.5
released 28 May 2007
Supports base href.
Can now export tab-delimited text file along with CSV, plain text and HTML.
Improved HTML export - link urls are presented as links.
Adds 'Only follow links containing...' field.
Fixes bug allowing some 'commented out' urls to be tested.
Fixes bug preventing inspector window opening when some links double-clicked.
Preferences window added: allows choice of displaying 'on page' as url or page title.
Config Starting URL drop-down list behaviour improved.
Version 1.4.2
released May 21 2007
No longer parses and extracts links from error pages (eg 404 pages).
Now handles spaces in URLs (as long as correctly contained in single or double quotes).
Version 1.4
released April 22 2007
Fixes a problem in some earlier versions which prevented all links being found on some pages
HTML character entities in links are now 'un-encoded' (eg '&amp;' is replaced with '&') before link is checked.
If link appears on more than one page, main table now shows actual number of pages rather than "multiple"
'Re-Check Bad Links' feature added (under File menu)
Fixes problem with export to CSV for some sites.
NB. early copies of 1.4 give the version number as 1.3.1 in about box.
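As an illustration of the entity 'un-encoding' mentioned above, Python's standard library can do the same job with html.unescape. This is a sketch of the idea rather than Integrity's own routine, and the function name decode_href is made up for the example.

```python
from html import unescape


def decode_href(href):
    """Replace HTML character entities in an href with the real characters."""
    return unescape(href)
```

For example, decode_href("/search?q=cats&amp;page=2") returns "/search?q=cats&page=2", which is the address a browser would actually request.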
Version 1.3.1
released April 7 2007
Fixes problem with the 'don't check URLs containing' feature which didn't work properly in v1.3
Fixes problem which caused some links to be missed
Small improvement to the stop button
Version 1.3
released April 6 2007
'This page only' checkbox added.
Status display more accurately shows number of links done.
Programme flow, thread safety and object retention improvements. Cures an instability which seemed to be related to websites which have large collections of external links and/or setting a larger number of threads.
Fixes bug preventing some link text from being recorded properly.
For some file types which may be larger files (pdf, mpg, mp3, jpg) the parser no longer sends an http request to check the 'Content-Type', speeding up the crawl time.
Version 1.2
released March 29 2007
Now tolerant to excessively long hrefs (previously hrefs over 1000 characters would break an internal limit and cause the application to crash).
Timeout can now be set in the config window. Using a very large number of threads can obviously make timeouts more likely and so the timeout figure can now be increased accordingly.
The link inspector window (double-click an entry in the main table) now shows the 'on page' list in a form which is clickable. A double-click will open the page in question.
The HTML report now shows the 'on page' column as links to the page in question.
Version 1.1
released March 26 2007
Link text shows up for more links - link text is still only held once regardless of how many instances of that link are found on the site, but if a link has no text (eg image link), then that will not overwrite the existing link text.
Ignores javascript links as well as mailto links.
Fixes bug triggered by a return within the tag.
Fixes bug which could prevent all links being found on certain pages.
Version 1.0
released March 25 2007
First non-beta release, free and not set to expire. Not generally released, but provided to 2 magazine coverdiscs.
Version 0.5 (Beta)
released March 22 2007
Corrected problem which allowed cached data to be checked - new data is now requested every time.
Fixes bug which could prevent some links being found if javascript present in page.
Version 0.4 (Beta)
released March 21 2007
Bug fixed which prevented some relative URLs from being formed correctly
Displays better information about any redirected urls. The final status code shown is the status for the final (redirected to) URL
Link text included as column in main table
Change to programme flow and a number of small refinements and efficiency improvements meaning that the application remains responsive throughout larger crawls.
Bug fixed which prevented some configs saving properly
Version 0.3 (Beta)
released March 7 2007
Improved interface, added 'Continue' button, allows Integrity to be paused and re-started.
Exporting - results can be exported as HTML, CSV or plain text.
Version 0.2 (Beta)
released March 1 2007
Fixes bug preventing Integrity from following links where html is all uppercase.
Version 0.1 (Beta)
released Feb 2007