- Cleanup HTML using BeautifulSoap or Tidy.
+ Clean up HTML before parsing using BeautifulSoup or Tidy.
Parse downloaded file and get javascript redirects.
More and better documentation.
New storage managers: shelve, SQL, ZODB, MetaKit.
More robots (URL checkers): threading, asyncore-based.
- Configuration file for configuring defaults - global defaults for the system
+ Configuration file to configure defaults - global defaults for the system
and local defaults for subsystems.
Ruleset-based mechanisms to filter out what types of URLs to check: checking