Bookmarks Database and Internet Robot
WHAT IS IT
This is a set of classes, libraries, programs and plugins I use to
manipulate my bookmarks.html. I like Netscape Navigator, but I need more
features, so I wrote and maintain these programs for my own needs. In
particular, I want to extend Navigator's "What's new" feature (Navigator 4
calls it "Update bookmarks").
WHAT'S NEW in version 3.4.0
Updated to m_lib version 1.2. Extended support for Mozilla.
WHAT'S NEW in version 3.3.2
parse_html.py can now recode unicode entities in titles.
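The recoding mentioned above can be sketched in a few lines of modern Python; this is an illustration of the idea, not the project's actual parse_html.py code, and the function name is hypothetical.

```python
# Hedged sketch: decode named and numeric HTML entities in a bookmark
# title, similar in spirit to what parse_html.py does for titles.
from html import unescape

def recode_title(title):
    """Replace entities like &amp; and &#8212; with real characters."""
    return unescape(title)

print(recode_title("Tom &amp; Jerry &#8212; &quot;Classics&quot;"))
```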
WHAT'S NEW in version 3.3.0
Requires Python 2.2.
HTML parser. If the protocol is HTTP, and there is a Content-Type header, and
the content type is text/html, the object is parsed to extract its title; if
the Content-Type header has a charset, or if the HTML has a <META> tag with a
charset, the title is converted from the given charset to the default charset.
The object is also parsed to extract a <META> tag with a redirect.
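The extraction step described above can be sketched with modern Python's html.parser; this is an illustrative reimplementation under my own class name, not the project's actual parser, and it omits the charset-conversion step.

```python
# Hedged sketch: pull the title, the declared charset and any <META>
# refresh redirect out of an HTML document.
from html.parser import HTMLParser

class BookmarkHTMLParser(HTMLParser):
    """Collect <title> text, charset declarations and <META> redirects."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.charset = None
        self.redirect = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)          # tag and attr names arrive lowercased
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            if attrs.get("charset"):                 # <meta charset="...">
                self.charset = attrs["charset"]
            http_equiv = (attrs.get("http-equiv") or "").lower()
            content = attrs.get("content", "")
            if http_equiv == "content-type" and "charset=" in content:
                self.charset = content.split("charset=")[-1].strip()
            elif http_equiv == "refresh" and "url=" in content.lower():
                # e.g. <META HTTP-EQUIV="Refresh" CONTENT="0; URL=http://...">
                self.redirect = content.split("=", 1)[1].strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

parser = BookmarkHTMLParser()
parser.feed('<html><head><meta charset="utf-8">'
            '<title>Example</title></head></html>')
print(parser.title, parser.charset)
```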
WHAT'S NEW in version 3.0
Complete rewrite from scratch. Created mechanism for pluggable storage
managers, writers (DB dumpers/exporters) and robots.
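The pluggable mechanism can be pictured as a simple registry that maps a component kind and name to a class; the names and decorator below are my own illustration of the pattern, not the project's actual API.

```python
# Hedged sketch of a pluggable-component registry: storage managers,
# writers and robots are registered by name and looked up at run time.
REGISTRY = {"storage": {}, "writer": {}, "robot": {}}

def register(kind, name):
    """Class decorator that records a plugin under the given kind/name."""
    def decorator(cls):
        REGISTRY[kind][name] = cls
        return cls
    return decorator

@register("storage", "pickle")
class PickleStorage:
    def load(self): ...
    def store(self, bookmarks): ...

def get_plugin(kind, name):
    """Instantiate the plugin registered under kind/name."""
    return REGISTRY[kind][name]()

storage = get_plugin("storage", "pickle")
```

Looking components up by name like this lets a command-line option or a config file choose the storage manager, writer and robot independently.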
WHERE TO GET
Master site: http://phd.pp.ru/Software/Python/#bookmarks_db
Faster mirrors: http://phd.by.ru/Software/Python/#bookmarks_db
http://phd2.chat.ru/Software/Python/#bookmarks_db
AUTHOR
Oleg Broytmann
COPYRIGHT
Copyright (C) 1997-2002 PhiloSoft Design
LICENSE
GPL
STATUS
Storage managers: pickle, FLAD (Flat ASCII Database).
Writers: HTML, text, FLAD (full database or only errors).
Robots (URL checkers): simple, simple+timeoutsocket, forking.
TODO
Parse the downloaded file and extract additional information from headers
and parsed data - the title, for example. Or redirects using <META> tags.
(Partially done - titles are now extracted.)
Documentation.
Merge "writers" into storage managers.
New storage managers: shelve, SQL, ZODB, MetaKit.
Robots (URL checkers): threading, asyncore-based.
Aliases in bookmarks.html.
Configuration file for defaults - global defaults for the system and local
defaults for subsystems.
Ruleset-based mechanism to decide which URLs to check: filtering based on
URL scheme, host, port, path, filename, extension, etc.
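Such a ruleset could be expressed as dictionaries of glob patterns matched against the parts of a URL; this is a hypothetical sketch of the proposed feature, not an implemented interface, and the rule format is my own assumption.

```python
# Hedged sketch: a rule is a dict of URL fields to glob patterns; a URL
# matches when every field named in the rule matches its pattern.
from urllib.parse import urlsplit
import fnmatch

def url_matches(url, rule):
    """Return True if every field in the rule matches the given URL."""
    parts = urlsplit(url)
    path = parts.path
    fields = {
        "scheme": parts.scheme,
        "host": parts.hostname or "",
        "port": str(parts.port or ""),
        "path": path,
        "extension": path.rsplit(".", 1)[-1] if "." in path else "",
    }
    return all(fnmatch.fnmatch(fields[name], pattern)
               for name, pattern in rule.items())

rule = {"scheme": "http", "host": "*.example.com"}
print(url_matches("http://www.example.com/index.html", rule))
```

A checker would then skip any URL matching a "deny" rule and check the rest.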
Detailed reports on robot runs - what's old, what's new, what was moved,
errors, etc.
WWW-interface to the report.
Bigger database. Multiuser database. Robot should operate on a part of
the DB.
WWW-interface to the database. Users will be able to import/export/edit
bookmarks, schedule robot runs, etc.