+++ /dev/null
-
- BOOKMARKS database and internet robot
-
- Here is a set of classes, libraries and programs I use to manipulate my
-bookmarks.html. I like Netscape Navigator, but I need more features, so I am
-writing these programs for my needs. I need to extend Navigator's "What's new"
-feature (Navigator 4 named it "Update bookmarks").
-
- These programs are intended to run as follows.
-1. bkmk2db converts bookmarks.html to bookmarks.db.
-2. chk_urls (Internet robot) runs against bookmarks.db, checks every URL and
- saves results in check.db.
-3. db2bkmk converts bookmarks.db back to bookmarks.html.
- Then I use this bookmarks file and...
-4. bkmk2db converts bookmarks.html to bookmarks.db.
-5. chk_urls (Internet robot) runs against bookmarks.db, checks every URL and
- saves results in check.db (old file copied to check.old).
-6. (A yet unnamed program) will compare check.old with check.db and generate a
-detailed report. For example:
- this URL is unchanged
- this URL is changed
- this URL is unavailable due to: host not found...
-
- The bookmarks database programs are almost debugged. What remains to be done
-is support for aliases. The second version of the Internet robot is finished.
-
- Although not required, these programs work fine with tty_pbar.py (my little
-module for creating text-mode progress bars).
-
-COPYRIGHT and LEGAL ISSUES
- All programs are copyrighted by Oleg Broytmann and PhiloSoft Design. All
-sources are protected by the GNU GPL. The programs are provided "as-is",
-without any kind of warranty. All usual blah-blah-blah.
-
- #include <disclaimer>
-
-
------------------------------- bkmk2db ------------------------------
- NAME
- bkmk2db.py - script to convert bookmarks.html to FLAD database.
-
- SYNOPSIS
- bkmk2db.py [-its] [/path/to/bookmarks.html]
-
- DESCRIPTION
- bkmk2db.py splits the given file (or ./bookmarks.html) into the FLAD
- database bookmarks.db in the current directory.
-
- Options:
- -i
- Inhibit progress bar. Default is to display progress bar if
- stderr.isatty()
-
- -t
- Convert to text file (for debugging). Default is to convert to
- FLAD.
-
- -s
- Suppress output of statistics at the end of the program. Default
- is to write how many lines the program read and how many URLs it
- parsed. Also suppresses some messages during the run.
-
- BUGS
- The program starts by writing lines to a header file until
- BookmarksParser initializes its own output file (this occurs when the
- parser encounters the first <DL> tag). This is a design flaw.
-
- Empty comments (no text after <DD>) are not marked specially in the
- database, so db2bkmk.py will not reconstruct them. I don't need empty
- <DD>s, so I consider this a feature, not a real bug.
-
- Aliases are not supported (yet).
-
-
------------------------------- db2bkmk ------------------------------
- NAME
- db2bkmk.py - script to reconstruct bookmarks.html back from FLAD
- database.
-
- SYNOPSIS
- db2bkmk.py [-is] [-t dict.db [-r]]
-
- DESCRIPTION
- db2bkmk.py reads bookmarks.db and creates two HTML files:
- public.html and private.html. The latter is just the full
- bookmarks.html, while the former hides the private folder.
-
- Options:
- -i
- Inhibit progress bar. Default is to display progress bar if
- stderr.isatty()
-
- -s
- Suppress output of statistics at the end of the program. Default is
- to write how many records the program processed and how many URLs it
- created. Also suppresses some messages during the run.
-
- -t dict.db
- For most tasks, if someone needs to process bookmarks.db in a
- regular way (for example, replace all "gopher://gopher." with
- "http://www."), it is easy to write a special program that processes
- every DB record. For some tasks it is even simpler and faster to
- write sed/awk scripts. But there are cases when someone needs to
- process bookmarks.db in a non-regular way: one URL must be changed
- in one way, another URL in a second way, etc. The -t option allows
- using an external dictionary for such translation. The dictionary
- itself is again a FLAD database, where every record has two keys:
- URL1 and URL2. With the -t option in effect, db2bkmk generates
- {private,public}.html, renames them to {private,public}.1, and
- then translates the entire bookmarks.db again, generating
- {private,public}.2 (4 files in total), where every URL1 is replaced
- with the URL2 from the dictionary. (See koi2win.db for an example of
- a translation dictionary.)
-
- -r
- Reverse the effect of -t option - translate from URL2 to URL1.
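The -t/-r translation described above can be pictured with a small Python sketch. It is only an illustration, not the code in db2bkmk.py: the names build_table and translate are hypothetical, the dictionary is given as plain (URL1, URL2) tuples instead of being read from a FLAD file, and it assumes that a URL is replaced only when it matches a dictionary entry exactly.

```python
def build_table(pairs, reverse=False):
    """Build a lookup table from (URL1, URL2) pairs.

    With reverse=True the direction is swapped (URL2 -> URL1),
    which mirrors what the -r option is described as doing.
    """
    table = {}
    for url1, url2 in pairs:
        if reverse:
            url1, url2 = url2, url1
        table[url1] = url2
    return table


def translate(url, table):
    """Return the translated URL; URLs without an entry pass through unchanged."""
    return table.get(url, url)


# Hypothetical dictionary with a single record.
pairs = [("http://example.com/koi/", "http://example.com/win/")]
forward = build_table(pairs)
backward = build_table(pairs, reverse=True)
print(translate("http://example.com/koi/", forward))
print(translate("http://example.com/win/", backward))
```

Building the table once and looking every record up in it keeps the second pass over bookmarks.db cheap, however large the dictionary is.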
-
- BUGS
- There are three hacks under the line marked with "Dirty hacks here":
- 1. if record["Folder"] == "Private links":
- This is to hide passwords from my bookmarks file.
-
- 2. if record["Folder"] == "All the rest - Unclassified":
- outfile.write(" "*level + "<DT><H3 NEWITEMHEADER ...")
- First, I compare the folder name with a fixed string. This is a real
- string from my bookmarks.html. Anyone who wants to use the program
- should change at least the strings "Private links" and "All the
- rest - Unclassified". Second, I use the netscapism "NEWITEMHEADER".
- Yes, I wrote these programs for Navigator's bookmarks.html, but I
- still would not like to use too many netscapisms here.
-
-
------------------------------- check_db ------------------------------
- NAME
- check_db.py - script to test generated FLAD database.
-
- SYNOPSIS
- check_db.py [-s] [-l logfile.err]
-
- DESCRIPTION
- check_db.py reads bookmarks.db and tests for various conditions and
- possible errors. The current tests are for duplicated URLs and for
- indents that are too big. "Indent without folder" or "Indent too big"
- may occur if someone edits bookmarks.db manually and inserts a record
- with an incorrect (higher) level (lower-level indents are ok). Every
- record is also tested for correct format (no spare keys, correct date
- formats).
-
- Options:
- -l logfile.err
- Put error log into log file (errors are printed to stderr
- anyway).
-
- -s
- Suppress information messages while running (errors are printed
- anyway).
-
-
------------------------------- chk_urls -----------------------------
- NAME
- chk_urls.py - Internet robot
-
- SYNOPSIS
- chk_urls.py [-is]
-
- DESCRIPTION
- chk_urls.py runs against bookmarks.db, checks every URL and stores
- the results in check.db. check.db is a FLAD database almost identical
- to bookmarks.db, with modified LastVisit/LastModified fields. An
- additional Error field appears in records that could not be checked
- for some reason; the reason is the content of the Error field.
- After every 100 URLs chk_urls creates the checkpoint file check.dat (in
- set_checkpoint()). The file is FLAD suitable to pass to
- fladc.load_file() (in get_checkpoint()). If interrupted by ^C, killed
- or crashed, chk_urls can be restarted, and the checkpoint file helps to
- restart from the interrupted state. The checkpoint stores the size and
- mtime of bookmarks.db (to note if the file changed while chk_urls was
- interrupted) and the last checked record. If chk_urls cannot find the
- checkpoint file, or bookmarks.db has changed, chk_urls will restart
- from the beginning. If there is a valid checkpoint and size/mtime are
- ok, chk_urls will start from the interrupted record.
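The checkpoint logic above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the code in chk_urls.py: the real checkpoint is a FLAD file read with fladc.load_file(), while this example stores JSON to stay self-contained, and the names file_signature, save_checkpoint and resume_from are hypothetical.

```python
import json
import os


def file_signature(path):
    """Size and mtime of the database, used to detect changes."""
    st = os.stat(path)
    return {"size": st.st_size, "mtime_ns": st.st_mtime_ns}


def save_checkpoint(ckpt_path, db_path, last_record):
    """Remember how far the run got and what the database looked like."""
    state = file_signature(db_path)
    state["last_record"] = last_record
    with open(ckpt_path, "w") as f:
        json.dump(state, f)


def resume_from(ckpt_path, db_path):
    """Return the record number to resume from, or 0 to start over."""
    try:
        with open(ckpt_path) as f:
            state = json.load(f)
    except (OSError, ValueError):
        return 0  # no checkpoint, or it is unreadable
    sig = file_signature(db_path)
    if (state.get("size"), state.get("mtime_ns")) != (sig["size"], sig["mtime_ns"]):
        return 0  # the database changed while the run was interrupted
    return state.get("last_record", 0)
```

A run would call save_checkpoint() after every batch of URLs (100 in the description above) and resume_from() once at startup.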
-
- Options:
- -i
- Inhibit progress bar. Default is to display progress bar if
- stderr.isatty()
-
- -s
- Suppress output of statistics at the end of the program. Default is
- to write how many records the program processed and how many URLs it
- checked. Also suppresses some messages during the run.
-
- BUGS
- Ugly mechanism to catch welcome message from FTP server (from urllib).
-
-
------------------------------- check_urls2 --------------------------
- NAME
- check_urls2.py - Internet robot
-
- SYNOPSIS
- check_urls2.py [-is]
-
- DESCRIPTION
- check_urls2 is just a second version of chk_urls.py. It forks off a child
- process and the child checks URLs. The parent monitors the child and kills
- it if there is no answer within 15 minutes.
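The parent/child arrangement can be illustrated with a short Python sketch. It is not the code in check_urls2.py: it swaps the fork for a subprocess whose output lines stand in for the child's "answers", and the name run_with_watchdog is hypothetical. The timeout is a parameter; 15 minutes would be timeout=900.

```python
import queue
import subprocess
import sys
import threading


def run_with_watchdog(cmd, timeout):
    """Run cmd and kill it if it prints nothing for `timeout` seconds.

    Returns (lines, killed): the output lines received and whether the
    watchdog had to kill the child.
    """
    child = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    answers = queue.Queue()

    def pump():
        # Forward every line the child prints; None marks end of output.
        for line in child.stdout:
            answers.put(line.rstrip("\n"))
        answers.put(None)

    threading.Thread(target=pump, daemon=True).start()

    lines, killed = [], False
    while True:
        try:
            item = answers.get(timeout=timeout)
        except queue.Empty:
            child.kill()  # no answer in time: assume the child hung
            killed = True
            break
        if item is None:  # the child closed its output: it has finished
            break
        lines.append(item)
    child.wait()
    return lines, killed


if __name__ == "__main__":
    # A stand-in child that answers twice and exits normally.
    demo = [sys.executable, "-c", "print('a'); print('b')"]
    print(run_with_watchdog(demo, timeout=5))
```

The timeout here is measured between answers rather than over the whole run, so a slow but still responsive child is not killed.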