BOOKMARKS database and internet robot
Here is a set of classes, libraries and programs I use to manipulate my
bookmarks.html. I like Netscape Navigator, but I need more features, so I am
writing these programs for my own needs. In particular, I need to extend
Navigator's "What's new" feature (Navigator 4 named it "Update bookmarks").
These programs are intended to run as follows.
1. bkmk2db converts bookmarks.html to bookmarks.db.
2. chk_urls (Internet robot) runs against bookmarks.db, checks every URL and
saves the results in check.db.
3. db2bkmk converts bookmarks.db back to bookmarks.html.
Then I use this bookmarks file and...
4. bkmk2db converts bookmarks.html to bookmarks.db.
5. chk_urls (Internet robot) runs against bookmarks.db, checks every URL and
saves the results in check.db (the old file is copied to check.old).
6. (A yet unnamed program) will compare check.old with check.db and generate
a detailed report. For example:
this URL is unavailable due to: host not found...
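The comparison in step 6 is not written yet. A minimal sketch of what it
might do, assuming both files have already been loaded into lists of
dictionaries. The "URL" key, the function name and the message wording are
assumptions made for illustration; "LastModified" and "Error" are the field
names used by chk_urls.

```python
def compare_checks(old_records, new_records):
    """Return report lines describing what changed between two check runs."""
    # Index the old run by URL ("URL" is an assumed key name).
    old_by_url = {rec["URL"]: rec for rec in old_records}
    report = []
    for rec in new_records:
        url = rec["URL"]
        old = old_by_url.get(url)
        if "Error" in rec:
            # The robot could not check this URL; Error holds the reason.
            report.append("%s is unavailable due to: %s" % (url, rec["Error"]))
        elif old is None:
            report.append("%s is new" % url)
        elif rec.get("LastModified") != old.get("LastModified"):
            report.append("%s was modified" % url)
    return report
```

Unchanged URLs produce no line, so the report lists only what needs attention.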
The bookmarks database programs are almost debugged. What remains to be done
is support for aliases. The second version of the Internet robot is finished.
Although not required, these programs work fine with tty_pbar.py (my little
module for creating text-mode progress bars).
COPYRIGHT and LEGAL ISSUES
All programs are copyrighted by Oleg Broytmann and PhiloSoft Design. All
sources are protected by the GNU GPL. Programs are provided "as-is", without
any kind of warranty. All usual blah-blah-blah.
------------------------------ bkmk2db ------------------------------
bkmk2db.py - script to convert bookmarks.html to a FLAD database.
bkmk2db.py [-its] [/path/to/bookmarks.html]
bkmk2db.py splits the given file (or ./bookmarks.html) into the FLAD database
bookmarks.db in the current directory.
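As an illustration of the kind of parsing involved, here is a minimal sketch
that walks a Netscape-style bookmarks file and collects folders and links with
their nesting level. It is not the actual BookmarksParser: the class name, the
"Title", "URL" and "Level" keys, and the use of Python's html.parser module
are all assumptions; only the "Folder" key appears in the real programs.

```python
from html.parser import HTMLParser


class BookmarksSketch(HTMLParser):
    """Collect folders and links from a Netscape-style bookmarks file."""

    def __init__(self):
        super().__init__()
        self.level = 0        # nesting depth, counted by <DL> tags
        self.records = []     # one dict per folder or link
        self._current = None  # record whose text is being collected

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names.
        if tag == "dl":
            self.level += 1
        elif tag == "h3":
            self._current = {"Level": self.level, "Folder": ""}
        elif tag == "a":
            self._current = {"Level": self.level, "Title": "",
                             "URL": dict(attrs).get("href", "")}

    def handle_data(self, data):
        # Text inside <H3> is a folder name, text inside <A> is a link title.
        if self._current is not None:
            key = "Folder" if "Folder" in self._current else "Title"
            self._current[key] += data

    def handle_endtag(self, tag):
        if tag == "dl":
            self.level -= 1
        elif tag in ("h3", "a") and self._current is not None:
            self.records.append(self._current)
            self._current = None


def parse_bookmarks(text):
    """Return the list of folder and link records found in text."""
    parser = BookmarksSketch()
    parser.feed(text)
    parser.close()
    return parser.records
```

Comments (<DD>) and the file header are ignored here to keep the sketch short.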
Inhibit progress bar. Default is to display progress bar if
Convert to text file (for debugging). Default is to convert to
Suppress output of statistics at the end of the program. Default
is to write how many lines the program read and how many URLs
it parsed. Also suppresses some messages during the run.
The program starts working by writing lines to the header file until
BookmarksParser initializes its own output file (this occurs when
the parser encounters the first <DL> tag). This is a design flaw.
Empty comments (no text after <DD>) are not marked specially in the
database, so db2bkmk.py will not reconstruct them. I don't need empty
<DD>s, so I consider this a feature, not a real bug.
Aliases are not supported (yet).
------------------------------ db2bkmk ------------------------------
db2bkmk.py - script to reconstruct bookmarks.html back from FLAD
db2bkmk.py [-is] [-t dict.db [-r]]
db2bkmk.py reads bookmarks.db and creates two HTML files -
public.html and private.html. The latter is just the full
bookmarks.html, while the former hides the private folder.
Inhibit progress bar. Default is to display progress bar if
Suppress output of statistics at the end of the program. Default is
to write how many records the program processed and how many URLs
it created. Also suppresses some messages during the run.
For most tasks, if someone needs to process bookmarks.db in a
regular way (for example, replace all "gopher://gopher." with
"http://www."), it is easy to write a special program that processes
every DB record. For some tasks it is even simpler and faster to
write sed/awk scripts. But there are cases when someone needs to
process bookmarks.db in a non-regular way: one URL must be changed
in one way, another URL in a second way, etc. The -t option allows
the use of an external dictionary for such translation. The dictionary
itself is again a FLAD database, where every record has two keys -
URL1 and URL2. With the -t option in effect, db2bkmk generates
{private,public}.html, renames them to {private,public}.1, and
then translates the entire bookmarks.db again, generating
{private,public}.2 (4 files in total), where every URL1 is replaced
with URL2 from the dictionary. (See koi2win.db for an example of a
translation dictionary.)
Reverse the effect of the -t option - translate from URL2 to URL1.
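The dictionary translation can be pictured with a short sketch. It assumes the
dictionary records are already loaded as Python dictionaries with the URL1 and
URL2 keys, and that a URL is replaced only when it matches URL1 exactly;
whether db2bkmk matches whole URLs or prefixes is not stated here, so treat
that, and both function names, as assumptions.

```python
def load_translation(records, reverse=False):
    """Build a lookup table from dictionary records with URL1/URL2 keys."""
    if reverse:
        # Same idea as -r: translate from URL2 back to URL1.
        return {rec["URL2"]: rec["URL1"] for rec in records}
    return {rec["URL1"]: rec["URL2"] for rec in records}


def translate_url(url, table):
    """Return the replacement for url, or url itself if it is not listed."""
    return table.get(url, url)
```

URLs that are not in the dictionary pass through unchanged, so the second set
of files differs from the first only where the dictionary says so.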
There are three hacks under the line marked with "Dirty hacks here":
1. if record["Folder"] == "Private links":
This is to hide passwords from my bookmarks file.
2. if record["Folder"] == "All the rest - Unclassified":
outfile.write(" "*level + "<DT><H3 NEWITEMHEADER ...")
First, I compare the folder name with a fixed string. This is a real string
from my bookmarks.html. Anyone who wants to use the program
should change at least the strings "Private links" and "All the
rest - Unclassified". Second, I use the netscapism "NEWITEMHEADER".
Yes, I wrote these programs for Navigator's bookmarks.html, but I
still would not like to use too many netscapisms here.
------------------------------ check_db ------------------------------
check_db.py - script to test the generated FLAD database.
check_db.py [-s] [-l logfile.err]
check_db.py reads bookmarks.db and tests for various conditions and
possible errors. Current tests are for duplicated URLs and an indent that
is too big. "Indent without folder" or "Indent too big" may occur if
someone edits bookmarks.db manually, inserting a record with an incorrect
(higher) level (lower-level indents are ok). Every record is tested for
correct format (that there are no spare keys, date formats are
Put error log into log file (errors are printed to stderr
Suppress information messages while running (errors are printed
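The duplicated-URL and indent tests described for check_db could be sketched
as follows. The exact rules check_db.py applies are not spelled out here, so
this sketch assumes its own: a record may be at most one level deeper than the
previous record, and only if the previous record is a folder. The "Level" and
"URL" key names and the message wording are assumptions as well.

```python
def check_records(records):
    """Return error messages for duplicated URLs and suspicious indents."""
    errors = []
    seen = set()
    # Start from a virtual root folder at level 0.
    prev_level, prev_is_folder = 0, True
    for number, rec in enumerate(records, 1):
        level = int(rec["Level"])
        if level > prev_level + 1:
            # Jumped more than one level deeper at once.
            errors.append("record %d: Indent too big" % number)
        elif level == prev_level + 1 and not prev_is_folder:
            # Went one level deeper, but nothing opened that level.
            errors.append("record %d: Indent without folder" % number)
        url = rec.get("URL")
        if url is not None:
            if url in seen:
                errors.append("record %d: duplicated URL %s" % (number, url))
            seen.add(url)
        prev_level, prev_is_folder = level, "Folder" in rec
    return errors
```

Going back to a lower level is always accepted, which matches the remark that
lower-level indents are ok.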
------------------------------ chk_urls -----------------------------
chk_urls.py - Internet robot
chk_urls.py runs against bookmarks.db, checking every URL and storing the
results in check.db. check.db is a FLAD database almost identical to
bookmarks.db, with modified LastVisit/LastModified fields. An additional
Error field appears in records that could not be checked for some
reason; the reason is the content of the Error field.
After every 100 URLs chk_urls creates a checkpoint file, check.dat (in
set_checkpoint()). The file is a FLAD file suitable for passing to
fladc.load_file() (in get_checkpoint()). If interrupted by ^C, killed
or crashed, chk_urls can be restarted, and the checkpoint file helps it
resume from the interrupted state. The checkpoint stores the size and mtime
of bookmarks.db (to detect whether the file changed while chk_urls was
interrupted) and the last checked record. If chk_urls cannot find the
checkpoint file, or bookmarks.db has changed, chk_urls will restart from the
beginning. If there is a valid checkpoint and the size/mtime are ok,
chk_urls will start from the interrupted record.
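A minimal sketch of this checkpoint logic is shown below. The function names
echo the ones mentioned above, but the bodies, the arguments and the key names
Size, MTime and Record are assumptions, and the file is written as plain
"Key: value" lines rather than through fladc.

```python
import os


def file_signature(path):
    """Return the size and mtime of path as strings, for later comparison."""
    st = os.stat(path)
    return str(st.st_size), str(int(st.st_mtime))


def set_checkpoint(checkpoint_path, db_path, rec_no):
    """Record the database signature and the last checked record number."""
    size, mtime = file_signature(db_path)
    with open(checkpoint_path, "w") as f:
        f.write("Size: %s\nMTime: %s\nRecord: %d\n" % (size, mtime, rec_no))


def get_checkpoint(checkpoint_path, db_path):
    """Return the last checked record number, or 0 to start from scratch."""
    try:
        with open(checkpoint_path) as f:
            data = dict(line.rstrip("\n").split(": ", 1) for line in f)
        rec_no = int(data["Record"])
    except (OSError, ValueError, KeyError):
        return 0  # no checkpoint file, or it is unreadable
    if (data.get("Size"), data.get("MTime")) != file_signature(db_path):
        return 0  # the database changed while the robot was interrupted
    return rec_no
```

A caller would invoke set_checkpoint() after every 100 URLs and skip the
first get_checkpoint() records on startup.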
Inhibit progress bar. Default is to display progress bar if
Suppress output of statistics at the end of the program. Default is
to write how many records the program processed and how many URLs
it checked. Also suppresses some messages during the run.
An ugly mechanism is used to catch the welcome message from the FTP server
(from urllib).
----------------------------- check_urls2 ----------------------------
check_urls2.py - Internet robot
check_urls2 is just a second version of chk_urls.py. It forks off a child
process and the child checks URLs. The parent monitors the child and kills
it if there is no answer within 15 minutes.
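The parent/child arrangement can be illustrated with a small watchdog sketch.
It is not the code of check_urls2: it starts the child with subprocess.Popen
instead of a bare fork, treats any output on the child's stdout as an
"answer", and relies on select() working on pipes, which holds on Unix. The
function name and its return value are assumptions.

```python
import os
import select
import subprocess


def run_with_watchdog(cmd, timeout):
    """Run cmd; kill it if it stays silent on stdout for timeout seconds.

    Returns (exit_status, killed).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    fd = proc.stdout.fileno()
    killed = False
    while True:
        # Wait until the child writes something or the timeout expires.
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            proc.kill()  # no answer in time: give up on the child
            killed = True
            break
        if not os.read(fd, 4096):
            break        # end of file: the child closed stdout or exited
    proc.stdout.close()
    return proc.wait(), killed
```

With a 15-minute limit the parent would call it with timeout=15 * 60.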