git.phdru.name Git - bookmarks_db.git/log
bookmarks_db.git
15 months ago Style(bkmk_rurllib2): Remove unused import
Oleg Broytman [Wed, 31 Jul 2024 16:21:58 +0000 (19:21 +0300)]
Style(bkmk_rurllib2): Remove unused import

Found by `flake8`.

15 months ago Refactor(Robots): Move proxy handling to base class
Oleg Broytman [Wed, 31 Jul 2024 15:49:11 +0000 (18:49 +0300)]
Refactor(Robots): Move proxy handling to base class

This greatly simplifies robots.

15 months ago Feat(Robots): Return HTTP status code
Oleg Broytman [Wed, 31 Jul 2024 15:14:05 +0000 (18:14 +0300)]
Feat(Robots): Return HTTP status code

16 months ago Docs(TODO): Robot(s) that test many URLs in parallel
Oleg Broytman [Fri, 26 Jul 2024 10:12:14 +0000 (13:12 +0300)]
Docs(TODO): Robot(s) that test many URLs in parallel

Increase task priority.

16 months ago Docs(TODO): Robot based on aiohttp
Oleg Broytman [Fri, 26 Jul 2024 10:11:22 +0000 (13:11 +0300)]
Docs(TODO): Robot based on aiohttp

16 months ago Add `setup.cfg` and `setup.py`
Oleg Broytman [Fri, 26 Jul 2024 01:32:50 +0000 (04:32 +0300)]
Add `setup.cfg` and `setup.py`

Mostly to list required and optional dependencies.

16 months ago Fix(bkmk_db-venv): Do not exit
Oleg Broytman [Wed, 24 Jul 2024 21:01:18 +0000 (00:01 +0300)]
Fix(bkmk_db-venv): Do not exit

This is not a shell script; it is a sourced file.

16 months ago Chore: Rename `bkmk-venv` to `bkmk_db-venv`
Oleg Broytman [Wed, 24 Jul 2024 20:44:41 +0000 (23:44 +0300)]
Chore: Rename `bkmk-venv` to `bkmk_db-venv`

16 months ago Chore(bkmk-venv): Rename `.venv` to `bkmk_db-venv`
Oleg Broytman [Wed, 24 Jul 2024 20:43:45 +0000 (23:43 +0300)]
Chore(bkmk-venv): Rename `.venv` to `bkmk_db-venv`

16 months ago Feat: Cleanup redirects
Oleg Broytman [Wed, 24 Jul 2024 02:26:03 +0000 (05:26 +0300)]
Feat: Cleanup redirects

Remove verbiage.

16 months ago Feat: Skip URLs that have '%s'
Oleg Broytman [Wed, 24 Jul 2024 02:05:56 +0000 (05:05 +0300)]
Feat: Skip URLs that have '%s'

16 months ago Fix: These are not errors, just duplicates
Oleg Broytman [Wed, 24 Jul 2024 01:47:39 +0000 (04:47 +0300)]
Fix: These are not errors, just duplicates

16 months ago Build(Robots/bkmk_rrequests): Use HTTP(S) proxy instead of SOCKS5
Oleg Broytman [Tue, 23 Jul 2024 10:08:11 +0000 (13:08 +0300)]
Build(Robots/bkmk_rrequests): Use HTTP(S) proxy instead of SOCKS5

20 months ago Fix(Robot): Stop splitting and un-splitting URLs (tag: 5.3.1)
Oleg Broytman [Wed, 6 Mar 2024 15:43:48 +0000 (18:43 +0300)]
Fix(Robot): Stop splitting and un-splitting URLs

Pass `bookmark.href` as is.

20 months ago Fix(get_url): Remove excessive printing
Oleg Broytman [Wed, 6 Mar 2024 15:36:17 +0000 (18:36 +0300)]
Fix(get_url): Remove excessive printing

`robot.get()` doesn't really fill the bookmarks;
`robot.check_url()` does, but we don't call it here.

20 months ago Rename `check_url.py` to `check_urls.py`
Oleg Broytman [Wed, 6 Mar 2024 15:35:03 +0000 (18:35 +0300)]
Rename `check_url.py` to `check_urls.py`

20 months ago Rename `check_urls.py` to `check_urls_db.py`
Oleg Broytman [Wed, 6 Mar 2024 15:32:22 +0000 (18:32 +0300)]
Rename `check_urls.py` to `check_urls_db.py`

20 months ago Version 5.3.0 (tag: 5.3.0)
Oleg Broytman [Tue, 5 Mar 2024 23:48:48 +0000 (02:48 +0300)]
Version 5.3.0

   Added get_url.py: a script to get one file from an URL.
   Renamed set-URLs -> set-urls.

20 months ago Add `get_url.py`: a script to get one file from an URL
Oleg Broytman [Tue, 5 Mar 2024 23:47:23 +0000 (02:47 +0300)]
Add `get_url.py`: a script to get one file from an URL

20 months ago Rename set-URLs -> set-urls
Oleg Broytman [Tue, 5 Mar 2024 23:33:09 +0000 (02:33 +0300)]
Rename set-URLs -> set-urls

20 months ago Version 5.2.5 (tag: 5.2.5)
Oleg Broytman [Tue, 5 Mar 2024 23:24:17 +0000 (02:24 +0300)]
Version 5.2.5

   Feat(Robots/bkmk_rrequests): Ignore all problems with certificates.
   Fix(Robots/bkmk_robot_base): Pass query part.

20 months ago Fix(Robots/bkmk_robot_base): Pass query part
Oleg Broytman [Tue, 5 Mar 2024 20:22:39 +0000 (23:22 +0300)]
Fix(Robots/bkmk_robot_base): Pass query part

20 months ago Feat(Robots/bkmk_rrequests): Ignore all problems with certificates
Oleg Broytman [Tue, 5 Mar 2024 20:14:47 +0000 (23:14 +0300)]
Feat(Robots/bkmk_rrequests): Ignore all problems with certificates

Drop SSL/TLS security to the lowest level.
I want to get the pages at all cost.
Unmatched names, expired certificates,
small DH values are less of a concern for me
compared with DNS errors and connection timeouts.
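
A minimal sketch of the idea in this commit, using only the standard-library `ssl` module instead of `requests` (the library the commit scope refers to). The helper name `make_permissive_context` is hypothetical and not taken from the repository. It turns off hostname matching and certificate validation only; it does not cover the small-DH case mentioned above.

```python
import ssl


def make_permissive_context() -> ssl.SSLContext:
    """Return a TLS context that accepts any server certificate.

    Hypothetical helper for illustration: it trades security for
    reachability, which is the trade-off the commit message describes.
    """
    ctx = ssl.create_default_context()
    # check_hostname must be switched off first: setting CERT_NONE while
    # hostname checking is still enabled raises ValueError.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

Such a context could be passed as the `context=` argument of `urllib.request.urlopen()`.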

20 months ago Version 5.2.4: No need to re-check error 404 via proxy (tag: 5.2.4)
Oleg Broytman [Mon, 4 Mar 2024 15:15:04 +0000 (18:15 +0300)]
Version 5.2.4: No need to re-check error 404 via proxy

20 months ago Fix(Robots/bkmk_rrequests): Add forgotten spaces in log
Oleg Broytman [Mon, 4 Mar 2024 15:13:13 +0000 (18:13 +0300)]
Fix(Robots/bkmk_rrequests): Add forgotten spaces in log

20 months ago Fix(Robots/bkmk_rrequests): No need to re-check error 404 via proxy
Oleg Broytman [Mon, 4 Mar 2024 10:48:26 +0000 (13:48 +0300)]
Fix(Robots/bkmk_rrequests): No need to re-check error 404 via proxy

20 months ago Version 5.2.3 (tag: 5.2.3)
Oleg Broytman [Sun, 3 Mar 2024 20:49:55 +0000 (23:49 +0300)]
Version 5.2.3

Feat(Robots/bkmk_rrequests): Report 40x and 50x errors.
Fix HTML parser based on Bs4: Find "shortcut icon".

20 months ago Feat(Robots/bkmk_rrequests): Report 40x and 50x errors
Oleg Broytman [Sun, 3 Mar 2024 20:41:54 +0000 (23:41 +0300)]
Feat(Robots/bkmk_rrequests): Report 40x and 50x errors

20 months ago Feat(Robots/bkmk_rrequests): Change error message
Oleg Broytman [Sun, 3 Mar 2024 20:31:44 +0000 (23:31 +0300)]
Feat(Robots/bkmk_rrequests): Change error message

20 months ago Fix(parse_html/bkmk_ph_beautifulsoup4): Find "shortcut icon"
Oleg Broytman [Sun, 3 Mar 2024 14:47:58 +0000 (17:47 +0300)]
Fix(parse_html/bkmk_ph_beautifulsoup4): Find "shortcut icon"

Bs4 splits attribute values. To fix this, the value must be recombined.
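
A small sketch of the recombination, written without importing Beautiful Soup so it stays self-contained. Beautiful Soup returns multi-valued attributes such as `rel` as a list of tokens; the function names `recombine_attr` and `is_shortcut_icon` are hypothetical and not from the repository.

```python
def recombine_attr(value):
    """Join a multi-valued attribute back into one space-separated string.

    Accepts the list form Beautiful Soup produces for attributes like
    `rel`, and passes plain strings (or None) through unchanged.
    """
    if isinstance(value, (list, tuple)):
        return " ".join(value)
    return value


def is_shortcut_icon(rel_value) -> bool:
    """Check whether a link's rel attribute is exactly "shortcut icon"."""
    rel = recombine_attr(rel_value) or ""
    return rel.lower() == "shortcut icon"
```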

20 months ago Fix(Robots/bkmk_robot_base): Add forgotten spaces in log
Oleg Broytman [Sun, 3 Mar 2024 14:24:52 +0000 (17:24 +0300)]
Fix(Robots/bkmk_robot_base): Add forgotten spaces in log

20 months ago Version 5.2.2 (tag: 5.2.2)
Oleg Broytman [Sun, 3 Mar 2024 10:29:06 +0000 (13:29 +0300)]
Version 5.2.2

   Robots/bkmk_rrequests: Add request headers.
   Robots/bkmk_robot_base: Process "data:image/" icons.

20 months ago Feat(Robots/bkmk_robot_base): Process "data:image/" icons
Oleg Broytman [Sun, 3 Mar 2024 10:22:48 +0000 (13:22 +0300)]
Feat(Robots/bkmk_robot_base): Process "data:image/" icons

20 months ago Feat(Robots/bkmk_rrequests): Add request headers
Oleg Broytman [Sun, 3 Mar 2024 10:10:13 +0000 (13:10 +0300)]
Feat(Robots/bkmk_rrequests): Add request headers

20 months ago Refactor(Robots): Refactor request headers
Oleg Broytman [Sun, 3 Mar 2024 09:48:11 +0000 (12:48 +0300)]
Refactor(Robots): Refactor request headers

20 months ago Style(Robots/bkmk_rurllib_py3): Remove unused variable
Oleg Broytman [Sun, 3 Mar 2024 09:47:40 +0000 (12:47 +0300)]
Style(Robots/bkmk_rurllib_py3): Remove unused variable

20 months ago Fix(Robots/bkmk_robot_base): Ignore unknown charset
Oleg Broytman [Sat, 2 Mar 2024 13:28:46 +0000 (16:28 +0300)]
Fix(Robots/bkmk_robot_base): Ignore unknown charset

There are sites that provide incorrect
(most probably misspelled) charset.
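
One way to tolerate a misspelled charset, sketched with the standard library only. The function name `decode_body` and the choice of UTF-8 as the fallback are assumptions for illustration; the commit does not say what the robot falls back to.

```python
import codecs


def decode_body(body: bytes, charset, fallback: str = "utf-8") -> str:
    """Decode a response body, ignoring a charset Python does not know."""
    encoding = charset or fallback
    try:
        # codecs.lookup() raises LookupError for unknown encoding names.
        codecs.lookup(encoding)
    except LookupError:
        encoding = fallback
    # errors="replace" keeps going even if some bytes do not fit.
    return body.decode(encoding, errors="replace")
```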

20 months ago Fix(Robots/bkmk_robot_base): Add forgotten space in log (tag: 5.2.1)
Oleg Broytman [Sat, 2 Mar 2024 09:28:34 +0000 (12:28 +0300)]
Fix(Robots/bkmk_robot_base): Add forgotten space in log

20 months ago Perf(Robots/requests): Speed up second access
Oleg Broytman [Sat, 2 Mar 2024 09:13:42 +0000 (12:13 +0300)]
Perf(Robots/requests): Speed up second access

Use the proxy immediately for hosts
that are already known to require it.

Don't use the proxy for hosts that aren't accessible even through it;
return an error immediately.
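
The per-host bookkeeping described here could look roughly like the sketch below. Everything in it is hypothetical: the class name `ProxyMemo`, the two injected callables `fetch_direct` and `fetch_via_proxy`, and the assumption that network failures surface as `OSError`. It is not the code from the repository.

```python
from urllib.parse import urlsplit


class ProxyMemo:
    """Remember, per host, whether the proxy is needed or useless."""

    def __init__(self, fetch_direct, fetch_via_proxy):
        self.fetch_direct = fetch_direct        # callable(url) -> body
        self.fetch_via_proxy = fetch_via_proxy  # callable(url) -> body
        self.proxy_ok = set()      # hosts that answered only via proxy
        self.proxy_error = set()   # hosts that failed even via proxy

    def get(self, url):
        """Return (body, error); exactly one of the two is None."""
        host = urlsplit(url).hostname
        if host in self.proxy_error:
            # Known dead end: do not touch the network at all.
            return None, "host unreachable even through proxy"
        if host not in self.proxy_ok:
            try:
                return self.fetch_direct(url), None
            except OSError:
                pass  # fall through and retry via proxy
        try:
            body = self.fetch_via_proxy(url)
        except OSError as exc:
            self.proxy_error.add(host)
            return None, str(exc)
        self.proxy_ok.add(host)
        return body, None
```

On the second request to a host in `proxy_ok`, the direct attempt is skipped entirely, which is where the speedup would come from.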

20 months ago Refactor(Robots/requests)
Oleg Broytman [Fri, 1 Mar 2024 21:02:57 +0000 (00:02 +0300)]
Refactor(Robots/requests)

20 months ago Feat: For the robot based on requests allow to use a proxy (tag: 5.2.0)
Oleg Broytman [Fri, 1 Mar 2024 20:57:56 +0000 (23:57 +0300)]
Feat: For the robot based on requests allow to use a proxy

20 months ago Feat: Robot based on requests (tag: 5.1.0)
Oleg Broytman [Wed, 28 Feb 2024 21:18:38 +0000 (00:18 +0300)]
Feat: Robot based on requests

21 months ago Feat(venv): Use `venv` if `virtualenv` is not available
Oleg Broytman [Wed, 28 Feb 2024 19:03:42 +0000 (22:03 +0300)]
Feat(venv): Use `venv` if `virtualenv` is not available

2 years ago Fix(Py3): Use `urllib.parse.urlsplit()`
Oleg Broytman [Tue, 28 Nov 2023 17:04:18 +0000 (20:04 +0300)]
Fix(Py3): Use `urllib.parse.urlsplit()`

2 years ago Release 5.0.0 (tag: 5.0.0)
Oleg Broytman [Wed, 22 Nov 2023 16:09:56 +0000 (19:09 +0300)]
Release 5.0.0

2 years ago Docs: Update
Oleg Broytman [Wed, 22 Nov 2023 16:09:45 +0000 (19:09 +0300)]
Docs: Update

2 years ago Fix(Py3): Open list of titles in UTF-8
Oleg Broytman [Tue, 21 Nov 2023 18:47:34 +0000 (21:47 +0300)]
Fix(Py3): Open list of titles in UTF-8

2 years ago Fix(Py3): Always open text storage files in UTF-8
Oleg Broytman [Tue, 21 Nov 2023 18:46:42 +0000 (21:46 +0300)]
Fix(Py3): Always open text storage files in UTF-8

2 years ago Fix(Py3): Always log in UTF-8
Oleg Broytman [Mon, 20 Nov 2023 20:58:14 +0000 (23:58 +0300)]
Fix(Py3): Always log in UTF-8

2 years ago Fix(Py3): `html.parser` cannot parse bytes
Oleg Broytman [Mon, 20 Nov 2023 17:49:22 +0000 (20:49 +0300)]
Fix(Py3): `html.parser` cannot parse bytes

Decode to unicode from a known encoding.
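
To illustrate: `HTMLParser.feed()` expects `str`, so raw bytes have to be decoded first. The sketch below is not the project's parser; `TitleParser` and `parse_title` are hypothetical names, and the encoding is assumed to be already known (for example, from the HTTP headers).

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first-level <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def parse_title(raw: bytes, encoding: str) -> str:
    """Decode bytes with a known encoding, then feed the text to the parser."""
    parser = TitleParser()
    parser.feed(raw.decode(encoding))
    parser.close()
    return parser.title.strip()
```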

2 years ago Fix(Py3): Decode content using HTTP charset
Oleg Broytman [Mon, 20 Nov 2023 17:34:36 +0000 (20:34 +0300)]
Fix(Py3): Decode content using HTTP charset

2 years ago Fix(Py3): `urllib` writes its files as bytes
Oleg Broytman [Mon, 20 Nov 2023 17:33:42 +0000 (20:33 +0300)]
Fix(Py3): `urllib` writes its files as bytes

2 years ago Fix(parse_html CLI): Report encodings and the title
Oleg Broytman [Mon, 20 Nov 2023 16:21:22 +0000 (19:21 +0300)]
Fix(parse_html CLI): Report encodings and the title

2 years ago Fix(parse_html/bkmk_parse_html.py): Open the file with known encoding
Oleg Broytman [Mon, 20 Nov 2023 16:20:31 +0000 (19:20 +0300)]
Fix(parse_html/bkmk_parse_html.py): Open the file with known encoding

2 years ago Fix(parse_html/bkmk_ph_beautifulsoup4): Fix title recombination
Oleg Broytman [Mon, 20 Nov 2023 01:12:54 +0000 (04:12 +0300)]
Fix(parse_html/bkmk_ph_beautifulsoup4): Fix title recombination

2 years ago Fix(Py3): Remove forgotten `.decode()`/`.encode()`
Oleg Broytman [Mon, 20 Nov 2023 01:02:30 +0000 (04:02 +0300)]
Fix(Py3): Remove forgotten `.decode()`/`.encode()`

2 years ago Feat: Remove some HTML parsers
Oleg Broytman [Mon, 20 Nov 2023 00:50:26 +0000 (03:50 +0300)]
Feat: Remove some HTML parsers

EtreeTidy is outdated and buggy.
html5 is outdated.

2 years ago Style: Fix `flake8` E501 line too long
Oleg Broytman [Mon, 20 Nov 2023 00:39:46 +0000 (03:39 +0300)]
Style: Fix `flake8` E501 line too long

2 years ago Style: Fix `flake8` E402 module level import not at top of file
Oleg Broytman [Mon, 20 Nov 2023 00:38:00 +0000 (03:38 +0300)]
Style: Fix `flake8` E402 module level import not at top of file

2 years ago Chore(venv): Only run `pip install` on fresh virtual env
Oleg Broytman [Mon, 20 Nov 2023 00:16:46 +0000 (03:16 +0300)]
Chore(venv): Only run `pip install` on fresh virtual env

2 years ago Feat(check_url.py): Print "Moved", "Size", "Md5"
Oleg Broytman [Mon, 20 Nov 2023 00:00:06 +0000 (03:00 +0300)]
Feat(check_url.py): Print "Moved", "Size", "Md5"

2 years ago Fix(robots): Fix "Content-Length" header returning `None`
Oleg Broytman [Sun, 19 Nov 2023 23:58:53 +0000 (02:58 +0300)]
Fix(robots): Fix "Content-Length" header returning `None`

2 years ago Fix(robots): Store charset
Oleg Broytman [Sat, 18 Nov 2023 16:47:22 +0000 (19:47 +0300)]
Fix(robots): Store charset

2 years ago Fix(robots): Do not parse empty strings
Oleg Broytman [Fri, 17 Nov 2023 23:55:10 +0000 (02:55 +0300)]
Fix(robots): Do not parse empty strings

Some sites return empty "html" that consists only of whitespace.
Strip it to get a truly empty string.
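
A minimal sketch of this guard. The names `normalize_html` and `parse_if_not_empty` are hypothetical, and `parse` stands for whatever parser callable the caller supplies.

```python
def normalize_html(text):
    """Return the stripped text, or None when only whitespace is left."""
    stripped = text.strip() if text else ""
    return stripped or None


def parse_if_not_empty(text, parse):
    """Call parse() only for input that still has content after stripping."""
    html = normalize_html(text)
    if html is None:
        return None
    return parse(html)
```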

2 years ago Fix(parse_html): Do not parse empty strings
Oleg Broytman [Fri, 17 Nov 2023 23:54:46 +0000 (02:54 +0300)]
Fix(parse_html): Do not parse empty strings

2 years ago Fix(Py3): Reconfigure logs to write in UTF-8
Oleg Broytman [Fri, 17 Nov 2023 22:32:53 +0000 (01:32 +0300)]
Fix(Py3): Reconfigure logs to write in UTF-8

2 years ago Build(Makefile): Update the list of example shell scripts
Oleg Broytman [Fri, 17 Nov 2023 21:48:40 +0000 (00:48 +0300)]
Build(Makefile): Update the list of example shell scripts

2 years ago Feat: Delete bookmarks
Oleg Broytman [Thu, 16 Nov 2023 07:27:26 +0000 (10:27 +0300)]
Feat: Delete bookmarks

2 years ago Feat(robots): Align "Content-Type"
Oleg Broytman [Thu, 16 Nov 2023 05:35:41 +0000 (08:35 +0300)]
Feat(robots): Align "Content-Type"

2 years ago Fix(parse_html): Do not parse empty strings
Oleg Broytman [Thu, 16 Nov 2023 05:33:45 +0000 (08:33 +0300)]
Fix(parse_html): Do not parse empty strings

2 years ago Fix(Py3): Fix `unescape`
Oleg Broytman [Thu, 16 Nov 2023 05:26:52 +0000 (08:26 +0300)]
Fix(Py3): Fix `unescape`

2 years ago Fix(Py3): Fix `check_url.py`
Oleg Broytman [Wed, 15 Nov 2023 21:28:08 +0000 (00:28 +0300)]
Fix(Py3): Fix `check_url.py`

2 years ago Build: Make Python virtual environment
Oleg Broytman [Wed, 15 Nov 2023 18:12:15 +0000 (21:12 +0300)]
Build: Make Python virtual environment

Install libraries.

2 years ago Fix(Py3): Fix HTML parsers
Oleg Broytman [Wed, 15 Nov 2023 16:58:36 +0000 (19:58 +0300)]
Fix(Py3): Fix HTML parsers

2 years ago Feat(robots): Handle HTTP redirect 308
Oleg Broytman [Tue, 14 Nov 2023 23:27:46 +0000 (02:27 +0300)]
Feat(robots): Handle HTTP redirect 308

2 years ago Feat: Improve stats
Oleg Broytman [Tue, 14 Nov 2023 18:01:59 +0000 (21:01 +0300)]
Feat: Improve stats

2 years ago Feat: Open log files in UTF-8 encoding
Oleg Broytman [Tue, 14 Nov 2023 17:56:26 +0000 (20:56 +0300)]
Feat: Open log files in UTF-8 encoding

2 years ago Feat: Log reports to files
Oleg Broytman [Tue, 14 Nov 2023 17:53:50 +0000 (20:53 +0300)]
Feat: Log reports to files

2 years ago Docs(TODO): Increase priority for robots
Oleg Broytman [Tue, 14 Nov 2023 16:56:41 +0000 (19:56 +0300)]
Docs(TODO): Increase priority for robots

2 years ago Feat: Report redirects and set URLs
Oleg Broytman [Tue, 14 Nov 2023 15:11:12 +0000 (18:11 +0300)]
Feat: Report redirects and set URLs

Run through the bookmarks database and set URLs from redirects
from an external file.

2 years ago Fix(Py3): Catch `http.client.IncompleteRead`
Oleg Broytman [Mon, 13 Nov 2023 22:21:45 +0000 (01:21 +0300)]
Fix(Py3): Catch `http.client.IncompleteRead`

2 years ago Fix(Py3): Guess input file encoding
Oleg Broytman [Mon, 13 Nov 2023 15:13:14 +0000 (18:13 +0300)]
Fix(Py3): Guess input file encoding

2 years ago Chore: Explicitly open text files in text mode
Oleg Broytman [Mon, 13 Nov 2023 14:39:17 +0000 (17:39 +0300)]
Chore: Explicitly open text files in text mode

2 years ago Fix(Py3): Open output text files in utf-8 encoding
Oleg Broytman [Mon, 13 Nov 2023 14:36:01 +0000 (17:36 +0300)]
Fix(Py3): Open output text files in utf-8 encoding

2 years ago Docs: Update
Oleg Broytman [Sun, 12 Nov 2023 19:10:20 +0000 (22:10 +0300)]
Docs: Update

2 years ago Fix(robots): Process redirect with non-encoded URL
Oleg Broytman [Sun, 12 Nov 2023 18:56:15 +0000 (21:56 +0300)]
Fix(robots): Process redirect with non-encoded URL

2 years ago Fix(robots): Process response without `Content-Type`
Oleg Broytman [Sun, 12 Nov 2023 18:19:58 +0000 (21:19 +0300)]
Fix(robots): Process response without `Content-Type`

Try to recognize HTML.

2 years ago Fix(Py3): Fix log reporting
Oleg Broytman [Sun, 12 Nov 2023 18:11:22 +0000 (21:11 +0300)]
Fix(Py3): Fix log reporting

`error` could be bytes.

2 years ago Fix(Py3): Fix subprocess: pass bytes streams to `RecordFile`
Oleg Broytman [Sun, 12 Nov 2023 16:11:20 +0000 (19:11 +0300)]
Fix(Py3): Fix subprocess: pass bytes streams to `RecordFile`

2 years ago Fix(Py3): Subprocess must use `urllib`
Oleg Broytman [Sun, 12 Nov 2023 14:36:14 +0000 (17:36 +0300)]
Fix(Py3): Subprocess must use `urllib`

`urllib2` robot doesn't work in Python 3.

2 years ago Fix(Py3): Fix `subproc.py`
Oleg Broytman [Sun, 12 Nov 2023 14:23:26 +0000 (17:23 +0300)]
Fix(Py3): Fix `subproc.py`

Work with bytes.

2 years ago Fix(Py3): Fix absolute import
Oleg Broytman [Sun, 12 Nov 2023 13:57:51 +0000 (16:57 +0300)]
Fix(Py3): Fix absolute import

2 years ago Fix(Py3): Some socket errors are reported as `OSError`
Oleg Broytman [Sun, 12 Nov 2023 13:49:28 +0000 (16:49 +0300)]
Fix(Py3): Some socket errors are reported as `OSError`
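
For context, a short sketch of what this means in Python 3: `socket.error` is an alias of `OSError`, and `socket.timeout` and `socket.gaierror` are subclasses of it, so a handler has to test the more specific classes first. The function `describe_network_error` and its labels are hypothetical and only illustrate the ordering.

```python
import socket


def describe_network_error(exc: BaseException) -> str:
    """Map a caught exception to a short label, most specific class first."""
    if isinstance(exc, socket.timeout):
        return "timeout"
    if isinstance(exc, socket.gaierror):
        return "DNS error"
    if isinstance(exc, OSError):
        # Covers socket.error and plain OSError alike in Python 3.
        return "socket error"
    # Anything else is not a network problem; let the caller see it.
    raise exc
```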

2 years ago Fix(Py3): Encode unicode to bytes
Oleg Broytman [Sun, 12 Nov 2023 13:35:38 +0000 (16:35 +0300)]
Fix(Py3): Encode unicode to bytes

2 years ago Fix(Py3): Work around an old bug in `urlopen`
Oleg Broytman [Sun, 12 Nov 2023 11:46:38 +0000 (14:46 +0300)]
Fix(Py3): Work around an old bug in `urlopen`

It passes an extra parameter `timeout`
which `URLopener.open()` doesn't accept.

2 years ago Fix(Robots/bkmk_rurllib_py3.py): Restore opener
Oleg Broytman [Sun, 12 Nov 2023 11:24:49 +0000 (14:24 +0300)]
Fix(Robots/bkmk_rurllib_py3.py): Restore opener

`urllib.request.urlcleanup()` clears opener.
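
A hedged sketch of the restore step using the public `install_opener()` API. The helper name `cleanup_keeping_opener` is hypothetical, and the robot in the repository may restore its opener differently.

```python
import urllib.request


def cleanup_keeping_opener(opener):
    """Remove urlretrieve() temp files, then re-install the given opener."""
    urllib.request.urlcleanup()
    # urlcleanup() also resets the globally installed opener,
    # so put ours back before the next urlopen() call.
    urllib.request.install_opener(opener)
```

A caller would keep a reference to the opener it built (for example with `urllib.request.build_opener()`) and pass it in after each batch of downloads.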

2 years ago Fix(Storage/bkmk_stflad.py): Fix reading header
Oleg Broytman [Sun, 12 Nov 2023 11:24:18 +0000 (14:24 +0300)]
Fix(Storage/bkmk_stflad.py): Fix reading header

2 years ago Build(Makefile): The next version will be a new major release
Oleg Broytman [Sun, 12 Nov 2023 10:38:34 +0000 (13:38 +0300)]
Build(Makefile): The next version will be a new major release

2 years ago Docs(README): Fix copyright year
Oleg Broytman [Sun, 12 Nov 2023 10:38:08 +0000 (13:38 +0300)]
Docs(README): Fix copyright year

2 years ago Fix(Py3): Fix `list.join(separator)`
Oleg Broytman [Sun, 12 Nov 2023 10:04:22 +0000 (13:04 +0300)]
Fix(Py3): Fix `list.join(separator)`

It's now `separator.join(list)`.
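
For illustration, the Python 3 form in a tiny helper. The name `join_fields` is hypothetical; the `str()` conversion is an addition here because `str.join()` accepts only strings.

```python
def join_fields(fields, separator=", "):
    """Join items into one string; the separator string owns join()."""
    # Lists have no join() method in Python 3, so call it on the separator.
    return separator.join(str(field) for field in fields)
```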