git.phdru.name Git - bookmarks_db.git/history - Robots/parse_html.py
2011-01-03 Oleg Broytman ElementTidy often segfaults.
2011-01-03 Oleg Broytman 2011.
2011-01-02 Oleg Broytman Encode icon's URL from unicode.
2010-08-13 Oleg Broytman Moved lxml-based parser after BeautifulSoup - it doesn...
2010-08-13 Oleg Broytman Insert lxml-based parser at the beginning.
2010-08-12 Oleg Broytman Try parser in order until the first one finds a title.
2010-08-12 Oleg Broytman Test if m_lib is available.
2010-08-12 Oleg Broytman Move charset to the beginning of the list.
2010-08-11 Oleg Broytman Added HTML Parser based on html5 library.
2010-08-11 Oleg Broytman Removed parse_html_etreetidy - TidyHTMLTreeBuilder...
2010-08-11 Oleg Broytman Added HTML Parser based on TidyHTMLTreeBuilder.
2010-08-11 Oleg Broytman 2010.
2010-08-08 Oleg Broytman Fixed a bug.
2010-08-08 Oleg Broytman Fixed parsing in case of unknown entity.
2009-09-27 Oleg Broytman "BroytMann" => "Broytman".
2008-03-09 Oleg Broytman Title (and refresh) can be None.
2008-03-07 Oleg Broytman Fixed a misspelled HTML entity.
2008-03-07 Oleg Broytman Fixed a misspelling.
2008-03-07 Oleg Broytman Pass charset from the command line.
2008-03-03 Oleg Broytman Log more parsers errors.
2008-03-03 Oleg Broytman Always log guessed charset even if it's utf-8.
2008-03-03 Oleg Broytman Charset was guessed if it is not from META and not...
2008-03-03 Oleg Broytman Create the list of charsets outside of the parsers...
2008-02-25 Oleg Broytman &nbsp; is an entity that needs to be encoded.
2008-02-24 Oleg Broytman Used name2codepoint directly; recode it.
2008-02-24 Oleg Broytman Combined two "if"s.
2008-02-24 Oleg Broytman Do not unquote standard HTML entities.
2008-02-24 Oleg Broytman Emulate log.
2008-02-23 Oleg Broytman Fixed a bug - break out of the loop after finding the...
2008-02-23 Oleg Broytman It is not HTTP charset, it is guessed charset.
2008-02-23 Oleg Broytman Try a list of charsets, including the universal (utf...
2008-02-13 Oleg Broytman Stop meddling with cp1252.
2008-02-12 Oleg Broytman current_charset is only needed in main.
2008-02-11 Oleg Broytman Recode entities before num. entities.
2008-02-11 Oleg Broytman Switched to utf-8.
2008-02-11 Oleg Broytman Recode HTML entities.
2007-12-28 Oleg Broytman Do not display too much titles if they are equal.
2007-12-27 Oleg Broytman Strip every line in title.
2007-12-22 Oleg Broytman Do not encode non-encodeable entities.
2007-12-18 Oleg Broytman Fixed a bug.
2007-12-18 Oleg Broytman Do all manipulations with title in one place.
2007-12-18 Oleg Broytman Log the module's name of the failed parse_html.
2007-12-18 Oleg Broytman Try BeautifulSoup; if it fails - fall back to HTML...
2007-12-18 Oleg Broytman Recode from DEFAULT_CHARSET if recoding from cp1252...
2007-12-16 Oleg Broytman Added parser for html based on BeautifulSoup.
2007-12-16 Oleg Broytman Split parse_html.py into parse_html_htmlparser.py.
2007-10-11 Oleg Broytman Fixed a bug: import sys.
2007-10-11 Oleg Broytman Ignore case for comparison.
2007-10-10 Oleg Broytman Fixed a bug: import codecs.
2007-10-10 Oleg Broytman Use m_lib.defenc.
2007-10-10 Oleg Broytman In case of unknown charset try charset from HTML.
2007-10-10 Oleg Broytman Initialize parser.icon in case there are no <link>...
2007-09-25 Oleg Broytman Find an icon's URL in the HTML.
2007-09-07 Oleg Broytman /usr/bin/env python
2005-01-29 Oleg Broytman If sys.getdefaultencoding() returns "ascii" - use
2003-07-28 Oleg Broytman Updated to m_lib version 1.2. Extended support for...
2003-07-24 Oleg Broytman Parse and recode unicode entities.
2003-07-24 Oleg Broytman Version 3.3.1.