<?xml version="1.0" standalone="no"?>
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"file:///usr/share/xml/docbook/schema/dtd/4.5/docbookx.dtd">
<refentry id="mimedecode.py">
<title>mimedecode.py</title>
<productname>mimedecode.docbook</productname>
<firstname>Oleg</firstname>
<surname>Broytman</surname>
<email>phd@phdru.name</email>
<year>2001-2014</year>
<holder>PhiloSoft Design.</holder>
<refentrytitle>mimedecode.py</refentrytitle>
<manvolnum>1</manvolnum>
<refname>mimedecode.py</refname>
<refpurpose>decode MIME message</refpurpose>
<command>mimedecode.py</command>
<option>-h|--help</option>
<option>-V|--version</option>
<option>-cCDP</option>
<option>-f charset</option>
<option>-H|--host=hostname</option>
<option>-d header1[,header2,header3...]</option>
<option>-d *[,-header1,-header2,-header3...]</option>
<option>-p header1[,header2,header3,...]:param1[,param2,param3,...]</option>
<option>-p *[,-header1,-header2,-header3,...]:param1[,param2,param3,...]</option>
<option>-p header1[,header2,header3,...]:*[,-param1,-param2,-param3,...]</option>
<option>-p *[,-header1,-header2,-header3,...]:*[,-param1,-param2,-param3,...]</option>
<option>-r header1[,header2,header3...]</option>
<option>-r *[,-header1,-header2,-header3...]</option>
<option>-R header1[,header2,header3,...]:param1[,param2,param3,...]</option>
<option>-R *[,-header1,-header2,-header3,...]:param1[,param2,param3,...]</option>
<option>-R header1[,header2,header3,...]:*[,-param1,-param2,-param3,...]</option>
<option>-R *[,-header1,-header2,-header3,...]:*[,-param1,-param2,-param3,...]</option>
<option>--set-header header:value</option>
<option>--set-param header:param=value</option>
<option>-beit mask</option>
<option>-o output_file</option>
<arg choice="opt">input_file
<arg choice="opt">output_file</arg>
<title>DESCRIPTION</title>
Mail users, especially in non-English countries, often find that mail
messages arrive in different formats, with different content types, in
different encodings and charsets. Usually this is good, because it allows
the sender to use an appropriate format/encoding/whatever. Sometimes,
though, some unification is desirable. For example, one may want to put
mail messages into an archive, make HTML indices, run a search indexer,
etc. In such situations converting messages to text in one character set
and skipping some binary attachments is very desirable.
Here is the solution: mimedecode.py!
This is a program to decode MIME messages. The program expects one input
file (either on the command line or on stdin), which is treated as an
RFC822 message, and decodes it to stdout or to an output file. If the file
is not an RFC822 message, it is just copied to the output one-to-one. If
the file is a simple RFC822 message, it is decoded as one part. If it is a
MIME message with multiple parts ("attachments"), all parts are decoded.
Decoding can be controlled by command-line options.
First, for every part the program removes the headers and parameters
listed with the -r and -R options. Then the Subject and Content-Disposition
headers (and all headers listed with the -d and -p options) are examined.
If any of them exist, they are decoded according to RFC2047. The
Content-Disposition header itself is not decoded, only its "filename"
parameter. RFC2047-encoded header parameters violate the RFC, but are
widely deployed anyway by ignorant coders who have never even heard of
RFCs. The correct parameter encoding is specified by RFC2231. This program
decodes RFC2231-encoded parameters, too.
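The two header decoding steps above can be sketched with Python's standard email package. This is a minimal illustration, not the code mimedecode.py itself uses, and the helper names decode_rfc2047 and decode_param are hypothetical. RFC2047 encoded-words are decoded with email.header.decode_header. An RFC2231 parameter (the filename*= form) is collapsed with email.utils.collapse_rfc2231_value. A parameter that, against the RFC, contains RFC2047 encoded-words is decoded like a header value.

```python
# Sketch only: uses Python's standard email package, not mimedecode.py's code.
from email import message_from_string
from email.header import decode_header, make_header
from email.utils import collapse_rfc2231_value


def decode_rfc2047(value):
    """Decode RFC2047 encoded-words (=?charset?q|b?...?=) into a str."""
    return str(make_header(decode_header(value)))


def decode_param(msg, header, param):
    """Return a decoded header parameter, or None if it is absent."""
    value = msg.get_param(param, header=header)
    if value is None:
        return None
    # RFC2231 parameters (param*=charset'lang'%XX...) come back as a
    # (charset, language, text) tuple; collapse it into a plain str.
    value = collapse_rfc2231_value(value)
    # Non-standard but common: RFC2047 encoded-words inside a parameter.
    if "=?" in value:
        value = decode_rfc2047(value)
    return value


raw = (
    "Subject: =?utf-8?q?Caf=C3=A9_menu?=\n"
    'Content-Type: application/pdf; name="=?utf-8?q?r=C3=A9sum=C3=A9.pdf?="\n'
    "Content-Disposition: attachment; filename*=utf-8''na%C3%AFve.txt\n"
    "\n"
    "body\n"
)
msg = message_from_string(raw)
print(decode_rfc2047(msg["Subject"]))                        # Café menu
print(decode_param(msg, "Content-Type", "name"))             # résumé.pdf
print(decode_param(msg, "Content-Disposition", "filename"))  # naïve.txt
```

The same decode_param call handles both the standard RFC2231 form and the non-standard RFC2047 form. This mirrors the man page's point that both forms occur in real mail.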
Then the body of the message (or the current part) is decoded. Decoding
starts by looking at the Content-Transfer-Encoding header. If the header
specifies a non-8bit encoding (usually base64 or quoted-printable), the
body is converted to 8bit. Then, if its content type is multipart
(multipart/related or multipart/mixed, e.g.), every part is recursively
decoded. If it is not multipart, the mailcap database is consulted to find
a way to convert the body to plain text. (I have no idea how mailcap can be
configured on OSes other than POSIX, so please don't ask me; real OS users
can consult my example at
<ulink url="http://phdru.name/Software/dotfiles/mailcap.html">http://phdru.name/Software/dotfiles/mailcap.html</ulink>).
The decoding process uses the first copiousoutput filter it can find. If
there are no filters, the body is just passed as is.
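A minimal sketch of these body steps follows. It uses the standard email package rather than mimedecode.py's own code. Every non-multipart part has its Content-Transfer-Encoding undone and is then matched against mailcap entries, taking the first entry flagged copiousoutput. The mailcap parser here is a deliberately simplified, hypothetical one: it ignores quoting, test= clauses and line continuations. The filter is only looked up, not run.

```python
# Sketch only: a simplified model of the body steps, not mimedecode.py's code.
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def parse_mailcap(text):
    """Hypothetical, simplified mailcap parser: (type, command, flags)."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = [f.strip() for f in line.split(";")]
        entries.append((fields[0].lower(), fields[1], set(fields[2:])))
    return entries


def find_copiousoutput(entries, ctype):
    """Command of the first copiousoutput entry for ctype, else None."""
    wildcard = ctype.split("/")[0] + "/*"
    for etype, command, flags in entries:
        if etype in (ctype, wildcard) and "copiousoutput" in flags:
            return command
    return None  # no filter: the body is passed as is


def decode_part(msg, entries, out):
    """Collect (content type, filter, decoded body) for every leaf part."""
    if msg.is_multipart():
        for part in msg.get_payload():  # recurse into every subpart
            decode_part(part, entries, out)
        return out
    body = msg.get_payload(decode=True)  # undo base64/quoted-printable
    ctype = msg.get_content_type()
    out.append((ctype, find_copiousoutput(entries, ctype), body))
    return out


MAILCAP = """
application/pdf; xpdf %s
application/pdf; pdftotext %s -; copiousoutput
text/html; lynx -dump %s; copiousoutput
"""
msg = MIMEMultipart()
msg.attach(MIMEText("hello", "plain"))
msg.attach(MIMEApplication(b"%PDF-1.4", "pdf"))
for ctype, command, body in decode_part(msg, parse_mailcap(MAILCAP), []):
    print(ctype, command, body)
```

There are two application/pdf entries, and only the second is a copiousoutput filter, so that one is chosen. The text/plain part has no entry and would be passed as is.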
Then the Content-Type header is consulted for the charset. If it is not
equal to the current locale charset and recoding is allowed, the body text
is recoded. Finally, the message headers and the body are written to
stdout (or to the output file).
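The recoding step can be sketched as follows. This is an illustration under stated assumptions, not mimedecode.py's code. The helper recode_body is hypothetical. Its target defaults to locale.getpreferredencoding(False), which on POSIX systems normally reflects LANG, LC_ALL and LC_CTYPE. Charset names are compared after codecs.lookup normalization, so that, e.g., "UTF-8" and "utf8" count as equal.

```python
# Sketch only: recode_body is a hypothetical helper, not mimedecode.py's code.
import codecs
import locale
from email.message import Message


def recode_body(msg, body, target=None, recode=True):
    """Recode body (bytes) from its declared charset to target."""
    if target is None:
        # Default: the locale's preferred encoding (LANG/LC_ALL/LC_CTYPE).
        target = locale.getpreferredencoding(False)
    source = msg.get_content_charset()  # lower-cased, or None if absent
    if not recode or source is None:
        return body
    # Compare normalized codec names so "UTF-8" and "utf8" are equal.
    if codecs.lookup(source).name == codecs.lookup(target).name:
        return body
    # The declared charset is trusted, not guessed.
    return body.decode(source, "replace").encode(target, "replace")


msg = Message()
msg["Content-Type"] = 'text/plain; charset="iso8859-5"'
body = "Привет".encode("iso8859-5")
print(recode_body(msg, body, target="utf-8").decode("utf-8"))  # Привет
```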
Please be warned that in the following options the asterisk is a shell
metacharacter and should be escaped or quoted. Either write -d \*,-h1,-h2
or -d '*,-h1,-h2' or similar.
<title>OPTIONS</title>
Print brief usage help and exit.
<term>--version</term>
Print version and exit.
Recode different character sets in message bodies to the current
default charset; this is the default.
Do not recode character sets in message bodies.
<term>-f charset</term>
Force this charset to be the current default charset instead of
<term>-H hostname</term>
<term>--host=hostname</term>
Use this hostname in X-MIME-Autoconverted headers instead of the
<term>-d header1[,header2,header3...]</term>
Add the header(s) to the list of headers to decode; initially the
list contains the headers "From", "To", "Cc", "Reply-To",
"Mail-Followup-To" and "Subject".
<term>-d *[,-header1,-header2,-header3...]</term>
This variant completely changes header decoding. First, the list of
headers to decode is cleared. Then all headers are decoded except the
given exceptions (the headers prefixed with '-'). In this mode it would
be meaningless to give more than one -d option, but the program doesn't
enforce that.
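The two -d forms can be modelled as a set of names plus an "all except" flag. The class below is a hypothetical sketch of those semantics, not the program's option parser. Header names are compared case-insensitively. As an assumption of this sketch, a plain -d given after a "*" form is a no-op, since every header is already decoded. The same model would fit -r.

```python
# Sketch only: HeaderList is a hypothetical model of the -d (and -r) syntax.
class HeaderList:
    """An explicit set of header names, or "all headers except" a set."""

    def __init__(self, names=()):
        self.all = False  # becomes True after a "*,..." spec
        self.names = {n.lower() for n in names}

    def add_spec(self, spec):
        items = [s.strip() for s in spec.split(",") if s.strip()]
        if items and items[0] == "*":
            # "*,-h1,-h2": clear the list, then match every header
            # except those prefixed with "-" (kept as exceptions).
            self.all = True
            self.names = {s[1:].lower() for s in items[1:]
                          if s.startswith("-")}
        elif not self.all:
            # Plain names are added; in "*" mode they are already covered.
            self.names.update(s.lower() for s in items)
        return self

    def __contains__(self, header):
        header = header.lower()
        if self.all:
            return header not in self.names  # names are exceptions here
        return header in self.names


decode = HeaderList(["From", "To", "Cc", "Reply-To",
                     "Mail-Followup-To", "Subject"])
decode.add_spec("X-Mailer")
print("X-Mailer" in decode, "Date" in decode)  # True False
decode.add_spec("*,-Received,-X-Spam-Status")
print("Date" in decode, "Received" in decode)  # True False
```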
Clear the list of headers to decode (make it empty).
<term>-p header1[,header2,header3,...]:param1[,param2,param3,...]</term>
Add the parameter(s) to the list of header parameters to decode;
the parameters will be decoded only for the given header(s).
Initially the list contains the "name" parameter of the "Content-Type"
header and the "filename" parameter of the "Content-Disposition" header.
<term>-p *[,-header1,-header2,-header3,...]:param1[,param2,param3,...]</term>
Add the parameter(s) to the list of header parameters to decode;
the parameters will be decoded for all headers except the given
<term>-p header1[,header2,header3,...]:*[,-param1,-param2,-param3,...]</term>
Decode all parameters except the listed ones for the given headers.
<term>-p *[,-header1,-header2,-header3,...]:*[,-param1,-param2,-param3,...]</term>
Decode all parameters except the listed ones for all headers except the
listed ones.
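The header:param syntax of -p can be sketched the same way. Each side of the colon is either a list of names or "*" followed by "-name" exceptions. ParamList and its helpers are hypothetical names, not the program's own. In this sketch each -p option adds one rule, and a parameter is decoded if any rule matches both its header and its name. -R could use the same structure for removal.

```python
# Sketch only: ParamList is a hypothetical model of the -p (and -R) syntax.
def parse_side(text):
    """Parse one side of "headers:params" into (match_all, names)."""
    items = [s.strip().lower() for s in text.split(",") if s.strip()]
    if items and items[0] == "*":
        # "*,-a,-b": everything except the names prefixed with "-".
        return True, {s[1:] for s in items[1:] if s.startswith("-")}
    return False, set(items)


def side_matches(side, name):
    match_all, names = side
    name = name.lower()
    return name not in names if match_all else name in names


class ParamList:
    def __init__(self):
        self.rules = []  # one (headers, params) pair per -p option

    def add_spec(self, spec):
        headers, _, params = spec.partition(":")
        self.rules.append((parse_side(headers), parse_side(params)))
        return self

    def wants(self, header, param):
        """True if any rule matches both the header and the parameter."""
        return any(side_matches(h, header) and side_matches(p, param)
                   for h, p in self.rules)


# The initial list described above.
params = ParamList()
params.add_spec("Content-Type:name").add_spec("Content-Disposition:filename")
print(params.wants("content-type", "name"),
      params.wants("Content-Type", "charset"))  # True False
params.add_spec("*,-Content-Type:*,-boundary")
print(params.wants("X-Foo", "bar"),
      params.wants("X-Foo", "boundary"))  # True False
```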
Clear the list of header parameters to decode (make it empty).
<term>-r header1[,header2,header3...]</term>
Add the header(s) to the list of headers to remove completely;
initially the list is empty.
<term>-r *[,-header1,-header2,-header3...]</term>
Remove all headers except the listed ones.
<term>-R header1[,header2,header3,...]:param1[,param2,param3,...]</term>
Add the parameter(s) to the list of header parameters to remove;
the parameters will be removed only for the given header(s).
Initially the list is empty.
<term>-R *[,-header1,-header2,-header3,...]:param1[,param2,param3,...]</term>
<term>-R header1[,header2,header3,...]:*[,-param1,-param2,-param3,...]</term>
<term>-R *[,-header1,-header2,-header3,...]:*[,-param1,-param2,-param3,...]</term>
Remove the listed parameters (or all parameters except the listed ones)
from these headers (or from all headers except the listed ones).
<term>--set-header header:value</term>
The program sets or changes the value of the header to the given value
(only in the top-level message).
<term>--set-param header:param=value</term>
The program sets or changes the value of the header's parameter to the
given value (only in the top-level message). The header must exist.
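Below is a sketch of --set-header and --set-param built on Python's email.message API. It is not mimedecode.py's implementation. set_header and set_param are hypothetical helper names, and trimming whitespace around names and values is an assumption.

```python
# Sketch only: set_header/set_param are hypothetical helpers built on the
# standard email.message API, applied to the top-level message only.
from email import message_from_string


def set_header(msg, spec):
    """--set-header header:value"""
    name, _, value = spec.partition(":")
    name, value = name.strip(), value.strip()  # assumption: trim spaces
    if name in msg:
        msg.replace_header(name, value)  # change the first occurrence
    else:
        msg[name] = value  # or add the header
    return msg


def set_param(msg, spec):
    """--set-param header:param=value (the header must exist)"""
    header, _, rest = spec.partition(":")
    param, _, value = rest.partition("=")
    header, param, value = header.strip(), param.strip(), value.strip()
    if header not in msg:
        raise ValueError("header %s does not exist" % header)
    msg.set_param(param, value, header=header, replace=True)
    return msg


msg = message_from_string(
    "Subject: old\nContent-Type: text/plain; charset=us-ascii\n\nbody\n")
set_header(msg, "Subject:new")
set_param(msg, "Content-Type:charset=utf-8")
print(msg["Subject"], msg.get_content_charset())  # new utf-8
```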
Append mask to the list of binary content types; if the message to
decode has a part of this type, the program will pass the part as is,
without any additional processing.
Append mask to the list of error content types; if the message to
decode has a part of this type, the program fails with ValueError.
Append mask to the list of content types to ignore; if the message to
decode has a part of this type, the program will not pass it; instead,
a line "Message body of type `%s' skipped." will be issued.
Append mask to the list of content types to convert to text; if the
message to decode has a part of this type, the program will consult the
mailcap database, find the first copiousoutput filter and convert the
<term>-o output_file</term>
Useful to set the output file in case of redirected stdin:
<programlisting language="sh">mimedecode.py -o output_file &lt; input_file
cat input_file | mimedecode.py -o output_file</programlisting>
The 4 list options (-beit) require more explanation. They allow a user to
control body decoding with great flexibility. Think about the mail archive
mentioned above; for example, its maintainer wants to put only texts there,
convert PostScript/PDF to text, pass HTML and images as is, and ignore everything
mimedecode.py -t application/postscript -t application/pdf -b text/html
-b 'image/*' -i '*/*'
When the program decodes a message (non-MIME, or a non-multipart subpart
of a MIME message), it consults the Content-Type header. The content type
is searched in all 4 lists, in the order "text-binary-ignore-error". If
found, the appropriate action is performed. If not found, the program
searches the same lists for the "type/*" mask (the type of "text/html" is
just "text"). If found, the appropriate action is performed. If not found,
the program searches the same lists for the "*/*" mask. If found, the
appropriate action is performed. If not found, the program uses the
default action, which is to decode everything to text (if mailcap
specifies a filter).
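The lookup order can be expressed in a few lines of Python. choose_action and the action names are hypothetical. The lists stand for the masks collected by the -t, -b, -i and -e options. The usage below reuses the mail-archive example.

```python
# Sketch only: choose_action is a hypothetical name; the lists stand for
# the masks collected by -t, -b, -i and -e.
ORDER = ("text", "binary", "ignore", "error")  # "text-binary-ignore-error"


def choose_action(ctype, lists):
    """Return the action for a content type, following the lookup order."""
    ctype = ctype.lower()
    for mask in (ctype, ctype.split("/")[0] + "/*", "*/*"):
        for action in ORDER:  # each mask is searched in all 4 lists
            if mask in lists.get(action, ()):
                return action
    return "default"  # decode to text if mailcap specifies a filter


# The mail-archive example:
# -t application/postscript -t application/pdf -b text/html -b 'image/*' -i '*/*'
lists = {
    "text": ["application/postscript", "application/pdf"],
    "binary": ["text/html", "image/*"],
    "ignore": ["*/*"],
}
for ctype in ("application/pdf", "image/png", "audio/mpeg"):
    print(ctype, choose_action(ctype, lists))
```

An exact type always wins over "type/*", and "type/*" wins over "*/*", regardless of which list each mask is in. With all lists empty, every part falls through to the default action.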
Initially all 4 lists are empty, so without any additional parameters
the program always uses the default decoding.
<title>ENVIRONMENT</title>
<varlistentry><term>LANG</term></varlistentry>
<varlistentry><term>LC_ALL</term></varlistentry>
<varlistentry><term>LC_CTYPE</term></varlistentry>
Define the current locale settings. Used to determine the current default
charset (if your Python is properly installed and configured).
The program may produce an incorrect MIME message. The purpose of the
program is to decode whatever it is possible to decode, not to produce
absolutely correct MIME output. The incorrect parts are obvious: the
decoded From/To/Cc/Reply-To/Mail-Followup-To/Subject headers and
filenames. Other than that, the output is a correct MIME message. The
program does not try to guess whether the headers are correct. For
example, if a message header states that the charset is iso8859-5, but
the body is actually in utf-8, the program will recode the message with
the wrong charset.
<title>AUTHOR</title>
<firstname>Oleg</firstname>
<surname>Broytman</surname>
<email>phd@phdru.name</email>
<title>COPYRIGHT</title>
Copyright (C) 2001-2014 PhiloSoft Design.
<title>LICENSE</title>
<title>NO WARRANTIES</title>
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
<title>SEE ALSO</title>
mimedecode.py home page:
<ulink url="http://phdru.name/Software/Python/#mimedecode">http://phdru.name/Software/Python/#mimedecode</ulink>