<?xml version="1.0" standalone="no"?>
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"file:///usr/share/xml/docbook/schema/dtd/4.5/docbookx.dtd">
<refentry id="mimedecode.py">
<title>mimedecode.py</title>
<productname>mimedecode.docbook</productname>
<firstname>Oleg</firstname>
<surname>Broytman</surname>
<email>phd@phdru.name</email>
<year>2001-2014</year>
<holder>PhiloSoft Design.</holder>
<refentrytitle>mimedecode.py</refentrytitle>
<manvolnum>1</manvolnum>
<refname>mimedecode.py</refname>
<refpurpose>decode MIME message</refpurpose>
<command>mimedecode.py</command>
<option>-h|--help</option>
<option>-V|--version</option>
<option>-cCDP</option>
<option>-f charset</option>
<option>-H|--host=hostname</option>
<option>-d header1[,header2,header3...]</option>
<option>-d *[,-header1,-header2,-header3...]</option>
<option>-p header1[,header2,header3,...]:param1[,param2,param3,...]</option>
<option>-p *[,-header1,-header2,-header3,...]:param1[,param2,param3,...]</option>
<option>-p header1[,header2,header3,...]:*[,-param1,-param2,-param3,...]</option>
<option>-p *[,-header1,-header2,-header3,...]:*[,-param1,-param2,-param3,...]</option>
<option>-r header1[,header2,header3...]</option>
<option>-r *[,-header1,-header2,-header3...]</option>
<option>-R header1[,header2,header3,...]:param1[,param2,param3,...]</option>
<option>-R *[,-header1,-header2,-header3,...]:param1[,param2,param3,...]</option>
<option>-R header1[,header2,header3,...]:*[,-param1,-param2,-param3,...]</option>
<option>-R *[,-header1,-header2,-header3,...]:*[,-param1,-param2,-param3,...]</option>
<option>--set-header header:value</option>
<option>--set-param header:param=value</option>
<option>-Bbeit mask</option>
<option>-o output_file</option>
<arg choice="opt">input_file
<arg choice="opt">output_file</arg>
<title>DESCRIPTION</title>
Mail users, especially in non-English countries, often find that mail
messages arrive in different formats, with different content types, in
different encodings and charsets. Usually this is good because it allows
the sender to use an appropriate format, encoding and so on. Sometimes,
though, some unification is desirable. For example, one may want to put
mail messages into an archive, build HTML indices, run a search indexer,
etc. In such situations converting messages to text in one character set
and skipping some binary attachments is highly desirable.
Here is the solution - mimedecode.py!
This is a program to decode MIME messages. The program expects one input
file (either on the command line or on stdin), which is treated as an RFC822
message, and decodes it to stdout or to an output file. If the file is not an
RFC822 message it is just copied to the output unchanged. If the file is a
simple RFC822 message it is decoded as one part. If it is a MIME message
with multiple parts ("attachments") all parts are decoded. Decoding can be
controlled by command-line options.
First, for every part the program removes the headers and parameters listed
with the -r and -R options. Then the Subject and Content-Disposition headers
(and all headers listed with the -d and -p options) are examined. If any of
them exist, they are decoded according to RFC2047. The Content-Disposition
header itself is not decoded - only its "filename" parameter. Encoded header
parameters violate the RFC, but are widely deployed anyway by ignorant coders
who have never even heard of RFCs. The correct parameter encoding is
specified by RFC2231. This program decodes RFC2231-encoded parameters, too.
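A minimal sketch of both kinds of decoding, written against the email
package of the Python standard library. The helper names decode_rfc2047 and
decode_filename are hypothetical and chosen for illustration only; this is
not necessarily how mimedecode.py implements the step.
<programlisting language="python">from email import message_from_string
from email.header import decode_header
from email.utils import collapse_rfc2231_value


def decode_rfc2047(value, fallback="ascii"):
    """Decode RFC 2047 encoded words in a header value to text."""
    chunks = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            # Encoded words come back as bytes plus their charset.
            data = data.decode(charset or fallback, "replace")
        chunks.append(data)
    return "".join(chunks)


def decode_filename(raw_message):
    """Return the decoded "filename" parameter of Content-Disposition."""
    msg = message_from_string(raw_message)
    value = msg.get_param("filename", header="content-disposition")
    if value is None:
        return None
    # RFC 2231 values arrive as a (charset, language, text) tuple;
    # plain values arrive as a string. Both are handled here.
    return collapse_rfc2231_value(value)</programlisting>
For example, decode_rfc2047("=?iso-8859-1?q?caf=E9?=") returns the text
"café", and decode_filename() applied to a message whose header reads
Content-Disposition: attachment; filename*=utf-8''caf%C3%A9.txt returns
"café.txt".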
Then the body of the message (or of the current part) is decoded. Decoding
starts by looking at the Content-Transfer-Encoding header. If the header
specifies a non-8bit encoding (usually base64 or quoted-printable), the body
is converted to 8bit. Then, if its content type is multipart
(multipart/related or multipart/mixed, e.g.) every part is recursively
decoded. If it is not multipart, the mailcap database is consulted to find a
way to convert the body to plain text. (I have no idea how mailcap can be
configured on OSes other than POSIX, please don't ask me; real OS users can
consult my example at
<ulink url="http://phdru.name/Software/dotfiles/mailcap.html">http://phdru.name/Software/dotfiles/mailcap.html</ulink>).
The decoding process uses the first copiousoutput filter it can find. If
there are no filters the body is just passed as is.
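The transfer-decoding and recursion steps can be sketched with the email
package of the Python standard library. This is an illustration under
assumptions, not the program's actual code: the helper name decoded_parts is
hypothetical, and the mailcap lookup is left out entirely.
<programlisting language="python">from email import message_from_string


def decoded_parts(raw_message):
    """Return (content type, decoded bytes) for every non-multipart part."""
    msg = message_from_string(raw_message)
    result = []
    # walk() visits the message itself and then, recursively, every subpart.
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no body of their own
        # decode=True undoes base64 or quoted-printable transfer encoding.
        result.append((part.get_content_type(), part.get_payload(decode=True)))
    return result</programlisting>
A single-part message with Content-Transfer-Encoding: base64 and the body
aGVsbG8= yields one entry whose payload is the bytes b"hello".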
Then the Content-Type header is consulted for the charset. If it is not equal
to the current locale charset and recoding is allowed, the body text is
recoded. Finally the message headers and the body are flushed to stdout.
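A minimal sketch of the recoding step, assuming the body is already
transfer-decoded bytes. The function name recode is hypothetical, and using
locale.getpreferredencoding() as the "current locale charset" is an
assumption made for this example.
<programlisting language="python">import codecs
import locale


def recode(body, declared_charset, target_charset=None):
    """Recode body bytes from the declared charset to the target charset."""
    if target_charset is None:
        # Assumption: the locale's preferred encoding stands in for the
        # current default charset.
        target_charset = locale.getpreferredencoding(False)
    # Compare canonical codec names so that aliases such as "latin1" and
    # "iso-8859-1" count as the same charset.
    if codecs.lookup(declared_charset).name == codecs.lookup(target_charset).name:
        return body
    text = body.decode(declared_charset, "replace")
    return text.encode(target_charset, "replace")</programlisting>
For example, recode(b"caf\xe9", "iso-8859-1", "utf-8") returns
b"caf\xc3\xa9". The sketch trusts the declared charset, so a wrong
declaration produces wrongly recoded output.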
Please be warned that in the following options the asterisk is a shell
metacharacter and should be escaped or quoted. Write either -d \*,-h1,-h2
or -d '*,-h1,-h2' or similar.
<title>OPTIONS</title>
Print brief usage help and exit.
<term>--version</term>
Print version and exit.
Recode different character sets in message bodies to the current
default charset; this is the default.
Do not recode character sets in message bodies.
<term>-f charset</term>
Force this charset to be the current default charset instead of
<term>-H hostname</term>
<term>--host=hostname</term>
Use this hostname in X-MIME-Autoconverted headers instead of the
<term>-d header1[,header2,header3...]</term>
Add the header(s) to a list of headers to decode; initially the
list contains headers "From", "To", "Cc", "Reply-To",
"Mail-Followup-To" and "Subject".
<term>-d *[,-header1,-header2,-header3...]</term>
This variant completely changes header decoding. First, the list of
headers to decode is cleared. Then all headers are decoded except
those in the given list of exceptions (headers listed with '-'). In
this mode it would be meaningless to give more than one -d option,
but the program doesn't enforce it.
Clear the list of headers to decode (make it empty).
<term>-p header1[,header2,header3,...]:param1[,param2,param3,...]</term>
Add the parameter(s) to the list of header parameters to decode;
the parameters will be decoded only for the given header(s).
Initially the list contains header "Content-Type", parameter "name";
and header "Content-Disposition", parameter "filename".
<term>-p *[,-header1,-header2,-header3,...]:param1[,param2,param3,...]</term>
Add the parameter(s) to the list of header parameters to decode;
the parameters will be decoded for all headers except the given
<term>-p header1[,header2,header3,...]:*[,-param1,-param2,-param3,...]</term>
Decode all parameters except those listed, for the given list of headers.
<term>-p *[,-header1,-header2,-header3,...]:*[,-param1,-param2,-param3,...]</term>
Decode all parameters except those listed, for all headers except those listed.
Clear the list of header parameters to decode (make it empty).
<term>-r header1[,header2,header3...]</term>
Add the header(s) to a list of headers to remove completely;
initially the list is empty.
<term>-r *[,-header1,-header2,-header3...]</term>
Remove all headers except those listed.
<term>-R header1[,header2,header3,...]:param1[,param2,param3,...]</term>
Add the parameter(s) to the list of header parameters to remove;
the parameters will be removed only from the given header(s).
Initially the list is empty.
<term>-R *[,-header1,-header2,-header3,...]:param1[,param2,param3,...]</term>
<term>-R header1[,header2,header3,...]:*[,-param1,-param2,-param3,...]</term>
<term>-R *[,-header1,-header2,-header3,...]:*[,-param1,-param2,-param3,...]</term>
Remove the listed parameters (or all parameters except those listed) from
these headers (or from all headers except those listed).
<term>--set-header header:value</term>
The program sets or changes the value of the header to the given value
(only at the top-level message).
<term>--set-param header:param=value</term>
The program sets or changes the value of the header's parameter to the
given value (only at the top-level message). The header must exist.
Append mask to the list of binary content types; if the message to
decode has a part of this type the program will pass the part as is,
without any additional processing.
Append mask to the list of binary content types that will not be
content-transfer-decoded (will be left as base64 or such).
Append mask to the list of error content types; if the message to
decode has a part of this type the program fails with ValueError.
Append mask to the list of content types to ignore; if the message to
decode has a part of this type the program will not pass it; instead,
the line "Message body of type `%s' skipped." will be issued.
Append mask to the list of content types to convert to text; if the
message to decode has a part of this type the program will consult
mailcap database, find first copiousoutput filter and convert the
<term>-o output_file</term>
Useful to set the output file in case of redirected stdin:
<programlisting language="sh">mimedecode.py -o output_file &lt; input_file
cat input_file | mimedecode.py -o output_file</programlisting>
The 5 list options (-Bbeit) require more explanation. They allow a user to
control body decoding with great flexibility. Consider the mail archive
mentioned above; for example, its maintainer wants to store only texts,
convert PostScript/PDF to text, pass HTML and images as is, and ignore everything
mimedecode.py -t application/postscript -t application/pdf -b text/html
-b 'image/*' -i '*/*'
When the program decodes a message (non-MIME or a non-multipart subpart of a
MIME message), it consults the Content-Type header. The content type is
searched for in all 4 lists, in the order "text-binary-ignore-error". If it
is found, the appropriate action is performed. If not, the program searches
the same lists for the "type/*" mask (the type of "text/html" is just
"text"). If that is found, the appropriate action is performed. If not, the
program searches the same lists for the "*/*" mask. If that is found, the
appropriate action is performed. If not, the program uses the default
action, which is to decode everything to text (if mailcap specifies a
filter).
Initially all 4 lists are empty, so without any additional parameters
the program always uses the default decoding.
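The lookup order described above can be sketched in Python. The function name
find_action and the masks dictionary are hypothetical, introduced only for
this illustration; the real program may store its lists differently.
<programlisting language="python">ORDER = ("text", "binary", "ignore", "error")


def find_action(content_type, masks):
    """Pick the action for a content type; masks maps action to mask list."""
    main_type = content_type.split("/", 1)[0]
    # Try the exact type first, then "type/*", then "*/*".
    for candidate in (content_type, main_type + "/*", "*/*"):
        for action in ORDER:
            if candidate in masks.get(action, ()):
                return action
    # Nothing matched: fall back to the default, decoding to text.
    return "text"


# The lists built by: -t application/postscript -t application/pdf
#                     -b text/html -b 'image/*' -i '*/*'
masks = {
    "text": ["application/postscript", "application/pdf"],
    "binary": ["text/html", "image/*"],
    "ignore": ["*/*"],
}</programlisting>
With these lists, find_action("application/pdf", masks) returns "text",
"image/png" maps to "binary" through the "image/*" mask, and "audio/mpeg"
maps to "ignore" through "*/*". With empty lists every type falls through to
the default, "text".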
<title>ENVIRONMENT</title>
<varlistentry><term>LANG</term></varlistentry>
<varlistentry><term>LC_ALL</term></varlistentry>
<varlistentry><term>LC_CTYPE</term></varlistentry>
Define the current locale settings. They are used to determine the current
default charset (if your Python is properly installed and configured).
The program may produce an incorrect MIME message. The purpose of the program
is to decode whatever it is possible to decode, not to produce absolutely
correct MIME output. The incorrect parts are obvious - decoded
From/To/Cc/Reply-To/Mail-Followup-To/Subject headers and filenames. Other
than that, the output is a correct MIME message. The program does not try to
guess whether the headers are correct. For example, if a message header
states that the charset is iso8859-5, but the body is actually in utf-8, the
program will recode the message with the wrong charset.
<title>AUTHOR</title>
<firstname>Oleg</firstname>
<surname>Broytman</surname>
<email>phd@phdru.name</email>
<title>COPYRIGHT</title>
Copyright (C) 2001-2014 PhiloSoft Design.
<title>LICENSE</title>
<title>NO WARRANTIES</title>
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
<title>SEE ALSO</title>
mimedecode.py home page:
<ulink url="http://phdru.name/Software/Python/#mimedecode">http://phdru.name/Software/Python/#mimedecode</ulink>