<?xml version="1.0" standalone="no"?>
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"file:///usr/share/xml/docbook/schema/dtd/4.5/docbookx.dtd">
<refentry id="mimedecode.py">
<title>mimedecode.py</title>
<productname>mimedecode.docbook</productname>
<firstname>Oleg</firstname>
<surname>Broytman</surname>
<email>phd@phdru.name</email>
<year>2001-2014</year>
<holder>PhiloSoft Design.</holder>
<refentrytitle>mimedecode.py</refentrytitle>
<manvolnum>1</manvolnum>
<refname>mimedecode.py</refname>
<refpurpose>decode MIME message</refpurpose>
<command>mimedecode.py</command>
<option>-h|--help</option>
<option>-V|--version</option>
<option>-cCDP</option>
<option>-f charset</option>
<option>-H|--host=hostname</option>
<option>-d header1[,header2,header3...]</option>
<option>-d *[,-header1,-header2,-header3...]</option>
<option>-p header1[,header2,header3,...]:param1[,param2,param3,...]</option>
<option>-p *[,-header1,-header2,-header3,...]:param1[,param2,param3,...]</option>
<option>-p header1[,header2,header3,...]:*[,-param1,-param2,-param3,...]</option>
<option>-p *[,-header1,-header2,-header3,...]:*[,-param1,-param2,-param3,...]</option>
<option>-r header1[,header2,header3...]</option>
<option>-r *[,-header1,-header2,-header3...]</option>
<option>-R header1[,header2,header3,...]:param1[,param2,param3,...]</option>
<option>-R *[,-header1,-header2,-header3,...]:param1[,param2,param3,...]</option>
<option>-R header1[,header2,header3,...]:*[,-param1,-param2,-param3,...]</option>
<option>-R *[,-header1,-header2,-header3,...]:*[,-param1,-param2,-param3,...]</option>
<option>--set-header header:value</option>
<option>--set-param header:param=value</option>
<option>-Bbeit mask</option>
<option>--save-headers|body|message mask</option>
<option>-O dest_dir</option>
<option>-o output_file</option>
<arg choice="opt">input_file
<arg choice="opt">output_file</arg>
<title>DESCRIPTION</title>
Mail users, especially in non-English-speaking countries, often find that mail
messages arrive in different formats, with different content types, and in
different encodings and charsets. Usually this is good because it allows the
sender to use an appropriate format, encoding, and so on. Sometimes, though,
some unification is desirable. For example, one may want to put mail messages
into an archive, build HTML indices, run a search indexer, etc. In such
situations, converting messages to text in one character set and skipping
some binary attachments is highly desirable.
Here is the solution: mimedecode.py!
This is a program to decode MIME messages. The program expects one input
file (either on the command line or on stdin), which is treated as an RFC822
message, and decodes it to stdout or to an output file. If the file is not an
RFC822 message, it is just copied to the output unchanged. If the file is a
simple RFC822 message, it is decoded as one part. If it is a MIME message
with multiple parts ("attachments"), all parts are decoded. Decoding can be
controlled by command-line options.
First, for every part the program removes the headers and parameters listed
with the -r and -R options. Then the Subject and Content-Disposition headers
(and all headers listed with the -d and -p options) are examined. If any of
those exist, they are decoded according to RFC2047. The Content-Disposition
header itself is not decoded, only its "filename" parameter. Header parameters
encoded this way violate the RFC, but are widely deployed anyway by ignorant
coders who have never even heard of RFCs. The correct parameter encoding is
specified by RFC2231. This program decodes RFC2231-encoded parameters, too.
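The two decoding schemes mentioned above can be illustrated with Python's standard email package. This is only a minimal sketch, not the program's actual code: the helper names decode_rfc2047 and decode_param are hypothetical, and falling back to ASCII for words without a charset is an assumption.

```python
from email.header import decode_header
from email.message import Message
from email.utils import collapse_rfc2231_value


def decode_rfc2047(value, fallback="ascii"):
    """Join the RFC 2047 encoded-words of a header value into one string."""
    pieces = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded words (and plain text mixed with them) arrive as bytes.
            chunk = chunk.decode(charset or fallback, errors="replace")
        pieces.append(chunk)
    return "".join(pieces)


def decode_param(header_value, param, header="Content-Disposition"):
    """Decode one header parameter, RFC 2231- or RFC 2047-encoded."""
    msg = Message()
    msg[header] = header_value
    raw = msg.get_param(param, header=header)
    if raw is None:
        return None
    # RFC 2231 values come back as a (charset, language, text) tuple.
    text = collapse_rfc2231_value(raw)
    # Tolerate the non-standard RFC 2047 form inside a parameter as well.
    return decode_rfc2047(text)
```

Calling decode_param("attachment; filename*=utf-8''caf%C3%A9.txt", "filename") exercises the RFC2231 path, while a quoted "=?utf-8?b?...?=" value exercises the non-standard RFC2047 path.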
Then the body of the message (or of the current part) is decoded. Decoding
starts by looking at the Content-Transfer-Encoding header. If the header
specifies a non-8bit encoding (usually base64 or quoted-printable), the body
is converted to 8bit. Then, if its content type is multipart (multipart/related
or multipart/mixed, for example), every part is decoded recursively. If it is
not multipart, the mailcap database is consulted to find a way to convert the
body to plain text. (I have no idea how mailcap can be configured on OSes other
than POSIX, so please don't ask me; real OS users can consult my example at
<ulink url="http://phdru.name/Software/dotfiles/mailcap.html">http://phdru.name/Software/dotfiles/mailcap.html</ulink>).
The decoding process uses the first copiousoutput filter it can find. If
there are no filters, the body is just passed through as is.
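The transfer-decoding and recursion steps described above can be sketched with the standard email package. The function names are hypothetical, and the mailcap conversion step is deliberately left out; this is an illustration, not the program's implementation.

```python
import email


def decoded_bodies(msg):
    """Yield (content_type, body_bytes) for every non-multipart part."""
    if msg.is_multipart():
        # multipart/mixed, multipart/related, ...: recurse into each subpart.
        for part in msg.get_payload():
            yield from decoded_bodies(part)
    else:
        # decode=True undoes base64 or quoted-printable when the
        # Content-Transfer-Encoding header names one of them.
        yield msg.get_content_type(), msg.get_payload(decode=True)


def decode_message_text(raw_text):
    """Parse an RFC822 message from a string and decode all its parts."""
    return list(decoded_bodies(email.message_from_string(raw_text)))
```

A message that is not multipart simply yields one pair, which mirrors the "decoded as one part" case.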
Then the Content-Type header is consulted for the charset. If it is not equal to
the current locale charset and recoding is allowed, the body text is recoded.
Finally, the message headers and the body are flushed to stdout.
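The recoding step can be pictured as follows. This is a sketch under stated assumptions: the function name recode is hypothetical, locale.getpreferredencoding stands in for "the current locale charset", and undecodable bytes are replaced rather than reported.

```python
import codecs
import locale


def recode(body, declared_charset, target_charset=None):
    """Re-encode body bytes from the declared charset to the target one."""
    if target_charset is None:
        # Stand-in for the current locale charset.
        target_charset = locale.getpreferredencoding(False)
    # Compare canonical codec names so that "UTF-8" and "utf8" count as equal.
    declared_name = codecs.lookup(declared_charset).name
    if declared_name == codecs.lookup(target_charset).name:
        return body
    text = body.decode(declared_charset, errors="replace")
    return text.encode(target_charset, errors="replace")
```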
Please be warned that in the following options the asterisk is a shell
metacharacter and should be escaped or quoted. Write either -d \*,-h1,-h2
or -d '*,-h1,-h2', or use an equivalent form.
<title>OPTIONS</title>
Print brief usage help and exit.
<term>--version</term>
Print version and exit.
Recode different character sets in message bodies to the current
default charset; this is the default.
Do not recode character sets in message bodies.
<term>-f charset</term>
Force this charset to be the current default charset instead of
<term>-H hostname</term>
<term>--host=hostname</term>
Use this hostname in X-MIME-Autoconverted headers instead of the
<term>-d header1[,header2,header3...]</term>
Add the header(s) to the list of headers to decode; initially the
list contains the headers "From", "To", "Cc", "Reply-To",
"Mail-Followup-To" and "Subject".
<term>-d *[,-header1,-header2,-header3...]</term>
This variant completely changes header decoding. First, the list of
headers to decode is cleared. Then all headers are decoded
except those in the given list of exceptions (headers listed with '-'). In
this mode it would be meaningless to give more than one -d option,
but the program doesn't enforce this.
Clear the list of headers to decode (make it empty).
<term>-p header1[,header2,header3,...]:param1[,param2,param3,...]</term>
Add the parameter(s) to the list of header parameters to decode;
the parameters will be decoded only for the given header(s).
Initially the list contains the header "Content-Type", parameter "name",
and the header "Content-Disposition", parameter "filename".
<term>-p *[,-header1,-header2,-header3,...]:param1[,param2,param3,...]</term>
Add the parameter(s) to the list of header parameters to decode;
the parameters will be decoded for all headers except the given
<term>-p header1[,header2,header3,...]:*[,-param1,-param2,-param3,...]</term>
Decode all parameters except those listed, for the given list of headers.
<term>-p *[,-header1,-header2,-header3,...]:*[,-param1,-param2,-param3,...]</term>
Decode all parameters except those listed, for all headers except those listed.
Clear the list of header parameters to decode (make it empty).
<term>-r header1[,header2,header3...]</term>
Add the header(s) to the list of headers to remove completely;
initially the list is empty.
<term>-r *[,-header1,-header2,-header3...]</term>
Remove all headers except those listed.
<term>-R header1[,header2,header3,...]:param1[,param2,param3,...]</term>
Add the parameter(s) to the list of header parameters to remove;
the parameters will be removed only from the given header(s).
Initially the list is empty.
<term>-R *[,-header1,-header2,-header3,...]:param1[,param2,param3,...]</term>
<term>-R header1[,header2,header3,...]:*[,-param1,-param2,-param3,...]</term>
<term>-R *[,-header1,-header2,-header3,...]:*[,-param1,-param2,-param3,...]</term>
Remove the listed parameters (or all parameters except those listed) from
these headers (or from all headers except those listed).
<term>--set-header header:value</term>
Set or change the value of the header to the given value
(only in the top-level message).
<term>--set-param header:param=value</term>
Set or change the value of the header's parameter to the
given value (only in the top-level message). The header must exist.
Append mask to the list of binary content types that will not be
content-transfer-decoded (they will be left as base64 or such).
Append mask to the list of binary content types; if the message to
decode has a part of this type, the program content-transfer-decodes
it (from base64 or whatever to 8bit binary) and outputs the decoded part
as is, without any further processing.
Append mask to the list of error content types; if the message to
decode has a part of this type, the program fails with ValueError.
Append mask to the list of content types to ignore; if the message
to decode has a part of this type, the program outputs the headers but
skips the body. Instead a line "Message body of type %s skipped."
Append mask to the list of content types to convert to text; if the
message to decode has a part of this type, the program consults the
mailcap database, finds the first copiousoutput filter and, if any
filter is found, converts the part.
<term>--save-headers mask</term>
<term>--save-body mask</term>
<term>--save-message mask</term>
Append mask to the list of content types to save to a file;
--save-headers saves only the decoded headers of the message (or
subpart); --save-body saves only the decoded body; --save-message saves
the entire message or subpart (headers + body).
<term>-O dest_dir</term>
Set the destination directory for the output files; if the directory
doesn't exist, it will be created. The default is the current directory.
<term>-o output_file</term>
Save output to the given file; the path is taken relative to the destination
directory set with option -O. Also useful when stdin is redirected:
<programlisting language="sh">mimedecode.py -o output_file &lt; input_file
cat input_file | mimedecode.py -o output_file</programlisting>
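The header-list syntax shared by the -d and -r options described earlier in this section (either an explicit list, or an asterisk followed by exceptions prefixed with '-') can be modelled with a short Python sketch. The function names are hypothetical, and comparing header names case-insensitively is an assumption; the program's own parser may differ.

```python
def parse_header_spec(spec):
    """Return (match_all, names) for a -d/-r style header list."""
    items = [item.strip() for item in spec.split(",") if item.strip()]
    if items and items[0] == "*":
        # "*,-h1,-h2": everything except the names prefixed with '-'.
        exceptions = {item[1:].lower() for item in items[1:]
                      if item.startswith("-")}
        return True, exceptions
    # "h1,h2": only the listed names.
    return False, {item.lower() for item in items}


def is_selected(header, spec):
    """Tell whether a header name is covered by the given list."""
    match_all, names = parse_header_spec(spec)
    listed = header.lower() in names
    return not listed if match_all else listed
```

For instance, is_selected("Received", "*,-Received") is false, while is_selected("To", "*,-Received") is true.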
The 5 list options (-Bbeit) require more explanation. They allow a user to
control body decoding with great flexibility. Think about the mail archive
mentioned above; suppose its maintainer wants to store only texts there: convert
PDF/PostScript to text, pass HTML and images as is (decoding base64 to HTML
but leaving images in base64), and ignore everything else. Easy:
mimedecode.py -t application/pdf -t application/postscript -b text/html \
    -B 'image/*' -i '*/*'
When the program decodes a message (a non-MIME message or a non-multipart
subpart of a MIME message), it consults the Content-Type header. The content
type is searched for in all 5 lists, in the order "text-binary-ignore-error".
If it is found, the appropriate action is performed. If not, the program
searches the same lists for the "type/*" mask (the type of "text/html" is just
"text"). If that is found, the appropriate action is performed. If not, the
program searches the same lists for the "*/*" mask. If that is found, the
appropriate action is performed. If not, the program uses the default action,
which is to decode everything to text (if mailcap specifies a filter).
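The three-stage lookup can be sketched in Python as follows. The function name and the action labels are hypothetical, and the relative position of the -b and -B lists inside the "binary" step is an assumption, since the text names only four steps for five lists.

```python
def find_action(content_type, lists, default="default"):
    """Return the action of the first list whose masks match the type.

    lists is a sequence of (action, masks) pairs in search order.
    """
    main_type = content_type.split("/", 1)[0]
    # Try the exact type first, then "type/*", then "*/*".
    for mask in (content_type, main_type + "/*", "*/*"):
        for action, masks in lists:
            if mask in masks:
                return action
    return default


# The archive example: -t pdf/postscript, -b text/html, -B image/*, -i */*
ARCHIVE_LISTS = [
    ("text", {"application/pdf", "application/postscript"}),  # -t
    ("binary", {"text/html"}),                                # -b
    ("keep-encoded", {"image/*"}),                            # -B (assumed slot)
    ("ignore", {"*/*"}),                                      # -i
    ("error", set()),                                         # -e
]
```

With these lists, "image/png" is only matched in the second stage, and any type not named explicitly falls through to the "*/*" mask and is ignored.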
Initially all 5 lists are empty, so without any additional parameters
the program always uses the default decoding.
The 3 save options (--save-headers/body/message) are similar. They make the
program save every non-multipart subpart (only the headers, or the body, or the
entire subpart: headers + body) that matches the given mask to a file.
Before saving, the message (or the subpart) is decoded according to all other
options and written to the output stream as usual. The filename is
created from the "filename" parameter of the Content-Disposition header, or the
"name" parameter of the Content-Type header, if one of those exists; a serial
counter is prepended to the filename to avoid collisions. If there are no
name/filename parameters, or the name/filename parameters contain forbidden
characters (null, slash, backslash), the filename is just the serial counter.
The file is saved in the directory set with -O (default is the current
directory).
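The file naming rule for the save options can be pictured with a small Python sketch. The function name is hypothetical, and the "-" separator between the counter and the name is an assumption, as the exact format is not stated here.

```python
import os

# Characters that make a name/filename parameter unusable.
FORBIDDEN = ("\0", "/", "\\")


def output_filename(counter, name=None, dest_dir=os.curdir):
    """Build the path of a saved part from a serial counter and a name."""
    if name and not any(ch in name for ch in FORBIDDEN):
        # The separator is assumed; only the prepended counter is documented.
        basename = "{}-{}".format(counter, name)
    else:
        # No usable name: fall back to the bare counter.
        basename = str(counter)
    return os.path.join(dest_dir, basename)
```

A name such as "../etc/passwd" contains a slash, so only the counter is used and the file cannot escape the destination directory.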
<title>ENVIRONMENT</title>
<varlistentry><term>LANG</term></varlistentry>
<varlistentry><term>LC_ALL</term></varlistentry>
<varlistentry><term>LC_CTYPE</term></varlistentry>
Define the current locale settings. They are used to determine the current
default charset (if your Python is properly installed and configured).
The program may produce an incorrect MIME message. The purpose of the program
is to decode whatever it is possible to decode, not to produce absolutely
correct MIME output. The incorrect parts are obvious: decoded
From/To/Cc/Reply-To/Mail-Followup-To/Subject headers and filenames. Other
than that, the output is a correct MIME message. The program does not try to
guess whether the headers are correct. For example, if a message header states
that the charset is iso8859-5, but the body is actually in utf-8, the program
will recode the message using the wrong charset.
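The effect of a wrongly declared charset is easy to reproduce in Python. This is only an illustration of the failure mode, assuming the recoding target is utf-8; the variable names are hypothetical.

```python
# A short Russian word, written with escapes to keep the source ASCII.
original = "\u041f\u0440\u0438\u0432\u0435\u0442"
body = original.encode("utf-8")   # what the body really contains
declared = "iso8859-5"            # what the header claims

# Recoding trusts the header, so the bytes are read with the wrong codec:
# every byte of the utf-8 sequence becomes a separate character.
misread = body.decode(declared, errors="replace")
recoded = misread.encode("utf-8")
```

The recoded bytes no longer match the original body, and nothing in the process signals an error.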
<title>AUTHOR</title>
<firstname>Oleg</firstname>
<surname>Broytman</surname>
<email>phd@phdru.name</email>
<title>COPYRIGHT</title>
Copyright (C) 2001-2014 PhiloSoft Design.
<title>LICENSE</title>
<title>NO WARRANTIES</title>
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.
<title>SEE ALSO</title>
mimedecode.py home page:
<ulink url="http://phdru.name/Software/Python/#mimedecode">http://phdru.name/Software/Python/#mimedecode</ulink>