1 <?xml version="1.0" standalone="no"?>
2 <!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
3 "file:///usr/share/xml/docbook/schema/dtd/4.5/docbookx.dtd">
5 <refentry id="mimedecode.py">
8 <title>mimedecode.py</title>
9 <productname>mimedecode.docbook</productname>
11 <firstname>Oleg</firstname>
12 <surname>Broytman</surname>
13 <email>phd@phdru.name</email>
17 <year>2001-2014</year>
18 <holder>PhiloSoft Design.</holder>
23 <refentrytitle>mimedecode.py</refentrytitle>
24 <manvolnum>1</manvolnum>
28 <refname>mimedecode.py</refname>
29 <refpurpose>decode MIME message</refpurpose>
34 <command>mimedecode.py</command>
36 <option>-h|--help</option>
39 <option>-V|--version</option>
42 <option>-cCDP</option>
45 <option>-f charset</option>
48 <option>-H|--host=hostname</option>
51 <option>-d header1[,header2,header3...]</option>
54 <option>-d *[,-header1,-header2,-header3...]</option>
57 <option>-p header1[,header2,header3,...]:param1[,param2,param3,...]</option>
60 <option>-p *[,-header1,-header2,-header3,...]:param1[,param2,param3,...]</option>
63 <option>-p header1[,header2,header3,...]:*[,-param1,-param2,-param3,...]</option>
66 <option>-p *[,-header1,-header2,-header3,...]:*[,-param1,-param2,-param3,...]</option>
69 <option>-r header1[,header2,header3...]</option>
72 <option>-r *[,-header1,-header2,-header3...]</option>
75 <option>-R header1[,header2,header3,...]:param1[,param2,param3,...]</option>
78 <option>-R *[,-header1,-header2,-header3,...]:param1[,param2,param3,...]</option>
81 <option>-R header1[,header2,header3,...]:*[,-param1,-param2,-param3,...]</option>
84 <option>-R *[,-header1,-header2,-header3,...]:*[,-param1,-param2,-param3,...]</option>
87 <option>--set-header header:value</option>
90 <option>--set-param header:param=value</option>
93 <option>-Bbeit mask</option>
96 <option>--save-headers|body|message mask</option>
99 <option>-O dest_dir</option>
102 <option>-o output_file</option>
104 <arg choice="opt">input_file
105 <arg choice="opt">output_file</arg>
112 <title>DESCRIPTION</title>
Mail users, especially in non-English-speaking countries, often find that
mail messages arrive in different formats, with different content types, and
in different encodings and charsets. Usually this is good because it allows
the sender to use an appropriate format or encoding. Sometimes, though, some
unification is desirable. For example, one may want to put mail messages
into an archive, build HTML indices, run a search indexer, and so on. In such
situations, converting messages to text in one character set and skipping
some binary attachments is highly desirable.
125 Here is the solution - mimedecode.py!
This is a program to decode MIME messages. The program expects one input
file (given either on the command line or on stdin), which is treated as an
RFC822 message, and writes the decoded result to stdout or an output file.
If the file is not an RFC822 message, it is copied to the output unchanged.
If the file is a simple RFC822 message, it is decoded as one part. If it is
a MIME message with multiple parts ("attachments"), all parts are decoded.
Decoding can be controlled by command-line options.
First, for every part the program removes the headers and parameters listed
with the -r and -R options. Then the Subject and Content-Disposition headers
(and all headers listed with the -d and -p options) are examined. If any of
them exist, they are decoded according to RFC2047. The Content-Disposition
header itself is not decoded, only its "filename" parameter. Header
parameters encoded this way violate the RFC, but they are widely deployed
anyway by ignorant coders who have never even heard of RFCs. The correct
parameter encoding is specified by RFC2231. This program decodes
RFC2231-encoded parameters, too.
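The two header encodings can be illustrated with Python's standard email
package. This is only a minimal sketch, not the program's actual code: the
helper name decode_rfc2047 and the sample message are made up for
illustration.

```python
import email
from email.header import decode_header


def decode_rfc2047(value, fallback="ascii"):
    """Decode an RFC 2047 encoded header value into a str (hypothetical helper)."""
    parts = []
    for chunk, charset in decode_header(value):
        # Encoded words come back as bytes plus a charset name;
        # headers without encoded words come back as plain str.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or fallback, "replace")
        parts.append(chunk)
    return "".join(parts)


# Sample message (made up): an RFC 2047 Subject and an RFC 2231 filename.
RAW = (
    "Subject: =?iso-8859-1?q?caf=E9?=\n"
    "Content-Disposition: attachment; filename*=utf-8''na%C3%AFve.txt\n"
    "\n"
    "body\n"
)

msg = email.message_from_string(RAW)
subject = decode_rfc2047(msg["Subject"])
# get_filename() collapses the RFC 2231 (charset, language, value) triple.
filename = msg.get_filename()
```

Here the Subject goes through the RFC2047 path, while the "filename"
parameter is decoded by the library according to RFC2231.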
Then the body of the message (or of the current part) is decoded. Decoding
starts by looking at the Content-Transfer-Encoding header. If the header
specifies a non-8bit encoding (usually base64 or quoted-printable), the body
is converted to 8bit. Then, if the content type is multipart (for example,
multipart/related or multipart/mixed), every part is decoded recursively. If
it is not multipart, the mailcap database is consulted to find a way to
convert the body to plain text. (I have no idea how mailcap can be configured
on OSes other than POSIX, please don't ask me; real OS users can consult my
example at
<ulink url="http://phdru.name/Software/dotfiles/mailcap.html">http://phdru.name/Software/dotfiles/mailcap.html</ulink>).
The decoding process uses the first copiousoutput filter it can find. If
there are no filters, the body is passed as is.
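The transfer-decoding and recursion steps can be sketched with the standard
email package. This is an illustration under assumptions, not the program's
own implementation: walk_leaf_parts and the sample message are hypothetical,
and the mailcap filter step is left out.

```python
import email


def walk_leaf_parts(msg):
    """Yield (content_type, raw_bytes) for every non-multipart part."""
    if msg.is_multipart():
        # A multipart payload is a list of sub-messages; recurse into each.
        for part in msg.get_payload():
            yield from walk_leaf_parts(part)
    else:
        # decode=True undoes base64 or quoted-printable transfer encoding.
        yield msg.get_content_type(), msg.get_payload(decode=True)


# Sample two-part message (made up for illustration).
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "aGVsbG8=\n"
    "--XX\n"
    "Content-Type: text/plain; charset=iso-8859-1\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "caf=E9\n"
    "--XX--\n"
)

parts = list(walk_leaf_parts(email.message_from_string(RAW)))
```

Each leaf part comes out as 8bit bytes; converting those bytes to plain text
through a mailcap filter would be a separate step.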
Then the Content-Type header is consulted for the charset. If it is not equal
to the current locale charset and recoding is allowed, the body text is
recoded. Finally, the message headers and the body are flushed to stdout.
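The recoding step amounts to decoding the body bytes from the declared
charset and encoding them again in the target charset. A minimal sketch,
assuming that locale.getpreferredencoding() is an acceptable stand-in for
the "current locale charset"; the function name recode_body is hypothetical:

```python
import locale


def recode_body(raw, source_charset, target_charset=None):
    """Recode body bytes from source_charset to target_charset."""
    if target_charset is None:
        # Assumption: the locale's preferred encoding plays the role of
        # the current default charset.
        target_charset = locale.getpreferredencoding(False)
    if source_charset.lower() == target_charset.lower():
        # Nothing to do when the charsets already match.
        return raw
    text = raw.decode(source_charset, "replace")
    return text.encode(target_charset, "replace")


utf8_body = recode_body(b"caf\xe9", "iso-8859-1", "utf-8")
```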
Please be warned that in the following options the asterisk is a shell
metacharacter and should be escaped or quoted. Write either -d \*,-h1,-h2
or -d '*,-h1,-h2' or similar.
179 <title>OPTIONS</title>
186 Print brief usage help and exit.
193 <term>--version</term>
196 Print version and exit.
205 Recode different character sets in message bodies to the current
206 default charset; this is the default.
215 Do not recode character sets in message bodies.
221 <term>-f charset</term>
224 Force this charset to be the current default charset instead of
231 <term>-H hostname</term>
232 <term>--host=hostname</term>
235 Use this hostname in X-MIME-Autoconverted headers instead of the
242 <term>-d header1[,header2,header3...]</term>
245 Add the header(s) to a list of headers to decode; initially the
246 list contains headers "From", "To", "Cc", "Reply-To",
247 "Mail-Followup-To" and "Subject".
253 <term>-d *[,-header1,-header2,-header3...]</term>
This variant completely changes header decoding. First, the list of
headers to decode is cleared. Then all headers are decoded except those in
the given list of exceptions (headers listed with '-'). In this mode it is
meaningless to give more than one -d option, but the program doesn't
enforce this.
269 Clear the list of headers to decode (make it empty).
275 <term>-p header1[,header2,header3,...]:param1[,param2,param3,...]</term>
Add the parameter(s) to the list of header parameters to decode;
the parameters will be decoded only for the given header(s).
Initially the list contains header "Content-Type", parameter "name";
and header "Content-Disposition", parameter "filename".
287 <term>-p *[,-header1,-header2,-header3,...]:param1[,param2,param3,...]</term>
Add the parameter(s) to the list of header parameters to decode;
the parameters will be decoded for all headers except the given
298 <term>-p header1[,header2,header3,...]:*[,-param1,-param2,-param3,...]</term>
Decode all parameters except those listed, for the given list of headers.
307 <term>-p *[,-header1,-header2,-header3,...]:*[,-param1,-param2,-param3,...]</term>
Decode all parameters except those listed, for all headers except those listed.
Clear the list of header parameters to decode (make it empty).
325 <term>-r header1[,header2,header3...]</term>
328 Add the header(s) to a list of headers to remove completely;
329 initially the list is empty.
335 <term>-r *[,-header1,-header2,-header3...]</term>
Remove all headers except those listed.
344 <term>-R header1[,header2,header3,...]:param1[,param2,param3,...]</term>
Add the parameter(s) to the list of header parameters to remove;
the parameters will be removed only for the given header(s).
Initially the list is empty.
355 <term>-R *[,-header1,-header2,-header3,...]:param1[,param2,param3,...]</term>
359 <term>-R header1[,header2,header3,...]:*[,-param1,-param2,-param3,...]</term>
363 <term>-R *[,-header1,-header2,-header3,...]:*[,-param1,-param2,-param3,...]</term>
Remove the listed parameters (or all parameters except those listed) from
these headers (or from all headers except those listed).
373 <term>--set-header header:value</term>
The program sets or changes the value of the header to the given value
(only at the top-level message).
383 <term>--set-param header:param=value</term>
The program sets or changes the value of the header's parameter to the
given value (only at the top-level message). The header must exist.
Append mask to the list of binary content types that will not be
content-transfer-decoded (will be left as base64 or such).
406 Append mask to the list of binary content types; if the message to
407 decode has a part of this type the program will
408 content-transfer-decode (base64 or whatever to 8bit binary) it but
409 pass the part as is, without any further processing.
418 Append mask to the list of error content types; if the message to
419 decode has a part of this type the program fails with ValueError.
Append mask to the list of content types to ignore; if the message to
decode has a part of this type, the program will not pass it; instead,
a line "Message body of type `%s' skipped." will be issued.
439 Append mask to the list of content types to convert to text; if the
440 message to decode has a part of this type the program will consult
441 mailcap database, find first copiousoutput filter and convert the
448 <term>--save-headers mask</term>
452 <term>--save-body mask</term>
456 <term>--save-message mask</term>
459 Append mask to a list of content types to save to a file;
460 --save-headers saves only decoded headers of the message (or
461 subpart); --save-body saves only decoded body; --save-message saves
462 the entire message or subpart (headers + body).
468 <term>-O dest_dir</term>
471 Set destination directory for the output files; if the directory
472 doesn't exist it will be created. Default is current directory.
478 <term>-o output_file</term>
Save output to the file, relative to the destination directory from
option -O. Also useful in case of redirected stdin:
<programlisting language="sh">mimedecode.py -o output_file &lt; input_file
484 cat input_file | mimedecode.py -o output_file</programlisting>
The 5 list options (-Bbeit) require more explanation. They allow a user to
control body decoding with great flexibility. Consider the mail archive
mentioned above; suppose its maintainer wants to store only text, convert
PDF/PostScript to text, pass HTML and images as is (decoding HTML from base64
but leaving images in base64), and ignore everything else. Easy:
500 mimedecode.py -t application/pdf -t application/postscript -b text/html
501 -B 'image/*' -i '*/*'
When the program decodes a message (a non-MIME message or a non-multipart
subpart of a MIME message), it consults the Content-Type header. The content
type is searched for in all 5 lists, in the order
"text-binary-ignore-error". If it is found, the appropriate action is
performed. If not, the program searches the same lists for the "type/*" mask
(the type of "text/html" is just "text"). If that is found, the appropriate
action is performed. If not, the program searches the same lists for the
"*/*" mask. If that is found, the appropriate action is performed. If not,
the program uses the default action, which is to decode everything to text
(if mailcap specifies a filter).
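The three-step lookup can be sketched as follows. This is a hedged
illustration rather than the program's code: the function find_action, the
action names, and the order in which the lists are passed are assumptions;
the masks are taken from the example command above.

```python
def find_action(content_type, lists):
    """Return the action of the first list that matches content_type.

    lists is a sequence of (action, masks) pairs, searched in order.
    """
    content_type = content_type.lower()
    main_type = content_type.split("/", 1)[0]
    # Try the exact type first, then "type/*", then "*/*".
    for mask in (content_type, main_type + "/*", "*/*"):
        for action, masks in lists:
            if mask in masks:
                return action
    # No list matched: fall back to the default decoding.
    return "default"


# Lists corresponding to:
#   -t application/pdf -t application/postscript -b text/html
#   -B 'image/*' -i '*/*'
# The action names are made up for this sketch.
LISTS = [
    ("text", {"application/pdf", "application/postscript"}),
    ("binary", {"text/html"}),
    ("binary-undecoded", {"image/*"}),
    ("ignore", {"*/*"}),
    ("error", set()),
]

pdf_action = find_action("application/pdf", LISTS)
png_action = find_action("image/png", LISTS)
audio_action = find_action("audio/mpeg", LISTS)
```

With these lists, a PDF part hits an exact match, a PNG image falls through
to the "image/*" mask, and an audio part is caught only by "*/*".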
518 Initially all 5 lists are empty, so without any additional parameters
519 the program always uses the default decoding.
The 3 save list options (--save-headers/body/message) are similar. They make
the program save every non-multipart subpart (only the headers, or the body,
or the entire subpart) that corresponds to the given mask to a file. Before
saving, the message (or the subpart) is decoded according to all other
options and written to the output stream as usual. The filename is created
from the "filename" parameter of the Content-Disposition header, or from the
"name" parameter of the Content-Type header, if one of those exists; a serial
counter is prepended to the filename to avoid collisions; if there are no
name/filename parameters, the filename is just the serial counter. The file
is saved in the directory set with -O (default is the current directory).
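The naming rule can be sketched like this. The exact filename format the
program uses is not specified here, so the "-" separator, the function name
make_save_path, and the basename() precaution are all assumptions made for
illustration:

```python
import os


def make_save_path(dest_dir, counter, name=None):
    """Build the output path for a saved part (hypothetical helper)."""
    if name:
        # Added precaution, not from the description above: drop any
        # directory components from a name taken out of a mail header.
        name = os.path.basename(name)
    if name:
        # Assumption: counter and name are joined with "-".
        filename = "%d-%s" % (counter, name)
    else:
        # No name/filename parameter: the counter alone is the filename.
        filename = str(counter)
    return os.path.join(dest_dir, filename)


named = make_save_path("out", 1, "report.pdf")
unnamed = make_save_path("out", 2)
```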
538 <title>ENVIRONMENT</title>
540 <varlistentry><term>LANG</term></varlistentry>
541 <varlistentry><term>LC_ALL</term></varlistentry>
542 <varlistentry><term>LC_CTYPE</term></varlistentry>
545 Define current locale settings. Used to determine current default charset (if
546 your Python is properly installed and configured).
The program may produce an incorrect MIME message. The purpose of the program
is to decode whatever can be decoded, not to produce absolutely correct MIME
output. The incorrect parts are obvious: the decoded
From/To/Cc/Reply-To/Mail-Followup-To/Subject headers and filenames. Other
than that, the output is a correct MIME message. The program does not try to
guess whether the headers are correct. For example, if a message header
states that the charset is iso8859-5 but the body is actually in utf-8, the
program will recode the message with the wrong charset.
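The effect of a mislabelled charset is easy to reproduce in plain Python.
This sketch only demonstrates the failure mode; recode_trusting_header is a
hypothetical name, not a function of the program:

```python
def recode_trusting_header(raw, declared_charset, target_charset="utf-8"):
    """Recode raw bytes, trusting the declared charset without checking it."""
    text = raw.decode(declared_charset, "replace")
    return text.encode(target_charset, "replace")


# A Russian word (six Cyrillic letters), actually encoded in utf-8.
original = "\u043f\u0440\u0438\u0432\u0435\u0442"
raw = original.encode("utf-8")

# The header claims iso8859-5: every utf-8 byte becomes its own character.
garbled = recode_trusting_header(raw, "iso8859-5").decode("utf-8")
# With the correct label the text survives unchanged.
correct = recode_trusting_header(raw, "utf-8").decode("utf-8")
```

The garbled result has twice as many characters as the original, because
each two-byte utf-8 sequence is read as two iso8859-5 characters.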
567 <title>AUTHOR</title>
569 <firstname>Oleg</firstname>
570 <surname>Broytman</surname>
571 <email>phd@phdru.name</email>
577 <title>COPYRIGHT</title>
579 Copyright (C) 2001-2014 PhiloSoft Design.
585 <title>LICENSE</title>
593 <title>NO WARRANTIES</title>
595 This program is distributed in the hope that it will be useful, but WITHOUT
596 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
597 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
604 <title>SEE ALSO</title>
606 mimedecode.py home page:
607 <ulink url="http://phdru.name/Software/Python/#mimedecode">http://phdru.name/Software/Python/#mimedecode</ulink>