<?xml version="1.0" standalone="no"?>
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"file:///usr/share/xml/docbook/schema/dtd/4.5/docbookx.dtd">
<refentry id="mimedecode.py">
<title>mimedecode.py</title>
<productname>mimedecode.docbook</productname>
<firstname>Oleg</firstname>
<surname>Broytman</surname>
<email>phd@phdru.name</email>
<year>2001-2014</year>
<holder>PhiloSoft Design.</holder>
<refentrytitle>mimedecode.py</refentrytitle>
<manvolnum>1</manvolnum>
<refname>mimedecode.py</refname>
<refpurpose>decode MIME message</refpurpose>
<command>mimedecode.py</command>
<option>-h|--help</option>
<option>-V|--version</option>
<option>-cCDP</option>
<option>-f charset</option>
<option>-H|--host=hostname</option>
<option>-d header1[,header2,header3...]</option>
<option>-d *[,-header1,-header2,-header3...]</option>
<option>-p header1[,header2,header3,...]:param1[,param2,param3,...]</option>
<option>-p *[,-header1,-header2,-header3,...]:param1[,param2,param3,...]</option>
<option>-p header1[,header2,header3,...]:*[,-param1,-param2,-param3,...]</option>
<option>-p *[,-header1,-header2,-header3,...]:*[,-param1,-param2,-param3,...]</option>
<option>-r header1[,header2,header3...]</option>
<option>-r *[,-header1,-header2,-header3...]</option>
<option>-R header1[,header2,header3,...]:param1[,param2,param3,...]</option>
<option>-R *[,-header1,-header2,-header3,...]:param1[,param2,param3,...]</option>
<option>-R header1[,header2,header3,...]:*[,-param1,-param2,-param3,...]</option>
<option>-R *[,-header1,-header2,-header3,...]:*[,-param1,-param2,-param3,...]</option>
<option>--set-header header:value</option>
<option>--set-param header:param=value</option>
<option>-Bbeit mask</option>
<option>--save-headers|body|message mask</option>
<option>-O dest_dir</option>
<option>-o output_file</option>
<arg choice="opt">input_file
<arg choice="opt">output_file</arg>
<title>DESCRIPTION</title>
Mail users, especially in non-English countries, often find that mail
messages arrive in different formats, with different content types, in
different encodings and charsets. Usually this is good because it allows
one to use an appropriate format, encoding and so on. Sometimes, though,
some unification is desirable. For example, one may want to put mail
messages into an archive, make HTML indices, run a search indexer, etc. In
such situations converting messages to text in one character set and
skipping some binary attachments is highly desirable.
Here is the solution: mimedecode.py!
This is a program to decode MIME messages. The program expects one input
file (either named on the command line or read from stdin), treats it as an
RFC822 message and decodes it to stdout or to an output file. If the file is
not an RFC822 message it is copied to the output unchanged. If the file is a
simple RFC822 message it is decoded as one part. If it is a MIME message
with multiple parts ("attachments") all parts are decoded. Decoding can be
controlled by command-line options.
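<para>
A minimal sketch of that top-level walk, written with Python's standard
email package rather than taken from mimedecode.py itself; the function name
decode_parts is hypothetical. Message.walk() descends into multipart
containers, and get_payload(decode=True) undoes base64 or quoted-printable
transfer encoding for every leaf part:
</para>
<programlisting language="python">
import email


def decode_parts(raw):
    """Return (content_type, decoded_bytes) for every non-multipart part.

    Sketch only: multipart containers are skipped because walk() already
    visits their subparts recursively.
    """
    msg = email.message_from_bytes(raw)
    result = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # a container has no body of its own to decode
        # decode=True reverses base64/quoted-printable and returns bytes
        result.append((part.get_content_type(), part.get_payload(decode=True)))
    return result
</programlisting>
<para>
A message without a Content-Type header is reported as text/plain, the
RFC822 default, so a simple message comes back as a single part.
</para>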
First, for every part the program removes the headers and parameters listed
with the -r and -R options. Then the Subject and Content-Disposition headers
(and all headers listed with the -d and -p options) are examined. If any of
them exist, they are decoded according to RFC2047. The Content-Disposition
header itself is not decoded, only its "filename" parameter. RFC2047-encoded
header parameters violate the RFC, but are widely deployed anyway by ignorant
coders who have never even heard of RFCs. The correct parameter encoding is
specified by RFC2231. This program decodes RFC2231-encoded parameters, too.
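<para>
Both decodings are available in Python's standard library. The sketch below
is not mimedecode.py's code, and the helper names decode_rfc2047 and
decoded_filename are hypothetical. It decodes RFC2047 encoded-words with
email.header.decode_header and make_header. It also relies on
Message.get_filename(), which already applies RFC2231 decoding, and then
additionally decodes a non-standard RFC2047 word inside the parameter:
</para>
<programlisting language="python">
import email
from email.header import decode_header, make_header


def decode_rfc2047(value):
    """Decode RFC2047 encoded-words such as "=?iso-8859-1?Q?caf=E9?="."""
    # decode_header() yields (bytes-or-str, charset) chunks;
    # make_header() joins them and str() returns the Unicode text.
    return str(make_header(decode_header(value)))


def decoded_filename(raw_headers):
    """Return the decoded attachment name of a part, or None."""
    msg = email.message_from_string(raw_headers)
    # Tries Content-Disposition "filename", then Content-Type "name";
    # RFC2231 syntax (filename*=charset''percent-encoded) is decoded here.
    name = msg.get_filename()
    if name is None:
        return None
    # Non-standard but common: an RFC2047 encoded-word inside a parameter.
    return decode_rfc2047(name)
</programlisting>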
Then the body of the message (or the current part) is decoded. Decoding
starts by looking at the Content-Transfer-Encoding header. If the header
specifies a non-8bit encoding (usually base64 or quoted-printable), the body
is converted to 8bit. Then, if its content type is multipart (e.g.
multipart/related or multipart/mixed), every part is recursively decoded. If
it is not multipart, the mailcap database is consulted to find a way to
convert the body to plain text. (I have no idea how mailcap can be configured
on OSes other than POSIX, so please don't ask me; real OS users can consult my example at
<ulink url="http://phdru.name/Software/dotfiles/mailcap.html">http://phdru.name/Software/dotfiles/mailcap.html</ulink>).
The decoding process uses the first copiousoutput filter it can find. If
there are no filters, the body is passed through as is.
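<para>
The "first copiousoutput filter" rule can be illustrated with a deliberately
simplified mailcap reader. This is a sketch, not the program's code. It
ignores continuation lines, escaped semicolons, test= clauses and the other
RFC1524 details that a real parser must handle. It is hand-rolled because
Python's own mailcap module was deprecated in Python 3.11 and removed in
3.13. The name first_copiousoutput is hypothetical:
</para>
<programlisting language="python">
def first_copiousoutput(mailcap_text, content_type):
    """Return the command of the first copiousoutput entry for content_type.

    Entries are tried in file order; an entry matches when its type field
    is the exact type, "maintype/*" or a bare "maintype".  Returns None
    when nothing matches, in which case the body would be passed as is.
    """
    content_type = content_type.lower()
    main_type = content_type.split("/")[0]
    wanted = (content_type, main_type + "/*", main_type)
    for line in mailcap_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = [field.strip() for field in line.split(";")]
        if len(fields) == 1:
            continue  # malformed entry: a type without a command
        entry_type, command, flags = fields[0].lower(), fields[1], fields[2:]
        if entry_type in wanted and "copiousoutput" in flags:
            return command
    return None
</programlisting>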
Then the Content-Type header is consulted for the charset. If it is not equal
to the current locale charset and recoding is allowed, the body text is
recoded. Finally the message headers and the body are flushed to stdout (or
to the output file).
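<para>
A sketch of the recoding step follows. It assumes the body has already been
transfer-decoded to bytes and that the target charset is known; in Python the
locale charset is typically obtained with locale.getpreferredencoding(False).
The helper recode is hypothetical. Replacing unencodable characters with "?"
is a choice made for this illustration, not documented behavior of
mimedecode.py:
</para>
<programlisting language="python">
import codecs


def recode(body, declared_charset, target_charset):
    """Re-encode body bytes from the declared charset to target_charset.

    The body is returned unchanged when no charset is declared or when
    both names denote the same codec (e.g. "latin1" and "iso-8859-1").
    """
    if not declared_charset:
        return body
    if codecs.lookup(declared_charset).name == codecs.lookup(target_charset).name:
        return body  # nothing to do: same character set under another name
    text = body.decode(declared_charset, errors="replace")
    # Characters missing from the target charset become "?".
    return text.encode(target_charset, errors="replace")
</programlisting>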
Please be warned that in the following options the asterisk is a shell
metacharacter and should be escaped or quoted: write either -d \*,-h1,-h2
or -d '*,-h1,-h2'.
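<para>
The two list forms, a plain list and "*" followed by "-" exceptions, can be
modelled as follows. This is a hypothetical sketch of the documented
semantics, not the program's parser. It assumes header names are compared
case-insensitively, as they are in RFC822:
</para>
<programlisting language="python">
def parse_header_list(spec):
    """Parse "h1,h2" or "*,-h1,-h2" into (select_all, names).

    With a leading "*" the names are exceptions; otherwise they are
    the headers to act on.  Items after "*" without "-" are ignored.
    """
    items = [item.strip() for item in spec.split(",") if item.strip()]
    if items and items[0] == "*":
        exceptions = {item[1:].lower() for item in items[1:] if item.startswith("-")}
        return True, exceptions
    return False, {item.lower() for item in items}


def selected(header, parsed):
    """True if the header should be acted on (decoded, removed, ...)."""
    select_all, names = parsed
    if select_all:
        return header.lower() not in names
    return header.lower() in names
</programlisting>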
<title>OPTIONS</title>
Print brief usage help and exit.
<term>--version</term>
Print version and exit.
Recode different character sets in message bodies to the current
default charset; this is the default.
Do not recode character sets in message bodies.
<term>-f charset</term>
Force this charset to be the current default charset instead of
<term>-H hostname</term>
<term>--host=hostname</term>
Use this hostname in X-MIME-Autoconverted headers instead of the
<term>-d header1[,header2,header3...]</term>
Add the header(s) to the list of headers to decode; initially the
list contains the headers "From", "To", "Cc", "Reply-To",
"Mail-Followup-To" and "Subject".
<term>-d *[,-header1,-header2,-header3...]</term>
This variant completely changes header decoding. First, the list of
headers to decode is cleared. Then all headers are decoded except
the given exceptions (headers listed with '-'). In this mode it
would be meaningless to give more than one -d option, but the
program doesn't enforce this.
Clear the list of headers to decode (make it empty).
<term>-p header1[,header2,header3,...]:param1[,param2,param3,...]</term>
Add the parameter(s) to the list of header parameters to decode;
the parameters will be decoded only for the given header(s).
Initially the list contains the header "Content-Type", parameter "name",
and the header "Content-Disposition", parameter "filename".
<term>-p *[,-header1,-header2,-header3,...]:param1[,param2,param3,...]</term>
Add the parameter(s) to the list of header parameters to decode;
the parameters will be decoded for all headers except the given
<term>-p header1[,header2,header3,...]:*[,-param1,-param2,-param3,...]</term>
Decode all parameters except the listed ones for the given headers.
<term>-p *[,-header1,-header2,-header3,...]:*[,-param1,-param2,-param3,...]</term>
Decode all parameters except the listed ones for all headers except the
listed ones.
Clear the list of header parameters to decode (make it empty).
<term>-r header1[,header2,header3...]</term>
Add the header(s) to the list of headers to remove completely;
initially the list is empty.
<term>-r *[,-header1,-header2,-header3...]</term>
Remove all headers except the listed ones.
<term>-R header1[,header2,header3,...]:param1[,param2,param3,...]</term>
Add the parameter(s) to the list of header parameters to remove;
the parameters will be removed only from the given header(s).
Initially the list is empty.
<term>-R *[,-header1,-header2,-header3,...]:param1[,param2,param3,...]</term>
<term>-R header1[,header2,header3,...]:*[,-param1,-param2,-param3,...]</term>
<term>-R *[,-header1,-header2,-header3,...]:*[,-param1,-param2,-param3,...]</term>
Remove the listed parameters (or all parameters except the listed ones)
from these headers (or from all headers except the listed ones).
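<para>
The header:param syntax shared by -p and -R can be read as two independent
lists separated by the first colon. Each side is either a plain list or "*"
followed by "-" exceptions. The sketch below models this with hypothetical
helper names. It is not mimedecode.py's parser, and case-insensitive matching
is an assumption:
</para>
<programlisting language="python">
def parse_list(spec):
    """"a,b" -> (False, {"a", "b"}); "*,-a" -> (True, {"a"}) (exceptions)."""
    items = [i.strip() for i in spec.split(",") if i.strip()]
    if items and items[0] == "*":
        return True, {i[1:].lower() for i in items[1:] if i.startswith("-")}
    return False, {i.lower() for i in items}


def matches(name, parsed):
    everything, names = parsed
    return (name.lower() not in names) if everything else (name.lower() in names)


def parse_param_spec(spec):
    """Split a -p/-R value "headers:params" into two parsed lists."""
    headers, sep, params = spec.partition(":")
    if not sep:
        raise ValueError("expected headers:params, got %r" % spec)
    return parse_list(headers), parse_list(params)


def param_selected(header, param, parsed_spec):
    """True if param of header is covered by the -p/-R specification."""
    header_list, param_list = parsed_spec
    return matches(header, header_list) and matches(param, param_list)
</programlisting>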
<term>--set-header header:value</term>
Set or change the value of the header to the given value
(only in the top-level message).
<term>--set-param header:param=value</term>
Set or change the value of the header's parameter to the given
value (only in the top-level message). The header must exist.
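<para>
The effect of --set-header and --set-param on the top-level message is
similar to the following sketch built on Python's email.message API. The
helpers set_header and set_header_param are hypothetical. Raising KeyError
for a missing header is an assumption about how "the header must exist"
might be enforced:
</para>
<programlisting language="python">
import email


def set_header(msg, name, value):
    """Set or change a header of the top-level message."""
    if name in msg:  # header lookup is case-insensitive
        msg.replace_header(name, value)  # changes the first occurrence in place
    else:
        msg[name] = value  # appends a new header


def set_header_param(msg, header, param, value):
    """Set or change a parameter of an existing header."""
    if header not in msg:
        raise KeyError("header %s not found" % header)
    # replace=True keeps the header at its original position.
    msg.set_param(param, value, header=header, replace=True)
</programlisting>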
Append mask to the list of binary content types; if the message to
decode has a part of this type, the program passes the part as is,
without any additional processing.
Append mask to the list of binary content types that will not be
content-transfer-decoded (they will be left as base64 or such).
Append mask to the list of error content types; if the message to
decode has a part of this type, the program fails with ValueError.
Append mask to the list of content types to ignore; if the message to
decode has a part of this type, the program does not pass it; instead,
the line "Message body of type `%s' skipped." is issued.
Append mask to the list of content types to convert to text; if the
message to decode has a part of this type, the program consults the
mailcap database, finds the first copiousoutput filter and converts the
<term>--save-headers mask</term>
<term>--save-body mask</term>
<term>--save-message mask</term>
Append mask to the list of content types to save to a file;
--save-headers saves only the decoded headers of the message (or
subpart); --save-body saves only the decoded body; --save-message saves
the entire message (or subpart).
<term>-O dest_dir</term>
Set the destination directory for the output files. Default is the current
<term>-o output_file</term>
Save output to the file relative to the destination directory set with
option -O. Also useful in case of redirected stdin:
<programlisting language="sh">mimedecode.py -o output_file &lt; input_file
cat input_file | mimedecode.py -o output_file</programlisting>
The 5 list options (-Bbeit) require more explanation. They allow a user to
control body decoding with great flexibility. Think about the mail archive
mentioned above; for example, its maintainer wants to put only texts there,
convert PostScript/PDF to text, pass HTML and images as is, and ignore everything
mimedecode.py -t application/postscript -t application/pdf -b text/html \
-b 'image/*' -i '*/*'
When the program decodes a message (non-MIME or a non-multipart subpart of a
MIME message), it consults the Content-Type header. The content type is
searched for in all 4 lists, in the order "text-binary-ignore-error". If it
is found, the appropriate action is performed. If not, the program searches
the same lists for the "type/*" mask (the type of "text/html" is just
"text"). If that is found, the appropriate action is performed. If not, the
program searches the same lists for the "*/*" mask. If that is found, the
appropriate action is performed. If not, the program uses the default action,
which is to decode everything to text (if mailcap specifies a filter).
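<para>
The lookup order can be sketched as follows. The function choose_action and
the dictionary-of-sets layout are hypothetical, chosen only to mirror the
description: exact type first, then "type/*", then "*/*". Within each step
the lists are tried in "text-binary-ignore-error" order:
</para>
<programlisting language="python">
# Order in which the lists are searched at each step.
LIST_ORDER = ("text", "binary", "ignore", "error")


def choose_action(content_type, lists):
    """Return the name of the list whose mask matches, or "default".

    lists maps a list name to a set of masks such as "text/html",
    "image/*" or "*/*".
    """
    content_type = content_type.lower()
    main_type = content_type.split("/")[0]
    for mask in (content_type, main_type + "/*", "*/*"):
        for name in LIST_ORDER:
            if mask in lists.get(name, ()):
                return name
    return "default"  # decode to text if mailcap has a filter
</programlisting>
<para>
With the archive example above (-t application/postscript -t application/pdf
-b text/html -b 'image/*' -i '*/*'), application/pdf resolves to "text",
image/png to "binary" via "image/*", and audio/mpeg to "ignore" via "*/*".
</para>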
Initially all 4 lists are empty, so without any additional parameters
the program always uses the default decoding.
The 3 save list options (--save-headers/body/message) are similar. They make
the program save every non-multipart subpart (only the headers, or the body,
or the entire subpart) that matches the given mask to a file. Before saving,
the message (or the subpart) is decoded according to all other options and
written to the output stream as usual. The file name is created using the
"filename" parameter of the Content-Disposition header, or the "name"
parameter of the Content-Type header, if either exists; a serial counter is
prepended to the file name to avoid collisions; if there are no name/filename
parameters, the file name is just the serial counter. The file is saved in
the directory set with -O (default is the current directory).
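<para>
A sketch of the naming rule follows. This manual does not give the exact
counter format or separator that mimedecode.py uses, so the "%d-%s" pattern
and the helper name save_path are assumptions for illustration. The name
itself would come from Message.get_filename(), which checks the
Content-Disposition "filename" parameter before the Content-Type "name"
parameter:
</para>
<programlisting language="python">
import os


def save_path(dest_dir, counter, part_filename=None):
    """Build the path for a saved part: serial counter, then the part's name.

    "%d-%s" is an assumed format, not necessarily the program's own.
    """
    if part_filename:
        # Keep only the last path component so a hostile name cannot
        # escape dest_dir (a safety measure added in this sketch).
        name = "%d-%s" % (counter, os.path.basename(part_filename))
    else:
        name = str(counter)  # no filename/name parameter: counter only
    return os.path.join(dest_dir, name)
</programlisting>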
<title>ENVIRONMENT</title>
<varlistentry><term>LANG</term></varlistentry>
<varlistentry><term>LC_ALL</term></varlistentry>
<varlistentry><term>LC_CTYPE</term></varlistentry>
Define the current locale settings. Used to determine the current default
charset (if your Python is properly installed and configured).
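<para>
Under POSIX rules LC_ALL overrides LC_CTYPE, which overrides LANG, and the
charset is the codeset part of the resulting locale name. The sketch below
spells that precedence out with hypothetical helper names. In practice a
Python program usually calls locale.setlocale(locale.LC_CTYPE, "") and then
locale.getpreferredencoding(False) instead:
</para>
<programlisting language="python">
import os


def effective_ctype_locale(environ=None):
    """Locale governing character classification: LC_ALL, LC_CTYPE, LANG.

    Empty values count as unset; "C" is returned when none is set.
    """
    if environ is None:
        environ = os.environ
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(var)
        if value:
            return value
    return "C"


def charset_of(locale_name):
    """"ru_RU.KOI8-R@modifier" -> "KOI8-R"; None when no codeset is given."""
    base = locale_name.split("@")[0]
    return base.split(".", 1)[1] if "." in base else None
</programlisting>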
The program may produce an incorrect MIME message. The purpose of the program
is to decode whatever can be decoded, not to produce absolutely correct MIME
output. The incorrect parts are obvious: decoded
From/To/Cc/Reply-To/Mail-Followup-To/Subject headers and filenames. Other
than that, the output is a correct MIME message. The program does not try to
guess whether the headers are correct. For example, if a message header
states that the charset is iso8859-5, but the body is actually in utf-8, the
program will recode the message with the wrong charset.
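<para>
The last point is easy to reproduce with plain Python. This illustrates the
general pitfall and is not code from mimedecode.py:
</para>
<programlisting language="python">
# Cyrillic text that is really stored as UTF-8 ...
actual = "Привет".encode("utf-8")

# ... but labelled "charset=iso8859-5" in Content-Type.  Decoding with the
# declared charset raises no error (every byte value is defined in
# Python's iso8859-5 codec), so nothing signals the mistake.
misread = actual.decode("iso8859-5")

# Recoding the misread text to UTF-8 yields mojibake, not the original.
recoded = misread.encode("utf-8")
</programlisting>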
<title>AUTHOR</title>
<firstname>Oleg</firstname>
<surname>Broytman</surname>
<email>phd@phdru.name</email>
<title>COPYRIGHT</title>
Copyright (C) 2001-2014 PhiloSoft Design.
<title>LICENSE</title>
<title>NO WARRANTIES</title>
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
<title>SEE ALSO</title>
mimedecode.py home page:
<ulink url="http://phdru.name/Software/Python/#mimedecode">http://phdru.name/Software/Python/#mimedecode</ulink>