[Python] Saving e-mails to disk (from a gmail account)

While trying to download all my spam messages to disk (I'm building a database of spam messages), I realised that this was no simple task. One option would be to save the messages using Thunderbird or Outlook, but since I use neither (I find the Gmail web interface very nice and user-friendly), that was out of the question. However, after a little browsing I discovered a wonderful Python package called libgmail. Long story short, here is the script I used to download all the messages from the spam folder:

#!/usr/bin/env python
# savemsg.py -- Download all messages from a specified folder
# License: GPL 2.0
# Python 2 script; requires libgmail.

import sys
from getpass import getpass
import libgmail

if __name__ == "__main__":
    # Take the account name from the command line, or ask for it.
    try:
        name = sys.argv[1]
    except IndexError:
        name = raw_input("Gmail account name: ")

    pw = getpass("Password: ")

    ga = libgmail.GmailAccount(name, pw)

    print "\nPlease wait, logging in..."

    try:
        ga.login()
    except libgmail.GmailLoginFailure, e:
        print "\nLogin failed. (%s)" % e.message
        sys.exit(1)
    else:
        print "Login successful.\n"

    # Standard folders, keyed by the corresponding libgmail constant name.
    FOLDER_list = {'U_INBOX_SEARCH' : 'inbox',
                   'U_STARRED_SEARCH' : 'starred',
                   'U_ALL_SEARCH' : 'all',
                   'U_DRAFTS_SEARCH' : 'drafts',
                   'U_SENT_SEARCH' : 'sent',
                   'U_SPAM_SEARCH' : 'spam'}

    folder_name = raw_input('Choose a folder (inbox, starred, all, drafts, sent, spam): ')
    if folder_name not in FOLDER_list.values():
        print "Unknown folder: %s" % folder_name
        sys.exit(1)
    folder = ga.getMessagesByFolder(folder_name)

    for thread in folder:
        for msg in thread:
            print "Downloading message %s " % msg.id
            # Look for a declared charset in the raw message source.
            encIndexStart = msg.source.find('charset=')
            if encIndexStart != -1:
                # The charset value ends at the nearest delimiter after it.
                encIndexEnd = (msg.source.find(' ', encIndexStart),
                               msg.source.find('\n', encIndexStart),
                               msg.source.find(';', encIndexStart),
                               msg.source.find('"', encIndexStart + 10))
                encIndexEnd = [ind for ind in encIndexEnd if ind != -1]
                # No delimiter found: the value runs to the end of the source.
                encIndexEnd = min(encIndexEnd) if encIndexEnd else len(msg.source)
                enc = msg.source[encIndexStart + 8:encIndexEnd]
                enc = enc.replace('"', '').replace(';', '').strip()
            else:
                enc = 'ascii'
            print "Detected encoding %s\n" % enc
            try:
                f = open(msg.id + " " + msg.subject + ".txt", 'w')
            except IOError:
                # message subject contains characters forbidden by the os in the
                # file name, use just message id
                f = open(msg.id + ".txt", 'w')
            # Save the message re-encoded as UTF-8.
            f.write(msg.source.decode(enc).encode('utf-8'))
            f.close()
    print "\n\nDone."
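
The script falls back to a file name made of the message id alone whenever the subject contains a character the OS rejects. One possible alternative, shown here only as a sketch, is to clean the subject first so it can always be kept. The helper name safe_filename, the max_len limit and the exact set of forbidden characters are assumptions made for illustration (the set follows the usual Windows rules); none of this comes from libgmail.

```python
import re

# Characters commonly rejected in file names (Windows rules; '/' also on Unix).
# This set is an assumption: adjust it for your own file system.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(msg_id, subject, max_len=100):
    """Build '<id> <subject>.txt' with forbidden characters replaced by '_'."""
    cleaned = _FORBIDDEN.sub('_', subject).strip(' .')
    if not cleaned:
        # Nothing usable left in the subject: fall back to the id alone.
        return msg_id + '.txt'
    # Truncate long subjects so the name stays within typical length limits.
    return msg_id + ' ' + cleaned[:max_len] + '.txt'
```

With such a helper, a single open(safe_filename(msg.id, msg.subject), 'w') could stand in for the two open calls in the script.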

The script can download messages from any Gmail folder. The encoding of each message is detected automatically from its charset= declaration (falling back to ASCII when none is found), and the message is saved in UTF-8 to facilitate later processing. Of course, libgmail has to be installed for the script to run. It is also very easy to adapt the script for other purposes (I actually wrote it by modifying one of the demo scripts that come with libgmail).
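
The script finds the encoding by searching the raw source for the text charset=. As a rough sketch of another way to do the same step, the standard library's email parser can read the charset from the Content-Type header. This is not what the script does: it is written for Python 3 (the script itself is Python 2), the function names detect_charset and to_utf8 are made up for this example, and it assumes the raw message is available as bytes.

```python
import email


def detect_charset(raw, default='ascii'):
    """Return the first charset declared in the message, or `default`."""
    msg = email.message_from_bytes(raw)
    # walk() yields the message itself first, then any MIME sub-parts.
    for part in msg.walk():
        charset = part.get_content_charset()
        if charset:
            return charset
    return default


def to_utf8(raw, default='ascii'):
    """Decode the raw message with its declared charset; re-encode as UTF-8."""
    charset = detect_charset(raw, default)
    try:
        text = raw.decode(charset, 'replace')
    except LookupError:
        # The header names a codec Python does not know: use the default.
        text = raw.decode(default, 'replace')
    return text.encode('utf-8')


if __name__ == "__main__":
    sample = b'Content-Type: text/plain; charset="ISO-8859-1"\r\n\r\ncaf\xe9\r\n'
    print(detect_charset(sample))
    print(to_utf8(sample))
```

Undecodable bytes are replaced rather than raising an error, which seems a reasonable trade-off for spam, where declared charsets are often wrong.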

Source: http://filoxus.blogspot.com/2008/04/saving-e-mails-to-disk-from-gmail.html

