
djthyrax

[Python] Saving e-mails to disk (from a Gmail account)


While trying to download all my spam messages to disk (I'm building a database of spam messages), I realised that this is no simple task. I could save the messages using Thunderbird or Outlook, but I use neither (I find the Gmail web interface pleasant and user-friendly), so that option is out. After a little browsing, however, I discovered a wonderful Python package called libgmail. Long story short, here is the script I used to download all the messages from the spam folder:

#!/usr/bin/env python
'''
savemsg.py -- Download all messages from a specified folder
License: GPL 2.0
'''

import sys
from getpass import getpass
import libgmail

if __name__ == "__main__":
   try:
       name = sys.argv[1]
   except IndexError:
       name = raw_input("Gmail account name: ")

   pw = getpass("Password: ")

   ga = libgmail.GmailAccount(name, pw)

   print "\nPlease wait, logging in..."

   try:
       ga.login()
   except libgmail.GmailLoginFailure,e:
       print "\nLogin failed. (%s)" % e.message
       sys.exit(1)
   else:
       print "Login successful.\n"

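   # Folder names offered at the prompt below.
   FOLDER_list = {'U_INBOX_SEARCH' : 'inbox',
                  'U_STARRED_SEARCH' : 'starred',
                  'U_ALL_SEARCH' : 'all',
                  'U_DRAFTS_SEARCH' : 'drafts' ,
                  'U_SENT_SEARCH' : 'sent',
                  'U_SPAM_SEARCH' : 'spam',
                  }

   # Keep the user's choice in its own variable instead of overwriting the
   # dictionary, and reject names that are not in it.
   folder_name = raw_input('Choose a folder (inbox, starred, all, drafts, sent, spam): ')
   if folder_name not in FOLDER_list.values():
       print "Unknown folder: %s" % folder_name
       sys.exit(1)
   folder = ga.getMessagesByFolder(folder_name)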

   for thread in folder:
       for msg in thread:
           print "Downloading message %s " % msg.id
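           # look for a "charset=" declaration anywhere in the raw source
           encIndexStart = msg.source.find('charset=')
           if encIndexStart != -1:
               # the charset name ends at the first delimiter that follows it
               encIndexEnd = (msg.source.find(' ', encIndexStart),
                              msg.source.find('\n', encIndexStart),
                              msg.source.find(';', encIndexStart),
                              msg.source.find('"', encIndexStart+10))
               encIndexEnd = [ind for ind in encIndexEnd if ind != -1]
               # no delimiter found: the name runs to the end of the source
               encIndexEnd = min(encIndexEnd or [len(msg.source)])
               enc = msg.source[encIndexStart + 8:encIndexEnd]
               # drop quotes, semicolons and stray whitespace such as '\r'
               enc = enc.replace('"', '').replace(';', '').strip()
           else:
               enc = 'ascii'
           print "Detected encoding %s\n"  % enc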
           try:
               f = open(msg.id + " " + msg.subject + ".txt", 'w')
           except (IOError, TypeError, UnicodeError):
               # message subject contains characters forbidden by the OS in the
               # file name, use just the message id
               f = open(msg.id + ".txt", 'w')
           try:
               f.write(msg.source.decode(enc).encode('utf-8'))
           except (LookupError, UnicodeError):
               # unknown or wrong charset, save the raw source unchanged
               f.write(msg.source)
           f.close()
   print "\n\nDone."
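
When a subject contains a character the operating system rejects, the script falls back to a file name made of the message id alone. One possible alternative, sketched below, is to replace the offending characters so the subject stays in the file name. The helper name `safe_filename`, the set of forbidden characters (roughly the Windows set, which also covers `/` on Unix) and the `max_len` limit are all assumptions made for illustration; none of them comes from libgmail or from the original script.

```python
import re

# Assumed set of characters to replace: those Windows rejects in file names,
# plus ASCII control characters. "/" is also rejected on Unix-like systems.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(msg_id, subject, max_len=100):
    """Build '<id> <subject>.txt', replacing forbidden characters with '_'.

    Hypothetical helper: max_len is an arbitrary cap on the name length
    before the '.txt' extension is added.
    """
    cleaned = _FORBIDDEN.sub('_', subject).strip(' .')
    if not cleaned:
        # Nothing usable left in the subject: use just the message id.
        return msg_id + '.txt'
    return (msg_id + ' ' + cleaned)[:max_len] + '.txt'
```

In the script, `open(safe_filename(msg.id, msg.subject), 'w')` could then stand in for the first `open` call, with the id-only fallback kept as a last resort.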

The script can download messages from any Gmail folder. The encoding of each message is detected from the first charset= declaration in its source (falling back to ASCII when there is none), and the message is saved as UTF-8 to make later processing easier; if decoding fails, the source is written unchanged. You need libgmail installed to run the script. It is also easy to adapt for other purposes (I actually wrote it by modifying one of the demo scripts that come with libgmail).
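
The charset lookup in the script is a plain string search. As a rough alternative, the standard library's `email` package can parse the raw source and report the charset declared in a `Content-Type` header. The sketch below assumes the message source is available as a string (as `msg.source` is in the script); the function name `detect_charset` and its `default` parameter are made up for this example.

```python
from email import message_from_string


def detect_charset(source, default='ascii'):
    """Return the charset declared by the first MIME part that names one.

    Hypothetical helper: `source` is the raw message text, `default` is
    returned when no part declares a charset.
    """
    msg = message_from_string(source)
    # walk() yields the message itself first, then any nested MIME parts.
    for part in msg.walk():
        charset = part.get_content_charset()
        if charset:
            return charset
    return default
```

Note that a multipart message may declare a different charset in each part, so decoding the whole source with a single charset, as the script does, remains an approximation either way.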

Source: http://filoxus.blogspot.com/2008/04/saving-e-mails-to-disk-from-gmail.html
