Deleting duplicate emails in Outlook

Recently I started using Microsoft 365 for my email instead of the Gmail account I've had since Gmail launched. My mailbox is also part of my life's archive, so I migrated the mail from Gmail to Exchange Online, just as I migrated all my emails to my brand-new Gmail account almost two decades ago.

Anyway, during the recent migration to Exchange Online something got messed up (most likely by me), and a lot of messages were duplicated. Some even had 4 duplicates, resulting in a mailbox containing 144,000 messages.

I connected Outlook to the mailbox, let it download all mail, and was hoping I could remove the duplicates using the "Clean up Folder" functionality that you see when right-clicking on a folder. It did remove some duplicates, but most duplicates remained.

So, I started looking online for a free tool that could remove duplicate messages from Outlook. There appear to be a lot of tools, but none of them are free. Well, some are free, but they only remove 10 or 100 duplicates unless you buy the tool. After spending perhaps two hours looking for tools, installing them, and finding out they were too limited in their free versions, I had had enough and started Visual Studio Code to whip up a PowerShell script to do the trick.

It took the script about 60 minutes to run and remove all the duplicates, leaving not 144K messages but only 67K! That is still a bit much, and there really is no need to keep emails from before the Great Millennium Bug, so I'll do some more housekeeping...

The PowerShell script below only looks in one folder; I had everything in my Inbox, so that worked. By default it only marks the duplicates it finds; set $dodelete to $true to actually delete them. You can of course expand the script to recursively look in all folders for duplicates, but I leave that up to you!

--
Kenneth van Grinsven

The script is as follows:

param (
    $email = 'me@vangrinsven.com',
    $mailbox = "Inbox",
    $dodelete = $false
)

$outlook = New-Object -ComObject Outlook.Application
$namespace = $outlook.GetNamespace("MAPI")
$inbox = $namespace.Folders.Item($email).Folders.Item($mailbox)
$mailsattime=@{}
$itemcount=0
$deletedcount = 0
$lastcount=0
$todelete = @()
foreach ($item in $inbox.Items) {
    Write-Host -NoNewline "."
    $itemcount ++
    if ( ($itemcount % 1000) -eq 0)
    {
        Write-Host " "
        Write-Host ("{0} ({1} [{2}])" -f $itemcount, $deletedcount, ($deletedcount - $lastcount))
        $lastcount = $deletedcount
        
    } 
    # a message is a duplicate when the sender, subject, date, and message body match
    # messageid: EntryId
    # sender: SenderEmailAddress && SenderName
    # subject: Subject
    # date: ReceivedTime
    # body: Body && HTMLBody
    
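    $messageId = $item.EntryID
    # casting to [string] turns a missing ReceivedTime into an empty string
    # instead of failing on a null value, so such items reach the else branch
    $recvtime = [string]$item.ReceivedTime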

    # recvtime shouldn't be empty; 5 could pretty much be any number...
    if ($recvtime.length -ge 5) {
        $duplicate=$false

        foreach ($msg in $mailsattime[$recvtime]) {
            # this message has the exact same receive time as a previous message, so
            # compare the message information
            $cmpitem = $namespace.GetItemFromID($msg)

            Write-Host -NoNewline "?"
            # is all the relevant data the same?
            if ( ($cmpitem.SenderEmailAddress -eq $item.SenderEmailAddress) -and
                ($cmpitem.SenderName -eq $item.SenderName) -and
                ($cmpitem.Subject -eq $item.Subject) -and
                ($cmpitem.ReceivedTime -eq $item.ReceivedTime) -and
                ($cmpitem.Body -eq $item.Body) -and
                ($cmpitem.HTMLBody -eq $item.HTMLBody) )
            {
                Write-Host -NoNewline "!"
                # store the messageid for removal
                $todelete += $item.EntryID
                $deletedcount++
                $duplicate = $true
                # one match is enough; skip fetching the remaining candidates
                break
            }
        }

        
        if (!$duplicate) {
            # add this message to the $mailsattime hash/array
            if (-not $mailsattime.ContainsKey($recvtime)) {
                $mailsattime[$recvtime] = @()
            }
            $mailsattime[$recvtime] += $messageId
        }
    }
    else {
        Write-Host -NoNewline "-"
    }
}


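Write-Host ""
if ($dodelete) {
    Write-Host ("Deleting {0} items" -f $todelete.Count)
}
else {
    Write-Host ('Dry run: {0} duplicates found; set $dodelete to $true to delete them' -f $todelete.Count)
}
foreach ($todel in $todelete) {
    if ($dodelete) {
        $item = $namespace.GetItemFromID($todel)
        $item.Delete()
        Write-Host -NoNewline "!"
    }
}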
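
The script above only looks at one folder. If you want to cover subfolders as well, the recursion could be sketched roughly like this. Get-FolderTree and New-MockFolder are names I made up for this example; they are not part of the script above. The mock folders are only there so you can try the traversal without Outlook, and the sketch assumes nothing more than a Name and a Folders property, which Outlook folder objects also expose.

```powershell
# Hypothetical helper (not part of the original script): returns the given
# folder followed by all of its subfolders, depth first.
function Get-FolderTree {
    param ($folder)
    # emit the folder itself ...
    $folder
    # ... then recurse into every subfolder
    foreach ($sub in $folder.Folders) {
        Get-FolderTree -folder $sub
    }
}

# Stand-in folder objects so the sketch runs without Outlook (assumption:
# only the Name and Folders properties are needed for the traversal).
function New-MockFolder {
    param ([string]$name, [object[]]$children = @())
    [pscustomobject]@{ Name = $name; Folders = $children }
}

$root = New-MockFolder 'Inbox' @(
    (New-MockFolder 'Projects' @( (New-MockFolder 'Archive') )),
    (New-MockFolder 'Receipts')
)

# collect the folder names in the order they are visited
$names = @(Get-FolderTree -folder $root | ForEach-Object { $_.Name })
$names -join ', '
```

With a real mailbox you would pass the $inbox object from the script instead of $root and run the duplicate check on the Items of each folder that comes back. Whether duplicates should be matched per folder or across all folders is a choice I leave open here.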
