
Deleting duplicate emails in Outlook

Recently I started using Microsoft 365 for my email instead of the Gmail account I've had since the start of Gmail. My mailbox is also part of my life's archive, so I migrated the mail from Gmail to Exchange Online, just as I migrated all my emails to my brand-new Gmail account almost two decades ago.

Anyway, during the recent migration to Exchange Online something got messed up, or more likely I messed up, and a lot of messages were duplicated. Some even had 4 duplicates, resulting in a mailbox containing 144,000 messages.

I connected Outlook to the mailbox, let it download all the mail, and hoped I could remove the duplicates using the "Clean Up Folder" functionality that you see when right-clicking a folder. It did remove some duplicates, but most of them remained.

So I started looking online for a free tool that could remove duplicate messages from Outlook. There appear to be a lot of tools, but none of them are free. Well, some are free, but they only remove 10 or 100 duplicates unless you buy the tool. After spending perhaps two hours looking for tools, installing them, and finding out they were too limited in their free versions, I had had enough and started Visual Studio Code to whip up a PowerShell script to do the trick.

It took the script about 60 minutes to run and remove all the duplicates, leaving not 144K messages but only 67K! That is still a bit much, and there really is no need to keep emails from before the Great Millennium Bug, so I'll do some more housekeeping...

The PowerShell script below only looks in one folder; I had everything in my Inbox so that worked. You can of course expand the script to recursively look in all folders for duplicates, but I leave that up to you!
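As a starting point for that, here is a minimal sketch of a recursive folder walk. `Get-FolderTree` is a hypothetical helper name, not part of the script, and the demo uses stand-in objects so it runs without Outlook. It only assumes that each folder exposes a `Folders` collection, which is also how the script navigates from the mailbox to the Inbox.

```powershell
# Hypothetical helper (not part of the original script): returns the given
# folder followed by all of its subfolders, depth-first.
function Get-FolderTree {
    param ($folder)

    # emit the folder itself, then recurse into each subfolder
    $folder
    foreach ($sub in $folder.Folders) {
        Get-FolderTree -folder $sub
    }
}

# Stand-in objects so the sketch runs without Outlook; they mimic the
# Name and Folders properties of a real folder object.
$old       = [pscustomobject]@{ Name = '2004';    Folders = @() }
$archive   = [pscustomobject]@{ Name = 'Archive'; Folders = @() }
$inboxDemo = [pscustomobject]@{ Name = 'Inbox';   Folders = @($old, $archive) }

$folderNames = Get-FolderTree -folder $inboxDemo | ForEach-Object { $_.Name }
Write-Host ($folderNames -join ', ')
```

In the script you would then wrap the main loop in something like `foreach ($folder in (Get-FolderTree -folder $inbox))` and iterate over `$folder.Items`. Whether to reset `$mailsattime` for each folder depends on whether a copy that lives in a different folder should count as a duplicate.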

--
Kenneth van Grinsven

The script is as follows:

param (
    $email = 'me@vangrinsven.com',
    $mailbox = "Inbox",
    $dodelete = $false
)

$outlook = New-Object -ComObject Outlook.Application
$namespace = $outlook.GetNamespace("MAPI")
$inbox = $namespace.Folders.Item($email).Folders.Item($mailbox)
$mailsattime=@{}
$itemcount=0
$deletedcount = 0
$lastcount=0
$todelete = @()
foreach ($item in $inbox.Items) {
    Write-Host -NoNewline "."
    $itemcount ++
    if ( ($itemcount % 1000) -eq 0)
    {
        Write-Host " "
        # progress: items seen, duplicates found so far, [duplicates found in the last 1000 items]
        Write-Host ("{0} ({1} [{2}])" -f $itemcount, $deletedcount, ($deletedcount - $lastcount))
        $lastcount = $deletedcount
        
    } 
    # a message is a duplicate when the sender, subject, date, and message body match
    # messageid: EntryId
    # sender: SenderEmailAddress && SenderName
    # subject: Subject
    # date: ReceivedTime
    # body: Body && HTMLBody
    
    $messageId = $item.EntryID
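    # cast instead of calling ToString(), so an item that lacks a ReceivedTime
    # yields an empty string rather than an error
    $recvtime = [string]$item.ReceivedTime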

    # recvtime shouldn't be empty; 5 could pretty much be any number...
    if ($recvtime.length -ge 5) {
        $duplicate=$false

        foreach ($msg in $mailsattime[$recvtime]) {
            # this message has the exact same receive time as a previous message, so 
            # compare the message information
            $cmpitem = $namespace.GetItemFromID($msg)

            if (! $duplicate) {
                Write-Host -NoNewline "?"
                # is all the relevant data the same?
                if ( ($cmpitem.SenderEmailAddress -eq $item.SenderEmailAddress) -and 
                    ($cmpitem.SenderName -eq $item.SenderName) -and
                    ($cmpitem.Subject -eq $item.Subject) -and 
                    ($cmpitem.ReceivedTime -eq $item.ReceivedTime) -and 
                    ($cmpitem.Body -eq $item.Body) -and
                    ($cmpitem.HTMLBody -eq $item.HTMLBody) )
                {
                    Write-Host -NoNewline "!"
                    # store the messageid for removal
                    $todelete += $item.EntryID
                    $deletedcount++
                    $duplicate = $true 
                }
            }
        }

        
        if (!$duplicate) {
            # add this message to the $mailsattime hash/array
            if ($mailsattime[$recvtime]) {
                $array = $mailsattime[$recvtime]
                $array += $messageId
                $mailsattime[$recvtime] = $array
            }
            else {
                $array = @()
                $array += $messageId
                $mailsattime[$recvtime] = $array
            }
        }
    }
    else {
        Write-Host -NoNewline "-"
    }
}


Write-Host "Deleting items"
foreach ($todel in $todelete) {
    $item = $namespace.GetItemFromID($todel)
    if ($dodelete) {
        $item.Delete()
    }
    Write-Host -NoNewline "!"
}
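
A possible variation, and not what the script above does: instead of grouping by receive time and fetching earlier messages again with `GetItemFromID`, you could compute one fingerprint per message and use it as the hashtable key. The sketch below assumes a SHA-256 hash over the same fields the script compares. `Get-MailFingerprint` is a hypothetical helper name, and the sample messages are stand-in objects so the example runs without Outlook. Two different messages sharing a fingerprint is theoretically possible, so this trades the exact field-by-field comparison for a simpler loop.

```powershell
# Hypothetical helper: builds a SHA-256 fingerprint from the fields the
# script compares (sender, subject, receive time, and both bodies).
function Get-MailFingerprint {
    param ($item)

    $fields = @(
        $item.SenderEmailAddress,
        $item.SenderName,
        $item.Subject,
        $item.ReceivedTime.Ticks,
        $item.Body,
        $item.HTMLBody
    )
    # join with a control character that is unlikely to occur in the fields
    $text  = $fields -join [char]31
    $bytes = [System.Text.Encoding]::UTF8.GetBytes($text)
    $sha   = [System.Security.Cryptography.SHA256]::Create()
    try {
        $hash = $sha.ComputeHash($bytes)
    }
    finally {
        $sha.Dispose()
    }
    [System.BitConverter]::ToString($hash)
}

# Stand-in messages: B is a copy of A, C differs in its subject.
$when = [datetime]'2020-01-01T10:00:00'
$sample = @(
    [pscustomobject]@{ EntryID = 'A'; SenderEmailAddress = 'a@example.com'; SenderName = 'Ann'
                       Subject = 'Hi';  ReceivedTime = $when; Body = 'hello'; HTMLBody = '<p>hello</p>' },
    [pscustomobject]@{ EntryID = 'B'; SenderEmailAddress = 'a@example.com'; SenderName = 'Ann'
                       Subject = 'Hi';  ReceivedTime = $when; Body = 'hello'; HTMLBody = '<p>hello</p>' },
    [pscustomobject]@{ EntryID = 'C'; SenderEmailAddress = 'a@example.com'; SenderName = 'Ann'
                       Subject = 'Bye'; ReceivedTime = $when; Body = 'hello'; HTMLBody = '<p>hello</p>' }
)

$seen = @{}
$duplicateIds = New-Object 'System.Collections.Generic.List[string]'
foreach ($item in $sample) {
    $key = Get-MailFingerprint -item $item
    if ($seen.ContainsKey($key)) {
        # same fingerprint as an earlier message: remember it for removal
        $duplicateIds.Add($item.EntryID)
    }
    else {
        $seen[$key] = $item.EntryID
    }
}
Write-Host ("Duplicates: {0}" -f ($duplicateIds -join ', '))
```

With real Outlook items the loop would run over `$inbox.Items`, and the collected IDs could feed the same delete loop as `$todelete`.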
