Recently I started using Microsoft 365 for my email, instead of the Gmail account I've had since the start of Gmail. My mailbox is also part of my life's archive, so I migrated the mail from Gmail to Exchange Online, just as almost two decades ago I migrated all my emails to my brand new Gmail account.
Anyway, during the recent migration to Exchange Online something got messed up, or most likely I messed up, and somehow a lot of messages were duplicated. Some even had 4 duplicates, resulting in a mailbox containing 144,000 messages.
I connected Outlook to the mailbox, let it download all mail, and was hoping I could remove the duplicates using the "Clean up Folder" functionality that you see when right-clicking on a folder. It did remove some duplicates, but most duplicates remained.
So, I started looking online for a free tool that could remove duplicate messages from Outlook. There appear to be a lot of tools, but none of them are free. Well, some are free, but they only remove 10 or 100 duplicates unless you buy the tool. After spending perhaps two hours looking for tools, installing them, and finding out they were too limited in their free versions, I had enough and started Visual Studio Code to whip up a PowerShell script to do the trick.
It took the script about 60 minutes to run and remove all the duplicates, leaving not 144K messages but only 67K! That is still a bit much, and there really is no need to keep emails from before the Great Millennium Bug, so I'll do some more housekeeping...
The PowerShell script below only looks in one folder; I had everything in my Inbox so that worked. You can of course expand the script to recursively look in all folders for duplicates, but I leave that up to you!
--
Kenneth van Grinsven
The script is as follows:
param (
    # name of the mailbox as it appears in Outlook's folder list
    $email = 'me@vangrinsven.com',
    # folder inside that mailbox to scan
    $mailbox = "Inbox",
    # nothing is deleted unless this is set: -dodelete $true
    $dodelete = $false
)

$outlook = New-Object -ComObject Outlook.Application
$namespace = $outlook.GetNamespace("MAPI")
$inbox = $namespace.Folders.Item($email).Folders.Item($mailbox)

# receive time (as string) -> EntryIDs of the unique messages seen at that time
$mailsattime = @{}
$itemcount = 0
$deletedcount = 0
$lastcount = 0
$todelete = @()

foreach ($item in $inbox.Items) {
    Write-Host -NoNewline "."
    $itemcount++
    # progress report every 1000 items: total (duplicates [duplicates in this batch])
    if (($itemcount % 1000) -eq 0) {
        Write-Host " "
        Write-Host ("{0} ({1} [{2}])" -f $itemcount, $deletedcount, ($deletedcount - $lastcount))
        $lastcount = $deletedcount
    }

    # a message is a duplicate when the sender, subject, date, and message body match
    # messageid: EntryId
    # sender: SenderEmailAddress && SenderName
    # subject: Subject
    # date: ReceivedTime
    # body: Body && HTMLBody
    $messageId = $item.EntryID
    # string interpolation yields an empty string when the item has no ReceivedTime,
    # so a value left over from the previous item is never reused
    $recvtime = "$($item.ReceivedTime)"

    # recvtime shouldn't be empty; 5 could pretty much be any number...
    if ($recvtime.Length -ge 5) {
        $duplicate = $false
        foreach ($msg in $mailsattime[$recvtime]) {
            # stop at the first match, no need to load the remaining candidates
            if ($duplicate) { break }

            # this message has the exact same receive time as a previous message, so
            # compare the message information
            $cmpitem = $namespace.GetItemFromID($msg)
            Write-Host -NoNewline "?"
            # is all the relevant data the same?
            if ( ($cmpitem.SenderEmailAddress -eq $item.SenderEmailAddress) -and
                 ($cmpitem.SenderName -eq $item.SenderName) -and
                 ($cmpitem.Subject -eq $item.Subject) -and
                 ($cmpitem.ReceivedTime -eq $item.ReceivedTime) -and
                 ($cmpitem.Body -eq $item.Body) -and
                 ($cmpitem.HTMLBody -eq $item.HTMLBody) )
            {
                Write-Host -NoNewline "!"
                # store the messageid for removal
                $todelete += $messageId
                $deletedcount++
                $duplicate = $true
            }
        }

        if (!$duplicate) {
            # first time we see this message: remember it under its receive time
            if (-not $mailsattime.ContainsKey($recvtime)) {
                $mailsattime[$recvtime] = @()
            }
            $mailsattime[$recvtime] += $messageId
        }
    }
    else {
        # no usable receive time, skip this item
        Write-Host -NoNewline "-"
    }
}

Write-Host "Deleting items"
foreach ($todel in $todelete) {
    if ($dodelete) {
        $item = $namespace.GetItemFromID($todel)
        $item.Delete()
    }
    Write-Host -NoNewline "!"
}
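
As mentioned, the script only looks in one folder. If you want to go through every folder, a recursive walk over the folder tree could look roughly like the sketch below. Get-FolderTree is a name I made up for this example, it is not part of the script above, and it only assumes that each folder object has a Folders collection (which Outlook folders do). You would still have to run the duplicate check for each folder it returns.

```powershell
# Hypothetical helper, not part of the original script: returns the given
# folder followed by all of its subfolders, depth first.
function Get-FolderTree {
    param ($Folder)

    # emit the folder itself
    $Folder

    # then recurse into each subfolder
    foreach ($sub in $Folder.Folders) {
        Get-FolderTree -Folder $sub
    }
}

# Assumed usage, with $namespace and $email as defined in the script:
# $root = $namespace.Folders.Item($email)
# foreach ($folder in (Get-FolderTree -Folder $root)) {
#     Write-Host $folder.FolderPath
# }
```

Note that running the check per folder only finds duplicates within the same folder; to catch copies that ended up in different folders you would keep one shared $mailsattime table across all folders.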