Skip to main content

Anaylising the content of a PST file and reporting on the age and type of content using Powershell and WPF

As time passes we all receive more and more email, this is one of the irrefutable facts of life for any mail system or anybody with a mailbox. Even the most fastidious of deleters still can't avoid this so at some point in the future you may need to consider achieving. The fact is your users maybe already self archiving using PST files which like self medicating can lead to serious problems down the track when their laptop goes under a bus. Exchange 2010 introduced native achieving and SP1 when its released will build on these capabilities, for other versions of exchange there are a bunch of other fine products you can look at to do this. If your looking to ingest PST's into an online archive you may want to first review what content your going to be importing and look at things like how many attachments and what type are they and how old is the content to give you a feel for what your going to be storing and potentially having to backup.

What this boils down to is if you want to include attachment reporting then you will need a script that will do a pass on every item within a mailbox. This means in real terms that this script is going to be slow to very slow to run on a very large PST file. If you can wait you can get some useful information so there is a trade off their somewhere.

To loop through every message within a PST file using powershell is pretty easy if you use Dmitry's redemption library http://www.dimastr.com/redemption/. RDO gives us an easy to use wrapper around exMapi and the Outlooks PST provider. So now we have the ability to go through every item in a pst file the next thing to decide is how you want to group and classify each item. I've chosen to do this by date and content age this gives the most flexibility when it comes to aggregating the data later on. To add extra functionality in this script I've created separate hash's for attachment sizes and types this is something that you can build on yourself all you need to do is think about what it is you want to report and aggregate and then just add in a few lines of your own code.

When it comes time to report on the data that was collected in the PST sweep this is when Powershell comes into its own with the Group-Object and Measure-Object cmdlets. Because during the sweep i classified each message group depending on the age of the content i can then further re-aggregate this data using some pipeline magic eg
$Datehash.Values | group-object {$_.Folder} | Sort-Object @{expression={(($_.Group | Measure-Object SizeofItems -sum).sum/1MB)}} -Descending | foreach-object{

$Charthash2["1 to 3 years"] = $Charthash2["1 to 3 years"] + ($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object SizeofItems -sum).sum/1MB

}
Actually explaining what this line does would take a separate post but when it comes to quantifying numerical information without using a database this is extremely useful. In real words it first groups the data by Folder Name and then allows me to aggregate within the grouped data.

Displaying the result

I decided to try something new with this script instead of using the normal winform GUI I've used a lot in the past i went down the WPF path. WPF (Windows Presentation Framework) first appeared in .NET 3.0 so to use this script you must have the .NET 3.5 framework installed as well as the wpftoolkit which contains both datagrid and chart control I've used in this script. As far as a comparison between Winforms and WPF from powershell WPF is by far easier to use when building a GUI script because you define all the element in XML then just manipulate the data providers with code. You can use something like Visual studio to build the graphical look for your GUI and then cut and past the code more or less straight into your script with a few small changes. The result of what you get is also visually more pleasing eg these are a few of the screen-shots from this script.



Using this script

There are a few pre-requisites for using this script first you need redemption http://www.dimastr.com/redemption/download.htm

If you want to run this on Windows 7 64bit I found i had to use the following launcher script to start a 32bit session to run the script

&$env:windir\syswow64\windowspowershell\v1.0\powershell.exe -noninteractive -STA

This also ensures that you have a STA session of powershell which is important for the WPF code to work correctly. The last things that's required is the WPFtoolkit which you can downloaded from http://wpf.codeplex.com/releases/view/40535. Note if this gets installed to anywhere other then C:\Program Files (x86)\WPF Toolkit" + "\v3.5.50211.1\ you will need to change the path.

To run the script you need to pass it the path to the PST file as a cmdline argument eg ./pstanlv1.ps1 "c:\mail\outlook.pst"

I've put a download of this script here the code itself looks like

$fnFileName = $args[0]

$Datehash = new-object "System.Collections.Generic.Dictionary[System.string, System.object]"
$AttachmentTypehash = @{ }
$ItemTypehash = @{ }
Add-Type -Assembly PresentationFramework
$dataVisualization = "C:\Program Files (x86)\WPF Toolkit" + "\v3.5.50211.1\System.Windows.Controls.DataVisualization.Toolkit.dll"
$wpfToolkit = "C:\Program Files (x86)\WPF Toolkit" + "\v3.5.50211.1\WPFToolkit.dll"
Add-Type -Path $dataVisualization
Add-Type -Path $wpfToolkit

[xml]$xaml = @"
<Window
xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
xmlns:dg="http://schemas.microsoft.com/wpf/2008/toolkit"
xmlns:chartingToolkit="clr-namespace:System.Windows.Controls.DataVisualization.Charting;assembly=System.Windows.Controls.DataVisualization.Toolkit"
Title="MainWindow" Height="auto" Width="auto">
<Grid>
<TabControl Height="auto" HorizontalAlignment="Stretch" VerticalAlignment="Stretch" Margin="0,0,0,0" Name="Home" Width="auto">
<TabItem Header="OverView" Name="OverView">
<Grid>
<Canvas Height="Auto" HorizontalAlignment="Left" Margin="0,0,0,0" Name="canvas1" VerticalAlignment="Top" Width="Auto"></Canvas>
<dg:DataGrid AutoGenerateColumns="True" Height="Auto" HorizontalAlignment="Left" Margin="0,0,0,0" Name="dataGrid1" VerticalAlignment="Top" Width="600" />
<chartingToolkit:Chart x:Name="PieChart1" Margin="600,0,0,250">
<chartingToolkit:Chart.Series>
<chartingToolkit:PieSeries ItemsSource="{Binding}"
DependentValuePath="Value"
IndependentValuePath="Key" />
</chartingToolkit:Chart.Series>
</chartingToolkit:Chart>
<chartingToolkit:Chart x:Name="PieChart2" Margin="600,250,0,00">
<chartingToolkit:Chart.Series>
<chartingToolkit:PieSeries ItemsSource="{Binding}"
DependentValuePath="Value.SizeofAttachments"
IndependentValuePath="Key" />
</chartingToolkit:Chart.Series>
</chartingToolkit:Chart>
</Grid>
</TabItem>
<TabItem Header="Content Age" Name="Cage">
<Grid>
<Canvas Height="Auto" HorizontalAlignment="Left" Margin="0,0,0,0" Name="canvas2" VerticalAlignment="Top" Width="Auto"></Canvas>
<DataGrid AutoGenerateColumns="True" Height="Auto" HorizontalAlignment="Left" Margin="0,250,0,0" Name="dataGrid2" VerticalAlignment="Top" Width="auto" />
<chartingToolkit:Chart x:Name="PieChart3" Margin="0,0,0,250">
<chartingToolkit:Chart.Series>
<chartingToolkit:PieSeries ItemsSource="{Binding}"
DependentValuePath="Value"
IndependentValuePath="Key" />
</chartingToolkit:Chart.Series>
</chartingToolkit:Chart>
</Grid>
</TabItem>
<TabItem Header="Item Type" Name="Itype">
<Grid>
<Canvas Height="Auto" HorizontalAlignment="Left" Margin="0,0,0,0" Name="canvas3" VerticalAlignment="Top" Width="Auto"></Canvas>
<dg:DataGrid AutoGenerateColumns="True" Height="Auto" HorizontalAlignment="Left" Margin="0,0,0,0" Name="dataGrid3" VerticalAlignment="Top" Width="400" />
<chartingToolkit:Chart x:Name="PieChart4" Margin="400,0,0,250" Title="Item type by Item Count">
<chartingToolkit:Chart.Series>
<chartingToolkit:PieSeries ItemsSource="{Binding}"
DependentValuePath="Value.NumberofItems"
IndependentValuePath="Key" />
</chartingToolkit:Chart.Series>
</chartingToolkit:Chart>
<chartingToolkit:Chart x:Name="PieChart5" Margin="400,250,0,00" Title="Item type by Item Size">
<chartingToolkit:Chart.Series>
<chartingToolkit:PieSeries ItemsSource="{Binding}"
DependentValuePath="Value.SizeofItems"
IndependentValuePath="Key" />
</chartingToolkit:Chart.Series>
</chartingToolkit:Chart>
</Grid>
</TabItem>
</TabControl>
</Grid>
</Window>
"@


Function Enumfolders($cnCurrentFolder){
foreach($folder in $cnCurrentFolder.Folders){
"Processing : " + $folder.Name
ProcessItems($folder)
If($folder.Folders.Count -ne 0){Enumfolders($folder)}
}
}

Function ProcessItems($wfWorkingFolder){
$lnum = 0
foreach($Item in $wfWorkingFolder.Items){
$lnum ++
write-progress "Processing message" $lnum
if ($ItemTypehash.ContainsKey($Item.MessageClass)){
$ItemTypehash[$Item.MessageClass].NumberofItems = $ItemTypehash[$Item.MessageClass].NumberofItems + 1
$ItemTypehash[$Item.MessageClass].SizeofItems = $ItemTypehash[$Item.MessageClass].SizeofItems + $Item.Size
}
else{
$iaItemAgobject = "" | select NumberofItems,SizeofItems
$iaItemAgobject.NumberofItems = 1
$iaItemAgobject.SizeofItems = $Item.Size
$ItemTypehash.add($Item.MessageClass,$iaItemAgobject)
}
$ItemAttachedNumber = 0
$ItemAttachedSize = 0
if ($Item.Attachments.Count -ne 0){
foreach($attachment in $Item.Attachments){
$ItemAttachedNumber = $ItemAttachedNumber +1
$ItemAttachedSize = $ItemAttachedSize + $attachment.Size
if ($Attachment.FileName -eq $null){
$attachext = "Embeeded"
}
else{
if ($Attachment.FileName.Substring($Attachment.FileName.Length-4,1) -eq ".")
{
$attachext = $Attachment.FileName.Substring($Attachment.FileName.Length-3,3)
}
else {
if ($Attachment.FileName.Substring($Attachment.FileName.Length-5,1) -eq "."){
$attachext = $Attachment.FileName.Substring($Attachment.FileName.Length-4,4)
}
else{
$attachext = "unkonwn"
}
}
}
if ($AttachmentTypehash.ContainsKey($attachext)){
$AttachmentTypehash[$attachext].NumberofAttachments = $AttachmentTypehash[$attachext].NumberofAttachments + 1
$AttachmentTypehash[$attachext].SizeofAttachments = $AttachmentTypehash[$attachext].SizeofAttachments + $Attachment.Size
}
else{
$iaAttachmentAgobject = "" | select NumberofAttachments,SizeofAttachments
$iaAttachmentAgobject.NumberofAttachments = 1
$iaAttachmentAgobject.SizeofAttachments = $Attachment.Size
$AttachmentTypehash.add($attachext,$iaAttachmentAgobject)
}
}
}
$caContentAge = New-TimeSpan $Item.ReceivedTime $(Get-Date)
if($caContentAge.days -le 183){$ca = 6}
if($caContentAge.days -gt 183 -band $caContentAge.days -le 365){$ca = 12}
if($caContentAge.days -gt 365 -band $caContentAge.days -le 1095){$ca = 36}
if($caContentAge.days -gt 1095 -band $caContentAge.days -le 1825){$ca = 60}
if($caContentAge.days -gt 1825){$ca = 100}
$agkey = $Item.ReceivedTime.ToString("yyyyMMdd") + "-" + $wfWorkingFolder.Name
if ($Datehash.ContainsKey($agkey)){
$Datehash[$agkey].NumberofItems = $Datehash[$agkey].NumberofItems + 1
$Datehash[$agkey].SizeofItems = $Datehash[$agkey].SizeofItems + $Item.Size
$Datehash[$agkey].NumberofAttachments = $Datehash[$agkey].NumberofAttachments + $ItemAttachedNumber
$Datehash[$agkey].AttachmentSize = $Datehash[$agkey].AttachmentSize + $ItemAttachedSize
}
Else{
$daDateAgregationobject = "" | select Date,Folder,ContentAge,NumberofItems,SizeofItems,NumberofAttachments,AttachmentSize
$daDateAgregationobject.Date = $Item.ReceivedTime.ToString("yyyyMMdd")
$daDateAgregationobject.Folder = $wfWorkingFolder.Name
$daDateAgregationobject.ContentAge = $ca
$daDateAgregationobject.NumberofItems = 1
$daDateAgregationobject.SizeofItems = $Item.Size
$daDateAgregationobject.NumberofAttachments = $ItemAttachedNumber
$daDateAgregationobject.AttachmentSize = $ItemAttachedSize
$Datehash.add($agkey,$daDateAgregationobject)
}
}
}

$RDOSession = new-object -com Redemption.RDOsession

$PSTfile = $RDOSession.LogonPSTStore($fnFileName, 1)
$PSTRoot = $RDOSession.GetFolderFromID($PSTfile.IPMRootFolder.EntryID, $PSTfile.EntryID)
Enumfolders($PSTRoot)

$byDateTable = New-Object System.Data.Datatable
$byDateTable.columns.add("Folder")
$byDateTable.columns.add("#Items",[INT64])
$byDateTable.columns.add("Items Size(MB)",[INT64])
$byDateTable.columns.add("#Attachments",[INT64])
$byDateTable.columns.add("Attachments Size(MB)",[INT64])
$byAgeTable = New-Object System.Data.Datatable
$byAgeTable.columns.add("Folder")
$byAgeTable.columns.add("6>#Items",[INT64])
$byAgeTable.columns.add("6>#(MB)",[INT64])
$byAgeTable.columns.add("6>#Atch",[INT64])
$byAgeTable.columns.add("6>#Atch(MB)",[INT64])
$byAgeTable.columns.add("6to12#Items",[INT64])
$byAgeTable.columns.add("6to12#(MB)",[INT64])
$byAgeTable.columns.add("6to12#Atch",[INT64])
$byAgeTable.columns.add("6to12#Atch(MB)",[INT64])
$byAgeTable.columns.add("1to3years#Items",[INT64])
$byAgeTable.columns.add("1to3years#(MB)",[INT64])
$byAgeTable.columns.add("1to3years#Atch",[INT64])
$byAgeTable.columns.add("1to3years#Atch(MB)",[INT64])
$byAgeTable.columns.add("3to5years#Items",[INT64])
$byAgeTable.columns.add("3to5years#(MB)",[INT64])
$byAgeTable.columns.add("3to5years#Atch",[INT64])
$byAgeTable.columns.add("3to5years#Atch(MB)",[INT64])
$byAgeTable.columns.add("5+years#Items",[INT64])
$byAgeTable.columns.add("5+years#(MB)",[INT64])
$byAgeTable.columns.add("5+years#Atch",[INT64])
$byAgeTable.columns.add("5+years#Atch(MB)",[INT64])
$Charthash = @{ }
$cCount1 = 0
$Datehash.Values | group-object {$_.Folder} | Sort-Object @{expression={(($_.Group | Measure-Object SizeofItems -sum).sum/1MB)}} -Descending | foreach-object{
if ((($_.Group | Measure-Object SizeofItems -sum).sum/1MB) -gt 1 -band $cCount1 -le 10){
if ($_.Name.Length -gt 10){$chartname = $_.Name.Substring(0,10)}
else{$chartname = $_.Name}
$Charthash.add($chartname,(($_.Group | Measure-Object SizeofItems -sum).sum/1MB))
}
$cCount1++
[VOID]$byDateTable.rows.add($_.Name,($_.Group | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Measure-Object AttachmentSize -sum).sum/1MB))
}
$Charthash2 = @{ }
$Charthash2.Add("Under 6 Months",0)
$Charthash2.Add("6 to 12 Months",0)
$Charthash2.Add("1 to 3 years",0)
$Charthash2.Add("3 to 5 years",0)
$Charthash2.Add("Over 5 years",0)
$Datehash.Values | group-object {$_.Folder} | Sort-Object @{expression={(($_.Group | Measure-Object SizeofItems -sum).sum/1MB)}} -Descending | foreach-object{
$Charthash2["Under 6 Months"] = $Charthash2["Under 6 Months"] + ($_.Group | Where-Object {$_.ContentAge -eq 6} | Measure-Object SizeofItems -sum).sum/1MB
$Charthash2["6 to 12 Months"] = $Charthash2["6 to 12 Months"] + ($_.Group | Where-Object {$_.ContentAge -eq 12} | Measure-Object SizeofItems -sum).sum/1MB
$Charthash2["1 to 3 years"] = $Charthash2["1 to 3 years"] + ($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object SizeofItems -sum).sum/1MB
$Charthash2["3 to 5 years"] = $Charthash2["3 to 5 years"] + ($_.Group | Where-Object {$_.ContentAge -eq 60} | Measure-Object SizeofItems -sum).sum/1MB
$Charthash2["Over 5 years"] = $Charthash2["Over 5 years"] + ($_.Group | Where-Object {$_.ContentAge -eq 100} | Measure-Object SizeofItems -sum).sum/1MB
[VOID]$byAgeTable.rows.add($_.Name,($_.Group | Where-Object {$_.ContentAge -eq 6} | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 6} | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 6} | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 6} | Measure-Object AttachmentSize -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 12} | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 12} | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 12} | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 12} | Measure-Object AttachmentSize -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object AttachmentSize -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 60} | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 60} | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 60} | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 60} | Measure-Object AttachmentSize -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 100} | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 100} | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 100} | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 100} | Measure-Object AttachmentSize -sum).sum/1MB))
}
$XMLreader = New-Object System.Xml.XmlNodeReader $xaml
$XAMLreader = [Windows.Markup.XamlReader]::Load($XMLreader)
$tc = $XAMLreader.FindName("PieChart1")
$tc.DataContext = $Charthash
$tc = $XAMLreader.FindName("PieChart2")
$tc.DataContext = ($AttachmentTypehash.GetEnumerator() | Sort-Object Value.SizeofAttachments | select-object -First 10)
$tc = $XAMLreader.FindName("PieChart3")
$tc.DataContext = $Charthash2
$tc = $XAMLreader.FindName("PieChart4")
$tc.DataContext = ($ItemTypehash.GetEnumerator() | Sort-Object Value.SizeofItems | select-object -First 10)
$tc = $XAMLreader.FindName("PieChart5")
$tc.DataContext = ($ItemTypehash.GetEnumerator() | Sort-Object Value.SizeofItems | select-object -First 10)
$datagrid = $XAMLreader.FindName("dataGrid1")
$datagrid.ItemsSource = $byDateTable.defaultview
$datagrid2 = $XAMLreader.FindName("dataGrid2")
$datagrid2.ItemsSource = $byAgeTable.defaultview

$byItemTable = New-Object System.Data.Datatable
$byItemTable.columns.add("ItemType")
$byItemTable.columns.add("#Items",[INT64])
$byItemTable.columns.add("Items Size(MB)",[INT64])
$ItemTypehash.GetEnumerator() | foreach-object {
[VOID]$byItemTable.rows.Add($_.key.ToString(),$_.value.NumberofItems,$_.value.SizeofItems)
$_.key.ToString()
}

$datagrid3 = $XAMLreader.FindName("dataGrid3")
$datagrid3.ItemsSource = $byItemTable.defaultview
$XAMLreader.ShowDialog()

Popular posts from this blog

Testing and Sending email via SMTP using Opportunistic TLS and oAuth in Office365 with PowerShell

As well as EWS and Remote PowerShell (RPS) other mail protocols POP3, IMAP and SMTP have had OAuth authentication enabled in Exchange Online (Official announcement here ). A while ago I created  this script that used Opportunistic TLS to perform a Telnet style test against a SMTP server using SMTP AUTH. Now that oAuth authentication has been enabled in office365 I've updated this script to be able to use oAuth instead of SMTP Auth to test against Office365. I've also included a function to actually send a Message. Token Acquisition  To Send a Mail using oAuth you first need to get an Access token from Azure AD there are plenty of ways of doing this in PowerShell. You could use a library like MSAL or ADAL (just google your favoured method) or use a library less approach which I've included with this script . Whatever way you do this you need to make sure that your application registration  https://docs.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-

How to test SMTP using Opportunistic TLS with Powershell and grab the public certificate a SMTP server is using

Most email services these day employ Opportunistic TLS when trying to send Messages which means that wherever possible the Messages will be encrypted rather then the plain text legacy of SMTP.  This method was defined in RFC 3207 "SMTP Service Extension for Secure SMTP over Transport Layer Security" and  there's a quite a good explanation of Opportunistic TLS on Wikipedia  https://en.wikipedia.org/wiki/Opportunistic_TLS .  This is used for both Server to Server (eg MTA to MTA) and Client to server (Eg a Message client like Outlook which acts as a MSA) the later being generally Authenticated. Basically it allows you to have a normal plain text SMTP conversation that is then upgraded to TLS using the STARTTLS verb. Not all servers will support this verb so if its not supported then a message is just sent as Plain text. TLS relies on PKI certificates and the administrative issue s that come around certificate management like expired certificates which is why I wrote th

The MailboxConcurrency limit and using Batching in the Microsoft Graph API

If your getting an error such as Application is over its MailboxConcurrency limit while using the Microsoft Graph API this post may help you understand why. Background   The Mailbox  concurrency limit when your using the Graph API is 4 as per https://docs.microsoft.com/en-us/graph/throttling#outlook-service-limits . This is evaluated for each app ID and mailbox combination so this means you can have different apps running under the same credentials and the poor behavior of one won't cause the other to be throttled. If you compared that to EWS you could have up to 27 concurrent connections but they are shared across all apps on a first come first served basis. Batching Batching in the Graph API is a way of combining multiple requests into a single HTTP request. Batching in the Exchange Mail API's EWS and MAPI has been around for a long time and its common, for email Apps to process large numbers of smaller items for a variety of reasons.  Batching in the Graph is limited to a m
All sample scripts and source code is provided by for illustrative purposes only. All examples are untested in different environments and therefore, I cannot guarantee or imply reliability, serviceability, or function of these programs.

All code contained herein is provided to you "AS IS" without any warranties of any kind. The implied warranties of non-infringement, merchantability and fitness for a particular purpose are expressly disclaimed.