Skip to main content

Anaylising the content of a PST file and reporting on the age and type of content using Powershell and WPF

As time passes we all receive more and more email, this is one of the irrefutable facts of life for any mail system or anybody with a mailbox. Even the most fastidious of deleters still can't avoid this so at some point in the future you may need to consider achieving. The fact is your users maybe already self archiving using PST files which like self medicating can lead to serious problems down the track when their laptop goes under a bus. Exchange 2010 introduced native achieving and SP1 when its released will build on these capabilities, for other versions of exchange there are a bunch of other fine products you can look at to do this. If your looking to ingest PST's into an online archive you may want to first review what content your going to be importing and look at things like how many attachments and what type are they and how old is the content to give you a feel for what your going to be storing and potentially having to backup.

What this boils down to is if you want to include attachment reporting then you will need a script that will do a pass on every item within a mailbox. This means in real terms that this script is going to be slow to very slow to run on a very large PST file. If you can wait you can get some useful information so there is a trade off their somewhere.

To loop through every message within a PST file using powershell is pretty easy if you use Dmitry's redemption library http://www.dimastr.com/redemption/. RDO gives us an easy to use wrapper around exMapi and the Outlooks PST provider. So now we have the ability to go through every item in a pst file the next thing to decide is how you want to group and classify each item. I've chosen to do this by date and content age this gives the most flexibility when it comes to aggregating the data later on. To add extra functionality in this script I've created separate hash's for attachment sizes and types this is something that you can build on yourself all you need to do is think about what it is you want to report and aggregate and then just add in a few lines of your own code.

When it comes time to report on the data that was collected in the PST sweep this is when Powershell comes into its own with the Group-Object and Measure-Object cmdlets. Because during the sweep i classified each message group depending on the age of the content i can then further re-aggregate this data using some pipeline magic eg
$Datehash.Values | group-object {$_.Folder} | Sort-Object @{expression={(($_.Group | Measure-Object SizeofItems -sum).sum/1MB)}} -Descending | foreach-object{

$Charthash2["1 to 3 years"] = $Charthash2["1 to 3 years"] + ($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object SizeofItems -sum).sum/1MB

}
Actually explaining what this line does would take a separate post but when it comes to quantifying numerical information without using a database this is extremely useful. In real words it first groups the data by Folder Name and then allows me to aggregate within the grouped data.

Displaying the result

I decided to try something new with this script instead of using the normal winform GUI I've used a lot in the past i went down the WPF path. WPF (Windows Presentation Framework) first appeared in .NET 3.0 so to use this script you must have the .NET 3.5 framework installed as well as the wpftoolkit which contains both datagrid and chart control I've used in this script. As far as a comparison between Winforms and WPF from powershell WPF is by far easier to use when building a GUI script because you define all the element in XML then just manipulate the data providers with code. You can use something like Visual studio to build the graphical look for your GUI and then cut and past the code more or less straight into your script with a few small changes. The result of what you get is also visually more pleasing eg these are a few of the screen-shots from this script.



Using this script

There are a few pre-requisites for using this script first you need redemption http://www.dimastr.com/redemption/download.htm

If you want to run this on Windows 7 64bit I found i had to use the following launcher script to start a 32bit session to run the script

&$env:windir\syswow64\windowspowershell\v1.0\powershell.exe -noninteractive -STA

This also ensures that you have a STA session of powershell which is important for the WPF code to work correctly. The last things that's required is the WPFtoolkit which you can downloaded from http://wpf.codeplex.com/releases/view/40535. Note if this gets installed to anywhere other then C:\Program Files (x86)\WPF Toolkit" + "\v3.5.50211.1\ you will need to change the path.

To run the script you need to pass it the path to the PST file as a cmdline argument eg ./pstanlv1.ps1 "c:\mail\outlook.pst"

I've put a download of this script here the code itself looks like

$fnFileName = $args[0]

$Datehash = new-object "System.Collections.Generic.Dictionary[System.string, System.object]"
$AttachmentTypehash = @{ }
$ItemTypehash = @{ }
Add-Type -Assembly PresentationFramework
$dataVisualization = "C:\Program Files (x86)\WPF Toolkit" + "\v3.5.50211.1\System.Windows.Controls.DataVisualization.Toolkit.dll"
$wpfToolkit = "C:\Program Files (x86)\WPF Toolkit" + "\v3.5.50211.1\WPFToolkit.dll"
Add-Type -Path $dataVisualization
Add-Type -Path $wpfToolkit

[xml]$xaml = @"
<Window
xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
xmlns:dg="http://schemas.microsoft.com/wpf/2008/toolkit"
xmlns:chartingToolkit="clr-namespace:System.Windows.Controls.DataVisualization.Charting;assembly=System.Windows.Controls.DataVisualization.Toolkit"
Title="MainWindow" Height="auto" Width="auto">
<Grid>
<TabControl Height="auto" HorizontalAlignment="Stretch" VerticalAlignment="Stretch" Margin="0,0,0,0" Name="Home" Width="auto">
<TabItem Header="OverView" Name="OverView">
<Grid>
<Canvas Height="Auto" HorizontalAlignment="Left" Margin="0,0,0,0" Name="canvas1" VerticalAlignment="Top" Width="Auto"></Canvas>
<dg:DataGrid AutoGenerateColumns="True" Height="Auto" HorizontalAlignment="Left" Margin="0,0,0,0" Name="dataGrid1" VerticalAlignment="Top" Width="600" />
<chartingToolkit:Chart x:Name="PieChart1" Margin="600,0,0,250">
<chartingToolkit:Chart.Series>
<chartingToolkit:PieSeries ItemsSource="{Binding}"
DependentValuePath="Value"
IndependentValuePath="Key" />
</chartingToolkit:Chart.Series>
</chartingToolkit:Chart>
<chartingToolkit:Chart x:Name="PieChart2" Margin="600,250,0,00">
<chartingToolkit:Chart.Series>
<chartingToolkit:PieSeries ItemsSource="{Binding}"
DependentValuePath="Value.SizeofAttachments"
IndependentValuePath="Key" />
</chartingToolkit:Chart.Series>
</chartingToolkit:Chart>
</Grid>
</TabItem>
<TabItem Header="Content Age" Name="Cage">
<Grid>
<Canvas Height="Auto" HorizontalAlignment="Left" Margin="0,0,0,0" Name="canvas2" VerticalAlignment="Top" Width="Auto"></Canvas>
<DataGrid AutoGenerateColumns="True" Height="Auto" HorizontalAlignment="Left" Margin="0,250,0,0" Name="dataGrid2" VerticalAlignment="Top" Width="auto" />
<chartingToolkit:Chart x:Name="PieChart3" Margin="0,0,0,250">
<chartingToolkit:Chart.Series>
<chartingToolkit:PieSeries ItemsSource="{Binding}"
DependentValuePath="Value"
IndependentValuePath="Key" />
</chartingToolkit:Chart.Series>
</chartingToolkit:Chart>
</Grid>
</TabItem>
<TabItem Header="Item Type" Name="Itype">
<Grid>
<Canvas Height="Auto" HorizontalAlignment="Left" Margin="0,0,0,0" Name="canvas3" VerticalAlignment="Top" Width="Auto"></Canvas>
<dg:DataGrid AutoGenerateColumns="True" Height="Auto" HorizontalAlignment="Left" Margin="0,0,0,0" Name="dataGrid3" VerticalAlignment="Top" Width="400" />
<chartingToolkit:Chart x:Name="PieChart4" Margin="400,0,0,250" Title="Item type by Item Count">
<chartingToolkit:Chart.Series>
<chartingToolkit:PieSeries ItemsSource="{Binding}"
DependentValuePath="Value.NumberofItems"
IndependentValuePath="Key" />
</chartingToolkit:Chart.Series>
</chartingToolkit:Chart>
<chartingToolkit:Chart x:Name="PieChart5" Margin="400,250,0,00" Title="Item type by Item Size">
<chartingToolkit:Chart.Series>
<chartingToolkit:PieSeries ItemsSource="{Binding}"
DependentValuePath="Value.SizeofItems"
IndependentValuePath="Key" />
</chartingToolkit:Chart.Series>
</chartingToolkit:Chart>
</Grid>
</TabItem>
</TabControl>
</Grid>
</Window>
"@


Function Enumfolders($cnCurrentFolder){
foreach($folder in $cnCurrentFolder.Folders){
"Processing : " + $folder.Name
ProcessItems($folder)
If($folder.Folders.Count -ne 0){Enumfolders($folder)}
}
}

Function ProcessItems($wfWorkingFolder){
$lnum = 0
foreach($Item in $wfWorkingFolder.Items){
$lnum ++
write-progress "Processing message" $lnum
if ($ItemTypehash.ContainsKey($Item.MessageClass)){
$ItemTypehash[$Item.MessageClass].NumberofItems = $ItemTypehash[$Item.MessageClass].NumberofItems + 1
$ItemTypehash[$Item.MessageClass].SizeofItems = $ItemTypehash[$Item.MessageClass].SizeofItems + $Item.Size
}
else{
$iaItemAgobject = "" | select NumberofItems,SizeofItems
$iaItemAgobject.NumberofItems = 1
$iaItemAgobject.SizeofItems = $Item.Size
$ItemTypehash.add($Item.MessageClass,$iaItemAgobject)
}
$ItemAttachedNumber = 0
$ItemAttachedSize = 0
if ($Item.Attachments.Count -ne 0){
foreach($attachment in $Item.Attachments){
$ItemAttachedNumber = $ItemAttachedNumber +1
$ItemAttachedSize = $ItemAttachedSize + $attachment.Size
if ($Attachment.FileName -eq $null){
$attachext = "Embeeded"
}
else{
if ($Attachment.FileName.Substring($Attachment.FileName.Length-4,1) -eq ".")
{
$attachext = $Attachment.FileName.Substring($Attachment.FileName.Length-3,3)
}
else {
if ($Attachment.FileName.Substring($Attachment.FileName.Length-5,1) -eq "."){
$attachext = $Attachment.FileName.Substring($Attachment.FileName.Length-4,4)
}
else{
$attachext = "unkonwn"
}
}
}
if ($AttachmentTypehash.ContainsKey($attachext)){
$AttachmentTypehash[$attachext].NumberofAttachments = $AttachmentTypehash[$attachext].NumberofAttachments + 1
$AttachmentTypehash[$attachext].SizeofAttachments = $AttachmentTypehash[$attachext].SizeofAttachments + $Attachment.Size
}
else{
$iaAttachmentAgobject = "" | select NumberofAttachments,SizeofAttachments
$iaAttachmentAgobject.NumberofAttachments = 1
$iaAttachmentAgobject.SizeofAttachments = $Attachment.Size
$AttachmentTypehash.add($attachext,$iaAttachmentAgobject)
}
}
}
$caContentAge = New-TimeSpan $Item.ReceivedTime $(Get-Date)
if($caContentAge.days -le 183){$ca = 6}
if($caContentAge.days -gt 183 -band $caContentAge.days -le 365){$ca = 12}
if($caContentAge.days -gt 365 -band $caContentAge.days -le 1095){$ca = 36}
if($caContentAge.days -gt 1095 -band $caContentAge.days -le 1825){$ca = 60}
if($caContentAge.days -gt 1825){$ca = 100}
$agkey = $Item.ReceivedTime.ToString("yyyyMMdd") + "-" + $wfWorkingFolder.Name
if ($Datehash.ContainsKey($agkey)){
$Datehash[$agkey].NumberofItems = $Datehash[$agkey].NumberofItems + 1
$Datehash[$agkey].SizeofItems = $Datehash[$agkey].SizeofItems + $Item.Size
$Datehash[$agkey].NumberofAttachments = $Datehash[$agkey].NumberofAttachments + $ItemAttachedNumber
$Datehash[$agkey].AttachmentSize = $Datehash[$agkey].AttachmentSize + $ItemAttachedSize
}
Else{
$daDateAgregationobject = "" | select Date,Folder,ContentAge,NumberofItems,SizeofItems,NumberofAttachments,AttachmentSize
$daDateAgregationobject.Date = $Item.ReceivedTime.ToString("yyyyMMdd")
$daDateAgregationobject.Folder = $wfWorkingFolder.Name
$daDateAgregationobject.ContentAge = $ca
$daDateAgregationobject.NumberofItems = 1
$daDateAgregationobject.SizeofItems = $Item.Size
$daDateAgregationobject.NumberofAttachments = $ItemAttachedNumber
$daDateAgregationobject.AttachmentSize = $ItemAttachedSize
$Datehash.add($agkey,$daDateAgregationobject)
}
}
}

$RDOSession = new-object -com Redemption.RDOsession

$PSTfile = $RDOSession.LogonPSTStore($fnFileName, 1)
$PSTRoot = $RDOSession.GetFolderFromID($PSTfile.IPMRootFolder.EntryID, $PSTfile.EntryID)
Enumfolders($PSTRoot)

$byDateTable = New-Object System.Data.Datatable
$byDateTable.columns.add("Folder")
$byDateTable.columns.add("#Items",[INT64])
$byDateTable.columns.add("Items Size(MB)",[INT64])
$byDateTable.columns.add("#Attachments",[INT64])
$byDateTable.columns.add("Attachments Size(MB)",[INT64])
$byAgeTable = New-Object System.Data.Datatable
$byAgeTable.columns.add("Folder")
$byAgeTable.columns.add("6>#Items",[INT64])
$byAgeTable.columns.add("6>#(MB)",[INT64])
$byAgeTable.columns.add("6>#Atch",[INT64])
$byAgeTable.columns.add("6>#Atch(MB)",[INT64])
$byAgeTable.columns.add("6to12#Items",[INT64])
$byAgeTable.columns.add("6to12#(MB)",[INT64])
$byAgeTable.columns.add("6to12#Atch",[INT64])
$byAgeTable.columns.add("6to12#Atch(MB)",[INT64])
$byAgeTable.columns.add("1to3years#Items",[INT64])
$byAgeTable.columns.add("1to3years#(MB)",[INT64])
$byAgeTable.columns.add("1to3years#Atch",[INT64])
$byAgeTable.columns.add("1to3years#Atch(MB)",[INT64])
$byAgeTable.columns.add("3to5years#Items",[INT64])
$byAgeTable.columns.add("3to5years#(MB)",[INT64])
$byAgeTable.columns.add("3to5years#Atch",[INT64])
$byAgeTable.columns.add("3to5years#Atch(MB)",[INT64])
$byAgeTable.columns.add("5+years#Items",[INT64])
$byAgeTable.columns.add("5+years#(MB)",[INT64])
$byAgeTable.columns.add("5+years#Atch",[INT64])
$byAgeTable.columns.add("5+years#Atch(MB)",[INT64])
$Charthash = @{ }
$cCount1 = 0
$Datehash.Values | group-object {$_.Folder} | Sort-Object @{expression={(($_.Group | Measure-Object SizeofItems -sum).sum/1MB)}} -Descending | foreach-object{
if ((($_.Group | Measure-Object SizeofItems -sum).sum/1MB) -gt 1 -band $cCount1 -le 10){
if ($_.Name.Length -gt 10){$chartname = $_.Name.Substring(0,10)}
else{$chartname = $_.Name}
$Charthash.add($chartname,(($_.Group | Measure-Object SizeofItems -sum).sum/1MB))
}
$cCount1++
[VOID]$byDateTable.rows.add($_.Name,($_.Group | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Measure-Object AttachmentSize -sum).sum/1MB))
}
$Charthash2 = @{ }
$Charthash2.Add("Under 6 Months",0)
$Charthash2.Add("6 to 12 Months",0)
$Charthash2.Add("1 to 3 years",0)
$Charthash2.Add("3 to 5 years",0)
$Charthash2.Add("Over 5 years",0)
$Datehash.Values | group-object {$_.Folder} | Sort-Object @{expression={(($_.Group | Measure-Object SizeofItems -sum).sum/1MB)}} -Descending | foreach-object{
$Charthash2["Under 6 Months"] = $Charthash2["Under 6 Months"] + ($_.Group | Where-Object {$_.ContentAge -eq 6} | Measure-Object SizeofItems -sum).sum/1MB
$Charthash2["6 to 12 Months"] = $Charthash2["6 to 12 Months"] + ($_.Group | Where-Object {$_.ContentAge -eq 12} | Measure-Object SizeofItems -sum).sum/1MB
$Charthash2["1 to 3 years"] = $Charthash2["1 to 3 years"] + ($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object SizeofItems -sum).sum/1MB
$Charthash2["3 to 5 years"] = $Charthash2["3 to 5 years"] + ($_.Group | Where-Object {$_.ContentAge -eq 60} | Measure-Object SizeofItems -sum).sum/1MB
$Charthash2["Over 5 years"] = $Charthash2["Over 5 years"] + ($_.Group | Where-Object {$_.ContentAge -eq 100} | Measure-Object SizeofItems -sum).sum/1MB
[VOID]$byAgeTable.rows.add($_.Name,($_.Group | Where-Object {$_.ContentAge -eq 6} | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 6} | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 6} | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 6} | Measure-Object AttachmentSize -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 12} | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 12} | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 12} | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 12} | Measure-Object AttachmentSize -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object AttachmentSize -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 60} | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 60} | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 60} | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 60} | Measure-Object AttachmentSize -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 100} | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 100} | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 100} | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 100} | Measure-Object AttachmentSize -sum).sum/1MB))
}
$XMLreader = New-Object System.Xml.XmlNodeReader $xaml
$XAMLreader = [Windows.Markup.XamlReader]::Load($XMLreader)
$tc = $XAMLreader.FindName("PieChart1")
$tc.DataContext = $Charthash
$tc = $XAMLreader.FindName("PieChart2")
$tc.DataContext = ($AttachmentTypehash.GetEnumerator() | Sort-Object Value.SizeofAttachments | select-object -First 10)
$tc = $XAMLreader.FindName("PieChart3")
$tc.DataContext = $Charthash2
$tc = $XAMLreader.FindName("PieChart4")
$tc.DataContext = ($ItemTypehash.GetEnumerator() | Sort-Object Value.SizeofItems | select-object -First 10)
$tc = $XAMLreader.FindName("PieChart5")
$tc.DataContext = ($ItemTypehash.GetEnumerator() | Sort-Object Value.SizeofItems | select-object -First 10)
$datagrid = $XAMLreader.FindName("dataGrid1")
$datagrid.ItemsSource = $byDateTable.defaultview
$datagrid2 = $XAMLreader.FindName("dataGrid2")
$datagrid2.ItemsSource = $byAgeTable.defaultview

$byItemTable = New-Object System.Data.Datatable
$byItemTable.columns.add("ItemType")
$byItemTable.columns.add("#Items",[INT64])
$byItemTable.columns.add("Items Size(MB)",[INT64])
$ItemTypehash.GetEnumerator() | foreach-object {
[VOID]$byItemTable.rows.Add($_.key.ToString(),$_.value.NumberofItems,$_.value.SizeofItems)
$_.key.ToString()
}

$datagrid3 = $XAMLreader.FindName("dataGrid3")
$datagrid3.ItemsSource = $byItemTable.defaultview
$XAMLreader.ShowDialog()

Popular posts from this blog

The MailboxConcurrency limit and using Batching in the Microsoft Graph API

If your getting an error such as Application is over its MailboxConcurrency limit while using the Microsoft Graph API this post may help you understand why. Background   The Mailbox  concurrency limit when your using the Graph API is 4 as per https://docs.microsoft.com/en-us/graph/throttling#outlook-service-limits . This is evaluated for each app ID and mailbox combination so this means you can have different apps running under the same credentials and the poor behavior of one won't cause the other to be throttled. If you compared that to EWS you could have up to 27 concurrent connections but they are shared across all apps on a first come first served basis. Batching Batching in the Graph API is a way of combining multiple requests into a single HTTP request. Batching in the Exchange Mail API's EWS and MAPI has been around for a long time and its common, for email Apps to process large numbers of smaller items for a variety of reasons.  Batching in the Gr...

Exporting and Uploading Mailbox Items using Exchange Web Services using the new ExportItems and UploadItems operations in Exchange 2010 SP1

Two new EWS Operations ExportItems and UploadItems where introduced in Exchange 2010 SP1 that allowed you to do a number of useful things that where previously not possible using Exchange Web Services. Any object that Exchange stores is basically a collection of properties for example a message object is a collection of Message properties, Recipient properties and Attachment properties with a few meta properties that describe the underlying storage thrown in. Normally when using EWS you can access these properties in a number of a ways eg one example is using the strongly type objects such as emailmessage that presents the underlying properties in an intuitive way that's easy to use. Another way is using Extended Properties to access the underlying properties directly. However previously in EWS there was no method to access every property of a message hence there is no way to export or import an item and maintain full fidelity of every property on that item (you could export the...

Sending a Message in Exchange Online via REST from an Arduino MKR1000

This is part 2 of my MKR1000 article, in this previous post  I looked at sending a Message via EWS using Basic Authentication.  In this Post I'll look at using the new Outlook REST API  which requires using OAuth authentication to get an Access Token. The prerequisites for this sketch are the same as in the other post with the addition of the ArduinoJson library  https://github.com/bblanchon/ArduinoJson  which is used to parse the Authentication Results to extract the Access Token. Also the SSL certificates for the login.windows.net  and outlook.office365.com need to be uploaded to the devices using the wifi101 Firmware updater. To use Token Authentication you need to register an Application in Azure https://msdn.microsoft.com/en-us/office/office365/howto/add-common-consent-manually  with the Mail.Send permission. The application should be a Native Client app that use the Out of Band Callback urn:ietf:wg:oauth:2.0:oob. You ...
All sample scripts and source code is provided by for illustrative purposes only. All examples are untested in different environments and therefore, I cannot guarantee or imply reliability, serviceability, or function of these programs.

All code contained herein is provided to you "AS IS" without any warranties of any kind. The implied warranties of non-infringement, merchantability and fitness for a particular purpose are expressly disclaimed.