Friday, August 20, 2010

Anaylising the content of a PST file and reporting on the age and type of content using Powershell and WPF

As time passes we all receive more and more email, this is one of the irrefutable facts of life for any mail system or anybody with a mailbox. Even the most fastidious of deleters still can't avoid this so at some point in the future you may need to consider achieving. The fact is your users maybe already self archiving using PST files which like self medicating can lead to serious problems down the track when their laptop goes under a bus. Exchange 2010 introduced native achieving and SP1 when its released will build on these capabilities, for other versions of exchange there are a bunch of other fine products you can look at to do this. If your looking to ingest PST's into an online archive you may want to first review what content your going to be importing and look at things like how many attachments and what type are they and how old is the content to give you a feel for what your going to be storing and potentially having to backup.

What this boils down to is if you want to include attachment reporting then you will need a script that will do a pass on every item within a mailbox. This means in real terms that this script is going to be slow to very slow to run on a very large PST file. If you can wait you can get some useful information so there is a trade off their somewhere.

To loop through every message within a PST file using powershell is pretty easy if you use Dmitry's redemption library http://www.dimastr.com/redemption/. RDO gives us an easy to use wrapper around exMapi and the Outlooks PST provider. So now we have the ability to go through every item in a pst file the next thing to decide is how you want to group and classify each item. I've chosen to do this by date and content age this gives the most flexibility when it comes to aggregating the data later on. To add extra functionality in this script I've created separate hash's for attachment sizes and types this is something that you can build on yourself all you need to do is think about what it is you want to report and aggregate and then just add in a few lines of your own code.

When it comes time to report on the data that was collected in the PST sweep this is when Powershell comes into its own with the Group-Object and Measure-Object cmdlets. Because during the sweep i classified each message group depending on the age of the content i can then further re-aggregate this data using some pipeline magic eg
$Datehash.Values | group-object {$_.Folder} | Sort-Object @{expression={(($_.Group | Measure-Object SizeofItems -sum).sum/1MB)}} -Descending | foreach-object{

$Charthash2["1 to 3 years"] = $Charthash2["1 to 3 years"] + ($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object SizeofItems -sum).sum/1MB

}
Actually explaining what this line does would take a separate post but when it comes to quantifying numerical information without using a database this is extremely useful. In real words it first groups the data by Folder Name and then allows me to aggregate within the grouped data.

Displaying the result

I decided to try something new with this script instead of using the normal winform GUI I've used a lot in the past i went down the WPF path. WPF (Windows Presentation Framework) first appeared in .NET 3.0 so to use this script you must have the .NET 3.5 framework installed as well as the wpftoolkit which contains both datagrid and chart control I've used in this script. As far as a comparison between Winforms and WPF from powershell WPF is by far easier to use when building a GUI script because you define all the element in XML then just manipulate the data providers with code. You can use something like Visual studio to build the graphical look for your GUI and then cut and past the code more or less straight into your script with a few small changes. The result of what you get is also visually more pleasing eg these are a few of the screen-shots from this script.



Using this script

There are a few pre-requisites for using this script first you need redemption http://www.dimastr.com/redemption/download.htm

If you want to run this on Windows 7 64bit I found i had to use the following launcher script to start a 32bit session to run the script

&$env:windir\syswow64\windowspowershell\v1.0\powershell.exe -noninteractive -STA

This also ensures that you have a STA session of powershell which is important for the WPF code to work correctly. The last things that's required is the WPFtoolkit which you can downloaded from http://wpf.codeplex.com/releases/view/40535. Note if this gets installed to anywhere other then C:\Program Files (x86)\WPF Toolkit" + "\v3.5.50211.1\ you will need to change the path.

To run the script you need to pass it the path to the PST file as a cmdline argument eg ./pstanlv1.ps1 "c:\mail\outlook.pst"

I've put a download of this script here the code itself looks like

$fnFileName = $args[0]

$Datehash = new-object "System.Collections.Generic.Dictionary[System.string, System.object]"
$AttachmentTypehash = @{ }
$ItemTypehash = @{ }
Add-Type -Assembly PresentationFramework
$dataVisualization = "C:\Program Files (x86)\WPF Toolkit" + "\v3.5.50211.1\System.Windows.Controls.DataVisualization.Toolkit.dll"
$wpfToolkit = "C:\Program Files (x86)\WPF Toolkit" + "\v3.5.50211.1\WPFToolkit.dll"
Add-Type -Path $dataVisualization
Add-Type -Path $wpfToolkit

[xml]$xaml = @"
<Window
xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
xmlns:dg="http://schemas.microsoft.com/wpf/2008/toolkit"
xmlns:chartingToolkit="clr-namespace:System.Windows.Controls.DataVisualization.Charting;assembly=System.Windows.Controls.DataVisualization.Toolkit"
Title="MainWindow" Height="auto" Width="auto">
<Grid>
<TabControl Height="auto" HorizontalAlignment="Stretch" VerticalAlignment="Stretch" Margin="0,0,0,0" Name="Home" Width="auto">
<TabItem Header="OverView" Name="OverView">
<Grid>
<Canvas Height="Auto" HorizontalAlignment="Left" Margin="0,0,0,0" Name="canvas1" VerticalAlignment="Top" Width="Auto"></Canvas>
<dg:DataGrid AutoGenerateColumns="True" Height="Auto" HorizontalAlignment="Left" Margin="0,0,0,0" Name="dataGrid1" VerticalAlignment="Top" Width="600" />
<chartingToolkit:Chart x:Name="PieChart1" Margin="600,0,0,250">
<chartingToolkit:Chart.Series>
<chartingToolkit:PieSeries ItemsSource="{Binding}"
DependentValuePath="Value"
IndependentValuePath="Key" />
</chartingToolkit:Chart.Series>
</chartingToolkit:Chart>
<chartingToolkit:Chart x:Name="PieChart2" Margin="600,250,0,00">
<chartingToolkit:Chart.Series>
<chartingToolkit:PieSeries ItemsSource="{Binding}"
DependentValuePath="Value.SizeofAttachments"
IndependentValuePath="Key" />
</chartingToolkit:Chart.Series>
</chartingToolkit:Chart>
</Grid>
</TabItem>
<TabItem Header="Content Age" Name="Cage">
<Grid>
<Canvas Height="Auto" HorizontalAlignment="Left" Margin="0,0,0,0" Name="canvas2" VerticalAlignment="Top" Width="Auto"></Canvas>
<DataGrid AutoGenerateColumns="True" Height="Auto" HorizontalAlignment="Left" Margin="0,250,0,0" Name="dataGrid2" VerticalAlignment="Top" Width="auto" />
<chartingToolkit:Chart x:Name="PieChart3" Margin="0,0,0,250">
<chartingToolkit:Chart.Series>
<chartingToolkit:PieSeries ItemsSource="{Binding}"
DependentValuePath="Value"
IndependentValuePath="Key" />
</chartingToolkit:Chart.Series>
</chartingToolkit:Chart>
</Grid>
</TabItem>
<TabItem Header="Item Type" Name="Itype">
<Grid>
<Canvas Height="Auto" HorizontalAlignment="Left" Margin="0,0,0,0" Name="canvas3" VerticalAlignment="Top" Width="Auto"></Canvas>
<dg:DataGrid AutoGenerateColumns="True" Height="Auto" HorizontalAlignment="Left" Margin="0,0,0,0" Name="dataGrid3" VerticalAlignment="Top" Width="400" />
<chartingToolkit:Chart x:Name="PieChart4" Margin="400,0,0,250" Title="Item type by Item Count">
<chartingToolkit:Chart.Series>
<chartingToolkit:PieSeries ItemsSource="{Binding}"
DependentValuePath="Value.NumberofItems"
IndependentValuePath="Key" />
</chartingToolkit:Chart.Series>
</chartingToolkit:Chart>
<chartingToolkit:Chart x:Name="PieChart5" Margin="400,250,0,00" Title="Item type by Item Size">
<chartingToolkit:Chart.Series>
<chartingToolkit:PieSeries ItemsSource="{Binding}"
DependentValuePath="Value.SizeofItems"
IndependentValuePath="Key" />
</chartingToolkit:Chart.Series>
</chartingToolkit:Chart>
</Grid>
</TabItem>
</TabControl>
</Grid>
</Window>
"@


Function Enumfolders($cnCurrentFolder){
foreach($folder in $cnCurrentFolder.Folders){
"Processing : " + $folder.Name
ProcessItems($folder)
If($folder.Folders.Count -ne 0){Enumfolders($folder)}
}
}

Function ProcessItems($wfWorkingFolder){
$lnum = 0
foreach($Item in $wfWorkingFolder.Items){
$lnum ++
write-progress "Processing message" $lnum
if ($ItemTypehash.ContainsKey($Item.MessageClass)){
$ItemTypehash[$Item.MessageClass].NumberofItems = $ItemTypehash[$Item.MessageClass].NumberofItems + 1
$ItemTypehash[$Item.MessageClass].SizeofItems = $ItemTypehash[$Item.MessageClass].SizeofItems + $Item.Size
}
else{
$iaItemAgobject = "" | select NumberofItems,SizeofItems
$iaItemAgobject.NumberofItems = 1
$iaItemAgobject.SizeofItems = $Item.Size
$ItemTypehash.add($Item.MessageClass,$iaItemAgobject)
}
$ItemAttachedNumber = 0
$ItemAttachedSize = 0
if ($Item.Attachments.Count -ne 0){
foreach($attachment in $Item.Attachments){
$ItemAttachedNumber = $ItemAttachedNumber +1
$ItemAttachedSize = $ItemAttachedSize + $attachment.Size
if ($Attachment.FileName -eq $null){
$attachext = "Embeeded"
}
else{
if ($Attachment.FileName.Substring($Attachment.FileName.Length-4,1) -eq ".")
{
$attachext = $Attachment.FileName.Substring($Attachment.FileName.Length-3,3)
}
else {
if ($Attachment.FileName.Substring($Attachment.FileName.Length-5,1) -eq "."){
$attachext = $Attachment.FileName.Substring($Attachment.FileName.Length-4,4)
}
else{
$attachext = "unkonwn"
}
}
}
if ($AttachmentTypehash.ContainsKey($attachext)){
$AttachmentTypehash[$attachext].NumberofAttachments = $AttachmentTypehash[$attachext].NumberofAttachments + 1
$AttachmentTypehash[$attachext].SizeofAttachments = $AttachmentTypehash[$attachext].SizeofAttachments + $Attachment.Size
}
else{
$iaAttachmentAgobject = "" | select NumberofAttachments,SizeofAttachments
$iaAttachmentAgobject.NumberofAttachments = 1
$iaAttachmentAgobject.SizeofAttachments = $Attachment.Size
$AttachmentTypehash.add($attachext,$iaAttachmentAgobject)
}
}
}
$caContentAge = New-TimeSpan $Item.ReceivedTime $(Get-Date)
if($caContentAge.days -le 183){$ca = 6}
if($caContentAge.days -gt 183 -band $caContentAge.days -le 365){$ca = 12}
if($caContentAge.days -gt 365 -band $caContentAge.days -le 1095){$ca = 36}
if($caContentAge.days -gt 1095 -band $caContentAge.days -le 1825){$ca = 60}
if($caContentAge.days -gt 1825){$ca = 100}
$agkey = $Item.ReceivedTime.ToString("yyyyMMdd") + "-" + $wfWorkingFolder.Name
if ($Datehash.ContainsKey($agkey)){
$Datehash[$agkey].NumberofItems = $Datehash[$agkey].NumberofItems + 1
$Datehash[$agkey].SizeofItems = $Datehash[$agkey].SizeofItems + $Item.Size
$Datehash[$agkey].NumberofAttachments = $Datehash[$agkey].NumberofAttachments + $ItemAttachedNumber
$Datehash[$agkey].AttachmentSize = $Datehash[$agkey].AttachmentSize + $ItemAttachedSize
}
Else{
$daDateAgregationobject = "" | select Date,Folder,ContentAge,NumberofItems,SizeofItems,NumberofAttachments,AttachmentSize
$daDateAgregationobject.Date = $Item.ReceivedTime.ToString("yyyyMMdd")
$daDateAgregationobject.Folder = $wfWorkingFolder.Name
$daDateAgregationobject.ContentAge = $ca
$daDateAgregationobject.NumberofItems = 1
$daDateAgregationobject.SizeofItems = $Item.Size
$daDateAgregationobject.NumberofAttachments = $ItemAttachedNumber
$daDateAgregationobject.AttachmentSize = $ItemAttachedSize
$Datehash.add($agkey,$daDateAgregationobject)
}
}
}

$RDOSession = new-object -com Redemption.RDOsession

$PSTfile = $RDOSession.LogonPSTStore($fnFileName, 1)
$PSTRoot = $RDOSession.GetFolderFromID($PSTfile.IPMRootFolder.EntryID, $PSTfile.EntryID)
Enumfolders($PSTRoot)

$byDateTable = New-Object System.Data.Datatable
$byDateTable.columns.add("Folder")
$byDateTable.columns.add("#Items",[INT64])
$byDateTable.columns.add("Items Size(MB)",[INT64])
$byDateTable.columns.add("#Attachments",[INT64])
$byDateTable.columns.add("Attachments Size(MB)",[INT64])
$byAgeTable = New-Object System.Data.Datatable
$byAgeTable.columns.add("Folder")
$byAgeTable.columns.add("6>#Items",[INT64])
$byAgeTable.columns.add("6>#(MB)",[INT64])
$byAgeTable.columns.add("6>#Atch",[INT64])
$byAgeTable.columns.add("6>#Atch(MB)",[INT64])
$byAgeTable.columns.add("6to12#Items",[INT64])
$byAgeTable.columns.add("6to12#(MB)",[INT64])
$byAgeTable.columns.add("6to12#Atch",[INT64])
$byAgeTable.columns.add("6to12#Atch(MB)",[INT64])
$byAgeTable.columns.add("1to3years#Items",[INT64])
$byAgeTable.columns.add("1to3years#(MB)",[INT64])
$byAgeTable.columns.add("1to3years#Atch",[INT64])
$byAgeTable.columns.add("1to3years#Atch(MB)",[INT64])
$byAgeTable.columns.add("3to5years#Items",[INT64])
$byAgeTable.columns.add("3to5years#(MB)",[INT64])
$byAgeTable.columns.add("3to5years#Atch",[INT64])
$byAgeTable.columns.add("3to5years#Atch(MB)",[INT64])
$byAgeTable.columns.add("5+years#Items",[INT64])
$byAgeTable.columns.add("5+years#(MB)",[INT64])
$byAgeTable.columns.add("5+years#Atch",[INT64])
$byAgeTable.columns.add("5+years#Atch(MB)",[INT64])
$Charthash = @{ }
$cCount1 = 0
$Datehash.Values | group-object {$_.Folder} | Sort-Object @{expression={(($_.Group | Measure-Object SizeofItems -sum).sum/1MB)}} -Descending | foreach-object{
if ((($_.Group | Measure-Object SizeofItems -sum).sum/1MB) -gt 1 -band $cCount1 -le 10){
if ($_.Name.Length -gt 10){$chartname = $_.Name.Substring(0,10)}
else{$chartname = $_.Name}
$Charthash.add($chartname,(($_.Group | Measure-Object SizeofItems -sum).sum/1MB))
}
$cCount1++
[VOID]$byDateTable.rows.add($_.Name,($_.Group | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Measure-Object AttachmentSize -sum).sum/1MB))
}
$Charthash2 = @{ }
$Charthash2.Add("Under 6 Months",0)
$Charthash2.Add("6 to 12 Months",0)
$Charthash2.Add("1 to 3 years",0)
$Charthash2.Add("3 to 5 years",0)
$Charthash2.Add("Over 5 years",0)
$Datehash.Values | group-object {$_.Folder} | Sort-Object @{expression={(($_.Group | Measure-Object SizeofItems -sum).sum/1MB)}} -Descending | foreach-object{
$Charthash2["Under 6 Months"] = $Charthash2["Under 6 Months"] + ($_.Group | Where-Object {$_.ContentAge -eq 6} | Measure-Object SizeofItems -sum).sum/1MB
$Charthash2["6 to 12 Months"] = $Charthash2["6 to 12 Months"] + ($_.Group | Where-Object {$_.ContentAge -eq 12} | Measure-Object SizeofItems -sum).sum/1MB
$Charthash2["1 to 3 years"] = $Charthash2["1 to 3 years"] + ($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object SizeofItems -sum).sum/1MB
$Charthash2["3 to 5 years"] = $Charthash2["3 to 5 years"] + ($_.Group | Where-Object {$_.ContentAge -eq 60} | Measure-Object SizeofItems -sum).sum/1MB
$Charthash2["Over 5 years"] = $Charthash2["Over 5 years"] + ($_.Group | Where-Object {$_.ContentAge -eq 100} | Measure-Object SizeofItems -sum).sum/1MB
[VOID]$byAgeTable.rows.add($_.Name,($_.Group | Where-Object {$_.ContentAge -eq 6} | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 6} | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 6} | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 6} | Measure-Object AttachmentSize -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 12} | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 12} | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 12} | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 12} | Measure-Object AttachmentSize -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object AttachmentSize -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 60} | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 60} | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 60} | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 60} | Measure-Object AttachmentSize -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 100} | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 100} | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 100} | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 100} | Measure-Object AttachmentSize -sum).sum/1MB))
}
$XMLreader = New-Object System.Xml.XmlNodeReader $xaml
$XAMLreader = [Windows.Markup.XamlReader]::Load($XMLreader)
$tc = $XAMLreader.FindName("PieChart1")
$tc.DataContext = $Charthash
$tc = $XAMLreader.FindName("PieChart2")
$tc.DataContext = ($AttachmentTypehash.GetEnumerator() | Sort-Object Value.SizeofAttachments | select-object -First 10)
$tc = $XAMLreader.FindName("PieChart3")
$tc.DataContext = $Charthash2
$tc = $XAMLreader.FindName("PieChart4")
$tc.DataContext = ($ItemTypehash.GetEnumerator() | Sort-Object Value.SizeofItems | select-object -First 10)
$tc = $XAMLreader.FindName("PieChart5")
$tc.DataContext = ($ItemTypehash.GetEnumerator() | Sort-Object Value.SizeofItems | select-object -First 10)
$datagrid = $XAMLreader.FindName("dataGrid1")
$datagrid.ItemsSource = $byDateTable.defaultview
$datagrid2 = $XAMLreader.FindName("dataGrid2")
$datagrid2.ItemsSource = $byAgeTable.defaultview

$byItemTable = New-Object System.Data.Datatable
$byItemTable.columns.add("ItemType")
$byItemTable.columns.add("#Items",[INT64])
$byItemTable.columns.add("Items Size(MB)",[INT64])
$ItemTypehash.GetEnumerator() | foreach-object {
[VOID]$byItemTable.rows.Add($_.key.ToString(),$_.value.NumberofItems,$_.value.SizeofItems)
$_.key.ToString()
}

$datagrid3 = $XAMLreader.FindName("dataGrid3")
$datagrid3.ItemsSource = $byItemTable.defaultview
$XAMLreader.ShowDialog()