Skip to main content

Anaylising the content of a PST file and reporting on the age and type of content using Powershell and WPF

As time passes we all receive more and more email, this is one of the irrefutable facts of life for any mail system or anybody with a mailbox. Even the most fastidious of deleters still can't avoid this so at some point in the future you may need to consider achieving. The fact is your users maybe already self archiving using PST files which like self medicating can lead to serious problems down the track when their laptop goes under a bus. Exchange 2010 introduced native achieving and SP1 when its released will build on these capabilities, for other versions of exchange there are a bunch of other fine products you can look at to do this. If your looking to ingest PST's into an online archive you may want to first review what content your going to be importing and look at things like how many attachments and what type are they and how old is the content to give you a feel for what your going to be storing and potentially having to backup.

What this boils down to is if you want to include attachment reporting then you will need a script that will do a pass on every item within a mailbox. This means in real terms that this script is going to be slow to very slow to run on a very large PST file. If you can wait you can get some useful information so there is a trade off their somewhere.

To loop through every message within a PST file using powershell is pretty easy if you use Dmitry's redemption library http://www.dimastr.com/redemption/. RDO gives us an easy to use wrapper around exMapi and the Outlooks PST provider. So now we have the ability to go through every item in a pst file the next thing to decide is how you want to group and classify each item. I've chosen to do this by date and content age this gives the most flexibility when it comes to aggregating the data later on. To add extra functionality in this script I've created separate hash's for attachment sizes and types this is something that you can build on yourself all you need to do is think about what it is you want to report and aggregate and then just add in a few lines of your own code.

When it comes time to report on the data that was collected in the PST sweep this is when Powershell comes into its own with the Group-Object and Measure-Object cmdlets. Because during the sweep i classified each message group depending on the age of the content i can then further re-aggregate this data using some pipeline magic eg
$Datehash.Values | group-object {$_.Folder} | Sort-Object @{expression={(($_.Group | Measure-Object SizeofItems -sum).sum/1MB)}} -Descending | foreach-object{

$Charthash2["1 to 3 years"] = $Charthash2["1 to 3 years"] + ($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object SizeofItems -sum).sum/1MB

}
Actually explaining what this line does would take a separate post but when it comes to quantifying numerical information without using a database this is extremely useful. In real words it first groups the data by Folder Name and then allows me to aggregate within the grouped data.

Displaying the result

I decided to try something new with this script instead of using the normal winform GUI I've used a lot in the past i went down the WPF path. WPF (Windows Presentation Framework) first appeared in .NET 3.0 so to use this script you must have the .NET 3.5 framework installed as well as the wpftoolkit which contains both datagrid and chart control I've used in this script. As far as a comparison between Winforms and WPF from powershell WPF is by far easier to use when building a GUI script because you define all the element in XML then just manipulate the data providers with code. You can use something like Visual studio to build the graphical look for your GUI and then cut and past the code more or less straight into your script with a few small changes. The result of what you get is also visually more pleasing eg these are a few of the screen-shots from this script.



Using this script

There are a few pre-requisites for using this script first you need redemption http://www.dimastr.com/redemption/download.htm

If you want to run this on Windows 7 64bit I found i had to use the following launcher script to start a 32bit session to run the script

&$env:windir\syswow64\windowspowershell\v1.0\powershell.exe -noninteractive -STA

This also ensures that you have a STA session of powershell which is important for the WPF code to work correctly. The last things that's required is the WPFtoolkit which you can downloaded from http://wpf.codeplex.com/releases/view/40535. Note if this gets installed to anywhere other then C:\Program Files (x86)\WPF Toolkit" + "\v3.5.50211.1\ you will need to change the path.

To run the script you need to pass it the path to the PST file as a cmdline argument eg ./pstanlv1.ps1 "c:\mail\outlook.pst"

I've put a download of this script here the code itself looks like

$fnFileName = $args[0]

$Datehash = new-object "System.Collections.Generic.Dictionary[System.string, System.object]"
$AttachmentTypehash = @{ }
$ItemTypehash = @{ }
Add-Type -Assembly PresentationFramework
$dataVisualization = "C:\Program Files (x86)\WPF Toolkit" + "\v3.5.50211.1\System.Windows.Controls.DataVisualization.Toolkit.dll"
$wpfToolkit = "C:\Program Files (x86)\WPF Toolkit" + "\v3.5.50211.1\WPFToolkit.dll"
Add-Type -Path $dataVisualization
Add-Type -Path $wpfToolkit

[xml]$xaml = @"
<Window
xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
xmlns:dg="http://schemas.microsoft.com/wpf/2008/toolkit"
xmlns:chartingToolkit="clr-namespace:System.Windows.Controls.DataVisualization.Charting;assembly=System.Windows.Controls.DataVisualization.Toolkit"
Title="MainWindow" Height="auto" Width="auto">
<Grid>
<TabControl Height="auto" HorizontalAlignment="Stretch" VerticalAlignment="Stretch" Margin="0,0,0,0" Name="Home" Width="auto">
<TabItem Header="OverView" Name="OverView">
<Grid>
<Canvas Height="Auto" HorizontalAlignment="Left" Margin="0,0,0,0" Name="canvas1" VerticalAlignment="Top" Width="Auto"></Canvas>
<dg:DataGrid AutoGenerateColumns="True" Height="Auto" HorizontalAlignment="Left" Margin="0,0,0,0" Name="dataGrid1" VerticalAlignment="Top" Width="600" />
<chartingToolkit:Chart x:Name="PieChart1" Margin="600,0,0,250">
<chartingToolkit:Chart.Series>
<chartingToolkit:PieSeries ItemsSource="{Binding}"
DependentValuePath="Value"
IndependentValuePath="Key" />
</chartingToolkit:Chart.Series>
</chartingToolkit:Chart>
<chartingToolkit:Chart x:Name="PieChart2" Margin="600,250,0,00">
<chartingToolkit:Chart.Series>
<chartingToolkit:PieSeries ItemsSource="{Binding}"
DependentValuePath="Value.SizeofAttachments"
IndependentValuePath="Key" />
</chartingToolkit:Chart.Series>
</chartingToolkit:Chart>
</Grid>
</TabItem>
<TabItem Header="Content Age" Name="Cage">
<Grid>
<Canvas Height="Auto" HorizontalAlignment="Left" Margin="0,0,0,0" Name="canvas2" VerticalAlignment="Top" Width="Auto"></Canvas>
<DataGrid AutoGenerateColumns="True" Height="Auto" HorizontalAlignment="Left" Margin="0,250,0,0" Name="dataGrid2" VerticalAlignment="Top" Width="auto" />
<chartingToolkit:Chart x:Name="PieChart3" Margin="0,0,0,250">
<chartingToolkit:Chart.Series>
<chartingToolkit:PieSeries ItemsSource="{Binding}"
DependentValuePath="Value"
IndependentValuePath="Key" />
</chartingToolkit:Chart.Series>
</chartingToolkit:Chart>
</Grid>
</TabItem>
<TabItem Header="Item Type" Name="Itype">
<Grid>
<Canvas Height="Auto" HorizontalAlignment="Left" Margin="0,0,0,0" Name="canvas3" VerticalAlignment="Top" Width="Auto"></Canvas>
<dg:DataGrid AutoGenerateColumns="True" Height="Auto" HorizontalAlignment="Left" Margin="0,0,0,0" Name="dataGrid3" VerticalAlignment="Top" Width="400" />
<chartingToolkit:Chart x:Name="PieChart4" Margin="400,0,0,250" Title="Item type by Item Count">
<chartingToolkit:Chart.Series>
<chartingToolkit:PieSeries ItemsSource="{Binding}"
DependentValuePath="Value.NumberofItems"
IndependentValuePath="Key" />
</chartingToolkit:Chart.Series>
</chartingToolkit:Chart>
<chartingToolkit:Chart x:Name="PieChart5" Margin="400,250,0,00" Title="Item type by Item Size">
<chartingToolkit:Chart.Series>
<chartingToolkit:PieSeries ItemsSource="{Binding}"
DependentValuePath="Value.SizeofItems"
IndependentValuePath="Key" />
</chartingToolkit:Chart.Series>
</chartingToolkit:Chart>
</Grid>
</TabItem>
</TabControl>
</Grid>
</Window>
"@


Function Enumfolders($cnCurrentFolder){
foreach($folder in $cnCurrentFolder.Folders){
"Processing : " + $folder.Name
ProcessItems($folder)
If($folder.Folders.Count -ne 0){Enumfolders($folder)}
}
}

Function ProcessItems($wfWorkingFolder){
$lnum = 0
foreach($Item in $wfWorkingFolder.Items){
$lnum ++
write-progress "Processing message" $lnum
if ($ItemTypehash.ContainsKey($Item.MessageClass)){
$ItemTypehash[$Item.MessageClass].NumberofItems = $ItemTypehash[$Item.MessageClass].NumberofItems + 1
$ItemTypehash[$Item.MessageClass].SizeofItems = $ItemTypehash[$Item.MessageClass].SizeofItems + $Item.Size
}
else{
$iaItemAgobject = "" | select NumberofItems,SizeofItems
$iaItemAgobject.NumberofItems = 1
$iaItemAgobject.SizeofItems = $Item.Size
$ItemTypehash.add($Item.MessageClass,$iaItemAgobject)
}
$ItemAttachedNumber = 0
$ItemAttachedSize = 0
if ($Item.Attachments.Count -ne 0){
foreach($attachment in $Item.Attachments){
$ItemAttachedNumber = $ItemAttachedNumber +1
$ItemAttachedSize = $ItemAttachedSize + $attachment.Size
if ($Attachment.FileName -eq $null){
$attachext = "Embeeded"
}
else{
if ($Attachment.FileName.Substring($Attachment.FileName.Length-4,1) -eq ".")
{
$attachext = $Attachment.FileName.Substring($Attachment.FileName.Length-3,3)
}
else {
if ($Attachment.FileName.Substring($Attachment.FileName.Length-5,1) -eq "."){
$attachext = $Attachment.FileName.Substring($Attachment.FileName.Length-4,4)
}
else{
$attachext = "unkonwn"
}
}
}
if ($AttachmentTypehash.ContainsKey($attachext)){
$AttachmentTypehash[$attachext].NumberofAttachments = $AttachmentTypehash[$attachext].NumberofAttachments + 1
$AttachmentTypehash[$attachext].SizeofAttachments = $AttachmentTypehash[$attachext].SizeofAttachments + $Attachment.Size
}
else{
$iaAttachmentAgobject = "" | select NumberofAttachments,SizeofAttachments
$iaAttachmentAgobject.NumberofAttachments = 1
$iaAttachmentAgobject.SizeofAttachments = $Attachment.Size
$AttachmentTypehash.add($attachext,$iaAttachmentAgobject)
}
}
}
$caContentAge = New-TimeSpan $Item.ReceivedTime $(Get-Date)
if($caContentAge.days -le 183){$ca = 6}
if($caContentAge.days -gt 183 -band $caContentAge.days -le 365){$ca = 12}
if($caContentAge.days -gt 365 -band $caContentAge.days -le 1095){$ca = 36}
if($caContentAge.days -gt 1095 -band $caContentAge.days -le 1825){$ca = 60}
if($caContentAge.days -gt 1825){$ca = 100}
$agkey = $Item.ReceivedTime.ToString("yyyyMMdd") + "-" + $wfWorkingFolder.Name
if ($Datehash.ContainsKey($agkey)){
$Datehash[$agkey].NumberofItems = $Datehash[$agkey].NumberofItems + 1
$Datehash[$agkey].SizeofItems = $Datehash[$agkey].SizeofItems + $Item.Size
$Datehash[$agkey].NumberofAttachments = $Datehash[$agkey].NumberofAttachments + $ItemAttachedNumber
$Datehash[$agkey].AttachmentSize = $Datehash[$agkey].AttachmentSize + $ItemAttachedSize
}
Else{
$daDateAgregationobject = "" | select Date,Folder,ContentAge,NumberofItems,SizeofItems,NumberofAttachments,AttachmentSize
$daDateAgregationobject.Date = $Item.ReceivedTime.ToString("yyyyMMdd")
$daDateAgregationobject.Folder = $wfWorkingFolder.Name
$daDateAgregationobject.ContentAge = $ca
$daDateAgregationobject.NumberofItems = 1
$daDateAgregationobject.SizeofItems = $Item.Size
$daDateAgregationobject.NumberofAttachments = $ItemAttachedNumber
$daDateAgregationobject.AttachmentSize = $ItemAttachedSize
$Datehash.add($agkey,$daDateAgregationobject)
}
}
}

$RDOSession = new-object -com Redemption.RDOsession

$PSTfile = $RDOSession.LogonPSTStore($fnFileName, 1)
$PSTRoot = $RDOSession.GetFolderFromID($PSTfile.IPMRootFolder.EntryID, $PSTfile.EntryID)
Enumfolders($PSTRoot)

$byDateTable = New-Object System.Data.Datatable
$byDateTable.columns.add("Folder")
$byDateTable.columns.add("#Items",[INT64])
$byDateTable.columns.add("Items Size(MB)",[INT64])
$byDateTable.columns.add("#Attachments",[INT64])
$byDateTable.columns.add("Attachments Size(MB)",[INT64])
$byAgeTable = New-Object System.Data.Datatable
$byAgeTable.columns.add("Folder")
$byAgeTable.columns.add("6>#Items",[INT64])
$byAgeTable.columns.add("6>#(MB)",[INT64])
$byAgeTable.columns.add("6>#Atch",[INT64])
$byAgeTable.columns.add("6>#Atch(MB)",[INT64])
$byAgeTable.columns.add("6to12#Items",[INT64])
$byAgeTable.columns.add("6to12#(MB)",[INT64])
$byAgeTable.columns.add("6to12#Atch",[INT64])
$byAgeTable.columns.add("6to12#Atch(MB)",[INT64])
$byAgeTable.columns.add("1to3years#Items",[INT64])
$byAgeTable.columns.add("1to3years#(MB)",[INT64])
$byAgeTable.columns.add("1to3years#Atch",[INT64])
$byAgeTable.columns.add("1to3years#Atch(MB)",[INT64])
$byAgeTable.columns.add("3to5years#Items",[INT64])
$byAgeTable.columns.add("3to5years#(MB)",[INT64])
$byAgeTable.columns.add("3to5years#Atch",[INT64])
$byAgeTable.columns.add("3to5years#Atch(MB)",[INT64])
$byAgeTable.columns.add("5+years#Items",[INT64])
$byAgeTable.columns.add("5+years#(MB)",[INT64])
$byAgeTable.columns.add("5+years#Atch",[INT64])
$byAgeTable.columns.add("5+years#Atch(MB)",[INT64])
$Charthash = @{ }
$cCount1 = 0
$Datehash.Values | group-object {$_.Folder} | Sort-Object @{expression={(($_.Group | Measure-Object SizeofItems -sum).sum/1MB)}} -Descending | foreach-object{
if ((($_.Group | Measure-Object SizeofItems -sum).sum/1MB) -gt 1 -band $cCount1 -le 10){
if ($_.Name.Length -gt 10){$chartname = $_.Name.Substring(0,10)}
else{$chartname = $_.Name}
$Charthash.add($chartname,(($_.Group | Measure-Object SizeofItems -sum).sum/1MB))
}
$cCount1++
[VOID]$byDateTable.rows.add($_.Name,($_.Group | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Measure-Object AttachmentSize -sum).sum/1MB))
}
$Charthash2 = @{ }
$Charthash2.Add("Under 6 Months",0)
$Charthash2.Add("6 to 12 Months",0)
$Charthash2.Add("1 to 3 years",0)
$Charthash2.Add("3 to 5 years",0)
$Charthash2.Add("Over 5 years",0)
$Datehash.Values | group-object {$_.Folder} | Sort-Object @{expression={(($_.Group | Measure-Object SizeofItems -sum).sum/1MB)}} -Descending | foreach-object{
$Charthash2["Under 6 Months"] = $Charthash2["Under 6 Months"] + ($_.Group | Where-Object {$_.ContentAge -eq 6} | Measure-Object SizeofItems -sum).sum/1MB
$Charthash2["6 to 12 Months"] = $Charthash2["6 to 12 Months"] + ($_.Group | Where-Object {$_.ContentAge -eq 12} | Measure-Object SizeofItems -sum).sum/1MB
$Charthash2["1 to 3 years"] = $Charthash2["1 to 3 years"] + ($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object SizeofItems -sum).sum/1MB
$Charthash2["3 to 5 years"] = $Charthash2["3 to 5 years"] + ($_.Group | Where-Object {$_.ContentAge -eq 60} | Measure-Object SizeofItems -sum).sum/1MB
$Charthash2["Over 5 years"] = $Charthash2["Over 5 years"] + ($_.Group | Where-Object {$_.ContentAge -eq 100} | Measure-Object SizeofItems -sum).sum/1MB
[VOID]$byAgeTable.rows.add($_.Name,($_.Group | Where-Object {$_.ContentAge -eq 6} | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 6} | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 6} | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 6} | Measure-Object AttachmentSize -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 12} | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 12} | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 12} | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 12} | Measure-Object AttachmentSize -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 36} | Measure-Object AttachmentSize -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 60} | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 60} | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 60} | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 60} | Measure-Object AttachmentSize -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 100} | Measure-Object NumberofItems -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 100} | Measure-Object SizeofItems -sum).sum/1MB),($_.Group | Where-Object {$_.ContentAge -eq 100} | Measure-Object NumberofAttachments -sum).sum/1,(($_.Group | Where-Object {$_.ContentAge -eq 100} | Measure-Object AttachmentSize -sum).sum/1MB))
}
$XMLreader = New-Object System.Xml.XmlNodeReader $xaml
$XAMLreader = [Windows.Markup.XamlReader]::Load($XMLreader)
$tc = $XAMLreader.FindName("PieChart1")
$tc.DataContext = $Charthash
$tc = $XAMLreader.FindName("PieChart2")
$tc.DataContext = ($AttachmentTypehash.GetEnumerator() | Sort-Object Value.SizeofAttachments | select-object -First 10)
$tc = $XAMLreader.FindName("PieChart3")
$tc.DataContext = $Charthash2
$tc = $XAMLreader.FindName("PieChart4")
$tc.DataContext = ($ItemTypehash.GetEnumerator() | Sort-Object Value.SizeofItems | select-object -First 10)
$tc = $XAMLreader.FindName("PieChart5")
$tc.DataContext = ($ItemTypehash.GetEnumerator() | Sort-Object Value.SizeofItems | select-object -First 10)
$datagrid = $XAMLreader.FindName("dataGrid1")
$datagrid.ItemsSource = $byDateTable.defaultview
$datagrid2 = $XAMLreader.FindName("dataGrid2")
$datagrid2.ItemsSource = $byAgeTable.defaultview

$byItemTable = New-Object System.Data.Datatable
$byItemTable.columns.add("ItemType")
$byItemTable.columns.add("#Items",[INT64])
$byItemTable.columns.add("Items Size(MB)",[INT64])
$ItemTypehash.GetEnumerator() | foreach-object {
[VOID]$byItemTable.rows.Add($_.key.ToString(),$_.value.NumberofItems,$_.value.SizeofItems)
$_.key.ToString()
}

$datagrid3 = $XAMLreader.FindName("dataGrid3")
$datagrid3.ItemsSource = $byItemTable.defaultview
$XAMLreader.ShowDialog()

Popular posts from this blog

Export calendar Items to a CSV file using EWS and Powershell

Somebody asked about this last week and while I have a lot of EWS scripts that do access the Calendar I didn't have a simple example that just exported a list of the Calendar events with relevant information to a CSV file so here it is.

I've talked on this one before in this howto  but when you query the calendar folder using EWS you need to use a CalendarView which will expand any recurring appointments in a calendar. There are some limits when you use a calendarview in that you can only return a maximum of 2 years of appointments at a time and paging will limit the max number of items to 1000 per call. So if you have a calendar with a very large number of appointments you need to break your query into small date time blocks. In this example script I'm just grabbing the next 7 days of appointments if you want to query a longer period you need to adjust the following lines (keeping in mind what I just mentioned)

#Define Date to Query
$StartDate = (Get-Date)
$EndDate = (Ge…

EWS Managed API and Powershell How-To series Part 1

I thought I'd start the year with a series of posts that goes back over the basics of using the EWS Managed API from Powershell and provides a modular remarked example that you can easily cut and paste to build your own scripts. Along the way in this series I'll show a whole bunch of examples around specific things.

As a starting point for versions this will be Powershell Version 2.0  and the EWS Managed API 1.1 (which will soon change to 1.2 once released) http://www.microsoft.com/download/en/details.aspx?id=13480.

The starting point for any EWS script your going to write is connecting to Exchange for which there are three important pieces of information you will need. Firstly you need to know the version of Exchange your running in this script its going to be held in the following variable

$ExchangeVersion = [Microsoft.Exchange.WebServices.Data.ExchangeVersion]::Exchange2010_SP1

Other valid values for Exchange 2007 would be

$ExchangeVersion = [Microsoft.Exchange.WebServices.…

Writing a simple scripted process to download attachmentts in Exchange 2007/ 2010 using the EWS Managed API

Every complicated thing in life is made up of smaller simpler building blocks, when it comes to writing a script (or any code really) the more of these little building blocks you have to figure out the more the process of solving a problem can become bewildering. The Internet generally provides you with lots of half eaten sandwiches of information something someone else has taken a bite out but a lot of the time half done, and as with any code its usefulness declines over time as new and better API's and methods are derived. In this post I'm going to go through a simple scripted process that hopefully covers a few more of these smaller building blocks that you might face when asked to come up with a simple costless solution to perform an automated business function with a script.

So the process im going to look at is one that comes up a lot and that is you have an Email that comes into to certain mailbox every day with a certain subject in my case "Daily Export" this …
All sample scripts and source code is provided by for illustrative purposes only. All examples are untested in different environments and therefore, I cannot guarantee or imply reliability, serviceability, or function of these programs.

All code contained herein is provided to you "AS IS" without any warranties of any kind. The implied warranties of non-infringement, merchantability and fitness for a particular purpose are expressly disclaimed.