Wednesday, May 23, 2018

GDPR and your Exchange / Office365 Contacts from a data perspective

GDPR, which stands for General Data Protection Regulation, is one of those things that surfaces in the IT world (Y2K would be another) that seems like a godsend to lawyers and anybody doing project management: endless paperwork, meetings, manual writing and training that seem to achieve little but cost a ton. One of the problems with GDPR is the very broad brush it paints with about what is considered private data and what constitutes compliance. What I'm going to cover in this post is what you can do if you're on the receiving end of a GDPR data request around Contacts that are stored in an Exchange Mailbox, and how scripting can help you out. I'm not going to talk about any of the legalities around GDPR because it's a bit of a minefield, but look more at it from the data perspective.

Mailbox Contacts

From a strictly data view, the majority of the properties that make up an Outlook Contact record are private data. Eg a person's Name, PhoneNumber, Email Address, Address information etc is the data that essentially makes a contact functional. The volume of personal information people store in contacts is a very personal thing to the Mailbox owner. Eg some people may store the birthday information of a client, spouse or children, which Outlook caters for to provide a more personal touch, while for others just the EmailAddress and Name is enough. Contacts folders can then be shared within or outside an organization (policy dependent), which is where you can really start getting into trouble, and where IT professionals should have a part to play in ensuring that when things are shared it is done with security and privacy in mind. Data Stewardship is probably a new buzzword you might be starting to hear more often.

Should GDPR spell the end of using the default standard permission on Contacts folders?

I'm posing this as a question more than a statement, but my thoughts are: probably. The default standard permission on a Mailbox Folder basically allows you to specify that everyone (has to be an authenticated user with a Mailbox) can access a particular folder in your Mailbox.

By default on an Exchange folder this is set to None (so privacy by design, which is good), but if a user (or an administrator using a PowerShell script) has set this to anything other than None on Contacts folders, then they are basically sharing any private data they have stored in their own mailbox's contacts folder but can't account for who has access to it and in turn how it may be used. Eg a new intern in the company decides to email everyone in the contacts folder to complete a task they didn't understand. Data Stewardship can mean a lot of different things, but as a basic one you should know who has access to private data, be able to explain why people have that access, and understand the privacy implications of accessing the underlying data. Eg you're switching away from doing things purely for convenience.

Reporting on the state of the private data being stored

Reporting on private data can pose some special logical problems, in that the report you're producing shouldn't expose any of the underlying data you're reporting on. One approach is a simple yes/no on which data is being stored; another is to score each data point and report on that. I've written a script that does both and added it to my contacts Module, which is available on the PowerShell Gallery and GitHub. Basically, this script checks a number of the default contact properties (not all of them), counts each one that has data, and records which data types are available so you can then produce a report from that.

This script works in turn with other cmdlets in the Module, eg to produce this report on all contacts in the default Contacts Folder of a Mailbox

$Contacts = Get-EXCContacts -MailboxName -Folder \contacts
$Contacts | ForEach-Object{
    Get-EXCPrivacyReport -Contact $_
}
You could also do it as a one-liner if you want to limit the report, for example to those contacts with Notes, eg

 Get-EXCContacts -MailboxName -Folder \contacts  | ForEach-Object{ Get-EXCPrivacyReport -Contact $_ } | where-object {$_.HasNotes -eq $true}
I'm only reporting on certain contact properties, so if you have a request to do this type of reporting I would suggest taking a look at the code, which is available on GitHub, and customising it to suit whatever your reporting needs are (or hire me and I'll do it, as I could use the work at the moment).
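To give an idea of the yes/no and scoring approach without reading the module source, here is a minimal sketch. It is not the code behind Get-EXCPrivacyReport: the function name Get-SamplePrivacyScore, the property list and the sample contact are all made up for illustration, and a real EWS contact object exposes its data differently.

```powershell
# Hypothetical sketch: report which fields hold data without exposing the values.
function Get-SamplePrivacyScore {
    param(
        [Parameter(Mandatory = $true)] $Contact,
        # Assumed property names, chosen for this example only
        [string[]] $Properties = @('DisplayName', 'EmailAddress1', 'MobilePhone', 'Birthday', 'Notes')
    )
    $report = [ordered]@{}
    $score = 0
    foreach ($name in $Properties) {
        $prop = $Contact.PSObject.Properties[$name]
        # A field counts as populated when it exists and is not empty or whitespace
        $hasData = ($null -ne $prop) -and -not [string]::IsNullOrWhiteSpace([string]$prop.Value)
        $report["Has$name"] = $hasData
        if ($hasData) { $score++ }
    }
    $report['Score'] = $score
    [PSCustomObject]$report
}

# Made-up contact used only to exercise the function
$sample = [PSCustomObject]@{
    DisplayName   = 'Example Person'
    EmailAddress1 = 'person@example.com'
    MobilePhone   = ''
    Notes         = 'Met at a conference'
}
$privacy = Get-SamplePrivacyScore -Contact $sample
$privacy | Format-List
```

The output only ever contains booleans and a count, so the report itself can be passed around without leaking the contact details it describes.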

Searching for Contacts

Article 17 of GDPR covers the right to erasure (right to be forgotten). While it's probably unlikely that this should ever affect Outlook Contacts, it may be that one day you find yourself being asked to show that you could comply with such a request.

Generally this means just going in and deleting one contact in the Outlook Contacts folder, but if you're asked to do this across a number of Mailboxes, or the Mailbox has a number of Contacts folders where the contact may be located (or maybe the user has copied the contacts), this is where some type of automated search can be useful. Usually the first port of call for searching would be either the Search-Mailbox cmdlet or the eDiscovery/Compliance tools in the Office365 portal or OnPrem server. These currently don't seem to offer the ability to do a simple search via Email Address or DisplayName for an Outlook contact, so I've written a script that can do this and added it to my EWS Contacts Module.

This script first gets all the Contacts Folders under the visible Mailbox Root and then does a KQL (or AQS if you're still on 2010) search of each of these Contacts Folders. With this script I do a parameter-less keyword search of the Contacts which, given that the email addresses and display name should be indexed, should then return the contacts as needed. I then do some validation on the client side to remove any false positives, by checking each of the 3 EmailAddress properties if you searched via email address, or the many displayName properties that are available. If you are searching by displayName you do need to be careful of the format and any punctuation that may have been used, as that could affect the search results. (If you were requested to search via telephone number that's possible, but each phone number is a different property, so it would require extra code to support that.)
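The client-side validation step can be sketched roughly as follows. This is an illustration rather than the module's code: Find-MatchedEmailProperty is a hypothetical helper, and the contacts are plain objects with flat EmailAddress1 to EmailAddress3 properties, which is a simplification of how the EWS Managed API exposes them.

```powershell
# Hypothetical helper: confirm which email property (if any) really matches.
function Find-MatchedEmailProperty {
    param(
        [Parameter(Mandatory = $true)] $Contact,
        [Parameter(Mandatory = $true)] [string] $EmailAddress
    )
    foreach ($name in 'EmailAddress1', 'EmailAddress2', 'EmailAddress3') {
        $prop = $Contact.PSObject.Properties[$name]
        # -eq on strings is case-insensitive in PowerShell, which suits email addresses
        if ($null -ne $prop -and ([string]$prop.Value).Trim() -eq $EmailAddress.Trim()) {
            return $name
        }
    }
    return $null   # the keyword hit was a false positive
}

# Made-up search hits: the second only mentions the address in its notes
$hits = @(
    [PSCustomObject]@{ DisplayName = 'A'; EmailAddress3 = 'Target@Example.com' },
    [PSCustomObject]@{ DisplayName = 'B'; EmailAddress1 = 'other@example.com'; Notes = 'cc target@example.com' }
)
$confirmed = foreach ($hit in $hits) {
    $matchedName = Find-MatchedEmailProperty -Contact $hit -EmailAddress 'target@example.com'
    if ($matchedName) {
        $hit | Add-Member -NotePropertyName MatchedProperty -NotePropertyValue $matchedName -PassThru
    }
}
```

Recording the matched property name on the result makes it easy to see afterwards why a given contact was returned.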

The following is an example of using the Search cmdlet to search via email

Search-EXCAllContactFolders -MailboxName -EmailAddress  -Credentials $cred
or to search via displayName

Search-EXCAllContactFolders -MailboxName -DisplayName "user im lookingfor"  -Credentials $cred
This script returns the typed EWS Managed API objects, which is useful if you want to do further manipulation (Copy, Move). If you just want to delete the object, call the Delete method with the DeleteMode enumeration value for the type of delete you want, eg SoftDelete for a soft delete.


The other thing the script returns is which property it matched on. For instance, if you are searching for an EmailAddress and the target contact has this email address as its EmailAddress3 property, the matched property in the search results will show you this.

Note that I haven't filtered out any of the hidden Contact folders in the Mailbox, so it will search through some of the System Folders like the Recipient Cache etc.

One last thing to remember here is that if you're just talking about the email address of a person, it's going to be stored in other places in the Mailbox, eg the AutoComplete Cache and any email correspondence. So completely forgetting an email address across all datasets in Exchange is extremely impractical (eg consider that it's also held in Tracking logs, Antispam gateways etc).

Contact Groups

Contact Groups are personal distribution lists that are stored in a Mailbox's Contact Folders. From a search perspective, if you were looking to find out whether an Email Address was a member of any Contact Groups in a Mailbox, it's not an easy task to fulfil (because there is no mechanism to search Contact Group members). So for this I've written a script that will enumerate all the Contact Groups across all Contacts Folders, then enumerate all the members in these groups, and then check each of the EmailAddress or DisplayName (email display name) properties to see if a user is a member. The script will then return the Group, if found, with a new property called MatchedMember holding the member that was found. This makes it easier if you then want to remove the member, as you can call the Members.Remove method using this property. Eg to find and remove a user from a Contact Group using this cmdlet

$Groups = Search-EXCAllContactGroups -MailboxName -Credentials $cred -EmailAddress
Foreach($Group in $Groups){
    write-host ("Removing : " + $Group.MatchedMember.AddressInformation.Address + " From " + $Group.DisplayName)
    $Group.Members.Remove($Group.MatchedMember)
    $Group.Update([Microsoft.Exchange.WebServices.Data.ConflictResolutionMode]::AlwaysOverwrite)
}
All the source code for the scripts I've talked about in this post is available from GitHub here, or you can get the Module from the PowerShell Gallery here

Thursday, May 17, 2018

Parsing and reporting on hyperlinks in email using EWS and REST (eg looking for baseStriker) in Exchange and Office365

It's been quite a busy week in Email security, with 2 new vulnerabilities released in the past 7 days: first BaseStriker and now EFail. While it's still too early to gauge the implications of both of these flaws, what they have in common is that they use the HTML body of a message and the underlying HTML markup tags to make the exploits work. With baseStriker it's the use of the Base Href tag in a HTML document, and with EFail it's using an Img Src tag to send decrypted email contents to an external server (this is an oversimplification).

In this post I'm going to look at how you can parse the HTML links and Image SRC tags from messages that are sitting in a Mailbox (so post any Transport pipeline filtering) and provide a level of reporting on these. Basically, because we are going to be using the Mailbox APIs for this, we are looking directly at what's available to any Email Client in terms of Links and Images.

The Challenge

The challenge with this type of problem is that, by its very nature, the payload you're looking for will vary, so any form of formal search for a static URL will fail: phishers and spammers have developed ways of getting around scanning methods that just look statically for values (basically ruling out Search-Mailbox). So one way to attack this is to get the Body Content of messages one by one (which is an expensive thing to do in terms of time and resources) and do the scanning at the client end.

Different types of Message Bodies

The format for Message bodies can vary depending on the Mail Agent (eg the email client) that is sending the Message. For example, in Exchange you could have a native body type of RTF, HTML or Text (or it could be multipart). If, for example, you are using Outlook and have chosen RTF as the body type when sending a Message to another user locally on the same Exchange server, then only the native RTF body will be stored for the Message, and the Exchange Store will do an on-the-fly conversion of the RTF body to HTML when the first client requests the HTML body. The Best body algorithm describes this problem in more detail. With my scripts I've chosen to use the PidBodyHTML Extended Property for the HTML body because I found this gave me the most raw version of the BodyHTML, which was important to getting the most accurate link report.


You would think that parsing HTML would be a pretty basic and easy thing to do in any API, and it is, up to a point. Eg a lot of people point towards using this method in PowerShell to parse HTML

$HTMDoc = New-Object -com "HTMLFILE"

While this works okay and produces a nice result with all the Links and Images in a collection, it is also essentially rendering the HTML, so it will execute any JavaScript in the HTML (which shouldn't be there for Email) and it also downloads all the images in the src links. On suspect content this isn't what you really want to be doing, and even on marketing-type emails it's a problem because images in emails are often used to perform beaconing. So if you're looking to do something similar to this yourself, be very careful of using any objects that parse to a DOM (especially those that reuse browser objects like the above example), as there might be unintended consequences if you don't fully understand how the object you're using is parsing the content. With my script I'm relying firstly on a very simple RegEx to get all the HTML tags, then some filtering code to pull out the attributes for href links, base and src links, and then some further code to expand any base URL links. While this isn't perfect and does fail in some instances, it's at least safe as it won't activate any content, and generally you can just tweak the code to work around any failures.
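As a rough illustration of the regex-only approach (this is not the regex or filtering code from my scripts, and Get-SampleTagUrls and the sample HTML are made up for the example), the idea is to treat the body purely as text and never hand it to anything that renders it:

```powershell
# Sketch only: pull href/src attributes out of tags with regex, never render the HTML.
function Get-SampleTagUrls {
    param([Parameter(Mandatory = $true)] [string] $Html)

    $result = [PSCustomObject]@{
        Links  = New-Object 'System.Collections.Generic.List[string]'
        Images = New-Object 'System.Collections.Generic.List[string]'
        Base   = $null
    }
    # Step 1: a very simple pattern that grabs every opening tag and its name
    foreach ($tag in [regex]::Matches($Html, '<\s*([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>')) {
        $tagName = $tag.Groups[1].Value.ToLowerInvariant()
        # Step 2: look for a quoted href or src attribute inside that tag
        $attr = [regex]::Match($tag.Value, '\b(href|src)\s*=\s*(["''])(.*?)\2', 'IgnoreCase')
        if (-not $attr.Success) { continue }
        $value = $attr.Groups[3].Value
        switch ($tagName) {
            'a'    { $result.Links.Add($value) }
            'img'  { $result.Images.Add($value) }
            'base' { $result.Base = $value }
        }
    }
    $result
}

# Made-up HTML body used only to exercise the function
$html = '<html><head><base href="https://example.com/path/"></head><body><a href="page.html">x</a><IMG SRC=''https://img.example.net/p.gif''></body></html>'
$parsed = Get-SampleTagUrls -Html $html
```

A pattern this simple will miss unquoted attributes and other edge cases, which is the kind of failure you would tweak the code for, but nothing in it fetches an image or runs a script.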


I've created an EWS version and a Graph/REST version of this code, which should be usable both OnPrem and in Office365. The EWS version can be found in GitHub here; the Graph version is in my Exch-REST module, which is available from the PowerShell Gallery and GitHub (version 3.8).

The Code

The code I've written is separated into two functions. The first function (EWS version first, then the Exch-REST version) is

 Get-EWSBodyLinks -MailboxName -FolderPath \Inbox -MessageCount 500

 Get-EXREmailBodyLinks -MailboxName -FolderPath \Inbox -MessageCount 500
The inputs are relatively simple: it takes the FolderPath and the MessageCount for the number of messages you want scanned. The function then parses the Message Body and builds 3 dictionary objects with the Links, Images and Basehref details of the underlying HTML body of the messages that are scanned. This is then added back as the ParsedLinks property on the EWS Managed API or custom REST object, so it's available for further pipeline or script processing in PowerShell.

These properties are collections of URI objects, so you can do further things like

$Messages[0].ParsedLinks.Links | select absoluteuri

to just show the absolute URI on a message, or if you were just interested in links from a particular URL shortener you could use

$Messages[0].ParsedLinks.Links | where-object dnsSafehost -eq ""

And a whole number of other things

BaseStriker Reporting

In the instance where you want to see which emails are using the base href tag (which may or may not be related to baseStriker), you can use the following

$BaseHrefMessages = Get-EWSBodyLinks -MailboxName -FolderPath \Inbox -MessageCount 10000 | where-object {$_.ParsedLinks.HasBaseURL -eq $true} 

$BaseHrefMessages =  Get-EXREmailBodyLinks -MailboxName -FolderPath \Inbox -MessageCount 500 | where-object {$_.ParsedLinks.HasBaseURL -eq $true}  
These examples will return a collection of Messages that are using the BaseURL which you can then have a look at further. For example if you had a Mail that was matching Avanan's sample for BaseStriker the ParsedLinks property on a returned message would look like

In the parsing code I expand out the relative URLs that are used when there is a BaseURL in the document.
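One way to do that kind of expansion is with the System.Uri class, sketched below. Expand-SampleLink is a hypothetical helper written for this example, not the function from the module, and the URLs are placeholders.

```powershell
# Sketch: resolve relative links against a base href using System.Uri
function Expand-SampleLink {
    param(
        [string] $BaseHref,
        [Parameter(Mandatory = $true)] [string] $Link
    )
    $uri = $null
    # Absolute links are returned unchanged. The file-scheme check guards against
    # platforms where a leading "/" is parsed as an absolute file path.
    if ([Uri]::TryCreate($Link, [System.UriKind]::Absolute, [ref]$uri) -and $uri.Scheme -ne 'file') {
        return $uri
    }
    if ([string]::IsNullOrEmpty($BaseHref)) { return $null }
    $baseUri = $null
    if (-not [Uri]::TryCreate($BaseHref, [System.UriKind]::Absolute, [ref]$baseUri)) { return $null }
    # The Uri(Uri, string) constructor combines the base with the relative part
    return New-Object -TypeName System.Uri -ArgumentList $baseUri, $Link
}

$expanded = Expand-SampleLink -BaseHref 'https://example.com/path/' -Link 'login.html'
$untouched = Expand-SampleLink -BaseHref 'https://example.com/path/' -Link 'https://other.example.org/x'
```

Reporting on the expanded URL matters because a scanner that only looks at the href value on its own sees a harmless-looking fragment rather than the full destination.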

In most of the scanning that I did on my email there were a few companies that used the BaseURL legitimately; for instance, it seems to be used in OneDrive in the invitation message that gets sent out when you share an item.


The second cmdlet I've written takes the data from the above functions and then performs a consolidation report on the domains in the href links, the domains in the Img src links, and the href and img src URLs themselves. For each of these reporting areas it counts the number of times the link appears and the number of messages that the link or domain appears in. To run the reports

$Report = Get-LinkReport -MailboxName -FolderPath \Inbox -MessageCount 100

$Report = Get-EXREmailLinkReport -MailboxName -FolderPath \Inbox -MessageCount 100
In these examples you will end up with a $Report variable that contains collections that you could export to CSV or manipulate further, eg

$Report = Get-EXREmailLinkReport -MailboxName -FolderPath \Inbox -MessageCount 100
$report.Domains | Sort-Object MessageCount -Descending
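The consolidation itself is a group-and-count exercise. The following is a minimal sketch of the domain part of such a report, using made-up message ids and URLs rather than real cmdlet output; the LinkCount column name is my own choice for the example.

```powershell
# Sketch: count link occurrences and distinct messages per domain.
# $scanned stands in for parsed messages; ids and URLs are made up.
$scanned = @(
    [PSCustomObject]@{ Id = 'msg1'; Links = @('https://example.com/a', 'https://example.com/b', 'https://news.example.org/x') },
    [PSCustomObject]@{ Id = 'msg2'; Links = @('https://example.com/a') }
)

# Flatten to one row per (message, link) pair
$rows = foreach ($msg in $scanned) {
    foreach ($link in $msg.Links) {
        [PSCustomObject]@{ MessageId = $msg.Id; Domain = ([Uri]$link).DnsSafeHost }
    }
}

# Group by domain, counting total links and the distinct messages they came from
$domainReport = $rows | Group-Object Domain | ForEach-Object {
    [PSCustomObject]@{
        Domain       = $_.Name
        LinkCount    = $_.Count
        MessageCount = @($_.Group | Select-Object -ExpandProperty MessageId -Unique).Count
    }
} | Sort-Object MessageCount -Descending
```

Keeping the link count and the message count separate helps distinguish a single newsletter with dozens of links to one domain from a domain that turns up across many different messages.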


There are a lot of Links and Images used within email, so this type of parsing will produce a lot of data that you need to filter or process further. Eg if you start to find links that you think might be suspect, you may want to look at using a service like VirusTotal, which has the ability to scan suspect links and return the results using an API. They also provide a paid-for private API if you're going to do this in high volume. The other thing is that downloading the body of each email is a pretty costly process, so watch out for throttling if you're doing this on a large scale.

Wednesday, May 09, 2018

Junk Email reporting with PowerShell in Office365 Part 2

This is part 2 of my Junk Email reporting series of posts; Part 1 covered using Message Tracking. In this post I am going to look at using the Mailbox APIs, EWS and REST, to actually read the contents of the Junk Email folder in Exchange, and from there report on the various aspects of the Antispam information that is available in the Message Headers. Firstly, if you are just looking for something to do single message analysis, then I would check out Stephen Griffin's Message Header Analyser Addin for Outlook; this is a brilliant little tool for that. In this post I will focus on doing it in bulk using PowerShell and building some reports to allow you to see what's happening.

Mailbox Access

A big consideration if you're going to be accessing Mailbox data is security. One of the benefits of using the REST API over EWS is that you can be very granular about the access that you give the App. At a very minimum you need the "Mail.Read.Shared" OAuth grant, which gives you API access into a Mailbox, and then the underlying rights on the JunkEmail folder for the account you're authenticated as. EWS will still require conventional access rights to work.

Accessing the Data

Once you have your authentication sorted out, you just need to access the message headers of the messages that you want to report on. Because of the size of this information, in EWS you need to make a GetItem request for each item, while REST can return this property as part of the request that lists the items.

What Headers to look for and what you're looking at

This can be a little bit of a moveable feast, but currently if you want to look at the Authentication Results, which include the SPF, DKIM and DMARC results, then look at the Authentication-Results header.

The compauth part of this header is the Composite authentication result, which Microsoft documents in its anti-spam message headers documentation. If a message has been forwarded through to you as part of a mailing list you're a member of, another header you may see is Authentication-Results-Original (or X-Original-Authentication-Results). How this header is implemented by Office365 is not entirely clear, but I find it a useful header to look at when trying to work out why something is junked when the source domain looks okay.
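To show what extracting those results can look like, here is a small sketch. The header value is an illustrative, made-up example in the general shape Office365 stamps, and the regex is a simplified stand-in for the one in my scripts.

```powershell
# Illustrative header value (not taken from a real message)
$authResults = 'spf=fail (sender IP is 192.0.2.10) smtp.mailfrom=example.com; dkim=none (message not signed) header.d=none; dmarc=fail action=quarantine header.from=example.com; compauth=fail reason=000'

$parsedAuth = [ordered]@{}
foreach ($name in 'spf', 'dkim', 'dmarc', 'compauth') {
    # Capture the word that follows "name=", e.g. spf=fail gives fail
    $m = [regex]::Match($authResults, "\b$name=(\w+)", 'IgnoreCase')
    $parsedAuth[$name.ToUpper()] = if ($m.Success) { $m.Groups[1].Value } else { $null }
}
$auth = [PSCustomObject]$parsedAuth
```

Each result becomes its own property on $auth, which is what makes the later filtering and reporting straightforward.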


The X-Microsoft-Antispam header should contain the PCL (Phishing Confidence Level) and the BCL (Bulk Complaint Level), which is the bulk mailing complaint level documented here


The X-Forefront-Antispam-Report header, documented here, contains the Country of origin value (CTRY), SFV (Spam Filter Verdict), SRV, IPV and PTR

SCL is the Spam confidence level which is where we all started back in 2003.


X-CustomSpam is the ASF (Advanced Spam Filtering) header for those using this feature

This isn't an exhaustive list, just those that I've written code to process.
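Several of these values arrive as a semicolon-separated list of KEY:value pairs, as in the X-Forefront-Antispam-Report header, so splitting them out is simple. The sketch below uses an invented header value and is not the parsing code from my scripts.

```powershell
# Illustrative header value only; real headers carry more fields
$reportHeader = 'CIP:192.0.2.10;CTRY:US;LANG:en;SCL:5;SRV:;IPV:NLI;SFV:SPM;PTR:mail.example.com;CAT:SPM;'

$fields = @{}
foreach ($pair in ($reportHeader -split ';')) {
    # Split on the first colon only so values containing ":" stay intact
    $key, $value = $pair -split ':', 2
    if (-not [string]::IsNullOrWhiteSpace($key)) {
        $fields[$key.Trim()] = "$value".Trim()
    }
}
```

After this, $fields['SCL'], $fields['CTRY'] and so on can be promoted to properties on the message object.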

Putting this all to work

All this information is only as good as how you can use it, and because Email Authentication is a hot topic at the moment, let's see how we can put some script to work to help us look at this data. For EWS and REST I've created a ProcessAntiSPAMHeaders script that first indexes the message headers and then uses some regex to extract the relevant property data from the headers I've talked about above. For my examples I do it in two stages. First you need to get the ItemCollection that will be the basis for any reporting or investigation, which involves enumerating the Items in the Mailbox and then processing those headers. Then you can use this ItemCollection in different ways instead of re-enumerating messages each time you want to look at a different view of the data.


For EWS I've created a simple script that lets you specify a FolderPath, MailboxName and how many emails you want to look at. It then returns a collection of those items with all the properties from the headers parsed out and promoted as first-class properties. This script is located here. To use it to generate the Message collection, use the following to look at the last 50 emails in the Junk Email folder (note you may need to adjust the spelling of the Junk Email folder)

$Messages = Get-EWSAntiSpamReport -MailboxName -Credentials $creds -FolderPath "\Junk E-mail" -MaxCount 50


For the Graph API I've included all the processing code in my Exch-REST library, which is available from the PowerShell Gallery and GitHub. To use this we can use either Get-EXRWellKnownFolderItems for well-known folders such as the Inbox and JunkEmail folder, or Get-EXRFolderItems for all other folders. Eg to get the top 50 messages from the JunkEmail folder and process those

$Messages = Get-EXRWellKnownFolderItems -Mailbox -WellKnownFolder JunkEmail -Top 50 -TopOnly:$true -ReturnInternetMessageHeaders -ProcessAntiSPAMHeaders

What you can do with this Messages Collection

Browse Around

If you're in troubleshooting mode, one of the best things to do is just browse around the data in PowerShell to see what's happening, eg showing the Sender, SPF, DKIM and DMARC to the shell for the last 50 emails

Or if you want to look at the SCL,PCL,BCL,SFV and CTRY values for the last 50 emails

Or if you just wanted to look at Messages that have failed DMARC

Or came from a specific Country 

Or if you wanted to compare the DMARC result to the Original-DMARC results for mailing lists

And any other combination of property values you want to look at to work out more what's happening to email that is ending up in your Junk Email folder. This type of drill down analysis should be useful in building your knowledge of these Antispam markers or to help spot any new trends that you might not be aware of etc or just to look flashy in a technical meeting (and you can do it all for free!! just a little PowerShell knowledge is required).
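As a rough idea of what this browsing looks like, the sketch below uses a made-up stand-in for the $Messages collection. The property names follow the ones used in the CSV export in this post, but the values, and the exact casing of results like "fail", are assumptions for the example.

```powershell
# Made-up stand-in for the $Messages collection; values are invented
$sampleMessages = @(
    [PSCustomObject]@{ SenderEmailAddress = 'a@example.com'; SPF = 'pass'; DKIM = 'pass'; DMARC = 'pass'; SCL = 5; CTRY = 'US' },
    [PSCustomObject]@{ SenderEmailAddress = 'b@example.org'; SPF = 'fail'; DKIM = 'none'; DMARC = 'fail'; SCL = 9; CTRY = 'BR' }
)

# Sender and authentication results at a glance
$sampleMessages | Select-Object SenderEmailAddress, SPF, DKIM, DMARC | Format-Table

# Only messages that failed DMARC (-eq is case-insensitive for strings)
$dmarcFailed = @($sampleMessages | Where-Object { $_.DMARC -eq 'fail' })

# Only messages from a specific country
$fromUS = @($sampleMessages | Where-Object { $_.CTRY -eq 'US' })
```

Because the collection is already in memory, each of these views is just another Select-Object or Where-Object over the same data rather than another trip to the Mailbox.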


I wrote a separate post on digesting emails, as this is another lengthy but useful thing you can do with script. For JunkEmail, here is a sample that digests the Authentication Results of the last 10 messages in the JunkEmail folder

To produce this report in REST using Exch-REST:

$Messages = Get-EXRWellKnownFolderItems -Mailbox -WellKnownFolder JunkEmail -Top 10 -TopOnly:$true -ReturnInternetMessageHeaders -ProcessAntiSPAMHeaders -SelectProperties "ReceivedDateTime,Sender,Subject,IsRead,inferenceClassification,parentFolderId,hasAttachments,webLink,BodyPreview"
Send-EXRMessage -MailboxName -To -Body (Get-EXRDigestEmailBody -MessageList $Messages -Detail -InfoField1Name SPF -InfoField2Name DKIM -InfoField3Name DMARC -InfoField4Name CompAuth -InfoField5Name SCL) -Subject "Junk Mail Auth digest"

$Messages = Get-EXRWellKnownFolderItems -Mailbox -WellKnownFolder JunkEmail -Top 50 -TopOnly:$true -ReturnInternetMessageHeaders -ProcessAntiSPAMHeaders
Send-EWSMessage -MailboxName -To -Body (Get-EWSDigestEmailBody -MessageList $Messages -Detail -InfoField1Name SPF -InfoField2Name DKIM -InfoField3Name DMARC -InfoField4Name CompAuth -InfoField5Name SCL) -Subject "Junk Mail Auth digest" -Credentials $creds

Complete AntiSpam property Report

One last sample for this post is to take all the properties that we are extracting and create a spreadsheet that lets you view them at a glance.

You can create one big CSV file that you can open up in Excel by taking the $Messages Collection we generated above and selecting all the related AS properties and using export-csv eg

 $Messages | Select-Object SenderEmailAddress,Subject,SCL,PCL,BCL,SFV,SRV,IPV,CIP,PTR,ASF,CTRY,SPF,DKIM,DMARC,COMPAuth,Original-SPF,Orignal-DMARC,Original-DKIM | Export-Csv -NoTypeInformation -Path c:\temp\AsReport.csv