Sunday, 22 April 2018

Thoughts on PowerBI anonymous sharing link

So today we will explore in a very brief post how Power Bi generates the anonymous sharing link, It all started when I created an anonymous sharing link which looked like the following:{some key}

so the part I was curious about is the generated key which looks like a base64 encoded string, I decoded the value and I get this JSON string representation

Just from this JSON string representation, I can somehow understand two values which are the GUID values, k is referring to "key" which is the sharing key of the report which I discovered to be unique per report. The most obvious value was the "t" value which refers to your tenant. to be even more precise it's the azure active directory "Directory ID" value. you can always double check by logging into your azure portal and check your azure Active Directory ID in the properties page

the "c" value doesn't really resonate with anything I thought it's somehow refers to the report Id. although one might argue that the key GUID is enough as it will uniquely identify the report. However, I tried to change it encode the object and open it in the browser. Interestingly enough I had my report opened. Firstly, I thought the browser might cache a certain value so It tried a completely different browser and it worked as well!

What I did next was even more interesting so I removed the whole "c" property and guess what it also worked, so I have no clue what this "c" value refers to!

one interesting thought will be whether the sharing key is enough for security reasons, and given that it's the GUID is globally unique as it uses device Id and timestamp in addition to some other things , (algorithm, uniquifier) for more information regarding GUID structure you can refer to this blog post

That's bring us to the end of this brief post.

Tuesday, 17 April 2018

Auto classify Documents in SharePoint using Azure Machine learning Studio: Part 2

This is part 2 of a two parts blog series which explains briefly how to use azure machine learning to auto classify SharePoint documents. In part one, we covered the end to end solution skeleton, which relies on using Microsoft flow. The flow is set to be triggered whenever a new document is uploaded to our target SharePoint library.

The main challenge we faced is how to extract text representation from the Microsoft office file .docx and as explained in the previous blog post. I end up using the .NET version of Open Source Tika to extract the text. In the previous blog post we referred to the Azure Machine learning as web service we call using flow HTTP invoke action. Today we will explore this black box in more details:
  1. The Data 

    This is by far the most important and complicated step of the whole process as it's too specific to the problem you're trying to solve. Also the availability of data that is sufficient to train your model with high probability of correctness is something you will spend most of the time trying to figure out.
    In my example as it's merely a POC, I did choose an easy path. I used an existing dataset available to everyone has access to Azure ML studio (BBCNews Dataset) and then I tailored the SharePoint content to match the data set.
  2. The Training Experiment

    Create your first Azure ML training experiment is a relatively easy task compared to data preparation phase.
    First step is to navigate to and login using your Microsoft Account, you don't need to have an active Microsoft Azure subscription or credit card. LUIS and Bot framework were also free but now you need to have Azure subscription to use these two other services, don't know whether this will change in the future. However, till the time of writing this blog post it's still free.
    From the left menu choose experiments and choose new , choose blank experiment as we will build this together from scratch

    Let's name our experiment (our shiny new experiment) and we will use BBC news data set, you can substitute this with your own prepared set which will have the text representation of Office document with the current category

    Then we will do some data cleansing, select News and category from the data set and clean up empty rows by removing any rows that does have any missing columns by setting minimum missing values range 0 to 1 ( that means if only single column has a missing value the action "cleaning mode" will be triggered , we will choose to remove the entire row

    Now it's getting more exciting, we will use some text analysis technique called Extract N-Gram features, this will use the News column (text representation of the document) as an input and based on the repetition of a single word or tuple of words it can analyse the tuple effect in the categorization.
    First thing we will change the default column selection from any string column to only analyse the News column which represents the text extraction from Office document.

    Secondly,  we will choose to create a vocabulary mode, the result vocabulary can be used later on in the predictive experiment. the next option is the N-Gram size which dictate to what extend you want the tuple to grow. for example if you keep the default 1 it will only consider a single word. However, if you choose 2 it will consider single word and any couple of consecutive words. In our example we will use three that means the text analysis will consider a single word, two words and three words tuples.
    In the weighting function there are multiple options, I will choose TF-IDF which means term frequency and inverse document frequency.

    This technique is give more weight for terms that appears more than others with the consideration of giving a negative score for terms that appears in different documents( news items in our case) with different classification. The final score for the vocabulary is a mixture of both TF and IDF values.

    There are lots of other options which can be viewed in more details in this excellent guide for N-Gram module guide here

    One important option is the desired output features which is basically how many tuples you want to use to categorize your data, this might be a trial and error for a newbie like me until I can see the top n effective tuples to make it easier and faster for the trained model to compare future text against, in my scenario I used 5000 Features as a result of N-Gram Feature extraction step.

    After the data preparation and vocabulary preparation , we will use 4 steps that is common across any training experiment which are (Split, Train, Score and Evaluate)

    The first Step is to split the data randomly with some condition or purely based on a random seed, train the model using part of the data then score the model using the other portion, last step is to evaluate the model and view  how accurate your model is.

    Based on the evaluation result you can see overall model accuracy, at this point if the model is not hitting the mark ( you can set target accuracy based on business requirements). One possible solution is to change N-Gram tuple values, output features or even use a complete different training algorithms. Sometimes it's useful to to include additional columns or metadata to help categorize the documents maybe a department name or author not only the document content.
    In my sample I got overall accuracy of 80% which is accepted to me so relying on the text extraction from the documents is sufficient to me.

    Now let's run the experiment and confirm that all steps are executed successfully which we can validate with the green check mark on every step

  3. The Predictive Experiment

    After a successful run of the training experiment we can easily hover over setup a web-service in this case we can have retaining web service which will allow use to provide Dataset as an input and spit out the trained model and evaluation results as an output. for the sake of this blog post we will not play around with this option. We will instead generate a predictive experiment which will allow us to predict item classification based on its text representation. So using the same button in the lower toolbar we will choose to create predictive web service this will generate the predictive experiment for us if we run this and publish it, we will cause plenty of errors as the generated steps trying to create new vocabulary too.
    Let's use the generated vocabulary of the training experiment as an input (we need to save the result vocabulary of the training experiment as Data-set so we ca use it later)

    We will also remove any transformation steps , we will only select single column (news text representation) which will be the single input for our web service.

  4. The Webservice
    Let's run the predictive experiment now and make sure it's working, then we can deploy the web-service which we can use to classify text representation of documents. This will open a new page which allows us to test our web service by supplying text as "News" input and provide scored Label output as text.

    P.S. you must pass the API key as authentication header to make it work ;)

Wednesday, 4 April 2018

Auto Classify Documents in SharePoint using Azure Machine Learning Studio Part 1

I was trying to figure out a way to get the text representation of documents stored either in OneDrive or SharePoint Online to execute some text analysis techniques using Azure machine learning studio. I know for sure (not really just guessing) that SharePoint search index store text representation of the document but I guess this version of the document is not exposed to us

I also happen to know (this one is not a guess) from the good old days that SharePoint search uses IFilter to get file content as text then store this text in the index. I tried to do it in a different way.
So I figured how about doing this document to text conversion myself, I found Tika text extraction library  handy apache open source tool which has been ported as .NET nuget package 

I've create a simple Azure function using Visual studio, It has been a while since I used the full fledged Visual studio as I've been using mostly Visual studio code lately, as you guys can see the azure function is pretty straight forward just 4 lines of code to convert the docx files to text representation so we can use any text analysis techniques on our SharePoint documents.

Now let's hook this to a simple Flow which been triggered when a new file been uploaded to specific SharePoint library.
The flow will start then it will trigger the azure function which will extract the text representation of the office document and send it to a web-service to do some text analysis and return the document classification value.

Then within the flow itself we can update the SharePoint document and update the classification as per the text analysis result.

Hint: We will consider this web service call used in this flow as HTTP2 as a black box for now. to give you a sneak peak It's based on multi-class neural network classification algorithm built using Azure Machine Learning Studio and we will discuss this particular building block in more details in part 2 of this series.

now let's upload a new word document that have a text represents a business article and let's see the updated category text value

Here we go , our smart document categorization flow is able to classify the document as business document.
In the next part of this blog series, we will discuss the azure machine learning studio experiment in more details.

Monday, 12 February 2018

SharePoint Online: Unexpected behavior while trying to demote a news page

In my previous blog post I explained in details how the new communication site differentiate between normal Site page and promoted "News" one. it's all about a newly introduced attribute of the SitePage content type called PromotedState.

It's fairly easy to promote a normal site page using the UI using a simple button , but once you promote the page there is no button to return the page to a normal site page changing its promotedState back to "1".

In this post, I'll walk you through an unexpected behavior when we try to update the value of the PromotedState using CSOM to its original value of "1" to hide the page from the OTB news webpart.

I'm going to use exact code sample I've published before with minimal modifications, you can refer to my previous blog post published almost a year and half ago.

Please note: for the modern experience old fashioned custom actions for the ECB context can not be deployed actually you will get an error message if you try to deploy an EditorControlBlock item based custom action

For simplicity and because I just want to show the unexpected behavior when we manually update the promotedState from "2" to "1", I'll create a new JavaScript function to update the PromotedState and I'll call it demoteNewsItem which sets the Promoted State back to "1"

all other helper functions will remain intact, I'll use them to acquire the token and get both digest and e-tag so I'll be able to update the SharePoint item.

I'm going to call the demote item function from my test.js file which I usually use for debugging purpose, I'm just trying to find a quick and easy way to update the site page "PromotedState" attribute.

If we navigate back to the site we can find that the promote button is enabled however it has no effect on the item once you decided to update PromotedState using the previous mechanism when I inspected the PromoteToNews item action (mentioned in details in the previous blog post) I can still see the result is true which gives you the feeling that the process has been successfully executed!

A much better and user-friendly approach (which we will explore in our next blog post) is to create a new SPFx CommandSet for the site pages library. However having the page stuck on demoted state is still an issue as you can see below:

Thursday, 11 January 2018

Promote a Page in SharePoint Communication site

I was playing with the new communication site lately and I was wondering what differs a news page from a normal site page. All of the pages resides under the same site pages library. The first though entered my head was maybe it's a different content type.

By running a simple PowerShell script I find out that they are both using the same Content Type (SitePage) with ID

I was actually shocked as it was an old practice of mine to create different content type for different content elements also separate layout pages to dictate the rendering of these various intranet contents.
What I see now is an editing based experience. You can have various layouts per single content type as you can have a news page with single column layout or two-column layout that enables you to create a specific experience per item not per content type which is very flexible yet more time consuming from and editing experience and requires more governance to make sure that not every news item will have a completely different experience.

The last drawback can be handled by copying an existing post instead of starting from scratch but again that doesn't stop the super user from messing up the copy.

I know I've got side tracked as usual , back to the main topic of the post, what I did next is that I used my poor PowerShell skills to extract all the attributes of the pages then I compared both using winmerge (pretty old tool right)

I find out that there is a property called PromotedState which is 0 in the case of normal SitePage and 2 in the news Page.

If you navigate to a normal site Page you will find an option to promote a page which gives you an option to publish it as news post (eventually change the PromoteState from 0 to 2), you can only see the promote button if the page is published

to figure out what really happens behind the scene I had to look for all the xhr request in the browser console and I find out that when you click promote button it sends a POST request to

and the response returns as below indicating a successful promotion of the page

Then the page PromoteState will be updated to 2 and will appear in the Out of the box news webpart

Thursday, 7 December 2017

SharePoint Addin: VSTS CI/CD pipeline hosted agent challenge

In this post I will explain setting up a CI/CD pipeline using VSTS for SharePoint Addins without the need to install PnPPowerShell scripts on your Build/Release Agents.

By default the hosted VSTS Release agent doesn't include SharePoint online PowerShell cmdlets. The easy and straight forward option is to use your agent and install SharePoint PowerShell cmdlets on it. However, I want to have a more portable option that will allow me to use the hosted agent without maintaining a release VM.

PnP PowerShell cmdlets

Firstly, what is actually the underlying logic that PnP PowerShell cmdlets encapsulates. It's basically HTTP calls to SharePoint Online RESTful APIs. So in way we can replace the PowerShell Cmdlets with simple http requests.

Gulp to the rescue

By Default, VSTS hosted agent will have node and gulp installed so we don't need to worry about setting up VSTS hosted agent, we will build a gulp task that allow us to publish SharePoint Addin to our app catalog, the main steps will be:
  • Getting app principle
    In order to upload the app package to the app catalog we need to get app principle which will run in app-only mode check my post here to learn how to get client Id and client secret
  • Acquire access token
    Using sharepoint-apponly nodejs module we will be able to get the access token
  • Upload .app package to app catalog site.
  • First we will create a new file let's name it sharepoint.js, then let's import fs and http modules, we will create a single function uploadFile which will be exported to be used in our gulpfile.js 
    here is the sharepoint.js
    and the gulpfile.js

Putting it all together

  1. Let's create new directory , initiate new node module:
  2. Create SharePoint.js and gulpfile.js and paste our code there.
  3. Install needed dependencies which includes sharepoint-apponly module explained here
  4. Let's create new build definition which includes step to copy gulpfile.js /sharepoint.js and package.json to artefact directory
  5. In release definiton let's create two release steps. The first step will simply install npm dependencies 

  6. the other step will be gulp task which will run publish-app task defined in gulpfile.js , notice you can supply the parameters as argument which will be evaluated from release definition variables.

  7. the hosted agent can copy the app package to the appcatalogUrl which in my case defined in release variables.

Wednesday, 8 November 2017

OfficeDev: Register Custom Connector Teams vs Groups

In this post, I'll walk you guys through how the registration process of Office 365 connectors varies from Microsoft Teams (connector to specific channel) and Group connector for specific group conversations.
All office 365 connectors have a single endpoint to register the connector, which is can be accessed via the url ,you need to fill your connector information including an icon which will appear when the users configure it for either inbox , groups or even Microsoft Teams.

How you create a new connector is not the topic of this blog post, if you are interested to know how to create a connector you can refer to this MSDN article here.

However, today I'll walk you guys through creating new custom connector and side-loaded as Teams app.

  1. Using teams yeoman generator create a new teams app, if you want to learn how to run yo teams refer to the readme page of the generator-teams github repo
  2. Choose Connector from the generator options
  3. You will be prompt to provide the connector Guid, which you can get from connectors portal
  4. The generator will generate a sample Typescript code to run your generator then will run npm install to the current directory, followed by a success message
  5. Create an Azure app service to host the generator, alternatively you can use ngrok to host and run this connector locally, in my case I used an existing Azure app service 
  6. Create a local git repo for the azure app service
  7. Initilaize your local git repo and commit the changes
  8. Push your code to azure app service
  9. you will notice that deploy.cmd generated file will attempt to run npm install on the remote azure app service
  10. let's package our teams app manifest file to side-load it to our Teams client application
  11. Now using the Microsoft Teams client app choose any team and select the apps tab (if you see bots tab instead you need to enable side loading apps and switch to developer preview which is explained here)
  12. After sideloading our app which consists of a single connector, let's put the connector to the test by adding it to a channel within the team we sideloaded the app to, which can be easily achieved by selecting connectors from the channel drop down menu
  13. Sadly, the sideloaded connector appears at the end of the available connectors so you might scroll all the way till the end to find your newly added custom connector
When you click configure, a pop-up will appear to render the ***Connector.html page where you can replace *** with your connector name
Now we've reach to the highlight of this blog post and probably the reason that I wrote it in the first place. When you click the button in the above screenshot which labeled "Connect to Office 365" it sends a GET request with specific parameters to endpoint.

The request parameters are exactly the same whether you initiate the request  via the browser or from within Microsoft team client application.  However, the result is completely different. In the first scenario it will create a webhook for Office 365 group conversation and will prompt the user to select the targeted Office 365 group. In the second case a Teams channel webhook will be created with no further user input.
So how the endpoint correctly distinguish between the two different request originator, more importantly how it knows which team and which channel this webhook associated with

when I logged the request I noticed that there is two differences in the header of the requests, the teams request have a different user agent also it has an object called TeamsContext.

Now let's test the connector by sending an GET request to https://connectorURL/api/Connector/ping you will notice a message card with a single viewAction appears.

this is how the endpoint distinguish the two request and how you can easily build and host a custom Microsoft teams channel connector.