Auto-Classify Documents in SharePoint Using Azure Machine Learning Studio, Part 1
On SharePoint

I was trying to figure out a way to get the text representation of documents stored in OneDrive or SharePoint Online so I could run some text analysis techniques on them using Azure Machine Learning Studio. I know for sure (well, not really, I'm just guessing) that the SharePoint search index stores a text representation of each document, but I guess this version of the document is not exposed to us.
Hmm #CognitiveServices can now tell you if text has profanity or derogatory terms. https://t.co/2dumIM2dkR (John LIU, @johnnliu, March 28, 2018)
Waiting on @onedrive #ContentServices to return docx as txt then it'd be two @MicrosoftFlow actions.
Make a great O365 scanner that would flag docs to follow up
I also happen to know (this one is not a guess), from the good old days, that SharePoint search uses IFilters to extract file content as text and then stores that text in the index. I decided to take a different route.
So I figured: how about doing this document-to-text conversion myself? I found Apache Tika, a handy open-source text extraction library, which has been ported to .NET as a NuGet package.
I created a simple Azure Function using Visual Studio. It has been a while since I used the full-fledged Visual Studio, as I've mostly been using Visual Studio Code lately. The Azure Function itself is pretty straightforward: just four lines of code to convert .docx files to their text representation, so we can apply any text analysis technique to our SharePoint documents.
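The function described above relies on the Tika .NET port, which handles many Office formats. As a rough, dependency-free illustration of what "docx to text" involves, here is a minimal Python sketch that swaps in a different technique: it treats the .docx as the ZIP archive it is and reads the paragraph text runs from `word/document.xml`. The function name `docx_to_text` is hypothetical, and this sketch covers only the main body of .docx files (no headers, footnotes, tabs, or other formats), so it is not a replacement for Tika.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(docx_bytes: bytes) -> str:
    """Return the body text of a .docx file, one line per paragraph.

    Sketch only: a .docx is a ZIP archive whose main body lives in
    word/document.xml as <w:p> paragraphs made of <w:t> text runs.
    Headers, footers, footnotes and tab/break elements are ignored.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Concatenate every text run inside this paragraph.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

In an Azure Function, the uploaded file's bytes would be passed to a helper like this and the resulting string returned in the HTTP response, which is the shape of work the four-line Tika function does.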
Now let's hook this up to a simple Flow that is triggered when a new file is uploaded to a specific SharePoint library.
When the flow starts, it calls the Azure Function, which extracts the text representation of the Office document, and then sends that text to a web service that performs the text analysis and returns the document's classification.
Finally, within the flow itself, we update the SharePoint document's classification with the result of the text analysis.
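In the actual solution, Flow does this orchestration with no custom code. Purely to make the three steps explicit, here is a minimal Python sketch of the same sequence. `UploadedFile`, `classify_new_file` and the `Category` column name are hypothetical; the three steps are passed in as functions standing in for the Azure Function call, the classification web service call, and the SharePoint "update properties" action.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class UploadedFile:
    """Hypothetical stand-in for the file the Flow trigger hands over."""
    item_id: int
    name: str
    content: bytes


def classify_new_file(
    uploaded: UploadedFile,
    extract_text: Callable[[bytes], str],
    classify_text: Callable[[str], str],
    update_properties: Callable[[int, Dict[str, str]], None],
    category_column: str = "Category",  # assumed library column name
) -> str:
    """Mirror the flow: extract text, classify it, write the label back."""
    # Step 1: the Azure Function turns the Office file into plain text.
    text = extract_text(uploaded.content)
    # Step 2: the text analysis web service returns a class label.
    label = classify_text(text)
    # Step 3: write that label into the library item's metadata.
    update_properties(uploaded.item_id, {category_column: label})
    return label
```

Injecting the steps as functions keeps the sequence testable with fakes, without a SharePoint tenant or a deployed web service.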
Hint: for now, we will treat the web service called in this flow (the HTTP2 action) as a black box. To give you a sneak peek, it's based on a multi-class neural network classification algorithm built with Azure Machine Learning Studio, and we will discuss this building block in more detail in part 2 of this series.
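While the service stays a black box here, it may help to see roughly what the HTTP2 action sends and reads back. The sketch below assumes the experiment is published as a classic Azure Machine Learning Studio request-response web service with its usual `Inputs`/`Results` JSON layout. The input and output names (`input1`, `output1`), the `text` column, and the `Scored Labels` output column are assumptions to verify against the service's API help page. The HTTP POST itself, including the API key header, is left to the Flow action.

```python
import json
from typing import Any, List


def build_rrs_request(text: str, column: str = "text") -> str:
    """Serialize one document into an assumed classic request-response body.

    "input1" and the column name are assumptions; they must match the web
    service input defined in the experiment.
    """
    body = {
        "Inputs": {
            "input1": {
                "ColumnNames": [column],
                "Values": [[text]],
            }
        },
        "GlobalParameters": {},
    }
    return json.dumps(body)


def read_scored_label(response_json: str, output: str = "output1") -> Any:
    """Return the predicted class from the first row of the output table.

    Assumes the scored output exposes a "Scored Labels" column, as the
    Score Model step typically adds for classifiers.
    """
    table = json.loads(response_json)["Results"][output]["value"]
    columns: List[str] = table["ColumnNames"]
    first_row: List[Any] = table["Values"][0]
    return first_row[columns.index("Scored Labels")]
```

Looking the label up by column name, rather than by position, keeps the parser working if the experiment also returns per-class probability columns.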
Now let's upload a new Word document containing the text of a business article and check the updated category value.
Here we go: our smart document categorization flow classified the document as a business document.
In the next part of this blog series, we will discuss the Azure Machine Learning Studio experiment in more detail.