machine learning

Supercharging Scoro with Machine Learning

“By words we learn thoughts, and by thoughts we learn life.” – Jean Baptiste Girard

For centuries, we have admired words for their immense beauty and effect. For decades, we have used computers to try and decipher them for meaning. For the past six months, we have been working with the Software Technology and Applications Competence Center (STACC) and the TEXTA toolkit. Our goal was to understand the text used to describe tasks and calendar events. Primarily, we were looking for topics and keywords – the forest behind the trees – and secondly, semantic meaning – whether the topics are discussed positively, negatively, or with no apparent emotion attached. Our project received funding from Enterprise Estonia (EAS) through a development voucher in the amount of 18 200 euros.

This 9-minute read will give you an overview of how we are using machine learning in Scoro, the impact on your business, and also some background knowledge about:

  •     machine learning and AI
  •     Scoro – an intricate system with a network of connected objects
  •     Scoro – a complex set of reporting tools

Machine Learning is a vague and unfamiliar field for many, therefore we need some disclaimers:

  1. Data protection and privacy are paramount to us.
  2. All data used for research is our own internal data.
  3. Machine learning can and will be applied to your site’s data only after you explicitly ask us to.
  4. Each site can have its own set of rules for the selection and enrichment of data.
  5. Everything described will be part of our existing product – just as any other new feature.
  6. Any machine learning won’t be available for at least another couple of product releases.

With expectations managed and boundaries set, we can start with theory.

Structure

 

Scoro relations

Relations in Scoro (simplified)

Scoro is a holistic business and time management platform, which means that every item is connected, in one way or another, to every other item.

For example, in the work planning and time billing use case, the above chart can be organized into layers:

1st level: tasks, calendar events, time entries

2nd level: includes connections to people (responsible users, assignees, linked contacts), activity types, project phases, and invoices

3rd and 4th level: teams, companies, projects, quotes and orders, payments, locations, and so forth

Scoro relations

Scoro structure organized

In addition to everything described above, there are more objects and more connections between the different entities.

How does it benefit you?

For all Scoro users: having connected data saves time and therefore money. With a few clicks, you can create an invoice for a project – costs from meetings and completed tasks are collected and aggregated. Another click sends the document to the customer’s mailbox, and to take it a step further – you can even automate the process. Your most valuable clients will get the invoice and list of completed items on time, every month.

A more strategic purpose is having a powerful and complex set of reports at your fingertips. In the simplest form, Scoro displays a set of graphs: time spent on projects grouped by activities or users; financial KPIs by month or customer; and success metrics in sales.

work_by_activity

Work by Activity groups and Invoices in period

Diving deeper, Scoro provides methods for intricate analysis that combine data from different sources. The main categories are work, sales, financial, and general reports, with each of these containing multiple reports.

Our project has the most in common with the work and time management use case, and the “Detailed work report” provides one of the most comprehensive overviews for this. It can answer the question “who does what for whom and when?”, with the possibility to pose even more questions or narrow the scope by filtering.

detailed_work_report

Detailed work report

In the default view, the list already displays tasks and meetings, their dates and activity types, and also related users, contacts and companies. The View button allows adding more dimensions and the Filter button singles out any of the related dimensions. This toolset is enough for a comprehensive analysis of the work being done in an SME. However, Scoro doesn’t stop here.

Author’s note: I am a fan of numbers, data and creating value from them. When I joined Scoro (exactly a year ago) as a machine learning evangelist, I was thoroughly blown away by the functionality that is available.

One of the most significant mechanisms for data analysis in Excel is the pivot table. It enables combining data by any columns and aggregating the values in several ways – sum, average, count, etc. The same – grouping by two dimensions – is possible in Scoro, both in the reports and the task list view.

This means that the data, which is already in Scoro with all the possible relations set, can be transformed into plentiful insights for understanding the business, customers, employees, and projects.
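The grouping-and-aggregating idea behind pivot tables can be sketched in a few lines. This is a hypothetical, simplified example (the data and column names are made up, mimicking a work-report export), not Scoro’s actual implementation:

```python
import pandas as pd

# Hypothetical work-log data, mimicking a detailed work report export
df = pd.DataFrame({
    "project": ["Website", "Website", "CRM", "CRM", "CRM"],
    "user": ["Anna", "Ben", "Anna", "Ben", "Anna"],
    "hours": [3.0, 2.5, 4.0, 1.5, 2.0],
})

# Group by two dimensions and aggregate the values - the pivot-table idea
pivot = df.pivot_table(index="project", columns="user",
                       values="hours", aggfunc="sum", fill_value=0)
print(pivot)
```

The same two-dimensional grouping – rows by one column, columns by another, values aggregated – is what the report views expose through the interface.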

detailed_work_report_by_projects_users

Detailed work report grouped by Projects and Users

The data is rich with information – it has connections to related objects and is organized in a logical form. Arranging it in a way that is relevant and comprehensible to you allows you to make the decisions that benefit the company and its employees most. Finding places and processes to improve and optimize is a simple matter once you understand the data.

Machines learning

Machine learning is used to create user-specific solutions. Compared to traditional programming, it delivers the most value when it is not known in advance how a person will use the system: what data they add, which pages they visit, which relations they create, and which results they derive.

Let’s look at the difference between regular programming and machine learning – with math! Traditionally, the formula and the input data are known. If you apply the formula to the input values, you know the output.

A mathematical relation between input and output can be written as y=f(x), with a more familiar form being, for example, y=2x^3+1. If we look at input values (x) in the range -3 … +4, we can calculate the output, and all three components are then known.
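In code, the traditional case is exactly this direct: the formula is written down, and the outputs follow mechanically from the inputs.

```python
def f(x):
    """The known formula: y = 2x^3 + 1."""
    return 2 * x**3 + 1

# With the formula and the inputs known, the outputs are fully determined
for x in range(-3, 5):
    print(x, f(x))
```

There is nothing to learn here – every output is fixed by the formula.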

math_graph

Graph for y=2x^3+1

In machine learning – supervised learning, more precisely – the formula is unknown, but we know what the outcome should be. Real estate is a good example: we know the number of rooms, the location, and other parameters of a listing, and we know its price. Yet there isn’t a universally accepted (deterministic) formula to calculate the price of new listings. Machine learning tries to find the correlation between the inputs and the output – a formula that best describes the phenomenon. Machine learning, per se, is a set of tools for approximating the correct value.

ml_graph

Machine learning seeks the formula to describe the points with minimal error

We split the known inputs and outputs into two groups. The first is used for training a model – an engine, of sorts, that tries to understand the relation between the two values. The above example shows how the model creates a regression line, which enables calculating probable outputs for new inputs. The second group of known data is used to verify that the model was created correctly. There are many intricacies in how machine learning models work, but it is universally good practice to use held-out data to check that the model works on new, previously unseen data as well as on the data it already possesses.
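The train-then-verify loop can be sketched with a toy linear model. The data here is synthetic (we generate points around a known line and add noise), purely to illustrate the split:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "known inputs and outputs": a noisy linear relation y ~ 3x + 5
x = rng.uniform(0, 10, 100)
y = 3 * x + 5 + rng.normal(0, 1, 100)

# Split the known data: the first group trains, the second verifies
x_train, x_test = x[:80], x[80:]
y_train, y_test = y[:80], y[80:]

# "Training": fit a line (degree-1 polynomial) by least squares
slope, intercept = np.polyfit(x_train, y_train, 1)

# "Verification": measure the error on data the model has never seen
pred = slope * x_test + intercept
mae = np.mean(np.abs(pred - y_test))
print(f"learned y = {slope:.2f}x + {intercept:.2f}, test MAE = {mae:.2f}")
```

The recovered slope and intercept come close to the true values (3 and 5), and the low error on the held-out points is what tells us the model generalizes rather than merely memorizes.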

It is necessary to clarify two similar concepts. Firstly, machine learning (ML) is a process for analyzing data and finding hidden relations – it is similar to reasoning. Secondly, artificial intelligence (AI) is an automated process built for making decisions and acting on them. The topics are tightly related, and more often than not, AI is built using machine learning. The main distinction is taking the final automated action.

The actual project… from a technical viewpoint

The basis for the experiments was our own internal tasks and events.

Starting point: text values (title, description, and comments) with their metadata (activity type, responsible team, related resources, etc.).

Processing: different models within the solution try to approximate the correct values.

Results: main keywords, sentiment, several categorical values related to the text.

TEXTA toolkit

TEXTA toolkit

Most of the process can be described as:

  1. collecting data
  2. transforming the data into a usable format
  3. sending data to the toolkit (the toolkit is accessible over an API)
  4. storing data – in ElasticSearch
  5. cleaning the data
    • translating text to English – more available lexicons and tools
    • lemmatizing the text
    • removing stop words
  6. training the models
    • finding similar text bodies
    • extracting topics from text
    • assigning a semantic label to the topics
    • adding custom labels
      • activity type
      • assignee (as a team)
      • project
      • etc.
  7. running the topic extractor on different datasets
    • keywords that are too frequent are removed (in our case, too many texts contained “Scoro” and “email”)
    • keywords that exist only in a few texts are removed – no apparent trends
  8. displaying the results
  9. calibrating the models (user can adjust the topics, group similar keywords and retrain the models)
  10. repeating the process with new data and user input
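The cleaning and keyword-filtering steps above can be sketched in miniature. This is a heavily simplified illustration (the stop-word list is tiny, translation and lemmatization are omitted, and the task texts are invented), not the TEXTA toolkit itself:

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "for", "is", "on", "with", "again"}
TOO_FREQUENT = {"scoro", "email"}  # step 7: keywords that appear in too many texts

def clean(text):
    """Step 5, simplified: lowercase, tokenize, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def top_keywords(texts, min_docs=2):
    """Step 7, simplified: count how many texts contain each keyword,
    then drop the too-frequent ones and those with no apparent trend."""
    doc_freq = Counter()
    for text in texts:
        doc_freq.update(set(clean(text)))
    return {w: n for w, n in doc_freq.items()
            if n >= min_docs and w not in TOO_FREQUENT}

tasks = [
    "Fix the Zapier integration for the invoice email",
    "Zapier integration review with the team",
    "Prepare invoice for project kickoff",
]
print(top_keywords(tasks))
```

Real topic extraction is considerably more involved, but the two pruning rules – too frequent and too rare – carry over directly.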

Integrating the new values to Scoro

The roadmap for integrating machine learning and someday using artificial intelligence for automated actions has thousands of steps. We’ve taken the first few, and we have actionable items for years to come. The initial implementation of the solution is simple. With each version of Scoro, it will become more polished, and finally, it will be an indivisible part of the whole product.

Thankfully, “simple” in terms of Scoro means “easily usable” rather than “feature-lacking”. Our system allows adding custom dimensions to existing objects and views. These are then displayed in reports, list and detail views, and over the API. They can be used like any other column to sort, filter and group the data.

custom_fields

Custom fields for displaying the results

To display the results of the semantic analysis, we only need to follow four steps.

  1. Add custom fields under Tasks (sentiment, topics).
  2. Use Scoro API to request new and updated tasks.
  3. Send the text values to our analyzer.
  4. Update tasks’ custom fields – again through Scoro API.

These simple steps are all that is needed to start using the TEXTA toolkit on our dataset. After saving the values back to Scoro, the data is available in all related lists and reports.
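The four-step loop can be sketched as follows. The endpoint paths, payload shapes, and the `analyze()` stub here are illustrative assumptions for the sake of the example – they are not the exact Scoro or TEXTA APIs, and no network calls are made:

```python
# A minimal sketch of the four-step loop; endpoints and fields are hypothetical.

def fetch_updated_tasks(since):
    """Step 2: build the request for new/updated tasks (no network call here)."""
    return {"url": "https://example.scoro.com/api/v2/tasks/list",
            "payload": {"filter": {"modified_date": {"gte": since}}}}

def analyze(text):
    """Step 3 stub: a real system would send the text to the analyzer here."""
    positive = {"great", "good", "thanks"}
    words = set(text.lower().split())
    sentiment = "positive" if words & positive else "neutral"
    return {"sentiment": sentiment, "topics": sorted(words & {"zapier", "invoice"})}

def update_task(task_id, result):
    """Step 4: build the update request writing results into custom fields."""
    return {"url": f"https://example.scoro.com/api/v2/tasks/modify/{task_id}",
            "payload": {"custom_fields": result}}

req = update_task(42, analyze("Great progress on the Zapier invoice flow"))
print(req["payload"])
```

The essential point is that the loop only touches custom fields over the API, so nothing in the core task data needs to change.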

Find out what excites your customers and employees

Scoro has many cool features, for example, the smart inbox. Add your Scoro smart inbox address as BCC to any customer correspondence. All emails will be imported as tasks, with all possible values set automatically: the subject as the task’s title, the email body as the description, and if the customer’s email address is connected to a Person or Company, these attributes are set as well.

Regardless of whether you’re looking at your own tasks, your team’s tasks or a list of helpdesk emails, understanding the emotions behind the words gives valuable insight.

task_list_sentiment

Tasks list with Sentiment values

At the topmost, most abstract layer, we can find answers to questions such as: which projects had the most positive feedback? Has the sentiment changed compared to the month before? These and many other hypotheses can easily be checked using the existing group-by functionality.
detailed_work_report

Detailed report grouped by Projects and Sentiment

Creating metrics and keeping a running tally is quick: with a couple of dashboard metrics, the ratio between positive, negative, and total tasks is readily available.

Dashboard ratio metrics: All vs. Positive or Negative tasks
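As a rough illustration – with made-up sentiment labels – such a ratio metric boils down to a simple calculation:

```python
# Hypothetical per-task sentiment labels, as written into the custom field
sentiments = ["positive", "neutral", "negative", "positive", "neutral"]

total = len(sentiments)
positive_ratio = sentiments.count("positive") / total
negative_ratio = sentiments.count("negative") / total
print(f"{positive_ratio:.0%} positive, {negative_ratio:.0%} negative of {total} tasks")
```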

Future developments include showing the sentiment analysis over time, along with many other graphs similar to the one below. From analyzing our data, we found that, with the required degree of certainty, 26% of our items are positively inclined and only 4% address some concern or negativity. Most of our communication is still neutral – which in a work environment is nothing but positive!

pasted image 0 23

Number of positive and negative mentions over a time period

Topics

Displaying topics is more complex than sentiment classification – the latter can be displayed as a single number, but the former is still text. Also, each task may contain many topics, which means we need to develop new tools. We promise the results will be stunning – it will just take some time.

pasted image 0 26

List of main topics (by occurrence) by quarter

Currently, we can look at the topics related to a given task. Our roadmap includes displaying the main topics as different lists (an overall top, or grouped by customer, team, project, or other values). Time is a key dimension: trendlines for topics show how many tasks are related to a keyword and what the emotions towards it are.

zapier_trendline

Number of tasks and events with the topic “Zapier” over time

There are possibilities for deeper analysis, e.g. “Does working with the same topic get monotonous or does the performance improve?” To understand this, we can compare:

  • the occurrence frequency of topics
  • time spent working on a topic
  • average sentiment
  • progression of frequency or sentiment over time

The data can be divided into projects, companies, users and other related values.
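Combining frequency and average sentiment per topic is, at its core, a simple aggregation. The records below are invented for illustration (with sentiment encoded as 1 = positive, 0 = neutral, -1 = negative):

```python
# Hypothetical task records, each labeled with a topic and a sentiment score
tasks = [
    {"topic": "zapier", "sentiment": 1},
    {"topic": "zapier", "sentiment": -1},
    {"topic": "invoice", "sentiment": 0},
    {"topic": "zapier", "sentiment": 1},
]

# Aggregate per topic: occurrence frequency and average sentiment
stats = {}
for t in tasks:
    s = stats.setdefault(t["topic"], {"count": 0, "total": 0})
    s["count"] += 1
    s["total"] += t["sentiment"]

for topic, s in stats.items():
    print(topic, s["count"], s["total"] / s["count"])
```

The same aggregation, further split by project, company, or user, yields the per-dimension comparisons listed above.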

main_topics_with_count_and_sentiment

Main topics from different projects with frequency and sentiment values

Overall, the goal here is to visualize which topics are most relevant to your employees and customers. The different axes may represent the number of times a keyword appears in tasks and events; the emotional response is displayed as a single aggregated percentage (whether it is usually mentioned more positively or negatively). It is equally important to understand how the values progress over time. The graph “Number of tasks and events with the topic Zapier over time” shows how the team reacted to the Zapier integration. The beginning and end are overwhelmingly positive, but the orange dots at the start of 2018 mean that, at the time, the keyword “Zapier” had a more negative undertone than a positive one.

Unexpected value

During the development phase, there was a pleasant surprise: for the same amount of work, we received additional benefits. The expected outcome from the TEXTA toolkit was knowing the main topics and sentiments. After a bit of work, once the developers understood our dataset better, we found that it is possible to predict the activity type, project, company, and even the assignee (at the team level) quite accurately.

new_task_popup

New task popup with Description as the main field

We start with the description field – the main text. In the background, we process the text to find the main topics and other values. If there is enough information, we can fill the title with the main topics and the additional fields with values from the most similar tasks. To achieve sufficient precision, around 50 examples (previous tasks) are required for each value. If the fields are filled in satisfactorily, we can simply save the task and carry on; if something needs improvement, we can fix it before moving on.

The value here occurs twice. Firstly, if the system prefills everything correctly, the user saves time and is more efficient. Secondly, with more fields automatically set, the data quality improves. Building any reports on top of this means fewer unspecified values and more accurate data – allowing you to rely on the reports more assuredly.
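The “values from the most similar tasks” idea can be sketched with a naive similarity measure. This is a hypothetical toy (word-overlap similarity, two made-up history records); a production model would be far more robust:

```python
def similarity(a, b):
    """Jaccard similarity between the word sets of two texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

# Hypothetical previously saved tasks with their filled-in fields
history = [
    {"description": "fix zapier integration bug",
     "activity": "Development", "team": "Engineering"},
    {"description": "prepare quarterly sales report",
     "activity": "Reporting", "team": "Sales"},
]

def suggest_fields(description):
    """Prefill fields from the single most similar previous task."""
    best = max(history, key=lambda t: similarity(description, t["description"]))
    return {"activity": best["activity"], "team": best["team"]}

print(suggest_fields("zapier integration broken again"))
```

The real models learn far richer correlations, but the user-visible effect is the same: type a description, and plausible values for the other fields appear.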

You’re still the master

Machine learning is often a source of concern: a computer, or even worse, the cloud, is looking through your data and making decisions on your behalf. In reality, machine learning is still just software. The models look for correlations they have been taught to find: at the beginning of the training process, the models are dumb – the initial suggestions for topics are quite vague and the semantic classification is incorrect. Then we define which data the models should look for, and after each iteration, the models converge towards the correct values. The improvement process is also continuous: new information is always being added, and the models need to be retrained for different sites.

Therefore, the correlations are directed by the end user, and the suggestions given are easily overridable. In this case, sentiment and topics are just custom fields – editable and configurable. Any future prediction that the user can’t directly modify will be subject to a general feedback system. The goal is to build value for the site’s users, and only they know what is relevant to them.

The overall process of using Scoro doesn’t change much because of this project’s outcome. However, we’ve added an enormous set of tools for seeing the big picture and getting a comprehensive overview – effortlessly. On top of that, making decisions and establishing plans will be backed by data and determined by you.

Scoro and machines

This is the beginning of machine learning in Scoro – there will be many breakthroughs to come.

Collaboration with STACC and additional funding from Enterprise Estonia have shown just how much hidden value our actions hold. With this initial solution, we have already added new dimensions to tasks, gathered and correlated data, improved our overview of the items we spend our time working on, and automated some processes.

We will continue working with machine learning and improving Scoro. Machine learning is a toolset, and its value lies in the way it is used. The knowledge gained from this project gives us something to build upon in future endeavors: finding more gems in the way people tackle their problems, and in how different keywords – and our attitudes towards them – change over time.

If you have any questions or would like to test our solution – both in its current form and, in the upcoming months, with machine learning enabled – don’t hesitate to contact us. Or simply sign up for a free trial at scoro.com.

Want to know more about this?