Data & AI Innovation Day 2022


Innovation lies at the heart of Eraneos’ mentality. But how do we make sure that we live up to that credo? In July 2022, we restarted our quarterly Innovation Day after a two-year hiatus: a full day of R&D in which we explore cutting-edge tools and techniques, chosen by popular demand from a pool of ideas. Everyone was divided into teams, and each team set its own goals for the day. The day ended over drinks with presentations of each team’s results, and we selected a winning team to present its findings in a Techtalk. Here is a summary of the main topics that made the cut during Innovation Day 2022.

Accelerating Model Explainability with SHAP

Machine learning models can be extremely complex and difficult to comprehend. A fraud detection model could, for example, reject a customer’s loan request without the customer understanding why. Being able to explain the key decisions made by the models we depend on is becoming crucial, partly because of regulatory requirements and partly because of the wider community’s desire to reduce bias in models. SHapley Additive exPlanations (SHAP) is an important tool for explainable AI. The most common SHAP implementations run on a single machine, which makes computing SHAP values for large datasets slow and tedious. SHAP value calculations could, however, be accelerated by parallelizing them across several machines or GPUs. This is what Angel’s team will be working on during this Innovation Day.
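Why does SHAP parallelize well? Each row’s explanation is independent of every other row. Below is a minimal Python sketch of that idea. It computes exact Shapley values by brute force over all feature coalitions and uses a single baseline row for “absent” features, which is a simplification of what the `shap` library’s explainers do. Because the cost is exponential in the number of features, it is only practical for a handful of features. The function names are hypothetical, and a thread pool stands in for the machines or GPUs mentioned above. For CPU-bound pure-Python work, real speedups would need a process pool or a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set

Model = Callable[[Sequence[float]], float]


def exact_shap_row(predict: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values for one row; 'absent' features take the baseline value."""
    m = len(x)

    def value(coalition: Set[int]) -> float:
        # Features in the coalition keep their real value, the rest use the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(m)]
        return predict(z)

    phi = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        total = 0.0
        # Weighted average of feature i's marginal contribution over all coalitions.
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def shap_parallel(predict: Model, rows: List[Sequence[float]],
                  baseline: Sequence[float], n_workers: int = 4) -> List[List[float]]:
    """Rows are explained independently, so they can be farmed out to workers."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(lambda r: exact_shap_row(predict, r, baseline), rows))
```

Take the linear model f(z) = 2*z0 - 3*z1 + 0.5*z2 + 1 with a zero baseline. Each SHAP value is simply weight times feature value, so the row [1, 2, 3] gets roughly [2, -6, 1.5]. The values of a row always sum to f(x) - f(baseline).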

Dynamic Impact Dashboard

Organizations rely on various marketing tools to communicate with stakeholders, expand their market share, or promote their brand. Advertising and promotional campaigns are one such tool. However, analyzing how effective those campaigns are is often difficult. The goal for Wido’s team is to create an interactive Power BI dashboard that supports analysis of promotional impact at store level, using Causal Impact analysis. If successful, we can use this setup to bring even more insight to clients. The technical challenge is the interaction between Power BI and an on-demand model wrapped in an Azure ML API. The user makes custom filter selections in the dashboard, which are sent as input to the model endpoint, and the result is visualized in the dashboard.
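Causal Impact analysis estimates a promotion’s effect by predicting what a store would have sold without it (the counterfactual), based on stores that were not promoted, and comparing that prediction with what actually happened. The minimal Python sketch below assumes a plain least-squares regression on the average of the control stores. The actual CausalImpact method uses a Bayesian structural time-series model and also yields credible intervals, which this sketch omits. The `run(raw_data)` function mimics the entry point of an Azure ML scoring script, which is how such a model could sit behind an API that Power BI calls. The JSON field names `treated`, `controls` and `promo_start` are hypothetical stand-ins for the dashboard’s filter selections.

```python
import json
from statistics import mean
from typing import Dict, List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def estimate_impact(treated: List[float], controls: List[List[float]], promo_start: int) -> Dict:
    # Average the selected control stores into a single covariate per time step.
    control = [mean(values) for values in zip(*controls)]
    # Pre-period: learn how the promoted store tracks the control stores.
    a, b = fit_line(control[:promo_start], treated[:promo_start])
    # Post-period: the counterfactual is what the fit predicts without a promotion.
    counterfactual = [a + b * c for c in control[promo_start:]]
    effect = [y - yhat for y, yhat in zip(treated[promo_start:], counterfactual)]
    return {"counterfactual": counterfactual,
            "pointwise_effect": effect,
            "cumulative_effect": sum(effect)}


def run(raw_data: str) -> str:
    """Scoring entry point; the payload fields are hypothetical filter selections."""
    payload = json.loads(raw_data)
    result = estimate_impact(payload["treated"], payload["controls"], payload["promo_start"])
    return json.dumps(result)
```

Suppose a store that tracks 2 * control + 1 before the promotion and then sells 5 units above that line for two periods. The sketch reports a pointwise effect of about 5 per period and a cumulative effect of about 10.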

Self-supervised learning in speech, vision, and language

Recently, Meta published a paper titled “Data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language”. Data2vec is a framework that uses the same self-supervised learning method for speech, NLP and computer vision, with a Transformer as the underlying model. Transformers are discussed in Anchormen’s deep learning guild as the future direction of all deep learning (computer vision, NLP and timeseries). Emilio and Mallory’s team will therefore test Meta’s framework to see how it works and whether it has any feasible application at this point.
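What makes data2vec modality-agnostic is its training objective. A student network sees a masked version of the input and regresses the representations that a teacher network produces for the full input. The teacher’s weights are an exponential moving average (EMA) of the student’s, and the targets are the average of the teacher’s top K Transformer layer outputs. Below is a minimal sketch of those pieces in plain Python, assuming vectors are lists of floats. The paper’s per-layer normalization and its exact regression loss are left out, with a plain squared error standing in. All function names are hypothetical.

```python
from typing import List

Vector = List[float]


def ema_update(teacher: Vector, student: Vector, tau: float) -> Vector:
    """Teacher weights track the student: teacher <- tau * teacher + (1 - tau) * student."""
    return [tau * t + (1 - tau) * s for t, s in zip(teacher, student)]


def top_k_target(layer_outputs: List[Vector], k: int) -> Vector:
    """Regression target: element-wise mean of the teacher's last k layer outputs."""
    top = layer_outputs[-k:]
    return [sum(values) / k for values in zip(*top)]


def squared_error(prediction: Vector, target: Vector) -> float:
    """Mean squared error between the student's prediction and the teacher's target."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)
```

Because the targets are built from the teacher’s own representations rather than from words, pixels or audio samples, the same three steps apply to every modality. Only the input encoding differs.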

Data Hub & the “Modern Data Stack”

The data community is starting to converge around a set of tools called the “Modern Data Stack” (MDS). The goal of the MDS is to enable data democratization: any user should be able to access data and make data-driven decisions at any time, through self-service discovery, without having to go through the IT department. Key concepts:

  • Self-service and low-code
  • SaaS and out-of-the-box services
  • Designed with data quality and governance in mind

We want to understand how we can adopt and integrate certain MDS tools and/or practices. During Innovation Day, Derk’s team will try to set up a complete end-to-end data pipeline on Azure using tools from the MDS, to determine how easily they could be incorporated into Anchormen’s Data Hub solution.

ADX – Azure Data Explorer

Real-time (timeseries) data and insights, such as IoT and web behavior, are becoming more and more vital, and we expect a growing number of these use cases in the future. The Azure Data Hub can distribute, transform and store timeseries data, but it does not yet contain a dedicated ‘timeseries database’. During this Innovation Day, Corne and Patrick’s team will explore Azure’s timeseries solution, Azure Data Explorer (ADX), also known by its codename Kusto. At Anchormen, we’ve used it for some basic use cases in recent projects and found it to be a mature and feature-rich Azure solution. Among other things, it includes:

  1. Timeseries-specific functions: datetime aggregation, line fitting (regression), seasonality detection, missing-value filling
  2. Timeseries anomaly detection and forecasting
  3. Internal transformation pipelines (update policies and materialized views)
  4. External tables, such as SQL Server tables and Parquet data in Blob Storage or Data Lake, which can be queried via ADX
  5. Possibility to run R and Python code as part of internal transformation pipelines.
  6. Built-in dashboarding similar to Azure Monitor (ADX is the database used within Azure for its monitoring services); Grafana can also be integrated
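To make the first item concrete, here is a plain-Python sketch of what ADX’s `make-series` operator, `series_fill_linear()` and `series_fit_line()` do conceptually. It aggregates timestamped events into fixed bins, interpolates empty bins and fits a regression line. This is an illustration under simplifying assumptions (integer timestamps, sum aggregation, edge gaps filled with the nearest known value), not ADX’s implementation. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def make_series(events: List[Tuple[int, float]], start: int, end: int, step: int) -> List[Optional[float]]:
    """Sum event values into bins of width `step`; bins without events stay None."""
    n_bins = (end - start) // step
    bins: List[Optional[float]] = [None] * n_bins
    for ts, value in events:
        i = (ts - start) // step
        if ts >= start and i < n_bins:
            bins[i] = value if bins[i] is None else bins[i] + value
    return bins


def fill_linear(series: List[Optional[float]]) -> List[Optional[float]]:
    """Interpolate interior gaps linearly; copy the nearest known value into edge gaps."""
    known = [i for i, v in enumerate(series) if v is not None]
    out = list(series)
    if not known:
        return out
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            frac = (i - left) / (right - left)
            out[i] = series[left] + frac * (series[right] - series[left])
    return out


def fit_line(series: List[float]) -> Tuple[float, float]:
    """Least-squares line through (bin index, value) pairs; returns (slope, intercept)."""
    n = len(series)
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

For events at t = 0, 5, 10 and 30 with 10-second bins over [0, 40), the binned series is [2.0, 3.0, None, 7.0]. Filling gives [2.0, 3.0, 5.0, 7.0], and the fitted line has a slope of about 1.7 per bin.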

During the day, we will learn more about ADX by implementing a ‘simple’ data solution, from ingestion to dashboard. This should interest engineers as well as data scientists, who can explore timeseries functions such as forecasting and outlier detection.
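ADX ships its own functions for outlier and anomaly detection, for example `series_outliers()` and `series_decompose_anomalies()`. As a feel for the idea, here is a minimal Python sketch of the textbook Tukey fence test, which flags points outside [Q1 - k*IQR, Q3 + k*IQR]. It is not ADX’s scoring implementation. The function name and the default k = 1.5 are assumptions chosen for illustration.

```python
from statistics import quantiles
from typing import List


def tukey_outliers(series: List[float], k: float = 1.5) -> List[int]:
    """Indices of points outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # Quartiles of the series (needs at least two points).
    q1, _, q3 = quantiles(series, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(series) if v < low or v > high]
```

On the series [10, 11, 10, 12, 11, 10, 50, 11], only the spike at index 6 falls outside the fences.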

Video

All sessions resulted in interesting findings, which we will share on this blog in the coming weeks, so stay tuned! If you want to see how the day went, check out the video below (Eraneos powered by the Anchormen Data & AI team):

By Eraneos Netherlands
