Microsoft open-sources SynapseML for developing AI pipelines

Microsoft today announced the release of SynapseML (previously MMLSpark), an open source library designed to simplify the creation of machine learning pipelines. With SynapseML, developers can build “scalable and intelligent” systems for solving challenges across domains, including text analytics, translation, and speech processing, Microsoft says.

“Over the past five years, we have worked to improve and stabilize the SynapseML library for production workloads. Developers who use Azure Synapse Analytics will be pleased to learn that SynapseML is now generally available on this service with enterprise support,” Microsoft software engineer Mark Hamilton wrote in a blog post.

Scaling up AI

Building machine learning pipelines can be difficult even for the most seasoned developer. For starters, composing tools from different ecosystems requires considerable code, and many frameworks aren’t designed with server clusters in mind.

Despite this, there’s increasing pressure on data science teams to get more machine learning models into use. While adoption of AI and analytics continues to rise, an estimated 87% of data science projects never make it to production. According to a recent Algorithmia survey, 22% of companies take between one and three months to deploy a model so it can deliver business value, while 18% take more than three months.

SynapseML aims to address the challenge by unifying existing machine learning frameworks and Microsoft-developed algorithms in a single API that is usable across Python, R, Scala, and Java. It lets developers combine frameworks for use cases that require more than one, such as search engine creation, while training and evaluating models on resizable clusters of computers.

As Microsoft explains on the project’s website, SynapseML expands Apache Spark, the open source engine for large-scale data processing, in several new directions. “[The tools in SynapseML] allow users to craft powerful and highly-scalable models that span multiple [machine learning] ecosystems. SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models and use their Spark clusters for massive networking workflows.”


SynapseML also enables developers to use models from different machine learning ecosystems through the Open Neural Network Exchange (ONNX), a framework and runtime co-developed by Microsoft and Facebook. With the integration, developers can execute a variety of classical and deep learning models with only a few lines of code.

Beyond this, SynapseML introduces new algorithms for personalized recommendation and contextual bandit reinforcement learning using Vowpal Wabbit, an open source machine learning library originally developed at Yahoo! Research. In addition, the API features capabilities for “unsupervised responsible AI.” These include tools for understanding dataset imbalance (e.g., whether “sensitive” dataset features like race or gender are over- or under-represented) without the need for labeled training data, as well as explainability dashboards that show why models make certain predictions and how to improve the training datasets.
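As a rough illustration of what measuring dataset imbalance without labels can look like, the sketch below counts how often each value of a sensitive feature appears and compares that share to an equal split. It is plain Python and does not use SynapseML's API; the helper name `representation_gaps`, the sample records, and the choice of a uniform reference are all assumptions made for illustration.

```python
from collections import Counter


def representation_gaps(records, feature):
    """Return {value: observed_share - uniform_share} for one feature.

    Hypothetical helper for illustration only; not part of SynapseML.
    No labels are needed: only the feature column itself is inspected.
    """
    counts = Counter(row[feature] for row in records)
    if not counts:
        return {}  # nothing to measure on an empty dataset
    total = sum(counts.values())
    # Assumed reference: every observed value is equally represented.
    uniform_share = 1.0 / len(counts)
    return {value: count / total - uniform_share for value, count in counts.items()}


# Made-up sample data: three of the four rows share the same value.
sample = [
    {"gender": "female", "age": 34},
    {"gender": "male", "age": 41},
    {"gender": "male", "age": 29},
    {"gender": "male", "age": 52},
]

gaps = representation_gaps(sample, "gender")
for value, gap in sorted(gaps.items()):
    # A positive gap means the value is over-represented relative to the reference.
    print(f"{value}: {gap:+.2f}")
```

A positive gap marks a value that appears more often than the reference share, a negative gap one that appears less often. In practice the reference would more likely come from a target population than from an equal split.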

Where labeled datasets don’t exist, unsupervised learning, often in the form of self-supervised learning, can help to fill the gaps in domain knowledge. For example, Facebook’s recently announced SEER, an unsupervised model, was trained on a billion images and achieved state-of-the-art results on a range of computer vision benchmarks. Unfortunately, unsupervised learning doesn’t eliminate the potential for bias or flaws in the system’s predictions. Some experts theorize that removing these biases might require specialized training of unsupervised models with additional, smaller datasets curated to “unteach” biases.

“Our goal is to free developers from the hassle of worrying about the distributed implementation details and enable them to deploy them into a variety of databases, clusters, and languages without needing to change their code,” Hamilton continued.

