UPDATED 00:01 EDT / OCTOBER 21 2024

AI

IBM releases new Granite foundation models under ‘permissive’ Apache license

Furthering its drive to build a distinctive position in enterprise artificial intelligence, IBM Corp. today is rolling out a series of new language models and tools to ensure their responsible use.

The company is also unveiling a new generation of its watsonx Code Assistant for application development and modernization. All of these new capabilities are being bundled together in a multimodel platform for use by the company’s 160,000 consultants.

The new Granite 3.0 8B and 2B models come in “Instruct” and “Guardian” variants used for training and risk/harm detection, respectively. Both will be available under an Apache 2.0 license, which Rob Thomas (pictured), IBM’s senior vice president of software and chief commercial officer, called “the most permissive license for enterprises and partners to create value on top.” The open-source license allows models to be deployed for as little as $100 per server, with intellectual property indemnification aimed at giving enterprise customers confidence in merging their data with the IBM models.

“We’ve gone from a world of ‘plus AI,’ where clients were running their business and adding AI on top of it, to a notion of AI first, which is companies building their business model based on AI,” Thomas said. IBM intends to lead in the use of AI for information technology automation through organic development and its acquisitions and pending acquisitions of infrastructure-focused firms like Turbonomic Inc., Apptio Inc. and HashiCorp Inc.

“The book of business that we have built on generative AI is now $2 billion-plus across technology and consulting,” Thomas said. “I’m not sure we’ve ever had a business that has scaled at this pace.”

The Instruct versions of Granite, which are used for training, come in 8 billion- and 2 billion-parameter versions. They were trained on more than 12 trillion tokens of data spanning 12 natural languages and 116 programming languages, making them capable of coding, documentation and translation.

By year’s end, IBM said, it plans to extend the foundational models to a 128,000-token context length with multimodality. That refers to enhancing a model’s ability to process significantly longer input sequences and handle multiple data types simultaneously. Context length is the number of tokens (words, symbols or other units of input data) that the AI model can process and retain. Typical models have context lengths of between 1,000 and 8,000 tokens.
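
To make the idea of a context window concrete, here is a minimal Python sketch. It assumes whitespace splitting as a stand-in for a real tokenizer, and the function name fit_to_context is hypothetical rather than part of any IBM tooling. It shows only the basic consequence of a fixed context length: tokens that do not fit are dropped.

```python
def fit_to_context(tokens, context_length):
    """Keep only the most recent tokens that fit in the model's window."""
    if context_length <= 0:
        raise ValueError("context_length must be positive")
    # Anything older than the last `context_length` tokens is discarded.
    return tokens[-context_length:]


# Whitespace splitting stands in for a real tokenizer (an assumption).
tokens = "the quick brown fox jumps over the lazy dog".split()
window = fit_to_context(tokens, 4)
print(window)  # ['over', 'the', 'lazy', 'dog']
```

A longer context length simply means fewer tokens have to be dropped before the model sees the input.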

“IBM is taking the right approach in my view,” said Dave Vellante, chief analyst at SiliconANGLE’s sister market research firm theCUBE Research. “Rather than trying to be the biggest LLM and compete head-on with those consumer models like ChatGPT and Llama, it’s focusing on smaller, more efficient and cost-effective models.”

Enterprise workhorses

IBM said the new Granite models are designed as enterprise “workhorses” for tasks such as retrieval-augmented generation or RAG, classification, summarization, agent training, entity extraction and tool use. They can be trained with enterprise data to deliver the task-specific performance of much larger models at up to 60 times lower cost. Internal benchmarks showed the Granite 8B model achieving better performance than comparable models from Google LLC and Mistral AI SAS and equivalent performance to comparable models from Meta Platforms Inc.

An accompanying technical report and responsible use guide provide extensive documentation of the datasets used to train the models, details of the filtering, cleansing and curation steps that were applied, and comparative benchmark data.

An updated release of the pretrained Granite Time Series models that IBM released earlier this year is trained on three times more data and provides greater modeling flexibility with support for external variables and rolling forecasts.

“IBM has an opportunity to deliver small language models that are domain-specific,” said Vellante. “Here’s where IBM can go after the 99% of data that’s not been trained on the entire internet corpus. Rather, IBM can go after proprietary use cases that can drive greater customer differentiation.”

The Granite Guardian 3.0 models are intended to provide safety protections by checking user prompts and model responses for a variety of risks. “You can concatenate both on the input before you make the inference query and the output to prevent the core model from jailbreaks and to prevent violence, profanity, et cetera,” said Dario Gil, senior vice president and director of research at IBM. “We’ve done everything possible to make it as safe as possible.”

Jailbreaks are malicious attempts to bypass the restrictions or safety measures imposed on an AI system to make it behave in unintended or potentially harmful ways. Guardian also performs RAG-specific checks such as context relevance, answer relevance and “groundedness,” which refers to the extent to which the model is connected to and informed by real-world data, facts or context.
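
The pattern Gil describes, screening both the prompt and the response around the core model, can be sketched in a few lines of Python. This is an illustration only: the keyword list and the names toy_risk_check and guarded_generate are hypothetical, and a simple substring match stands in for a trained guard model such as Granite Guardian.

```python
from typing import Callable

# Toy risk list standing in for a trained guard model (hypothetical).
BLOCKED_TERMS = {"bomb", "ignore previous instructions"}


def toy_risk_check(text: str) -> bool:
    """Return True if the text contains any blocked term."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def guarded_generate(
    prompt: str,
    model: Callable[[str], str],
    is_risky: Callable[[str], bool] = toy_risk_check,
) -> str:
    """Check the prompt before inference and the response after it."""
    if is_risky(prompt):
        return "[blocked: risky prompt]"
    response = model(prompt)
    if is_risky(response):
        return "[blocked: risky response]"
    return response


# Usage with a stand-in model that always returns a fixed reply.
print(guarded_generate("What is RAG?", lambda p: "Retrieval-augmented generation."))
```

Because the check wraps the model call rather than modifying it, the same guard could in principle sit in front of any core model.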

AI at the edge

A set of smaller models called Granite Accelerators and Mixture of Experts are intended for low-latency and CPU-only applications. MoE is a type of machine learning architecture that combines multiple specialized models and dynamically selects and activates only a subset of them to enhance efficiency.
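
A rough Python sketch of that routing idea follows. It is not IBM's architecture: the experts are toy functions, and the gate scores are passed in by hand, whereas a real MoE layer computes them from the input with a learned router. The sketch shows only the core mechanism of activating the top-scoring subset of experts and blending their outputs.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, experts, gate_scores, top_k=2):
    """Run only the top_k highest-scoring experts and blend their outputs."""
    ranked = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)
    active = ranked[:top_k]
    weights = softmax([gate_scores[i] for i in active])
    output = sum(w * experts[i](x) for w, i in zip(weights, active))
    return output, active


# Four toy experts; only two are evaluated for this input.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
gate_scores = [0.1, 2.0, -1.0, 2.0]  # hand-picked stand-in for a learned router
output, active = moe_forward(3, experts, gate_scores)
print(output, active)  # 7.5 [1, 3]
```

The efficiency gain comes from the experts that are never called: total capacity grows with the number of experts, while the work per input grows only with top_k.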

“Accelerator allows you to implement speculative decoding so you can achieve twice the throughput of the core model with no loss of quality,” Gil said. The MoE model can be trained with 10 trillion tokens but uses only 800 million parameters during inferencing for efficiency in edge use cases.
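
Speculative decoding, in its simplest greedy form, can be sketched as follows. This is a simplified illustration and not IBM's Accelerator implementation: target_next and draft_next are hypothetical toy functions standing in for a large model and a small draft model, and verification is written as a loop where a real system would score all drafted tokens in one batched pass of the large model.

```python
def speculative_decode(prefix, target_next, draft_next, max_new_tokens, k=4):
    """Greedy speculative decoding: draft k tokens, keep those the target agrees with."""
    tokens = list(prefix)
    goal = len(tokens) + max_new_tokens
    while len(tokens) < goal:
        # 1. The cheap draft model proposes up to k tokens.
        proposal = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft_next(tokens + proposal))
        # 2. The target model verifies each proposed token in order.
        for i, guess in enumerate(proposal):
            expected = target_next(tokens + proposal[:i])
            if guess != expected:
                # Keep the agreed prefix, then take the target's own token.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            tokens.extend(proposal)  # every drafted token was accepted
    return tokens


def target_next(seq):
    """Toy 'large' model: counts upward modulo 10."""
    return (seq[-1] + 1) % 10


def draft_next(seq):
    """Toy 'small' model: agrees with the target except after a 4."""
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10


print(speculative_decode([0], target_next, draft_next, max_new_tokens=8))
# [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Every emitted token is one the target model would have chosen itself, which is why the technique can raise throughput without changing output quality. The speedup depends on how often the draft model guesses correctly.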

The Instruct and Guardian variants of Granite 8B and 2B models are available immediately for commercial use on IBM’s watsonx platform. A selection of Granite 3.0 models will also be available on partner platforms like Nvidia Corp.’s NIM stack and Google’s Vertex. The entire suite of Granite 3.0 models and the updated Time Series models are available for download on Hugging Face Inc.’s open-source platform and Red Hat Enterprise Linux.

The new Granite 3.0-based watsonx Code Assistant supports the C, C++, Go, Java and Python languages with new application modernization capabilities for enterprise Java applications. IBM said the assistant has yielded 90% faster code documentation for certain tasks within its software development business. The code capabilities are accessible through a Visual Studio Code extension called IBM Granite.Code.

More, better agents

New tools for developers include agentic frameworks, integrations with existing environments and low-code automations for common use cases such as RAG and agents.

With agentic AI, or systems that are capable of autonomous behavior or decision-making, set to become the next big wave in AI development, IBM also said it’s equipping its consulting division with a multimodal agentic platform. The new Consulting Advantage for Cloud Transformation and Management and Consulting Advantage for Business Operations consulting lines will include domain-specific AI agents, applications and methods trained on IBM intellectual property and best practices that consultants can apply to their clients’ cloud and AI projects.

About 80,000 IBM consultants are currently using Consulting Advantage, with most deploying only one or two agents at a time, said Mohamad Ali, senior vice president and head of IBM Consulting. As usage grows, however, IBM Consulting will need to support over 1.5 million agents, making Granite’s economics “absolutely essential because we will continue to scale this platform and we needed to be very cost-efficient,” he said.

“IBM is dramatically lowering the cost of training and running LLMs with more than good enough accuracy,” Vellante said. “So think of 1/10th the cost at the same or better performance with roughly equivalent accuracy. This is quite a massive advantage that IBM has, and it is doing so with an open-source and partner mindset.”

The key for IBM now, he said, is “aligning all the parts of its business and leveraging its vast research capability. So for example, IBM has its own LLM (Granite), it has partnerships with several other LLM players, it has watsonx.ai, watson for data, watson for governance, Red Hat, InstructLab, data and analytics, industry knowledge through consulting, infrastructure, silicon expertise and software. If it can bring all that together for clients, it will become a major force in my view.”

That could benefit IBM overall, he added. “On balance I’m really encouraged by IBM’s direction and wrote on LinkedIn many months ago that I haven’t been this excited about IBM in over a decade. Since then the stock’s hit an all-time high and I think it has great prospects ahead.”

Photo: SiliconANGLE
