
Google has expanded its Gemma family of AI models by introducing PaliGemma, an open vision-language model (VLM), and announcing Gemma 2, the next generation of Gemma models built on a new architecture. These developments aim to improve performance and efficiency across a range of AI applications.
PaliGemma: A Versatile Vision-Language Model
PaliGemma is designed to handle a range of vision-language tasks, including image and video captioning, visual question answering, text recognition within images, object detection, and segmentation. Inspired by the PaLI-3 vision-language models, PaliGemma pairs the SigLIP vision encoder with the Gemma language model, aiming to deliver strong performance at a compact size. Developers can access PaliGemma on platforms such as GitHub, Hugging Face, Kaggle, and Vertex AI.
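As a concrete illustration, the sketch below shows how a PaliGemma checkpoint could be loaded for image captioning through the Hugging Face Transformers integration; the model ID, "caption en" prompt format, and image URL are assumptions drawn from typical published usage rather than details from the announcement.

```python
# Minimal sketch: image captioning with a PaliGemma checkpoint via Hugging Face
# Transformers. The model ID and the "caption en" task prompt are assumptions
# based on the public model cards, not details from the announcement itself.
import requests
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-mix-224"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

# Any RGB image works; this URL is a placeholder.
image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)

# PaliGemma prompts are short task prefixes, e.g. "caption en" or "detect cat".
inputs = processor(text="caption en", images=image, return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(generated[0], skip_special_tokens=True))
```

The same processor-plus-generate pattern covers the other tasks the model supports, such as visual question answering, by swapping in a different task prompt.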
Gemma 2: Next-Generation Language Model
Scheduled for formal launch in the coming weeks, Gemma 2 introduces a new architecture aimed at delivering significant improvements in performance and efficiency. At 27 billion parameters, Gemma 2 offers performance comparable to Llama 3 70B at less than half the size, making it more cost-effective to deploy. Its efficient design allows it to run on less than half the compute required by comparable models. For fine-tuning, Gemma 2 is compatible with solutions ranging from Google Cloud to community tools like Axolotl.
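To give a sense of what such fine-tuning could look like, the following sketch applies parameter-efficient (LoRA) tuning to a Gemma-family checkpoint with Hugging Face Transformers and PEFT; the "google/gemma-2-27b" model ID is a placeholder assumption, since Gemma 2 had not yet shipped at the time of the announcement, and the adapter settings are illustrative.

```python
# Minimal sketch: LoRA fine-tuning setup for a Gemma-family checkpoint using
# Hugging Face Transformers + PEFT. The model ID below is a hypothetical
# placeholder for a Gemma 2 release; adapter hyperparameters are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "google/gemma-2-27b"  # hypothetical checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Attach low-rank adapters to the attention projections so only a small
# fraction of the weights is trained.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# From here, the wrapped model plugs into a standard Trainer or SFT loop.
```

Tools like Axolotl wrap essentially this workflow behind a configuration file, which is what makes the model straightforward to adapt on modest hardware.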
LLM Comparator: Enhancing Responsible AI Development
In addition to these model updates, Google has released the LLM Comparator as an open-source tool within its Responsible Generative AI Toolkit. This interactive data visualization tool assists developers in conducting model evaluations by enabling side-by-side comparisons of model responses to assess their quality and safety.
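To make the side-by-side workflow concrete, the sketch below assembles paired responses from two models into a JSON file of the general shape such a comparison tool consumes; the field names are illustrative assumptions for this example, not the LLM Comparator's documented schema.

```python
# Illustrative sketch: collecting paired responses from two models into a JSON
# file for side-by-side evaluation. Field names here are assumptions made for
# illustration; consult the LLM Comparator repository for its actual schema.
import json

comparison = {
    "models": [{"name": "model_a"}, {"name": "model_b"}],
    "examples": [
        {
            "input_text": "Summarize the water cycle in one sentence.",
            "output_text_a": "Water evaporates, condenses into clouds, and falls as rain.",
            "output_text_b": "The water cycle moves water between oceans, air, and land.",
        }
    ],
}

with open("comparison.json", "w") as f:
    json.dump(comparison, f, indent=2)
```

A file like this can then be loaded into the visualization to inspect, example by example, where the two models' answers diverge in quality or safety.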
These advancements reflect Google’s ongoing commitment to providing developers and researchers with robust tools for responsible AI development, enhancing both the capabilities and accessibility of AI technologies.