Posted by Heng-Tze Cheng, Senior Staff Software Engineer and Romal Thoppilan, Senior Software Engineer, Google Research, Brain Team

Language models are becoming more capable than ever before and are helpful in a variety of tasks - translating one language into another, summarizing a long document into a brief highlight, or answering information-seeking questions. Among these, open-domain dialog, where a model needs to be able to converse about any topic, is probably one of the most difficult, with a wide range of potential applications and open challenges. In addition to producing responses that humans judge as sensible, interesting, and specific to the context, dialog models should adhere to Responsible AI practices, and avoid making factual statements that are not supported by external information sources.

Today we’re excited to share recent advances in our “LaMDA: Language Models for Dialog Applications” project. In this post, we’ll give an overview on how we’re making progress towards safe, grounded, and high-quality dialog applications. LaMDA is built by fine-tuning a family of Transformer-based neural language models specialized for dialog, with up to 137B model parameters, and teaching the models to leverage external knowledge sources.

Defining objectives and metrics is critical to guide training dialog models. LaMDA has three key objectives - Quality, Safety, and Groundedness - each of which we measure using carefully designed metrics:

Quality: We decompose Quality into three dimensions, Sensibleness, Specificity, and Interestingness (SSI), which are evaluated by human raters. Sensibleness refers to whether the model produces responses that make sense in the dialog context (e.g., no common sense mistakes, no absurd responses, and no contradictions with earlier responses). Specificity is measured by judging whether the system's response is specific to the preceding dialog context, and not a generic response that could apply to most contexts (e.g., “ok” or “I don’t know”). Finally, Interestingness measures whether the model produces responses that are also insightful, unexpected or witty, and are therefore more likely to create better dialog.
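To make the SSI evaluation concrete, the sketch below shows one way per-response rater labels could be aggregated into sensibleness, specificity, and interestingness rates. It is a minimal illustration under assumed inputs: the RatedResponse structure, its field names, and the simple averaging are our own assumptions, not LaMDA’s actual evaluation pipeline.

```python
# Illustrative sketch only: aggregates hypothetical human-rater labels into
# SSI rates. The data structure and field names are assumptions, not the
# actual LaMDA evaluation tooling.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RatedResponse:
    sensible: bool      # rater judged the response sensible in context
    specific: bool      # rater judged it specific to the preceding context
    interesting: bool   # rater judged it insightful, unexpected, or witty


def ssi_rates(responses: List[RatedResponse]) -> Dict[str, float]:
    """Return the fraction of responses labeled sensible, specific, and interesting."""
    n = len(responses)
    if n == 0:
        return {"sensibleness": 0.0, "specificity": 0.0, "interestingness": 0.0}
    return {
        "sensibleness": sum(r.sensible for r in responses) / n,
        "specificity": sum(r.specific for r in responses) / n,
        "interestingness": sum(r.interesting for r in responses) / n,
    }


ratings = [
    RatedResponse(sensible=True, specific=True, interesting=False),
    RatedResponse(sensible=True, specific=False, interesting=False),  # e.g., a generic "ok"
    RatedResponse(sensible=True, specific=True, interesting=True),
]
print(ssi_rates(ratings))  # sensibleness 3/3, specificity 2/3, interestingness 1/3
```

In practice each response would typically be judged by several raters and their labels aggregated first, but the final averaging step would look much the same.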
Safety: We’re also making progress towards addressing important questions related to the development and deployment of Responsible AI. Our Safety metric is composed of an illustrative set of safety objectives that captures the behavior that the model should exhibit in a dialog. These objectives attempt to constrain the model’s output to avoid any unintended results that create risks of harm for the user, and to avoid reinforcing unfair bias. For example, these objectives train the model to avoid producing outputs that contain violent or gory content, promote slurs or hateful stereotypes towards groups of people, or contain profanity. Our research towards developing a practical Safety metric represents very early work, and there is still a great deal of progress for us to make in this area.

Groundedness: The current generation of language models often generates statements that seem plausible, but actually contradict facts established in known external sources. This motivates our study of groundedness in LaMDA. Groundedness is defined as the percentage of responses with claims about the external world that can be supported by authoritative external sources, as a share of all responses containing claims about the external world. A related metric, Informativeness, is defined as the percentage of responses with information about the external world that can be supported by known sources, as a share of all responses. Therefore, casual responses that do not carry any real world information (e.g., “That’s a great idea”) affect Informativeness but not Groundedness.
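To illustrate how these two definitions differ only in their denominators, here is a minimal sketch, assuming each response has already been labeled with whether it makes a claim about the external world and whether that claim can be supported by a known source. The LabeledResponse structure and field names are hypothetical, not part of LaMDA’s actual tooling.

```python
# Illustrative sketch only: computes Groundedness and Informativeness as
# defined above from hypothetical per-response labels. Field names are
# assumptions, not part of LaMDA's actual tooling.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LabeledResponse:
    makes_external_claim: bool       # does the response say something about the external world?
    claim_supported: Optional[bool]  # if so, can it be supported by an authoritative source?


def groundedness(responses: List[LabeledResponse]) -> float:
    """Share of claim-carrying responses whose claims are supported."""
    with_claims = [r for r in responses if r.makes_external_claim]
    if not with_claims:
        return 0.0
    return sum(bool(r.claim_supported) for r in with_claims) / len(with_claims)


def informativeness(responses: List[LabeledResponse]) -> float:
    """Share of all responses that carry supported external-world information."""
    if not responses:
        return 0.0
    supported = sum(r.makes_external_claim and bool(r.claim_supported) for r in responses)
    return supported / len(responses)


responses = [
    LabeledResponse(makes_external_claim=True, claim_supported=True),   # grounded claim
    LabeledResponse(makes_external_claim=True, claim_supported=False),  # unsupported claim
    LabeledResponse(makes_external_claim=False, claim_supported=None),  # "That's a great idea"
]
print(groundedness(responses))     # 0.5  (1 of 2 claim-carrying responses supported)
print(informativeness(responses))  # ~0.33 (1 of 3 responses carries supported information)
```

The casual third response lowers Informativeness because it counts in that metric’s denominator, but it leaves Groundedness untouched, exactly as described above.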