Large pre-trained generative transformers have shown remarkable performance in various natural language generation tasks, drawing on vast training datasets to grasp the intricacies of human language. Nevertheless, fine-tuning these models for specific applications presents significant challenges. The computational cost of fine-tuning scales with model size, making it expensive for researchers to work with large models. Additionally, fine-tuning on smaller datasets carries the risk of catastrophic forgetting, where the model overfits a particular task domain and loses crucial knowledge acquired during pre-training, degrading reasoning abilities such as compositional generalization and commonsense reasoning.
One method that has been developed to address this issue is prompt-tuning, which prepends trainable tokens or vectors to the input and optimizes their embeddings while keeping the base model frozen. This approach allows adaptation to new tasks with minimal data while minimizing the risk of catastrophic forgetting. Another method is the NeurAlly-Decomposed Oracles (NADO) algorithm, which strikes a balance by employing a smaller transformer model to control the base model without altering its parameters. However, how best to train this control module when there are significant distribution discrepancies, and how to reduce the additional cost of training it, remain open questions.
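To make the prompt-tuning idea concrete, the sketch below is a simplified illustration, not the implementation used in any of the papers discussed here. It prepends a small set of trainable soft-prompt embeddings to a frozen causal language model; it assumes a Hugging Face-style model that exposes get_input_embeddings() and accepts inputs_embeds and labels, and the class name SoftPromptWrapper is purely illustrative.

```python
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    """Illustrative prompt-tuning wrapper: only the soft-prompt embeddings are trained."""

    def __init__(self, base_model, num_prompt_tokens=20):
        super().__init__()
        self.base_model = base_model
        embed_dim = base_model.get_input_embeddings().embedding_dim
        # Trainable "virtual token" embeddings prepended to every input.
        self.soft_prompt = nn.Parameter(torch.randn(num_prompt_tokens, embed_dim) * 0.02)
        # Freeze the base model: its pre-trained weights are never updated,
        # which is why this approach limits catastrophic forgetting.
        for p in self.base_model.parameters():
            p.requires_grad = False

    def forward(self, input_ids, labels=None):
        token_embeds = self.base_model.get_input_embeddings()(input_ids)
        batch_size = input_ids.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch_size, -1, -1)
        inputs_embeds = torch.cat([prompt, token_embeds], dim=1)
        if labels is not None:
            # Mask out the prompt positions so no loss is computed on them.
            prompt_labels = torch.full(
                (batch_size, self.soft_prompt.size(0)), -100,
                dtype=labels.dtype, device=labels.device)
            labels = torch.cat([prompt_labels, labels], dim=1)
        return self.base_model(inputs_embeds=inputs_embeds, labels=labels)
```

Because gradients flow only into soft_prompt, the optimizer touches a few thousand parameters rather than billions, which is what makes adaptation with minimal data practical.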
There is also the GeLaTo algorithm, which enhances autoregressive text generation by integrating tractable probabilistic models (TPMs).
A team of researchers from the University of California, Los Angeles, Amazon AGI, and Samsung Research America has introduced norm-Disentangled NeurAlly-Decomposed Oracles (DiNADO), an improved parameterization of the NADO algorithm that addresses some of its limitations.
DiNADO's performance was evaluated on two main tasks: Formal Machine Translation (FormalMT) and Lexically Constrained Generation (LCG). The experiments demonstrate that DiNADO-Soft outperforms DiNADO-Hard, and a sample-efficiency study shows that its performance generalizes better.
In conclusion, this research introduces DiNADO, an enhanced parameterization of NADO that improves upon earlier methodologies and opens new pathways for more efficient controllable text generation.