TabGPT: Pretraining a Tabular Generative Pretrained Transformer Model for Heterogeneous Tabular Data in Finance
PI: Dr Qi LIU
Abstract:
The field of artificial intelligence has seen significant progress in recent years, particularly through pretrained models, which have yielded impressive results across a wide range of tasks. Tabular data is a widely used format for presenting and organizing information in the financial sector, yet it has received comparatively little attention in AI research. This project aims to address that gap by developing a novel pretraining methodology for tabular data in finance, a domain that is particularly challenging because of its complex and diverse table schemas. Our research questions focus on adapting to heterogeneous table structures, establishing a universal pretraining protocol, ensuring the generalizability and transferability of learned knowledge, and accommodating columns added incrementally over time. To tackle these challenges, we will propose a pioneering tabular GPT method that uses a module called TabUnit to represent basic table elements and a Transformer encoder to refine the resulting representations. We will also employ free-form prompts to facilitate both pretraining and finetuning. The methodology will be extensively tested and analyzed under a variety of scenarios to validate its effectiveness.
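
To make the proposed architecture concrete, the following minimal PyTorch sketch illustrates one way a TabUnit module feeding a Transformer encoder could be organized. TabUnit is the only name taken from this proposal; the shared vocabulary, the additive fusion of column-name and cell-value embeddings, the class name TabGPTSketch, and all hyperparameters are illustrative assumptions, not the actual design.

    # Illustrative sketch only: TabUnit is named in the proposal, but its
    # internals, the fusion scheme, and all hyperparameters are assumptions.
    import torch
    import torch.nn as nn

    class TabUnit(nn.Module):
        """Embed one (column name, cell value) pair into a shared token space.

        Assumption: column names and cell values are pre-tokenized into
        integer ids over a shared vocabulary and fused by addition.
        """

        def __init__(self, vocab_size: int, d_model: int):
            super().__init__()
            self.name_embed = nn.Embedding(vocab_size, d_model)
            self.value_embed = nn.Embedding(vocab_size, d_model)

        def forward(self, name_ids: torch.Tensor, value_ids: torch.Tensor) -> torch.Tensor:
            # name_ids, value_ids: (batch, n_columns) -> (batch, n_columns, d_model)
            return self.name_embed(name_ids) + self.value_embed(value_ids)

    class TabGPTSketch(nn.Module):
        """TabUnit tokens refined by a standard Transformer encoder."""

        def __init__(self, vocab_size: int = 30000, d_model: int = 256,
                     n_heads: int = 8, n_layers: int = 4):
            super().__init__()
            self.tab_unit = TabUnit(vocab_size, d_model)
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)

        def forward(self, name_ids: torch.Tensor, value_ids: torch.Tensor) -> torch.Tensor:
            return self.encoder(self.tab_unit(name_ids, value_ids))

    if __name__ == "__main__":
        model = TabGPTSketch()
        names = torch.randint(0, 30000, (2, 5))   # 2 rows, 5 columns each
        values = torch.randint(0, 30000, (2, 5))
        print(model(names, values).shape)          # torch.Size([2, 5, 256])

Because every cell is paired with its own column-name embedding, rows drawn from tables with different schemas simply become token sequences of different lengths, which is what would allow a single encoder to serve heterogeneous tables.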