python3 server.py --wbits 4 --groupsize 128 --model_type LLaMA --xformers --api
Figure 1. Starting up Oobabooga’s text-generation-webui API.
import asyncio
import json

import websockets

HOST = 'localhost:5005'
URI = f'ws://{HOST}/api/v1/stream'

async def run(context):
    # Generation parameters sent with the prompt to the streaming API.
    request = {
        'prompt': context,
        'max_new_tokens': 10,
        'do_sample': True,
        'temperature': 1.99,
        'top_p': 0.18,
        'typical_p': 1,
        'repetition_penalty': 1.15,
        'top_k': 30,
        'min_length': 5,
        'no_repeat_ngram_size': 0,
        'num_beams': 1,
        'penalty_alpha': 0,
        'length_penalty': 1,
        'early_stopping': False,
        'seed': -1,
        'add_bos_token': True,
        'truncation_length': 510,
        'ban_eos_token': True,
        'skip_special_tokens': True,
        'stopping_strings': []
    }

    async with websockets.connect(URI) as websocket:
        await websocket.send(json.dumps(request))
        # Echo the prompt first, then yield tokens as the server streams them.
        yield context
        while True:
            incoming_data = await websocket.recv()
            incoming_data = json.loads(incoming_data)
            match incoming_data['event']:
                case 'text_stream':
                    yield incoming_data['text']
                case 'stream_end':
                    return
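The streaming loop above can be exercised without a running server by stubbing the socket with a canned list of messages; the event names ('text_stream', 'stream_end') mirror those the code matches on, while the message contents here are purely illustrative:

```python
import asyncio
import json

# Canned messages imitating the API's event stream (contents are illustrative).
FAKE_MESSAGES = [
    json.dumps({'event': 'text_stream', 'text': 'Hello'}),
    json.dumps({'event': 'text_stream', 'text': ' world'}),
    json.dumps({'event': 'stream_end'}),
]

async def stream_tokens(messages):
    # Yield generated text fragments until the server signals the end,
    # the same dispatch logic as the match statement in run().
    for raw in messages:
        data = json.loads(raw)
        if data['event'] == 'text_stream':
            yield data['text']
        elif data['event'] == 'stream_end':
            return

async def main():
    parts = []
    async for token in stream_tokens(FAKE_MESSAGES):
        parts.append(token)
    return ''.join(parts)

print(asyncio.run(main()))  # → Hello world
```

Against the real API, the same `async for` loop would iterate over `run(prompt)` instead of the stub.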
Figure 2. The first 20 instances of the Kaggle dataset.
Figure 3. Example of few-shot prompt given to the LLM.
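A few-shot prompt of the general shape shown in Figure 3 can be sketched as follows; the example applicants, feature names, and wording are assumptions for illustration only, not the actual prompt used in the experiment:

```python
# Hypothetical in-context examples; the real prompt's contents differ.
EXAMPLES = [
    ("Age: 45, Income: 80000, Prior defaults: 0", "low risk"),
    ("Age: 22, Income: 12000, Prior defaults: 2", "high risk"),
]

def build_prompt(applicant):
    # Assemble instruction, labeled examples, and the unlabeled query,
    # leaving "Risk:" open for the model to complete.
    lines = ["Classify each credit card applicant as high risk or low risk."]
    for features, label in EXAMPLES:
        lines.append(f"Applicant: {features}\nRisk: {label}")
    lines.append(f"Applicant: {applicant}\nRisk:")
    return "\n\n".join(lines)

print(build_prompt("Age: 30, Income: 50000, Prior defaults: 1"))
```

The resulting string would then be passed as the `context` argument of `run()`.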
Figure 4. Screenshot of our code running.
Figure 5. Evaluation matrix built for every AI model run.
Figure 6. Results of the few-shot experiment. "High" and "low" indicate whether the model deemed the applicant "high risk" or "low risk"; only low-risk applicants receive credit cards.