Local models from Hugging Face

Several large language models (LLMs) were selected from Hugging Face by popularity for a test run.

NousResearch/Llama-2-13b-hf

The original Llama 2 access request has still not been approved by Meta, so the mirror of the model published by NousResearch was used here.

First, build a text-generation pipeline.

from langchain.llms.huggingface_pipeline import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.prompts import PromptTemplate

model_id = "NousResearch/Llama-2-13b-hf"
# Shard the weights across available GPUs and pick the dtype automatically
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto",
                                             torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
# Sampling settings (repetition_penalty, no_repeat_ngram_size) curb repetitive output
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer,
                max_new_tokens=256,
                eos_token_id=tokenizer.eos_token_id,
                do_sample=True,
                repetition_penalty=1.3,
                no_repeat_ngram_size=5,
                temperature=0.7,
                top_k=10,
                top_p=0.8,
                )
hf_l = HuggingFacePipeline(pipeline=pipe)
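
As a quick sanity check, the raw Transformers pipeline can be called directly before wrapping it in LangChain; the prompt and token limit below are illustrative choices, not part of the original run.

# Optional sanity check: call the pipeline directly
# (the call-time max_new_tokens just keeps the check short)
out = pipe("Question: What does CWL stand for?\nAnswer:", max_new_tokens=64)
print(out[0]["generated_text"])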

Then, connect the prompt with the pipeline into a question-answering chain using LangChain.

template = """You are an assistant for question-answering tasks. 
If you don't know the answer, just say that you don't know. 
Keep the answer concise.

Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

chain = prompt | hf_l

question = "What is common workflow language?"
print(chain.invoke({"question": question}))

Here is the answer:

Common means it can be used in multiple projects or systems. Workflow implies a sequence of steps and some kind of control flow logic (either sequential or parallel). Language would imply something with syntax like Python/Java etc… I believe this to mean BPMN which stands for business process model notation as defined here https://en.wikipedia.org/wiki/Business_Process_Model_and_Notation. It also looks very similar to UML activity diagrams but has more specific semantics around timing constraints on activities such as start times end durations etc.. The standard was developed jointly between OMG & ISO so there should not really be any compatibility issues across different platforms unless they choose not implement support properly themselves e.g., IBM Blueworks vs Microsoft Visio Pro etc…. Hopefully this helps!

An answer from nowhere… the model talks about BPMN instead of the Common Workflow Language.
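
As a side note before moving on: because prompt | hf_l is a LangChain runnable, the same chain can also process several questions in a single batch call. The second question below is an illustrative example, not part of the original test.

# Illustrative: send several questions through the chain in one call
questions = [
    {"question": "What is common workflow language?"},
    {"question": "Which file format does a CWL workflow use?"},
]
for answer in chain.batch(questions):
    print(answer)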

01-ai/Yi-34B

The most recently released model among those tested. The setup mirrors the Llama-2 run, except top_k is raised to 40.
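
Loading Yi-34B while the 13B Llama-2 model is still resident will likely exhaust GPU memory. A minimal sketch for releasing the previous model first, assuming PyTorch with CUDA and that the objects from the section above are still in scope:

# Free the previously loaded Llama-2 model before loading Yi-34B
# (assumes chain, pipe, hf_l and model from the section above still exist)
import gc
import torch

del chain, pipe, hf_l, model
gc.collect()
torch.cuda.empty_cache()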

from langchain.llms.huggingface_pipeline import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.prompts import PromptTemplate

model_id = "01-ai/Yi-34B"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", 
                                            torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer,
                max_new_tokens=256,
                eos_token_id=tokenizer.eos_token_id,
                do_sample=True,
                repetition_penalty=1.3,
                no_repeat_ngram_size=5,
                temperature=0.7,
                top_k=40,
                top_p=0.8,
            )
hf_yi = HuggingFacePipeline(pipeline=pipe)

template = """You are an assistant for question-answering tasks. 
If you don't know the answer, just say that you don't know. 
Keep the answer concise.

Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

chain = prompt | hf_yi
question = "What is common workflow language?"
print(chain.invoke({"question": question}))

The Common Workflow Language (CWL) defines a specification and software framework to describe end-to-end analysis pipelines in bioinformatics using human readable YAML documents. Thus CWL provides tools for defining data processing parameters as well as job dependencies between steps within or across platforms such as Docker containers, Singularity images etc. So the answer is “A standard format used to define pipeline specifications”.

The answer is impressively correct!