How to Install a ChatGPT-Like Model Locally: Now It's Possible

Just a few months ago, Stanford released a brand-new AI model called Alpaca that could be considered nearly as good as ChatGPT, yet it is based on Meta's LLaMA 7B model.

Installing LLaMA and Alpaca on your own computer takes no more than a few minutes, and no high-end hardware is required.

Alpaca is incredibly cheap to train because it is so lightweight. The shocking part is that it cost only about $600 to train, yet it behaves qualitatively similarly to OpenAI's davinci model (text-davinci-003).

Alpaca takes Meta's publicly available LLaMA model and uses it in conjunction with GPT-3 to essentially train itself on thousands of common instructions. From my own testing, it performs very well, though not quite as well as GPT-4, which came out just last week. Given its size and efficiency, OpenAI should definitely be looking over its shoulder.

Stanford's announcement about Alpaca explains why it was built, how it was built, and how fast it is. I think it's a super interesting read.

Alpaca starts from 175 self-instruct seed tasks, uses OpenAI's davinci model to expand them into 52,000 instruction-following examples, and then fine-tunes Meta's LLaMA 7B model on those examples.
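To make "instruction-following example" a bit more concrete, here is a minimal Python sketch of how one such record might be turned into a single training string. The field names (instruction, input, output) and the template wording are modeled on the format in the Stanford Alpaca repository, but this is a simplified version, and the sample record and the build_prompt helper are made up for illustration. It is not the official training pipeline.

```python
# Hypothetical sample record; the real dataset has about 52,000 of these.
EXAMPLE = {
    "instruction": "Give three tips for staying healthy.",
    "input": "",
    "output": "Eat a balanced diet, exercise regularly, and get enough sleep.",
}

# Simplified header, modeled on the published template (an assumption).
HEADER = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_prompt(record):
    """Turn one instruction-following record into a single training string."""
    parts = [HEADER, "", "### Instruction:", record["instruction"], ""]
    # The optional input section is only included when the record has one.
    if record["input"]:
        parts += ["### Input:", record["input"], ""]
    parts += ["### Response:", record["output"]]
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt(EXAMPLE))
```

Fine-tuning then amounts to showing the base model many strings like this one so that it learns to continue an instruction with a fitting response.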

To install the models, find the GitHub user cocktailpeanut, who has a repo called dalai (as in Dalai Llama). With it you can install everything in just a few steps. You do need Node to run it.

That's really important. I actually had an older version of Node running, but you need Node 18 or above, so make sure you install the latest version of Node first.
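If you are not sure which version you have, running node --version in a terminal prints it. As a small sketch, the same check can be scripted in Python. The helper names below are my own, and the only assumption about Node is that node --version prints a string such as v18.15.0.

```python
import shutil
import subprocess

MIN_NODE_MAJOR = 18  # the minimum version mentioned in this guide


def node_major(version_string):
    """Extract the major version from a string such as 'v18.15.0'."""
    return int(version_string.strip().lstrip("v").split(".")[0])


def node_is_new_enough(version_string, minimum=MIN_NODE_MAJOR):
    """Return True when the given version meets the minimum major version."""
    return node_major(version_string) >= minimum


def installed_node_version():
    """Return the output of `node --version`, or None if node is not on PATH."""
    if shutil.which("node") is None:
        return None
    result = subprocess.run(
        ["node", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Calling installed_node_version() and passing the result to node_is_new_enough() tells you whether an upgrade is needed before going any further.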

Once you have Node 18 or above installed, the first thing to do is install dalai. The command is: npm install dalai

Once you have that installed, the next step is to use dalai to install the two models, Alpaca and LLaMA.

Here's the command: npx dalai llama install 7B 13B. That installs two different LLaMA models, one called 7B and one called 13B. You don't need both of them; you can install just one. But if you want to play around with multiple models and see which one you like best, go ahead and install both.

You can also install the Alpaca model. Both of these take just a few minutes to download. Here's the command: npx dalai alpaca install 7B. Then hit Enter. It looks very similar to the last one.

Once you have both of those models installed, the next step is to run the server with npx dalai serve, which spins up a localhost server with those models.
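Putting the steps so far in one place, here is a small Python sketch that builds the commands in order and prints them. It assumes the tool is the dalai package from the cocktailpeanut repo and that the commands take the form described above. The helper names are hypothetical, and the script only prints the commands, it does not run them.

```python
import shlex


def install_command(model_family, sizes):
    """Build the argument list for installing one model family."""
    return ["npx", "dalai", model_family, "install", *sizes]


def serve_command():
    """Build the argument list for starting the local web server."""
    return ["npx", "dalai", "serve"]


def setup_steps():
    """Return every command from this walkthrough, in order."""
    return [
        ["npm", "install", "dalai"],
        install_command("llama", ["7B", "13B"]),  # either size alone also works
        install_command("alpaca", ["7B"]),
        serve_command(),
    ]


if __name__ == "__main__":
    for step in setup_steps():
        print(shlex.join(step))
```

Keeping each command as a list of arguments makes it easy to drop a model size you don't want before pasting the printed lines into a terminal.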

Let's take a look. We'll grab the URL and open it, and here it is. You can see I have three models installed: Alpaca 7B, LLaMA 13B, and LLaMA 7B. Let's give it a try with the example prompt from the GitHub repo.

The prompt is "How are the llama and the alpaca related?" And there it is: it starts spitting out the answer. At the top of the page there are a bunch of different settings, and I'll show you what they are.

You're actually going to see an issue as it outputs this response. The first setting is called n_predict, and that is the number of tokens it will respond with. And there's the issue.

As it was outputting the response, it stopped because we reached 200 tokens. You can increase that to whatever you want. We'll try it again, and now we get the full response. So let's take a look at all the different settings.

cocktailpeanut's repo explains what most of the settings are. There is prompt, obviously; model, which is the model you actually want to use; url, if you're going to connect to a remote server (we're not); threads, the number of threads your computer will use; and n_predict, which is really the important setting.

If you want or need a really long response, go ahead and set that a lot higher.
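The cutoff I ran into can be pictured with a toy simulation. This is only a sketch of the idea: the generate function is hypothetical, and it treats whitespace-separated words as tokens, which is a simplification because real tokenizers split text differently.

```python
def generate(tokens, n_predict):
    """Simulate a model that stops after emitting n_predict tokens."""
    emitted = []
    for token in tokens:
        if len(emitted) >= n_predict:
            break  # the budget is used up, even if the answer is unfinished
        emitted.append(token)
    truncated = len(emitted) < len(tokens)
    return emitted, truncated


# Pretend the complete answer is 250 tokens long.
full_answer = ["word"] * 250

short, cut_off = generate(full_answer, n_predict=200)
print(len(short), cut_off)

longer, cut_off = generate(full_answer, n_predict=300)
print(len(longer), cut_off)
```

With a budget of 200 the answer is cut off partway through, while a budget of 300 lets all 250 tokens through, which matches what happened when I raised the setting.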

The temperature controls how unique or adventurous a response will be. Closer to zero, you get similar responses again and again with the same prompt; closer to one, you get a lot of unique and very adventurous responses.

You can also set skip_end, which controls whether the output includes the marker that tells you the response has ended.

One last thing I want to show off is a tweet from 2022 by Will Summerlin from ARK Invest. He said AI training costs are declining at 60% year over year: it cost $5 million to train GPT-3 in 2022, but by 2023 it would cost $500 to train a model with the same performance. He said that at the end of 2022, and already we have a model with performance similar to text-davinci-003 that cost $600 or less to train, and you can install it on your local computer. Any computer: no high-end graphics card, nothing else needed. So we are way ahead of Moore's Law. I am super excited to see what's coming. I'm especially excited to install more models on my local machine, play around with them, compare them to ChatGPT, and see which one does better at which tasks.
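Coming back to the temperature setting described above, here is a rough Python sketch of the usual idea behind it: the model's scores for each candidate word are divided by the temperature before being turned into probabilities. This is the textbook softmax-with-temperature approach, not code taken from dalai or the models themselves, and the words and scores are invented for illustration.

```python
import math
import random


def softmax_with_temperature(scores, temperature):
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(words, scores, temperature, rng):
    """Pick one word at random according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(scores, temperature)
    return rng.choices(words, weights=probs, k=1)[0]


# Made-up candidates and scores for the next word.
WORDS = ["llama", "alpaca", "camel"]
SCORES = [2.0, 1.0, 0.5]

if __name__ == "__main__":
    rng = random.Random(0)
    for temperature in (0.1, 1.0):
        picks = [sample(WORDS, SCORES, temperature, rng) for _ in range(10)]
        print(temperature, picks)
```

At a temperature near zero almost all of the probability lands on the top-scoring word, so repeated runs look the same. At a temperature of one the lower-scoring words get picked regularly, which is where the more adventurous responses come from.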

