Local 8B LLM Benchmarks: MacBook M4 vs RTX 3070 vs Cloud CPU

Open source models have gotten unbelievably good, which has opened up a world of home uses for enthusiasts of all kinds. With no AI background you can have Ollama serving models on any laptop or desktop machine you have lying around, and with a little more work you can use that to power your own free¹ service for RAG, personal finance, or even code editing projects.

tl;dr – a recent Mac is just fine for development, and you can get far with a modest GPU investment. Current Intel/AMD CPU-only machines (even large cloud VMs) are not worthwhile.

But what can you expect from your personal hardware, and what can you get if you’re willing to upgrade it? I don’t mean leasing a GPU farm for a year, just spending a little on your laptop or video card to improve your inference speed and model size. I tested four open source models on a consumer-grade NVIDIA graphics card, modern Apple silicon, and a reasonably powerful cloud (CPU-based) VM to see what would work for me for development and home service.

To do that I wrote a small LLM benchmark tool I call Unladen SwaLLM² that takes a file of prompts (or one on the command line), runs them against a set of LLMs on a given Ollama server, and outputs JSON stats and, optionally, the responses for evaluation (for which I’d recommend a public AI service). I ran this against the three platforms I had available.

Hardware Tested

| Platform        | CPU                      | RAM   | GPU          | Upfront Cost  | Operating Cost |
|-----------------|--------------------------|-------|--------------|---------------|----------------|
| Apple M4 Laptop | Apple M4 (10-core)       | 16GB  | Integrated   | $1,099        | ~$0/hr         |
| GPU Server      | Intel i7-12700KF         | 32GB  | RTX 3070 8GB | ~$1,200       | ~$0.20/hr*     |
| Cloud VM        | AMD EPYC 7J13 (12 vCPU)  | 128GB | None         | Pay-as-you-go | ~$0.50/hr      |

*Electricity cost when running inference

So the good news so far is that you don’t even need an internet connection to do AI development on your laptop. But most real-world applications will require some degree of concurrency, so I also tested with 5 and 10 asynchronous requests at a time.

Performance

The first thing to review is raw speed, and no shock: GPUs rule as long as the model fits in memory (4.9 GB model files seem to be the sweet spot for the 3070’s 8 GB of VRAM).
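A back-of-the-envelope way to sanity-check the “does it fit” question. The margin numbers below are my own assumptions from testing, not anything official, and they shift with context length:

```python
# Rough heuristic: a quantized model fits comfortably when its file size,
# plus a margin for the KV cache and runtime overhead, stays under VRAM.
# The 1.2x multiplier and 1 GB overhead are assumptions, not official numbers.

def fits_in_vram(model_file_gb: float, vram_gb: float,
                 kv_margin: float = 1.2, overhead_gb: float = 1.0) -> bool:
    """True if the model (plus KV cache margin and overhead) should fit on the GPU."""
    return model_file_gb * kv_margin + overhead_gb <= vram_gb

# A 4.9 GB q4_K_M file on an 8 GB RTX 3070: just fits.
print(fits_in_vram(4.9, 8.0))   # True
# A 7.5 GB file forces partial CPU offload, which is much slower.
print(fits_in_vram(7.5, 8.0))   # False
```

Anything that fails this check ends up split between VRAM and system RAM, and throughput drops off a cliff.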

A note about the testing – The unladen-swallm repo has the prompts file I used, and the full command line was:

swallm benchmark -m llama3.1:8b-instruct-q4_K_M -m cogito:8b -m deepseek-r1:8b -m mistral:latest -t 90 -c 10 -P ./more_eval_prompts.txt -r -o output_concurrent_10.json

This tests the four models, assuming Ollama is running on localhost (otherwise pass -H <hostname:port>), with a concurrency of 10 prompts per model and a timeout of 90 seconds. The -r flag includes the responses in the JSON file, which I then gave to Claude Opus 4.5 (new model!) to review for quality. Some surprises there, too.
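If you want to post-process the stats yourself, the gist is just grouping token counts and eval times by model. A minimal sketch – the field names here (`model`, `eval_count`, `eval_duration_ns`) are illustrative placeholders, so check the actual unladen-swallm output for the real schema:

```python
# Sketch: compute per-model tokens/sec from a list of benchmark records.
# Load the records with json.load(open("output_concurrent_10.json")) or similar;
# the field names below are placeholders, not necessarily swallm's schema.
from collections import defaultdict

def tokens_per_second(results: list[dict]) -> dict[str, float]:
    totals = defaultdict(lambda: [0, 0.0])   # model -> [tokens, seconds]
    for r in results:
        totals[r["model"]][0] += r["eval_count"]
        totals[r["model"]][1] += r["eval_duration_ns"] / 1e9
    return {m: toks / secs for m, (toks, secs) in totals.items()}

sample = [
    {"model": "mistral:latest", "eval_count": 200, "eval_duration_ns": 4e9},
    {"model": "mistral:latest", "eval_count": 100, "eval_duration_ns": 2e9},
]
print(tokens_per_second(sample))  # {'mistral:latest': 50.0}
```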

Quality

All four models failed this prompt:

You have a backpack with a 15kg weight limit. Choose the optimal combination of these items—camera (5kg), laptop (3kg), tent (7kg), food pack (6kg), and water (2kg)—to maximize utility for a 2-day hiking trip. Explain your reasoning in under 120 words.

(food, tent, water. C’mon now.)

For code generation, all four models handled utility scripts competently—none will refactor a large codebase, but they’ll write your sieve of Eratosthenes. Cogito was the only one to produce a subtly incorrect bug fix. For tasks with strict format constraints (word counts, paragraph limits), deepseek-r1 was most reliable but also slowest. YMMV. For instruction following, similarly, no model was perfect, but all were good – though mistral seemed to have the most problems with format instructions (e.g., when told “write 3 paragraphs” it produces a numbered list).

Concurrency

Add concurrency and the speed gaps become more pronounced. The CPU becomes unusable, and the M4 loses a lot more ground to the GPU: latency goes up about 4x at 10 concurrent prompts versus closer to 3x for the 3070. So for production home inference, you should really spring for the ~$300 card.
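If you’d rather measure concurrency without the benchmark tool, the idea is just N simultaneous requests against Ollama’s /api/generate endpoint, each timed individually. A rough sketch – the model name and prompt are examples, and the p95 calculation is deliberately simple:

```python
# Sketch: time individual requests against a local Ollama server.
# /api/generate with {"model", "prompt", "stream": false} is Ollama's
# documented REST API; run N of these concurrently (e.g. via
# concurrent.futures.ThreadPoolExecutor) to reproduce the test above.
import json, statistics, time, urllib.request

def one_request(model: str, prompt: str,
                host: str = "http://localhost:11434") -> float:
    """Send one non-streaming generate call and return wall-clock seconds."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=90) as resp:
        resp.read()
    return time.perf_counter() - start

def latency_stats(durations: list[float]) -> dict[str, float]:
    """Mean and (naive) 95th-percentile latency over a batch of requests."""
    return {"mean": statistics.mean(durations),
            "p95": sorted(durations)[int(len(durations) * 0.95) - 1]}

# Usage (with Ollama running locally):
#   from concurrent.futures import ThreadPoolExecutor
#   with ThreadPoolExecutor(max_workers=10) as pool:   # concurrency of 10
#       times = list(pool.map(lambda _: one_request("mistral:latest",
#                                                   "Why is the sky blue?"),
#                             range(10)))
#   print(latency_stats(times))
```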

Conclusion

My conclusion: Mac laptop for development, GPU for anything real, skip AMD/Intel CPU entirely (for now). If you want to test your own setup, my code is on GitHub. Next I’m wondering about the backend – is the convenience of Ollama leaving performance on the table?

  1. “Free” apart from the electric bill ↩︎
  2. apologies to Monty Python ↩︎

Covid-19 data in Python

What did New York geeks do this spring? Obsess over pandemic data.

The best data source for US data is The COVID Tracking Project, The Atlantic’s project used by many hospitals and the US government. I was able to muck around in Python and make this GIF.

Additional data for the state shapes is from Census.gov, and population data from healthdata.gov (because I might want to break it down further). After that it is just matplotlib, pandas & geopandas. I’m hoping to find the time to clean it up and put it on GitHub, but I’m happy to share it.

Update: Replaced the GIF with a version updated through 7/25, also using a rolling mean instead of raw numbers for the line graphs (to smooth out spikes).
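For the curious, the smoothing is just a 7-day rolling mean – pandas’ `Series.rolling(7).mean()`. A pure-Python equivalent of the idea (alignment and NaN handling differ slightly from what pandas produces):

```python
# 7-day rolling mean: each point is the average of the last `window` days,
# which flattens day-of-week reporting spikes in the raw case counts.

def rolling_mean(values: list[float], window: int = 7) -> list[float]:
    out = []
    for i in range(len(values) - window + 1):
        out.append(sum(values[i:i + window]) / window)
    return out

daily = [10, 80, 20, 90, 30, 100, 40, 110]   # spiky raw counts
print(rolling_mean(daily))                   # two overlapping 7-day averages
```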

Setting up MySQL on a Raspberry Pi 3 B+

MySQL on ARM! It is not rocket surgery, but I’ve aggregated the different sources needed below, with a brief description of how to get it installed.

ARM has continued its march to the data center, with most enterprise Linux distributions supporting 64-bit ARM in the last year and a growing list of ARM devices available from a growing list of chip makers. The devices themselves have gotten easier to use and more full-featured, and the best known of these is the Raspberry Pi 3 Model B+. Sporting 1GB of SDRAM, Ethernet, WiFi, HDMI and 4 USB ports, it’s more full-featured than any of my first few computers.

As of version 8.0, MySQL is now available on ARM Linux. I’ll go through the basics of getting 8.0 installed on Oracle Linux on a Pi 3. You’ll need the Pi, a micro-SD card (fast, and at least 16 GB), a power adapter, a monitor with an HDMI cable, Ethernet, and a USB keyboard.

The alternative to the keyboard-and-monitor route is to ssh into the device once it has booted on Ethernet – but you’ll need a way to find its IP address. It showed up on my router as “rpi3”. Cases are nice, too, if you like that sort of thing.

To start you’ll want to get the latest Oracle Linux image for the Pi and put it on the SD card. It’s fairly straightforward on most platforms, but I’d recommend the open source tool Etcher if you are not familiar with tools like the Linux/UNIX dd command. Once you have the image on the card, insert it into the Pi, connect an HDMI device (TV or monitor) and a USB keyboard, and power up the Pi by plugging it in (there is no power switch).

The default password for root is ‘oracle’ (which you’ll need to change on login). The first thing you will want to do is enable the MySQL repo and update the system with the following commands:

yum install yum-utils
yum-config-manager --enable ol7_MySQL80
yum update

The last command is technically optional, but you really should update this new system (it will take a few minutes).

But the image is not large, and you have a lot of extra space left on the card. To use all that space you’ll need to enlarge the last partition, and then grow the file system (btrfs by default) into it.

If you have never used fdisk, I’ll say only: be careful, as it is easy to render your system unusable with it. Start by invoking it with the name of your device:

fdisk /dev/mmcblk0

Delete the last partition (command ‘d’, partition 4 – the default), then create a New partition (command ‘n’) with the start point of the former partition and a size of “the rest of the disk”. Luckily most of these choices are the defaults at the prompts, but watch closely, and remember to Print the current state (command ‘p’) and Write the new partition table (command ‘w’ – are you getting the idea?) when you are sure it is correct. You will get a warning, which you can ignore.

Restart the system to load your new partition table (and use the latest kernel you got from the above update):

shutdown -r now

After you log in again, grow your root file system to use all that new space:

btrfs filesystem resize max /

Finally, you are ready to install MySQL! Full installation instructions are available here, but in summary:

yum install mysql-community-server

Lastly, there is a known issue with MySQL not finding libc++, with a workaround:

ln -s /opt/oracle/oracle-armtoolset-1/root/usr/lib64 /usr/lib64/gcc7

Then start your server with systemd:

systemctl start mysqld

That’s it!

You now have a MySQL 8.0 server. If you didn’t know how to get the root password (grep ‘temporary password’ /var/log/mysqld.log), then you might now go peruse the aforementioned yum install page, and perhaps the post-installation page in the official documentation.