For those nerds who like the machine learning space and actually tinker/build/research in it, I've been having a giggle with this model - https://huggingface.co/HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive. Normally I run the Qwen 3.6 27B dense model for my day-to-day, which fits nicely on my main GPU, with a smaller MoE model loaded up on a second, smaller GPU for agentic work. The dense model only has 1 parallel path, while the MoE has 2 so multiple users in the household can use it at once. Anyone else pissing about with this stuff? I've also got a private ~3M parameter financial model I've been working on over the past year (in design for ~3 years), and the project is due to wrap up this year. I also previously worked in the ML space in healthcare, focused on machine learning for CT scan imaging (you go for a cancer scan, the model reviews the sliced CT images and flags cancerous nodules; look up Deephealth/Aidence, I no longer work for them). Also, if you're looking at pushing the limits of what is possible with your hardware, there is a large discussion on the turbo KV caching released by Google a while back - https://github.com/ggml-org/llama.cpp/discussions/20969 - with a number of posts covering caching tests on various GPUs with different models.
The only part of your post I understood was “Anyone else pissing around with this stuff”. You speak in tongues, Andy.