Queue = FIFO: First In, First Out.
Many people add a queue system to their products. Some of them do legendary things with it, making their products extremely unreliable :). Most of these solutions may look like trolling, but they actually exist in real products.
Rolling out your own queue system
The first thing people often do is roll their own queue system. Is that bad? No! It is great, as long as you don't have the constraint that data can never be lost!
If you can lose data in the queue
If you use a queue just to communicate between processes, you can use something like a Unix named pipe. In reality this is just a file. Actually, in Unix everything is a file, and this is the best design ever (if you neglect it, you should die!).
```shell
# create a named pipe (FIFO); it lives in the filesystem like any other file
mkfifo /tmp/queue
```
Next you can use it to push stuff into it, e.g.:
```shell
# one process writes into the pipe...
echo "some message" > /tmp/queue
# ...and another process reads from it
cat /tmp/queue
```
That's a shell example, but you could create the pipe and just read/write it from your own processes.
That is cool. And this is the last point where we will not see problems :)))
But we are programmers!
Yes, we are programmers, and most of us are young and full of energy. Nobody remembers the 70s; I was born in '85, so technically I would be quite mad if I remembered the 70s.
So how do we approach the problem? Some of us would build a queue inside their own code.
In C it would be simply array wrapped with mutex’es but this is unreliable and its long to write and and and…
So what do people do? They try to use READY-made products.
First big mistake
Use a key-value store as a queue. – "Let's serialize the array into XYZ and set it into a key." – "That's a good idea! Only one thing will ever write to it!"
WRONG!
Such an assumption will provide you with an insane amount of carnage in the future. And even if I know "agile says XYZ now"… actually "agile" doesn't say "take drugs and YOLO because tomorrow you may die", it says "TAKE RISKY THINGS FIRST", and this is a risky thing. It should be implemented well.
What happens in this case? Someone gets the great idea that the product should scale, adds another daemon, and this f*cks up the queue: you lose messages.
Some DBMSs can handle this problem, but naively wrapping it in a transaction will not solve it.
The scenario is: process a)
- reads the queue
- processes it (makes a pop or a push)
- saves the serialized queue back
Now imagine process b) does the same thing concurrently. Everything is blazing fast, and you get f*cked.
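The lost update is easy to reproduce. A minimal sketch in Python, with a plain dict standing in for the key-value store (the two "processes" are just interleaved reads and writes):

```python
import json

# A toy key-value store: one key holds the whole queue, serialized.
store = {"queue": json.dumps(["job1"])}

# Process a) reads the queue...
snapshot_a = json.loads(store["queue"])
# ...and so does process b), before a) has written anything back.
snapshot_b = json.loads(store["queue"])

snapshot_a.append("job-from-a")  # a) pushes its job
snapshot_b.append("job-from-b")  # b) pushes its job

store["queue"] = json.dumps(snapshot_a)  # a) saves its version
store["queue"] = json.dumps(snapshot_b)  # b) blindly overwrites it

print(json.loads(store["queue"]))  # ['job1', 'job-from-b'] -- a)'s job is gone
```

No error is raised anywhere; the message from process a) just silently disappears.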
So the DB system must know the context. This is where Riak shines: you get vector clocks, so you know that you are f*cked. You can react, and even though in 99% of cases you won't know how to resolve the conflict, at least you would know… but some specialists can disable this, because handling vector clocks is a pain and you get a PERFORMANCE BOOST :))).
Redis list as a queue
Redis is a great tool for building a lot of stuff, and it has built-in data structures. I think this is groundbreaking compared to the previous solutions people most commonly used, an RDBMS or other NoSQL stores. So how do you make a queue in Redis?
```shell
redis-cli LPUSH myqueue "some job"
redis-cli BRPOP myqueue 0
```
For example, a minimal producer/consumer sketch along these lines:

```shell
# producer: push jobs onto the list
for i in 1 2 3; do
    redis-cli LPUSH myqueue "job $i"
done

# consumer: block until a job arrives, then handle it
while true; do
    job=$(redis-cli BRPOP myqueue 0)
    echo "processing: $job"
done
```
Cool! It works great, and any process can access it atomically. Isn't that great? The best thing ever?!
Actually, it is very good. But there is one thing you just missed! By default, Redis only snapshots keys to disk periodically, e.g. every 60 seconds if at least 10,000 keys changed. What does that mean? If Redis dies suddenly, you can get screwed!
How do you fix this? Visit http://redis.io/topics/persistence and see the section "Append-only file":
```
appendonly yes
appendfsync always
```
Man, you just did it. You lost some performance, but you did it. Who would have known? You just saved the world. How much performance did we just cut? "fsync every time a new command is appended to the AOF. Very very slow, very safe." Ouch! Your boss could be unhappy, even if this solution is actually the best, simplest and most durable idea.
Can't lose any data! Redis plus MySQL/Postgres
People who need persistence worked out a new combo. You store each element of the queue as a row in the SQL RDBMS, and push only its id onto a list in Redis. A worker then pops the id from the Redis list, processes the job, and updates its status in the SQL RDBMS. This is not so bad, but it kills performance more than just turning on "appendonly yes". It also makes things a hell of a lot more complicated and forces you to do updates in both systems.
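A minimal sketch of this protocol in Python, with a dict standing in for the SQL table and a deque standing in for the Redis list (the names `jobs`, `pending` and the schema are made up for illustration):

```python
from collections import deque

jobs = {}          # stand-in for the SQL table: id -> {payload, status}
pending = deque()  # stand-in for the Redis list of pending ids

def do_work(payload):
    pass  # real processing goes here

def enqueue(job_id, payload):
    # 1. durably store the job in the RDBMS...
    jobs[job_id] = {"payload": payload, "status": "pending"}
    # 2. ...then push only its id onto the fast Redis queue
    pending.append(job_id)

def work_one():
    # 3. pop an id from Redis and process the job...
    job_id = pending.popleft()
    do_work(jobs[job_id]["payload"])
    # 4. ...then record completion back in the RDBMS
    jobs[job_id]["status"] = "done"

enqueue(1, "resize image")
work_one()
print(jobs[1]["status"])  # done
```

Note that every job now costs you writes in two systems, and a crash between step 3 and step 4 leaves a job that was processed but is still marked "pending", which is exactly the recovery ambiguity described below.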
Is this system a cure for cancer? No! You have to have a very good queue-recovery / startup system. Simply empty the Redis list and run a query for the unfinished jobs, e.g. (table and column names are illustrative):

```sql
SELECT id FROM jobs WHERE status = 'pending' ORDER BY id;
```
Next you have to clean the Redis queue and push the fresh ids. Is this safe? No: you don't know whether the last few jobs finished or not. E.g. MySQL got f*cked but the messages got processed anyway. Yes, this adds a lot more complications.
Also, with this solution, an index on the id column makes selects fast but inserts and deletes slow. And you want your queue to perform, and yes, MySQL will do its own fsync too.
Why not MongoDB
You can't atomically pop stuff. Don't even think about pop/push/pushAll on an array in a document! If you get this idea, check my gist https://gist.github.com/2071805, run it, and see what happens :) and what you get back.
RabbitMQ / ZeroMQ
When you visit the ZeroMQ page you will see:
(a long list of marketing features: a socket library that acts as a concurrency framework, "faster than TCP", carries messages over in-process, IPC, TCP and multicast transports, and so on)
Nothing about consistency. FASTER THAN TCP (this has to be good), but it can use TCP (I wonder if it can be faster than TCP even while using TCP /trollface). Anyway, you see a lot of stuff. I did some searching on ZeroMQ losing data, and I found http://zguide.zeromq.org/page:all#Missing-Message-Problem-Solver — a nice image.
Big thing :)
If you visit the RabbitMQ page http://www.rabbitmq.com/ you will see a lot of nice things, like tutorials etc. The page is nice and has useful knowledge. Both solutions have an Erlang client (massive plus) and clients in other languages. And even though setting the whole thing up may be a pain, I think both ZeroMQ and RabbitMQ are solid options.
Why do we use queues?
We use them to absorb bursts of messages and process their content with, e.g., workers / handlers etc. If we build it so that it can't be processed by more than one worker, we aren't doing our job properly.
What makes things hard?
- Locks: if we use them, they will bite us back
- Many points where we store the same data in different ways
- Yes, locks will bite you back
So what is the best way to go ?
I think the best way to go is to start a new movement called Unix Archaeology, because we seem to be reinventing the wheel too many times. But really:
- Make a list of solutions
- Ask yourself if your idea is really good
I'm 100% sure that storing queues as serialized lists in memcached, or keeping them as a table in MySQL/Postgres and doing loads of funky stuff to keep it running, is not the way to go. It can seem like a good idea at the start, but it is not. A named pipe in the file system can be better.
Loads of things can be brilliant queue choices, e.g. Redis, ZeroMQ, RabbitMQ, or even named pipes, but not a serialized array in a key-value store.