can kill anyone regardless of faith. It has been a really long time since I wrote anything for the blog, and today, as it is the summer bank holiday, I decided to finally sit down and write a few interesting things.
Even though I will be talking about java.util.concurrent, I will give a few examples using Scala. I like Scala and I think it is much easier to understand and read than Java. Simply fewer tokens. And fewer tokens mean more fun.
I don't know if you have explored how many nice things there are in the JVM; one of them is Truffle (which I will not be talking about here). One of the great things is java.util.concurrent: this set of tools gives us what we need to work with concurrency.
In times of agents such a toolset could feel a bit outdated, but it can still teach us valuable lessons about concurrency and may well be useful in the present and future.
So, as we all know, Java is full of design patterns, and one of the first things you will note while looking into the docs are the abstract classes. This gives us an overview of what we can expect from the package. Just like a movie teaser, but… boring :).
The first things we notice while looking at http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/package-summary.html are probably BlockingDeque and BlockingQueue. And this is our first example.
If you have ever worked with threads or any concurrent constructs, you know how useful channels/queues are. The first concrete class in the package is ArrayBlockingQueue[T], which lets us construct queues. For those who don't know what a queue is: it is a FIFO construct, where FIFO means First In, First Out. Elements that go in first will be picked up at the receiving end of the queue before the rest. It is like a queue for tickets before a big summer blockbuster release.
Let us try this ArrayBlockingQueue out:
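A rough sketch of such a producer/consumer (the names and details here are mine, not the original code):

    import java.util.concurrent.ArrayBlockingQueue

    object ProducerConsumer extends App {
      val queue = new ArrayBlockingQueue[Int](100) // bounded, capacity 100

      val producer = new Thread(new Runnable {
        def run(): Unit =
          for (i <- 0 until 1000) {
            // retry until the element fits into the queue
            while (!queue.offer(i)) Thread.sleep(50)
          }
      })

      val consumer = new Thread(new Runnable {
        def run(): Unit =
          while (true) {
            val item = queue.take() // blocks until something is available
            println(s"consumed $item")
          }
      })

      consumer.setDaemon(true)
      producer.start(); consumer.start()
      producer.join()
    }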
What are we doing here ? We simply demonstrate a producer and consumer type of situation.
There are a few things to look at. First, the initialization:
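    val queue = new ArrayBlockingQueue[Int](100)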
where we create our queue with a total capacity of 100. (An unbounded queue, e.g. a LinkedBlockingQueue created without a capacity, is also possible, but it is risky in terms of memory, and we want to avoid unpredictable parts of code.)
How to add stuff to the queue
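Roughly like this (a sketch):

    while (!queue.offer(item)) {
      // the queue is full, try again
    }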
Why like this? Let's not forget it is a bounded blocking queue, so once it reaches capacity a plain put would block. If the queue is full, the offer method returns false and the element is not added to the queue; that's why we have to retry. Of course, in this form it is not the perfect example, as it will grind the CPU until it manages to add the element, so adding a Thread.sleep(50), a sleep of 50 milliseconds, could be good here.
Now let's look at the consumer. Here the job is simple: we use take, which blocks and waits until it can get something from the queue. In most cases this is the behavior we want: the thread simply sits there and waits for something to appear in the queue.
There is also the option to use the add method to put stuff on the queue, but it throws an exception when the queue is full, and I'm not a big fan of handling exceptions in this type of scenario.
More info about the ArrayBlockingQueue API can be found here: http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ArrayBlockingQueue.html
ConcurrentHashMap lets many threads use a single dictionary/hash. This is great, as it does all the synchronization work for us. Of course, frequent writes/updates from many threads will make it perform very slowly, but if we can e.g. use it to hold a reduction result, that is a great simplification.
If we use it like this
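A sketch of the kind of racy usage meant here (my own reconstruction): two threads both do a get-then-put on the same keys, and since that compound step is not atomic, updates get lost.

    import java.util.concurrent.ConcurrentHashMap

    object RacyReduce extends App {
      val map = new ConcurrentHashMap[String, Int]()
      map.put("one", 0); map.put("two", 0)

      def bump(key: String): Unit = {
        val old = map.get(key) // get + put is NOT atomic
        map.put(key, old + 1)
      }

      def worker() = new Thread(new Runnable {
        def run(): Unit = for (_ <- 0 until 10000) { bump("one"); bump("two") }
      })

      val t1 = worker(); val t2 = worker()
      t1.start(); t2.start(); t1.join(); t2.join()
      println(map) // usually less than 20000 for each key
    }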
Of course it will work, but it will often cause trouble: this code is racy :D and will often end up with wrong results for both one and two, because a read-modify-write (get then put) is not atomic even though each individual operation is synchronized. Well, we now know we can use this structure from any number of threads, but to make it work correctly it would be more useful to create another thread that does the reducing, or simply to have a queue where we put partial results and a single thread that updates the hash. Still, this can have its uses: if you have one reducer updating the hash, or many reducers updating dedicated key spaces, while many other threads simply use the hash in read-only mode. The big issue is when you want to update it, as it doesn't support transactions, and what you really want here is a transaction.
Well, we all love the simplicity of a single variable, and in a concurrent environment it is easy to forget the goodies of the sequential world and use a raw variable to store the result of some execution.
Let us write some dodgy code:
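A sketch of the dodgy version (reconstructed; each thread adds 0..999 to a shared var, so the correct total is 999000):

    object Example {
      var counter = 0
    }

    object Dodgy extends App {
      def worker() = new Thread(new Runnable {
        def run(): Unit = for (i <- 0 until 1000) Example.counter += i
      })
      val t1 = worker(); val t2 = worker()
      t1.start(); t2.start(); t1.join(); t2.join()
      println(Example.counter) // should be 999000, usually is not
    }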
The result should be 999000, but… you will get numbers like 907369… This happens because both threads read and write the same variable at the same time, so updates get lost. That's why we need atomic values :) Let's convert it into something less dodgy.
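The less dodgy version, again as a sketch:

    import java.util.concurrent.atomic.AtomicInteger

    object Example {
      val counter = new AtomicInteger(0)
    }

    object LessDodgy extends App {
      def worker() = new Thread(new Runnable {
        def run(): Unit = for (i <- 0 until 1000) Example.counter.addAndGet(i)
      })
      val t1 = worker(); val t2 = worker()
      t1.start(); t2.start(); t1.join(); t2.join()
      println(Example.counter.get()) // always 999000
    }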
After adding AtomicInteger and switching to atomic updates, we always get the same result, and it is the correct answer. It doesn't look pretty yet because of this Example.counter, but that is just an example.
There is a lot more in this awesome package to cover. I will cover one more thing next time, and that is CyclicBarrier, for better synchronization of threads, but for now this is it :). I hope this was a useful read. I don't have much time to play with Scala, so if something looks "too simple" :D yeah, I'm not a Scala expert.
Cheers!
Data.Aeson is a great package for working with JSON data in Haskell, and you can make it work in very few lines of code.
If you use the DeriveGeneric extension from GHC and import GHC.Generics in your module, you can parse stuff super easily ;O.
This is my example that explains how to use it.
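A minimal sketch of such a module (the Person type is my own invention):

    {-# LANGUAGE DeriveGeneric #-}

    module Main where

    import Data.Aeson
    import GHC.Generics
    import qualified Data.ByteString.Lazy.Char8 as BL

    data Person = Person
      { name :: String
      , age  :: Int
      } deriving (Show, Generic)

    instance ToJSON Person
    instance FromJSON Person

    main :: IO ()
    main = do
      let kuba = Person { name = "Kuba", age = 28 }
      BL.putStrLn (encode kuba)                    -- ByteString out
      print (decode (encode kuba) :: Maybe Person) -- Maybe Person back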
The result:
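With the hypothetical Person above, roughly:

    {"name":"Kuba","age":28}
    Just (Person {name = "Kuba", age = 28})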
Here I made the shortest possible example to show off how you can work with Aeson. First of all, if you use Generics you don't have to write a real implementation of ToJSON and FromJSON; GHC will do this for you!
The only things to remember are that encode gives you back a ByteString and decode gives you back a Maybe a, and that's it.
You can always fall back to the normal way of writing FromJSON and ToJSON instances :)
Different clients request features, you upgrade your codebase and move along, for example streamlining the API. You need a way to version the code so that old clients have time to work on upgrading and new clients can use the new API without problems.
This is the first part and it's easy, so I will be quick about it. In my opinion, using subdomains/CNAMEs for versions, like v1., v2., v3.yourdomain.com or 20012013.yourdomain.com, is the best way to handle API changes from the client side. As we discussed internally, using headers or anything else can make clients go mental, because they are using other kinds of software and some changes may not be trivial. Yes, people actually have this type of problem with very old bash/perl systems.
Why subdomains are cool, in my opinion, will unfold in the next section.
The most important thing is how easy it will be for a developer to add things without breaking other things. Yes, this is the problem! Most people think that rolling a solution with some sort of scoping or inheritance is ok.
eg.
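Something like this hypothetical Rails-style scoping:

    class Api::V1::UsersController < ApplicationController; end
    class Api::V2::UsersController < Api::V1::UsersController; end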
No, it's not. This is actually sh*t. Why? Even with good test coverage you still have an issue, because each of these classes uses other classes, basically sharing them, and changes in their code can affect other versions of the API. With V1 and V2 it is manageable, but when you have 5-9 versions it starts to get crazy.
Imagine a queue of versions in the form of deployed boxes.
My idea is very simple. You simply tag the version in the codebase and deploy the new version on a new box, while pointing a new subdomain to the new app.
What do you gain?
We live in the age of cloud deployment, so spinning up a new instance is not that expensive!
There is only one point you have to be very aware of, and that is data persistence. If you are using SQL you can only ADD columns/tables, never remove anything. But in most cases that is exactly what you will do anyway.
Example of version deploys on virtual machines:
Process! How would you implement this in real life? Simple!
You have an app and you deploy it using e.g. puppet + capistrano. When you deploy the first version you make a tag in git named e.g. version-1 or 20032014-deploy and deploy it to a box, assigning a CNAME.
Next you start working on the new version, and when it is ready you tag it 21032014-deploy and deploy it to a new box. This must include a "build the whole box" script in puppet, or Docker if that is what you use. This way, if you e.g. added Redis to the stack, you need to be sure your production deployment scripts are ready. It forces you to keep your "production ready setup" always up to date.
After the deploy you move along and work on the next version. When you need to decommission an old version, you just kill the box. Each version should also be monitored for how many requests it actually gets, because if you have 20 versions up and some are getting zero traffic, you can kill them.
Example where many versions use the same db "ring" / "cluster":
How do you roll back? Simply check out the deployment tag and deploy :)
The gain in this strategy is isolation. If you need more boxes for your main version, you have production scripts ready, so all you have to do is spin them up and make sure the CNAME is load balanced.
I don't think this strategy has any real problems or hidden traps. You get a smaller codebase to work with and the ability to upgrade your production boxes and application libs/frameworks, because you always move forward and deploy from scratch. You don't upgrade a production box in place, you do a fresh deploy. A fresh deploy can also be smoke-tested by a tester before putting it into production. The deployment scripts make you ready by default to scale your app horizontally.
Again, what you do:
IMHO everyone should move to this type of strategy for API versions.
Cheers – Jakub Oboza
new and make. At first glance they seem to be doing the same thing. There is a difference, and it is actually quite easy to explain.
If we go to the Go doc page at http://golang.org/pkg/builtin we can see every builtin function in Go, including new and make. About new we can read:
“The new built-in function allocates memory. The first argument is a type, not a value, and the value returned is a pointer to a newly allocated zero value of that type.”
and similarly about make:
“The make built-in function allocates and initializes an object of type slice, map, or chan (only). Like new, the first argument is a type, not a value. Unlike new, make’s return type is the same as the type of its argument, not a pointer to it.”
So we can see that new returns a pointer to a zeroed value of a type, while make returns an initialized value of the type itself. That is the difference.
So how could we implement a simplified new?
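    // roughly what new(int) does: allocate a zeroed value, return a pointer to it
    func newInt() *int {
        var i int
        return &i
    }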
This is just like writing someVar := new(int).
In the case of make, we can only use it for map, slice and chan.
“Slice: The size specifies the length. The capacity of the slice is equal to its length. A second integer argument may be provided to specify a different capacity; it must be no smaller than the length, so make([]int, 0, 10) allocates a slice of length 0 and capacity 10. Map: An initial allocation is made according to the size but the resulting map has length 0. The size may be omitted, in which case a small starting size is allocated. Channel: The channel’s buffer is initialized with the specified buffer capacity. If zero, or the size is omitted, the channel is unbuffered.”
make creates and allocates all the memory. We can specify the size we want in the second parameter, but this really matters for slice and chan; for a map the size is just an optional hint.
And make is the only builtin way to create these three kinds of objects.
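Typical make calls look like this:

    s := make([]int, 0, 10)   // slice: length 0, capacity 10
    m := make(map[string]int) // map: the size hint is optional
    c := make(chan int, 5)    // channel: buffer of 5, omit for unbuffered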
To sum up: new is a way of getting a pointer to a zeroed value of a type, while make is for creating channels, maps and slices only.
If you have an 800 Mhash/s GPU (graphics card), I'm sure you often think about making your Raspberry Pi a Dogecoin miner. Because why not :D? It is not effective, I warn you :) you will get around 0.34 khash/s, which is about 2000 times less than your GPU :) and about 500 times less than the CPU on your box. But it's easy and fun.
I mainly did it for fun, to see how it would react and work, and what the possible heat problems are.
Because DOGE, dogecoin, is THE NEW BLACK. It's the future! An irony on all cryptocurrencies :) its value is based on memes, laughs and happiness. This is much better than Bitcoin :> at least for me.
You will need:
If you are using, like me, the 2014-01-07-wheezy-raspbian image, you will have almost everything ready :) the one thing to install is automake, which you can do by typing apt-get update followed by apt-get install automake. That is all you need.
Let's get on the box! (the default login/password for this image is pi:raspberry)
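    ssh pi@192.168.1.15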
(This IP address is just an example :D you will need a way to find the Pi in your network.)
Now all you need to do is clone a CPU miner onto it.
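For example pooler's cpuminer, which is the one that builds a binary called minerd:

    git clone https://github.com/pooler/cpuminer.git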
This will download the mining software onto the Raspberry Pi; next we need to compile it and run!
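The usual build steps for it look roughly like this:

    cd cpuminer
    ./autogen.sh
    ./configure CFLAGS="-O3"
    make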
This will compile and build the minerd binary, which is ready to start mining :). Well, you need to do one more thing: join a doge pool. I'm not going to go into the details of solo mining vs pool mining :) I'm just a simple miner :D
If you need more info on mining pools you should check this topic: http://www.reddit.com/r/dogecoin/comments/1tn8yz/dogecoin_mining_pool_list/
At the time of writing this post I am personally using a small pool called Chunky Pool :).
Now that we have the software, let's actually mine something :). You will need to create a shell script that starts minerd on your Raspberry Pi. Mine looks like this:
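Something along these lines, where the pool URL and credentials are placeholders:

    ./minerd --url=stratum+tcp://your.pool.example:3333 --userpass=worker:password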
Make it executable and run it! Yay, you are a dogecoin farmer now! CPU mining is not the most optimal, but hey… it's all just for lolz :)
For me, 2 hours of mining on the Raspberry Pi did not generate any extra heat or anything like that; it seems to be stable. I was worried it would go nuts on this front, but I was proven wrong.
This sucks in terms of speed; you will get close to no hash rate. Below 2 khash/s it's not even worth it; you won't even show up in the stats on any pool. Periodically you will hit the jackpot and score a share, bumping you to 2 khash/s for about 60 seconds, but that is just a blip; the win gives you some small part of a doge. Last I checked you could get around 0.67 doge per hour of your Raspberry Pi's time. That is really, really bad, as a pretty basic GPU gets you to 600+.
Cheers :) Hope it helps! Much fun, so currency.
Most of the time, when preparing to make an HTTP request in Haskell, e.g. using simpleHTTP, we need to build a request. We have several ways to do it; one of them would be to glue strings together, but that is ugly and not the safe way to do it. Happily for us there is the url package (cabal install url), which provides the Network.URL module. Here I will show a few quick tips on how to use it to work with URLs.
The first thing we have to do is import the module :D and get a string into our URL library.
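A minimal sketch of such a program (my own reconstruction, not the original):

    module Main where

    import Network.URL

    main :: IO ()
    main =
      case importURL "http://google.com" of
        Nothing  -> putStrLn "not a valid url"
        Just url -> putStrLn (exportURL (add_param url ("query", "haskell")))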
This is a super simple example. But how does it work? First of all we have importURL, with the signature:
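    importURL :: String -> Maybe URL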
This imports a URL in the form of a string into the url library and gives us back a Maybe URL. This is awesome! So we have a type we can work with, yay! To leave the library and get back a string we need to use exportURL, with the signature:
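    exportURL :: URL -> String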
So we are only doing a simple transformation: String ~> Maybe URL ~> URL ~> String. That's nothing we can't handle!
The next important bit is the add_param function, with the signature:
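    add_param :: URL -> (String, String) -> URL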
This does exactly what we would expect :D e.g. we can take the url http://google.com and add two params, ok=1 and query=haskell, to build http://google.com?query=haskell&ok=1.
I will reiterate our first example, showing a bit more, or just the same things in a different way. Let's try to add two params.
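Again a sketch rather than the original code:

    module Main where

    import Network.URL

    main :: IO ()
    main =
      case importURL "http://google.com" of
        Nothing  -> putStrLn "not a valid url"
        Just url -> do
          let url' = add_param (add_param url ("ok", "1")) ("query", "haskell")
          putStrLn (exportURL url')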
You should run the code and see something like this:
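For the sketch above, something like:

    http://google.com?query=haskell&ok=1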
It is just a quick tip :) Network.URL has a few more functions, e.g. for checking whether the protocol is secure and whether params are ok, but the stuff shown above is the main point of the lib.
More about this lib ofc on hackage: http://hackage.haskell.org/package/url-2.1/docs/Network-URL.html
….And quick tip should be quick :)
The forkIO function and newChan in Haskell.
forkIO is part of the Control.Concurrent package and, as the documentation says:
Sparks off a new thread to run the IO computation passed as the first argument, and returns the ThreadId of the newly created thread. The new thread will be a lightweight thread; if you want to use a foreign library that uses thread-local storage, use forkOS instead.
This is very neat if your program wants to use all the cores of your CPU, or at least be more responsive instead of waiting for stuff to happen.
forkIO's type is:
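    forkIO :: IO () -> IO ThreadId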
forkIO would be enough to start working on stuff, but to make real use of threads we need a way of communicating with them. This actually opens the design of our code to new things, like building workers. There are other ways of communicating with threads, like MVar, but IMHO channels win hard.
Channels are part of the Control.Concurrent.Chan package and are typed! Typed communication, yay!
The channel functions we need have the following type signatures:
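    newChan   :: IO (Chan a)
    writeChan :: Chan a -> a -> IO ()
    readChan  :: Chan a -> IO a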
And that’s actually all we need. Let’s make some stuff working.
I think most of the time it’s better to explain stuff on examples.
First thing we want is just to spawn!
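A minimal sketch:

    import Control.Concurrent

    main :: IO ()
    main = do
      _ <- forkIO (putStrLn "hello from the child thread")
      putStrLn "hello from main"
      threadDelay 100000 -- give the child a moment before main exits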
This is a very simple way to spawn a lightweight thread via forkIO :>. As you can see it is a normal action, so you can go dirty!
forkIO takes an action and gives you back an IO ThreadId, so you can keep track of / kill threads you don't like.
The previous example was a bit of a cheat, as it showed nothing really important, so let's make some crazy threads that print stuff now.
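A sketch of what that could look like (rockFan/discoFan are my names):

    import Control.Concurrent

    rockFan :: IO ()
    rockFan = do
      putStrLn "rock!"
      rockFan

    discoFan :: IO ()
    discoFan = do
      putStrLn "disco!"
      discoFan

    main :: IO ()
    main = do
      _ <- forkIO rockFan
      _ <- forkIO discoFan
      putStrLn "hit it guys"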
Well, compiling and running this gives you only "hit it guys", because the main thread exits and the child threads die with it! Let's fix it so we can say when they need to stop! :>
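Keeping the same rockFan/discoFan as above, only main changes (a sketch):

    main :: IO ()
    main = do
      _ <- forkIO rockFan
      _ <- forkIO discoFan
      putStrLn "hit it guys"
      _ <- getLine -- the threads keep printing until you press enter
      putStrLn "done"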
After launching it you can see each thread spamming its prints ;> It works until you hit enter. Cool, so we have something working.
How does it work? First of all we use forkIO to spawn the threads, and this time each "thread" function is defined separately.
Each of them runs forever like a crazy music fan :). Here we could simplify things by using forever from Control.Monad.
forever is part of Control.Monad; as the name says, it repeats an action forever ;) useful for things like workers or anything that has to happen all the time. Its type is forever :: Monad m => m a -> m b.
Cool, so now we know the basics of spawning a thread using forkIO, but to have something we can actually use in real life we need some sort of communication. I want to present something I feel would be useful in almost every Haskell program: a channel combined with forkIO.
If you have ever programmed in Erlang or Go you will know what I'm talking about: channels are very similar to message passing. Basically, a channel is a pipe that you can write to or read from in different threads/processes. This is one of the mechanisms we can use to get data out of other threads: because they are not sequential, we can't predict their results or the time when they will be ready, and channels are one way of getting a response out of them.
Channels are amazing because they are flexible :) and very natural. The basic principle is simple: you write to the channel in one thread and read from it in another :)
Let's make an example that shows how powerful this is.
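A sketch of the kind of thing described below (gossipGirl comes from the text, the rest is mine):

    import Control.Concurrent
    import Control.Concurrent.Chan
    import Control.Monad (forever)

    gossipGirl :: Chan String -> IO ()
    gossipGirl chan = forever $ do
      gossip <- readChan chan
      putStrLn ("did you hear? " ++ gossip)

    main :: IO ()
    main = do
      chan <- newChan
      _ <- forkIO (gossipGirl chan)
      writeChan chan "haskell threads are cheap"
      writeChan chan "channels are typed"
      threadDelay 100000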
Nice! What happens here? :) The new things are newChan, which creates the channel we use to talk to our gossipGirl, readChan, which reads data from the channel, and writeChan, which writes to it. This is very simple :) So now let's generalize our worker into something we can use in the next mini tutorials. A worker.
A simple worker will take one channel as a parameter and spawn a thread; this will help us understand how the whole thing works (if we haven't got it by now :) ).
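A sketch of such a worker:

    import Control.Concurrent
    import Control.Concurrent.Chan
    import Control.Monad (forever, forM_)

    -- takes a channel and spawns a thread that processes whatever arrives on it
    worker :: Chan String -> IO ThreadId
    worker chan = forkIO . forever $ do
      job <- readChan chan
      putStrLn ("working on: " ++ job)

    main :: IO ()
    main = do
      jobs <- newChan
      _ <- worker jobs
      forM_ ["job 1", "job 2", "job 3"] (writeChan jobs)
      threadDelay 100000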
Yes, you can build workers however you want. I would not spend time trying to build an uber-generic worker, as they are usually custom and you don't need much time to make one :). Usually you have worker types for particular tasks, e.g. database writers, log writers, counters etc.
Now, why would you want all this forkIO stuff? Here is the reason. Cat simulation!
I hope this gives a little insight into forkIO and channels and how you could use them in your code. It is super simple to add them, they work miracles and I love them. And no, you don't need to be an expert on Kleisli arrows to use them ;).
Cheers!
This book is talking about R17. From what I read, R17 could be named Erlang 2.0; the changes are just amazing.
My face after reading the changes…
This chapter hits you in the face! I don't have 17 yet to check more stuff, but this looks amazing. OK, let's have a look. In Erlang 17 they are introducing MAPS, also known as key-values / hashes / assocs / dictionaries: basically a data structure that lets you store a value under a key and retrieve it if you know the right key.
Let's have a look at the syntax.
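    M = #{ a => 1, b => 2, c => 3 }.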
This creates a map with three elements: a, b and c. Easy. Of course maps are immutable data structures, so if you want to add something you need to do it like this:
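    M2 = M#{ d => 4 }.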
So it is similar to updating records, but IMHO the true power is in retrieving data and pattern matching on maps. YES, PATTERN MATCHING.
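For example:

    #{ a := A, c := C } = M,
    %% A is now 1 and C is now 3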
Isn't this amazing!? You can use maps like in Ruby and pattern match on them. And I was just about to scream with joy when… I saw this.
Yes, you can serialize and deserialize maps to JSON. WTF?! Yes.
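As sketched in the book draft (this API was a proposal at the time, so the final shipped form may differ):

    Json = maps:to_json(Map)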
Calling maps:to_json you can turn a map into JSON, and by calling
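Again per the book draft (proposal-level API):

    Map1 = maps:from_json(Bin),
    Map2 = maps:safe_from_json(Bin)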
you get the option to load maps from binaries! C'mon, this is amazing. The safe version will blow up if the input would create atoms that don't already exist, so you can't flood the VM with new atoms. This is useful, because atoms are never GC'ed!
Yes, the new maps are amazing! I love them :> This solves so many problems and resolves so many situations where you had to write boilerplate code. Amazing work!
Websockets are also a new thing. I love this chapter, as it shows how to tackle a real problem, websockets :) it is a very cool addition to the book and also a free sample, so you can read it on your own before buying the book.
http://media.pragprog.com/titles/jaerlang2/websockets.pdf
This is great from an empirical point of view, as Joe shows how to use rebar and build real-life code using GitHub. This is a great thing and worth reading. I love it :) You get a real-life example… here I get everything I felt was lacking in the previous edition.
Every single new thing in the book is great. The stuff about the 17 version of Erlang is just great. I don't have 17 on my box yet, but this will be by far the best release of Erlang. In the chapter about maps he talks about looking at Ruby; I think the syntax is a bit inspired by Ruby's builtin hashes. This is amazing, and in the future it will resolve so many problems and make many APIs much more useful. You no longer have to type a ton of _ if you want to pattern match on a big tuple; you can match on a key.
I spent an hour reading the book on the train from London to Epsom and all I can say is: it is great! I love it and I love the new changes!
While using Redis there is a very common task we do: a SET followed by an EXPIRE. We do this when we want to cache some data for a period of time. For example:
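    SET user:1:token "kuba"
    EXPIRE user:1:token 5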
This sets the key user:1:token to the value "kuba" and then sets it to expire in 5 seconds. We can check the time to live on this key by using the TTL command.
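    TTL user:1:token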
This returns the number of seconds the key will still be valid for, or a negative number if it is not valid anymore.
SETEX, a command introduced in Redis 2.0.0, lets you do both things, SET and EXPIRE, in one go. How do we use it? It's simple!
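    SETEX key seconds value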
Note the argument order: it is key, seconds, value! :D:D Example usage:
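    SETEX key:to:home 1500 "4b234ferg34ret34rasd32rs"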
This will set the key "key:to:home" to the value "4b234ferg34ret34rasd32rs" for 1500 seconds. A pretty easy thing to do.
Since Redis 2.6.0 we can also use a newer command, PSETEX.
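    PSETEX key:to:home 15000 "4b234ferg34ret34rasd32rs"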
This will set "key:to:home" to expire in 15 seconds. It's important to note the time units: PSETEX takes milliseconds while SETEX takes seconds, and likewise PTTL reports milliseconds while TTL reports seconds.
Cheers!
And most of the time it is just updates from friends, sometimes some news from the hacker world that could potentially be interesting. I use Twitter in a bit of an odd way: first of all I post links to things I read during my commute to work :) and I use Twitter to post alerts from my apps to myself.
I think Twitter is great because it pushes things to my phone as well, so instead of building a really complex notification infrastructure I can use Twitter for everything.
This is simple: to every app I make I add a bit of code to handle Twitter, e.g. a method like this in Ruby.
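A sketch of such a method, assuming the twitter gem (the class name, handle and credential handling are mine):

    require 'twitter'

    class TwitterAgent
      def initialize
        @client = Twitter::REST::Client.new do |config|
          config.consumer_key        = ENV['TWITTER_CONSUMER_KEY']
          config.consumer_secret     = ENV['TWITTER_CONSUMER_SECRET']
          config.access_token        = ENV['TWITTER_ACCESS_TOKEN']
          config.access_token_secret = ENV['TWITTER_ACCESS_TOKEN_SECRET']
        end
      end

      # send an alert to myself as a direct message
      def alert(message)
        @client.create_direct_message('my_handle', message)
      end
    end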
This is part of a TwitterDriver/Agent class. I wrap everything in an exception handler; when I get an exception I log it and send a notification to myself via a Twitter direct message.
I love this way of using Twitter, because a year ago I thought it was only for sharing very random, not useful messages about cats and instagram pictures.
Try it yourself!
I started making mental notes of things I understand about the system I'm working with. I never had time to think about a grammar for this and really get into it. The system has many parts, but I thought it could be reproduced in the form of a language that is easy to parse and nice to work with. I think I might change my current solution into something more LISP-looking, just to make it easier to parse.
Each campaign is a sequence of actions that happen one after another. So my initial thought was "this is simple": we can represent it as a list of actions, like this
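for example (hypothetical action names):

    [send_welcome_email, wait_3_days, send_offer]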
But this is true only for simple campaigns, and most of them are NOT like this :(
Most campaigns are built around one key thing: making decisions! So if I can incorporate "if", I WIN! Let's have a look.
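roughly like this (again hypothetical):

    send_welcome_email
    if opened_email
      send_offer
    else
      send_reminder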
Yes! This was it. I think this is the solution to all the problems: the ability to represent a campaign as a sequence of steps mixed with if statements, essentially an abstract syntax tree of the campaign.
It is not quite a tree… it's more of a graph, but an AST will do :] It is a programming language! So this was in my head and I did not have time to work on it… but today I decided to give it a try and made a first draft of the AST we would need to make it work.
I wrote a simple grammar and made a parser for a very simple "ify" language. My language starts like this…
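A hypothetical sketch of the kind of grammar this implies (the original rules are not shown here, so these are my guess):

    program      := statement*
    statement    := assignment | if_statement | action
    assignment   := identifier "=" value
    if_statement := "if" condition "then" statement* ["else" statement*] "end"
    condition    := value ("==" | ">" | "<") value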
This shows that the grammar is very simple: we can have assignments, operators to test and compare, and of course the IF statement ;) this is our key to divinity!
The parser looks kind of ugly because I reused parts of code I had written before for different projects.
But it works… we can parse. Now I have to make it more production-ready and less unstable ;]. This is a first attempt at the idea. I need to expand the grammar and think about it for more than just a few minutes, but I think it is a good start. The next step for me is to make a better grammar and parser and move on to building an interpreter for this mini language.
We now live in the startup era, and the main focus of people building new products is to build them fast. Whoever is first to build a solution is most likely to get the market and monetize it. So almost every startup founder is looking for a technology that will help them build things fast. This is the main reason Rails is so popular: it is not fast, it consumes memory like crazy, but it offers you, out of the box, something you can work with. It has all the properties you are looking for, everything is there, with rich documentation and easy-to-learn conventions. Rails is like a gift to people who want to "prototype" an application fast, but 99% of the solutions out there never leave the prototype stage.
Rails started small, the community was small and it was normal that not many people cared about finding security issues because it was more important to add support for everything and make the framework richer and richer. First thing that was the biggest helper in opening every door was raising popularity but this is also the biggest enemy of the framework. It started traction around the framework and people started to hack it, audit code and find things that can be exploited.
Every framework has to go through this type of periods in its life, there is no bug free code. People will find bugs in code base. What is the best way to prevent this ? Have a great team of engineers that follows the trends. But… most of startup owners wants to cut the costs, they outsource the work and have periods of life of the product that simply nobody cares about it in technical way. Or it can be even worse, team can be focused so much on features because of the boss pressure that they don’t have time for it.
Some startup founders will be technical, more or less, or not technical at all; in most cases it is not technical at all, and this is a problem. It leads to every startup being built on frameworks like Rails or Django. The scenario looks the same every time. First the team spends a lot of time building the initial release; then it grows, and they have no way to scale it other than spamming Rails instances and changing the database. So if something hits Rails, it hits the whole platform, and that hurts. The second scenario is that the team has some sort of engine and just a Rails front end; this is a smart approach, because if something goes really wrong it only kills the front end, and that is not so bad! But how many teams build solutions this way? Not many; mostly polyglot teams that know something more than Ruby. What I have experienced in my career is that people don't want to make some initial design decisions before starting; they just want to have the product and "we will think about it later". From a business point of view this is ok, but this "later" often comes very early. Some startups are lucky enough to have engineers who are smart and know how to solve problems using background processing, caching and a bit of cheating (e.g. like YouTube does with vote counts) to make everything work smoothly, but most startups are created in a crazy rush with a big stress on speed of building.
People will always suggest things like YAML.safe_load; in my personal opinion that is not a solution, just a patch. Why not disable support for YAML, JSON, XML and any other request format you don't need, and make it explicit what you accept as the form of a request for your actions? Trying to apply every possible parser to the input is often not the best thing to do.
I think problems like rails have now with security are not something we should cry about, it is just another step in becoming mature framework and problems like this can happen always with every framework. We have to embrace it and devise tactics to deal with it in a timely fashion so we will not be affected. Building software is not cheap, maintaining it is not cheap but… if you will hit right market you will get the money back.
This is how i see startup stage now.
Erlang, and only a small module reading stuff in C.
My first thought was to build a small C program that would check stuff periodically, or just "check stuff" on the SHT1x, and print it out to standard output. So my first attempt was as follows.
For the example here I will use /proc/cpuinfo.
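A sketch of that kind of program:

    #include <stdio.h>

    int main(void) {
        char line[256];
        FILE *f = fopen("/proc/cpuinfo", "r");
        if (f == NULL) {
            perror("fopen");
            return 1;
        }
        while (fgets(line, sizeof(line), f) != NULL) {
            fputs(line, stdout); /* print every line we managed to read */
        }
        fclose(f);
        return 0;
    }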
Nothing super special, except that the Raspberry Pi is not really happy with different packet sizes and can, for example, fail to read the whole input properly. I observed some issues with it, so I decided to explore FFI some more.
The thing I found is called erl_interface, and it is designed for FFI. This is it! What you do is build a process-like thing in C and a micro module in Erlang that handles it (the module is just to make it look nice). But there are a few glitches!
This is my module posting back "pong" on a "ping" message.
First of all, complexity goes up fast. You have to think about many things. Here I know that my node is called "emu@raspberrypi", and this is an actual long-running process, so if you don't remember to free memory you will quickly learn what "memory leak" means.
But the most important thing is how to build and run this.
My Makefile looks like this:
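Roughly like this; the erl_interface paths and version below are placeholders you have to adjust to your own Erlang install:

    lolpong: lolpong.c
    	gcc -o lolpong lolpong.c -I/usr/lib/erlang/lib/erl_interface-3.7.15/include -L/usr/lib/erlang/lib/erl_interface-3.7.15/lib -lerl_interface -lei -pthread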
It is important to remember the -pthread flag.
Now, how to make everything work… we need the Erlang module.
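A hedged sketch of what such a wrapper module could look like (the registered name and node below are my assumptions, not necessarily the original):

    -module(c1).
    -export([ping/0]).

    %% send ping to the C node and wait for its reply
    ping() ->
        {c1, 'c1@raspberrypi'} ! {self(), ping},
        receive
            Reply -> Reply
        after 1000 -> timeout
        end.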
This c1 stands for "C extension 1"; not super obvious :) cN will be C extension N =).
And finally we need to spawn our node…
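    erl -sname emu -setcookie cookie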
This spawns a node named emu@raspberrypi on my mini box with the cookie "cookie". Now… I said finally… but that was not the final step.
The final step is to run our lolpong binary. It is important to run it after the node is up, because it will try to connect to that node.
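    ./lolpong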
Now we can run a check in our Erlang shell to see if everything works!
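With the hypothetical wrapper above, something like:

    (emu@raspberrypi)1> c1:ping().
    pong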
Works!
It is good fun; now I need to wait for the parts to arrive so I can assemble everything and build the rest of the application :)
Cheers!!
So, first of all, I used the built-in inets application for making requests. The first thing to do is to start inets!
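    inets:start().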
OK, I generalized the use of httpc into a really simple thing:
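Something like this sketch:

    get_url(Url) ->
        {ok, Response} = httpc:request(get, {Url, []}, [], []),
        Response.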
So if you make a call you will get something like this:
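Roughly a tuple of status line, headers and body:

    {{"HTTP/1.1",200,"OK"},
     [{"content-type","application/json"}, ...],
     "{\"ok\":true}"}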
One more thing I added was extracting the body of the response:
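    body({_Status, _Headers, Body}) -> Body.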
So after calling it on a response we can get the body:
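For example (the URL is a placeholder):

    Body = body(get_url("http://example.com/api.json")).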
The next thing to do was to parse JSON :) for this purpose you can use two libs, mochijson2 or jiffy.
mochijson2 is easy to find and use. All you need to do is grab the file from the mochiweb project and compile it :) it has two functions, encode and decode.
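    1> mochijson2:decode("{\"ok\": true}").
    {struct,[{<<"ok">>,true}]}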
One thing worth mentioning is that it expects input in a slightly structured format.
Typically it will use the 'struct' atom for JSON "{}" objects.
So an array in JSON will be a list, but an object has to be wrapped in a tuple with 'struct'. It is awkward at first, but you can get used to it. But there is an easier thing to use…
jiffy! There is a good example on its main page.
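    1> jiffy:decode(<<"{\"foo\": \"bar\"}">>).
    {[{<<"foo">>,<<"bar">>}]}
    2> jiffy:encode({[{<<"foo">>,<<"bar">>}]}).
    <<"{\"foo\":\"bar\"}">>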
This shows how nice it is to use :D no 'struct', pure love!
You can get it here: https://github.com/davisp/jiffy
This is just a note to self, so I stop looking this stuff up in the docs again :).
"Theory is the thing we all skip and later on regret skipping." (Yes, I quoted myself; I heard someone did it before, so it's not hip.) We learn a lot of stuff, and some things simply slip through and get forgotten. That is an opening for "new discoveries", or rediscoveries, of old solutions. I have a thesis that "if something is simple it will get its niche" (did it again). This would explain the new hype for Node.js (nodejs.org). I don't want to judge technologies; I did some stuff in Node and I have a small portion of experience, but this text is not about that. It is about the technology behind stuff people think is brilliant and new nowadays.
Event loops exists long long and are nothing new. When someone says event loop first thing i think about is not nodejs or eventmachine or twistted but WinAPI. Yes old WinAPI and suddenly everybody realize that every GUI (Graphical User Interface) solution is evented. QT, WinAPI and many other even older solutions. This is nothing new, so lets take a trip through the cards of history to find other gems that we can rediscover and become new rocket scientists…or first tech archeologists.
So most common thing people do is sequencing and even if we use brain in other way it is easy to us to think about and form sequences because it is something we understand. This is basic unit of work we think about. So most of people write small function that sequence some action and they build programs from small functions calling small functions in a really big sequence.
Alternative solution to do this is event machine, this solution will enable us to generate a lot of small event and if we will manage to force programmer to make them as small as possible and as fast as possible we can achieve a feeling of multitasking. How does it work ? we have one loop that picks events from event queue and process them and as long as nobody will make event that is blocking everything this will work like charm. Key to this solution is “non blocking”.
To do this we need a way of saying "wait until something happens, and when it happens, trigger an event". This can be done by many different mechanisms. For example, for file descriptors (everything is a file) we have things like select, poll, epoll and kqueue, and this is not new, it's the 80s. Later on came the first official POSIX standardization, in 1997. 15 years ago! The previously mentioned Node.js represents this category (of course with fresh marketing!). Does it have flaws? Yes! People call this async… it's not async, not real async. Real async was also implemented in the 80s :D
Signals are also very old, basically this is similar concept (every concept is similar) You have a slot that handles signal and something that emits signals. It’s very handy and was implemented with big success in QT library. Does it has flaws ? basically same as event loop solutions. It’s a way of doing “message passing” pattern.
Message passing really well implemented in Erlang is another way of achieving same result, imho most clean and good solution in the industry right now. Simply you cast messages and other entities receive them and react. YES THIS IS HOW PEOPLE WORK. Most of thins seems to be inspired by nature… but lets not go this route and start religious war.
The real unix async calls. Yes unix has this built in.
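For example, a rough sketch of the POSIX AIO calls:

    struct aiocb cb = {0};
    cb.aio_fildes = fd;      /* descriptor to write to */
    cb.aio_buf    = buffer;
    cb.aio_nbytes = length;
    aio_write(&cb);          /* returns immediately, the write happens later */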
You call aio_write and the write to the file will happen, sometime in the future… BEFORE THE END OF THE WORLD, maybe!
Wow! This is fresh! Nobody is using it nowadays! You should build a new Node.js on it and call it modejs. Or wait, can you? Nobody really knows how to use these calls without being cluster-f@cked after five seconds. Real async syscalls in Unix. That is awesome, and it's older than your son! Let's dig more…
This is the sh*t, real event programming, you have real reactor nothing is faked like in eventloop! But its nothing new… also implemented in 80’s. But what is the difference between reactive programming and event loop ? Not much because this is more conceptual thing around building your reactor. In reactor concept we think about data flows and reacting to changes in them.
We don’t have to research things that are found!
So how mega smart people in Google work ? I don’t know I don’t work for google and I’m not smart but I assume they want to get sh*t done. And after reading Amazon Dynamo Paper http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf , Google Big Table paper http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/pl//archive/bigtable-osdi06.pdf and many other i think they developed it in kitchen. Yes in a kitchen.
They sat down, people from google and amazon i imagine 6 engineers took 1 apple (Steve Jobs was serving) and asked them self how to how to cut it into 6 parts so they can eat it. They took knife and sliced into 6 parts and split it to each other. Suddenly one stood up (Amazon guy with headphones in avatar) and said. Each part of apple is exclusive (patent trolls have everything exclusive nowadays) to each of us so we can eat them on our own. LETS MAKE DB USING THIS TECH AND WRITE A PAPER ON IT. Next guy said.. yeah we have limited number of apple parts so it is predictable and we form a RING from it… and thats how Big Table and Amazon dynamo was born. The Google cook who was slicing apple said, i sliced apple in parts one by one with one knife… AND HE WROTE MAP REDUCE PAPER.
They did not invented anything really new, they just approached problem simple. We can’t split and scale complex solutions so we will make everything simple. And it happens that most of things can be turned into linear or near linear problem.
In my opinion not. Map existed in lisp world probably since 70’s. Now someone took map and sequence of data that is exclusive and don’t depend on each other and said “What will happen if i will split this list into two parts and run map on each in two or more threads and then just sum results ?” and map reduce was born. Concept is dead simple. But its nothing new! I’m not taking here credit from google engineers. I think guys there are very smart and by using simple things in my eyes they are ^2 smart. But is it something ground breaking new ? No.
So maybe we should more often look into past. Maybe we can find more gems like this… i bet google has technology archeology department :D
I’m 80’ kid. I remember type writers and the sound so for me understanding how files in OS (Unix or in general) works was simple, but when recently i was explaining it to friend who is 90’ kid he did not got the concept of that well. Why ? Because he is from age where type writers did not exist. For me it was a bit shocking because i thought everybody knows exactly how they work. And when i said why we have \r and what it is i was for a moment happy that I’m very very old :D.
Technology archeologiest must be open to new things, read about everything and most of all, never stop thinking “this had to be already invented”. When I first suggested Queue as a solution to a problem some people looked at me like i would be snake. FIFO queues, something so basic… still some people don’t even think about them. This is dull example but basic data structures should be well known by everybody still… things that freezes blood in your veins happens.
Building DSL and interpreters is good fun, its not easy often I’m still learning a lot but this should not be “devil” to other programmers.
There are books that contains tones of cool knowledge, you can be technology archeologist like me! just venture into them and discover “Stuff”. First book i would recall here is Richard Stevens.
This guy is amazing, books are very good quality talking about topics that today are more than active in our community. 22 year old books!
Amazing book, I never was big fan of A.I. (maybe i should, that would give me chance to actually have some of I…) but this book opened my eyes on how simple this stuff is. How easy and how powerful. When i was at uni this was one of the books i enjoed the most.
Classic this guy guides you through Pascal and shows you how to program and at the end of book implements subset of language. I say 76’ FFS that good books will not be printed after 2006. 44 years ago published. C’mon!
Next classic, especially section about making your own malloc(). C’mon 1988’ 24 years ago. Brilliant book.
And something fresh
Really good book, also common pattern in books that i read… guy is implementing language at the end ( subset of C )
Yes this is good stuff. Old books containing knowledge that industry will discover in 6-10 years.
Current stuff like dependent types… i recon we will hear about them in industry after our deaths ;/ next week apocalypse ;/. Yes there are tones of things to rediscover… and it’s fun! lets do it!
I love when i hear about new stuff and suddenly after inspection it becomes clear it is just old thing on new cogs. I love this. I love technology archeology and i want to bring more examples of this and write about it on my blog.
Yes sending satelite in space is something new…. another event loop implementation is not.
Sorry for ranting that much :D… anyway nobody read this :D.
Before I tried Clojure I had a few "trips" to the Lisp world, but I was always struck by the number of half-finished VMs and implementations that are far from production ready, for example chicken, scsh, sketchy, mit-scheme, bigloo. Some of them are more, some less production ready, but none of them is ready for anything real. The most production-ready Lisp thing I have seen in my life was elisp, YES, f*cking EMACS Lisp.
But I'm no expert on the Lisp world!
Scheme is a dialect of Lisp that implements subset of language, in scheme code and data has same form which is a massive game changer. Scheme has super duper hiper mega MACRO building capabilities. And it is probably its biggest problems…
I remember when my friend was hacking in lisp, i remember when i was trying to hack stuff in scheme. So scheme is a trap! First stage of lisp/scheme hacker is to build scheme interpreter in scheme, when he finishes this he tries to implement scheme interpreter in his interpreter… Second trap is when he realizes that he wrote enough amount of scheme interpreters and he moves to implementing new features into his interpreter in interpreter in interpreter ( and yes ‘ready’ means that 8/10 runs stuff passes, some crash and half of features don’t even exits so called “junk features” ). So his dialect gets better than Haskell lazy evaluation, better etc… at this stage he has self bootstraping half working interpreter of his language that had no docs and nobody except his can use it and even if he can he must be on drags to make the lazy evaluation happen. Next stage of course is starting to do AI in scheme. So the point is that no shit gets done, its all conceptual work. If Da’vinci would program in Lisp he would not cut of his ear, he would cut of his head.
I wrote a borderline rant on Lisp hackers when this was supposed to be a micro tutorial. Well, I took the Lisp way of "getting shit done" :D. If you don't already know that Lisp's bread and butter are ( and ), you should probably run away.
Naow!
Clojure seems to be production ready stuff that can “get shit done”. It support concurrency (with STM – Software Transactional Memory) it is Scheme’y / Lispy and I have read that its hip in Open Source magazine. Thats enough to start learning yeah ?
No! But i want to learn it and write some stuff about it. It seems to be so different from other languages, RP notation and the whole community of Stallman like people.
Clojure runs on JVM ( Java Virtual Machine ) and beside Java being biggest troll language next to C++ ever! JVM is really good. It makes stuff really portable and its good for building stuff like backends for servers. Same goes with Scala using JVM, actually JVM is solid! ( coming from me Java hater ). Its not like some f*cktard writing in “Open Source Magazine” that C++ enables you to write portable code. No, writing os portable code in C++ is not easy, its hard and full of pain.
The first thing you have to do is install Clojure. This is the easy step. Just visit http://clojure.org/. I installed it using MacPorts by running sudo port install clojure, but for more info I would suggest going to the site.
Next you will want to install leiningen, a build tool for Clojure. Same procedure: either sudo port install leiningen or http://leiningen.org/ for more instructions.
Enough of this sh*t, let's do some stuff. First, to run the REPL you need to do the following.
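    clj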
I keep mistyping it as cjl, but that's because I'm a noob.
This should prompt you with something like this:
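Depending on your version, roughly:

    Clojure 1.5.1
    user=>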
Success!!! We can now try to evaluate some simple stuff.
So now a normal person would try to type something like 2 + 6. Bad idea! Expressions in Lisp start and end with (), so if you see things like ((((((((((((( or ))))))))))))) you know something is happening :D (the second one more commonly).
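Some first expressions as they look in the REPL:

    user=> (+ 2 6)
    8
    user=> (* 3 7)
    21
    user=> (- 10 4)
    6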
RPN — now I should quote the Wikipedia link on Reverse Polish Notation. (I'm Polish, so it is kind of like looking at my own back and trying to read the sign on the t-shirt.) Here you go: http://en.wikipedia.org/wiki/Reverse_Polish_notation. Strictly speaking, RPN is the postfix (stack) notation, where the arguments come before the operator; Lisp uses the prefix variant, plain Polish notation, where the operator comes first. C, Java and Ruby are all infix notation, e.g. 2 + 6, while Lisp / Scheme is prefix notation, aka (+ 2 6). In reality it's so cool that you don't have to repeat yourself: you write the operator or function only once and then a list of arguments, e.g. (+ 1 2 3 4 5). So let's try it out.
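    user=> (+ 1 2 3 4 5)
    15
    user=> (/ 4 5)
    4/5
    user=> (/ 4 5.0)
    0.8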
We can see that those examples work in the REPL, nice! But wait, there is something different between (/ 4 5) and (/ 4 5.0). Yes! Clojure has built-in support for rational numbers, so 4/5 is the rational 4/5, not a rounded estimate of it, while the result of evaluating (/ 4 5.0) is a floating point number. Let's see what happens if we try to do 1/3.
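    user=> (/ 1 3)
    1/3
    user=> (/ 1 3.0)
    0.3333333333333333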
Yeah, the second one is not really accurate. Or is it? Let's look at 2/3:
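    user=> (/ 2 3)
    2/3
    user=> (/ 2 3.0)
    0.6666666666666666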
Not really… it should be something like 0.(6) :)
So this is how you invoke functions, built-in functions. But what if you want to comment something out? Because it's useful info about the new Scheme dialect you are building?
You write comments using ; and everything after the ; is ignored!
For example:
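    ; this whole line is a comment
    (+ 1 2) ; everything after the semicolon is ignored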
The first thing that usually makes people happy while learning a new language is the moment when the person explaining the whole concept helps them write their first function! Yes, that's what I will do now.
To define a function we use defn (old Lisp grinders will connect it with def, yes!), which defines a function for us. We need to supply the name of the function, i.e. the symbol the function will be bound to, the arguments and the body, like this.
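    (defn multiply-by
      [a b]
      (* a b))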
Next we can invoke it, e.g. (multiply-by 3 4). It's worth mentioning that you can use - in function names. But what is really happening? This is a two-part thing: first we have def and second we have fn. def binds a symbol name to an expression or value, and fn returns a function. So it is "like lambda"; in fact, it is lambda.
Let's go with an example.
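A reconstruction of the kind of session being described (the printed forms are approximate):

    user=> (fn [base] (+ base 1))
    #<user$eval1$fn__1 ...>
    user=> ((fn [base] (+ base 1)) 5)
    6
    user=> (def troll-symbol 7)
    #'user/troll-symbol
    user=> (troll-symbol)
    ClassCastException java.lang.Long cannot be cast to clojure.lang.IFn
    user=> (+ troll-symbol ((fn [base] (+ base 1)) 5))
    13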
This is slightly more complex but should also be easy. In the first line we use fn to define a function with one parameter called base; this returns a function. In the second line we do the same, but the returned function is immediately applied to the argument 5 and evaluates to 6. In the third line we define the symbol troll-symbol with the value 7; we obviously can't call a plain value, so we get an exception! That's fine, because it is a value. But we can use it like in the last line, where we add the value of troll-symbol to the result of evaluating the function applied to 5.
Don't be afraid :D. All we did here is explain what defn does: basically defn is just a composition of def and fn. It lets you specify the symbol that the function returned by fn is bound to.
defn simply saves us keystrokes :) Like using Emacs :D this can lead to RSI. =)
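    user=> (def multiply-by (fn [a b] (* a b)))
    #'user/multiply-by
    user=> (multiply-by 3 4)
    12
    user=> (defn multiply-by [a b] (* a b))
    #'user/multiply-by
    user=> (multiply-by 3 4)
    12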
This shows exactly what happens when you use defn: you get a function from fn and bind it to some symbol. That's it. It's good to have a play with all these brackets :) it's fun! Try defining something on your own :D
So every tutorial has to have something like a hello world program! In my tutorial there is space for Hello World too :D
All you need to do is create a file called hello.clj and type this in!
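    (println "Hello World")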
When you run it using clj hello.clj you will see something like this:
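    $ clj hello.clj
    Hello World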
So, I mentioned something called fn for creating functions. But sometimes you need a function ad hoc, e.g. when you pass it to the map function; this is done using a different syntax.
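    user=> (map #(+ %1 1) [1 2 3])
    (2 3 4)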
You can still use the old syntax, but this is just faster to type. Remember that %1, %2 … refer to the arguments passed to this lambda.
Like in all programming languages we have lists and vectors. They are built in. The syntax for a list is simply '(), e.g. '(1 2 3 4), and for a vector, aka "array", it's [], e.g. [1 2 3]. NO COMMAS!
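In the REPL:

    user=> '(1 2 3 4)
    (1 2 3 4)
    user=> [1 2 3]
    [1 2 3]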
Lets do something useful with them! print them out naow!
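In the REPL this looks roughly like the following; the println output and the lazy result sequence get interleaved:

    user=> (map println [1 2 3])
    (1
    2
    3
    nil nil nil)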
It is map… and it is ugly, because the result of println is nil, so we printed the contents of the vector and the list, but the result sequence is just nils.
map is a higher-order function that takes a function and a list/array/sequence of arguments, applies the function to each element, and collects the results in a sequence that is returned at the end :) e.g. (map #(+ %1 1) '(1 2 3)) will give (2 3 4).
Lets try to do it with doseq.
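    user=> (doseq [x [1 2 3]] (println x))
    1
    2
    3
    nil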
Works!
REPL – Read Eval Print Loop. Simply the language console in which you can run code and see results without typing the code into a file and running the interpreter on it.
Lisp – One of the biggest language trolls, or the only language worth learning. Still can't decide.
Jar – Java archive that contains code; can also contain jam. I like jam.
This is just the first part of the tutorial. I hope you liked it; it covers only the very basics of defining and evaluating functions.
tcpdump :D So, I found this tool useful while working on many things. Guess what? It is very useful when working with network-related stuff :D but it's uneasy to grasp. This is my list of commands and options I use.
In this text I use the en0, en1 naming convention from OSX; if you are a Linux user you should change it to eth0, eth1 etc… check your network config using ifconfig. Basic knowledge required! =)
Tcpdump is a tool that lets you dump network packets. This helps to debug networking issues, apis, communication or other stuff.
Tcpdump basic options are
The first thing people often do is listen to everything that bounces on en1, like this:
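    sudo tcpdump -i en1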
This is obviously a bad idea; the only good thing about it is that it lets you see that "something is on", so you can tell the device is actually working.
If you want to see all the traffic that goes to some host, which is actually useful, you should add the host option.
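    sudo tcpdump -i en1 host www.facebook.com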
This lets you see whether there are packets going to and from www.facebook.com.
Let's say you want to see what curl generates towards your own machine:
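For example, watching the loopback interface (lo0 on OSX):

    sudo tcpdump -i lo0 port 80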
and in another shell just type
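    curl http://localhost/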
port is the most fun option, because it lets you narrow things down to exactly the traffic you are interested in.
Tcpdump is a useful tool, and I hope this text will stop me from constantly forgetting its options.
Cheers!
Recently in "professor Toons" :) computer club there was a task that was ideal to solve as a graph. It was one of those "find a path" tasks. Yesterday, just before going to bed, I thought "why not solve it in Erlang?". These are my thoughts about the Erlang digraph library.
First of all, http://erldocs.com/ is the way to go when working with the documentation.
Second, "the task". It had two parts: the first was to load an XML file with songs, and the second was to build a graph and find a playlist between two songs, keeping in mind that each following song has to start with the same letter the previous one ended with.
Example:
if you have the song list ABC CBA BAC ACB BBA and you want to find a playlist from ABC to BAC, you will get ABC –> CBA –> ACB –> BAC. Easy.
The digraph API is quite nice, but there are a few things you have to keep in mind.
To create an empty graph you call digraph:new/0
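    G = digraph:new().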
To add node to graph you call :add_vertex/2
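    V1 = digraph:add_vertex(G, "ABC"),
    V2 = digraph:add_vertex(G, "CBA").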
To add edge to graph you call :add_edge/3
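    E = digraph:add_edge(G, V1, V2).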
To get all vertices from graph (random order) :vertices/1
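    digraph:vertices(G).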
And finally to get path you just call :get_path/3
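    digraph:get_path(G, V1, V2).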
This path is a list of vertices in order. This API summary is just to remind me of the essential things about using digraph from the Erlang stdlib. The module is quite simple.
For me, with about 5000 vertices and a lot of edges, it consumed 1.5 GB of RAM, but once it was created it was super fast to use.
It was fun to play with.
I will make a quick summary in the form of a short code sample!
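A sketch that produces the result below:

    G = digraph:new(),
    London    = digraph:add_vertex(G, "London"),
    Krakow    = digraph:add_vertex(G, "Krakow"),
    Andrychow = digraph:add_vertex(G, "Andrychow"),
    digraph:add_edge(G, London, Krakow),
    digraph:add_edge(G, Krakow, Andrychow),
    digraph:get_path(G, London, Andrychow).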
This gives you back ["London","Krakow","Andrychow"], so win! We are at home!
It was fun to play a bit with digraph before going to bed. There is a ton of things I did not use; also look into digraph_utils for even more goodies :)
]]>Many people use adds queue system to their products. Some of them do legendary things with it to make it extremely unreliable products :). Most of this solutions may seem trolling but they actually exists in some products.
First thing often people do is rolling out their own queue system. Is it bad ? no! it is great as long as you don’t have constraint that data can never be lost!
If you use a queue just to communicate between processes you can use something like a unix named pipe. In reality this is just a file. Actually in Unix everything is a file and this is the best design ever (if you neglect it you should die!).
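Something like this (the path is just an example):

```
mkfifo /tmp/myqueue
cat /tmp/myqueue     # the reader blocks until something is written
```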
And next you can push stuff into it, eg.
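from another shell (same made-up path):

```
echo "hello queue" > /tmp/myqueue
```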
but this is a shell example; you could create it and just read/write it from your processes.
That is cool. And this is the last point where we will not see problems :)))
Yes, we are programmers and most of us are young and full of energy; nobody remembers the 70s. I was born in 85 so technically i would be quite mad if i remembered the 70s.
So how do we approach the problem? Some of us would create a queue inside their code.
In C it would simply be an array wrapped with mutexes, but this is unreliable and long to write and and and…
So what do people do ? They try to use READY products.
Use a key-value store as a queue. – Let's serialize the array into XYZ and set it under a key. – That's a good idea! Only one thing will write to it!
WRONG!
Such an assumption will provide you with an insane amount of carnage in the future. And even if i know “agile says XYZ now”… actually “agile” doesn't say “take drugs and yolo because tomorrow you can die” but “TAKE RISKY THINGS FIRST”, and this is a risky thing. It should be implemented well.
What happens in this case ? Someone gets the great idea that the product should scale, adds another daemon, and this f*cks up the queue: you lose messages.
Some DBMSes can handle this problem, but wrapping it into a transaction alone will not solve it.
The scenario is: process a) reads the key, deserializes the array, appends its element, serializes it and writes it back.
Now imagine process b) does the same at the same time. Everything is blazing fast and you get f*cked: whoever writes last wins and the other write is lost.
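A sketch of that lost-update problem, here in Python with redis-py standing in for any key-value store (the key name and client are made up; the point is the non-atomic read-modify-write):

```python
import json
import redis

r = redis.Redis()

def push_job(job):
    # NOT atomic: two processes doing this concurrently can lose writes
    raw = r.get("queue") or b"[]"        # 1. read the serialized array
    items = json.loads(raw)              # 2. deserialize
    items.append(job)                    # 3. modify
    r.set("queue", json.dumps(items))    # 4. write back, clobbering anyone
                                         #    who wrote between steps 1 and 4
```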
So the DB system must know the context. This is where Riak shines: you get vector clocks and you know that you are f*cked. You can react, although in 99% of cases you don't know how to resolve the conflict, but at least you would know… but some specialists will disable this because handling vector clocks is a pain and you can get a PERFORMANCE BOOST :))).
Redis is a great tool to build a lot of stuff, and it has built-in data structures. I think this is ground breaking compared to previous solutions like the most commonly used RDBMSes or other NoSQL stores. Redis is great, so how do you make a queue within redis ?
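The basic idea, straight from redis-cli (the queue name is just an example): LPUSH adds on one side of a list, RPOP (or the blocking BRPOP) takes from the other.

```
LPUSH myqueue "job-1"
RPOP myqueue
```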
for example something like this
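A sketch of the same thing from code, here in Python with redis-py (names and the local Redis are assumptions):

```python
import redis

r = redis.Redis()

# producer: push jobs onto one end of the list
for i in range(10):
    r.lpush("myqueue", "job-%d" % i)

# consumer: block until a job shows up, pop it from the other end
while True:
    _key, job = r.brpop("myqueue")   # atomic pop, safe with many consumers
    print("processing", job)
```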
Cool ! It works great and any process can access it in an atomic way. Isn't it great ? Best thing ever ?!
Actually it is very good. But there is one thing you just missed! By default Redis snapshots the dataset to disk every 60 seconds only if at least 10k keys changed (plus the other, less frequent save rules). What does it mean ? You can get screwed if redis dies suddenly: everything since the last snapshot is gone!
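Those snapshot rules are the usual defaults in redis.conf:

```
save 900 1
save 300 10
save 60 10000
```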
How to fix this ? Visit http://redis.io/topics/persistence and see the “Append-only file” section.
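In redis.conf that means turning on the AOF and, for the paranoid setting quoted below, fsyncing on every write:

```
appendonly yes
appendfsync always
```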
Man, you just did it; you lost some performance but you did it. Who would have known ? You just saved the world. How much did we just cut performance ? “fsync every time a new command is appended to the AOF. Very very slow, very safe.” That is: Ouch! Your boss could be unhappy even if this solution is actually the simplest and most durable idea.
Persistence daemons worked out a new combo. You store each element of the queue as a row in an SQL RDBMS and put its id on the queue in redis; next you pop it from the queue in redis, process it and update its status in the SQL RDBMS. This is not so bad but it kills performance more than just turning on “appendonly yes”. It also makes things a hell of a lot more complicated and forces you to do updates in both systems.
Is this system a cure for cancer ? No! You have to have a very good queue failure recovery / startup system. Simply empty the list in redis and run a query like
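something like this (the table and column names are made up):

```sql
SELECT id FROM jobs WHERE status = 'pending' ORDER BY id;
```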
next you have to clean the redis queue and push the fresh data. Is this safe ? No, you don't know if the last few jobs finished or not. Eg. MySQL got f*cked but the messages got processed. Yes, this adds a lot more complications.
Also with this solution the index on the ID column makes selects fast but inserts and deletes slow. And you want your queue to perform, and yes, mysql will do fsync too.
You can't atomically pop stuff. Don't even think about pop/push/pushall on an array in a document! If you get this idea check my gist https://gist.github.com/2071805, run it and see what happens :) and what you get back.
When you visit the ZeroMQ page you will see
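a list of selling points, roughly along the lines of “Faster than TCP, for clustered products and supercomputing”, “Carries messages across inproc, IPC, TCP, and multicast”, “The socket library that acts as a concurrency framework” and so on.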
Nothing about consistency. FASTER THAN (this has to be good) TCP but it can use TCP (i wonder if it can be faster than TCP even while using TCP /trollface). Anyway you see a lot of stuff. I did some searching on zeromq losing data and what i found at http://zguide.zeromq.org/page:all#Missing-Message-Problem-Solver is a nice image.
Big thing :)
If you visit the rabbitmq page http://www.rabbitmq.com/ you will see a lot of nice things like tutorials etc. The page is nice and has useful knowledge. Both solutions have clients in Erlang (massive plus) and other languages. And even though setting the whole thing up may be a pain i think both ZeroMQ and RabbitMQ are solid options.
We use them to absorb message traffic and process the content with eg. workers / handlers etc. If we make it impossible for more than one worker to process the queue we ain't doing our job properly.
So what makes things hard ?
I think the best way to go is just to start a new movement called Unix Archeology, because we seem to be reinventing the wheel too many times. But really:
I'm 100% sure that storing queues as serialized lists in memcached, or keeping them as a table in mysql/postgres and doing loads of funky stuff to keep it running, is not the way to go. It can seem like a good idea at the start but it is not. A named pipe in the file system can be better.
Loads of things can be brilliant queue choices, eg. Redis, ZeroMQ, RabbitMQ or even named pipes, but not a serialized array in a key-value store.
@antirez: “The Redis community is composed of 99% of people that, really, know their stuff, know the Redis internals and behaviour, and are * great *.”
@shanley: “@antirez I’ve never met a technical community where 99% of them were familiar with the internals of anything. Did you mean 9%?”
This sparked in my mind a very quick review of the topics that we talk about at work and i realised that we talk a lot about the internals of redis and a bit about riak, but that is a different story :).
I just wanted to write a short post about the first thing i ever picked when i was looking into the internals of Redis. It is the List :D i love lists.
So what i did was open up github again and pick the list header file to re-read.
https://github.com/antirez/redis/blob/d310fbedabd3101505b694f5c25a2e48480a3c2b/src/adlist.h
The first thing that you notice is that the code is simple and the whole thing is implemented in 93 lines of header and 341 lines of .c file (with license etc lol).
In general a List is just a degenerate Tree. In Redis its structure is simple. The whole description of the list is simply
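Roughly, the two structs from adlist.h:

```c
typedef struct listNode {
    struct listNode *prev;
    struct listNode *next;
    void *value;
} listNode;

typedef struct list {
    listNode *head;
    listNode *tail;
    void *(*dup)(void *ptr);
    void (*free)(void *ptr);
    int (*match)(void *ptr, void *key);
    unsigned long len;
} list;
```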
this knowledge lets us count how much space it will take on the heap and compare it with eg. a set if we really need to (a list is imho the most memory efficient structure).
The list iterator is important so it is also worth having a peek at, even if it is just an internal implementation detail.
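Roughly:

```c
typedef struct listIter {
    listNode *next;
    int direction;
} listIter;
```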
with this we can take a peek into the .c file and check how you get an iterator.
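Roughly, from adlist.c:

```c
listIter *listGetIterator(list *list, int direction)
{
    listIter *iter;

    if ((iter = zmalloc(sizeof(*iter))) == NULL) return NULL;
    if (direction == AL_START_HEAD)
        iter->next = list->head;   /* 0: start at the head */
    else
        iter->next = list->tail;   /* anything else: start at the tail */
    iter->direction = direction;
    return iter;
}
```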
And see that even though AL_START_HEAD is defined as 0 and AL_START_TAIL as 1, if we use a direction of 5 (lol) we will get the tail :D I know that i'm bikeshedding now.
Even without going any deeper you get a feeling for how this works: a doubly linked list with a (void *) value. The first thing i thought today was (this was stupid) “Wow, why is this (void *) and not (char *), that would let the compiler type check it better at compile time”, but antirez wrote to me “@jakuboboza hint: grep listCreate *.c” and that was the hint i needed. (void *) is more generic, and the list is used in many places in redis internals, which i had not thought about (lol).
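If you want to see it for yourself, run something like this in the redis src/ directory:

```
grep -n "listCreate(" *.c
```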
A lot of places :) lol.
In adlist.c we can also check how the list is created
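Roughly:

```c
list *listCreate(void)
{
    struct list *list;

    if ((list = zmalloc(sizeof(*list))) == NULL)
        return NULL;
    list->head = list->tail = NULL;
    list->len = 0;
    list->dup = NULL;
    list->free = NULL;
    list->match = NULL;
    return list;
}
```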
This is just an academic example :D I love it. This code is easy to understand and just a pleasure to read.
It is important to talk about internals, because when you read in the documentation (roughly) that
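Redis lists are implemented via linked lists, so adding a new element to the head or to the tail of the list is performed in constant time, even if there are millions of elements already inside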
you really want to check this out to get a better understanding of how stuff works under the hood, even if it is a trivial example.
It is worth talking about the internals of the tools that you use; you learn a lot and i think what antirez said is true, this community is great!
The next thing to look at, suggested by antirez: Dict!
antirez: “@jakuboboza it’s definitely a very simple implementation! Probably our most “on steroids” implementation of data structures is dict.c”
^_____^