Failover Redis Setup With Sentinel

Sep 26th, 2012 | Comments

Long time nothing new, recently i started my own company LambdaCu.be and I was massively busy. If you want to hire me ping me at kuba@lambdacu.be =)

I have in pipeline a lot texts about lua scripting in redis and using it to build some tools but can’t find time to finish this stuff ;/.

Auto failover

Every database wants to have auto failover mechanism. This is a great marketing pitch! haha :) Main thing about is that one of your server can go down and you still are operating as normal and when he will go up again everything is fine.. unless your routing server will go down :D ofc.

2.4.16 / 2.6

Since Redis 2.4 and 2.6 there was this idea of adding it. Antirez wrote a draft spec for it and implemented it as experimental feature. It is really well described here http://redis.io/topics/sentinel so i will just write a short note how did i setup it and how does it feel.

Setup

While preparing to this demo i did everything on master 0ee3f05518e081640c1c6f9ae52c3a414f0feaceso what i did was simply start “master” and “replica servers” with this configs (ofc turn daemonize to yes in production lol)

Standard Master setup with default script on port 6379 and replica with

daemonize no
timeout 0
loglevel notice
logfile stdout
databases 16
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir ./replica_dir
slave-serve-stale-data yes
slave-read-only yes
appendonly no
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
lua-time-limit 5000

pidfile /var/run/redis-replica-7789.pid
port 7789
# replication config
slaveof 127.0.0.1 6379

slowlog-log-slower-than 10000
slowlog-max-len 1024
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60

So i had master and slave running :) that was cool next thing i did was! configure and turn on the sentinel!

sentinel monitor mymaster 127.0.0.1 6379 1
sentinel down-after-milliseconds mymaster 60000
sentinel failover-timeout mymaster 900000
sentinel can-failover mymaster yes
sentinel parallel-syncs mymaster 1

sentinel monitor resque 127.0.0.1 7789 1
sentinel down-after-milliseconds resque 10000
sentinel failover-timeout resque 900000
sentinel can-failover resque yes
sentinel parallel-syncs resque 5

And i turned him on! with

redis-server sentinel-my.conf --sentinel

And stuff started to work :D

Carnage!!!

So i started easy

redis 127.0.0.1:6379> get "hello"
"hello"
redis 127.0.0.1:6379> set "hello" "lulz"
OK

Works! I killed master and connected on 26379 to sentinel master did query

 sentinel masters

and got

1)  1) "name"
    2) "resque"
    3) "ip"
    4) "127.0.0.1"
    5) "port"
    6) "7789"
    ...

Cool works great :D The only thing that worried me was that when i turned on master after failover (it took 8 sec) he did not pickup he is slave and he did not start replicating data.

when you do this…

You will see beefy

[36373] 26 Sep 21:15:03.441 # Error condition on socket for SYNC: Connection refused
[36373] 26 Sep 21:15:04.521 * Connecting to MASTER...
[36373] 26 Sep 21:15:04.521 * MASTER <-> SLAVE sync started
[36373] 26 Sep 21:15:04.521 # Error condition on socket for SYNC: Connection refused
[36373] 26 Sep 21:15:05.128 * MASTER MODE enabled (user request)

On the initial slave :) things just went from bad to good :D

Summary

This is cool new feature that you can have master-slave and auto failover server the only thing that driver have to do is if you get error while connecting / querying is to ask sentinel for new master connect and retry :) It is very basic but…

I like it!

THIS IS EXPERIMENTAL FEATURE and much more info about it you can finde here http://redis.io/topics/sentinel. Especially about pub/sub way of watching stuff / events while they occur.

Designing Twitter Clone in Redis

Jul 31st, 2012 | Comments

Ont he official redis site http://redis.io you can find this http://redis.io/topics/twitter-clone/ post about building twitter clone in redis. I based my design post partially on it but i would like to go more deep into building timeline and posts.

Quick review

I used similar approach to store followers and following so i will just go fast through the keys and design.

  twtr:<user_id>:follows -> set of ids this user follows
  twtr:<user_id>:followers -> set of id's  that follows this user

What happens when i click “follow”

example

redis 127.0.0.1:6379> SADD twtr:kuba:following amelia
(integer) 1
redis 127.0.0.1:6379> SADD twtr:amelia:followers kuba
(integer) 1
redis 127.0.0.1:6379> SADD twtr:kuba:following dan
(integer) 1
redis 127.0.0.1:6379> SADD twtr:dan:followers kuba
(integer) 1
redis 127.0.0.1:6379> SADD twtr:kuba:following ben
(integer) 1
redis 127.0.0.1:6379> SADD twtr:ben:followers kuba
(integer) 1
redis 127.0.0.1:6379> SMEMBERS twtr:kuba:following
1) "ben"
2) "amelia"
3) "dan"

So now i follow amelia, ben and dan.

What happens when amelia click followers!

example

redis 127.0.0.1:6379> SMEMBERS twtr:amelia:followers
1) "kuba"

This way for each user we can see set of people who follow him and those who he follows. Thats all we need like in tutorial.

Post

So i think it is not waste if we will decide to keep post in form of 2 keys

  twtr:<user_id>:post:<post_id> -> content text of post
  twtr:<user_id>:post:<post_id>:created_at -> creation time of post
  twtr:post_owner:<post_id> -> id of post creator.

Why this approach and not compacting all the things into pipe separate key? Both solutions seems to be ok, this just leaves little bit more flexibility. I know that it will generate 2 x times more pickups to redis then previous so you can consider doing

  twtr:<user_id>:post:<post_id> -> (timestamp|text)

Both solutions have a pros and cones, first one will require 2 x lookups and second one will require parsing data in app layer. Still i prefer first one.

Post id

This is hard topic, because in future we would want to scale ( lol ) . Generating post id is not easy task in this case. We could just use auto incrementing counter like this

  twtr:post_id:next -> auto incr

But to have something more flexible you should look at something like snowflake http://engineering.twitter.com/2010/06/announcing-snowflake.html.

Post list

For each user we will store a list of posts he wrote. Initially i thought that we could just pump everything into list. But this is not optimal. This is single point which will grow like crazy and we will be not able to decide how and when to archive parts that are not used. Eg. posts did by user 8 months ago are not really relevant today because if we will make assumption that on average person posts few times a week this 8 month old entry will be way forgotten. We want to archive it, also it will be healthier for memory to store short lists.

I see here two scenarios:

user looks at his last few posts < 100
user is infinite scrolling through all posts.

So this scenario reasonable seems to have list of lists in which we will have ordered post lists id’s. if we will use only LPUSH to add posts lists tot his list we will be able to to do easy LRANGE 0 to get newest lists.

  twtr:<user_id>:lists -> list of lits id's only LPUSH id and LRANGE 0 number.
  twtr:<user_id>:list_next -> auto incr counter for lists id's
  twtr:<user_id>:list:<list_id> -> list with 100 posts

So how do we get most recent posts ? we just LRANGE 0 2 to get most recent two lists and next we will merge them first + second. Both are LPUSH’ed so should be semi ordered. (we don’t really care about order). adding stuff to time line is bit tricky.

we need to do it like this LRANGE <key> 0 1 current list id, next we need to LLEN <key> to check size and if it is < SIZE ( for our example 100 ) we just LPUSH <key> <value> and job done, if size is > 100 we need to INCR <list counter> and LPUSH its result on list of lists and next we need to LPUSH <key> <value> on the new list.

And all of this in application layer. But this is the hardest bit to do. May seem to be complicated but if this seems to be not optimal you can add one more list

  twtr:<user_id>:list:current -> list of current 100 posts

This list is just the most current posts of particular user. How does this list work ? Algorithm is simple

LPUSH new post id’s
RPOP if SIZE > 100

This could be useful to reduce number of hits you get against redis.

Time line

Now the time line. Time line is exactly the same as user post list. we will just one “bit” about adding posts.

Algorithm here is when you add a post you have to pick all id’s of people who follow you. (example from top if you are adding post as amelia)

SMEMBERS twtr:<user_id>:followers

And you need to push your post id into their time line posts list. Thats all. Ofc one thing that we need to add are keys for time line

  twtr:<user_id>:timeline:lists -> list of lits id's only LPUSH id and LRANGE 0 number.
  twtr:<user_id>:timeline:list_next -> auto incr counter for lists id's
  twtr:<user_id>:timeline:list:<list_id> -> list with 100 posts
  twtr:<user_id>:timeline:current -> LPUSH, RPOP > SIZE current list cache

Summary

This is how i would approach building twitter like clone. Things like old lists can be easily archived into mysql, postgres or other thing and EXPIREed from redis. One thing in my design is common that i put a lot into keys <user_id> this could be skiped but in my opinion it is not bad. IF you will use <user_id> in form of user email md5 you can use it directly to access gravatar of that user.

On average you will need to do around 10-30 hits into redis to get data if you plan to do it in a “lazy way” you can minimize number of hits to around 10.

If you see problem with my design comment i want to know it!. The core of this design is that each user post data is stored into one redis instance. This is important because of access and race condition stories if you will have many redis instances. But achieving “sharding” in application layer is not hard. Only thing that i would care about is post id generator. This is single point of failure because i have a strong assertion that post_id is unique in whole system.

Cheers!

Designing Todo App Backend Using Redis and Mongodb

Jul 26th, 2012 | Comments

side note

Upgrading octopress is a @!#!@pain! :>

What is this post about :)

Long time nothing new here so i will glue something together about stuff that we were talking about today with my friend Jarek. We talked about building backend for Todo app :). Yes simple todo app. How to build scalable backend. So my initial thought was “how i would design it in different databases”. (i’m taking only about data model)

Requirements

What we know:

User has some sort of id. (number, email, hash of something)
We need to be able to have different todo lists
User can choose his todo list and see tasks ( obvious )
User can tag tasks!
User can query tasks in list by tags
User can see all tags.

Design using Redis

How to do it with redis ? :)

First few facts i assumed at start. Single todo task has body and timestamps [created_at, updated_at] and base for key will be phrase “todoapp”.

So lets start with user and his list of todo lists :). This gives us first key

todo:<user_id>:todolist:next => auto incrementing counter for lists id
todo:<user_id>:todolists => [LIST]
todo:<user_id>:todolist:<todo_list_id>:name => list name

Here we have two keys, first is list id counter that we will bump to get new list counter :), second is list of todolists ids. Why do it this way ? Well people can add and remove todo lists.

Ok so how to create new list ?

example

redis 127.0.0.1:6379> INCR todo:kuba:todolist:next
(integer) 1
redis 127.0.0.1:6379> RPUSH todo:kuba:todolists 1
(integer) 1
redis 127.0.0.1:6379> LRANGE todo:kuba:todolists 0 -1
1) "1"
redis 127.0.0.1:6379> SET todo:kuba:todolist:1:name "things to do"
OK

Hey ! we just added id of our first list to list of our todo lists (lots of list word here!). Ok so now lets add a task.

list:

todo:<user_id>:todolist:<todo_list_id>:next => auto incrementing counter for tasks id
todo:<user_id>:todolist:<todo_list_id> => [LIST]

and task:

todo:<user_id>:task:<task_id> => content of task eg. "finish blog post"
todo:<user_id>:task:<task_id>:created_at => epoch time when it was created handled by app
todo:<user_id>:task:<task_id>:updated_at => epoch time when it was last updated handled by app

Ok so how to i add task to my list

adding task

redis 127.0.0.1:6379> INCR todo:kuba:todolist:1:next
(integer) 1
redis 127.0.0.1:6379> LPUSH todo:kuba:todolist:1 1
(integer) 1
redis 127.0.0.1:6379> SET todo:kuba:task:1 "finish blog post"
OK
redis 127.0.0.1:6379> SET todo:kuba:task:1:created_at  1343324314
OK
redis 127.0.0.1:6379> SET todo:kuba:task:1:updated_at  1343324315
OK

And we have our first task in. How do we get tasks from out todo list simple!

peeking task

redis 127.0.0.1:6379> LRANGE todo:kuba:todolist:1 0 -1
1) "1"
redis 127.0.0.1:6379> GET todo:kuba:task:1
"finish blog post"
redis 127.0.0.1:6379> GET todo:kuba:task:1:created_at
"1343324314"

Ok so now we have very simple todo lists with tasks, well at least overview. Ofc you can use sets for todo lists or zsets but lets stay with lists to keep it simple for now.

How ro remove task from the list ?

removing task

redis 127.0.0.1:6379> LREM todo:kuba:todolist:1 -1 1
(integer) 1
redis 127.0.0.1:6379> LRANGE todo:kuba:todolist:1 0 -1
(empty list or set)

Good, now we can add tasks, remove tasks, same sotry with adding todo lists and removing todo lists. One last thing is to add tags!. Simply here each task will have list of tags and each tag will have list of tasks related with.

todo:<user_id>:task:<task_id>:tags => [LIST]
todo:<user_id>:tag:<tag>:tasks => [LIST]

So how we will add tags to tasks ? Simple!

tagging

redis 127.0.0.1:6379> LPUSH todo:kuba:task:1:tags "redis"
(integer) 1
redis 127.0.0.1:6379> LPUSH todo:kuba:task:1:tags "design"
(integer) 2
redis 127.0.0.1:6379> LPUSH todo:kuba:tag:redis:tasks 1
(integer) 1
redis 127.0.0.1:6379> LPUSH todo:kuba:tag:design:tasks 1
(integer) 1
redis 127.0.0.1:6379> LRANGE todo:kuba:tag:design:tasks 0 -1
1) "1"
redis 127.0.0.1:6379> LRANGE todo:kuba:task:1:tags 0 -1
1) "design"
2) "redis"

This example shows what we need to do to tag a task with something and how to peek tasks tagged with it. Why we have both lists ? To make it fast while searching. If user will click on particular tag like “redis” you want to get it O(1) time not O(N) after searching all keys. And same the other way, normal ui will pull task test, when it was created and tags to display so we want to have this data ready.

This is whole design for the todo app. We have 8 types of keys. Things like pagination, calculating time are all left to app layer. Important thing is that i scope everything to user key / id. This is because i want to isolate each user space easy. Each user in his own space will have short lists, there is no danger of “ultimate” non splittable lists.

Design using Mongodb

Well this case upfront is easier to grasp because for each list we can use single document or collection of documents lets talk about both solutions.

Todolist = Document

In this example we will use built in “array” operators

creating

> db.todolists.save({name: "House work", tasks: []})
"ok"
> db.todolists.find({name: "House work"})
[
  {   "name" : "House work",   "_id" : {   "$oid" : "50118742cc93742e0d0b6f7c"   },
      "tasks" : [   ]   }
]

So lets add a task :)

db.todolists.update({name: "House work"},{$push:{"tasks":{"name":"finish blog post", "tags":["mongo"] } } })
"ok"
> > db.todolists.find({name: "House work"})
[
  {   "name" : "House work",   "_id" : {   "$oid" : "50118742cc93742e0d0b6f7c"   },
      "tasks" : [
       {   "name" : "finish blog post",   "tags" : [   "mongo" ]   }
      ]   }
]

this will create empty todo list with name “House work” of course this way we will not omit building sub lists of tags etc, we have to build in a same way like in redis but as part of document. The story is exactly the same like above in redis. Mongodb lets us query nested documents and this will enable us to skip some of the extra “lists” while doing search.

Lets try it out how to find out mongo tagged entries?

find by tag

> db.todolists.find({"tasks.tags": { $in : ["mongo"] } })

[
  {   "name" : "House work",   "_id" : {   "$oid" : "50118742cc93742e0d0b6f7c"   },
      "tasks" : [
        {   "name" : "finish blog post",   "tags" : [   "mongo" ]   }
      ]   }
]

This way we can find whole todolist with task that contains tag “mongo” but after that we will have to work out from the document in app the task that we are interested int. Using it like this we will have a document with structure like this

todo list structure

{
 user: "user id",
 name: "<name>",
 tasks: [
   {
    text : "todo text",
    tags : [
      "Tag1", "Tag2"
    ]
    created_at : Time,
    updated_at : Time
   }
 ]
}

Using redis we could wrapp stuff into MULTI command while using mongodb “array” command we are a bit cowboishing. They could remove wrong stuff in we will not be cautious :). (well same in redis!) Big plus of Mongodb is native time type!

Todolist = many documents

Using this approach we can leverage more of our stuff on mongodb search in this approach each task will be a different document. With structure like this

task structure

{
  todo_list: "todo list id",
  user: "user id",
  text: "todo text",
  tags: ["Tag1", "Tag2"]
}

This way we will have a lot of documents, more disk space consumption and still we will have to have second collection with with objects with structure like this

structure of todolist

{
  todo_list: "todo list id",
  name: "todo list name",
  user: "user id",
  // tasks: [Tasks OBjectID Array] you could have this and remove todo_list id from tasks choice is yours :)
}

And this way we can use find tool very easy and get documents fast.

Summary

All of this solutions have some pros and cons, mongodb excels better when documents are bigger (limit is set on 16 mb per document) than loads of small documents (massive waste of space). Solution in redis is really fast and if you will implement lazy loading it will be very fast. You can adjust this designs to your situation by changing lists to sets etc. The place where redis OWNS mongodb in this context is “strucutres” and we use a lot of them to store data like this, lists sets, zsets. Implementing priority list in mongodb will be totally custom solution while in redis we can just use zset.

This is just my point of view on this. I will supply some code to cover it more in part two. This is next problem, i’m sure solution in mongodb using things like mongoid http://mongoid.org will be much more developer friendly then building things “rawly” in redis hiredis client.

btw i jsut wrote this from “top of my head” so it may contain typos and i’m sure keys, structures can be optimized :) This is just to open discussion with my friend :)

Cheers!

Api Prototyping With Rails Respond_with

Jun 15th, 2012 | Comments

Prototyping json / xml RESTfull api with rails is easy. Before we will want to rewrite it to something like erlang webmachine or node.js! For purpose of this we can use syntax introduced in rails 3. (ages ago) with respond_to/respond_with it is very cool.

Example controller

How we use this ? Lets take a peek and simple example:

class VenuesController < ApplicationController

  respond_to :html, :xml, :json

  def index
    respond_with(@venues = Venue.all.page(params[:page]).per(20))
  end

end

Two things we will notice at start is respond_to :html, :xml, :json this is something like old merbs provides where we specify to which formats we want to respond. Second change is how we layout the action. All we have to do is to put into respond_with object we want to respond with.

What this buys for us ?

If we have html request to actions like create, update or destroy we want to redirect on success
if we have json, xml request to same type of “state changing” actions we want to render response with our format.

We achieve both of this with reponds_with in just one line. But lets take a peek at longer example.

class VenuesController < ApplicationController

  before_filter :authenticate_user!, :except => [:index, :show, :near]

  respond_to :html, :xml, :json

  def index
    respond_with(@venues = Venue.all.page(params[:page]).per(20))
  end

  def show
    respond_with(@venue = Venue.find_by(permalink: params[:id]))
  end

  def new
    repond_with(@venue = Venue.new)
  end

  def create
    @venue = Venue.create(params[:venue])
    respond_with(@venue, :location => venues_url)
  end


end

In this example we can see example of create action where we add :location => venues_url this will in case of format: html success redirect to this url.

Summary

Using this helps to write stuff fast and readable, you can still use plain old respond_to in action with old format.html syntax..

I hope you liked it :)

Pong Tests Are Wrong

Jun 7th, 2012 | Comments

After short talk with tomash and his gist https://gist.github.com/2871286 with performance of vibe.d i decided to write this post. In my opinion pong tests are wrong and do not show real “performance”. I said to him to check sinatra with thin handler not webrick and that showed ~1.6k req/sec which is not bad at all. While at my box it was ~900 req/sec so significantly less (Macbook Pro i5).

From this gist we can see that his vibe.d benchmark set pong test at 8425.85 req/sec (He used ab and i used httperf).

Warp

My first candidate in this competition of pong tests is haskell warp handler.

code:

pingpong.hs

{-# LANGUAGE OverloadedStrings #-}

import Network.Wai
import Network.Wai.Handler.Warp
import Network.HTTP.Types (status200)
import Blaze.ByteString.Builder (copyByteString)
import qualified Data.ByteString.UTF8 as BU
import Data.Monoid
import Data.Enumerator (run_, enumList, ($$))

main = do
    let port = 8000
    putStrLn $ "Listening on port " ++ show port
    run port app

app req = return $
    case pathInfo req of
        ["pong"] -> pong
        x -> index x

pong = ResponseBuilder status200 [ ("Content-Type", "text/plain") ] $ mconcat $ map copyByteString
    [ "pong" ]

index x = ResponseBuilder status200 [("Content-Type", "text/html")] $ mconcat $ map copyByteString
    [ "<p>Hello from ", BU.fromString $ show x, "!</p>"
    , "<p><a href='/pong'>pong</a></p>\n" ]

it is semi rack like syntax ;)

Benchmark results

I prepared results to show how big lie is pong test. Because in this type of test / showoff all you test is how fast really you can accept connections. Single thread will always win :). But lets look at results:

Multi threaded – 50 threads – Request rate: 858.8 req/s (1.2 ms/req) BOOO!!!!
Multi threaded – 4 threads ( on for each core ) – Request rate: 10020.8 req/s (0.1 ms/req) Vibe.d die!!! Yeah!!!!
Single threaded – default compilation – Request rate: 13584.1 req/s (0.1 ms/req) Mother of God!

Tested with httperf --uri=/ --port=8000 --num-calls=10000 --num-conns=20 httperf command.

Summary

You can test it on your own, i did it on latest 7.4.1 Ghc from haskell platform on OSX 10.7. And post reply with your results :) maybe i missed something. Code, scripts to build and run are in repository https://github.com/JakubOboza/haskell-warp-pong-test.

So how to test ?!

I think you should test your application in default environment so with db behind it, but anyone in this scenario can say it is testing performance of db. But everyday… users are really testing performance of our db ;>… or the weakest of the elements in chain. So if your db / rendering engine is performing at level of 50 req / sec fast app handler will not turn it into 5000 req / sec.

Rebar Templates

May 29th, 2012 | Comments

First question when i need to add new blog to my code I ask is, do i remember all the boilerplate and how many times i will make a mistake this time. No more :). Rebar has a nice thing built in it is templating language that lets us build our own templates.

Custom template ?

I always end up looking at old projects and copying parts like gen_server and reusing them. I always knew rebar has option to write them but never had time to look at it. Today lol, i wanted to do some cleaning at home so every thing seems to be a good excuse to not do any cleaning :D.

What i need to know

Basic template is made from one or many .erl files written with mustache style { { } } code and .template file describing what to do with files.

Let’s build gen_server template

So my first file will be gen_server.erl

gen_server.erl

%%% @author  { {author_name } } <{ {author_email} }>
%%% @copyright  { {copyright_year } }  { {author_name} }.
%%% @doc  { {description } }

-module( { {name } }).
-behaviour(gen_server).

-author(' { {author_name } } <{ {author_email } }>').

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, terminate/2, handle_info/2, code_change/3, stop/1]).

% public api

start_link(_Args) ->
  gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

% state should be change with State that you will pass
init([]) ->
  {ok, state}.

stop(_Pid) ->
  stop().

stop() ->
  gen_server:cast(?MODULE, stop).

handle_call({method_name_and_params}, _From, State) ->
  Response = ok,
  {reply, Response, State};

handle_call(_Message, _From, State) ->
  {reply, error, State}.

handle_cast(_Message, State) -> {noreply, State}.
handle_info(_Message, State) -> {noreply, State}.
terminate(_Reason, _State) -> ok.
code_change(_OldVersion, State, _Extra) -> {ok, State}.

This is the template, i know a bit long i tried to cut out all comments, euint etc and narrow it to minimum. I posted it to show how much you can save :). Now you will need the transformation file. All this things in { { something } } will be replaced by things we will type in command line or defaults from template file.

gen_server.template

{variables, [
  {name, "template"},
  {copyright_year, "2012"},
  {author_name, "jakub Oboza"},
  {author_email, "jakub.oboza@gmail.com"}
]}.
{template, "gen_server.erl", "src/{ {name} }.erl"}.

For me it looks like this, we have few default definitions and at the bottom. template! This is important part it says which file he has to copy where and what will be the name of new file. Now everything should be clear!

Injet it into rebar!

All you need now to do is symlink your template folders to ~/.rebar/templates and you can use them! (you can symlink your folder or just create one there :) )

Use!

λ rebar create template=gen_server name=example
==> tmp (create)
Writing src/example.erl

Viola!

important

When i was looking at this post i saw that { { is converted in a wrong way by octopress so i added spaces between them! check repo for correct code!

My own templates

Today i started adding my own templates initially i have only gen_server and webmachine_resource but i will add more :). It is fun it is like building your own anti-boiler plate framework.

My templates repo: https://github.com/JakubOboza/rebar-templates

Hope this helps! Cheers!

Future Toons Processing Logs With node.js

Apr 29th, 2012 | Comments

I’m using new relic for monitoring of my servers and i think it is great tool but recently their marketing wing just spammmmmmed, hammered me with emails and when i replied to one of this emails it started to be even worse. My name is not Dave, David, John or Mark its Jakub. So i thought about processing nginx and all application servers on my own just for the lulz :).

First thing to achieve was to build a tool that will be able to work with logs in a way that will not limit it usage to small files. This is simple, i decided to build a lib that will enable to stream process logs by “entry”. Some of the

Initial idea

We will emit single “entry” so for logs that are build around “lines” we will emit line and for logs like rails we will emit whole entry. Line streaming is easy so few hours ago during watching 3 season of http://en.wikipedia.org/wiki/Metalocalypse i wrote this.

Future toons

Version 0.0.1 https://github.com/JakubOboza/future_toons This is my entry point to analytics on big files :). First thing was a benchmark. I said to my self that it can’t be more then 0.5sec on my box (i5 MBP) vs 100mb log file from nginx. I took a log from production and checked my initial code. It was 0.37 sec. That’s ok.

lib/toons.js

function FutureToons(filename, callback, end_callback){
  // in case someone will call it wrong way ;)
  if(false === (this instanceof FutureToons)) {
    return new FutureToons(filename, callback);
  }
  events.EventEmitter.call(this);

  this.onEnd(end_callback);

  if(filename && callback){
    // if both present run instant
    this.onLine(callback);
    this.run(filename);
  }

}

// do not put methods between this line and initial definition
util.inherits(FutureToons, events.EventEmitter);

Code of whole thing is very simple. But while building node.js module there are few things that are useful. First of all you don’t need to implement your own way of inheritance you can use the one from utils. (Remember to put it just after function definition) because it overrides prototype :) you don’t wanna lose your “instance methods” don’t you.

Next nice thing to help users is check of instanceof just at top of “constructor” function. This prevents users from using it wrong. Well its easier to say it enables them to use it wrong and fixes their mistake.

var toons = require('future_toons');
new toons("example.js", function(line){});

and

var toons = require('future_toons');
toons("example.js", function(line){});

This will produce same output and act in same way.

How to use this ?

It is simple :) You need to know three things, where the file is, what do you want to do with each line and do you need to do something at the end.

basic example:

var toons = require('future_toons');

var on_each_line_callback = function(line){
  console.log("> " + line);  
};

new toons("example.txt", on_each_line_callback);
// this will auto trigger run and process it but you can delay it like this

var streamer = new toons();
streamer.onLine(on_each_line_callback);
// you can add function on end! also :)
streamer.onEnd(function(){console.log("lol")});
// run it naow!
streamer.run("example.txt");
// you can reuse it for many files if you want ;p

This example will show each line of the file prefixed with “>” symbol and at the end he will print out “lol”. This is most common case in real world :) you need to optimize “lol”. And for now thats the whole api.

Command line interface

Currently i added simple command line interface just to play with it. example usage:

λ time node bin/toons -f ~/Downloads/access.log -e "function(line){}" node bin/toons -f ~/Downloads/access.log -e "function(line){}"
0.38s user 0.07s system 101% cpu 0.440 total

It is a bit unsafe now so maybe i will remove it soonish.

Db vs File

Some people say they need to put logs into db, i always ask this people “why not just file? this db will have to write it to this file anyway :)”

Summary

if you will get email from new relic sales guy, ignore! don’t reply ever!!! Haha while writing this post i got new email from their sales. Code is on github i hope i will have some time to work on it and it will have rails production.log streaming support soonish.

Cheers!

Setting Up Replication With Redis

Apr 29th, 2012 | Comments

Everyone who wants to feel safe about his data wants to have some sort of backup :). Redis have a support for replication. And it is very easy to setup.

Setup!

To setup replica node all you have to do is to add one line slaveOf in config @_@ of your new Redis instance. Sounds easy :). Lets think about most basic scenario.

Two nodes, master node and slave node. For purpose of this example you can just start redis using redis-server command without by default he will start on port 6379 and this is all we need to know to setup replication.

Configuration of replica node

To configure replica node all we need to do is to create place to store the db eg. mkdir replica_db and choose port eg. 7789. Last thing to do is to create config and point this node to the master. For me it looks like this:

redis-replica.conf

daemonize no
timeout 0
loglevel notice
logfile stdout
databases 16
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir ./replica_dir
slave-serve-stale-data yes
slave-read-only yes
appendonly no
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
lua-time-limit 5000

pidfile /var/run/redis-replica-7789.pid
port 7789
# replication config
slaveof 127.0.0.1 6379

slowlog-log-slower-than 10000
slowlog-max-len 1024
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60

Here the important thing really is slaveof 127.0.0.1 6379 where we set where is our master. port 7789 important if we are using few redis instances on one box and dir ./replica_dir be sure to not point this to master node db path if you do… you will suffer eternal flame.

Now just start the node redis-server redis-replica.conf and he will start syncing.

Checking if everything works

So by now we should have master running on default port and replica connected to it. Lets connect using redis-cli to master and set some keys eg. set name kuba. Now lets connect to our replica. If you followed the same configuration then me you can simply do ./redis-cli -p 7789 this will prompt you with regular command line interface. Now jsut type get name

redis-replica.conf

redis 127.0.0.1:7789> get name
"jakub"

Bang works!

RO

Important information is that one master can have many replicas and each replica is read only! So you can connect to it and read from it if you want / need.

 λ ./redis-cli -p 7789
redis 127.0.0.1:7789> keys *
 1) "age"
 2) "name"
redis 127.0.0.1:7789> get name
"jakub"
redis 127.0.0.1:7789> set name "not jakub"
(error) READONLY You can't write against a read only slave.
redis 127.0.0.1:7789>

Summary

I never had deadly important data in redis :) But still its worth knowing how to setup this just in case something goes wrong you may want to have replica ready :).

On official site http://redis.io/topics/replication you can learn more about replication in redis.

Cheers!

Building node.js Module Using Npm

Apr 23rd, 2012 | Comments

I have spend a bit time to find out how to build modules, where to put dependencies and how to form package.json so i decided to create this post to gather some of this info in one place. This modules are more like ruby gems than parts of language grouping functions also called modules (eg. Erlang). I was recently on conference and i want to post something about module i was working on at airport but before next post i need to add this so i will have something to reference to.

npm

Npm stands for node package manager and it is something like gems in ruby, eggs in python or apt in debian. It lets you search, install and update your node or application modules. You can create node.js modules without npm but if you want to publish your module its better to do it this way. If you have node.js >0.6.10 you should have npm bundled with your instalation if not go to http://npmjs.org/ and follow the instructions. (for normal platforms all you have to do i run http://npmjs.org/install.sh)

scaffold

First thing to do is to initialize our module, to do this we need to create a directory for it and run npm init like this.

λ mkdir my_first_module && cd my_first_module
λ npm init

Next he will ask us few questions, we should answer ;). example output:

Package name: (my_first_module)
Description: my first module
Package version: (0.0.0)
Project homepage: (none) no-fucking-idea.com
Project git repository: (none)
Author name: Jakub Oboza
Author email: (none) jakub.oboza@gmail.com
Author url: (none) no-fucking-idea.com
Main module/entry point: (none)
Test command: (none) mocha -R landing lib/my_first_module.js
About to write to /private/tmp/my_first_module/package.json

{
  "author": "Jakub Oboza <jakub.oboza@gmail.com> (no-fucking-idea.com)",
  "name": "my_first_module",
  "description": "my first module",
  "version": "0.0.0",
  "homepage": "no-fucking-idea.com",
  "scripts": {
    "test": "mocha -R landing lib/my_first_module.js"
  },
  "dependencies": {},
  "devDependencies": {},
  "optionalDependencies": {},
  "engines": {
    "node": "*"
  }
}



Is this ok? (yes) yes

This will create for us package.json. Now we have fully functional module and we could stop now…. but

package.json

This file is description of our package. It is in form of json so it should be easy to read and change. Most of this fields don’t need a lot of description because keys are self explaining but things that we should look at is "scripts" where we defined "test" key. If we will now run npm test he will execute this command that is very useful.

Engines defined on what version of node.js our code will work leaving it to * is a bit hazard. You can set it to something specific if you want.

Important thing! If we will not specify entry point to our module by default it will be looking for index.js so for now lets leave it this way.

tests with mocha

If we want to write reasonable code that we want to rely on we should be doing massive testing. I think mocha is a very good library for this purpose! I strongly suggest installing it with flag -g so it will be accessible in global scope of npm

λ npm install -g mocha

Code

Ok but we don’t have any code yet ;/. Yes lets start coding.

λ mkdir lib
λ touch index.js
λ touch lib/my_first_module.js

Ok so we created lib directory with out module code, empty module file and index file. Now lets run the tests and see what happened.

λ npm test

> my_first_module@0.0.0 test /private/my_first_module
> mocha -R landing lib/my_first_module.js

  ✔ 0 tests complete (1ms)

Everything passed, that is expected because we don’t have any tests :) Now lets add to package.json one more thing development dependency for should.js it will enable us to use mocha in a bit rspec bdd style. Like this:

package.json

  "devDependencies": {
    "should": ">= 0.0.0"
  },

And again npm install -l to get everything installed locally.

First test

Initial mocha test for our module

lib/my_first_module.js

function MyFirstFoo(a, b){

}


module.exports = MyFirstFoo

require('should')

describe('MyFirstFoo', function(){
  it("should be able to add", function(){
    MyFirstFoo(2, 3).should.be[5];
  });
});

First we define empty body of function next we have module.exports = this is to mark which functions will be visible outside this module when other clients will require this module. If you need more info about writing specs in mocha please read my earlier post http://no-fucking-idea.com/blog/2012/04/05/testing-handlebars-with-mocha/. Now lets run npm test

 λ npm test

> my_first_module@0.0.0 test /private/tmp/my_first_module
> mocha -R landing lib/my_first_module.js
  -----------------------------------------------------------------------------------------------------------------------------------------------
  ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅✈
  -----------------------------------------------------------------------------------------------------------------------------------------------

  ✖ 1 of 1 tests failed:

  1) MyFirstFoo should be able to add:
     TypeError: Cannot read property 'should' of undefined
      at Context.<anonymous> (/private/tmp/lol/my_first_module/lib/my_first_module.js:13:21)
      at Test.run (/opt/local/lib/node_modules/mocha/lib/runnable.js:156:32)
      at Runner.runTest (/opt/local/lib/node_modules/mocha/lib/runner.js:272:10)
      at /opt/local/lib/node_modules/mocha/lib/runner.js:316:12
      at next (/opt/local/lib/node_modules/mocha/lib/runner.js:199:14)
      at /opt/local/lib/node_modules/mocha/lib/runner.js:208:7
      at next (/opt/local/lib/node_modules/mocha/lib/runner.js:157:23)
      at Array.0 (/opt/local/lib/node_modules/mocha/lib/runner.js:176:5)
      at EventEmitter._tickCallback (node.js:192:40)

...

Fails as expected so lets ad implementation to our function.

lib/my_first_module.js

function MyFirstFoo(a, b){
  return a + b;
}


module.exports = MyFirstFoo

require('should')

describe('MyFirstFoo', function(){
  it("should be able to add", function(){
    MyFirstFoo(2, 3).should.be[5];
  });
});

And run npm test

λ npm test

> my_first_module@0.0.0 test /private/tmp/lol/my_first_module
> mocha -R landing lib/my_first_module.js
  -----------------------------------------------------------------------------------------------------------------------------------------------
  ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅✈
  -----------------------------------------------------------------------------------------------------------------------------------------------

  ✔ 1 tests complete (2ms)

We have landed safely ;) Now we are ready for development!

Summary

Creating good quality of code requires testing in node.js thats why i decided to join this two things and explain how to marry them both fast. More info can be found here http://howtonode.org/how-to-module.

I hope it helped a bit. Cheers!

Setting Up Redis Cluster

Apr 16th, 2012 | Comments

redis cluster in currently unstable, i used todays master HEAD (93a74949d7bb5d0c4115d1bf45f856c368badf31) commit to build my redis server and client. Setting redis cluster requires only few settings to go! :)

Here is link to overview how redis cluster works http://redis.io/presentation/Redis_Cluster.pdf

redis.conf

Regular nodes can’t be part of cluster :( so you have to prepare separate redis configs for your cluster servers. Most important thing is to setup cluster-enabled and cluster-config-file I decided to name my config files redis-cluster-<port>.conf. I used ports 4444, 4445 4446 Here is my sample config

daemonize yes
timeout 0
loglevel notice
logfile stdout
databases 16
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir ./cluster_4444
slave-serve-stale-data yes
slave-read-only yes
appendonly no
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
lua-time-limit 5000

# Cluster
#
pidfile /var/run/redis-4444.pid
port 4444
cluster-enabled yes
cluster-config-file redis-cluster-4444.conf

slowlog-log-slower-than 10000
slowlog-max-len 1024
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60

For each node i created directory cluster_<port> and that was the hardest part actually to do. With this all you have to do is to start ( for debug you can set daemonize to no) all nodes using redis-server path/to/redis-cluster-<port>.conf and then use magic ruby tool :)

redis-tribe.rb

In src/ directory of source you can find ruby script for creating and managing cluster. But first you need to have ruby installed with redis gem. i just did gem install redis but if you don’t have ruby you have to google how to install it etc (hint: get 1.9.2).

now you can run the script. ./redis-tribe.rb and see

λ ./redis-trib.rb
Usage: redis-trib <command> <arguments ...>

  create               host1:port host2:port ... hostN:port
  check                host:port
  reshard              host:port

To start cluster we will type “create” (useless comment)

λ ./redis-trib.rb create 127.0.0.1:4444 127.0.0.1:4445 127.0.0.1:4446
Creating cluster
Connecting to node 127.0.0.1:4444: OK
Connecting to node 127.0.0.1:4445: OK
Connecting to node 127.0.0.1:4446: OK
Performing hash slots allocation on 3 nodes...
[FAIL] 5a2f6df453f1cd52bcb22c2afc45580283bcce87 127.0.0.1:4444 slots:0-1364 (1365 slots)
[FAIL] 35d107017bc726ece9b57e1ea2f21678555cf6a8 127.0.0.1:4445 slots:1365-2729 (1365 slots)
[FAIL] 76d06b0d3cb1b3829cb60574260dff2d06964cea 127.0.0.1:4446 slots:2730-4095 (1366 slots)
Can I set the above configuration? (type 'yes' to accept): yes
** Nodes configuration updated
** Sending CLUSTER MEET messages to join the cluster
Performing Cluster Check (using node 127.0.0.1:4444)
[FAIL] 5a2f6df453f1cd52bcb22c2afc45580283bcce87 127.0.0.1:4444 slots:0-1364 (1365 slots)
[FAIL] 35d107017bc726ece9b57e1ea2f21678555cf6a8 127.0.0.1:4445 slots:1365-2729 (1365 slots)
[FAIL] 76d06b0d3cb1b3829cb60574260dff2d06964cea 127.0.0.1:4446 slots:2730-4095 (1366 slots)
[OK] All 4096 slots covered.

Nice we have our cluster running :) now we can connect to any node and try it out.

λ ./redis-cli -h 127.0.0.1 -p 4445
redis 127.0.0.1:4445> set "jakub" "oboza"
(error) MOVED 198 127.0.0.1:4444

Sweet :D

Using this tool you can also reshard :D I did on my 15 keys worked :-F.

Smart clients

In redis doc we can read that you will require “smart client” to make it low latency. Yes, you can read from output that it was moved so you will have to cache where the key is now and reset temp cache when it will be moved (resharding)

Fire!

You can now test how it will behaves under fire by killing and restarting your nodes eg.

[19008] 16 Apr 19:44:09.945 # Server started, Redis version 2.9.7
[19008] 16 Apr 19:44:09.946 * The server is now ready to accept connections on port 4444
[19008] 16 Apr 19:45:14.414 * Connecting with Node c20290a7b70a2a840a168c3309f00e3de1b1844d at 127.0.0.1:14446
[19008] 16 Apr 19:45:15.424 * Connecting with Node ab93647957ed4bb93fc43b1dc76202a6cdb94f49 at 127.0.0.1:14445
[19008] 16 Apr 19:59:10.047 * 1 changes in 900 seconds. Saving...
[19008] 16 Apr 19:59:10.047 * Background saving started by pid 19321
[19321] 16 Apr 19:59:10.080 * DB saved on disk
[19008] 16 Apr 19:59:10.248 * Background saving terminated with success
[19008] 16 Apr 20:08:29.837 * I/O error reading from node link: connection closed
[19008] 16 Apr 20:08:29.837 * I/O error reading from node link: connection closed
[19008] 16 Apr 20:08:30.056 * Connecting with Node 76d06b0d3cb1b3829cb60574260dff2d06964cea at 127.0.0.1:14446
[19008] 16 Apr 20:08:30.056 * I/O error writing to node link: Broken pipe
[19008] 16 Apr 20:08:30.525 * I/O error reading from node link: connection closed
[19008] 16 Apr 20:08:30.526 * I/O error reading from node link: connection closed
[19008] 16 Apr 20:08:31.063 * Connecting with Node 35d107017bc726ece9b57e1ea2f21678555cf6a8 at 127.0.0.1:14445
[19008] 16 Apr 20:08:31.064 * Connecting with Node 76d06b0d3cb1b3829cb60574260dff2d06964cea at 127.0.0.1:14446

Summary

Even if i think this is a great tool, and is unstable i saw after few minutes play that some things just don’t work as intended and some keys are not pushed. But it is pulled from unstable branch so i’m crossing my fingers for this project because it looks sweet! Go Go Antirez.

← Older Blog Archives Newer →