No F*cking Idea

Common answer to everything

‘If’ the Real Game Changer


Recently I was working on a project that processes campaigns. I asked myself if I could improve it a bit, not by refactoring the existing code but by trying to approach it from a different angle.

Observations

I started making mental notes of the things I understand about the system I'm working with. I never had time to think about a grammar for it and really get into it. The system has many parts, but I thought it could be reproduced in the form of a language that is easy to parse and nice to work with. I think I might change my current solution into something more LISP-looking just to make it easier to parse.

Observation 1

Each campaign is a sequence of actions that happen one after another. So my initial thought was "this is simple". We can represent it as a list of actions like this

 [step] -> [step] -> [step]

But this is only true for simple campaigns, and most of them are NOT like this :(

Observation 2

Most campaigns are built around one key thing. This thing is making decisions! So if I can incorporate "if", I WIN! Let's have a look

                     /> [step]
                   (No)
                    |
  [step] -> [if a < 5] - ( yes ) -> [step]

Yes! This was it. I think this is the solution to all my problems: the ability to represent a campaign as a sequence of steps mixed with if statements, essentially an abstract syntax tree of the campaign.

Observation 3

It is not a tree… it's more of a graph. But an AST will do :] it is a programming language! So this was in my head and I did not have time to work on it… but today I decided to give it a try and made a first draft of the AST we would need to make it work.

AST

I wrote a simple grammar and made a parser for a very simple "ify" language. My language starts like this…

-- Grammar
-- x is variable
-- a is arithmetic thing aka number
-- n is actual number eg. 1
-- b is boolean
-- opa is arithmetic operator
-- opb is boolean operator
-- opr is comparison operator
-- S is statement
--  a   ::= x | n | - a | a opa a
--  b   ::= true | false | not b | b opb b | a opr a
--  opa ::= + | - | * | /
--  opb ::= and | or
--  opr ::= > | <
--  S   ::= x | x <- a | S1; S2 | ( S ) | if b then S1 else S2

This shows that the grammar is very simple: we can have assignments, operators to test and compare, and of course the IF statement ;) this is our key to divinity!
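To get a feel for what this grammar accepts, here is a small campaign script in the mini language (the variable names `budget` and `spent` and the steps `send` and `stop` are made up for illustration):

```
budget <- 10;
spent <- 3;
if spent < budget then send else stop
```

An identifier on its own (`send`, `stop`) is a bare statement `x`, and the whole thing is an `S1; S2` sequence.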

The parser looks kind of ugly because I reused parts of code I wrote before and had in different projects.

import System.Environment(getArgs)
import System.IO
import Control.Monad
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
import Text.ParserCombinators.Parsec.Language
import qualified Text.ParserCombinators.Parsec.Token as Token

--  Grammar
-- x is variable
-- a is arithmetic thing aka number
-- n is actual number eg. 1
-- b is boolean
-- opa is arithmetic operator
-- opb is boolean operator
-- opr is comparison operator
-- S is statement
--  a   ::= x | n | - a | a opa a
--  b   ::= true | false | not b | b opb b | a opr a
--  opa ::= + | - | * | /
--  opb ::= and | or
--  opr ::= > | <
--  S   ::= x | x <- a | S1; S2 | ( S ) | if b then S1 else S2 

data AExpr = Var String
            | IntConst Integer
            | Neg AExpr
            | ABinary ABinOp AExpr AExpr
              deriving (Show)

data BExpr = BoolConst Bool
            | Not BExpr
            | BBinary BBinOp BExpr BExpr
            | RBinary RBinOp AExpr AExpr
             deriving (Show)

data ABinOp = Add
            | Subtract
            | Multiply
            | Divide
              deriving (Show)

data BBinOp = And | Or deriving (Show)
data RBinOp = Greater | Less deriving (Show)

data Stmt = Seq [Stmt]
           | Assign String AExpr
           | If BExpr Stmt Stmt
           | Ident String
             deriving (Show)

languageDef =
   emptyDef { Token.commentStart    = "/*"
            , Token.commentEnd      = "*/"
            , Token.commentLine     = "#"
            , Token.identStart      = letter
            , Token.identLetter     = alphaNum
            , Token.reservedNames   = [ "if"
                                      , "then"
                                      , "else"
                                      , "true"
                                      , "false"
                                      , "not"
                                      , "and"
                                      , "or"
                                      ]
            , Token.reservedOpNames = ["+", "-", "*", "/", "<-"
                                      , "<", ">", "and", "or", "not"
                                      ]
            }

lexer = Token.makeTokenParser languageDef

identifier = Token.identifier lexer -- parses an identifier
reserved   = Token.reserved   lexer -- parses a reserved name
reservedOp = Token.reservedOp lexer -- parses an operator
parens     = Token.parens     lexer -- parses surrounding parenthesis:
integer    = Token.integer    lexer -- parses an integer
semi       = Token.semi       lexer -- parses a semicolon
whiteSpace = Token.whiteSpace lexer -- parses whitespace

whileParser :: Parser Stmt
whileParser = whiteSpace >> statement

statement :: Parser Stmt
statement =   parens statement
           <|> sequenceOfStmt

sequenceOfStmt =
   do list <- (sepBy1 statement' semi)
      return $ if length list == 1 then head list else Seq list

statement' :: Parser Stmt
statement' =   ifStmt
            <|> try assignStmt -- try: backtrack if "<-" never comes, so a bare identifier still parses
            <|> identName

ifStmt :: Parser Stmt
ifStmt =
   do reserved "if"
      cond  <- bExpression
      reserved "then"
      stmt1 <- statement
      reserved "else"
      stmt2 <- statement
      return $ If cond stmt1 stmt2


identName :: Parser Stmt
identName =
   do  varName <- identifier
       return $ Ident varName

assignStmt :: Parser Stmt
assignStmt =
   do var  <- identifier
      reservedOp "<-"
      expr <- aExpression
      return $ Assign var expr

aExpression :: Parser AExpr
aExpression = buildExpressionParser aOperators aTerm

bExpression :: Parser BExpr
bExpression = buildExpressionParser bOperators bTerm

aOperators = [ [Prefix (reservedOp "-"   >> return (Neg             ))          ]
              , [Infix  (reservedOp "*"   >> return (ABinary Multiply)) AssocLeft]
              , [Infix  (reservedOp "/"   >> return (ABinary Divide  )) AssocLeft]
              , [Infix  (reservedOp "+"   >> return (ABinary Add     )) AssocLeft]
              , [Infix  (reservedOp "-"   >> return (ABinary Subtract)) AssocLeft]
               ]

bOperators = [ [Prefix (reservedOp "not" >> return (Not             ))          ]
              , [Infix  (reservedOp "and" >> return (BBinary And     )) AssocLeft]
              , [Infix  (reservedOp "or"  >> return (BBinary Or      )) AssocLeft]
              ]

aTerm =  parens aExpression
      <|> liftM Var identifier
      <|> liftM IntConst integer

bTerm =  parens bExpression
      <|> (reserved "true"  >> return (BoolConst True ))
      <|> (reserved "false" >> return (BoolConst False))
      <|> rExpression

rExpression =
   do a1 <- aExpression
      op <- relation
      a2 <- aExpression
      return $ RBinary op a1 a2

relation =   (reservedOp ">" >> return Greater)
          <|> (reservedOp "<" >> return Less)


parseString :: String -> Stmt
parseString str =
   case parse whileParser "" str of
     Left e  -> error $ show e
     Right r -> r

parseFile :: String -> IO Stmt
parseFile file =
   do program  <- readFile file
      case parse whileParser "" program of
        Left e  -> print e >> fail "parse error"
        Right r -> return r

main = do
  args <- getArgs
  ast <- parseFile(head args)
  putStrLn $ show ast

But it works… we can parse. Now I have to make it more production-ready and less unstable ;]. This is a first attempt at the idea. I need to expand the grammar and think about it for more than just a few minutes, but I think it is a good start. The next step for me is to make a better grammar and parser and move on to building an interpreter for this mini language.

Recent Rails Security Problems and Building Software


Recently we had a very, very exploit-rich winter. By now most people connect everything to YAML and deserialization in general. But I will not write about that in detail, because for me the problem is deeper.

Startup era

We live in the startup era, and the main focus of people building new products is to build them fast. Whoever is first to build a solution is most likely to take the market and monetize it. So almost every startup founder is looking for a technology that will help him build things fast. This is the main reason Rails is so popular: it is not fast, it consumes memory like crazy, but it offers you something out of the box that you can work with. It has all the properties you are looking for, everything is there, with rich documentation and conventions that are easy to learn and use. Rails is a gift to people who want to "prototype" an application fast, but 99% of the solutions out there never leave the prototype stage.

Going big!

Rails started small, the community was small, and it was normal that not many people cared about finding security issues; it was more important to add support for everything and make the framework richer and richer. The biggest thing opening every door was rising popularity, but popularity is also the framework's biggest enemy. It created traction around the framework, and people started to hack on it, audit the code, and find things that can be exploited.

Natural point in life

Every framework has to go through periods like this in its life; there is no bug-free code. People will find bugs in the code base. What is the best way to prevent this? Have a great team of engineers that follows the trends. But… most startup owners want to cut costs, so they outsource the work, and there are periods in the product's life when simply nobody cares about it technically. It can be even worse: the team can be so focused on features, because of pressure from the boss, that they don't have time for it.

Building solution

Some startup founders will be technical, more technical, or not technical at all; in most cases it is not technical at all, and this is a problem. It leads to every startup being built on frameworks like Rails or Django. The scenario looks the same every time. First the team spends a lot of time building the initial release; then it goes big, and they have no way to scale it other than spamming Rails instances and changing the database. So if something hits Rails, it hits the whole platform, and that hurts. The second scenario is a team that has some sort of engine and just a Rails front end. This is a smart approach, because if something goes really wrong it only kills the front end, and that is not so bad! But how many teams build solutions this way? Not many; mostly polyglot teams that know something more than Ruby. What I have experienced in my career is that people don't want to make initial design decisions before starting; they just want to have a product and "we will think about it later". From a business point of view this is OK, but this "later" often comes very early. Some startups are lucky enough to have engineers who are smart and know how to solve problems using background processing, caching, and a bit of cheating (like YouTube does with vote counts) to make everything work smoothly, but most startups are created in a crazy way with great stress on speed of building.

Problems, security and future

People will always suggest things like YAML.safe_load; in my personal opinion it's not a solution but just a patch. Why not disable support for YAML, JSON, XML, and every other request type, and make it explicit what you accept as the request format for your actions? Trying to apply every possible parser to the input is often not the best thing to do.

Summary

I think the security problems Rails has now are not something we should cry about; it is just another step in becoming a mature framework, and problems like this can always happen, with every framework. We have to embrace it and devise tactics to deal with it in a timely fashion, so that we are not affected. Building software is not cheap, maintaining it is not cheap, but… if you hit the right market, you will get the money back.

This is how I see the startup stage now.

Building FFI Extensions in Erlang


I'm preparing to upgrade my old humidity/temperature meeeeeter solution with a Raspberry Pi. While it is easy to read the sensors in C, I don't want to build the whole app in C, because I know how that can end when you are 1500 – 3000 km from home, the app stops responding, and the only living person who could fix it is your cat Haskell. So I want to have everything in Erlang and only a small module reading the data in C.

Ports

My first thought was to build a small C program that would check the sensors periodically, or just "check stuff" on the SHT1x, and print it to the output. So my first attempt was

For the example here I will use /proc/cpuinfo

-module(mycpuinfo).
-export([info/0]).

-define(CMD, "cat /proc/cpuinfo").

info() ->
    Port = open_port({spawn, ?CMD}, [{packet, 1}, use_stdio, exit_status, binary]),
    port_command(Port, <<>>),
    receive
        {Port, {data, Response}} ->
            {ok, {cpuinfo, Response}};
        _ -> error
    end.

Nothing super special, except that the Raspberry Pi does not really cope well with different packet sizes, and it can, for example, fail to read the whole input properly. I observed some issues with it, so I decided to explore FFI further.

erl_interface

The thing I found is called erl_interface, and it is designed for FFI. This is it! What you do is build a process-like thing in C and a micro module in Erlang that handles it (the module is just to make it look nice). But there are a few glitches!

This is my module posting back "pong" on a "ping" message

lolpong.c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#include "erl_interface.h"
#include "ei.h"

#define BUFFSIZE 100

int main(int argc, char** argv){

  int fd;
  unsigned char buf[BUFFSIZE];
  ErlMessage emsg;
  int msg_type;
  ETERM *fromp, *argp, *rsp;

  erl_init(NULL, 0);

  if(erl_connect_init(1, "cookie",  0) == -1){
    erl_err_quit("init failed");
  }

  if((fd = erl_connect("emu@raspberrypi")) < 0 ){
    erl_err_quit("Emus could won war with australia but you still can't connect to emu :(");
  }

  while(1){
    msg_type = erl_receive_msg(fd, buf, BUFFSIZE, &emsg);
    if(msg_type == ERL_TICK){
      /* Emu is checking australian defences, Polish people are safe! so I can ignore this */
    }
    else if(msg_type == ERL_ERROR){
      /* Houston, we have an Emu! */
      break;
    }
    else {
      /* our pro msg */
      fromp = erl_element(1, emsg.msg);
      argp = erl_element(2, emsg.msg);
      rsp = erl_format("pong");
      erl_send(fd, fromp, rsp);


      erl_free_term(emsg.from);
      erl_free_term(emsg.msg);
      erl_free_term(fromp);
      erl_free_term(argp);
      erl_free_term(rsp);
    }
  }

  return 0;

}

First of all, complexity goes up fast. You have to think about many things. Here I know that my host is called "emu@raspberrypi", and this is an actual running process, so if you do not remember to free memory, you will quickly learn what "memory leak" means.

But the most important thing is how to build and run this.

Building

My makefile looks like this

all:
  gcc -o lolpong -I/usr/lib/erlang/lib/erl_interface-3.7.7/include -L/usr/lib/erlang/lib/erl_interface-3.7.7/lib lolpong.c -lerl_interface -lei -pthread

It is important to remember the -pthread flag.

Running everything

Now, how to make everything work… we need a module

lolpong.erl
-module(lolpong).
-export([ping/0]).

ping() ->
    {any, 'c1@raspberrypi'} ! {self(), ping},
    receive
        R -> {ok, R}
    end.

This c1 is for C extension 1, which is not super obvious :) cN will be for C extension N =).

And finally we need to spawn our node…

λ  erl -sname "emu" -setcookie "cookie"

This spawns a node named emu@raspberrypi on my mini box with the cookie "cookie". Now… I said finally… but that was not the final step.

The final step is to run our lolpong binary. It is important to run it after the node is up, because it will try to connect to that node.

./lolpong

Now we can check in our Erlang shell if everything works!

(emu@raspberrypi)1> lolpong:ping().
{ok,pong}

Works!

Summary

It is good fun. Now I need to wait for the parts to assemble everything and build the rest of the application :)

Cheers!!

Making Request to REST Resources in Erlang


Recently with friends we have been building a side project called "Midway", a simple battleship server. The whole point of the game is to create a client and defeat other people's maps! Winner takes it all! The person with the lowest number of hits per won map wins! This is again a note-to-self type of entry. I created some placeholder libs for building clients, and while working through the Erlang placeholder I made some notes.

Making request

First of all, I used the built-in inets for making requests. So the first thing to do is to start inets!

inets:start().

OK, I generalized using httpc into a really simple thing

post(URL, ContentType, Body) -> request(post, {URL, [], ContentType, Body}).
get(URL)                     -> request(get,  {URL, []}).

request(Method, Body) ->
    httpc:request(Method, Body, [], []).

So if you make a call you will get something like this

httpc:request(get, {"http://www.youtube.com/watch?v=JvxG3zl_WhU", []}, [], []).
{ok,{{"HTTP/1.1",200,"OK"},
     [{"cache-control","no-cache"},
      {"date","Tue, 22 Jan 2013 20:36:16 GMT"},
      {"server","gwiseguy/2.0"},
      {"content-length","221233"},
      {"content-type","text/html; charset=utf-8"},
      {"expires","Tue, 27 Apr 1971 19:44:06 EST"},
      {"set-cookie",
       "PREF=f1=50000000; path=/; domain=.youtube.com; expires=Fri, 20-Jan-2023 20:36:15 GMT"},
      {"set-cookie",
       "use_hitbox=d5c5516c3379125f43aa0d495d100d6ddAEAAAAw; path=/; domain=.youtube.com"},
      {"set-cookie",
       "VISITOR_INFO1_LIVE=99hU0tGtAng; path=/; domain=.youtube.com; expires=Thu, 19-Sep-2013 20:36:15 GMT"},
      {"set-cookie",
       "recently_watched_video_id_list=dda4a4415949369ee6eb305306b40e47WwEAAABzCwAAAEp2eEczemxfV2hV; path=/; domain=.youtube.com"},
      {"x-frame-options","SAMEORIGIN"},
      {"x-content-type-options","nosniff"},
      {"p3p",
       "CP=\"This is not a P3P policy! See http://support.google.com/accounts/bin/answer.py?answer=151657&hl=en-GB for more info.\""},
      {"x-xss-protection","1; mode=block"}],
     [60,33,68,79,67,84,89,80,69,32,104,116,109,108,62,60,104,
      116,109,108,32,108,97,110|...]}}

One more thing I added was extracting the body of the response

  response_body({ok, { _, _, Body}}) -> Body.

So after calling it on the response we can get the body

  Response = httpc:request(get, {"http://www.youtube.com/watch?v=JvxG3zl_WhU", []}, [], []).
  Body = response_body(Response).
  io:format("~s~n",[Body]).
  <!DOCTYPE html><html lang="en"><head> ...

Parsing Json

So the next thing to do was to parse JSON :) for this purpose I use two libs: mochijson2 or jiffy.

Mochijson2

This one is easy to find and use. All you need to do is grab the file from the mochiweb project and compile it :) it has two methods, encode and decode.

mochijson2:decode(STUFF).
mochijson2:encode(STUFF).

One thing worth mentioning is that it expects input in a somewhat structured format.

So typically it will use the "struct" atom for JSON "{}" objects.

{struct, [{name, <<"jakub">>}, {age, 27}, {ids, [1,2]}]}

So a JSON array will be a list, but an object has to be in a tuple with struct. It is awkward at first, but you can get used to it. But there is an easier thing to use…

Jiffy

This! There is a good example on the main page.

Eshell V5.8.2  (abort with ^G)
1> jiffy:decode(<<"{\"foo\": \"bar\"}">>).
{[{<<"foo">>,<<"bar">>}]}
2> Doc = {[{foo, [<<"bing">>, 2.3, true]}]}.
{[{foo,[<<"bing">>,2.3,true]}]}
3> jiffy:encode(Doc).
<<"{\"foo\":[\"bing\",2.2999999999999998224,true]}">>

This shows how nice it is to use :D no "struct", pure love!

You can get it here https://github.com/davisp/jiffy

Summary

This is just a note to self so I don't have to check the docs for this stuff again :).

Gems Hidden in the Past


I've been talking with my friend about Haskell, OCaml (Objective Caml), functional programming, dependent types, and INDUSTRY. We had a long talk about the new Cambridge hackerspace and how they want to send a satellite into space with a "hacker" effort. That is amazing! Suddenly I thought that what we all use now is 50-year-old technology. Nothing new. Things that are "new" today, like nodejs "the new black", are actually technology as old as Lisp. Maybe we should take a look at the past and think for a moment… Those are the things I would like to rant about today.

New Discoveries

"Theory is the thing we all skip and later on regret skipping" (yes, I quoted myself; I heard someone did it before, so it's not hip). We learn a lot of stuff, and some of it simply slips through and gets forgotten. That is an opening for "new discoveries" or "rediscoveries" of some solutions. I have a thesis that "if something is simple it will get its niche" (did it again). This would explain the new hype for Nodejs <nodejs.org>. I don't want to judge technologies; I did some stuff in Node and I have a small portion of experience, but this text is not about that. It's about the technology behind the stuff people think is brilliant and new nowadays.

Event loop | Evented solutions

Event loops have existed for a long, long time and are nothing new. When someone says "event loop", the first thing I think about is not nodejs or EventMachine or Twisted but WinAPI. Yes, old WinAPI. And suddenly everybody realizes that every GUI (Graphical User Interface) solution is evented: QT, WinAPI, and many other even older solutions. This is nothing new, so let's take a trip through the cards of history to find other gems that we can rediscover and become new rocket scientists… or the first tech archeologists.

Step One: Alternatives to sequencing

The most common thing people do is sequencing, and even if our brains work in other ways, it is easy for us to think about and form sequences, because it is something we understand. This is the basic unit of work we think about. So most people write small functions that sequence some actions, and they build programs from small functions calling small functions in one really big sequence.

An alternative is an event machine. This solution lets us generate a lot of small events, and if we manage to force the programmer to make them as small and as fast as possible, we can achieve a feeling of multitasking. How does it work? We have one loop that picks events from an event queue and processes them, and as long as nobody creates an event that blocks everything, this works like a charm. The key to this solution is "non-blocking".

Achieving Non-blocking

To do this we need a way of saying "wait until X happens, and if it happened, trigger an event". This can be done in many different ways. For example, for file descriptors (everything is a file) we have things like select, poll, epoll, and kqueue, and these are not new; they are from the 80s, and they made it into official POSIX 15 years ago! The previously mentioned nodejs represents this category (fresh off the news!). Does it have flaws? Yes! People call this async… it's not async, not real async. Real async was also implemented in the 80s :D

Signals

Signals are also very old; it is basically a similar concept (every concept is similar). You have a slot that handles a signal and something that emits signals. It's very handy and was implemented with big success in the QT library. Does it have flaws? Basically the same ones as event loop solutions. It's a way of doing the "message passing" pattern.

Message passing

Message passing, really well implemented in Erlang, is another way of achieving the same result; IMHO the cleanest and best solution in the industry right now. You simply cast messages, and other entities receive them and react. YES, THIS IS HOW PEOPLE WORK. Most of these things seem to be inspired by nature… but let's not go down that route and start a religious war.

Async

The real Unix async calls. Yes, Unix has this built in.

#include <aio.h>
int aio_write(struct aiocb *aiocbp);
int aio_read(struct aiocb *aiocbp);

You call aio_write and the write to the file will happen, sometime in the future… BEFORE THE END OF THE WORLD, maybe!

Wow! This is fresh! Nobody is using it nowadays! You should build a new nodejs on it and call it modejs. Or wait? Can you? Nobody really knows how to use these calls without being cluster-f@cked after five seconds. Real async syscalls in Unix. That is awesome, and it's older than your son! Let's dig more…

Reactive programming

This is the sh*t: real event programming; you have a real reactor, nothing is faked like in an event loop! But it's nothing new either; it was also implemented in the 80s. But what is the difference between reactive programming and an event loop? Not much, because it is more a conceptual thing around building your reactor. In the reactor concept we think about data flows and reacting to changes in them.

We don't have to research things that are already found!

Map Reduce, Google, Smart people

So how do the mega smart people at Google work? I don't know; I don't work for Google and I'm not smart, but I assume they want to get sh*t done. And after reading the Amazon Dynamo paper http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf , the Google Big Table paper http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/pl//archive/bigtable-osdi06.pdf , and many others, I think they developed it in a kitchen. Yes, in a kitchen.

They sat down, people from Google and Amazon; I imagine 6 engineers took 1 apple (Steve Jobs was serving) and asked themselves how to cut it into 6 parts so they could eat it. They took a knife, sliced it into 6 parts, and split it among themselves. Suddenly one stood up (the Amazon guy with headphones in his avatar) and said: each part of the apple is exclusive (patent trolls have everything exclusive nowadays) to each of us, so we can eat them on our own. LET'S MAKE A DB USING THIS TECH AND WRITE A PAPER ON IT. The next guy said… yeah, we have a limited number of apple parts, so it is predictable, and we can form a RING from it… and that's how Big Table and Amazon Dynamo were born. The Google cook who was slicing the apple said: I sliced the apple into parts, one by one, with one knife… AND HE WROTE THE MAP REDUCE PAPER.

They did not invent anything really new; they just approached the problem simply. We can't split and scale complex solutions, so let's make everything simple. And it happens that most things can be turned into a linear or near-linear problem.

Is Map Reduce really something new ?

In my opinion, no. Map has existed in the Lisp world probably since the 70s. Then someone took map and a sequence of data items that are exclusive and don't depend on each other and said: "What will happen if I split this list into two parts, run map on each part in two or more threads, and then just sum the results?" And map reduce was born. The concept is dead simple. But it's nothing new! I'm not taking credit away from the Google engineers here; I think the guys there are very smart, and by using simple things they are, in my eyes, smart squared. But is it something groundbreaking and new? No.

So maybe we should look into the past more often. Maybe we can find more gems like this… I bet Google has a technology archeology department :D

Step Two: Type writer

I'm an 80s kid. I remember typewriters and their sound, so for me understanding how files work in an OS (Unix or in general) was simple. But when I recently explained it to a friend who is a 90s kid, he did not get the concept that well. Why? Because he is from an age where typewriters no longer exist. For me it was a bit shocking, because I thought everybody knew exactly how they work. And when I explained why we have \r and what it is, I was, for a moment, happy to be very, very old :D.

Queues, Interpreters and other stuff.

A technology archeologist must be open to new things, read about everything, and most of all, never stop thinking "this had to have been invented already". When I first suggested a queue as a solution to a problem, some people looked at me as if I were a snake. FIFO queues, something so basic… and still some people don't even think about them. This is a dull example, but basic data structures should be well known by everybody; still… things that freeze the blood in your veins happen.

Building DSLs and interpreters is good fun. It's not easy; I'm still learning a lot. But this should not be a "devil" to other programmers.

Old manuscripts

There are books that contain tons of cool knowledge. You can be a technology archeologist like me! Just venture into them and discover "stuff". The first author I would recall here is Richard Stevens.

  • 1990 – UNIX Network Programming – ISBN 0-13-949876-1
  • 1992 – Advanced Programming in the UNIX Environment – ISBN 0-201-56317-7
  • 1994 – TCP/IP Illustrated, Volume 1: The Protocols – ISBN 0-201-63346-9
  • 1995 – TCP/IP Illustrated, Volume 2: The Implementation (with Gary R. Wright) – ISBN 0-201-63354-X
  • 1996 – TCP/IP Illustrated, Volume 3: TCP for Transactions, HTTP, NNTP, and the UNIX Domain Protocols – ISBN 0-201-63495-3
  • 1998 – UNIX Network Programming, Volume 1, Second Edition: Networking APIs: Sockets and XTI – ISBN 0-13-490012-X
  • 1999 – UNIX Network Programming, Volume 2, Second Edition: Interprocess Communications – ISBN 0-13-081081-9

This guy is amazing; the books are of very good quality, covering topics that are more than active in our community today. 22-year-old books!

  • 1989 – Genetic Algorithms in Search, Optimization, and Machine Learning

An amazing book. I was never a big fan of A.I. (maybe I should be; that would give me a chance to actually have some of the I…), but this book opened my eyes to how simple this stuff is. How easy and how powerful. When I was at uni this was one of the books I enjoyed the most.

  • 1976 – Algorithms + Data Structures = Programs (Prentice-Hall Series in Automatic Computation) by Wirth.

A classic. This guy guides you through Pascal, shows you how to program, and at the end of the book implements a subset of the language. I say '76, FFS; books this good would not be printed after 2006. Published 44 years ago. C'mon!

  • 1988 – C Programming Language (2nd Edition)

The next classic, especially the section about making your own malloc(). C'mon, 1988, 24 years ago. A brilliant book.

And something fresh

  • 2004 – The Art of C++ by Herbert Schildt

A really good book, with a pattern common to the books I read… the guy implements a language at the end (a subset of C).

Yes, this is good stuff. Old books containing knowledge that the industry will rediscover in 6-10 years.

But what about current research ?

Current stuff like dependent types… I reckon we will hear about them in industry after our deaths ;/ next week apocalypse ;/. Yes, there are tons of things to rediscover… and it's fun! Let's do it!

Summary

I love it when I hear about new stuff and suddenly, after inspection, it becomes clear it is just an old thing on new cogs. I love this. I love technology archeology and I want to bring more examples of it and write about it on my blog.

Yes, sending a satellite into space is something new…. another event loop implementation is not.

Sorry for ranting that much :D… anyway, nobody reads this :D.

Getting Some Clojure

| Comments

After a long chat about Lisp with my friend who is an OCaml (Objective Caml) fanatic, I decided to get my hands on Clojure. Is there a better way to learn a language than writing a crappy language-learning tutorial? Sure! But I will go with this one :D.

Lisp

Before I tried Clojure I had a few “trips” to the Lisp world, but I was always struck by the amount of half-finished VMs and implementations that are far from being production ready. For example chicken, scsh, sketchy, mit-scheme, bigloo. Some of them are more, some less production ready, but none of them is ready for anything real. The most production-ready Lisp thing I saw in my life was elisp. YES, f*cking EMACS Lisp. But I’m no expert in the Lisp world!

Scheme

Scheme is a dialect of Lisp that implements a subset of the language. In Scheme, code and data have the same form, which is a massive game changer. Scheme has super duper hyper mega MACRO-building capabilities. And that is probably its biggest problem…

Lisp Hackers

I remember when my friend was hacking in Lisp, and I remember when I was trying to hack stuff in Scheme. So Scheme is a trap! The first stage of a lisp/scheme hacker is to build a Scheme interpreter in Scheme; when he finishes this, he tries to implement a Scheme interpreter in his interpreter… The second trap is when he realizes he has written enough Scheme interpreters and moves on to implementing new features into his interpreter-in-interpreter-in-interpreter (and yes, ‘ready’ means stuff passes 8/10 runs, some crash, and half of the features don’t even exist, so-called “junk features”). So his dialect gets better than Haskell, lazy evaluation, better etc… At this stage he has a self-bootstrapping, half-working interpreter of his language that has no docs and nobody except him can use, and even he must be on drugs to make the lazy evaluation happen. The next stage, of course, is starting to do AI in Scheme. So the point is that no shit gets done; it’s all conceptual work. If Da’vinci had programmed in Lisp he would not have cut off his ear, he would have cut off his head.

Life of ()

I wrote a borderline rant on Lisp hackers and this was supposed to be a micro tutorial on Scheme. Well, I took the Lisp way of “getting shit done” :D. If you don’t know that Lisp’s bread and butter are ( and ), you should probably run away.

Naow!

So why Clojure ?

Clojure seems to be production-ready stuff that can “get shit done”. It supports concurrency (with STM – Software Transactional Memory), it is Scheme’y/Lispy, and I have read that it’s hip in Open Source magazine. That’s enough to start learning, yeah?

No! But I want to learn it and write some stuff about it. It seems to be so different from other languages: prefix (Polish) notation and a whole community of Stallman-like people.

Clojure runs on the JVM (Java Virtual Machine), and despite Java being the biggest troll language next to C++ ever, the JVM is really good. It makes stuff really portable and it’s good for building things like server backends. Same goes for Scala using the JVM; actually, the JVM is solid! (coming from me, a Java hater). It’s not like some f*cktard writing in “Open Source Magazine” that C++ enables you to write portable code. No, writing OS-portable code in C++ is not easy; it’s hard and full of pain.

Getting s*it on your box

The first thing you have to do is install Clojure. This is the easy step. Just visit http://clojure.org/. I installed it using MacPorts by running sudo port install clojure, but for more info I would suggest the site above.

Next you will want to install leiningen, a build tool for Clojure. Same procedure: either sudo port install leiningen, or see http://leiningen.org/ for more instructions.

Running repl and first code!

Enough of this sh*t, let’s do some stuff. First, to run the repl you need to do the following.

clj

I keep mistyping it as cjl, but that’s because I’m a noob. This should prompt you with something like this

20:59 kuba@kuba:~ λ clj
Clojure 1.4.0
user=>

Success!!! We can now try to evaluate some simple stuff.

So now a normal person would try to type something like 2 + 6. Bad idea! Expressions in Lisp start and end with (), so if you see things like ((((((((((((((((((((((((((((( or ))))))))))))))))))))))) you know something is happening :D (the second one more commonly)

user=> 2 + 6
2
#<core$_PLUS_ clojure.core$_PLUS_@9d686c1>
6
user=> (+ 2 6)
8

Now I should quote a link to Wikipedia on Polish notation. (I’m Polish, so it is kind’a like looking at my back and trying to read the sign on my t-shirt.) Here you go: http://en.wikipedia.org/wiki/Polish_notation. In simple words, Lisp uses prefix (Polish) notation: the operator/function comes first, then the arguments. (Reverse Polish notation is the postfix, stack-based cousin, where the arguments come first.) C, Java, Ruby are all infix notation, e.g. 2 + 6, while Lisp/Scheme is prefix notation, aka (+ 2 6). In reality it’s so cool that you don’t have to repeat yourself: you write the operator/function only once, followed by a list of arguments, e.g. (+ 1 2 3 4 5). So let’s try it out.

user=> (+ 1 2 3 4 5 6)
21
user=> (- 10 7)
3
user=> (- 10 7 1)
2
user=> (* 6 7)
42
user=> (* 6 7 2)
84
user=> (/ 4 5)
4/5
user=> (/ 4 5.0)
0.8
user=> (/ 4 5.0 2)

We can see that those examples work in the REPL, nice! But wait, there is something different between (/ 4 5) and (/ 4 5.0). Yes! Clojure has built-in support for rational numbers, so 4/5 is the rational 4/5, not a rounded estimate of it, while the result of evaluating (/ 4 5.0) is a floating point number. Let’s see what happens if we try to do 1/3

user=> (/ 1 3)
1/3
user=> (/ 1 3.0)
0.3333333333333333

Yeah, the second one is not really accurate. Or is it?

user=> (+ (/ 1 3) (/ 1 3))
2/3
user=> (+ (/ 1 3.0) (/ 1 3.0))
0.6666666666666666

Not really…. it should be something like 0.(6) :)

So this is how you invoke built-in functions. But what if you want to comment something out? Because it’s useful info about the new Scheme dialect you are building?

Comments

You write comments using ;. Everything after ; is ignored! For example:

user=> (* 1 1) ; most useful thing Clojure has to offer 
1
user=> (/ 1 1)
1
user=>  ; comment about lisp being full of ((((

defn

The first thing that usually makes people happy while learning a new language is the moment when the guy trying to explain the whole concept helps them write their first function! Yes, that’s what I will do now.

To define a function we use defn (old Lisp grinders will connect it with def, yes!). To it we need to supply the name of the function (the symbol this function will be bound to), the arguments, and the body, like this.

user=> (defn multiply-by [base amount] (* base amount))
#'user/multiply-by
user=> (multiply-by 5 6)
30

Next we can invoke it, e.g. (multiply-by 3 4). It’s worth mentioning that you can use - in function names. But what is really happening? This is a two-part thing: first we have def and second we have fn. def binds a symbol name to an expression or value, and fn returns a function. So it is “like lambda”; in fact, it is lambda.

Let’s go with an example.

1
2
3
4
5
6
7
8
9
10
user=> (fn [base] (+ base 1) )
#<user$eval33$fn__34 user$eval33$fn__34@f5bfdbd>
user=> ((fn [base] (+ base 1) ) 5)
6
user=> (def troll-symbol 7)
#'user/troll-symbol
user=> (troll-symbol)
ClassCastException java.lang.Long cannot be cast to clojure.lang.IFn  user/eval41 (NO_SOURCE_FILE:25)
user=> (+ troll-symbol ((fn [base] (+ base 1) ) 5) )
13

This is slightly more complex but should also be easy. In the first line we use fn to define a function with one parameter called base; this returns a function. In line two we do the same, but the returned function is passed an argument of 5 and evaluated to 6. In line three we define the symbol troll-symbol with a value of 7; we obviously can’t evaluate a value as a function, so we get an exception! It’s all fine, because it’s a value. But we can use it like in the last line, where we add to the value of troll-symbol the result of evaluating the function we defined, passed an argument of 5.

Don’t be afraid :D. All we did here is explain what defn does: basically defn is just a composition of def and fn. It lets you specify the symbol that the function returned by fn is bound to.

defn simply saves us keystrokes :) Like using Emacs :D this can lead to RSI. =)

user=> (def troll-symbol 8)
#'user/troll-symbol
user=> troll-symbol
8
user=> (def troll-foo (fn [base] (+ base 1)))
#'user/troll-foo
user=> (troll-foo 7)
8

This shows exactly what happens when you use defn: you get a function from fn and bind it to some symbol. That’s it. It’s good to have a play with all these brackets :) it’s fun! Try defining something on your own :D

Hello world

So every tutorial has to have something like a hello world program! In my tutorial there is space for Hello World too :D

All you need to do is create a file called hello.clj and type this in!

(println "Hello World")

When you run it using clj hello.clj you will see something like this

λ clj hello.clj
Hello World

Lambdas!

So I mentioned something called fn for creating functions. But sometimes you need a function ad hoc, e.g. when you pass it to the map function. This is done using a different syntax.

user=> (#(+ 1 %1) 9)
10

You can still use the old syntax, but this is just faster to type. Remember that %1, %2 … refer to the arguments that are passed to this lambda.

Lists and Vectors

Like in all programming languages, we have lists and vectors. These are built in. The syntax for a list is simply '(), e.g. '(1 2 3 4), and for a vector aka “array” it’s [], e.g. [1 2 3]. NO COMMAS!

user=> [1 2 3 4 5]
[1 2 3 4 5]
user=> '(1 2 3 4)
(1 2 3 4)

Let’s do something useful with them! Print them out, naow!

user=> (map #(println %1) '(1 2 3 4 5))
(1
2
nil 3
nil 4
nil 5
nil nil)
user=> (map #(println %1) [1 2 3 4 5])
(1
2
3
4
5
nil nil nil nil nil)

It is map… so it is ugly, because the result of println is nil. We printed out the contents of the vector and the list, but it’s ugly. Map is a higher-order function that takes a function and a list/array/sequence of arguments, applies and evaluates the function on each argument, and stores the results in a list that is returned at the end :) e.g. (map #(+ %1 1) '(1 2 3)) will give (2 3 4).

Let’s try to do it with doseq.

user=> (doseq [i [1 2 3]] (println i))
1
2
3
nil

Works!

Glossary

REPL – Read Eval Print Loop. Simply a language console in which you can run code and see results without typing the code into a file and running the interpreter on it.

Lisp – One of the biggest language trolls, or the only language worth learning. Still can’t decide.

Jar – Java archive that contains code; can also contain jam. I like jam.

Summary

This is just the first part of the tutorial; I hope you liked it. It covers just the very basics of defining and evaluating functions.

Using Tcpdump

| Comments

We all love tcpdump :D I found this tool useful while working on many things. Guess what? It is very useful when working with network-related stuff :D but it’s not easy to grasp. This is my list of the commands and options I use.

Mac, Linux

In this text I use the en0, en1 naming convention from OSX; if you are a Linux user you should change it to eth0, eth1 … check your network config using ifconfig. Basic knowledge required! =)

Tcpdump

Tcpdump is a tool that lets you dump network packets. This helps to debug networking issues, APIs, communication and other stuff.

Basic options

Tcpdump basic options are

  • -i ‘interface’ option lets you specify on which interface you will listen
  • -nS: -n skips name resolution, -S shows absolute (not relative) TCP sequence numbers
  • -v, -vv, -vvv verbose mode
  • -s 1514 lets you specify how much data from packet is displayed. In this case you see whole packet
  • src, dst listening on specific things for source or destination
  • net eg. 192.168.0.1/24 listening on all stuff in some network.
  • port eg. port 3000 lets you listen on port

First example, getting info

The first thing people often do is listen to everything that bounces on en1, like this:

sudo tcpdump -nS -i en1

This is obviously a bad idea; the only good thing about it is that it lets you see that “something is on”, so you will be able to say the device is actually working.

Example two, targeting host!

If you want to see all traffic that goes to some host, so something that is actually useful, you should add the host option.

sudo tcpdump -nS -i en1 host www.facebook.com

This will let you see if there are packets going to and from www.facebook.com.

Example three, give me stuff targeting some port!

Let’s say you want to see what curl generates against your own machine

sudo tcpdump -vvnS -i lo0 port 4000

and in other shell just type

curl http://localhost:4000

port is the most fun option, because it lets you see exactly the stuff you are interested in.

Summary

Tcpdump is a useful tool, and I hope this text will stop me from constantly forgetting its options.

Cheers!

Erlang: Digraph

| Comments

  • This is more a “note to self” type of post than a tutorial thing.
  • It’s hard to write blog posts when you are trying to take over the world :)

Why ?

Recently in the “professor Toons” :) computer club there was a task that was ideal to solve as a graph. It was one of those “find path” tasks. Yesterday, just before going to bed, I thought “Why not solve it in Erlang?”. And these are my thoughts about Erlang’s digraph library.

Task

First of all http://erldocs.com/ is the way to go when working with documentation.

Second, “the Task”. It had two parts: first, load an xml file with songs; second, build a graph and find a playlist between two songs, keeping in mind that each song has to start with the same letter the previous one ended with.

example.

if you have song list: ABC CBA BAC ACB BBA

And if you want to find a playlist from ABC to BAC, you will get ABC –> CBA –> ACB –> BAC. Easy.
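The same search fits in a few lines of Python (a sketch of the idea, not the Erlang solution; the function names are my own): build an edge from song a to song b whenever b starts with the letter a ends with, then do a plain breadth-first search.

```python
from collections import deque

def build_edges(songs):
    """Edge from a to b when b starts with the letter a ends with."""
    return {a: [b for b in songs if b != a and b[0] == a[-1]] for a in songs}

def playlist(songs, start, goal):
    """BFS over the song graph, returning one path or None."""
    edges = build_edges(songs)
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in edges[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(playlist(["ABC", "CBA", "BAC", "ACB", "BBA"], "ABC", "BAC"))
```

For the example list above this prints ['ABC', 'CBA', 'ACB', 'BAC'], the same playlist as before.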

Digraph

Digraph API is quite nice but there are few things you have to have in mind.

  • Digraph is imperative graph!
  • You need to pass vertices to get_path!
  • Graph with loads of edges can be quite big :D

To create an empty graph you call digraph:new/0

Graph = digraph:new().

To add a vertex to the graph you call digraph:add_vertex/2

Data = {Value1, Value2, Value3},
digraph:add_vertex(Graph, Data).

To add an edge to the graph you call digraph:add_edge/3

digraph:add_edge(Graph, VertexStart, VertexEnd).

To get all vertices of the graph (in random order) you call digraph:vertices/1

Vertices = digraph:vertices(Graph).

And finally, to get a path you just call digraph:get_path/3

Path = digraph:get_path(Graph, VertexStart, VertexEnd).

This path is a list of vertices in order. This section is just to remind me of the essential things about using digraph from the Erlang stdlib. This graph module is quite simple.

And for me, with about 5000 vertices and a lot of edges, it consumed 1.5 GB of RAM. But once created, it was super fast to use.

It was fun to play with.

Quick summary

I will make a quick summary in the form of a short code sample!

G = digraph:new(),
V1 = digraph:add_vertex(G, "Andrychow"),
V2 = digraph:add_vertex(G, "Krakow"),
V3 = digraph:add_vertex(G, "London"),
V4 = digraph:add_vertex(G, "Warsaw"),
V5 = digraph:add_vertex(G, "Paris"),

digraph:add_edge(G, V1, V2),
digraph:add_edge(G, V2, V4),
digraph:add_edge(G, V2, V5),
digraph:add_edge(G, V2, V3),
digraph:add_edge(G, V3, V4),
digraph:add_edge(G, V3, V2),
digraph:add_edge(G, V2, V1),

PathHome = digraph:get_path(G, V3, V1).

This gives you back ["London","Krakow","Andrychow"], so Win! We are at home!

Summary

It was fun to play a bit with digraph before going to bed. There is a ton of things I did not use; also look into digraph_utils for even more :)

Building Up Queue System

| Comments

Queue = FIFO, First in, First Out.

Many people add a queue system to their products. Some of them do legendary things with it, making extremely unreliable products :). Most of these solutions may seem like trolling, but they actually exist in some products.

Rolling out own queue system

The first thing people often do is roll their own queue system. Is it bad? No! It is great, as long as you don’t have the constraint that data can never be lost!

Can lose data in queue

If you use a queue just to communicate between processes, you can use something like a Unix named pipe. In reality this is just a file. Actually, in Unix everything is a file, and this is the best design ever (if you neglect it you should die!).

name_pipe
mkfifo oldschool_pipe
gzip -9 -c < oldschool_pipe > out.gz &

And next you can use it to push stuff into it, e.g.

example
echo "oldschoolllzzzz!!!!" > oldschool_pipe

but this is a shell example; you could create it and just read/write it from your processes.
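For example, the same dance from code looks like this in Python (a sketch; POSIX only, and the path and names here are my own). Note that opening a FIFO blocks until there is someone on the other end, which is why the reader runs in its own thread.

```python
import os
import tempfile
import threading

# A named pipe is just a file: one side writes, the other reads.
path = os.path.join(tempfile.mkdtemp(), "oldschool_pipe")
os.mkfifo(path)  # same as `mkfifo oldschool_pipe` in the shell

got = []

def consumer():
    with open(path) as fifo:  # blocks until a writer opens the pipe
        got.append(fifo.read())

t = threading.Thread(target=consumer)
t.start()
with open(path, "w") as fifo:  # blocks until a reader opens the pipe
    fifo.write("oldschoolllzzzz!!!!")
t.join()
print(got[0])
```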

That is cool. And this is the last point where we will not see problems :)))

But we are programmers!

Yes, we are programmers, and most of us are young and full of energy. Nobody remembers the 70s; I was born in ’85, so technically I would be quite mad if I remembered the 70s.

So how do we approach the problem? Some of us would create a queue inside their own code.

In C it would simply be an array wrapped with mutexes, but this is unreliable and it’s long to write and and and…

So what do people do? They try to use READY-made products.

First big mistake

Use a key-value store as a queue. – Let’s serialize the array into XYZ and set it into a key. – That’s a good idea! Only one thing will write to it!

WRONG!

Such an assumption will provide you with an insane amount of carnage in the future. And even if I know “agile says XYZ now”… actually, “agile” doesn’t say “take drugs and yolo because tomorrow you can die” but “TAKE RISKY THINGS FIRST”, and this is a risky thing. It should be implemented well.

What happens in this case? Someone gets a great idea that the product should scale, adds another daemon, and this f*cks up the queue: you lose messages.

Some DBMSes can handle this problem, but just wrapping it in a transaction will not solve it.

The scenario is: process a)

  • reads the queue
  • processes it (makes a pop or push)
  • saves the serialised queue

Now imagine process b) does the same. Everything is blazing fast and you get f*cked.

So the DB system must know the context. This is where RIAK shines: you get vector clocks and you know that you are f*cked. You can react, and in 99% of cases you don’t know how to resolve the issue, but at least you would know… but some specialists can disable this, because handling vector clocks is a pain and you can get a PERFORMANCE BOOST :))).
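To see why, here is a minimal Python sketch of the race (a toy in-memory “store”, not a real product): both processes read the serialized queue, each pops the head, and the second write silently undoes the first.

```python
import json

# Toy key-value store: one key holds the whole serialized queue.
store = {"queue": json.dumps([1, 2, 3])}

def pop_job(snapshot):
    """Read-modify-write from a stale snapshot, like two racing daemons."""
    queue = json.loads(snapshot)
    job = queue.pop(0)
    store["queue"] = json.dumps(queue)  # last writer wins, blindly
    return job

# Both processes read the SAME value before either writes back.
snap_a = store["queue"]
snap_b = store["queue"]
job_a = pop_job(snap_a)  # process a) pops job 1
job_b = pop_job(snap_b)  # process b) ALSO pops job 1
print(job_a, job_b, store["queue"])
```

Job 1 runs twice, and the queue ends up as [2, 3] as if only one pop happened: a lost update.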

Redis list as queue

Redis is a great tool to build a lot of stuff, and it has built-in data structures. I think this is groundbreaking compared to previously common solutions like RDBMSes or other NoSQL stores. Redis is great, so how do we make a queue with Redis?

queue
lpush queue_name value -> push
rpop queue_name -> pop

An example session looks like this

redis_example
redis 127.0.0.1:6379> lpush queue 1
(integer) 1
redis 127.0.0.1:6379> lpush queue 2
(integer) 2
redis 127.0.0.1:6379> lpush queue 7
(integer) 3
redis 127.0.0.1:6379> rpop queue
"1"
redis 127.0.0.1:6379> rpop queue
"2"
redis 127.0.0.1:6379> rpop queue
"7"
redis 127.0.0.1:6379> rpop queue
(nil)

Cool! Works great, and any process can access it in an atomic way. Isn’t it great? Best thing ever?!
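LPUSH on one end plus RPOP on the other is just a FIFO. A toy Python model of those two commands (a sketch with collections.deque, not the real Redis client):

```python
from collections import deque

queue = deque()

def lpush(q, value):
    """LPUSH: add on the left; Redis replies with the new length."""
    q.appendleft(value)
    return len(q)

def rpop(q):
    """RPOP: take from the right, so the oldest element comes out first.
    An empty queue gives None, like Redis's (nil)."""
    return q.pop() if q else None

for v in ("1", "2", "7"):
    lpush(queue, v)
print([rpop(queue) for _ in range(4)])  # -> ['1', '2', '7', None]
```

Same order as the redis-cli transcript above: first in, first out.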

Actually, it is very good. But there is one thing you just missed! By default, Redis flushes keys to disk every 60 seconds if at least 10k keys changed. What does that mean? You can get screwed if Redis dies suddenly!

How to fix this ? Visit http://redis.io/topics/persistence and see section “Append-only file”

redis_config.fix
- appendonly no
+ appendonly yes

Man, you just did it: you lost some performance, but you did it. Who would have known? You just saved the world. How much performance did we just cut? “fsync every time a new command is appended to the AOF. Very very slow, very safe.” That is: Ouch! Your boss could be unhappy, even though this solution is actually the best, simplest and most durable idea.

Can’t lose any data! Redis plus Mysql/Postgres

For persistence, people worked out a new combo. You store each element of the queue, key-value style, in an SQL RDBMS and put its id on the queue in Redis; next you pop it from the Redis queue in a process and update its status in the SQL RDBMS. This is not so bad, but it kills performance more than just turning on “appendonly yes”. It also makes things a hell of a lot more complicated and forces you to do updates in both systems.

Is this system a cure for cancer? No! You have to have a very good queue fail-recovery/startup system. Simply empty the list and make a query

startup
select * from jobs where done = false

Next you have to clean the Redis queue and push the new data. Is this safe? No: you don’t know whether the last few jobs finished or not. E.g. MySQL got f*cked but the messages got processed. Yes, this adds a lot more complications.

Also, with this solution an index on the ID column makes it fast to do a select but slow to add or remove. And you want your queue to perform, and yes, MySQL will do fsync too.

Why not MongoDB

You can’t atomically pop stuff. Don’t even think about pop/push/pushall on an array in a document! If you get this idea, check my gist https://gist.github.com/2071805, run it and see what happens :) what you get back.

RabbitMQ / ZeroMQ

When you visit the ZeroMQ page you will see

intro
ØMQ \zeromq\:
 Ø  The socket library that acts as a concurrency framework.
 Ø  Faster than TCP, for clustered products and supercomputing.
 Ø  Carries messages across inproc, IPC, TCP, and multicast.
 Ø  Connect N-to-N via fanout, pubsub, pipeline, request-reply.
 Ø  Asynch I/O for scalable multicore message-passing apps.
 Ø  Large and active open source community.
 Ø  30+ languages including C, C++, Java, .NET, Python.
 Ø  Most OSes including Linux, Windows, OS X.
 Ø  LGPL free software with full commercial support from iMatix.

Nothing about consistency. FASTER THAN TCP (this has to be good), but it can use TCP (I wonder if it can be faster than TCP even when using TCP /trollface). Anyway, you see a lot of stuff. I did some searching on ZeroMQ losing data, and what I found was http://zguide.zeromq.org/page:all#Missing-Message-Problem-Solver, a nice image.

Problem resolution diagram

Big thing :)

If you visit the RabbitMQ page http://www.rabbitmq.com/ you will see a lot of nice things, like tutorials etc. The page is nice and has useful knowledge. Both solutions have clients in Erlang (massive plus) and other languages. And even if setting the whole thing up may be a pain, I think both ZeroMQ and RabbitMQ are solid options.

Why do we use Queues ?

We use them to absorb traffic of messages and process their content with e.g. workers/handlers. If we make the queue processable by only one worker, we ain’t doing our job properly.

What makes things hard?

  • Locks if we use them, they will bite you back
  • Many points where we store same data in different way
  • Yes, locks will bite you back

So what is the best way to go ?

I think the best way to go is just to start a new movement called Unix Archeology, because we seem to be reinventing the wheel too many times. But really:

  • Make a list of solutions
  • Ask yourself if your idea is really good

I’m 100% sure that storing queues as serialized lists in memcached, or keeping them as a table in mysql/postgres and doing loads of funky stuff to keep it running, is not the way to go. It can seem like a good idea at the start, but it is not. A named pipe in the file system can be better.

Loads of things can be brilliant queue choices, e.g. Redis, ZeroMQ, RabbitMQ or even named pipes, but not a serialized array in a key-value store.

Redis List Internals

| Comments

Today I spotted this on Twitter:

@antirez: “The Redis community is composed of 99% of people that, really, know their stuff, know the Redis internals and behaviour, and are * great *.”

@shanley: “@antirez I’ve never met a technical community where 99% of them were familiar with the internals of anything. Did you mean 9%?”

This sparked a very quick review in my mind of the topics we talk about at work, and I realised that we talk a lot about the internals of Redis, and a bit about Riak, but that is a different story :).

Spike

I just wanted to write a short post about the first thing I ever picked when I was looking into the internals of Redis. It is the List :D I love lists.

So what I did was open GitHub again and pick up the list header file to re-read.

https://github.com/antirez/redis/blob/d310fbedabd3101505b694f5c25a2e48480a3c2b/src/adlist.h

The first thing you notice is that the code is simple; the whole thing is implemented in 93 lines of header and a 341-line .c file (with license etc, lol).

Structure of List

In general, a list is just a degenerate tree. In Redis its structure is simple. The whole description of the list is simply

typedef struct listNode {
    struct listNode *prev;
    struct listNode *next;
    void *value;
} listNode;

typedef struct list {
    listNode *head;
    listNode *tail;
    void *(*dup)(void *ptr);
    void (*free)(void *ptr);
    int (*match)(void *ptr, void *key);
    unsigned long len;
} list;

This knowledge lets us count how much space this will take on the heap and compare it with e.g. a set, if we really need to. (The list is, imho, the most memory-efficient structure.)
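We can actually do that back-of-the-envelope math right here. A sketch, assuming a 64-bit build (8-byte pointers and unsigned long) and ignoring allocator overhead and the payloads themselves:

```python
PTR = 8  # bytes per pointer on a 64-bit build

# listNode: prev + next + value
node = 3 * PTR                 # 24 bytes per element

# list header: head + tail + dup + free + match + unsigned long len
header = 5 * PTR + 8           # 48 bytes, paid once per list

def heap_cost(n):
    """Rough heap cost of an n-element adlist, payloads not included."""
    return header + n * node

print(heap_cost(1_000_000))    # ~24 MB of pure list structure for a million nodes
```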

The list iterator is important, so it’s also worth having a peek at, even if it is just an internal implementation detail.

typedef struct listIter {
    listNode *next;
    int direction;
} listIter;

With this we can take a peek into the .c file and check how you get an iterator.

listIter *listGetIterator(list *list, int direction){
    listIter *iter;

    if ((iter = zmalloc(sizeof(*iter))) == NULL) return NULL;
    if (direction == AL_START_HEAD)
        iter->next = list->head;
    else
        iter->next = list->tail;
    iter->direction = direction;
    return iter;
}

And see that even though AL_START_HEAD is defined as 0 and AL_START_TAIL as 1, if we use a direction of 5 (lol) we will get the tail :D I know that I’m bikeshedding now.

Even without going any deeper you have a feeling for how this works: a doubly linked list with a (void *) value. The first thing I thought today was (this was stupid) “Wow, why is this (void *) and not (char *)? That would let the compiler type-check it better during compilation”, but antirez wrote to me “@jakuboboza hint: grep listCreate *.c” and that was the hint I needed. (void *) is more generic, and list is used in many places in the Redis internals that I had not thought about (lol)

grep listCreate *.c
λ grep listCreate *.c
adlist.c:list *listCreate(void)
adlist.c:    if ((copy = listCreate()) == NULL)
aof.c:    server.aof_rewrite_buf_blocks = listCreate();
aof.c:    c->reply = listCreate();
aof.c:    c->watched_keys = listCreate();
bio.c:        bio_jobs[j] = listCreate();
multi.c:        clients = listCreate();
networking.c:    c->reply = listCreate();
networking.c:    c->io_keys = listCreate();
networking.c:    c->watched_keys = listCreate();
networking.c:    c->pubsub_patterns = listCreate();
object.c:    list *l = listCreate();
pubsub.c:            clients = listCreate();
redis-benchmark.c:    config.clients = listCreate();
redis.c:    server.clients = listCreate();
redis.c:    server.clients_to_close = listCreate();
redis.c:    server.slaves = listCreate();
redis.c:    server.monitors = listCreate();
redis.c:    server.unblocked_clients = listCreate();
redis.c:    server.ready_keys = listCreate();
redis.c:    server.pubsub_patterns = listCreate();
sentinel.c:    sentinel.scripts_queue = listCreate();
slowlog.c:    server.slowlog = listCreate();
sort.c:    operations = listCreate();
t_list.c:        list *l = listCreate();
t_list.c:            l = listCreate();
t_list.c:        server.ready_keys = listCreate();
ziplist.c:            ref = listCreate();

A lot of places :) lol.

In adlist.c we can also check how a list is created

list *listCreate(void)
{
    struct list *list;

    if ((list = zmalloc(sizeof(*list))) == NULL)
        return NULL;
    list->head = list->tail = NULL;
    list->len = 0;
    list->dup = NULL;
    list->free = NULL;
    list->match = NULL;
    return list;
}

This is just an academic example :D I love it. This code is easy to understand and a pleasure to read.

Why even bother talking about internals?

It is important to talk about them, because when you read in the documentation that

lpush is O(1)
rpush is O(1)
lpop is O(1)

you really want to check it out to get a better understanding of how stuff works under the hood, even if this is a trivial example.
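Those O(1) claims fall straight out of the structs above: with both head and tail pointers, pushing or popping at either end is a constant number of pointer updates, no traversal. A toy Python version of the same trick (my own sketch, nothing Redis-specific):

```python
class Node:
    __slots__ = ("prev", "next", "value")
    def __init__(self, value):
        self.prev = self.next = None
        self.value = value

class DList:
    """Doubly linked list with head+tail pointers, like Redis's adlist."""
    def __init__(self):
        self.head = self.tail = None

    def lpush(self, value):
        # O(1): only the head pointer and two node links are touched.
        node = Node(value)
        node.next = self.head
        if self.head:
            self.head.prev = node
        else:
            self.tail = node
        self.head = node

    def rpop(self):
        # O(1): only the tail pointer and one node link are touched.
        node = self.tail
        if node is None:
            return None
        self.tail = node.prev
        if self.tail:
            self.tail.next = None
        else:
            self.head = None
        return node.value

q = DList()
for v in (1, 2, 7):
    q.lpush(v)
print(q.rpop(), q.rpop(), q.rpop(), q.rpop())  # -> 1 2 7 None
```

No loop anywhere in lpush or rpop, so the cost does not depend on the list length: that is the whole O(1) story.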

It is worth talking about the internals of the tools you use; you learn a lot. And I think what antirez said is true: this community is great!

Next

The next thing to view, suggested by antirez: dict!

antirez: “@jakuboboza it’s definitely a very simple implementation! Probably our most “on steroids” implementation of data structures is dict.c”

^_____^