Bit-Man

Thu, 13 Jan 2011

Twitter Daily News v0.0.3

Twitter Daily News v0.0.3 was released today.

Its only change is that news items now appear in chronological order (that is, in the order they were created).

 

It looks like a small change, and it would have been one had I accepted a number of simplistic assumptions. Those assumptions, however, could harm functionality if they turned out to be weak (that is, if they ceased to be valid without notice).

Basically, the creation timestamp of each entry retrieved from the Twitter timeline is compared with the publish date, and if they match the entry is stored in an array. Once all entries have been read, each array element is saved to a file for publication on the Blosxom blog, starting from element zero of the array. This published the items in reverse order: the newest ones appeared at the beginning of the entry, leading the reader to wrongly assume that the first news items read were the first ones published.

 

An easy fix would be to read the array in reverse order, but that would keep the design silently coupled to the assumption that Twitter entries are always read in reverse chronological order. A second fix would be to store the entries in a dictionary keyed by creation date, which breaks that coupling but requires adding date comparison code. The final solution was to add a new module that represents a Blosxom entry containing only Twitter news (Blosxom::Entry::Twitter) and a date module that manages Twitter creation dates (Twitter::Date).
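To illustrate why explicit date comparison removes the coupling, here is a minimal sketch (not the code of Blosxom::Entry::Twitter or Twitter::Date). It assumes timestamps in the form "Thu Jan 13 18:30:00 +0000 2011", always in UTC, and the helper names created_at_to_epoch and sort_chronologically are hypothetical.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MONTH = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
             Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Hypothetical helper: convert an assumed "+0000" timestamp to epoch seconds.
sub created_at_to_epoch {
    my ($stamp) = @_;
    my ($mon, $day, $h, $m, $s, $year) =
        $stamp =~ /^\w{3} (\w{3}) (\d{1,2}) (\d\d):(\d\d):(\d\d) \+0000 (\d{4})$/
        or die "Unrecognized timestamp: $stamp";
    return timegm($s, $m, $h, $day, $MONTH{$mon}, $year);
}

# Sort entries oldest first, whatever order the timeline delivered them in.
sub sort_chronologically {
    my @entries = @_;
    return sort {
        created_at_to_epoch($a->{created_at}) <=> created_at_to_epoch($b->{created_at})
    } @entries;
}

# Made-up entries, deliberately out of order.
my @timeline = (
    { created_at => 'Thu Jan 13 18:30:00 +0000 2011', text => 'third'  },
    { created_at => 'Thu Jan 13 09:15:00 +0000 2011', text => 'first'  },
    { created_at => 'Thu Jan 13 12:00:00 +0000 2011', text => 'second' },
);

my @ordered = sort_chronologically(@timeline);
print "$_->{text}\n" for @ordered;
```

Because the order comes from the dates themselves, the result no longer depends on how the timeline happened to be delivered.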

 

Conclusion: what started as a two-line fix ended up as three half-days of coding and testing across multiple modules, with a more robust design and with parts of the main module extracted into new modules that can now be tested.

[/Perl] permanent link

0 comments


Wed, 08 Dec 2010

Perl 6 and accessor methods


One more turn in my knowledge of the Perl 6 language.
In an earlier post about Perl 6 and the object paradigm, one of my complaints (the minor one, by the way) was about the use of accessor methods for an object's attributes (also known as setters and getters).

This time, just to tidy things up a bit, I added accessor methods for the archivo attribute, but not in the usual way. I used a form that not only makes the code easier to read but is also independent of the natural language: it does not use the set and get prefixes.
Normally the method that reads an attribute's value is named with the get prefix, and the method that assigns a value to the attribute is named with the set prefix. For example:

    has $!archivo is rw;

    method getArchivo {
        return $!archivo;
    }

    method setArchivo(Str $nombre) {
        $!archivo = $nombre;
    }

Completely normal, sane and understandable, except for one small detail: doesn't the combination of an English prefix with a Spanish name sound a bit odd for naming methods?
To me it does sound somewhat unnatural, so my proposal is to use polymorphism to avoid this combination. The method that reads the attribute is named after the attribute, and the same method name, this time receiving the value to store, is used to change the attribute's value:


    has $!nomArchivo is rw;

    multi method archivo {
        return $!nomArchivo;
    }

    multi method archivo(Str $nombre) {
        $!nomArchivo = $nombre;
    }


You can see that it is much more natural and simple to define and use them this way.
You can also see how the code to verify the copyright of the files is coming along.
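For comparison, the same naming idea can be sketched in Perl 5, which has no multi methods: a single method named after the attribute reads the value when called without arguments and stores it when called with one. The class name Verificador and its constructor are hypothetical and only serve the illustration.

```perl
package Verificador;    # hypothetical class name
use strict;
use warnings;

sub new {
    my ($class, %args) = @_;
    return bless { nomArchivo => $args{archivo} }, $class;
}

# Combined accessor: no get/set prefix, the argument count decides.
sub archivo {
    my $self = shift;
    $self->{nomArchivo} = shift if @_;    # called with a value: store it
    return $self->{nomArchivo};           # always return the current value
}

package main;

my $v = Verificador->new(archivo => 'a.txt');
my $before = $v->archivo;     # read
$v->archivo('b.txt');         # write
my $after = $v->archivo;      # read again
print "$before -> $after\n";
```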

Enjoy !

[/Perl] permanent link

0 comments


Indexes are not for books only

One day I woke up to the sweet smell of fresh bread, coffee and milk, feeling like Captain Kirk entering the Nexus and looking at what was once his home in the mountains ... but this time there was a big difference: my burning CPU. Yes, that's true; not literally, but the poor silicon wafer was begging for some rest, a humble request after its tireless work.

Let me tell you a little story. Some time ago, in the year 2000, I began to work on an already running webmail program, then called aMail and now reborn as its offspring TRex. Soon it became my main e-mail reader, and over time the number of e-mails stored in it grew steadily. As more and more of them were left in the Inbox, opening it became a titanic task. That's why my CPU was begging for some rest.

To put it in some techno-babble, the e-mail store is filesystem based (all e-mails are stored in a single folder), with one file (the 'folders' file) that contains the folder listing (one entry per folder) plus an e-mail index with one entry per e-mail containing file name, folder name, date, etc. This list is sorted by e-mail date in descending order (newer e-mail entries come first).

The problem is that when a huge number of e-mails are stored (let's say more than 500), a performance problem arises when listing them, which is one of the most common operations in every e-mail program. This performance issue appears mainly because of:

Imagine doing this for more than 500 files every time you open your Inbox, retrieve e-mail or simply change to a new page !!! :-P

So it would be wise to make more rational use of memory and CPU. The first step will be to add an index mechanism, separate from the existing one (the folders file), that contains only each e-mail's file name and its corresponding key value (date, sender, etc.), sorted in ascending or descending order. This will allow faster access to the desired e-mails because the file is smaller and already sorted.

After that, a windowing mechanism should be added to return only the entries that will be shown in the current window. For example, if you are on page 5 of your Inbox and viewing 25 e-mails per page, you only need the e-mails in positions 101 to 125, so only those will be returned to your main program. This avoids sending the complete Inbox listing to your program just to select 25 out of a total of 500 e-mails (while the other 475 travel a loooong way to be discarded).
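The windowing arithmetic can be sketched as follows. This is only an illustration, not TRex code: the function name page_window is hypothetical and the index is assumed to be an already sorted array held in memory.

```perl
use strict;
use warnings;

# Return only the entries that belong to the requested page (pages start at 1).
sub page_window {
    my ($entries, $page, $per_page) = @_;
    my $first = ($page - 1) * $per_page;      # zero-based position of the first entry
    return () if $first > $#{$entries};       # page is past the end of the list
    my $last = $first + $per_page - 1;
    $last = $#{$entries} if $last > $#{$entries};   # last page may be shorter
    return @{$entries}[$first .. $last];
}

# Made-up Inbox of 500 entries, viewed 25 per page.
my @inbox = map { "mail-$_" } 1 .. 500;
my @page5 = page_window(\@inbox, 5, 25);

print scalar(@page5), " entries, from $page5[0] to $page5[-1]\n";
```

With 25 entries per page, page 5 starts at zero-based position 100, which is the 101st e-mail, and ends at the 125th.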

Let's start by enumerating the main areas to be covered in this work, mainly:

Also, the code needs to be structured in different layers in order to build a specialized, traceable and easily debuggable indexing system:

[/Perl] permanent link

0 comments


TRex: file access level

OK, after the initial draft and the late explanation about relational databases, let's start with file access.

Starting here makes sense because the intention is to isolate the upper levels from the nuisances of low-level file access, which Perl's input and output functions make easy. Index management, in turn, must deal mainly with the logical contents of the index: it is closer to search theory than to I/O details.

Of the three main topics (compatibility, performance and integration), only the second has a major impact here. This code will be accessed only from the index management layer, so there are no integration issues and no legacy code to deal with. Something similar happens with compatibility: if the index files are located inside the current filesystem structure but have names that do not collide with the existing files (date.index, subject.index, and so on), there is no problem with the legacy files.

Turning our attention to performance, we must consider the different alternatives for implementing the index files and the resource consumption of each of them. As far as I know, the index storage can be implemented using three modules or strategies: Perl data structure persistence (aka the Storable module), raw access and tied structures.

Let's analyze them:

As can be seen, Storable is fast but consumes a lot of memory; raw access is a very basic tool but gives us full control over the implementation; and tie lets us take advantage of both models: it is not memory hungry, there are implementations for the basic Perl structures, and although it isn't the fastest option it allows quick and easy coding.

Just to make it clear, I'll use tied structures as a proof of concept, and if they are not performant enough (I mean, if there's no performance gain) they can be replaced with an implementation based on raw access.
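As a rough idea of what a tied structure looks like, here is a small sketch that uses the core Tie::File module. Choosing Tie::File is an assumption made for illustration (the post does not say which Tie:: module is used), and the file contents are made up.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Create a small throwaway index file: one "date<TAB>file name" record per line.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "2007-01-01\t172974229\n",
            "2007-01-02\t19334256\n",
            "2007-02-06\t142532639\n";
close $fh or die "Cannot close $path: $!";

# Tie the file to an array: each element is one line of the file,
# and Tie::File does not load the whole file into memory.
tie my @index, 'Tie::File', $path or die "Cannot tie $path: $!";

my $count  = scalar @index;          # number of records in the file
my $second = $index[1];              # second record, without its newline
push @index, "2007-03-01\t99999999"; # appends a (made-up) record to the file
my $after  = scalar @index;

untie @index;
print "$count records before, $after after\n";
```

The code that uses @index looks like ordinary array handling, which is the "quick and easy coding" advantage, while the data stays on disk.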

So this file access level will be implemented using two modules: Tie::whatever::it::fits::our:needs and TRex::Storage::Index::File. You can also take a look at the test files.

Enjoy !

[/Perl] permanent link

2 comments


Perl 6 and the object paradigm

I needed a program that checks whether each file in a list contains the lines that establish its copyright, so I found a little time and started writing it. To practice a bit I am doing it in Perl 6, so if you want to take a look you can download the repository or browse it directly on the web.
In principle it is very basic, and for now it only looks at a single file that is hard-coded (ejemplos/files/read_file.p6).

At first I started writing it as a script, that is, quite procedural, but it occurred to me to practice Perl 6's object syntax a bit. Although the problem does not call for it (I even think it is counterproductive), I changed it to follow the object paradigm (not that it took me muuuuuch work, mind you).

As a personal opinion, let me tell you that there are 2 things I found somewhat counterproductive:


Enjoy !

[/Perl] permanent link

0 comments


TRex::Storage::Index::File

This time it is the turn of the TRex::Storage::Index::File module. Two important points to note are:

So the methods to access it are:

This API seems to be a straightforward implementation of file access primitives, and it almost is, but its main purpose is to hide the burden of error checking and low-level access so that the code in other modules can be as clean as possible.
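The idea of hiding error checking behind a thin layer can be sketched like this. It is not the actual TRex::Storage::Index::File API: the package name Sketch::IndexFile and the methods open_index, read_at and close_index are hypothetical, chosen only to show the pattern.

```perl
package Sketch::IndexFile;    # hypothetical name, not the real module
use strict;
use warnings;
use Carp qw(croak);

# Open the file for reading and writing; die with a clear message on failure.
sub open_index {
    my ($class, $path) = @_;
    open(my $fh, '+<', $path) or croak "Cannot open index '$path': $!";
    return bless { fh => $fh, path => $path }, $class;
}

# Read $length bytes starting at byte $offset, checking every step.
sub read_at {
    my ($self, $offset, $length) = @_;
    seek($self->{fh}, $offset, 0)
        or croak "Cannot seek in '$self->{path}': $!";
    my $got = read($self->{fh}, my $buffer, $length);
    croak "Cannot read from '$self->{path}': $!" unless defined $got;
    return $buffer;
}

sub close_index {
    my ($self) = @_;
    close($self->{fh}) or croak "Cannot close '$self->{path}': $!";
    return 1;
}

package main;

# Callers stay free of error-handling noise, for example:
#   my $index = Sketch::IndexFile->open_index('date.index');
#   my $bytes = $index->read_at(0, 16);
#   $index->close_index;
```

Every failure is reported from the caller's point of view by croak, so the upper layers need neither `or die` clauses nor knowledge of seek and read.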

Enjoy !

[/Perl] permanent link

1 comment


TRex: Going deeper in the filesystem structure

As seen in the previous post, there's an already existing infrastructure (the filesystem e-mail store) that needs to be improved in terms of speed and resource usage (mainly memory and CPU).

Just to make things clear, I will depict the whole idea here because in the previous post I described the code structure but never revealed the whole picture. Here we go.

Relational databases let us store related data in the form of tables (relations) that look similar to a spreadsheet, where all the values in a column must be of the same data type (integer, string, etc.) and each column is addressed by name. Each row contains the attributes of one object that needs to be stored (represented) in the table. For example, if you have a table called e-mails, the columns could be date, folder, subject and file name, and each row will contain information about one e-mail:

 ------------------------------------------------------------------------
 |     date      |     folder     |       subject        |   file name
 ------------------------------------------------------------------------
 |  1 Jan 2007   |  Inbox         | Hello !!!            |  172974229
 ------------------------------------------------------------------------
 |  6 Feb 2007   |  Inbox         | Re: Re: Hello !!!    |  142532639
 ------------------------------------------------------------------------
 |  1 Jan 2007   |  Sent          | Re: Hello !!!        |  26785427
 ------------------------------------------------------------------------
 |  2 Jan 2007   |  Inbox         | [SPAM] Are you busy  |  19334256
 ------------------------------------------------------------------------

To look for information on any e-mail, you decide which columns contain the information you need, perform a search on the table, and mark the rows (one at a time) that match the required criteria. For example, if you want to know which e-mails are flagged as SPAM, just look at the subject column and mark all rows that have the word SPAM in it. In our example you would find just one entry: the fourth one. Or, if you need to know how many SPAM messages you received in January 2007, you must check the subject column for the word SPAM and the date column for Jan 2007. The conditions used to search the table are called search criteria. In our last example they are that the word SPAM is in the subject and that the date falls in Jan 2007 (or between 1 Jan 2007 and 31 Jan 2007).

So the algorithm to search for rows that match certain criteria is certainly easy:

  1. Start with the first record
  2. If there are no more records go to step 6
  3. Does this record comply with the search criteria ?
  4. If NO advance one record and go to step 2
  5. If YES mark the record, advance one record and go to step 2
  6. The marked records are the answer to your request !!!!
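The steps above can be sketched in a few lines of Perl. This is only an illustration under assumed names: the table is held in memory as an array of hashes, and full_scan is a hypothetical function, not part of TRex.

```perl
use strict;
use warnings;

# The example table, one hash per row.
my @table = (
    { date => '1 Jan 2007', folder => 'Inbox', subject => 'Hello !!!',           file => '172974229' },
    { date => '6 Feb 2007', folder => 'Inbox', subject => 'Re: Re: Hello !!!',   file => '142532639' },
    { date => '1 Jan 2007', folder => 'Sent',  subject => 'Re: Hello !!!',       file => '26785427'  },
    { date => '2 Jan 2007', folder => 'Inbox', subject => '[SPAM] Are you busy', file => '19334256'  },
);

# Visit every row, in order, and keep (mark) the ones that satisfy the criteria.
sub full_scan {
    my ($criteria, @rows) = @_;
    my @marked;
    for my $row (@rows) {
        push @marked, $row if $criteria->($row);
    }
    return @marked;
}

# Search criteria: SPAM in the subject and a date in January 2007.
my @spam_in_january = full_scan(
    sub {
        my ($row) = @_;
        return $row->{subject} =~ /SPAM/ && $row->{date} =~ /Jan 2007$/;
    },
    @table,
);

print scalar(@spam_in_january), " SPAM message(s) in January 2007\n";
```

Note that the loop touches every row regardless of the criteria, which is exactly the cost discussed next.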

But it has a problem: to select the rows, the whole table needs to be read completely, no matter what you are looking for. If there are not too many rows it's a short task, but as the table grows more and more stress is put on the disks, because the file that stores the table becomes larger and larger, and thus the time needed to access it grows accordingly and horribly.

YEAH, that's what happened to me with my e-mails !!!!

To speed up data access there are commonly some paths to follow:

Guess what ?? I'll try the last option !!

So if our problem is that our table grew, and with it the file that implements it, then what we can do is avoid reading all of it. I mean, if you want to search on just one column, why not store the data for this column in a separate file, like this:

      File1                                   File2
      
 -----------------    --------------------------------------------------------
 |     date      |    |     folder     |       subject        |   file name
 -----------------    --------------------------------------------------------
 |  1 Jan 2007   |    |  Inbox         | Hello !!!            |  172974229
 -----------------    --------------------------------------------------------
 |  6 Feb 2007   |    |  Inbox         | Re: Re: Hello !!!    |  142532639
 -----------------    --------------------------------------------------------
 |  1 Jan 2007   |    |  Sent          | Re: Hello !!!        |  26785427
 -----------------    --------------------------------------------------------
 |  2 Jan 2007   |    |  Inbox         | [SPAM] Are you busy  |  19334256
 -----------------    --------------------------------------------------------

The point is a good one, because to access the date now only File1 needs to be read. Unfortunately, it's common to use search criteria that involve more than one column, so if we put each column in a different file the search becomes a bit more complex, and perhaps doesn't significantly reduce the access time, because:

But what we can do to obtain better performance, and less complicated code, is to use this idea while getting rid of the cons and leveraging the pros:

                                 File1.table

   ----------------------------------------------------------------------------
   | row |     date      |     folder     |       subject        |   file name
   ----------------------------------------------------------------------------
   |  0  |  1 Jan 2007   |      Inbox     | Hello !!!            |  172974229
   ----------------------------------------------------------------------------
   |  1  |  6 Feb 2007   |      Inbox     | Re: Re: Hello !!!    |  142532639
   ----------------------------------------------------------------------------
   |  2  |  1 Jan 2007   |      Sent      | Re: Hello !!!        |  26785427
   ----------------------------------------------------------------------------
   |  3  |  2 Jan 2007   |      Inbox     | [SPAM] Are you busy  |  19334256
   ----------------------------------------------------------------------------


         File2.index                        File3.index

   -----------------------         ------------------------------
   | row |     date      |         | row |       subject        |
   -----------------------         ------------------------------
   |  0  |  1 Jan 2007   |         |  3  | [SPAM] Are you busy  |
   -----------------------         ------------------------------
   |  3  |  2 Jan 2007   |         |  1  | Re: Re: Hello !!!    |
   -----------------------         ------------------------------
   |  1  |  6 Feb 2007   |         |  0  | Hello !!!            |
   -----------------------         ------------------------------
   |  2  |  1 Jan 2007   |         |  2  | Re: Hello !!!        |
   -----------------------         ------------------------------
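As a sketch of the layout shown above, each index can be seen as a list of (row, value) pairs for one column, while the full rows stay in the table. The function name build_index is hypothetical, and the data lives in memory here rather than in File1.table, File2.index and File3.index.

```perl
use strict;
use warnings;

# The full table; the array position of each hash is its row number.
my @table = (
    { date => '1 Jan 2007', folder => 'Inbox', subject => 'Hello !!!',           file => '172974229' },
    { date => '6 Feb 2007', folder => 'Inbox', subject => 'Re: Re: Hello !!!',   file => '142532639' },
    { date => '1 Jan 2007', folder => 'Sent',  subject => 'Re: Hello !!!',       file => '26785427'  },
    { date => '2 Jan 2007', folder => 'Inbox', subject => '[SPAM] Are you busy', file => '19334256'  },
);

# Build an index for one column: a list of [row number, column value] pairs.
sub build_index {
    my ($column, $rows) = @_;
    return map { [ $_, $rows->[$_]{$column} ] } 0 .. $#{$rows};
}

my @date_index    = build_index('date',    \@table);
my @subject_index = build_index('subject', \@table);

# An index entry leads back to the complete row through its row number.
my ($row, $subject) = @{ $subject_index[3] };
print "row $row: $subject (file $table[$row]{file})\n";
```

Searching on a single column only needs the much smaller index, and the row number is enough to fetch the rest of the e-mail's data from the table.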
       

But WAIT, there's still something that's not right, there's some bee buzzing in my ear ... we're still reading each file sequentially, so it's only a matter of time before our indexes grow to the point where they become, once more, big enough to be the bottleneck.

Until now we have not worked with the data types; they are just some bytes stored in a file. What we can do is look at some properties of the attributes and see if we can exploit them to speed up the search. Let's take one big example: the phone book. It contains thousands of entries, but when you need to look up someone's phone number you don't use the previous search algorithm (one line at a time, starting with the first person on the first page). Because you're smart people, you know that people are listed in alphabetical order (by last name, then first name). So if you look for my phone number, you will surely open the book at any page and see that the last names on that page begin with N, so you need to move forward. You advance some pages and find the letter P ... you are near ... then some more pages, and suddenly the letter T is in front of you. Wait, go back, but not too far or you'll end up at the letter N again. What you are doing is starting with big steps and making them smaller each time until you find Rodriguez Victor. You have found me !!! (or perhaps I don't exist, if you are looking in the Tokyo phone book).

Basically, what we have to do is keep each index sorted, so that each time a new criterion must be satisfied the corresponding rows can be found in very few iterations. That's what search algorithms exploit to do their work.

I hope that this makes things a bit clearer :-D
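The phone book strategy is essentially a binary search. The sketch below assumes an in-memory index of [key, row] pairs sorted by key, with dates written as YYYY-MM-DD so that plain string comparison matches date order; both the layout and the function name lookup_rows are assumptions for illustration.

```perl
use strict;
use warnings;

# Date index sorted by key; each entry is [key, row number in the table].
my @date_index = (
    [ '2007-01-01', 0 ],
    [ '2007-01-01', 2 ],
    [ '2007-01-02', 3 ],
    [ '2007-02-06', 1 ],
);

# Find the row numbers for $key by halving the search range at every step.
sub lookup_rows {
    my ($index, $key) = @_;
    my ($low, $high) = (0, scalar @$index);

    # Narrow down to the first entry whose key is not smaller than $key.
    while ($low < $high) {
        my $mid = int(($low + $high) / 2);
        if ($index->[$mid][0] lt $key) { $low  = $mid + 1 }
        else                           { $high = $mid     }
    }

    # Collect every consecutive entry with exactly that key.
    my @rows;
    while ($low < @$index && $index->[$low][0] eq $key) {
        push @rows, $index->[$low][1];
        $low++;
    }
    return @rows;
}

my @first_of_january = lookup_rows(\@date_index, '2007-01-01');
print "rows: @first_of_january\n";
```

Each comparison discards half of the remaining entries, so even a large index is searched in a handful of steps instead of being read from start to end.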

[/Perl] permanent link

1 comment


Perl 6 and reconciling with the constructor

I gave the Perl 6 constructor topic another turn of the screw, and now it makes a bit more sense to me.

Although in the previous post my complaint was about the default constructor, what I should really complain about is the documentation (in particular Synopsis 12: Objects), which basically says that there is a default constructor and that it can be overridden, but nothing more than that.
However, looking at the test cases is where I figured out how to override it, and now everything seems simpler:

class Foo {
      has $.a;
    
      method new ($self: Str $string) {
        $self.bless(*, a => $string);
      }
}

Now, suppose I add an attribute that I want to initialize (that's what the constructor is for :-D ), so I give it an initial value and that's it:

class Foo {
      has $.a;
      has $!b is rw;
    
      method new ($self: Str $string) {
        $self.bless(*, a => $string);
        $!b = False;
      }
}


and when I run it I get a lovely

Type objects do not have state, but you tried to access attribute $!b

Once again my Java experience trips me up and I do what I shouldn't, because after calling the constructor the object does not exist yet or, to put it correctly, it does exist but the reference to the object created by bless is not implicit as a prefix of all attribute names (that is, $!b is not the same as this!b, if you'll allow me the Java-Perl trans-syntax).
I almost fell again into the temptation of taking the object returned by bless and assigning a value to it, but I realized something: all initialization is cleaner if I pass it through the default constructor:

class Foo {
      has $.a;
      has $!b is rw;
    
      method new ($self: Str $string) {
        $self.bless(*, a => $string,
                       b => False);
      }
}

Now we're really getting somewhere !!!!
Just look how nice my program to verify the copyright of the files in the Perl 6 Hackathon repository is turning out.

Enjoy !

[/Perl] permanent link

0 comments


Perl6

I just updated and compiled Parrot and Rakudo (Perl 6 over Parrot).

My first Perl6 program :

use v6;

"Hello, Rakudo!".say();

[/Perl] permanent link

0 comments