The quest for a blog engine

I’ve been trying to become an avid blogger for a few months now, but none of the available blog engines seem to fit my needs:

  • Easy posting workflow
  • Code blocks and syntax highlighting
  • Simple & minimalistic UI/UX
  • Markdown based – this lets me write my posts in any Markdown editor, without having to be logged into the blog engine just to write a post

Those are my simple needs, nothing fancy, just what a casual blogger would need. I tried different blogging engines, and none of them made me feel comfortable or fulfilled my requirements. For months I tried Google’s Blogger, WordPress, Posterous and Tumblr, and most of them felt practically the same: a system with too many settings and dashboards to manage, where even writing a post required me to go to the website, sign in, and reserve a tab in my browser just for that, giving me no flexibility on when and how to write… I know, you could say that you can write your post in some editor – Google Drive, Word, plain text, etc. – but that means you later have to copy the content over and make sure the formatting is correct, making the process of posting even more complex.

Right now I’m writing this blog post in Markdown with Mou and saving the file into a directory in my Dropbox filesystem; that’s it. I think I’ve found my blog engine for good.

Originally posted on: http://scriptogr.am/edgar

Reinforcement Learning in web recommender systems

This time I found a pretty interesting way of representing a problem on the web so that it fits a reinforcement learning approach. Since the information on the Web increases day after day, we need systems that can recommend content related to users’ interests. Web content recommendation has been an active application area for information filtering, web mining and machine learning research. In this paper the authors explore ways to enhance a reinforcement learning solution devised for web recommendations based on web usage data.

To model the problem as reinforcement learning, they use the analogy of a game in which the system constantly tries to predict the next state of a user’s web browsing session, knowing the user’s previous requests (visited pages) and the history of other users’ browsing sessions. The action is selecting a page to recommend. The reward is a function of the visited pages and the pages already recommended for each state: a state S’ is rewarded when the last page visited belongs to the list of recommended pages.
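Just to make the mechanics concrete, here’s a toy Q-learning sketch of that formulation. This is my own reconstruction, not the paper’s actual algorithm: the window size, the epsilon-greedy policy and the 0/1 reward are all simplifying assumptions on my part.

```python
from collections import defaultdict
import random

# Toy Q-learning for page recommendation. A state is the tuple of the
# last WINDOW pages the user visited; an action is a page we recommend.
ALPHA, GAMMA, EPSILON, WINDOW = 0.1, 0.9, 0.1, 3
Q = defaultdict(float)  # Q[(state, action)] -> estimated value

def recommend(state, candidates):
    """Epsilon-greedy choice of the next page to recommend."""
    if random.random() < EPSILON:
        return random.choice(candidates)
    return max(candidates, key=lambda a: Q[(state, a)])

def step(state, recommended, visited, candidates):
    """One Q-learning update: the state is rewarded when the page the
    user actually visited was the one we recommended (a 0/1 reward,
    a simplification of the paper's reward function)."""
    reward = 1.0 if visited == recommended else 0.0
    next_state = (state + (visited,))[-WINDOW:]
    best_next = max(Q[(next_state, a)] for a in candidates)
    Q[(state, recommended)] += ALPHA * (reward + GAMMA * best_next - Q[(state, recommended)])
    return next_state
```

Each logged browsing session would then be replayed through recommend and step, with states starting as empty tuples and growing as pages are visited.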

One interesting aspect of this system, which actually relates to my thesis topic, is that this approach not only takes the URLs of web pages into account in order to record the web usage history, but also relates those pages to concepts, providing semantics for the recommendations (i.e. while recommending URLs of web pages, the system also recommends concepts related to what the user is actually browsing). In this case, semantics give this reinforcement-learning-based recommendation system a more interesting point of view: the pages recommended to users actually match the concepts they are browsing, providing timely information and reducing the effort of browsing around their interests.

Reference:
Nima Taghipour and Ahmad Kardan. 2008. A hybrid web recommender system based on Q-learning. In Proceedings of the 2008 ACM Symposium on Applied Computing (SAC ’08). ACM, New York, NY, USA, 1164–1168. DOI: 10.1145/1363686.1363954. http://doi.acm.org/10.1145/1363686.1363954

Bayesian Network for Ontology Mapping

I found an interesting application of Bayesian Networks to the Semantic Web, aimed at helping with ontology mapping between two different ontologies, i.e. determining how similar Onto1:Concept1 is to Onto2:Concept2, thus trying to map two different concepts by obtaining a value that corresponds to their degree of similarity.
The details of how it works are pretty interesting. First, the ontologies are converted to Bayesian Networks using a framework called BayesOWL (references in the paper); the resulting Bayesian Network preserves the semantics of the original ontologies and supports ontology reasoning, within and across ontologies, using Bayesian inference. The BayesOWL framework provides the methods that utilize the available probability constraints about classes and inter-class relations when constructing the conditional probability tables of the network.
The probabilities the framework needs – prior distributions capturing the uncertainty about concepts, conditional distributions for relations between classes in the same ontology, and joint distributions for the semantic similarity between concepts in different ontologies – are learned using text classification techniques: each concept is associated with a group of sample text documents called exemplars, retrieved from a search engine.
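To give a rough flavor of the exemplar idea, here is a toy sketch. It is only my loose reading of the approach: instead of a real text classifier I just compare the vocabularies of each concept’s exemplars with a Jaccard-style measure, so every function and the example data below are illustrations, not the paper’s method.

```python
def tokens(doc):
    """Crude tokenization: the set of lowercase words in a document."""
    return set(doc.lower().split())

def concept_profile(exemplars):
    """Vocabulary of a concept, pooled over its exemplar documents."""
    profile = set()
    for doc in exemplars:
        profile |= tokens(doc)
    return profile

def similarity(exemplars_a, exemplars_b):
    """Jaccard-style similarity between two concepts: the overlap of
    their exemplar vocabularies over their union -- a crude stand-in
    for the learned joint probability of the two concepts."""
    a, b = concept_profile(exemplars_a), concept_profile(exemplars_b)
    return len(a & b) / len(a | b)

# Hypothetical exemplars for Onto1:Automobile and Onto2:Car
auto_docs = ["engine wheels road vehicle", "sedan engine fuel"]
car_docs = ["car engine wheels", "road trip fuel sedan"]
print(similarity(auto_docs, car_docs))  # a value in [0, 1]
```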

The main idea of this research is to provide a simple and efficient way of determining the semantic similarity of two concepts in distinct ontologies. It’s an approach I didn’t know about, but it’s worth keeping in mind when doing research on Semantic Web technologies.

Reference:
Pan, Rong, et al. A Bayesian Network Approach to Ontology Mapping. In: The Semantic Web – ISWC 2005, Lecture Notes in Computer Science. Springer, Berlin/Heidelberg, 2005. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3201&rep=rep1&type=pdf

Novint Falcon Fights!

Recently my Robotics team and I were developing a project in which we incorporated several technologies to build a remotely controlled humanoid. There were some cool devices I had never worked with before, and they made the project really interesting. One of the new devices is the Novint Falcon haptic device. If you have never heard about haptic technologies, then this is a good time for you to google it; even easier, let me google that for you… done.

One of the main uses for this device is video games, and it’s a great experience to play first-person shooters with it: pretty vivid motions and feedback.

Anyway, here’s a little video of the QA that we did on the device; by the way, no one got hurt during this video.

A complete chapter for XML, it deserves it!

It has been a while since I had my first encounter with the XML syntax. Even though I didn’t know much about the specification, it seemed like a very straightforward way to specify data structures for a file that required a specific data schema. Because of its similarity to HTML, XML was a natural step for me to take, and I had no problem at all. Nevertheless, the importance of XML goes beyond the practical uses most people make of it in their daily activities or projects. It is important to realize the huge impact a technology like XML has on many applications, an impact that is not perceived at first sight.

I guess my story is like many others: at first, you don’t realize the importance or scope of the stuff you are using and learning. Because of its ease of use, people may underestimate the real value of a technology they use constantly, in this particular case the XML syntax. XML lets machines process data and communicate with each other, since it provides a machine-readable structure. The standard comes from the SGML specification, which defines the whole family of markup language applications, HTML and XML included.

XML has become the basic serialization scheme for applications such as Web Services and graph serialization under RDF/XML, which will enable the core functionalities of the internet of tomorrow. Nowadays every major web service out there offers the possibility of returning information in XML format (among others), and XML is also the core serialization format for information interchange in remote calls such as SOAP, which is basically XML messages. Core technologies for the Semantic Web are based on the XML specification as well: RDF/XML, which serializes graphs into XML files, but also OWL, used to build ontologies; all of them are built on XML.

(Image: the Semantic Web Stack)

But why has XML become so popular and “standard”? Many advantages of the specification made it the favorite syntax for information serialization and schematization on the web. A clean, human- and machine-readable structure makes it possible to use XML in virtually any type of application, from process communication in distributed systems to database schemas and data transactions. Easy-to-use yet powerful query languages such as XPath provide access to every piece of information in the structure of an XML file.
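As a quick illustration of that last point, here’s a tiny XPath-style query using nothing but Python’s standard library (the catalog document is made up for the example, and ElementTree supports only a limited subset of XPath):

```python
import xml.etree.ElementTree as ET

# A made-up document to query
doc = ET.fromstring("""
<catalog>
  <book id="b1"><title>Semantic Web Primer</title><price>30</price></book>
  <book id="b2"><title>Learning XPath</title><price>25</price></book>
</catalog>
""")

# Path expression: every book's title
for title in doc.findall("./book/title"):
    print(title.text)

# Predicate on an attribute: the title of the book with id="b2"
print(doc.findall(".//book[@id='b2']/title")[0].text)  # -> Learning XPath
```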

Modularization also carries incredible weight in its popularity. Since the web is a distributed system, it is also a modular system, comprised of many heterogeneous elements. XML must deal with the disambiguation of terms; thus it has the so-called namespaces, in which terms/tags can be defined without the risk of confusing them with identically named terms in another domain or namespace – these are the basis for ontologies and RDF. But its most important feature is flexibility. The ability to define any schema, tags, namespace or domain gives entities out there the power to define their own serialization language; using XML Schema (in the old times, DTDs), they can maintain the consistency of communications (in the distributed systems environment) or the consistency of a database schema (for databases).
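Here’s what that disambiguation looks like in practice; the two namespace URIs are invented for the example:

```python
import xml.etree.ElementTree as ET

# Two <title> tags that mean different things, kept apart by namespaces
doc = ET.fromstring("""
<root xmlns:shop="http://example.org/shop"
      xmlns:lib="http://example.org/library">
  <shop:title>Order #42</shop:title>
  <lib:title>Moby-Dick</lib:title>
</root>
""")

ns = {"shop": "http://example.org/shop", "lib": "http://example.org/library"}
print(doc.find("shop:title", ns).text)  # the shop's notion of 'title'
print(doc.find("lib:title", ns).text)   # the library's notion of 'title'
```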

We can find a full set of features that make the XML syntax a worthy candidate, deserving its own chapter in any book related to web technology: Semantic Web, databases, Web Services, protocols, distributed systems, etc.

RDF databases really differ from NoSQL databases!

Working with RDF data, you realize pretty fast that you need to store all that huge amount of information in some easy-to-query engine. Nowadays there has been a huge explosion of several so-called NoSQL databases, which are based on the flexibility of their schema-free designs. But RDF data and databases differ from these architectures, even when they share some aspects.
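To make the contrast concrete, here’s a tiny sketch using the rdflib Python library: where a NoSQL store would hold opaque documents or key–value pairs, an RDF store holds triples you can query as a graph pattern. All the data and URIs below are invented for the example.

```python
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()

# Data lives as subject-predicate-object triples, not documents or rows
g.add((EX.edgar, EX.writes, EX.post1))
g.add((EX.post1, EX.title, Literal("RDF vs NoSQL")))

# Queries are graph patterns (SPARQL), not key lookups
results = g.query("""
    SELECT ?title WHERE {
        ?author <http://example.org/writes> ?post .
        ?post   <http://example.org/title>  ?title .
    }
""")
for row in results:
    print(row.title)
```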

Anyway, I found a pretty interesting article written by Arto Bendiken, a cofounder of Datagraph, which describes the differences between these approaches. It’s clearly explained and well worth the read.

You can find the article here.