Portfolio

Since our founding in 2004, we have developed a variety of complex products,
mainly in the fields of the Semantic Web (Linked Data) and text analytics.

Many of our customers are high-tech companies from Europe and North America.
We have been working with some of them for many years now.

The products we have been developing are used by organisations such as HP and NASA,
as well as by pharmaceutical companies currently active in COVID-19 vaccine research.

Please find below descriptions of selected projects.

Collaborative authoring/reviewing software with a layer of semantic analysis

Client:

A North American supplier of software for patent offices

Project description:

We developed a web-based platform for collaboration among patent office workers that provides computer-assisted analysis of patent forms, detecting formal errors, missing definitions, enumerations and figures, etc.

An integral part of the system is a PDF processing and rendering engine, written in JavaScript, that relies on a distributed backend.
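
The kind of formal check described above can be illustrated with a toy example. The sketch below (hypothetical code, not the platform's actual rules or data model) flags figure numbers that are referenced in a document but never captioned:

```python
import re

def find_missing_figures(text: str) -> set[str]:
    """Report figure numbers referenced in the text but never captioned.

    A toy illustration of a formal-error check; the real platform's
    rules and data model are not public.
    """
    referenced = set(re.findall(r"\bFig\.\s*(\d+)", text))
    defined = set(re.findall(r"\bFigure\s+(\d+)\s*:", text))
    return referenced - defined

sample = "As shown in Fig. 1 and Fig. 3. Figure 1: The assembly."
print(find_missing_figures(sample))  # {'3'}
```

In the real system such checks operate on a structured representation of the form rather than raw text, but the principle of cross-referencing declarations against uses is the same.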

Technologies:

Python, PostgreSQL, Redis, lxml, JavaScript, HTML5, Dojo Toolkit, pdf.js

Project size:

This was over a year's worth of work that involved several software developers.

Knowledge management system for scientists

Client:

The Golden Web Foundation: a UK-based educational institution facilitating cooperation among scientists on historical research

Project description:

We created a publishing platform that allows scientists in the history and museum fields to discover and manage connections between their work and the works of others.

This was achieved with computer-aided discovery of references to people, places, dates, events, etc. The system allows for very flexible data structure definitions while at the same time supporting very detailed and precise querying. High performance was achieved with specialised indexes as well as with distributed computing. Web-based and desktop UIs were developed.
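
The combination of flexible, schema-free records with fast, precise querying can be sketched roughly as follows. This is a minimal in-memory illustration (all names hypothetical), not the system's actual storage layer, which used MongoDB and BerkeleyDB with specialised indexes:

```python
from collections import defaultdict

class EntityStore:
    """Minimal sketch: schema-free records plus a field-value index
    that supports exact-match queries via set intersection."""

    def __init__(self):
        self._records = []
        self._index = defaultdict(set)  # (field, value) -> set of record ids

    def add(self, record: dict) -> int:
        rid = len(self._records)
        self._records.append(record)
        for field, value in record.items():
            self._index[(field, value)].add(rid)
        return rid

    def find(self, **criteria) -> list[dict]:
        """Return records matching all field=value criteria."""
        ids = None
        for field, value in criteria.items():
            hits = self._index.get((field, value), set())
            ids = hits if ids is None else ids & hits
        return [self._records[i] for i in sorted(ids or ())]

store = EntityStore()
store.add({"type": "person", "name": "Herodotus", "era": "classical"})
store.add({"type": "place", "name": "Halicarnassus"})
print(store.find(type="person", era="classical"))
```

Because every record is an arbitrary dictionary, new fields can be introduced without schema migrations, while the index keeps lookups fast regardless of which fields a query combines.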

Technologies:

Python, MongoDB, BerkeleyDB, wxWidgets

Project size:

This was six years' worth of work that involved several software developers and testers.

Distributed web crawler

Client:

A German supplier of trend tracking technologies for enterprises and consumers

Project description:

The company needed an application that would crawl the web and find documents that might be of interest to its customers.

We implemented a system that, at the time, could process over 200,000 pages per hour on a desktop PC connected to the Internet via a 100 Mbit link. This was achieved by running multiple download requests in one process using lightweight threads (Eventlet) and by distributing parsing jobs to multiple processors (Celery + multiprocessing). An ultra-fast probabilistic set implementation (a Bloom filter) was introduced to avoid requesting the same web page twice. To further speed up downloading, connection pooling and custom DNS request handling were implemented.

The system was designed to be fail-safe, i.e. to resume processing from the point where it was (even unexpectedly) interrupted. It can be deployed to multiple machines to further speed up the analysis of documents: each machine can be configured separately to execute a selected set of tasks, and each task can be distributed across multiple machines. Persistent message queues balance the load across the machines and allow processing to continue seamlessly when some of them fail. The most efficient configuration, processing approximately one million pages per hour, consists of two database servers and four machines for downloading and analysing web pages.
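
The Bloom-filter deduplication mentioned above trades a small false-positive rate for dramatic memory savings: the filter may occasionally claim an unseen URL was already visited, but it never misses one it has stored. The sketch below is a minimal illustration of the idea (not the project's actual implementation):

```python
import hashlib

class BloomFilter:
    """Compact probabilistic set: may report false positives,
    never false negatives. A minimal sketch of the crawler's
    URL-deduplication idea, not the production code."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 5):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        # Derive k bit positions from independent slices of one digest.
        digest = hashlib.sha256(item.encode()).digest()
        for i in range(self.k):
            chunk = int.from_bytes(digest[i * 4:(i + 1) * 4], "big")
            yield chunk % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

seen = BloomFilter()
seen.add("https://example.com/page1")
print("https://example.com/page1" in seen)  # True
print("https://example.com/other" in seen)  # almost certainly False
```

At crawler scale this matters: a few hundred megabits of bit array can deduplicate hundreds of millions of URLs, where an exact set of the URL strings themselves would not fit in memory.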

Technologies:

Python, Celery, Eventlet, ZeroMQ, MongoDB, RabbitMQ, Bloom filters

Project size:

This was several months' worth of work for one developer.

Recommendation system for an Internet TV company

Client:

Vizimo: An innovative English startup delivering TV and VoD services via web and mobile channels

Project description:

We created a system that guides TV audiences to broadcasts of interest.

It downloads content descriptions from media providers, classifies the content using text mining and statistical analysis, and then recommends broadcasts to users. In addition, a specialised UI has been implemented to supervise and fine-tune the classification.
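
The core recommendation idea can be sketched with a simple bag-of-words similarity between what a user has watched and each candidate broadcast's description. This toy example (hypothetical names; the production system's models and features are not described here) picks the most similar catalogue entry:

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    return text.lower().split()

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def recommend(watched: str, catalogue: dict) -> str:
    """Return the broadcast whose description best matches the user's viewing.

    A toy bag-of-words sketch of content-based recommendation,
    not the production classifier.
    """
    profile = Counter(tokenize(watched))
    return max(catalogue,
               key=lambda t: cosine(profile, Counter(tokenize(catalogue[t]))))

catalogue = {
    "Cooking Tonight": "recipes kitchen chef food cooking",
    "Space Hour": "astronomy planets stars space science",
}
print(recommend("documentary about planets and stars", catalogue))  # Space Hour
```

A production system layers statistical classification and supervised fine-tuning (as the project's UI allowed) on top of this basic content-similarity idea.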

Technologies:

Java, Python, wxWidgets

Project size:

This was several months' worth of work for two developers.