The value of data

(To George Tosi)
16/02/22

It is well known that the Internet holds a vast amount of data of different types and from different sources. These data are generally available for consultation through browsers, which let us interact with the network in purely textual ways (reading an article, for example) or in multimedia ones (audio, video and streaming).

It would therefore seem that all these data are available to anyone and can be used to extract information that helps guide our everyday choices.

In reality, things are not so transparent or simple. The ways in which these data are made available are designed for the interactivity that characterizes the human-machine relationship. Difficulties arise when you want to reuse the data and process them in your own way to extract information of specific interest to you.

To clarify, consider the share prices of any market (Borsa Italiana, NYSE, Nasdaq, ...). A multitude of sites let you analyze the performance of a share, identified by its ticker, and provide a whole series of related data, such as the price, the volumes traded, the daily highs and lows, and so on. The interface to these data is, as stated, the typical one for human-machine interaction: the browser.

If I want to process these data locally to extract the information I am interested in, things get significantly more complicated. To access the same data programmatically, two methods are available: web scraping, or access through a dedicated data interface such as a REST (Representational State Transfer) or SOAP (Simple Object Access Protocol) API, where REST is by far the most widespread solution.

Let's briefly analyze the two alternatives. Web scraping emulates human behavior: the web page of interest is downloaded to the client and then searched for the data of interest by means of a known recognition pattern, for example particular HTML tags that identify the data within the HTML page.

Although libraries are available in various programming languages (JavaScript, Python, Java, ...) that make it easier to locate and recognize specific data, the procedure is not straightforward and is prone to errors. Furthermore, if the source HTML page changes, the script may stop working because its recognition pattern no longer matches (for example, the tag that identifies the data has changed). This solution, although possible, can therefore be problematic because of its implementation complexity and its strict dependence on the structure of the HTML page.
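To make the idea concrete, here is a minimal sketch of scraping in Python, using only the standard library. The HTML fragment, the company name and the `id` values (`last-price`, `volume`) are invented for illustration; a real site will use its own structure, and that structure is exactly what the script depends on.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: tags, ids and values are assumptions,
# not taken from any real financial site.
SAMPLE_HTML = """
<html><body>
  <h1>ACME S.p.A. (ACME)</h1>
  <span id="last-price">12.34</span>
  <span id="volume">1500</span>
</body></html>
"""


class FieldExtractor(HTMLParser):
    """Collect the text of the first element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self._inside = False
        self.value = None

    def handle_starttag(self, tag, attrs):
        # The "recognition pattern": an element carrying the expected id.
        if dict(attrs).get("id") == self.target_id:
            self._inside = True

    def handle_data(self, data):
        if self._inside and self.value is None:
            self.value = data.strip()

    def handle_endtag(self, tag):
        self._inside = False


def extract_field(html, target_id):
    """Return the text of the element with the given id, or None if absent."""
    parser = FieldExtractor(target_id)
    parser.feed(html)
    return parser.value


price = extract_field(SAMPLE_HTML, "last-price")

# If the site renames the id, the same script silently finds nothing.
changed_page = SAMPLE_HTML.replace("last-price", "lastPrice")
price_after_change = extract_field(changed_page, "last-price")
```

In this sketch `price` holds the text of the matching element, while `price_after_change` is `None`: a small change in the page is enough to break the extraction.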

The solution based on a programmatic interface, a REST (or SOAP) API, is considerably more robust and easier to implement.

The task of these interfaces is to standardize the methods of accessing data. In the case of REST interfaces, HTTP is used, and the requested data are read through GET requests on specific URLs constructed so as to uniquely identify the data of interest.

As an example, here is the URL for accessing the REST interface of the currency conversion service provided by the Bank of Italy.

https://tassidicambio.bancaditalia.it/terzevalute-wf-web/rest/v1.0/dailyRates?referenceDate={data}&baseCurrencyIsoCode={ffrom}&currencyIsoCode={tto}&lang={"it"}

The terms in curly braces define the currencies involved and the valuation date of the conversion rate. A GET operation on that URL then returns the requested value. This operation can be easily implemented in any programming language (Python, JavaScript, C#, ...) through the use of appropriate libraries.
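As a rough illustration, the GET described above could look like this in Python, using only the standard library. The parameter names come from the URL in the text (`currencyIsoCode` being the parameter for the target currency). The `YYYY-MM-DD` date format, the `Accept` header and the function names are assumptions of this sketch, and since the response format is not described here, the function simply returns the raw response body.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the URL quoted in the text.
BASE_URL = (
    "https://tassidicambio.bancaditalia.it"
    "/terzevalute-wf-web/rest/v1.0/dailyRates"
)


def build_daily_rate_url(reference_date, base_currency, currency, lang="it"):
    """Build the query URL; the date format (YYYY-MM-DD) is an assumption."""
    params = {
        "referenceDate": reference_date,
        "baseCurrencyIsoCode": base_currency,
        "currencyIsoCode": currency,
        "lang": lang,
    }
    return BASE_URL + "?" + urlencode(params)


def fetch_daily_rate(reference_date, base_currency, currency):
    """Perform the GET and return the raw response body as text."""
    url = build_daily_rate_url(reference_date, base_currency, currency)
    # Asking for JSON is an assumption; the service may use other defaults.
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


example_url = build_daily_rate_url("2022-02-16", "EUR", "USD")
```

Calling `fetch_daily_rate("2022-02-16", "EUR", "USD")` would then issue the actual request. Unlike the scraping approach, nothing here depends on the layout of a web page, only on the published parameter names.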

Although there are many free services based on this type of interface, it is interesting to note that many sites offer financial information for a fee. These are usually particularly detailed data that give a complete overview of markets and companies around the world. I would like to underline that many of these sites, in particular those covering economic and financial topics, offer full access only with some form of subscription.

Basically, for this kind of data, access through a programmatic interface is subject to payment. This reinforces a basic concept that is often overlooked: on the Internet, the real value is in the data. If the aggregated and usable, but still public, data of listed companies have a specific value and are sold by subscription, we can imagine the value of the personal data that, more or less knowingly, we have handed over to a multitude of companies that run e-commerce sites or social networks.

References

One of the many sites that offer subscription-based financial information: https://site.financialmodelingprep.com/developer/docs/pricing

Further reading on REST interfaces: https://www.ibm.com/cloud/learn/rest-apis