Dr Jag Kundi, a Hong Kong–based scholar-practitioner active in the FinTech space, launches a series of articles in CSj exploring the impact of the emerging technologies of the digital era on governance and ethics.
In the 21st century, data is an essential resource that powers the information economy, in much the same way that oil fuelled the industrial economy in the 18th century. ‘Data is the new oil’, as The Economist put it in a recent and quite influential article. If this is the case, it raises a whole range of complexities and opportunities for organisations. As with oil, how do we govern and manage this valuable resource, prevent leaks and spills, and use it to create and enhance stakeholder value? This three-part series examines the complexities of managing and governing data. This first article looks at governance and big data. The series will then turn to the way data is being decentralised on a blockchain, and finally to the role and impact of ethics on artificial intelligence (AI) and machine learning. This may all be very new and cutting-edge, but it has profound implications for society.
Big data and governance
As businesses seek to benefit from the current digital transformation taking place across sectors, new terms are being bandied around such as FinTech, RegTech, InsurTech, PropTech and HealthTech, to name a few. Whatever the nomenclature used, capturing, classifying and analysing big data is at the heart of these new approaches. Senior managers are realising that a successful transition to becoming data-driven can only be achieved with quality data and that requires a high level of data governance.
What is big data?
To understand data we need to start with some basic questions – what does ‘big data’ mean and how big is big data?
To address these questions, we need to look at the number of current internet users, as it is by their online activity that big data is being created. In 2018 for example, more than one million users came online for the first time each day (see Figure 1: World internet usage and population statistics, and Figure 2: Digital around the world in 2019).
Gartner’s definition of big data circa 2001, which is still widely used, focused on three Vs – data is arriving in increasing volumes, with ever-higher velocity and containing ever greater variety. This means that big data is getting larger, more complex and arising from new data sources such as the ‘internet of things’ (IoT). These data sets are so voluminous that traditional data processing software just can’t manage them, but these massive volumes of data can be used to address business problems that previously firms would not have been able to handle.
IBM added two further Vs – veracity (implying trust in the data) and value (via superior data analytics) – to characterise big data.
To further understand the sheer complexity involved, consider that data doesn’t sleep. An infographic provided by Domo in 2018 (www.domo.com/learn/data-never-sleeps-5?aid=ogsm072517_1&sf100871281=1) highlights how much data is generated every minute. For example, Google conducts 3.8 million searches, YouTube users watch 4.3 million videos and Snapchat users share 2 million snaps every minute.
At our current pace there are 2.5 quintillion bytes (2,500,000,000,000,000,000 bytes) of data produced every day (see Figure 3: Data measurement scale), but that pace is only accelerating with the growth of IoT. As at the end of 2018, Statista estimates there were 23.14 billion IoT devices connected to the internet and forecasts that this will roughly double by 2022 (www.statista.com/statistics/471264/iot-number-of-connected-devices-worldwide).
To put this volume of data in context, it is worth pausing to consider the present data measurement scale.
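As a rough illustration of that scale, the short Python sketch below converts the daily figure quoted above into named units. It assumes the decimal (SI) convention, in which each unit is 1,000 times the previous one; Figure 3 may follow a different convention, and the function name humanise is purely illustrative.

```python
# Decimal (SI) byte units; each step up is a factor of 1,000 (an assumption).
UNITS = ["bytes", "kilobytes", "megabytes", "gigabytes", "terabytes",
         "petabytes", "exabytes", "zettabytes", "yottabytes"]


def humanise(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value at 1 or more."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1,000,
        # or at the largest unit we know about.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


DAILY_BYTES = 2_500_000_000_000_000_000  # 2.5 quintillion bytes, as quoted above

print(humanise(DAILY_BYTES))  # 2.5 exabytes
```

On this convention, the daily figure of 2.5 quintillion bytes is 2.5 exabytes, or 2.5 billion gigabytes.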
One last thought on the amount of data created is that 90% of all data in the world has been generated over the last two years!
The governance challenge
For companies, big data poses massive challenges around the capture of data, data storage, data analysis, data search, data sharing, data transfer, data visualisation, data querying, data updating, data privacy and data sources. Governance of such huge amounts of data has become a paramount concern. Certain industries, such as healthcare and banking, are more at risk than others, and the effort and expense devoted to data governance should be proportionate to the level of risk.
Over time, due to normal business activity, diversity, growth, product expansion, legacy systems and M&A, several different types of data can be introduced to an organisation. Furthermore, this data is typically stored in various platforms and databases, including networked storage, individual hard drives, flash drives and in the cloud. The lack of a unified data management policy and standards raises a number of critical questions.
- What types of data are available within these databases?
- Where is the data stored?
- Who is the ultimate owner of the data?
- Is data being merged with other data sets before being used in reports?
- Can we respond promptly and with confidence to any data requests from regulators?
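One way to make these questions answerable is to keep a simple inventory of data assets. The Python sketch below is a minimal illustration only: the DataAsset record, its fields and the sample entries are hypothetical and are not drawn from any particular organisation or tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataAsset:
    """One entry in a hypothetical data inventory."""
    name: str
    data_type: str                # what type of data is it?
    location: str                 # where is it stored?
    owner: Optional[str] = None   # who is the ultimate owner, if known?
    merged_from: List[str] = field(default_factory=list)  # upstream data sets


def without_owner(assets: List[DataAsset]) -> List[str]:
    """Names of assets for which no owner has been recorded."""
    return [a.name for a in assets if a.owner is None]


def locations_of(assets: List[DataAsset], data_type: str) -> Dict[str, str]:
    """Where each asset of a given type is stored, e.g. for a regulator's request."""
    return {a.name: a.location for a in assets if a.data_type == data_type}


# Illustrative entries only.
inventory = [
    DataAsset("crm_contacts", "customer", "cloud", owner="Sales"),
    DataAsset("legacy_accounts", "customer", "networked storage"),
    DataAsset("monthly_report", "financial", "individual hard drive",
              owner="Finance", merged_from=["crm_contacts", "legacy_accounts"]),
]

print(without_owner(inventory))             # ['legacy_accounts']
print(locations_of(inventory, "customer"))
```

Even a record this simple shows which data sets have no recorded owner and which reports are built from merged sources, the gaps a unified policy would be expected to close.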
As the volume, variety, veracity and velocity of available data continue to grow at the rates indicated, businesses face two urgent challenges: how to identify actionable insights within this data (data mining and data analytics) and how to protect it. Both of these challenges depend on a high level of data management and data protection, in short on data governance. We can think of data governance as a combination of both the IT and the business aspects of a firm.
Data governance is linked to a certain extent to IT governance since data management is seen as a discipline of IT management. Both concepts are considered as being part of a company’s corporate governance. Organisational issues that are not within the scope of IT management are part of data quality management. Therefore, data governance defines all necessary decision rights, accountabilities, standards, rules and policies for subsequent data management.
Because of this bifurcation of governance and management in relation to big data, a new approach to data governance is needed for several reasons. Firstly, big data comes in various formats, including structured, unstructured and semi-structured. In addition, the sources of data may not be under the control of the teams that manage it.
Data governance is the formal management of data assets within an organisation. It covers areas such as data stewardship, data quality, data dictionaries, and others to help companies understand and control their data assets and focus on the proper management of data. It can also cover data security and privacy, integrity, usability, integration, compliance, availability, roles and responsibilities, and overall management of internal and external data flows.
Without good data governance, organisations may struggle to share and use their data effectively to generate business value, as they may not have a clear view of their customers’ needs. This could result in lost revenue opportunities and greater business risk. They could also be left vulnerable to breaching regulatory requirements.
The drivers of data governance are usually regulatory and legal requirements; however, a governance rule can be any practice to which the company wishes to adhere. Governance can dictate where certain types of data may be stored and codify data protection methods, such as encryption or password strength. It can also dictate how data is backed up, who has access to it, and when archival data is old enough that it no longer needs to be kept and can be destroyed. Organisations can also set governance objectives around improving data quality or breaking down silos that isolate certain data. In this context, data governance primarily refers to the strategy of managing and controlling data.
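Rules of this kind can be written down precisely enough for software to check them. The Python sketch below is one hypothetical way of doing so; the GovernanceRule fields, the seven-year retention period and the sample values are assumptions made for illustration, not requirements taken from any regulation.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import FrozenSet


@dataclass(frozen=True)
class GovernanceRule:
    """A hypothetical governance rule for one type of data."""
    data_type: str
    allowed_locations: FrozenSet[str]  # where this data may be stored
    allowed_roles: FrozenSet[str]      # who has access to it
    encrypt_at_rest: bool              # whether encryption is required
    retention_days: int                # how long it must be kept


def may_store(rule: GovernanceRule, location: str) -> bool:
    """True if the rule permits storing this data at the given location."""
    return location in rule.allowed_locations


def may_access(rule: GovernanceRule, role: str) -> bool:
    """True if the given role is allowed to access this data."""
    return role in rule.allowed_roles


def can_destroy(rule: GovernanceRule, created: date, today: date) -> bool:
    """True once the data is older than its retention period."""
    return today - created > timedelta(days=rule.retention_days)


# Illustrative rule: payment data kept for an assumed seven years.
payments = GovernanceRule(
    data_type="payment",
    allowed_locations=frozenset({"encrypted cloud"}),
    allowed_roles=frozenset({"finance", "audit"}),
    encrypt_at_rest=True,
    retention_days=7 * 365,
)

print(may_store(payments, "flash drive"))                           # False
print(may_access(payments, "audit"))                                # True
print(can_destroy(payments, date(2012, 1, 1), date(2019, 8, 1)))    # True
```

Expressing rules as data in this way keeps the policy in one place, so a change to a retention period or an access list does not require changes to the checks themselves.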
Implementing data governance requires establishing rules and policies within organisations from a high strategic level to a detailed operational and process level. A data governance policy can help organisations improve their overall performance as well as reduce risk. Such a policy should address the following concerns.
- What governance mechanisms are there for implementing data governance, for example the roles, responsibilities and committees needed internally and externally?
- How will the integrity of data be kept, for example how is data stored and maintained and how is its trust value verified?
- Who is the owner of the data and how can its accuracy and suitability for decision-making be ensured?
- Who has access to the data, and how can a usage permissions system be set up to allow users to access data for analysis and reporting?
- How can the organisation ensure that the data is compliant with all current regulations (for example the EU’s General Data Protection Regulation), and is secure and private?
On the last of these points, witness the recent fine for British Airways (BA) and its parent International Airlines Group (IAG) amounting to US$230 million, in connection with a data breach that took place in 2018 and affected some 500,000 customers browsing and booking tickets online. In its investigation, the UK regulator, the Information Commissioner’s Office (ICO), found ‘that a variety of information was compromised by poor security arrangements at [BA], including log in, payment card, and travel booking details, as well as name and address information’. Closer to home here in Hong Kong, Cathay Pacific also had a data breach in 2018 affecting 9.4 million customers and investigations are still in progress. In the current era of tighter rules on how companies manage personal data, it will be interesting to see the magnitude of the fine Cathay Pacific eventually faces for this breach.
One way to prevent such lapses is for organisations to adopt a Data Governance Organisation Structure that clearly details the roles of business teams in data governance (see Figure 4: A typical Data Governance Organisation Structure).
The governance imperative
Because information is at the centre of organisations, so too is data governance. This article suggests a framework that organisations can adapt and adopt for data governance.
Regulatory requirements around data privacy, personal information, data security, data provenance and historical data are now more demanding, which makes data governance a top priority for organisations. Witness the introduction of new job titles such as Chief Information Officer, Chief Data Officer, Big Data Architect, Data Scientist and Data Governance Manager. These factors emphasise the higher priority and value attached to data and its management within organisations. In the abstract, any corporate process can be thought of as a series of decisions. Without good information to make those decisions, organisations are sailing into a murky future rather than steering to a bright blue ocean.
Dr Jag Kundi
The next article in this series will explore the impact of blockchain on governance.
Dr Jag Kundi is a Hong Kong–based scholar-practitioner active in the FinTech space. He acts as a board adviser, mentor and investor to high-growth businesses and start-ups covering digital currencies, tokenisation and international contactless payment solutions. As a scholar he has developed and taught academic and professional programmes for HKU SPACE covering FinTech and big data and governance, as well as advised other local institutions on this subject matter for undergrad and postgrad programmes. He can be contacted by email: firstname.lastname@example.org, or via LinkedIn: www.linkedin.com/in/jagkundi.