New technological tools, including machine learning, can increase the efficiency of cross-border investigations into malpractice, but Colum Bancroft, Managing Director, and Edward Boyle, Senior Vice-President, AlixPartners, highlight the need for appropriate data governance controls to be in place to ensure regulatory compliance.

An internal investigation can be viewed as an exercise in data management. Those charged with managing the investigation need to determine the data in scope, where it resides and what the contents of the data can reveal about past conduct. Following the increase in local and extraterritorial international data privacy legislation, additional attention is being given to conducting investigations with the right safeguards in place. At the same time, as with any data-intensive exercise, there are significant benefits and efficiencies to be gained from the use of technology. However, these tools are most effective when different types and sources of data are aggregated, which can only be considered once the data risks have been thoroughly assessed. Throughout the investigation, from planning through to execution and reporting, the first priority should be to ensure that the data is handled in accordance with the law.

Cross-border data challenges

Data collected from different jurisdictions during the course of an investigation is often required to be strictly segregated based on local data laws, and, in the case of the PRC, the Law of the PRC on Guarding State Secrets. Similarly, in Japan, following amendments to the Act on the Protection of Personal Information, the Personal Information Protection Commission (PIPC) was established. According to PIPC regulations, entities must obtain the consent of the data subject prior to disclosure to overseas third parties. These examples highlight the minefield of data privacy regulations, which those charged with managing investigations must keep front of mind throughout the course of an investigation.

Where data can be aggregated from multiple locations, the efficiency of the investigation is increased. For practical reasons, it is helpful for investigating professionals to be able to review and share data between locations. Also, when conducting data analytics–based exercises, either as a tool for red flag detection or in the course of an investigation, the analysis is more valuable when the data sits in one database and can be reviewed in the context of all the relevant data. For example, an analytical exercise reviewing expense reports by individual employees would ideally be able to identify benchmarks and outliers between employees across the entire business operations.
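The expense report example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the employee identifiers and totals are made up, and the robust z-score (based on the median and the median absolute deviation) with a cut-off of 3.5 is one common way to set a benchmark, not a method prescribed by the authors. The point it shows is that the benchmark is only meaningful when every employee's data sits in the same data set.

```python
from statistics import median


def flag_expense_outliers(totals, threshold=3.5):
    """Return employees whose expense total is an outlier against all peers.

    totals: dict mapping a (hypothetical) employee id to total claimed expenses.
    threshold: assumed cut-off for the robust z-score; tune per engagement.
    """
    values = list(totals.values())
    benchmark = median(values)  # business-wide benchmark
    mad = median(abs(v - benchmark) for v in values)  # median absolute deviation
    if mad == 0:
        return {}  # no spread in the data, nothing can be ranked
    flagged = {}
    for employee, total in totals.items():
        score = 0.6745 * (total - benchmark) / mad  # robust z-score
        if abs(score) > threshold:
            flagged[employee] = round(score, 2)
    return flagged


# Illustrative figures only: five employees pooled from across the business.
monthly_totals = {
    "emp_a": 1200.0,
    "emp_b": 1350.0,
    "emp_c": 1100.0,
    "emp_d": 1280.0,
    "emp_e": 9800.0,
}
flagged = flag_expense_outliers(monthly_totals)
print(flagged)
```

If the same five employees were split across segregated, in-country databases, each subset would produce its own benchmark and an outlier in one location might not stand out at all.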

Strict data requirements necessitate a great deal of planning and preparation on the part of those managing investigations. While it is often necessary for all data collected during the eDiscovery process to be hosted in-country, this does not solve the issue of dealing with working papers and other documents obtained by forensic accountants and other investigating professionals. In most professional firms, in the ordinary course of business the preference is to store working files on cloud-based systems.

This is rarely appropriate for a cross-border investigation as it limits the control that can be placed on accessing and transmitting data. Another consideration is the location of email servers and backup policy of the investigating firm. Data may be effectively leaving the jurisdiction unbeknown to the individual user by virtue of server locations and routine backups. These considerations must be balanced with the risks associated with data loss if backups are disabled. When hiring consultants to assist with internal investigations, it is essential to consider the location and type of data sources that are in scope and how the flow of data will be controlled throughout the course of the investigation and stored or deleted on completion.

Where regulators and other stakeholders are concerned, there is an expectation that they will be kept apprised of developments as the investigation progresses and, of course, receive a final report. The impact of the report, particularly when it involves personal data and international regulators, may be lessened when findings can only be reported as part of aggregated data or in an anonymised format. For a corruption investigation, the findings will be at the level of individual transactions: who was paid, on what date, and who authorised the payment.

This is typically less of an issue for financial statement fraud cases, where the focus of the investigation is usually to establish the real underlying financial position of the entity under investigation. However, the conduct of individuals will still be a focus of regulators in all cases. In many jurisdictions this information cannot be reported outside the relevant jurisdiction, so careful determinations must be made about how findings are reported, based on appropriate legal advice as necessary. Technology tools can greatly assist the progress and efficiency of an investigation, but the first consideration when deploying them is ensuring that they are set up and operated in a controlled environment that is compliant with local data privacy laws. The investigation leaders should be prepared to explain to the relevant authorities what measures have been put in place to comply with local laws. In these situations, as with all compliance matters, planning and documentation of the controls in place are key.

Technology assisted review

Technology assisted review (TAR) – using machine learning as part of the document review process – has been accepted by courts in the US for some time. The take-up in Asia-Pacific countries has been less widespread, particularly for internal investigations. This is a result of a mixture of factors: some unique to Asia and others relating to the technology generally.

The latest TAR software has a number of different functions that can aid the review process. Typically, the process involves taking a set of reviewed seed documents, from which the software identifies common factors and applies predictive coding to the remaining review population. This is then refined and validated through an iterative process until the software determines that the remaining documents do not need to be reviewed or, technically speaking, that the probability of any unreviewed document (that is, one not yet seen by a human) being relevant falls within predetermined statistical parameters.

One conceptual issue that commonly occurs in investigations in Asia is multiple languages in the same review population. In these cases, the data set needs to be categorised by language and the machine learning can only be applied to each category on a siloed basis. This can create issues where a custodian may discuss the same issue in different languages across different email threads. The software will not be able to make the link between one email in, say, Japanese and a related email in English. One way to resolve this, once the issues are well known, is to use targeted search terms as a quality assurance exercise. Any issue that is identified in one language can be searched for using corresponding search terms in any other language used by the relevant custodian.
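The cross-language quality assurance check described above can be sketched as follows. Everything here is hypothetical: the document fields (`id`, `custodian`, `lang`, `text`), the issue label and the English and Japanese terms are illustrative assumptions, and plain substring matching stands in for whatever search engine the review platform provides.

```python
# Hypothetical term map: one known issue, expressed in each language in scope.
ISSUE_TERMS = {
    "consulting_fee": {
        "en": ["consulting fee"],
        "ja": ["コンサルティング料"],
    },
}


def cross_language_hits(documents, custodian, issue, found_in_lang, term_map):
    """Return ids of the custodian's documents that mention the issue
    in any language other than the one where it was first identified."""
    terms_by_lang = term_map[issue]
    hits = []
    for doc in documents:
        if doc["custodian"] != custodian:
            continue  # only the relevant custodian is searched
        lang = doc["lang"]
        if lang == found_in_lang or lang not in terms_by_lang:
            continue  # same silo already covered, or no terms for this language
        text = doc["text"].casefold()
        if any(term.casefold() in text for term in terms_by_lang[lang]):
            hits.append(doc["id"])
    return hits


# Made-up review population for illustration.
docs = [
    {"id": 1, "custodian": "tanaka", "lang": "en", "text": "Please book it as a consulting fee."},
    {"id": 2, "custodian": "tanaka", "lang": "ja", "text": "コンサルティング料として処理してください。"},
    {"id": 3, "custodian": "tanaka", "lang": "ja", "text": "来週の会議の件です。"},
    {"id": 4, "custodian": "smith", "lang": "ja", "text": "コンサルティング料の請求書"},
]
ja_hits = cross_language_hits(docs, "tanaka", "consulting_fee", "en", ISSUE_TERMS)
print(ja_hits)
```

In practice the translated terms would need to be agreed with reviewers fluent in each language, since a literal translation may not match how the custodian actually describes the issue.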

Hybrid approach

Another concern around this technology is the perception that it is a black box. Investigators who are not familiar with the technology can be reluctant to move away from tried and trusted methodologies. The technology used for traditional linear review has been in use for some time and is widely understood. A set of search terms can be agreed at the outset based on known issues and a review population is identified. From that point the progress of the review is relatively predictable. The review plan is straightforward and easy to communicate to stakeholders, including regulators.

Because of the challenges outlined above, a hybrid approach can be an effective way to defensibly accelerate the progress of an investigation. Firstly, the TAR software can be used as part of an early case assessment. The data visualisation functions quickly help investigators get an overall understanding of the data set and identify if there are any gaps in the data. For the review phase, the search terms can then be applied to the review population as in a linear review. TAR is then used not to predictively code, but to prioritise the review based on the results of an initial reviewed seed set of documents. The advantage of this approach is that the machine learning will help to identify potentially relevant documents and push them up the review queue, meaning early identification of key documents. Compared with a linear review there is no downside, as the prioritisation can be managed at minimal incremental cost and is likely to lead to efficiency savings overall. This is particularly helpful when there are parallel workstreams such as witness interviews and analysis of structured data. Early identification can allow the investigation to quickly home in on the key issues.
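A minimal sketch of the prioritisation step is shown below. It is not how any particular TAR product works: a small naive Bayes style word-weighting model, written with the Python standard library, stands in for the vendor's machine learning, and the seed documents and queue are invented. What it illustrates is the key property of the hybrid approach: every document in the search-term population stays in the queue, and only the order of review changes.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case word tokens; a stand-in for real text processing."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(seed_set):
    """Learn per-word log-odds of relevance from reviewed seed documents.

    seed_set: list of (text, is_relevant) pairs coded by human reviewers.
    """
    counts = {True: Counter(), False: Counter()}
    for text, is_relevant in seed_set:
        counts[is_relevant].update(tokens(text))
    vocab = set(counts[True]) | set(counts[False])
    # Laplace smoothing so unseen words do not produce log(0).
    totals = {label: sum(c.values()) + len(vocab) for label, c in counts.items()}
    return {
        tok: math.log((counts[True][tok] + 1) / totals[True])
        - math.log((counts[False][tok] + 1) / totals[False])
        for tok in vocab
    }


def prioritise(queue, weights):
    """Reorder the review queue, most likely relevant first. Nothing is removed."""
    def score(doc):
        return sum(weights.get(tok, 0.0) for tok in tokens(doc["text"]))
    return sorted(queue, key=score, reverse=True)


# Invented seed set and review queue for illustration.
seed = [
    ("wire the commission to the agent offshore account", True),
    ("agent commission approved off invoice", True),
    ("team lunch on friday", False),
    ("quarterly budget meeting agenda", False),
]
queue = [
    {"id": "d1", "text": "agenda for friday team meeting"},
    {"id": "d2", "text": "commission for the agent paid offshore"},
    {"id": "d3", "text": "holiday schedule"},
]
ordered = prioritise(queue, train(seed))
print([doc["id"] for doc in ordered])
```

As reviewers code more documents, the model can be retrained on the enlarged seed set and the remaining queue reordered, which mirrors the iterative refinement described for TAR generally.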

Combining insights from multiple sources of data

One of the most time-consuming, and therefore expensive, aspects of an investigation is identifying and analysing links between different data sets, particularly between unstructured data (such as emails and chat messages) and structured data (usually transaction data). An email might refer to the payment of an invoice, and the investigation then has to identify the payment in the structured data in a different system (or systems) by reference to the date or the invoice number. This can be especially time-consuming in investigations where the list of suspect transactions could be voluminous, such as anti-money laundering, corruption or accounting fraud investigations.

New tools are now available that can not only house structured and unstructured data in the same review platform, but also automatically make links between the two data sets. In practice, this means a reviewer can look at the contents of an email discussing a transaction and the actual associated transaction details with a few clicks. This can help to quickly validate findings, as well as root out false positives – for example, filtering out emails that on first review might appear to contain issues but are actually benign. As noted above, in many cases data sets from different jurisdictions cannot be reviewed as a whole. While this places some limits on the efficiencies that are available from using various technology tools, the benefits of using these tools outweigh the costs of implementation, even for relatively small data sets. The increasing complexity and sophistication of the issues faced by forensic investigators mean that investigators must equip themselves with the best available tools to uncover the issues in an efficient and cost-effective manner.
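The linking of an email to its associated transaction by invoice number can be illustrated with a short sketch. The invoice format (`INV-` followed by digits), the field names and the sample records are all assumptions made for the example; a real review platform would use its own matching logic and would typically also match on dates and amounts.

```python
import re
from collections import defaultdict

# Assumed invoice reference format, e.g. "INV-20431"; adjust to the client's scheme.
INVOICE_PATTERN = re.compile(r"\bINV-\d{4,}\b", re.IGNORECASE)


def link_emails_to_transactions(emails, transactions):
    """Map each email id to the transactions whose invoice number it mentions."""
    by_invoice = defaultdict(list)
    for txn in transactions:
        by_invoice[txn["invoice_no"].upper()].append(txn)

    links = {}
    for email in emails:
        refs = {ref.upper() for ref in INVOICE_PATTERN.findall(email["body"])}
        links[email["id"]] = [
            txn for ref in sorted(refs) for txn in by_invoice.get(ref, [])
        ]
    return links


# Invented unstructured (email) and structured (payment) records.
emails = [
    {"id": "e1", "body": "Please settle inv-20431 before month end."},
    {"id": "e2", "body": "Lunch on Friday?"},
]
transactions = [
    {"invoice_no": "INV-20431", "amount": 48000.0, "paid_on": "2018-03-29", "payee": "Vendor A"},
    {"invoice_no": "INV-20500", "amount": 1250.0, "paid_on": "2018-04-02", "payee": "Vendor B"},
]
links = link_emails_to_transactions(emails, transactions)
print(links["e1"])
```

An email that links to no transaction, or a suspect transaction with no related email, can itself be a useful lead for follow-up.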

Capturing communications

The evolution of channels of communication and blurring of the lines between business and personal communications means relevant data can sit on multiple devices with multiple applications on each device. Capturing, processing and hosting email and other electronic data has been standard practice for a number of years, but is no longer sufficient. The use of messaging applications for business as well as social interaction is now commonplace. The annual WeChat report issued by Tencent’s research division reported that 83% of surveyed respondents use WeChat for work, with a reported 963 million active accounts, all of which equates to a lot of business conversations happening off email. Crucially, these work-related conversations occur regardless of whether the device is issued by the company or is owned by the employee. While there are means of capturing these conversations from backups to laptops, this depends on the settings applied by the user, so cannot be guaranteed.

Conversations on messaging applications can be extremely valuable evidence precisely because bad actors now know very well that their corporate emails can be easily accessed and reviewed. In most cases, custodians tend to be less cautious when communicating over messaging applications. Whether the device is owned by the employee or company-issued (with appropriate data-ownership policies), such communications can be key to an investigation.


Managing increasingly complex sources of data and overall data volumes is creating a number of challenges for those charged with managing investigations. Investigating teams need to be aware of, and able to make best use of, the technological tools available to manage and gain insight from very large and disparate sets of data. Along with the changes in data, the evolving regulatory landscape in respect of data privacy in both Asia-Pacific regulations and extraterritorial international regulations requires investigations to be managed in a way that is compliant with relevant laws and ensures the collection, transfer, and reporting of data is carried out in a controlled environment.

Colum Bancroft, Managing Director, and Edward Boyle, Senior Vice-President