Tech – Page 10 – Juan's adventures in security & compliance

Day 1

Part 1

I solved part 1 with Excel. Despite all its shortcomings, Excel remains an excellent tool to visualise and manipulate data. The logic was as follows: if any 2 unique numbers add up to 2020, then I cannot have any numbers with a sum greater than 2020. First step was to sort the data and find the smallest number, which for me was 60. Now for any number k in the list, if a different number from the list, say b, sums to 2020, then 2020-k-b would be equal to 0. The check is therefore to add an additional column where the values are given by 2020-k-b. When k+b is greater than 2020, the check returns a negative number. By deleting all entries with negative values it is possible to reduce the list and I found that 60 did not give any solutions. The next number in the list was 107. 107 was sadly also not a part of the solution, but did reduce the amount of testing data significantly. Continuing with this method until eventually the check column provided a 0 and I had the 2 numbers required.

Part 2

Part 2 was also solvable using Excel, but not in 5 minutes as was the case for part 1. Part 2 required finding 3 numbers in the list whose sum was 2020. First instinct was to have a symmetrical matrix, setting the values within to be 2020-ColumnHeader-RowHeader and then using conditional formatting such that if a number was a part of the original date it would get a green background.

Instead I went with Python for the solution. After sorting the data, I was able to use the below to get the solution.

for i in range(len(data)-3): n1 = data[i] for j in range(i+1, len(data)-2): n2 = data[j] if (n1 + n2 < 2020): for k in range(len(data)-1, j, -1): n3 = data[k] if n1 + n2 + n3 == 2020: print(n1*n2*n3) exit()

This takes advantage of the fact that at any point, i < j < k slightly reducing the calculation time although not the time complexity, which remains at O(n³). A colleague managed to reduce it to O(n²) by making a smart use of sets.

Day 2

Day 2 was all about data parsing. Nothing better for data parsing than ~~headache inducing~~ regular expressions. The best regex engine is from perl. Most other languages invoke ‘perl-equivalent’ regex, but the syntax is never quite as simple. After looking at the problem again, I could have solved it in a much simpler regex than I did, but then, Advent of Code is about getting a solution, rather than the solution.

$data =~ /(\d)-(\d)*([a-z])\:\s([az]+)/; my $min = $1; my $max = $2; my $check = $3; my $pw = $4;

Not sure if that could be combined into a single line by making the initialisation a part of the regex check. But then, that’s why people say perl is unreadable so back to good, human readable code. Instead I did several substitution expressions at each line to transform it into a CSV line, split it into an array and then used variables such that I wouldn’t need to be referencing array indices in the rest of the code. In order to solve the first part, I removed all characters which were not a part of the password policy and then verified that the remaining string had a length between the min & max character occurences.

The second part involved checking a different policy, verifying if the password had the given letter at one of the 2 positions provided, but not both. I wasn’t able to get perl to treat the passwords as an array of characters as I remember doing in C eons ago. This meant splitting the pw into an array. Because the elves don’t recognise that arrays start at 0 there is a large risk of OBOE errors => reduce both min & max variables by 1, and then through the use of xor verify if only 1 of the 2 conditions works. Could it have been done with regex? Yes. Was using the array index much easier to implement, and later on, read? I certainly hope so.

Interconnectivity in the UK

Following my last post on privacy and the NHS apps in the UK, I was informed the different services in circulation in the UK were talking to each other. The main change for users was that they no longer needed to download different apps when moving accross the UK, being able to use their local app in Scotland, England, Wales, Northern Ireland or Gibraltar. While it certainly provides a better user experience, the backend implementation can be a nightmare for privacy conscious people.

The simplest implementation would be to move from local servers to a centralised location, and have all apps communicate with a single entity. Such a solution would provide the advantage of having an increased manpower which can focus on securing and protecting the infrastructure. On the other hand, with no geolocation data, such a system increases the amount of data devices need to check against drastically, specially with the rising number of cases in the UK at the moment.

Another possibility would be to let all apps communicate with all servers with distributed connection times. Given the data is distributed across devices there is little concern for data misuse, access to the data on the device would need to be as restricted as possible, given positive matches would allow for fingerprinting when an user was in which region.

Finally, a third solution would be a handshake protocol between devices. I remember a colleague who also worked IT in hospitality mentioning the physical access control system he used meant physical access only needed to be used on a single door, with the information being spread “like a virus” as employees and clients alike would load updated access control from locks and distribute it to any locks they opened. Now, imagine a handshake between 2 Bluetooth devices (A&B) before they start sending out their pings:

A: “I’m SCO” – PING
B: “I’m ENG” – PING
A: “Here are some cases from SCO” – PING
B: “Here are some cases from ENG” – PING
…

There are 2 possible implementations for this I can see, each with their drawbacks.
1. Non-local data is distributed only in a P2P fashion, with local devices sharing it locally. That is the local device would then share the new information with other local devices and update its own datapoints. This means the people most at risk from contact with folks travelling the country would be sharing this information with the people they expose (family, coworkers) without the need for a central repository. Additionaly, people with limited contact/exposure to travelers would not be checking oudated datapoints themselves.

2. Local devices update the local servers with the non-local positive cases daily, whereby all local residents can check their data. This of course leads to increased processing on the devices, and essentially the same situation as each app connecting to each server.

What concerns would a P2P infrastructure imply? Well, for starters, when a traveller is tested positive, the people they were in contact with while abroad would not be informed until at least 24 if not 48h after the infected person’s data is made public, at which point some or most of the data points may have been deleted from old age. In terms of privacy, a P2P protocol would be easier to reverse engineer and exfiltrate data / poison existing data due to the limited authentication required. However, a P2P infrastructure would make sharing data internationally far easier, for example by updating national systems from the data obtained from flight crews.

The debate between national & international health and privacy will not be going away overnight. However, regardless of which data sharing method was implemented, it is nice to see that Protect Scotland uses a clear and concise privacy policy, most notably specifying for how long data is retained. One can only hope private sector would follow suit accross the board, rather than only on a case-by-case basis.