Data parsing on two sides of the ocean

Most of us are active on social media and post information about ourselves there. But have you considered that information intermediaries, including social network owners and third parties, can use it against you? This concerns not only your name and the texts you write, but also your photos, subscriptions, playlists and geolocation.

There are plenty of firms on the market that make a fortune selling personal data (or analytics based on it). Typically, such data is used to provide advertising services, build user profiles or improve service quality. Offers to sell personal data can also be found on hacker forums.

In this long read we discuss parsing and two opposite approaches to it: the one established in the United States, where the consent of social network users is not required to parse their personal data, and the one in Russia, where such consent has been required since March 2021.
What is parsing?

Parsing is the automated collection of information by a special bot (a parser), which crawls through web pages, much like a bug, and collects the information the customer needs. Parsing reduces costs, so it is more profitable for the customer than collecting information manually, which is both time-consuming and expensive. At the same time, parsing effectively gives such customers a free hand: your data is not merely viewed but copied into the storage of their computer systems, after which it can be used freely.
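To make the mechanics more concrete, here is a minimal Python sketch of the extraction step a parser performs once it has downloaded a page. The page markup, the class names `name` and `city`, and the function `parse_profile` are hypothetical; real sites use their own structure, and a real bot would also fetch pages over the network and repeat this step for thousands of profiles.

```python
from html.parser import HTMLParser

# Hypothetical markup of a public profile page; real sites differ.
PAGE = """
<html><body>
  <div class="profile">
    <span class="name">Jane Doe</span>
    <span class="city">Moscow</span>
    <span class="banner">Special offer!</span>
  </div>
</body></html>
"""

# Fields the customer wants; the class names are assumptions for this example.
WANTED = {"name", "city"}


class ProfileParser(HTMLParser):
    """Collects the text of elements whose class attribute is in WANTED."""

    def __init__(self):
        super().__init__()
        self.current = None  # field whose text is being read, if any
        self.record = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        self.current = cls if cls in WANTED else None

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            self.record[self.current] = data.strip()


def parse_profile(html):
    """Extract the wanted fields from one downloaded page."""
    parser = ProfileParser()
    parser.feed(html)
    parser.close()
    return parser.record


# The "storage" where copied records accumulate.
storage = [parse_profile(PAGE)]
print(storage)  # [{'name': 'Jane Doe', 'city': 'Moscow'}]
```

The result is a plain record that no longer depends on the original page: once copied like this, the data sits in the customer's storage and can be reused however the customer sees fit.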

From an ethical standpoint, this technology may look like an absolute evil that should be banned. In today's economy, however, a ban is virtually impossible: those who offer parsing services would simply move to the shadow market, and citizens would be left without even minimal guarantees for the protection of their rights. What does seem realistic is to strike a reasonable balance between constitutionally important values such as privacy and the freedom to seek, receive, transmit, produce and disseminate information.

In Russia, this balance has swung towards privacy.
To begin with, Article 5 of the 2006 Federal Law on Personal Data prohibits processing personal data without a specific, predetermined and legitimate purpose.
For example, social network users are unlikely to publish their data for the benefit of credit institutions, which collect data to profile people as potential borrowers (credit scoring). They are even less likely to expect the difficulties in obtaining credit that may arise after a bank examines their profile. For a bank, of course, the profile of a person who visited 50 countries in a year looks more attractive than a half-empty profile with pictures of cats in a country setting. One of the most high-profile cases of recent years, VK v. Double Data, is about exactly this.


But if, for example, someone parses the Avito marketplace, where you are selling your car, and then places your data on their own site to "help" you sell it, a different justification for the illegality of parsing applies.
In March 2021, amendments introducing Article 10.1 into the Act came into force. Their outcome can be summarized as follows:


For individuals who registered on social media:

Infringers often argue that their actions fall under the statutory exemption for search engines. Search engines such as Google and Yandex regularly parse web pages and are not held liable, because providing a service for searching information that data subjects have posted on the Internet, as long as the personal data is included in the search engine's index (keyword table), does not violate Article 10.1 of the 2006 Federal Law on Personal Data. This argument came up in the case of VK v. Double Ltd.
What to do if you encounter parsing of your personal data?

You have the right to demand that the operator (here, the social network) stop transferring your personal data to third parties, and that the third parties processing your data stop doing so. Finding such perpetrators, even with the operator's help, can be difficult, as parsing programs allow them to remain anonymous and to bypass the security measures that protect data from being copied.

You can also request information from the bank, since in the credit scoring example it is the "buyer" of your personal data.

If a legally significant decision was made solely on the basis of automated processing of your personal data without your written consent as the data subject, the perpetrator may be held liable under Clause 2 of Article 13.11 of the Code of Administrative Offences, and the decision itself can be challenged.
Can you minimize the risks of parsing consequences?

You are unlikely to be able to minimize such risks by technical means. However, you can make a habit of reading the relevant provisions of social networks' privacy policies and of making your profiles private.

In European countries, personal data is primarily seen as a non-property good, an extension of the individual, while in the U.S. it is viewed as a commodity in which its holders have legitimate interests. That is why in the U.S. the balance has swung towards the freedom to seek, receive, transmit, produce and disseminate information.


U.S. courts take the view that the data subject's consent is not required to process their personal data. This approach is understandable: it matters little who views the page, a parser bot or a user, as long as the requirements for ethical parsing are met.

The "moral minimum" of legitimate parsing in this case includes the following standards:

1. Copyright in the site content and the related database rights are not infringed.


2. Technical protection measures are not circumvented.

Many parsers can bypass such restrictions quite easily while remaining anonymous. Specialists advise a comprehensive approach to site protection, in particular:


  • limiting the data retrieval rate, e.g. to one request per IP address per second
  • tracking unusual activity (such as many requests from one IP address) with parser detection techniques
  • requiring users to log in to view the site
  • using CAPTCHAs for requests (usually of the "enter the code from the picture" kind)
  • regularly changing the HTML markup
  • using cookies.
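The first two measures in this list can be sketched together. The Python class below is a minimal, in-memory illustration assuming the one-request-per-IP-per-second rule mentioned above; the name `PerIpLimiter` and the threshold of five rejected requests before an address is flagged as unusual are hypothetical choices, not taken from any specific product.

```python
import time
from collections import defaultdict


class PerIpLimiter:
    """Allows at most one request per IP address per `interval` seconds and
    counts rejected requests so that unusually active addresses can be flagged.
    Hypothetical in-memory sketch; a real site would keep this state in a shared store."""

    def __init__(self, interval=1.0, flag_after=5, clock=time.monotonic):
        self.interval = interval          # one request per second, as in the list above
        self.flag_after = flag_after      # assumed threshold for "unusual activity"
        self.clock = clock                # injectable so the example can use a fake clock
        self.last_allowed = {}            # ip -> time of the last allowed request
        self.rejections = defaultdict(int)

    def allow(self, ip):
        """Return True if the request may be served, False if it exceeds the rate."""
        now = self.clock()
        last = self.last_allowed.get(ip)
        if last is not None and now - last < self.interval:
            self.rejections[ip] += 1
            return False
        self.last_allowed[ip] = now
        return True

    def suspicious(self):
        """IP addresses whose rejected requests reached the flag threshold."""
        return {ip for ip, n in self.rejections.items() if n >= self.flag_after}


class FakeClock:
    """Manually advanced clock for the demonstration."""

    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t


clock = FakeClock()
limiter = PerIpLimiter(clock=clock)

# A bot sends ten requests within one second from the same address.
results = []
for _ in range(10):
    results.append(limiter.allow("203.0.113.7"))
    clock.t += 0.1

# An ordinary visitor from another address is unaffected.
human_ok = limiter.allow("198.51.100.2")

print(results.count(True))   # 1
print(human_ok)              # True
print(limiter.suspicious())  # {'203.0.113.7'}
```

The fake clock lets the example run instantly. A parser that rotates IP addresses would slip past a limiter like this, which is one reason specialists recommend combining it with the other measures listed.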

However, operators should understand that they often have to choose between protecting data from parsing and keeping their services accessible to users and search engines. In particular, protective measures can cause the site to malfunction.


3. There is no unfair competition (for example, a company aggregates product information from the sites of several online stores but does not sell the goods itself; instead, it redirects users to the relevant store's site).


4. The site suffers no losses from outages. Parsers can send many times more requests per second than humans, and the resulting load can cause sites to stop working. If the site serves, for example, as a platform for commercial ads, an outage can cause its owner huge losses.
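From the parser operator's side, one common way to keep the load on a site low and to honor the restrictions its owner has published is to follow the site's robots.txt file, including any Crawl-delay directive. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content, the bot name `example-parser`, the one-second fallback delay and the `polite_crawl` function are hypothetical. Keep in mind that robots.txt is advisory: honoring it shows good faith but does not by itself make a given parsing operation lawful.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; a real bot would download it from the site's /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
"""

USER_AGENT = "example-parser"  # hypothetical bot name


def polite_crawl(urls, fetch, sleep=time.sleep, robots_txt=ROBOTS_TXT):
    """Fetch only the URLs robots.txt allows, pausing between requests.

    Uses the site's Crawl-delay if present, otherwise 1 second (an assumption).
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    delay = rules.crawl_delay(USER_AGENT) or 1
    pages = []
    for url in urls:
        if not rules.can_fetch(USER_AGENT, url):
            continue  # skip disallowed pages instead of working around the rule
        pages.append(fetch(url))
        sleep(delay)  # limit the rate to one request per `delay` seconds
    return pages


pauses = []
pages = polite_crawl(
    [
        "https://example.com/ads/1",
        "https://example.com/private/profile",
        "https://example.com/ads/2",
    ],
    fetch=lambda url: f"<html of {url}>",  # stand-in for a real HTTP request
    sleep=pauses.append,                   # record pauses instead of sleeping
)
print(pages)   # ['<html of https://example.com/ads/1>', '<html of https://example.com/ads/2>']
print(pauses)  # [2, 2]
```

Passing `fetch` and `sleep` as parameters lets the example run without network access or real pauses; in actual use they would be an HTTP client call and `time.sleep`.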


5. Personal data is used in line with the purposes for which it was provided and has no negative consequences for the data subject.

Conclusion:
There are no right or wrong approaches to regulation; there are only legal traditions and priorities specific to individual states. Convergence between the approaches takes place at the political level and is driven by the desire to find a reasonable balance between the right to information and technological development on the one hand, and the human right to privacy on the other.

Interesting Cases:


Russian case law: VK v. Double Data


Foreign case law:

1. hiQ Labs, Inc. v. LinkedIn Corp., 273 F. Supp. 3d 1099 (N.D. Cal. 2017).

2. LinkedIn Corp. v. Robocog Inc., Case No. 14-00068 (N.D. Cal. 2014).

3. QVC, Inc. v. Resultly, LLC, 99 F. Supp. 3d 525 (2015).

4. eBay, Inc. v. Bidder's Edge, Inc., 100 F. Supp. 2d 1058 (N.D. Cal. 2000).

5. Maximillian Schrems v. Data Protection Commissioner (Case C-362/14).


Useful Internet resources:


1. Website parsing in Russia and the world: what one of the most useful tools looks like from a legal standpoint

https://vc.ru/legal/64328-parsing-saytov-rossiya-i-mir-kak-s-tochki-zreniya-zakona-vyglyadit-odin-iz-samyh-poleznyh-instrumentov

2. How to protect your site from data parsing: practical tips

https://vc.ru/services/262190-kak-zashchitit-svoy-sayt-ot-parsinga-dannyh-prakticheskie-sovety

3. Deloitte CIS webinar. Parsing: how not to violate the exclusive rights to other people's content.

https://www.youtube.com/watch?v=cra-i4GZ4go

4. IP IT BOX webinar: legal consequences of VKontakte v. Double Data

https://www.youtube.com/watch?v=28yxT52JuX8

5. IP IT BOX webinar: 519-FZ, new rules on personal data processing

https://www.youtube.com/watch?v=cwUKnRr6jhs&t=1012s


Additional literature:


1. Saveliev A. I. Scientific and practical article-by-article commentary on the Federal Law "On personal data"

2. Rozhkova M.A. Resolving the question of the legality of parsing (scraping) in VKontakte v. Double Data

3. Oreshin E.I. The VKontakte v. Double Data case on the use of publicly available user data: Double Data's position before the Court of Intellectual Rights