Building a telemetry function that respects your privacy

tl;dr: Starting with wolkenkit 2.0 we ask developers whether they agree to share data on their use of the wolkenkit CLI with us. In this blog post we explain in detail why we do this, and what we do to protect privacy. This way, we want to ensure transparency and hope to establish trust.

First and foremost, the open source release of wolkenkit is a way for us to give something back to the community: We benefit from open source in so many ways that we strongly believe in not only taking, but also giving back.

However, this makes it difficult for us as the developers of wolkenkit to find out how wolkenkit is being used. Since we deliberately decided not to require any registration for the download, we do not know who evaluates wolkenkit. We don't know if you actually use wolkenkit once you have downloaded it – or, if you don't use it, why.

The only hints we have are the download statistics provided by npm and Docker Hub. Unfortunately, these numbers are not very helpful because they only tell us how often the CLI and the Docker images have been downloaded, and for most developers this happens only once per version of wolkenkit. From these numbers alone, you can hardly tell how wolkenkit is used.

Understanding your users

Of course, more details would help us to understand our users better. At the same time, they would help us to steer the development of wolkenkit in a direction that is useful to the community. Sadly, we rarely get feedback from dissatisfied users. Instead, they often simply turn away, although their feedback in particular would be valuable!

For this reason we would like to know more about how you use wolkenkit. Our intention is neither to gain insights into the internals of the applications you have developed with wolkenkit, nor to collect data about you or your users. To make this concrete, here are some examples of questions that we are interested in:

  • How many developers have installed wolkenkit and use it regularly? This is important for assessing the impact of breaking changes. In other words: How many people would we possibly upset if we changed or removed an existing feature?
  • How long do developers typically wait before installing an update of wolkenkit? This helps us to assess how essential new features are and which topics we should put our focus on. Updates that are downloaded sooner are obviously more relevant.
  • How many applications do you build with wolkenkit? This helps us to assess how important it is to support side-by-side execution of multiple wolkenkit applications on a single machine.
  • How many environments (development, test, production, …) do you use wolkenkit for? This helps us to assess the size of the installations and the importance of tools to support different environments.

All these questions refer exclusively to the use of wolkenkit itself, not to the code or domain of the applications you have developed with it, nor to the end users of your applications. This is information that we intentionally and explicitly do not want to receive.

Respecting privacy by default

To get answers to these questions, we have decided that starting with wolkenkit 2.0 we will ask whether you agree to share some data with us on how you use the wolkenkit CLI. To this end, we designed and implemented a function to collect and send data, but we wanted it to work in a way that respects your privacy and does not upset anyone.

Therefore, we have defined a number of design principles and guidelines that are the basis for the telemetry function:

  • First and foremost, sending any data is always entirely opt-in. If you do not explicitly agree to share data with us, we do not collect any data at all.
  • If you have agreed to share data with us, you can revoke this decision at any time, without further ado. Sending telemetry data is always completely voluntary.
  • Not sharing your data will not result in any disadvantages for you. wolkenkit works the same way without any restrictions, whether or not you share your data with us.
  • We will ask you only once if you agree to share your data with us. If you don't agree, we won't ask again, except when a new version of wolkenkit is released.
  • If sending the data does not work, e.g. for technical reasons, you will never see an error message. Data that could not be sent is our problem, not yours.
  • Any data that is sent is entirely anonymized and does not allow us to draw any conclusions about you, your machine, your application or your application's end users.
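
As a rough illustration, the opt-in rule and the rule that failed transmissions never surface as errors could be sketched as follows. This is not the actual code of the wolkenkit CLI; the names Consent, Send and sendIfAllowed are hypothetical and only exist for this example.

```typescript
// Hypothetical names throughout; this is not the wolkenkit CLI's actual code.
type Send = (payload: object) => Promise<void>;

interface Consent {
  // Undefined means the question has not been answered yet.
  agreed?: boolean;
}

// Resolves to true if the data was sent, and to false otherwise. It never
// rejects, so a telemetry failure can never surface as an error to the user.
const sendIfAllowed = async function (
  consent: Consent,
  payload: object,
  send: Send
): Promise<boolean> {
  // Opt-in: anything other than an explicit yes means nothing is sent.
  if (consent.agreed !== true) {
    return false;
  }

  try {
    await send(payload);

    return true;
  } catch {
    // Data that could not be sent is dropped silently.
    return false;
  }
};
```

Passing the transport in as a function keeps the sketch independent of any particular HTTP library.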

In the following, we would like to explain how we have implemented these design principles and guidelines from a technical point of view.

Anonymizing data irrevocably

When you start the wolkenkit CLI for the first time after you have installed or updated it, you will be asked if you agree to share data with us. We summarize why we are asking you and link to this blog post for the details. The answer "No, thank you." is selected by default. This means that if you do not read the text and simply press <Enter>, you will not accidentally send any data.

We store your answer in a JSON file called .wolkenkit in your home directory, using the dotfile-json module. This way we recognize whether we have already asked and do not ask again.

In addition, we generate a random UUID during the first run of the wolkenkit CLI. We will use this UUID to distinguish between different installations of wolkenkit. Since the UUID is purely random, however, it does not allow any conclusions to be drawn about you or your machine. This UUID is also stored in the previously mentioned file.
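
A minimal sketch of this bookkeeping might look like the following. It uses Node.js's built-in fs module instead of dotfile-json, and the field names installationId and sendTelemetry are assumptions made for this example, since the actual layout of the file is not described here.

```typescript
import { randomUUID } from 'crypto';
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Hypothetical file layout; the real .wolkenkit file may look different.
interface Settings {
  installationId: string;
  // Undefined until the question has been answered.
  sendTelemetry?: boolean;
}

// Reads the settings file, creating it with a fresh random UUID on first run.
const loadSettings = function (directory: string = os.homedir()): Settings {
  const file = path.join(directory, '.wolkenkit');

  if (fs.existsSync(file)) {
    return JSON.parse(fs.readFileSync(file, 'utf8')) as Settings;
  }

  const settings: Settings = { installationId: randomUUID() };

  fs.writeFileSync(file, JSON.stringify(settings, null, 2), 'utf8');

  return settings;
};

// Records the answer so that the question does not have to be asked again.
const saveAnswer = function (agreed: boolean, directory: string = os.homedir()): Settings {
  const settings: Settings = { ...loadSettings(directory), sendTelemetry: agreed };

  fs.writeFileSync(path.join(directory, '.wolkenkit'), JSON.stringify(settings, null, 2), 'utf8');

  return settings;
};
```

The directory parameter is only there so that the sketch can be tried out in a temporary folder without touching your real home directory.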

Now, when you execute a command using the CLI and if you have agreed to share your data with us, the CLI first determines the following data locally:

  • The previously generated random UUID to distinguish your installation from others.
  • The version of wolkenkit you are using.
  • The version of the wolkenkit CLI you are using.
  • The name of the CLI command you are running.
  • The name of your wolkenkit application.
  • The name of the environment of your wolkenkit application.
  • The current date and time in UTC.

Before the CLI sends this data, all information concerning you, your machine or your application is anonymized. For this, we calculate a SHA256 hash of each individual value with the help of the deep-hash module. Since cryptographic hash functions are designed to be one-way functions, we can no longer determine the name of your application from its hash value, for example.

We could be accused of using a so-called rainbow table, i.e. a table of precalculated hashes of common values, to recover the original values. To prevent this, all values are prefixed with the random UUID before hashing, making it virtually impossible to precalculate a rainbow table. Since the CLI also only sends the UUID in hashed form, we are unable to reconstruct and recalculate this step.
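
The effect of the prefix can be sketched with Node.js's built-in crypto module. Note that this is an illustration only: the CLI uses the deep-hash module, and the exact way the UUID and the value are concatenated is an assumption here, as is the name anonymize.

```typescript
import { createHash } from 'crypto';

// Hashes a value with SHA256 after prefixing it with the installation's UUID.
// The plain string concatenation is an assumption made for this sketch.
const anonymize = function (value: string, installationId: string): string {
  return createHash('sha256').update(`${installationId}${value}`).digest('hex');
};

// The same application name yields unrelated hashes on two installations,
// so a table precalculated for common names does not help.
const onMachineA = anonymize('my-app', '3f2c6f0e-8a51-4d0e-9b7a-0c1d2e3f4a5b');
const onMachineB = anonymize('my-app', '9d8c7b6a-5e4f-4a3b-8c2d-1e0f9a8b7c6d');

console.log(onMachineA === onMachineB); // prints false
```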

The only data sent in plain text are the versions of wolkenkit and the wolkenkit CLI, the name of the CLI command, and the current date and time. Since this data is generic, it does not allow any conclusions to be drawn about you either. And, because the connection to our server is encrypted, no one else can read this data.
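
Putting the pieces together, the split between hashed and plain-text fields could look roughly like this. The field names, the function toPayload and the way the UUID itself is hashed are assumptions for the sake of the example; the source code of the CLI is the authoritative reference.

```typescript
import { createHash } from 'crypto';

// Data determined locally before anything is sent (hypothetical field names).
interface LocalData {
  installationId: string;
  wolkenkitVersion: string;
  cliVersion: string;
  command: string;
  application: string;
  environment: string;
  timestamp: Date;
}

const sha256 = function (text: string): string {
  return createHash('sha256').update(text).digest('hex');
};

const toPayload = function (data: LocalData) {
  // Values are prefixed with the UUID before hashing.
  const salted = (value: string): string => sha256(`${data.installationId}${value}`);

  return {
    // Anonymized: only hashes leave the machine.
    installationId: sha256(data.installationId),
    application: salted(data.application),
    environment: salted(data.environment),
    // Generic values that are sent in plain text.
    wolkenkitVersion: data.wolkenkitVersion,
    cliVersion: data.cliVersion,
    command: data.command,
    timestamp: data.timestamp.toISOString()
  };
};
```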

If you want to see for yourself how all this works in detail, feel free to have a look at the source code of the CLI. This is one of the nice aspects of open source: It provides transparency. Additionally, if you run the CLI with the --verbose flag, you are able to see what data is actually being sent.

Summing up

So, to cut a long story short: In this blog post we have tried to explain why we ask you to share some data about your use of the wolkenkit CLI, and what steps we have taken to protect your privacy.

As already mentioned, we love open source and believe that it is the future of software development. Nevertheless, open source must be sustainable in order to work in the long term. By sharing your data with us, you contribute to the further development of wolkenkit.

If you have any questions, suggestions, comments or criticism, you are always welcome to contact us by email at hello@thenativeweb.io. In addition, you may contact our data protection officer, Stefan Brandys, by email at privacy@thenativeweb.io.

Golo Roden

Founder, CTO, and managing partner
