Mirek Długosz personal website - testinghttps://mirekdlugosz.com/2024-03-12T17:57:46+01:00Don’t blindly serve WebP format2024-03-12T17:57:46+01:002024-03-12T17:57:46+01:00Mirek Długosztag:mirekdlugosz.com,2024-03-12:/blog/2024/dont-blindly-serve-webp-format/
<p>If you have done any webdev work in the last few years, you must have heard about WebP. It’s an image format that promises up to 34% smaller file sizes without a noticeable loss of quality. It has been pretty much universally supported since late 2020.</p>
<p>With smaller file sizes and widespread support, you might think it’s a good idea to just serve all your images in WebP. Or, if you want to be extra backward-compatible, serve WebP to all browsers that claim to support it, and the original image to the remaining few.</p>
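<p>The backward-compatible variant could look roughly like the sketch below. It assumes a server-side hook that can read the request’s <code>Accept</code> header and knows whether a WebP copy exists. The function name <code>pick_image</code> is hypothetical, and the header parsing is deliberately simplified: it ignores quality values such as <code>;q=0</code>.</p>

```python
from typing import Optional


def pick_image(accept_header: str, original: str, webp: Optional[str]) -> str:
    """Return the file to serve: the WebP copy if the browser claims support."""
    # Turn "image/avif,image/webp,*/*;q=0.8" into bare media types,
    # dropping parameters such as ";q=0.8" (simplification: q-values are ignored).
    accepted = {part.split(";")[0].strip().lower() for part in accept_header.split(",")}
    if webp is not None and "image/webp" in accepted:
        return webp
    # Browsers that do not advertise WebP get the original format.
    return original


print(pick_image("image/avif,image/webp,*/*;q=0.8", "photo.png", "photo.webp"))
# photo.webp
```

<p>Note that this picks WebP whenever the browser advertises it, without checking whether the WebP file is actually the smaller one.</p>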
<p>I also thought it was a good idea, and made the switch on this very website. I create WebP files with the compression factor set to 80. Then I noticed that one file was actually larger after conversion.</p>
<p>Following this thread, I compared the size of every WebP image with its original counterpart. It turned out that WebP produced larger files in 3% of cases (4 out of 117). In the worst case, a 20 kB <span class="caps">PNG</span> file turned into a 122 kB WebP file, more than a sixfold increase in size!</p>
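<p>A comparison like this can be scripted. The sketch below assumes that every WebP file sits next to its original with the same stem (e.g. <code>chart.png</code> and <code>chart.webp</code>) and that originals are PNG or JPEG. The function name <code>find_oversized_webp</code> is hypothetical.</p>

```python
from pathlib import Path
from typing import List, Tuple


def find_oversized_webp(directory: Path) -> List[Tuple[Path, Path, float]]:
    """List WebP files that are not smaller than their originals, worst first."""
    results = []
    for webp in sorted(directory.rglob("*.webp")):
        # Assumption: the original has the same stem and one of these suffixes.
        for suffix in (".png", ".jpg", ".jpeg"):
            original = webp.with_suffix(suffix)
            if not original.exists():
                continue
            original_size = original.stat().st_size
            if original_size > 0:  # skip empty files to avoid dividing by zero
                ratio = webp.stat().st_size / original_size
                if ratio >= 1:
                    results.append((original, webp, ratio))
            break  # compare against the first matching original only
    return sorted(results, key=lambda item: item[2], reverse=True)
```

<p>For the worst file mentioned above, 122 kB against 20 kB, the reported ratio would be 6.1.</p>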
<p>Since then, whenever I generate a WebP file, I compare its size to the original and keep it only if the new format produces a smaller file. This way browsers always receive the smallest file I can produce, regardless of the format.</p>
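<p>The “keep it only if it is smaller” rule could be sketched as follows. This assumes the WebP file has already been produced by whatever converter you use (that step is not shown), and <code>keep_if_smaller</code> is a hypothetical helper name.</p>

```python
from pathlib import Path


def keep_if_smaller(original: Path, candidate: Path) -> Path:
    """Keep `candidate` only if it is strictly smaller than `original`.

    Returns the path of the file that should be served. A candidate that
    is the same size or larger is deleted.
    """
    if original.stat().st_size > candidate.stat().st_size:
        return candidate
    candidate.unlink()
    return original
```

<p>Deleting the losing candidate means nothing downstream has to repeat the comparison: if no WebP file exists, only the original can be served.</p>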
<p>I guess the main takeaway here is the age-old “measure before optimizing”.</p>What’s the best format for a test plan?2024-03-04T10:15:57+01:002024-03-04T10:15:57+01:00Mirek Długosztag:mirekdlugosz.com,2024-03-04:/blog/2024/whats-the-best-format-for-a-test-plan/<p>The Association for Software Testing is crowd-sourcing a book, <a href="https://associationforsoftwaretesting.org/navigating-the-world-as-a-context-driven-tester-book/"><em>Navigating the World as a Context-Driven Tester</em></a>. The book is edited by <a href="https://therockertester.wordpress.com/">Lee Hawkins</a>, who posts questions on <a href="https://twitter.com/AST_News">Twitter</a>, <a href="https://www.linkedin.com/company/association-for-software-testing/">LinkedIn</a>, <a href="https://sw-development-is.social/web/@AST">Mastodon</a>, <a href="https://associationforsoftwaretesting.org/2016/11/13/ast-members-slack/">Slack</a>, and <a href="http://eepurl.com/tCFsn">the <span class="caps">AST</span> mailing list</a>. You don’t have to be a member to give your answer to the most recent prompt, “What’s the best format for a test plan?”. Mine is below.</p>
<hr>
<p>It should come as no surprise that there is no single “best” format.</p>
<p>Each document that we create should be <em>for someone</em> and should respond to some real or perceived <em>need</em>. Otherwise, why bother writing it?</p>
<p>I imagine the perspective on planning changes as you move up the corporate hierarchy. For directors and executives, it’s all about expected resource utilization and budgeting. I think they want to know how many people to hire, how many devices they need to buy, and how much their cloud costs are going to go up; and what they can skip or cut out while still meeting the strategic objectives. I wouldn’t know; I have never had this kind of job.</p>
<p>As an individual contributor, my perspective is different. When I am asked to create a test plan, my first thought is “who am I writing this for?” <em>I</em> might not need it, and I <em>definitely</em> don’t need it in the very heavy form exemplified by the templates floating around the web. Other testers in my team often don’t need it either - they are just glad they didn’t have to write it. Developers in my team don’t care. Other testers in adjacent teams are usually busy with their own things. My manager might be happy to see it, and he might or might not forward it up the chain, but I have never heard back from anyone as close as my skip-level manager.</p>
<p>Usually, after a test plan is written, a request for review and comments is sent to a number of people. In the best case, I get a few questions pointing out parts that could be clarified, or some suggestions for things that I have missed. I appreciate them. They help me be more thorough and allow others to better understand what is happening in other parts of the project.</p>
<p>However, I doubt that a test plan is the most effective way of achieving these goals. While everything that test plans usually cover <em>is</em> important, I feel that whatever I put in the document should be the conclusion of a conversation, not the beginning of it.</p>
<p>Let’s say I think we will need a team of three people to complete all testing activities in the given time, while it’s only me and one colleague. Putting that somewhere deep in a test plan will not make a new person appear out of nowhere. In fact, I am much better off talking with my manager about this - because either we need to arrange for a new person right now, or we need to highlight to other stakeholders that certain things are at risk because there are not enough people.</p>
<p>The same applies to any hardware and software that might be needed for testing. Most companies have processes for obtaining these, which I would have to follow independently of any test plan document. If I have any doubts about whether my request is going to be accepted, I’d better talk with my manager instead of hoping he will notice this crucial detail somewhere in a test plan that I send him.</p>
<p>This is also true for test ideas that I might put in a test plan. I am happy to share them with developers - the best thing that can happen is that I draw their attention to something they have not thought about, and they fix the bugs before any code is subjected to testing. But I haven’t seen any evidence that a long, mostly irrelevant document is the best way to start that conversation.</p>
<p>In fact, I consider thinking about test ideas to be the most valuable part of test planning. Usually I try to create a list of things I think would be worth testing, based on whatever resources I have available at the time. Often, while writing things down, I realize I’m not exactly sure what is supposed to happen - which means that either I didn’t fully understand what’s going on, or I found a problem with the specification. Sometimes I also write down questions I want to find answers to.</p>
<p>That list of test ideas is going to be my starting point for actual testing. However, not all test ideas need to be performed repeatedly, and not all of them are worth automating - which leaves open the question of whether they should be included in any kind of formal test plan.</p>
<p>And even if I did include the whole list in a test plan, considering it complete would be a grave mistake. I assume I will add more items later. Some ideas might come to me while doing other things, some might be prompted by conversations that will happen as the project evolves, and some I will create in response to the actual behavior of the developed software. When I test, there are always things that pique my interest, and more often than not they are worth following, whether or not they are included in the test plan.</p>
<p>I think all the above points, and then some, are now widely understood in the industry. That’s why test plans are not as ubiquitous as they once were. Personally, I can count on one hand the number of times I’ve been asked to create a test plan in the last few years. The software I work on is in continuous development and sees many relatively small releases. Large things, if they happen, take months to develop, not years. We practice backlog refinement and three amigos meetings, and we collectively write acceptance criteria, which often <em>are</em> a lightweight form of test plan. The other testers and I are on the same team as the developers; we closely watch progress on new features and get a chance to interact with early versions of the software. I participate in code reviews and suggest improvements to unit tests, which allows me to share test ideas.</p>
<p>We get many of the same benefits that a test plan could provide, but in a less formal way and without spending much time writing it.
However, if I were asked to write a test plan, I would start by reading <a href="https://developsense.com/blog/2008/12/what-should-test-plan-contain">“What Should A Test Plan Contain?” by Michael Bolton</a>.</p>When the build is green, the product is of sufficient quality to release2024-01-25T23:08:31+01:002024-01-25T23:08:31+01:00Mirek Długosztag:mirekdlugosz.com,2024-01-25:/blog/2024/when-the-build-is-green-the-product-is-of-sufficient-quality-to-release/<p>The Association for Software Testing is crowd-sourcing a book, <a href="https://associationforsoftwaretesting.org/navigating-the-world-as-a-context-driven-tester-book/"><em>Navigating the World as a Context-Driven Tester</em></a>. The book is edited by <a href="https://therockertester.wordpress.com/">Lee Hawkins</a>, who posts questions on <a href="https://twitter.com/AST_News">Twitter</a>, <a href="https://www.linkedin.com/company/association-for-software-testing/">LinkedIn</a>, <a href="https://sw-development-is.social/web/@AST">Mastodon</a>, <a href="https://associationforsoftwaretesting.org/2016/11/13/ast-members-slack/">Slack</a>, and <a href="http://eepurl.com/tCFsn">the <span class="caps">AST</span> mailing list</a>. 
You don’t have to be a member to give your answer to the most recent prompt, “When the build is green, the product is of sufficient quality to release”. Mine is below.</p>
<hr>
<p>No, not really. At least not in any general sense. But you are free to choose to release based on a single metric. If this is what you want, I would advise you to work on creating an environment that allows this choice to succeed. I would also assume that you are aware of the risks that this option entails, and that you have considered other options as well.</p>
<p>If you want to release software, you need a release strategy. You don’t have to call it that and you don’t have to write it down, but it makes sense to spend a few hours thinking about it. Here are some questions you should be able to answer:</p>
<ul>
<li>How many releases do you want to make? When you work under contract, there might be a set number of releases that you will be able to make. Some specialized software is used only once - think about space rocket controls, or software used during sports events like the Olympics.</li>
<li>How often do you want to release? Assuming there are going to be multiple releases, is there a specific frequency of releases that you want to maintain? Or are you going to release when you feel you are ready?</li>
<li>Are you able to fix discovered problems after the release? Especially if you don’t plan multiple releases, would you be able to release fixes for critical problems discovered afterwards?</li>
<li>What are the risks of releasing software with unfixed problems? This greatly depends on the nature of these problems, but think also about your reputation and competition.</li>
<li>What are the risks of holding a release to fix problems? This is mostly about all the things accompanying and related to the release, e.g. a marketing campaign that started well before the planned release date. Can you move things around? Can you afford to extend it?</li>
<li>What does it take to actually make a release? All the technical aspects of getting the software to customers, including distribution logistics for physical copies, if there are any.</li>
<li>What external factors might force you to make a release? These might be technical, like a new version of an operating system, or related to changed laws and regulations.</li>
<li>How far in advance will you know about these external factors? Changes in laws are usually announced months or years in advance, but you might have only weeks to respond to changes in an operating system.</li>
<li>How does a release align with other things that the company is doing? Your software is probably only a part of the company’s offering. Are you independent of other projects? Even when your software is the only thing that the company provides, there are probably other branches of the company that support it, and they all have to work with a common goal in mind.</li>
<li>Who decides to make a release? Is that a development team, a product manager, a branch director, sales people? Software can’t release itself, so who decides that this specific version is going to be made available to customers?</li>
</ul>
<p>You can decide to empower the team to make release decisions whenever they see fit, disregarding most outside factors and concerns. That’s a perfectly valid choice!</p>
<p>The risk common to every single software release is that there might be important problems that you don’t know about, and that problems you do know about turn out to be more serious than you thought. By making a choice to release a new version based only on build results, you decide that these and some other risks are acceptable for you.</p>
<p>The first set of risks revolves around the quality of the automated check suite that is run before the release. Common problems with such suites are lack of data variability and coverage gaps in areas that are hard to test, especially interactions with real external systems. If you don’t maintain the discipline of only merging code that is covered by tests, you might also have insufficient coverage of newer features. An automated check suite must also maintain a balance between thoroughness and execution time. It is common to keep separate pipelines for slow performance and security testing, and there’s a risk of forgetting to run them before making a release.</p>
<p>If you make it easy to make a release, and you make a habit of frequent releases, there’s a risk of losing vigilance. A Big Important Release acts as external pressure that keeps everyone cautious. When releases become a common thing, you might underestimate the importance of found issues, working under the assumption that you can always fix them later and make a new release. That assumption is true and valid for many software issues, but it will become a problem when it starts to be an excuse for sloppiness.</p>
<p>When the release decision is made based on build results, releases are usually more frequent and smaller in scope (they contain a limited number of changes). The argument goes that smaller releases are easier to test and inherently less risky, so it’s better to release more frequently. On the other hand, smaller changes carry the risk of not seeing the forest for the trees. As your team is focused on small tasks, there might be nobody to consider the impact of these small changes on the product as a whole. Things that work well in isolation sometimes do not work well together.</p>
<p>Finally, there’s a risk of that choice being wrong for your actual case. Many developers appreciate easy, small and frequent releases. But most of your customers just want to get the task done - they don’t want to wait for an update or be interrupted by it. They might have a slow Internet connection or pay for downloaded data. They might be averse to change - people tend to be conservative and put up with an arduous way of doing things, just because they have always done it like that.</p>
<p>If you want to release based on build results, you should ensure that you have other ways to reduce these risks, and a safety net that limits their impact when they inevitably occur.</p>
<p>First, make it technically easy to release a new version of the software. That usually means that the release pipeline is fully automated. Whenever you decide to make a release, you should be able to just push a button and be confident that the new version is available to customers.</p>
<p>Second, prepare a plan for when a broken version is released, or the release process itself fails. That might mean a simple way to yank and roll back a release, or empowering the team to quickly fix an issue and make another release. You might also schedule fire drills in which you pretend that a working release is broken, just to prepare for a real situation.</p>
<p>Third, if your delivery process allows it, consider rolling releases, blue/green deployments and post-release monitoring. The general idea is that you release a new version to a small subset of your customers, and gradually make it available to others. That should be paired with monitoring, so if you notice problems with the new version, you can pause the release. This way most customers never experience the broken behavior.</p>
<p>Finally, you need the ability to deliver new versions frequently. This is a given when you work on software as a service, where you fully control who uses which version. But this dynamic changes as you move to software that is installed on customer devices, especially devices that are not always online.</p>
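<p>As a rough illustration of that idea, here is a sketch of a staged rollout. Everything in it is an assumption made for illustration: users are assigned to one of 100 buckets by hashing a (hypothetical) user ID, the stage percentages are arbitrary, and <code>error_rate_at</code> stands in for whatever monitoring you actually have.</p>

```python
import hashlib
from typing import Callable, Sequence


def in_rollout(user_id: str, percent: int) -> bool:
    """Decide whether a user gets the new version at the given exposure level."""
    # Hashing makes the assignment stable: the same user always lands in the
    # same bucket (0-99), so raising `percent` only ever adds users.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return percent > bucket


def run_staged_rollout(
    stages: Sequence[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float = 0.01,
) -> int:
    """Widen exposure stage by stage and stop at the first unhealthy stage.

    Returns the last exposure percentage that passed monitoring.
    """
    healthy = 0
    for percent in stages:
        # Hypothetical monitoring hook: error rate observed with `percent`% exposed.
        if error_rate_at(percent) > max_error_rate:
            break  # pause the release; fall back to the last healthy level
        healthy = percent
    return healthy


# A simulated problem that only shows up once a quarter of users are exposed.
print(run_staged_rollout([1, 5, 25, 100], lambda percent: 0.05 if percent >= 25 else 0.0))
# 5
```

<p>Hashing rather than random sampling keeps each user on the same side of the split between stages, so nobody flips back and forth between versions as exposure grows.</p>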
<p>To sum up, the quality of software, as signaled by build status, is only a small part of the release equation. There’s nothing wrong with deciding to focus on it, disregarding all the other factors. But you need to ensure that this is the right choice in your general business environment, and you should work on a safety net that minimizes the risk that this choice entails.</p>Playwright - accessing page object in event handler2024-01-03T18:37:03+01:002024-01-03T18:37:03+01:00Mirek Długosztag:mirekdlugosz.com,2024-01-03:/blog/2024/playwright-accessing-page-object-in-event-handler/<p>Playwright <a href="https://playwright.dev/python/docs/api/class-page#events">exposes a number of browser events</a> and provides a mechanism to respond to them. Since many of these events signal errors and problems, most of the time you want to log them, halt program execution, or ignore them and move on. Logging is also shown in the <a href="https://playwright.dev/python/docs/network#network-events">Playwright documentation about network events</a>, which I will use as the basis for the examples in this article.</p>
<h2 id="problem-statement"><a class="toclink" href="#problem-statement">Problem statement</a></h2>
<p>The documentation shows event handlers created with <code>lambda</code> expressions, but <code>lambda</code> poses significant problems once you leave the territory of toy examples:</p>
<ul>
<li>they are limited to a single expression, which in practice means a single line of code</li>
<li>you can’t share them across modules</li>
<li>you can’t unit test them in isolation</li>
</ul>
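<p>To make the last two points more concrete, here is a sketch of a handler written as a regular module-level function. It needs several statements, so it could not be a single <code>lambda</code>, and because it only reads the <code>status</code> and <code>url</code> attributes of the response, it can be unit tested with a stand-in object instead of a real browser. The function names are hypothetical, and <code>SimpleNamespace</code> merely imitates the two attributes of Playwright’s <code>Response</code> that are used.</p>

```python
from types import SimpleNamespace
from typing import Optional


def describe_failed_response(response) -> Optional[str]:
    """Return a log line for 4xx/5xx responses and None for everything else."""
    if response.status >= 500:
        kind = "server error"
    elif response.status >= 400:
        kind = "client error"
    else:
        return None
    return f"{kind}: {response.status} {response.url}"


def response_handler(response) -> None:
    # This is what you would register with page.on("response", response_handler).
    message = describe_failed_response(response)
    if message is not None:
        print(message)


# A unit test needs no browser: a stand-in object with the same attributes is enough.
fake_response = SimpleNamespace(status=404, url="https://httpbin.org/status/404")
response_handler(fake_response)
# client error: 404 https://httpbin.org/status/404
```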
<p>Usually you want to define event handlers as normal functions. But when you attempt that, you might run into another problem: Playwright invokes the event handler with some event-related data, that data does not give you a direct reference back to the <code>page</code> object, and the <code>page</code> object might contain some important contextual information.</p>
<p>In other words, we would like to do something similar to the code below. Note that this example does not work - if you run it, you will get <code>NameError: name 'page' is not defined</code>.</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">playwright.sync_api</span> <span class="kn">import</span> <span class="n">sync_playwright</span>
<span class="kn">from</span> <span class="nn">playwright.sync_api</span> <span class="kn">import</span> <span class="n">Playwright</span>


<span class="k">def</span> <span class="nf">request_handler</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">page</span><span class="o">.</span><span class="n">url</span><span class="si">}</span><span class="s2"> issued request: </span><span class="si">{</span><span class="n">request</span><span class="o">.</span><span class="n">method</span><span class="si">}</span><span class="s2"> </span><span class="si">{</span><span class="n">request</span><span class="o">.</span><span class="n">url</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">response_handler</span><span class="p">(</span><span class="n">response</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">page</span><span class="o">.</span><span class="n">url</span><span class="si">}</span><span class="s2"> received response: </span><span class="si">{</span><span class="n">response</span><span class="o">.</span><span class="n">status</span><span class="si">}</span><span class="s2"> </span><span class="si">{</span><span class="n">response</span><span class="o">.</span><span class="n">url</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">run_test</span><span class="p">(</span><span class="n">playwright</span><span class="p">:</span> <span class="n">Playwright</span><span class="p">):</span>
    <span class="n">browser</span> <span class="o">=</span> <span class="n">playwright</span><span class="o">.</span><span class="n">chromium</span><span class="o">.</span><span class="n">launch</span><span class="p">()</span>
    <span class="n">page</span> <span class="o">=</span> <span class="n">browser</span><span class="o">.</span><span class="n">new_page</span><span class="p">()</span>
    <span class="n">page</span><span class="o">.</span><span class="n">goto</span><span class="p">(</span><span class="s2">"https://mirekdlugosz.com"</span><span class="p">)</span>
    <span class="n">page</span><span class="o">.</span><span class="n">on</span><span class="p">(</span><span class="s2">"request"</span><span class="p">,</span> <span class="n">request_handler</span><span class="p">)</span>
    <span class="n">page</span><span class="o">.</span><span class="n">on</span><span class="p">(</span><span class="s2">"response"</span><span class="p">,</span> <span class="n">response_handler</span><span class="p">)</span>
    <span class="n">page</span><span class="o">.</span><span class="n">goto</span><span class="p">(</span><span class="s2">"https://httpbin.org/status/404"</span><span class="p">)</span>
    <span class="n">browser</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>


<span class="k">with</span> <span class="n">sync_playwright</span><span class="p">()</span> <span class="k">as</span> <span class="n">playwright</span><span class="p">:</span>
    <span class="n">run_test</span><span class="p">(</span><span class="n">playwright</span><span class="p">)</span>
</code></pre></div>
<p>I can think of three ways of solving this: by defining a function inside a function, with <code>functools.partial</code>, and with a factory function. Let’s take a look at all of them.</p>
<h2 id="defining-a-function-inside-a-function"><a class="toclink" href="#defining-a-function-inside-a-function">Defining a function inside a function</a></h2>
<p>Most Python users are so used to defining functions at the top level of a module or inside a class (we call these “methods”) that they might consider function definitions to be somewhat special. In fact, some other programming languages do restrict where functions can be defined. But in Python you can define them anywhere, including inside other functions.</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">playwright.sync_api</span> <span class="kn">import</span> <span class="n">sync_playwright</span>
<span class="kn">from</span> <span class="nn">playwright.sync_api</span> <span class="kn">import</span> <span class="n">Playwright</span>


<span class="k">def</span> <span class="nf">run_test</span><span class="p">(</span><span class="n">playwright</span><span class="p">:</span> <span class="n">Playwright</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">request_handler</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">page</span><span class="o">.</span><span class="n">url</span><span class="si">}</span><span class="s2"> issued request: </span><span class="si">{</span><span class="n">request</span><span class="o">.</span><span class="n">method</span><span class="si">}</span><span class="s2"> </span><span class="si">{</span><span class="n">request</span><span class="o">.</span><span class="n">url</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">response_handler</span><span class="p">(</span><span class="n">response</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">page</span><span class="o">.</span><span class="n">url</span><span class="si">}</span><span class="s2"> received response: </span><span class="si">{</span><span class="n">response</span><span class="o">.</span><span class="n">status</span><span class="si">}</span><span class="s2"> </span><span class="si">{</span><span class="n">response</span><span class="o">.</span><span class="n">url</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>

    <span class="n">browser</span> <span class="o">=</span> <span class="n">playwright</span><span class="o">.</span><span class="n">chromium</span><span class="o">.</span><span class="n">launch</span><span class="p">()</span>
    <span class="n">page</span> <span class="o">=</span> <span class="n">browser</span><span class="o">.</span><span class="n">new_page</span><span class="p">()</span>
    <span class="n">page</span><span class="o">.</span><span class="n">goto</span><span class="p">(</span><span class="s2">"https://mirekdlugosz.com"</span><span class="p">)</span>
    <span class="n">page</span><span class="o">.</span><span class="n">on</span><span class="p">(</span><span class="s2">"request"</span><span class="p">,</span> <span class="n">request_handler</span><span class="p">)</span>
    <span class="n">page</span><span class="o">.</span><span class="n">on</span><span class="p">(</span><span class="s2">"response"</span><span class="p">,</span> <span class="n">response_handler</span><span class="p">)</span>
    <span class="n">page</span><span class="o">.</span><span class="n">goto</span><span class="p">(</span><span class="s2">"https://httpbin.org/status/404"</span><span class="p">)</span>
    <span class="n">browser</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>


<span class="k">with</span> <span class="n">sync_playwright</span><span class="p">()</span> <span class="k">as</span> <span class="n">playwright</span><span class="p">:</span>
    <span class="n">run_test</span><span class="p">(</span><span class="n">playwright</span><span class="p">)</span>
</code></pre></div>
<p>This works because <a href="https://docs.python.org/3/reference/compound_stmts.html#function-definitions">a function body is not evaluated until the function is called</a> and <a href="https://peps.python.org/pep-0227/">functions have access to names defined in their enclosing scope</a>. So Python will look up <code>page</code> only when the event handler is invoked by Playwright; since it’s not defined in the function itself, Python will look for it in the function where the event handler was defined (then in the next enclosing function, if there is one, then in the module and eventually in builtins).</p>
<p>I think this solution solves the most important part of the problem - it allows you to write event handlers that span multiple lines. Technically it is also possible to share these handlers across modules, but you won’t see that often. They can’t be unit tested in isolation, as they depend on their parent function.</p>
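<p>The lookup-at-call-time behavior is easy to observe without Playwright. In this small sketch (the names are made up for illustration), <code>handler</code> is defined before <code>page</code> is assigned, and each call sees whatever value <code>page</code> holds at that moment.</p>

```python
def run():
    def handler():
        # `page` is not assigned inside handler(), so Python resolves it in the
        # enclosing run() scope, and only when handler() is actually called.
        return f"current page: {page}"

    page = "https://mirekdlugosz.com"
    first = handler()
    page = "https://httpbin.org/status/404"
    second = handler()
    return first, second


print(run())
# ('current page: https://mirekdlugosz.com', 'current page: https://httpbin.org/status/404')
```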
<h2 id="functoolspartial"><a class="toclink" href="#functoolspartial"><code>functools.partial</code></a></h2>
<p>The <a href="https://docs.python.org/3/library/functools.html#functools.partial"><code>functools.partial</code> documentation</a> may be confusing: the prose sounds exactly like a description of a standard function, the code equivalent assumes a pretty good understanding of Python internals, and the provided example seems completely unnecessary.</p>
<p>I think about <code>partial</code> this way: it creates a function that has some of the arguments already filled in.</p>
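<p>A minimal illustration of that mental model, with a hypothetical <code>fetch</code> function that merely formats its arguments, standing in for any function with many parameters:</p>

```python
from functools import partial


def fetch(url, method="GET", timeout=30.0, retries=0):
    # Hypothetical stand-in for a flexible function with many parameters;
    # it only describes the call instead of making one.
    return f"{method} {url} (timeout={timeout}, retries={retries})"


# New callables with some arguments already filled in.
post_with_retries = partial(fetch, method="POST", retries=3)
quick_get = partial(fetch, timeout=1.0)

print(post_with_retries("https://example.com/api"))
# POST https://example.com/api (timeout=30.0, retries=3)
print(quick_get("https://example.com"))
# GET https://example.com (timeout=1.0, retries=0)
```

<p>Keyword arguments passed at call time still override the pre-filled ones, so <code>quick_get(url, timeout=5.0)</code> uses a five-second timeout.</p>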
<p>To be fair, <code>partial</code> is rarely <em>needed</em>. It allows you to write shorter code, as you don’t have to repeat the same arguments over and over again. It may also allow you to provide a saner library <span class="caps">API</span> - you can define a single generic and flexible function with a lot of arguments, and a few helper functions intended for external use, each with a small number of arguments.</p>
<p>But it’s invaluable when you have to provide your own function and you don’t control the arguments it will receive. Which is <em>exactly</em> the problem we are facing.</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">functools</span> <span class="kn">import</span> <span class="n">partial</span>
<span class="kn">from</span> <span class="nn">playwright.sync_api</span> <span class="kn">import</span> <span class="n">sync_playwright</span>
<span class="kn">from</span> <span class="nn">playwright.sync_api</span> <span class="kn">import</span> <span class="n">Playwright</span>


<span class="k">def</span> <span class="nf">request_handler</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">page</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">page</span><span class="o">.</span><span class="n">url</span><span class="si">}</span><span class="s2"> issued request: </span><span class="si">{</span><span class="n">request</span><span class="o">.</span><span class="n">method</span><span class="si">}</span><span class="s2"> </span><span class="si">{</span><span class="n">request</span><span class="o">.</span><span class="n">url</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">response_handler</span><span class="p">(</span><span class="n">response</span><span class="p">,</span> <span class="n">page</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">page</span><span class="o">.</span><span class="n">url</span><span class="si">}</span><span class="s2"> received response: </span><span class="si">{</span><span class="n">response</span><span class="o">.</span><span class="n">status</span><span class="si">}</span><span class="s2"> </span><span class="si">{</span><span class="n">response</span><span class="o">.</span><span class="n">url</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">run_test</span><span class="p">(</span><span class="n">playwright</span><span class="p">:</span> <span class="n">Playwright</span><span class="p">):</span>
    <span class="n">browser</span> <span class="o">=</span> <span class="n">playwright</span><span class="o">.</span><span class="n">chromium</span><span class="o">.</span><span class="n">launch</span><span class="p">()</span>
    <span class="n">page</span> <span class="o">=</span> <span class="n">browser</span><span class="o">.</span><span class="n">new_page</span><span class="p">()</span>
    <span class="n">page</span><span class="o">.</span><span class="n">goto</span><span class="p">(</span><span class="s2">"https://mirekdlugosz.com"</span><span class="p">)</span>
    <span class="c1"># partial() returns a new callable that takes only the event argument; page is pre-filled</span>
    <span class="n">local_request_handler</span> <span class="o">=</span> <span class="n">partial</span><span class="p">(</span><span class="n">request_handler</span><span class="p">,</span> <span class="n">page</span><span class="o">=</span><span class="n">page</span><span class="p">)</span>
    <span class="n">local_response_handler</span> <span class="o">=</span> <span class="n">partial</span><span class="p">(</span><span class="n">response_handler</span><span class="p">,</span> <span class="n">page</span><span class="o">=</span><span class="n">page</span><span class="p">)</span>
    <span class="n">page</span><span class="o">.</span><span class="n">on</span><span class="p">(</span><span class="s2">"request"</span><span class="p">,</span> <span class="n">local_request_handler</span><span class="p">)</span>
    <span class="n">page</span><span class="o">.</span><span class="n">on</span><span class="p">(</span><span class="s2">"response"</span><span class="p">,</span> <span class="n">local_response_handler</span><span class="p">)</span>
    <span class="n">page</span><span class="o">.</span><span class="n">goto</span><span class="p">(</span><span class="s2">"https://httpbin.org/status/404"</span><span class="p">)</span>
    <span class="n">browser</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>


<span class="k">with</span> <span class="n">sync_playwright</span><span class="p">()</span> <span class="k">as</span> <span class="n">playwright</span><span class="p">:</span>
    <span class="n">run_test</span><span class="p">(</span><span class="n">playwright</span><span class="p">)</span>
</code></pre></div>
<p>Notice that our functions take the same argument as a Playwright event handler, and then some. When it’s time to assign event handlers, we use <code>partial</code> to create a new function, one that only needs the argument that we will receive from Playwright; the other one is already filled in. But when the function is executed, it will receive both arguments.</p>
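<p>The same mechanism can be tried without launching a browser. In this sketch, <code>FakePage</code>, <code>FakeRequest</code> and <code>FakeEmitter</code> are hypothetical stand-ins invented for illustration; <code>FakeEmitter.on()</code> only mimics the one property of Playwright’s <code>page.on()</code> that matters here, namely that each handler is called with a single argument. It also shows the <code>func</code> and <code>keywords</code> attributes that <code>partial</code> objects expose:</p>

```python
from dataclasses import dataclass
from functools import partial


# Hypothetical stand-ins for Playwright objects, with only the attributes the handler reads.
@dataclass
class FakePage:
    url: str


@dataclass
class FakeRequest:
    method: str
    url: str


class FakeEmitter:
    """Mimics page.on(): each registered handler is called with exactly one argument."""

    def __init__(self):
        self._handlers = {}

    def on(self, event, handler):
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event, payload):
        for handler in self._handlers.get(event, []):
            handler(payload)


log = []


def request_handler(request, page=None):
    log.append(f"{page.url} issued request: {request.method} {request.url}")


page = FakePage(url="https://mirekdlugosz.com/")
emitter = FakeEmitter()

# The emitter will pass only the request; partial supplies page=page.
local_request_handler = partial(request_handler, page=page)
emitter.on("request", local_request_handler)
emitter.emit("request", FakeRequest("GET", "https://httpbin.org/status/404"))

print(log[0])  # https://mirekdlugosz.com/ issued request: GET https://httpbin.org/status/404
# partial objects expose what they wrap and what they pre-fill.
print(local_request_handler.func is request_handler)  # True
print(local_request_handler.keywords)  # {'page': FakePage(url='https://mirekdlugosz.com/')}
```

<p>From the emitter’s point of view, <code>local_request_handler</code> is simply a function of one argument; it never needs to know about <code>page</code>.</p>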
<h2 id="factory-function"><a class="toclink" href="#factory-function">Factory function</a></h2>
<p>Functions in Python can not only define other functions in their bodies, but also return them. Functions that return (or accept) other functions are called “higher-order functions”. They aren’t used often, with the notable exception of <a href="https://realpython.com/primer-on-python-decorators/">decorators</a>.</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">playwright.sync_api</span> <span class="kn">import</span> <span class="n">sync_playwright</span>
<span class="kn">from</span> <span class="nn">playwright.sync_api</span> <span class="kn">import</span> <span class="n">Playwright</span>


<span class="k">def</span> <span class="nf">request_handler_factory</span><span class="p">(</span><span class="n">page</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">inner</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">page</span><span class="o">.</span><span class="n">url</span><span class="si">}</span><span class="s2"> issued request: </span><span class="si">{</span><span class="n">request</span><span class="o">.</span><span class="n">method</span><span class="si">}</span><span class="s2"> </span><span class="si">{</span><span class="n">request</span><span class="o">.</span><span class="n">url</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">inner</span>


<span class="k">def</span> <span class="nf">response_handler_factory</span><span class="p">(</span><span class="n">page</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">inner</span><span class="p">(</span><span class="n">response</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">page</span><span class="o">.</span><span class="n">url</span><span class="si">}</span><span class="s2"> received response: </span><span class="si">{</span><span class="n">response</span><span class="o">.</span><span class="n">status</span><span class="si">}</span><span class="s2"> </span><span class="si">{</span><span class="n">response</span><span class="o">.</span><span class="n">url</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">inner</span>


<span class="k">def</span> <span class="nf">run_test</span><span class="p">(</span><span class="n">playwright</span><span class="p">:</span> <span class="n">Playwright</span><span class="p">):</span>
    <span class="n">browser</span> <span class="o">=</span> <span class="n">playwright</span><span class="o">.</span><span class="n">chromium</span><span class="o">.</span><span class="n">launch</span><span class="p">()</span>
    <span class="n">page</span> <span class="o">=</span> <span class="n">browser</span><span class="o">.</span><span class="n">new_page</span><span class="p">()</span>
    <span class="n">page</span><span class="o">.</span><span class="n">goto</span><span class="p">(</span><span class="s2">"https://mirekdlugosz.com"</span><span class="p">)</span>
    <span class="c1"># Each factory call returns a new inner function that remembers this page</span>
    <span class="n">page</span><span class="o">.</span><span class="n">on</span><span class="p">(</span><span class="s2">"request"</span><span class="p">,</span> <span class="n">request_handler_factory</span><span class="p">(</span><span class="n">page</span><span class="p">))</span>
    <span class="n">page</span><span class="o">.</span><span class="n">on</span><span class="p">(</span><span class="s2">"response"</span><span class="p">,</span> <span class="n">response_handler_factory</span><span class="p">(</span><span class="n">page</span><span class="p">))</span>
    <span class="n">page</span><span class="o">.</span><span class="n">goto</span><span class="p">(</span><span class="s2">"https://httpbin.org/status/404"</span><span class="p">)</span>
    <span class="n">browser</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>


<span class="k">with</span> <span class="n">sync_playwright</span><span class="p">()</span> <span class="k">as</span> <span class="n">playwright</span><span class="p">:</span>
    <span class="n">run_test</span><span class="p">(</span><span class="n">playwright</span><span class="p">)</span>
</code></pre></div>
<p>The key here is that the inner function has access to the whole enclosing scope, including values passed as arguments to the outer function. This allows us to pass specific values that are only available in the place where the outer function is called.</p>
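<p>A stripped-down sketch of the factory approach that runs without Playwright. <code>FakePage</code> is a hypothetical stand-in, and to keep things short, <code>inner</code> takes a URL string and returns a string instead of receiving a Playwright request and printing:</p>

```python
from dataclasses import dataclass


@dataclass
class FakePage:
    # Hypothetical stand-in for a Playwright Page; only .url is needed here.
    url: str


def request_handler_factory(page):
    def inner(request_url):
        # "page" is not a parameter of inner: it is looked up, at call time,
        # in the enclosing scope of the factory call that created inner.
        return f"{page.url} issued request: {request_url}"
    return inner


first = request_handler_factory(FakePage("https://example.com/a"))
second = request_handler_factory(FakePage("https://example.com/b"))

# Each call to the factory produces a separate closure with its own "page".
print(first("https://httpbin.org/get"))   # https://example.com/a issued request: https://httpbin.org/get
print(second("https://httpbin.org/get"))  # https://example.com/b issued request: https://httpbin.org/get

# The captured names and their values are recorded on the function object.
print(first.__code__.co_freevars)          # ('page',)
print(first.__closure__[0].cell_contents)  # FakePage(url='https://example.com/a')
```

<p>Strictly speaking, a closure captures the variable rather than a snapshot of its value; here the difference does not matter, because <code>page</code> is never reassigned inside the factory.</p>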
<h2 id="summary"><a class="toclink" href="#summary">Summary</a></h2>
<p>The first solution is a little different from the other two, because it does not solve all of the problems set forth. On the other hand, I think it’s the easiest to understand: even beginner Python programmers should intuitively grasp what is happening and why.</p>
<p>In my experience, higher-order functions take some getting used to, while <code>partial</code> is not well known and may be confusing at first. But they do solve our problem completely.</p>Are observability and monitoring part of testing?2023-11-26T13:40:52+01:002023-11-26T13:40:52+01:00Mirek Długosztag:mirekdlugosz.com,2023-11-26:/blog/2023/are-observability-and-monitoring-part-of-testing/<p>The Association for Software Testing is crowd-sourcing a book, <a href="https://associationforsoftwaretesting.org/navigating-the-world-as-a-context-driven-tester-book/"><em>Navigating the World as a Context-Driven Tester</em></a>. The book is edited by <a href="https://therockertester.wordpress.com/">Lee Hawkins</a>, who posts questions on <a href="https://twitter.com/AST_News">Twitter</a>, <a href="https://www.linkedin.com/company/association-for-software-testing/">LinkedIn</a>, <a href="https://sw-development-is.social/web/@AST">Mastodon</a>, <a href="https://associationforsoftwaretesting.org/2016/11/13/ast-members-slack/">Slack</a>, and <a href="http://eepurl.com/tCFsn">the <span class="caps">AST</span> mailing list</a>. 
You don’t have to be a member to give your answer to the most recent prompt, “Are observability and monitoring part of testing?”. Mine is below.</p>
<hr>
<p>These terms are most commonly used in the context of operations, and the activities related to them are the responsibility of operators (system administrators, DevOps engineers, site reliability engineers, etc.), not testers.</p>
<p>That’s because the team that develops software (which is where testers usually are) is most often different from the team that deploys the software and makes sure it’s up and running. And there are some good reasons for that.</p>
<p>For one, the technical skills required to do these things well are quite distinct.</p>
<p>Two, there’s a difference in the scope of work. The development team is mainly concerned with the software it develops, and it assumes that certain dependencies are met (computing resources, database, message queue etc.). The operations team’s main job is to ensure these dependencies are indeed available, and it usually works on many projects at the same time.</p>
<p>Historically, these teams were often in <em>different companies</em>. Since users became always online and software vendors moved to the software-as-a-service model, this is less relevant today. However, it’s worth pointing out that there are still security, legal and technical reasons for a software vendor to leave operations to software users.</p>
<p>But the mere observation that most people working in software do not consider observability and monitoring to be a part of testing is not particularly interesting. I think a much more intriguing question is: <em>could</em> they be part of testing? Would it be <em>worth it</em> for them to be a part of testing?</p>
<p>My firm belief is that the core of “testing” is gathering information and learning something about reality. As far as I understand it, observability is <em>precisely</em> that. The main difference is that “testing” happens mostly before a release, usually in a somewhat contrived environment, while observability happens after software has been deployed and refers mostly to the actual production environment, with all its complexity and quirks. Testing is also more directed and intentional, while an observability system may gather all available information, just in case it turns out to be relevant later.</p>
<p>But I don’t think it’s controversial to say that information about how software is actually used (which parts are most visited, for how long, when, in what environments and by how many users) is <em>invaluable</em> to a development team. For testers, it can be extremely useful, helping to design better tests and guiding focus towards “hot” areas where potential bugs would be more visible. It can also be used to design testing environments that better model actual production environments.</p>
<p>The point of monitoring is to alert a team about critical errors and to recover from them as soon as possible. <em>Some</em> of these problems might be relevant and useful for a software development team, either because they were caused by its software, or because the software could be improved to react better to similar errors in the future.</p>
<p>If we agree that observability and monitoring <em>can</em> be useful for testers, one last question remains: what would it take to involve testers in these activities?</p>
<p>I can’t answer that question from my own experience, but I do have some intuitions.</p>
<p>First and most important, there must be a communication channel between the operations and development teams. The development team should be notified about <strong>all</strong> problems, and its members should be able to at least passively participate in post-mortem sessions. They should be able to influence the metrics gathered by the observability system, and they should actively change their own software to expose particularly interesting and useful metrics.</p>
<p>Second, testers themselves need to be willing to participate. Unfortunately, far too many testers are happy to just keep doing things the way they always have. Testers might also be overworked and overextended, and not very happy to take on one more responsibility.</p>
<p>The same applies to operators, who likewise are not dying to have one more thing to do. And in the case of operators, there’s one more thing I have not discussed so far: <em>how exactly</em> would working closely with the development team benefit <em>them</em>?</p>
<p>Finally, success might hinge on buy-in from organization leaders. They need to understand the benefits of this cross-team collaboration and <em>actively make space for it</em>. It’s not enough to set a goal without providing all the resources necessary to fulfill it.</p>10 years in testing2023-11-04T13:39:01+01:002023-11-04T13:39:01+01:00Mirek Długosztag:mirekdlugosz.com,2023-11-04:/blog/2023/10-years-in-testing/<p>Exactly 10 years ago I started my first job as a software tester.</p>
<p>That doesn’t mean I started testing 10 years ago. Back when I was in high school and at university, I did spend some time doing testing and quality-related stuff for various open source projects: Debian, <span class="caps">KDE</span>, Kadu, LibreOffice, Cantata, and some more. I no longer remember which was first and when exactly that happened. I imagine my first contribution was pretty uneventful: perhaps a message on a users’ forum, a response confirming this is a bug, and an encouragement to report it on the bug tracking system or the developers’ mailing list.</p>
<p>Nonetheless, “first job as a software tester” is a good place to start counting. First, it’s easy: I have <em>papers</em> to prove the exact date. Second, from that day I have spent about eight hours a day, five days a week, every week, on testing-related things. That adds up to a lot of time, but it’s the consistency that sets it apart from any open source work I have done. Last but not least, the decision to take this specific job set me on a path of treating testing much more seriously, which eventually led me to where I am today.</p>
<p>I’m not much of a job hopper. In these 10 years, I have only had two employers. But I did change teams and projects quite a lot: I was on 4 projects at my first company, and now I’m on my 5th project at my second company. The longest I’ve ever been on a single project is 2 years and 7 months. Details are on <a href="https://www.linkedin.com/in/mirekdlugosz/en">LinkedIn</a>.</p>
<p>I came into testing after getting a degree in sociology. In my time at university, I had an opportunity to get my feet wet in empirical social research. I approached testing the same way I approached empirical sociology, if only because I didn’t really know anything else: I assumed there are a number of things the team would like to know, and my job is to learn about them and report my findings. The hard part is that we don’t have direct access to some of the things we would like to know more about, so we need to depend on a number of proxies of uncertain reliability. X can be caused by Y, and we observed X, but is this because of Y, or some other factor Z? How can we rule out Z? Today, I can confidently say this is not the worst way to approach testing.</p>
<p>When I started my first job, I had been using Linux as my main operating system for about 7 years. During that time I learned how to use the shell, I got familiar with the idea that things change and move around, and I faced various breakages after updates. Trying to fix them was often frustrating, but I did learn how to search for information, I picked up a few tricks, and I learned how various components can interact in a complex system. That was another major source of experiences that influenced my approach to testing.</p>
<p>I guess I also have certain character traits that helped me become a decent tester. I tend to be stubborn, I don’t give up easily, I self-identify as a perfectionist and I strive to actually <em>understand</em> the thing I am dealing with.</p>
<p>After a year and a half, I decided that I wanted to know more about testing, especially established testing techniques and solutions. My work was praised, but it was all based on intuition and past experiences from other fields. I felt I was missing fundamentals, and I feared I might be missing some obvious and elementary testing techniques or skills. I tried to fill these gaps by attending an <span class="caps">ISTQB</span> preparation course, but it did not deliver what I was looking for.</p>
<p>My manager knew about my disappointment and at one point presented me with the opportunity to attend a testing conference in another city. One of the talks given there was called <a href="https://www.youtube.com/watch?v=RMaFZU2qhUA">“Context-Driven Testing: A New Hope”</a>. This is a funny title, as Context-Driven Testing was already 15 years old at that time and the “schools of testing” debate had long left the community’s consciousness. I don’t remember many details of the talk itself, but I did leave the conference with a feeling that I should learn more about <span class="caps">CDT</span>, as they might have at least some of the answers I was looking for.</p>
<p>I think I started by reading <a href="https://www.goodreads.com/book/show/599997.Lessons_Learned_in_Software_Testing">“Lessons Learned in Software Testing”</a>, and what a book it was! It not only revolutionized the way I think about testing to this day, but also gave me much-needed confidence. I found I was already doing some of the things the book recommended, but now I knew <em>why</em> they were worth doing. This is the book that everyone who is serious about testing should read, and probably re-read throughout their career. I think I read it at a very good moment, too: I had about three years of experience at the time. I feel I wouldn’t have gotten as much from it if I had read it earlier.</p>
<p>Later I read <a href="https://leanpub.com/perfectsoftware">“Perfect Software”</a> by the late Jerry Weinberg. I think this is a great book for people who are just starting in testing. It surely helped to establish some of my knowledge, but I don’t think it was as influential for me as “Lessons Learned”. It would have been if I had read it earlier.</p>
<p>Finally, I read the complete blog archives of <a href="https://www.satisfice.com/">James Bach</a> and <a href="https://developsense.com/">Michael Bolton</a>. This is not something I can recommend to everyone, as both are very prolific writers: each has authored a few hundred articles. I think it took me well over a year to get through them all. Nonetheless, this allowed me to fully immerse myself in their thinking, and I can confidently say I understand where they are coming from and where they are going. This also allowed me to stumble upon a few very valuable articles and resources that I still refer to.</p>
<p>There’s a lot that I learned from all these resources, but I would like to point out two overarching principles that I often come back to. One, my role as a tester is to show possibilities and broaden the view of the team. My job is to go beyond simple and quick answers. Two, every single day I need to ask myself: what is the most important, most impactful thing I can do right now? And then do this exact thing, even if it means putting aside earlier plans and ideas. Change is something to embrace, not to be afraid of.</p>
<p>About five years into my career, I began to slowly move into a more development-heavy role. To some extent, that was out of necessity: I saw many tasks that could be improved with a tiny bit of programming. At the same time, I was in an environment where development was considered higher on the organizational totem pole than “manual testing”, and showing programming skills was a clear path to more respectable assignments and a higher salary. Similar to my testing journey, that was not the moment I started to learn programming: I had written my first shell scripts and Perl programs back in high school. While I did struggle, I felt confident enough in my programming prowess to do some simple things.</p>
<p>The event that really helped me take off to the next level happened about a year after I joined Red Hat. We had a <span class="caps">UI</span> test automation framework, which had recently been rewritten by a couple of contractors. They worked in a silo, and as a result most of the team was not familiar with that code. My job was to learn it, contribute to it and become one of the maintainers.</p>
<p>I think the contractors felt threatened by my presence and thought their job security depended on them being the only people capable of working with the framework. As a result, they made code review a nightmare. They threw everything at me: passive-aggressive comments, unhelpful comments, misleading comments, requests to change code that had already been approved in an earlier review cycle, demands to explain almost every single line of code, and replies arriving anywhere between a day and a week later. That was all on top of working with unfamiliar, complex and barely documented libraries.</p>
<p>I don’t look back at that time with fondness, but I have to admit it was an effective learning exercise. I was forced to understand things above my capabilities, and eventually I did understand them. This was very much the moment programming finally clicked for me. Also, I learned precisely what to avoid during code reviews and when teaching others.</p>
<p>Since then, my interests have moved more in the direction of software design and architecture. I know I can write good enough code that works. But I also want to write code that is maintainable in the long term and allows for adjustments in response to a changing environment or requirements.</p>
<p>In these 10 years, I have primarily been an individual contributor. This is the role I feel comfortable in and which I think suits me well. However, I did act as a kind of team lead on two separate occasions. Both times I was not formally a manager for other people, and I didn’t feel I had all the tools necessary to make them do the required work. The first time I was completely unprepared for the challenge in front of me. The second time went a little bit better, as I knew more about ways to informally influence people.</p>
<p>That’s a rough summary of my 10 years in testing, with the most important highlights. There’s no narrative closure, as I am still here and intend to stay for a while longer. I’m happy to talk about testing, open source, software engineering and related topics, so feel free to <a href="https://mirekdlugosz.com/contact.html">get in touch with me</a> if this is something you find interesting, or if you would like to draw on my experience.</p>The problems with test levels2022-08-15T12:18:00+02:002022-08-15T12:18:00+02:00Mirek Długosztag:mirekdlugosz.com,2022-08-15:/blog/2022/the-problems-with-test-levels/<h2 id="test-levels-in-common-knowledge"><a class="toclink" href="#test-levels-in-common-knowledge">Test levels in common knowledge</a></h2>
<p>A test pyramid usually distinguishes three levels: unit tests, integration tests and end-to-end tests; the last level is sometimes called “<span class="caps">UI</span> tests” instead.
The main idea is that as you move down the pyramid, tests tend to run faster and be more stable, but at the expense of being more isolated.
Only tests on higher levels are able to detect problems in how building blocks work together.</p>
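<p>The isolation trade-off can be sketched with two plain Python test functions (written so they would also run under pytest). Everything here, including <code>Inventory</code>, <code>can_ship</code> and <code>FakeInventory</code>, is a hypothetical example, not code from any project mentioned in this post:</p>

```python
# Hypothetical code under test: a small inventory and a function built on top of it.
class Inventory:
    def __init__(self):
        self._stock = {}

    def add(self, name, qty):
        self._stock[name] = self._stock.get(name, 0) + qty

    def count(self, name):
        return self._stock.get(name, 0)


def can_ship(inventory, name, qty):
    return inventory.count(name) >= qty


class FakeInventory:
    """Test double with canned answers, used to isolate can_ship from Inventory."""

    def __init__(self, counts):
        self.counts = counts

    def count(self, name):
        return self.counts.get(name, 0)


def test_can_ship_unit():
    # Lower level: fast and isolated; only the logic of can_ship is exercised.
    assert can_ship(FakeInventory({"widget": 5}), "widget", 3)
    assert not can_ship(FakeInventory({}), "widget", 1)


def test_can_ship_integration():
    # Higher level: the real Inventory and can_ship work together, so a bug in
    # Inventory.add (e.g. overwriting instead of summing) would fail here but not above.
    inventory = Inventory()
    inventory.add("widget", 2)
    inventory.add("widget", 2)
    assert can_ship(inventory, "widget", 4)


test_can_ship_unit()
test_can_ship_integration()
print("both tests passed")
```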
<p>The <span class="caps">ISTQB</span> syllabus presents a similar idea.
It distinguishes four test levels: component, integration, system and acceptance.
These test levels drive a lot of thought around testing: each level has its own distinct definition and properties, guides responsibility assignment within a team, is aligned with specific test techniques and may be mapped to a phase in the software development lifecycle.
That’s a lot of work!</p>
<p>Both of these categorizations share the idea that a higher level encompasses the level below it, and builds upon it.
There’s also a certain synergy effect at play here: tests at a higher level cover something more than all the tests at the levels below.
That’s why teams with “100% unit test coverage” still get bug reports from actual customers.
As far as I can tell, these two properties - hierarchy and synergy - are shared by all test levels categorizations.</p>
<h2 id="the-problems"><a class="toclink" href="#the-problems">The problems</a></h2>
<p>I have some problems with this common understanding.
In my experience, while test levels look easy and simple, it’s unclear how to apply them in practice.
If you give the same set of tests to two testers, they are likely to group them into test levels in very different ways.
Inconsistencies like that beg the question: are test levels actually a useful categorization tool?</p>
<p>I know because I faced these issues when we tried to standardize test metadata in <a href="https://www.redhat.com/en/technologies/management/satellite">Red Hat Satellite</a>.</p>
<p>One of the things provided by Satellite is host management.
You can create, start, stop, restart or destroy a host.
If you have tests exercising these capabilities, you could file them under the component level, because host management is one of the components of the Satellite system.</p>
<p>Satellite also provides content management.
You can synchronize packages from Red Hat <span class="caps">CDN</span> to your Satellite server and tell your hosts to use that exclusively.
This gives you the ability to specify what content is available, e.g. you can offer a specific version of PostgreSQL until all the apps are tested against a newer version.
This also allows for faster updates, because all the data is already in your data center and you can use a fast local connection to fetch it.
Tests exercising various content management features can be filed under the component level, because content management is one of the components of the Satellite system.</p>
<p>You can set up a host to consume content from a specific content view.
Your test might create a host, create a content view, attach the host to the content view and verify that some packages are or are not available to this host.
You could file such a test under the integration level, because it integrates two distinct components.</p>
<p>But you could also file that test under the system level, because serving a specific, filtered view of all available content to specific hosts based on various criteria is one of the primary use cases of Satellite, and possibly the main reason people are willing to pay money for it.</p>
<p>For the sake of argument, let’s assume that the test above is an integration level test, and the system level is reserved for tests that exercise larger, end-to-end flows.
Something like: create a host, create a content view, sync content to the host, install a specific package update that requires a restart and wait for the host to be back online.</p>
<p>Satellite may be set up to periodically send data about hosts to cloud.redhat.com.
When you test this feature, you might consider Satellite as a whole to be one component and cloud.redhat.com to be another component.
This leads to the conclusion that such a test should be filed under the integration level.</p>
<p>While this conclusion is <em>logical</em> (it follows directly from premises), it doesn’t <em>feel</em> right.
If test levels form a kind of hierarchy, then why is a test that exercises the system as a whole on the integration level?</p>
<p>You can try to eliminate the problem by lifting this test to system level.
But there are still two visibly distinct kinds of tests filed under a single label: some system level tests exercise Satellite as a whole, and some exercise integration between Satellite and an external system.</p>
<p>Either way, your levels become internally inconsistent.</p>
<p>Let’s leave integration and system level for now.
How about acceptance level?</p>
<p>Satellite is a product that is developed and sold to anyone who wants to buy it.
There is no “acceptance” phase in the Satellite lifecycle.
Each potential customer would run their own acceptance testing, and while the team obviously appreciated the feedback from these sessions, it was rarely considered to be a “release blocker”.</p>
<p>Given these circumstances, we decided to create a simple heuristic - if a test covers an issue reported by a customer, then this test belongs on the acceptance level.</p>
<p>Soon we realized that a large number of customer issues were caused by the specific data they used, or the specific environment in which the product operated.
Our heuristic elevated such tests from the component or integration level way up to the acceptance level.</p>
<p>This shows the biggest problem with the acceptance level - it belongs to a completely different categorization scheme.
The acceptance level is not defined by <strong>what</strong> is being tested, but by <strong>who</strong> performs the testing.</p>
<p>Perhaps there was a time when that distinction had only theoretical meaning.
As a software vendor, you built units, integrated them, verified that the system as a whole performed as expected, and sent it to the customer, who would verify that it was fit for purpose.
Acceptance level tests were truly something greater than system level tests.</p>
<p>But we don’t live in such a world anymore.
These days, most software is in perpetual development.
There’s no separate “acceptance” phase, because what is subject to one customer’s acceptance testing is the actual production version for another customer.
If the product is changed based on acceptance testing results, all customers receive that change.</p>
<p>Perhaps placing acceptance testing at the level above system testing only ever made sense in a very specific context - when developing business software tailored to a specific customer that does not subscribe to the “all companies are software companies” world view.</p>
<p>While I do not have this kind of experience, I have heard about a military contractor that had to submit each function for independent verification by <span class="caps">US</span> Army staff, because the army needed to be <em>really</em> sure there was nothing dicey going on in the system.
I find it believable.
I can think of a bunch of reasons why a customer would want to run acceptance tests on units smaller than the whole system.
One of them would be really high stakes - when a bug in a system could mean the difference between life and death.
Another would be when the system is expected to last decades and it’s really important for the customer to obtain certain knowledge and prepare for future maintenance.
Military, government (especially intelligence), medicine and automotive all sound like places where a customer might want to verify parts of the system.</p>
<p>Finally, what about the unit (component) level?
Is <em>that</em> one simple?</p>
<p>Most testers learn to see unit tests as a developer problem - they are created, maintained and run by developers.
Of course you might question this understanding in the world of shifting left, DevTestOps and the “quality is everyone’s responsibility” mantra, but let’s ignore that discussion for now.
If unit tests are a developer problem, we should see what developers think about them.</p>
<p>Apparently, they <a href="https://dev.to/tyrrrz/unit-testing-is-overrated-150e">discuss at length what a unit even <em>is</em></a>.
There’s also an anecdote floating around about a <a href="https://martinfowler.com/bliki/UnitTest.html">person who covered 24 different definitions of unit test in the first morning of their training course</a>.</p>
<h2 id="could-we-do-better"><a class="toclink" href="#could-we-do-better">Could we do better?</a></h2>
<p>I think it’s clear that there are problems with the common understanding of test levels.
But the question remains: are these problems with that specific implementation of the idea, or is the idea of test levels itself completely busted?
Could there be another way of defining test levels?
Would it be free of the problems discussed above?</p>
<p>My thinking about test levels is guided by two principles.
First, levels are hierarchical - a higher level should be built upon things from the level below.
Obviously, the higher level should be, in some way, more than a simple sum of the things below.
Second, it should be relatively obvious to which level a given test belongs.
“Relatively”, because borderline cases are always going to exist in one form or another, and we are humans, so we are going to see things a little differently sometimes.
But these should be exceptions, not the norm.</p>
<p><strong>Function level</strong>.
For the large majority of us, a function is the smallest building block of our programs.
That’s why the lowest level is named after it.
On the function level, your tests focus on individual functions in isolation.
Most of the time, you would try various inputs and verify outputs or side effects.
Of course it helps when your functions are pure and idempotent.
This is the level mainly targeted by techniques like fuzzing and property-based testing.</p>
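<p>As an illustration of the function level, here is a minimal sketch of a property-based test, written with only the Python standard library. The function under test (<code>normalize_whitespace</code>), the random input generator and the three properties are hypothetical examples made up for this sketch. A dedicated library such as Hypothesis generates inputs and shrinks failing cases far better than this hand-rolled loop; the point is only to show what “try various inputs and verify outputs” can look like.</p>

```python
import random
import string


def normalize_whitespace(text: str) -> str:
    """Hypothetical function under test: collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


def random_text(rng: random.Random, max_length: int = 40) -> str:
    """Generate a random string mixing letters with spaces, tabs and newlines."""
    alphabet = string.ascii_letters + " \t\n"
    length = rng.randint(0, max_length)
    return "".join(rng.choice(alphabet) for _ in range(length))


def check_properties(runs: int = 500, seed: int = 0) -> int:
    """Check properties that should hold for any input; return how many inputs were tried."""
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    for _ in range(runs):
        text = random_text(rng)
        result = normalize_whitespace(text)
        # Property 1: idempotence - normalizing a second time changes nothing.
        assert normalize_whitespace(result) == result, repr(text)
        # Property 2: no leading, trailing or doubled spaces survive.
        assert result == result.strip() and "  " not in result, repr(text)
        # Property 3: non-whitespace characters are preserved, in order.
        assert result.replace(" ", "") == "".join(text.split()), repr(text)
    return runs


if __name__ == "__main__":
    print(f"{check_properties()} random inputs passed")
```

<p>Note that none of the assertions names a specific expected output; each one states a rule that must hold for every input, which is what makes this style a good fit for small, isolated functions.</p>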
<p><strong>Class level</strong>.
The name comes from the object-oriented paradigm, where we tend to group functions that work together into classes.
The main goal of tests at this level is to verify integration between functions.
These functions may, but don’t have to, be grouped in a single class.
Since classes group behavior and state, setup code is much more common on this level - you will find yourself ensuring that a class is in a specific state before you can test what you actually care about.
Test cleanup code will also appear more often than on the function level, for the same reason.
Property-based testing is harder to apply at this level.</p>
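<p>A class level test might look roughly like the sketch below, which uses Python’s built-in <code>unittest</code> module. The <code>ShoppingCart</code> class is hypothetical; the point is that <code>setUp</code> has to bring the object into a specific state before any test can check what it actually cares about, and <code>tearDown</code> is where cleanup code lands.</p>

```python
import unittest


class ShoppingCart:
    """Hypothetical class under test: behavior (methods) grouped with state (items)."""

    def __init__(self) -> None:
        self.items = {}  # item name -> quantity
        self.closed = False

    def add(self, name: str, quantity: int = 1) -> None:
        if self.closed:
            raise RuntimeError("cart is closed")
        if quantity < 1:
            raise ValueError("quantity must be positive")
        self.items[name] = self.items.get(name, 0) + quantity

    def remove(self, name: str) -> None:
        del self.items[name]

    def close(self) -> None:
        self.closed = True

    def total_quantity(self) -> int:
        return sum(self.items.values())


class ShoppingCartTest(unittest.TestCase):
    def setUp(self) -> None:
        # Setup: bring the object into the state that the tests actually care about.
        self.cart = ShoppingCart()
        self.cart.add("apple", 2)
        self.cart.add("pear")

    def tearDown(self) -> None:
        # Cleanup: trivial here, but real classes may hold files, sockets or database rows.
        self.cart.close()

    def test_add_and_remove_work_together(self) -> None:
        self.cart.remove("apple")
        self.assertEqual(self.cart.total_quantity(), 1)

    def test_adding_same_item_accumulates(self) -> None:
        self.cart.add("pear", 3)
        self.assertEqual(self.cart.items["pear"], 4)

    def test_closed_cart_rejects_items(self) -> None:
        self.cart.close()
        with self.assertRaises(RuntimeError):
            self.cart.add("plum")
```

<p>Saved as, say, <code>test_cart.py</code>, this can be run with <code>python -m unittest test_cart</code>. Each test checks how several methods cooperate on shared state, rather than one function in isolation.</p>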
<p><strong>Package level</strong>.
This name is inspired by the Python naming convention, where a package is a collection of modules (i.e. functions and classes) that work together to achieve a single goal.
This is also what package level tests are all about - they test interactions between classes, and between classes and functions.
These are the tests that pose the first challenge for the common understanding of test levels.
Some people might consider them integration tests (because there are a few classes working together, and you want to test how well they integrate with each other), while others would consider them unit tests (because a package is designed to solve a single “unit” of the domain problem).
For me, a package is something that is coherent enough to have a somewhat clear boundary with the rest of the system, but not abstract enough to be considered for extraction from the system into a 3rd-party library.
This level might be easier to understand in relation to the next level.</p>
<p><strong>Service level</strong>.
The name comes from the microservice architecture.
We could discuss at length whether microservices are right for you, and whether they are anything more than a buzzword, but that is a discussion for another time.
What’s important is that your project consists of multiple packages (unless you are in the business of creating libraries).
Some of these packages, or some sets of packages, have a very clearly defined responsibility within the system, and boundaries that set them apart from the rest of the system.
At least theoretically, these packages <em>could</em> be extracted into a separate library (or a separate <span class="caps">API</span> service) that your project would pull in as a dependency.
Service level tests focus on these special packages, or collections of packages.</p>
<p>The service level is where things start to become really interesting.
All levels below are focused on code organization.
At the service level, you have to face the question of why you are developing the software at all.
The service level is primarily driven by business needs, and the relationship between them and specific system components.
Some services encapsulate “business logic” - external constraints that the system has to adhere to.
Other services exist only to support these core services or to enable integration with other systems.
Some services are relatively abstract and are likely to be implemented by an open source library (think about a database access service or a user authentication service).</p>
<p>The service level is also where testers traditionally got involved, because some services exist only to facilitate interaction of the system with the outside world.
Think about generating <span class="caps">HTML</span>, sending e-mails, <span class="caps">REST</span> <span class="caps">API</span> endpoints, desktop UIs, etc.</p>
<p><strong>System level</strong>.
For many intents and purposes, “system” is a synonym of “software”.
These days, when everything is interconnected and integrated, it can sometimes be hard to clearly define “system” boundaries.
I would use a handful of heuristics: your customers buy a copy of a system, or a license to use a system, or create an account within a system.
A system is what users interact with.
A system has a name, and this name is known to customers.
A system is subject to your company’s marketing and sales efforts.
Most of the things we know and use every day are systems: Spotify, Netflix, Microsoft Windows, Microsoft Word, …</p>
<p>A lot of systems truly <em>are</em> a collection of services (subsystems).
Most discussions around software architecture focus on how to arrange services in a way that keeps responsibilities and boundaries clear.
For many architects, the end goal is to design a system in a way that makes it possible to swap one service implementation for another without impacting the whole thing.</p>
<p>While this separation is important from a development perspective, it’s also crucial that it is <strong>not</strong> visible to the customer.
If the user <em>feels</em>, or worse - <em>knows</em>, that she is moving from one subsystem to another, more often than not it means that <span class="caps">UX</span> attention is required.</p>
<p>System level tests focus on exercising integration between subsystems and exercising the system as a whole.
Often they will interact with the system through an interface that is known to users - desktop <span class="caps">UI</span>, web page or public <span class="caps">API</span>.
For that reason, system level tests tend to be relatively slow and brittle.
To offset that, you will usually focus only on happy paths and the most important end-to-end journeys.</p>
<p><strong>Offering level</strong>.
Many companies are built around a single product and never reach this level.
But when a company is big enough and offers multiple products, it is usually important that these products work well together.</p>
<p>Today, one of the best examples is Amazon and <span class="caps">AWS</span>.
<span class="caps">AWS</span> provides access to many services, including <span class="caps">EC2</span> virtual machines, S3 storage and <span class="caps">RDS</span> managed databases.
Most of these services are maintained by dedicated teams, and customers may decide to pay for one and not another.
But customers might also decide to embrace <span class="caps">AWS</span> completely.
When they do, it’s <em>really</em> important that setting up an <span class="caps">EC2</span> machine to store data on S3 is easy, ideally easier than using any other cloud storage.
Amazon understands that and offers products that group and connect existing services into ready-to-use solutions for common business problems.</p>
<p>Testing on this level poses unique technical and organizational challenges.
A company’s engineering structure tends to be organized around specific products.
Each product will be built by a different team using a different technology stack and tools, and might have a different goal and target audience.
To effectively test at this level, you need people working across the organization, and you need to fill the gaps that nobody feels responsible for.
Often you need endorsement from the very top of company leadership, because most teams already have more work than they can handle - and if they are to help with offering testing, that must be done at the expense of something else.</p>
<h2 id="but-this-proposal-is-bad"><a class="toclink" href="#but-this-proposal-is-bad">But this proposal is bad</a></h2>
<p>I am not claiming that the above proposal is perfect.
In fact, I can find a few problems with it myself, which I discuss briefly below.
But I think it is a step in the right direction and provides a good foundation that you can adjust to your specific situation.</p>
<p>If we follow the pattern that a higher level is a collection of elements at the level below, we might notice that a function is not the smallest unit - most functions call lower-level routines, such as system calls, and each of those is in turn made up of multiple processor instructions.
I’ve decided to skip these levels, because I don’t have any experience working with systems so low in the stack.
But I imagine people working on programming languages, compilers and processors might have a case for level(s) below the function level.</p>
<p>You might find the name “class level” misleading if you work in a language that does not have classes.
In functional languages, like Lisp or Haskell, it might be more fitting to use “higher-order functions level”.
I don’t think the label is the most important part here - the point is, tests at that level verify integration between functions.</p>
<p>Python naming conventions differentiate between modules and packages.
Without going into much detail, a module is approximated by a single file, and a package by a single directory.
In Python, a package is a collection of modules.
Java also differentiates between modules and packages, but the relationship is inverted - a package is a collection of classes and interfaces, and a module is a collection of related packages.
Depending on your goals and language, it might make sense to maintain both a “module level” and a “package level”.</p>
<p>Unless you are working on microservices, you might prefer to call “service level” a “subsystem level”.
My answer is the same as for “class level” in purely functional languages - it doesn’t matter that much <em>what</em> you call it, as long as you are consistent.
Feel free to use a name that better suits your team and your technology stack’s naming conventions.
The point of the service / subsystem level is that these tests cover a part of the system that has a clearly defined responsibility.</p>
<p>Users these days expect integrations between the various services they use.
Take Notion as an example - it can integrate with applications such as Trello, Google Drive, Slack, Jira and GitHub.
These integrations need to be tested, but it’s unclear to which level these tests belong.
They aren’t system level tests, because they cover the system as a whole and something else.
They aren’t offering level tests either, because Trello, Slack and GitHub are not part of your company’s offering.
I think that sometimes there might be a need for a new level, which we might call the “3rd party integrations level”.
I would place it between the system level and the offering level, or between the service level and the system level.</p>
<h2 id="why-bother-discussing-test-levels-anyway"><a class="toclink" href="#why-bother-discussing-test-levels-anyway">Why bother discussing test levels, anyway?</a></h2>
<p>You tell me!</p>
<p>This article focuses more on the “what” of test levels than on the “why”, but that’s a fair question.
To wrap up the topic, let’s quickly go over some of the reasons why you might want to categorize tests by their levels.</p>
<p>Perhaps you want to track trends over time.
Is most of your test development time spent at function level or service level?
Can you correlate that with specific problems reported by customers?
Does it look like gaps in coverage are emerging from the data?</p>
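<p>Tracking such trends requires little more than recording the level of each test. The sketch below assumes you can export records of which test was added, at which level, and when (from a test management tool, version control history or test markers); the record format, the function names and the sample data are all made up for illustration.</p>

```python
from collections import Counter, defaultdict
from datetime import date

# Levels from the proposal above, ordered from lowest to highest.
LEVELS = ("function", "class", "package", "service", "system", "offering")


def added_tests_by_month(records):
    """Group (test_id, level, date_added) records into {"YYYY-MM": Counter({level: count})}."""
    trend = defaultdict(Counter)
    for test_id, level, added in records:
        if level not in LEVELS:
            raise ValueError(f"{test_id}: unknown test level {level!r}")
        trend[added.strftime("%Y-%m")][level] += 1
    return dict(trend)


def share_per_level(counts):
    """Turn raw counts into the fraction of new tests written at each level."""
    total = sum(counts.values())
    return {level: counts[level] / total for level in LEVELS if counts[level]}


# Made-up records, only to show the shape of the data.
records = [
    ("test_parse_date", "function", date(2024, 1, 8)),
    ("test_cart_total", "class", date(2024, 1, 15)),
    ("test_checkout_flow", "system", date(2024, 2, 2)),
    ("test_invoice_api", "service", date(2024, 2, 20)),
    ("test_refund_api", "service", date(2024, 2, 21)),
]

for month, counts in sorted(added_tests_by_month(records).items()):
    print(month, share_per_level(counts))
```

<p>Rejecting unknown level names early is deliberate: a classification only helps with trends if everyone applies the same set of labels.</p>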
<p>Perhaps you want to gate your tests on the results of tests at the level below.
So first you run function level tests, and once they all pass, you run class level tests, and once they all pass, you run package level tests…
You get the idea.</p>
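<p>The gating idea can be sketched in a few lines of Python. Here each level is paired with a callable that runs that level’s tests and reports whether they all passed; how a level actually gets run (a pytest marker, a separate CI job, a directory of tests) is deliberately left open, and the <code>fake_level</code> helper is a hypothetical stand-in for a real test run.</p>

```python
from typing import Callable, Iterable, List, Optional, Tuple

# A level is a name plus a callable that runs that level's tests and
# returns True only when all of them passed.
Level = Tuple[str, Callable[[], bool]]


def run_gated(levels: Iterable[Level]) -> Tuple[List[str], Optional[str]]:
    """Run levels from lowest to highest, stopping at the first failing one.

    Returns the names of the levels that passed, and the name of the level
    that failed (None when every level passed). Levels above a failure are
    never started.
    """
    passed: List[str] = []
    for name, run_level in levels:
        if not run_level():
            return passed, name
        passed.append(name)
    return passed, None


# --- Usage with hypothetical stand-ins for real test runs ---
started: List[str] = []


def fake_level(name: str, result: bool) -> Callable[[], bool]:
    """Hypothetical helper: pretend to run a level and record that it was started."""
    def run() -> bool:
        started.append(name)
        return result
    return run


levels = [
    ("function", fake_level("function", True)),
    ("class", fake_level("class", True)),
    ("package", fake_level("package", False)),
    ("service", fake_level("service", True)),
]
passed, failed_at = run_gated(levels)
print(passed, failed_at, started)
# → ['function', 'class'] package ['function', 'class', 'package']
```

<p>The service level never starts, because the package level failed first; that is the whole point of gating, since slower, higher-level runs are skipped when their building blocks are already known to be broken.</p>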
<p>Perhaps you have different targets for each level.
Tests on lower levels tend to run faster, while tests on higher levels tend to be more brittle.
So maybe you are <span class="caps">OK</span> with system level tests completing in 2 hours, but for function level tests, finishing in 15 minutes is unacceptable.
And maybe you target a 100% pass rate at the function level, but you understand it’s unreasonable to expect more than a 95% pass rate at the system level.</p>
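<p>Per-level targets like these can be written down as data and checked automatically after each run. In this sketch, the 2-hour and 95% system level values come from the example above; the 5-minute function level limit is an assumed value, since the text only says that 15 minutes is too long. The data classes and the <code>missed_targets</code> function are hypothetical names.</p>

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict, List


@dataclass(frozen=True)
class LevelTarget:
    max_duration: timedelta
    min_pass_rate: float  # fraction of tests that must pass, between 0.0 and 1.0


@dataclass(frozen=True)
class LevelResult:
    duration: timedelta
    passed: int
    total: int


# Hypothetical targets; the 5-minute function level limit is an assumption.
TARGETS: Dict[str, LevelTarget] = {
    "function": LevelTarget(max_duration=timedelta(minutes=5), min_pass_rate=1.0),
    "system": LevelTarget(max_duration=timedelta(hours=2), min_pass_rate=0.95),
}


def missed_targets(level: str, result: LevelResult) -> List[str]:
    """Describe every target that a single run of the given level missed."""
    target = TARGETS[level]
    problems: List[str] = []
    if result.duration > target.max_duration:
        problems.append(f"{level}: took {result.duration}, limit {target.max_duration}")
    pass_rate = result.passed / result.total if result.total else 0.0
    if pass_rate < target.min_pass_rate:
        problems.append(
            f"{level}: pass rate {pass_rate:.2%}, target {target.min_pass_rate:.0%}"
        )
    return problems


# Made-up run results, just to exercise the checks.
print(missed_targets("system", LevelResult(timedelta(minutes=95), passed=960, total=1000)))
# → []
print(missed_targets("function", LevelResult(timedelta(minutes=15), passed=4999, total=5000)))
# → ['function: took 0:15:00, limit 0:05:00', 'function: pass rate 99.98%, target 100%']
```

<p>Keeping targets as plain data makes it easy to tighten or relax them per level without touching the checking logic.</p>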
<p>Perhaps you need a tool to guide your thinking on where testing efforts should concentrate.
As a rule of thumb, you want to test things on the lowest level possible.
As you move up the test level hierarchy, you want to focus on things that are specific and unique to each level.
It’s also generally fine to assume that the building blocks on each level work as advertised, since they were thoroughly tested on the level below.</p>
<p>Whatever you do with test levels, I think it makes sense to use a classification that all team members can apply consistently.
Hopefully the one proposed above will give you some ideas on how to construct such a classification.</p>Comments on ISTQB’s “Vision on the Future of Software Testing”. Appendix2020-02-27T10:38:46+01:002020-02-27T10:38:46+01:00Mirek Długosztag:mirekdlugosz.com,2020-02-27:/blog/2020/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/<p><span class="caps">ISTQB</span>’s “Vision on the Future of Software Testing” made the rounds on social media recently. I’m hopping on to this train with my own commentary.</p>
<p>The paper can be found on the <a href="https://www.istqb.org/references/white-papers.html"><span class="caps">ISTQB</span> website</a> (<a href="https://www.istqb.org/documents/ISTQB_The_Vision_on_the_Future_of_Software_Testing_Final.pdf">direct link</a>). Below, I focus solely on non-content problems with the paper, mostly editing mistakes. I have covered the content in <a href="https://mirekdlugosz.com/blog/2020/comments-on-istqbs-vision-on-the-future-of-software-testing/">part 1</a>.</p>
<p>I do not claim this list to be exhaustive. These are only the things that stood out so much that I noticed them while reading the paper.</p>
<p>In the first section, the words “Vision”, “Future”, “software” and “testing” are in a slightly smaller font than the rest of the text. That effect doesn’t seem to be used anywhere else in the paper. In the middle paragraph, the smaller font is applied to part of a word (in “testers”, the last two letters are bigger than the letters before them).</p>
<figure>
<a href="https://mirekdlugosz.com/blog/2020/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/smaller-font.png">
<img src="https://mirekdlugosz.com/blog/2020/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/smaller-font-min.png" loading="lazy">
</a>
</figure>
<p>In the first paragraph, there is a hard line break and the text continues immediately below. This is the only place where such an effect is used. If it was supposed to indicate a new paragraph, there should be an empty line between the text lines, as in the rest of the paper.</p>
<figure>
<a href="https://mirekdlugosz.com/blog/2020/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/line-break.png">
<img src="https://mirekdlugosz.com/blog/2020/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/line-break-min.png" loading="lazy">
</a>
</figure>
<p><span class="dquo">“</span><span class="caps">ISTQB</span>” is consistently followed by the registered trademark symbol, except for the third occurrence, where it is not.</p>
<figure>
<a href="https://mirekdlugosz.com/blog/2020/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/ISTQB-registered.png">
<img src="https://mirekdlugosz.com/blog/2020/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/ISTQB-registered-min.png" loading="lazy">
</a>
</figure>
<p>The main heading on page 4 is not centered correctly. The middle line (“For”) particularly stands out, as the vertical center line splits it in a 3/8 to 5/8 proportion. The last line is slightly shifted to the right as well.</p>
<figure>
<a href="https://mirekdlugosz.com/blog/2020/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/non-centered-text.png">
<img src="https://mirekdlugosz.com/blog/2020/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/non-centered-text-min.png" loading="lazy">
</a>
</figure>
<p>In the Acknowledgements, there is a stray comma at the beginning of one line.</p>
<figure>
<a href="https://mirekdlugosz.com/blog/2020/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/stray-comma.png">
<img src="https://mirekdlugosz.com/blog/2020/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/stray-comma-min.png" loading="lazy">
</a>
</figure>
<p>Bill Hefley’s name is misspelled in the Acknowledgements as “Helfley”. I find this particularly irksome, as he’s one of the “thought leaders”. In my book, you are not showing respect for a person when you can’t be bothered to spell their name correctly.</p>
<figure>
<a href="https://mirekdlugosz.com/blog/2020/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/misspelled-Hefley.png">
<img src="https://mirekdlugosz.com/blog/2020/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/misspelled-Hefley-min.png" loading="lazy">
</a>
</figure>
<p>Despite the claim, the list of “thought leaders” in the Acknowledgements is not in alphabetical order. Capers Jones should come after Bill Helfley for the list to be correctly sorted. Not to mention that it’s customary to sort people by last name, not by first name.</p>
<figure>
<a href="https://mirekdlugosz.com/blog/2020/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/names-order.png">
<img src="https://mirekdlugosz.com/blog/2020/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/names-order-min.png" loading="lazy">
</a>
</figure>
<p>There are two pages of “clarifications” for a paper that is two and a half pages long on its own. These proportions are way off and might be a sign that the main body of work did an extremely poor job of driving the message home.</p>
<p>Overall, I find it strange to have “clarifications” at all. Who wrote them, when, and why? If they are here to clarify misunderstandings that became apparent before the final publication, then the paper should be rewritten to avoid them. If the need for clarification became clear after the first publication, it’s probably still better to rewrite the paper and publish a new edition. The original paper and a summary of changes could remain available for reference and for transparency’s sake.</p>
<p>The general editing sloppiness of the published paper is just embarrassing. Especially surprising are mistakes that must have been introduced on purpose – things like a slightly smaller font can’t happen when you just type text in Word. I might as well end on the sarcastic note that <span class="caps">ISTQB</span> envisions a future where machines do humans’ work, but they can’t properly automate the editing of a document the length of a high school homework assignment.</p>Comments on ISTQB’s “Vision on the Future of Software Testing”2020-02-27T10:34:23+01:002020-02-27T10:34:23+01:00Mirek Długosztag:mirekdlugosz.com,2020-02-27:/blog/2020/comments-on-istqbs-vision-on-the-future-of-software-testing/<p><span class="caps">ISTQB</span>’s “Vision on the Future of Software Testing” made the rounds on social media recently. I’m hopping on to this train with my own commentary.</p>
<p>The paper can be found on the <a href="https://www.istqb.org/references/white-papers.html"><span class="caps">ISTQB</span> website</a> (<a href="https://www.istqb.org/documents/ISTQB_The_Vision_on_the_Future_of_Software_Testing_Final.pdf">direct link</a>). Feel free to read it first, and decide for yourself if my commentary has any merit. Or keep on reading, and see if <span class="caps">ISTQB</span>’s paper is worth your time.</p>
<p>Below, I focus solely on content. <span class="caps">ISTQB</span>’s paper has many editing mistakes, which I cover in <a href="https://mirekdlugosz.com/blog/2020/comments-on-istqbs-vision-on-the-future-of-software-testing-appendix/">part 2</a>.</p>
<h2 id="introduction"><a class="toclink" href="#introduction">Introduction</a></h2>
<p>The first thing you might notice is that while the document deals with the future, that future is never specified. The only reference to a time frame is in the following sentence: “The feedback we received [from surveyed Thought Leaders], while extremely interesting and practical was focused more on the near-term aspects of testing”. The way I read it, the surveyed people gave opinions on the near-term future, while the writers wanted to focus on a more distant future. As a side note, “near-term” is never defined either. We just don’t know if <span class="caps">ISTQB</span> means months, years or decades, and that makes all the difference when interpreting and discussing the vision.</p>
<p>In the introductory paragraphs, they say “We surveyed a group of twenty Thought Leaders in the <span class="caps">IT</span> testing discipline, such as Capers Jones, Bill Hefley, Robin Poston, Harry Sneed, to list a few.” All names can be found on page 6. I fully understand that it speaks much more about me than about <span class="caps">ISTQB</span>, and perhaps I am showing ignorance right now, but I have never heard of most of these people. The list of surveyed “thought leaders” is missing many people whom I would consider much more influential or relevant to the topic at hand, starting with <a href="https://lisacrispin.com/">Lisa Crispin</a> – who actually works at a <a href="https://www.mabl.com/">company providing a self-healing, <span class="caps">AI</span>-powered test automation tool</a>.</p>
<p>As is customary for <span class="caps">ISTQB</span>, the process leading to the formulation of the vision is completely non-transparent. Who selected the “thought leaders”, and what were the criteria for selecting these specific people? Did they survey everyone they wanted, or did some people decline? When were the interviews conducted, and how? What questions were asked, or what topics were covered during the conversations? Did respondents answer independently? Could they revise their answers after reading answers from other people, or an early version of the report?</p>
<h2 id="vision"><a class="toclink" href="#vision">Vision</a></h2>
<p>If you haven’t read the original paper (and I can’t blame you), here’s a summary of the vision. In the future, there will be “testing technology solutions” which will somehow obtain all the domain and expert knowledge that people have, and will use it to prepare a testing plan that answers “what?”, “how?” and “why?”. Before actually doing anything, these solutions will present that plan for acceptance by a testing professional. At this stage, the tester will be able to make changes to the plan. These changes, along with changes made in other solutions (the existence of some global information exchange network for testing technology solutions is assumed), will serve as a base for further learning and improvement of the solutions. It’s also stressed that testing technology solutions will select quality attributes on their own, and will “teach” these attributes to human stakeholders.</p>
<p>Overall, that vision is something taken out of science-fiction movies from the 1960s. Today, we are as close to fulfilling it as we were in the 1960s.</p>
<h3 id="conversing-with-computer"><a class="toclink" href="#conversing-with-computer">Conversing with computer</a></h3>
<p>The vision calls for testers and testing experts to “join in dialog” with testing technology solutions. It’s built on the assumption that humans could have something resembling an intelligent conversation with a machine.</p>
<p>I understand how, to an untrained eye, it might look like we are close to creating systems capable of such feats. After all, we can already speak to machines. We can ask them to call a hairdresser, and they can put the scheduled visit in our calendars! But machines can’t talk with the receptionist on our behalf to book the visit, and they can’t give an opinion on which haircut would suit us best. And there is a world of difference in the complexity of these tasks.</p>
<p>As a side note, the presented vision is incoherent in that area. On one hand, testing technology solutions are supposed to learn from experts without people bending too much to machines, present information in a clear, easy to understand way, and allow humans to introduce modifications and predict the results of these modifications before committing to them. On the other hand, testing professionals are expected to greatly broaden their knowledge about these systems, mostly through time-consuming and costly formal education and certification schemes. I just can’t see how we could have both of these at the same time.</p>
<h3 id="global-information-exchange-network"><a class="toclink" href="#global-information-exchange-network">Global information exchange network</a></h3>
<p>I love free software and open standards – I find the idea of a global network for information exchange between testing technology solutions very appealing. At the same time, it’s painfully obvious to me that the current machine learning revolution eludes established free software procedures.</p>
<p>Machine learning models need huge amounts of data, and that data is rarely shared openly. Designers and maintainers of algorithms rarely give a second thought to their reproducibility - understood here as the ability to bootstrap the current revision of an algorithm from its first revision and the initial set of data. Which is not surprising, given that these algorithms are closely guarded trade secrets, and rebuilding them from scratch is not something that anyone in the company would ever need to do.</p>
<p>It’s not clear why anybody would want to share data on this global network. After all, from a business point of view it only makes sense to gain a competitive edge by gathering data shared by others, while never revealing your own. This is known as the <a href="https://en.wikipedia.org/wiki/Free-rider_problem">free-rider problem</a>. The only realistic solution is strict policing, which requires the network to be neither global nor open.</p>
<p>Interestingly enough, there is an ongoing discussion in the open source/free software community on how to prevent large actors from making huge profits while offloading all the costs onto a small group of creators. That problem is known as “open source sustainability”, and the thing is, so far nobody has really figured it out.</p>
<h3 id="algorithm-picking-up-its-own-performance-metrics"><a class="toclink" href="#algorithm-picking-up-its-own-performance-metrics">Algorithm picking up its own performance metrics</a></h3>
<p>According to <span class="caps">ISTQB</span>, testing technology solutions will be able to “learn the testing discipline” and “teach all stakeholders the quality attributes”. In other words, future solutions will select their own tasks and performance metrics. I am speechless. Only someone who completely ignored everything that has been happening around artificial intelligence in the last decade could say something like that.</p>
<p>The end task is one of the defining properties of machine learning algorithms, if not the most important one. It says what the algorithm is supposed to be doing. A maliciously defined task allows one to lie and mislead while maintaining a facade of objectivity. An ill-defined task results in the algorithm inheriting and encoding negative phenomena, like racism and other forms of discrimination. An undefined task makes it impossible to judge and check the algorithm.</p>
<p>Performance metrics tell how well the algorithm appears to be doing its job. They are one of the primary defenses against overfitting – a situation where the algorithm performs suspiciously well in the laboratory, on the data it learned from, which most likely means it will perform poorly in a real-world setting.</p>
<p>Tasks and performance metrics are the primary devices for controlling artificial intelligence. Giving them to the machine means losing control over algorithms, and there is no way that could end well. You might think of <em>The Terminator</em>, but really, look at <a href="https://en.wikipedia.org/wiki/Weapons_of_Math_Destruction"><em>Weapons of Math Destruction</em></a> for multiple examples of the real, negative impact that machines have on us <strong>right now</strong>.</p>
<h3 id="artificial-intelligence-ethics"><a class="toclink" href="#artificial-intelligence-ethics">Artificial intelligence ethics</a></h3>
<p>As if somewhat aware of the negative impact that artificial intelligence has on society, <span class="caps">ISTQB</span> decided to include a statement about ethics. Unfortunately, they did so in a completely unsatisfactory manner, by hand-waving the whole issue away:</p>
<blockquote>
<p>The ethical aspects of decision making by artificial intelligence (hopefully) will be resolved and this requires a level of human intervention in the testing technology solutions where ethical judgment is concerned.</p>
</blockquote>
<p>Reflection on ethics is as old as humanity, and some of the greatest minds in history spent significant time trying to “solve” ethics (usually by trying to ground it in some unshakeable foundation, or derive it logically from first principles). And yet, we haven’t progressed <em>that much</em> since the Bible. Our tools, language and ways of thinking are much more sophisticated, but we aren’t much closer to saying we are done than we were 2000 years ago. Ethics might as well be unsolvable, and expecting it to be solved in the next couple of decades is not particularly reasonable.</p>
<h3 id="explainable-artificial-intelligence"><a class="toclink" href="#explainable-artificial-intelligence">Explainable artificial intelligence</a></h3>
<p>According to the vision, the machine itself will be able to explain the various decisions it has made. This is a commendable goal. Many politicians, scientists and journalists have expressed the need to go in that direction, and some very important work is being done to move us a little closer to it.</p>
<p>But the major problem is that the biggest players on the market have very little incentive to prioritize explainability of their models. Quite the opposite: they have reasons to ensure that nobody else understands the inner workings of these models. After all, you can make some good money this way.</p>
<p>In fact, things are even worse than that. If reports from giants like Google and Facebook are to be trusted, these companies make a point of ensuring that their own employees working on these algorithms don’t fully understand how their product will be used and whom it will serve.</p>
<p>Fostering openness and transparency in such an environment is not easy. Changes won’t happen overnight and will require hard work and dedication from many actors, including governments. It seems the community is just starting to realize we have a problem; the time for solutions is yet to come.</p>
<h2 id="preparation-and-implementation"><a class="toclink" href="#preparation-and-implementation">Preparation and Implementation</a></h2>
<p>In the section following the vision, <span class="caps">ISTQB</span> says that testers will have to gain “soft skills” such as decision-making, leadership, interpersonal communications, project management, and teamwork.</p>
<p>Formal education and certification schemes, including courses and certifications that do not exist yet, are presented as the best way of acquiring these skills. For a future where machines think and converse, you might consider these rather traditional, if not a little dated. Of course, it’s not very surprising in a paper coming from a company whose main business <strong>is</strong> certification.</p>
<p>On a side note, one might wonder why <span class="caps">ISTQB</span> thinks testers don’t have these skills already, or aren’t actively working on acquiring them. The “quality coach” role is not exactly a novel concept.</p>
<h2 id="responsibilities-of-the-testing-professionals"><a class="toclink" href="#responsibilities-of-the-testing-professionals">Responsibilities of the Testing Professionals</a></h2>
<p>The closing section, “Responsibilities of the Testing Professionals”, is… surprisingly good! There isn’t anything particularly silly, uninformed or controversial in these items.</p>
<p>I don’t believe the presented vision will become reality anytime soon. But in a world where it is realized, or is about to be, the last section describes a good and constructive attitude. In another time and place, I could support this part of the paper.</p>
<h2 id="summary"><a class="toclink" href="#summary">Summary</a></h2>
<p>My biggest gripe with the <span class="caps">ISTQB</span> paper is that the future it describes is never given a time frame. That makes it hard to have any kind of discussion about their vision, as the time frame determines which arguments apply.</p>
<p>For the short-term future (a couple of years), the paper is overly optimistic at best, helplessly incompetent at worst. While we can already see early attempts at systems similar to what they describe, an enormous amount of work is still required to fully realize such a vision. I don’t find it likely for this work to be completed in just a couple of years, given that some actors are only starting to realize there is a problem, and others are actively pushing in other directions.</p>
<p>For the long-term future (a couple of decades), the vision is as good as any other reaching so far ahead. Some of the predictions will never become reality, while other fields will advance beyond anything we could imagine today. What’s not very clear is why <span class="caps">ISTQB</span> decided to write a science-fiction story like this, and why they published it themselves instead of submitting it to a relevant magazine.</p>
<p>What’s striking is the complete misalignment between the presented vision and everything that <span class="caps">ISTQB</span> has created so far. It might even be argued that following current <span class="caps">ISTQB</span> teachings will actively hamper any efforts to realize that vision. If there is one thing in the paper that should be taken seriously and worked on right now, it’s that the certification industry must seriously review and update its education to face the coming future. But that’s a lesson mainly for <span class="caps">ISTQB</span>, not so much for the wider testing community.</p>Which job title I prefer?2019-11-26T10:32:25+01:002019-11-26T10:32:25+01:00Mirek Długosztag:mirekdlugosz.com,2019-11-26:/blog/2019/which-job-title-i-prefer/<p>I overthink a question asked by a friend.</p>
<p>I overthink a question asked by a friend.</p>
<p>Recently, a friend asked me: “What job title best describes what you do at Red Hat?” He also gave the following list of options to choose from:</p>
<ul>
<li>Quality Engineer</li>
<li>Software Developer</li>
<li>Software Developer Engineer</li>
<li>Software Developer Engineer in Test</li>
<li>Software Developer Engineer in Test Automation</li>
<li>Software Engineer</li>
<li>Software Engineer in Test</li>
<li>Software Engineer in Test Automation</li>
<li>Automation Engineer</li>
<li>Automation Engineer in Test</li>
<li>Tester</li>
</ul>
<p>Initially, I wanted to pick “Tester”. I have always self-identified as a tester, even more so since I started to better understand what it means. </p>
<p>But one thing stopped me from selecting this option right away – most of the work I have done in the last couple of months isn’t something that we usually associate with testers.</p>
<ul>
<li>I created a tool that gathers data scattered over multiple sources and creates a single master list that acts as the source of truth for our automation. I also updated existing tools to use this data, and added a new tool that publishes a human-readable version of the master list on our internal wiki.</li>
<li>I took on the role of maintainer of our <span class="caps">UI</span> automation framework. I am responsible for setting the general direction of the project, but mostly I triage reported issues and review code submissions. I also help others by teaching them how to use the framework, answering various questions and giving code pointers or implementation drafts.</li>
<li>I review PRs in our automation repositories. Many of them. My guesstimate is that I have seen at least 75% of all code submitted in the last 6 months. Usually I focus on simplification opportunities and on making sure that our checks are adequate and not shallow.</li>
</ul>
<p>Since these are not activities that spring to mind when I say “tester”, perhaps another name would be better suited?</p>
<p>I kind of like “Software Developer Engineer in Test” (<span class="caps">SDET</span>), and that could be my second choice. As a community, we talk a lot about cross-functional teams, about how there is no dedicated tester role in an agile team, about shifting left, about quality being everyone’s responsibility, and about testers today acting as “quality coaches”. So I can probably call myself a software developer, just like other people working on the same product do, especially since the activities listed above often fall under developers’ responsibilities.</p>
<p><span class="dquo">“</span>Software developer in test” also reminds me of a <a href="https://en.wikipedia.org/wiki/T-shaped_skills">T-shaped person</a>. The idea is that every team member has some knowledge about a lot of things, and very deep knowledge about one specific thing, like JavaScript, databases, or <span class="caps">CI</span> systems. I find this fitting, as I do know a thing or two about multiple software-related topics, but my main area of expertise is testing.</p>
<p>Alas, my understanding of the title has implications that are not commonly shared in the wider community. Ever since “<span class="caps">SDET</span>” was introduced, it has meant someone focused solely on “test automation”, frameworks and tools. I feel that all articles describing the role paint a picture of a person who has one solution to all problems – and that solution is writing more code. SDETs will happily discuss maintainability, extensibility or code cleanliness, but won’t feel comfortable in a conversation about oracles. In my experience, their frameworks will be top-notch, but they will use them for rather shallow testing.</p>
<p>Perhaps there is another way to reconcile the “tester” label with the things I am doing. Maybe the defining property of a role is not what you do, but what you are prepared to do?</p>
<p>I am writing tools, maintaining frameworks and reviewing code because in my organizational context we decided that this is the best use of my time and skills. But there are multiple other things I could be doing, if we decided those contributions were more valuable. I might provide feedback on a new feature or product at any stage of development, give feedback on documentation or a planned conference talk, verify developers’ work, triage bug reports, create internal documentation, write guidelines, or design and improve processes. I did these things in the past and I will not refuse a request to do them again.</p>
<p>In other words, maybe the key to understanding a role lies in its boundaries: boundaries that can be determined by observing which activities the role-holder accepts and which they refuse. For me, “tester” is an all-encompassing label that includes all activities related to measuring and improving the quality of the product, the team behind it, and their organizational and cultural environment. While an <span class="caps">SDET</span> might refuse to perform “manual testing”, a tester will happily do whatever is necessary.</p>
<p>It’s easy to get lost in discussions about job titles. There are multiple definitions floating around, and people act as if their definition were the only “true” one. At the end of the day, it might be a good idea to take a step back and acknowledge that often this is not the most important issue at hand. I am committed to excelling at the craft of testing, and this goal will remain unchanged whether my contract says “quality engineer”, “software developer in test” or “tester”.</p>Simple visual regression checking with Selenium and ImageMagick2019-11-24T20:48:57+01:002019-11-24T20:48:57+01:00Mirek Długosztag:mirekdlugosz.com,2019-11-24:/blog/2019/simple-visual-regression-checking-with-selenium-and-imagemagick/<p>I wanted to ensure that a recent change did not break backwards compatibility, and I ended up with a visual regression checking script built with freely available software.</p>
<p>I wanted to ensure that a recent change did not break backwards compatibility, and I ended up with a visual regression checking script built with freely available software.</p>
<p>Recently, I switched the object ids used by <a href="https://createpokemon.team/">createpokemon.team</a>. One of the steps in the entire process was creating a backwards compatibility layer – these ids are exposed in the <span class="caps">URL</span>, and there might be bookmarks and links posted around which could suddenly stop loading some data. In my quest to gain confidence that this solution works, I created a simple visual regression checking tool.</p>
<h2 id="talk-is-cheap-show-me-the-code"><a class="toclink" href="#talk-is-cheap-show-me-the-code">Talk is cheap, show me the code!</a></h2>
<p><a href="https://github.com/mirekdlugosz/scrapbook/tree/master/create-pokemon-team-visual-diff">The completed solution is hosted on GitHub</a>. This post is interspersed with code samples, but they are not intended to work fully on their own.</p>
<h2 id="testing-goals-and-strategy"><a class="toclink" href="#testing-goals-and-strategy">Testing goals and strategy</a></h2>
<p>The overarching goal of this activity was the rather vague “demonstrating that existing URLs continue to work”.</p>
<p>There are two main sources of “existing URLs”. One is the version deployed to production. I can fill in the form, copy part of the <span class="caps">URL</span> and test the new version against it. Since I know how the backwards compatibility procedure works, I can come up with data that might be problematic, as well as reference data that should not be problematic.</p>
<p>The other source is real URLs that real users navigated to out in the wild. Thankfully, I added Google Analytics to the website, and it does provide a comprehensive list of all URLs, along with the number of visits for each. With that data, I can prioritize checking the Pokemon, moves and teams that are most popular.</p>
<p><span class="dquo">“</span>Continue to work” means two things: that the form is populated with the team data provided in the <span class="caps">URL</span>, and that the analysis outcome is unchanged.</p>
<p>Since these are questions about data, it’s only natural to think about them in isolation from presentation. That reasoning would set us on a path that includes gathering data from the website – and since there is no machine-readable output available, that means scraping. But we can abuse the fact that there were no changes in the <span class="caps">UI</span>, so the same output will be presented in the same way. If there is no visible difference between the old and new versions, then the data in both is sure to be the same. We don’t need to know what the data actually is.</p>
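<p>The samples in this post skip loading the URLs. A minimal sketch of that step follows, assuming the Google Analytics report was exported to CSV with a <code>Page</code> column holding the path and a <code>Pageviews</code> column holding the visit count; both column names and the made-up paths in the demo are assumptions, since real exports may use different headers. Sorting by visits puts the most popular teams at the front of the queue.</p>

```python
import csv
import io


def load_popular_urls(csv_file, limit=None):
    """Return URL paths ordered from most to least visited.

    Assumes hypothetical "Page" and "Pageviews" columns; rename them
    to match the headers of the actual export.
    """
    rows = []
    for row in csv.DictReader(csv_file):
        # Counts may be formatted with thousands separators, e.g. "1,234"
        visits = int(row["Pageviews"].replace(",", ""))
        rows.append((visits, row["Page"]))
    # Most visited first; Python's sort is stable, so ties keep file order
    rows.sort(key=lambda item: item[0], reverse=True)
    urls = [page for _, page in rows]
    return urls if limit is None else urls[:limit]


# Made-up export with three paths
sample_export = io.StringIO(
    "Page,Pageviews\n"
    "/team-a,12\n"
    '/team-b,"1,234"\n'
    "/team-c,7\n"
)
print(load_popular_urls(sample_export, limit=2))  # → ['/team-b', '/team-a']
```

<p>The returned list can be fed straight into the <code>teams</code> variable used by the samples below.</p>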
<h2 id="capturing-screenshot-with-selenium"><a class="toclink" href="#capturing-screenshot-with-selenium">Capturing screenshot with Selenium</a></h2>
<p>In the first iteration of my work, I focused on capturing a screenshot automatically. To do that, I need to open a web browser, navigate to the required page, give all client-side operations time to complete, actually capture an image of the visible site content and save it to disk. This can be done in just a couple of lines of code:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">random</span>
<span class="kn">import</span> <span class="nn">time</span>
<span class="kn">from</span> <span class="nn">selenium</span> <span class="kn">import</span> <span class="n">webdriver</span>
<span class="n">teams</span> <span class="o">=</span> <span class="p">[]</span> <span class="c1"># loading URLs is skipped for brevity</span>
<span class="n">team</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">teams</span><span class="p">)</span>
<span class="n">chrome_options</span> <span class="o">=</span> <span class="n">webdriver</span><span class="o">.</span><span class="n">ChromeOptions</span><span class="p">()</span>
<span class="n">driver</span> <span class="o">=</span> <span class="n">webdriver</span><span class="o">.</span><span class="n">Chrome</span><span class="p">(</span><span class="n">options</span><span class="o">=</span><span class="n">chrome_options</span><span class="p">)</span>
<span class="n">base_url</span> <span class="o">=</span> <span class="s1">'http://localhost:4200'</span>
<span class="n">driver</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">base_url</span><span class="si">}{</span><span class="n">team</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
<span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
<span class="n">driver</span><span class="o">.</span><span class="n">save_screenshot</span><span class="p">(</span><span class="s1">'/tmp/selenium.png'</span><span class="p">)</span>
<span class="n">driver</span><span class="o">.</span><span class="n">quit</span><span class="p">()</span>
</code></pre></div>
<p>After confirming that it indeed opens the required page and saves a screenshot, I added two command line flags:</p>
<div class="highlight"><pre><span></span><code><span class="n">chrome_options</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">'--headless'</span><span class="p">)</span>
<span class="n">chrome_options</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">'--window-size=1920,2160'</span><span class="p">)</span>
</code></pre></div>
<p>This way the browser opened by the script is not visible on screen, so I can use the computer without the risk of interfering with the automation. I increased the window height to capture the entire page content in a single screenshot.</p>
<h2 id="visual-difference-between-two-images"><a class="toclink" href="#visual-difference-between-two-images">Visual difference between two images</a></h2>
<p>Thanks to the <a href="https://imagemagick.org">ImageMagick</a> library and set of tools, a visual difference between two images can be produced with a single command:</p>
<div class="highlight"><pre><span></span><code>compare<span class="w"> </span>-compose<span class="w"> </span>src<span class="w"> </span>FIRST_FILE<span class="w"> </span>SECOND_FILE<span class="w"> </span>OUTPUT_FILE
</code></pre></div>
<p>I ran my script twice and saved the page screenshots as two distinct files. After feeding them to the above command, I obtained this (click to see full size):</p>
<figure>
<a href="https://mirekdlugosz.com/blog/2019/simple-visual-regression-checking-with-selenium-and-imagemagick/simple-visual-regression-checking-with-selenium-and-imagemagick/sample-difference.png">
<img src="https://mirekdlugosz.com/blog/2019/simple-visual-regression-checking-with-selenium-and-imagemagick/simple-visual-regression-checking-with-selenium-and-imagemagick/sample-difference-min.png" title="Sample visual difference between two teams" alt="Sample visual difference between two teams" loading="lazy">
</a>
<figcaption>Sample visual difference between two teams</figcaption>
</figure>
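<p>Reviewing hundreds of diff images by eye does not scale, so it may help to get a number as well. The sketch below wraps the same command in Python and adds <code>-metric AE</code>, which makes <code>compare</code> print the count of differing pixels to standard error. It assumes ImageMagick 6, where the binary is called <code>compare</code> (ImageMagick 7 also accepts <code>magick compare</code>), and relies on the exit codes used by recent versions: 0 for similar images, 1 for dissimilar ones, 2 for errors. The function names and file paths are hypothetical.</p>

```python
import subprocess


def compare_command(first, second, output, binary=("compare",)):
    """Build the compare invocation; ("magick", "compare") suits ImageMagick 7."""
    return [*binary, "-metric", "AE", "-compose", "src",
            str(first), str(second), str(output)]


def parse_ae(stderr_text):
    """Read the differing-pixel count from compare's -metric AE output.

    Only the first token is used, since some versions append extra
    information; float() also copes with values like "1.2e+06".
    """
    return float(stderr_text.split()[0])


def pixel_difference(first, second, output):
    """Write the diff image to output and return the differing-pixel count."""
    result = subprocess.run(compare_command(first, second, output),
                            capture_output=True, text=True)
    if result.returncode == 2:
        # 2 signals an error, for example images with different dimensions
        raise RuntimeError(result.stderr.strip())
    # 0 (similar) and 1 (dissimilar) both report the metric on stderr
    return parse_ae(result.stderr)
```

<p>With this in place, <code>pixel_difference("expected_results/a.png", "actual_results/a.png", "diff/a.png")</code> returning <code>0.0</code> would mean the two pages rendered identically, pixel for pixel.</p>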
<h2 id="creating-safe-filenames"><a class="toclink" href="#creating-safe-filenames">Creating safe filenames</a></h2>
<p>I want to be able to trace an image with differences back to the <span class="caps">URL</span> that triggered them, in case I need to analyse them in closer detail.</p>
<p>Using the <span class="caps">URL</span> as the image name seems natural. Unfortunately, a full team definition can be quite lengthy (the longest <span class="caps">URL</span> in my sample is 528 characters long), and the ext4 file system limits file name length to 255 bytes, which for a plain ASCII string means 255 characters. This is often not enough. On top of that, a <span class="caps">URL</span> contains <code>/</code> characters, which cannot appear in a file name.</p>
<p>To keep file names unique while limiting their length, I decided to use a hash (checksum) of the <span class="caps">URL</span> string as the file name. To meet the traceability requirement, I stored both the hash and the <span class="caps">URL</span> in a separate file.</p>
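<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">hashlib</span>
<span class="kn">import</span> <span class="nn">random</span>
<span class="kn">import</span> <span class="nn">time</span>
<span class="kn">from</span> <span class="nn">selenium</span> <span class="kn">import</span> <span class="n">webdriver</span>
<span class="n">teams</span> <span class="o">=</span> <span class="p">[]</span> <span class="c1"># loading URLs is skipped for brevity</span>
<span class="n">chrome_options</span> <span class="o">=</span> <span class="n">webdriver</span><span class="o">.</span><span class="n">ChromeOptions</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">fs_sanitize</span><span class="p">(</span><span class="n">string</span><span class="p">):</span>
    <span class="c1"># SHA-256 digest of the URL, hex-encoded: always 64 characters</span>
    <span class="n">hash_</span> <span class="o">=</span> <span class="n">hashlib</span><span class="o">.</span><span class="n">sha256</span><span class="p">(</span><span class="n">string</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s1">'utf-8'</span><span class="p">))</span>
    <span class="k">return</span> <span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">hash_</span><span class="o">.</span><span class="n">hexdigest</span><span class="p">()</span><span class="si">}</span><span class="s2">.png"</span>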
<span class="n">map_handle</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'map.txt'</span><span class="p">,</span> <span class="s1">'w'</span><span class="p">)</span>
<span class="n">team</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">teams</span><span class="p">)</span>
<span class="n">fs_friendly_url</span> <span class="o">=</span> <span class="n">fs_sanitize</span><span class="p">(</span><span class="n">team</span><span class="p">)</span>
<span class="n">map_handle</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">fs_friendly_url</span><span class="si">}</span><span class="se">\t</span><span class="si">{</span><span class="n">team</span><span class="si">}</span><span class="se">\n</span><span class="s2">"</span><span class="p">)</span>
<span class="n">driver</span> <span class="o">=</span> <span class="n">webdriver</span><span class="o">.</span><span class="n">Chrome</span><span class="p">(</span><span class="n">options</span><span class="o">=</span><span class="n">chrome_options</span><span class="p">)</span>
<span class="n">base_url</span> <span class="o">=</span> <span class="s1">'http://localhost:4200'</span>
<span class="n">driver</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">base_url</span><span class="si">}{</span><span class="n">team</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
<span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
<span class="n">driver</span><span class="o">.</span><span class="n">save_screenshot</span><span class="p">(</span><span class="n">fs_friendly_url</span><span class="p">)</span>
<span class="n">driver</span><span class="o">.</span><span class="n">quit</span><span class="p">()</span>
<span class="n">map_handle</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</code></pre></div>
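<p>The scheme relies on three properties of <code>fs_sanitize()</code>, which the quick check below demonstrates with a made-up 528-character path: the result is always 64 hexadecimal digits plus <code>.png</code>, far below the 255-byte limit; it contains no <code>/</code>; and it is deterministic, so the same <span class="caps">URL</span> yields the same file name in every run. That last property is what allows a screenshot of the old version to be paired with a screenshot of the new one by name alone. The function is copied from the sample above so the snippet runs on its own.</p>

```python
import hashlib


def fs_sanitize(string):
    # SHA-256 of the URL, hex-encoded (64 characters), plus an extension
    hash_ = hashlib.sha256(string.encode('utf-8'))
    return f"{hash_.hexdigest()}.png"


long_url = "/" + "x" * 527            # made-up path, 528 characters long
name = fs_sanitize(long_url)

print(len(long_url), len(name))       # → 528 68
print("/" in name)                    # → False
print(fs_sanitize(long_url) == name)  # → True
```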
<h2 id="optimizations"><a class="toclink" href="#optimizations">Optimizations</a></h2>
<p>Google Analytics stored some 400 000 unique URLs. This is way too many to check during a weekend project, not to mention that they can be downloaded only in batches of 5000.</p>
<p>So the first optimization is downloading only a subset of them. I opted for 10 000. Given that from the 1600th item onwards each <span class="caps">URL</span> was accessed fewer than 10 times, this is essentially an exhaustive list of “popular” URLs plus some sample of less popular ones.</p>
<p>But 10 000 is still too many. Assuming it takes only 3 seconds to process one team, it would still take a good 8 hours to process all of them. I further reduced the size of that list by drawing a random sample from it.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">random</span>
<span class="n">teams_subset</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">sample</span><span class="p">(</span><span class="n">teams</span><span class="p">,</span> <span class="mi">400</span><span class="p">)</span>
</code></pre></div>
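<p>The arithmetic behind these numbers is easy to check, and one refinement may be worth considering: drawing the sample from a <code>random.Random</code> instance with a fixed seed (not something the original script does) makes the 400-item subset reproducible, so exactly the same URLs can be re-checked after a fix. The 3 seconds per team is the assumption from the paragraph above; the seed and the placeholder paths are arbitrary.</p>

```python
import random


def estimated_hours(url_count, seconds_per_url=3):
    """Rough wall-clock estimate for checking url_count URLs."""
    return url_count * seconds_per_url / 3600


print(round(estimated_hours(10_000), 1))  # → 8.3
print(round(estimated_hours(400) * 60))   # → 20 (minutes)

teams = [f"/team-{i}" for i in range(10_000)]  # placeholder URL paths
rng = random.Random(2019)                      # arbitrary fixed seed
teams_subset = rng.sample(teams, 400)
```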
<p>Initially, I aimed for code simplicity. Since I needed two screenshots to compare, it seemed obvious that I should use a loop.</p>
<p>Then I realized that I was basically doubling the execution time for no good reason. Instead, I should start two web drivers at once, ask each to open a different page, wait a little and then obtain both screenshots, even if that means there will be some duplicated code.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">pathlib</span>
<span class="kn">import</span> <span class="nn">random</span>
<span class="kn">import</span> <span class="nn">time</span>
<span class="kn">from</span> <span class="nn">selenium</span> <span class="kn">import</span> <span class="n">webdriver</span>
<span class="c1"># teams and fs_sanitize() come from the previous samples</span>
<span class="n">manager</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s2">"actual"</span><span class="p">:</span> <span class="p">{</span>
        <span class="s2">"driver"</span><span class="p">:</span> <span class="kc">None</span><span class="p">,</span>
        <span class="s2">"dir"</span><span class="p">:</span> <span class="n">pathlib</span><span class="o">.</span><span class="n">Path</span><span class="p">(</span><span class="s1">'actual_results/'</span><span class="p">),</span>
        <span class="s2">"base_url"</span><span class="p">:</span> <span class="s1">'http://localhost:4200'</span>
    <span class="p">},</span>
    <span class="s2">"expected"</span><span class="p">:</span> <span class="p">{</span>
        <span class="s2">"driver"</span><span class="p">:</span> <span class="kc">None</span><span class="p">,</span>
        <span class="s2">"dir"</span><span class="p">:</span> <span class="n">pathlib</span><span class="o">.</span><span class="n">Path</span><span class="p">(</span><span class="s1">'expected_results/'</span><span class="p">),</span>
        <span class="s2">"base_url"</span><span class="p">:</span> <span class="s1">'https://createpokemon.team'</span>
    <span class="p">}</span>
<span class="p">}</span>
<span class="k">for</span> <span class="n">run</span> <span class="ow">in</span> <span class="n">manager</span><span class="p">:</span>
    <span class="n">manager</span><span class="p">[</span><span class="n">run</span><span class="p">][</span><span class="s2">"driver"</span><span class="p">]</span> <span class="o">=</span> <span class="n">webdriver</span><span class="o">.</span><span class="n">Chrome</span><span class="p">()</span>
    <span class="c1"># save_screenshot() does not create missing directories</span>
    <span class="n">manager</span><span class="p">[</span><span class="n">run</span><span class="p">][</span><span class="s2">"dir"</span><span class="p">]</span><span class="o">.</span><span class="n">mkdir</span><span class="p">(</span><span class="n">exist_ok</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="k">for</span> <span class="n">team</span> <span class="ow">in</span> <span class="n">random</span><span class="o">.</span><span class="n">sample</span><span class="p">(</span><span class="n">teams</span><span class="p">,</span> <span class="mi">400</span><span class="p">):</span>
    <span class="n">fs_friendly_url</span> <span class="o">=</span> <span class="n">fs_sanitize</span><span class="p">(</span><span class="n">team</span><span class="p">)</span>
    <span class="c1"># ask both browsers to load the page first...</span>
    <span class="k">for</span> <span class="n">run</span> <span class="ow">in</span> <span class="n">manager</span><span class="o">.</span><span class="n">values</span><span class="p">():</span>
        <span class="n">base_url</span> <span class="o">=</span> <span class="n">run</span><span class="p">[</span><span class="s2">"base_url"</span><span class="p">]</span>
        <span class="n">run</span><span class="p">[</span><span class="s2">"driver"</span><span class="p">]</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">base_url</span><span class="si">}{</span><span class="n">team</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
    <span class="c1"># ...so that a single pause covers both of them</span>
    <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">run</span> <span class="ow">in</span> <span class="n">manager</span><span class="o">.</span><span class="n">values</span><span class="p">():</span>
        <span class="n">screenshot_path</span> <span class="o">=</span> <span class="n">run</span><span class="p">[</span><span class="s2">"dir"</span><span class="p">]</span><span class="o">.</span><span class="n">joinpath</span><span class="p">(</span><span class="n">fs_friendly_url</span><span class="p">)</span>
        <span class="n">run</span><span class="p">[</span><span class="s2">"driver"</span><span class="p">]</span><span class="o">.</span><span class="n">save_screenshot</span><span class="p">(</span><span class="n">screenshot_path</span><span class="o">.</span><span class="n">as_posix</span><span class="p">())</span>
<span class="k">for</span> <span class="n">run</span> <span class="ow">in</span> <span class="n">manager</span><span class="o">.</span><span class="n">values</span><span class="p">():</span>
    <span class="n">run</span><span class="p">[</span><span class="s2">"driver"</span><span class="p">]</span><span class="o">.</span><span class="n">quit</span><span class="p">()</span>
</code></pre></div>
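<p>The same interleaving can be packaged as a function that handles any number of browsers. What follows is a hypothetical refactoring, not the code from the repository: <code>capture_all()</code> issues every <code>get()</code> first, waits once, and only then saves the screenshots. The pause is passed in as a function, so the call order can be checked with the stand-in <code>FakeDriver</code> class instead of real browsers; the team path and file name in the demo are placeholders. Note that only the pause is shared: with Selenium’s default page load strategy, each <code>get()</code> still blocks until its own page has loaded.</p>

```python
import pathlib
import time


def capture_all(runs, url_path, filename, wait=lambda: time.sleep(5)):
    """Load url_path in every driver, pause once, then screenshot each.

    Each run is a dict with "driver", "dir" and "base_url" keys, like the
    values of the manager dictionary above. Returns the screenshot paths.
    """
    for run in runs:
        run["driver"].get(f"{run['base_url']}{url_path}")
    wait()  # a single pause shared by all browsers
    paths = []
    for run in runs:
        path = pathlib.Path(run["dir"]) / filename
        run["driver"].save_screenshot(path.as_posix())
        paths.append(path)
    return paths


class FakeDriver:
    """Stand-in that records calls instead of controlling a browser."""

    def __init__(self, name, log):
        self.name, self.log = name, log

    def get(self, url):
        self.log.append((self.name, "get", url))

    def save_screenshot(self, path):
        self.log.append((self.name, "save", path))
        return True


log = []
runs = [
    {"driver": FakeDriver("actual", log), "dir": "actual_results",
     "base_url": "http://localhost:4200"},
    {"driver": FakeDriver("expected", log), "dir": "expected_results",
     "base_url": "https://createpokemon.team"},
]
capture_all(runs, "/some-team", "abc.png", wait=lambda: log.append(("wait",)))
for entry in log:
    print(entry)
```

<p>The printed log shows both <code>get</code> calls, then one <code>wait</code>, then both <code>save</code> calls, which is exactly the ordering that halves the time spent sleeping.</p>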
<h2 id="results-analysis"><a class="toclink" href="#results-analysis">Results analysis</a></h2>
<p>I started by sorting all created images by size. This allowed me to quickly identify outliers:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>ls<span class="w"> </span>-lahSr<span class="w"> </span>diff/
...
-rw-r--r--<span class="w"> </span><span class="m">1</span><span class="w"> </span>mdlugosz<span class="w"> </span>mdlugosz<span class="w"> </span><span class="m">5</span>,6K<span class="w"> </span>Nov<span class="w"> </span><span class="m">24</span><span class="w"> </span><span class="m">14</span>:43<span class="w"> </span>67896595bd945c62fdb8c857afb6887baf50e1fb62904e9e7159fc034e7f0912.png
-rw-r--r--<span class="w"> </span><span class="m">1</span><span class="w"> </span>mdlugosz<span class="w"> </span>mdlugosz<span class="w"> </span><span class="m">5</span>,6K<span class="w"> </span>Nov<span class="w"> </span><span class="m">24</span><span class="w"> </span><span class="m">14</span>:36<span class="w"> </span>0487b61857b7417920d0cb3a70641e74d563e417f0354c94a9f66b292a10686e.png
-rw-r--r--<span class="w"> </span><span class="m">1</span><span class="w"> </span>mdlugosz<span class="w"> </span>mdlugosz<span class="w"> </span><span class="m">5</span>,7K<span class="w"> </span>Nov<span class="w"> </span><span class="m">24</span><span class="w"> </span><span class="m">14</span>:43<span class="w"> </span>869fce86191cf921fe253d1f1c792280b0c01d481a35b0da3d10ebe5b27824a6.png
-rw-r--r--<span class="w"> </span><span class="m">1</span><span class="w"> </span>mdlugosz<span class="w"> </span>mdlugosz<span class="w"> </span><span class="m">5</span>,7K<span class="w"> </span>Nov<span class="w"> </span><span class="m">24</span><span class="w"> </span><span class="m">14</span>:30<span class="w"> </span>5e2a96809439a5bac1d235b16544c2385532d1a1ad379abb1586256540d75140.png
-rw-r--r--<span class="w"> </span><span class="m">1</span><span class="w"> </span>mdlugosz<span class="w"> </span>mdlugosz<span class="w"> </span><span class="m">5</span>,9K<span class="w"> </span>Nov<span class="w"> </span><span class="m">24</span><span class="w"> </span><span class="m">15</span>:14<span class="w"> </span>9aa5ca526685da394c0cf401aa44596657298f19ee347b3f880c3f48e25b76a8.png
-rw-r--r--<span class="w"> </span><span class="m">1</span><span class="w"> </span>mdlugosz<span class="w"> </span>mdlugosz<span class="w"> </span><span class="m">6</span>,2K<span class="w"> </span>Nov<span class="w"> </span><span class="m">24</span><span class="w"> </span><span class="m">14</span>:01<span class="w"> </span>7ce7e7f3d05560e26981e6b9c23773a0372f6cf6f1bc21c0ed6a0f8d4da61447.png
-rw-r--r--<span class="w"> </span><span class="m">1</span><span class="w"> </span>mdlugosz<span class="w"> </span>mdlugosz<span class="w"> </span>20K<span class="w"> </span>Nov<span class="w"> </span><span class="m">24</span><span class="w"> </span><span class="m">13</span>:52<span class="w"> </span>862eceaaf4bcb06ffa0fdaf6b263999d1d5e2ec06b1f9d40533c311b9d89bef5.png
-rw-r--r--<span class="w"> </span><span class="m">1</span><span class="w"> </span>mdlugosz<span class="w"> </span>mdlugosz<span class="w"> </span>27K<span class="w"> </span>Nov<span class="w"> </span><span class="m">24</span><span class="w"> </span><span class="m">14</span>:51<span class="w"> </span>0133026b0a0f64ca7cb00529d083ace2effd89fb4d1279283ca6e6d8087cc35e.png
drwxr-xr-x<span class="w"> </span><span class="m">2</span><span class="w"> </span>mdlugosz<span class="w"> </span>mdlugosz<span class="w"> </span>72K<span class="w"> </span>Nov<span class="w"> </span><span class="m">24</span><span class="w"> </span><span class="m">15</span>:29<span class="w"> </span>.
</code></pre></div>
<p>It turned out that in some cases the same team does not produce identical-looking pages, but not for the reason I was interested in. Some Pokemon had their displayed names changed slightly, and sometimes the new name takes a different number of rows than the old one. As a result, a considerable part of the page was shifted vertically, causing a big diff.</p>
<p>Another problem is that during development, the new version uses a different domain than the existing instance, and the current <span class="caps">URL</span> is displayed near the bottom of the page. This caused every pair to report some differences. I skimmed through all the images to confirm there were no unexpected changes, but I should strive to make the images truly identical. That would allow me to exclude from analysis all diff images of exactly the same size, making it trivial to identify cases that differed in a significant way.</p>
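<p>Once the screenshots are truly identical, that filtering is easy to automate. Below is a minimal Python sketch, assuming the diff images live in a <code>diff/</code> directory as in the listing above, and that every pair without real changes produces a diff file of exactly the same byte size. The function name <code>find_outliers</code> is hypothetical; it is not part of the original script.</p>

```python
from collections import Counter
from pathlib import Path


def find_outliers(diff_dir="diff"):
    """Return diff images whose size differs from the most common one.

    Assumption: pairs without meaningful changes all produce diff files
    of exactly the same size, so the most frequent size is the baseline.
    """
    # Smallest first, like `ls -Sr` in the listing above
    images = sorted(Path(diff_dir).glob("*.png"), key=lambda p: p.stat().st_size)
    if not images:
        return []
    sizes = Counter(p.stat().st_size for p in images)
    baseline_size, _ = sizes.most_common(1)[0]
    # Largest diffs first: they are the most likely to show real changes
    return [p for p in reversed(images) if p.stat().st_size != baseline_size]
```

<p>Calling <code>find_outliers("diff")</code> returns only the paths worth a closer look, largest first, so the “nothing interesting to see here” images never have to be opened.</p>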
<figure>
<a href="https://mirekdlugosz.com/blog/2019/simple-visual-regression-checking-with-selenium-and-imagemagick/simple-visual-regression-checking-with-selenium-and-imagemagick/random-result.png">
<img src="https://mirekdlugosz.com/blog/2019/simple-visual-regression-checking-with-selenium-and-imagemagick/simple-visual-regression-checking-with-selenium-and-imagemagick/random-result-min.png" title="Random 'nothing interesting to see here, move along' image" alt="Random 'nothing interesting to see here, move along' image" loading="lazy">
</a>
<figcaption>Random ‘nothing interesting to see here, move along’ image</figcaption>
</figure>
<h2 id="conclusion-and-ideas-for-further-work"><a class="toclink" href="#conclusion-and-ideas-for-further-work">Conclusion and ideas for further work</a></h2>
<p><a href="https://github.com/mirekdlugosz/scrapbook/tree/master/create-pokemon-team-visual-diff">The final version of the code I used is on GitHub</a>.</p>
<p>While this solution did get the work done, it is not perfect. There are a number of things that could be done to improve performance and maintainability:</p>
<ul>
<li>Proper logging and exception handling should be added.</li>
<li>Paths and parameters (like sample size) should be passed in as command-line options or loaded from the environment.</li>
<li>Screenshots of one team should be bit-for-bit identical to make analysis of the results easier. This could be achieved by adjusting the browser window size or by changing the development version to produce exactly the same <span class="caps">URL</span> as the production instance.</li>
<li>Two webdriver instances are very far from fully utilizing available system resources. The main loop should be revamped to support a larger number of concurrent webdriver sessions. One way to achieve that is a queueing mechanism, which would store a list of URLs to process and assign them to webdrivers that are free (webdrivers would need to report that they have completed their assigned work and can take up another task).</li>
<li>Fixed wait times are widely considered a code smell in web automation. Ideally, the webdriver should take a screenshot as soon as the page has fully loaded team data.</li>
<li>Image diffs should be created in a separate process. This would make full use of multiple CPUs on the machine, but requires implementing another queueing mechanism (as well as an efficient way to find pairs of images that have not been processed yet).</li>
</ul>How to win 3rd place at TestingCup?2019-07-07T11:28:32+02:002019-07-07T11:28:32+02:00Mirek Długosztag:mirekdlugosz.com,2019-07-07:/blog/2019/how-to-win-3rd-place-at-testingcup/<p><a href="http://testingcup.pl/">TestingCup</a> is annual testing competition in Poland and this year, I won 3rd place in individual category.</p>
<p><a href="http://testingcup.pl/">TestingCup</a> is an annual testing competition in Poland, and this year I won 3rd place in the individual category.</p>
<h2 id="context-competition-rules"><a class="toclink" href="#context-competition-rules">Context: Competition rules</a></h2>
<p>During TestingCup, we are given three hours to test an application that we see for the first time and that is crafted specifically for the competition. We earn points, and the winner is the person who collects the most of them.</p>
<p>Points are earned in two ways: by reporting bugs and by creating a testing process artifact. The number of points for a bug report depends on its severity - critical security issues and application crashes are worth the most, duplicates are worth the least (actually, they are worth negative points). It pays off to go really deep and find important problems, but it also pays off to maximize coverage and find many issues. The testing process artifact is a single document that must comply with a “widely-used standard”, such as <span class="caps">IEEE</span>-829. It is graded against an undisclosed checklist of elements it should contain - each checked box earns you some points. As far as I can tell, the actual substance of the document is of lesser importance.</p>
<p>Points are awarded by the championship jury in a non-transparent process. After the championship you are given your total number of points, but you don’t know how many points you earned for each activity, what the final severity of your bug reports was, or which boxes on the artifact checklist were checked. I guess you can ask over email? Jury decisions are final and there is no appeal process in place. The jury promises that each bug report goes through at least two jury members, who discuss it until any disagreements are resolved.</p>
<p>You can download the application and accompanying documents, including a list of known bugs and example artifacts, from the <a href="http://www.mrbuggy.pl/">MrBuggy website</a>.</p>
<h2 id="prepare-your-machine"><a class="toclink" href="#prepare-your-machine">Prepare your machine</a></h2>
<p>During the competition, you are expected to use your own machine. The organizers provide some minimal requirements it must meet (native Windows installation, a particular .<span class="caps">NET</span> version or newer, an <span class="caps">RJ</span>-45 connection, sometimes others) and a list of forbidden activities (mainly communicating with external parties and decompiling). Everything in between is fair play, which places preparing your machine among the most important things you can do to maximize your chances of winning.</p>
<p>Install every development tool and piece of productivity software you know how to use, and also some that you have only heard about. Last year, at one point I discovered that the application stores data in an SQLite database, but I didn’t have the tools to access it and poke around. This year, I installed Git for Windows, Python, R with RStudio and tidyverse, Postman, the LibreOffice suite, Greenshot, SQLiteBrowser, 7-Zip and <span class="caps">VS</span> Code (including plugins for spell checking, linting and indentation), and probably some more. Even then, there was a moment during the competition when I wished I had Jupyter Notebook installed.</p>
<p>Keep reference materials on your disk. When working on a test process artifact, you might want to open the <span class="caps">ISTQB</span> syllabus and make sure you haven’t missed something obvious. Last year, I did not have any testing resources on my machine, and I am sure my test report wasn’t particularly good. This year, I had the <span class="caps">ISTQB</span> syllabus, offline copies of <a href="https://www.developsense.com/">Michael Bolton</a>’s and <a href="https://www.satisfice.com/">James Bach</a>’s blogs, and some other documents. Our task was to create a test plan, and I kept my cool only because I had access to an article titled <a href="https://www.developsense.com/blog/2008/12/what-should-test-plan-contain/"><em>What Should A Test Plan Contain?</em></a>.</p>
<p>This feels a bit like cheating, but you might prepare templates for critical test process artifacts; the championship rules do not forbid it. That was my idea for this year: I copied one of the test reports from previous competitions and intended to use it as a template. I didn’t use it in the end, as this year we had to create a test plan.</p>
<h2 id="read-instructions"><a class="toclink" href="#read-instructions">Read instructions</a></h2>
<p>I know this one is mentioned in virtually every “how to pass FooBar exam/certification” article, but I underestimated how important it really is.</p>
<p>This year, we had the opportunity to evaluate our own reports and judge the (anonymized) work of others after the competition. Clearly, some people did not read the instructions, or failed to understand them. I saw bug reports for things that were explicitly included in the list of “known issues”. I saw bug reports pointing out that features described in the Change Request document were missing - you know, features that were requested by the business and for which development had not yet started. I also saw a test plan that was literally perfect, except for one small detail: it was completely off-topic, being based on the delivered MrBuggy instead of the Change Request document. I don’t know how many points this person earned, because “document is on-topic” was not on the list of things we were supposed to check.</p>
<p>I don’t want to bash these people or paint myself as superior. I want to stress that you should read all of the provided materials, especially the instructions. And then you should read them again. And then you should read them from bottom to top, just to make sure you really understand what is expected of you, what will be held against you and what doesn’t matter at all.</p>
<h2 id="keep-it-simple"><a class="toclink" href="#keep-it-simple">Keep it simple</a></h2>
<p>You know how articles <a href="https://www.guru99.com/defect-management-process.html#2">introducing various test process artifacts list all kinds of stuff as required</a>? Following their advice is a sure way to waste time and focus on the least important tasks.</p>
<p>During the competition, your bug reports must cover real problems and be understood by the jury. There are no other requirements. Usually it’s a good idea to provide steps to reproduce, but sometimes they are so short that you may skip them. There are situations when you need to point out what you expected to happen, but often this is obvious from context. You might describe the testing environment in painstaking detail, but everyone has exactly the same one, so why bother?</p>
<p>The same goes for the way you write your reports. Sure, you might show off your language proficiency, but is it worth spending 30 seconds looking for the exact word that perfectly conveys what you mean? Someone else used a simpler word and spent those 30 seconds thinking about how to test a specific requirement.</p>
<p>Simply put, don’t waste time on information that is not required or necessary. Use simple words and simple grammar. Keep your sentences short and on point. Focus on discovering important problems fast and make sure they are communicated clearly.</p>
<h2 id="track-your-time"><a class="toclink" href="#track-your-time">Track your time</a></h2>
<p>It’s pretty obvious, but important enough to state explicitly. Keep track of time.</p>
<p>It’s very easy to lose track of time when you face a serious and interesting challenge, or when you are extremely focused on the task at hand. Yet a competition does not give you the luxury of spending as much time as you want on everything that piques your interest. You have to consciously control the amount of time spent on each activity and feature. Concentrating on a single thing for a whole hour is not worth it.</p>
<p>This also means you have to be relentless in deciding it’s time to move on. Sure, you might feel you are so close to a revelation and be tempted to give it one more minute, but what you are probably really feeling is the sunk cost fallacy. Leaving work unfinished is hard, but necessary. It might help to make a note so you can return to the problem later on.</p>
<h2 id="abuse-notes"><a class="toclink" href="#abuse-notes">Abuse notes</a></h2>
<p>This is another rather obvious but nevertheless important point. You are working on a computer, which can store a virtually unlimited amount of text. As part of the conference pack you will be given a pen and a notebook. Make use of them.</p>
<p>This year, for the first time, an <span class="caps">HTTP</span> <span class="caps">API</span> was a supported way of interacting with MrBuggy. All calls required an Authorization header containing a base64-encoded username and password. While the organizers did provide a simple tool to encode a single string, re-typing usernames and copying them all the time would have been a huge waste of time. Instead, I kept the encoded strings in VS Code, so I could quickly select and copy them.</p>
<p>Last year, I added an item to my notebook after covering each feature. This helped me direct further effort into areas that were not yet tested, and also gave me an overview of what I had actually done. I also captured ideas that I would like to pursue further if time permitted. This made it easier to leave some tasks unfinished when time for them was running out.</p>
<p>As you can see, notes don’t have to be used in a creative way to be useful. Just keep in mind that they are an option, and use them whenever they can support your main activities.</p>
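<p>Pre-computing those strings takes only a few lines of Python. This sketch assumes MrBuggy used the standard HTTP Basic scheme, where <code>username:password</code> is base64-encoded (as defined in RFC 7617); the account names and passwords below are hypothetical placeholders, not the real competition users.</p>

```python
import base64


def basic_auth_header(username, password):
    """Return the value of an HTTP Basic Authorization header.

    Assumes the API follows RFC 7617: "Basic " + base64("username:password").
    """
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Hypothetical test accounts; replace with the ones from the instructions.
ACCOUNTS = {
    "admin": "admin1",
    "tester": "tester1",
}

for user, password in ACCOUNTS.items():
    # Print ready-to-paste header lines, one per account
    print(f"{user}: Authorization: {basic_auth_header(user, password)}")
```

<p>Pasting the printed lines into an editor buffer gives the same quick select-and-copy workflow, without touching the encoding tool once per request.</p>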
<h2 id="dont-bother-with-live-results"><a class="toclink" href="#dont-bother-with-live-results">Don’t bother with live results</a></h2>
<p>Preliminary results are displayed live during the competition. You will do best if you ignore them completely.</p>
<p>Last year, my name was third in the early results table (at least the last time I saw it before the competition ended). I felt pretty good about it and thought I could actually get the trophy, so you can imagine my disappointment when the final results were announced and I finished seventh. This year, I fell out of the top 10 around midway through the competition. The last time I saw my name, I had around 40 points. Near the end of the competition, everyone had 50-70 points. That’s a pretty big gap and I was sure there was no way for me to close it, so I accepted that I would finish in a worse place than the previous year. You can imagine my surprise when I was announced as the winner of 3rd place.</p>
<p>Live results are misleading in part because they don’t factor in the test process artifact. It’s worth 20 or so points, so it can affect your results quite a bit.</p>
<p>More importantly, live results are based entirely on self-assigned categories. If you decide your bug is a critical security issue, your total increases by 10 points. Later, the jury might decide that the bug deserves a much lower severity, and your final score will go down considerably.</p>
<p>You can easily secure the top spot in live results - just report all your bugs as the most critical. Live results are as easy to game as they are meaningless.</p>
<h2 id="practice-at-home"><a class="toclink" href="#practice-at-home">Practice at home?</a></h2>
<p>I have not followed this one myself, so I can’t say how important it actually is. Nevertheless, the <a href="http://mrbuggy.pl/">MrBuggy website</a> provides the software used in previous editions of the championship, along with lists of known issues and example test process artifacts. You can download it, set a timer for three hours and do a dry run of the competition. Just write down all bug reports and the document in a local file. Afterwards, compare the list of bugs you found with the list of all known bugs. Which did you fail to find? Why? What could you do differently to earn more points?</p>
<h2 id="make-it-fun"><a class="toclink" href="#make-it-fun">Make it fun</a></h2>
<p>Last but not least, try to stay positive about the championship as a whole and just have fun.</p>
<p>Competitions are not an objective assessment of your skills, knowledge or worth as a tester. Nor are they a very reliable measurement tool. For example, the same person won first place in 2017, second place in 2018 and… fourteenth place this year. Shuffles like that are quite common and have many, many reasons.</p>
<p>Personally, I didn’t prepare at all for my first championship in 2018. For 2019, I made a point of preparing my machine, but during the competition I mostly relied on instinct and my natural approach to problems. Winning a trophy is nice, but it was never the goal for me - I mostly wanted to know how I naturally stand against the others. As it turns out, pretty well.</p>
<p>As a closing remark: if you had fun during the competition, if you learned a single lesson, if you improved your craft in any way - you are the true winner. It doesn’t matter whether you were first or last in the final standings.</p>My question on AB Testing podcast2019-02-19T21:34:45+01:002019-02-19T21:34:45+01:00Mirek Długosztag:mirekdlugosz.com,2019-02-19:/blog/2019/my-question-on-ab-testing-podcast/<p>Question that I’ve asked has been covered in <a href="https://www.angryweasel.com/ABTesting/ab-testing-episode-97-questions-about-developers-and-tests/">recent episode of <span class="caps">AB</span> Testing</a>. Answer starts at around 9 minutes mark.</p>
<p>A question I asked has been covered in a <a href="https://www.angryweasel.com/ABTesting/ab-testing-episode-97-questions-about-developers-and-tests/">recent episode of <span class="caps">AB</span> Testing</a>. The answer starts at around the 9-minute mark.</p>
<p>I’m not a huge fan of podcasts. Most of them aren’t particularly interesting. In the rare cases when I do find something I would like to listen to, it’s hard for me to find enough time and the right place. Usually when I try to listen to them on my computer, I get distracted, shift my attention to something else and eventually find out that the recording has stopped and I don’t remember a single thing they said.</p>
<p>This has been changing this calendar year, as I was given a pair of wireless headphones. I put them on when I do household chores or when I want to take a break from looking at a screen. This has worked rather well for me so far. One of the things I have listened to was <a href="https://www.angryweasel.com/ABTesting/ab-testing-episode-94-modern-testing-meets-context-driven-testing/"><span class="caps">AB</span> Testing podcast episode 94: “Modern Testing meets Context-Driven Testing”</a>.</p>
<p>I was surprised by its quality. One thing that stands out is the way the hosts discuss and debate. When they disagree, they do it in a very civil and constructive manner. When they agree, they contribute useful points of view to each other’s thoughts. When they talk about differences between Modern Testing and Context-Driven Testing, they point out whether a disagreement is fundamental or only about the wording used. I fully recommend listening to that episode, even if only to hear what a high-quality discussion can look like. We have far too few of these on the Internet.</p>
<p>I also recommend that episode because it has a couple of good points and thought-provoking statements on management and on trust between manager and associate. Both hosts hold senior positions in their respective organizations, so their experience and points of view are worth considering.</p>
<p>Anyway, one of the subjects they touched on was “skilled” and “unskilled” testing. I wanted to learn more about it, so I emailed a couple of questions to Alan, one of the podcast hosts. He decided to answer them on the show. That episode is now live. If you are interested, listen to <a href="https://www.angryweasel.com/ABTesting/ab-testing-episode-97-questions-about-developers-and-tests/">“<span class="caps">AB</span> Testing Episode 97: Questions About Developers and Tests”</a>.</p>Found on web: Black Box Puzzles2019-01-18T01:30:34+01:002019-01-18T01:30:34+01:00Mirek Długosztag:mirekdlugosz.com,2019-01-18:/blog/2019/found-on-web-black-box-puzzles/<p>Black Box Puzzles is one of few websites that actually can help you become better tester.</p>
<p>Black Box Puzzles is one of the few websites that can actually help you become a better tester.</p>
<p><a href="http://blackboxpuzzles.workroomprds.com/">Black Box Puzzles by James Lyndsay</a>.</p>
<p>I had heard about this project in the past, but I didn’t use it until a couple of days ago. Perhaps the link was outdated? Or I found the reference on my ebook reader, which doesn’t have Internet access? Or there was a disclaimer about the puzzles being in Flash and I didn’t even try?</p>
<p>Black Box Puzzles are… well, puzzles. There are no instructions for them. You may use them however you like - it’s entirely up to you. You <strong>can</strong> use them to see what it is like to use an application without a specification and to improve your model-building - and model-refuting - skills. You can play, you can explore, you can learn, you can understand, you can test. They are fun little brain-teasers to kill some time and to provoke thought about how you work.</p>
<p>All puzzles are deterministic. They will provide the same output for the same input. To the best of their author’s knowledge, they are bug-free. There are no tricky parts to trip you up. The author promises they do simple things, but there is no guarantee that your definition of “simple” overlaps with his.</p>
<p>It’s not all roses, though.</p>
<ul>
<li>Out of 21 puzzles, 12 are in Flash. They won’t work on mobile devices or in any modern browser. In fact, I am not yet sure how to run them at all.</li>
<li>There are no instructions, no specification and no oracles. That means you can’t really be sure whether your work is done. If you can predict the system’s response to every action you take, is this due to the high predictive power of your model, or because you are really bad at thinking of tests that might refute it?</li>
<li>That also means that if you can’t figure out what is going on, you are screwed. There is no hint you could take and no solution you could look up. You will learn that you don’t know something, with no indication of what that something might be or how to fill the gap.</li>
<li>You can’t read the source code of the puzzles. Of course, reading a puzzle’s inner workings from its code could be considered cheating, but comparing the conclusions you reach with a black-box approach and with a white-box approach could be a useful exercise as well.</li>
</ul>
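<p>The question about predictive power versus weak tests can be made concrete. In the sketch below, both the black box and the model are made-up Python functions, not one of the actual puzzles: the model agrees with the box on every small input, and only an input at a boundary the tester did not think of refutes it. A <code>None</code> result only means the chosen inputs failed to refute the model, not that the model is correct.</p>

```python
def try_to_refute(black_box, model, inputs):
    """Return the first input where the model mispredicts, or None.

    None does not prove the model right; it only means these inputs
    were not good enough to show it is wrong.
    """
    for value in inputs:
        if black_box(value) != model(value):
            return value
    return None


# Hypothetical puzzle: doubles numbers below 100, leaves others unchanged.
def black_box(n):
    return n * 2 if n < 100 else n


# Hypothetical model built from a few small experiments: "it doubles numbers".
def model(n):
    return n * 2


print(try_to_refute(black_box, model, range(10)))   # weak tests: prints None
print(try_to_refute(black_box, model, range(200)))  # reaches the boundary: prints 100
```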
<p><strong>Takeaway</strong>: Go to <a href="http://blackboxpuzzles.workroomprds.com/">this website</a> and do some puzzles.</p>Found on web: AST blogroll2018-08-14T15:38:07+02:002018-08-14T15:38:07+02:00Mirek Długosztag:mirekdlugosz.com,2018-08-14:/blog/2018/found-on-web-ast-blogroll/<p>I grabbed a list of blogs aggregated on Association for Software Testing website, so you don’t have to.</p>
<p>I grabbed a list of blogs aggregated on the Association for Software Testing website, so you don’t have to.</p>
<p>Have you heard about the Association for Software Testing (<span class="caps">AST</span>), a <span class="caps">US</span>-based non-profit organization? I had probably heard the name before, but dismissed it as something akin to <span class="caps">ISTQB</span>. Only recently did I learn how wrong I was - it is actually an association of Context-Driven testers, created by Cem Kaner himself!</p>
<p>One of the hidden gems on their website is the planet / blog aggregator / blog syndication feature. On the right side of their blog there is a black box with links to posts written by some of the members of <span class="caps">AST</span>. While you can use that box directly, or subscribe to the <span class="caps">AST</span> blog feed (which is the union of posts from all tracked blogs), I wanted to see a full list of all aggregated blogs. This makes it harder to miss people who don’t blog anymore, but have a rich collection of past writings.</p>
<p>Since I couldn’t find such a list, I decided to create it myself. Blogs are sorted in descending order by the publication time of their newest posts.</p>
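<p>The ordering rule is simple: take each blog’s newest post and sort blogs by that date, newest first. A minimal Python sketch, assuming the aggregated posts have already been collected as <code>(blog title, publication time)</code> pairs; fetching them from the AST feed is left out, and the sample data is made up:</p>

```python
from datetime import datetime, timezone


def sort_blogs_by_newest_post(posts):
    """Return blog titles ordered by their newest post, newest first.

    ``posts`` is an iterable of (blog_title, published) pairs, where
    ``published`` is a timezone-aware datetime.
    """
    newest = {}
    for blog, published in posts:
        # Keep only the most recent publication time seen for each blog
        if blog not in newest or published > newest[blog]:
            newest[blog] = published
    return sorted(newest, key=newest.get, reverse=True)


# Made-up sample data, not the real AST feed
posts = [
    ("Blog A", datetime(2018, 8, 1, tzinfo=timezone.utc)),
    ("Blog B", datetime(2016, 3, 9, tzinfo=timezone.utc)),
    ("Blog A", datetime(2017, 1, 5, tzinfo=timezone.utc)),
    ("Blog C", datetime(2018, 2, 20, tzinfo=timezone.utc)),
]
print(sort_blogs_by_newest_post(posts))  # ['Blog A', 'Blog C', 'Blog B']
```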
<ul>
<li><a href="https://syrett.blog">syrett.blog | Neil Syrett | Software Tester</a></li>
<li><a href="https://www.stickyminds.com">StickyMinds | Software Testing <span class="amp">&</span> <span class="caps">QA</span> Online Community</a></li>
<li><a href="https://always-fearful.blogspot.com">אשרי אדם מפחד תמיד Happy is the man who always fears</a></li>
<li><a href="https://qahiccupps.blogspot.com">Hiccupps</a></li>
<li><a href="https://www.mkltesthead.com"><span class="caps">TESTHEAD</span></a></li>
<li><a href="https://blog.tentamen.eu">tentamen blog – Blog that makes software testing interesting and exciting.</a></li>
<li><a href="http://www.satisfice.com">James Bach - Satisfice, Inc.</a></li>
<li><a href="http://blog.aclairefication.com">aclairefication</a></li>
<li><a href="https://mrslavchev.com">Mr.Slavchev - The cave of the testing troll</a></li>
<li><a href="http://nickytests.blogspot.com">Nicky Tests Software</a></li>
<li><a href="https://www.kenst.com">Chris Kenst</a></li>
<li><a href="https://beaglesays.blog">@Beaglesays – a nose for testing</a></li>
<li><a href="https://www.associationforsoftwaretesting.org">Association for Software Testing | Software Testing Professional Association</a></li>
<li><a href="http://www.shino.de">Markus Gärtner | Software Testing, Craftsmanship, Leadership and beyond</a></li>
<li><a href="http://bernieberger1.blogspot.com">Bernie Berger</a></li>
<li><a href="http://www.brendanconnolly.net">Assert.This – Testing, Automation, and Exploration</a></li>
<li><a href="http://xndev.com/creative-chaos/">Creative Chaos | Excelon Development</a></li>
<li><a href="https://thepainandgainofedwardbear.wordpress.com">The Pain and Gain of Edward Bear</a></li>
<li><a href="http://elementalselenium.com">Elemental Selenium: Receive a Free, Weekly Tip on Using Selenium like a Pro</a></li>
<li><a href="http://www.tnridgeback.com">Testing Bites</a></li>
<li><a href="https://roadlesstested.com">Road Less Tested – thoughts on mastering the craft of software testing, delivering quality software and agile practices</a></li>
<li><a href="http://www.huibschoots.nl">Huib Schoots – Software Tester – Trainer – Coach – Writer – Speaker – Leader – Storyteller</a></li>
<li><a href="http://carstenfeilberg.blogspot.com">Let’s go explore</a></li>
<li><a href="http://www.dogmatictesting.com">The Pragmatic Testing | Agile, Testing, Sense-making</a></li>
<li><a href="https://mewtblog.wordpress.com"><span class="caps">MEWT</span> | Midlands Exploratory Workshop on Testing</a></li>
<li><a href="http://testingthoughts.com">Testing Thoughts – A focus on context-driven testing</a></li>
<li><a href="http://www.questioningsoftware.com/">Questioning Software</a></li>
<li><a href="http://scott-barber.blogspot.com/">Peak Performance</a></li>
<li><a href="http://tattooedtester.blogspot.com">A tester in Tennessee</a></li>
<li><a href="http://markwaite.blogspot.com/">Mark Waite</a></li>
</ul>Found on web: Premises of Rapid Software Testing2018-06-22T23:04:17+02:002018-06-22T23:04:17+02:00Mirek Długosztag:mirekdlugosz.com,2018-06-22:/blog/2018/found-on-web-premises-of-rapid-software-testing/<p>Head on to Michael Bolton blog to read about fundamentals of Rapid Software Testing.</p>
<p>Head over to Michael Bolton’s blog to read about the fundamentals of Rapid Software Testing.</p>
<p>I am a big fan of <a href="http://www.satisfice.com/blog/">James Bach</a>’s and <a href="http://www.developsense.com/blog/">Michael Bolton</a>’s work on testing. As I was reading through their blog archives - something you might consider doing as well, but be warned that Michael is a very prolific writer - I stumbled upon a three-part series called <em>Premises of Rapid Software Testing</em>.</p>
<p><a href="http://www.developsense.com/blog/2012/09/premises-of-rapid-software-testing-part-1/">Part 1</a>,
<a href="http://www.developsense.com/blog/2012/09/premises-of-rapid-software-testing-part-2/">Part 2</a>,
<a href="http://www.developsense.com/blog/2012/09/premises-of-rapid-software-testing-part-3/">Part 3</a>.</p>
<p>This series explains the eight core principles of Rapid Software Testing. They are rather uncontroversial, and I feel that every working professional will agree with them, including people who disagree with some of James and Michael’s other work. I also think it’s a good idea to learn them by heart, or print them out and keep them somewhere close, so you can refer to them while working. It’s all too easy to lose touch with the fundamentals when you are deep in some detail, like developing a checking framework. You can only benefit from taking a step back once in a while and reflecting on your work as a whole.</p>
<p>There are three parts, but reading them will take you maybe five minutes, so do it right now. They are a great source of inspiration, and thinking about their implications might take a good chunk of an hour, so do that when you have some more time.</p>Practicing the Rule of Three: Polish SAF-T2018-02-26T19:16:46+01:002018-02-26T19:16:46+01:00Mirek Długosztag:mirekdlugosz.com,2018-02-26:/blog/2018/practicing-the-rule-of-three-polish-saf-t/<p>Testerzy.pl, one of the biggest websites for testers in Poland, published <a href="http://testerzy.pl/wiesci-ze-swiata-testerow/krytyczna-funkcja-sprawozdawcza-jpk-niedotestowana">short article</a> about problems with system behind <a href="https://en.wikipedia.org/wiki/SAF-T"><span class="caps">SAF</span>-T</a>. It’s in Polish, but Google Translate does not-terrible job at translating it to English. The bottom line is: application does not allow to input numbers with more than two significant digits. Testerzy.pl claim that system was “not tested enough”.</p>
<p>Testerzy.pl, one of the biggest websites for testers in Poland, published a <a href="http://testerzy.pl/wiesci-ze-swiata-testerow/krytyczna-funkcja-sprawozdawcza-jpk-niedotestowana">short article</a> about problems with the system behind <a href="https://en.wikipedia.org/wiki/SAF-T"><span class="caps">SAF</span>-T</a>. It’s in Polish, but Google Translate does a not-terrible job of translating it into English. The bottom line is: the application does not allow entering numbers with more than two significant digits. Testerzy.pl claims that the system was “not tested enough”.</p>
<p>I was not involved in the project in question and I cannot comment on whether it was tested enough or not (and it’s not exactly clear to me what “tested enough” is supposed to mean, but I digress). However, I can think of some other reasons why software could be released with a problem like that.</p>
<ul>
<li>The specification explicitly said that numeric values must be expressed in a format with two significant digits. Maybe someone brought this to the attention of the specification authors, maybe not.</li>
<li>The development process made it very hard to fix bugs in the specification, while requiring the software to conform to said specification (even where it was buggy).</li>
<li>Information about the issue never left the testing team. Maybe their priorities at the time didn’t justify spending time on reporting it. Maybe their process made each reported problem such a hassle that testers hesitated. Maybe someone found it just before a lunch break and then forgot.</li>
<li>None of the decision-makers looked into the reported issue. It was still lingering in the <span class="caps">NEW</span> queue at the time of release.</li>
<li>The issue was reported and then closed as too minor to be worth any further work.</li>
<li>After investigation by the development team, it turned out that a fix would require refactoring a significant part of the application. Someone decided that the benefits of the fix did not outweigh the potential costs.</li>
<li>The problem was fixed, but then re-introduced sometime before the release. That could happen while fixing another issue in a related part of the application, or maybe they didn’t have a proper version control system set up.</li>
<li>The fix was deferred until a future release.</li>
</ul>
<p>I’m sure you can come up with more reasons if you think about it for more than 10 minutes.</p>
<p>This is a somewhat interesting case that gives us an opportunity to discuss and reflect on the role of testers, the relationships between specifications, needs and values, the factors impacting business decisions and a host of other topics related to software development. I find it very unfortunate that one of the poster children of testing in Poland decided to reduce all these topics to a single word, and out of all the words, chose one that suggests that testers did not do their job properly.</p>
<p><strong>Takeaway</strong>: testers are messengers. They can’t take responsibility for decisions made by other people based on the messages they brought.</p>Found on web: Radiologist are testers, too2018-02-25T15:14:28+01:002018-02-25T15:14:28+01:00Mirek Długosztag:mirekdlugosz.com,2018-02-25:/blog/2018/found-on-web-radiologist-are-testers-too/
<p>Luke Oakden-Rayner, a PhD candidate in the field of radiology, explains why a certain database of X-ray images is not really fit for the task of training medical systems to perform diagnostics. His observations stem from questions that apply to pretty much any dataset used to train machines.</p>
<p>Read <a href="https://lukeoakdenrayner.wordpress.com/2017/12/18/the-chestxray14-dataset-problems/"><em>Exploring the ChestXray14 dataset: problems</em> by Luke Oakden-Rayner</a>.</p>
<p>It’s worth your time for a few reasons:</p>
<ul>
<li>He clearly demonstrates that the quality of automated predictions depends solely on the quality of the input data; as they sometimes put it: garbage in, garbage out.</li>
<li>He stresses that automated diagnostic systems are only as good as they are useful in the context of established medical practice; but since they cannot learn medical practice and the meaning behind text labels, there is a danger of spending considerable resources on creating something that is not particularly useful, or, even worse, is actively harmful.</li>
<li>He shows that reducing a complex, multidimensional reality to a single performance metric might lead to erroneous conclusions.</li>
</ul>
<p>However, I wanted to highlight one quote:</p>
<blockquote>
<p>Radiology reports are not objective, factual descriptions of images. The goal of a radiology report is to provide useful, actionable information to their referrer, usually another doctor. In some ways, the radiologist is guessing what information the referrer wants, and culling the information that will be irrelevant.</p>
</blockquote>
<p>Isn’t this exactly what good testers do, too?</p>Registering for TestingCup2018-02-20T22:50:51+01:002018-02-20T22:50:51+01:00Mirek Długosztag:mirekdlugosz.com,2018-02-20:/blog/2018/registering-for-testingcup/
<p>Yesterday, I tried to register for the TestingCup competition and conference. The number of issues I encountered is well outside my comfort zone.</p>
<p>A little bit of context. There are only 250 places in the competition, offered on a first-come, first-served basis during three rounds of registration: on 22nd January, yesterday, and on 19th March. Yesterday’s tickets were all reserved in a mere 5 minutes. If you want to participate, you have to act really quickly.</p>
<p>The <em>Get tickets</em> button became active at 9:59. When I clicked it, I saw this screen:</p>
<figure>
<a href="https://mirekdlugosz.com/blog/2018/registering-for-testingcup/testingcup-registration/Screenshot_20180220_095516.png">
<img src="https://mirekdlugosz.com/blog/2018/registering-for-testingcup/testingcup-registration/Screenshot_20180220_095516-min.png" title="Login screen" alt="Login screen" loading="lazy">
</a>
</figure>
<p>Well, <span class="caps">OK</span>. I have no problem with creating an account. But they could have said so earlier, so I would already be logged in when tickets became available.</p>
<p>Not one to be held back, I proceeded to create a new account. Since I was in a hurry, I entered a rather simple password, something like “testingiscool”. It turned out that passwords have to contain an uppercase letter and a number:</p>
<figure>
<a href="https://mirekdlugosz.com/blog/2018/registering-for-testingcup/testingcup-registration/Screenshot_20180219_100745.png">
<img src="https://mirekdlugosz.com/blog/2018/registering-for-testingcup/testingcup-registration/Screenshot_20180219_100745-min.png" title="Password constraints" alt="Password constraints" loading="lazy">
</a>
</figure>
<p><span class="caps">OK</span>, fine. I decided to generate a new random password using a password manager. It was something like <code>0uPcYJ=bIELZDZe_NFSh</code>.</p>
<figure>
<a href="https://mirekdlugosz.com/blog/2018/registering-for-testingcup/testingcup-registration/Screenshot_20180219_100819.png">
<img src="https://mirekdlugosz.com/blog/2018/registering-for-testingcup/testingcup-registration/Screenshot_20180219_100819-min.png" title="Password constraints" alt="Password constraints" loading="lazy">
</a>
</figure>
<p>Nope. Only then did I learn that some special characters are forbidden. I made a mental note to look up the password policy later on and generated a new password, this time without any special characters. Third time’s a charm.</p>
<p>After finally creating an account, I had to create a new participant for myself. That makes sense, as one person can register an entire team, but it makes me wonder if I could have done it earlier. Either way, I reserved a ticket for both the competition and the conference.</p>
<p>The next step is paying for the ticket. You would think that is the easiest part, right? After all, they want my money. Well, no. I couldn’t find the bank account number in <em>Dashboard</em> or on the <em>Payment Summary</em> page. I looked at the conference contact page and in <em>Rules and Regulations</em>. While this last document did state that I am required to transfer money to the bank account specified by the organizers, it didn’t reveal the number itself.</p>
<p>After some frenzied clicking, I figured out that in order to proceed, I am supposed to provide my personal details using the form on the <em>Billing Data</em> page. This unlocks the <em>Download pro forma invoice</em> button on the <em>Payment summary</em> screen. Clicking this button downloads a <span class="caps">PDF</span> that, among other things, contains the bank account number.</p>
<p>So, I opened the <em>Billing Data</em> page and… I froze. I did notice the lack of <span class="caps">HTTPS</span> back when I registered my account, but only now, when I am no longer in a hurry, can I fully comprehend it.</p>
<p><strong>The entire website, including all the forms, is served through unencrypted connection!</strong></p>
<figure>
<a href="https://mirekdlugosz.com/blog/2018/registering-for-testingcup/testingcup-registration/Screenshot_20180220_100852.png">
<img src="https://mirekdlugosz.com/blog/2018/registering-for-testingcup/testingcup-registration/Screenshot_20180220_100852-min.png" title="Invoice data form page is not secured" alt="Invoice data form page is not secured" loading="lazy">
</a>
</figure>
<p>Yes, you read that right. The organizers of a high-profile software testing conference did not deploy <span class="caps">TLS</span> on their website. And this is happening these days, when popular browsers scream “not secure!” on unencrypted websites. These days, when all browsers support <span class="caps">SNI</span> and you aren’t limited to one certificate per <span class="caps">IP</span>. These days, when certificates are given away for free by Let’s Encrypt. These days, when transferring sensitive personal information through an insecure channel violates <a href="https://www.eugdpr.org/"><span class="caps">GDPR</span></a> (applicable from 25 May 2018) and puts you at risk of paying a hefty fine.</p>
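<p>Checking for this does not require staring at the address bar. Below is a minimal sketch in Python, assuming the <code>action</code> attributes of the page’s forms have already been collected (for example by a crawler); it resolves each one against the page URL and flags every target that would be submitted without TLS. The function name and all URLs are hypothetical, made up for illustration rather than taken from the TestingCup website.</p>

```python
from urllib.parse import urljoin, urlparse


def insecure_targets(page_url: str, form_actions: list[str]) -> list[str]:
    """Resolve form actions against the page URL; keep those not sent over HTTPS."""
    flagged = []
    for action in form_actions:
        # An empty action means "submit back to the current page".
        target = urljoin(page_url, action)
        if urlparse(target).scheme != "https":
            flagged.append(target)
    return flagged


# Hypothetical data: made-up URLs, not the real TestingCup pages.
page = "http://conference.example.com/participant/billing"
actions = ["", "/participant/register", "https://payments.example.com/pay"]
print(insecure_targets(page, actions))
```

<p>Because an empty <code>action</code> submits the form back to the page itself, a form on a plain-HTTP page gets flagged even when it has no explicit target.</p>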
<p>Since I had already stopped for a moment to reflect upon the lack of <span class="caps">HTTPS</span>, I decided to look around more carefully and investigate some of the things I didn’t want to spend time on before.</p>
<p>Password policy. I couldn’t find anything about it. There are some constraints placed on passwords, but you won’t know about them until you violate them. This is a very common practice, but in this particular case, when people are racing against the clock to reserve one of the available places, each incorrectly submitted form might make the difference between getting a ticket or not. I would highly appreciate knowing about these constraints beforehand.</p>
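<p>One way to avoid that is to keep the policy as data, show it next to the form before anyone types, and report every violated rule at once. Below is a minimal sketch in Python; the rules themselves (minimum length, uppercase letter, digit, letters and digits only) are assumptions loosely based on the errors I saw, not the site’s actual policy, which I never found.</p>

```python
import string
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    description: str               # shown to the user before they type anything
    check: Callable[[str], bool]   # True when the password satisfies this rule


# Hypothetical policy: the real site never published its rules.
ALLOWED = set(string.ascii_letters + string.digits)
POLICY = [
    Rule("at least 8 characters", lambda p: len(p) >= 8),
    Rule("at least one uppercase letter", lambda p: any(c.isupper() for c in p)),
    Rule("at least one digit", lambda p: any(c.isdigit() for c in p)),
    Rule("only letters and digits", lambda p: set(p).issubset(ALLOWED)),
]


def describe_policy() -> str:
    """Render the whole policy so it can be displayed next to the form upfront."""
    return "Password must have: " + "; ".join(rule.description for rule in POLICY)


def violations(password: str) -> list[str]:
    """Return every rule the password breaks, not just the first one."""
    return [rule.description for rule in POLICY if not rule.check(password)]


print(describe_policy())
print(violations("testingiscool"))         # missing uppercase letter and digit
print(violations("0uPcYJ=bIELZDZe_NFSh"))  # contains "=" and "_"
```

<p>With this approach, “testingiscool” gets both missing requirements reported in a single round trip instead of one per failed submission.</p>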
<p>Actually, I would highly appreciate knowing in advance that I need to create an account at all. Again, this is quite a common practice and a lot of online shops do it, so I could have expected it. But a lot of online shops merge the “create account” and “shipping details” forms into one, or just let you place an order without any account at all, and I could have expected that as well. This might have been clearer if the account weren’t called “participant”: to me, I am not a participant until I actually participate (or, at the very least, buy the ticket). My way of thinking made me disregard the <em>Participant zone</em> button early on, but I really shouldn’t have.</p>
<p>Finding the organizer’s bank account number is hard, as you already know. I can see how putting it in a <span class="caps">PDF</span> could have made sense if each participant had an individual bank account number. But they don’t. The account number on the invoice is just the main account number of the company behind the event, one that they have been using since at least 2012. I fail to see why they created this convoluted process instead of just putting the account number someplace where it is easy to find, like the <em>Payment Summary</em> page. Or even the dashboard.</p>
<p>TestingCup is trying to reach an international audience and, for the first time ever, their website is in English. This is generally a good thing. However, the registration opening time, arguably the most important piece of information on the entire website at the moment, is expressed without a timezone. You have to guess that they mean local time in Poland.</p>
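<p>Stating the timezone costs almost nothing. Below is a minimal Python sketch that turns a local time into an announcement with an explicit timezone, its UTC equivalent and an ISO 8601 timestamp. The 10:00 opening time is hypothetical, and the fixed UTC+1 offset is an assumption that only holds because Poland was still on winter time on 19 March 2018; real code should use <code>zoneinfo.ZoneInfo("Europe/Warsaw")</code>, which handles daylight saving time.</p>

```python
from datetime import datetime, timedelta, timezone

# Assumption: on 19 March 2018 Poland is still on CET (UTC+1); summer time
# in 2018 started on 25 March. A fixed offset keeps the sketch free of tzdata;
# real code would use zoneinfo.ZoneInfo("Europe/Warsaw") to get DST right.
CET = timezone(timedelta(hours=1), "CET")


def announce(local_time: datetime, tz: timezone) -> str:
    """Render a naive local time as an unambiguous announcement string."""
    aware = local_time.replace(tzinfo=tz)    # attach the organizer's timezone
    in_utc = aware.astimezone(timezone.utc)  # the same instant, expressed in UTC
    return (
        f"{aware:%Y-%m-%d %H:%M} {aware.tzname()} "
        f"({in_utc:%H:%M} UTC, ISO 8601: {aware.isoformat()})"
    )


# Hypothetical opening time for the third round of registration.
print(announce(datetime(2018, 3, 19, 10, 0), CET))
# 2018-03-19 10:00 CET (09:00 UTC, ISO 8601: 2018-03-19T10:00:00+01:00)
```
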
<p>The first image below shows the page that was displayed when I was reserving my ticket (actually, there were four boxes; I had to go back to this page to take the screenshot, and by then there weren’t any tickets available anymore). The second image shows the page that was displayed when I clicked the <em>Options</em> button on the dashboard. They are basically the same form, so why do they use two different interfaces?</p>
<div class="gallery">
<figure>
<a href="https://mirekdlugosz.com/blog/2018/registering-for-testingcup/testingcup-registration/Screenshot_20180219_100420.png">
<img src="https://mirekdlugosz.com/blog/2018/registering-for-testingcup/testingcup-registration/Screenshot_20180219_100420-min.png" title="New participant - four boxes one by one" alt="New participant - four boxes one by one" loading="lazy">
</a>
<figcaption>New participant - four boxes one by one</figcaption>
</figure>
<figure>
<a href="https://mirekdlugosz.com/blog/2018/registering-for-testingcup/testingcup-registration/Screenshot_20180219_101107.png">
<img src="https://mirekdlugosz.com/blog/2018/registering-for-testingcup/testingcup-registration/Screenshot_20180219_101107-min.png" title="Change option - two boxes and selection list?" alt="Change option - two boxes and selection list?" loading="lazy">
</a>
<figcaption>Change option - two boxes and selection list?</figcaption>
</figure></div>
<h2 id="bugs-found"><a class="toclink" href="#bugs-found">Bugs found</a></h2>
<ul>
<li>Lack of <span class="caps">HTTPS</span></li>
<li>Bank account number is hard to find</li>
<li>Time expression does not include timezone</li>
<li>Lack of clear information about the entire registration process and which steps can be completed in advance to save time</li>
<li>Two different user interfaces for fundamentally the same screen (add participant vs. change participation option)</li>
<li>Password constraints are revealed only when a password violates them</li>
</ul>
<p><strong>Takeaway</strong>: Spend some time thinking about your assumptions. Try to lift them during testing and see what happens. A lot of the problems I encountered were the product of someone assuming that it’s obvious things should be done a certain way and never challenging that assumption.</p>Everything you need to know about artificial intelligence2018-02-06T21:48:02+01:002018-02-06T21:48:02+01:00Mirek Długosztag:mirekdlugosz.com,2018-02-06:/blog/2018/everything-you-need-to-know-about-artificial-intelligence/
<p>Last week I wrote about an automated translation system gone awry in the midst of a Polish international relations crisis. Today I discovered an extremely good article explaining important parts of artificial intelligence in layman’s terms.</p>
<p>Read <a href="https://medium.com/@yonatanzunger/asking-the-right-questions-about-ai-7ed2d9820c48"><em>Asking the Right Questions About <span class="caps">AI</span></em> by Yonatan Zunger</a>.</p>
<p>I would like to put it in the context of testing in particular. Using Yonatan’s terms, testing is a mix of indirect and undefined goals in an unpredictable environment. That puts it in the group of problems that are the hardest for machine learning systems to solve. Testing won’t be taken over by <span class="caps">AI</span> (“automated”, as they say in our industry) in the coming decades; possibly it never will be.</p>
<p>This doesn’t mean that testing won’t change at all or won’t have to adapt. All roles where agency is primarily given to non-human actors (erroneously referred to as “manual testing” by a huge part of the industry) will slowly disappear. Testers will need to learn how to make better use of the machines they work with. Testers will need to better understand the results given by machines and quickly catch situations where a machine is answering a different question than the one that was asked, which will be very hard without a better understanding of statistics and machine learning. It’s hard to tell whether the industry as a whole will need more or fewer testers in the future.</p>
<p>And one more thing: when discussing the ethical choices that autonomous cars will have to make, Yonatan claims that society will have to make a choice and state it explicitly. As a sociologist, I disagree. Getting societies to reach consensus is extremely hard, if not impossible. That’s why successful societies are built on rather vague principles (so everyone can agree on them, but interpret them a bit differently) and have built-in “venting” mechanisms that allow people to express their disagreement and desire for change.</p>
<p><strong>Takeaway</strong>: read <a href="https://medium.com/@yonatanzunger/asking-the-right-questions-about-ai-7ed2d9820c48">this article</a>. Do the situations described in the section “Ethics and the Real World” have anything in common with testing? What? How can testers apply lessons learned by <span class="caps">AI</span> researchers to their own jobs, even if they don’t use any <span class="caps">AI</span>?</p>What robots can’t do: speech translation2018-02-02T22:42:57+01:002018-02-02T22:42:57+01:00Mirek Długosztag:mirekdlugosz.com,2018-02-02:/blog/2018/what-robots-can-t-do-speech-translation/
<p>In the middle of a Polish international relations crisis, we can see why robots won’t take our jobs anytime soon.</p>
<p>A week ago, the lower house of the Polish parliament passed a bill that penalizes claims that the Polish People or the Polish state are (in part) responsible for the Holocaust and other crimes committed by the Third Reich on the territory of Poland. That happened at a very apt time, merely a day before International Holocaust Remembrance Day. It was met with a very strong reaction from Israel and other countries, including the <span class="caps">USA</span>. Now we have the worst international relations crisis since December 2017.</p>
<p>To calm the situation down, Polish prime minister Mateusz Morawiecki issued <a href="https://www.youtube.com/watch?v=R9bS9z5OiWY">a statement</a> yesterday. In the speech, he said “Obozy w których wymordowano miliony Żydów nie były polskie” (“The camps in which millions of Jews were murdered were not Polish”), but the English subtitles displayed at the same time read “Camps where millions of Jews were murdered were Polish”. That is, the subtitles omitted “not” between “were” and “Polish” and, as a result, said exactly the opposite of what the prime minister said. The prime minister’s office <a href="https://twitter.com/PremierRP/status/959193834860220419">said on Twitter</a> that the translation was done automatically by YouTube. The video was then taken down.</p>
<p>Politics aside, let’s take a closer look at this automated translation system.</p>
<p>I imagine that there are really two independent systems: speech recognition (responsible for turning audio into text) and a natural language translator (translating text from one human language to another).</p>
<p>Both of them use machine learning algorithms, which basically means that they are told whether their output is right or not and use that information as input for future operations. The idea is that the computer program “learns” where it makes mistakes and changes itself to not make these mistakes again. For machine learning to work, there needs to be some way of telling whether the output was correct or not.</p>
<p>Speech recognition success might be measured by the number of words it caught (did not miss), by the number of words it transcribed correctly (“two” vs “too”, “here” vs “hear” etc.) or by the types of noise it can handle without affecting the results. Natural language translator performance might be measured by the number of words it translated correctly, the correct order of words in a sentence, correct usage of the grammatical features of the target language (e.g. inflection) or the idiomaticity of the produced text. It goes without saying that there might be other performance metrics as well.</p>
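<p>To make the first kind of metric concrete, below is a minimal Python sketch that scores the share of reference words present in the output, plus a crude check for a dropped negation. It compares the English subtitle with the same sentence with “not” restored, so its figure is not the speech recognition number discussed in this post; both functions are my own illustration, not how YouTube evaluates anything.</p>

```python
from collections import Counter


def word_recall(reference: str, output: str) -> float:
    """Share of reference words that also occur in the output (counted with multiplicity)."""
    ref = Counter(reference.lower().split())
    out = Counter(output.lower().split())
    matched = sum((ref & out).values())  # & keeps the smaller count of each word
    return matched / sum(ref.values())


def negation_lost(reference: str, output: str) -> bool:
    """Crude meaning check: True if the reference negates but the output does not."""
    negations = {"not", "no", "never"}
    ref_negates = any(word in negations for word in reference.lower().split())
    out_negates = any(word in negations for word in output.lower().split())
    return ref_negates and not out_negates


correct = "Camps where millions of Jews were murdered were not Polish"
subtitle = "Camps where millions of Jews were murdered were Polish"

print(f"word recall: {word_recall(correct, subtitle):.0%}")  # word recall: 90%
print(f"negation lost: {negation_lost(correct, subtitle)}")  # negation lost: True
```

<p>A word-level score of 90% looks respectable, yet the one missing word inverts the meaning of the whole sentence.</p>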
<p>It’s important to note that creators of machine learning algorithms don’t expect them to be 100% correct. Their expectations vary depending on the task at hand and fall anywhere between 99.99% and being correct more often than not. Few would complain if their system achieved an overall success rate of 90-95%.</p>
<p>If we try to apply some of our performance metrics to YouTube’s automated translation machine, we will see that the speech recognition system has a success rate of 87.5% (it missed only one out of eight words), while the natural language translator has a whopping 100% success rate (it translated everything correctly). Taken as one (averaging the two rates), these systems have a success rate of 93.75%. Assuming that the missing “not” is the only mistake in the entire translation of the prime minister’s statement (I can’t know), the speech recognition success rate goes up all the way to 99.8%!</p>
<p>Seen from the point of view of quantitative metrics, YouTube’s automated translation system is rather good. Some may claim that it’s on its way to automating the job of human translators.</p>
<p>Except that no human translator would ever make such a mistake.</p>
<p><strong>Takeaway</strong>: Robots will take over jobs in the near future: those where quality doesn’t matter or is measured by quantitative metrics.</p>
<p>As a bonus point, let’s apply Jerry Weinberg’s Rule of Three to this situation. For those unaware, the Rule of Three says that if you can’t think of three possible explanations of why something happened, you haven’t thought enough.</p>
<ol>
<li>It was indeed a problem with the automated translation system.</li>
<li>The speech recognition system caught all the words, but the natural language translator knew that “death camps” are usually “Polish”, so it corrected this mistake.</li>
<li>A malicious agent in the prime minister’s office manually prepared incorrect subtitles.</li>
<li>A malicious agent on YouTube’s side changed the subtitles to be incorrect.</li>
<li>This never happened. The <a href="https://twitter.com/BlazejPapiernik/status/959185720173834242">initial screenshot</a> was faked. I’m not entirely sure why the prime minister’s office would play along and why a Google representative would apologize.</li>
</ol>