Pipelines for Testing macOS Apps

If you have test projects in your repository, use the .NET Core task to run unit tests with testing frameworks like MSTest, xUnit, and NUnit. For this functionality, the test project must reference Microsoft.NET.Test.SDK version 15.8.0 or higher. Azure DevOps is a hosted service for running CI/CD pipelines, and today we are going to create an Azure DevOps pipeline that deploys a Terraform configuration. This is an updated version.

May 27, 2020: There is a rich library of the best Mac apps to compare to the best Windows apps. That's especially true now that key iOS apps have been ported over to macOS.
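You can check the version requirement above before a pipeline runs. The Python sketch below reads an SDK-style project file and confirms that it references Microsoft.NET.Test.Sdk (the NuGet package id, matched case-insensitively) at version 15.8.0 or higher. It makes three assumptions: the `.csproj` has no XML namespace, the version is a plain numeric `Version` attribute or child element, and wildcard or MSBuild-property versions do not need handling. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

MIN_TEST_SDK = (15, 8, 0)  # minimum version stated for the .NET Core task's test support


def parse_version(text):
    """Turn '16.9.4' or '16.9.4-preview' into a comparable tuple like (16, 9, 4)."""
    core = text.strip().split("-")[0]
    parts = [int(p) for p in core.split(".")]
    parts += [0] * (3 - len(parts))  # treat '15.8' as 15.8.0
    return tuple(parts)


def references_test_sdk(csproj_xml, minimum=MIN_TEST_SDK):
    """True if the project references Microsoft.NET.Test.Sdk at `minimum` or later."""
    root = ET.fromstring(csproj_xml)
    for ref in root.iter("PackageReference"):
        if ref.get("Include", "").lower() != "microsoft.net.test.sdk":
            continue
        version = ref.get("Version")
        if version is None:  # <Version> may be a child element instead of an attribute
            child = ref.find("Version")
            version = child.text if child is not None else None
        return version is not None and parse_version(version) >= minimum
    return False  # no reference to the test SDK at all
```

A CI step could run this against each test project and fail early with a clear message, rather than letting the test task fail later.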

Learn how the Mac App Store beautifully showcases your apps and makes them even easier to find, and how Developer ID and notarization make it safer for users to install apps that you distribute yourself.

Mac App Store

The Mac App Store makes it simple for customers to discover, purchase, and download your apps, and easily keep them updated. The Mac App Store on macOS Mojave and later offers editorial content that inspires and informs. Organized around the specific things customers love to do on Mac, along with insightful stories, curated collections, and videos, the Mac App Store beautifully showcases your apps and makes them even easier to find.

Outside the Mac App Store

While the Mac App Store is the safest place for users to get software for their Mac, you may choose to distribute your Mac apps in other ways. Gatekeeper on macOS helps protect users from downloading and installing malicious software by checking for a Developer ID certificate. Make sure to test your apps with the macOS 10.15 SDK and sign your apps, plug-ins, or installer packages to let Gatekeeper know they’re safe to install.

You can also give users even more confidence in your apps by submitting them to Apple to be notarized.
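For apps distributed outside the Mac App Store, signing and notarization usually become a few commands in the release pipeline. The sketch below only builds the argument lists, so it runs anywhere. The signing identity, paths, and keychain profile name are hypothetical. It assumes Xcode 13 or later, because it uses `xcrun notarytool`; the macOS 10.15-era workflow used `altool` instead. It also assumes a credentials profile saved earlier with `xcrun notarytool store-credentials`. Apps with embedded frameworks or helper tools need those signed first, which this sketch skips.

```python
import shlex


def release_commands(identity, app_path, zip_path, keychain_profile):
    """Return the argv lists for signing, zipping, notarizing, and stapling an app."""
    return [
        # Sign with the Developer ID certificate. "--options runtime" enables the
        # Hardened Runtime, which notarization requires; "--timestamp" adds a
        # secure timestamp.
        ["codesign", "--force", "--options", "runtime", "--timestamp",
         "--sign", identity, app_path],
        # The notary service accepts a zip, dmg, or pkg, not a bare .app bundle.
        ["ditto", "-c", "-k", "--keepParent", app_path, zip_path],
        # Upload to Apple's notary service and block until it returns a verdict.
        ["xcrun", "notarytool", "submit", zip_path,
         "--keychain-profile", keychain_profile, "--wait"],
        # Attach the ticket to the app so Gatekeeper can check it without a network lookup.
        ["xcrun", "stapler", "staple", app_path],
    ]


if __name__ == "__main__":
    # Hypothetical identity, paths, and profile; on a Mac, run each command
    # with subprocess.run(cmd, check=True) instead of printing it.
    cmds = release_commands(
        "Developer ID Application: Example Corp (TEAMID1234)",
        "build/MyApp.app", "build/MyApp.zip", "ci-notary")
    for cmd in cmds:
        print(shlex.join(cmd))
```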

Mac Logo

The Mac logo is designed to easily identify software products and hardware peripherals developed to run on macOS and take advantage of its advanced features.

| Feature | Mac App Store | Outside Mac App Store |
|---|---|---|
| App Distribution | Hosted by Apple | Managed by developer (with Developer ID) |
| Software Updates | Hosted by Apple | Managed by developer |
| Worldwide Payment Processing | Managed by Apple | Managed by developer |
| Volume Purchasing and Education Pricing | Managed by Apple | Managed by developer |
| Advanced App Capabilities (iCloud Storage and Push Notifications) | Available | Available |
| App Store Services (In-App Purchase and Game Center) | Available | Not Available |
| 64-Bit | Required | Recommended |
| App Sandboxing | Required | Recommended |

MacStadium recently hosted a panel on CI best practices at AltConf, which runs in parallel to Apple's WWDC and is specifically focused on the Apple app development community. The panelists included: Daniel Hagen, Director of IT for Aspyr Media, publisher of popular Mac games like Call of Duty®, Sid Meier’s Civilization® and SimCity™; Debayan Majumdar, Sr. Mobile Tools Engineer at Pandora, the largest streaming music provider in the US; and Peter Steinberger, CEO of PSPDFKit, the go-to solution for integrating PDFs into your app. It was an interesting discussion as the companies represent very different development teams and CI practices. However, the experiences of these diverse teams did have a few common points.

The group agreed that a 'commit often' culture was the most effective approach to getting working apps written, and all three loved TestFlight. They also agreed on Jenkins if you need a flexible deployment tool, but this is where the first difference appeared: PSPDFKit is moving its primary CD tooling from Jenkins to Buildkite for ease of use. The relatively small shop values the labor savings, but still uses Jenkins to launch its TestFlight jobs.

Everyone agreed that automated testing is a best practice, but implementations varied. As a game developer, Aspyr had the most limited set of automated tests. Pandora did not weigh in during the panel, but afterwards discussed testing with MacStadium and agreed that automating it was vital: they use extensive automated tests for regression testing and basic pipeline CI as developers commit. PSPDFKit uses its MacStadium infrastructure to test not only iOS code but also Android code! Peter even commented that, 'the Android simulator, I don't know if the conference allows swear words, but it sucks. Sorry.'

As is best practice, each organization uses a known good image to base all of their machines off of, but they differed on their implementation method. Aspyr uses Golden Images and Instant Cloning (a vSphere and Jenkins feature) to produce new machines, then updates their Golden Images about once a week. Pandora uses an Anka template as its golden image, and does updates to all their templates with Ansible as needed. PSPDFKit prefers a bare-metal approach, as this reduces technical debt, and again speaks to their team prioritizing ease of use. When a node gives unexpected results, they do a full system reset, then load Chef to return all packages to the desired state. (This takes about four hours.)

Watch the video of the panel discussion for more best practice configurations, favorite automation tricks, and other lessons learned:


Or read on for a transcribed version:

Speakers:

Shawn Lankton, Chief Revenue Officer, MacStadium

Peter Steinberger, Founder & CEO, PSPDFKit

Daniel Hagen, Director of IT, Aspyr Media

Debayan Majumdar, Technical Team Lead, Mobile Tools, Pandora/SiriusXM

Shawn:
We've got three great guests with us here today who are going to be doing most of the talking. I'll let them introduce themselves quickly. Peter, do you want to start?

Peter:
My name is Peter. I work on PSPDFKit; I'm the founder. We're in Dropbox, IBM, and Lufthansa. We're an SDK that shows PDFs on all platforms, not just iOS, but we come from the iOS world and it's still our most important market. Yeah, CI is very important for us. Every pull request runs on all the platforms automatically, and we are a happy partner of MacStadium.

Dan:
Hey everyone. My name is Daniel Hagen. I'm the director of IT for Aspyr Media. We are known as the world's largest Mac publisher of video games, so many of you have probably played a few of ours, hopefully. Yeah, anything from Civilization to Call of Duty to Sims. We've played around with a lot of the good games. My background comes from web development, so CI/CD to me is a lot of automated pipelines, automated testing, automated unit tests, that sort of thing. Bringing that into a thick-client world has been an interesting endeavor. And working with development teams that are, probably, a little bit stuck in the 1980s has been an endeavor of its own. So it's been a lot of fun to bring that about and make things happen a lot faster.

Debayan:
My name is Debayan and I lead the Mobile Tools Team at Pandora/SiriusXM now. So for those of you who don't know, SiriusXM recently acquired Pandora, which makes us the largest music streaming company in the US. We have over 100 million monthly active users. You can imagine the scale at which we have to run our things.

Debayan:
I was actually the first mobile tools engineer or the first DevOps engineer at Pandora. So I can say that we have come a very long way and it's been a great arduous journey not only building out the CI infrastructure, but just changing the culture and kind of bringing in a DevOps mindset that allows us to scale. I think that has been a very challenging journey and very enriching journey as well.

Shawn:
Thanks. I wanted to just give a few words about what continuous integration is. I think probably everybody in this room came because they already know, but I'll spend two seconds on it. Also, I'm Shawn Lankton. I work at MacStadium. I lead our sales and marketing teams and spend a lot of time thinking about how to make sure that everybody knows what's out there and how to use it, so that we can help them.

Shawn:
Continuous integration is basically part of the DevOps workflow. It really focuses on the build and test portions. The idea is that you're going to run a build every time you merge code with a pull request, and you're going to test every time you build. It helps you catch errors faster and brings faster time to value by shifting things left, so you catch them sooner.

Shawn:
Of course, again, as I'm sure you all know, when you do this for the Apple ecosystem, you have to do all your builds and a lot of your tests using Xcode. Xcode only runs on macOS. macOS only runs on genuine Apple hardware, which means you have to find some way to get it and deploy it at scale. Once you have it deployed, you have to automate it, and a lot of the tricks that DevOps teams use for every other platform will not work for macOS. That's why we have a packed house at 4:00 in the afternoon.

Shawn:
MacStadium helps. In case you're not familiar, we have data centers around the world. We have about 20,000 Macs deployed in our data centers and we're buying them pretty much as fast as we can. We have Mac minis, Mac Pros, we have iMac Pros deployed and we work with all of the different virtualization technologies, whether it's VMware, Anka, or Orka, which we are just announcing this week. If you'd like to learn more about how to use Kubernetes to orchestrate Mac VMs, come talk to us at our booth anytime throughout the week. But I won't get into that now.

Shawn:
We're also of course very excited about rack mounted Mac Pros and what the future may hold for that. The fall ends on December 20th in case anybody hadn't checked their calendar. So we'll see when that actually ships.

Shawn:
That's enough about context. I want to spend the rest of the time here talking with these guys about some of their experiences on these topics. We'll get started. You've heard a bit about who they are and what the companies are. But I think it makes a lot of sense to start by understanding what are the end points that these guys are targeting, how are their teams structured, and what are their end users expecting to find in an app. So take it away.

Peter:
Sure. So I probably represent the smaller company side here. We have like two other giants in the room. For us, the important part is, it needs to be very cost effective, it needs to be simple to administer, and of course it should be fast and good.

Peter:
Obviously we need Mac hardware to run our Mac tests. And we also want to use Mac hardware to build Mac software. We started using Jenkins very early on. We're like seven or eight years in, and everybody hates it, but it gets the job done. We now have a project to migrate to Buildkite because we slowly see that Jenkins doesn't really scale well. When you update a plugin, all the jobs need to stop and then Jenkins needs to restart. And then you have to hope that it boots up again, because if the plugin update failed you'll just get an error and you have to manually undo it, and it's very, very messy.

Peter:
But there are a lot of those problems you can live with, so I think it's a decent choice if you look at the options out there. For us it now makes sense to move to a somewhat better and also more expensive system. Setup-wise, we started with iOS. Very early on we wanted hosted Macs, because we're a remote company, so there is no real office, meaning there's also no real data center. For a while I had Mac minis at my home and then the cleaning lady unplugged them, which is not really professional and it's also kind of annoying. So having those things in an actual data center is nice.

Peter:
There aren't that many companies out there where you can actually do that. So you can do your own research. I did, and that's why I chose MacStadium. So this may be a marketing plug, but I don't have any affiliation. They are very good to work with. We looked at the different products. I mean they have Mac Pros and Xserves as well. Do they still exist?

Shawn:
They still exist. Not many people use them anymore.

Peter:
So we chose the simplest setup possible, again because our setup needs to be cost effective. It was just Mac minis: one Mac mini for one user doing one test at a time, or sometimes two.

Peter:
In the very early days we set them up manually: just VNC in and have someone do a lot of clicking. That gets annoying very quickly, so don't do that. We invested a lot of time in Ansible and automating everything. Once we were about 95% done, we found out that Microsoft maintains a really, really good Chef repository with a macOS cookbook. So you can basically pick what you want, from installing Xcode to a lot more esoteric things.

Peter:
We use Mac minis to not only test iOS but also test macOS and also do Android testing. So for Android OS I use Genymotion because the Android simulator, I don't know if the conference allows swear words, but it sucks. Sorry.

Peter:
We also try to keep our setup as simple as possible. There are Macs, and then we obviously have Windows for the Windows tests, but we didn't want to have yet another Linux system for Android. We just reuse macOS to keep the number of variables small.

Peter:
The main challenge is actually not finding a partner, and these guys are really easy to work with. The main challenge is making sure the tests are stable, that macOS works, and that your automated tests work. A lot of the trouble is actually making sure you're ready for the new Xcode release and the new macOS release. Apple usually gives you half a year to update macOS before they release an Xcode version that doesn't run on the previous macOS version anymore. So you really have to be fast. This year is even more interesting because, for example, we're going to release PDF Viewer for the Mac in fall. Is that recorded? I don't know, this is just my wish, but you said fall lasts until December 20th, so.

Shawn:
Yeah, you got plenty of time.

Peter:
But this actually forces us to be even faster. We need to have a version running with macOS 10.15 on CI. So for us running them bare metal, like just having the Mac mini and just having that on reduces a lot of variables. There's a lot of benefits in Orka and all those more fancy orchestrations, but it also is sometimes a little bit more trouble, it's something that takes a little bit longer for it to work. So that's one of the reasons why we chose to keep it simple.

Peter:
Also, we are at the scale where this still works. If you have, I don't know, thousands of devices, then it might get a little more annoying to do that. But we currently have 20 or 30, so that's still a very manageable number. And if a Mac gives us trouble or acts weird, we just reset it and let Chef run. That takes around half a day to install macOS, Xcode, Android Studio, and all the things we need, and then it's exactly at the level that we expect it to be. Then we can just add it back to our Jenkins farm.
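The reset-and-let-Chef-run approach depends on the provisioning run being idempotent. On a blank machine it installs everything, and running it again changes nothing. The toy Python model below illustrates that idea with hypothetical package names and versions. It is not how Chef works internally; where a real recipe would run installers, the sketch just records versions.

```python
def plan(desired, installed):
    """List the actions needed to bring `installed` up to `desired`.

    Both arguments map a package name to a version string. Like a typical
    Chef run, this only manages what is declared in `desired`; anything else
    already on the machine is left alone.
    """
    actions = []
    for name, version in sorted(desired.items()):
        current = installed.get(name)
        if current is None:
            actions.append(("install", name, version))
        elif current != version:
            actions.append(("change", name, version))
    return actions


def converge(desired, installed):
    """Apply the planned actions and return the resulting machine state."""
    state = dict(installed)
    for _verb, name, version in plan(desired, installed):
        state[name] = version  # a real tool would run the installer here
    return state


if __name__ == "__main__":
    desired = {"xcode": "10.2.1", "android-studio": "3.4"}  # hypothetical versions
    machine = converge(desired, {})   # freshly wiped machine
    print(plan(desired, machine))     # second run has nothing to do: []
```

Undeclared software survives a rerun, so drift can accumulate over time. That is one reason a full wipe before the run gives a more predictable machine.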

Dan:
It's interesting hearing some of that and some of the similarities in our environments. Aspyr started going through a transition about a year, year and a half ago to move a lot of our data center services off site. We were running a full build farm internally. Actually I say internally, similar to your story, it lived in a developer's office. And more than once the power turned off and it shut down the build farm. So that was changed. All the end points that did the builds were moved into our little server closet, and that gave us a little bit more stability. But as we continue to grow and our project count goes up, there's no way I can continue to grow our build node count internally effectively. And that's where I stumbled across MacStadium.

Dan:
Again, no affiliation bonus points here, but they are excellent hosts for us. They have provided us a way to expand out when we kind of hit our maximum. Our end points for the most part have been Mac and iOS, although we do also build for PC and Linux and PlayStation and Xbox. You name it, we've got it.

Dan:
So for us, Jenkins has been our backend for that, just firing off build scripts that the developers commit after going, 'Hey, it worked on my system.' We run a Perforce server for source control, and the scripts are stored there. Jenkins pulls the script out and kicks it off.

Dan:
The struggle we've had lately relates to your point about the build image. Every version of OS X needs a certain version of Xcode and all the associated libraries in order to compile these build artifacts. And traditionally, yes, that means click, click, clicking through a remote desktop session just to set it all up.

Dan:
We did get to a point where we kind of found what I call our golden image. We find what we want it to be for that version of Xcode and OS X and we archive that off and keep it around just in case we have to build ancient versions of builds, which we do.

Dan:
And then actually thanks to a tip from MacStadium, we started playing around with a Jenkins plugin called Instant Clone or VMFork, however you may refer to that, where we're able to then just spin off ephemeral images of our golden image and run our build in that, sync up our code repository, compile and go.
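The ephemeral-clone pattern Dan describes comes down to three steps: clone the golden image, run the build inside the clone, and always throw the clone away, even when the build fails. The Python sketch below shows that lifecycle with a context manager. `FakeHypervisor`, `instant_clone`, and the image name are hypothetical stand-ins, not the vSphere API or the Jenkins plugin's interface.

```python
import contextlib
import itertools


class FakeHypervisor:
    """Hypothetical stand-in for a hypervisor that can clone a VM from an image."""

    def __init__(self):
        self.running = set()
        self._ids = itertools.count(1)

    def instant_clone(self, golden_image):
        vm = f"{golden_image}-clone-{next(self._ids)}"
        self.running.add(vm)
        return vm

    def destroy(self, vm):
        self.running.discard(vm)


@contextlib.contextmanager
def ephemeral_clone(hypervisor, golden_image):
    """Yield a fresh clone of the golden image and destroy it afterwards."""
    vm = hypervisor.instant_clone(golden_image)
    try:
        yield vm
    finally:
        hypervisor.destroy(vm)  # runs even if a build step raised


def run_build(hypervisor, golden_image, steps):
    """Run each step (e.g. sync code, compile) inside a throwaway clone."""
    with ephemeral_clone(hypervisor, golden_image) as vm:
        for step in steps:
            step(vm)
        return vm
```

Because every build starts from the same golden image, a failed or polluted build cannot leak state into the next one.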

Dan:
A big point of that was getting our CI builds going. It's been interesting. Our developers come from a background where you only commit when you have something to commit, and we're changing that to a 'commit often' culture, so you find out what you're breaking as you break it. That's what this pipeline has given us the ability to do. They commit, it kicks off a quick CI build, and it finds out what they just broke. They get an email back and are able to quickly iterate on that, instead of waiting for a full feature to be complete, committed, sent off to build, and then finding out, 'Just kidding… you've got to rebuild a lot of that feature.' So that's been helpful for us.

Dan:
The other part of our business that makes us a little bit unique is that video games are such manual-testing-heavy projects. You can't do automated testing on a majority of our projects. You might be able to do some unit tests around the UI and whether the game even loads. That was one of our first unit tests: Does the game load? Does it crash?
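You can automate a 'does it load, does it crash' check without any game-specific hooks. Launch the binary, and count it as a pass if the process is still alive after a settling period. The sketch below assumes that is a good enough signal. A game that legitimately exits early, or one that hangs on a blank screen, would need a richer check. The command and timeout are hypothetical.

```python
import subprocess
import sys


def smoke_test(cmd, settle_seconds=10.0):
    """Launch `cmd` and report whether it survived `settle_seconds`.

    Returns (passed, exit_code). A process still running after the settling
    period passes and is then shut down; one that exits earlier, whether it
    crashed or quit, fails and its exit code is returned.
    """
    proc = subprocess.Popen(cmd)
    try:
        code = proc.wait(timeout=settle_seconds)
    except subprocess.TimeoutExpired:
        proc.terminate()              # ask it to quit
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()               # force it if it ignores the request
            proc.wait()
        return True, None
    return False, code


if __name__ == "__main__":
    # Stand-in for a game binary: a Python process that stays up for 30 seconds.
    print(smoke_test([sys.executable, "-c", "import time; time.sleep(30)"], 2.0))
```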

Dan:
From there it goes into manual testing, and the more we can catch these simple errors before it even gets put in the pipeline for QA, just saves us time and effort. For me, it was just a process of implement small changes as you go. It's actually kind of a CI/CD of a CI/CD, find out what works, deploy new changes, and keep it going.

Debayan:
Very interesting stories. We use Jenkins as our primary CI tool, which I'm sure Peter vehemently disagrees with. But from my experience, and I think many others will share this, when you're at a scale as big as Pandora or Aspyr or any other company at that scale, Jenkins is just the most scalable solution. If you're not at that stage, there are various other tools like Buildkite and CircleCI; if you have an abundant budget, those tools are very efficient out of the box. But what Jenkins gives you is flexibility and scalability, which works great for us and for many other companies as well.

Debayan:
How many of you here have used Mac minis on your desktop to run your CI? Raise your hands. That's it. That's like 70% of the room. That's pretty much been the experience everywhere I've been. I come from a DevOps background that's mostly cloud work (AWS, GCP, Azure), and in every company I've worked in, the mobile part is the most neglected, right? It's almost like they're third-class citizens. I don't know why.

Debayan:
I see these developers have their Mac minis on their desk, they're developing their apps and kind of like, 'Oh here, here's my APK, here's my IPA, go and test it,' that kind of model. When I joined Pandora, it was very unique because we didn't have a mobile tools team. We didn't have mobile DevOps or anything like that. The entire Pandora application, all the client teams, were sharing just six VM instances. They were building, they were testing, they were doing everything on these six instances. And that's it.

Debayan:
So what did that mean? It meant we had four-and-a-half-month release cycles. We were like, 'Okay, this is definitely not scalable. This is not going to work.' I was like, 'Okay, we have to change this.' And the first thing we needed was more hardware to run on.

Debayan:
Now, since I was the only one, and I didn't want to overburden myself by running both Linux and macOS, I decided to use macOS for iOS and Android, and of course automation as well. The first thing was to build a CI pipeline, which was great. The CI pipeline works, but now all the jobs are sitting in a queue because there aren't enough executors to run the builds. So, okay, how do we solve that?

Debayan:
The other challenge was where to put these Mac minis and how to provision them. So I asked, 'Hey, how do you guys provision your Mac minis?' And they're like, 'Oh, it's easy. We take a disk, put it in, copy an image, and then go to each Mac mini and provision them independently.' I'm like, 'Wow, that's definitely not scalable.'

Debayan:
So I ask them like, 'Hey, do we have a data center where we can put these Mac minis?' And they were like, 'Yeah, we have a data center but it runs Linux servers. We have no idea how to rack Mac minis.' So I was like, okay. That became a challenge. So slowly we started talking to our data center team, the site ops team, and we finally got a bunch of Mac minis racked out.

Debayan:
So from six VMs, we went up to about 75. Once we had these 75 VMs, I was like, okay, now how are we going to provision them? Of course the answer is to use a provisioning tool. Which one's free? Ansible's free. It's great. It's written in Python. I don't know how many of you have used Ansible here, but it's a very, very, very powerful tool. The alternatives are Chef and Puppet, and you can also use Salt, but Ansible was great. So I wrote a bunch of Ansible scripts and automated everything: iOS provisioning, Android, and all of our automation suite. And guess what? I ran the script and it took down our corporate network, because every machine tried to download Xcode at the same time.
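The outage came from every machine pulling Xcode at once. Debayan's fix was to move to Anka images. A simpler mitigation worth knowing about is to cap how many hosts download at the same time. In Ansible itself, the `--forks` option and the play-level `serial` keyword control that. The Python sketch below shows the same idea with a thread pool. `provision_all` and `ConcurrencyProbe` are hypothetical names, and the probe only sleeps instead of downloading.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def provision_all(hosts, download, max_parallel=4):
    """Call download(host) for every host, never more than max_parallel at once.

    Results come back in the same order as `hosts`.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(download, hosts))


class ConcurrencyProbe:
    """Fake download that records how many calls overlapped (for demonstration)."""

    def __init__(self, seconds=0.01):
        self.seconds = seconds
        self.active = 0
        self.peak = 0
        self._lock = threading.Lock()

    def __call__(self, host):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(self.seconds)  # stands in for pulling a multi-GB Xcode archive
        with self._lock:
            self.active -= 1
        return f"{host}: done"


if __name__ == "__main__":
    probe = ConcurrencyProbe()
    hosts = [f"mac-{i:02d}" for i in range(75)]
    provision_all(hosts, probe, max_parallel=5)
    print("peak concurrent downloads:", probe.peak)
```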

Debayan:
So I got pinged by the Net Ops Director and he's like, 'Hey, what the hell are you doing? You just took down the entire corp net.' And I'm like, 'Oh, okay.' So then I got called into a meeting and they said, 'Hey, you have to come up with a new solution because this is not going to work for us.' I'm like, 'Oh, okay.' So then I started looking around and I came across this tool called Anka.

Debayan:
So Anka is basically Docker for Macs: it provides macOS virtualization. And I was like, okay, this is one part of my solution. The other part is getting hardware that scales, because it takes us months to put in a requisition for Mac minis or Mac Pros, and then actually rack them up in the data center and make them functional.

Debayan:
So then I came across MacStadium, and I can tell you that both of these companies are great to work with. They're super easy to work with, they have great support teams, and we have been using both of them for over two years now. And it just scales so well. With Anka, what you do is essentially create one instance, and each instance is tagged. So think of it like Git: you run Ansible on one image, you make the changes, and you commit them to a centralized repository. Then you have an orchestrator. The orchestrator looks at the repository, knows all the images that you have, and caches every image on your nodes, which are our MacStadium nodes. Then you integrate it with Jenkins in a way that is completely ephemeral, which means you can now forget about the concept of slaves in Jenkins.

Debayan:
So what you're doing is essentially triggering a job. The job talks to an Anka plugin and says, 'Hey, I want to build myself. Where can I build myself?' The Jenkins plugin then goes and talks to the orchestrator and says, 'Hey, I need an image for such-and-such version provisioned to me.' The orchestrator checks the hardware pool that's available, spins up an image in seconds, runs your job, your tests, and your builds, and destroys it. That's it.
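The request flow above can be modeled in a few lines: the job asks the plugin, the plugin asks the orchestrator, and the orchestrator picks hardware and starts a VM. The toy Python scheduler below prefers hosts that already cache the requested image tag and falls back to any free host. It returns None when the job has to wait. It illustrates the flow under those assumptions and is not Anka's actual placement logic or API. `Node`, `Orchestrator`, and the tags are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: int                                  # VMs this host can run at once
    cached_tags: set = field(default_factory=set)  # image tags already on local disk
    running: int = 0


class Orchestrator:
    """Toy placement logic for ephemeral CI VMs (hypothetical, not Anka's API)."""

    def __init__(self, nodes):
        self.nodes = nodes

    def acquire(self, tag):
        """Reserve a slot for an image tag; return the Node, or None if all are busy."""
        free = [n for n in self.nodes if n.running < n.capacity]
        cached = [n for n in free if tag in n.cached_tags]
        candidates = cached or free    # prefer hosts that can start the VM right away
        if not candidates:
            return None                # the job stays queued
        node = min(candidates, key=lambda n: n.running)
        node.running += 1
        node.cached_tags.add(tag)      # pulling an uncached image leaves it cached
        return node

    def release(self, node):
        """Called after the job finishes and its VM is destroyed."""
        node.running -= 1
```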

Debayan:
So the same hardware can be shared across any configuration and any testing: Android builds, iOS, web clients, whatever you need, you can just build it. I think that was the biggest CI achievement we made at Pandora, and it actually allowed us to release Siri Shortcuts on day one of the iOS 12 launch, beating Apple Music and Spotify.

Debayan:
And the reason was that we could spin up images with customized Xcode versions. Each branch was building with a different Xcode version and different OS beta versions, and we were able to get continuous feedback from our manual testers, from our automation testers, and of course from the developers as well. So I think that was the pinnacle of our macOS CI system.
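Building each branch on its own Xcode and OS combination needs a rule that maps each branch to an image tag. The Python sketch below uses an ordered list of glob patterns in which the first match wins. The patterns and tags are hypothetical, not Pandora's actual configuration.

```python
from fnmatch import fnmatchcase

# Hypothetical routing table: patterns are checked in order and the first match wins.
IMAGE_RULES = [
    ("feature/ios12-*", "xcode10-beta-macos10.14-beta"),
    ("*", "xcode9.4-macos10.13"),  # default for every other branch
]


def image_for_branch(branch, rules=IMAGE_RULES):
    """Return the VM image tag a branch should build on."""
    for pattern, tag in rules:
        if fnmatchcase(branch, pattern):  # in fnmatch, "*" also matches "/"
            return tag
    raise LookupError(f"no image rule matches branch {branch!r}")
```

Keeping the table ordered and ending it with a catch-all means new branches always get a known default image.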

Shawn:
That's great. A lot in all of those. I wanted to just unpack some of the themes that came up. So, you guys had talked about... Peter, you were using Ansible, you moved to Chef. Debayan kind of went with Ansible and stuck with it. I think Dan, you sort of found your golden images through a bit of trial and error. I wanted to just kind of stick on that topic a little bit, because that whole defining your infrastructure as code is an important piece. Maybe just spend a moment kind of comparing and contrasting. Start with you Peter, your experiences with the two of them and what advice you would give to the room on how to think about which is the right tool and where to get resources for it.

Peter:
The code, like having infrastructure as code, is very important for us. That's why Ansible or Chef, anything that scripts and can recreate an environment from scratch, was so attractive for us compared to images. We played with images. We did the whole hypervisor thing, but ultimately both cost and complexity were higher for us, since we're not in the four digits of machines. So I think our solution works really well for our scale.

Peter:
The main reason we moved to Chef was that Microsoft has such a great selection of Mac-specific cookbooks that it was a huge time saver for us. It covered all the things we would otherwise have had to write ourselves. Community is really important, so it helps a lot if you have a very strong player who provides a lot of things, and we contribute to that repository now as well. Again, it's a huge time saver and ultimately a cost saver. We're bootstrapped, so I need to actually look at every dollar, or euro in my case, that we spend. So the priorities are a little bit different.

Peter:
That's also why we chose Mac minis, and for a long time we used the 2012 four-core Mac minis, because they were pretty cheap and okay on performance. Obviously a Mac Pro would be faster, but again the cost vs. performance was not beneficial for us. So when the 2018 Mac minis came out, this was actually huge for us, because it brought a full build down from four hours to two. Now, with some other improvements, we are at 30 minutes. That's for releasing. Testing is actually faster because it doesn't need to build all the architectures, and we can streamline a lot of that and distribute tests to multiple machines. We have modern tests and UI tests, and we just use multiple machines to distribute them instead of using more expensive machines that are a little bit faster. For our workload that helps a little more.
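Spreading tests across several cheaper machines works best when each machine gets a similar amount of work. A common way to do that is the greedy longest-processing-time heuristic: sort tests from slowest to fastest and hand each one to the least-loaded machine. That heuristic is an assumption here, not something taken from PSPDFKit's setup. The test names and timings are hypothetical.

```python
import heapq


def shard_tests(durations, machines):
    """Split tests across `machines` shards with roughly equal total runtime.

    `durations` maps a test name to its last observed runtime in seconds.
    Greedy longest-processing-time: slowest tests first, each to the
    currently least-loaded shard. `machines` must be at least 1.
    """
    heap = [(0.0, i) for i in range(machines)]  # (total seconds, shard index)
    shards = [[] for _ in range(machines)]
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, i = heapq.heappop(heap)           # least-loaded shard so far
        shards[i].append(name)
        heapq.heappush(heap, (load + secs, i))
    return shards
```

Timings from the previous run are usually good enough input. Tests without history can get an estimated default duration.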

Shawn:
Daniel, I don't know if you looked into it and found that it wouldn't work for some reason or if you've kind of got any plans to approach defining those images more clearly, if you could talk about that?

Dan:
Yeah, so this is a process that we're actively going through right now. I come from a background... I've handled all our web infrastructure, and for that I've used Chef and so that's our go-to. Most of that came about because it's written in Ruby and our web services are written in Ruby. So that just made a lot of sense, and to Peter's point, it's such a well-covered community. It reduced our individual contribution effort drastically to be able to use that.

Dan:
The struggle we are hitting with these build image configuration scripts is really wrapped around our SDKs. The open ones are easy: the Android SDK, even Xcode, you can easily grab off the Internet provided you have an account. Some of the ones where our access is more limited, such as PlayStation and Xbox, we have to control very tightly. We can't just wget an SDK image off the internet. You have to have it on your local network, you have to secure it, and for some of them you actually have to audit every transfer.
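For SDKs that can't be downloaded from the internet, the transfer itself becomes something to control. The Python sketch below copies an SDK from an internal share and verifies the copy against a known SHA-256 digest. It then appends one JSON line per transfer to an audit log. The paths, the source of the expected digest, and the log format are assumptions; platform holders' actual audit requirements may differ.

```python
import hashlib
import json
import shutil
import tempfile
import time
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so multi-GB SDKs don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_restricted_sdk(src, dest, expected_sha256, audit_log, requested_by):
    """Copy an SDK from an internal share, verify the copy, and log the transfer."""
    shutil.copyfile(src, dest)
    actual = sha256_of(dest)
    verified = actual == expected_sha256
    record = {"time": time.time(), "user": requested_by, "src": str(src),
              "dest": str(dest), "sha256": actual, "verified": verified}
    with open(audit_log, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")  # one JSON object per transfer
    if not verified:
        Path(dest).unlink()                   # never leave an unverified SDK behind
        raise ValueError(f"checksum mismatch for {src}")
    return Path(dest)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        share = Path(d, "sdk-share.bin")
        share.write_bytes(b"console sdk bytes")
        good = hashlib.sha256(b"console sdk bytes").hexdigest()
        print(fetch_restricted_sdk(share, Path(d, "sdk.bin"), good,
                                   Path(d, "audit.jsonl"), "build-bot"))
```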

Dan:
So for those, we're looking into building our own cookbooks. The struggle there has been that the cookbooks generally get designed by the people who are familiar with the SDK, and our people familiar with SDK are game developers who mostly do Objective-C and C++. So they're not really game for writing Ruby cookbooks. So that is creating a clash of the worlds where they're having to lean on my infrastructure team to build these cookbooks out, while we partner with them to stabilize golden images so developers aren't clicking through to build every image.

Shawn:
So this is getting into the idea of tips and tricks and secrets and hacks. What are some of the things, Debayan, that you've found that are maybe in Xcode configurations or little nits that you wouldn't find unless you had already kind of walked the path for a couple of years?

Debayan:
So here's a tip that'll definitely speed up your builds: disable Spotlight. But if you disable Spotlight, you have to make sure that you are not using xcversion to switch between Xcode versions. That became a challenge for us because we had multiple Xcode versions on the same image, and we would run a script that selected the Xcode version using xcversion select. The problem was that Spotlight indexing is required for running xcversion.

Debayan:
So then we had to make the images isolated, so we didn't have to switch within an image; we had separate images for each Xcode version. That allowed us to disable Spotlight, and that immediately cut our build times by almost 15%, which is great.
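Separate images per Xcode version is one answer. Another is to keep several Xcodes on one image and point each build at one of them with the `DEVELOPER_DIR` environment variable, which `xcodebuild` and `xcrun` honor. This is offered as an alternative, not as what Pandora did. It avoids xcversion's Spotlight lookup entirely. The install paths below are hypothetical.

```python
import os

# Hypothetical side-by-side Xcode installs on a single image.
XCODE_APPS = {
    "10.3": "/Applications/Xcode_10.3.app",
    "11.0": "/Applications/Xcode_11.app",
}


def xcode_env(version, base_env=None):
    """Return an environment that makes xcodebuild/xcrun use one specific Xcode.

    DEVELOPER_DIR overrides the xcode-select default only for processes
    started with this environment, so no Spotlight query is needed.
    """
    app = XCODE_APPS.get(version)
    if app is None:
        raise ValueError(f"Xcode {version} is not installed on this image")
    env = dict(os.environ if base_env is None else base_env)
    env["DEVELOPER_DIR"] = f"{app}/Contents/Developer"
    return env
```

A build step could then call `subprocess.run(["xcodebuild", "-scheme", "App", "test"], env=xcode_env("11.0"), check=True)`, where the scheme name is again hypothetical.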

Peter:
So, the Spotlight thing you can also solve by selectively disabling some folders. We disabled indexing for everything where code lives and for DerivedData, which gets you almost the same percentage, but you can still do most things, because some parts of macOS just get really weird if Spotlight's disabled. Apple is only testing the golden path and not thinking about those things.
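You can script a similar effect for build output by giving DerivedData a directory name that ends in `.noindex`. Spotlight skips directories with that suffix, which is the convention Xcode itself uses for its `Intermediates.noindex` folder. Pass that path to `xcodebuild -derivedDataPath`. This is an assumed technique, not one Peter described. The scheme, workspace, and paths below are hypothetical.

```python
from pathlib import Path


def xcodebuild_test_command(scheme, workspace, derived_root):
    """Build an xcodebuild test invocation whose DerivedData lives in a `.noindex` folder."""
    derived = Path(derived_root)
    if derived.suffix != ".noindex":
        derived = derived.with_name(derived.name + ".noindex")  # e.g. DerivedData.noindex
    return [
        "xcodebuild",
        "-workspace", workspace,
        "-scheme", scheme,
        "-derivedDataPath", str(derived),
        "test",
    ]
```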

Dan:
Especially if you're used to pressing Command-Space to type in an app name and launch it. Killing Spotlight kills that.

Dan:
The biggest tip I can think of is actually about our artifact delivery. Once we've compiled the builds, keep in mind, if anyone here has played Borderlands, I'm so sorry, but it's a 20 gigabyte build. That is not something you just transfer over the network whenever you want to, or if you do, you get that network admin, which is me, very unhappy. So what we ended up doing (this is going to sound a little 'no duh') is putting Dropbox on our build nodes and copying the build artifact into it. It then seeds out to all of our QA machines that are on the same Dropbox share, and they can selectively pick which project folders they want to sync.
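When artifacts are handed to a sync service like this, consumers should never see a half-copied build. The Python sketch below copies the artifact into a per-project, per-build folder under a temporary name and then renames it into place. The folder layout and names are assumptions. A sync client may still upload the temporary file while the copy is in progress; the rename only guarantees that the final file name appears complete.

```python
import os
import shutil
import tempfile
from pathlib import Path


def publish_artifact(artifact, sync_root, project, build_id):
    """Copy `artifact` to <sync_root>/<project>/<build_id>/<artifact name>.

    The bytes go to a temporary file in the destination folder first and are
    then renamed into place, so anything watching for the final file name
    never opens a half-copied build.
    """
    artifact = Path(artifact)
    dest_dir = Path(sync_root) / project / build_id
    dest_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=dest_dir, prefix=".partial-")
    os.close(fd)
    try:
        shutil.copyfile(artifact, tmp)
        final = dest_dir / artifact.name
        os.replace(tmp, final)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)      # don't leave partial files behind on failure
        raise
    return final
```

A nightly job might call `publish_artifact("build/Game.zip", "/Users/ci/Dropbox/builds", "game", "nightly-2019-06-05")`. All of those names are hypothetical.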

Dan:
And part of the benefit of a CI/CD pipeline is that it doesn't have to build only when someone commits code. It can be a nightly build, a noon build, a first-thing-in-the-morning build. Those builds go off, and everything gets synced into the folder. Those computers are up all the time, so they download the builds as they're able and sync them from each other over the local network instead of through every switch node. That has saved us a ton of bandwidth and gotten our QA department the builds they need right when they're out there. So that's probably my biggest tip.
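A minimal Python sketch of the publishing step on a build node, assuming the sync client (Dropbox here) watches a local folder and that builds land in a hypothetical per-project, per-build layout so QA machines can selectively sync one project. The function and folder names are illustrative, not Dropbox's API; the sync client does the actual distribution.

```python
import shutil
from pathlib import Path


def publish_artifact(artifact: Path, share_root: Path, project: str, build_id: str) -> Path:
    """Copy a finished build into share_root/project/build_id/."""
    dest_dir = share_root / project / build_id
    dest_dir.mkdir(parents=True, exist_ok=True)
    final = dest_dir / artifact.name
    staging = dest_dir / (artifact.name + ".partial")
    # Copy under a staging name first so the final name only appears once
    # the copy is complete; QA tooling can ignore *.partial entries.
    if artifact.is_dir():
        shutil.copytree(artifact, staging)  # e.g. an .app bundle
    else:
        shutil.copy2(artifact, staging)
    staging.replace(final)  # rename within the same directory
    return final
```

The staging rename only protects consumers that look for the final name; the sync client may still upload the temporary copy, so a unique `build_id` per nightly or noon build keeps runs from overwriting each other.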

Shawn:
Awesome. What else you guys got? Watching people take notes in the audience, I think we're onto something.

Debayan:
So, one more important benefit that we got out of using Anka was that we were able to actually cache everything inside the image, which meant that we could produce images that cached our entire source code. It also could cache our build artifacts, dependencies, everything within the image itself, which also saved an incredible amount of time.

Dan:
I'll just chime in on that. That's another thing we're doing with these golden images. I consider them golden in that they're stabilized and have the whole tool set, but we are also what I call re-hydrating them: we bring them back online, sync them up to the latest code every week, then shut them back down and clone off of that. That way, every time you create an ephemeral build and it syncs up to the latest, it's only catching up on a few days of changes.

Peter:
I have a few more tips on how you can reduce the number of machines you need. I'm actually curious, is anybody using a monorepo for your setup? Do we have anyone? Oh yeah, maybe 10%, 8%. So we switched to one giant monorepo for all our platforms, which means every time somebody opened a pull request, I think 50 jobs spun up: iOS, Android, Web, Windows, macOS. On iOS you need 32-bit and 64-bit (now we can finally drop 32-bit). On Android, different versions; iPad and iPhone; iOS 11, 12, and now 13. So there were a lot of combinations, and a lot of machines. I mean, we like MacStadium, but we also like not paying too much.

Peter:
So, one of the things we did is write something on Jenkins (and I fully agree Jenkins is the most flexible system), and we made sure that when we moved to Buildkite, we could do it there as well. We use something we call ‘selective pull request testing.’ You can Google “selective pull request testing PSPDFKit” and you'll find a blog post about just that, where basically a script looks at which files changed and then analyzes which platforms they impact.

Peter:
So for example, if I change something like ‘blah blah view controller’ in the iOS app directory, Jenkins knows through the script that it only needs to run the iOS tests, not core, not Android, not everything else, which saves a huge number of machines. But if somebody from the core team changes something in our shared C++ layer, that means everything has to be tested.
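A minimal Python sketch of the mapping Peter describes, not PSPDFKit's actual script: the folder names and the rule that files outside any known folder trigger everything are assumptions. In CI, the changed-file list would come from something like `git diff --name-only` against the target branch.

```python
from typing import Dict, Iterable, Set

# Hypothetical monorepo layout: top-level folder -> platform test suite.
PLATFORM_DIRS: Dict[str, str] = {
    "ios/": "ios",
    "android/": "android",
    "web/": "web",
    "windows/": "windows",
    "macos/": "macos",
}
CORE_DIR = "core/"  # shared C++ layer used by every platform
EVERYTHING: Set[str] = {"core", *PLATFORM_DIRS.values()}


def affected_suites(changed_files: Iterable[str]) -> Set[str]:
    """Return the test suites a pull request needs to run."""
    suites: Set[str] = set()
    for path in changed_files:
        if path.startswith(CORE_DIR):
            return set(EVERYTHING)  # core change: test everything
        for prefix, suite in PLATFORM_DIRS.items():
            if path.startswith(prefix):
                suites.add(suite)
                break
        else:
            # File outside any known folder (CI config, build scripts):
            # err on the side of running everything.
            return set(EVERYTHING)
    return suites
```

An iOS-only change such as `ios/App/FooViewController.swift` yields just `{"ios"}`, while any file under `core/` fans out to every platform.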

Peter:
So what our core team does for many of their PRs (because they also like to commit early and commit often, and we don't want the CI clogged for hours) is use a label on GitHub, and then a Jenkins script again. I think this script is actually available; you don't have to write it yourself. It looks for the ‘skip CI’ label and just doesn't run CI until I remove it. So if you're a little bit careful when you open a pull request early, you can save a ton of load on CI, and once your pull request gets toward completion you can still get that feedback. Especially if you do a larger refactor, you don't need all that feedback all the time, and again, this just saves cost, time, and ultimately machines, right?
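The check a CI job could make before spending a machine on a pull request, as a minimal Python sketch. It assumes the pull request's labels have already been fetched from GitHub elsewhere (not shown), and the label text is a hypothetical choice compared case-insensitively.

```python
from typing import Iterable

SKIP_LABEL = "skip ci"  # hypothetical label text; compared case-insensitively


def should_run_ci(labels: Iterable[str]) -> bool:
    """CI runs unless the pull request carries the skip label."""
    return SKIP_LABEL not in {label.strip().lower() for label in labels}
```

Removing the label and re-triggering the job brings the full run back once the pull request is close to done.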

Peter:
So, we've used a bunch of tricks to reduce our load. For example, we start with one test, and only if that succeeds do we spin off the others. You can get very creative depending on how important time vs. cost is for your organization and how much you want to spend. There is a lot of room to optimize in that direction.
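A minimal Python sketch of the "one test first, then fan out" gate, assuming each job is a callable that returns True on success. In a real pipeline each job would be dispatched to its own machine; the thread pool here only stands in for that, and the names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

Job = Callable[[], bool]  # returns True on success


def run_staged(smoke: Job, others: Dict[str, Job], max_workers: int = 4) -> Dict[str, bool]:
    """Run one cheap job first; fan out to the rest only if it passes."""
    results: Dict[str, bool] = {"smoke": smoke()}
    if not results["smoke"]:
        return results  # fail fast: no machines spent on the other jobs
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(job) for name, job in others.items()}
        for name, future in futures.items():
            results[name] = future.result()
    return results
```

The trade-off is the one Peter names: a failing smoke job saves every other machine, at the cost of adding the smoke job's duration to the happy path.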

Shawn:
You talked about reducing the build time from one hour down to 30... four hours down to two hours, I think you said, just by changing hardware, which is a great way to speed things up, and then from two hours down to 30 minutes. I don't know how much of that is the selective pull request testing you're talking about, or whether you have some other tricks for achieving a 4X -

Peter:
So we also use something called ccache, and STC. We actually distribute some of the compile artifacts across build servers, which is much faster than compiling everything directly. ccache works as a kind of intermediary layer between the compiler and the output: it's called first, and if the invocation is the same and all the files are the same, it just uses the cached result and doesn't even call the compiler. So if you set that up right, you can often get a huge speedup, especially when you switch branches.
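A toy Python model of the mechanism Peter describes, not ccache itself: real ccache hashes the compiler, the command-line flags, and the preprocessed source (or, in its direct mode, the source plus the headers it includes) and keeps object files on disk. Here a raw source string stands in for all of that, and the class and compiler identifier are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List

Compiler = Callable[[str, List[str]], bytes]  # (source, args) -> object code


class CompilerCache:
    """Toy model of a ccache-style cache in front of a compiler."""

    def __init__(self, compiler: Compiler, compiler_id: str) -> None:
        self._compiler = compiler
        self._compiler_id = compiler_id
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def _key(self, source: str, args: List[str]) -> str:
        # Same compiler + same flags + same input -> same key.
        h = hashlib.sha256()
        for part in [self._compiler_id, *args, source]:
            h.update(part.encode("utf-8"))
            h.update(b"\0")  # separator so ["ab", "c"] != ["a", "bc"]
        return h.hexdigest()

    def compile(self, source: str, args: List[str]) -> bytes:
        key = self._key(source, args)
        if key in self._store:
            self.hits += 1
            return self._store[key]  # compiler is never invoked
        self.misses += 1
        output = self._compiler(source, args)
        self._store[key] = output
        return output
```

This is why branch switching benefits so much: flipping back to a branch whose files were already compiled produces the same keys, so those translation units come straight from the cache.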

Peter:
And another one is that we dropped iOS 10, which dropped 32-bit, which was really nice. Half the architectures we needed, i386 and ARMv7, were gone, which basically cut the remaining time in half again.

Shawn:
Any other tips? I know that speeding up build times is always a hot topic. Any ideas on that?

Debayan:
We recently did an extensive amount of benchmarking between Linux servers and Mac hardware from MacStadium for running our Android builds. Like I told you in my story, I was the only person on the team, which is why I went with an all-Mac solution. But now that my team is much bigger, I thought, “Why not give it a shot?” We ran tests across a bunch of AMD EPYC servers that we have in house in our data center, and we also used GCP for the benchmarking. What we found, shockingly, was that the 2018 six-core Mac minis are actually almost two to three times faster, even for Android.

Shawn:
We hear that a lot, and a part of it I think has to do with the fact that the builds do better when they're on a real computer where the memory and the hard drive and the processor are all physically close to each other, versus a scaled out data center architecture.

Debayan:
It also depends on processor speed. If you have long-running tasks that depend on a single core, then the six-core minis definitely come out ahead. If your workload is more multi-threaded, then maybe...

Peter:
One more tip, and it goes toward build stability, because you all want green tests, and hopefully they should be green without retrying five times. We experimented a lot, again because we want to save costs and be as efficient as possible, but one machine should do one thing at a time. Don’t try to be clever: don't try to run Android tests and iOS tests on the same machine, or play with 'Oh, there is this checkbox for parallel testing in Xcode. This is nice. What could possibly go wrong?'

Peter:
For model tests it probably matters less, because they are just much more predictable. But if you do iOS, you probably write XCUITests, and those are really fragile. Ideally the machine shouldn't do anything else, because a lot of it is timing dependent, and you want to minimize anything that could cause a huge spike in CPU. Suddenly something that usually takes one second takes 10 seconds, but Xcode only waits five seconds, so it aborts, you have to retry, and ultimately everything takes longer.

Peter:
So, don't try to become too clever. One machine, one account, one test at a time. Ultimately, that will be much more reliable. Please prove me wrong, because I would love to optimize it more, but we tried a lot in that area, and everything ultimately cost more time: if a test fails and blocks the pull request, you have to wait a long time, and everybody gets annoyed.

Shawn:
And VMs are definitely a good way to solve that while keeping things super clean, even though you still have multiple tests or multiple builds running on the same machine.

Dan:
So, not to geek out on that too much, but the difference between kernel time management and hypervisor time slicing is where that benefit comes in. Kernel management will do its best to share out CPU clock cycles, but VM hypervisor time slicing is the best way I know of, from data and reports, to get the most bang for your buck out of the hardware.

Dan:
I remembered what I was going to say about build time and what hardware does. For anyone who's a fan of Civilization VI, we currently build the desktop version of that on... Our developers have 2017 MacBook Pros, or not the latest iMac Pro, but the previous one. We were doing about a two-and-a-half to three-hour build for that locally. We got the newest Mac minis and that dropped to about 40 minutes, so that tells you a little bit about the performance there. We got the new iMac Pro: 10 minutes flat. That thing's a beast.

Dan:
The other thing I wanted to mention, though, is something we looked into. I don't know how many of you have messed around in Xcode and seen the cloud build or remote build option. It's basically a way to connect your local IDE to another Mac and have it compile over there.

Dan:
We looked at that as an option for builds that don't block the developer and let them keep going. It is very inefficient, and it seems to be very immature. The big thing that I took away from it... and this is another benefit of building out a CI/CD pipeline. I don't know how many of you have had the experience of 'But it works on my machine.'

Dan:
Any time a developer commits their code, it goes off to another machine that was configured by someone else, which compiles it, tests it, and hands it off to QA. That means they can't just compile it locally, get it working great, hand it off to QA, and suddenly it doesn't work for QA. They get stopped at the CI/CD pipeline: 'Well, it may have worked just fine on your system, but you had a library that you didn't commit. Fix it.'

Shawn:
And that gets to the culture change point, which I think all three of you guys mentioned, right as you were starting off. So, I think that we're almost at time here. There's a whole bunch of folks, and I'm sure you all brought questions that are interesting and specific to you. Some of them are probably about some of these new features that we heard about yesterday. So, I'd like to open it up for questions. I know that because this is the last session they said we could go a little late, so we'll go until we run out of questions, or run out of people.

Audience member:
Do any of you guys use TestFlight for internal or external app distribution?

Dan:
Yes, we do.

Debayan:
Absolutely. Yes.

Audience member:
How does that fit into your CI pipeline?

Dan:
I'll take that one first. So, we have not automated it yet. That's my short answer.

Debayan:
Yeah. We have a complete CI/CD solution, which means we automatically upload to TestFlight and HockeyApp, which are our two primary distribution channels. We use Fastlane. How many of you are familiar with Fastlane here? Great, most of you. So, of course, we use Fastlane Pilot. It's built into our CI/CD process: the build runs through a bunch of automated tests, and once all the gates pass, we automatically deploy it to TestFlight and HockeyApp.

Peter:
We basically do the same. It's not every nightly; that doesn't make sense for us. But every few days, when it makes sense, we tell a little bot in Slack, 'Release bot, TestFlight this branch.' That spins up Jenkins, which runs all the tests, and if they're all good, it uses Fastlane to upload the build to TestFlight. When it's done, it calls back to Slack and says, 'Yeah, your build is there,' and then everybody can test it.
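A minimal Python sketch of the gated flow both panelists describe: tests first, upload only on green, then report back. `fastlane pilot upload --ipa <path>` is fastlane's TestFlight upload; the `run_tests`, `run_command`, and `notify` hooks are hypothetical stand-ins for a Jenkins test job, a shell runner, and a Slack bot, and App Store Connect authentication is not shown.

```python
from typing import Callable, List


def pilot_upload_cmd(ipa_path: str) -> List[str]:
    # fastlane's pilot action uploads an .ipa to TestFlight.
    return ["fastlane", "pilot", "upload", "--ipa", ipa_path]


def release_to_testflight(branch: str, ipa_path: str,
                          run_tests: Callable[[str], bool],
                          run_command: Callable[[List[str]], None],
                          notify: Callable[[str], None]) -> bool:
    """Gate the upload on a green test run, then report back to chat."""
    if not run_tests(branch):
        notify(f"Tests failed on {branch}; nothing was uploaded.")
        return False
    run_command(pilot_upload_cmd(ipa_path))
    notify(f"Build from {branch} is on TestFlight.")
    return True
```

Injecting the hooks keeps the gating logic testable without Jenkins, Slack, or Apple credentials on hand.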

Shawn:
The next question is about how you orchestrate key management with Anka.

Debayan:
That's a great question. We use Vault for our key management system, so that's baked into our process.

Shawn:
We have a question about the Xcode build-for-testing and test-without-building options, and how they affect build times.

Peter:
Build-for-testing, I'm actually not sure what it does. We basically script everything using a little bit of Fastlane and mostly our own scripts. So it's just build, then we manually call test, and then we use something called Trainer, which translates Xcode's test plist files into JUnit reports so Jenkins can actually parse and show them. That's a small Ruby tool that I wrote with Felix.
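Trainer itself is a Ruby tool; this is a minimal Python sketch of the same conversion idea, assuming the Xcode results have already been flattened into simple records. The real plist schema is more deeply nested and is not reproduced here, and the record keys are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def to_junit(suite_name: str, results: List[Dict]) -> str:
    """Render simplified test records as a JUnit XML report.

    Each record is a hypothetical flattened dict such as
    {"classname": "LoginTests", "name": "testValidLogin",
     "duration": 0.42, "failure": None}.
    """
    failures = sum(1 for r in results if r.get("failure"))
    root = ET.Element("testsuites")
    suite = ET.SubElement(root, "testsuite", name=suite_name,
                          tests=str(len(results)), failures=str(failures))
    for r in results:
        case = ET.SubElement(suite, "testcase", classname=r["classname"],
                             name=r["name"], time=f"{r.get('duration', 0.0):.3f}")
        if r.get("failure"):
            # A <failure> child marks the test case as failed for JUnit consumers.
            ET.SubElement(case, "failure", message=r["failure"])
    return ET.tostring(root, encoding="unicode")
```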

Shawn:
I guess that's mostly a no then.

Dan:
Mostly no, but I'm going to go Google it.
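For reference, `build-for-testing` and `test-without-building` are xcodebuild actions that split compilation from test execution: the first compiles the app and test bundles and writes an `.xctestrun` file into the build products, and the second runs those prebuilt tests from that file. A minimal Python sketch of the two invocations; the scheme name, destination, and paths are hypothetical, a real project may also need `-workspace` or `-project`, and the panel did not measure the effect on build times.

```python
from typing import List


def build_for_testing_cmd(scheme: str, destination: str, derived_data: str) -> List[str]:
    # Compile the app and test bundles once; the .xctestrun file ends up
    # under <derived_data>/Build/Products/.
    return ["xcodebuild", "build-for-testing",
            "-scheme", scheme,
            "-destination", destination,
            "-derivedDataPath", derived_data]


def run_prebuilt_tests_cmd(xctestrun: str, destination: str) -> List[str]:
    # Run the already-built tests without recompiling; this can be repeated
    # on several machines or simulators from the same build products.
    return ["xcodebuild", "test-without-building",
            "-xctestrun", xctestrun,
            "-destination", destination]
```

The potential saving comes from compiling once and fanning the same products out to several test runs, rather than from making any single build faster.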

Shawn:
We have a question about tips and tricks to get the simulator to spin up more quickly for testing parallelization.


Debayan:
Yeah, I'll take that question. Like I said, with Anka you can cache the image, so you can actually have running simulators in it already. If you want an iPhone X simulator running iOS 12, you can have that already in your image. As soon as Jenkins fires a test, it just spins up the image with the simulator already running, so you can start executing tests right away.

Shawn:
And you can do that with every hypervisor technology.

Debayan:
Yeah, essentially.

Peter:
Yeah, we do something similar. The simulator especially sometimes needs to be spun up beforehand, because otherwise it does weird things. So spinning it up manually before you even build is a very good idea. It also increases reliability; otherwise you sometimes get error 65, if that rings a bell.
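A minimal Python sketch of booting a simulator before the build starts. `xcrun simctl bootstatus <udid> -b` boots the device if it is not already booted and waits for the boot to finish; the UDID would come from `xcrun simctl list devices`. The helper names are hypothetical, and the command only runs on macOS when `dry_run` is off.

```python
import platform
import subprocess
from typing import List


def preboot_command(udid: str) -> List[str]:
    # "bootstatus -b" boots the simulator if needed and blocks until
    # the boot has finished.
    return ["xcrun", "simctl", "bootstatus", udid, "-b"]


def preboot_simulator(udid: str, dry_run: bool = True) -> List[str]:
    """Boot a simulator ahead of the build; execute only on macOS."""
    cmd = preboot_command(udid)
    if not dry_run and platform.system() == "Darwin":
        subprocess.run(cmd, check=True)
    return cmd
```

Running this as the first pipeline step lets the boot overlap with nothing else competing for CPU, which fits Peter's one-thing-at-a-time advice.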

Debayan:
Just one more thing I want to add: I don't know how familiar you are with Bluepill. Has anyone used Bluepill here? One of the challenges with using third-party tools for things like parallel testing is that every time Apple releases something, everyone else has to play catch-up. If your CI system depends heavily on that tool, you'll be completely biting the dust at that point, because you don't know what to do. That's why we moved away from Bluepill to these kinds of customized solutions, so we know that whenever Apple springs a surprise on us, we will be ready to tackle it.

Shawn:
Your destiny is at least in your own hands at that point.

Debayan:
It's in my hands, yeah.

Dan:
Just so long as you accept the new terms of use and download it real fast.

Audience member:
Do you guys test on real devices as part of CI?

Peter:
So, that was something we tried, but we have not found a solution yet. First of all, devices are not made for 24-hour testing. If you do that, eventually the battery will explode, and that regularly happens at the few services that offer it, if you talk to people who know things a little bit better. So that's caveat number one.

Peter:
Caveat number two, ideally you should completely restore the device. That's very hard to automate. I think the few companies who do, they write custom drivers and really do edgy stuff to get some of that automated. So, quick answer, we want to, but we haven't figured it out.

Debayan:
Yeah. We are doing real-device testing; we're kind of doing everything, right? But again, to Peter's point, that definitely should not be your first choice in the testing pyramid. If you look at the testing pyramid, the majority of it should be unit tests. There's also now the testing trophy, which puts more weight on integration tests.

Debayan:
But either way, the majority of our UI tests run on simulators and emulators, and only the tests that cannot run on simulators and emulators run on real devices. Right now we run the real devices in-house, but we are evaluating cloud solutions. There are plenty of cloud providers, like Sauce Labs, BrowserStack, and Perfecto, so you can look at those, as well as AWS Device Farm.

Debayan:
But actually, what we have been building right now is an in-house solution using Anka and MacStadium, and it actually is working really well.

Shawn:
We expect to see that on Medium sometime soon.

Debayan:
Yeah. You'll probably see an article from me about that, once we are able to completely scale it.

Shawn:
The next question is how do you automate tests that would involve things like push notifications, which are difficult in a simulated environment?

Debayan:
Yeah, those are exactly the scenarios we use real devices for. With Android, a lot of statistics like performance, memory, and CPU are readily available, but they're not available on iOS, on iPhones, so it's kind of challenging. But test cases like push notifications, backgrounded apps, third-party app integrations, and deep links we definitely run on real devices.
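A minimal Python sketch of the split Debayan describes (simulators by default, real devices only where a scenario needs them), assuming UI tests are tagged by scenario. The tag names and the `UISpec` type are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Scenarios the panel mentions running on real devices; tag names are hypothetical.
DEVICE_ONLY: Set[str] = {"push_notifications", "backgrounding",
                         "third_party_integration", "deep_links"}


@dataclass
class UISpec:
    name: str
    tags: Set[str] = field(default_factory=set)


def route(specs: List[UISpec]) -> Tuple[List[UISpec], List[UISpec]]:
    """Split specs into (simulator, real_device) lists."""
    simulator: List[UISpec] = []
    device: List[UISpec] = []
    for spec in specs:
        # Anything touching a device-only scenario goes to hardware;
        # everything else stays on the cheaper simulators.
        (device if spec.tags & DEVICE_ONLY else simulator).append(spec)
    return simulator, device
```

Keeping the device list short also follows Dan's advice below: check off as much as possible on simulators and save hardware for what truly needs it.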

Dan:
So, I'll kind of tack on to the last question along with this one. Obviously, video games are a whole different game there. We have a full test department. It is not the dream job that every teenager thinks it is. If you have a teenager who thinks that, send them to me; I will crush that dream.

Dan:
One quick story this made me think of, about devices that are always on: before we moved to MacStadium and AWS, we had a co-location in Austin, and I remember going down the aisle of racks and noticing smoke coming from one of them. It wasn't an iPhone, it was an Android device, but it was a device that was left on 24/7. So, to back up Peter's story, don't do that.

Dan:
For things like push notifications, I can only emphasize the point: automate what makes sense, and keep what runs on physical devices to a minimum. If it is something like push notifications, limit the number of iterations you have to go through on hardware, see how many things you can check off through a simulator, and leave only the rest for the devices. If it's performance specs, which is a lot of what we do (how does it run on the device, what is the CPU taxed at, what's the memory use, and that sort of thing), we have to profile the mobile device through a desktop client. So we limit those as much as we can. It really takes a build engineer, and that was actually going to be one of my other comments here.

Dan:
If you're just starting out, if you're just building out your app, consider when you need to bring in a build engineer, when you need a dedicated point of contact for this. As you can tell from my story earlier, we're kind of stumbling through this, with the infrastructure team and the development team both trying to figure this thing out. We're actually about to settle on development running a lot of this, with my team just providing the infrastructure, which really means I'm just paying Shawn here to do that job for me.


Shawn:
So, you can go on vacation.

Dan:
Exactly. So, I can do more talks like this, and tell you to just use his stuff.

Dan:
But yes, build engineers are essential. We talked about build time. We have engineers that are so busy working on new features and getting bugs out of the way, that we really don't have anyone sitting around going, 'Okay, but is this build script efficient? Is there some way that we can parallelize this process?' Or, 'What is the CPU profile on this VM during the build time? Like, am I fully using this or not?'

Dan:
You need to have someone who can sit down and look at that, at some point in your growth process. So, I highly recommend that.

Shawn:
Great. Well that's a great place to leave it. I'd like to thank you guys so much for spending your time with us, and thanks everybody for sticking around, and all the great questions. So, thank you.
