My cps-WhatsNew script has a new permanent page on the blog site now. In addition, it has a new Docker image! Check out the links and let me know what you think.
While the cps-WhatsNew script is relatively simple, I was thinking about how I could make the solution easier to implement. Docker was a clear choice. As I’ve only ever been a consumer of Docker images, this “short” project was a very interesting foray into building my own image. Embarrassingly enough, it took much longer to complete than I had anticipated.
It took close to three full days of off-and-on effort and research to get everything put together and optimized. The optimization part took the longest, and I’m still not happy with the size of the Docker image now that I’ve published everything.
Here are some hard-won lessons:
Have a good directory structure
Initially, my repo consisted of one root directory that contained all my application files and a subdirectory for my email template. The root directory had everything from readme files and configuration files to Python scripts. Everything under the sun! If you ever plan on sharing your code in a Docker image, you need to think about your project’s directory structure a bit more thoroughly.
A better design turns out to be documented in some of the “best practices” for GitHub repos. First, put all your code in a subdirectory off of your main root folder. Then create another directory for all the files that end users of your script might want to configure. In my case, I created a /custom directory and placed the application configuration files, the log configuration files, and my email template subdirectory into that new directory. Of course, this required me to make some changes to the code – so expect to re-test your solution.
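To give you an idea, the layout I ended up with looks roughly like this (the app/ name below is just a placeholder – use whatever fits your project):

```
cps-WhatsNew/
├── app/               # the Python script(s) themselves
├── custom/            # everything an end user might want to change
│   ├── application config file
│   ├── logging config file
│   └── email template subdirectory
├── requirements.txt
└── README.md
```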
As I was doing some re-coding, I decided to change my code to reference environment variables instead of hard-coded paths. It turns out that the more you can configure your script to use environment variables from the start, the better off you’ll be later when it comes to adapting to changes. In my case, I put all the configuration bits into an application configuration file and a logging configuration file. Then I created two environment variables – one for the /custom folder and another for the location of the /logs directory – to give my script the required flexibility.
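In the dockerfile, that boils down to declaring a couple of defaults that the script can then look up (via os.environ.get in Python) instead of hard-coding paths. The variable names below are placeholders, not the ones I actually used:

```dockerfile
# Hypothetical names -- adjust to match your own script.
# Users can still override these with -e at "docker run" time.
ENV APP_CUSTOM_DIR=/custom \
    APP_LOG_DIR=/logs
```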
As I mentioned before, make sure you test your modified project’s code against the new directory structure and environment variables. Or, better yet, start building some automated testing so that you don’t have to continually run the tests yourself.
Helpful tools
So I started this process with my two standard tools – iTerm2 (work) and Safari (research – Google is your friend). I was basically creating and editing files right from the console, with nano as my go-to editor. While this worked, I found out that I could have made life a lot easier for myself by using some other tools.
First, GitHub. If you’re half-way serious about building a Docker image, you’ll want a new GitHub repo to store the dockerfile you’re going to create. You’ll use this repo to store all of the scripts you’ll build to automate the Docker image and all the documentation you’ll need to write. Keep in mind, this is different from the repo you would have for your code/application. This one is just for tracking everything having to do with the Docker image you’re building.
What surprised me the most was the number of scripts I had to write to automate the Docker image. In my research, I found some Docker images that had just as much scripting devoted to automating the image as to the actual project! Thankfully, I was able to limit mine to two scripts. Yes – I’m sure there are cases where you can build a Docker image without any supporting scripts. However, you’re more than likely going to have to create some if you want to give the users of your image a smoother experience.
In my case, I wanted to ensure the user had a default email template and configuration files after they had spun up a container with my image. The only way to do this was to create a script that copies the necessary files to the mount points the user provides.
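The script itself doesn’t need to be anything fancy. A minimal sketch of the idea (the file names and the /defaults location are placeholders, not what’s actually in my image):

```sh
#!/bin/sh
# entrypoint.sh -- seed the user's mounted /custom directory with the
# default config files and email template if they aren't there yet.
CUSTOM_DIR="${APP_CUSTOM_DIR:-/custom}"

if [ ! -f "$CUSTOM_DIR/app.cfg" ]; then
    cp -r /defaults/. "$CUSTOM_DIR/"
fi

# Hand off to whatever command the container was started with.
exec "$@"
```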
The other tool I recommend: Visual Studio Code – or some other kick-ass editor. I’m starting to fall in love with Visual Studio Code. It just seems to have all the extensions I could ever want. In this case, Visual Studio Code had IntelliSense for the shell scripts, the README.md file, and the dockerfile itself! The awesomeness wasn’t just the IntelliSense, but also the built-in help Visual Studio Code kept popping up as I made modifications. For instance, it would tell me when a command I had used in the dockerfile was deprecated and what I should be using instead. Plus, it had links to online help for all the dockerfile commands. I swear, Visual Studio Code has an identity crisis – it can’t be called a text editor anymore; it’s more akin to a full IDE!
Optimize, test, rinse, repeat
You’ll see a lot of “best practices” out there for Docker images, and one recurring one is to have your Docker image do ONLY ONE thing. In my case, the image houses the script and provides the user with default templates. I could have stopped there; however, I thought, why not add a default cron job to execute the code? This turned out to be a bit of a challenge to implement, but I got it done. So I’m not sure my Docker image only does one thing – it sort of bends the rule a bit (I guess). However, if I look at it from the standpoint of providing a simple deployment method for the solution, I guess it does do that one thing.
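For the curious, wiring up a default cron job inside a container mostly comes down to dropping a crontab into the image and running the cron daemon in the foreground as the container’s main process. A rough sketch, assuming an Alpine base image (the schedule and script path are placeholders):

```dockerfile
# crontab.txt would contain something like:
#   0 6 * * * python /app/whatsnew.py
COPY crontab.txt /etc/crontabs/root

# Run crond in the foreground so it keeps the container alive.
CMD ["crond", "-f", "-l", "2"]
```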
I also want to mention that during the process of creating the container, I believe I rebuilt my image at least 50 times, if not more. Each time I made iterative improvements, until I painted myself into a corner and had to back up two or three steps. While some of this was configuring the solution and testing the supporting build scripts, a lot of the rebuilds were focused on optimizing the size of the Docker image.
One such time was when I attempted to reduce the size of the Docker image by removing dev and build tools. As my code is written in Python, I have a requirements.txt file that lists all of the libraries my code needs. Some of these libraries have to be compiled during the install, and for them to build I had to pull in even more packages that would have no use in the completed image. Hence, my dockerfile had a line to remove these packages after the needed libraries were built. This took my 350MB image down to 150MB! I thought I was so clever – until I went to execute my code. It turns out I had removed far too much. My code wouldn’t work at all; it bitterly complained about everything until I started putting back the libraries I had removed.
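The pattern itself is simple enough – the hard part is figuring out which packages are truly build-only. A sketch of the approach, assuming an Alpine base image (the package names are illustrative; what you actually need depends on your requirements.txt):

```dockerfile
# Pull in compilers and headers just long enough to build the Python
# wheels, then remove them in the same layer so they never bloat the image.
RUN apk add --no-cache --virtual .build-deps gcc musl-dev libffi-dev \
    && pip install --no-cache-dir -r requirements.txt \
    && apk del .build-deps

# Gotcha: the runtime shared libraries (the non "-dev" packages) have to
# stay installed, or the code fails at import time -- exactly the trap
# I fell into.
```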
Eventually I got the image down to about 325MB (150MB compressed on Docker Hub!). The lesson here is that you have to test everything, even after you think you’re done with all your coding.
Thanks!
JT