Jenkins with withPythonEnv

It took me quite some time to learn the idiosyncrasies of withPythonEnv. If you are trying to do something similar, I hope this post helps you. I went through the normal channels like StackOverflow.com, Medium, and Google search, but I didn't find much help, so I had to bang my head on the keyboard and trial-and-error my way through many times.

Background

First, the project structure. This is a Django project. It seems to me that most Django project layouts found on the internet put Django's directory at the root of the project. This is pretty unrealistic. At the top level, you want non-web-related things like the README, housekeeping shell scripts, and testing fixtures outside of it. The most notable file here is my "passenger" file, which the hosting provider DreamHost uses to run WSGI. It is essential for setting up Django before diving into the Django app.

MyProject
  - passenger_wsgi.py
  - DjangoApp
    - DjangoApp
      - settings.py
  - Housekeeping
  - Makefile
  - venv
  - etc.

This Makefile is used to set up the Python virtual environment (venv). In order to run the Django app, it needs Django and other packages, so the venv gets created and populated with the necessary packages by the Makefile. I wanted the venv to be right next to DjangoApp so that passenger_wsgi.py can set up the virtual env and then serve the Django requests.
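For reference, the venv bootstrap boils down to something like this (a sketch; the real Makefile target wraps roughly these commands):

python3 -m venv venv                      # create the venv next to DjangoApp
venv/bin/pip install -r requirements.txt  # populate it with Django and friends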
I therefore wanted to do the same for Jenkins. Looking at withPythonEnv, I thought I could run "make venv" and be off to the races: withPythonEnv('venv/bin/python3') would let me run Django in the Jenkins workspace once I bootstrapped the venv. I was very wrong.
I created the virtual env and then tried to use it, but withPythonEnv had a different idea: it used the Python I designated to set up its own venv in the workspace.
This also means I cannot use a single Makefile target for both deployment and the Jenkins workspace. I attempted a few things, like having the Makefile take care of build/test, or sourcing venv/bin/activate for every "sh" command in the Jenkinsfile stages. None of it worked.
Using "sh" was the most surprising part for a noob like me. It creates a temporary shell script under a nested subdirectory and runs it, so my working-directory-relative includes and executions did not behave the way I expected.
In other words, I tried to do it without withPythonEnv, and I could not find a good way other than writing everything in the Makefile and letting make do all the work, including the tests. I did not like this idea.
So I went with withPythonEnv and accepted the limitations that come with it. It creates ProjectRoot/.pyenv-<Python>, where <Python> is the name of the Python entry in Global Tools. I also tried a few permutations of entries in Global Tools, and after some frustrating attempts, I ended up with just a very plain entry.

Python in Jenkins Global Tools Configuration

Although the heading mentions the project, the entry itself is as basic as it gets.

  - Python
    Name: Python3
    Home or executable: /usr/bin/python3

That's it. Ignore the yellow warning on the home-or-executable field. I tried other things like auto-install and couldn't make sense of it. Since withPythonEnv creates its own venv, pointing this Global Tools Configuration entry at a venv makes very little sense. You do need to install the "venv" package on the system before doing this. If you don't have the permission to do that, but you can become the "jenkins" user, you could install virtualenv with pip so it lands under the home directory of the "jenkins" user. In other words, using a virtual env for this entry makes sense only if you have neither root permission nor Jenkins admin rights, so you must create your own venv in order to create venvs. Since python3 should exist, the simplest solution is to use python3 and install the venv package for it.
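For example, on Ubuntu the system-wide route and the per-user route for the jenkins account look roughly like this:

sudo apt install -y python3-venv          # with root: give the system Python the venv module
python3 -m pip install --user virtualenv  # without root, as user jenkins: lands under ~jenkins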

Jenkinsfile for the project

Here is my Jenkinsfile now.

pipeline {
    agent any

    environment {
        PYTHONPATH = "${env.WORKSPACE}/cworg"
        DJANGO_SETTINGS_MODULE='DjangoApp.settings'
        BUILD_NUMBER = "${env.BUILD_NUMBER}"

        GIT_URL="ntai@git.my-server.com:~/git/MyProject.git"
    }

    options {
        buildDiscarder(logRotator(artifactDaysToKeepStr: '', artifactNumToKeepStr: '', daysToKeepStr: '10', numToKeepStr: '20'))
        timestamps()
        retry(1)
        timeout time:10, unit:'MINUTES'
    }

    parameters {
        string(defaultValue: "master", description: 'Branch Specifier', name: 'SPECIFIER')
    }

    stages {
        stage("Initialize") {
            steps {
            script {
                    echo "${BUILD_NUMBER} - ${env.BUILD_ID} on ${env.JENKINS_URL}"
                    echo "Branch Specifier :: ${params.SPECIFIER}"
            }
            }
        }

        stage('Checkout') {
            steps {
            git branch: "${params.SPECIFIER}", url: "${GIT_URL}"
            }
        }

        stage('Make Virtual Env') {
            steps {
                withPythonEnv('Python3') {
                    sh 'pip install -r requirements.txt'
                }
            }
        }

        stage('Bootstrap') {
            steps {
            dir ("cworg") {
                withPythonEnv('Python3') {
                sh "make bootstrap"
                }
                }
            }
        }

        stage('Build') {
            steps {
            dir("cworg") {
                withPythonEnv('Python3') {
                        sh "make static"
                }
                }
            }
        }

        stage('Test') {
            steps {
                dir('./') {
                withPythonEnv('Python3') {
                        sh "python3 -m pytest"
                }
                }
            }
        }

        stage('Deploy') {
        steps {
        sh "ssh webapp@my-server.com'~/deploy-my-django-app.sh'"
        }
        }
    }
}

Using the 'Python3' tool entry, withPythonEnv creates jenkins/workspace/MyProject/.pyenv-Python3. I can now run the tests with that virtual env and against the test database for the Django project.

Running Jenkins behind nginx

Jenkins does not serve HTTPS out of the box. It's a mystery why it does not. So, in order to run it behind HTTPS, you need a reverse HTTP proxy to add the "S" to HTTP.
I spent some time looking for ways to have Jenkins serve HTTPS directly, and the answer was negative. 🙁
Since you don't want to expose plain HTTP over the network, make sure Jenkins only answers on localhost. That means nginx must be on the same host, or else there is no point to this exercise.

In this setup, Jenkins is listening on jenkins_host:9000 and I want HTTPS to be served on port 8000. (I just realized the port number choices are kind of weird.)
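With the Debian/Ubuntu package, binding Jenkins to localhost is done in /etc/default/jenkins. Something along these lines should do it; --httpListenAddress is the key part, the rest of the arguments stay as they already are:

HTTP_PORT=9000
JENKINS_ARGS="--webroot=/var/cache/$NAME/war --httpPort=$HTTP_PORT --httpListenAddress=127.0.0.1"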

Install nginx

This is an easy part – “sudo apt install -y nginx”

Configure nginx

This part is a little harder, but here is my current config file.

upstream jenkins_host {
  server localhost:9000 fail_timeout=0; # jenkins_host ip and port
}

server {
  listen 8000 ssl;       # Listen on port 8000 for IPv4 requests with ssl
  server_name     jenkins_host.cleanwinner.com;

  ssl_certificate     /etc/ssl/cleanwinner/jenkins_host-nginx-selfsigned.crt;
  ssl_certificate_key /etc/ssl/cleanwinner/jenkins_host-nginx-selfsigned.key;

  access_log      /var/log/nginx/jenkins/access.log;
  error_log       /var/log/nginx/jenkins/error.log;

  location ^~ /jenkins {
    proxy_pass          http://localhost:9000;
    proxy_read_timeout  30;

    # Fix the "It appears that your reverse proxy set up is broken" error.
    proxy_redirect      http://localhost:9000 $scheme://jenkins_host:8000;
  }

  location / {
    # Don't send any file out
    sendfile off;

    #
    proxy_pass              http://jenkins_host;
    proxy_redirect http:// https://;

    # Required for new HTTP-based CLI
    proxy_http_version 1.1;

    # Don't want any buffering
    proxy_request_buffering off;
    proxy_buffering off;         # Required for HTTP-based CLI to work over SSL

    #this is the maximum upload size
    client_max_body_size       10m;
    client_body_buffer_size    128k;

    proxy_set_header        Host $host:$server_port;
    proxy_set_header        X-Real-IP $remote_addr;
    proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header        X-Forwarded-Proto $scheme;

    # workaround for https://issues.jenkins-ci.org/browse/JENKINS-45651
    add_header 'X-SSH-Endpoint' 'jenkins_host.cleanwinner.com:50022' always;
  }
}

Save this file as /etc/nginx/sites-available/jenkins. You need a symlink from /etc/nginx/sites-enabled to this file for the configuration to take effect; running "sudo ln -s ../sites-available/jenkins" inside /etc/nginx/sites-enabled does the job.
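Putting it together, enabling the site and reloading nginx looks like this (standard Debian/Ubuntu layout assumed):

cd /etc/nginx/sites-enabled
sudo ln -s ../sites-available/jenkins .
sudo nginx -t                  # check the config for syntax errors first
sudo systemctl reload nginx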

cert files

As you can see, for SSL you need an SSL certificate. You can create a self-signed one or get something real. For this exercise it's not that relevant, so I'll leave it to you. I'll talk about making one with pfSense in another post. Stay tuned.

Memo to myself:

sudo openssl req -x509 -nodes -days 999 -newkey rsa:2048 -keyout /etc/ssl/cleanwinner/jenkins_host-nginx-selfsigned.key -out  /etc/ssl/cleanwinner/jenkins_host-nginx-selfsigned.crt

Jenkins session timeout

On Ubuntu, if you are using the Jenkins package, you can change the session timeout in /etc/default/jenkins.

JENKINS_ARGS=" BLA BLA -- --sessionEviction=604800"

Where BLA BLA is the existing arguments and --sessionEviction=604800 is the new session timeout. I tried --sessionTimeout first and it did not work. Unlike sessionTimeout, sessionEviction's unit is seconds, not minutes. The default is 30 minutes and I was timing out a lot while testing the Jenkinsfile; 604800 is 60*60*24*7, so the timeout is a week.

RealWorld app + Spring + PostgreSQL

As WCE's app is settling down, I've been itching to write a backend for my personal use. It's just a simple database app to keep track of my tennis racquets. This could be done with Python, which I'm very comfortable with, and probably Django to go with it. With that stack, it would really be a weekend project.
So, I decided to go with the most uncomfortable environment: Java + Spring. TBH, I was a bit impressed by Spring. When Java EE and JavaBeans came out, I thought it was a real hassle to write a Java backend; you have to do so much boilerplate coding just to get things going. Looking at Spring, I can see it is what Java backend development should have been all along.
So, I started writing Java and Spring on my Mac and got totally wedged. I don't know much about Spring, or MyBatis, or any of it. Then I realized: wait, this isn't much different from the RealWorld backend. I can probably start from the mother of all examples: download the frontend and backend repos, build, and it just works on Linux. Using that as a starting point and seeing what it takes to make it work with PostgreSQL seemed like a good way to go. This is my second day of using Java and Spring, so the struggles ensued. This post is about just that. But without struggle, you cannot learn anything.

Using PostgreSQL JDBC driver

It is definitely not just a matter of pointing the database connection URL at PostgreSQL. First, you need to install the JDBC driver. On Ubuntu, this would be
sudo apt install libpostgresql-jdbc-java
Then, tell Gradle to use it.

diff --git a/build.gradle b/build.gradle
index 62d7690..7ff7df7 100644
--- a/build.gradle
+++ b/build.gradle
@@ -48,7 +48,7 @@ dependencies {
     compile('org.springframework.boot:spring-boot-starter-security')
    compile('joda-time:joda-time:2.10')
     compileOnly('org.projectlombok:lombok')
-   runtime('com.h2database:h2')
+   runtime('org.postgresql:postgresql:9.4.1212')
     testCompile 'io.rest-assured:rest-assured:3.1.1'
    testCompile 'io.rest-assured:spring-mock-mvc:3.1.1'
    testCompile 'org.springframework.security:spring-security-test'

URL for PostgreSQL

Point the URL at the database. I didn't want to run the PostgreSQL instance on my laptop, so it's running on "nefertiti", which is my Linux desktop at home.

--- a/src/main/resources/application.properties
+++ b/src/main/resources/application.properties
@@ -1,4 +1,9 @@
 spring.jackson.deserialization.UNWRAP_ROOT_VALUE=true
+#
+spring.datasource.url=jdbc:postgresql://nefertiti:5432/my_db
+spring.datasource.username=my_db_user
+spring.datasource.password=my_db_password
+#
 image.default=https://static.productionready.io/images/smiley-cyrus.jpg
 jwt.secret=nRvyYC4soFxBdZ-F-5Nnzz5USXstR1YylsTd-mA0aKtI9HUlriGrtkf-TiuDapkLiUCogO3JOK7kwZisrHp6wA
 jwt.sessionTime=86400
@@ -11,3 +16,6 @@ mybatis.mapper-locations=mapper/*.xml
 logging.level.io.spring.infrastructure.mybatis.readservice.ArticleReadService=DEBUG
 # Uncomment the following line to enable and allow access to the h2-console
 #spring.h2.console.enabled=true
+#
+spring.flyway.baseline-on-migrate=true
+spring.jpa.properties.hibernate.jdbc.lob.non_contextual_creation=true

spring.flyway.baseline-on-migrate=true

So, what are the last two lines added? spring.flyway.baseline-on-migrate=true tells Flyway it's okay to create the schema migration table when it doesn't exist. Without this, the schema migration doesn't take off at all, i.e. no Flyway. Oh, I'm actually not using Flyway yet, but I'm planning to.

spring.jpa.properties.hibernate.jdbc.lob.non_contextual_creation=true

This tells Hibernate to skip contextual blob/clob creation. I probably have to come back to this. Right now a column ends up as varchar in place of a blob, which is definitely not what I want. I have to come back and figure out how to use PostgreSQL's binary type instead.

With this, the backend starts, and I can see that it created the necessary tables from the DB migration.

With a lot of googling, it took me a day to get to this point, but it's working, if limping. I went to play tennis, ruined a car tire (unfortunate; I hate New England winters), and a lot of bad words were said to the pavement.

Select statement with limit does not work

Running the backend from IntelliJ, the queries are not happy: a SQL syntax error comes out when retrieving the articles from the database. Setting a breakpoint at the bad-SQL exception gave me a wall of text (how can Java people live with this?). Anyhow, the gist of the exception is that PostgreSQL does not support the "LIMIT ?, ?" syntax. I stepped through the code with the debugger and concluded that the SQL statement comes from a template rather than being auto-generated; somewhere in the source, the template SQL is defined. Once I realized this, it took me literally ten seconds to fix.

--- a/src/main/resources/mapper/ArticleReadService.xml
+++ b/src/main/resources/mapper/ArticleReadService.xml
@@ -55,7 +55,7 @@
             </if>
         </where>
         order by A.created_at desc
-        limit #{page.offset}, #{page.limit}
+        limit #{page.limit} offset #{page.offset}
     </select>
     <select id="countArticle" resultType="java.lang.Integer">
         select
@@ -93,7 +93,7 @@
         <foreach index="index" collection="authors" item="id" open="(" separator="," close=")">
             #{id}
         </foreach>
-        limit #{page.offset}, #{page.limit}
+        limit #{page.limit} offset #{page.offset}
     </select>
     <select id="countFeedSize" resultType="java.lang.Integer">
         select count(1) from articles A where A.user_id in
@@ -105,4 +105,4 @@
     <resultMap id="articleId" type="string">
         <id javaType="string" column="articleId"/>
     </resultMap>
-</mapper>
+</mapper>

Conclusion

The demo app backend's documentation says using a different database is just a matter of setting the URL and the JDBC driver. It's pretty close to that, but not quite. The source contains some dependencies on the SQL dialect, and the blob/clob handling on PostgreSQL probably needs to be patched up. For now, I can move forward with my racquet database app.

P.S.

I just ran the unit tests and a bunch are failing. I guess I need to wear the hard hat a little longer.

Wiring webhook from GitHub to Jenkins

I was using an email hook for Jenkins to build the release build of WCE Triage UI, but I wanted to "modernize" it by using a webhook.
On a cold Sunday morning, with a cup of coffee, I started poking around.
I have a local Jenkins instance. The first order of business is to expose it to the wild through HTTPS.

Step 1 Jenkins with nginx https

I'm not going into great detail here about running Jenkins behind HTTPS. As you know, Jenkins serves HTTP only, so in order to use HTTPS you need a reverse proxy. Since this is common practice, there are templates around for running Jenkins behind nginx.
Making this work is a task in itself.
First and foremost, I had to change /etc/default/jenkins from
JENKINS_ARGS="--webroot=/var/cache/$NAME/war --httpPort=$HTTP_PORT"
to
JENKINS_ARGS="--webroot=/var/cache/$NAME/war --httpPort=$HTTP_PORT --prefix=$PREFIX"

Create a reverse proxy with nginx. The nginx site file is essentially the one shown in the earlier section on running Jenkins behind nginx.
You need to replace jenkinshost.cleanwinner.com with your machine's name. Put the file under /etc/nginx/sites-available and symlink it from /etc/nginx/sites-enabled, then reload/restart nginx.

Step 2 Punching a hole through firewall

Now, my Jenkins is inside the firewall and github.com is outside, so the webhook needs to get through. I took a look at ngrok.com, and it works great, but rather than pay $5/month I will risk my home server. 😛

How you punch through the firewall is up to you and your firewall. In my case it's pfSense, so you go into NAT and set up the port forward. With pfSense, you should restrict the source IP range to github.com's addresses.

The GitHub IP address ranges are published in GitHub's documentation, and you should know that there is an API to get them automagically.
Since I don't know how to feed the IP ranges from the API into pfSense, for now I just use the published values.
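The API in question is GitHub's meta endpoint; the webhook source ranges can be pulled with something like:

curl -s https://api.github.com/meta | jq '.hooks'    # list of CIDR blocks GitHub sends hooks from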

Step 3 Configure GitHub Webhook

Go to the project's settings on GitHub.
For this, I need a token string. I used uuidgen to create a random string. Let's call it "MY_SECRET_TOKEN".

Webhook / Manage webhook
Payload URL
https://jenkinshost.cleanwinner.com/jenkins/generic-webhook-trigger/invoke?token=MY_SECRET_TOKEN

Content type
application/x-www-form-urlencoded
Secret:

Empty here. I’m not sure what I can do with this.

Event:
* Just the push event

Active checked.

Fire it off and you should see a 404 error, as Jenkins is not configured yet.

Step 4 Configure Jenkins webhook

Go into the Build Triggers.

Check Generic Webhook Trigger

Now, you want to make sure only the right job gets triggered. I know the repo ID (it's something like 123456789).

Add “Post content parameters”:
Variable is "$.repository.id" and Expression is "123456789". Pick JSONPath. Here "$" is the root element, .repository is the first level of the JSON, and .id is the second level. The matching JSON looks like

{ "repository": { "id": "1234567898" } }

This is enough to identify the repo, but I'm the kind of person who, even when the bridge is not broken, bangs on it again to make sure it's not broken.

Add another “Post content parameters”:
This time, the variable is "$.repository.html_url" and the value is "https://github.com/mygithubaccount/project-name".

Now, really, you don’t have to do this, but I did:

Header parameters:
* Request Header: x_github_event
* Value filter: ping

Then the most important part:

Token: MY_SECRET_TOKEN

Cause: Github Webhook push

Step 5 Test the webhook

Okay. Once this is done, it's time to test. If you did step 3, push a trigger, and it fails, the GitHub project page shows the failed delivery, and in it you can see the header and payload. My Jenkins webhook setup was done by looking at that header and payload; without it, I had no clue how to set it up. That is the reason for setting up GitHub first: it gives you this clue.

I can report that this is working happily. Hope this helps someone.

Possible Step 6

If you don't mind paying $5/month, you can use ngrok.com so that you can do this without punching a hole through the firewall.

Running a TypeScript project causes inotify watches to run out

I decided to redo my React project with TypeScript. It throws an error event and doesn't run at all.

Starting the development server...

events.js:183
      throw er; // Unhandled 'error' event
      ^

Error: watch /home/ntai/sand/triageui/public ENOSPC
    at _errnoException (util.js:1022:11)
    at FSWatcher.start (fs.js:1382:19)
    at Object.fs.watch (fs.js:1408:11)
    at createFsWatchInstance (/home/ntai/sand/triageui/node_modules/chokidar/lib/nodefs-handler.js:38:15)
    at setFsWatchListener (/home/ntai/sand/triageui/node_modules/chokidar/lib/nodefs-handler.js:81:15)
    at FSWatcher.NodeFsHandler._watchWithNodeFs (/home/ntai/sand/triageui/node_modules/chokidar/lib/nodefs-handler.js:233:14)
    at FSWatcher.NodeFsHandler._handleDir (/home/ntai/sand/triageui/node_modules/chokidar/lib/nodefs-handler.js:429:19)
    at FSWatcher.<anonymous> (/home/ntai/sand/triageui/node_modules/chokidar/lib/nodefs-handler.js:477:19)
    at FSWatcher.<anonymous> (/home/ntai/sand/triageui/node_modules/chokidar/lib/nodefs-handler.js:482:16)
    at FSReqWrap.oncomplete (fs.js:153:5)
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

Process finished with exit code 1

After a few minutes of googling, it turns out the file system watch resource is running out. You need to increase fs.inotify.max_user_watches.

Add the following line to /etc/sysctl.conf:

fs.inotify.max_user_watches = 524288

Then have the system pick up the new value:

sudo sysctl -p
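To check the current limit, or to apply the new value immediately without editing the file, these also work:

cat /proc/sys/fs/inotify/max_user_watches        # show the current limit
sudo sysctl fs.inotify.max_user_watches=524288   # apply right away (not persistent across reboot)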

WCE Triage software installer

Recipes

This is a how-to about creating your own installer from the disk image.

Creating your own installer

Here are the steps to create your own installer.

Ingredients

  • Computer with Windows 10 + USB3 interface
  • Passmark ImageUSB
  • 2 x USB3 disks
    You can use USB sticks but it makes the process very, very slow, so use USB3 disks. A USB-based docking station is fine, but you'd need a two-slot dock. USB2 is okay but roughly 3x slower.
  • Installer image

Recipe

  1. Download ImageUSB if you don’t have it already.
  2. Download Installer image “inst_amd64.img.zip”
  3. Unzip inst_amd64.img.zip and make inst_amd64.img.
  4. Plug in a USB disk. Let’s call this Baby Disk.
  5. Bake Baby Disk
    Start ImageUSB, choose the Baby Disk in step 1, leave step 2 as is, in step 3 choose the image source "inst_amd64.img/inst_amd64.img" (Browse/Save), then step 4 "Write". The process takes about 5 minutes depending on the disk speed.
  6. Unplug Baby disk and let it cool
  7. Plug in second USB disk. Let’s call this Mom Disk.
  8. Bake Mom Disk – steps are identical to Bake Baby Disk
  9. Restart computer, and boot from Mom Disk in update mode
    When it boots, you have 10 seconds to change the selection on the boot screen. You need to choose "Software Update", which is the third option in the boot menu. The first entry is the default, but for the Mom Disk to receive a baby image, you need to boot in update mode.
  10. Wait for WCE Triage web page to show up.
    Very often, you have to hit “ALT-F4” to restart the browser. (I will fix this at some point)
  11. Create an installer image from Baby Disk
    Plug in the Baby Disk. Choose “Create Disk Image” tab. Hit “Reset” button so the Baby Disk shows up.
    Click the left check box to select the Baby Disk.
    At the top, choose "Triage 64bit USB flash".
    Then, click “Save”.
    Once it’s complete, the Mom Disk contains the disk image of Baby Disk which is the installer itself.
  12. Shutdown computer and unplug Baby Disk

Tasting

  1. Boot from Mom Disk again, but this time choose the first menu option.
  2. Once the web page shows up for Triage, plug in the Baby Disk again.
  3. Go into “Load Disk Image”, choose the disk image of Baby Disk you just created, and load the disk image to the Baby Disk
    This is so that the freshly created partition on the disk is expanded to fill the entire disk, instead of staying at the size of the disk image, which is only 3.5GB.

Creating your own content loading disk

Ingredients

  • Powerful computer
  • Existing Ubuntu computer with WCE contents, or a disk that contains the WCE contents
  • Baby Disk

Recipe

  1. If necessary, clean up the Ubuntu install, such as vacuuming the file system journals, truncating log files, and most of all, emptying the trash. apt's package cache is also large, and it is recommended to remove it entirely.
  2. Boot from Baby Disk in Update mode
  3. Go to “Create Disk Image”
  4. Choose the Ubuntu disk, Choose “WCE Ubuntu 18.04LTS”, and “Save”
    Unlike the small installer image, Ubuntu 18.04LTS with WCE contents requires 45GB of disk space, and thus a lot of CPU power to compress the disk image.
    Once done, the Baby Disk contains the master Ubuntu 18.04LTS image with WCE contents!
    Once the disk image is created, you don't need to repeat the time-consuming compression; you can just copy the disk image file to other installers.

Making USB stick installer

This is also "Triage". When there is no payload on the USB stick, all it does is gather up the computer's info and display it.

Ingredients

  • Mom Disk
  • USB stick
  • A computer

Recipe

  1. Boot from Mom Disk with first menu option
  2. After it boots up, go to “Load Disk Image”, choose USB stick, and “Load”

Copying Disk image from one installer to other

Unfortunately, there is no user interface to copy the disk image; you need to do this on an Ubuntu computer for now. It's probably better to copy the disk image file to the computer first and then copy it to the other USB sticks. Writing to a USB stick is VERY VERY slow. I have tried a few different ways to make this process faster, but so far the fastest way requires making a disk image, which isn't automated yet.

Setting up a server for diskless clients using Ubuntu 18.04LTS server

  • Step 1 – install Ubuntu server
  • Step 2 – configure network interfaces
  • Step 3 – install dnsmasq
  • Step 4 – install atftpd
  • Step 5 – install nfs-kernel-server
  • Step 6 – struggle

For installing Ubuntu server, do whatever is needed; I'm not going to describe it. You don't need to install any extra packages except the OpenSSH server so you can log into it after installation, unless you are going to install an X11-based desktop on it because you want a web browser and a terminal running at the same time.

It's far easier to have a second network interface: one for the actual network clients and one for connecting to the server from another machine. You are going to run dnsmasq for DHCP/PXE boot, so you do not want to run that on your existing network. The machine I am installing on unfortunately has only one Ethernet port, so I'm using a USB Ethernet adapter.

The existing netplan config is at /etc/netplan/50-cloud-init.yaml, which is created at installation time. I don't need it, so I deleted it. netplan reads every .yaml file, so you can name yours as you like. Just for good measure, to make sure it doesn't come back, I followed the message in that file and created /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg containing

network: {config: disabled}

Then proceed to create the new network setup. I named it /etc/netplan/installation-server.yaml. As you can see, I chose 10.3.2.0/24 as the subnet. Make sure there is no TAB character in this file. (I had one, and netplan doesn't like it.)

network:
  version: 2
  ethernets:
    enx803f5d08c3a0:
      optional: true
      dhcp4: true
    enp3s0:
      optional: true
      dhcp4: false
      addresses: [10.3.2.1/24]
      gateway4: 192.168.10.1
      nameservers:
        addresses: [10.3.2.1]
# netplan generate
# netplan apply
# # wait a couple of seconds for new DHCP lease
# ip addr

Install dnsmasq (sudo apt install dnsmasq) and proceed to set it up for PXE boot. You could be fancy and put a config in /etc/dnsmasq.d, but let's go the simpler route and write /etc/dnsmasq.conf directly. In case you want to see what's in the original, keep it in /etc/dnsmasq.d-available (mv /etc/dnsmasq.conf /etc/dnsmasq.d-available). The new /etc/dnsmasq.conf looks as follows. It's important not to offer DHCP on the network you are already using. Oh, and the server name is "wcesrv".

no-dhcp-interface=enx803f5d08c3a0
no-hosts
expand-hosts
no-resolv
#
address=/wcesrv/10.3.2.1
#
dhcp-range=10.3.2.100,10.3.2.199,2h
# router and dns server
dhcp-option=3,10.3.2.1
dhcp-option=6,10.3.2.1
#
pxe-service=x86PC, "Boot from local disk"
pxe-service=x86PC, "Install WCE Ubuntu", pxelinux

dnsmasq can do TFTP, but I'm not going to use it. I use "atftpd" with inetd. There aren't a lot of reasons for this, but I find it easier to understand the logs coming out of the TFTP daemon while setting things up. Install "atftpd" (not sure atftpd is any better than tftpd, but I like the name "advanced"; not a great reason): apt install -y atftpd. Then tell inetd to start it. The following is the relevant line of /etc/inetd.conf. When you install atftpd, it adds a line to inetd.conf; depending on the package version, the line may differ. The only thing to make sure of is the last argument, which defaults to /srv/tftp; I'm going to use /var/lib/netboot.

tftp		dgram	udp	wait	nobody /usr/sbin/tcpd /usr/sbin/in.tftpd --tftpd-timeout 300 --retry-timeout 5 --mcast-port 1758 --mcast-addr 239.239.239.0-255 --mcast-ttl 1 --maxthread 100 --verbose=5 /var/lib/netboot

Install nfs-kernel-server. Set up the client's file system and export it; I'm using /var/lib/netclient. Then populate the files under /var/lib/netclient. I find it easier to use rsync. I create a /var/lib/netclient/wcetriage.0.1.20 subdirectory and copy the installer there, so that if I want to try out a new version, I can keep a separate set; easier for the future. Reload the NFS config afterward. "reload" is probably doing "exportfs -a" underneath; you can do it manually with "exportfs -av" (where -v is verbose).

/var/lib/netclient  *(ro,sync,no_wdelay,insecure_locks,no_root_squash,insecure,no_subtree_check)
# systemctl restart nfs-kernel-server.service

Install PXELINUX and copy its files into the TFTP root: apt install -y pxelinux syslinux. Then copy, using "-p" to preserve attributes. The actual modules are in the package "syslinux-common". I don't need all of those modules, but disk is cheap. I think I need dhcp, disk, elf, gfxboot, ldlinux, pxechn, vesa, vesainfo, and vesamenu (not 100% sure, and it also changes depending on the pxelinux.cfg).

cp -p /usr/lib/PXELINUX/pxelinux.0 /var/lib/netboot/
cp -pr /usr/lib/syslinux/modules/bios/* /var/lib/netboot/
mkdir /var/lib/netboot/pxelinux.cfg

Here is the “default” file (/var/lib/netboot/pxelinux.cfg/default)

DEFAULT vesamenu.c32
TIMEOUT 100
TOTALTIMEOUT 600
PROMPT 0
NOESCAPE 1
ALLOWOPTIONS 1
# MENU BACKGROUND wceboot2.png
MENU MARGIN 5

MENU TITLE WCE PXE Triage

LABEL WCE Triage
  MENU DEFAULT
  MENU LABEL WCE ^Triage
  KERNEL wcetriage.0.1.20/vmlinuz
  APPEND initrd=wcetriage.0.1.20/initrd.img hostname=bionic nosplash noswap boot=nfs netboot=nfs nfs=on nfsroot=10.3.2.1:/netclient/wcetriage.0.1.20/ toram acpi_enforce_resources=lax edd=on ip=dhcp ---
  TEXT HELP
  * WCE Triage V2 alpha 0.1.20
  ENDTEXT

LABEL Local
  MENU LABEL Local operating system in harddrive (if available)
  KERNEL chain.c32
  APPEND sda1
  TEXT HELP
  Boot local OS from first hard disk if it's available
  ENDTEXT

Now let the fun begin (not really). The first test is to make sure dnsmasq serves DHCP. Connect a computer to the port dnsmasq is serving and test. Gigabit Ethernet's auto-negotiation does wonders, so you don't even need a hub; just a cable, point-to-point, is fine. If the netplan/network service and dnsmasq are working, you should get a DHCP lease.

Then, reboot the computer and try a PXE boot. Watch the TFTP server log. Hmmm, TFTP is not working. Seems like I forgot to reload inetd.conf.

systemctl restart inetutils-inetd.service

I actually stopped inetd and ran it with the "-d" debug option so I could observe. I had made a mistake in /etc/inetd.conf, giving the wrong directory for TFTP. So I stopped inetd and restarted it after changing inetd.conf, then tried again. Still no go. It turns out that vmlinuz is mode 0600, so inetd/tftpd cannot read it; the initrd file is 0644, so that one would work. This must be one of those newer security things. For now, the solution is to copy the file and set a different permission; I'm afraid to change the permission on the original. I created a directory in the TFTP root (/var/lib/netboot/wcetriage.0.1.20) and copied vmlinuz and initrd.img from /var/lib/netclient/boot/. Now the boot process reads them, then stops at mounting the NFS root and drops into busybox. Progress!
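The workaround amounts to something like this (paths and file names are from my setup; the kernel and initrd names may be versioned on yours):

sudo mkdir -p /var/lib/netboot/wcetriage.0.1.20
sudo cp -p /var/lib/netclient/boot/vmlinuz /var/lib/netclient/boot/initrd.img /var/lib/netboot/wcetriage.0.1.20/
sudo chmod 644 /var/lib/netboot/wcetriage.0.1.20/vmlinuz   # make it readable by the tftp daemon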

Looks like nfsroot= needs a full path. Set it to nfsroot=10.3.2.1:/var/lib/netclient/wcetriage.0.1.20 and try again.

Now the error message is "mount: can't find /root in /etc/fstab". After a couple of hours of head scratching, I came to the conclusion that the file system hosted under /var/lib/netclient has no NFS client installed. (Doh!) I installed "nfs-common" and repopulated the client file system, along with its fstab. Not only does it need to mount NFS as '/', it also needs /tmp and so on.

#                
proc            /proc           proc    nodev,noexec,nosuid 0       0
#
10.3.2.1:/var/lib/netclient/wcetriage.0.1.20  /               nfs   soft,rsize=32768,wsize=32768,proto=tcp,nolock   0       0
#
none            /tmp            tmpfs   defaults        0       0
none            /var/run        tmpfs   defaults        0       0
none            /var/lock       tmpfs   defaults        0       0
none            /var/tmp        tmpfs   defaults        0       0

At this point it boots, but a service, the X server, is failing. The log complains about /root/.Xauthority. Right, the root is mounted read-only. The client side has an "aufs as root file system" setup put in, but when I try to boot with it, it fails without any further information. It turns out the exported NFS root file system didn't have the /aufs, /ro, and /rw mount points that I used for the aufs setup. I created the three directories, and it finally booted correctly. So the final bootstrap config file looks as follows. Also, I have not described how to do the "aufs=tmpfs" part; it comes from the Ubuntu community docs and mostly works. (Yes, mostly... because it was written for an older kernel.)

DEFAULT vesamenu.c32
TIMEOUT 100
TOTALTIMEOUT 600
PROMPT 0
NOESCAPE 1
ALLOWOPTIONS 1
# MENU BACKGROUND wceboot2.png
MENU MARGIN 5

MENU TITLE WCE PXE Triage

LABEL WCE Triage
  MENU DEFAULT
  MENU LABEL WCE ^Triage
  KERNEL wcetriage.0.1.20/vmlinuz
  APPEND initrd=wcetriage.0.1.20/initrd.img hostname=bionic nosplash noswap boot=nfs netboot=nfs nfs=on nfsroot=10.3.2.1:/netclient/wcetriage.0.1.20/ toram acpi_enforce_resources=lax edd=on ip=dhcp aufs=tmpfs ---
  TEXT HELP
  * WCE Triage V2 alpha 0.1.20
  ENDTEXT

LABEL Local
  MENU LABEL Local operating system in harddrive (if available)
  KERNEL chain.c32
  APPEND sda1
  TEXT HELP
  Boot local OS from first hard disk if it's available
  ENDTEXT

WCE Triage Update

WCE Triage software update

This describes the steps to update the network server, the USB sticks, and the installation laptops. Some of the steps are the same.

Network server update

Updating requires an internet connection. If the machine has wifi, it's probably easier to use that. By default, it tries the wifi network wcetriage/thepasswordiswcetriage. You can use any wifi SSID/password, but that involves editing the /etc/netplan/ files; it would be easier to temporarily create a guest wifi network with those credentials and use it. For a machine with more than four Ethernet ports, one port is left for DHCP. For example, papa bear has one on-board Ethernet port and four on a PCI-E card; the on-board Ethernet can be connected to the LAN for the internet connection. For baby bear, a USB-to-Ethernet adapter would be easier, but then again, that also requires editing a netplan file.
1. Power up
2. Login as “triage”
3. Start terminal
4. make update
5. python3 -m wce_triage.setup.update_share
6. python3 -m wce_triage.setup.update_triage_ui
7. python3 -m wce_triage.setup.update_client

make update

In /home/triage there is a Makefile. "make update" runs the "update" target of the Makefile, which runs a pip3 command to download the wce_triage package from test.pypi.org.
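I haven't pasted the Makefile here, but the update target boils down to a pip3 install from the test index, roughly:

pip3 install --upgrade --index-url https://test.pypi.org/simple/ wce_triage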

update_share

This syncs the latest disk image metadata files (aka .disk_image_types.json) in the "wce_triage" package to WCE's directory under /usr/local/share, as well as to the NFS mounts for the netboot clients.

update_triage_ui

This downloads the latest triage UI (a React.js app) from WCE's Google Drive and deploys it into WCE's triage-ui directory.

update_client

This copies the "/usr/local/lib/python3.6/dist-packages" directory to the netboot clients.
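Conceptually it is an rsync of the installed packages into each netboot client root, along these lines (the client root path is the one from my PXE setup, so treat it as an example):

sudo rsync -a /usr/local/lib/python3.6/dist-packages/ /var/lib/netclient/wcetriage.0.1.20/usr/local/lib/python3.6/dist-packages/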

Triage/USB stick update

Updating requires an internet connection. It is definitely easier to use a laptop with wifi; the wifi network it looks for is "wcetriage".
1. Boot from the USB stick and choose the "Update" option in the boot menu, which is the third boot entry.
2. If you fail to choose "Update" mode, the machine boots and mounts the USB stick read-only, so you cannot update.
3. Wait for the triage screen to show up
4. Hit ctrl-alt-F1 to switch to terminal
5. Login as “triage”
6. make update
7. python3 -m wce_triage.setup.update_share
8. python3 -m wce_triage.setup.update_triage_ui
Unlike the installation server, there is no need for "update_client" as there are no netboot clients.

Not Wait For Network

systemd-networkd-wait-online.service is a one-shot service that holds up the rest of boot until the network is up. That's fine for 99% of cases, but it's not fine for my USB stick that is used for diagnosing hardware problems. In other words, the reason I'm booting from this USB stick may be precisely to troubleshoot the network. If you are using NetworkManager it would be fine, but then again, this stick exists to troubleshoot netplan config files and whatnot.
At any rate, I don't want to wait forever for the network to come up.

systemd-networkd-wait-online.service is the service that waits for the network to be up and running, so that's what I have to disable:
sudo systemctl disable systemd-networkd-wait-online.service
sudo systemctl mask systemd-networkd-wait-online.service
This is the second time I've googled this, so it's better left written down here.