All posts by Mattias Jiderhamn

Why I don’t use var keyword in Java

Admittedly Java has had the var keyword for a few years now, since Java 10 (and C# obviously has had it for a lot longer), and while I was involved in a few “for or against” discussions at the time, I realize I never officially posted my position on the matter. So here goes.

I recommend not using the var keyword in Java

In my case, it is not simply personal taste or a matter of opinion – I actually believe it makes you less productive. I base this belief upon two facts.

1. Developers spend much more time reading code than writing code. For example, in Clean Code: A Handbook of Agile Software Craftsmanship Robert C. Martin claims “the ratio of time spent reading versus writing is well over 10 to 1”.

That is, developers spend more than 90% of their coding time reading existing code.

So while the var keyword may save you a little bit of time while writing code (I would say very little with a good IDE), if you only spend 10% of your time actually writing code – isn’t it more interesting how the var keyword affects your reading…?

2. Eye movement analysis of people reading code shows that we repeatedly revisit the initial variable declarations (reference at the end). It seems that, even in cases with only a few local variables, the mind needs a constant refresher of what these variables mean, and – at least I’m assuming – their type. This happens so frequently that researchers even gave the behaviour a name – “retrace declaration”.

Now, let’s assume a method with this piece of code.

var ratio = foo.calculateRatio();

When reading this line, what type is the variable? int? double? String??? A Map keyed by class X with the ratio double as value? Admittedly, upon first read it may not matter (which is kind of the idea with var, right?). But when your mind and eyes are repeatedly doing declaration retraces, do you think the process will be faster or slower compared to if explicit typing was used, such as

int ratio = foo.calculateRatio();

or

Map<Customer, Double> ratio = foo.calculateRatio();

?

I know what I believe. And I believe the case is the same when the initializer is not a method call, such as

var foo = 1;
var bar = new HashMap<String, Integer>();

vs

int foo = 1;
Map<String, Integer> bar = new HashMap<>();

In the var case you’d need to read more or less the whole line (well, you could skip var itself) to realize what type the variable is. In the explicitly typed example, however, I’m assuming it is enough for the brain to pick up the following, before allowing it to re-establish the type and hopefully resume analyzing the use of the variable:

int foo
Map<String, Integer> bar

Admittedly I have not seen any studies on this particular matter, but assuming I’m right, it would mean explicit typing lets you read fewer characters (or rather “words”, since the brain normally doesn’t process character by character) than you would have needed to read with var, and thus using var would force you to spend more time reading code than explicit types would.

And since reading makes up 90% of your coding, in total you would be less productive, even if you saved a second here or there while typing.
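To make the reading cost concrete, here is a hypothetical snippet (the Order record and the totalsPerCustomer method are made up for illustration): with the explicit declaration, the type is right there at the start of the line, whereas a var declaration would force the reader to re-infer it from the stream pipeline on every declaration retrace.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class VarReadability {
  record Order(String customer, double amount) {}

  static Map<String, Double> totalsPerCustomer(List<Order> orders) {
    // With var, a reader must mentally type-infer the whole pipeline:
    //   var grouped = orders.stream().collect(...);
    // With an explicit type, the declaration alone carries the answer:
    Map<String, Double> grouped = orders.stream()
        .collect(Collectors.groupingBy(Order::customer,
                 Collectors.summingDouble(Order::amount)));
    return grouped;
  }
}
```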

If your experience is contrary to the above or, better yet, if you know of detailed studies explicitly focusing on eye movement / productivity effects of var, please let me know in the comments.

Reference for the eye movement study: Uwano, H., Nakamura, M., Monden, A. and Matsumoto, K., Proceedings of the 2006 Symposium on Eye Tracking Research & Applications, as referenced (charts and all) by Jason Cohen in Best Kept Secrets of Peer Code Review.

APN for Pocket SIM for Japan

If you find yourself arriving in Tokyo – maybe via the Haneda airport – having bought a Japanese prepaid SIM card – Pocket SIM – at the airport which you cannot get working, maybe this will help.

The included pamphlet says to use the following settings (so maybe Google led you here?)

Name: ppsim
APN: ppsim.jp
User name: pp@sim
Password: jpn
Authentication type: PAP or CHAP

but they won’t work.

The actual settings can be found in this Twitter thread. Try the following (Pocket SIM Premium)

APN: ppsim.jp
User name: pp@psim
Password: jpn

or (Pocket SIM)

APN: psim.jp
User name: japan@psim
Password: japan

or

APN: dm.jplat.net
User name: pocket@sim
Password: japan

P.S. If you haven’t bought the SIM card already, I recommend that you try to find the U-Mobile prepaid SIM card, which is sometimes sold in vending machines.

Automatically expand statically provisioned disks for StatefulSet in AKS

When working with StatefulSets in Kubernetes, you can use volumeClaimTemplates to have K8s dynamically provision Persistent Volumes for you. In the case of Azure Kubernetes Service, these end up in the MC_resource-group-name_aks-name_region resource group together with the other automatically provisioned resources like VMs and Load Balancers.

Statically provisioned disks

For persistent data (in its broader meaning, not K8s terms), you may not want to tie your data storage lifecycle to your Kubernetes cluster. Instead you may wish to create your storage – such as Azure Managed Disks – outside of K8s (say, with Terraform) and then map it to your pods.

The Azure Kubernetes Service documentation contains a simple example of how this can be achieved for a single pod. Andy Zhang, working with Kubernetes at Microsoft, has a GitHub repository with more detailed examples of static provisioning.

In short, you create your Persistent Volume

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-statically-provisioned-disk
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  azureDisk:
    kind: Managed
    diskName: {diskName}
    diskURI: /subscriptions/{subscriptionId}/resourcegroups/{resourceGroupName}/providers/Microsoft.Compute/disks/{diskName}
    fsType: ext4
    readOnly: false
    cachingMode: ReadOnly

and then you can reference it either directly from your pod, or via an intermediate PersistentVolumeClaim.

StatefulSet

When you want to use statically provisioned disks as the persistent volumes of a StatefulSet, you could create PersistentVolumeClaims with the same names as those that the StatefulSet would have created for dynamic provisioning (claimname-podname-N), before you create the StatefulSet.
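For illustration – assuming a StatefulSet named my-sts whose volumeClaimTemplates entry is named influxdata (both names are made up) – the pre-created claim for pod 0 could look like this, binding directly to the PV from the earlier example via volumeName:

```yaml
# Hypothetical pre-created claim matching the name the StatefulSet
# would otherwise generate: claimname-podname-N
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: influxdata-my-sts-0
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""  # empty string prevents dynamic provisioning
  volumeName: my-statically-provisioned-disk  # bind to the static PV
  resources:
    requests:
      storage: 1Gi
```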

A more elegant solution, however, is to use match labels to identify the PVs to use for your StatefulSet pods, as per this Stack Overflow answer.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-statically-provisioned-disk
  labels:
    app: influxdb # Used by volumeClaimTemplates
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  azureDisk:
    kind: Managed
    diskName: {diskName}
    diskURI: /subscriptions/{subscriptionId}/resourcegroups/{resourceGroupName}/providers/Microsoft.Compute/disks/{diskName}
    fsType: ext4
    readOnly: false
    cachingMode: ReadOnly
---
apiVersion: apps/v1
kind: StatefulSet
...
spec:
  ...
  template:
    ...
    spec:
      ...      
  volumeClaimTemplates:
  - metadata:
      name: influxdata
    spec:
      selector:
        matchLabels:
          app: influxdb # The labels that the PersistentVolumes must have to be used to fulfil this claim
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi

Automatic volume expansion

Kubernetes supports automatic expansion of dynamically provisioned disks, and we can actually make this work for statically provisioned disks as well.

First, regardless of dynamic vs static provisioning you must have a StorageClass with allowVolumeExpansion: true. As per the Azure documentation, this is not the case for the built-in storage classes. Instead you will need to create your own StorageClass, for example

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-managed-premium
provisioner: kubernetes.io/azure-disk
reclaimPolicy: Delete
allowVolumeExpansion: true # Change compared to built in managed-premium
parameters:
  storageaccounttype: Premium_LRS
  kind: Managed

Now, make sure to reference this storage class from your volumeClaimTemplates and, in the case of static provisioning, your PersistentVolume.

In the case of statically provisioned disks, I also suggest you add the pv.kubernetes.io/provisioned-by: kubernetes.io/azure-disk annotation. (I have yet to verify whether this is required or not.)

So we end up with something like this

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-statically-provisioned-disk
  labels:
    app: influxdb # Used by volumeClaimTemplates
  annotations:
    pv.kubernetes.io/provisioned-by: kubernetes.io/azure-disk
  finalizers:
  - kubernetes.io/pv-protection
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  azureDisk:
    kind: Managed
    diskName: {diskName}
    diskURI: /subscriptions/{subscriptionId}/resourcegroups/{resourceGroupName}/providers/Microsoft.Compute/disks/{diskName}
    fsType: ext4
    readOnly: false
    cachingMode: ReadOnly
  storageClassName: expandable-managed-premium # Support volume expansion
  volumeMode: Filesystem
---
apiVersion: apps/v1
kind: StatefulSet
...
spec:
  ...
  template:
    ...
    spec:
      ...      
  volumeClaimTemplates:
  - metadata:
      name: influxdata
    spec:
      selector:
        matchLabels:
          app: influxdb # The labels that the PersistentVolumes must have to be used to fulfil this claim
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
      storageClassName: expandable-managed-premium # Support volume expansion

Automatic volume expansion for StatefulSet

At the time of this writing, changing the resources.requests.storage of a StatefulSet volumeClaimTemplates is not supported by Kubernetes. There are issues on GitHub both for the main project and the Enhancement Tracking and Backlog project. (The latter also has a PR.)

Here is my recommended approach to work around this until properly supported, which works for dynamically as well as statically provisioned disks:

  1. Prepare your Kubernetes (or Helm) file with the new storage size.
  2. For each pod (P) in your StatefulSet, repeat
    1. Delete the StatefulSet without cascading
      kubectl delete sts --cascade=false your-stateful-set
      
    2. Delete the pod
      kubectl delete pod/my-pod-P
      
    3. Edit the PersistentVolumeClaim of the pod
      kubectl edit pvc my-volume-my-pod-P
      

      Set the new storage size under resources.requests.storage.

    4. Re-create the StatefulSet using kubectl apply or Helm as applicable
    5. Wait for the volume to expand. Use kubectl describe pvc to see the progress. Sometimes another pod restart is required for changes to be applied.
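The steps above can be sketched as a shell script. All names and sizes are placeholders, and the commands are echoed rather than executed, so the loop can be reviewed before running anything for real. (Newer kubectl versions spell the non-cascading delete --cascade=orphan instead of --cascade=false.)

```shell
#!/bin/sh
# Placeholders - adjust to your cluster
STS=your-stateful-set
VOLUME=my-volume
NEW_SIZE=2Gi
REPLICAS=3

i=0
while [ "$i" -lt "$REPLICAS" ]; do
  POD="$STS-$i"
  # 1. Delete the StatefulSet without cascading
  echo kubectl delete sts --cascade=false "$STS"
  # 2. Delete the pod
  echo kubectl delete "pod/$POD"
  # 3. Patch the PVC non-interactively (instead of "kubectl edit")
  echo kubectl patch pvc "$VOLUME-$POD" -p \
    "{\"spec\":{\"resources\":{\"requests\":{\"storage\":\"$NEW_SIZE\"}}}}"
  # 4. Re-create the StatefulSet
  echo kubectl apply -f statefulset.yaml
  # 5. Watch the expansion progress before moving on to the next pod
  echo kubectl describe pvc "$VOLUME-$POD"
  i=$((i + 1))
done
```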

References

  • This GitHub issue contains some tips for migrating from dynamically to statically provisioned disks. You can also use disk snapshots.
  • Andy Zhang also has some tips on growing disks on GitHub.

Fifty shades of Serverless

Today I gave a talk titled Fifty shades of Serverless at JavaForum Göteborg (Gothenburg), covering Serverless and the different options the market offers us as developers in general, and Java developers in particular. I also demoed a few examples and highlighted a few caveats to consider before going serverless.

Below you’ll find the slides and a recording of the talk. The code from the demo can be found at github.com/mjiderhamn/fifty-shades-of-serverless.

What is configuration?

One of the factors of The Twelve-Factor App methodology is Store config in the environment.

Keeping your configuration separate from your code, rather than hard coding stuff, is sound advice. But in order to follow it, you need to understand what configuration is. The twelve-factor "manifesto" explains

"An app’s config is everything that is likely to vary between deploys (staging, production, developer environments, etc)"

and then goes on to list examples such as connection strings and credentials to databases and other external resources/services, and per-deploy things like the hostname to be included in aggregated logs.

What is not configuration?

The document continues with clarifying

"Note that this definition of “config” does not include internal application config, … This type of config does not vary between deploys"

In essence: things that vary between deployments (dev/qa/stage/production; geographic region etc.) and/or instances (hostname within a deployment cluster) are configuration that should be kept separate from the version controlled code, while internal configuration identical across all deployments and instances belongs inside the codebase.

In this post I’m going to argue that something else isn’t configuration from a Twelve-Factor App point of view:

Things that are expected to change over time are not configuration

Well, of course, if those things also differ per deployment, they are configuration. But just because something is determined to be more or less likely to change down the road, often for non-technical reasons, doesn’t make it configuration, nor does it warrant treating it as such.

Even if it is expected to change first in your development environment, then in your staging environment and finally in your production environment, it isn’t configuration. You know what else has the same lifecycle expectation? Your code.

Let me give you an example: the monthly price of a Netflix subscription. (Please note that this is completely fictitious – I don’t know how Netflix technically treats their prices.)

Netflix

When Netflix launched, they likely anticipated that their prices would change at some point in the future. And when they actually did update their prices, don’t you think that they first did this in some dev/test/qa environment before they "released" the new price to the market (i.e. production environment)?

"I need a UI"

Often these kinds of things are initiated by some business stakeholders, such as the marketing people. "You need to create a UI so that we can change X [monthly price] whenever we want to". Sometimes they also want the ability to copy settings from one environment to another, either by some export/import settings feature in said UI, or by having someone set up a routine so that they can copy database entries from one environment to the other.

This should raise a red flag. If the business people want to be able to first make a "configuration" change in a qa/stage environment and then, when they think they are ready, copy that "configuration" over to the production environment, you should take a step back and contemplate what you are trying to achieve. Are you in essence creating two parallel deployment pipelines – one for the code and one for the config? Who is going to develop and maintain the config pipeline? Can the extra cost and complexity of having two separate deployment pipelines really be justified…?

I would say a litmus test for whether something is truly business-only configuration warranting a UI is this: imagine that you stopped all development, propagated the code so that all environments ran the exact same codebase, and all the “marketing deadlines” (such as price increases) were reached – if there were then a difference in configuration between the environments, would that be considered an error? Another way to put it: if you changed this configuration in the production environment first, would there be a need to propagate the change "backwards" to stage/qa/dev? Or is it totally fine if the Netflix subscription costs $10.99 for actual customers, $9.99 in the stage environment and $6.03 on John Doe’s development machine?

Release cadence

Instead, putting "configuration" that is expected to change inside your codebase assumes that you will be releasing that codebase often enough compared to how frequently (and with how much notice) the configuration is expected to change. If your next production release is scheduled in 5 months, you’ll wish you had created that UI when the business people require that you change X at the next turn of the month.

But what if you took the time that you would have spent creating the UI and export/import feature, and instead spent that time streamlining the release pipeline of your codebase? Ultimately you’d have Continuous Delivery, so that a change made in your codebase (on a hotfix branch, if needed) could reach production within hours if not minutes.

If you release relatively often, but marketing requires that a change (such as price increase) occurs at a specific day or even time on that day, and you can’t or don’t want to release on that exact day or time, you could consider including the config in your codebase using Feature toggles. Admittedly there is overhead involved with allowing you to toggle the feature during runtime, however.
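As a sketch of what such a toggle could look like – all class names, dates and prices below are made up – the effective date and both prices live in the codebase, and the clock is injected so the flip can be unit tested:

```java
import java.math.BigDecimal;
import java.time.Clock;
import java.time.Instant;

// Hypothetical date-based feature toggle: both prices are in the codebase,
// and the new one takes effect at a moment decided in advance.
class SubscriptionPricing {
  static final Instant NEW_PRICE_EFFECTIVE = Instant.parse("2030-07-01T00:00:00Z");
  static final BigDecimal OLD_PRICE = new BigDecimal("9.99");
  static final BigDecimal NEW_PRICE = new BigDecimal("10.99");

  private final Clock clock; // injected to make the toggle testable

  SubscriptionPricing(Clock clock) {
    this.clock = clock;
  }

  BigDecimal monthlyPrice() {
    return clock.instant().isBefore(NEW_PRICE_EFFECTIVE) ? OLD_PRICE : NEW_PRICE;
  }
}
```

The effective date goes into production well before it is reached, so no release needs to happen at the exact moment of the price change.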

Config or data?

Sometimes the thing you want to change or add is not just a simple string or number, but an entire data structure. This still doesn’t warrant a UI and/or export/import. As for database data, you could use – and hopefully you are already using – some database migration tool like Flyway or Liquibase, and include the config change in those scripts.

Other options include storing structured data in separate files, such as XML or JSON, alongside your source code. Maybe you can even define a file format that the business people can manage themselves, and that will then be included in the codebase? (Before you suggest that however, I should warn you they are likely to suggest Excel…)

Objections

"But that requires a developer to make a business change!"

Generally that is true – as with changes to the business logic of your application. Even though you may be able to find ways around that, as per above, this means putting this kind of config in your codebase probably won’t work well for slow moving, waterfall type of organizations. But in an agile environment with Continuous Delivery or at least a high release cadence, it shouldn’t be much of an issue unless the changes required are very frequent or complex.

And remember, we had already saved ourselves development time by avoiding having to create the UI and/or the process for separately propagating configuration from one environment to another.

Benefits

We’ve already mentioned the benefit of avoiding having to set up, document and maintain a separate "release pipeline" for your config. Consider the fact that it would often have to be managed by non-tech people, and the benefits of avoiding it may be even greater.

Another main benefit if you integrate this type of configuration in your codebase, is that you can use lower level, cheaper/faster tests to verify the correctness of the settings. Hopefully you are familiar with the Test Pyramid, visualising that higher level tests (such as UI or integration tests) are both more expensive to maintain and run slower, effectively slowing down your release pipeline and decreasing your maximum possible release cadence.
[Image: The test pyramid]

In the Netflix example, having the price inside your codebase means you can write unit tests verifying billing calculations etc., rather than having to write for example UI tests for the same verification.

You will also get your configuration version controlled, which is a positive side effect – especially if you managed to avoid Excel. 🙂

Agree or disagree? I’d love to hear your thoughts in the comments below.

Non-backwards compatible SQL database migration

Sometimes you need/want to make more radical changes to your SQL database schema, such as renaming a column or moving data from one table to another.

Tools like Flyway and Liquibase have simplified making backwards compatible database migrations, such as adding columns/tables. However, making non-backwards compatible changes “online” (i.e. while the application is up and running) in a clustered environment (multiple application instances accessing the same database) requires a little more thought.

Basically you need to make this change in 5 steps, spread out across (at least) 4 releases (assuming a release means updating the database, either separately or during application boot using a migration tool, and then updating the application – in that order). I’ll use renaming a database column as an example.

  1. Database: Add the new column (i.e. do not rename or drop the previous one) to the database. You may also copy existing data to the new column.
    Application: Start writing the data to both the old and the new column.
  2. Database: Copy existing data to the new column. Even if you did this in the previous step, you need to do it again, since the application may have written to the old column only between the time the database migration was executed and the time the application was updated. You could opt to only copy the data written during that window (i.e. where the old column is non-null but the new column is null). This does not have to be a separate release, but could be the migration made as part of the next application release.
  3. Application: Start reading data from the new column instead of the old column. Note that you must not stop writing data to the old column yet, since as you update one application instance (i.e. one cluster node) at a time, there can be non-updated nodes reading the old column for data written by updated nodes.
  4. Application: Stop writing to the old column.
    Database: Note that we cannot drop the old column yet, since the non-updated nodes will still be writing to it.
  5. Database: Drop the old column.
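Using Flyway/Liquibase-style SQL (the table and column names below are made up, and the exact syntax varies per database), the database parts of the steps above could look like:

```sql
-- Step 1: add the new column (and optionally pre-copy existing data)
ALTER TABLE customer ADD COLUMN family_name VARCHAR(100);
UPDATE customer SET family_name = surname;

-- Step 2 (next release): re-copy the rows written to the old column only,
-- between the step 1 migration and the application update
UPDATE customer SET family_name = surname
 WHERE surname IS NOT NULL AND family_name IS NULL;

-- Step 5 (final release): drop the old column, once no node writes to it
ALTER TABLE customer DROP COLUMN surname;
```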

ClassLoader leaks links

During September 2016 I’ll be speaking about ClassLoader leaks at JavaZone, JDK.IO and JavaOne. For those that listened to my talk and want to read more on the subject, here are the slides, links to my blog series and to the ClassLoader Leak Prevention library on GitHub.

Part I – How to find classloader leaks with Eclipse Memory Analyser (MAT)

Part II – Find and work around unwanted references

Part III – “Die Thread, die!”

Part IV – ThreadLocal dangers and why ThreadGlobal may have been a more appropriate name

Part V – Common mistakes and Known offenders

Part VI – “This means war!” (leak prevention library)

Recording (from JDK.IO):

Code review tools, 2016 edition

I’ll be talking about agile code review at JDK.IO in Copenhagen, and since I last talked on the subject more code review tools have become available, some of them from “big players”, so an updated list of tools seems to be in order.

  • Gerrit (by Google) – Open Source, web-based, for Git only, used for Android
  • Phabricator (originally by Facebook) – web-based, free when self hosted
  • Upsource (by JetBrains) – web-based, free for up to 10 users, IntelliJ integration (of course)
  • Crucible (by Atlassian) – commercial, web-based
  • Collaborator (formerly Code Collaborator; by SmartBear) – web-based + Eclipse plugin + Visual Studio plugin (IntelliJ plugin under development), free for up to 10 users
  • Klocwork – commercial, web-based
  • ReviewBoard – Open Source, web-based
  • AgileReview – Eclipse plugin

Older, possibly abandoned tools:

ClassLoader Leak Prevention library 2.0 released

I recently released version 2.0.0 of the ClassLoader Leak Prevention library to Maven Central. This is a major refactoring that provides the following new features.

App server and non-servlet framework integration

The library now has a core module that does not assume a servlet environment. This means that the library can be integrated into environments that do dynamic class loading, such as scripting engines. It also means that Java EE application servers can integrate the library, so that web apps deployed onto that server wouldn’t need to include the library to be protected from java.lang.OutOfMemoryError: PermGen space / Metaspace. More details can be found in the module README.md on GitHub.

Zero-config Servlet 3.0+ module

If you’re in a Servlet 3.0 or 3.1 environment, there is no longer a need to explicitly declare the <listener> in web.xml. Instead use the classloader-leak-prevention-servlet3 Maven dependency, which handles this for you automatically. For details, see the README.md on GitHub.
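The dependency declaration would look something like this (check Maven Central or the README for the exact groupId and the current version – the coordinates below are my best recollection):

```xml
<dependency>
  <groupId>se.jiderhamn.classloader-leak-prevention</groupId>
  <artifactId>classloader-leak-prevention-servlet3</artifactId>
  <version>2.0.0</version> <!-- use the latest 2.x version -->
</dependency>
```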

Preventions are now plugins

In version 1.x, you needed to subclass the library’s ServletContextListener to add, remove or change the behaviour of specific leak prevention measures. In 2.x, each prevention mechanism is a separate class implementing an interface. This makes it easier to implement your own additional preventions, remove measures from the configuration, or subclass and adjust any single mechanism.
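As an illustration of the plugin style – the interface and method names below are stand-ins, not the library’s actual API (see the GitHub README for the real ones) – each prevention measure is a small self-contained class:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in interface mimicking the 2.x plugin style, where each leak
// prevention measure is its own class. NOT the library's actual API.
interface PreventionMeasure {
  /** Perform cleanup on behalf of the class loader about to be discarded. */
  void cleanUp(ClassLoader classLoaderToClean);
}

// Example measure: clear entries of a (hypothetical) static cache that would
// otherwise keep references to classes loaded by the dying class loader.
class StaticCacheCleanUp implements PreventionMeasure {
  static final List<Object> CACHE = new ArrayList<>(); // stand-in for a real cache

  @Override
  public void cleanUp(ClassLoader classLoaderToClean) {
    CACHE.removeIf(entry -> entry.getClass().getClassLoader() == classLoaderToClean);
  }
}
```

A custom measure like this could then be added to – or a built-in one removed from – the configured list of preventions, without subclassing anything.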

Improved logging

While 1.x logged to System.out/System.err unless you subclassed and overrode the log methods, 2.x by default uses java.util.logging (JUL). You can also easily switch to the System.out/System.err behaviour, or provide your own logging.

Please note that bridging JUL to other logging frameworks (for example using jul-to-slf4j) has not been tested, and may produce unexpected results in case something is logged after the logging framework has been shut down by the library.