Stream processing windowing using Azure Event Hubs

04 March 2019

This post is aimed at engineers designing systems that need to process streams of events. One particular solution is explored, using an Azure Event Hub and the Capture feature.

There are a few concepts that are helpful to understand before diving in.

  1. What is stream processing?
    Stream processing, as defined by Martin Kleppmann in his book Designing Data-Intensive Applications, is

    somewhere between online and offline/batch processing (so it is sometimes called near-real-time or nearline processing). Like a batch processing system, a stream processor consumes inputs and produces outputs (rather than responding to requests). However, a stream job operates on events shortly after they happen, whereas a batch job operates on a fixed set of input data. This difference allows stream processing systems to have lower latency than the equivalent batch systems.

  2. What is stream windowing?
    Windowing, as used in this post, is the process of breaking down events from a continuous stream into groups that occurred during a given time interval (typically small, on the order of minutes or seconds). One iteration of the time interval is considered a “window”.

  3. What are Azure Event Hubs?
    Azure Event Hubs is a managed service that provides a log-based message broker for sending and receiving messages. A log-based broker like Event Hubs maintains an append-only store of messages, and consumers track their own position in the log as they process messages. Messages are removed only when the log is compacted to remove duplicate messages or when the log begins to run out of space. Apache Kafka is another common implementation of a log-based message broker with similar semantics.

    This log-based broker should not be confused with “standard” message brokers like RabbitMQ or Azure Service Bus, which provide queueing semantics: messages are pulled from a queue, then locked by the broker until they are processed and deleted or returned to the queue. No central log is maintained. A toy sketch contrasting the two models follows this list.
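
To make the distinction concrete, below is a toy sketch of the log-based model. Nothing here is the real Event Hub API; it only illustrates how consumers own their positions in an append-only log:

using System.Collections.Generic;

// Toy model of a log-based broker. The log is append-only and reads never
// delete entries; each consumer simply advances its own offset.
public class ToyLog
{
  private readonly List<string> _entries = new List<string>();

  public void Append( string message ) => _entries.Add( message );

  // Returns false once the consumer has caught up to the end of the log.
  public bool TryRead( ref int offset, out string message )
  {
    if ( offset < _entries.Count )
    {
      message = _entries[offset++];
      return true;
    }
    message = null;
    return false;
  }
}

Because positions are per-consumer, two consumers can independently replay the same messages, which queue-based brokers do not offer.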

Getting started with Capture

A common approach to dealing with an “infinite” event stream is to break it down into time-based windows and process each window as a batch. Any message broker can support this approach, but typically you have to implement the windowing yourself. I like to offload logic like that to the platform, which we can do if we’re using Event Hubs. The Event Hub service provides a feature called Capture that will gather messages over a configurable window and write them as a batch to either Azure Data Lake Store or Azure Blob Storage.
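
As a minimal sketch of the windowing idea itself (independent of any broker), the helper below buckets events that carry a UTC timestamp into fixed, tumbling windows keyed by window start time. The event tuple shape here is hypothetical:

using System;
using System.Collections.Generic;
using System.Linq;

public static class TumblingWindows
{
  // Buckets timestamped events into fixed-size windows keyed by window start.
  public static IEnumerable<IGrouping<DateTime, (DateTime Timestamp, byte[] Body)>> Window(
    IEnumerable<(DateTime Timestamp, byte[] Body)> events, TimeSpan windowSize )
  {
    // Truncate each timestamp down to the start of the window containing it.
    return events.GroupBy( e => new DateTime(
      e.Timestamp.Ticks - ( e.Timestamp.Ticks % windowSize.Ticks ), DateTimeKind.Utc ) );
  }
}

Each resulting group can then be handed off as one batch. Capture effectively performs this bucketing on the service side, so the rest of this post lets the platform do it for us.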

For the rest of this post I’m assuming that your Event Hub has been configured to output to Blob Storage using a 5 minute/300 megabyte window. When Capture is configured, all messages inside a given Event Hub partition and time window are written to one blob file in the compact Avro format. The naming convention for the blob is {AzureNamespaceName}/{EventHubName}/{PartitionId}/{DateDownToTheSecond}.avro. If no messages are received during a window, Capture will still write an empty Avro file by default. I recommend turning this option off to make processing of the blobs easier.

Processing the windowed output

A scalable platform is essential for handling the Capture output. For big data analysis, Azure Data Lake works well. However, it can be overkill for smaller datasets and lower throughput services. An option for the low-scale case is Azure Functions with a blob trigger like the one below:

[FunctionName("CaptureBlobTrigger")]        
public static async Task RunAsync([BlobTrigger("myblobcontainername/{name}.avro")] Stream capturedWindowStream, string name, ILogger log)
{
    await ProcessWindowAsync( capturedWindowStream, log );
}

The function will run every time a new blob appears in the blob container myblobcontainername. Note that the BlobTrigger binds to the storage account in the function app’s AzureWebJobsStorage setting unless a Connection property is specified on the attribute, so that connection must point at the storage account Capture writes to. The ProcessWindowAsync method can then deserialize the input blob and process it. Microsoft provides the Microsoft.Avro.Core NuGet package for working with Avro files as shown below. The code deserializes the input and logs each unique message body contained in the window:

private static async Task ProcessWindowAsync( Stream capturedWindowStream, ILogger log )
{
  // based on https://gist.github.com/pshrosbree/74c8c4b4744c00cf3d92939952808d1e
  using ( IAvroReader<object> reader = AvroContainer.CreateGenericReader( capturedWindowStream ) )
  {
    while ( reader.MoveNext() )
    {
      foreach ( string recordBody in reader.Current
                    .Objects.Select( o => new AvroEventData( o ) )
                    // assumes UTF8 encoding of input message string
                    .GroupBy( r => Encoding.UTF8.GetString( r.Body ) )
                    .Select( g => g.Key ) )
      {
          log.LogInformation( $"{DateTime.Now} > Read Unique Item: {recordBody}" );
      }
    }
  }
}

private struct AvroEventData
{
  public AvroEventData( dynamic record )
  {
    SequenceNumber = (long) record.SequenceNumber;
    Offset = (string) record.Offset;
    DateTime.TryParse( (string) record.EnqueuedTimeUtc, out var enqueuedTimeUtc );
    EnqueuedTimeUtc = enqueuedTimeUtc;
    SystemProperties = (Dictionary<string, object>) record.SystemProperties;
    Properties = (Dictionary<string, object>) record.Properties;
    Body = (byte[]) record.Body;
  }
  public long SequenceNumber { get; set; }
  public string Offset { get; set; }
  public DateTime EnqueuedTimeUtc { get; set; }
  public Dictionary<string, object> SystemProperties { get; set; }
  public Dictionary<string, object> Properties { get; set; }
  public byte[] Body { get; set; }
}

For a production system, instead of logging, the messages in the window could be processed and analyzed. I hope this post helped you learn about Event Hub Capture and how it might be useful for processing event streams.

404s and web.config errors using git in WSL

18 January 2019

I’ve run into two problems with local development recently and I wanted to share the resolution to the problems in case it helps others.

IIS Express 404s

The first problem was after a fresh clone using git in a bash prompt under the Windows Subsystem for Linux (WSL). When I started the full framework ASP.NET application, the response was always a 404. IIS Express could not see content or application code in that directory no matter what I changed. The fix was to either re-clone using git in a regular Windows command prompt, or copy all the files to another folder manually created in Windows Explorer. Something about the permissions of the parent folder doesn’t work well with IIS Express when the folder is created through the WSL git executable.

Web.config not recognized

The second problem was with the web.config file not being recognized. I again had cloned the repository using git in a bash prompt under WSL. The file existed on disk but IIS Express was not reading it and applying the configuration. The fix for this problem was to rename the file in WSL bash to match the casing in the .csproj file.

For example, on disk the file was cloned as Web.config but the .csproj entry was:

    <Content Include="web.config">
      <SubType>Designer</SubType>
    </Content>

Note the mismatch in the casing of the first character. Once I renamed the file to all lowercase, as the .csproj file expected, everything worked. I believe this issue has something to do with case sensitivity differences between WSL and Windows.

High thread count in Azure Functions

09 October 2018

Recently Azure Functions runtime V2 was released and one of my team’s functions started to consume an abnormally high number of threads. When we dug into the details of the problem, the root cause was performing static initialization of logging on each function invocation. Specifically, we were using Serilog with the Application Insights sink and were recreating the sink each time the function ran. The sink was then assigned to a static variable (Log.Logger).

The code used to look something like this:

using System;
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.Azure.WebJobs;
using Serilog;

namespace ServiceBusThreadingProblem
{
   public static class HighThreadCountFunction
   {
      [FunctionName( "HighThreadCountFunction" )]
      public static void Run( [ServiceBusTrigger( "TestTopicName", "TestSubscriptionName", Connection = "TestSubscriptionConnectionString" )] string messageContents, ExecutionContext context, int deliveryCount, DateTime enqueuedTimeUtc, string messageId )
      {
         var logConfiguration = new LoggerConfiguration()
            .Enrich.WithProperty( "FunctionInvocationId", context.InvocationId )
            .Enrich.WithProperty( "QueueMessageMessageId", messageId )
            .Enrich.WithProperty( "MachineName", Environment.MachineName );

         logConfiguration.WriteTo.ApplicationInsightsTraces( new TelemetryConfiguration( Environment.GetEnvironmentVariable( "APPINSIGHTS_INSTRUMENTATIONKEY" ) ) );

         Log.Logger = logConfiguration.CreateLogger();
         Log.Information( $"C# ServiceBus topic trigger function processed message: {messageContents}" );
      }
   }
}

The issue is the re-assignment of the Log.Logger static variable each time the function runs. Looking at a memory dump, it was apparent that each of the ~400 threads was waiting somewhere in Application Insights code. That led us to examine the ApplicationInsightsTraces line, which is part of the Serilog initialization. A quick test confirmed that without that line, the thread count stayed at normal levels.

As a general principle in Azure Functions, if a class can manage external connections and is thread-safe, then it should be reused as a static variable. See the Azure Functions documentation on static client reuse. Since Serilog’s Log.Logger is a static variable and is thread-safe, the code above should not have been recreating the logger on each invocation.

Instead, the code should be refactored to reuse the Log.Logger variable. One pattern that makes this easy is using the Lazy<T> class to ensure only one instance of the logger is ever created, as shown below.

using System;
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.Azure.WebJobs;
using Serilog;
using Serilog.Context;
 
namespace ServiceBusThreadingProblem
{
   public static class NormalThreadCountFunction
   {
      private static Lazy<ILogger> LoggingInitializer = new Lazy<ILogger>( () =>
      {
         var logConfiguration = new LoggerConfiguration()
                 .Enrich.FromLogContext()
                 .Enrich.WithProperty( "MachineName", Environment.MachineName );

         logConfiguration.WriteTo.ApplicationInsightsTraces( new TelemetryConfiguration( Environment.GetEnvironmentVariable( "APPINSIGHTS_INSTRUMENTATIONKEY" ) ) );
         Log.Logger = logConfiguration.CreateLogger();
         return Log.Logger;
      } );

      [FunctionName( "NormalThreadCountFunction" )]
      public static void Run( [ServiceBusTrigger( "TestTopicName", "TestSubscriptionName", Connection = "TestSubscriptionConnectionString" )] string messageContents, ExecutionContext context, int deliveryCount, DateTime enqueuedTimeUtc, string messageId )
      {
         ILogger logger = LoggingInitializer.Value;

         using ( LogContext.PushProperty( "FunctionInvocationId", context.InvocationId ) )
         using ( LogContext.PushProperty( "QueueMessageMessageId", messageId ) )
         {
            logger.Information( $"C# ServiceBus topic trigger function processed message: {messageContents}" );
         }
      }
   }
}

In the code above, the Application Insights Serilog sink is no longer reinitialized on each function invocation. The Lazy field handles initializing the logger if one isn’t already available. With this refactoring, the function consistently uses only ~40 threads. Additionally, the thread count is not impacted by additional load.

Takeaways

To help your Azure Function scale and to avoid consuming extra resources, reuse as much as possible between invocations.

  • Use static clients to optimize connections. Examples of clients that should be reused are HttpClient, Azure Storage clients, and logging clients (see the sketch after this list).
  • Avoid reinitializing classes that can be reused. In my case, it was logging with Serilog but configuration is another candidate to be initialized once for all invocations of a function.
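
As a sketch of the static client guidance in the first bullet, a function can share one HttpClient across all invocations. The endpoint below is purely illustrative:

using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

public static class StaticClientFunction
{
   // Created once per host instance and reused by every invocation,
   // avoiding per-call connection setup and socket exhaustion.
   private static readonly HttpClient Client = new HttpClient();

   [FunctionName( "StaticClientFunction" )]
   public static async Task Run( [ServiceBusTrigger( "TestTopicName", "TestSubscriptionName", Connection = "TestSubscriptionConnectionString" )] string messageContents )
   {
      HttpResponseMessage response = await Client.GetAsync( "https://example.com/api/process" );
      response.EnsureSuccessStatusCode();
   }
}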

Debugging Azure Functions with Windbg

05 October 2018

When all else fails troubleshooting a production issue, I often reach for Windbg to dig into the problem. For issues like high CPU or memory usage, Windbg is a great tool to pinpoint the root cause. Windbg allows you to open a memory dump of a running application and inspect it. Anywhere you can run .NET code should support taking a memory dump, which you can then inspect using Windbg. I recently had to troubleshoot an issue with a .net core Azure Function application, and while the steps were a little different than for a web application, it is possible. Note - this post assumes you have Windbg installed. If not, follow the link above and install it. I recommend the version available through the Windows store.

Take memory dump of running function

First, open the KUDU site for the problematic Azure Function running in Azure. The URL will look like https://<your-function-name-here>.scm.azurewebsites.net. Navigate to the Process Explorer tab and copy the PID of the process running your function. This might look odd because the process is running as W3WP.exe, but that is expected. There should only be two options, and one will be tagged with the scm tag, meaning it is the process running the scm site itself. Choose the W3WP.exe process not tagged scm and copy the PID.

Next, open the Debug Console tab. I usually choose the CMD option but either should work. Navigate to the logfiles directory and make a new directory to hold the dump files. I typically use dumpfiles as the name. Navigate into the new directory and run the following command to generate the dump file - D:\devtools\sysinternals\procdump -accepteula -ma <your-PID-here>. That will generate the dump file and place it in the current directory. Finally, download the file with the .dmp extension that was generated to your local machine.

Determine platform details and load into debugging session

Open Windbg and load the .dmp file into your debugging session. First, we have to determine whether the application was running as x86 or x64 (this will be important later on). You should be able to find the bitness in the output when the file is loaded. See the image below for an example (this application was running as x86).

(Image: Windbg load output showing the bitness of the dump)

Now we are ready to inspect the memory dump. The next step is to load the appropriate SOS.dll (the debug extension for .net) for the .net runtime your application runs under. The easiest way to ensure you have the right version is to download it from the KUDU site. To find where it’s located, run the following command in Windbg - .loadby sos clr for full framework applications or .loadby sos coreclr for .net core applications. That command attempts to load sos.dll from the same location where clr.dll/coreclr.dll lived on the machine that was running the application. Since we aren’t on that machine, the command fails, but it outputs the path to the dll in the error, as shown below.

(Image: the .loadby error output, which includes the path to sos.dll)

Take the path from the error message and navigate to that location in the KUDU site for your application. Note that you have to press the System Drive button in the Debug Console to navigate to the root of the D:\ drive. Then you can move to the path specified in the error message. In my case it was D:\Program Files (x86)\dotnet\shared\Microsoft.NETCore.App\2.1.4. Once you are there, find the SOS.dll in that directory and download it. Finally, load that dll via the .load command - .load C:\the\downloaded\file\path\SOS.dll.

Inspect the memory dump

At this point we have a memory dump of the application and the debug extension for .net is loaded. Providing concrete next steps is tough because it depends on your problem. A good place to start is by running !sos.threadpool to output statistics about the threads in the application and the CPU percentage. !sos.threads will show the details of each thread in the application.

Another useful debug extension to consider is MEX. Go to the MEX page and download the appropriate dll for your application - this is where the x86/x64 check we did earlier matters. Once you have the right version, load it via .load C:\path\to\MEX.dll. You can see all the .net related commands by running !mex.help -cat 'DotNet'. I often find !mex.sqlcn useful for looking at open sql connections, and !mex.us is helpful for automatically collecting the stack traces of running threads and grouping similar stacks together.

Other resources

Windbg commands are cryptic and difficult to remember, so I have the best luck googling for my problem with “windbg” in the query to find blog posts that talk about how to debug it. Good luck!


Full Framework WSFederation to OWIN Conversion

14 May 2018

If you have been using WSFederation in a .net web application for more than a year or two, chances are it is configured using the Microsoft.IdentityModel.Web or System.IdentityModel.Services libraries. Two HTTP modules, WSFederationAuthenticationModule and SessionAuthenticationModule, are added to the application to handle the WSFederation protocol, and configuration is done either by inheriting from those classes or at application start via the web.config. However, newer versions of asp.net favor “middleware” built on OWIN, in both full framework applications and .net core. The purpose of OWIN is to abstract the underlying web server from the web application; HttpModules are tightly coupled to System.Web and therefore to the IIS web server. Using OWIN does require some configuration and setup changes, which I will detail in this post.

Basic OWIN setup

First, if you don’t already have OWIN configured for your application, install the Microsoft.Owin and Microsoft.Owin.Host.SystemWeb NuGet packages. Then add a startup class like the one below to your application:

using System;
using System.Threading.Tasks;
using Microsoft.Owin;
using Owin;

[assembly: OwinStartup(typeof(OwinApp.Startup))]
namespace OwinApp
{
    public class Startup
    {
        public void Configuration(IAppBuilder app)
        {
        }
    }
}

Convert SessionAuthenticationModule into OWIN configuration

Once OWIN is installed, we can begin converting WSFederation. Previously, a SessionAuthenticationModule subclass would have been used to set up properties for the cookie that stores session information:

public class CustomSessionAuthenticationModule : SessionAuthenticationModule
{
  protected override void InitializePropertiesFromConfiguration()
  {
      CookieHandler.RequireSsl = true;
      CookieHandler.Name = "FederatedAuthCookie";
  }
}

and configured as an HTTP module in the web.config:

<modules>
  <add name="SessionAuthenticationModule" type="MyApp.CustomSessionAuthenticationModule, MyApp" preCondition="managedHandler" />
</modules>

In the OWIN pipeline, we’ll configure the cookie using the CookieAuthentication classes (from the Microsoft.Owin.Security.Cookies package) and their helper methods:

public class Startup
{
    public void Configuration(IAppBuilder app)
    {
        app.UseCookieAuthentication( new CookieAuthenticationOptions
        {
          // converted from the CookieHandler.Name = "FederatedAuthCookie"; line in SessionAuthenticationModule
          CookieName = "FederatedAuthCookie",
          // converted from the CookieHandler.RequireSsl = true; line in SessionAuthenticationModule
          CookieSecure = CookieSecureOption.Always
        } );
    }
}

Convert WSFederationAuthenticationModule into OWIN configuration

Next, we’ll convert our custom WSFederationAuthenticationModule to use the WsFederationAuthenticationMiddleware from the OWIN pipeline.

public class CustomWsFederationAuthenticationModule : WSFederationAuthenticationModule
{
  protected override void InitializeModule( HttpApplication context )
  {
      base.InitializeModule( context );

      RedirectingToIdentityProvider += OnRedirectingToIdentityProvider;
  }

  protected override void InitializePropertiesFromConfiguration()
  {
      Issuer = InstanceWideSettings.BaseStsUrl;
  }

  private void OnRedirectingToIdentityProvider( object sender, RedirectingToIdentityProviderEventArgs args )
  {
      // setting the realm in the OnRedirecting event allows it to be dynamic for multi-tenant applications
      args.SignInRequestMessage.Realm = Settings.BaseUrl;
  }
}

The code above will be removed and replaced with the UseWsFederationAuthentication helper below:

public void Configuration(IAppBuilder app)
{
   ...
   app.UseWsFederationAuthentication( new WsFederationAuthenticationOptions
   {
     // Pulls in STS Url and other metadata (like signing certificates)
     MetadataAddress = Settings.StsMetadataUrl,
     Notifications = new WsFederationAuthenticationNotifications
     {
         // replaces the OnRedirectingToIdentityProvider event
         RedirectToIdentityProvider = notification =>
         {
           notification.ProtocolMessage.Wtrealm = Settings.PresentationUrlRoot;
           return Task.FromResult( 0 );
         }
     },
     // Name this authentication type (for WIF)
     AuthenticationType = WsFederationAuthenticationDefaults.AuthenticationType,
     // Tells the pipeline to use the cookie authentication we configured above to store the WIF session
     SignInAsAuthenticationType = CookieAuthenticationDefaults.AuthenticationType
   } );
}

Move Global.asax.cs WSFederation configuration into OWIN configuration

Now that we’ve converted the two WSFederation HttpModules, we can finish configuring the OWIN pipeline by converting the remaining WSFederation configuration, whether it lived in the web.config or was set up at application start. In my case, I preferred to set up WSFederation in code using the FederationConfigurationCreated event, like the code below:

FederatedAuthentication.FederationConfigurationCreated += ( sender, args ) =>
{
  args.FederationConfiguration.IdentityConfiguration.AudienceRestriction.AudienceMode = System.IdentityModel.Selectors.AudienceUriMode.Always;

  // this method loads the list of relying parties for a multi-tenant application.
  List<string> relyingParties = GetRelyingParties();
  relyingParties.ForEach( rp => args.FederationConfiguration.IdentityConfiguration.AudienceRestriction.AllowedAudienceUris.Add( new Uri( rp ) ) );

  // This code loads the metadata url, parses it and updates the configuration with details from it, like the signing certificates
  args.FederationConfiguration.IdentityConfiguration.IssuerNameRegistry = new CustomMetadataParser( Settings.StsMetadataUrl );
};

The items configured above can be added to the UseWsFederationAuthentication configuration:

public void Configuration(IAppBuilder app)
{
   ...
   app.UseWsFederationAuthentication( new WsFederationAuthenticationOptions
   {
     ...
     TokenValidationParameters = new TokenValidationParameters()
     {
         // this replaces the IdentityConfiguration.AudienceRestriction setup
         ValidAudiences = GetRelyingParties(),
         ValidateAudience = true
     },
     // Pulls in STS Url and other metadata (like signing certificates) so we don't have to do custom metadata parsing
     MetadataAddress = Settings.StsMetadataUrl,
     ...
   } );
}

Additionally, in the Global.asax.cs file, if you wanted access to WSFederation events, you could declare specially named methods on your HttpApplication class and they would be invoked while the WSFederation protocol was executing. Two examples that I’ve used are shown below:

void WSFederationAuthenticationModule_SessionSecurityTokenCreated( object sender, SessionSecurityTokenCreatedEventArgs e )
{
   // extend the expiration of the session cookie to make it last 1 year
   DateTime now = DateTime.UtcNow;
   TimeSpan expiration = TimeSpan.FromDays( 365 );
   e.SessionToken = new SessionSecurityToken( e.SessionToken.ClaimsPrincipal, e.SessionToken.Context, now, now.Add( expiration ) ) { IsPersistent = true };

   e.WriteSessionCookie = true;
}

void WSFederationAuthenticationModule_RedirectingToIdentityProvider( object sender, RedirectingToIdentityProviderEventArgs e )
{
   // add client id parameter to outgoing wsfederation request
   e.SignInRequestMessage.Parameters.Add( "client_id", Settings.ClientId );
}

Again, these items can be replicated in the UseWsFederationAuthentication configuration:

public void Configuration(IAppBuilder app)
{
   ...
   app.UseWsFederationAuthentication( new WsFederationAuthenticationOptions
   {
     ...
     Notifications = new WsFederationAuthenticationNotifications
     {
         // replaces the WSFederationAuthenticationModule_RedirectingToIdentityProvider method
         RedirectToIdentityProvider = notification =>
         {
           notification.ProtocolMessage.Parameters.Add( "client_id", Settings.ClientId );
           return Task.FromResult( 0 );
         },
         // replaces the WSFederationAuthenticationModule_SessionSecurityTokenCreated method
         SecurityTokenValidated = notification =>
         {
           var newAuthenticationProperties = new AuthenticationProperties( notification.AuthenticationTicket.Properties.Dictionary );

           DateTime now = DateTime.UtcNow;
           // extend the expiration of the session cookie to make it last 1 year
           TimeSpan expiration = TimeSpan.FromDays( 365 );

           newAuthenticationProperties.IssuedUtc = now;
           newAuthenticationProperties.ExpiresUtc = now.Add( expiration );
           newAuthenticationProperties.IsPersistent = true;

           notification.AuthenticationTicket = new AuthenticationTicket( notification.AuthenticationTicket.Identity, newAuthenticationProperties );
           return Task.FromResult( 0 );
         }
     },
     ...
   } );
}

Wrap up

At this point, all of the old WSFederation code is replaced and WSFederation actions are handled by the OWIN pipeline. One thing to note - we are not able to reuse existing sessions, so existing user sessions will be invalidated by this change. Once users log in again at the STS, they’ll be issued a new cookie that works with the OWIN pipeline cookie authentication code.

