Thursday, April 23, 2015

Running with alternate JVM args in sbt

It's pretty easy in sbt to run a main class in a forked JVM:

// Ensure we run outside of sbt. This is especially useful for setting JVM-level flags
fork in run := true

// Set flags for java. Memory, GC settings, properties, etc.
javaOptions ++= Seq("-Xmx4g")

mainClass in (Compile, run) := Some("mypackage.MyMainClass")

That lets you use "run" to spin up a forked JVM with your provided main class and arguments. (Aside: sbt-revolver is a MUCH friendlier way to run main classes).

But what if you want to run a main class with a different set of JVM flags? Maybe you have a memory hog, or you want to specify a debug logback config - or any number of other reasons. Or maybe you just want a shortcut for a long class name, or a default set of arguments for your main class.

This can be done with the Fork API:

// Create a task for your new main class. This is an input task so that you
// can provide arguments to it via the sbt console.
lazy val runMyOtherMain = inputKey[Unit]("run MyOtherMain")

runMyOtherMain := {
  // Parse the arguments typed on the sbt console.
  val args = sbt.complete.Parsers.spaceDelimited("[main args]").parsed
  // Build up the classpath for the subprocess. Yes, this must be done manually.
  // This also ensures that your code will be compiled before you run.
  val classpath = (fullClasspath in Compile).value
  val classpathString = Path.makeString(classpath map { _.data })
  // Any JVM args you want. You could use javaOptions.value to get your default
  // list, if you like.
  val jvmArgs = Seq("-Xmx4g", "-Dlogback.configurationFile=debug_logback.xml")
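  val exitCode = Fork.java(
    // Full options include whatever you want to add, plus the classpath.
    ForkOptions(runJVMOptions = jvmArgs ++ Seq("-classpath", classpathString)),
    // You could also add other default arguments here.
    "mypackage.MyOtherMainClass" +: args
  )
  // Fork.java returns the child JVM's exit code; fail the task if it's nonzero.
  if (exitCode != 0) sys.error(s"MyOtherMainClass exited with code $exitCode")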
}

Now, you can type runMyOtherMain arg1 arg2 from the sbt console, and it'll execute:

java -Xmx4g -Dlogback.configurationFile=debug_logback.xml -classpath [actual classpath] mypackage.MyOtherMainClass arg1 arg2
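To make the assembled command line concrete outside of sbt, here is a small dependency-free sketch. ForkCommandLine and build are hypothetical names, not part of sbt's API, and the sketch assumes the classpath string is the entries joined with the platform path separator, which is what the -classpath flag expects. sbt's own Fork.java handles locating the java executable and other details itself.

```scala
import java.io.File

// Hypothetical helper (not part of sbt) showing the argument list that the
// runMyOtherMain task ends up handing to the forked JVM.
object ForkCommandLine {
  def build(
      jvmArgs: Seq[String],
      classpath: Seq[File],
      mainClass: String,
      mainArgs: Seq[String]): Seq[String] = {
    // Entries are joined with the platform separator (":" on Unix, ";" on Windows).
    val classpathString = classpath.map(_.getPath).mkString(File.pathSeparator)
    // JVM flags come before the main class; program arguments come after it.
    Seq("java") ++ jvmArgs ++ Seq("-classpath", classpathString, mainClass) ++ mainArgs
  }
}
```

The ordering is the important part: anything placed after the main class is passed to your program, not to the JVM.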

Sweet!

Monday, February 23, 2015

Partially applied functions and implicit parameters

In trying to test an akka Actor talking over TCP (a whole difficult subject of its own), I wanted to inject a factory method for the TCP actor ref into my Actor. Specifically, I wanted to inject IO.apply. The signature looks like:

// Used like "IO(Tcp)", where Tcp is the object identifying the TCP IO extension.
def apply[T <: Extension](key: ExtensionId[T])(implicit system: ActorSystem): ActorRef

Ignoring the awkward type parameter, this looks like a method with two parameter lists - one normal, and one implicit. My thought was that we could just bind the explicit list, and let the implicit list get bound later. Note that I couldn't figure out how to create a partially-applied function that was 0-ary (that is, all parameters were bound), so I had to have some weird typing:

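import java.net.InetSocketAddress
import akka.actor.{ Actor, ActorRef, ActorSystem, Props }
import akka.io.{ IO, Tcp, TcpExt }
// Note that I hard-coded the type to Tcp for readability.
class MyActor(tcpFactory: Tcp.type => ActorRef) extends Actor {
  // To use (example address, for illustration only):
  tcpFactory(Tcp) ! Tcp.Connect(new InetSocketAddress("localhost", 8080))
  // Message handling elided.
  def receive = Actor.emptyBehavior
}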
object MyActor {
  // Factory method for the class, used outside of unit tests.
  // WARNING: Can you spot the error here?
  def props()(implicit actorSystem: ActorSystem): Props = {
    // Constrain the type on the apply method, and partially bind it.
    Props(new MyActor(IO.apply[TcpExt] _))
  }
}

After this, I compiled & tested the code - it worked! But I was left with an unsettled feeling about the implicit, and after thinking for a moment, it struck me: this potentially runs the factory method (IO.apply) in a different ActorSystem than the main actor itself! The reason, of course, is that the implicit is bound into the partially-applied function when you first call props(), not when the function is later called within the Actor. This should be obvious from the function's signature - Tcp.type => ActorRef doesn't have an ActorSystem anywhere in it!
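The capture is easy to reproduce without Akka. In the sketch below, Sys is a hypothetical stand-in for ActorSystem and Factory.make mimics the shape of IO.apply (one explicit list, one implicit list). The function built in early() keeps the Sys that was in implicit scope where it was written, even when it is later called from a scope holding a different implicit Sys. The post's IO.apply[TcpExt] _ picks up its ActorSystem the same way: from the scope of props().

```scala
// Hypothetical stand-in for ActorSystem, for illustration only.
final case class Sys(name: String)

object Factory {
  // Same shape as IO.apply: one explicit parameter list, one implicit list.
  def make(key: String)(implicit sys: Sys): String = s"$key@${sys.name}"
}

object CaptureDemo {
  // Plays the role of props(): an implicit Sys is in scope when the function is built.
  def early(): String => String = {
    implicit val sys: Sys = Sys("props-time")
    // The implicit argument is resolved here, at compile time, from this scope.
    (key: String) => Factory.make(key)
  }

  // Plays the role of the actor: a different implicit Sys is in scope at the call.
  def callSite(f: String => String): String = {
    implicit val sys: Sys = Sys("actor-time")
    // This local implicit is never consulted; f already captured "props-time".
    f("tcp")
  }
}
```

Calling CaptureDemo.callSite(CaptureDemo.early()) yields "tcp@props-time", not "tcp@actor-time".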

The solution would be to leave both the explicit and the implicit list open in the partially-applied function - but I couldn't find any documentation on how to bind an implicit list into a partially-applied function, or even how to declare one. From what I can tell, this is impossible. You can curry functions with multiple regular parameter lists, but you're out of luck for implicits. However, you can create a partially-applied function with a single argument list that has the implicit parameters tacked on as ordinary ones:

class MyActor(tcpFactory: (Tcp.type, ActorSystem) => ActorRef) extends Actor {
  // There's no implicit ActorSystem inside an Actor, so pass this actor's own explicitly.
  tcpFactory(Tcp, context.system) ! Tcp.Connect(new InetSocketAddress("localhost", 8080))
  def receive = Actor.emptyBehavior
}
object MyActor {
  // No more implicit parameters on the factory method!
  def props: Props = {
    // Looks like currying, confusingly.
    Props(new MyActor(IO.apply(_: Tcp.type)(_: ActorSystem)))
  }
}

Cleaning up a bit more:

// Partially bind the Tcp parameter now.
class MyActor(tcpFactory: ActorSystem => ActorRef) extends Actor {
  tcpFactory(context.system) ! Tcp.Connect(new InetSocketAddress("localhost", 8080))
  def receive = Actor.emptyBehavior
}
object MyActor {
  def props: Props = Props(new MyActor(IO(Tcp)(_: ActorSystem)))
}
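The two fixed shapes can be checked with the same kind of hypothetical stand-ins (Sys for ActorSystem, Factory.make for IO.apply; neither is Akka API). Placeholder syntax can fill an implicit parameter list explicitly, so the function value leaves the Sys open until it is called, which lets the actor supply its own ActorSystem at that point.

```scala
// Hypothetical stand-in for ActorSystem, for illustration only.
final case class Sys(name: String)

object Factory {
  // Same shape as IO.apply: one explicit parameter list, one implicit list.
  def make(key: String)(implicit sys: Sys): String = s"$key@${sys.name}"
}

object LateBinding {
  // Both lists left open: expands to (k, s) => Factory.make(k)(s).
  val twoArg: (String, Sys) => String = Factory.make(_: String)(_: Sys)
  // Explicit list bound now, implicit list left open: s => Factory.make("tcp")(s).
  val oneArg: Sys => String = Factory.make("tcp")(_: Sys)
}
```

No implicit Sys is in scope anywhere in LateBinding, which is the point: the caller must provide one.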

Wednesday, November 19, 2014

Handling CORS headers with spray-routing

One of those things that crops up every so often is the need to run a REST API on a different host or port than the client application that consumes it. This frequently arises during development, when you might have a client server on localhost:8080 talking to an API on localhost:8081, and you don't have fancy load-balancing to make them both use the same URL.

The problem here is that modern browsers restrict "cross-origin" requests; that is, they stop XHR / JavaScript on a web page from one origin (scheme, host, and port) from reading responses from another origin. This is a good thing - it keeps malicious pages from using your browser's credentials to read data from other sites, and makes the internet a better place.

Except for API developers, for whom it is a huge pain.

The symptom of this is an error in your javascript console like:

XMLHttpRequest cannot load http://localhost:8081/api. No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://localhost:8080' is therefore not allowed access.

. . . which results in a bunch of Googling. If you're lucky, you'll end up at the CORS specification page.

After a bit of hacking, the end result, in Spray, is frequently code like:

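import akka.actor.ActorSystem
import spray.http.HttpHeaders
import spray.routing.{ Route, SimpleRoutingApp }
object ApiServer extends App with SimpleRoutingApp {
  implicit val system = ActorSystem("api-server")
  // Placeholder for your real API routes.
  val myInnerRoute: Route = path("api") { get { complete("ok") } }
  // TODO: Fix this!!! It's totally insecure . . .
  val AccessControlAllowAll = HttpHeaders.RawHeader(
    "Access-Control-Allow-Origin", "*"
  )
  val AccessControlAllowHeadersAll = HttpHeaders.RawHeader(
    "Access-Control-Allow-Headers", "Origin, X-Requested-With, Content-Type, Accept"
  )
  startServer(interface = "0.0.0.0", port = 8081) {
    respondWithHeaders(AccessControlAllowAll, AccessControlAllowHeadersAll) {
      options {
        complete {
          ""
        }
      } ~
      myInnerRoute
    }
  }
}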

Note the comment - this code is completely open to cross-site attacks! If you're coding defensively, you might only return the headers in development mode - but that doesn't solve the problem if your production servers need to support cross-site requests.

Sadly, spray-routing doesn't offer much help for this - but it DOES offer a fairly easy way to define your own directive to handle the boilerplate correctly.

What we want is a set of allowed hosts that can use our API, and to echo back the request's Origin header if its host is in the allowed set. We could also require HTTPS for extra security, but a redirect could do that as well. The other key component (seen above) is a handler for the OPTIONS method: before many cross-origin requests, the browser sends an OPTIONS "preflight" request to find out whether the server will allow the real one.

This is an allowHosts directive that does the right thing:

import spray.http.{ HttpHeaders, HttpOrigin, SomeOrigins }
import spray.routing.Directive0
import spray.routing.Directives._

/** Directive providing CORS header support. This should be included in any application serving
  * a REST API that's queried cross-origin (from a different host than the one serving the API).
  * See http://www.w3.org/TR/cors/ for full specification.
  * @param allowedHostnames the set of hosts that are allowed to query the API. These should
  * not include the scheme or port; they're matched only against the hostname of the Origin
  * header.
  */
def allowHosts(allowedHostnames: Set[String]): Directive0 = mapInnerRoute { innerRoute =>
  // Responds with "allowed" CORS headers if the request's Origin host is in the allowed set.
  // Requests without an Origin header, or from other hosts, pass through without CORS headers.
  optionalHeaderValueByType[HttpHeaders.Origin]() { originOption =>
    // If Origin is set and the host is in our allowed set, add CORS headers and pass through.
    originOption flatMap {
      case HttpHeaders.Origin(list) => list.find {
        case HttpOrigin(_, HttpHeaders.Host(hostname, _)) => allowedHostnames.contains(hostname)
      }
    } map { goodOrigin =>
      respondWithHeaders(
        HttpHeaders.`Access-Control-Allow-Headers`(
          Seq("Origin", "X-Requested-With", "Content-Type", "Accept")),
        HttpHeaders.`Access-Control-Allow-Origin`(SomeOrigins(Seq(goodOrigin)))
      ) {
        options {
          complete {
            ""
          }
        } ~
        innerRoute
      }
    } getOrElse {
      // Else, pass through without headers.
      innerRoute
    }
  }
}
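The decision allowHosts makes can be pulled out of spray entirely. The sketch below is a hypothetical, framework-free version (CorsCheck and corsHeaders are illustrative names, not spray API): it takes the raw Origin header value, assumed to be a space-separated list of serialized origins such as http://localhost:8080, parses each one with java.net.URI, and returns the headers to add, or nothing if the request should pass through untouched.

```scala
import java.net.URI
import scala.util.Try

// Hypothetical, framework-free sketch of the allowHosts decision; not spray's API.
object CorsCheck {
  val AllowedHeaders = "Origin, X-Requested-With, Content-Type, Accept"

  // Extracts the hostname from a serialized origin; None if it can't be parsed.
  private def hostOf(origin: String): Option[String] =
    Try(new URI(origin)).toOption.flatMap(uri => Option(uri.getHost))

  /** Returns the CORS headers to add, or Nil if the request should pass through unchanged. */
  def corsHeaders(originHeader: Option[String], allowedHostnames: Set[String]): List[(String, String)] =
    originHeader
      .flatMap(_.trim.split("\\s+").find(origin => hostOf(origin).exists(allowedHostnames.contains)))
      .toList
      .flatMap { goodOrigin =>
        // Echo back the matching origin rather than "*".
        List(
          "Access-Control-Allow-Headers" -> AllowedHeaders,
          "Access-Control-Allow-Origin" -> goodOrigin
        )
      }
}
```

Keeping the check as a plain function makes it easy to unit test the allow/deny cases without starting a server.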

Tuesday, October 28, 2014

Getting request timeouts to work with spray-can

Request timeouts! This is such a basic feature of any RPC framework - some endpoints are just going to take a really long time compared to others, and you want to be able to wait for reallySlowRpc while still timing out quickly for shouldBeFastButSuperFlaky.

At AI2 we recently switched from dispatch to spray-can as our HTTP client. Dispatch was great in that it let you get started quickly, and it's backed by the excellent-if-generically-named async HTTP client, but it has these disadvantages:
  • backing library is Java, not Scala
  • no Actor support (implied by the above)
  • very limited feature set out of the box
  • documentation is example-based and sparse
Basically, dispatch is excellent if you need to get something running quickly, but very difficult to use for advanced applications.

spray-can, on the other hand, has great integration with spray-json (which we use for our RPC requests and responses), works well with actors, is pure Scala, and has moderately well-documented code.

The problem is that the actual APIs are often tricky to use. Very, very tricky to use - if the API hasn't been built to support a feature, it can take quite a lot of digging to unearth the right configuration parameter to give you what you need.

Which brings me back to request timeouts.