Sunday, February 21, 2010

Debugging PHP with NetBeans on Mac and Linux

As you probably know, I recently started working with Drupal, with lots of help from the Pro Drupal book. In the book (page 524), the author states:

Real-time debugging is a feature of PHP and not Drupal, but it's worth covering, since you can easily be identified as a Drupal ninja if a real-time debugger is running on your laptop.

In my relentless quest for Drupal ninjahood, I naturally wanted to get debugging working with NetBeans, my PHP editor of choice. Just kidding ... the need to get debugging working was driven by the relatively opaque nature of Drupal - the only two debugging features I am aware of are the watchdog logging feature, which I have used, and the Drupal Devel module, which I have not (yet). Also, like most Java programmers, I am probably more dependent on a good IDE than most PHP programmers.

In any case, this post talks about what needs to be done to get PHP debugging working on NetBeans on Linux (CentOS) and Mac (Snow Leopard). The information was gleaned from multiple posts, some of which provided inaccurate or incomplete information, so there was some amount of trial and error involved.

I will assume that you have NetBeans installed and you occasionally use it. I use NetBeans only for scripting, and so far, the lack of a debugger (or my lack of knowledge of how to run it) has not affected me. With Drupal, however, there is no real way of knowing which modules are getting called in a request, short of writing watchdog calls in your code, so having a debugger to step through the code can be quite helpful.

This part probably doesn't matter unless you are also doing Drupal development, but there appears to be a Goldilocks situation between Drupal 6.15 and PHP versions: only PHP 5.2 is just right. My CentOS 5.3 has PHP 5.1.6 in the default yum repository. Apparently that is too old for Drupal, so I initially enabled Remi's repository based on information in Binit Bhatia's post - but that now gives me PHP 5.3 (not surprising, since Binit's post is almost a year old now), which has even bigger problems with some of the modules. Ultimately, I ended up getting PHP 5.2 from the CentOS testing repository, following the guidelines in Irakli Nadareishvili's post.

On the Mac, I am using MAMP, which has PHP 5.2 installed as a component within MAMP (ie, under /Applications/MAMP/bin/php5/bin), even though at the OS level it has PHP 5.3 installed (ie, under /usr/bin).

So anyway, what you need to do is to install and configure XDebug to work with PHP and NetBeans. On CentOS, just download the source and build it as explained in the XDebug install page.

sujit@lysdexic:xdebug$ phpize
sujit@lysdexic:xdebug$ ./configure --enable-xdebug
sujit@lysdexic:xdebug$ make
sujit@lysdexic:xdebug$ sudo cp modules/xdebug.so /usr/lib64/httpd/modules/xdebug.so

On the Mac, compiling from source did not work, neither against the PHP 5.3 in /usr/bin nor the PHP 5.2 packaged with MAMP. It fails on make, and it looks like a bad #ifdef in the code somewhere. However, I was able to get it by downloading the PHP Remote Debugging package from the ActiveState Komodo Site and copying the xdebug.so file under the 5.2 directory to /Applications/MAMP/bin/php5/lib/extensions/no-debug-non-zts-20060613/ - most of this information came from Felix Geisendörfer's post.

The next step is to hook XDebug up to PHP and NetBeans. Thankfully, this is (almost) identical on both Mac and CentOS. Essentially, the following lines need to be added to the php.ini file (/etc/php.ini on CentOS and /Applications/MAMP/conf/php5/php.ini on Mac). The information below mostly comes from the NetBeans article "Debugging PHP Source Code in the NetBeans IDE", although some of it has been changed using information from other posts.

[xdebug]
zend_extension=/path/to/xdebug.so
xdebug.remote_enable=1
xdebug.remote_handler=dbgp
xdebug.remote_host=localhost
xdebug.remote_port=9000
xdebug.profiler_enable=1
xdebug.profiler_output_dir=/tmp

In addition, remove or comment out any other zend_* properties in the php.ini file (the zend_extension line that loads XDebug itself has to stay). Apparently XDebug and Zend don't go well together.

Restart your webserver and bring up a page with phpinfo() and verify that XDebug is available, or use php -m to list the modules as specified in the XDebug install page. If XDebug seems to be installed okay, the next step is to check if NetBeans can see XDebug.

To verify this, start NetBeans, open an existing PHP project (such as a Drupal application), then click the debug button. You should see a "Connecting to XDebug" message on the status bar which should change to "netbeans-xdebug" within a few seconds. At this point, you can debug a request by clicking a URL on your browser and stepping through the code, setting breakpoints, inspecting variables, etc on NetBeans.
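
If the status bar stays stuck on "Connecting to XDebug", one thing worth checking is whether anything is actually listening on the port that XDebug is told to connect back to. The following is a minimal sketch in Java, assuming the IDE is the side that listens on xdebug.remote_port (9000 in the php.ini fragment above) once a debug session is started; the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Checks whether anything accepts TCP connections on a given host and port. */
final class DebugPortCheck {

  /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
  static boolean isListening(String host, int port, int timeoutMillis) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), timeoutMillis);
      return true;
    } catch (IOException e) {
      // Connection refused or timed out: nothing reachable is listening there.
      return false;
    }
  }

  public static void main(String[] args) {
    // 9000 and localhost mirror xdebug.remote_port and xdebug.remote_host above.
    boolean up = isListening("localhost", 9000, 1000);
    System.out.println(up
        ? "Something is listening on localhost:9000"
        : "Nothing is listening on localhost:9000; start a debug session in the IDE first");
  }
}
```

A positive result only says that some process has the port open, not that it is NetBeans, so treat it as a first sanity check rather than proof that the setup is correct.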

Saturday, February 13, 2010

Handling Lucene Hits Deprecation in Application Code

I have mentioned earlier that I am refactoring our search layer to work with Lucene 2.9.1, up from our current version of Lucene 2.4.0. If you use Lucene, you know that 2.9 is the last release that preserves backward compatibility with earlier versions, so the goal is to remove all deprecation warnings, to give us a clean migration path to Lucene 3.0 (which is already out, BTW).

One of the classes that is going away is the Hits object, which used to be central to most search calls in our application. This post describes a prescriptive approach to replacing calls that return Hits with equivalent code that returns an array of ScoreDoc objects instead.

Our typical pattern for searching an index and extracting results goes something like this:

    Searcher searcher = ...;
    Query query = ...;
    Filter filter = ...;
    Sort sort = ...;
    Hits hits = searcher.search(query, filter, sort);
    int numHits = hits.length();
    for (int i = 0; i < numHits; i++) {
      float score = hits.score(i);
      if (score < cutoff) {
        break;
      }
      int docId = hits.id(i);
      Document doc = hits.doc(i);
      // do something with document
      ...
    }
    searcher.close();

The pattern recommended in the Hits Javadocs is to use a TopScoreDocCollector, which returns an array of ScoreDoc objects instead of the Hits object. However, that collector always ranks by score and cannot apply a custom Sort. I needed the score values (see snippet above), and I also needed to be able to sort the results using custom Sort objects, so I needed to use TopFieldCollector instead. For performance, TopFieldCollector does not populate the score values in the ScoreDoc object unless asked to, which is why trackDocScores is set to true in the code shown below.

    IndexSearcher searcher = ...;
    Query query = ...;
    Filter filter = ...;
    Sort sort = ...;
    int numHits = searcher.maxDoc(); // if not provided
    TopFieldCollector collector = TopFieldCollector.create(
      sort == null ? new Sort() : sort,
      numHits, 
      false,         // fillFields - not needed, we want score and doc only
      true,          // trackDocScores - need doc and score fields
      true,          // trackMaxScore - related to trackDocScores
      sort == null); // should docs be in docId order?
    searcher.search(query, filter, collector);
    TopDocs topDocs = collector.topDocs();
    ScoreDoc[] hits = topDocs.scoreDocs;
    for (int i = 0; i < hits.length; i++) {
      float score = hits[i].score;
      if (score < cutoff) {
        break;
      }
      int docId = hits[i].doc;
      Document doc = searcher.doc(docId);
      // do something with document
      ...
    }
    searcher.close();

Before this, I tried the approach recommended in the Javadocs for Searcher.search(Query,Filter,Sort), which is to change the call to Searcher.search(Query,Filter,int,Sort); this returns a TopDocs object instead of Hits. This worked fine for Lucene 2.4, but with Lucene 2.9, it returns NaN scores. This is because, in 2.9, a sorted search() does not track document scores by default, and hence never populates the ScoreDoc.score value.

I figured this stuff out by poking around in the Lucene source code. My only concern at that point was that TopFieldCollector is marked as Experimental in the Javadocs, so I figured that there had to be a better way. However, I stumbled upon the Lucene Change Log (which in retrospect should have been the first place to look), which mentions the identical pattern, so I figure that it's relatively safe to use.

One more thing to be aware of, especially if you've been using scores as we have, is that the score normalization that used to happen on Hits is now gone - the ScoreDoc.score field contains the raw unnormalized score. You can read more about why it's a bad idea to rely on normalized scores in LUCENE-954, and more importantly, how the normalization was done, in case you need to backport the behavior into the new approach.
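
For anyone who does need the old behavior, here is a minimal sketch of the normalization, based on my reading of how the 2.4 Hits class did it: scores are scaled by 1/maxScore, but only when the maximum score is greater than 1.0. The class and method names are hypothetical, and the scores are passed in as a plain float array so the sketch has no Lucene dependency.

```java
/** Sketch of Hits-style score normalization applied to raw scores. */
final class ScoreNormalizer {

  /**
   * Scales scores by 1/maxScore when maxScore exceeds 1.0, so the top score
   * becomes 1.0. Scores are returned unchanged when maxScore is 1.0 or less.
   */
  static float[] normalize(float[] rawScores, float maxScore) {
    float norm = (rawScores.length > 0 && maxScore > 1.0f) ? 1.0f / maxScore : 1.0f;
    float[] normalized = new float[rawScores.length];
    for (int i = 0; i < rawScores.length; i++) {
      normalized[i] = rawScores[i] * norm;
    }
    return normalized;
  }

  public static void main(String[] args) {
    float[] raw = {4.0f, 2.0f, 1.0f};
    for (float score : normalize(raw, 4.0f)) {
      System.out.println(score);
    }
  }
}
```

In the TopFieldCollector pattern above, the maximum score would presumably come from topDocs.getMaxScore(), which is only meaningful when trackMaxScore is true.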

Lucene 2.9 has been out for the last 4 or so months, so presumably there are plenty of (okay, some) people who have been down this route, and they have probably implemented solutions different from the one above. If so, I would appreciate hearing from you about your solution, and about any obvious holes you see in mine. On the other hand, if you are contemplating getting rid of Hits in your code, I hope the post has been useful.

Update: 2010-04-04: One thing I found out the hard way (production searches taking a loooong time) is that, by default, searcher.search(Query,Filter,Sort) returns the first 100 (or fewer, if there are fewer) Hit objects. So when your searcher code doesn't know how many results it wants, don't use searcher.maxDoc(); use 100.
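
The advice in the update can be captured in a small helper that picks the collector size. This is only a sketch: the HitCount class, the resolve() method and the clamping rules are assumptions made for illustration, with 100 as the fallback because that is the figure mentioned above.

```java
/** Picks how many hits to ask the collector for. */
final class HitCount {

  /** Fallback when the caller does not say how many results it wants. */
  static final int DEFAULT_NUM_HITS = 100;

  /**
   * Returns the requested count, or DEFAULT_NUM_HITS when none was given,
   * clamped to the range [1, maxDoc].
   */
  static int resolve(Integer requested, int maxDoc) {
    int wanted = (requested == null || requested <= 0) ? DEFAULT_NUM_HITS : requested;
    // Never ask for more than the index holds, and never for fewer than one.
    return Math.max(1, Math.min(wanted, maxDoc));
  }

  public static void main(String[] args) {
    System.out.println(resolve(null, 1000000));
    System.out.println(resolve(25, 1000000));
  }
}
```

The value returned by resolve() would then take the place of searcher.maxDoc() as the numHits argument to TopFieldCollector.create().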

Sunday, February 07, 2010

Apache XML-RPC Service with Spring 3

Last week I described a little Drupal module that sends an XML-RPC request to a remote server every time a node is inserted, updated or deleted. This week, I describe a simple XML-RPC server, built using Apache XML-RPC and Spring 3.0, that receives that request and does something with it.

Although Apache XML-RPC seems to be quite a mature product, the documentation is quite sparse, and seems to be geared towards client-side stuff and standalone servers (the server documentation page mentions the WebServer and the ServletWebServer, both of which run as daemons).

This is kind of surprising, since I would imagine that most people who want to run an XML-RPC server would just naturally embed an XML-RPC service inside their favorite web container, rather than run a standalone server. That way they can leverage the features of the container, like pooling, instead of having to build it in themselves. Additionally, in my case, the web application would be written using Spring 3, because I am trying to learn it.

There is a brief mention about using the XmlRpcServer on the server documentation page, and my initial attempt was to build something around that. I did figure out how to do that, but I ended up instantiating quite a few supporting library classes, so I figured there had to be a simpler way, so I started looking around. Luckily, I found three resources which proved quite helpful.

Dejan Bosanac describes how to wrap a Spring Adapter around the WebServer class. While interesting, the server docs discourage this practice as something that won't scale. Essentially, you end up instantiating a daemon within a controller, which is started up and shut down with the container via the controller's lifecycle methods. In a later article, he describes how to use an XmlRpcServer embedded inside a Spring Controller. I came upon this after I had written my own version, and my version was much more complicated - I suspect it's because the API has changed quite a bit between the time the post was written and now. Finally, I came across Perry Nguyen's post, which uses the XmlRpcServletServer embedded in a Spring Controller - which effectively allowed me to cut down the controller code (minus the configuration) to a single line.

So really, if you are looking for how to embed Apache XML-RPC inside Spring, the three resources referenced above give you all the information required. The main value I add in this post is to show how to do this with Spring 3. From my point of view, however, this is something I did not know how to do before, so it's been quite interesting to figure out.

The way the Apache XML-RPC server (and possibly other XML-RPC servers) works is quite cool. The XML-RPC request contains a methodName parameter which holds the name of the handler and the method to be executed on the server, something like ${handlerName}.${methodName}. A methodName like "publisher.update" would point to a handler named "publisher" which has a method update() that takes the parameters contained in the params element of the XML-RPC request. The return value from the update() method is passed back to the caller as an XML-RPC response.
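
The dispatch described above can be sketched with plain Java reflection. This illustrates the naming convention only and is not Apache XML-RPC's actual code: the MethodNameDispatch class, the Publisher stand-in and the dispatch() helper are hypothetical names, and a real server also converts and type-checks the parameters.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Resolves "handlerName.methodName" to a method call on a registered handler. */
final class MethodNameDispatch {

  /** Stand-in handler, used only for this illustration. */
  public static final class Publisher {
    public int update(String id, String title, String body) {
      return 0;
    }
  }

  static Object dispatch(Map<String, Object> handlers, String methodName, List<?> params)
      throws Exception {
    int dot = methodName.lastIndexOf('.');
    if (dot < 0) {
      throw new IllegalArgumentException("Expected handler.method, got: " + methodName);
    }
    String handlerName = methodName.substring(0, dot);
    String method = methodName.substring(dot + 1);
    Object handler = handlers.get(handlerName);
    if (handler == null) {
      throw new IllegalArgumentException("No such handler: " + handlerName);
    }
    // Pick the first public method with a matching name and parameter count.
    for (Method m : handler.getClass().getMethods()) {
      if (m.getName().equals(method) && m.getParameterTypes().length == params.size()) {
        return m.invoke(handler, params.toArray());
      }
    }
    throw new NoSuchMethodException(methodName);
  }

  public static void main(String[] args) throws Exception {
    Map<String, Object> handlers = new HashMap<String, Object>();
    handlers.put("publisher", new Publisher());
    Object result = dispatch(handlers, "publisher.update", Arrays.asList("42", "title", "body"));
    System.out.println(result);
  }
}
```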

Since I was using Spring 3, and being relatively new to it, I decided to use Annotations and Autowiring to the max, mainly to see how far I could take it. So my myapp-servlet.xml file contains the following snippet, which tells Spring to use annotation based configuration and to automatically scan for components starting from the root of the application.

<?xml version="1.0" encoding="UTF-8"?>
<!--
Source: src/main/webapps/WEB-INF/myapp-servlet.xml
-->
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:p="http://www.springframework.org/schema/p"
    xmlns:context="http://www.springframework.org/schema/context"
    xsi:schemaLocation="
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
        http://www.springframework.org/schema/context
        http://www.springframework.org/schema/context/spring-context-3.0.xsd">

  <context:annotation-config/>
  <context:component-scan base-package="com.mycompany.myapp"/>

</beans>

As expected, the controller is annotated with @Controller. It is effectively an InitializingBean, so we have our init method annotated with @PostConstruct - this is where the server is constructed using the properties injected into it. The request handling method is serve, which is annotated with @RequestMapping, which also specifies what the URL is and that it will only accept POST requests.

My Controller uses auto-wiring for setting variables. The default configuration needs no XML configuration at all, but if you wanted to override some of the configuration, then you would need to specify it in XML. Here is the code for the controller.

// Source: src/main/java/com/mycompany/myapp/controllers/XmlRpcServerController.java
package com.mycompany.myapp.controllers;

import java.util.Map;

import javax.annotation.PostConstruct;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.xmlrpc.XmlRpcException;
import org.apache.xmlrpc.server.XmlRpcErrorLogger;
import org.apache.xmlrpc.server.XmlRpcServerConfigImpl;
import org.apache.xmlrpc.webserver.XmlRpcServletServer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;

import com.mycompany.myapp.handlers.IHandler;
import com.mycompany.myapp.xmlrpc.SpringHandlerMapping;
import com.mycompany.myapp.xmlrpc.SpringRequestProcessorFactoryFactory;

@Controller
public class XmlRpcServerController {

  @Autowired(required=false) private String encoding = 
    XmlRpcServerConfigImpl.UTF8_ENCODING;
  @Autowired(required=false) private boolean enabledForExceptions = true;
  @Autowired(required=false) private boolean enabledForExtensions = true;
  @Autowired(required=false) private int maxThreads = 1;
  @Autowired private Map<String,IHandler> handlers;
  
  private XmlRpcServletServer server;
  
  @PostConstruct
  protected void init() throws Exception {
    XmlRpcServerConfigImpl config = new XmlRpcServerConfigImpl();
    config.setBasicEncoding(encoding);
    config.setEnabledForExceptions(enabledForExceptions);
    config.setEnabledForExtensions(enabledForExtensions);
    
    server = new XmlRpcServletServer();
    server.setConfig(config);
    server.setErrorLogger(new XmlRpcErrorLogger());
    server.setMaxThreads(maxThreads);

//    PropertyHandlerMapping handlerMapping = new PropertyHandlerMapping();
//    for (String key : handlers.keySet()) {
//      handlerMapping.addHandler(key, handlers.get(key).getClass());
//    }
    SpringHandlerMapping handlerMapping = new SpringHandlerMapping();
    handlerMapping.setRequestProcessorFactoryFactory(
      new SpringRequestProcessorFactoryFactory());
    handlerMapping.setHandlerMappings(handlers);

    server.setHandlerMapping(handlerMapping);
  }
  
  @RequestMapping(value="/xmlrpc.do", method=RequestMethod.POST)
  public void serve(HttpServletRequest request, HttpServletResponse response) 
      throws XmlRpcException {
    try {
      server.execute(request, response);
    } catch (Exception e) {
      throw new XmlRpcException(e.getMessage(), e);
    }
  }
}

The only required field is the Map of handler names to handler beans - we achieve auto-wiring here by making all our handlers implement a marker interface, IHandler.

// Source: src/main/java/com/mycompany/myapp/handlers/IHandler.java
package com.mycompany.myapp.handlers;

public interface IHandler {}

So the auto-wiring code can now ask for a Map of all IHandler objects keyed by their bean names. The default naming strategy is to lowercase the first letter of the class name, such that something like com.mycompany.myapp.PublishHandler is converted to publishHandler. However, since our XML-RPC methodName is "publisher.update", we really want to call this handler "publisher" in our handler mappings. Using the @Service annotation's value override allows us to satisfy our Java naming conventions as well as the conventions dictated by XML-RPC.
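
The naming rule can be sketched in a few lines of plain Java. As far as I know, Spring's default bean name generator decapitalizes the short class name in the same way as java.beans.Introspector.decapitalize, but treat that as an assumption; the BeanNames class and its methods are made up for this illustration.

```java
import java.beans.Introspector;

/** Mimics how a bean name is chosen for an annotated component. */
final class BeanNames {

  /** Default-style name: the short class name with its first letter lowercased. */
  static String defaultBeanName(String fullyQualifiedClassName) {
    String shortName =
        fullyQualifiedClassName.substring(fullyQualifiedClassName.lastIndexOf('.') + 1);
    return Introspector.decapitalize(shortName);
  }

  /** An explicit value, as in @Service("publisher"), wins over the default name. */
  static String beanName(String explicitValue, String fullyQualifiedClassName) {
    return (explicitValue != null && !explicitValue.isEmpty())
        ? explicitValue
        : defaultBeanName(fullyQualifiedClassName);
  }

  public static void main(String[] args) {
    String className = "com.mycompany.myapp.handlers.PublishHandler";
    System.out.println(beanName(null, className));
    System.out.println(beanName("publisher", className));
  }
}
```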

The PublishHandler methods insert(), update() and delete() correspond to the second part of the methodName in the XML-RPC request. At the moment, all they do is write out their parameters so you know they are being called.

// Source: src/main/java/com/mycompany/myapp/handlers/PublishHandler.java
package com.mycompany.myapp.handlers;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.springframework.stereotype.Service;

@Service("publisher")
public class PublishHandler implements IHandler {

  private final Log log = LogFactory.getLog(getClass());
  
  public int insert(String id, String title, String body) {
    log.info("inserting [" + id + "," + title + "," + body + "]");
    return 0;
  }
  
  public int update(String id, String title, String body) {
    log.info("updating [" + id + "," + title + "," + body + "]");
    return 0;
  }
  
  public int delete(String id, String title, String body) {
    log.info("deleting [" + id + "," + title + "," + body + "]");
    return 0;
  }
}

My Spring controller is called from Drupal via a trigger every time a node is inserted, updated or deleted. The Drupal action sends it an HTTP POST request to a URL like http://localhost:8081/myapp/xmlrpc.do. So predictably, when called from Drupal, the log message on the server side dumps out the contents of the request, and returns a 0 back to Drupal.

The nice thing about using Apache XML-RPC is that, after the initial learning hump, using it is quite simple. The initial learning curve can be made less steep by better documentation, perhaps an example of using Apache XML-RPC as a service embedded in a third party server component.

Update: 2010-02-09

Turns out I was a bit premature assuming that I had figured this stuff out. The code above (prior to the change that I am making now) still works, as long as all your handler does is print cute little messages to the logfile. However, generally you would want your handlers to do something more substantial than that, and for that you would probably set it up in your Spring configuration with some relatively heavyweight resource objects, most likely in your @PostConstruct method. The problem is that, by default, Apache XML-RPC will instantiate a new handler on every request - this means that your @PostConstruct does not get a chance to run (since it is run by Spring on startup) and you start getting NullPointerExceptions all over the place.
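
The failure mode is easy to reproduce without Spring or Apache XML-RPC. In the sketch below, all names are hypothetical: Handler.init() stands in for a @PostConstruct method, perRequestFactory() for the default behavior of creating a fresh handler on every request, and sharedFactory() for handing back an instance that has already been initialized.

```java
import java.util.function.Supplier;

/** Contrasts per-request handler creation with reuse of an initialized handler. */
final class HandlerLifecycleDemo {

  /** Illustrative handler whose resource is only set up by init(). */
  static final class Handler {
    private String resource; // stays null until init() runs

    void init() {
      resource = "ready"; // stands in for heavyweight setup work
    }

    int resourceLength() {
      return resource.length(); // throws NullPointerException if init() never ran
    }
  }

  /** Creates a brand new, uninitialized handler on every call. */
  static Supplier<Handler> perRequestFactory() {
    return Handler::new;
  }

  /** Always hands back the same, already initialized handler. */
  static Supplier<Handler> sharedFactory(Handler initialized) {
    return () -> initialized;
  }

  public static void main(String[] args) {
    Handler bean = new Handler();
    bean.init();
    System.out.println(sharedFactory(bean).get().resourceLength());
    try {
      perRequestFactory().get().resourceLength();
    } catch (NullPointerException e) {
      System.out.println("fresh handler was never initialized");
    }
  }
}
```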

Perry Nguyen touches on that in his post, but I guess I did not understand it well enough at that point to care. Tomas Salfischberger posts a more detailed solution, which I adapted for my needs. Essentially, I needed to replace the PropertyHandlerMapping with a custom HandlerMapping, and set a custom RequestProcessorFactoryFactory into this custom HandlerMapping, which forces Apache XML-RPC to delegate the handling to a pre-configured Spring bean instead of a newly instantiated bean.

First, the custom RequestProcessorFactoryFactory. This component is called from within a HandlerMapping to get back a RequestProcessor object (our IHandler) via two levels of indirection. Since Apache XML-RPC looks up a RequestProcessor by Class, we make an internal Map of IHandler class to IHandler bean reference. It extends the StatelessProcessorFactoryFactory which is recommended for "heavyweight" handlers. Here is the code for my SpringRequestProcessorFactoryFactory.

// Source: src/main/java/com/mycompany/myapp/xmlrpc/SpringRequestProcessorFactoryFactory.java
package com.mycompany.myapp.xmlrpc;

import java.util.HashMap;
import java.util.Map;

import org.apache.xmlrpc.XmlRpcException;
import org.apache.xmlrpc.XmlRpcRequest;
import org.apache.xmlrpc.server.RequestProcessorFactoryFactory;
import org.apache.xmlrpc.server.RequestProcessorFactoryFactory.StatelessProcessorFactoryFactory;

import com.mycompany.myapp.handlers.IHandler;

public class SpringRequestProcessorFactoryFactory 
    extends StatelessProcessorFactoryFactory
    implements RequestProcessorFactoryFactory {

  Map<Class<? extends IHandler>,IHandler> classHandlerMappings;
  
  protected void init(Map<String,IHandler> handlerMappings) {
    classHandlerMappings = new HashMap<Class<? extends IHandler>, IHandler>();
    for (String key : handlerMappings.keySet()) {
      IHandler handler = handlerMappings.get(key);
      Class<? extends IHandler> clazz = handler.getClass();
      classHandlerMappings.put(clazz, handler);
    }
  }
  
  public RequestProcessorFactory getRequestProcessorFactory(
      Class clazz) throws XmlRpcException {
    final IHandler handler = classHandlerMappings.get(clazz);
    return new RequestProcessorFactory() {
      public Object getRequestProcessor(XmlRpcRequest pRequest)
          throws XmlRpcException {
        return handler;
      }
    };
  }
}

Next, we create our custom HandlerMapping. As before, the trick is to know what extension point to use, and I just follow Tomas's lead on this one. This component sets up a mapping of the public methods of the IHandler beans. Because the RequestProcessorFactoryFactory needs to be set into the HandlerMapping object before the handler mappings themselves, I baked this requirement into the setHandlerMappings method of my custom subclass. Here it is:

// Source: src/main/java/com/mycompany/myapp/xmlrpc/SpringHandlerMapping.java
package com.mycompany.myapp.xmlrpc;

import java.util.Map;

import org.apache.xmlrpc.XmlRpcException;
import org.apache.xmlrpc.server.AbstractReflectiveHandlerMapping;
import org.springframework.util.Assert;

import com.mycompany.myapp.handlers.IHandler;

public class SpringHandlerMapping extends AbstractReflectiveHandlerMapping {

  public void setHandlerMappings(Map<String,IHandler> handlerMappings) 
      throws XmlRpcException {
    SpringRequestProcessorFactoryFactory ff = 
      (SpringRequestProcessorFactoryFactory) getRequestProcessorFactoryFactory();
    Assert.notNull(ff, "RequestProcessorFactoryFactory must be set");
    ff.init(handlerMappings);
    for (String serviceName : handlerMappings.keySet()) {
      IHandler serviceBean = handlerMappings.get(serviceName);
      registerPublicMethods(serviceName, serviceBean.getClass());
    }
  }
}
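
To make the registration step more concrete, here is a rough sketch of what registering the public methods of a handler class amounts to: each eligible method ends up under a key of the form serviceName.methodName. This is my approximation in plain Java, not the library's implementation; the PublicMethodRegistry class, the Publisher stand-in and the filtering rules (skip static methods and those declared on Object) are assumptions.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

/** Builds "serviceName.methodName" keys for the public methods of a handler class. */
final class PublicMethodRegistry {

  /** Stand-in handler, used only for this illustration. */
  public static final class Publisher {
    public int insert(String id, String title, String body) { return 0; }
    public int update(String id, String title, String body) { return 0; }
    public int delete(String id, String title, String body) { return 0; }
  }

  static Map<String, Method> register(String serviceName, Class<?> type) {
    Map<String, Method> registry = new TreeMap<String, Method>();
    for (Method m : type.getMethods()) { // getMethods() returns public methods only
      if (m.getDeclaringClass() == Object.class) {
        continue; // skip toString(), hashCode() and friends
      }
      if (Modifier.isStatic(m.getModifiers())) {
        continue; // only instance methods can be called on the handler bean
      }
      registry.put(serviceName + "." + m.getName(), m);
    }
    return registry;
  }

  public static void main(String[] args) {
    for (String key : register("publisher", Publisher.class).keySet()) {
      System.out.println(key);
    }
  }
}
```

Note that this sketch keeps only one method per name, so overloaded handler methods would overwrite each other in the map.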

At this point, all we need to do is replace the PropertyHandlerMapping instantiation and setup in the controller with our custom HandlerMapping object. The controller code above has already been updated to do that (the old code that it replaces has been commented out).