Howard,

Thanks for looking at all this. Adding System.gc() did not cause it to
segfault. The segfault still comes much later in the processing.
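
A minimal sketch of that placement, kept separate from the attached file. It assumes the Open MPI Java bindings expose a static mpi.MPI.Init(String[]) and a static mpi.MPI.Finalize(), and it looks them up by reflection only so that the sketch also compiles and runs on a machine without the bindings. The class name GcAfterInit and the helper initAndCollect are made up for illustration.

```java
import java.lang.reflect.Method;

public class GcAfterInit {
  /**
   * Calls mpi.MPI.Init(args) if the bindings can be loaded, then forces a
   * garbage collection. Returns true only if MPI.Init completed.
   */
  static boolean initAndCollect(String[] args) {
    boolean initialized = false;
    try {
      // Assumption: the bindings live in class mpi.MPI, as in the attached test
      Class<?> mpi = Class.forName("mpi.MPI");
      Method init = mpi.getMethod("Init", String[].class);
      init.invoke(null, (Object) args);
      initialized = true;
    } catch (ClassNotFoundException e) {
      System.out.println("mpi.MPI not on the classpath; skipping MPI.Init");
    } catch (ReflectiveOperationException | LinkageError e) {
      System.out.println("MPI.Init failed: " + e);
    }
    // The collection under test: if the JVM state were already corrupted at
    // this point, a crash here would be the expected symptom.
    System.gc();
    return initialized;
  }

  public static void main(String[] args) throws Exception {
    boolean initialized = initAndCollect(args);
    System.out.println("MPI initialized: " + initialized + ", System.gc() returned");
    if (initialized) {
      Class.forName("mpi.MPI").getMethod("Finalize").invoke(null);
    }
  }
}
```

Since System.gc() is only a request to the JVM, a clean return here does not rule out a problem that surfaces in a later collection.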

I was able to reduce my code to a single test file without other
dependencies. It is attached. This code simply opens a text file and reads
its lines, one by one. Once finished, it closes and opens the same file and
reads the lines again. On my system, it does this about 4 times until the
segfault fires. Obviously this code makes no sense, but it's based on our
actual code that reads millions of lines of data and does various
processing to it.
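
As a point of comparison, the same read loop can be sketched without MPI at all. This is not the attached test: the class name RereadControl, the helper rereadLines, and the pass count of 4 are assumptions for illustration, and it uses BufferedReader.readLine() rather than a Scanner. If a control like this runs cleanly while the MPI version crashes, that points at the MPI.Init call rather than the file handling.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;

public class RereadControl {
  /** Reads the file `passes` times and returns the total number of lines seen. */
  static long rereadLines(Path file, int passes) throws IOException {
    long total = 0;
    for (int pass = 0; pass < passes; pass++) {
      // Reopen the file on every pass; try-with-resources closes it again
      try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
        String line;
        while ((line = reader.readLine()) != null) {
          // Limit of 2 keeps index 0 valid even for a line that is only a tab
          String firstField = line.split("\t", 2)[0];
          total++;
        }
      }
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    boolean temporary = args.length == 0;
    Path file;
    if (temporary) {
      // No path given: build a file of identical tab-separated lines
      file = Files.createTempFile("tweets", ".txt");
      Files.write(file, Collections.nCopies(1000, "some text\tmore text"), StandardCharsets.UTF_8);
    } else {
      file = Paths.get(args[0]);
    }
    System.out.println("Lines read: " + rereadLines(file, 4));
    if (temporary) {
      Files.delete(file);
    }
  }
}
```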

Attached is a tweets.tgz file that you can uncompress to have an input
directory. The text file is just the same line over and over again. Run it
as:

java MPITestBroke tweets/


Nate





On Wed, Aug 5, 2015 at 8:29 AM, Howard Pritchard <hpprit...@gmail.com>
wrote:

> Hi Nate,
>
> Sorry for the delay in getting back.  Thanks for the sanity check.  You
> may have a point about the args string passed to MPI.Init:
> Open MPI doesn't need anything from it, but that is one difference
> in your use case, since your app takes an argument.
>
> Would you mind adding a
>
> System.gc()
>
> call immediately after the MPI.Init call and see if the gc blows up with a
> segfault?
>
> Also, it may be interesting to add -verbose:jni to your command line.
>
> We'll do some experiments here with the init string arg.
>
> Is your app open source where we could download it and try to reproduce
> the problem locally?
>
> thanks,
>
> Howard
>
>
> 2015-08-04 18:52 GMT-06:00 Nate Chambers <ncham...@usna.edu>:
>
>> Sanity checks pass. Both Hello.java and Ring.java run correctly and
>> produce the expected output.
>>
>> Does MPI.Init(args) expect anything from those command-line args?
>>
>>
>> Nate
>>
>>
>> On Tue, Aug 4, 2015 at 12:26 PM, Howard Pritchard <hpprit...@gmail.com>
>> wrote:
>>
>>> Hello Nate,
>>>
>>> As a sanity check of your installation, could you try compiling the
>>> examples/*.java codes with the mpijavac you've installed and check that
>>> they run correctly?
>>> I'm really only interested in Hello.java and Ring.java.
>>>
>>> Howard
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> 2015-08-04 14:34 GMT-06:00 Nate Chambers <ncham...@usna.edu>:
>>>
>>>> Sure, I reran configure with CC=gcc and then make install. I think
>>>> that's the proper way to do it. Attached is my config log. The behavior
>>>> when running our code appears to be the same. The output is the same error
>>>> I pasted in my earlier email. It occurs when our code calls MPI.Init().
>>>>
>>>> I'm not great at debugging this sort of stuff, but happy to try things
>>>> out if you need me to.
>>>>
>>>> Nate
>>>>
>>>>
>>>> On Tue, Aug 4, 2015 at 5:09 AM, Howard Pritchard <hpprit...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello Nate,
>>>>>
>>>>> As a first step to addressing this, could you please try using gcc
>>>>> rather than the Intel compilers to build Open MPI?
>>>>>
>>>>> We've been doing a lot of work recently on the Java bindings, but
>>>>> have never tried any compiler other than gcc when working with them.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Howard
>>>>>
>>>>>
>>>>> 2015-08-03 17:36 GMT-06:00 Nate Chambers <ncham...@usna.edu>:
>>>>>
>>>>>> We've been struggling with this error for a while, so hoping someone
>>>>>> more knowledgeable can help!
>>>>>>
>>>>>> Our Java MPI code exits with a segfault during its normal operation, *but
>>>>>> the segfault occurs before our code ever uses MPI functionality like
>>>>>> sending/receiving.* We've removed all message calls and any use of
>>>>>> MPI.COMM_WORLD from the code. The segfault occurs if we call
>>>>>> MPI.Init(args) in our code, and does not if we comment that line out.
>>>>>> Further vexing us, the crash doesn't happen at the point of the MPI.Init
>>>>>> call, but later on in the program. I don't have an easy-to-run example
>>>>>> here because our non-MPI code is so large and complicated. We have run
>>>>>> simpler test programs with MPI and the segfault does not occur.
>>>>>>
>>>>>> We have isolated the line where the segfault occurs. However, if we
>>>>>> comment that out, the program runs longer but then segfaults later on
>>>>>> in the code, at a seemingly arbitrary but reproducible point. Does
>>>>>> anyone have tips on how to debug this? We have tried several flags with
>>>>>> mpirun, but found no good clues.
>>>>>>
>>>>>> We have also tried several Open MPI versions, including stable 1.8.7 and
>>>>>> the most recent 1.8.8rc1.
>>>>>>
>>>>>>
>>>>>> ATTACHED
>>>>>> - config.log from installation
>>>>>> - output from `ompi_info -all`
>>>>>>
>>>>>>
>>>>>> OUTPUT FROM RUNNING
>>>>>>
>>>>>> > mpirun -np 2 java -mx4g FeaturizeDay datadir/ days.txt
>>>>>> ...
>>>>>> some normal output from our code
>>>>>> ...
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun noticed that process rank 0 with PID 29646 on node r9n69
>>>>>> exited on signal 11 (Segmentation fault).
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>> Link to this post:
>>>>>> http://www.open-mpi.org/community/lists/users/2015/08/27386.php
>>>>>>
>>>>>
>>>>
>>>
>>
>
import java.util.LinkedList;
import java.util.List;
import java.util.Scanner;
import java.util.zip.GZIPInputStream;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

import mpi.MPI;

/**
 * Test class to reproduce the Open MPI segfault.
 * 
 * This class opens a text file and reads its lines, over and over. It doesn't do anything with it, just
 * reads lines into memory.
 * 
 * You need to give it a directory on the command-line, and it looks for the text file in there. Each line in 
 * the text file is expected to have some text, a tab character, and more text.
 * 
 * MPITestBroke <dir>
 * 
 */
public class MPITestBroke {
  String tweetDirectory;
  Scanner rawScanner = null;
  
  int id;
  int numProcesses;

  List<String> allFiles;


  /*****
   * MPITestBroke expects one arg, the path to a text directory.
   * @param args Command-line args
   */
  public MPITestBroke(String[] args) {
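    // Single-process defaults; setupMPI() overwrites them if MPI starts.
    // Without numProcesses = 1, runPMI() divides by zero when MPI is unavailable.
    id = 0;
    numProcesses = 1;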
    tweetDirectory = args[0];

    // Tries to get MPI going
    setupMPI(args);

    // Gets all the files in the tweet directory
    allFiles = new LinkedList<String>();
    for( String name : getFiles(tweetDirectory) )
      allFiles.add(name);
  }

  private void runPMI() {
    int i = 0;
    for(String file : allFiles ) {
      // If this day should be featurized according to the process number
      if(id == (i % numProcesses)){
        featurize(file);
        System.out.println("Rank " + id + " of " + numProcesses + " finished day " + file);
      }
      else System.out.println("Rank " + id + " of " + numProcesses + " skipped day " + file);
      i++;
    }
    System.out.println("Rank " + id + " of " + numProcesses + " leaving runPMI()");
  }

  /******
   * This method just reads lines from the file forever. It splits them on a tab character.
   * @param file of text, one per line with some tab characters
   */
  private void featurize(String file) {
    String tweet = getNextRawTweet(file);
    // Check for null before splitting so a failed read cannot throw a NullPointerException
    while(tweet != null){
      tweet = tweet.split("\t")[0];
      tweet = getNextRawTweet(file);
    }
  }

  private void setupMPI(String[] args) {
    try {
      MPI.Init(args); 
      System.gc();
      MPI.COMM_WORLD.setErrhandler(MPI.ERRORS_RETURN);
      id = MPI.COMM_WORLD.getRank();
      numProcesses = MPI.COMM_WORLD.getSize();
    }
    catch(Exception e){
      System.out.println("MPI failed to initiate. Assuming normal processing.");
      e.printStackTrace();
    }
    catch(UnsatisfiedLinkError ule){
      System.out.println("Not using MPI");
    }
  }

  /**
   * Read a directory and return all files.
   */
  public static List<String> getFiles(String dirPath) {
    return getFiles(new File(dirPath));
  }
  public static List<String> getFiles(File dir) {
    if( dir.isDirectory() ) {
      List<String> files = new LinkedList<String>();
      for( String file : dir.list() ) {
        if( !file.startsWith(".") )
          files.add(file);
      }
      return files;
    }

    // Not a directory: return an empty list so callers can iterate safely
    return new LinkedList<String>();
  }

  /**
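   * Returns the next raw tweet line from the file, reopening the file once
   * its lines are exhausted.
   * 
   * @param file name of the file inside the tweet directory
   * @return the next line, or null if reading the line throws an exception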
   */
  public String getNextRawTweet(String file) {
    if( rawScanner != null && rawScanner.hasNext() ) {
      try {
        String line = rawScanner.next();
        if( line != null )
          return line;
        else {
          rawScanner.close();
          System.out.println("Closed file.");
        }
      } catch (Exception ex) {
        ex.printStackTrace();
      }
    }
    // Open the next file
    else {
      openFile(tweetDirectory + File.separator + file);
      return getNextRawTweet(file);      
    }

    return null;
  }

  private boolean openFile(String path) {
    System.out.println("\nopening file " + path);
    try {
      BufferedReader rawReader;
      if( rawScanner != null )
        rawScanner.close();
      // Zipped
      if( path.endsWith(".gz") ) {
        InputStream in = new GZIPInputStream(new FileInputStream(new File(path)));
        rawReader = new BufferedReader(new InputStreamReader(in));
      }
      // Non-zipped
      else {
        rawReader = new BufferedReader(new FileReader(path));
      }
      rawScanner = new Scanner(rawReader).useDelimiter("\n");
    } catch( IOException ex ) {
      System.err.println("Error opening file: " + path);
      ex.printStackTrace();
      return false;
    }
    return true;
  }

  /**
   * The Main takes one argument: path to a directory that contains a text file
   */
  public static void main(String[] args) {
    // One argument expected (the tweet directory), matching the usage message below
    if (args.length == 1){
      // Create the Featurize day object
      MPITestBroke test = new MPITestBroke(args);

      //Get the PMIs
      test.runPMI();

      try {
        System.out.println("Finalizing...");
        MPI.Finalize();
      } catch( Exception ex ) {
        ex.printStackTrace();
      }
    }
    // Hits here if the arguments are wrong
    else 
      System.out.println("Argument format needed: path/To/TweetDir");
  }

}

Attachment: tweets.tgz
Description: GNU Zip compressed data
