Roadmap for next release

2010-01-28 Thread Ganesh
Hello all,

Could you please share the roadmap for the next release? This information
would really help us plan our own product roadmap for this year.

Are the following features planned for this year?

1. Reducing the memory consumed by sorting, by caching it or offloading it
to disk.
2. If not all records take part in sorting, is there a way to build a
custom field cache array based on some filter criteria?

Regards
Ganesh

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



combine query score with external score

2010-01-28 Thread Dennis Hendriksen
Hi,

I'm struggling to create a performant query in Lucene 3.0.0 in which I
want to combine 'regular' scoring with scores derived from external
sources.

For each document a fixed set of scores is calculated in the range [0.0,
1.0>. These scores represent the confidences that a document falls into
categories. So for example document #1 has a score of 0.3 for cat=boys,
0.2 for cat=girls, 0.1 for cat=toys, 0.05 for cat=animals.

The 'regular' scoring is calculated using a BooleanQuery with TermQuerys
similar to: -type:H +(title:dna body:dna^1.5)

In the current naive approach I'm combining the scores as follows:
- for each document, store the three best categories in the following
fields:
name=cat1st value=boys fieldboost=0.3
name=cat2nd value=girls fieldboost=0.2
name=cat3rd value=toys fieldboost=0.1
At search time, use the following query if you're interested in 'girls':
-type:H +(title:dna body:dna^1.5) cat1st:girls cat2nd:girls cat3rd:girls
or if you're interested in 'boys':
-type:H +(title:dna body:dna^1.5) cat1st:boys cat2nd:boys cat3rd:boys

Disadvantages of the current approach:
- loss of precision encoding/decoding boosts (performance is important,
so this might be acceptable)
- using TermQuery for the cat fields doesn't make a lot of sense since
the external scores are multiplied by the idf of 'boys'/'girls' and the
querynorm
- the resulting score from the cat field is added to the other query
score instead of multiplied

Just to give you an idea: the index I'm using is growing in time and
contains about 50 million documents

Do you have an idea how I can improve my query and still keep high
performance? 
Or should I combine the scores in the Collector (but this doesn't seem
the right place to retrieve the category scores from the index)?
Is it possible to use a different float->byte encoder per field to
reduce the loss of precision?

Thanks for your time,
Dennis
  





Lucene full text search

2010-01-28 Thread Lutischán Ferenc

Hi,

I have a problem with Lucene.
I indexed an English phrase list with Lucene:
    doc.add(new Field("r1", r1.toLowerCase(), Field.Store.NO,
            Field.Index.ANALYZED));


I searched for the word 'arabic':

    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
    QueryParser parser = new QueryParser(Version.LUCENE_CURRENT,
            this.searchedField, analyzer);

    Query query = parser.parse(searchedStr);
    TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);

    this.memDict.isearcher.search(query, collector);
    foundCnt = collector.getTotalHits();
    System.out.println(searchedStr + ":" + foundCnt);

    // Iterate through the results:
    ScoreDoc[] hits = collector.topDocs().scoreDocs;
    for (int i = 0; i < hits.length; i++) {
        Document hitDoc = this.memDict.isearcher.doc(hits[i].doc);
        System.out.println("\"r1\"=" + hitDoc.get("r1"));
    }

The result list is:
arabic
arabic numerals
gum arabic

But this is not in the result list:
mozarabic

How can I use Lucene to find all the words that contain 'arabic'?

Regards,
Ferenc


Re: Lucene full text search

2010-01-28 Thread Erick Erickson
Well, there are a couple of approaches:


1> enable leading wildcards and search for *arabic*. You
 probably don't want to do this, it's really, really expensive.
2> use the ngram (edgengram?) tokenizers. This'll cost
 you some index space, but that may be acceptable.
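
To make option 2 a bit more concrete, here is a minimal plain-Java sketch of
why indexing character n-grams lets a substring query match inside a longer
word. It is not Lucene's n-gram tokenizer; the class and method names are
made up for illustration, and it assumes the query has at least n characters.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; a real index would use Lucene's n-gram tokenizers.
final class NGramSketch {

    // Returns every character n-gram of the given word, in order.
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // A word is a candidate match if it contains every n-gram of the query.
    // This is a necessary condition for a substring match, not a sufficient
    // one: a real engine would also check that the grams are adjacent.
    static boolean mayContain(String indexedWord, String query, int n) {
        Set<String> indexed = new HashSet<>(ngrams(indexedWord, n));
        return indexed.containsAll(ngrams(query, n));
    }

    public static void main(String[] args) {
        System.out.println(ngrams("arabic", 3));
        System.out.println(mayContain("mozarabic", "arabic", 3));
        System.out.println(mayContain("numerals", "arabic", 3));
    }
}
```

The extra index space mentioned above comes from storing every gram of every
word instead of one term per word.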

HTH
Erick

2010/1/28 Lutischán Ferenc 

> Hi,
>
> I have a problem with Lucene:
> I'm indexed an english phrase list with Lucene:
>doc.add(new Field("r1", r1.toLowerCase(), Field.Store.NO,
> Field.Index.ANALYZED));
>
> I searched for the word 'arabic':
>
>Analyzer analyzer = new
> StandardAnalyzer(Version.LUCENE_CURRENT);
>QueryParser parser = new QueryParser(Version.LUCENE_CURRENT,
> this.searchedField, analyzer);
>Query query = parser.parse(searchedStr);
>TopScoreDocCollector collector = TopScoreDocCollector.create(10,
> true);
>this.memDict.isearcher.search(query, collector);
>foundCnt=collector.getTotalHits();
>System.out.println(searchedStr + ":" + foundCnt);
>
>// Iterate through the results:
>ScoreDoc[] hits = collector.topDocs().scoreDocs;
>for (int i = 0; i < hits.length; i++) {
>Document hitDoc = this.memDict.isearcher.doc(hits[i].doc);
>System.out.println("\"r1\"=" + hitDoc.get("r1"));
>}
>
> The result list is:
> *arabic
> **arabic* numerals
> gum *arabic
> *
> But is not in the result list:
> moz*arabic*
>
> How to use Lucene to find all the words contains 'arabic'?
>
> Regards,
>Ferenc
>


Search a PhraseQuery on multiple terms with the same position

2010-01-28 Thread Karsten F.

Hi,

I have a problem with checkedRepeats in SloppyPhraseScorer.
This feature is for phrases like "1st word 2nd word".
Without this feature the result would be the same as for "1st word 2nd".
OK.

But I have an index with more than one token at the same position.
The German sentence "Die käuflichen Reihenhäuser standen am Waldrand" is
tokenized in the index as
"die käuflichen|kaufen reihenhäuser|reihe|haus standen|stehen am
waldrand|wald|rand"
where e.g. all three terms "reihenhäuser|reihe|haus" have the same position.
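
As a small illustration of what "the same position" means here, the sketch
below turns position increments into absolute positions for the tail of the
sentence above. It is plain Java, not Lucene code; the class name is made up,
and the increments are my assumption of what such an analyzer would emit
(1 for a new word, 0 for a variant stacked on the previous token).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of how position increments become positions.
final class PositionSketch {

    // The running position starts at -1, so a first increment of 1
    // puts the first token at position 0.
    static Map<String, Integer> positions(String[] terms, int[] increments) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int position = -1;
        for (int i = 0; i < terms.length; i++) {
            position += increments[i];
            result.put(terms[i], position);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] terms = {"standen", "stehen", "am", "waldrand", "wald", "rand"};
        int[] increments = {1, 0, 1, 1, 0, 0};
        System.out.println(positions(terms, increments));
    }
}
```

With these increments "wald" and "rand" end up at the same position, whereas
a plain phrase such as "wald rand" expects the second term one position after
the first.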

My problem:
I need a hit for the phrase "reihe haus", but I don't get it, because of the
checkedRepeats feature in SloppyPhraseScorer.
Any ideas how to deal with this problem?

Best regards
  Karsten

P.S. a source code example to show the problem:
/
package org.apache.lucene.search;

import java.io.IOException;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

/**
 * @see SloppyPhraseScorer
 */
public class TestPhraseWithoutPosIncrementQuery
{
    // Emits six tokens at two positions: t00|t01|t02 share the first
    // position, t10|t11|t12 share the second.
    public static class MyTokenStream extends TokenStream
    {
        TermAttribute              termAtt;
        PositionIncrementAttribute posIncrAtt;

        int[]    posInc = new int[] { 1, 0, 0, 1, 0, 0 };
        String[] terms  = new String[] { "t00", "t01", "t02", "t10", "t11", "t12" };
        int      pos    = 0;

        public MyTokenStream()
        {
            termAtt = (TermAttribute) addAttribute(TermAttribute.class);
            posIncrAtt = (PositionIncrementAttribute)
                    addAttribute(PositionIncrementAttribute.class);
        }

        public boolean incrementToken() throws IOException
        {
            if (pos < terms.length)
            {
                termAtt.setTermBuffer(terms[pos]);
                posIncrAtt.setPositionIncrement(posInc[pos]);
                pos++;
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) throws Exception
    {
        // Index a single document whose only field comes from MyTokenStream.
        Directory ramDirectory = new RAMDirectory();
        IndexWriter indexWriter = new IndexWriter(ramDirectory,
                new StandardAnalyzer());
        Document testDocument = new org.apache.lucene.document.Document();
        Field f = new Field("field", new MyTokenStream());
        testDocument.add(f);
        indexWriter.addDocument(testDocument);
        indexWriter.commit();
        indexWriter.close();

        IndexReader iR = IndexReader.open(ramDirectory);
        IndexSearcher indexSearcher = new IndexSearcher(iR);

        // First token of each position group.
        PhraseQuery query = new PhraseQuery();
        query.add(new Term("field", "t00"), 0);
        query.add(new Term("field", "t10"), 1);
        Hits hits = indexSearcher.search(query);
        System.out.println(query.toString() + ": " + hits.length());
        // field:"t00 t10": 1

        // Second token of each position group.
        query = new PhraseQuery();
        query.add(new Term("field", "t01"), 0);
        query.add(new Term("field", "t11"), 1);
        hits = indexSearcher.search(query);
        System.out.println(query.toString() + ": " + hits.length());
        // field:"t01 t11": 1

        // Two tokens that share one position in the index.
        query = new PhraseQuery();
        query.add(new Term("field", "t00"), 0);
        query.add(new Term("field", "t01"), 1);
        hits = indexSearcher.search(query);
        System.out.println(query.toString() + ": " + hits.length());
        // field:"t00 t01": 0
    }
}
 
-- 
View this message in context: 
http://old.nabble.com/Search-a-PhraseQuery-one-multiple-terms-with-the-same-position-tp27356784p27356784.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.





Re: Average Precision - TREC-3

2010-01-28 Thread Grant Ingersoll

On Jan 27, 2010, at 1:36 PM, Ivan Provalov wrote:

> Robert, Grant:
> 
> Thank you for your replies.  
> 
> Our goal is to fine-tune our existing system to perform better on relevance.

What kind of documents do you have?  Are they very similar to the TREC docs 
(i.e. news articles)?  There can be a fairly wide difference in performance 
between real docs and TREC docs, especially given real queries.  Doing well at 
TREC does not necessarily equate to doing well in your own system.  You might 
be better off just doing something like taking the top 50 queries from your 
logs plus some random ones from the tail and judging the top 10.   See 
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Debugging-Relevance-Issues-Search


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search





Highlighter / cannot be instantiated

2010-01-28 Thread Marc Schwarz
I'm trying to get the highlighter running, but I can't get it to work.

Everywhere it's posted as follows:

Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter(), 
new QueryScorer(query));

but that gives me a
"Highlighter is abstract; cannot be instantiated".

I'm using version 2.9 of the highlighter and version 2.9 of Lucene.

Any ideas ? Thanks :-)

Greetings,
Marc







Re: Average Precision - TREC-3

2010-01-28 Thread Robert Muir
in addition to what Grant said, even if your documents are similar, what
about queries?

For example, if only a few trec queries contain proper names, acronyms,
abbreviations, or whatever, but your users frequently input things like
this, it won't be representative.

I will disagree with him on a few things though: I would rather have fewer
queries (25 or so) but more judgements, definitely a lot more than 10.
Maybe your users only care about the top-10 results, but it's crucial to judge
some lower-ranking docs too, especially if you have recall problems...

On Thu, Jan 28, 2010 at 9:34 AM, Grant Ingersoll wrote:

>
> On Jan 27, 2010, at 1:36 PM, Ivan Provalov wrote:
>
> > Robert, Grant:
> >
> > Thank you for your replies.
> >
> > Our goal is to fine-tune our existing system to perform better on
> relevance.
>
> What kind of documents do you have?  Are they very similar to the TREC docs
> (i.e. news articles)?  There can be a fairly wide difference in performance
> between real docs and TREC docs, especially given real queries.  Doing well
> at TREC does not necessarily equate to doing well in your own system.  You
> might be better off just doing something like taking the top 50 queries from
> your logs plus some random ones from the tail and judging the top 10.   See
> http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Debugging-Relevance-Issues-Search
>
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem using Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
>
>


-- 
Robert Muir
rcm...@gmail.com


How to get matched terms

2010-01-28 Thread Vaijanath Rao
Hi All,

What is the simplest way of getting the terms of a query that matched a
given document? For example, say a document has field X whose content is
"a b c", and I search for 'b c'. The document is returned, and I want to
get back the query terms that this document matched. Can someone tell me
the easiest way to accomplish this?

--Thanks and Regards
Vaijanath

-- 
I am feeling fine, healthier and Happier, what about you


Re: How to get matched terms

2010-01-28 Thread Benjamin Heilbrunn
You could use Query.extractTerms(..) and then search for possible
matches in the field's term vector (requires stored term vectors).
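
The core of that idea is a set intersection. Here is a minimal plain-Java
sketch of just that step; the class and method names are made up, and plain
strings stand in for what you would read from Query.extractTerms(..) and from
the stored term vector.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration: which query terms also occur in the document?
final class MatchedTermsSketch {

    // Keeps the query terms that occur in the document, in query order.
    static Set<String> matchedTerms(Set<String> queryTerms, Set<String> documentTerms) {
        Set<String> matched = new LinkedHashSet<>(queryTerms);
        matched.retainAll(documentTerms);
        return matched;
    }

    public static void main(String[] args) {
        Set<String> queryTerms = new LinkedHashSet<>(Arrays.asList("b", "c", "d"));
        Set<String> documentTerms = new HashSet<>(Arrays.asList("a", "b", "c"));
        System.out.println(matchedTerms(queryTerms, documentTerms));
    }
}
```

Note that this only tells you which terms are present in the field; it does
not know about positions, so it cannot tell whether a phrase matched as a
phrase.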

2010/1/28 Vaijanath Rao :
> Hi All,
>
> What is the simplest way of getting the matched terms of the query with
> respect to the document. So for example let's say a document has field X and
> the contains of the field are "a b c" now when I do a search for 'b c'. The
> document will be returned I want to get back the terms that this document
> matched with the query terms. Can someone tell me the easiest way to
> accomplish this.
>
> --Thanks and Regards
> Vaijanath
>
> --
> I am feeling fine, healthier and Happier, what about you
>




Re: Average Precision - TREC-3

2010-01-28 Thread Grant Ingersoll

On Jan 28, 2010, at 11:00 AM, Robert Muir wrote:

> in addition to what Grant said, even if your documents are similar, what
> about queries?
> 
> For example, if only a few trec queries contain proper names, acronyms,
> abbreviations, or whatever, but your users frequently input things like
> this, it won't be representative.

+1

> 
> i will disagree with him on a few things though, I would rather have less
> queries (25 or so), but more judgements, definitely a lot more than 10.
> Maybe your users only care about the top-10 results but its crucial to judge
> some lower-ranking docs too, especially if you have recall problems...

Perfectly reasonable as well.  I've seen some people who only care about P@5 
and even P@1 and others who do much more.  The important thing is to think 
about what makes sense for your application and users.  Much of this can be 
found through basic log analysis (assuming an existing system) or some 
reasoning about use cases (new system) and users (how sophisticated they are, 
etc.)  
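
For reference, precision at a cutoff k (for example the top 5 or the top 1)
is simply the fraction of the first k results that were judged relevant. A
minimal plain-Java sketch, with a made-up class name and made-up judgments:

```java
// Hypothetical illustration of precision at a rank cutoff k.
final class PrecisionAtK {

    // relevant[i] is true if the result at rank i + 1 was judged relevant.
    // Ranks beyond the end of the result list count as non-relevant.
    static double precisionAt(boolean[] relevant, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        int hits = 0;
        for (int i = 0; i < k && i < relevant.length; i++) {
            if (relevant[i]) {
                hits++;
            }
        }
        return (double) hits / k;
    }

    public static void main(String[] args) {
        boolean[] judged = {true, false, true, false, false};
        System.out.println(precisionAt(judged, 1));
        System.out.println(precisionAt(judged, 5));
    }
}
```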

-Grant





Re: Average Precision - TREC-3

2010-01-28 Thread Robert Muir
right, but the problem is when something is currently ranked as doc 20 but
should be in the top 1, 5, or 10, and you aren't seeing it.

so I think if you are judging top-N docs from an existing system, you should
look a little farther ahead than the top-N you care about.
I think you should also index your data a few different ways and judge those
top-N too, for example, use n-gram tokenizer.

It doesn't have to be crazy like a formal trec-like pooling process, but I
think you need to introduce enough variation that you have judgements for
docs that should be ranked higher than they currently are.


> Perfectly reasonable as well.  I've seen some people who only care about
> P@5 and even P@1 and others who do much more.  The important thing is to
> think about what makes sense for your application and users.  Much of this
> can be found through basic log analysis (assuming an existing system) or
> some reasoning about use cases (new system) and users (how sophisticated
> they are, etc.)
>
> -Grant
>
>
>

>
>



-- 
Robert Muir
rcm...@gmail.com


lucene search

2010-01-28 Thread andy green

hello,

I wrote some Lucene code to handle search on my site. The articles that are
indexed are stored in a database, and I search with "lucene.queryparser" on
the field "code" of various objects (a "code" is a word of 3 to 6
characters).

My problem is that when I search, I have to enter the exact "code" to get a
result. For example, if the database contains an object whose "code" is
"lpg", typing "lp" in my search box returns nothing; I must enter the
entire code, "lpg".

In addition, my search does not work with digits or characters such as "_".

What can I do? I think the problem may be due to the analyzer I chose. (I
tried "SimpleAnalyzer" and "StandardAnalyzer".)


Thank you for your help!

-- 
View this message in context: 
http://old.nabble.com/lucene-search-tp27358766p27358766.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.





RE: combine query score with external score

2010-01-28 Thread Steven A Rowe
Hi Dennis,

You should check out payloads (arbitrary per-index-term byte[] arrays), which 
can be used to encode values which are then incorporated into documents' 
scores, by overriding Similarity.scorePayload():



The Lucene in Action 2 MEAP has a nice introduction to using payloads to 
influence scoring, in section 6.5.
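
To give a rough idea of the data flow, here is a plain-Java sketch of storing
a category confidence as a 4-byte payload and multiplying it into a text
score. It does not use the Lucene API; the class and method names are made
up, and how the decoded value is really folded into the final score depends
on your Similarity and on the payload query you use.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration: a float confidence carried as a 4-byte payload.
final class PayloadScoreSketch {

    // Encodes the confidence as 4 bytes (ByteBuffer's default big-endian order).
    static byte[] encode(float confidence) {
        return ByteBuffer.allocate(4).putFloat(confidence).array();
    }

    // Decodes a payload produced by encode().
    static float decode(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    // Multiplies the text score by the stored confidence, which is the
    // combination asked for in the original question.
    static float combine(float textScore, byte[] payload) {
        return textScore * decode(payload);
    }

    public static void main(String[] args) {
        byte[] girls = encode(0.2f);
        System.out.println(decode(girls));
        System.out.println(combine(2.0f, girls));
    }
}
```

A full 4-byte float avoids the precision loss of the single-byte boost
encoding, at the cost of more index space per category term.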

See also this (slightly out-of-date*) blog post "Getting Started with Payloads" 
by Grant Ingersoll at Lucid Imagination:



*Note that since this blog post was written, BoostingTermQuery was renamed to 
PayloadTermQuery (in Lucene 2.9.0+ ; see 
http://issues.apache.org/jira/browse/LUCENE-1827 ; wow - this issue isn't 
mentioned in CHANGES.txt???):



Steve

On 01/28/2010 at 6:01 AM, Dennis Hendriksen wrote:
> I'm struggling to create a performant query in Lucene 3.0.0 in which I
> want to combine 'regular' scoring with scores derived from external
> sources.
> 
> For each document a fixed set of scores is calculated in the range [0.0,
> 1.0>. These scores represent the confidences that a document falls into
> categories. So for example document #1 has a score of 0.3 for cat=boys,
> 0.2 for cat=girls, 0.1 for cat=toys, 0.05 for cat=animals.
> 
> The 'regular' scoring is calculated using a BooleanQuery with TermQuerys
> similar to: -type:H +(title:dna body:dna^1.5)
> 
> In the current naive approach I'm combining the scores as following: -
> for each document store the three best categories in the following
> fields:
> name=cat1st value=boys fieldboost=0.3
> name=cat2nd value=girls fieldboost=0.2
> name=cat3rd value=toys fieldboost=0.1
> Search-time use the following query if you're interested in 'girls':
> -type:H +(title:dna body:dna^1.5) cat1st:girls cat2nd:girls cat3rd:girls 
> or if you're interested in 'boys': 
> -type:H +(title:dna body:dna^1.5) cat1st:boys cat2nd:boys cat3rd:boys
> 
> Disadvantages of the current approach:
> - loss of precision encoding/decoding boosts (performance is important,
> so this might be acceptable)
> - using TermQuery for the cat fields doesn't make a lot of sense since
> the external scores are multiplied by the idf of 'boys'/'girls' and
> the querynorm
> - the resulting score from the cat field is added to the other query
> score instead of multiplied
> 
> Just to give you an idea: the index I'm using is growing in time and
> contains about 50 million documents
> 
> Do you have an idea how I can improve my query and still keep high
> performance? Or should I combine the scores in the Collector (but this
> doesn't seem the right place to retrieve the category scores from the
> index)? Is it possible to use a different float->byte encoder per field
> to reduce the lack of precision?
> 
> Thanks for your time,
> Dennis




Re: lucene search

2010-01-28 Thread Shashi Kant
Hi, if you want to search by substring (i.e. "lp" should return "lpg"
as a result) you should look at wildcards.
So a search for "lp*" (* is the wildcard character) would return lpg,
lpghxyz, lp12345 and so on...
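
A trailing wildcard is cheap because the terms of a field are kept in sorted
order, so all terms sharing a prefix sit next to each other. The plain-Java
sketch below shows that idea with a TreeSet standing in for the term
dictionary; it is not Lucene code and the names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of a prefix scan over sorted terms.
final class PrefixSketch {

    // Jumps to the first term >= prefix and reads until the prefix no longer matches.
    static List<String> withPrefix(TreeSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(
                Arrays.asList("abc", "lpg", "lpghxyz", "lp12345", "lq"));
        System.out.println(withPrefix(terms, "lp"));
    }
}
```

A leading wildcard cannot use this trick, because terms sharing a suffix are
scattered all over the sorted list.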



On Thu, Jan 28, 2010 at 1:41 PM, andy green  wrote:
>
> hello,
>
> I programmed with Lucene code to handle the search on my site ... the
> articles indexed are those stored in a database, then I do a search with
> "lucene.queryparser" on the field "code" of various objects (a "code" is a
> word of 3 6-character) ...
>
> My problem is the fact that when I search, I am obliged to insert exactly
> the real "code" to get a result. For example if in the database, I have an
> object whose "code" is "lpg" , by typing "lp" in my textbox (for searching),
> I get nothing ... I must enter the real entire code  ... "lpg"
>
> In addition my research does not react with "figures" or characters such as
> "_"
>
> How can I do? I think that the problem may be due to "analyzer" I chose? (I
> tried to use "SimpleAnalyser" or "StandardAnalyser)
>
>
> Thank you for your help!
>
> --
> View this message in context: 
> http://old.nabble.com/lucene-search-tp27358766p27358766.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
>
>




Re: lucene search

2010-01-28 Thread Erick Erickson
The issue with non-letter characters is, indeed, the analyzer. Have
a look at all the different subclasses of Analyzer in the javadocs. Getting
a copy of Luke will show you exactly what gets into your index, but
KeywordAnalyzer and WhitespaceAnalyzer may work for you (but they
don't normalize the case, you'll have to do that yourself).

And you can easily construct your own analyzer by stringing
together tokenizers and filters.

And don't forget PerFieldAnalyzerWrapper if you want different
analyzers for different fields.
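
To see why the choice of analyzer matters for codes with digits or "_", here
is a plain-Java sketch of two tokenizing strategies. It does not use the
Lucene classes; the first method is only similar in spirit to a letter-based
analyzer, the second to whitespace splitting plus manual lowercasing, and all
names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of two tokenizing strategies.
final class TokenizerSketch {

    // Splits on anything that is not a letter and lowercases the rest,
    // so digits and underscores never reach the index.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Splits on whitespace only and normalizes case by hand.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(letterTokens("LP_42 abc"));
        System.out.println(whitespaceTokens("LP_42 abc"));
    }
}
```

Whatever you choose, the same analysis has to be applied at index time and at
query time, otherwise the query terms will not line up with the indexed ones.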

HTH
Erick

On Thu, Jan 28, 2010 at 1:41 PM, andy green  wrote:

>
> hello,
>
> I programmed with Lucene code to handle the search on my site ... the
> articles indexed are those stored in a database, then I do a search with
> "lucene.queryparser" on the field "code" of various objects (a "code" is a
> word of 3 6-character) ...
>
> My problem is the fact that when I search, I am obliged to insert exactly
> the real "code" to get a result. For example if in the database, I have an
> object whose "code" is "lpg" , by typing "lp" in my textbox (for
> searching),
> I get nothing ... I must enter the real entire code  ... "lpg"
>
> In addition my research does not react with "figures" or characters such as
> "_"
>
> How can I do? I think that the problem may be due to "analyzer" I chose? (I
> tried to use "SimpleAnalyser" or "StandardAnalyser)
>
>
> Thank you for your help!
>
> --
> View this message in context:
> http://old.nabble.com/lucene-search-tp27358766p27358766.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
>
>


index a database

2010-01-28 Thread luciusvorenus

Hello 

I tried to index a database

""
import org.apache.lucene.demo.FileDocument;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import java.sql.*;
import java.util.Properties;
import org.apache.lucene.document.*;
import java.io.*;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Con{
public static void main(String[] args) throws Exception

  {
final File INDEX_DIR = new File("index");


   Class.forName("com.mysql.jdbc.Driver").newInstance();
   Connection conn =
DriverManager.getConnection("jdbc:mysql://127.0.0.1/test", "root",
"passwort");
   StandardAnalyzer analyzer = new StandardAnalyzer(null);
   IndexWriter writer = new IndexWriter(INDEX_DIR, analyzer, 
true);
   System.out.println("Indexing to directory '" + INDEX_DIR + 
"'...");
   indexDocs(writer, conn);
   writer.optimize();
   writer.close();



}
  

private static  void indexDocs(IndexWriter writer, Connection conn) throws
Exception {
  String sql = "select c_id, city from city";
  Statement stmt = conn.createStatement();
  ResultSet rs = stmt.executeQuery(sql);
  while (rs.next()) {
 Document d = new Document();
 d.add(new Field("c_id", rs.getString("c_id"), Field.Store.YES,
Field.Index.NO));
 d.add(new Field("city", rs.getString("city"), Field.Store.NO,
Field.Index.ANALYZED));
 
 writer.addDocument(d);
 }

}
}
 """
and I get this message:

""
symbol  : constructor
IndexWriter(java.io.File,org.apache.lucene.analysis.standard.StandardAnalyzer,boolean)
location: class org.apache.lucene.index.IndexWriter
   IndexWriter writer = new IndexWriter(INDEX_DIR, analyzer, 
true);
^
1 error

What am I doing wrong?

I'm a newbie ...

Thank you
-- 
View this message in context: 
http://old.nabble.com/index-a-database-tp27358959p27358959.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.





Re: index a database

2010-01-28 Thread Erick Erickson
What version are you using? Because there's no such constructor
(i.e. one that takes a File) in 3.0.

You might want to use something like FSDirectory.open(file) in
your IndexWriter constructor

If this doesn't work, more details please

Erick

On Thu, Jan 28, 2010 at 3:30 PM, luciusvorenus wrote:

>
> Hello
>
> I tried to index a database
>
> ""
> import org.apache.lucene.demo.FileDocument;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.queryParser.QueryParser;
> import org.apache.lucene.store.FSDirectory;
> import org.apache.lucene.util.Version;
> import java.sql.*;
> import java.util.Properties;
> import org.apache.lucene.document.*;
> import java.io.*;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.FSDirectory;
>
> public class Con{
>public static void main(String[] args) throws Exception
>
>  {
>final File INDEX_DIR = new File("index");
>
>
>   Class.forName("com.mysql.jdbc.Driver").newInstance();
>   Connection conn =
> DriverManager.getConnection("jdbc:mysql://127.0.0.1/test", "root",
> "passwort");
>   StandardAnalyzer analyzer = new StandardAnalyzer(null);
>   IndexWriter writer = new IndexWriter(INDEX_DIR, analyzer,
> true);
>   System.out.println("Indexing to directory '" + INDEX_DIR
> + "'...");
>   indexDocs(writer, conn);
>   writer.optimize();
>   writer.close();
>
>
>
>}
>
>
> private static  void indexDocs(IndexWriter writer, Connection conn) throws
> Exception {
>  String sql = "select c_id, city from city";
>  Statement stmt = conn.createStatement();
>  ResultSet rs = stmt.executeQuery(sql);
>  while (rs.next()) {
> Document d = new Document();
> d.add(new Field("c_id", rs.getString("c_id"), Field.Store.YES,
> Field.Index.NO));
> d.add(new Field("city", rs.getString("city"), Field.Store.NO,
> Field.Index.ANALYZED));
>
> writer.addDocument(d);
> }
>
> }
> }
>  """
> and i get this message
>
> ""
> symbol  : constructor
>
> IndexWriter(java.io.File,org.apache.lucene.analysis.standard.StandardAnalyzer,boolean)
> location: class org.apache.lucene.index.IndexWriter
>   IndexWriter writer = new IndexWriter(INDEX_DIR, analyzer,
> true);
>^
> 1 error
>
> What i am doing wrong??
>
> I'm a newbie ...
>
> Thank U
> --
> View this message in context:
> http://old.nabble.com/index-a-database-tp27358959p27358959.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
>
>


Re: Average Precision - TREC-3

2010-01-28 Thread Ivan Provalov
Great reference, Grant!  Thank you!

Our content is very similar to TREC-3 (periodicals).  In fact, there is some 
content overlap between our content and TREC's (actual documents).  

The query types are very similar (ad hoc).  The cost of extracting our top 
queries is that we would have to also perform the judgments ourselves.  This 
could be a very time consuming process.

Thank you,

Ivan



--- On Thu, 1/28/10, Grant Ingersoll  wrote:

> From: Grant Ingersoll 
> Subject: Re: Average Precision - TREC-3
> To: java-user@lucene.apache.org
> Date: Thursday, January 28, 2010, 9:34 AM
> 
> On Jan 27, 2010, at 1:36 PM, Ivan Provalov wrote:
> 
> > Robert, Grant:
> > 
> > Thank you for your replies.  
> > 
> > Our goal is to fine-tune our existing system to
> perform better on relevance.
> 
> What kind of documents do you have?  Are they very
> similar to the TREC docs (i.e. news articles)?  There
> can be a fairly wide difference in performance between real
> docs and TREC docs, especially given real queries. 
> Doing well at TREC does not necessarily equate to doing well
> in your own system.  You might be better off just doing
> something like taking the top 50 queries from your logs plus
> some random ones from the tail and judging the top
> 10.   See 
> http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Debugging-Relevance-Issues-Search
> 
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem using Solr/Lucene: 
> http://www.lucidimagination.com/search
> 
> 
> 
> 







Re: Average Precision - TREC-3

2010-01-28 Thread Ivan Provalov
Great points, Robert!  

I agree, we have a lot of fine tuning ahead of us.  

I think we probably have achieved the baseline with our MAP of 0.14.  We should 
move on to stage two and apply some of the suggestions to improve the overall 
scores.
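
For anyone following along, this is how I understand the MAP number: average
precision for one query is the mean of the precision values at each rank
where a relevant document appears, divided by the total number of relevant
documents for that query, and MAP is the mean over all queries. A minimal
plain-Java sketch with made-up names and made-up judgments:

```java
// Hypothetical illustration of average precision and its mean over queries.
final class AveragePrecisionSketch {

    // relevant[i] is true if the result at rank i + 1 was judged relevant;
    // totalRelevant is the number of relevant documents known for the query.
    static double averagePrecision(boolean[] relevant, int totalRelevant) {
        if (totalRelevant <= 0) {
            return 0.0;
        }
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < relevant.length; i++) {
            if (relevant[i]) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        return sum / totalRelevant;
    }

    // Mean of the per-query average precision values.
    static double meanAveragePrecision(double[] perQuery) {
        if (perQuery.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double ap : perQuery) {
            sum += ap;
        }
        return sum / perQuery.length;
    }

    public static void main(String[] args) {
        boolean[] ranked = {true, false, true};
        System.out.println(averagePrecision(ranked, 2));
        System.out.println(meanAveragePrecision(new double[] {0.5, 1.0}));
    }
}
```

Relevant documents that are never retrieved still count in the denominator,
which is why recall problems pull the number down.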

These are just the first steps.  Both you and Grant brought up good points on 
the users' information needs and the precision/recall tuning of the system.

Thanks,

Ivan

--- On Thu, 1/28/10, Robert Muir  wrote:

> From: Robert Muir 
> Subject: Re: Average Precision - TREC-3
> To: java-user@lucene.apache.org
> Date: Thursday, January 28, 2010, 11:44 AM
> right, but the problem is when
> something is currently ranked as doc 20 but
> should be in the top 1, 5, or 10, and you aren't seeing
> it.
> 
> so I think if you are judging top-N docs from an existing
> system, you should
> look a little farther ahead than the top-N you care about.
> I think you should also index your data a few different
> ways and judge those
> top-N too, for example, use n-gram tokenizer.
> 
> It doesn't have to be crazy like a formal trec-like pooling
> process, but I
> think you need to introduce enough variation that you have
> judgements for
> docs that should be ranked higher than they currently are.
> 
> 
> > Perfectly reasonable as well.  I've seen some people who only care about
> > P@5 and even P@1 and others who do much more.  The important thing is to
> > think about what makes sense for your application and users.  Much of this
> > can be found through basic log analysis (assuming an existing system) or
> > some reasoning about use cases (new system) and users (how sophisticated
> > they are, etc.)
> >
> > -Grant
> -- 
> Robert Muir
> rcm...@gmail.com
> 







AW: index a database

2010-01-28 Thread Marc Schwarz
I had that problem yesterday... this works in my app:

Directory directory = new SimpleFSDirectory(new File("c:\\lucene\\index"));

IndexWriter w = new IndexWriter(directory, analyzer,true,
new IndexWriter.MaxFieldLength(25000));



-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Thursday, January 28, 2010 21:47
To: java-user@lucene.apache.org
Subject: Re: index a database

What version are you using? Because there's no such constructor
(i.e. one that takes a File) in 3.0.

You might want to use something like FSDirectory.open(file) in
your IndexWriter constructor

If this doesn't work, more details please

Erick

On Thu, Jan 28, 2010 at 3:30 PM, luciusvorenus
wrote:

>
> Hello
>
> I tried to index a database
>
> ""
> import org.apache.lucene.demo.FileDocument;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.queryParser.QueryParser;
> import org.apache.lucene.store.FSDirectory;
> import org.apache.lucene.util.Version;
> import java.sql.*;
> import java.util.Properties;
> import org.apache.lucene.document.*;
> import java.io.*;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.FSDirectory;
>
> public class Con {
>     public static void main(String[] args) throws Exception {
>         final File INDEX_DIR = new File("index");
>
>         Class.forName("com.mysql.jdbc.Driver").newInstance();
>         Connection conn = DriverManager.getConnection(
>                 "jdbc:mysql://127.0.0.1/test", "root", "passwort");
>         StandardAnalyzer analyzer = new StandardAnalyzer(null);
>         IndexWriter writer = new IndexWriter(INDEX_DIR, analyzer, true);
>         System.out.println("Indexing to directory '" + INDEX_DIR + "'...");
>         indexDocs(writer, conn);
>         writer.optimize();
>         writer.close();
>     }
>
>     private static void indexDocs(IndexWriter writer, Connection conn)
>             throws Exception {
>         String sql = "select c_id, city from city";
>         Statement stmt = conn.createStatement();
>         ResultSet rs = stmt.executeQuery(sql);
>         while (rs.next()) {
>             Document d = new Document();
>             d.add(new Field("c_id", rs.getString("c_id"),
>                     Field.Store.YES, Field.Index.NO));
>             d.add(new Field("city", rs.getString("city"),
>                     Field.Store.NO, Field.Index.ANALYZED));
>             writer.addDocument(d);
>         }
>     }
> }
> """
> and I get this message:
>
> ""
> symbol  : constructor IndexWriter(java.io.File,org.apache.lucene.analysis.standard.StandardAnalyzer,boolean)
> location: class org.apache.lucene.index.IndexWriter
>         IndexWriter writer = new IndexWriter(INDEX_DIR, analyzer, true);
>                              ^
> 1 error
>
> What am I doing wrong?
>
> I'm a newbie ...
>
> Thank you
> --
> View this message in context:
> http://old.nabble.com/index-a-database-tp27358959p27358959.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>



Re: index a database

2010-01-28 Thread luciusvorenus

lucene 3.3

I tried it like this:

""
import org.apache.lucene.demo.FileDocument;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import java.sql.*;
import java.util.Properties;
import org.apache.lucene.document.*;
import java.io.*;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.SimpleFSDirectory;

public class Con {
    public static void main(String[] args) throws Exception {
        final File INDEX_DIR = new File("index");

        Class.forName("com.mysql.jdbc.Driver").newInstance();
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://127.0.0.1/test", "root", "passwort");
        StandardAnalyzer analyzer = new StandardAnalyzer(null);
        Directory directory = new SimpleFSDirectory(
                new File("home/lucius/Desktop/index"));
        IndexWriter w = new IndexWriter(directory, analyzer, true,
                new IndexWriter.MaxFieldLength(25000));

        System.out.println("Indexing to directory '" + INDEX_DIR + "'...");
        indexDocs(w, conn);
        w.optimize();
        w.close();
    }

    private static void indexDocs(IndexWriter writer, Connection conn)
            throws Exception {
        String sql = "select c_id, city from city";
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery(sql);
        while (rs.next()) {
            Document d = new Document();
            d.add(new Field("c_id", rs.getString("c_id"),
                    Field.Store.YES, Field.Index.NO));
            d.add(new Field("city", rs.getString("city"),
                    Field.Store.NO, Field.Index.ANALYZED));
            writer.addDocument(d);
        }
    }
}
 



Am I on the right track?

Now I get this message after compiling, when I run the program:
""
Exception in thread "main" java.lang.NullPointerException
    at org.apache.lucene.analysis.StopFilter.getEnablePositionIncrementsVersionDefault(StopFilter.java:162)
    at org.apache.lucene.analysis.standard.StandardAnalyzer.<init>(StandardAnalyzer.java:73)
    at org.apache.lucene.analysis.standard.StandardAnalyzer.<init>(StandardAnalyzer.java:63)
    at Con.main(Con.java:29)
"""


once again many thanks


AW: index a database

2010-01-28 Thread Marc Schwarz
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

Does that make any difference?

-Original Message-
From: luciusvorenus [mailto:lucius.vore...@hotmail.de]
Sent: Thursday, January 28, 2010 22:46
To: java-user@lucene.apache.org
Subject: Re: index a database



Re: AW: index a database

2010-01-28 Thread luciusvorenus


""

Exception in thread "main" java.lang.NullPointerException
    at org.apache.lucene.analysis.StopFilter.getEnablePositionIncrementsVersionDefault(StopFilter.java:162)
    at org.apache.lucene.analysis.standard.StandardAnalyzer.<init>(StandardAnalyzer.java:73)
    at org.apache.lucene.analysis.standard.StandardAnalyzer.<init>(StandardAnalyzer.java:63)
    at Con.main(Con.java:29)

"""
Did you also get this message after compiling?


br




-- 
View this message in context: 
http://old.nabble.com/index-a--mysql-database-tp27358959p27363463.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.





Re: AW: index a database

2010-01-28 Thread luciusvorenus

Yes, many thanks.
But /.../my index folder is empty. Have I done something wrong in
"private static void indexDocs"? Nothing gets indexed.

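One thing that may be worth checking, offered as a guess rather than a diagnosis: the posted code opens the index at home/lucius/Desktop/index, which has no leading slash, so Java resolves it against the working directory of the process rather than the filesystem root. The message printed for INDEX_DIR also names a different directory (index). The sketch below uses only java.io.File to show where each path string would resolve; the class name IndexPathCheck is made up for illustration.

```java
import java.io.File;

/** Illustrative helper (hypothetical name) that shows how path strings resolve. */
class IndexPathCheck {

    /** Describes where a path string will actually point on this machine. */
    public static String describe(String path) {
        File f = new File(path);
        return path + " -> " + f.getAbsolutePath()
                + (f.isAbsolute() ? " (absolute)" : " (relative to working directory)");
    }

    public static void main(String[] args) {
        System.out.println("working directory: " + System.getProperty("user.dir"));
        // Path used for SimpleFSDirectory in the posted code (no leading slash).
        System.out.println(describe("home/lucius/Desktop/index"));
        // Probably the intended location.
        System.out.println(describe("/home/lucius/Desktop/index"));
        // Path held in INDEX_DIR, which is only printed and never opened.
        System.out.println(describe("index"));
    }
}
```

If the first line of output points somewhere under the project directory, the index files are likely being written there and not to the Desktop folder.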




AW: AW: index a database

2010-01-28 Thread Marc Schwarz
Maybe you should separate the add method from the database function...

Separate the DB loop, something like this:

try {
    ResultSet rs2 = stm.executeQuery(sql);
    while (rs2.next()) {
        String text = rs2.getString("textvalue");
        addDoc(w, text, rs2.getString("id"));
        count_ds++;
    }
} catch (Exception e) {
    // added so the try block is complete: report SQL or index errors
    e.printStackTrace();
}


and then addDoc looks something like this:


// the third argument (the id) is passed in but not indexed here
private static void addDoc(IndexWriter w, String value, String empty)
        throws IOException {
    Document doc = new Document();
    doc.add(new Field("title", value, Field.Store.YES,
            Field.Index.ANALYZED));
    w.addDocument(doc);
}


-Original Message-
From: luciusvorenus [mailto:lucius.vore...@hotmail.de]
Sent: Thursday, January 28, 2010 23:02
To: java-user@lucene.apache.org
Subject: Re: AW: index a database



index demo throws LockObtainFailedException

2010-01-28 Thread Teruhiko Kurosaka
We have many Linux machines of different brands sharing the same NFS filesystem
for home.  The Lucene file indexing demo program fails with
LockObtainFailedException
only on one particular Linux machine (Fedora Core 4, x86).  I am including
the console output at the bottom of this message.

I tried Lucene 2.9.0, 2.9.1 and 3.0.0, and the result is identical.

After searching the Internet, I saw some postings suggesting that this happens
when disk space is low, but there seems to be more than enough for this
small demo.  I didn't understand the suggestions about lockd.  I'd appreciate
any advice on how to find the cause of this exception.

Thank you in advance.

T. "Kuro" Kurosaka

-bash-3.00$ cd lucene-3.0.0/
-bash-3.00$ ant demo-index-text
Buildfile: build.xml

jar.core-check:

compile-demo:
    [mkdir] Created dir: /basis/users/kuro/opt/lucene-3.0.0/build/classes/demo
    [javac] Compiling 17 source files to /basis/users/kuro/opt/lucene-3.0.0/build/classes/demo

jar-demo:
      [jar] Building jar: /basis/users/kuro/opt/lucene-3.0.0/lucene-demos-3.0.0.jar

demo-index-text:
     [echo] - (1) Prepare dir -
     [echo] cd /basis/users/kuro/opt/lucene-3.0.0
     [echo] rmdir demo-text-dir
     [echo] mkdir demo-text-dir
    [mkdir] Created dir: /basis/users/kuro/opt/lucene-3.0.0/demo-text-dir
     [echo] cd demo-text-dir
     [echo] - (2) Index the files located under /basis/users/kuro/opt/lucene-3.0.0/src -
     [echo] java -classpath "../lucene-core-3.0.0.jar;../lucene-demos-3.0.0.jar" org.apache.lucene.demo.IndexFiles ../src/demo
     [java]  caught a class org.apache.lucene.store.LockObtainFailedException
     [java]  with message: Lock obtain timed out: NativeFSLock@/basis/users/kuro/opt/lucene-3.0.0/demo-text-dir/index/write.lock: java.io.IOException: Input/output error

BUILD SUCCESSFUL
Total time: 6 seconds
-bash-3.00$ df -k . /tmp
Filesystem                 1K-blocks       Used Available Use% Mounted on
storev:/vol/exports/users 3119362560 2790661520 328701040  90% /basis/users
/dev/sda2                    9718360    7700764   1515968  84% /




Re: index demo throws LockObtainFailedException

2010-01-28 Thread Otis Gospodnetic
Fedora Core 4 is *ancient*! :)
Could it be that the NFS client on it is old, and this is causing problems?  I 
remember emails about NFS 3 vs. NFS 4 and some improvements in the latter.  I 
don't recall the details and tend to keep my Lucene and Solr instances away 
from NFS mounts.
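
To narrow this down, it may help to test locking outside Lucene. The error message names a NativeFSLock, which relies on operating-system file locks obtained through java.nio; on an NFS mount those in turn depend on the lock daemon (lockd) mentioned in the original post. The sketch below, which assumes only the JDK, tries to take such a lock on a scratch file in a given directory. The class name NfsLockProbe and the file name probe.lock are made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

/** Illustrative probe (hypothetical name) for OS-level file locking in a directory. */
class NfsLockProbe {

    /** Tries to lock a scratch file in dir and reports what happened. */
    public static String probe(File dir) {
        File probeFile = new File(dir, "probe.lock");
        RandomAccessFile raf = null;
        try {
            raf = new RandomAccessFile(probeFile, "rw");
            // tryLock returns null if another process already holds the lock.
            FileLock lock = raf.getChannel().tryLock();
            if (lock == null) {
                return "lock held by another process";
            }
            lock.release();
            return "lock obtained";
        } catch (IOException e) {
            // On a mount without working lock support, an error such as
            // "Input/output error" would be expected to surface here.
            return "locking failed: " + e.getMessage();
        } finally {
            if (raf != null) {
                try {
                    raf.close();
                } catch (IOException ignored) {
                    // nothing useful to do if closing the scratch file fails
                }
            }
            probeFile.delete();
        }
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(dir.getAbsolutePath() + ": " + probe(dir));
    }
}
```

Running it on the failing machine, once against the NFS-mounted directory and once against a local directory such as /tmp, should indicate whether the problem is specific to the mount.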

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
> From: Teruhiko Kurosaka 
> To: "java-user@lucene.apache.org" 
> Sent: Thu, January 28, 2010 8:15:26 PM
> Subject: index demo throws LockObtainFailedException
> 





Modifying IDF

2010-01-28 Thread Franz Allan Valencia See
Good day,

I am currently using Lucene for my searches, and one of the problems that I'm
facing is when the keyword is a URL. Tokens such as http, https, ://, index,
html, etc. seem to be messing up our search results. The focus is
supposed to be only on the URL's domain.
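
If only the domain should drive matching, one option is to pull it out of the URL and index it in a field of its own instead of relying on term weighting alone. Below is a minimal sketch using java.net.URI from the JDK. The class name UrlDomain and the idea of a separate domain field are assumptions for illustration; Lucene does not do this by default.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Illustrative helper (hypothetical name) that extracts the host part of a URL. */
class UrlDomain {

    /** Returns the lower-cased host of a URL, or null if it has none or cannot be parsed. */
    public static String hostOf(String url) {
        try {
            String host = new URI(url.trim()).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(hostOf("https://www.example.com/index.html"));
        System.out.println(hostOf("http://lucene.apache.org/java/docs/"));
    }
}
```

The returned host could then be stored in its own field and queried directly, so that tokens such as http or index never take part in the match.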

The idea that I have is to modify the IDF so that rare terms get boosted much
more than with the default settings in Lucene. Since there are probably a lot of
occurrences of http, https, ://, etc., matches on these terms should score really
low, while matches on the domain (which is rare) should score high.

Would this work, or am I totally misunderstanding Lucene's tf/idf? :-)
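
For context, the IDF computed by Lucene's DefaultSimilarity in the 2.9/3.0 line is 1 + ln(numDocs / (docFreq + 1)), so terms that occur in nearly every document already receive a weight close to the minimum. The sketch below reproduces that formula in plain Java and compares it with a steeper variant. The class name IdfSketch, the squared variant, and the document counts are illustrative assumptions. In Lucene itself the equivalent change would be made by overriding idf(int docFreq, int numDocs) in a DefaultSimilarity subclass.

```java
/** Illustrative sketch (hypothetical name) of the default IDF and a steeper variant. */
class IdfSketch {

    /** Same formula as DefaultSimilarity.idf(docFreq, numDocs). */
    public static double defaultIdf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    /** Assumed variant: squares the default value so that rare terms stand out more. */
    public static double steeperIdf(int docFreq, int numDocs) {
        double idf = defaultIdf(docFreq, numDocs);
        return idf * idf;
    }

    public static void main(String[] args) {
        int numDocs = 100000;      // assumed collection size
        int httpDocFreq = 90000;   // assumed: "http" occurs in most documents
        int domainDocFreq = 10;    // assumed: a specific domain is rare

        System.out.println("default  http:   " + defaultIdf(httpDocFreq, numDocs));
        System.out.println("default  domain: " + defaultIdf(domainDocFreq, numDocs));
        System.out.println("steeper  http:   " + steeperIdf(httpDocFreq, numDocs));
        System.out.println("steeper  domain: " + steeperIdf(domainDocFreq, numDocs));
    }
}
```

The steeper variant widens the gap between rare and common terms, but common tokens still contribute something to the score, so it reduces their influence without removing it.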

Thanks,

-- 
Franz Allan Valencia See | Java Software Engineer
franz@gmail.com
LinkedIn: http://www.linkedin.com/in/franzsee
Twitter: http://www.twitter.com/franz_see