Changes as we approach v4

2012-09-21 Thread david.w.smi...@gmail.com
Rob,
  It appears you are, in effect, the Release Manager for v4.0, so I'm
asking you this question.  Clearly v4 is going to be out soon, and
consequently we're not pushing new features to the v4 branch.
Regarding the new spatial codebase, there isn't a backwards
compatibility concern about changes until v4 is actually released.  In
your opinion, is it too late to do class renames in this area? --
LUCENE-4374 is about renaming TwoDoublesStrategy to
PointVectorStrategy (a much better name; the old name is crap and that's
my fault).  And FYI, I intend to add a bunch of javadocs to all
spatial classes this weekend.

Thanks for all the time you spend on doing your R.M. duties -- it's a
ton of work that few people would step forward to do.

~ David

-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [JENKINS] Lucene-Solr-trunk-Linux-Java7-64 - Build # 438 - Failure!

2012-06-29 Thread david.w.smi...@gmail.com
I added the missing ASL header.

On Thu, Jun 28, 2012 at 4:54 PM, Policeman Jenkins Server <
[email protected]> wrote:

> Build:
> http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux-Java7-64/438/
>
> All tests passed
>
> Build Log:
> [...truncated 15182 lines...]
> BUILD FAILED
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux-Java7-64/checkout/build.xml:62:
> The following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux-Java7-64/checkout/lucene/build.xml:270:
> The following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux-Java7-64/checkout/lucene/common-build.xml:1435:
> The following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux-Java7-64/checkout/lucene/common-build.xml:1275:
> Rat problems were found!
>
> Total time: 6 seconds
> Build step 'Execute shell' marked build as failure
> Archiving artifacts
> Recording test results
> Email was triggered for: Failure
> Sending email for trigger: Failure
>
>
>


BooleanFilter MUST clauses and getDocIdSet(acceptDocs)

2012-11-07 Thread david.w.smi...@gmail.com
I am about to write a Filter that only operates on a set of documents that
have already passed other filter(s).  It's rather expensive, since it has
to use DocValues to examine a value and then determine if it's a match.  So
it scales O(n) where n is the number of documents it must see.  The 2nd arg
of getDocIdSet is Bits acceptDocs.  Unfortunately Bits doesn't have an int
iterator, but I can deal with that by checking whether it extends DocIdSet.

I'm looking at BooleanFilter which I want to use and I notice that it
passes null to filter.getDocIdSet for acceptDocs, and it justifies this
with the following comment:
// we dont pass acceptDocs, we will filter at the end using an additional
filter
Uwe wrote this comment in relation to LUCENE-1536 (r1188624).
For the MUST clause loop, couldn't it give it the accumulated bits of the
MUST clauses?

~ David
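To make the cost concrete, here is a minimal, self-contained sketch of the pattern being discussed. The `Bits` interface below is a simplified stand-in for Lucene's, and `expensiveCheck` stands in for the per-document DocValues test; none of this is Lucene's actual API.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

// Simplified stand-in for Lucene's Bits: random access only, no iterator.
interface Bits {
    boolean get(int index);
    int length();
}

public class AcceptDocsDemo {
    // O(n) scan: the expensive per-document check (standing in for a
    // DocValues lookup) runs only for docs that prior filters already
    // accepted -- which is what passing acceptDocs down buys you.
    static BitSet filter(Bits acceptDocs, IntPredicate expensiveCheck) {
        BitSet result = new BitSet();
        for (int doc = 0; doc < acceptDocs.length(); doc++) {
            if (acceptDocs.get(doc) && expensiveCheck.test(doc)) {
                result.set(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Docs 2 and 5 passed the earlier filters; now match odd doc ids only.
        Bits accepted = new Bits() {
            public boolean get(int i) { return i == 2 || i == 5; }
            public int length() { return 8; }
        };
        System.out.println(filter(accepted, doc -> doc % 2 == 1)); // prints {5}
    }
}
```

Passing null acceptDocs (as the BooleanFilter comment describes) forces the expensive check to run over every document instead of only the accepted subset.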


Re: Welcome back, Wolfgang Hoschek!

2013-09-26 Thread david.w.smi...@gmail.com
Nice!  Welcome back Wolfgang!


On Thu, Sep 26, 2013 at 6:21 AM, Uwe Schindler wrote:

> Hi,
>
> I'm pleased to announce that after a long abstinence, Wolfgang Hoschek
> rejoined the Lucene/Solr committer team. He is working now at Cloudera and
> plans to help with the integration of Solr and Hadoop.
> Wolfgang originally wrote the MemoryIndex, which is used by the classical
> Lucene highlighter and ElasticSearch's percolator module.
>
> Looking forward to new contributions.
>
> Welcome back & heavy committing! :-)
> Uwe
>
> P.S.: Wolfgang, as soon as you have setup your subversion access, you
> should add yourself back to the committers list on the website as well.
>
> -
> Uwe Schindler
> [email protected]
> Apache Lucene PMC Chair / Committer
> Bremen, Germany
> http://lucene.apache.org/
>
>
>


Fwd: [JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.6.0_45) - Build # 6066 - Still Failing!

2013-06-14 Thread david.w.smi...@gmail.com
Dawid,

Could you please take a look at the reproducibility of this test failure in
lucene/spatial?  I tried to reproduce it but couldn't, and I thought
perhaps you might have some insight because I'm using some
RandomizedTesting features that aren't used as often, like @Repeat.  For
example, one thing fishy is this log message:

[junit4:junit4]   2> NOTE: reproduce with: ant test
 -Dtestcase=SpatialOpRecursivePrefixTreeTest -Dtests.method="testContains
{#1 seed=[9166D28D6532217A:472BE5C4B7344982]}"
-Dtests.seed=9166D28D6532217A -Dtests.multiplier=3 -Dtests.slow=true
-Dtests.locale=uk_UA -Dtests.timezone=Etc/GMT-6 -Dtests.file.encoding=UTF-8

Notice the -Dtests.method="testContains {#1
seed=[9166D28D6532217A:472BE5C4B7344982]}" part, which is wrong because if
I do that, it won't find the method to test.  If I change this to simply
testContains, and set the seed normally -Dtests.seed=91 then I still
can't reproduce the problem.  This test appears to have failed a bunch of
times lately with different seeds.

~ David

-- Forwarded message --
From: Policeman Jenkins Server 
Date: Fri, Jun 14, 2013 at 9:33 PM
Subject: [JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.6.0_45) - Build # 6066
- Still Failing!
To: [email protected]


Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6066/
Java: 32bit/jdk1.6.0_45 -server -XX:+UseSerialGC

1 tests failed.
FAILED:
 org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testContains
{#1 seed=[9166D28D6532217A:472BE5C4B7344982]}

Error Message:
Shouldn't match I
#0:ShapePair(Rect(minX=102.0,maxX=112.0,minY=-36.0,maxY=120.0) ,
Rect(minX=168.0,maxX=175.0,minY=-1.0,maxY=11.0))
Q:Rect(minX=0.0,maxX=256.0,minY=-128.0,maxY=128.0)

Stack Trace:
java.lang.AssertionError: Shouldn't match I
#0:ShapePair(Rect(minX=102.0,maxX=112.0,minY=-36.0,maxY=120.0) ,
Rect(minX=168.0,maxX=175.0,minY=-1.0,maxY=11.0))
Q:Rect(minX=0.0,maxX=256.0,minY=-128.0,maxY=128.0)
at
__randomizedtesting.SeedInfo.seed([9166D28D6532217A:472BE5C4B7344982]:0)
at org.junit.Assert.fail(Assert.java:93)
at
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.fail(SpatialOpRecursivePrefixTreeTest.java:287)
at
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.doTest(SpatialOpRecursivePrefixTreeTest.java:273)
at
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testContains(SpatialOpRecursivePrefixTreeTest.java:101)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
  

Re: Solr: Serving Javadoc from Jetty server

2014-04-17 Thread david.w.smi...@gmail.com
Alex,
Yes, it would be useful (of course)!  Also, the admin UI should have
a link to it, in addition to the generic documentation link. Create an
issue and I’ll commit it.
~ David


On Thu, Apr 17, 2014 at 6:54 AM, Alexandre Rafalovitch
wrote:

> Hello,
>
> The binary Solr distribution includes Javadoc, but it just sits there.
>
> I just tested adding second Jetty context that makes that Javadoc
> served under /javadoc handle.
>
> I think it is useful, as Javadoc sometimes breaks when it is loaded
> from the local filesystem, plus it opens up other options like
> linking to it from other places.
>
> Would this be useful as a contribution? The context file is at:
> https://github.com/arafalov/Solr-Javadoc/tree/master/JettyContext
>
> Regards,
>Alex.
>
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
>
>


Re: [JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.7.0_55) - Build # 10106 - Still Failing!

2014-04-18 Thread david.w.smi...@gmail.com
This build started before I fixed the issue; it’s already fixed.


On Fri, Apr 18, 2014 at 9:12 AM, Policeman Jenkins Server <
[email protected]> wrote:

> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/10106/
> Java: 64bit/jdk1.7.0_55 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC
>
> All tests passed
>
> Build Log:
> [...truncated 44392 lines...]
> -documentation-lint:
>  [echo] checking for broken html...
> [jtidy] Checking for broken html (such as invalid tags)...
>[delete] Deleting directory
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build/jtidy_tmp
>  [echo] Checking for broken links...
>  [exec]
>  [exec] Crawl/parse...
>  [exec]
>  [exec] Verify...
>  [exec]
>  [exec]
> file:///build/docs/spatial/org/apache/lucene/spatial/prefix/PrefixTreeStrategy.html
>  [exec]   BROKEN LINK:
> file:///build/docs/core/org/apache/lucene/spatial.prefix.CellTokenStream.html
>  [exec]   BROKEN LINK:
> file:///build/docs/core/org/apache/lucene/spatial.prefix.CellTokenStream.html
>  [exec]
>  [exec]
> file:///build/docs/spatial/org/apache/lucene/spatial/prefix/RecursivePrefixTreeStrategy.html
>  [exec]   BROKEN LINK:
> file:///build/docs/core/org/apache/lucene/spatial.prefix.CellTokenStream.html
>  [exec]   BROKEN LINK:
> file:///build/docs/core/org/apache/lucene/spatial.prefix.CellTokenStream.html
>  [exec]
>  [exec] Broken javadocs links were found!
>
> BUILD FAILED
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:467: The
> following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:63: The
> following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:208:
> The following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:221:
> The following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:2330:
> exec returned: 1
>
> Total time: 68 minutes 2 seconds
> Build step 'Invoke Ant' marked build as failure
> Description set: Java: 64bit/jdk1.7.0_55 -XX:+UseCompressedOops
> -XX:+UseConcMarkSweepGC
> Archiving artifacts
> Recording test results
> Email was triggered for: Failure
> Sending email for trigger: Failure
>
>
>
>
>


Re: maximum number of shards per SolrCloud

2014-04-21 Thread david.w.smi...@gmail.com
Zhifeng,
Please ask Solr questions on the solr-user list.

Thanks.
~ David


On Mon, Apr 21, 2014 at 9:54 PM, Zhifeng Wang wrote:

> Hi,
>
> We are facing a high incoming rate of usually small documents (logs). The
> incoming rate is initially assumed to be 2K/sec but could reach as high as
> 20K/sec. So a year's worth of data could reach 60G searchable documents
> (assuming the 2K/sec rate).
>
> Since a single shard can contain no more than 2G documents, we will need
> at least 30 shards per year. Considering that we don't want to fill shards
> to their maximum capacity, the number of shards we need will be considerably
> higher.
>
> My question is whether there is a hard (not possible) or soft (bad
> performance) limit on the number of shards per SolrCloud. ZooKeeper
> defaults the file size to 1M, so I guess that imposes some limit. If I set
> the value to a larger number, will SolrCloud really scale OK if there are
> thousands of shards?  Or would I be better off using multiple SolrClouds to
> handle the data (result aggregation is done outside of SolrCloud)?
>
> Thanks,
> Zhifeng
>
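The arithmetic in the question above can be checked with a quick sketch, assuming the 2K/sec rate and Lucene's Integer.MAX_VALUE per-index document limit (presumably where the "2G documents" figure comes from):

```java
public class ShardMath {
    public static void main(String[] args) {
        long docsPerSec = 2_000L;                      // assumed steady ingest rate
        long docsPerYear = docsPerSec * 86_400 * 365;  // 63,072,000,000 (~63G)
        long maxDocsPerShard = Integer.MAX_VALUE;      // Lucene's ~2.1G per-index doc limit
        // Ceiling division to get the minimum shard count at full capacity.
        long minShards = (docsPerYear + maxDocsPerShard - 1) / maxDocsPerShard;
        System.out.println(docsPerYear + " docs/year -> at least " + minShards + " shards");
    }
}
```

This reproduces the "at least 30 shards per year" figure stated in the question.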


DocumentsWriterPerThread architecture

2014-04-30 Thread david.w.smi...@gmail.com
Is this still up to date?:
https://blog.trifork.com/2011/04/01/gimme-all-resources-you-have-i-can-use-them/
I thought at some point subsequently, some significant work was done, and
perhaps it was blogged. But I can’t find it.
~ David


Encoding data in terms; UTF8 concerns?

2014-05-10 Thread david.w.smi...@gmail.com
I’m working on an encoding of numbers / data into indexed terms.  In the
past I limited the encoding to ASCII but now I’m doing it at a more
raw/byte level.  Do I have to be aware of UTF8 / sorting issues when I do
this?  I noticed the following code in NumericUtils.java, line 186:
while (nChars > 0) {
  // Store 7 bits per byte for compatibility
  // with UTF-8 encoding of terms
  bytes.bytes[nChars--] = (byte)(sortableBits & 0x7f);
  sortableBits >>>= 7;
}
It’s the comment more than anything that has my attention. Do I have to
limit my bytes to only the low 7 bits?  If so, why?  I’ve already written a
bunch of code that generates the terms without consideration for this, and
I think a bug I’m looking at could be related to this.

~ David
p.s. sorry to be CC’ing some folks directly but the mailing list is having
problems
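For illustration, the quoted loop can be adapted into a standalone, runnable method. Every emitted byte stays in the 0x00-0x7F range, i.e. the single-byte region of UTF-8; byte 0 is left untouched here, mirroring that the quoted loop starts writing at index nChars and decrements (what the surrounding NumericUtils code stores there is outside this snippet).

```java
public class SevenBitPackDemo {
    // Pack the low 7 bits of sortableBits into each byte, as in the quoted
    // NumericUtils loop, so every byte stays below 0x80.
    static byte[] pack(long sortableBits, int nChars) {
        byte[] bytes = new byte[nChars + 1];
        while (nChars > 0) {
            bytes[nChars--] = (byte) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return bytes;
    }

    public static void main(String[] args) {
        for (byte b : pack(0x12345678L, 5)) {
            System.out.printf("%02x ", b);  // every byte is <= 0x7f
        }
    }
}
```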


Re: Encoding data in terms; UTF8 concerns?

2014-05-11 Thread david.w.smi...@gmail.com
Thank you for the background info Uwe!  It turns out my encoding was fine;
I had some other bug.
-- David

On Sunday, May 11, 2014, Uwe Schindler  wrote:

> Hi David,
>
>
>
> the reason why NumericUtils does the encoding in that way is just:
> NumericField encoding was introduced in Lucene 2.9, where all terms were
> char[], encoded in UTF-8 on the index side. Because of that, encoding each
> byte with full 8 bits would have been a large overhead in index size: Each
> term would get an additional byte, because java chars 128…255 would be
> encoded in 2 bytes because of UTF-8. Because of this NumericField uses 7
> bits only.
>
> Because we cannot easily change the numeric encoding (we won’t be able to
> change it ever, unless we have information about the terms in Field
> metadata on the index side), this encoding stayed alive up to now – so it’s
> all about index backwards compatibility.
>
>
>
> If you introduce a new field for spatial, you don’t need to take care
> about this. Since Lucene 4 all terms are byte[] and are sorted in binary
> order. The order of terms in index is given by BytesRef.compareTo(), which
> is pure binary. The good thing for us:  UTF-8 order for string terms (which
> is used in Lucene) is identical to byte[] order, but it is different to
> UTF-16 order (this is why we need a crazy backwards layer to read 3.x
> indexes: terms are sorted slightly differently). We already do full 8-bit
> encoding for collation fields (see CollationKeyAttributeFactory, which
> encodes terms with their collation key instead of UTF-8).
>
>
>
> Uwe
>
>
>
> -
>
> Uwe Schindler
>
> H.-H.-Meier-Allee 63, D-28213 Bremen
>
> http://www.thetaphi.de
>
> eMail: [email protected] 
>
>
>
> *From:* 
> [email protected][mailto:
> [email protected]]
>
> *Sent:* Sunday, May 11, 2014 1:17 AM
> *To:* 
> [email protected]
> *Cc:* Uwe Schindler; Michael McCandless
> *Subject:* Encoding data in terms; UTF8 concerns?
>
>
>
> I’m working on an encoding of numbers / data into indexed terms.  In the
> past I limited the encoding to ASCII but now I’m doing it at a more
> raw/byte level.  Do I have to be aware of UTF8 / sorting issues when I do
> this?  I noticed the following code in NumericUtils.java, line 186:
>
> while (nChars > 0) {
>
>   // Store 7 bits per byte for compatibility
>
>   // with UTF-8 encoding of terms
>
>   bytes.bytes[nChars--] = (byte)(sortableBits & 0x7f);
>
>   sortableBits >>>= 7;
>
> }
>
> It’s the comment more than anything that has my attention. Do I have to
> limit my bytes to only the low 7 bits?  If so, why?  I’ve already written a
> bunch of code that generates the terms without consideration for this, and
> I think a bug I’m looking at could be related to this.
>
>
>
> ~ David
>
> p.s. sorry to be CC’ing some folks directly but the mailing list is having
> problems
>


-- 
Sent from Gmail Mobile
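Uwe's index-size point above is easy to verify in plain Java: a char at or below 0x7F costs one UTF-8 byte, while chars in 0x80-0x7FF cost two, so terms carrying full 8-bit payload bytes would roughly double in size in a char-based index.

```java
import java.nio.charset.StandardCharsets;

public class Utf8TermSizeDemo {
    public static void main(String[] args) {
        // Two-char terms: one stays within 7 bits per char, the other uses 8 bits.
        String sevenBit = new String(new char[]{0x7F, 0x01});
        String eightBit = new String(new char[]{0xFF, 0x81});
        System.out.println(sevenBit.getBytes(StandardCharsets.UTF_8).length); // 2
        System.out.println(eightBit.getBytes(StandardCharsets.UTF_8).length); // 4
    }
}
```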


Re: [JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.7.0_60-ea-b15) - Build # 10394 - Still Failing!

2014-05-26 Thread david.w.smi...@gmail.com
I’ll dig.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, May 27, 2014 at 12:04 AM, Policeman Jenkins Server <
[email protected]> wrote:

> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/10394/
> Java: 64bit/jdk1.7.0_60-ea-b15 -XX:-UseCompressedOops
> -XX:+UseConcMarkSweepGC
>
> 1 tests failed.
> FAILED:
>  
> org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testContains
> {#0 seed=[349A33E5B7DF73A2:5681B850511BD076]}
>
> Error Message:
> Should have matched
> I#4:ShapePair(Rect(minX=-74.0,maxX=-56.0,minY=-8.0,maxY=1.0) ,
> Rect(minX=-180.0,maxX=134.0,minY=-90.0,maxY=90.0))
> Q:Rect(minX=-180.0,maxX=180.0,minY=-90.0,maxY=90.0)
>
> Stack Trace:
> java.lang.AssertionError: Should have matched
> I#4:ShapePair(Rect(minX=-74.0,maxX=-56.0,minY=-8.0,maxY=1.0) ,
> Rect(minX=-180.0,maxX=134.0,minY=-90.0,maxY=90.0))
> Q:Rect(minX=-180.0,maxX=180.0,minY=-90.0,maxY=90.0)
> at
> __randomizedtesting.SeedInfo.seed([349A33E5B7DF73A2:5681B850511BD076]:0)
> at org.junit.Assert.fail(Assert.java:93)
> at
> org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.fail(SpatialOpRecursivePrefixTreeTest.java:361)
> at
> org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.doTest(SpatialOpRecursivePrefixTreeTest.java:348)
> at
> org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testContains(SpatialOpRecursivePrefixTreeTest.java:130)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
> at
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
> at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>

Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0_20-ea-b15) - Build # 10472 - Failure!

2014-06-04 Thread david.w.smi...@gmail.com
Thanks for fixing, Rob.

~ David

On Wed, Jun 4, 2014 at 10:49 PM, Policeman Jenkins Server <
[email protected]> wrote:

> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/10472/
> Java: 32bit/jdk1.8.0_20-ea-b15 -server -XX:+UseG1GC
>
> 1 tests failed.
> FAILED:
>  
> org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest.initializationError
>
> Error Message:
> Suite class
> org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest should
> be a concrete class (not abstract).
>
> Stack Trace:
> java.lang.RuntimeException: Suite class
> org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest should
> be a concrete class (not abstract).
> at
> com.carrotsearch.randomizedtesting.Validation$ClassValidation.isConcreteClass(Validation.java:90)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.validateTarget(RandomizedRunner.java:1681)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.(RandomizedRunner.java:379)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
> at
> org.junit.internal.builders.AnnotatedBuilder.buildRunner(AnnotatedBuilder.java:31)
> at
> org.junit.internal.builders.AnnotatedBuilder.runnerForClass(AnnotatedBuilder.java:24)
> at
> org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57)
> at
> org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:29)
> at
> org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57)
> at
> org.junit.internal.requests.ClassRequest.getRunner(ClassRequest.java:24)
> at
> com.carrotsearch.ant.tasks.junit4.slave.SlaveMain.execute(SlaveMain.java:176)
> at
> com.carrotsearch.ant.tasks.junit4.slave.SlaveMain.main(SlaveMain.java:276)
> at
> com.carrotsearch.ant.tasks.junit4.slave.SlaveMainSafe.main(SlaveMainSafe.java:12)
>
>
>
>
> Build Log:
> [...truncated 9585 lines...]
>[junit4] Suite:
> org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest
>[junit4] ERROR   0.04s J1 |
> BaseNonFuzzySpatialOpStrategyTest.initializationError <<<
>[junit4]> Throwable #1: java.lang.RuntimeException: Suite class
> org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest should
> be a concrete class (not abstract).
>[junit4]>at
> com.carrotsearch.randomizedtesting.Validation$ClassValidation.isConcreteClass(Validation.java:90)
>[junit4]>at
> java.lang.reflect.Constructor.newInstance(Constructor.java:408)
>[junit4] Completed on J1 in 0.04s, 1 test, 1 error <<< FAILURES!
>
> [...truncated 16 lines...]
> BUILD FAILED
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:467: The
> following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:447: The
> following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:45: The
> following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:37:
> The following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:543:
> The following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:2017:
> The following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/module-build.xml:60:
> The following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1296:
> The following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:920:
> There were test failures: 17 suites, 126 tests, 1 error, 12 ignored (2
> assumptions)
>
> Total time: 31 minutes 49 seconds
> Build step 'Invoke Ant' marked build as failure
> Description set: Java: 32bit/jdk1.8.0_20-ea-b15 -server -XX:+UseG1GC
> Archiving artifacts
> Recording test results
> Email was triggered for: Failure - Any
> Sending email for trigger: Failure - Any
>
>


Re: Trappy behavior with default search field

2014-06-05 Thread david.w.smi...@gmail.com
In my view, solrconfig.xml shouldn’t refer to any field by name out of the
box, except for the /browse handler, and perhaps pre-filling the query form
in the admin GUI.  That’s it.

A couple of years ago, at about the time I became a committer, I finally did
something about a feature I hate (and there are few things I hate; ‘qt’ is
another) — specifying the default search field and default operator in the
schema. So thankfully it’s commented out and deprecated now. Ideally I would
have gone further, such that the default solrconfig.xml doesn’t set “df” in
/select.  Hoss added that because, if I recall, some tests broke, amongst
other possible reasons.  In my view, the only reason to keep “df”
pre-configured in /select is for backwards compatibility with sample queries
on tutorials/websites, but I’d like to see it removed for 5x at least.

Furthermore, in my view, “df” (and q.op) should only be “seen” as a
local-param, with the further modification that all top-level parameters
can virtually become local-params to the ‘q’ param.  I should be able to
write an “fq”, “facet.query”, or any of the myriad other queries using
standard default Lucene syntax, without a default field and with the
operator assumed to be OR unless I change it locally in local-params.
Doing otherwise makes the query ambiguous when looked at by itself.
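As a sketch of that ideal using Solr's existing local-params syntax (`title` and `category` are hypothetical field names), each query parameter would carry its own defaults rather than inheriting top-level ones:

```
q={!df=title q.op=AND}solr cloud
fq={!df=category}books
facet.query={!df=title}lucene
```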

~ David

On Thu, Jun 5, 2014 at 12:48 PM, Erick Erickson 
wrote:

> We've all been bitten by the trappy problem with removing the "text" field
> from schema.xml and then Solr failing to start b/c various handlers
> specifically call it out.
>
> Problem is that as people build out (particularly) SolrCloud clusters,
> this innocent-seeming action is getting harder and harder to track down.
>
> Is it worth a JIRA to address? And any clues how to address it? I started
> thinking about in a _very_ superficial manner and I suspect that this is
> one of those things that _seems_ easy but turns into a sticky wicket.
>
> If we make sure and return a field that _is_ defined, then the problem
> becomes even harder to detect. I mean you don't even get any warning but
> don't find your docs b/c the default field isn't there and you're searching
> on a different field than you think. At least the current behavior
> sometimes causes Solr to not start at all.
>
> I'm not even sure it's worth doing; we currently both print an error in the
> log and return an error message for a search, but I wanted to gather others'
> thoughts.
>
>
>
>
>


Re: Extract values from custom function for ValueSource with multiple indexable fields

2014-06-08 Thread david.w.smi...@gmail.com
I suggest investigating this using a known example that does this, such as
LatLonType and geodist().  LatLonType registers the field in a custom way
too.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, Jun 8, 2014 at 7:54 AM, Costi Muraru  wrote:

> Hi guys,
>
> I have a custom FieldType that adds several IndexableFields for each
> document.
> I also have a custom function, in which I want to retrieve these indexable
> fields. I can't seem to be able to do so. I have added some code snippets
> below.
> Any help is gladly appreciated.
>
> Thanks,
> Costi
>
> public class MyField extends FieldType {
> @Override
> public final java.util.List<IndexableField> createFields(SchemaField
> field, Object val, float boost) {
> List<IndexableField> result = new ArrayList<IndexableField>();
> result.add(new Field(field.getName(), "field1", FIELD_TYPE));
> result.add(new Field(field.getName(), "123", FIELD_TYPE));
> result.add(new Field(field.getName(), "ABC", FIELD_TYPE));
> return result;
> }
> }
>
>
> public class MyFunctionParser extends ValueSourceParser {
> @Override
> public ValueSource parse(FunctionQParser fqp) throws SyntaxError {
> ValueSource fieldName = fqp.parseValueSource();
> return new MyFunction(fieldName);
> }
> }
>
> public class MyFunction extends ValueSource {
> ...
> @Override
> public FunctionValues getValues(Map context, AtomicReaderContext
> readerContext) throws IOException {
> final FunctionValues values = valueSource.getValues(context,
> readerContext);
> LOG.debug("Value is: " + values.strVal(doc)); *// prints "123" -
> how can I retrieve the "field1" and "ABC" indexable fields as well?*
> }
> }
>
>


Re: Adding Morphline support to DIH - worth the effort?

2014-06-08 Thread david.w.smi...@gmail.com
> One of the ideas over DIH discussed earlier is making it standalone.

Yeah; my beef with the DIH is that it’s tied to Solr.  But I’d rather see
something other than the DIH outside Solr; it’s not worthy IMO.  Why have
something Solr specific even?  A great pipeline shouldn’t tie itself to any
end-point.  There are a variety of solutions out there that I tried.  There
are the big 3 open-source ETLs (Kettle, Clover, Talend) and they aren’t
quite ideal in one way or another.  And Spring-Integration.  And some
half-baked data pipelines like OpenPipe & Open Pipeline.  I never got
around to taking a good look at Findwise’s open-sourced Hydra but I learned
enough to know to my surprise it was configured in code versus a config
file (like all the others) and that's a big turn-off to me.  Today I read
through most of the Morphlines docs and a few choice source files and I’m
super-impressed.  But as you note it’s missing a lot of other stuff.  I
think something great could be built using it as a core piece.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, Jun 8, 2014 at 5:51 PM, Mikhail Khludnev  wrote:

> Jack,
> I found your considerations quite reasonable.
> One of the ideas over DIH discussed earlier is making it standalone. So,
> if we start from simple Morphline UI, we can do this extraction. Then, such
> externalized ETL, will work better with Solr Cloud than DIH works now.
> Presumably we can reuse DIH Jdbc Datasources as a source for Morphline
> records.
> Still open questions in this approach are:
> - joins/caching - seem possible with Morphlines but still there is no such
> command
> - delta import - a scenario we must not forget to handle
> - threads (it's completely out Morphline's concerns)
> - distributed processing - it would be great if we could partition the
> datasource, e.g. something like what's done by Sqoop
> ... what else?
>
>
> On Sun, Jun 8, 2014 at 6:54 PM, Jack Krupansky 
> wrote:
>
>> I've avoided DIH like the plague since it really doesn't fit well in
>> Solr, so I'm still baffled as to why you think we need to use DIH as the
>> foundation for a Solr Morphlines project. That shouldn't stop you, but
>> what's the big impediment to taking a clean slate approach to Morphlines -
>> learn what we can from DIH, but do a fresh, clean "Solr 5.0" implementation
>> that is not burdened from the get-go with all of DIH's baggage?
>>
>> Configuring DIH is one of its main problems, so blending Morphlines
>> config into DIH config would seem to just make Morphlines less attractive
>> than it actually is when viewed by itself.
>>
>> You might also consider how ManifoldCF (another Apache project) would
>> integrate with DIH and Morphlines as well. I mean, the core use case is ETL
>> from external data sources. And how all of this relates to Apache Flume as
>> well.
>>
>> But back to the original, still unanswered, question: Why use DIH as the
>> starting point for integrating Morphlines with Solr - unless the goal is to
>> make Morphlines unpalatable and less approachable than even DIH itself?!
>>
>> Another question: What does Elasticsearch have in this area (besides
>> "rivers")? Are they headed in the Morphlines direction as well?
>>
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Alexandre Rafalovitch
>> Sent: Sunday, June 8, 2014 10:16 AM
>>
>> To: [email protected]
>> Subject: Re: Adding Morphline support to DIH - worth the effort?
>>
>> I see DIH as something that offers a quick way to get things done, as
>> long as they fit into DIH's couple of basic scenarios. Going even a
>> little beyond hits bugs, bad documentation, inconsistencies and lack
>> of ongoing support (e.g. SOLR-4383).
>>
>> So, if it works for you - great. If it does not - too bad, use SolrJ.
>> And given what I observe, I believe the next round of improvements
>> might be easier to achieve by moving to a different open-source pipe
>> project than trying to keep reinventing and bandaging one of our own.
>> Go where strongest community is, etc.
>>
>> Morphline can be seen as a replacement for DIH's EntityProcessors and
>> Transformers (Flume adds other bits). The reasons I think it is worth
>> looking at are as follows:
>> 1) DIH is not really being maintained or further improved. So, the
>> list of EP and Transformers is the same and does not account for new
>> requests (which we see periodically on the mailing list); even the new
>> implementations get stuck in JIRA (see the JIRA in original email)
>> 2) It's not terribly well documented either, so people are always
>> struggling to understand how the entity is actually generated and what
>> happens when things go wrong
>> 3) We are already bundling Morphline jars with Solr. But we are NOT
>> using them in any way useful to a non-Hadoop Solr user. Which begs the
>> question why did we add them (one answer I guess: because we don't
> have a module system).
>> 4) Morphlines have more primitives t

Re: [JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0_20-ea-b15) - Build # 10516 - Still Failing!

2014-06-09 Thread david.w.smi...@gmail.com
I’m on it.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Jun 9, 2014 at 10:36 AM, Policeman Jenkins Server <
[email protected]> wrote:

> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/10516/
> Java: 64bit/jdk1.8.0_20-ea-b15 -XX:-UseCompressedOops -XX:+UseParallelGC
>
> 1 tests failed.
> FAILED:  org.apache.lucene.spatial.prefix.DateNRStrategyTest.testWithin
> {#4 seed=[3F4202D795A1E146:B3DDDC492CA0D99]}
>
> Error Message:
> Shouldn't match I#2:[-264000 TO -264000-11-20] Q:-264000
>
> Stack Trace:
> java.lang.AssertionError: Shouldn't match I#2:[-264000 TO -264000-11-20]
> Q:-264000
> at
> __randomizedtesting.SeedInfo.seed([3F4202D795A1E146:B3DDDC492CA0D99]:0)
> at org.junit.Assert.fail(Assert.java:93)
> at
> org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest.fail(BaseNonFuzzySpatialOpStrategyTest.java:128)
> at
> org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest.testOperation(BaseNonFuzzySpatialOpStrategyTest.java:117)
> at
> org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest.testOperationRandomShapes(BaseNonFuzzySpatialOpStrategyTest.java:64)
> at
> org.apache.lucene.spatial.prefix.DateNRStrategyTest.testWithin(DateNRStrategyTest.java:59)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
> at
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
> at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
> at
> org.apache.lucene.util.TestRuleIgnore

Re: Adding Morphline support to DIH - worth the effort?

2014-06-11 Thread david.w.smi...@gmail.com
LOL I had the very same reaction Alexandre.  Most of us don’t have all this
big data software sitting around, even if it is free.  Complexity.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Jun 12, 2014 at 12:44 AM, Alexandre Rafalovitch 
wrote:

> On Thu, Jun 12, 2014 at 7:43 AM, Wolfgang Hoschek 
> wrote:
> > On Hadoop, even the JDBC/SQL portion of DIH now seems mostly covered by
> a combination of Sqoop and MapReduceIndexerTool, and perhaps a bit of Hive.
>
> I appreciate that if you are in the Big Data space, you already have
> most of these pieces and the installation space is not a concern
> either.
>
> But for the others, the statement above is probably why DIH is still
> around. It's an easy way to cover those essential "read from
> database", "partial update from database" scenarios. If one has to
> setup Sqoop+Hive+other bits to get it, it's probably too much to ask
> and might be too heavy to install. Certainly when they are starting
> with Solr.
>
> The question to me is: what is the _minimum_ set of technologies
> needed to be brought together to replace what DIH provides now. And
> what very Solr-specific gaps it leaves (includes progress indicator,
> SolrCloud, etc). And what's the space/complexity trade-off. Then,
> there is the rest of the questions. Such as: "Which tool/framework has
> the strongest overlapping community with Solr, so that everybody would
> benefit from adopting their platform".
>
> I think Morphline covers most, possibly all of the Entity Processors
> and Transformers in DIH. And maybe XML/File data sources too. But SQL
> data source is the main issue here. I can't tell whether Flume covers
> the DataSources scenario for SQL and makes it worth the upgrade.
>
> Regards,
>Alex.
>
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Re: [JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.8.0_20-ea-b15) - Build # 4119 - Failure!

2014-06-15 Thread david.w.smi...@gmail.com
I’m on it.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, Jun 15, 2014 at 10:30 PM, Policeman Jenkins Server <
[email protected]> wrote:

> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4119/
> Java: 32bit/jdk1.8.0_20-ea-b15 -client -XX:+UseParallelGC
>
> 1 tests failed.
> FAILED:
>  org.apache.lucene.spatial.prefix.DateNRStrategyTest.testIntersects {#9
> seed=[9A471D2338218380:7D5C3DAFC19B7D24]}
>
> Error Message:
> Should have matched I#1:[-1526755-03-07T23:18:19.371 TO
> -1526755-04-01T00:22] Q:[-1526755-04 TO -1526755-04-01T02:41:51.480]
>
> Stack Trace:
> java.lang.AssertionError: Should have matched
> I#1:[-1526755-03-07T23:18:19.371 TO -1526755-04-01T00:22] Q:[-1526755-04 TO
> -1526755-04-01T02:41:51.480]
> at
> __randomizedtesting.SeedInfo.seed([9A471D2338218380:7D5C3DAFC19B7D24]:0)
> at org.junit.Assert.fail(Assert.java:93)
> at
> org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest.fail(BaseNonFuzzySpatialOpStrategyTest.java:128)
> at
> org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest.testOperation(BaseNonFuzzySpatialOpStrategyTest.java:122)
> at
> org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest.testOperationRandomShapes(BaseNonFuzzySpatialOpStrategyTest.java:64)
> at
> org.apache.lucene.spatial.prefix.DateNRStrategyTest.testIntersects(DateNRStrategyTest.java:53)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
> at
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
> at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at
> org.apache.lucene.

Re: facet.mincount in SolrCloud

2014-06-16 Thread david.w.smi...@gmail.com
That doesn’t make sense to me either, Toke.  Have you tried changing it and
running tests to see that they pass?

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Jun 16, 2014 at 8:39 AM, Toke Eskildsen 
wrote:

> I am having a bit of a challenge getting
> https://issues.apache.org/jira/browse/SOLR-5894
> to work with SolrCloud. I have traced it down to the distributed
> faceting request always having mincount=0 when limit > 0, regardless of
> what I specify mincount to.
>
> In the method modifyRequest in FacetComponent.java at
>
> https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java
> line 238 is
>
>   dff.initialMincount = 0;  // TODO: we could change this to 1,
>   but would then need more refinement for small facet result sets?
>
> I do not understand the logic here: When my request is for mincount > 0,
> when does it ever make sense to have terms with count=0 returned from
> any shard?
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
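To illustrate why a shard reporting zero-count terms can matter, here is a toy simulation (invented data, and a deliberate simplification of Solr's actual two-phase distributed faceting, which also applies limits and over-requesting): a term with count 0 on one shard must either be reported in the first phase or fetched later in a refinement round-trip before the coordinator can trust the merged count.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class FacetRefinementToy {
    // Phase 1 (toy model): a shard reports every term whose count is at
    // least mincount. Real Solr also truncates to limit + over-request.
    static Map<String, Integer> phase1(Map<String, Integer> shard, int mincount) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Integer> e : shard.entrySet())
            if (e.getValue() >= mincount) out.put(e.getKey(), e.getValue());
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> shard1 = Map.of("A", 5, "B", 2); // A popular here
        Map<String, Integer> shard2 = Map.of("A", 0, "B", 4); // A matches nothing here

        for (int mincount : new int[]{0, 1}) {
            Map<String, Integer> r1 = phase1(shard1, mincount);
            Map<String, Integer> r2 = phase1(shard2, mincount);
            // Any candidate term that some shard did NOT report has an
            // unknown count there and needs a refinement round-trip.
            Set<String> candidates = new TreeSet<>(r1.keySet());
            candidates.addAll(r2.keySet());
            Set<String> needRefine = new TreeSet<>();
            for (String t : candidates)
                if (!r1.containsKey(t) || !r2.containsKey(t)) needRefine.add(t);
            System.out.println("initialMincount=" + mincount
                + " needs refinement for " + needRefine);
        }
    }
}
```

With mincount=0, shard2 already reports A=0 in phase 1 and no refinement is needed; with mincount=1, A is absent from shard2's response and must be refined -- one possible reading of the "more refinement for small facet result sets" comment in the TODO.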


Re: [JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_51) - Build # 9725 - Still Failing!

2014-03-18 Thread david.w.smi...@gmail.com
I'll look into this one and get it fixed ASAP.


On Tue, Mar 18, 2014 at 2:26 AM, Policeman Jenkins Server <
[email protected]> wrote:

> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/9725/
> Java: 32bit/jdk1.7.0_51 -server -XX:+UseSerialGC
>
> 2 tests failed.
> FAILED:
>  
> org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testContains
> {#7 seed=[175A10038A619363:146B27D88CDE16A]}
>
> Error Message:
> Shouldn't match I#0:Rect(minX=-180.0,maxX=180.0,minY=-90.0,maxY=90.0)
> Q:ShapePair(Rect(minX=-180.0,maxX=180.0,minY=-90.0,maxY=90.0) ,
> Rect(minX=-21.0,maxX=-14.0,minY=-26.0,maxY=-21.0))
>
> Stack Trace:
> java.lang.AssertionError: Shouldn't match
> I#0:Rect(minX=-180.0,maxX=180.0,minY=-90.0,maxY=90.0)
> Q:ShapePair(Rect(minX=-180.0,maxX=180.0,minY=-90.0,maxY=90.0) ,
> Rect(minX=-21.0,maxX=-14.0,minY=-26.0,maxY=-21.0))
> at
> __randomizedtesting.SeedInfo.seed([175A10038A619363:146B27D88CDE16A]:0)
> at org.junit.Assert.fail(Assert.java:93)
> at
> org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.fail(SpatialOpRecursivePrefixTreeTest.java:355)
> at
> org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.doTest(SpatialOpRecursivePrefixTreeTest.java:335)
> at
> org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testContains(SpatialOpRecursivePrefixTreeTest.java:126)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:826)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:862)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876)
> at
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
> at
> org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
> at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
> at
> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
> 

Re: [JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.7.0_60-ea-b10) - Build # 9867 - Failure!

2014-03-21 Thread david.w.smi...@gmail.com
I'm definitely looking at it and I've found the problem.  I'm working on a
fix right now.

On Fri, Mar 21, 2014 at 3:27 PM, Michael McCandless <
[email protected]> wrote:

> I someone looking at this test failure?  Should we @BadApple it, or
> revert recent spatial changes, or something?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Mar 21, 2014 at 12:26 PM, Policeman Jenkins Server
>  wrote:
> > Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9867/
> > Java: 64bit/jdk1.7.0_60-ea-b10 -XX:-UseCompressedOops
> -XX:+UseConcMarkSweepGC
> >
> > 1 tests failed.
> > FAILED:
>  org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testWithin
> {#9 seed=[E934CCA05FA676E7:BFB8DC407C97398]}
> >
> > Error Message:
> > Shouldn't match I#4:Rect(minX=48.0,maxX=76.0,minY=-44.0,maxY=27.0)
> Q:Pt(x=120.0,y=0.0)
> >
>


Re: [JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.7.0_60-ea-b10) - Build # 9882 - Still Failing!

2014-03-23 Thread david.w.smi...@gmail.com
I'm looking in to this.


On Sun, Mar 23, 2014 at 5:45 AM, Policeman Jenkins Server <
[email protected]> wrote:

> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9882/
> Java: 64bit/jdk1.7.0_60-ea-b10 -XX:-UseCompressedOops -XX:+UseSerialGC
>
> 1 tests failed.
> FAILED:
>  org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testWithin
> {#3 seed=[270CA22550192CCD:5FB2F89BF67A00A9]}
>
> Error Message:
> Shouldn't match I#2:Rect(minX=104.0,maxX=110.0,minY=-127.0,maxY=-119.0)
> Q:Pt(x=6.0,y=0.0)
>
> Stack Trace:
> java.lang.AssertionError: Shouldn't match
> I#2:Rect(minX=104.0,maxX=110.0,minY=-127.0,maxY=-119.0) Q:Pt(x=6.0,y=0.0)
> at
> __randomizedtesting.SeedInfo.seed([270CA22550192CCD:5FB2F89BF67A00A9]:0)
> at org.junit.Assert.fail(Assert.java:93)
> at
> org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.fail(SpatialOpRecursivePrefixTreeTest.java:358)
> at
> org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.doTest(SpatialOpRecursivePrefixTreeTest.java:338)
> at
> org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testWithin(SpatialOpRecursivePrefixTreeTest.java:120)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:826)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:862)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876)
> at
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
> at
> org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
> at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
> at
> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(Thread

Re: [VOTE] Lucene / Solr 4.7.1 RC1

2014-03-26 Thread david.w.smi...@gmail.com
+1

SUCCESS! [2:13:44.301402]


On Tue, Mar 25, 2014 at 6:46 PM, Steve Rowe  wrote:

> Please vote for the first Release Candidate for Lucene/Solr 4.7.1.
>
> Download it here:
> <
> http://people.apache.org/~sarowe/staging_area/lucene-solr-4.7.1-RC1-rev1581444/
> >
>
> Smoke tester cmdline:
>
> python3.2 -u dev-tools/scripts/smokeTestRelease.py \
>
> http://people.apache.org/~sarowe/staging_area/lucene-solr-4.7.1-RC1-rev1581444/\
> 1581444 4.7.1 /tmp/4.7.1-smoke
>
> The smoke tester passed for me: SUCCESS! [1:08:24.099010]
>
> My vote: +1
>
> Steve
>
>
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Re: [VOTE] Lucene / Solr 4.7.1 RC2

2014-03-31 Thread david.w.smi...@gmail.com
+1

SUCCESS! [1:51:37.952160]


On Sat, Mar 29, 2014 at 4:46 AM, Steve Rowe  wrote:

> Please vote for the second Release Candidate for Lucene/Solr 4.7.1.
>
> Download it here:
> <
> https://people.apache.org/~sarowe/staging_area/lucene-solr-4.7.1-RC2-rev1582953/
> >
>
> Smoke tester cmdline (from the lucene_solr_4_7 branch):
>
> python3.2 -u dev-tools/scripts/smokeTestRelease.py \
>
> https://people.apache.org/~sarowe/staging_area/lucene-solr-4.7.1-RC2-rev1582953/\
> 1582953 4.7.1 /tmp/4.7.1-smoke
>
> The smoke tester passed for me: SUCCESS! [0:50:29.936732]
>
> My vote: +1
>
> Steve
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Re: Welcome Alan Woodward to the PMC

2014-04-02 Thread david.w.smi...@gmail.com
Welcome Alan!
~ David


On Wed, Apr 2, 2014 at 8:23 AM, Steve Rowe  wrote:

> I'm pleased to announce that Alan Woodward has accepted the PMC's
> invitation to join.
>
> Welcome Alan!
>
> - Steve
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread david.w.smi...@gmail.com
Benson, I like your idea.

I think your idea can be achieved as a codec, one that wraps another codec
that establishes the on-disk format.  By default the wrapped codec can be
Lucene's default codec.  I think, if implemented, this would be a change to
DPF instead of an additional DPF-variant codec.

~ David


On Mon, Apr 7, 2014 at 9:22 AM, Benson Margulies wrote:

> On Mon, Apr 7, 2014 at 9:14 AM, Robert Muir  wrote:
> > On Thu, Apr 3, 2014 at 12:27 PM, Benson Margulies 
> wrote:
> >
> >>
> >> My takeaway from the prior conversation was that various people didn't
> >> entirely believe that I'd seen a dramatic improvement in query performance
> >> using D-P-F, and so would not smile upon a patch intended to liberate
> >> D-P-F from codecs. It could be that the effect I saw has to do with
> >> the fact that our system depends on hitting and scoring 50% of the
> >> documents in an index with a lot of documents.
> >>
> >
> > I dont understand the word "liberate" here. why is it such a problem
> > that this is a codec?
>
>  I don't want to have to declare my intentions at the time I create
> the index. I don't want to have to use D-P-F for all readers all the
> time. Because I want to be able to decide to open up an index with an
> arbitrary on-disk format and get the in-memory cache behavior of
> D-P-F. Thus 'liberate' -- split the question of 'keep a copy in
> memory' from the choice of the on-disk format.
>
>
> >
> > i do not think we should give it any more status than that, it wastes
> > too much ram.
>
> It didn't seem like 'waste' when it solved a big practical problem for us. We
> had an application that was too slow, and had plenty of RAM available,
> and we were able to trade space for time by applying D-P-F.
>
> Maybe I'm going about this backwards; if I can come up with a small,
> inconspicuous proposed change that does what I want, there won't be
> any disagreement.
>
>
> >
> > -
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
>
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread david.w.smi...@gmail.com
Aaaah, nice idea to simply use FilterAtomicReader -- of course!  So this
would ultimately be a new IndexReaderFactory that creates
FilterAtomicReaders for a subset of the fields you want to do this on.
Cool!  With that, I don’t think there would be a need for
DirectPostingsFormat as a postings format, would there be?

~ David
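
The layering being discussed here (leave the on-disk format alone, and add an
in-memory cache when the index is opened for reading) is essentially the
decorator pattern.  Below is a self-contained sketch of that idea in plain
Python; the class and method names are invented for illustration and are not
Lucene's actual Codec/FilterAtomicReader APIs:

```python
class DiskPostingsSource:
    """Stand-in for an on-disk postings format: every read hits 'disk'."""
    def __init__(self, data):
        self._data = data      # term -> list of doc ids
        self.disk_reads = 0    # instrumentation for the demo

    def postings(self, term):
        self.disk_reads += 1
        return self._data.get(term, [])


class CachingPostingsSource:
    """Decorator: same interface, but keeps postings in memory after the
    first read -- the wrapped on-disk format is untouched."""
    def __init__(self, delegate):
        self._delegate = delegate
        self._cache = {}

    def postings(self, term):
        if term not in self._cache:
            self._cache[term] = self._delegate.postings(term)
        return self._cache[term]


disk = DiskPostingsSource({"lucene": [1, 4, 7], "solr": [2, 4]})
reader = CachingPostingsSource(disk)
reader.postings("lucene")
reader.postings("lucene")   # second call is served from memory
print(disk.disk_reads)      # -> 1
```

The open question in the thread is where such a wrapper should live: chosen at
index time as a Codec/PostingsFormat, or applied at open time via a
FilterAtomicReader.  The caching mechanics are the same either way.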


On Mon, Apr 7, 2014 at 10:58 AM, Shai Erera  wrote:

> The only problem is how the Codec makes a dynamic decision on whether to
> use the wrapped Codec for reading vs pre-load data into in-memory
> structures, because Codecs are loaded through reflection by the SPI loading
> mechanism.
>
> There is also a TODO in DirectPF to allow wrapping arbitrary PFs, just
> mentioning in case you want to tackle DPF.
>
> I think that if we allowed passing something like a CodecLookupService,
> with an SPILookupService default impl, you could easily pass that to
> DirectoryReader which will use your runtime logic to load the right PF
> (e.g. DPF) instead of the one the index was created with.
>
> But it sounds like the core problem is that when we load a Codec/PF/DVF
> for reading, we cannot pass it any arguments, and so we must make an
> index-time decision about how we're going to read the data later on. If we
> could somehow support that, I think that will help you to achieve what you
> want too.
>
> E.g. currently it's an all-or-nothing decision, but if we could pass a
> parameter like "50% available heap", the Codec/PF/DVF could cache the
> frequently accessed postings instead of loading all of them into memory.
> But, that can also be achieved at the IndexReader level, through a custom
> FilterAtomicReader. And if you could reuse DPF's structures (like
> DirectTermsEnum, DirectFields...), it should be easier to do this. So
> perhaps we can think about a DirectAtomicReader which does that? I believe
> it can share some code w/ DPF, as long as we don't make these APIs public,
> or make them @super.experimental and @super.expert.
>
> Just throwing some ideas...
>
> Shai
>
>
> On Mon, Apr 7, 2014 at 5:35 PM, [email protected] <
> [email protected]> wrote:
>
>> Benson, I like your idea.
>>
>> I think your idea can be achieved as a codec, one that wraps another
>> codec that establishes the on-disk format.  By default the wrapped codec
>> can be Lucene's default codec.  I think, if implemented, this would be a
>> change to DPF instead of an additional DPF-variant codec.
>>
>> ~ David
>>
>>
>> On Mon, Apr 7, 2014 at 9:22 AM, Benson Margulies 
>> wrote:
>>
>>> On Mon, Apr 7, 2014 at 9:14 AM, Robert Muir  wrote:
>>> > On Thu, Apr 3, 2014 at 12:27 PM, Benson Margulies <
>>> [email protected]> wrote:
>>> >
>>> >>
>>> >> My takeaway from the prior conversation was that various people didn't
>>> >> entirely believe that I'd seen a dramatic improvement in query perfo
>>> >> using D-P-F, and so would not smile upon a patch intended to liberate
>>> >> D-P-F from codecs. It could be that the effect I saw has to do with
>>> >> the fact that our system depends on hitting and scoring 50% of the
>>> >> documents in an index with a lot of documents.
>>> >>
>>> >
>>> > I dont understand the word "liberate" here. why is it such a problem
>>> > that this is a codec?
>>>
>>>  I don't want to have to declare my intentions at the time I create
>>> the index. I don't want to have to use D-P-F for all readers all the
>>> time. Because I want to be able to decide to open up an index with an
>>> arbitrary on-disk format and get the in-memory cache behavior of
>>> D-P-F. Thus 'liberate' -- split the question of 'keep a copy in
>>> memory' from the choice of the on-disk format.
>>>
>>>
>>> >
>>> > i do not think we should give it any more status than that, it wastes
>>> > too much ram.
>>>
>>> It didn't seem like 'waste' when it solved a big practical for us. We
>>> had an application that was too slow, and had plenty of RAM available,
>>> and we were able to trade space for time by applying D-P-F.
>>>
>>> Maybe I'm going about this backwards; if I can come up with a small,
>>> inconspicuous proposed change that does what I want, there won't be
>>> any disagreement.
>>>
>>>
>>> >
>>> > -
>>> > To unsubscribe, e-mail: [email protected]
>>> > For additional commands, e-mail: [email protected]
>>> >
>>>
>>> -
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>>
>>
>


Re: Welcome Tim Potter as Lucene/Solr committer

2014-04-07 Thread david.w.smi...@gmail.com
Welcome Tim!


On Tue, Apr 8, 2014 at 12:40 AM, Steve Rowe  wrote:

> I'm pleased to announce that Tim Potter has accepted the PMC's invitation
> to become a committer.
>
> Tim, it's tradition that you introduce yourself with a brief bio.
>
> Once your account has been created - could take a few days - you'll be
> able to add yourself to the committers section of the Who We Are page on
> the website:  (use the ASF CMS
> bookmarklet at the bottom of the page here: <
> https://cms.apache.org/#bookmark> - more info here <
> http://www.apache.org/dev/cms.html>).
>
> Check out the ASF dev page - lots of useful links: <
> http://www.apache.org/dev/>.
>
> Congratulations and welcome!
>
> Steve
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Re: 4.7.2

2014-04-08 Thread david.w.smi...@gmail.com
LOL indeed ;-)
But in all seriousness, that should have no bearing on this conversation.

On Tue, Apr 8, 2014 at 3:00 AM, Alexandre Rafalovitch wrote:

> Let's hope nobody is trying to finish any books right now. :-)
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
>
> On Tue, Apr 8, 2014 at 1:55 PM, Simon Willnauer
>  wrote:
> > +1 to both 4.7.3 and 4.8 soon
> >
> > On Tue, Apr 8, 2014 at 8:40 AM, Uwe Schindler  wrote:
> >> Hi,
> >>
> >> I am fine! I would also like to push the first 4.8 RC builds soon! I
> will check the changes list and open issues and make a proposal soon.
> >>
> >> Uwe
> >>
> >> -
> >> Uwe Schindler
> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> http://www.thetaphi.de
> >> eMail: [email protected]
> >>
> >>
> >>> -Original Message-
> >>> From: Robert Muir [mailto:[email protected]]
> >>> Sent: Monday, April 07, 2014 11:37 PM
> >>> To: [email protected]
> >>> Subject: 4.7.2
> >>>
> >>> Hello,
> >>>
> >>> I would like a 4.7.2 that fixes the corruption bug
> >>> (https://issues.apache.org/jira/browse/LUCENE-5574).
> >>>
> >>> I'd like to build an RC tomorrow night for this (I'll be RM). I think
> its fine if we
> >>> followup with e.g. a 4.7.3 out, but I want to be aggressive about this
> >>> corruption stuff.
> >>>
> >>> Thanks,
> >>> Robert
> >>>
> >>> -
> >>> To unsubscribe, e-mail: [email protected] For
> additional
> >>> commands, e-mail: [email protected]
> >>
> >>
> >> -
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >>
> >
> > -
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
>
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Blog post: Indexing Polygons In Lucene With Accuracy

2014-04-11 Thread david.w.smi...@gmail.com
FYI I published this blog post today:
http://www.opensourceconnections.com/2014/04/11/indexing-polygons-in-lucene-with-accuracy/
There's a strong Spatial4j connection because the SerializedDVStrategy
referenced uses the new BinaryCodec from Spatial4j 0.4.

~ David


Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0_20-ea-b15) - Build # 10597 - Still Failing!

2014-06-18 Thread david.w.smi...@gmail.com
This is not a spatial bug; it’s another case of:
https://issues.apache.org/jira/browse/LUCENE-5713

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


Re: [VOTE] 4.9.0

2014-06-21 Thread david.w.smi...@gmail.com
The smoke tester failed for me:

*lucene-solr_4x_svn*$ python3.3 -u dev-tools/scripts/smokeTestRelease.py
http://people.apache.org/~rmuir/staging_area/lucene_solr_4_9_0_r1604085/
1604085 4.9.0 /Volumes/RamDisk/tmp

JAVA7_HOME is
/Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home

NOTE: output encoding is UTF-8


Load release URL "
http://people.apache.org/~rmuir/staging_area/lucene_solr_4_9_0_r1604085/";...


Test Lucene...

  test basics...

  get KEYS

0.1 MB in 0.69 sec (0.2 MB/sec)

  check changes HTML...

  download lucene-4.9.0-src.tgz...

27.6 MB in 94.12 sec (0.3 MB/sec)

verify md5/sha1 digests

verify sig

verify trust

  GPG: gpg: WARNING: This key is not certified with a trusted signature!

  download lucene-4.9.0.tgz...

61.7 MB in 226.09 sec (0.3 MB/sec)

verify md5/sha1 digests

verify sig

verify trust

  GPG: gpg: WARNING: This key is not certified with a trusted signature!

  download lucene-4.9.0.zip...

71.3 MB in 217.32 sec (0.3 MB/sec)

verify md5/sha1 digests

verify sig

verify trust

  GPG: gpg: WARNING: This key is not certified with a trusted signature!

  unpack lucene-4.9.0.tgz...

verify JAR metadata/identity/no javax.* or java.* classes...

test demo with 1.7...

  got 5727 hits for query "lucene"

check Lucene's javadoc JAR

  unpack lucene-4.9.0.zip...

verify JAR metadata/identity/no javax.* or java.* classes...

test demo with 1.7...

  got 5727 hits for query "lucene"

check Lucene's javadoc JAR

  unpack lucene-4.9.0-src.tgz...

Traceback (most recent call last):

  File "dev-tools/scripts/smokeTestRelease.py", line 1347, in 

  File "dev-tools/scripts/smokeTestRelease.py", line 1291, in main

  File "dev-tools/scripts/smokeTestRelease.py", line 1329, in smokeTest

  File "dev-tools/scripts/smokeTestRelease.py", line 637, in unpackAndVerify

  File "dev-tools/scripts/smokeTestRelease.py", line 708, in verifyUnpacked

RuntimeError: lucene: unexpected files/dirs in artifact
lucene-4.9.0-src.tgz: ['ivy-ignore-conflicts.properties']

And indeed, that file is there.


Re: [VOTE] 4.9.0

2014-06-22 Thread david.w.smi...@gmail.com
Got it.  Turns out I forgot I was on the 4.8 branch for some other reason.
(And yes, Walter, I updated my JDK too.)

 SUCCESS! [1:47:35.355454]

+1  to release

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Sat, Jun 21, 2014 at 3:32 PM, Robert Muir  wrote:

> Not *the* smoketester, instead some outdated arbitrary random
> smoketester from the past.
>
> please, use the latest one from the 4.9 branch.
>
> This file is supposed to be there and the smoketester actually looks for
> it.
>
> On Sat, Jun 21, 2014 at 3:16 PM, [email protected]
>  wrote:
> > The smoke tester failed for me:
> >
> > lucene-solr_4x_svn$ python3.3 -u dev-tools/scripts/smokeTestRelease.py
> > http://people.apache.org/~rmuir/staging_area/lucene_solr_4_9_0_r1604085/
> > 1604085 4.9.0 /Volumes/RamDisk/tmp
> >
> > JAVA7_HOME is
> > /Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home
> >
> > NOTE: output encoding is UTF-8
> >
> >
> > Load release URL
> > "
> http://people.apache.org/~rmuir/staging_area/lucene_solr_4_9_0_r1604085/
> "...
> >
> >
> > Test Lucene...
> >
> >   test basics...
> >
> >   get KEYS
> >
> > 0.1 MB in 0.69 sec (0.2 MB/sec)
> >
> >   check changes HTML...
> >
> >   download lucene-4.9.0-src.tgz...
> >
> > 27.6 MB in 94.12 sec (0.3 MB/sec)
> >
> > verify md5/sha1 digests
> >
> > verify sig
> >
> > verify trust
> >
> >   GPG: gpg: WARNING: This key is not certified with a trusted
> signature!
> >
> >   download lucene-4.9.0.tgz...
> >
> > 61.7 MB in 226.09 sec (0.3 MB/sec)
> >
> > verify md5/sha1 digests
> >
> > verify sig
> >
> > verify trust
> >
> >   GPG: gpg: WARNING: This key is not certified with a trusted
> signature!
> >
> >   download lucene-4.9.0.zip...
> >
> > 71.3 MB in 217.32 sec (0.3 MB/sec)
> >
> > verify md5/sha1 digests
> >
> > verify sig
> >
> > verify trust
> >
> >   GPG: gpg: WARNING: This key is not certified with a trusted
> signature!
> >
> >   unpack lucene-4.9.0.tgz...
> >
> > verify JAR metadata/identity/no javax.* or java.* classes...
> >
> > test demo with 1.7...
> >
> >   got 5727 hits for query "lucene"
> >
> > check Lucene's javadoc JAR
> >
> >   unpack lucene-4.9.0.zip...
> >
> > verify JAR metadata/identity/no javax.* or java.* classes...
> >
> > test demo with 1.7...
> >
> >   got 5727 hits for query "lucene"
> >
> > check Lucene's javadoc JAR
> >
> >   unpack lucene-4.9.0-src.tgz...
> >
> > Traceback (most recent call last):
> >
> >   File "dev-tools/scripts/smokeTestRelease.py", line 1347, in 
> >
> >   File "dev-tools/scripts/smokeTestRelease.py", line 1291, in main
> >
> >   File "dev-tools/scripts/smokeTestRelease.py", line 1329, in smokeTest
> >
> >   File "dev-tools/scripts/smokeTestRelease.py", line 637, in
> unpackAndVerify
> >
> >   File "dev-tools/scripts/smokeTestRelease.py", line 708, in
> verifyUnpacked
> >
> > RuntimeError: lucene: unexpected files/dirs in artifact
> > lucene-4.9.0-src.tgz: ['ivy-ignore-conflicts.properties']
> >
> >
> > And indeed, that file is there.
>
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Re: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 1682 - Still Failing!

2014-07-01 Thread david.w.smi...@gmail.com
Another case of:
https://issues.apache.org/jira/browse/LUCENE-5713
(cause unknown)

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Jul 1, 2014 at 6:08 PM, Policeman Jenkins Server <
[email protected]> wrote:

> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/1682/
> Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseG1GC
>
> 1 tests failed.
> FAILED:
>  
> org.apache.lucene.spatial.prefix.RandomSpatialOpFuzzyPrefixTreeTest.testDisjoint
> {#9 seed=[9DF0DFC458CE610C:BB57C3A241E5FB1A]}
>
> Error Message:
> CheckReader failed
>
> Stack Trace:
> java.lang.RuntimeException: CheckReader failed
> at
> __randomizedtesting.SeedInfo.seed([9DF0DFC458CE610C:BB57C3A241E5FB1A]:0)
> at org.apache.lucene.util.TestUtil.checkReader(TestUtil.java:240)
> at org.apache.lucene.util.TestUtil.checkReader(TestUtil.java:218)
> at
> org.apache.lucene.util.LuceneTestCase.newSearcher(LuceneTestCase.java:1598)
> at
> org.apache.lucene.util.LuceneTestCase.newSearcher(LuceneTestCase.java:1572)
> at
> org.apache.lucene.util.LuceneTestCase.newSearcher(LuceneTestCase.java:1564)
> at
> org.apache.lucene.spatial.SpatialTestCase.commit(SpatialTestCase.java:131)
> at
> org.apache.lucene.spatial.prefix.RandomSpatialOpFuzzyPrefixTreeTest.doTest(RandomSpatialOpFuzzyPrefixTreeTest.java:294)
> at
> org.apache.lucene.spatial.prefix.RandomSpatialOpFuzzyPrefixTreeTest.testDisjoint(RandomSpatialOpFuzzyPrefixTreeTest.java:155)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
> at
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
> at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
> at
> org.apache.lucene.util.TestRuleMarkFail

Re: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 1682 - Still Failing!

2014-07-01 Thread david.w.smi...@gmail.com
Lets discuss on the issue:
https://issues.apache.org/jira/browse/LUCENE-5713?focusedCommentId=14049562&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14049562

On Tue, Jul 1, 2014 at 6:54 PM, Robert Muir  wrote:
>
> most likely the cause is the spatial module itself?
>
> Every other part of lucene uses docvalues, but this one still relies
> on fieldcache (i didnt change it, the apis for adding things are
> somewhat convoluted).
>
> FieldCache is historically lenient, it allows all kinds of nonsense,
> such as uninverting a multi-valued field as single-valued (e.g. leaves
> gaps in ordinals and other bullshit that will cause this assertion to
> fail).
>
> I can fix fieldcache to be strict (since everything else in the
> codebase is now well-behaved), so you get a better exception message
> that what the spatial module is doing is wrong?
>
> On Tue, Jul 1, 2014 at 6:13 PM, [email protected]
>  wrote:
> > Another case of:
> > https://issues.apache.org/jira/browse/LUCENE-5713
> > (cause unknown)
> >
> > ~ David Smiley
> > Freelance Apache Lucene/Solr Search Consultant/Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >

-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 1682 - Still Failing!

2014-07-01 Thread david.w.smi...@gmail.com
On Tue, Jul 1, 2014 at 6:54 PM, Robert Muir  wrote:
> FieldCache is historically lenient, it allows all kinds of nonsense,
> such as uninverting a multi-valued field as single-valued (e.g. leaves
> gaps in ordinals and other bullshit that will cause this assertion to
> fail).
>
> I can fix fieldcache to be strict (since everything else in the
> codebase is now well-behaved), so you get a better exception message
> that what the spatial module is doing is wrong?

If the FieldCache/UninvertingReader is so lenient, then perhaps
TestUtil.checkReader should never try to validate it?

~ David

-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Single Field instance for both DocValues and indexed?

2014-07-03 Thread david.w.smi...@gmail.com
I was experimenting with a user-provided/customized FieldType in
indexing code for (mostly) a set of numeric fields that share a
common type.  The user/developer might want the type to both be
indexed & have docValues, or perhaps just one.  Or maybe stored
hypothetically for the purposes of this discussion.   Even though
Lucene’s FieldType allows you to configure both DocValues &
indexed=true, it appears impossible to provide a single Field instance
with both options; the constructors force an either-or situation.  Of
course I know I could add more fields depending on the options (for
example as seen in Solr’s FieldType); but I think it’s awkward.  It
*seems* that Lucene’s indexing guts (DefaultIndexingChain) are
agnostic of this.  Wouldn’t it be great if you could simply provide a
Field with a value and FieldType (with various options) and it’d just
work?  Higher up the stack (Solr and presumably ElasticSearch), there
are abstractions that basically make this possible, but why not at the
Lucene layer?

~ David

-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: Single Field instance for both DocValues and indexed?

2014-07-03 Thread david.w.smi...@gmail.com
I overlooked a special constructor labelled “Expert” and discovered it
is possible… though I had to override numericValue which seems quite
hacky:

  private static class ComboField extends Field {
    private ComboField(String name, Object value, FieldType type) {
      // this expert constructor allows us to have a field
      // that has docValues & indexed
      super(name, type);
      super.fieldsData = value;
    }

    // Is this a hack?  We assume that numericValue() is only called
    // for DocValues purposes.
    @Override
    public Number numericValue() {
      if (fieldType().numericType() == FieldType.NumericType.DOUBLE)
        return Double.doubleToLongBits(super.numericValue().doubleValue());
      //TODO others
      throw new IllegalStateException(
          "unsupported type: " + fieldType().numericType());
    }
  }

Why isn’t a single Field with both DocValues & indexed, etc., supported
more officially?

Anyway, I’ll go with this for now.  FYI this very class is going to
show up in spatial BBoxStrategy in a new patch soon.

~ David
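
For the curious, the Double.doubleToLongBits conversion used above (turning a
double into the long that numeric DocValues stores) is just a reinterpretation
of the IEEE-754 bit pattern.  A quick self-contained demonstration in Python
via the struct module; the helper names here are mine, not a Lucene or Java
API:

```python
import struct

def double_to_long_bits(x):
    # IEEE-754 bit pattern of a double, as a signed 64-bit int,
    # matching Java's Double.doubleToLongBits for non-NaN values
    return struct.unpack(">q", struct.pack(">d", x))[0]

def long_bits_to_double(bits):
    # inverse, matching Java's Double.longBitsToDouble
    return struct.unpack(">d", struct.pack(">q", bits))[0]

bits = double_to_long_bits(1.5)
print(hex(bits))                        # -> 0x3ff8000000000000
assert long_bits_to_double(bits) == 1.5
```

(Java's doubleToLongBits additionally canonicalizes NaN values; for ordinary
values the raw bit pattern shown here is identical.)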


On Thu, Jul 3, 2014 at 12:48 PM, [email protected]
 wrote:
> I was experimenting with having a user-provided/customized FieldType
> for indexing code of (mostly) a set of numeric fields that are of a
> common type.  The user/developer might want the type to both be
> indexed & have docValues, or perhaps just one.  Or maybe stored
> hypothetically for the purposes of this discussion.   Even though
> Lucene’s FieldType allows you to configure both DocValues &
> indexed=true, it appears impossible to provide a single Field instance
> with both options; the constructors force an either-or situation.  Of
> course I know I could add more fields depending on the options (for
> example as seen in Solr’s FieldType); but I think it’s awkward.  It
> *seems* that Lucene’s indexing guts (DefaultIndexingChain) are
> agnostic of this.  Wouldn’t it be great if you could simply provide a
> Field with a value and FieldType (with various options) and it’d just
> work?  Higher up the stack (Solr and presumably ElasticSearch), there
> are abstractions that basically make this possible, but why not at the
> Lucene layer?
>
> ~ David

-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/ibm-j9-jdk7) - Build # 10760 - Still Failing!

2014-07-08 Thread david.w.smi...@gmail.com
I’m on it; this’ll get fixed momentarily.  There were some inter-related JIRA
issues, and one got committed without the other.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Jul 8, 2014 at 2:54 PM, Policeman Jenkins Server <
[email protected]> wrote:

> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/10760/
> Java: 32bit/ibm-j9-jdk7
> -Xjit:exclude={org/apache/lucene/util/fst/FST.pack(IIF)Lorg/apache/lucene/util/fst/FST;}
>
> 2 tests failed.
> FAILED:  org.apache.lucene.spatial.bbox.TestBBoxStrategy.testOperations
> {#5 seed=[CDF4E70BC4C20419:69B5EDAA7D26DA74]}
>
> Error Message:
> [Contains] Shouldn't match
> I#5:Rect(minX=-150.0,maxX=-80.0,minY=-90.0,maxY=-90.0)
> Q:Rect(minX=-110.0,maxX=-100.0,minY=-90.0,maxY=-90.0)
>
> Stack Trace:
> java.lang.AssertionError: [Contains] Shouldn't match
> I#5:Rect(minX=-150.0,maxX=-80.0,minY=-90.0,maxY=-90.0)
> Q:Rect(minX=-110.0,maxX=-100.0,minY=-90.0,maxY=-90.0)
> at
> __randomizedtesting.SeedInfo.seed([CDF4E70BC4C20419:69B5EDAA7D26DA74]:0)
> at org.junit.Assert.fail(Assert.java:93)
> at
> org.apache.lucene.spatial.prefix.RandomSpatialOpStrategyTestCase.fail(RandomSpatialOpStrategyTestCase.java:126)
> at
> org.apache.lucene.spatial.prefix.RandomSpatialOpStrategyTestCase.testOperation(RandomSpatialOpStrategyTestCase.java:115)
> at
> org.apache.lucene.spatial.prefix.RandomSpatialOpStrategyTestCase.testOperationRandomShapes(RandomSpatialOpStrategyTestCase.java:62)
> at
> org.apache.lucene.spatial.bbox.TestBBoxStrategy.testOperations(TestBBoxStrategy.java:97)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
> at
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
> at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAsserti

Re: Using a patch review tool for Lucene / Solr development.

2014-07-10 Thread david.w.smi...@gmail.com
On Wed, Jul 9, 2014 at 1:34 PM, Mark Miller  wrote:

> A few months ago, I filed INFRA JIRA issue to add the Lucene project to
> review board (https://reviews.apache.org) and it was just resolved (
> https://issues.apache.org/jira/browse/INFRA-7630).
>

Awesome.


> I’m not the biggest fan of review board, but it’s well supported by Apache
> and is sufficient at the key points for a patch review tool I think.
>

Have you considered using GitHub instead?  I’m using that with my GSoC
student, Varun Shenoy, on his fork of the lucene-solr mirror on GitHub.
That is, he has a branch and I’m commenting on his commits.  No need to
upload diff files or have a login to yet another system (doesn’t everyone
have a GitHub account by now?).


> I don’t think we should make this mandatory and that it should just be
> treated as an additional, optional resource, but I wanted to alert people
> to it’s existence and perhaps start a discussion around a few points.
>
> * I’ve been sold more and more over time of the advantages of a review
> tool, especially for large patches. The ability to comment at code points
> and easily view differences between successive patches is super useful.
>
> * I don’t know how I feel about moving comments and discussion for patches
> out of JIRA and into review board. I’m not sure what kind of integration
> there is.
>
> I’m using https://issues.apache.org/jira/browse/SOLR-5656 as a first
> trial issue: https://reviews.apache.org/r/23371/


I think it’s fine that line-number-oriented discussion isn’t in JIRA so
long as the relevant JIRA issue links to the discussion so people know
where to look.  It would be nice if the high-level discussion could be kept
in JIRA, which is more searchable (e.g. McCandless’s jirasearch) and
observable by interested parties.

~ David


Re: Hints on constructing/running Solr analyzer chains standalone

2014-07-12 Thread david.w.smi...@gmail.com
That sounds like a wonderful project, Alexandre — I’ve always wanted such a
capability!

I suggest approaching this very pragmatically based on minimizing the time
to get something useful, which means leveraging as much as is available
already — that means solr’s existing analysis UI screen.  I suggest
modifying the FieldAnalysisRequestHandler could take optional input of a
provided XML fieldType definition in the request instead of using the live
schema.  It would create a new temporary SolrSchema based on the provided
data, then re-use the rest of its field analyzing code based on that
schema.   Disclaimer: I have yet to look at FieldAnalysisRequestHandler.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley
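
Whatever mechanism ends up supplying the fieldType, a standalone runner ultimately just drives Lucene’s TokenStream consumer contract (reset, incrementToken, end, close). Below is a minimal sketch, assuming Lucene 4.x jars on the classpath; StandardAnalyzer is only a stand-in for whatever chain is under test, and the class and method names are made up for illustration:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class AnalyzerRunner {

  // Runs any Analyzer over the given text using the standard TokenStream
  // lifecycle: reset() -> incrementToken()* -> end() -> close().
  public static List<String> analyze(Analyzer analyzer, String field, String text)
      throws IOException {
    List<String> terms = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream(field, new StringReader(text))) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end();
    }
    return terms;
  }

  public static void main(String[] args) throws IOException {
    // StandardAnalyzer stands in for whatever chain the fieldType defines.
    System.out.println(analyze(new StandardAnalyzer(Version.LUCENE_4_9), "f",
        "Hello, Analyzer Chains!"));
  }
}
```

The web UI would run this loop once per analyzer stack and render the resulting terms (and any other attributes of interest) side-by-side.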


On Sat, Jul 12, 2014 at 1:16 PM, Alexandre Rafalovitch 
wrote:

> I don't want to read the schema.xml, but I do want to create factories
> using the same parameters they use in schema. So, it looks like I need
> to play around with ResourceLoaders and maybe SPI loaders, so things
> like wordlists get loaded.
>
> Starting from FieldAnalyzer turned out to be a dead-end because it was
> using pre-initialized field definitions. But starting again from Test
> cases seem to be somewhat more productive.
>
> The idea for the project is to give a web UI where a user can quickly
> put one or more analyzer stacks together and see how it/they perform
> against text (multiple texts). A bit similar to FieldAnalyzer but
> allowing multiple stacks side-by-side and NOT needing to reload
> the core to add new ones. Then, generate the XML definition, ready for
> pasting in. That's the target anyway.
>
> Regards,
>Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>


Re: [JENKINS] Lucene-Solr-SmokeRelease-trunk - Build # 184 - Still Failing

2014-07-18 Thread david.w.smi...@gmail.com
On Fri, Jul 18, 2014 at 10:18 AM, Timothy Potter 
wrote:

> sheisty class


LOL that error is funny.

Tim,
Take a look at the smoke tester around line 268.  It already makes
exceptions for certain Solr contrib modules, and apparently you need to
augment it further.

~ David


Re: Distributed spellcheck

2014-07-29 Thread david.w.smi...@gmail.com
Steve,
File a bug.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Jul 29, 2014 at 9:14 AM, Steve Molloy  wrote:

>
> Hi,
>
> I'm running into an issue with distributed spellcheck and was wondering if
> anyone else faced this. Basically, when a term is misspelled according to
> all shards, but at least one of the shards returns no suggestion, then any
> suggestions returned by other shards are lost so that the final response
> correctly states that the term is misspelled, but offers no suggestions. It
> doesn't seem to affect collations.
>
> I'm looking into the code to figure it out, but if anyone has already seen
> this and has a way of getting it to work, it would save me precious time. :)
>
> Thanks,
> Steve
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Re: Welcome Tomás Fernández Löbbe as Lucene/Solr committer!

2014-07-31 Thread david.w.smi...@gmail.com
Welcome Tomas!

Will you make it to Lucene/Solr Revolution in November?

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Jul 31, 2014 at 1:16 PM, Tomás Fernández Löbbe <
[email protected]> wrote:

> Thanks everyone, I’m really happy to join this great team!
>
> I was born and raised in Argentina, mostly in Buenos Aires but also moved
> through other cities. I studied engineering at the Buenos Aires University.
> I started looking at Solr while working for a healthcare company in
> Argentina, building a search app. After that, I worked at Lucidworks and
> A9, first as consultant and then as developer.
> I recently moved with my wife to California to work at A9’s offices (not
> easy after years of working from home).
>
> Thanks again for inviting me to join this team, I will continue working
> on improving Lucene/Solr as much as I can.
>
> Tomás
>
>
> PS: “Fernández Löbbe” is my last name, there is no middle name there. I
> don’t even try to write my middle name anywhere because that would make the
> task of signing extremely time consuming.
>
>
> On Thu, Jul 31, 2014 at 9:59 AM, Shai Erera  wrote:
>
>> Welcome Tomas!
>> On Jul 31, 2014 7:55 PM, "Shawn Heisey"  wrote:
>>
>>> On 7/31/2014 9:50 AM, Yonik Seeley wrote:
>>> > I'm pleased to announce that Tomás has accepted the PMC's invitation
>>> > to become a Lucene/Solr committer.
>>>
>>> Congrats and welcome!
>>>
>>>
>>> -
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>>
>


Re: [JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0_11) - Build # 11036 - Still Failing!

2014-08-17 Thread david.w.smi...@gmail.com
I’ll look into it.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Sat, Aug 16, 2014 at 10:16 AM, Policeman Jenkins Server <
[email protected]> wrote:

> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/11036/
> Java: 64bit/jdk1.8.0_11 -XX:+UseCompressedOops -XX:+UseG1GC
>
> 1 tests failed.
> FAILED:  org.apache.lucene.spatial.bbox.TestBBoxStrategy.testOperations
> {#19 seed=[631220E490F0DB53:EFFB6AF935D602D2]}
>
> Error Message:
> [BBoxWithin] Shouldn't match
> I#0:Rect(minX=180.0,maxX=180.0,minY=30.0,maxY=40.0)
> Q:Rect(minX=-180.0,maxX=-180.0,minY=-70.0,maxY=50.0)
>
> Stack Trace:
> java.lang.AssertionError: [BBoxWithin] Shouldn't match
> I#0:Rect(minX=180.0,maxX=180.0,minY=30.0,maxY=40.0)
> Q:Rect(minX=-180.0,maxX=-180.0,minY=-70.0,maxY=50.0)
> at
> __randomizedtesting.SeedInfo.seed([631220E490F0DB53:EFFB6AF935D602D2]:0)
> at org.junit.Assert.fail(Assert.java:93)
> at
> org.apache.lucene.spatial.prefix.RandomSpatialOpStrategyTestCase.fail(RandomSpatialOpStrategyTestCase.java:126)
> at
> org.apache.lucene.spatial.prefix.RandomSpatialOpStrategyTestCase.testOperation(RandomSpatialOpStrategyTestCase.java:115)
> at
> org.apache.lucene.spatial.prefix.RandomSpatialOpStrategyTestCase.testOperationRandomShapes(RandomSpatialOpStrategyTestCase.java:62)
> at
> org.apache.lucene.spatial.bbox.TestBBoxStrategy.testOperations(TestBBoxStrategy.java:109)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
> at
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
> at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)

Re: [JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 29500 - Failure!

2014-08-23 Thread david.w.smi...@gmail.com
Rob, just curious, how did you deduce that?  I’m guessing it didn’t
reproduce and the logic in question seemed valid and wasn’t recently
changed, leaving no sure cause except that G1GC isn’t as stable.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Sat, Aug 23, 2014 at 11:20 AM, Robert Muir  wrote:
> g1gc
>
> On Sat, Aug 23, 2014 at 10:45 AM,   wrote:
>> Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/29500/
>>
>> 1 tests failed.
>> REGRESSION:  
>> org.apache.lucene.index.TestNumericDocValuesUpdates.testManyReopensAndFields
>>
>> Error Message:
>> Captured an uncaught exception in thread: Thread[id=32, name=Lucene Merge 
>> Thread #0, state=RUNNABLE, group=TGRP-TestNumericDocValuesUpdates]
>>
>> Stack Trace:
>> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
>> uncaught exception in thread: Thread[id=32, name=Lucene Merge Thread #0, 
>> state=RUNNABLE, group=TGRP-TestNumericDocValuesUpdates]
>> at 
>> __randomizedtesting.SeedInfo.seed([BF0A468E0776ACD:3D0CC647618209D1]:0)
>> Caused by: org.apache.lucene.index.MergePolicy$MergeException: 
>> java.io.EOFException: read past EOF: 
>> RAMInputStream(name=RAMInputStream(name=_2.cfs) [slice=_2_Memory_0.mdvd])
>> at __randomizedtesting.SeedInfo.seed([BF0A468E0776ACD]:0)
>> at 
>> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
>> at 
>> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
>> Caused by: java.io.EOFException: read past EOF: 
>> RAMInputStream(name=RAMInputStream(name=_2.cfs) [slice=_2_Memory_0.mdvd])
>> at 
>> org.apache.lucene.store.RAMInputStream.switchCurrentBuffer(RAMInputStream.java:98)
>> at 
>> org.apache.lucene.store.RAMInputStream.readBytes(RAMInputStream.java:81)
>> at 
>> org.apache.lucene.store.MockIndexInputWrapper.readBytes(MockIndexInputWrapper.java:128)
>> at 
>> org.apache.lucene.store.BufferedChecksumIndexInput.readBytes(BufferedChecksumIndexInput.java:49)
>> at org.apache.lucene.store.DataInput.readBytes(DataInput.java:84)
>> at org.apache.lucene.store.DataInput.skipBytes(DataInput.java:298)
>> at 
>> org.apache.lucene.store.ChecksumIndexInput.seek(ChecksumIndexInput.java:51)
>> at 
>> org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:267)
>> at 
>> org.apache.lucene.codecs.memory.MemoryDocValuesProducer.checkIntegrity(MemoryDocValuesProducer.java:261)
>> at 
>> org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.checkIntegrity(PerFieldDocValuesFormat.java:329)
>> at 
>> org.apache.lucene.index.SegmentDocValuesProducer.checkIntegrity(SegmentDocValuesProducer.java:169)
>> at 
>> org.apache.lucene.index.SegmentReader.checkIntegrity(SegmentReader.java:572)
>> at 
>> org.apache.lucene.index.SegmentMerger.<init>(SegmentMerger.java:60)
>> at 
>> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4205)
>> at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3815)
>> at 
>> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
>> at 
>> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
>>
>>
>>
>>
>> Build Log:
>> [...truncated 405 lines...]
>>[junit4] Suite: org.apache.lucene.index.TestNumericDocValuesUpdates
>>[junit4]   2> aug 23, 2014 6:40:39 FM 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
>>  uncaughtException
>>[junit4]   2> VARNING: Uncaught exception in thread: Thread[Lucene Merge 
>> Thread #0,6,TGRP-TestNumericDocValuesUpdates]
>>[junit4]   2> org.apache.lucene.index.MergePolicy$MergeException: 
>> java.io.EOFException: read past EOF: 
>> RAMInputStream(name=RAMInputStream(name=_2.cfs) [slice=_2_Memory_0.mdvd])
>>[junit4]   2>at 
>> __randomizedtesting.SeedInfo.seed([BF0A468E0776ACD]:0)
>>[junit4]   2>at 
>> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
>>[junit4]   2>at 
>> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
>>[junit4]   2> Caused by: java.io.EOFException: read past EOF: 
>> RAMInputStream(name=RAMInputStream(name=_2.cfs) [slice=_2_Memory_0.mdvd])
>>[junit4]   2>at 
>> org.apache.lucene.store.RAMInputStream.switchCurrentBuffer(RAMInputStream.java:98)
>>[junit4]   2>at 
>> org.apache.lucene.store.RAMInputStream.readBytes(RAMInputStream.java:81)
>>[junit4]   2>at 
>> org.apache.lucene.store.MockIndexInputWrapper.readBytes(MockIndexInputWrapper.java:128)
>>[junit4]   2>at 
>> org.apache.lucene.store.BufferedChecksumIndexInput.readBytes(BufferedC

Re: Issue with bin/solr script, collection1, and cloud mode

2014-08-28 Thread david.w.smi...@gmail.com
Ok.

I wish the router was an explicit option, separate from declaring
numShards.  And furthermore, that it would never be “implicit” unless you
expressly told it to be.  People sometimes get this router because they
forget numShards, thinking “1 is fine anyway” — for now.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Aug 28, 2014 at 2:49 PM, Timothy Potter 
wrote:

> Hi,
>
> Wanted to bring up an issue
> (https://issues.apache.org/jira/browse/SOLR-6447) just to get opinions
> on whether this warrants a re-spin (thinking NO but wanted to be sure
> someone didn't feel otherwise).
>
> Basically, if you use: bin/solr -c, Solr will start in cloud mode and
> since collection1 defines a cores.properties, it gets auto-created as
> a collection. Unfortunately, the script doesn't set -DnumShards, so
> collection1 is getting created using the implicit router and a null
> range.
>
> As for overall impact, I think it's pretty minor but could lead to
> confusion for new users that may want to try splitting collection1
> (unlikely).
>
> For now, I've updated the Ref Guide to guide new users to using:
> bin/solr -e cloud instead, which will prompt the user for number of
> nodes, collection name, num shards, rf, etc. In other words, if users
> do: bin/solr -e cloud, it will help them create a new collection that
> has numShards set correctly.
>
> I don't think there's any big risk here but wanted to be proactive in
> letting folks know this issue exists before we move forward with 4.10.
>
> Cheers,
> Tim
>
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Re: VOTE: Solr Reference Guide for 4.10

2014-09-03 Thread david.w.smi...@gmail.com
+1, at least for the spatial part I looked at

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Sep 3, 2014 at 1:20 PM, Chris Hostetter 
wrote:

>
> +1
>
> :
> https://dist.apache.org/repos/dist/dev/lucene/solr/ref-guide/apache-solr-ref-guide-4.10-RC0
> :
> : $ cat apache-solr-ref-guide-4.10-RC0/apache-solr-ref-guide-4.10.pdf.sha1
> : e7a43acbc3da06f4c65af9067a3850557c665666  apache-solr-ref-guide-4.10.pdf
>
>
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: [VOTE] Move trunk to Java 8

2014-09-12 Thread david.w.smi...@gmail.com
Your arguments really resonate with me, Ryan…
+1 to Java 8

(FWIW I’m coding in Java 8 these days already)

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Fri, Sep 12, 2014 at 1:39 PM, Ryan Ernst  wrote:

> One that is on my mind right now may just barely make it to 1.7 this year.
>>
>
>
>
>> Thus my desire to see a way to get the pending trunk work to people who
>> are not moving to 1.8 any time soon.
>
>
> We should not hold Lucene back because some companies have arcane upgrade
> policies.  Part of what allows policies like this to continue is the
> slowness of the ecosystem to update, both support (we already have this)
> and requirement (what is being proposed).  As I said in the original
> message, we should be ahead of the curve, not the project that is dragging
> behind.
>
> I thought I saw a message go by about a 5x branch the other day, so
>> perhaps things are already exactly what I am asking for
>
>
> This is one proposed alternative to "solve all the trunk problems" (bwc
> and java8).  I think it is a copout (no offense Robert) to avoid forcing an
> agreement by the community to move forward.
>
> Given how long it is likely to be until 6.0, I am not here to argue that
>> 6.0 should not require 1.8
>
>
> But no one knows how long it will be until 5.0 either.  Even after 5.0 is
> released, whenever that may be, if there are those in the community that
> want to stretch life out of the 4x branch on java 7, that is their
> prerogative.
>
> I think the question here is, should trunk be the "blazing forefront of
> development?"  I think it should be, and it seems like many others agree.
>  We should not limit what is possible in trunk because corporate overlords
> are afraid of change.
>
> On Fri, Sep 12, 2014 at 1:24 PM, Benson Margulies 
> wrote:
>
>>
>>
>> On Fri, Sep 12, 2014 at 3:35 PM, Robert Muir  wrote:
>>
>>> On Fri, Sep 12, 2014 at 3:31 PM, Chris Hostetter
>>>  wrote:
>>> >
>>> > b) that your argument against benson's claims seemed missleading: just
>>> > because Oracle is EOLing doesn't mean people won't be using OpenJDK;
>>> even
>>> > if they are using Oracle's JDK, if they are large comercial
>>> organizations
>>> > they might pay oracle to keep using it for a long time.
>>> >
>>>
>>> Its not misleading at all, its being practical. If people want to use
>>> old jvm versions, good for them. But if they open a corruption bug
>>> with one of these "commercial" versions, then my only choice is to
>>> close as "wont fix". So they might as well just use an old lucene
>>> version, too.
>>>
>>
>> Here's what I know. Over the last few years, the large entities my
>> employer sells to have been very slow to move to new Java versions. Why? I
>> dunno, maybe all of them have Mordac working there. Do they pay for
>> security fixes from Oracle? Or do they just stick their heads in the sand?
>> I can't tell you. One that is on my mind right now may just barely make it
>> to 1.7 this year.
>>
>> We (meaning this project, not my employer) generally require that
>> 'significant' changes go into major releases. So, that ties together the
>> JVM version and these changes. Thus my desire to see a way to get the
>> pending trunk work to people who are not moving to 1.8 any time soon. An
>> alternative would be to have a different policy for what can go into a 4.x.
>> I thought I saw a message go by about a 5x branch the other day, so perhaps
>> things are already exactly what I am asking for, and I apologize for the
>> noise. Given how long it is likely to be until 6.0, I am not here to argue
>> that 6.0 should not require 1.8. I like a nice lambda expression as well as
>> the next guy.
>>
>>
>>
>>
>>>
>>> -
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>>
>>
>


Re: [VOTE] Move trunk to Java 8

2014-09-15 Thread david.w.smi...@gmail.com
Ryan,
I’m unclear on what makes a “procedural vote” as such.  This seems to me to
be about code modifications — in a big way as it’s a large change to the
codebase.

~ David


Re: History question: contribution from Solr to Lucene

2014-11-02 Thread david.w.smi...@gmail.com
Alex,
You should follow Yonik’s blog (Heliosearch), he has a post on this
subject, more or less:
http://heliosearch.org/lucene-solr-history/

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Sun, Nov 2, 2014 at 8:36 PM, Alexandre Rafalovitch 
wrote:

> Hi,
>
> I am trying to understand what used to be in Solr pre-merge and got
> moved into Lucene packages after the projects merged. For example
> analyzers/tokenizers, were they always in Lucene or all originally in
> Solr?
>
> I am not sure where to check this quickly, so I am hoping people can
> do a short history or a good URL.
>
> Regards,
>Alex.
>
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Re: An experience and some thoughts about solr/example -> solr/server

2014-11-04 Thread david.w.smi...@gmail.com
+1 Yeah, this would be huge.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Tue, Nov 4, 2014 at 2:30 AM, Jan Høydahl  wrote:

> Also a crucial part here is to add a Collection tab in Admin GUI, and a
> more intelligent Cores tag, so that when people open Admin UI and see
> "There are no cores", they can create one by a few clicks. We already have
> a config_sets folder, perhaps the UI could look for that one and offer to
> create a core/collection based on one of the sets?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 4. nov. 2014 kl. 01.34 skrev Yonik Seeley :
> >
> > On Mon, Nov 3, 2014 at 12:50 PM, Shawn Heisey 
> wrote:
> >> I just ask that this
> >> information be added to the immediately available docs (README.txt and
> >> similar).
> >
> > +1
> >
> > If we look at the old (4.x) README, it shows you how to start the
> > server, get to the admin screen, and index some documents.
> >
> > -Yonik
> > http://heliosearch.org - native code faceting, facet functions,
> > sub-facets, off-heap data
> >
> > -
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
>
>
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Multi-valued fields and TokenStream

2014-11-05 Thread david.w.smi...@gmail.com
Several times now, I’ve had to come up with work-arounds for a TokenStream
not knowing whether it’s processing the first value or a subsequent value of a
multi-valued field.  Two of these times, the use-case was ensuring the
first position of each value started at a multiple of 1000 (or some other
configurable value), and the third was encoding sentence and paragraph counters
(similar to a do-it-yourself position increment).

The work-arounds are awkward and hacky.  For example if you’re in control
of your Tokenizer, you can prefix subsequent values with a special flag,
and then do the right thing in reset().  But then the highlighter or value
retrieval in general is impacted.  It’s also possible to create the fields
with the constructor that accepts a TokenStream that you’ve told it’s the
first or subsequent value but it’s awkward going that route, and sometimes
(e.g. Solr) it’s hard to know all the values you have up-front to even do
that.

It would be nice if TokenStream.reset() took a boolean ‘first’ argument.
Such a change would obviously be backwards incompatible.  Simply
overloading the method to call the no-arg version is problematic because
TokenStreams are a chain, and it would likely result in the chain getting
doubly-reset.

Any ideas?

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley
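
For concreteness, the position bookkeeping involved can be modeled without any Lucene dependency. This is a simplified stand-in (plain Java, a hypothetical helper, not actual Lucene code): the indexer adds the analyzer’s positionIncrementGap between values, then each token advances by its position increment, so a sufficiently large gap lets downstream code infer value boundaries from position jumps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MultiValuePositions {

  // Simplified model of the indexer's position bookkeeping for a
  // multi-valued field: posIncGap is added between values only, and each
  // token then advances by the default position increment of 1.
  static List<Integer> positions(List<String[]> values, int posIncGap) {
    List<Integer> out = new ArrayList<>();
    int pos = -1;                       // position before the first token
    boolean first = true;
    for (String[] tokens : values) {
      if (!first) pos += posIncGap;     // inserted between values only
      first = false;
      for (String ignored : tokens) {
        pos += 1;                       // default position increment
        out.add(pos);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Two values; any token whose position jumps by more than 1 starts a
    // new value, which is what boundary inference would rely on.
    List<Integer> p = positions(
        Arrays.asList(new String[]{"quick", "brown"}, new String[]{"fox"}),
        1000);
    System.out.println(p);              // [0, 1, 1002]
  }
}
```

Note that with a gap of 1000 the second value starts at position 1002 rather than at an exact multiple of 1000, which is part of why inferring value boundaries this way stays approximate.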


Re: Multi-valued fields and TokenStream

2014-11-06 Thread david.w.smi...@gmail.com
Are you suggesting that DefaultIndexingChain.PerField.invert(boolean
firstValue) would, prior to calling reset(), call
setPositionIncrement(Integer.MAX_VALUE), but only when ‘firstValue’ is
false?  H.  I guess that would work, although it seems a bit hacky and
it’s tying this to a specific attribute when ideally we notify the chain as
a whole what’s going on.  But it doesn’t require any new API, save for some
javadocs.  And it’s extremely unlikely there would be a
backwards-incompatible problem, so that’s good.  And I find this use is
related to positions so it’s not so bad to abuse the position increment for
this.  Nice idea Steve; this works for me.

Does anyone else have an opinion before I create an issue?

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Thu, Nov 6, 2014 at 2:13 PM, Steve Rowe  wrote:

> Maybe the position increment gap would be useful?  If set to a value
> larger than likely max position for any individual value, it could be used
> to infer (non-)first-value-ness.
>
> > On Nov 5, 2014, at 1:03 PM, [email protected] wrote:
> >
> > Several times now, I’ve had to come up with work-arounds for a
> TokenStream not knowing it’s processing the first value or a
> subsequent-value of a multi-valued field.  Two of these times, the use-case
> was ensuring the first position of each value started at a multiple of 1000
> (or some other configurable value), and the third was encoding sentence
> paragraph counters (similar to a do-it-yourself position increment).
> >
> > The work-arounds are awkward and hacky.  For example if you’re in
> control of your Tokenizer, you can prefix subsequent values with a special
> flag, and then do the right thing in reset().  But then the highlighter or
> value retrieval in general is impacted.  It’s also possible to create the
> fields with the constructor that accepts a TokenStream that you’ve told
> it’s the first or subsequent value but it’s awkward going that route, and
> sometimes (e.g. Solr) it’s hard to know all the values you have up-front to
> even do that.
> >
> > It would be nice if TokenStream.reset() took a boolean ‘first’
> argument.  Such a change would obviously be backwards incompatible.  Simply
> overloading the method to call the no-arg version is problematic because
> TokenStreams are a chain, and it would likely result in the chain getting
> doubly-reset.
> >
> > Any ideas?
> >
> > ~ David Smiley
> > Freelance Apache Lucene/Solr Search Consultant/Developer
> > http://www.linkedin.com/in/davidwsmiley
>
>
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Re: Multi-valued fields and TokenStream

2014-11-06 Thread david.w.smi...@gmail.com
On Thu, Nov 6, 2014 at 3:19 PM, Robert Muir  wrote:

> Do the concatenation yourself with your own TokenStream. You can index
> a field with a tokenstream for expert cases (the individual stored
> values can be added separately)
>

Yes, but that’s quite awkward and a fair amount of surrounding code when,
in the end, it could be so much simpler if somehow the TokenStream could be
notified.  I’d feel a little better about it if Lucene included the
tokenStream concatenating code (I’ve done a prototype for this, I could
work on it more and contribute) and if the Solr layer had a nice way of
presenting all the values to the Solr FieldType at once instead of
separately — SOLR-4329.


> No need to make the tokenstream API more complicated: its already very
> complicated.
>

Ehh, that’s arguable.  Steve’s suggestion amounts to one line of production
code (javadoc & tests are separate).  If that’s too much then adding a
boolean argument to reset() would feel cleaner, be 0 lines of new code, but
would be backwards-incompatible.  Shrug.

Another idea is if Field.tokenStream(Analyzer analyzer, TokenStream reuse)
had another boolean to indicate first value or not.  I think I like the
other ideas better though.


>
> On Thu, Nov 6, 2014 at 3:13 PM, [email protected]
>  wrote:
> > Are you suggesting that DefaultIndexingChain.PerField.invert(boolean
> > firstValue) would, prior to calling reset(), call
> > setPositionIncrement(Integer.MAX_VALUE), but only when ‘firstValue’ is
> > false?  Hmmm.  I guess that would work, although it seems a bit hacky
> and
> > it’s tying this to a specific attribute when ideally we notify the chain
> as
> > a whole what’s going on.  But it doesn’t require any new API, save for
> some
> > javadocs.  And it’s extremely unlikely there would be a
> > backwards-incompatible problem, so that’s good.  And I find this use is
> > related to positions so it’s not so bad to abuse the position increment
> for
> > this.  Nice idea Steve; this works for me.
> >
> > Does anyone else have an opinion before I create an issue?
> >
> > ~ David Smiley
> > Freelance Apache Lucene/Solr Search Consultant/Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> > On Thu, Nov 6, 2014 at 2:13 PM, Steve Rowe  wrote:
> >>
> >> Maybe the position increment gap would be useful?  If set to a value
> >> larger than likely max position for any individual value, it could be
> used
> >> to infer (non-)first-value-ness.
> >>
> >> > On Nov 5, 2014, at 1:03 PM, [email protected] wrote:
> >> >
> >> > Several times now, I’ve had to come up with work-arounds for a
> >> > TokenStream not knowing it’s processing the first value or a
> >> > subsequent-value of a multi-valued field.  Two of these times, the
> use-case
> >> > was ensuring the first position of each value started at a multiple
> of 1000
> >> > (or some other configurable value), and the third was encoding
> sentence
> >> > paragraph counters (similar to a do-it-yourself position increment).
> >> >
> >> > The work-arounds are awkward and hacky.  For example if you’re in
> >> > control of your Tokenizer, you can prefix subsequent values with a
> special
> >> > flag, and then do the right thing in reset().  But then the
> highlighter or
> >> > value retrieval in general is impacted.  It’s also possible to create
> the
> >> > fields with the constructor that accepts a TokenStream that you’ve
> told it’s
> >> > the first or subsequent value but it’s awkward going that route, and
> >> > sometimes (e.g. Solr) it’s hard to know all the values you have
> up-front to
> >> > even do that.
> >> >
> >> > It would be nice if TokenStream.reset() took a boolean ‘first’
> argument.
> >> > Such a change would obviously be backwards incompatible.  Simply
> overloading
> >> > the method to call the no-arg version is problematic because
> TokenStreams
> >> > are a chain, and it would likely result in the chain getting
> doubly-reset.
> >> >
> >> > Any ideas?
> >> >
> >> > ~ David Smiley
> >> > Freelance Apache Lucene/Solr Search Consultant/Developer
> >> > http://www.linkedin.com/in/davidwsmiley
> >>
> >>
> >> -
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >>
> >
>
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Re: [JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0_40-ea-b09) - Build # 11586 - Failure!

2014-11-08 Thread david.w.smi...@gmail.com
Weird; I can’t reproduce this with the given Ant invocation.  I used
JDK 1.8.0_20.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Sat, Nov 8, 2014 at 4:50 PM, Policeman Jenkins Server <
[email protected]> wrote:

> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/11586/
> Java: 64bit/jdk1.8.0_40-ea-b09 -XX:+UseCompressedOops -XX:+UseParallelGC
> (asserts: false)
>
> 2 tests failed.
> FAILED:
> junit.framework.TestSuite.org.apache.lucene.spatial.vector.TestPointVectorStrategy
>
> Error Message:
> Suite timeout exceeded (>= 720 msec).
>
> Stack Trace:
> java.lang.Exception: Suite timeout exceeded (>= 720 msec).
> at __randomizedtesting.SeedInfo.seed([D20594D8F0B5066A]:0)
>
>
> REGRESSION:
> org.apache.lucene.spatial.vector.TestPointVectorStrategy.testCitiesIntersectsBBox
>
> Error Message:
> Test abandoned because suite timeout was reached.
>
> Stack Trace:
> java.lang.Exception: Test abandoned because suite timeout was reached.
> at __randomizedtesting.SeedInfo.seed([D20594D8F0B5066A]:0)
>
>
>
>
> Build Log:
> [...truncated 9728 lines...]
>[junit4] Suite: org.apache.lucene.spatial.vector.TestPointVectorStrategy
>[junit4]   2> Nov 08, 2014 1:49:51 PM
> org.apache.lucene.spatial.StrategyTestCase executeQueries
>[junit4]   2> INFO: testing queried for strategy PointVectorStrategy
> field:TestPointVectorStrategy ctx=SpatialContext.GEO
>[junit4]   2> Nov 08, 2014 3:49:49 PM
> com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate
>[junit4]   2> WARNING: Suite execution timed out:
> org.apache.lucene.spatial.vector.TestPointVectorStrategy
>[junit4]   2>  jstack at approximately timeout time 
>[junit4]   2> "Lucene Merge Thread #0" ID=16 RUNNABLE
>[junit4]   2>at
> org.apache.lucene.util.fst.PairOutputs.subtract(PairOutputs.java:132)
>[junit4]   2>at
> org.apache.lucene.util.fst.PairOutputs.subtract(PairOutputs.java:32)
>[junit4]   2>at
> org.apache.lucene.util.fst.PairOutputs.subtract(PairOutputs.java:133)
>[junit4]   2>at
> org.apache.lucene.util.fst.PairOutputs.subtract(PairOutputs.java:32)
>[junit4]   2>at
> org.apache.lucene.util.fst.Builder.add(Builder.java:419)
>[junit4]   2>at
> org.apache.lucene.codecs.simpletext.SimpleTextFieldsReader$SimpleTextTerms.loadTerms(SimpleTextFieldsReader.java:566)
>[junit4]   2>at
> org.apache.lucene.codecs.simpletext.SimpleTextFieldsReader$SimpleTextTerms.(SimpleTextFieldsReader.java:527)
>[junit4]   2>at
> org.apache.lucene.codecs.simpletext.SimpleTextFieldsReader.terms(SimpleTextFieldsReader.java:676)
>[junit4]   2>- locked
> org.apache.lucene.codecs.simpletext.SimpleTextFieldsReader@363dc726
>[junit4]   2>at
> org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms(PerFieldPostingsFormat.java:280)
>[junit4]   2>at
> org.apache.lucene.index.LeafReader.terms(LeafReader.java:218)
>[junit4]   2>at
> org.apache.lucene.index.SimpleMergedSegmentWarmer.warm(SimpleMergedSegmentWarmer.java:48)
>[junit4]   2>at
> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4089)
>[junit4]   2>at
> org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3548)
>[junit4]   2>at
> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
>[junit4]   2>at
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
>[junit4]   2>
>[junit4]   2>
> "TEST-TestPointVectorStrategy.testCitiesIntersectsBBox-seed#[D20594D8F0B5066A]"
> ID=15 TIMED_WAITING on org.apache.lucene.index.IndexWriter@7658e030
>[junit4]   2>at java.lang.Object.wait(Native Method)
>[junit4]   2>- timed waiting on
> org.apache.lucene.index.IndexWriter@7658e030
>[junit4]   2>at
> org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:4178)
>[junit4]   2>at
> org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:2225)
>[junit4]   2>at
> org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:947)
>[junit4]   2>at
> org.apache.lucene.index.IndexWriter.close(IndexWriter.java:991)
>[junit4]   2>at
> org.apache.lucene.index.RandomIndexWriter.close(RandomIndexWriter.java:366)
>[junit4]   2>at
> org.apache.lucene.spatial.SpatialTestCase.tearDown(SpatialTestCase.java:97)
>[junit4]   2>at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>[junit4]   2>at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>[junit4]   2>at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>[junit4]   2>at java.lang.reflect.Method.invoke(Method.java:497)
>

Re: Welcome Gregory Chanan as Lucene/Solr committer

2014-09-20 Thread david.w.smi...@gmail.com
Welcome!

On Friday, September 19, 2014, Steve Rowe  wrote:

> I'm pleased to announce that Gregory Chanan has accepted the PMC's
> invitation to become a committer.
>
> Gregory, it's tradition that you introduce yourself with a brief bio.
>
> Mark Miller, the Lucene PMC chair, has already added your "gchanan"
> account to the “lucene" LDAP group, so you now have commit privileges.
> Please test this by adding yourself to the committers section of the Who We
> Are page on the website:  (use
> the ASF CMS bookmarklet at the bottom of the page here: <
> https://cms.apache.org/#bookmark> - more info here <
> http://www.apache.org/dev/cms.html>).
>
> Since you’re a committer on the Apache HBase project, you probably already
> know about it, but I'll include a link to the ASF dev page anyway - lots of
> useful links: .
>
> Congratulations and welcome!
>
> Steve
>
>
> -
> To unsubscribe, e-mail: [email protected] 
> For additional commands, e-mail: [email protected] 
>
>

-- 
Sent from Gmail Mobile


Re: Lucene Benchmark

2014-09-24 Thread david.w.smi...@gmail.com
I use the benchmark module for spatial and I intend to for highlighting
performance next month.

On Wednesday, September 24, 2014, Mikhail Khludnev <
[email protected]> wrote:

> Hi John,
>
> It's obvious
> http://lucene.apache.org/core/4_8_0/benchmark/org/apache/lucene/benchmark/byTask/package-summary.html
> It's also described in LUA. I just got into it and understood how to use
> it. Feel free to ask if you face any difficulties.
>
> Beware that Lucene devs use
> https://code.google.com/a/apache-extras.org/p/luceneutil/
> http://blog.mikemccandless.com/2011/04/catching-slowdowns-in-lucene.html
> I didn't get into it, just know that it reports fancy tables which you can
> meet in performance optimization jiras.
>
>
> On Wed, Sep 24, 2014 at 10:45 AM, John Wang  > wrote:
>
>> Hi guys:
>>
>>  Can you guys point me to some details on the Lucene Benchmark
>> module? Specifically the grammar/syntax for the Algorithm files?
>>
>> Thanks
>>
>> -John
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


-- 
Sent from Gmail Mobile


Re: Question for D. Smiley

2014-08-05 Thread david.w.smi...@gmail.com
Hi Erick,

The field type for LatLonType mandates a subFieldSuffix or subFieldType
attribute, and so I think there’s clearly a problem if you don’t provide a
field type that’s going to match it.  The default schema even has a comment
on the dynamicField definition for *_coordinate that it’s needed.  I think
this is enough but if you want to improve the situation further, be my
guest.

By the way, next time title the subject appropriately and then CC me if you
want to ensure I see it.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Aug 5, 2014 at 4:53 PM, Erick Erickson 
wrote:

> Just had a situation where searching on a successfully indexed document
> with a location type field errored out. It turns out that the problem was
> the following:
>
> 1> the user removed the *_coordinate dynamic field definition
> 2> the user had the universal ignore dynamic field un-commented, as:
>  
>
>
> So, (and I'm guessing a bit here) the underlying code for adding the
> lat/lon as location_1_coordinate and location_0_coordinate did its thing,
> and then the document was successfully added to the index... by throwing
> location_1_coordinate and location_0_coordinate on the floor. So of course
> when he tried to use bbox (in an fq clause in this case) he got an error
> message about requiring indexed fields to work from.
>
> The long and short of it is whether it's worth opening a JIRA? Something
> like "geospatial should check dynamic field definitions and fail if they
> aren't properly defined"? If so, I''ll do it.
>
> Thanks!
> Erick
>
>


Re: Can't assign jiras to myself

2014-08-06 Thread david.w.smi...@gmail.com
Tomás, I put you into the “Committers” role for Lucene & Solr in JIRA just
now.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Aug 6, 2014 at 9:51 PM, Tomás Fernández Löbbe  wrote:

> May I be missing the "committer role" in Jira?
>
> https://wiki.apache.org/lucene-java/CommittersResources
>
> Tomás
>


Re: how to do auto suggestion using apache lucene?

2014-10-01 Thread david.w.smi...@gmail.com
On Wed, Oct 1, 2014 at 9:19 AM, Alexandre Rafalovitch 
wrote:

> https://github.com/arafalov/Solr-Javadoc/tree/master/SearchServer


Pretty cool, Alex!

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


Highlighters, accurate highlighting, and the PostingsHighlighter

2014-10-09 Thread david.w.smi...@gmail.com
I’m working on making highlighting both accurate and fast.  By “accurate”,
I mean the highlights need to accurately reflect a match given the query
and various possible query types (to include SpanQueries and
MultiTermQueries and obviously phrase queries and the usual suspects).  The
fastest highlighter we’ve got in Lucene is the PostingsHighlighter but it
throws out any positional nature in the query and can highlight more
inaccurately than the other two highlighters. The most accurate is the
default highlighter, although I can see some simplifications it makes that
could lead to inaccuracies.

The default highlighter’s “WeightedSpanTermExtractor” is interesting — it
uses a MemoryIndex built from re-analyzing the text, and it executes the
query against this mini index; kind of.  A recent experiment I did was to
have the MemoryIndex essentially wrap the “Terms” from term vectors.  It
works and saves memory, although, at least for large docs (which I’m
optimizing for) the real performance hit is in un-inverting the TokenStream
in TokenSources, which includes sorting the thousands of tokens -- assuming you
index term vectors of course.  But with my attention now on the
PostingsHighlighter (because it’s the fastest and offsets are way cheaper
than term vectors), I believe WeightedSpanTermExtractor could simply use
Lucene’s actual IndexReader — no?  It seems so obvious to me now I wonder
why it wasn’t done this way in the first place — all WSTE has to do is
advance() to the document being highlighted for applicable terms.  Am I
overlooking something?

WeightedSpanTermExtractor is somewhat accurate but my reading of its source
shows it takes short-cuts I’d like to eliminate.  For example if the query
is “(A && B) || (C && D)” and if the document doesn’t have ‘D’ then it
should ideally NOT highlight ‘C’ in this document, just ‘A’ and ‘B’.  I
think I can solve that using Scorers.getChildScorers to see which scorers
(and thus queries) actually matched.  Another example is that it views
SpanQueries at the top level only and records the entire span for all terms
it is comprised of.  So if you had a couple Phrase SpanQueries (actually
ordered 0-slop SpanNearQueries) joined by a SpanNearQuery to be within ~50
positions of each other, I believe it would highlight any other occurrence
of the words involved in-between the sub-SpanQueries. This looks hard to
solve but I think for starters, SpanScorer needs a getter for the Spans
instance, and furthermore Spans needs getChildSpans() just as Scorers
expose child scorers.  I could see myself relaxing this requirement because
of its complexity and simply highlighting the entire span, even if it
could be a big highlight.
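As a toy illustration of the clause-aware term collection described above (plain Java, not WeightedSpanTermExtractor’s actual API — the node types and method names here are invented for the sketch): given a boolean query tree and the set of terms a document contains, collect only terms from conjunctions that fully matched, so for “(A && B) || (C && D)” a document holding {A, B, C} highlights A and B but not C.

```java
import java.util.*;

// Hypothetical sketch, not Lucene code: collect highlightable terms only from
// clauses whose whole conjunction matched the document.
public class BooleanAwareHighlightTerms {

    interface Node {
        boolean matches(Set<String> docTerms);
        void collect(Set<String> docTerms, Set<String> out);
    }

    record Term(String text) implements Node {
        public boolean matches(Set<String> docTerms) { return docTerms.contains(text); }
        public void collect(Set<String> docTerms, Set<String> out) {
            if (matches(docTerms)) out.add(text);
        }
    }

    record And(List<Node> clauses) implements Node {
        public boolean matches(Set<String> docTerms) {
            return clauses.stream().allMatch(c -> c.matches(docTerms));
        }
        public void collect(Set<String> docTerms, Set<String> out) {
            // only descend if the whole conjunction matched -- the key accuracy point
            if (matches(docTerms)) clauses.forEach(c -> c.collect(docTerms, out));
        }
    }

    record Or(List<Node> clauses) implements Node {
        public boolean matches(Set<String> docTerms) {
            return clauses.stream().anyMatch(c -> c.matches(docTerms));
        }
        public void collect(Set<String> docTerms, Set<String> out) {
            clauses.forEach(c -> c.collect(docTerms, out));
        }
    }

    static Set<String> termsToHighlight(Node query, Set<String> docTerms) {
        Set<String> out = new TreeSet<>();
        query.collect(docTerms, out);
        return out;
    }

    public static void main(String[] args) {
        Node query = new Or(List.of(
            new And(List.of(new Term("A"), new Term("B"))),
            new And(List.of(new Term("C"), new Term("D")))));
        // doc contains A, B, C but not D -> only A and B should highlight
        System.out.println(termsToHighlight(query, Set.of("A", "B", "C"))); // [A, B]
    }
}
```

The real fix in Lucene would of course consult actual Scorers (e.g. via getChildScorers) rather than a term set, but the pruning logic is the same shape.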

Perhaps the “Nuke Spans” effort might make this all much easier but I
haven’t looked yet because that work isn’t finished.  It’s encouraging to
see Alan making recent progress there.

Any thoughts about any of this, guys?

p.s. When I’m done, I expect to have no problem getting open-source
permission from the sponsor commissioning this effort.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


Re: Highlighters, accurate highlighting, and the PostingsHighlighter

2014-10-10 Thread david.w.smi...@gmail.com
On Fri, Oct 10, 2014 at 6:39 AM, Michael McCandless <
[email protected]> wrote:

> +1 for a "completely accurate" (each snippet shown matches the query)
> and fast highlighter, but it's a real challenge because you need a
> clean way to recursively iterate all positions for any (even
> non-positional) queries (what LUCENE-2878 will give us).  To properly
> handle your (+A +B) (+C +D) example, you'd need BooleanQuery to
> participate in enumerating the positions...
>

My plan for that is to convert TermQueries to something similar that gets a
docsAndPositionsEnum (with offsets) instead of a plain DocsEnum.  The code
that navigates the graph can cast it to get what it needs.  Alternatively,
I thought perhaps I might wrap the IndexReader on down with pass-throughs
but ensure that you always get positions (with offsets) even when you don’t
ask for it, and then I’ll keep track of each instance for retrieval later.
Though somehow I’d need to map the Query to the tracked positions
enumerators, and this sounds like more work so I probably won’t go this
route.

I plan to convert the Query tree to an equivalent (for highlighter
purposes) comprised of BooleanQuery, TermQuery (some custom similar one,
actually), MultiTermQueries (again, some custom variant), and SpanQueries —
phrase queries get converted to those.

~ David


Re: Highlighters, accurate highlighting, and the PostingsHighlighter

2014-10-10 Thread david.w.smi...@gmail.com
On Fri, Oct 10, 2014 at 7:13 AM, Robert Muir  wrote:

> On Fri, Oct 10, 2014 at 12:38 AM, [email protected]
>  wrote:
> > The fastest
> > highlighter we’ve got in Lucene is the PostingsHighlighter but it throws
> out
> > any positional nature in the query and can highlight more inaccurately
> > than the other two highlighters.
> >
>
> That's because it tries to summarize the document contents wrt the
> query, so the user can decide if its relevant (versus being a debugger
> for span queries, or whatever). The algorithms used to do this don't
> really get benefits from positions, because they are the same ones
> used for regular IR.


> In short, the "inaccuracy" is important, because this highlighter is
> trying to do something different than the other highlighters.
>

I’m confused how inaccuracy is a feature, but nevertheless I appreciate
that the postings highlighter as-is is good enough for most users.  Thanks
for your awesome work on this highlighter, by the way!


> The reason it might be faster in comparison has less to do with the
> fact it reads offsets from the postings lists and more to do with the
> fact it does not have bad O(n^2) etc algorithms that the other
> highlighters do. It's not faster: it just does not blow up.
>

Well, it isn’t cheap to re-analyze the document text (what the default
highlighter does) nor to read term-vectors and sort the tokens (what the
default highlighter does when term vectors are available).  At least not
with big docs (lots of text to analyze or large term vectors to read and
sort).  My first steps were to try and make the default highlighter faster
but it still isn’t fast enough and it isn’t accurate enough either (for me).

I looked at the FVH a little but thought I’d skip the heft of term vectors
and use PostingsHighlighter, now that I’m willing to break open these
complex beasts and build what’s needed to meet my accuracy requirements.

Do you foresee any O(n^2) algorithms in what I’ve said?

> I don't think you can safely make this highlighter do what you would
> like without compromising these goals (relevance of passages, and not
> blowing up): for a phrase or span, how can you compute the
> within-document freq() without actually reading all those positions
> (means blowing up)? With terms its simple, effective, and does not
> blow up: freq() -> IDF. Its the same term dependence issue from
> regular scoring, not going to be solved in an email to lucene jira
> list. The best I can do that is safe is
> https://issues.apache.org/jira/browse/LUCENE-4909, and nobody seemed
> interested, so it sits.
>

I plan to make simple approximations to score one passage relative to
another.  The passage with the most diversity in query terms wins, or at
least is the highest scoring factor. Then, low within-doc-freq (on a
per-term basis).  Then, high freq in the passage.  Then, shortness of
passage and closeness to the beginning.  In short, something fast to
compute and pretty reasonable — my principal requirement is highlighting
accuracy, and needs to support a lot of query types (incl. custom span
queries).
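The approximation described above can be sketched in plain Java. Everything here — the weights, the formula, the parameter names — is invented for illustration; it is not PostingsHighlighter’s PassageScorer, just one way to make term diversity dominate, then within-doc rarity, then in-passage frequency, then brevity and earliness.

```java
import java.util.Map;

// Hypothetical passage-scoring heuristic (weights are arbitrary for the sketch).
public class PassageScorerSketch {

    /**
     * @param passageTermFreqs  term -> frequency within this passage
     * @param queryTermDocFreq  query term -> frequency within the whole document
     */
    static double score(Map<String, Integer> passageTermFreqs,
                        Map<String, Integer> queryTermDocFreq,
                        int passageLength, int passageStartOffset, int docLength) {
        // 1. diversity: how many distinct query terms appear in the passage
        long diversity = passageTermFreqs.keySet().stream()
            .filter(queryTermDocFreq::containsKey).count();
        // 2. rarity: favor terms with low within-document frequency
        double rarity = passageTermFreqs.keySet().stream()
            .filter(queryTermDocFreq::containsKey)
            .mapToDouble(t -> 1.0 / queryTermDocFreq.get(t)).sum();
        // 3. raw frequency of query terms inside the passage
        int freqInPassage = passageTermFreqs.entrySet().stream()
            .filter(e -> queryTermDocFreq.containsKey(e.getKey()))
            .mapToInt(Map.Entry::getValue).sum();
        // 4. prefer short passages near the start of the document
        double brevity = 1.0 / (1 + Math.log(1 + passageLength));
        double earliness = 1.0 - (double) passageStartOffset / Math.max(1, docLength);
        // weight tiers so each factor dominates the ones after it
        return diversity * 1000 + rarity * 100 + freqInPassage * 10 + brevity + earliness;
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreqs = Map.of("apple", 3, "pear", 2);
        double twoTerms = score(Map.of("apple", 1, "pear", 1), docFreqs, 50, 0, 1000);
        double oneTermRepeated = score(Map.of("apple", 3), docFreqs, 50, 0, 1000);
        // a passage covering two distinct query terms beats one term repeated
        System.out.println(twoTerms > oneTermRepeated); // true
    }
}
```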


> So IMO, for scoring spans or intervals or whatever, a different
> highlighter is needed that makes some compromises (worse relevance,
> willingness to blow up). Hopefully they would be contained so that
> most users aren't impacted heavily and blowing up or getting badly
> ranked sentences. But I don't think we should make it so
> PostingsHighlighter can blow up. There are already two other
> highlighters for that.
>

Ok; I’m not sure yet how much from the PostingsHighlighter I’ll re-use but
there is a lot of it that is pertinent to my aims.  So much so, probably,
that I can see it being a subclass, or at least belong in the same
package.  It uses postings/offsets (not term vectors, and without
re-analyzing text).

Thanks for your input, Rob.

~ David


Re: Highlighters, accurate highlighting, and the PostingsHighlighter

2014-10-10 Thread david.w.smi...@gmail.com
Spot on Walter.  “trust in the engine” is precisely how it was put to me
for a technical user-community that I’m aiming this at.  When they see
“wrong” highlighting, they lose trust.  One might argue it’s a matter of
educating the user but I think it’s a reasonable requirement (a reasonable
thing to want of Lucene).

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Fri, Oct 10, 2014 at 12:55 PM, Walter Underwood 
wrote:

> I think of snippets and highlighting as explaining to the end user why the
> engine decided this was relevant. This tends to increase the user’s trust
> in the engine even when the results are not relevant.
>
> wunder
> Walter Underwood
> [email protected]
> http://observer.wunderwood.org/
>
>
> On Oct 10, 2014, at 9:37 AM, Uwe Schindler  wrote:
>
> Hi,
>
> > I’m confused how inaccuracy is a feature, but nevertheless I appreciate
> that the postings highlighter as-is is good enough for most users.  Thanks
> for your awesome work on this highlighter, by the way!
>
> The problem here are 2 different opinions how highlighting should look
> like. What is always wanted by most “technical” people is **not**
> “highlighting” like “showing where the search terms match in a specific
> document to make the user himself allow to ‘relevance test’ a specific
> result”, instead technical people want to have “query debugging”: exactly
> showing why a query matches. But this is not what highlighting was made for
>  *(especially not postings highlighter!).*
>
> I think Robert’s intention behind the postings highlighter is – and I
> fully think he is right – is to just give the “end user” (not “technical
> user”) a quick overview of where the terms match in a document, completely
> ignoring the type of query. You just want to get a quick context in the
> document where the terms of your query match. I always explain it to
> customers like “allow the end user to relevance rank the document
> themselves”.
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>

Re: Solr: very slow custom Query/Weight/Scorer, "post filtering" vs sorting

2014-10-14 Thread david.w.smi...@gmail.com
On Mon, Oct 13, 2014 at 11:04 AM, Patrick Schemitz  wrote:

> This Query/Weight/Scorer construct is obviously very costly, so I don't
> want it to leapfrog with the other - much faster - filters in the query
> (especially when using a high threshold).
>

It’s leap-frogging with filters?  That’s strange; filters should be running
first, assuming the default cache=true on them.  If for some reason you
want to not cache filters or other queries, then you could have your
QueryParser parse those queries as additional parameters, and then
construct a FilteredQuery configured with your query
and QUERY_FIRST_FILTER_STRATEGY.

Cheers,
  David


Re: Change the name of the implicit router in SolrCloud?

2014-10-16 Thread david.w.smi...@gmail.com
+1 for “manual”.

Furthermore, I think specifying the router should become mandatory or
default to the hash-based router.  For back-compat, we can keep current
behavior but output a warning about what choice was made.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley
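(As a conceptual aside: the hash-based routing being discussed amounts to deriving the shard from the uniqueKey, whereas “implicit”/manual routing indexes on whichever shard received the document. The sketch below is only an illustration — Solr’s compositeId router actually uses MurmurHash3 over per-shard hash ranges, not a simple modulo, and the function name here is made up.)

```java
// Illustrative only -- NOT Solr's routing code. Shows the idea of hash routing:
// the same uniqueKey always lands on the same shard, independent of which
// node received the update.
public class ShardRoutingSketch {

    static int hashRoute(String uniqueKey, int numShards) {
        // floorMod keeps the result in [0, numShards) even for negative hashCodes
        return Math.floorMod(uniqueKey.hashCode(), numShards);
    }

    public static void main(String[] args) {
        int shard = hashRoute("doc42", 24);
        // deterministic: re-routing the same key picks the same shard
        System.out.println(shard == hashRoute("doc42", 24)); // true
    }
}
```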

On Thu, Oct 16, 2014 at 11:03 AM, Shawn Heisey  wrote:

> I had this exchange with an IRC user named "kindkid" this morning:
>
> -
> 08:30 < kindkid> I'm using sharding with the implicit router, but I'm
> seeing
>  all my documents end up on just one of my 24 shards. What
>  might be causing this? (4.10.0)
> 08:35 <@elyograg> kindkid: you used the implicit router.  that means that
>   documents will be indexed on the shard you sent them
> to, not
>   routed elsewhere.
> 08:37 < kindkid> oh. wow. not sure where I got the idea, but I was under
> the
>  impression that implicit router would use a hash of the
>  uniqueKey modulo number of shards to pick a shard.
> 08:38 <@elyograg> I think you probably wanted the compositeId router.
> 08:39 <@elyograg> implicit is not a very good name.  It's technically
> correct,
>   but the meaning of the word is not well known.
> 08:39 <@elyograg> "manual" would be a better name.
> -
>
> The word "implicit" has a very specific meaning, and I think it's
> absolutely correct terminology for what it does, but I don't think that
> it's very clear to a typical person.  This is not the first time I've
> encountered the confusion.
>
> Could we deprecate the implicit name and use something much more
> descriptive and easily understood, like "manual" instead?  Let's go
> ahead and accept implicit in 5.x releases, but issue a warning in the
> log.  Maybe we can have a startup system property or a config option
> that will force the name to be updated in zookeeper and get rid of the
> warning.  If we do this, my bias is to have an upgrade to 6.x force the
> name change in zookeeper.
>
> Thanks,
> Shawn
>
>
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Re: Change the name of the implicit router in SolrCloud?

2014-10-16 Thread david.w.smi...@gmail.com
Not if you don’t specify numShards, and then you can’t shard-split later.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Thu, Oct 16, 2014 at 11:18 AM, Yonik Seeley 
wrote:

> On Thu, Oct 16, 2014 at 11:15 AM, [email protected]
>  wrote:
> > +1 for “manual”.
> >
> > Furthermore, I think specifying the router should become mandatory or
> > default to the has based router.
>
> That is the current default (compositeId)
>
> -Yonik
> http://heliosearch.org - native code faceting, facet functions,
> sub-facets, off-heap data
>
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Re: Inconsistency in the suggester factories

2015-02-28 Thread david.w.smi...@gmail.com
On Sat, Feb 28, 2015 at 2:48 PM, Erick Erickson 
wrote:

> I think this is worth a JIRA, anyone else got an opinion? And what
> should we do here? Just use one or the other? Remove the one we decide
> against? Allow both as synonyms? Deprecate one?
>

Standardize on one, the other becomes a deprecated synonym.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0_40-ea-b22) - Build # 11897 - Still Failing!

2015-02-28 Thread david.w.smi...@gmail.com
Ha!  Randomized testing FTW!

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Sat, Feb 28, 2015 at 10:29 AM, Michael McCandless <
[email protected]> wrote:

> I committed a fix .. this was a fun one: SimpleText had a bug where if
> you indexed a SORTED doc value with the string value "END", its
> checkIntegrity got confused and falsely detected corruption.
>
> It just took our random tests this long to index the string "END" ...
>
> Soon we will be indexing the full works of Shakespeare...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Sat, Feb 28, 2015 at 7:28 AM, Policeman Jenkins Server
>  wrote:
> > Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/11897/
> > Java: 32bit/jdk1.8.0_40-ea-b22 -server -XX:+UseConcMarkSweepGC
> >
> > 1 tests failed.
> > FAILED:
> org.apache.lucene.codecs.simpletext.TestSimpleTextDocValuesFormat.testSortedFixedLengthVsStoredFields
> >
> > Error Message:
> > SimpleText failure: expected checksum line but got length 3
> (resource=BufferedChecksumIndexInput(MockIndexInputWrapper(_w.dat)))
> >
> > Stack Trace:
> > org.apache.lucene.index.CorruptIndexException: SimpleText failure:
> expected checksum line but got length 3
> (resource=BufferedChecksumIndexInput(MockIndexInputWrapper(_w.dat)))
> > at
> __randomizedtesting.SeedInfo.seed([4879A5F99AD2035B:A4FCD66955DBA1EC]:0)
> > at
> org.apache.lucene.codecs.simpletext.SimpleTextUtil.checkFooter(SimpleTextUtil.java:90)
> > at
> org.apache.lucene.codecs.simpletext.SimpleTextDocValuesReader.checkIntegrity(SimpleTextDocValuesReader.java:527)
> > at
> org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:135)
> > at
> org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:143)
> > at
> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
> > at
> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3928)
> > at
> org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3509)
> > at
> org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:40)
> > at
> org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1798)
> > at
> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2733)
> > at
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2838)
> > at
> org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2805)
> > at
> org.apache.lucene.index.RandomIndexWriter.commit(RandomIndexWriter.java:252)
> > at
> org.apache.lucene.index.BaseDocValuesFormatTestCase.doTestSortedVsStoredFields(BaseDocValuesFormatTestCase.java:1448)
> > at
> org.apache.lucene.index.BaseDocValuesFormatTestCase.testSortedFixedLengthVsStoredFields(BaseDocValuesFormatTestCase.java:1493)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:497)
> > at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
> > at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
> > at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
> > at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
> > at
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
> > at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> > at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> > at
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
> > at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
> > at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> > at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> > at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
> > at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
> > at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
> > at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
> > at
> com.carrotsearch.r

Re: Welcome Ramkumar Aiyengar as Lucene/Solr committer

2015-03-02 Thread david.w.smi...@gmail.com
Welcome Ram!

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Sun, Mar 1, 2015 at 11:39 PM, Shalin Shekhar Mangar <
[email protected]> wrote:

> I'm pleased to announce that Ramkumar Aiyengar has accepted the PMC's
> invitation to become a committer.
>
> Ramkumar, it's tradition that you introduce yourself with a brief bio.
>
> Your handle "andyetitmoves" has already been added to the "lucene" LDAP group,
> so you now have commit privileges. Please test this by adding yourself to
> the committers section of the Who We Are page on the website: <
> http://lucene.apache.org/whoweare.html> (use the ASF CMS bookmarklet at
> the bottom of the page here:  - more
> info here ).
>
> The ASF dev page also has lots of useful links: <
> http://www.apache.org/dev/>.
>
> Congratulations and welcome!
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: DocValues instead of stored values

2015-03-02 Thread david.w.smi...@gmail.com
I have a patch to do this somewhat automatically here:
https://issues.apache.org/jira/browse/SOLR-5478
I’m waiting on SOLR-6810 before revisiting the patch.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Mon, Mar 2, 2015 at 9:08 AM, Toke Eskildsen 
wrote:

> Given a non-tokenized field that has DocValues, the primary (maybe even
> only?) reason for making it stored, seems to be document retrieval. When
> the goal is to construct documents, the base difference between just
> returning the stored values and returning both stored and DocValued
> values seems to be performance: Resolving a non-trivial amount of stored
> values for each document is mostly a bulk operation, while the DocValued
> ones are more random access.
>
> In most of our setups, search-results are divided between overviews
> (classic top-10 or top-20 with most relevant documents) and expanded
> views (separate page or a result box that changes size). The overviews
> have few data and the expanded views have more data. The data for
> overviews needs to be provided quickly (stored), whereas the expanded
> views are one-document-at-a-time and thus does not have the same time
> requirements (DocValue speed is fine).
>
> As non-trivial space (15% in an index I am investigating) can be saved
> by doing DocValue without storing, would it be an idea to provide
> support for retrieving DocValued fields as part of document retrieval?
>
> This could be done in different ways:
>
> * Only return stored values with fl=*. If a field is referenced
>   explicitly with fl=myfield and is DocValued but not stored, return
>   the DocValued value.
>
> * State that DocValued fields, that are not stored, should be returned
>   with a flag: resolvedv=true
>
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Re: DocValues instead of stored values

2015-03-02 Thread david.w.smi...@gmail.com
On Mon, Mar 2, 2015 at 9:13 AM, Shalin Shekhar Mangar <
[email protected]> wrote:

> The problem with fetching from DocValue automatically is that it may cause
> as many number of disk seeks as the number of doc value fields being
> retrieved.


True in the worst-case (cold disk cache).  If you have warm-up and only use
this technique judiciously (on a few docvalues fields), then one can assume
the data is cached.  And you can explicitly use a memory based DV format
too, which puts all the values in a memory-efficient FST assuming
string/text data.
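The row-store vs. column-store distinction being discussed can be sketched in a toy model. This is plain Java, not Lucene/Solr API; the field names and the "prefer stored, fall back to DocValues" rule are hypothetical, mirroring Toke's fl proposal:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the trade-off discussed above -- NOT Lucene/Solr API.
// Stored fields are row-oriented: all of a document's values are read
// together in one bulk operation. DocValues are column-oriented: one
// value array per field, indexed by docID (random access per field).
public class StoredVsDocValues {
  // Row store: docID -> stored fields for that document.
  static final List<Map<String, String>> STORED = new ArrayList<>();
  // Column store: field name -> per-document values (DocValues-like).
  static final Map<String, String[]> DOC_VALUES = new HashMap<>();

  // Resolve a field list for one document: prefer the stored value and
  // fall back to the column store, mirroring the proposal that an
  // explicit fl=myfield could return the DocValued value when the field
  // is not stored. (Field names below are made up for illustration.)
  static Map<String, String> retrieve(int docId, String... fl) {
    Map<String, String> row = STORED.get(docId);
    Map<String, String> out = new LinkedHashMap<>();
    for (String field : fl) {
      if (row.containsKey(field)) {
        out.put(field, row.get(field));               // bulk row read
      } else if (DOC_VALUES.containsKey(field)) {
        out.put(field, DOC_VALUES.get(field)[docId]); // per-field seek
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, String> doc0 = new HashMap<>();
    doc0.put("title", "Doc zero");                      // stored only
    STORED.add(doc0);
    DOC_VALUES.put("popularity", new String[] {"42"});  // docvalues only

    System.out.println(retrieve(0, "title", "popularity"));
    // prints {title=Doc zero, popularity=42}
  }
}
```

Solr later added a similar fallback as a field property (useDocValuesAsStored), which is essentially the behavior sketched here.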

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


Re: svn commit: r1664126 - in /lucene/dev/trunk/solr: core/src/java/org/apache/solr/core/ core/src/java/org/apache/solr/handler/ core/src/test-files/ core/src/test/org/apache/solr/core/ core/src/test/

2015-03-04 Thread david.w.smi...@gmail.com
I use that judgement too — sometimes I don’t bother if it’s internal and I
never would for something trivial like a typo.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Wed, Mar 4, 2015 at 4:13 PM, Ramkumar R. Aiyengar <
[email protected]> wrote:

> The change had no functional impact, hence left it alone.
>
> But happy to follow whatever is the existing practice. Should I have one
> for every change?
>
> On Wed, Mar 4, 2015 at 8:29 PM, Alan Woodward  wrote:
>
>> Hi Ram, I think you missed a CHANGES.txt entry on this one?
>>
>> Alan Woodward
>> www.flax.co.uk
>>
>>
>> On 4 Mar 2015, at 19:45, [email protected] wrote:
>>
>> Author: andyetitmoves
>> Date: Wed Mar  4 19:45:09 2015
>> New Revision: 1664126
>>
>> URL: http://svn.apache.org/r1664126
>> Log:
>> SOLR-6804: Untangle SnapPuller and ReplicationHandler
>>
>> This closes #110
>>
>> Added:
>>
>>
>> lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java
>>  - copied, changed from r1663969,
>> lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/SnapPuller.java
>> Removed:
>>
>>
>> lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/SnapPuller.java
>> Modified:
>>lucene/dev/trunk/solr/core/src/java/org/apache/solr/core/SolrCore.java
>>
>>
>> lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java
>>
>>
>> lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/SnapShooter.java
>>lucene/dev/trunk/solr/core/src/test-files/log4j.properties
>>
>>
>> lucene/dev/trunk/solr/core/src/test/org/apache/solr/core/TestArbitraryIndexDir.java
>>
>>
>> lucene/dev/trunk/solr/core/src/test/org/apache/solr/handler/TestReplicationHandler.java
>>
>>
>> lucene/dev/trunk/solr/test-framework/src/java/org/apache/solr/core/MockDirectoryFactory.java
>>
>> Modified:
>> lucene/dev/trunk/solr/core/src/java/org/apache/solr/core/SolrCore.java
>> URL:
>> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/core/SolrCore.java?rev=1664126&r1=1664125&r2=1664126&view=diff
>>
>> ==
>> ---
>> lucene/dev/trunk/solr/core/src/java/org/apache/solr/core/SolrCore.java
>> (original)
>> +++
>> lucene/dev/trunk/solr/core/src/java/org/apache/solr/core/SolrCore.java Wed
>> Mar  4 19:45:09 2015
>> @@ -83,9 +83,9 @@ import org.apache.solr.common.util.IOUti
>> import org.apache.solr.common.util.NamedList;
>> import org.apache.solr.common.util.SimpleOrderedMap;
>> import org.apache.solr.core.DirectoryFactory.DirContext;
>> +import org.apache.solr.handler.IndexFetcher;
>> import org.apache.solr.handler.ReplicationHandler;
>> import org.apache.solr.handler.RequestHandlerBase;
>> -import org.apache.solr.handler.SnapPuller;
>> import org.apache.solr.handler.admin.ShowFileRequestHandler;
>> import org.apache.solr.handler.component.DebugComponent;
>> import org.apache.solr.handler.component.ExpandComponent;
>> @@ -291,7 +291,7 @@ public final class SolrCore implements S
>>   dir = getDirectoryFactory().get(getDataDir(), DirContext.META_DATA,
>> getSolrConfig().indexConfig.lockType);
>>   IndexInput input;
>>   try {
>> -input = dir.openInput(SnapPuller.INDEX_PROPERTIES,
>> IOContext.DEFAULT);
>> +input = dir.openInput(IndexFetcher.INDEX_PROPERTIES,
>> IOContext.DEFAULT);
>>   } catch (FileNotFoundException | NoSuchFileException e) {
>> input = null;
>>   }
>> @@ -307,7 +307,7 @@ public final class SolrCore implements S
>>   }
>>
>> } catch (Exception e) {
>> -  log.error("Unable to load " + SnapPuller.INDEX_PROPERTIES, e);
>> +  log.error("Unable to load " + IndexFetcher.INDEX_PROPERTIES,
>> e);
>> } finally {
>>   IOUtils.closeQuietly(is);
>> }
>>
>> Copied:
>> lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java
>> (from r1663969,
>> lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/SnapPuller.java)
>> URL:
>> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java?p2=lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java&p1=lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/SnapPuller.java&r1=1663969&r2=1664126&rev=1664126&view=diff
>>
>> ==
>> ---
>> lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/SnapPuller.java
>> (original)
>> +++
>> lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java
>> Wed Mar  4 19:45:09 2015
>> @@ -67,11 +67,7 @@ import java.util.concurrent.ExecutionExc
>> import java.util.concurrent.ExecutorService;
>> import java.util.concurrent.Executors;
>> import java.util.concurrent.Future;
>> -import java.util.concurrent.ScheduledExecutorService;
>> import java.util.concurrent.Ti

Re: solr client sdk's/libraries for native platforms

2014-11-24 Thread david.w.smi...@gmail.com
FYI see https://wiki.apache.org/solr/IntegratingSolr for a list.  This is a
great use of the wiki.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Mon, Nov 24, 2014 at 10:35 AM, Alexandre Rafalovitch 
wrote:

> Well, a start would be to actually have an up-to-date list of Solr
> clients. I have the list, if somebody knows where it should go (Ref
> Guide). I don't want to contribute this to WIKI as we are trying to
> get rid of it.
>
> Then somebody (Summer of Code project?) would derive from that a list
> of clients that are up-to-date (a very different story). This would
> require a high-level set of features that clients are expected to
> cover. I have some thinking around that I am happy to share in a rough
> form.
>
> I would also - as mentioned before - setup a mailing list for all the
> client developers to discuss new features in a common way.
>
> Do not think of this as a primarily code problem - think of it as a
> community consolidation and establishing clear interfaces to the
> downstream projects.
>
> Regards,
>Alex.
>
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 24 November 2014 at 10:24, Noble Paul  wrote:
> > This has been a constant pain point for Solr. Java client is a first
> class
> > client where it benefits from knowing the correct servers to communicate
> to
> > because it is aware of the clusterstate. The java client also has the
> > advantage of using the faster and compact binary format.
> >
> > We will need to build these basic capabilities built in other languages
> such
> > as  C++, C# and provide bindings for other languages
> > . We are aware of this need and any suggestions to address this are
> welcome
> >
> > On Sun, Nov 23, 2014 at 2:08 PM, Anurag Sharma 
> wrote:
> >>
> >> Solr interface is through REST API's which makes it easy to integrate
> with
> >> any platform and do binding any language.
> >>
> >> Each developer have to write common code to do the api bindings if using
> >> Solr in non java framework/platform. This overhead can be reduced by
> >> building client sdk's/libraries for popular languages and platforms e.g.
> >> - web: js, ruby, python
> >> - mobile: Objective C, Swift, C#
> >> - other: C++,  Scala, perl, php
> >>
> >> Also, this can significantly reduce time to Solr on-boarding when using
> >> non java platform.
> >>
> >> Suggestions?
> >>
> >
> >
> >
> > --
> > -
> > Noble Paul
>
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


IntelliJ build

2014-11-24 Thread david.w.smi...@gmail.com
On trunk I cleaned and re-created my IntelliJ based build (ant clean-idea,
idea).  IntelliJ didn’t get the memo about Java 8 so I changed that
(locally).  Then I found that the Solr velocity contrib couldn’t resolve a
ResourceLoader class in analysis-common.  So I simply checked the “export”
checkbox on analysis-common from the Solr-core module, and Solr-core is a
dependency of velocity, and thus it can resolve it.  Export is synonymous
with transitive resolution.  Now it compiles locally.  It seems like an odd
thing to go wrong.  Java 8 I expected.

So if any IntelliJ user has run into issues lately, maybe sharing my
experience will help.  I should commit the changes but I’ll wait for a
reply.

I think the “Export” (transitive resolution) feature could allow us to
simplify some of the dependency management quite a bit within IntelliJ so
that it may need less maintenance.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


Re: Where is the SVN repository only for Lucene project ?

2014-11-26 Thread david.w.smi...@gmail.com
GitHub offers SVN access:
svn checkout https://github.com/apache/lucene-solr

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Wed, Nov 26, 2014 at 4:19 AM, Yosuke Yamatani <
[email protected]> wrote:

> Dear sir/madam
>
> Hello, I’m Yosuke Yamatani.
> I’m a graduate student at Wakayama University, Japan.
> I study software evolution in OSS projects through the analysis of SVN
> repositories.
> I found the entire ASF repository, but I would like to mirror the SVN
> repository only for your project.
> Could you let me know how to get your repository ?
>
> Sincerely yours.
> Yosuke
>
>
>
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Re: IntelliJ build

2014-11-26 Thread david.w.smi...@gmail.com
I don’t feel strongly about that so I won’t.

Maybe the IntelliJ module dependencies could be built from the Maven
pom’s?  Does that sound feasible to you Steve?  There may be some
exceptions but if there aren’t a ton then why not?  Ultimately it would be
nice to have a solution that doesn’t require that much continuous updating.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Wed, Nov 26, 2014 at 4:57 PM, Steve Rowe  wrote:

> David,
>
> I’d rather not go down the transitive route, because it would introduce
> misalignments with the Ant build, and because unwanted transitive deps
> could improperly influence the IntelliJ build.  But if you feel strongly
> about it, go ahead: -0.
>
> Thanks for working on it.
>
> Steve
>
> > On Nov 24, 2014, at 10:37 PM, [email protected] wrote:
> >
> > On trunk I cleaned and re-created my IntelliJ based build (ant
> clean-idea, idea).  IntelliJ didn’t get the memo about Java 8 so I changed
> that (locally).  Then I found that the Solr velocity contrib couldn’t
> resolve a ResourceLoader class in analysis-common.  So I simply checked the
> “export” checkbox on analysis-common from the Solr-core module, and
> Solr-core is a dependency of velocity, and thus it can resolve it.  Export
> is synonymous with transitive resolution.  Now it compiles locally.  It
> seems like an odd thing to go wrong.  Java 8 I expected.
> >
> > So if any IntelliJ user has run into issues lately, maybe sharing my
> experience will help.  I should commit the changes but I’ll wait for a
> reply.
> >
> > I think the “Export” (transitive resolution) feature could allow us to
> simplify some of the dependency management quite a bit within IntelliJ so
> that it may need less maintenance.
> >
> > ~ David Smiley
> > Freelance Apache Lucene/Solr Search Consultant/Developer
> > http://www.linkedin.com/in/davidwsmiley
>
>


Re: svn commit: r1642294 - in /lucene/dev/trunk/lucene: ./ highlighter/src/java/org/apache/lucene/search/highlight/ highlighter/src/test/org/apache/lucene/search/highlight/ test-framework/src/java/org

2014-11-29 Thread david.w.smi...@gmail.com
Reposting my comment on JIRA:

Ouch; so sorry I failed the build! In my checkout I have several pending
issues related to highlighting, and apparently the Solr one, SOLR-6680, is
dependent. I should have monitored the dev list closely; I recall getting a
nastygram from Jenkins when I failed the build in the past and thought I was
in the clear since I didn't get one this time.

The coupling between this and SOLR-6680 is that TokenSources, *prior to my
commit here*, did not require that you call reset(). This is of course a
violation of the TokenSources contract, which is unacceptable. The patch to
SOLR-6680 does several things to DefaultSolrHighlighter, one of which is
ensuring reset() is called appropriately. Since I posted SOLR-6680 some time
ago, I will commit within an hour or so, and thus fix the build. I will also
add a note in the upgrading section of Lucene in case someone else might be
forgetting to reset the stream returned from TokenSources.
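The consumer contract at issue can be illustrated with a toy stream. These are not Lucene's actual classes -- just a sketch of the state machine: a consumer must call reset() before the first incrementToken(), and a stream that enforces the contract fails fast when that is forgotten:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy illustration of the TokenStream consumer contract -- NOT Lucene's
// classes. reset() must precede the first incrementToken(); calling
// incrementToken() on an un-reset stream throws, which is how a
// contract-enforcing stream surfaces the bug described above.
class ToyTokenStream {
  private final Iterator<String> tokens;
  private boolean wasReset = false;
  private String current;

  ToyTokenStream(List<String> toks) {
    this.tokens = toks.iterator();
  }

  void reset() {
    wasReset = true;
  }

  boolean incrementToken() {
    if (!wasReset) {
      throw new IllegalStateException("incrementToken() called before reset()");
    }
    if (!tokens.hasNext()) {
      return false;
    }
    current = tokens.next();
    return true;
  }

  String current() {
    return current;
  }
}

public class TokenStreamContract {
  public static void main(String[] args) {
    ToyTokenStream ts = new ToyTokenStream(Arrays.asList("quick", "fox"));
    ts.reset();                          // required before consuming
    List<String> out = new ArrayList<>();
    while (ts.incrementToken()) {
      out.add(ts.current());
    }
    System.out.println(out);             // prints [quick, fox]
  }
}
```

In real Lucene the full contract also includes end() and close() after consumption; the test framework's mock analysis components assert the same state transitions, which is why a missing reset() shows up as a test failure rather than silently wrong highlights.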

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Sat, Nov 29, 2014 at 10:27 AM, Yonik Seeley 
wrote:

> Highlighting tests have been failing 100% lately.  Was it this commit?
>
> http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/11680/
>
> https://builds.apache.org/job/Lucene-Solr-Tests-5.x-Java7/2253/#showFailuresLink
>
> -Yonik
> http://heliosearch.org - native code faceting, facet functions,
> sub-facets, off-heap data
>
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Re: solr client sdk's/libraries for native platforms

2014-12-01 Thread david.w.smi...@gmail.com
I meant to reply earlier...

On Mon, Nov 24, 2014 at 11:37 AM, Alexandre Rafalovitch 
wrote:

> They are super-stale


Yup but it’s a wiki so feel free to freshen it up.  I’ll be doing that in a
bit.  It may also be helpful if these particular pages got more
prominence/visibility by being linked from the ref guide and/or the website.


> and there is no easy mechanism for people to
> announce their additions. I am not even sure the announcements are
> welcome on the user mailing list.
>

IMO the mailing list is an excellent place to announce new Solr
integrations in the ecosystem out there.  People announce various things on
the list from time to time.


>
> It comes down to the funnel/workflow. At the moment, the workflow
> makes it _hard_ to maintain those pages. CMM level 1 kind of hard.
>

Can you recommend a fix or alternative?


>
> Regards,
>Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 24 November 2014 at 11:26, Eric Pugh 
> wrote:
> > On the wiki are two pages listing out projects that use Solr:
> >
> > http://wiki.apache.org/solr/SolrEcosystem
> > http://wiki.apache.org/solr/IntegratingSolr
> >
> > I noticed that they have become stale and was going to update them.
>  Maybe they could have more prominence in the Solr site?  But keep them
> community driven since things change so quickly.
> >
> > Eric
> >
> >
> >> On Nov 24, 2014, at 10:35 AM, Alexandre Rafalovitch 
> wrote:
> >>
> >> Well, a start would be to actually have an up-to-date list of Solr
> >> clients. I have the list, if somebody knows where it should go (Ref
> >> Guide). I don't want to contribute this to WIKI as we are trying to
> >> get rid of it.
> >>
> >> Then somebody (Summer of Code project?) would derive from that a list
> >> of clients that are up-to-date (a very different story). This would
> >> require a high-level set of features that clients are expected to
> >> cover. I have some thinking around that I am happy to share in a rough
> >> form.
> >>
> >> I would also - as mentioned before - setup a mailing list for all the
> >> client developers to discuss new features in a common way.
> >>
> >> Do not think of this as a primarily code problem - think of it as a
> >> community consolidation and establishing clear interfaces to the
> >> downstream projects.
> >>
> >> Regards,
> >>   Alex.
> >>
> >> Personal: http://www.outerthoughts.com/ and @arafalov
> >> Solr resources and newsletter: http://www.solr-start.com/ and
> @solrstart
> >> Solr popularizers community:
> https://www.linkedin.com/groups?gid=6713853
> >>
> >>
> >> On 24 November 2014 at 10:24, Noble Paul  wrote:
> >>> This has been a constant pain point for Solr. Java client is a first
> class
> >>> client where it benefits from knowing the correct servers to
> communicate to
> >>> because it is aware of the clusterstate. The java client also has the
> >>> advantage of using the faster and compact binary format.
> >>>
> >>> We will need to build these basic capabilities built in other
> languages such
> >>> as  C++, C# and provide bindings for other languages
> >>> . We are aware of this need and any suggestions to address this are
> welcome
> >>>
> >>> On Sun, Nov 23, 2014 at 2:08 PM, Anurag Sharma 
> wrote:
> 
>  Solr interface is through REST API's which makes it easy to integrate
> with
>  any platform and do binding any language.
> 
>  Each developer have to write common code to do the api bindings if
> using
>  Solr in non java framework/platform. This overhead can be reduced by
>  building client sdk's/libraries for popular languages and platforms
> e.g.
>  - web: js, ruby, python
>  - mobile: Objective C, Swift, C#
>  - other: C++,  Scala, perl, php
> 
>  Also, this can significantly reduce time to Solr on-boarding when
> using
>  non java platform.
> 
>  Suggestions?
> 
> >>>
> >>>
> >>>
> >>> --
> >>> -
> >>> Noble Paul
> >>
> >> -
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >>
> >
> > -
> > Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com | My Free/Busy
> > Co-Author: Apache Solr 3 Enterprise Search Server
> > This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless of
> whether attachments are marked as such.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > -
> > To unsubscribe, e-mail: [email protected]
> > For additional 

Re: solr client sdk's/libraries for native platforms

2014-12-01 Thread david.w.smi...@gmail.com
I like the “last updated …” (rounded to the month) idea.  It may be
difficult to maintain a “last checked” distinction, and it would add somewhat
more of a burden on maintaining the list.  I think it’s useful to list out
old projects, maybe separately, indicated as old.  This makes the page
a better comprehensive resource.

Thanks for volunteering Alex!

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Mon, Dec 1, 2014 at 7:35 PM, Alexandre Rafalovitch 
wrote:

> What would be the reasonable cutoff for the client library last
> update? Say if it was not updated in 2 years - should it be included
> in the list? In 3? Included with a warning?
>
> Or do we list them all and let the user sort it out? Or put a
> last-checked date on the wiki and mention rough last update against
> each library?
>
> Regards,
>Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 1 December 2014 at 11:03, Eric Pugh 
> wrote:
> > I think in the vein of a “do-it-tocracy”, getting the Wiki updated is a
> perfectly good first step, and then if there is a better approach,
> hopefully that occurs.… ;-)
> >
> >
> >
> >> On Dec 1, 2014, at 10:51 AM, Alexandre Rafalovitch 
> wrote:
> >>
> >> On 1 December 2014 at 10:02, [email protected]
> >>  wrote:
> >>> I meant to reply earlier...
> >>>
> >>> On Mon, Nov 24, 2014 at 11:37 AM, Alexandre Rafalovitch <
> [email protected]>
> >>> wrote:
> >>>>
> >>>> They are super-stale
> >>>
> >>>
> >>> Yup but it’s a wiki so feel free to freshen it up.  I’ll be doing that
> in a
> >>> bit.  It may also be helpful if these particular pages got more
> >>> prominence/visibility by being linked from the ref guide and/or the
> website.
> >>
> >> On the TODO list. If you are planning to update the client list, maybe
> >> we should coordinate, so we don't step on each other's toes. I am
> >> planning to do more than a minor tweak.
> >>
> >>>> and there is no easy mechanism for people to
> >>>> announce their additions. I am not even sure the announcements are
> >>>> welcome on the user mailing list.
> >>>
> >>>
> >>> IMO the mailing list is an excellent place to announce new Solr
> integrations
> >>> in the ecosystem out there.  People announce various things on the
> list from
> >>> time to time.
> >> I haven't even announced solr-start.com on the list, wasn't sure
> >> whether it's appropriate. So, maybe it's ok, but I suspect that's not
> >> visible.
> >>
> >>>> It comes down to the funnel/workflow. At the moment, the workflow
> >>>> makes it _hard_ to maintain those pages. CMM level 1 kind of hard.
> >>> Can you recommend a fix or alternative?
> >>
> >> I thought that's what my previous emails were about?!? Setup a
> >> 'client-maintainer' mailing list seeded with SolrJ people, update the
> >> Wiki, make it more prominent. Organize a TodoMVC equivalent for Solr
> >> clients (with prizes?). Ensure it is a topic (with mentor) for
> >> Google's Summer of Code. Have somebody from core Solr to keep at least
> >> one eye on the client communities' mailing lists.
> >>
> >> I started doing that as an individual, but the traction was not there.
> >> It needs at least a couple of people to push in the same direction.
> >>
> >> Regards,
> >>   Alex.
> >>
> >> -
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >>
> >
> > -
> > Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com | My Free/Busy
> > Co-Author: Apache Solr 3 Enterprise Search Server
> > This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless of
> whether attachments are marked as such.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > -
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
>
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Re: [JENKINS] Lucene-Solr-Tests-5.x-Java7 - Build # 2267 - Failure

2014-12-02 Thread david.w.smi...@gmail.com
I’ll dig.

On Tue, Dec 2, 2014 at 11:57 AM, Apache Jenkins Server <
[email protected]> wrote:

> Build: https://builds.apache.org/job/Lucene-Solr-Tests-5.x-Java7/2267/
>
> 1 tests failed.
> FAILED:  org.apache.lucene.spatial.prefix.DateNRStrategyTest.testContains
> {#9 seed=[801EEB0A7D92DF9E:7325FAA06CA28D9F]}
>
> Error Message:
> [Contains] Shouldn't match I#1:465172-01 Q:[465172 TO 465172-01]
>
> Stack Trace:
> java.lang.AssertionError: [Contains] Shouldn't match I#1:465172-01
> Q:[465172 TO 465172-01]
> at
> __randomizedtesting.SeedInfo.seed([801EEB0A7D92DF9E:7325FAA06CA28D9F]:0)
> at org.junit.Assert.fail(Assert.java:93)
> at
> org.apache.lucene.spatial.prefix.RandomSpatialOpStrategyTestCase.fail(RandomSpatialOpStrategyTestCase.java:126)
> at
> org.apache.lucene.spatial.prefix.RandomSpatialOpStrategyTestCase.testOperation(RandomSpatialOpStrategyTestCase.java:115)
> at
> org.apache.lucene.spatial.prefix.RandomSpatialOpStrategyTestCase.testOperationRandomShapes(RandomSpatialOpStrategyTestCase.java:62)
> at
> org.apache.lucene.spatial.prefix.DateNRStrategyTest.testContains(DateNRStrategyTest.java:65)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
> at
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
> at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
> at
> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> a

Re: [JENKINS] Lucene-Solr-Tests-5.x-Java7 - Build # 2267 - Failure

2014-12-04 Thread david.w.smi...@gmail.com
Filed with patch to fix: https://issues.apache.org/jira/browse/LUCENE-6092
Credit to the randomized-testing approach for discovering the bug.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Tue, Dec 2, 2014 at 1:06 PM, [email protected] <
[email protected]> wrote:

> I’ll dig.
>
> On Tue, Dec 2, 2014 at 11:57 AM, Apache Jenkins Server <
> [email protected]> wrote:
>
>> Build: https://builds.apache.org/job/Lucene-Solr-Tests-5.x-Java7/2267/
>>
>> 1 tests failed.
>> FAILED:  org.apache.lucene.spatial.prefix.DateNRStrategyTest.testContains
>> {#9 seed=[801EEB0A7D92DF9E:7325FAA06CA28D9F]}
>>
>> Error Message:
>> [Contains] Shouldn't match I#1:465172-01 Q:[465172 TO 465172-01]
>>
>> Stack Trace:
>> java.lang.AssertionError: [Contains] Shouldn't match I#1:465172-01
>> Q:[465172 TO 465172-01]
>> at
>> __randomizedtesting.SeedInfo.seed([801EEB0A7D92DF9E:7325FAA06CA28D9F]:0)
>> at org.junit.Assert.fail(Assert.java:93)
>> at
>> org.apache.lucene.spatial.prefix.RandomSpatialOpStrategyTestCase.fail(RandomSpatialOpStrategyTestCase.java:126)
>> at
>> org.apache.lucene.spatial.prefix.RandomSpatialOpStrategyTestCase.testOperation(RandomSpatialOpStrategyTestCase.java:115)
>> at
>> org.apache.lucene.spatial.prefix.RandomSpatialOpStrategyTestCase.testOperationRandomShapes(RandomSpatialOpStrategyTestCase.java:62)
>> at
>> org.apache.lucene.spatial.prefix.DateNRStrategyTest.testContains(DateNRStrategyTest.java:65)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
>> at
>> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>> at
>> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
>> at
>> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>> at
>> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
>> at
>> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
>> at
>> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>> at
>> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> at
>> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
>> at
>> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
>> at
>> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
>> at
>> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
>> at
>> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
>> at
>> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>> at
>> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
>> at
>> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(

Re: solr client sdk's/libraries for native platforms

2014-12-04 Thread david.w.smi...@gmail.com
Nice!

I like the title “Solr Ecosystem” so I propose the SolrIntegration content
be moved there, but it’s not critical to me that the content move that way
vs the other.

I think when listing projects grouped by source code language, it’s
important to make further distinctions as to what the nature of the project
is.  Some are just clients, some are actually response formats Solr
natively supports, and some are fundamentally integrated with another
framework (e.g. Rails or Django).  It’s good to see some of that here… but
it’s weird to see Haystack (a Django plug-in) down in the unorganized list
at the bottom “Integrating Solr with Other (Non Search) Applications”.
CMSs should get their own category.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Wed, Dec 3, 2014 at 8:36 AM, Alexandre Rafalovitch 
wrote:

> +1 on merging those two. But also needs a bit of a 'design' of what
> goes into it. I have probably another 30 links of various Solr-related
> products.
>
> I didn't touch the SolrPython page because it had that extra information
> compared to just one-liners on the main screen. And I didn't have the
> time to review whether those examples are still valid or need to be
> present. Same with SolrPHP legacy stuff I linked to.
>
> Another pass through this would be nice.
> > For example, little did I know there was a client for Solr for the Rust
> programming language.
> And four for Clojure :-)
>
> Regards,
>Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 3 December 2014 at 08:28, Eric Pugh 
> wrote:
> > Maybe this should just be one long page?   David and I were thinking of
> merging it, since it’s an arbitrary split anyway between IntegratingSolr
> and the SolrEcosystem pages.   After all, it’s all part of the
> SolrEcosystem!
> >
> > One of the reasons I like having this all pulled together into one place
> is that it shows new users how much breadth and depth there is!  For
> example, little did I know there was a client for Solr for the Rust
> programming language.
> >
> > Maybe merge IntegratingSolr and SolrEcosystem and SolPython?  And rename
> SolPython to SolrPython, and put a link with just the example code bits?
> >
> > Eric
> >
> >> On Dec 3, 2014, at 12:23 AM, Alexandre Rafalovitch 
> wrote:
> >>
> >> Ok,
> >>
> >> Done: https://wiki.apache.org/solr/IntegratingSolr
> >> Also: https://wiki.apache.org/solr/SolPython
> >>
> >> I am not sure what to do with the stuff at the bottom of the client
> >> list, though I've put the dates on it anyway. It's neither
> >> comprehensive nor representative and I don't understand the
> >> significance of that part vs.
> >> https://wiki.apache.org/solr/SolrEcosystem . But that's all I had
> >> patience for this time with WIKI being an absolute turtle. Perhaps
> >> somebody else can revisit it with a fresh eye now that I cleaned it up
> >> a bit.
> >>
> >> Regards,
> >>   Alex.
> >> Personal: http://www.outerthoughts.com/ and @arafalov
> >> Solr resources and newsletter: http://www.solr-start.com/ and
> @solrstart
> >> Solr popularizers community:
> https://www.linkedin.com/groups?gid=6713853
> >>
> >>
> >> On 1 December 2014 at 20:04, [email protected]
> >>  wrote:
> >>> I like the “last updated …” (rounded to the month) idea.  It may be
> >>> difficult to maintain a “last checked” distinction, and it creates
> >>> somewhat more of a burden on maintaining the list.  I think it’s useful
> >>> to list out old projects, maybe separately, and indicate them as old.
> >>> This makes the page a better comprehensive resource.
> >>>
> >>> Thanks for volunteering Alex!
> >>>
> >>> ~ David Smiley
> >>> Freelance Apache Lucene/Solr Search Consultant/Developer
> >>> http://www.linkedin.com/in/davidwsmiley
> >>>
> >>> On Mon, Dec 1, 2014 at 7:35 PM, Alexandre Rafalovitch <
> [email protected]>
> >>> wrote:
> >>>>
> >>>> What would be the reasonable cutoff for the client library last
> >>>> update? Say if it was not updated in 2 years - should it be included
> >>>> in the list? In 3? Included with a warning?
> >>>>
> >>>> Or do we list them all

Re: Give Solr its "own" port number

2015-02-04 Thread david.w.smi...@gmail.com
-0  Hoss’s points are my view as well.  8983 is already pretty well known
amongst Solr users.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Wed, Feb 4, 2015 at 12:37 PM, Chris Hostetter 
wrote:

> : Until 5.x Solr would start on whatever port of the appserver chosen,
> i.e. 8983 for Jetty, 8080 for Tomcat etc.
> : Now that Solr is a "standalone" app, why should we "inherit" Jetty's
> default port 8983 anymore?
>
> last time i checked, 8983 is not "Jetty's default port" ... Jetty's
> hardcoded default port is "0" (ie: listen on any port assigned by the OS)
> and jetty's "sample" default (from the jetty.xml they ship to use as a
> default) uses port "8080"
>
> IIRC: 8983 was explicitly picked for Solr years ago because there weren't
> really any other systems out there using it (unlike 8000, 8080,
> etc...)
>
> : * Identity - people will immediately identify Solr by its port number.
> Even IANA?
>
> this is pretty much already true -- and if folks really care about getting
> IANA recognition, then 8983 seems like the best choice since:
>   1) we already have heavy recognition for that port
>   2) it's currently "unassigned" in the IANA port numbers list.
>
> : PS: Same goes for the default URL. We could move to toplevel now
> http://localhost:8983/
>
> -0 ... i don't see any downside to leaving "/solr/" in the URL, and
> if/when we rip out the jetty stack completely and stop being beholden to
> the servlet APIs internally, it gives us flexibility if we want to start
> deprecating/retiring things to be
> able to say "All of the legacy, pre-Solr X.0, APIs use a base path of
> '/solr/' and all the new hotness APIs use a base path of '/v2/'" ... or
> something like that.
>
> ie: worry about it when we have another use for the URL path.
>
>
>
> -Hoss
> http://www.lucidworks.com/
>
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Re: Interesting resource for Unix shell script cleanup

2015-02-05 Thread david.w.smi...@gmail.com
Cool!

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Thu, Feb 5, 2015 at 10:25 AM, Steve Rowe  wrote:

> > On Feb 5, 2015, at 9:51 AM, Alexandre Rafalovitch 
> wrote:
> >
> > Hi,
> >
> > Just saw a link to http://www.shellcheck.net/ .
> >
> > I run Solr start script and it picked up a couple of interesting
> > issues around variable escaping and deprecated shell commands.
> >
> > Is that something that's worth making JIRA about?
> >
>
> +1
>
> Steve
> http://www.lucidworks.com
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Re: Static Analysis Tooling

2015-02-05 Thread david.w.smi...@gmail.com
+1 to this idea.  Note this is tracked as
https://issues.apache.org/jira/browse/LUCENE-3973

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Thu, Feb 5, 2015 at 12:43 PM, Mike Drob  wrote:

> Devs,
>
> I'd like to bring up static analysis for Solr and Lucene again. It's been
> about a year since the last conversation[1] and it might be time to
> revisit. There is a JIRA issue too[2], but it's also in need of some love.
>
> ASF already provides a Sonar instance that we might be able to use[3],
> alternatively we can just hook up whatever static analysis tool works well
> with ant (this is most of them) and rely on Jenkins to provide reports. The
> Eclipse FindBugs plug-in works pretty well for me personally.
>
> I will plan on submitting first some patches to fix issues found as
> "critical" in my local instance. Then I will work on adding analysis to the
> build, and figuring out how to fail the build if we exceed a certain
> threshold. And then we can incrementally lower the threshold while fixing
> additional issues.
>
> Does this sound like a reasonable plan? I want to give folks a heads up
> before creating a bunch of issues - FindBugs currently reports just over
> 500 hits on trunk.
>
> Mike
>
> [1]: http://markmail.org/thread/pxf7lg7kzflnknmm
> [2]: https://issues.apache.org/jira/browse/LUCENE-5130
> [3]: https://analysis.apache.org/
>


Re: [VOTE] 5.0.0 RC2

2015-02-11 Thread david.w.smi...@gmail.com
I found two problems, and I’m not sure what to make of them.

First, perhaps the simplest.  I ran it with Java 8 with this at the
command-line (copied from Uwe’s email, inserting my environment variable):

python3 -u dev-tools/scripts/smokeTestRelease.py --test-java8 $JAVA8_HOME
http://people.apache.org/~anshum/staging_area/lucene-solr-5.0.0-RC2-rev1658469

And I got this:

Java 1.8
JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_20.jdk/Contents/Home
NOTE: output encoding is UTF-8

Load release URL "
http://people.apache.org/~anshum/staging_area/lucene-solr-5.0.0-RC2-rev1658469
"...
  unshortened:
http://people.apache.org/~anshum/staging_area/lucene-solr-5.0.0-RC2-rev1658469/

Test Lucene...
  test basics...
  get KEYS
0.1 MB in 0.69 sec (0.2 MB/sec)
  check changes HTML...
  download lucene-5.0.0-src.tgz...
27.9 MB in 129.06 sec (0.2 MB/sec)
verify md5/sha1 digests
verify sig
verify trust
  GPG: gpg: WARNING: This key is not certified with a trusted signature!
  download lucene-5.0.0.tgz...
64.0 MB in 154.61 sec (0.4 MB/sec)
verify md5/sha1 digests
verify sig
verify trust
  GPG: gpg: WARNING: This key is not certified with a trusted signature!
  download lucene-5.0.0.zip...
73.5 MB in 223.35 sec (0.3 MB/sec)
verify md5/sha1 digests
verify sig
verify trust
  GPG: gpg: WARNING: This key is not certified with a trusted signature!
  unpack lucene-5.0.0.tgz...
verify JAR metadata/identity/no javax.* or java.* classes...
Traceback (most recent call last):
  File "dev-tools/scripts/smokeTestRelease.py", line 1486, in 
main()
  File "dev-tools/scripts/smokeTestRelease.py", line 1431, in main
smokeTest(c.java, c.url, c.revision, c.version, c.tmp_dir, c.is_signed,
' '.join(c.test_args))
  File "dev-tools/scripts/smokeTestRelease.py", line 1468, in smokeTest
unpackAndVerify(java, 'lucene', tmpDir, artifact, svnRevision, version,
testArgs, baseURL)
  File "dev-tools/scripts/smokeTestRelease.py", line 616, in unpackAndVerify
verifyUnpacked(java, project, artifact, unpackPath, svnRevision,
version, testArgs, tmpDir, baseURL)
  File "dev-tools/scripts/smokeTestRelease.py", line 737, in verifyUnpacked
checkAllJARs(os.getcwd(), project, svnRevision, version, tmpDir,
baseURL)
  File "dev-tools/scripts/smokeTestRelease.py", line 257, in checkAllJARs
checkJARMetaData('JAR file "%s"' % fullPath, fullPath, svnRevision,
version)
  File "dev-tools/scripts/smokeTestRelease.py", line 185, in
checkJARMetaData
(desc, verify))
RuntimeError: JAR file
"/private/tmp/smoke_lucene_5.0.0_1658469_1/unpack/lucene-5.0.0/analysis/common/lucene-analyzers-common-5.0.0.jar"
is missing "X-Compile-Source-JDK: 1.8" inside its META-INF/MANIFEST.MF

When I executed the above command, my CWD was a trunk checkout. Should that
matter?  It seems unlikely; the specific error references the unpacked
location, not CWD.



I also executed with Java 7; I did this first, actually.  This time, my
JAVA_HOME is set to Java 7 and I ran this from my 5x checkout.  When the
Solr tests ran, I got a particular test failure.  It reproduces, but only
on the 5.0 checkout — not my 5x checkout:

ant test  -Dtestcase=SaslZkACLProviderTest
-Dtests.method=testSaslZkACLProvider -Dtests.seed=1E2F7F6DC94B2138
-Dtests.slow=true -Dtests.locale=hi_IN -Dtests.timezone=ACT
-Dtests.asserts=true -Dtests.file.encoding=UTF-8

Does this trip for anyone else?  Again, use Java 7 and the release branch.

~ David
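
The RuntimeError above is the smoke tester refusing a JAR whose MANIFEST.MF
lacks the expected "X-Compile-Source-JDK" attribute. As a rough, hypothetical
illustration (this is not the actual smokeTestRelease.py code), that kind of
check boils down to scanning the manifest inside the JAR, which is just a zip:

```python
import io
import zipfile

def check_manifest(jar_file, expected="X-Compile-Source-JDK: 1.7"):
    """Raise RuntimeError if the JAR's MANIFEST.MF lacks the expected
    attribute line; a rough sketch of the smoke tester's JAR check."""
    with zipfile.ZipFile(jar_file) as jar:
        manifest = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    if expected not in manifest:
        raise RuntimeError('JAR is missing "%s" inside its META-INF/MANIFEST.MF'
                           % expected)

# Exercise the check with a tiny in-memory "JAR" (a JAR is just a zip file).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:
    jar.writestr("META-INF/MANIFEST.MF",
                 "Manifest-Version: 1.0\nX-Compile-Source-JDK: 1.7\n")
buf.seek(0)
check_manifest(buf)  # passes: the 1.7 attribute is present

# Checking the same JAR against a different expected JDK fails the same way.
try:
    check_manifest(buf, expected="X-Compile-Source-JDK: 1.8")
    mismatch_detected = False
except RuntimeError:
    mismatch_detected = True
```

Under the assumption that one branch's tester expects a 1.8-compiled JAR while
the artifact was built for 1.7, this reproduces the shape of the error above.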


Re: [VOTE] 5.0.0 RC2

2015-02-11 Thread david.w.smi...@gmail.com
Thanks for the clarifications on these two issues, Shalin, Ryan, and Uwe.

I got it to pass with my CWD on the 5x checkout and JAVA_HOME set to Java 7,
with --test-java8 pointing to my Java 8.

SUCCESS! [1:24:57.743374]

+1 to Ship!

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Wed, Feb 11, 2015 at 10:36 AM, Uwe Schindler  wrote:

> I think the problem is the inverse:
>
>
>
> RuntimeError: JAR file
> "/private/tmp/smoke_lucene_5.0.0_1658469_1/unpack/lucene-5.0.0/analysis/common/lucene-analyzers-common-5.0.0.jar"
> is missing "X-Compile-Source-JDK: 1.8" inside its META-INF/MANIFEST.MF
>
>
>
> The problem: the smoke tester expects to find Java 1.8 in the JAR file’s
> metadata. The cause: Shalin said he runs trunk’s smoke tester on the 5.0
> branch. This will break here, because trunk’s smoke tester expects Lucene
> compiled with Java 8.
>
>
>
> Uwe
>
> -
>
> Uwe Schindler
>
> H.-H.-Meier-Allee 63, D-28213 Bremen
>
> http://www.thetaphi.de
>
> eMail: [email protected]
>
>
>
> *From:* Ryan Ernst [mailto:[email protected]]
> *Sent:* Wednesday, February 11, 2015 3:27 PM
> *To:* [email protected]
> *Subject:* Re: [VOTE] 5.0.0 RC2
>
>
>
> And I got this:
> Java 1.8
> JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_20.jdk/Contents/Home
>
>
>
> Did you change your JAVA_HOME to point to java 8 as well (that's what it
> looks like since only jdk is listed in that output)? --test-java8 is meant
> to take the java 8 home, but your regular JAVA_HOME should stay java 7.
>
>
>
> On Wed, Feb 11, 2015 at 6:13 AM, [email protected] <
> [email protected]> wrote:
>
> I found two problems, and I’m not sure what to make of them.
>
>
>
> First, perhaps the simplest.  I ran it with Java 8 with this at the
> command-line (copied from Uwe’s email, inserting my environment variable):
>
>
>
> python3 -u dev-tools/scripts/smokeTestRelease.py --test-java8 $JAVA8_HOME
> http://people.apache.org/~anshum/staging_area/lucene-solr-5.0.0-RC2-rev1658469
>
>
>
> And I got this:
>
>
>
> Java 1.8
> JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_20.jdk/Contents/Home
>
> NOTE: output encoding is UTF-8
>
>
>
> Load release URL "
> http://people.apache.org/~anshum/staging_area/lucene-solr-5.0.0-RC2-rev1658469
> "...
>
>   unshortened:
> http://people.apache.org/~anshum/staging_area/lucene-solr-5.0.0-RC2-rev1658469/
>
>
>
> Test Lucene...
>
>   test basics...
>
>   get KEYS
>
> 0.1 MB in 0.69 sec (0.2 MB/sec)
>
>   check changes HTML...
>
>   download lucene-5.0.0-src.tgz...
>
> 27.9 MB in 129.06 sec (0.2 MB/sec)
>
> verify md5/sha1 digests
>
> verify sig
>
> verify trust
>
>   GPG: gpg: WARNING: This key is not certified with a trusted
> signature!
>
>   download lucene-5.0.0.tgz...
>
> 64.0 MB in 154.61 sec (0.4 MB/sec)
>
> verify md5/sha1 digests
>
> verify sig
>
> verify trust
>
>   GPG: gpg: WARNING: This key is not certified with a trusted
> signature!
>
>   download lucene-5.0.0.zip...
>
> 73.5 MB in 223.35 sec (0.3 MB/sec)
>
> verify md5/sha1 digests
>
> verify sig
>
> verify trust
>
>   GPG: gpg: WARNING: This key is not certified with a trusted
> signature!
>
>   unpack lucene-5.0.0.tgz...
>
> verify JAR metadata/identity/no javax.* or java.* classes...
>
> Traceback (most recent call last):
>
>   File "dev-tools/scripts/smokeTestRelease.py", line 1486, in 
>
> main()
>
>   File "dev-tools/scripts/smokeTestRelease.py", line 1431, in main
>
> smokeTest(c.java, c.url, c.revision, c.version, c.tmp_dir,
> c.is_signed, ' '.join(c.test_args))
>
>   File "dev-tools/scripts/smokeTestRelease.py", line 1468, in smokeTest
>
> unpackAndVerify(java, 'lucene', tmpDir, artifact, svnRevision,
> version, testArgs, baseURL)
>
>   File "dev-tools/scripts/smokeTestRelease.py", line 616, in
> unpackAndVerify
>
> verifyUnpacked(java, project, artifact, unpackPath, svnRevision,
> version, testArgs, tmpDir, baseURL)
>
>   File "dev-tools/scripts/smokeTestRelease.py", line 737, in verifyUnpacked
>
> checkAllJARs(os.getcwd(), project, svnRevision, version, tmpDir,
> baseURL)
>
>   File "dev-tools/scripts/smokeTestRelease.py", line 257, in checkAllJARs
>
> checkJARMetaData('JAR file "%s"' % fullPath, fullPath, svnRevision,
> version)
>
>   File 

Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0_31) - Build # 11780 - Failure!

2015-02-11 Thread david.w.smi...@gmail.com
It reproduces; I’m on it.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Wed, Feb 11, 2015 at 12:30 PM, Policeman Jenkins Server <
[email protected]> wrote:

> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/11780/
> Java: 32bit/jdk1.8.0_31 -server -XX:+UseConcMarkSweepGC
>
> 1 tests failed.
> FAILED:
> org.apache.lucene.spatial.prefix.HeatmapFacetCounterTest.testRandom {#3
> seed=[6B7EE18F8044BF08:1263454538DCD1B5]}
>
> Error Message:
> expected:<1> but was:<0>
>
> Stack Trace:
> java.lang.AssertionError: expected:<1> but was:<0>
> at
> __randomizedtesting.SeedInfo.seed([6B7EE18F8044BF08:1263454538DCD1B5]:0)
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.failNotEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:128)
> at org.junit.Assert.assertEquals(Assert.java:472)
> at org.junit.Assert.assertEquals(Assert.java:456)
> at
> org.apache.lucene.spatial.prefix.HeatmapFacetCounterTest.validateHeatmapResult(HeatmapFacetCounterTest.java:221)
> at
> org.apache.lucene.spatial.prefix.HeatmapFacetCounterTest.queryHeatmapRecursive(HeatmapFacetCounterTest.java:188)
> at
> org.apache.lucene.spatial.prefix.HeatmapFacetCounterTest.queryHeatmapRecursive(HeatmapFacetCounterTest.java:201)
> at
> org.apache.lucene.spatial.prefix.HeatmapFacetCounterTest.testRandom(HeatmapFacetCounterTest.java:172)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
> at
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
> at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at
> org

Re: [VOTE] 5.0.0 RC2

2015-02-13 Thread david.w.smi...@gmail.com
Anshum (and anyone else),

How would you feel about getting
https://issues.apache.org/jira/browse/LUCENE-6215 in, which is quite simply
moving a particular class (new to 5.x) to the correct Java package.  If
this isn’t done… then I’m forced to consider marking it deprecated at it’s
current wrong location and adding a subclass with the same name in the
correct location. Yuck.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Fri, Feb 13, 2015 at 4:09 AM, Anshum Gupta 
wrote:

> Not exactly but the one that Mark asked for help on has a mention of this.
>
> On Fri, Feb 13, 2015 at 1:06 AM, Uwe Schindler  wrote:
>
>> Ah,
>>
>>
>>
>> is this related to the one where Mark Miller also asked me for help
>> during review – I wanted to take care today?
>> https://issues.apache.org/jira/browse/SOLR-6736
>>
>>
>>
>> Uwe
>>
>>
>>
>> -
>>
>> Uwe Schindler
>>
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>
>> http://www.thetaphi.de
>>
>> eMail: [email protected]
>>
>>
>>
>> *From:* Anshum Gupta [mailto:[email protected]]
>> *Sent:* Friday, February 13, 2015 10:02 AM
>>
>> *To:* [email protected]
>> *Subject:* Re: [VOTE] 5.0.0 RC2
>>
>>
>>
>> Hi Uwe,
>>
>>
>>
>> You could upload a jar to Solr via the blob handler and then register
>> this custom-handler via the configs API.
>>
>> Anyone having http access to any solr node could potentially run
>> malicious code on all nodes.
>>
>>
>>
>>
>>
>> On Fri, Feb 13, 2015 at 12:56 AM, Uwe Schindler  wrote:
>>
>> Hi,
>>
>>
>>
>> What are we talking about? I just heard security, but no issue number or
>> explanation what’s wrong!
>>
>>
>>
>> Uwe
>>
>>
>>
>> -
>>
>> Uwe Schindler
>>
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>
>> http://www.thetaphi.de
>>
>> eMail: [email protected]
>>
>>
>>
>> *From:* Shalin Shekhar Mangar [mailto:[email protected]]
>> *Sent:* Friday, February 13, 2015 9:49 AM
>> *To:* [email protected]
>> *Subject:* Re: [VOTE] 5.0.0 RC2
>>
>>
>>
>> This is serious enough to re-spin. I have to change my vote to -1 to
>> release the current RC.
>>
>> On 13-Feb-2015 2:15 pm, "Noble Paul"  wrote:
>>
>> We should disable dynamic loading by default. It's a security
>> vulnerability, and users should have to explicitly enable it via a system
>> property.
>>
>> On Feb 13, 2015 6:47 AM, "Anshum Gupta"  wrote:
>>
>> Thank you everyone! This vote has passed and I'll start the process later
>> tonight.
>>
>>
>>
>>
>>
>> On Mon, Feb 9, 2015 at 3:16 PM, Anshum Gupta 
>> wrote:
>>
>> Please vote for the second release candidate for Lucene/Solr 5.0.0.
>>
>>
>>
>> The artifacts can be downloaded here:
>>
>>
>> http://people.apache.org/~anshum/staging_area/lucene-solr-5.0.0-RC2-rev1658469
>>
>>
>>
>> Or you can run the smoke tester directly with this command:
>>
>> python3.2 dev-tools/scripts/smokeTestRelease.py
>> http://people.apache.org/~anshum/staging_area/lucene-solr-5.0.0-RC2-rev1658469
>>
>>
>>
>>
>>
>> I could not get the above command to work, as downloading some file or
>> another timed out for me (over 6 attempts), so I instead downloaded the entire
>> RC as a tgz. I still have it here:
>>
>>
>>
>>
>> http://people.apache.org/~anshum/staging_area/lucene-solr-5.0.0-RC2-rev1658469.tgz
>>
>>
>>
>> Untar the above file at a location of your choice. Do not change the name
>> of the folder, as smokeTestRelease.py extracts information from it.
>>
>>
>>
>> and then instead of using http, used file://. Here's the command:
>>
>>
>>
>> python3.2 dev-tools/scripts/smokeTestRelease.py
>> file://
>>
>>
>>
>> and finally, here's my +1:
>>
>>
>>
>> > SUCCESS! [0:30:50.246761]
>>
>>
>>
>>
>> --
>>
>> Anshum Gupta
>>
>> http://about.me/anshumgupta
>>
>>
>>
>>
>>
>> --
>>
>> Anshum Gupta
>>
>> http://about.me/anshumgupta
>>
>>
>>
>>
>>
>> --
>>
>> Anshum Gupta
>>
>> http://about.me/anshumgupta
>>
>
>
>
> --
> Anshum Gupta
> http://about.me/anshumgupta
>


Re: Welcome Varun Thacker as Lucene/Solr committer

2015-02-23 Thread david.w.smi...@gmail.com
Welcome Varun, and congratulations!

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Mon, Feb 23, 2015 at 9:51 AM, Grant Ingersoll 
wrote:

> Hi All,
>
> Please join me in welcoming Varun Thacker as the latest committer on
> Lucene and Solr.
>
> Varun, tradition is for you to provide a brief bio about yourself.
>
> Welcome aboard!
>
> -Grant
>
>
>
>


StandardTokenizer, maxTokenLength behavior — likely bug

2015-01-26 Thread david.w.smi...@gmail.com
On one of my other open-source projects (SolrTextTagger) I have a test that
deliberately tests the effect of a very long token with the
StandardTokenizer, and that project is in turn tested against a wide matrix
of Lucene/Solr versions.  Before Lucene 4.9, if you had a token that
exceeded maxTokenLength (by default the max is 255), this created a skipped
position — basically a pseudo-stop-word.  Since 4.9, this doesn’t happen
anymore; the JFlex scanner thing never reports a token > 255.  I checked
our code coverage and sure enough the “skippedPositions++” never happens:

https://builds.apache.org/job/Lucene-Solr-Clover-trunk/lastSuccessfulBuild/clover-report/org/apache/lucene/analysis/standard/StandardTokenizer.html?line=167#src-167

Any thoughts on this?  Steve?

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley
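
For readers unfamiliar with the term, a "skipped position" means the over-long
token is dropped but still consumes a position, like a stop word, so phrase
queries don't treat its neighbors as adjacent. A minimal, self-contained
sketch of the pre-4.9 behavior (plain Python with a hypothetical whitespace
tokenizer, not Lucene's actual code):

```python
MAX_TOKEN_LENGTH = 255  # StandardTokenizer's default maximum

def tokenize_with_skips(text, max_len=MAX_TOKEN_LENGTH):
    """Emit (term, position_increment) pairs, skipping over-long tokens.

    Mimics the pre-Lucene-4.9 behavior described above: an over-long
    token is not emitted, but it bumps the next token's position
    increment (the "skippedPositions++" path), like a stop word would.
    """
    result = []
    increment = 1
    for token in text.split():
        if len(token) > max_len:
            increment += 1  # skipped position
            continue
        result.append((token, increment))
        increment = 1
    return result

tokens = tokenize_with_skips("foo " + "x" * 300 + " bar")
# 'bar' carries a position increment of 2, recording the skipped token
```

Per the report above, since 4.9 the JFlex scanner never hands the tokenizer a
token longer than 255 chars in the first place, so this skipped-position
branch is never taken.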


Re: StandardTokenizer, maxTokenLength behavior — likely bug

2015-01-26 Thread david.w.smi...@gmail.com
Thanks Steve.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Mon, Jan 26, 2015 at 11:22 AM, Steve Rowe  wrote:

> The behavior changed in https://issues.apache.org/jira/browse/LUCENE-5897
> / https://issues.apache.org/jira/browse/LUCENE-5400
>
> On Mon, Jan 26, 2015 at 11:17 AM, [email protected] <
> [email protected]> wrote:
>
>> On one of my other open-source projects (SolrTextTagger) I have a test
>> that deliberately tests the effect of a very long token with the
>> StandardTokenizer, and that project is in turn tested against a wide matrix
>> of Lucene/Solr versions.  Before Lucene 4.9, if you had a token that
>> exceeded maxTokenLength (by default the max is 255), this created a skipped
>> position — basically a pseudo-stop-word.  Since 4.9, this doesn’t happen
>> anymore; the JFlex scanner thing never reports a token > 255.  I checked
>> our code coverage and sure enough the “skippedPositions++” never happens:
>>
>>
>> https://builds.apache.org/job/Lucene-Solr-Clover-trunk/lastSuccessfulBuild/clover-report/org/apache/lucene/analysis/standard/StandardTokenizer.html?line=167#src-167
>>
>> Any thoughts on this?  Steve?
>>
>> ~ David Smiley
>> Freelance Apache Lucene/Solr Search Consultant/Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>
>


Re: Potential contribution: Geo 3d package

2015-01-22 Thread david.w.smi...@gmail.com
Nice Karl!  I’d love to learn more about this.  Do the shapes here
implement a Spatial4j Shape and thus would work with SpatialPrefixTree &
friends for index & search?  If not, what is the search side of the
equation here?

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Thu, Jan 22, 2015 at 3:08 PM, Karl Wright  wrote:

> I would like to explore contributing a geo3d package to Lucene.  This can
> be used in conjunction with Lucene search, both for generating geohashes
> (via spatial4j) for complex geographic shapes, as well as limiting the
> results from those queries to those within the exact shape in highly
> performant ways.
>
> The package uses 3d planar geometry to do its magic, which basically
> limits computation necessary to determine membership (once a shape has been
> initialized, of course) to only multiplications and additions, which makes
> it feasible to construct a performant BoostSource-based filter for
> geographic shapes.  The math is somewhat more involved when generating
> geohashes, but is still more than fast enough to do a good job.
>
> For reasons that are not really technical, the only open-source project
> that I can contribute this to initially is Lucene.  If people believe it
> would be a valuable addition, and would like me to create a ticket and
> attach a patch, please respond.
>
> Thanks,
> Karl Wright
>
>
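
The "only multiplications and additions" membership test Karl describes can be illustrated by checking which side of a plane a unit-sphere point falls on. This is a conceptual sketch with invented names, not the geo3d API:

```python
import math

def latlon_to_xyz(lat_deg, lon_deg):
    # Point on the unit sphere (a spherical Earth is assumed here).
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def on_positive_side(point, plane):
    # plane = (a, b, c, d) representing a*x + b*y + c*z + d >= 0.
    # Once the bounding planes are precomputed, each membership
    # check is just three multiplies and three adds.
    x, y, z = point
    a, b, c, d = plane
    return a * x + b * y + c * z + d >= 0

# A toy "shape" bounded by one plane: the northern hemisphere (z >= 0).
northern = [(0.0, 0.0, 1.0, 0.0)]

p = latlon_to_xyz(45.0, 10.0)
print(all(on_positive_side(p, pl) for pl in northern))  # True
```

A real spherical polygon would be bounded by one plane per edge, with the expensive trigonometry confined to shape-construction time, which matches the performance claim above.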


Re: Potential contribution: Geo 3d package

2015-01-22 Thread david.w.smi...@gmail.com
Okay.  Since this is not _already_ implementing a Spatial4j shape, I can
only presume this isn't using SpatialPrefixTree &
RecursivePrefixTreeStrategy etc.  So how is index & search done?  Or is
that simply not a part of what you are open-sourcing here — this
open-source release is just the computational geometry work you’ve done?
If I’m right can you reveal how that’s working in your system or is that
not for public release?

Anyway, to make it abundantly clear, I’m a strong +1 to this based on what
you’ve had to say so far.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Thu, Jan 22, 2015 at 3:42 PM, Karl Wright  wrote:

> I should make it clear: geo3d does not do geo hashing by itself -- it
> simply provides support for determining relationships between shapes and
> traditional bounding boxes, which is what Spatial4J needs to support Lucene
> geo hashing.
>
> Karl
>
> On Thu, Jan 22, 2015 at 3:36 PM, Karl Wright  wrote:
>
>> Hi David,
>>
>> The package itself is independent of spatial4j, but a GeoShape
>> implementation of spatial4j Shape is trivial; I can contribute that
>> separately.
>>
>> Karl
>>
>>
>> On Thu, Jan 22, 2015 at 3:27 PM, [email protected] <
>> [email protected]> wrote:
>>
>>> Nice Karl!  I’d love to learn more about this.  Do the shapes here
>>> implement a Spatial4j Shape and thus work with SpatialPrefixTree &
>>> friends for index & search?  If not, what is the search side of the
>>> equation here?
>>>
>>> ~ David Smiley
>>> Freelance Apache Lucene/Solr Search Consultant/Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>> On Thu, Jan 22, 2015 at 3:08 PM, Karl Wright  wrote:
>>>
>>>> I would like to explore contributing a geo3d package to Lucene.  This
>>>> can be used in conjunction with Lucene search, both for generating
>>>> geohashes (via spatial4j) for complex geographic shapes, as well as
>>>> limiting results resulting from those queries to those results within the
>>>> exact shape in highly performant ways.
>>>>
>>>> The package uses 3d planar geometry to do its magic, which basically
>>>> limits computation necessary to determine membership (once a shape has been
>>>> initialized, of course) to only multiplications and additions, which makes
>>>> it feasible to construct a performant BoostSource-based filter for
>>>> geographic shapes.  The math is somewhat more involved when generating
>>>> geohashes, but is still more than fast enough to do a good job.
>>>>
>>>> For reasons that are not really technical, the only open-source project
>>>> that I can contribute this to initially is Lucene.  If people believe it
>>>> would be a valuable addition, and would like me to create a ticket and
>>>> attach a patch, please respond.
>>>>
>>>> Thanks,
>>>> Karl Wright
>>>>
>>>>
>>>
>>
>


Re: Potential contribution: Geo 3d package

2015-01-22 Thread david.w.smi...@gmail.com
Ok; I get it.  Please go right ahead and create an issue with the source
attached.

p.s. I was half-expecting you to mention you had some sort of quadrangle
thing…. (memories from 5 years ago when you worked at MetaCarta) :-)

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Thu, Jan 22, 2015 at 4:46 PM, Karl Wright  wrote:

> Hi David,
>
> What I'm open-sourcing is the computational geometry package which
> provides the _underpinnings_ of both geo hash generation and shape
> filtering in our system.  What I would expect a user to index are geohashes
> generated by Lucene's spatial4j integration, using a spatial4j Shape which
> is trivial and which I can only contribute separately (because it has other
> ties in our world that I am *not* contributing).  The user will also need
> to index DocValues fields for each document that contain the X, Y, and Z
> values of the items in question, which can be generated by the package I'm
> contributing as well.  For filtering to the shape, there are a number of
> approaches possible; we have a BoostSource implementation that does this
> but which requires a FunctionQuery to be part of the search query.
>
> If you are in favor, I'll create a ticket and attach the library.
>
> Karl
>
>
> On Thu, Jan 22, 2015 at 4:01 PM, [email protected] <
> [email protected]> wrote:
>
>> Okay.  Since this is not _already_ implementing a Spatial4j shape, I can
>> only presume this isn't using SpatialPrefixTree &
>> RecursivePrefixTreeStrategy etc.  So how is index & search done?  Or is
>> that simply not a part of what you are open-sourcing here — this
>> open-source release is just the computational geometry work you’ve done?
>> If I’m right can you reveal how that’s working in your system or is that
>> not for public release?
>>
>> Anyway, to make it abundantly clear, I’m a strong +1 to this based on
>> what you’ve had to say so far.
>>
>> ~ David Smiley
>> Freelance Apache Lucene/Solr Search Consultant/Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>> On Thu, Jan 22, 2015 at 3:42 PM, Karl Wright  wrote:
>>
>>> I should make it clear: geo3d does not do geo hashing by itself -- it
>>> simply provides support for determining relationships between shapes and
>>> traditional bounding boxes, which is what Spatial4J needs to support Lucene
>>> geo hashing.
>>>
>>> Karl
>>>
>>> On Thu, Jan 22, 2015 at 3:36 PM, Karl Wright  wrote:
>>>
>>>> Hi David,
>>>>
>>>> The package itself is independent of spatial4j, but a GeoShape
>>>> implementation of spatial4j Shape is trivial; I can contribute that
>>>> separately.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Thu, Jan 22, 2015 at 3:27 PM, [email protected] <
>>>> [email protected]> wrote:
>>>>
>>>>> Nice Karl!  I’d love to learn more about this.  Do the shapes here
>>>>> implement a Spatial4j Shape and thus work with SpatialPrefixTree &
>>>>> friends for index & search?  If not, what is the search side of the
>>>>> equation here?
>>>>>
>>>>> ~ David Smiley
>>>>> Freelance Apache Lucene/Solr Search Consultant/Developer
>>>>> http://www.linkedin.com/in/davidwsmiley
>>>>>
>>>>> On Thu, Jan 22, 2015 at 3:08 PM, Karl Wright 
>>>>> wrote:
>>>>>
>>>>>> I would like to explore contributing a geo3d package to Lucene.  This
>>>>>> can be used in conjunction with Lucene search, both for generating
>>>>>> geohashes (via spatial4j) for complex geographic shapes, as well as
>>>>>> limiting results resulting from those queries to those results within the
>>>>>> exact shape in highly performant ways.
>>>>>>
>>>>>> The package uses 3d planar geometry to do its magic, which basically
>>>>>> limits computation necessary to determine membership (once a shape has
>>>>>> been initialized, of course) to only multiplications and additions,
>>>>>> which makes it feasible to construct a performant BoostSource-based
>>>>>> filter for
>>>>>> geographic shapes.  The math is somewhat more involved when generating
>>>>>> geohashes, but is still more than fast enough to do a good job.
>>>>>>
>>>>>> For reasons that are not really technical, the only open-source
>>>>>> project that I can contribute this to initially is Lucene.  If people
>>>>>> believe it would be a valuable addition, and would like me to create a
>>>>>> ticket and attach a patch, please respond.
>>>>>>
>>>>>> Thanks,
>>>>>> Karl Wright
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Solr geospatial index?

2015-01-10 Thread david.w.smi...@gmail.com
Hello Matteo,

Welcome. You are not bothering me/us; you are asking in the right place.

Jack’s right in terms of the field type dictating how it works.

LatLonType simply stores the latitude and longitude internally as separate
floating point fields and it does efficient range queries over them for
bounding-box queries.  Lucene has remarkably fast/efficient range queries
over numbers based on a Trie/PrefixTree. In fact systems like TitanDB leave
such queries to Lucene.  For point-radius, it iterates over all of them
in-memory in a brute-force fashion (not scalable but may be fine).
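
The two-phase pattern described above — a cheap bounding-box range prefilter followed by a brute-force exact check — can be sketched as follows (plain Python, not Solr internals; the function names are invented, and the haversine formula stands in for the exact distance math):

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance, assuming a spherical Earth.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def point_radius(points, lat, lon, radius_km):
    # Phase 1: cheap bounding-box prefilter -- the part that efficient
    # numeric range queries handle. The longitude delta is widened by
    # the latitude's cosine; poles and dateline are ignored for brevity.
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    dlon = dlat / max(math.cos(math.radians(lat)), 1e-9)
    box = [(la, lo) for (la, lo) in points
           if abs(la - lat) <= dlat and abs(lo - lon) <= dlon]
    # Phase 2: brute-force exact distance check over the survivors.
    return [(la, lo) for (la, lo) in box
            if haversine_km(la, lo, lat, lon) <= radius_km]

pts = [(48.85, 2.35), (51.50, -0.13), (40.71, -74.01)]  # Paris, London, NYC
print(point_radius(pts, 48.85, 2.35, 500))  # Paris and London are within 500 km
```

A real implementation must also handle the dateline and the poles; the sketch ignores both.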

BBoxField is similar in spirit to LatLonType; each side of an indexed
rectangle gets its own floating point field internally.

Note that for both listed above, the underlying storage and range queries
use built-in numeric fields.

SpatialRecursivePrefixTreeFieldType (RPT for short) is interesting in that
it supports indexing essentially any shape by representing the indexed
shape as multiple grid squares.  Non-point shapes (e.g. a polygon) are
approximated; if you need accuracy, you should additionally store the
vector geometry and validate the results in a 2nd pass (see
SerializedDVStrategy for help with that).  RPT, like Lucene’s numeric
fields, uses a Trie/PrefixTree but encodes two dimensions, not one.
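
The grid-square approximation behind RPT can be illustrated with a toy quadtree cover (invented cell labels; Lucene's actual cell encoding and tree types differ):

```python
def cover(shape_intersects, x0, y0, x1, y1, depth, prefix=""):
    """Return the cell labels (the 'terms') covering a shape.

    shape_intersects(x0, y0, x1, y1) -> True if the shape touches the cell.
    Each level splits a cell into 4 quadrants labeled A-D, so a term
    like 'CB' is a prefix-encoded path into the grid: deeper cells mean
    smaller squares and higher precision.
    """
    if not shape_intersects(x0, y0, x1, y1):
        return []
    if depth == 0:
        return [prefix]
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    quads = [("A", x0, y0, mx, my), ("B", mx, y0, x1, my),
             ("C", x0, my, mx, y1), ("D", mx, my, x1, y1)]
    cells = []
    for label, qx0, qy0, qx1, qy1 in quads:
        cells += cover(shape_intersects, qx0, qy0, qx1, qy1,
                       depth - 1, prefix + label)
    return cells

# A point "shape": it intersects a cell iff the point lies inside it.
def point_shape(px, py):
    return lambda x0, y0, x1, y1: x0 <= px <= x1 and y0 <= py <= y1

print(cover(point_shape(0.3, 0.7), 0.0, 0.0, 1.0, 1.0, 2))  # ['CB']
```

A polygon would simply intersect many cells, which is exactly why non-point shapes come out approximated and may want the second-pass geometry check mentioned above.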

The Trie/PrefixTree concept underlies both RPT and numeric fields, which
are approaches to using Lucene’s terms index to encode prefixes.  So the
big point here is that Lucene/Solr doesn’t have side indexes using
fundamentally different technologies for different types of data; no;
Lucene’s one versatile index looks up terms (for keyword search), numbers,
AND 2-d spatial.  For keyword search, the term is a word, for numbers, the
term represents a contiguous range of values (e.g. 100-200), and for 2-d
spatial, a term is a grid square (a 2-D range).
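
The "term represents a contiguous range" idea for numbers can be shown with a toy decimal trie; Lucene's actual numeric encoding uses binary precision steps, but the prefix principle is the same:

```python
def trie_terms(n, width=3):
    # Index a number under itself plus every shorter prefix term.
    # Each prefix denotes a contiguous range: '1*' covers 100-199.
    s = str(n).zfill(width)
    return [s] + [s[:i] + "*" for i in range(width - 1, 0, -1)]

def range_query_terms(lo, hi, width=3):
    # Greedily cover [lo, hi] with as few prefix terms as possible.
    terms, n = [], lo
    while n <= hi:
        # Use the widest prefix whose whole range still fits in [n, hi].
        for i in range(1, width + 1):
            step = 10 ** (width - i)
            if n % step == 0 and n + step - 1 <= hi:
                s = str(n).zfill(width)
                terms.append(s if step == 1 else s[:i] + "*")
                n += step
                break
    return terms

print(trie_terms(142))              # ['142', '14*', '1*']
print(range_query_terms(100, 200))  # ['1*', '200'] -- 2 terms, 101 values
```

Because a document with value 142 is indexed under '1*' as well, the range query 100-200 matches it with a handful of term lookups instead of one lookup per value, which is the essence of the trie trick for both numerics and (in two dimensions) RPT grid cells.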

I am aware many other DBs put spatial data in R-Trees, and I have no
interest in investing energy in doing that in Lucene.  That isn’t to say I
think that other DBs shouldn’t be using R-Trees.  I think a system based on
sorted keys/terms (like Lucene and Cassandra, Accumulo, HBase, and others)
already has a powerful/versatile index such that it doesn’t warrant
complexity in adding something different.  And Lucene’s underlying index
continues to improve.  I am most excited about an “auto-prefixing”
technique McCandless has been working on that will bring performance up to
the next level for numeric & spatial data in Lucene’s index.

If you’d like to learn more about RPT and Lucene/Solr spatial, I suggest my
“Spatial Deep Dive” presentation at Lucene Revolution in San Diego, May
2013:  Lucene / Solr 4 Spatial Deep Dive

Also, my article here illustrates some RPT concepts in terms of indexing:
http://opensourceconnections.com/blog/2014/04/11/indexing-polygons-in-lucene-with-accuracy/

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Sat, Jan 10, 2015 at 10:26 AM, Matteo Tarantino <
[email protected]> wrote:

> Hi all,
> I hope to not bother you, but I think I'm writing to the only mailing list
> that can help me with my question.
>
> I am writing my master thesis about Geographical Information Retrieval
> (GIR) and I'm using Solr to create a little geospatial search engine.
> Reading  papers about GIR I noticed that these systems use a separate data
> structure (like an R-tree http://it.wikipedia.org/wiki/R-tree) to save
> geographical coordinates of documents, but I have found nothing about how
> Solr manages coordinates.
>
> Can someone help me and, most of all, point me to documents
> that explain how and where Solr saves spatial information?
>
> Thank you in advance
> Matteo
>


Re: how to highlight the whole search phrase only?

2015-01-12 Thread david.w.smi...@gmail.com
Hi Meena,
Please use the “solr-user” list for user questions. This is the list for
development of Lucene & Solr.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Mon, Jan 12, 2015 at 6:26 PM, [email protected] <
[email protected]> wrote:

> Highlighting does not highlight the whole phrase; instead, each word gets
> highlighted.
> I tried all the suggestions that were given, with no luck.
> These are my special settings for phrase highlighting:
> hl.usePhraseHighlighter=true
> hl.q="query"
>
>
>
> http://localhost.mathworks.com:8983/solr/db/select?q=syndrome%3A%22Override+ignored+for+property%22&rows=1&fl=syndrome_id&wt=json&indent=true&hl=true&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E&hl.usePhraseHighlighter=true&hl.q=%22Override+ignored+for+property%22&hl.fragsize=1000
>
>
> This is from my schema.xml
> 
>
> Should I add anything special in the indexing stage itself to make this work?
>
> Thanks for your time.
>
> Meena
>
>
>
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-to-highlight-the-whole-search-phrase-only-tp4179078.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

