Hello Erick,
I was trying to optimise the search.
Basically, in my data field1 matches fewer docs than
field2, which matches larger sets.
So if the search proceeds in clause order, then I can make field1 be searched
first (by ordering the boolean query accordingly) and from there the
Thanks for the ref - didn't know about Pig before.
The language and approach look useful, so now I'm wondering if it
couldn't be used
across Lucene over Hadoop too. If data was indexed in Lucene and Pig knew that,
then it could make for an interesting alternate Lucene query language.
could this w
All -
I have a question about memory use and Lucene. I'm not sure if I'm
dealing with a leak, or if I'm seeing expected behavior. I'll preface this
by acknowledging that the "error" could be in my understanding of things.
I've included a lot of information below. There is a demo progr
By issuing multiple queries, one against each localized index, results being
clustered by locale.
You can further refine by translating the end-user input query terms for
each locale and issue "translated" queries against the respective indices.
I've seen satisfying results with "key" terms dicti
pagod wrote:
>
> ... apply only in a particular situation:
>
Very true, as often in the IR field :-) ; in our case, the "same" document
existed in different locales; these were localized technical docs which also
meant the dictionary (of important) terms was limited and used to influence
scorin
My big question is how do you loop 1M records, sum up field(s), and then
sort on that field... all in memory (could use too much ram) ? In a
temporary index (could take a while to re-write a lot of documents in a new
index) ?
- Mike
aka...@gmail.com
On Thu, Apr 1, 2010 at 5:31 PM, Chris Lu wro
Hmm, not good. Can you post a heap dump? Also, can you turn on
infoStream, index up to the OOM @ 512 MB, and post the output?
IndexWriter should not hang onto much beyond the RAM buffer. But, it
does allocate and then recycle this RAM buffer, so even in an idle
state (having indexed enough docs
We are seeing a situation where the IndexWriter is using up the Java heap space
and only releases memory for garbage collection upon a commit. We are using
the default RAMBufferSize of 16 MB. We are using Lucene 2.9.1. We are set at
a heap size of 512 MB.
We have a large number of documents th
Hi David,
pagod wrote:
>
> ... apply only in a particular situation:
>
Very true, as often in the IR field :-) ; in our case, the "same" document
existed in different locales; these were localized technical docs which also
meant the dictionary (of important) terms was limited and used to influ
Thanks. Not really trying to sell DBSight here since most people here
are Lucene experts.
Just to confirm that this "challenge" has been done via Lucene for quite
a while.
The technique for it is very similar to how facet search is done, which
can also be done in several ways.
Millions of rows are not r
I'm sure the DBSight feature is great, but we already have a system in place
and we're not throwing it away -- it's closely integrated with our whole
platform. We're way past the point to switch our solution to DBSight. We'd
be more than happy to use the DBSight feature if it would be opensource b
For DBSight, the aggregated values are computed at run time,
and the sorting on the computed aggregated values is done when
displaying the results.
The idea is that, after the aggregation, the number of aggregated values is
much, much smaller.
--
Chris Lu
-
Instant Sc
On Fri, Apr 2, 2010 at 12:54 AM, Chris Lu wrote:
> No need for Hadoop. It's even slower. Lucene can do it easily.
>
> This has been implemented in DBSight.
> The implementation is very similar to Facet search. Just need a way to load
> the field quickly, like put it in memory or some data str
Hi all,
I'm happy to announce the release of Luke - the Lucene Index Toolbox.
You can get an executable self-contained jar here:
http://luke.googlecode.com/files/lukeall-1.0.1.jar
The Downloads section contains also the source code and a minimal
Luke-only jar.
This release upgrades to Lucene 3
No need for Hadoop. It's even slower. Lucene can do it easily.
This has been implemented in DBSight.
The implementation is very similar to facet search. You just need a way to
load the field quickly, e.g. by putting it in memory or in some data structure,
and compute the sum/min/max during searching.
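
A minimal sketch of that idea in plain Python (no Lucene calls), assuming the
field has already been loaded into an in-memory list indexed by document id.
The names Stats, aggregate, sale_amounts and hits are hypothetical and only
illustrate accumulating sum/min/max in one pass over the matching documents:

```python
class Stats:
    """Running count/sum/min/max over the values seen so far."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.minimum = float("inf")
        self.maximum = float("-inf")

    def add(self, value):
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)


def aggregate(matching_doc_ids, field_values):
    """Accumulate stats for the matching docs in a single pass.

    field_values is the in-memory column: field_values[doc_id] is the
    numeric field value for that document (an assumption of this sketch).
    """
    stats = Stats()
    for doc_id in matching_doc_ids:
        stats.add(field_values[doc_id])
    return stats


# Hypothetical column of sale amounts, one entry per document id.
sale_amounts = [10.0, 10.0, 15.0, 5.0, 20.0, 1.0]
# Hypothetical doc ids returned by a search.
hits = [0, 2, 4]

stats = aggregate(hits, sale_amounts)
print(stats.count, stats.total, stats.minimum, stats.maximum)
```

The per-hit work is a single array lookup, which is why loading the field
into memory up front matters more than the aggregation itself.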
--
Hey,
I found them by googling and searching within the website, but it
would be better to update the links in that wiki.
On Fri, Apr 2, 2010 at 12:43 AM, rohit dholakia wrote:
> Hi,
>
> I am trying to access the articles in the resources part of the lucene
> wiki but all of them say
Hi,
I am trying to access the articles in the resources part of the Lucene
wiki but all of them say "Page not found". Why is that? Are all the
articles hosted at another page now?
Rohit
This looks like a use case more suited for Pig (over Hadoop).
It could be difficult for Lucene to sort and sum simultaneously, as
the sorting itself depends on the summed value.
On Thu, Apr 1, 2010 at 11:47 PM, Michel Nadeau wrote:
> Well that's my problem: we have a lot of records of all types (
Well that's my problem: we have a lot of records of all types (affiliates,
sales) so looping over tons of records each time isn't possible.
- Mike
aka...@gmail.com
On Thu, Apr 1, 2010 at 2:11 PM, prasenjit mukherjee wrote:
> If the number of documents (in this case "Affiliates") isn't huge,
> so
If the number of documents (in this case "Affiliates") isn't huge,
sorting can probably be done as a post-process.
I still don't see any need for joins here.
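
As a rough illustration of sorting as a post-process, here is a plain-Python
sketch (no Lucene calls) that sums SaleAmount per affiliate over a date range
and only then sorts the small set of totals. The rows are the first six from
the Sales example in this thread; the function names totals_by_affiliate and
sort_by_total are hypothetical:

```python
from collections import defaultdict

# (Affiliate, SaleDate, SaleAmount) rows from the Sales example.
sales = [
    ("mike", "2010-03-01", 10.00),
    ("john", "2010-03-01", 10.00),
    ("mike", "2010-03-02", 15.00),
    ("john", "2010-03-02", 5.00),
    ("mike", "2010-03-03", 20.00),
    ("john", "2010-03-03", 1.00),
]


def totals_by_affiliate(rows, start, end):
    """Sum amounts per affiliate for start <= SaleDate <= end."""
    totals = defaultdict(float)
    for affiliate, sale_date, amount in rows:
        # ISO-formatted dates compare correctly as plain strings.
        if start <= sale_date <= end:
            totals[affiliate] += amount
    return dict(totals)


def sort_by_total(totals):
    """Sort the (few) aggregated values, largest total first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranked = sort_by_total(totals_by_affiliate(sales, "2010-03-01", "2010-03-03"))
print(ranked)
```

The sort only touches one entry per affiliate, not one per sale, so its cost
depends on the number of affiliates rather than the number of records.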
On Thu, Apr 1, 2010 at 7:16 PM, Michel Nadeau wrote:
> Hi,
>
> Here's an example of raw data that would be in my Sales index:
>
> *Affili
Hi, Michel,
You can use the free version of DBSight to test it out.
However, it's a whole solution, so you will need to configure it
first, e.g. by specifying which column you want to aggregate before
the actual search.
BTW: DBSight also supports MIN and MAX, in addition to SUM and AVG.
--
Chris Lu
On 1 Apr 2010, at 11:21, > wrote:
It's written: "to do a "search within search", so that the second search is
constrained by the results of the first query"
If I understand your needs, you could, while collecting search results,
populate a new filter with all matching documents and use that filt
: Subject: query: order of search
: In-Reply-To: <8d42dcc0-4e03-4f8b-a6cc-c53890910...@transpac.com>
: References:
: <8d42dcc0-4e03-4f8b-a6cc-c53890910...@transpac.com>
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mail
Have you looked at Solr's StatsComponent?
On Mar 31, 2010, at 9:17 PM, Michel Nadeau wrote:
> Hi,
>
> We're currently in the process of switching many of our screens from MySQL
> to Lucene because MySQL simply dies because we have too much data and it's
> becoming too long to generate the stats
Why do you care? By that I mean "what problem are you trying to solve"
(See "The XY problem" at http://people.apache.org/~hossman/). The reason
I'm asking here is that very often, when people ask this kind of question
without providing background, they're trying the wrong approach to solve
a problem
Hi,
thx for sharing your experience with us. I'm happy to see that both methods
I've thought of are apparently sensible ;-)
However, it might be due to my lack of experience in that domain, but some of
your arguments in favor of a multi-index solution seem to me to be also
compatible with a si
Hi,
Here's an example of raw data that would be in my Sales index:
*Affiliate / SaleDate / SaleAmount*
* mike / 2010-03-01 / 10.00
* john / 2010-03-01 / 10.00
* mike / 2010-03-02 / 15.00
* john / 2010-03-02 / 5.00
* mike / 2010-03-03 / 20.00
* john / 2010-03-03 / 1.00
* mike / 2010-03-04 / 10.0
Hi Michel,
You can do all of this with Lucene, though not with the standard index/query
operators. At Attivio we have a custom Lucene index structure + custom
query operators that support relational joins across records in an index. You
can write the queries in our standard query language or run
> Lucene is great at searching for data, but just because it is awesome in one
> area doesn't mean it would excel in something it wasn't designed for ;-)
I think lucene is probably one of the better data structures for
computing "conditional aggregated stats". Even for straight search
lucene has
How?
paul
On 1 Apr 2010, at 14:19, henrib wrote:
Finally, query expansion can also be used in the multiple indices case and
might even use automated/guided translation.
-
To unsubscribe, e-mail: java-user-unsubscr...@lu
Hi,
I worked some time ago on a similar system (using Solr) and used the
multiple indices route (the multicore feature in Solr). In our case, the
"same" document could exist in different languages; different localized
versions of the same information (same Solr unique id for each l10n
version).
If you are going to end up either copying or moving all the data to Lucene
(which, even when you hook up Lucene to the existing MySQL data, will still
create its own copy of the data), you might really want to look at other
options:
* column-oriented databases (analytical databases). If ope
Not sure what you mean by "joining" in Lucene, since conceptually
there is only 1 table (with many fields, aka columns) in Lucene. A
representative query would help in understanding the use case.
Again, I didn't get the "sorting" part. SUM() will return only 1
aggregated value, so what do you want to sor
Are you planning to be able to sort by these SUMs? A SpanQuery would work
great to get the integers... then you would loop and sum up... but what
about "joining" with your other data and sorting?
- Mike
aka...@gmail.com
On Wed, Mar 31, 2010 at 9:23 PM, prasenjit mukherjee
wrote:
> I too am tryi
@Ken: yeah we thought about it - but we have a HUGE amount of data (sales,
affiliates, etc.) - so pre-calculating everything isn't really an option.
Plus I don't know how we would sort.. let's say I get the totals for
affiliate X, loop totals from day 1 to X (range), sum up, great: I can do
this fo
There is no way to just index an object as such, but there are ways of
creating Fields out of byte arrays or with Readers so you could use
them with serialization or something.
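
A small sketch of the serialization route in plain Python, assuming the goal
is simply to turn one object into a byte array that could be kept in a binary
field and turned back into an object later. The Student class (with the
rollno, name and marks fields mentioned elsewhere on this list) and the helper
names are hypothetical, and JSON is just one possible encoding:

```python
import json


class Student:
    """Hypothetical object to be stored as a single binary value."""

    def __init__(self, rollno, name, marks):
        self.rollno = rollno
        self.name = name
        self.marks = marks


def to_field_bytes(student):
    """Serialize the object's attributes to UTF-8 encoded JSON bytes."""
    return json.dumps(vars(student), sort_keys=True).encode("utf-8")


def from_field_bytes(data):
    """Rebuild the object from bytes produced by to_field_bytes."""
    return Student(**json.loads(data.decode("utf-8")))


original = Student(rollno=2, name="abc", marks=90)
payload = to_field_bytes(original)
restored = from_field_bytes(payload)
print(type(payload).__name__, vars(restored) == vars(original))
```

Note that a value stored this way is opaque bytes: it can be retrieved with
the document, but individual attributes are not searchable unless they are
also indexed as separate fields.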
--
Ian.
On Thu, Apr 1, 2010 at 10:46 AM, Bujji wrote:
> hi all,
>
> i want to index an object once instead of indexing
hi all,
I want to index an object once instead of indexing it by strings separately.
Is there an existing facility for that, or is there any analyzer for it?
Please help me.
Thanks
Bujji
Query I
It's written: "to do a "search within search", so that the second search is
constrained by the results of the first query".
We can use a boolean query.
So doesn't that mean the order of the query will be preserved?
Please give me a simple example of how the docs get searched in Lucene:
10 docs with 3 fields
> Query I
> Does the order of the query play a role in searching?
> Example: doc has fields
> rollno(pk), name, marks
>
> Query : marks=90&rollno=2&name=abc
>
> Query : rollno=2&name=abc&marks=90
>
> Which query processing will be more efficient?
> Does it work like searching a doc field by field, it will look fo
Hi, Michel,
This has already been implemented in DBSight. Check it out!
http://www.dbsight.net
You can get sum, avg for Facet searches. And count is included in Facet
search directly.
--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: htt
Hi,
thanks Paul for your input. I'm gonna try the "localized field" variant and see
how it works for me.
I think your idea of automatically boosting the user language is neat, but it
should definitely be possible to disable this boosting... Most users have no
idea about the language settings