Documents indexing

2023-12-04 Thread Ashkar
Hi Solr Users,

I have a few questions.


  1.  Can I crawl OneDrive and index the documents?
  2.  Are we able to crawl a website that requires a login?
  3.  Can we crawl documents from an HTTP/HTTPS-based portal and index them?


Regards,





Ashkar

System Analyst

M +91 9605043094

E   ash...@chimeratechnologies.com

W  www.chimeratechnologies.com

Solutions for : FinTech | InsurTech | HRTech | Monitoring | Governance
Offered as : Product Development | Application Management | QA and Testing




***Disclaimer *** This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION 
intended solely for the use of the addressee(s). If you are not the intended 
recipient, please notify the sender by e-mail and delete the original message. 
The unauthorized use, dissemination, distribution, or reproduction of this 
e-mail, including attachments, is prohibited and may be unlawful. This e-mail 
may contain viruses. Chimera has taken every reasonable precaution to minimize 
this risk but is not liable for any damage you may sustain as a result of any 
virus in this e-mail. You should carry out your own virus checks before opening 
the e-mail or attachment. Chimera reserves the right to monitor and review the 
content of all messages sent to or from this e-mail address. Messages sent to 
or from this e-mail address may be stored on Chimera's e-mail system.


Re: Documents indexing

2023-12-04 Thread Charlie Hull

Hi Ashkar,


Yes, you can do all of these things, but not with Solr alone, which doesn't 
come with a built-in website crawler. You'll need to look at other 
projects for that, such as:

Heritrix: http://crawler.archive.org/index.html
Nutch (created by Doug Cutting, who also created Lucene): 
http://lucene.apache.org/nutch/ - there's a tutorial that includes Solr at 
https://cwiki.apache.org/confluence/display/nutch/NutchTutorial
ManifoldCF: https://manifoldcf.apache.org/en_US/index.html


There are a few other options on this (slightly old) page: 
https://cwiki.apache.org/confluence/display/SOLR/SolrEcosystem - and 
there are probably hundreds of other options, including writing your own.
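
For the "writing your own" option, the following is a minimal Python sketch, 
not a recommended crawler. It turns one fetched HTML page into a document 
and posts it to a Solr JSON update endpoint. The field names title_s and 
content_txt, the helper names, and the update URL passed by the caller are 
all assumptions for illustration; fetching pages, handling logins, and 
reading from OneDrive are left out entirely.

```python
import json
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the <title> text and visible body text, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.title = []
        self.text = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        chunk = data.strip()
        if self._skip_depth or not chunk:
            return
        (self.title if self._in_title else self.text).append(chunk)


def build_solr_doc(url, html):
    """Build a Solr document dict; the field names are hypothetical."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {
        "id": url,
        "title_s": " ".join(parser.title),
        "content_txt": " ".join(parser.text),
    }


def post_to_solr(docs, update_url):
    """POST a list of documents as a JSON array to the given update URL."""
    request = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A caller would pass something like 
http://localhost:8983/solr/mycollection/update?commit=true as update_url, 
where the collection name is a placeholder. For anything beyond a handful 
of pages, one of the projects listed above is likely the better choice.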



Best

Charlie


--
Charlie Hull - Managing Consultant at OpenSource Connections Limited
Founding member of The Search Network and co-author of Searching the Enterprise
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828

OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
Amtsgericht Charlottenburg | HRB 230712 B
Geschäftsführer: John M. Woodell | David E. Pugh
Finanzamt: Berlin Finanzamt für Körperschaften II


Re: Invalid PHPS response for Luke request

2023-12-04 Thread Mikhail Khludnev
Hello Thomas.
It's not really my wheelhouse. There might be a problem in LukeHandler or
its compatibility with phps; e.g., under the "solr" key it puts Lucene's
Document rather than a SolrDocument (see
LukeRequestHandler.handleRequestBody).
Do you really need /luke? Can't you obtain the doc via /select or /get?

On Sun, Dec 3, 2023 at 11:49 AM Thomas Corthals 
wrote:

> Hi all,
>
>
> The output of a Luke request for a specific document with wt=phps can't be
> unserialized in PHP because it contains an error.
>
>
> > curl 'http://localhost:8983/solr/techproducts/admin/luke?id=apple&wt=phps'
>
>
> The output is structured like this, I'm omitting some details for brevity.
>
>
>
> a:4:{s:14:"responseHeader";a:2:{s:6:"status";i:0;s:5:"QTime";i:1;}s:5:"index";a:0:{}s:3:"doc";a:3:{s:5:"docId";i:7;s:6:"lucene";a:3:{s:2:"id";a:0:{}s:10:"compName_s";a:0:{}s:9:"address_s";a:0:{}}s:4:"solr";i:0;a:3:{s:2:"id";s:5:"apple";s:10:"compName_s";s:5:"Apple";s:9:"address_s";s:28:"1 Infinite Way, Cupertino CA";}}s:4:"info";a:2:{s:3:"key";a:0:{}s:4:"NOTE";s:116:"Document Frequency (df) is not updated when a document is marked for deletion.  df values include deleted documents.";}}
>
>
> The Solr document in this output should be an array item with key "solr"
> and an array of fields as value. It's represented by this bit:
>
>
>
> s:4:"solr";i:0;a:3:{s:2:"id";s:5:"apple";s:10:"compName_s";s:5:"Apple";s:9:"address_s";s:28:"1 Infinite Way, Cupertino CA";}
>
>
> The i:0; part doesn't belong there. It makes PHP think the value of key
> "solr" is integer 0 and then it bails because it doesn't expect an array
> a:3: to follow next. I suspect writeSolrDocument() in the
> PHPSerializedResponseWriter is to blame. It starts with writeKey(idx,
> false); assuming it's always writing a document to an array?
>
>
> Thomas
>


-- 
Sincerely yours
Mikhail Khludnev


Re: [TOKYO Lucene/Solr meetup] ~ How to Integrate Open Source Large Language with Apache Solr (Alessandro) ~

2023-12-04 Thread Mikhail Khludnev
Hello,
Alessandro, thank you. It was a great, concise talk!
I was a little bit late to the party. Koji, thanks for the link!
Could you share a recording?


On Fri, Dec 1, 2023 at 11:16 AM Koji Sekiguchi 
wrote:

> https://meet.google.com/zrf-khvk-ifw
>
>
> On Fri, 1 Dec 2023 at 17:09, Mikhail Khludnev wrote:
>
> > Hi there!
> > Is it supposed to go right now? I joined via link in event
> > https://meet.google.com/eob-sjub-ghe, but I'm the only one.
> >
> > On Mon, Nov 27, 2023 at 5:01 AM Moriyasu Kannami <
> > moriyasu.kann...@rondhuit.com> wrote:
> >
> > > Hi, Alessandro.
> > >
> > > Thank you for your reply and  advertisement.
> > > I am looking forward to the lecture scheduled for December 1st in
> JAPAN.
> > >
> > > And,
> > > To the members of this community, here is a reminder!
> > >
> > > If you wish to participate in this meetup, please register using the
> > > following URL.
> > > The event page(in Japanese): https://solr.doorkeeper.jp/events/166257
> > >
> > > We are currently looking for several speakers for Lightning Talks of
> > > this meetup.
> > > Speakers are welcome to participate online(Google Meet) as well.
> > > The theme can be anything related to the Apache Lucene/Solr,
> > > such as case studies, new features, and more.
> > > If you wish to participate as a LT speaker, please contact me
> > > at my email address.
> > >
> > > Thanks!
> > >
> > > moriyasu
> > >
> > >
> > > On Fri, 24 Nov 2023 at 0:26, Alessandro Benedetti wrote:
> > >
> > > > Thanks, Moriyasu for the heads up!
> > > > I'll advertise it on my LinkedIn soon and on the Apache Solr Twitter
> > > > account!
> > > > Also contacted some of the Japanese people I know to get better
> > > engagement
> > > > :)
> > > >
> > > > Cheers
> > > > --
> > > > *Alessandro Benedetti*
> > > > Director @ Sease Ltd.
> > > > *Apache Lucene/Solr Committer*
> > > > *Apache Solr PMC Member*
> > > >
> > > > e-mail: a.benede...@sease.io
> > > >
> > > >
> > > > *Sease* - Information Retrieval Applied
> > > > Consulting | Training | Open Source
> > > >
> > > > Website: Sease.io | LinkedIn | Twitter | Youtube | Github
> > > >
> > > >
> > > > On Wed, 22 Nov 2023 at 04:13, Moriyasu Kannami <
> > > > moriyasu.kann...@rondhuit.com> wrote:
> > > >
> > > > > Hi, everyone!
> > > > >
> > > > > The Solr community in Tokyo(#SolrJP) will hold a meet-up on 1
> > December
> > > > > 2023 at 17:00 JST(1 December 2023 at 01:00 PDT) in Toranomon
> > > > > Minato-Ku Tokyo. Online streaming via Google Meet is also
> available.
> > > > >
> > > > > The topic of this presentation is "How to Integrate
> > > > > Open Source Large Language with Apache Solr".
> > > > > This presentation is in English for about 45min.
> > > > >
> > > > > Speaker : Alessandro Benedetti
> > > > >   Apache Lucene/Solr Committer
> > > > >   Apache Solr PMC Member
> > > > >
> > > > > If you wish to participate, please register using the following
> URL.
> > > > > The event page(in Japanese):
> > https://solr.doorkeeper.jp/events/166257
> > > > >
> > > > > We look forward to your participation.
> > > > >
> > > > > We are currently looking for several speakers for Lightning Talks
> of
> > > > > this meetup who reside in the Tokyo vicinity and can attend the
> > > > > venue(Toranomon Minato-Ku) on the day of the event.
> > > > > Speakers are welcome to participate online(Google Meet) as well.
> > > > > In that case,the place of residence is not a concern.
> > > > > The theme can be anything related to the Apache Lucene/Solr,
> > > > > such as case studies, new features, and more.
> > > > > If you wish to participate as a LT speaker, please contact me
> > > > > at this email address.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > moriyasu
> > > > >
> > > >
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: [TOKYO Lucene/Solr meetup] ~ How to Integrate Open Source Large Language with Apache Solr (Alessandro) ~

2023-12-04 Thread Moriyasu Kannami
Hi Mikhail,

I was the host of the Google Meet at that meet-up.
I didn't record the Google Meet session.

Regards,
moriyasu




RE: [TOKYO Lucene/Solr meetup] ~ How to Integrate Open Source Large Language with Apache Solr (Alessandro) ~

2023-12-04 Thread ufuk yılmaz
Hi,

Thanks for inviting us, it was very informative!

Is it possible to have a copy of the slides? If they are private/closed source, 
then I understand.

Sincerely,

Ufuk yilmaz 




Possible to identify _all_ fields in the index with actual multiple values?

2023-12-04 Thread Jimi Hullegård
We have a lot of fields that are defined as multiValued="true", but where I 
suspect that we only actually use it as a single value field. And I would like 
to tidy up the schema a bit, and only use multiValued="true" for fields where 
it's actually needed. So, I would like to get an overview of all fields that 
could potentially be converted to single value fields, as well as all fields 
that actually need to stay multi valued.

As far as I understand it, there is no built-in way to get this information out 
of the box (i.e. without changing the configuration, and without going through 
each document and each field). I have read about the 
CountFieldValuesUpdateProcessorFactory, but I'm unsure how to use it for many 
fields, most of which are dynamic fields. Does someone have an example of this? 
Preferably a way to make Solr index the count for every multivalued field in the 
index.

Or is there some different approach I can take?

Regards

Svenskt Näringsliv is the voice of enterprise in Sweden. We work together with 48 
employer and industry organisations and are the joint voice of 60,000 
companies with almost 2 million employees. Our task is to speak for all 
companies and industries, including those that do not yet exist but could emerge if 
the conditions are right. A better business climate for a better 
Sweden. That is our mission.

Svenskt Näringsliv processes your personal data in accordance with GDPR. Here you can 
read more about our processing and your rights: 
Privacy policy


Re: Invalid PHPS response for Luke request

2023-12-04 Thread Thomas Corthals
Hello Mikhail

I wrote the Luke functionality for the Solarium PHP client. There is no use
case here, just the desire to fully support the functionality.

If we can agree this is a bug in the PHPS output (JSON output doesn't
contain errors) I'll open a JIRA issue for it and implement a forward
compatible workaround in the client.

Thomas
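
A forward-compatible client-side workaround of the kind mentioned above 
could look roughly like the following Python sketch. It is only an 
illustration and not the Solarium implementation: the function name is 
hypothetical, and it assumes the stray index always appears as i:0; 
directly between the "solr" key and its array. It works on bytes because 
PHP string lengths are byte counts. When the response is already correct, 
nothing is changed.

```python
STRAY = b's:4:"solr";i:0;a:'
FIXED = b's:4:"solr";a:'


def strip_stray_index(payload: bytes) -> bytes:
    """Drop the stray i:0; after the "solr" key, if present (first match only)."""
    return payload.replace(STRAY, FIXED, 1)
```

A plain byte replacement like this could in principle also match inside a 
stored field value that happens to contain the same byte sequence, so a 
real client would probably want something more targeted.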



Re: Invalid PHPS response for Luke request

2023-12-04 Thread Mikhail Khludnev
It looks like https://github.com/apache/solr/pull/2114
I see 'i:0' in actual output. I'm not sure whether it's wrong or right.



-- 
Sincerely yours
Mikhail Khludnev


Re: Invalid PHPS response for Luke request

2023-12-04 Thread Mikhail Khludnev
I noticed that every array element and Solr document field is prefixed
with a numeric key, which is its sequential element index. So these i:0
entries might well be valid.
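
One way to check whether an i:0 is valid in a given position is to parse 
the output as strict key/value pairs, which is how PHP's serialized arrays 
are laid out: a:N:{...} holds exactly N pairs. The following Python sketch 
is a minimal parser written for this purpose, not PHP's own unserialize(); 
it handles only the i, s and a types that appear in the Luke output.

```python
def parse(data: bytes, pos: int = 0):
    """Parse one PHP-serialized value; return (value, position after it)."""
    tag = data[pos:pos + 1]
    if tag == b"i":                      # i:<int>;
        end = data.index(b";", pos)
        return int(data[pos + 2:end]), end + 1
    if tag == b"s":                      # s:<byte length>:"<bytes>";
        colon = data.index(b":", pos + 2)
        length = int(data[pos + 2:colon])
        start = colon + 2                # skip the colon and opening quote
        return data[start:start + length].decode("utf-8"), start + length + 2
    if tag == b"a":                      # a:<pair count>:{<key><value>...}
        colon = data.index(b":", pos + 2)
        count = int(data[pos + 2:colon])
        pos = colon + 2                  # skip the colon and opening brace
        result = {}
        for _ in range(count):
            key, pos = parse(data, pos)
            if not isinstance(key, (int, str)):
                raise ValueError("array key must be an integer or a string")
            value, pos = parse(data, pos)
            result[key] = value
        if data[pos:pos + 1] != b"}":
            raise ValueError(f"expected '}}' at byte {pos}")
        return result, pos + 1
    raise ValueError(f"unexpected tag {tag!r} at byte {pos}")
```

Fed a reduced version of the reported fragment, 
a:1:{s:4:"solr";i:0;a:1:{...}}, this parser reads "solr" as the key and 0 
as its value, then fails because an array follows where the closing brace 
should be. A numeric index is fine as a key inside a list-style array, but 
not between a string key and its value.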



-- 
Sincerely yours
Mikhail Khludnev


Re: knn query parser, number of results and filtering by score

2023-12-04 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hi Mirko,

Thanks for sharing both the cloud and non-cloud test scenarios here!

Earlier today I merged a (one-line) change for the 
https://issues.apache.org/jira/browse/SOLR-16952 issue. From reading this 
thread it appears that the two could be related, as you mentioned, i.e. might 
the 0.0 in "0.0 = not in top 3" arise from a "1.0" rather than a 1.0 explanation 
input?

If you were able to test with the SOLR-16952 change and share your findings 
that would be great.

Thank you,

Christine

From: users@solr.apache.org At: 10/23/23 16:21:09 UTC+1:00
To: users@solr.apache.org
Subject: Re: knn query parser, number of results and filtering by score

Hi!

I tested this, and it seems the problem is related to Solr cloud mode. 
It works perfectly in a non-cloud setup. Here is the debug response:

{
   "responseHeader":{
 "status":0,
 "QTime":2,
 "params":{
   "q":"{!knn f=TESTEMBEDDING_EU_3  topK=3}[1.0,1.0,1.0]",
   "defType":"lucene",
   "indent":"true",
   "fl":"id,[explain],[shard],TESTEMBEDDING_EU_3",
   "q.op":"OR",
   "debug.explain.structured":"true",
   "debugQuery":"true",
   "useParams":"",
   "_":"1698059075615"
 }
   },
   "response":{
 "numFound":3,
 "start":0,
 "numFoundExact":true,
 "docs":[{
   "id":"Position11",
   "TESTEMBEDDING_EU_3":[1.0,1.0,1.0],
   "[explain]":"1.0 = within top 3\n",
   "[shard]":"[not a shard request]"
 },{
   "id":"Position10",
   "TESTEMBEDDING_EU_3":[0.9,0.9,0.9],
   "[explain]":"0.97087383 = within top 3\n",
   "[shard]":"[not a shard request]"
 },{
   "id":"Position9",
   "TESTEMBEDDING_EU_3":[0.8,0.8,0.8],
   "[explain]":"0.89285713 = within top 3\n",
   "[shard]":"[not a shard request]"
 }]
   },
   "debug":{
 "rawquerystring":"{!knn f=TESTEMBEDDING_EU_3 topK=3}[1.0,1.0,1.0]",
 "querystring":"{!knn f=TESTEMBEDDING_EU_3 topK=3}[1.0,1.0,1.0]",
 "parsedquery":"KnnFloatVectorQuery(KnnFloatVectorQuery:TESTEMBEDDING_EU_3[1.0,...][3])",
 "parsedquery_toString":"KnnFloatVectorQuery:TESTEMBEDDING_EU_3[1.0,...][3]",
 "explain":{
   "Position11":{
 "match":true,
 "value":1.0,
 "description":"within top 3"
   },
   "Position10":{
 "match":true,
 "value":0.97087383,
 "description":"within top 3"
   },
   "Position9":{
 "match":true,
 "value":0.89285713,
 "description":"within top 3"
   }
 },
 "QParser":"KnnQParser",
 "timing":{
   "time":2.0,
   "circuitbreaker":{
 "time":0.0
   },
   "prepare":{
 "time":0.0,
 "query":{
   "time":0.0
 },
 "facet":{
   "time":0.0
 },
 "facet_module":{
   "time":0.0
 },
 "mlt":{
   "time":0.0
 },
 "highlight":{
   "time":0.0
 },
 "stats":{
   "time":0.0
 },
 "expand":{
   "time":0.0
 },
 "terms":{
   "time":0.0
 },
 "debug":{
   "time":0.0
 }
   },
   "process":{
 "time":1.0,
 "query":{
   "time":0.0
 },
 "facet":{
   "time":0.0
 },
 "facet_module":{
   "time":0.0
 },
 "mlt":{
   "time":0.0
 },
 "highlight":{
   "time":0.0
 },
 "stats":{
   "time":0.0
 },
 "expand":{
   "time":0.0
 },
 "terms":{
   "time":0.0
 },
 "debug":{
   "time":1.0
 }
   }
 }
   }
}
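
The scores in this response are consistent with Lucene's euclidean similarity, which maps a squared distance d^2 to a score of 1 / (1 + d^2). A minimal Python sketch that recomputes them from the vectors shown above; the helper name euclidean_score is illustrative and not part of Solr, and small differences in the last digits are expected because Solr works in 32-bit floats:

```python
def euclidean_score(query, doc):
    """Score = 1 / (1 + squared euclidean distance)."""
    squared = sum((q - d) ** 2 for q, d in zip(query, doc))
    return 1.0 / (1.0 + squared)


query = [1.0, 1.0, 1.0]
docs = {
    "Position11": [1.0, 1.0, 1.0],
    "Position10": [0.9, 0.9, 0.9],
    "Position9": [0.8, 0.8, 0.8],
}

# Print each document with its recomputed score, best match first.
for doc_id, vector in sorted(
    docs.items(), key=lambda item: euclidean_score(query, item[1]), reverse=True
):
    print(doc_id, euclidean_score(query, vector))
```

Comparing these values with the "[explain]" entries is a quick way to check that a non-cloud response is ranking by distance as expected.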

Please let me know where to file a bug report. Is it the Jira at 
issues.apache.org?

Mirko

On 20.10.2023 at 12:09, Alessandro Benedetti wrote:
> I agree, you can definitely raise a bug for the debug output. If you do me a
> favour and also test in non-cloud mode, it will help us understand whether
> it's a Solr bug or a Lucene bug.
>
> I also agree with your second point about the functional expectations. That
> one is very minor, though; you can create the ticket and contribute a fix if
> you like, happy to review it!
> --
> *Alessandro Benedetti*
> Director @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
>
> e-mail: a.benede...@sease.io
>
>
> *Sease* - Information Retrieval Applied
> Consulting | Training | Open Source
>
> Website: Sease.io | LinkedIn | Twitter | Youtube | Github
> 
>
>
> On Thu, 19 Oct 2023 at 17:51, Mirko Sertic  wrote:
>
>> I've prepared a test case. In the following documents, TESTEMBEDDING_EU_3
>> is a DenseVectorField of length 3 using the euclidean distance function.
>> They are written to a collection made of two shards with

Re: Invalid PHPS response for Luke request

2023-12-04 Thread Mikhail Khludnev
Probably, /admin/luke might be out of scope for a particular client API,
since it's more of a low-level internal facility, subject to change and
error-prone.

On Mon, 4 Dec 2023 at 15:23, Thomas Corthals wrote:

> Hello Mikhail
>
> I wrote the Luke functionality for the Solarium PHP client. There is no use
> case here, just the desire to fully support the functionality.
>
> If we can agree this is a bug in the PHPS output (JSON output doesn't
> contain errors) I'll open a JIRA issue for it and implement a forward
> compatible workaround in the client.
>
> Thomas
>
> On Mon, 4 Dec 2023 at 10:15, Mikhail Khludnev wrote:
>
> > Hello Thomas.
> > It's not really my wheelhouse. There might be a problem in LukeHandler or
> > its compatibility with phps, e.g. under the "solr" key it puts Lucene's
> > Document rather than a SolrDocument (see
> > LukeRequestHandler.handleRequestBody).
> > Do you really need /luke? Can't you obtain a doc via /select or /get?
> >
> > On Sun, Dec 3, 2023 at 11:49 AM Thomas Corthals 
> > wrote:
> >
> > > Hi all,
> > >
> > >
> > > The output of a Luke request for a specific document with wt=phps can't be
> > > unserialized in PHP because it contains an error.
> > >
> > >
> > > > curl 'http://localhost:8983/solr/techproducts/admin/luke?id=apple&wt=phps'
> > >
> > >
> > > The output is structured like this; I'm omitting some details for brevity.
> > >
> > >
> > >
> > > a:4:{s:14:"responseHeader";a:2:{s:6:"status";i:0;s:5:"QTime";i:1;}s:5:"index";a:0:{}s:3:"doc";a:3:{s:5:"docId";i:7;s:6:"lucene";a:3:{s:2:"id";a:0:{}s:10:"compName_s";a:0:{}s:9:"address_s";a:0:{}}s:4:"solr";i:0;a:3:{s:2:"id";s:5:"apple";s:10:"compName_s";s:5:"Apple";s:9:"address_s";s:28:"1 Infinite Way, Cupertino CA";}}s:4:"info";a:2:{s:3:"key";a:0:{}s:4:"NOTE";s:116:"Document Frequency (df) is not updated when a document is marked for deletion.  df values include deleted documents.";}}
> > >
> > >
> > > The Solr document in this output should be an array item with key "solr"
> > > and an array of fields as value. It's represented by this bit:
> > >
> > >
> > >
> > > s:4:"solr";i:0;a:3:{s:2:"id";s:5:"apple";s:10:"compName_s";s:5:"Apple";s:9:"address_s";s:28:"1 Infinite Way, Cupertino CA";}
> > >
> > >
> > > The i:0; part doesn't belong there. It makes PHP think the value of key
> > > "solr" is integer 0 and then it bails because it doesn't expect an array
> > > a:3: to follow next. I suspect writeSolrDocument() in the
> > > PHPSerializedResponseWriter is to blame. It starts with writeKey(idx,
> > > false); assuming it's always writing a document to an array?
> > >
> > >
> > > Thomas
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>
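
The failure Thomas describes, and the kind of client-side workaround he mentions, can be sketched as follows. This is only an illustration in Python (the Solarium client itself is PHP): the function names, the regular expression, and the a:1:{...} wrapper around the quoted fragment are assumptions made for the example, and the parser handles only the i, s and a types that appear in the output above.

```python
import re


def strip_stray_index(payload: str) -> str:
    """Hypothetical workaround: drop an 'i:0;' between the "solr" key and its array."""
    # Note: a plain regex could also match inside string content; fine for a sketch.
    return re.sub(r's:4:"solr";i:0;(?=a:\d+:\{)', 's:4:"solr";', payload)


def parse(data: str, pos: int = 0):
    """Parse one PHP-serialized value (i, s, a only); return (value, next_pos)."""
    kind = data[pos]
    if kind == "i":
        end = data.index(";", pos)
        return int(data[pos + 2:end]), end + 1
    if kind == "s":
        colon = data.index(":", pos + 2)
        length = int(data[pos + 2:colon])  # byte length; equals char count for ASCII
        start = colon + 2  # skip ':"'
        if data[start + length:start + length + 2] != '";':
            raise ValueError(f"bad string terminator at offset {start + length}")
        return data[start:start + length], start + length + 2
    if kind == "a":
        colon = data.index(":", pos + 2)
        count = int(data[pos + 2:colon])
        pos = colon + 2  # skip ':{'
        result = {}
        for _ in range(count):
            key, pos = parse(data, pos)
            value, pos = parse(data, pos)
            result[key] = value
        if data[pos] != "}":
            raise ValueError(f"expected '}}' at offset {pos}, found {data[pos]!r}")
        return result, pos + 1
    raise ValueError(f"unsupported type {kind!r} at offset {pos}")


# The fragment from the message above, wrapped in a one-element array for the demo.
broken = (
    'a:1:{s:4:"solr";i:0;a:3:{s:2:"id";s:5:"apple";s:10:"compName_s";s:5:"Apple";'
    's:9:"address_s";s:28:"1 Infinite Way, Cupertino CA";}}'
)

try:
    parse(broken)
except ValueError as err:
    print("unfixed payload fails:", err)

document, _ = parse(strip_stray_index(broken))
print(document["solr"])
```

Because the substitution only fires when the stray i:0; is present, a workaround of this shape would leave the output untouched if the writer is later fixed.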


Re: [TOKYO Lucene/Solr meetup] ~ How to Integrate Open Source LargeLanguage with Apache Solr (Alessandro) ~

2023-12-04 Thread Alessandro Benedetti
Thanks guys! The slides will be available roughly in one month here:
https://sease.io/conferences-attended

I'll write again here when available.
If you want them more urgently feel free to reach me on Linkedin and I can
share the pdf there with you.
Cheers

On Mon, 4 Dec 2023, 20:32 ufuk yılmaz,  wrote:

> Hi,
>
> Thanks for inviting us, it was very informative!
>
> Is it possible to have a copy of the slides? If they are private/closed
> source, then I understand.
>
> Sincerely,
>
> Ufuk yilmaz
>
> -ufuk yilmaz
>
> Sent from Mail for Windows
>
> From: Moriyasu Kannami
> Sent: Monday, December 4, 2023 12:58 PM
> To: Mikhail Khludnev
> Cc: Alessandro Benedetti; users@solr.apache.org; Koji Sekiguchi
> Subject: Re: [TOKYO Lucene/Solr meetup] ~ How to Integrate Open Source
> LargeLanguage with Apache Solr (Alessandro) ~
>
> Hi Mikhail,
>
> I was the host of the Google Meet at that meet-up.
> At the meet-up, I didn't record the Google meet video.
>
> Regards,
> moriyasu
>
>
> On Mon, 4 Dec 2023 at 18:29, Mikhail Khludnev wrote:
>
> > Hello,
> > Alessandro, thank you. It's a great concise talk!
> > I was a little bit late to the party. Koji thanks for the link!
> > Would you be able to share a recording?
> >
> >
> > On Fri, Dec 1, 2023 at 11:16 AM Koji Sekiguchi <
> > koji.sekigu...@rondhuit.com> wrote:
> >
> >> https://meet.google.com/zrf-khvk-ifw
> >>
> >>
> >> On Fri, 1 Dec 2023 at 17:09, Mikhail Khludnev wrote:
> >>
> >> > Hi there!
> >> > Is it supposed to go right now? I joined via link in event
> >> > https://meet.google.com/eob-sjub-ghe, but I'm the only one.
> >> >
> >> > On Mon, Nov 27, 2023 at 5:01 AM Moriyasu Kannami <
> >> > moriyasu.kann...@rondhuit.com> wrote:
> >> >
> >> > > Hi, Alessandro.
> >> > >
> >> > > Thank you for your reply and advertisement.
> >> > > I am looking forward to the lecture scheduled for December 1st in JAPAN.
> >> > >
> >> > > And,
> >> > > To the members of this community, here is a reminder!
> >> > >
> >> > > If you wish to participate in this meetup, please register using the
> >> > > following URL.
> >> > > The event page(in Japanese): https://solr.doorkeeper.jp/events/166257
> >> > >
> >> > > We are currently looking for several speakers for Lightning Talks of
> >> > > this meetup.
> >> > > Speakers are welcome to participate online(Google Meet) as well.
> >> > > The theme can be anything related to the Apache Lucene/Solr,
> >> > > such as case studies, new features, and more.
> >> > > If you wish to participate as a LT speaker, please contact me
> >> > > at my email address.
> >> > >
> >> > > Thanks!
> >> > >
> >> > > moriyasu
> >> > >
> >> > >
> >> > > On Fri, 24 Nov 2023 at 0:26, Alessandro Benedetti wrote:
> >> > >
> >> > > > Thanks, Moriyasu for the heads up!
> >> > > > I'll advertise it on my LinkedIn soon and on the Apache Solr Twitter
> >> > > > account!
> >> > > > Also contacted some of the Japanese people I know to get better
> >> > > > engagement :)
> >> > > >
> >> > > > Cheers
> >> > > > --
> >> > > > *Alessandro Benedetti*
> >> > > > Director @ Sease Ltd.
> >> > > > *Apache Lucene/Solr Committer*
> >> > > > *Apache Solr PMC Member*
> >> > > >
> >> > > > e-mail: a.benede...@sease.io
> >> > > >
> >> > > >
> >> > > > *Sease* - Information Retrieval Applied
> >> > > > Consulting | Training | Open Source
> >> > > >
> >> > > > Website: Sease.io | LinkedIn | Twitter | Youtube | Github
> >> > > > 
> >> > > >
> >> > > >
> >> > > > On Wed, 22 Nov 2023 at 04:13, Moriyasu Kannami <
> >> > > > moriyasu.kann...@rondhuit.com> wrote:
> >> > > >
> >> > > > > Hi, everyone!
> >> > > > >
> >> > > > > The Solr community in Tokyo(#SolrJP) will hold a meet-up on 1 December
> >> > > > > 2023 at 17:00 JST(1 December 2023 at 01:00 PDT) in Toranomon
> >> > > > > Minato-Ku Tokyo. Online streaming via Google Meet is also available.
> >> > > > >
> >> > > > > The topic of this presentation is "How to Integrate
> >> > > > > Open Source Large Language with Apache Solr".
> >> > > > > This presentation is in English for about 45min.
> >> > > > >
> >> > > > > Speaker : Alessandro Benedetti
> >> > > > >   Apache Lucene/Solr Committer
> >> > > > >   Apache Solr PMC Member
> >> > > > >
> >> > > > > If you wish to participate, please register using the following URL.
> >> > > > > The event page(in Japanese): https://solr.doorkeeper.jp/events/166257
> >> > > > >
> >> > > > > We look forward to your participation.
> >> > > > >
> >> > > > > We are currently looking for several speakers for Lightning Talks of
> >> > > > > this meetup who reside in the Tokyo vicinity and can attend the
> >> > > > > venue(Toranomon Minato-Ku) on the day of the event.
> >> > > > > Speakers are welcome to participate online(Google Meet) as well.
> >> > > > > In that case,the place of res