Hi Katy.

I just tested it and here's what I think.

The PDF is stored whole, as a single file, in the assetstore, e.g. in
/dspace/assetstore/48/52/25/48522548568548809071217508207656426356
just like any other uploaded file. As you can see, it has no extension.
But if you know which extension is correct (here I know it is pdf), I
just append it, which gives the file
48522548568548809071217508207656426356.pdf
and that file can be opened normally.
In Postgres you can find the name of this file (the number) and probably
the information about its extension somewhere.
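
To make the path layout concrete, here is a minimal sketch that rebuilds the
location from the file's number. It assumes, based only on the example path
above, that the first three pairs of digits become three nested directories.
The function name assetstore_path is made up for illustration and is not
part of DSpace.

```python
from pathlib import PurePosixPath


def assetstore_path(assetstore_dir: str, internal_id: str) -> PurePosixPath:
    """Build the on-disk location of a stored file from its number.

    Assumption (inferred from the example path only): the first three
    pairs of digits of the number are used as three nested directories.
    """
    if len(internal_id) < 6:
        raise ValueError("number is too short to derive three directory levels")
    # "485225..." -> ["48", "52", "25"]
    levels = [internal_id[i:i + 2] for i in (0, 2, 4)]
    return PurePosixPath(assetstore_dir, *levels, internal_id)


print(assetstore_path("/dspace/assetstore",
                      "48522548568548809071217508207656426356"))
# prints /dspace/assetstore/48/52/25/48522548568548809071217508207656426356
```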

I am not entirely sure exactly when the file is written to the filesystem,
but my guess is the moment it is uploaded, since I can see files in the
assetstore that belong to items in the workspace, and even files I deleted
from workspace items.

What you probably did is pick the wrong assetstore file, since appending
.pdf works for me (and has always worked with other formats too).
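
One way to check whether you picked the right file, without renaming
anything, is to look at its first bytes: a PDF file starts with the marker
"%PDF-". Below is a small sketch of that check. The function name
looks_like_pdf is made up for illustration, and the check is deliberately
simple: it only looks at the very start of the file.

```python
def looks_like_pdf(path: str) -> bool:
    """Return True if the file begins with the PDF header marker '%PDF-'."""
    with open(path, "rb") as handle:
        # Read only the first five bytes; the file is not modified.
        return handle.read(5) == b"%PDF-"


# Example call (the path is the one from this thread; adjust to your setup):
# looks_like_pdf("/dspace/assetstore/48/52/25/48522548568548809071217508207656426356")
```

If this returns False for the file you renamed, that file is most likely a
different bitstream (for example the extracted text that filter-media
produces) rather than the original PDF.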

I am unable to answer all of your questions, but I hope at least some things
are clearer now.

Best regards,
Majo

On Thu, Nov 23, 2023 at 1:17 AM Katy Earl <katy.e...@gmail.com> wrote:

> Hello,
>
>  So I’ve scoured what areas I could find in the documentation,
> particularly the Lyrasis “Storage Layer
> <https://wiki.lyrasis.org/display/DSDOC7x/Storage+Layer>” page, but I
> admit I’m still not fully understanding what exactly happens to files
> imported into the DSpace system. Perhaps someone can point me in the right
> direction?
>
>  Do I have this correct, using PDFs as an example? When you import a PDF
> to DSpace, such as by creating a new Item in the front end browser and
> dragging and dropping it for ingestion, and then saving it, and then later
> running filter-media, I’m guessing that the PDF is stored in pieces in the
> following way? Is it stored in whole anywhere??
>
>    1. *Assetstore*: DSpace makes a new file based on the PDF, converting
>    the bits by some sort of converter to a bitstream.
>       1. Result: a “bitstream” file which looks, more or less, like a
>       text file (openable and readable in Notepad).
>       2. Where: DSpace’s “assetstore” directory.
>       3. Questions:
>          - The original PDF, with all the formatting information, what
>          happens to it? Is it deleted?
>             - Why confused: If I change the bitstream in the “assetstore”
>             to have an extension of .pdf, PDF readers don’t recognize the
>             file, so this can’t be the original file. So where did it go?
>          - What part of DSpace is doing the file conversion?
>    2. *PostgreSQL*: DSpace creates some metadata about the file,
>    perhaps including the metadata that the PDF has in its Properties and puts
>    it somewhere in the PostgreSQL database.
>       1. Question:
>          - Does this metadata about the PDF’s page formatting also get
>          stored in PostgreSQL?
>          - Does any part of the PDF itself get stored in PostgreSQL?
>          Maybe as a BLOB?
>          - Which tables exactly are holding this information?
>    3. *Solr*: DSpace’s filter-media extracts the full-text of the PDF
>    and gives it to Solr for indexing about the PDF.
>    4. *Is there anywhere else the PDF, in part or in whole, is stored*?
>
>
>
> Related to this, when a DSpace Item that is a PDF is opened in a browser
> through DSpace’s ui and it is downloaded, is the PDF getting “recreated” by
> DSpace on the fly from the bitstream and metadata in PostgreSQL (and ???)
> and then fed to the browser to open? Which part of DSpace code is handling
> this? Or is DSpace feeding the browser an already intact PDF from somewhere
> on the system?
>
> Anyway, apologies if the answers should be obvious, but looking around,
> I’m not the only person who is unfamiliar with a DAMS and how they store
> files, pieced out or in whole or otherwise.
>
> Many thanks in advance, an answer will clear up a lot of uncertainty.
>
> Best,
>
> Katy
>
>
>
>
>
> --
> All messages to this mailing list should adhere to the Code of Conduct:
> https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
> ---
> You received this message because you are subscribed to the Google Groups
> "DSpace Technical Support" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to dspace-tech+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/dspace-tech/d922ddf1-1b0f-45f7-8f4d-3271d9f4c518n%40googlegroups.com
> <https://groups.google.com/d/msgid/dspace-tech/d922ddf1-1b0f-45f7-8f4d-3271d9f4c518n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

